> -----Original Message-----
> From: Michal Privoznik [mailto:mprivozn@redhat.com]
> Sent: Monday, March 5, 2018 8:09 PM
> To: Cordius Wu; 'Wuzongyong (Euler Dept)'; libvir-list(a)redhat.com
> Cc: 'Wanzongshun (Vincent)'; 'weijinfen'
> Subject: Re: [libvirt] [Question]Libvirt doesn't care about qemu monitor
> event if fail to destroy qemu process
>
> On 03/05/2018 12:43 PM, Cordius Wu wrote:
>>>>> On 03/05/2018 03:20 AM, Wuzongyong (Euler Dept) wrote:
>>>>>> Hi,
>>>>>>
>>>>>> We unregister the qemu monitor after sending
>>>>>> QEMU_PROCESS_EVENT_MONITOR_EOF to workerPool:
>>>>>>
>>>>>> static void
>>>>>> qemuProcessHandleMonitorEOF(qemuMonitorPtr mon,
>>>>>>                             virDomainObjPtr vm,
>>>>>>                             void *opaque)
>>>>>> {
>>>>>>     virQEMUDriverPtr driver = opaque;
>>>>>>     qemuDomainObjPrivatePtr priv;
>>>>>>     struct qemuProcessEvent *processEvent;
>>>>>>     ...
>>>>>>     processEvent->eventType = QEMU_PROCESS_EVENT_MONITOR_EOF;
>>>>>>     processEvent->vm = vm;
>>>>>>
>>>>>>     virObjectRef(vm);
>>>>>>     if (virThreadPoolSendJob(driver->workerPool, 0, processEvent) < 0) {
>>>>>>         ignore_value(virObjectUnref(vm));
>>>>>>         VIR_FREE(processEvent);
>>>>>>         goto cleanup;
>>>>>>     }
>>>>>>
>>>>>>     /* We don't want this EOF handler to be called over and over while the
>>>>>>      * thread is waiting for a job.
>>>>>>      */
>>>>>>     qemuMonitorUnregister(mon);
>>>>>>     ...
>>>>>> }
>>>>>>
>>>>>> Then we handle QEMU_PROCESS_EVENT_MONITOR_EOF in the
>>>>>> processMonitorEOFEvent function:
>>>>>>
>>>>>> static void
>>>>>> processMonitorEOFEvent(virQEMUDriverPtr driver,
>>>>>>                        virDomainObjPtr vm)
>>>>>> {
>>>>>>     ...
>>>>>>     if (qemuProcessBeginStopJob(driver, vm, QEMU_JOB_DESTROY, true) < 0)
>>>>>>         return;
>>>>>>     ...
>>>>>> }
>>>>>>
>>>>>> Here, libvirt will show the vm state as running forever if
>>>>>> qemuProcessBeginStopJob returns -1, even though qemu may terminate
>>>>>> or be killed later.
>>>>>>
>>>>>> So, maybe we should re-register the monitor when
>>>>>> qemuProcessBeginStopJob fails?
>>>>>
>>>>> The fact that processMonitorEOFEvent() failed to grab the DESTROY job
>>>>> means that we screwed up earlier and now you're just seeing the
>>>>> effects of it. Threads should be able to acquire the DESTROY job at
>>>>> any point, regardless of other jobs set on the domain object.
>>>>>
>>>>> Can you please:
>>>>> a) try to turn on debug logs [1] and tell us why acquiring DESTROY
>>>>> job failed? You should see an error message like this:
>>>>>
>>>>> error: cannot acquire state change lock ..
>>>>>
>>>>> b) tell us what is your libvirt version and if you're able to
>>>>> reproduce this with the latest git HEAD?
>>>>>
>>>>
>>>> By "qemuProcessBeginStopJob failed" I mean that:
>>>
>>> Oh, I thought that the message you sent earlier was related to this:
>>>
>>> https://www.redhat.com/archives/libvir-list/2018-March/msg00148.html
>>>
>>> So you are not accidentally sending SIGKILL to qemu then?
>>
>> Yep, I send SIGKILL to qemu from outside libvirt. The 'accident' is
>> that the situation where libvirt reports the vm as running all the
>> time is hard to reproduce. In the past month, I reproduced it only
>> twice.
>>
>>
>>
>>>> we failed to kill the qemu process within 15 seconds (refer to
>>>> virProcessKillPainfully). IOW, we send SIGTERM and SIGKILL but the
>>>> qemu process doesn't exit within 15s, and then libvirt thinks qemu
>>>> is still running even though qemu does exit after the 15s loop in
>>>> virProcessKillPainfully.
>>>
>>> What state is qemu process in then? I mean, how can we see EOF if the
>>> process still exists?
>>>
>> I sent SIGKILL to the qemu process, but it didn't exit immediately.
>> The command 'ps -ef | grep qemu' showed that the qemu process was in
>> the defunct state.
>
> Ah, so you can find the process, but it is in D state. I had read the
> email linked above as saying that qemu was gone.
Yep
>> Then, about 20s-30s after sending the SIGKILL, the qemu process
>> exited and I could no longer find it with the ps command.
>> So libvirt still thinks the qemu process is alive during the 15s loop
>> in virProcessKillPainfully.
>
> Ah, so IIUC, qemu has closed the monitor but right after that it went to
> the D state instead of quitting. Meanwhile, libvirt sees EOF on the
> monitor but is unable to kill the process.
Right
> Well, registering the EOF handler back would be only a workaround,
> because if you register it back the event loop will do a busy wait (in
> each iteration it will see EOF), so eventually
> virProcessKillPainfully() will see the process gone and
> qemuProcessBeginStopJob() would be able to return successfully.
>
> I'm unsure what the right fix might be though. Maybe at EOF we can
> check what state the qemu process is in and, if it's in D state, not try
> to kill it and continue with the BeginJob() call.
>
> Michal
Hmm, I can't come up with a better solution for this problem, so I hope
somebody can help solve it.
BTW, how can we check whether a process is in D state in libvirt?
By reading /proc/$pid/status. Although this would work only on Linux,
not *BSD. On the other hand, I'm not sure *BSD has D state.
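
For illustration, a minimal sketch of such a check in plain C, not taken
from the libvirt tree. The helper names parseProcStatusState() and
getProcessState() are hypothetical, and the code assumes a Linux procfs
whose /proc/<pid>/status contains a "State:" line near the top of the
file. It returns the raw state letter and leaves the interpretation to
the caller, since ps labels zombie processes (state Z) as "defunct",
while D means uninterruptible sleep.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper (not a libvirt API): extract the state letter from
 * the text of /proc/<pid>/status.
 * Returns 0 and stores the letter in *state, or -1 if there is no usable
 * "State:" line. */
int
parseProcStatusState(const char *status, char *state)
{
    const char *line = status;

    assert(status != NULL && state != NULL);

    while (line && *line) {
        if (strncmp(line, "State:", 6) == 0) {
            const char *p = line + 6;

            /* Skip the whitespace between the key and the value. */
            while (*p == ' ' || *p == '\t')
                p++;
            if (*p == '\0' || *p == '\n')
                return -1;
            *state = *p;
            return 0;
        }
        /* Advance to the start of the next line, if any. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}

/* Hypothetical helper (not a libvirt API): read /proc/<pid>/status and
 * report the state letter. Linux only; returns -1 if the file cannot be
 * opened (process already gone, or no procfs). */
int
getProcessState(pid_t pid, char *state)
{
    char path[64];
    char buf[4096];   /* assumption: "State:" sits within the first 4 KiB */
    size_t len;
    FILE *fp;

    snprintf(path, sizeof(path), "/proc/%lld/status", (long long) pid);
    if (!(fp = fopen(path, "r")))
        return -1;
    len = fread(buf, 1, sizeof(buf) - 1, fp);
    fclose(fp);
    buf[len] = '\0';

    return parseProcStatusState(buf, state);
}
```

A caller could compare the returned letter against 'D' or 'Z' before
deciding whether another kill attempt makes sense. A -1 result from
getProcessState() usually means the process has already disappeared.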
Michal