>> On 03/05/2018 12:43 PM, Cordius Wu wrote:
>>>>>> On 03/05/2018 03:20 AM, Wuzongyong (Euler Dept) wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> We unregister the qemu monitor after sending
>>>>>>> QEMU_PROCESS_EVENT_MONITOR_EOF to workerPool:
>>>>>>>
>>>>>>> static void
>>>>>>> qemuProcessHandleMonitorEOF(qemuMonitorPtr mon,
>>>>>>>                             virDomainObjPtr vm,
>>>>>>>                             void *opaque)
>>>>>>> {
>>>>>>>     virQEMUDriverPtr driver = opaque;
>>>>>>>     qemuDomainObjPrivatePtr priv;
>>>>>>>     struct qemuProcessEvent *processEvent;
>>>>>>>     ...
>>>>>>>     processEvent->eventType = QEMU_PROCESS_EVENT_MONITOR_EOF;
>>>>>>>     processEvent->vm = vm;
>>>>>>>
>>>>>>>     virObjectRef(vm);
>>>>>>>     if (virThreadPoolSendJob(driver->workerPool, 0, processEvent) < 0) {
>>>>>>>         ignore_value(virObjectUnref(vm));
>>>>>>>         VIR_FREE(processEvent);
>>>>>>>         goto cleanup;
>>>>>>>     }
>>>>>>>
>>>>>>>     /* We don't want this EOF handler to be called over and over
>>>>>>>      * while the thread is waiting for a job.
>>>>>>>      */
>>>>>>>     qemuMonitorUnregister(mon);
>>>>>>>     ...
>>>>>>> }
>>>>>>>
>>>>>>> Then we handle QEMU_PROCESS_EVENT_MONITOR_EOF in the
>>>>>>> processMonitorEOFEvent function:
>>>>>>>
>>>>>>> static void
>>>>>>> processMonitorEOFEvent(virQEMUDriverPtr driver,
>>>>>>>                        virDomainObjPtr vm)
>>>>>>> {
>>>>>>>     ...
>>>>>>>     if (qemuProcessBeginStopJob(driver, vm, QEMU_JOB_DESTROY, true) < 0)
>>>>>>>         return;
>>>>>>>     ...
>>>>>>> }
>>>>>>>
>>>>>>> Here, libvirt will keep showing the vm state as running if
>>>>>>> qemuProcessBeginStopJob returns -1, even though qemu may
>>>>>>> terminate or be killed later.
>>>>>>>
>>>>>>> So, maybe we should re-register the monitor when
>>>>>>> qemuProcessBeginStopJob fails?
>>>>>>
>>>>>> The fact that processMonitorEOFEvent() failed to grab the DESTROY
>>>>>> job means that we screwed up earlier and now you're just seeing
>>>>>> the effects of it. Threads should be able to acquire the DESTROY
>>>>>> job at any point, regardless of other jobs set on the domain
>>>>>> object.
>>>>>>
>>>>>> Can you please:
>>>>>> a) try to turn on debug logs [1] and tell us why acquiring the
>>>>>> DESTROY job failed? You should see an error message like this:
>>>>>>
>>>>>> error: cannot acquire state change lock ..
>>>>>>
>>>>>> b) tell us what your libvirt version is and whether you're able
>>>>>> to reproduce this with the latest git HEAD?
>>>>>>
>>>>>
>>>>> By "qemuProcessBeginStopJob failed" I mean that:
>>>>
>>>> Oh, I thought that the message you've sent earlier is related to
>>>> this:
>>>>
>>>> https://www.redhat.com/archives/libvir-list/2018-March/msg00148.html
>>>>
>>>> So you are not accidentally sending SIGKILL to qemu then?
>>>
>>> Yep, I send SIGKILL to qemu from outside. The 'accident' means that
>>> the situation where libvirt reports the vm as running all the time
>>> is hard to reproduce. In the past month, I reproduced it only twice.
>>>
>>>
>>>
>>>>> we failed to kill the qemu process within 15 seconds (refer to
>>>>> virProcessKillPainfully). IOW, we send SIGTERM and SIGKILL but the
>>>>> qemu process doesn't exit within 15s, and then libvirt thinks qemu
>>>>> is still running even though qemu does indeed exit after the 15s
>>>>> loop in virProcessKillPainfully.
>>>>
>>>> What state is qemu process in then? I mean, how can we see EOF if
>>>> the process still exists?
>>>>
>>> I sent SIGKILL to the qemu process, but it didn't exit immediately.
>>> The command 'ps -ef | grep qemu' showed that the qemu process was
>>> in the defunct state.
>>
>> Ah, so you can find the process, but it is in D state. I read the
>> email linked above as if qemu was gone.
>
> Yep
>>> Then, about 20s-30s after sending the SIGKILL, the qemu process
>>> exited and I couldn't find the qemu info through the ps command.
>>> So libvirt still thinks the qemu process is alive during the 15s
>>> loop in virProcessKillPainfully.
>>
>> Ah, so IIUC, qemu has closed the monitor but right after that it went
>> to the D state instead of quitting. Meanwhile, libvirt sees EOF on
>> the monitor but is unable to kill the process.
>
> Right
>> Well, registering the EOF handler back would be only a workaround,
>> because if you register the EOF handler back the event loop will do
>> a busy wait (in each iteration it will see EOF), so eventually
>> virProcessKillPainfully() will see the process gone and
>> qemuProcessBeginStopJob() would be able to return successfully.
>>
>> I'm unsure what the right fix might be though. Maybe, at EOF we can
>> check what state the qemu process is in and if it's in D state, not
>> try to kill it and continue with the BeginJob() call.
>>
>> Michal
> Hmm, I can't come up with a better solution for this problem, so I
> hope somebody can help solve it.
> BTW, how can we check whether a process is in D state in libvirt?
By reading /proc/$pid/status. Although this would work only on Linux, not
*BSD. On the other hand, I'm not sure *BSD has D state.
Michal
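
For illustration, a minimal sketch of what reading /proc/$pid/status
could look like. This is not existing libvirt code: the helper name
procStateFromStatus is hypothetical, and the sketch is Linux-only, as
noted above. It relies on the "State:" line documented in proc(5),
where 'D' is uninterruptible sleep and 'Z' is a zombie (what ps labels
"defunct"), so a caller would need to decide which of the two it cares
about.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not a libvirt API.
 * Returns the one-letter process state taken from the "State:" line
 * of /proc/<pid>/status (e.g. 'R', 'S', 'D', 'Z'), or '\0' if the
 * file cannot be opened or has no "State:" line. Linux-only. */
static char
procStateFromStatus(pid_t pid)
{
    char path[64];
    char line[256];
    char state = '\0';
    FILE *fp;

    snprintf(path, sizeof(path), "/proc/%lld/status", (long long) pid);

    if (!(fp = fopen(path, "r")))
        return '\0';

    while (fgets(line, sizeof(line), fp)) {
        /* The line looks like "State:\tD (disk sleep)"; the space in
         * the format skips the tab before the state letter. */
        if (sscanf(line, "State: %c", &state) == 1)
            break;
    }

    fclose(fp);
    return state;
}
```

A caller at EOF time could then compare the result against 'D' before
deciding whether to try killing the process, e.g.
procStateFromStatus(pid) == 'D'. Note the check is inherently racy: the
process may change state or exit right after the file is read.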