>> On 03/05/2018 03:20 AM, Wuzongyong (Euler Dept) wrote:
>>> Hi,
>>>
>>> We unregister the qemu monitor after sending QEMU_PROCESS_EVENT_MONITOR_EOF
>>> to workerPool:
>>>
>>> static void
>>> qemuProcessHandleMonitorEOF(qemuMonitorPtr mon,
>>>                             virDomainObjPtr vm,
>>>                             void *opaque)
>>> {
>>>     virQEMUDriverPtr driver = opaque;
>>>     qemuDomainObjPrivatePtr priv;
>>>     struct qemuProcessEvent *processEvent;
>>>     ...
>>>     processEvent->eventType = QEMU_PROCESS_EVENT_MONITOR_EOF;
>>>     processEvent->vm = vm;
>>>
>>>     virObjectRef(vm);
>>>     if (virThreadPoolSendJob(driver->workerPool, 0, processEvent) < 0) {
>>>         ignore_value(virObjectUnref(vm));
>>>         VIR_FREE(processEvent);
>>>         goto cleanup;
>>>     }
>>>
>>>     /* We don't want this EOF handler to be called over and over while the
>>>      * thread is waiting for a job.
>>>      */
>>>     qemuMonitorUnregister(mon);
>>>     ...
>>> }
>>>
>>> Then we handle QEMU_PROCESS_EVENT_MONITOR_EOF in the processMonitorEOFEvent
>>> function:
>>>
>>> static void
>>> processMonitorEOFEvent(virQEMUDriverPtr driver,
>>>                        virDomainObjPtr vm)
>>> {
>>>     ...
>>>     if (qemuProcessBeginStopJob(driver, vm, QEMU_JOB_DESTROY, true) < 0)
>>>         return;
>>>     ...
>>> }
>>>
>>> Here, libvirt keeps reporting the VM as running indefinitely if
>>> qemuProcessBeginStopJob returns -1, even though qemu may terminate or be
>>> killed later.
>>>
>>> So, maybe we should re-register the monitor when qemuProcessBeginStopJob
>>> fails?
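The failure mode described above can be modeled in a few lines. This is a hypothetical, self-contained sketch, not libvirt code: `struct model`, `deliver_eof()` and `process_eof()` are invented names, the worker runs synchronously, and `job_available` stands in for whether qemuProcessBeginStopJob() succeeds. It shows that once the one-shot EOF handler has been unregistered, a failed stop job leaves the domain reported as running, and that re-arming the handler on failure (the proposal above) lets a later EOF through.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the EOF path: a handler that fires at most once
 * while registered, and a domain state that only changes when the worker
 * manages to process that EOF. */
struct model {
    bool registered;     /* is the EOF handler armed? */
    bool running;        /* state as reported to the user */
    bool job_available;  /* stand-in for "stop job can be acquired" */
};

/* Worker side: returns false if the stop job could not be acquired. */
static bool process_eof(struct model *m, bool rearm_on_failure)
{
    if (!m->job_available) {
        if (rearm_on_failure)
            m->registered = true;   /* proposed fix: allow another EOF */
        return false;               /* domain state left untouched */
    }
    m->running = false;             /* domain marked as shut off */
    return true;
}

/* Event side: unregister first so the handler is not called repeatedly,
 * then hand the event to the worker (synchronously in this model). */
static void deliver_eof(struct model *m, bool rearm_on_failure)
{
    if (!m->registered)
        return;                     /* nothing armed: this EOF is lost */
    m->registered = false;
    process_eof(m, rearm_on_failure);
}
```

The model deliberately leaves out why the handler is unregistered in the first place: per the comment in qemuProcessHandleMonitorEOF(), an armed handler would be called over and over while the worker waits for the job, so re-arming carries a cost this sketch does not show.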
>>
>> The fact that processMonitorEOFEvent() failed to grab the DESTROY job means
>> that we screwed up earlier and now you're just seeing the effects of it.
>> Threads should be able to acquire the DESTROY job at any point, regardless
>> of other jobs set on the domain object.
>>
>> Can you please:
>> a) try to turn on debug logs [1] and tell us why acquiring DESTROY job
>> failed? You should see an error message like this:
>>
>> error: cannot acquire state change lock ..
>>
>> b) tell us what your libvirt version is and whether you're able to
>> reproduce this with the latest git HEAD?
>>
>
> When I said "qemuProcessBeginStopJob failed", I meant that:
Oh, I thought that the message you sent earlier was related to this:
https://www.redhat.com/archives/libvir-list/2018-March/msg00148.html
So you are not accidentally sending SIGKILL to qemu, then?
Yep, I send SIGKILL to qemu from outside libvirt. By 'accident' I meant that
the situation where libvirt reports the VM as running indefinitely is hard to
reproduce: in the past month I have reproduced it only twice.
> we failed to kill the qemu process within 15 seconds (see
> virProcessKillPainfully). IOW, we send SIGTERM and then SIGKILL, but the
> qemu process doesn't exit within 15s, so libvirt thinks qemu is still
> running even though qemu does exit after the 15s loop in
> virProcessKillPainfully has ended.
What state is the qemu process in then? I mean, how can we see EOF if the
process still exists?
I sent SIGKILL to the qemu process, but it didn't exit immediately: running
'ps -ef | grep qemu' showed the qemu process in the defunct state. About
20-30s after I sent SIGKILL, the qemu process exited and I could no longer
find it through ps.
So libvirt still thinks the qemu process is alive during the 15s loop in
virProcessKillPainfully.
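The defunct state is the key detail: a zombie keeps its process-table entry until it is reaped, so an existence check with signal 0 (kill(pid, 0)) still succeeds. The following standalone sketch assumes the liveness probe works that way; `pid_exists()` and `zombie_demo()` are hypothetical helpers, not libvirt functions. It forks a child, SIGKILLs it, waits for it to become a zombie with waitid(WNOWAIT) without reaping it, and probes before and after waitpid().

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Existence probe: signal 0 runs kill(2)'s error checks without sending
 * anything, so success means a process-table entry exists for pid. */
static bool pid_exists(pid_t pid)
{
    assert(pid > 0);
    if (kill(pid, 0) == 0)
        return true;            /* running OR zombie */
    return errno != ESRCH;      /* EPERM: exists, we just can't signal it */
}

/* Kill a child without reaping it, then compare the probe's answer
 * while it is a zombie and after it has been reaped. */
static void zombie_demo(bool *exists_as_zombie, bool *exists_after_reap)
{
    pid_t pid = fork();
    if (pid < 0) {
        *exists_as_zombie = *exists_after_reap = false;
        return;
    }
    if (pid == 0) {
        pause();                /* child: block until SIGKILL arrives */
        _exit(0);
    }

    kill(pid, SIGKILL);

    /* Wait until the child has terminated but leave it unreaped
     * (WNOWAIT), so ps would now show it as <defunct>. */
    siginfo_t info;
    waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT);
    *exists_as_zombie = pid_exists(pid);

    waitpid(pid, NULL, 0);      /* reap: the entry is released */
    *exists_after_reap = pid_exists(pid);
}
```

Calling `zombie_demo()` should report the child as existing while it is defunct and as gone once it has been reaped: a signal-0 probe cannot tell a zombie from a live process.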
Thanks,
Wu Zongyong