On 03/05/2018 10:26 AM, Michal Privoznik wrote:
On 03/05/2018 03:20 AM, Wuzongyong (Euler Dept) wrote:
> Hi,
>
> We unregister qemu monitor after sending QEMU_PROCESS_EVENT_MONITOR_EOF to
workerPool:
>
> static void
> qemuProcessHandleMonitorEOF(qemuMonitorPtr mon,
> virDomainObjPtr vm,
> void *opaque)
> {
> virQEMUDriverPtr driver = opaque;
> qemuDomainObjPrivatePtr priv;
> struct qemuProcessEvent *processEvent;
> ...
> processEvent->eventType = QEMU_PROCESS_EVENT_MONITOR_EOF;
> processEvent->vm = vm;
>
> virObjectRef(vm);
> if (virThreadPoolSendJob(driver->workerPool, 0, processEvent) < 0) {
> ignore_value(virObjectUnref(vm));
> VIR_FREE(processEvent);
> goto cleanup;
> }
>
> /* We don't want this EOF handler to be called over and over while the
> * thread is waiting for a job.
> */
> qemuMonitorUnregister(mon);
> ...
> }
>
> Then we handle QEMU_PROCESS_EVENT_MONITOR_EOF in processMonitorEOFEvent function:
>
> static void
> processMonitorEOFEvent(virQEMUDriverPtr driver,
> virDomainObjPtr vm)
> {
> ...
> if (qemuProcessBeginStopJob(driver, vm, QEMU_JOB_DESTROY, true) < 0)
> return;
> ...
> }
>
> Here, libvirt will show that the vm state is running all the time if
qemuProcessBeginStopJob return -1
> even though qemu may terminate or be killed later.
>
> So, may be we should re-register the monitor when qemuProcessBeginStopJob failed?
The fact that processMonitorEOFEvent() failed to grab DESTROY job means
that we screwed up earlier and now you're just seeing effects of it.
Threads should be albe to acquire DESTROY job at any point, regardless
of other jobs set on the domain object.
Can you please:
a) try to turn on debug logs [1] and tell us why acquiring DESTROY job
failed? You should see an error message like this:
error: cannot acquire state change lock ..
b) tell us what is your libvirt version and if you're able to reproduce
this with the latest git HEAD?
Ha! Looking at the code I think I've found something that might be
causing this issue. Do you have max_queued set in qemu.conf? Because if
you do, then qemuDomainObjBeginJobInternal() might fail to set job
because it's above the set limit. If I'm right, this should be the fix:
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 8b4efc82d..7eb631e06 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -5401,7 +5401,8 @@ qemuDomainObjBeginJobInternal(virQEMUDriverPtr driver,
then = now + QEMU_JOB_WAIT_TIME;
retry:
- if (cfg->maxQueuedJobs &&
+ if ((!async && job == QEMU_JOB_DESTROY) &&
Oh, this should be reversed (note to myself: don't write any patches
until morning coffee kicks in):
if ((!async && job != QEMU_JOB_DESTROY) &&
cfg->maxQueuedJobs &&
Michal