[libvirt] [Question] Libvirt doesn't care about qemu monitor events if it fails to destroy the qemu process

Hi,

We unregister the qemu monitor after sending QEMU_PROCESS_EVENT_MONITOR_EOF to the workerPool:

static void
qemuProcessHandleMonitorEOF(qemuMonitorPtr mon,
                            virDomainObjPtr vm,
                            void *opaque)
{
    virQEMUDriverPtr driver = opaque;
    qemuDomainObjPrivatePtr priv;
    struct qemuProcessEvent *processEvent;
    ...
    processEvent->eventType = QEMU_PROCESS_EVENT_MONITOR_EOF;
    processEvent->vm = vm;

    virObjectRef(vm);
    if (virThreadPoolSendJob(driver->workerPool, 0, processEvent) < 0) {
        ignore_value(virObjectUnref(vm));
        VIR_FREE(processEvent);
        goto cleanup;
    }

    /* We don't want this EOF handler to be called over and over while the
     * thread is waiting for a job.
     */
    qemuMonitorUnregister(mon);
    ...
}

Then we handle QEMU_PROCESS_EVENT_MONITOR_EOF in the processMonitorEOFEvent function:

static void
processMonitorEOFEvent(virQEMUDriverPtr driver,
                       virDomainObjPtr vm)
{
    ...
    if (qemuProcessBeginStopJob(driver, vm, QEMU_JOB_DESTROY, true) < 0)
        return;
    ...
}

Here, libvirt will keep reporting the VM state as running if qemuProcessBeginStopJob returns -1, even though qemu may terminate or be killed later.

So maybe we should re-register the monitor when qemuProcessBeginStopJob fails?

Thanks,
Zongyong Wu
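For illustration, a minimal sketch of that re-registration idea. It is hypothetical: it assumes a qemuMonitorRegister() counterpart to qemuMonitorUnregister() and that the monitor is reachable through the domain's private data; it is not the actual libvirt code.

static void
processMonitorEOFEvent(virQEMUDriverPtr driver,
                       virDomainObjPtr vm)
{
    qemuDomainObjPrivatePtr priv = vm->privateData;
    ...
    if (qemuProcessBeginStopJob(driver, vm, QEMU_JOB_DESTROY, true) < 0) {
        /* Hypothetical: re-arm the EOF handler so that a later qemu exit
         * is still noticed instead of leaving the domain marked running. */
        qemuMonitorRegister(priv->mon);
        return;
    }
    ...
}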

On 03/05/2018 03:20 AM, Wuzongyong (Euler Dept) wrote:
Hi,
We unregister qemu monitor after sending QEMU_PROCESS_EVENT_MONITOR_EOF to workerPool:
static void
qemuProcessHandleMonitorEOF(qemuMonitorPtr mon,
                            virDomainObjPtr vm,
                            void *opaque)
{
    virQEMUDriverPtr driver = opaque;
    qemuDomainObjPrivatePtr priv;
    struct qemuProcessEvent *processEvent;
    ...
    processEvent->eventType = QEMU_PROCESS_EVENT_MONITOR_EOF;
    processEvent->vm = vm;

    virObjectRef(vm);
    if (virThreadPoolSendJob(driver->workerPool, 0, processEvent) < 0) {
        ignore_value(virObjectUnref(vm));
        VIR_FREE(processEvent);
        goto cleanup;
    }

    /* We don't want this EOF handler to be called over and over while the
     * thread is waiting for a job.
     */
    qemuMonitorUnregister(mon);
    ...
}

Then we handle QEMU_PROCESS_EVENT_MONITOR_EOF in the processMonitorEOFEvent function:

static void
processMonitorEOFEvent(virQEMUDriverPtr driver,
                       virDomainObjPtr vm)
{
    ...
    if (qemuProcessBeginStopJob(driver, vm, QEMU_JOB_DESTROY, true) < 0)
        return;
    ...
}

Here, libvirt will keep reporting the VM state as running if qemuProcessBeginStopJob returns -1, even though qemu may terminate or be killed later.

So maybe we should re-register the monitor when qemuProcessBeginStopJob fails?
The fact that processMonitorEOFEvent() failed to grab the DESTROY job means that we screwed up earlier and now you're just seeing the effects of it. Threads should be able to acquire the DESTROY job at any point, regardless of other jobs set on the domain object.

Can you please:
a) try to turn on debug logs [1] and tell us why acquiring the DESTROY job failed? You should see an error message like this:

error: cannot acquire state change lock ..

b) tell us what your libvirt version is and whether you're able to reproduce this with the latest git HEAD?

Ha! Looking at the code I think I've found something that might be causing this issue. Do you have max_queued set in qemu.conf? Because if you do, then qemuDomainObjBeginJobInternal() might fail to set the job because it's above the set limit. If I'm right, this should be the fix:

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 8b4efc82d..7eb631e06 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -5401,7 +5401,8 @@ qemuDomainObjBeginJobInternal(virQEMUDriverPtr driver,
     then = now + QEMU_JOB_WAIT_TIME;

 retry:
-    if (cfg->maxQueuedJobs &&
+    if ((!async && job == QEMU_JOB_DESTROY) &&
+        cfg->maxQueuedJobs &&
         priv->jobs_queued > cfg->maxQueuedJobs) {
         goto error;
     }

Michal

On 03/05/2018 10:26 AM, Michal Privoznik wrote:
The fact that processMonitorEOFEvent() failed to grab the DESTROY job means that we screwed up earlier and now you're just seeing the effects of it. Threads should be able to acquire the DESTROY job at any point, regardless of other jobs set on the domain object.
Can you please: a) try to turn on debug logs [1] and tell us why acquiring DESTROY job failed? You should see an error message like this:
error: cannot acquire state change lock ..
b) tell us what is your libvirt version and if you're able to reproduce this with the latest git HEAD?
Ha! Looking at the code I think I've found something that might be causing this issue. Do you have max_queued set in qemu.conf? Because if you do, then qemuDomainObjBeginJobInternal() might fail to set job because it's above the set limit. If I'm right, this should be the fix:
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 8b4efc82d..7eb631e06 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -5401,7 +5401,8 @@ qemuDomainObjBeginJobInternal(virQEMUDriverPtr driver,
     then = now + QEMU_JOB_WAIT_TIME;

 retry:
-    if (cfg->maxQueuedJobs &&
+    if ((!async && job == QEMU_JOB_DESTROY) &&
Oh, this should be reversed (note to myself: don't write any patches until morning coffee kicks in):

    if ((!async && job != QEMU_JOB_DESTROY) &&
        cfg->maxQueuedJobs &&

Michal
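Folding that correction back into the earlier hunk, the intended check would read roughly like this (a sketch assembled from the diff and the correction above, not a reviewed patch):

 retry:
-    if (cfg->maxQueuedJobs &&
+    if ((!async && job != QEMU_JOB_DESTROY) &&
+        cfg->maxQueuedJobs &&
         priv->jobs_queued > cfg->maxQueuedJobs) {
         goto error;
     }

That way a synchronous QEMU_JOB_DESTROY request, such as the one issued from processMonitorEOFEvent(), is never rejected merely because max_queued has been exceeded.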

Thanks, Zongyong Wu
-----Original Message-----
From: Michal Privoznik [mailto:mprivozn@redhat.com]
Sent: Monday, March 05, 2018 5:27 PM
To: Wuzongyong (Euler Dept) <cordius.wu@huawei.com>; libvir-list@redhat.com
Cc: Wanzongshun (Vincent) <wanzongshun@huawei.com>; weijinfen <weijinfen@huawei.com>
Subject: Re: [libvirt] [Question] Libvirt doesn't care about qemu monitor events if it fails to destroy the qemu process
The fact that processMonitorEOFEvent() failed to grab the DESTROY job means that we screwed up earlier and now you're just seeing the effects of it. Threads should be able to acquire the DESTROY job at any point, regardless of other jobs set on the domain object.
Can you please: a) try to turn on debug logs [1] and tell us why acquiring DESTROY job failed? You should see an error message like this:
error: cannot acquire state change lock ..
b) tell us what is your libvirt version and if you're able to reproduce this with the latest git HEAD?
When I said "qemuProcessBeginStopJob failed" I meant this: we failed to kill the qemu process within 15 seconds (refer to virProcessKillPainfully). IOW, we send SIGTERM and SIGKILL but the qemu process doesn't exit within 15s, so libvirt keeps treating qemu as running even though qemu does exit after the 15s loop in virProcessKillPainfully.

int
qemuProcessBeginStopJob(virQEMUDriverPtr driver,
                        virDomainObjPtr vm,
                        qemuDomainJob job,
                        bool forceKill)
{
    ...
    if (qemuProcessKill(vm, killFlags) < 0)
        goto cleanup;
    ...
}
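To make the timing issue concrete, here is a simplified sketch of the kill-and-wait behaviour described above. It only illustrates the described flow; the real logic, exact timings and signal sequence live in virProcessKillPainfully() in src/util/virprocess.c.

#include <errno.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>

/* Simplified sketch: SIGTERM first, escalate to SIGKILL, and poll for the
 * process to disappear for roughly 15 seconds before giving up. */
static int
killPainfullySketch(pid_t pid)
{
    size_t i;

    kill(pid, SIGTERM);                    /* polite request first */

    for (i = 0; i < 75; i++) {             /* ~15s total, 200ms per step */
        if (kill(pid, 0) < 0 && errno == ESRCH)
            return 0;                      /* process is gone */
        if (i == 50)
            kill(pid, SIGKILL);            /* escalate if SIGTERM was ignored */
        usleep(200 * 1000);
    }

    return -1;  /* still there after ~15s: the destroy job is treated as failed */
}

If qemu only disappears after this loop has already returned -1, and the monitor has been unregistered, nothing is left to notice the late exit.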

On 03/05/2018 10:39 AM, Wuzongyong (Euler Dept) wrote:
Thanks, Zongyong Wu
[Please don't top post on technical lists]
When I said "qemuProcessBeginStopJob failed" I meant this:
Oh, I thought that the message you sent earlier was related to this:
https://www.redhat.com/archives/libvir-list/2018-March/msg00148.html

So you are not accidentally sending SIGKILL to qemu then?
we failed to kill the qemu process within 15 seconds (refer to virProcessKillPainfully). IOW, we send SIGTERM and SIGKILL but the qemu process doesn't exit within 15s, so libvirt keeps treating qemu as running even though qemu does exit after the 15s loop in virProcessKillPainfully.
What state is the qemu process in then? I mean, how can we see EOF if the process still exists?

Michal

When I said "qemuProcessBeginStopJob failed" I meant this: we failed to kill the qemu process within 15 seconds (refer to virProcessKillPainfully). IOW, we send SIGTERM and SIGKILL but the qemu process doesn't exit within 15s, so libvirt keeps treating qemu as running even though qemu does exit after the 15s loop in virProcessKillPainfully, because we have already unregistered the monitor.

int
qemuProcessBeginStopJob(virQEMUDriverPtr driver,
                        virDomainObjPtr vm,
                        qemuDomainJob job,
                        bool forceKill)
{
    ...
    if (qemuProcessKill(vm, killFlags) < 0)
        goto cleanup;
    ...
}

Oh, I thought that the message you sent earlier was related to this:
https://www.redhat.com/archives/libvir-list/2018-March/msg00148.html
So you are not accidentally sending SIGKILL to qemu then?
Yep, I send SIGKILL to qemu from outside libvirt. By 'accident' I mean that the scenario where libvirt keeps showing the VM as running is hard to reproduce; in the past month I only reproduced it twice.
we failed to kill the qemu process within 15 seconds (refer to virProcessKillPainfully). IOW, we send SIGTERM and SIGKILL but the qemu process doesn't exit within 15s, so libvirt keeps treating qemu as running even though qemu does exit after the 15s loop in virProcessKillPainfully.
What state is the qemu process in then? I mean, how can we see EOF if the process still exists?
I sent SIGKILL to the qemu process, but the qemu process didn't exit immediately; 'ps -ef | grep qemu' showed that the qemu process was in the defunct state. Then, about 20-30s after sending the SIGKILL, the qemu process exited and I could no longer find it with ps. So libvirt still thought the qemu process was alive during the 15s loop in virProcessKillPainfully.

Thanks,
Wu Zongyong

On 03/05/2018 12:43 PM, Cordius Wu wrote:
I sent SIGKILL to the qemu process, but the qemu process didn't exit immediately; 'ps -ef | grep qemu' showed that the qemu process was in the defunct state.
Ah, so you can find the process, but it is in D state. I had read the email linked above as saying that qemu was gone.
Then, about 20-30s after sending the SIGKILL, the qemu process exited and I could no longer find it with ps. So libvirt still thought the qemu process was alive during the 15s loop in virProcessKillPainfully.
Ah, so IIUC, qemu has closed the monitor but right after that it went into the D state instead of quitting. Meanwhile, libvirt sees EOF on the monitor but is unable to kill the process.

Well, registering the EOF handler again would only be a workaround: if you re-register the EOF handler, the event loop will busy-wait (it will see EOF in every iteration), until eventually virProcessKillPainfully() sees the process gone and qemuProcessBeginStopJob() is able to return successfully.

I'm not sure what the right fix might be, though. Maybe at EOF we can check what state the qemu process is in, and if it's in D state, not try to kill it and continue with the BeginJob() call.

Michal
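A rough sketch of that idea. It assumes a hypothetical helper qemuProcessGetState() returning the kernel state letter of the qemu process; neither the helper nor this flow exists in libvirt, it only illustrates the control flow suggested above:

static void
processMonitorEOFEvent(virQEMUDriverPtr driver,
                       virDomainObjPtr vm)
{
    ...
    /* Hypothetical: if qemu is stuck in uninterruptible sleep ('D'),
     * signals won't make it exit any sooner, so skip the kill step and
     * just take the job; the process goes away once the kernel lets it. */
    if (qemuProcessGetState(vm->pid) == 'D') {
        if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_DESTROY) < 0)
            return;
    } else {
        if (qemuProcessBeginStopJob(driver, vm, QEMU_JOB_DESTROY, true) < 0)
            return;
    }
    ...
}

Whether it is safe to bypass qemuProcessBeginStopJob()'s other bookkeeping this way is exactly the open question.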

-----Original Message-----
From: Michal Privoznik [mailto:mprivozn@redhat.com]
Sent: Monday, March 5, 2018 8:09 PM
To: Cordius Wu; 'Wuzongyong (Euler Dept)'; libvir-list@redhat.com
Cc: 'Wanzongshun (Vincent)'; 'weijinfen'
Subject: Re: [libvirt] [Question] Libvirt doesn't care about qemu monitor events if it fails to destroy the qemu process
I sent SIGKILL to the qemu process, but the qemu process didn't exit immediately; 'ps -ef | grep qemu' showed that the qemu process was in the defunct state.
Ah, so you can find the process, but it is in D state. I had read the email linked above as saying that qemu was gone.
Yep
Then, about 20-30s after sending the SIGKILL, the qemu process exited and I could no longer find it with ps. So libvirt still thought the qemu process was alive during the 15s loop in virProcessKillPainfully.
Ah, so IIUC, qemu has closed the monitor but right after that it went into the D state instead of quitting. Meanwhile, libvirt sees EOF on the monitor but is unable to kill the process.
Right
Well, registering the EOF handler again would only be a workaround: if you re-register the EOF handler, the event loop will busy-wait (it will see EOF in every iteration), until eventually virProcessKillPainfully() sees the process gone and qemuProcessBeginStopJob() is able to return successfully.
I'm not sure what the right fix might be, though. Maybe at EOF we can check what state the qemu process is in, and if it's in D state, not try to kill it and continue with the BeginJob() call.
Michal

Hmm, I can't come up with a better solution for this problem either, so I hope somebody can help solve it. BTW, how do we check whether a process is in D state in libvirt?
Thanks, Wu Zongyong

On 03/05/2018 01:21 PM, Cordius Wu wrote:
-----Original Message-----
From: Michal Privoznik [mailto:mprivozn@redhat.com]
Sent: Monday, March 5, 2018 8:09 PM
To: Cordius Wu; 'Wuzongyong (Euler Dept)'; libvir-list@redhat.com
Cc: 'Wanzongshun (Vincent)'; 'weijinfen'
Subject: Re: [libvirt] [Question] Libvirt doesn't care about qemu monitor events if it fails to destroy the qemu process
Michal

Hmm, I can't come up with a better solution for this problem either, so I hope somebody can help solve it. BTW, how do we check whether a process is in D state in libvirt?
By reading /proc/$pid/status. Although this would work only on Linux, not on *BSD. On the other hand, I'm not sure *BSD has a D state.

Michal
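For example, a minimal Linux-only sketch of pulling the state letter out of /proc/$pid/status (just an illustration of the approach; not existing libvirt code):

#include <stdio.h>
#include <sys/types.h>

/* Return the one-letter process state ('R', 'S', 'D', 'Z', ...) parsed from
 * the "State:" line of /proc/<pid>/status, or '?' if it cannot be read. */
static char
procStateFromStatus(pid_t pid)
{
    char path[64];
    char line[256];
    char state = '?';
    FILE *fp;

    snprintf(path, sizeof(path), "/proc/%d/status", (int) pid);
    if (!(fp = fopen(path, "r")))
        return '?';

    while (fgets(line, sizeof(line), fp)) {
        /* The line looks like: "State:\tD (disk sleep)" */
        if (sscanf(line, "State: %c", &state) == 1)
            break;
    }

    fclose(fp);
    return state;
}

Note that 'D' here is uninterruptible sleep, while a defunct (zombie) process shows up as 'Z'.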

Thanks, Zongyong Wu
By reading /proc/$pid/status. Although this would work only on Linux, not on *BSD. On the other hand, I'm not sure *BSD has a D state.
Michal

Hmmm, isn't a process marked as defunct in Z state rather than D state?