[libvirt] [PATCH] qemuDomainObjBeginJob: Don't account DESTROY job to maxQueuedJobs

When trying to destroy a domain (e.g. because we've seen EOF on the monitor) we try to acquire QEMU_JOB_DESTROY. However, if max_queued is set in qemu.conf this may fail and since our code doesn't count on that we will still report domain as active even though the qemu process is long gone. More specifically, if we've seen EOF on the monitor, qemuProcessHandleMonitorEOF() is called which sends MONITOR_EOF job to the event worker pool and unregisters monitor from the event loop. The worker pool calls processMonitorEOFEvent() which tries to set job which may fail due to the limit as described above. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 8b4efc82d..ee02ecd0c 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -5401,7 +5401,8 @@ qemuDomainObjBeginJobInternal(virQEMUDriverPtr driver, then = now + QEMU_JOB_WAIT_TIME; retry: - if (cfg->maxQueuedJobs && + if ((!async && job != QEMU_JOB_DESTROY) && + cfg->maxQueuedJobs && priv->jobs_queued > cfg->maxQueuedJobs) { goto error; } -- 2.16.1

On 03/05/2018 04:36 AM, Michal Privoznik wrote:
When trying to destroy a domain (e.g. because we've seen EOF on the monitor) we try to acquire QEMU_JOB_DESTROY. However, if max_queued is set in qemu.conf this may fail and since our code doesn't count on that we will still report domain as active even though the qemu process is long gone. More specifically, if we've seen EOF on the monitor, qemuProcessHandleMonitorEOF() is called which sends MONITOR_EOF job to the event worker pool and unregisters monitor from the event loop. The worker pool calls processMonitorEOFEvent() which tries to set job which may fail due to the limit as described above.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
Should the qemu.conf explanation be adjusted to specifically note that a DESTROY job does not count against the max queue jobs count? Reviewed-by: John Ferlan <jferlan@redhat.com> John

On 03/06/2018 05:20 PM, John Ferlan wrote:
On 03/05/2018 04:36 AM, Michal Privoznik wrote:
When trying to destroy a domain (e.g. because we've seen EOF on the monitor) we try to acquire QEMU_JOB_DESTROY. However, if max_queued is set in qemu.conf this may fail and since our code doesn't count on that we will still report domain as active even though the qemu process is long gone. More specifically, if we've seen EOF on the monitor, qemuProcessHandleMonitorEOF() is called which sends MONITOR_EOF job to the event worker pool and unregisters monitor from the event loop. The worker pool calls processMonitorEOFEvent() which tries to set job which may fail due to the limit as described above.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
Should the qemu.conf explanation be adjusted to specifically note that a DESTROY job does not count against the max queue jobs count?
I don't think so. The set of jobs we have in qemu driver is internal implementation detail. It may change during releases and users have no control over it.
Reviewed-by: John Ferlan <jferlan@redhat.com>
Pushed, thanks. Michal
participants (2)
-
John Ferlan
-
Michal Privoznik