[libvirt] [PATCH 0/9] Fix race conditions with qemuProcessStop

Jiri Denemark (9):
  qemu: Export qemuDomainObjBeginNestedJob
  qemu: End nested jobs properly
  qemu: Pass async job to qemuProcessInit
  qemu: Introduce qemuProcessBeginStopJob
  qemu: Start an async job for processGuestPanicEvent
  qemu: Process monitor EOF in a job
  qemu: Simplify error handling in qemuProcessReconnect
  qemu: Avoid calling qemuProcessStop without a job
  Check for active domain in virDomainObjWait

 src/conf/domain_conf.c    |   7 ++
 src/qemu/qemu_domain.c    |   9 +--
 src/qemu/qemu_domain.h    |   5 ++
 src/qemu/qemu_driver.c    | 179 +++++++++++++++++++++++--------------------
 src/qemu/qemu_migration.c |   9 ++-
 src/qemu/qemu_monitor.c   |  14 +++-
 src/qemu/qemu_monitor.h   |   2 +
 src/qemu/qemu_process.c   | 191 +++++++++++++++++++++++++++++-----------------
 src/qemu/qemu_process.h   |   6 ++
 9 files changed, 254 insertions(+), 168 deletions(-)

-- 
2.7.1

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
---
 src/qemu/qemu_domain.c | 2 +-
 src/qemu/qemu_domain.h | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 495c76b..da22d1c 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -1760,7 +1760,7 @@ int qemuDomainObjBeginAsyncJob(virQEMUDriverPtr driver,
     return 0;
 }
 
-static int ATTRIBUTE_RETURN_CHECK
+int
 qemuDomainObjBeginNestedJob(virQEMUDriverPtr driver,
                             virDomainObjPtr obj,
                             qemuDomainAsyncJob asyncJob)
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index 285af8c..5812174 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -267,6 +267,10 @@ int qemuDomainObjBeginAsyncJob(virQEMUDriverPtr driver,
                                virDomainObjPtr obj,
                                qemuDomainAsyncJob asyncJob)
     ATTRIBUTE_RETURN_CHECK;
+int qemuDomainObjBeginNestedJob(virQEMUDriverPtr driver,
+                                virDomainObjPtr obj,
+                                qemuDomainAsyncJob asyncJob)
+    ATTRIBUTE_RETURN_CHECK;
 void qemuDomainObjEndJob(virQEMUDriverPtr driver,
                          virDomainObjPtr obj);
-- 
2.7.1

On Tue, Feb 16, 2016 at 15:36:53 +0100, Jiri Denemark wrote:
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
---
 src/qemu/qemu_domain.c | 2 +-
 src/qemu/qemu_domain.h | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)
ACK

Ending a nested job is no different from ending any other (non-async)
job; after all, the code in qemuDomainBeginJobInternal does not handle
them differently either. Thus we should call qemuDomainObjEndJob to
stop nested jobs.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
---
 src/qemu/qemu_domain.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index da22d1c..85d2ba6 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -1899,11 +1899,8 @@ qemuDomainObjExitMonitorInternal(virQEMUDriverPtr driver,
     if (!hasRefs)
         priv->mon = NULL;
 
-    if (priv->job.active == QEMU_JOB_ASYNC_NESTED) {
-        qemuDomainObjResetJob(priv);
-        qemuDomainObjSaveJob(driver, obj);
-        virCondSignal(&priv->job.cond);
-    }
+    if (priv->job.active == QEMU_JOB_ASYNC_NESTED)
+        qemuDomainObjEndJob(driver, obj);
 }
 
 void qemuDomainObjEnterMonitor(virQEMUDriverPtr driver,
-- 
2.7.1

On Tue, Feb 16, 2016 at 15:36:54 +0100, Jiri Denemark wrote:
Ending a nested job is no different from ending any other (non-async) job; after all, the code in qemuDomainBeginJobInternal does not handle them differently either. Thus we should call qemuDomainObjEndJob to stop nested jobs.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
---
 src/qemu/qemu_domain.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)
ACK

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
---
 src/qemu/qemu_migration.c | 3 ++-
 src/qemu/qemu_process.c   | 3 ++-
 src/qemu/qemu_process.h   | 1 +
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index f2c7b61..72d7144 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -3462,7 +3462,8 @@ qemuMigrationPrepareAny(virQEMUDriverPtr driver,
         goto stopjob;
     }
 
-    if (qemuProcessInit(driver, vm, true, false) < 0)
+    if (qemuProcessInit(driver, vm, QEMU_ASYNC_JOB_MIGRATION_IN,
+                        true, false) < 0)
         goto stopjob;
     stopProcess = true;
 
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 23238df..6a44073 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -4323,6 +4323,7 @@ qemuProcessStartValidate(virDomainDefPtr def,
 int
 qemuProcessInit(virQEMUDriverPtr driver,
                 virDomainObjPtr vm,
+                qemuDomainAsyncJob asyncJob ATTRIBUTE_UNUSED,
                 bool migration,
                 bool snap)
 {
@@ -5263,7 +5264,7 @@ qemuProcessStart(virConnectPtr conn,
                   VIR_QEMU_PROCESS_START_PAUSED |
                   VIR_QEMU_PROCESS_START_AUTODESTROY, cleanup);
 
-    if (qemuProcessInit(driver, vm, !!migrateFrom, !!snapshot) < 0)
+    if (qemuProcessInit(driver, vm, asyncJob, !!migrateFrom, !!snapshot) < 0)
         goto cleanup;
 
     if (migrateFrom) {
diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h
index a1c33f4..b16919d 100644
--- a/src/qemu/qemu_process.h
+++ b/src/qemu/qemu_process.h
@@ -89,6 +89,7 @@ int qemuProcessStartValidate(virDomainDefPtr def,
 int qemuProcessInit(virQEMUDriverPtr driver,
                     virDomainObjPtr vm,
+                    qemuDomainAsyncJob asyncJob,
                     bool migration,
                     bool snap);
-- 
2.7.1

On Tue, Feb 16, 2016 at 15:36:55 +0100, Jiri Denemark wrote:
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
---
 src/qemu/qemu_migration.c | 3 ++-
 src/qemu/qemu_process.c   | 3 ++-
 src/qemu/qemu_process.h   | 1 +
 3 files changed, 5 insertions(+), 2 deletions(-)
ACK

When destroying a domain we need to make sure we will be able to start
a job no matter what other operations are running or even stuck in a
job. This is done by killing the domain before starting the destroy
job.

Let's introduce qemuProcessBeginStopJob which combines killing a domain
and starting a job in a single API which can be called everywhere we
need a job to stop a domain.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
---
 src/qemu/qemu_driver.c  | 43 ++++++++++---------------------------------
 src/qemu/qemu_process.c | 39 +++++++++++++++++++++++++++++++++++++++
 src/qemu/qemu_process.h |  4 ++++
 3 files changed, 53 insertions(+), 33 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 2bbc724..1770f81 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -2272,44 +2272,21 @@ qemuDomainDestroyFlags(virDomainPtr dom,
     if (virDomainDestroyFlagsEnsureACL(dom->conn, vm->def) < 0)
         goto cleanup;
 
+    if (qemuProcessBeginStopJob(driver, vm, QEMU_JOB_DESTROY,
+                                !(flags & VIR_DOMAIN_DESTROY_GRACEFUL)) < 0)
+        goto cleanup;
+
+    if (!virDomainObjIsActive(vm)) {
+        virReportError(VIR_ERR_OPERATION_INVALID,
+                       "%s", _("domain is not running"));
+        goto endjob;
+    }
+
     qemuDomainSetFakeReboot(driver, vm, false);
 
     if (priv->job.asyncJob == QEMU_ASYNC_JOB_MIGRATION_IN)
         stopFlags |= VIR_QEMU_PROCESS_STOP_MIGRATED;
 
-    /* We need to prevent monitor EOF callback from doing our work (and sending
-     * misleading events) while the vm is unlocked inside BeginJob/ProcessKill API
-     */
-    priv->beingDestroyed = true;
-
-    /* Although qemuProcessStop does this already, there may
-     * be an outstanding job active. We want to make sure we
-     * can kill the process even if a job is active. Killing
-     * it now means the job will be released
-     */
-    if (flags & VIR_DOMAIN_DESTROY_GRACEFUL) {
-        if (qemuProcessKill(vm, 0) < 0) {
-            priv->beingDestroyed = false;
-            goto cleanup;
-        }
-    } else {
-        if (qemuProcessKill(vm, VIR_QEMU_PROCESS_KILL_FORCE) < 0) {
-            priv->beingDestroyed = false;
-            goto cleanup;
-        }
-    }
-
-    if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_DESTROY) < 0)
-        goto cleanup;
-
-    priv->beingDestroyed = false;
-
-    if (!virDomainObjIsActive(vm)) {
-        virReportError(VIR_ERR_OPERATION_INVALID,
-                       "%s", _("domain is not running"));
-        goto endjob;
-    }
-
     qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_DESTROYED, stopFlags);
 
     event = virDomainEventLifecycleNewFromObj(vm,
                                               VIR_DOMAIN_EVENT_STOPPED,
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 6a44073..2f9df7c 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -5348,6 +5348,45 @@ qemuProcessKill(virDomainObjPtr vm, unsigned int flags)
 }
 
 
+/**
+ * qemuProcessBeginStopJob:
+ *
+ * Stop all current jobs by killing the domain and start a new one for
+ * qemuProcessStop.
+ */
+int
+qemuProcessBeginStopJob(virQEMUDriverPtr driver,
+                        virDomainObjPtr vm,
+                        qemuDomainJob job,
+                        bool forceKill)
+{
+    qemuDomainObjPrivatePtr priv = vm->privateData;
+    unsigned int killFlags = forceKill ? VIR_QEMU_PROCESS_KILL_FORCE : 0;
+    int ret = -1;
+
+    /* We need to prevent monitor EOF callback from doing our work (and
+     * sending misleading events) while the vm is unlocked inside
+     * BeginJob/ProcessKill API
+     */
+    priv->beingDestroyed = true;
+
+    if (qemuProcessKill(vm, killFlags) < 0)
+        goto cleanup;
+
+    /* Wake up anything waiting on domain condition */
+    virDomainObjBroadcast(vm);
+
+    if (qemuDomainObjBeginJob(driver, vm, job) < 0)
+        goto cleanup;
+
+    ret = 0;
+
+ cleanup:
+    priv->beingDestroyed = false;
+    return ret;
+}
+
+
 void qemuProcessStop(virQEMUDriverPtr driver,
                      virDomainObjPtr vm,
                      virDomainShutoffReason reason,
diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h
index b16919d..69be99e 100644
--- a/src/qemu/qemu_process.h
+++ b/src/qemu/qemu_process.h
@@ -114,6 +114,10 @@ typedef enum {
     VIR_QEMU_PROCESS_STOP_NO_RELABEL = 1 << 1,
 } qemuProcessStopFlags;
 
+int qemuProcessBeginStopJob(virQEMUDriverPtr driver,
+                            virDomainObjPtr vm,
+                            qemuDomainJob job,
+                            bool forceKill);
 void qemuProcessStop(virQEMUDriverPtr driver,
                      virDomainObjPtr vm,
                      virDomainShutoffReason reason,
-- 
2.7.1

On Tue, Feb 16, 2016 at 15:36:56 +0100, Jiri Denemark wrote:
When destroying a domain we need to make sure we will be able to start a job no matter what other operations are running or even stuck in a job. This is done by killing the domain before starting the destroy job.
Let's introduce qemuProcessBeginStopJob which combines killing a domain and starting a job in a single API which can be called everywhere we need a job to stop a domain.
This patch does one additional thing ...
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
---
 src/qemu/qemu_driver.c  | 43 ++++++++++---------------------------------
 src/qemu/qemu_process.c | 39 +++++++++++++++++++++++++++++++++++++++
 src/qemu/qemu_process.h |  4 ++++
 3 files changed, 53 insertions(+), 33 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 2bbc724..1770f81 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -2272,44 +2272,21 @@ qemuDomainDestroyFlags(virDomainPtr dom,
     if (virDomainDestroyFlagsEnsureACL(dom->conn, vm->def) < 0)
         goto cleanup;
+    if (qemuProcessBeginStopJob(driver, vm, QEMU_JOB_DESTROY,
+                                !(flags & VIR_DOMAIN_DESTROY_GRACEFUL)) < 0)
+        goto cleanup;
+
+    if (!virDomainObjIsActive(vm)) {
+        virReportError(VIR_ERR_OPERATION_INVALID,
+                       "%s", _("domain is not running"));
+        goto endjob;
+    }
Moving this here is a non-refactoring change at this point. I'd suggest you split this into two patches and explain why you want to move it.
+
     qemuDomainSetFakeReboot(driver, vm, false);
     if (priv->job.asyncJob == QEMU_ASYNC_JOB_MIGRATION_IN)
         stopFlags |= VIR_QEMU_PROCESS_STOP_MIGRATED;
ACK with the split

Only a small portion of processGuestPanicEvent was enclosed within a
job; let's make sure we use the job for all operations to avoid race
conditions.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
---
 src/qemu/qemu_driver.c | 59 +++++++++++++++-----------------------------------
 1 file changed, 18 insertions(+), 41 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 1770f81..0bb0759 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -4012,17 +4012,6 @@ doCoreDumpToAutoDumpPath(virQEMUDriverPtr driver,
                     timestr) < 0)
         goto cleanup;
 
-    if (qemuDomainObjBeginAsyncJob(driver, vm,
-                                   QEMU_ASYNC_JOB_DUMP) < 0) {
-        goto cleanup;
-    }
-
-    if (!virDomainObjIsActive(vm)) {
-        virReportError(VIR_ERR_OPERATION_INVALID,
-                       "%s", _("domain is not running"));
-        goto endjob;
-    }
-
     flags |= cfg->autoDumpBypassCache ? VIR_DUMP_BYPASS_CACHE: 0;
     ret = doCoreDump(driver, vm, dumpfile, getCompressionType(driver), flags,
@@ -4030,10 +4019,6 @@ doCoreDumpToAutoDumpPath(virQEMUDriverPtr driver,
     if (ret < 0)
         virReportError(VIR_ERR_OPERATION_FAILED,
                        "%s", _("Dump failed"));
-
- endjob:
-    qemuDomainObjEndAsyncJob(driver, vm);
-
 cleanup:
     VIR_FREE(dumpfile);
     virObjectUnref(cfg);
@@ -4048,11 +4033,15 @@ processGuestPanicEvent(virQEMUDriverPtr driver,
     qemuDomainObjPrivatePtr priv = vm->privateData;
     virObjectEventPtr event = NULL;
     virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver);
+    bool removeInactive = false;
+
+    if (qemuDomainObjBeginAsyncJob(driver, vm, QEMU_ASYNC_JOB_DUMP) < 0)
+        goto cleanup;
 
     if (!virDomainObjIsActive(vm)) {
         VIR_DEBUG("Ignoring GUEST_PANICKED event from inactive domain %s",
                   vm->def->name);
-        goto cleanup;
+        goto endjob;
     }
 
     virDomainObjSetState(vm,
@@ -4065,6 +4054,11 @@ processGuestPanicEvent(virQEMUDriverPtr driver,
     qemuDomainEventQueue(driver, event);
 
+    if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm, driver->caps) < 0) {
+        VIR_WARN("Unable to save status on vm %s after state change",
+                 vm->def->name);
+    }
+
     if (virDomainLockProcessPause(driver->lockManager, vm, &priv->lockState) < 0)
         VIR_WARN("Unable to release lease on %s", vm->def->name);
     VIR_DEBUG("Preserving lock state '%s'", NULLSTR(priv->lockState));
@@ -4072,40 +4066,23 @@ processGuestPanicEvent(virQEMUDriverPtr driver,
     switch (action) {
     case VIR_DOMAIN_LIFECYCLE_CRASH_COREDUMP_DESTROY:
         if (doCoreDumpToAutoDumpPath(driver, vm, VIR_DUMP_MEMORY_ONLY) < 0)
-            goto cleanup;
+            goto endjob;
         /* fall through */
 
     case VIR_DOMAIN_LIFECYCLE_CRASH_DESTROY:
-        priv->beingDestroyed = true;
-
-        if (qemuProcessKill(vm, VIR_QEMU_PROCESS_KILL_FORCE) < 0) {
-            priv->beingDestroyed = false;
-            goto cleanup;
-        }
-
-        priv->beingDestroyed = false;
-
-        if (!virDomainObjIsActive(vm)) {
-            virReportError(VIR_ERR_OPERATION_INVALID,
-                           "%s", _("domain is not running"));
-            goto cleanup;
-        }
-
         qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_CRASHED, 0);
         event = virDomainEventLifecycleNewFromObj(vm,
                                                   VIR_DOMAIN_EVENT_STOPPED,
                                                   VIR_DOMAIN_EVENT_STOPPED_CRASHED);
         qemuDomainEventQueue(driver, event);
-        virDomainAuditStop(vm, "destroyed");
-
-        qemuDomainRemoveInactive(driver, vm);
+        removeInactive = true;
         break;
 
     case VIR_DOMAIN_LIFECYCLE_CRASH_COREDUMP_RESTART:
         if (doCoreDumpToAutoDumpPath(driver, vm, VIR_DUMP_MEMORY_ONLY) < 0)
-            goto cleanup;
+            goto endjob;
         /* fall through */
 
     case VIR_DOMAIN_LIFECYCLE_CRASH_RESTART:
@@ -4120,12 +4097,12 @@ processGuestPanicEvent(virQEMUDriverPtr driver,
         break;
     }
 
- cleanup:
-    if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm, driver->caps) < 0) {
-        VIR_WARN("Unable to save status on vm %s after state change",
-                 vm->def->name);
-    }
+ endjob:
+    qemuDomainObjEndAsyncJob(driver, vm);
+    if (removeInactive)
+        qemuDomainRemoveInactive(driver, vm);
 
+ cleanup:
     virObjectUnref(cfg);
 }
-- 
2.7.1

On Tue, Feb 16, 2016 at 15:36:57 +0100, Jiri Denemark wrote:
Only a small portion of processGuestPanicEvent was enclosed within a job; let's make sure we use the job for all operations to avoid race conditions.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
---
 src/qemu/qemu_driver.c | 59 +++++++++++++++-----------------------------------
 1 file changed, 18 insertions(+), 41 deletions(-)
ACK

Stopping a domain without a job risks a race condition with another
thread which started a job and which does not expect anyone else to be
messing around with the same domain object.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
---
 src/qemu/qemu_domain.h  |  1 +
 src/qemu/qemu_driver.c  | 51 ++++++++++++++++++++++++++++++++++++++++++++++++-
 src/qemu/qemu_monitor.c | 14 ++++++++++----
 src/qemu/qemu_monitor.h |  2 ++
 src/qemu/qemu_process.c | 45 +++++++++++++++----------------------------
 5 files changed, 78 insertions(+), 35 deletions(-)

diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index 5812174..8655ba8 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -234,6 +234,7 @@ typedef enum {
     QEMU_PROCESS_EVENT_NIC_RX_FILTER_CHANGED,
     QEMU_PROCESS_EVENT_SERIAL_CHANGED,
     QEMU_PROCESS_EVENT_BLOCK_JOB,
+    QEMU_PROCESS_EVENT_MONITOR_EOF,
 
     QEMU_PROCESS_EVENT_LAST
 } qemuProcessEventType;
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 0bb0759..dc448fb 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -4563,13 +4563,59 @@ processBlockJobEvent(virQEMUDriverPtr driver,
 }
 
 
+static void
+processMonitorEOFEvent(virQEMUDriverPtr driver,
+                       virDomainObjPtr vm)
+{
+    qemuDomainObjPrivatePtr priv = vm->privateData;
+    int eventReason = VIR_DOMAIN_EVENT_STOPPED_SHUTDOWN;
+    int stopReason = VIR_DOMAIN_SHUTOFF_SHUTDOWN;
+    const char *auditReason = "shutdown";
+    unsigned int stopFlags = 0;
+    virObjectEventPtr event = NULL;
+
+    if (qemuProcessBeginStopJob(driver, vm, QEMU_JOB_DESTROY, true) < 0)
+        return;
+
+    if (!virDomainObjIsActive(vm)) {
+        VIR_DEBUG("Domain %p '%s' is not active, ignoring EOF",
+                  vm, vm->def->name);
+        goto endjob;
+    }
+
+    if (priv->monJSON && !priv->gotShutdown) {
+        VIR_DEBUG("Monitor connection to '%s' closed without SHUTDOWN event; "
+                  "assuming the domain crashed", vm->def->name);
+        eventReason = VIR_DOMAIN_EVENT_STOPPED_FAILED;
+        stopReason = VIR_DOMAIN_SHUTOFF_CRASHED;
+        auditReason = "failed";
+    }
+
+    if (priv->job.asyncJob == QEMU_ASYNC_JOB_MIGRATION_IN) {
+        stopFlags |= VIR_QEMU_PROCESS_STOP_MIGRATED;
+        qemuMigrationErrorSave(driver, vm->def->name,
+                               qemuMonitorLastError(priv->mon));
+    }
+
+    event = virDomainEventLifecycleNewFromObj(vm, VIR_DOMAIN_EVENT_STOPPED,
+                                              eventReason);
+    qemuProcessStop(driver, vm, stopReason, stopFlags);
+    virDomainAuditStop(vm, auditReason);
+    qemuDomainEventQueue(driver, event);
+
+ endjob:
+    qemuDomainObjEndJob(driver, vm);
+    qemuDomainRemoveInactive(driver, vm);
+}
+
+
 static void qemuProcessEventHandler(void *data, void *opaque)
 {
     struct qemuProcessEvent *processEvent = data;
     virDomainObjPtr vm = processEvent->vm;
     virQEMUDriverPtr driver = opaque;
 
-    VIR_DEBUG("vm=%p", vm);
+    VIR_DEBUG("vm=%p, event=%d", vm, processEvent->eventType);
 
     virObjectLock(vm);
 
@@ -4596,6 +4642,9 @@ static void qemuProcessEventHandler(void *data, void *opaque)
                              processEvent->action,
                              processEvent->status);
         break;
+    case QEMU_PROCESS_EVENT_MONITOR_EOF:
+        processMonitorEOFEvent(driver, vm);
+        break;
     case QEMU_PROCESS_EVENT_LAST:
         break;
     }
diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
index 6b23e88..c3583ff 100644
--- a/src/qemu/qemu_monitor.c
+++ b/src/qemu/qemu_monitor.c
@@ -915,6 +915,15 @@ qemuMonitorOpenFD(virDomainObjPtr vm,
 
 
 void
+qemuMonitorUnregister(qemuMonitorPtr mon)
+{
+    if (mon->watch) {
+        virEventRemoveHandle(mon->watch);
+        mon->watch = 0;
+    }
+}
+
+void
 qemuMonitorClose(qemuMonitorPtr mon)
 {
     if (!mon)
@@ -927,10 +936,7 @@ qemuMonitorClose(qemuMonitorPtr mon)
     qemuMonitorSetDomainLog(mon, NULL, NULL, NULL);
 
     if (mon->fd >= 0) {
-        if (mon->watch) {
-            virEventRemoveHandle(mon->watch);
-            mon->watch = 0;
-        }
+        qemuMonitorUnregister(mon);
         VIR_FORCE_CLOSE(mon->fd);
     }
 
diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h
index 9d7d5f3..21cb256 100644
--- a/src/qemu/qemu_monitor.h
+++ b/src/qemu/qemu_monitor.h
@@ -244,6 +244,8 @@ qemuMonitorPtr qemuMonitorOpenFD(virDomainObjPtr vm,
                                  void *opaque)
     ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(4);
 
+void qemuMonitorUnregister(qemuMonitorPtr mon)
+    ATTRIBUTE_NONNULL(1);
 void qemuMonitorClose(qemuMonitorPtr mon);
 
 virErrorPtr qemuMonitorLastError(qemuMonitorPtr mon);
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 2f9df7c..e3d9f3d 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -284,54 +284,39 @@ qemuProcessHandleMonitorEOF(qemuMonitorPtr mon ATTRIBUTE_UNUSED,
                             void *opaque)
 {
     virQEMUDriverPtr driver = opaque;
-    virObjectEventPtr event = NULL;
     qemuDomainObjPrivatePtr priv;
-    int eventReason = VIR_DOMAIN_EVENT_STOPPED_SHUTDOWN;
-    int stopReason = VIR_DOMAIN_SHUTOFF_SHUTDOWN;
-    const char *auditReason = "shutdown";
-    unsigned int stopFlags = 0;
+    struct qemuProcessEvent *processEvent;
+
+    virObjectLock(vm);
 
     VIR_DEBUG("Received EOF on %p '%s'", vm, vm->def->name);
 
-    virObjectLock(vm);
-
     priv = vm->privateData;
-
     if (priv->beingDestroyed) {
         VIR_DEBUG("Domain is being destroyed, EOF is expected");
         goto cleanup;
     }
 
-    if (!virDomainObjIsActive(vm)) {
-        VIR_DEBUG("Domain %p is not active, ignoring EOF", vm);
+    if (VIR_ALLOC(processEvent) < 0)
         goto cleanup;
-    }
 
-    if (priv->monJSON && !priv->gotShutdown) {
-        VIR_DEBUG("Monitor connection to '%s' closed without SHUTDOWN event; "
-                  "assuming the domain crashed", vm->def->name);
-        eventReason = VIR_DOMAIN_EVENT_STOPPED_FAILED;
-        stopReason = VIR_DOMAIN_SHUTOFF_CRASHED;
-        auditReason = "failed";
-    }
+    processEvent->eventType = QEMU_PROCESS_EVENT_MONITOR_EOF;
+    processEvent->vm = vm;
 
-    if (priv->job.asyncJob == QEMU_ASYNC_JOB_MIGRATION_IN) {
-        stopFlags |= VIR_QEMU_PROCESS_STOP_MIGRATED;
-        qemuMigrationErrorSave(driver, vm->def->name,
-                               qemuMonitorLastError(priv->mon));
+    virObjectRef(vm);
+    if (virThreadPoolSendJob(driver->workerPool, 0, processEvent) < 0) {
+        ignore_value(virObjectUnref(vm));
+        VIR_FREE(processEvent);
+        goto cleanup;
     }
 
-    event = virDomainEventLifecycleNewFromObj(vm,
-                                              VIR_DOMAIN_EVENT_STOPPED,
-                                              eventReason);
-    qemuProcessStop(driver, vm, stopReason, stopFlags);
-    virDomainAuditStop(vm, auditReason);
-
-    qemuDomainRemoveInactive(driver, vm);
+    /* We don't want this EOF handler to be called over and over while the
+     * thread is waiting for a job.
+     */
+    qemuMonitorUnregister(mon);
 
  cleanup:
     virObjectUnlock(vm);
-    qemuDomainEventQueue(driver, event);
 }
-- 
2.7.1

On Tue, Feb 16, 2016 at 15:36:58 +0100, Jiri Denemark wrote:
Stopping a domain without a job risks a race condition with another thread which started a job and which does not expect anyone else to be messing around with the same domain object.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
---
 src/qemu/qemu_domain.h  |  1 +
 src/qemu/qemu_driver.c  | 51 ++++++++++++++++++++++++++++++++++++++++++++++++-
 src/qemu/qemu_monitor.c | 14 ++++++++++----
 src/qemu/qemu_monitor.h |  2 ++
 src/qemu/qemu_process.c | 45 +++++++++++++++----------------------------
 5 files changed, 78 insertions(+), 35 deletions(-)
ACK

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
---
 src/qemu/qemu_process.c | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index e3d9f3d..19ecf59 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -3339,6 +3339,7 @@ qemuProcessReconnect(void *opaque)
     size_t i;
     int ret;
     unsigned int stopFlags = 0;
+    bool jobStarted = false;
 
     VIR_FREE(data);
 
@@ -3349,13 +3350,14 @@ qemuProcessReconnect(void *opaque)
     cfg = virQEMUDriverGetConfig(driver);
     priv = obj->privateData;
 
+    if (qemuDomainObjBeginJob(driver, obj, QEMU_JOB_MODIFY) < 0)
+        goto error;
+    jobStarted = true;
+
     /* XXX If we ever gonna change pid file pattern, come up with
      * some intelligence here to deal with old paths. */
     if (!(priv->pidfile = virPidFileBuildPath(cfg->stateDir, obj->def->name)))
-        goto killvm;
-
-    if (qemuDomainObjBeginJob(driver, obj, QEMU_JOB_MODIFY) < 0)
-        goto killvm;
+        goto error;
 
     virNWFilterReadLockFilterUpdates();
 
@@ -3425,7 +3427,6 @@ qemuProcessReconnect(void *opaque)
         VIR_DEBUG("Finishing shutdown sequence for domain %s",
                   obj->def->name);
         qemuProcessShutdownOrReboot(driver, obj);
-        qemuDomainObjEndJob(driver, obj);
         goto cleanup;
     }
 
@@ -3498,12 +3499,18 @@ qemuProcessReconnect(void *opaque)
     if (virAtomicIntInc(&driver->nactive) == 1 && driver->inhibitCallback)
         driver->inhibitCallback(true, driver->inhibitOpaque);
 
-    qemuDomainObjEndJob(driver, obj);
-    goto cleanup;
+ cleanup:
+    if (jobStarted)
+        qemuDomainObjEndJob(driver, obj);
+    if (!virDomainObjIsActive(obj))
+        qemuDomainRemoveInactive(driver, obj);
+    virDomainObjEndAPI(&obj);
+    virObjectUnref(conn);
+    virObjectUnref(cfg);
+    virNWFilterUnlockFilterUpdates();
+    return;
 
  error:
-    qemuDomainObjEndJob(driver, obj);
- killvm:
     if (virDomainObjIsActive(obj)) {
         /* We can't get the monitor back, so must kill the VM
          * to remove danger of it ending up running twice if
@@ -3521,14 +3528,7 @@ qemuProcessReconnect(void *opaque)
         }
         qemuProcessStop(driver, obj, state, stopFlags);
     }
-
-    qemuDomainRemoveInactive(driver, obj);
-
- cleanup:
-    virDomainObjEndAPI(&obj);
-    virObjectUnref(conn);
-    virObjectUnref(cfg);
-    virNWFilterUnlockFilterUpdates();
+    goto cleanup;
 }
 
 static int
-- 
2.7.1

On Tue, Feb 16, 2016 at 15:36:59 +0100, Jiri Denemark wrote:
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
---
 src/qemu/qemu_process.c | 34 +++++++++++++++++-----------------
 1 file changed, 17 insertions(+), 17 deletions(-)
ACK

Calling qemuProcessStop without a job opens a way to race conditions
with qemuDomainObjExitMonitor called in another thread. A real world
example of such a race condition:

- migration thread (A) calls qemuMigrationWaitForSpice
- another thread (B) starts processing qemuDomainAbortJob API
- thread B signals thread A via qemuDomainObjAbortAsyncJob
- thread B enters monitor (qemuDomainObjEnterMonitor)
- thread B calls qemuMonitorSend
- thread A awakens and calls qemuProcessStop
- thread A calls qemuMonitorClose and sets priv->mon to NULL
- thread B calls qemuDomainObjExitMonitor with priv->mon == NULL
  => monitor stays ref'ed and locked

Depending on how lucky we are, the race may result in a memory leak or
it can even deadlock libvirtd's event loop if it tries to lock the
monitor to process an event received before qemuMonitorClose was
called.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
---
 src/qemu/qemu_driver.c    | 28 +++++++++++-------
 src/qemu/qemu_migration.c |  6 +++-
 src/qemu/qemu_process.c   | 72 +++++++++++++++++++++++++++++++----------------
 src/qemu/qemu_process.h   |  1 +
 4 files changed, 71 insertions(+), 36 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index dc448fb..3bccb55 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -2287,7 +2287,8 @@ qemuDomainDestroyFlags(virDomainPtr dom,
     if (priv->job.asyncJob == QEMU_ASYNC_JOB_MIGRATION_IN)
         stopFlags |= VIR_QEMU_PROCESS_STOP_MIGRATED;
 
-    qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_DESTROYED, stopFlags);
+    qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_DESTROYED,
+                    QEMU_ASYNC_JOB_NONE, stopFlags);
 
     event = virDomainEventLifecycleNewFromObj(vm,
                                               VIR_DOMAIN_EVENT_STOPPED,
@@ -3297,7 +3298,8 @@ qemuDomainSaveInternal(virQEMUDriverPtr driver, virDomainPtr dom,
         goto endjob;
 
     /* Shut it down */
-    qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_SAVED, 0);
+    qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_SAVED,
+                    QEMU_ASYNC_JOB_SAVE, 0);
     virDomainAuditStop(vm, "saved");
     event = virDomainEventLifecycleNewFromObj(vm, VIR_DOMAIN_EVENT_STOPPED,
                                               VIR_DOMAIN_EVENT_STOPPED_SAVED);
@@ -3790,7 +3792,8 @@ qemuDomainCoreDumpWithFormat(virDomainPtr dom,
 endjob:
     if ((ret == 0) && (flags & VIR_DUMP_CRASH)) {
-        qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_CRASHED, 0);
+        qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_CRASHED,
+                        QEMU_ASYNC_JOB_DUMP, 0);
         virDomainAuditStop(vm, "crashed");
         event = virDomainEventLifecycleNewFromObj(vm,
                                                   VIR_DOMAIN_EVENT_STOPPED,
@@ -4070,7 +4073,8 @@ processGuestPanicEvent(virQEMUDriverPtr driver,
         /* fall through */
 
     case VIR_DOMAIN_LIFECYCLE_CRASH_DESTROY:
-        qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_CRASHED, 0);
+        qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_CRASHED,
+                        QEMU_ASYNC_JOB_DUMP, 0);
         event = virDomainEventLifecycleNewFromObj(vm,
                                                   VIR_DOMAIN_EVENT_STOPPED,
                                                   VIR_DOMAIN_EVENT_STOPPED_CRASHED);
@@ -4599,7 +4603,7 @@ processMonitorEOFEvent(virQEMUDriverPtr driver,
     event = virDomainEventLifecycleNewFromObj(vm, VIR_DOMAIN_EVENT_STOPPED,
                                               eventReason);
-    qemuProcessStop(driver, vm, stopReason, QEMU_ASYNC_JOB_NONE, stopFlags);
+    qemuProcessStop(driver, vm, stopReason, QEMU_ASYNC_JOB_NONE, stopFlags);
     virDomainAuditStop(vm, auditReason);
     qemuDomainEventQueue(driver, event);
 
@@ -6609,7 +6613,7 @@ qemuDomainSaveImageStartVM(virConnectPtr conn,
     }
 
     if (virCommandWait(cmd, NULL) < 0) {
-        qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_FAILED, 0);
+        qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_FAILED, asyncJob, 0);
         restored = false;
     }
     VIR_DEBUG("Decompression binary stderr: %s", NULLSTR(errbuf));
@@ -13626,7 +13630,8 @@ qemuDomainSnapshotCreateActiveInternal(virConnectPtr conn,
     if (flags & VIR_DOMAIN_SNAPSHOT_CREATE_HALT) {
         event = virDomainEventLifecycleNewFromObj(vm, VIR_DOMAIN_EVENT_STOPPED,
                                                   VIR_DOMAIN_EVENT_STOPPED_FROM_SNAPSHOT);
-        qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_FROM_SNAPSHOT, 0);
+        qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_FROM_SNAPSHOT,
+                        QEMU_ASYNC_JOB_SNAPSHOT, 0);
         virDomainAuditStop(vm, "from-snapshot");
         resume = false;
     }
@@ -14468,7 +14473,8 @@ qemuDomainSnapshotCreateActiveExternal(virConnectPtr conn,
     if (flags & VIR_DOMAIN_SNAPSHOT_CREATE_HALT) {
         event = virDomainEventLifecycleNewFromObj(vm, VIR_DOMAIN_EVENT_STOPPED,
                                                   VIR_DOMAIN_EVENT_STOPPED_FROM_SNAPSHOT);
-        qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_FROM_SNAPSHOT, 0);
+        qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_FROM_SNAPSHOT,
+                        QEMU_ASYNC_JOB_SNAPSHOT, 0);
         virDomainAuditStop(vm, "from-snapshot");
         resume = false;
         thaw = 0;
@@ -15328,7 +15334,8 @@ qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot,
         }
         virResetError(err);
         qemuProcessStop(driver, vm,
-                        VIR_DOMAIN_SHUTOFF_FROM_SNAPSHOT, 0);
+                        VIR_DOMAIN_SHUTOFF_FROM_SNAPSHOT,
+                        QEMU_ASYNC_JOB_START, 0);
         virDomainAuditStop(vm, "from-snapshot");
         detail = VIR_DOMAIN_EVENT_STOPPED_FROM_SNAPSHOT;
         event = virDomainEventLifecycleNewFromObj(vm,
@@ -15448,7 +15455,8 @@ qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot,
 
         if (virDomainObjIsActive(vm)) {
             /* Transitions 4, 7 */
-            qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_FROM_SNAPSHOT, 0);
+            qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_FROM_SNAPSHOT,
+                            QEMU_ASYNC_JOB_START, 0);
             virDomainAuditStop(vm, "from-snapshot");
             detail = VIR_DOMAIN_EVENT_STOPPED_FROM_SNAPSHOT;
             event = virDomainEventLifecycleNewFromObj(vm,
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index 72d7144..21804f6 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -3594,7 +3594,8 @@ qemuMigrationPrepareAny(virQEMUDriverPtr driver,
         if (!relabel)
             stopFlags |= VIR_QEMU_PROCESS_STOP_NO_RELABEL;
         virDomainAuditStart(vm, "migrated", false);
-        qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_FAILED, stopFlags);
+        qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_FAILED,
+                        QEMU_ASYNC_JOB_MIGRATION_IN, stopFlags);
     }
 
     qemuMigrationJobFinish(driver, vm);
@@ -3903,6 +3904,7 @@ qemuMigrationConfirmPhase(virQEMUDriverPtr driver,
         qemuMigrationWaitForSpice(vm);
 
         qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_MIGRATED,
+                        QEMU_ASYNC_JOB_MIGRATION_OUT,
                         VIR_QEMU_PROCESS_STOP_MIGRATED);
         virDomainAuditStop(vm, "migrated");
 
@@ -5414,6 +5416,7 @@ qemuMigrationPerformJob(virQEMUDriverPtr driver,
          */
         if (!v3proto) {
             qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_MIGRATED,
+                            QEMU_ASYNC_JOB_MIGRATION_OUT,
                             VIR_QEMU_PROCESS_STOP_MIGRATED);
             virDomainAuditStop(vm, "migrated");
             event = virDomainEventLifecycleNewFromObj(vm,
@@ -5896,6 +5899,7 @@ qemuMigrationFinish(virQEMUDriverPtr driver,
             !(flags & VIR_MIGRATE_OFFLINE) &&
             virDomainObjIsActive(vm)) {
             qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_FAILED,
+                            QEMU_ASYNC_JOB_MIGRATION_IN,
                             VIR_QEMU_PROCESS_STOP_MIGRATED);
             virDomainAuditStop(vm, "failed");
             event = virDomainEventLifecycleNewFromObj(vm,
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 19ecf59..9d6afc4 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -3526,7 +3526,11 @@ qemuProcessReconnect(void *opaque)
              * really is and FAILED means "failed to start" */
             state = VIR_DOMAIN_SHUTOFF_UNKNOWN;
         }
-        qemuProcessStop(driver, obj, state, stopFlags);
+        /* If BeginJob failed, we jumped here without a job, let's hope another
+         * thread didn't have a chance to start playing with the domain yet
+         * (it's all we can do anyway).
+         */
+        qemuProcessStop(driver, obj, state, QEMU_ASYNC_JOB_NONE, stopFlags);
     }
     goto cleanup;
 }
@@ -3564,8 +3568,13 @@ qemuProcessReconnectHelper(virDomainObjPtr obj,
         virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                        _("Could not create thread. QEMU initialization "
                          "might be incomplete"));
-        /* We can't spawn a thread and thus connect to monitor. Kill qemu. */
-        qemuProcessStop(src->driver, obj, VIR_DOMAIN_SHUTOFF_FAILED, 0);
+        /* We can't spawn a thread and thus connect to monitor. Kill qemu.
+         * It's safe to call qemuProcessStop without a job here since there
+         * is no thread that could be doing anything else with the same domain
+         * object.
+         */
+        qemuProcessStop(src->driver, obj, VIR_DOMAIN_SHUTOFF_FAILED,
+                        QEMU_ASYNC_JOB_NONE, 0);
         qemuDomainRemoveInactive(src->driver, obj);
 
         virDomainObjEndAPI(&obj);
@@ -4308,7 +4317,7 @@ qemuProcessStartValidate(virDomainDefPtr def,
 int
 qemuProcessInit(virQEMUDriverPtr driver,
                 virDomainObjPtr vm,
-                qemuDomainAsyncJob asyncJob ATTRIBUTE_UNUSED,
+                qemuDomainAsyncJob asyncJob,
                 bool migration,
                 bool snap)
 {
@@ -4381,7 +4390,7 @@ qemuProcessInit(virQEMUDriverPtr driver,
         stopFlags = VIR_QEMU_PROCESS_STOP_NO_RELABEL;
         if (migration)
             stopFlags |= VIR_QEMU_PROCESS_STOP_MIGRATED;
-        qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_FAILED, stopFlags);
+        qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_FAILED, asyncJob, stopFlags);
         goto cleanup;
     }
 
@@ -5298,7 +5307,7 @@ qemuProcessStart(virConnectPtr conn,
         stopFlags |= VIR_QEMU_PROCESS_STOP_MIGRATED;
     if (priv->mon)
         qemuMonitorSetDomainLog(priv->mon, NULL, NULL, NULL);
-    qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_FAILED, stopFlags);
+    qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_FAILED, asyncJob, stopFlags);
     goto cleanup;
 }
 
@@ -5375,6 +5384,7 @@ qemuProcessBeginStopJob(virQEMUDriverPtr driver,
 void qemuProcessStop(virQEMUDriverPtr driver,
                      virDomainObjPtr vm,
                      virDomainShutoffReason reason,
+                     qemuDomainAsyncJob asyncJob,
                      unsigned int flags)
 {
     int ret;
@@ -5389,27 +5399,34 @@ void qemuProcessStop(virQEMUDriverPtr driver,
     virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver);
     qemuDomainLogContextPtr logCtxt = NULL;
 
-    VIR_DEBUG("Shutting down vm=%p name=%s id=%d pid=%llu flags=%x",
+    VIR_DEBUG("Shutting down vm=%p name=%s id=%d pid=%llu, "
+              "reason=%s, asyncJob=%s, flags=%x",
               vm, vm->def->name, vm->def->id,
-              (unsigned long long)vm->pid, flags);
-
-    if (!virDomainObjIsActive(vm)) {
-        VIR_DEBUG("VM '%s' not active", vm->def->name);
-        virObjectUnref(cfg);
-        return;
-    }
+              (unsigned long long)vm->pid,
+              virDomainShutoffReasonTypeToString(reason),
+              qemuDomainAsyncJobTypeToString(asyncJob),
+              flags);
 
     /* This method is routinely used 
in clean up paths. Disable error * reporting so we don't squash a legit error. */ orig_err = virSaveLastError(); - /* - * We may unlock the vm in qemuProcessKill(), and another thread - * can lock the vm, and then call qemuProcessStop(). So we should - * set vm->def->id to -1 here to avoid qemuProcessStop() to be called twice. - */ + if (asyncJob != QEMU_ASYNC_JOB_NONE) { + if (qemuDomainObjBeginNestedJob(driver, vm, asyncJob) < 0) + goto cleanup; + } else if (priv->job.asyncJob != QEMU_ASYNC_JOB_NONE && + priv->job.asyncOwner == virThreadSelfID() && + priv->job.active != QEMU_JOB_ASYNC_NESTED) { + VIR_WARN("qemuProcessStop called without a nested job (async=%s)", + qemuDomainAsyncJobTypeToString(asyncJob)); + } + + if (!virDomainObjIsActive(vm)) { + VIR_DEBUG("VM '%s' not active", vm->def->name); + goto endjob; + } + vm->def->id = -1; - if (virAtomicIntDecAndTest(&driver->nactive) && driver->inhibitCallback) driver->inhibitCallback(false, driver->inhibitOpaque); @@ -5664,6 +5681,11 @@ void qemuProcessStop(virQEMUDriverPtr driver, vm->newDef = NULL; } + endjob: + if (asyncJob != QEMU_ASYNC_JOB_NONE) + qemuDomainObjEndJob(driver, vm); + + cleanup: if (orig_err) { virSetError(orig_err); virFreeError(orig_err); @@ -5946,13 +5968,13 @@ qemuProcessAutoDestroy(virDomainObjPtr dom, qemuDomainObjDiscardAsyncJob(driver, dom); } - if (qemuDomainObjBeginJob(driver, dom, - QEMU_JOB_DESTROY) < 0) - goto cleanup; - VIR_DEBUG("Killing domain"); - qemuProcessStop(driver, dom, VIR_DOMAIN_SHUTOFF_DESTROYED, stopFlags); + if (qemuProcessBeginStopJob(driver, dom, QEMU_JOB_DESTROY, true) < 0) + goto cleanup; + + qemuProcessStop(driver, dom, VIR_DOMAIN_SHUTOFF_DESTROYED, + QEMU_ASYNC_JOB_NONE, stopFlags); virDomainAuditStop(dom, "destroyed"); event = virDomainEventLifecycleNewFromObj(dom, diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h index 69be99e..eb37863 100644 --- a/src/qemu/qemu_process.h +++ b/src/qemu/qemu_process.h @@ -121,6 +121,7 @@ int 
qemuProcessBeginStopJob(virQEMUDriverPtr driver, void qemuProcessStop(virQEMUDriverPtr driver, virDomainObjPtr vm, virDomainShutoffReason reason, + qemuDomainAsyncJob asyncJob, unsigned int flags); int qemuProcessAttach(virConnectPtr conn, -- 2.7.1

On Tue, Feb 16, 2016 at 15:37:00 +0100, Jiri Denemark wrote:
Calling qemuProcessStop without a job opens the door to race conditions with qemuDomainObjExitMonitor called from another thread. A real-world example of such a race:
- migration thread (A) calls qemuMigrationWaitForSpice
- another thread (B) starts processing qemuDomainAbortJob API
- thread B signals thread A via qemuDomainObjAbortAsyncJob
- thread B enters monitor (qemuDomainObjEnterMonitor)
- thread B calls qemuMonitorSend
- thread A awakens and calls qemuProcessStop
- thread A calls qemuMonitorClose and sets priv->mon to NULL
- thread B calls qemuDomainObjExitMonitor with priv->mon == NULL
  => monitor stays ref'ed and locked
Depending on how lucky we are, the race may result in a memory leak, or it may even deadlock libvirtd's event loop if that loop tries to lock the monitor to process an event received before qemuMonitorClose was called.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
---
 src/qemu/qemu_driver.c    | 28 +++++++++++-------
 src/qemu/qemu_migration.c |  6 +++-
 src/qemu/qemu_process.c   | 72 +++++++++++++++++++++++++++++++----------------
 src/qemu/qemu_process.h   |  1 +
 4 files changed, 71 insertions(+), 36 deletions(-)
ACK

virDomainObjWait is designed to be called in a loop. Make sure we break
the loop in case the domain dies to avoid waiting for an event which
will never happen.

Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
---
 src/conf/domain_conf.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 67415fa..f68a8d9 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -2749,6 +2749,13 @@ virDomainObjWait(virDomainObjPtr vm)
                        _("failed to wait for domain condition"));
         return -1;
     }
+
+    if (!virDomainObjIsActive(vm)) {
+        virReportError(VIR_ERR_OPERATION_FAILED, "%s",
+                       _("domain is not running"));
+        return -1;
+    }
+
     return 0;
 }
-- 
2.7.1

On Tue, Feb 16, 2016 at 15:37:01 +0100, Jiri Denemark wrote:
virDomainObjWait is designed to be called in a loop. Make sure we break the loop in case the domain dies to avoid waiting for an event which will never happen.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
---
 src/conf/domain_conf.c | 7 +++++++
 1 file changed, 7 insertions(+)
ACK, all current users expect the VM to be active at this point. Since the function is short enough, the behaviour is self-explanatory.

Peter
participants (2):
- Jiri Denemark
- Peter Krempa