[libvirt] [PATCH v2 RFC 0/4] qemu: replace nested job with interruptible async job state

This is the next version of RFC [1] 'drop nested job concept'. It is actually
quite different from the first version. Patches that accommodate callers to
use the enter/exit monitor functions without the driver and asyncJob
arguments have been moved out of this series; I guess this can be done as
soon as this series is upstream. Most of the details of the approach are in
the first patch.

Pros:
 - better isolation between the async job and concurrent regular jobs
 - more straightforward implementation
 - simpler to use: no need to pass the async job argument down the stack
 - safer to use: no warnings to fix when a function starts being used from
   an asynchronous job context

TODO:
 - replace qemuDomainObjEnterMonitorAsync with qemuDomainObjEnterMonitor
 - accommodate callers to qemuDomainObjExitMonitor returning void
 - stop passing driver and asyncJob down the stack
 - reflect the changes in THREADS.txt

[1] https://www.redhat.com/archives/libvir-list/2016-November/msg01357.html

Nikolay Shirokovskiy (4):
  qemu: replace nested job with interruptible async job state
  qemu: remove liveness check from qemuDomainObjExitMonitor
  qemu: remove nesting job usage from qemuProcessStop
  qemu: remove the rest of nested job parts

 src/conf/domain_conf.c    |  19 ----
 src/conf/domain_conf.h    |   1 -
 src/libvirt_private.syms  |   1 -
 src/qemu/qemu_domain.c    | 253 +++++++++++++++++++++++++++++-----------------
 src/qemu/qemu_domain.h    |  29 ++++-
 src/qemu/qemu_driver.c    |   2 +-
 src/qemu/qemu_migration.c |  70 +++++++------
 src/qemu/qemu_process.c   |  31 ++----

 8 files changed, 225 insertions(+), 181 deletions(-)

-- 
1.8.3.1

A nested job is a construct that allows regular jobs to run while an async
job is running. But the only period when another job can actually be started
is while the async job waits for some event with the domain lock dropped.
Let's code this condition straightforwardly using an asyncInterruptible flag.
Upon awaking, the async job must wait until any regular job that was started
in the meantime has finished, so the two do not overlap and do not use the
domain or monitor object simultaneously.

This solution isolates jobs even better than the nested job approach. With
the latter it is possible that a concurrent job awaits a reply from the
monitor, then the monitor event is delivered and the async job awakes and
continues to run until it needs to send a command to the monitor; only at
that moment does it stop on acquiring the nested job condition.

qemuDomainObjEnterMonitorInternal needs the asyncJob argument and cannot use
the domain job state, because the function can be called simultaneously from
an async job and a concurrent job and, due to nesting, cannot determine the
context without help from the callers. With qemuDomainObjWaitInternal there
are no cases where the function is called from a concurrent job while an
async job is running, but detection is possible and implemented just in case,
since this state of affairs may change one day. So qemuDomainObj{Wait,Sleep}
are safe to use from either a regular or an async job context: there will be
no situations where qemuDomainObjEnterMonitor has to be changed to
qemuDomainObjEnterMonitorAsync because a function starts being used from an
async job context.
---
 src/conf/domain_conf.c    |  19 -----
 src/conf/domain_conf.h    |   1 -
 src/libvirt_private.syms  |   1 -
 src/qemu/qemu_domain.c    | 194 ++++++++++++++++++++++++++++++++++++----------
 src/qemu/qemu_domain.h    |  23 +++++-
 src/qemu/qemu_driver.c    |   2 +-
 src/qemu/qemu_migration.c |  70 ++++++++---------
 src/qemu/qemu_process.c   |   8 +-
 8 files changed, 212 insertions(+), 106 deletions(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 7f5da4e..ee30e4e 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -3015,25 +3015,6 @@ virDomainObjBroadcast(virDomainObjPtr vm)
 }
 
 
-int
-virDomainObjWait(virDomainObjPtr vm)
-{
-    if (virCondWait(&vm->cond, &vm->parent.lock) < 0) {
-        virReportSystemError(errno, "%s",
-                             _("failed to wait for domain condition"));
-        return -1;
-    }
-
-    if (!virDomainObjIsActive(vm)) {
-        virReportError(VIR_ERR_OPERATION_FAILED, "%s",
-                       _("domain is not running"));
-        return -1;
-    }
-
-    return 0;
-}
-
-
 /**
  * Waits for domain condition to be triggered for a specific period of time.
  *
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 31c7a92..4a4a5e6 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -2577,7 +2577,6 @@ bool virDomainObjTaint(virDomainObjPtr obj,
                        virDomainTaintFlags taint);
 
 void virDomainObjBroadcast(virDomainObjPtr vm);
-int virDomainObjWait(virDomainObjPtr vm);
 int virDomainObjWaitUntil(virDomainObjPtr vm,
                           unsigned long long whenms);
 
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index e6901a8..e974108 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -452,7 +452,6 @@ virDomainObjSetMetadata;
 virDomainObjSetState;
 virDomainObjTaint;
 virDomainObjUpdateModificationImpact;
-virDomainObjWait;
 virDomainObjWaitUntil;
 virDomainOSTypeFromString;
 virDomainOSTypeToString;
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 13fd706..2a51b5e 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -323,6 +323,7 @@ qemuDomainObjResetAsyncJob(qemuDomainObjPrivatePtr priv)
     job->spiceMigration = false;
     job->spiceMigrated = false;
     job->postcopyEnabled = false;
+    job->asyncInterruptible = false;
     VIR_FREE(job->current);
 }
 
@@ -3622,15 +3623,16 @@ qemuDomainObjReleaseAsyncJob(virDomainObjPtr obj)
 }
 
 static bool
-qemuDomainNestedJobAllowed(qemuDomainObjPrivatePtr priv, qemuDomainJob job)
+qemuDomainAsyncJobCompatible(qemuDomainObjPrivatePtr priv, qemuDomainJob job)
 {
-    return !priv->job.asyncJob || (priv->job.mask & JOB_MASK(job)) != 0;
+    return !priv->job.asyncJob ||
+           (priv->job.asyncInterruptible && (priv->job.mask & JOB_MASK(job)) != 0);
 }
 
 bool
 qemuDomainJobAllowed(qemuDomainObjPrivatePtr priv, qemuDomainJob job)
 {
-    return !priv->job.active && qemuDomainNestedJobAllowed(priv, job);
+    return !priv->job.active && qemuDomainAsyncJobCompatible(priv, job);
 }
 
 /* Give up waiting for mutex after 30 seconds */
@@ -3648,7 +3650,6 @@ qemuDomainObjBeginJobInternal(virQEMUDriverPtr driver,
     qemuDomainObjPrivatePtr priv = obj->privateData;
     unsigned long long now;
     unsigned long long then;
-    bool nested = job == QEMU_JOB_ASYNC_NESTED;
     bool async = job == QEMU_JOB_ASYNC;
     virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver);
     const char *blocker = NULL;
@@ -3681,7 +3682,7 @@ qemuDomainObjBeginJobInternal(virQEMUDriverPtr driver,
         goto error;
     }
 
-    while (!nested && !qemuDomainNestedJobAllowed(priv, job)) {
+    while (!qemuDomainAsyncJobCompatible(priv, job)) {
         VIR_DEBUG("Waiting for async job (vm=%p name=%s)", obj, obj->def->name);
         if (virCondWaitUntil(&priv->job.asyncCond, &obj->parent.lock, then) < 0)
             goto error;
@@ -3695,7 +3696,7 @@ qemuDomainObjBeginJobInternal(virQEMUDriverPtr driver,
 
     /* No job is active but a new async job could have been started while obj
      * was unlocked, so we need to recheck it. */
-    if (!nested && !qemuDomainNestedJobAllowed(priv, job))
+    if (!qemuDomainAsyncJobCompatible(priv, job))
         goto retry;
 
     qemuDomainObjResetJob(priv);
@@ -3750,7 +3751,7 @@ qemuDomainObjBeginJobInternal(virQEMUDriverPtr driver,
                  priv->job.asyncOwner, NULLSTR(priv->job.asyncOwnerAPI),
                  duration / 1000, asyncDuration / 1000);
 
-        if (nested || qemuDomainNestedJobAllowed(priv, job))
+        if (qemuDomainAsyncJobCompatible(priv, job))
             blocker = priv->job.ownerAPI;
         else
             blocker = priv->job.asyncOwnerAPI;
@@ -3870,7 +3871,7 @@ qemuDomainObjEndJob(virQEMUDriverPtr driver, virDomainObjPtr obj)
     qemuDomainObjResetJob(priv);
     if (qemuDomainTrackJob(job))
         qemuDomainObjSaveJob(driver, obj);
-    virCondSignal(&priv->job.cond);
+    virCondBroadcast(&priv->job.cond);
 }
 
 void
@@ -3907,44 +3908,25 @@ qemuDomainObjAbortAsyncJob(virDomainObjPtr obj)
  *
  * To be called immediately before any QEMU monitor API call
  * Must have already either called qemuDomainObjBeginJob() and checked
- * that the VM is still active; may not be used for nested async jobs.
+ * that the VM is still active.
  *
  * To be followed with qemuDomainObjExitMonitor() once complete
  */
-static int
-qemuDomainObjEnterMonitorInternal(virQEMUDriverPtr driver,
-                                  virDomainObjPtr obj,
-                                  qemuDomainAsyncJob asyncJob)
+void
+qemuDomainObjEnterMonitor(virQEMUDriverPtr driver ATTRIBUTE_UNUSED,
+                          virDomainObjPtr obj)
 {
     qemuDomainObjPrivatePtr priv = obj->privateData;
 
-    if (asyncJob != QEMU_ASYNC_JOB_NONE) {
-        int ret;
-        if ((ret = qemuDomainObjBeginNestedJob(driver, obj, asyncJob)) < 0)
-            return ret;
-        if (!virDomainObjIsActive(obj)) {
-            virReportError(VIR_ERR_OPERATION_FAILED, "%s",
-                           _("domain is no longer running"));
-            qemuDomainObjEndJob(driver, obj);
-            return -1;
-        }
-    } else if (priv->job.asyncOwner == virThreadSelfID()) {
-        VIR_WARN("This thread seems to be the async job owner; entering"
-                 " monitor without asking for a nested job is dangerous");
-    }
-
     VIR_DEBUG("Entering monitor (mon=%p vm=%p name=%s)",
               priv->mon, obj, obj->def->name);
     virObjectLock(priv->mon);
     virObjectRef(priv->mon);
     ignore_value(virTimeMillisNow(&priv->monStart));
     virObjectUnlock(obj);
-
-    return 0;
 }
 
 static void ATTRIBUTE_NONNULL(1)
-qemuDomainObjExitMonitorInternal(virQEMUDriverPtr driver,
+qemuDomainObjExitMonitorInternal(virQEMUDriverPtr driver ATTRIBUTE_UNUSED,
                                  virDomainObjPtr obj)
 {
     qemuDomainObjPrivatePtr priv = obj->privateData;
@@ -3962,17 +3944,8 @@ qemuDomainObjExitMonitorInternal(virQEMUDriverPtr driver,
     priv->monStart = 0;
     if (!hasRefs)
         priv->mon = NULL;
-
-    if (priv->job.active == QEMU_JOB_ASYNC_NESTED)
-        qemuDomainObjEndJob(driver, obj);
 }
 
-void qemuDomainObjEnterMonitor(virQEMUDriverPtr driver,
-                               virDomainObjPtr obj)
-{
-    ignore_value(qemuDomainObjEnterMonitorInternal(driver, obj,
-                                                   QEMU_ASYNC_JOB_NONE));
-}
 
 /* obj must NOT be locked before calling
  *
@@ -4014,9 +3987,134 @@ int qemuDomainObjExitMonitor(virQEMUDriverPtr driver,
 int
 qemuDomainObjEnterMonitorAsync(virQEMUDriverPtr driver,
                                virDomainObjPtr obj,
-                               qemuDomainAsyncJob asyncJob)
+                               qemuDomainAsyncJob asyncJob ATTRIBUTE_UNUSED)
 {
-    return qemuDomainObjEnterMonitorInternal(driver, obj, asyncJob);
+    qemuDomainObjEnterMonitor(driver, obj);
+    return 0;
+}
+
+
+void
+qemuDomainObjEnterInterruptible(virDomainObjPtr obj,
+                                qemuDomainJobContextPtr ctx)
+{
+    qemuDomainObjPrivatePtr priv = obj->privateData;
+    struct qemuDomainJobObj *job = &priv->job;
+
+    /* Second clause helps to detect situation when this function is
+     * called from concurrent regular job.
+     */
+    ctx->async = job->asyncJob && !job->active;
+
+    if (!ctx->async)
+        return;
+
+    job->asyncInterruptible = true;
+    VIR_DEBUG("Async job enters interruptible state. "
+              "(obj=%p name=%s, async=%s)",
+              obj, obj->def->name,
+              qemuDomainAsyncJobTypeToString(job->asyncJob));
+    virCondBroadcast(&priv->job.asyncCond);
+}
+
+
+int
+qemuDomainObjExitInterruptible(virDomainObjPtr obj,
+                               qemuDomainJobContextPtr ctx)
+{
+    qemuDomainObjPrivatePtr priv = obj->privateData;
+    struct qemuDomainJobObj *job = &priv->job;
+    virErrorPtr err = NULL;
+    int ret = -1;
+
+    if (!ctx->async)
+        return 0;
+
+    job->asyncInterruptible = false;
+    VIR_DEBUG("Async job exits interruptible state. "
+              "(obj=%p name=%s, async=%s)",
+              obj, obj->def->name,
+              qemuDomainAsyncJobTypeToString(job->asyncJob));
+
+    err = virSaveLastError();
+
+    /* Before continuing async job wait until any job started in
+     * meanwhile is finished.
+     */
+    while (job->active) {
+        if (virCondWait(&priv->job.cond, &obj->parent.lock) < 0) {
+            virReportSystemError(errno, "%s",
+                                 _("failed to wait for job condition"));
+            goto cleanup;
+        }
+    }
+
+    if (!virDomainObjIsActive(obj)) {
+        virReportError(VIR_ERR_OPERATION_FAILED, "%s",
+                       _("domain is not running"));
+        goto cleanup;
+    }
+
+    ret = 0;
+ cleanup:
+    if (err) {
+        virSetError(err);
+        virFreeError(err);
+    }
+    return ret;
+}
+
+
+/*
+ * obj must be locked before calling. Must be used within context of regular or
+ * async job
+ *
+ * Wait on obj lock. In case of async job regular compatible jobs are allowed
+ * to run while waiting. Any regular job that is started during the wait is
+ * finished before return from this function.
+ */
+int
+qemuDomainObjWait(virDomainObjPtr obj)
+{
+    qemuDomainJobContext ctx;
+    int rc = 0;
+
+    qemuDomainObjEnterInterruptible(obj, &ctx);
+
+    if (virCondWait(&obj->cond, &obj->parent.lock) < 0) {
+        virReportSystemError(errno, "%s",
+                             _("failed to wait for domain condition"));
+        rc = -1;
+    }
+
+    if (qemuDomainObjExitInterruptible(obj, &ctx) < 0 || rc < 0)
+        return -1;
+
+    return 0;
+}
+
+
+/*
+ * obj must be locked before calling. Must be used within context of regular or
+ * async job
+ *
+ * Sleep with obj lock dropped. In case of async job regular compatible jobs
+ * are allowed to run while sleeping. Any regular job that is started during the
+ * sleep is finished before return from this function.
+ */
+int
+qemuDomainObjSleep(virDomainObjPtr obj, unsigned long nsec)
+{
+    struct timespec ts = { .tv_sec = 0, .tv_nsec = nsec };
+    qemuDomainJobContext ctx;
+
+    qemuDomainObjEnterInterruptible(obj, &ctx);
+
+    virObjectUnlock(obj);
+    nanosleep(&ts, NULL);
+    virObjectLock(obj);
+
+    return qemuDomainObjExitInterruptible(obj, &ctx);
 }
 
 
@@ -4063,16 +4161,26 @@ qemuDomainObjExitAgent(virDomainObjPtr obj, qemuAgentPtr agent)
 
 void qemuDomainObjEnterRemote(virDomainObjPtr obj)
 {
+    qemuDomainJobContext ctx;
+
     VIR_DEBUG("Entering remote (vm=%p name=%s)",
               obj, obj->def->name);
+
+    qemuDomainObjEnterInterruptible(obj, &ctx);
     virObjectUnlock(obj);
 }
 
-void qemuDomainObjExitRemote(virDomainObjPtr obj)
+int qemuDomainObjExitRemote(virDomainObjPtr obj)
 {
+    /* enter/exit remote MUST be called only in the context of async job */
+    qemuDomainJobContext ctx = { .async = true };
+
     virObjectLock(obj);
+
     VIR_DEBUG("Exited remote (vm=%p name=%s)",
               obj, obj->def->name);
+
+    return qemuDomainObjExitInterruptible(obj, &ctx);
 }
 
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index aebd91a..39b3aed 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -132,6 +132,8 @@ struct qemuDomainJobObj {
     virCond asyncCond;                  /* Use to coordinate with async jobs */
     qemuDomainAsyncJob asyncJob;        /* Currently active async job */
+    bool asyncInterruptible;            /* Jobs compatible with current async
+                                           job are allowed to run */
     unsigned long long asyncOwner;      /* Thread which set current async job */
     const char *asyncOwnerAPI;          /* The API which owns the async job */
     unsigned long long asyncStarted;    /* When the current async job started */
@@ -473,6 +475,23 @@ int qemuDomainObjEnterMonitorAsync(virQEMUDriverPtr driver,
     ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_RETURN_CHECK;
 
+typedef struct _qemuDomainJobContext qemuDomainJobContext;
+typedef qemuDomainJobContext *qemuDomainJobContextPtr;
+struct _qemuDomainJobContext {
+    bool async;
+};
+
+void qemuDomainObjEnterInterruptible(virDomainObjPtr obj, qemuDomainJobContextPtr ctx)
+    ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2);
+int qemuDomainObjExitInterruptible(virDomainObjPtr obj, qemuDomainJobContextPtr ctx)
+    ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_RETURN_CHECK;
+
+int qemuDomainObjWait(virDomainObjPtr obj)
+    ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK;
+
+int qemuDomainObjSleep(virDomainObjPtr obj, unsigned long nsec)
+    ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK;
+
 qemuAgentPtr qemuDomainObjEnterAgent(virDomainObjPtr obj)
     ATTRIBUTE_NONNULL(1);
 void qemuDomainObjExitAgent(virDomainObjPtr obj, qemuAgentPtr agent)
@@ -481,8 +500,8 @@ void qemuDomainObjExitAgent(virDomainObjPtr obj, qemuAgentPtr agent)
 
 void qemuDomainObjEnterRemote(virDomainObjPtr obj)
     ATTRIBUTE_NONNULL(1);
-void qemuDomainObjExitRemote(virDomainObjPtr obj)
-    ATTRIBUTE_NONNULL(1);
+int qemuDomainObjExitRemote(virDomainObjPtr obj)
+    ATTRIBUTE_NONNULL(1) ATTRIBUTE_RETURN_CHECK;
 
 virDomainDefPtr qemuDomainDefCopy(virQEMUDriverPtr driver,
                                   virDomainDefPtr src,
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index a5c664e..28e16e5 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -16260,7 +16260,7 @@ qemuDomainBlockJobAbort(virDomainPtr dom,
         qemuDomainDiskPrivatePtr diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk);
         qemuBlockJobUpdate(driver, vm, QEMU_ASYNC_JOB_NONE, disk);
         while (diskPriv->blockjob) {
-            if (virDomainObjWait(vm) < 0) {
+            if (qemuDomainObjWait(vm) < 0) {
                 ret = -1;
                 goto endjob;
             }
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index b4507a3..e848a62 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -840,7 +840,7 @@ qemuMigrationCancelDriveMirror(virQEMUDriverPtr driver,
         if (failed && !err)
             err = virSaveLastError();
 
-        if (virDomainObjWait(vm) < 0)
+        if (qemuDomainObjWait(vm) < 0)
             goto cleanup;
     }
 
@@ -979,7 +979,7 @@ qemuMigrationDriveMirror(virQEMUDriverPtr driver,
             goto cleanup;
         }
 
-        if (virDomainObjWait(vm) < 0)
+        if (qemuDomainObjWait(vm) < 0)
             goto cleanup;
     }
 
@@ -1334,7 +1334,7 @@ qemuMigrationWaitForSpice(virDomainObjPtr vm)
     VIR_DEBUG("Waiting for SPICE to finish migration");
     while (!priv->job.spiceMigrated && !priv->job.abortJob) {
-        if (virDomainObjWait(vm) < 0)
+        if (qemuDomainObjWait(vm) < 0)
             return -1;
     }
     return 0;
@@ -1569,7 +1569,7 @@ qemuMigrationWaitForCompletion(virQEMUDriverPtr driver,
     qemuDomainObjPrivatePtr priv = vm->privateData;
     qemuDomainJobInfoPtr jobInfo = priv->job.current;
     bool events = virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_MIGRATION_EVENT);
-    int rv;
+    int rv, rc;
 
     flags |= QEMU_MIGRATION_COMPLETED_UPDATE_STATS;
 
@@ -1579,18 +1579,15 @@ qemuMigrationWaitForCompletion(virQEMUDriverPtr driver,
         if (rv < 0)
             return rv;
 
-        if (events) {
-            if (virDomainObjWait(vm) < 0) {
-                jobInfo->type = VIR_DOMAIN_JOB_FAILED;
-                return -2;
-            }
-        } else {
-            /* Poll every 50ms for progress & to allow cancellation */
-            struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000ull };
+        /* Poll every 50ms for progress & to allow cancellation */
+        if (events)
+            rc = qemuDomainObjWait(vm);
+        else
+            rc = qemuDomainObjSleep(vm, 50 * 1000 * 1000ul);
 
-            virObjectUnlock(vm);
-            nanosleep(&ts, NULL);
-            virObjectLock(vm);
+        if (rc < 0) {
+            jobInfo->type = VIR_DOMAIN_JOB_FAILED;
+            return -2;
         }
     }
 
@@ -1623,7 +1620,7 @@ qemuMigrationWaitForDestCompletion(virQEMUDriverPtr driver,
     while ((rv = qemuMigrationCompleted(driver, vm, asyncJob,
                                         NULL, flags)) != 1) {
-        if (rv < 0 || virDomainObjWait(vm) < 0)
+        if (rv < 0 || qemuDomainObjWait(vm) < 0)
             return -1;
     }
 
@@ -3842,7 +3839,7 @@ qemuMigrationRun(virQEMUDriverPtr driver,
     if (priv->monJSON) {
         while (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) {
             priv->signalStop = true;
-            rc = virDomainObjWait(vm);
+            rc = qemuDomainObjWait(vm);
             priv->signalStop = false;
             if (rc < 0)
                 goto cancelPostCopy;
@@ -4139,27 +4136,20 @@ static int doPeer2PeerMigrate2(virQEMUDriverPtr driver,
         qemuDomainObjEnterRemote(vm);
         ret = dconn->driver->domainMigratePrepareTunnel
             (dconn, st, destflags, dname, resource, dom_xml);
-        qemuDomainObjExitRemote(vm);
+        if (qemuDomainObjExitRemote(vm) < 0)
+            ret = -1;
     } else {
         qemuDomainObjEnterRemote(vm);
         ret = dconn->driver->domainMigratePrepare2
             (dconn, &cookie, &cookielen, NULL, &uri_out,
              destflags, dname, resource, dom_xml);
-        qemuDomainObjExitRemote(vm);
+        if (qemuDomainObjExitRemote(vm) < 0)
+            ret = -1;
     }
     VIR_FREE(dom_xml);
     if (ret == -1)
         goto cleanup;
 
-    /* the domain may have shutdown or crashed while we had the locks dropped
-     * in qemuDomainObjEnterRemote, so check again
-     */
-    if (!virDomainObjIsActive(vm)) {
-        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
-                       _("guest unexpectedly quit"));
-        goto cleanup;
-    }
-
     if (!(flags & VIR_MIGRATE_TUNNELLED) &&
         (uri_out == NULL)) {
         virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
@@ -4206,7 +4196,7 @@ static int doPeer2PeerMigrate2(virQEMUDriverPtr driver,
     ddomain = dconn->driver->domainMigrateFinish2
         (dconn, dname, cookie, cookielen,
          uri_out ? uri_out : dconnuri, destflags, cancelled);
-    qemuDomainObjExitRemote(vm);
+    ignore_value(qemuDomainObjExitRemote(vm));
     if (cancelled && ddomain)
         VIR_ERROR(_("finish step ignored that migration was cancelled"));
 
@@ -4367,7 +4357,8 @@ doPeer2PeerMigrate3(virQEMUDriverPtr driver,
                 (dconn, st, cookiein, cookieinlen, &cookieout, &cookieoutlen,
                  destflags, dname, bandwidth, dom_xml);
         }
-        qemuDomainObjExitRemote(vm);
+        if (qemuDomainObjExitRemote(vm) < 0)
+            ret = -1;
     } else {
         qemuDomainObjEnterRemote(vm);
         if (useParams) {
@@ -4379,7 +4370,8 @@ doPeer2PeerMigrate3(virQEMUDriverPtr driver,
                 (dconn, cookiein, cookieinlen, &cookieout, &cookieoutlen,
                  uri, &uri_out, destflags, dname, bandwidth, dom_xml);
         }
-        qemuDomainObjExitRemote(vm);
+        if (qemuDomainObjExitRemote(vm) < 0)
+            ret = -1;
     }
     VIR_FREE(dom_xml);
     if (ret == -1)
@@ -4475,7 +4467,7 @@ doPeer2PeerMigrate3(virQEMUDriverPtr driver,
             ddomain = dconn->driver->domainMigrateFinish3Params
                 (dconn, params, nparams, cookiein, cookieinlen,
                  &cookieout, &cookieoutlen, destflags, cancelled);
-            qemuDomainObjExitRemote(vm);
+            ignore_value(qemuDomainObjExitRemote(vm));
         }
     } else {
         dname = dname ? dname : vm->def->name;
@@ -4483,7 +4475,7 @@ doPeer2PeerMigrate3(virQEMUDriverPtr driver,
             ddomain = dconn->driver->domainMigrateFinish3
                 (dconn, dname, cookiein, cookieinlen,
                  &cookieout, &cookieoutlen, dconnuri, uri, destflags, cancelled);
-            qemuDomainObjExitRemote(vm);
+            ignore_value(qemuDomainObjExitRemote(vm));
         }
 
         if (cancelled) {
@@ -4624,6 +4616,7 @@ static int doPeer2PeerMigrate(virQEMUDriverPtr driver,
     bool offline = false;
     virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver);
     bool useParams;
+    int rc;
 
     VIR_DEBUG("driver=%p, sconn=%p, vm=%p, xmlin=%s, dconnuri=%s, uri=%s, "
               "graphicsuri=%s, listenAddress=%s, nmigrate_disks=%zu, "
@@ -4661,7 +4654,8 @@ static int doPeer2PeerMigrate(virQEMUDriverPtr driver,
     qemuDomainObjEnterRemote(vm);
     dconn = virConnectOpenAuth(dconnuri, &virConnectAuthConfig, 0);
-    qemuDomainObjExitRemote(vm);
+    rc = qemuDomainObjExitRemote(vm);
+
     if (dconn == NULL) {
         virReportError(VIR_ERR_OPERATION_FAILED,
                        _("Failed to connect to remote libvirt URI %s: %s"),
@@ -4670,6 +4664,9 @@ static int doPeer2PeerMigrate(virQEMUDriverPtr driver,
         return -1;
     }
 
+    if (rc < 0)
+        goto cleanup;
+
     if (virConnectSetKeepAlive(dconn, cfg->keepAliveInterval,
                                cfg->keepAliveCount) < 0)
         goto cleanup;
@@ -4693,7 +4690,8 @@ static int doPeer2PeerMigrate(virQEMUDriverPtr driver,
         if (flags & VIR_MIGRATE_OFFLINE)
             offline = VIR_DRV_SUPPORTS_FEATURE(dconn->driver, dconn,
                                                VIR_DRV_FEATURE_MIGRATION_OFFLINE);
-    qemuDomainObjExitRemote(vm);
+    if (qemuDomainObjExitRemote(vm) < 0)
+        goto cleanup;
 
     if (!p2p) {
         virReportError(VIR_ERR_OPERATION_FAILED, "%s",
@@ -4747,7 +4745,7 @@ static int doPeer2PeerMigrate(virQEMUDriverPtr driver,
     qemuDomainObjEnterRemote(vm);
     virConnectUnregisterCloseCallback(dconn, qemuMigrationConnectionClosed);
     virObjectUnref(dconn);
-    qemuDomainObjExitRemote(vm);
+    ignore_value(qemuDomainObjExitRemote(vm));
     if (orig_err) {
         virSetError(orig_err);
         virFreeError(orig_err);
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index ea3e45c..87a9511 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -209,6 +209,8 @@ qemuConnectAgent(virQEMUDriverPtr driver, virDomainObjPtr vm)
     qemuDomainObjPrivatePtr priv = vm->privateData;
     qemuAgentPtr agent = NULL;
     virDomainChrDefPtr config = qemuFindAgentConfig(vm->def);
+    qemuDomainJobContext ctx;
+    int rc;
 
     if (!config)
         return 0;
@@ -232,6 +234,7 @@ qemuConnectAgent(virQEMUDriverPtr driver, virDomainObjPtr vm)
      * deleted while the agent is active */
     virObjectRef(vm);
 
+    qemuDomainObjEnterInterruptible(vm, &ctx);
     virObjectUnlock(vm);
 
     agent = qemuAgentOpen(vm,
@@ -239,14 +242,13 @@ qemuConnectAgent(virQEMUDriverPtr driver, virDomainObjPtr vm)
                           &agentCallbacks);
 
     virObjectLock(vm);
+    rc = qemuDomainObjExitInterruptible(vm, &ctx);
 
     if (agent == NULL)
         virObjectUnref(vm);
 
-    if (!virDomainObjIsActive(vm)) {
+    if (rc < 0) {
         qemuAgentClose(agent);
-        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
-                       _("guest crashed while connecting to the guest agent"));
         return -1;
     }
-- 
1.8.3.1

As qemuProcessStop is called only in the context of some job, and jobs can no
longer overlap, this check becomes useless. Previously an async job could
continue to run while a concurrent regular job was still waiting for a qemu
response; now that is not possible.
---
 src/qemu/qemu_domain.c | 33 ++++++++-------------------------
 1 file changed, 8 insertions(+), 25 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 2a51b5e..5aaa97a 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -3925,9 +3925,13 @@ qemuDomainObjEnterMonitor(virQEMUDriverPtr driver ATTRIBUTE_UNUSED,
     virObjectUnlock(obj);
 }
 
-static void ATTRIBUTE_NONNULL(1)
-qemuDomainObjExitMonitorInternal(virQEMUDriverPtr driver ATTRIBUTE_UNUSED,
-                                 virDomainObjPtr obj)
+/* obj must NOT be locked before calling
+ *
+ * Should be paired with an earlier qemuDomainObjEnterMonitor call
+ */
+int
+qemuDomainObjExitMonitor(virQEMUDriverPtr driver ATTRIBUTE_UNUSED,
+                         virDomainObjPtr obj)
 {
     qemuDomainObjPrivatePtr priv = obj->privateData;
     bool hasRefs;
@@ -3944,32 +3948,11 @@ qemuDomainObjExitMonitorInternal(virQEMUDriverPtr driver ATTRIBUTE_UNUSED,
     priv->monStart = 0;
     if (!hasRefs)
         priv->mon = NULL;
-}
-
-/* obj must NOT be locked before calling
- *
- * Should be paired with an earlier qemuDomainObjEnterMonitor() call
- *
- * Returns -1 if the domain is no longer alive after exiting the monitor.
- * In that case, the caller should be careful when using obj's data,
- * e.g. the live definition in vm->def has been freed by qemuProcessStop
- * and replaced by the persistent definition, so pointers stolen
- * from the live definition could no longer be valid.
- */
-int qemuDomainObjExitMonitor(virQEMUDriverPtr driver,
-                             virDomainObjPtr obj)
-{
-    qemuDomainObjExitMonitorInternal(driver, obj);
-    if (!virDomainObjIsActive(obj)) {
-        if (!virGetLastError())
-            virReportError(VIR_ERR_OPERATION_FAILED, "%s",
-                           _("domain is no longer running"));
-        return -1;
-    }
 
     return 0;
 }
 
+
 /*
  * obj must be locked before calling
  *
-- 
1.8.3.1

Commit 81f50cb92 added acquiring a nested job when stopping a domain, to
prevent leaking the monitor object on a race between an aborting job and the
domain stop caused by that abort. One of the causes of this problem was that
the async job and a concurrent regular job were not fully isolated, so the
async job could continue to run while the concurrent job had not finished
yet. Now this is not possible, and we can safely drop acquiring the nested
job in qemuProcessStop, as stopping is always done within the context of some
job.
---
 src/qemu/qemu_process.c | 21 +++------------------
 1 file changed, 3 insertions(+), 18 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 87a9511..d9f4af7 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -6149,7 +6149,7 @@ qemuProcessBeginStopJob(virQEMUDriverPtr driver,
 void qemuProcessStop(virQEMUDriverPtr driver,
                      virDomainObjPtr vm,
                      virDomainShutoffReason reason,
-                     qemuDomainAsyncJob asyncJob,
+                     qemuDomainAsyncJob asyncJob ATTRIBUTE_UNUSED,
                      unsigned int flags)
 {
     int ret;
@@ -6163,30 +6163,19 @@ void qemuProcessStop(virQEMUDriverPtr driver,
     virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver);
 
     VIR_DEBUG("Shutting down vm=%p name=%s id=%d pid=%lld, "
-              "reason=%s, asyncJob=%s, flags=%x",
+              "reason=%s, flags=%x",
               vm, vm->def->name, vm->def->id,
               (long long) vm->pid,
               virDomainShutoffReasonTypeToString(reason),
-              qemuDomainAsyncJobTypeToString(asyncJob),
               flags);
 
     /* This method is routinely used in clean up paths. Disable error
      * reporting so we don't squash a legit error. */
     orig_err = virSaveLastError();
 
-    if (asyncJob != QEMU_ASYNC_JOB_NONE) {
-        if (qemuDomainObjBeginNestedJob(driver, vm, asyncJob) < 0)
-            goto cleanup;
-    } else if (priv->job.asyncJob != QEMU_ASYNC_JOB_NONE &&
-               priv->job.asyncOwner == virThreadSelfID() &&
-               priv->job.active != QEMU_JOB_ASYNC_NESTED) {
-        VIR_WARN("qemuProcessStop called without a nested job (async=%s)",
-                 qemuDomainAsyncJobTypeToString(asyncJob));
-    }
-
     if (!virDomainObjIsActive(vm)) {
         VIR_DEBUG("VM '%s' not active", vm->def->name);
-        goto endjob;
+        goto cleanup;
     }
 
     qemuProcessBuildDestroyHugepagesPath(driver, vm, false);
@@ -6467,10 +6456,6 @@ void qemuProcessStop(virQEMUDriverPtr driver,
 
     virDomainObjRemoveTransientDef(vm);
 
- endjob:
-    if (asyncJob != QEMU_ASYNC_JOB_NONE)
-        qemuDomainObjEndJob(driver, vm);
-
 cleanup:
     if (orig_err) {
         virSetError(orig_err);
-- 
1.8.3.1

It is not clear whether a nested job could have been saved in a status file
or not, so the value stays in the job types enum.
---
 src/qemu/qemu_domain.c  | 28 +---------------------------
 src/qemu/qemu_domain.h  |  6 +-----
 src/qemu/qemu_process.c |  2 +-
 3 files changed, 3 insertions(+), 33 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 5aaa97a..20580c6 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -87,7 +87,7 @@ VIR_ENUM_IMPL(qemuDomainJob, QEMU_JOB_LAST,
               "abort",
               "migration operation",
               "none",   /* async job is never stored in job.active */
-              "async nested",
+              "async nested",   /* keep for backward compatibility */
);
 
 VIR_ENUM_IMPL(qemuDomainAsyncJob, QEMU_ASYNC_JOB_LAST,
@@ -3600,8 +3600,6 @@ qemuDomainObjDiscardAsyncJob(virQEMUDriverPtr driver, virDomainObjPtr obj)
 {
     qemuDomainObjPrivatePtr priv = obj->privateData;
 
-    if (priv->job.active == QEMU_JOB_ASYNC_NESTED)
-        qemuDomainObjResetJob(priv);
     qemuDomainObjResetAsyncJob(priv);
     qemuDomainObjSaveJob(driver, obj);
 }
@@ -3825,30 +3823,6 @@ int qemuDomainObjBeginAsyncJob(virQEMUDriverPtr driver,
     return 0;
 }
 
-int
-qemuDomainObjBeginNestedJob(virQEMUDriverPtr driver,
-                            virDomainObjPtr obj,
-                            qemuDomainAsyncJob asyncJob)
-{
-    qemuDomainObjPrivatePtr priv = obj->privateData;
-
-    if (asyncJob != priv->job.asyncJob) {
-        virReportError(VIR_ERR_INTERNAL_ERROR,
-                       _("unexpected async job %d"), asyncJob);
-        return -1;
-    }
-
-    if (priv->job.asyncOwner != virThreadSelfID()) {
-        VIR_WARN("This thread doesn't seem to be the async job owner: %llu",
-                 priv->job.asyncOwner);
-    }
-
-    return qemuDomainObjBeginJobInternal(driver, obj,
-                                         QEMU_JOB_ASYNC_NESTED,
-                                         QEMU_ASYNC_JOB_NONE);
-}
-
-
 /*
  * obj must be locked and have a reference before calling
  *
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index 39b3aed..062ad1d 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -76,7 +76,7 @@ typedef enum {
     /* The following two items must always be the last items before JOB_LAST */
     QEMU_JOB_ASYNC,         /* Asynchronous job */
-    QEMU_JOB_ASYNC_NESTED,  /* Normal job within an async job */
+    QEMU_JOB_PLACEHOLDER_1, /* Have to keep for backward compatibility */
 
     QEMU_JOB_LAST
 } qemuDomainJob;
@@ -439,10 +439,6 @@ int qemuDomainObjBeginAsyncJob(virQEMUDriverPtr driver,
                                qemuDomainAsyncJob asyncJob,
                                virDomainJobOperation operation)
     ATTRIBUTE_RETURN_CHECK;
-int qemuDomainObjBeginNestedJob(virQEMUDriverPtr driver,
-                                virDomainObjPtr obj,
-                                qemuDomainAsyncJob asyncJob)
-    ATTRIBUTE_RETURN_CHECK;
 void qemuDomainObjEndJob(virQEMUDriverPtr driver,
                          virDomainObjPtr obj);
 
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index d9f4af7..a99a467 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -3177,7 +3177,7 @@ qemuProcessRecoverJob(virQEMUDriverPtr driver,
     case QEMU_JOB_MIGRATION_OP:
     case QEMU_JOB_ABORT:
     case QEMU_JOB_ASYNC:
-    case QEMU_JOB_ASYNC_NESTED:
+    case QEMU_JOB_PLACEHOLDER_1:
         /* async job was already handled above */
     case QEMU_JOB_NONE:
     case QEMU_JOB_LAST:
-- 
1.8.3.1

ping

On 02.05.2017 13:06, Nikolay Shirokovskiy wrote: