[libvirt] [PATCH 0/3] For 1.2.14: Fix regression in synchronous block job ABORT/PIVOT

When a block job is terminated via the synchronous API, the backing chain would be updated in a separate thread, which allowed applications to read outdated data. This broke live snapshot merge on oVirt. Since the commit that broke this (see patch 3/3) has not been released yet, I'm asking to merge this during the freeze.

Peter Krempa (3):
  qemu: processBlockJob: Don't unlock @vm twice
  qemu: Extract internals of processBlockJobEvent into a helper
  qemu: blockjob: Synchronously update backing chain in XML on ABORT/PIVOT

 src/conf/domain_conf.c  |  16 +++-
 src/conf/domain_conf.h  |   6 ++
 src/qemu/qemu_driver.c  | 246 ++++++++++++++++++++++++------------------------
 src/qemu/qemu_process.c |  38 +++++---
 4 files changed, 169 insertions(+), 137 deletions(-)

-- 
2.2.2

Commit 1a92c719 moved the code that handles block job events to a different function that is executed in a separate thread. The caller of processBlockJob handles locking and unlocking of @vm, so we should not do it in the function itself.
---
 src/qemu/qemu_driver.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index f07e4fb..f1cbc46 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -4574,7 +4574,6 @@ processBlockJobEvent(virQEMUDriverPtr driver,
         VIR_WARN("Unable to update persistent definition on vm %s "
                  "after block job", vm->def->name);
     }
-    virObjectUnlock(vm);
     virObjectUnref(cfg);
 
     if (event)
-- 
2.2.2

On 03/30/2015 03:26 AM, Peter Krempa wrote:
Commit 1a92c719 moved the code that handles block job events to a different function that is executed in a separate thread. The caller of processBlockJob handles locking and unlocking of @vm, so we should not do it in the function itself.
---
 src/qemu/qemu_driver.c | 1 -
 1 file changed, 1 deletion(-)
ACK; bug fix so safe for freeze.
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index f07e4fb..f1cbc46 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -4574,7 +4574,6 @@ processBlockJobEvent(virQEMUDriverPtr driver,
         VIR_WARN("Unable to update persistent definition on vm %s "
                  "after block job", vm->def->name);
     }
-    virObjectUnlock(vm);
     virObjectUnref(cfg);
 
     if (event)
-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
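The locking discipline patch 1/3 restores — the caller owns the lock for the whole call, and the helper must never release it — can be sketched outside libvirt with a plain pthread mutex standing in for virObjectLock()/virObjectUnlock(). All names here are illustrative, not libvirt's.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a lockable domain object. */
typedef struct {
    pthread_mutex_t lock;
    int generation;
} vm_obj;

/* Mirrors the fixed processBlockJobEvent(): mutate state under the
 * caller's lock, but never unlock here -- the stray unlock inside the
 * helper was exactly the double-unlock bug. */
static void process_event(vm_obj *vm)
{
    vm->generation++;
    /* no pthread_mutex_unlock(&vm->lock) here */
}

/* Caller acquires and releases; the helper's lock lifetime is never
 * its own concern. */
static int handle_event(vm_obj *vm)
{
    int gen;
    pthread_mutex_lock(&vm->lock);
    process_event(vm);
    gen = vm->generation;
    pthread_mutex_unlock(&vm->lock);
    return gen;
}
```

With the unlock inside process_event() as well, the second unlock in handle_event() would operate on an already-released mutex, which is undefined behavior for a default pthread mutex.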

Later on I'll be adding a condition that will allow synchronising a SYNC block job abort. The approach will require this code to be called from two different places, so it has to be extracted into a helper.
---
 src/qemu/qemu_driver.c | 200 +++++++++++++++++++++++++------------------------
 1 file changed, 104 insertions(+), 96 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index f1cbc46..257dea8 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -4453,116 +4453,101 @@ processSerialChangedEvent(virQEMUDriverPtr driver,
 
 static void
-processBlockJobEvent(virQEMUDriverPtr driver,
-                     virDomainObjPtr vm,
-                     char *diskAlias,
-                     int type,
-                     int status)
+qemuBlockJobEventProcess(virQEMUDriverPtr driver,
+                         virDomainObjPtr vm,
+                         virDomainDiskDefPtr disk,
+                         int type,
+                         int status)
 {
     virObjectEventPtr event = NULL;
     virObjectEventPtr event2 = NULL;
     const char *path;
-    virDomainDiskDefPtr disk;
     virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver);
     virDomainDiskDefPtr persistDisk = NULL;
     bool save = false;
 
-    if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0)
-        goto cleanup;
-
-    if (!virDomainObjIsActive(vm)) {
-        VIR_DEBUG("Domain is not running");
-        goto endjob;
-    }
-
-    disk = qemuProcessFindDomainDiskByAlias(vm, diskAlias);
-
-    if (disk) {
-        /* Have to generate two variants of the event for old vs. new
-         * client callbacks */
-        if (type == VIR_DOMAIN_BLOCK_JOB_TYPE_COMMIT &&
-            disk->mirrorJob == VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT)
-            type = disk->mirrorJob;
-        path = virDomainDiskGetSource(disk);
-        event = virDomainEventBlockJobNewFromObj(vm, path, type, status);
-        event2 = virDomainEventBlockJob2NewFromObj(vm, disk->dst, type,
-                                                   status);
-
-        /* If we completed a block pull or commit, then update the XML
-         * to match.  */
-        switch ((virConnectDomainEventBlockJobStatus) status) {
-        case VIR_DOMAIN_BLOCK_JOB_COMPLETED:
-            if (disk->mirrorState == VIR_DOMAIN_DISK_MIRROR_STATE_PIVOT) {
-                if (vm->newDef) {
-                    int indx = virDomainDiskIndexByName(vm->newDef, disk->dst,
-                                                        false);
-                    virStorageSourcePtr copy = NULL;
-
-                    if (indx >= 0) {
-                        persistDisk = vm->newDef->disks[indx];
-                        copy = virStorageSourceCopy(disk->mirror, false);
-                        if (virStorageSourceInitChainElement(copy,
-                                                             persistDisk->src,
-                                                             true) < 0) {
-                            VIR_WARN("Unable to update persistent definition "
-                                     "on vm %s after block job",
-                                     vm->def->name);
-                            virStorageSourceFree(copy);
-                            copy = NULL;
-                            persistDisk = NULL;
-                        }
-                    }
-                    if (copy) {
-                        virStorageSourceFree(persistDisk->src);
-                        persistDisk->src = copy;
+    /* Have to generate two variants of the event for old vs. new
+     * client callbacks */
+    if (type == VIR_DOMAIN_BLOCK_JOB_TYPE_COMMIT &&
+        disk->mirrorJob == VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT)
+        type = disk->mirrorJob;
+    path = virDomainDiskGetSource(disk);
+    event = virDomainEventBlockJobNewFromObj(vm, path, type, status);
+    event2 = virDomainEventBlockJob2NewFromObj(vm, disk->dst, type, status);
+
+    /* If we completed a block pull or commit, then update the XML
+     * to match.  */
+    switch ((virConnectDomainEventBlockJobStatus) status) {
+    case VIR_DOMAIN_BLOCK_JOB_COMPLETED:
+        if (disk->mirrorState == VIR_DOMAIN_DISK_MIRROR_STATE_PIVOT) {
+            if (vm->newDef) {
+                int indx = virDomainDiskIndexByName(vm->newDef, disk->dst, false);
+                virStorageSourcePtr copy = NULL;
+
+                if (indx >= 0) {
+                    persistDisk = vm->newDef->disks[indx];
+                    copy = virStorageSourceCopy(disk->mirror, false);
+                    if (virStorageSourceInitChainElement(copy,
+                                                         persistDisk->src,
+                                                         true) < 0) {
+                        VIR_WARN("Unable to update persistent definition "
+                                 "on vm %s after block job",
+                                 vm->def->name);
+                        virStorageSourceFree(copy);
+                        copy = NULL;
+                        persistDisk = NULL;
                     }
                 }
-
-                /* XXX We want to revoke security labels and disk
-                 * lease, as well as audit that revocation, before
-                 * dropping the original source.  But it gets tricky
-                 * if both source and mirror share common backing
-                 * files (we want to only revoke the non-shared
-                 * portion of the chain); so for now, we leak the
-                 * access to the original. */
-                virStorageSourceFree(disk->src);
-                disk->src = disk->mirror;
-            } else {
-                virStorageSourceFree(disk->mirror);
+                if (copy) {
+                    virStorageSourceFree(persistDisk->src);
+                    persistDisk->src = copy;
+                }
             }
 
-            /* Recompute the cached backing chain to match our
-             * updates.  Better would be storing the chain ourselves
-             * rather than reprobing, but we haven't quite completed
-             * that conversion to use our XML tracking. */
-            disk->mirror = NULL;
-            save = disk->mirrorState != VIR_DOMAIN_DISK_MIRROR_STATE_NONE;
-            disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_NONE;
-            disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN;
-            ignore_value(qemuDomainDetermineDiskChain(driver, vm, disk,
-                                                      true, true));
-            disk->blockjob = false;
-            break;
+            /* XXX We want to revoke security labels and disk
+             * lease, as well as audit that revocation, before
+             * dropping the original source.  But it gets tricky
+             * if both source and mirror share common backing
+             * files (we want to only revoke the non-shared
+             * portion of the chain); so for now, we leak the
+             * access to the original. */
+            virStorageSourceFree(disk->src);
+            disk->src = disk->mirror;
+        } else {
+            virStorageSourceFree(disk->mirror);
+        }
 
-        case VIR_DOMAIN_BLOCK_JOB_READY:
-            disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_READY;
-            save = true;
-            break;
+        /* Recompute the cached backing chain to match our
+         * updates.  Better would be storing the chain ourselves
+         * rather than reprobing, but we haven't quite completed
+         * that conversion to use our XML tracking. */
+        disk->mirror = NULL;
+        save = disk->mirrorState != VIR_DOMAIN_DISK_MIRROR_STATE_NONE;
+        disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_NONE;
+        disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN;
+        ignore_value(qemuDomainDetermineDiskChain(driver, vm, disk,
+                                                  true, true));
+        disk->blockjob = false;
+        break;
 
-        case VIR_DOMAIN_BLOCK_JOB_FAILED:
-        case VIR_DOMAIN_BLOCK_JOB_CANCELED:
-            virStorageSourceFree(disk->mirror);
-            disk->mirror = NULL;
-            disk->mirrorState = status == VIR_DOMAIN_BLOCK_JOB_FAILED ?
-                VIR_DOMAIN_DISK_MIRROR_STATE_ABORT : VIR_DOMAIN_DISK_MIRROR_STATE_NONE;
-            disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN;
-            save = true;
-            disk->blockjob = false;
-            break;
+    case VIR_DOMAIN_BLOCK_JOB_READY:
+        disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_READY;
+        save = true;
+        break;
 
-        case VIR_DOMAIN_BLOCK_JOB_LAST:
-            break;
-        }
+    case VIR_DOMAIN_BLOCK_JOB_FAILED:
+    case VIR_DOMAIN_BLOCK_JOB_CANCELED:
+        virStorageSourceFree(disk->mirror);
+        disk->mirror = NULL;
+        disk->mirrorState = status == VIR_DOMAIN_BLOCK_JOB_FAILED ?
+            VIR_DOMAIN_DISK_MIRROR_STATE_ABORT : VIR_DOMAIN_DISK_MIRROR_STATE_NONE;
+        disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN;
+        save = true;
+        disk->blockjob = false;
+        break;
+
+    case VIR_DOMAIN_BLOCK_JOB_LAST:
+        break;
     }
 
     if (save) {
@@ -4574,13 +4559,36 @@ processBlockJobEvent(virQEMUDriverPtr driver,
         VIR_WARN("Unable to update persistent definition on vm %s "
                  "after block job", vm->def->name);
     }
-    virObjectUnref(cfg);
 
     if (event)
         qemuDomainEventQueue(driver, event);
     if (event2)
        qemuDomainEventQueue(driver, event2);
+    virObjectUnref(cfg);
+}
+
+
+static void
+processBlockJobEvent(virQEMUDriverPtr driver,
+                     virDomainObjPtr vm,
+                     char *diskAlias,
+                     int type,
+                     int status)
+{
+    virDomainDiskDefPtr disk;
+
+    if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0)
+        goto cleanup;
+
+    if (!virDomainObjIsActive(vm)) {
+        VIR_DEBUG("Domain is not running");
+        goto endjob;
+    }
+
+    if ((disk = qemuProcessFindDomainDiskByAlias(vm, diskAlias)))
+        qemuBlockJobEventProcess(driver, vm, disk, type, status);
+
 endjob:
    qemuDomainObjEndJob(driver, vm);
 cleanup:
-- 
2.2.2

On 03/30/2015 03:26 AM, Peter Krempa wrote:
Later on I'll be adding a condition that will allow synchronising a SYNC block job abort. The approach will require this code to be called from two different places so it has to be extracted into a helper.
---
 src/qemu/qemu_driver.c | 200 +++++++++++++++++++++++++------------------------
 1 file changed, 104 insertions(+), 96 deletions(-)
ACK; looks to be code motion with no immediate impact, so safe for freeze if 3/3 is approved.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

When the synchronous pivot option is selected, libvirt would not update the backing chain until the job was exited. Some applications then received invalid data as their job serialized first.

This patch removes polling to wait for the ABORT/PIVOT job completion and replaces it with a condition. If a synchronous operation is requested, the update of the XML is executed in the job of the caller of the synchronous request. Otherwise the monitor event callback uses a separate worker to update the backing chain with a new job.

This is a regression since 1a92c719101e5bfa6fe2b78006ad04c7f075ea28

When the ABORT job is finished synchronously you get the following call stack:
 #0 qemuBlockJobEventProcess
 #1 qemuDomainBlockJobImpl
 #2 qemuDomainBlockJobAbort
 #3 virDomainBlockJobAbort

While previously or while using the _ASYNC flag you'd get:
 #0 qemuBlockJobEventProcess
 #1 processBlockJobEvent
 #2 qemuProcessEventHandler
 #3 virThreadPoolWorker
---
 src/conf/domain_conf.c  | 16 +++++++++++++++-
 src/conf/domain_conf.h  |  6 ++++++
 src/qemu/qemu_driver.c  | 45 +++++++++++++++++++--------------------------
 src/qemu/qemu_process.c | 38 +++++++++++++++++++++++++------------
 4 files changed, 65 insertions(+), 40 deletions(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 9324de0..cd6ee22 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -1289,9 +1289,22 @@ virDomainDiskDefNew(void)
     if (VIR_ALLOC(ret) < 0)
         return NULL;
 
     if (VIR_ALLOC(ret->src) < 0)
-        VIR_FREE(ret);
+        goto error;
+
+    if (virCondInit(&ret->blockJobSyncCond) < 0) {
+        virReportSystemError(errno, "%s", _("Failed to initialize condition"));
+        goto error;
+    }
 
     return ret;
+
+ error:
+    virStorageSourceFree(ret->src);
+    VIR_FREE(ret);
+
+    return NULL;
 }
@@ -1310,6 +1323,7 @@ virDomainDiskDefFree(virDomainDiskDefPtr def)
     VIR_FREE(def->product);
     VIR_FREE(def->domain_name);
     virDomainDeviceInfoClear(&def->info);
+    virCondDestroy(&def->blockJobSyncCond);
 
     VIR_FREE(def);
 }
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 608f61f..84e880a 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -685,6 +685,12 @@ struct _virDomainDiskDef {
     int mirrorState; /* enum virDomainDiskMirrorState */
     int mirrorJob; /* virDomainBlockJobType */
 
+    /* for some synchronous block jobs, we need to notify the owner */
+    virCond blockJobSyncCond;
+    int blockJobType;   /* type of the block job from the event */
+    int blockJobStatus; /* status of the finished block job */
+    bool blockJobSync;  /* the block job needs synchronized termination */
+
     struct {
         unsigned int cylinders;
         unsigned int heads;
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 257dea8..b37995b 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -16276,6 +16276,12 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm,
         goto endjob;
 
     if (mode == BLOCK_JOB_ABORT) {
+        if (async && !(flags & VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC)) {
+            /* prepare state for event delivery */
+            disk->blockJobStatus = -1;
+            disk->blockJobSync = true;
+        }
+
         if ((flags & VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT) &&
             !(async && disk->mirror)) {
             virReportError(VIR_ERR_OPERATION_INVALID,
@@ -16389,37 +16395,24 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm,
                                                       status);
             event2 = virDomainEventBlockJob2NewFromObj(vm, disk->dst, type,
                                                        status);
-        } else if (!(flags & VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC)) {
+        } else if (disk->blockJobSync) {
             /* XXX If the event reports failure, we should reflect
              * that back into the return status of this API call.  */
-            while (1) {
-                /* Poll every 50ms */
-                static struct timespec ts = { .tv_sec = 0,
-                                              .tv_nsec = 50 * 1000 * 1000ull };
-                virDomainBlockJobInfo dummy;
-
-                qemuDomainObjEnterMonitor(driver, vm);
-                ret = qemuMonitorBlockJobInfo(priv->mon, device, &dummy, NULL);
-                if (qemuDomainObjExitMonitor(driver, vm) < 0)
-                    ret = -1;
-
-                if (ret <= 0)
-                    break;
-
-                virObjectUnlock(vm);
-                nanosleep(&ts, NULL);
-
-                virObjectLock(vm);
-
-                if (!virDomainObjIsActive(vm)) {
-                    virReportError(VIR_ERR_OPERATION_INVALID, "%s",
-                                   _("domain is not running"));
-                    ret = -1;
-                    break;
+            while (disk->blockJobStatus == -1 && disk->blockJobSync) {
+                if (virCondWait(&disk->blockJobSyncCond, &vm->parent.lock) < 0) {
+                    virReportSystemError(errno, "%s",
+                                         _("Unable to wait on block job sync "
+                                           "condition"));
+                    disk->blockJobSync = false;
+                    goto endjob;
                 }
             }
+
+            qemuBlockJobEventProcess(driver, vm, disk,
+                                     disk->blockJobType,
+                                     disk->blockJobStatus);
+            disk->blockJobSync = false;
         }
     }
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 79f763e..2d86eb8 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -1020,28 +1020,40 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED,
 {
     virQEMUDriverPtr driver = opaque;
     struct qemuProcessEvent *processEvent = NULL;
-    char *data;
+    virDomainDiskDefPtr disk;
+    char *data = NULL;
 
     virObjectLock(vm);
 
     VIR_DEBUG("Block job for device %s (domain: %p,%s) type %d status %d",
               diskAlias, vm, vm->def->name, type, status);
 
-    if (VIR_ALLOC(processEvent) < 0)
+    if (!(disk = qemuProcessFindDomainDiskByAlias(vm, diskAlias)))
         goto error;
 
-    processEvent->eventType = QEMU_PROCESS_EVENT_BLOCK_JOB;
-    if (VIR_STRDUP(data, diskAlias) < 0)
-        goto error;
-    processEvent->data = data;
-    processEvent->vm = vm;
-    processEvent->action = type;
-    processEvent->status = status;
+    if (disk->blockJobSync) {
+        disk->blockJobType = type;
+        disk->blockJobStatus = status;
+        /* We have a SYNC API waiting for this event, dispatch it back */
+        virCondSignal(&disk->blockJobSyncCond);
+    } else {
+        /* there is no waiting SYNC API, dispatch the update to a thread */
+        if (VIR_ALLOC(processEvent) < 0)
+            goto error;
 
-    virObjectRef(vm);
-    if (virThreadPoolSendJob(driver->workerPool, 0, processEvent) < 0) {
-        ignore_value(virObjectUnref(vm));
-        goto error;
+        processEvent->eventType = QEMU_PROCESS_EVENT_BLOCK_JOB;
+        if (VIR_STRDUP(data, diskAlias) < 0)
+            goto error;
+        processEvent->data = data;
+        processEvent->vm = vm;
+        processEvent->action = type;
+        processEvent->status = status;
+
+        virObjectRef(vm);
+        if (virThreadPoolSendJob(driver->workerPool, 0, processEvent) < 0) {
+            ignore_value(virObjectUnref(vm));
+            goto error;
+        }
     }
 
 cleanup:
-- 
2.2.2
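The blockJobSync handshake above — the API thread waits on blockJobSyncCond in a predicate loop while the event handler records the status and signals — is the classic condition-variable pattern. A minimal standalone sketch with plain pthreads (names are illustrative, not libvirt's; virCondWait wraps pthread_cond_wait in the real code):

```c
#include <assert.h>
#include <pthread.h>

/* Minimal model of the synchronous abort handshake. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int status;              /* -1 == "no event delivered yet" */
} blockjob_sync;

/* Event-handler side: record the job status under the lock and wake
 * the waiter (cf. the virCondSignal() call in the patch). */
static void *deliver_event(void *opaque)
{
    blockjob_sync *s = opaque;
    pthread_mutex_lock(&s->lock);
    s->status = 0;           /* e.g. "job completed" */
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->lock);
    return NULL;
}

/* API side: loop on the predicate, exactly like the patch's
 * `while (disk->blockJobStatus == -1 && disk->blockJobSync)` --
 * the loop guards against spurious wakeups. */
static int wait_for_job(blockjob_sync *s)
{
    pthread_t thr;
    pthread_mutex_lock(&s->lock);
    /* The handler thread blocks on the mutex until cond_wait
     * releases it, so the signal cannot be lost. */
    pthread_create(&thr, NULL, deliver_event, s);
    while (s->status == -1)
        pthread_cond_wait(&s->cond, &s->lock);
    pthread_mutex_unlock(&s->lock);
    pthread_join(thr, NULL);
    return s->status;
}
```

The key property this buys over the removed 50ms polling loop: the waiter holds the same lock the handler takes before signalling, so the "status recorded" update and the wakeup are ordered and the backing-chain update can run in the caller's job before the API returns.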

On 03/30/2015 03:26 AM, Peter Krempa wrote:
When the synchronous pivot option is selected, libvirt would not update the backing chain until the job was exitted. Some applications then
s/exitted/exited/
received invalid data as their job serialized first.
This patch removes polling to wait for the ABORT/PIVOT job completion and replaces it with a condition. If a synchronous operation is requested the update of the XML is executed in the job of the caller of the synchronous request. Otherwise the monitor event callback uses a separate worker to update the backing chain with a new job.
This is a regression since 1a92c719101e5bfa6fe2b78006ad04c7f075ea28
unreleased, and therefore worth fixing during freeze.
When the ABORT job is finished synchronously you get the following call stack:
 #0 qemuBlockJobEventProcess
 #1 qemuDomainBlockJobImpl
 #2 qemuDomainBlockJobAbort
 #3 virDomainBlockJobAbort
While previously or while using the _ASYNC flag you'd get:
 #0 qemuBlockJobEventProcess
 #1 processBlockJobEvent
 #2 qemuProcessEventHandler
 #3 virThreadPoolWorker
---
 src/conf/domain_conf.c  | 16 +++++++++++++++-
 src/conf/domain_conf.h  |  6 ++++++
 src/qemu/qemu_driver.c  | 45 +++++++++++++++++++--------------------------
 src/qemu/qemu_process.c | 38 +++++++++++++++++++++++++------------
 4 files changed, 65 insertions(+), 40 deletions(-)
@@ -16389,37 +16395,24 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm,
                                                       status);
             event2 = virDomainEventBlockJob2NewFromObj(vm, disk->dst, type,
                                                        status);
-        } else if (!(flags & VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC)) {
+        } else if (disk->blockJobSync) {
             /* XXX If the event reports failure, we should reflect
              * that back into the return status of this API call.  */
Isn't this comment stale now? That is, since you are waiting for the event to free the condition, and we are listening for both success and failure events, we can now tell if the event reports failure.

Otherwise, looks correct to me. I hate that it is so close to the release deadline, but I hate regressions more (which is why we have freezes). I'd feel better if you got a second review, if you can wrangle one up in time, but here's my:

ACK.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On Mon, Mar 30, 2015 at 14:51:08 -0600, Eric Blake wrote:
On 03/30/2015 03:26 AM, Peter Krempa wrote:
When the synchronous pivot option is selected, libvirt would not update the backing chain until the job was exitted. Some applications then
s/exitted/exited/
received invalid data as their job serialized first.
This patch removes polling to wait for the ABORT/PIVOT job completion and replaces it with a condition. If a synchronous operation is requested the update of the XML is executed in the job of the caller of the synchronous request. Otherwise the monitor event callback uses a separate worker to update the backing chain with a new job.
This is a regression since 1a92c719101e5bfa6fe2b78006ad04c7f075ea28
unreleased, and therefore worth fixing during freeze.
When the ABORT job is finished synchronously you get the following call stack:
 #0 qemuBlockJobEventProcess
 #1 qemuDomainBlockJobImpl
 #2 qemuDomainBlockJobAbort
 #3 virDomainBlockJobAbort
While previously or while using the _ASYNC flag you'd get:
 #0 qemuBlockJobEventProcess
 #1 processBlockJobEvent
 #2 qemuProcessEventHandler
 #3 virThreadPoolWorker
---
 src/conf/domain_conf.c  | 16 +++++++++++++++-
 src/conf/domain_conf.h  |  6 ++++++
 src/qemu/qemu_driver.c  | 45 +++++++++++++++++++--------------------------
 src/qemu/qemu_process.c | 38 +++++++++++++++++++++++++------------
 4 files changed, 65 insertions(+), 40 deletions(-)
@@ -16389,37 +16395,24 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm,
                                                       status);
             event2 = virDomainEventBlockJob2NewFromObj(vm, disk->dst, type,
                                                        status);
-        } else if (!(flags & VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC)) {
+        } else if (disk->blockJobSync) {
             /* XXX If the event reports failure, we should reflect
              * that back into the return status of this API call.  */
Isn't this comment stale now? That is, since you are waiting for the event to free the condition, and we are listening for both success and failure events, we can now tell if the event reports failure.
No, it is not. The code currently does not change the return value depending on the success state of the operation, and I wanted not to change the semantics in this case. I'm going to follow up to this series with a series that refactors qemuDomainBlockJobImpl by splitting it into separate functions instead of the spaghetti monster of a function it is now. In that change I'll then try to fix the semantics.
Otherwise, looks correct to me. I hate that it is so close to the release deadline, but I hate regressions more (which is why we have freezes). I'd feel better if you got a second review, if you can wrangle one up in time, but here's my:
ACK.
Thanks. Peter