[libvirt] [PATCH v3 0/6] Propagate storage errors on migration

diff to v3:
- Took a step further and introduced a condition to wait on

Michal Privoznik (6):
  qemuProcessHandleBlockJob: Set disk->mirrorState more often
  qemuProcessHandleBlockJob: Take status into account
  qemuMigrationDriveMirror: Listen to events
  qemuDomainObjPrivate: Introduce blockJob condition
  qemuMigrationDriveMirror: utilize blockJob condition
  qemuDomainBlockJobImpl: utilize blockJob condition

 src/qemu/qemu_domain.c    |  4 +++-
 src/qemu/qemu_domain.h    |  2 ++
 src/qemu/qemu_driver.c    | 15 ++++++--------
 src/qemu/qemu_migration.c | 50 ++++++++++++++++++++---------------------------
 src/qemu/qemu_process.c   | 40 +++++++++++++++++++++++--------------
 5 files changed, 57 insertions(+), 54 deletions(-)

-- 
2.0.5

Currently, upon a BLOCK_JOB_* event, disk->mirrorState is not updated each time. The callback code handling the events checks whether a block job was started via our public APIs prior to setting the mirrorState. However, some block jobs may be started internally (e.g. during storage migration), in which case we don't bother with setting disk->mirror (there's nothing we can set it to anyway), or other fields. But it will come in handy if we update the mirrorState in these cases too. The event wasn't delivered just for fun - we've started the job after all.

So, in this commit, the mirrorState is set to whatever job status we've obtained. Of course, there are some actions that we want to perform on some statuses. But instead of an if {} else if {} else {} ... enumeration, let's move to a switch().

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/qemu/qemu_process.c | 35 ++++++++++++++++++++---------------
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 43a64a1..f30acbb 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -1041,7 +1041,8 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED,
         /* If we completed a block pull or commit, then update the XML
          * to match.  */
-        if (status == VIR_DOMAIN_BLOCK_JOB_COMPLETED) {
+        switch ((virConnectDomainEventBlockJobStatus) status) {
+        case VIR_DOMAIN_BLOCK_JOB_COMPLETED:
             if (disk->mirrorState == VIR_DOMAIN_DISK_MIRROR_STATE_PIVOT) {
                 if (vm->newDef) {
                     int indx = virDomainDiskIndexByName(vm->newDef, disk->dst,
@@ -1091,20 +1092,24 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED,
             disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN;
             ignore_value(qemuDomainDetermineDiskChain(driver, vm, disk,
                                                       true, true));
-        } else if (disk->mirror &&
-                   (type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY ||
-                    type == VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT)) {
-            if (status == VIR_DOMAIN_BLOCK_JOB_READY) {
-                disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_READY;
-                save = true;
-            } else if (status == VIR_DOMAIN_BLOCK_JOB_FAILED ||
-                       status == VIR_DOMAIN_BLOCK_JOB_CANCELED) {
-                virStorageSourceFree(disk->mirror);
-                disk->mirror = NULL;
-                disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_NONE;
-                disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN;
-                save = true;
-            }
+            break;
+
+        case VIR_DOMAIN_BLOCK_JOB_READY:
+            disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_READY;
+            save = true;
+            break;
+
+        case VIR_DOMAIN_BLOCK_JOB_FAILED:
+        case VIR_DOMAIN_BLOCK_JOB_CANCELED:
+            virStorageSourceFree(disk->mirror);
+            disk->mirror = NULL;
+            disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_NONE;
+            disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN;
+            save = true;
+            break;
+
+        case VIR_DOMAIN_BLOCK_JOB_LAST:
+            break;
         }
     }
-- 
2.0.5

On Fri, Feb 13, 2015 at 16:24:28 +0100, Michal Privoznik wrote:
Currently, upon a BLOCK_JOB_* event, disk->mirrorState is not updated each time. The callback code handling the events checks whether a block job was started via our public APIs prior to setting the mirrorState. However, some block jobs may be started internally (e.g. during storage migration), in which case we don't bother with setting disk->mirror (there's nothing we can set it to anyway), or other fields. But it will come in handy if we update the mirrorState in these cases too. The event wasn't delivered just for fun - we've started the job after all.
So, in this commit, the mirrorState is set to whatever job status we've obtained. Of course, there are some actions that we want to perform on some statuses. But instead of an if {} else if {} else {} ... enumeration, let's move to a switch().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_process.c | 35 ++++++++++++++++++++--------------- 1 file changed, 20 insertions(+), 15 deletions(-)
ACK

Jirka

Upon BLOCK_JOB_COMPLETED event delivery, we check if the job has completed (in qemuMonitorJSONHandleBlockJobImpl()). To paint a better picture, the event looks something like this:

{"timestamp": {"seconds": 1423582694, "microseconds": 372666},
 "event": "BLOCK_JOB_COMPLETED",
 "data": {"device": "drive-virtio-disk0", "len": 8412790784,
          "offset": 409993216, "speed": 8796093022207, "type": "mirror",
          "error": "No space left on device"}}

If "len" does not equal "offset" it's considered an error, and we can clearly see the "error" field filled in. However, later in the event processing this case was handled no differently from the case of a job being aborted via a separate API. It's time we start differentiating these two, because of future work.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/qemu/qemu_process.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index f30acbb..9cfc0f3 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -1103,7 +1103,8 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED,
         case VIR_DOMAIN_BLOCK_JOB_CANCELED:
             virStorageSourceFree(disk->mirror);
             disk->mirror = NULL;
-            disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_NONE;
+            disk->mirrorState = status == VIR_DOMAIN_BLOCK_JOB_FAILED ?
+                VIR_DOMAIN_DISK_MIRROR_STATE_ABORT : VIR_DOMAIN_DISK_MIRROR_STATE_NONE;
             disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN;
             save = true;
             break;
-- 
2.0.5

On Fri, Feb 13, 2015 at 16:24:29 +0100, Michal Privoznik wrote:
Upon BLOCK_JOB_COMPLETED event delivery, we check if the job has completed (in qemuMonitorJSONHandleBlockJobImpl()). To paint a better picture, the event looks something like this:
{"timestamp": {"seconds": 1423582694, "microseconds": 372666}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-disk0", "len": 8412790784, "offset": 409993216, "speed": 8796093022207, "type": "mirror", "error": "No space left on device"}}
If "len" does not equal "offset" it's considered an error, and we can clearly see the "error" field filled in. However, later in the event processing this case was handled no differently from the case of a job being aborted via a separate API. It's time we start differentiating these two, because of future work.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_process.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index f30acbb..9cfc0f3 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -1103,7 +1103,8 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED,
         case VIR_DOMAIN_BLOCK_JOB_CANCELED:
             virStorageSourceFree(disk->mirror);
             disk->mirror = NULL;
-            disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_NONE;
+            disk->mirrorState = status == VIR_DOMAIN_BLOCK_JOB_FAILED ?
+                VIR_DOMAIN_DISK_MIRROR_STATE_ABORT : VIR_DOMAIN_DISK_MIRROR_STATE_NONE;
             disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN;
             save = true;
             break;
ACK

Jirka

https://bugzilla.redhat.com/show_bug.cgi?id=1179678

When migrating with storage, libvirt iterates over domain disks and instructs qemu to migrate the ones we are interested in (shared, RO and source-less disks are skipped). The disks are migrated in series. No new disk is transferred until the previous one has been quiesced. This is checked on the qemu monitor via the 'query-block-jobs' command. If the disk has been quiesced, it has practically gone from copying its content to the mirroring state, where all disk writes are mirrored to the other side of the migration too.

Having said that, there's one inherent error in the design. The monitor command we use reports only active jobs. So if the job fails for whatever reason, we will not see it in the command output anymore. And this can happen fairly easily: just try to migrate a domain with storage. If the storage migration fails (e.g. due to ENOSPC on the destination) we resume the guest on the destination and let it run on a partly copied disk.

The proper fix is what even the comment in the code says: listen for qemu events instead of polling. If storage migration changes state, an event is emitted and we can act accordingly: either consider the disk copied and continue the process, or consider the disk mangled and abort the migration.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/qemu/qemu_migration.c | 37 +++++++++++++++++--------------------
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index 4506d87..14a4ec6 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -1739,7 +1739,6 @@ qemuMigrationDriveMirror(virQEMUDriverPtr driver,
     for (i = 0; i < vm->def->ndisks; i++) {
         virDomainDiskDefPtr disk = vm->def->disks[i];
-        virDomainBlockJobInfo info;

         /* skip shared, RO and source-less disks */
         if (disk->src->shared || disk->src->readonly ||
@@ -1772,38 +1771,36 @@ qemuMigrationDriveMirror(virQEMUDriverPtr driver,
             /* Poll every 500ms for progress & to allow cancellation */
             struct timespec ts = { .tv_sec = 0, .tv_nsec = 500 * 1000 * 1000ull };

-            memset(&info, 0, sizeof(info));
-
-            if (qemuDomainObjEnterMonitorAsync(driver, vm,
-                                               QEMU_ASYNC_JOB_MIGRATION_OUT) < 0)
+            /* Explicitly check if domain is still alive. Maybe qemu
+             * died meanwhile so we won't see any event at all. */
+            if (!virDomainObjIsActive(vm)) {
+                virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                               _("guest unexpectedly quit"));
                 goto error;
+            }
+
+            /* The following check should be race free as long as the variable
+             * is set only with domain object locked. And here we have the
+             * domain object locked too.  */
             if (priv->job.asyncAbort) {
-                /* explicitly do this *after* we entered the monitor,
-                 * as this is a critical section so we are guaranteed
-                 * priv->job.asyncAbort will not change */
-                ignore_value(qemuDomainObjExitMonitor(driver, vm));
                 priv->job.current->type = VIR_DOMAIN_JOB_CANCELLED;
                 virReportError(VIR_ERR_OPERATION_ABORTED, _("%s: %s"),
                                qemuDomainAsyncJobTypeToString(priv->job.asyncJob),
                                _("canceled by client"));
                 goto error;
             }

-            mon_ret = qemuMonitorBlockJobInfo(priv->mon, diskAlias, &info,
-                                              NULL);
-            if (qemuDomainObjExitMonitor(driver, vm) < 0)
-                goto error;
-            if (mon_ret < 0)
-                goto error;
-
-            if (info.cur == info.end) {
+            if (disk->mirrorState == VIR_DOMAIN_DISK_MIRROR_STATE_READY) {
                 VIR_DEBUG("Drive mirroring of '%s' completed", diskAlias);
                 break;
+            } else if (disk->mirrorState == VIR_DOMAIN_DISK_MIRROR_STATE_ABORT) {
+                virReportError(VIR_ERR_OPERATION_FAILED,
+                               _("migration of disk %s failed"),
+                               disk->dst);
+                goto error;
             }

-            /* XXX Frankly speaking, we should listen to the events,
-             * instead of doing this. But this works for now and we
-             * are doing something similar in migration itself anyway */
+            /* XXX Turn this into virCond someday. */

             virObjectUnlock(vm);
-- 
2.0.5

On Fri, Feb 13, 2015 at 16:24:30 +0100, Michal Privoznik wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1179678
When migrating with storage, libvirt iterates over domain disks and instructs qemu to migrate the ones we are interested in (shared, RO and source-less disks are skipped). The disks are migrated in series. No new disk is transferred until the previous one has been quiesced. This is checked on the qemu monitor via the 'query-block-jobs' command. If the disk has been quiesced, it has practically gone from copying its content to the mirroring state, where all disk writes are mirrored to the other side of the migration too.

Having said that, there's one inherent error in the design. The monitor command we use reports only active jobs. So if the job fails for whatever reason, we will not see it in the command output anymore. And this can happen fairly easily: just try to migrate a domain with storage. If the storage migration fails (e.g. due to ENOSPC on the destination) we resume the guest on the destination and let it run on a partly copied disk.
The proper fix is what even the comment in the code says: listen for qemu events instead of polling. If storage migration changes state, an event is emitted and we can act accordingly: either consider the disk copied and continue the process, or consider the disk mangled and abort the migration.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_migration.c | 37 +++++++++++++++++-------------------- 1 file changed, 17 insertions(+), 20 deletions(-)
ACK, please push patches 1..3 since I realized turning all this into a condition is not going to be that easy...

Jirka

So far the condition is not used, but it will be. There are some operations where we actively wait for a block job to complete. Instead of locking and unlocking the domain object, and entering and leaving the monitor, let's just use a condition and let our monitor event handling code wake us up when needed.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/qemu/qemu_domain.c | 4 +++-
 src/qemu/qemu_domain.h | 2 ++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 99c46d4..28961d2 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -406,7 +406,8 @@ qemuDomainObjPrivateAlloc(void)
         goto error;
     }

-    if (virCondInit(&priv->unplugFinished) < 0)
+    if (virCondInit(&priv->unplugFinished) < 0 ||
+        virCondInit(&priv->blockJob) < 0)
         goto error;

     if (!(priv->devs = virChrdevAlloc()))
@@ -439,6 +440,7 @@ qemuDomainObjPrivateFree(void *data)
     VIR_FREE(priv->origname);

     virCondDestroy(&priv->unplugFinished);
+    virCondDestroy(&priv->blockJob);
     virChrdevFree(priv->devs);

     /* This should never be non-NULL if we get here, but just in case... */
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index b2c3881..db9ffac 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -186,6 +186,8 @@ struct _qemuDomainObjPrivate {
     const char *unpluggingDevice; /* alias of the device that is being unplugged */
     char **qemuDevices; /* NULL-terminated list of devices aliases known to QEMU */

+    virCond blockJob; /* signals that one of disks translated state of a block job */
+
     bool hookRun; /* true if there was a hook run over this domain */
     virBitmapPtr autoNodeset;
 };
-- 
2.0.5

On Fri, Feb 13, 2015 at 16:24:31 +0100, Michal Privoznik wrote:
So far the condition is not used, but it will be. There are some operations where we actively wait for a block job to complete. Instead of locking and unlocking the domain object, and entering and leaving the monitor, let's just use a condition and let our monitor event handling code wake us up when needed.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_domain.c | 4 +++- src/qemu/qemu_domain.h | 2 ++ 2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 99c46d4..28961d2 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -406,7 +406,8 @@ qemuDomainObjPrivateAlloc(void)
         goto error;
     }

-    if (virCondInit(&priv->unplugFinished) < 0)
+    if (virCondInit(&priv->unplugFinished) < 0 ||
+        virCondInit(&priv->blockJob) < 0)
         goto error;

     if (!(priv->devs = virChrdevAlloc()))
@@ -439,6 +440,7 @@ qemuDomainObjPrivateFree(void *data)
     VIR_FREE(priv->origname);

     virCondDestroy(&priv->unplugFinished);
+    virCondDestroy(&priv->blockJob);
     virChrdevFree(priv->devs);

     /* This should never be non-NULL if we get here, but just in case... */
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index b2c3881..db9ffac 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -186,6 +186,8 @@ struct _qemuDomainObjPrivate {
     const char *unpluggingDevice; /* alias of the device that is being unplugged */
     char **qemuDevices; /* NULL-terminated list of devices aliases known to QEMU */
+ virCond blockJob; /* signals that one of disks translated state of a block job */
Wouldn't "signals whenever a block job changes its state" or something similar be more readable?
+ bool hookRun; /* true if there was a hook run over this domain */ virBitmapPtr autoNodeset; };
Jirka

Instead of unlocking and locking the domain object every 500ms, let's just wait on the blockJob condition and run the loop body only when a BLOCK_JOB event occurs.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/qemu/qemu_migration.c | 15 +++++----------
 src/qemu/qemu_process.c   |  4 ++++
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index 14a4ec6..998e8f5 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -1768,9 +1768,6 @@ qemuMigrationDriveMirror(virQEMUDriverPtr driver,

         /* wait for completion */
         while (true) {
-            /* Poll every 500ms for progress & to allow cancellation */
-            struct timespec ts = { .tv_sec = 0, .tv_nsec = 500 * 1000 * 1000ull };
-
             /* Explicitly check if domain is still alive. Maybe qemu
              * died meanwhile so we won't see any event at all. */
             if (!virDomainObjIsActive(vm)) {
@@ -1800,13 +1797,11 @@ qemuMigrationDriveMirror(virQEMUDriverPtr driver,
                 goto error;
             }

-            /* XXX Turn this into virCond someday. */
-
-            virObjectUnlock(vm);
-
-            nanosleep(&ts, NULL);
-
-            virObjectLock(vm);
+            if (virCondWait(&priv->blockJob, &vm->parent.lock) < 0) {
+                virReportSystemError(errno, "%s",
+                                     _("Unable to wait on blockJob condition"));
+                goto error;
+            }
         }
     }

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 9cfc0f3..7b87cdb 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -1017,6 +1017,7 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED,
                           void *opaque)
 {
     virQEMUDriverPtr driver = opaque;
+    qemuDomainObjPrivatePtr priv;
     virObjectEventPtr event = NULL;
     virObjectEventPtr event2 = NULL;
     const char *path;
@@ -1026,6 +1027,7 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED,
     bool save = false;

     virObjectLock(vm);
+    priv = vm->privateData;
     disk = qemuProcessFindDomainDiskByAlias(vm, diskAlias);

     if (disk) {
@@ -1112,6 +1114,8 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED,
         case VIR_DOMAIN_BLOCK_JOB_LAST:
             break;
         }
+
+        virCondSignal(&priv->blockJob);
     }

     if (save) {
-- 
2.0.5

On Fri, Feb 13, 2015 at 16:24:32 +0100, Michal Privoznik wrote:
Instead of unlocking and locking the domain object every 500ms, let's just wait on the blockJob condition and run the loop body only when a BLOCK_JOB event occurs.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_migration.c | 15 +++++---------- src/qemu/qemu_process.c | 4 ++++ 2 files changed, 9 insertions(+), 10 deletions(-)
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 14a4ec6..998e8f5 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1768,9 +1768,6 @@ qemuMigrationDriveMirror(virQEMUDriverPtr driver,
         /* wait for completion */
         while (true) {
-            /* Poll every 500ms for progress & to allow cancellation */
-            struct timespec ts = { .tv_sec = 0, .tv_nsec = 500 * 1000 * 1000ull };
-
             /* Explicitly check if domain is still alive. Maybe qemu
              * died meanwhile so we won't see any event at all. */
             if (!virDomainObjIsActive(vm)) {
@@ -1800,13 +1797,11 @@ qemuMigrationDriveMirror(virQEMUDriverPtr driver,
                 goto error;
             }

-            /* XXX Turn this into virCond someday. */
-
-            virObjectUnlock(vm);
-
-            nanosleep(&ts, NULL);
-
-            virObjectLock(vm);
+            if (virCondWait(&priv->blockJob, &vm->parent.lock) < 0) {
Hmm, you'd also need to signal this condition on EOF, otherwise this could be waiting forever when the domain dies during drive mirror job. But see below...
+                virReportSystemError(errno, "%s",
+                                     _("Unable to wait on blockJob condition"));
+                goto error;
+            }
         }
     }
The main problem with this patch is hidden in the (missing) context:

    if (priv->job.asyncAbort) {
        priv->job.current->type = VIR_DOMAIN_JOB_CANCELLED;
        virReportError(VIR_ERR_OPERATION_ABORTED, _("%s: %s"),
                       qemuDomainAsyncJobTypeToString(priv->job.asyncJob),
                       _("canceled by client"));
        goto error;
    }

The loop would never break when priv->job.asyncAbort is set because it would still be waiting for priv->blockJob to be signalled. So we need to somehow turn this into a condition too. And because there's no way we could wait for more conditions at once, we'll need to think more about what to do. One possibility is to create a general jobCond which would be signalled in several places: when asyncAbort is set, when a block job changes its state, on monitor EOF, ...

Jirka

Instead of unlocking and locking the domain object every 50ms, let's just wait on the blockJob condition and run the loop body only when a BLOCK_JOB event occurs.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---

This one, well, I still left the only monitor call in as I'm not very familiar with this code, so I don't know if it can be optimized further. But hey, the 50ms sleep is gone!

 src/qemu/qemu_driver.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 709f468..9298619 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -15872,10 +15872,6 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm,
         /* XXX If the event reports failure, we should reflect
          * that back into the return status of this API call. */
         while (1) {
-            /* Poll every 50ms */
-            static struct timespec ts = {
-                .tv_sec = 0,
-                .tv_nsec = 50 * 1000 * 1000ull };
             virDomainBlockJobInfo dummy;

             qemuDomainObjEnterMonitor(driver, vm);
@@ -15886,11 +15882,12 @@ qemuDomainBlockJobImpl(virDomainObjPtr vm,
             if (ret <= 0)
                 break;

-            virObjectUnlock(vm);
-
-            nanosleep(&ts, NULL);
-
-            virObjectLock(vm);
+            if (virCondWait(&priv->blockJob, &vm->parent.lock) < 0) {
+                virReportSystemError(errno, "%s",
+                                     _("Unable to wait on blockJob condition"));
+                ret = -1;
+                break;
+            }

             if (!virDomainObjIsActive(vm)) {
                 virReportError(VIR_ERR_OPERATION_INVALID, "%s",
-- 
2.0.5

On Fri, Feb 13, 2015 at 16:24:33 +0100, Michal Privoznik wrote:
Instead of unlocking and locking the domain object every 50ms, let's just wait on the blockJob condition and run the loop body only when a BLOCK_JOB event occurs.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> ---
This one, well, I still left the only monitor call in as I'm not very familiar with this code, so I don't know if it can be optimized further. But hey, the 50ms sleep is gone!
It seems the monitor call just checks the block job status, which seems redundant since that should already be present in disk->mirrorState thanks to the block job event handler.

Jirka