[libvirt] [PATCH v3 00/24] Add support for migration events

QEMU will soon (patches are available on qemu-devel) get support for migration events, which will finally allow us to get rid of polling query-migrate every 50ms. However, we first need to be able to wait for all events related to migration (migration status changes, block job events, async abort requests) at once. This series prepares the infrastructure and uses it to switch all polling loops in migration code to pthread_cond_wait.

https://bugzilla.redhat.com/show_bug.cgi?id=1212077

Version 3 (see individual patches for details):
- most of the series has been ACKed in v2
- "qemu: Use domain condition for synchronous block jobs" was split in 3 patches for easier review
- minor changes requested in v2 review

Version 2 (see individual patches for details):
- rewritten using per-domain condition variable
- enhanced to fully support the migration events

Jiri Denemark (24):
  conf: Introduce per-domain condition variable
  qemu: Introduce qemuBlockJobUpdate
  qemu: Properly report failed migration
  qemu: Use domain condition for synchronous block jobs
  qemu: Cancel storage migration in parallel
  qemu: Abort migration early if disk mirror failed
  qemu: Don't mess with disk->mirrorState
  Pass domain object to private data formatter/parser
  qemu: Make qemuMigrationCancelDriveMirror usable without async job
  qemu: Refactor qemuMonitorBlockJobInfo
  qemu: Cancel disk mirrors after libvirtd restart
  qemu: Use domain condition for asyncAbort
  qemu_monitor: Wire up SPICE_MIGRATE_COMPLETED event
  qemu: Do not poll for spice migration status
  qemu: Refactor qemuDomainGetJob{Info,Stats}
  qemu: Refactor qemuMigrationUpdateJobStatus
  qemu: Don't pass redundant job name around
  qemu: Refactor qemuMigrationWaitForCompletion
  qemu_monitor: Wire up MIGRATION event
  qemuDomainGetJobStatsInternal: Support migration events
  qemu: Update migration state according to MIGRATION event
  qemu: Wait for migration events on domain condition
  qemu: cancel drive mirrors when p2p connection breaks
  DO NOT APPLY: qemu: Work around weird migration status changes

 po/POTFILES.in               |   1 -
 src/conf/domain_conf.c       |  51 ++-
 src/conf/domain_conf.h       |  12 +-
 src/libvirt_private.syms     |   6 +
 src/libxl/libxl_domain.c     |  10 +-
 src/lxc/lxc_domain.c         |  12 +-
 src/qemu/qemu_blockjob.c     | 185 +++--------
 src/qemu/qemu_blockjob.h     |  15 +-
 src/qemu/qemu_capabilities.c |   3 +
 src/qemu/qemu_capabilities.h |   1 +
 src/qemu/qemu_domain.c       |  78 +++--
 src/qemu/qemu_domain.h       |   7 +-
 src/qemu/qemu_driver.c       | 201 +++++-----
 src/qemu/qemu_migration.c    | 763 +++++++++++++++++++++++++++++--------------
 src/qemu/qemu_migration.h    |   8 +
 src/qemu/qemu_monitor.c      |  73 ++++-
 src/qemu/qemu_monitor.h      |  33 +-
 src/qemu/qemu_monitor_json.c | 152 ++++-----
 src/qemu/qemu_monitor_json.h |   7 +-
 src/qemu/qemu_process.c      |  92 +++++-
 tests/qemumonitorjsontest.c  |  40 ---
 21 files changed, 1057 insertions(+), 693 deletions(-)

--
2.4.3

Complex jobs, such as migration, need to monitor several events at once, which is impossible when each of the event uses its own condition variable. This patch adds a single condition variable to each domain object. This variable can be used instead of the other event specific conditions. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- Notes: ACKed in version 2 Version 3: - rebased (context conflict in libvirt_private.syms) Version 2: - new patch which replaces thread queues and conditions (patch 1 and 2 in version 1) src/conf/domain_conf.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ src/conf/domain_conf.h | 6 ++++++ src/libvirt_private.syms | 4 ++++ 3 files changed, 57 insertions(+) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 36de844..433183f 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -2509,6 +2509,7 @@ static void virDomainObjDispose(void *obj) virDomainObjPtr dom = obj; VIR_DEBUG("obj=%p", dom); + virCondDestroy(&dom->cond); virDomainDefFree(dom->def); virDomainDefFree(dom->newDef); @@ -2529,6 +2530,12 @@ virDomainObjNew(virDomainXMLOptionPtr xmlopt) if (!(domain = virObjectLockableNew(virDomainObjClass))) return NULL; + if (virCondInit(&domain->cond) < 0) { + virReportSystemError(errno, "%s", + _("failed to initialize domain condition")); + goto error; + } + if (xmlopt->privateData.alloc) { if (!(domain->privateData = (xmlopt->privateData.alloc)())) goto error; @@ -2651,6 +2658,46 @@ virDomainObjEndAPI(virDomainObjPtr *vm) } +void +virDomainObjSignal(virDomainObjPtr vm) +{ + virCondSignal(&vm->cond); +} + + +void +virDomainObjBroadcast(virDomainObjPtr vm) +{ + virCondBroadcast(&vm->cond); +} + + +int +virDomainObjWait(virDomainObjPtr vm) +{ + if (virCondWait(&vm->cond, &vm->parent.lock) < 0) { + virReportSystemError(errno, "%s", + _("failed to wait for domain condition")); + return -1; + } + return 0; +} + + +int +virDomainObjWaitUntil(virDomainObjPtr vm, + unsigned long long whenms) +{ + if (virCondWaitUntil(&vm->cond, &vm->parent.lock, whenms) < 0 && + errno != ETIMEDOUT) { + virReportSystemError(errno, "%s", + _("failed to wait for domain condition")); + return -1; + } + return 0; +} + + /* * * If flags & VIR_DOMAIN_OBJ_LIST_ADD_CHECK_LIVE then diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index ba17a8d..ac29ce5 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2318,6 +2318,7 @@ typedef struct _virDomainObj virDomainObj; typedef virDomainObj *virDomainObjPtr; struct _virDomainObj { virObjectLockable parent; + virCond cond; pid_t pid; virDomainStateReason state; @@ -2437,6 +2438,11 @@ void virDomainObjEndAPI(virDomainObjPtr *vm); bool virDomainObjTaint(virDomainObjPtr obj, virDomainTaintFlags taint); +void virDomainObjSignal(virDomainObjPtr vm); +void virDomainObjBroadcast(virDomainObjPtr vm); +int virDomainObjWait(virDomainObjPtr vm); +int virDomainObjWaitUntil(virDomainObjPtr vm, + unsigned long long whenms); int virDomainDefCheckUnsupportedMemoryHotplug(virDomainDefPtr def); int virDomainDeviceDefCheckUnsupportedMemoryDevice(virDomainDeviceDefPtr dev); diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 55a5e19..62a4b4c 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -380,6 +380,7 @@ virDomainNetTypeToString; virDomainNostateReasonTypeFromString; virDomainNostateReasonTypeToString; virDomainObjAssignDef; +virDomainObjBroadcast; virDomainObjCopyPersistentDef; virDomainObjEndAPI; virDomainObjFormat; @@ -408,8 +409,11 @@ virDomainObjParseNode; 
virDomainObjSetDefTransient; virDomainObjSetMetadata; virDomainObjSetState; +virDomainObjSignal; virDomainObjTaint; virDomainObjUpdateModificationImpact; +virDomainObjWait; +virDomainObjWaitUntil; virDomainOSTypeFromString; virDomainOSTypeToString; virDomainParseMemory; -- 2.4.3
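For readers skimming the series, the usage pattern this condition enables looks roughly like the following condensed sketch (based on how later patches in the series use it; somethingPending() and recordEvent() are hypothetical stand-ins, error handling trimmed):

    /* Waiter: check a predicate in a loop with the domain lock held;
     * virDomainObjWait() atomically drops vm->parent.lock while sleeping. */
    virObjectLock(vm);
    while (somethingPending(vm)) {          /* hypothetical predicate */
        if (virDomainObjWait(vm) < 0)
            goto error;
    }
    virObjectUnlock(vm);

    /* Event handler: record the event and wake up all waiters. */
    virObjectLock(vm);
    recordEvent(vm, event);                 /* hypothetical helper */
    virDomainObjBroadcast(vm);
    virObjectUnlock(vm);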

The wrapper is useful for calling qemuBlockJobEventProcess with the event details stored in disk's privateData, which is the most likely usage of qemuBlockJobEventProcess. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- Notes: ACKed in version 1 and 2 Version 3: - better flow in qemuBlockJobUpdate - document qemuBlockJobUpdate Version 2: - no changes ACKed in version 1 and 2 Version 3: - better flow in qemuBlockJobUpdate Version 2: - no changes src/libvirt_private.syms | 2 ++ src/qemu/qemu_blockjob.c | 47 +++++++++++++++++++++++++++++++++++++++-------- src/qemu/qemu_blockjob.h | 3 +++ 3 files changed, 44 insertions(+), 8 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 62a4b4c..0c9fa06 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -265,6 +265,8 @@ virDomainDiskInsert; virDomainDiskInsertPreAlloced; virDomainDiskIoTypeFromString; virDomainDiskIoTypeToString; +virDomainDiskMirrorStateTypeFromString; +virDomainDiskMirrorStateTypeToString; virDomainDiskPathByName; virDomainDiskRemove; virDomainDiskRemoveByName; diff --git a/src/qemu/qemu_blockjob.c b/src/qemu/qemu_blockjob.c index 098a43a..eb05cef 100644 --- a/src/qemu/qemu_blockjob.c +++ b/src/qemu/qemu_blockjob.c @@ -38,6 +38,37 @@ VIR_LOG_INIT("qemu.qemu_blockjob"); + +/** + * qemuBlockJobUpdate: + * @driver: qemu driver + * @vm: domain + * @disk: domain disk + * + * Update disk's mirror state in response to a block job event stored in + * blockJobStatus by qemuProcessHandleBlockJob event handler. + * + * Returns the block job event processed or -1 if there was no pending event. + */ +int +qemuBlockJobUpdate(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDiskDefPtr disk) +{ + qemuDomainDiskPrivatePtr diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk); + int status = diskPriv->blockJobStatus; + + if (status != -1) { + qemuBlockJobEventProcess(driver, vm, disk, + diskPriv->blockJobType, + diskPriv->blockJobStatus); + diskPriv->blockJobStatus = -1; + } + + return status; +} + + /** * qemuBlockJobEventProcess: * @driver: qemu driver @@ -49,8 +80,6 @@ VIR_LOG_INIT("qemu.qemu_blockjob"); * Update disk's mirror state in response to a block job event * from QEMU. For mirror state's that must survive libvirt * restart, also update the domain's status XML. - * - * Returns 0 on success, -1 otherwise. */ void qemuBlockJobEventProcess(virQEMUDriverPtr driver, @@ -67,6 +96,12 @@ qemuBlockJobEventProcess(virQEMUDriverPtr driver, bool save = false; qemuDomainDiskPrivatePtr diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk); + VIR_DEBUG("disk=%s, mirrorState=%s, type=%d, status=%d", + disk->dst, + NULLSTR(virDomainDiskMirrorStateTypeToString(disk->mirrorState)), + type, + status); + /* Have to generate two variants of the event for old vs. 
new * client callbacks */ if (type == VIR_DOMAIN_BLOCK_JOB_TYPE_COMMIT && @@ -218,9 +253,7 @@ qemuBlockJobSyncEnd(virQEMUDriverPtr driver, if (diskPriv->blockJobSync && diskPriv->blockJobStatus != -1) { if (ret_status) *ret_status = diskPriv->blockJobStatus; - qemuBlockJobEventProcess(driver, vm, disk, - diskPriv->blockJobType, - diskPriv->blockJobStatus); + qemuBlockJobUpdate(driver, vm, disk); diskPriv->blockJobStatus = -1; } diskPriv->blockJobSync = false; @@ -300,9 +333,7 @@ qemuBlockJobSyncWaitWithTimeout(virQEMUDriverPtr driver, if (ret_status) *ret_status = diskPriv->blockJobStatus; - qemuBlockJobEventProcess(driver, vm, disk, - diskPriv->blockJobType, - diskPriv->blockJobStatus); + qemuBlockJobUpdate(driver, vm, disk); diskPriv->blockJobStatus = -1; return 0; diff --git a/src/qemu/qemu_blockjob.h b/src/qemu/qemu_blockjob.h index ba372a2..81e893e 100644 --- a/src/qemu/qemu_blockjob.h +++ b/src/qemu/qemu_blockjob.h @@ -25,6 +25,9 @@ # include "internal.h" # include "qemu_conf.h" +int qemuBlockJobUpdate(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDiskDefPtr disk); void qemuBlockJobEventProcess(virQEMUDriverPtr driver, virDomainObjPtr vm, virDomainDiskDefPtr disk, -- 2.4.3

On Wed, Jun 10, 2015 at 15:42:36 +0200, Jiri Denemark wrote:
The wrapper is useful for calling qemuBlockJobEventProcess with the event details stored in disk's privateData, which is the most likely usage of qemuBlockJobEventProcess.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> ---
Notes: ACKed in version 1 and 2
Version 3: - better flow in qemuBlockJobUpdate - document qemuBlockJobUpdate
Version 2: - no changes
 src/libvirt_private.syms |  2 ++
 src/qemu/qemu_blockjob.c | 47 +++++++++++++++++++++++++++++++++++++++--------
 src/qemu/qemu_blockjob.h |  3 +++
 3 files changed, 44 insertions(+), 8 deletions(-)
...
diff --git a/src/qemu/qemu_blockjob.c b/src/qemu/qemu_blockjob.c
index 098a43a..eb05cef 100644
--- a/src/qemu/qemu_blockjob.c
+++ b/src/qemu/qemu_blockjob.c
@@ -38,6 +38,37 @@
VIR_LOG_INIT("qemu.qemu_blockjob");
+
+/**
+ * qemuBlockJobUpdate:
+ * @driver: qemu driver
+ * @vm: domain
+ * @disk: domain disk
+ *
+ * Update disk's mirror state in response to a block job event stored in
+ * blockJobStatus by qemuProcessHandleBlockJob event handler.
+ *
+ * Returns the block job event processed or -1 if there was no pending event.
+ */
+int
+qemuBlockJobUpdate(virQEMUDriverPtr driver,
+                   virDomainObjPtr vm,
+                   virDomainDiskDefPtr disk)
+{
+    qemuDomainDiskPrivatePtr diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk);
+    int status = diskPriv->blockJobStatus;
+
+    if (status != -1) {
+        qemuBlockJobEventProcess(driver, vm, disk,
+                                 diskPriv->blockJobType,
+                                 diskPriv->blockJobStatus);
+        diskPriv->blockJobStatus = -1;
+    }
+
+    return status;
+}
Looks much better!
+
+
 /**
  * qemuBlockJobEventProcess:
  * @driver: qemu driver
ACK, Peter

Because we are polling we may detect some errors after we asked QEMU for migration status even though they occurred before. If this happens and QEMU reports migration completed successfully, we would happily report the migration succeeded even though we should have cancelled it because of the other error. In practise it is not a big issue now but it will become a much bigger issue once the check for storage migration status is moved inside the loop in qemuMigrationWaitForCompletion. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- Notes: ACKed in version 2 Version 3: - no change Version 2: - already ACKed in version 1 :-) - really do what commit message describes src/qemu/qemu_migration.c | 48 +++++++++++++++++++++++------------------------ 1 file changed, 23 insertions(+), 25 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 70400f3..8d01468 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -2442,6 +2442,7 @@ qemuMigrationWaitForCompletion(virQEMUDriverPtr driver, qemuDomainJobInfoPtr jobInfo = priv->job.current; const char *job; int pauseReason; + int ret = -1; switch (priv->job.asyncJob) { case QEMU_ASYNC_JOB_MIGRATION_OUT: @@ -2459,12 +2460,12 @@ qemuMigrationWaitForCompletion(virQEMUDriverPtr driver, jobInfo->type = VIR_DOMAIN_JOB_UNBOUNDED; - while (1) { + while (jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED) { /* Poll every 50ms for progress & to allow cancellation */ struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000ull }; if (qemuMigrationUpdateJobStatus(driver, vm, job, asyncJob) == -1) - break; + goto error; /* cancel migration if disk I/O error is emitted while migrating */ if (abort_on_error && @@ -2472,40 +2473,37 @@ qemuMigrationWaitForCompletion(virQEMUDriverPtr driver, pauseReason == VIR_DOMAIN_PAUSED_IOERROR) { virReportError(VIR_ERR_OPERATION_FAILED, _("%s: %s"), job, _("failed due to I/O error")); - break; + goto error; } if (dconn && virConnectIsAlive(dconn) <= 0) { virReportError(VIR_ERR_OPERATION_FAILED, "%s", _("Lost connection to destination host")); - break; + goto error; } - if (jobInfo->type != VIR_DOMAIN_JOB_UNBOUNDED) - break; - - virObjectUnlock(vm); - - nanosleep(&ts, NULL); - - virObjectLock(vm); + if (jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED) { + virObjectUnlock(vm); + nanosleep(&ts, NULL); + virObjectLock(vm); + } } - if (jobInfo->type == VIR_DOMAIN_JOB_COMPLETED) { - qemuDomainJobInfoUpdateDowntime(jobInfo); - VIR_FREE(priv->job.completed); - if (VIR_ALLOC(priv->job.completed) == 0) - *priv->job.completed = *jobInfo; - return 0; - } else if (jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED) { - /* The migration was aborted by us rather than QEMU itself so let's - * update the job type and notify the caller to send migrate_cancel. - */ + qemuDomainJobInfoUpdateDowntime(jobInfo); + VIR_FREE(priv->job.completed); + if (VIR_ALLOC(priv->job.completed) == 0) + *priv->job.completed = *jobInfo; + return 0; + + error: + /* Check if the migration was aborted by us rather than QEMU itself. */ + if (jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED || + jobInfo->type == VIR_DOMAIN_JOB_COMPLETED) { + if (jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED) + ret = -2; jobInfo->type = VIR_DOMAIN_JOB_FAILED; - return -2; - } else { - return -1; } + return ret; } -- 2.4.3
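The return-value contract this patch establishes, condensed from how qemuMigrationRun consumes the result elsewhere in the series (a sketch, not a complete caller):

    rc = qemuMigrationWaitForCompletion(driver, vm,
                                        QEMU_ASYNC_JOB_MIGRATION_OUT,
                                        dconn, abort_on_error);
    if (rc == -2) {
        /* we aborted the migration ourselves; QEMU still thinks it is
         * running, so the caller has to send migrate_cancel */
        goto cancel;
    } else if (rc == -1) {
        /* migration failed on the QEMU side */
        goto cleanup;
    }
    /* rc == 0: migration completed, priv->job.completed has been filled in */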

By switching block jobs to use domain conditions, we can drop some pretty complicated code in NBD storage migration. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- Notes: Version 3: - split into 3 patches Version 2: - slightly modified to use domain conditions po/POTFILES.in | 1 - src/qemu/qemu_blockjob.c | 137 +++------------------------------------------- src/qemu/qemu_blockjob.h | 12 +--- src/qemu/qemu_domain.c | 17 +----- src/qemu/qemu_domain.h | 1 - src/qemu/qemu_driver.c | 24 ++++---- src/qemu/qemu_migration.c | 112 +++++++++++++++++-------------------- src/qemu/qemu_process.c | 13 ++--- 8 files changed, 76 insertions(+), 241 deletions(-) diff --git a/po/POTFILES.in b/po/POTFILES.in index bb0f6e1..dd06ab3 100644 --- a/po/POTFILES.in +++ b/po/POTFILES.in @@ -112,7 +112,6 @@ src/parallels/parallels_utils.h src/parallels/parallels_storage.c src/phyp/phyp_driver.c src/qemu/qemu_agent.c -src/qemu/qemu_blockjob.c src/qemu/qemu_capabilities.c src/qemu/qemu_cgroup.c src/qemu/qemu_command.c diff --git a/src/qemu/qemu_blockjob.c b/src/qemu/qemu_blockjob.c index eb05cef..3aa6118 100644 --- a/src/qemu/qemu_blockjob.c +++ b/src/qemu/qemu_blockjob.c @@ -214,19 +214,17 @@ qemuBlockJobEventProcess(virQEMUDriverPtr driver, * * During a synchronous block job, a block job event for @disk * will not be processed asynchronously. Instead, it will be - * processed only when qemuBlockJobSyncWait* or - * qemuBlockJobSyncEnd is called. + * processed only when qemuBlockJobUpdate or qemuBlockJobSyncEnd + * is called. */ void qemuBlockJobSyncBegin(virDomainDiskDefPtr disk) { qemuDomainDiskPrivatePtr diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk); - if (diskPriv->blockJobSync) - VIR_WARN("Disk %s already has synchronous block job", - disk->dst); - + VIR_DEBUG("disk=%s", disk->dst); diskPriv->blockJobSync = true; + diskPriv->blockJobStatus = -1; } @@ -235,135 +233,16 @@ qemuBlockJobSyncBegin(virDomainDiskDefPtr disk) * @driver: qemu driver * @vm: domain * @disk: domain disk - * @ret_status: pointer to virConnectDomainEventBlockJobStatus * * End a synchronous block job for @disk. Any pending block job event - * for the disk is processed, and its status is recorded in the - * virConnectDomainEventBlockJobStatus field pointed to by - * @ret_status. + * for the disk is processed. */ void qemuBlockJobSyncEnd(virQEMUDriverPtr driver, virDomainObjPtr vm, - virDomainDiskDefPtr disk, - virConnectDomainEventBlockJobStatus *ret_status) + virDomainDiskDefPtr disk) { - qemuDomainDiskPrivatePtr diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk); - - if (diskPriv->blockJobSync && diskPriv->blockJobStatus != -1) { - if (ret_status) - *ret_status = diskPriv->blockJobStatus; - qemuBlockJobUpdate(driver, vm, disk); - diskPriv->blockJobStatus = -1; - } - diskPriv->blockJobSync = false; -} - - -/** - * qemuBlockJobSyncWaitWithTimeout: - * @driver: qemu driver - * @vm: domain - * @disk: domain disk - * @timeout: timeout in milliseconds - * @ret_status: pointer to virConnectDomainEventBlockJobStatus - * - * Wait up to @timeout milliseconds for a block job event for @disk. - * If an event is received it is processed, and its status is recorded - * in the virConnectDomainEventBlockJobStatus field pointed to by - * @ret_status. - * - * If @timeout is not 0, @vm will be unlocked while waiting for the event. - * - * Returns 0 if an event was received or the timeout expired, - * -1 otherwise. 
- */ -int -qemuBlockJobSyncWaitWithTimeout(virQEMUDriverPtr driver, - virDomainObjPtr vm, - virDomainDiskDefPtr disk, - unsigned long long timeout, - virConnectDomainEventBlockJobStatus *ret_status) -{ - qemuDomainDiskPrivatePtr diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk); - - if (!diskPriv->blockJobSync) { - virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("No current synchronous block job")); - return -1; - } - - while (diskPriv->blockJobSync && diskPriv->blockJobStatus == -1) { - int r; - - if (!virDomainObjIsActive(vm)) { - virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("guest unexpectedly quit")); - diskPriv->blockJobSync = false; - return -1; - } - - if (timeout == (unsigned long long)-1) { - r = virCondWait(&diskPriv->blockJobSyncCond, &vm->parent.lock); - } else if (timeout) { - unsigned long long now; - if (virTimeMillisNow(&now) < 0) { - virReportSystemError(errno, "%s", - _("Unable to get current time")); - return -1; - } - r = virCondWaitUntil(&diskPriv->blockJobSyncCond, - &vm->parent.lock, - now + timeout); - if (r < 0 && errno == ETIMEDOUT) - return 0; - } else { - errno = ETIMEDOUT; - return 0; - } - - if (r < 0) { - diskPriv->blockJobSync = false; - virReportSystemError(errno, "%s", - _("Unable to wait on block job sync " - "condition")); - return -1; - } - } - - if (ret_status) - *ret_status = diskPriv->blockJobStatus; + VIR_DEBUG("disk=%s", disk->dst); qemuBlockJobUpdate(driver, vm, disk); - diskPriv->blockJobStatus = -1; - - return 0; -} - - -/** - * qemuBlockJobSyncWait: - * @driver: qemu driver - * @vm: domain - * @disk: domain disk - * @ret_status: pointer to virConnectDomainEventBlockJobStatus - * - * Wait for a block job event for @disk. If an event is received it - * is processed, and its status is recorded in the - * virConnectDomainEventBlockJobStatus field pointed to by - * @ret_status. - * - * @vm will be unlocked while waiting for the event. - * - * Returns 0 if an event was received, - * -1 otherwise. 
- */ -int -qemuBlockJobSyncWait(virQEMUDriverPtr driver, - virDomainObjPtr vm, - virDomainDiskDefPtr disk, - virConnectDomainEventBlockJobStatus *ret_status) -{ - return qemuBlockJobSyncWaitWithTimeout(driver, vm, disk, - (unsigned long long)-1, - ret_status); + QEMU_DOMAIN_DISK_PRIVATE(disk)->blockJobSync = false; } diff --git a/src/qemu/qemu_blockjob.h b/src/qemu/qemu_blockjob.h index 81e893e..775ce95 100644 --- a/src/qemu/qemu_blockjob.h +++ b/src/qemu/qemu_blockjob.h @@ -37,16 +37,6 @@ void qemuBlockJobEventProcess(virQEMUDriverPtr driver, void qemuBlockJobSyncBegin(virDomainDiskDefPtr disk); void qemuBlockJobSyncEnd(virQEMUDriverPtr driver, virDomainObjPtr vm, - virDomainDiskDefPtr disk, - virConnectDomainEventBlockJobStatus *ret_status); -int qemuBlockJobSyncWaitWithTimeout(virQEMUDriverPtr driver, - virDomainObjPtr vm, - virDomainDiskDefPtr disk, - unsigned long long timeout, - virConnectDomainEventBlockJobStatus *ret_status); -int qemuBlockJobSyncWait(virQEMUDriverPtr driver, - virDomainObjPtr vm, - virDomainDiskDefPtr disk, - virConnectDomainEventBlockJobStatus *ret_status); + virDomainDiskDefPtr disk); #endif /* __QEMU_BLOCKJOB_H__ */ diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 0682390..0b5ebe1 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -413,7 +413,6 @@ qemuDomainJobInfoToParams(qemuDomainJobInfoPtr jobInfo, static virClassPtr qemuDomainDiskPrivateClass; -static void qemuDomainDiskPrivateDispose(void *obj); static int qemuDomainDiskPrivateOnceInit(void) @@ -421,7 +420,7 @@ qemuDomainDiskPrivateOnceInit(void) qemuDomainDiskPrivateClass = virClassNew(virClassForObject(), "qemuDomainDiskPrivate", sizeof(qemuDomainDiskPrivate), - qemuDomainDiskPrivateDispose); + NULL); if (!qemuDomainDiskPrivateClass) return -1; else @@ -441,23 +440,9 @@ qemuDomainDiskPrivateNew(void) if (!(priv = virObjectNew(qemuDomainDiskPrivateClass))) return NULL; - if (virCondInit(&priv->blockJobSyncCond) < 0) { - virReportSystemError(errno, "%s", _("Failed to initialize condition")); - virObjectUnref(priv); - return NULL; - } - return (virObjectPtr) priv; } -static void -qemuDomainDiskPrivateDispose(void *obj) -{ - qemuDomainDiskPrivatePtr priv = obj; - - virCondDestroy(&priv->blockJobSyncCond); -} - static void * qemuDomainObjPrivateAlloc(void) diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 053607f..9003c9b 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -214,7 +214,6 @@ struct _qemuDomainDiskPrivate { bool blockjob; /* for some synchronous block jobs, we need to notify the owner */ - virCond blockJobSyncCond; int blockJobType; /* type of the block job from the event */ int blockJobStatus; /* status of the finished block job */ bool blockJobSync; /* the block job needs synchronized termination */ diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 34e5581..0214e6b 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -16450,10 +16450,8 @@ qemuDomainBlockJobAbort(virDomainPtr dom, goto endjob; } - if (modern && !async) { - /* prepare state for event delivery */ + if (modern && !async) qemuBlockJobSyncBegin(disk); - } if (pivot) { if ((ret = qemuDomainBlockPivot(driver, vm, device, disk)) < 0) @@ -16501,21 +16499,21 @@ qemuDomainBlockJobAbort(virDomainPtr dom, VIR_DOMAIN_BLOCK_JOB_TYPE_PULL, VIR_DOMAIN_BLOCK_JOB_CANCELED); } else { - virConnectDomainEventBlockJobStatus status = -1; - if (qemuBlockJobSyncWait(driver, vm, disk, &status) < 0) { - ret = -1; - } else if (status == 
VIR_DOMAIN_BLOCK_JOB_FAILED) { - virReportError(VIR_ERR_OPERATION_FAILED, - _("failed to terminate block job on disk '%s'"), - disk->dst); - ret = -1; + qemuDomainDiskPrivatePtr diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk); + qemuBlockJobUpdate(driver, vm, disk); + while (diskPriv->blockjob) { + if (virDomainObjWait(vm) < 0) { + ret = -1; + goto endjob; + } + qemuBlockJobUpdate(driver, vm, disk); } } } endjob: - if (disk && QEMU_DOMAIN_DISK_PRIVATE(disk)->blockJobSync) - qemuBlockJobSyncEnd(driver, vm, disk, NULL); + if (disk) + qemuBlockJobSyncEnd(driver, vm, disk); qemuDomainObjEndJob(driver, vm); cleanup: diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 8d01468..83d6c22 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1720,7 +1720,7 @@ qemuMigrationStopNBDServer(virQEMUDriverPtr driver, /** - * qemuMigrationCheckDriveMirror: + * qemuMigrationDriveMirrorReady: * @driver: qemu driver * @vm: domain * @@ -1733,37 +1733,39 @@ qemuMigrationStopNBDServer(virQEMUDriverPtr driver, * -1 on error. */ static int -qemuMigrationCheckDriveMirror(virQEMUDriverPtr driver, +qemuMigrationDriveMirrorReady(virQEMUDriverPtr driver, virDomainObjPtr vm) { size_t i; - int ret = 1; + size_t notReady = 0; + int status; for (i = 0; i < vm->def->ndisks; i++) { virDomainDiskDefPtr disk = vm->def->disks[i]; qemuDomainDiskPrivatePtr diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk); - if (!diskPriv->migrating || !diskPriv->blockJobSync) + if (!diskPriv->migrating) continue; - /* process any pending event */ - if (qemuBlockJobSyncWaitWithTimeout(driver, vm, disk, - 0ull, NULL) < 0) - return -1; - - switch (disk->mirrorState) { - case VIR_DOMAIN_DISK_MIRROR_STATE_NONE: - ret = 0; - break; - case VIR_DOMAIN_DISK_MIRROR_STATE_ABORT: + status = qemuBlockJobUpdate(driver, vm, disk); + if (status == VIR_DOMAIN_BLOCK_JOB_FAILED) { virReportError(VIR_ERR_OPERATION_FAILED, _("migration of disk %s failed"), disk->dst); return -1; } + + if (disk->mirrorState != VIR_DOMAIN_DISK_MIRROR_STATE_READY) + notReady++; } - return ret; + if (notReady) { + VIR_DEBUG("Waiting for %zu disk mirrors to get ready", notReady); + return 0; + } else { + VIR_DEBUG("All disk mirrors are ready"); + return 1; + } } @@ -1823,18 +1825,17 @@ qemuMigrationCancelOneDriveMirror(virQEMUDriverPtr driver, /* Mirror may become ready before cancellation takes * effect; loop if we get that event first */ - do { - ret = qemuBlockJobSyncWait(driver, vm, disk, &status); - if (ret < 0) { - VIR_WARN("Unable to wait for block job on %s to cancel", - diskAlias); + while (1) { + status = qemuBlockJobUpdate(driver, vm, disk); + if (status != -1 && status != VIR_DOMAIN_BLOCK_JOB_READY) + break; + if ((ret = virDomainObjWait(vm)) < 0) goto endjob; - } - } while (status == VIR_DOMAIN_BLOCK_JOB_READY); + } } endjob: - qemuBlockJobSyncEnd(driver, vm, disk, NULL); + qemuBlockJobSyncEnd(driver, vm, disk); if (disk->mirrorState == VIR_DOMAIN_DISK_MIRROR_STATE_ABORT) disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_NONE; @@ -1924,6 +1925,9 @@ qemuMigrationDriveMirror(virQEMUDriverPtr driver, char *nbd_dest = NULL; char *hoststr = NULL; unsigned int mirror_flags = VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT; + int rv; + + VIR_DEBUG("Starting drive mirrors for domain %s", vm->def->name); /* steal NBD port and thus prevent its propagation back to destination */ port = mig->nbd->port; @@ -1950,60 +1954,46 @@ qemuMigrationDriveMirror(virQEMUDriverPtr driver, !virDomainDiskGetSource(disk)) continue; - VIR_FREE(diskAlias); - VIR_FREE(nbd_dest); if 
((virAsprintf(&diskAlias, "%s%s", QEMU_DRIVE_HOST_PREFIX, disk->info.alias) < 0) || (virAsprintf(&nbd_dest, "nbd:%s:%d:exportname=%s", hoststr, port, diskAlias) < 0)) goto cleanup; + if (qemuDomainObjEnterMonitorAsync(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) + goto cleanup; + qemuBlockJobSyncBegin(disk); - - if (qemuDomainObjEnterMonitorAsync(driver, vm, - QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) { - qemuBlockJobSyncEnd(driver, vm, disk, NULL); - goto cleanup; - } - mon_ret = qemuMonitorDriveMirror(priv->mon, diskAlias, nbd_dest, NULL, speed, 0, 0, mirror_flags); + VIR_FREE(diskAlias); + VIR_FREE(nbd_dest); if (qemuDomainObjExitMonitor(driver, vm) < 0 || mon_ret < 0) { - qemuBlockJobSyncEnd(driver, vm, disk, NULL); + qemuBlockJobSyncEnd(driver, vm, disk); goto cleanup; } diskPriv->migrating = true; } - /* Wait for each disk to become ready in turn, but check the status - * for *all* mirrors to determine if any have aborted. */ - for (i = 0; i < vm->def->ndisks; i++) { - virDomainDiskDefPtr disk = vm->def->disks[i]; - qemuDomainDiskPrivatePtr diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk); - - if (!diskPriv->migrating) - continue; - - while (disk->mirrorState != VIR_DOMAIN_DISK_MIRROR_STATE_READY) { - /* The following check should be race free as long as the variable - * is set only with domain object locked. And here we have the - * domain object locked too. */ - if (priv->job.asyncAbort) { - priv->job.current->type = VIR_DOMAIN_JOB_CANCELLED; - virReportError(VIR_ERR_OPERATION_ABORTED, _("%s: %s"), - qemuDomainAsyncJobTypeToString(priv->job.asyncJob), - _("canceled by client")); - goto cleanup; - } - - if (qemuBlockJobSyncWaitWithTimeout(driver, vm, disk, - 500ull, NULL) < 0) - goto cleanup; - - if (qemuMigrationCheckDriveMirror(driver, vm) < 0) - goto cleanup; + while ((rv = qemuMigrationDriveMirrorReady(driver, vm)) != 1) { + unsigned long long now; + + if (rv < 0) + goto cleanup; + + if (priv->job.asyncAbort) { + priv->job.current->type = VIR_DOMAIN_JOB_CANCELLED; + virReportError(VIR_ERR_OPERATION_ABORTED, _("%s: %s"), + qemuDomainAsyncJobTypeToString(priv->job.asyncJob), + _("canceled by client")); + goto cleanup; } + + if (virTimeMillisNow(&now) < 0 || + virDomainObjWaitUntil(vm, now + 500) < 0) + goto cleanup; } /* Okay, all disks are ready. 
Modify migrate_flags */ @@ -4092,7 +4082,7 @@ qemuMigrationRun(virQEMUDriverPtr driver, /* Confirm state of drive mirrors */ if (mig->nbd) { - if (qemuMigrationCheckDriveMirror(driver, vm) != 1) { + if (qemuMigrationDriveMirrorReady(driver, vm) != 1) { ret = -1; goto cancel; } diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 64ee049..3c9d4bc 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1001,10 +1001,10 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED, diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk); if (diskPriv->blockJobSync) { + /* We have a SYNC API waiting for this event, dispatch it back */ diskPriv->blockJobType = type; diskPriv->blockJobStatus = status; - /* We have an SYNC API waiting for this event, dispatch it back */ - virCondSignal(&diskPriv->blockJobSyncCond); + virDomainObjSignal(vm); } else { /* there is no waiting SYNC API, dispatch the update to a thread */ if (VIR_ALLOC(processEvent) < 0) @@ -5055,13 +5055,8 @@ void qemuProcessStop(virQEMUDriverPtr driver, if (virAtomicIntDecAndTest(&driver->nactive) && driver->inhibitCallback) driver->inhibitCallback(false, driver->inhibitOpaque); - /* Wake up anything waiting on synchronous block jobs */ - for (i = 0; i < vm->def->ndisks; i++) { - qemuDomainDiskPrivatePtr diskPriv = - QEMU_DOMAIN_DISK_PRIVATE(vm->def->disks[i]); - if (diskPriv->blockJobSync && diskPriv->blockJobStatus == -1) - virCondSignal(&diskPriv->blockJobSyncCond); - } + /* Wake up anything waiting on domain condition */ + virDomainObjBroadcast(vm); if ((logfile = qemuDomainCreateLog(driver, vm, true)) < 0) { /* To not break the normal domain shutdown process, skip the -- 2.4.3
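The wait pattern that replaces qemuBlockJobSyncWait/qemuBlockJobSyncWaitWithTimeout, condensed from the qemuDomainBlockJobAbort hunk above (error handling trimmed):

    qemuBlockJobSyncBegin(disk);

    /* ... issue the block job or cancel request via the monitor ... */

    qemuBlockJobUpdate(driver, vm, disk);
    while (QEMU_DOMAIN_DISK_PRIVATE(disk)->blockjob) {
        if (virDomainObjWait(vm) < 0)
            goto error;
        qemuBlockJobUpdate(driver, vm, disk);
    }

    qemuBlockJobSyncEnd(driver, vm, disk);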

On Wed, Jun 10, 2015 at 15:42:38 +0200, Jiri Denemark wrote:
By switching block jobs to use domain conditions, we can drop some pretty complicated code in NBD storage migration.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> ---
Notes: Version 3: - split into 3 patches
Version 2: - slightly modified to use domain conditions
 po/POTFILES.in            |   1 -
 src/qemu/qemu_blockjob.c  | 137 +++-------------------------------------------
 src/qemu/qemu_blockjob.h  |  12 +---
 src/qemu/qemu_domain.c    |  17 +-----
 src/qemu/qemu_domain.h    |   1 -
 src/qemu/qemu_driver.c    |  24 ++++----
 src/qemu/qemu_migration.c | 112 +++++++++++++++++--------------------
 src/qemu/qemu_process.c   |  13 ++---
 8 files changed, 76 insertions(+), 241 deletions(-)
ACK, Peter

Instead of cancelling disk mirrors sequentially, let's just call block-job-cancel for all migrating disks and then wait until all disappear. In case we cancel disk mirrors at the end of successful migration we also need to check all block jobs completed successfully. Otherwise we have to abort the migration. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- Notes: Version 3: - new patch (separated from "qemu: Use domain condition for synchronous block jobs") - get rid of bool *failed parameter in qemuMigrationDriveMirrorCancelled src/qemu/qemu_migration.c | 196 +++++++++++++++++++++++++++++++--------------- 1 file changed, 134 insertions(+), 62 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 83d6c22..11504eb 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1769,76 +1769,122 @@ qemuMigrationDriveMirrorReady(virQEMUDriverPtr driver, } -/** - * qemuMigrationCancelOneDriveMirror: - * @driver: qemu driver - * @vm: domain +/* + * If @check is true, the function will report an error and return a different + * code in case a block job fails. This way we can properly abort migration in + * case some block jobs failed once all memory has already been transferred. * - * Cancel all drive-mirrors started by qemuMigrationDriveMirror. - * Any pending block job events for the mirrored disks will be - * processed. - * - * Returns 0 on success, -1 otherwise. + * Returns 1 if all mirrors are gone, + * 0 if some mirrors are still active, + * -1 some mirrors failed but some are still active, + * -2 all mirrors are gone but some of them failed. */ static int -qemuMigrationCancelOneDriveMirror(virQEMUDriverPtr driver, +qemuMigrationDriveMirrorCancelled(virQEMUDriverPtr driver, virDomainObjPtr vm, - virDomainDiskDefPtr disk) + bool check) { - qemuDomainObjPrivatePtr priv = vm->privateData; - char *diskAlias = NULL; - int ret = -1; + size_t i; + size_t active = 0; + int status; + bool failed = false; - /* No need to cancel if mirror already aborted */ - if (disk->mirrorState == VIR_DOMAIN_DISK_MIRROR_STATE_ABORT) { - ret = 0; - } else { - virConnectDomainEventBlockJobStatus status = -1; + for (i = 0; i < vm->def->ndisks; i++) { + virDomainDiskDefPtr disk = vm->def->disks[i]; + qemuDomainDiskPrivatePtr diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk); - if (virAsprintf(&diskAlias, "%s%s", - QEMU_DRIVE_HOST_PREFIX, disk->info.alias) < 0) - goto cleanup; + if (!diskPriv->migrating) + continue; - if (qemuDomainObjEnterMonitorAsync(driver, vm, - QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) - goto endjob; - ret = qemuMonitorBlockJobCancel(priv->mon, diskAlias, true); - if (qemuDomainObjExitMonitor(driver, vm) < 0) - goto endjob; - - if (ret < 0) { - virDomainBlockJobInfo info; - - /* block-job-cancel can fail if QEMU simultaneously - * aborted the job; probe for it again to detect this */ - if (qemuMonitorBlockJobInfo(priv->mon, diskAlias, - &info, NULL) == 0) { - ret = 0; - } else { + status = qemuBlockJobUpdate(driver, vm, disk); + switch (status) { + case VIR_DOMAIN_BLOCK_JOB_FAILED: + if (check) { virReportError(VIR_ERR_OPERATION_FAILED, - _("could not cancel migration of disk %s"), + _("migration of disk %s failed"), disk->dst); + failed = true; } + /* fallthrough */ + case VIR_DOMAIN_BLOCK_JOB_CANCELED: + case VIR_DOMAIN_BLOCK_JOB_COMPLETED: + qemuBlockJobSyncEnd(driver, vm, disk); + diskPriv->migrating = false; + break; - goto endjob; + default: + active++; } + } - /* Mirror may become ready before cancellation takes - * effect; loop if we get that event first 
*/ - while (1) { - status = qemuBlockJobUpdate(driver, vm, disk); - if (status != -1 && status != VIR_DOMAIN_BLOCK_JOB_READY) - break; - if ((ret = virDomainObjWait(vm)) < 0) - goto endjob; + if (failed) { + if (active) { + VIR_DEBUG("Some disk mirrors failed; still waiting for %zu " + "disk mirrors to finish", active); + return -1; + } else { + VIR_DEBUG("All disk mirrors are gone; some of them failed"); + return -2; + } + } else { + if (active) { + VIR_DEBUG("Waiting for %zu disk mirrors to finish", active); + return 0; + } else { + VIR_DEBUG("All disk mirrors are gone"); + return 1; } } +} + + +/* + * Returns 0 on success, + * 1 when job is already completed or it failed and failNoJob is false, + * -1 on error or when job failed and failNoJob is true. + */ +static int +qemuMigrationCancelOneDriveMirror(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDiskDefPtr disk, + bool failNoJob) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + char *diskAlias = NULL; + int ret = -1; + int status; + int rv; + + status = qemuBlockJobUpdate(driver, vm, disk); + switch (status) { + case VIR_DOMAIN_BLOCK_JOB_FAILED: + case VIR_DOMAIN_BLOCK_JOB_CANCELED: + if (failNoJob) { + virReportError(VIR_ERR_OPERATION_FAILED, + _("migration of disk %s failed"), + disk->dst); + return -1; + } + return 1; + + case VIR_DOMAIN_BLOCK_JOB_COMPLETED: + return 1; + } + + if (virAsprintf(&diskAlias, "%s%s", + QEMU_DRIVE_HOST_PREFIX, disk->info.alias) < 0) + return -1; + + if (qemuDomainObjEnterMonitorAsync(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) + goto cleanup; + + rv = qemuMonitorBlockJobCancel(priv->mon, diskAlias, true); - endjob: - qemuBlockJobSyncEnd(driver, vm, disk); + if (qemuDomainObjExitMonitor(driver, vm) < 0 || rv < 0) + goto cleanup; - if (disk->mirrorState == VIR_DOMAIN_DISK_MIRROR_STATE_ABORT) - disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_NONE; + ret = 0; cleanup: VIR_FREE(diskAlias); @@ -1850,6 +1896,7 @@ qemuMigrationCancelOneDriveMirror(virQEMUDriverPtr driver, * qemuMigrationCancelDriveMirror: * @driver: qemu driver * @vm: domain + * @check: if true report an error when some of the mirrors fails * * Cancel all drive-mirrors started by qemuMigrationDriveMirror. * Any pending block job events for the affected disks will be @@ -1859,28 +1906,53 @@ qemuMigrationCancelOneDriveMirror(virQEMUDriverPtr driver, */ static int qemuMigrationCancelDriveMirror(virQEMUDriverPtr driver, - virDomainObjPtr vm) + virDomainObjPtr vm, + bool check) { virErrorPtr err = NULL; - int ret = 0; + int ret = -1; size_t i; + int rv; + bool failed = false; + + VIR_DEBUG("Cancelling drive mirrors for domain %s", vm->def->name); for (i = 0; i < vm->def->ndisks; i++) { virDomainDiskDefPtr disk = vm->def->disks[i]; qemuDomainDiskPrivatePtr diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk); - if (!diskPriv->migrating || !diskPriv->blockJobSync) + if (!diskPriv->migrating) continue; - if (qemuMigrationCancelOneDriveMirror(driver, vm, disk) < 0) { - ret = -1; - if (!err) - err = virSaveLastError(); + rv = qemuMigrationCancelOneDriveMirror(driver, vm, disk, check); + if (rv != 0) { + if (rv < 0) { + if (!err) + err = virSaveLastError(); + failed = true; + } + qemuBlockJobSyncEnd(driver, vm, disk); + diskPriv->migrating = false; + } + } + + while ((rv = qemuMigrationDriveMirrorCancelled(driver, vm, check)) != 1) { + if (rv < 0) { + failed = true; + if (rv == -2) + break; } - diskPriv->migrating = false; + if (failed && !err) + err = virSaveLastError(); + + if (virDomainObjWait(vm) < 0) + goto cleanup; } + ret = failed ? 
-1 : 0; + + cleanup: if (err) { virSetError(err); virFreeError(err); @@ -3532,7 +3604,7 @@ qemuMigrationConfirmPhase(virQEMUDriverPtr driver, virErrorPtr orig_err = virSaveLastError(); /* cancel any outstanding NBD jobs */ - qemuMigrationCancelDriveMirror(driver, vm); + qemuMigrationCancelDriveMirror(driver, vm, false); virSetError(orig_err); virFreeError(orig_err); @@ -4111,7 +4183,7 @@ qemuMigrationRun(virQEMUDriverPtr driver, /* cancel any outstanding NBD jobs */ if (mig && mig->nbd) { - if (qemuMigrationCancelDriveMirror(driver, vm) < 0) + if (qemuMigrationCancelDriveMirror(driver, vm, ret == 0) < 0) ret = -1; } -- 2.4.3
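A condensed view of the cancel-all-then-wait control flow introduced above (per-disk bookkeeping and error propagation omitted; see the full hunks for the details):

    /* 1) issue block-job-cancel for each disk
     *    (the real code only does this for disks with migrating == true) */
    for (i = 0; i < vm->def->ndisks; i++)
        qemuMigrationCancelOneDriveMirror(driver, vm, vm->def->disks[i], check);

    /* 2) wait for the mirrors to disappear; the helper returns
     *       1  all mirrors gone,
     *       0  some still active,
     *      -1  some failed, others still active,
     *      -2  all gone, but some of them failed */
    while ((rv = qemuMigrationDriveMirrorCancelled(driver, vm, check)) != 1) {
        if (rv == -2)
            break;              /* nothing left to wait for */
        if (virDomainObjWait(vm) < 0)
            goto cleanup;
    }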

On Wed, Jun 10, 2015 at 15:42:39 +0200, Jiri Denemark wrote:
Instead of cancelling disk mirrors sequentially, let's just call block-job-cancel for all migrating disks and then wait until all disappear.
In case we cancel disk mirrors at the end of successful migration we also need to check all block jobs completed successfully. Otherwise we have to abort the migration.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> ---
ACK, Peter

Abort migration as soon as we detect that some of the disk mirrors failed. There's no sense in trying to finish memory migration first. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- Notes: Version 3: - new patch (separated from "qemu: Use domain condition for synchronous block jobs") src/qemu/qemu_migration.c | 20 +++++++++----------- 1 file changed, 9 insertions(+), 11 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 11504eb..b11407e 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -2498,7 +2498,8 @@ qemuMigrationWaitForCompletion(virQEMUDriverPtr driver, virDomainObjPtr vm, qemuDomainAsyncJob asyncJob, virConnectPtr dconn, - bool abort_on_error) + bool abort_on_error, + bool storage) { qemuDomainObjPrivatePtr priv = vm->privateData; qemuDomainJobInfoPtr jobInfo = priv->job.current; @@ -2529,6 +2530,10 @@ qemuMigrationWaitForCompletion(virQEMUDriverPtr driver, if (qemuMigrationUpdateJobStatus(driver, vm, job, asyncJob) == -1) goto error; + if (storage && + qemuMigrationDriveMirrorReady(driver, vm) < 0) + break; + /* cancel migration if disk I/O error is emitted while migrating */ if (abort_on_error && virDomainObjGetState(vm, &pauseReason) == VIR_DOMAIN_PAUSED && @@ -4146,20 +4151,12 @@ qemuMigrationRun(virQEMUDriverPtr driver, rc = qemuMigrationWaitForCompletion(driver, vm, QEMU_ASYNC_JOB_MIGRATION_OUT, - dconn, abort_on_error); + dconn, abort_on_error, !!mig->nbd); if (rc == -2) goto cancel; else if (rc == -1) goto cleanup; - /* Confirm state of drive mirrors */ - if (mig->nbd) { - if (qemuMigrationDriveMirrorReady(driver, vm) != 1) { - ret = -1; - goto cancel; - } - } - /* When migration completed, QEMU will have paused the * CPUs for us, but unless we're using the JSON monitor * we won't have been notified of this, so might still @@ -5637,7 +5634,8 @@ qemuMigrationToFile(virQEMUDriverPtr driver, virDomainObjPtr vm, if (rc < 0) goto cleanup; - rc = qemuMigrationWaitForCompletion(driver, vm, asyncJob, NULL, false); + rc = qemuMigrationWaitForCompletion(driver, vm, asyncJob, + NULL, false, false); if (rc < 0) { if (rc == -2) { -- 2.4.3

On Wed, Jun 10, 2015 at 15:42:40 +0200, Jiri Denemark wrote:
Abort migration as soon as we detect that some of the disk mirrors failed. There's no sense in trying to finish memory migration first.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> ---
Notes: Version 3: - new patch (separated from "qemu: Use domain condition for synchronous block jobs")
 src/qemu/qemu_migration.c | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)
ACK, Peter

This patch reverts commit 76c61cdca20c106960af033e5d0f5da70177af0f. VIR_DOMAIN_DISK_MIRROR_STATE_ABORT says we asked for a block job to be aborted rather than saying it was aborted. Let's just use VIR_DOMAIN_DISK_MIRROR_STATE_NONE consistently whenever a block job finishes since no caller depends on VIR_DOMAIN_DISK_MIRROR_STATE_ABORT (anymore) to check whether a block job failed or it was cancelled. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- Notes: ACKed in version 2 Version 2, 3: - no changes src/qemu/qemu_blockjob.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/src/qemu/qemu_blockjob.c b/src/qemu/qemu_blockjob.c index 3aa6118..8849850 100644 --- a/src/qemu/qemu_blockjob.c +++ b/src/qemu/qemu_blockjob.c @@ -174,8 +174,7 @@ qemuBlockJobEventProcess(virQEMUDriverPtr driver, case VIR_DOMAIN_BLOCK_JOB_CANCELED: virStorageSourceFree(disk->mirror); disk->mirror = NULL; - disk->mirrorState = status == VIR_DOMAIN_BLOCK_JOB_FAILED ? - VIR_DOMAIN_DISK_MIRROR_STATE_ABORT : VIR_DOMAIN_DISK_MIRROR_STATE_NONE; + disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_NONE; disk->mirrorJob = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN; save = true; diskPriv->blockjob = false; -- 2.4.3

So that they can format private data (e.g., disk private data) stored elsewhere in the domain object. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- Notes: ACKed in version 2 Version 2, 3: - no changes src/conf/domain_conf.c | 4 ++-- src/conf/domain_conf.h | 6 ++++-- src/libxl/libxl_domain.c | 10 ++++++---- src/lxc/lxc_domain.c | 12 ++++++++---- src/qemu/qemu_domain.c | 10 ++++++---- 5 files changed, 26 insertions(+), 16 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 433183f..350640f 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -15974,7 +15974,7 @@ virDomainObjParseXML(xmlDocPtr xml, VIR_FREE(nodes); if (xmlopt->privateData.parse && - ((xmlopt->privateData.parse)(ctxt, obj->privateData)) < 0) + xmlopt->privateData.parse(ctxt, obj) < 0) goto error; return obj; @@ -21863,7 +21863,7 @@ virDomainObjFormat(virDomainXMLOptionPtr xmlopt, } if (xmlopt->privateData.format && - ((xmlopt->privateData.format)(&buf, obj->privateData)) < 0) + xmlopt->privateData.format(&buf, obj) < 0) goto error; if (virDomainDefFormatInternal(obj->def, flags, &buf) < 0) diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index ac29ce5..44ecd4a 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2357,8 +2357,10 @@ typedef virDomainXMLOption *virDomainXMLOptionPtr; typedef void *(*virDomainXMLPrivateDataAllocFunc)(void); typedef void (*virDomainXMLPrivateDataFreeFunc)(void *); typedef virObjectPtr (*virDomainXMLPrivateDataNewFunc)(void); -typedef int (*virDomainXMLPrivateDataFormatFunc)(virBufferPtr, void *); -typedef int (*virDomainXMLPrivateDataParseFunc)(xmlXPathContextPtr, void *); +typedef int (*virDomainXMLPrivateDataFormatFunc)(virBufferPtr, + virDomainObjPtr); +typedef int (*virDomainXMLPrivateDataParseFunc)(xmlXPathContextPtr, + virDomainObjPtr); /* Called once after everything else has been parsed, for adjusting * overall domain defaults. 
*/ diff --git a/src/libxl/libxl_domain.c b/src/libxl/libxl_domain.c index 0652270..ebd7964 100644 --- a/src/libxl/libxl_domain.c +++ b/src/libxl/libxl_domain.c @@ -223,9 +223,10 @@ libxlDomainObjPrivateFree(void *data) } static int -libxlDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, void *data) +libxlDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, + virDomainObjPtr vm) { - libxlDomainObjPrivatePtr priv = data; + libxlDomainObjPrivatePtr priv = vm->privateData; priv->lockState = virXPathString("string(./lockstate)", ctxt); @@ -233,9 +234,10 @@ libxlDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, void *data) } static int -libxlDomainObjPrivateXMLFormat(virBufferPtr buf, void *data) +libxlDomainObjPrivateXMLFormat(virBufferPtr buf, + virDomainObjPtr vm) { - libxlDomainObjPrivatePtr priv = data; + libxlDomainObjPrivatePtr priv = vm->privateData; if (priv->lockState) virBufferAsprintf(buf, "<lockstate>%s</lockstate>\n", priv->lockState); diff --git a/src/lxc/lxc_domain.c b/src/lxc/lxc_domain.c index c2180cb..70606f3 100644 --- a/src/lxc/lxc_domain.c +++ b/src/lxc/lxc_domain.c @@ -51,9 +51,11 @@ static void virLXCDomainObjPrivateFree(void *data) } -static int virLXCDomainObjPrivateXMLFormat(virBufferPtr buf, void *data) +static int +virLXCDomainObjPrivateXMLFormat(virBufferPtr buf, + virDomainObjPtr vm) { - virLXCDomainObjPrivatePtr priv = data; + virLXCDomainObjPrivatePtr priv = vm->privateData; virBufferAsprintf(buf, "<init pid='%llu'/>\n", (unsigned long long)priv->initpid); @@ -61,9 +63,11 @@ static int virLXCDomainObjPrivateXMLFormat(virBufferPtr buf, void *data) return 0; } -static int virLXCDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, void *data) +static int +virLXCDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, + virDomainObjPtr vm) { - virLXCDomainObjPrivatePtr priv = data; + virLXCDomainObjPrivatePtr priv = vm->privateData; unsigned long long thepid; if (virXPathULongLong("string(./init[1]/@pid)", ctxt, &thepid) < 0) { diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 0b5ebe1..24ff020 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -511,9 +511,10 @@ qemuDomainObjPrivateFree(void *data) static int -qemuDomainObjPrivateXMLFormat(virBufferPtr buf, void *data) +qemuDomainObjPrivateXMLFormat(virBufferPtr buf, + virDomainObjPtr vm) { - qemuDomainObjPrivatePtr priv = data; + qemuDomainObjPrivatePtr priv = vm->privateData; const char *monitorpath; qemuDomainJob job; @@ -600,9 +601,10 @@ qemuDomainObjPrivateXMLFormat(virBufferPtr buf, void *data) } static int -qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, void *data) +qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, + virDomainObjPtr vm) { - qemuDomainObjPrivatePtr priv = data; + qemuDomainObjPrivatePtr priv = vm->privateData; char *monitorpath; char *tmp; int n; -- 2.4.3
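A sketch of what the new signature allows; the example* names below are hypothetical and only illustrate that a formatter can now reach state stored outside the domain-level privateData blob (e.g. per-disk private data):

static int
exampleDomainObjPrivateXMLFormat(virBufferPtr buf,
                                 virDomainObjPtr vm)
{
    exampleDomainObjPrivatePtr priv = vm->privateData;   /* hypothetical type */
    size_t i;

    if (priv->lockState)
        virBufferAsprintf(buf, "<lockstate>%s</lockstate>\n", priv->lockState);

    /* with the whole domain object available, per-disk state can be
     * emitted too, not just the driver's domain-level private data */
    for (i = 0; i < vm->def->ndisks; i++)
        virBufferAsprintf(buf, "<diskPrivate dst='%s'/>\n",
                          vm->def->disks[i]->dst);

    return 0;
}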

We don't have an async job when reconnecting to existing domains after libvirtd restart. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- Notes: ACKed in version 2 Version 2, 3: - no changes src/qemu/qemu_migration.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index b11407e..065a0f9 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1847,7 +1847,8 @@ static int qemuMigrationCancelOneDriveMirror(virQEMUDriverPtr driver, virDomainObjPtr vm, virDomainDiskDefPtr disk, - bool failNoJob) + bool failNoJob, + qemuDomainAsyncJob asyncJob) { qemuDomainObjPrivatePtr priv = vm->privateData; char *diskAlias = NULL; @@ -1875,8 +1876,7 @@ qemuMigrationCancelOneDriveMirror(virQEMUDriverPtr driver, QEMU_DRIVE_HOST_PREFIX, disk->info.alias) < 0) return -1; - if (qemuDomainObjEnterMonitorAsync(driver, vm, - QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) + if (qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob) < 0) goto cleanup; rv = qemuMonitorBlockJobCancel(priv->mon, diskAlias, true); @@ -1907,7 +1907,8 @@ qemuMigrationCancelOneDriveMirror(virQEMUDriverPtr driver, static int qemuMigrationCancelDriveMirror(virQEMUDriverPtr driver, virDomainObjPtr vm, - bool check) + bool check, + qemuDomainAsyncJob asyncJob) { virErrorPtr err = NULL; int ret = -1; @@ -1924,7 +1925,8 @@ qemuMigrationCancelDriveMirror(virQEMUDriverPtr driver, if (!diskPriv->migrating) continue; - rv = qemuMigrationCancelOneDriveMirror(driver, vm, disk, check); + rv = qemuMigrationCancelOneDriveMirror(driver, vm, disk, + check, asyncJob); if (rv != 0) { if (rv < 0) { if (!err) @@ -3609,7 +3611,8 @@ qemuMigrationConfirmPhase(virQEMUDriverPtr driver, virErrorPtr orig_err = virSaveLastError(); /* cancel any outstanding NBD jobs */ - qemuMigrationCancelDriveMirror(driver, vm, false); + qemuMigrationCancelDriveMirror(driver, vm, false, + QEMU_ASYNC_JOB_MIGRATION_OUT); virSetError(orig_err); virFreeError(orig_err); @@ -4180,7 +4183,8 @@ qemuMigrationRun(virQEMUDriverPtr driver, /* cancel any outstanding NBD jobs */ if (mig && mig->nbd) { - if (qemuMigrationCancelDriveMirror(driver, vm, ret == 0) < 0) + if (qemuMigrationCancelDriveMirror(driver, vm, ret == 0, + QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) ret = -1; } -- 2.4.3

"query-block-jobs" QMP command returns all running block jobs at once, while qemuMonitorBlockJobInfo would only report one. This is not very nice in case we need to check several block jobs. This patch refactors the monitor code to always parse all block jobs and store them in a hash. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- Notes: ACKed in version 2 Version 3: - no change Version 2: - new patch src/qemu/qemu_driver.c | 45 +++++++++++++++------------- src/qemu/qemu_monitor.c | 37 ++++++++++++++++++----- src/qemu/qemu_monitor.h | 17 ++++++++--- src/qemu/qemu_monitor_json.c | 70 ++++++++++++++++++++++---------------------- src/qemu/qemu_monitor_json.h | 7 ++--- 5 files changed, 104 insertions(+), 72 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 0214e6b..1556a9e 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -16178,7 +16178,7 @@ qemuDomainBlockPivot(virQEMUDriverPtr driver, { int ret = -1, rc; qemuDomainObjPrivatePtr priv = vm->privateData; - virDomainBlockJobInfo info; + qemuMonitorBlockJobInfo info; virStorageSourcePtr oldsrc = NULL; virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); @@ -16192,7 +16192,7 @@ qemuDomainBlockPivot(virQEMUDriverPtr driver, /* Probe the status, if needed. */ if (!disk->mirrorState) { qemuDomainObjEnterMonitor(driver, vm); - rc = qemuMonitorBlockJobInfo(priv->mon, device, &info, NULL); + rc = qemuMonitorGetBlockJobInfo(priv->mon, disk->info.alias, &info); if (qemuDomainObjExitMonitor(driver, vm) < 0) goto cleanup; if (rc < 0) @@ -16525,16 +16525,16 @@ qemuDomainBlockJobAbort(virDomainPtr dom, static int -qemuDomainGetBlockJobInfo(virDomainPtr dom, const char *path, - virDomainBlockJobInfoPtr info, unsigned int flags) +qemuDomainGetBlockJobInfo(virDomainPtr dom, + const char *path, + virDomainBlockJobInfoPtr info, + unsigned int flags) { virQEMUDriverPtr driver = dom->conn->privateData; virDomainObjPtr vm; - char *device = NULL; - int idx; virDomainDiskDefPtr disk; int ret = -1; - unsigned long long bandwidth; + qemuMonitorBlockJobInfo rawInfo; virCheckFlags(VIR_DOMAIN_BLOCK_JOB_INFO_BANDWIDTH_BYTES, -1); @@ -16557,31 +16557,34 @@ qemuDomainGetBlockJobInfo(virDomainPtr dom, const char *path, if (qemuDomainSupportsBlockJobs(vm, NULL) < 0) goto endjob; - if (!(device = qemuDiskPathToAlias(vm, path, &idx))) + if (!(disk = virDomainDiskByName(vm->def, path, true))) goto endjob; - disk = vm->def->disks[idx]; qemuDomainObjEnterMonitor(driver, vm); - ret = qemuMonitorBlockJobInfo(qemuDomainGetMonitor(vm), device, info, - &bandwidth); + ret = qemuMonitorGetBlockJobInfo(qemuDomainGetMonitor(vm), + disk->info.alias, &rawInfo); if (qemuDomainObjExitMonitor(driver, vm) < 0) ret = -1; if (ret < 0) goto endjob; + info->cur = rawInfo.cur; + info->end = rawInfo.end; + + info->type = rawInfo.type; if (info->type == VIR_DOMAIN_BLOCK_JOB_TYPE_COMMIT && disk->mirrorJob == VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT) info->type = disk->mirrorJob; - if (bandwidth) { - if (!(flags & VIR_DOMAIN_BLOCK_JOB_INFO_BANDWIDTH_BYTES)) - bandwidth = VIR_DIV_UP(bandwidth, 1024 * 1024); - info->bandwidth = bandwidth; - if (info->bandwidth != bandwidth) { - virReportError(VIR_ERR_OVERFLOW, - _("bandwidth %llu cannot be represented in result"), - bandwidth); - goto endjob; - } + + if (rawInfo.bandwidth && + !(flags & VIR_DOMAIN_BLOCK_JOB_INFO_BANDWIDTH_BYTES)) + rawInfo.bandwidth = VIR_DIV_UP(rawInfo.bandwidth, 1024 * 1024); + info->bandwidth = rawInfo.bandwidth; + if (info->bandwidth != rawInfo.bandwidth) { + 
virReportError(VIR_ERR_OVERFLOW, + _("bandwidth %llu cannot be represented in result"), + rawInfo.bandwidth); + goto endjob; } /* Snoop block copy operations, so future cancel operations can diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 33600f0..ae7ef28 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -3207,17 +3207,40 @@ qemuMonitorBlockJobSetSpeed(qemuMonitorPtr mon, } +virHashTablePtr +qemuMonitorGetAllBlockJobInfo(qemuMonitorPtr mon) +{ + QEMU_CHECK_MONITOR_JSON_NULL(mon); + return qemuMonitorJSONGetAllBlockJobInfo(mon); +} + + +/** + * qemuMonitorGetBlockJobInfo: + * Parse Block Job information, and populate info for the named device. + * Return 1 if info available, 0 if device has no block job, and -1 on error. + */ int -qemuMonitorBlockJobInfo(qemuMonitorPtr mon, - const char *device, - virDomainBlockJobInfoPtr info, - unsigned long long *bandwidth) +qemuMonitorGetBlockJobInfo(qemuMonitorPtr mon, + const char *alias, + qemuMonitorBlockJobInfoPtr info) { - VIR_DEBUG("device=%s, info=%p, bandwidth=%p", device, info, bandwidth); + virHashTablePtr all; + qemuMonitorBlockJobInfoPtr data; + int ret = 0; - QEMU_CHECK_MONITOR_JSON(mon); + VIR_DEBUG("alias=%s, info=%p", alias, info); - return qemuMonitorJSONBlockJobInfo(mon, device, info, bandwidth); + if (!(all = qemuMonitorGetAllBlockJobInfo(mon))) + return -1; + + if ((data = virHashLookup(all, alias))) { + *info = *data; + ret = 1; + } + + virHashFree(all); + return ret; } diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index d30b514..96f47e8 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -774,10 +774,19 @@ int qemuMonitorBlockJobSetSpeed(qemuMonitorPtr mon, unsigned long long bandwidth, bool modern); -int qemuMonitorBlockJobInfo(qemuMonitorPtr mon, - const char *device, - virDomainBlockJobInfoPtr info, - unsigned long long *bandwidth) +typedef struct _qemuMonitorBlockJobInfo qemuMonitorBlockJobInfo; +typedef qemuMonitorBlockJobInfo *qemuMonitorBlockJobInfoPtr; +struct _qemuMonitorBlockJobInfo { + int type; /* virDomainBlockJobType */ + unsigned long long bandwidth; /* in bytes/s */ + virDomainBlockJobCursor cur; + virDomainBlockJobCursor end; +}; + +virHashTablePtr qemuMonitorGetAllBlockJobInfo(qemuMonitorPtr mon); +int qemuMonitorGetBlockJobInfo(qemuMonitorPtr mon, + const char *device, + qemuMonitorBlockJobInfoPtr info) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); int qemuMonitorOpenGraphics(qemuMonitorPtr mon, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 13c57d2..5b227cd 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -4124,29 +4124,30 @@ int qemuMonitorJSONScreendump(qemuMonitorPtr mon, return ret; } -/* Returns -1 on error, 0 if not the right device, 1 if info was - * populated. However, rather than populate info->bandwidth (which - * might overflow on 32-bit machines), bandwidth is tracked optionally - * on the side. 
*/ + static int -qemuMonitorJSONGetBlockJobInfoOne(virJSONValuePtr entry, - const char *device, - virDomainBlockJobInfoPtr info, - unsigned long long *bandwidth) +qemuMonitorJSONParseBlockJobInfo(virHashTablePtr blockJobs, + virJSONValuePtr entry) { - const char *this_dev; + qemuMonitorBlockJobInfoPtr info = NULL; + const char *device; const char *type; - if ((this_dev = virJSONValueObjectGetString(entry, "device")) == NULL) { + if (!(device = virJSONValueObjectGetString(entry, "device"))) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("entry was missing 'device'")); return -1; } - if (!STREQ(this_dev, device)) - return 0; + if (STRPREFIX(device, QEMU_DRIVE_HOST_PREFIX)) + device += strlen(QEMU_DRIVE_HOST_PREFIX); - type = virJSONValueObjectGetString(entry, "type"); - if (!type) { + if (VIR_ALLOC(info) < 0 || + virHashAddEntry(blockJobs, device, info) < 0) { + VIR_FREE(info); + return -1; + } + + if (!(type = virJSONValueObjectGetString(entry, "type"))) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("entry was missing 'type'")); return -1; @@ -4160,8 +4161,7 @@ qemuMonitorJSONGetBlockJobInfoOne(virJSONValuePtr entry, else info->type = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN; - if (bandwidth && - virJSONValueObjectGetNumberUlong(entry, "speed", bandwidth) < 0) { + if (virJSONValueObjectGetNumberUlong(entry, "speed", &info->bandwidth) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("entry was missing 'speed'")); return -1; @@ -4178,30 +4178,23 @@ qemuMonitorJSONGetBlockJobInfoOne(virJSONValuePtr entry, _("entry was missing 'len'")); return -1; } - return 1; + + return 0; } -/** - * qemuMonitorJSONBlockJobInfo: - * Parse Block Job information, and populate info for the named device. - * Return 1 if info available, 0 if device has no block job, and -1 on error. 
- */ -int -qemuMonitorJSONBlockJobInfo(qemuMonitorPtr mon, - const char *device, - virDomainBlockJobInfoPtr info, - unsigned long long *bandwidth) +virHashTablePtr +qemuMonitorJSONGetAllBlockJobInfo(qemuMonitorPtr mon) { virJSONValuePtr cmd = NULL; virJSONValuePtr reply = NULL; virJSONValuePtr data; int nr_results; size_t i; - int ret = -1; + virHashTablePtr blockJobs = NULL; cmd = qemuMonitorJSONMakeCommand("query-block-jobs", NULL); if (!cmd) - return -1; + return NULL; if (qemuMonitorJSONCommand(mon, cmd, &reply) < 0) goto cleanup; @@ -4223,22 +4216,29 @@ qemuMonitorJSONBlockJobInfo(qemuMonitorPtr mon, goto cleanup; } - for (i = ret = 0; i < nr_results && ret == 0; i++) { + if (!(blockJobs = virHashCreate(nr_results, virHashValueFree))) + goto cleanup; + + for (i = 0; i < nr_results; i++) { virJSONValuePtr entry = virJSONValueArrayGet(data, i); if (!entry) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("missing array element")); - ret = -1; - goto cleanup; + goto error; } - ret = qemuMonitorJSONGetBlockJobInfoOne(entry, device, info, - bandwidth); + if (qemuMonitorJSONParseBlockJobInfo(blockJobs, entry) < 0) + goto error; } cleanup: virJSONValueFree(cmd); virJSONValueFree(reply); - return ret; + return blockJobs; + + error: + virHashFree(blockJobs); + blockJobs = NULL; + goto cleanup; } diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index fb77930..c0ee4ce 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -316,11 +316,8 @@ int qemuMonitorJSONBlockJobSetSpeed(qemuMonitorPtr mon, bool modern) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); -int qemuMonitorJSONBlockJobInfo(qemuMonitorPtr mon, - const char *device, - virDomainBlockJobInfoPtr info, - unsigned long long *bandwidth) - ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); +virHashTablePtr qemuMonitorJSONGetAllBlockJobInfo(qemuMonitorPtr mon) + ATTRIBUTE_NONNULL(1); int qemuMonitorJSONSetLink(qemuMonitorPtr mon, const char *name, -- 2.4.3
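
The hash returned by qemuMonitorGetAllBlockJobInfo is keyed by device alias (with the "drive-" prefix already stripped), so a caller that needs to look at several block jobs can query the monitor once and do the per-disk lookups locally. A rough sketch, reusing the names used elsewhere in this series and omitting error handling:

    virHashTablePtr jobs;
    size_t i;

    /* the monitor must have been entered via qemuDomainObjEnterMonitor() */
    if (!(jobs = qemuMonitorGetAllBlockJobInfo(priv->mon)))
        return -1;

    for (i = 0; i < vm->def->ndisks; i++) {
        virDomainDiskDefPtr disk = vm->def->disks[i];
        qemuMonitorBlockJobInfoPtr job;

        if ((job = virHashLookup(jobs, disk->info.alias)))
            VIR_DEBUG("disk %s: job type %d, %llu of %llu",
                      disk->dst, job->type, job->cur, job->end);
    }

    virHashFree(jobs);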

When libvirtd is restarted during migration, we properly cancel the ongoing migration (unless it managed to almost finished before the restart). But if we were also migrating storage using NBD, we would completely forget about the running disk mirrors. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- Notes: ACKed in version 2 Version 3: - no change Version 2: - make use of qemuMonitorGetAllBlockJobInfo introduced by the previous patch - undo qemuBlockJobSyncBegin in case of error src/qemu/qemu_domain.c | 45 ++++++++++++++++++++++++- src/qemu/qemu_migration.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_migration.h | 3 ++ src/qemu/qemu_process.c | 8 +---- 4 files changed, 133 insertions(+), 8 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 24ff020..68b6a95 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -578,7 +578,27 @@ qemuDomainObjPrivateXMLFormat(virBufferPtr buf, qemuDomainAsyncJobPhaseToString( priv->job.asyncJob, priv->job.phase)); } - virBufferAddLit(buf, "/>\n"); + if (priv->job.asyncJob != QEMU_ASYNC_JOB_MIGRATION_OUT) { + virBufferAddLit(buf, "/>\n"); + } else { + size_t i; + virDomainDiskDefPtr disk; + qemuDomainDiskPrivatePtr diskPriv; + + virBufferAddLit(buf, ">\n"); + virBufferAdjustIndent(buf, 2); + + for (i = 0; i < vm->def->ndisks; i++) { + disk = vm->def->disks[i]; + diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk); + virBufferAsprintf(buf, "<disk dev='%s' migrating='%s'/>\n", + disk->dst, + diskPriv->migrating ? "yes" : "no"); + } + + virBufferAdjustIndent(buf, -2); + virBufferAddLit(buf, "</job>\n"); + } } priv->job.active = job; @@ -736,6 +756,29 @@ qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, } } + if ((n = virXPathNodeSet("./job[1]/disk[@migrating='yes']", + ctxt, &nodes)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("failed to parse list of disks marked for migration")); + goto error; + } + if (n > 0) { + if (priv->job.asyncJob != QEMU_ASYNC_JOB_MIGRATION_OUT) { + VIR_WARN("Found disks marked for migration but we were not " + "migrating"); + n = 0; + } + for (i = 0; i < n; i++) { + char *dst = virXMLPropString(nodes[i], "dev"); + virDomainDiskDefPtr disk; + + if (dst && (disk = virDomainDiskByName(vm->def, dst, false))) + QEMU_DOMAIN_DISK_PRIVATE(disk)->migrating = true; + VIR_FREE(dst); + } + } + VIR_FREE(nodes); + priv->fakeReboot = virXPathBoolean("boolean(./fakereboot)", ctxt) == 1; if ((n = virXPathNodeSet("./devices/device", ctxt, &nodes)) < 0) { diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 065a0f9..e0de09e 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -2000,6 +2000,7 @@ qemuMigrationDriveMirror(virQEMUDriverPtr driver, char *hoststr = NULL; unsigned int mirror_flags = VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT; int rv; + virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); VIR_DEBUG("Starting drive mirrors for domain %s", vm->def->name); @@ -2049,6 +2050,11 @@ qemuMigrationDriveMirror(virQEMUDriverPtr driver, goto cleanup; } diskPriv->migrating = true; + + if (virDomainSaveStatus(driver->xmlopt, cfg->stateDir, vm) < 0) { + VIR_WARN("Failed to save status on vm %s", vm->def->name); + goto cleanup; + } } while ((rv = qemuMigrationDriveMirrorReady(driver, vm)) != 1) { @@ -2076,6 +2082,7 @@ qemuMigrationDriveMirror(virQEMUDriverPtr driver, ret = 0; cleanup: + virObjectUnref(cfg); VIR_FREE(diskAlias); VIR_FREE(nbd_dest); VIR_FREE(hoststr); @@ -5698,6 +5705,84 @@ qemuMigrationToFile(virQEMUDriverPtr driver, 
virDomainObjPtr vm, return ret; } + +int +qemuMigrationCancel(virQEMUDriverPtr driver, + virDomainObjPtr vm) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + virHashTablePtr blockJobs = NULL; + bool storage = false; + size_t i; + int ret = -1; + + VIR_DEBUG("Canceling unfinished outgoing migration of domain %s", + vm->def->name); + + for (i = 0; i < vm->def->ndisks; i++) { + virDomainDiskDefPtr disk = vm->def->disks[i]; + if (QEMU_DOMAIN_DISK_PRIVATE(disk)->migrating) { + qemuBlockJobSyncBegin(disk); + storage = true; + } + } + + qemuDomainObjEnterMonitor(driver, vm); + + ignore_value(qemuMonitorMigrateCancel(priv->mon)); + if (storage) + blockJobs = qemuMonitorGetAllBlockJobInfo(priv->mon); + + if (qemuDomainObjExitMonitor(driver, vm) < 0 || (storage && !blockJobs)) + goto endsyncjob; + + if (!storage) { + ret = 0; + goto cleanup; + } + + for (i = 0; i < vm->def->ndisks; i++) { + virDomainDiskDefPtr disk = vm->def->disks[i]; + qemuDomainDiskPrivatePtr diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk); + + if (!diskPriv->migrating) + continue; + + if (virHashLookup(blockJobs, disk->info.alias)) { + VIR_DEBUG("Drive mirror on disk %s is still running", disk->dst); + } else { + VIR_DEBUG("Drive mirror on disk %s is gone", disk->dst); + qemuBlockJobSyncEnd(driver, vm, disk); + diskPriv->migrating = false; + } + } + + if (qemuMigrationCancelDriveMirror(driver, vm, false, + QEMU_ASYNC_JOB_NONE) < 0) + goto endsyncjob; + + ret = 0; + + cleanup: + virHashFree(blockJobs); + return ret; + + endsyncjob: + if (storage) { + for (i = 0; i < vm->def->ndisks; i++) { + virDomainDiskDefPtr disk = vm->def->disks[i]; + qemuDomainDiskPrivatePtr diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk); + + if (diskPriv->migrating) { + qemuBlockJobSyncEnd(driver, vm, disk); + diskPriv->migrating = false; + } + } + } + goto cleanup; +} + + int qemuMigrationJobStart(virQEMUDriverPtr driver, virDomainObjPtr vm, diff --git a/src/qemu/qemu_migration.h b/src/qemu/qemu_migration.h index 1726455..e47bde5 100644 --- a/src/qemu/qemu_migration.h +++ b/src/qemu/qemu_migration.h @@ -177,4 +177,7 @@ int qemuMigrationToFile(virQEMUDriverPtr driver, virDomainObjPtr vm, ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(5) ATTRIBUTE_RETURN_CHECK; +int qemuMigrationCancel(virQEMUDriverPtr driver, + virDomainObjPtr vm); + #endif /* __QEMU_MIGRATION_H__ */ diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 3c9d4bc..5be0002 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -3354,8 +3354,6 @@ qemuProcessRecoverMigration(virQEMUDriverPtr driver, virDomainState state, int reason) { - qemuDomainObjPrivatePtr priv = vm->privateData; - if (job == QEMU_ASYNC_JOB_MIGRATION_IN) { switch (phase) { case QEMU_MIGRATION_PHASE_NONE: @@ -3409,11 +3407,7 @@ qemuProcessRecoverMigration(virQEMUDriverPtr driver, case QEMU_MIGRATION_PHASE_PERFORM3: /* migration is still in progress, let's cancel it and resume the * domain */ - VIR_DEBUG("Canceling unfinished outgoing migration of domain %s", - vm->def->name); - qemuDomainObjEnterMonitor(driver, vm); - ignore_value(qemuMonitorMigrateCancel(priv->mon)); - if (qemuDomainObjExitMonitor(driver, vm) < 0) + if (qemuMigrationCancel(driver, vm) < 0) return -1; /* resume the domain but only if it was paused as a result of * migration */ -- 2.4.3
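
With this change the domain status XML written by qemuDomainObjPrivateXMLFormat records which disks have an active mirror, so the information survives a libvirtd restart. The <job> element would look roughly like the following; the attribute values are purely illustrative:

    <job type='none' async='migration out' phase='perform3'>
      <disk dev='vda' migrating='yes'/>
      <disk dev='vdb' migrating='no'/>
    </job>

On restart, qemuDomainObjPrivateXMLParse picks up the disks marked migrating='yes' and qemuMigrationCancel can then clean up the corresponding drive mirrors.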

To avoid polling for asyncAbort flag changes. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- Notes: ACKed in version 2 Version 3: - no change Version 2: - rewritten using domain condition src/qemu/qemu_domain.c | 5 +++-- src/qemu/qemu_domain.h | 2 +- src/qemu/qemu_migration.c | 11 ++++------- 3 files changed, 8 insertions(+), 10 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 68b6a95..25fa8d3 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -169,7 +169,7 @@ qemuDomainObjResetAsyncJob(qemuDomainObjPrivatePtr priv) job->phase = 0; job->mask = QEMU_JOB_DEFAULT_MASK; job->dump_memory_only = false; - job->asyncAbort = false; + job->abortJob = false; VIR_FREE(job->current); } @@ -1652,7 +1652,8 @@ qemuDomainObjAbortAsyncJob(virDomainObjPtr obj) qemuDomainAsyncJobTypeToString(priv->job.asyncJob), obj, obj->def->name); - priv->job.asyncAbort = true; + priv->job.abortJob = true; + virDomainObjBroadcast(obj); } /* diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 9003c9b..a3c5015 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -135,7 +135,7 @@ struct qemuDomainJobObj { bool dump_memory_only; /* use dump-guest-memory to do dump */ qemuDomainJobInfoPtr current; /* async job progress data */ qemuDomainJobInfoPtr completed; /* statistics data of a recently completed job */ - bool asyncAbort; /* abort of async job requested */ + bool abortJob; /* abort of the job requested */ }; typedef void (*qemuDomainCleanupCallback)(virQEMUDriverPtr driver, diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index e0de09e..d82a5ba 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -2058,12 +2058,10 @@ qemuMigrationDriveMirror(virQEMUDriverPtr driver, } while ((rv = qemuMigrationDriveMirrorReady(driver, vm)) != 1) { - unsigned long long now; - if (rv < 0) goto cleanup; - if (priv->job.asyncAbort) { + if (priv->job.abortJob) { priv->job.current->type = VIR_DOMAIN_JOB_CANCELLED; virReportError(VIR_ERR_OPERATION_ABORTED, _("%s: %s"), qemuDomainAsyncJobTypeToString(priv->job.asyncJob), @@ -2071,8 +2069,7 @@ qemuMigrationDriveMirror(virQEMUDriverPtr driver, goto cleanup; } - if (virTimeMillisNow(&now) < 0 || - virDomainObjWaitUntil(vm, now + 500) < 0) + if (virDomainObjWait(vm) < 0) goto cleanup; } @@ -4069,10 +4066,10 @@ qemuMigrationRun(virQEMUDriverPtr driver, QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) goto cleanup; - if (priv->job.asyncAbort) { + if (priv->job.abortJob) { /* explicitly do this *after* we entered the monitor, * as this is a critical section so we are guaranteed - * priv->job.asyncAbort will not change */ + * priv->job.abortJob will not change */ ignore_value(qemuDomainObjExitMonitor(driver, vm)); priv->job.current->type = VIR_DOMAIN_JOB_CANCELLED; virReportError(VIR_ERR_OPERATION_ABORTED, _("%s: %s"), -- 2.4.3
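
The resulting pattern is the usual condition-variable handshake on the per-domain condition: the thread requesting the abort sets the flag and wakes any waiters, and the waiting thread re-checks the flag after every wake-up while holding the domain object lock. A simplified sketch (the labels and the "done" predicate are placeholders, not code from the patch):

    /* aborting thread (qemuDomainObjAbortAsyncJob) */
    priv->job.abortJob = true;
    virDomainObjBroadcast(vm);

    /* waiting thread, domain object locked */
    while (!done) {
        if (priv->job.abortJob)
            goto abort;                   /* abort requested by another thread */
        if (virDomainObjWait(vm) < 0)     /* drops the lock while sleeping */
            goto error;
    }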

Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- Notes: ACKed in version 2 Version 3: - no change Version 2: - new patch src/qemu/qemu_monitor.c | 12 ++++++++++++ src/qemu/qemu_monitor.h | 6 ++++++ src/qemu/qemu_monitor_json.c | 10 ++++++++++ 3 files changed, 28 insertions(+) diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index ae7ef28..b7de846 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -1479,6 +1479,18 @@ qemuMonitorEmitSerialChange(qemuMonitorPtr mon, int +qemuMonitorEmitSpiceMigrated(qemuMonitorPtr mon) +{ + int ret = -1; + VIR_DEBUG("mon=%p", mon); + + QEMU_MONITOR_CALLBACK(mon, ret, domainSpiceMigrated, mon->vm); + + return ret; +} + + +int qemuMonitorSetCapabilities(qemuMonitorPtr mon) { QEMU_CHECK_MONITOR(mon); diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 96f47e8..a29c505 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -182,6 +182,10 @@ typedef int (*qemuMonitorDomainSerialChangeCallback)(qemuMonitorPtr mon, bool connected, void *opaque); +typedef int (*qemuMonitorDomainSpiceMigratedCallback)(qemuMonitorPtr mon, + virDomainObjPtr vm, + void *opaque); + typedef struct _qemuMonitorCallbacks qemuMonitorCallbacks; typedef qemuMonitorCallbacks *qemuMonitorCallbacksPtr; struct _qemuMonitorCallbacks { @@ -209,6 +213,7 @@ struct _qemuMonitorCallbacks { qemuMonitorDomainDeviceDeletedCallback domainDeviceDeleted; qemuMonitorDomainNicRxFilterChangedCallback domainNicRxFilterChanged; qemuMonitorDomainSerialChangeCallback domainSerialChange; + qemuMonitorDomainSpiceMigratedCallback domainSpiceMigrated; }; char *qemuMonitorEscapeArg(const char *in); @@ -307,6 +312,7 @@ int qemuMonitorEmitNicRxFilterChanged(qemuMonitorPtr mon, int qemuMonitorEmitSerialChange(qemuMonitorPtr mon, const char *devAlias, bool connected); +int qemuMonitorEmitSpiceMigrated(qemuMonitorPtr mon); int qemuMonitorStartCPUs(qemuMonitorPtr mon, virConnectPtr conn); diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 5b227cd..04ae339 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -83,6 +83,7 @@ static void qemuMonitorJSONHandleGuestPanic(qemuMonitorPtr mon, virJSONValuePtr static void qemuMonitorJSONHandleDeviceDeleted(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleNicRxFilterChanged(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleSerialChange(qemuMonitorPtr mon, virJSONValuePtr data); +static void qemuMonitorJSONHandleSpiceMigrated(qemuMonitorPtr mon, virJSONValuePtr data); typedef struct { const char *type; @@ -107,6 +108,7 @@ static qemuEventHandler eventHandlers[] = { { "SPICE_CONNECTED", qemuMonitorJSONHandleSPICEConnect, }, { "SPICE_DISCONNECTED", qemuMonitorJSONHandleSPICEDisconnect, }, { "SPICE_INITIALIZED", qemuMonitorJSONHandleSPICEInitialize, }, + { "SPICE_MIGRATE_COMPLETED", qemuMonitorJSONHandleSpiceMigrated, }, { "STOP", qemuMonitorJSONHandleStop, }, { "SUSPEND", qemuMonitorJSONHandlePMSuspend, }, { "SUSPEND_DISK", qemuMonitorJSONHandlePMSuspendDisk, }, @@ -914,6 +916,14 @@ qemuMonitorJSONHandleSerialChange(qemuMonitorPtr mon, } +static void +qemuMonitorJSONHandleSpiceMigrated(qemuMonitorPtr mon, + virJSONValuePtr data ATTRIBUTE_UNUSED) +{ + qemuMonitorEmitSpiceMigrated(mon); +} + + int qemuMonitorJSONHumanCommandWithFd(qemuMonitorPtr mon, const char *cmd_str, -- 2.4.3

QEMU_CAPS_SEAMLESS_MIGRATION capability says QEMU supports SPICE_MIGRATE_COMPLETED event. Thus we can just drop all code which polls query-spice and replace it with waiting for the event. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- Notes: ACKed in version 2 Version 3: - no change Version 2: - new patch src/qemu/qemu_domain.c | 1 + src/qemu/qemu_domain.h | 1 + src/qemu/qemu_migration.c | 38 ++++++++++------------------------ src/qemu/qemu_monitor.c | 10 --------- src/qemu/qemu_monitor.h | 2 -- src/qemu/qemu_monitor_json.c | 49 -------------------------------------------- src/qemu/qemu_process.c | 28 +++++++++++++++++++++++++ tests/qemumonitorjsontest.c | 40 ------------------------------------ 8 files changed, 41 insertions(+), 128 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 25fa8d3..c9bdf6b 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -170,6 +170,7 @@ qemuDomainObjResetAsyncJob(qemuDomainObjPrivatePtr priv) job->mask = QEMU_JOB_DEFAULT_MASK; job->dump_memory_only = false; job->abortJob = false; + job->spiceMigrated = false; VIR_FREE(job->current); } diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index a3c5015..54e1e7b 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -136,6 +136,7 @@ struct qemuDomainJobObj { qemuDomainJobInfoPtr current; /* async job progress data */ qemuDomainJobInfoPtr completed; /* statistics data of a recently completed job */ bool abortJob; /* abort of the job requested */ + bool spiceMigrated; /* spice migration completed */ }; typedef void (*qemuDomainCleanupCallback)(virQEMUDriverPtr driver, diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index d82a5ba..d5a9dea 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -2390,45 +2390,29 @@ qemuMigrationSetPinAll(virQEMUDriverPtr driver, } static int -qemuMigrationWaitForSpice(virQEMUDriverPtr driver, - virDomainObjPtr vm) +qemuMigrationWaitForSpice(virDomainObjPtr vm) { qemuDomainObjPrivatePtr priv = vm->privateData; bool wait_for_spice = false; - bool spice_migrated = false; size_t i = 0; - int rc; - if (virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_SEAMLESS_MIGRATION)) { - for (i = 0; i < vm->def->ngraphics; i++) { - if (vm->def->graphics[i]->type == VIR_DOMAIN_GRAPHICS_TYPE_SPICE) { - wait_for_spice = true; - break; - } + if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_SEAMLESS_MIGRATION)) + return 0; + + for (i = 0; i < vm->def->ngraphics; i++) { + if (vm->def->graphics[i]->type == VIR_DOMAIN_GRAPHICS_TYPE_SPICE) { + wait_for_spice = true; + break; } } if (!wait_for_spice) return 0; - while (!spice_migrated) { - /* Poll every 50ms for progress & to allow cancellation */ - struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000ull }; - - if (qemuDomainObjEnterMonitorAsync(driver, vm, - QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) + while (!priv->job.spiceMigrated && !priv->job.abortJob) { + if (virDomainObjWait(vm) < 0) return -1; - - rc = qemuMonitorGetSpiceMigrationStatus(priv->mon, &spice_migrated); - if (qemuDomainObjExitMonitor(driver, vm) < 0) - return -1; - if (rc < 0) - return -1; - virObjectUnlock(vm); - nanosleep(&ts, NULL); - virObjectLock(vm); } - return 0; } @@ -3602,7 +3586,7 @@ qemuMigrationConfirmPhase(virQEMUDriverPtr driver, if (retcode == 0) { /* If guest uses SPICE and supports seamless migration we have to hold * up domain shutdown until SPICE server transfers its data */ - qemuMigrationWaitForSpice(driver, vm); + qemuMigrationWaitForSpice(vm); 
qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_MIGRATED, VIR_QEMU_PROCESS_STOP_MIGRATED); diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index b7de846..94b0007 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -2117,16 +2117,6 @@ qemuMonitorGetMigrationStatus(qemuMonitorPtr mon, int -qemuMonitorGetSpiceMigrationStatus(qemuMonitorPtr mon, - bool *spice_migrated) -{ - QEMU_CHECK_MONITOR_JSON(mon); - - return qemuMonitorJSONGetSpiceMigrationStatus(mon, spice_migrated); -} - - -int qemuMonitorMigrateToFd(qemuMonitorPtr mon, unsigned int flags, int fd) diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index a29c505..1afc344 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -498,8 +498,6 @@ struct _qemuMonitorMigrationStatus { int qemuMonitorGetMigrationStatus(qemuMonitorPtr mon, qemuMonitorMigrationStatusPtr status); -int qemuMonitorGetSpiceMigrationStatus(qemuMonitorPtr mon, - bool *spice_migrated); typedef enum { QEMU_MONITOR_MIGRATION_CAPS_XBZRLE, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 04ae339..0ba549e 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -2684,55 +2684,6 @@ int qemuMonitorJSONGetMigrationStatus(qemuMonitorPtr mon, } -static int -qemuMonitorJSONSpiceGetMigrationStatusReply(virJSONValuePtr reply, - bool *spice_migrated) -{ - virJSONValuePtr ret; - - if (!(ret = virJSONValueObjectGet(reply, "return"))) { - virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("query-spice reply was missing return data")); - return -1; - } - - if (virJSONValueObjectGetBoolean(ret, "migrated", spice_migrated) < 0) { - /* Deliberately don't report error here as we are - * probably dealing with older qemu which doesn't - * report this yet. Pretend spice is migrated. 
*/ - *spice_migrated = true; - } - - return 0; -} - - -int qemuMonitorJSONGetSpiceMigrationStatus(qemuMonitorPtr mon, - bool *spice_migrated) -{ - int ret; - virJSONValuePtr cmd = qemuMonitorJSONMakeCommand("query-spice", - NULL); - virJSONValuePtr reply = NULL; - - if (!cmd) - return -1; - - ret = qemuMonitorJSONCommand(mon, cmd, &reply); - - if (ret == 0) - ret = qemuMonitorJSONCheckError(cmd, reply); - - if (ret == 0) - ret = qemuMonitorJSONSpiceGetMigrationStatusReply(reply, - spice_migrated); - - virJSONValueFree(cmd); - virJSONValueFree(reply); - return ret; -} - - int qemuMonitorJSONMigrate(qemuMonitorPtr mon, unsigned int flags, const char *uri) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 5be0002..ba84182 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1481,6 +1481,33 @@ qemuProcessHandleSerialChanged(qemuMonitorPtr mon ATTRIBUTE_UNUSED, } +static int +qemuProcessHandleSpiceMigrated(qemuMonitorPtr mon ATTRIBUTE_UNUSED, + virDomainObjPtr vm, + void *opaque ATTRIBUTE_UNUSED) +{ + qemuDomainObjPrivatePtr priv; + + virObjectLock(vm); + + VIR_DEBUG("Spice migration completed for domain %p %s", + vm, vm->def->name); + + priv = vm->privateData; + if (priv->job.asyncJob != QEMU_ASYNC_JOB_MIGRATION_OUT) { + VIR_DEBUG("got SPICE_MIGRATE_COMPLETED event without a migration job"); + goto cleanup; + } + + priv->job.spiceMigrated = true; + virDomainObjSignal(vm); + + cleanup: + virObjectUnlock(vm); + return 0; +} + + static qemuMonitorCallbacks monitorCallbacks = { .eofNotify = qemuProcessHandleMonitorEOF, .errorNotify = qemuProcessHandleMonitorError, @@ -1504,6 +1531,7 @@ static qemuMonitorCallbacks monitorCallbacks = { .domainDeviceDeleted = qemuProcessHandleDeviceDeleted, .domainNicRxFilterChanged = qemuProcessHandleNicRxFilterChanged, .domainSerialChange = qemuProcessHandleSerialChanged, + .domainSpiceMigrated = qemuProcessHandleSpiceMigrated, }; static int diff --git a/tests/qemumonitorjsontest.c b/tests/qemumonitorjsontest.c index 0f82fd8..0623275 100644 --- a/tests/qemumonitorjsontest.c +++ b/tests/qemumonitorjsontest.c @@ -1706,45 +1706,6 @@ testQemuMonitorJSONqemuMonitorJSONGetMigrationStatus(const void *data) } static int -testQemuMonitorJSONqemuMonitorJSONGetSpiceMigrationStatus(const void *data) -{ - virDomainXMLOptionPtr xmlopt = (virDomainXMLOptionPtr)data; - qemuMonitorTestPtr test = qemuMonitorTestNewSimple(true, xmlopt); - int ret = -1; - bool spiceMigrated; - - if (!test) - return -1; - - if (qemuMonitorTestAddItem(test, "query-spice", - "{" - " \"return\": {" - " \"migrated\": true," - " \"enabled\": false," - " \"mouse-mode\": \"client\"" - " }," - " \"id\": \"libvirt-14\"" - "}") < 0) - goto cleanup; - - if (qemuMonitorJSONGetSpiceMigrationStatus(qemuMonitorTestGetMonitor(test), - &spiceMigrated) < 0) - goto cleanup; - - if (!spiceMigrated) { - virReportError(VIR_ERR_INTERNAL_ERROR, - "Invalid spice migration status: %d, expecting 1", - spiceMigrated); - goto cleanup; - } - - ret = 0; - cleanup: - qemuMonitorTestFree(test); - return ret; -} - -static int testHashEqualChardevInfo(const void *value1, const void *value2) { const qemuMonitorChardevInfo *info1 = value1; @@ -2400,7 +2361,6 @@ mymain(void) DO_TEST(qemuMonitorJSONGetBlockStatsInfo); DO_TEST(qemuMonitorJSONGetMigrationCacheSize); DO_TEST(qemuMonitorJSONGetMigrationStatus); - DO_TEST(qemuMonitorJSONGetSpiceMigrationStatus); DO_TEST(qemuMonitorJSONGetChardevInfo); DO_TEST(qemuMonitorJSONSetBlockIoThrottle); DO_TEST(qemuMonitorJSONGetTargetArch); -- 2.4.3

Move common parts of qemuDomainGetJobInfo and qemuDomainGetJobStats into a separate API (qemuDomainGetJobStatsInternal). Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- Notes: ACKed in version 2 Version 3: - no change Version 2: - new patch src/qemu/qemu_driver.c | 113 ++++++++++++++++++++++++++----------------------- 1 file changed, 61 insertions(+), 52 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 1556a9e..17c8c85 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -13090,42 +13090,72 @@ qemuConnectBaselineCPU(virConnectPtr conn ATTRIBUTE_UNUSED, } -static int qemuDomainGetJobInfo(virDomainPtr dom, - virDomainJobInfoPtr info) +static int +qemuDomainGetJobStatsInternal(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virDomainObjPtr vm, + bool completed, + qemuDomainJobInfoPtr jobInfo) { + qemuDomainObjPrivatePtr priv = vm->privateData; + qemuDomainJobInfoPtr info; + int ret = -1; + + if (!completed && + !virDomainObjIsActive(vm)) { + virReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain is not running")); + goto cleanup; + } + + if (completed) + info = priv->job.completed; + else + info = priv->job.current; + + if (!info) { + jobInfo->type = VIR_DOMAIN_JOB_NONE; + ret = 0; + goto cleanup; + } + *jobInfo = *info; + + if (jobInfo->type == VIR_DOMAIN_JOB_BOUNDED || + jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED) + ret = qemuDomainJobInfoUpdateTime(jobInfo); + else + ret = 0; + + cleanup: + return ret; +} + + +static int +qemuDomainGetJobInfo(virDomainPtr dom, + virDomainJobInfoPtr info) +{ + virQEMUDriverPtr driver = dom->conn->privateData; + qemuDomainJobInfo jobInfo; virDomainObjPtr vm; int ret = -1; - qemuDomainObjPrivatePtr priv; if (!(vm = qemuDomObjFromDomain(dom))) goto cleanup; - priv = vm->privateData; - if (virDomainGetJobInfoEnsureACL(dom->conn, vm->def) < 0) goto cleanup; - if (virDomainObjIsActive(vm)) { - if (priv->job.current) { - /* Refresh elapsed time again just to ensure it - * is fully updated. 
This is primarily for benefit - * of incoming migration which we don't currently - * monitor actively in the background thread - */ - if (qemuDomainJobInfoUpdateTime(priv->job.current) < 0 || - qemuDomainJobInfoToInfo(priv->job.current, info) < 0) - goto cleanup; - } else { - memset(info, 0, sizeof(*info)); - info->type = VIR_DOMAIN_JOB_NONE; - } - } else { - virReportError(VIR_ERR_OPERATION_INVALID, - "%s", _("domain is not running")); + if (qemuDomainGetJobStatsInternal(driver, vm, false, &jobInfo) < 0) + goto cleanup; + + if (jobInfo.type == VIR_DOMAIN_JOB_NONE) { + memset(info, 0, sizeof(*info)); + info->type = VIR_DOMAIN_JOB_NONE; + ret = 0; goto cleanup; } - ret = 0; + ret = qemuDomainJobInfoToInfo(&jobInfo, info); cleanup: virDomainObjEndAPI(&vm); @@ -13140,9 +13170,11 @@ qemuDomainGetJobStats(virDomainPtr dom, int *nparams, unsigned int flags) { + virQEMUDriverPtr driver = dom->conn->privateData; virDomainObjPtr vm; qemuDomainObjPrivatePtr priv; - qemuDomainJobInfoPtr jobInfo; + qemuDomainJobInfo jobInfo; + bool completed = !!(flags & VIR_DOMAIN_JOB_STATS_COMPLETED); int ret = -1; virCheckFlags(VIR_DOMAIN_JOB_STATS_COMPLETED, -1); @@ -13150,24 +13182,14 @@ qemuDomainGetJobStats(virDomainPtr dom, if (!(vm = qemuDomObjFromDomain(dom))) goto cleanup; - priv = vm->privateData; - if (virDomainGetJobStatsEnsureACL(dom->conn, vm->def) < 0) goto cleanup; - if (!(flags & VIR_DOMAIN_JOB_STATS_COMPLETED) && - !virDomainObjIsActive(vm)) { - virReportError(VIR_ERR_OPERATION_INVALID, - "%s", _("domain is not running")); + priv = vm->privateData; + if (qemuDomainGetJobStatsInternal(driver, vm, completed, &jobInfo) < 0) goto cleanup; - } - if (flags & VIR_DOMAIN_JOB_STATS_COMPLETED) - jobInfo = priv->job.completed; - else - jobInfo = priv->job.current; - - if (!jobInfo) { + if (jobInfo.type == VIR_DOMAIN_JOB_NONE) { *type = VIR_DOMAIN_JOB_NONE; *params = NULL; *nparams = 0; @@ -13175,24 +13197,11 @@ qemuDomainGetJobStats(virDomainPtr dom, goto cleanup; } - /* Refresh elapsed time again just to ensure it - * is fully updated. This is primarily for benefit - * of incoming migration which we don't currently - * monitor actively in the background thread - */ - if ((jobInfo->type == VIR_DOMAIN_JOB_BOUNDED || - jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED) && - qemuDomainJobInfoUpdateTime(jobInfo) < 0) - goto cleanup; + ret = qemuDomainJobInfoToParams(&jobInfo, type, params, nparams); - if (qemuDomainJobInfoToParams(jobInfo, type, params, nparams) < 0) - goto cleanup; - - if (flags & VIR_DOMAIN_JOB_STATS_COMPLETED) + if (completed && ret == 0) VIR_FREE(priv->job.completed); - ret = 0; - cleanup: virDomainObjEndAPI(&vm); return ret; -- 2.4.3
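
Nothing changes for clients; this is purely an internal refactoring of the two driver entry points. For reference, an application keeps fetching the statistics through the public API, e.g. (illustrative only, given a virDomainPtr dom, error handling omitted):

    virTypedParameterPtr params = NULL;
    int nparams = 0;
    int type;
    unsigned long long elapsed = 0;

    if (virDomainGetJobStats(dom, &type, &params, &nparams,
                             VIR_DOMAIN_JOB_STATS_COMPLETED) == 0) {
        virTypedParamsGetULLong(params, nparams,
                                VIR_DOMAIN_JOB_TIME_ELAPSED, &elapsed);
        virTypedParamsFree(params, nparams);
    }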

Once we start waiting for migration events instead of polling query-migrate, priv->job.current will not be regularly updated anymore because we will get the current status directly from the events. Thus virDomainGetJob{Info,Stats} will have to query QEMU, but they can't just blindly update priv->job.current structure. This patch introduces qemuMigrationFetchJobStatus which just fills in a caller supplied structure and makes qemuMigrationUpdateJobStatus a tiny wrapper around it. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- Notes: ACKed in version 2 Version 3: - no change Version 2: - new patch src/qemu/qemu_migration.c | 133 ++++++++++++++++++++++++++++++---------------- src/qemu/qemu_migration.h | 5 ++ 2 files changed, 93 insertions(+), 45 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index d5a9dea..7259c57 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -2416,67 +2416,110 @@ qemuMigrationWaitForSpice(virDomainObjPtr vm) return 0; } -static int -qemuMigrationUpdateJobStatus(virQEMUDriverPtr driver, - virDomainObjPtr vm, - const char *job, - qemuDomainAsyncJob asyncJob) + +static void +qemuMigrationUpdateJobType(qemuDomainJobInfoPtr jobInfo) { - qemuDomainObjPrivatePtr priv = vm->privateData; - qemuMonitorMigrationStatus status; - qemuDomainJobInfoPtr jobInfo; - int ret; - - memset(&status, 0, sizeof(status)); - - ret = qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob); - if (ret < 0) { - /* Guest already exited or waiting for the job timed out; nothing - * further to update. */ - return ret; - } - ret = qemuMonitorGetMigrationStatus(priv->mon, &status); - - if (qemuDomainObjExitMonitor(driver, vm) < 0) - return -1; - - if (ret < 0 || - qemuDomainJobInfoUpdateTime(priv->job.current) < 0) - return -1; - - ret = -1; - jobInfo = priv->job.current; - switch (status.status) { + switch (jobInfo->status.status) { case QEMU_MONITOR_MIGRATION_STATUS_COMPLETED: jobInfo->type = VIR_DOMAIN_JOB_COMPLETED; - /* fall through */ - case QEMU_MONITOR_MIGRATION_STATUS_SETUP: - case QEMU_MONITOR_MIGRATION_STATUS_ACTIVE: - case QEMU_MONITOR_MIGRATION_STATUS_CANCELLING: - ret = 0; break; case QEMU_MONITOR_MIGRATION_STATUS_INACTIVE: jobInfo->type = VIR_DOMAIN_JOB_NONE; - virReportError(VIR_ERR_OPERATION_FAILED, - _("%s: %s"), job, _("is not active")); break; case QEMU_MONITOR_MIGRATION_STATUS_ERROR: jobInfo->type = VIR_DOMAIN_JOB_FAILED; - virReportError(VIR_ERR_OPERATION_FAILED, - _("%s: %s"), job, _("unexpectedly failed")); break; case QEMU_MONITOR_MIGRATION_STATUS_CANCELLED: jobInfo->type = VIR_DOMAIN_JOB_CANCELLED; + break; + + case QEMU_MONITOR_MIGRATION_STATUS_SETUP: + case QEMU_MONITOR_MIGRATION_STATUS_ACTIVE: + case QEMU_MONITOR_MIGRATION_STATUS_CANCELLING: + break; + } +} + + +int +qemuMigrationFetchJobStatus(virQEMUDriverPtr driver, + virDomainObjPtr vm, + qemuDomainAsyncJob asyncJob, + qemuDomainJobInfoPtr jobInfo) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + int rv; + + if (qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob) < 0) + return -1; + + memset(&jobInfo->status, 0, sizeof(jobInfo->status)); + rv = qemuMonitorGetMigrationStatus(priv->mon, &jobInfo->status); + + if (qemuDomainObjExitMonitor(driver, vm) < 0 || rv < 0) + return -1; + + qemuMigrationUpdateJobType(jobInfo); + return qemuDomainJobInfoUpdateTime(jobInfo); +} + + +static int +qemuMigrationUpdateJobStatus(virQEMUDriverPtr driver, + virDomainObjPtr vm, + qemuDomainAsyncJob asyncJob) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + 
qemuDomainJobInfoPtr jobInfo = priv->job.current; + qemuDomainJobInfo newInfo = *jobInfo; + + if (qemuMigrationFetchJobStatus(driver, vm, asyncJob, &newInfo) < 0) + return -1; + + *jobInfo = newInfo; + return 0; +} + + +static int +qemuMigrationCheckJobStatus(virQEMUDriverPtr driver, + virDomainObjPtr vm, + const char *job, + qemuDomainAsyncJob asyncJob) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + qemuDomainJobInfoPtr jobInfo = priv->job.current; + + if (qemuMigrationUpdateJobStatus(driver, vm, asyncJob) < 0) + return -1; + + switch (jobInfo->type) { + case VIR_DOMAIN_JOB_NONE: + virReportError(VIR_ERR_OPERATION_FAILED, + _("%s: %s"), job, _("is not active")); + return -1; + + case VIR_DOMAIN_JOB_FAILED: + virReportError(VIR_ERR_OPERATION_FAILED, + _("%s: %s"), job, _("unexpectedly failed")); + return -1; + + case VIR_DOMAIN_JOB_CANCELLED: virReportError(VIR_ERR_OPERATION_ABORTED, _("%s: %s"), job, _("canceled by client")); + return -1; + + case VIR_DOMAIN_JOB_BOUNDED: + case VIR_DOMAIN_JOB_UNBOUNDED: + case VIR_DOMAIN_JOB_COMPLETED: + case VIR_DOMAIN_JOB_LAST: break; } - jobInfo->status = status; - - return ret; + return 0; } @@ -2517,7 +2560,7 @@ qemuMigrationWaitForCompletion(virQEMUDriverPtr driver, /* Poll every 50ms for progress & to allow cancellation */ struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000ull }; - if (qemuMigrationUpdateJobStatus(driver, vm, job, asyncJob) == -1) + if (qemuMigrationCheckJobStatus(driver, vm, job, asyncJob) < 0) goto error; if (storage && @@ -4123,8 +4166,8 @@ qemuMigrationRun(virQEMUDriverPtr driver, * rather failed later on. Check its status before waiting for a * connection from qemu which may never be initiated. */ - if (qemuMigrationUpdateJobStatus(driver, vm, _("migration job"), - QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) + if (qemuMigrationCheckJobStatus(driver, vm, _("migration job"), + QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) goto cancel; while ((fd = accept(spec->dest.unix_socket.sock, NULL, NULL)) < 0) { diff --git a/src/qemu/qemu_migration.h b/src/qemu/qemu_migration.h index e47bde5..65dfdec 100644 --- a/src/qemu/qemu_migration.h +++ b/src/qemu/qemu_migration.h @@ -180,4 +180,9 @@ int qemuMigrationToFile(virQEMUDriverPtr driver, virDomainObjPtr vm, int qemuMigrationCancel(virQEMUDriverPtr driver, virDomainObjPtr vm); +int qemuMigrationFetchJobStatus(virQEMUDriverPtr driver, + virDomainObjPtr vm, + qemuDomainAsyncJob asyncJob, + qemuDomainJobInfoPtr jobInfo); + #endif /* __QEMU_MIGRATION_H__ */ -- 2.4.3
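
The point of the split is that callers which must not modify the live priv->job.current structure can fetch fresh statistics into a local copy, roughly:

    /* with an appropriate job held; see the GetJobStats patch later in the series */
    qemuDomainJobInfo info = *priv->job.current;    /* work on a copy */

    if (qemuMigrationFetchJobStatus(driver, vm, QEMU_ASYNC_JOB_NONE, &info) < 0)
        return -1;
    /* priv->job.current is untouched; "info" now holds up-to-date data */

This is exactly what the later virDomainGetJob{Info,Stats} changes in this series rely on.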

Instead of passing current job name to several functions which already know what the current job is we can generate the name where we actually need to use it. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- Notes: ACKed in version 2 Version 3: - no change Version 2: - new patch src/qemu/qemu_migration.c | 54 ++++++++++++++++++++++++----------------------- 1 file changed, 28 insertions(+), 26 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 7259c57..77a76a7 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -2468,6 +2468,24 @@ qemuMigrationFetchJobStatus(virQEMUDriverPtr driver, } +static const char * +qemuMigrationJobName(virDomainObjPtr vm) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + + switch (priv->job.asyncJob) { + case QEMU_ASYNC_JOB_MIGRATION_OUT: + return _("migration job"); + case QEMU_ASYNC_JOB_SAVE: + return _("domain save job"); + case QEMU_ASYNC_JOB_DUMP: + return _("domain core dump job"); + default: + return _("job"); + } +} + + static int qemuMigrationUpdateJobStatus(virQEMUDriverPtr driver, virDomainObjPtr vm, @@ -2488,7 +2506,6 @@ qemuMigrationUpdateJobStatus(virQEMUDriverPtr driver, static int qemuMigrationCheckJobStatus(virQEMUDriverPtr driver, virDomainObjPtr vm, - const char *job, qemuDomainAsyncJob asyncJob) { qemuDomainObjPrivatePtr priv = vm->privateData; @@ -2499,18 +2516,18 @@ qemuMigrationCheckJobStatus(virQEMUDriverPtr driver, switch (jobInfo->type) { case VIR_DOMAIN_JOB_NONE: - virReportError(VIR_ERR_OPERATION_FAILED, - _("%s: %s"), job, _("is not active")); + virReportError(VIR_ERR_OPERATION_FAILED, _("%s: %s"), + qemuMigrationJobName(vm), _("is not active")); return -1; case VIR_DOMAIN_JOB_FAILED: - virReportError(VIR_ERR_OPERATION_FAILED, - _("%s: %s"), job, _("unexpectedly failed")); + virReportError(VIR_ERR_OPERATION_FAILED, _("%s: %s"), + qemuMigrationJobName(vm), _("unexpectedly failed")); return -1; case VIR_DOMAIN_JOB_CANCELLED: - virReportError(VIR_ERR_OPERATION_ABORTED, - _("%s: %s"), job, _("canceled by client")); + virReportError(VIR_ERR_OPERATION_ABORTED, _("%s: %s"), + qemuMigrationJobName(vm), _("canceled by client")); return -1; case VIR_DOMAIN_JOB_BOUNDED: @@ -2536,31 +2553,16 @@ qemuMigrationWaitForCompletion(virQEMUDriverPtr driver, { qemuDomainObjPrivatePtr priv = vm->privateData; qemuDomainJobInfoPtr jobInfo = priv->job.current; - const char *job; int pauseReason; int ret = -1; - switch (priv->job.asyncJob) { - case QEMU_ASYNC_JOB_MIGRATION_OUT: - job = _("migration job"); - break; - case QEMU_ASYNC_JOB_SAVE: - job = _("domain save job"); - break; - case QEMU_ASYNC_JOB_DUMP: - job = _("domain core dump job"); - break; - default: - job = _("job"); - } - jobInfo->type = VIR_DOMAIN_JOB_UNBOUNDED; while (jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED) { /* Poll every 50ms for progress & to allow cancellation */ struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000ull }; - if (qemuMigrationCheckJobStatus(driver, vm, job, asyncJob) < 0) + if (qemuMigrationCheckJobStatus(driver, vm, asyncJob) < 0) goto error; if (storage && @@ -2571,8 +2573,8 @@ qemuMigrationWaitForCompletion(virQEMUDriverPtr driver, if (abort_on_error && virDomainObjGetState(vm, &pauseReason) == VIR_DOMAIN_PAUSED && pauseReason == VIR_DOMAIN_PAUSED_IOERROR) { - virReportError(VIR_ERR_OPERATION_FAILED, - _("%s: %s"), job, _("failed due to I/O error")); + virReportError(VIR_ERR_OPERATION_FAILED, _("%s: %s"), + qemuMigrationJobName(vm), _("failed due to I/O error")); goto error; } @@ -4166,7 +4168,7 
@@ qemuMigrationRun(virQEMUDriverPtr driver, * rather failed later on. Check its status before waiting for a * connection from qemu which may never be initiated. */ - if (qemuMigrationCheckJobStatus(driver, vm, _("migration job"), + if (qemuMigrationCheckJobStatus(driver, vm, QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) goto cancel; -- 2.4.3

Checking status of all part of migration and aborting it when something failed is a complex thing which makes the waiting loop hard to read. This patch moves all the checks into a separate function similarly to what was done for drive mirror loops. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- Notes: ACKed in version 2 Version 3: - no change Version 2: - new patch src/qemu/qemu_migration.c | 106 +++++++++++++++++++++++++++++----------------- 1 file changed, 66 insertions(+), 40 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 77a76a7..d9f1a59 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -2540,6 +2540,63 @@ qemuMigrationCheckJobStatus(virQEMUDriverPtr driver, } +/** + * Returns 1 if migration completed successfully, + * 0 if the domain is still being migrated, + * -1 migration failed, + * -2 something else failed, we need to cancel migration. + */ +static int +qemuMigrationCompleted(virQEMUDriverPtr driver, + virDomainObjPtr vm, + qemuDomainAsyncJob asyncJob, + virConnectPtr dconn, + bool abort_on_error, + bool storage) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + qemuDomainJobInfoPtr jobInfo = priv->job.current; + int pauseReason; + + if (qemuMigrationCheckJobStatus(driver, vm, asyncJob) < 0) + goto error; + + if (storage && qemuMigrationDriveMirrorReady(driver, vm) < 0) + goto error; + + if (abort_on_error && + virDomainObjGetState(vm, &pauseReason) == VIR_DOMAIN_PAUSED && + pauseReason == VIR_DOMAIN_PAUSED_IOERROR) { + virReportError(VIR_ERR_OPERATION_FAILED, _("%s: %s"), + qemuMigrationJobName(vm), _("failed due to I/O error")); + goto error; + } + + if (dconn && virConnectIsAlive(dconn) <= 0) { + virReportError(VIR_ERR_OPERATION_FAILED, "%s", + _("Lost connection to destination host")); + goto error; + } + + if (jobInfo->type == VIR_DOMAIN_JOB_COMPLETED) + return 1; + else + return 0; + + error: + if (jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED) { + /* The migration was aborted by us rather than QEMU itself. */ + jobInfo->type = VIR_DOMAIN_JOB_FAILED; + return -2; + } else if (jobInfo->type == VIR_DOMAIN_JOB_COMPLETED) { + jobInfo->type = VIR_DOMAIN_JOB_FAILED; + return -1; + } else { + return -1; + } +} + + /* Returns 0 on success, -2 when migration needs to be cancelled, or -1 when * QEMU reports failed migration. 
*/ @@ -2553,59 +2610,28 @@ qemuMigrationWaitForCompletion(virQEMUDriverPtr driver, { qemuDomainObjPrivatePtr priv = vm->privateData; qemuDomainJobInfoPtr jobInfo = priv->job.current; - int pauseReason; - int ret = -1; + int rv; jobInfo->type = VIR_DOMAIN_JOB_UNBOUNDED; - - while (jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED) { + while ((rv = qemuMigrationCompleted(driver, vm, asyncJob, dconn, + abort_on_error, storage)) != 1) { /* Poll every 50ms for progress & to allow cancellation */ struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000ull }; - if (qemuMigrationCheckJobStatus(driver, vm, asyncJob) < 0) - goto error; + if (rv < 0) + return rv; - if (storage && - qemuMigrationDriveMirrorReady(driver, vm) < 0) - break; - - /* cancel migration if disk I/O error is emitted while migrating */ - if (abort_on_error && - virDomainObjGetState(vm, &pauseReason) == VIR_DOMAIN_PAUSED && - pauseReason == VIR_DOMAIN_PAUSED_IOERROR) { - virReportError(VIR_ERR_OPERATION_FAILED, _("%s: %s"), - qemuMigrationJobName(vm), _("failed due to I/O error")); - goto error; - } - - if (dconn && virConnectIsAlive(dconn) <= 0) { - virReportError(VIR_ERR_OPERATION_FAILED, "%s", - _("Lost connection to destination host")); - goto error; - } - - if (jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED) { - virObjectUnlock(vm); - nanosleep(&ts, NULL); - virObjectLock(vm); - } + virObjectUnlock(vm); + nanosleep(&ts, NULL); + virObjectLock(vm); } qemuDomainJobInfoUpdateDowntime(jobInfo); VIR_FREE(priv->job.completed); if (VIR_ALLOC(priv->job.completed) == 0) *priv->job.completed = *jobInfo; - return 0; - error: - /* Check if the migration was aborted by us rather than QEMU itself. */ - if (jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED || - jobInfo->type == VIR_DOMAIN_JOB_COMPLETED) { - if (jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED) - ret = -2; - jobInfo->type = VIR_DOMAIN_JOB_FAILED; - } - return ret; + return 0; } -- 2.4.3

Thanks to Juan's work QEMU finally emits an event whenever migration state changes. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- Notes: The MIGRATION event is not supported by QEMU yet, this patch is based on the current patches available on qemu-devel mailing list. However, there were no objections to the design of the event, which makes it unlikely to change. Anyway this will have to wait until the patches are applied to QEMU. ACKed in version 2 Version 3: - rebased (context conflict in qemu_capabilities.[ch]) Version 2: - new patch src/qemu/qemu_capabilities.c | 3 +++ src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_monitor.c | 14 ++++++++++++++ src/qemu/qemu_monitor.h | 8 ++++++++ src/qemu/qemu_monitor_json.c | 23 +++++++++++++++++++++++ 5 files changed, 49 insertions(+) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index ca7a7c2..a80763a 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -285,6 +285,8 @@ VIR_ENUM_IMPL(virQEMUCaps, QEMU_CAPS_LAST, "dea-key-wrap", "pci-serial", "aarch64-off", + + "migration-event", /* 190 */ ); @@ -1499,6 +1501,7 @@ struct virQEMUCapsStringFlags virQEMUCapsEvents[] = { { "BALLOON_CHANGE", QEMU_CAPS_BALLOON_EVENT }, { "SPICE_MIGRATE_COMPLETED", QEMU_CAPS_SEAMLESS_MIGRATION }, { "DEVICE_DELETED", QEMU_CAPS_DEVICE_DEL_EVENT }, + { "MIGRATION", QEMU_CAPS_MIGRATION_EVENT }, }; struct virQEMUCapsStringFlags virQEMUCapsObjectTypes[] = { diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index b5a7770..34cd078 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -229,6 +229,7 @@ typedef enum { QEMU_CAPS_DEA_KEY_WRAP = 187, /* -machine dea_key_wrap */ QEMU_CAPS_DEVICE_PCI_SERIAL = 188, /* -device pci-serial */ QEMU_CAPS_CPU_AARCH64_OFF = 189, /* -cpu ...,aarch64=off */ + QEMU_CAPS_MIGRATION_EVENT = 190, /* MIGRATION event */ QEMU_CAPS_LAST, /* this must always be the last item */ } virQEMUCapsFlags; diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 94b0007..7f71a0e 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -1491,6 +1491,20 @@ qemuMonitorEmitSpiceMigrated(qemuMonitorPtr mon) int +qemuMonitorEmitMigrationStatus(qemuMonitorPtr mon, + int status) +{ + int ret = -1; + VIR_DEBUG("mon=%p, status=%s", + mon, NULLSTR(qemuMonitorMigrationStatusTypeToString(status))); + + QEMU_MONITOR_CALLBACK(mon, ret, domainMigrationStatus, mon->vm, status); + + return ret; +} + + +int qemuMonitorSetCapabilities(qemuMonitorPtr mon) { QEMU_CHECK_MONITOR(mon); diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 1afc344..7018045 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -186,6 +186,11 @@ typedef int (*qemuMonitorDomainSpiceMigratedCallback)(qemuMonitorPtr mon, virDomainObjPtr vm, void *opaque); +typedef int (*qemuMonitorDomainMigrationStatusCallback)(qemuMonitorPtr mon, + virDomainObjPtr vm, + int status, + void *opaque); + typedef struct _qemuMonitorCallbacks qemuMonitorCallbacks; typedef qemuMonitorCallbacks *qemuMonitorCallbacksPtr; struct _qemuMonitorCallbacks { @@ -214,6 +219,7 @@ struct _qemuMonitorCallbacks { qemuMonitorDomainNicRxFilterChangedCallback domainNicRxFilterChanged; qemuMonitorDomainSerialChangeCallback domainSerialChange; qemuMonitorDomainSpiceMigratedCallback domainSpiceMigrated; + qemuMonitorDomainMigrationStatusCallback domainMigrationStatus; }; char *qemuMonitorEscapeArg(const char *in); @@ -313,6 +319,8 @@ int qemuMonitorEmitSerialChange(qemuMonitorPtr mon, const 
char *devAlias, bool connected); int qemuMonitorEmitSpiceMigrated(qemuMonitorPtr mon); +int qemuMonitorEmitMigrationStatus(qemuMonitorPtr mon, + int status); int qemuMonitorStartCPUs(qemuMonitorPtr mon, virConnectPtr conn); diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 0ba549e..1f070cf 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -84,6 +84,7 @@ static void qemuMonitorJSONHandleDeviceDeleted(qemuMonitorPtr mon, virJSONValueP static void qemuMonitorJSONHandleNicRxFilterChanged(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleSerialChange(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleSpiceMigrated(qemuMonitorPtr mon, virJSONValuePtr data); +static void qemuMonitorJSONHandleMigrationStatus(qemuMonitorPtr mon, virJSONValuePtr data); typedef struct { const char *type; @@ -99,6 +100,7 @@ static qemuEventHandler eventHandlers[] = { { "DEVICE_DELETED", qemuMonitorJSONHandleDeviceDeleted, }, { "DEVICE_TRAY_MOVED", qemuMonitorJSONHandleTrayChange, }, { "GUEST_PANICKED", qemuMonitorJSONHandleGuestPanic, }, + { "MIGRATION", qemuMonitorJSONHandleMigrationStatus, }, { "NIC_RX_FILTER_CHANGED", qemuMonitorJSONHandleNicRxFilterChanged, }, { "POWERDOWN", qemuMonitorJSONHandlePowerdown, }, { "RESET", qemuMonitorJSONHandleReset, }, @@ -924,6 +926,27 @@ qemuMonitorJSONHandleSpiceMigrated(qemuMonitorPtr mon, } +static void +qemuMonitorJSONHandleMigrationStatus(qemuMonitorPtr mon, + virJSONValuePtr data) +{ + const char *str; + int status; + + if (!(str = virJSONValueObjectGetString(data, "status"))) { + VIR_WARN("missing status in migration event"); + return; + } + + if ((status = qemuMonitorMigrationStatusTypeFromString(str)) == -1) { + VIR_WARN("unknown status '%s' in migration event", str); + return; + } + + qemuMonitorEmitMigrationStatus(mon, status); +} + + int qemuMonitorJSONHumanCommandWithFd(qemuMonitorPtr mon, const char *cmd_str, -- 2.4.3
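
For context, the MIGRATION event as proposed on qemu-devel carries only the new migration status, something along the lines of {"event": "MIGRATION", "data": {"status": "completed"}} (the exact payload may still change, as the notes above say), which is why the handler only extracts data.status and maps it with qemuMonitorMigrationStatusTypeFromString().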

On Wed, Jun 10, 2015 at 15:42:53 +0200, Jiri Denemark wrote:
> Thanks to Juan's work QEMU finally emits an event whenever migration state changes.
> Signed-off-by: Jiri Denemark <jdenemar@redhat.com> ---
> Notes: The MIGRATION event is not supported by QEMU yet, this patch is based on the current patches available on qemu-devel mailing list. However, there were no objections to the design of the event, which makes it unlikely to change. Anyway this will have to wait until the patches are applied to QEMU.
> ACKed in version 2
> Version 3: - rebased (context conflict in qemu_capabilities.[ch])
> Version 2: - new patch
> src/qemu/qemu_capabilities.c | 3 +++ src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_monitor.c | 14 ++++++++++++++ src/qemu/qemu_monitor.h | 8 ++++++++ src/qemu/qemu_monitor_json.c | 23 +++++++++++++++++++++++ 5 files changed, 49 insertions(+)
> ...
> diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index b5a7770..34cd078 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -229,6 +229,7 @@ typedef enum { QEMU_CAPS_DEA_KEY_WRAP = 187, /* -machine dea_key_wrap */ QEMU_CAPS_DEVICE_PCI_SERIAL = 188, /* -device pci-serial */ QEMU_CAPS_CPU_AARCH64_OFF = 189, /* -cpu ...,aarch64=off */ + QEMU_CAPS_MIGRATION_EVENT = 190, /* MIGRATION event */
The alignment of the equals sign is off here.
> QEMU_CAPS_LAST, /* this must always be the last item */ } virQEMUCapsFlags;
ACK with that fixed. Peter

When QEMU supports migration events the qemuDomainJobInfo structure will no longer be updated with migration statistics. We have to enter a job and explicitly ask QEMU every time virDomainGetJob{Info,Stats} is called. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- Notes: ACKed in version 2 Version 3: - moved before "qemu: Update migration state according to MIGRATION event" Version 2: - new patch src/qemu/qemu_driver.c | 27 +++++++++++++++++++++++---- 1 file changed, 23 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 17c8c85..7c5d685 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -13091,15 +13091,27 @@ qemuConnectBaselineCPU(virConnectPtr conn ATTRIBUTE_UNUSED, static int -qemuDomainGetJobStatsInternal(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, +qemuDomainGetJobStatsInternal(virQEMUDriverPtr driver, virDomainObjPtr vm, bool completed, qemuDomainJobInfoPtr jobInfo) { qemuDomainObjPrivatePtr priv = vm->privateData; qemuDomainJobInfoPtr info; + bool fetch = virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_MIGRATION_EVENT); int ret = -1; + if (completed) + fetch = false; + + /* Do not ask QEMU if migration is not even running yet */ + if (!priv->job.current || !priv->job.current->status.status) + fetch = false; + + if (fetch && + qemuDomainObjBeginJob(driver, vm, QEMU_JOB_QUERY) < 0) + return -1; + if (!completed && !virDomainObjIsActive(vm)) { virReportError(VIR_ERR_OPERATION_INVALID, "%s", @@ -13120,12 +13132,19 @@ qemuDomainGetJobStatsInternal(virQEMUDriverPtr driver ATTRIBUTE_UNUSED, *jobInfo = *info; if (jobInfo->type == VIR_DOMAIN_JOB_BOUNDED || - jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED) - ret = qemuDomainJobInfoUpdateTime(jobInfo); - else + jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED) { + if (fetch) + ret = qemuMigrationFetchJobStatus(driver, vm, QEMU_ASYNC_JOB_NONE, + jobInfo); + else + ret = qemuDomainJobInfoUpdateTime(jobInfo); + } else { ret = 0; + } cleanup: + if (fetch) + qemuDomainObjEndJob(driver, vm); return ret; } -- 2.4.3

We don't need to call query-migrate every 50ms when we get the current migration state via MIGRATION event. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- Notes: ACKed in vesrion 2 Version 3: - no change Version 2: - new patch src/qemu/qemu_migration.c | 14 ++++++++++++-- src/qemu/qemu_process.c | 31 +++++++++++++++++++++++++++++++ 2 files changed, 43 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index d9f1a59..c3c2cac 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -2511,7 +2511,11 @@ qemuMigrationCheckJobStatus(virQEMUDriverPtr driver, qemuDomainObjPrivatePtr priv = vm->privateData; qemuDomainJobInfoPtr jobInfo = priv->job.current; - if (qemuMigrationUpdateJobStatus(driver, vm, asyncJob) < 0) + bool events = virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_MIGRATION_EVENT); + + if (events) + qemuMigrationUpdateJobType(jobInfo); + else if (qemuMigrationUpdateJobStatus(driver, vm, asyncJob) < 0) return -1; switch (jobInfo->type) { @@ -2530,9 +2534,15 @@ qemuMigrationCheckJobStatus(virQEMUDriverPtr driver, qemuMigrationJobName(vm), _("canceled by client")); return -1; + case VIR_DOMAIN_JOB_COMPLETED: + /* Fetch statistics of a completed migration */ + if (events && + qemuMigrationUpdateJobStatus(driver, vm, asyncJob) < 0) + return -1; + break; + case VIR_DOMAIN_JOB_BOUNDED: case VIR_DOMAIN_JOB_UNBOUNDED: - case VIR_DOMAIN_JOB_COMPLETED: case VIR_DOMAIN_JOB_LAST: break; } diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index ba84182..e703cbd 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1508,6 +1508,36 @@ qemuProcessHandleSpiceMigrated(qemuMonitorPtr mon ATTRIBUTE_UNUSED, } +static int +qemuProcessHandleMigrationStatus(qemuMonitorPtr mon ATTRIBUTE_UNUSED, + virDomainObjPtr vm, + int status, + void *opaque ATTRIBUTE_UNUSED) +{ + qemuDomainObjPrivatePtr priv; + + virObjectLock(vm); + + VIR_DEBUG("Migration of domain %p %s changed state to %s", + vm, vm->def->name, + qemuMonitorMigrationStatusTypeToString(status)); + + priv = vm->privateData; + if (priv->job.asyncJob != QEMU_ASYNC_JOB_MIGRATION_OUT && + priv->job.asyncJob != QEMU_ASYNC_JOB_MIGRATION_IN) { + VIR_DEBUG("got MIGRATION event without a migration job"); + goto cleanup; + } + + priv->job.current->status.status = status; + virDomainObjSignal(vm); + + cleanup: + virObjectUnlock(vm); + return 0; +} + + static qemuMonitorCallbacks monitorCallbacks = { .eofNotify = qemuProcessHandleMonitorEOF, .errorNotify = qemuProcessHandleMonitorError, @@ -1532,6 +1562,7 @@ static qemuMonitorCallbacks monitorCallbacks = { .domainNicRxFilterChanged = qemuProcessHandleNicRxFilterChanged, .domainSerialChange = qemuProcessHandleSerialChanged, .domainSpiceMigrated = qemuProcessHandleSpiceMigrated, + .domainMigrationStatus = qemuProcessHandleMigrationStatus, }; static int -- 2.4.3

Since we already support the MIGRATION event, we just need to make sure the domain condition is signalled whenever a p2p connection drops or the domain is paused due to IO error and we can avoid waking up every 50 ms to check whether something happened. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- Notes: ACKed in version 2 Version 3: - no change Version 2: - new patch src/qemu/qemu_domain.h | 3 +++ src/qemu/qemu_migration.c | 45 +++++++++++++++++++++++++++++++++++++++------ src/qemu/qemu_process.c | 3 +++ 3 files changed, 45 insertions(+), 6 deletions(-) diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 54e1e7b..86bd604 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -199,6 +199,9 @@ struct _qemuDomainObjPrivate { /* Bitmaps below hold data from the auto NUMA feature */ virBitmapPtr autoNodeset; virBitmapPtr autoCpuset; + + bool signalIOError; /* true if the domain condition should be signalled on + I/O error */ }; # define QEMU_DOMAIN_DISK_PRIVATE(disk) \ diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index c3c2cac..60c75f3 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -2620,20 +2620,28 @@ qemuMigrationWaitForCompletion(virQEMUDriverPtr driver, { qemuDomainObjPrivatePtr priv = vm->privateData; qemuDomainJobInfoPtr jobInfo = priv->job.current; + bool events = virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_MIGRATION_EVENT); int rv; jobInfo->type = VIR_DOMAIN_JOB_UNBOUNDED; while ((rv = qemuMigrationCompleted(driver, vm, asyncJob, dconn, abort_on_error, storage)) != 1) { - /* Poll every 50ms for progress & to allow cancellation */ - struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000ull }; - if (rv < 0) return rv; - virObjectUnlock(vm); - nanosleep(&ts, NULL); - virObjectLock(vm); + if (events) { + if (virDomainObjWait(vm) < 0) { + jobInfo->type = VIR_DOMAIN_JOB_FAILED; + return -2; + } + } else { + /* Poll every 50ms for progress & to allow cancellation */ + struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000ull }; + + virObjectUnlock(vm); + nanosleep(&ts, NULL); + virObjectLock(vm); + } } qemuDomainJobInfoUpdateDowntime(jobInfo); @@ -4050,6 +4058,7 @@ qemuMigrationRun(virQEMUDriverPtr driver, virErrorPtr orig_err = NULL; unsigned int cookieFlags = 0; bool abort_on_error = !!(flags & VIR_MIGRATE_ABORT_ON_ERROR); + bool events = virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_MIGRATION_EVENT); int rc; VIR_DEBUG("driver=%p, vm=%p, cookiein=%s, cookieinlen=%d, " @@ -4079,6 +4088,9 @@ qemuMigrationRun(virQEMUDriverPtr driver, return -1; } + if (events) + priv->signalIOError = abort_on_error; + mig = qemuMigrationEatCookie(driver, vm, cookiein, cookieinlen, cookieFlags | QEMU_MIGRATION_COOKIE_GRAPHICS); if (!mig) @@ -4284,6 +4296,9 @@ qemuMigrationRun(virQEMUDriverPtr driver, qemuMigrationCookieFree(mig); + if (events) + priv->signalIOError = false; + if (orig_err) { virSetError(orig_err); virFreeError(orig_err); @@ -4908,6 +4923,18 @@ doPeer2PeerMigrate3(virQEMUDriverPtr driver, } +static void +qemuMigrationConnectionClosed(virConnectPtr conn, + int reason, + void *opaque) +{ + virDomainObjPtr vm = opaque; + + VIR_DEBUG("conn=%p, reason=%d, vm=%s", conn, reason, vm->def->name); + virDomainObjSignal(vm); +} + + static int virConnectCredType[] = { VIR_CRED_AUTHNAME, VIR_CRED_PASSPHRASE, @@ -4981,6 +5008,11 @@ static int doPeer2PeerMigrate(virQEMUDriverPtr driver, cfg->keepAliveCount) < 0) goto cleanup; + if (virConnectRegisterCloseCallback(dconn, qemuMigrationConnectionClosed, + vm, NULL) < 
0) { + goto cleanup; + } + qemuDomainObjEnterRemote(vm); p2p = VIR_DRV_SUPPORTS_FEATURE(dconn->driver, dconn, VIR_DRV_FEATURE_MIGRATION_P2P); @@ -5045,6 +5077,7 @@ static int doPeer2PeerMigrate(virQEMUDriverPtr driver, cleanup: orig_err = virSaveLastError(); qemuDomainObjEnterRemote(vm); + virConnectUnregisterCloseCallback(dconn, qemuMigrationConnectionClosed); virObjectUnref(dconn); qemuDomainObjExitRemote(vm); if (orig_err) { diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index e703cbd..93c0844 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -952,6 +952,9 @@ qemuProcessHandleIOError(qemuMonitorPtr mon ATTRIBUTE_UNUSED, qemuDomainObjPrivatePtr priv = vm->privateData; VIR_DEBUG("Transitioned guest %s to paused state due to IO error", vm->def->name); + if (priv->signalIOError) + virDomainObjSignal(vm); + virDomainObjSetState(vm, VIR_DOMAIN_PAUSED, VIR_DOMAIN_PAUSED_IOERROR); lifecycleEvent = virDomainEventLifecycleNewFromObj(vm, VIR_DOMAIN_EVENT_SUSPENDED, -- 2.4.3
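The building block used here is the per-domain condition introduced by the first patch of the series: an event handler (or the connection close callback registered above) updates state and signals the condition, while the migration job thread sleeps on it instead of polling every 50 ms. A minimal sketch of that pattern, where exampleHandleEvent, exampleWaitLoop and the done flag are illustrative stand-ins for the real handlers and the qemuMigrationCompleted checks; both functions assume the domain object lock is held:

    /* monitor/event thread: update state, then wake up the waiter */
    static void
    exampleHandleEvent(virDomainObjPtr vm, bool *done)
    {
        *done = true;             /* record whatever the event reported */
        virDomainObjSignal(vm);   /* wake a thread waiting on the domain condition */
    }

    /* migration job thread: sleep until the handler signals progress */
    static int
    exampleWaitLoop(virDomainObjPtr vm, const bool *done)
    {
        while (!*done) {          /* *done is set by the event handler while we sleep */
            /* drops the domain lock while sleeping and reacquires it on
             * wake-up; on failure the error is already reported */
            if (virDomainObjWait(vm) < 0)
                return -1;
        }
        return 0;
    }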

When a connection to the destination host during a p2p migration drops, we know we will have to cancel the migration; it doesn't make sense to waste resources by trying to finish the migration. We already do so after sending "migrate" command to QEMU and we should do it while waiting for drive mirrors to become ready too. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- Notes: ACKed in version 2 Version 3: - rebased on top of modified qemuMigrationDriveMirrorCancelled Version 2: - new patch src/qemu/qemu_migration.c | 29 +++++++++++++++++++++++------ 1 file changed, 23 insertions(+), 6 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 60c75f3..c53d2ea 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1908,7 +1908,8 @@ static int qemuMigrationCancelDriveMirror(virQEMUDriverPtr driver, virDomainObjPtr vm, bool check, - qemuDomainAsyncJob asyncJob) + qemuDomainAsyncJob asyncJob, + virConnectPtr dconn) { virErrorPtr err = NULL; int ret = -1; @@ -1939,6 +1940,13 @@ qemuMigrationCancelDriveMirror(virQEMUDriverPtr driver, } while ((rv = qemuMigrationDriveMirrorCancelled(driver, vm, check)) != 1) { + if (check && !failed && + dconn && virConnectIsAlive(dconn) <= 0) { + virReportError(VIR_ERR_OPERATION_FAILED, "%s", + _("Lost connection to destination host")); + failed = true; + } + if (rv < 0) { failed = true; if (rv == -2) @@ -1989,7 +1997,8 @@ qemuMigrationDriveMirror(virQEMUDriverPtr driver, qemuMigrationCookiePtr mig, const char *host, unsigned long speed, - unsigned int *migrate_flags) + unsigned int *migrate_flags, + virConnectPtr dconn) { qemuDomainObjPrivatePtr priv = vm->privateData; int ret = -1; @@ -2069,6 +2078,12 @@ qemuMigrationDriveMirror(virQEMUDriverPtr driver, goto cleanup; } + if (dconn && virConnectIsAlive(dconn) <= 0) { + virReportError(VIR_ERR_OPERATION_FAILED, "%s", + _("Lost connection to destination host")); + goto cleanup; + } + if (virDomainObjWait(vm) < 0) goto cleanup; } @@ -3689,7 +3704,7 @@ qemuMigrationConfirmPhase(virQEMUDriverPtr driver, /* cancel any outstanding NBD jobs */ qemuMigrationCancelDriveMirror(driver, vm, false, - QEMU_ASYNC_JOB_MIGRATION_OUT); + QEMU_ASYNC_JOB_MIGRATION_OUT, NULL); virSetError(orig_err); virFreeError(orig_err); @@ -4106,7 +4121,8 @@ qemuMigrationRun(virQEMUDriverPtr driver, if (qemuMigrationDriveMirror(driver, vm, mig, spec->dest.host.name, migrate_speed, - &migrate_flags) < 0) { + &migrate_flags, + dconn) < 0) { goto cleanup; } } else { @@ -4265,7 +4281,8 @@ qemuMigrationRun(virQEMUDriverPtr driver, /* cancel any outstanding NBD jobs */ if (mig && mig->nbd) { if (qemuMigrationCancelDriveMirror(driver, vm, ret == 0, - QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) + QEMU_ASYNC_JOB_MIGRATION_OUT, + dconn) < 0) ret = -1; } @@ -5853,7 +5870,7 @@ qemuMigrationCancel(virQEMUDriverPtr driver, } if (qemuMigrationCancelDriveMirror(driver, vm, false, - QEMU_ASYNC_JOB_NONE) < 0) + QEMU_ASYNC_JOB_NONE, NULL) < 0) goto endsyncjob; ret = 0; -- 2.4.3
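Pulled out of the flattened diff above, the added liveness check has this shape. A sketch only: mirrorsReady is a hypothetical stand-in for the real qemuMigrationDriveMirrorCancelled/ready checks, and dconn may be NULL when there is no destination connection to watch (e.g. for non-p2p migration):

    static int
    exampleWaitForMirrors(virDomainObjPtr vm, virConnectPtr dconn,
                          const bool *mirrorsReady)
    {
        while (!*mirrorsReady) {   /* updated by block job event handlers while we sleep */
            /* give up early once the destination connection is dead instead
             * of waiting for the drive mirrors to converge */
            if (dconn && virConnectIsAlive(dconn) <= 0) {
                virReportError(VIR_ERR_OPERATION_FAILED, "%s",
                               _("Lost connection to destination host"));
                return -1;
            }
            if (virDomainObjWait(vm) < 0)
                return -1;
        }
        return 0;
    }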

When cancelling migration we can see the following conversation on QMP monitor: {"execute":"migrate_cancel","id":"libvirt-33"} {"timestamp": {"seconds": 1432899178, "microseconds": 844907}, "event": "MIGRATION", "data": {"status": "cancelling"}} {"return": {}, "id": "libvirt-33"} {"timestamp": {"seconds": 1432899178, "microseconds": 845625}, "event": "MIGRATION", "data": {"status": "failed"}} {"timestamp": {"seconds": 1432899178, "microseconds": 846432}, "event": "MIGRATION", "data": {"status": "cancelled"}} That is, migration status first changes to "failed" just to change to the correct "cancelled" state in a few moments later. However, this is enough to let libvirt report migration failed for unknown reason instead of having been cancelled by a user. This should really be fixed in QEMU but I'm not sure how easy it is. However, it's pretty easy for us to work around it. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- Notes: Version 3: - mark as "DO NOT APPLY" -- Juan will fix the bug in QEMU in the next version of his migration event series Version 2: - new patch src/qemu/qemu_process.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 93c0844..dd43657 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1518,6 +1518,7 @@ qemuProcessHandleMigrationStatus(qemuMonitorPtr mon ATTRIBUTE_UNUSED, void *opaque ATTRIBUTE_UNUSED) { qemuDomainObjPrivatePtr priv; + qemuDomainJobInfoPtr jobInfo; virObjectLock(vm); @@ -1532,7 +1533,15 @@ qemuProcessHandleMigrationStatus(qemuMonitorPtr mon ATTRIBUTE_UNUSED, goto cleanup; } - priv->job.current->status.status = status; + jobInfo = priv->job.current; + if (status == QEMU_MONITOR_MIGRATION_STATUS_ERROR && + jobInfo->status.status == QEMU_MONITOR_MIGRATION_STATUS_CANCELLING) { + VIR_DEBUG("State changed from \"cancelling\" to \"failed\"; setting " + "current state to \"cancelled\""); + status = QEMU_MONITOR_MIGRATION_STATUS_CANCELLED; + } + + jobInfo->status.status = status; virDomainObjSignal(vm); cleanup: -- 2.4.3
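Isolated from the flattened hunk above, the workaround amounts to remapping the spurious state before it is stored, and only when a cancel is known to be in flight (enum names as used in the diff):

    /* QEMU may briefly report "failed" while a migration is being
     * cancelled; store "cancelled" instead so the job is not reported
     * as having failed for an unknown reason */
    if (status == QEMU_MONITOR_MIGRATION_STATUS_ERROR &&
        jobInfo->status.status == QEMU_MONITOR_MIGRATION_STATUS_CANCELLING)
        status = QEMU_MONITOR_MIGRATION_STATUS_CANCELLED;

    jobInfo->status.status = status;
    virDomainObjSignal(vm);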

On 06/10/2015 09:42 AM, Jiri Denemark wrote:
QEMU will soon (patches are available on qemu-devel) get support for migration events which will finally allow us to get rid of polling query-migrate every 50ms. However, we first need to be able to wait for all events related to migration (migration status changes, block job events, async abort requests) at once. This series prepares the infrastructure and uses it to switch all polling loops in migration code to pthread_cond_wait.
https://bugzilla.redhat.com/show_bug.cgi?id=1212077
Version 3 (see individual patches for details): - most of the series has been ACKed in v2 - "qemu: Use domain condition for synchronous block jobs" was split in 3 patches for easier review - minor changes requested in v2 review
Version 2 (see individual patches for details): - rewritten using per-domain condition variable - enhanced to fully support the migration events
Jiri Denemark (24): conf: Introduce per-domain condition variable qemu: Introduce qemuBlockJobUpdate qemu: Properly report failed migration qemu: Use domain condition for synchronous block jobs qemu: Cancel storage migration in parallel qemu: Abort migration early if disk mirror failed qemu: Don't mess with disk->mirrorState Pass domain object to private data formatter/parser qemu: Make qemuMigrationCancelDriveMirror usable without async job qemu: Refactor qemuMonitorBlockJobInfo qemu: Cancel disk mirrors after libvirtd restart qemu: Use domain condition for asyncAbort qemu_monitor: Wire up SPICE_MIGRATE_COMPLETED event qemu: Do not poll for spice migration status qemu: Refactor qemuDomainGetJob{Info,Stats} qemu: Refactor qemuMigrationUpdateJobStatus qemu: Don't pass redundant job name around qemu: Refactor qemuMigrationWaitForCompletion qemu_monitor: Wire up MIGRATION event qemuDomainGetJobStatsInternal: Support migration events qemu: Update migration state according to MIGRATION event qemu: Wait for migration events on domain condition qemu: cancel drive mirrors when p2p connection breaks DO NOT APPLY: qemu: Work around weird migration status changes
po/POTFILES.in | 1 - src/conf/domain_conf.c | 51 ++- src/conf/domain_conf.h | 12 +- src/libvirt_private.syms | 6 + src/libxl/libxl_domain.c | 10 +- src/lxc/lxc_domain.c | 12 +- src/qemu/qemu_blockjob.c | 185 +++-------- src/qemu/qemu_blockjob.h | 15 +- src/qemu/qemu_capabilities.c | 3 + src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_domain.c | 78 +++-- src/qemu/qemu_domain.h | 7 +- src/qemu/qemu_driver.c | 201 +++++++----- src/qemu/qemu_migration.c | 763 +++++++++++++++++++++++++++++-------------- src/qemu/qemu_migration.h | 8 + src/qemu/qemu_monitor.c | 73 ++++- src/qemu/qemu_monitor.h | 33 +- src/qemu/qemu_monitor_json.c | 152 ++++----- src/qemu/qemu_monitor_json.h | 7 +- src/qemu/qemu_process.c | 92 +++++- tests/qemumonitorjsontest.c | 40 --- 21 files changed, 1057 insertions(+), 693 deletions(-)
Just ran this through my Coverity checker - only one issue RESOURCE_LEAK in qemuMigrationRun: 4235 if (qemuMigrationCheckJobStatus(driver, vm, 4236 QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) (4) Event if_end: End of if statement 4237 goto cancel; 4238 (5) Event open_fn: Returning handle opened by "accept". (6) Event var_assign: Assigning: "fd" = handle returned from "accept(spec->dest.unix_socket.sock, __SOCKADDR_ARG({ .__sockaddr__ = NULL}), NULL)". (7) Event cond_false: Condition "(fd = accept(spec->dest.unix_socket.sock, __SOCKADDR_ARG({ .__sockaddr__ = NULL}), NULL)) < 0", taking false branch Also see events: [leaked_handle] 4239 while ((fd = accept(spec->dest.unix_socket.sock, NULL, NULL)) < 0) { 4240 if (errno == EAGAIN || errno == EINTR) 4241 continue; ... 4252 rc = qemuMigrationWaitForCompletion(driver, vm, 4253 QEMU_ASYNC_JOB_MIGRATION_OUT, 4254 dconn, abort_on_error, !!mig->nbd); (13) Event cond_true: Condition "rc == -2", taking true branch 4255 if (rc == -2) (14) Event goto: Jumping to label "cancel" 4256 goto cancel; 4257 else if (rc == -1) ... 4288 (28) Event cond_false: Condition "spec->fwdType != MIGRATION_FWD_DIRECT", taking false branch 4289 if (spec->fwdType != MIGRATION_FWD_DIRECT) { 4290 if (iothread && qemuMigrationStopTunnel(iothread, ret < 0) < 0) 4291 ret = -1; 4292 VIR_FORCE_CLOSE(fd); (29) Event if_end: End of if statement 4293 } 4294 ... 4322 } 4323 (38) Event leaked_handle: Handle variable "fd" going out of scope leaks the handle. Also see events: [open_fn][var_assign] 4324 return ret; 4325 4326 exit_monitor: 4327 ignore_value(qemuDomainObjExitMonitor(driver, vm)); 4328 goto cleanup; 4329 (15) Event label: Reached label "cancel" 4330 cancel: 4331 orig_err = virSaveLastError(); 4332 (16) Event cond_true: Condition "virDomainObjIsActive(vm)", taking true branch ...

On Wed, Jun 10, 2015 at 10:27:11 -0400, John Ferlan wrote:
Just ran this through my Coverity checker - only one issue
RESOURCE_LEAK in qemuMigrationRun:
4235 if (qemuMigrationCheckJobStatus(driver, vm, 4236 QEMU_ASYNC_JOB_MIGRATION_OUT) < 0)
(4) Event if_end: End of if statement
4237 goto cancel; 4238
(5) Event open_fn: Returning handle opened by "accept". (6) Event var_assign: Assigning: "fd" = handle returned from "accept(spec->dest.unix_socket.sock, __SOCKADDR_ARG({ .__sockaddr__ = NULL}), NULL)". (7) Event cond_false: Condition "(fd = accept(spec->dest.unix_socket.sock, __SOCKADDR_ARG({ .__sockaddr__ = NULL}), NULL)) < 0", taking false branch Also see events: [leaked_handle]
4239 while ((fd = accept(spec->dest.unix_socket.sock, NULL, NULL)) < 0) { 4240 if (errno == EAGAIN || errno == EINTR) 4241 continue;
Hmm, what an old and unused (except for some ancient QEMU versions) code path :-) However, this code is only executed if spec->destType == MIGRATION_DEST_UNIX, which only happens in the tunnelled migration path, which also sets spec.fwdType = MIGRATION_FWD_STREAM. ...
(28) Event cond_false: Condition "spec->fwdType != MIGRATION_FWD_DIRECT", taking false branch
4289 if (spec->fwdType != MIGRATION_FWD_DIRECT) { 4290 if (iothread && qemuMigrationStopTunnel(iothread, ret < 0) < 0) 4291 ret = -1; 4292 VIR_FORCE_CLOSE(fd);
Which means "spec->fwdType != MIGRATION_FWD_DIRECT" will be true and fd will be correctly closed. Jirka

On 06/10/2015 11:06 AM, Jiri Denemark wrote:
Hmm, what an old and unused (except for some ancient QEMU versions) code path :-) However, this code is only executed if spec->destType == MIGRATION_DEST_UNIX, which only happens in the tunnelled migration path, which also sets spec.fwdType = MIGRATION_FWD_STREAM.
Placing "sa_assert(spec->fwdType == MIGRATION_FWD_STREAM);" above the while loop makes Coverity happy. John

On Wed, Jun 10, 2015 at 11:16:29 -0400, John Ferlan wrote:
Placing "sa_assert(spec->fwdType == MIGRATION_FWD_STREAM);" above the while loop makes Coverity happy.
Feel free to push the sa_assert; it's completely unrelated to this series and has been there for ages. Jirka
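For readers following the Coverity exchange, the suggested hint would sit just before the accept() loop quoted in the trace. A sketch with the surrounding control flow abbreviated; sa_assert is a hint only static analyzers see, and the MIGRATION_DEST_UNIX guard reflects the condition Jiri describes above:

    if (spec->destType == MIGRATION_DEST_UNIX) {
        /* Only tunnelled migration uses the unix-socket destination, and
         * it always forwards via a stream, so the fd accepted below is
         * closed in the non-direct cleanup path; tell Coverity so. */
        sa_assert(spec->fwdType == MIGRATION_FWD_STREAM);

        while ((fd = accept(spec->dest.unix_socket.sock, NULL, NULL)) < 0) {
            if (errno == EAGAIN || errno == EINTR)
                continue;
            break;  /* the real code reports the error and aborts here */
        }
    }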

On Wed, Jun 10, 2015 at 15:42:34 +0200, Jiri Denemark wrote:
Jiri Denemark (24): conf: Introduce per-domain condition variable qemu: Introduce qemuBlockJobUpdate qemu: Properly report failed migration qemu: Use domain condition for synchronous block jobs qemu: Cancel storage migration in parallel qemu: Abort migration early if disk mirror failed qemu: Don't mess with disk->mirrorState Pass domain object to private data formatter/parser qemu: Make qemuMigrationCancelDriveMirror usable without async job qemu: Refactor qemuMonitorBlockJobInfo qemu: Cancel disk mirrors after libvirtd restart qemu: Use domain condition for asyncAbort qemu_monitor: Wire up SPICE_MIGRATE_COMPLETED event qemu: Do not poll for spice migration status qemu: Refactor qemuDomainGetJob{Info,Stats} qemu: Refactor qemuMigrationUpdateJobStatus qemu: Don't pass redundant job name around qemu: Refactor qemuMigrationWaitForCompletion qemu_monitor: Wire up MIGRATION event qemuDomainGetJobStatsInternal: Support migration events qemu: Update migration state according to MIGRATION event qemu: Wait for migration events on domain condition qemu: cancel drive mirrors when p2p connection breaks
ACK to the above ones once qemu accepts the event stuff. Peter
DO NOT APPLY: qemu: Work around weird migration status changes

On Wed, Jun 10, 2015 at 17:13:12 +0200, Peter Krempa wrote:
On Wed, Jun 10, 2015 at 15:42:34 +0200, Jiri Denemark wrote:
Jiri Denemark (24): conf: Introduce per-domain condition variable qemu: Introduce qemuBlockJobUpdate qemu: Properly report failed migration qemu: Use domain condition for synchronous block jobs qemu: Cancel storage migration in parallel qemu: Abort migration early if disk mirror failed qemu: Don't mess with disk->mirrorState Pass domain object to private data formatter/parser qemu: Make qemuMigrationCancelDriveMirror usable without async job qemu: Refactor qemuMonitorBlockJobInfo qemu: Cancel disk mirrors after libvirtd restart qemu: Use domain condition for asyncAbort qemu_monitor: Wire up SPICE_MIGRATE_COMPLETED event qemu: Do not poll for spice migration status qemu: Refactor qemuDomainGetJob{Info,Stats} qemu: Refactor qemuMigrationUpdateJobStatus qemu: Don't pass redundant job name around qemu: Refactor qemuMigrationWaitForCompletion qemu_monitor: Wire up MIGRATION event qemuDomainGetJobStatsInternal: Support migration events qemu: Update migration state according to MIGRATION event qemu: Wait for migration events on domain condition qemu: cancel drive mirrors when p2p connection breaks
ACK to the above ones once qemu accepts the event stuff.
Thanks, I pushed all patches which did not rely on the new MIGRATION event, in other words, all except the following ones: qemu_monitor: Wire up MIGRATION event qemuDomainGetJobStatsInternal: Support migration events qemu: Update migration state according to MIGRATION event qemu: Wait for migration events on domain condition DO NOT APPLY: qemu: Work around weird migration status changes Jirka