[libvirt] [PATCH 0/5] qemu: Wait until destination QEMU consumes all migration data

Even though QEMU on the source host reports completed migration and thus we move to the Finish phase, QEMU on the destination host may still be processing migration data. Thus before we can start guest CPUs on the destination, we have to wait for a completed migration event. https://bugzilla.redhat.com/show_bug.cgi?id=1265902 Jiri Denemark (5): qemu: Always update migration times on destination qemu: Copy completed migration stats only on success qemu: Introduce flags in qemuMigrationCompleted qemu: Make updating stats in qemuMigrationCheckJobStatus optional qemu: Wait until destination QEMU consumes all migration data src/qemu/qemu_migration.c | 86 ++++++++++++++++++++++++++++++++++------------- 1 file changed, 63 insertions(+), 23 deletions(-) -- 2.6.0

Even if we are migrating a domain with VIR_MIGRATE_PAUSED flag set, we should still update the total time of the migration. Updating downtime doesn't hurt either, even though we don't actually start guest CPUs. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_migration.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 7440108..c771db6 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -5743,10 +5743,11 @@ qemuMigrationFinish(virQEMUDriverPtr driver, if (v3proto) goto endjob; } - if (priv->job.completed) { - qemuDomainJobInfoUpdateTime(priv->job.completed); - qemuDomainJobInfoUpdateDowntime(priv->job.completed); - } + } + + if (priv->job.completed) { + qemuDomainJobInfoUpdateTime(priv->job.completed); + qemuDomainJobInfoUpdateDowntime(priv->job.completed); } dom = virGetDomain(dconn, vm->def->name, vm->def->uuid); -- 2.6.0

On Mon, Oct 05, 2015 at 16:15:58 +0200, Jiri Denemark wrote:
Even if we are migrating a domain with VIR_MIGRATE_PAUSED flag set, we should still update the total time of the migration. Updating downtime doesn't hurt either, even though we don't actually start guest CPUs.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_migration.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
ACK, Peter

The destination host gets detailed statistics about the current migration form the source host via migration cookie and copies them to the domain object so that they can be queried using virDomainGetJobStats. However, we should only copy statistics to the domain object when migration finished successfully. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_migration.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index c771db6..ff9491c 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -5671,6 +5671,10 @@ qemuMigrationFinish(virQEMUDriverPtr driver, dom = virGetDomain(dconn, vm->def->name, vm->def->uuid); } else if (retcode == 0) { + unsigned long long timeReceived = 0; + + ignore_value(virTimeMillisNow(&timeReceived)); + if (!virDomainObjIsActive(vm)) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("guest unexpectedly quit")); @@ -5678,16 +5682,6 @@ qemuMigrationFinish(virQEMUDriverPtr driver, goto endjob; } - if (mig->jobInfo) { - qemuDomainJobInfoPtr jobInfo = mig->jobInfo; - priv->job.completed = jobInfo; - mig->jobInfo = NULL; - if (jobInfo->sent && virTimeMillisNow(&jobInfo->received) == 0) { - jobInfo->timeDelta = jobInfo->received - jobInfo->sent; - jobInfo->timeDeltaSet = true; - } - } - if (qemuMigrationVPAssociatePortProfiles(vm->def) < 0) goto endjob; @@ -5745,7 +5739,16 @@ qemuMigrationFinish(virQEMUDriverPtr driver, } } - if (priv->job.completed) { + if (mig->jobInfo) { + qemuDomainJobInfoPtr jobInfo = mig->jobInfo; + priv->job.completed = jobInfo; + mig->jobInfo = NULL; + + if (jobInfo->sent && timeReceived) { + jobInfo->timeDelta = timeReceived - jobInfo->sent; + jobInfo->received = timeReceived; + jobInfo->timeDeltaSet = true; + } qemuDomainJobInfoUpdateTime(priv->job.completed); qemuDomainJobInfoUpdateDowntime(priv->job.completed); } -- 2.6.0

On Mon, Oct 05, 2015 at 16:15:59 +0200, Jiri Denemark wrote:
The destination host gets detailed statistics about the current migration form the source host via migration cookie and copies them to the domain object so that they can be queried using virDomainGetJobStats. However, we should only copy statistics to the domain object when migration finished successfully.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_migration.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-)
ACK, Peter

The function already has two bool parameters and we will need to add a new one. Let's switch to flags to make the callers readable. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_migration.c | 23 +++++++++++++++++------ 1 file changed, 17 insertions(+), 6 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index ff9491c..1672323 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -2618,6 +2618,11 @@ qemuMigrationCheckJobStatus(virQEMUDriverPtr driver, } +enum qemuMigrationCompletedFlags { + QEMU_MIGRATION_COMPLETED_ABORT_ON_ERROR = (1 << 0), + QEMU_MIGRATION_COMPLETED_CHECK_STORAGE = (1 << 1), +}; + /** * Returns 1 if migration completed successfully, * 0 if the domain is still being migrated, @@ -2629,8 +2634,7 @@ qemuMigrationCompleted(virQEMUDriverPtr driver, virDomainObjPtr vm, qemuDomainAsyncJob asyncJob, virConnectPtr dconn, - bool abort_on_error, - bool storage) + unsigned int flags) { qemuDomainObjPrivatePtr priv = vm->privateData; qemuDomainJobInfoPtr jobInfo = priv->job.current; @@ -2639,10 +2643,11 @@ qemuMigrationCompleted(virQEMUDriverPtr driver, if (qemuMigrationCheckJobStatus(driver, vm, asyncJob) < 0) goto error; - if (storage && qemuMigrationDriveMirrorReady(driver, vm) < 0) + if (flags & QEMU_MIGRATION_COMPLETED_CHECK_STORAGE && + qemuMigrationDriveMirrorReady(driver, vm) < 0) goto error; - if (abort_on_error && + if (flags & QEMU_MIGRATION_COMPLETED_ABORT_ON_ERROR && virDomainObjGetState(vm, &pauseReason) == VIR_DOMAIN_PAUSED && pauseReason == VIR_DOMAIN_PAUSED_IOERROR) { virReportError(VIR_ERR_OPERATION_FAILED, _("%s: %s"), @@ -2689,11 +2694,17 @@ qemuMigrationWaitForCompletion(virQEMUDriverPtr driver, qemuDomainObjPrivatePtr priv = vm->privateData; qemuDomainJobInfoPtr jobInfo = priv->job.current; bool events = virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_MIGRATION_EVENT); + unsigned int flags = 0; int rv; + if (abort_on_error) + flags |= QEMU_MIGRATION_COMPLETED_ABORT_ON_ERROR; + if (storage) + flags |= QEMU_MIGRATION_COMPLETED_CHECK_STORAGE; + jobInfo->type = VIR_DOMAIN_JOB_UNBOUNDED; - while ((rv = qemuMigrationCompleted(driver, vm, asyncJob, dconn, - abort_on_error, storage)) != 1) { + while ((rv = qemuMigrationCompleted(driver, vm, asyncJob, + dconn, flags)) != 1) { if (rv < 0) return rv; -- 2.6.0

On Mon, Oct 05, 2015 at 16:16:00 +0200, Jiri Denemark wrote:
The function already has two bool parameters and we will need to add a new one. Let's switch to flags to make the callers readable.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_migration.c | 23 +++++++++++++++++------ 1 file changed, 17 insertions(+), 6 deletions(-)
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index ff9491c..1672323 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -2618,6 +2618,11 @@ qemuMigrationCheckJobStatus(virQEMUDriverPtr driver, }
+enum qemuMigrationCompletedFlags { + QEMU_MIGRATION_COMPLETED_ABORT_ON_ERROR = (1 << 0), + QEMU_MIGRATION_COMPLETED_CHECK_STORAGE = (1 << 1), +};
I plan to do similar stuff with qemuMigrationIsAllowed and it's parameters. :) ACK, Peter

With new QEMU which supports migration events, qemuMigrationCheckJobStatus needs to explicitly query QEMU for migration statistics once migration is completed to make sure the caller sees up-to-date statistics with both old and new QEMU. However, some callers are not interested in the statistics at all and once we start waiting for a completed migration on the destination host too, checking the statistics would even fail. Let's push the decision whether to update the statistics or not to the caller. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_migration.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 1672323..cf3df64 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -2574,7 +2574,8 @@ qemuMigrationUpdateJobStatus(virQEMUDriverPtr driver, static int qemuMigrationCheckJobStatus(virQEMUDriverPtr driver, virDomainObjPtr vm, - qemuDomainAsyncJob asyncJob) + qemuDomainAsyncJob asyncJob, + bool updateJobStats) { qemuDomainObjPrivatePtr priv = vm->privateData; qemuDomainJobInfoPtr jobInfo = priv->job.current; @@ -2604,7 +2605,7 @@ qemuMigrationCheckJobStatus(virQEMUDriverPtr driver, case VIR_DOMAIN_JOB_COMPLETED: /* Fetch statistics of a completed migration */ - if (events && + if (events && updateJobStats && qemuMigrationUpdateJobStatus(driver, vm, asyncJob) < 0) return -1; break; @@ -2621,6 +2622,7 @@ qemuMigrationCheckJobStatus(virQEMUDriverPtr driver, enum qemuMigrationCompletedFlags { QEMU_MIGRATION_COMPLETED_ABORT_ON_ERROR = (1 << 0), QEMU_MIGRATION_COMPLETED_CHECK_STORAGE = (1 << 1), + QEMU_MIGRATION_COMPLETED_UPDATE_STATS = (1 << 2), }; /** @@ -2639,8 +2641,9 @@ qemuMigrationCompleted(virQEMUDriverPtr driver, qemuDomainObjPrivatePtr priv = vm->privateData; qemuDomainJobInfoPtr jobInfo = priv->job.current; int pauseReason; + bool updateStats = !!(flags & QEMU_MIGRATION_COMPLETED_UPDATE_STATS); - if (qemuMigrationCheckJobStatus(driver, vm, asyncJob) < 0) + if (qemuMigrationCheckJobStatus(driver, vm, asyncJob, updateStats) < 0) goto error; if (flags & QEMU_MIGRATION_COMPLETED_CHECK_STORAGE && @@ -2694,7 +2697,7 @@ qemuMigrationWaitForCompletion(virQEMUDriverPtr driver, qemuDomainObjPrivatePtr priv = vm->privateData; qemuDomainJobInfoPtr jobInfo = priv->job.current; bool events = virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_MIGRATION_EVENT); - unsigned int flags = 0; + unsigned int flags = QEMU_MIGRATION_COMPLETED_UPDATE_STATS; int rv; if (abort_on_error) @@ -4358,7 +4361,8 @@ qemuMigrationRun(virQEMUDriverPtr driver, * connection from qemu which may never be initiated. */ if (qemuMigrationCheckJobStatus(driver, vm, - QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) + QEMU_ASYNC_JOB_MIGRATION_OUT, + false) < 0) goto cancel; while ((fd = accept(spec->dest.unix_socket.sock, NULL, NULL)) < 0) { -- 2.6.0

On Mon, Oct 05, 2015 at 16:16:01 +0200, Jiri Denemark wrote:
With new QEMU which supports migration events, qemuMigrationCheckJobStatus needs to explicitly query QEMU for migration statistics once migration is completed to make sure the caller sees up-to-date statistics with both old and new QEMU. However, some callers are not interested in the statistics at all and once we start waiting for a completed migration on the destination host too, checking the statistics would even fail. Let's push the decision whether to update the statistics or not to the caller.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_migration.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-)
ACK, Peter

Even though QEMU on the source host reports completed migration and thus we move to the Finish phase, QEMU on the destination host may still be processing migration data. Thus before we can start guest CPUs on the destination, we have to wait for a completed migration event. https://bugzilla.redhat.com/show_bug.cgi?id=1265902 Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_migration.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index cf3df64..fa81a16 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -5725,6 +5725,27 @@ qemuMigrationFinish(virQEMUDriverPtr driver, } } + /* We need to wait for QEMU to process all data sent by the source + * before starting guest CPUs. + */ + if (virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_MIGRATION_EVENT)) { + int rv; + VIR_DEBUG("Waiting for migration to complete"); + while ((rv = qemuMigrationCompleted(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_IN, + NULL, 0)) != 1) { + if (rv < 0 || virDomainObjWait(vm) < 0) { + /* There's not much we can do for v2 protocol since the + * original domain on the source host is already gone. + */ + if (v3proto) + goto endjob; + else + break; + } + } + } + if (!(flags & VIR_MIGRATE_PAUSED)) { /* run 'cont' on the destination, which allows migration on qemu * >= 0.10.6 to work properly. This isn't strictly necessary on -- 2.6.0

On Mon, Oct 05, 2015 at 16:16:02 +0200, Jiri Denemark wrote:
Even though QEMU on the source host reports completed migration and thus we move to the Finish phase, QEMU on the destination host may still be processing migration data. Thus before we can start guest CPUs on the destination, we have to wait for a completed migration event.
https://bugzilla.redhat.com/show_bug.cgi?id=1265902
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_migration.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+)
ACK, Peter

On Tue, Oct 06, 2015 at 10:43:56 +0200, Peter Krempa wrote:
On Mon, Oct 05, 2015 at 16:16:02 +0200, Jiri Denemark wrote:
Even though QEMU on the source host reports completed migration and thus we move to the Finish phase, QEMU on the destination host may still be processing migration data. Thus before we can start guest CPUs on the destination, we have to wait for a completed migration event.
https://bugzilla.redhat.com/show_bug.cgi?id=1265902
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_migration.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+)
ACK,
Pushed, thanks. Jirka
participants (2)
-
Jiri Denemark
-
Peter Krempa