[libvirt PATCH 0/2] A few post-copy related fixes
*** BLURB EVERYWHERE *** Jiri Denemark (2): qemu: Always restore post-copy migration job on reconnect qemu: Ignore failure in post-copy migration when QEMU says completed src/qemu/qemu_migration.c | 8 ++++++++ src/qemu/qemu_process.c | 15 ++++++++++----- 2 files changed, 18 insertions(+), 5 deletions(-) -- 2.38.1
We need the restored job even in case the migration already finished even though we will stop it just a few lines below as the functions we call in between require an existing migration job. This fixes a crash on reconnect when post-copy migration finished while the daemon was not running. Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_process.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 0769f30d74..e1c18dde90 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -3635,11 +3635,14 @@ qemuProcessRecoverMigration(virQEMUDriver *driver, if (rc > 0) { job->phase = QEMU_MIGRATION_PHASE_POSTCOPY_FAILED; + /* Even though we restore the migration async job here, the APIs below + * use VIR_ASYNC_JOB_NONE because we're already in a MODIFY job started + * before we reconnected to the domain. */ + qemuProcessRestoreMigrationJob(vm, job); if (migStatus == VIR_DOMAIN_JOB_STATUS_POSTCOPY) { VIR_DEBUG("Post-copy migration of domain %s still running, it will be handled as unattended", vm->def->name); - qemuProcessRestoreMigrationJob(vm, job); return 0; } @@ -3648,17 +3651,19 @@ qemuProcessRecoverMigration(virQEMUDriver *driver, qemuMigrationSrcPostcopyFailed(vm); else qemuMigrationDstPostcopyFailed(vm); - - qemuProcessRestoreMigrationJob(vm, job); return 0; } VIR_DEBUG("Post-copy migration of domain %s already finished", vm->def->name); - if (job->asyncJob == VIR_ASYNC_JOB_MIGRATION_OUT) + if (job->asyncJob == VIR_ASYNC_JOB_MIGRATION_OUT) { qemuMigrationSrcComplete(driver, vm, VIR_ASYNC_JOB_NONE); - else + /* No need to stop the restored job as the domain has just been + * destroyed. */ + } else { qemuMigrationDstComplete(driver, vm, true, VIR_ASYNC_JOB_NONE, job); + virDomainObjEndAsyncJob(vm); + } return 0; } -- 2.38.1
On Fri, Nov 18, 2022 at 16:37:21 +0100, Jiri Denemark wrote:
We need the restored job even in case the migration already finished even though we will stop it just a few lines below as the functions we call in between require an existing migration job.
This fixes a crash on reconnect when post-copy migration finished while the daemon was not running.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_process.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-)
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
When post-copy migration is running in Finish phase we already did everything needed and we're just waiting for all the memory to transfer to the destination. The domain is already running on there at this point. Once all data is transferred (QEMU sends a MIGRATION completed event) we're done. So in this specific post-copy case the source does not need to care about the result of the Finish call as long as QEMU says migration completed. The Finish call to the destination daemon may fail for reasons that do not affect QEMU, e.g., libvirt daemon was restarted there or the libvirt connection broke. Currently we just mark the post-copy migration as failed on the source and keep the domain paused there. But when libvirt daemon is restarted at this point, it will detect migration finished successfully and kill the domain as migrated. It make sense to do this even without having to restart the daemon. Closes: https://gitlab.com/libvirt/libvirt/-/issues/338 Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_migration.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index bba4e1dbf3..bef06f4caf 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -3901,6 +3901,7 @@ qemuMigrationSrcConfirmPhase(virQEMUDriver *driver, g_autoptr(qemuMigrationCookie) mig = NULL; qemuDomainObjPrivate *priv = vm->privateData; qemuDomainJobPrivate *jobPriv = vm->job->privateData; + qemuDomainJobDataPrivate *currentData = vm->job->current->privateData; virDomainJobData *jobData = NULL; qemuMigrationJobPhase phase; @@ -3911,6 +3912,13 @@ qemuMigrationSrcConfirmPhase(virQEMUDriver *driver, virCheckFlags(QEMU_MIGRATION_FLAGS, -1); + if (retcode != 0 && + virDomainObjIsPostcopy(vm, VIR_DOMAIN_JOB_OPERATION_MIGRATION_OUT) && + currentData->stats.mig.status == QEMU_MONITOR_MIGRATION_STATUS_COMPLETED) { + VIR_DEBUG("Finish phase failed, but QEMU reports post-copy migration is completed; forcing success"); + retcode = 0; + } + if (flags & VIR_MIGRATE_POSTCOPY_RESUME) { phase = QEMU_MIGRATION_PHASE_CONFIRM_RESUME; } else if (virDomainObjIsFailedPostcopy(vm)) { -- 2.38.1
On Fri, Nov 18, 2022 at 16:37:22 +0100, Jiri Denemark wrote:
When post-copy migration is running in Finish phase we already did everything needed and we're just waiting for all the memory to transfer to the destination. The domain is already running on there at this point. Once all data is transferred (QEMU sends a MIGRATION completed event) we're done. So in this specific post-copy case the source does not need to care about the result of the Finish call as long as QEMU says migration completed. The Finish call to the destination daemon may fail for reasons that do not affect QEMU, e.g., libvirt daemon was restarted there or the libvirt connection broke.
Currently we just mark the post-copy migration as failed on the source and keep the domain paused there. But when libvirt daemon is restarted at this point, it will detect migration finished successfully and kill the domain as migrated. It make sense to do this even without having to restart the daemon.
Closes: https://gitlab.com/libvirt/libvirt/-/issues/338
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_migration.c | 8 ++++++++ 1 file changed, 8 insertions(+)
Reviewed-by: Peter Krempa <pkrempa@redhat.com>
participants (2)
-
Jiri Denemark -
Peter Krempa