
On Wed, May 18, 2022 at 15:40:18 +0200, Jiri Denemark wrote:
On Thu, May 12, 2022 at 14:14:15 +0200, Peter Krempa wrote:
On Tue, May 10, 2022 at 17:21:15 +0200, Jiri Denemark wrote:
Mostly we just need to check whether the domain is in a failed post-copy migration that can be resumed.
Signed-off-by: Jiri Denemark <jdenemar@redhat.com> --- src/qemu/qemu_migration.c | 143 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 143 insertions(+)
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index c111dd8686..99b1d4b88b 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -2717,6 +2717,143 @@ qemuMigrationSrcBeginPhase(virQEMUDriver *driver, flags); }
+ +static bool +qemuMigrationAnyCanResume(virDomainObj *vm, + virDomainAsyncJob job, + unsigned long flags, + qemuMigrationJobPhase expectedPhase) +{ + qemuDomainObjPrivate *priv = vm->privateData; + + VIR_DEBUG("vm=%p, job=%s, flags=0x%lx, expectedPhase=%s", + vm, virDomainAsyncJobTypeToString(job), flags, + qemuDomainAsyncJobPhaseToString(VIR_ASYNC_JOB_MIGRATION_OUT, + expectedPhase)); + + if (!(flags & VIR_MIGRATE_POSTCOPY)) { + virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", + _("resuming failed post-copy migration requires " + "post-copy to be enabled"));
Error messages shall not have line breaks.
Also ... requiring this flag seems to be a bit pointless and ... cosmetic at best. Since the previous migration would need to be started with the post-copy flag already which is checked below.
Right, the previous migration must have been started as post-copy, but resuming it is post-copy too. We could have implied the flag based on POSTCOPY_RESUME (which is ugly) or all relevant places checking POSTCOPY would need to check POSTCOPY_RESUME too (which is ugly as well). And with the expected use case (repeating the failed migration command with all the flags and arguments and adding an extra POSTCOPY_RESUME flag) in mind requiring POSTCOPY to be used with POSTCOPY_RESUME looks like the the best option to me.
I was kind of referring to the fact that it's implied that postcopy is being resumed with a POSTCOPY_RESUME flag and you need to check anyways that the migration is indeed postcopy, so this feels redundant, but I think it's okay.
+ return false; + } + + /* This should never happen since POSTCOPY_RESUME is newer than + * CHANGE_PROTECTION, but let's check it anyway in case we're talking to + * a weired client. + */ + if (job == VIR_ASYNC_JOB_MIGRATION_OUT && + expectedPhase < QEMU_MIGRATION_PHASE_PERFORM_RESUME && + !(flags & VIR_MIGRATE_CHANGE_PROTECTION)) { + virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", + _("resuming failed post-copy migration requires " + "change protection"));
Same here.
+ return NULL; + }
[...]
+static char * +qemuMigrationSrcBeginResume(virQEMUDriver *driver, + virDomainObj *vm, + const char *xmlin, + char **cookieout, + int *cookieoutlen, + unsigned long flags) +{ + virDomainJobStatus status; + + if (qemuMigrationAnyRefreshStatus(driver, vm, VIR_ASYNC_JOB_MIGRATION_OUT, + &status) < 0) + return NULL; + + if (status != VIR_DOMAIN_JOB_STATUS_POSTCOPY_PAUSED) { + virReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("QEMU reports migration is still running")); + return NULL; + } + + return qemuMigrationSrcBeginXML(driver, vm, xmlin, + cookieout, cookieoutlen, 0, NULL, 0, flags);
Certain steps here around the non-shared-storage migration are redundant at this point and IMO should be skipped.
Which is what happens in qemuMigrationSrcBegin:
if (flags & VIR_MIGRATE_POSTCOPY_RESUME) { xml = qemuMigrationSrcBeginResumePhase(...); goto cleanup; }
...
Or did I miss anything?
In 'qemuMigrationSrcBeginXML' qemuMigrationSrcBeginPhaseBlockDirtyBitmaps is called which does a bunch of setup for bitmap migration which is pointless when recovering as storage is copied already if the CPUs are running on the destination.