On Tue, May 10, 2022 at 17:20:35 +0200, Jiri Denemark wrote:
When post-copy migration fails, we can't just abort the migration and
resume the domain on the source host as it is already running on the
destination host and no host has a complete state of the domain memory.
Instead of the current approach of just marking the domain on both ends
as paused/running with a post-copy failed sub state, we will keep the
migration job active (even though the migration API will return failure)
so that the state is more visible and we can better control what APIs
can be called on the domains and even allow for resuming the migration.
Signed-off-by: Jiri Denemark <jdenemar(a)redhat.com>
---
src/qemu/qemu_migration.c | 94 ++++++++++++++++++++++++++++-----------
1 file changed, 68 insertions(+), 26 deletions(-)
@@ -5445,11 +5479,12 @@ qemuMigrationSrcPerformPhase(virQEMUDriver *driver,
         goto endjob;

  endjob:
-    if (ret < 0) {
+    if (ret < 0 && !virDomainObjIsFailedPostcopy(vm)) {
         qemuMigrationParamsReset(driver, vm, VIR_ASYNC_JOB_MIGRATION_OUT,
                                  jobPriv->migParams, priv->job.apiFlags);
         qemuMigrationJobFinish(vm);
     } else {
+        qemuDomainCleanupAdd(vm, qemuProcessCleanupMigrationJob);
         qemuMigrationJobContinue(vm);
     }
This logic change is a bit obscure and IMO would benefit from a comment
stating that we want to continue the job for all failed post-copy
migrations as well as for all other migrations that succeeded.
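
To make the intent concrete, here is a rough standalone model of the
condition. The enum and the function name are invented for illustration
and are not libvirt code; only the shape of the test mirrors the hunk
above.

```c
#include <stdbool.h>

/* Hypothetical outcome names, used only for this illustration. */
typedef enum {
    JOB_FINISH,   /* reset migration params and end the job */
    JOB_CONTINUE  /* keep the job active for the next phase or a resume */
} JobAction;

/*
 * Continue the job for every failed post-copy migration and for every
 * migration that succeeded; finish it only when the migration failed
 * without ending up in the post-copy failed state.
 */
static JobAction
performPhaseEndjobAction(int ret, bool failedPostcopy)
{
    if (ret < 0 && !failedPostcopy)
        return JOB_FINISH;

    return JOB_CONTINUE;
}
```

A comment along the lines of the one above the helper would probably be
enough in the real code.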
Reviewed-by: Peter Krempa <pkrempa(a)redhat.com>