On Wed, May 11, 2022 at 14:35:22 +0200, Peter Krempa wrote:
> On Tue, May 10, 2022 at 17:20:35 +0200, Jiri Denemark wrote:
> > When post-copy migration fails, we can't just abort the migration and
> > resume the domain on the source host as it is already running on the
> > destination host and no host has a complete state of the domain memory.
> > Instead of the current approach of just marking the domain on both ends
> > as paused/running with a post-copy failed sub state, we will keep the
> > migration job active (even though the migration API will return failure)
> > so that the state is more visible and we can better control what APIs
> > can be called on the domains and even allow for resuming the migration.
> >
> > Signed-off-by: Jiri Denemark <jdenemar(a)redhat.com>
> > ---
> >  src/qemu/qemu_migration.c | 94 ++++++++++++++++++++++++++++-----------
> >  1 file changed, 68 insertions(+), 26 deletions(-)
> > @@ -5445,11 +5479,12 @@ qemuMigrationSrcPerformPhase(virQEMUDriver *driver,
> >          goto endjob;
> >
> >   endjob:
> > -    if (ret < 0) {
> > +    if (ret < 0 && !virDomainObjIsFailedPostcopy(vm)) {
> >          qemuMigrationParamsReset(driver, vm, VIR_ASYNC_JOB_MIGRATION_OUT,
> >                                   jobPriv->migParams, priv->job.apiFlags);
> >          qemuMigrationJobFinish(vm);
> >      } else {
> > +        qemuDomainCleanupAdd(vm, qemuProcessCleanupMigrationJob);
> >          qemuMigrationJobContinue(vm);
> >      }
>
> This logic change is a bit obscure and IMO would benefit from a comment
> stating that we want to continue all post-copy migration jobs and all
> successful other migrations.

I'm not sure what is so special about this hunk to make it look obscure.
It's the same change done everywhere in this patch, which is about
continuing the job if it succeeded (existing) or failed in post-copy
(new).
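
For illustration only, the condition can be written as a standalone
predicate. This is a minimal sketch and not libvirt code: the function
name and its boolean parameter are hypothetical, with failed_postcopy
standing in for the result of virDomainObjIsFailedPostcopy(vm) and ret
for the return value checked at the endjob label.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not part of libvirt) mirroring the endjob check.
 *   ret             - result of the migration phase, negative on failure
 *   failed_postcopy - assumed stand-in for virDomainObjIsFailedPostcopy(vm)
 */
bool
migration_job_should_continue(int ret, bool failed_postcopy)
{
    /* A failure outside post-copy: the job is finished and params reset. */
    if (ret < 0 && !failed_postcopy)
        return false;

    /* Success (existing behavior) or a failure in post-copy (new). */
    return true;
}
```

Only one of the four input combinations (a failure that is not a
post-copy failure) ends the job; every other combination keeps it
active.
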
Jirka