[PATCH 0/2] libxl: migration fixes

A different (and more correct IMO) approach to fixing issues in the migration error handling logic. Initial attempt here https://listman.redhat.com/archives/libvir-list/2023-July/240652.html Jim Fehlig (2): libxl: Don't attempt to resume domain on canceled migration libxl: Advertise support for VIR_MIGRATE_CHANGE_PROTECTION src/libxl/libxl_driver.c | 2 +- src/libxl/libxl_migration.c | 16 +--------------- src/libxl/libxl_migration.h | 3 ++- 3 files changed, 4 insertions(+), 17 deletions(-) -- 2.41.0

For unknown reasons, the libxl driver attempts to resume a domain in the confirm phase when a migration operation has been canceled. This has shown to be problematic when simulating scenarios that result in a canceled migration. In all scenarios, the domain was in a running state when entering libxlDomainMigrationSrcConfirm, causing the call to libxl_domain_resume to fail. Making matters worse, the domain state is changed to paused when in fact it's running. And finally, libxlDomainMigrationSrcConfirm incorrectly returns an error. Remove this incorrect logic from libxlDomainMigrationSrcConfirm. On a canceled migration it's sufficient to resume the lock process that was paused in the perform phase. Signed-off-by: Jim Fehlig <jfehlig@suse.com> --- src/libxl/libxl_migration.c | 16 +--------------- 1 file changed, 1 insertion(+), 15 deletions(-) diff --git a/src/libxl/libxl_migration.c b/src/libxl/libxl_migration.c index 961815f9f7..a91091f5e8 100644 --- a/src/libxl/libxl_migration.c +++ b/src/libxl/libxl_migration.c @@ -1342,7 +1342,6 @@ libxlDomainMigrationSrcConfirm(libxlDriverPrivate *driver, libxlDriverConfig *cfg = libxlDriverConfigGet(driver); libxlDomainObjPrivate *priv = vm->privateData; virObjectEvent *event = NULL; - int ret = -1; if (cancelled) { /* Resume lock process that was paused in MigrationSrcPerform */ @@ -1351,17 +1350,6 @@ libxlDomainMigrationSrcConfirm(libxlDriverPrivate *driver, vm, priv->lockState); priv->lockProcessRunning = true; - if (libxl_domain_resume(cfg->ctx, vm->def->id, 1, 0) == 0) { - ret = 0; - } else { - VIR_DEBUG("Unable to resume domain '%s' after failed migration", - vm->def->name); - virDomainObjSetState(vm, VIR_DOMAIN_PAUSED, - VIR_DOMAIN_PAUSED_MIGRATION); - event = virDomainEventLifecycleNewFromObj(vm, VIR_DOMAIN_EVENT_SUSPENDED, - VIR_DOMAIN_EVENT_SUSPENDED_MIGRATED); - ignore_value(virDomainObjSave(vm, driver->xmlopt, cfg->stateDir)); - } goto cleanup; } @@ -1380,12 +1368,10 @@ libxlDomainMigrationSrcConfirm(libxlDriverPrivate *driver, if (!vm->persistent || (flags & VIR_MIGRATE_UNDEFINE_SOURCE)) virDomainObjListRemove(driver->domains, vm); - ret = 0; - cleanup: /* EndJob for corresponding BeginJob in begin phase */ virDomainObjEndJob(vm); virObjectEventStateQueue(driver->domainEventState, event); virObjectUnref(cfg); - return ret; + return 0; } -- 2.41.0

On Wed, Jul 12, 2023 at 03:26:14PM -0600, Jim Fehlig wrote:
For unknown reasons, the libxl driver attempts to resume a domain in the confirm phase when a migration operation has been canceled. This has shown to be problematic when simulating scenarios that result in a canceled migration. In all scenarios, the domain was in a running state when entering libxlDomainMigrationSrcConfirm, causing the call to libxl_domain_resume to fail. Making matters worse, the domain state is changed to paused when in fact it's running. And finally, libxlDomainMigrationSrcConfirm incorrectly returns an error.
Remove this incorrect logic from libxlDomainMigrationSrcConfirm. On a canceled migration it's sufficient to resume the lock process that was paused in the perform phase.
Signed-off-by: Jim Fehlig <jfehlig@suse.com> --- src/libxl/libxl_migration.c | 16 +--------------- 1 file changed, 1 insertion(+), 15 deletions(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

The libxl driver has basic support for VIR_MIGRATE_CHANGE_PROTECTION by starting and stopping modify jobs in the begin/confirm and prepare/finish phases of migration, but it doesn't advertise that support. This can result in unterminated jobs because the migration logic skips phases of migration when the VIR_MIGRATE_CHANGE_PROTECTION feature is absent. Ensure jobs are terminated properly by advertising support for VIR_MIGRATE_CHANGE_PROTECTION. Signed-off-by: Jim Fehlig <jfehlig@suse.com> --- src/libxl/libxl_driver.c | 2 +- src/libxl/libxl_migration.h | 3 ++- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index d99bc37bf6..2644d1400a 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -5846,8 +5846,8 @@ libxlConnectSupportsFeature(virConnectPtr conn, int feature) case VIR_DRV_FEATURE_MIGRATION_V3: case VIR_DRV_FEATURE_MIGRATION_PARAMS: case VIR_DRV_FEATURE_MIGRATION_P2P: - return 1; case VIR_DRV_FEATURE_MIGRATE_CHANGE_PROTECTION: + return 1; case VIR_DRV_FEATURE_MIGRATION_DIRECT: case VIR_DRV_FEATURE_MIGRATION_OFFLINE: case VIR_DRV_FEATURE_MIGRATION_V1: diff --git a/src/libxl/libxl_migration.h b/src/libxl/libxl_migration.h index 078510c672..19b68d08b7 100644 --- a/src/libxl/libxl_migration.h +++ b/src/libxl/libxl_migration.h @@ -28,7 +28,8 @@ VIR_MIGRATE_TUNNELLED | \ VIR_MIGRATE_PERSIST_DEST | \ VIR_MIGRATE_UNDEFINE_SOURCE | \ - VIR_MIGRATE_PAUSED) + VIR_MIGRATE_PAUSED | \ + VIR_MIGRATE_CHANGE_PROTECTION) /* All supported migration parameters and their types. */ #define LIBXL_MIGRATION_PARAMETERS \ -- 2.41.0

On Wed, Jul 12, 2023 at 03:26:15PM -0600, Jim Fehlig wrote:
The libxl driver has basic support for VIR_MIGRATE_CHANGE_PROTECTION by starting and stopping modify jobs in the begin/confirm and prepare/finish phases of migration, but it doesn't advertise that support. This can result in unterminated jobs because the migration logic skips phases of migration when the VIR_MIGRATE_CHANGE_PROTECTION feature is absent. Ensure jobs are terminated properly by advertising support for VIR_MIGRATE_CHANGE_PROTECTION.
Signed-off-by: Jim Fehlig <jfehlig@suse.com> --- src/libxl/libxl_driver.c | 2 +- src/libxl/libxl_migration.h | 3 ++- 2 files changed, 3 insertions(+), 2 deletions(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
participants (2)
-
Daniel P. Berrangé
-
Jim Fehlig