[libvirt] [PATCH] qemu: Clear async job when p2p migration fails early

When p2p migration fails early because qemuMigrationIsAllowed or
qemuMigrationIsSafe say migration should be cancelled, we fail to clear
the migration-out async job. As a result of that, further APIs called
for the same domain may fail with

    Timed out during operation: cannot acquire state change lock

Reported by Guido Winkelmann.
---
 src/qemu/qemu_migration.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index 68d614d..65cd6ec 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -2641,10 +2641,10 @@ qemuMigrationPerformJob(struct qemud_driver *driver,
     }
 
     if (!qemuMigrationIsAllowed(driver, vm, NULL))
-        goto cleanup;
+        goto endjob;
 
     if (!(flags & VIR_MIGRATE_UNSAFE) && !qemuMigrationIsSafe(vm->def))
-        goto cleanup;
+        goto endjob;
 
     resume = virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING;
 
-- 
1.7.12.3
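
The control-flow mistake described in the patch above can be modelled in a few lines of C. This is only a minimal sketch under stated assumptions, not libvirt code: `struct domain`, `begin_async_job`, `end_async_job`, `perform_job_buggy` and `perform_job_fixed` are hypothetical stand-ins, and the real job code waits with a timeout instead of failing immediately.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for libvirt's job tracking; not the real API. */
enum async_job { JOB_NONE = 0, JOB_MIGRATION_OUT };

struct domain {
    enum async_job async_job;  /* async job currently set on the domain */
    bool migratable;           /* models the "is allowed / is safe" checks */
};

/* Returns 0 on success, -1 if another async job is still set.
 * (Assumption: the real code would wait here and eventually time out.) */
static int begin_async_job(struct domain *dom, enum async_job job)
{
    if (dom->async_job != JOB_NONE)
        return -1;
    dom->async_job = job;
    return 0;
}

static void end_async_job(struct domain *dom)
{
    assert(dom->async_job != JOB_NONE);
    dom->async_job = JOB_NONE;
}

/* Pre-patch shape: a failed check jumps past the job release. */
static int perform_job_buggy(struct domain *dom)
{
    int ret = -1;

    if (begin_async_job(dom, JOB_MIGRATION_OUT) < 0)
        goto cleanup;

    if (!dom->migratable)
        goto cleanup;          /* bug: the async job stays set */

    ret = 0;                   /* the migration itself would run here */
    end_async_job(dom);
cleanup:
    return ret;
}

/* Post-patch shape: every exit taken after the job was acquired
 * passes through the endjob label. */
static int perform_job_fixed(struct domain *dom)
{
    int ret = -1;

    if (begin_async_job(dom, JOB_MIGRATION_OUT) < 0)
        goto cleanup;

    if (!dom->migratable)
        goto endjob;

    ret = 0;                   /* the migration itself would run here */
endjob:
    end_async_job(dom);
cleanup:
    return ret;
}
```

In this model, a failed check in `perform_job_buggy` leaves `async_job` set, so the next `begin_async_job` on the same domain fails, which is the analogue of the "cannot acquire state change lock" timeout.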

On 10/17/12 14:29, Jiri Denemark wrote:
> When p2p migration fails early because qemuMigrationIsAllowed or
> qemuMigrationIsSafe say migration should be cancelled, we fail to clear
> the migration-out async job. As a result of that, further APIs called
> for the same domain may fail with
>
>     Timed out during operation: cannot acquire state change lock
>
> Reported by Guido Winkelmann.
> ---
>  src/qemu/qemu_migration.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Or you might move the checks before acquiring the job. But that would be
just a minor improvement.

ACK.

Peter

On Wed, Oct 17, 2012 at 14:34:25 +0200, Peter Krempa wrote:
> On 10/17/12 14:29, Jiri Denemark wrote:
> > When p2p migration fails early because qemuMigrationIsAllowed or
> > qemuMigrationIsSafe say migration should be cancelled, we fail to clear
> > the migration-out async job. As a result of that, further APIs called
> > for the same domain may fail with
> >
> >     Timed out during operation: cannot acquire state change lock
> >
> > Reported by Guido Winkelmann.
> > ---
> >  src/qemu/qemu_migration.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
>
> Or you might move the checks before acquiring the job.

That wouldn't work because the domain is unlocked while we wait for a
job, and thus its definition can change.

> ACK.

Pushed, thanks.

Jirka
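
The ordering argument above (the checks must run after the job is acquired, because the domain is unlocked during the wait) can be illustrated with a small single-threaded model. This is a sketch under assumptions, not libvirt code: `struct domain`, `wait_for_job`, `make_unsafe` and the two ordering functions are hypothetical, and the "other thread" is simulated by a callback that runs while the lock is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a domain object; not the real libvirt type. */
struct domain {
    bool locked;           /* true while the caller holds the domain lock */
    bool safe_to_migrate;  /* stands in for the domain definition */
};

typedef void (*other_thread_fn)(struct domain *);

/* Waiting for a job drops the domain lock, so another thread may run
 * and modify the domain before the lock is taken again. */
static void wait_for_job(struct domain *dom, other_thread_fn other)
{
    assert(dom->locked);   /* caller must hold the lock on entry */
    dom->locked = false;
    if (other != NULL)
        other(dom);        /* simulated concurrent change */
    dom->locked = true;
}

/* Simulated concurrent change that makes migration unsafe. */
static void make_unsafe(struct domain *dom)
{
    dom->safe_to_migrate = false;
}

/* Checks before acquiring the job: the result can be stale. */
static bool check_then_acquire(struct domain *dom, other_thread_fn other)
{
    bool ok = dom->safe_to_migrate;
    wait_for_job(dom, other);
    return ok;
}

/* Checks after acquiring the job: sees the current definition. */
static bool acquire_then_check(struct domain *dom, other_thread_fn other)
{
    wait_for_job(dom, other);
    return dom->safe_to_migrate;
}
```

With `make_unsafe` as the simulated concurrent change, `check_then_acquire` still reports the domain as safe even though it no longer is, while `acquire_then_check` observes the updated state.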

On Wed, Oct 17, 2012 at 14:29:04 +0200, Jiri Denemark wrote:
> When p2p migration fails early because qemuMigrationIsAllowed or
> qemuMigrationIsSafe say migration should be cancelled, we fail to clear
> the migration-out async job. As a result of that, further APIs called
> for the same domain may fail with
>
>     Timed out during operation: cannot acquire state change lock
>
> Reported by Guido Winkelmann.

BTW, this bug was first introduced in 0.9.5 (commit
e2fb96d92b4b986a2b5732416f7bfd302a848970) and then copied in the next if
statement. In other words, this patch seems to be worth backporting to
various maintenance releases.

Jirka