[PATCH v2] qemu: fix potential hang in qemuMigrationSrcCancelUnattended during reconnect
When libvirtd reconnects to a running QEMU process that had an
in-progress migration, qemuProcessReconnect first connects the monitor
and only later recovers the migration job. During this window the async
job is VIR_ASYNC_JOB_NONE, so any MIGRATION status events from QEMU are
silently dropped by qemuProcessHandleMigrationStatus.

If the migration was already cancelled or completed by QEMU during this
window, no further events will be emitted. When
qemuMigrationSrcCancelUnattended later restores the async job and calls
qemuMigrationSrcCancel with wait=true, the wait loop calls
qemuDomainObjWait (virCondWait with no timeout) and blocks forever
waiting for an event that will never arrive.

Fix this by re-querying QEMU migration state with
qemuMigrationAnyRefreshStatus after restoring the async job but before
calling qemuMigrationSrcCancel. If QEMU has already reached a terminal
state, the cancel is skipped.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jiri Denemark <jdenemar@redhat.com>
CC: Peter Krempa <pkrempa@redhat.com>
CC: Michal Privoznik <mprivozn@redhat.com>
CC: Efim Shevrin <efim.shevrin@virtuozzo.com>
---
v1 -> v2: Instead of querying QEMU with query-migrate inside
qemuMigrationSrcCancel, use qemuMigrationAnyRefreshStatus in
qemuMigrationSrcCancelUnattended after restoring the async job to
re-check migration state before the actual cancel.
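
For reviewers, the race described above can be modelled in a few lines
of plain C. This is only an illustrative, single-threaded sketch and not
libvirt code: every name in it (model, mig_state, on_migration_event and
so on) is hypothetical, the untimed wait is modelled by a function that
reports whether it would ever be woken, and it treats both cancelled and
completed as terminal, as the description does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names exist in libvirt. */
typedef enum { MIG_ACTIVE, MIG_CANCELLED, MIG_COMPLETED } mig_state;

typedef struct {
    mig_state qemu_state;  /* authoritative state, as a query would report it */
    mig_state seen_state;  /* last state delivered through an event */
    bool job_active;       /* true once the async job has been restored */
} model;

static bool is_terminal(mig_state s)
{
    return s != MIG_ACTIVE;
}

/* QEMU changes state and emits an event; the handler drops the event
 * while no async job is set, mirroring the reconnect window. */
static void on_migration_event(model *m, mig_state s)
{
    m->qemu_state = s;
    if (!m->job_active)
        return;            /* silently dropped */
    m->seen_state = s;
}

/* Models "send cancel, then wait without timeout". Returns true if the
 * wait would never be woken. */
static bool cancel_and_wait_hangs(model *m)
{
    assert(m->job_active);
    /* A still-running migration reacts to the cancel with an event;
     * a migration that already finished emits nothing more. */
    if (!is_terminal(m->qemu_state))
        on_migration_event(m, MIG_CANCELLED);
    return !is_terminal(m->seen_state);
}

/* Behaviour before the fix: restore the job and wait right away. */
static bool cancel_naive_hangs(model *m)
{
    m->job_active = true;
    return cancel_and_wait_hangs(m);
}

/* Behaviour with a refresh: re-query after restoring the job and skip
 * the cancel when the state is already terminal. */
static bool cancel_refreshed_hangs(model *m)
{
    m->job_active = true;
    m->seen_state = m->qemu_state;  /* models the status refresh */
    if (is_terminal(m->seen_state))
        return false;               /* nothing left to wait for */
    return cancel_and_wait_hangs(m);
}
```

In this model, delivering a terminal event before the job is restored
makes cancel_naive_hangs() report a hang, while cancel_refreshed_hangs()
does not; when the migration is still active both variants complete.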
 src/qemu/qemu_migration.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index fec808ccfb..a4bd7efa09 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -7330,6 +7330,7 @@ int
 qemuMigrationSrcCancelUnattended(virDomainObj *vm,
                                  virDomainJobObj *oldJob)
 {
+    virDomainJobStatus migStatus = VIR_DOMAIN_JOB_STATUS_NONE;
     bool storage = false;
     size_t i;
 
@@ -7348,11 +7349,20 @@ qemuMigrationSrcCancelUnattended(virDomainObj *vm,
                                 VIR_JOB_NONE);
     }
 
-    /* We're inside a MODIFY job and the restored MIGRATION_OUT async job is
-     * used only for processing migration events from QEMU. Thus we don't want
-     * to start a nested job for talking to QEMU.
+    /* Query the actual migration state from QEMU. The state passed to
+     * qemuProcessRecoverMigrationOut may be stale: QEMU could have
+     * reached a terminal state between that initial query and the async
+     * job restore above, with the corresponding event silently dropped.
      */
-    qemuMigrationSrcCancel(vm, VIR_ASYNC_JOB_NONE, true);
+    qemuMigrationAnyRefreshStatus(vm, VIR_ASYNC_JOB_NONE, &migStatus);
+
+    if (migStatus != VIR_DOMAIN_JOB_STATUS_CANCELED) {
+        /* We're inside a MODIFY job and the restored MIGRATION_OUT async
+         * job is used only for processing migration events from QEMU.
+         * Thus we don't want to start a nested job for talking to QEMU.
+         */
+        qemuMigrationSrcCancel(vm, VIR_ASYNC_JOB_NONE, true);
+    }
 
     virDomainObjEndAsyncJob(vm);
 
-- 
2.51.0