[PATCH v2] qemu: fix potential hang in qemuMigrationSrcCancelUnattended during reconnect

20 Mar 2026

When libvirtd reconnects to a running QEMU process that had an
in-progress migration, qemuProcessReconnect first connects the
monitor and only later recovers the migration job. During this window
the async job is VIR_ASYNC_JOB_NONE, so any MIGRATION status events
from QEMU are silently dropped by qemuProcessHandleMigrationStatus.

If the migration was already cancelled or completed by QEMU during
this window, no further events will be emitted. When
qemuMigrationSrcCancelUnattended later restores the async job and
calls qemuMigrationSrcCancel with wait=true, the wait loop calls
qemuDomainObjWait (virCondWait with no timeout) and blocks forever
waiting for an event that will never arrive.

Fix this by re-querying QEMU migration state with
qemuMigrationAnyRefreshStatus after restoring the async job but before
calling qemuMigrationSrcCancel. If QEMU already reports the migration
as canceled (VIR_DOMAIN_JOB_STATUS_CANCELED), the cancel and its
untimed wait are skipped.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jiri Denemark <jdenemar@redhat.com>
CC: Peter Krempa <pkrempa@redhat.com>
CC: Michal Privoznik <mprivozn@redhat.com>
CC: Efim Shevrin <efim.shevrin@virtuozzo.com>
---
v1 -> v2: Instead of querying QEMU with query-migrate inside
qemuMigrationSrcCancel, use qemuMigrationAnyRefreshStatus in
qemuMigrationSrcCancelUnattended after restoring the async job
to re-check migration state before the actual cancel.
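
For reviewers, the race can be sketched as a small single-threaded
model. This is an illustration only, not libvirt code: the Domain
struct, the enums and the model* helpers are hypothetical stand-ins,
and "hang" is modelled as "the cached state is not canceled while QEMU
is already canceled, so no wake-up event will ever arrive".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the libvirt types. */
typedef enum { JOB_NONE, JOB_MIGRATION_OUT } AsyncJob;
typedef enum { MIG_ACTIVE, MIG_CANCELED } MigState;

typedef struct {
    AsyncJob asyncJob; /* async job the daemon currently tracks */
    MigState cached;   /* last migration state the daemon learned */
    MigState actual;   /* state QEMU would report if queried */
} Domain;

/* QEMU changes state and emits one event. The handler drops the event
 * when no async job is set, as in the reconnect window. */
static void
modelEmitState(Domain *d, MigState s)
{
    d->actual = s;
    if (d->asyncJob != JOB_NONE)
        d->cached = s;
}

/* Explicit status query: copy QEMU's state into the cache. */
static MigState
modelRefreshStatus(Domain *d)
{
    d->cached = d->actual;
    return d->cached;
}

/* Returns true if an untimed wait for the "canceled" event would never
 * be woken up. */
static bool
modelCancelWouldHang(Domain *d, bool refreshFirst)
{
    if (refreshFirst && modelRefreshStatus(d) == MIG_CANCELED)
        return false; /* cancel and wait are skipped */

    return d->cached != MIG_CANCELED && d->actual == MIG_CANCELED;
}

/* Replays the reconnect sequence and reports whether the cancel hangs. */
static bool
modelReconnectHangs(bool refreshFirst)
{
    Domain d = { JOB_NONE, MIG_ACTIVE, MIG_ACTIVE };

    modelEmitState(&d, MIG_CANCELED); /* event dropped: job is NONE */
    d.asyncJob = JOB_MIGRATION_OUT;   /* async job restored too late */
    return modelCancelWouldHang(&d, refreshFirst);
}
```

In this model modelReconnectHangs(false) corresponds to the behaviour
before the patch and modelReconnectHangs(true) to the behaviour with
the refresh added.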

 src/qemu/qemu_migration.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index fec808ccfb..a4bd7efa09 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -7330,6 +7330,7 @@ int
 qemuMigrationSrcCancelUnattended(virDomainObj *vm,
                                  virDomainJobObj *oldJob)
 {
+    virDomainJobStatus migStatus = VIR_DOMAIN_JOB_STATUS_NONE;
     bool storage = false;
     size_t i;
 
@@ -7348,11 +7349,20 @@ qemuMigrationSrcCancelUnattended(virDomainObj *vm,
                                      VIR_JOB_NONE);
     }
 
-    /* We're inside a MODIFY job and the restored MIGRATION_OUT async job is
-     * used only for processing migration events from QEMU. Thus we don't want
-     * to start a nested job for talking to QEMU.
+    /* Query the actual migration state from QEMU. The state passed to
+     * qemuProcessRecoverMigrationOut may be stale: QEMU could have
+     * reached a terminal state between that initial query and the async
+     * job restore above, with the corresponding event silently dropped.
      */
-    qemuMigrationSrcCancel(vm, VIR_ASYNC_JOB_NONE, true);
+    qemuMigrationAnyRefreshStatus(vm, VIR_ASYNC_JOB_NONE, &migStatus);
+
+    if (migStatus != VIR_DOMAIN_JOB_STATUS_CANCELED) {
+        /* We're inside a MODIFY job and the restored MIGRATION_OUT async
+         * job is used only for processing migration events from QEMU.
+         * Thus we don't want to start a nested job for talking to QEMU.
+         */
+        qemuMigrationSrcCancel(vm, VIR_ASYNC_JOB_NONE, true);
+    }
 
     virDomainObjEndAsyncJob(vm);
 
-- 
2.51.0
