Re: [PATCH 1/1] qemu: fix potential hang in qemuMigrationSrcCancel during reconnect

20 Mar 2026

On 3/19/26 11:51, Jiri Denemark wrote:
> On Fri, Mar 13, 2026 at 12:55:18 +0100, Denis V. Lunev wrote:
>> ...
>> ...
>> When libvirtd reconnects to a running QEMU process that had an
>> in-progress migration, qemuProcessReconnect first connects the
>> monitor and only later recovers the migration job. During this window
>> the async job is VIR_ASYNC_JOB_NONE, so any MIGRATION status events
>> from QEMU are silently dropped by qemuProcessHandleMigrationStatus.
>> If the migration was already cancelled or completed by QEMU during
>> this window, no further events will be emitted. When
>> qemuMigrationSrcCancelUnattended later restores the async job and
>> calls qemuMigrationSrcCancel with wait=true, the wait loop calls
>> qemuDomainObjWait (virCondWait with no timeout) and blocks forever
>> waiting for an event that will never arrive.
>>
>> Fix this by querying QEMU migration status with query-migrate
>> immediately after sending migrate_cancel, while still inside the
>> monitor session. This ensures the job's migration status is up to
>> date before entering the wait loop, so if QEMU already reached a
>> terminal state (cancelled/completed/error), the loop exits
>> immediately.
>>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Peter Krempa <pkrempa@redhat.com>
>> CC: Michal Privoznik <mprivozn@redhat.com>
>> ---
>>  src/qemu/qemu_migration.c | 15 +++++++++++++++
>>  1 file changed, 15 insertions(+)
>>
>> diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
>> index fec808ccfb..3a9185f65c 100644
>> --- a/src/qemu/qemu_migration.c
>> +++ b/src/qemu/qemu_migration.c
>> @@ -4876,6 +4876,21 @@ qemuMigrationSrcCancel(virDomainObj *vm,
>>          return -1;
>>
>>      rc = qemuMonitorMigrateCancel(priv->mon);
>> +
>> +    if (rc == 0 && wait) {
>> +        virDomainJobData *jobData = vm->job->current;
>> +        qemuDomainJobDataPrivate *privJob = jobData->privateData;
>> +        qemuMonitorMigrationStats stats;
>> +
>> +        /* During reconnect the async job is not yet restored when migration
>> +         * events can arrive from QEMU, causing
>> +         * qemuProcessHandleMigrationStatus() to drop them. In that case
>> +         * QEMU won't send any more events and the wait loop would block
>> +         * forever. */
>> +        if (qemuMonitorGetMigrationStats(priv->mon, &stats, NULL) == 0)
>> +            privJob->stats.mig.status = stats.status;
>> +    }
>> +
>>      qemuDomainObjExitMonitor(vm);
>>
>>      if (rc < 0)
>
> This is a wrong place to fix the issue. The qemuProcessRecoverMigration
> is already checking the current migration state and passes it to
> qemuProcessRecoverMigrationOut as migStatus. The state just needs to be
> passed down to qemuMigrationSrcCancelUnattended which would then skip
> the qemuMigrationSrcCancel call (and
> qemuDomainObjRestoreAsyncJob/virDomainObjEndAsyncJob too) if migStatus
> is VIR_DOMAIN_JOB_STATUS_CANCELED.
>
> Jirka

It looks to me like this approach would still leave the race window
open.

If the previous libvirtd sent migrate_cancel and died while QEMU was
still processing it, we could reconnect and see a non-final migration
state. That state is not VIR_DOMAIN_JOB_STATUS_CANCELED, so with your
proposal we would still take the cancel path. Meanwhile QEMU finishes
cancelling the migration, and the event reaches libvirt before
qemuDomainObjRestoreAsyncJob is called. That event is dropped, and the
wait loop in qemuMigrationSrcCancel blocks forever, exactly as
described in the original patch.

At least this is my understanding of the situation.
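Below is a deliberately simplified, single-threaded C model of that sequence. All names in it (mig_status, model, migration_event(), snapshot_check_would_hang()) are hypothetical stand-ins that only mimic the behaviour described in this thread. They are not libvirt code. The model assumes QEMU completes the in-flight cancel before the async job is restored:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the reconnect race; none of these
 * names are libvirt APIs. */
typedef enum { MIG_ACTIVE, MIG_CANCELLING, MIG_CANCELLED } mig_status;

typedef struct {
    mig_status qemu;  /* migration state inside QEMU */
    mig_status job;   /* state recorded in libvirt's job object */
    bool async_job;   /* false until the async job is restored */
} model;

/* Stand-in for the event handler: while no async job is set the event
 * is dropped, as described for qemuProcessHandleMigrationStatus(). */
static void migration_event(model *m, mig_status s)
{
    m->qemu = s;
    if (m->async_job)
        m->job = s;
}

/* Models "skip the cancel if recovery already saw CANCELED".
 * 'at_reconnect' is the state QEMU reports when the new libvirtd
 * reconnects after the old one sent migrate_cancel and died.
 * Returns true if the cancel path would wait for an event that
 * never comes. */
static bool snapshot_check_would_hang(mig_status at_reconnect)
{
    /* The model only covers a pending or an already finished cancel. */
    assert(at_reconnect != MIG_ACTIVE);

    model m = { .qemu = at_reconnect, .job = at_reconnect, .async_job = false };

    /* Recovery inspects the state once and skips the cancel if done. */
    if (m.qemu == MIG_CANCELLED)
        return false;

    /* Assumption: QEMU finishes the in-flight cancel before the async
     * job is restored, so the final event is dropped. */
    migration_event(&m, MIG_CANCELLED);
    m.async_job = true;  /* qemuDomainObjRestoreAsyncJob() equivalent */

    /* The wait loop only exits once the job shows a terminal state. */
    return m.job != MIG_CANCELLED;
}
```

In this model, snapshot_check_would_hang(MIG_CANCELLING) returns true, while snapshot_check_would_hang(MIG_CANCELLED) returns false because the cancel is skipped.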

I agree with you that the original approach is ugly and that we should
not reach into the job objects directly. What about simply calling
qemuMigrationAnyRefreshStatus() before the actual cancel?
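Here is a rough sketch of the refresh-before-wait idea. It uses plain POSIX threads instead of libvirt's virCond/qemuDomainObjWait, and refresh_status() stands in for qemuMigrationAnyRefreshStatus(). All names (mig_state, job_obj, refresh_status(), wait_for_terminal(), cancel_and_wait()) are hypothetical. The wait uses pthread_cond_timedwait() with a short deadline only so that the "would block forever" case can be detected. The sketch assumes, as in the unattended-cancel path, that the refresh runs after the async job has been restored, so any event arriving later is recorded normally:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical sketch, not libvirt code. */
typedef enum { ST_ACTIVE, ST_CANCELLING, ST_CANCELLED } mig_state;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    mig_state status;  /* migration state as recorded in the job */
} job_obj;

static bool is_terminal(mig_state s)
{
    return s == ST_CANCELLED;
}

/* Stand-in for qemuMigrationAnyRefreshStatus(): store the state the
 * monitor reports (passed in here) in the job and wake any waiter. */
static void refresh_status(job_obj *job, mig_state reported)
{
    pthread_mutex_lock(&job->lock);
    job->status = reported;
    pthread_cond_broadcast(&job->cond);
    pthread_mutex_unlock(&job->lock);
}

/* Checks the predicate before every wait, roughly like the wait loop in
 * qemuMigrationSrcCancel. The deadline exists only to detect the hang in
 * this sketch. Returns true if it passed without a terminal state. */
static bool wait_for_terminal(job_obj *job, long timeout_ms)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    bool timed_out = false;
    pthread_mutex_lock(&job->lock);
    while (!is_terminal(job->status)) {
        int rc = pthread_cond_timedwait(&job->cond, &job->lock, &deadline);
        if (rc == ETIMEDOUT) {
            timed_out = !is_terminal(job->status);
            break;
        }
        assert(rc == 0);  /* spurious wakeups just re-check the loop condition */
    }
    pthread_mutex_unlock(&job->lock);
    return timed_out;
}

/* The job starts with a stale CANCELLING state because the final event
 * was dropped, while QEMU already reports CANCELLED. Returns true if
 * the wait loop would have blocked. */
static bool cancel_and_wait(bool refresh_first)
{
    job_obj job;
    pthread_mutex_init(&job.lock, NULL);
    pthread_cond_init(&job.cond, NULL);
    job.status = ST_CANCELLING;

    if (refresh_first)
        refresh_status(&job, ST_CANCELLED);

    /* migrate_cancel would be sent here; QEMU emits no further event,
     * so nothing ever signals job.cond. */
    bool blocked = wait_for_terminal(&job, 100);

    pthread_cond_destroy(&job.cond);
    pthread_mutex_destroy(&job.lock);
    return blocked;
}
```

cancel_and_wait(true) returns false without waiting. cancel_and_wait(false) returns true once the 100 ms deadline passes. With an untimed wait, that second case would never return.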

I am sending this as v2 now.

Den