On Fri, Mar 13, 2026 at 12:55:18 +0100, Denis V. Lunev wrote:
When libvirtd reconnects to a running QEMU process that had an in-progress migration, qemuProcessReconnect first connects the monitor and only later recovers the migration job. During this window the async job is VIR_ASYNC_JOB_NONE, so any MIGRATION status events from QEMU are silently dropped by qemuProcessHandleMigrationStatus.
If the migration was already cancelled or completed by QEMU during this window, no further events will be emitted. When qemuMigrationSrcCancelUnattended later restores the async job and calls qemuMigrationSrcCancel with wait=true, the wait loop calls qemuDomainObjWait (virCondWait with no timeout) and blocks forever waiting for an event that will never arrive.
Fix this by querying QEMU migration status with query-migrate immediately after sending migrate_cancel, while still inside the monitor session. This ensures the job's migration status is up to date before entering the wait loop, so if QEMU already reached a terminal state (cancelled/completed/error), the loop exits immediately.
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Peter Krempa <pkrempa@redhat.com>
CC: Michal Privoznik <mprivozn@redhat.com>
---
 src/qemu/qemu_migration.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index fec808ccfb..3a9185f65c 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -4876,6 +4876,21 @@ qemuMigrationSrcCancel(virDomainObj *vm,
         return -1;
 
     rc = qemuMonitorMigrateCancel(priv->mon);
+
+    if (rc == 0 && wait) {
+        virDomainJobData *jobData = vm->job->current;
+        qemuDomainJobDataPrivate *privJob = jobData->privateData;
+        qemuMonitorMigrationStats stats;
+
+        /* During reconnect the async job is not yet restored when migration
+         * events can arrive from QEMU, causing
+         * qemuProcessHandleMigrationStatus() to drop them. In that case
+         * QEMU won't send any more events and the wait loop would block
+         * forever. */
+        if (qemuMonitorGetMigrationStats(priv->mon, &stats, NULL) == 0)
+            privJob->stats.mig.status = stats.status;
+    }
+
     qemuDomainObjExitMonitor(vm);
 
     if (rc < 0)
This is the wrong place to fix the issue. qemuProcessRecoverMigration already checks the current migration state and passes it to qemuProcessRecoverMigrationOut as migStatus. The state just needs to be passed down to qemuMigrationSrcCancelUnattended, which would then skip the qemuMigrationSrcCancel call (and qemuDomainObjRestoreAsyncJob/virDomainObjEndAsyncJob too) if migStatus is VIR_DOMAIN_JOB_STATUS_CANCELED.

Jirka
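
To make the control flow of the suggestion above concrete, here is a minimal, self-contained C model. It is a sketch under stated assumptions, not libvirt code: every type and function name in it (ModelJobStatus, ModelCalls, model_cancel_and_wait, model_cancel_unattended) is a hypothetical stand-in, and the real signatures of qemuMigrationSrcCancelUnattended and its callees are not reproduced. It only shows the idea that the status already obtained during recovery decides whether the restore/cancel/end sequence runs at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the job status values named in the thread. */
typedef enum {
    MODEL_JOB_STATUS_ACTIVE,
    MODEL_JOB_STATUS_CANCELED,
} ModelJobStatus;

/* Counters so a caller can observe which steps actually ran. */
typedef struct {
    int restore_async_job;
    int cancel_and_wait;
    int end_async_job;
} ModelCalls;

/* Stands in for the cancel-and-wait step. In the scenario from the thread,
 * this is the step that could block forever once QEMU has no further
 * events to send. */
static void model_cancel_and_wait(ModelCalls *calls, ModelJobStatus status)
{
    /* The early return in the caller must keep cancelled migrations out. */
    assert(status != MODEL_JOB_STATUS_CANCELED);
    calls->cancel_and_wait++;
}

/* Model of the unattended-cancel path once it receives migStatus from the
 * recovery code. Returns true if a cancel was actually issued. */
static bool model_cancel_unattended(ModelCalls *calls, ModelJobStatus migStatus)
{
    /* Migration is already cancelled: skip restore, cancel, and end. */
    if (migStatus == MODEL_JOB_STATUS_CANCELED)
        return false;

    calls->restore_async_job++;
    model_cancel_and_wait(calls, migStatus);
    calls->end_async_job++;
    return true;
}
```

In this model an already-cancelled migration never reaches the waiting step, so there is no event to miss; a still-active migration goes through the restore, cancel, and end steps in order.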