On 3/19/26 11:51, Jiri Denemark wrote:
> On Fri, Mar 13, 2026 at 12:55:18 +0100, Denis V. Lunev wrote:
>> When libvirtd reconnects to a running QEMU process that had an in-progress migration, qemuProcessReconnect first connects the monitor and only later recovers the migration job. During this window the async job is VIR_ASYNC_JOB_NONE, so any MIGRATION status events from QEMU are silently dropped by qemuProcessHandleMigrationStatus.
>>
>> If the migration was already cancelled or completed by QEMU during this window, no further events will be emitted. When qemuMigrationSrcCancelUnattended later restores the async job and calls qemuMigrationSrcCancel with wait=true, the wait loop calls qemuDomainObjWait (virCondWait with no timeout) and blocks forever waiting for an event that will never arrive.
>>
>> Fix this by querying QEMU migration status with query-migrate immediately after sending migrate_cancel, while still inside the monitor session. This ensures the job's migration status is up to date before entering the wait loop, so if QEMU already reached a terminal state (cancelled/completed/error), the loop exits immediately.
>>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Peter Krempa <pkrempa@redhat.com>
>> CC: Michal Privoznik <mprivozn@redhat.com>
>> ---
>>  src/qemu/qemu_migration.c | 15 +++++++++++++++
>>  1 file changed, 15 insertions(+)
>>
>> diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
>> index fec808ccfb..3a9185f65c 100644
>> --- a/src/qemu/qemu_migration.c
>> +++ b/src/qemu/qemu_migration.c
>> @@ -4876,6 +4876,21 @@ qemuMigrationSrcCancel(virDomainObj *vm,
>>          return -1;
>>
>>      rc = qemuMonitorMigrateCancel(priv->mon);
>> +
>> +    if (rc == 0 && wait) {
>> +        virDomainJobData *jobData = vm->job->current;
>> +        qemuDomainJobDataPrivate *privJob = jobData->privateData;
>> +        qemuMonitorMigrationStats stats;
>> +
>> +        /* During reconnect the async job is not yet restored when migration
>> +         * events can arrive from QEMU, causing
>> +         * qemuProcessHandleMigrationStatus() to drop them. In that case
>> +         * QEMU won't send any more events and the wait loop would block
>> +         * forever. */
>> +        if (qemuMonitorGetMigrationStats(priv->mon, &stats, NULL) == 0)
>> +            privJob->stats.mig.status = stats.status;
>> +    }
>> +
>>      qemuDomainObjExitMonitor(vm);
>>
>>      if (rc < 0)
>
> This is a wrong place to fix the issue. The qemuProcessRecoverMigration is already checking the current migration state and passes it to qemuProcessRecoverMigrationOut as migStatus. The state just needs to be passed down to qemuMigrationSrcCancelUnattended which would then skip the qemuMigrationSrcCancel call (and qemuDomainObjRestoreAsyncJob/virDomainObjEndAsyncJob too) if migStatus is VIR_DOMAIN_JOB_STATUS_CANCELED.
>
> Jirka
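
For reference, the control flow of the suggestion above can be sketched roughly as follows. This is only a toy model, not libvirt code: every mock_* name and type is a hypothetical stand-in, and the real signatures of qemuMigrationSrcCancelUnattended and its callers are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not libvirt's real types. */
typedef enum {
    MOCK_JOB_STATUS_ACTIVE,
    MOCK_JOB_STATUS_CANCELED,
} mock_job_status;

typedef struct {
    int restore_calls;  /* times the async job was restored */
    int cancel_calls;   /* times cancel-and-wait was entered */
} mock_domain;

/* Stand-in for restoring the async job on the domain. */
static void mock_restore_async_job(mock_domain *dom)
{
    dom->restore_calls++;
}

/* Stand-in for sending the cancel and waiting for the terminal event. */
static void mock_cancel_and_wait(mock_domain *dom)
{
    dom->cancel_calls++;
}

/* Model of the unattended cancel with the recovered status passed down.
 * Returns true if the cancel path was taken, false if it was skipped. */
bool mock_cancel_unattended(mock_domain *dom, mock_job_status mig_status)
{
    assert(dom != NULL);

    /* Already cancelled at reconnect time: no job restore, no cancel,
     * and therefore no wait for an event that will never arrive. */
    if (mig_status == MOCK_JOB_STATUS_CANCELED)
        return false;

    mock_restore_async_job(dom);
    mock_cancel_and_wait(dom);
    return true;
}
```

In this model a domain recovered with MOCK_JOB_STATUS_CANCELED never enters the wait, while any other status still goes through the cancel path.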
To me it looks like the race window would still be open with this approach. If the previous libvirtd sent migrate_cancel and died while QEMU was still processing it, we could reconnect and see a non-final migration state. That is not VIR_DOMAIN_JOB_STATUS_CANCELED, so with your proposal we would still proceed with the cancel path. But by then the migration has already been cancelled from QEMU's point of view, and the event arrived at libvirt before qemuDomainObjRestoreAsyncJob was called. At least this is my understanding of the situation.

I agree with you that the original approach is ugly and we should not look inside objects. What about just calling qemuMigrationAnyRefreshStatus before the actual cancel? I am sending this as v2 now.

Den
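
The race described above, and the effect of refreshing the status before cancelling, can be modelled with a small self-contained sketch. It is a simplification under stated assumptions: all mock_* names are hypothetical, "would block forever" is represented by a false return value instead of a real condition wait, and nothing here reflects the actual signature or behaviour of qemuMigrationAnyRefreshStatus.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical migration states for illustration only. */
typedef enum {
    MOCK_MIG_CANCELLING,
    MOCK_MIG_CANCELLED,
} mock_mig_status;

typedef struct {
    mock_mig_status qemu_status;    /* what QEMU would report if queried */
    mock_mig_status cached_status;  /* what the job data currently holds */
    bool async_job_set;             /* false during the reconnect window */
} mock_vm;

/* Event handler model: events are dropped while no async job is set. */
static void mock_handle_status_event(mock_vm *vm, mock_mig_status status)
{
    if (!vm->async_job_set)
        return;
    vm->cached_status = status;
}

/* Model of an explicit status query that updates the cached status. */
static void mock_refresh_status(mock_vm *vm)
{
    vm->cached_status = vm->qemu_status;
}

/* Returns true if the wait loop would exit, false if it would block
 * forever because no further event is going to arrive. */
static bool mock_cancel_wait(mock_vm *vm, bool refresh_first)
{
    assert(vm->async_job_set);

    if (refresh_first)
        mock_refresh_status(vm);

    /* Cancel request: QEMU only emits an event if the state changes. */
    if (vm->qemu_status != MOCK_MIG_CANCELLED) {
        vm->qemu_status = MOCK_MIG_CANCELLED;
        mock_handle_status_event(vm, MOCK_MIG_CANCELLED);
    }

    return vm->cached_status == MOCK_MIG_CANCELLED;
}

/* Reconnect scenario: QEMU finishes cancelling before the job is restored. */
bool mock_reconnect_scenario(bool refresh_first)
{
    mock_vm vm = {
        .qemu_status = MOCK_MIG_CANCELLING,
        .cached_status = MOCK_MIG_CANCELLING,
        .async_job_set = false,
    };

    /* The terminal event arrives inside the window and is dropped. */
    vm.qemu_status = MOCK_MIG_CANCELLED;
    mock_handle_status_event(&vm, MOCK_MIG_CANCELLED);

    /* The async job is restored only afterwards. */
    vm.async_job_set = true;

    return mock_cancel_wait(&vm, refresh_first);
}
```

In this model mock_reconnect_scenario(false) reports that the wait would never finish, while mock_reconnect_scenario(true) exits because the refreshed status is already terminal.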