On Thu, Sep 01, 2022 at 14:47:41 +0200, Jiri Denemark wrote:
We have always considered "migrate_cancel" QMP command to
return after
successfully cancelling the migration. But this is no longer true (to be
honest I'm not sure it ever was) as it just changes the migration state
to "cancelling". In most cases the migration is canceled pretty quickly
and we don't really notice anything, but sometimes it takes so long we
even get to clearing migration capabilities before the migration is
actually canceled, which fails as capabilities can only be changed when
no migration is running. So to avoid this issue, we can wait for the
migration to be really canceled after sending migrate_cancel. The only
place where we don't need synchronous behavior is when we're cancelling
migration on user's request while it is actively watched by another
thread.
https://bugzilla.redhat.com/show_bug.cgi?id=2114866
Signed-off-by: Jiri Denemark <jdenemar(a)redhat.com>
---
src/qemu/qemu_driver.c | 2 +-
src/qemu/qemu_migration.c | 45 +++++++++++++++++++++++++++++++++++----
src/qemu/qemu_migration.h | 3 ++-
src/qemu/qemu_process.c | 2 +-
4 files changed, 45 insertions(+), 7 deletions(-)
[...]
int
qemuMigrationSrcCancel(virDomainObj *vm,
- virDomainAsyncJob asyncJob)
+ virDomainAsyncJob asyncJob,
+ bool wait)
{
qemuDomainObjPrivate *priv = vm->privateData;
@@ -4625,6 +4653,15 @@ qemuMigrationSrcCancel(virDomainObj *vm,
qemuMonitorMigrateCancel(priv->mon);
qemuDomainObjExitMonitor(vm);
+ if (virDomainObjIsActive(vm) && wait) {
Is the call to virDomainObjIsActive() necessary here? IIUC the domain
shutdown code is always executed in a way to make sure that waiting
threads are always woken.
+ VIR_DEBUG("Waiting for migration to be
canceled");
+
+ while (!qemuMigrationSrcIsCanceled(vm)) {
+ if (qemuDomainObjWait(vm) < 0)
So here if the VM would crash before we wait we'd report success and if
it crashed during our wait we'll report failure, which seems weird too.
+ return -1;
+ }
The rest of the patch looks okay.