On Wed, Feb 12, 2025 at 13:33:46 +0100, Peter Krempa wrote:
On incoming migration qemu doesn't activate the block graph nodes right
away. This is done so that locking of the images can be handed over properly.
The block nodes are normally re-activated when starting the CPUs after
migration, but when the VM is left paused (e.g. because a paused VM was
migrated) qemu does not re-activate them.
This means that any blockjob which needs to write to an existing backing
chain member would fail. Read-only jobs would generally succeed with older
qemu versions, but even that was not intended. With new qemu you'll instead
always get an error when attempting to access an inactive node:
error: internal error: unable to execute QEMU command 'blockdev-mirror':
Inactive 'libvirt-1-storage' can't be a backing child of active
'#block052'
This is the case for explicit blockjobs (virsh blockcopy) but also for
non-shared-storage migration (virsh migrate --copy-storage-all).
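For illustration, starting a copy job on such a paused, freshly migrated VM
fails with the error above (hypothetical domain, disk and path names):

  $ virsh blockcopy migguest vda /var/lib/libvirt/images/copy.img --transient-job
  error: internal error: unable to execute QEMU command 'blockdev-mirror': ...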
Since qemu now provides the 'blockdev-set-active' QMP command, which can
re-activate the nodes on demand, we can re-activate them in the same
situations in which we'd be starting the vCPUs if the VM weren't left paused.
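On the wire this amounts to a single QMP command; a minimal sketch, assuming
the QAPI schema where omitting the optional 'node-name' argument applies the
change to the whole block graph:

  -> { "execute": "blockdev-set-active", "arguments": { "active": true } }
  <- { "return": {} }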
The only exception is the source side of a failed post-copy migration: the VM
has already run on the destination, so it will never run on the source again
even when the migration is recovered.
Resolves:
https://issues.redhat.com/browse/RHEL-78398
Signed-off-by: Peter Krempa <pkrempa(a)redhat.com>
---
v2:
- qemuMigrationBlockNodesReactivate now preserves previous errors
- qemuMigrationSrcRestoreDomainState:
  - separated post-copy failure state to avoid reactivation
  - moved the reactivation under a label
  - reactivate also when resuming the CPUs fails
src/qemu/qemu_migration.c | 56 ++++++++++++++++++++++++++++++++++++---
1 file changed, 52 insertions(+), 4 deletions(-)
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index 53bbbee629..b36d779657 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -220,6 +220,43 @@ qemuMigrationSrcStoreDomainState(virDomainObj *vm)
 }
+/**
+ * qemuMigrationBlockNodesReactivate:
+ *
+ * In case when we're keeping the VM paused qemu will not re-activate the block
+ * device backend tree so blockjobs would fail. In case when qemu supports the
+ * 'blockdev-set-active' command this function will re-activate the block
+ * nodes.
+ */
+static void
+qemuMigrationBlockNodesReactivate(virDomainObj *vm,
+                                  virDomainAsyncJob asyncJob)
+{
+    virErrorPtr orig_err;
+    qemuDomainObjPrivate *priv = vm->privateData;
+    int rc;
+
+    if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_BLOCKDEV_SET_ACTIVE))
+        return;
+
+    VIR_DEBUG("re-activating block nodes");
+
+    virErrorPreserveLast(&orig_err);
+
+    if (qemuDomainObjEnterMonitorAsync(vm, asyncJob) < 0)
+        goto cleanup;
+
+    rc = qemuMonitorBlockdevSetActive(priv->mon, NULL, true);
+
+    qemuDomainObjExitMonitor(vm);
+
+    if (rc < 0)
+        VIR_WARN("failed to re-activate block nodes after migration of VM '%s'",
+                 vm->def->name);
+
+ cleanup:
+    virErrorRestore(&orig_err);
+}
+
+
 static void
 qemuMigrationSrcRestoreDomainState(virQEMUDriver *driver, virDomainObj *vm)
 {
@@ -236,14 +273,16 @@ qemuMigrationSrcRestoreDomainState(virQEMUDriver *driver, virDomainObj *vm)
               virDomainStateTypeToString(state),
               virDomainStateReasonToString(state, reason));
-    if (preMigrationState != VIR_DOMAIN_RUNNING ||
-        state != VIR_DOMAIN_PAUSED ||
-        reason == VIR_DOMAIN_PAUSED_POSTCOPY_FAILED)
+    if (reason == VIR_DOMAIN_PAUSED_POSTCOPY_FAILED)

This needs to be

    state == VIR_DOMAIN_PAUSED && reason == VIR_DOMAIN_PAUSED_POSTCOPY_FAILED

as otherwise you could be comparing completely unrelated numbers.

         return;

+    if (preMigrationState != VIR_DOMAIN_RUNNING ||
+        state != VIR_DOMAIN_PAUSED)
+        goto reactivate;
+
     if (reason == VIR_DOMAIN_PAUSED_IOERROR) {
         VIR_DEBUG("Domain is paused due to I/O error, skipping resume");
-        return;
+        goto reactivate;
     }

     VIR_DEBUG("Restoring pre-migration state due to migration error");
Reviewed-by: Jiri Denemark <jdenemar(a)redhat.com>