On Wed, Jun 01, 2022 at 14:49:03 +0200, Jiri Denemark wrote:
This new "post-copy failed" reason for the running state will be used on
the destination host when post-copy migration fails while the domain is
already running there.
Signed-off-by: Jiri Denemark <jdenemar(a)redhat.com>
---
Notes:
Version 2:
- documented both VIR_DOMAIN_RUNNING_POSTCOPY_FAILED and
VIR_DOMAIN_PAUSED_POSTCOPY_FAILED possibilities on the destination
examples/c/misc/event-test.c | 3 +++
include/libvirt/libvirt-domain.h | 2 ++
src/conf/domain_conf.c | 1 +
src/libvirt-domain.c | 26 +++++++++++++++++++-------
src/qemu/qemu_domain.c | 3 +++
tools/virsh-domain-event.c | 3 ++-
tools/virsh-domain-monitor.c | 1 +
7 files changed, 31 insertions(+), 8 deletions(-)
[...]
diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c
index e3ced700b8..b9f1d73d5a 100644
--- a/src/libvirt-domain.c
+++ b/src/libvirt-domain.c
@@ -9764,10 +9764,16 @@ virDomainMigrateGetMaxSpeed(virDomainPtr domain,
* at most once no matter how fast it changes. On the other hand once the
* guest is running on the destination host, the migration can no longer be
* rolled back because none of the hosts has complete state. If this happens,
- * libvirt will leave the domain paused on both hosts with
- * VIR_DOMAIN_PAUSED_POSTCOPY_FAILED reason. It's up to the upper layer to
- * decide what to do in such case. Because of this, libvirt will refuse to
- * cancel post-copy migration via virDomainAbortJob.
+ * libvirt will leave the domain paused on the source host with
+ * VIR_DOMAIN_PAUSED_POSTCOPY_FAILED reason. The domain on the destination host
+ * will either remain running with VIR_DOMAIN_RUNNING_POSTCOPY_FAILED reason if
+ * libvirt loses control over the migration (e.g., the daemon is restarted or
+ * libvirt connection is broken) while QEMU is still able to continue migrating
+ * memory pages from the source to the destination or it will be paused with
+ * VIR_DOMAIN_PAUSED_POSTCOPY_FAILED if even the connection between QEMU
+ * processes gets broken. It's up to the upper layer to decide what to do in
I presume this bit is still up for discussion, right? Currently, with the
RFC patch 81 you'd attempt to pause it, but QEMU will break anyway IIUC.
If that is the case, this should document that behaviour properly for now.
+ * such case. Because of this, libvirt will refuse to cancel post-copy
+ * migration via virDomainAbortJob.
*
* The following domain life cycle events are emitted during post-copy
* migration:
@@ -9781,9 +9787,15 @@ virDomainMigrateGetMaxSpeed(virDomainPtr domain,
* VIR_DOMAIN_EVENT_RESUMED_MIGRATED (on the destination),
* VIR_DOMAIN_EVENT_STOPPED_MIGRATED (on the source) -- migration finished
* successfully and the destination host holds a complete guest state.
- * VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY_FAILED (on the destination) -- emitted
- * when migration fails in post-copy mode and it's unclear whether any
- * of the hosts has a complete guest state.
+ * VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY_FAILED (on the source),
+ * VIR_DOMAIN_EVENT_RESUMED_POSTCOPY_FAILED (on the destination) -- emitted
+ * when migration fails in post-copy mode from libvirt's point of view
+ * and it's unclear whether any of the hosts has a complete guest state.
+ * This happens when libvirt loses control over the migration. Virtual
+ * CPUs on the destination are still running.
+ * VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY_FAILED (on the destination) -- QEMU is
+ * not able to keep migration running in post-copy mode (i.e., its
+ * connection is broken) and libvirt stops virtual CPUs on the destination.
Ditto.
*
* The progress of a post-copy migration can be monitored normally using
* virDomainGetJobStats on the source host. Fetching statistics of a completed