
On Jun 18, 2024, at 11:14 AM, Daniel P. Berrangé <berrange@redhat.com> wrote:
!-------------------------------------------------------------------| CAUTION: External Email
|-------------------------------------------------------------------!
On Tue, Jun 18, 2024 at 08:06:06AM -0700, Jon Kohler wrote:
Add plumbing for QEMU's switchover-ack migration capability, which helps lower the downtime during VFIO migrations. This capability is enabled by default as long as both the source and destination support it.
Note: switchover-ack depends on the return path capability, so this may not be used when VIR_MIGRATE_TUNNELLED flag is set.
Extensive details about the qemu switchover-ack implementation are available in the qemu series v6 cover letter [1] where the highlight is the extreme reduction in guest visible downtime. In addition to the original test results below, I saw a roughly ~20% reduction in downtime for VFIO VGPU devices at minimum.
=== Test results ===
The below table shows the downtime of two identical migrations. In the first migration swithcover ack is disabled and in the second it is enabled. The migrated VM is assigned with a mlx5 VFIO device which has 300MB of device data to be migrated.
+----------------------+-----------------------+----------+ | Switchover ack | VFIO device data size | Downtime | +----------------------+-----------------------+----------+ | Disabled | 300MB | 1900ms | | Enabled | 300MB | 420ms | +----------------------+-----------------------+----------+
Switchover ack gives a roughly 4.5 times improvement in downtime. The 1480ms difference is time that is used for resource allocation for the VFIO device in the destination. Without switchover ack, this time is spent when the source VM is stopped and thus the downtime is much higher. With switchover ack, the time is spent when the source VM is still running.
Signed-off-by: Jon Kohler <jon@nutanix.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Avihai Horon <avihaih@nvidia.com> Cc: Markus Armbruster <armbru@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: YangHang Liu <yanghliu@redhat.com> --- include/libvirt/libvirt-domain.h | 11 +++++++++++ src/libvirt-domain.c | 20 ++++++++++++++++++++ src/qemu/qemu_migration.h | 1 + src/qemu/qemu_migration_params.c | 8 +++++++- src/qemu/qemu_migration_params.h | 1 + 5 files changed, 40 insertions(+), 1 deletion(-)
diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h index 2f5b01bbfe..9543629f30 100644 --- a/include/libvirt/libvirt-domain.h +++ b/include/libvirt/libvirt-domain.h @@ -1100,6 +1100,17 @@ typedef enum { * Since: 8.5.0 */ VIR_MIGRATE_ZEROCOPY = (1 << 20), + + /* Use switchover ack migration capability to reduce downtime on VFIO + * device migration. This prevents the source from stopping the VM and + * completing the migration until an ACK is received from the destination + * that it's OK to do so. Thus, a VFIO device can make sure that its + * initial bytes were sent and loaded in the destination before the + * source VM is stopped. + * + * Since: 10.5.0 + */ + VIR_MIGRATE_SWITCHOVER_ACK = (1 << 21), } virDomainMigrateFlags;
Do we really need a flag for this ? Is there a credible scenario in which this flag works, and yet shouldn't be used by libvirt ?
IOW, can we just "do the right thing" and always enable this, except for TUNNELLED mode.
The flag makes the error handling in src/libvirt-domain.c a bit smoother, but if we dropped that, we could simply depend on the “party” logic in qemuMigrationParamsFlagMap to do the right thing, which as far as I can tell does exactly what you’re talking about. In short, if we are OK with dropping the error handling given by VIR_EXCLUSIVE_FLAGS_GOTO(), then we could simplify this. Happy to go either way with this, let me know what you think about the error handling comments above. Thanks for the quick review - Jon
With regards, Daniel -- |: https://urldefense.proofpoint.com/v2/url?u=https-3A__berrange.com&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=IK96YahScD6JLFH7zWWs9JorFqX8nJEfoxk6A_fdNOKH6x1TW4o3ACyknvLMQN-K&s=dejjCPlhp3d29jOJm8xIAoRxolMcjVexK07G3qdQt5A&e= -o- https://urldefense.proofpoint.com/v2/url?u=https-3A__www.flickr.com_photos_dberrange&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=IK96YahScD6JLFH7zWWs9JorFqX8nJEfoxk6A_fdNOKH6x1TW4o3ACyknvLMQN-K&s=y0K6Ram4tL7Uu74R-yqo-gibJM9XxrdVy5eB8D8DGCQ&e= :| |: https://urldefense.proofpoint.com/v2/url?u=https-3A__libvirt.org&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=IK96YahScD6JLFH7zWWs9JorFqX8nJEfoxk6A_fdNOKH6x1TW4o3ACyknvLMQN-K&s=o2SVP1vo3ALW1s_3cY_vd2DKCHcbK-zMR9Mpz6X5NzY&e= -o- https://urldefense.proofpoint.com/v2/url?u=https-3A__fstop138.berrange.com&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=IK96YahScD6JLFH7zWWs9JorFqX8nJEfoxk6A_fdNOKH6x1TW4o3ACyknvLMQN-K&s=Xn9lrZVPIT-j2dxLgZ44mqA2eOqDAZ5tIInN9zNMNT4&e= :| |: https://urldefense.proofpoint.com/v2/url?u=https-3A__entangle-2Dphoto.org&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=IK96YahScD6JLFH7zWWs9JorFqX8nJEfoxk6A_fdNOKH6x1TW4o3ACyknvLMQN-K&s=gR-qzVxTnd6m6kXnUhxi0xNjkEphbpwc6SdNMmhBb_s&e= -o- https://urldefense.proofpoint.com/v2/url?u=https-3A__www.instagram.com_dberrange&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=NGPRGGo37mQiSXgHKm5rCQ&m=IK96YahScD6JLFH7zWWs9JorFqX8nJEfoxk6A_fdNOKH6x1TW4o3ACyknvLMQN-K&s=SbmO2KPGog2k1fYbN30msRFiwCUndhS_KyizZSGB4mI&e= :|