On Jun 18, 2024, at 11:14 AM, Daniel P. Berrangé
<berrange(a)redhat.com> wrote:
!-------------------------------------------------------------------|
CAUTION: External Email
|-------------------------------------------------------------------!
On Tue, Jun 18, 2024 at 08:06:06AM -0700, Jon Kohler wrote:
> Add plumbing for QEMU's switchover-ack migration capability, which
> helps lower the downtime during VFIO migrations. This capability is
> enabled by default as long as both the source and destination support
> it.
>
> Note: switchover-ack depends on the return path capability, so this may
> not be used when VIR_MIGRATE_TUNNELLED flag is set.
>
> Extensive details about the qemu switchover-ack implementation are
> available in the qemu series v6 cover letter [1] where the highlight is
> the extreme reduction in guest visible downtime. In addition to the
> original test results below, I saw a roughly ~20% reduction in downtime
> for VFIO VGPU devices at minimum.
>
> === Test results ===
>
> The below table shows the downtime of two identical migrations. In the
> first migration swithcover ack is disabled and in the second it is
> enabled. The migrated VM is assigned with a mlx5 VFIO device which has
> 300MB of device data to be migrated.
>
> +----------------------+-----------------------+----------+
> | Switchover ack | VFIO device data size | Downtime |
> +----------------------+-----------------------+----------+
> | Disabled | 300MB | 1900ms |
> | Enabled | 300MB | 420ms |
> +----------------------+-----------------------+----------+
>
> Switchover ack gives a roughly 4.5 times improvement in downtime.
> The 1480ms difference is time that is used for resource allocation for
> the VFIO device in the destination. Without switchover ack, this time is
> spent when the source VM is stopped and thus the downtime is much
> higher. With switchover ack, the time is spent when the source VM is
> still running.
>
> [1]
https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel.org...
>
> Signed-off-by: Jon Kohler <jon(a)nutanix.com>
> Cc: Alex Williamson <alex.williamson(a)redhat.com>
> Cc: Avihai Horon <avihaih(a)nvidia.com>
> Cc: Markus Armbruster <armbru(a)redhat.com>
> Cc: Peter Xu <peterx(a)redhat.com>
> Cc: YangHang Liu <yanghliu(a)redhat.com>
> ---
> include/libvirt/libvirt-domain.h | 11 +++++++++++
> src/libvirt-domain.c | 20 ++++++++++++++++++++
> src/qemu/qemu_migration.h | 1 +
> src/qemu/qemu_migration_params.c | 8 +++++++-
> src/qemu/qemu_migration_params.h | 1 +
> 5 files changed, 40 insertions(+), 1 deletion(-)
>
> diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h
> index 2f5b01bbfe..9543629f30 100644
> --- a/include/libvirt/libvirt-domain.h
> +++ b/include/libvirt/libvirt-domain.h
> @@ -1100,6 +1100,17 @@ typedef enum {
> * Since: 8.5.0
> */
> VIR_MIGRATE_ZEROCOPY = (1 << 20),
> +
> + /* Use switchover ack migration capability to reduce downtime on VFIO
> + * device migration. This prevents the source from stopping the VM and
> + * completing the migration until an ACK is received from the destination
> + * that it's OK to do so. Thus, a VFIO device can make sure that its
> + * initial bytes were sent and loaded in the destination before the
> + * source VM is stopped.
> + *
> + * Since: 10.5.0
> + */
> + VIR_MIGRATE_SWITCHOVER_ACK = (1 << 21),
> } virDomainMigrateFlags;
Do we really need a flag for this ? Is there a credible scenario
in which this flag works, and yet shouldn't be used by libvirt ?
IOW, can we just "do the right thing" and always enable this,
except for TUNNELLED mode.
The flag makes the error handling in src/libvirt-domain.c a bit
smoother, but if we dropped that, we could simply depend on
the “party” logic in qemuMigrationParamsFlagMap to do the
right thing, which as far as I can tell does exactly what you’re
talking about.
In short, if we are OK with dropping the error handling given
by VIR_EXCLUSIVE_FLAGS_GOTO(), then we could
simplify this.
Happy to go either way with this, let me know what you think
about the error handling comments above.
Thanks for the quick review -
Jon