The logic of this commit looks correct to me. I have tested it on an
NVIDIA RTX A5000 and it works well!
Tested-by: Zhiyi Guo <zhguo(a)redhat.com>
Reviewed-by: Zhiyi Guo <zhguo(a)redhat.com>
Regards,
Zhiyi
On Wed, Jan 8, 2025 at 1:44 PM Laine Stump <laine(a)redhat.com> wrote:
Ping. It would be nice to get this into the upcoming release.
On 12/13/24 1:07 PM, Laine Stump wrote:
> GPU vendors are moving away from using mdev to create virtual GPUs
> towards using SRIOV VFs that are vGPUs. In both cases, once created
> the vGPUs are assigned to guests via <hostdev> (i.e. VFIO device
> assignment), and inside the guest the devices look identical, but mdev
> vGPUs are located by QEMU/VFIO using a uuid, while VF vGPUs are
> located with a PCI address. So although we generally require the
> device on the source host to exactly match the device on the
> destination host, in the case of mdev-created vGPU vs. VF vGPU
> migration *can* potentially work, except that libvirt has a hard-coded
> check that prevents us from even trying.
>
> This patch loosens up that check so that we will allow attempts to
> migrate a guest from a source host that has mdev-created vGPUs to a
> destination host that has VF vGPUs (and vice versa). The expectation
> is that if this doesn't actually work then QEMU will fail and generate
> an error that we can report.
>
> Based-on-patch-by: Zhiyi Guo <zhguo(a)redhat.com>
> Signed-off-by: Laine Stump <laine(a)redhat.com>
> ---
>
> Zhiyi's original patch removed the check for subsys type completely,
> and this worked. My modified patch keeps the check in place, but
> allows it to pass if the src type is pci and dst is mdev, or vice
> versa.
>
> src/conf/domain_conf.c | 28 +++++++++++++++++++++-------
> 1 file changed, 21 insertions(+), 7 deletions(-)
>
> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
> index 4ad8289b89..9d5fda0469 100644
> --- a/src/conf/domain_conf.c
> +++ b/src/conf/domain_conf.c
> @@ -20647,13 +20647,27 @@ virDomainHostdevDefCheckABIStability(virDomainHostdevDef *src,
> return false;
> }
>
> - if (src->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS &&
> - src->source.subsys.type != dst->source.subsys.type) {
> - virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
> - _("Target host device subsystem %1$s does not match source %2$s"),
> - virDomainHostdevSubsysTypeToString(dst->source.subsys.type),
> - virDomainHostdevSubsysTypeToString(src->source.subsys.type));
> - return false;
> + if (src->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS) {
> + virDomainHostdevSubsysType srcType = src->source.subsys.type;
> + virDomainHostdevSubsysType dstType = dst->source.subsys.type;
> +
> + /* If the source and destination subsys types aren't the same,
> + * then migration can't be supported, *except* that it might
> + * be supported to migrate from subsys type 'pci' to 'mdev'
> + * and vice versa. (libvirt can't know for certain whether or
> + * not it will actually work, so we have to just allow it and
> + * count on QEMU to provide us with an error if it fails)
> + */
> +
> + if (srcType != dstType
> + && ((srcType != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI && srcType != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_MDEV)
> + || (dstType != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI && dstType != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_MDEV))) {
> + virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
> + _("Target host device subsystem type %1$s is not compatible with source subsystem type %2$s"),
> + virDomainHostdevSubsysTypeToString(dstType),
> + virDomainHostdevSubsysTypeToString(srcType));
> + return false;
> + }
> }
>
> if (!virDomainDeviceInfoCheckABIStability(src->info, dst->info))
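
For readers following the condition in the hunk above: by De Morgan's
laws, the relaxed check appears to reduce to "the types are equal, or both
are one of pci/mdev". Below is a minimal standalone sketch of that
predicate. The enum SubsysType, its SUBSYS_* values, and the helpers
isPciOrMdev() and subsysTypesCompatible() are hypothetical names made up
for illustration; they are not libvirt identifiers, and the enum is only a
simplified stand-in for virDomainHostdevSubsysType.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for virDomainHostdevSubsysType.
 * The members and their order are illustrative assumptions only. */
typedef enum {
    SUBSYS_USB,
    SUBSYS_PCI,
    SUBSYS_SCSI,
    SUBSYS_MDEV
} SubsysType;

/* True for the two subsys types the patch treats as interchangeable. */
static bool
isPciOrMdev(SubsysType t)
{
    return t == SUBSYS_PCI || t == SUBSYS_MDEV;
}

/* Returns true when the ABI check should let the migration attempt
 * proceed: identical types always pass, and a pci <-> mdev mismatch is
 * allowed through so that QEMU can decide whether it really works.
 * This is the negation of the error condition in the patch. */
static bool
subsysTypesCompatible(SubsysType src, SubsysType dst)
{
    if (src == dst)
        return true;

    return isPciOrMdev(src) && isPciOrMdev(dst);
}
```

Under these assumptions, subsysTypesCompatible(SUBSYS_PCI, SUBSYS_MDEV)
and the reverse both pass, while any mismatch involving another type
(for example usb vs. pci) is still rejected.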