The logic of this commit looks correct to me. I have tested it on an NVIDIA RTX A5000, and it works well!
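
For reference, the new condition in the patch below rejects a pair only when the two subsys types differ *and* at least one of them is neither pci nor mdev. By De Morgan's law, that is the same as accepting when the types are identical, or when both are pci/mdev. Below is a minimal standalone sketch of that predicate. It is not libvirt code: the enum and function names are hypothetical stand-ins, and libvirt's real type is virDomainHostdevSubsysType.

```c
#include <stdbool.h>

/* Hypothetical stand-in for libvirt's virDomainHostdevSubsysType;
 * not the real libvirt enum. */
typedef enum {
    SUBSYS_TYPE_USB,
    SUBSYS_TYPE_PCI,
    SUBSYS_TYPE_SCSI,
    SUBSYS_TYPE_SCSI_HOST,
    SUBSYS_TYPE_MDEV,
} SubsysType;

/* pci (e.g. an SR-IOV VF vGPU) and mdev are the two types that may
 * back a vGPU, so they are treated as mutually compatible. */
static bool
isVgpuType(SubsysType t)
{
    return t == SUBSYS_TYPE_PCI || t == SUBSYS_TYPE_MDEV;
}

/* Returns true when the patched check would let the ABI stability
 * check continue, false when it would report an error. */
static bool
subsysTypesCompatible(SubsysType src, SubsysType dst)
{
    return src == dst || (isVgpuType(src) && isVgpuType(dst));
}
```

With this, pci to mdev and mdev to pci pass, usb to usb still passes, and mismatches such as pci to usb are still rejected.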

Tested-by: Zhiyi Guo <zhguo@redhat.com>
Reviewed-by: Zhiyi Guo <zhguo@redhat.com>

Regards,
Zhiyi

On Wed, Jan 8, 2025 at 1:44 PM Laine Stump <laine@redhat.com> wrote:
Ping. It would be nice to get this into the upcoming release.


On 12/13/24 1:07 PM, Laine Stump wrote:
> GPU vendors are moving away from using mdev to create virtual GPUs
> and towards using SR-IOV VFs that are vGPUs. In both cases, once
> created, the vGPUs are assigned to guests via <hostdev> (i.e. VFIO
> device assignment), and inside the guest the devices look identical,
> but mdev vGPUs are located by QEMU/VFIO using a UUID, while VF vGPUs
> are located with a PCI address. So although we generally require the
> device on the source host to exactly match the device on the
> destination host, migration between an mdev-created vGPU and a VF
> vGPU *can* potentially work, except that libvirt has a hard-coded
> check that prevents us from even trying.
>
> This patch loosens up that check so that we will allow attempts to
> migrate a guest from a source host that has mdev-created vGPUs to a
> destination host that has VF vGPUs (and vice versa). The expectation
> is that if this doesn't actually work then QEMU will fail and generate
> an error that we can report.
>
> Based-on-patch-by: Zhiyi Guo <zhguo@redhat.com>
> Signed-off-by: Laine Stump <laine@redhat.com>
> ---
>
> Zhiyi's original patch removed the check for subsys type completely,
> and this worked. My modified patch keeps the check in place, but
> allows it to pass if the src type is pci and dst is mdev, or vice
> versa.
>
>   src/conf/domain_conf.c | 28 +++++++++++++++++++++-------
>   1 file changed, 21 insertions(+), 7 deletions(-)
>
> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
> index 4ad8289b89..9d5fda0469 100644
> --- a/src/conf/domain_conf.c
> +++ b/src/conf/domain_conf.c
> @@ -20647,13 +20647,27 @@ virDomainHostdevDefCheckABIStability(virDomainHostdevDef *src,
>           return false;
>       }
>   
> -    if (src->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS &&
> -        src->source.subsys.type != dst->source.subsys.type) {
> -        virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
> -                       _("Target host device subsystem %1$s does not match source %2$s"),
> -                       virDomainHostdevSubsysTypeToString(dst->source.subsys.type),
> -                       virDomainHostdevSubsysTypeToString(src->source.subsys.type));
> -        return false;
> +    if (src->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS) {
> +        virDomainHostdevSubsysType srcType = src->source.subsys.type;
> +        virDomainHostdevSubsysType dstType = dst->source.subsys.type;
> +
> +        /* If the source and destination subsys types aren't the same,
> +         * then migration can't be supported, *except* that it might
> +         * be supported to migrate from subsys type 'pci' to 'mdev'
> +         * and vice versa. (libvirt can't know for certain whether or
> +         * not it will actually work, so we have to just allow it and
> +         * count on QEMU to provide us with an error if it fails)
> +         */
> +
> +        if (srcType != dstType
> +            && ((srcType != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI && srcType != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_MDEV)
> +                || (dstType != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI && dstType != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_MDEV))) {
> +            virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
> +                           _("Target host device subsystem type %1$s is not compatible with source subsystem type %2$s"),
> +                           virDomainHostdevSubsysTypeToString(dstType),
> +                           virDomainHostdevSubsysTypeToString(srcType));
> +            return false;
> +        }
>       }
>   
>       if (!virDomainDeviceInfoCheckABIStability(src->info, dst->info))