[libvirt PATCH v3 0/8] Support for VFIO variant drivers, Part 1

A "VFIO variant" driver is a kernel driver for a device that supports all the APIs of the basic vfio-pci driver (which enables assigning a host PCI device to a QEMU guest) plus "some extra stuff" (e.g. to enable things like saving/restoring device state in order to support live migration.) Way back last year I posted a couple attempts to support VFIO variant drivers; here is V2 (along with a later followup discussion from a couple months ago): https://listman.redhat.com/archives/libvir-list/2022-August/233661.html https://listman.redhat.com/archives/libvir-list/2023-May/240108.html The mlx5-vfio-pci driver has now been upstream for quite awhile (and even in the downstream Fedora 38 kernel, for example), as are the sysfs bits that allow us to determine whether or not a driver is a VFIO variant, and I've updated the patch(es) to use this. I've also been working on auto-binding to the "best-match" VFIO variant driver based on comparing the device's modalias file in sysfs to the contents of the kernel's modules.alias file, but that isn't quite ready (partly code that isn't yet working, but also partly indecision about exactly where in the XML to put the driver name when it is specified; I won't take up more space here with that though). In the meantime, there are people who want to use the mlx5-vfio-pci driver (and Cedric Le Goater also has written vfio-pci-igbvf and vfio-pci-e1000e drivers (which area very useful for testing), although I don't think he has posted them anywhere yet), so I would like to get the basic patches here merged in upstream now while I continue working on "Part 2". These patches provide two improvements that make testing/using VFIO drivers much more convenient: 1) The specific driver can be given in the virsh nodedev-detach command (or the virNodeDeviceDetachFlags() API call), e.g.: virsh nodedev-detach pci_0000_04_11_5 --driver vfio-pci-igbvf 2) If the <hostdev> (or "<interface> ... <type='hostdev'/>" has "managed='no'", then libvirt will recognize any VFIO variant driver (rather than the current behavior of rejecting anything that isn't exactly "vfio-pci") With these two capabilities, it's simple and straightforward to bind a device to a VFIO variant driver, and then start a guest that uses that device. Change in V2: * complete remake, more refactoring * use existence of "vfio-dev" subdirectory of device directory in sysfs to determine whether the currently-bound driver is a vfio variant. * support binding to a user-specified driver during nodedev-detach, rather than only supporting vfio-pci. Laine Stump (8): util: use "stubDriverType" instead of just "stubDriver" util: add stub driver name to virPCIDevice object util: rename virPCIDeviceGetDriverPathAndName util: permit existing binding to VFIO variant driver util: probe stub driver from within function that binds to stub driver util: honor stubDriverName when probing/binding stub driver for a device node_device: support binding other drivers with virNodeDeviceDetachFlags() qemu: turn two multiline log messages into single line src/hypervisor/domain_driver.c | 9 +- src/hypervisor/domain_driver.h | 2 + src/hypervisor/virhostdev.c | 35 +++----- src/libvirt_private.syms | 9 +- src/libxl/libxl_driver.c | 3 +- src/qemu/qemu_driver.c | 37 ++++---- src/util/virnvme.c | 2 +- src/util/virpci.c | 156 +++++++++++++++++++++++++-------- src/util/virpci.h | 18 ++-- tests/virhostdevtest.c | 2 +- tests/virpcitest.c | 10 +-- 11 files changed, 185 insertions(+), 98 deletions(-) -- 2.41.0

In the past we just kept track of the type of the "stub driver" (the driver that is bound to a device in order to assign it to a guest). The next commit will add a stubDriverName to go along with type, so lets use stubDriverType for the existing enum to make it easier to keep track of whether we're talking about the name or the type. Signed-off-by: Laine Stump <laine@redhat.com> --- src/hypervisor/domain_driver.c | 4 ++-- src/hypervisor/virhostdev.c | 8 ++++---- src/libvirt_private.syms | 4 ++-- src/util/virnvme.c | 2 +- src/util/virpci.c | 16 ++++++++-------- src/util/virpci.h | 6 +++--- tests/virhostdevtest.c | 2 +- tests/virpcitest.c | 8 ++++---- 8 files changed, 25 insertions(+), 25 deletions(-) diff --git a/src/hypervisor/domain_driver.c b/src/hypervisor/domain_driver.c index 66e09ffb04..a70f75f3ae 100644 --- a/src/hypervisor/domain_driver.c +++ b/src/hypervisor/domain_driver.c @@ -505,9 +505,9 @@ virDomainDriverNodeDeviceDetachFlags(virNodeDevicePtr dev, return -1; if (STREQ(driverName, "vfio")) - virPCIDeviceSetStubDriver(pci, VIR_PCI_STUB_DRIVER_VFIO); + virPCIDeviceSetStubDriverType(pci, VIR_PCI_STUB_DRIVER_VFIO); else if (STREQ(driverName, "xen")) - virPCIDeviceSetStubDriver(pci, VIR_PCI_STUB_DRIVER_XEN); + virPCIDeviceSetStubDriverType(pci, VIR_PCI_STUB_DRIVER_XEN); return virHostdevPCINodeDeviceDetach(hostdevMgr, pci); } diff --git a/src/hypervisor/virhostdev.c b/src/hypervisor/virhostdev.c index eac3474783..c437ca9d22 100644 --- a/src/hypervisor/virhostdev.c +++ b/src/hypervisor/virhostdev.c @@ -244,9 +244,9 @@ virHostdevGetPCIHostDevice(const virDomainHostdevDef *hostdev, virPCIDeviceSetManaged(actual, hostdev->managed); if (pcisrc->backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO) { - virPCIDeviceSetStubDriver(actual, VIR_PCI_STUB_DRIVER_VFIO); + virPCIDeviceSetStubDriverType(actual, VIR_PCI_STUB_DRIVER_VFIO); } else if (pcisrc->backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_XEN) { - virPCIDeviceSetStubDriver(actual, VIR_PCI_STUB_DRIVER_XEN); + virPCIDeviceSetStubDriverType(actual, VIR_PCI_STUB_DRIVER_XEN); } else { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, _("pci backend driver '%1$s' is not supported"), @@ -679,7 +679,7 @@ virHostdevPreparePCIDevicesImpl(virHostdevManager *mgr, for (i = 0; i < virPCIDeviceListCount(pcidevs); i++) { virPCIDevice *pci = virPCIDeviceListGet(pcidevs, i); bool strict_acs_check = !!(flags & VIR_HOSTDEV_STRICT_ACS_CHECK); - bool usesVFIO = (virPCIDeviceGetStubDriver(pci) == VIR_PCI_STUB_DRIVER_VFIO); + bool usesVFIO = (virPCIDeviceGetStubDriverType(pci) == VIR_PCI_STUB_DRIVER_VFIO); struct virHostdevIsPCINodeDeviceUsedData data = {mgr, drv_name, dom_name, false}; int hdrType = -1; @@ -776,7 +776,7 @@ virHostdevPreparePCIDevicesImpl(virHostdevManager *mgr, /* The device is bound to a known stub driver: store this * information and add a copy to the inactive list */ - virPCIDeviceSetStubDriver(pci, stub); + virPCIDeviceSetStubDriverType(pci, stub); VIR_DEBUG("Adding PCI device %s to inactive list", virPCIDeviceGetName(pci)); diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index da60c965dd..983109df86 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -3072,7 +3072,7 @@ virPCIDeviceGetManaged; virPCIDeviceGetName; virPCIDeviceGetRemoveSlot; virPCIDeviceGetReprobe; -virPCIDeviceGetStubDriver; +virPCIDeviceGetStubDriverType; virPCIDeviceGetUnbindFromStub; virPCIDeviceGetUsedBy; virPCIDeviceGetVPD; @@ -3098,7 +3098,7 @@ virPCIDeviceReset; virPCIDeviceSetManaged; virPCIDeviceSetRemoveSlot; virPCIDeviceSetReprobe; -virPCIDeviceSetStubDriver; +virPCIDeviceSetStubDriverType; virPCIDeviceSetUnbindFromStub; virPCIDeviceSetUsedBy; virPCIDeviceUnbind; diff --git a/src/util/virnvme.c b/src/util/virnvme.c index f7f8dc5ea9..37333d515b 100644 --- a/src/util/virnvme.c +++ b/src/util/virnvme.c @@ -292,7 +292,7 @@ virNVMeDeviceCreatePCIDevice(const virNVMeDevice *nvme) return NULL; /* NVMe devices must be bound to vfio */ - virPCIDeviceSetStubDriver(pci, VIR_PCI_STUB_DRIVER_VFIO); + virPCIDeviceSetStubDriverType(pci, VIR_PCI_STUB_DRIVER_VFIO); virPCIDeviceSetManaged(pci, nvme->managed); return g_steal_pointer(&pci); diff --git a/src/util/virpci.c b/src/util/virpci.c index cc2b07bbba..88a020fb86 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -87,7 +87,7 @@ struct _virPCIDevice { bool managed; - virPCIStubDriver stubDriver; + virPCIStubDriver stubDriverType; /* used by reattach function */ bool unbind_from_stub; @@ -1233,12 +1233,12 @@ virPCIDeviceBindToStub(virPCIDevice *dev) g_autofree char *driverLink = NULL; /* Check the device is configured to use one of the known stub drivers */ - if (dev->stubDriver == VIR_PCI_STUB_DRIVER_NONE) { + if (dev->stubDriverType == VIR_PCI_STUB_DRIVER_NONE) { virReportError(VIR_ERR_INTERNAL_ERROR, _("No stub driver configured for PCI device %1$s"), dev->name); return -1; - } else if (!(stubDriverName = virPCIStubDriverTypeToString(dev->stubDriver))) { + } else if (!(stubDriverName = virPCIStubDriverTypeToString(dev->stubDriverType))) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Unknown stub driver configured for PCI device %1$s"), dev->name); @@ -1287,7 +1287,7 @@ virPCIDeviceDetach(virPCIDevice *dev, virPCIDeviceList *activeDevs, virPCIDeviceList *inactiveDevs) { - if (virPCIProbeStubDriver(dev->stubDriver) < 0) + if (virPCIProbeStubDriver(dev->stubDriverType) < 0) return -1; if (activeDevs && virPCIDeviceListFind(activeDevs, &dev->address)) { @@ -1569,15 +1569,15 @@ virPCIDeviceGetManaged(virPCIDevice *dev) } void -virPCIDeviceSetStubDriver(virPCIDevice *dev, virPCIStubDriver driver) +virPCIDeviceSetStubDriverType(virPCIDevice *dev, virPCIStubDriver driverType) { - dev->stubDriver = driver; + dev->stubDriverType = driverType; } virPCIStubDriver -virPCIDeviceGetStubDriver(virPCIDevice *dev) +virPCIDeviceGetStubDriverType(virPCIDevice *dev) { - return dev->stubDriver; + return dev->stubDriverType; } bool diff --git a/src/util/virpci.h b/src/util/virpci.h index 4d9193f24e..485f535bc9 100644 --- a/src/util/virpci.h +++ b/src/util/virpci.h @@ -134,9 +134,9 @@ int virPCIDeviceReset(virPCIDevice *dev, void virPCIDeviceSetManaged(virPCIDevice *dev, bool managed); bool virPCIDeviceGetManaged(virPCIDevice *dev); -void virPCIDeviceSetStubDriver(virPCIDevice *dev, - virPCIStubDriver driver); -virPCIStubDriver virPCIDeviceGetStubDriver(virPCIDevice *dev); +void virPCIDeviceSetStubDriverType(virPCIDevice *dev, + virPCIStubDriver driverType); +virPCIStubDriver virPCIDeviceGetStubDriverType(virPCIDevice *dev); virPCIDeviceAddress *virPCIDeviceGetAddress(virPCIDevice *dev); int virPCIDeviceSetUsedBy(virPCIDevice *dev, const char *drv_name, diff --git a/tests/virhostdevtest.c b/tests/virhostdevtest.c index ee0d1c1e6b..04e6c00908 100644 --- a/tests/virhostdevtest.c +++ b/tests/virhostdevtest.c @@ -142,7 +142,7 @@ myInit(void) if (!(dev[i] = virPCIDeviceNew(&subsys->u.pci.addr))) goto cleanup; - virPCIDeviceSetStubDriver(dev[i], VIR_PCI_STUB_DRIVER_VFIO); + virPCIDeviceSetStubDriverType(dev[i], VIR_PCI_STUB_DRIVER_VFIO); } for (i = 0; i < ndisks; i++) { diff --git a/tests/virpcitest.c b/tests/virpcitest.c index 769175d7c4..92cc8c07c6 100644 --- a/tests/virpcitest.c +++ b/tests/virpcitest.c @@ -107,7 +107,7 @@ testVirPCIDeviceDetach(const void *opaque G_GNUC_UNUSED) if (!(dev[i] = virPCIDeviceNew(&devAddr))) goto cleanup; - virPCIDeviceSetStubDriver(dev[i], VIR_PCI_STUB_DRIVER_VFIO); + virPCIDeviceSetStubDriverType(dev[i], VIR_PCI_STUB_DRIVER_VFIO); if (virPCIDeviceDetach(dev[i], activeDevs, inactiveDevs) < 0) goto cleanup; @@ -149,7 +149,7 @@ testVirPCIDeviceReset(const void *opaque G_GNUC_UNUSED) if (!(dev[i] = virPCIDeviceNew(&devAddr))) goto cleanup; - virPCIDeviceSetStubDriver(dev[i], VIR_PCI_STUB_DRIVER_VFIO); + virPCIDeviceSetStubDriverType(dev[i], VIR_PCI_STUB_DRIVER_VFIO); if (virPCIDeviceReset(dev[i], activeDevs, inactiveDevs) < 0) goto cleanup; @@ -190,7 +190,7 @@ testVirPCIDeviceReattach(const void *opaque G_GNUC_UNUSED) CHECK_LIST_COUNT(activeDevs, 0); CHECK_LIST_COUNT(inactiveDevs, i + 1); - virPCIDeviceSetStubDriver(dev[i], VIR_PCI_STUB_DRIVER_VFIO); + virPCIDeviceSetStubDriverType(dev[i], VIR_PCI_STUB_DRIVER_VFIO); } CHECK_LIST_COUNT(activeDevs, 0); @@ -249,7 +249,7 @@ testVirPCIDeviceDetachSingle(const void *opaque) if (!dev) goto cleanup; - virPCIDeviceSetStubDriver(dev, VIR_PCI_STUB_DRIVER_VFIO); + virPCIDeviceSetStubDriverType(dev, VIR_PCI_STUB_DRIVER_VFIO); if (virPCIDeviceDetach(dev, NULL, NULL) < 0) goto cleanup; -- 2.41.0

On 8/21/23 21:32, Laine Stump wrote:
In the past we just kept track of the type of the "stub driver" (the driver that is bound to a device in order to assign it to a guest). The next commit will add a stubDriverName to go along with type, so lets use stubDriverType for the existing enum to make it easier to keep track of whether we're talking about the name or the type.
Signed-off-by: Laine Stump <laine@redhat.com> --- src/hypervisor/domain_driver.c | 4 ++-- src/hypervisor/virhostdev.c | 8 ++++---- src/libvirt_private.syms | 4 ++-- src/util/virnvme.c | 2 +- src/util/virpci.c | 16 ++++++++-------- src/util/virpci.h | 6 +++--- tests/virhostdevtest.c | 2 +- tests/virpcitest.c | 8 ++++---- 8 files changed, 25 insertions(+), 25 deletions(-)
There's one more occurrence: diff --git i/src/util/virpci.c w/src/util/virpci.c index 88a020fb86..d86a81c2b1 100644 --- i/src/util/virpci.c +++ w/src/util/virpci.c @@ -1267,9 +1267,10 @@ virPCIDeviceBindToStub(virPCIDevice *dev) /* virPCIDeviceDetach: * * Detach this device from the host driver, attach it to the stub - * driver (previously set with virPCIDeviceSetStubDriver(), and add *a - * copy* of the object to the inactiveDevs list (if provided). This - * function will *never* consume dev, so the caller should free it. + * driver (previously set with virPCIDeviceSetStubDriverType(), and + * add *a copy* of the object to the inactiveDevs list (if provided). + * This function will *never* consume dev, so the caller should free + * it. * * Returns 0 on success, -1 on failure (will fail if the device is * already in the activeDevs list, but will be a NOP if the device is Michal

There can be many different drivers that are of the type "VFIO", so add the driver name to the object and allow getting/setting it. Signed-off-by: Laine Stump <laine@redhat.com> --- src/libvirt_private.syms | 2 ++ src/util/virpci.c | 16 ++++++++++++++++ src/util/virpci.h | 3 +++ 3 files changed, 21 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 983109df86..fad5389d68 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -3072,6 +3072,7 @@ virPCIDeviceGetManaged; virPCIDeviceGetName; virPCIDeviceGetRemoveSlot; virPCIDeviceGetReprobe; +virPCIDeviceGetStubDriverName; virPCIDeviceGetStubDriverType; virPCIDeviceGetUnbindFromStub; virPCIDeviceGetUsedBy; @@ -3098,6 +3099,7 @@ virPCIDeviceReset; virPCIDeviceSetManaged; virPCIDeviceSetRemoveSlot; virPCIDeviceSetReprobe; +virPCIDeviceSetStubDriverName; virPCIDeviceSetStubDriverType; virPCIDeviceSetUnbindFromStub; virPCIDeviceSetUsedBy; diff --git a/src/util/virpci.c b/src/util/virpci.c index 88a020fb86..103bc4254e 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -88,6 +88,7 @@ struct _virPCIDevice { bool managed; virPCIStubDriver stubDriverType; + char *stubDriverName; /* if blank, use default for type */ /* used by reattach function */ bool unbind_from_stub; @@ -1507,6 +1508,7 @@ virPCIDeviceCopy(virPCIDevice *dev) copy->path = g_strdup(dev->path); copy->used_by_drvname = g_strdup(dev->used_by_drvname); copy->used_by_domname = g_strdup(dev->used_by_domname); + copy->stubDriverName = g_strdup(dev->stubDriverName); return copy; } @@ -1521,6 +1523,7 @@ virPCIDeviceFree(virPCIDevice *dev) g_free(dev->path); g_free(dev->used_by_drvname); g_free(dev->used_by_domname); + g_free(dev->stubDriverName); g_free(dev); } @@ -1580,6 +1583,19 @@ virPCIDeviceGetStubDriverType(virPCIDevice *dev) return dev->stubDriverType; } +void +virPCIDeviceSetStubDriverName(virPCIDevice *dev, + const char *driverName) +{ + dev->stubDriverName = g_strdup(driverName); +} + +const char * +virPCIDeviceGetStubDriverName(virPCIDevice *dev) +{ + return dev->stubDriverName; +} + bool virPCIDeviceGetUnbindFromStub(virPCIDevice *dev) { diff --git a/src/util/virpci.h b/src/util/virpci.h index 485f535bc9..f8f98f39de 100644 --- a/src/util/virpci.h +++ b/src/util/virpci.h @@ -137,6 +137,9 @@ bool virPCIDeviceGetManaged(virPCIDevice *dev); void virPCIDeviceSetStubDriverType(virPCIDevice *dev, virPCIStubDriver driverType); virPCIStubDriver virPCIDeviceGetStubDriverType(virPCIDevice *dev); +void virPCIDeviceSetStubDriverName(virPCIDevice *dev, + const char *driverName); +const char *virPCIDeviceGetStubDriverName(virPCIDevice *dev); virPCIDeviceAddress *virPCIDeviceGetAddress(virPCIDevice *dev); int virPCIDeviceSetUsedBy(virPCIDevice *dev, const char *drv_name, -- 2.41.0

On 8/21/23 21:32, Laine Stump wrote:
There can be many different drivers that are of the type "VFIO", so add the driver name to the object and allow getting/setting it.
Signed-off-by: Laine Stump <laine@redhat.com> --- src/libvirt_private.syms | 2 ++ src/util/virpci.c | 16 ++++++++++++++++ src/util/virpci.h | 3 +++ 3 files changed, 21 insertions(+)
diff --git a/src/util/virpci.c b/src/util/virpci.c index 88a020fb86..103bc4254e 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c
@@ -1580,6 +1583,19 @@ virPCIDeviceGetStubDriverType(virPCIDevice *dev) return dev->stubDriverType; }
+void +virPCIDeviceSetStubDriverName(virPCIDevice *dev, + const char *driverName) +{ + dev->stubDriverName = g_strdup(driverName); +}
I think it's worth freeing dev->stubDriverName before setting another one. Please consider squashing this in: diff --git i/src/util/virpci.c w/src/util/virpci.c index 3f207a24f3..15e53e6749 100644 --- i/src/util/virpci.c +++ w/src/util/virpci.c @@ -1586,8 +1586,9 @@ virPCIDeviceGetStubDriverType(virPCIDevice *dev) void virPCIDeviceSetStubDriverName(virPCIDevice *dev, - const char *driverName) + const char *driverName) { + g_free(dev->stubDriverName); dev->stubDriverName = g_strdup(driverName); } And just a thought (maybe it was covered in earlier discussions, maybe it'll be covered in next patches) - what about the following scenario: virPCIDeviceSetStubDriverType(&dev, VIR_PCI_STUB_DRIVER_VFIO); virPCIDeviceSetStubDriverName(&dev, "blah"); Should the latter reset dev->stubDriverType to _NONE? Similarly, what if the two are reversed? But I guess we can always fine tune this later. Michal

On 8/23/23 3:52 AM, Michal Prívozník wrote:
On 8/21/23 21:32, Laine Stump wrote:
There can be many different drivers that are of the type "VFIO", so add the driver name to the object and allow getting/setting it.
Signed-off-by: Laine Stump <laine@redhat.com> --- src/libvirt_private.syms | 2 ++ src/util/virpci.c | 16 ++++++++++++++++ src/util/virpci.h | 3 +++ 3 files changed, 21 insertions(+)
diff --git a/src/util/virpci.c b/src/util/virpci.c index 88a020fb86..103bc4254e 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c
@@ -1580,6 +1583,19 @@ virPCIDeviceGetStubDriverType(virPCIDevice *dev) return dev->stubDriverType; }
+void +virPCIDeviceSetStubDriverName(virPCIDevice *dev, + const char *driverName) +{ + dev->stubDriverName = g_strdup(driverName); +}
I think it's worth freeing dev->stubDriverName before setting another one. Please consider squashing this in:
diff --git i/src/util/virpci.c w/src/util/virpci.c index 3f207a24f3..15e53e6749 100644 --- i/src/util/virpci.c +++ w/src/util/virpci.c @@ -1586,8 +1586,9 @@ virPCIDeviceGetStubDriverType(virPCIDevice *dev)
void virPCIDeviceSetStubDriverName(virPCIDevice *dev, - const char *driverName) + const char *driverName) { + g_free(dev->stubDriverName); dev->stubDriverName = g_strdup(driverName); }
Sure, that seems prudent.
And just a thought (maybe it was covered in earlier discussions, maybe it'll be covered in next patches) - what about the following scenario:
virPCIDeviceSetStubDriverType(&dev, VIR_PCI_STUB_DRIVER_VFIO); virPCIDeviceSetStubDriverName(&dev, "blah");
Should the latter reset dev->stubDriverType to _NONE? Similarly, what if the two are reversed? But I guess we can always fine tune this later.
Although the two settings are conceptually intertwined, I think the APIs should be orthogonal, with the setting of one not affecting the other; otherwise people would get confused about which one should come first and/or be surprised when it didn't do what they wanted.

Instead, call it virPCIDeviceGetCurrentDriverPathAndName() to avoid confusion with the device name that is stored in the virPCIDevice object - that one is not necessarily the name of the current driver for the device, but could instead be the driver that we want to be bound to the device in the future. Signed-off-by: Laine Stump <laine@redhat.com> --- src/hypervisor/virhostdev.c | 7 ++++--- src/libvirt_private.syms | 2 +- src/util/virpci.c | 10 ++++++---- src/util/virpci.h | 6 +++--- tests/virpcitest.c | 2 +- 5 files changed, 15 insertions(+), 12 deletions(-) diff --git a/src/hypervisor/virhostdev.c b/src/hypervisor/virhostdev.c index c437ca9d22..244f057c6c 100644 --- a/src/hypervisor/virhostdev.c +++ b/src/hypervisor/virhostdev.c @@ -765,9 +765,10 @@ virHostdevPreparePCIDevicesImpl(virHostdevManager *mgr, * information about active / inactive device across * daemon restarts has been implemented */ - if (virPCIDeviceGetDriverPathAndName(pci, - &driverPath, &driverName) < 0) + if (virPCIDeviceGetCurrentDriverPathAndName(pci, &driverPath, + &driverName) < 0) { goto reattachdevs; + } stub = virPCIStubDriverTypeFromString(driverName); @@ -2294,7 +2295,7 @@ virHostdevPrepareOneNVMeDevice(virHostdevManager *hostdev_mgr, g_autofree char *drvName = NULL; int stub = VIR_PCI_STUB_DRIVER_NONE; - if (virPCIDeviceGetDriverPathAndName(pci, &drvPath, &drvName) < 0) + if (virPCIDeviceGetCurrentDriverPathAndName(pci, &drvPath, &drvName) < 0) goto cleanup; if (drvName) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index fad5389d68..2b577c4e2d 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -3064,7 +3064,7 @@ virPCIDeviceFileIterate; virPCIDeviceFree; virPCIDeviceGetAddress; virPCIDeviceGetConfigPath; -virPCIDeviceGetDriverPathAndName; +virPCIDeviceGetCurrentDriverPathAndName; virPCIDeviceGetIOMMUGroupDev; virPCIDeviceGetIOMMUGroupList; virPCIDeviceGetLinkCapSta; diff --git a/src/util/virpci.c b/src/util/virpci.c index 103bc4254e..2ec0dc2053 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -228,7 +228,7 @@ virPCIFile(const char *device, const char *file) } -/* virPCIDeviceGetDriverPathAndName - put the path to the driver +/* virPCIDeviceGetCurrentDriverPathAndName - put the path to the driver * directory of the driver in use for this device in @path and the * name of the driver in @name. Both could be NULL if it's not bound * to any driver. @@ -236,7 +236,9 @@ virPCIFile(const char *device, const char *file) * Return 0 for success, -1 for error. */ int -virPCIDeviceGetDriverPathAndName(virPCIDevice *dev, char **path, char **name) +virPCIDeviceGetCurrentDriverPathAndName(virPCIDevice *dev, + char **path, + char **name) { int ret = -1; g_autofree char *drvlink = NULL; @@ -1032,7 +1034,7 @@ virPCIDeviceReset(virPCIDevice *dev, * reset it whenever appropriate, so doing it ourselves would just * be redundant. */ - if (virPCIDeviceGetDriverPathAndName(dev, &drvPath, &drvName) < 0) + if (virPCIDeviceGetCurrentDriverPathAndName(dev, &drvPath, &drvName) < 0) goto cleanup; if (virPCIStubDriverTypeFromString(drvName) == VIR_PCI_STUB_DRIVER_VFIO) { @@ -1137,7 +1139,7 @@ virPCIDeviceUnbind(virPCIDevice *dev) g_autofree char *drvpath = NULL; g_autofree char *driver = NULL; - if (virPCIDeviceGetDriverPathAndName(dev, &drvpath, &driver) < 0) + if (virPCIDeviceGetCurrentDriverPathAndName(dev, &drvpath, &driver) < 0) return -1; if (!driver) diff --git a/src/util/virpci.h b/src/util/virpci.h index f8f98f39de..19c910202a 100644 --- a/src/util/virpci.h +++ b/src/util/virpci.h @@ -280,9 +280,9 @@ virPCIVPDResource * virPCIDeviceGetVPD(virPCIDevice *dev); int virPCIDeviceUnbind(virPCIDevice *dev); int virPCIDeviceRebind(virPCIDevice *dev); -int virPCIDeviceGetDriverPathAndName(virPCIDevice *dev, - char **path, - char **name); +int virPCIDeviceGetCurrentDriverPathAndName(virPCIDevice *dev, + char **path, + char **name); int virPCIDeviceIsPCIExpress(virPCIDevice *dev); int virPCIDeviceHasPCIExpressLink(virPCIDevice *dev); diff --git a/tests/virpcitest.c b/tests/virpcitest.c index 92cc8c07c6..d69a1b5118 100644 --- a/tests/virpcitest.c +++ b/tests/virpcitest.c @@ -37,7 +37,7 @@ testVirPCIDeviceCheckDriver(virPCIDevice *dev, const char *expected) g_autofree char *path = NULL; g_autofree char *driver = NULL; - if (virPCIDeviceGetDriverPathAndName(dev, &path, &driver) < 0) + if (virPCIDeviceGetCurrentDriverPathAndName(dev, &path, &driver) < 0) return -1; if (STRNEQ_NULLABLE(driver, expected)) { -- 2.41.0

Before a PCI device can be assigned to a guest with VFIO, that device must be bound to the vfio-pci driver rather than to the device's normal host driver. The vfio-pci driver provides APIs that permit QEMU to perform all the necessary operations to make the device accessible to the guest. In the past vfio-pci was the only driver that supplied these APIs, but there are now vendor/device-specific "VFIO variant" drivers that provide the basic vfio-pci driver functionality/API while adding support for device-specific operations (for example these device-specific drivers may support live migration of certain devices). All that is needed to make this functionality available is to bind the vendor-specific "VFIO variant" driver to the device (rather than the generic vfio-pci driver, which will continue to work, just without the extra functionality). But until now libvirt has required that all PCI devices being assigned to a guest with VFIO specifically have the "vfio-pci" driver bound to the device. So even if the user manually binds a shiny new vendor-specific VFIO variant driver to the device (and puts "managed='no'" in the config to prevent libvirt from changing the binding), libvirt will just fail during startup of the guest (or during hotplug) because the driver bound to the device isn't exactly "vfio-pci". Beginning with kernel 6.1, it's possible to determine from the sysfs directory for a device whether the currently-bound driver is the vfio-pci driver or a VFIO variant - the device directory will have a subdirectory called "vfio-dev". We can use that to appropriately widen the list of drivers that libvirt will allow for VFIO device assignment. This patch doesn't remove the explicit check for the exact "vfio-pci" driver (since that would cause systems with pre-6.1 kernels to behave incorrectly), but adds an additional check for the vfio-dev directory, so that any VFIO variant driver is acceptable for libvirt to continue setting up for VFIO device assignment. Signed-off-by: Laine Stump <laine@redhat.com> --- src/hypervisor/virhostdev.c | 28 +++++-------- src/libvirt_private.syms | 1 + src/util/virpci.c | 78 ++++++++++++++++++++++++++++++++++--- src/util/virpci.h | 3 ++ 4 files changed, 87 insertions(+), 23 deletions(-) diff --git a/src/hypervisor/virhostdev.c b/src/hypervisor/virhostdev.c index 244f057c6c..b95d6bf3d6 100644 --- a/src/hypervisor/virhostdev.c +++ b/src/hypervisor/virhostdev.c @@ -743,9 +743,8 @@ virHostdevPreparePCIDevicesImpl(virHostdevManager *mgr, mgr->inactivePCIHostdevs) < 0) goto reattachdevs; } else { - g_autofree char *driverPath = NULL; - g_autofree char *driverName = NULL; - int stub; + g_autofree char *drvName = NULL; + virPCIStubDriver drvType; /* Unmanaged devices should already have been marked as * inactive: if that's the case, we can simply move on */ @@ -765,19 +764,17 @@ virHostdevPreparePCIDevicesImpl(virHostdevManager *mgr, * information about active / inactive device across * daemon restarts has been implemented */ - if (virPCIDeviceGetCurrentDriverPathAndName(pci, &driverPath, - &driverName) < 0) { + if (virPCIDeviceGetCurrentDriverNameAndType(pci, &drvName, + &drvType) < 0) { goto reattachdevs; } - stub = virPCIStubDriverTypeFromString(driverName); - - if (stub > VIR_PCI_STUB_DRIVER_NONE && - stub < VIR_PCI_STUB_DRIVER_LAST) { + if (drvType > VIR_PCI_STUB_DRIVER_NONE) { /* The device is bound to a known stub driver: store this * information and add a copy to the inactive list */ - virPCIDeviceSetStubDriverType(pci, stub); + virPCIDeviceSetStubDriverType(pci, drvType); + virPCIDeviceSetStubDriverName(pci, drvName); VIR_DEBUG("Adding PCI device %s to inactive list", virPCIDeviceGetName(pci)); @@ -2291,18 +2288,13 @@ virHostdevPrepareOneNVMeDevice(virHostdevManager *hostdev_mgr, /* Let's check if all PCI devices are NVMe disks. */ for (i = 0; i < virPCIDeviceListCount(pciDevices); i++) { virPCIDevice *pci = virPCIDeviceListGet(pciDevices, i); - g_autofree char *drvPath = NULL; g_autofree char *drvName = NULL; - int stub = VIR_PCI_STUB_DRIVER_NONE; + virPCIStubDriver drvType; - if (virPCIDeviceGetCurrentDriverPathAndName(pci, &drvPath, &drvName) < 0) + if (virPCIDeviceGetCurrentDriverNameAndType(pci, &drvName, &drvType) < 0) goto cleanup; - if (drvName) - stub = virPCIStubDriverTypeFromString(drvName); - - if (stub == VIR_PCI_STUB_DRIVER_VFIO || - STREQ_NULLABLE(drvName, "nvme")) + if (drvType == VIR_PCI_STUB_DRIVER_VFIO || STREQ_NULLABLE(drvName, "nvme")) continue; VIR_WARN("Suspicious NVMe disk assignment. PCI device " diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 2b577c4e2d..413985d34c 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -3064,6 +3064,7 @@ virPCIDeviceFileIterate; virPCIDeviceFree; virPCIDeviceGetAddress; virPCIDeviceGetConfigPath; +virPCIDeviceGetCurrentDriverNameAndType; virPCIDeviceGetCurrentDriverPathAndName; virPCIDeviceGetIOMMUGroupDev; virPCIDeviceGetIOMMUGroupList; diff --git a/src/util/virpci.c b/src/util/virpci.c index 2ec0dc2053..e165725cd9 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -280,6 +280,73 @@ virPCIDeviceGetCurrentDriverPathAndName(virPCIDevice *dev, } +/** + * virPCIDeviceGetCurrentDriverNameAndType: + * @dev: virPCIDevice object to examine + * @drvName: returns name of driver bound to this device (if any) + * @drvType: returns type of driver if it is a known stub driver type + * + * Find the name of the driver bound to @dev (if any) and the type of + * the driver if it is a known/recognized "stub" driver (based on the + * driver name). + * + * There are vfio "variant" drivers that provide all the basic + * functionality of the standard vfio-pci driver as well as additional + * stuff. As of kernel 6.1, the vfio-pci driver and all vfio variant + * drivers can be identified (once the driver has been bound to a + * device) by looking for the subdirectory "vfio-dev" in the device's + * sysfs directory; for example, if the directory + * /sys/bus/pci/devices/0000:04:11.4/vfio-dev exists, then the driver + * that is currently bound to PCI device 0000:04:11.4 is either + * vfio-pci, or a vfio-pci variant driver. + * + * Return 0 on success, -1 on failure. If -1 is returned, then an error + * message has been logged. + */ +int +virPCIDeviceGetCurrentDriverNameAndType(virPCIDevice *dev, + char **drvName, + virPCIStubDriver *drvType) +{ + g_autofree char *drvPath = NULL; + g_autofree char *vfioDevDir = NULL; + int tmpType; + + if (virPCIDeviceGetCurrentDriverPathAndName(dev, &drvPath, drvName) < 0) + return -1; + + if (!*drvName) { + *drvType = VIR_PCI_STUB_DRIVER_NONE; + return 0; + } + + tmpType = virPCIStubDriverTypeFromString(*drvName); + + if (tmpType > VIR_PCI_STUB_DRIVER_NONE) { + *drvType = tmpType; + return 0; /* exact match of a known driver name (or no name) */ + } + + /* If the sysfs directory of this device contains a directory + * named "vfio-dev" then the currently-bound driver is a vfio + * variant driver. + */ + + vfioDevDir = virPCIFile(dev->name, "vfio-dev"); + + if (virFileIsDir(vfioDevDir)) { + VIR_DEBUG("Driver %s is a vfio_pci driver", *drvName); + *drvType = VIR_PCI_STUB_DRIVER_VFIO; + } else { + VIR_DEBUG("Driver %s is NOT a vfio_pci driver, or kernel is too old", + *drvName); + *drvType = VIR_PCI_STUB_DRIVER_NONE; + } + + return 0; +} + + static int virPCIDeviceConfigOpenInternal(virPCIDevice *dev, bool readonly, bool fatal) { @@ -1007,8 +1074,8 @@ virPCIDeviceReset(virPCIDevice *dev, virPCIDeviceList *activeDevs, virPCIDeviceList *inactiveDevs) { - g_autofree char *drvPath = NULL; g_autofree char *drvName = NULL; + virPCIStubDriver drvType; int ret = -1; int fd = -1; int hdrType = -1; @@ -1034,15 +1101,16 @@ virPCIDeviceReset(virPCIDevice *dev, * reset it whenever appropriate, so doing it ourselves would just * be redundant. */ - if (virPCIDeviceGetCurrentDriverPathAndName(dev, &drvPath, &drvName) < 0) + if (virPCIDeviceGetCurrentDriverNameAndType(dev, &drvName, &drvType) < 0) goto cleanup; - if (virPCIStubDriverTypeFromString(drvName) == VIR_PCI_STUB_DRIVER_VFIO) { - VIR_DEBUG("Device %s is bound to vfio-pci - skip reset", - dev->name); + if (drvType == VIR_PCI_STUB_DRIVER_VFIO) { + + VIR_DEBUG("Device %s is bound to %s - skip reset", dev->name, drvName); ret = 0; goto cleanup; } + VIR_DEBUG("Resetting device %s", dev->name); if ((fd = virPCIDeviceConfigOpenWrite(dev)) < 0) diff --git a/src/util/virpci.h b/src/util/virpci.h index 19c910202a..faca6cf6f9 100644 --- a/src/util/virpci.h +++ b/src/util/virpci.h @@ -283,6 +283,9 @@ int virPCIDeviceRebind(virPCIDevice *dev); int virPCIDeviceGetCurrentDriverPathAndName(virPCIDevice *dev, char **path, char **name); +int virPCIDeviceGetCurrentDriverNameAndType(virPCIDevice *dev, + char **drvName, + virPCIStubDriver *drvType); int virPCIDeviceIsPCIExpress(virPCIDevice *dev); int virPCIDeviceHasPCIExpressLink(virPCIDevice *dev); -- 2.41.0

On 8/21/23 21:32, Laine Stump wrote:
Before a PCI device can be assigned to a guest with VFIO, that device must be bound to the vfio-pci driver rather than to the device's normal host driver. The vfio-pci driver provides APIs that permit QEMU to perform all the necessary operations to make the device accessible to the guest.
In the past vfio-pci was the only driver that supplied these APIs, but there are now vendor/device-specific "VFIO variant" drivers that provide the basic vfio-pci driver functionality/API while adding support for device-specific operations (for example these device-specific drivers may support live migration of certain devices). All that is needed to make this functionality available is to bind the vendor-specific "VFIO variant" driver to the device (rather than the generic vfio-pci driver, which will continue to work, just without the extra functionality).
But until now libvirt has required that all PCI devices being assigned to a guest with VFIO specifically have the "vfio-pci" driver bound to the device. So even if the user manually binds a shiny new vendor-specific VFIO variant driver to the device (and puts "managed='no'" in the config to prevent libvirt from changing the binding), libvirt will just fail during startup of the guest (or during hotplug) because the driver bound to the device isn't exactly "vfio-pci".
Beginning with kernel 6.1, it's possible to determine from the sysfs directory for a device whether the currently-bound driver is the vfio-pci driver or a VFIO variant - the device directory will have a subdirectory called "vfio-dev". We can use that to appropriately widen the list of drivers that libvirt will allow for VFIO device assignment.
This patch doesn't remove the explicit check for the exact "vfio-pci" driver (since that would cause systems with pre-6.1 kernels to behave incorrectly), but adds an additional check for the vfio-dev directory, so that any VFIO variant driver is acceptable for libvirt to continue setting up for VFIO device assignment.
Signed-off-by: Laine Stump <laine@redhat.com> --- src/hypervisor/virhostdev.c | 28 +++++-------- src/libvirt_private.syms | 1 + src/util/virpci.c | 78 ++++++++++++++++++++++++++++++++++--- src/util/virpci.h | 3 ++ 4 files changed, 87 insertions(+), 23 deletions(-)
Splendid! We can now turn virPCIDeviceGetCurrentDriverPathAndName() into a static function is it's only real use is inside of virpci.c. The only other use outside is in virpcitest.c and it only cares about ${driverName} anyway (so it's okay with calling this new virPCIDeviceGetCurrentDriverNameAndType()). Feel free to save that work for a follow up patch. Michal

virPCIProbeStubDriver() and virPCIDeviceBindToStub() both have very similar code that locally sets a driver name (based on stubDriverType). These two functions are each also called in just one place (virPCIDeviceDetach()), with just a small bit of validation code in between. To eliminate the "duplicated" code (which is going to be expanded slightly in upcoming patches to support manually or automatically picking a VFIO variant driver), this patch modifies virPCIProbeStubDriver() to take the driver name as an argument (rather than the virPCIDevice object), and calls it from within virPCIDeviceBindToStub() (rather than from that function's caller), using the driverName it has just figured out with the now-not-duplicated code. (NB: Since it could be used to probe *any* driver module, the name is changed to virPCIProbeDriver()). Signed-off-by: Laine Stump <laine@redhat.com> --- src/util/virpci.c | 33 ++++++++++++--------------------- 1 file changed, 12 insertions(+), 21 deletions(-) diff --git a/src/util/virpci.c b/src/util/virpci.c index e165725cd9..ac91480e0b 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1154,28 +1154,19 @@ virPCIDeviceReset(virPCIDevice *dev, static int -virPCIProbeStubDriver(virPCIStubDriver driver) +virPCIProbeDriver(const char *driverName) { - const char *drvname = NULL; g_autofree char *drvpath = NULL; g_autofree char *errbuf = NULL; - if (driver == VIR_PCI_STUB_DRIVER_NONE || - !(drvname = virPCIStubDriverTypeToString(driver))) { - virReportError(VIR_ERR_INTERNAL_ERROR, - "%s", - _("Attempting to use unknown stub driver")); - return -1; - } - - drvpath = virPCIDriverDir(drvname); + drvpath = virPCIDriverDir(driverName); /* driver previously loaded, return */ if (virFileExists(drvpath)) return 0; - if ((errbuf = virKModLoad(drvname))) { - VIR_WARN("failed to load driver %s: %s", drvname, errbuf); + if ((errbuf = virKModLoad(driverName))) { + VIR_WARN("failed to load driver %s: %s", driverName, errbuf); goto cleanup; } @@ -1187,14 +1178,14 @@ virPCIProbeStubDriver(virPCIStubDriver driver) /* If we know failure was because of admin config, let's report that; * otherwise, report a more generic failure message */ - if (virKModIsProhibited(drvname)) { + if (virKModIsProhibited(driverName)) { virReportError(VIR_ERR_INTERNAL_ERROR, - _("Failed to load PCI stub module %1$s: administratively prohibited"), - drvname); + _("Failed to load PCI driver module %1$s: administratively prohibited"), + driverName); } else { virReportError(VIR_ERR_INTERNAL_ERROR, - _("Failed to load PCI stub module %1$s"), - drvname); + _("Failed to load PCI driver module %1$s"), + driverName); } return -1; @@ -1316,6 +1307,9 @@ virPCIDeviceBindToStub(virPCIDevice *dev) return -1; } + if (virPCIProbeDriver(stubDriverName) < 0) + return -1; + stubDriverPath = virPCIDriverDir(stubDriverName); driverLink = virPCIFile(dev->name, "driver"); @@ -1358,9 +1352,6 @@ virPCIDeviceDetach(virPCIDevice *dev, virPCIDeviceList *activeDevs, virPCIDeviceList *inactiveDevs) { - if (virPCIProbeStubDriver(dev->stubDriverType) < 0) - return -1; - if (activeDevs && virPCIDeviceListFind(activeDevs, &dev->address)) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Not detaching active device %1$s"), dev->name); -- 2.41.0

Signed-off-by: Laine Stump <laine@redhat.com> --- src/util/virpci.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/src/util/virpci.c b/src/util/virpci.c index ac91480e0b..c721b8e533 100644 --- a/src/util/virpci.c +++ b/src/util/virpci.c @@ -1290,17 +1290,20 @@ virPCIDeviceUnbindFromStub(virPCIDevice *dev) static int virPCIDeviceBindToStub(virPCIDevice *dev) { - const char *stubDriverName; + const char *stubDriverName = dev->stubDriverName; g_autofree char *stubDriverPath = NULL; g_autofree char *driverLink = NULL; - /* Check the device is configured to use one of the known stub drivers */ + if (dev->stubDriverType == VIR_PCI_STUB_DRIVER_NONE) { virReportError(VIR_ERR_INTERNAL_ERROR, _("No stub driver configured for PCI device %1$s"), dev->name); return -1; - } else if (!(stubDriverName = virPCIStubDriverTypeToString(dev->stubDriverType))) { + } + + if (!stubDriverName + && !(stubDriverName = virPCIStubDriverTypeToString(dev->stubDriverType))) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Unknown stub driver configured for PCI device %1$s"), dev->name); -- 2.41.0

In the past, the only allowable values for the "driver" field of virNodeDeviceDetachFlags() were "kvm" or "vfio" for the QEMU driver, and "xen" for the libxl driver. Then "kvm" was deprecated and removed, so the driver name became essentially irrelevant (because it is always called via a particular hypervisor driver, and so the "xen" or "vfio" can be (and almost always is) implied. With the advent of VFIO variant drivers, the ability to explicitly specify a driver name once again becomes useful - it can be used to name the exact VFIO driver that we want bound to the device in place of vfio-pci, so this patch allows those other names to be passed down the call chain, where the code in virpci.c can make use of them. The names "vfio", "kvm", and "xen" retain their special meaning, though: 1) because there may be some application or configuration that still calls virNodeDeviceDetachFlags() with driverName="vfio", this single value is substituted with the synonym of NULL, which means "bind the default driver for this device and hypervisor". This will currently result in the vfio-pci driver being bound to the device. 2) in the case of the libxl driver, "xen" means to use the standard driver used in the case of Xen ("pciback"). 3) "kvm" as a driver name always results in an error, as legacy KVM device assignment was removed from the kernel around 10 years ago. Signed-off-by: Laine Stump <laine@redhat.com> --- src/hypervisor/domain_driver.c | 9 ++++----- src/hypervisor/domain_driver.h | 2 ++ src/libxl/libxl_driver.c | 3 ++- src/qemu/qemu_driver.c | 33 +++++++++++++++++++-------------- 4 files changed, 27 insertions(+), 20 deletions(-) diff --git a/src/hypervisor/domain_driver.c b/src/hypervisor/domain_driver.c index a70f75f3ae..429784292a 100644 --- a/src/hypervisor/domain_driver.c +++ b/src/hypervisor/domain_driver.c @@ -462,6 +462,7 @@ virDomainDriverNodeDeviceReAttach(virNodeDevicePtr dev, int virDomainDriverNodeDeviceDetachFlags(virNodeDevicePtr dev, virHostdevManager *hostdevMgr, + virPCIStubDriver driverType, const char *driverName) { g_autoptr(virPCIDevice) pci = NULL; @@ -471,7 +472,7 @@ virDomainDriverNodeDeviceDetachFlags(virNodeDevicePtr dev, g_autoptr(virConnect) nodeconn = NULL; g_autoptr(virNodeDevice) nodedev = NULL; - if (!driverName) + if (driverType == VIR_PCI_STUB_DRIVER_NONE) return -1; if (!(nodeconn = virGetConnectNodeDev())) @@ -504,10 +505,8 @@ virDomainDriverNodeDeviceDetachFlags(virNodeDevicePtr dev, if (!pci) return -1; - if (STREQ(driverName, "vfio")) - virPCIDeviceSetStubDriverType(pci, VIR_PCI_STUB_DRIVER_VFIO); - else if (STREQ(driverName, "xen")) - virPCIDeviceSetStubDriverType(pci, VIR_PCI_STUB_DRIVER_XEN); + virPCIDeviceSetStubDriverType(pci, driverType); + virPCIDeviceSetStubDriverName(pci, driverName); return virHostdevPCINodeDeviceDetach(hostdevMgr, pci); } diff --git a/src/hypervisor/domain_driver.h b/src/hypervisor/domain_driver.h index 4241c86932..9942f58fda 100644 --- a/src/hypervisor/domain_driver.h +++ b/src/hypervisor/domain_driver.h @@ -22,6 +22,7 @@ #include "node_device_conf.h" #include "virhostdev.h" +#include "virpci.h" char * virDomainDriverGenerateRootHash(const char *drivername, @@ -58,6 +59,7 @@ int virDomainDriverNodeDeviceReAttach(virNodeDevicePtr dev, int virDomainDriverNodeDeviceDetachFlags(virNodeDevicePtr dev, virHostdevManager *hostdevMgr, + virPCIStubDriver driverType, const char *driverName); int virDomainDriverAddIOThreadCheck(virDomainDef *def, diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index 3d10f45850..079922dd32 100644 --- a/src/libxl/libxl_driver.c +++ b/src/libxl/libxl_driver.c @@ -5876,7 +5876,8 @@ libxlNodeDeviceDetachFlags(virNodeDevicePtr dev, /* virNodeDeviceDetachFlagsEnsureACL() is being called by * virDomainDriverNodeDeviceDetachFlags() */ - return virDomainDriverNodeDeviceDetachFlags(dev, hostdev_mgr, driverName); + return virDomainDriverNodeDeviceDetachFlags(dev, hostdev_mgr, + VIR_PCI_STUB_DRIVER_XEN, NULL); } static int diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index f8039160f4..f676744e9e 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -70,7 +70,6 @@ #include "domain_driver.h" #include "domain_postparse.h" #include "domain_validate.h" -#include "virpci.h" #include "virpidfile.h" #include "virprocess.h" #include "libvirt_internal.h" @@ -11400,24 +11399,28 @@ qemuNodeDeviceDetachFlags(virNodeDevicePtr dev, virCheckFlags(0, -1); - if (!driverName) - driverName = "vfio"; - - /* Only the 'vfio' driver is supported and a special error message for - * the previously supported 'kvm' driver is provided below. */ - if (STRNEQ(driverName, "vfio") && STRNEQ(driverName, "kvm")) { - virReportError(VIR_ERR_INVALID_ARG, - _("unknown driver name '%1$s'"), driverName); - return -1; - } + /* For historical reasons, if driverName is "vfio", that is the + * same as NULL, i.e. the default vfio driver for this device + */ + if (STREQ_NULLABLE(driverName, "vfio")) + driverName = NULL; - if (STREQ(driverName, "kvm")) { + /* the "kvm" driver name was used a very long time ago to force + * "legacy KVM device assignment", which hasn't been supported in + * over 10 years. + */ + if (STREQ_NULLABLE(driverName, "kvm")) { virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", - _("KVM device assignment is no longer " + _("'legacy KVM' device assignment is no longer " "supported on this system")); return -1; } + /* for any other driver, we can't know whether or not it is a VFIO + * driver until the device has been bound to it, so we will defer + * further validation until then. + */ + if (!qemuHostdevHostSupportsPassthroughVFIO()) { virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", _("VFIO device assignment is currently not " @@ -11427,7 +11430,9 @@ qemuNodeDeviceDetachFlags(virNodeDevicePtr dev, /* virNodeDeviceDetachFlagsEnsureACL() is being called by * virDomainDriverNodeDeviceDetachFlags() */ - return virDomainDriverNodeDeviceDetachFlags(dev, hostdev_mgr, driverName); + return virDomainDriverNodeDeviceDetachFlags(dev, hostdev_mgr, + VIR_PCI_STUB_DRIVER_VFIO, + driverName); } static int -- 2.41.0

On 8/21/23 21:32, Laine Stump wrote:
In the past, the only allowable values for the "driver" field of virNodeDeviceDetachFlags() were "kvm" or "vfio" for the QEMU driver, and "xen" for the libxl driver. Then "kvm" was deprecated and removed, so the driver name became essentially irrelevant (because it is always called via a particular hypervisor driver, and so the "xen" or "vfio" can be (and almost always is) implied.
With the advent of VFIO variant drivers, the ability to explicitly specify a driver name once again becomes useful - it can be used to name the exact VFIO driver that we want bound to the device in place of vfio-pci, so this patch allows those other names to be passed down the call chain, where the code in virpci.c can make use of them.
The names "vfio", "kvm", and "xen" retain their special meaning, though:
1) because there may be some application or configuration that still calls virNodeDeviceDetachFlags() with driverName="vfio", this single value is substituted with the synonym of NULL, which means "bind the default driver for this device and hypervisor". This will currently result in the vfio-pci driver being bound to the device.
2) in the case of the libxl driver, "xen" means to use the standard driver used in the case of Xen ("pciback").
3) "kvm" as a driver name always results in an error, as legacy KVM device assignment was removed from the kernel around 10 years ago.
Signed-off-by: Laine Stump <laine@redhat.com> --- src/hypervisor/domain_driver.c | 9 ++++----- src/hypervisor/domain_driver.h | 2 ++ src/libxl/libxl_driver.c | 3 ++- src/qemu/qemu_driver.c | 33 +++++++++++++++++++-------------- 4 files changed, 27 insertions(+), 20 deletions(-)
diff --git a/src/hypervisor/domain_driver.c b/src/hypervisor/domain_driver.c index a70f75f3ae..429784292a 100644 --- a/src/hypervisor/domain_driver.c +++ b/src/hypervisor/domain_driver.c @@ -462,6 +462,7 @@ virDomainDriverNodeDeviceReAttach(virNodeDevicePtr dev, int virDomainDriverNodeDeviceDetachFlags(virNodeDevicePtr dev, virHostdevManager *hostdevMgr, + virPCIStubDriver driverType, const char *driverName) { g_autoptr(virPCIDevice) pci = NULL; @@ -471,7 +472,7 @@ virDomainDriverNodeDeviceDetachFlags(virNodeDevicePtr dev, g_autoptr(virConnect) nodeconn = NULL; g_autoptr(virNodeDevice) nodedev = NULL;
- if (!driverName) + if (driverType == VIR_PCI_STUB_DRIVER_NONE) return -1;
Even though, we are very unlikely to hit this check, the rest of the function reports an error on error paths. Please virReportError() here too. Michal

On 8/23/23 3:52 AM, Michal Prívozník wrote:
On 8/21/23 21:32, Laine Stump wrote:
In the past, the only allowable values for the "driver" field of virNodeDeviceDetachFlags() were "kvm" or "vfio" for the QEMU driver, and "xen" for the libxl driver. Then "kvm" was deprecated and removed, so the driver name became essentially irrelevant (because it is always called via a particular hypervisor driver, and so the "xen" or "vfio" can be (and almost always is) implied.
With the advent of VFIO variant drivers, the ability to explicitly specify a driver name once again becomes useful - it can be used to name the exact VFIO driver that we want bound to the device in place of vfio-pci, so this patch allows those other names to be passed down the call chain, where the code in virpci.c can make use of them.
The names "vfio", "kvm", and "xen" retain their special meaning, though:
1) because there may be some application or configuration that still calls virNodeDeviceDetachFlags() with driverName="vfio", this single value is substituted with the synonym of NULL, which means "bind the default driver for this device and hypervisor". This will currently result in the vfio-pci driver being bound to the device.
2) in the case of the libxl driver, "xen" means to use the standard driver used in the case of Xen ("pciback").
3) "kvm" as a driver name always results in an error, as legacy KVM device assignment was removed from the kernel around 10 years ago.
Signed-off-by: Laine Stump <laine@redhat.com> --- src/hypervisor/domain_driver.c | 9 ++++----- src/hypervisor/domain_driver.h | 2 ++ src/libxl/libxl_driver.c | 3 ++- src/qemu/qemu_driver.c | 33 +++++++++++++++++++-------------- 4 files changed, 27 insertions(+), 20 deletions(-)
diff --git a/src/hypervisor/domain_driver.c b/src/hypervisor/domain_driver.c index a70f75f3ae..429784292a 100644 --- a/src/hypervisor/domain_driver.c +++ b/src/hypervisor/domain_driver.c @@ -462,6 +462,7 @@ virDomainDriverNodeDeviceReAttach(virNodeDevicePtr dev, int virDomainDriverNodeDeviceDetachFlags(virNodeDevicePtr dev, virHostdevManager *hostdevMgr, + virPCIStubDriver driverType, const char *driverName) { g_autoptr(virPCIDevice) pci = NULL; @@ -471,7 +472,7 @@ virDomainDriverNodeDeviceDetachFlags(virNodeDevicePtr dev, g_autoptr(virConnect) nodeconn = NULL; g_autoptr(virNodeDevice) nodedev = NULL;
- if (!driverName) + if (driverType == VIR_PCI_STUB_DRIVER_NONE) return -1;
Even though, we are very unlikely to hit this check, the rest of the function reports an error on error paths. Please virReportError() here too.
Nice catch! Although 1) this is pre-existing (added all the way back in commit 887dd0d33), and 2) with the current code it would be impossible for this error to occur (since both callers set driverType to something other than NONE), still there is no point in having the check there if it's just going to fail silently on that fateful day in the future when someone modifies the code such that all the callers *don't* set driverType != NONE. I'll add an error log before I push.

Normally I wouldn't bother with a change like this, but I was touching the function anyway, and wanted to leave it looking nice and tidy. Signed-off-by: Laine Stump <laine@redhat.com> --- src/qemu/qemu_driver.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index f676744e9e..a60cbf0ed4 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11411,8 +11411,7 @@ qemuNodeDeviceDetachFlags(virNodeDevicePtr dev, */ if (STREQ_NULLABLE(driverName, "kvm")) { virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", - _("'legacy KVM' device assignment is no longer " - "supported on this system")); + _("'legacy KVM' device assignment is no longer supported on this system")); return -1; } @@ -11423,8 +11422,7 @@ qemuNodeDeviceDetachFlags(virNodeDevicePtr dev, if (!qemuHostdevHostSupportsPassthroughVFIO()) { virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", - _("VFIO device assignment is currently not " - "supported on this system")); + _("VFIO device assignment is currently not supported on this system")); return -1; } -- 2.41.0

On 8/21/23 21:32, Laine Stump wrote:
Normally I wouldn't bother with a change like this, but I was touching the function anyway, and wanted to leave it looking nice and tidy.
Signed-off-by: Laine Stump <laine@redhat.com> --- src/qemu/qemu_driver.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index f676744e9e..a60cbf0ed4 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11411,8 +11411,7 @@ qemuNodeDeviceDetachFlags(virNodeDevicePtr dev, */ if (STREQ_NULLABLE(driverName, "kvm")) { virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", - _("'legacy KVM' device assignment is no longer " - "supported on this system")); + _("'legacy KVM' device assignment is no longer supported on this system")); return -1; }
@@ -11423,8 +11422,7 @@ qemuNodeDeviceDetachFlags(virNodeDevicePtr dev,
if (!qemuHostdevHostSupportsPassthroughVFIO()) { virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", - _("VFIO device assignment is currently not " - "supported on this system")); + _("VFIO device assignment is currently not supported on this system")); return -1; }
This got me thinking, whether we should do one huge commit which would fix ALL the error messages in all files and just be done with it for good. Again, future work, you patch is good as is. Michal

On 8/23/23 3:52 AM, Michal Prívozník wrote:
On 8/21/23 21:32, Laine Stump wrote:
Normally I wouldn't bother with a change like this, but I was touching the function anyway, and wanted to leave it looking nice and tidy.
Signed-off-by: Laine Stump <laine@redhat.com> --- src/qemu/qemu_driver.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index f676744e9e..a60cbf0ed4 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11411,8 +11411,7 @@ qemuNodeDeviceDetachFlags(virNodeDevicePtr dev, */ if (STREQ_NULLABLE(driverName, "kvm")) { virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", - _("'legacy KVM' device assignment is no longer " - "supported on this system")); + _("'legacy KVM' device assignment is no longer supported on this system")); return -1; }
@@ -11423,8 +11422,7 @@ qemuNodeDeviceDetachFlags(virNodeDevicePtr dev,
if (!qemuHostdevHostSupportsPassthroughVFIO()) { virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", - _("VFIO device assignment is currently not " - "supported on this system")); + _("VFIO device assignment is currently not supported on this system")); return -1; }
This got me thinking, whether we should do one huge commit which would fix ALL the error messages in all files and just be done with it for good. Again, future work, you patch is good as is.
Yep, I had that thought too. I do worry that single giant mega-patches like that can create merge conflicts later though (since cherry-picking the mega-patch to resolve one conflict in context can create several other conflicts), but as you say that can be done (including the discussion of its relative merits) at another time.

On 8/23/23 22:06, Laine Stump wrote:
On 8/23/23 3:52 AM, Michal Prívozník wrote:
On 8/21/23 21:32, Laine Stump wrote:
Normally I wouldn't bother with a change like this, but I was touching the function anyway, and wanted to leave it looking nice and tidy.
Signed-off-by: Laine Stump <laine@redhat.com> --- src/qemu/qemu_driver.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index f676744e9e..a60cbf0ed4 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11411,8 +11411,7 @@ qemuNodeDeviceDetachFlags(virNodeDevicePtr dev, */ if (STREQ_NULLABLE(driverName, "kvm")) { virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", - _("'legacy KVM' device assignment is no longer " - "supported on this system")); + _("'legacy KVM' device assignment is no longer supported on this system")); return -1; } @@ -11423,8 +11422,7 @@ qemuNodeDeviceDetachFlags(virNodeDevicePtr dev, if (!qemuHostdevHostSupportsPassthroughVFIO()) { virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", - _("VFIO device assignment is currently not " - "supported on this system")); + _("VFIO device assignment is currently not supported on this system")); return -1; }
This got me thinking, whether we should do one huge commit which would fix ALL the error messages in all files and just be done with it for good. Again, future work, you patch is good as is.
Yep, I had that thought too. I do worry that single giant mega-patches like that can create merge conflicts later though (since cherry-picking the mega-patch to resolve one conflict in context can create several other conflicts), but as you say that can be done (including the discussion of its relative merits) at another time.
Yeah, that would be a problem, but also such conflicts would be trivial to resolve. Michal

On 8/21/23 21:32, Laine Stump wrote:
A "VFIO variant" driver is a kernel driver for a device that supports all the APIs of the basic vfio-pci driver (which enables assigning a host PCI device to a QEMU guest) plus "some extra stuff" (e.g. to enable things like saving/restoring device state in order to support live migration.)
Way back last year I posted a couple attempts to support VFIO variant drivers; here is V2 (along with a later followup discussion from a couple months ago):
https://listman.redhat.com/archives/libvir-list/2022-August/233661.html https://listman.redhat.com/archives/libvir-list/2023-May/240108.html
The mlx5-vfio-pci driver has now been upstream for quite awhile (and even in the downstream Fedora 38 kernel, for example), as are the sysfs bits that allow us to determine whether or not a driver is a VFIO variant, and I've updated the patch(es) to use this.
I've also been working on auto-binding to the "best-match" VFIO variant driver based on comparing the device's modalias file in sysfs to the contents of the kernel's modules.alias file, but that isn't quite ready (partly code that isn't yet working, but also partly indecision about exactly where in the XML to put the driver name when it is specified; I won't take up more space here with that though).
In the meantime, there are people who want to use the mlx5-vfio-pci driver (and Cedric Le Goater also has written vfio-pci-igbvf and vfio-pci-e1000e drivers (which area very useful for testing), although I don't think he has posted them anywhere yet), so I would like to get the basic patches here merged in upstream now while I continue working on "Part 2".
These patches provide two improvements that make testing/using VFIO drivers much more convenient:
1) The specific driver can be given in the virsh nodedev-detach command (or the virNodeDeviceDetachFlags() API call), e.g.:
virsh nodedev-detach pci_0000_04_11_5 --driver vfio-pci-igbvf
2) If the <hostdev> (or "<interface> ... <type='hostdev'/>" has "managed='no'", then libvirt will recognize any VFIO variant driver (rather than the current behavior of rejecting anything that isn't exactly "vfio-pci")
With these two capabilities, it's simple and straightforward to bind a device to a VFIO variant driver, and then start a guest that uses that device.
Change in V2:
* complete remake, more refactoring
* use existence of "vfio-dev" subdirectory of device directory in sysfs to determine whether the currently-bound driver is a vfio variant.
* support binding to a user-specified driver during nodedev-detach, rather than only supporting vfio-pci.
Laine Stump (8): util: use "stubDriverType" instead of just "stubDriver" util: add stub driver name to virPCIDevice object util: rename virPCIDeviceGetDriverPathAndName util: permit existing binding to VFIO variant driver util: probe stub driver from within function that binds to stub driver util: honor stubDriverName when probing/binding stub driver for a device node_device: support binding other drivers with virNodeDeviceDetachFlags() qemu: turn two multiline log messages into single line
src/hypervisor/domain_driver.c | 9 +- src/hypervisor/domain_driver.h | 2 + src/hypervisor/virhostdev.c | 35 +++----- src/libvirt_private.syms | 9 +- src/libxl/libxl_driver.c | 3 +- src/qemu/qemu_driver.c | 37 ++++---- src/util/virnvme.c | 2 +- src/util/virpci.c | 156 +++++++++++++++++++++++++-------- src/util/virpci.h | 18 ++-- tests/virhostdevtest.c | 2 +- tests/virpcitest.c | 10 +-- 11 files changed, 185 insertions(+), 98 deletions(-)
Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Michal

On 21/08/2023 20:32, Laine Stump wrote:
A "VFIO variant" driver is a kernel driver for a device that supports all the APIs of the basic vfio-pci driver (which enables assigning a host PCI device to a QEMU guest) plus "some extra stuff" (e.g. to enable things like saving/restoring device state in order to support live migration.)
Way back last year I posted a couple attempts to support VFIO variant drivers; here is V2 (along with a later followup discussion from a couple months ago):
https://listman.redhat.com/archives/libvir-list/2022-August/233661.html https://listman.redhat.com/archives/libvir-list/2023-May/240108.html
The mlx5-vfio-pci driver has now been upstream for quite awhile (and even in the downstream Fedora 38 kernel, for example), as are the sysfs bits that allow us to determine whether or not a driver is a VFIO variant, and I've updated the patch(es) to use this.
I've also been working on auto-binding to the "best-match" VFIO variant driver based on comparing the device's modalias file in sysfs to the contents of the kernel's modules.alias file, but that isn't quite ready (partly code that isn't yet working, but also partly indecision about exactly where in the XML to put the driver name when it is specified; I won't take up more space here with that though).
Don't we have a: <interface type='hostdev' managed='yes'> ... <driver name='vfio'> ... </interface/ ... that could be re-used somehow to specify the VFIO variant driver of choice? Somewhat similar to the nodedev-detach equivalent? OTOH, it would only cover networking hostdevs... :/
In the meantime, there are people who want to use the mlx5-vfio-pci driver (and Cedric Le Goater also has written vfio-pci-igbvf and vfio-pci-e1000e drivers (which area very useful for testing), although I don't think he has posted them anywhere yet),
Interesting, e1000e and igbvf have quiesce/resume/dirty-tracking commands? Sounds too old to have :D unless of course this is Qemu-created commands and not actual IGB/E1000e hardware doing those
so I would like to get the basic patches here merged in upstream now while I continue working on "Part 2".
These patches provide two improvements that make testing/using VFIO drivers much more convenient:
1) The specific driver can be given in the virsh nodedev-detach command (or the virNodeDeviceDetachFlags() API call), e.g.:
virsh nodedev-detach pci_0000_04_11_5 --driver vfio-pci-igbvf
2) If the <hostdev> (or "<interface> ... <type='hostdev'/>" has "managed='no'", then libvirt will recognize any VFIO variant driver (rather than the current behavior of rejecting anything that isn't exactly "vfio-pci")
With these two capabilities, it's simple and straightforward to bind a device to a VFIO variant driver, and then start a guest that uses that device.
Nice Provided the initial use of VFIO variant drivers have been to support live migration, should the hostdev blocker be lifted too *just in case it is simple addition of course* ? In vfio 8.1 it is enabled by default, so there isn't much else to do other than unblock migration with hostdevs. Something like below is <interface> specific; which works but it is likely too specific: diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index d21551ab07d9..6edc8583c539 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1316,13 +1316,15 @@ qemuMigrationSrcIsAllowedHostdev(const virDomainDef *def) continue; } - /* all other PCI hostdevs can't be migrated */ + /* some PCI hostdevs can't be migrated */ if (hostdev->parentnet) { - virDomainNetType actualType = virDomainNetGetActualType(hostdev->parentnet); - - virReportError(VIR_ERR_OPERATION_UNSUPPORTED, - _("cannot migrate a domain with <interface type='%1$s'>"), - virDomainNetTypeToString(actualType)); + /* + * Some VFIO network devices can be migrated if vendor + * VFIO driver is used e.g. via mlx5-vfio-pci. + * We leave it up to the live migration blocker in Qemu to + * stop us from proceeding. + */ + continue; } else { virReportError(VIR_ERR_OPERATION_UNSUPPORTED, _("cannot migrate a domain with <hostdev mode='subsystem' type='%1$s'>"), Or even something like this generic to the hostdev as long as it's assigned to a variant driver. For any off shot case that isn't a migration vfio driver then Qemu will fail for us as it will fail to probe the right things e.g. (below is totally untested, it is just for illustration purposes, we can probably do away with just VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO and let Qemu block migration for us?) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index d21551ab07d9..1f29493aa0c2 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1301,6 +1301,8 @@ qemuMigrationSrcIsAllowedHostdev(const virDomainDef *def) return false; case VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI: + g_autoptr(virPCIDevice) pci = NULL; + /* * if the device has a <teaming type='transient'> * element, then migration *is* allowed because the @@ -1316,6 +1318,22 @@ qemuMigrationSrcIsAllowedHostdev(const virDomainDef *def) continue; } + if ((hostdev->source.subsys.u.pci.backend == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_VFIO) && + !virHostdevGetPCIHostDevice(hostdev, &pci) && pci) { + g_autofree char *drvPath = NULL; + g_autofree char *drvName = NULL; + + /* + * Some VFIO network devices can be migrated if non standard + * VFIO driver is used. We leave it up to the live migration + * blocker in Qemu to stop us from proceeding. + */ + if (!virPCIDeviceGetCurrentDriverPathAndName(pci, &drvPath, + &drvName) && + !STREQ(drvName, "vfio-pci")) + continue; + } + /* all other PCI hostdevs can't be migrated */ if (hostdev->parentnet) { virDomainNetType actualType = virDomainNetGetActualType(hostdev->parentnet);
participants (3)
-
Joao Martins
-
Laine Stump
-
Michal Prívozník