[PATCH v2] util: basic support for VFIO variant drivers

Before a PCI device can be assigned to a guest with VFIO, that device must be bound to the vfio-pci driver rather than to the device's normal driver. The vfio-pci driver provides APIs that permit QEMU to perform all the necessary operations to make the device accessible to the guest.

There has been kernel work recently to support vendor/device-specific VFIO variant drivers that provide the basic vfio-pci driver functionality while adding support for device-specific operations (for example, these device-specific drivers are planned to support live migration of certain devices). All that will be needed to make this functionality available is to bind the new vendor-specific driver to the device (rather than the generic vfio-pci driver, which will continue to work, just without the extra functionality).

But until now libvirt has required that all PCI devices being assigned to a guest with VFIO have specifically the "vfio-pci" driver bound to the device. So even if the user manually binds a shiny new vendor-specific vfio variant driver to the device (and puts "managed='no'" in the config to prevent libvirt from changing the binding), libvirt will fail during startup of the guest (or during hotplug) because the driver bound to the device isn't exactly "vfio-pci".

This patch loosens that restriction a bit - rather than requiring that the device be bound to "vfio-pci", it also checks whether the driver name contains the string "vfio" at all, and in that case allows the operation to continue. If the driver is in fact a VFIO variant, the assignment will succeed; if it is not a VFIO variant, QEMU will fail (and report the error back to libvirt).

In the near future (possibly by kernel 6.0) there will be a formal method of identifying a VFIO variant driver by looking in sysfs; in the meantime the inexact, but simple, method in this patch will allow users of the few existing VFIO variant drivers (and developers of new VFIO variant drivers) to use their new drivers without needing to remove libvirt from their setup - they can simply pre-bind the device to the new driver, then use "managed='no'" in their libvirt config.

NB: this patch does *not* handle automatically determining the proper vendor-specific driver and binding to it in the case of "managed='yes'". That will be implemented later, when there is a widely available driver/device combo we can use for testing.

Signed-off-by: Laine Stump <laine@redhat.com>
---
V1 here: https://listman.redhat.com/archives/libvir-list/2022-August/233327.html

Change in V2:

V1 used the output of modinfo to look for "vfio_pci" as an alias for a driver to see if it was a VFIO variant driver.

As a result of discussion of V1, V2 is much simpler - it just assumes that any driver with "vfio" in the name is a VFIO variant. This is okay because 1) QEMU will still catch it, and libvirt will properly log the error, if the driver isn't actually a VFIO variant, and 2) it's a temporary situation, just to enable use of VFIO variant drivers with libvirt until a standard method of detecting this is added to sysfs (which, according to the discussion of V1, is coming in the near future).

(NB: I did implement checking of /lib/modules/`uname -r`/modules.alias as suggested by Erik, but it turned out that this caused the unit tests to call uname(3) and open the modules.alias file on the test host - for a proper unit test I would have also needed to mock those two functions, and it seemed like too much complexity for a temporary workaround. I've implemented Jason's suggestion here (accept any driver with "vfio" in the name), which is similar to danpb's suggestion (accept specifically the two drivers that are already in the upstream kernel), but will also allow for new drivers that may be under development.)

 src/hypervisor/virhostdev.c | 26 ++++---------
 src/util/virpci.c           | 76 ++++++++++++++++++++++++++++++++++---
 src/util/virpci.h           |  3 ++
 3 files changed, 82 insertions(+), 23 deletions(-)

diff --git a/src/hypervisor/virhostdev.c b/src/hypervisor/virhostdev.c
index c0ce867596..15b35fa75e 100644
--- a/src/hypervisor/virhostdev.c
+++ b/src/hypervisor/virhostdev.c
@@ -747,9 +747,8 @@ virHostdevPreparePCIDevicesImpl(virHostdevManager *mgr,
                                              mgr->inactivePCIHostdevs) < 0)
                 goto reattachdevs;
         } else {
-            g_autofree char *driverPath = NULL;
-            g_autofree char *driverName = NULL;
-            int stub;
+            g_autofree char *drvName = NULL;
+            virPCIStubDriver drvType;
 
             /* Unmanaged devices should already have been marked as
              * inactive: if that's the case, we can simply move on */
@@ -769,18 +768,14 @@ virHostdevPreparePCIDevicesImpl(virHostdevManager *mgr,
              * information about active / inactive device across
              * daemon restarts has been implemented */
 
-            if (virPCIDeviceGetDriverPathAndName(pci,
-                                                 &driverPath, &driverName) < 0)
+            if (virPCIDeviceGetDriverNameAndType(pci, &drvName, &drvType) < 0)
                 goto reattachdevs;
 
-            stub = virPCIStubDriverTypeFromString(driverName);
-
-            if (stub > VIR_PCI_STUB_DRIVER_NONE &&
-                stub < VIR_PCI_STUB_DRIVER_LAST) {
+            if (drvType > VIR_PCI_STUB_DRIVER_NONE) {
 
                 /* The device is bound to a known stub driver: store this
                  * information and add a copy to the inactive list */
-                virPCIDeviceSetStubDriver(pci, stub);
+                virPCIDeviceSetStubDriver(pci, drvType);
 
                 VIR_DEBUG("Adding PCI device %s to inactive list",
                           virPCIDeviceGetName(pci));
@@ -2292,18 +2287,13 @@ virHostdevPrepareOneNVMeDevice(virHostdevManager *hostdev_mgr,
     /* Let's check if all PCI devices are NVMe disks. */
     for (i = 0; i < virPCIDeviceListCount(pciDevices); i++) {
         virPCIDevice *pci = virPCIDeviceListGet(pciDevices, i);
-        g_autofree char *drvPath = NULL;
         g_autofree char *drvName = NULL;
-        int stub = VIR_PCI_STUB_DRIVER_NONE;
+        virPCIStubDriver drvType;
 
-        if (virPCIDeviceGetDriverPathAndName(pci, &drvPath, &drvName) < 0)
+        if (virPCIDeviceGetDriverNameAndType(pci, &drvName, &drvType) < 0)
             goto cleanup;
 
-        if (drvName)
-            stub = virPCIStubDriverTypeFromString(drvName);
-
-        if (stub == VIR_PCI_STUB_DRIVER_VFIO ||
-            STREQ_NULLABLE(drvName, "nvme"))
+        if (drvType == VIR_PCI_STUB_DRIVER_VFIO || STREQ_NULLABLE(drvName, "nvme"))
             continue;
 
         VIR_WARN("Suspicious NVMe disk assignment. PCI device "
diff --git a/src/util/virpci.c b/src/util/virpci.c
index 7800966963..51ccf4d9fd 100644
--- a/src/util/virpci.c
+++ b/src/util/virpci.c
@@ -277,6 +277,71 @@ virPCIDeviceGetDriverPathAndName(virPCIDevice *dev, char **path, char **name)
 }
 
 
+/**
+ * virPCIDeviceGetDriverNameAndType:
+ * @dev: virPCIDevice object to examine
+ * @drvName: returns name of driver bound to this device (if any)
+ * @drvType: returns type of driver if it is a known stub driver type
+ *
+ * Find the name of the driver bound to @dev (if any) and the type of
+ * the driver if it is a known/recognized "stub" driver (based on the
+ * driver name).
+ *
+ * There are vfio "variant" drivers that provide all the basic
+ * functionality of the standard vfio-pci driver as well as additional
+ * stuff. There is a plan to add info to sysfs that will allow easily
+ * determining if a driver is a vfio variant driver, but that sysfs
+ * entry isn't yet available. In the meantime, as a workaround so that
+ * the few existing vfio variant drivers can be used with libvirt, and
+ * so that driver developers can test their new vfio variant drivers
+ * without needing to bypass libvirt, we also check if the driver name
+ * contains the string "vfio"; if it does, then we consider this driver
+ * as type VFIO. This can lead to false positives, but that isn't a
+ * horrible thing, because the problem will still be caught by QEMU as
+ * soon as libvirt makes the request to attach the device.
+ *
+ * Return 0 on success, -1 on failure. If -1 is returned, then an error
+ * message has been logged.
+ */
+int
+virPCIDeviceGetDriverNameAndType(virPCIDevice *dev,
+                                 char **drvName,
+                                 virPCIStubDriver *drvType)
+{
+    g_autofree char *drvPath = NULL;
+    int tmpType;
+
+    if (virPCIDeviceGetDriverPathAndName(dev, &drvPath, drvName) < 0)
+        return -1;
+
+    if (!*drvName) {
+        *drvType = VIR_PCI_STUB_DRIVER_NONE;
+        return 0;
+    }
+
+    tmpType = virPCIStubDriverTypeFromString(*drvName);
+
+    if (tmpType > VIR_PCI_STUB_DRIVER_NONE) {
+        *drvType = tmpType;
+        return 0; /* exact match of a known driver name (or no name) */
+    }
+
+    /* Check if the drivername contains "vfio" and count as a VFIO
+     * driver if so - see above for explanation.
+     */
+    if (strstr(*drvName, "vfio")) {
+        VIR_DEBUG("Driver %s is a vfio_pci driver", *drvName);
+        *drvType = VIR_PCI_STUB_DRIVER_VFIO;
+    } else {
+        VIR_DEBUG("Driver %s is NOT a vfio_pci driver", *drvName);
+        *drvType = VIR_PCI_STUB_DRIVER_NONE;
+    }
+
+    return 0;
+}
+
+
 static int
 virPCIDeviceConfigOpenInternal(virPCIDevice *dev, bool readonly, bool fatal)
 {
@@ -1004,8 +1069,8 @@ virPCIDeviceReset(virPCIDevice *dev,
                   virPCIDeviceList *activeDevs,
                   virPCIDeviceList *inactiveDevs)
 {
-    g_autofree char *drvPath = NULL;
     g_autofree char *drvName = NULL;
+    virPCIStubDriver drvType;
     int ret = -1;
     int fd = -1;
     int hdrType = -1;
@@ -1032,15 +1097,16 @@ virPCIDeviceReset(virPCIDevice *dev,
      * reset it whenever appropriate, so doing it ourselves would just
      * be redundant.
      */
-    if (virPCIDeviceGetDriverPathAndName(dev, &drvPath, &drvName) < 0)
+    if (virPCIDeviceGetDriverNameAndType(dev, &drvName, &drvType) < 0)
         goto cleanup;
 
-    if (virPCIStubDriverTypeFromString(drvName) == VIR_PCI_STUB_DRIVER_VFIO) {
-        VIR_DEBUG("Device %s is bound to vfio-pci - skip reset",
-                  dev->name);
+    if (drvType == VIR_PCI_STUB_DRIVER_VFIO) {
+
+        VIR_DEBUG("Device %s is bound to %s - skip reset", dev->name, drvName);
         ret = 0;
         goto cleanup;
     }
+
     VIR_DEBUG("Resetting device %s", dev->name);
 
     if ((fd = virPCIDeviceConfigOpenWrite(dev)) < 0)
diff --git a/src/util/virpci.h b/src/util/virpci.h
index 4d9193f24e..0532b90f90 100644
--- a/src/util/virpci.h
+++ b/src/util/virpci.h
@@ -280,6 +280,9 @@ int virPCIDeviceRebind(virPCIDevice *dev);
 int virPCIDeviceGetDriverPathAndName(virPCIDevice *dev,
                                      char **path,
                                      char **name);
+int virPCIDeviceGetDriverNameAndType(virPCIDevice *dev,
+                                     char **drvName,
+                                     virPCIStubDriver *drvType);
 int virPCIDeviceIsPCIExpress(virPCIDevice *dev);
 int virPCIDeviceHasPCIExpressLink(virPCIDevice *dev);
-- 
2.37.1
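For illustration, here is roughly what the "pre-bind the driver, then managed='no'" workflow from the commit message looks like from the user's side. The variant driver name "vfio-pci-vendorXYZ" and the PCI address are hypothetical placeholders, not anything from this patch:

echo 0000:06:00.0 > /sys/bus/pci/devices/0000:06:00.0/driver/unbind
echo vfio-pci-vendorXYZ > /sys/bus/pci/devices/0000:06:00.0/driver_override
echo 0000:06:00.0 > /sys/bus/pci/drivers_probe

and then the device is referenced in the domain XML with managed='no' so that libvirt leaves the existing binding alone:

<hostdev mode='subsystem' type='pci' managed='no'>
  <source>
    <address domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
  </source>
</hostdev>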

ping.

I have a different version of this patch where I do read the modules.alias file rather than just checking the name of the driver, but that also requires "double mocking" open() in the unit test, which wasn't working properly, and I'd rather not spend the time figuring it out if it's not going to be needed. (Alex prefers that version because it is more correct than just checking the name, and he's concerned that the new sysfs-based API may take longer than we're thinking to get into downstream distros, but the version in this patch does satisfy both Jason's and Daniel's suggested implementations.) Anyway, I can post the other patch if anyone is interested.

On 8/23/22 16:11, Laine Stump wrote:
ping.
If you want I can help you with the mocking issue. But for this patch - I don't have a strong opinion either way, sorry. Maybe I lean more towards this solution, because it's simpler.

Michal

On Tue, 23 Aug 2022 10:11:32 -0400 Laine Stump <laine@redhat.com> wrote:
ping.
Yeah, I'm still not a fan of this approach. We're essentially inventing a requirement in libvirt for a kernel driver naming convention, because it happens to work. For now. Hacky temporary solutions have been known to be longer lived than anticipated. This eventually deteriorates into managing a list of drivers that don't meet the convention, frustrating developers unaware of this arbitrary requirement and/or delaying usability through libvirt. Thanks,

Alex

On Tue, Aug 23, 2022 at 01:51:33PM -0600, Alex Williamson wrote:
Kevin & co have some patches already to do the struct device sysfs; I hope he can post them, they looked quite close last I saw. Let's give him a few weeks - having that in hand removes the worry about endless hackiness into the future.

Jason

Hey Laine,

On 23/08/2022 15:11, Laine Stump wrote:
ping.
[sorry for the thread necromancy]

I was wondering if you're planning on respinning this work, or rather the alternative modalias approach you mention? Or perhaps we are waiting for the cdev sysfs? Though, there's still a significant number of kernel versions to cover that won't have the sysfs entry, sadly :(

On Wed, May 31, 2023 at 03:18:17PM +0100, Joao Martins wrote:
The cdev sysfs has been merged to the kernel now

Jason

On 31/05/2023 15:31, Jason Gunthorpe wrote:
The cdev sysfs has been merged to the kernel now
Ugh, I got confused with something else entirely. Yes, it's there since commit 3c28a76124b258 ("vfio: Add struct device to vfio_device"), v6.1 essentially.

On 5/31/23 10:31 AM, Jason Gunthorpe wrote:
[sorry for the thread necromancy]
Heh. I had actually dug out this same thread last week and started a mail to ask Jason if the planned sysfs stuff had ever been pushed, but then forgot to hit "send" :-)

Now that there are multiple vfio variant drivers available (for igb, e1000e, and mlx5 that I know of), it's getting more important to have them usable with libvirt, so I'm hoping to update this patch to use sysfs to determine whether the driver is a vfio variant.
The cdev sysfs has been merged to the kernel now
Jason, can you point me at the information for this patch in an ELI5 manner for a non-kernel person? (including what upstream kernel it's in, and what it is that I need to look at to determine if a driver is a vfio variant). Thanks!

On Wed, May 31, 2023 at 01:18:44PM -0400, Laine Stump wrote:
Now that there are multiple vfio variant drivers available (for igb, e1000e, and mlx5 that I know of),
Oh I haven't seen those intel ones posted yet?
Jason, can you point me at the information for this patch in an ELI5 manner for a non-kernel person? (including what upstream kernel it's in, and what it is that I need to look at to determine if a driver is a vfio variant).
It is this patch:

commit 3c28a76124b25882411f005924be73795b6ef078
Author: Yi Liu <yi.l.liu@intel.com>
Date:   Wed Sep 21 18:44:01 2022 +0800

    vfio: Add struct device to vfio_device and replace kref.

    With it a 'vfio-dev/vfioX' node is created under the sysfs path of
    the parent, indicating the device is bound to a vfio driver, e.g.:

    /sys/devices/pci0000\:6f/0000\:6f\:01.0/vfio-dev/vfio0

    It is also a preparatory step toward adding cdev for supporting
    future device-oriented uAPI.

    Add Documentation/ABI/testing/sysfs-devices-vfio-dev.

    Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
    Signed-off-by: Yi Liu <yi.l.liu@intel.com>
    Signed-off-by: Kevin Tian <kevin.tian@intel.com>
    Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
    Link: https://lore.kernel.org/r/20220921104401.38898-16-kevin.tian@intel.com
    Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

$ git describe --contains 3c28a76124b25882411f005924be73795b6ef078
v6.1-rc1~42^2~35

So it is in v6.1-rc1.

libvirt should start thinking about determining the vfioX number for the device; we will need that for iommufd enablement eventually.

So, stat for a directory like this:

/sys/devices/pci0000\:6f/0000\:6f\:01.0/vfio-dev

Confirms vfio. Then scan it to get 'vfioX', which will eventually be the /dev/ node libvirt will have to open.

And the other part is that something in the stack should use the modalias mechanism to find, load, and bind the correct variant driver.

Jason
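As a rough C sketch of the check described above (illustrative only - not libvirt code; the function name and buffer sizes are made up, and error reporting is elided):

#include <stdio.h>
#include <string.h>
#include <dirent.h>
#include <sys/stat.h>

/* Return 1 and fill in @vfioName (e.g. "vfio0") if the device at
 * @sysfsPath (e.g. "/sys/bus/pci/devices/0000:6f:01.0") has a
 * vfio-dev/ subdirectory, i.e. is bound to a vfio driver; else 0. */
static int
deviceIsBoundToVFIO(const char *sysfsPath, char *vfioName, size_t len)
{
    char dirPath[1024];
    struct stat sb;
    DIR *dir;
    struct dirent *ent;
    int found = 0;

    snprintf(dirPath, sizeof(dirPath), "%s/vfio-dev", sysfsPath);
    if (stat(dirPath, &sb) < 0 || !S_ISDIR(sb.st_mode))
        return 0; /* no vfio-dev directory - not bound to a vfio driver */

    if (!(dir = opendir(dirPath)))
        return 0;
    while ((ent = readdir(dir)) != NULL) {
        if (strncmp(ent->d_name, "vfio", 4) == 0) {
            snprintf(vfioName, len, "%s", ent->d_name); /* the vfioX node */
            found = 1;
            break;
        }
    }
    closedir(dir);
    return found;
}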

On Wed, 31 May 2023 14:30:52 -0300 Jason Gunthorpe <jgg@nvidia.com> wrote:
I'd forgotten about this as well, so after binding a driver to a device we can tell if that driver presents a vfio interface by looking for this sysfs directory. Prior to binding to a device, we can only know if a driver provides a vfio interface through modalias.

Also note that we're saying "vfio" not "vfio-pci". Only the mdev interface has the device_api attribute to indicate the exported vfio device interface. The "vfio_pci:" match in modalias indicates a vfio PCI driver, not necessarily a driver that provides a vfio-pci API. We have no current examples to the contrary, but for instance I wouldn't recommend validating whether mode='subsystem' type='pci' is appropriate based on that information. Thanks,

Alex

On Wed, May 31, 2023 at 02:40:01PM -0600, Alex Williamson wrote:
modalias was designed so you take the /sys/.../modalias file, prepend vfio_, then do a standard modalias search on that string. The matching module should be loaded, and the module name bound to the device as the driver name. There should be no bus-type dependencies in any of this in management code.

Jason
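A short C sketch of that mechanism (illustrative only; the actual search of modules.alias, which picks the match with the fewest wildcards, is elided):

#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Read the device's modalias (e.g. from
 * /sys/bus/pci/devices/DDDD:BB:SS.F/modalias) and build the
 * "vfio_"-prefixed string to look up in
 * /lib/modules/$(uname -r)/modules.alias. */
static int
buildVFIOModalias(const char *sysfsDevPath, char *alias, size_t aliaslen)
{
    char path[PATH_MAX];
    char modalias[256];
    FILE *fp;

    snprintf(path, sizeof(path), "%s/modalias", sysfsDevPath);
    if (!(fp = fopen(path, "r")))
        return -1;
    if (!fgets(modalias, sizeof(modalias), fp)) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    modalias[strcspn(modalias, "\n")] = '\0';

    /* e.g. "pci:v00008086d00002723..." becomes
     * "vfio_pci:v00008086d00002723..." */
    snprintf(alias, aliaslen, "vfio_%s", modalias);
    return 0;
}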

On Wed, 31 May 2023 17:46:50 -0300 Jason Gunthorpe <jgg@nvidia.com> wrote:
For example, the modalias of a random wifi adapter:

pci:v00008086d00002723sv00008086sd00000084bc02sc80i00

The bus name is prepended because the encoding is bus specific. Yes, code doesn't really need to interpret that; it simply adds "vfio_" to the beginning of the string and finds the matching driver with the fewest wildcards in modules.alias. We are not code, though - we have a vfio_pci driver, a vfio-pci device API, and a vfio_pci: modalias prefix, and it's easy to get them confused and infer information that isn't intended. All I'm trying (poorly) to clarify is that a vfio_pci: modalias prefix only indicates a vfio driver for a PCI device. It does not guarantee that the vfio device API exposed to userspace is vfio-pci. Therefore management tools should be cautious about making assumptions regarding the type of device the VM will see, even though we've got vfio-pci written all over the place. Thanks,

Alex

On Wed, May 31, 2023 at 03:34:11PM -0600, Alex Williamson wrote:
Yes. The vfio-pci API should be learned in-band with some ioctl on an opened device? Jason
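For reference, the existing vfio uAPI already has something along these lines: VFIO_DEVICE_GET_INFO sets VFIO_DEVICE_FLAGS_PCI for devices exposing the vfio-pci API. A sketch, assuming the vfio device fd has already been opened:

#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Return 1 if the opened vfio device exposes the vfio-pci uAPI,
 * 0 if not, -1 on ioctl failure. */
static int
deviceExposesVFIOPCI(int device_fd)
{
    struct vfio_device_info info = { .argsz = sizeof(info) };

    if (ioctl(device_fd, VFIO_DEVICE_GET_INFO, &info) < 0)
        return -1;

    return !!(info.flags & VFIO_DEVICE_FLAGS_PCI);
}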
participants (5): Alex Williamson, Jason Gunthorpe, Joao Martins, Laine Stump, Michal Prívozník