ping.
I have a different version of this patch where I do read the modules.alias file
rather than just checking the name of the driver, but that also requires "double
mocking" open() in the unit test, which wasn't working properly, and I'd
rather
not spend the time figuring it out if it's not going to be needed. (Alex prefers
that version because it is more correct than just checking the name, and he's
concerned that the new sysfs-based API may take longer than we're thinking to
get into downstream distros, but the version in this patch does satisfy both
Jason and Daniel's suggested implementations). Anyway, I can post the other
patch if anyone is interested.
[sorry for the thread necromancy]
I was wondering if you're planning on respinning this work, or rather the
modalias approach alternative you mention? Or perhaps we are waiting for the
cdev sysfs? Though, there's still a significant amount of kernel versions to
cover that won't have the sysfs entry sadly :(
On 8/11/22 1:00 AM, Laine Stump wrote:
> Before a PCI device can be assigned to a guest with VFIO, that device
> must be bound to the vfio-pci driver rather than to the device's
> normal driver. The vfio-pci driver provides APIs that permit QEMU to
> perform all the necessary operations to make the device accessible to
> the guest.
>
> There has been kernel work recently to support vendor/device-specific
> VFIO variant drivers that provide the basic vfio-pci driver functionality
> while adding support for device-specific operations (for example these
> device-specific drivers are planned to support live migration of
> certain devices). All that will be needed to make this functionality
> available will be to bind the new vendor-specific driver to the device
> (rather than the generic vfio-pci driver, which will continue to work
> just without the extra functionality).
>
> But until now libvirt has required that all PCI devices being assigned
> to a guest with VFIO specifically have the "vfio-pci" driver bound to
> the device. So even if the user manually binds a shiny new
> vendor-specific vfio variant driver to the device (and puts
> "managed='no'" in the config to prevent libvirt from changing the
> binding), libvirt will just fail during startup of the guest (or
> during hotplug) because the driver bound to the device isn't exactly
> "vfio-pci".
>
> This patch loosens that restriction a bit - rather than requiring that
> the device be bound to "vfio-pci", it also checks if the drivername
> contains the string "vfio" at all, and in this case allows the
> operation to continue. If the driver is in fact a VFIO variant, then
> the assignment will succeed, but if it is not a VFIO variant then QEMU
> will fail (and report the error back to libvirt).
>
> In the near future (possibly by kernel 6.0) there will be a
> formal method of identifying a VFIO variant driver by looking in
> sysfs; in the meantime the inexact, but simple, method in this patch
> will allow users of the few existing VFIO variant drivers (and
> developers of new VFIO variant drivers) to use their new drivers
> without needing to remove libvirt from their setup - they can simply
> pre-bind the device to the new driver, then use "managed='no'" in
> their libvirt config.
>
> NB: this patch does *not* handle automatically determining the proper
> vendor-specific driver and binding to it in the case of
> "managed='yes'". This will be implemented later when there is a
widely
> available driver / device combo we can use for testing.
>
> Signed-off-by: Laine Stump <laine(a)redhat.com>
> ---
> V1 here:
https://listman.redhat.com/archives/libvir-list/2022-August/233327.html
>
> Change in V2:
>
> V1 used the output of modinfo to look for "vfio_pci" as an alias for a
> driver to see if it was a VFIO variant driver.
>
> As a result of discussion of V1, V2 is much simpler - it just assumes
> that any driver with "vfio" in the name is a VFIO variant. This is
> okay because 1) QEMU will still catch it and libvirt will properly log
> the error if the driver isn't actually a VFIO variant, and 2) it's a
> temporary situation, just to enable use of VFIO variant drivers with
> libvirt until a standard method of detecting this is added to sysfs
> (which, according to the discussion of V1, is coming in the near
> future).
>
> (NB: I did implement checking of /lib/modules/`uname -r`/modules.alias
> as suggested by Erik, but it turned out that this caused the unit
> tests to call uname(3) and open the modules.alias file on the test
> host - for a proper unit test I would have also needed to mock these
> two functions, and it seemed like too much complexity for a temporary
> workaround. I've implemented Jason's suggestion here (accept any
> driver with "vfio" in the name), which is similar to danpb's
> suggestion (accept specifically the two drivers that are already in
> the upstream kernel), but will also allow for new drivers that may be
> under development.)
>
> src/hypervisor/virhostdev.c | 26 ++++---------
> src/util/virpci.c | 76 ++++++++++++++++++++++++++++++++++---
> src/util/virpci.h | 3 ++
> 3 files changed, 82 insertions(+), 23 deletions(-)
>
> diff --git a/src/hypervisor/virhostdev.c b/src/hypervisor/virhostdev.c
> index c0ce867596..15b35fa75e 100644
> --- a/src/hypervisor/virhostdev.c
> +++ b/src/hypervisor/virhostdev.c
> @@ -747,9 +747,8 @@ virHostdevPreparePCIDevicesImpl(virHostdevManager *mgr,
> mgr->inactivePCIHostdevs) < 0)
> goto reattachdevs;
> } else {
> - g_autofree char *driverPath = NULL;
> - g_autofree char *driverName = NULL;
> - int stub;
> + g_autofree char *drvName = NULL;
> + virPCIStubDriver drvType;
> /* Unmanaged devices should already have been marked as
> * inactive: if that's the case, we can simply move on */
> @@ -769,18 +768,14 @@ virHostdevPreparePCIDevicesImpl(virHostdevManager *mgr,
> * information about active / inactive device across
> * daemon restarts has been implemented */
> - if (virPCIDeviceGetDriverPathAndName(pci,
> - &driverPath, &driverName)
< 0)
> + if (virPCIDeviceGetDriverNameAndType(pci, &drvName, &drvType)
< 0)
> goto reattachdevs;
> - stub = virPCIStubDriverTypeFromString(driverName);
> -
> - if (stub > VIR_PCI_STUB_DRIVER_NONE &&
> - stub < VIR_PCI_STUB_DRIVER_LAST) {
> + if (drvType > VIR_PCI_STUB_DRIVER_NONE) {
> /* The device is bound to a known stub driver: store this
> * information and add a copy to the inactive list */
> - virPCIDeviceSetStubDriver(pci, stub);
> + virPCIDeviceSetStubDriver(pci, drvType);
> VIR_DEBUG("Adding PCI device %s to inactive list",
> virPCIDeviceGetName(pci));
> @@ -2292,18 +2287,13 @@ virHostdevPrepareOneNVMeDevice(virHostdevManager
> *hostdev_mgr,
> /* Let's check if all PCI devices are NVMe disks. */
> for (i = 0; i < virPCIDeviceListCount(pciDevices); i++) {
> virPCIDevice *pci = virPCIDeviceListGet(pciDevices, i);
> - g_autofree char *drvPath = NULL;
> g_autofree char *drvName = NULL;
> - int stub = VIR_PCI_STUB_DRIVER_NONE;
> + virPCIStubDriver drvType;
> - if (virPCIDeviceGetDriverPathAndName(pci, &drvPath, &drvName) <
0)
> + if (virPCIDeviceGetDriverNameAndType(pci, &drvName, &drvType) <
0)
> goto cleanup;
> - if (drvName)
> - stub = virPCIStubDriverTypeFromString(drvName);
> -
> - if (stub == VIR_PCI_STUB_DRIVER_VFIO ||
> - STREQ_NULLABLE(drvName, "nvme"))
> + if (drvType == VIR_PCI_STUB_DRIVER_VFIO || STREQ_NULLABLE(drvName,
> "nvme"))
> continue;
> VIR_WARN("Suspicious NVMe disk assignment. PCI device "
> diff --git a/src/util/virpci.c b/src/util/virpci.c
> index 7800966963..51ccf4d9fd 100644
> --- a/src/util/virpci.c
> +++ b/src/util/virpci.c
> @@ -277,6 +277,71 @@ virPCIDeviceGetDriverPathAndName(virPCIDevice *dev, char
> **path, char **name)
> }
> +/**
> + * virPCIDeviceGetDriverNameAndType:
> + * @dev: virPCIDevice object to examine
> + * @drvName: returns name of driver bound to this device (if any)
> + * @drvType: returns type of driver if it is a known stub driver type
> + *
> + * Find the name of the driver bound to @dev (if any) and the type of
> + * the driver if it is a known/recognized "stub" driver (based on the
> + * driver name).
> + *
> + * There are vfio "variant" drivers that provide all the basic
> + * functionality of the standard vfio-pci driver as well as additional
> + * stuff. There is a plan to add info to sysfs that will allow easily
> + * determining if a driver is a vfio variant driver, but that sysfs
> + * entry isn't yet available. In the meantime as a workaround so that
> + * the few existing vfio variant drivers can be used with libvirt, and
> + * so that driver developers can test their new vfio variant drivers
> + * without needing to bypass libvirt, we also check if the driver name
> + * contains the string "vfio"; if it does, then we consider this drier
> + * as type VFIO. This can lead to false positives, but that isn't a
> + * horrible thing, because the problem will still be caught by QEMU as
> + * soon as libvirt makes the request to attach the device.
> + *
> + * Return 0 on success, -1 on failure. If -1 is returned, then an error
> + * message has been logged.
> + */
> +int
> +virPCIDeviceGetDriverNameAndType(virPCIDevice *dev,
> + char **drvName,
> + virPCIStubDriver *drvType)
> +{
> + g_autofree char *drvPath = NULL;
> + int tmpType;
> +
> + if (virPCIDeviceGetDriverPathAndName(dev, &drvPath, drvName) < 0)
> + return -1;
> +
> + if (!*drvName) {
> + *drvType = VIR_PCI_STUB_DRIVER_NONE;
> + return 0;
> + }
> +
> + tmpType = virPCIStubDriverTypeFromString(*drvName);
> +
> + if (tmpType > VIR_PCI_STUB_DRIVER_NONE) {
> + *drvType = tmpType;
> + return 0; /* exact match of a known driver name (or no name) */
> + }
> +
> + /* Check if the drivername contains "vfio" and count as a VFIO
> + * driver if so - see above for explanation.
> + */
> +
> + if (strstr(*drvName, "vfio")) {
> + VIR_DEBUG("Driver %s is a vfio_pci driver", *drvName);
> + *drvType = VIR_PCI_STUB_DRIVER_VFIO;
> + } else {
> + VIR_DEBUG("Driver %s is NOT a vfio_pci driver", *drvName);
> + *drvType = VIR_PCI_STUB_DRIVER_NONE;
> + }
> +
> + return 0;
> +}
> +
> +
> static int
> virPCIDeviceConfigOpenInternal(virPCIDevice *dev, bool readonly, bool fatal)
> {
> @@ -1004,8 +1069,8 @@ virPCIDeviceReset(virPCIDevice *dev,
> virPCIDeviceList *activeDevs,
> virPCIDeviceList *inactiveDevs)
> {
> - g_autofree char *drvPath = NULL;
> g_autofree char *drvName = NULL;
> + virPCIStubDriver drvType;
> int ret = -1;
> int fd = -1;
> int hdrType = -1;
> @@ -1032,15 +1097,16 @@ virPCIDeviceReset(virPCIDevice *dev,
> * reset it whenever appropriate, so doing it ourselves would just
> * be redundant.
> */
> - if (virPCIDeviceGetDriverPathAndName(dev, &drvPath, &drvName) < 0)
> + if (virPCIDeviceGetDriverNameAndType(dev, &drvName, &drvType) < 0)
> goto cleanup;
> - if (virPCIStubDriverTypeFromString(drvName) == VIR_PCI_STUB_DRIVER_VFIO) {
> - VIR_DEBUG("Device %s is bound to vfio-pci - skip reset",
> - dev->name);
> + if (drvType == VIR_PCI_STUB_DRIVER_VFIO) {
> +
> + VIR_DEBUG("Device %s is bound to %s - skip reset", dev->name,
drvName);
> ret = 0;
> goto cleanup;
> }
> +
> VIR_DEBUG("Resetting device %s", dev->name);
> if ((fd = virPCIDeviceConfigOpenWrite(dev)) < 0)
> diff --git a/src/util/virpci.h b/src/util/virpci.h
> index 4d9193f24e..0532b90f90 100644
> --- a/src/util/virpci.h
> +++ b/src/util/virpci.h
> @@ -280,6 +280,9 @@ int virPCIDeviceRebind(virPCIDevice *dev);
> int virPCIDeviceGetDriverPathAndName(virPCIDevice *dev,
> char **path,
> char **name);
> +int virPCIDeviceGetDriverNameAndType(virPCIDevice *dev,
> + char **drvName,
> + virPCIStubDriver *drvType);
> int virPCIDeviceIsPCIExpress(virPCIDevice *dev);
> int virPCIDeviceHasPCIExpressLink(virPCIDevice *dev);