[PATCH] util: basic support for vendor-specific vfio drivers

Before a PCI device can be assigned to a guest with VFIO, that device must be bound to the vfio-pci driver rather than to the device's normal driver. The vfio-pci driver provides APIs that permit QEMU to perform all the necessary operations to make the device accessible to the guest.
There has been kernel work recently to support vendor/device-specific VFIO drivers that provide the basic vfio-pci driver functionality while adding support for device-specific operations (for example these device-specific drivers are planned to support live migration of certain devices). All that will be needed to make this functionality available will be to bind the new vendor-specific driver to the device (rather than the generic vfio-pci driver, which will continue to work just without the extra functionality).
But until now libvirt has required that all PCI devices being assigned to a guest with VFIO specifically have the "vfio-pci" driver bound to the device. So even if the user manually binds a shiny new vendor-specific driver to the device (and puts "managed='no'" in the config to prevent libvirt from changing that), libvirt will just fail during startup of the guest (or during hotplug) because the driver bound to the device isn't named exactly "vfio-pci".
Fortunately these new vendor/device-specific drivers can be easily identified as being "vfio-pci + extra stuff" - all that's needed is to look at the output of the "modinfo $driver_name" command to see if "vfio_pci" is in the alias list for the driver.
That's what this patch does. When libvirt checks the driver bound to a device (either to decide if it needs to bind to a different driver or perform some other operation, or if the current driver is acceptable as-is), if the driver isn't specifically "vfio-pci", then it will look at the output of modinfo for the driver that *is* bound to the device; if modinfo shows vfio_pci as an alias for that device, then we'll behave as if the driver was exactly "vfio-pci".
The effect of this patch is that users will now be able to pre-setup a device to be bound to a vendor-specific driver, then put "managed='no'" in the config and libvirt will allow that driver.
What this patch does *not* do is handle automatically determining the proper/best vendor-specific driver and binding to it in the case of "managed='yes'". This will be implemented later when there is a widely available driver / device combo we can use for testing. This initial patch is just something simple that will permit initial testing of the new drivers' functionality. (I personally had to add an extra patch playing with driver names to my build just to test that everything was working as expected; that's okay for a patch as simple as this, but wouldn't be acceptable testing for anything more complex.)
Signed-off-by: Laine Stump <laine@redhat.com>
---
 meson.build                 |  1 +
 src/hypervisor/virhostdev.c | 26 ++++-------
 src/util/virpci.c           | 90 ++++++++++++++++++++++++++++++++++---
 src/util/virpci.h           |  3 ++
 4 files changed, 97 insertions(+), 23 deletions(-)

diff --git a/meson.build b/meson.build
index de59b1be9c..9d96eb3ee3 100644
--- a/meson.build
+++ b/meson.build
@@ -822,6 +822,7 @@ optional_programs = [
   'iscsiadm',
   'mdevctl',
   'mm-ctl',
+  'modinfo',
   'modprobe',
   'ovs-vsctl',
   'pdwtags',
diff --git a/src/hypervisor/virhostdev.c b/src/hypervisor/virhostdev.c
index c0ce867596..15b35fa75e 100644
--- a/src/hypervisor/virhostdev.c
+++ b/src/hypervisor/virhostdev.c
@@ -747,9 +747,8 @@ virHostdevPreparePCIDevicesImpl(virHostdevManager *mgr,
                                          mgr->inactivePCIHostdevs) < 0)
                 goto reattachdevs;
         } else {
-            g_autofree char *driverPath = NULL;
-            g_autofree char *driverName = NULL;
-            int stub;
+            g_autofree char *drvName = NULL;
+            virPCIStubDriver drvType;
 
             /* Unmanaged devices should already have been marked as
              * inactive: if that's the case, we can simply move on */
@@ -769,18 +768,14 @@ virHostdevPreparePCIDevicesImpl(virHostdevManager *mgr,
              * information about active / inactive device across
              * daemon restarts has been implemented */
 
-            if (virPCIDeviceGetDriverPathAndName(pci,
-                                                 &driverPath, &driverName) < 0)
+            if (virPCIDeviceGetDriverNameAndType(pci, &drvName, &drvType) < 0)
                 goto reattachdevs;
 
-            stub = virPCIStubDriverTypeFromString(driverName);
-
-            if (stub > VIR_PCI_STUB_DRIVER_NONE &&
-                stub < VIR_PCI_STUB_DRIVER_LAST) {
+            if (drvType > VIR_PCI_STUB_DRIVER_NONE) {
 
                 /* The device is bound to a known stub driver: store this
                  * information and add a copy to the inactive list */
-                virPCIDeviceSetStubDriver(pci, stub);
+                virPCIDeviceSetStubDriver(pci, drvType);
 
                 VIR_DEBUG("Adding PCI device %s to inactive list",
                           virPCIDeviceGetName(pci));
@@ -2292,18 +2287,13 @@ virHostdevPrepareOneNVMeDevice(virHostdevManager *hostdev_mgr,
     /* Let's check if all PCI devices are NVMe disks. */
     for (i = 0; i < virPCIDeviceListCount(pciDevices); i++) {
         virPCIDevice *pci = virPCIDeviceListGet(pciDevices, i);
-        g_autofree char *drvPath = NULL;
         g_autofree char *drvName = NULL;
-        int stub = VIR_PCI_STUB_DRIVER_NONE;
+        virPCIStubDriver drvType;
 
-        if (virPCIDeviceGetDriverPathAndName(pci, &drvPath, &drvName) < 0)
+        if (virPCIDeviceGetDriverNameAndType(pci, &drvName, &drvType) < 0)
             goto cleanup;
 
-        if (drvName)
-            stub = virPCIStubDriverTypeFromString(drvName);
-
-        if (stub == VIR_PCI_STUB_DRIVER_VFIO ||
-            STREQ_NULLABLE(drvName, "nvme"))
+        if (drvType == VIR_PCI_STUB_DRIVER_VFIO || STREQ_NULLABLE(drvName, "nvme"))
             continue;
 
         VIR_WARN("Suspicious NVMe disk assignment. PCI device "
diff --git a/src/util/virpci.c b/src/util/virpci.c
index 7800966963..8b714d3ddc 100644
--- a/src/util/virpci.c
+++ b/src/util/virpci.c
@@ -37,6 +37,7 @@
 #include "virstring.h"
 #include "viralloc.h"
 #include "virpcivpd.h"
+#include "vircommand.h"
 
 VIR_LOG_INIT("util.pci");
 
@@ -277,6 +278,84 @@ virPCIDeviceGetDriverPathAndName(virPCIDevice *dev, char **path, char **name)
 }
 
+
+/**
+ * virPCIDeviceGetDriverNameAndType:
+ * @dev: virPCIDevice object to examine
+ * @drvName: returns name of driver bound to this device (if any)
+ * @drvType: returns type of driver if it is a known stub driver type
+ *
+ * Find the name of the driver bound to @dev (if any) and the type of
+ * the driver if it is a known/recognized "stub" driver (based on the
+ * name). If the name doesn't match one of the virPCIStubDrivers
+ * exactly, check the output of "modinfo vfio-pci" to see if
+ * "vfio_pci" is included in the driver's list of aliases; if so, then
+ * this driver has all the functionality of the basic vfio_pci driver,
+ * so it should be considered of the type VIR_PCI_STUB_DRIVER_VFIO.
+ *
+ * Return 0 on success, -1 on failure. If -1 is returned, then an error
+ * message has been logged.
+ */
+int
+virPCIDeviceGetDriverNameAndType(virPCIDevice *dev,
+                                 char **drvName,
+                                 virPCIStubDriver *drvType)
+{
+    g_autofree char *drvPath = NULL;
+    g_autoptr(virCommand) cmd = NULL;
+    g_autofree char *output = NULL;
+    g_autoptr(GRegex) regex = NULL;
+    g_autoptr(GError) err = NULL;
+    g_autoptr(GMatchInfo) info = NULL;
+    int exit;
+    int tmpType;
+
+    if (virPCIDeviceGetDriverPathAndName(dev, &drvPath, drvName) < 0)
+        return -1;
+
+    if (!*drvName) {
+        *drvType = VIR_PCI_STUB_DRIVER_NONE;
+        return 0;
+    }
+
+    tmpType = virPCIStubDriverTypeFromString(*drvName);
+
+    if (tmpType > VIR_PCI_STUB_DRIVER_NONE) {
+        *drvType = tmpType;
+        return 0; /* exact match of a known driver name (or no name) */
+    }
+
+    /* Check the output of "modinfo $drvName" to see if it has
+     * "vfio_pci" as an alias. If it does, then this driver should
+     * also be considered as a vfio-pci driver, because it implements
+     * all the functionality of the basic vfio-pci (plus additional
+     * device-specific stuff).
+     */
+
+    cmd = virCommandNewArgList(MODINFO, *drvName, NULL);
+    virCommandSetOutputBuffer(cmd, &output);
+    if (virCommandRun(cmd, &exit) < 0)
+        return -1;
+
+    regex = g_regex_new("^alias: +vfio_pci:", G_REGEX_MULTILINE, 0, &err);
+    if (!regex) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("Failed to compile regex %s"), err->message);
+        return -1;
+    }
+
+    g_regex_match(regex, output, 0, &info);
+    if (g_match_info_matches(info)) {
+        VIR_DEBUG("Driver %s is a vfio_pci driver", *drvName);
+        *drvType = VIR_PCI_STUB_DRIVER_VFIO;
+    } else {
+        VIR_DEBUG("Driver %s is NOT a vfio_pci driver", *drvName);
+        *drvType = VIR_PCI_STUB_DRIVER_NONE;
+    }
+
+    return 0;
+}
+
+
 static int
 virPCIDeviceConfigOpenInternal(virPCIDevice *dev, bool readonly, bool fatal)
 {
@@ -1004,8 +1083,8 @@ virPCIDeviceReset(virPCIDevice *dev,
                   virPCIDeviceList *activeDevs,
                   virPCIDeviceList *inactiveDevs)
 {
-    g_autofree char *drvPath = NULL;
     g_autofree char *drvName = NULL;
+    virPCIStubDriver drvType;
     int ret = -1;
     int fd = -1;
     int hdrType = -1;
@@ -1032,15 +1111,16 @@ virPCIDeviceReset(virPCIDevice *dev,
      * reset it whenever appropriate, so doing it ourselves would just
      * be redundant.
      */
-    if (virPCIDeviceGetDriverPathAndName(dev, &drvPath, &drvName) < 0)
+    if (virPCIDeviceGetDriverNameAndType(dev, &drvName, &drvType) < 0)
         goto cleanup;
 
-    if (virPCIStubDriverTypeFromString(drvName) == VIR_PCI_STUB_DRIVER_VFIO) {
-        VIR_DEBUG("Device %s is bound to vfio-pci - skip reset",
-                  dev->name);
+    if (drvType == VIR_PCI_STUB_DRIVER_VFIO) {
+
+        VIR_DEBUG("Device %s is bound to %s - skip reset", dev->name, drvName);
         ret = 0;
         goto cleanup;
     }
+
     VIR_DEBUG("Resetting device %s", dev->name);
 
     if ((fd = virPCIDeviceConfigOpenWrite(dev)) < 0)
diff --git a/src/util/virpci.h b/src/util/virpci.h
index 4d9193f24e..0532b90f90 100644
--- a/src/util/virpci.h
+++ b/src/util/virpci.h
@@ -280,6 +280,9 @@ int virPCIDeviceRebind(virPCIDevice *dev);
 int virPCIDeviceGetDriverPathAndName(virPCIDevice *dev,
                                      char **path,
                                      char **name);
+int virPCIDeviceGetDriverNameAndType(virPCIDevice *dev,
+                                     char **drvName,
+                                     virPCIStubDriver *drvType);
 
 int virPCIDeviceIsPCIExpress(virPCIDevice *dev);
 int virPCIDeviceHasPCIExpressLink(virPCIDevice *dev);
-- 
2.35.3

On Mon, Aug 01, 2022 at 12:02:22AM -0400, Laine Stump wrote:
Before a PCI device can be assigned to a guest with VFIO, that device must be bound to the vfio-pci driver rather than to the device's normal driver. The vfio-pci driver provides APIs that permit QEMU to perform all the necessary operations to make the device accessible to the guest.
There has been kernel work recently to support vendor/device-specific VFIO drivers that provide the basic vfio-pci driver functionality while adding support for device-specific operations (for example these device-specific drivers are planned to support live migration of certain devices). All that will be needed to make this functionality available will be to bind the new vendor-specific driver to the device (rather than the generic vfio-pci driver, which will continue to work just without the extra functionality).
But until now libvirt has required that all PCI devices being assigned to a guest with VFIO specifically have the "vfio-pci" driver bound to the device. So even if the user manually binds a shiny new vendor-specific driver to the device (and puts "managed='no'" in the config to prevent libvirt from changing that), libvirt will just fail during startup of the guest (or during hotplug) because the driver bound to the device isn't named exactly "vfio-pci".
Fortunately these new vendor/device-specific drivers can be easily identified as being "vfio-pci + extra stuff" - all that's needed is to look at the output of the "modinfo $driver_name" command to see if "vfio_pci" is in the alias list for the driver.
That's what this patch does. When libvirt checks the driver bound to a device (either to decide if it needs to bind to a different driver or perform some other operation, or if the current driver is acceptable as-is), if the driver isn't specifically "vfio-pci", then it will look at the output of modinfo for the driver that *is* bound to the device; if modinfo shows vfio_pci as an alias for that device, then we'll behave as if the driver was exactly "vfio-pci".
Since you say that the vendor/device-specific drivers provide the basic vfio-pci functionality: does each of these drivers implement the base vfio-pci functionality itself, or does it simply call into the base driver? The reason why I'm asking is that if each of the vendor-specific drivers depends on the vfio-pci module being loaded as well, then reading /proc/modules should suffice, as vfio-pci should be listed right next to the vendor-specific one. What am I missing?
The 'alias' field is optional, so do we have any support guarantees from the vendors that it will always be filled in correctly? I mean, you surely handle that case in the code, but once we start supporting this there's no way back, and we already know how painful it can be to convince the vendors to follow some kind of standard so that we don't need to maintain several code paths based on a vendor-matrix.
...
+int
+virPCIDeviceGetDriverNameAndType(virPCIDevice *dev,
+                                 char **drvName,
+                                 virPCIStubDriver *drvType)
+{
+    g_autofree char *drvPath = NULL;
+    g_autoptr(virCommand) cmd = NULL;
+    g_autofree char *output = NULL;
+    g_autoptr(GRegex) regex = NULL;
+    g_autoptr(GError) err = NULL;
+    g_autoptr(GMatchInfo) info = NULL;
+    int exit;
+    int tmpType;
+
+    if (virPCIDeviceGetDriverPathAndName(dev, &drvPath, drvName) < 0)
+        return -1;
+
+    if (!*drvName) {
+        *drvType = VIR_PCI_STUB_DRIVER_NONE;
+        return 0;
+    }
+
+    tmpType = virPCIStubDriverTypeFromString(*drvName);
+
+    if (tmpType > VIR_PCI_STUB_DRIVER_NONE) {
+        *drvType = tmpType;
+        return 0; /* exact match of a known driver name (or no name) */
+    }
+
+    /* Check the output of "modinfo $drvName" to see if it has
+     * "vfio_pci" as an alias. If it does, then this driver should
+     * also be considered as a vfio-pci driver, because it implements
+     * all the functionality of the basic vfio-pci (plus additional
+     * device-specific stuff).
+     */
Instead of calling an external program and then grepping its output, which technically could change in the future, wouldn't it be better if we read /lib/modules/`uname -r`/modules.alias and filtered whatever line had the 'vfio-pci' substring and compared the module name with the user-provided device driver?
If not, then I think you should pass '-F alias' to the command to speed up the regex just a tiny bit.
Regards,
Erik
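For what it's worth, a minimal sketch of that modules.alias approach (this is not what the posted patch does; the helper name is made up, error handling is kept minimal, and it assumes the usual "alias <wildcard> <module>" line format):

/* Sketch only: scan /lib/modules/<release>/modules.alias for a line of
 * the form "alias vfio_pci:<wildcards> <module>" whose module field
 * matches the driver bound to the device. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

static bool
driverHasVfioPciAlias(const char *drvName)
{
    struct utsname uts;
    char path[512];
    char line[1024];
    FILE *fp;
    bool found = false;

    if (uname(&uts) < 0)
        return false;

    snprintf(path, sizeof(path), "/lib/modules/%s/modules.alias", uts.release);

    if (!(fp = fopen(path, "r")))
        return false;

    while (fgets(line, sizeof(line), fp)) {
        char *modname;

        /* only lines advertising a vfio_pci: alias are interesting */
        if (strncmp(line, "alias vfio_pci:", 15) != 0)
            continue;

        /* the module name is the last whitespace-separated field */
        line[strcspn(line, "\n")] = '\0';
        if (!(modname = strrchr(line, ' ')))
            continue;

        if (strcmp(modname + 1, drvName) == 0) {
            found = true;
            break;
        }
    }

    fclose(fp);
    return found;
}

The same check could of course also normalize '-' vs '_' in module names if that turns out to matter.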

On Mon, Aug 01, 2022 at 12:02:22AM -0400, Laine Stump wrote:
Before a PCI device can be assigned to a guest with VFIO, that device must be bound to the vfio-pci driver rather than to the device's normal driver. The vfio-pci driver provides APIs that permit QEMU to perform all the necessary operations to make the device accessible to the guest.
There has been kernel work recently to support vendor/device-specific VFIO drivers that provide the basic vfio-pci driver functionality while adding support for device-specific operations (for example these device-specific drivers are planned to support live migration of certain devices). All that will be needed to make this functionality available will be to bind the new vendor-specific driver to the device (rather than the generic vfio-pci driver, which will continue to work just without the extra functionality).
But until now libvirt has required that all PCI devices being assigned to a guest with VFIO specifically have the "vfio-pci" driver bound to the device. So even if the user manually binds a shiny new vendor-specific driver to the device (and puts "managed='no'" in the config to prevent libvirt from changing that), libvirt will just fail during startup of the guest (or during hotplug) because the driver bound to the device isn't named exactly "vfio-pci".
Fortunately these new vendor/device-specific drivers can be easily identified as being "vfio-pci + extra stuff" - all that's needed is to look at the output of the "modinfo $driver_name" command to see if "vfio_pci" is in the alias list for the driver.
That's what this patch does. When libvirt checks the driver bound to a device (either to decide if it needs to bind to a different driver or perform some other operation, or if the current driver is acceptable as-is), if the driver isn't specifically "vfio-pci", then it will look at the output of modinfo for the driver that *is* bound to the device; if modinfo shows vfio_pci as an alias for that device, then we'll behave as if the driver was exactly "vfio-pci".
On 8/1/22 7:58 AM, Erik Skultety wrote:
Since you say that the vendor/device-specific drivers provide the basic vfio-pci functionality: does each of these drivers implement the base vfio-pci functionality itself, or does it simply call into the base driver? The reason why I'm asking is that if each of the vendor-specific drivers depends on the vfio-pci module being loaded as well, then reading /proc/modules should suffice, as vfio-pci should be listed right next to the vendor-specific one. What am I missing?
I don't know the definitive answer to that, as I have no example of a working vendor-specific driver to look at and only know about the kernel work going on second-hand from Alex. It looks like even the vfio_pci driver itself depends on other, presumably lower-level vfio-* modules (it directly uses vfio_pci_core, which in turn uses vfio and vfio_virqfd), so possibly these new drivers would depend on one or more of those lower-level modules rather than vfio_pci. Also, I would imagine it would be possible for other drivers to depend on the vfio-pci driver while not themselves being vfio drivers.
The 'alias' field is optional, so do we have any support guarantees from the vendors that it will always be filled in correctly? I mean, you surely handle that case in the code, but once we start supporting this there's no way back, and we already know how painful it can be to convince the vendors to follow some kind of standard so that we don't need to maintain several code paths based on a vendor-matrix.
The aliases are what is used to determine the "best" vfio driver for a particular device, so I don't think it would be possible for a driver to not implement it, and the method I've used here to determine if a driver is a vfio driver was recommended by Alex after a couple of discussions on the subject.
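As a purely illustrative sketch of how that alias machinery might eventually be used for managed='yes' (this is not libvirt code and not part of this patch; it assumes the vfio_pci:-prefixed alias scheme from the kernel work, and the helper name is made up):

/* Hypothetical: take the device's PCI modalias (from
 * /sys/bus/pci/devices/<addr>/modalias, e.g. "pci:v000015B3d..."),
 * rewrite the "pci:" prefix to "vfio_pci:", and return the first
 * module in modules.alias whose vfio_pci: pattern matches it.
 * Caller frees the result. */
#include <fnmatch.h>
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

static char *
findVfioVariantDriver(const char *pciModalias)
{
    struct utsname uts;
    char path[512];
    char vfioAlias[512];
    char line[1024];
    FILE *fp;
    char *result = NULL;

    if (strncmp(pciModalias, "pci:", 4) != 0 || uname(&uts) < 0)
        return NULL;

    snprintf(vfioAlias, sizeof(vfioAlias), "vfio_pci:%s", pciModalias + 4);
    snprintf(path, sizeof(path), "/lib/modules/%s/modules.alias", uts.release);

    if (!(fp = fopen(path, "r")))
        return NULL;

    while (fgets(line, sizeof(line), fp)) {
        char pattern[512];
        char module[128];

        /* modules.alias lines look like: "alias <pattern> <module>" */
        if (sscanf(line, "alias %511s %127s", pattern, module) != 2)
            continue;
        if (strncmp(pattern, "vfio_pci:", 9) != 0)
            continue;
        if (fnmatch(pattern, vfioAlias, 0) == 0) {
            result = strdup(module);
            break;
        }
    }

    fclose(fp);
    return result;
}

Picking the "best" match would still need a policy (e.g. prefer the most specific pattern, fall back to plain vfio-pci), which is exactly the later work this patch leaves out.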
...
+int
+virPCIDeviceGetDriverNameAndType(virPCIDevice *dev,
+                                 char **drvName,
+                                 virPCIStubDriver *drvType)
+{
+    g_autofree char *drvPath = NULL;
+    g_autoptr(virCommand) cmd = NULL;
+    g_autofree char *output = NULL;
+    g_autoptr(GRegex) regex = NULL;
+    g_autoptr(GError) err = NULL;
+    g_autoptr(GMatchInfo) info = NULL;
+    int exit;
+    int tmpType;
+
+    if (virPCIDeviceGetDriverPathAndName(dev, &drvPath, drvName) < 0)
+        return -1;
+
+    if (!*drvName) {
+        *drvType = VIR_PCI_STUB_DRIVER_NONE;
+        return 0;
+    }
+
+    tmpType = virPCIStubDriverTypeFromString(*drvName);
+
+    if (tmpType > VIR_PCI_STUB_DRIVER_NONE) {
+        *drvType = tmpType;
+        return 0; /* exact match of a known driver name (or no name) */
+    }
+
+    /* Check the output of "modinfo $drvName" to see if it has
+     * "vfio_pci" as an alias. If it does, then this driver should
+     * also be considered as a vfio-pci driver, because it implements
+     * all the functionality of the basic vfio-pci (plus additional
+     * device-specific stuff).
+     */
Instead of calling an external program and then grepping its output, which technically could change in the future, wouldn't it be better if we read /lib/modules/`uname -r`/modules.alias and filtered whatever line had the 'vfio-pci' substring and compared the module name with the user-provided device driver?
Again, although I was hesitant about calling an external command, and asked if there was something simpler, Alex still suggested modinfo, so I'll let him answer that. Alex?
(Also, although the format of the output of "uname -r" is pretty much written in stone, you're still running an external command :-))
If not then I think you should pass '-F alias' to the command to speed up the regex just a tiny bit.
Sure. That sounds fine to me.

On Mon, Aug 01, 2022 at 09:30:38AM -0400, Laine Stump wrote:
On Mon, Aug 01, 2022 at 12:02:22AM -0400, Laine Stump wrote:
Before a PCI device can be assigned to a guest with VFIO, that device must be bound to the vfio-pci driver rather than to the device's normal driver. The vfio-pci driver provides APIs that permit QEMU to perform all the necessary operations to make the device accessible to the guest.
There has been kernel work recently to support vendor/device-specific VFIO drivers that provide the basic vfio-pci driver functionality while adding support for device-specific operations (for example these device-specific drivers are planned to support live migration of certain devices). All that will be needed to make this functionality available will be to bind the new vendor-specific driver to the device (rather than the generic vfio-pci driver, which will continue to work just without the extra functionality).
But until now libvirt has required that all PCI devices being assigned to a guest with VFIO specifically have the "vfio-pci" driver bound to the device. So even if the user manually binds a shiny new vendor-specific driver to the device (and puts "managed='no'" in the config to prevent libvirt from changing that), libvirt will just fail during startup of the guest (or during hotplug) because the driver bound to the device isn't named exactly "vfio-pci".
Fortunately these new vendor/device-specific drivers can be easily identified as being "vfio-pci + extra stuff" - all that's needed is to look at the output of the "modinfo $driver_name" command to see if "vfio_pci" is in the alias list for the driver.
That's what this patch does. When libvirt checks the driver bound to a device (either to decide if it needs to bind to a different driver or perform some other operation, or if the current driver is acceptable as-is), if the driver isn't specifically "vfio-pci", then it will look at the output of modinfo for the driver that *is* bound to the device; if modinfo shows vfio_pci as an alias for that device, then we'll behave as if the driver was exactly "vfio-pci".
On 8/1/22 7:58 AM, Erik Skultety wrote:
Since you say that the vendor/device-specific drivers provide the basic vfio-pci functionality: does each of these drivers implement the base vfio-pci functionality itself, or does it simply call into the base driver? The reason why I'm asking is that if each of the vendor-specific drivers depends on the vfio-pci module being loaded as well, then reading /proc/modules should suffice, as vfio-pci should be listed right next to the vendor-specific one. What am I missing?
I don't know the definitive answer to that, as I have no example of a working vendor-specific driver to look at and only know about the kernel work going on second-hand from Alex. It looks like even the vfio_pci driver itself depends on other, presumably lower-level vfio-* modules (it directly uses vfio_pci_core, which in turn uses vfio and vfio_virqfd), so possibly these new drivers would depend on one or more of those lower-level modules rather than vfio_pci. Also, I would imagine it would be possible for other drivers to depend on the vfio-pci driver while not themselves being vfio drivers.
The 'alias' field is optional, so do we have any support guarantees from the vendors that it will always be filled in correctly? I mean, you surely handle that case in the code, but once we start supporting this there's no way back, and we already know how painful it can be to convince the vendors to follow some kind of standard so that we don't need to maintain several code paths based on a vendor-matrix.
The aliases are what is used to determine the "best" vfio driver for a particular device, so I don't think it would be possible for a driver to not implement it, and the method I've used here to determine if a driver is a vfio driver was recommended by Alex after a couple of discussions on the subject.
...
+int
+virPCIDeviceGetDriverNameAndType(virPCIDevice *dev,
+                                 char **drvName,
+                                 virPCIStubDriver *drvType)
+{
+    g_autofree char *drvPath = NULL;
+    g_autoptr(virCommand) cmd = NULL;
+    g_autofree char *output = NULL;
+    g_autoptr(GRegex) regex = NULL;
+    g_autoptr(GError) err = NULL;
+    g_autoptr(GMatchInfo) info = NULL;
+    int exit;
+    int tmpType;
+
+    if (virPCIDeviceGetDriverPathAndName(dev, &drvPath, drvName) < 0)
+        return -1;
+
+    if (!*drvName) {
+        *drvType = VIR_PCI_STUB_DRIVER_NONE;
+        return 0;
+    }
+
+    tmpType = virPCIStubDriverTypeFromString(*drvName);
+
+    if (tmpType > VIR_PCI_STUB_DRIVER_NONE) {
+        *drvType = tmpType;
+        return 0; /* exact match of a known driver name (or no name) */
+    }
+
+    /* Check the output of "modinfo $drvName" to see if it has
+     * "vfio_pci" as an alias. If it does, then this driver should
+     * also be considered as a vfio-pci driver, because it implements
+     * all the functionality of the basic vfio-pci (plus additional
+     * device-specific stuff).
+     */
Instead of calling an external program and then grepping its output, which technically could change in the future, wouldn't it be better if we read /lib/modules/`uname -r`/modules.alias and filtered whatever line had the 'vfio-pci' substring and compared the module name with the user-provided device driver?
Again, although I was hesitant about calling an external command, and asked if there was something simpler, Alex still suggested modinfo, so I'll let him answer that. Alex?
(Also, although the format of the output of "uname -r" is pretty much written in stone, you're still running an external command :-))
Well, that was just my attempt to express whatever the running kernel is in that path, but of course I didn't mean to run the command verbatim, as that would contradict what I'm trying to suggest. We can get the version from e.g. the QEMU caps (<kernelVersion>); it's also available in /proc/version. Grepping the codebase, we don't seem to be using the kernelVersion attribute from QEMU caps at the moment.
Erik

Putting Alex on CC since I don't see him there: +alex.williamson@redhat.com
On Mon, Aug 01, 2022 at 09:30:38AM -0400, Laine Stump wrote:
On Mon, Aug 01, 2022 at 12:02:22AM -0400, Laine Stump wrote:
Before a PCI device can be assigned to a guest with VFIO, that device must be bound to the vfio-pci driver rather than to the device's normal driver. The vfio-pci driver provides APIs that permit QEMU to perform all the necessary operations to make the device accessible to the guest.
There has been kernel work recently to support vendor/device-specific VFIO drivers that provide the basic vfio-pci driver functionality while adding support for device-specific operations (for example these device-specific drivers are planned to support live migration of certain devices). All that will be needed to make this functionality available will be to bind the new vendor-specific driver to the device (rather than the generic vfio-pci driver, which will continue to work just without the extra functionality).
But until now libvirt has required that all PCI devices being assigned to a guest with VFIO specifically have the "vfio-pci" driver bound to the device. So even if the user manually binds a shiny new vendor-specific driver to the device (and puts "managed='no'" in the config to prevent libvirt from changing that), libvirt will just fail during startup of the guest (or during hotplug) because the driver bound to the device isn't named exactly "vfio-pci".
Fortunately these new vendor/device-specific drivers can be easily identified as being "vfio-pci + extra stuff" - all that's needed is to look at the output of the "modinfo $driver_name" command to see if "vfio_pci" is in the alias list for the driver.
That's what this patch does. When libvirt checks the driver bound to a device (either to decide if it needs to bind to a different driver or perform some other operation, or if the current driver is acceptable as-is), if the driver isn't specifically "vfio-pci", then it will look at the output of modinfo for the driver that *is* bound to the device; if modinfo shows vfio_pci as an alias for that device, then we'll behave as if the driver was exactly "vfio-pci".
On 8/1/22 7:58 AM, Erik Skultety wrote:
Since you say that the vendor/device-specific drivers provide the basic vfio-pci functionality: does each of these drivers implement the base vfio-pci functionality itself, or does it simply call into the base driver? The reason why I'm asking is that if each of the vendor-specific drivers depends on the vfio-pci module being loaded as well, then reading /proc/modules should suffice, as vfio-pci should be listed right next to the vendor-specific one. What am I missing?
I don't know the definitive answer to that, as I have no example of a working vendor-specific driver to look at and only know about the kernel work going on second-hand from Alex. It looks like even the vfio_pci driver itself depends on other, presumably lower-level vfio-* modules (it directly uses vfio_pci_core, which in turn uses vfio and vfio_virqfd), so possibly these new drivers would depend on one or more of those lower-level modules rather than vfio_pci. Also, I would imagine it would be possible for other drivers to depend on the vfio-pci driver while not themselves being vfio drivers.
The 'alias' field is optional, so do we have any support guarantees from the vendors that it will always be filled in correctly? I mean, you surely handle that case in the code, but once we start supporting this there's no way back, and we already know how painful it can be to convince the vendors to follow some kind of standard so that we don't need to maintain several code paths based on a vendor-matrix.
The aliases are what is used to determine the "best" vfio driver for a particular device, so I don't think it would be possible for a driver to not implement it, and the method I've used here to determine if a driver is a vfio driver was recommended by Alex after a couple of discussions on the subject.
...
+int
+virPCIDeviceGetDriverNameAndType(virPCIDevice *dev,
+                                 char **drvName,
+                                 virPCIStubDriver *drvType)
+{
+    g_autofree char *drvPath = NULL;
+    g_autoptr(virCommand) cmd = NULL;
+    g_autofree char *output = NULL;
+    g_autoptr(GRegex) regex = NULL;
+    g_autoptr(GError) err = NULL;
+    g_autoptr(GMatchInfo) info = NULL;
+    int exit;
+    int tmpType;
+
+    if (virPCIDeviceGetDriverPathAndName(dev, &drvPath, drvName) < 0)
+        return -1;
+
+    if (!*drvName) {
+        *drvType = VIR_PCI_STUB_DRIVER_NONE;
+        return 0;
+    }
+
+    tmpType = virPCIStubDriverTypeFromString(*drvName);
+
+    if (tmpType > VIR_PCI_STUB_DRIVER_NONE) {
+        *drvType = tmpType;
+        return 0; /* exact match of a known driver name (or no name) */
+    }
+
+    /* Check the output of "modinfo $drvName" to see if it has
+     * "vfio_pci" as an alias. If it does, then this driver should
+     * also be considered as a vfio-pci driver, because it implements
+     * all the functionality of the basic vfio-pci (plus additional
+     * device-specific stuff).
+     */
Instead of calling an external program and then grepping its output, which technically could change in the future, wouldn't it be better if we read /lib/modules/`uname -r`/modules.alias and filtered whatever line had the 'vfio-pci' substring and compared the module name with the user-provided device driver?
Again, although I was hesitant about calling an external command, and asked if there was something simpler, Alex still suggested modinfo, so I'll let him answer that. Alex?
(Also, although the format of the output of "uname -r" is pretty much written in stone, you're still running an external command :-))
If not then I think you should pass '-F alias' to the command to speed up the regex just a tiny bit.
Sure. That sounds fine to me.

On Mon, 1 Aug 2022 16:02:05 +0200 Erik Skultety <eskultet@redhat.com> wrote:
Putting Alex on CC since I don't see him there: +alex.williamson@redhat.com
Hmm, Laine cc'd me on the initial post but it seems it got dropped somewhere.
On Mon, Aug 01, 2022 at 09:30:38AM -0400, Laine Stump wrote:
On Mon, Aug 01, 2022 at 12:02:22AM -0400, Laine Stump wrote:
Before a PCI device can be assigned to a guest with VFIO, that device must be bound to the vfio-pci driver rather than to the device's normal driver. The vfio-pci driver provides APIs that permit QEMU to perform all the necessary operations to make the device accessible to the guest.
There has been kernel work recently to support vendor/device-specific VFIO drivers that provide the basic vfio-pci driver functionality while adding support for device-specific operations (for example these device-specific drivers are planned to support live migration of certain devices). All that will be needed to make this functionality available will be to bind the new vendor-specific driver to the device (rather than the generic vfio-pci driver, which will continue to work just without the extra functionality).
But until now libvirt has required that all PCI devices being assigned to a guest with VFIO specifically have the "vfio-pci" driver bound to the device. So even if the user manually binds a shiny new vendor-specific driver to the device (and puts "managed='no'" in the config to prevent libvirt from changing that), libvirt will just fail during startup of the guest (or during hotplug) because the driver bound to the device isn't named exactly "vfio-pci".
Fortunately these new vendor/device-specific drivers can be easily identified as being "vfio-pci + extra stuff" - all that's needed is to look at the output of the "modinfo $driver_name" command to see if "vfio_pci" is in the alias list for the driver.
That's what this patch does. When libvirt checks the driver bound to a device (either to decide if it needs to bind to a different driver or perform some other operation, or if the current driver is acceptable as-is), if the driver isn't specifically "vfio-pci", then it will look at the output of modinfo for the driver that *is* bound to the device; if modinfo shows vfio_pci as an alias for that device, then we'll behave as if the driver was exactly "vfio-pci".
On 8/1/22 7:58 AM, Erik Skultety wrote:
Since you say that the vendor/device-specific drivers provide the basic vfio-pci functionality: does each of these drivers implement the base vfio-pci functionality itself, or does it simply call into the base driver? The reason why I'm asking is that if each of the vendor-specific drivers depends on the vfio-pci module being loaded as well, then reading /proc/modules should suffice, as vfio-pci should be listed right next to the vendor-specific one. What am I missing?
I don't know the definitive answer to that, as I have no example of a working vendor-specific driver to look at and only know about the kernel work going on second-hand from Alex. It looks like even the vfio_pci driver itself depends on other, presumably lower-level vfio-* modules (it directly uses vfio_pci_core, which in turn uses vfio and vfio_virqfd), so possibly these new drivers would depend on one or more of those lower-level modules rather than vfio_pci. Also, I would imagine it would be possible for other drivers to depend on the vfio-pci driver while not themselves being vfio drivers.
A module dependency on vfio-pci (actually vfio-pci-core) is a pretty loose requirement, *any* symbol dependency generates such a linkage, without necessarily exposing a vfio-pci uAPI.
The alias support introduced to the kernel is intended to allow userspace to determine the most appropriate vfio-pci driver for a device, whether that's vfio-pci itself or a variant driver that augments device specific features. See the upstream commit here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
All we're doing here is extending libvirt to say that if the driver is vfio-pci or the modalias for the driver is prefixed with vfio-pci, then the driver exposes a vfio-pci compatible uAPI.
I expect in the future libvirt, or some other utility, may take on the role as described in the above commit log to not only detect that a driver supports a vfio-pci uAPI, but also to identify the most appropriate driver for the device which exposes a vfio-uAPI.
The 'alias' field is optional, so do we have any support guarantees from the vendors that it will always be filled in correctly? I mean, you surely handle that case in the code, but once we start supporting this there's no way back, and we already know how painful it can be to convince the vendors to follow some kind of standard so that we don't need to maintain several code paths based on a vendor-matrix.
The aliases are what is used to determine the "best" vfio driver for a particular device, so I don't think it would be possible for a driver to not implement it, and the method I've used here to determine if a driver is a vfio driver was recommended by Alex after a couple of discussions on the subject.
...
+int
+virPCIDeviceGetDriverNameAndType(virPCIDevice *dev,
+                                 char **drvName,
+                                 virPCIStubDriver *drvType)
+{
+    g_autofree char *drvPath = NULL;
+    g_autoptr(virCommand) cmd = NULL;
+    g_autofree char *output = NULL;
+    g_autoptr(GRegex) regex = NULL;
+    g_autoptr(GError) err = NULL;
+    g_autoptr(GMatchInfo) info = NULL;
+    int exit;
+    int tmpType;
+
+    if (virPCIDeviceGetDriverPathAndName(dev, &drvPath, drvName) < 0)
+        return -1;
+
+    if (!*drvName) {
+        *drvType = VIR_PCI_STUB_DRIVER_NONE;
+        return 0;
+    }
+
+    tmpType = virPCIStubDriverTypeFromString(*drvName);
+
+    if (tmpType > VIR_PCI_STUB_DRIVER_NONE) {
+        *drvType = tmpType;
+        return 0; /* exact match of a known driver name (or no name) */
+    }
+
+    /* Check the output of "modinfo $drvName" to see if it has
+     * "vfio_pci" as an alias. If it does, then this driver should
+     * also be considered as a vfio-pci driver, because it implements
+     * all the functionality of the basic vfio-pci (plus additional
+     * device-specific stuff).
+     */
Instead of calling an external program and then grepping its output, which technically could change in the future, wouldn't it be better if we read /lib/modules/`uname -r`/modules.alias and filtered whatever line had the 'vfio-pci' substring and compared the module name with the user-provided device driver?
Again, although I was hesitant about calling an external command, and asked if there was something simpler, Alex still suggested modinfo, so I'll let him answer that. Alex?
(Also, although the format of the output of "uname -r" is pretty much written in stone, you're still running an external command :-))
The above commit log actually suggests modules.alias, so if you'd rather rely on that than modinfo, sure, that's fine. Thanks, Alex

On Mon, Aug 01, 2022 at 09:49:28AM -0600, Alex Williamson wrote:
Fortunately these new vendor/device-specific drivers can be easily identified as being "vfio-pci + extra stuff" - all that's needed is to look at the output of the "modinfo $driver_name" command to see if "vfio_pci" is in the alias list for the driver.
We are moving in a direction on the kernel side to expose a sysfs under the PCI device that definitively says it is VFIO enabled, eg something like
/sys/devices/pci0000:00/0000:00:1f.6/vfio/<N>
Which is how every other subsystem in the kernel works. When this lands libvirt can simply stat the vfio directory and confirm that the device handle it is looking at is vfio enabled, for all things that vfio support.
My thinking had been to do the above work a bit later, but if libvirt needs it right now then let's do it right away so we don't have to worry about this hacky modprobe stuff down the road?
There are already WIP patches for it here: https://github.com/yiliu1765/iommufd/commit/7fff911c20f31dff933741771499170d...
Regards,
Jason

On Thu, 4 Aug 2022 13:51:20 -0300 Jason Gunthorpe <jgg@nvidia.com> wrote:
On Mon, Aug 01, 2022 at 09:49:28AM -0600, Alex Williamson wrote:
Fortunately these new vendor/device-specific drivers can be easily identified as being "vfio-pci + extra stuff" - all that's needed is to look at the output of the "modinfo $driver_name" command to see if "vfio_pci" is in the alias list for the driver.
We are moving in a direction on the kernel side to expose a sysfs under the PCI device that definitively says it is VFIO enabled, eg something like
/sys/devices/pci0000:00/0000:00:1f.6/vfio/<N>
Which is how every other subsystem in the kernel works. When this lands libvirt can simply stat the vfio directory and confirm that the device handle it is looking at is vfio enabled, for all things that vfio support.
My thinking had been to do the above work a bit later, but if libvirt needs it right now then lets do it right away so we don't have to worry about this hacky modprobe stuff down the road?
That seems like a pretty long gap, there are vfio-pci variant drivers since v5.18 and this hasn't even been proposed for v6.0 (aka v5.20) midway through the merge window. We therefore have at least 3 kernels exposing devices in a way that libvirt can't make use of simply due to a driver matching test.
Libvirt needs backwards compatibility, so we'll need it to look for the vfio-pci driver through some long deprecation period. In the interim, it can look at module aliases, support for which will be necessary and might be leveraged for managed='yes' with variant drivers. Once vfio devices expose a chardev themselves, libvirt might order the tests as:
a) vfio device chardev present
b) driver is a vfio-pci modalias
c) driver is vfio-pci
The current state of the world though is that variant drivers exist and libvirt can't make use of them. Thanks, Alex
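Roughly, that ordering could be sketched as follows (illustrative only: the per-device vfio/ sysfs directory in step (a) is just the interface proposed elsewhere in this thread, and the helpers are made up, not existing libvirt functions):

/* Sketch of the layered detection order described above. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* assumed helper from the earlier modules.alias sketch */
bool driverHasVfioPciAlias(const char *drvName);

static bool
deviceIsVfioCapable(const char *pciAddr, const char *drvName)
{
    char path[512];
    struct stat sb;

    /* (a) proposed: a vfio/ directory under the device's sysfs node */
    snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/vfio", pciAddr);
    if (stat(path, &sb) == 0 && S_ISDIR(sb.st_mode))
        return true;

    if (!drvName)
        return false;

    /* (b) the bound driver advertises a vfio_pci: modalias */
    if (driverHasVfioPciAlias(drvName))
        return true;

    /* (c) the bound driver is plain vfio-pci */
    return strcmp(drvName, "vfio-pci") == 0;
}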

On Thu, Aug 04, 2022 at 12:18:26PM -0600, Alex Williamson wrote:
On Thu, 4 Aug 2022 13:51:20 -0300 Jason Gunthorpe <jgg@nvidia.com> wrote:
On Mon, Aug 01, 2022 at 09:49:28AM -0600, Alex Williamson wrote:
> Fortunately these new vendor/device-specific drivers can be easily
> identified as being "vfio-pci + extra stuff" - all that's needed is to
> look at the output of the "modinfo $driver_name" command to see if
> "vfio_pci" is in the alias list for the driver.
We are moving in a direction on the kernel side to expose a sysfs under the PCI device that definitively says it is VFIO enabled, eg something like
/sys/devices/pci0000:00/0000:00:1f.6/vfio/<N>
Which is how every other subsystem in the kernel works. When this lands libvirt can simply stat the vfio directory and confirm that the device handle it is looking at is vfio enabled, for all things that vfio support.
My thinking had been to do the above work a bit later, but if libvirt needs it right now then lets do it right away so we don't have to worry about this hacky modprobe stuff down the road?
That seems like a pretty long gap, there are vfio-pci variant drivers since v5.18 and this hasn't even been proposed for v6.0 (aka v5.20) midway through the merge window. We therefore have at least 3 kernels exposing devices in a way that libvirt can't make use of simply due to a driver matching test.
That is reasonable, but I'd say those three kernels only have two drivers and they both have vfio as a substring in their name - so the simple thing of just substring searching 'vfio' would get us over that gap.
might be leveraged for managed='yes' with variant drivers. Once vfio devices expose a chardev themselves, libvirt might order the tests as:
I wasn't thinking to include the chardev part if we are to expedite this. The struct device bit alone is enough and it doesn't have the complex bits needed to make the cdev. If you say you want to do it we'll do it for v6.1.. Jason

On 8/4/22 2:36 PM, Jason Gunthorpe wrote:
On Thu, Aug 04, 2022 at 12:18:26PM -0600, Alex Williamson wrote:
On Thu, 4 Aug 2022 13:51:20 -0300 Jason Gunthorpe <jgg@nvidia.com> wrote:
On Mon, Aug 01, 2022 at 09:49:28AM -0600, Alex Williamson wrote:
>> Fortunately these new vendor/device-specific drivers can be easily
>> identified as being "vfio-pci + extra stuff" - all that's needed is to
>> look at the output of the "modinfo $driver_name" command to see if
>> "vfio_pci" is in the alias list for the driver.
We are moving in a direction on the kernel side to expose a sysfs under the PCI device that definitively says it is VFIO enabled, eg something like
/sys/devices/pci0000:00/0000:00:1f.6/vfio/<N>
Which is how every other subsystem in the kernel works. When this lands libvirt can simply stat the vfio directory and confirm that the device handle it is looking at is vfio enabled, for all things that vfio support.
My thinking had been to do the above work a bit later, but if libvirt needs it right now then lets do it right away so we don't have to worry about this hacky modprobe stuff down the road?
That seems like a pretty long gap, there are vfio-pci variant drivers since v5.18 and this hasn't even been proposed for v6.0 (aka v5.20) midway through the merge window. We therefore have at least 3 kernels exposing devices in a way that libvirt can't make use of simply due to a driver matching test.
That is reasonable, but I'd say those three kernels only have two drivers and they both have vfio as a substring in their name - so the simple thing of just substring searching 'vfio' would get us over that gap.
Looking at the aliases for exactly "vfio_pci" isn't that much more complicated, and "feels" a lot more reliable than just doing a substring search for "vfio" in the driver's name. (It would be, uh, .... "not smart" to name a driver "vfio<anything>" if it wasn't actually a vfio variant driver (or the opposite), but I could imagine it happening; :-/)
might be leveraged for managed='yes' with variant drivers. Once vfio devices expose a chardev themselves, libvirt might order the tests as:
I wasn't thinking to include the chardev part if we are to expedite this. The struct device bit alone is enough and it doesn't have the complex bits needed to make the cdev.
If you say you want to do it we'll do it for v6.1..
Since we already need to do something else as a stop-gap for the interim (in order to avoid making driver developers wait any longer if for no other reason), my opinion would be to not spend extra time splitting up patches just to give us this functionality slightly sooner; we'll anyway have something at least workable in place.
Definitely once it is there, libvirt should check for it, since it would be quicker and just "feels even more reliable".
I'm updating my patches to directly look at modules.alias and will resend based on that.

On Thu, 4 Aug 2022 15:11:07 -0400 Laine Stump <laine@redhat.com> wrote:
On 8/4/22 2:36 PM, Jason Gunthorpe wrote:
On Thu, Aug 04, 2022 at 12:18:26PM -0600, Alex Williamson wrote:
On Thu, 4 Aug 2022 13:51:20 -0300 Jason Gunthorpe <jgg@nvidia.com> wrote:
On Mon, Aug 01, 2022 at 09:49:28AM -0600, Alex Williamson wrote:
>>> Fortunately these new vendor/device-specific drivers can be easily
>>> identified as being "vfio-pci + extra stuff" - all that's needed is to
>>> look at the output of the "modinfo $driver_name" command to see if
>>> "vfio_pci" is in the alias list for the driver.
We are moving in a direction on the kernel side to expose a sysfs under the PCI device that definitively says it is VFIO enabled, eg something like
/sys/devices/pci0000:00/0000:00:1f.6/vfio/<N>
Which is how every other subsystem in the kernel works. When this lands libvirt can simply stat the vfio directory and confirm that the device handle it is looking at is vfio enabled, for all things that vfio support.
My thinking had been to do the above work a bit later, but if libvirt needs it right now then lets do it right away so we don't have to worry about this hacky modprobe stuff down the road?
That seems like a pretty long gap, there are vfio-pci variant drivers since v5.18 and this hasn't even been proposed for v6.0 (aka v5.20) midway through the merge window. We therefore have at least 3 kernels exposing devices in a way that libvirt can't make use of simply due to a driver matching test.
That is reasonable, but I'd say those three kernels only have two drivers and they both have vfio as a substring in their name - so the simple thing of just substring searching 'vfio' would get us over that gap.
Looking at the aliases for exactly "vfio_pci" isn't that much more complicated, and "feels" a lot more reliable than just doing a substring search for "vfio" in the driver's name. (It would be, uh, .... "not smart" to name a driver "vfio<anything>" if it wasn't actually a vfio variant driver (or the opposite), but I could imagine it happening; :-/)
might be leveraged for managed='yes' with variant drivers. Once vfio devices expose a chardev themselves, libvirt might order the tests as:
I wasn't thinking to include the chardev part if we are to expedite this. The struct device bit alone is enough and it doesn't have the complex bits needed to make the cdev.
If you say you want to do it we'll do it for v6.1..
Since we already need to do something else as a stop-gap for the interim (in order to avoid making driver developers wait any longer if for no other reason), my opinion would be to not spend extra time splitting up patches just to give us this functionality slightly sooner; we'll anyway have something at least workable in place.
We also need to be careful, in adding things piecemeal, that libvirt can determine when new functionality, such as vfio device chardevs, is actually available and not simply a placeholder to fill a gap elsewhere. Thanks, Alex

On Thu, Aug 04, 2022 at 01:36:24PM -0600, Alex Williamson wrote:
That is reasonable, but I'd say those three kernels only have two drivers and they both have vfio as a substring in their name - so the simple thing of just substring searching 'vfio' would get us over that gap.
Looking at the aliases for exactly "vfio_pci" isn't that much more complicated, and "feels" a lot more reliable than just doing a substring search for "vfio" in the driver's name. (It would be, uh, .... "not smart" to name a driver "vfio<anything>" if it wasn't actually a vfio variant driver (or the opposite), but I could imagine it happening; :-/)
This is still pretty hacky. I'm worried about what happens to the kernel if this becomes some crazy unintended uAPI that we never really thought about carefully... This was not a use case when we designed the modules.alias stuff at least.
BTW - why not do things the normal way?
1. readlink /sys/bus/pci/devices/XX/iommu_group
2. Compute basename of #1
3. Check if /dev/vfio/#2 exists (or /sys/class/vfio/#2)
It has a small edge case where a multi-device group might give a false positive for an undrivered device, but for the purposes of libvirt that seems pretty obscure.. (while the above has false negative issues, obviously)
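In code, those three steps amount to something like this (a minimal sketch with error handling trimmed; the helper name is made up):

/* Check whether the group of a PCI device already has a vfio group
 * chardev: resolve the iommu_group symlink, take its basename, and
 * test for /dev/vfio/<group>. */
#include <libgen.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/stat.h>

static bool
deviceGroupHasVfio(const char *pciAddr)    /* e.g. "0000:00:1f.6" */
{
    char link[PATH_MAX];
    char target[PATH_MAX];
    char devpath[PATH_MAX];
    ssize_t len;
    struct stat sb;

    /* 1. readlink /sys/bus/pci/devices/<addr>/iommu_group */
    snprintf(link, sizeof(link), "/sys/bus/pci/devices/%s/iommu_group", pciAddr);
    if ((len = readlink(link, target, sizeof(target) - 1)) < 0)
        return false;
    target[len] = '\0';

    /* 2. the group number is the basename of the link target
     * 3. check whether /dev/vfio/<group> exists */
    snprintf(devpath, sizeof(devpath), "/dev/vfio/%s", basename(target));
    return stat(devpath, &sb) == 0;
}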
Since we already need to do something else as a stop-gap for the interim (in order to avoid making driver developers wait any longer if for no other reason), my opinion would be to not spend extra time splitting up patches just to give us this functionality slightly sooner; we'll anyway have something at least workable in place.
We also need to be careful in adding things piecemeal that libvirt can determine when new functionality, such as vfio device chardevs, are actually available and not simply a placeholder to fill a gap elsewhere.
In sysfs the kernel standard is that the directory means there is a "struct device" and the "dev" file within the directory means there is a cdev, so it is safely discoverable.
I would like to split the kernel patch anyhow because the cdev is conceptually complicated, so having just the struct device bits on its own would be helpful..
Jason

On Thu, 4 Aug 2022 21:11:07 -0300 Jason Gunthorpe <jgg@nvidia.com> wrote:
On Thu, Aug 04, 2022 at 01:36:24PM -0600, Alex Williamson wrote:
That is reasonable, but I'd say those three kernels only have two drivers and they both have vfio as a substring in their name - so the simple thing of just substring searching 'vfio' would get us over that gap.
Looking at the aliases for exactly "vfio_pci" isn't that much more complicated, and "feels" a lot more reliable than just doing a substring search for "vfio" in the driver's name. (It would be, uh, .... "not smart" to name a driver "vfio<anything>" if it wasn't actually a vfio variant driver (or the opposite), but I could imagine it happening; :-/)
This is still pretty hacky. I'm worried about what happens to the kernel if this becomes some crazy unintended uAPI that we never really thought about carefully... This was not a use case when we designed the modules.alias stuff at least.
BTW - why not do things the normal way?
1. readlink /sys/bus/pci/devices/XX/iommu_group
2. Compute basename of #1
3. Check if /dev/vfio/#2 exists (or /sys/class/vfio/#2)
It has a small edge case where a multi-device group might give a false positive for an undrivered device, but for the purposes of libvirt that seems pretty obscure.. (while the above has false negative issues, obviously)
This is not a small edge case, it's extremely common. We have a *lot* of users assigning desktop GPUs and other consumer grade hardware, which are usually multi-function devices without isolation exposed via ACS or quirks.
The vfio group exists if any devices in the group are bound to a vfio driver, but the device is not accessible from the group unless the viability test passes. That means QEMU may not be able to get access to the device because the device we want isn't actually bound to a vfio driver or another device in the group is not in a viable state. Thanks, Alex

On Fri, Aug 05, 2022 at 11:24:08AM -0600, Alex Williamson wrote:
On Thu, 4 Aug 2022 21:11:07 -0300 Jason Gunthorpe <jgg@nvidia.com> wrote:
On Thu, Aug 04, 2022 at 01:36:24PM -0600, Alex Williamson wrote:
That is reasonable, but I'd say those three kernels only have two drivers and they both have vfio as a substring in their name - so the simple thing of just substring searching 'vfio' would get us over that gap.
Looking at the aliases for exactly "vfio_pci" isn't that much more complicated, and "feels" a lot more reliable than just doing a substring search for "vfio" in the driver's name. (It would be, uh, .... "not smart" to name a driver "vfio<anything>" if it wasn't actually a vfio variant driver (or the opposite), but I could imagine it happening; :-/)
This is still pretty hacky. I'm worried about what happens to the kernel if this becomes some crazy unintended uAPI that we never really thought about carefully... This was not a use case when we designed the modules.alias stuff at least.
BTW - why not do things the normal way?
1. readlink /sys/bus/pci/devices/XX/iommu_group
2. Compute basename of #1
3. Check if /dev/vfio/#2 exists (or /sys/class/vfio/#2)
It has a small edge case where a multi-device group might give a false positive for an undrivered device, but for the purposes of libvirt that seems pretty obscure.. (while the above has false negative issues, obviously)
This is not a small edge case, it's extremely common. We have a *lot* of users assigning desktop GPUs and other consumer grade hardware, which are usually multi-function devices without isolation exposed via ACS or quirks.
The edge case is that the user has created a multi-device group, manually assigned device 1 in the group to VFIO, left device 2 with no driver and then told libvirt to manually use device 2. With the above approach libvirt won't detect this misconfiguration and qemu will fail.
The vfio group exists if any devices in the group are bound to a vfio driver, but the device is not accessible from the group unless the viability test passes. That means QEMU may not be able to get access to the device because the device we want isn't actually bound to a vfio driver or another device in the group is not in a viable state. Thanks,
This is a different misconfiguration that libvirt also won't detect, right? In this case ownership claiming in the kernel will fail and qemu will fail too, like above.
This, and the above, could be handled by having libvirt also open the group FD and get the device. It would prove both correct binding and viability.
I had understood the point of this logic was to give better error reporting to users so that common misconfigurations could be diagnosed earlier. When I say 'small edge case' I mean it seems like an unlikely misconfiguration that someone would know to setup VFIO but then use the wrong BDFs to do it - arguably less likely than someone would know to setup VFIO but forget to unbind the other drivers in the group?
But maybe I don't get it at all ...
Jason

On Fri, 5 Aug 2022 15:20:24 -0300 Jason Gunthorpe <jgg@nvidia.com> wrote:
On Fri, Aug 05, 2022 at 11:24:08AM -0600, Alex Williamson wrote:
On Thu, 4 Aug 2022 21:11:07 -0300 Jason Gunthorpe <jgg@nvidia.com> wrote:
On Thu, Aug 04, 2022 at 01:36:24PM -0600, Alex Williamson wrote:
That is reasonable, but I'd say those three kernels only have two drivers and they both have vfio as a substring in their name - so the simple thing of just substring searching 'vfio' would get us over that gap.
Looking at the aliases for exactly "vfio_pci" isn't that much more complicated, and "feels" a lot more reliable than just doing a substring search for "vfio" in the driver's name. (It would be, uh, .... "not smart" to name a driver "vfio<anything>" if it wasn't actually a vfio variant driver (or the opposite), but I could imagine it happening; :-/)
This is still pretty hacky. I'm worried about what happens to the kernel if this becomes some crazy unintended uAPI that we never really thought about carefully... This was not a use case when we designed the modules.alias stuff at least.
BTW - why not do things the normal way?
1. readlink /sys/bus/pci/devices/XX/iommu_group
2. Compute basename of #1
3. Check if /dev/vfio/#2 exists (or /sys/class/vfio/#2)
It has a small edge case where a multi-device group might give a false positive for an undrivered device, but for the purposes of libvirt that seems pretty obscure.. (while the above has false negative issues, obviously)
This is not a small edge case, it's extremely common. We have a *lot* of users assigning desktop GPUs and other consumer grade hardware, which are usually multi-function devices without isolation exposed via ACS or quirks.
The edge case is that the user has created a multi-device group, manually assigned device 1 in the group to VFIO, left device 2 with no driver and then told libvirt to manually use device 2. With the above approach libvirt won't detect this misconfiguration and qemu will fail.
The vfio group exists if any devices in the group are bound to a vfio driver, but the device is not accessible from the group unless the viability test passes. That means QEMU may not be able to get access to the device because the device we want isn't actually bound to a vfio driver or another device in the group is not in a viable state. Thanks,
This is a different misconfiguration that libvirt also won't detect, right? In this case ownership claiming in the kernel will fail and qemu will fail too, like above.
This, and the above, could be handled by having libvirt also open the group FD and get the device. It would prove both correct binding and viability.
libvirt cannot do this in the group model because the group must be isolated in a container before the device can be accessed and libvirt cannot presume the QEMU container configuration. For direct device access, this certainly becomes a possibility and I've been trying to steer things in that direction, libvirt has the option to pass an fd for the iommufd and can then pass fds for each of the devices in the new uAPI paradigm.
I had understood the point of this logic was to give better error reporting to users so that common misconfigurations could be diagnosed earlier. When I say 'small edge case' I mean it seems like an unlikely misconfiguration that someone would know to setup VFIO but then use the wrong BDFs to do it - arguably less likely than someone would know to setup VFIO but forget to unbind the other drivers in the group?
I'm not sure how much testing libvirt does of other devices in a group, Laine?

AIUI here, libvirt has a managed='yes|no' option per device. In the 'yes' case libvirt will unbind devices from their host driver and bind them to vfio-pci. In the 'no' case, I believe libvirt is still doing a sanity test on the driver, but only knows about vfio-pci.

The initial step is to then enlighten libvirt that other drivers can be compatible for the 'no' case and later we can make smarter choices about which driver to use or allow the user to specify (ie. a user should be able to use vfio-pci rather than a variant driver if they choose) in the 'yes' case.

If libvirt is currently testing that only the target device is bound to vfio-pci, then maybe we do have gaps for the ancillary devices in the group, but that gap changes if instead we only test that a vfio group exists relative to the iommu group of the target device. Thanks,

Alex

On 8/5/22 2:56 PM, Alex Williamson wrote:
On Fri, 5 Aug 2022 15:20:24 -0300 Jason Gunthorpe <jgg@nvidia.com> wrote:
On Fri, Aug 05, 2022 at 11:24:08AM -0600, Alex Williamson wrote:
On Thu, 4 Aug 2022 21:11:07 -0300 Jason Gunthorpe <jgg@nvidia.com> wrote:
On Thu, Aug 04, 2022 at 01:36:24PM -0600, Alex Williamson wrote:
That is reasonable, but I'd say those three kernels only have two drivers and they both have vfio as a substring in their name - so the simple thing of just substring searching 'vfio' would get us over that gap.
Looking at the aliases for exactly "vfio_pci" isn't that much more complicated, and "feels" a lot more reliable than just doing a substring search for "vfio" in the driver's name. (It would be, uh, .... "not smart" to name a driver "vfio<anything>" if it wasn't actually a vfio variant driver (or the opposite), but I could imagine it happening; :-/)
This is still pretty hacky. I'm worried about what happens to the kernel if this becomes some crazy unintended uAPI that we never really thought about carefully... This was not a use case when we designed the modules.alias stuff at least.
BTW - why not do things the normal way?
1. readlink /sys/bus/pci/devices/XX/iommu_group
2. Compute basename of #1
3. Check if /dev/vfio/#2 exists (or /sys/class/vfio/#2)
It has a small edge case where a multi-device group might give a false positive for an undrivered device, but for the purposes of libvirt that seems pretty obscure.. (while the above has false negative issues, obviously)
This is not a small edge case, it's extremely common. We have a *lot* of users assigning desktop GPUs and other consumer grade hardware, which are usually multi-function devices without isolation exposed via ACS or quirks.
The edge case is that the user has created a multi-device group, manually assigned device 1 in the group to VFIO, left device 2 with no driver and then told libvirt to manually use device 2. With the above approach libvirt won't detect this misconfiguration and qemu will fail.
The vfio group exists if any devices in the group are bound to a vfio driver, but the device is not accessible from the group unless the viability test passes. That means QEMU may not be able to get access to the device because the device we want isn't actually bound to a vfio driver or another device in the group is not in a viable state. Thanks,
This is a different misconfiguration that libvirt also won't detect, right? In this case ownership claiming in the kernel will fail and qemu will fail too, like above.
This, and the above, could be handled by having libvirt also open the group FD and get the device. It would prove both correct binding and viability.
libvirt cannot do this in the group model because the group must be isolated in a container before the device can be accessed and libvirt cannot presume the QEMU container configuration. For direct device access, this certainly becomes a possibility and I've been trying to steer things in that direction, libvirt has the option to pass an fd for the iommufd and can then pass fds for each of the devices in the new uAPI paradigm.
I had understood the point of this logic was to give better error reporting to users so that common misconfigurations could be diagnosed earlier. When I say 'small edge case' I mean it seems like an unlikely misconfiguration that someone would know to setup VFIO but then use the wrong BDFs to do it - arguably less likely than someone would know to setup VFIO but forget to unbind the other drivers in the group?
I'm not sure how much testing libvirt does of other devices in a group, Laine?
It had been so long since I looked at that, and the code was so obtuse, that I had to set something up to see what happened.

1) if there is another device in the same group, and that device is bound to vfio-pci and in use by a different QEMU process, then libvirt refuses to assign the device in question (it will allow assigning the device to the same guest as other devices in the same group).

2) if there is another device in the same group, and that device is bound to some other driver than vfio-pci, then libvirt doesn't notice, tells QEMU to do the assignment, and QEMU fails.

Without looking/trying, I would have said that libvirt would check for (2), but I guess nobody ever tried it :-/
AIUI here, libvirt has a managed='yes|no' option per device. In the 'yes' case libvirt will unbind devices from their host driver and bind them to vfio-pci. In the 'no' case, I believe libvirt is still doing a sanity test on the driver, but only knows about vfio-pci.
Correct.
The initial step is to then enlighten libvirt that other drivers can be compatible for the 'no' case and later we can make smarter choices about which driver to use or allow the user to specify (ie. a user should be able to use vfio-pci rather than a variant driver if they choose) in the 'yes' case.
Yes, that's the next step. I just wanted to first add a simple (i.e. difficult to botch up) patch to allow trying out the new variant drivers without bypassing libvirt (which is bad PR for libvirt every time it happens).
If libvirt is currently testing that only the target device is bound to vfio-pci, then maybe we do have gaps for the ancillary devices in the group, but that gap changes if instead we only test that a vfio group exists relative to the iommu group of the target device.
Yep. Currently we miss case (2) above. With this suggested method of detecting vfio variant drivers, we would also start missing (1). Of course QEMU would also give an error and fail in this case, so it's not catastrophic (but still not preferred).

On Fri, Aug 05, 2022 at 08:21:50PM -0400, Laine Stump wrote:
Without looking/trying, I would have said that libvirt would check for (2), but I guess nobody ever tried it :-/
There is nothing you can do about this in userspace today. The kernel has internal data about what devices are or are not compatible with vfio, and that data should stay internal to the kernel. Hardwiring ABI assumptions to fix this in vfio is pretty bad.
The initial step is to then enlighten libvirt that other drivers can be compatible for the 'no' case and later we can make smarter choices about which driver to use or allow the user to specify (ie. a user should be able to use vfio-pci rather than a variant driver if they choose) in the 'yes' case.
Yes, that's the next step. I just wanted to first add a simple (i.e. difficult to botch up) patch to allow trying out the new variant drivers without bypassing libvirt (which is bad PR for libvirt every time it happens).
I still think "search for vfio in the driver name" is sufficient for now, and we can do much better, including solving (2) with kernel support.
Yep. Currently we miss case (2) above. With this suggested method of detecting vfio variant drivers, we would also start missing (1).
No, (1) should be detected by scanning /sys/class/vfio/XX and looking for devices in the same group and running your "different QEMU process" algorithm on those groupings. (1) should have nothing to do with driver name matching.

Jason
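A rough sketch of one way such a scan could be structured, walking the other members of the target device's IOMMU group via /sys/kernel/iommu_groups (a variation on the /sys/class/vfio scan suggested above); the helper name and callback shape are invented for illustration:

    #include <dirent.h>
    #include <limits.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* invoke @cb for every other device in @pciAddr's IOMMU group */
    static int
    forEachGroupSibling(const char *pciAddr,          /* e.g. "0000:01:00.0" */
                        int (*cb)(const char *sibling))
    {
        char link[PATH_MAX], target[PATH_MAX], dirpath[PATH_MAX];
        const char *group;
        ssize_t len;
        DIR *dir;
        struct dirent *ent;
        int ret = 0;

        snprintf(link, sizeof(link),
                 "/sys/bus/pci/devices/%s/iommu_group", pciAddr);
        if ((len = readlink(link, target, sizeof(target) - 1)) < 0)
            return -1;
        target[len] = '\0';

        /* the link target ends in ".../iommu_groups/<N>" */
        if (!(group = strrchr(target, '/')))
            return -1;
        group++;

        snprintf(dirpath, sizeof(dirpath),
                 "/sys/kernel/iommu_groups/%s/devices", group);
        if (!(dir = opendir(dirpath)))
            return -1;

        while ((ent = readdir(dir))) {
            if (ent->d_name[0] == '.' || strcmp(ent->d_name, pciAddr) == 0)
                continue;        /* skip ".", ".." and the target device */
            if ((ret = cb(ent->d_name)) < 0)   /* caller can veto */
                break;
        }
        closedir(dir);
        return ret;
    }

The callback would then apply whatever "already in use by a different QEMU process" test the management application maintains to each sibling device.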

On Fri, Aug 05, 2022 at 12:56:17PM -0600, Alex Williamson wrote:
This, and the above, could be handled by having libvirt also open the group FD and get the device. It would prove both correct binding and viability.
libvirt cannot do this in the group model because the group must be isolated in a container before the device can be accessed and libvirt cannot presume the QEMU container configuration.
Hmm? It is a bunch of busy work, but libvirt can open an empty container, do the VFIO_GROUP_GET_STATUS and then close everything before invoking qemu.

But, looking at this, it seems we've messed something up in the kernel implementation:

    case VFIO_GROUP_GET_STATUS:
            down_read(&group->group_rwsem);
            if (group->container)
                    status.flags |= VFIO_GROUP_FLAGS_CONTAINER_SET |
                                    VFIO_GROUP_FLAGS_VIABLE;
            else if (!iommu_group_dma_owner_claimed(group->iommu_group))
                    status.flags |= VFIO_GROUP_FLAGS_VIABLE;
            up_read(&group->group_rwsem);

If group->container is set then iommu_group_dma_owner_claimed() will always be true, because group->container can only ever be set if iommu_group_claim_dma_owner() succeeded. So I wonder what was supposed to happen here...
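For reference, the userspace side of that check is just the existing group uAPI: open the group node and issue VFIO_GROUP_GET_STATUS. A minimal sketch (the helper name is invented, and whether the returned flags mean what a management app would want them to mean is exactly the open question above):

    #include <fcntl.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/vfio.h>

    static bool
    groupIsViable(const char *groupNum)      /* e.g. "26" */
    {
        char path[64];
        int fd;
        struct vfio_group_status status = { .argsz = sizeof(status) };
        bool viable = false;

        snprintf(path, sizeof(path), "/dev/vfio/%s", groupNum);

        /* open() can fail with EBUSY if another process (e.g. a running
         * QEMU) already holds the group open */
        if ((fd = open(path, O_RDWR)) < 0)
            return false;

        if (ioctl(fd, VFIO_GROUP_GET_STATUS, &status) == 0)
            viable = !!(status.flags & VFIO_GROUP_FLAGS_VIABLE);

        close(fd);
        return viable;
    }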
For direct device access, this certainly becomes a possibility and I've been trying to steer things in that direction, libvirt has the option to pass an fd for the iommufd and can then pass fds for each of the devices in the new uAPI paradigm.
So, what would be nice here? I assume libvirt would like to use these tests even if the VM is currently running, so we want some kind of sysfs file that returns "true if vfio can succeed iommu_group_claim_dma_owner()" - which would be true if a vfio_device FD is already open, otherwise would check iommu_group_dma_owner_claimed() to look for conflicting drivers?
The initial step is to then enlighten libvirt that other drivers can be compatible for the 'no' case and later we can make smarter choices about which driver to use or allow the user to specify (ie. a user should be able to use vfio-pci rather than a variant driver if they choose) in the 'yes' case.
I really don't like all this snooping around in kernel internals. It is creating ABI where ABI should not exist. If libvirt is going to do this stuff I would be happier if it was conditional on kernel support to avoid it, so we don't end up with a libvirt in the wild that is creating these ABIs.

Yu had split out the vfio_device sysfs part last week, so we might see a series this week or next.
If libvirt is currently testing that only the target device is bound to vfio-pci, then maybe we do have gaps for the ancillary devices in the group, but that gap changes if instead we only test that a vfio group exists relative to the iommu group of the target device.
Yes, but if we check the group then libvirt uses a kernel-led solution and doesn't create ABI - though it is differently imperfect from the current solution...

Jason

On 8/5/22 2:20 PM, Jason Gunthorpe wrote:
On Fri, Aug 05, 2022 at 11:24:08AM -0600, Alex Williamson wrote:
On Thu, 4 Aug 2022 21:11:07 -0300 Jason Gunthorpe <jgg@nvidia.com> wrote:
On Thu, Aug 04, 2022 at 01:36:24PM -0600, Alex Williamson wrote:
That is reasonable, but I'd say those three kernels only have two drivers and they both have vfio as a substring in their name - so the simple thing of just substring searching 'vfio' would get us over that gap.
Looking at the aliases for exactly "vfio_pci" isn't that much more complicated, and "feels" a lot more reliable than just doing a substring search for "vfio" in the driver's name. (It would be, uh, .... "not smart" to name a driver "vfio<anything>" if it wasn't actually a vfio variant driver (or the opposite), but I could imagine it happening; :-/)
This is still pretty hacky. I'm worried about what happens to the kernel if this becomes some crazy unintended uAPI that we never really thought about carefully... This was not a use case when we designed the modules.alias stuff at least.
BTW - why not do things the normal way?
1. readlink /sys/bus/pci/devices/XX/iommu_group
2. Compute basename of #1
3. Check if /dev/vfio/#2 exists (or /sys/class/vfio/#2)
It has a small edge case where a multi-device group might give a false positive for an undrivered device, but for the purposes of libvirt that seems pretty obscure.. (while the above has false negative issues, obviously)
This is not a small edge case, it's extremely common. We have a *lot* of users assigning desktop GPUs and other consumer grade hardware, which are usually multi-function devices without isolation exposed via ACS or quirks.
The edge case is that the user has created a multi-device group, manually assigned device 1 in the group to VFIO, left device 2 with no driver and then told libvirt to manually use device 2. With the above approach libvirt won't detect this misconfiguration and qemu will fail.
libvirt will see that there is no driver at all, and recognize that, by definition, "no driver" == "not a vfio variant driver" :-). So in this case libvirt catches the misconfiguration.
The vfio group exists if any devices in the group are bound to a vfio driver, but the device is not accessible from the group unless the viability test passes. That means QEMU may not be able to get access to the device because the device we want isn't actually bound to a vfio driver or another device in the group is not in a viable state. Thanks,
This is a different misconfiguration that libvirt also won't detect, right? In this case ownership claiming in the kernel will fail and qemu will fail too, like above.
Right. If we're relying on "iommu group matching" as you suggest, then libvirt will mistakenly conclude that the driver is a vfio variant, but then qemu will fail.
This, and the above, could be handled by having libvirt also open the group FD and get the device. It would prove both correct binding and viability.
I had understood the point of this logic was to give better error reporting to users so that common misconfigurations could be diagnosed earlier.
Correct. In the end QEMU and the kernel have the final say of course, but if we can detect the problem sooner then it's more likely the user will get a meaningful error message.
When I say 'small edge case' I mean it seems like an unlikely misconfiguration that someone would know to setup VFIO but then use the wrong BDFs to do it - arguably less likely than someone would know to setup VFIO but forget to unbind the other drivers in the group?
You obviously haven't spent enough time trying to remotely troubleshoot the setups of noobs on IRC :-). If anything can be done wrong, there is certainly someone around who will do it that way. Of course we can't eliminate 100% of these, but the more misconfigurations we can catch, and the earlier we can catch them, the better. All of this without any false error reports, of course. It's better that some edge cases get through (and get shot down by QEMU) than that we mistakenly prevent someone from using a totally viable config (which is the case right now).
But maybe I don't get it at all ...
Nah, you get it. We just have minor differences of opinion of which choice will be the best combination of simple to implement + less user headaches (and in the end any of us could be right; personally my opinion sways from minute to minute).

On Thu, Aug 04, 2022 at 03:11:07PM -0400, Laine Stump wrote:
On 8/4/22 2:36 PM, Jason Gunthorpe wrote:
On Thu, Aug 04, 2022 at 12:18:26PM -0600, Alex Williamson wrote:
On Thu, 4 Aug 2022 13:51:20 -0300 Jason Gunthorpe <jgg@nvidia.com> wrote:
On Mon, Aug 01, 2022 at 09:49:28AM -0600, Alex Williamson wrote:
Fortunately these new vendor/device-specific drivers can be easily identified as being "vfio-pci + extra stuff" - all that's needed is to look at the output of the "modinfo $driver_name" command to see if "vfio_pci" is in the alias list for the driver.
We are moving in a direction on the kernel side to expose a sysfs under the PCI device that definitively says it is VFIO enabled, eg something like
/sys/devices/pci0000:00/0000:00:1f.6/vfio/<N>
Which is how every other subsystem in the kernel works. When this lands libvirt can simply stat the vfio directory and confirm that the device handle it is looking at is vfio enabled, for all things that vfio support.
My thinking had been to do the above work a bit later, but if libvirt needs it right now then let's do it right away so we don't have to worry about this hacky modprobe stuff down the road?
That seems like a pretty long gap, there are vfio-pci variant drivers since v5.18 and this hasn't even been proposed for v6.0 (aka v5.20) midway through the merge window. We therefore have at least 3 kernels exposing devices in a way that libvirt can't make use of simply due to a driver matching test.
That is reasonable, but I'd say those three kernels only have two drivers and they both have vfio as a substring in their name - so the simple thing of just substring searching 'vfio' would get us over that gap.
Looking at the aliases for exactly "vfio_pci" isn't that much more complicated, and "feels" a lot more reliable than just doing a substring search for "vfio" in the driver's name. (It would be, uh, .... "not smart" to name a driver "vfio<anything>" if it wasn't actually a vfio variant driver (or the opposite), but I could imagine it happening; :-/)
If it is just 2 drivers so far then we don't need to even do a substring. We should do a precise full string match for just those couple of drivers that exist. We don't need to care about out of tree drivers IMHO.

With regards,
Daniel

On Thu, Aug 04, 2022 at 01:51:20PM -0300, Jason Gunthorpe wrote:
On Mon, Aug 01, 2022 at 09:49:28AM -0600, Alex Williamson wrote:
Fortunately these new vendor/device-specific drivers can be easily identified as being "vfio-pci + extra stuff" - all that's needed is to look at the output of the "modinfo $driver_name" command to see if "vfio_pci" is in the alias list for the driver.
We are moving in a direction on the kernel side to expose a sysfs under the PCI device that definitively says it is VFIO enabled, eg something like
/sys/devices/pci0000:00/0000:00:1f.6/vfio/<N>
Which is how every other subsystem in the kernel works. When this lands libvirt can simply stat the vfio directory and confirm that the device handle it is looking at is vfio enabled, for all things that vfio support.
My thinking had been to do the above work a bit later, but if libvirt needs it right now then let's do it right away so we don't have to worry about this hacky modprobe stuff down the road?
I wouldn't go so far as to say libvirt "needs" it, as obviously we can make it work using the modules.alias information. I would say that exposing this in sysfs, though, makes it simpler and faster, because the check then essentially turns into a single stat() call. So from that POV libvirt would be happy to see that improvement.

With regards,
Daniel
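A sketch of what that single-stat() check could look like if the per-device sysfs directory lands as proposed earlier in the thread (the path layout is not yet a kernel ABI, and the helper name is invented):

    #include <limits.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <sys/stat.h>

    static bool
    deviceDriverIsVfio(const char *sysfsDevPath)
    {
        /* e.g. sysfsDevPath = "/sys/bus/pci/devices/0000:00:1f.6" */
        char path[PATH_MAX];
        struct stat sb;

        snprintf(path, sizeof(path), "%s/vfio", sysfsDevPath);
        return stat(path, &sb) == 0 && S_ISDIR(sb.st_mode);
    }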

On Mon, Aug 01, 2022 at 09:30:38AM -0400, Laine Stump wrote:
On 8/1/22 7:58 AM, Erik Skultety wrote:
Instead of calling an external program and then grepping its output, which technically could change in the future, wouldn't it be better if we read /lib/modules/`uname -r`/modules.alias and filtered whatever line had the 'vfio-pci' substring and compared the module name with the user-provided device driver?
Again, although I was hesitant about calling an external command, and asked if there was something simpler, Alex still suggested modinfo, so I'll let him answer that. Alex?
(Also, although the format of the output of "uname -r" is pretty much written in stone, you're still running an external command :-))
You wouldn't actually call 'uname -r', you'd invoke the uname(2) function and use the 'release' field in 'struct utsname'.

I'd favour reading modules.alias directly over invoking modinfo for sure, though I'd be even more in favour of the kernel just exposing the sysfs attribute and in the meanwhile just hardcoding the only 2 driver names that exist so far.

With regards,
Daniel
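A sketch of that alternative: uname(2) for the running kernel's release, then a scan of modules.alias, assuming the usual "alias <pattern> <module>" line layout and that a variant driver's alias contains the string "vfio_pci" as described earlier in the thread. The helper name is invented, and a real implementation would also normalize '-' vs '_' between driver and module names:

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/utsname.h>

    static bool
    driverHasVfioPciAlias(const char *drvName)
    {
        struct utsname uts;
        char path[512];
        char line[1024];
        FILE *fp;
        bool found = false;

        if (uname(&uts) < 0)
            return false;

        snprintf(path, sizeof(path),
                 "/lib/modules/%s/modules.alias", uts.release);
        if (!(fp = fopen(path, "r")))
            return false;

        while (fgets(line, sizeof(line), fp)) {
            char alias[256], module[256];

            /* each line looks like: "alias <pattern> <module>" */
            if (sscanf(line, "alias %255s %255s", alias, module) != 2)
                continue;
            if (strstr(alias, "vfio_pci") && strcmp(module, drvName) == 0) {
                found = true;
                break;
            }
        }
        fclose(fp);
        return found;
    }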

On 8/5/22 5:40 AM, Daniel P. Berrangé wrote:
On Mon, Aug 01, 2022 at 09:30:38AM -0400, Laine Stump wrote:
On 8/1/22 7:58 AM, Erik Skultety wrote:
Instead of calling an external program and then grepping its output, which technically could change in the future, wouldn't it be better if we read /lib/modules/`uname -r`/modules.alias and filtered whatever line had the 'vfio-pci' substring and compared the module name with the user-provided device driver?
Again, although I was hesitant about calling an external command, and asked if there was something simpler, Alex still suggested modinfo, so I'll let him answer that. Alex?
(Also, although the format of the output of "uname -r" is pretty much written in stone, you're still running an external command :-))
You wouldn't actually call 'uname -r', you'd invoke the uname(2) function and use the 'release' field in 'struct utsname'.
Yeah, I wasn't thinking clearly when I said that :-P
I'd favour reading modules.alias directly over invoking modinfo for sure, though I'd be even more in favour of the kernel just exposing the sysfs attribute and in the meanwhile just hardcoding the only 2 driver names that exist so far.
The problem with hardcoding the 2 existing driver names is that it wouldn't do any good to anyone developing a new driver, and part of the aim of doing this is to make it possible for developers to test their new drivers using libvirt (and management systems based on libvirt).

On Fri, Aug 05, 2022 at 11:46:17AM -0400, Laine Stump wrote:
On 8/5/22 5:40 AM, Daniel P. Berrangé wrote:
On Mon, Aug 01, 2022 at 09:30:38AM -0400, Laine Stump wrote:
On 8/1/22 7:58 AM, Erik Skultety wrote:
Instead of calling an external program and then grepping its output, which technically could change in the future, wouldn't it be better if we read /lib/modules/`uname -r`/modules.alias and filtered whatever line had the 'vfio-pci' substring and compared the module name with the user-provided device driver?
Again, although I was hesitant about calling an external command, and asked if there was something simpler, Alex still suggested modinfo, so I'll let him answer that. Alex?
(Also, although the format of the output of "uname -r" is pretty much written in stone, you're still running an external command :-))
You wouldn't actually call 'uname -r', you'd invoke the uname(2) function and use the 'release' field in 'struct utsname'.
Yeah, I wasn't thinking clearly when I said that :-P
I'd favour reading modules.alias directly over invoking modinfo for sure, though I'd be even more in favour of the kernel just exposing the sysfs attribute and in the meanwhile just hardcoding the only 2 driver names that exist so far.
The problem with hardcoding the 2 existing driver names is that it wouldn't do any good to anyone developing a new driver, and part of the aim of doing this is to make it possible for developers to test their new drivers using libvirt (and management systems based on libvirt).
I'm only suggesting hardcoding the driver names, *if* the kernel folks agree to expose the sysfs directory. Anyone developing new drivers is unlikely to have their drivers merged before this new sysfs dir is added, so it is not a significant enough real world blocker for them for us to worry about. Best to focus on what the best long term approach is, and not worry about problems that will only exist for a couple of kernel releases today.

With regards,
Daniel
Participants (5): Alex Williamson, Daniel P. Berrangé, Erik Skultety, Jason Gunthorpe, Laine Stump