On 1/5/24 8:46 AM, Peter Krempa wrote:
On Fri, Jan 05, 2024 at 03:20:16 -0500, Laine Stump wrote:
> Rather than always binding to the vfio-pci driver, use the new
> function virPCIDeviceFindBestVFIOVariant() to see if the running
> kernel has a VFIO variant driver available that is a better match for
> the device, and if one is found, use that instead.
>
> virPCIDeviceFindBestVFIOVariant() function reads the modalias file for
> the given device from sysfs, then looks through
> /lib/modules/${kernel_release}/modules.alias for the vfio_pci alias
> that matches with the least number of wildcard ('*') fields.
>
> The appropriate "VFIO variant" driver for a device will be the PCI
> driver implemented by the discovered module - these drivers are
> compatible with (and provide the entire API of) the standard vfio-pci
> driver, but have additional device-specific APIs that can be useful
> for, e.g., saving/restoring state for migration.
>
> If a specific driver is named (using <driver model='blah'/> in the
> device XML), that will still be used rather than searching
> modules.alias; this makes it possible to force binding of vfio-pci if
> there is an issue with the auto-selected variant driver.
>
> Signed-off-by: Laine Stump <laine(a)redhat.com>
> ---
>
> Changes from V2:
>
> * fail if device modalias file isn't found.
This [1] ...
> * use unsigned int instead of int for wildcardCt
> * increase file memory buffer from 4MB to 8MB
> * other minor nits pointed out by Peter
[...]
> +/* virPCIDeviceFindBestVFIOVariant:
> + *
> + * Find the "best" match of all vfio_pci aliases for @dev in the host
> + * modules.alias file. This uses the algorithm of finding every
> + * modules.alias line that begins with "vfio_pci:", then picking the
> + * one that matches the device's own modalias value (from the file of
> + * that name in the device's sysfs directory) with the fewest
> + * "wildcards" (* character, meaning "match any value for this
> + * attribute").
> + */
> +int
> +virPCIDeviceFindBestVFIOVariant(virPCIDevice *dev,
> + char **moduleName)
> +{
> + g_autofree char *devModAliasPath = NULL;
> + g_autofree char *devModAliasContent = NULL;
> + const char *devModAlias;
> + g_autoptr(virPCIDeviceAliasInfo) devModAliasInfo = NULL;
> + struct utsname unameInfo;
> + g_autofree char *modulesAliasPath = NULL;
> + g_autofree char *modulesAliasContent = NULL;
> + const char *line;
> + unsigned int currentBestWildcardCt = INT_MAX;
UINT_MAX
> +
> + *moduleName = NULL;
> +
> + /* get the modalias values for the device from sysfs */
> + devModAliasPath = virPCIFile(dev->name, "modalias");
> + if (virFileReadAll(devModAliasPath, 100, &devModAliasContent) < 0)
... [1] isn't true.
How so? If the file can't be opened, then virFileReadAll logs an error
and returns -1, and changed changed this code from ignoring that to also
returning -1, which will cause the caller to fail. What am I missing?
> + return -1;
> +
> + VIR_DEBUG("modalias path: '%s' contents: '%s'",
> + devModAliasPath, devModAliasContent);
> +
> + /* "pci:vNNNNNNNNdNNNNNNNNsvNNNNNNNNsdNNNNNNNNbcNNscNNiNN\n" */
> + if ((devModAlias = STRSKIP(devModAliasContent, "pci:")) == NULL ||
> + !(devModAliasInfo = virPCIDeviceAliasInfoNew(devModAlias))) {
> + virReportError(VIR_ERR_INTERNAL_ERROR,
> + _("device modalias file %1$s content has improper
format"),
> + devModAliasPath);
> + return -1;
> + }
> +
> + uname(&unameInfo);
> + modulesAliasPath = g_strdup_printf("/lib/modules/%s/modules.alias",
unameInfo.release);
> + if (virFileReadAll(modulesAliasPath, 8 * 1024 * 1024, &modulesAliasContent)
< 0)
> + return -1;
IIUC this is picking a driver which is 'better' than the default
'vfio_pci', but the device itself would still work if the default were
used, right?
Correct.
In such case do we really want to treat any of the errors above as
failures?
I was on the fence about this, which was why earlier versions ignored
the error. But your earlier suggestion to check for file existence
before trying to read the file (thus eliminating any sort of message at
all, which I think would be wrong) pushed me in the direction of logging
the error and failing (since it would be *really* broken for the
modalias file to be missing).
I would also be fine with logging the error when virFileReadAll() fails
(ie *not* checking if the file exists so we could silently avoid the
error log), but then clearing the and returning success. Does that work
for you?
I presueme a workaround for if any of the above breaks is to
use an explicitly specified driver, right?
Correct. If a driver is explicitly given, then we don't attempt to find
a better match. So in the case that a device really had no modalias file
but was otherwise working, the error could be worked around by adding
<driver model='vfio-pci'/> to the XML (assuming that whatever is
creating the XML is able to do that).
> +
> + /* Look for all lines that are aliases for vfio_pci drivers.
> + * (The first line is always a comment, so we can be sure "alias"
> + * is preceded by a newline)
> + */
> + line = modulesAliasContent;
> +