On Fri, Jan 05, 2024 at 10:06:13 -0500, Laine Stump wrote:
On 1/5/24 8:46 AM, Peter Krempa wrote:
> On Fri, Jan 05, 2024 at 03:20:16 -0500, Laine Stump wrote:
> > Rather than always binding to the vfio-pci driver, use the new
> > function virPCIDeviceFindBestVFIOVariant() to see if the running
> > kernel has a VFIO variant driver available that is a better match for
> > the device, and if one is found, use that instead.
> >
> > virPCIDeviceFindBestVFIOVariant() function reads the modalias file for
> > the given device from sysfs, then looks through
> > /lib/modules/${kernel_release}/modules.alias for the vfio_pci alias
> > that matches with the least number of wildcard ('*') fields.
> >
> > The appropriate "VFIO variant" driver for a device will be the PCI
> > driver implemented by the discovered module - these drivers are
> > compatible with (and provide the entire API of) the standard vfio-pci
> > driver, but have additional device-specific APIs that can be useful
> > for, e.g., saving/restoring state for migration.
> >
> > If a specific driver is named (using <driver model='blah'/> in
the
> > device XML), that will still be used rather than searching
> > modules.alias; this makes it possible to force binding of vfio-pci if
> > there is an issue with the auto-selected variant driver.
> >
> > Signed-off-by: Laine Stump <laine(a)redhat.com>
> > ---
> >
> > Changes from V2:
> >
> > * fail if device modalias file isn't found.
>
> This [1] ...
>
> > * use unsigned int instead of int for wildcardCt
> > * increase file memory buffer from 4MB to 8MB
> > * other minor nits pointed out by Peter
>
> [...]
>
> > +/* virPCIDeviceFindBestVFIOVariant:
> > + *
> > + * Find the "best" match of all vfio_pci aliases for @dev in the
host
> > + * modules.alias file. This uses the algorithm of finding every
> > + * modules.alias line that begins with "vfio_pci:", then picking
the
> > + * one that matches the device's own modalias value (from the file of
> > + * that name in the device's sysfs directory) with the fewest
> > + * "wildcards" (* character, meaning "match any value for this
> > + * attribute").
> > + */
> > +int
> > +virPCIDeviceFindBestVFIOVariant(virPCIDevice *dev,
> > + char **moduleName)
> > +{
> > + g_autofree char *devModAliasPath = NULL;
> > + g_autofree char *devModAliasContent = NULL;
> > + const char *devModAlias;
> > + g_autoptr(virPCIDeviceAliasInfo) devModAliasInfo = NULL;
> > + struct utsname unameInfo;
> > + g_autofree char *modulesAliasPath = NULL;
> > + g_autofree char *modulesAliasContent = NULL;
> > + const char *line;
> > + unsigned int currentBestWildcardCt = INT_MAX;
>
> UINT_MAX
>
> > +
> > + *moduleName = NULL;
> > +
> > + /* get the modalias values for the device from sysfs */
> > + devModAliasPath = virPCIFile(dev->name, "modalias");
> > + if (virFileReadAll(devModAliasPath, 100, &devModAliasContent) < 0)
>
> ... [1] isn't true.
How so? If the file can't be opened, then virFileReadAll logs an error and
returns -1, and changed changed this code from ignoring that to also
returning -1, which will cause the caller to fail. What am I missing?
I think I misremembered. I thought that the idea was that this should
not fail if the file is missing.
> > + return -1;
> > +
> > + VIR_DEBUG("modalias path: '%s' contents: '%s'",
> > + devModAliasPath, devModAliasContent);
> > +
> > + /* "pci:vNNNNNNNNdNNNNNNNNsvNNNNNNNNsdNNNNNNNNbcNNscNNiNN\n" */
> > + if ((devModAlias = STRSKIP(devModAliasContent, "pci:")) == NULL
||
> > + !(devModAliasInfo = virPCIDeviceAliasInfoNew(devModAlias))) {
> > + virReportError(VIR_ERR_INTERNAL_ERROR,
> > + _("device modalias file %1$s content has improper
format"),
> > + devModAliasPath);
> > + return -1;
> > + }
> > +
> > + uname(&unameInfo);
> > + modulesAliasPath =
g_strdup_printf("/lib/modules/%s/modules.alias", unameInfo.release);
> > + if (virFileReadAll(modulesAliasPath, 8 * 1024 * 1024,
&modulesAliasContent) < 0)
> > + return -1;
>
> IIUC this is picking a driver which is 'better' than the default
> 'vfio_pci', but the device itself would still work if the default were
> used, right?
Correct.
>
> In such case do we really want to treat any of the errors above as
> failures?
I was on the fence about this, which was why earlier versions ignored the
error. But your earlier suggestion to check for file existence before trying
to read the file (thus eliminating any sort of message at all, which I think
would be wrong) pushed me in the direction of logging the error and failing
(since it would be *really* broken for the modalias file to be missing).
I would also be fine with logging the error when virFileReadAll() fails (ie
*not* checking if the file exists so we could silently avoid the error log),
but then clearing the and returning success. Does that work for you?
Hmmm, yeah I understand why. Each option has benefits.
I was worried a bit about short-term occuring bugs, but long term it
makes sense to just report teh error if it doesnt' work ...
> I presueme a workaround for if any of the above breaks is to
> use an explicitly specified driver, right?
Correct. If a driver is explicitly given, then we don't attempt to find a
better match. So in the case that a device really had no modalias file but
was otherwise working, the error could be worked around by adding <driver
model='vfio-pci'/> to the XML (assuming that whatever is creating the XML is
able to do that).
... and there is a workaround for if stuff breaks. Thus:
Reviewed-by: Peter Krempa <pkrempa(a)redhat.com>
(fix the UINT_MAX thing)