On 1/5/24 10:30 AM, Peter Krempa wrote:
On Fri, Jan 05, 2024 at 10:06:13 -0500, Laine Stump wrote:
> On 1/5/24 8:46 AM, Peter Krempa wrote:
>> On Fri, Jan 05, 2024 at 03:20:16 -0500, Laine Stump wrote:
>>> Rather than always binding to the vfio-pci driver, use the new
>>> function virPCIDeviceFindBestVFIOVariant() to see if the running
>>> kernel has a VFIO variant driver available that is a better match for
>>> the device, and if one is found, use that instead.
>>>
>>> virPCIDeviceFindBestVFIOVariant() function reads the modalias file for
>>> the given device from sysfs, then looks through
>>> /lib/modules/${kernel_release}/modules.alias for the vfio_pci alias
>>> that matches with the least number of wildcard ('*') fields.
>>>
>>> The appropriate "VFIO variant" driver for a device will be the PCI
>>> driver implemented by the discovered module - these drivers are
>>> compatible with (and provide the entire API of) the standard vfio-pci
>>> driver, but have additional device-specific APIs that can be useful
>>> for, e.g., saving/restoring state for migration.
>>>
>>> If a specific driver is named (using <driver model='blah'/> in
the
>>> device XML), that will still be used rather than searching
>>> modules.alias; this makes it possible to force binding of vfio-pci if
>>> there is an issue with the auto-selected variant driver.
>>>
>>> Signed-off-by: Laine Stump <laine(a)redhat.com>
>>> ---
>>>
>>> Changes from V2:
>>>
>>> * fail if device modalias file isn't found.
>>
>> This [1] ...
>>
>>> * use unsigned int instead of int for wildcardCt
>>> * increase file memory buffer from 4MB to 8MB
>>> * other minor nits pointed out by Peter
>>
>> [...]
>>
>>> +/* virPCIDeviceFindBestVFIOVariant:
>>> + *
>>> + * Find the "best" match of all vfio_pci aliases for @dev in the
host
>>> + * modules.alias file. This uses the algorithm of finding every
>>> + * modules.alias line that begins with "vfio_pci:", then picking
the
>>> + * one that matches the device's own modalias value (from the file of
>>> + * that name in the device's sysfs directory) with the fewest
>>> + * "wildcards" (* character, meaning "match any value for
this
>>> + * attribute").
>>> + */
>>> +int
>>> +virPCIDeviceFindBestVFIOVariant(virPCIDevice *dev,
>>> + char **moduleName)
>>> +{
>>> + g_autofree char *devModAliasPath = NULL;
>>> + g_autofree char *devModAliasContent = NULL;
>>> + const char *devModAlias;
>>> + g_autoptr(virPCIDeviceAliasInfo) devModAliasInfo = NULL;
>>> + struct utsname unameInfo;
>>> + g_autofree char *modulesAliasPath = NULL;
>>> + g_autofree char *modulesAliasContent = NULL;
>>> + const char *line;
>>> + unsigned int currentBestWildcardCt = INT_MAX;
>>
>> UINT_MAX
>>
>>> +
>>> + *moduleName = NULL;
>>> +
>>> + /* get the modalias values for the device from sysfs */
>>> + devModAliasPath = virPCIFile(dev->name, "modalias");
>>> + if (virFileReadAll(devModAliasPath, 100, &devModAliasContent) <
0)
>>
>> ... [1] isn't true.
>
> How so? If the file can't be opened, then virFileReadAll logs an error and
> returns -1, and changed changed this code from ignoring that to also
> returning -1, which will cause the caller to fail. What am I missing?
I think I misremembered. I thought that the idea was that this should
not fail if the file is missing.
Well, it *was*, so you didn't misremember, you were just remembering one
iteration into the past :-)
>>> + return -1;
>>> +
>>> + VIR_DEBUG("modalias path: '%s' contents:
'%s'",
>>> + devModAliasPath, devModAliasContent);
>>> +
>>> + /* "pci:vNNNNNNNNdNNNNNNNNsvNNNNNNNNsdNNNNNNNNbcNNscNNiNN\n"
*/
>>> + if ((devModAlias = STRSKIP(devModAliasContent, "pci:")) ==
NULL ||
>>> + !(devModAliasInfo = virPCIDeviceAliasInfoNew(devModAlias))) {
>>> + virReportError(VIR_ERR_INTERNAL_ERROR,
>>> + _("device modalias file %1$s content has
improper format"),
>>> + devModAliasPath);
>>> + return -1;
>>> + }
>>> +
>>> + uname(&unameInfo);
>>> + modulesAliasPath =
g_strdup_printf("/lib/modules/%s/modules.alias", unameInfo.release);
>>> + if (virFileReadAll(modulesAliasPath, 8 * 1024 * 1024,
&modulesAliasContent) < 0)
>>> + return -1;
>>
>> IIUC this is picking a driver which is 'better' than the default
>> 'vfio_pci', but the device itself would still work if the default were
>> used, right?
>
> Correct.
>
>>
>> In such case do we really want to treat any of the errors above as
>> failures?
>
> I was on the fence about this, which was why earlier versions ignored the
> error. But your earlier suggestion to check for file existence before trying
> to read the file (thus eliminating any sort of message at all, which I think
> would be wrong) pushed me in the direction of logging the error and failing
> (since it would be *really* broken for the modalias file to be missing).
>
> I would also be fine with logging the error when virFileReadAll() fails (ie
> *not* checking if the file exists so we could silently avoid the error log),
> but then clearing the and returning success. Does that work for you?
Hmmm, yeah I understand why. Each option has benefits.
I was worried a bit about short-term occuring bugs, but long term it
makes sense to just report teh error if it doesnt' work ...
>> I presueme a workaround for if any of the above breaks is to
>> use an explicitly specified driver, right?
>
> Correct. If a driver is explicitly given, then we don't attempt to find a
> better match. So in the case that a device really had no modalias file but
> was otherwise working, the error could be worked around by adding <driver
> model='vfio-pci'/> to the XML (assuming that whatever is creating the XML
is
> able to do that).
... and there is a workaround for if stuff breaks. Thus:
Reviewed-by: Peter Krempa <pkrempa(a)redhat.com>
(fix the UINT_MAX thing)
Yup!
Thanks!