On Mon, Nov 06, 2023 at 02:38:57 -0500, Laine Stump wrote:
This patch makes it possible to manually specify which VFIO variant
driver to use for PCI hostdev device assignment, so that, e.g. you
could force use of the generic vfio-pci driver with
<driver name='vfio-pci'/>
when libvirt would have normally (after applying a subsequent patch)
found a "better match" for a device in the active kernel's
modules.alias file. (The main use of this manual override would be to
work around a bug in a new VFIO variant driver by temporarily not
using that driver).
This sounds like a simple addition to XML parsing/formatting, but of
course it's messier than that, since the attribute we want to use was
previously used in config for a now non-existent purpose (choosing a
type of device assignment that was removed from the kernel nearly a
decade ago), and continues to be used *internal to the C code*.
Background:
Beginning when VFIO device assignment support was added to <hostdev>
or <interface type='hostdev'/>) in libvirt's QEMU driver, it was
possible to specify which device assignment backend to use with
"<driver name='vfio|kvm'/>". This was only useful for a couple
of
years in the early 2010's when VFIO device assignment was newly
introduced, but "legacy KVM" device assignment was still possible (in
reality "<driver name='blah'/>" was only intended to be used (1)
in
the *very* early days of VFIO when "kvm" was still the default, or (2)
when the host kernel was too old to have VFIO support (meaning it was
e.g. pre-RHEL7) or (3) some bug was encountered with the then-new VFIO
code that could have been avoided by switching back to the older style
of device assignment (I don't recall anyone ever needing to set it for
this reason, but that is one of the main reasons we added the knob).
Later, when the libxl (Xen) driver in libvirt began supporting "device
passthrough" with <hostdev>, they added <driver name='xen'/> to
indicate that mode of operation. But in practice this was also never
necessary in any config, since Xen only supports one type of device
assignment, and so the attribute was anyway set in the C object by the
libxl driver code prior to calling any hypervisor-common code
(i.e. the stuff in hypervisor/hostdev.c and util/virpci.c) that needs
to know which type of device assignment is being used - setting it in
the XML config was superfluous (kind of like me saying "I am now
describing this patch in a human language", ref: Perd Hapley on "Parks
and Recreation").
So the setting was available in the XML, but never needed to be set by
the user. Because it was automatically set in the C code though,
sometimes libvirt-generated XML would contain the option even though
the user hadn't included it in the original input XML (e.g. the libxl
driver sets it in the PostParse, so all saved configs with a PCI
<hostdev> device will have <driver name='xen'/>; also status XML
from
the QEMU and libxl drivers will contain <driver name='vfio|xen'/> - in
both cases completely unnecessary and redundant information).
When adding support for specifying a variant driver for VFIO device
assignment, from the beginning I have wanted to use <driver
name='blah'/> to force a specific variant driver (e.g. <driver
name='mlx5_vfio_pci'/>), with the assumption that the name attribute
could be easily repurposed because it had been *completely* unused for
so long. What I discovered though, was that 1) the field in the C
object corresponding to driver name was an enum value rather than a
string (so parser and formatter need to be changed, not just the
driver code looking at a string in the C object), and 2) even though
the XML attribute was effectively unused (except in some output), the
field in the C object was *always* being set internally as a way for
the hypervisor driver code to inform the hypervisor-common hostdev
manager (in src/hypervisor) which method of device assignment to
use. So re-use wasn't as simple as I'd wished.
However.
What I have hit upon that permits us to use
<driver name='mlx5_vfio_pci'/> is to do the following:
1) rename the existing driver name attribute to driver *type* - this
will now be the enum that is set internally by the hypervisor
driver prior to calling the hostdev manager (and also is what will
show up in status XML; the libxl driver was modified in a previous
patch so that the setting isn't done until domain runtime (rather
than during PostParse), so it will no longer show up in
regurgitated libxml config)
2) add a new attribute with the now-newly-unused name "name" - this
will be a string that can be used to communicate a specific host
kernel driver to the hostdev manager.
3) In the parsing code for <driver>, if <driver
name='vfio|xen|kvm'/>
is encountered, set the driver *type* to the appropriate enum
instead, and clear our the name string. (since the four places
that use the hostdev <driver> element now all call a common parser
function, this check is only needed in a single place)
I could alternately just leave "name" as-is, and create a new
attribute with a never-before-used name within driver. I suppose
instead of "type" and "name", we could instead have "name and
"model"
(where "model" would be the string that could be set to
"mlx5_vfio_pci"). I just prefer type/name to name/model, and think
it's been long enough since <driver name='blah'/> has had a useful
purpose (and the fixup for backward compatibility is limited to a few
lines in one function) that this change is safe.
I'm not entirely sold on the reusing of the 'name' attribute and this
paragraph seems to confirm that it's just for optics.
I'm a bit worried that some tool may be checking the output here however
useless it is and that tool might then need changing just because we
wanted "better looking" attribute names.
(another alternate would be to add an extra parameter to the C API
that calls into the hostdev manager rather than keeping/adding the
publicly visible "type" attribute, but it seems at least possible that
sometime in the future another alternate "type" of device assignment
could be introduced for either Xen or QEMU, and we would then need to
add back the XML attribute anyway)
I'd be fine with doing it one of the other ways, but decided to post
this way first, since it's the one that I find the most aesthetically
pleasing :-)
Okay this paragraph is a bit too much of discussion for a commit
message. Please put stuff like this under the lines or as a RFC
question.
Note that a few test cases have been modified - places where an
existing "name='vfio'" was changed to "type='vfio'"
were in in status
XML or only the *output* XML for a test (except the case of the
virnetworkportxml2xmltest, which doesn't have a separate directory for
the XML result; fortunately the converged parsing of <driver> between
domain/network/networkport means that the test cases for network and
domain XML are already testing the same code that would convert "name"
to "type" in thte networkport XML)
Signed-off-by: Laine Stump <laine(a)redhat.com>
---
The code looks okay but I'm not a fan of the rename for cosmetical
reasons. Especially if we fiddle with the value in the field with
original name and display it back to the user/management sw.