Re: [libvirt PATCH v2 10/15] xen: explicitly set hostdev driver.type at runtime, not in postparse

Monday, 18 December 2023

On Mon, Dec 18, 2023 at 01:14:55 -0500, Laine Stump wrote:
...
 On 11/27/23 10:12 AM, Peter Krempa wrote:
 > On Mon, Nov 06, 2023 at 02:38:55 -0500, Laine Stump wrote:
 > > Xen only supports a single type of PCI hostdev assignment, so it is
 > > superfluous to have <driver name='xen'/> peppered throughout the
 > > config. It *is* necessary to have the driver type explicitly set in
 > > the hosdev object before calling into the hypervisor-agnostic "hostdev
 > > manager" though (otherwise the hostdev manager doesn't know whether
it
 > > should do Xen-specific setup, or VFIO-specific setup).
 > > 
 > > Historically, the Xen driver has checked for "default" driver type
 > > (i.e. not set in the XML), and set it to "xen', during the XML
 > > postparse, thus guaranteeing that it will be set by the time the
 > > object is sent to the hostdev manager at runtime, but also setting it
 > > so early that a simple round-trip of parse-format results in the XML
 > > always containing an explicit <driver name='xen'/>, even if that
 > > wasn't specified in the original XML.
 > 
 > Generally stuff that is written to the definition at parse time is done
 > so that it doesn't change in the future, thus preserving ABI of machines
 > created with a config where the default was not entered yet.

 I don't think there was any intentional "lock in the default setting in the
 config to preserve ABI" in this case - it all seems to just be
 convenience/serendipity. From what I can see in the original code adding
 "PCI passthrough" to the libxl driver, they were just filling in the driver
 name in the xml because 1) it was there, and 2) they had just split out some
 of the code from qemu into "common code" to be used by both libxl and QEMU,
 needed a way to specify which driver was calling into this common code, saw
 the driver name as a convenient way to do it, and noticed that lots of other
 values that weren't set by the user were being set in the postparse, so
 that's where they set it in their code.

 (BTW, there was code in the original commit adding PCI passthrough to libxl
 (commit 6225cb3df5a5732868719462f0a602ef11f7fa03) that logged an error if
 the driver name was anything other than empty (in which case it was set to
 "xen") or "xen". That code is still there today (see
 libxlNodeDeviceDetachFlags), so the Xen driver definiteoy only supports the
 one type of PCI device assignment.)

 > 
 > This commit message doesn't make an argument why the change is needed,
 > so I'm a bit reluctant...

 Well... the less use of the old usage of <driver name> there is, the better,
 and preventing it from being written into every single new config file with
 a <hostdev> means less use. Also this change makes the libxl driver more
 consistent with the behavior of the qemu driver (which has never set an
 explicit default in the postparse - it waits until the devices are being
 initialized for domain startup/hotplug before it sets the driver type (and
 it also always sets it to the same value).

 I suppose I could omit this patch, but I think it's just perpetuating
 unintentional code that is inconsistent with the other driver (qemu) which
 serves to make it more difficult for a newcomer to understand what's going
 on with the <driver name/type> and what is the proper way to handle it in
 the case of some new hypervisor driver that decides to support <hostdev>.

 > 
 > 
 > > The QEMU driver *doesn't* set driver.type during postparse though;
 > > instead, it waits until domain startup time (or device attach time for
 > > hotplug), and sets the driver.type then. The result is that a
 > > parse-format round trip of the XML in the QEMU driver *doesn't* add in
 > > the <driver name='vfio'/>.
 > > 
 > > This patch modifies the Xen code to behave similarly to the QEMU code
 > > - the PostParse just checks for a driver.type that isn't supported by
 > > the Xen driver, and any explicit setting to "xen" is deferred until
domain
 > > runtime rather than during the postparse.
 > > 
 > > This delayed setting of driver.type of course results in slightly
 > > different xml2xml parse-format results, so the unit test data is
 > > modified accordingly.
 > > 
 > > Signed-off-by: Laine Stump <laine(a)redhat.com&gt;
 > > ---
 > >   src/libxl/libxl_domain.c                      | 65 +++++++++++++++----
 > >   src/libxl/libxl_driver.c                      | 25 ++++---
 > >   tests/libxlxml2domconfigdata/moredevs-hvm.xml |  1 -
 > >   tests/xlconfigdata/test-fullvirt-pci.xml      |  2 -
 > >   tests/xmconfigdata/test-pci-dev-syntax.xml    |  2 -
 > >   tests/xmconfigdata/test-pci-devs.xml          |  2 -
 > >   6 files changed, 69 insertions(+), 28 deletions(-)
 > > 
 > > diff --git a/src/libxl/libxl_domain.c b/src/libxl/libxl_domain.c
 > > index 22482adfa6..ecdf57e655 100644
 > > --- a/src/libxl/libxl_domain.c
 > > +++ b/src/libxl/libxl_domain.c
 > > @@ -160,8 +160,15 @@ libxlDomainDeviceDefPostParse(virDomainDeviceDef *dev,
 > >           if (hostdev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS &&
 > >               hostdev->source.subsys.type ==
VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI &&
 > > -            pcisrc->driver.type ==
VIR_DEVICE_HOSTDEV_PCI_DRIVER_TYPE_DEFAULT)
 > > -            pcisrc->driver.type = VIR_DEVICE_HOSTDEV_PCI_DRIVER_TYPE_XEN;
 > > +            pcisrc->driver.type !=
VIR_DEVICE_HOSTDEV_PCI_DRIVER_TYPE_DEFAULT &&
 > > +            pcisrc->driver.type != VIR_DEVICE_HOSTDEV_PCI_DRIVER_TYPE_XEN)
{
 > > +
 > > +            /* Xen only supports "Xen" style of hostdev, of course!
*/
 > > +            virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
 > > +                           _("XEN does not support device assignment mode
'%1$s'"),
 > > +                          
virDeviceHostdevPCIDriverTypeToString(pcisrc->driver.type));
 > > +            return -1;
 > > +        }
 > >       }
 > >       if (dev->type == VIR_DOMAIN_DEVICE_VIDEO) {
 > > @@ -986,18 +993,9 @@ libxlNetworkPrepareDevices(virDomainDef *def)
 > >           if (actualType == VIR_DOMAIN_NET_TYPE_HOSTDEV &&
 > >               net->type == VIR_DOMAIN_NET_TYPE_NETWORK) {
 > >               /* Each type='hostdev' network device must also have a
 > > -             * corresponding entry in the hostdevs array. For netdevs
 > > -             * that are hardcoded as type='hostdev', this is already
 > > -             * done by the parser, but for those allocated from a
 > > -             * network / determined at runtime, we need to do it
 > > -             * separately.
 > > +             * corresponding entry in the hostdevs array.
 > >                */
 > >               virDomainHostdevDef *hostdev =
virDomainNetGetActualHostdev(net);
 > > -            virDomainHostdevSubsysPCI *pcisrc =
&hostdev->source.subsys.u.pci;
 > > -
 > > -            if (hostdev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS &&
 > > -                hostdev->source.subsys.type ==
VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI)
 > > -                pcisrc->driver.type =
VIR_DEVICE_HOSTDEV_PCI_DRIVER_TYPE_XEN;
 > >               if (virDomainHostdevInsert(def, hostdev) < 0)
 > >                   return -1;
 > > @@ -1007,6 +1005,46 @@ libxlNetworkPrepareDevices(virDomainDef *def)
 > >       return 0;
 > >   }
 > > +
 > > +static int
 > > +libxlHostdevPrepareDevices(virDomainDef *def)
 > > +{
 > > +    size_t i;
 > > +
 > > +    for (i = 0; i < def->nhostdevs; i++) {
 > > +        virDomainHostdevDef *hostdev = def->hostdevs[i];
 > > +        virDomainHostdevSubsysPCI *pcisrc =
&hostdev->source.subsys.u.pci;
 > > +
 > > +        if (hostdev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS &&
 > > +            hostdev->source.subsys.type ==
VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI &&
 > > +            pcisrc->driver.type ==
VIR_DEVICE_HOSTDEV_PCI_DRIVER_TYPE_DEFAULT) {
 > > +
 > 
 > ... and here I don't think it's obvious enough that the default must
 > never change.

 But is that the case? (that the default must never change, I mean) It's only
 going to make a difference if libxl adds support for some other type of PCI
 device assignment *and* there is something in that new type of device
 assignment that changes the guest ABI in an incompatible way. (I'm not
 certain, but I'm guessing that Xen doesn't live migration of domains with an
 assigned PCI device, so likely any guest would need to be fully shut down to
 move from the existing type of device assignment to some hypothetical new
 type that may exist at some time in the future.

 At the time that VFIO device assignment first became available, there was no
 <driver name='kvm'/> in any existing config, and no existing management
app
 had a setting for the type of device assignment. This turned out to be a
 very *good* thing, because it meant that everyone began using VFIO device
 assignment by default immediately even on existing configs (and the
 newly-introduced <driver name='kvm|vfio'/> was only used if someone found
a
 problem in VFIO and needed to temporarily switch back to KVM; fortunately I
 don't think anyone ever did).

 If we had instead required an explicit change to config for anyone to switch
 to using VFIO device assignment instead of legacy KVM device assignment (in
 addition to also requiring the management applications to add a knob to turn
 it on) it surely would have taken *years* longer for VFIO to be fully
 accepted/adopted, and it would have been a correspondingly longer time
 before the legacy KVM device assignment code could be deprecated and then
 finally removed from the kernel and qemu.

 Anyway, I don't see an advantage to explicitly writing into the config the
 value of a knob that only has one valid setting. If, at some point in the
 future, it turns out that libxl adds some new type of device assignment, and
 that new type of device assignement is incompatible with the current device
 assignment in such a way that they don't want users to automatically begin
 using it, then at that time it can just be explicitly stated that "not
 having the driver type set in the config means that the type is "legacy
 xen". But maybe that *won't* be needed, and it will be totally okay for
 everyone to just automatically start using this hypothetical "new type of
 device assignment for libxl"; why make the decision now that we shouldn't
 allow that?

 So have I convinced you, or should I drop it? 
The explanation makes sense. You can mention in the commit message that
this is mostly a backend thing so it's likely that it can be upgraded in
the future in a compatible way.

Ideally add a comment to the xen function stating that it can change if
it's ABI compatible.

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

Re: [libvirt PATCH v2 10/15] xen: explicitly set hostdev driver.type at runtime, not in postparse