Re: [libvirt] [Qemu-devel] clean/simple Q35 support in libvirt+QEMU for guest OSes that don't support virtio-1.0

Friday, 17 August 2018

On Thu, Aug 16, 2018 at 06:20:29PM -0400, Laine Stump wrote:
...
 Summary of the problem:

 1) We want to persuade libvirt+QEMU users to move away from the i440fx
 machinetype in favor of Q35. (NB: Someday this *might* lead to the
 ability to deprecate and even remove the 440fx machinetype, but even if
 that were to happen, it would be a *very long* time from now, so this
 discussion is *not* about that!) 
There are plenty of OS that will never support Q35 and are still interesting
to use under QEMU. The set which could use Q35, but lack virtio1.0 is fairly
small. So removal of i440fx is really only something for downstream KVM vendors
to consider. Those vendors only care about modern OS, but upstream is much
more open minded about what QEMU is used for, so I see it probably living
forever in upstream, or at least long enough that current maintainers will
be retired ;-P.

...
 2) When Q35 machinetype is used, libvirt assigns virtio devices to a
 slot on a PCI Express controller (because why have modern PCIe
 controllers/slots available but force everything onto clunky old legacy
 controllers?).

 3) When a virtio device is plugged into an Express controller, QEMU
 disables the device's IO port space, and it is put into "modern-only"
 mode (this is done to avoid a rapid exhaustion of limited IO port space).

 4) modern-only virtio devices won't work with a legacy (virtio-0.9-only)
 guest driver, because virtio-0.9 requires IO port space.

 5) Some guest OSes that we still want to support (and which would
 otherwise work okay on a Q35 virtual machine) have virtio drivers too
 old to support virtio-1.0 (CentOS6 and RHEL6 are examples of such OSes),
 but due to the chain of reasons listed above, the "standard" config for
 a Q35 guest generated by libvirt doesn't support virtio-0.9, hence
 doesn't support these guest OSes. 
Note when talking about "support" you're really saying it from the
downstream vendor, specifically RHEL, POV. From upstream or Fedora POV
essentially all x86 OS ever made are in scope for running under QEMU
if suitable virtual hardware models have been provided. QEMU doesn't
maintain any whitelist of "supported" OS that differs from what is
technically capable of being run, in the way downstream vendors do.
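
To make the chain of reasons in points 2) to 5) concrete, here is a minimal
Python sketch. The enum and function names are made up for illustration and
are not libvirt or QEMU APIs; the code only encodes the default behaviour
described in the quoted summary.

```python
from enum import Enum


class Slot(Enum):
    PCI = "legacy PCI"
    PCIE = "PCI Express"


class Mode(Enum):
    TRANSITIONAL = "transitional"  # legacy and modern interfaces, uses IO port space
    MODERN_ONLY = "modern-only"    # virtio-1.0 only, IO port space disabled


def default_virtio_mode(slot: Slot) -> Mode:
    """Mode a virtio device ends up in by default for a given slot type."""
    return Mode.MODERN_ONLY if slot is Slot.PCIE else Mode.TRANSITIONAL


def guest_driver_works(slot: Slot, guest_has_virtio_1_0: bool) -> bool:
    """A virtio-0.9-only guest driver needs a transitional device."""
    mode = default_virtio_mode(slot)
    return guest_has_virtio_1_0 or mode is Mode.TRANSITIONAL


# A guest without a virtio-1.0 driver on the default Q35 layout,
# then the same guest with the device on a legacy PCI slot:
print(guest_driver_works(Slot.PCIE, guest_has_virtio_1_0=False))  # False
print(guest_driver_works(Slot.PCI, guest_has_virtio_1_0=False))   # True
```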

...
 And here's a list of possible solutions to this problem (note that
 "consumers" means management applications such as OpenStack, oVirt,
 virt-manager, virt-install, gnome-boxes, etc. In all cases, it's assumed
 that the consumer's decision on the action to take will be based on
 information from libosinfo). For completeness, I've included even the
 possibilities that have been rejected, along with a brief synopsis of
 (at least part of) the reason for rejection:

   (1) Add some way libvirt consumers can ask libvirt to place
       virtio devices on a legacy pci slot instead of pcie when
       the machinetype is q35 (qemu sets virtio devices in legacy
       PCI slots to transitional mode, so io port space is enabled
       and virtio-0.9 drivers will work).

       This has been proposed on libvir-list, but rejected. Here is
       the most eloquently stated reasoning for the rejection I could
       find (with thanks to Dan Berrange):

          The domain XML is a way to express the configuration
          of the guest virtual machine.  What we're talking about
          here is a policy tunable for an internal libvirt QEMU
          driver algorithm, and so does not belong anywhere in the
          domain XML. 
Indeed, that's a guiding principle in general, not just for this PCI
question.

...
   (2) Add full-blown pci enumeration support to all libvirt consumers
       (i.e. they will need to build a model of the PCI bus topology
       of each guest, and keep track of which addresses are in use).
       They can then manually place virtio devices on legacy pci slots
       (again, triggering transitional mode) when the intended guest
       OS doesn't support virtio-1.0.

       (This is seen as requiring too much duplicated effort for
       development and support/maintenance, since up until now libvirt
       has been the single point of action for PCI address assignment
       (well, QEMU can do it too, but for > 10 years libvirt has
       *always* provided full PCI addresses for all devices) 
It really depends on the scope of the mgmt app - at some point the mgmt
app needs to take charge to some degree if it has particular ideas
about how a machine should look. Libvirt's placement strategy is a good
default for 95% of use cases, but it'll never be 100%. An example is
setting up a particular PCI topology that is guest NUMA node aware,
using expander buses.

So some apps might take this option, but in the common case it is
undesirable.

...
   (3) Add virtio-1.0 support to all guest OSes. If this is done,
       existing libvirt configs will work.

       (Aside from the difficulty of backporting, and the fact that
       there are going to be some OSes that don't get it *at all*,
       there will always be older releases that haven't gotten the
       backport. So this isn't a complete solution). 
Yep, there will always be guest OS that don't support 1.0. So that's
only a solution if the person who cares about Q35 support also controls
the guest OS in question.

...
   (4) Consumers can continue using the 440fx machinetype for guest
       OSes that don't support virtio-1.0

       (This would work, but perpetuates use of the 440fx
       machinetype, and all for just this one reason (at least in
       the case of CentOS6/RHEL6, which otherwise work just fine with
       Q35)). 
...
From an upstream POV this is always going to be the case. This is
really only an undesirable thing for downstream who are trying to
artificially restrict what QEMU features users have available to
them.

...
   (5) Introduce virtio-0.9, virtio-1.0 models in libvirt
       which are explicitly legacy-only and modern-only.
       QEMU doesn't need to change, as libvirt can simply set
       the right params on existing QEMU models to force the
       behavior.

       (NB: it's unclear to me whether virtio-0.9 simply won't
       work without forcing the device to be on a legacy PCI
       slot, or if that's just "a very bad idea" because it
       will mean that the device uses up extra io port space) 
...
 As a starter for continuing the discussion, it seems to me that for
 option (5):

 a) we don't really need the virtio-1.0 model, since that's what you
 currently get anyway when you ask for "virtio" on Q35 (and on 440fx,
 "virtio" gives you transitional, which works for everybody). 
At some point we might have a virtio-2.0 and find ourselves in a
similar problem again. IMHO it is preferable to have both explicit
versioned models, and discourage use of the magical 'virtio' model from
mgmt apps. Use libosinfo to identify which virtio model is supported
for the OS in question and use that explicitly.  Only use the magical
'virtio' model if there's no information about what version the OS
supports.
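
As a rough illustration of that selection rule from the mgmt app side, here
is a minimal Python sketch. The function name and the idea of passing in a
set of version strings are hypothetical (this is not the libosinfo API), and
preferring 1.0 when an OS supports both is my own assumption for the sketch.

```python
from typing import Optional, Set


def pick_virtio_model(supported_versions: Optional[Set[str]]) -> str:
    """Return an explicit versioned model when the OS support is known.

    supported_versions is a hypothetical input such as {"0.9"} or
    {"0.9", "1.0"}, or None/empty when nothing is known about the OS.
    """
    if not supported_versions:
        # No information: fall back to the magical 'virtio' model.
        return "virtio"
    if "1.0" in supported_versions:
        # Assumption: prefer the newer version when both are supported.
        return "virtio-1.0"
    if "0.9" in supported_versions:
        return "virtio-0.9"
    # Unknown version strings are treated like missing information.
    return "virtio"


print(pick_virtio_model({"0.9"}))         # virtio-0.9
print(pick_virtio_model({"0.9", "1.0"}))  # virtio-1.0
print(pick_virtio_model(None))            # virtio
```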

...
 b) Rather than a "legacy-only" model for virtio-0.9, it would be more
 useful to have "transitional". This way the config would work for older
 OSes that don't support virtio-1.0, and when/if the OS was upgraded such
 that it supported virtio-1.0, that would be automatically used without
 needing to change the config. 
I don't think the case of OS suddenly gaining support for 1.0 in an update
is frequent enough to be worth worrying about.

...
 c) Even if it's possible to force a device on an Express slot into
 transitional mode, this is extremely wasteful of io port space, so
 libvirt should consider virtio-0.9 devices to be legacy PCI, and thus
 plug them into legacy PCI slots. And once we're doing this, it's
 unnecessary to add any extra option to the qemu commandline to force
 legacy support (i.e. transitional mode), as that is what QEMU already
 does when the device is connected to a legacy PCI slot. 
Yes, it should plug them into legacy PCI slots by default, but if a
mgmt app has done explicit placement itself, it should honour that
even if it means wasting IO space.
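
A minimal Python sketch of that placement policy, assuming a Q35 machine.
The function name and the "pci"/"pcie" strings are hypothetical and are not
libvirt's internal address assignment code.

```python
from typing import Optional


def choose_slot_type(model: str, explicit_slot: Optional[str] = None) -> str:
    """Pick the slot type for a virtio device on a Q35 guest.

    explicit_slot is a placement the mgmt app already chose ("pci" or
    "pcie"); it is honoured as-is, even if that wastes IO port space.
    """
    if explicit_slot is not None:
        return explicit_slot
    # Default policy: virtio-0.9 devices are treated as legacy PCI devices,
    # everything else goes on a PCI Express slot.
    return "pci" if model == "virtio-0.9" else "pcie"


print(choose_slot_type("virtio-0.9"))                        # pci
print(choose_slot_type("virtio-1.0"))                        # pcie
print(choose_slot_type("virtio-0.9", explicit_slot="pcie"))  # pcie
```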

...
 So making the naive assumption that we agree on implementing option (5)
 and there are no objections to my points a-c (Hah! As if!), how does
 this sound as a plan:

 A) libosinfo starts telling consumers that the preferred virtio device
 model for the relevant OSes is "virtio-0.9", and leaves the
 recommendation for other OSes as "virtio". 
Libosinfo already uses 'virtio' as the prefix identifying virtio-0.9
support (the old PCI product IDs), and 'virtio-1.0' as the prefix for
identifying virtio-1.0 support (the new PCI product IDs).  That these
don't match libvirt model names doesn't matter.

...
 B) libvirt adds a "virtio-0.9" model for all virtio devices that
 actually have virtio-0.9 support (a couple of devices never existed
 prior to virtio-1.0 (rng and ???) so virtio-0.9 would be nonsensical for
 them). 
...

 C) inside libvirt, the implementation of the "virtio-0.9" model is
 identical to "virtio", except that the VIR_PCI_CONNECT_TYPE flags for
 these devices contain VIR_PCI_CONNECT_TYPE_PCI rather than
 VIR_PCI_CONNECT_TYPE_PCIE, resulting in those devices being assigned to
 a legacy PCI slot, and thus they would be transitional mode by default. 

For 'virtio-0.9' libvirt should set "disable-modern=yes" in QEMU args

For 'virtio-1.0' libvirt should set "disable-legacy=yes" in QEMU args
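
To illustrate the mapping, here is a minimal Python sketch that turns a
model name into the property string for a QEMU -device argument. The helper
and table names are hypothetical, the "virtio-net-pci" device is just an
example, and this is not how libvirt actually builds its command line.

```python
# Proposed libvirt model name -> extra QEMU device properties.
MODEL_PROPS = {
    "virtio-0.9": {"disable-modern": "yes"},  # legacy-only
    "virtio-1.0": {"disable-legacy": "yes"},  # modern-only
    "virtio": {},                             # leave QEMU's default behaviour
}


def device_arg(qemu_device: str, model: str) -> str:
    """Build the value passed to -device for the given virtio model."""
    props = MODEL_PROPS[model]
    return ",".join([qemu_device] + [f"{key}={val}" for key, val in props.items()])


print(device_arg("virtio-net-pci", "virtio-0.9"))  # virtio-net-pci,disable-modern=yes
print(device_arg("virtio-net-pci", "virtio-1.0"))  # virtio-net-pci,disable-legacy=yes
print(device_arg("virtio-net-pci", "virtio"))      # virtio-net-pci
```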

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
