On Thu, Aug 16, 2018 at 06:20:29PM -0400, Laine Stump wrote:
Summary of the problem:
1) We want to persuade libvirt+QEMU users to move away from the i440fx
machinetype in favor of Q35. (NB: Someday this *might* lead to the
ability to deprecate and even remove the 440fx machinetype, but even if
that were to happen, it would be a *very long* time from now, so this
discussion is *not* about that!)
There are plenty of OS that will never support Q35 and are still interesting
to use under QEMU. The set which could use Q35, but lack virtio1.0 is fairly
small. So removal of i440fx is really only something for downstream KVM vendors
to consider. Those vendors only care about modern OS, but upstream is much
more open minded about what QEMU is used for, so I see it probably living
forever in upstream, or at least long enough that current maintainers will
be retired ;-P.
2) When Q35 machinetype is used, libvirt assigns virtio devices to a
slot on a PCI Express controller (because why have modern PCIe
controllers/slots available but force everything onto clunky old legacy
controllers?).
3) When a virtio device is plugged into an Express controller, QEMU
disables the device's IO port space, and it is put into "modern-only"
mode (this is done to avoid a rapid exhaustion of limited IO port space).
4) modern-only virtio devices won't work with a legacy (virtio-0.9-only)
guest driver, because virtio-0.9 requires IO port space.
5) Some guest OSes that we still want to support (and which would
otherwise work okay on a Q35 virtual machine) have virtio drivers too
old to support virtio-1.0 (CentOS6 and RHEL6 are examples of such OSes),
but due to the chain of reasons listed above, the "standard" config for
a Q35 guest generated by libvirt doesn't support virtio-0.9, hence
doesn't support these guest OSes.
Note when talking about "support" you're really saying it from the
downstream vendor, specifically RHEL, POV. From upstream or Fedora POV
essentially all x86 OS ever made are in scope for running under QEMU
if suitable virtual hardware models have been provided. QEMU doesn't
maintain any whitelist of "supported" OS that differs from what is
technically capable of being run, in the way downstream vendors do.
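The chain of reasoning in points 2) through 5) can be sketched as a small model. This is only an illustration of the behaviour described above, not libvirt or QEMU code; the enum and function names are hypothetical.

```python
from enum import Enum


class Slot(Enum):
    PCI = "legacy-pci"
    PCIE = "pcie"


class Driver(Enum):
    LEGACY = "virtio-0.9-only"
    MODERN = "virtio-1.0-capable"


def device_mode(slot: Slot) -> str:
    """Default mode as described in points 2) and 3): a virtio device on an
    Express slot loses its IO port space and becomes modern-only, while one
    on a legacy PCI slot stays transitional."""
    return "modern-only" if slot is Slot.PCIE else "transitional"


def guest_can_drive(slot: Slot, driver: Driver) -> bool:
    """Point 4): a virtio-0.9-only driver needs IO port space, so it cannot
    drive a modern-only device. A transitional device works with both."""
    if device_mode(slot) == "transitional":
        return True
    return driver is Driver.MODERN
```

With this model, `guest_can_drive(Slot.PCIE, Driver.LEGACY)` is the one failing combination, which is the default Q35 layout paired with a CentOS6/RHEL6-era driver.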
And here's a list of possible solutions to this problem (note that
"consumers" means management applications such as OpenStack, oVirt,
virt-manager, virt-install, gnome-boxes, etc. In all cases, it's assumed
that the consumer's decision on the action to take will be based on
information from libosinfo). For completeness, I've included even the
possibilities that have been rejected, along with a brief synopsis of
(at least part of) the reason for rejection:
(1) Add some way libvirt consumers can ask libvirt to place
virtio devices on a legacy pci slot instead of pcie when
the machinetype is q35 (qemu sets virtio devices in legacy
PCI slots to transitional mode, so io port space is enabled
and virtio-0.9 drivers will work).
This has been proposed on libvir-list, but rejected. Here is
the most eloquently stated reasoning for the rejection I could
find (with thanks to Dan Berrange):
The domain XML is a way to express the configuration
of the guest virtual machine. What we're talking about
here is a policy tunable for an internal libvirt QEMU
driver algorithm, and so does not belong anywhere in the
domain XML.
Indeed, that's a guiding principle in general, not just for this PCI
question.
(2) Add full-blown pci enumeration support to all libvirt consumers
(i.e. they will need to build a model of the PCI bus topology
of each guest, and keep track of which addresses are in use).
They can then manually place virtio devices on legacy pci slots
(again, triggering transitional mode) when the intended guest
OS doesn't support virtio-1.0.
(This is seen as requiring too much duplicated effort for
development and support/maintenance, since up until now libvirt
has been the single point of action for PCI address assignment
(well, QEMU can do it too, but for > 10 years libvirt has
*always* provided full PCI addresses for all devices)).
It really depends on the scope of the mgmt app - at some point the mgmt
app needs to take charge to some degree if it has particular ideas
about how a machine should look. Libvirt's placement strategy is a good
default for 95% of use cases, but it'll never be 100%. An example is
setting up a particular PCI topology that is guest NUMA node aware,
using expander buses.
So some apps might take this option, but in the common case it is
undesirable.
(3) Add virtio-1.0 support to all guest OSes. If this is done,
existing libvirt configs will work.
(Aside from the difficulty of backporting, and the fact that
there are going to be some OSes that don't get it *at all*,
there will always be older releases that haven't gotten the
backport. So this isn't a complete solution).
Yep, there will always be guest OS that don't support 1.0. So that's
only a solution if the person who cares about Q35 support also controls
the guest OS in question.
(4) Consumers can continue using the 440fx machinetype for guest
OSes that don't support virtio-1.0.
(This would work, but perpetuates use of the 440fx
machinetype, and all for just this one reason (at least in
the case of CentOS6/RHEL6, which otherwise work just fine with
Q35)).
From an upstream POV this is always going to be the case. This is
really only an undesirable thing for downstream who are trying to
artificially restrict what QEMU features users have available to
them.
(5) Introduce virtio-0.9, virtio-1.0 models in libvirt
which are explicitly legacy-only and modern-only.
QEMU doesn't need to change, as libvirt can simply set
the right params on existing QEMU models to force the
behavior.
(NB: it's unclear to me whether virtio-0.9 simply won't
work without forcing the device to be on a legacy PCI
slot, or if that's just "a very bad idea" because it
will mean that the device uses up extra io port space)
As a starter for continuing the discussion, it seems to me that for
option (5):
a) we don't really need the virtio-1.0 model, since that's what you
currently get anyway when you ask for "virtio" on Q35 (and on 440fx,
"virtio" gives you transitional, which works for everybody).
At some point we might have a virtio-2.0 and find ourselves in a
similar problem again. IMHO it is preferable to have both explicit
versioned models, and discourage use of the magical 'virtio' model from
mgmt apps. Use libosinfo to identify which virtio model is supported
for the OS in question and use that explicitly. Only use the magical
'virtio' model if there's no information about what version the OS
supports.
b) Rather than a "legacy-only" model for virtio-0.9, it would be more
useful to have "transitional". This way the config would work for older
OSes that don't support virtio-1.0, and when/if the OS was upgraded such
that it supported virtio-1.0, that would be automatically used without
needing to change the config.
I don't think the case of OS suddenly gaining support for 1.0 in an update
is frequent enough to be worth worrying about.
c) Even if it's possible to force a device on an Express slot into
transitional mode, this is extremely wasteful of io port space, so
libvirt should consider virtio-0.9 devices to be legacy PCI, and thus
plug them into legacy PCI slots. And once we're doing this, it's
unnecessary to add any extra option to the qemu commandline to force
legacy support (i.e. transitional mode), as that is what QEMU already
does when the device is connected to a legacy PCI slot.
Yes, it should plug them into legacy PCI slots by default, but if a
mgmt app has done explicit placement itself, it should honour that
even if it means wasting IO space.
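The replies to a) and c) can be read together as a selection policy for a mgmt app plus a placement rule inside libvirt. A minimal sketch follows, assuming the app has already reduced its libosinfo data to a set of virtio version strings; that reduction, the function names, and the "pci"/"pcie" labels are all hypothetical, and "virtio-0.9"/"virtio-1.0" are the proposed model names rather than existing ones.

```python
from typing import Iterable, Optional


def pick_virtio_model(versions: Optional[Iterable[str]]) -> str:
    """Choose an explicit versioned model when the OS data allows it.

    `versions` holds the virtio versions ("0.9", "1.0") the guest OS is
    known to support; None or empty means there is no information.
    """
    known = set(versions or ())
    if "1.0" in known:
        return "virtio-1.0"
    if "0.9" in known:
        return "virtio-0.9"
    # No information: fall back to the unversioned 'virtio' model.
    return "virtio"


def pick_slot_type(model: str, explicit: Optional[str] = None) -> Optional[str]:
    """Decide which kind of slot the device should land on.

    An explicit placement from the mgmt app is always honoured, even if it
    wastes IO port space. Otherwise virtio-0.9 defaults to a legacy PCI
    slot; None means "keep the existing default for the machine type".
    """
    if explicit is not None:
        return explicit
    if model == "virtio-0.9":
        return "pci"
    return None
```

For example, `pick_virtio_model(["0.9"])` yields "virtio-0.9", and `pick_slot_type("virtio-0.9", explicit="pcie")` keeps the Express slot the app asked for.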
So making the naive assumption that we agree on implementing option (5)
and there are no objections to my points a-c (Hah! As if!), how does
this sound as a plan:
A) libosinfo starts telling consumers that the preferred virtio device
model for the relevant OSes is "virtio-0.9", and leaves the
recommendation for other OSes as "virtio".
Libosinfo already uses 'virtio' as the prefix identifying virtio-0.9
support (the old PCI product IDs), and 'virtio-1.0' as the prefix for
identifying virtio-1.0 support (the new PCI product IDs). That these
don't match libvirt model names doesn't matter.
B) libvirt adds a "virtio-0.9" model for all virtio devices that
actually have virtio-0.9 support (a couple of devices never existed
prior to virtio-1.0 (rng and ???) so virtio-0.9 would be nonsensical for
them).
C) inside libvirt, the implementation of the "virtio-0.9" model is
identical to "virtio", except that the VIR_PCI_CONNECT_TYPE flags for
these devices contain VIR_PCI_CONNECT_TYPE_PCI rather than
VIR_PCI_CONNECT_TYPE_PCIE, resulting in those devices being assigned to
a legacy PCI slot, and thus they would be transitional mode by default.
For 'virtio-0.9' libvirt should set "disable-modern=yes" in QEMU args
For 'virtio-1.0' libvirt should set "disable-legacy=yes" in QEMU args
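As a rough illustration of the two lines above, the mapping from the proposed libvirt model names to extra QEMU device properties could look like this. The property strings are copied verbatim from this message; the helper name is hypothetical, and `virtio-net-pci` is used only as an example device.

```python
from typing import Optional

# Proposed libvirt model name -> extra QEMU property, as stated above.
# Plain "virtio" has no entry, so QEMU's own default behaviour applies.
_MODEL_PROPS = {
    "virtio-0.9": "disable-modern=yes",
    "virtio-1.0": "disable-legacy=yes",
}


def device_arg(qemu_device: str, model: str) -> str:
    """Build the value that would follow -device for a given model."""
    prop: Optional[str] = _MODEL_PROPS.get(model)
    return qemu_device if prop is None else f"{qemu_device},{prop}"
```

Here `device_arg("virtio-net-pci", "virtio-0.9")` gives `virtio-net-pci,disable-modern=yes`, while the unversioned model leaves the device string untouched.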
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|