On 03/09/2016 09:54 AM, Daniel P. Berrange wrote:
> On Wed, Mar 09, 2016 at 01:40:36PM +0100, Andrea Bolognani wrote:
>> On Fri, 2016-03-04 at 17:05 -0500, Laine Stump wrote:
>>>> I'm not sure I fully understand all of the above, but I'll pitch
>>>> in with my own proposal regardless :)
>>>>
>>>> First, we make sure that
>>>>
>>>>   <controller type='pci' index='0' model='pcie-root'/>
>>>>
>>>> is always added automatically to the domain XML when using the
>>>> mach-virt machine type. Then, if
>>>>
>>>>   <controller type='pci' index='1' model='dmi-to-pci-bridge'/>
>>>>   <controller type='pci' index='2' model='pci-bridge'/>
>>>>
>>>> is present as well we default to virtio-pci, otherwise we use
>>>> the current default of virtio-mmio. This should allow management
>>>> applications, based on knowledge about the guest OS, to easily
>>>> pick between the two address schemes.
>>>>
>>>> Does this sound like a good idea?
>>> ... or a variation of that, anyway :-)
>>>
>>> What I think: If there are *any* pci controllers *beyond* pcie-root, or
>>> if there are any devices that already have a PCI address, then assign
>>> PCI addresses, else use mmio.
>> This sounds reasonable.
>>
>> However, I'm wondering if we shouldn't be even more explicit about
>> this... From libvirt's point of view we just need to agree on some
>> sort of "trigger" that causes it to allocate PCI addresses instead
>> of MMIO addresses, and for that purpose "any PCI controller or any
>> device with a PCI address" is perfectly fine; looking at that from
>> the point of view of a user, though? Not sure about it.
>>
>> What about adding something like
>>
>>   <preferences>
>>     <preference name="defaultAddressType" value="pci"/>
>>   </preferences>
>>
>> to the domain XML? AFAICT libvirt doesn't have any element that
>> could be used for this purpose at the moment, but maybe having a
>> generic way to set domain-wide preferences could be useful in
>> other situations as well?
[snip]
> Looking at this mail and Laine's before it, I really get the impression
> we are over-thinking this all. The automatic address assignment by
> libvirt was written such that it matched what QEMU would have done, so
> that we could introduce the concept of device addressing without
> breaking existing apps which didn't supply addresses. The automatic
> addressing logic certainly wasn't ever intended to be able to implement
> arbitrary policies.
> As a general rule libvirt has always aimed to stay out of the policy
> business, and focus on providing the mechanism only. So when we start
> talking about adding <preferences> to the XML this is a sure sign
> that we've gone too far into trying to implement policy in libvirt.
> From the POV of being able to tell libvirt to assign PCI instead of
> MMIO addresses for the virt machine, I think it suffices to have
> libvirt accept a simple <address type="pci"/> element (ie with no
> bus/slot/func filled in).
That's just a more limited case of the same type of feature creep though
- you're specifying a preference for what type of address you want, just
in a different place. The odd thing is that we're adding it only to
remedy what is apparently a temporary problem, and even then it's a
problem that wouldn't exist if we simply said this: "If virtio-mmio is
specified, then use virtio-mmio; if no address is specified, use pci".
This puts the burden on people insisting on the old /
soon-to-be-deprecated (?) virtio-mmio address type to add a little
something to the XML (and a very simple little something at that!). For
a short time, this could cause problems for people who create new
domains using qemu binaries that don't yet have a pci bus on
aarch64/virt, but that will be easily solvable (by adding
"<address type='virtio-mmio'/>", which is nearly as simple as typing
"<address type='pci'/>").
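
For concreteness, on e.g. a virtio-net device the two alternatives would
look something like this (a sketch only; the bare <address type='pci'/>
form is the one proposed above, with libvirt filling in the rest of the
PCI address automatically):

  <interface type='network'>
    <source network='default'/>
    <model type='virtio'/>
    <address type='virtio-mmio'/>
  </interface>

  vs.

  <interface type='network'>
    <source network='default'/>
    <model type='virtio'/>
    <address type='pci'/>
  </interface>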
A downside of setting your preference with <address type="pci"/> vs.
having the preference in a separate element is that you can no longer
cause a "re-addressing" of all the devices by simply removing all of the
<address> elements (and it's really that which got me thinking about
adding hints/preferences in the <target> element). I don't know how
important that is; I suppose when the producer of the XML is some other
piece of software it's a non-issue, but when the producer is a human
using virsh edit it would make life much simpler once in a while (and
the suggestion in the previous paragraph would eliminate that problem
anyway).
So is there a specific reason why we need to keep virtio-mmio as the
default, and require explicitly saying "<address type='pci'/>" in order
to get a PCI address?
> If applications care about assigning devices to specific PCI buses,
> ie to distinguish between PCI & PCIe, or to pick a different bus
> when there is one bus per NUMA node, then really it is time for
> the application to start specifying the full <address> element
> with all details, and not try to invent ever more complex policies
> inside libvirt which will never deal with all application use cases.
First, I agree with your vigilance against unnecessary feature creep,
both because keeping things simple makes them easier to maintain and
use, and because it's very difficult (or even impossible) to change
something once it's gone in, should you have second thoughts.

But, expanding beyond just the aarch64/virt mmio vs. pci problem (which
I had already done in this thread, and which is what got us into this
"feature creep" discussion in the first place :-)...
I remember that when we were first adding support for Q35 we said it
should be as easy to create a domain with the Q35 machinetype as it is
for 440fx, i.e. you should be able to do it without manually specifying
any PCI addresses (that's why we have the strange default bus hierarchy,
with a dmi-to-pci-bridge and a pci-bridge, and all devices going onto
the pci-bridge). But of course 440fx is very simplistic - all slots are
standard PCI and all are hotpluggable. On Q35 there is room for choices
though, in particular PCI vs. PCIe and hotpluggable vs. not (and yeah,
also which NUMA node the bus is on; I agree that one is "getting out
there"). So since there are choices, libvirt has by definition begun
implementing policy when it auto-assigns *any* PCI address (the current
policy is "require all auto-assigned device addresses to be hotpluggable
standard PCI"). And if libvirt doesn't provide an easy way to choose, it
will have implemented a de facto mandatory policy (unless you force full
specification of PCI addresses, which goes against the "make it easy"
goal).
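
For reference, the default controller hierarchy that libvirt currently
auto-adds for a Q35 domain looks roughly like this (the indexes shown
are the typical auto-assigned ones):

  <controller type='pci' index='0' model='pcie-root'/>
  <controller type='pci' index='1' model='dmi-to-pci-bridge'/>
  <controller type='pci' index='2' model='pci-bridge'/>

with auto-assigned devices all landing on the pci-bridge (bus 2).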
And the current policy is looking (for Q35 and aarch64/virt at least)
less and less like the correct one. In the years since it was put in,

 1) we've learned that qemu doesn't care, and will not ever care, that a
    PCI device has been plugged into a PCIe slot,
 2) we now have support for several PCIe controllers we didn't
    previously have,
 3) people have started complaining that their devices are in PCI slots
    rather than PCIe,
 4) even though it was stated at the time I wrote the code to auto-add a
    pci-bridge to every Q35 domain that pci-bridge's failure to support
    hotplug was a temporary bug, I just learned yesterday that it
    apparently still doesn't work, and
 5) some platforms (e.g. our favorite aarch64/virt) are emulating a
    hardware platform that has *never* had standard PCI slots, only
    PCIe.
Beyond that, there is no single place that provides a simple encoding of
which type of controller provides which type of slot, and of what is
allowed to plug into what. If you require the management
application/human to manually specify all of the PCI addresses as soon
as they need one of these basic characteristics, then not only has it
become cumbersome to define a domain (because the management app has to
maintain a data structure to keep track of which PCI addresses are in
use and which aren't), but the management application also needs to know
all sorts of rules about which PCI controllers are actually pcie vs.
pci, which accept hotplugged devices and which don't, things like the
min/max slot number for each controller, and which controllers can plug
into which, e.g. a pcie-root-port can only plug into pcie-root or
pcie-expander-bus, and a pcie-switch-downstream-port can only plug into
a pcie-switch-upstream-port, etc. Requiring a management app to get all
of that right just so that it can pick between a hotpluggable and
non-hotpluggable slot seems like an overly large burden (and one prone
to error).
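
To illustrate the burden: manually placing even a single hotpluggable
PCIe device means the app has to produce something like this (a rough
sketch only - it has to know that a pcie-root-port must be plugged into
a slot on pcie-root, and that the device's bus attribute must match the
root port's controller index):

  <controller type='pci' index='1' model='pcie-root-port'>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
  </controller>
  <interface type='network'>
    <source network='default'/>
    <model type='virtio'/>
    <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </interface>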
In the end, if libvirt makes it simple for the management app to specify
what kind of slot it wants, rather than requiring it to specify the
exact slot, then libvirt isn't making any policy decisions; it's just
making it easier for the management app to implement its own policy
decisions, without requiring it to know all the details about which
controller implements which kind of connection, etc. That does seem to
fit within libvirt's purpose.
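
Just to make the idea concrete (this is purely hypothetical syntax, not
a worked-out proposal), such a hint could be as simple as an extra
attribute or two that libvirt resolves into a real address, e.g.:

  <interface type='network'>
    <source network='default'/>
    <model type='virtio'/>
    <!-- 'hotpluggable' and 'express' are hypothetical attributes -->
    <address type='pci' hotpluggable='yes' express='yes'/>
  </interface>

with libvirt picking (or creating) a suitable controller and filling in
the actual bus/slot/function itself.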