On 03/09/2016 02:09 PM, Laine Stump wrote:
On 03/09/2016 09:54 AM, Daniel P. Berrange wrote:
> On Wed, Mar 09, 2016 at 01:40:36PM +0100, Andrea Bolognani wrote:
>> On Fri, 2016-03-04 at 17:05 -0500, Laine Stump wrote:
>>>> I'm not sure I fully understand all of the above, but I'll pitch
>>>> in with my own proposal regardless :)
>>>> First, we make sure that
>>>> <controller type='pci' index='0' model='pcie-root'/>
>>>> is always added automatically to the domain XML when using the
>>>> mach-virt machine type. Then, if
>>>> <controller type='pci' index='1' model='dmi-to-pci-bridge'/>
>>>> <controller type='pci' index='2' model='pci-bridge'/>
>>>> is present as well we default to virtio-pci, otherwise we use
>>>> the current default of virtio-mmio. This should allow management
>>>> applications, based on knowledge about the guest OS, to easily
>>>> pick between the two address schemes.
>>>> Does this sound like a good idea?
>>> ... or a variation of that, anyway :-)
>>> What I think: If there are *any* pci controllers *beyond* pcie-root, or
>>> if there are any devices that already have a PCI address, then assign
>>> PCI addresses, else use mmio.
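
To make that trigger concrete (an illustrative sketch of the proposal, not
current libvirt behavior): a mach-virt domain containing only the auto-added

  <controller type='pci' index='0' model='pcie-root'/>

would keep getting virtio-mmio addresses, while a domain that also carries

  <controller type='pci' index='1' model='dmi-to-pci-bridge'/>
  <controller type='pci' index='2' model='pci-bridge'/>

(or any device with an explicit <address type='pci' .../>) would cause all
address-less devices to get PCI addresses instead.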
>> This sounds reasonable.
>>
>> However, I'm wondering if we shouldn't be even more explicit about
>> this... From libvirt's point of view we just need to agree on some
>> sort of "trigger" that causes it to allocate PCI addresses instead
>> of MMIO addresses, and for that purpose "any PCI controller or any
>> device with a PCI address" is perfectly fine; looking at that from
>> the point of view of a user, though? Not sure about it.
>>
>> What about adding something like
>>
>> <preferences>
>>   <preference name="defaultAddressType" value="pci"/>
>> </preferences>
>>
>> to the domain XML? AFAICT libvirt doesn't have any element that
>> could be used for this purpose at the moment, but maybe having a
>> generic way to set domain-wide preferences could be useful in
>> other situations as well?
> [snip]
>
> Looking at this mail and Laine's before, I really get the impression we
> are over-thinking this all. The automatic address assignment by libvirt
> was written such that it matched what QEMU would have done, so that we
> could introduce the concept of device addressing without breaking
> existing apps which didn't supply addresses. The automatic addressing
> logic certainly wasn't ever intended to be able to implement arbitrary
> policies.
>
> As a general rule libvirt has always aimed to stay out of the policy
> business, and focus on providing the mechanism only. So when we start
> talking about adding <preferences> to the XML this is a sure sign
> that we've gone too far into trying to implement policy in libvirt.
>
> From the POV of being able to tell libvirt to assign PCI instead of
> MMIO addresses for the virt machine, I think it suffices to have
> libvirt accept a simple <address type="pci"/> element (ie with no
> bus/slot/func filled in).
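
For example, a device definition under that proposal might look like this (a
sketch of the suggested usage - the bare <address type='pci'/> is Dan's
proposed element, the surrounding <interface> is just context):

  <interface type='network'>
    <source network='default'/>
    <model type='virtio'/>
    <address type='pci'/>
  </interface>

i.e. the presence of the element requests a PCI address, and libvirt itself
fills in the domain/bus/slot/function values.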
That's just a more limited case of the same type of feature creep though -
you're specifying a preference for what type of address you want, just in a
different place. (The odd thing is that we're adding it only to remedy what
is apparently a temporary problem, and even then it's a problem that wouldn't
exist if we simply said: "If virtio-mmio is specified, then use virtio-mmio;
if no address is specified, use PCI". This puts the burden on people
insisting on the old / soon-to-be-deprecated (?) virtio-mmio address type to
add a little something to the XML (and a very simple little something at
that!). For a short time, this could cause problems for people who create new
domains using qemu binaries that don't yet have a PCI bus on aarch64/virt,
but that will be easily solvable by adding "<address type='virtio-mmio'/>",
which is nearly as simple as typing "<address type='pci'/>".)
A downside of setting your preference with <address type="pci"/> vs. having
the preference in a separate element is that you can no longer cause a
"re-addressing" of all the devices by simply removing all the <address>
elements (and it's really that that got me thinking about adding
hints/preferences in the <target> element). I don't know how important that
is; I suppose when the producer of the XML is some other software it's a
non-issue, but when the producer is a human using virsh edit it would make
life much simpler once in a while (and the suggestion in the previous
paragraph would eliminate that problem anyway).
So is there a specific reason why we need to keep virtio-mmio as the default,
and require explicitly saying "address type='pci'" in order to get a
PCI address?
I explained this in my reply to Dan just now, but basically all currently
released distros don't work with virtio-pci. However, long term I've
suggested from the start that we switch to virtio-pci by default, keying off
a new enough -M virt version as an arbitrary marker in time when hopefully
most common distros work with virtio-pci.
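
(A sketch of what "keying off a new enough -M virt version" could look like
from the XML side - the exact versioned machine type name here is
hypothetical:

  <os>
    <type arch='aarch64' machine='virt-2.6'>hvm</type>
  </os>

i.e. the versioned machine type itself, rather than any new XML knob, would
serve as the marker for switching the default to virtio-pci.)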
> If applications care about assigning devices to specific PCI buses,
> ie to distinguish between PCI & PCIe, or to pick a different bus
> when there is one bus per NUMA node, then really it is time for
> the application to start specifying the full <address> element
> with all details, and not try to invent ever more complex policies
> inside libvirt which will never deal with all application use cases.
First, I agree with your vigilance against unnecessary feature creep, both
because keeping things simple makes it easier to maintain and use, and also
because it's very difficult (or even impossible) to change something once
it's gone in, should you have second thoughts.
But, expanding beyond just the aarch64/virt mmio vs. pci problem (which I had
already done in this thread, and which is what got us into this "feature
creep" discussion in the first place :-)...
I remember when we were first adding support for Q35 that we said it should be
as easy to create a domain with Q35 machinetype as it is for 440fx, i.e. you
should be able to do it without manually specifying any PCI addresses (that's
why we have the strange default bus hierarchy, with a dmi-to-pci-bridge and a
pci-bridge, and all devices going onto the pci-bridge). But of course 440fx is
very simplistic - all slots are standard PCI and all are hotpluggable. On Q35
there could be room for choices though, in particular PCI vs PCIe and
hotpluggable vs. not. (and yeah, also what NUMA node the bus is on; I agree
that one is "getting out there"). So since there are choices, libvirt has by
definition begun implementing policy when it auto-assigns *any* PCI address
(the current policy is "require all auto-assigned device addresses to be
hotpluggable standard PCI"). And if libvirt doesn't provide an easy way to
choose, it will have implemented a de facto mandatory policy (unless you force
full specification of PCI addresses, which goes against the "make it easy"
goal).
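
(For reference, the default Q35 bus hierarchy described above - the
controllers libvirt auto-adds when none are specified - looks like this:

  <controller type='pci' index='0' model='pcie-root'/>
  <controller type='pci' index='1' model='dmi-to-pci-bridge'/>
  <controller type='pci' index='2' model='pci-bridge'/>

with address-less PCI devices then being placed into slots on the pci-bridge
at index 2.)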
And the current policy is looking (for Q35 and aarch64/virt at least) less
and less like the correct one. In the years since it was put in:

1) we've learned that qemu doesn't care, and will not ever care, that a PCI
   device has been plugged into a PCIe slot,
2) we now have support for several PCIe controllers we didn't previously
   have,
3) people have started complaining that their devices are in PCI slots
   rather than PCIe,
4) even though it was stated at the time I wrote the code to auto-add a
   pci-bridge to every Q35 domain that pci-bridge's failure to support
   hotplug was a temporary bug, I just learned yesterday that it apparently
   still doesn't work, and
5) some platforms (e.g. our favorite aarch64/virt) are emulating a hardware
   platform that has *never* had standard PCI slots, only PCIe.
Beyond that, there is no place that provides a simple encoding of which type
of controller provides which type of slot, and what is allowed to plug into
what. If you require the management application/human to manually specify all
the PCI addresses as soon as they have a need for one of these basic
characteristics, then not only has it become cumbersome to define a domain
(because the management app has to maintain a data structure to keep track of
which PCI addresses are in use and which aren't), but it means that the
management application also needs to know all sorts of rules about which PCI
controllers are actually pcie vs. pci, and which accept hotplug devices vs.
which don't, as well as things like the min/max slot number for each
controller, and which ones can plug into where, e.g. a pcie-root-port can only
plug into pcie-root or pcie-expander-bus, and a pcie-switch-downstream-port
can only plug into a pcie-switch-upstream-port, etc. Requiring a management
app to get all of that right just so that they can pick between a hotpluggable
and non-hotpluggable slot seems like an overly large burden (and prone to error).
In the end, if libvirt makes it simple for the management app to specify what
kind of slot it wants, rather than requiring it to specify the exact slot,
then libvirt isn't making any policy decisions, it's just making it easier for
the management app to implement its own policy decisions, without requiring
the management app to know all the details about which controller implements
what kind of connection etc., and that does seem to fit within libvirt's purpose.
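
(A purely hypothetical sketch of such a "kind of slot" hint, in the spirit of
the <target> hints mentioned earlier - neither attribute exists in libvirt
today:

  <interface type='network'>
    <source network='default'/>
    <model type='virtio'/>
    <target connection='pcie' hotpluggable='yes'/>
  </interface>

libvirt would then pick a free slot on any controller satisfying those
constraints, and the management app never needs to track bus/slot/function
numbers or controller capabilities itself.)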
I guess the main thing would be to understand use cases... outside of aarch64
is there a real reason for q35 that apps might need one address policy over
another? If it's something super obscure, like only for testing or dev, then I
think asking people to do it manually is fine.
At least for the arm case I think we can side step the question by adding the
<address type='pci'/> allocation request, and flipping the default from
virtio-mmio to virtio-pci at some future point.
- Cole