
On 03/09/2016 02:09 PM, Laine Stump wrote:
On 03/09/2016 09:54 AM, Daniel P. Berrange wrote:
On Wed, Mar 09, 2016 at 01:40:36PM +0100, Andrea Bolognani wrote:
On Fri, 2016-03-04 at 17:05 -0500, Laine Stump wrote:
I'm not sure I fully understand all of the above, but I'll pitch in with my own proposal regardless :) First, we make sure that

  <controller type='pci' index='0' model='pcie-root'/>

is always added automatically to the domain XML when using the mach-virt machine type. Then, if

  <controller type='pci' index='1' model='dmi-to-pci-bridge'/>
  <controller type='pci' index='2' model='pci-bridge'/>

is present as well we default to virtio-pci, otherwise we use the current default of virtio-mmio. This should allow management applications, based on knowledge about the guest OS, to easily pick between the two address schemes. Does this sound like a good idea?

... or a variation of that, anyway :-) What I think: If there are *any* pci controllers *beyond* pcie-root, or if there are any devices that already have a PCI address, then assign PCI addresses, else use mmio.

This sounds reasonable.
However, I'm wondering if we shouldn't be even more explicit about this... From libvirt's point of view we just need to agree on some sort of "trigger" that causes it to allocate PCI addresses instead of MMIO addresses, and for that purpose "any PCI controller or any device with a PCI address" is perfectly fine; looking at it from the point of view of a user, though? I'm not so sure.
What about adding something like
  <preferences>
    <preference name="defaultAddressType" value="pci"/>
  </preferences>
to the domain XML? AFAICT libvirt doesn't have any element that could be used for this purpose at the moment, but maybe having a generic way to set domain-wide preferences could be useful in other situations as well? [snip]
Looking at this mail and Laine's before it, I really get the impression we are over-thinking this all. The automatic address assignment by libvirt was written such that it matched what QEMU would have done, so that we could introduce the concept of device addressing without breaking existing apps which didn't supply addresses. The automatic addressing logic certainly wasn't ever intended to be able to implement arbitrary policies.
As a general rule libvirt has always aimed to stay out of the policy business, and focus on providing the mechanism only. So when we start talking about adding <preferences> to the XML this is a sure sign that we've gone too far into trying to implement policy in libvirt.
From the POV of being able to tell libvirt to assign PCI instead of MMIO addresses for the virt machine, I think it suffices to have libvirt accept a simple <address type="pci"/> element (ie with no bus/slot/func filled in).
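A minimal sketch of what that could look like on a device (the interface element here is just an illustration; the point is the bare address element carrying only a type):

```xml
<!-- hypothetical virtio NIC: the type-only address asks libvirt
     to allocate the bus/slot/function itself -->
<interface type='network'>
  <source network='default'/>
  <model type='virtio'/>
  <address type='pci'/>
</interface>
```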
That's just a more limited case of the same type of feature creep though - you're specifying a preference for what type of address you want, just in a different place. The odd thing is that we're adding it only to remedy what is apparently a temporary problem, and even then it's a problem that wouldn't exist if we just said simply this: "If virtio-mmio is specified, then use virtio-mmio; if no address is specified, use pci". That puts the burden on people insisting on the old / soon-to-be-deprecated (?) virtio-mmio address type to add a little something to the XML (and a very simple little something at that!). For a short time, this could cause problems for people who create new domains using qemu binaries that don't yet have a pci bus on aarch64/virt, but that would be easily solved by adding "<address type='virtio-mmio'/>", which is nearly as simple as typing "<address type='pci'/>".
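A hedged sketch of that opt-in (the disk element itself is illustrative; only the bare address element matters):

```xml
<!-- hypothetical virtio disk that insists on the legacy transport:
     the type-only address asks libvirt to keep virtio-mmio -->
<disk type='file' device='disk'>
  <source file='/var/lib/libvirt/images/guest.img'/>
  <target dev='vda' bus='virtio'/>
  <address type='virtio-mmio'/>
</disk>
```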
A downside of setting your preference with <address type="pci"/> vs. having the preference in a separate element is that you can no longer cause a "re-addressing" of all the devices by simply removing all the <address> elements (and that's really what got me thinking about adding hints/preferences in the <target> element). I don't know how important that is; I suppose when the producer of the XML is some other software it's a non-issue, while when the producer is a human using virsh edit it would make life much simpler once in a while (and the suggestion in the previous paragraph would eliminate that problem anyway).
So is there a specific reason why we need to keep virtio-mmio as the default, and require explicitly saying "address type='pci'" in order to get a PCI address?
I explained this in my reply to Dan just now, but basically all currently released distros don't work with virtio-pci. However, long term I've suggested from the start that we switch to virtio-pci by default, keying off a new enough -M virt version as an arbitrary marker in time, when hopefully most common distros will work with virtio-pci.
If applications care about assigning devices to specific PCI buses, i.e. to distinguish between PCI & PCIe, or to pick a different bus when there is one bus per NUMA node, then really it is time for the application to start specifying the full <address> element with all details, and not try to invent ever more complex policies inside libvirt which will never deal with all application use cases.
First, I agree with your vigilance against unnecessary feature creep. Both because keeping things simple makes it easier to maintain and use, and also because it's very difficult (or even impossible) to change something once it's gone in, in the case that you have second thoughts.
But, expanding beyond just the aarch64/virt mmio vs. pci problem (which I had already done in this thread, and which is what got us into this "feature creep" discussion in the first place :-)...
I remember when we were first adding support for Q35 that we said it should be as easy to create a domain with the Q35 machine type as it is for 440fx, i.e. you should be able to do it without manually specifying any PCI addresses (that's why we have the strange default bus hierarchy, with a dmi-to-pci-bridge and a pci-bridge, and all devices going onto the pci-bridge). But of course 440fx is very simplistic - all slots are standard PCI and all are hotpluggable. On Q35 there could be room for choices though, in particular PCI vs PCIe and hotpluggable vs. not (and yeah, also what NUMA node the bus is on; I agree that one is "getting out there"). So since there are choices, libvirt has by definition begun implementing policy when it auto-assigns *any* PCI address (the current policy is "require all auto-assigned device addresses to be hotpluggable standard PCI"). And if libvirt doesn't provide an easy way to choose, it will have implemented a de facto mandatory policy (unless you force full specification of PCI addresses, which goes against the "make it easy" goal).
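For reference, a sketch of the default Q35 hierarchy described above (from memory, not verbatim libvirt output; the interface element and its exact address are illustrative):

```xml
<controller type='pci' index='0' model='pcie-root'/>
<controller type='pci' index='1' model='dmi-to-pci-bridge'/>
<controller type='pci' index='2' model='pci-bridge'/>
<!-- auto-assigned devices land on the pci-bridge (bus 2):
     all standard PCI slots, intended to be hotpluggable -->
<interface type='network'>
  <source network='default'/>
  <model type='virtio'/>
  <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>
</interface>
```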
And the current policy is looking (for Q35 and aarch64/virt at least) less and less like the correct one - in the years since it was put in:

1) we've learned that qemu doesn't care, and will not ever care, that a PCI device has been plugged into a PCIe slot,
2) we now have support for several PCIe controllers we didn't previously have,
3) people have started complaining that their devices are in PCI slots rather than PCIe,
4) even though it was stated at the time I wrote the code to auto-add a pci-bridge to every Q35 domain that pci-bridge's failure to support hotplug was a temporary bug, I just learned yesterday that it apparently still doesn't work, and
5) some platforms (e.g. our favorite aarch64/virt) are emulating a hardware platform that has *never* had standard PCI slots, only PCIe.
Beyond that, there is no place that provides a simple encoding of which type of controller provides which type of slot, and what is allowed to plug into what. If you require the management application/human to manually specify all the PCI addresses as soon as they have a need for one of these basic characteristics, then not only has it become cumbersome to define a domain (because the management app has to maintain a data structure to keep track of which PCI addresses are in use and which aren't), but it means that the management application also needs to know all sorts of rules about which PCI controllers are actually pcie vs. pci, and which accept hotplug devices vs. which don't, as well as things like the min/max slot number for each controller, and which ones can plug into where, e.g. a pcie-root-port can only plug into pcie-root or pcie-expander-bus, and a pcie-switch-downstream-port can only plug into a pcie-switch-upstream-port, etc. Requiring a management app to get all of that right just so that they can pick between a hotpluggable and non-hotpluggable slot seems like an overly large burden (and prone to error).
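To make those topology rules concrete, a hedged sketch of a legal PCIe hierarchy (the controller models are real libvirt models; the indexes and nesting are illustrative, with slot/function attributes omitted):

```xml
<controller type='pci' index='0' model='pcie-root'/>
<!-- a pcie-root-port may only plug into pcie-root or pcie-expander-bus -->
<controller type='pci' index='1' model='pcie-root-port'>
  <address type='pci' bus='0'/>
</controller>
<!-- a pcie-switch-upstream-port plugs into a root port -->
<controller type='pci' index='2' model='pcie-switch-upstream-port'>
  <address type='pci' bus='1'/>
</controller>
<!-- a pcie-switch-downstream-port may only plug into an upstream port -->
<controller type='pci' index='3' model='pcie-switch-downstream-port'>
  <address type='pci' bus='2'/>
</controller>
```

Getting every one of those constraints right is exactly the burden the paragraph above argues shouldn't fall on each management application.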
In the end, if libvirt makes it simple for the management app to specify what kind of slot it wants, rather than requiring it to specify the exact slot, then libvirt isn't making any policy decisions, it's just making it easier for the management app to implement its own policy decisions, without requiring the management app to know all the details about which controller implements what kind of connection etc, and that does seem to fit within libvirt's purpose.
I guess the main thing would be to understand use cases... outside of aarch64, is there a real reason for q35 that apps might need one address policy over another? If it's something super obscure, like only for testing or dev, then I think asking people to do it manually is fine. At least for the arm case I think we can sidestep the question by adding the <address type='pci'/> allocation request, and flipping the default from virtio-mmio to virtio-pci at some future point - Cole