
On Thu, 2016-09-29 at 15:20 -0400, Laine Stump wrote:
For machinetypes with a pci-root bus (all legacy PCI), libvirt will make a "fake" reservation for one extra slot prior to assigning addresses to unaddressed PCI endpoint devices in the domain. This will trigger auto-adding of a pci-bridge for the final device to be assigned an address *if that device would have otherwise instead been the last device on the last available pci-bridge*; thus it assures that there will always be at least one slot left open in the domains bus topology for expansion (which is important both for hotplug (since a new pci-bridge can't be added while the guest is running) as well as for coldplug (since adding a new device might otherwise in some cases require re-addressing existing devices, which we want to avoid)). It's important to note that for this (legacy PCI) case, we must check for the special case of all slots on all buses being occupied *prior to assigning any addresses*, and avoid attempting to reserve the extra address in that case, because there is no free address in the existing topology, so no place to auto-add a pci-bridge for expansion. Since that condition can only be reached by manual intervention, this is acceptable. For machinetypes with pcie-root, libvirt's methodology for automatically expanding the bus topology is different - pcie-root-ports are plugged into slots (soon to be functions) of pcie-root as needed, and the new endpoint devices are assigned to the single slot in each pcie-root-port. This is done so that the devices are, by default, hotpluggable (the slots of pcie-root don't support hotplug, but the single slot of the pcie-root-port does). Since pcie-root-ports can only be plugged into pcie-root, and we don't auto-assign endpoint devices to those slots, this means topology expansion doesn't compete with endpoint devices for slots, so we don't need to worry about checking for all "useful" slots being free prior to assigning addresses to new endpoint devices - as a matter of fact, if we attempt to reserve the fake slots before the useful slots, it can lead to errors. Instead this patch just reserves slots for "fake" devices after doing the assignment for actual devices - if there were already enough unoccupied pcie-root-ports, then none will be added; if not, then enough will be added to support the desired (hardcoded) potential number of hotplugged devices. Since there hasn't been any other concrete suggestion for the number of "available hotpluggable slots" libvirt should assure, this patch uses "4" as the answer (thanks Drew!) Alternatives could be: 0 - hotplug would only work if the user had thought to add an extra pcie-root to the config *as an extra step after adding and saving each new device*. (this would preclude creating a new domain that had a pcie-root-port available for hotplug - the extra modify/save step would be needed even during initial domain creation, and would need to be repeated every time a device was added to the domain) 1 - this is the *minimum* number of hotpluggable slots libvirt guarantees for legacy PCI domains (with pci-root). Of course on average these domains are more likely to have *15* slots available (half of the slots on a pci-bridge). "anything else" - really any choice made for this is going to be considered wrong by *somebody*. I hope we can all agree that "0" is a wrong choice, just because it will require so much ongoing babysitting. A couple of us have brought up the idea of having "availableHotplugSlots" be configurable in the domain. There has also been sentiment against that. Perhaps it could be configurable in qemu.conf? (I don't really like that either, though - I think everything about the domain should be self-contained in the domain XML, and if something is tunable, it should be tunable separately for each domain). In the absense of anything configurable, we will need to pick a number though. I've done that here, and now we can argue about it (or not :-)
As we have already agreed on a slightly different approach,x eg. doing pretty much exactly this, but only if the configuration provided by the user contains no PCI controller (except possibly for pcie-root), I think it's fair to say: NACK -- Andrea Bolognani / Red Hat / Virtualization