On Thu, 2016-09-29 at 15:20 -0400, Laine Stump wrote:
For machinetypes with a pci-root bus (all legacy PCI), libvirt will
make a "fake" reservation for one extra slot prior to assigning
addresses to unaddressed PCI endpoint devices in the domain. This will
trigger auto-adding of a pci-bridge for the final device to be
assigned an address *if that device would have otherwise instead been
the last device on the last available pci-bridge*; thus it assures
that there will always be at least one slot left open in the domains
bus topology for expansion (which is important both for hotplug (since
a new pci-bridge can't be added while the guest is running) as well as
for coldplug (since adding a new device might otherwise in some cases
require re-addressing existing devices, which we want to avoid)).
It's important to note that for this (legacy PCI) case, we must check
for the special case of all slots on all buses being occupied *prior
to assigning any addresses*, and avoid attempting to reserve the extra
address in that case, because there is no free address in the existing
topology, so no place to auto-add a pci-bridge for expansion. Since
that condition can only be reached by manual intervention, this is
acceptable.
For machinetypes with pcie-root, libvirt's methodology for
automatically expanding the bus topology is different -
pcie-root-ports are plugged into slots (soon to be functions) of
pcie-root as needed, and the new endpoint devices are assigned to the
single slot in each pcie-root-port. This is done so that the devices
are, by default, hotpluggable (the slots of pcie-root don't support
hotplug, but the single slot of the pcie-root-port does). Since
pcie-root-ports can only be plugged into pcie-root, and we don't
auto-assign endpoint devices to those slots, this means topology
expansion doesn't compete with endpoint devices for slots, so we don't
need to worry about checking for all "useful" slots being free prior
to assigning addresses to new endpoint devices - as a matter of fact,
if we attempt to reserve the fake slots before the useful slots, it
can lead to errors.
Instead this patch just reserves slots for "fake" devices after doing
the assignment for actual devices - if there were already enough
unoccupied pcie-root-ports, then none will be added; if not, then
enough will be added to support the desired (hardcoded) potential
number of hotplugged devices.
Since there hasn't been any other concrete suggestion for the number
of "available hotpluggable slots" libvirt should assure, this patch
uses "4" as the answer (thanks Drew!) Alternatives could be:
0 - hotplug would only work if the user had thought to add an extra
pcie-root to the config *as an extra step after adding and saving each
new device*. (this would preclude creating a new domain that had a
pcie-root-port available for hotplug - the extra modify/save step
would be needed even during initial domain creation, and would need to
be repeated every time a device was added to the domain)
1 - this is the *minimum* number of hotpluggable slots libvirt
guarantees for legacy PCI domains (with pci-root). Of course on
average these domains are more likely to have *15* slots available
(half of the slots on a pci-bridge).
"anything else" - really any choice made for this is going to be
considered wrong by *somebody*. I hope we can all agree that "0" is a
wrong choice, just because it will require so much ongoing
babysitting.
A couple of us have brought up the idea of having
"availableHotplugSlots" be configurable in the domain. There has also
been sentiment against that. Perhaps it could be configurable in
qemu.conf? (I don't really like that either, though - I think
everything about the domain should be self-contained in the domain
XML, and if something is tunable, it should be tunable separately for
each domain).
In the absense of anything configurable, we will need to pick a number
though. I've done that here, and now we can argue about it (or not :-)
As we have already agreed on a slightly different approach,x
eg. doing pretty much exactly this, but only if the
configuration provided by the user contains no PCI controller
(except possibly for pcie-root), I think it's fair to say:
NACK
--
Andrea Bolognani / Red Hat / Virtualization