[libvirt] Analysis of the effect of adding PCIe root ports

I was asked to look at the impact on boot times of adding (empty) PCIe root ports to the device model. The proposal from Laine is to add a few of these to every guest to allow hotplugging.

Last time I looked into this I found that probing any (legacy) PCI device is expensive because of the inefficient way that qemu emulates accesses to PCI config space, requiring IIRC 2 or 4 VMEXITs to access every word. (PCI slots which are not occupied are basically free; the problem is PCI devices.) At that time I did not look at Q35/PCIe at all.

We generally aim for boot times under 600-700ms. Probing PCI devices takes a significant fraction of this time.

The detailed analysis is attached. It comes from a program called 'boot-analysis' (https://github.com/libguestfs/libguestfs/tree/master/utils/boot-analysis). It is best viewed using 'less -r' so that you can see the colours.

The summary table is:

  Number of ports                       0      1      2      3      4
  bios:overhead                       874    884    875    878    935
  kernel:entry                        159    163    165    174    491
  kernel:initcalls-before-userspace  1065   1090   1110   1147   1263
  /init:udev-overhead                 173    187    185    193    301
  insmod virtio_pci                    43     41     41     41     74
  TOTAL                                 -    +51    +62   +119   +750

(All times are in milliseconds.)

A few observations:

(1) For #ports <= 3, each extra port adds around 30-40ms to the boot time, which is consistent with what I saw when I measured legacy PCI.

(2) There is a sudden, unexplained and reproducible discontinuity when going from 3 to 4 ports. (Because libguestfs uses other devices, this might not actually be 3 to 4 PCIe devices; these are spare ports after all the others added for libguestfs devices.)

(3) "kernel:entry" covers a lot of initialization that happens before the kernel prints its version string. This is moderately affected by adding PCIe ports until we hit the discontinuity.

Based on this analysis I conclude:

(a) Unless we can explain the discontinuity, avoid adding more than 3 root ports.

(b) It would be nice to turn the whole thing off for people who don't care about / need hotplugging.

(c) We could make PCI probing a lot faster by having the kernel handle PCI config space.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines. Supports shell scripting, bindings from many languages. http://libguestfs.org

On 10/05/2016 11:50 AM, Richard W.M. Jones wrote:
I was asked to look at the impact on boot times of adding (empty) PCIe root ports to the device model. The proposal from Laine is to add a few of these to every guest to allow hotplugging.
Last time I looked into this I found that probing any (legacy) PCI device is expensive because of the inefficient way that qemu emulates accesses to PCI config space, requiring IIRC 2 or 4 VMEXITs to access every word. (PCI slots which are not occupied are basically free, the problem is PCI devices). At that time I did not look at Q35/PCIe at all.
We generally aim for boot times under 600-700ms. Probing PCI devices takes a significant fraction of this time.
The detailed analysis is attached. It comes from a program called 'boot-analysis' (https://github.com/libguestfs/libguestfs/tree/master/utils/boot-analysis). It is best viewed using 'less -r' so that you can see the colours.
The summary table is:
  Number of ports                       0      1      2      3      4
  bios:overhead                       874    884    875    878    935
  kernel:entry                        159    163    165    174    491
  kernel:initcalls-before-userspace  1065   1090   1110   1147   1263
  /init:udev-overhead                 173    187    185    193    301
  insmod virtio_pci                    43     41     41     41     74
  TOTAL                                 -    +51    +62   +119   +750
(All times are in milliseconds.)
A few observations:
(1) For #ports <= 3, each extra port adds around 30-40ms to the boot time, which is consistent with what I saw when I measured legacy PCI.
Thanks for doing this (and especially on such short notice!). This is interesting info. I'm wondering now if the discontinuity is due to going over a threshold in the number of pcie-root-ports, the number of total PCI controllers, or the total number of devices.

This is significant because a couple of the patches in the "Use more PCIe less legacy PCI" patchset I sent two weeks ago will eliminate both the dmi-to-pci-bridge and the pci-bridge in the default config (as long as there are no legacy PCI devices). If the "jump" is due to an increase in total devices, or total PCI controllers, then getting rid of those two will change the number of root-ports that can be added prior to the jump. Even if this is the case, it seems like we still want to have as little fat as possible.
(2) There is a sudden, unexplained and reproducible discontinuity when going from 3 to 4 ports. (Because libguestfs uses other devices, this might not actually be 3 to 4 PCIe devices; these are spare ports after all the others added for libguestfs devices.)
(3) "kernel:entry" covers a lot of initialization that happens before the kernel prints its version string. This is moderately affected by adding PCIe ports until we hit the discontinuity.
Based on this analysis I conclude:
(a) Unless we can explain the discontinuity, avoid adding more than 3 root ports.
(b) It would be nice to turn the whole thing off for people who don't care about / need hotplugging.
I had contemplated having an "availablePCIeSlots" (or something like that) that was either an attribute of the config, or an option in qemu.conf or libvirtd.conf. If we had such a setting, it could be set to "0".
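Purely to illustrate the idea (this is hypothetical - no such setting exists in any libvirt release, and both the attribute name and its placement are invented here), an XML variant might look something like:

  <!-- hypothetical attribute, not implemented in libvirt -->
  <controller type='pci' model='pcie-root' availablePCIeSlots='0'/>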
(c) We could make PCI probing a lot faster by having the kernel handle PCI config space.

On Wed, Oct 05, 2016 at 01:26:31PM -0400, Laine Stump wrote:
On 10/05/2016 11:50 AM, Richard W.M. Jones wrote:
(b) It would be nice to turn the whole thing off for people who don't care about / need hotplugging.
I had contemplated having an "availablePCIeSlots" (or something like that) that was either an attribute of the config, or an option in qemu.conf or libvirtd.conf. If we had such a setting, it could be set to "0".
Note that changes to libvirt conf files are not usable by libguestfs. The setting would need to go into the XML, and please also make it possible to determine if $random version of libvirt supports the setting, either by a version check or something in capabilities.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and build Windows installers. Over 100 libraries supported. http://fedoraproject.org/wiki/MinGW

On Wed, 2016-10-05 at 18:36 +0100, Richard W.M. Jones wrote:
(b) It would be nice to turn the whole thing off for people who don't care about / need hotplugging. I had contemplated having an "availablePCIeSlots" (or something like that) that was either an attribute of the config, or an option in qemu.conf or libvirtd.conf. If we had such a setting, it could be set to "0".
I remember some pushback when this was proposed. Maybe we should just give up on the idea of providing spare hotpluggable PCIe slots by default and ask the user to add them explicitly after all.
Note that changes to libvirt conf files are not usable by libguestfs. The setting would need to go into the XML, and please also make it possible to determine if $random version of libvirt supports the setting, either by a version check or something in capabilities.
Note that you can avoid using any PCIe root port at all by assigning PCI addresses manually. It looks like the overhead for the small (I'm assuming) number of devices a libguestfs appliance will use is low enough that you will probably not want to open that can of worms, though.

--
Andrea Bolognani / Red Hat / Virtualization

On 10/06/2016 06:58 AM, Andrea Bolognani wrote:
On Wed, 2016-10-05 at 18:36 +0100, Richard W.M. Jones wrote:
(b) It would be nice to turn the whole thing off for people who don't care about / need hotplugging.
I had contemplated having an "availablePCIeSlots" (or something like that) that was either an attribute of the config, or an option in qemu.conf or libvirtd.conf. If we had such a setting, it could be set to "0".
I remember some pushback when this was proposed. Maybe we should just give up on the idea of providing spare hotpluggable PCIe slots by default and ask the user to add them explicitly after all.
We need to have *something* new in the config related to this, otherwise it will be impossible to add new "available for hotplug" ports at the same time as new unaddressed endpoint devices are added (unless an extra port is added for each endpoint device as well, and of course that is assuming that the user knows about things like auto-added memory balloon devices, and which devices are going to be PCIe and which will be legacy PCI, etc). If the "available" ports aren't somehow marked, the endpoint devices will end up being placed on them.
Note that changes to libvirt conf files are not usable by libguestfs.
The setting would need to go into the XML, and please also make it possible to determine if $random version of libvirt supports the setting, either by a version check or something in capabilities. Note that you can avoid using any PCIe root port at all by assigning PCI addresses manually. It looks like the overhead for the small (I'm assuming) number of devices a libguestfs appliance will use is low enough that you will probably not want to open that can of worm, though.
I looked in an example domain config Rich pastebin-ed in IRC yesterday. These are the PCI devices he has:

* <controller type='scsi' model='virtio-scsi'> - currently put on a pci-bridge, but will end up on a root-port after my patches

* <controller type='usb' model='blah'> x 4 - a set of USB2 controllers. This will turn into a single USB3 controller on a root-port after my patches. Alternately, since it seems you don't use it, you could eliminate it with: <controller type='usb' model='none'/>

* <controller type='sata'> - this is fixed on 00:1f.2, and there's nothing we can do to get rid of it or even change its address

* <controller type='pci' model='dmi-to-pci-bridge'> - the need for this will be eliminated by my patches

* <controller type='pci' model='pci-bridge'> - also eliminated

* <controller type='virtio-serial'> - currently on the pci-bridge, but will move to a root-port after my patches

* <memballoon model='virtio'> - Rich, do you actually use this device? Seems doubtful. If not, then add <memballoon model='none'/> to your config to prevent it.

After getting rid of the controllers that will be eliminated by my patches, and the "default" devices that you can get rid of with manual intervention (model='none'), you'll be left with the following devices on pcie-root-ports which *could* be manually moved to pcie-root (thus eliminating one pcie-root-port from the domain):

1) virtio-scsi controller
2) virtio-serial controller

and nothing else. Manually address those two to be on bus 0 (pcie-root), and (with my patches) you've reduced your PCI device+controller count from the current 10 down to 3 (including the sata controller).
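To make that concrete, a rough sketch of the relevant <devices> fragments after the trimming described above (the disk is only a placeholder to show how the virtio-scsi controller gets used; it is not taken from Rich's actual config):

  <!-- suppress the auto-added USB controllers and memballoon -->
  <controller type='usb' model='none'/>
  <memballoon model='none'/>

  <!-- the only two PCI devices that remain -->
  <controller type='scsi' model='virtio-scsi'/>
  <controller type='virtio-serial'/>

  <!-- placeholder disk, attached through the virtio-scsi controller -->
  <disk type='file' device='disk'>
    <driver name='qemu' type='raw'/>
    <source file='/path/to/appliance.img'/>
    <target dev='sda' bus='scsi'/>
  </disk>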

On Thu, Oct 06, 2016 at 10:00:39AM -0400, Laine Stump wrote:
* <controller type='usb' model='blah'> x 4 - a set of USB2 controllers. This will turn into a single USB3 controller on a root-port after my patches. Alternately, since it seems you don't use it, you could eliminate it with:
Yup, this is auto-added, and a mistake. I have sent a patch upstream adding:
<controller type='usb' model='none'/> <memballoon model='none'/>
...
1) virtio-scsi controller 2) virtio-serial controller
and nothing else. Manually address those two to be on bus 0 (pcie-root), and (with my patches) you've reduced your PCI device+controller count from the current 10 down to 3 (including the sata controller).
Interesting. Is there any particular reason why we should or should not use explicit PCI addresses for the remaining devices? What would you recommend we do?

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch http://libguestfs.org/virt-builder.1.html

On 10/06/2016 11:20 AM, Richard W.M. Jones wrote:
On Thu, Oct 06, 2016 at 10:00:39AM -0400, Laine Stump wrote:
* <controller type='usb' model='blah'> x 4 - a set of USB2 controllers. This will turn into a single USB3 controller on a root-port after my patches. Alternately, since it seems you don't use it, you could eliminate it with: Yup, this is auto-added, and a mistake.
I have sent a patch upstream adding:
<controller type='usb' model='none'/> <memballoon model='none'/>
...
1) virtio-scsi controller 2) virtio-serial controller
and nothing else. Manually address those two to be on bus 0 (pcie-root), and (with my patches) you've reduced your PCI device+controller count from the current 10 down to 3 (including the sata controller). Interesting. Is there any particular reason why we should or should not use explicit PCI addresses for the remaining devices? What would you recommend we do?
For 440fx it makes no difference. For Q35 it can eliminate the "extra PCI controllers" - current upstream libvirt will add a dmi-to-pci-bridge, then a pci-bridge plugged into that, and add your PCI devices to that; after my in-progress patches, libvirt will add a pcie-root-port for each virtio device, and plug them into those. If you manually address the devices to "some unused slot on bus 0", then they will be directly plugged into pcie-root, and no extra controllers will be needed.

As far as recommendations, I guess you could manually assign addresses for those two devices that would otherwise be open in both 440fx and Q35. Generally 00:1 (chipset devices) and 00:2 (video) are in use on a 440fx domain, and 00:1 (video) and 00:1F (chipset devices) on a Q35. If you don't disable USB, then USB controllers will also be added at 00:3 (?) on 440fx and 00:1D on Q35; but you don't use USB so you don't need to worry about this. So you could just manually assign the virtio-scsi and virtio-serial devices to have these two addresses:

  <address type='pci' bus='0' slot='3'/>
  <address type='pci' bus='0' slot='4'/>

(domain and function default to '0' if not specified - in older libvirt they were required by the RNG, but otherwise optional; in newer libvirt they're completely optional.) That should work for both machine types and not cause any of the stupid outdated "you won't be able to add a video device in the future!" warnings.
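For example (a sketch only, using the slots suggested above), the two controllers would end up looking something like this, and libvirt would not need to create any pcie-root-ports for them:

  <controller type='scsi' model='virtio-scsi'>
    <address type='pci' bus='0' slot='3'/>
  </controller>
  <controller type='virtio-serial'>
    <address type='pci' bus='0' slot='4'/>
  </controller>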

On Thu, Oct 06, 2016 at 12:58:51PM +0200, Andrea Bolognani wrote:
On Wed, 2016-10-05 at 18:36 +0100, Richard W.M. Jones wrote:
(b) It would be nice to turn the whole thing off for people who don't care about / need hotplugging. I had contemplated having an "availablePCIeSlots" (or something like that) that was either an attribute of the config, or an option in qemu.conf or libvirtd.conf. If we had such a setting, it could be set to "0".
I remember some pushback when this was proposed. Maybe we should just give up on the idea of providing spare hotpluggable PCIe slots by default and ask the user to add them explicitly after all.
Note that changes to libvirt conf files are not usable by libguestfs. The setting would need to go into the XML, and please also make it possible to determine if $random version of libvirt supports the setting, either by a version check or something in capabilities.
Note that you can avoid using any PCIe root port at all by assigning PCI addresses manually. It looks like the overhead for the small (I'm assuming) number of devices a libguestfs appliance will use is low enough that you will probably not want to open that can of worm, though.
For most apps the performance impact of the PCI enumeration is not a big deal. So having libvirt ensure there's enough available hotpluggable PCIe slots is reasonable, as long as we leave a get-out clause for libguestfs.

This could be as simple as declaring that *if* we see one or more <controller type="pci"> in the input XML, then libvirt will honour those and not try to add new controllers to the guest.

That way, by default libvirt will just "do the right thing" and auto-create a suitable number of controllers needed to boot the guest.

Apps that want strict control, though, can specify the <controller> elements themselves. Libvirt can still auto-assign device addresses onto these controllers. It simply wouldn't add any further controllers itself at that point. NB I'm talking cold-boot here. So libguestfs would specify <controller> itself to the minimal set it wants to optimize its boot performance.

Regards, Daniel

--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|
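To sketch what that would mean in practice (under this proposal, not current libvirt behaviour): an application that wants, say, exactly two spare hotpluggable slots would list the controllers itself, and libvirt would then add no others:

  <!-- sketch under the proposal above: libvirt would honour these
       and add no further PCI controllers -->
  <controller type='pci' model='pcie-root'/>
  <controller type='pci' model='pcie-root-port'/>
  <controller type='pci' model='pcie-root-port'/>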

On 10/06/2016 11:31 AM, Daniel P. Berrange wrote:
On Thu, Oct 06, 2016 at 12:58:51PM +0200, Andrea Bolognani wrote:
On Wed, 2016-10-05 at 18:36 +0100, Richard W.M. Jones wrote:
(b) It would be nice to turn the whole thing off for people who don't care about / need hotplugging.
I had contemplated having an "availablePCIeSlots" (or something like that) that was either an attribute of the config, or an option in qemu.conf or libvirtd.conf. If we had such a setting, it could be set to "0".
I remember some pushback when this was proposed. Maybe we should just give up on the idea of providing spare hotpluggable PCIe slots by default and ask the user to add them explicitly after all.
Note that changes to libvirt conf files are not usable by libguestfs.
The setting would need to go into the XML, and please also make it possible to determine if $random version of libvirt supports the setting, either by a version check or something in capabilities. Note that you can avoid using any PCIe root port at all by assigning PCI addresses manually. It looks like the overhead for the small (I'm assuming) number of devices a libguestfs appliance will use is low enough that you will probably not want to open that can of worm, though. For most apps the performance impact of the PCI enumeration is not a big deal. So having libvirt ensure there's enough available hotpluggable PCIe slots is reasonable, as long as we leave a get-out clause for libguestfs.
This could be as simple as declaring that *if* we see one or more <controller type="pci"> in the input XML, then libvirt will honour those and not try to add new controllers to the guest.
That way, by default libvirt will just "do the right thing" and auto-create a suitable number of controllers needed to boot the guest.
Apps that want strict control though, can specify the <controllers> elements themselves. Libvirt can still auto-assign device addresses onto these controllers. It simply wouldn't add any further controllers itself at that point. NB I'm talking cold-boot here. So libguestfs would specify <controller> itself to the minimal set it wants to optimize its boot performance.
That works for the initial definition of the domain, but as soon as you've saved it once, there will be controllers explicitly in the config, and since we don't have any way of differentiating between auto-added controllers and those specifically requested by the user, we have to assume they were explicitly added, so such a check is then meaningless because you will *always* have PCI controllers.

Say you create a domain definition with no controllers: you would get enough for the devices in the initial config, plus "N" more empty root ports. Let's say you then add 4 more devices (either hotplug or coldplug, doesn't matter). Those devices are placed on the existing unused pcie-root-ports. But now all your ports are full, and since you have PCI controllers in the config, libvirt is going to say "Ah, this user knows what they want to do, so I'm not going to add any extras! I'm so smart!". This would be especially maddening in the case of "coldplug", where libvirt could have easily added a new controller to accommodate the new device, but didn't.

Unless we don't care what happens after the initial definition (and then adding of "N" new devices), trying to behave properly purely based on whether or not there are any PCI controllers present in the config isn't going to work.

(BTW, isn't there something wrt aarch64 about "no pci controllers in config means use mmio for devices", or something like that? (Or maybe we were just *thinking* about that and didn't actually do it, I don't remember.) Using the lack of PCI controllers in the config to imply "automatically add necessary + extra controllers" would directly conflict with that.)

On Thu, Oct 06, 2016 at 11:57:17AM -0400, Laine Stump wrote:
On 10/06/2016 11:31 AM, Daniel P. Berrange wrote:
On Thu, Oct 06, 2016 at 12:58:51PM +0200, Andrea Bolognani wrote:
On Wed, 2016-10-05 at 18:36 +0100, Richard W.M. Jones wrote:
(b) It would be nice to turn the whole thing off for people who don't care about / need hotplugging.
I had contemplated having an "availablePCIeSlots" (or something like that) that was either an attribute of the config, or an option in qemu.conf or libvirtd.conf. If we had such a setting, it could be set to "0".
I remember some pushback when this was proposed. Maybe we should just give up on the idea of providing spare hotpluggable PCIe slots by default and ask the user to add them explicitly after all.
Note that changes to libvirt conf files are not usable by libguestfs. The setting would need to go into the XML, and please also make it possible to determine if $random version of libvirt supports the setting, either by a version check or something in capabilities. Note that you can avoid using any PCIe root port at all by assigning PCI addresses manually. It looks like the overhead for the small (I'm assuming) number of devices a libguestfs appliance will use is low enough that you will probably not want to open that can of worm, though. For most apps the performance impact of the PCI enumeration is not a big deal. So having libvirt ensure there's enough available hotpluggable PCIe slots is reasonable, as long as we leave a get-out clause for libguestfs.
This could be as simple as declaring that *if* we see one or more <controller type="pci"> in the input XML, then libvirt will honour those and not try to add new controllers to the guest.
That way, by default libvirt will just "do the right thing" and auto-create a suitable number of controllers needed to boot the guest.
Apps that want strict control though, can specify the <controllers> elements themselves. Libvirt can still auto-assign device addresses onto these controllers. It simply wouldn't add any further controllers itself at that point. NB I'm talking cold-boot here. So libguestfs would specify <controller> itself to the minimal set it wants to optimize its boot performance.
That works for the initial definition of the domain, but as soon as you've saved it once, there will be controllers explicitly in the config, and since we don't have any way of differentiating between auto-added controllers and those specifically requested by the user, we have to assume they were explicitly added, so such a check is then meaningless because you will *always* have PCI controllers.
Ok, so coldplug was probably the wrong word to use. What I actually meant was "at time of initial define", since that's when libvirt actually does its controller auto-creation. If you later add more devices to the guest, whether it is online or offline, that libvirt would still be auto-adding more controllers if required (and if possible) . I was not expecting libvirt to remember whether we were auto-adding controllers the first time or not.
Say you create a domain definition with no controllers, you would get enough for the devices in the initial config, plus "N" more empty root ports. Let's say you then add 4 more devices (either hotplug or coldplug, doesn't matter). Those devices are placed on the existing unused pcie-root-ports. But now all your ports are full, and since you have PCI controllers in the config, libvirt is going to say "Ah, this user knows what they want to do, so I'm not going to add any extras! I'm so smart!". This would be especially maddening in the case of "coldplug", where libvirt could have easily added a new controller to accomodate the new device, but didn't.
Unless we don't care what happens after the initial definition (and then adding of "N" new devices), trying to behave properly purely based on whether or not there are any PCI controllers present in the config isn't going to work.
I think that's fine. Let's stop talking about coldplug since that's very misleading. What I mean is that...

1. When initially defining a guest

   If no controllers are present, auto-add controllers implied by the machine type, sufficient to deal with all currently listed devices, plus "N" extra spare ports.

   Else, simply assign devices to the controllers listed in the XML config. If there are no extra spare ports after doing this, so be it. It was the application's choice to have not listed enough controllers to allow later addition of more devices.

2. When adding further devices (whether to an offline or online guest)

   If there aren't enough slots left, add further controllers to host the devices. If there were not enough slots left to allow adding further controllers, that must be due to the initial application decision at time of defining the original XML.

Regards, Daniel

--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

On 10/06/2016 12:10 PM, Daniel P. Berrange wrote:
On Thu, Oct 06, 2016 at 11:57:17AM -0400, Laine Stump wrote:
On 10/06/2016 11:31 AM, Daniel P. Berrange wrote:
On Thu, Oct 06, 2016 at 12:58:51PM +0200, Andrea Bolognani wrote:
On Wed, 2016-10-05 at 18:36 +0100, Richard W.M. Jones wrote:
(b) It would be nice to turn the whole thing off for people who don't care about / need hotplugging.
I had contemplated having an "availablePCIeSlots" (or something like that) that was either an attribute of the config, or an option in qemu.conf or libvirtd.conf. If we had such a setting, it could be set to "0".
I remember some pushback when this was proposed. Maybe we should just give up on the idea of providing spare hotpluggable PCIe slots by default and ask the user to add them explicitly after all.
Note that changes to libvirt conf files are not usable by libguestfs. The setting would need to go into the XML, and please also make it possible to determine if $random version of libvirt supports the setting, either by a version check or something in capabilities. Note that you can avoid using any PCIe root port at all by assigning PCI addresses manually. It looks like the overhead for the small (I'm assuming) number of devices a libguestfs appliance will use is low enough that you will probably not want to open that can of worm, though. For most apps the performance impact of the PCI enumeration is not a big deal. So having libvirt ensure there's enough available hotpluggable PCIe slots is reasonable, as long as we leave a get-out clause for libguestfs.
This could be as simple as declaring that *if* we see one or more <controller type="pci"> in the input XML, then libvirt will honour those and not try to add new controllers to the guest.
That way, by default libvirt will just "do the right thing" and auto-create a suitable number of controllers needed to boot the guest.
Apps that want strict control though, can specify the <controllers> elements themselves. Libvirt can still auto-assign device addresses onto these controllers. It simply wouldn't add any further controllers itself at that point.
Even if it was adding offline, and there wasn't any place to put a new device? (i.e., the operation would fail without adding a controller, and libvirt was able to add it). Or am I taking your statement beyond its intent (I'm good at that :-)
NB I'm talking cold-boot here. So libguestfs would specify <controller> itself to the minimal set it wants to optimize its boot performance. That works for the initial definition of the domain, but as soon as you've saved it once, there will be controllers explicitly in the config, and since we don't have any way of differentiating between auto-added controllers and those specifically requested by the user, we have to assume they were explicitly added, so such a check is then meaningless because you will *always* have PCI controllers. Ok, so coldplug was probably the wrong word to use. What I actually meant was "at time of initial define", since that's when libvirt actually does its controller auto-creation. If you later add more devices to the guest, whether it is online or offline, that libvirt would still be auto-adding more controllers if required (and if possible) . I was not expecting libvirt to remember whether we were auto-adding controllers the first time or not.
Say you create a domain definition with no controllers, you would get enough for the devices in the initial config, plus "N" more empty root ports. Let's say you then add 4 more devices (either hotplug or coldplug, doesn't matter). Those devices are placed on the existing unused pcie-root-ports. But now all your ports are full, and since you have PCI controllers in the config, libvirt is going to say "Ah, this user knows what they want to do, so I'm not going to add any extras! I'm so smart!". This would be especially maddening in the case of "coldplug", where libvirt could have easily added a new controller to accomodate the new device, but didn't.
Unless we don't care what happens after the initial definition (and then adding of "N" new devices), trying to behave properly purely based on whether or not there are any PCI controllers present in the config isn't going to work. I think that's fine.
Lets stop talking about coldplug since that's very misleading.
Do you mean use of the term "coldplug" at all, or talking about what happens when you add a device to the persistent config of the domain but not to the currently running guest itself?
What I mean is that...
1. When initially defining a guest
If no controllers are present, auto-add controllers implied by the machine type, sufficient to deal with all currently listed devices, plus "N" extra spare ports.
Else, simply assign devices to the controllers listed in the XML config. If there are no extra spare ports after doing this, so be it. It was the application's choice to have not listed enough controllers to allow later addition of more devices.
2. When adding further devices (whether to an offline or online guest)
If there's not enough slots left, add further controllers to host the devices.
Right. That works great for adding devices "offline" (since you don't like the term "coldplug" :-). It's just in the case of hotplug that it's problematic, because you can't add new PCI controllers to a running system.

(Big digression here - skip if you like... It *could* be possible to hotplug an upstream port plus some number of downstream ports, but qemu doesn't support it because it attaches devices one by one, and the guest has to be notified of the entire contraption at once when the upstream port is attached - so you would have to send the attach for all the downstream ports first, then the upstream, but you *can't* do it in that order because then qemu doesn't yet know about the id (alias) you're going to give to the upstream port at the time you're attaching the downstreams. Anyway, even if and when qemu does support hotplugging upstream+downstream ports, you can't add more downstream ports to an existing upstream afterwards, so you would end up with some horrid scheme where you had to always make sure you had at least one downstream or root-port open every time a device was added.)
If there were not enough slots left to allow adding further controllers, that must be due to the initial application decision at time of defining the original XML
Or it's because there have already been "N" new devices added since the domain was defined, and they're now trying to *hotplug* device "N+1". I'm fine with that behavior, I just want to make sure everyone understands this restriction beforehand.

So here's a rewording of your description (with a couple additional conditions) to see if I understand everything correctly:

1) during initial domain definition:

A) If there are *no pci controllers at all* (not even a pci-root or pcie-root) *and there are any unaddressed devices that need a PCI slot*, then auto-add enough controllers for the requested devices, *and* make sure there are enough empty slots for "N" (do we stick with 4? or make it 3?) devices to be added later without needing more controllers. (So, if the domain has no PCI devices, we don't add anything extra, and also if it only has PCI devices that already have addresses, then we also don't add anything extra.)

B) If there is at least one pci controller specified in the XML, and there are any unused slots in the pci controllers in the provided XML, then use them for the unaddressed devices. If there are more devices that need an address at this time, also add controllers for them, but no *extra* controllers.

(Note to Rich: libguestfs could avoid the extra controllers either by adding a pci-root/pcie-root to the XML, or by manually addressing the devices. The latter would actually be better, since it would avoid the need for any pcie-root-ports.)

2) When adding a device to the persistent config (i.e. offline): if there is an empty slot on a controller, use it. If not, add a controller for that device, *but no extra controllers*.

3) When adding a device to the guest machine (i.e. hotplug / online): if there is an empty slot on a controller, use it. If not, then fail.

The differences I see from what (I think) you suggested are:

* if there aren't any unaddressed pci devices (even if there are no controllers in the config), then we also don't add any extra controllers (although we will of course add the pci-root or pcie-root, to acknowledge it is there).

* if another controller is needed for adding a device offline, it's okay to add it.
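As a rough illustration of rule (1)(A) (a sketch only, not actual libvirt output), a guest defined with a single unaddressed virtio-scsi controller and N=3 would end up with something like:

  <!-- sketch of rule (1)(A) with N=3; not actual libvirt output -->
  <controller type='pci' model='pcie-root'/>
  <controller type='pci' model='pcie-root-port'/>   <!-- used by virtio-scsi -->
  <controller type='pci' model='pcie-root-port'/>   <!-- spare, for later devices -->
  <controller type='pci' model='pcie-root-port'/>   <!-- spare -->
  <controller type='pci' model='pcie-root-port'/>   <!-- spare -->
  <controller type='scsi' model='virtio-scsi'/>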

On Fri, 2016-10-07 at 10:17 -0400, Laine Stump wrote:
So here's a rewording of your description (with a couple additional conditions) to see if I understand everything correctly:

1) during initial domain definition:

A) If there are *no pci controllers at all* (not even a pci-root or pcie-root) *and there are any unaddressed devices that need a PCI slot*, then auto-add enough controllers for the requested devices, *and* make sure there are enough empty slots for "N" (do we stick with 4? or make it 3?) devices to be added later without needing more controllers. (So, if the domain has no PCI devices, we don't add anything extra, and also if it only has PCI devices that already have addresses, then we also don't add anything extra.)

B) If there is at least one pci controller specified in the XML, and there are any unused slots in the pci controllers in the provided XML, then use them for the unaddressed devices. If there are more devices that need an address at this time, also add controllers for them, but no *extra* controllers.

(Note to Rich: libguestfs could avoid the extra controllers either by adding a pci-root/pcie-root to the XML, or by manually addressing the devices. The latter would actually be better, since it would avoid the need for any pcie-root-ports.)

2) When adding a device to the persistent config (i.e. offline): if there is an empty slot on a controller, use it. If not, add a controller for that device, *but no extra controllers*.

3) When adding a device to the guest machine (i.e. hotplug / online): if there is an empty slot on a controller, use it. If not, then fail.

The differences I see from what (I think) you suggested are:

* if there aren't any unaddressed pci devices (even if there are no controllers in the config), then we also don't add any extra controllers (although we will of course add the pci-root or pcie-root, to acknowledge it is there).

* if another controller is needed for adding a device offline, it's okay to add it.
So instead of guaranteeing that there will always be an empty slot available for hotplug during a single start/destroy cycle of the guest, we would be guaranteeing that there will be 3/4 empty slots available for either hotplug or coldplug throughout the entire life of the guest. Sounds like a pretty good compromise to me.

The only problem I can think of is that there might be management applications that add e.g. a pcie-root in the XML when first defining a guest, and after the change such guests would get zero hotpluggable ports. Then again it's probably okay to expect such management applications to add the necessary number of pcie-root-ports themselves.

Maybe we could relax the wording on A) and ignore any pci{,e}-root? Even though there is really no reason for either a user or a management application to add them explicitly when defining a guest, I feel like they might be special enough to deserve an exception.

--
Andrea Bolognani / Red Hat / Virtualization

On 10/07/2016 11:16 AM, Andrea Bolognani wrote:
On Fri, 2016-10-07 at 10:17 -0400, Laine Stump wrote:
So here's a rewording of your description (with a couple additional conditions) to see if I understand everything correctly:
1) during initial domain definition:
A) If there are *no pci controllers at all* (not even a pci-root or pcie-root) *and there are any unaddressed devices that need a PCI slot* then auto-add enough controllers for the requested devices, *and* make sure there are enough empty slots for "N" (do we stick with 4? or make it 3?) devices to be added later without needing more controllers. (So, if the domain has no PCI devices, we don't add anything extra, and also if it only has PCI devices that already have addresses, then we also don't add anything extra).
B) if there is at least one pci controller specified in the XML, and there are any unused slots in the pci controllers in the provided XML, then use them for the unaddressed devices. If there are more devices that need an address at this time, also add controllers for them, but no *extra* controllers.
(Note to Rich: libguestfs could avoid the extra controllers either by adding a pci-root/pcie-root to the XML, or by manually addressing the devices. The latter would actually be better, since it would avoid the need for any pcie-root-ports).
2) When adding a device to the persistent config (i.e. offline): if there is an empty slot on a controller, use it. If not, add a controller for that device *but no extra controllers*
3) when adding a device to the guest machine (i.e. hotplug / online), if there is an empty slot on a controller, use it. If not, then fail.
The differences I see from what (I think) you suggested are:
* if there aren't any unaddressed pci devices (even if there are no controllers in the config), then we also don't add any extra controllers (although we will of course add the pci-root or pcie-root, to acknowledge it is there).
* if another controller is needed for adding a device offline, it's okay to add it.

So instead of guaranteeing that there will always be an empty slot available for hotplug during a single start/destroy cycle of the guest, we would be guaranteeing that there will be 3/4 empty slots available for either hotplug or coldplug throughout the entire life of the guest.
A better way to put it is that we guarantee there will be "N" (3 or 4 or whatever) slots available when the domain is originally defined. Once any change has been made, all bets are off.
Sounds like a pretty good compromise to me.
The only problem I can think of is that there might be management applications that add eg. a pcie-root in the XML when first defining a guest, and after the change such guests would get zero hotpluggable ports. Then again it's probably okay to expect such management applications to add the necessary number of pcie-root-ports themselves.
Yeah, if they know enough to be adding a root-port, then they know enough to add extras.
Maybe we could relax the wording on A) and ignore any pci{,e}-root? Even though there is really no reason for either a user or a management application to add them explicitly when defining a guest, I feel like they might be special enough to deserve an exception.
I thought about that; I'm not sure. In the case of libguestfs, even if Rich added a pcie-root, I guess he would still be manually addressing his pci devices, so that would be clue enough that he knew what he was doing and didn't want any extra.

On Fri, 2016-10-07 at 12:11 -0400, Laine Stump wrote:
So instead of guaranteeing that there will always be an empty slot available for hotplug during a single start/destroy cycle of the guest, we would be guaranteeing that there will be 3/4 empty slots available for either hotplug or coldplug throughout the entire life of the guest. A better way to put it is that we guarantee there will be "N" (3 or 4 or whatever) slots available when the domain is originally defined. Once any change has been made, all bets are off.
Okay, your alternative reading matches with my understanding of your proposal :)
Sounds like a pretty good compromise to me. The only problem I can think of is that there might be management applications that add eg. a pcie-root in the XML when first defining a guest, and after the change such guests would get zero hotpluggable ports. Then again it's probably okay to expect such management applications to add the necessary number of pcie-root-ports themselves. Yeah, if they know enough to be adding a root-port, then they know enough to add extras.
Note that I wrote pcie-root, not pcie-root-port.
Maybe we could relax the wording on A) and ignore any pci{,e}-root? Even though there is really no reason for either a user or a management application to add them explicitly when defining a guest, I feel like they might be special enough to deserve an exception. I thought about that; I'm not sure. In the case of libguestfs, even if Rich added a pcie-root, I guess he would still be manually addressing his pci devices, so that would be clue enough that he knew what he was doing and didn't want any extra.
Yeah, I'm not sure about that either, I just felt like it would be good to think about it. I don't really see problems going one way or the other, and I guess your initial proposal is slightly more strict, so we might as well go for it and relax it later if a compelling use case is found.

--
Andrea Bolognani / Red Hat / Virtualization

On 10/10/2016 05:27 AM, Andrea Bolognani wrote:
On Fri, 2016-10-07 at 12:11 -0400, Laine Stump wrote:
So instead of guaranteeing that there will always be an empty slot available for hotplug during a single start/destroy cycle of the guest, we would be guaranteeing that there will be 3/4 empty slots available for either hotplug or coldplug throughout the entire life of the guest.
A better way to put it is that we guarantee there will be "N" (3 or 4 or whatever) slots available when the domain is originally defined. Once any change has been made, all bets are off. Okay, your alternative reading matches with my understanding of your proposal :)
Sounds like a pretty good compromise to me.
The only problem I can think of is that there might be management applications that add eg. a pcie-root in the XML when first defining a guest, and after the change such guests would get zero hotpluggable ports. Then again it's probably okay to expect such management applications to add the necessary number of pcie-root-ports themselves.
Yeah, if they know enough to be adding a root-port, then they know enough to add extras. Note that I wrote pcie-root, not pcie-root-port.
Maybe we could relax the wording on A) and ignore any pci{,e}-root? Even though there is really no reason for either a user or a management application to add them explicitly when defining a guest, I feel like they might be special enough to deserve an exception.
I thought about that; I'm not sure. In the case of libguestfs, even if Rich added a pcie-root, I guess he would still be manually addressing his pci devices, so that would be clue enough that he knew what he was doing and didn't want any extra. Yeah, I'm not sure about that either, I just felt like it would be good to think about it. I don't really see problems going one way or the other, and I guess your initial proposal is slightly more strict, so we might as well go for it and relax it later if a compelling use case is found.
I implemented this over the weekend, and it turns out that it's much less disruptive to ignore whether the pcie-root controller was added by the user or by libvirt. This is because pcie-root is added much earlier (when we are adding the implicit/default devices for the domain), so by the time we get to the function that assigns PCI addresses (which is where we auto-add all the additional PCI controllers) we no longer have any information about whether pcie-root was specified in the XML, or if it was auto-added.

The way I have it implemented now is to add the extra controllers if the initial controller count was <= 1 *and* any new PCI controllers were auto-added during address assignment (which implies that an unaddressed device needed an address, so we added a controller to satisfy it).

I suppose if I want to be pedantic, I need to actually keep track of whether or not there were any unaddressed devices (in the case that the only devices they requested were primary video + USB2 + SATA controller, no new PCI controllers would be added).

On Thu, 2016-10-06 at 11:57 -0400, Laine Stump wrote:
(BTW, isn't there something wrt aarch64 about "no pci controllers in config means use mmio for devices", or something like that? (Or maybe we were just *thinking* about that and didn't actually do it, I don't remember). Using the lack of PCI controllers in the config to imply "automatically add necessary + extra controllers" would directly conflict with that.)
We were just thinking about it. mach-virt guests use MMIO addresses by default, but you can force specific devices to use PCI instead by adding <address type='pci'/> to the relevant element.

How we will make it easy for users and management applications to create pure PCIe mach-virt guests is still up for debate :)

--
Andrea Bolognani / Red Hat / Virtualization
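For illustration (the interface details are placeholder values), a mach-virt device opts into PCI by carrying an empty PCI address element, and libvirt then picks the actual slot:

  <interface type='network'>
    <source network='default'/>
    <model type='virtio'/>
    <!-- empty PCI address: request a PCI slot instead of virtio-mmio -->
    <address type='pci'/>
  </interface>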