On Mon, Aug 08, 2016 at 12:41:48PM -0400, Laine Stump wrote:
On 08/08/2016 04:56 AM, Laine Stump wrote:
> When faced with a guest device that requires a PCI address but doesn't
> have one manually assigned in the config, libvirt has always insisted
> (well... *tried* to insist) on auto-assigning an address that is on a
> PCI controller that supports hotplug. One big problem with this is
> that it prevents automatic use of a Q35 (or aarch64/virt) machine's
> pcie-root (since the PCIe root complex doesn't support hotplug).
>
> In order to promote simpler domain configs (more devices on pcie-root
> rather than on a pci-bridge), this patch adds a new sub-element to all
> guest devices that have a PCI address and support hotplug:
>
> <hotplug require='no'/>
>
> For devices that have hotplug require='no', we turn off the
> VIR_PCI_CONNECT_HOTPLUGGABLE bit in the devFlags when searching for an
> available PCI address. Since pcie-root now allows standard PCI
> devices, this results in those devices being placed on pcie-root
> rather than pci-bridge.
I've been playing around with this and, by itself, it works very well. With
this solved, combined with taking advantage of PCIe for virtio when
available, it's very easy to create q35 domains that have no legacy-PCI
without needing to resort to manually assigning addresses.
However, there is still another item that we need to be able to configure -
stating a preference of legacy PCI vs. PCIe when both are available for a
device (again, the aim is to do this *without* needing to manually assign an
address). The following devices have this choice:
1) vfio assigned devices
2) virtio devices
3) the nec-xhci USB controller
You might think that it would always be preferable to use PCIe if it's
available, but especially in these "early days" of using PCIe in guests it
would be useful to have the ability to *easily* force use of a legacy PCI
slot in case some PCIe-related bug is encountered (in particular, people
have pointed out in discussions about vfio device assignment that it could
be possible for a guest OS to misbehave when presented with a device's PCIe
configuration block (which hasn't been visible in the past because the
device was attached to a legacy PCI slot)).
In order to maintain functionality while any such bugs are figured out and
fixed, we need to be able to force the device onto a PCI slot. There are two
ways of doing this:
1) manually specify the full PCI address of a legacy PCI slot in the config
2) provide an option in the config that simply says "use any PCI slot" or
"use any PCIe slot".
Assuming that (1) is too cumbersome, we need to come up with a reasonable
name/location for a config option (providing the backend for it will be
trivial). Some possible places:
I prefer a variant on (1), which is to specify an address with only the
controller index filled out, e.g. given a q35 bridge topology
  <controller type='pci' index='0' model='pcie-root'/>
  <controller type='pci' index='1' model='dmi-to-pci-bridge'>
    <model name='i82801b11-bridge'/>
  </controller>
  <controller type='pci' index='2' model='pci-bridge'>
    <model name='pci-bridge'/>
    <target chassisNr='56'/>
  </controller>
A device would use
  <address type="pci" controller="2">   (for pci-bridge placement)
  <address type="pci" controller="0">   (for pcie-root placement)
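
As a rough illustration of what this would mean on the mgmt app side, here
is a minimal Python sketch that looks up the controller index by model in
the q35 controller list above and emits the partial address. Note that the
controller= attribute on <address> is only the syntax proposed in this
mail, not something libvirt is known to accept, and address_for_model is a
hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Controller list from the q35 example above, wrapped in a root element
# so that it parses as a single document.
CONTROLLERS_XML = """
<devices>
  <controller type='pci' index='0' model='pcie-root'/>
  <controller type='pci' index='1' model='dmi-to-pci-bridge'>
    <model name='i82801b11-bridge'/>
  </controller>
  <controller type='pci' index='2' model='pci-bridge'>
    <model name='pci-bridge'/>
    <target chassisNr='56'/>
  </controller>
</devices>
"""


def address_for_model(devices_xml, wanted_model):
    """Return a partial <address> for the first PCI controller of a model.

    Hypothetical helper: the controller= attribute is the syntax proposed
    in this thread, not an existing libvirt address attribute.
    """
    root = ET.fromstring(devices_xml)
    for ctrl in root.findall("controller"):
        if ctrl.get("type") == "pci" and ctrl.get("model") == wanted_model:
            return '<address type="pci" controller="%s"/>' % ctrl.get("index")
    raise LookupError("no PCI controller with model %r" % wanted_model)


# Force a device onto legacy PCI, or onto the PCIe root complex.
legacy_address = address_for_model(CONTROLLERS_XML, "pci-bridge")
pcie_address = address_for_model(CONTROLLERS_XML, "pcie-root")
```

The app only needs to know which controller model it wants; the slot and
function would still be left for libvirt to auto-assign.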
This trivially expands to cover the NUMA use case too, without having to
invent further elements duplicating NUMA node info again.
  <controller type='pci' index='0' model='pci-root'/>
  <controller type='pci' index='1' model='pci-expander-bus'>
    <model name='pxb'/>
    <target busNr='254'>
      <node>0</node>
    </target>
  </controller>
  <controller type='pci' index='2' model='pci-expander-bus'>
    <model name='pxb'/>
    <target busNr='255'>
      <node>1</node>
    </target>
  </controller>
eg device uses
  <address type="pci" controller="1">   (for pxb on NUMA node 0)
  <address type="pci" controller="2">   (for pxb on NUMA node 1)
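
To show that the NUMA case needs no extra XML elements, the sketch below
derives the NUMA node to controller index mapping directly from the pxb
controller definitions above. It is only an illustration under the
assumptions of this proposal; controllers_by_numa_node is a hypothetical
helper name, not part of any libvirt API.

```python
import xml.etree.ElementTree as ET

# pxb controller list from the example above, wrapped in a root element.
PXB_XML = """
<devices>
  <controller type='pci' index='0' model='pci-root'/>
  <controller type='pci' index='1' model='pci-expander-bus'>
    <model name='pxb'/>
    <target busNr='254'>
      <node>0</node>
    </target>
  </controller>
  <controller type='pci' index='2' model='pci-expander-bus'>
    <model name='pxb'/>
    <target busNr='255'>
      <node>1</node>
    </target>
  </controller>
</devices>
"""


def controllers_by_numa_node(devices_xml):
    """Map each NUMA node to the index of the first PCI controller on it.

    Hypothetical helper: controllers without a <target><node> element
    (such as pci-root here) are skipped.
    """
    root = ET.fromstring(devices_xml)
    mapping = {}
    for ctrl in root.findall("controller"):
        node = ctrl.findtext("target/node")
        if ctrl.get("type") == "pci" and node is not None:
            mapping.setdefault(int(node), int(ctrl.get("index")))
    return mapping


node_to_controller = controllers_by_numa_node(PXB_XML)
```

A device that should sit close to NUMA node 1 would then simply use the
controller index found in node_to_controller[1] in its partial address.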
Yes, when you first boot a guest, this means the mgmt app has to know
what controllers exist by default, and/or specify controllers, but I
think that's ultimately preferable to inventing an indefinitely
growing list of extra XML elements to provide tweaks to the PCI address
assignment logic.
We can simplify life for mgmt apps in a different way, by using the domain
XML capabilities to provide full data on the default controllers used for
each machine type. So apps would not need to have any machine type specific
logic in them - they can write code that's entirely metadata driven based
on the domain capabilities.
Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|