On Thu, 21 Nov 2019 19:19:13 -0300
Daniel Henrique Barboza <danielhb413(a)gmail.com> wrote:
Changes from previous version [1], all of them result of
feedback from Alex Williamson and Abdulla Bubshait:
- use <address type='none'/> instead of creating a new subsys
attribute;
- expand the change to all PCI hostdevs. Former patch 01 was
discarded because we don't need the PCI Multifunction helpers
for now;
- series changed name to reflect what it's being done
- new patch 04: add documentation to formatdomain.html.in
To avoid a huge wall of text please refer to [1] for context
about the road up to here. Commit msgs of the first 3 patches
tells the story as well.
[1]
https://www.redhat.com/archives/libvir-list/2019-October/msg00298.html
What I want to discuss here instead is a caveat that I've found
while testing this work, since its first version. This test was
done in a Power 8 system with a Broadcom BCM5719 PCIe Multifunction
card, with 4 virtual functions. This series enables Libvirt to
declare PCI hostdevs that will not be visible to the guest using
address type='none'. During the tests I faced a scenario that I
expected to fail, but it didn't. This is the relevant XML except:
<hostdev mode='subsystem' type='pci' managed='yes'>
<driver name='vfio'/>
<source>
<address domain='0x0001' bus='0x09' slot='0x00'
function='0x0'/>
</source>
<address type='none'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
<driver name='vfio'/>
<source>
<address domain='0x0001' bus='0x09' slot='0x00'
function='0x1'/>
</source>
<address type='none'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
<driver name='vfio'/>
<source>
<address domain='0x0001' bus='0x09' slot='0x00'
function='0x2'/>
</source>
<address type='pci' domain='0x0000' bus='0x01'
slot='0x01' function='0x2'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
<driver name='vfio'/>
<source>
<address domain='0x0001' bus='0x09' slot='0x00'
function='0x3'/>
</source>
<address type='none'/>
</hostdev>
I'm declaring all the BCM5719 functions in the XML, but I am making
functions 0, 1 and 3 unassignable by the guest using address type='none'.
This test was meant to fail, but it didn't. To my surprise the guest
booted and the device is functional:
$ lspci
0000:00:01.0 Unclassified device [00ff]: Red Hat, Inc Virtio memory balloon
0000:00:03.0 USB controller: Red Hat, Inc. QEMU XHCI Host Controller (rev 01)
0000:00:04.0 SCSI storage controller: Red Hat, Inc Virtio block device
0001:00:01.2 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet
PCIe (rev 01)
$
I've talked with Michael Roth (QEMU PPC64 developer that worked with
the PCI multifunction hotplug/unplug support in the PPC64 machine)
about this. He mentioned that this is intended. I'll quote here what he
had to say about it:
"The current logic is that we only emit the hotplug event when function
0 is attached, but if some other function is attached at boot-time the
guest will still see it on the bus, and whether that works or not I
think is up to the device/driver"
This explains why this test didn't fail as I expected. At least for
the PPC64 machine, depending on the device support, this setup is
allowed. PPC64 machine uses function 0 hotplug as a signal of
'plug all the queue functions and function 0', but function 0
isn't required at boot time. I would like to hear other opinions
in this because I can't say whether this is allowed in x86.
I would expect that on x86 a Linux guest would need to be booted with
the pci=pcie_scan_all kernel option to find this device. The PCI-core
will only scan for devfn 00.0 behind a downstream port and then only
scan non-zero functions if that device indicate multifunction support.
I'd expect more archs to behave this way than what you see on PPC where
it "just works".
I am mentioning all this now because this had a direct impact on the
design of this work since the previous version, and I failed
to bring it up back then. I am *not* checking for the assignment of
function 0 at guest boot time in Libvirt, leaving the user free to
decide what to do. I am aware that this will be inconsistent to the
logic of the PCI multifunction hotplug/unplug support, where
function 0 is required. This also puts a lot of faith in the user,
relying that the user is fully aware of the capabilities of the
hardware.
My question is: should Libvirt force function 0 to be present in
boot time as well, regardless of whether the PPC64 guest or some
cards are able to boot without it?
In my reading of the PCI 3.0 spec, 3.2.2.3.4 indicates that
multifunction devices are required to implement function 0 and that
configuration software is under no obligation to scan for higher number
functions if function 0 is not present and does not have the
multifunction bit set in the header type register. With PCIe, where
link information, payload size, ARI, VC, etc are exposed in config
space, many of these are only valid or have specific means only for
function 0.
What you're seeing on PPC is, I think, more typical of paravirtualized
PCI enumeration, where you're only seeing a view of the bus as allowed
by a hypervisor. I can't say whether the pcie_scan_all boot option was
added to allow discovery of devices as in your configuration or simply
to correct cases where function 0 forgot to implement the multifunction
bit. Whether libvirt wants to prevent this is more of a support
question, it seems spec compliant hardware should never do this, but
not all hardware is spec compliant. Libvirt should never generate such
a configuration on it's own, but I wouldn't necessarily object if it
allows a user to shoot themselves in the foot. Thanks,
Alex