[libvirt] RFC: revival of hotplug/unplug for PCI Multifunction devices in QEMU guests

Hi, This is labeled as RFC but it's more like a FYI to let people know and comment beforehand. Shiva sent a 28 patch series last year that implements hotplug/unplug support for PCI multifunction devices [1]. The design motivation of his work was based in a RFC sent to this mailing list back in 2016 [2]. I'll briefly summarize the goals and motivations here. What we have today in Libvirt: - no hotplug/unplug support for multifunction PCI devices This is explained in details in [2]. When hotplugging a multifunction device, QEMU will queue the hotplug operation of all non-zero functions and, when function 0 is hotplugged, all functions are hotplugged together. This is true for all archs that supports PCI multifunction devices in QEMU. For unplug it varies: x86 will unplug all functions if any function is unplugged, ppc64 needs to unplug each one. Due to the nature of how Libvirt hotplug works now, hotplug of these devices is not possible. All hotplugs are considered in an isolated manner. Even if we hotplug each function in the proper order (i.e. leaving function 0 last), Libvirt can assign different slots in the guest for each. Similar problems happens with hot-unplug. This feature aims to address these by creating a new <devices> element, exclusive for multifunction devices, that aggregates all functions of a device in a single operation. To handle this new element, the existing attach/detach functions in the QEMU driver now handles multiple devices. Attaching/detaching a single device is routed away from the specialized multifunction code to be handled to the existing attach/detach code base. - no support for partial assignment of functions We can't make the assumption that the guest will always assign all devices of a multifunction device. Some functions might be a security risk to expose to a guest, or the device can behave differently depending on the amount of functions assigned. Even if the 'leftover' functions can't be used to anything else in the host, the decision of full/partial assignment of functions should come from the user, not us. We can't predict how any other hardware vendor will setup its devices. This patch series also handles this case. ------ The latest version of this feature, rebase to cdd362e0e7a (the current master as I'm writing this), can be found at: https://github.com/danielhb/libvirt/tree/multifunction-rc2 This is a work still ongoing that it's not ready for contribution yet (first patches that changes the unit test code are breaking existing tests). The feature itself is being stress tested using a Broadcom BCM5719 with 4 functions and it looks solid. I'll see if I can grab a x86 env with a multifunction card to test it there too when I have the chance. Thoughts/comments are welcome. Instead of dropping a 28+ patch set for review, I'm planning into sending smaller chunks that makes senses individually. Thanks, DHB [1] https://www.redhat.com/archives/libvir-list/2018-March/msg00729.html [2] https://www.redhat.com/archives/libvir-list/2016-April/msg01057.html

On Tue, Jun 18, 2019 at 03:04:40PM -0300, Daniel Henrique Barboza wrote:
Hi,
This is labeled as RFC but it's more like a FYI to let people know and comment beforehand. Shiva sent a 28 patch series last year that implements hotplug/unplug support for PCI multifunction devices [1]. The design motivation of his work was based in a RFC sent to this mailing list back in 2016 [2].
I'll briefly summarize the goals and motivations here. What we have today in Libvirt:
- no hotplug/unplug support for multifunction PCI devices
This is explained in details in [2]. When hotplugging a multifunction device, QEMU will queue the hotplug operation of all non-zero functions and, when function 0 is hotplugged, all functions are hotplugged together. This is true for all archs that supports PCI multifunction devices in QEMU. For unplug it varies: x86 will unplug all functions if any function is unplugged, ppc64 needs to unplug each one.
Do you know anything about why ppc64 & x86 are different in this respect in QEMU. I think it would be desirable to fix QEMU so that unplug works consistently across architectures. These kind of behavioural differences are a cause of pain as x86 gets all the day to day testing & leaving ppc64 to bitrot if it behaves differently. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On 6/19/19 4:31 AM, Daniel P. Berrangé wrote:
Hi,
This is labeled as RFC but it's more like a FYI to let people know and comment beforehand. Shiva sent a 28 patch series last year that implements hotplug/unplug support for PCI multifunction devices [1]. The design motivation of his work was based in a RFC sent to this mailing list back in 2016 [2].
I'll briefly summarize the goals and motivations here. What we have today in Libvirt:
- no hotplug/unplug support for multifunction PCI devices
This is explained in details in [2]. When hotplugging a multifunction device, QEMU will queue the hotplug operation of all non-zero functions and, when function 0 is hotplugged, all functions are hotplugged together. This is true for all archs that supports PCI multifunction devices in QEMU. For unplug it varies: x86 will unplug all functions if any function is unplugged, ppc64 needs to unplug each one. Do you know anything about why ppc64 & x86 are different in this respect in QEMU. I think it would be desirable to fix QEMU so that unplug works consistently across architectures. These kind of behavioural differences are a cause of pain as x86 gets all the day to day testing & leaving
On Tue, Jun 18, 2019 at 03:04:40PM -0300, Daniel Henrique Barboza wrote: ppc64 to bitrot if it behaves differently.
That's a good point. I don't know why it's different but I'll investigate it. Thanks, DHB
Regards, Daniel

Hi Daniel, On 6/19/19 4:31 AM, Daniel P. Berrangé wrote:
Hi,
This is labeled as RFC but it's more like a FYI to let people know and comment beforehand. Shiva sent a 28 patch series last year that implements hotplug/unplug support for PCI multifunction devices [1]. The design motivation of his work was based in a RFC sent to this mailing list back in 2016 [2].
I'll briefly summarize the goals and motivations here. What we have today in Libvirt:
- no hotplug/unplug support for multifunction PCI devices
This is explained in details in [2]. When hotplugging a multifunction device, QEMU will queue the hotplug operation of all non-zero functions and, when function 0 is hotplugged, all functions are hotplugged together. This is true for all archs that supports PCI multifunction devices in QEMU. For unplug it varies: x86 will unplug all functions if any function is unplugged, ppc64 needs to unplug each one. Do you know anything about why ppc64 & x86 are different in this respect in QEMU. I think it would be desirable to fix QEMU so that unplug works consistently across architectures. These kind of behavioural differences are a cause of pain as x86 gets all the day to day testing & leaving
On Tue, Jun 18, 2019 at 03:04:40PM -0300, Daniel Henrique Barboza wrote: ppc64 to bitrot if it behaves differently.
Finally had the time to look into it in QEMU. The reason for the difference is indeed architectural - x86 handles it in slot-granularity level and PPC64 handles it in function level. This is why in x86 unplugging any function will remove the whole slot, while in PPC64 each non-zero function needs to be detached first. I've sent a patch to QEMU to change the way PPC64 behaves in function zero unplug [1]. Instead of bail out claiming that there are non-zero functions left, QEMU will simply remove the remaining functions by itself. This behavior happens only with function zero unplug (at least in this first implementation). Not sure if QEMU will accept it, but if it does, it'll spare us some patches in Libvirt unplug code for PCI Multifunction. Let's see how it goes. [1] https://lists.gnu.org/archive/html/qemu-devel/2019-08/msg04820.html Thanks, DHB
Regards, Daniel

On Thu, Aug 22, 2019 at 05:09:10PM -0300, Daniel Henrique Barboza wrote:
Hi Daniel,
On 6/19/19 4:31 AM, Daniel P. Berrangé wrote:
Hi,
This is labeled as RFC but it's more like a FYI to let people know and comment beforehand. Shiva sent a 28 patch series last year that implements hotplug/unplug support for PCI multifunction devices [1]. The design motivation of his work was based in a RFC sent to this mailing list back in 2016 [2].
I'll briefly summarize the goals and motivations here. What we have today in Libvirt:
- no hotplug/unplug support for multifunction PCI devices
This is explained in details in [2]. When hotplugging a multifunction device, QEMU will queue the hotplug operation of all non-zero functions and, when function 0 is hotplugged, all functions are hotplugged together. This is true for all archs that supports PCI multifunction devices in QEMU. For unplug it varies: x86 will unplug all functions if any function is unplugged, ppc64 needs to unplug each one. Do you know anything about why ppc64 & x86 are different in this respect in QEMU. I think it would be desirable to fix QEMU so that unplug works consistently across architectures. These kind of behavioural differences are a cause of pain as x86 gets all the day to day testing & leaving
On Tue, Jun 18, 2019 at 03:04:40PM -0300, Daniel Henrique Barboza wrote: ppc64 to bitrot if it behaves differently.
Finally had the time to look into it in QEMU. The reason for the difference is indeed architectural - x86 handles it in slot-granularity level and PPC64 handles it in function level. This is why in x86 unplugging any function will remove the whole slot, while in PPC64 each non-zero function needs to be detached first.
I've sent a patch to QEMU to change the way PPC64 behaves in function zero unplug [1]. Instead of bail out claiming that there are non-zero functions left, QEMU will simply remove the remaining functions by itself. This behavior happens only with function zero unplug (at least in this first implementation). Not sure if QEMU will accept it, but if it does, it'll spare us some patches in Libvirt unplug code for PCI Multifunction. Let's see how it goes.
Looks like the ppc maintainer has queued the patch, which is good. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On 8/23/19 7:28 AM, Daniel P. Berrangé wrote:
On Thu, Aug 22, 2019 at 05:09:10PM -0300, Daniel Henrique Barboza wrote:
Hi Daniel,
On 6/19/19 4:31 AM, Daniel P. Berrangé wrote:
On Tue, Jun 18, 2019 at 03:04:40PM -0300, Daniel Henrique Barboza wrote: [...] Finally had the time to look into it in QEMU. The reason for the difference is indeed architectural - x86 handles it in slot-granularity level and PPC64 handles it in function level. This is why in x86 unplugging any function will remove the whole slot, while in PPC64 each non-zero function needs to be detached first.
I've sent a patch to QEMU to change the way PPC64 behaves in function zero unplug [1]. Instead of bail out claiming that there are non-zero functions left, QEMU will simply remove the remaining functions by itself. This behavior happens only with function zero unplug (at least in this first implementation). Not sure if QEMU will accept it, but if it does, it'll spare us some patches in Libvirt unplug code for PCI Multifunction. Let's see how it goes. Looks like the ppc maintainer has queued the patch, which is good.
Yes, that will reduce the amount of patches needed for the hot unplug code for both archs (since x86 was being impacted by how ppc64 does the unplug as well). I'll work in resubmit the patch series considering the upstream PPC64 change. Thanks, DHB
Regards, Daniel

On 6/18/19 8:04 PM, Daniel Henrique Barboza wrote:
Hi,
This is labeled as RFC but it's more like a FYI to let people know and comment beforehand. Shiva sent a 28 patch series last year that implements hotplug/unplug support for PCI multifunction devices [1]. The design motivation of his work was based in a RFC sent to this mailing list back in 2016 [2].
I'll briefly summarize the goals and motivations here. What we have today in Libvirt:
- no hotplug/unplug support for multifunction PCI devices
This is explained in details in [2]. When hotplugging a multifunction device, QEMU will queue the hotplug operation of all non-zero functions and, when function 0 is hotplugged, all functions are hotplugged together. This is true for all archs that supports PCI multifunction devices in QEMU. For unplug it varies: x86 will unplug all functions if any function is unplugged, ppc64 needs to unplug each one.
Due to the nature of how Libvirt hotplug works now, hotplug of these devices is not possible. All hotplugs are considered in an isolated manner. Even if we hotplug each function in the proper order (i.e. leaving function 0 last), Libvirt can assign different slots in the guest for each. Similar problems happens with hot-unplug.
This feature aims to address these by creating a new <devices> element, exclusive for multifunction devices, that aggregates all functions of a device in a single operation. To handle this new element, the existing attach/detach functions in the QEMU driver now handles multiple devices. Attaching/detaching a single device is routed away from the specialized multifunction code to be handled to the existing attach/detach code base.
- no support for partial assignment of functions
We can't make the assumption that the guest will always assign all devices of a multifunction device. Some functions might be a security risk to expose to a guest, or the device can behave differently depending on the amount of functions assigned.
Even if the 'leftover' functions can't be used to anything else in the host, the decision of full/partial assignment of functions should come from the user, not us. We can't predict how any other hardware vendor will setup its devices.
This patch series also handles this case.
------
The latest version of this feature, rebase to cdd362e0e7a (the current master as I'm writing this), can be found at:
https://github.com/danielhb/libvirt/tree/multifunction-rc2
This is a work still ongoing that it's not ready for contribution yet (first patches that changes the unit test code are breaking existing tests).
I haven't looked at your patches that closely, but I think I might have a fix for you. I mean, I'm working around this area too (see my latest upstream patches) and I've found that our vfio-pci driver has 'bind' action suspended - it fails. This is because RHEL kernel behaves like that (but not the vanilla one). I've spoke to a kernel developer and he experienced the same behaviour, but did not know from the top of his head what is the root cause. Anyway, it's not that big of a deal because libvirt prefers 'driver_override' and thuse it'll never try 'bind' on non-ancient kernels. And my patch does exactly that - it introduces 'driver_override' to virpcimock.c: https://github.com/zippy2/libvirt/commit/2b8ca7d4598ef479b19b61f942a17d35504... Actually, there are two more patches needed: https://github.com/zippy2/libvirt/commits/nvme 2b8ca7d459 virpcimock: Create driver_override file in device dirs 9631289730 Revert "virpcitest: Test virPCIDeviceDetach failure" 4310fbbc40 virpcimock: Move actions checking one level up Michal

Hi Michal, On 6/19/19 5:51 AM, Michal Privoznik wrote:
On 6/18/19 8:04 PM, Daniel Henrique Barboza wrote:
Hi,
This is labeled as RFC but it's more like a FYI to let people know and comment beforehand. Shiva sent a 28 patch series last year that implements hotplug/unplug support for PCI multifunction devices [1]. The design motivation of his work was based in a RFC sent to this mailing list back in 2016 [2].
I'll briefly summarize the goals and motivations here. What we have today in Libvirt:
- no hotplug/unplug support for multifunction PCI devices
This is explained in details in [2]. When hotplugging a multifunction device, QEMU will queue the hotplug operation of all non-zero functions and, when function 0 is hotplugged, all functions are hotplugged together. This is true for all archs that supports PCI multifunction devices in QEMU. For unplug it varies: x86 will unplug all functions if any function is unplugged, ppc64 needs to unplug each one.
Due to the nature of how Libvirt hotplug works now, hotplug of these devices is not possible. All hotplugs are considered in an isolated manner. Even if we hotplug each function in the proper order (i.e. leaving function 0 last), Libvirt can assign different slots in the guest for each. Similar problems happens with hot-unplug.
This feature aims to address these by creating a new <devices> element, exclusive for multifunction devices, that aggregates all functions of a device in a single operation. To handle this new element, the existing attach/detach functions in the QEMU driver now handles multiple devices. Attaching/detaching a single device is routed away from the specialized multifunction code to be handled to the existing attach/detach code base.
- no support for partial assignment of functions
We can't make the assumption that the guest will always assign all devices of a multifunction device. Some functions might be a security risk to expose to a guest, or the device can behave differently depending on the amount of functions assigned.
Even if the 'leftover' functions can't be used to anything else in the host, the decision of full/partial assignment of functions should come from the user, not us. We can't predict how any other hardware vendor will setup its devices.
This patch series also handles this case.
------
The latest version of this feature, rebase to cdd362e0e7a (the current master as I'm writing this), can be found at:
https://github.com/danielhb/libvirt/tree/multifunction-rc2
This is a work still ongoing that it's not ready for contribution yet (first patches that changes the unit test code are breaking existing tests).
I haven't looked at your patches that closely, but I think I might have a fix for you. I mean, I'm working around this area too (see my latest upstream patches) and I've found that our vfio-pci driver has 'bind' action suspended - it fails. This is because RHEL kernel behaves like that (but not the vanilla one). I've spoke to a kernel developer and he experienced the same behaviour, but did not know from the top of his head what is the root cause. Anyway, it's not that big of a deal because libvirt prefers 'driver_override' and thuse it'll never try 'bind' on non-ancient kernels. And my patch does exactly that - it introduces 'driver_override' to virpcimock.c:
https://github.com/zippy2/libvirt/commit/2b8ca7d4598ef479b19b61f942a17d35504...
Actually, there are two more patches needed:
https://github.com/zippy2/libvirt/commits/nvme
2b8ca7d459 virpcimock: Create driver_override file in device dirs 9631289730 Revert "virpcitest: Test virPCIDeviceDetach failure" 4310fbbc40 virpcimock: Move actions checking one level up
Thanks for that. I was getting test errors after applying a patch that reads "virpcitest: Change the stub driver to vfio from pci-stub", and that was the reason. Applying your patches fixed the issue. Small detail: patch 2b8ca7d459 fails to compile standalone in my machine because 'ret' in pci_driver_handle_change can be returned uninitialized after adding the 'driver_override' conditional that does nothing. Declaring 'int ret = 0;' in pci_driver_handle_change fixed it for me. Thanks again, DHB
Michal
participants (3)
-
Daniel Henrique Barboza
-
Daniel P. Berrangé
-
Michal Privoznik