Hi,
This is a request for comments in the design of the PCI multifunction
hotplug/hot-unplug feature for the QEMU driver that hopefully I'll be sending
shortly for review. The feature went through code changes since [1] mostly
because of Libvirt changes itself, but Shiva's 2016 original design [2]
was preserved.
The design consists of a new 'devices' XML element that will contain all
hostdevs of the multifunction device to be hotplugged:
<devices>
--- hostdev function 0 XML ---
--- hostdev function 1 XML ---
(...)
--- hostdev function N XML ---
</devices>
The only requirement is function 0 to be present. We'll not force all functions
to be hotplugged. Internally, Libvirt will hotplug each non-zero function first,
then the zero function for last, which is what QEMU expects for both x86
and Pseries (the 2 archs that supports this feature AFAIK).
For unplug, the same <devices> XML is required. And this is where the archs
starts to behave differently. x86 guests will hot-unplug all functions,
regardless of which function is unplugged first. In the example above,
hot-unplugging function 1 will trigger all functions to be unplugged. Pseries
does not work like that - all non-zero functions need to be unplugged, then the
function zero unplug finally triggers the unplug of the slot. And the
Libvirt side was doing exactly that.
Back in 2019 I interpreted this as a case of a x86 feature paying the price for
a Pseries behavior - if Pseries hot-unplug worked the same way, we could
hot-unplug all functions with a single call like x86 does, and the user
won't need the <devices> XML for hot-unplug. I ended up tweaking the Pseries
guest in QEMU to execute hot-unplug of all functions in case function 0 is
hot-unplugged, emulating the x86 behavior for function 0 [3]. This change alone
already eases the unplug code in Libvirt side for Pseries, which is good. But
my idea was to not use the <devices> XML for hot-unplug. Instead, use regular
hot-unplug of function 0 to hot-unplug the whole slot, for both archs.
However, recent discussions when contributing the new address='unassigned'
type made me re-think the decision I made above:
- using <devices> XML for hot-unplug isn't too much of a deal, given that the
user would need it for hot-plugging the slot anyway. The gain is marginal,
at best.
- making a single hot-unplug command 'magically' unplug more than one hostdev
is not a good idea. First, there's always the chance that the user hot-unplugs
the function 0 of a multifunction device, thinking that it's a regular hostdev,
just to see whole slot disappearing from the guest. Granted, I can put more
code in this case, warning the user about the removal of the whole slot instead
of a single hostdev. But in the end, this breaks Laine's golden rule [4] of
do not tamper with devices unless the user is explicitly telling to do so,
which aims to avoid these cases of devices appearing/disappearing without
user consent.
All this said, what I'm leaning up to do is to keep Shiva's design, but
simplifying the hot-unplug mechanic considering the changes I made in QEMU for
Pseries guests. I'm not adamant on it though, so I'd like to hear if
I'm missing something and I should go the 'single unplug removes it all'
route instead.
Thanks,
DHB
[1]
https://www.redhat.com/archives/libvir-list/2018-March/msg00729.html
[2]
https://www.redhat.com/archives/libvir-list/2016-April/msg01057.html
[3]
https://lists.gnu.org/archive/html/qemu-ppc/2019-08/msg00447.html
[4] this is something Laine talks about here and there in threads I happen
to be involved. If someone wants to claim authorship let me know