DanPB,
Could you please reiterate the suggestion regarding the placement of the pci root hotplug xml you made in the irc channel?
On 9/11/21 11:26 PM, Ani Sinha wrote:
> Hi all:
>
> This patchset introduces libvirt xml support for the following two pm conf
> options:
>
> <pm>
> <acpi-hotplug-bridge enabled='no'/>
> <acpi-root-hotplug enabled='yes'/>
> </pm>
(before I get into a more radical discussion about different options -
since we aren't exactly duplicating the QEMU option name anyway, what if
we made these names more consistent, e.g. "acpi-hotplug-bridge" and
"acpi-hotplug-root"?)
I've thought quite a bit about whether to put these attributes here, or
somewhere else, and I'm still undecided.
My initial reaction to this was "PM == Power Management, and power
management is all about suspend mode support. Hotplug isn't power
management." But then you look at the name of the QEMU option and PM is
right there in the name, and I guess it's *kind of related* (effectively
suspending/resuming a single device), so maybe I'm thinking too narrowly.
So are there alternate places that might fit the purpose of these new
options better, rather than directly mimicking the QEMU option placement
(for better or worse)? A couple alternative possibilities:
1) ****
One possibility would be to include these new flags within the existing
<acpi> subelement of <features>, which is already used to control
whether the guest exposes ACPI to the guest *at all* (via adding
"-no-acpi" to the QEMU commandline when <acpi> is missing - NB: this
feature flag is currently supported only on x86 and aarch64 QEMU
platforms, and ignored for all other hypervisors).
Possibly the new flags could be put in something like this:
<features>
<acpi>
<hotplug-bridge enabled='no'/>
<hotplug-root enabled='yes'/>
</acpi>
...
</features>
But:
* currently there are no subelements to <acpi>. So this isn't "extending
according to an existing pattern".
* even though the <features> element uses presence of a subelement to
indicate "enabled" and absence of the subelement to indicate "disabled".
But in the case of these new acpi bridge options we would need to
explicitly have the "enabled='yes/no'" rather than just using presence
of the option to mean "enabled" and absence to mean "disabled" because
the default for "root-hotplug" up until now has been *enabled*, and the
default for hotplug-bridge is different depending on machinetype. We
need to continue working properly (and identically) with old/existing
XML, but if we didn't have an "enabled" attribute for these new flags,
there would be no way to tell the difference between "not specified" and
"disabled", and so no way to disable the feature for a QEMU where the
default was "enabled". (Why does this matter? Because I don't like the
inconsistency that would arise from some feature flags using absense to
mean "disabled" and some using it to mean "use the default".)
* Having something in <features> in the domain XML kind of implies that
the associated capability flags should be represented in the <features>
section of the domain capabilities. For example, <acpi/> is listed under
<features> in the output of virsh capabilities, separately from the flag
indicating presence of the -no-acpi option. I'm not sure if we would
need to add something there for these options if we moved them into
<features> (seems a bit redundant to me to have it in both places, but
I'm sure there are $reasons).
2) *****
Alternately, there is an <acpi> subelement of <os>, which is currently
used to add a SLIC table (some sort of software license table, which I'd
never heard of before) using QEMU's -acpitable commandline option. It is
also used somehow by the Xen driver.
<os>
<acpi>
<table type='slic'>/path/to/slic.dat</table>
<hotplug-bridge enabled='no'/>
<hotplug-root enabled='yes'/>
</acpi>
...
</os>
My problem with adding these new PCI controller acpi options to os/acpi
is simply that it's in the <os> subelement, which is claimed elsewhere
to be intended for OS boot options, and is used for things like
specifying the path to a kernel / initrd to boot from.
3) ****
A third option, suggested somewhere by Ani, would be to make a
completely new top-level element, called something like <acpiHotplug>
that would have separate attributes for the two flags, e.g.:
<acpiHotplug bridge='yes' root='yes'/>
I dislike new toplevel options because they just seem so adhoc, as if
the XML namespace is a cluttered, disorganized room. That reminds me too
much of my own workspace, which is just... depressing.
****
Since I always seem to spend *way too much time* worrying about naming,
only to have it come out wrong in the end anyway, I'm looking for some
other opinions. Counting the version that is in Ani's patch currently as
option "0", which option do you all think is the best? Or is it
completely unimportant?
> The above two options are only available for qemu driver and that too for x86
> guests only. Both of them are global options.
>
> ``acpi-hotplug-bridge`` option enables or disables ACPI hotplug support for cold
> plugged bridges. Examples of cold plugged bridges include PCI-PCI bridge
> (pci-bridge controller) for pc machines and pcie-root-port controller for q35
> machines. The corresponding commandline options to qemu for x86 guests are:
The "cold plugged bridges" term here throws me for a loop - it implies
that hotplugging bridges is something that's supported, and I think it
still isn't. Of course this is just the cover letter, so it won't go
into git anywhere, but I think it should be enough to say "enables ACPI
hotplug into non-root bus PCI bridges/ports".
>
> (pc machines): -global PIIX4_PM.acpi-pci-hotplug-with-bridge-support=<off/on>
> (q35 machines): -global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=<off/on>
So I'm curious - if the QEMU commandline also included "-no-acpi" along
with these, what would happen? Would it be silently ignored? Generate an
error? Or does -no-acpi only control the suspend support, and acpi
hotplug is still available?
>
> Being global options, no other bridge specific options for pci-bridge
> controller or pcie-root-port controllers are required. For pc machine type in
> x86, this option is available in qemu for a long time, from version 2.1.
> Please see the changes in qemu.git:
>
> 9e047b982452c6 ("piix4: add acpi pci hotplug support")
Interesting. So how was hotplug handled before this? With SHPC? I know
there must be *some* kind of hotplug support in older QEMU, because
RHEL6 QEMU supported hotplug, and it was based on qemu 0.12 or something
ancient like that...
> 133a2da488062e ("pc: acpi: generate AML only for PCI0 devices if PCI bridge hotplug is disabled")
>
> For q35 machine type, this was introduced in qemu 6.1 with the following
> changes in qemu.git:
>
> (a) c0e427d6eb5fef ("hw/acpi/ich9: Enable ACPI PCI hot-plug")
> (b) 17858a16950860 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default on Q35")
>
> The reasons for enabling ACPI based hotplug for PCIe (q35) based machines (as
> opposed to native hotplug) for bridges are outlined in (b). It is possible that
> some users might still want to use native hotplug on PCIe [1]. Therefore,
> this conf option enables users to choose either ACPI based hotplug or native
> hotplug for cold plugged bridges (for example for pcie root port controller
> in q35 machines).
>
> ``acpi-root-hotplug`` option enables or disables ACPI based hotplug for PCI root
> bus (pci-root controller). This option is only available for pc machine type.
> The corresponding commandline option to qemu for x86 guests is:
>
> -global PIIX4_PM.acpi-root-pci-hotplug=<off/on>
>
> This additional option enables users to disable hotplug for all devices in the
> system without adding an additional PCI-PCI bridge, putting the devices behind
> the bridge and using the existing ``acpi-hotplug-bridge`` option to disable
> hotplug on that bridge. This feature was introduced from qemu version 5.2 with
> the following change in qemu.git:
>
> 3d7e78aa7777f ("Introduce a new flag for i440fx to disable PCI hotplug on the root bus")
>
> The above qemu commit describes some compelling reasons why users might to
> disable hotplug on PCI root buses [2].
>
> A brief summary of the patches:
>
>> [PATCH v3 1/5] qemu: capablities: detect presence of
>> [PATCH v3 2/5] qemu: capablities: detect presence of
> Patches 1 and 2 implement support for qemu capability checks for the above
> config options.
>
>> [PATCH v3 3/5] conf: introduce acpi-hotplug-bridge and
> Patch 3 actually adds the config option to the schema and adds related unit
> tests.
>
>> [PATCH v3 4/5] qemu: command: add support for qemu options that
> Patch 4 adds the backend qemu commandline support for the options. It also adds
> relevant unit tests for the same.
>
>> [PATCH v3 5/5] NEWS: add new acpi pci hotplug options in the release
> Patch 5 adds the release notes for the next libvirt release.
>
>
> Changelog:
> v1: initial implementation. Had some bugs and missed some unit tests.
> v2: fixed bugs and added additional missing unit tests.
> v3: reorganized the patches as per Laine's suggestion. Added more
> details in commit messages. Added conf description in formatdomain.rst.
> Added changelog for next release.
>
>
> Notes:
>
> [1] One concrete example of why one might still want to use native hotplug with
> pcie-root-port controller is the fact that we are still discovering issues with
> acpi hotplug on PCIE.
Yes, sigh. I recall someone saying something like "if we switch to ACPI
hotplug then all these bugs just go away and everything works" or
something like that. Reality never matches the ideal picture we put in
our brains.
At least ACPI hotplug is only the default on new machinetypes (doesn't
help much for management platforms that always just use "q35" every time
they start a guest). And it can also cause problems with distro-specific
machinetypes in downstream distros when they are rebased:
https://bugzilla.redhat.com/2006409
> One such issue is:
> https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg02146.html
> Another reason is that users have been using native hotplug on pcie root ports
> up until now. They have built and tested their systems based on native hotplug.
> They may not want to suddenly move to acpi based hotplug just because it is now
> the default in qemu. Supporting the option to chose one or the other through
> libvirt makes things simpler for end users.
>
> [2] The use case scenario described by Laine in
> https://listman.redhat.com/archives/libvir-list/2020-February/msg00110.html
> intentionally does not discuss i440fx and focusses solely on q35. I do realize
> that redhat has moved on from i440fx and currently efforts for new features
> are concentrated on q35 machines only. We have had some hard debates on this
> on the qemu mailing list before. The fact of the matter is that i440fx is
> not at 1-1 parity with q35. There are many users who are currenly using i440fx
> and are simply not ready to move to q35 without sacrificing some
> existing features they support today. For example
> https://wiki.qemu.org/images/4/4e/Q35.pdf lists some of q35 limitations.
To be fair, aside from "support for Win2000/WinXP", none of the items on
the "limitations" page of that slide deck is something that's impossible
to do with a Q35 machinetype; it's just that accomplishing some things
may be more complicated. But I understand your point. Mainly I brought
it up because I wanted to be sure that we're adding these to fulfill an
actual need, rather than just adding bulk for the sake of completeness,
or to satisfy curiosity.
> https://www.linux-kvm.org/images/0/06/2012-forum-Q35.pdf provides more
> information on the differences. Hence we need to solve the issue Laine has
> described in the above email for i440fx without adding additional bridges.
>
> Further, in Daniel Berrange's words from :
> https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg03012.html
>
> "From the upstream POV, there's been no decision / agreement to phase
> out PIIX, this is purely a RHEL downstream decision & plan. If other
> distros / users have a different POV, and find the feature useful, we
> should accept the patch if it meets the normal QEMU patch requirements.
> "
>
> Also to be noted that I have already experimented this qemu commandline option
> using libvirt passthrough feature as has been documented in
> http://blog.vmsplice.net/2011/04/how-to-pass-qemu-command-line-options.html
> This was only meant to be a short term solution until libvirt started
> supporting this natively. Supporting this option through libvirt would simplify
> their use case as well as add capability validations
> and graceful failure scenarios in case qemu did not support the option.
>
> [3] Finally, I implemented support for ``acpi-root-hotplug`` option in Qemu.
> Since adding the support for this option, I have not run away :-) I am still
> around, fixing other issues in the same subsystem in qemu and also now I have
> added myself as a reviewer of patches in this area. I will also be trying to
> support/maintain this new xml conf option in libvirt to the extent I can in
> future with the help of other experienced maintainers. Obviously this is all
> freelance work at this moment and is highly dependent on available free time.
>
>
Since I don't follow qemu-devel closely, I didn't have prior knowledge
of exactly what the options did, and it was unclear in the earlier
versions of your patches that what <acpi-hotplug-bridge enabled='no'/>
did was to disable ACPI hotplug for the entire guest (which on Q35 means
that native PCIe hotplug will be found/used, and on 440fx means that
hotplug won't be possible (unless SHPC hotplugged is enabled)). Your
exaplanation and documentation in this spin of the patches makes that
all clear though, so I'm beyond the "what does this do and do we need
it?" stage to the "are there any problems with the code?" stage, and
that's what I'll try to address in my review of the patches.