On 9/11/21 11:26 PM, Ani Sinha wrote:
Hi all:
This patchset introduces libvirt xml support for the following two pm conf
options:
<pm>
<acpi-hotplug-bridge enabled='no'/>
<acpi-root-hotplug enabled='yes'/>
</pm>
(before I get into a more radical discussion about different options -
since we aren't exactly duplicating the QEMU option name anyway, what if
we made these names more consistent, e.g. "acpi-hotplug-bridge" and
"acpi-hotplug-root"?)
I've thought quite a bit about whether to put these attributes here, or
somewhere else, and I'm still undecided.
My initial reaction to this was "PM == Power Management, and power
management is all about suspend mode support. Hotplug isn't power
management." But then you look at the name of the QEMU option and PM is
right there in the name, and I guess it's *kind of related* (effectively
suspending/resuming a single device), so maybe I'm thinking too narrowly.
So are there alternate places that might fit the purpose of these new
options better, rather than directly mimicking the QEMU option placement
(for better or worse)? A couple alternative possibilities:
1) ****
One possibility would be to include these new flags within the existing
<acpi> subelement of <features>, which is already used to control
whether the hypervisor exposes ACPI to the guest *at all* (via adding
"-no-acpi" to the QEMU commandline when <acpi> is missing - NB: this
feature flag is currently supported only on x86 and aarch64 QEMU
platforms, and ignored for all other hypervisors).
Possibly the new flags could be put in something like this:
<features>
<acpi>
<hotplug-bridge enabled='no'/>
<hotplug-root enabled='yes'/>
</acpi>
...
</features>
But:
* currently there are no subelements to <acpi>. So this isn't "extending
according to an existing pattern".
* the <features> element normally uses presence of a subelement to
indicate "enabled" and absence of the subelement to indicate
"disabled".
But in the case of these new acpi bridge options we would need to
explicitly have the "enabled='yes/no'" rather than just using presence
of the option to mean "enabled" and absence to mean "disabled" because
the default for "root-hotplug" up until now has been *enabled*, and the
default for hotplug-bridge is different depending on machinetype. We
need to continue working properly (and identically) with old/existing
XML, but if we didn't have an "enabled" attribute for these new flags,
there would be no way to tell the difference between "not specified" and
"disabled", and so no way to disable the feature for a QEMU where the
default was "enabled". (Why does this matter? Because I don't like the
inconsistency that would arise from some feature flags using absence to
mean "disabled" and some using it to mean "use the default".)
* Having something in <features> in the domain XML kind of implies that
the associated capability flags should be represented in the <features>
section of the domain capabilities. For example, <acpi/> is listed under
<features> in the output of virsh capabilities, separately from the flag
indicating presence of the -no-acpi option. I'm not sure if we would
need to add something there for these options if we moved them into
<features> (seems a bit redundant to me to have it in both places, but
I'm sure there are $reasons).
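To make the "not specified" vs. "disabled" distinction above concrete, here is a minimal sketch in Python (not libvirt's actual C parser). The helper name read_enabled and the sample documents are made up for illustration; only the element and attribute names come from the XML in this thread.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def read_enabled(domain_xml: str, path: str) -> Optional[bool]:
    """Hypothetical helper: map enabled='yes'/'no' to True/False.

    Returns None when the element is absent, so the caller can tell
    "not specified" (keep the machine type default) from "disabled".
    """
    root = ET.fromstring(domain_xml)
    elem = root.find(path)
    if elem is None:
        return None  # old/existing XML: leave the QEMU default alone
    value = elem.get("enabled")
    if value not in ("yes", "no"):
        raise ValueError(f"{path}: enabled must be 'yes' or 'no', got {value!r}")
    return value == "yes"


# Illustrative documents; real domain XML has many more elements.
OLD_XML = "<domain><pm/></domain>"
NEW_XML = """
<domain>
  <pm>
    <acpi-hotplug-bridge enabled='no'/>
    <acpi-root-hotplug enabled='yes'/>
  </pm>
</domain>
"""

print(read_enabled(OLD_XML, "pm/acpi-hotplug-bridge"))
print(read_enabled(NEW_XML, "pm/acpi-hotplug-bridge"))
print(read_enabled(NEW_XML, "pm/acpi-root-hotplug"))
```

With a presence-only flag the first two cases would be indistinguishable, which is exactly the problem described above.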
2) *****
Alternately, there is an <acpi> subelement of <os>, which is currently
used to add a SLIC table (some sort of software license table, which I'd
never heard of before) using QEMU's -acpitable commandline option. It is
also used somehow by the Xen driver.
<os>
<acpi>
<table type='slic'>/path/to/slic.dat</table>
<hotplug-bridge enabled='no'/>
<hotplug-root enabled='yes'/>
</acpi>
...
</os>
My problem with adding these new PCI controller acpi options to os/acpi
is simply that it's in the <os> subelement, which is claimed elsewhere
to be intended for OS boot options, and is used for things like
specifying the path to a kernel / initrd to boot from.
3) ****
A third option, suggested somewhere by Ani, would be to make a
completely new top-level element, called something like <acpiHotplug>
that would have separate attributes for the two flags, e.g.:
<acpiHotplug bridge='yes' root='yes'/>
I dislike new toplevel options because they just seem so ad hoc, as if
the XML namespace is a cluttered, disorganized room. That reminds me too
much of my own workspace, which is just... depressing.
****
Since I always seem to spend *way too much time* worrying about naming,
only to have it come out wrong in the end anyway, I'm looking for some
other opinions. Counting the version that is in Ani's patch currently as
option "0", which option do you all think is the best? Or is it
completely unimportant?
The above two options are only available for qemu driver and that too
for x86 guests only. Both of them are global options.
``acpi-hotplug-bridge`` option enables or disables ACPI hotplug support for cold
plugged bridges. Examples of cold plugged bridges include PCI-PCI bridge
(pci-bridge controller) for pc machines and pcie-root-port controller for q35
machines. The corresponding commandline options to qemu for x86 guests are:
The "cold plugged bridges" term here throws me for a loop - it implies
that hotplugging bridges is something that's supported, and I think it
still isn't. Of course this is just the cover letter, so it won't go
into git anywhere, but I think it should be enough to say "enables ACPI
hotplug into non-root bus PCI bridges/ports".
(pc machines): -global PIIX4_PM.acpi-pci-hotplug-with-bridge-support=<off/on>
(q35 machines): -global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=<off/on>
So I'm curious - if the QEMU commandline also included "-no-acpi" along
with these, what would happen? Would it be silently ignored? Generate an
error? Or does -no-acpi only control the suspend support, and acpi
hotplug is still available?
Being global options, no other bridge specific options for pci-bridge
controller or pcie-root-port controllers are required. For pc machine type in
x86, this option has been available in qemu for a long time, since version 2.1.
Please see the changes in qemu.git:
9e047b982452c6 ("piix4: add acpi pci hotplug support")
Interesting. So how was hotplug handled before this? With SHPC? I know
there must be *some* kind of hotplug support in older QEMU, because
RHEL6 QEMU supported hotplug, and it was based on qemu 0.12 or something
ancient like that...
133a2da488062e ("pc: acpi: generate AML only for PCI0 devices if
PCI bridge hotplug is disabled")
For q35 machine type, this was introduced in qemu 6.1 with the following
changes in qemu.git:
(a) c0e427d6eb5fef ("hw/acpi/ich9: Enable ACPI PCI hot-plug")
(b) 17858a16950860 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default on Q35")
The reasons for enabling ACPI based hotplug for PCIe (q35) based machines (as
opposed to native hotplug) for bridges are outlined in (b). It is possible that
some users might still want to use native hotplug on PCIe [1]. Therefore,
this conf option enables users to choose either ACPI based hotplug or native
hotplug for cold plugged bridges (for example for pcie root port controller
in q35 machines).
``acpi-root-hotplug`` option enables or disables ACPI based hotplug for PCI root
bus (pci-root controller). This option is only available for pc machine type.
The corresponding commandline option to qemu for x86 guests is:
-global PIIX4_PM.acpi-root-pci-hotplug=<off/on>
This additional option enables users to disable hotplug for all devices in the
system without adding an additional PCI-PCI bridge, putting the devices behind
the bridge and using the existing ``acpi-hotplug-bridge`` option to disable
hotplug on that bridge. This feature was introduced in qemu version 5.2 with
the following change in qemu.git:
3d7e78aa7777f ("Introduce a new flag for i440fx to disable PCI hotplug on the root
bus")
The above qemu commit describes some compelling reasons why users might want to
disable hotplug on PCI root buses [2].
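As a rough illustration of how the two options map onto the -global arguments quoted above, here is a small Python sketch. It is not the code from the patches: the function name hotplug_globals and the simplified machine names "pc" and "q35" are assumptions made for the example, and None stands for "option not present in the XML".

```python
from typing import List, Optional

# QEMU devices carrying the properties, as in the -global lines quoted above.
PM_DEVICE = {"pc": "PIIX4_PM", "q35": "ICH9-LPC"}


def on_off(flag: bool) -> str:
    return "on" if flag else "off"


def hotplug_globals(machine: str,
                    bridge: Optional[bool],
                    root: Optional[bool]) -> List[str]:
    """Hypothetical helper: build -global arguments for the two options.

    An option left as None adds nothing, so QEMU keeps its own default.
    """
    if machine not in PM_DEVICE:
        raise ValueError(f"unsupported machine type: {machine}")
    args: List[str] = []
    if bridge is not None:
        prop = "acpi-pci-hotplug-with-bridge-support"
        args += ["-global", f"{PM_DEVICE[machine]}.{prop}={on_off(bridge)}"]
    if root is not None:
        # The root bus option exists only for pc machine types.
        if machine != "pc":
            raise ValueError("acpi-root-hotplug is only available for pc machines")
        args += ["-global", f"PIIX4_PM.acpi-root-pci-hotplug={on_off(root)}"]
    return args


print(hotplug_globals("q35", bridge=False, root=None))
print(hotplug_globals("pc", bridge=True, root=False))
```

Rejecting the root bus option on q35 in the sketch mirrors the statement that it is only available for pc machine types.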
A brief summary of the patches:
> [PATCH v3 1/5] qemu: capablities: detect presence of
> [PATCH v3 2/5] qemu: capablities: detect presence of
Patches 1 and 2 implement support for qemu capability checks for the above
config options.
> [PATCH v3 3/5] conf: introduce acpi-hotplug-bridge and
Patch 3 actually adds the config option to the schema and adds related unit
tests.
> [PATCH v3 4/5] qemu: command: add support for qemu options that
Patch 4 adds the backend qemu commandline support for the options. It also adds
relevant unit tests for the same.
> [PATCH v3 5/5] NEWS: add new acpi pci hotplug options in the release
Patch 5 adds the release notes for the next libvirt release.
Changelog:
v1: initial implementation. Had some bugs and missed some unit tests.
v2: fixed bugs and added additional missing unit tests.
v3: reorganized the patches as per Laine's suggestion. Added more
details in commit messages. Added conf description in formatdomain.rst.
Added changelog for next release.
Notes:
[1] One concrete example of why one might still want to use native hotplug with
pcie-root-port controller is the fact that we are still discovering issues with
acpi hotplug on PCIE.
Yes, sigh. I recall someone saying something like "if we switch to ACPI
hotplug then all these bugs just go away and everything works" or
something like that. Reality never matches the ideal picture we put in
our brains.
At least ACPI hotplug is only the default on new machinetypes (doesn't
help much for management platforms that always just use "q35" every time
they start a guest). And it can also cause problems with distro-specific
machinetypes in downstream distros when they are rebased:
https://bugzilla.redhat.com/2006409
One such issue is:
https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg02146.html
Another reason is that users have been using native hotplug on pcie root ports
up until now. They have built and tested their systems based on native hotplug.
They may not want to suddenly move to acpi based hotplug just because it is now
the default in qemu. Supporting the option to choose one or the other through
libvirt makes things simpler for end users.
[2] The use case scenario described by Laine in
https://listman.redhat.com/archives/libvir-list/2020-February/msg00110.html
intentionally does not discuss i440fx and focusses solely on q35. I do realize
that redhat has moved on from i440fx and currently efforts for new features
are concentrated on q35 machines only. We have had some hard debates on this
on the qemu mailing list before. The fact of the matter is that i440fx is
not at 1-1 parity with q35. There are many users who are currently using i440fx
and are simply not ready to move to q35 without sacrificing some
existing features they support today. For example
https://wiki.qemu.org/images/4/4e/Q35.pdf lists some of q35 limitations.
To be fair, aside from "support for Win2000/WinXP", none of the items on
the "limitations" page of that slide deck is something that's impossible
to do with a Q35 machinetype; it's just that accomplishing some things
may be more complicated. But I understand your point. Mainly I brought
it up because I wanted to be sure that we're adding these to fulfill an
actual need, rather than just adding bulk for the sake of completeness,
or to satisfy curiosity.
https://www.linux-kvm.org/images/0/06/2012-forum-Q35.pdf provides more
information on the differences. Hence we need to solve the issue Laine has
described in the above email for i440fx without adding additional bridges.
Further, in Daniel Berrange's words from:
https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg03012.html
"From the upstream POV, there's been no decision / agreement to phase
out PIIX, this is purely a RHEL downstream decision & plan. If other
distros / users have a different POV, and find the feature useful, we
should accept the patch if it meets the normal QEMU patch requirements."
Also note that I have already experimented with this qemu commandline option
using the libvirt passthrough feature, as documented in
http://blog.vmsplice.net/2011/04/how-to-pass-qemu-command-line-options.html
This was only meant to be a short term solution until libvirt started
supporting this natively. Supporting this option through libvirt would simplify
their use case as well as add capability validations
and graceful failure scenarios in case qemu did not support the option.
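The kind of capability validation meant here could look roughly like the Python sketch below. It only illustrates the idea and is not how libvirt's capability code is written: the names UnsupportedOption and check_supported are made up, and the "probed" set is assumed to have been filled in by querying the QEMU binary beforehand.

```python
from typing import Iterable, Set


class UnsupportedOption(Exception):
    """Raised when the XML asks for a property the QEMU binary lacks."""


def check_supported(requested: Iterable[str], probed: Set[str]) -> None:
    """Hypothetical check: fail early with a clear message.

    Without it, an unknown -global property would only surface as a
    QEMU startup error.
    """
    missing = sorted(set(requested) - probed)
    if missing:
        raise UnsupportedOption(
            "QEMU binary does not support: " + ", ".join(missing))


# Assumed probe result for an older QEMU without the root bus flag.
probed = {"PIIX4_PM.acpi-pci-hotplug-with-bridge-support"}

check_supported(["PIIX4_PM.acpi-pci-hotplug-with-bridge-support"], probed)
try:
    check_supported(["PIIX4_PM.acpi-root-pci-hotplug"], probed)
except UnsupportedOption as err:
    print(err)
```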
[3] Finally, I implemented support for ``acpi-root-hotplug`` option in Qemu.
Since adding the support for this option, I have not run away :-) I am still
around, fixing other issues in the same subsystem in qemu and also now I have
added myself as a reviewer of patches in this area. I will also be trying to
support/maintain this new xml conf option in libvirt to the extent I can in
future with the help of other experienced maintainers. Obviously this is all
freelance work at this moment and is highly dependent on available free time.
Since I don't follow qemu-devel closely, I didn't have prior knowledge
of exactly what the options did, and it was unclear in the earlier
versions of your patches that what <acpi-hotplug-bridge enabled='no'/>
did was to disable ACPI hotplug for the entire guest (which on Q35 means
that native PCIe hotplug will be found/used, and on 440fx means that
hotplug won't be possible (unless SHPC hotplug is enabled)). Your
explanation and documentation in this spin of the patches make that
all clear though, so I'm beyond the "what does this do and do we need
it?" stage to the "are there any problems with the code?" stage, and
that's what I'll try to address in my review of the patches.