Re: [PATCH v3 0/5] Add support for two i386 pm options which control acpi hotplug

24 Sep 2021


+Igor
+Michael

On Thu, 23 Sep 2021, Laine Stump wrote:
...
On 9/11/21 11:26 PM, Ani Sinha wrote:
...
Hi all:
This patchset introduces libvirt xml support for the following two pm conf
options:
<pm>
   <acpi-hotplug-bridge enabled='no'/>
   <acpi-root-hotplug enabled='yes'/>
</pm>
(before I get into a more radical discussion about different options - since
we aren't exactly duplicating the QEMU option name anyway, what if we made
these names more consistent, e.g. "acpi-hotplug-bridge" and
"acpi-hotplug-root"?)
Yes, this is fine. I can swap the two words.
...
I've thought quite a bit about whether to put these attributes here, or
somewhere else, and I'm still undecided.
My initial reaction to this was "PM == Power Management, and power management
is all about suspend mode support. Hotplug isn't power management." But then
you look at the name of the QEMU option and PM is right there in the name, and
I guess it's *kind of related* (effectively suspending/resuming a single
device), so maybe I'm thinking too narrowly.
So are there alternate places that might fit the purpose of these new options
better, rather than directly mimicking the QEMU option placement (for better
or worse)? A couple alternative possibilities:
1) ****
One possibility would be to include these new flags within the existing <acpi>
subelement of <features>, which is already used to control whether ACPI is
exposed to the guest *at all* (via adding "-no-acpi" to the QEMU
commandline when <acpi> is missing - NB: this feature flag is currently
supported only on x86 and aarch64 QEMU platforms, and ignored for all other
hypervisors).
Possibly the new flags could be put in something like this:
<features>
  <acpi>
    <hotplug-bridge enabled='no'/>
    <hotplug-root enabled='yes'/>
  </acpi>
  ...
</features>
But:
* currently there are no subelements to <acpi>. So this isn't "extending
according to an existing pattern".
* the <features> element uses presence of a subelement to indicate "enabled"
and absence of the subelement to indicate "disabled", but in the case of these
new ACPI bridge options we would need to explicitly have "enabled='yes/no'"
rather than just using presence of the option to mean "enabled" and absence to
mean "disabled". This is because the default for "root-hotplug" up until now
has been *enabled*, and the default for hotplug-bridge is different depending
on machinetype. We need to continue working properly (and identically) with
old/existing XML, but if we didn't have an "enabled" attribute for these new
flags, there would be no way to tell the difference between "not specified"
and "disabled", and so no way to disable the feature for a QEMU where the
default was "enabled". (Why does this matter? Because I don't like the
inconsistency that would arise from some feature flags using absence to mean
"disabled" and some using it to mean "use the default".)
* Having something in <features> in the domain XML kind of implies that the
associated capability flags should be represented in the <features> section of
the domain capabilities. For example, <acpi/> is listed under <features> in
the output of virsh capabilities, separately from the flag indicating presence
of the -no-acpi option. I'm not sure if we would need to add something there
for these options if we moved them into <features> (seems a bit redundant to
me to have it in both places, but I'm sure there are $reasons).
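The second point above comes down to distinguishing three states rather than two. As a minimal sketch (in Python for brevity; the parse_tristate() helper is hypothetical and is not libvirt's parser, which is written in C), absent / enabled='yes' / enabled='no' can map to None / True / False, so "not specified" stays distinct from "disabled":

```python
import xml.etree.ElementTree as ET
from typing import Optional


def parse_tristate(parent: ET.Element, tag: str) -> Optional[bool]:
    """Hypothetical helper: read <tag enabled='yes|no'/> under parent.

    Returns True/False for an explicit setting, or None when the element
    is absent, meaning "keep the hypervisor's default".
    """
    elem = parent.find(tag)
    if elem is None:
        return None
    value = elem.get("enabled")
    if value == "yes":
        return True
    if value == "no":
        return False
    raise ValueError(f"<{tag}> needs enabled='yes' or enabled='no', got {value!r}")


doc = """<features>
  <acpi>
    <hotplug-bridge enabled='no'/>
  </acpi>
</features>"""
acpi = ET.fromstring(doc).find("acpi")
print(parse_tristate(acpi, "hotplug-bridge"))  # False: explicitly disabled
print(parse_tristate(acpi, "hotplug-root"))    # None: use QEMU's default
```

With a plain presence/absence flag, the second lookup could only return False, which is exactly the ambiguity described above.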
2) *****
Alternately, there is an <acpi> subelement of <os>, which is currently used to
add a SLIC table (some sort of software license table, which I'd never heard
of before) using QEMU's -acpitable commandline option. It is also used somehow
by the Xen driver.
<os>
  <acpi>
    <table type='slic'>/path/to/slic.dat</table>
    <hotplug-bridge enabled='no'/>
    <hotplug-root enabled='yes'/>
  </acpi>
  ...
</os>
My problem with adding these new PCI controller acpi options to os/acpi is
simply that it's in the <os> subelement, which is claimed elsewhere to be
intended for OS boot options, and is used for things like specifying the path
to a kernel / initrd to boot from.
3) ****
A third option, suggested somewhere by Ani, would be to make a completely new
top-level element, called something like <acpiHotplug> that would have
separate attributes for the two flags, e.g.:
<acpiHotplug bridge='yes' root='yes'/>
I dislike new toplevel options because they just seem so ad hoc, as if the XML
namespace is a cluttered, disorganized room. That reminds me too much of my
own workspace, which is just... depressing.
****
Since I always seem to spend *way too much time* worrying about naming, only
to have it come out wrong in the end anyway, I'm looking for some other
opinions. Counting the version that is in Ani's patch currently as option "0",
which option do you all think is the best? Or is it completely unimportant?
My preference is obviously options #0 and #3. However, community
opinion/perspective is certainly required here.
...
...
The above two options are only available for the qemu driver, and only for x86
guests. Both of them are global options.
``acpi-hotplug-bridge`` option enables or disables ACPI hotplug support for cold
plugged bridges. Examples of cold plugged bridges include PCI-PCI bridge
(pci-bridge controller) for pc machines and pcie-root-port controller for q35
machines. The corresponding commandline options to qemu for x86 guests are:
The "cold plugged bridges" term here throws me for a loop - it implies that
hotplugging bridges is something that's supported, and I think it still isn't.
Of course this is just the cover letter, so it won't go into git anywhere, but
I think it should be enough to say "enables ACPI hotplug into non-root bus PCI
bridges/ports".
...
(pc machines): -global PIIX4_PM.acpi-pci-hotplug-with-bridge-support=<off/on>
(q35 machines): -global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=<off/on>
So I'm curious - if the QEMU commandline also included "-no-acpi" along with
these, what would happen? Would it be silently ignored? Generate an error? Or
does -no-acpi only control the suspend support, and acpi hotplug is still
available?
-no-acpi disables ACPI completely on i386 machines. Please see acpi_setup(),
where we bail out if x86_machine_is_acpi_enabled() is false. So no ACPI based
hotplug support will be available, and those other options will be ignored.
...
...
Being global options, no other bridge specific options for pci-bridge
controller or pcie-root-port controllers are required. For the pc machine type
on x86, this option has been available in qemu for a long time, since version
2.1.
Please see the changes in qemu.git:
9e047b982452c6 ("piix4: add acpi pci hotplug support")
Interesting. So how was hotplug handled before this? With SHPC? I know there
must be *some* kind of hotplug support in older QEMU, because RHEL6 QEMU
supported hotplug, and it was based on qemu 0.12 or something ancient like
that...
Good question. I do not know. Maybe imammedo and mst (cc'd) can help here.
...
...
133a2da488062e ("pc: acpi: generate AML only for PCI0 devices if PCI bridge
hotplug is disabled")
For q35 machine type, this was introduced in qemu 6.1 with the following
changes in qemu.git:
(a) c0e427d6eb5fef ("hw/acpi/ich9: Enable ACPI PCI hot-plug")
(b) 17858a16950860 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default on Q35")
The reasons for enabling ACPI based hotplug for PCIe (q35) based machines (as
opposed to native hotplug) for bridges are outlined in (b). It is possible that
some users might still want to use native hotplug on PCIe [1]. Therefore, this
conf option enables users to choose either ACPI based hotplug or native hotplug
for cold plugged bridges (for example for pcie root port controller in q35
machines).
``acpi-root-hotplug`` option enables or disables ACPI based hotplug for PCI root
bus (pci-root controller). This option is only available for pc machine type.
The corresponding commandline option to qemu for x86 guests is:
-global PIIX4_PM.acpi-root-pci-hotplug=<off/on>
This additional option enables users to disable hotplug for all devices in the
system without adding an additional PCI-PCI bridge, putting the devices behind
the bridge and using the existing ``acpi-hotplug-bridge`` option to disable
hotplug on that bridge. This feature was introduced in qemu version 5.2 with
the following change in qemu.git:
3d7e78aa7777f ("Introduce a new flag for i440fx to disable PCI hotplug on the
root bus")
The above qemu commit describes some compelling reasons why users might want to
disable hotplug on PCI root buses [2].
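To make the mapping described above concrete, here is a rough sketch of how the two settings could be translated into QEMU -global arguments. It is written in Python for brevity and is not the code in patch 4. The hotplug_globals() name, the simplified "pc"/"q35" machine families (real machine types are versioned, e.g. pc-i440fx-6.1) and the choice to raise an error for unsupported combinations are all assumptions; the QEMU-version capability checks from patches 1 and 2 are left out. A None value means "not specified", so nothing is emitted and QEMU's default applies.

```python
from typing import List, Optional

# QEMU -global properties quoted in this thread, keyed by a simplified
# machine family (assumption: real machine types are versioned).
BRIDGE_PROP = {
    "pc": "PIIX4_PM.acpi-pci-hotplug-with-bridge-support",
    "q35": "ICH9-LPC.acpi-pci-hotplug-with-bridge-support",
}
ROOT_PROP = {
    "pc": "PIIX4_PM.acpi-root-pci-hotplug",  # pc (i440fx) only
}


def onoff(value: bool) -> str:
    return "on" if value else "off"


def hotplug_globals(machine: str,
                    bridge: Optional[bool],
                    root: Optional[bool]) -> List[str]:
    """Hypothetical translation of the two settings into QEMU arguments.

    None means "not specified": emit nothing and keep QEMU's default.
    """
    args: List[str] = []
    if bridge is not None:
        if machine not in BRIDGE_PROP:
            raise ValueError(f"acpi-hotplug-bridge unsupported on {machine!r}")
        args += ["-global", f"{BRIDGE_PROP[machine]}={onoff(bridge)}"]
    if root is not None:
        if machine not in ROOT_PROP:
            raise ValueError(f"acpi-root-hotplug unsupported on {machine!r}")
        args += ["-global", f"{ROOT_PROP[machine]}={onoff(root)}"]
    return args


print(hotplug_globals("q35", False, None))
# ['-global', 'ICH9-LPC.acpi-pci-hotplug-with-bridge-support=off']
```

Per the reply about -no-acpi earlier in this thread, these properties have no effect when ACPI is disabled entirely, so a real implementation might also want to reject them in that case.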
A brief summary of the patches:
...
[PATCH v3 1/5] qemu: capablities: detect presence of
[PATCH v3 2/5] qemu: capablities: detect presence of
Patches 1 and 2 implement support for qemu capability checks for the above
config options.
...
[PATCH v3 3/5] conf: introduce acpi-hotplug-bridge and
Patch 3 actually adds the config option to the schema and adds related unit
tests.
...
[PATCH v3 4/5] qemu: command: add support for qemu options that
Patch 4 adds the backend qemu commandline support for the options. It also adds
relevant unit tests for the same.
...
[PATCH v3 5/5] NEWS: add new acpi pci hotplug options in the release
Patch 5 adds the release notes for the next libvirt release.
Changelog:
v1: initial implementation. Had some bugs and missed some unit tests.
v2: fixed bugs and added additional missing unit tests.
v3: reorganized the patches as per Laine's suggestion. Added more
     details in commit messages. Added conf description in formatdomain.rst.
     Added changelog for next release.
Notes:
[1] One concrete example of why one might still want to use native hotplug with
pcie-root-port controller is the fact that we are still discovering issues with
ACPI hotplug on PCIe.
Yes, sigh. I recall someone saying something like "if we switch to ACPI
hotplug then all these bugs just go away and everything works" or something
like that. Reality never matches the ideal picture we put in our brains.
At least ACPI hotplug is only the default on new machinetypes (doesn't help
much for management platforms that always just use "q35" every time they start
a guest). And it can also cause problems with distro-specific machinetypes in
downstream distros when they are rebased: https://bugzilla.redhat.com/2006409
Oh wow, what a tangled web! Yes, during the transition we might see some
more issues until things get stable.
...
...
One such issue is:
https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg02146.html
Another reason is that users have been using native hotplug on pcie root ports
up until now. They have built and tested their systems based on native hotplug.
They may not want to suddenly move to ACPI based hotplug just because it is now
the default in qemu. Supporting the option to choose one or the other through
libvirt makes things simpler for end users.
[2] The use case scenario described by Laine in
https://listman.redhat.com/archives/libvir-list/2020-February/msg00110.html
intentionally does not discuss i440fx and focuses solely on q35. I do realize
that Red Hat has moved on from i440fx and that current efforts on new features
are concentrated on q35 machines only. We have had some hard debates on this
on the qemu mailing list before. The fact of the matter is that i440fx is not
at 1-1 parity with q35. There are many users who are currently using i440fx
and are simply not ready to move to q35 without sacrificing some existing
features they support today. For example,
https://wiki.qemu.org/images/4/4e/Q35.pdf lists some of the q35 limitations.
To be fair, aside from "support for Win2000/WinXP", none of the items on the
"limitations" page of that slide deck is something that's impossible to do
with a Q35 machinetype; it's just that accomplishing some things may be more
complicated. But I understand your point. Mainly I brought it up because I
wanted to be sure that we're adding these to fulfill an actual need, rather
than just adding bulk for the sake of completeness, or to satisfy curiosity.
Makes sense.
...
...
https://www.linux-kvm.org/images/0/06/2012-forum-Q35.pdf provides more
information on the differences. Hence we need to solve the issue Laine has
described in the above email for i440fx without adding additional bridges.
Further, in Daniel Berrange's words from:
https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg03012.html
"From the upstream POV, there's been no decision / agreement to phase
out PIIX, this is purely a RHEL downstream decision & plan. If other
distros / users have a different POV, and find the feature useful, we
should accept the patch if it meets the normal QEMU patch requirements.
"
Also note that I have already experimented with this qemu commandline option
using libvirt's passthrough feature, as documented in
http://blog.vmsplice.net/2011/04/how-to-pass-qemu-command-line-options.html
This was only meant to be a short term solution until libvirt started
supporting this natively. Supporting this option through libvirt would
simplify such use cases as well as add capability validation and graceful
failure in case qemu does not support the option.
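For comparison, the passthrough workaround mentioned above boils down to adding raw arguments under libvirt's QEMU namespace. A minimal Python sketch that builds such a fragment: the passthrough_fragment() helper is hypothetical, while the namespace URI and the <qemu:commandline>/<qemu:arg value='...'/> elements are the ones libvirt documents for this purpose. Nothing in such a fragment is validated against QEMU's capabilities, which is the gap native support would close.

```python
import xml.etree.ElementTree as ET
from typing import List

# libvirt's QEMU passthrough namespace. In libvirt's documentation it is
# declared on the top-level <domain> element of a full definition.
QEMU_NS = "http://libvirt.org/schemas/domain/qemu/1.0"
ET.register_namespace("qemu", QEMU_NS)


def passthrough_fragment(args: List[str]) -> str:
    """Hypothetical helper: wrap raw QEMU arguments in <qemu:commandline>."""
    cmdline = ET.Element(f"{{{QEMU_NS}}}commandline")
    for arg in args:
        # Each argument becomes its own <qemu:arg value='...'/> element.
        ET.SubElement(cmdline, f"{{{QEMU_NS}}}arg", value=arg)
    return ET.tostring(cmdline, encoding="unicode")


print(passthrough_fragment(["-global", "PIIX4_PM.acpi-root-pci-hotplug=off"]))
```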
[3] Finally, I implemented support for the ``acpi-root-hotplug`` option in QEMU.
Since adding the support for this option, I have not run away :-) I am still
around, fixing other issues in the same subsystem in qemu, and I have now also
added myself as a reviewer of patches in this area. I will also try to
support/maintain this new xml conf option in libvirt to the extent I can in
future with the help of other experienced maintainers. Obviously this is all
freelance work at this moment and is highly dependent on available free time.
Since I don't follow qemu-devel closely, I didn't have prior knowledge of
exactly what the options did, and it was unclear in the earlier versions of
your patches that what <acpi-hotplug-bridge enabled='no'/> did was to disable
ACPI hotplug for the entire guest (which on Q35 means that native PCIe hotplug
will be found/used, and on 440fx means that hotplug won't be possible (unless
SHPC hotplug is enabled)). Your explanation and documentation in this spin
of the patches makes that all clear though, so I'm beyond the "what does this
do and do we need it?" stage to the "are there any problems with the code?"
stage, and that's what I'll try to address in my review of the patches.
Sounds good.