+Igor
+Michael
On Thu, 23 Sep 2021, Laine Stump wrote:
On 9/11/21 11:26 PM, Ani Sinha wrote:
> Hi all:
>
> This patchset introduces libvirt xml support for the following two pm conf
> options:
>
> <pm>
> <acpi-hotplug-bridge enabled='no'/>
> <acpi-root-hotplug enabled='yes'/>
> </pm>
(before I get into a more radical discussion about different options - since
we aren't exactly duplicating the QEMU option name anyway, what if we made
these names more consistent, e.g. "acpi-hotplug-bridge" and
"acpi-hotplug-root"?)
yes this is fine. I can swap the two words.
I've thought quite a bit about whether to put these attributes here, or
somewhere else, and I'm still undecided.
My initial reaction to this was "PM == Power Management, and power management
is all about suspend mode support. Hotplug isn't power management." But then
you look at the name of the QEMU option and PM is right there in the name, and
I guess it's *kind of related* (effectively suspending/resuming a single
device), so maybe I'm thinking too narrowly.
So are there alternate places that might fit the purpose of these new options
better, rather than directly mimicking the QEMU option placement (for better
or worse)? A couple alternative possibilities:
1) ****
One possibility would be to include these new flags within the existing <acpi>
subelement of <features>, which is already used to control whether QEMU
exposes ACPI to the guest *at all* (via adding "-no-acpi" to the QEMU
commandline when <acpi> is missing - NB: this feature flag is currently
supported only on x86 and aarch64 QEMU platforms, and ignored for all other
hypervisors).
Possibly the new flags could be put in something like this:
<features>
<acpi>
<hotplug-bridge enabled='no'/>
<hotplug-root enabled='yes'/>
</acpi>
...
</features>
But:
* currently there are no subelements to <acpi>. So this isn't "extending
according to an existing pattern".
* the <features> element uses presence of a subelement to indicate
"enabled" and absence of the subelement to indicate "disabled". But in the
case of these new acpi bridge options we would need to explicitly have the
"enabled='yes/no'" rather than just using presence of the option to mean
"enabled" and absence to mean "disabled", because the default for
"root-hotplug" up until now has been *enabled*, and the default for
hotplug-bridge is different depending on machinetype. We need to continue
working properly (and identically) with old/existing XML, but if we didn't
have an "enabled" attribute for these new flags, there would be no way to tell
the difference between "not specified" and "disabled", and so no way to
disable the feature for a QEMU where the default was "enabled". (Why does this
matter? Because I don't like the inconsistency that would arise from some
feature flags using absence to mean "disabled" and some using it to mean "use
the default".)
* Having something in <features> in the domain XML kind of implies that the
associated capability flags should be represented in the <features> section of
the domain capabilities. For example, <acpi/> is listed under <features> in
the output of virsh capabilities, separately from the flag indicating presence
of the -no-acpi option. I'm not sure if we would need to add something there
for these options if we moved them into <features> (seems a bit redundant to
me to have it in both places, but I'm sure there are $reasons).
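To make the "not specified" vs. "disabled" distinction above concrete, here is a minimal Python sketch. It is not libvirt code: the function name read_acpi_flag and the use of None to mean "use the hypervisor default" are assumptions made purely for illustration, and it reads the hypothetical layout from option 1.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def read_acpi_flag(domain_xml: str, name: str) -> Optional[bool]:
    """Read <features><acpi><NAME enabled='yes|no'/> from a domain XML string.

    Returns True or False when the element is present, and None when it is
    absent, so the caller can tell "disabled" apart from "not specified".
    (Hypothetical helper, not part of libvirt.)
    """
    root = ET.fromstring(domain_xml)
    elem = root.find(f"./features/acpi/{name}")
    if elem is None:
        # Element missing: leave the hypervisor default untouched.
        return None
    value = elem.get("enabled")
    if value not in ("yes", "no"):
        raise ValueError(f"{name}: 'enabled' must be 'yes' or 'no'")
    return value == "yes"


xml = (
    "<domain><features><acpi>"
    "<hotplug-bridge enabled='no'/>"
    "</acpi></features></domain>"
)
bridge = read_acpi_flag(xml, "hotplug-bridge")  # explicitly disabled
root_bus = read_acpi_flag(xml, "hotplug-root")  # not specified
```

With only presence/absence semantics, the second lookup could not be distinguished from an explicit "no", which is the problem described in the bullet above.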
2) *****
Alternately, there is an <acpi> subelement of <os>, which is currently used to
add a SLIC table (some sort of software license table, which I'd never heard
of before) using QEMU's -acpitable commandline option. It is also used somehow
by the Xen driver.
<os>
<acpi>
<table type='slic'>/path/to/slic.dat</table>
<hotplug-bridge enabled='no'/>
<hotplug-root enabled='yes'/>
</acpi>
...
</os>
My problem with adding these new PCI controller acpi options to os/acpi is
simply that it's in the <os> subelement, which is claimed elsewhere to be
intended for OS boot options, and is used for things like specifying the path
to a kernel / initrd to boot from.
3) ****
A third option, suggested somewhere by Ani, would be to make a completely new
top-level element, called something like <acpiHotplug> that would have
separate attributes for the two flags, e.g.:
<acpiHotplug bridge='yes' root='yes'/>
I dislike new toplevel options because they just seem so adhoc, as if the XML
namespace is a cluttered, disorganized room. That reminds me too much of my
own workspace, which is just... depressing.
****
Since I always seem to spend *way too much time* worrying about naming, only
to have it come out wrong in the end anyway, I'm looking for some other
opinions. Counting the version that is in Ani's patch currently as option "0",
which option do you all think is the best? Or is it completely unimportant?
My preference is obviously options #0 and #3. However, community
opinion/perspective is certainly required here.
> The above two options are only available for qemu driver and that too for
> x86 guests only. Both of them are global options.
>
> ``acpi-hotplug-bridge`` option enables or disables ACPI hotplug support for
> cold plugged bridges. Examples of cold plugged bridges include PCI-PCI bridge
> (pci-bridge controller) for pc machines and pcie-root-port controller for
> q35 machines. The corresponding commandline options to qemu for x86 guests are:
The "cold plugged bridges" term here throws me for a loop - it implies that
hotplugging bridges is something that's supported, and I think it still isn't.
Of course this is just the cover letter, so it won't go into git anywhere, but
I think it should be enough to say "enables ACPI hotplug into non-root bus PCI
bridges/ports".
>
> (pc machines): -global PIIX4_PM.acpi-pci-hotplug-with-bridge-support=<off/on>
> (q35 machines): -global ICH9-LPC.acpi-pci-hotplug-with-bridge-support=<off/on>
So I'm curious - if the QEMU commandline also included "-no-acpi" along with
these, what would happen? Would it be silently ignored? Generate an error? Or
does -no-acpi only control the suspend support, and acpi hotplug is still
available?
-no-acpi disables acpi completely for i386 machines. Please see
acpi_setup(), where we bail out if x86_machine_is_acpi_enabled() is false.
So no support for any acpi based hotplug will be available. Those other
options will be ignored.
>
> Being global options, no other bridge specific options for pci-bridge
> controller or pcie-root-port controllers are required. For pc machine type
> in x86, this option has been available in qemu for a long time, since version
> 2.1.
> Please see the changes in qemu.git:
>
> 9e047b982452c6 ("piix4: add acpi pci hotplug support")
Interesting. So how was hotplug handled before this? With SHPC? I know there
must be *some* kind of hotplug support in older QEMU, because RHEL6 QEMU
supported hotplug, and it was based on qemu 0.12 or something ancient like
that...
Good question. I do not know. Maybe imammedo and mst (cc'd) can help
here.
> 133a2da488062e ("pc: acpi: generate AML only for PCI0 devices if PCI bridge
> hotplug is disabled")
>
> For q35 machine type, this was introduced in qemu 6.1 with the following
> changes in qemu.git:
>
> (a) c0e427d6eb5fef ("hw/acpi/ich9: Enable ACPI PCI hot-plug")
> (b) 17858a16950860 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default on
> Q35")
>
> The reasons for enabling ACPI based hotplug for PCIe (q35) based machines
> (as opposed to native hotplug) for bridges are outlined in (b). It is possible
> that some users might still want to use native hotplug on PCIe [1]. Therefore,
> this conf option enables users to choose either ACPI based hotplug or native
> hotplug for cold plugged bridges (for example for pcie root port controller
> in q35 machines).
>
> ``acpi-root-hotplug`` option enables or disables ACPI based hotplug for PCI
> root bus (pci-root controller). This option is only available for pc machine
> type.
> The corresponding commandline option to qemu for x86 guests is:
>
> -global PIIX4_PM.acpi-root-pci-hotplug=<off/on>
>
> This additional option enables users to disable hotplug for all devices in
> the system without adding an additional PCI-PCI bridge, putting the devices
> behind the bridge and using the existing ``acpi-hotplug-bridge`` option to
> disable hotplug on that bridge. This feature was introduced in qemu version
> 5.2 with the following change in qemu.git:
>
> 3d7e78aa7777f ("Introduce a new flag for i440fx to disable PCI hotplug on
> the root bus")
>
> The above qemu commit describes some compelling reasons why users might want
> to disable hotplug on PCI root buses [2].
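As a rough illustration of the mapping described above, the following Python sketch turns the two options into the corresponding QEMU -global arguments. It is only a sketch under stated assumptions, not how libvirt builds its command line: the function name acpi_hotplug_globals is hypothetical, the machine argument is assumed to be just the family name "pc" or "q35" (not a versioned machine type), and None is assumed to mean "emit nothing and keep QEMU's default".

```python
from typing import List, Optional

# QEMU device that carries the bridge hotplug property, per machine family
# (names taken from the -global options quoted above).
BRIDGE_DEVICE = {"pc": "PIIX4_PM", "q35": "ICH9-LPC"}


def acpi_hotplug_globals(
    machine: str,
    bridge: Optional[bool] = None,
    root: Optional[bool] = None,
) -> List[str]:
    """Build -global arguments for the two ACPI hotplug options.

    Hypothetical helper: `machine` is assumed to be "pc" or "q35";
    None for either flag means "do not emit the option".
    """
    if machine not in BRIDGE_DEVICE:
        raise ValueError(f"unsupported machine family: {machine}")

    def onoff(flag: bool) -> str:
        return "on" if flag else "off"

    args: List[str] = []
    if bridge is not None:
        device = BRIDGE_DEVICE[machine]
        args += [
            "-global",
            f"{device}.acpi-pci-hotplug-with-bridge-support={onoff(bridge)}",
        ]
    if root is not None:
        # The root bus option exists only for pc (i440fx) machines.
        if machine != "pc":
            raise ValueError("acpi-root-hotplug is only available for pc machines")
        args += ["-global", f"PIIX4_PM.acpi-root-pci-hotplug={onoff(root)}"]
    return args


q35_args = acpi_hotplug_globals("q35", bridge=False)
pc_args = acpi_hotplug_globals("pc", bridge=True, root=False)
```

Rejecting the root bus option on q35 in the sketch mirrors the statement above that it is only available for the pc machine type.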
>
> A brief summary of the patches:
>
> > [PATCH v3 1/5] qemu: capablities: detect presence of
> > [PATCH v3 2/5] qemu: capablities: detect presence of
> Patches 1 and 2 implement support for qemu capability checks for the above
> config options.
>
> > [PATCH v3 3/5] conf: introduce acpi-hotplug-bridge and
> Patch 3 actually adds the config option to the schema and adds related unit
> tests.
>
> > [PATCH v3 4/5] qemu: command: add support for qemu options that
> Patch 4 adds the backend qemu commandline support for the options. It also
> adds relevant unit tests for the same.
>
> > [PATCH v3 5/5] NEWS: add new acpi pci hotplug options in the release
> Patch 5 adds the release notes for the next libvirt release.
>
>
> Changelog:
> v1: initial implementation. Had some bugs and missed some unit tests.
> v2: fixed bugs and added additional missing unit tests.
> v3: reorganized the patches as per Laine's suggestion. Added more
> details in commit messages. Added conf description in formatdomain.rst.
> Added changelog for next release.
>
> Notes:
>
> [1] One concrete example of why one might still want to use native hotplug
> with pcie-root-port controller is the fact that we are still discovering
> issues with acpi hotplug on PCIE.
Yes, sigh. I recall someone saying something like "if we switch to ACPI
hotplug then all these bugs just go away and everything works" or something
like that. Reality never matches the ideal picture we put in our brains.
At least ACPI hotplug is only the default on new machinetypes (doesn't help
much for management platforms that always just use "q35" every time they start
a guest). And it can also cause problems with distro-specific machinetypes in
downstream distros when they are rebased:
https://bugzilla.redhat.com/2006409
Oh wow, what a tangled web! Yes, during the transition we might see some
more issues until things get stable.
> One such issue is:
>
> https://lists.gnu.org/archive/html/qemu-devel/2021-09/msg02146.html
> Another reason is that users have been using native hotplug on pcie root
> ports up until now. They have built and tested their systems based on native
> hotplug. They may not want to suddenly move to acpi based hotplug just
> because it is now the default in qemu. Supporting the option to choose one or
> the other through libvirt makes things simpler for end users.
>
> [2] The use case scenario described by Laine in
>
> https://listman.redhat.com/archives/libvir-list/2020-February/msg00110.html
> intentionally does not discuss i440fx and focusses solely on q35. I do
> realize that redhat has moved on from i440fx and currently efforts for new
> features are concentrated on q35 machines only. We have had some hard debates
> on this on the qemu mailing list before. The fact of the matter is that i440fx
> is not at 1-1 parity with q35. There are many users who are currently using
> i440fx and are simply not ready to move to q35 without sacrificing some
> existing features they support today. For example
>
> https://wiki.qemu.org/images/4/4e/Q35.pdf lists some of q35 limitations.
To be fair, aside from "support for Win2000/WinXP", none of the items on the
"limitations" page of that slide deck is something that's impossible to do
with a Q35 machinetype; it's just that accomplishing some things may be more
complicated. But I understand your point. Mainly I brought it up because I
wanted to be sure that we're adding these to fulfill an actual need, rather
than just adding bulk for the sake of completeness, or to satisfy curiosity.
Makes sense.
>
> https://www.linux-kvm.org/images/0/06/2012-forum-Q35.pdf provides more
> information on the differences. Hence we need to solve the issue Laine has
> described in the above email for i440fx without adding additional bridges.
>
> Further, in Daniel Berrange's words from:
>
> https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg03012.html
>
> "From the upstream POV, there's been no decision / agreement to phase
> out PIIX, this is purely a RHEL downstream decision & plan. If other
> distros / users have a different POV, and find the feature useful, we
> should accept the patch if it meets the normal QEMU patch requirements.
> "
>
> Also to be noted that I have already experimented with this qemu commandline
> option using libvirt passthrough feature as has been documented in
>
> http://blog.vmsplice.net/2011/04/how-to-pass-qemu-command-line-options.html
> This was only meant to be a short term solution until libvirt started
> supporting this natively. Supporting this option through libvirt would
> simplify their use case as well as add capability validations
> and graceful failure scenarios in case qemu did not support the option.
>
> [3] Finally, I implemented support for ``acpi-root-hotplug`` option in Qemu.
> Since adding the support for this option, I have not run away :-) I am still
> around, fixing other issues in the same subsystem in qemu and also now I have
> added myself as a reviewer of patches in this area. I will also be trying to
> support/maintain this new xml conf option in libvirt to the extent I can in
> future with the help of other experienced maintainers. Obviously this is all
> freelance work at this moment and is highly dependent on available free time.
>
>
Since I don't follow qemu-devel closely, I didn't have prior knowledge of
exactly what the options did, and it was unclear in the earlier versions of
your patches that what <acpi-hotplug-bridge enabled='no'/> did was to disable
ACPI hotplug for the entire guest (which on Q35 means that native PCIe hotplug
will be found/used, and on 440fx means that hotplug won't be possible (unless
SHPC hotplug is enabled)). Your explanation and documentation in this spin
of the patches makes that all clear though, so I'm beyond the "what does this
do and do we need it?" stage to the "are there any problems with the code?"
stage, and that's what I'll try to address in my review of the patches.
Sounds good.