Cc Paolo for x86 topology part
Hi Daniel,
On Mon, May 13, 2024 at 01:33:57PM +0100, Daniel P. Berrangé wrote:
Date: Mon, 13 May 2024 13:33:57 +0100
From: "Daniel P. Berrangé" <berrange(a)redhat.com>
Subject: [PATCH 1/2] hw/core: allow parameter=1 for SMP topology on any
machine
This effectively reverts
commit 54c4ea8f3ae614054079395842128a856a73dbf9
Author: Zhao Liu <zhao1.liu(a)intel.com>
Date: Sat Mar 9 00:01:37 2024 +0800
hw/core/machine-smp: Deprecate unsupported "parameter=1" SMP
configurations
but is not done as a 'git revert' since the part of the changes to the
file hw/core/machine-smp.c which add 'has_XXX' checks remain desirable.
Furthermore, we have to tweak the subsequently added unit test to
account for differing warning message.
The rationale for the original deprecation was:
"Currently, it was allowed for users to specify the unsupported
topology parameter as "1". For example, x86 PC machine doesn't
support drawer/book/cluster topology levels, but user could specify
"-smp drawers=1,books=1,clusters=1".
This is meaningless and confusing, so that the support for this kind
of configurations is marked deprecated since 9.0."
There are varying POVs on the topic of 'unsupported' topology levels.
It is common to say that on a system without hyperthreading, that there
is always 1 thread. Likewise when new CPUs introduced a concept of
multiple "dies', it was reasonable to say that all historical CPUs
before that implicitly had 1 'die'. Likewise for the more recently
introduced 'modules' and 'clusters' parameter'. From this POV, it is
valid to set 'parameter=1' on the -smp command line for any machine,
only a value > 1 is strictly an error condition.
Currently QEMU has become more and more difficult to maintain a general
topology hierarchy, there are two recent examples:
1. as you mentioned "module" v.s. "cluster", one reason for
introducing
"module" is because it is difficult to define what "cluster" is for
x86,
the cluster in the device tree can be nested, then it can correspond to
an x86 die, or it can correspond to an x86 module. Therefore, specifying
"clusters=1" for x86 is ambiguous.
2. s390 introduces book and drawer, which are above socket/package
level, but for x86, the level above the package names "cluster" (yeah,
"cluster" again :-(). So if user sets "books=1" or
"drawers=1" for x86,
then it's meaningless. Similarly, "clusters=1" is also very confusing for
x86 machine.
I think that only thread/core/socket are architecturally general, the
other topology levels are hard to define across architectures, then
allowing unsupported topology=1 is always confusing...
Moreover, QEMU currently requires a clear topology containment
relationship when defining a topology, after which it will become
increasingly difficult to define a generic topology containment
relationship when new topology levels are introduced in the future...
It doesn't cause any functional difficulty for QEMU, because
internally
the QEMU code is itself assuming that all "unsupported" parameters
implicitly have a value of '1'.
At the libvirt level, we've allowed applications to set 'parameter=1'
when configuring a guest, and pass that through to QEMU.
Deprecating this creates extra difficulty for because there's no info
exposed from QEMU about which machine types "support" which parameters.
Thus, libvirt can't know whether it is valid to pass 'parameter=1' for
a given machine type, or whether it will trigger deprecation messages.
I understand that libvirt is having trouble because there is no interface
to expose which topology levels the current machine supports. As a
workaround to eliminate the difficulties at the libvirt level, it's
ok for me.
But I believe deprecating the unsupported topology is necessary, so do
you think it's acceptable to include an interface to expose the supported
topology if it's going to be deprecated again later?
Regards,
Zhao