On 5/22/20 6:07 PM, Igor Mammedov wrote:
> On Fri, 22 May 2020 16:14:14 +0200
> Michal Privoznik <mprivozn(a)redhat.com> wrote:
> QEMU is trying to obsolete -numa node,cpus= because that uses an
> ambiguous vCPU ID to [socket, die, core, thread] mapping. The new
> form is:
>
> -numa cpu,node-id=N,socket-id=S,die-id=D,core-id=C,thread-id=T
>
> which is repeated for every vCPU and places the vCPU at [S, D, C, T]
> into guest NUMA node N.
>
> While in general this is a magic mapping, we can deal with it.
> Firstly, with QEMU 2.7 or newer, libvirt ensures that if a topology
> is given then maxvcpus must equal sockets * dies * cores * threads
> (i.e. there are no 'holes').
> Secondly, if no topology is given then libvirt itself places each
> vCPU into a different socket (basically, it fakes a topology of
> [maxvcpus, 1, 1, 1]).
> Thirdly, we can copy whatever QEMU is doing when mapping vCPUs
> onto the topology, to make sure vCPUs don't start to move around.
>
> Note that migration from the old to the new command line works and
> therefore doesn't need any special handling.
>
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1678085
>
> Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
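To make the syntax change concrete: for a hypothetical guest with 4 vCPUs
(-smp 4,sockets=2,cores=2,threads=1) split across two guest NUMA nodes, the
generated command line would change roughly like this (the exact
socket/core/thread IDs that QEMU accepts are machine dependent, which is
what the discussion below revolves around; die-id is shown for completeness,
but whether it applies at all depends on the machine type):

  old: -numa node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3

  new: -numa node,nodeid=0 -numa node,nodeid=1 \
       -numa cpu,node-id=0,socket-id=0,die-id=0,core-id=0,thread-id=0 \
       -numa cpu,node-id=0,socket-id=0,die-id=0,core-id=1,thread-id=0 \
       -numa cpu,node-id=1,socket-id=1,die-id=0,core-id=0,thread-id=0 \
       -numa cpu,node-id=1,socket-id=1,die-id=0,core-id=1,thread-id=0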
> ---
> src/qemu/qemu_command.c | 108 +++++++++++++++++-
> .../hugepages-nvdimm.x86_64-latest.args | 4 +-
> ...memory-default-hugepage.x86_64-latest.args | 10 +-
> .../memfd-memory-numa.x86_64-latest.args | 10 +-
> ...y-hotplug-nvdimm-access.x86_64-latest.args | 4 +-
> ...ry-hotplug-nvdimm-align.x86_64-latest.args | 4 +-
> ...ry-hotplug-nvdimm-label.x86_64-latest.args | 4 +-
> ...ory-hotplug-nvdimm-pmem.x86_64-latest.args | 4 +-
> ...ory-hotplug-nvdimm-ppc64.ppc64-latest.args | 4 +-
> ...hotplug-nvdimm-readonly.x86_64-latest.args | 4 +-
> .../memory-hotplug-nvdimm.x86_64-latest.args | 4 +-
> ...vhost-user-fs-fd-memory.x86_64-latest.args | 4 +-
> ...vhost-user-fs-hugepages.x86_64-latest.args | 4 +-
> ...host-user-gpu-secondary.x86_64-latest.args | 3 +-
> .../vhost-user-vga.x86_64-latest.args | 3 +-
> 15 files changed, 158 insertions(+), 16 deletions(-)
>
> diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
> index 7d84fd8b5e..0de4fe4905 100644
> --- a/src/qemu/qemu_command.c
> +++ b/src/qemu/qemu_command.c
> @@ -7079,6 +7079,91 @@ qemuBuildNumaOldCPUs(virBufferPtr buf,
> }
>
>
> +/**
> + * qemuTranlsatevCPUID:
> + *
> + * For the given vCPU @id and vCPU topology (@cpu) compute the
> + * corresponding @socket, @die, @core and @thread. This assumes a linear
> + * topology, that is, every [socket, die, core, thread] combination is a
> + * valid vCPU ID and there are no 'holes'. This is ensured by
> + * qemuValidateDomainDef() if QEMU_CAPS_QUERY_HOTPLUGGABLE_CPUS is
> + * set.
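To illustrate what 'linear' means here, the translation would be along these
lines (an illustrative sketch only, not the code from the patch; and, as
noted below, the real layout is machine dependent):

#include <stdio.h>

/*
 * One plausible "linear" enumeration of vCPU IDs: threads vary fastest,
 * then cores, then dies, then sockets.  A given machine type may order
 * things differently.
 */
static void
example_translate_vcpu_id(unsigned int id,
                          unsigned int dies,
                          unsigned int cores,
                          unsigned int threads,
                          unsigned int *socket,
                          unsigned int *die,
                          unsigned int *core,
                          unsigned int *thread)
{
    *thread = id % threads;
    *core = (id / threads) % cores;
    *die = (id / (threads * cores)) % dies;
    *socket = id / (threads * cores * dies);
}

int main(void)
{
    unsigned int socket, die, core, thread;
    unsigned int id;

    /* Example: 2 sockets, 1 die, 2 cores, 2 threads => 8 vCPUs, no holes. */
    for (id = 0; id < 8; id++) {
        example_translate_vcpu_id(id, 1, 2, 2,
                                  &socket, &die, &core, &thread);
        printf("vCPU %u -> socket=%u die=%u core=%u thread=%u\n",
               id, socket, die, core, thread);
    }
    return 0;
}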
> I wouldn't make this assumption; each machine can have (and has) its own
> layout, and now it's not hard to change that per machine version if
> necessary.
> I suppose one could pull the list of possible CPUs from QEMU started
> in preconfig mode with the desired -smp x,y,z using QUERY_HOTPLUGGABLE_CPUS
> and then continue to configure NUMA with QMP commands using the provided
> CPU layout.
Continue where? In 'preconfig mode' the guest is already started,
isn't it? Are you suggesting that libvirt starts a dummy QEMU process,
fetches the CPU topology from it and then starts it for real? Libvirt
tries to avoid that as much as it can.
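For reference, the flow being described would presumably look something like
this (a rough sketch; which commands are allowed in the preconfig state, and
whether the exit command is still called x-exit-preconfig, depends on the
QEMU version):

  $ qemu-system-x86_64 --preconfig -smp 8,sockets=2,cores=2,threads=2 ...

  -> {"execute": "query-hotpluggable-cpus"}
  <- one entry per possible vCPU, each carrying "props" such as
     {"socket-id": 1, "core-id": 0, "thread-id": 1}

  -> {"execute": "set-numa-node",
      "arguments": {"type": "cpu", "node-id": 0,
                    "socket-id": 0, "core-id": 0, "thread-id": 0}}
     (repeated for every vCPU, echoing back the props QEMU reported)

  -> {"execute": "x-exit-preconfig"}

Only after the last step does QEMU actually create the machine, so the IDs
come from QEMU rather than being guessed by libvirt.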
> How to present it to the libvirt user I'm not sure (give them that list,
> perhaps, and let them select from it?)
This is what I am trying to figure out in the cover letter. Maybe we
need to let users configure the topology (well, the vCPU ID to [socket, die,
core, thread] mapping), but then again, in my testing the guest ignored
that and displayed a different topology (true, I was testing with -cpu
host, so maybe that's why).
> But that's irrelevant to the patch: magical IDs for socket/core/... whatever
> should not be generated by libvirt anymore, but rather taken from QEMU for a
> given machine + -smp combination.
Taken when? We can do this for running machines, but not for freshly
started ones, can we?
Michal