On 5/22/20 6:07 PM, Igor Mammedov wrote:
> On Fri, 22 May 2020 16:14:14 +0200
> Michal Privoznik <mprivozn(a)redhat.com> wrote:
> QEMU is trying to obsolete -numa node,cpus= because that uses an
> ambiguous vCPU ID to [socket, die, core, thread] mapping. The new
> form is:
>
> -numa cpu,node-id=N,socket-id=S,die-id=D,core-id=C,thread-id=T
>
> which is repeated for every vCPU and places the vCPU at [S, D, C, T]
> into guest NUMA node N.
>
> While in general this is a magic mapping, we can deal with it.
> Firstly, with QEMU 2.7 or newer, libvirt ensures that if a topology
> is given then maxvcpus must equal sockets * dies * cores * threads
> (i.e. there are no 'holes').
> Secondly, if no topology is given then libvirt itself places each
> vCPU into a different socket (basically, it fakes a topology of
> [maxvcpus, 1, 1, 1]).
> Thirdly, we can copy whatever QEMU is doing when mapping vCPUs
> onto the topology, to make sure vCPUs don't start to move around.
>
> Note that migration from the old to the new command line works and
> therefore doesn't need any special handling.
>
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1678085
>
> Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
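To make the syntax change concrete: for a hypothetical guest with 4 vCPUs
(-smp 4,sockets=2,cores=2,threads=1) split across two guest NUMA nodes, the
generated command line would change roughly like this (the exact
socket/core/thread IDs that QEMU accepts are machine dependent, which is
what the discussion below revolves around; die-id is shown for completeness,
but whether it applies at all depends on the machine type):

  old: -numa node,nodeid=0,cpus=0-1 -numa node,nodeid=1,cpus=2-3

  new: -numa node,nodeid=0 -numa node,nodeid=1 \
       -numa cpu,node-id=0,socket-id=0,die-id=0,core-id=0,thread-id=0 \
       -numa cpu,node-id=0,socket-id=0,die-id=0,core-id=1,thread-id=0 \
       -numa cpu,node-id=1,socket-id=1,die-id=0,core-id=0,thread-id=0 \
       -numa cpu,node-id=1,socket-id=1,die-id=0,core-id=1,thread-id=0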
> ---
> src/qemu/qemu_command.c | 108 +++++++++++++++++-
> .../hugepages-nvdimm.x86_64-latest.args | 4 +-
> ...memory-default-hugepage.x86_64-latest.args | 10 +-
> .../memfd-memory-numa.x86_64-latest.args | 10 +-
> ...y-hotplug-nvdimm-access.x86_64-latest.args | 4 +-
> ...ry-hotplug-nvdimm-align.x86_64-latest.args | 4 +-
> ...ry-hotplug-nvdimm-label.x86_64-latest.args | 4 +-
> ...ory-hotplug-nvdimm-pmem.x86_64-latest.args | 4 +-
> ...ory-hotplug-nvdimm-ppc64.ppc64-latest.args | 4 +-
> ...hotplug-nvdimm-readonly.x86_64-latest.args | 4 +-
> .../memory-hotplug-nvdimm.x86_64-latest.args | 4 +-
> ...vhost-user-fs-fd-memory.x86_64-latest.args | 4 +-
> ...vhost-user-fs-hugepages.x86_64-latest.args | 4 +-
> ...host-user-gpu-secondary.x86_64-latest.args | 3 +-
> .../vhost-user-vga.x86_64-latest.args | 3 +-
> 15 files changed, 158 insertions(+), 16 deletions(-)
>
> diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
> index 7d84fd8b5e..0de4fe4905 100644
> --- a/src/qemu/qemu_command.c
> +++ b/src/qemu/qemu_command.c
> @@ -7079,6 +7079,91 @@ qemuBuildNumaOldCPUs(virBufferPtr buf,
> }
>
>
> +/**
> + * qemuTranlsatevCPUID:
> + *
> + * For the given vCPU @id and vCPU topology (@cpu) compute the
> + * corresponding @socket, @die, @core and @thread. This assumes a linear
> + * topology, that is, every [socket, die, core, thread] combination is a
> + * valid vCPU ID and there are no 'holes'. This is ensured by
> + * qemuValidateDomainDef() if QEMU_CAPS_QUERY_HOTPLUGGABLE_CPUS is
> + * set.
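To illustrate what 'linear' means here, the translation would be along these
lines (an illustrative sketch only, not the code from the patch; and, as
noted below, the real layout is machine dependent):

#include <stdio.h>

/*
 * One plausible "linear" enumeration of vCPU IDs: threads vary fastest,
 * then cores, then dies, then sockets.  A given machine type may order
 * things differently.
 */
static void
example_translate_vcpu_id(unsigned int id,
                          unsigned int dies,
                          unsigned int cores,
                          unsigned int threads,
                          unsigned int *socket,
                          unsigned int *die,
                          unsigned int *core,
                          unsigned int *thread)
{
    *thread = id % threads;
    *core = (id / threads) % cores;
    *die = (id / (threads * cores)) % dies;
    *socket = id / (threads * cores * dies);
}

int main(void)
{
    unsigned int socket, die, core, thread;
    unsigned int id;

    /* Example: 2 sockets, 1 die, 2 cores, 2 threads => 8 vCPUs, no holes. */
    for (id = 0; id < 8; id++) {
        example_translate_vcpu_id(id, 1, 2, 2,
                                  &socket, &die, &core, &thread);
        printf("vCPU %u -> socket=%u die=%u core=%u thread=%u\n",
               id, socket, die, core, thread);
    }
    return 0;
}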
> I wouldn't make this assumption; each machine can have (and has) its own
> layout, and now it's not hard to change that per machine version if
> necessary.
> I suppose one could pull the list of possible CPUs from QEMU started
> in preconfig mode with the desired -smp x,y,z using QUERY_HOTPLUGGABLE_CPUS
> and then continue to configure NUMA with QMP commands using the provided
> CPU layout.
Continue where? In 'preconfig mode' the guest is already started,
isn't it? Are you suggesting that libvirt starts a dummy QEMU process,
fetches the CPU topology from it and then starts it for real? Libvirt
tries to avoid that as much as it can.
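For reference, the flow being described would presumably look something like
this (a rough sketch; which commands are allowed in the preconfig state, and
whether the exit command is still called x-exit-preconfig, depends on the
QEMU version):

  $ qemu-system-x86_64 --preconfig -smp 8,sockets=2,cores=2,threads=2 ...

  -> {"execute": "query-hotpluggable-cpus"}
  <- one entry per possible vCPU, each carrying "props" such as
     {"socket-id": 1, "core-id": 0, "thread-id": 1}

  -> {"execute": "set-numa-node",
      "arguments": {"type": "cpu", "node-id": 0,
                    "socket-id": 0, "core-id": 0, "thread-id": 0}}
     (repeated for every vCPU, echoing back the props QEMU reported)

  -> {"execute": "x-exit-preconfig"}

Only after the last step does QEMU actually create the machine, so the IDs
come from QEMU rather than being guessed by libvirt.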
> How to present it to the libvirt user I'm not sure (give them that list,
> perhaps, and let them select from it?)
This is what I am trying to figure out in the cover letter. Maybe we
need to let users configure the topology (well, the vCPU ID to [socket, die,
core, thread] mapping), but then again, in my testing the guest ignored
that and displayed a different topology (true, I was testing with -cpu
host, so maybe that's why).
> But that's irrelevant to the patch: magical IDs for socket/core/... whatever
> should not be generated by libvirt anymore, but rather taken from QEMU for a
> given machine + -smp combination.
Taken when? We can do this for running machines, but not for freshly
started ones, can we?
Michal