Hi all,
libvirt's qemu driver doesn't follow the semantics of the CPU-related counters
in the nodeinfo structure, which are

    nodes   : the number of NUMA cells, 1 for uniform memory access
    sockets : number of CPU sockets per node
    cores   : number of cores per socket
    threads : number of threads per core
The qemu driver ignores the "per node" part of the sockets semantics and only
gives the total number of sockets found on the host. That actually makes more
sense, but we have to fix it since it doesn't follow the documented semantics
of the public API. That is, we would do something like the following at the
end of linuxNodeInfoCPUPopulate():

    nodeinfo->sockets /= nodeinfo->nodes;
The problem is that NUMA topology is independent of CPU topology, and there
are systems for which nodeinfo->sockets % nodeinfo->nodes != 0. An example is
the following NUMA topology of a system with 4 CPU sockets:
node0 CPUs: 0-5
total memory: 8252920
node1 CPUs: 6-11
total memory: 16547840
node2 CPUs: 12-17
total memory: 8273920
node3 CPUs: 18-23
total memory: 16547840
node4 CPUs: 24-29
total memory: 8273920
node5 CPUs: 30-35
total memory: 16547840
node6 CPUs: 36-41
total memory: 8273920
node7 CPUs: 42-47
total memory: 16547840
which shows that the cores are actually mapped via the AMD intra-socket
interconnects. Note that this funky topology was verified to be correct, so
it's not just a kernel bug resulting in a wrong topology being reported.
So the suggested calculation wouldn't work on such systems, and we cannot
really follow the API semantics since they don't hold in this case.
My suggestion is to use the following code in linuxNodeInfoCPUPopulate():
    if (nodeinfo->sockets % nodeinfo->nodes == 0)
        nodeinfo->sockets /= nodeinfo->nodes;
    else
        nodeinfo->nodes = 1;
That is, we would lie about the number of NUMA nodes on such funky systems. If
nodeinfo->nodes is greater than 1, applications can rely on it being correct.
If it's 1, applications that care about NUMA topology should consult
/capabilities/host/topology/cells in the capabilities XML to check the number
of NUMA nodes in a reliable way, which I guess such applications would do
anyway.
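For reference, the capabilities XML for the 8-node example host would contain
something along these lines (abbreviated, structure from memory, values
illustrative):

```xml
<capabilities>
  <host>
    <topology>
      <cells num='8'>
        <cell id='0'>
          <cpus num='6'>
            <cpu id='0'/>
            <!-- ... cpus 1-5 ... -->
          </cpus>
        </cell>
        <!-- ... cells 1-7 ... -->
      </cells>
    </topology>
  </host>
</capabilities>
```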
However, if you have a better idea how to fix the issue while staying more
compatible with the current semantics, don't hesitate to share it.
Note that we have the VIR_NODEINFO_MAXCPUS macro in libvirt.h, which computes
the maximum number of CPUs as (nodes * sockets * cores * threads), and we need
to keep this working.
Jirka