[libvirt] RFC: CPU counting in qemu driver

Hi all,

libvirt's qemu driver doesn't follow the semantics of CPU-related counters in the nodeinfo structure, which are:

    nodes   : the number of NUMA cells, 1 for uniform memory access
    sockets : number of CPU sockets per node
    cores   : number of cores per socket
    threads : number of threads per core

The qemu driver ignores the "per node" part of the sockets semantics and only gives the total number of sockets found on the host. That actually makes more sense, but we have to fix it since it doesn't follow the documented semantics of the public API. That is, we would do something like the following at the end of linuxNodeInfoCPUPopulate():

    nodeinfo->sockets /= nodeinfo->nodes;

The problem is that NUMA topology is independent of CPU topology and there are systems for which nodeinfo->sockets % nodeinfo->nodes != 0. An example is the following NUMA topology of a system with 4 CPU sockets:

    node0 CPUs: 0-5    total memory: 8252920
    node1 CPUs: 6-11   total memory: 16547840
    node2 CPUs: 12-17  total memory: 8273920
    node3 CPUs: 18-23  total memory: 16547840
    node4 CPUs: 24-29  total memory: 8273920
    node5 CPUs: 30-35  total memory: 16547840
    node6 CPUs: 36-41  total memory: 8273920
    node7 CPUs: 42-47  total memory: 16547840

which shows that the cores are actually mapped via the AMD intra-socket interconnects. Note that this funky topology was verified to be correct, so it's not just a kernel bug which would result in a wrong topology being reported.

So the suggested calculation wouldn't work on such systems and we cannot really follow the API semantics since it doesn't work in this case. My suggestion is to use the following code in linuxNodeInfoCPUPopulate():

    if (nodeinfo->sockets % nodeinfo->nodes == 0)
        nodeinfo->sockets /= nodeinfo->nodes;
    else
        nodeinfo->nodes = 1;

That is, we would lie about the number of NUMA nodes on funky systems. If nodeinfo->nodes is greater than 1, then applications can rely on it being correct. If it's 1, applications that care about NUMA topology should consult /capabilities/host/topology/cells of the capabilities XML to check the number of NUMA nodes in a reliable way, which I guess such applications would do anyway.

However, if you have a better idea for fixing the issue while staying more compatible with the current semantics, don't hesitate to share it.

Note that we have the VIR_NODEINFO_MAXCPUS macro in libvirt.h, which computes the maximum number of CPUs as (nodes * sockets * cores * threads), and we need to keep this working.

Jirka
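A minimal sketch of how the proposed fallback could fit together, assuming the existing parsing code has already filled nodeinfo with host-wide totals; the helper name is hypothetical and only the if/else mirrors the proposal above:

    #include <libvirt/libvirt.h>

    /* Illustrative sketch: assumes nodeinfo->nodes holds the number of NUMA
     * cells and nodeinfo->sockets the total number of sockets on the host. */
    static void
    fixupSocketsPerNode(virNodeInfoPtr nodeinfo)  /* hypothetical helper name */
    {
        if (nodeinfo->sockets % nodeinfo->nodes == 0) {
            /* regular topology: convert the host-wide total to sockets per node */
            nodeinfo->sockets /= nodeinfo->nodes;
        } else {
            /* funky topology: pretend there is a single NUMA cell and let
             * applications read the real cell list from the capabilities XML */
            nodeinfo->nodes = 1;
        }
        /* either way, nodes * sockets * cores * threads (VIR_NODEINFO_MAXCPUS)
         * still equals the number of logical CPUs on the host */
    }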

On Thu, Nov 18, 2010 at 06:51:20PM +0100, Jiri Denemark wrote:
Hi all,
libvirt's qemu driver doesn't follow the semantics of CPU-related counters in the nodeinfo structure, which are:

    nodes   : the number of NUMA cells, 1 for uniform memory access
    sockets : number of CPU sockets per node
    cores   : number of cores per socket
    threads : number of threads per core

The qemu driver ignores the "per node" part of the sockets semantics and only gives the total number of sockets found on the host. That actually makes more sense, but we have to fix it since it doesn't follow the documented semantics of the public API. That is, we would do something like the following at the end of linuxNodeInfoCPUPopulate():

    nodeinfo->sockets /= nodeinfo->nodes;

The problem is that NUMA topology is independent of CPU topology and there are systems for which nodeinfo->sockets % nodeinfo->nodes != 0. An example is the following NUMA topology of a system with 4 CPU sockets:

    node0 CPUs: 0-5    total memory: 8252920
    node1 CPUs: 6-11   total memory: 16547840
    node2 CPUs: 12-17  total memory: 8273920
    node3 CPUs: 18-23  total memory: 16547840
    node4 CPUs: 24-29  total memory: 8273920
    node5 CPUs: 30-35  total memory: 16547840
    node6 CPUs: 36-41  total memory: 8273920
    node7 CPUs: 42-47  total memory: 16547840

which shows that the cores are actually mapped via the AMD intra-socket interconnects. Note that this funky topology was verified to be correct, so it's not just a kernel bug which would result in a wrong topology being reported.
So you are saying that 1 physical CPU socket can be associated with 2 NUMA nodes at the same time ? If you have only 4 sockets here, then there are 12 cores per socket, and 6 cores from each socket in a NUMA node ? Can you provide the full 'numactl --hardware' output ? I guess we're facing a 2-level NUMA hierarchy, where the first level is done inside the socket, and the second level is between sockets. What does Xen / 'xm info' report on such a host ?
So the suggested calculation wouldn't work on such systems and we cannot really follow the API semantics since it doesn't work in this case.
My suggestion is to use the following code in linuxNodeInfoCPUPopulate():
    if (nodeinfo->sockets % nodeinfo->nodes == 0)
        nodeinfo->sockets /= nodeinfo->nodes;
    else
        nodeinfo->nodes = 1;
That is, we would lie about the number of NUMA nodes on funky systems. If nodeinfo->nodes is greater than 1, then applications can rely on it being correct. If it's 1, applications that care about NUMA topology should consult /capabilities/host/topology/cells of the capabilities XML to check the number of NUMA nodes in a reliable way, which I guess such applications would do anyway.
However, if you have a better idea for fixing the issue while staying more compatible with the current semantics, don't hesitate to share it.
In your example it sounds like we could alternatively lie about the number of cores per socket. E.g., instead of reporting 0.5 sockets per node with 12 cores, report 1 socket per node, each with 6 cores. Thus each of the reported sockets would once again only be associated with 1 NUMA node at a time.
Note that we have the VIR_NODEINFO_MAXCPUS macro in libvirt.h, which computes the maximum number of CPUs as (nodes * sockets * cores * threads), and we need to keep this working.
Daniel
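For the 48-CPU box above, the two ways of bending the numbers would come out as follows; a quick standalone check (not code from the thread) that the nodes * sockets * cores * threads product stays at 48 either way:

    #include <assert.h>

    int main(void)
    {
        /* Jirka's proposal: keep the real per-socket core count, collapse nodes to 1 */
        unsigned int nodes_a = 1, sockets_a = 4, cores_a = 12, threads_a = 1;

        /* Daniel's alternative: keep the 8 nodes, report 1 socket of 6 cores per node */
        unsigned int nodes_b = 8, sockets_b = 1, cores_b = 6, threads_b = 1;

        /* VIR_NODEINFO_MAXCPUS-style arithmetic must still match the 48 logical CPUs */
        assert(nodes_a * sockets_a * cores_a * threads_a == 48);
        assert(nodes_b * sockets_b * cores_b * threads_b == 48);
        return 0;
    }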

So you are saying that 1 physical CPU socket can be associated with 2 NUMA nodes at the same time ? If you have only 4 sockets here, then there are 12 cores per socket, and 6 cores from each socket in a NUMA node ?
Yes, that is correct.
Can you provide the full 'numactl --hardware' output ? I guess we're facing a 2-level NUMA hierarchy, where the first level is done inside the socket, and the second level is between sockets.
I'm not sure about the details of the topology; Bhavna knows more. Here is the output of numactl --hardware:

    available: 8 nodes (0-7)
    node 0 cpus: 0 1 2 3 4 5
    node 0 size: 8189 MB
    node 0 free: 7670 MB
    node 1 cpus: 6 7 8 9 10 11
    node 1 size: 16384 MB
    node 1 free: 15855 MB
    node 2 cpus: 12 13 14 15 16 17
    node 2 size: 8192 MB
    node 2 free: 7901 MB
    node 3 cpus: 18 19 20 21 22 23
    node 3 size: 16384 MB
    node 3 free: 15816 MB
    node 4 cpus: 24 25 26 27 28 29
    node 4 size: 8192 MB
    node 4 free: 7897 MB
    node 5 cpus: 30 31 32 33 34 35
    node 5 size: 16384 MB
    node 5 free: 15820 MB
    node 6 cpus: 36 37 38 39 40 41
    node 6 size: 8192 MB
    node 6 free: 7862 MB
    node 7 cpus: 42 43 44 45 46 47
    node 7 size: 16384 MB
    node 7 free: 15858 MB
    node distances:
    node   0   1   2   3   4   5   6   7
      0:  10  16  16  22  16  22  16  22
      1:  16  10  22  16  16  22  22  16
      2:  16  22  10  16  16  16  16  16
      3:  22  16  16  10  16  16  22  22
      4:  16  16  16  16  10  16  16  22
      5:  22  22  16  16  16  10  22  16
      6:  16  22  16  22  16  22  10  16
      7:  22  16  16  22  22  16  16  10
What does Xen / 'xm info' report on such a host ?
The host is currently reserved to someone else and running RHEL-6, so I can't provide the information now.
In your example it sounds like we could alternatively lie about the number of cores per socket. E.g., instead of reporting 0.5 sockets per node with 12 cores, report 1 socket per node, each with 6 cores. Thus each of the reported sockets would once again only be associated with 1 NUMA node at a time.
Yes, but that would also affect the CPU topology reported in /capabilities/host/cpu/topology. I think we should only lie about things for which we have other ways to determine the truth. And marking nodeinfo->nodes as unreliable, making it 1 for complicated cases and forcing apps to check the NUMA topology in the XML, seems like a more general approach to me. It would also work with architectures where NUMA nodes may contain different numbers of sockets (not that I've seen one, but I'm sure someone will come up with it one day :-P).

Jirka
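A minimal sketch of what that application-side check could look like, using the public libvirt API; the naive substring count stands in for real XML parsing of /capabilities/host/topology/cells and is purely illustrative:

    #include <libvirt/libvirt.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        virConnectPtr conn = virConnectOpenReadOnly(NULL);
        if (!conn)
            return EXIT_FAILURE;

        virNodeInfo info;
        unsigned int cells = 0;

        if (virNodeGetInfo(conn, &info) == 0)
            cells = info.nodes;

        if (cells <= 1) {
            /* nodes == 1 may mean "uniform memory access" or "topology too
             * irregular to express"; the capabilities XML is authoritative */
            char *caps = virConnectGetCapabilities(conn);
            if (caps) {
                unsigned int n = 0;
                const char *p = caps;
                /* count <cell id='...'> elements; a real application would
                 * use a proper XML parser (e.g. libxml2) instead */
                while ((p = strstr(p, "<cell id=")) != NULL) {
                    n++;
                    p++;
                }
                if (n > 0)
                    cells = n;
                free(caps);
            }
        }

        printf("NUMA cells: %u\n", cells);
        virConnectClose(conn);
        return EXIT_SUCCESS;
    }

Built against libvirt (e.g. with -lvirt), this falls back to counting cells in the capabilities XML whenever nodeinfo reports a single node.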

What does Xen / 'xm info' report on such a host ?
    nr_cpus          : 48
    nr_nodes         : 1
    sockets_per_node : 4
    cores_per_socket : 12
    threads_per_core : 1
    node_to_cpu      : node0:0-47
Hmm, this was for the default case when NUMA is turned off in the hypervisor. After setting numa=on on the xen command line, the result is a bit different:

    nr_cpus          : 48
    nr_nodes         : 8
    sockets_per_node : 0
    cores_per_socket : 12
    threads_per_core : 1
    node_to_cpu      : node0:0-5 node1:6-11 node2:12-17 node3:18-23 node4:24-29 node5:30-35 node6:36-41 node7:42-47

sockets_per_node is reported to be zero.

Jirka

On Tue, Nov 23, 2010 at 03:34:20PM +0100, Jiri Denemark wrote:
What does Xen / 'xm info' report on such a host ?
    nr_cpus          : 48
    nr_nodes         : 1
    sockets_per_node : 4
    cores_per_socket : 12
    threads_per_core : 1
    node_to_cpu      : node0:0-47
Hmm, this was for the default case when NUMA is turned off in the hypervisor. After setting numa=on on the xen command line, the result is a bit different:
    nr_cpus          : 48
    nr_nodes         : 8
    sockets_per_node : 0
    cores_per_socket : 12
    threads_per_core : 1
    node_to_cpu      : node0:0-5 node1:6-11 node2:12-17 node3:18-23 node4:24-29 node5:30-35 node6:36-41 node7:42-47
sockets_per_node is reported to be zero.
Ah well, that's completely broken. Could be they did the arithmetic nr_cpus / (nr_nodes * cores_per_socket) and got 0.5, which with integer truncation gives 0. Guess Xen needs the same hack you're proposing for libvirt.

Daniel
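The suspected arithmetic does truncate to zero for this box; a hypothetical reconstruction, not actual Xen code:

    #include <stdio.h>

    int main(void)
    {
        /* 48 CPUs, 8 NUMA nodes, 12 cores per socket, 1 thread per core */
        unsigned int sockets_per_node = 48 / (8 * 12 * 1);
        printf("sockets_per_node = %u\n", sockets_per_node);  /* prints 0, not 0.5 */
        return 0;
    }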

What does Xen / 'xm info' report on such a host ?
    nr_cpus          : 48
    nr_nodes         : 1
    sockets_per_node : 4
    cores_per_socket : 12
    threads_per_core : 1
    node_to_cpu      : node0:0-47
Hmm, this was for the default case when NUMA is turned off in the hypervisor. After setting numa=on on the xen command line, the result is a bit different:
    nr_cpus          : 48
    nr_nodes         : 8
    sockets_per_node : 0
    cores_per_socket : 12
    threads_per_core : 1
    node_to_cpu      : node0:0-5 node1:6-11 node2:12-17 node3:18-23 node4:24-29 node5:30-35 node6:36-41 node7:42-47
sockets_per_node is reported to be zero.
Ah well, that's completely broken. Could be they did the arithmetic nr_cpus / (nr_nodes * cores_per_socket) and got 0.5, which with integer truncation gives 0. Guess Xen needs the same hack you're proposing for libvirt.
Yeah. Also, this was on old (RHEL-5) Xen. Xen 3.2.0 and newer dropped sockets_per_node completely, and we compute it the same way Xen used to in order to provide that value anyway. That is, Xen doesn't really need fixing, only the xen driver in libvirt does.

Jirka