On Thu, Nov 18, 2010 at 06:51:20PM +0100, Jiri Denemark wrote:
Hi all,
libvirt's qemu driver doesn't follow the semantics of CPU-related counters in
nodeinfo structure, which is
nodes   : the number of NUMA cells, 1 for uniform memory access
sockets : number of CPU sockets per node
cores   : number of cores per socket
threads : number of threads per core
The qemu driver ignores the "per node" part of the sockets semantics and only
gives the total number of sockets found on the host. That actually makes more
sense, but we have to fix it since it doesn't follow the documented semantics
of the public API. That is, we would do something like the following at the
end of linuxNodeInfoCPUPopulate():
    nodeinfo->sockets /= nodeinfo->nodes;
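To make the intended semantics concrete with a made-up but typical example: a
host with 2 NUMA nodes, 4 sockets in total, 4 cores per socket and 2 threads
per core should be reported as nodes=2, sockets=2 (per node), cores=4,
threads=2. The division above turns the host-wide count of 4 sockets into the
documented per-node value of 2, and nodes * sockets * cores * threads still
yields the host's 32 logical CPUs.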
The problem is that NUMA topology is independent of CPU topology, and there
are systems for which nodeinfo->sockets % nodeinfo->nodes != 0. An example is
the following NUMA topology of a system with 4 CPU sockets:
node0 CPUs: 0-5
total memory: 8252920
node1 CPUs: 6-11
total memory: 16547840
node2 CPUs: 12-17
total memory: 8273920
node3 CPUs: 18-23
total memory: 16547840
node4 CPUs: 24-29
total memory: 8273920
node5 CPUs: 30-35
total memory: 16547840
node6 CPUs: 36-41
total memory: 8273920
node7 CPUs: 42-47
total memory: 16547840
which shows that the cores are actually mapped via the AMD intra-socket
interconnects. Note that this funky topology was verified to be correct, so
it's not just a kernel bug that would result in the wrong topology being
reported.
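To spell out the arithmetic: the box has 8 NUMA nodes but only 4 sockets, so
nodeinfo->sockets % nodeinfo->nodes is 4 rather than 0, and the integer
division above would report 0 sockets per node, which in turn would make
nodes * sockets * cores * threads collapse to 0 instead of the 48 logical
CPUs actually present.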
So you are saying that 1 physical CPU socket can be associated with
2 NUMA nodes at the same time? If you have only 4 sockets here, then
there are 12 cores per socket, with 6 cores from each socket in a NUMA
node?
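(For reference: that's 48 logical CPUs spread over 8 nodes of 6 CPUs each, so
with 4 sockets and no hyperthreading it would be 48 / 4 = 12 cores per socket,
and each socket would have to straddle two NUMA nodes.)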
Can you provide the full 'numactl --hardware' output? I guess we're
facing a 2-level NUMA hierarchy, where the first level is inside
the socket and the second level is between sockets.
What does Xen / 'xm info' report on such a host?
So the suggested calculation wouldn't work on such systems, and we cannot
really follow the API semantics in this case.
My suggestion is to use the following code in linuxNodeInfoCPUPopulate():
    if (nodeinfo->sockets % nodeinfo->nodes == 0)
        nodeinfo->sockets /= nodeinfo->nodes;
    else
        nodeinfo->nodes = 1;
That is, we would lie about the number of NUMA nodes on such funky systems.
If nodeinfo->nodes is greater than 1, then applications can rely on it being
correct. If it's 1, applications that care about NUMA topology should consult
/capabilities/host/topology/cells in the capabilities XML to check the number
of NUMA nodes in a reliable way, which I guess such applications would do
anyway.
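For illustration: on the 48-CPU box above this takes the else branch
(4 % 8 != 0), so applications would see nodes=1, sockets=4, cores=12,
threads=1 (assuming no hyperthreading), i.e. a correct total CPU count with
the real NUMA layout left to the capabilities XML; on an ordinary host with,
say, 2 nodes and 4 sockets, the check passes and we report the documented
2 sockets per node.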
However, if you have a better idea for fixing the issue while staying more
compatible with the current semantics, don't hesitate to share it.
In your example it sounds like we could alternatively lie about the number
of cores per socket, e.g. instead of reporting 0.5 sockets per node with 12
cores, report 1 socket per node, each with 6 cores. Thus each of the reported
sockets would once again be associated with only 1 NUMA node at a time.
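Roughly something like this, perhaps (just a sketch to illustrate the idea,
not a tested patch; it assumes cores * sockets divides evenly across the
nodes):
    if (nodeinfo->sockets % nodeinfo->nodes == 0) {
        /* sockets divide evenly across nodes: report the per-node count */
        nodeinfo->sockets /= nodeinfo->nodes;
    } else {
        /* scale cores instead, so each reported socket maps to one node,
         * e.g. 8 nodes, 4 sockets, 12 cores/socket -> 1 socket/node, 6 cores */
        nodeinfo->cores = nodeinfo->cores * nodeinfo->sockets / nodeinfo->nodes;
        nodeinfo->sockets = 1;
    }
For the topology above that gives nodes=8, sockets=1, cores=6, threads=1, so
the total still works out to 48.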
Note that we have the VIR_NODEINFO_MAXCPUS macro in libvirt.h, which computes
the maximum number of CPUs as (nodes * sockets * cores * threads), and we need
to keep this working.
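For reference, that macro is essentially:
    #define VIR_NODEINFO_MAXCPUS(nodeinfo) \
        ((nodeinfo).nodes * (nodeinfo).sockets * (nodeinfo).cores * (nodeinfo).threads)
and both variants discussed above keep it at 48 for the example box:
1 * 4 * 12 * 1 with the nodes=1 fallback, and 8 * 1 * 6 * 1 with the cores
adjustment (assuming no hyperthreading in either case).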
Daniel