Re: [libvirt] [RFC] NUMA topology specification

30 Aug 2011

Hi,

Here is another attempt at a guest NUMA topology XML specification that
should work for different NUMA topologies.

We already specify the number of sockets, cores and threads a system
has by using:

<cpu>
<topology sockets='2' cores='2' threads='2'/>
</cpu>

For NUMA, we can add the following:

<numa>
<node cpus='0-3' mems='1024'/>
<node cpus='4-7' mems='1024'/>
</numa>

Specifying only the CPUs in each NUMA node should be enough to
represent most topologies. From the number of CPUs listed in each
node, we should be able to work out how many cores and sockets belong
to that node. The only other thing needed is an explicit memory
specification.
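
As a rough illustration of that derivation, here is a minimal Python
sketch. The helper names (parse_cpus, node_shape) are hypothetical, and
the sketch assumes CPUs are numbered serially (threads within a core
first, then cores within a socket), which the proposal itself does not
mandate.

```python
def parse_cpus(spec):
    """Expand a cpus attribute such as '0-3,8-11' into a sorted list of CPU ids."""
    cpus = set()
    for part in spec.split(','):
        part = part.strip()
        if '-' in part:
            lo, hi = part.split('-')
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def node_shape(spec, cores, threads):
    """Return (sockets, cores) spanned by one node.

    Assumption: serial numbering, so CPU n sits in core n // threads
    and in socket n // (cores * threads).
    """
    cpus = parse_cpus(spec)
    core_ids = {cpu // threads for cpu in cpus}
    socket_ids = {cpu // (cores * threads) for cpu in cpus}
    return len(socket_ids), len(core_ids)
```

With the first example above (cores='2', threads='2'),
node_shape('0-3', 2, 2) reports one socket and two cores for the node.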

I have taken a few example NUMA topologies here and shown how the
above specification can help.

Magny Cours
-----------
Topology desc: http://code.google.com/p/likwid-topology/wiki/AMD_MagnyCours8

<cpu>
<topology sockets='4' cores='4' threads='1'/>
<numa>
        <node cpus='0-3' mems='1024'/>
        <node cpus='4-7' mems='1024'/>
        <node cpus='8-11' mems='1024'/>
        <node cpus='12-15' mems='1024'/>
</numa>
</cpu>

OR if we want to stick to how CPUs get enumerated in real hardware we
can specify like this:

<cpu>
<topology sockets='4' cores='4' threads='1'/>
<numa>
        <node cpus='0,2,4,6' mems='1024'/>
        <node cpus='8,10,12,14' mems='1024'/>
        <node cpus='1,3,5,7' mems='1024'/>
        <node cpus='9,11,13,15' mems='1024'/>
</numa>
</cpu>

The above two specifications for Magny Cours aren't perfect because we
conveniently converted the multi-level NUMA into single-level NUMA.
The system has 2 sockets with 2 NUMA domains per socket and 4 cores in
each domain, but we aren't really reflecting this in the topology
specification. But does this really matter? We are still showing 4
distinct NUMA domains.
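
Whichever enumeration is used, a flat specification only makes sense
if the nodes are disjoint and together cover every CPU implied by
sockets * cores * threads. The following Python sketch shows one way
such a check could look. The function names are hypothetical, the
mems attribute is ignored, and nothing here is existing libvirt code.

```python
def parse_cpus(spec):
    """Expand a cpus attribute such as '0,2,4,6' or '0-3' into a set of CPU ids."""
    cpus = set()
    for part in spec.split(','):
        part = part.strip()
        if '-' in part:
            lo, hi = part.split('-')
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def check_nodes(node_specs, sockets, cores, threads):
    """Return a list of problems; an empty list means the nodes partition all CPUs."""
    total = sockets * cores * threads
    owner = {}       # cpu id -> index of the node that claimed it first
    problems = []
    for idx, spec in enumerate(node_specs):
        for cpu in sorted(parse_cpus(spec)):
            if cpu >= total:
                problems.append(f"node {idx}: cpu {cpu} is outside 0-{total - 1}")
            elif cpu in owner:
                problems.append(f"cpu {cpu} is in both node {owner[cpu]} and node {idx}")
            else:
                owner[cpu] = idx
    missing = sorted(set(range(total)) - set(owner))
    if missing:
        problems.append(f"cpus not assigned to any node: {missing}")
    return problems
```

Both Magny Cours layouts above pass this check with sockets=4,
cores=4, threads=1, since each assigns CPUs 0-15 to exactly one of
four nodes.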

Nehalem
------------
Topology desc: http://code.google.com/p/likwid-topology/wiki/Intel_Nehalem

<cpu>
<topology sockets='2' cores='4' threads='2'/>
<numa>
        <node cpus='0-7' mems='1024'/>
        <node cpus='8-15' mems='1024'/>
</numa>
</cpu>

OR if we want to stick to how CPUs get enumerated in real hardware we
can specify like this:

<cpu>
<topology sockets='2' cores='4' threads='2'/>
<numa>
        <node cpus='0-3,8-11' mems='1024'/>
        <node cpus='4-7,12-15' mems='1024'/>
</numa>
</cpu>

However, there is a problem here. The specification isn't granular
enough to specify which CPU is part of which core. As you can see in
the topology diagram, CPUs 0 and 8 belong to one core, CPUs 1 and 9
belong to another, and so on. So the whole point of specifying all the
CPUs explicitly in the specification gets defeated.
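
To make the mismatch concrete, here is a small Python sketch of what a
consumer would conclude if it derived core membership from the
<topology> element alone. It assumes serial numbering (threads within
a core first, then cores within a socket); that is an assumption of
this sketch, not something the proposal states, and the function
names are hypothetical.

```python
def serial_position(cpu, cores, threads):
    """Map a CPU id to (socket, core, thread) under serial numbering."""
    socket = cpu // (cores * threads)
    core = (cpu // threads) % cores
    thread = cpu % threads
    return socket, core, thread


def thread_siblings(cpu, sockets, cores, threads):
    """List every CPU that shares a core with `cpu` under serial numbering."""
    total = sockets * cores * threads
    target = serial_position(cpu, cores, threads)[:2]  # (socket, core)
    return [n for n in range(total)
            if serial_position(n, cores, threads)[:2] == target]
```

For the Nehalem topology (sockets=2, cores=4, threads=2),
thread_siblings(0, 2, 4, 2) returns [0, 1], whereas the real hardware
pairs CPU 0 with CPU 8. The node cpus list carries no information that
would let a consumer recover the hardware pairing.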

Dunnington
---------------
Topology desc: 2 nodes, 4 sockets in each node, 6 cores in each socket.

<cpu>
<topology sockets='8' cores='6' threads='1'/>
<numa>
        <node cpus='0-23' mems='1024'/>
        <node cpus='24-47' mems='1024'/>
</numa>
</cpu>

The same problem exists here. CPUs 0,4,8,12,16,20 belong to a
core but the specification doesn't allow for that.

So here are some questions that we need to answer:

- Can we just go with a flat NUMA specification and convert multi-level
NUMA into flat NUMA wherever possible (as in the Magny Cours example
above)?
- Are there topologies where this doesn't work?
- Isn't it enough to enumerate CPUs serially among cores and sockets
and not enumerate them exactly as in real hardware?
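
For the last question, serial enumeration would make the node
elements mechanical to produce. The Python sketch below generates
them from the topology and a node count. It assumes equally sized
nodes with the same memory value, and serial_nodes is a hypothetical
helper, not part of any existing tool.

```python
def serial_nodes(sockets, cores, threads, nodes, mem_per_node):
    """Build flat <node> lines by handing out CPU ids serially, node by node."""
    total = sockets * cores * threads
    if nodes <= 0 or total % nodes != 0:
        raise ValueError("CPUs cannot be split evenly across the nodes")
    per_node = total // nodes
    lines = []
    for i in range(nodes):
        first = i * per_node
        last = first + per_node - 1
        # A single-CPU node is written as '5' rather than '5-5'.
        cpus = str(first) if first == last else f"{first}-{last}"
        lines.append(f"<node cpus='{cpus}' mems='{mem_per_node}'/>")
    return lines
```

serial_nodes(2, 4, 2, 2, 1024) reproduces the two node lines of the
first Nehalem example, and serial_nodes(8, 6, 1, 2, 1024) reproduces
the Dunnington ones.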

Regards,
Bharata.

Bharata B Rao