So you are saying that 1 physical CPU socket can be associated with
2 NUMA nodes at the same time? If you have only 4 sockets here, then
there are 12 cores per socket, and 6 cores of each socket in a NUMA
node?
Yes, that is correct.
Can you provide the full 'numactl --hardware' output? I guess we're
facing a 2-level NUMA hierarchy, where the first level is done inside
the socket, and the second level is between sockets.
I'm not sure about the details of the topology; Bhavna knows more. Here is the
output of numactl --hardware:
available: 8 nodes (0-7)
node 0 cpus: 0 1 2 3 4 5
node 0 size: 8189 MB
node 0 free: 7670 MB
node 1 cpus: 6 7 8 9 10 11
node 1 size: 16384 MB
node 1 free: 15855 MB
node 2 cpus: 12 13 14 15 16 17
node 2 size: 8192 MB
node 2 free: 7901 MB
node 3 cpus: 18 19 20 21 22 23
node 3 size: 16384 MB
node 3 free: 15816 MB
node 4 cpus: 24 25 26 27 28 29
node 4 size: 8192 MB
node 4 free: 7897 MB
node 5 cpus: 30 31 32 33 34 35
node 5 size: 16384 MB
node 5 free: 15820 MB
node 6 cpus: 36 37 38 39 40 41
node 6 size: 8192 MB
node 6 free: 7862 MB
node 7 cpus: 42 43 44 45 46 47
node 7 size: 16384 MB
node 7 free: 15858 MB
node distances:
node   0   1   2   3   4   5   6   7
  0:  10  16  16  22  16  22  16  22
  1:  16  10  22  16  16  22  22  16
  2:  16  22  10  16  16  16  16  16
  3:  22  16  16  10  16  16  22  22
  4:  16  16  16  16  10  16  16  22
  5:  22  22  16  16  16  10  22  16
  6:  16  22  16  22  16  22  10  16
  7:  22  16  16  22  22  16  16  10
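
For reference, the arithmetic behind "6 cores of each socket in a NUMA node"
can be checked with a small script. This is only a sketch: parse_node_cpus is
a hypothetical helper, the input is just the "cpus" lines copied from the
output above, and the socket count of 4 is taken from the discussion because
numactl itself does not report sockets.

```python
import re

# The "node N cpus:" lines copied from the numactl --hardware output above.
NUMACTL_OUTPUT = """\
available: 8 nodes (0-7)
node 0 cpus: 0 1 2 3 4 5
node 1 cpus: 6 7 8 9 10 11
node 2 cpus: 12 13 14 15 16 17
node 3 cpus: 18 19 20 21 22 23
node 4 cpus: 24 25 26 27 28 29
node 5 cpus: 30 31 32 33 34 35
node 6 cpus: 36 37 38 39 40 41
node 7 cpus: 42 43 44 45 46 47
"""


def parse_node_cpus(text):
    """Return {node_id: [cpu ids]} built from the 'node N cpus: ...' lines."""
    nodes = {}
    for line in text.splitlines():
        m = re.match(r"node (\d+) cpus:((?: \d+)*)\s*$", line)
        if m:
            nodes[int(m.group(1))] = [int(c) for c in m.group(2).split()]
    return nodes


# Assumption: 4 physical sockets, as stated in this thread.
SOCKETS = 4

node_cpus = parse_node_cpus(NUMACTL_OUTPUT)
total_cpus = sum(len(cpus) for cpus in node_cpus.values())
cores_per_node = total_cpus // len(node_cpus)
cores_per_socket = total_cpus // SOCKETS
nodes_per_socket = len(node_cpus) // SOCKETS

print("nodes:", len(node_cpus))
print("total cpus:", total_cpus)
print("cores per node:", cores_per_node)
print("cores per socket:", cores_per_socket)
print("nodes per socket:", nodes_per_socket)
```

With 48 CPUs spread over 8 nodes and 4 sockets, each socket spans 2 nodes,
which is the case a single "sockets per node" integer cannot describe.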
What does Xen / 'xm info' report on such a host?
The host is currently reserved to someone else and running RHEL-6, so I can't
provide the information now.
In your example it sounds like we could alternatively lie about the number
of cores per socket, e.g., instead of reporting 0.5 sockets per node with 12
cores, report 1 socket per node, each with 6 cores. Thus each of the reported
sockets would once again only be associated with 1 NUMA node at a time.
Yes, but that would also affect the CPU topology reported in
/capabilities/host/cpu/topology. I think we should only lie about things for
which we have other ways to determine the truth. Marking nodeinfo->nodes
as unreliable, making it 1 for complicated cases and forcing apps to check
the NUMA topology in the XML seems like a more general approach to me. It
would also work with architectures where NUMA nodes may contain a
different number of sockets (not that I've seen it but I'm sure someone will
come up with it one day :-P).
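
To make the proposal concrete, here is a rough sketch of the fallback in
Python. It is not the libvirt implementation: NodeInfo only mirrors the
nodes/sockets/cores/threads fields of nodeinfo, build_nodeinfo is a
hypothetical helper, and the "complicated case" test (sockets not evenly
divisible among nodes) is just one possible criterion. One thread per core
is assumed for the host above.

```python
from dataclasses import dataclass


@dataclass
class NodeInfo:
    nodes: int    # NUMA nodes (1 means "unreliable, check the XML")
    sockets: int  # sockets per node
    cores: int    # cores per socket
    threads: int  # threads per core


def build_nodeinfo(numa_nodes, total_sockets, cores_per_socket, threads_per_core):
    """Hypothetical helper: report real values only when sockets map cleanly to nodes."""
    if total_sockets % numa_nodes == 0:
        # Simple case: every node holds a whole number of sockets.
        return NodeInfo(numa_nodes, total_sockets // numa_nodes,
                        cores_per_socket, threads_per_core)
    # Complicated case: pretend there is one node holding all sockets, so the
    # socket and core counts stay truthful and the CPU total still works out.
    return NodeInfo(1, total_sockets, cores_per_socket, threads_per_core)


def total_cpus(info):
    """CPU count implied by the reported topology."""
    return info.nodes * info.sockets * info.cores * info.threads


# The host from this thread: 8 nodes, 4 sockets, 12 cores per socket.
split_host = build_nodeinfo(8, 4, 12, 1)
# A made-up simple host: 2 nodes, 2 sockets, 4 cores, 2 threads per core.
simple_host = build_nodeinfo(2, 2, 4, 2)

print(split_host, total_cpus(split_host))
print(simple_host, total_cpus(simple_host))
```

In both cases the product of the four fields still equals the number of
CPUs, and /capabilities/host/cpu/topology is left untouched; only the node
count stops being meaningful in the complicated case.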
Jirka