On 10/30/12 21:08, Eric Blake wrote:
> On 10/30/2012 05:07 AM, Peter Krempa wrote:
>> Forwarding the response from George-Cristian Bîrzan who initially
>> reported that:
>>
>> George-Cristian has a bunch of identical machines where some report
>> having 4 NUMA cells and some just 1:
>>
>> [...]
>>
>> I did it on two hosts, one with 1 NUMA cell, one with 4 (as I said before,
>> they both only report 12 cores though):
>>
>> http://birzan.org/proc1.png
>> http://birzan.org/proc4.png
> Both bitmaps show all 24 cores, so hwloc is able to read sysfs and
> determine the existence of 2 sockets with 12 cores each, where the
> 12 cores are numbered 0-5 twice according to which bank of cache they
> are tied to. Which version of libvirt was this tested on where libvirt
> was only reporting 12 cores? I thought we already patched that
> with commit 80533ca in 0.10.0. That is, I think proc1.png should result in:
The version of libvirt in the proc1.png case is 0.10.2 from Fedora 17.
> $ virsh nodeinfo
> CPU model:           x86_64
> CPU(s):              24
> CPU frequency:       2200 MHz
> CPU socket(s):       2
> Core(s) per socket:  12
> Thread(s) per core:  1
> NUMA cell(s):        1
> Memory size:         8047272 KiB
Yes, that's what we should report in this case. The problem here is that
(since the hardware topology is in fact identical to the second image) some
of the cores have duplicate IDs and we are not able to detect that
correctly.
We would need a third level of hierarchy, where we detect threads,
to be able to detect duplicate IDs reliably.
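For illustration, here is a minimal standalone sketch (not the actual
libvirt parser) of what that detection amounts to: count distinct
(socket, core) pairs from the standard sysfs topology files instead of
bare core IDs, so the 0-5 numbering that repeats across sockets is not
collapsed:

#include <stdio.h>

#define MAX_CPUS 4096

/* Read one topology id file for a CPU; -1 if absent or unreadable. */
static int read_id(unsigned cpu, const char *file)
{
    char path[256];
    FILE *fp;
    int val = -1;

    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%u/topology/%s", cpu, file);
    if (!(fp = fopen(path, "r")))
        return -1;
    if (fscanf(fp, "%d", &val) != 1)
        val = -1;
    fclose(fp);
    return val;
}

int main(void)
{
    static int socket_of[MAX_CPUS], core_of[MAX_CPUS];
    unsigned cpu, n = 0;

    for (cpu = 0; cpu < MAX_CPUS; cpu++) {
        int s = read_id(cpu, "physical_package_id");
        int c = read_id(cpu, "core_id");
        unsigned i;

        if (s < 0 || c < 0)
            break; /* assume dense CPU numbering for this sketch */

        /* A (socket, core) pair seen before means this CPU is a
         * sibling thread of an already-counted core; comparing the
         * bare core_id would wrongly merge core 3 of socket 0 with
         * core 3 of socket 1. */
        for (i = 0; i < n; i++)
            if (socket_of[i] == s && core_of[i] == c)
                break;
        if (i == n) {
            socket_of[n] = s;
            core_of[n] = c;
            n++;
        }
    }
    printf("distinct cores: %u (of %u CPUs scanned)\n", n, cpu);
    return 0;
}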
> and proc4.png would _ideally_ result in:
> $ virsh nodeinfo
> CPU model:           x86_64
> CPU(s):              24
> CPU frequency:       2200 MHz
> CPU socket(s):       2
> Core(s) per socket:  12
> Thread(s) per core:  1
> NUMA cell(s):        4
> Memory size:         8047272 KiB
Unfortunately, the output is: the number of NUMA nodes, the number of
sockets per NUMA node, the number of cores per socket, and the number of
threads per core. So the correct output would be:
4 nodes, 1 socket, 6 cores, 1 thread (4 × 1 × 6 × 1 = 24 logical CPUs)
> except that virNodeGetInfo() is constrained by backwards compatibility
> to report 'nodes' == 1 in situations where sockets per node is not
> integral (and here, half a socket per node is not integral), so it
> _actually_ would give the same data as proc1.png.
Or we can use this fallback when something (for example, the firmware) is
providing inaccurate information.
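Roughly, the fallback Eric describes amounts to something like this (a
sketch of the rule, not the verbatim libvirt code):

#include <stdio.h>

/* virNodeGetInfo() documents nodes * sockets * cores * threads ==
 * total CPUs, where 'sockets' means sockets per NUMA node.  When the
 * physical sockets do not divide evenly across the nodes, the only
 * back-compatible answer is to pretend there is a single node. */
static void fold_topology(unsigned total_sockets,
                          unsigned *nodes, unsigned *sockets_per_node)
{
    if (*nodes > 1 && total_sockets % *nodes != 0)
        *nodes = 1; /* half a socket per node: give up on per-node data */
    *sockets_per_node = total_sockets / *nodes;
}

int main(void)
{
    unsigned nodes = 4, sockets_per_node;

    fold_topology(2, &nodes, &sockets_per_node); /* 2 sockets, 4 nodes */
    printf("nodes=%u sockets=%u\n", nodes, sockets_per_node); /* 1, 2 */
    return 0;
}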
>>
>> ------
>>
>> I think we should take this patch as it resolves this case. The data
>> reported by the kernel looks OK, and the kernel probably trusts that
>> everything is OK.
> At any rate, I'm looking again at the patch, and the proposed
> linux-test7/node data indeed shows a single NUMA cell with 24 cores
> (matching up to the proc1.png image).
> I think the CPU _is_ reporting the complete NUMA topology through sysfs,
> but that we are probably consolidating information from the wrong files
> and therefore getting confused.
It is, but there's a problem at some other level, as this machine should be
identical to the one with 4 nodes. The problem might be in the
machine firmware or in a ton of other places.
> I guess I need to install the linux-test7 files, then step through the
> code to see what is actually happening.
To do this, you just need to patch one line in nodeinfo.c that defines
the default path. That's the way I'm doing it while testing.
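For example, assuming the 0.10.x layout (the macro name and the test-data
path may differ between versions), the change is roughly:

--- a/src/nodeinfo.c
+++ b/src/nodeinfo.c
-# define SYSFS_SYSTEM_PATH "/sys/devices/system"
+# define SYSFS_SYSTEM_PATH "/path/to/tests/nodeinfodata/linux-test7"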
> Also, what does 'virsh capabilities' report for the <topology>
> section? Whereas 'virsh nodeinfo' is constrained by back-compat to give
> a lame answer for number of NUMA cells, at least 'virsh capabilities'
> should be showing a reasonable representation of the machine's topology.
The capabilities XML was showing the topology as described in the picture,
although it's not correct at that level either.
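For reference, this is what I would expect the host <topology> element to
look like on the 4-node machine, abbreviated (the ids are illustrative):

<topology>
  <cells num='4'>
    <cell id='0'>
      <cpus num='6'>
        <cpu id='0'/>
        ...
        <cpu id='5'/>
      </cpus>
    </cell>
    ... three more cells with 6 CPUs each ...
  </cells>
</topology>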
Peter