On Thu, Jan 17, 2013 at 12:12:35AM +0100, Peter Krempa wrote:
On 01/16/13 21:24, Daniel P. Berrange wrote:
>On Wed, Jan 16, 2013 at 05:06:21PM -0300, Amador Pahim wrote:
>>On 01/16/2013 04:30 PM, Daniel P. Berrange wrote:
>>>On Wed, Jan 16, 2013 at 02:15:37PM -0500, Peter Krempa wrote:
>>>>----- Original Message -----
>>>>From: Daniel P. Berrange <berrange(a)redhat.com>
>>>>To: Peter Krempa <pkrempa(a)redhat.com>
>>>>Cc: Jiri Denemark <jdenemar(a)redhat.com>, Amador Pahim <apahim(a)redhat.com>, libvirt-list(a)redhat.com, dougsland(a)redhat.com
>>>>Sent: Wed, 16 Jan 2013 13:39:28 -0500 (EST)
>>>>Subject: Re: [libvirt] [RFC] Data in the <topology> element in the capabilities XML
>>>>
>>>>On Wed, Jan 16, 2013 at 07:31:02PM +0100, Peter Krempa wrote:
>>>>>On 01/16/13 19:11, Daniel P. Berrange wrote:
>>>>>>On Wed, Jan 16, 2013 at 05:28:57PM +0100, Peter Krempa wrote:
>>>>>>>Hi everybody,
>>>>>>>
>>>>>>>a while ago there was a discussion about changing the data that is
>>>>>>>returned in the <topology> sub-element:
>>>>>>>
>>>>>>><capabilities>
>>>>>>>  <host>
>>>>>>>    <cpu>
>>>>>>>      <arch>x86_64</arch>
>>>>>>>      <model>SandyBridge</model>
>>>>>>>      <vendor>Intel</vendor>
>>>>>>>      <topology sockets='1' cores='2' threads='2'/>
>>>>>>>
>>>>>>>
>>>>>>>The data provided here is as of today taken from the nodeinfo
>>>>>>>detection code and thus is really wrong when the fallback mechanisms
>>>>>>>are used.
>>>>>>>
>>>>>>>To get a useful count, the user has to multiply the data by the
>>>>>>>number of NUMA nodes in the host. With the fallback detection code
>>>>>>>used for nodeinfo the NUMA node count used to get the CPU count
>>>>>>>should be 1 instead of the actual number.
>>>>>>>
>>>>>>>As Jiri proposed, I think we should change this output to separate
>>>>>>>detection code that will not take into account NUMA nodes for this
>>>>>>>output and will rather provide data as the "lscpu" command does.
>>>>>>>
>>>>>>>This change will make the data provided by the element standalone
>>>>>>>and also usable in guest XMLs to mirror the host's topology.
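
(A minimal sketch of the workaround described above -- assuming the Python
libvirt bindings and xml.etree; the element paths match the capabilities
XML, the rest is illustrative only:)

import xml.etree.ElementTree as ET
import libvirt

# Today <topology> effectively reports sockets per NUMA node, so a
# usable total has to be computed by multiplying by the cell count.
conn = libvirt.openReadOnly(None)
caps = ET.fromstring(conn.getCapabilities())

topo = caps.find('./host/cpu/topology')
sockets = int(topo.get('sockets'))
cores = int(topo.get('cores'))
threads = int(topo.get('threads'))

cells = caps.find('./host/topology/cells')
nodes = int(cells.get('num')) if cells is not None else 1

print('logical CPUs: %d' % (nodes * sockets * cores * threads))
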
>>>>>>Well there are 2 parts which need to be considered here. What do we report
>>>>>>in the host capabilities, and how do you configure guest XML.
>>>>>>
>>>>>> From a historical compatibility pov I don't think we should be changing
>>>>>>the host capabilities at all. Simply document that 'sockets' is treated
>>>>>>as sockets-per-node everywhere, and that it is wrong in the case of
>>>>>>machines where a socket can internally have multiple NUMA nodes.
>>>>>I too am somewhat concerned about changing this output for
>>>>>historical reasons.
>>>>>>Apps should be using the separate NUMA <topology> data in the capabilities
>>>>>>instead of the CPU <topology> data, to get accurate CPU counts.
>>>>> From the NUMA <topology> the management apps can't tell if the CPU
>>>>>is a core or a thread. For example oVirt/VDSM bases its decisions on
>>>>>this information.
>>>>Then, we should add information to the NUMA topology XML to indicate
>>>>which of the child <cpu> elements are sibling cores or threads.
>>>>
>>>>Perhaps add a 'socket_id' + 'core_id' attribute to every <cpu>.
>>>
>>>>In this case, we will also need to add the thread siblings and
>>>>perhaps even core siblings information to allow reliable detection.
>>>The combination of core_id/socket_id lets you determine that. If two
>>>cores have the same socket_id then they are cores or threads within the
>>>same socket. If two <cpu> have the same socket_id & core_id then they
>>>are threads within the same core.
>>
>>Not true for the AMD Magny-Cours 6100 series, where different cores can
>>share the same physical_id and core_id, and they are not threads.
>>These processors have two NUMA nodes inside the same "package" (aka
>>socket) and they share the same core ID set. Annoying.
>
>I don't believe there's a problem with that. This example XML
>shows a machine with 4 NUMA nodes spread across 2 sockets, each node
>containing 2 cores with 2 threads per core, giving 16 logical CPUs:
>
>  <topology>
>    <cells num='4'>
>      <cell id='0'>
>        <cpus num='4'>
>          <cpu id='0' socket_id='0' core_id='0'/>
>          <cpu id='1' socket_id='0' core_id='0'/>
>          <cpu id='2' socket_id='0' core_id='1'/>
>          <cpu id='3' socket_id='0' core_id='1'/>
>        </cpus>
>      </cell>
>      <cell id='1'>
>        <cpus num='4'>
>          <cpu id='4' socket_id='0' core_id='0'/>
>          <cpu id='5' socket_id='0' core_id='0'/>
>          <cpu id='6' socket_id='0' core_id='1'/>
>          <cpu id='7' socket_id='0' core_id='1'/>
>        </cpus>
>      </cell>
>      <cell id='2'>
>        <cpus num='4'>
>          <cpu id='8' socket_id='1' core_id='0'/>
>          <cpu id='9' socket_id='1' core_id='0'/>
>          <cpu id='10' socket_id='1' core_id='1'/>
>          <cpu id='11' socket_id='1' core_id='1'/>
>        </cpus>
>      </cell>
>      <cell id='3'>
>        <cpus num='4'>
>          <cpu id='12' socket_id='1' core_id='0'/>
>          <cpu id='13' socket_id='1' core_id='0'/>
>          <cpu id='14' socket_id='1' core_id='1'/>
>          <cpu id='15' socket_id='1' core_id='1'/>
>        </cpus>
>      </cell>
>    </cells>
>  </topology>
>
>I believe there's enough info there to determine all the co-location
>aspects of all the sockets/cores/threads involved.
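
(For reference, a minimal sketch -- Python with xml.etree, the XML string
abbreviated to the first two cells of the example above, and grouping
core_id per cell is an assumption -- of how a management app might pull
the co-location information out of such a document:)

import xml.etree.ElementTree as ET
from collections import defaultdict

# Abbreviated copy of the example <topology> above (first two cells).
topology_xml = """
<topology>
  <cells num='2'>
    <cell id='0'>
      <cpus num='4'>
        <cpu id='0' socket_id='0' core_id='0'/>
        <cpu id='1' socket_id='0' core_id='0'/>
        <cpu id='2' socket_id='0' core_id='1'/>
        <cpu id='3' socket_id='0' core_id='1'/>
      </cpus>
    </cell>
    <cell id='1'>
      <cpus num='4'>
        <cpu id='4' socket_id='0' core_id='0'/>
        <cpu id='5' socket_id='0' core_id='0'/>
        <cpu id='6' socket_id='0' core_id='1'/>
        <cpu id='7' socket_id='0' core_id='1'/>
      </cpus>
    </cell>
  </cells>
</topology>
"""

root = ET.fromstring(topology_xml)
sockets = set()
threads_in_core = defaultdict(list)  # (cell, socket, core) -> cpu ids

for cell in root.findall('./cells/cell'):
    for cpu in cell.findall('./cpus/cpu'):
        sockets.add(cpu.get('socket_id'))
        key = (cell.get('id'), cpu.get('socket_id'), cpu.get('core_id'))
        threads_in_core[key].append(cpu.get('id'))

print('sockets seen:', sorted(sockets))
for (cell, sock, core), cpus in sorted(threads_in_core.items()):
    print('cell %s, socket %s, core %s -> CPUs %s' % (cell, sock, core, cpus))
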
Well, not for all machines in the wild out there. This is very similar
to the approach that libvirt uses now to detect the topology, and it is
not enough to detect threads on AMD Bulldozer, as the CPUs corresponding
to the threads have different core_ids (they are also considered cores
from the perspective of the kernel). This is unfortunate for
virtualization management tools such as oVirt that still consider the
AMD Bulldozer "module" to be one core with two threads, even if it
registers as two cores.

For AMD Bulldozer to be detected correctly, we would need to expose the
thread_ids along with thread sibling information so that the two threads
belonging together can be determined.
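
(For what it's worth, the kernel already exposes thread sibling sets
through sysfs; a minimal sketch in Python -- Linux-only, paths as found in
current kernels, purely illustrative -- of reading the data such detection
code would need:)

import glob

# Collect the thread sibling set reported for every CPU present in sysfs.
# CPUs listed together in one thread_siblings_list entry are thread
# siblings as far as the kernel is concerned.
siblings = set()
for path in glob.glob('/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list'):
    with open(path) as f:
        siblings.add(f.read().strip())

for group in sorted(siblings):
    print('sibling threads:', group)
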

NB, the socket_id / core_id values in the above XML are *not* intended
to be in any way related to the similarly named values in /proc/cpuinfo.
They are values libvirt assigns to show the topology accurately.

Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|