On 01/16/2013 04:30 PM, Daniel P. Berrange wrote:
On Wed, Jan 16, 2013 at 02:15:37PM -0500, Peter Krempa wrote:
> ----- Original Message -----
> From: Daniel P. Berrange <berrange(a)redhat.com>
> To: Peter Krempa <pkrempa(a)redhat.com>
> Cc: Jiri Denemark <jdenemar(a)redhat.com>, Amador Pahim <apahim(a)redhat.com>, libvirt-list(a)redhat.com, dougsland(a)redhat.com
> Sent: Wed, 16 Jan 2013 13:39:28 -0500 (EST)
> Subject: Re: [libvirt] [RFC] Data in the <topology> element in the capabilities XML
>
> On Wed, Jan 16, 2013 at 07:31:02PM +0100, Peter Krempa wrote:
>> On 01/16/13 19:11, Daniel P. Berrange wrote:
>>> On Wed, Jan 16, 2013 at 05:28:57PM +0100, Peter Krempa wrote:
>>>> Hi everybody,
>>>>
>>>> a while ago there was a discussion about changing the data that is
>>>> returned in the <topology> sub-element:
>>>>
>>>> <capabilities>
>>>>   <host>
>>>>     <cpu>
>>>>       <arch>x86_64</arch>
>>>>       <model>SandyBridge</model>
>>>>       <vendor>Intel</vendor>
>>>>       <topology sockets='1' cores='2' threads='2'/>
>>>>
>>>>
>>>> The data provided here is currently taken from the nodeinfo
>>>> detection code and is thus really wrong when the fallback mechanisms
>>>> are used.
>>>>
>>>> To get a useful CPU count, the user has to multiply the data by the
>>>> number of NUMA nodes in the host. When the fallback detection code is
>>>> used for nodeinfo, however, the NUMA node count used to get the CPU
>>>> count should be 1 instead of the actual number.
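>>>>
>>>> For illustration (the node count here is just an example): with the
>>>> snippet above, sockets='1' cores='2' threads='2' on a two-node host
>>>> means
>>>>
>>>>   2 nodes * 1 socket * 2 cores * 2 threads = 8 logical CPUs
>>>>
>>>> while reading the element at face value gives only 1 * 2 * 2 = 4, and
>>>> when the fallback detection kicks in the multiplication by the node
>>>> count must not be done at all.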
>>>>
>>>> As Jiri proposed, I think we should change this output to use separate
>>>> detection code that does not take NUMA nodes into account and instead
>>>> provides the data the way the "lscpu" command does.
>>>>
>>>> This change will make the data provided by the element standalone
>>>> and also usable in guest XMLs to mirror the host's topology.
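>>>>
>>>> For example, fixed numbers like these could then be copied straight
>>>> into a guest definition (values made up, purely illustrative):
>>>>
>>>>   <cpu>
>>>>     <topology sockets='1' cores='2' threads='2'/>
>>>>   </cpu>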
>>> Well, there are two parts which need to be considered here: what we report
>>> in the host capabilities, and how you configure the guest XML.
>>>
>>> From a historical compatibility pov I don't think we should be changing
>>> the host capabilities at all. Simply document that 'sockets' is treated
>>> as sockets-per-node everywhere, and that it is wrong in the case of
>>> machines where a socket can internally have multiple NUMA nodes.
>> I'm also somewhat concerned about changing this output, for
>> historical reasons.
>>> Apps should be using the separate NUMA <topology> data in the capabilities
>>> instead of the CPU <topology> data, to get accurate CPU counts.
>> From the NUMA <topology> the management apps can't tell whether a CPU
>> is a core or a thread. For example, oVirt/VDSM bases its decisions on
>> this information.
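>>
>> Today the NUMA part of the capabilities only lists plain CPU ids,
>> roughly like this (abbreviated, from memory):
>>
>>   <topology>
>>     <cells num='1'>
>>       <cell id='0'>
>>         <cpus num='4'>
>>           <cpu id='0'/>
>>           <cpu id='1'/>
>>           <cpu id='2'/>
>>           <cpu id='3'/>
>>         </cpus>
>>       </cell>
>>     </cells>
>>   </topology>
>>
>> so there is nothing there to distinguish two threads of one core from
>> two separate cores.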
> Then, we should add information to the NUMA topology XML to indicate
> which of the child <cpu> elements are sibling cores or threads.
>
> Perhaps add a 'socket_id' + 'core_id' attribute to every <cpu>.
> In this case, we will also need to add the thread siblings and
> perhaps even core siblings information to allow reliable detection.
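>
> Something along these lines, for instance (attribute names are only a
> sketch, nothing is settled):
>
>   <cpus num='4'>
>     <cpu id='0' socket_id='0' core_id='0' siblings='0-1'/>
>     <cpu id='1' socket_id='0' core_id='0' siblings='0-1'/>
>     <cpu id='2' socket_id='0' core_id='1' siblings='2-3'/>
>     <cpu id='3' socket_id='0' core_id='1' siblings='2-3'/>
>   </cpus>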
> The combination of core_id/socket_id lets you determine that. If two
> cores have the same socket_id then they are cores or threads within the
> same socket. If two <cpu> have the same socket_id & core_id then they
> are threads within the same core.
Not true for the AMD Magny-Cours 6100 series, where different cores can
share the same physical_id and core_id and yet are not threads. These
processors have two NUMA nodes inside the same "package" (aka socket),
and the two nodes share the same core ID set. Annoying.
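
Roughly (ids made up, just to show the shape of the problem), such a box
could end up reporting something like:

  <cell id='0'>
    <cpus num='2'>
      <cpu id='0' socket_id='0' core_id='0'/>
      <cpu id='1' socket_id='0' core_id='1'/>
    </cpus>
  </cell>
  <cell id='1'>
    <cpus num='2'>
      <cpu id='8' socket_id='0' core_id='0'/>
      <cpu id='9' socket_id='0' core_id='1'/>
    </cpus>
  </cell>

Here cpu 0 and cpu 8 share both socket_id and core_id, yet they are
independent cores in different NUMA nodes of the same package, not
threads.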