Thanks, Daniel.
So how about:
for the NUMA format,
we still use "memory" to describe the MCDRAM,
but we remove the cpus attribute.
<numa>
<cell id='3' memory='8' unit='GiB'/>
<cell id='4' memory='8' unit='GiB'/>
</numa>
At present, for this kind of CPUless NUMA node, we only support mcdram as
the memory backend.
<domain>
...
<memoryBacking>
<mcdram nodeset="3-4"/>
</memoryBacking>
</domain>
And we reject a CPUless NUMA node without a memory backend.
Maybe we will allow it in the future, once qemu can handle it well.
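To make the rule above concrete, here is a minimal Python sketch of the check (an illustration only, not libvirt code). It assumes the <cell> and <mcdram nodeset="..."/> syntax proposed in this mail; the names parse_nodeset and find_unbacked_cpuless_cells are made up for the example.

```python
import xml.etree.ElementTree as ET


def parse_nodeset(spec):
    """Expand a nodeset string such as "3-4" or "0,2-3" into a set of ints."""
    nodes = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            nodes.update(range(int(first), int(last) + 1))
        else:
            nodes.add(int(part))
    return nodes


def find_unbacked_cpuless_cells(domain_xml):
    """Return the sorted ids of CPU-less cells not covered by any <mcdram nodeset>."""
    root = ET.fromstring(domain_xml)
    backed = set()
    for mcdram in root.iter("mcdram"):
        backed |= parse_nodeset(mcdram.get("nodeset", ""))
    # A cell counts as CPU-less when it has no (or an empty) cpus attribute.
    cpuless = {int(cell.get("id")) for cell in root.iter("cell")
               if not cell.get("cpus")}
    return sorted(cpuless - backed)


# Hypothetical domain: cell 3 is backed by mcdram, cell 4 is not.
EXAMPLE = """
<domain>
  <numa>
    <cell id='0' cpus='0-239' memory='80' unit='GiB'/>
    <cell id='3' memory='8' unit='GiB'/>
    <cell id='4' memory='8' unit='GiB'/>
  </numa>
  <memoryBacking>
    <mcdram nodeset="3"/>
  </memoryBacking>
</domain>
"""

if __name__ == "__main__":
    # Cell 4 has neither cpus nor an mcdram backing, so it would be rejected.
    print(find_unbacked_cpuless_cells(EXAMPLE))
```

Any id returned by find_unbacked_cpuless_cells would be a cell that the domain definition gets rejected for.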
A question:
1. Should libvirt probe the "host-nodes" for this kind of memory to make
a smart map?
The qemu arguments would be as follows:
-object memory-backend-ram,size=8G,prealloc=yes,host-nodes=0,policy=bind,id=node3 \
-numa node,nodeid=3,memdev=node3 \
-object memory-backend-ram,size=8G,prealloc=yes,host-nodes=0,policy=bind,id=node4 \
-numa node,nodeid=4,memdev=node4 \
2. Or should we let the user specify the host-nodes?
<domain>
...
<memoryBacking>
<mcdram nodeset="3-4" host-nodes="0-1"/>
</memoryBacking>
</domain>
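Whichever option we pick, the last step is the same: turn a guest-node to host-node map into qemu arguments. Below is a rough Python sketch of that step only. It assumes the map has already been obtained (probed by libvirt in option 1, or taken from the proposed host-nodes attribute in option 2). The function build_numa_args and its parameters are made-up names, and how nodeset="3-4" would pair up with host-nodes="0-1" is still open, so the sketch takes the map explicitly.

```python
def build_numa_args(cell_sizes_gib, host_node_map):
    """Build qemu -object/-numa arguments for CPU-less guest cells.

    cell_sizes_gib: guest node id -> memory size in GiB
    host_node_map:  guest node id -> host-nodes string (e.g. "0" or "0-1")
    """
    args = []
    for node_id in sorted(cell_sizes_gib):
        memdev = f"node{node_id}"
        args += [
            "-object",
            f"memory-backend-ram,size={cell_sizes_gib[node_id]}G,prealloc=yes,"
            f"host-nodes={host_node_map[node_id]},policy=bind,id={memdev}",
            "-numa",
            f"node,nodeid={node_id},memdev={memdev}",
        ]
    return args


if __name__ == "__main__":
    # Both guest cells bound to host node 0, as in the command line above.
    for arg in build_numa_args({3: 8, 4: 8}, {3: "0", 4: "0"}):
        print(arg)
```

Keeping the map as an explicit input means the same code path serves both options; only the source of the map differs.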
BR
ShaoHe Feng
On 2016-12-21 18:25, Daniel P. Berrange wrote:
On Wed, Dec 21, 2016 at 12:51:29PM +0800, Feng, Shaohe wrote:
> Thanks, Dolpher.
>
> Reply inline.
>
>
> On 2016-12-21 11:56, Du, Dolpher wrote:
>> Shaohe was dropped from the loop, adding him back.
>>
>>> -----Original Message-----
>>> From: He Chen [mailto:he.chen@linux.intel.com]
>>> Sent: Friday, December 9, 2016 3:46 PM
>>> To: Daniel P. Berrange <berrange(a)redhat.com>
>>> Cc: libvir-list(a)redhat.com; Du, Dolpher <dolpher.du(a)intel.com>;
>>> Zyskowski, Robert <robert.zyskowski(a)intel.com>; Daniluk, Lukasz
>>> <lukasz.daniluk(a)intel.com>; Zang, Rui <rui.zang(a)intel.com>;
>>> jdenemar(a)redhat.com
>>> Subject: Re: [libvirt] [RFC] phi support in libvirt
>>>
>>>> On Mon, Dec 05, 2016 at 04:12:22PM +0000, Feng, Shaohe wrote:
>>>>> Hi all:
>>>>>
>>>>> As we know, Intel® Xeon Phi targets high-performance computing and
>>>>> other parallel workloads.
>>>>> Now that qemu supports Phi virtualization, it is time for libvirt to
>>>>> support Phi.
>>>> Can you provide a pointer to the relevant QEMU changes?
>>>>
>>> Xeon Phi Knights Landing (KNL) has two primary hardware features. One
>>> is up to 288 CPUs, which needs patches to support, and we are pushing
>>> them; the other is Multi-Channel DRAM (MCDRAM), which does not need any
>>> changes currently.
>>>
>>> To introduce MCDRAM a bit more: it is on-package high-bandwidth
>>> memory (~500GB/s).
>>>
>>> On the KNL platform, the hardware exposes MCDRAM to the OS as a separate,
>>> CPUless and remote NUMA node, so that MCDRAM is not allocated by default
>>> (since the MCDRAM node has no CPU, every CPU regards it as a remote
>>> node). In this way, MCDRAM can be reserved for certain specific
>>> applications.
>>>
>>>>> Unlike a traditional x86 server, Phi has a special NUMA
>>>>> node with Multi-Channel DRAM (MCDRAM) but without any CPU.
>>>>>
>>>>> Currently libvirt requires a non-empty cpus attribute for a NUMA node, such as:
>>>>> <numa>
>>>>> <cell id='0' cpus='0-239' memory='80' unit='GiB'/>
>>>>> <cell id='1' cpus='240-243' memory='16' unit='GiB'/>
>>>>> </numa>
>>>>>
>>>>> In order to support Phi virtualization, libvirt needs to allow a NUMA
>>>>> cell definition without the 'cpus' attribute.
>>>>>
>>>>> Such as:
>>>>> <numa>
>>>>> <cell id='0' cpus='0-239' memory='80' unit='GiB'/>
>>>>> <cell id='1' memory='16' unit='GiB'/>
>>>>> </numa>
>>>>>
>>>>> When a cell has no 'cpus' attribute, qemu will by default allocate its
>>>>> memory from MCDRAM instead of DDR.
>>>> There are separate concepts at play which your description here is
>>>> mixing up.
>>>>
>>>> First is the question of whether the guest NUMA node can be created with
>>>> only RAM or CPUs, or a mix of both.
>>>> Second is the question of what kind of host RAM (MCDRAM vs DDR) is used
>>>> as the backing store for the guest.
>>> The guest NUMA node should be created with memory only (the same as
>>> the host's) and, more importantly, the memory should be bound to (come
>>> from) the host MCDRAM node.
> So I suggest libvirt distinguish the MCDRAM.
>
> The MCDRAM NUMA config would be as follows, adding a "mcdram" attribute to
> the "cell" element:
> <numa>
> <cell id='0' cpus='0-239' memory='80' unit='GiB'/>
> <cell id='1' mcdram='16' unit='GiB'/>
> </numa>
No, that is not backwards compatible for applications using libvirt.
We already have a place for storing info about memory backing type,
which we use for huge pages. mcdram should use the same approach
IMHO, e.g.
<domain>
...
<memoryBacking>
<mcdram nodeset="3-4"/>
</memoryBacking>
</domain>
to indicate that nodes 3 & 4 should use mcdram.
Regards,
Daniel