Thanks, Daniel.

So how about:

For the NUMA format,
we still use "memory" to describe the MCDRAM,
but we remove the cpus attribute.
<numa>
<cell id='3' memory='8' unit='GiB'/>
<cell id='4' memory='8' unit='GiB'/>
</numa>

At present, for this kind of CPUless NUMA node, we only support mcdram as the memory backend.

<domain>
...
<memoryBacking>
<mcdram nodeset="3-4"/>
</memoryBacking>
</domain>

And we reject a CPUless NUMA node without a memory backend.
Maybe we will allow it in the future, once qemu can handle it well.
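
To make the rejection rule concrete, here is a rough sketch of the check in Python (not libvirt's C code). It assumes the proposed <mcdram nodeset="..."/> element, which is not part of the current schema, and the helper names parse_nodeset and unbacked_cpuless_cells are made up for illustration.

```python
import xml.etree.ElementTree as ET


def parse_nodeset(spec):
    """Expand a nodeset string such as "3-4,6" into a set of ints."""
    nodes = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            nodes.update(range(int(lo), int(hi) + 1))
        else:
            nodes.add(int(part))
    return nodes


def unbacked_cpuless_cells(domain_xml):
    """Return ids of NUMA cells with no cpus and no mcdram backing.

    Assumes the proposed (hypothetical) <mcdram nodeset="..."/> element
    under <memoryBacking>; a non-empty result means "reject the config".
    """
    root = ET.fromstring(domain_xml)
    backed = set()
    for mcdram in root.iter("mcdram"):
        backed |= parse_nodeset(mcdram.get("nodeset", ""))
    return sorted(
        int(cell.get("id"))
        for cell in root.iter("cell")
        if not cell.get("cpus") and int(cell.get("id")) not in backed
    )
```

With cells 3 and 4 defined without cpus and nodeset="3-4", the function returns an empty list; if the nodeset only covered node 3, it would return [4] and the domain would be rejected.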

A question:
1. Should libvirt probe the "host-nodes" for this kind of memory to make a smart map?

The qemu arguments would be as follows:
-object memory-backend-ram,size=8G,prealloc=yes,host-nodes=0,policy=bind,id=node3 \
-numa node,nodeid=3,memdev=node3 \

-object memory-backend-ram,size=8G,prealloc=yes,host-nodes=0,policy=bind,id=node4 \
-numa node,nodeid=4,memdev=node4 \

2. Or should we let the user specify the host-nodes?
<domain>
...
<memoryBacking>
<mcdram nodeset="3-4" host-nodes="0-1"/>
</memoryBacking>
</domain>
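
Either way, once the guest-node to host-node mapping is known, generating the qemu arguments is mechanical. A minimal sketch in Python, assuming the mapping is handed in as a plain dict (how it is obtained, probed or user-specified, is exactly the open question above); qemu_numa_args is a hypothetical helper name, and the option strings are the ones from the command line shown earlier.

```python
def qemu_numa_args(cells, host_nodes):
    """Build qemu arguments for CPUless guest NUMA nodes.

    cells:      {guest_node_id: size_in_GiB}
    host_nodes: {guest_node_id: host_node_spec}, an assumed input;
                it could come from probing or from the user.
    """
    args = []
    for node_id in sorted(cells):
        memdev = f"node{node_id}"
        args += [
            "-object",
            f"memory-backend-ram,size={cells[node_id]}G,prealloc=yes,"
            f"host-nodes={host_nodes[node_id]},policy=bind,id={memdev}",
            "-numa",
            f"node,nodeid={node_id},memdev={memdev}",
        ]
    return args
```

For example, qemu_numa_args({3: 8, 4: 8}, {3: 0, 4: 1}) yields one -object/-numa pair per guest node, with node 3 bound to host node 0 and node 4 bound to host node 1.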

BR
ShaoHe Feng

On 2016-12-21 18:25, Daniel P. Berrange wrote:

-----Original Message-----
From: He Chen [mailto:he.chen@linux.intel.com]
Sent: Friday, December 9, 2016 3:46 PM
To: Daniel P. Berrange <berrange@redhat.com>
Cc: libvir-list@redhat.com; Du, Dolpher <dolpher.du@intel.com>; Zyskowski, Robert <robert.zyskowski@intel.com>; Daniluk, Lukasz <lukasz.daniluk@intel.com>; Zang, Rui <rui.zang@intel.com>; jdenemar@redhat.com
Subject: Re: [libvirt] [RFC] phi support in libvirt

Xeon Phi Knights Landing (KNL) has two primary hardware features. One is support for up to 288 CPUs, which needs patches that we are pushing; the other is Multi-Channel DRAM (MCDRAM), which does not need any changes currently.

Let me introduce MCDRAM in more detail. MCDRAM is on-package high-bandwidth memory (~500GB/s). On the KNL platform, the hardware exposes MCDRAM to the OS as a separate, CPUless, remote NUMA node, so that MCDRAM will not be allocated by default (since the MCDRAM node has no CPUs, every CPU regards it as a remote node). In this way, MCDRAM can be reserved for specific applications.

Different from a traditional x86 server, Phi has a special NUMA node with Multi-Channel DRAM (MCDRAM) but without any CPUs.

Now libvirt requires a non-empty cpus argument for a NUMA node, such as:
<numa>
<cell id='0' cpus='0-239' memory='80' unit='GiB'/>
<cell id='1' cpus='240-243' memory='16' unit='GiB'/>
</numa>

In order to support Phi virtualization, libvirt needs to allow a NUMA cell definition without the 'cpus' attribute, such as:
<numa>
<cell id='0' cpus='0-239' memory='80' unit='GiB'/>
<cell id='1' memory='16' unit='GiB'/>
</numa>

When a cell has no 'cpus', qemu will allocate memory by default MCDRAM

as the backing store for the guest

The guest NUMA node should be created with memory only (the same as the host's), and more importantly, the memory should be bound to (come from) the host MCDRAM node.

So I suggest libvirt distinguish the MCDRAM. The MCDRAM NUMA config would be as follows, adding a "mcdram" attribute to the "cell" element:
<numa>
<cell id='0' cpus='0-239' memory='80' unit='GiB'/>
<cell id='1' mcdram='16' unit='GiB'/>
</numa>

No, that is not backwards compatible for applications using libvirt. We already have a place for storing info about memory backing type, which we use for huge pages. mcdram should use the same approach IMHO, e.g.

<domain>
...
<memoryBacking>
<mcdram nodeset="3-4"/>
</memoryBacking>
</domain>

to indicate that nodes 3 & 4 should use mcdram.

Regards,
Daniel