Thanks, Daniel. 

So how about:

For the NUMA format, we still use "memory" to describe the MCDRAM,
but we remove the cpus attribute:
<numa>
  <cell id='3' memory='8' unit='GiB'/>
  <cell id='4' memory='8' unit='GiB'/>
</numa>

At present, for this kind of CPUless NUMA node, we only support MCDRAM as the memory backend.

<domain>
  ...
  <memoryBacking>
    <mcdram nodeset="3-4"/>
  </memoryBacking>
</domain>

And we reject a CPUless NUMA node without a memory backend.
Maybe we will allow it in the future once qemu can handle it well.


Two questions:
1. Should libvirt probe the "host-nodes" for this kind of memory to make a smart mapping?

The qemu arguments would be as follows:
-object memory-backend-ram,size=8G,prealloc=yes,host-nodes=0,policy=bind,id=node3 \
-numa node,nodeid=3,memdev=node3 \

-object memory-backend-ram,size=8G,prealloc=yes,host-nodes=0,policy=bind,id=node4 \
-numa node,nodeid=4,memdev=node4 \
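
As a side note on (1), a minimal sketch of the host-side probe, assuming (as
described for KNL below) that the MCDRAM nodes are exactly the CPUless host
NUMA nodes; the mapping policy itself would still be libvirt's choice:

# List host NUMA nodes whose cpulist is empty; on KNL these are
# the MCDRAM nodes that "host-nodes=" should point at.
for n in /sys/devices/system/node/node*; do
    if [ -z "$(cat "$n/cpulist")" ]; then
        echo "CPUless (candidate MCDRAM) node: ${n##*/node}"
    fi
done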


2. Or should we let the user specify the host-nodes?

<domain>
  ...
  <memoryBacking>
    <mcdram nodeset="3-4" host-nodes="0-1"/>
  </memoryBacking>
</domain>
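
In that case the generated qemu arguments might look like the following (a
sketch; distributing host-nodes 0-1 one-to-one across guest nodes 3-4 is an
assumption, libvirt could also bind both guest nodes to the whole set):

-object memory-backend-ram,size=8G,prealloc=yes,host-nodes=0,policy=bind,id=node3 \
-numa node,nodeid=3,memdev=node3 \
-object memory-backend-ram,size=8G,prealloc=yes,host-nodes=1,policy=bind,id=node4 \
-numa node,nodeid=4,memdev=node4 \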


BR
ShaoHe Feng

On 21 Dec 2016 18:25, Daniel P. Berrange wrote:
On Wed, Dec 21, 2016 at 12:51:29PM +0800, Feng, Shaohe wrote:
Thanks, Dolpher.

Reply inline.


On 21 Dec 2016 11:56, Du, Dolpher wrote:
Shaohe was dropped from the loop, adding him back.

-----Original Message-----
From: He Chen [mailto:he.chen@linux.intel.com]
Sent: Friday, December 9, 2016 3:46 PM
To: Daniel P. Berrange <berrange@redhat.com>
Cc: libvir-list@redhat.com; Du, Dolpher <dolpher.du@intel.com>; Zyskowski,
Robert <robert.zyskowski@intel.com>; Daniluk, Lukasz
<lukasz.daniluk@intel.com>; Zang, Rui <rui.zang@intel.com>;
jdenemar@redhat.com
Subject: Re: [libvirt] [RFC] phi support in libvirt

On Mon, Dec 05, 2016 at 04:12:22PM +0000, Feng, Shaohe wrote:
Hi all:

As we know, Intel® Xeon Phi targets high-performance computing and
other parallel workloads.
Now that qemu supports Phi virtualization, it is time for libvirt to
support Phi.
Can you provide a pointer to the relevant QEMU changes?

Xeon Phi Knights Landing (KNL) has two primary hardware features: one
is up to 288 CPUs, which needs patches to support (we are pushing them);
the other is Multi-Channel DRAM (MCDRAM), which does not need any changes
currently.

Let me introduce more about MCDRAM: MCDRAM is on-package high-bandwidth
memory (~500GB/s).

On the KNL platform, the hardware exposes MCDRAM to the OS as a separate,
CPUless and remote NUMA node, so that MCDRAM will not be allocated by
default (since the MCDRAM node has no CPUs, every CPU regards the MCDRAM
node as a remote node). In this way, MCDRAM can be reserved for certain
specific applications.
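
For illustration, "numactl --hardware" on such a host shows the MCDRAM
node with memory but an empty cpus line (abbreviated, hypothetical output):

available: 2 nodes (0-1)
node 0 cpus: 0 1 2 ... 271
node 0 size: 98304 MB
node 1 cpus:
node 1 size: 16384 MB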

Different from a traditional x86 server, Phi has a special NUMA node
with Multi-Channel DRAM (MCDRAM) but without any CPU.

Now libvirt requires a nonempty 'cpus' attribute for a NUMA cell, for example:
<numa>
   <cell id='0' cpus='0-239' memory='80' unit='GiB'/>
   <cell id='1' cpus='240-243' memory='16' unit='GiB'/>
</numa>

In order to support Phi virtualization, libvirt needs to allow a NUMA
cell definition without the 'cpus' attribute.

Such as:
<numa>
   <cell id='0' cpus='0-239' memory='80' unit='GiB'/>
   <cell id='1' memory='16' unit='GiB'/>
</numa>

When a cell is defined without 'cpus', qemu will allocate its memory from
MCDRAM by default instead of DDR.
There are separate concepts at play which your description here is mixing up.

First is the question of whether the guest NUMA node can be created with
only RAM or CPUs, or a mix of both.
Second is the question of what kind of host RAM (MCDRAM vs DDR) is used
as the backing store for the guest.
The guest NUMA node should be created with memory only (keeping the same
topology as the host's), and the more important thing is that the memory
should be bound to (come from) the host MCDRAM node.
So I suggest libvirt distinguish the MCDRAM.

And the MCDRAM NUMA config would be as follows, adding an "mcdram" attribute
to the "cell" element:
<numa>
  <cell id='0' cpus='0-239' memory='80' unit='GiB'/>
  <cell id='1' mcdram='16' unit='GiB'/>
</numa>
No, that is not backwards compatible for applications using libvirt.

We already have a place for storing info about memory backing type,
which we use for huge pages. mcdram should use the same approach
IMHO, e.g.

<domain>
  ...
  <memoryBacking>
    <mcdram nodeset="3-4"/>
  </memoryBacking>
</domain>

to indicate that nodes 3 & 4 should use mcdram.
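
For comparison, the existing huge pages syntax under the same element (the
page size and nodeset here are just example values):

<domain>
  ...
  <memoryBacking>
    <hugepages>
      <page size='1' unit='GiB' nodeset='3-4'/>
    </hugepages>
  </memoryBacking>
</domain>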

Regards,
Daniel