On 05/06/2011 04:43 AM, Bill Gray wrote:
Thanks for the feedback Lee!
One reason to use "membind" instead of "preferred" is that one can
prefer only a single node. For large guests, you can specify multiple
nodes with "membind". I think "preferred" would be preferred if it
allowed multiple nodes.
- Bill
Hi, Bill
Will "preferred" be still useful even if it only support single node?
Regards
Osier
On 05/05/2011 10:33 AM, Lee Schermerhorn wrote:
> On Thu, 2011-05-05 at 17:38 +0800, Osier Yang wrote:
>> Hi, All,
>>
>> This is a simple implementation of NUMA tuning support based on the
>> binary program 'numactl'. Currently it only supports binding memory to
>> specified nodes, using the "--membind" option. Perhaps it needs to
>> support more, but I'd like to send it early so that we can make sure
>> the principle is correct.
>>
>> Ideally, NUMA tuning support would be added to qemu-kvm first, so that
>> it could provide command-line options and all libvirt would need to do
>> is pass those options through to qemu-kvm. Unfortunately qemu-kvm
>> doesn't support it yet, so all we can do currently is use numactl.
>> Forking an extra process is a bit more expensive than qemu-kvm
>> supporting NUMA tuning internally via libnuma, but I guess it
>> shouldn't affect things much.
>>
>> The NUMA tuning XML looks like:
>>
>> <numatune>
>>   <membind nodeset='+0-4,8-12'/>
>> </numatune>
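(For the record, with this approach the XML above maps to something like
the following numactl wrapper around the emulator command line; this is
just a sketch, the real invocation carries the full qemu-kvm arguments:)

  numactl --membind=+0-4,8-12 <qemu-kvm command line>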
>>
>> Any thoughts/feedback is appreciated.
>
> Osier:
>
> A couple of thoughts/observations:
>
> 1) you can accomplish the same thing -- restricting a domain's memory to
> a specified set of nodes -- using the cpuset cgroup that is already
> associated with each domain. E.g.,
>
> cgset -r cpuset.mems=<nodeset> /libvirt/qemu/<domain>
>
> Or the equivalent libcgroup call.
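(For reference, the same restriction can also be applied by writing the
cpuset controller directly; the mount point below is just an assumption,
use whatever /proc/mounts reports for the cpuset controller:)

  echo <nodeset> > /cgroup/cpuset/libvirt/qemu/<domain>/cpuset.mems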
>
> However, numactl is more flexible, especially if you intend to support
> more policies such as preferred and interleave. Which leads to the question:
>
> 2) Do you really want the full "membind" semantics as opposed to
> "preferred" by default? Membind policy will restrict the VM's pages to
> the specified nodeset; once all of the nodes in the nodeset reach their
> minimum watermark, it will initiate reclaim/stealing and wait for pages
> to become available, or the task will be OOM-killed because of the
> mempolicy. Membind works the same as cpuset.mems in this respect.
> Preferred policy will keep memory allocations [but not vcpu execution]
> local to the specified set of nodes as long as there is sufficient
> memory, and will silently "overflow" allocations to other nodes when
> necessary. I.e., it's a little more forgiving under memory pressure.
>
> But then pinning a VM's vcpus to the physical cpus of a set of nodes and
> retaining the default local allocation policy will have the same effect
> as "preferred" while ensuring that the VM component tasks execute
> locally to the memory footprint. Currently, I do this by looking up the
> cpulist associated with the node[s] from e.g.,
> /sys/devices/system/node/node<i>/cpulist and using that list with the
> vcpu.cpuset attribute. Adding a 'nodeset' attribute to the
> cputune.vcpupin element would simplify specifying that configuration.
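(A concrete sketch of that workaround, with the node number, cpu list and
vcpu count invented for illustration:)

  cat /sys/devices/system/node/node1/cpulist   # prints e.g. 4-7

and then in the domain XML:

  <vcpu cpuset='4-7'>2</vcpu>

The proposed shortcut would presumably look something like this
(hypothetical, not implemented yet):

  <vcpupin vcpu='0' nodeset='1'/>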
>
> Regards,
> Lee
>
>