On Thu, May 05, 2011 at 10:33:46AM -0400, Lee Schermerhorn wrote:
On Thu, 2011-05-05 at 17:38 +0800, Osier Yang wrote:
> Hi, All,
>
> This is a simple implementation of NUMA tuning support based on the
> 'numactl' binary. It currently only supports binding memory to specified
> nodes, using the "--membind" option. It may need to support more, but I'd
> like to send it early to make sure the principle is correct.
>
> Ideally, NUMA tuning support should be added to qemu-kvm first, so that
> it provides command-line options and all libvirt needs to do is pass
> those options to qemu-kvm. Unfortunately qemu-kvm doesn't support this
> yet, so all we can do for now is use numactl. That forks a process, which
> is a bit more expensive than qemu-kvm doing NUMA tuning internally with
> libnuma, but I guess it shouldn't matter much.
>
> The NUMA tuning XML looks like:
>
> <numatune>
> <membind nodeset='+0-4,8-12'/>
> </numatune>
>
> Any thoughts/feedback is appreciated.
Osier:
A couple of thoughts/observations:
1) you can accomplish the same thing -- restricting a domain's memory to
a specified set of nodes -- using the cpuset cgroup that is already
associated with each domain. E.g.,
    cgset -r cpuset.mems=<nodeset> /libvirt/qemu/<domain>
Or the equivalent libcgroup call.
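For illustration, the cpuset route could be scripted roughly as follows.
This is only a sketch: the helper names parse_nodeset and cgset_argv are
made up for this example, the cgroup path just follows the
/libvirt/qemu/<domain> layout shown above, and none of this is libvirt's
actual code.

```python
def parse_nodeset(nodeset):
    """Expand a nodeset string such as '0-4,8-12' into a sorted list of node ids.

    Hypothetical helper: it only understands plain numbers and N-M ranges.
    """
    nodes = set()
    for part in nodeset.split(","):
        part = part.strip()
        if not part:
            raise ValueError("empty element in nodeset %r" % nodeset)
        if "-" in part:
            lo, hi = part.split("-", 1)
            lo, hi = int(lo), int(hi)  # int() raises ValueError on junk
            if lo > hi:
                raise ValueError("descending range %r" % part)
            nodes.update(range(lo, hi + 1))
        else:
            nodes.add(int(part))
    return sorted(nodes)


def cgset_argv(domain, nodeset):
    """Build the argv for the cgset call quoted above, without running it."""
    parse_nodeset(nodeset)  # validate only; cgset takes the compact form
    return [
        "cgset",
        "-r",
        "cpuset.mems=%s" % nodeset,
        "/libvirt/qemu/%s" % domain,  # cgroup path as given in this thread
    ]
```

Actually running the resulting argv (e.g. via subprocess) would need the
libcgroup tools installed and sufficient privileges.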
However, numactl is more flexible, especially if you intend to support
more policies such as preferred or interleave. Which leads to the question:
2) Do you really want the full "membind" semantics as opposed to
"preferred" by default? Membind policy will restrict the VM's pages to
the specified nodeset. When all of the nodes in the nodeset reach their
minimum watermark, it will initiate reclaim/stealing and wait for pages
to become available, or the task will be OOM-killed because of the
mempolicy. Membind works the same as cpuset.mems in this respect.
Preferred policy will keep memory allocations [but not vcpu execution]
local to the specified set of nodes as long as there is sufficient
memory, and will silently "overflow" allocations to other nodes when
necessary. I.e., it's a little more forgiving under memory pressure.
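The difference between the two policies can be sketched with a toy model.
This is not how the kernel allocator works: the function name pick_node
and the "free pages > 0" test are assumptions made purely for
illustration, and MemoryError merely stands in for the reclaim/OOM path
described above.

```python
def pick_node(policy, nodeset, free_pages):
    """Return the node a one-page allocation lands on in this toy model.

    policy     -- "membind" or "preferred"
    nodeset    -- ordered list of node ids named by the policy
    free_pages -- dict mapping every node id in the system to its free pages
    """
    if policy not in ("membind", "preferred"):
        raise ValueError("unknown policy %r" % policy)

    # Both policies first try the nodes named in the nodeset.
    for node in nodeset:
        if free_pages.get(node, 0) > 0:
            return node

    if policy == "membind":
        # Strict binding: never leave the nodeset. The real kernel would
        # reclaim, wait, or OOM-kill here; the toy model just gives up.
        raise MemoryError("no free pages in nodeset %r" % (nodeset,))

    # Preferred: silently overflow to any other node with free memory.
    for node in sorted(free_pages):
        if free_pages[node] > 0:
            return node
    raise MemoryError("no free pages on any node")
```

With nodes 0 and 1 exhausted and node 2 still free, the "preferred" call
falls through to node 2, while the "membind" call fails.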
I think we need to make the choice between strict binding and preferred
binding an XML tunable, since both options are valid.
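As a rough sketch of what such a tunable might look like when mapped onto
numactl: the <memory> element, its 'mode' attribute, the values "strict"
and "preferred", and the function name numactl_prefix are all hypothetical
here, not an agreed schema. The only numactl options used are --membind
and --preferred; note that --preferred accepts a single node, so the
sketch rejects multi-node sets in that mode.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from an XML "mode" value to a numactl option.
MODE_TO_FLAG = {
    "strict": "--membind",
    "preferred": "--preferred",
}


def numactl_prefix(numatune_xml):
    """Turn a hypothetical <numatune> fragment into a numactl argv prefix."""
    root = ET.fromstring(numatune_xml)
    mem = root.find("memory")
    if root.tag != "numatune" or mem is None:
        raise ValueError("expected <numatune><memory .../></numatune>")

    # Assumption: a missing mode means strict, matching the posted patch.
    mode = mem.get("mode", "strict")
    nodeset = mem.get("nodeset")
    if mode not in MODE_TO_FLAG:
        raise ValueError("unsupported mode %r" % mode)
    if not nodeset:
        raise ValueError("nodeset attribute is required")

    # numactl --preferred takes one node, so refuse lists and ranges.
    if mode == "preferred" and ("," in nodeset or "-" in nodeset.lstrip("+")):
        raise ValueError("preferred mode needs a single node")

    return ["numactl", "%s=%s" % (MODE_TO_FLAG[mode], nodeset)]
```

The caller would append the qemu-kvm command line to the returned prefix.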
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|