On Thu, Nov 17, 2011 at 05:44:10PM +0800, Hu Tao wrote:
> This series does mainly two things:
>
> 1. use cgroup cpuset to manage numa parameters
> 2. add a virsh command numatune to allow user to change numa parameters
>    from command line
>
> Current numa parameters include nodeset and mode, but the controls that
> cgroup cpuset provides don't completely match them. Details:
>
>   params            cpuset
>   ------------------------------------------------------
>   nodeset           cpuset provides cpuset.mems
>   mode strict       cpuset provides cpuset.mem_hardwall
>   mode interleave   cpuset provides cpuset.memory_spread_*
>   mode preferred    no equivalent. !spread to preferred?
This isn't right. There are only 3 existing configs in the
XML currently; the current 'strict' does not map to mem_hardwall,
nor does interleave map to memory_spread, AFAICT.
Currently we have three possible configurations for memory,
with the following semantics:

  mode=strict     - allocation is from designated nodes, or fails
  mode=preferred  - allocation is from designated nodes, or falls back to other nodes
  mode=interleave - allocation is interleaved across designated nodes
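
As a rough illustration only (this is not libvirt code), the three modes
above line up with the Linux memory policies documented in
set_mempolicy(2). The table name and helper below are hypothetical:

```python
# Hypothetical lookup table, not taken from libvirt: maps each XML memory
# mode to the Linux memory policy name documented in set_mempolicy(2).
MODE_TO_MEMPOLICY = {
    "strict": "MPOL_BIND",            # designated nodes only, else fail
    "preferred": "MPOL_PREFERRED",    # designated nodes first, then fall back
    "interleave": "MPOL_INTERLEAVE",  # spread pages across designated nodes
}


def mempolicy_for(mode):
    """Return the policy name for a mode, rejecting unknown modes."""
    try:
        return MODE_TO_MEMPOLICY[mode]
    except KeyError:
        raise ValueError("unsupported numa memory mode: %r" % (mode,))
```
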
In the cgroups cpuset controller you can set:

  cpuset.mems           - memory is allocated from designated nodes, or fails
  cpuset.mem_exclusive  - no other cgroups, except parents or children,
                          can allocate from the nodes listed in cpuset.mems
  cpuset.mem_hardwall   - no other cgroups are allowed to allocate from
                          the nodes listed in cpuset.mems
  cpuset.memory_spread* - control allocations of internal kernel data structures
IMHO, the last three are not really required for libvirt's per-VM
usage: the management application can trivially decide whether
to allow overlapping allocation between VMs without needing to
set these kernel tunables.
So, if using the cgroups cpuset controller for NUMA, the *only*
policy we can implement is mode=strict. We cannot implement
mode=preferred or mode=interleave, given the currently available
cpuset controls.
IMHO, we should thus continue to use libnuma for specifying *all*
the policies. However, if mode=strict, then we should *also* apply
the policy in the cgroups using cpuset.mems, since this will at
least allow later tuning of the nodemask on the fly.
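
A minimal sketch of that split, assuming the caller already knows the
VM's cpuset cgroup directory. The function name, the `cgroup_dir`
argument and the `set_libnuma_policy` callback are all hypothetical
stand-ins, not libvirt or libnuma APIs:

```python
import os


def apply_numa_policy(mode, nodeset, cgroup_dir, set_libnuma_policy):
    """Apply a NUMA memory policy; return True if cpuset.mems was written.

    set_libnuma_policy is a hypothetical callback standing in for the
    existing libnuma-based code path. cgroup_dir is assumed to be the
    VM's directory under the cpuset controller mount.
    """
    # Every mode keeps going through the libnuma path.
    set_libnuma_policy(mode, nodeset)

    # Only mode=strict has a cpuset equivalent, so mirror it there to
    # allow the nodeset to be retuned later on a running VM.
    if mode == "strict":
        with open(os.path.join(cgroup_dir, "cpuset.mems"), "w") as f:
            f.write(nodeset)
        return True
    return False
```

The nodeset string is passed through unchanged, so it is assumed to
already be in the list format cpuset.mems accepts (for example "0-1").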
We will have to refuse any attempt to switch between different modes
on the fly. Only the nodemask, with mode=strict, will be dynamically
changeable.
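
The check for a live update could look roughly like this. It is only a
sketch of the rule described above; the function name and error
messages are made up for illustration:

```python
def check_live_numa_change(current_mode, new_mode, new_nodeset):
    """Validate a NUMA tuning request against a running VM.

    Returns the nodeset to apply, or raises ValueError if the request
    cannot be honoured on the fly.
    """
    # Switching between modes on a running VM is always refused.
    if new_mode != current_mode:
        raise ValueError("cannot change numa mode of a running domain")

    # Only mode=strict is backed by cpuset.mems, so only then can the
    # nodeset be changed dynamically.
    if current_mode != "strict":
        raise ValueError("nodeset is only changeable live with mode=strict")

    return new_nodeset
```
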
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|