When the numatune memory mode is not "strict", cpuset.mems inherits
the parent's setting, which causes problems like:
% virsh dumpxml rhel6_local | grep interleave -2
<vcpu placement='static'>2</vcpu>
<numatune>
<memory mode='interleave' nodeset='1-2'/>
</numatune>
<os>
% cat /proc/3713/status | grep Mems_allowed_list
Mems_allowed_list: 0-3
% virsh numatune rhel6_local
numa_mode : interleave
numa_nodeset : 0-3
This happens even though the domain process's memory binding is set
with libnuma after the cgroup setting.
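To check this symptom programmatically instead of with grep, a minimal C sketch could parse the Mems_allowed_list value from the text of /proc/<pid>/status and compare it with the configured numatune nodeset. The helpers parse_node_list() and mems_allowed_from_status() are hypothetical, and the sketch assumes node IDs below 64 so that a node set fits in one 64-bit mask.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse a kernel list-format node set such as
 * "0-3" or "0,2-3" into a bitmask (bit N set => node N allowed).
 * Parsing stops at end of string or at a newline. Returns 0 on
 * success, -1 on malformed input or node IDs >= 64. */
static int parse_node_list(const char *s, unsigned long long *mask)
{
    unsigned long long m = 0;

    assert(s != NULL && mask != NULL);
    while (*s != '\0' && *s != '\n') {
        char *end;
        unsigned long lo = strtoul(s, &end, 10);
        unsigned long hi = lo;

        if (end == s)
            return -1;
        if (*end == '-') {                /* range "lo-hi" */
            s = end + 1;
            hi = strtoul(s, &end, 10);
            if (end == s)
                return -1;
        }
        if (lo > hi || hi >= 64)
            return -1;
        for (unsigned long n = lo; n <= hi; n++)
            m |= 1ULL << n;
        s = end;
        if (*s == ',')
            s++;                          /* move to the next element */
        else if (*s != '\0' && *s != '\n')
            return -1;                    /* unexpected character */
    }
    *mask = m;
    return 0;
}

/* Hypothetical helper: find "Mems_allowed_list:" in the text of
 * /proc/<pid>/status and parse the node list that follows it. */
static int mems_allowed_from_status(const char *status,
                                    unsigned long long *mask)
{
    static const char key[] = "Mems_allowed_list:";
    const char *p = strstr(status, key);

    if (p == NULL)
        return -1;
    p += strlen(key);
    while (*p == ' ' || *p == '\t')
        p++;
    return parse_node_list(p, mask);
}
```

Fed the status text of the domain process above, this would yield 0xf (nodes 0-3), while the numatune nodeset "1-2" parses to 0x6, so the mismatch shows the inherited setting.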
The reason the current code only allows "strict" mode is that
cpuset.mems doesn't understand the memory policy modes (interleave,
preferred, strict); it is effectively equivalent to "strict" mode.
("strict" means the allocation will fail if the memory cannot be
allocated on the target node; the default operation is to fall back
to other nodes. See man numa(3).) However, writing cpuset.mems even
when the numatune memory mode is not "strict" should be better than
blindly inheriting the parent's setting anyway.
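The difference between "strict" and the default fallback behaviour quoted from numa(3) can be pictured with a toy model. This is not how the kernel or libnuma actually allocates memory; the function pick_node() and the bitmask representation (bit N means NUMA node N) are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel or libnuma code. Bit N of a mask = NUMA node N.
 * free_nodes: nodes that can still satisfy the allocation.
 * bound:      nodes the allocation is bound to (must be non-empty).
 * Returns the lowest usable node, or -1 if no node is usable. */
static int pick_node(unsigned long long free_nodes,
                     unsigned long long bound, bool strict)
{
    unsigned long long candidates;

    assert(bound != 0);
    candidates = free_nodes & bound;
    if (candidates == 0 && !strict)
        candidates = free_nodes;  /* default: fall back to other nodes */
    for (int n = 0; n < 64; n++)
        if (candidates & (1ULL << n))
            return n;
    return -1;  /* e.g. strict mode and every bound node is full */
}
```

As described above, cpuset.mems never lets an allocation fall back outside its node set, so in this model it always behaves like the strict branch, whatever the numatune mode is.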
---
However, I'm not comfortable with this solution, since modes other
than "strict" are not meaningful for cpuset.mems anyway.
Another thing I'm not sure about: will the cpuset.cpus setting affect
the libnuma setting? Assume that without this patch the domain
process's cpuset.mems is '0-7' (8 NUMA nodes, each with 8 CPUs), the
numatune memory mode is "interleave", and libnuma sets the memory
binding to "1-2". With this patch applied, cpuset.mems is set to
"1-2" as well; is there any potential problem?
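One simplified way to reason about the question above: since libnuma sets the policy after the cgroup, my assumption is that the kernel only keeps the policy nodes that are also in the task's cpuset.mems at that point. The sketch below models just that intersection; it ignores how an existing policy is remapped if cpuset.mems changes later, and the helpers node_range() and effective_policy_nodes() are hypothetical.

```c
#include <assert.h>

/* Bit N of a mask = NUMA node N. Hypothetical helper building the
 * mask for the contiguous node range lo..hi (hi must be < 64). */
static unsigned long long node_range(unsigned lo, unsigned hi)
{
    unsigned long long m = 0;

    assert(lo <= hi && hi < 64);
    for (unsigned n = lo; n <= hi; n++)
        m |= 1ULL << n;
    return m;
}

/* Simplified model (assumption): when the policy is set, only the
 * policy nodes that are also in cpuset.mems remain usable. An empty
 * result means no node is left for the policy. */
static unsigned long long effective_policy_nodes(unsigned long long cpuset_mems,
                                                 unsigned long long policy_nodes)
{
    return cpuset_mems & policy_nodes;
}
```

Under this model, a cpuset.mems of either '0-7' or '1-2' leaves the interleave set at nodes 1-2, so narrowing cpuset.mems to the numatune nodeset would not by itself shrink the interleave set; only a cpuset.mems disjoint from the nodeset would leave nothing.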
So this patch is mainly meant to raise the problem and to see if
anyone has opinions. @hutao, since this code is yours, any
opinions/ideas? Thanks.
---
src/qemu/qemu_cgroup.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
index 33eebd7..22fe25b 100644
--- a/src/qemu/qemu_cgroup.c
+++ b/src/qemu/qemu_cgroup.c
@@ -597,11 +597,9 @@ qemuSetupCpusetCgroup(virDomainObjPtr vm,
if (!virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_CPUSET))
return 0;
- if ((vm->def->numatune.memory.nodemask ||
- (vm->def->numatune.memory.placement_mode ==
- VIR_NUMA_TUNE_MEM_PLACEMENT_MODE_AUTO)) &&
- vm->def->numatune.memory.mode == VIR_DOMAIN_NUMATUNE_MEM_STRICT) {
-
+ if (vm->def->numatune.memory.nodemask ||
+ (vm->def->numatune.memory.placement_mode ==
+ VIR_NUMA_TUNE_MEM_PLACEMENT_MODE_AUTO)) {
if (vm->def->numatune.memory.placement_mode ==
VIR_NUMA_TUNE_MEM_PLACEMENT_MODE_AUTO)
mem_mask = virBitmapFormat(nodemask);
@@ -614,6 +612,16 @@ qemuSetupCpusetCgroup(virDomainObjPtr vm,
goto cleanup;
}
+ if (vm->def->numatune.memory.mode ==
+ VIR_DOMAIN_NUMATUNE_MEM_PREFERRED &&
+ strlen(mem_mask) != 1) {
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+ _("NUMA memory tuning in 'preferred' mode "
+ "only supports single node"));
+ goto cleanup;
+
+ }
+
rc = virCgroupSetCpusetMems(priv->cgroup, mem_mask);
if (rc != 0) {
--
1.8.1.4