On 20/05/13 19:18, Daniel P. Berrange wrote:
> On Fri, May 17, 2013 at 07:59:36PM +0800, Osier Yang wrote:
> > When either "cpuset" of <vcpu> is specified, or the "placement" of
> > <vcpu> is "auto", only setting cpuset.mems might cause the guest
> > to fail to start. E.g. ("placement" of both <vcpu> and <numatune>
> > is "auto"):
> >
> > 1) Related XMLs
> >   <vcpu placement='auto'>4</vcpu>
> >   <numatune>
> >     <memory mode='strict' placement='auto'/>
> >   </numatune>
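For context: with placement='auto', libvirt queries numad for an
advisory nodeset before starting the guest, using numad's documented
"-w NCPUS[:MB]" option. Below is a minimal standalone sketch of such
a query; the "4:4096" request is an arbitrary example (4 vCPUs,
4096 MiB), and libvirt itself drives numad through its virCommand
helpers rather than popen():

  #include <stdio.h>
  #include <stdlib.h>

  int main(void)
  {
      char nodeset[64];
      /* Ask numad where to place 4 vCPUs with 4096 MiB of memory;
       * it prints an advisory nodeset such as "1" on stdout. */
      FILE *p = popen("numad -w 4:4096", "r");

      if (!p) {
          perror("popen");
          return EXIT_FAILURE;
      }
      if (!fgets(nodeset, sizeof(nodeset), p)) {
          fprintf(stderr, "no advice returned from numad\n");
          pclose(p);
          return EXIT_FAILURE;
      }
      printf("Nodeset returned from numad: %s", nodeset);
      return pclose(p) == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
  }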
> >
> > 2) Host NUMA topology
> >   % numactl --hardware
> >   available: 8 nodes (0-7)
> >   node 0 cpus: 0 4 8 12 16 20 24 28
> >   node 0 size: 16374 MB
> >   node 0 free: 11899 MB
> >   node 1 cpus: 32 36 40 44 48 52 56 60
> >   node 1 size: 16384 MB
> >   node 1 free: 15318 MB
> >   node 2 cpus: 2 6 10 14 18 22 26 30
> >   node 2 size: 16384 MB
> >   node 2 free: 15766 MB
> >   node 3 cpus: 34 38 42 46 50 54 58 62
> >   node 3 size: 16384 MB
> >   node 3 free: 15347 MB
> >   node 4 cpus: 3 7 11 15 19 23 27 31
> >   node 4 size: 16384 MB
> >   node 4 free: 15041 MB
> >   node 5 cpus: 35 39 43 47 51 55 59 63
> >   node 5 size: 16384 MB
> >   node 5 free: 15202 MB
> >   node 6 cpus: 1 5 9 13 17 21 25 29
> >   node 6 size: 16384 MB
> >   node 6 free: 15197 MB
> >   node 7 cpus: 33 37 41 45 49 53 57 61
> >   node 7 size: 16368 MB
> >   node 7 free: 15669 MB
> >
> > 3) cpuset.cpus will be set as: (from debug log)
> >
> > 2013-05-09 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331 :
> > Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/toy/cpuset.cpus'
> > to '0-63'
> >
> > 4) The advisory nodeset obtained from numad (from debug log)
> >
> > 2013-05-09 16:50:17.295+0000: 417: debug : qemuProcessStart:3614 :
> > Nodeset returned from numad: 1
> >
> > 5) cpuset.mems will be set as: (from debug log)
> >
> > 2013-05-09 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331 :
> > Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/toy/cpuset.mems'
> > to '0-7'
> >
> > I.e., the domain process's memory is restricted to the first NUMA
> > node, but it can use all of the CPUs. This will very likely cause
> > the domain process to fail to start, because the kernel fails to
> > allocate memory due to the possible mismatch between CPU nodes and
> > memory nodes.
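To make the failure mode concrete, here is a minimal sketch (not
libvirt code) that reproduces the mismatched state by hand: all 64
host CPUs allowed, memory confined to the single node numad advised.
The cgroup path is the one from the debug log above; the values are
illustrative:

  #include <stdio.h>

  /* Write one value into a cpuset control file, roughly what the
   * virCgroupSetValueStr() calls in the debug log above do. */
  static int write_cpuset(const char *file, const char *value)
  {
      char path[256];
      FILE *fp;

      snprintf(path, sizeof(path),
               "/sys/fs/cgroup/cpuset/libvirt/qemu/toy/%s", file);
      if (!(fp = fopen(path, "w")) || fputs(value, fp) == EOF) {
          if (fp)
              fclose(fp);
          return -1;
      }
      return fclose(fp);
  }

  int main(void)
  {
      /* CPUs from all eight nodes stay usable, but every allocation
       * must come from node 1: threads scheduled on the other seven
       * nodes get no local memory at all. */
      if (write_cpuset("cpuset.cpus", "0-63") < 0 ||
          write_cpuset("cpuset.mems", "1") < 0) {
          perror("writing cpuset values");
          return 1;
      }
      return 0;
  }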
> This is only a problem if the kernel is forced to do allocation from
> a memory node which matches the CPU node. It is perfectly acceptable
> for the kernel to allocate memory from a node that is different from
> the CPU node in general. I.e., it is the mode='strict' attribute in
> the XML above that causes the bug.
I can update the commit log to explain it more clearly.
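To illustrate why mode='strict' is the trigger: at the kernel level,
strict placement corresponds to the MPOL_BIND memory policy, which
refuses to fall back to other nodes. A minimal sketch, assuming a
host that actually has a NUMA node 1 (build with -lnuma):

  #include <numaif.h>     /* mbind(), MPOL_BIND */
  #include <sys/mman.h>
  #include <string.h>
  #include <stdio.h>

  int main(void)
  {
      size_t len = 64UL << 20;   /* 64 MiB */
      void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

      if (buf == MAP_FAILED) {
          perror("mmap");
          return 1;
      }

      /* Nodemask with only node 1 set: the kernel-level analogue of
       * <memory mode='strict' nodeset='1'/>. Unlike MPOL_PREFERRED,
       * MPOL_BIND gives the kernel no fallback node. */
      unsigned long nodemask = 1UL << 1;
      if (mbind(buf, len, MPOL_BIND, &nodemask,
                8 * sizeof(nodemask), 0) < 0) {
          perror("mbind");
          return 1;
      }

      /* Fault the pages in: they are allocated on node 1 or not at
       * all, no matter which CPU this thread happens to run on. */
      memset(buf, 0, len);
      return 0;
  }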
> > @@ -665,9 +666,35 @@ qemuSetupCpusetCgroup(virDomainObjPtr vm,
> >          }
> >      }
> >
> > +    if (vm->def->cpumask ||
> > +        (vm->def->placement_mode ==
> > +         VIR_DOMAIN_CPU_PLACEMENT_MODE_AUTO)) {
> I think you should only be doing this if placement==auto *and*
> mode=strict.
There is no "mode" for cpu. See:
http://libvirt.org/formatdomain.html#elementsCPUAllocation
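For reference, the <vcpu> element documented there only accepts the
placement, cpuset and current attributes, with no mode, e.g.:

  <vcpu placement='static' cpuset="1-4,^3,6" current="1">2</vcpu>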
> > +        if (vm->def->placement_mode ==
> > +            VIR_DOMAIN_CPU_PLACEMENT_MODE_AUTO)
> > +            cpu_mask = virBitmapFormat(nodemask);
> > +        else
> > +            cpu_mask = virBitmapFormat(vm->def->cpumask);
> > +
> > +        if (!cpu_mask) {
> > +            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> > +                           _("failed to convert memory nodemask"));
> > +            goto cleanup;
> > +        }
> > +
> > +        rc = virCgroupSetCpusetCpus(priv->cgroup, cpu_mask);
> > +
> > +        if (rc != 0) {
> > +            virReportSystemError(-rc,
> > +                                 _("Unable to set cpuset.cpus for domain %s"),
> > +                                 vm->def->name);
> > +            goto cleanup;
> > +        }
> > +    }
> > +
> >      ret = 0;
> >  cleanup:
> > -    VIR_FREE(mask);
> > +    VIR_FREE(mem_mask);
> > +    VIR_FREE(cpu_mask);
> >      return ret;
> Daniel