On Fri, May 17, 2013 at 07:59:36PM +0800, Osier Yang wrote:
When either "cpuset" of <vcpu> is specified, or the "placement" of
<vcpu> is "auto", setting only cpuset.mems might cause the guest
to fail to start. E.g. ("placement" of both <vcpu> and <numatune>
is "auto"):
1) Related XMLs
<vcpu placement='auto'>4</vcpu>
<numatune>
<memory mode='strict' placement='auto'/>
</numatune>
2) Host NUMA topology
% numactl --hardware
available: 8 nodes (0-7)
node 0 cpus: 0 4 8 12 16 20 24 28
node 0 size: 16374 MB
node 0 free: 11899 MB
node 1 cpus: 32 36 40 44 48 52 56 60
node 1 size: 16384 MB
node 1 free: 15318 MB
node 2 cpus: 2 6 10 14 18 22 26 30
node 2 size: 16384 MB
node 2 free: 15766 MB
node 3 cpus: 34 38 42 46 50 54 58 62
node 3 size: 16384 MB
node 3 free: 15347 MB
node 4 cpus: 3 7 11 15 19 23 27 31
node 4 size: 16384 MB
node 4 free: 15041 MB
node 5 cpus: 35 39 43 47 51 55 59 63
node 5 size: 16384 MB
node 5 free: 15202 MB
node 6 cpus: 1 5 9 13 17 21 25 29
node 6 size: 16384 MB
node 6 free: 15197 MB
node 7 cpus: 33 37 41 45 49 53 57 61
node 7 size: 16368 MB
node 7 free: 15669 MB
4) cpuset.cpus will be set as: (from debug log)
2013-05-09 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331 :
Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/toy/cpuset.cpus'
to '0-63'
5) The advisory nodeset got from querying numad (from debug log)
2013-05-09 16:50:17.295+0000: 417: debug : qemuProcessStart:3614 :
Nodeset returned from numad: 1
6) cpuset.mems will be set as: (from debug log)
2013-05-09 16:50:17.296+0000: 417: debug : virCgroupSetValueStr:331 :
Set value '/sys/fs/cgroup/cpuset/libvirt/qemu/toy/cpuset.mems'
to '0-7'
I.e., the domain process's memory is restricted to the first NUMA node,
but it can use all of the CPUs, which will very likely cause the
domain process to fail to start because the kernel fails to allocate
memory, given the possible mismatch between CPU nodes and memory nodes.
This is only a problem if the kernel is forced to allocate
from a memory node which matches the CPU node.
In general it is perfectly acceptable for the kernel to allocate
memory from a node that is different from the CPU node.
I.e., it is the mode='strict' attribute in the XML above that causes
the bug.
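
For illustration only, here is a minimal sketch of what a cpuset.cpus
value consistent with the advisory nodeset would look like on the host
quoted above. It assumes the numad answer "1" means NUMA node 1, and
cpus_for_nodes() is a hypothetical helper written for this mail, not a
libvirt function. The table is copied from the numactl output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define NODES 8
#define CPUS_PER_NODE 8

/* CPU ids per NUMA node, copied from the numactl --hardware output above. */
static const int node_cpus[NODES][CPUS_PER_NODE] = {
    { 0,  4,  8, 12, 16, 20, 24, 28},
    {32, 36, 40, 44, 48, 52, 56, 60},
    { 2,  6, 10, 14, 18, 22, 26, 30},
    {34, 38, 42, 46, 50, 54, 58, 62},
    { 3,  7, 11, 15, 19, 23, 27, 31},
    {35, 39, 43, 47, 51, 55, 59, 63},
    { 1,  5,  9, 13, 17, 21, 25, 29},
    {33, 37, 41, 45, 49, 53, 57, 61},
};

/* Hypothetical helper: write a comma-separated list of the CPUs that
 * belong to every node whose bit is set in nodemask.
 * Returns 0 on success, -1 if buf is too small. */
static int cpus_for_nodes(unsigned nodemask, char *buf, size_t len)
{
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    for (int node = 0; node < NODES; node++) {
        if (!(nodemask & (1u << node)))
            continue;
        for (int i = 0; i < CPUS_PER_NODE; i++) {
            /* Prefix a comma for every entry except the first. */
            int n = snprintf(buf + used, len - used, "%s%d",
                             used ? "," : "", node_cpus[node][i]);
            if (n < 0 || (size_t)n >= len - used)
                return -1;
            used += (size_t)n;
        }
    }
    return 0;
}
```

Calling cpus_for_nodes(1u << 1, buf, sizeof(buf)) fills buf with the
CPUs of node 1 only, rather than the '0-63' seen in the debug log.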
@@ -665,9 +666,35 @@ qemuSetupCpusetCgroup(virDomainObjPtr vm,
}
}
+ if (vm->def->cpumask ||
+ (vm->def->placement_mode ==
+ VIR_DOMAIN_CPU_PLACEMENT_MODE_AUTO)) {
I think you should only be doing this if placement==auto *and*
mode==strict.
+ if (vm->def->placement_mode ==
+ VIR_DOMAIN_CPU_PLACEMENT_MODE_AUTO)
+ cpu_mask = virBitmapFormat(nodemask);
+ else
+ cpu_mask = virBitmapFormat(vm->def->cpumask);
+
+ if (!cpu_mask) {
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+ _("failed to convert memory nodemask"));
+ goto cleanup;
+ }
+
+ rc = virCgroupSetCpusetCpus(priv->cgroup, cpu_mask);
+
+ if (rc != 0) {
+ virReportSystemError(-rc,
+ _("Unable to set cpuset.cpus for domain %s"),
+ vm->def->name);
+ goto cleanup;
+ }
+ }
+
ret = 0;
cleanup:
- VIR_FREE(mask);
+ VIR_FREE(mem_mask);
+ VIR_FREE(cpu_mask);
return ret;
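
To make the condition suggested above concrete, here is a rough sketch.
The enums and should_pin_cpus_to_nodeset() are hypothetical stand-ins
invented for this example, not libvirt's actual types or fields; the
real check would read the corresponding values from the domain
definition.

```c
#include <assert.h>   /* only needed for the checks in the usage note */
#include <stdbool.h>

/* Hypothetical stand-ins for the <vcpu placement=...> and
 * <memory mode=...> settings; these are not libvirt's enums. */
enum placement { PLACEMENT_STATIC, PLACEMENT_AUTO };
enum mem_mode  { MEM_STRICT, MEM_PREFERRED, MEM_INTERLEAVE };

/* Restrict cpuset.cpus to the advisory nodeset only when placement is
 * automatic *and* the memory policy is strict; in every other case the
 * kernel may allocate from a node other than the one the CPU is on. */
static bool should_pin_cpus_to_nodeset(enum placement vcpu_placement,
                                       enum mem_mode mode)
{
    return vcpu_placement == PLACEMENT_AUTO && mode == MEM_STRICT;
}
```

With the XML quoted at the top (placement='auto', mode='strict') this
returns true; changing either attribute makes it return false.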
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|