[libvirt] [RFC][PATCHv2 00/11] add numatune command

This series does mainly two things:

1. use cgroup cpuset to manage numa parameters
2. add a virsh command numatune to allow users to change numa parameters
   from the command line

Current numa parameters include nodeset and mode, but what cgroup cpuset
provides doesn't completely match them. Details:

  params            cpuset
  ------------------------------------------------------
  nodeset           cpuset provides cpuset.mems
  mode strict       cpuset provides cpuset.mem_hardwall
  mode interleave   cpuset provides cpuset.memory_spread_*
  mode preferred    no equivalent. !spread to preferred?

Besides, only one of the modes can currently be set at a time, but the
cpuset parameters are independent. From the perspective of cpuset, we
can set all the mode values independently, but that seems inconsistent
with the current numatune definition in the XML. Maybe we can improve
the XML definition to fit cpuset better? (There are more cpuset
parameters than listed above.)

Hu Tao (11):
  don't modify CPU set string in virDomainCpuSetParse
  enable cgroup cpuset by default
  Add functions to set/get cgroup cpuset parameters
  introduce numa backend
  use cpuset to manage numa
  add VIR_DOMAIN_NUMATUNE_MEM_NONE
  add new API virDomain{G,S}etNumaParameters
  Implement main entries of virDomain{G,S}etNumaParameters
  Add virDomain{G,S}etNumaParameters support to the remote driver
  Implement virDomain{G,S}etNumaParameters for the qemu driver
  add new command numatune to virsh

 daemon/remote.c              |   64 +++++++
 include/libvirt/libvirt.h.in |   23 +++
 python/generator.py          |    2 +
 src/conf/domain_conf.c       |    2 +-
 src/conf/domain_conf.h       |    9 +
 src/driver.h                 |   15 ++
 src/libvirt.c                |  113 ++++++++++++
 src/libvirt_private.syms     |   10 +
 src/libvirt_public.syms      |    6 +
 src/qemu/qemu.conf           |    5 +-
 src/qemu/qemu_cgroup.c       |   73 ++++++++
 src/qemu/qemu_conf.c         |    3 +-
 src/qemu/qemu_driver.c       |  399 ++++++++++++++++++++++++++++++++++++++++++
 src/qemu/qemu_process.c      |   11 +-
 src/remote/remote_driver.c   |   50 ++++++
 src/remote/remote_protocol.x |   25 +++-
 src/remote_protocol-structs  |   16 ++
 src/util/cgroup.c            |  112 ++++++++++++
 src/util/cgroup.h            |   11 ++
 tools/virsh.c                |  180 +++++++++++++++++++
 20 files changed, 1121 insertions(+), 8 deletions(-)

-- 
1.7.3.1
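For readers unfamiliar with the cpuset controller: the knobs in the table above are plain files under the cgroup mount, and setting the nodeset boils down to writing a node list such as "0-1" into the domain cgroup's cpuset.mems file. A minimal sketch of that operation; the helper name and error handling are ours for illustration, not libvirt's code, and libvirt detects the real cgroup mount point at runtime:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative sketch: write a NUMA node list (e.g. "0-1") into the
 * cpuset.mems file under a cgroup directory. Returns 0 on success,
 * -1 on error. */
static int
set_cpuset_mems(const char *cgroup_dir, const char *nodeset)
{
    char path[512];
    FILE *fp;
    int rc = 0;

    snprintf(path, sizeof(path), "%s/cpuset.mems", cgroup_dir);
    if (!(fp = fopen(path, "w")))
        return -1;
    if (fputs(nodeset, fp) < 0)
        rc = -1;
    if (fclose(fp) != 0)  /* the kernel may reject the value at close */
        rc = -1;
    return rc;
}
```

On a real host the directory would be something like the domain's group under the cpuset controller mount; writing an invalid node list makes the kernel fail the write, which is why the return value of fclose() is checked too.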

As the function only parses the CPU set string, there is no good reason
to modify it.
---
 src/conf/domain_conf.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 6b78d97..71eb209 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -9163,7 +9163,6 @@ virDomainCpuSetParse(const char **str, char sep,
         } else
             goto parse_error;
     }
-    *str = cur;
     return (ret);
 
 parse_error:
-- 
1.7.3.1

On Thu, Nov 17, 2011 at 05:44:11PM +0800, Hu Tao wrote:
As the function only parses the CPU set string, there is no good reason to modify it.
- You need to reflect this change in the comment too.

- My patchset for guest NUMA specification which uses
  virDomainCpuSetParse() is upstream now. The call site in
  virCPUDefParseXML() can be simplified after this change.

Regards,
Bharata.

None of the callers cared if str was updated to point to the next byte
after the parsed cpuset; simplifying this results in quite a few code
simplifications. Additionally, virCPUDefParseXML was strdup()'ing a
malloc()'d string; avoiding a memory copy resulted in less code.

* src/conf/domain_conf.h (virDomainCpuSetParse): Alter signature.
* src/conf/domain_conf.c (virDomainCpuSetParse): Don't modify str.
(virDomainVcpuPinDefParseXML, virDomainDefParseXML): Adjust callers.
* src/conf/cpu_conf.c (virCPUDefParseXML): Likewise.
* src/xen/xend_internal.c (sexpr_to_xend_topology): Likewise.
* src/xen/xm_internal.c (xenXMDomainPinVcpu): Likewise.
* src/xenxs/xen_sxpr.c (xenParseSxpr): Likewise.
* src/xenxs/xen_xm.c (xenParseXM): Likewise.
---

I'm only proposing this for patch 1/11; 2-11 of v2 might have to change
slightly to rebase on top of this.

v3: change parameter type and simplify callers, now that we know no one
cares about adjusting the parsed string pointer.

 src/conf/cpu_conf.c     |   26 +++++++-------------------
 src/conf/domain_conf.c  |   17 ++++++-----------
 src/conf/domain_conf.h  |    2 +-
 src/xen/xend_internal.c |    2 +-
 src/xen/xm_internal.c   |    3 +--
 src/xenxs/xen_sxpr.c    |    3 +--
 src/xenxs/xen_xm.c      |    2 +-
 7 files changed, 18 insertions(+), 37 deletions(-)

diff --git a/src/conf/cpu_conf.c b/src/conf/cpu_conf.c
index 7c5bf69..2882389 100644
--- a/src/conf/cpu_conf.c
+++ b/src/conf/cpu_conf.c
@@ -311,42 +311,32 @@ virCPUDefParseXML(const xmlNodePtr node,
         def->ncells = n;
 
         for (i = 0 ; i < n ; i++) {
-            char *cpus, *cpus_parse, *memory;
+            char *cpus, *memory;
             int cpumasklen = VIR_DOMAIN_CPUMASK_LEN;
             int ret, ncpus = 0;
 
             def->cells[i].cellid = i;
-            cpus = cpus_parse = virXMLPropString(nodes[i], "cpus");
+            cpus = virXMLPropString(nodes[i], "cpus");
             if (!cpus) {
                 virCPUReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                         _("Missing 'cpus' attribute in NUMA cell"));
                 goto error;
             }
+            def->cells[i].cpustr = cpus;
 
-            def->cells[i].cpustr = strdup(cpus);
-            if (!def->cells[i].cpustr) {
-                VIR_FREE(cpus);
+            if (VIR_ALLOC_N(def->cells[i].cpumask, cpumasklen) < 0)
                 goto no_memory;
-            }
-
-            if (VIR_ALLOC_N(def->cells[i].cpumask, cpumasklen) < 0) {
-                VIR_FREE(cpus);
-                goto no_memory;
-            }
-
-            ncpus = virDomainCpuSetParse((const char **)&cpus_parse,
-                                         0, def->cells[i].cpumask, cpumasklen);
-            if (ncpus <= 0) {
-                VIR_FREE(cpus);
+            ncpus = virDomainCpuSetParse(cpus, 0, def->cells[i].cpumask,
+                                         cpumasklen);
+            if (ncpus <= 0)
                 goto error;
-            }
             def->cells_cpus += ncpus;
 
             memory = virXMLPropString(nodes[i], "memory");
             if (!memory) {
                 virCPUReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                         _("Missing 'memory' attribute in NUMA cell"));
-                VIR_FREE(cpus);
                 goto error;
             }
 
@@ -354,11 +344,9 @@ virCPUDefParseXML(const xmlNodePtr node,
             if (ret == -1) {
                 virCPUReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                         _("Invalid 'memory' attribute in NUMA cell"));
-                VIR_FREE(cpus);
                 VIR_FREE(memory);
                 goto error;
             }
-            VIR_FREE(cpus);
             VIR_FREE(memory);
         }
     }
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 452c8b2..0620bb3 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -6560,8 +6560,7 @@ virDomainVcpuPinDefParseXML(const xmlNodePtr node,
             virReportOOMError();
             goto error;
         }
-        if (virDomainCpuSetParse((const char **)&set,
-                                 0, def->cpumask,
+        if (virDomainCpuSetParse(set, 0, def->cpumask,
                                  cpumasklen) < 0)
             goto error;
         VIR_FREE(tmp);
@@ -6774,8 +6773,7 @@ static virDomainDefPtr virDomainDefParseXML(virCapsPtr caps,
         if (VIR_ALLOC_N(def->cpumask, def->cpumasklen) < 0) {
             goto no_memory;
         }
-        if (virDomainCpuSetParse((const char **)&set,
-                                 0, def->cpumask,
+        if (virDomainCpuSetParse(set, 0, def->cpumask,
                                  def->cpumasklen) < 0)
             goto error;
         VIR_FREE(tmp);
@@ -6845,8 +6843,7 @@ static virDomainDefPtr virDomainDefParseXML(virCapsPtr caps,
             }
             /* "nodeset" leads same syntax with "cpuset". */
-            if (virDomainCpuSetParse((const char **)&set,
-                                     0, def->numatune.memory.nodemask,
+            if (virDomainCpuSetParse(set, 0, def->numatune.memory.nodemask,
                                      nodemasklen) < 0)
                 goto error;
             VIR_FREE(tmp);
@@ -9079,7 +9076,7 @@ virDomainCpuSetFormat(char *cpuset, int maxcpu)
 /**
  * virDomainCpuSetParse:
  * @conn: connection
- * @str: pointer to a CPU set string pointer
+ * @str: a CPU set string pointer
  * @sep: potential character used to mark the end of string if not 0
  * @cpuset: pointer to a char array for the CPU set
  * @maxcpu: number of elements available in @cpuset
@@ -9090,10 +9087,9 @@ virDomainCpuSetFormat(char *cpuset, int maxcpu)
  *
  * Returns the number of CPU found in that set, or -1 in case of error.
  * @cpuset is modified accordingly to the value parsed.
- * @str is updated to the end of the part parsed
  */
 int
-virDomainCpuSetParse(const char **str, char sep,
+virDomainCpuSetParse(const char *str, char sep,
                      char *cpuset, int maxcpu)
 {
     const char *cur;
@@ -9105,7 +9101,7 @@ virDomainCpuSetParse(const char **str, char sep,
         (maxcpu > 100000))
         return (-1);
 
-    cur = *str;
+    cur = str;
     virSkipSpaces(&cur);
     if (*cur == 0)
         goto parse_error;
@@ -9170,7 +9166,6 @@ virDomainCpuSetParse(const char **str, char sep,
         } else
             goto parse_error;
     }
-    *str = cur;
     return (ret);
 
 parse_error:
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 4e86d30..4a19dd2 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -1745,7 +1745,7 @@ int virDomainDefFormatInternal(virDomainDefPtr def,
                                unsigned int flags,
                                virBufferPtr buf);
 
-int virDomainCpuSetParse(const char **str,
+int virDomainCpuSetParse(const char *str,
                          char sep,
                          char *cpuset,
                          int maxcpu);
diff --git a/src/xen/xend_internal.c b/src/xen/xend_internal.c
index 80a2472..2318565 100644
--- a/src/xen/xend_internal.c
+++ b/src/xen/xend_internal.c
@@ -1153,7 +1153,7 @@ sexpr_to_xend_topology(const struct sexpr *root,
             for (cpu = 0; cpu < numCpus; cpu++)
                 cpuset[cpu] = 0;
         } else {
-            nb_cpus = virDomainCpuSetParse(&cur, 'n', cpuset, numCpus);
+            nb_cpus = virDomainCpuSetParse(cur, 'n', cpuset, numCpus);
             if (nb_cpus < 0)
                 goto error;
         }
diff --git a/src/xen/xm_internal.c b/src/xen/xm_internal.c
index 24311a7..a250b6f 100644
--- a/src/xen/xm_internal.c
+++ b/src/xen/xm_internal.c
@@ -893,8 +893,7 @@ int xenXMDomainPinVcpu(virDomainPtr domain,
         virReportOOMError();
         goto cleanup;
     }
-    if (virDomainCpuSetParse((const char **)&mapstr, 0,
-                             cpuset, maxcpu) < 0)
+    if (virDomainCpuSetParse(mapstr, 0, cpuset, maxcpu) < 0)
         goto cleanup;
 
     VIR_FREE(entry->def->cpumask);
diff --git a/src/xenxs/xen_sxpr.c b/src/xenxs/xen_sxpr.c
index e0bc043..124c369 100644
--- a/src/xenxs/xen_sxpr.c
+++ b/src/xenxs/xen_sxpr.c
@@ -1140,8 +1140,7 @@ xenParseSxpr(const struct sexpr *root,
             goto error;
         }
 
-        if (virDomainCpuSetParse(&cpus,
-                                 0, def->cpumask,
+        if (virDomainCpuSetParse(cpus, 0, def->cpumask,
                                  def->cpumasklen) < 0) {
             XENXS_ERROR(VIR_ERR_INTERNAL_ERROR,
                         _("invalid CPU mask %s"), cpus);
diff --git a/src/xenxs/xen_xm.c b/src/xenxs/xen_xm.c
index a2ef8c8..b415ce8 100644
--- a/src/xenxs/xen_xm.c
+++ b/src/xenxs/xen_xm.c
@@ -338,7 +338,7 @@ xenParseXM(virConfPtr conf, int xendConfigVersion,
         if (VIR_ALLOC_N(def->cpumask, def->cpumasklen) < 0)
             goto no_memory;
 
-        if (virDomainCpuSetParse(&str, 0,
+        if (virDomainCpuSetParse(str, 0,
                                  def->cpumask,
                                  def->cpumasklen) < 0)
             goto cleanup;
     }
-- 
1.7.7.1
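For context on what the function parses: CPU set strings are comma-separated CPU numbers, "a-b" ranges, and "^n" exclusions, e.g. "0-3,^2". A simplified, self-contained sketch of such a parser; this illustrates the string format only and is not libvirt's actual implementation, which additionally honours the @sep terminator and leading whitespace:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Mark cpuset[i] = 1 for each CPU selected by a string like "0-3,^2";
 * return the number of CPUs selected, or -1 on a malformed string.
 * Illustrative only -- a reduced model of virDomainCpuSetParse. */
static int
cpuset_parse(const char *str, char *cpuset, int maxcpu)
{
    int i, count = 0;

    memset(cpuset, 0, maxcpu);
    while (*str) {
        char *end;
        int neg = 0;
        long start, last;

        if (*str == '^') {      /* "^n" removes CPUs from the set */
            neg = 1;
            str++;
        }
        if (!isdigit((unsigned char)*str))
            return -1;
        start = last = strtol(str, &end, 10);
        str = end;
        if (*str == '-') {      /* "a-b" selects an inclusive range */
            str++;
            if (!isdigit((unsigned char)*str))
                return -1;
            last = strtol(str, &end, 10);
            str = end;
        }
        if (start < 0 || last >= maxcpu || start > last)
            return -1;
        for (i = start; i <= last; i++)
            cpuset[i] = neg ? 0 : 1;
        if (*str == ',')
            str++;
        else if (*str)
            return -1;
    }
    for (i = 0; i < maxcpu; i++)
        count += cpuset[i];
    return count;
}
```

The point of the patch above is orthogonal to the format: the parser only reads its input string, so there is no reason for the caller's pointer to be advanced as a side effect.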

On Fri, Nov 18, 2011 at 11:31:12AM -0700, Eric Blake wrote:
None of the callers cared if str was updated to point to the next byte after the parsed cpuset; simplifying this results in quite a few code simplifications. Additionally, virCPUDefParseXML was strdup()'ing a malloc()'d string; avoiding a memory copy resulted in less code.
Changes to virCPUDefParseXML look good. <numa> ... </numa> XML section is parsed correctly and qemu -numa options are generated correctly after this change. Regards, Bharata.

On 11/18/2011 09:41 PM, Bharata B Rao wrote:
On Fri, Nov 18, 2011 at 11:31:12AM -0700, Eric Blake wrote:
None of the callers cared if str was updated to point to the next byte after the parsed cpuset; simplifying this results in quite a few code simplifications. Additionally, virCPUDefParseXML was strdup()'ing a malloc()'d string; avoiding a memory copy resulted in less code.
Changes to virCPUDefParseXML look good. <numa> ... </numa> XML section is parsed correctly and qemu -numa options are generated correctly after this change.
Thanks for the review; I've pushed this patch now.

-- 
Eric Blake   eblake@redhat.com   +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On Thu, Nov 17, 2011 at 05:44:11PM +0800, Hu Tao wrote:
As the function only parses the CPU set string, there is no good reason
to modify it.

[...]
ACK

Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 11/18/2011 03:35 AM, Daniel P. Berrange wrote:
On Thu, Nov 17, 2011 at 05:44:11PM +0800, Hu Tao wrote:
As the function only parses the CPU set string, there is no good reason
to modify it.

[...]
ACK
NACK. I audited all existing callers to ensure that they really didn't
care if str wasn't updated; xend_internal.c:sexpr_to_xend_topology() was
the trickiest, but I'm convinced it was safe (you might get one more
iteration of the outer while (*cur != 0) loop than previously, but no
one used the modified cur before anyone else modified cur again in a
manner that would work from either the old or the new position).

Better would be to modify the function signature, adjust all callers
(as proof that our audit was sane), and as a side effect, get rid of
some nasty casting. Alternative v3 patch coming up.

-- 
Eric Blake   eblake@redhat.com   +1-919-301-3266
Libvirt virtualization library http://libvirt.org

This prepares for subsequent patches which introduce a dependence on
cgroup cpuset. Enable cgroup cpuset by default so users don't have to
modify the configuration file before encountering a cpuset error.
---
 src/qemu/qemu.conf   |    5 +++--
 src/qemu/qemu_conf.c |    3 ++-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/qemu/qemu.conf b/src/qemu/qemu.conf
index 4da5d5a..1c14d8f 100644
--- a/src/qemu/qemu.conf
+++ b/src/qemu/qemu.conf
@@ -157,18 +157,19 @@
 #   - 'devices' - use for device whitelisting
 #   - 'memory' - use for memory tunables
 #   - 'blkio' - use for block devices I/O tunables
+#   - 'cpuset' - use for CPUs and memory nodes
 #
 # NB, even if configured here, they won't be used unless
 # the administrator has mounted cgroups, e.g.:
 #
 #  mkdir /dev/cgroup
-#  mount -t cgroup -o devices,cpu,memory,blkio none /dev/cgroup
+#  mount -t cgroup -o devices,cpu,memory,blkio,cpuset none /dev/cgroup
 #
 # They can be mounted anywhere, and different controllers
 # can be mounted in different locations. libvirt will detect
 # where they are located.
 #
-# cgroup_controllers = [ "cpu", "devices", "memory", "blkio" ]
+# cgroup_controllers = [ "cpu", "devices", "memory", "blkio", "cpuset" ]
 
 # This is the basic set of devices allowed / required by
 # all virtual machines.
diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c
index d7e9d7d..dfcc6e3 100644
--- a/src/qemu/qemu_conf.c
+++ b/src/qemu/qemu_conf.c
@@ -304,7 +304,8 @@ int qemudLoadDriverConfig(struct qemud_driver *driver,
             (1 << VIR_CGROUP_CONTROLLER_CPU) |
             (1 << VIR_CGROUP_CONTROLLER_DEVICES) |
             (1 << VIR_CGROUP_CONTROLLER_MEMORY) |
-            (1 << VIR_CGROUP_CONTROLLER_BLKIO);
+            (1 << VIR_CGROUP_CONTROLLER_BLKIO) |
+            (1 << VIR_CGROUP_CONTROLLER_CPUSET);
     }
     for (i = 0 ; i < VIR_CGROUP_CONTROLLER_LAST ; i++) {
         if (driver->cgroupControllers & (1 << i)) {
-- 
1.7.3.1
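The qemu_conf.c hunk extends a default controller bitmask that the driver later iterates over. A small sketch of that bitmask pattern, with hypothetical enum values standing in for libvirt's VIR_CGROUP_CONTROLLER_* constants:

```c
#include <assert.h>

/* Hypothetical stand-ins for libvirt's VIR_CGROUP_CONTROLLER_* enum. */
enum cgroup_controller {
    CTRL_CPU,
    CTRL_DEVICES,
    CTRL_MEMORY,
    CTRL_BLKIO,
    CTRL_CPUSET,
    CTRL_LAST
};

/* Default mask after this patch: cpuset is enabled alongside the
 * other controllers. */
static int
default_controllers(void)
{
    return (1 << CTRL_CPU) |
           (1 << CTRL_DEVICES) |
           (1 << CTRL_MEMORY) |
           (1 << CTRL_BLKIO) |
           (1 << CTRL_CPUSET);
}

/* The driver then walks all known controllers and acts only on the
 * enabled bits, mirroring the loop in qemudLoadDriverConfig(). */
static int
count_enabled(int mask)
{
    int i, n = 0;
    for (i = 0; i < CTRL_LAST; i++)
        if (mask & (1 << i))
            n++;
    return n;
}
```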

On Thu, Nov 17, 2011 at 05:44:12PM +0800, Hu Tao wrote:
This prepares for subsequent patches which introduce dependence on
cgroup cpuset. Enable cgroup cpuset by default so users don't have to
modify configuration file before encountering a cpuset error.

[...]
ACK

Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 11/18/2011 03:34 AM, Daniel P. Berrange wrote:
On Thu, Nov 17, 2011 at 05:44:12PM +0800, Hu Tao wrote:
This prepares for subsequent patches which introduce dependence on cgroup cpuset. Enable cgroup cpuset by default so users don't have to modify configuration file before encountering a cpuset error. --- src/qemu/qemu.conf | 5 +++-- src/qemu/qemu_conf.c | 3 ++- 2 files changed, 5 insertions(+), 3 deletions(-)
ACK
Pushed.

-- 
Eric Blake   eblake@redhat.com   +1-919-301-3266
Libvirt virtualization library http://libvirt.org

---
 src/libvirt_private.syms |   10 ++++
 src/util/cgroup.c        |  112 ++++++++++++++++++++++++++++++++++++++++++++++
 src/util/cgroup.h        |   11 +++++
 3 files changed, 133 insertions(+), 0 deletions(-)

diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index b9d537e..d48b7dc 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -78,6 +78,11 @@ virCgroupGetCpuShares;
 virCgroupGetCpuCfsPeriod;
 virCgroupGetCpuCfsQuota;
 virCgroupGetCpuacctUsage;
+virCgroupGetCpusetHardwall;
+virCgroupGetCpusetMemExclusive;
+virCgroupGetCpusetMemorySpreadPage;
+virCgroupGetCpusetMemorySpreadSlab;
+virCgroupGetCpusetMems;
 virCgroupGetFreezerState;
 virCgroupGetMemoryHardLimit;
 virCgroupGetMemorySoftLimit;
@@ -93,6 +98,11 @@ virCgroupSetBlkioWeight;
 virCgroupSetCpuShares;
 virCgroupSetCpuCfsPeriod;
 virCgroupSetCpuCfsQuota;
+virCgroupSetCpusetHardwall;
+virCgroupSetCpusetMemExclusive;
+virCgroupSetCpusetMemorySpreadPage;
+virCgroupSetCpusetMemorySpreadSlab;
+virCgroupSetCpusetMems;
 virCgroupSetFreezerState;
 virCgroupSetMemory;
 virCgroupSetMemoryHardLimit;
diff --git a/src/util/cgroup.c b/src/util/cgroup.c
index c8d1f33..c9cd90b 100644
--- a/src/util/cgroup.c
+++ b/src/util/cgroup.c
@@ -1154,6 +1154,118 @@ int virCgroupGetMemSwapHardLimit(virCgroupPtr group, unsigned long long *kb)
 }
 
 /**
+ * virCgroupSetCpusetMems:
+ *
+ * @group: The cgroup to set cpuset.mems for
+ * @mems: the numa nodes to set
+ *
+ * Returns: 0 on success
+ */
+int virCgroupSetCpusetMems(virCgroupPtr group, const char *mems)
+{
+    return virCgroupSetValueStr(group,
+                                VIR_CGROUP_CONTROLLER_CPUSET,
+                                "cpuset.mems",
+                                mems);
+}
+
+/**
+ * virCgroupGetCpusetMems:
+ *
+ * @group: The cgroup to get cpuset.mems for
+ * @mems: the numa nodes to get
+ *
+ * Returns: 0 on success
+ */
+int virCgroupGetCpusetMems(virCgroupPtr group, char **mems)
+{
+    return virCgroupGetValueStr(group,
+                                VIR_CGROUP_CONTROLLER_CPUSET,
+                                "cpuset.mems",
+                                mems);
+}
+
+/**
+ * virCgroupSetCpusetHardwall:
+ *
+ * @group: The cgroup to set cpuset.mem_hardwall for
+ * @hardwall: the hardwall to set
+ *
+ * Returns: 0 on success
+ */
+int virCgroupSetCpusetHardwall(virCgroupPtr group, unsigned long long hardwall)
+{
+    return virCgroupSetValueU64(group,
+                                VIR_CGROUP_CONTROLLER_CPUSET,
+                                "cpuset.mem_hardwall",
+                                hardwall);
+}
+
+/**
+ * virCgroupGetCpusetHardwall:
+ *
+ * @group: The cgroup to get cpuset.mem_hardwall for
+ * @hardwall: the hardwall to get
+ *
+ * Returns: 0 on success
+ */
+int virCgroupGetCpusetHardwall(virCgroupPtr group, unsigned long long *hardwall)
+{
+    return virCgroupGetValueU64(group,
+                                VIR_CGROUP_CONTROLLER_CPUSET,
+                                "cpuset.mem_hardwall",
+                                hardwall);
+}
+
+int virCgroupSetCpusetMemExclusive(virCgroupPtr group, unsigned long long memExclusive)
+{
+    return virCgroupSetValueU64(group,
+                                VIR_CGROUP_CONTROLLER_CPUSET,
+                                "cpuset.mem_exclusive",
+                                memExclusive);
+}
+
+int virCgroupGetCpusetMemExclusive(virCgroupPtr group, unsigned long long *memExclusive)
+{
+    return virCgroupGetValueU64(group,
+                                VIR_CGROUP_CONTROLLER_CPUSET,
+                                "cpuset.mem_exclusive",
+                                memExclusive);
+}
+
+int virCgroupSetCpusetMemorySpreadPage(virCgroupPtr group, unsigned long long memSpreadPage)
+{
+    return virCgroupSetValueU64(group,
+                                VIR_CGROUP_CONTROLLER_CPUSET,
+                                "cpuset.memory_spread_page",
+                                memSpreadPage);
+}
+
+int virCgroupGetCpusetMemorySpreadPage(virCgroupPtr group, unsigned long long *memSpreadPage)
+{
+    return virCgroupGetValueU64(group,
+                                VIR_CGROUP_CONTROLLER_CPUSET,
+                                "cpuset.memory_spread_page",
+                                memSpreadPage);
+}
+
+int virCgroupSetCpusetMemorySpreadSlab(virCgroupPtr group, unsigned long long memSpreadSlab)
+{
+    return virCgroupSetValueU64(group,
+                                VIR_CGROUP_CONTROLLER_CPUSET,
+                                "cpuset.memory_spread_slab",
+                                memSpreadSlab);
+}
+
+int virCgroupGetCpusetMemorySpreadSlab(virCgroupPtr group, unsigned long long *memSpreadSlab)
+{
+    return virCgroupGetValueU64(group,
+                                VIR_CGROUP_CONTROLLER_CPUSET,
+                                "cpuset.memory_spread_slab",
+                                memSpreadSlab);
+}
+
+/**
  * virCgroupDenyAllDevices:
  *
  * @group: The cgroup to deny all permissions, for all devices
diff --git a/src/util/cgroup.h b/src/util/cgroup.h
index d190bb3..8825b19 100644
--- a/src/util/cgroup.h
+++ b/src/util/cgroup.h
@@ -115,6 +115,17 @@ int virCgroupGetCpuacctUsage(virCgroupPtr group, unsigned long long *usage);
 int virCgroupSetFreezerState(virCgroupPtr group, const char *state);
 int virCgroupGetFreezerState(virCgroupPtr group, char **state);
 
+int virCgroupSetCpusetMems(virCgroupPtr group, const char *mems);
+int virCgroupGetCpusetMems(virCgroupPtr group, char **mems);
+int virCgroupSetCpusetMemExclusive(virCgroupPtr group, unsigned long long memExclusive);
+int virCgroupGetCpusetMemExclusive(virCgroupPtr group, unsigned long long *memExclusive);
+int virCgroupSetCpusetHardwall(virCgroupPtr group, unsigned long long hardwall);
+int virCgroupGetCpusetHardwall(virCgroupPtr group, unsigned long long *hardwall);
+int virCgroupSetCpusetMemorySpreadPage(virCgroupPtr group, unsigned long long memSpreadPage);
+int virCgroupGetCpusetMemorySpreadPage(virCgroupPtr group, unsigned long long *memSpreadPage);
+int virCgroupSetCpusetMemorySpreadSlab(virCgroupPtr group, unsigned long long memSpreadSlab);
+int virCgroupGetCpusetMemorySpreadSlab(virCgroupPtr group, unsigned long long *memSpreadSlab);
+
 int virCgroupRemove(virCgroupPtr group);
 
 void virCgroupFree(virCgroupPtr *group);
-- 
1.7.3.1
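All of these wrappers delegate to the generic virCgroupSetValueStr/virCgroupSetValueU64 helpers, which ultimately format a value into the controller's file and parse it back. A hedged sketch of that underlying pattern; the directory layout is illustrative, since libvirt resolves the real controller path at runtime:

```c
#include <assert.h>
#include <stdio.h>

/* Write an unsigned 64-bit value into <dir>/<file>, roughly what a
 * virCgroupSetValueU64-style helper does for e.g. "cpuset.mem_hardwall".
 * Returns 0 on success, -1 on error. Illustrative, not libvirt's code. */
static int
set_value_u64(const char *dir, const char *file, unsigned long long val)
{
    char path[512];
    FILE *fp;
    int rc = -1;

    snprintf(path, sizeof(path), "%s/%s", dir, file);
    if (!(fp = fopen(path, "w")))
        return -1;
    if (fprintf(fp, "%llu", val) > 0)
        rc = 0;
    if (fclose(fp) != 0)  /* kernel may reject the value at close */
        rc = -1;
    return rc;
}

/* Read the value back, as a virCgroupGetValueU64-style helper does. */
static int
get_value_u64(const char *dir, const char *file, unsigned long long *val)
{
    char path[512];
    FILE *fp;
    int rc = -1;

    snprintf(path, sizeof(path), "%s/%s", dir, file);
    if (!(fp = fopen(path, "r")))
        return -1;
    if (fscanf(fp, "%llu", val) == 1)
        rc = 0;
    fclose(fp);
    return rc;
}
```

This also shows why the review below suggests keeping only the cpuset.mems pair: each additional tunable is just another file name threaded through the same two helpers.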

On Thu, Nov 17, 2011 at 05:44:13PM +0800, Hu Tao wrote:
--- src/libvirt_private.syms | 10 ++++ src/util/cgroup.c | 112 ++++++++++++++++++++++++++++++++++++++++++++++ src/util/cgroup.h | 11 +++++ 3 files changed, 133 insertions(+), 0 deletions(-)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index b9d537e..d48b7dc 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -78,6 +78,11 @@ virCgroupGetCpuShares;
 virCgroupGetCpuCfsPeriod;
 virCgroupGetCpuCfsQuota;
 virCgroupGetCpuacctUsage;
+virCgroupGetCpusetHardwall;
+virCgroupGetCpusetMemExclusive;
+virCgroupGetCpusetMemorySpreadPage;
+virCgroupGetCpusetMemorySpreadSlab;
IMHO these 4 are not required.
+virCgroupGetCpusetMems;
Just this one is needed.
 virCgroupGetFreezerState;
 virCgroupGetMemoryHardLimit;
 virCgroupGetMemorySoftLimit;
@@ -93,6 +98,11 @@ virCgroupSetBlkioWeight;
 virCgroupSetCpuShares;
 virCgroupSetCpuCfsPeriod;
 virCgroupSetCpuCfsQuota;
+virCgroupSetCpusetHardwall;
+virCgroupSetCpusetMemExclusive;
+virCgroupSetCpusetMemorySpreadPage;
+virCgroupSetCpusetMemorySpreadSlab;
Nor these 4.
+virCgroupSetCpusetMems;
Just this one.

Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Currently we are using libnuma to set up numa, but it's desired to use
cgroup cpuset to do the job instead. But for old systems that don't
have cgroup cpuset, we fall back to libnuma.
---
 src/conf/domain_conf.h  |    8 ++++++++
 src/qemu/qemu_process.c |   11 ++++++++---
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index c360674..29ac165 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -1352,6 +1352,12 @@ enum virDomainNumatuneMemMode {
     VIR_DOMAIN_NUMATUNE_MEM_LAST
 };
 
+enum virDomainNumatuneBackend {
+    VIR_DOMAIN_NUMATUNE_BACKEND_NONE,
+    VIR_DOMAIN_NUMATUNE_BACKEND_LIBNUMA,
+    VIR_DOMAIN_NUMATUNE_BACKEND_CGROUP_CPUSET,
+};
+
 typedef struct _virDomainNumatuneDef virDomainNumatuneDef;
 typedef virDomainNumatuneDef *virDomainNumatuneDefPtr;
 struct _virDomainNumatuneDef {
@@ -1360,6 +1366,8 @@ struct _virDomainNumatuneDef {
         int mode;
     } memory;
 
+    int backend;
+
     /* Future NUMA tuning related stuff should go here. */
 };
 
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 2882ef8..8add0b8 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -1334,12 +1334,16 @@ qemuProcessInitNumaMemoryPolicy(virDomainObjPtr vm)
     if (!vm->def->numatune.memory.nodemask)
         return 0;
 
+    if (vm->def->numatune.backend != VIR_DOMAIN_NUMATUNE_BACKEND_NONE)
+        return 0;
+    vm->def->numatune.backend = VIR_DOMAIN_NUMATUNE_BACKEND_LIBNUMA;
+
     VIR_DEBUG("Setting NUMA memory policy");
 
     if (numa_available() < 0) {
         qemuReportError(VIR_ERR_INTERNAL_ERROR,
                         "%s", _("Host kernel is not aware of NUMA."));
-        return -1;
+        goto cleanup;
     }
 
     maxnode = numa_max_node() + 1;
@@ -1351,7 +1355,7 @@ qemuProcessInitNumaMemoryPolicy(virDomainObjPtr vm)
         if (i > NUMA_NUM_NODES) {
             qemuReportError(VIR_ERR_INTERNAL_ERROR,
                             _("Host cannot support NUMA node %d"), i);
-            return -1;
+            goto cleanup;
         }
         if (i > maxnode && !warned) {
             VIR_WARN("nodeset is out of range, there is only %d NUMA "
@@ -1397,9 +1401,10 @@ qemuProcessInitNumaMemoryPolicy(virDomainObjPtr vm)
         goto cleanup;
     }
 
-    ret = 0;
+    return 0;
 
 cleanup:
+    vm->def->numatune.backend = VIR_DOMAIN_NUMATUNE_BACKEND_NONE;
     return ret;
 }
 #else
-- 
1.7.3.1

On Thu, Nov 17, 2011 at 05:44:14PM +0800, Hu Tao wrote:
Currently we are using libnuma to set up numa, but it's desired to use cgroup cpuset to do the job instead. But for old systems that don't have cgroup cpuset, we fall back to libnuma. --- src/conf/domain_conf.h | 8 ++++++++ src/qemu/qemu_process.c | 11 ++++++++--- 2 files changed, 16 insertions(+), 3 deletions(-)
NACK to this addition, based on my comments to PATCH 00/11. The usage
can't be exclusive; we need to use libnuma *and* cpusets together.

Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

This patch deprecates libnuma and uses cpuset to manage numa. We can
further add a `numatune' command to adjust numa parameters through
cpuset.
---
 src/qemu/qemu_cgroup.c |   73 ++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 73 insertions(+), 0 deletions(-)

diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
index 2a10bd2..9dd67b7 100644
--- a/src/qemu/qemu_cgroup.c
+++ b/src/qemu/qemu_cgroup.c
@@ -365,6 +365,79 @@ int qemuSetupCgroup(struct qemud_driver *driver,
         }
     }
 
+    if (vm->def->numatune.memory.nodemask &&
+        vm->def->numatune.backend == VIR_DOMAIN_NUMATUNE_BACKEND_NONE) {
+        if (qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_CPUSET)) {
+            int mode = vm->def->numatune.memory.mode;
+            char *mask = virDomainCpuSetFormat(vm->def->numatune.memory.nodemask, VIR_DOMAIN_CPUMASK_LEN);
+            if (!mask) {
+                qemuReportError(VIR_ERR_INTERNAL_ERROR,
+                                _("failed to convert memory nodemask"));
+                goto cleanup;
+            }
+
+            rc = virCgroupSetCpusetMems(cgroup, mask);
+            if (rc != 0) {
+                virReportSystemError(-rc,
+                                     _("Unable to set cpuset.mems for domain %s"),
+                                     vm->def->name);
+                goto cleanup;
+            }
+            switch (mode) {
+            case VIR_DOMAIN_NUMATUNE_MEM_STRICT:
+                rc = virCgroupSetCpusetHardwall(cgroup, 1);
+                if (rc != 0) {
+                    virReportSystemError(-rc,
+                                         _("Unable to set cpuset.mem_hardwall for domain %s"),
+                                         vm->def->name);
+                    goto cleanup;
+                }
+                break;
+            case VIR_DOMAIN_NUMATUNE_MEM_PREFERRED:
+                rc = virCgroupSetCpusetMemorySpreadPage(cgroup, 0);
+                if (rc != 0) {
+                    virReportSystemError(-rc,
+                                         _("Unable to set cpuset.memory_spread_page for domain %s"),
+                                         vm->def->name);
+                    goto cleanup;
+                }
+                rc = virCgroupSetCpusetMemorySpreadSlab(cgroup, 0);
+                if (rc != 0) {
+                    virReportSystemError(-rc,
+                                         _("Unable to set cpuset.memory_spread_slab for domain %s"),
+                                         vm->def->name);
+                    goto cleanup;
+                }
+                break;
+            case VIR_DOMAIN_NUMATUNE_MEM_INTERLEAVE:
+                rc = virCgroupSetCpusetMemorySpreadPage(cgroup, 1);
+                if (rc != 0) {
+                    virReportSystemError(-rc,
+                                         _("Unable to set cpuset.memory_spread_page for domain %s"),
+                                         vm->def->name);
+                    goto cleanup;
+                }
+                rc = virCgroupSetCpusetMemorySpreadSlab(cgroup, 1);
+                if (rc != 0) {
+                    virReportSystemError(-rc,
+                                         _("Unable to set cpuset.memory_spread_slab for domain %s"),
+                                         vm->def->name);
+                    goto cleanup;
+                }
+                break;
+            default:
+                qemuReportError(VIR_ERR_XML_ERROR,
+                                "%s", _("Invalid mode for memory NUMA tuning."));
+                goto cleanup;
+            }
+            vm->def->numatune.backend = VIR_DOMAIN_NUMATUNE_BACKEND_CGROUP_CPUSET;
+        } else {
+            qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                            _("Unable to set numa for domain %s since cpuset cgroup is "
+                              "not available on this host"), vm->def->name);
+            goto cleanup;
+        }
+    }
+
 done:
     virCgroupFree(&cgroup);
     return 0;
-- 
1.7.3.1
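The switch above maps each numatune mode onto cpuset tunables. That mapping can be summarized as a small table-driven sketch; note that this mapping was questioned in review (the cpuset tunables' semantics do not line up with the policies), so treat this purely as a description of what the patch does, with hypothetical names:

```c
#include <assert.h>
#include <string.h>

/* Reduced model of virDomainNumatuneMemMode for illustration. */
enum mem_mode { MODE_STRICT, MODE_PREFERRED, MODE_INTERLEAVE };

struct knob {
    const char *file;          /* cpuset control file */
    unsigned long long value;  /* value the patch writes */
};

/* Return the cpuset knobs this patch would set for a mode, in addition
 * to cpuset.mems; returns the number of knobs, or -1 for an unknown
 * mode. This just encodes the patch's switch statement as data. */
static int
mode_knobs(enum mem_mode mode, struct knob out[2])
{
    switch (mode) {
    case MODE_STRICT:
        out[0] = (struct knob){ "cpuset.mem_hardwall", 1 };
        return 1;
    case MODE_PREFERRED:
        out[0] = (struct knob){ "cpuset.memory_spread_page", 0 };
        out[1] = (struct knob){ "cpuset.memory_spread_slab", 0 };
        return 2;
    case MODE_INTERLEAVE:
        out[0] = (struct knob){ "cpuset.memory_spread_page", 1 };
        out[1] = (struct knob){ "cpuset.memory_spread_slab", 1 };
        return 2;
    }
    return -1;
}
```

Seen as a table, the mismatch the reviewer objects to is easier to spot: "preferred" is expressed only as the absence of memory spreading, which is not what a preferred-node allocation policy means.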

On Thu, Nov 17, 2011 at 05:44:15PM +0800, Hu Tao wrote:
This patch deprecates libnuma and uses cpuset to manage numa. We can further add a `numatune' command to adjust numa parameters through cpuset. --- src/qemu/qemu_cgroup.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 73 insertions(+), 0 deletions(-)
diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c
index 2a10bd2..9dd67b7 100644
--- a/src/qemu/qemu_cgroup.c
+++ b/src/qemu/qemu_cgroup.c
@@ -365,6 +365,79 @@ int qemuSetupCgroup(struct qemud_driver *driver,
         }
     }
 
+    if (vm->def->numatune.memory.nodemask &&
+        vm->def->numatune.backend == VIR_DOMAIN_NUMATUNE_BACKEND_NONE) {
+        if (qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_CPUSET)) {
+            int mode = vm->def->numatune.memory.mode;
+            char *mask = virDomainCpuSetFormat(vm->def->numatune.memory.nodemask,
+                                               VIR_DOMAIN_CPUMASK_LEN);
+            if (!mask) {
+                qemuReportError(VIR_ERR_INTERNAL_ERROR,
+                                _("failed to convert memory nodemask"));
+                goto cleanup;
+            }
+
+            rc = virCgroupSetCpusetMems(cgroup, mask);
+            if (rc != 0) {
+                virReportSystemError(-rc,
+                                     _("Unable to set cpuset.mems for domain %s"),
+                                     vm->def->name);
+                goto cleanup;
+            }
+            switch (mode) {
+            case VIR_DOMAIN_NUMATUNE_MEM_STRICT:
+                rc = virCgroupSetCpusetHardwall(cgroup, 1);
+                if (rc != 0) {
+                    virReportSystemError(-rc,
+                                         _("Unable to set cpuset.mem_hardwall for domain %s"),
+                                         vm->def->name);
+                    goto cleanup;
+                }
+                break;
+            case VIR_DOMAIN_NUMATUNE_MEM_PREFERRED:
+                rc = virCgroupSetCpusetMemorySpreadPage(cgroup, 0);
+                if (rc != 0) {
+                    virReportSystemError(-rc,
+                                         _("Unable to set cpuset.memory_spread_page for domain %s"),
+                                         vm->def->name);
+                    goto cleanup;
+                }
+                rc = virCgroupSetCpusetMemorySpreadSlab(cgroup, 0);
+                if (rc != 0) {
+                    virReportSystemError(-rc,
+                                         _("Unable to set cpuset.memory_spread_slab for domain %s"),
+                                         vm->def->name);
+                    goto cleanup;
+                }
+                break;
+            case VIR_DOMAIN_NUMATUNE_MEM_INTERLEAVE:
+                rc = virCgroupSetCpusetMemorySpreadPage(cgroup, 1);
+                if (rc != 0) {
+                    virReportSystemError(-rc,
+                                         _("Unable to set cpuset.memory_spread_page for domain %s"),
+                                         vm->def->name);
+                    goto cleanup;
+                }
+                rc = virCgroupSetCpusetMemorySpreadSlab(cgroup, 1);
+                if (rc != 0) {
+                    virReportSystemError(-rc,
+                                         _("Unable to set cpuset.memory_spread_slab for domain %s"),
+                                         vm->def->name);
+                    goto cleanup;
+                }
+                break;
+            default:
+                qemuReportError(VIR_ERR_XML_ERROR,
+                                "%s", _("Invalid mode for memory NUMA tuning."));
+                goto cleanup;
+            }
+            vm->def->numatune.backend = VIR_DOMAIN_NUMATUNE_BACKEND_CGROUP_CPUSET;
+        } else {
+            qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                            _("Unable to set numa for domain %s since cpuset cgroup is "
+                              "not available on this host"), vm->def->name);
+            goto cleanup;
+        }
+    }
 
 done:
     virCgroupFree(&cgroup);
     return 0;
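For reference, the mode handling in this hunk boils down to a small decision table: strict turns on cpuset.mem_hardwall, preferred clears the memory_spread_* knobs, and interleave sets them. A pure-function restatement of that table (names and enum values are illustrative stand-ins, not the patch's symbols):

```c
#include <assert.h>

/* Illustrative stand-ins for the virDomainNumatuneMemMode values. */
enum { MEM_STRICT, MEM_PREFERRED, MEM_INTERLEAVE };

/* Map a numatune mode to the cpuset knobs this hunk writes:
 * mem_hardwall for strict, memory_spread_page/_slab off for preferred
 * and on for interleave.  Returns -1 for an unknown mode, mirroring
 * the patch's default: case. */
static int map_mode(int mode, int *hardwall, int *spread)
{
    *hardwall = 0;
    *spread = 0;
    switch (mode) {
    case MEM_STRICT:
        *hardwall = 1;
        break;
    case MEM_PREFERRED:
        *spread = 0;
        break;
    case MEM_INTERLEAVE:
        *spread = 1;
        break;
    default:
        return -1;
    }
    return 0;
}
```

Whether this table is semantically sound is exactly what the review disputes: only the strict row has a true cpuset equivalent.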
As per my initial comments this is all wrong because the semantics of the
cpuset tunables don't match the policies. IMHO this should look more like

    if (vm->def->numatune.memory.nodemask &&
        vm->def->numatune.memory.mode == VIR_DOMAIN_NUMATUNE_MEM_STRICT &&
        qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_CPUSET)) {
        char *mask = virDomainCpuSetFormat(vm->def->numatune.memory.nodemask,
                                           VIR_DOMAIN_CPUMASK_LEN);
        if (!mask) {
            qemuReportError(VIR_ERR_INTERNAL_ERROR,
                            _("failed to convert memory nodemask"));
            goto cleanup;
        }

        rc = virCgroupSetCpusetMems(cgroup, mask);
        VIR_FREE(mask);
        if (rc != 0) {
            virReportSystemError(-rc,
                                 _("Unable to set cpuset.mems for domain %s"),
                                 vm->def->name);
            goto cleanup;
        }
    }

Regards,
Daniel

-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
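Both the original hunk and the suggested replacement lean on virDomainCpuSetFormat to turn the byte-per-node mask into the range-list string that cpuset.mems accepts. A simplified, hypothetical stand-in showing the expected shape of that conversion (not libvirt's actual implementation):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for virDomainCpuSetFormat: render a byte-per-bit
 * node mask such as {1,1,0,1} as a human-readable range list ("0-1,3"),
 * which is the form the cpuset.mems file expects.  Returns 0 on success,
 * -1 if the output buffer is too small. */
static int format_nodemask(const char *mask, int len, char *buf, size_t buflen)
{
    size_t off = 0;
    int i = 0;

    buf[0] = '\0';
    while (i < len) {
        if (!mask[i]) {           /* skip nodes that are not in the mask */
            i++;
            continue;
        }
        int start = i;
        while (i < len && mask[i]) /* extend the run of set nodes */
            i++;
        int n;
        if (i - 1 > start)
            n = snprintf(buf + off, buflen - off, "%s%d-%d",
                         off ? "," : "", start, i - 1);
        else
            n = snprintf(buf + off, buflen - off, "%s%d",
                         off ? "," : "", start);
        if (n < 0 || (size_t)n >= buflen - off)
            return -1;
        off += n;
    }
    return 0;
}
```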

add VIR_DOMAIN_NUMATUNE_MEM_NONE to represent that none of the numatune
modes is set
---
 src/conf/domain_conf.c |    1 +
 src/conf/domain_conf.h |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 71eb209..6150b10 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -587,6 +587,7 @@ VIR_ENUM_IMPL(virDomainTimerMode, VIR_DOMAIN_TIMER_MODE_LAST,
               "smpsafe");
 
 VIR_ENUM_IMPL(virDomainNumatuneMemMode, VIR_DOMAIN_NUMATUNE_MEM_LAST,
+              "none",
               "strict",
               "preferred",
               "interleave");
 
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 29ac165..1b8b15c 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -1345,6 +1345,7 @@ virDomainVcpuPinDefPtr virDomainVcpuPinFindByVcpu(virDomainVcpuPinDefPtr *def,
                                                   int vcpu);
 
 enum virDomainNumatuneMemMode {
+    VIR_DOMAIN_NUMATUNE_MEM_NONE,
     VIR_DOMAIN_NUMATUNE_MEM_STRICT,
     VIR_DOMAIN_NUMATUNE_MEM_PREFERRED,
     VIR_DOMAIN_NUMATUNE_MEM_INTERLEAVE,
-- 
1.7.3.1

On Thu, Nov 17, 2011 at 05:44:16PM +0800, Hu Tao wrote:
> add VIR_DOMAIN_NUMATUNE_MEM_NONE to represent that none of the numatune
> modes is set
> ---
>  src/conf/domain_conf.c |    1 +
>  src/conf/domain_conf.h |    1 +
>  2 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
> index 71eb209..6150b10 100644
> --- a/src/conf/domain_conf.c
> +++ b/src/conf/domain_conf.c
> @@ -587,6 +587,7 @@ VIR_ENUM_IMPL(virDomainTimerMode, VIR_DOMAIN_TIMER_MODE_LAST,
>                "smpsafe");
>
>  VIR_ENUM_IMPL(virDomainNumatuneMemMode, VIR_DOMAIN_NUMATUNE_MEM_LAST,
> +              "none",
>                "strict",
>                "preferred",
>                "interleave");
>
> diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
> index 29ac165..1b8b15c 100644
> --- a/src/conf/domain_conf.h
> +++ b/src/conf/domain_conf.h
> @@ -1345,6 +1345,7 @@ virDomainVcpuPinDefPtr virDomainVcpuPinFindByVcpu(virDomainVcpuPinDefPtr *def,
>                                                    int vcpu);
>
>  enum virDomainNumatuneMemMode {
> +    VIR_DOMAIN_NUMATUNE_MEM_NONE,
>      VIR_DOMAIN_NUMATUNE_MEM_STRICT,
>      VIR_DOMAIN_NUMATUNE_MEM_PREFERRED,
>      VIR_DOMAIN_NUMATUNE_MEM_INTERLEAVE,
I don't see any purpose for this addition. The three options we have there
already suit all needs AFAICT.

Daniel

Set up the types for the numa functions and insert them into the virDriver structure definition. --- include/libvirt/libvirt.h.in | 23 +++++++++++++++++++++++ python/generator.py | 2 ++ src/driver.h | 15 +++++++++++++++ src/libvirt_public.syms | 6 ++++++ 4 files changed, 46 insertions(+), 0 deletions(-) diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index 2ab89f5..7ce6352 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -1311,6 +1311,29 @@ typedef enum { } virDomainMemoryModFlags; +/* Manage numa parameters */ + +/** + * VIR_DOMAIN_NUMA_NODESET: + * + * numa nodeset + */ +#define VIR_DOMAIN_NUMA_NODESET "numa_nodeset" + +/** + * VIR_DOMAIN_NUMA_MODE: + * + * numa mode + */ +#define VIR_DOMAIN_NUMA_MODE "numa_mode" + +int virDomainSetNumaParameters(virDomainPtr domain, + virTypedParameterPtr params, + int nparams, unsigned int flags); +int virDomainGetNumaParameters(virDomainPtr domain, + virTypedParameterPtr params, + int *nparams, unsigned int flags); + /* * Dynamic control of domains */ diff --git a/python/generator.py b/python/generator.py index 71afdb7..5b09123 100755 --- a/python/generator.py +++ b/python/generator.py @@ -382,6 +382,8 @@ skip_impl = ( 'virDomainGetBlkioParameters', 'virDomainSetMemoryParameters', 'virDomainGetMemoryParameters', + 'virDomainSetNumaParameters', + 'virDomainGetNumaParameters', 'virDomainGetVcpus', 'virDomainPinVcpu', 'virDomainPinVcpuFlags', diff --git a/src/driver.h b/src/driver.h index 4c14aaa..76cf966 100644 --- a/src/driver.h +++ b/src/driver.h @@ -158,6 +158,19 @@ typedef int int *nparams, unsigned int flags); typedef int + (*virDrvDomainSetNumaParameters) + (virDomainPtr domain, + virTypedParameterPtr params, + int nparams, + unsigned int flags); +typedef int + (*virDrvDomainGetNumaParameters) + (virDomainPtr domain, + virTypedParameterPtr params, + int *nparams, + unsigned int flags); + +typedef int (*virDrvDomainSetBlkioParameters) (virDomainPtr domain, 
virTypedParameterPtr params, @@ -787,6 +800,8 @@ struct _virDriver { virDrvDomainSetMemoryFlags domainSetMemoryFlags; virDrvDomainSetMemoryParameters domainSetMemoryParameters; virDrvDomainGetMemoryParameters domainGetMemoryParameters; + virDrvDomainSetNumaParameters domainSetNumaParameters; + virDrvDomainGetNumaParameters domainGetNumaParameters; virDrvDomainSetBlkioParameters domainSetBlkioParameters; virDrvDomainGetBlkioParameters domainGetBlkioParameters; virDrvDomainGetInfo domainGetInfo; diff --git a/src/libvirt_public.syms b/src/libvirt_public.syms index bcefb10..d830974 100644 --- a/src/libvirt_public.syms +++ b/src/libvirt_public.syms @@ -498,4 +498,10 @@ LIBVIRT_0.9.7 { virDomainSnapshotNumChildren; } LIBVIRT_0.9.5; +LIBVIRT_0.9.8 { + global: + virDomainGetNumaParameters; + virDomainSetNumaParameters; +} LIBVIRT_0.9.7; + # .... define new API here using predicted next version number .... -- 1.7.3.1

On Thu, Nov 17, 2011 at 05:44:17PM +0800, Hu Tao wrote:
> Set up the types for the numa functions and insert them into the virDriver
> structure definition.
> ---
>  include/libvirt/libvirt.h.in |   23 +++++++++++++++++++++++
>  python/generator.py          |    2 ++
>  src/driver.h                 |   15 +++++++++++++++
>  src/libvirt_public.syms      |    6 ++++++
>  4 files changed, 46 insertions(+), 0 deletions(-)
>
> diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in
> index 2ab89f5..7ce6352 100644
> --- a/include/libvirt/libvirt.h.in
> +++ b/include/libvirt/libvirt.h.in
> @@ -1311,6 +1311,29 @@ typedef enum {
>  } virDomainMemoryModFlags;
>
> +/* Manage numa parameters */
> +
> +/**
> + * VIR_DOMAIN_NUMA_NODESET:
> + *
> + * numa nodeset
> + */
> +#define VIR_DOMAIN_NUMA_NODESET "numa_nodeset"
> +
> +/**
> + * VIR_DOMAIN_NUMA_MODE:
> + *
> + * numa mode
> + */
> +#define VIR_DOMAIN_NUMA_MODE "numa_mode"
> +
> +int virDomainSetNumaParameters(virDomainPtr domain,
> +                               virTypedParameterPtr params,
> +                               int nparams, unsigned int flags);
> +int virDomainGetNumaParameters(virDomainPtr domain,
> +                               virTypedParameterPtr params,
> +                               int *nparams, unsigned int flags);
IMHO, we should really have an API more like the CPU pinning API for
controlling numa placement. eg the API accepts a bitmap of allowed nodes.

Daniel
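A bitmap-of-nodes API of the kind suggested would presumably reuse the cpumap convention of virDomainPinVcpu: one bit per NUMA node, with node N stored in byte N/8, bit N%8. A minimal sketch of that bit packing (the macro names here are hypothetical, not libvirt API):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical node-bitmap helpers mirroring the byte-array cpumap
 * encoding used by virDomainPinVcpu: node N lives in byte N/8, bit N%8. */
#define NODE_SET(map, node)    ((map)[(node) / 8] |= (1 << ((node) % 8)))
#define NODE_USABLE(map, node) ((map)[(node) / 8] & (1 << ((node) % 8)))
```

A caller would zero the array, set the bits for the nodes the domain may use, and hand the array plus its length to the (hypothetical) placement call.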

--- src/libvirt.c | 113 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 113 insertions(+), 0 deletions(-) diff --git a/src/libvirt.c b/src/libvirt.c index 1518ed2..4eaf297 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -3762,6 +3762,119 @@ error: } /** + * virDomainSetNumaParameters: + * @domain: pointer to domain object + * @params: pointer to numa parameter objects + * @nparams: number of numa parameter (this value can be the same or + * less than the number of parameters supported) + * @flags: bitwise-OR of virDomainModificationImpact + * + * Change all or a subset of the numa tunables. + * This function may require privileged access to the hypervisor. + * + * Returns -1 in case of error, 0 in case of success. + */ +int virDomainSetNumaParameters(virDomainPtr domain, + virTypedParameterPtr params, + int nparams, unsigned int flags) +{ + virConnectPtr conn; + + VIR_DOMAIN_DEBUG(domain, "params=%p, nparams=%d, flags=%x", + params, nparams, flags); + + virResetLastError(); + + if (!VIR_IS_CONNECTED_DOMAIN(domain)) { + virLibDomainError(VIR_ERR_INVALID_DOMAIN, __FUNCTION__); + virDispatchError(NULL); + return -1; + } + if (domain->conn->flags & VIR_CONNECT_RO) { + virLibDomainError(VIR_ERR_OPERATION_DENIED, __FUNCTION__); + goto error; + } + if ((nparams <= 0) || (params == NULL)) { + virLibDomainError(VIR_ERR_INVALID_ARG, __FUNCTION__); + goto error; + } + if (virTypedParameterValidateSet(domain, params, nparams) < 0) + return -1; + + conn = domain->conn; + + if (conn->driver->domainSetNumaParameters) { + int ret; + ret = conn->driver->domainSetNumaParameters (domain, params, nparams, flags); + if (ret < 0) + goto error; + return ret; + } + + virLibConnError(VIR_ERR_NO_SUPPORT, __FUNCTION__); + +error: + virDispatchError(domain->conn); + return -1; +} + +/** + * virDomainGetNumaParameters: + * @domain: pointer to domain object + * @params: pointer to numa parameter object + * (return value, allocated by the caller) + * @nparams: 
pointer to number of numa parameters + * @flags: one of virDomainModificationImpact + * + * This function may require privileged access to the hypervisor. This function + * expects the caller to allocate the @params. + * + * Returns -1 in case of error, 0 in case of success. + */ + +int +virDomainGetNumaParameters(virDomainPtr domain, + virTypedParameterPtr params, + int *nparams, unsigned int flags) +{ + virConnectPtr conn; + + VIR_DOMAIN_DEBUG(domain, "params=%p, nparams=%d, flags=%x", + params, (nparams) ? *nparams : -1, flags); + + virResetLastError(); + + if (!VIR_IS_CONNECTED_DOMAIN(domain)) { + virLibDomainError(VIR_ERR_INVALID_DOMAIN, __FUNCTION__); + virDispatchError(NULL); + return -1; + } + if ((nparams == NULL) || (*nparams < 0) || + (params == NULL && *nparams != 0)) { + virLibDomainError(VIR_ERR_INVALID_ARG, __FUNCTION__); + goto error; + } + if (VIR_DRV_SUPPORTS_FEATURE(domain->conn->driver, domain->conn, + VIR_DRV_FEATURE_TYPED_PARAM_STRING)) + flags |= VIR_TYPED_PARAM_STRING_OKAY; + + conn = domain->conn; + + if (conn->driver->domainGetNumaParameters) { + int ret; + ret = conn->driver->domainGetNumaParameters (domain, params, nparams, flags); + if (ret < 0) + goto error; + return ret; + } + virLibConnError(VIR_ERR_NO_SUPPORT, __FUNCTION__); + +error: + virDispatchError(domain->conn); + return -1; +} + +/** * virDomainSetBlkioParameters: * @domain: pointer to domain object * @params: pointer to blkio parameter objects -- 1.7.3.1
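Like virDomainGetMemoryParameters, the getter documented above follows libvirt's two-call convention: call once with *nparams == 0 to discover how many parameters the driver supports, allocate the array, then call again to fill it. A stub illustrating the caller-visible contract (get_params is a hypothetical stand-in, not the real API):

```c
#include <assert.h>
#include <stdlib.h>

#define NB_PARAMS 2  /* hypothetical driver-supported parameter count */

/* Stand-in for virDomainGetNumaParameters: when *nparams is 0, report the
 * supported count and touch nothing else; otherwise require a buffer big
 * enough for all parameters and fill it. */
static int get_params(int *values, int *nparams)
{
    if (*nparams == 0) {
        *nparams = NB_PARAMS;      /* first call: just report the count */
        return 0;
    }
    if (*nparams < NB_PARAMS)
        return -1;                 /* caller's buffer is too small */
    for (int i = 0; i < NB_PARAMS; i++)
        values[i] = i + 1;         /* pretend to fill parameter values */
    *nparams = NB_PARAMS;
    return 0;
}
```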

--- daemon/remote.c | 64 ++++++++++++++++++++++++++++++++++++++++++ src/remote/remote_driver.c | 50 ++++++++++++++++++++++++++++++++ src/remote/remote_protocol.x | 25 +++++++++++++++- src/remote_protocol-structs | 16 ++++++++++ 4 files changed, 154 insertions(+), 1 deletions(-) diff --git a/daemon/remote.c b/daemon/remote.c index 97c9538..9ccae47 100644 --- a/daemon/remote.c +++ b/daemon/remote.c @@ -1624,6 +1624,70 @@ cleanup: } static int +remoteDispatchDomainGetNumaParameters(virNetServerPtr server ATTRIBUTE_UNUSED, + virNetServerClientPtr client ATTRIBUTE_UNUSED, + virNetMessagePtr msg ATTRIBUTE_UNUSED, + virNetMessageErrorPtr rerr, + remote_domain_get_numa_parameters_args *args, + remote_domain_get_numa_parameters_ret *ret) +{ + virDomainPtr dom = NULL; + virTypedParameterPtr params = NULL; + int nparams = args->nparams; + unsigned int flags; + int rv = -1; + struct daemonClientPrivate *priv = + virNetServerClientGetPrivateData(client); + + if (!priv->conn) { + virNetError(VIR_ERR_INTERNAL_ERROR, "%s", _("connection not open")); + goto cleanup; + } + + flags = args->flags; + + if (nparams > REMOTE_DOMAIN_MEMORY_PARAMETERS_MAX) { + virNetError(VIR_ERR_INTERNAL_ERROR, "%s", _("nparams too large")); + goto cleanup; + } + if (VIR_ALLOC_N(params, nparams) < 0) { + virReportOOMError(); + goto cleanup; + } + + if (!(dom = get_nonnull_domain(priv->conn, args->dom))) + goto cleanup; + + if (virDomainGetNumaParameters(dom, params, &nparams, flags) < 0) + goto cleanup; + + /* In this case, we need to send back the number of parameters + * supported + */ + if (args->nparams == 0) { + ret->nparams = nparams; + goto success; + } + + if (remoteSerializeTypedParameters(params, nparams, + &ret->params.params_val, + &ret->params.params_len, + flags) < 0) + goto cleanup; + +success: + rv = 0; + +cleanup: + if (rv < 0) + virNetMessageSaveError(rerr); + if (dom) + virDomainFree(dom); + VIR_FREE(params); + return rv; +} + +static int 
remoteDispatchDomainGetBlkioParameters(virNetServerPtr server ATTRIBUTE_UNUSED, virNetServerClientPtr client ATTRIBUTE_UNUSED, virNetMessagePtr msg ATTRIBUTE_UNUSED, diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c index 94fd3e7..32342cc 100644 --- a/src/remote/remote_driver.c +++ b/src/remote/remote_driver.c @@ -1514,6 +1514,54 @@ done: } static int +remoteDomainGetNumaParameters (virDomainPtr domain, + virTypedParameterPtr params, int *nparams, + unsigned int flags) +{ + int rv = -1; + remote_domain_get_numa_parameters_args args; + remote_domain_get_numa_parameters_ret ret; + struct private_data *priv = domain->conn->privateData; + + remoteDriverLock(priv); + + make_nonnull_domain (&args.dom, domain); + args.nparams = *nparams; + args.flags = flags; + + memset (&ret, 0, sizeof ret); + if (call (domain->conn, priv, 0, REMOTE_PROC_DOMAIN_GET_NUMA_PARAMETERS, + (xdrproc_t) xdr_remote_domain_get_numa_parameters_args, (char *) &args, + (xdrproc_t) xdr_remote_domain_get_numa_parameters_ret, (char *) &ret) == -1) + goto done; + + /* Handle the case when the caller does not know the number of parameters + * and is asking for the number of parameters supported + */ + if (*nparams == 0) { + *nparams = ret.nparams; + rv = 0; + goto cleanup; + } + + if (remoteDeserializeTypedParameters(ret.params.params_val, + ret.params.params_len, + REMOTE_DOMAIN_NUMA_PARAMETERS_MAX, + params, + nparams) < 0) + goto cleanup; + + rv = 0; + +cleanup: + xdr_free ((xdrproc_t) xdr_remote_domain_get_numa_parameters_ret, + (char *) &ret); +done: + remoteDriverUnlock(priv); + return rv; +} + +static int remoteDomainGetBlkioParameters (virDomainPtr domain, virTypedParameterPtr params, int *nparams, unsigned int flags) @@ -4550,6 +4598,8 @@ static virDriver remote_driver = { .domainGetBlockJobInfo = remoteDomainGetBlockJobInfo, /* 0.9.4 */ .domainBlockJobSetSpeed = remoteDomainBlockJobSetSpeed, /* 0.9.4 */ .domainBlockPull = remoteDomainBlockPull, /* 0.9.4 */ + 
.domainSetNumaParameters = remoteDomainSetNumaParameters, /* 0.9.8 */ + .domainGetNumaParameters = remoteDomainGetNumaParameters, /* 0.9.8 */ }; static virNetworkDriver network_driver = { diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x index 5ea1114..8baf793 100644 --- a/src/remote/remote_protocol.x +++ b/src/remote/remote_protocol.x @@ -125,6 +125,9 @@ const REMOTE_DOMAIN_BLKIO_PARAMETERS_MAX = 16; /* Upper limit on list of memory parameters. */ const REMOTE_DOMAIN_MEMORY_PARAMETERS_MAX = 16; +/* Upper limit on list of numa parameters. */ +const REMOTE_DOMAIN_NUMA_PARAMETERS_MAX = 16; + /* Upper limit on list of node cpu stats. */ const REMOTE_NODE_CPU_STATS_MAX = 16; @@ -537,6 +540,23 @@ struct remote_domain_get_memory_parameters_ret { int nparams; }; +struct remote_domain_set_numa_parameters_args { + remote_nonnull_domain dom; + remote_typed_param params<REMOTE_DOMAIN_MEMORY_PARAMETERS_MAX>; + unsigned int flags; +}; + +struct remote_domain_get_numa_parameters_args { + remote_nonnull_domain dom; + int nparams; + unsigned int flags; +}; + +struct remote_domain_get_numa_parameters_ret { + remote_typed_param params<REMOTE_DOMAIN_MEMORY_PARAMETERS_MAX>; + int nparams; +}; + struct remote_domain_block_stats_args { remote_nonnull_domain dom; remote_nonnull_string path; @@ -2564,7 +2584,10 @@ enum remote_procedure { REMOTE_PROC_DOMAIN_SNAPSHOT_NUM_CHILDREN = 246, /* autogen autogen priority:high */ REMOTE_PROC_DOMAIN_SNAPSHOT_LIST_CHILDREN_NAMES = 247, /* autogen autogen priority:high */ REMOTE_PROC_DOMAIN_EVENT_DISK_CHANGE = 248, /* skipgen skipgen */ - REMOTE_PROC_DOMAIN_OPEN_GRAPHICS = 249 /* skipgen skipgen */ + REMOTE_PROC_DOMAIN_OPEN_GRAPHICS = 249, /* skipgen skipgen */ + REMOTE_PROC_DOMAIN_SET_NUMA_PARAMETERS = 250, /* autogen autogen */ + + REMOTE_PROC_DOMAIN_GET_NUMA_PARAMETERS = 251 /* skipgen skipgen */ /* * Notice how the entries are grouped in sets of 10 ? 
diff --git a/src/remote_protocol-structs b/src/remote_protocol-structs index b460b77..25360a8 100644 --- a/src/remote_protocol-structs +++ b/src/remote_protocol-structs @@ -168,6 +168,22 @@ struct remote_node_get_memory_stats_ret { } params; int nparams; }; +struct remote_domain_set_numa_parameters_args { + remote_nonnull_domain dom; + remote_typed_param params<REMOTE_DOMAIN_MEMORY_PARAMETERS_MAX>; + unsigned int flags; +}; + +struct remote_domain_get_numa_parameters_args { + remote_nonnull_domain dom; + int nparams; + unsigned int flags; +}; + +struct remote_domain_get_numa_parameters_ret { + remote_typed_param params<REMOTE_DOMAIN_MEMORY_PARAMETERS_MAX>; + int nparams; +}; struct remote_node_get_cells_free_memory_args { int startCell; int maxcells; -- 1.7.3.1

--- src/qemu/qemu_driver.c | 399 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 399 insertions(+), 0 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 5f4a18d..5b6398c 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -94,6 +94,8 @@ #define QEMU_NB_MEM_PARAM 3 +#define QEMU_NB_NUMA_PARAM 2 + #if HAVE_LINUX_KVM_H # include <linux/kvm.h> #endif @@ -6524,6 +6526,401 @@ cleanup: return ret; } +static int qemuDomainSetNumaParameters(virDomainPtr dom, + virTypedParameterPtr params, + int nparams, + unsigned int flags) +{ + struct qemud_driver *driver = dom->conn->privateData; + int i; + virDomainDefPtr persistentDef = NULL; + virCgroupPtr group = NULL; + virDomainObjPtr vm = NULL; + int ret = -1; + bool isActive; + + virCheckFlags(VIR_DOMAIN_AFFECT_LIVE | + VIR_DOMAIN_AFFECT_CONFIG, -1); + + qemuDriverLock(driver); + + vm = virDomainFindByUUID(&driver->domains, dom->uuid); + + if (vm == NULL) { + qemuReportError(VIR_ERR_INTERNAL_ERROR, + _("No such domain %s"), dom->uuid); + goto cleanup; + } + + isActive = virDomainObjIsActive(vm); + + if (flags == VIR_DOMAIN_AFFECT_CURRENT) { + if (isActive) + flags = VIR_DOMAIN_AFFECT_LIVE; + else + flags = VIR_DOMAIN_AFFECT_CONFIG; + } + + if (flags & VIR_DOMAIN_AFFECT_LIVE) { + if (!isActive) { + qemuReportError(VIR_ERR_OPERATION_INVALID, + "%s", _("domain is not running")); + goto cleanup; + } + + if (!qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_CPUSET)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, + "%s", _("cgroup cpuset controller is not mounted")); + goto cleanup; + } + + if (virCgroupForDomain(driver->cgroup, vm->def->name, &group, 0) != 0) { + qemuReportError(VIR_ERR_INTERNAL_ERROR, + _("cannot find cgroup for domain %s"), vm->def->name); + goto cleanup; + } + } + + if (flags & VIR_DOMAIN_AFFECT_CONFIG) { + if (!vm->persistent) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("cannot change persistent config of a transient domain")); + 
goto cleanup; + } + if (!(persistentDef = virDomainObjGetPersistentDef(driver->caps, vm))) + goto cleanup; + } + + ret = 0; + for (i = 0; i < nparams; i++) { + virTypedParameterPtr param = ¶ms[i]; + + if (STREQ(param->field, VIR_DOMAIN_NUMA_NODESET)) { + int rc; + if (param->type != VIR_TYPED_PARAM_STRING) { + qemuReportError(VIR_ERR_INVALID_ARG, "%s", + _("invalid type for numa nodeset tunable, expected a 'string'")); + ret = -1; + continue; + } + + if (flags & VIR_DOMAIN_AFFECT_LIVE) { + rc = virCgroupSetCpusetMems(group, params[i].value.s); + if (rc != 0) { + virReportSystemError(-rc, "%s", + _("unable to set memory hard_limit tunable")); + ret = -1; + continue; + } + } + + if (flags & VIR_DOMAIN_AFFECT_CONFIG) { + char *oldnodemask = strdup(persistentDef->numatune.memory.nodemask); + if (!oldnodemask) { + virReportOOMError(); + ret = -1; + continue; + } + if (virDomainCpuSetParse((const char **)¶ms[i].value.s, + 0, + persistentDef->numatune.memory.nodemask, + VIR_DOMAIN_CPUMASK_LEN) < 0) { + VIR_FREE(persistentDef->numatune.memory.nodemask); + persistentDef->numatune.memory.nodemask = oldnodemask; + ret = -1; + continue; + } + } + } else if (STREQ(param->field, VIR_DOMAIN_NUMA_MODE)) { + int rc; + if (param->type != VIR_TYPED_PARAM_ULLONG) { + qemuReportError(VIR_ERR_INVALID_ARG, "%s", + _("invalid type for numa strict tunable, expected a 'ullong'")); + ret = -1; + continue; + } + + if (flags & VIR_DOMAIN_AFFECT_LIVE) { + switch(params[i].value.i) { + case VIR_DOMAIN_NUMATUNE_MEM_STRICT: + rc = virCgroupSetCpusetHardwall(group, params[i].value.i); + if (rc != 0) { + virReportSystemError(-rc, "%s", + _("unable to set memory hard_limit tunable")); + ret = -1; + } + break; + case VIR_DOMAIN_NUMATUNE_MEM_PREFERRED: + rc = virCgroupSetCpusetMemorySpreadPage(group, 0); + if (rc != 0) { + virReportSystemError(-rc, + _("Unable to set cpuset.memory_spread_page for domain %s"), + vm->def->name); + goto cleanup; + } + rc = virCgroupSetCpusetMemorySpreadSlab(group, 0); + 
if (rc != 0) { + virReportSystemError(-rc, + _("Unable to set cpuset.memory_spread_slab for domain %s"), + vm->def->name); + goto cleanup; + } + break; + case VIR_DOMAIN_NUMATUNE_MEM_INTERLEAVE: + rc = virCgroupSetCpusetMemorySpreadPage(group, 1); + if (rc != 0) { + virReportSystemError(-rc, + _("Unable to set cpuset.memory_spread_page for domain %s"), + vm->def->name); + goto cleanup; + } + rc = virCgroupSetCpusetMemorySpreadSlab(group, 1); + if (rc != 0) { + virReportSystemError(-rc, + _("Unable to set cpuset.memory_spread_slab for domain %s"), + vm->def->name); + goto cleanup; + } + + break; + default: + qemuReportError(VIR_ERR_INVALID_ARG, _("Unsupported mode")); + goto cleanup; + } + } + + if (flags & VIR_DOMAIN_AFFECT_CONFIG) { + persistentDef->numatune.memory.mode = params[i].value.i; + } + } else { + qemuReportError(VIR_ERR_INVALID_ARG, + _("Parameter `%s' not supported"), param->field); + ret = -1; + } + } + + if (flags & VIR_DOMAIN_AFFECT_CONFIG) { + if (virDomainSaveConfig(driver->configDir, persistentDef) < 0) + ret = -1; + } + +cleanup: + virCgroupFree(&group); + if (vm) + virDomainObjUnlock(vm); + qemuDriverUnlock(driver); + return ret; +} + +static int qemuDomainGetNumaParameters(virDomainPtr dom, + virTypedParameterPtr params, + int *nparams, + unsigned int flags) +{ + struct qemud_driver *driver = dom->conn->privateData; + int i; + virCgroupPtr group = NULL; + virDomainObjPtr vm = NULL; + virDomainDefPtr persistentDef = NULL; + char *nodeset = NULL; + unsigned long long val; + int ret = -1; + int rc; + bool isActive; + + virCheckFlags(VIR_DOMAIN_AFFECT_LIVE | + VIR_DOMAIN_AFFECT_CONFIG | + VIR_TYPED_PARAM_STRING_OKAY, -1); + + qemuDriverLock(driver); + + flags &= ~VIR_TYPED_PARAM_STRING_OKAY; + + vm = virDomainFindByUUID(&driver->domains, dom->uuid); + + if (vm == NULL) { + qemuReportError(VIR_ERR_INTERNAL_ERROR, + _("No such domain %s"), dom->uuid); + goto cleanup; + } + + isActive = virDomainObjIsActive(vm); + + if (flags == 
VIR_DOMAIN_AFFECT_CURRENT) { + if (isActive) + flags = VIR_DOMAIN_AFFECT_LIVE; + else + flags = VIR_DOMAIN_AFFECT_CONFIG; + } + + if (flags & VIR_DOMAIN_AFFECT_LIVE) { + if (!isActive) { + qemuReportError(VIR_ERR_OPERATION_INVALID, + "%s", _("domain is not running")); + goto cleanup; + } + + if (!qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_MEMORY)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, + "%s", _("cgroup memory controller is not mounted")); + goto cleanup; + } + + if (virCgroupForDomain(driver->cgroup, vm->def->name, &group, 0) != 0) { + qemuReportError(VIR_ERR_INTERNAL_ERROR, + _("cannot find cgroup for domain %s"), vm->def->name); + goto cleanup; + } + } + + if (flags & VIR_DOMAIN_AFFECT_CONFIG) { + if (!vm->persistent) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("cannot change persistent config of a transient domain")); + goto cleanup; + } + if (!(persistentDef = virDomainObjGetPersistentDef(driver->caps, vm))) + goto cleanup; + } + + if ((*nparams) == 0) { + *nparams = QEMU_NB_NUMA_PARAM; + ret = 0; + goto cleanup; + } + + if ((*nparams) < QEMU_NB_NUMA_PARAM) { + qemuReportError(VIR_ERR_INVALID_ARG, + "%s", _("Invalid parameter count")); + goto cleanup; + } + + if (flags & VIR_DOMAIN_AFFECT_CONFIG) { + for (i = 0; i < *nparams; i++) { + virMemoryParameterPtr param = ¶ms[i]; + val = 0; + param->value.ul = 0; + param->type = VIR_TYPED_PARAM_ULLONG; + + switch (i) { + case 0: /* fill numa nodeset here */ + if (virStrcpyStatic(param->field, VIR_DOMAIN_NUMA_NODESET) == NULL) { + qemuReportError(VIR_ERR_INTERNAL_ERROR, + "%s", _("Field numa nodeset too long for destination")); + goto cleanup; + } + if (persistentDef->numatune.memory.nodemask) { + nodeset = virDomainCpuSetFormat(persistentDef->numatune.memory.nodemask, + VIR_DOMAIN_CPUMASK_LEN); + if (!nodeset) { + qemuReportError(VIR_ERR_INTERNAL_ERROR, + "%s", _("failed to format nodeset for NUMA memory tuning")); + goto cleanup; + } + param->value.s = nodeset; + nodeset = NULL; + } 
else { + param->value.s = strdup(""); + } + param->type = VIR_TYPED_PARAM_STRING; + break; + + case 1: /* fill numa mode here */ + if (virStrcpyStatic(param->field, VIR_DOMAIN_NUMA_MODE) == NULL) { + qemuReportError(VIR_ERR_INTERNAL_ERROR, + "%s", _("Field numa mode too long for destination")); + goto cleanup; + } + param->value.i = persistentDef->numatune.memory.mode; + break; + + default: + break; + /* should not hit here */ + } + } + goto out; + } + + for (i = 0; i < QEMU_NB_NUMA_PARAM; i++) { + virTypedParameterPtr param = ¶ms[i]; + val = 0; + param->value.ul = 0; + param->type = VIR_TYPED_PARAM_ULLONG; + + /* Coverity does not realize that if we get here, group is set. */ + sa_assert(group); + + switch (i) { + case 0: /* fill numa nodeset here */ + rc = virCgroupGetCpusetHardwall(group, &val); + rc = virCgroupGetCpusetMems(group, &nodeset); + if (rc != 0) { + virReportSystemError(-rc, "%s", + _("unable to get numa nodeset")); + goto cleanup; + } + if (virStrcpyStatic(param->field, VIR_DOMAIN_NUMA_NODESET) == NULL) { + qemuReportError(VIR_ERR_INTERNAL_ERROR, + "%s", _("Field numa nodeset too long for destination")); + VIR_FREE(nodeset); + goto cleanup; + } + param->value.s = nodeset; + param->type = VIR_TYPED_PARAM_STRING; + break; + + case 1: /* file numa mode here */ + if (virStrcpyStatic(param->field, VIR_DOMAIN_NUMA_MODE) == NULL) { + qemuReportError(VIR_ERR_INTERNAL_ERROR, + "%s", _("Field numa exclusive too long for destination")); + goto cleanup; + } + rc = virCgroupGetCpusetHardwall(group, &val); + if (rc != 0) { + virReportSystemError(-rc, "%s", + _("unable to get numa mode")); + goto cleanup; + } + if (val) { + param->value.ul = VIR_DOMAIN_NUMATUNE_MEM_STRICT; + break; + } + rc = virCgroupGetCpusetMemorySpreadPage(group, &val); + if (rc != 0) { + virReportSystemError(-rc, "%s", + _("unable to get numa mode")); + goto cleanup; + } + if (val) { + param->value.ul = VIR_DOMAIN_NUMATUNE_MEM_INTERLEAVE; + break; + } else { + param->value.ul = 
VIR_DOMAIN_NUMATUNE_MEM_PREFERRED; + break; + } + param->value.ul = VIR_DOMAIN_NUMATUNE_MEM_NONE; + break; + + default: + break; + /* should not hit here */ + } + } + +out: + *nparams = QEMU_NB_NUMA_PARAM; + ret = 0; + +cleanup: + if (group) + virCgroupFree(&group); + if (vm) + virDomainObjUnlock(vm); + qemuDriverUnlock(driver); + return ret; +} + static int qemuSetVcpusBWLive(virDomainObjPtr vm, virCgroupPtr cgroup, unsigned long long period, long long quota) @@ -10919,6 +11316,8 @@ static virDriver qemuDriver = { .domainGetBlockJobInfo = qemuDomainGetBlockJobInfo, /* 0.9.4 */ .domainBlockJobSetSpeed = qemuDomainBlockJobSetSpeed, /* 0.9.4 */ .domainBlockPull = qemuDomainBlockPull, /* 0.9.4 */ + .domainSetNumaParameters = qemuDomainSetNumaParameters, /* 0.9.8 */ + .domainGetNumaParameters = qemuDomainGetNumaParameters, /* 0.9.8 */ }; -- 1.7.3.1
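The getter's live-path mode detection above reduces to a short decision chain: cpuset.mem_hardwall set means strict; otherwise memory_spread_page set means interleave, and clear means preferred. Restated as a pure function (enum values illustrative):

```c
#include <assert.h>

/* Illustrative stand-ins for the virDomainNumatuneMemMode values. */
enum { MEM_STRICT, MEM_PREFERRED, MEM_INTERLEAVE };

/* Reverse mapping used by the getter: infer the numatune mode from the
 * two cgroup flags it reads back.  Note that every flag combination is
 * claimed by one of the three modes, so the patch's trailing fallback to
 * VIR_DOMAIN_NUMATUNE_MEM_NONE is unreachable. */
static int infer_mode(int hardwall, int spread_page)
{
    if (hardwall)
        return MEM_STRICT;
    if (spread_page)
        return MEM_INTERLEAVE;
    return MEM_PREFERRED;
}
```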

On Thu, Nov 17, 2011 at 05:44:20PM +0800, Hu Tao wrote:
--- src/qemu/qemu_driver.c | 399 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 399 insertions(+), 0 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 5f4a18d..5b6398c 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -94,6 +94,8 @@
#define QEMU_NB_MEM_PARAM 3
+#define QEMU_NB_NUMA_PARAM 2 + #if HAVE_LINUX_KVM_H # include <linux/kvm.h> #endif @@ -6524,6 +6526,401 @@ cleanup: return ret; }
+static int qemuDomainSetNumaParameters(virDomainPtr dom,
+                                       virTypedParameterPtr params,
+                                       int nparams,
+                                       unsigned int flags)
+{
+    struct qemud_driver *driver = dom->conn->privateData;
+    int i;
+    virDomainDefPtr persistentDef = NULL;
+    virCgroupPtr group = NULL;
+    virDomainObjPtr vm = NULL;
+    int ret = -1;
+    bool isActive;
+
+    virCheckFlags(VIR_DOMAIN_AFFECT_LIVE |
+                  VIR_DOMAIN_AFFECT_CONFIG, -1);
+
+    qemuDriverLock(driver);
+
+    vm = virDomainFindByUUID(&driver->domains, dom->uuid);
+
+    if (vm == NULL) {
+        qemuReportError(VIR_ERR_INTERNAL_ERROR,
+                        _("No such domain %s"), dom->uuid);
+        goto cleanup;
+    }
+
+    isActive = virDomainObjIsActive(vm);
+
+    if (flags == VIR_DOMAIN_AFFECT_CURRENT) {
+        if (isActive)
+            flags = VIR_DOMAIN_AFFECT_LIVE;
+        else
+            flags = VIR_DOMAIN_AFFECT_CONFIG;
+    }
+
+    if (flags & VIR_DOMAIN_AFFECT_LIVE) {
+        if (!isActive) {
+            qemuReportError(VIR_ERR_OPERATION_INVALID,
+                            "%s", _("domain is not running"));
+            goto cleanup;
+        }
+
+        if (!qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_CPUSET)) {
+            qemuReportError(VIR_ERR_OPERATION_INVALID,
+                            "%s", _("cgroup cpuset controller is not mounted"));
+            goto cleanup;
+        }
+
+        if (virCgroupForDomain(driver->cgroup, vm->def->name, &group, 0) != 0) {
+            qemuReportError(VIR_ERR_INTERNAL_ERROR,
+                            _("cannot find cgroup for domain %s"), vm->def->name);
+            goto cleanup;
+        }
+    }
+
+    if (flags & VIR_DOMAIN_AFFECT_CONFIG) {
+        if (!vm->persistent) {
+            qemuReportError(VIR_ERR_OPERATION_INVALID, "%s",
+                            _("cannot change persistent config of a transient domain"));
+            goto cleanup;
+        }
+        if (!(persistentDef = virDomainObjGetPersistentDef(driver->caps, vm)))
+            goto cleanup;
+    }
+
+    ret = 0;
+    for (i = 0; i < nparams; i++) {
+        virTypedParameterPtr param = &params[i];
+
+        if (STREQ(param->field, VIR_DOMAIN_NUMA_NODESET)) {
+            int rc;
+            if (param->type != VIR_TYPED_PARAM_STRING) {
+                qemuReportError(VIR_ERR_INVALID_ARG, "%s",
+                                _("invalid type for numa nodeset tunable, expected a 'string'"));
+                ret = -1;
+                continue;
+            }
+
+            if (flags & VIR_DOMAIN_AFFECT_LIVE) {
+                rc = virCgroupSetCpusetMems(group, params[i].value.s);
+                if (rc != 0) {
+                    virReportSystemError(-rc, "%s",
+                                         _("unable to set numa nodeset"));
+                    ret = -1;
+                    continue;
+                }
+            }
+
+            if (flags & VIR_DOMAIN_AFFECT_CONFIG) {
+                char *oldnodemask = strdup(persistentDef->numatune.memory.nodemask);
+                if (!oldnodemask) {
+                    virReportOOMError();
+                    ret = -1;
+                    continue;
+                }
+                if (virDomainCpuSetParse((const char **)&params[i].value.s,
+                                         0,
+                                         persistentDef->numatune.memory.nodemask,
+                                         VIR_DOMAIN_CPUMASK_LEN) < 0) {
+                    VIR_FREE(persistentDef->numatune.memory.nodemask);
+                    persistentDef->numatune.memory.nodemask = oldnodemask;
+                    ret = -1;
+                    continue;
+                }
+            }
+        } else if (STREQ(param->field, VIR_DOMAIN_NUMA_MODE)) {
+            int rc;
+            if (param->type != VIR_TYPED_PARAM_ULLONG) {
+                qemuReportError(VIR_ERR_INVALID_ARG, "%s",
+                                _("invalid type for numa mode tunable, expected a 'ullong'"));
+                ret = -1;
+                continue;
+            }
+
+            if (flags & VIR_DOMAIN_AFFECT_LIVE) {
+                switch (params[i].value.i) {
+                case VIR_DOMAIN_NUMATUNE_MEM_STRICT:
+                    rc = virCgroupSetCpusetHardwall(group, 1);
+                    if (rc != 0) {
+                        virReportSystemError(-rc, "%s",
+                                             _("unable to set cpuset.mem_hardwall"));
+                        ret = -1;
+                    }
+                    break;
+                case VIR_DOMAIN_NUMATUNE_MEM_PREFERRED:
+                    rc = virCgroupSetCpusetMemorySpreadPage(group, 0);
+                    if (rc != 0) {
+                        virReportSystemError(-rc,
+                                             _("Unable to set cpuset.memory_spread_page for domain %s"),
+                                             vm->def->name);
+                        goto cleanup;
+                    }
+                    rc = virCgroupSetCpusetMemorySpreadSlab(group, 0);
+                    if (rc != 0) {
+                        virReportSystemError(-rc,
+                                             _("Unable to set cpuset.memory_spread_slab for domain %s"),
+                                             vm->def->name);
+                        goto cleanup;
+                    }
+                    break;
+                case VIR_DOMAIN_NUMATUNE_MEM_INTERLEAVE:
+                    rc = virCgroupSetCpusetMemorySpreadPage(group, 1);
+                    if (rc != 0) {
+                        virReportSystemError(-rc,
+                                             _("Unable to set cpuset.memory_spread_page for domain %s"),
+                                             vm->def->name);
+                        goto cleanup;
+                    }
+                    rc = virCgroupSetCpusetMemorySpreadSlab(group, 1);
+                    if (rc != 0) {
+                        virReportSystemError(-rc,
+                                             _("Unable to set cpuset.memory_spread_slab for domain %s"),
+                                             vm->def->name);
+                        goto cleanup;
+                    }
+                    break;
+                default:
+                    qemuReportError(VIR_ERR_INVALID_ARG, "%s", _("Unsupported mode"));
+                    goto cleanup;
+                }
+            }
+
+            if (flags & VIR_DOMAIN_AFFECT_CONFIG) {
+                persistentDef->numatune.memory.mode = params[i].value.i;
+            }
+        } else {
+            qemuReportError(VIR_ERR_INVALID_ARG,
+                            _("Parameter `%s' not supported"), param->field);
+            ret = -1;
+        }
+    }
+
+    if (flags & VIR_DOMAIN_AFFECT_CONFIG) {
+        if (virDomainSaveConfig(driver->configDir, persistentDef) < 0)
+            ret = -1;
+    }
+
+cleanup:
+    virCgroupFree(&group);
+    if (vm)
+        virDomainObjUnlock(vm);
+    qemuDriverUnlock(driver);
+    return ret;
+}
+
+static int qemuDomainGetNumaParameters(virDomainPtr dom,
+                                       virTypedParameterPtr params,
+                                       int *nparams,
+                                       unsigned int flags)
+{
+    struct qemud_driver *driver = dom->conn->privateData;
+    int i;
+    virCgroupPtr group = NULL;
+    virDomainObjPtr vm = NULL;
+    virDomainDefPtr persistentDef = NULL;
+    char *nodeset = NULL;
+    unsigned long long val;
+    int ret = -1;
+    int rc;
+    bool isActive;
+
+    virCheckFlags(VIR_DOMAIN_AFFECT_LIVE |
+                  VIR_DOMAIN_AFFECT_CONFIG |
+                  VIR_TYPED_PARAM_STRING_OKAY, -1);
+
+    qemuDriverLock(driver);
+
+    flags &= ~VIR_TYPED_PARAM_STRING_OKAY;
+
+    vm = virDomainFindByUUID(&driver->domains, dom->uuid);
+
+    if (vm == NULL) {
+        qemuReportError(VIR_ERR_INTERNAL_ERROR,
+                        _("No such domain %s"), dom->uuid);
+        goto cleanup;
+    }
+
+    isActive = virDomainObjIsActive(vm);
+
+    if (flags == VIR_DOMAIN_AFFECT_CURRENT) {
+        if (isActive)
+            flags = VIR_DOMAIN_AFFECT_LIVE;
+        else
+            flags = VIR_DOMAIN_AFFECT_CONFIG;
+    }
+
+    if (flags & VIR_DOMAIN_AFFECT_LIVE) {
+        if (!isActive) {
+            qemuReportError(VIR_ERR_OPERATION_INVALID,
+                            "%s", _("domain is not running"));
+            goto cleanup;
+        }
+
+        if (!qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_CPUSET)) {
+            qemuReportError(VIR_ERR_OPERATION_INVALID,
+                            "%s", _("cgroup cpuset controller is not mounted"));
+            goto cleanup;
+        }
+
+        if (virCgroupForDomain(driver->cgroup, vm->def->name, &group, 0) != 0) {
+            qemuReportError(VIR_ERR_INTERNAL_ERROR,
+                            _("cannot find cgroup for domain %s"), vm->def->name);
+            goto cleanup;
+        }
+    }
+
+    if (flags & VIR_DOMAIN_AFFECT_CONFIG) {
+        if (!vm->persistent) {
+            qemuReportError(VIR_ERR_OPERATION_INVALID, "%s",
+                            _("cannot get persistent config of a transient domain"));
+            goto cleanup;
+        }
+        if (!(persistentDef = virDomainObjGetPersistentDef(driver->caps, vm)))
+            goto cleanup;
+    }
+
+    if ((*nparams) == 0) {
+        *nparams = QEMU_NB_NUMA_PARAM;
+        ret = 0;
+        goto cleanup;
+    }
+
+    if ((*nparams) < QEMU_NB_NUMA_PARAM) {
+        qemuReportError(VIR_ERR_INVALID_ARG,
+                        "%s", _("Invalid parameter count"));
+        goto cleanup;
+    }
+
+    if (flags & VIR_DOMAIN_AFFECT_CONFIG) {
+        for (i = 0; i < *nparams; i++) {
+            virTypedParameterPtr param = &params[i];
+            val = 0;
+            param->value.ul = 0;
+            param->type = VIR_TYPED_PARAM_ULLONG;
+
+            switch (i) {
+            case 0: /* fill numa nodeset here */
+                if (virStrcpyStatic(param->field, VIR_DOMAIN_NUMA_NODESET) == NULL) {
+                    qemuReportError(VIR_ERR_INTERNAL_ERROR,
+                                    "%s", _("Field numa nodeset too long for destination"));
+                    goto cleanup;
+                }
+                if (persistentDef->numatune.memory.nodemask) {
+                    nodeset = virDomainCpuSetFormat(persistentDef->numatune.memory.nodemask,
+                                                    VIR_DOMAIN_CPUMASK_LEN);
+                    if (!nodeset) {
+                        qemuReportError(VIR_ERR_INTERNAL_ERROR,
+                                        "%s", _("failed to format nodeset for NUMA memory tuning"));
+                        goto cleanup;
+                    }
+                    param->value.s = nodeset;
+                    nodeset = NULL;
+                } else {
+                    param->value.s = strdup("");
+                }
+                param->type = VIR_TYPED_PARAM_STRING;
+                break;
+
+            case 1: /* fill numa mode here */
+                if (virStrcpyStatic(param->field, VIR_DOMAIN_NUMA_MODE) == NULL) {
+                    qemuReportError(VIR_ERR_INTERNAL_ERROR,
+                                    "%s", _("Field numa mode too long for destination"));
+                    goto cleanup;
+                }
+                param->value.i = persistentDef->numatune.memory.mode;
+                break;
+
+            default:
+                /* should not hit here */
+                break;
+            }
+        }
+        goto out;
+    }
+
+    for (i = 0; i < QEMU_NB_NUMA_PARAM; i++) {
+        virTypedParameterPtr param = &params[i];
+        val = 0;
+        param->value.ul = 0;
+        param->type = VIR_TYPED_PARAM_ULLONG;
+
+        /* Coverity does not realize that if we get here, group is set. */
+        sa_assert(group);
+
+        switch (i) {
+        case 0: /* fill numa nodeset here */
+            rc = virCgroupGetCpusetMems(group, &nodeset);
+            if (rc != 0) {
+                virReportSystemError(-rc, "%s",
+                                     _("unable to get numa nodeset"));
+                goto cleanup;
+            }
+            if (virStrcpyStatic(param->field, VIR_DOMAIN_NUMA_NODESET) == NULL) {
+                qemuReportError(VIR_ERR_INTERNAL_ERROR,
+                                "%s", _("Field numa nodeset too long for destination"));
+                VIR_FREE(nodeset);
+                goto cleanup;
+            }
+            param->value.s = nodeset;
+            param->type = VIR_TYPED_PARAM_STRING;
+            break;
+
+        case 1: /* fill numa mode here */
+            if (virStrcpyStatic(param->field, VIR_DOMAIN_NUMA_MODE) == NULL) {
+                qemuReportError(VIR_ERR_INTERNAL_ERROR,
+                                "%s", _("Field numa mode too long for destination"));
+                goto cleanup;
+            }
+            rc = virCgroupGetCpusetHardwall(group, &val);
+            if (rc != 0) {
+                virReportSystemError(-rc, "%s",
+                                     _("unable to get numa mode"));
+                goto cleanup;
+            }
+            if (val) {
+                param->value.ul = VIR_DOMAIN_NUMATUNE_MEM_STRICT;
+                break;
+            }
+            rc = virCgroupGetCpusetMemorySpreadPage(group, &val);
+            if (rc != 0) {
+                virReportSystemError(-rc, "%s",
+                                     _("unable to get numa mode"));
+                goto cleanup;
+            }
+            if (val)
+                param->value.ul = VIR_DOMAIN_NUMATUNE_MEM_INTERLEAVE;
+            else
+                param->value.ul = VIR_DOMAIN_NUMATUNE_MEM_PREFERRED;
+            break;
+
+        default:
+            /* should not hit here */
+            break;
+        }
+    }
+
+out:
+    *nparams = QEMU_NB_NUMA_PARAM;
+    ret = 0;
+
+cleanup:
+    if (group)
+        virCgroupFree(&group);
+    if (vm)
+        virDomainObjUnlock(vm);
+    qemuDriverUnlock(driver);
+    return ret;
+}
Same API comments as with patch 7, of course. In terms of logic, we can only allow the nodeset mask to be changed on the fly given current cpuset functionality. We can't support switching between policy modes.

Daniel

-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

add new command numatune to virsh

to get/set numa parameters
---
 tools/virsh.c |  180 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 180 insertions(+), 0 deletions(-)

diff --git a/tools/virsh.c b/tools/virsh.c
index 89fb4e7..3dfa375 100644
--- a/tools/virsh.c
+++ b/tools/virsh.c
@@ -60,6 +60,7 @@
 #include "command.h"
 #include "virkeycode.h"
 #include "virnetdevbandwidth.h"
+#include "conf/domain_conf.h"
 
 static char *progname;
 
@@ -4975,6 +4976,184 @@ cmdMemtune(vshControl * ctl, const vshCmd * cmd)
 }
 
 /*
+ * "numatune" command
+ */
+static const vshCmdInfo info_numatune[] = {
+    {"help", N_("Get or set numa parameters")},
+    {"desc", N_("Get or set the current numa parameters for a guest"
+                " domain.\n"
+                "    To get the numa parameters use following command: \n\n"
+                "    virsh # numatune <domain>")},
+    {NULL, NULL}
+};
+
+static const vshCmdOptDef opts_numatune[] = {
+    {"domain", VSH_OT_DATA, VSH_OFLAG_REQ, N_("domain name, id or uuid")},
+    {"nodeset", VSH_OT_DATA, VSH_OFLAG_NONE,
+     N_("NUMA node to set")},
+    {"mode", VSH_OT_DATA, VSH_OFLAG_NONE,
+     N_("NUMA mode, one of strict, preferred and interleave")},
+    {"config", VSH_OT_BOOL, 0, N_("affect next boot")},
+    {"live", VSH_OT_BOOL, 0, N_("affect running domain")},
+    {"current", VSH_OT_BOOL, 0, N_("affect current domain")},
+    {NULL, 0, 0, NULL}
+};
+
+static bool
+cmdNumatune(vshControl * ctl, const vshCmd * cmd)
+{
+    virDomainPtr dom;
+    int nparams = 0;
+    int i = 0;
+    virTypedParameterPtr params = NULL, temp = NULL;
+    const char *nodeset = NULL;
+    bool ret = false;
+    unsigned int flags = 0;
+    int current = vshCommandOptBool(cmd, "current");
+    int config = vshCommandOptBool(cmd, "config");
+    int live = vshCommandOptBool(cmd, "live");
+    const char *mode = NULL;
+
+    if (current) {
+        if (live || config) {
+            vshError(ctl, "%s", _("--current must be specified exclusively"));
+            return false;
+        }
+        flags = VIR_DOMAIN_AFFECT_CURRENT;
+    } else {
+        if (config)
+            flags |= VIR_DOMAIN_AFFECT_CONFIG;
+        if (live)
+            flags |= VIR_DOMAIN_AFFECT_LIVE;
+    }
+
+    if (!vshConnectionUsability(ctl, ctl->conn))
+        return false;
+
+    if (!(dom = vshCommandOptDomain(ctl, cmd, NULL)))
+        return false;
+
+    if (vshCommandOptString(cmd, "nodeset", &nodeset) < 0) {
+        vshError(ctl, "%s", _("Unable to parse nodeset."));
+        virDomainFree(dom);
+        return false;
+    }
+    if (nodeset)
+        nparams++;
+    if (vshCommandOptString(cmd, "mode", &mode) < 0) {
+        vshError(ctl, "%s", _("Unable to parse mode."));
+        virDomainFree(dom);
+        return false;
+    }
+    if (mode)
+        nparams++;
+
+    if (nparams == 0) {
+        /* get the number of numa parameters */
+        if (virDomainGetNumaParameters(dom, NULL, &nparams, flags) != 0) {
+            vshError(ctl, "%s",
+                     _("Unable to get number of numa parameters"));
+            goto cleanup;
+        }
+
+        if (nparams == 0) {
+            /* nothing to output */
+            ret = true;
+            goto cleanup;
+        }
+
+        /* now go get all the numa parameters */
+        params = vshCalloc(ctl, nparams, sizeof(*params));
+        if (virDomainGetNumaParameters(dom, params, &nparams, flags) != 0) {
+            vshError(ctl, "%s", _("Unable to get numa parameters"));
+            goto cleanup;
+        }
+
+        for (i = 0; i < nparams; i++) {
+            switch (params[i].type) {
+            case VIR_TYPED_PARAM_INT:
+                vshPrint(ctl, "%-15s: %d\n", params[i].field,
+                         params[i].value.i);
+                break;
+            case VIR_TYPED_PARAM_UINT:
+                vshPrint(ctl, "%-15s: %u\n", params[i].field,
+                         params[i].value.ui);
+                break;
+            case VIR_TYPED_PARAM_LLONG:
+                vshPrint(ctl, "%-15s: %lld\n", params[i].field,
+                         params[i].value.l);
+                break;
+            case VIR_TYPED_PARAM_ULLONG:
+                if (STREQ(params[i].field, VIR_DOMAIN_NUMA_MODE))
+                    vshPrint(ctl, "%-15s: %s\n", params[i].field,
+                             virDomainNumatuneMemModeTypeToString(params[i].value.ul));
+                else
+                    vshPrint(ctl, "%-15s: %llu\n", params[i].field,
+                             params[i].value.ul);
+                break;
+            case VIR_TYPED_PARAM_DOUBLE:
+                vshPrint(ctl, "%-15s: %f\n", params[i].field,
+                         params[i].value.d);
+                break;
+            case VIR_TYPED_PARAM_BOOLEAN:
+                vshPrint(ctl, "%-15s: %d\n", params[i].field,
+                         params[i].value.b);
+                break;
+            case VIR_TYPED_PARAM_STRING:
+                vshPrint(ctl, "%-15s: %s\n", params[i].field,
+                         params[i].value.s);
+                break;
+            default:
+                vshPrint(ctl, "unimplemented numa parameter type\n");
+            }
+        }
+
+        ret = true;
+    } else {
+        /* set the numa parameters */
+        params = vshCalloc(ctl, nparams, sizeof(*params));
+
+        for (i = 0; i < nparams; i++) {
+            temp = &params[i];
+            temp->type = VIR_TYPED_PARAM_ULLONG;
+
+            /*
+             * Some magic here: this fills the params structure with the
+             * valid arguments passed.  After filling a particular
+             * argument we purposely set it to NULL, so on the next pass
+             * the loop goes on to the next valid argument.
+             */
+            if (nodeset) {
+                temp->value.s = vshStrdup(ctl, nodeset);
+                temp->type = VIR_TYPED_PARAM_STRING;
+                if (!virStrcpy(temp->field, VIR_DOMAIN_NUMA_NODESET,
+                               sizeof(temp->field)))
+                    goto cleanup;
+                nodeset = NULL;
+                continue;
+            }
+            if (mode) {
+                if ((temp->value.i =
+                     virDomainNumatuneMemModeTypeFromString(mode)) < 0) {
+                    vshError(ctl, "%s %s", _("Invalid mode"), mode);
+                    goto cleanup;
+                }
+                if (!virStrcpy(temp->field, VIR_DOMAIN_NUMA_MODE,
+                               sizeof(temp->field)))
+                    goto cleanup;
+                mode = NULL;
+            }
+        }
+        if (virDomainSetNumaParameters(dom, params, nparams, flags) != 0)
+            vshError(ctl, "%s", _("Unable to change numa parameters"));
+        else
+            ret = true;
+    }
+
+  cleanup:
+    VIR_FREE(params);
+    virDomainFree(dom);
+    return ret;
+}
+
+/*
  * "nodeinfo" command
  */
 static const vshCmdInfo info_nodeinfo[] = {
@@ -14470,6 +14649,7 @@ static const vshCmdDef domManagementCmds[] = {
      info_managedsaveremove, 0},
     {"maxvcpus", cmdMaxvcpus, opts_maxvcpus, info_maxvcpus, 0},
     {"memtune", cmdMemtune, opts_memtune, info_memtune, 0},
+    {"numatune", cmdNumatune, opts_numatune, info_numatune, 0},
     {"migrate", cmdMigrate, opts_migrate, info_migrate, 0},
     {"migrate-setmaxdowntime", cmdMigrateSetMaxDowntime,
      opts_migrate_setmaxdowntime, info_migrate_setmaxdowntime, 0},
-- 
1.7.3.1

On 2011-11-17 17:44, Hu Tao wrote:
This series does mainly two things:
1. use cgroup cpuset to manage numa parameters 2. add a virsh command numatune to allow user to change numa parameters from command line
Current numa parameters include nodeset and mode, but what cgroup cpuset provides doesn't completely match them. Details:

params            cpuset
------------------------------------------------------
nodeset           cpuset provides cpuset.mems
mode strict       cpuset provides cpuset.mem_hardwall
mode interleave   cpuset provides cpuset.memory_spread_*
mode preferred    no equivalent. !spread to preferred?

Besides, only one of the modes can currently be set at a time, but for cpuset the parameters are independent. From the perspective of cpuset, we can set all the mode values independently,
Which of the modes will actually take effect, then? After all, they are independent of each other.
but that seems inconsistent with the current numatune definition in the XML. Maybe we can improve the XML definition to fit cpuset better? (There are more cpuset parameters than listed above.)
As long as the XML is there, it can't be changed, for backwards compatibility; perhaps making additions to the existing XML would work? e.g.

<numatune type='libnuma'>
  <memory mode='strict' nodeset='0-10,15,20'/>
</numatune>

<numatune type='cgroup'>
  <memory mode='strict' nodeset='0,1'/>
  <memory mode='interleave' nodeset='2,3'/>
  <memory mode='preferred' nodeset='4,5'/>
  ........... more for the lots of cpuset.mem*?
</numatune>

The type can be "libnuma" by default for backwards compatibility.
Hu Tao (11):
  don't modify CPU set string in virDomainCpuSetParse
  enable cgroup cpuset by default
  Add functions to set/get cgroup cpuset parameters
  introduce numa backend
  use cpuset to manage numa
  add VIR_DOMAIN_NUMATUNE_MEM_NONE
  add new API virDomain{G,S}etNumaParameters
  Implement main entries of virDomain{G,S}etNumaParameters
  Add virDomain{G,S}etNumaParameters support to the remote driver
  Implement virDomain{G,S}etNumaParameters for the qemu driver
  add new command numatune to virsh
 daemon/remote.c              |   64 +++++++
 include/libvirt/libvirt.h.in |   23 +++
 python/generator.py          |    2 +
 src/conf/domain_conf.c       |    2 +-
 src/conf/domain_conf.h       |    9 +
 src/driver.h                 |   15 ++
 src/libvirt.c                |  113 ++++++++++++
 src/libvirt_private.syms     |   10 +
 src/libvirt_public.syms      |    6 +
 src/qemu/qemu.conf           |    5 +-
 src/qemu/qemu_cgroup.c       |   73 ++++++++
 src/qemu/qemu_conf.c         |    3 +-
 src/qemu/qemu_driver.c       |  399 ++++++++++++++++++++++++++++++++++++++++++
 src/qemu/qemu_process.c      |   11 +-
 src/remote/remote_driver.c   |   50 ++++++
 src/remote/remote_protocol.x |   25 +++-
 src/remote_protocol-structs  |   16 ++
 src/util/cgroup.c            |  112 ++++++++++++
 src/util/cgroup.h            |   11 ++
 tools/virsh.c                |  180 +++++++++++++++++++
 20 files changed, 1121 insertions(+), 8 deletions(-)

On Thu, Nov 17, 2011 at 05:44:10PM +0800, Hu Tao wrote:
This series does mainly two things:
1. use cgroup cpuset to manage numa parameters 2. add a virsh command numatune to allow user to change numa parameters from command line
Current numa parameters include nodeset and mode, but what cgroup cpuset provides doesn't completely match them. Details:

params            cpuset
------------------------------------------------------
nodeset           cpuset provides cpuset.mems
mode strict       cpuset provides cpuset.mem_hardwall
mode interleave   cpuset provides cpuset.memory_spread_*
mode preferred    no equivalent. !spread to preferred?
This isn't right - there are only 3 existing configs in the XML currently; the current 'strict' does not map to mem_hardwall, nor does interleave map to memory_spread, AFAICT.

Currently we have three different configurations possible for memory, with the following semantics:

  mode=strict     - allocation is from the designated nodes, or fails
  mode=preferred  - allocation is from the designated nodes, or falls back to other nodes
  mode=interleave - allocation is interleaved across the designated nodes

In the cgroups cpuset controller you can set:

  cpuset.mems           - memory is allocated from the designated nodes, or fails
  cpuset.mem_exclusive  - no other cgroups, except parents or children, can allocate from the nodes listed in cpuset.mems
  cpuset.mem_hardwall   - no other cgroups are allowed to allocate from the nodes listed in cpuset.mems
  cpuset.memory_spread* - control allocations of internal kernel data structures

IMHO, the last three are not really required for libvirt's per-VM usage - the management application can trivially decide whether to allow overlapping allocation between VMs without needing to set these kernel tunables.

So, if using the cgroups cpuset controller for NUMA, the *only* policy we can implement is mode=strict. We cannot implement mode=preferred or mode=interleave, given the currently available cpuset controls.

IMHO, we should thus continue to use libnuma for specifying *all* the policies; however, if mode=strict, then we should *also* apply the policy in cgroups using cpuset.mems, since this will at least allow later tuning of the nodemask on the fly. We will have to refuse any attempt to switch between different modes on the fly. Only the nodemask, with mode=strict, will be dynamically changeable.

Regards,
Daniel

-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
participants (5)
-
Bharata B Rao
-
Daniel P. Berrange
-
Eric Blake
-
Hu Tao
-
Osier Yang