[libvirt] [PATCH 0/6] Support for per-guest-node binding

Currently we are only able to bind the whole domain to some host nodes using the /domain/numatune/memory element. Numerous requests were made to support host<->guest numa node bindings, so this series tries to pinch an idea on how to do that using /domain/numatune/memnode elements. That is incompatible with automatic numa placement (numad) since that makes no sense. Also this disables any live changes to numa parameters (the /domain/numatune/memory settings) since we cannot change the settings given to qemu. Martin Kletzander (6): conf, schema: add 'id' field for cells conf, schema: add support for numatune memnode element conf: add virDomainGetMemsForGuestCpu() qemu: purely a code movement qemu: memory-ram capability probing qemu: pass numa node binding preferences to qemu docs/formatdomain.html.in | 26 ++- docs/schemas/domaincommon.rng | 22 ++ src/conf/cpu_conf.c | 39 +++- src/conf/domain_conf.c | 259 ++++++++++++++++++--- src/conf/domain_conf.h | 4 + src/libvirt_private.syms | 1 + src/qemu/qemu_capabilities.c | 2 + src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_cgroup.c | 18 +- src/qemu/qemu_command.c | 160 +++++++++++-- src/qemu/qemu_command.h | 3 +- src/qemu/qemu_domain.c | 23 +- src/qemu/qemu_driver.c | 23 +- src/qemu/qemu_process.c | 3 +- src/util/virnuma.h | 14 +- tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml | 6 +- tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.xml | 6 +- tests/qemuxml2argvdata/qemuxml2argv-cpu-numa3.xml | 25 ++ .../qemuxml2argv-numatune-auto-prefer.xml | 29 +++ .../qemuxml2argv-numatune-auto.args | 6 + .../qemuxml2argv-numatune-auto.xml | 26 +++ .../qemuxml2argv-numatune-memnode-nocpu.xml | 25 ++ .../qemuxml2argv-numatune-memnodes-problematic.xml | 31 +++ .../qemuxml2argv-numatune-memnodes.args | 8 + .../qemuxml2argv-numatune-memnodes.xml | 31 +++ .../qemuxml2argv-numatune-prefer.args | 6 + .../qemuxml2argv-numatune-prefer.xml | 29 +++ tests/qemuxml2argvtest.c | 51 ++-- .../qemuxml2xmlout-cpu-numa1.xml | 28 +++ .../qemuxml2xmlout-cpu-numa2.xml | 28 +++ tests/qemuxml2xmltest.c | 4 + tests/qemuxmlnstest.c | 2 +- 32 files changed, 845 insertions(+), 94 deletions(-) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa3.xml create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-auto-prefer.xml create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-auto.args create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-auto.xml create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-memnode-nocpu.xml create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-memnodes-problematic.xml create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-memnodes.args create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-memnodes.xml create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-prefer.args create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-prefer.xml create mode 100644 tests/qemuxml2xmloutdata/qemuxml2xmlout-cpu-numa1.xml create mode 100644 tests/qemuxml2xmloutdata/qemuxml2xmlout-cpu-numa2.xml -- 2.0.0

In XML format, by definition, order of fields should not matter, so oder of parsing the elements doesn't affect the end result. When specifying guest NUMA cells, we depend only on the order of the 'cell' elements. With this patch all older domain XMLs are parsed as before, but with the 'id' attribute they are parsed and formatted according to that field. This will be useful when we have tuning settings for particular guest NUMA node. Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- docs/formatdomain.html.in | 11 +++--- docs/schemas/domaincommon.rng | 5 +++ src/conf/cpu_conf.c | 39 +++++++++++++++++++--- tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml | 6 ++-- tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.xml | 6 ++-- tests/qemuxml2argvdata/qemuxml2argv-cpu-numa3.xml | 25 ++++++++++++++ tests/qemuxml2argvtest.c | 1 + .../qemuxml2xmlout-cpu-numa1.xml | 28 ++++++++++++++++ .../qemuxml2xmlout-cpu-numa2.xml | 28 ++++++++++++++++ tests/qemuxml2xmltest.c | 3 ++ 10 files changed, 138 insertions(+), 14 deletions(-) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa3.xml create mode 100644 tests/qemuxml2xmloutdata/qemuxml2xmlout-cpu-numa1.xml create mode 100644 tests/qemuxml2xmloutdata/qemuxml2xmlout-cpu-numa2.xml diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 691a451..041f70d 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -1030,8 +1030,8 @@ <cpu> ... <numa> - <cell cpus='0-3' memory='512000'/> - <cell cpus='4-7' memory='512000'/> + <cell id='0' cpus='0-3' memory='512000'/> + <cell id='1' cpus='4-7' memory='512000'/> </numa> ... </cpu> @@ -1041,8 +1041,11 @@ Each <code>cell</code> element specifies a NUMA cell or a NUMA node. <code>cpus</code> specifies the CPU or range of CPUs that are part of the node. <code>memory</code> specifies the node memory in kibibytes - (i.e. blocks of 1024 bytes). Each cell or node is assigned cellid - or nodeid in the increasing order starting from 0. + (i.e. blocks of 1024 bytes). All cells should have <code>id</code> + attribute in case referring to some cell is necessary in the code, + otherwise the cells are assigned ids in the increasing order starting + from 0. Mixing cells with and without the <code>id</code> attribute + is not recommended as it may result in unwanted behaviour. </p> <p> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index af67123..0787b5a 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -3892,6 +3892,11 @@ <define name="numaCell"> <element name="cell"> + <optional> + <attribute name="id"> + <ref name="unsignedInt"/> + </attribute> + </optional> <attribute name="cpus"> <ref name="cpuset"/> </attribute> diff --git a/src/conf/cpu_conf.c b/src/conf/cpu_conf.c index ebdaa19..ec80340 100644 --- a/src/conf/cpu_conf.c +++ b/src/conf/cpu_conf.c @@ -438,17 +438,47 @@ virCPUDefParseXML(xmlNodePtr node, for (i = 0; i < n; i++) { char *cpus, *memory; int ret, ncpus = 0; + unsigned int cur_cell; + char *tmp = NULL; + + tmp = virXMLPropString(nodes[i], "id"); + if (!tmp) { + cur_cell = i; + } else { + ret = virStrToLong_ui(tmp, NULL, 10, &cur_cell); + VIR_FREE(tmp); + if (ret == -1) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Invalid 'id' attribute in NUMA cell")); + goto error; + } + } + + if (cur_cell >= n) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Exactly one 'cell' element per guest " + "NUMA cell allowed, non-contiguous ranges or " + "ranges not starting from 0 are not allowed")); + goto error; + } + + if (def->cells[cur_cell].cpustr) { + virReportError(VIR_ERR_XML_ERROR, + _("Duplicate NUMA cell info for cell id '%u'"), + cur_cell); + goto error; + } - def->cells[i].cellid = i; + def->cells[cur_cell].cellid = cur_cell; cpus = virXMLPropString(nodes[i], "cpus"); if (!cpus) { virReportError(VIR_ERR_XML_ERROR, "%s", _("Missing 'cpus' attribute in NUMA cell")); goto error; } - def->cells[i].cpustr = cpus; + def->cells[cur_cell].cpustr = cpus; - ncpus = virBitmapParse(cpus, 0, &def->cells[i].cpumask, + ncpus = virBitmapParse(cpus, 0, &def->cells[cur_cell].cpumask, VIR_DOMAIN_CPUMASK_LEN); if (ncpus <= 0) goto error; @@ -461,7 +491,7 @@ virCPUDefParseXML(xmlNodePtr node, goto error; } - ret = virStrToLong_ui(memory, NULL, 10, &def->cells[i].mem); + ret = virStrToLong_ui(memory, NULL, 10, &def->cells[cur_cell].mem); if (ret == -1) { virReportError(VIR_ERR_XML_ERROR, "%s", _("Invalid 'memory' attribute in NUMA cell")); @@ -647,6 +677,7 @@ virCPUDefFormatBuf(virBufferPtr buf, virBufferAdjustIndent(buf, 2); for (i = 0; i < def->ncells; i++) { virBufferAddLit(buf, "<cell"); + virBufferAsprintf(buf, " id='%u'", def->cells[i].cellid); virBufferAsprintf(buf, " cpus='%s'", def->cells[i].cpustr); virBufferAsprintf(buf, " memory='%d'", def->cells[i].mem); virBufferAddLit(buf, "/>\n"); diff --git a/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml index ee402c8..0543f7f 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml +++ b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml @@ -9,10 +9,10 @@ <boot dev='network'/> </os> <cpu> - <topology sockets="2" cores="4" threads="2"/> + <topology sockets='2' cores='4' threads='2'/> <numa> - <cell cpus="0-7" memory="109550"/> - <cell cpus="8-15" memory="109550"/> + <cell cpus='0-7' memory='109550'/> + <cell cpus='8-15' memory='109550'/> </numa> </cpu> <clock offset='utc'/> diff --git a/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.xml b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.xml index ee402c8..0a5f9fc 100644 --- a/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.xml +++ b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.xml @@ -9,10 +9,10 @@ <boot dev='network'/> </os> <cpu> - <topology sockets="2" cores="4" threads="2"/> + <topology sockets='2' cores='4' threads='2'/> <numa> - <cell cpus="0-7" memory="109550"/> - <cell cpus="8-15" memory="109550"/> + <cell id='1' cpus='8-15' memory='109550'/> + <cell id='0' cpus='0-7' memory='109550'/> </numa> </cpu> <clock offset='utc'/> diff --git a/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa3.xml b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa3.xml new file mode 100644 index 0000000..fa3070d --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa3.xml @@ -0,0 +1,25 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219100</memory> + <currentMemory unit='KiB'>219100</currentMemory> + <vcpu placement='static'>16</vcpu> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='network'/> + </os> + <cpu> + <topology sockets='2' cores='4' threads='2'/> + <numa> + <cell id='1' cpus='0-7' memory='109550'/> + <cell id='2' cpus='8-15' memory='109550'/> + </numa> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu</emulator> + </devices> +</domain> diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index 24d104e..bdea8b0 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -1169,6 +1169,7 @@ mymain(void) DO_TEST("cpu-strict1", NONE); DO_TEST("cpu-numa1", NONE); DO_TEST("cpu-numa2", QEMU_CAPS_SMP_TOPOLOGY); + DO_TEST_PARSE_ERROR("cpu-numa3", NONE); DO_TEST("cpu-host-model", NONE); skipLegacyCPUs = true; DO_TEST("cpu-host-model-fallback", NONE); diff --git a/tests/qemuxml2xmloutdata/qemuxml2xmlout-cpu-numa1.xml b/tests/qemuxml2xmloutdata/qemuxml2xmlout-cpu-numa1.xml new file mode 100644 index 0000000..227bf1c --- /dev/null +++ b/tests/qemuxml2xmloutdata/qemuxml2xmlout-cpu-numa1.xml @@ -0,0 +1,28 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219100</memory> + <currentMemory unit='KiB'>219100</currentMemory> + <vcpu placement='static'>16</vcpu> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='network'/> + </os> + <cpu> + <topology sockets='2' cores='4' threads='2'/> + <numa> + <cell id='0' cpus='0-7' memory='109550'/> + <cell id='1' cpus='8-15' memory='109550'/> + </numa> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu</emulator> + <controller type='usb' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/qemuxml2xmloutdata/qemuxml2xmlout-cpu-numa2.xml b/tests/qemuxml2xmloutdata/qemuxml2xmlout-cpu-numa2.xml new file mode 100644 index 0000000..227bf1c --- /dev/null +++ b/tests/qemuxml2xmloutdata/qemuxml2xmlout-cpu-numa2.xml @@ -0,0 +1,28 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219100</memory> + <currentMemory unit='KiB'>219100</currentMemory> + <vcpu placement='static'>16</vcpu> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='network'/> + </os> + <cpu> + <topology sockets='2' cores='4' threads='2'/> + <numa> + <cell id='0' cpus='0-7' memory='109550'/> + <cell id='1' cpus='8-15' memory='109550'/> + </numa> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu</emulator> + <controller type='usb' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index da528da..a446ea1 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -364,6 +364,9 @@ mymain(void) DO_TEST("chardev-label"); + DO_TEST_DIFFERENT("cpu-numa1"); + DO_TEST_DIFFERENT("cpu-numa2"); + virObjectUnref(driver.caps); virObjectUnref(driver.xmlopt); -- 2.0.0

On Wed, Jun 04, 2014 at 04:56:27PM +0200, Martin Kletzander wrote:
In XML format, by definition, order of fields should not matter, so oder of parsing the elements doesn't affect the end result. When specifying guest NUMA cells, we depend only on the order of the 'cell' elements. With this patch all older domain XMLs are parsed as before, but with the 'id' attribute they are parsed and formatted according to that field. This will be useful when we have tuning settings for particular guest NUMA node.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- docs/formatdomain.html.in | 11 +++--- docs/schemas/domaincommon.rng | 5 +++ src/conf/cpu_conf.c | 39 +++++++++++++++++++--- tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml | 6 ++-- tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.xml | 6 ++-- tests/qemuxml2argvdata/qemuxml2argv-cpu-numa3.xml | 25 ++++++++++++++ tests/qemuxml2argvtest.c | 1 + .../qemuxml2xmlout-cpu-numa1.xml | 28 ++++++++++++++++ .../qemuxml2xmlout-cpu-numa2.xml | 28 ++++++++++++++++ tests/qemuxml2xmltest.c | 3 ++ 10 files changed, 138 insertions(+), 14 deletions(-) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa3.xml create mode 100644 tests/qemuxml2xmloutdata/qemuxml2xmlout-cpu-numa1.xml create mode 100644 tests/qemuxml2xmloutdata/qemuxml2xmlout-cpu-numa2.xml
ACK Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

This element specifies similar settings as the memory element, although memnode can be used per guest NUMA node. Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- docs/formatdomain.html.in | 15 +++ docs/schemas/domaincommon.rng | 17 ++++ src/conf/domain_conf.c | 220 +++++++++++++++++++++++++++++++++++------- src/qemu/qemu_domain.c | 23 ++++- src/qemu/qemu_driver.c | 11 +++ src/util/virnuma.h | 14 ++- 6 files changed, 260 insertions(+), 40 deletions(-) diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 041f70d..2d855ea 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -709,6 +709,8 @@ ... <numatune> <memory mode="strict" nodeset="1-4,^3"/> + <memnode cellid="0" mode="strict" nodeset="1"/> + <memnode cellid="2" mode="preferred" nodeset="2"/> </numatune> ... </domain> @@ -745,6 +747,19 @@ <span class='since'>Since 0.9.3</span> </dd> + <dt><code>memnode</code></dt> + <dd> + Optional <code>memnode</code> elements can specify memory allocation + policies per each guest NUMA node. For those nodes having no + corresponding <code>memnode</code> element, the default from + element <code>memory</code> will be used. Attribute <code>cellid</code> + addresses guest NUMA node for which the settings are applied. + Attributes <code>mode</code> and <code>nodeset</code> have the same + meaning and syntax as in <code>memory</code> element. + + This setting is not compatible with automatic placement. + <span class='since'>QEMU Since 1.2.6</span> + </dd> </dl> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 0787b5a..a8e3ba0 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -789,6 +789,23 @@ </choice> </element> </optional> + <zeroOrMore> + <element name="memnode"> + <attribute name="cellid"> + <ref name="unsignedInt"/> + </attribute> + <attribute name="mode"> + <choice> + <value>strict</value> + <value>preferred</value> + <value>interleave</value> + </choice> + </attribute> + <attribute name='nodeset'> + <ref name='cpuset'/> + </attribute> + </element> + </zeroOrMore> </element> </define> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index fe06921..352ba92 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -2085,6 +2085,9 @@ void virDomainDefFree(virDomainDefPtr def) virDomainVcpuPinDefFree(def->cputune.emulatorpin); virBitmapFree(def->numatune.memory.nodemask); + for (i = 0; i < def->numatune.nmem_nodes; i++) + virBitmapFree(def->numatune.mem_nodes[i].nodemask); + VIR_FREE(def->numatune.mem_nodes); virSysinfoDefFree(def->sysinfo); @@ -11232,6 +11235,8 @@ virDomainDefParseXML(xmlDocPtr xml, bool usb_other = false; bool usb_master = false; bool primaryVideo = false; + bool mem_nodes = false; + if (VIR_ALLOC(def) < 0) return NULL; @@ -11666,6 +11671,33 @@ virDomainDefParseXML(xmlDocPtr xml, } VIR_FREE(nodes); + + /* analysis of cpu handling */ + if ((node = virXPathNode("./cpu[1]", ctxt)) != NULL) { + xmlNodePtr oldnode = ctxt->node; + ctxt->node = node; + def->cpu = virCPUDefParseXML(node, ctxt, VIR_CPU_TYPE_GUEST); + ctxt->node = oldnode; + + if (def->cpu == NULL) + goto error; + + if (def->cpu->sockets && + def->maxvcpus > + def->cpu->sockets * def->cpu->cores * def->cpu->threads) { + virReportError(VIR_ERR_XML_DETAIL, "%s", + _("Maximum CPUs greater than topology limit")); + goto error; + } + + if (def->cpu->cells_cpus > def->maxvcpus) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Number of CPUs in <numa> exceeds the" + " <vcpu> count")); + goto error; + } + } + /* Extract numatune if exists. */ if ((n = virXPathNodeSet("./numatune", ctxt, &nodes)) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, @@ -11682,6 +11714,12 @@ virDomainDefParseXML(xmlDocPtr xml, if (n) { cur = nodes[0]->children; + if (def->cpu) { + if (VIR_ALLOC_N(def->numatune.mem_nodes, def->cpu->ncells) < 0) + goto error; + def->numatune.nmem_nodes = def->cpu->ncells; + } + while (cur != NULL) { if (cur->type == XML_ELEMENT_NODE) { if (xmlStrEqual(cur->name, BAD_CAST "memory")) { @@ -11764,6 +11802,80 @@ virDomainDefParseXML(xmlDocPtr xml, def->placement_mode = VIR_DOMAIN_CPU_PLACEMENT_MODE_AUTO; def->numatune.memory.placement_mode = placement_mode; + + } else if (xmlStrEqual(cur->name, BAD_CAST "memnode")) { + unsigned int cellid; + struct mem_node *mem_node = NULL; + + if (!def->numatune.nmem_nodes) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Element 'memnode' is invalid without " + "any guest NUMA cells")); + goto error; + } + tmp = virXMLPropString(cur, "cellid"); + if (!tmp) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Missing required cellid attribute " + "in numatune memnode element")); + goto error; + } + if (virStrToLong_ui(tmp, NULL, 10, &cellid) < 0) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Invalid cellid attribute " + "in numatune memnode element")); + goto error; + } + VIR_FREE(tmp); + + if (cellid >= def->numatune.nmem_nodes) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Argument 'cellid' in numatune " + "memnode element must correspond to " + "existing guest's NUMA cell")); + goto error; + } + + mem_node = &def->numatune.mem_nodes[cellid]; + + if (mem_node->specified) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Multiple numatune memnode elements " + "with duplicate 'cellid'")); + goto error; + } + + mem_node->specified = true; + mem_nodes = true; + + tmp = virXMLPropString(cur, "mode"); + if (!tmp) { + mem_node->mode = VIR_DOMAIN_NUMATUNE_MEM_STRICT; + } else { + if ((mem_node->mode = + virDomainNumatuneMemModeTypeFromString(tmp)) < 0) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Invalid mode attribute " + "in numatune memnode element")); + goto error; + } + VIR_FREE(tmp); + } + + tmp = virXMLPropString(cur, "nodeset"); + if (!tmp) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Missing required nodeset attribute " + "in numatune memnode element")); + goto error; + } + if (virBitmapParse(tmp, 0, + &mem_node->nodemask, + VIR_DOMAIN_CPUMASK_LEN) < 0) { + goto error; + } + VIR_FREE(tmp); + } else { virReportError(VIR_ERR_XML_ERROR, _("unsupported XML element %s"), @@ -11784,6 +11896,42 @@ virDomainDefParseXML(xmlDocPtr xml, } VIR_FREE(nodes); + if (def->numatune.nmem_nodes && + def->numatune.memory.placement_mode == + VIR_NUMA_TUNE_MEM_PLACEMENT_MODE_AUTO) { + virReportError(VIR_ERR_XML_DETAIL, "%s", + _("Per-node binding is not compatible with " + "automatic NUMA placement.")); + goto error; + } + + if (!mem_nodes) { + /* If there are no <memnode> settings, clear all these data. + * If any driver wants to use these in the future, this code + * can be cleared. Until then it's easier to keep it this + * way. */ + for (i = 0; i < def->numatune.nmem_nodes; i++) + virBitmapFree(def->numatune.mem_nodes[i].nodemask); + VIR_FREE(def->numatune.mem_nodes); + def->numatune.nmem_nodes = 0; + } else { + /* Copy numatune/memory information into each node, but leave + * specified == false. This eases the process of determination + * of each node's nodemask */ + for (i = 0; i < def->numatune.nmem_nodes; i++) { + struct mem_node *mem_node = &def->numatune.mem_nodes[i]; + + if (mem_node->specified) + continue; + + mem_node->mode = def->numatune.memory.mode; + mem_node->nodemask = virBitmapNewCopy(def->numatune.memory.nodemask); + + if (!mem_node->nodemask) + goto error; + } + } + if ((n = virXPathNodeSet("./resource", ctxt, &nodes)) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("cannot extract resource nodes")); @@ -12863,32 +13011,6 @@ virDomainDefParseXML(xmlDocPtr xml, goto error; } - /* analysis of cpu handling */ - if ((node = virXPathNode("./cpu[1]", ctxt)) != NULL) { - xmlNodePtr oldnode = ctxt->node; - ctxt->node = node; - def->cpu = virCPUDefParseXML(node, ctxt, VIR_CPU_TYPE_GUEST); - ctxt->node = oldnode; - - if (def->cpu == NULL) - goto error; - - if (def->cpu->sockets && - def->maxvcpus > - def->cpu->sockets * def->cpu->cores * def->cpu->threads) { - virReportError(VIR_ERR_XML_DETAIL, "%s", - _("Maximum CPUs greater than topology limit")); - goto error; - } - - if (def->cpu->cells_cpus > def->maxvcpus) { - virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("Number of CPUs in <numa> exceeds the" - " <vcpu> count")); - goto error; - } - } - if ((node = virXPathNode("./sysinfo[1]", ctxt)) != NULL) { xmlNodePtr oldnode = ctxt->node; ctxt->node = node; @@ -17395,31 +17517,57 @@ virDomainDefFormatInternal(virDomainDefPtr def, virBufferAddLit(buf, "</cputune>\n"); if (def->numatune.memory.nodemask || - def->numatune.memory.placement_mode) { + def->numatune.memory.placement_mode || + def->numatune.nmem_nodes) { const char *mode; char *nodemask = NULL; const char *placement; virBufferAddLit(buf, "<numatune>\n"); virBufferAdjustIndent(buf, 2); - mode = virDomainNumatuneMemModeTypeToString(def->numatune.memory.mode); - virBufferAsprintf(buf, "<memory mode='%s' ", mode); - if (def->numatune.memory.placement_mode == - VIR_NUMA_TUNE_MEM_PLACEMENT_MODE_STATIC) { - nodemask = virBitmapFormat(def->numatune.memory.nodemask); + if (def->numatune.memory.nodemask || + def->numatune.memory.placement_mode) { + + mode = virDomainNumatuneMemModeTypeToString(def->numatune.memory.mode); + virBufferAsprintf(buf, "<memory mode='%s' ", mode); + + if (def->numatune.memory.placement_mode == + VIR_NUMA_TUNE_MEM_PLACEMENT_MODE_STATIC) { + nodemask = virBitmapFormat(def->numatune.memory.nodemask); + if (nodemask == NULL) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("failed to format nodeset for " + "NUMA memory tuning")); + goto error; + } + virBufferAsprintf(buf, "nodeset='%s'/>\n", nodemask); + VIR_FREE(nodemask); + } else if (def->numatune.memory.placement_mode) { + placement = virNumaTuneMemPlacementModeTypeToString(def->numatune.memory.placement_mode); + virBufferAsprintf(buf, "placement='%s'/>\n", placement); + } + } + + for (i = 0; i < def->numatune.nmem_nodes; i++) { + struct mem_node *mem_node = &def->numatune.mem_nodes[i]; + if (!mem_node->specified) + continue; + + nodemask = virBitmapFormat(mem_node->nodemask); + mode = virDomainNumatuneMemModeTypeToString(mem_node->mode); if (nodemask == NULL) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("failed to format nodeset for " "NUMA memory tuning")); goto error; } - virBufferAsprintf(buf, "nodeset='%s'/>\n", nodemask); + virBufferAsprintf(buf, + "<memnode cellid='%zu' mode='%s' nodeset='%s'/>\n", + i, mode, nodemask); VIR_FREE(nodemask); - } else if (def->numatune.memory.placement_mode) { - placement = virNumaTuneMemPlacementModeTypeToString(def->numatune.memory.placement_mode); - virBufferAsprintf(buf, "placement='%s'/>\n", placement); } + virBufferAdjustIndent(buf, -2); virBufferAddLit(buf, "</numatune>\n"); } diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index bbe32a0..99f9c48 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -712,7 +712,28 @@ qemuDomainDefPostParse(virDomainDefPtr def, bool addDefaultMemballoon = true; bool addDefaultUSBKBD = false; bool addDefaultUSBMouse = false; - + size_t i = 0; + + if (def->numatune.memory.nodemask) { + for (i = 0; i < def->numatune.nmem_nodes; i++) { + struct mem_node *mem_node = &def->numatune.mem_nodes[i]; + ssize_t pos = -1; + bool bit = false; + + if (!mem_node->specified) + continue; + + while ((pos = virBitmapNextSetBit(mem_node->nodemask, pos)) >= 0) { + if (virBitmapGetBit(def->numatune.memory.nodemask, pos, &bit) < 0 || + !bit) { + virReportError(VIR_ERR_XML_DETAIL, "%s", + _("memnode nodeset must be subset of the " + "global memory nodeset")); + return -1; + } + } + } + } /* check for emulator and create a default one if needed */ if (!def->emulator && !(def->emulator = virDomainDefGetDefaultEmulator(def, caps))) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 3a7622a..545516e 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -8646,6 +8646,7 @@ qemuDomainSetNumaParamsLive(virDomainObjPtr vm, virCgroupPtr cgroup_temp = NULL; virBitmapPtr temp_nodeset = NULL; qemuDomainObjPrivatePtr priv = vm->privateData; + virDomainDefPtr def = vm->def; char *nodeset_str = NULL; size_t i = 0; int ret = -1; @@ -8657,6 +8658,16 @@ qemuDomainSetNumaParamsLive(virDomainObjPtr vm, goto cleanup; } + for (i = 0; i < def->numatune.nmem_nodes; i++) { + if (def->numatune.mem_nodes[i].specified) { + virReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("change of nodeset for running domain " + "with per guest NUMA node numatune settings " + "is not supported")); + goto cleanup; + } + } + /* Get existing nodeset values */ if (virCgroupGetCpusetMems(priv->cgroup, &nodeset_str) < 0 || virBitmapParse(nodeset_str, 0, &temp_nodeset, diff --git a/src/util/virnuma.h b/src/util/virnuma.h index fe1e966..50fa3f8 100644 --- a/src/util/virnuma.h +++ b/src/util/virnuma.h @@ -1,7 +1,7 @@ /* * virnuma.h: helper APIs for managing numa * - * Copyright (C) 2011-2013 Red Hat, Inc. + * Copyright (C) 2011-2014 Red Hat, Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -43,9 +43,17 @@ typedef virNumaTuneDef *virNumaTuneDefPtr; struct _virNumaTuneDef { struct { virBitmapPtr nodemask; - int mode; + int mode; /* enum virDomainNumatuneMemMode */ int placement_mode; /* enum virNumaTuneMemPlacementMode */ - } memory; + } memory; /* pinning for all the memory */ + + struct mem_node { + bool specified; + unsigned int nodeid; + virBitmapPtr nodemask; + int mode; + } *mem_nodes; /* pinning per guest's NUMA node */ + size_t nmem_nodes; /* Future NUMA tuning related stuff should go here. */ }; -- 2.0.0

On Wed, Jun 04, 2014 at 04:56:28PM +0200, Martin Kletzander wrote:
@@ -43,9 +43,17 @@ typedef virNumaTuneDef *virNumaTuneDefPtr; struct _virNumaTuneDef { struct { virBitmapPtr nodemask; - int mode; + int mode; /* enum virDomainNumatuneMemMode */ int placement_mode; /* enum virNumaTuneMemPlacementMode */ - } memory; + } memory; /* pinning for all the memory */ + + struct mem_node {
Declaring structs inline without typedefs isn't our usual style. There should be a typedef for virNumaTuneMemNodeDef & virNumaTuneMemNodeDefPtr Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Wed, Jun 04, 2014 at 04:07:06PM +0100, Daniel P. Berrange wrote:
On Wed, Jun 04, 2014 at 04:56:28PM +0200, Martin Kletzander wrote:
@@ -43,9 +43,17 @@ typedef virNumaTuneDef *virNumaTuneDefPtr; struct _virNumaTuneDef { struct { virBitmapPtr nodemask; - int mode; + int mode; /* enum virDomainNumatuneMemMode */ int placement_mode; /* enum virNumaTuneMemPlacementMode */ - } memory; + } memory; /* pinning for all the memory */ + + struct mem_node {
Declaring structs inline without typedefs isn't our usual style. There should be a typedef for virNumaTuneMemNodeDef & virNumaTuneMemNodeDefPtr
Yeah, it should. Also the numatunedef should be in some conf/ file with encapsulated insides. Parsing should be in that file too, of course. I took the liberty of proposing it this way and re-factoring afterwards since all those details were drafting me away of the thing I wanted to make out of that. However, I forgot to mention that the same way I forgot to mention that virGetNumaNodeParameters() API will follow even though is not part of this patch. Martin

On Wed, Jun 04, 2014 at 04:56:28PM +0200, Martin Kletzander wrote:
This element specifies similar settings as the memory element, although memnode can be used per guest NUMA node.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- docs/formatdomain.html.in | 15 +++ docs/schemas/domaincommon.rng | 17 ++++ src/conf/domain_conf.c | 220 +++++++++++++++++++++++++++++++++++------- src/qemu/qemu_domain.c | 23 ++++- src/qemu/qemu_driver.c | 11 +++ src/util/virnuma.h | 14 ++- 6 files changed, 260 insertions(+), 40 deletions(-)
The new elements should be added to the tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml files you added in the previous commit too. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/conf/domain_conf.c | 39 +++++++++++++++++++++++++++++++++++++++ src/conf/domain_conf.h | 4 ++++ src/libvirt_private.syms | 1 + 3 files changed, 44 insertions(+) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 352ba92..fef956c 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -19879,3 +19879,42 @@ virDomainObjSetMetadata(virDomainObjPtr vm, return 0; } + +int +virDomainGetMemsForGuestCpu(virDomainDefPtr def, + size_t cpuid, + char **nodemask) +{ + ssize_t i = 0; + + *nodemask = NULL; + + /* nothing to do with no per-node settings */ + if (!def->numatune.nmem_nodes) + return 0; + + /* !!(def->numatune.nmem_nodes) -> !!(def->cpu) */ + for (i = 0; i < def->cpu->ncells; i++) { + bool result = false; + + if (virBitmapGetBit(def->cpu->cells[i].cpumask, cpuid, &result) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Problem accessing domain's " + "NUMA cell information")); + return -1; + } + if (!result) + continue; + + if (def->numatune.mem_nodes[i].mode != VIR_DOMAIN_NUMATUNE_MEM_STRICT) + return 0; + + *nodemask = virBitmapFormat(def->numatune.mem_nodes[i].nodemask); + if (!*nodemask) + return -1; + + return 0; + } + + return 0; +} diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 2de807d..b6e0765 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -2711,4 +2711,8 @@ int virDomainObjSetMetadata(virDomainObjPtr vm, const char *configDir, unsigned int flags); +int virDomainGetMemsForGuestCpu(virDomainDefPtr def, + size_t cpuid, + char **nodemask); + #endif /* __DOMAIN_CONF_H */ diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index d73a9f5..88eae2f 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -244,6 +244,7 @@ virDomainFSTypeToString; virDomainFSWrpolicyTypeFromString; virDomainFSWrpolicyTypeToString; virDomainGetFilesystemForTarget; +virDomainGetMemsForGuestCpu; virDomainGraphicsAuthConnectedTypeFromString; virDomainGraphicsAuthConnectedTypeToString; virDomainGraphicsDefFree; -- 2.0.0

On Wed, Jun 04, 2014 at 04:56:29PM +0200, Martin Kletzander wrote:
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/conf/domain_conf.c | 39 +++++++++++++++++++++++++++++++++++++++ src/conf/domain_conf.h | 4 ++++ src/libvirt_private.syms | 1 + 3 files changed, 44 insertions(+)
ACK Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

to ease the review of commits to follow. Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/qemu/qemu_command.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-) diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index e6acced..cc6023d 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -6541,11 +6541,22 @@ qemuBuildNumaArgStr(const virDomainDef *def, virCommandPtr cmd) for (i = 0; i < def->cpu->ncells; i++) { VIR_FREE(cpumask); + int cellmem = VIR_DIV_UP(def->cpu->cells[i].mem, 1024); + def->cpu->cells[i].mem = cellmem * 1024; + + if (!(cpumask = virBitmapFormat(def->cpu->cells[i].cpumask))) + goto cleanup; + + if (strchr(cpumask, ',')) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("disjoint NUMA cpu ranges are not supported " + "with this QEMU")); + goto cleanup; + } + virCommandAddArg(cmd, "-numa"); virBufferAsprintf(&buf, "node,nodeid=%d", def->cpu->cells[i].cellid); virBufferAddLit(&buf, ",cpus="); - if (!(cpumask = virBitmapFormat(def->cpu->cells[i].cpumask))) - goto cleanup; /* Up through qemu 1.4, -numa does not accept a cpus * argument any more complex than start-stop. @@ -6553,16 +6564,8 @@ qemuBuildNumaArgStr(const virDomainDef *def, virCommandPtr cmd) * XXX For qemu 1.5, the syntax has not yet been decided; * but when it is, we need a capability bit and * translation of our cpumask into the qemu syntax. */ - if (strchr(cpumask, ',')) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", - _("disjoint NUMA cpu ranges are not supported " - "with this QEMU")); - goto cleanup; - } virBufferAdd(&buf, cpumask, -1); - def->cpu->cells[i].mem = VIR_DIV_UP(def->cpu->cells[i].mem, - 1024) * 1024; - virBufferAsprintf(&buf, ",mem=%d", def->cpu->cells[i].mem / 1024); + virBufferAsprintf(&buf, ",mem=%d", cellmem); if (virBufferError(&buf)) { virReportOOMError(); -- 2.0.0

On Wed, Jun 04, 2014 at 04:56:30PM +0200, Martin Kletzander wrote:
to ease the review of commits to follow.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/qemu/qemu_command.c | 25 ++++++++++++++----------- 1 file changed, 14 insertions(+), 11 deletions(-)
ACK Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

The numa patch series in qemu adds "memory-ram" object type by which we can tell whether we can use such objects. Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/qemu/qemu_capabilities.c | 2 ++ src/qemu/qemu_capabilities.h | 1 + 2 files changed, 3 insertions(+) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 08c3d04..a479035 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -256,6 +256,7 @@ VIR_ENUM_IMPL(virQEMUCaps, QEMU_CAPS_LAST, "usb-kbd", /* 165 */ "host-pci-multidomain", "msg-timestamp", + "memory-ram", ); @@ -1440,6 +1441,7 @@ struct virQEMUCapsStringFlags virQEMUCapsObjectTypes[] = { { "ich9-intel-hda", QEMU_CAPS_DEVICE_ICH9_INTEL_HDA }, { "pvpanic", QEMU_CAPS_DEVICE_PANIC }, { "usb-kbd", QEMU_CAPS_DEVICE_USB_KBD }, + { "memory-ram", QEMU_CAPS_OBJECT_MEMORY_RAM }, }; static struct virQEMUCapsStringFlags virQEMUCapsObjectPropsVirtioBlk[] = { diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index d755caa..350c43e 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -206,6 +206,7 @@ enum virQEMUCapsFlags { QEMU_CAPS_DEVICE_USB_KBD = 165, /* -device usb-kbd */ QEMU_CAPS_HOST_PCI_MULTIDOMAIN = 166, /* support domain > 0 in host pci address */ QEMU_CAPS_MSG_TIMESTAMP = 167, /* -msg timestamp */ + QEMU_CAPS_OBJECT_MEMORY_RAM = 168, /* -object memory-ram */ QEMU_CAPS_LAST, /* this must always be the last item */ }; -- 2.0.0

On Wed, Jun 04, 2014 at 04:56:31PM +0200, Martin Kletzander wrote:
The numa patch series in qemu adds "memory-ram" object type by which we can tell whether we can use such objects.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/qemu/qemu_capabilities.c | 2 ++ src/qemu/qemu_capabilities.h | 1 + 2 files changed, 3 insertions(+)
ACK Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Currently, we only bind the whole QEMU domain to memory nodes specified in nodemask altogether. That, however, doesn't make much sense when one wants to control from where the memory for particular guest nodes should be allocated. QEMU allows us to do that by specifying 'host-nodes' parameter for the 'memory-ram' object, so let's use that. Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/qemu/qemu_cgroup.c | 18 ++- src/qemu/qemu_command.c | 141 +++++++++++++++++++-- src/qemu/qemu_command.h | 3 +- src/qemu/qemu_driver.c | 12 +- src/qemu/qemu_process.c | 3 +- .../qemuxml2argv-numatune-auto-prefer.xml | 29 +++++ .../qemuxml2argv-numatune-auto.args | 6 + .../qemuxml2argv-numatune-auto.xml | 26 ++++ .../qemuxml2argv-numatune-memnode-nocpu.xml | 25 ++++ .../qemuxml2argv-numatune-memnodes-problematic.xml | 31 +++++ .../qemuxml2argv-numatune-memnodes.args | 8 ++ .../qemuxml2argv-numatune-memnodes.xml | 31 +++++ .../qemuxml2argv-numatune-prefer.args | 6 + .../qemuxml2argv-numatune-prefer.xml | 29 +++++ tests/qemuxml2argvtest.c | 50 +++++--- tests/qemuxml2xmltest.c | 1 + tests/qemuxmlnstest.c | 2 +- 17 files changed, 389 insertions(+), 32 deletions(-) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-auto-prefer.xml create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-auto.args create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-auto.xml create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-memnode-nocpu.xml create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-memnodes-problematic.xml create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-memnodes.args create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-memnodes.xml create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-prefer.args create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-prefer.xml diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index b1bfb5a..6415a6d 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -23,6 +23,8 @@ #include <config.h> +#include "domain_audit.h" +#include "domain_conf.h" #include "qemu_cgroup.h" #include "qemu_domain.h" #include "qemu_process.h" @@ -30,7 +32,6 @@ #include "virlog.h" #include "viralloc.h" #include "virerror.h" -#include "domain_audit.h" #include "virscsi.h" #include "virstring.h" #include "virfile.h" @@ -895,6 +896,7 @@ qemuSetupCgroupForVcpu(virDomainObjPtr vm) size_t i, j; unsigned long long period = vm->def->cputune.period; long long quota = vm->def->cputune.quota; + char *mem_mask = NULL; if ((period || quota) && !virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_CPU)) { @@ -930,8 +932,15 @@ qemuSetupCgroupForVcpu(virDomainObjPtr vm) goto cleanup; } + if (virDomainGetMemsForGuestCpu(def, i, &mem_mask) < 0) + goto cleanup; + /* Set vcpupin in cgroup if vcpupin xml is provided */ if (virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_CPUSET)) { + + if (mem_mask && virCgroupSetCpusetMems(cgroup_vcpu, mem_mask) < 0) + goto cleanup; + /* find the right CPU to pin, otherwise * qemuSetupCgroupVcpuPin will fail. */ for (j = 0; j < def->cputune.nvcpupin; j++) { @@ -946,14 +955,21 @@ qemuSetupCgroupForVcpu(virDomainObjPtr vm) break; } + } else if (mem_mask) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("cgroup cpuset is required for " + "per-node memory binding")); + goto cleanup; } + VIR_FREE(mem_mask); virCgroupFree(&cgroup_vcpu); } return 0; cleanup: + VIR_FREE(mem_mask); if (cgroup_vcpu) { virCgroupRemove(cgroup_vcpu); virCgroupFree(&cgroup_vcpu); diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index cc6023d..8699eea 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -6531,21 +6531,99 @@ qemuBuildSmpArgStr(const virDomainDef *def, return virBufferContentAndReset(&buf); } +static const char * +qemuNumaPolicyToString(virDomainNumatuneMemMode mode) +{ + const char *policy = NULL; + + switch (mode) { + case VIR_DOMAIN_NUMATUNE_MEM_STRICT: + policy = "bind"; + break; + + case VIR_DOMAIN_NUMATUNE_MEM_PREFERRED: + policy = "preferred"; + break; + + case VIR_DOMAIN_NUMATUNE_MEM_INTERLEAVE: + policy = "interleave"; + break; + + case VIR_DOMAIN_NUMATUNE_MEM_LAST: + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Cannot translate memory policy")); + break; + } + + return policy; +} + static int -qemuBuildNumaArgStr(const virDomainDef *def, virCommandPtr cmd) +qemuBuildNumaArgStr(const virDomainDef *def, + virCommandPtr cmd, + virQEMUDriverPtr driver, + virQEMUCapsPtr qemuCaps, + virBitmapPtr autoNodemask) { + int ret = -1; size_t i; - virBuffer buf = VIR_BUFFER_INITIALIZER; char *cpumask = NULL; - int ret = -1; + virCapsPtr caps = virQEMUDriverGetCapabilities(driver, false); + virBuffer buf = VIR_BUFFER_INITIALIZER; + char *nodemask = NULL; + char *globalNodemask = NULL; + const char *globalPolicy = qemuNumaPolicyToString(def->numatune.memory.mode); + + if (!globalPolicy) + goto cleanup; + + /* + * The @autoNodemask parameter means there is numa placement advise we + * should use if possible. If it's NULL, then we ought to format our + * own nodemask from the numatune specification. Let's hide it from + * most of the code in this function for the sake of simplification. + */ + if (autoNodemask || def->numatune.memory.nodemask) { + + if (autoNodemask) + globalNodemask = virBitmapFormat(autoNodemask); + else + globalNodemask = virBitmapFormat(def->numatune.memory.nodemask); + + if (!globalNodemask) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Unable to format nodemask")); + goto cleanup; + } + } for (i = 0; i < def->cpu->ncells; i++) { - VIR_FREE(cpumask); + const char *policy = NULL; int cellmem = VIR_DIV_UP(def->cpu->cells[i].mem, 1024); def->cpu->cells[i].mem = cellmem * 1024; - if (!(cpumask = virBitmapFormat(def->cpu->cells[i].cpumask))) + VIR_FREE(nodemask); + if (def->numatune.mem_nodes && + def->numatune.mem_nodes[i].specified) { + policy = qemuNumaPolicyToString(def->numatune.mem_nodes[i].mode); + if (!policy) + goto cleanup; + + nodemask = virBitmapFormat(def->numatune.mem_nodes[i].nodemask); + if (!nodemask) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Unable to format nodemask")); + goto cleanup; + } + } + + VIR_FREE(cpumask); + + if (!(cpumask = virBitmapFormat(def->cpu->cells[i].cpumask))) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Unable to format nodemask")); goto cleanup; + } if (strchr(cpumask, ',')) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", @@ -6554,6 +6632,38 @@ qemuBuildNumaArgStr(const virDomainDef *def, virCommandPtr cmd) goto cleanup; } + if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_RAM)) { + virBufferAsprintf(&buf, "memory-ram,size=%dM,id=ram-node%d", + cellmem, def->cpu->cells[i].cellid); + + if (nodemask) { + if (strchr(nodemask, ',')) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("disjoint NUMA node ranges are not supported " + "with this QEMU")); + goto cleanup; + } + + virBufferEscape(&buf, ',', ",", ",host-nodes=%s", nodemask); + virBufferAsprintf(&buf, ",policy=%s", policy); + + } else if (globalNodemask) { + + if (strchr(globalNodemask, ',')) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("disjoint NUMA node ranges are not supported " + "with this QEMU")); + goto cleanup; + } + + virBufferEscape(&buf, ',', ",", ",host-nodes=%s", globalNodemask); + virBufferAsprintf(&buf, ",policy=%s", globalPolicy); + } + + virCommandAddArg(cmd, "-object"); + virCommandAddArgBuffer(cmd, &buf); + } + virCommandAddArg(cmd, "-numa"); virBufferAsprintf(&buf, "node,nodeid=%d", def->cpu->cells[i].cellid); virBufferAddLit(&buf, ",cpus="); @@ -6565,7 +6675,13 @@ qemuBuildNumaArgStr(const virDomainDef *def, virCommandPtr cmd) * but when it is, we need a capability bit and * translation of our cpumask into the qemu syntax. */ virBufferAdd(&buf, cpumask, -1); - virBufferAsprintf(&buf, ",mem=%d", cellmem); + + if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_RAM)) { + virBufferAsprintf(&buf, ",memdev=ram-node%d", + def->cpu->cells[i].cellid); + } else { + virBufferAsprintf(&buf, ",mem=%d", cellmem); + } if (virBufferError(&buf)) { virReportOOMError(); @@ -6577,6 +6693,10 @@ qemuBuildNumaArgStr(const virDomainDef *def, virCommandPtr cmd) ret = 0; cleanup: + if (!autoNodemask) + VIR_FREE(globalNodemask); + VIR_FREE(nodemask); + virObjectUnref(caps); VIR_FREE(cpumask); virBufferFreeAndReset(&buf); return ret; @@ -7215,7 +7335,8 @@ qemuBuildCommandLine(virConnectPtr conn, virDomainSnapshotObjPtr snapshot, virNetDevVPortProfileOp vmop, qemuBuildCommandLineCallbacksPtr callbacks, - bool standalone) + bool standalone, + virBitmapPtr nodemask) { virErrorPtr originalError = NULL; size_t i, j; @@ -7399,9 +7520,9 @@ qemuBuildCommandLine(virConnectPtr conn, virCommandAddArg(cmd, smp); VIR_FREE(smp); - if (def->cpu && def->cpu->ncells) - if (qemuBuildNumaArgStr(def, cmd) < 0) - goto error; + if (def->cpu && def->cpu->ncells && + qemuBuildNumaArgStr(def, cmd, driver, qemuCaps, nodemask) < 0) + goto error; if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_UUID)) virCommandAddArgList(cmd, "-uuid", uuid, NULL); diff --git a/src/qemu/qemu_command.h b/src/qemu/qemu_command.h index afbd6ff..4856d74 100644 --- a/src/qemu/qemu_command.h +++ b/src/qemu/qemu_command.h @@ -78,7 +78,8 @@ virCommandPtr qemuBuildCommandLine(virConnectPtr conn, virDomainSnapshotObjPtr current_snapshot, virNetDevVPortProfileOp vmop, qemuBuildCommandLineCallbacksPtr callbacks, - bool forXMLToArgv) + bool standalone, + virBitmapPtr nodemask) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(11); /* Generate '-device' string for chardev device */ diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 545516e..6f138de 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -4038,6 +4038,7 @@ static int qemuDomainHotplugVcpus(virQEMUDriverPtr driver, pid_t *cpupids = NULL; int ncpupids; virCgroupPtr cgroup_vcpu = NULL; + char *mem_mask = NULL; qemuDomainObjEnterMonitor(driver, vm); @@ -4151,6 +4152,15 @@ static int qemuDomainHotplugVcpus(virQEMUDriverPtr driver, ret = -1; goto cleanup; } + + if (virDomainGetMemsForGuestCpu(vm->def, i, &mem_mask) < 0) + goto cleanup; + + if (mem_mask && virCgroupSetCpusetMems(cgroup_vcpu, + mem_mask) < 0) + goto cleanup; + VIR_FREE(mem_mask); + } else { if (virProcessSetAffinity(cpupids[i], vcpupin->cpumask) < 0) { @@ -6031,7 +6041,7 @@ static char *qemuConnectDomainXMLToNative(virConnectPtr conn, NULL, -1, NULL, VIR_NETDEV_VPORT_PROFILE_OP_NO_OP, &buildCommandLineCallbacks, - true))) + true, NULL))) goto cleanup; ret = virCommandToString(cmd); diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index d719716..1db9177 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -3922,7 +3922,8 @@ int qemuProcessStart(virConnectPtr conn, if (!(cmd = qemuBuildCommandLine(conn, driver, vm->def, priv->monConfig, priv->monJSON, priv->qemuCaps, migrateFrom, stdin_fd, snapshot, vmop, - &buildCommandLineCallbacks, false))) + &buildCommandLineCallbacks, false, + nodemask))) goto cleanup; /* now that we know it is about to start call the hook if present */ diff --git a/tests/qemuxml2argvdata/qemuxml2argv-numatune-auto-prefer.xml b/tests/qemuxml2argvdata/qemuxml2argv-numatune-auto-prefer.xml new file mode 100644 index 0000000..63f0d1f --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-numatune-auto-prefer.xml @@ -0,0 +1,29 @@ +<domain type='qemu'> + <name>QEMUGuest</name> + <uuid>9f4b6512-e73a-4a25-93e8-5307802821ce</uuid> + <memory unit='KiB'>65536</memory> + <currentMemory unit='KiB'>65536</currentMemory> + <vcpu placement='auto'>1</vcpu> + <numatune> + <memory mode='preferred'/> + </numatune> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <cpu> + <numa> + <cell id='0' cpus='0' memory='65536'/> + </numa> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/kvm</emulator> + <controller type='usb' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/qemuxml2argvdata/qemuxml2argv-numatune-auto.args b/tests/qemuxml2argvdata/qemuxml2argv-numatune-auto.args new file mode 100644 index 0000000..7287f4f --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-numatune-auto.args @@ -0,0 +1,6 @@ +LC_ALL=C PATH=/bin HOME=/home/test USER=test LOGNAME=test \ +QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc -m 64 -smp 1 -object \ +memory-ram,size=64M,id=ram-node0,host-nodes=4-5,policy=bind -numa \ +node,nodeid=0,cpus=0,memdev=ram-node0 -nographic -monitor \ +unix:/tmp/test-monitor,server,nowait -no-acpi -boot c -usb -net none \ +-serial none -parallel none diff --git a/tests/qemuxml2argvdata/qemuxml2argv-numatune-auto.xml b/tests/qemuxml2argvdata/qemuxml2argv-numatune-auto.xml new file mode 100644 index 0000000..8b9c90d --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-numatune-auto.xml @@ -0,0 +1,26 @@ +<domain type='qemu'> + <name>QEMUGuest</name> + <uuid>9f4b6512-e73a-4a25-93e8-5307802821ce</uuid> + <memory unit='KiB'>65536</memory> + <currentMemory unit='KiB'>65536</currentMemory> + <vcpu placement='auto'>1</vcpu> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <cpu> + <numa> + <cell id='0' cpus='0' memory='65536'/> + </numa> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/kvm</emulator> + <controller type='usb' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/qemuxml2argvdata/qemuxml2argv-numatune-memnode-nocpu.xml b/tests/qemuxml2argvdata/qemuxml2argv-numatune-memnode-nocpu.xml new file mode 100644 index 0000000..7b0e248 --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-numatune-memnode-nocpu.xml @@ -0,0 +1,25 @@ +<domain type='kvm'> + <name>QEMUGuest</name> + <uuid>9f4b6512-e73a-4a25-93e8-5307802821ce</uuid> + <memory unit='KiB'>65536</memory> + <currentMemory unit='KiB'>65536</currentMemory> + <vcpu placement='static'>1</vcpu> + <numatune> + <memory mode='strict' nodeset='0,2'/> + <memnode cellid='0' mode='interleave' nodeset='3'/> + </numatune> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/kvm</emulator> + <controller type='usb' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/qemuxml2argvdata/qemuxml2argv-numatune-memnodes-problematic.xml b/tests/qemuxml2argvdata/qemuxml2argv-numatune-memnodes-problematic.xml new file mode 100644 index 0000000..390f6af --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-numatune-memnodes-problematic.xml @@ -0,0 +1,31 @@ +<domain type='qemu'> + <name>QEMUGuest</name> + <uuid>9f4b6512-e73a-4a25-93e8-5307802821ce</uuid> + <memory unit='KiB'>65536</memory> + <currentMemory unit='KiB'>65536</currentMemory> + <vcpu placement='static'>2</vcpu> + <numatune> + <memory mode='strict' nodeset='0-2'/> + <memnode cellid='0' mode='preferred' nodeset='3'/> + </numatune> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <cpu> + <numa> + <cell id='0' cpus='0' memory='32768'/> + <cell id='1' cpus='1' memory='32768'/> + </numa> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/kvm</emulator> + <controller type='usb' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/qemuxml2argvdata/qemuxml2argv-numatune-memnodes.args b/tests/qemuxml2argvdata/qemuxml2argv-numatune-memnodes.args new file mode 100644 index 0000000..47db091 --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-numatune-memnodes.args @@ -0,0 +1,8 @@ +LC_ALL=C PATH=/bin HOME=/home/test USER=test LOGNAME=test \ +QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc -m 64 -smp 2 -object \ +memory-ram,size=32M,id=ram-node0,host-nodes=3,policy=preferred -numa \ +node,nodeid=0,cpus=0,memdev=ram-node0 -object \ +memory-ram,size=32M,id=ram-node1,host-nodes=0-3,policy=bind -numa \ +node,nodeid=1,cpus=1,memdev=ram-node1 -nographic -monitor \ +unix:/tmp/test-monitor,server,nowait -no-acpi -boot c -usb -net none \ +-serial none -parallel none diff --git a/tests/qemuxml2argvdata/qemuxml2argv-numatune-memnodes.xml b/tests/qemuxml2argvdata/qemuxml2argv-numatune-memnodes.xml new file mode 100644 index 0000000..18b00d8 --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-numatune-memnodes.xml @@ -0,0 +1,31 @@ +<domain type='qemu'> + <name>QEMUGuest</name> + <uuid>9f4b6512-e73a-4a25-93e8-5307802821ce</uuid> + <memory unit='KiB'>65536</memory> + <currentMemory unit='KiB'>65536</currentMemory> + <vcpu placement='static'>2</vcpu> + <numatune> + <memory mode='strict' nodeset='0-3'/> + <memnode cellid='0' mode='preferred' nodeset='3'/> + </numatune> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <cpu> + <numa> + <cell id='0' cpus='0' memory='32768'/> + <cell id='1' cpus='1' memory='32768'/> + </numa> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/kvm</emulator> + <controller type='usb' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/qemuxml2argvdata/qemuxml2argv-numatune-prefer.args b/tests/qemuxml2argvdata/qemuxml2argv-numatune-prefer.args new file mode 100644 index 0000000..079930a --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-numatune-prefer.args @@ -0,0 +1,6 @@ +LC_ALL=C PATH=/bin HOME=/home/test USER=test LOGNAME=test \ +QEMU_AUDIO_DRV=none /usr/bin/kvm -S -M pc -m 64 -smp 1 -object \ +memory-ram,size=64M,id=ram-node0,host-nodes=1-2,policy=preferred -numa \ +node,nodeid=0,cpus=0,memdev=ram-node0 -nographic -monitor \ +unix:/tmp/test-monitor,server,nowait -no-acpi -boot c -usb -net none \ +-serial none -parallel none diff --git a/tests/qemuxml2argvdata/qemuxml2argv-numatune-prefer.xml b/tests/qemuxml2argvdata/qemuxml2argv-numatune-prefer.xml new file mode 100644 index 0000000..58cd12c --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-numatune-prefer.xml @@ -0,0 +1,29 @@ +<domain type='qemu'> + <name>QEMUGuest</name> + <uuid>9f4b6512-e73a-4a25-93e8-5307802821ce</uuid> + <memory unit='KiB'>65536</memory> + <currentMemory unit='KiB'>65536</currentMemory> + <vcpu placement='static'>1</vcpu> + <numatune> + <memory mode='preferred'/> + </numatune> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <cpu> + <numa> + <cell id='0' cpus='0' memory='65536'/> + </numa> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/kvm</emulator> + <controller type='usb' index='0'/> + <controller type='pci' index='0' model='pci-root'/> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index bdea8b0..e851115 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -263,7 +263,8 @@ static int testCompareXMLToArgvFiles(const char *xml, virQEMUCapsPtr extraFlags, const char *migrateFrom, int migrateFd, - virQemuXML2ArgvTestFlags flags) + virQemuXML2ArgvTestFlags flags, + const char *nodemaskStr) { char *expectargv = NULL; int len; @@ -271,10 +272,15 @@ static int testCompareXMLToArgvFiles(const char *xml, int ret = -1; virDomainDefPtr vmdef = NULL; virDomainChrSourceDef monitor_chr; - virConnectPtr conn; + virConnectPtr conn = NULL; char *log = NULL; virCommandPtr cmd = NULL; size_t i; + virBitmapPtr nodemask = NULL; + + if (nodemaskStr && + virBitmapParse(nodemaskStr, 0, &nodemask, VIR_DOMAIN_CPUMASK_LEN) < 0) + goto out; if (!(conn = virGetConnect())) goto out; @@ -357,7 +363,7 @@ static int testCompareXMLToArgvFiles(const char *xml, (flags & FLAG_JSON), extraFlags, migrateFrom, migrateFd, NULL, VIR_NETDEV_VPORT_PROFILE_OP_NO_OP, - &testCallbacks, false))) { + &testCallbacks, false, nodemask))) { if (!virtTestOOMActive() && (flags & FLAG_EXPECT_FAILURE)) { ret = 0; @@ -404,6 +410,7 @@ static int testCompareXMLToArgvFiles(const char *xml, ret = 0; out: + virBitmapFree(nodemask); VIR_FREE(log); VIR_FREE(expectargv); VIR_FREE(actualargv); @@ -420,6 +427,7 @@ struct testInfo { const char *migrateFrom; int migrateFd; unsigned int flags; + const char *nodemask; }; static int @@ -442,7 +450,7 @@ testCompareXMLToArgvHelper(const void *data) result = testCompareXMLToArgvFiles(xml, args, info->extraFlags, info->migrateFrom, info->migrateFd, - flags); + flags, info->nodemask); cleanup: VIR_FREE(xml); @@ -532,10 +540,10 @@ mymain(void) if (VIR_STRDUP_QUIET(driver.config->spicePassword, "123456") < 0) return EXIT_FAILURE; -# define DO_TEST_FULL(name, migrateFrom, migrateFd, flags, ...) \ +# define DO_TEST_FULL(name, migrateFrom, migrateFd, flags, nodemask, ...) \ do { \ static struct testInfo info = { \ - name, NULL, migrateFrom, migrateFd, (flags) \ + name, NULL, migrateFrom, migrateFd, (flags), nodemask \ }; \ if (!(info.extraFlags = virQEMUCapsNew())) \ return EXIT_FAILURE; \ @@ -549,18 +557,18 @@ mymain(void) } while (0) # define DO_TEST(name, ...) \ - DO_TEST_FULL(name, NULL, -1, 0, __VA_ARGS__) + DO_TEST_FULL(name, NULL, -1, 0, NULL, __VA_ARGS__) # define DO_TEST_ERROR(name, ...) \ - DO_TEST_FULL(name, NULL, -1, FLAG_EXPECT_ERROR, __VA_ARGS__) + DO_TEST_FULL(name, NULL, -1, FLAG_EXPECT_ERROR, NULL, __VA_ARGS__) # define DO_TEST_FAILURE(name, ...) \ - DO_TEST_FULL(name, NULL, -1, FLAG_EXPECT_FAILURE, __VA_ARGS__) + DO_TEST_FULL(name, NULL, -1, FLAG_EXPECT_FAILURE, NULL, __VA_ARGS__) # define DO_TEST_PARSE_ERROR(name, ...) \ DO_TEST_FULL(name, NULL, -1, \ FLAG_EXPECT_PARSE_ERROR | FLAG_EXPECT_ERROR, \ - __VA_ARGS__) + NULL, __VA_ARGS__) # define NONE QEMU_CAPS_LAST @@ -1144,13 +1152,13 @@ mymain(void) QEMU_CAPS_PCIDEVICE, QEMU_CAPS_DEVICE, QEMU_CAPS_NODEFCONFIG, QEMU_CAPS_PCI_ROMBAR); - DO_TEST_FULL("restore-v1", "stdio", 7, 0, QEMU_CAPS_MIGRATE_KVM_STDIO); - DO_TEST_FULL("restore-v2", "stdio", 7, 0, QEMU_CAPS_MIGRATE_QEMU_EXEC); - DO_TEST_FULL("restore-v2", "exec:cat", 7, 0, QEMU_CAPS_MIGRATE_QEMU_EXEC); - DO_TEST_FULL("restore-v2-fd", "stdio", 7, 0, QEMU_CAPS_MIGRATE_QEMU_FD); - DO_TEST_FULL("restore-v2-fd", "fd:7", 7, 0, QEMU_CAPS_MIGRATE_QEMU_FD); - DO_TEST_FULL("migrate", "tcp:10.0.0.1:5000", -1, 0, - QEMU_CAPS_MIGRATE_QEMU_TCP); + DO_TEST_FULL("restore-v1", "stdio", 7, 0, NULL, QEMU_CAPS_MIGRATE_KVM_STDIO); + DO_TEST_FULL("restore-v2", "stdio", 7, 0, NULL, QEMU_CAPS_MIGRATE_QEMU_EXEC); + DO_TEST_FULL("restore-v2", "exec:cat", 7, 0, NULL, QEMU_CAPS_MIGRATE_QEMU_EXEC); + DO_TEST_FULL("restore-v2-fd", "stdio", 7, 0, NULL, QEMU_CAPS_MIGRATE_QEMU_FD); + DO_TEST_FULL("restore-v2-fd", "fd:7", 7, 0, NULL, QEMU_CAPS_MIGRATE_QEMU_FD); + DO_TEST_FULL("migrate", "tcp:10.0.0.1:5000", -1, 0, NULL, + QEMU_CAPS_MIGRATE_QEMU_TCP); DO_TEST("qemu-ns", NONE); @@ -1187,7 +1195,12 @@ mymain(void) DO_TEST("cputune", QEMU_CAPS_NAME); DO_TEST("cputune-zero-shares", QEMU_CAPS_NAME); DO_TEST("numatune-memory", NONE); + DO_TEST("numatune-memnodes", QEMU_CAPS_OBJECT_MEMORY_RAM); + DO_TEST_PARSE_ERROR("numatune-memnodes-problematic", QEMU_CAPS_OBJECT_MEMORY_RAM); + DO_TEST_PARSE_ERROR("numatune-memnode-nocpu", QEMU_CAPS_OBJECT_MEMORY_RAM); + DO_TEST_PARSE_ERROR("numatune-auto-prefer", QEMU_CAPS_OBJECT_MEMORY_RAM); DO_TEST("numatune-auto-nodeset-invalid", NONE); + DO_TEST_FULL("numatune-auto", NULL, -1, 0, "4-5", QEMU_CAPS_OBJECT_MEMORY_RAM); DO_TEST("numad", NONE); DO_TEST("numad-auto-vcpu-static-numatune", NONE); DO_TEST("numad-auto-memory-vcpu-cpuset", NONE); @@ -1378,6 +1391,9 @@ mymain(void) DO_TEST("panic", QEMU_CAPS_DEVICE_PANIC, QEMU_CAPS_DEVICE, QEMU_CAPS_NODEFCONFIG); + DO_TEST("panic", QEMU_CAPS_DEVICE_PANIC, + QEMU_CAPS_DEVICE, QEMU_CAPS_NODEFCONFIG); + virObjectUnref(driver.config); virObjectUnref(driver.caps); virObjectUnref(driver.xmlopt); diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index a446ea1..f1f419a 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -366,6 +366,7 @@ mymain(void) DO_TEST_DIFFERENT("cpu-numa1"); DO_TEST_DIFFERENT("cpu-numa2"); + DO_TEST("numatune-memnodes"); virObjectUnref(driver.caps); virObjectUnref(driver.xmlopt); diff --git a/tests/qemuxmlnstest.c b/tests/qemuxmlnstest.c index e8f70d6..ec5eec5 100644 --- a/tests/qemuxmlnstest.c +++ b/tests/qemuxmlnstest.c @@ -119,7 +119,7 @@ static int testCompareXMLToArgvFiles(const char *xml, vmdef, &monitor_chr, json, extraFlags, migrateFrom, migrateFd, NULL, VIR_NETDEV_VPORT_PROFILE_OP_NO_OP, - &testCallbacks, false))) + &testCallbacks, false, NULL))) goto fail; if (!virtTestOOMActive()) { -- 2.0.0

On Wed, Jun 04, 2014 at 04:56:32PM +0200, Martin Kletzander wrote:
diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index b1bfb5a..6415a6d 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -930,8 +932,15 @@ qemuSetupCgroupForVcpu(virDomainObjPtr vm) goto cleanup; }
+ if (virDomainGetMemsForGuestCpu(def, i, &mem_mask) < 0) + goto cleanup; + /* Set vcpupin in cgroup if vcpupin xml is provided */ if (virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_CPUSET)) { + + if (mem_mask && virCgroupSetCpusetMems(cgroup_vcpu, mem_mask) < 0) + goto cleanup; +
I'm still pretty unclear on whether we want todo this or not. IMHO the <memnode> is primarily about making guest NUMA nodes be allocated from specific host NUMA nodes. I'm not sure that arbitrary allocations done by the QEMU vCPU threads should be bound the same way or not. Be nice to have explicit confirmation from someone knowing KVM to say that we should force vCPU threads to allocate from the same NUMA nodes as the guest NUMA the vCPU is in.
/* find the right CPU to pin, otherwise * qemuSetupCgroupVcpuPin will fail. */ for (j = 0; j < def->cputune.nvcpupin; j++) { @@ -946,14 +955,21 @@ qemuSetupCgroupForVcpu(virDomainObjPtr vm)
break; } + } else if (mem_mask) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("cgroup cpuset is required for " + "per-node memory binding")); + goto cleanup; }
+ VIR_FREE(mem_mask); virCgroupFree(&cgroup_vcpu); }
return 0;
cleanup: + VIR_FREE(mem_mask); if (cgroup_vcpu) { virCgroupRemove(cgroup_vcpu); virCgroupFree(&cgroup_vcpu); diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index cc6023d..8699eea 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -6531,21 +6531,99 @@ qemuBuildSmpArgStr(const virDomainDef *def, return virBufferContentAndReset(&buf); }
+static const char * +qemuNumaPolicyToString(virDomainNumatuneMemMode mode) +{ + const char *policy = NULL; + + switch (mode) { + case VIR_DOMAIN_NUMATUNE_MEM_STRICT: + policy = "bind"; + break; + + case VIR_DOMAIN_NUMATUNE_MEM_PREFERRED: + policy = "preferred"; + break; + + case VIR_DOMAIN_NUMATUNE_MEM_INTERLEAVE: + policy = "interleave"; + break; + + case VIR_DOMAIN_NUMATUNE_MEM_LAST: + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Cannot translate memory policy")); + break; + } + + return policy; +}
Looks like a candidate for qemu local VIR_ENUM
+ static int -qemuBuildNumaArgStr(const virDomainDef *def, virCommandPtr cmd) +qemuBuildNumaArgStr(const virDomainDef *def, + virCommandPtr cmd, + virQEMUDriverPtr driver, + virQEMUCapsPtr qemuCaps, + virBitmapPtr autoNodemask) { + int ret = -1; size_t i; - virBuffer buf = VIR_BUFFER_INITIALIZER; char *cpumask = NULL; - int ret = -1; + virCapsPtr caps = virQEMUDriverGetCapabilities(driver, false); + virBuffer buf = VIR_BUFFER_INITIALIZER; + char *nodemask = NULL; + char *globalNodemask = NULL; + const char *globalPolicy = qemuNumaPolicyToString(def->numatune.memory.mode); + + if (!globalPolicy) + goto cleanup; +
for (i = 0; i < def->cpu->ncells; i++) { - VIR_FREE(cpumask); + const char *policy = NULL; int cellmem = VIR_DIV_UP(def->cpu->cells[i].mem, 1024); def->cpu->cells[i].mem = cellmem * 1024;
- if (!(cpumask = virBitmapFormat(def->cpu->cells[i].cpumask))) + VIR_FREE(nodemask); + if (def->numatune.mem_nodes && + def->numatune.mem_nodes[i].specified) { + policy = qemuNumaPolicyToString(def->numatune.mem_nodes[i].mode); + if (!policy) + goto cleanup; + + nodemask = virBitmapFormat(def->numatune.mem_nodes[i].nodemask); + if (!nodemask) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Unable to format nodemask")); + goto cleanup; + } + } + + VIR_FREE(cpumask); + + if (!(cpumask = virBitmapFormat(def->cpu->cells[i].cpumask))) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Unable to format nodemask")); goto cleanup; + }
if (strchr(cpumask, ',')) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", @@ -6554,6 +6632,38 @@ qemuBuildNumaArgStr(const virDomainDef *def, virCommandPtr cmd) goto cleanup; }
+ if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_RAM)) { + virBufferAsprintf(&buf, "memory-ram,size=%dM,id=ram-node%d", + cellmem, def->cpu->cells[i].cellid); + + if (nodemask) { + if (strchr(nodemask, ',')) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("disjoint NUMA node ranges are not supported " + "with this QEMU")); + goto cleanup; + } + + virBufferEscape(&buf, ',', ",", ",host-nodes=%s", nodemask); + virBufferAsprintf(&buf, ",policy=%s", policy); + + } else if (globalNodemask) { + + if (strchr(globalNodemask, ',')) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("disjoint NUMA node ranges are not supported " + "with this QEMU")); + goto cleanup; + } + + virBufferEscape(&buf, ',', ",", ",host-nodes=%s", globalNodemask); + virBufferAsprintf(&buf, ",policy=%s", globalPolicy); + } + + virCommandAddArg(cmd, "-object"); + virCommandAddArgBuffer(cmd, &buf); + }
We should be reporting VIR_ERR_CONFIG_SUPPORTED if they have requested <memnode> in the XML and we don't have the QEMU_CAPS_OBJECT_MEMORY_RAM capability.
diff --git a/tests/qemuxml2argvdata/qemuxml2argv-numatune-memnodes.xml b/tests/qemuxml2argvdata/qemuxml2argv-numatune-memnodes.xml new file mode 100644 index 0000000..18b00d8 --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-numatune-memnodes.xml @@ -0,0 +1,31 @@ +<domain type='qemu'> + <name>QEMUGuest</name> + <uuid>9f4b6512-e73a-4a25-93e8-5307802821ce</uuid> + <memory unit='KiB'>65536</memory> + <currentMemory unit='KiB'>65536</currentMemory> + <vcpu placement='static'>2</vcpu> + <numatune> + <memory mode='strict' nodeset='0-3'/> + <memnode cellid='0' mode='preferred' nodeset='3'/> + </numatune> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <cpu> + <numa> + <cell id='0' cpus='0' memory='32768'/> + <cell id='1' cpus='1' memory='32768'/>
How about expanding this test case to demonstrate a huge asymmetric NUMA topology. eg 32 vcpus. 3 numa nodes. 16 vCPUs in 1 node, and 8 in each of the other nodes. Also, perhaps we should have an example topology that has CPU only nodes and memory only nodes. eg <cell id='0' cpus='0-1'/> <cell id='1' memory='65536'/> Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Wed, 2014-06-04 at 16:56 +0200, Martin Kletzander wrote:
Currently we are only able to bind the whole domain to some host nodes using the /domain/numatune/memory element. Numerous requests were made to support host<->guest numa node bindings, so this series tries to pinch an idea on how to do that using /domain/numatune/memnode elements.
That is incompatible with automatic numa placement (numad) since that makes no sense. Also this disables any live changes to numa parameters (the /domain/numatune/memory settings) since we cannot change the settings given to qemu. Hi Martin,
Sorry for that I have not observed this patch. I made a duplicated work about this recently. and I found this patch has not been updated for several days, but since the QEMU have extra supported "memory-file" and some flags/properties, this patches should be refactored. Do you plan to send a new version ? If not, Can I take over them? Thanks, Chen
Martin Kletzander (6): conf, schema: add 'id' field for cells conf, schema: add support for numatune memnode element conf: add virDomainGetMemsForGuestCpu() qemu: purely a code movement qemu: memory-ram capability probing qemu: pass numa node binding preferences to qemu
docs/formatdomain.html.in | 26 ++- docs/schemas/domaincommon.rng | 22 ++ src/conf/cpu_conf.c | 39 +++- src/conf/domain_conf.c | 259 ++++++++++++++++++--- src/conf/domain_conf.h | 4 + src/libvirt_private.syms | 1 + src/qemu/qemu_capabilities.c | 2 + src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_cgroup.c | 18 +- src/qemu/qemu_command.c | 160 +++++++++++-- src/qemu/qemu_command.h | 3 +- src/qemu/qemu_domain.c | 23 +- src/qemu/qemu_driver.c | 23 +- src/qemu/qemu_process.c | 3 +- src/util/virnuma.h | 14 +- tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml | 6 +- tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.xml | 6 +- tests/qemuxml2argvdata/qemuxml2argv-cpu-numa3.xml | 25 ++ .../qemuxml2argv-numatune-auto-prefer.xml | 29 +++ .../qemuxml2argv-numatune-auto.args | 6 + .../qemuxml2argv-numatune-auto.xml | 26 +++ .../qemuxml2argv-numatune-memnode-nocpu.xml | 25 ++ .../qemuxml2argv-numatune-memnodes-problematic.xml | 31 +++ .../qemuxml2argv-numatune-memnodes.args | 8 + .../qemuxml2argv-numatune-memnodes.xml | 31 +++ .../qemuxml2argv-numatune-prefer.args | 6 + .../qemuxml2argv-numatune-prefer.xml | 29 +++ tests/qemuxml2argvtest.c | 51 ++-- .../qemuxml2xmlout-cpu-numa1.xml | 28 +++ .../qemuxml2xmlout-cpu-numa2.xml | 28 +++ tests/qemuxml2xmltest.c | 4 + tests/qemuxmlnstest.c | 2 +- 32 files changed, 845 insertions(+), 94 deletions(-) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa3.xml create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-auto-prefer.xml create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-auto.args create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-auto.xml create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-memnode-nocpu.xml create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-memnodes-problematic.xml create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-memnodes.args create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-memnodes.xml create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-prefer.args create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-numatune-prefer.xml create mode 100644 tests/qemuxml2xmloutdata/qemuxml2xmlout-cpu-numa1.xml create mode 100644 tests/qemuxml2xmloutdata/qemuxml2xmlout-cpu-numa2.xml
-- 2.0.0
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On Thu, Jun 26, 2014 at 01:50:22AM +0000, chen.fan.fnst@cn.fujitsu.com wrote:
On Wed, 2014-06-04 at 16:56 +0200, Martin Kletzander wrote:
Currently we are only able to bind the whole domain to some host nodes using the /domain/numatune/memory element. Numerous requests were made to support host<->guest numa node bindings, so this series tries to pinch an idea on how to do that using /domain/numatune/memnode elements.
That is incompatible with automatic numa placement (numad) since that makes no sense. Also this disables any live changes to numa parameters (the /domain/numatune/memory settings) since we cannot change the settings given to qemu. Hi Martin,
Sorry for that I have not observed this patch. I made a duplicated work about this recently. and I found this patch has not been updated for several days, but since the QEMU have extra supported "memory-file" and some flags/properties, this patches should be refactored. Do you plan to send a new version ? If not, Can I take over them?
I'm completely re-factoring the numatune parsing code and reworking few other things for this patch. For memory-file, that will be automatically supported as well, but with Michal's patches. We already have an option that says "use hugepages" and we would like to re-use that instead of creating new device(s). But we will greatly value your input on these patches (both mine and Michal's) when these hit the list. So if there's something else you find missing or wrong, that should be added or fixed, let me know. Martin

On Thu, 2014-06-26 at 07:35 +0200, Martin Kletzander wrote:
On Thu, Jun 26, 2014 at 01:50:22AM +0000, chen.fan.fnst@cn.fujitsu.com wrote:
On Wed, 2014-06-04 at 16:56 +0200, Martin Kletzander wrote:
Currently we are only able to bind the whole domain to some host nodes using the /domain/numatune/memory element. Numerous requests were made to support host<->guest numa node bindings, so this series tries to pinch an idea on how to do that using /domain/numatune/memnode elements.
That is incompatible with automatic numa placement (numad) since that makes no sense. Also this disables any live changes to numa parameters (the /domain/numatune/memory settings) since we cannot change the settings given to qemu. Hi Martin,
Sorry for that I have not observed this patch. I made a duplicated work about this recently. and I found this patch has not been updated for several days, but since the QEMU have extra supported "memory-file" and some flags/properties, this patches should be refactored. Do you plan to send a new version ? If not, Can I take over them?
I'm completely re-factoring the numatune parsing code and reworking few other things for this patch. For memory-file, that will be automatically supported as well, but with Michal's patches. We already have an option that says "use hugepages" and we would like to re-use that instead of creating new device(s).
But we will greatly value your input on these patches (both mine and Michal's) when these hit the list. So if there's something else you find missing or wrong, that should be added or fixed, let me know. Ok.
Thanks, Chen
Martin
participants (3)
-
chen.fan.fnst@cn.fujitsu.com
-
Daniel P. Berrange
-
Martin Kletzander