[PATCH 00/13] Allow HMAT configuration

The Heterogeneous Memory Attribute Table describes links between NUMA
nodes (what's the bandwidth and/or latency between two nodes and/or
their side caches). This enables guests to fine tune their performance.

Michal Prívozník (13):
  qemuxml2xmltest: Add "numatune-distance" test case
  conf: Move and rename virDomainParseScaledValue()
  numa_conf: Drop CPU from name of two functions
  qemu_command: Rename qemuBuildNumaArgStr()
  qemuBuildMachineCommandLine: Drop needless check
  numa_conf: Make virDomainNumaSetNodeCpumask() return void
  Allow NUMA nodes without vCPUs
  conf: Parse and format HMAT
  conf: Validate NUMA HMAT configuration
  numa: expose HMAT APIs
  qemu: Introduce QEMU_CAPS_NUMA_HMAT capability
  qemu: Build HMAT command line
  news: Document HMAT addition

 NEWS.rst                                          |   5 +
 docs/formatdomain.html.in                         |  90 +++
 docs/schemas/cputypes.rng                         | 118 +++-
 src/conf/cpu_conf.c                               |   2 +-
 src/conf/domain_conf.c                            | 140 +---
 src/conf/numa_conf.c                              | 664 ++++++++++++++++--
 src/conf/numa_conf.h                              |  72 +-
 src/libvirt_private.syms                          |  13 +
 src/libxl/xen_xl.c                                |  14 +-
 src/qemu/qemu_capabilities.c                      |   2 +
 src/qemu/qemu_capabilities.h                      |   1 +
 src/qemu/qemu_command.c                           | 226 +++++-
 src/qemu/qemu_validate.c                          |  22 +-
 src/util/virxml.c                                 |  72 ++
 src/util/virxml.h                                 |   8 +
 .../caps_5.0.0.aarch64.xml                        |   1 +
 tests/qemucapabilitiesdata/caps_5.0.0.ppc64.xml   |   1 +
 .../caps_5.0.0.riscv64.xml                        |   1 +
 .../caps_5.0.0.x86_64.xml                         |   1 +
 .../caps_5.1.0.x86_64.xml                         |   1 +
 .../numatune-hmat.x86_64-latest.args              |  53 ++
 tests/qemuxml2argvdata/numatune-hmat.xml          |  52 ++
 tests/qemuxml2argvdata/numatune-no-vcpu.args      |  33 +
 tests/qemuxml2argvdata/numatune-no-vcpu.xml       |  42 ++
 tests/qemuxml2argvtest.c                          |   2 +
 .../qemuxml2xmloutdata/numatune-distances.xml     |  96 +++
 tests/qemuxml2xmloutdata/numatune-hmat.xml        |   1 +
 tests/qemuxml2xmloutdata/numatune-no-vcpu.xml     |   1 +
 tests/qemuxml2xmltest.c                           |   3 +
 29 files changed, 1536 insertions(+), 201 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/numatune-hmat.x86_64-latest.args
 create mode 100644 tests/qemuxml2argvdata/numatune-hmat.xml
 create mode 100644 tests/qemuxml2argvdata/numatune-no-vcpu.args
 create mode 100644 tests/qemuxml2argvdata/numatune-no-vcpu.xml
 create mode 100644 tests/qemuxml2xmloutdata/numatune-distances.xml
 create mode 120000 tests/qemuxml2xmloutdata/numatune-hmat.xml
 create mode 120000 tests/qemuxml2xmloutdata/numatune-no-vcpu.xml

-- 
2.26.2
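[Editor's note: for readers unfamiliar with the feature, a domain XML fragment
using the elements this series introduces might look like the sketch below.
Element names follow the formatdomain documentation added by patch 8 ("conf:
Parse and format HMAT", libvirt 6.6.0), but all values and the particular
cache/interconnect combinations here are hypothetical, for illustration only.]

```xml
<cpu>
  <numa>
    <cell id='0' cpus='0-3' memory='2097152' unit='KiB'>
      <!-- a memory side cache of this node -->
      <cache level='1' associativity='direct' policy='writeback'>
        <size value='10' unit='KiB'/>
        <line value='8' unit='B'/>
      </cache>
    </cell>
    <!-- a CPU-less, memory-only node (allowed by patch 7) -->
    <cell id='1' memory='2097152' unit='KiB'/>
    <interconnects>
      <latency initiator='0' target='1' type='access' value='5'/>
      <bandwidth initiator='0' target='1' type='access' value='204800' unit='KiB'/>
    </interconnects>
  </numa>
</cpu>
```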

This test case tests that expanding of NUMA distances work. On
input we accept if only distance from A to B is specified. On the
output we format the B to A distance too.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 .../qemuxml2xmloutdata/numatune-distances.xml | 96 +++++++++++++++++++
 tests/qemuxml2xmltest.c                       |  1 +
 2 files changed, 97 insertions(+)
 create mode 100644 tests/qemuxml2xmloutdata/numatune-distances.xml

diff --git a/tests/qemuxml2xmloutdata/numatune-distances.xml b/tests/qemuxml2xmloutdata/numatune-distances.xml
new file mode 100644
index 0000000000..48f89cb015
--- /dev/null
+++ b/tests/qemuxml2xmloutdata/numatune-distances.xml
@@ -0,0 +1,96 @@
+<domain type='qemu'>
+  <name>QEMUGuest</name>
+  <uuid>c7a5fdb2-cdaf-9455-926a-d65c16db1809</uuid>
+  <memory unit='KiB'>8388608</memory>
+  <currentMemory unit='KiB'>8388608</currentMemory>
+  <vcpu placement='static'>12</vcpu>
+  <os>
+    <type arch='x86_64' machine='pc'>hvm</type>
+    <boot dev='hd'/>
+  </os>
+  <features>
+    <acpi/>
+    <apic/>
+    <pae/>
+  </features>
+  <cpu>
+    <numa>
+      <cell id='0' cpus='0,11' memory='2097152' unit='KiB'>
+        <distances>
+          <sibling id='0' value='10'/>
+          <sibling id='1' value='21'/>
+          <sibling id='2' value='31'/>
+          <sibling id='3' value='41'/>
+          <sibling id='4' value='51'/>
+          <sibling id='5' value='61'/>
+        </distances>
+      </cell>
+      <cell id='1' cpus='1,10' memory='2097152' unit='KiB'>
+        <distances>
+          <sibling id='0' value='21'/>
+          <sibling id='1' value='10'/>
+          <sibling id='2' value='21'/>
+          <sibling id='3' value='31'/>
+          <sibling id='4' value='41'/>
+          <sibling id='5' value='51'/>
+        </distances>
+      </cell>
+      <cell id='2' cpus='2,9' memory='2097152' unit='KiB'>
+        <distances>
+          <sibling id='0' value='31'/>
+          <sibling id='1' value='21'/>
+          <sibling id='2' value='10'/>
+          <sibling id='3' value='21'/>
+          <sibling id='4' value='31'/>
+          <sibling id='5' value='41'/>
+        </distances>
+      </cell>
+      <cell id='3' cpus='3,8' memory='2097152' unit='KiB'>
+        <distances>
+          <sibling id='0' value='41'/>
+          <sibling id='1' value='31'/>
+          <sibling id='2' value='21'/>
+          <sibling id='3' value='10'/>
+          <sibling id='4' value='21'/>
+          <sibling id='5' value='31'/>
+        </distances>
+      </cell>
+      <cell id='4' cpus='4,7' memory='2097152' unit='KiB'>
+        <distances>
+          <sibling id='0' value='51'/>
+          <sibling id='1' value='41'/>
+          <sibling id='2' value='31'/>
+          <sibling id='3' value='21'/>
+          <sibling id='4' value='10'/>
+          <sibling id='5' value='21'/>
+        </distances>
+      </cell>
+      <cell id='5' cpus='5-6' memory='2097152' unit='KiB'>
+        <distances>
+          <sibling id='0' value='61'/>
+          <sibling id='1' value='51'/>
+          <sibling id='2' value='41'/>
+          <sibling id='3' value='31'/>
+          <sibling id='4' value='21'/>
+          <sibling id='5' value='10'/>
+        </distances>
+      </cell>
+    </numa>
+  </cpu>
+  <clock offset='utc'/>
+  <on_poweroff>destroy</on_poweroff>
+  <on_reboot>restart</on_reboot>
+  <on_crash>restart</on_crash>
+  <devices>
+    <emulator>/usr/bin/qemu-system-x86_64</emulator>
+    <controller type='usb' index='0'>
+      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
+    </controller>
+    <controller type='pci' index='0' model='pci-root'/>
+    <input type='mouse' bus='ps2'/>
+    <input type='keyboard' bus='ps2'/>
+    <memballoon model='virtio'>
+      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
+    </memballoon>
+  </devices>
+</domain>
diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c
index 5a124853b4..d203c97e36 100644
--- a/tests/qemuxml2xmltest.c
+++ b/tests/qemuxml2xmltest.c
@@ -1104,6 +1104,7 @@ mymain(void)
     DO_TEST("numatune-auto-prefer", NONE);
     DO_TEST("numatune-memnode", QEMU_CAPS_NUMA, QEMU_CAPS_OBJECT_MEMORY_FILE);
     DO_TEST("numatune-memnode-no-memory", QEMU_CAPS_OBJECT_MEMORY_FILE);
+    DO_TEST("numatune-distances", QEMU_CAPS_NUMA, QEMU_CAPS_NUMA_DIST);
 
     DO_TEST("bios-nvram", NONE);
     DO_TEST("bios-nvram-os-interleave", NONE);
-- 
2.26.2

On 6/24/20 10:48 AM, Michal Privoznik wrote:
> This test case tests that expanding of NUMA distances work. On
This first sentence seems odd. Perhaps change it to "This test case checks that expanding NUMA distance works"
> input we accept if only distance from A to B is specified. On the
> output we format the B to A distance too.

On 6/24/20 5:43 PM, Daniel Henrique Barboza wrote:
> On 6/24/20 10:48 AM, Michal Privoznik wrote:
>> This test case tests that expanding of NUMA distances work. On
>
> This first sentence seems odd. Perhaps change it to
> "This test case checks that expanding NUMA distance works"

Forgot to add my r-b:

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>

There is nothing domain specific about the function, thus it should not have virDomain prefix. Also, the fact that it is a static function makes it impossible to use from other files. Move the function to virxml.c and drop the 'Domain' infix. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/conf/domain_conf.c | 135 ++++++++++----------------------------- src/libvirt_private.syms | 1 + src/util/virxml.c | 72 +++++++++++++++++++++ src/util/virxml.h | 8 +++ 4 files changed, 114 insertions(+), 102 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index fc7fcfb0c6..65a110b16f 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -10808,75 +10808,6 @@ virDomainDiskDefParseXML(virDomainXMLOptionPtr xmlopt, goto cleanup; } -/** - * virDomainParseScaledValue: - * @xpath: XPath to memory amount - * @units_xpath: XPath to units attribute - * @ctxt: XPath context - * @val: scaled value is stored here - * @scale: default scale for @val - * @max: maximal @val allowed - * @required: is the value required? - * - * Parse a value located at @xpath within @ctxt, and store the - * result into @val. The value is scaled by units located at - * @units_xpath (or the 'unit' attribute under @xpath if - * @units_xpath is NULL). If units are not present, the default - * @scale is used. If @required is set, then the value must - * exist; otherwise, the value is optional. The resulting value - * is in bytes. - * - * Returns 1 on success, - * 0 if the value was not present and !@required, - * -1 on failure after issuing error. 
- */ -static int -virDomainParseScaledValue(const char *xpath, - const char *units_xpath, - xmlXPathContextPtr ctxt, - unsigned long long *val, - unsigned long long scale, - unsigned long long max, - bool required) -{ - unsigned long long bytes; - g_autofree char *xpath_full = NULL; - g_autofree char *unit = NULL; - g_autofree char *bytes_str = NULL; - - *val = 0; - xpath_full = g_strdup_printf("string(%s)", xpath); - - bytes_str = virXPathString(xpath_full, ctxt); - if (!bytes_str) { - if (!required) - return 0; - virReportError(VIR_ERR_XML_ERROR, - _("missing element or attribute '%s'"), - xpath); - return -1; - } - VIR_FREE(xpath_full); - - if (virStrToLong_ullp(bytes_str, NULL, 10, &bytes) < 0) { - virReportError(VIR_ERR_XML_ERROR, - _("Invalid value '%s' for element or attribute '%s'"), - bytes_str, xpath); - return -1; - } - - if (units_xpath) - xpath_full = g_strdup_printf("string(%s)", units_xpath); - else - xpath_full = g_strdup_printf("string(%s/@unit)", xpath); - unit = virXPathString(xpath_full, ctxt); - - if (virScaleInteger(&bytes, unit, scale, max) < 0) - return -1; - - *val = bytes; - return 1; -} /** @@ -10913,8 +10844,8 @@ virDomainParseMemory(const char *xpath, max = virMemoryMaxValue(capped); - if (virDomainParseScaledValue(xpath, units_xpath, ctxt, - &bytes, 1024, max, required) < 0) + if (virParseScaledValue(xpath, units_xpath, ctxt, + &bytes, 1024, max, required) < 0) return -1; /* Yes, we really do use kibibytes for our internal sizing. 
*/ @@ -10956,9 +10887,9 @@ virDomainParseMemoryLimit(const char *xpath, int ret; unsigned long long bytes; - ret = virDomainParseScaledValue(xpath, units_xpath, ctxt, &bytes, 1024, - VIR_DOMAIN_MEMORY_PARAM_UNLIMITED << 10, - false); + ret = virParseScaledValue(xpath, units_xpath, ctxt, &bytes, 1024, + VIR_DOMAIN_MEMORY_PARAM_UNLIMITED << 10, + false); if (ret < 0) return -1; @@ -11286,9 +11217,9 @@ virDomainControllerDefParseXML(virDomainXMLOptionPtr xmlopt, "have an address")); goto error; } - if ((rc = virDomainParseScaledValue("./pcihole64", NULL, - ctxt, &bytes, 1024, - 1024ULL * ULONG_MAX, false)) < 0) + if ((rc = virParseScaledValue("./pcihole64", NULL, + ctxt, &bytes, 1024, + 1024ULL * ULONG_MAX, false)) < 0) goto error; if (rc == 1) @@ -11529,14 +11460,14 @@ virDomainFSDefParseXML(virDomainXMLOptionPtr xmlopt, def->multidevs = VIR_DOMAIN_FS_MULTIDEVS_DEFAULT; } - if (virDomainParseScaledValue("./space_hard_limit[1]", - NULL, ctxt, &def->space_hard_limit, - 1, ULLONG_MAX, false) < 0) + if (virParseScaledValue("./space_hard_limit[1]", + NULL, ctxt, &def->space_hard_limit, + 1, ULLONG_MAX, false) < 0) goto error; - if (virDomainParseScaledValue("./space_soft_limit[1]", - NULL, ctxt, &def->space_soft_limit, - 1, ULLONG_MAX, false) < 0) + if (virParseScaledValue("./space_soft_limit[1]", + NULL, ctxt, &def->space_soft_limit, + 1, ULLONG_MAX, false) < 0) goto error; cur = node->children; @@ -15388,8 +15319,8 @@ virDomainShmemDefParseXML(virDomainXMLOptionPtr xmlopt, goto cleanup; } - if (virDomainParseScaledValue("./size[1]", NULL, ctxt, - &def->size, 1, ULLONG_MAX, false) < 0) + if (virParseScaledValue("./size[1]", NULL, ctxt, + &def->size, 1, ULLONG_MAX, false) < 0) goto cleanup; if ((server = virXPathNode("./server[1]", ctxt))) { @@ -19432,13 +19363,13 @@ virDomainFeaturesDefParse(virDomainDefPtr def, VIR_FREE(tmp); } - if (virDomainParseScaledValue("./features/hpt/maxpagesize", - NULL, - ctxt, - &def->hpt_maxpagesize, - 1024, - ULLONG_MAX, - false) < 0) { + 
if (virParseScaledValue("./features/hpt/maxpagesize", + NULL, + ctxt, + &def->hpt_maxpagesize, + 1024, + ULLONG_MAX, + false) < 0) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("Unable to parse HPT maxpagesize setting")); @@ -19782,13 +19713,13 @@ virDomainFeaturesDefParse(virDomainDefPtr def, } if (def->features[VIR_DOMAIN_FEATURE_SMM] == VIR_TRISTATE_SWITCH_ON) { - int rv = virDomainParseScaledValue("string(./features/smm/tseg)", - "string(./features/smm/tseg/@unit)", - ctxt, - &def->tseg_size, - 1024 * 1024, /* Defaults to mebibytes */ - ULLONG_MAX, - false); + int rv = virParseScaledValue("string(./features/smm/tseg)", + "string(./features/smm/tseg/@unit)", + ctxt, + &def->tseg_size, + 1024 * 1024, /* Defaults to mebibytes */ + ULLONG_MAX, + false); if (rv < 0) goto error; def->tseg_specified = rv; @@ -20510,9 +20441,9 @@ virDomainCachetuneDefParseCache(xmlXPathContextPtr ctxt, return -1; } - if (virDomainParseScaledValue("./@size", "./@unit", - ctxt, &size, 1024, - ULLONG_MAX, true) < 0) + if (virParseScaledValue("./@size", "./@unit", + ctxt, &size, 1024, + ULLONG_MAX, true) < 0) return -1; if (virResctrlAllocSetCacheSize(alloc, level, type, cache, size) < 0) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 487eea4149..9f2abc1d8d 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -3484,6 +3484,7 @@ virVsockSetGuestCid; # util/virxml.h +virParseScaledValue; virXMLCheckIllegalChars; virXMLChildElementCount; virXMLExtractNamespaceXML; diff --git a/src/util/virxml.c b/src/util/virxml.c index 02b59ea2f8..e9429cd4ac 100644 --- a/src/util/virxml.c +++ b/src/util/virxml.c @@ -32,6 +32,7 @@ #include "viralloc.h" #include "virfile.h" #include "virstring.h" +#include "virutil.h" #define VIR_FROM_THIS VIR_FROM_XML @@ -1405,3 +1406,74 @@ virXMLNamespaceRegister(xmlXPathContextPtr ctxt, return 0; } + + +/** + * virParseScaledValue: + * @xpath: XPath to memory amount + * @units_xpath: XPath to units attribute + * @ctxt: XPath 
context + * @val: scaled value is stored here + * @scale: default scale for @val + * @max: maximal @val allowed + * @required: is the value required? + * + * Parse a value located at @xpath within @ctxt, and store the + * result into @val. The value is scaled by units located at + * @units_xpath (or the 'unit' attribute under @xpath if + * @units_xpath is NULL). If units are not present, the default + * @scale is used. If @required is set, then the value must + * exist; otherwise, the value is optional. The resulting value + * is in bytes. + * + * Returns 1 on success, + * 0 if the value was not present and !@required, + * -1 on failure after issuing error. + */ +int +virParseScaledValue(const char *xpath, + const char *units_xpath, + xmlXPathContextPtr ctxt, + unsigned long long *val, + unsigned long long scale, + unsigned long long max, + bool required) +{ + unsigned long long bytes; + g_autofree char *xpath_full = NULL; + g_autofree char *unit = NULL; + g_autofree char *bytes_str = NULL; + + *val = 0; + xpath_full = g_strdup_printf("string(%s)", xpath); + + bytes_str = virXPathString(xpath_full, ctxt); + if (!bytes_str) { + if (!required) + return 0; + virReportError(VIR_ERR_XML_ERROR, + _("missing element or attribute '%s'"), + xpath); + return -1; + } + VIR_FREE(xpath_full); + + if (virStrToLong_ullp(bytes_str, NULL, 10, &bytes) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("Invalid value '%s' for element or attribute '%s'"), + bytes_str, xpath); + return -1; + } + + if (units_xpath) + xpath_full = g_strdup_printf("string(%s)", units_xpath); + else + xpath_full = g_strdup_printf("string(%s/@unit)", xpath); + unit = virXPathString(xpath_full, ctxt); + + if (virScaleInteger(&bytes, unit, scale, max) < 0) + return -1; + + *val = bytes; + return 1; +} diff --git a/src/util/virxml.h b/src/util/virxml.h index ed178105f6..886d1e025f 100644 --- a/src/util/virxml.h +++ b/src/util/virxml.h @@ -270,3 +270,11 @@ virXMLNamespaceFormatNS(virBufferPtr buf, int 
virXMLNamespaceRegister(xmlXPathContextPtr ctxt, virXMLNamespace const *ns); + +int virParseScaledValue(const char *xpath, + const char *units_xpath, + xmlXPathContextPtr ctxt, + unsigned long long *val, + unsigned long long scale, + unsigned long long max, + bool required); -- 2.26.2

On 6/24/20 10:49 AM, Michal Privoznik wrote:
> There is nothing domain specific about the function, thus it should
> not have virDomain prefix. Also, the fact that it is a static
> function makes it impossible to use from other files. Move the
> function to virxml.c and drop the 'Domain' infix.
>
> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> ---
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>

There are two functions virDomainNumaDefCPUFormatXML() and
virDomainNumaDefCPUParseXML() which format and parse domain's
<numa/>. There is nothing CPU specific about them. Drop the infix.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/conf/cpu_conf.c    | 2 +-
 src/conf/domain_conf.c | 2 +-
 src/conf/numa_conf.c   | 8 ++++----
 src/conf/numa_conf.h   | 4 ++--
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/conf/cpu_conf.c b/src/conf/cpu_conf.c
index e1b0a5653f..a1514c2e14 100644
--- a/src/conf/cpu_conf.c
+++ b/src/conf/cpu_conf.c
@@ -734,7 +734,7 @@ virCPUDefFormatBufFull(virBufferPtr buf,
     if (virCPUDefFormatBuf(&childrenBuf, def) < 0)
         goto cleanup;
 
-    if (virDomainNumaDefCPUFormatXML(&childrenBuf, numa) < 0)
+    if (virDomainNumaDefFormatXML(&childrenBuf, numa) < 0)
         goto cleanup;
 
     /* Put it all together */
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 65a110b16f..ce75831037 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -21394,7 +21394,7 @@ virDomainDefParseXML(xmlDocPtr xml,
     if (virCPUDefParseXML(ctxt, "./cpu[1]", VIR_CPU_TYPE_GUEST, &def->cpu) < 0)
         goto error;
 
-    if (virDomainNumaDefCPUParseXML(def->numa, ctxt) < 0)
+    if (virDomainNumaDefParseXML(def->numa, ctxt) < 0)
         goto error;
 
     if (virDomainNumaGetCPUCountTotal(def->numa) > virDomainDefGetVcpusMax(def)) {
diff --git a/src/conf/numa_conf.c b/src/conf/numa_conf.c
index 21a8be2cac..2685a3e828 100644
--- a/src/conf/numa_conf.c
+++ b/src/conf/numa_conf.c
@@ -842,8 +842,8 @@ virDomainNumaDefNodeDistanceParseXML(virDomainNumaPtr def,
 }
 
 int
-virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
-                            xmlXPathContextPtr ctxt)
+virDomainNumaDefParseXML(virDomainNumaPtr def,
+                         xmlXPathContextPtr ctxt)
 {
     xmlNodePtr *nodes = NULL;
     VIR_XPATH_NODE_AUTORESTORE(ctxt);
@@ -970,8 +970,8 @@
 
 
 int
-virDomainNumaDefCPUFormatXML(virBufferPtr buf,
-                             virDomainNumaPtr def)
+virDomainNumaDefFormatXML(virBufferPtr buf,
+                          virDomainNumaPtr def)
 {
     virDomainMemoryAccess memAccess;
     virTristateBool discard;
diff --git a/src/conf/numa_conf.h b/src/conf/numa_conf.h
index cdf87a87e8..206dffe304 100644
--- a/src/conf/numa_conf.h
+++ b/src/conf/numa_conf.h
@@ -181,8 +181,8 @@ bool virDomainNumatuneNodesetIsAvailable(virDomainNumaPtr numatune,
 bool virDomainNumatuneNodeSpecified(virDomainNumaPtr numatune,
                                     int cellid);
 
-int virDomainNumaDefCPUParseXML(virDomainNumaPtr def, xmlXPathContextPtr ctxt);
-int virDomainNumaDefCPUFormatXML(virBufferPtr buf, virDomainNumaPtr def);
+int virDomainNumaDefParseXML(virDomainNumaPtr def, xmlXPathContextPtr ctxt);
+int virDomainNumaDefFormatXML(virBufferPtr buf, virDomainNumaPtr def);
 
 unsigned int virDomainNumaGetCPUCountTotal(virDomainNumaPtr numa);
-- 
2.26.2

On 6/24/20 10:49 AM, Michal Privoznik wrote:
> There are two functions virDomainNumaDefCPUFormatXML() and
> virDomainNumaDefCPUParseXML() which format and parse domain's
> <numa/>. There is nothing CPU specific about them. Drop the infix.
>
> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> ---
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>

The function doesn't just build the argument for -numa. Since the
-numa can be repeated multiple times, it also puts -numa onto the
cmd line. Also, the rest of the functions have a 'Command' infix.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/qemu/qemu_command.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index f355ddbfd5..05e5c19118 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -7122,10 +7122,10 @@ qemuBuildIOThreadCommandLine(virCommandPtr cmd,
 
 
 static int
-qemuBuildNumaArgStr(virQEMUDriverConfigPtr cfg,
-                    virDomainDefPtr def,
-                    virCommandPtr cmd,
-                    qemuDomainObjPrivatePtr priv)
+qemuBuildNumaCommandLine(virQEMUDriverConfigPtr cfg,
+                         virDomainDefPtr def,
+                         virCommandPtr cmd,
+                         qemuDomainObjPrivatePtr priv)
 {
     size_t i, j;
     virQEMUCapsPtr qemuCaps = priv->qemuCaps;
@@ -9725,7 +9725,7 @@ qemuBuildCommandLine(virQEMUDriverPtr driver,
         return NULL;
 
     if (virDomainNumaGetNodeCount(def->numa) &&
-        qemuBuildNumaArgStr(cfg, def, cmd, priv) < 0)
+        qemuBuildNumaCommandLine(cfg, def, cmd, priv) < 0)
         return NULL;
 
     if (qemuBuildMemoryDeviceCommandLine(cmd, cfg, def, priv) < 0)
-- 
2.26.2

On 6/24/20 10:49 AM, Michal Privoznik wrote:
> The function doesn't just build the argument for -numa. Since the
> -numa can be repeated multiple times, it also puts -numa onto the
> cmd line. Also, the rest of the functions have a 'Command' infix.
>
> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> ---
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>

The machine cannot be NULL at this point - qemuDomainDefPostParse()
makes sure it isn't.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/qemu/qemu_command.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index 05e5c19118..c5b0ee231e 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -6722,13 +6722,6 @@ qemuBuildMachineCommandLine(virCommandPtr cmd,
     g_auto(virBuffer) buf = VIR_BUFFER_INITIALIZER;
     size_t i;
 
-    /* This should *never* be NULL, since we always provide
-     * a machine in the capabilities data for QEMU. So this
-     * check is just here as a safety in case the unexpected
-     * happens */
-    if (!def->os.machine)
-        return 0;
-
     virCommandAddArg(cmd, "-machine");
     virBufferAdd(&buf, def->os.machine, -1);
-- 
2.26.2

On 6/24/20 10:49 AM, Michal Privoznik wrote:
> The machine cannot be NULL at this point - qemuDomainDefPostParse()
> makes sure it isn't.
>
> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> ---
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>

There is only one caller of virDomainNumaSetNodeCpumask() which
checks the return value, but the function returns NULL iff the
@cpumask passed in was NULL in the first place. And in that place
@cpumask can't be NULL because it was just allocated by
virBitmapParse().

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/conf/numa_conf.c | 4 +---
 src/conf/numa_conf.h | 6 +++---
 src/libxl/xen_xl.c   | 4 ++--
 3 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/src/conf/numa_conf.c b/src/conf/numa_conf.c
index 2685a3e828..7b9084456e 100644
--- a/src/conf/numa_conf.c
+++ b/src/conf/numa_conf.c
@@ -1317,14 +1317,12 @@ virDomainNumaGetNodeCpumask(virDomainNumaPtr numa,
 }
 
 
-virBitmapPtr
+void
 virDomainNumaSetNodeCpumask(virDomainNumaPtr numa,
                             size_t node,
                             virBitmapPtr cpumask)
 {
     numa->mem_nodes[node].cpumask = cpumask;
-
-    return numa->mem_nodes[node].cpumask;
 }
 
diff --git a/src/conf/numa_conf.h b/src/conf/numa_conf.h
index 206dffe304..25e896826b 100644
--- a/src/conf/numa_conf.h
+++ b/src/conf/numa_conf.h
@@ -155,9 +155,9 @@ size_t virDomainNumaSetNodeDistanceCount(virDomainNumaPtr numa,
                                          size_t ndistances)
     ATTRIBUTE_NONNULL(1);
 
-virBitmapPtr virDomainNumaSetNodeCpumask(virDomainNumaPtr numa,
-                                         size_t node,
-                                         virBitmapPtr cpumask)
+void virDomainNumaSetNodeCpumask(virDomainNumaPtr numa,
+                                 size_t node,
+                                 virBitmapPtr cpumask)
     ATTRIBUTE_NONNULL(1);
 
 /*
diff --git a/src/libxl/xen_xl.c b/src/libxl/xen_xl.c
index d40c2e1d8e..1c1279e1cb 100644
--- a/src/libxl/xen_xl.c
+++ b/src/libxl/xen_xl.c
@@ -508,10 +508,10 @@ xenParseXLVnuma(virConfPtr conf,
                     goto cleanup;
                 }
 
-                if ((virBitmapParse(vtoken, &cpumask, VIR_DOMAIN_CPUMASK_LEN) < 0) ||
-                    (virDomainNumaSetNodeCpumask(numa, vnodeCnt, cpumask) == NULL))
+                if (virBitmapParse(vtoken, &cpumask, VIR_DOMAIN_CPUMASK_LEN) < 0)
                     goto cleanup;
 
+                virDomainNumaSetNodeCpumask(numa, vnodeCnt, cpumask);
                 vcpus += virBitmapCountBits(cpumask);
 
             } else if (STRPREFIX(str, "vdistances")) {
-- 
2.26.2

On 6/24/20 10:49 AM, Michal Privoznik wrote:
> There is only one caller of virDomainNumaSetNodeCpumask() which
> checks the return value, but the function returns NULL iff the
> @cpumask passed in was NULL in the first place. And in that place
> @cpumask can't be NULL because it was just allocated by
> virBitmapParse().
>
> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> ---
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>

QEMU allows creating NUMA nodes that have memory only. These are
important for HMAT. With the check done in qemuValidateDomainDef()
for QEMU 2.7 or newer, we can be sure that the vCPUs are fully
assigned to NUMA nodes in the domain XML.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 docs/formatdomain.html.in                     |  2 +
 docs/schemas/cputypes.rng                     |  8 ++-
 src/conf/numa_conf.c                          | 59 ++++++++++---------
 src/libxl/xen_xl.c                            | 10 ++--
 src/qemu/qemu_command.c                       | 26 ++++----
 src/qemu/qemu_validate.c                      | 22 +++----
 tests/qemuxml2argvdata/numatune-no-vcpu.args  | 33 +++++++++++
 tests/qemuxml2argvdata/numatune-no-vcpu.xml   | 42 +++++++++++++
 tests/qemuxml2argvtest.c                      |  1 +
 tests/qemuxml2xmloutdata/numatune-no-vcpu.xml |  1 +
 tests/qemuxml2xmltest.c                       |  1 +
 11 files changed, 149 insertions(+), 56 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/numatune-no-vcpu.args
 create mode 100644 tests/qemuxml2argvdata/numatune-no-vcpu.xml
 create mode 120000 tests/qemuxml2xmloutdata/numatune-no-vcpu.xml

diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index bd662727d3..07dcca57f5 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -1840,6 +1840,8 @@
       consistent across qemu and libvirt versions.
       <code>memory</code> specifies the node memory
       in kibibytes (i.e. blocks of 1024 bytes).
+      <span class="since">Since 6.6.0</span> the <code>cpus</code> attribute
+      is optional and if omitted a CPU-less NUMA node is created.
       <span class="since">Since 1.2.11</span> one can use an additional <a
       href="#elementsMemoryAllocation"><code>unit</code></a> attribute
       to define units in which <code>memory</code> is specified.
diff --git a/docs/schemas/cputypes.rng b/docs/schemas/cputypes.rng index e2744acad3..a1682a1003 100644 --- a/docs/schemas/cputypes.rng +++ b/docs/schemas/cputypes.rng @@ -115,9 +115,11 @@ <ref name="unsignedInt"/> </attribute> </optional> - <attribute name="cpus"> - <ref name="cpuset"/> - </attribute> + <optional> + <attribute name="cpus"> + <ref name="cpuset"/> + </attribute> + </optional> <attribute name="memory"> <ref name="memoryKB"/> </attribute> diff --git a/src/conf/numa_conf.c b/src/conf/numa_conf.c index 7b9084456e..7cf62ce7da 100644 --- a/src/conf/numa_conf.c +++ b/src/conf/numa_conf.c @@ -889,32 +889,28 @@ virDomainNumaDefParseXML(virDomainNumaPtr def, } VIR_FREE(tmp); - if (def->mem_nodes[cur_cell].cpumask) { + if (def->mem_nodes[cur_cell].mem) { virReportError(VIR_ERR_XML_ERROR, _("Duplicate NUMA cell info for cell id '%u'"), cur_cell); goto cleanup; } - if (!(tmp = virXMLPropString(nodes[i], "cpus"))) { - virReportError(VIR_ERR_XML_ERROR, "%s", - _("Missing 'cpus' attribute in NUMA cell")); - goto cleanup; - } + if ((tmp = virXMLPropString(nodes[i], "cpus"))) { + g_autoptr(virBitmap) cpumask = NULL; - if (virBitmapParse(tmp, &def->mem_nodes[cur_cell].cpumask, - VIR_DOMAIN_CPUMASK_LEN) < 0) - goto cleanup; + if (virBitmapParse(tmp, &cpumask, VIR_DOMAIN_CPUMASK_LEN) < 0) + goto cleanup; - if (virBitmapIsAllClear(def->mem_nodes[cur_cell].cpumask)) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("NUMA cell %d has no vCPUs assigned"), cur_cell); - goto cleanup; + if (!virBitmapIsAllClear(cpumask)) + def->mem_nodes[cur_cell].cpumask = g_steal_pointer(&cpumask); + VIR_FREE(tmp); } - VIR_FREE(tmp); for (j = 0; j < n; j++) { - if (j == cur_cell || !def->mem_nodes[j].cpumask) + if (j == cur_cell || + !def->mem_nodes[j].cpumask || + !def->mem_nodes[cur_cell].cpumask) continue; if (virBitmapOverlaps(def->mem_nodes[j].cpumask, @@ -975,7 +971,6 @@ virDomainNumaDefFormatXML(virBufferPtr buf, { virDomainMemoryAccess memAccess; virTristateBool discard; - char 
*cpustr; size_t ncells = virDomainNumaGetNodeCount(def); size_t i; @@ -985,17 +980,22 @@ virDomainNumaDefFormatXML(virBufferPtr buf, virBufferAddLit(buf, "<numa>\n"); virBufferAdjustIndent(buf, 2); for (i = 0; i < ncells; i++) { + virBitmapPtr cpumask = virDomainNumaGetNodeCpumask(def, i); int ndistances; memAccess = virDomainNumaGetNodeMemoryAccessMode(def, i); discard = virDomainNumaGetNodeDiscard(def, i); - if (!(cpustr = virBitmapFormat(virDomainNumaGetNodeCpumask(def, i)))) - return -1; - virBufferAddLit(buf, "<cell"); virBufferAsprintf(buf, " id='%zu'", i); - virBufferAsprintf(buf, " cpus='%s'", cpustr); + + if (cpumask) { + g_autofree char *cpustr = virBitmapFormat(cpumask); + + if (!cpustr) + return -1; + virBufferAsprintf(buf, " cpus='%s'", cpustr); + } virBufferAsprintf(buf, " memory='%llu'", virDomainNumaGetNodeMemorySize(def, i)); virBufferAddLit(buf, " unit='KiB'"); @@ -1031,8 +1031,6 @@ virDomainNumaDefFormatXML(virBufferPtr buf, virBufferAdjustIndent(buf, -2); virBufferAddLit(buf, "</cell>\n"); } - - VIR_FREE(cpustr); } virBufferAdjustIndent(buf, -2); virBufferAddLit(buf, "</numa>\n"); @@ -1047,8 +1045,12 @@ virDomainNumaGetCPUCountTotal(virDomainNumaPtr numa) size_t i; unsigned int ret = 0; - for (i = 0; i < numa->nmem_nodes; i++) - ret += virBitmapCountBits(virDomainNumaGetNodeCpumask(numa, i)); + for (i = 0; i < numa->nmem_nodes; i++) { + virBitmapPtr cpumask = virDomainNumaGetNodeCpumask(numa, i); + + if (cpumask) + ret += virBitmapCountBits(cpumask); + } return ret; } @@ -1060,11 +1062,14 @@ virDomainNumaGetMaxCPUID(virDomainNumaPtr numa) unsigned int ret = 0; for (i = 0; i < numa->nmem_nodes; i++) { + virBitmapPtr cpumask = virDomainNumaGetNodeCpumask(numa, i); int bit; - bit = virBitmapLastSetBit(virDomainNumaGetNodeCpumask(numa, i)); - if (bit > ret) - ret = bit; + if (cpumask) { + bit = virBitmapLastSetBit(cpumask); + if (bit > ret) + ret = bit; + } } return ret; diff --git a/src/libxl/xen_xl.c b/src/libxl/xen_xl.c index 
1c1279e1cb..b99ddd150b 100644 --- a/src/libxl/xen_xl.c +++ b/src/libxl/xen_xl.c @@ -1444,19 +1444,21 @@ xenFormatXLVnuma(virConfValuePtr list, { int ret = -1; size_t i; - virBuffer buf = VIR_BUFFER_INITIALIZER; virConfValuePtr numaVnode, tmp; - + virBitmapPtr cpumask = virDomainNumaGetNodeCpumask(numa, node); size_t nodeSize = virDomainNumaGetNodeMemorySize(numa, node) / 1024; - char *nodeVcpus = virBitmapFormat(virDomainNumaGetNodeCpumask(numa, node)); + g_autofree char *nodeVcpus = NULL; - if (VIR_ALLOC(numaVnode) < 0) + if (!cpumask || + VIR_ALLOC(numaVnode) < 0) goto cleanup; numaVnode->type = VIR_CONF_LIST; numaVnode->list = NULL; + nodeVcpus = virBitmapFormat(cpumask); + /* pnode */ virBufferAsprintf(&buf, "pnode=%zu", node); xenFormatXLVnode(numaVnode, &buf); diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index c5b0ee231e..628ff970bd 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -7123,8 +7123,6 @@ qemuBuildNumaCommandLine(virQEMUDriverConfigPtr cfg, size_t i, j; virQEMUCapsPtr qemuCaps = priv->qemuCaps; g_auto(virBuffer) buf = VIR_BUFFER_INITIALIZER; - char *cpumask = NULL; - char *tmpmask = NULL; char *next = NULL; virBufferPtr nodeBackends = NULL; bool needBackend = false; @@ -7169,9 +7167,7 @@ qemuBuildNumaCommandLine(virQEMUDriverConfigPtr cfg, goto cleanup; for (i = 0; i < ncells; i++) { - VIR_FREE(cpumask); - if (!(cpumask = virBitmapFormat(virDomainNumaGetNodeCpumask(def->numa, i)))) - goto cleanup; + virBitmapPtr cpumask = virDomainNumaGetNodeCpumask(def->numa, i); if (needBackend) { virCommandAddArg(cmd, "-object"); @@ -7181,11 +7177,19 @@ qemuBuildNumaCommandLine(virQEMUDriverConfigPtr cfg, virCommandAddArg(cmd, "-numa"); virBufferAsprintf(&buf, "node,nodeid=%zu", i); - for (tmpmask = cpumask; tmpmask; tmpmask = next) { - if ((next = strchr(tmpmask, ','))) - *(next++) = '\0'; - virBufferAddLit(&buf, ",cpus="); - virBufferAdd(&buf, tmpmask, -1); + if (cpumask) { + g_autofree char *cpumaskStr = NULL; + char 
*tmpmask; + + if (!(cpumaskStr = virBitmapFormat(cpumask))) + goto cleanup; + + for (tmpmask = cpumaskStr; tmpmask; tmpmask = next) { + if ((next = strchr(tmpmask, ','))) + *(next++) = '\0'; + virBufferAddLit(&buf, ",cpus="); + virBufferAdd(&buf, tmpmask, -1); + } } if (needBackend) @@ -7216,8 +7220,6 @@ qemuBuildNumaCommandLine(virQEMUDriverConfigPtr cfg, ret = 0; cleanup: - VIR_FREE(cpumask); - if (nodeBackends) { for (i = 0; i < ncells; i++) virBufferFreeAndReset(&nodeBackends[i]); diff --git a/src/qemu/qemu_validate.c b/src/qemu/qemu_validate.c index 5082a29dc7..53a03c1e0b 100644 --- a/src/qemu/qemu_validate.c +++ b/src/qemu/qemu_validate.c @@ -654,7 +654,7 @@ qemuValidateDomainDefNuma(const virDomainDef *def, } for (i = 0; i < ncells; i++) { - g_autofree char * cpumask = NULL; + virBitmapPtr cpumask = virDomainNumaGetNodeCpumask(def->numa, i); if (!hasMemoryCap && virDomainNumaGetNodeMemoryAccessMode(def->numa, i)) { @@ -664,17 +664,19 @@ qemuValidateDomainDefNuma(const virDomainDef *def, return -1; } - if (!(cpumask = virBitmapFormat(virDomainNumaGetNodeCpumask(def->numa, i)))) - return -1; + if (cpumask) { + g_autofree char * cpumaskStr = NULL; + if (!(cpumaskStr = virBitmapFormat(cpumask))) + return -1; - if (strchr(cpumask, ',') && - !virQEMUCapsGet(qemuCaps, QEMU_CAPS_NUMA)) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", - _("disjoint NUMA cpu ranges are not supported " - "with this QEMU")); - return -1; + if (strchr(cpumaskStr, ',') && + !virQEMUCapsGet(qemuCaps, QEMU_CAPS_NUMA)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("disjoint NUMA cpu ranges are not supported " + "with this QEMU")); + return -1; + } } - } if (virDomainNumaNodesDistancesAreBeingSet(def->numa) && diff --git a/tests/qemuxml2argvdata/numatune-no-vcpu.args b/tests/qemuxml2argvdata/numatune-no-vcpu.args new file mode 100644 index 0000000000..a1f1ee044e --- /dev/null +++ b/tests/qemuxml2argvdata/numatune-no-vcpu.args @@ -0,0 +1,33 @@ +LC_ALL=C \ +PATH=/bin \ 
+HOME=/tmp/lib/domain--1-QEMUGuest \ +USER=test \ +LOGNAME=test \ +XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest/.local/share \ +XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest/.cache \ +XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest/.config \ +QEMU_AUDIO_DRV=none \ +/usr/bin/qemu-system-x86_64 \ +-name QEMUGuest \ +-S \ +-machine pc,accel=tcg,usb=off,dump-guest-core=off \ +-m 12288 \ +-realtime mlock=off \ +-smp 12,sockets=12,cores=1,threads=1 \ +-numa node,nodeid=0,cpus=0-3,mem=2048 \ +-numa node,nodeid=1,cpus=4-7,mem=2048 \ +-numa node,nodeid=2,cpus=8-11,mem=2048 \ +-numa node,nodeid=3,mem=2048 \ +-numa node,nodeid=4,mem=2048 \ +-numa node,nodeid=5,mem=2048 \ +-uuid c7a5fdb2-cdaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,path=/tmp/lib/domain--1-QEMUGuest/monitor.sock,\ +server,nowait \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-usb \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 diff --git a/tests/qemuxml2argvdata/numatune-no-vcpu.xml b/tests/qemuxml2argvdata/numatune-no-vcpu.xml new file mode 100644 index 0000000000..f25a07d7ed --- /dev/null +++ b/tests/qemuxml2argvdata/numatune-no-vcpu.xml @@ -0,0 +1,42 @@ +<domain type='qemu'> + <name>QEMUGuest</name> + <uuid>c7a5fdb2-cdaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>12582912</memory> + <currentMemory unit='KiB'>12582912</currentMemory> + <vcpu placement='static'>12</vcpu> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <features> + <acpi/> + <apic/> + <pae/> + </features> + <cpu> + <numa> + <cell id='0' cpus='0-3' memory='2097152' unit='KiB'/> + <cell id='1' cpus='4-7' memory='2097152' unit='KiB'/> + <cell id='2' cpus='8-11' memory='2097152' unit='KiB'/> + <cell id='3' memory='2097152' unit='KiB'/> + <cell id='4' memory='2097152' unit='KiB'/> + <cell id='5' memory='2097152' unit='KiB'/> + </numa> + </cpu> + <clock offset='utc'/> + 
<on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>restart</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-x86_64</emulator> + <controller type='usb' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> + </controller> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </memballoon> + </devices> +</domain> diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index 1195f9c982..98d9a7b5b2 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -1912,6 +1912,7 @@ mymain(void) DO_TEST_PARSE_ERROR("numatune-memnode-no-memory", NONE); DO_TEST("numatune-distances", QEMU_CAPS_NUMA, QEMU_CAPS_NUMA_DIST); + DO_TEST("numatune-no-vcpu", NONE); DO_TEST("numatune-auto-nodeset-invalid", NONE); DO_TEST("numatune-auto-prefer", QEMU_CAPS_OBJECT_MEMORY_RAM, diff --git a/tests/qemuxml2xmloutdata/numatune-no-vcpu.xml b/tests/qemuxml2xmloutdata/numatune-no-vcpu.xml new file mode 120000 index 0000000000..f213032685 --- /dev/null +++ b/tests/qemuxml2xmloutdata/numatune-no-vcpu.xml @@ -0,0 +1 @@ +../qemuxml2argvdata/numatune-no-vcpu.xml \ No newline at end of file diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index d203c97e36..376c427415 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -1105,6 +1105,7 @@ mymain(void) DO_TEST("numatune-memnode", QEMU_CAPS_NUMA, QEMU_CAPS_OBJECT_MEMORY_FILE); DO_TEST("numatune-memnode-no-memory", QEMU_CAPS_OBJECT_MEMORY_FILE); DO_TEST("numatune-distances", QEMU_CAPS_NUMA, QEMU_CAPS_NUMA_DIST); + DO_TEST("numatune-no-vcpu", QEMU_CAPS_NUMA); DO_TEST("bios-nvram", NONE); DO_TEST("bios-nvram-os-interleave", NONE); -- 2.26.2

On 6/24/20 10:49 AM, Michal Privoznik wrote:
QEMU allows creating NUMA nodes that have memory only. These are somewhat important for HMAT.
With check done in qemuValidateDomainDef() for QEMU 2.7 or newer,
You're mentioning "QEMU 2.7 or newer" but the code in qemu_validate is checking for QEMU_CAPS_NUMA. I'm assuming that QEMU 2.7 is where QEMU_CAPS_NUMA first appeared. If that's the case, I think an addendum like "QEMU 2.7 or newer (checked via QEMU_CAPS_NUMA)" would be nice to clarify.
we can be sure that the vCPUs are fully assigned to NUMA nodes in domain XML.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/formatdomain.html.in | 2 + docs/schemas/cputypes.rng | 8 ++- src/conf/numa_conf.c | 59 ++++++++++--------- src/libxl/xen_xl.c | 10 ++-- src/qemu/qemu_command.c | 26 ++++---- src/qemu/qemu_validate.c | 22 +++---- tests/qemuxml2argvdata/numatune-no-vcpu.args | 33 +++++++++++ tests/qemuxml2argvdata/numatune-no-vcpu.xml | 42 +++++++++++++ tests/qemuxml2argvtest.c | 1 + tests/qemuxml2xmloutdata/numatune-no-vcpu.xml | 1 + tests/qemuxml2xmltest.c | 1 + 11 files changed, 149 insertions(+), 56 deletions(-) create mode 100644 tests/qemuxml2argvdata/numatune-no-vcpu.args create mode 100644 tests/qemuxml2argvdata/numatune-no-vcpu.xml create mode 120000 tests/qemuxml2xmloutdata/numatune-no-vcpu.xml
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index bd662727d3..07dcca57f5 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -1840,6 +1840,8 @@ consistent across qemu and libvirt versions. <code>memory</code> specifies the node memory in kibibytes (i.e. blocks of 1024 bytes). + <span class="since">Since 6.6.0</span> the <code>cpus</code> attribute
s/Since 6.6.0/Since 6.5.0/ ?

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>

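For quick reference, the behaviour this patch enables can be read off its own test data (the numatune-no-vcpu files): a NUMA cell may now omit the cpus attribute entirely, and libvirt emits a CPU-less node on the QEMU command line. A minimal sketch, condensed from those test files:

```xml
<!-- Guest NUMA topology mixing a regular cell and a memory-only cell -->
<numa>
  <cell id='0' cpus='0-3' memory='2097152' unit='KiB'/>
  <cell id='3' memory='2097152' unit='KiB'/>  <!-- no 'cpus': CPU-less node -->
</numa>
<!-- which libvirt translates to (note the missing cpus= for node 3):
       -numa node,nodeid=0,cpus=0-3,mem=2048
       -numa node,nodeid=3,mem=2048 -->
```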
To cite ACPI specification: Heterogeneous Memory Attribute Table describes the memory attributes, such as memory side cache attributes and bandwidth and latency details, related to the System Physical Address (SPA) Memory Ranges. The software is expected to use this information as hint for optimization. According to our upstream discussion [1] this is exposed under <numa/> as <cache/> under NUMA <cell/> and <latency> or <bandwidth/> under numa/latencies. 1: https://www.redhat.com/archives/libvir-list/2020-January/msg00422.html Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/formatdomain.html.in | 88 ++++++ docs/schemas/cputypes.rng | 110 ++++++- src/conf/numa_conf.c | 350 ++++++++++++++++++++- src/conf/numa_conf.h | 33 ++ src/libvirt_private.syms | 6 + tests/qemuxml2argvdata/numatune-hmat.xml | 52 +++ tests/qemuxml2xmloutdata/numatune-hmat.xml | 1 + tests/qemuxml2xmltest.c | 1 + 8 files changed, 625 insertions(+), 16 deletions(-) create mode 100644 tests/qemuxml2argvdata/numatune-hmat.xml create mode 120000 tests/qemuxml2xmloutdata/numatune-hmat.xml diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 07dcca57f5..78b2d2828d 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -1931,6 +1931,94 @@ using 10 for local and 20 for remote distances. </p> + <h4><a id="hmat">Heterogeneous Memory Attribute Table</a></h4> + +<pre> +... +<cpu> + ... 
+ <numa> + <cell id='0' cpus='0-3' memory='512000' unit='KiB' discard='yes'/> + <cell id='1' cpus='4-7' memory='512000' unit='KiB' memAccess='shared'/> + <cell id='3' cpus='0-3' memory='2097152' unit='KiB'> + <cache level='1' associativity='direct' policy='writeback'> + <size value='10' unit='KiB'/> + <line value='8' unit='B'/> + </cache> + </cell> + <latencies> + <latency initiator='0' target='0' type='access' value='5'/> + <latency initiator='0' target='0' cache='1' type='access' value='10'/> + <bandwidth initiator='0' target='0' type='access' value='204800' unit='KiB'/> + </latencies> + </numa> + ... +</cpu> +...</pre> + + <p> + <span class='since'>Since 6.5.0</span> the <code>cell</code> element can + have <code>cache</code> child element which describes memory side cache + for memory proximity domains. The <code>cache</code> element has + <code>level</code> attribute describing the cache level and thus the + element can be repeated multiple times to describe different levels of + the cache. + </p> + + <p> + The <code>cache</code> element then has <code>associativity</code> + attribute (accepted values are <code>none</code>, <code>direct</code> and + <code>full</code>) describing the cache associativity, and + <code>policy</code> attribute (accepted values are <code>none</code>, + <code>writeback</code> and <code>writethrough</code>) describing cache + write associativity. The element has two mandatory child elements then: + <code>size</code> and <code>line</code> which describe cache size and + cache line size. Both elements accept two attributes: <code>value</code> + and <code>unit</code> which set the value of corresponding cache + attribute. + </p> + + <p> + The NUMA description has optional <code>latencies</code> element that + describes the normalized memory read/write latency, read/write bandwidth + between Initiator Proximity Domains (Processor or I/O) and Target + Proximity Domains (Memory). 
+ </p> + + <p> + The <code>latencies</code> element can have zero or more + <code>latency</code> child elements to describe latency between two + memory nodes and zero or more <code>bandwidth</code> child elements to + describe bandwidth between two memory nodes. Both these have the + following mandatory attributes: + </p> + + <dl> + <dt><code>initiator</code></dt> + <dd>Refers to the source NUMA node</dd> + + <dt><code>target</code></dt> + <dd>Refers to the target NUMA node</dd> + + <dt><code>type</code></dt> + <dd>The type of the access. Accepted values: <code>access</code>, + <code>read</code>, <code>write</code></dd> + + <dt><code>value</code></dt> + <dd>The actual value. For latency this is delay in nanoseconds, for + bandwidth this value is in kibibytes per second. Use additional + <code>unit</code> attribute to change the units.</dd> + </dl> + + <p> + To describe latency from one NUMA node to a cache of another NUMA node + the <code>latency</code> element has optional <code>cache</code> + attribute which in combination with <code>target</code> attribute creates + full reference to distant NUMA node's cache level. For instance, + <code>target='0' cache='1'</code> refers to the first level cache of NUMA + node 0. 
+ </p> + <h3><a id="elementsEvents">Events configuration</a></h3> <p> diff --git a/docs/schemas/cputypes.rng b/docs/schemas/cputypes.rng index a1682a1003..70c455cfa9 100644 --- a/docs/schemas/cputypes.rng +++ b/docs/schemas/cputypes.rng @@ -102,9 +102,14 @@ <define name="cpuNuma"> <element name="numa"> - <oneOrMore> - <ref name="numaCell"/> - </oneOrMore> + <interleave> + <oneOrMore> + <ref name="numaCell"/> + </oneOrMore> + <optional> + <ref name="numaLatencies"/> + </optional> + </interleave> </element> </define> @@ -148,6 +153,9 @@ </oneOrMore> </element> </optional> + <zeroOrMore> + <ref name="numaCache"/> + </zeroOrMore> </element> </define> @@ -162,6 +170,102 @@ </element> </define> + <define name="numaCache"> + <element name="cache"> + <attribute name="level"> + <ref name="unsignedInt"/> + </attribute> + <attribute name="associativity"> + <choice> + <value>none</value> + <value>direct</value> + <value>full</value> + </choice> + </attribute> + <attribute name="policy"> + <choice> + <value>none</value> + <value>writeback</value> + <value>writethrough</value> + </choice> + </attribute> + <interleave> + <element name="size"> + <attribute name="value"> + <ref name="unsignedInt"/> + </attribute> + <attribute name="unit"> + <ref name="unit"/> + </attribute> + </element> + <element name="line"> + <attribute name="value"> + <ref name="unsignedInt"/> + </attribute> + <attribute name="unit"> + <ref name="unit"/> + </attribute> + </element> + </interleave> + </element> + </define> + + <define name="numaLatencies"> + <element name="latencies"> + <interleave> + <zeroOrMore> + <element name="latency"> + <attribute name="initiator"> + <ref name="unsignedInt"/> + </attribute> + <attribute name="target"> + <ref name="unsignedInt"/> + </attribute> + <optional> + <attribute name="cache"> + <ref name="unsignedInt"/> + </attribute> + </optional> + <attribute name="type"> + <choice> + <value>access</value> + <value>read</value> + <value>write</value> + </choice> + </attribute> + 
<attribute name="value"> + <ref name="unsignedInt"/> + </attribute> + <empty/> + </element> + </zeroOrMore> + <zeroOrMore> + <element name="bandwidth"> + <attribute name="initiator"> + <ref name="unsignedInt"/> + </attribute> + <attribute name="target"> + <ref name="unsignedInt"/> + </attribute> + <attribute name="type"> + <choice> + <value>access</value> + <value>read</value> + <value>write</value> + </choice> + </attribute> + <attribute name="value"> + <ref name="unsignedInt"/> + </attribute> + <attribute name="unit"> + <ref name="unit"/> + </attribute> + </element> + </zeroOrMore> + </interleave> + </element> + </define> + <!-- Memory as an attribute is in KiB, no way to express a unit --> <define name="memoryKB"> <data type="unsignedLong"/> diff --git a/src/conf/numa_conf.c b/src/conf/numa_conf.c index 7cf62ce7da..f510dfebce 100644 --- a/src/conf/numa_conf.c +++ b/src/conf/numa_conf.c @@ -59,9 +59,37 @@ VIR_ENUM_IMPL(virDomainMemoryAccess, "private", ); +VIR_ENUM_IMPL(virDomainCacheAssociativity, + VIR_DOMAIN_CACHE_ASSOCIATIVITY_LAST, + "none", + "direct", + "full", +); + +VIR_ENUM_IMPL(virDomainCachePolicy, + VIR_DOMAIN_CACHE_POLICY_LAST, + "none", + "writeback", + "writethrough", +); + +VIR_ENUM_IMPL(virDomainMemoryLatency, + VIR_DOMAIN_MEMORY_LATENCY_LAST, + "none", + "access", + "read", + "write" +); + typedef struct _virDomainNumaDistance virDomainNumaDistance; typedef virDomainNumaDistance *virDomainNumaDistancePtr; +typedef struct _virDomainNumaCache virDomainNumaCache; +typedef virDomainNumaCache *virDomainNumaCachePtr; + +typedef struct _virDomainNumaLatency virDomainNumaLatency; +typedef virDomainNumaLatency *virDomainNumaLatencyPtr; + typedef struct _virDomainNumaNode virDomainNumaNode; typedef virDomainNumaNode *virDomainNumaNodePtr; @@ -86,9 +114,30 @@ struct _virDomainNuma { unsigned int cellid; } *distances; /* remote node distances */ size_t ndistances; + + struct _virDomainNumaCache { + unsigned int level; /* cache level */ + unsigned int size; 
/* cache size */ + unsigned int line; /* line size, !!! in bytes !!! */ + virDomainCacheAssociativity associativity; /* cache associativity */ + virDomainCachePolicy policy; /* cache policy */ + } *caches; + size_t ncaches; } *mem_nodes; /* guest node configuration */ size_t nmem_nodes; + struct _virDomainNumaLatency { + virDomainNumaLatencyType type; /* whether structure describes latency + or bandwidth */ + unsigned int initiator; /* the initiator NUMA node */ + unsigned int target; /* the target NUMA node */ + unsigned int cache; /* the target cache on @target; if 0 then the + memory on @target */ + virDomainMemoryLatency accessType; /* what type of access is defined */ + unsigned long value; /* value itself */ + } *latencies; + size_t nlatencies; + /* Future NUMA tuning related stuff should go here. */ }; @@ -368,9 +417,13 @@ virDomainNumaFree(virDomainNumaPtr numa) if (numa->mem_nodes[i].ndistances > 0) VIR_FREE(numa->mem_nodes[i].distances); + + VIR_FREE(numa->mem_nodes[i].caches); } VIR_FREE(numa->mem_nodes); + VIR_FREE(numa->latencies); + VIR_FREE(numa); } @@ -841,12 +894,102 @@ virDomainNumaDefNodeDistanceParseXML(virDomainNumaPtr def, return ret; } + +static int +virDomainNumaDefNodeCacheParseXML(virDomainNumaPtr def, + xmlXPathContextPtr ctxt, + unsigned int cur_cell) +{ + g_autofree xmlNodePtr *nodes = NULL; + int n; + size_t i; + + if ((n = virXPathNodeSet("./cache", ctxt, &nodes)) < 0) + return -1; + + def->mem_nodes[cur_cell].caches = g_new0(virDomainNumaCache, n); + + for (i = 0; i < n; i++) { + VIR_XPATH_NODE_AUTORESTORE(ctxt); + virDomainNumaCachePtr cache = &def->mem_nodes[cur_cell].caches[i]; + g_autofree char *tmp = NULL; + unsigned int level; + int associativity; + int policy; + unsigned long long size; + unsigned long long line; + + if (!(tmp = virXMLPropString(nodes[i], "level"))) { + virReportError(VIR_ERR_XML_ERROR, + _("Missing 'level' attribute in cache " + "element for NUMA node %d"), + cur_cell); + return -1; + } + + if 
(virStrToLong_uip(tmp, NULL, 10, &level) < 0 || + level == 0) { + virReportError(VIR_ERR_XML_ERROR, + _("Invalid 'level' attribute in cache " + "element for NUMA node %d"), + cur_cell); + return -1; + } + VIR_FREE(tmp); + + if (!(tmp = virXMLPropString(nodes[i], "associativity"))) { + virReportError(VIR_ERR_XML_ERROR, + _("Missing 'associativity' attribute in cache " + "element for NUMA node %d"), + cur_cell); + return -1; + } + + if ((associativity = virDomainCacheAssociativityTypeFromString(tmp)) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("Invalid cache associativity '%s'"), + tmp); + return -1; + } + VIR_FREE(tmp); + + if (!(tmp = virXMLPropString(nodes[i], "policy"))) { + virReportError(VIR_ERR_XML_ERROR, + _("Missing 'policy' attribute in cache " + "element for NUMA node %d"), + cur_cell); + } + + if ((policy = virDomainCachePolicyTypeFromString(tmp)) < 0) { + virReportError(VIR_ERR_XML_ERROR, + _("Invalid cache policy '%s'"), + tmp); + return -1; + } + VIR_FREE(tmp); + + ctxt->node = nodes[i]; + if (virDomainParseMemory("./size/@value", "./size/unit", + ctxt, &size, true, false) < 0) + return -1; + + if (virParseScaledValue("./line/@value", "./line/unit", + ctxt, &line, 1, ULLONG_MAX, true) < 0) + return -1; + + *cache = (virDomainNumaCache){level, size, line, associativity, policy}; + def->mem_nodes[cur_cell].ncaches++; + } + + return 0; +} + + int virDomainNumaDefParseXML(virDomainNumaPtr def, xmlXPathContextPtr ctxt) { xmlNodePtr *nodes = NULL; - VIR_XPATH_NODE_AUTORESTORE(ctxt); char *tmp = NULL; int n; size_t i, j; @@ -867,6 +1010,7 @@ virDomainNumaDefParseXML(virDomainNumaPtr def, def->nmem_nodes = n; for (i = 0; i < n; i++) { + VIR_XPATH_NODE_AUTORESTORE(ctxt); int rc; unsigned int cur_cell = i; @@ -953,7 +1097,109 @@ virDomainNumaDefParseXML(virDomainNumaPtr def, /* Parse NUMA distances info */ if (virDomainNumaDefNodeDistanceParseXML(def, ctxt, cur_cell) < 0) + goto cleanup; + + /* Parse cache info */ + if (virDomainNumaDefNodeCacheParseXML(def, 
ctxt, cur_cell) < 0) + goto cleanup; + } + + VIR_FREE(nodes); + if ((n = virXPathNodeSet("./cpu/numa[1]/latencies[1]/latency|" + "./cpu/numa[1]/latencies[1]/bandwidth", ctxt, &nodes)) < 0) + goto cleanup; + + def->latencies = g_new0(virDomainNumaLatency, n); + for (i = 0; i < n; i++) { + virDomainNumaLatencyType type; + unsigned int initiator; + unsigned int target; + unsigned int cache = 0; + int accessType; + unsigned long long value; + + if (virXMLNodeNameEqual(nodes[i], "latency")) { + type = VIR_DOMAIN_NUMA_LATENCY_TYPE_LATENCY; + + if (!(tmp = virXMLPropString(nodes[i], "value"))) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Missing 'value' attribute in NUMA latencies")); goto cleanup; + } + + if (virStrToLong_ullp(tmp, NULL, 10, &value) < 0) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Invalid 'value' attribute in NUMA latencies")); + goto cleanup; + } + VIR_FREE(tmp); + } else if (virXMLNodeNameEqual(nodes[i], "bandwidth")) { + VIR_XPATH_NODE_AUTORESTORE(ctxt); + type = VIR_DOMAIN_NUMA_LATENCY_TYPE_BANDWIDTH; + + ctxt->node = nodes[i]; + + if (virDomainParseMemory("./@value", "./@unit", ctxt, &value, true, false) < 0) + goto cleanup; + } else { + /* Ignore yet unknown child elements. 
*/ + continue; + } + + if (!(tmp = virXMLPropString(nodes[i], "initiator"))) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Missing 'initiator' attribute in NUMA latencies")); + goto cleanup; + } + + if (virStrToLong_uip(tmp, NULL, 10, &initiator) < 0) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Invalid 'initiator' attribute in NUMA latencies")); + goto cleanup; + } + VIR_FREE(tmp); + + if (!(tmp = virXMLPropString(nodes[i], "target"))) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Missing 'target' attribute in NUMA latencies")); + goto cleanup; + } + + if (virStrToLong_uip(tmp, NULL, 10, &target) < 0) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Invalid 'target' attribute in NUMA latencies")); + goto cleanup; + } + VIR_FREE(tmp); + + + /* cache attribute is optional */ + if ((tmp = virXMLPropString(nodes[i], "cache"))) { + if (virStrToLong_uip(tmp, NULL, 10, &cache) < 0 || + cache == 0) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Invalid 'cache' attribute in NUMA latencies")); + goto cleanup; + } + } + VIR_FREE(tmp); + + if (!(tmp = virXMLPropString(nodes[i], "type"))) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Missing 'type' attribute in NUMA latencies")); + goto cleanup; + } + + if ((accessType = virDomainMemoryLatencyTypeFromString(tmp)) <= 0) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Invalid 'type' attribute in NUMA latencies")); + goto cleanup; + } + VIR_FREE(tmp); + + def->latencies[i] = (virDomainNumaLatency) {type, initiator, target, + cache, accessType, value}; + def->nlatencies++; } ret = 0; @@ -982,6 +1228,7 @@ virDomainNumaDefFormatXML(virBufferPtr buf, for (i = 0; i < ncells; i++) { virBitmapPtr cpumask = virDomainNumaGetNodeCpumask(def, i); int ndistances; + size_t ncaches; memAccess = virDomainNumaGetNodeMemoryAccessMode(def, i); discard = virDomainNumaGetNodeDiscard(def, i); @@ -1008,30 +1255,107 @@ virDomainNumaDefFormatXML(virBufferPtr buf, virTristateBoolTypeToString(discard)); ndistances = 
def->mem_nodes[i].ndistances; - if (ndistances == 0) { + ncaches = def->mem_nodes[i].ncaches; + if (ndistances == 0 && ncaches == 0) { virBufferAddLit(buf, "/>\n"); } else { size_t j; - virDomainNumaDistancePtr distances = def->mem_nodes[i].distances; virBufferAddLit(buf, ">\n"); virBufferAdjustIndent(buf, 2); - virBufferAddLit(buf, "<distances>\n"); - virBufferAdjustIndent(buf, 2); - for (j = 0; j < ndistances; j++) { - if (distances[j].value) { - virBufferAddLit(buf, "<sibling"); - virBufferAsprintf(buf, " id='%d'", distances[j].cellid); - virBufferAsprintf(buf, " value='%d'", distances[j].value); - virBufferAddLit(buf, "/>\n"); + + if (ndistances) { + virDomainNumaDistancePtr distances = def->mem_nodes[i].distances; + + virBufferAddLit(buf, "<distances>\n"); + virBufferAdjustIndent(buf, 2); + for (j = 0; j < ndistances; j++) { + if (distances[j].value) { + virBufferAddLit(buf, "<sibling"); + virBufferAsprintf(buf, " id='%d'", distances[j].cellid); + virBufferAsprintf(buf, " value='%d'", distances[j].value); + virBufferAddLit(buf, "/>\n"); + } } + virBufferAdjustIndent(buf, -2); + virBufferAddLit(buf, "</distances>\n"); + } + + for (j = 0; j < ncaches; j++) { + virDomainNumaCachePtr cache = &def->mem_nodes[i].caches[j]; + + virBufferAsprintf(buf, "<cache level='%u'", cache->level); + if (cache->associativity) { + virBufferAsprintf(buf, " associativity='%s'", + virDomainCacheAssociativityTypeToString(cache->associativity)); + } + + if (cache->policy) { + virBufferAsprintf(buf, " policy='%s'", + virDomainCachePolicyTypeToString(cache->policy)); + } + virBufferAddLit(buf, ">\n"); + + virBufferAdjustIndent(buf, 2); + virBufferAsprintf(buf, + "<size value='%u' unit='KiB'/>\n", + cache->size); + + if (cache->line) { + virBufferAsprintf(buf, + "<line value='%u' unit='B'/>\n", + cache->line); + } + + virBufferAdjustIndent(buf, -2); + virBufferAddLit(buf, "</cache>\n"); } - virBufferAdjustIndent(buf, -2); - virBufferAddLit(buf, "</distances>\n"); 
virBufferAdjustIndent(buf, -2); virBufferAddLit(buf, "</cell>\n"); } } + + if (def->nlatencies) { + virBufferAddLit(buf, "<latencies>\n"); + virBufferAdjustIndent(buf, 2); + } + + for (i = 0; i < def->nlatencies; i++) { + virDomainNumaLatencyPtr l = &def->latencies[i]; + + switch (l->type) { + case VIR_DOMAIN_NUMA_LATENCY_TYPE_LATENCY: + virBufferAddLit(buf, "<latency"); + break; + case VIR_DOMAIN_NUMA_LATENCY_TYPE_BANDWIDTH: + virBufferAddLit(buf, "<bandwidth"); + } + + virBufferAsprintf(buf, + " initiator='%u' target='%u'", + l->initiator, l->target); + + if (l->cache > 0) { + virBufferAsprintf(buf, + " cache='%u'", + l->cache); + } + + virBufferAsprintf(buf, + " type='%s' value='%lu'", + virDomainMemoryLatencyTypeToString(l->accessType), + l->value); + + if (l->type == VIR_DOMAIN_NUMA_LATENCY_TYPE_BANDWIDTH) + virBufferAddLit(buf, " unit='KiB'"); + virBufferAddLit(buf, "/>\n"); + } + + if (def->nlatencies) { + virBufferAdjustIndent(buf, -2); + virBufferAddLit(buf, "</latencies>\n"); + } + virBufferAdjustIndent(buf, -2); virBufferAddLit(buf, "</numa>\n"); diff --git a/src/conf/numa_conf.h b/src/conf/numa_conf.h index 25e896826b..a7dc3190d3 100644 --- a/src/conf/numa_conf.h +++ b/src/conf/numa_conf.h @@ -51,6 +51,39 @@ typedef enum { } virDomainMemoryAccess; VIR_ENUM_DECL(virDomainMemoryAccess); +typedef enum { + VIR_DOMAIN_CACHE_ASSOCIATIVITY_NONE, /* No associativity */ + VIR_DOMAIN_CACHE_ASSOCIATIVITY_DIRECT, /* Direct mapped cache */ + VIR_DOMAIN_CACHE_ASSOCIATIVITY_FULL, /* Fully associative cache */ + + VIR_DOMAIN_CACHE_ASSOCIATIVITY_LAST +} virDomainCacheAssociativity; +VIR_ENUM_DECL(virDomainCacheAssociativity); + +typedef enum { + VIR_DOMAIN_CACHE_POLICY_NONE, /* No policy */ + VIR_DOMAIN_CACHE_POLICY_WRITEBACK, /* Write-back policy */ + VIR_DOMAIN_CACHE_POLICY_WRITETHROUGH, /* Write-through policy */ + + VIR_DOMAIN_CACHE_POLICY_LAST +} virDomainCachePolicy; +VIR_ENUM_DECL(virDomainCachePolicy); + +typedef enum { + VIR_DOMAIN_NUMA_LATENCY_TYPE_LATENCY, + 
VIR_DOMAIN_NUMA_LATENCY_TYPE_BANDWIDTH, +} virDomainNumaLatencyType; + +typedef enum { + VIR_DOMAIN_MEMORY_LATENCY_NONE = 0, /* No memory latency defined */ + VIR_DOMAIN_MEMORY_LATENCY_ACCESS, /* Access latency */ + VIR_DOMAIN_MEMORY_LATENCY_READ, /* Read latency */ + VIR_DOMAIN_MEMORY_LATENCY_WRITE, /* Write latency */ + + VIR_DOMAIN_MEMORY_LATENCY_LAST +} virDomainMemoryLatency; +VIR_ENUM_DECL(virDomainMemoryLatency); + virDomainNumaPtr virDomainNumaNew(void); void virDomainNumaFree(virDomainNumaPtr numa); diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 9f2abc1d8d..65fe1c7f47 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -830,8 +830,14 @@ virNodeDeviceDeleteVport; virNodeDeviceGetParentName; # conf/numa_conf.h +virDomainCacheAssociativityTypeFromString; +virDomainCacheAssociativityTypeToString; +virDomainCachePolicyTypeFromString; +virDomainCachePolicyTypeToString; virDomainMemoryAccessTypeFromString; virDomainMemoryAccessTypeToString; +virDomainMemoryLatencyTypeFromString; +virDomainMemoryLatencyTypeToString; virDomainNumaCheckABIStability; virDomainNumaEquals; virDomainNumaFillCPUsInNode; diff --git a/tests/qemuxml2argvdata/numatune-hmat.xml b/tests/qemuxml2argvdata/numatune-hmat.xml new file mode 100644 index 0000000000..b20ca0e439 --- /dev/null +++ b/tests/qemuxml2argvdata/numatune-hmat.xml @@ -0,0 +1,52 @@ +<domain type='qemu'> + <name>QEMUGuest</name> + <uuid>c7a5fdb2-cdaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>8388608</memory> + <currentMemory unit='KiB'>8388608</currentMemory> + <vcpu placement='static'>12</vcpu> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <features> + <acpi/> + <apic/> + <pae/> + </features> + <cpu> + <numa> + <cell id='0' cpus='0-3' memory='2097152' unit='KiB'> + <cache level='1' associativity='direct' policy='writeback'> + <size value='10' unit='KiB'/> + <line value='8' unit='B'/> + </cache> + </cell> + <cell id='1' cpus='4-7' 
memory='2097152' unit='KiB'/> + <cell id='2' cpus='8-11' memory='2097152' unit='KiB'/> + <cell id='3' memory='2097152' unit='KiB'/> + <cell id='4' memory='2097152' unit='KiB'/> + <cell id='5' memory='2097152' unit='KiB'/> + <latencies> + <latency initiator='0' target='0' type='access' value='5'/> + <latency initiator='0' target='0' cache='1' type='access' value='10'/> + <bandwidth initiator='0' target='0' type='access' value='204800' unit='KiB'/> + </latencies> + </numa> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>restart</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-x86_64</emulator> + <controller type='usb' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> + </controller> + <controller type='pci' index='0' model='pci-root'/> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <memballoon model='virtio'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> + </memballoon> + </devices> +</domain> diff --git a/tests/qemuxml2xmloutdata/numatune-hmat.xml b/tests/qemuxml2xmloutdata/numatune-hmat.xml new file mode 120000 index 0000000000..6903a80ab1 --- /dev/null +++ b/tests/qemuxml2xmloutdata/numatune-hmat.xml @@ -0,0 +1 @@ +../qemuxml2argvdata/numatune-hmat.xml \ No newline at end of file diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index 376c427415..103ce155a4 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -1106,6 +1106,7 @@ mymain(void) DO_TEST("numatune-memnode-no-memory", QEMU_CAPS_OBJECT_MEMORY_FILE); DO_TEST("numatune-distances", QEMU_CAPS_NUMA, QEMU_CAPS_NUMA_DIST); DO_TEST("numatune-no-vcpu", QEMU_CAPS_NUMA); + DO_TEST("numatune-hmat", NONE); DO_TEST("bios-nvram", NONE); DO_TEST("bios-nvram-os-interleave", NONE); -- 2.26.2
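The `<latency/>`/`<bandwidth/>` formatting logic in the patch above can be exercised in isolation. Below is a hedged standalone sketch (plain `snprintf` and a hypothetical reduced `Entry` struct instead of libvirt's `virBuffer` and `virDomainNumaLatency`) that produces the same element shapes as the test XML:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical reduced entry; mirrors only the fields the formatter consumes. */
typedef struct {
    int bandwidth;          /* 0 = emit <latency/>, 1 = emit <bandwidth/> */
    unsigned initiator, target, cache;
    const char *accessType; /* "access", "read" or "write" */
    unsigned long value;
} Entry;

/* Append one element in the same shape virDomainNumaDefFormatXML emits:
 * the cache attribute only when set, and unit='KiB' only for bandwidth. */
static int
format_entry(char *out, size_t outlen, const Entry *e)
{
    char cacheAttr[32] = "";

    if (e->cache > 0)
        snprintf(cacheAttr, sizeof(cacheAttr), " cache='%u'", e->cache);

    return snprintf(out, outlen,
                    "<%s initiator='%u' target='%u'%s type='%s' value='%lu'%s/>",
                    e->bandwidth ? "bandwidth" : "latency",
                    e->initiator, e->target, cacheAttr,
                    e->accessType, e->value,
                    e->bandwidth ? " unit='KiB'" : "");
}
```

Feeding it the entries from `numatune-hmat.xml` reproduces the `<latencies>` children byte for byte, which makes it easy to eyeball the attribute ordering the formatter commits to.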

On 6/24/20 10:49 AM, Michal Privoznik wrote:
To cite ACPI specification:
Heterogeneous Memory Attribute Table describes the memory attributes, such as memory side cache attributes and bandwidth and latency details, related to the System Physical Address (SPA) Memory Ranges. The software is expected to use this information as a hint for optimization.
According to our upstream discussion [1], this is exposed under <numa/> as <cache/> under NUMA <cell/> and <latency/> or <bandwidth/> under numa/latencies.
1: https://www.redhat.com/archives/libvir-list/2020-January/msg00422.html
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/formatdomain.html.in | 88 ++++++ docs/schemas/cputypes.rng | 110 ++++++- src/conf/numa_conf.c | 350 ++++++++++++++++++++- src/conf/numa_conf.h | 33 ++ src/libvirt_private.syms | 6 + tests/qemuxml2argvdata/numatune-hmat.xml | 52 +++ tests/qemuxml2xmloutdata/numatune-hmat.xml | 1 + tests/qemuxml2xmltest.c | 1 + 8 files changed, 625 insertions(+), 16 deletions(-) create mode 100644 tests/qemuxml2argvdata/numatune-hmat.xml create mode 120000 tests/qemuxml2xmloutdata/numatune-hmat.xml
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 07dcca57f5..78b2d2828d 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -1931,6 +1931,94 @@ using 10 for local and 20 for remote distances. </p>
+ <h4><a id="hmat">Heterogeneous Memory Attribute Table</a></h4> + +<pre> +... +<cpu> + ... + <numa> + <cell id='0' cpus='0-3' memory='512000' unit='KiB' discard='yes'/> + <cell id='1' cpus='4-7' memory='512000' unit='KiB' memAccess='shared'/> + <cell id='3' cpus='0-3' memory='2097152' unit='KiB'> + <cache level='1' associativity='direct' policy='writeback'> + <size value='10' unit='KiB'/> + <line value='8' unit='B'/> + </cache> + </cell> + <latencies> + <latency initiator='0' target='0' type='access' value='5'/> + <latency initiator='0' target='0' cache='1' type='access' value='10'/> + <bandwidth initiator='0' target='0' type='access' value='204800' unit='KiB'/> + </latencies> + </numa> + ... +</cpu> +...</pre> + + <p> + <span class='since'>Since 6.5.0</span> the <code>cell</code> element can + have <code>cache</code> child element which describes memory side cache
"have a <code>cache</code> child ...."
+ for memory proximity domains. The <code>cache</code> element has
"<code>cache</code> element has a"
+ <code>level</code> attribute describing the cache level and thus the + element can be repeated multiple times to describe different levels of + the cache. + </p> + + <p> + The <code>cache</code> element then has <code>associativity</code> + attribute (accepted values are <code>none</code>, <code>direct</code> and + <code>full</code>) describing the cache associativity, and + <code>policy</code> attribute (accepted values are <code>none</code>, + <code>writeback</code> and <code>writethrough</code>) describing cache + write associativity. The element has two mandatory child elements then: + <code>size</code> and <code>line</code> which describe cache size and + cache line size. Both elements accept two attributes: <code>value</code> + and <code>unit</code> which set the value of corresponding cache + attribute. + </p>
This second paragraph is about the same <cache> element that you started to describe in the previous paragraph. Might as well make a big paragraph talking just about the cache element. And there's a lot of different elements and attributes at different levels to unpack at once here. I think you can use a description list (<dl>) here, like you did below with the <latencies> element, or any other hierarchical HTML struct for that matter, to make it easier to display all the info you're providing here.
+ + <p> + The NUMA description has optional <code>latencies</code> element that
"The NUMA description has an optional ..." Everything else LGTM and these HTML changes are somewhat cosmetic, so: Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>

On Wed, Jun 24, 2020 at 03:49:06PM +0200, Michal Privoznik wrote:
To cite ACPI specification:
Heterogeneous Memory Attribute Table describes the memory attributes, such as memory side cache attributes and bandwidth and latency details, related to the System Physical Address (SPA) Memory Ranges. The software is expected to use this information as a hint for optimization.
According to our upstream discussion [1], this is exposed under <numa/> as <cache/> under NUMA <cell/> and <latency/> or <bandwidth/> under numa/latencies.
1: https://www.redhat.com/archives/libvir-list/2020-January/msg00422.html
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/formatdomain.html.in | 88 ++++++ docs/schemas/cputypes.rng | 110 ++++++- src/conf/numa_conf.c | 350 ++++++++++++++++++++- src/conf/numa_conf.h | 33 ++ src/libvirt_private.syms | 6 + tests/qemuxml2argvdata/numatune-hmat.xml | 52 +++ tests/qemuxml2xmloutdata/numatune-hmat.xml | 1 + tests/qemuxml2xmltest.c | 1 + 8 files changed, 625 insertions(+), 16 deletions(-) create mode 100644 tests/qemuxml2argvdata/numatune-hmat.xml create mode 120000 tests/qemuxml2xmloutdata/numatune-hmat.xml diff --git a/tests/qemuxml2argvdata/numatune-hmat.xml b/tests/qemuxml2argvdata/numatune-hmat.xml new file mode 100644 index 0000000000..b20ca0e439 --- /dev/null +++ b/tests/qemuxml2argvdata/numatune-hmat.xml @@ -0,0 +1,52 @@ +<domain type='qemu'> + <name>QEMUGuest</name> + <uuid>c7a5fdb2-cdaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>8388608</memory> + <currentMemory unit='KiB'>8388608</currentMemory> + <vcpu placement='static'>12</vcpu> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <features> + <acpi/> + <apic/> + <pae/> + </features> + <cpu> + <numa> + <cell id='0' cpus='0-3' memory='2097152' unit='KiB'> + <cache level='1' associativity='direct' policy='writeback'> + <size value='10' unit='KiB'/> + <line value='8' unit='B'/> + </cache> + </cell> + <cell id='1' cpus='4-7' memory='2097152' unit='KiB'/> + <cell id='2' cpus='8-11' memory='2097152' unit='KiB'/> + <cell id='3' memory='2097152' unit='KiB'/> + <cell id='4' memory='2097152' unit='KiB'/> + <cell id='5' memory='2097152' unit='KiB'/> + <latencies> + <latency initiator='0' target='0' type='access' value='5'/> + <latency initiator='0' target='0' cache='1' type='access' value='10'/> + <bandwidth initiator='0' target='0' type='access' value='204800' unit='KiB'/> + </latencies>
Functionally the XML looks fine to me, but I was just thinking it looks a bit weird to have <bandwidth> under a <latencies> wrapper. I'm not entirely sure what better name we should use - perhaps "interconnects" ? <interconnects> <latency initiator='0' target='0' type='access' value='5'/> <latency initiator='0' target='0' cache='1' type='access' value='10'/> <bandwidth initiator='0' target='0' type='access' value='204800' unit='KiB'/> </interconnects> any other ideas ? Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On 7/3/20 11:23 AM, Daniel P. Berrangé wrote:
On Wed, Jun 24, 2020 at 03:49:06PM +0200, Michal Privoznik wrote:
To cite ACPI specification:
Heterogeneous Memory Attribute Table describes the memory attributes, such as memory side cache attributes and bandwidth and latency details, related to the System Physical Address (SPA) Memory Ranges. The software is expected to use this information as a hint for optimization.
According to our upstream discussion [1], this is exposed under <numa/> as <cache/> under NUMA <cell/> and <latency/> or <bandwidth/> under numa/latencies.
1: https://www.redhat.com/archives/libvir-list/2020-January/msg00422.html
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/formatdomain.html.in | 88 ++++++ docs/schemas/cputypes.rng | 110 ++++++- src/conf/numa_conf.c | 350 ++++++++++++++++++++- src/conf/numa_conf.h | 33 ++ src/libvirt_private.syms | 6 + tests/qemuxml2argvdata/numatune-hmat.xml | 52 +++ tests/qemuxml2xmloutdata/numatune-hmat.xml | 1 + tests/qemuxml2xmltest.c | 1 + 8 files changed, 625 insertions(+), 16 deletions(-) create mode 100644 tests/qemuxml2argvdata/numatune-hmat.xml create mode 120000 tests/qemuxml2xmloutdata/numatune-hmat.xml diff --git a/tests/qemuxml2argvdata/numatune-hmat.xml b/tests/qemuxml2argvdata/numatune-hmat.xml new file mode 100644 index 0000000000..b20ca0e439 --- /dev/null +++ b/tests/qemuxml2argvdata/numatune-hmat.xml @@ -0,0 +1,52 @@ +<domain type='qemu'> + <name>QEMUGuest</name> + <uuid>c7a5fdb2-cdaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>8388608</memory> + <currentMemory unit='KiB'>8388608</currentMemory> + <vcpu placement='static'>12</vcpu> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <features> + <acpi/> + <apic/> + <pae/> + </features> + <cpu> + <numa> + <cell id='0' cpus='0-3' memory='2097152' unit='KiB'> + <cache level='1' associativity='direct' policy='writeback'> + <size value='10' unit='KiB'/> + <line value='8' unit='B'/> + </cache> + </cell> + <cell id='1' cpus='4-7' memory='2097152' unit='KiB'/> + <cell id='2' cpus='8-11' memory='2097152' unit='KiB'/> + <cell id='3' memory='2097152' unit='KiB'/> + <cell id='4' memory='2097152' unit='KiB'/> + <cell id='5' memory='2097152' unit='KiB'/> + <latencies> + <latency initiator='0' target='0' type='access' value='5'/> + <latency initiator='0' target='0' cache='1' type='access' value='10'/> + <bandwidth initiator='0' target='0' type='access' value='204800' unit='KiB'/> + </latencies>
Functionally the XML looks fine to me, but I was just thinking it looks a bit weird to have <bandwidth> under a <latencies> wrapper. I'm not entirely sure what better name we should use - perhaps "interconnects" ?
<interconnects> <latency initiator='0' target='0' type='access' value='5'/> <latency initiator='0' target='0' cache='1' type='access' value='10'/> <bandwidth initiator='0' target='0' type='access' value='204800' unit='KiB'/> </interconnects>
any other ideas ?
That sounds better. Or we can admit this is HMAT and have <hmat/> instead of <interconnects/>? But that won't be very future-proof, so your suggestion sounds better. Michal

On 7/3/20 11:51 AM, Michal Privoznik wrote:
On 7/3/20 11:23 AM, Daniel P. Berrangé wrote:
On Wed, Jun 24, 2020 at 03:49:06PM +0200, Michal Privoznik wrote:
To cite ACPI specification:
Functionally the XML looks fine to me, but I was just thinking it looks a bit weird to have <bandwidth> under a <latencies> wrapper. I'm not entirely sure what better name we should use - perhaps "interconnects" ?
<interconnects> <latency initiator='0' target='0' type='access' value='5'/> <latency initiator='0' target='0' cache='1' type='access' value='10'/> <bandwidth initiator='0' target='0' type='access' value='204800' unit='KiB'/> </interconnects>
any other ideas ?
That sounds better. Or we can admit this is HMAT and have <hmat/> instead of <interconnects/>? But that won't be very future-proof, so your suggestion sounds better.
Alright, no one suggested anything better and I'd like to push these to get the most out of the development window possible. I'll change this to <interconnects/> and push. We can still change it until the release, if needed. Michal

On Wed, Jul 08, 2020 at 10:03:12AM +0200, Michal Privoznik wrote:
On 7/3/20 11:51 AM, Michal Privoznik wrote:
On 7/3/20 11:23 AM, Daniel P. Berrangé wrote:
On Wed, Jun 24, 2020 at 03:49:06PM +0200, Michal Privoznik wrote:
To cite ACPI specification:
Functionally the XML looks fine to me, but I was just thinking it looks a bit weird to have <bandwidth> under a <latencies> wrapper. I'm not entirely sure what better name we should use - perhaps "interconnects" ?
<interconnects> <latency initiator='0' target='0' type='access' value='5'/> <latency initiator='0' target='0' cache='1' type='access' value='10'/> <bandwidth initiator='0' target='0' type='access' value='204800' unit='KiB'/> </interconnects>
any other ideas ?
That sounds better. Or we can admit this is HMAT and have <hmat/> instead of <interconnects/>? But that won't be very future-proof, so your suggestion sounds better.
Alright, no one suggested anything better and I'd like to push these to get the most out of the development window possible. I'll change this to <interconnects/> and push. We can still change it until the release, if needed.
Sounds fine to me. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

There are several restrictions, for instance @initiator and @target have to refer to existing NUMA nodes (daa), @cache has to refer to a defined cache level and so on. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/conf/domain_conf.c | 3 ++ src/conf/numa_conf.c | 99 ++++++++++++++++++++++++++++++++++++++++++ src/conf/numa_conf.h | 1 + 3 files changed, 103 insertions(+) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index ce75831037..0d73968221 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -7321,6 +7321,9 @@ virDomainDefValidateInternal(const virDomainDef *def, if (virDomainDefCputuneValidate(def) < 0) return -1; + if (virDomainNumaDefValidate(def->numa) < 0) + return -1; + return 0; } diff --git a/src/conf/numa_conf.c b/src/conf/numa_conf.c index f510dfebce..bf8442eada 100644 --- a/src/conf/numa_conf.c +++ b/src/conf/numa_conf.c @@ -1363,6 +1363,105 @@ virDomainNumaDefFormatXML(virBufferPtr buf, } +int +virDomainNumaDefValidate(const virDomainNuma *def) +{ + size_t i; + size_t j; + + if (!def) + return 0; + + for (i = 0; i < def->nmem_nodes; i++) { + const virDomainNumaNode *node = &def->mem_nodes[i]; + g_autoptr(virBitmap) levelsSeen = virBitmapNewEmpty(); + + for (j = 0; j < node->ncaches; j++) { + const virDomainNumaCache *cache = &node->caches[j]; + + /* Relax this if there's ever fourth layer of cache */ + if (cache->level > 3) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Ain't nobody heard of that much cache level")); + return -1; + } + + if (virBitmapIsBitSet(levelsSeen, cache->level)) { + virReportError(VIR_ERR_XML_ERROR, + _("Cache level '%u' already defined"), + cache->level); + return -1; + } + + if (virBitmapSetBitExpand(levelsSeen, cache->level)) + return -1; + } + } + + for (i = 0; i < def->nlatencies; i++) { + const virDomainNumaLatency *l = &def->latencies[i]; + + if (l->initiator >= def->nmem_nodes) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("'initiator' refers to a non-existent NUMA 
node")); + return -1; + } + + if (l->target >= def->nmem_nodes) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("'target' refers to a non-existent NUMA node")); + return -1; + } + + if (!def->mem_nodes[l->initiator].cpumask) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("NUMA nodes without CPUs can't be initiator")); + return -1; + } + + if (l->cache > 0) { + for (j = 0; j < def->mem_nodes[l->target].ncaches; j++) { + const virDomainNumaCache *cache = &def->mem_nodes[l->target].caches[j]; + + if (l->cache == cache->level) + break; + } + + if (j == def->mem_nodes[l->target].ncaches) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("'cache' refers to a non-existent NUMA node cache")); + return -1; + } + } + + for (j = 0; j < i; j++) { + virDomainNumaLatencyPtr ll = &def->latencies[j]; + + if (l->type == ll->type && + l->initiator == ll->initiator && + l->target == ll->target && + l->cache == ll->cache && + l->accessType == ll->accessType) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Duplicate info for NUMA latencies")); + return -1; + } + + if (l->initiator != l->target && + l->initiator == ll->target && + l->target == ll->initiator) { + virReportError(VIR_ERR_XML_ERROR, "%s", + _("Link already defined")); + return -1; + } + } + } + + return 0; +} + + unsigned int virDomainNumaGetCPUCountTotal(virDomainNumaPtr numa) { diff --git a/src/conf/numa_conf.h b/src/conf/numa_conf.h index a7dc3190d3..6d6e9de551 100644 --- a/src/conf/numa_conf.h +++ b/src/conf/numa_conf.h @@ -216,6 +216,7 @@ bool virDomainNumatuneNodeSpecified(virDomainNumaPtr numatune, int virDomainNumaDefParseXML(virDomainNumaPtr def, xmlXPathContextPtr ctxt); int virDomainNumaDefFormatXML(virBufferPtr buf, virDomainNumaPtr def); +int virDomainNumaDefValidate(const virDomainNuma *def); unsigned int virDomainNumaGetCPUCountTotal(virDomainNumaPtr numa); -- 2.26.2
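The two interconnect uniqueness rules in the validation patch (an entry identical in all fields is a duplicate, and an entry with initiator/target swapped describes the same link twice) can be modeled in isolation. This is a hedged standalone sketch with a hypothetical simplified `NumaLatency` struct, not libvirt's internal types:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified interconnect entry; the real virDomainNumaLatency
 * carries the same discriminating fields. */
typedef struct {
    int type;           /* 0 = latency, 1 = bandwidth */
    unsigned initiator;
    unsigned target;
    unsigned cache;     /* 0 = whole memory, >0 = cache level */
    int accessType;     /* access / read / write */
} NumaLatency;

/* Returns false if entry i conflicts with any earlier entry, mirroring the
 * two checks in virDomainNumaDefValidate. */
static bool
entry_is_unique(const NumaLatency *arr, size_t i)
{
    size_t j;

    for (j = 0; j < i; j++) {
        const NumaLatency *l = &arr[i];
        const NumaLatency *ll = &arr[j];

        /* Exact duplicate: all discriminating fields match. */
        if (l->type == ll->type &&
            l->initiator == ll->initiator &&
            l->target == ll->target &&
            l->cache == ll->cache &&
            l->accessType == ll->accessType)
            return false;

        /* Same link described in reverse direction. */
        if (l->initiator != l->target &&
            l->initiator == ll->target &&
            l->target == ll->initiator)
            return false;
    }
    return true;
}
```

Note that the reverse-direction check fires regardless of type or cache level, so asymmetric link descriptions are rejected outright rather than merged.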

On 6/24/20 10:49 AM, Michal Privoznik wrote:
There are several restrictions, for instance @initiator and @target have to refer to existing NUMA nodes (daa), @cache has to refer to a defined cache level and so on.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> ---
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>

These APIs will be used by QEMU driver when building the command line. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/conf/numa_conf.c | 139 +++++++++++++++++++++++++++++++++++++++ src/conf/numa_conf.h | 28 ++++++++ src/libvirt_private.syms | 6 ++ 3 files changed, 173 insertions(+) diff --git a/src/conf/numa_conf.c b/src/conf/numa_conf.c index bf8442eada..b35e5040fc 100644 --- a/src/conf/numa_conf.c +++ b/src/conf/numa_conf.c @@ -1829,3 +1829,142 @@ virDomainNumaFillCPUsInNode(virDomainNumaPtr numa, return 0; } + + +bool +virDomainNumaHasHMAT(const virDomainNuma *numa) +{ + size_t i; + + if (!numa) + return false; + + if (numa->nlatencies) + return true; + + for (i = 0; i < numa->nmem_nodes; i++) { + if (numa->mem_nodes[i].ncaches) + return true; + } + + return false; +} + + +size_t +virDomainNumaGetNodeCacheCount(const virDomainNuma *numa, + size_t node) +{ + if (!numa || node >= numa->nmem_nodes) + return 0; + + return numa->mem_nodes[node].ncaches; +} + + +int +virDomainNumaGetNodeCache(const virDomainNuma *numa, + size_t node, + size_t cache, + unsigned int *level, + unsigned int *size, + unsigned int *line, + virDomainCacheAssociativity *associativity, + virDomainCachePolicy *policy) +{ + const virDomainNumaNode *cell; + + if (!numa || node >= numa->nmem_nodes) + return -1; + + cell = &numa->mem_nodes[node]; + + if (cache >= cell->ncaches) + return -1; + + *level = cell->caches[cache].level; + *size = cell->caches[cache].size; + *line = cell->caches[cache].line; + *associativity = cell->caches[cache].associativity; + *policy = cell->caches[cache].policy; + return 0; +} + + +ssize_t +virDomainNumaGetNodeInitiator(const virDomainNuma *numa, + size_t node) +{ + size_t i; + unsigned int maxBandwidth = 0; + ssize_t candidateBandwidth = -1; + unsigned int minLatency = UINT_MAX; + ssize_t candidateLatency = -1; + + if (!numa || node >= numa->nmem_nodes) + return -1; + + for (i = 0; i < numa->nlatencies; i++) { + const virDomainNumaLatency *l = 
&numa->latencies[i]; + + if (l->target != node) + continue; + + switch (l->type) { + case VIR_DOMAIN_NUMA_LATENCY_TYPE_LATENCY: + if (l->value < minLatency) { + minLatency = l->value; + candidateLatency = l->initiator; + } + break; + + case VIR_DOMAIN_NUMA_LATENCY_TYPE_BANDWIDTH: + if (l->value > maxBandwidth) { + maxBandwidth = l->value; + candidateBandwidth = l->initiator; + } + break; + } + } + + if (candidateLatency >= 0) + return candidateLatency; + + return candidateBandwidth; +} + + +size_t +virDomainNumaGetLatenciesCount(const virDomainNuma *numa) +{ + if (!numa) + return 0; + + return numa->nlatencies; +} + + +int +virDomainNumaGetLatency(const virDomainNuma *numa, + size_t i, + virDomainNumaLatencyType *type, + unsigned int *initiator, + unsigned int *target, + unsigned int *cache, + virDomainMemoryLatency *accessType, + unsigned long *value) +{ + const virDomainNumaLatency *l; + + if (!numa || i >= numa->nlatencies) + return -1; + + l = &numa->latencies[i]; + *type = l->type; + *initiator = l->initiator; + *target = l->target; + *cache = l->cache; + *accessType = l->accessType; + *value = l->value; + return 0; +} diff --git a/src/conf/numa_conf.h b/src/conf/numa_conf.h index 6d6e9de551..b150fe4037 100644 --- a/src/conf/numa_conf.h +++ b/src/conf/numa_conf.h @@ -222,3 +222,31 @@ unsigned int virDomainNumaGetCPUCountTotal(virDomainNumaPtr numa); int virDomainNumaFillCPUsInNode(virDomainNumaPtr numa, size_t node, unsigned int maxCpus); + +bool virDomainNumaHasHMAT(const virDomainNuma *numa); + +size_t virDomainNumaGetNodeCacheCount(const virDomainNuma *numa, + size_t node); + +int virDomainNumaGetNodeCache(const virDomainNuma *numa, + size_t node, + size_t cache, + unsigned int *level, + unsigned int *size, + unsigned int *line, + virDomainCacheAssociativity *associativity, + virDomainCachePolicy *policy); + +ssize_t virDomainNumaGetNodeInitiator(const virDomainNuma *numa, + size_t node); + +size_t virDomainNumaGetLatenciesCount(const virDomainNuma *numa); 
+ +int virDomainNumaGetLatency(const virDomainNuma *numa, + size_t i, + virDomainNumaLatencyType *type, + unsigned int *initiator, + unsigned int *target, + unsigned int *cache, + virDomainMemoryLatency *accessType, + unsigned long *value); diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 65fe1c7f47..41e61e9b4d 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -843,14 +843,20 @@ virDomainNumaEquals; virDomainNumaFillCPUsInNode; virDomainNumaFree; virDomainNumaGetCPUCountTotal; +virDomainNumaGetLatenciesCount; +virDomainNumaGetLatency; virDomainNumaGetMaxCPUID; virDomainNumaGetMemorySize; +virDomainNumaGetNodeCache; +virDomainNumaGetNodeCacheCount; virDomainNumaGetNodeCount; virDomainNumaGetNodeCpumask; virDomainNumaGetNodeDiscard; virDomainNumaGetNodeDistance; +virDomainNumaGetNodeInitiator; virDomainNumaGetNodeMemoryAccessMode; virDomainNumaGetNodeMemorySize; +virDomainNumaHasHMAT; virDomainNumaNew; virDomainNumaNodeDistanceIsUsingDefaults; virDomainNumaNodesDistancesAreBeingSet; -- 2.26.2
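The initiator-selection heuristic inside virDomainNumaGetNodeInitiator above (pick the node with the lowest latency to the target, falling back to the one with the highest bandwidth) can be sketched standalone. The `Link` struct below is a hypothetical simplification of libvirt's internal types, intended only to illustrate the selection order:

```c
#include <limits.h>
#include <stddef.h>
#include <sys/types.h>  /* ssize_t */

enum { LINK_LATENCY, LINK_BANDWIDTH };

typedef struct {
    int type;            /* LINK_LATENCY or LINK_BANDWIDTH */
    unsigned initiator;
    unsigned target;
    unsigned long value; /* latency in ns, or bandwidth in KiB/s */
} Link;

/* Pick the best initiator for @node: lowest latency wins; if no latency
 * entry targets the node, fall back to highest bandwidth; return -1 when
 * nothing is known about the node. */
static ssize_t
best_initiator(const Link *links, size_t nlinks, unsigned node)
{
    unsigned long minLatency = ULONG_MAX;
    unsigned long maxBandwidth = 0;
    ssize_t candLatency = -1;
    ssize_t candBandwidth = -1;
    size_t i;

    for (i = 0; i < nlinks; i++) {
        const Link *l = &links[i];

        if (l->target != node)
            continue;

        if (l->type == LINK_LATENCY && l->value < minLatency) {
            minLatency = l->value;
            candLatency = l->initiator;
        } else if (l->type == LINK_BANDWIDTH && l->value > maxBandwidth) {
            maxBandwidth = l->value;
            candBandwidth = l->initiator;
        }
    }

    return candLatency >= 0 ? candLatency : candBandwidth;
}
```

Latency taking priority over bandwidth matches the comment added in the later command-line patch: the initiator is "the NUMA node that has best performance (the lowest latency or largest bandwidth)" to the target node.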

On 6/24/20 10:49 AM, Michal Privoznik wrote:
These APIs will be used by QEMU driver when building the command line.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> ---
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>

This capability tracks whether QEMU is capable of defining HMAT ACPI table for the guest. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_capabilities.c | 2 ++ src/qemu/qemu_capabilities.h | 1 + tests/qemucapabilitiesdata/caps_5.0.0.aarch64.xml | 1 + tests/qemucapabilitiesdata/caps_5.0.0.ppc64.xml | 1 + tests/qemucapabilitiesdata/caps_5.0.0.riscv64.xml | 1 + tests/qemucapabilitiesdata/caps_5.0.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml | 1 + 7 files changed, 8 insertions(+) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index fc502ff8e4..66a663f827 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -595,6 +595,7 @@ VIR_ENUM_IMPL(virQEMUCaps, "migration-param.xbzrle-cache-size", "intel-iommu.aw-bits", "spapr-tpm-proxy", + "numa.hmat", ); @@ -1522,6 +1523,7 @@ static struct virQEMUCapsStringFlags virQEMUCapsQMPSchemaQueries[] = { { "migrate-set-parameters/arg-type/max-bandwidth", QEMU_CAPS_MIGRATION_PARAM_BANDWIDTH }, { "migrate-set-parameters/arg-type/downtime-limit", QEMU_CAPS_MIGRATION_PARAM_DOWNTIME }, { "migrate-set-parameters/arg-type/xbzrle-cache-size", QEMU_CAPS_MIGRATION_PARAM_XBZRLE_CACHE_SIZE }, + { "set-numa-node/arg-type/+hmat-lb", QEMU_CAPS_NUMA_HMAT }, }; typedef struct _virQEMUCapsObjectTypeProps virQEMUCapsObjectTypeProps; diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index 0bf46d493e..7d9b2ac7db 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -575,6 +575,7 @@ typedef enum { /* virQEMUCapsFlags grouping marker for syntax-check */ QEMU_CAPS_MIGRATION_PARAM_XBZRLE_CACHE_SIZE, /* xbzrle-cache-size field in migrate-set-parameters */ QEMU_CAPS_INTEL_IOMMU_AW_BITS, /* intel-iommu.aw-bits */ QEMU_CAPS_DEVICE_SPAPR_TPM_PROXY, /* -device spapr-tpm-proxy */ + QEMU_CAPS_NUMA_HMAT, /* -numa hmat */ QEMU_CAPS_LAST /* this must always be the last item */ } virQEMUCapsFlags; diff --git 
a/tests/qemucapabilitiesdata/caps_5.0.0.aarch64.xml b/tests/qemucapabilitiesdata/caps_5.0.0.aarch64.xml index f2d691734f..9446ddc22a 100644 --- a/tests/qemucapabilitiesdata/caps_5.0.0.aarch64.xml +++ b/tests/qemucapabilitiesdata/caps_5.0.0.aarch64.xml @@ -197,6 +197,7 @@ <flag name='migration-param.bandwidth'/> <flag name='migration-param.downtime'/> <flag name='migration-param.xbzrle-cache-size'/> + <flag name='numa.hmat'/> <version>5000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>61700241</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_5.0.0.ppc64.xml b/tests/qemucapabilitiesdata/caps_5.0.0.ppc64.xml index f5d6e0679e..8c371a045b 100644 --- a/tests/qemucapabilitiesdata/caps_5.0.0.ppc64.xml +++ b/tests/qemucapabilitiesdata/caps_5.0.0.ppc64.xml @@ -206,6 +206,7 @@ <flag name='migration-param.downtime'/> <flag name='migration-param.xbzrle-cache-size'/> <flag name='spapr-tpm-proxy'/> + <flag name='numa.hmat'/> <version>5000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>42900241</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_5.0.0.riscv64.xml b/tests/qemucapabilitiesdata/caps_5.0.0.riscv64.xml index 3119f6deb7..15c9d54332 100644 --- a/tests/qemucapabilitiesdata/caps_5.0.0.riscv64.xml +++ b/tests/qemucapabilitiesdata/caps_5.0.0.riscv64.xml @@ -193,6 +193,7 @@ <flag name='migration-param.bandwidth'/> <flag name='migration-param.downtime'/> <flag name='migration-param.xbzrle-cache-size'/> + <flag name='numa.hmat'/> <version>5000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>0</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_5.0.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_5.0.0.x86_64.xml index da53abc857..dafca26b89 100644 --- a/tests/qemucapabilitiesdata/caps_5.0.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_5.0.0.x86_64.xml @@ -241,6 +241,7 @@ <flag name='migration-param.downtime'/> <flag name='migration-param.xbzrle-cache-size'/> <flag name='intel-iommu.aw-bits'/> + <flag 
name='numa.hmat'/> <version>5000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100241</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml index b7058ee597..bd6e83ccaf 100644 --- a/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml @@ -241,6 +241,7 @@ <flag name='migration-param.downtime'/> <flag name='migration-param.xbzrle-cache-size'/> <flag name='intel-iommu.aw-bits'/> + <flag name='numa.hmat'/> <version>5000050</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100242</microcodeVersion> -- 2.26.2

On 6/24/20 10:49 AM, Michal Privoznik wrote:
This capability tracks whether QEMU is capable of defining HMAT ACPI table for the guest.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> ---
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1786303
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/conf/numa_conf.c                          |   7 +
 src/qemu/qemu_command.c                       | 183 ++++++++++++++++++
 .../numatune-hmat.x86_64-latest.args          |  53 +++++
 tests/qemuxml2argvtest.c                      |   1 +
 4 files changed, 244 insertions(+)
 create mode 100644 tests/qemuxml2argvdata/numatune-hmat.x86_64-latest.args

diff --git a/src/conf/numa_conf.c b/src/conf/numa_conf.c
index b35e5040fc..7fbbe0f52e 100644
--- a/src/conf/numa_conf.c
+++ b/src/conf/numa_conf.c
@@ -1904,6 +1904,13 @@ virDomainNumaGetNodeInitiator(const virDomainNuma *numa,
     if (!numa || node >= numa->nmem_nodes)
         return -1;
 
+    /* A NUMA node which has at least one vCPU is initiator to itself by
+     * definition. */
+    if (numa->mem_nodes[node].cpumask)
+        return node;
+
+    /* For the rest, "NUMA node that has best performance (the lowest
+     * latency or largest bandwidth) to this NUMA node." */
     for (i = 0; i < numa->nlatencies; i++) {
         const virDomainNumaLatency *l = &numa->latencies[i];
 
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index 628ff970bd..b28c43f110 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -6932,6 +6932,16 @@ qemuBuildMachineCommandLine(virCommandPtr cmd,
         virBufferAsprintf(&buf, ",pflash1=%s", priv->pflash1->nodeformat);
     }
 
+    if (virDomainNumaHasHMAT(def->numa)) {
+        if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_NUMA_HMAT)) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                           _("HMAT is not supported with this QEMU"));
+            return -1;
+        }
+
+        virBufferAddLit(&buf, ",hmat=on");
+    }
+
     virCommandAddArgBuffer(cmd, &buf);
 
     return 0;
@@ -7114,6 +7124,134 @@ qemuBuildIOThreadCommandLine(virCommandPtr cmd,
 }
 
 
+static int
+qemuBuilNumaCellCache(virCommandPtr cmd,
+                      const virDomainDef *def,
+                      size_t cell)
+{
+    size_t ncaches = virDomainNumaGetNodeCacheCount(def->numa, cell);
+    size_t i;
+
+    if (ncaches == 0)
+        return 0;
+
+    for (i = 0; i < ncaches; i++) {
+        g_auto(virBuffer) buf = VIR_BUFFER_INITIALIZER;
+        unsigned int level;
+        unsigned int size;
+        unsigned int line;
+        virDomainCacheAssociativity associativity;
+        virDomainCachePolicy policy;
+
+        if (virDomainNumaGetNodeCache(def->numa, cell, i,
+                                      &level, &size, &line,
+                                      &associativity, &policy) < 0) {
+            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                           _("Unable to format NUMA node cache"));
+            return -1;
+        }
+
+        virBufferAsprintf(&buf,
+                          "hmat-cache,node-id=%zu,size=%uK,level=%u",
+                          cell, size, level);
+
+        switch (associativity) {
+        case VIR_DOMAIN_CACHE_ASSOCIATIVITY_NONE:
+            virBufferAddLit(&buf, ",associativity=none");
+            break;
+        case VIR_DOMAIN_CACHE_ASSOCIATIVITY_DIRECT:
+            virBufferAddLit(&buf, ",associativity=direct");
+            break;
+        case VIR_DOMAIN_CACHE_ASSOCIATIVITY_FULL:
+            virBufferAddLit(&buf, ",associativity=complex");
+            break;
+        case VIR_DOMAIN_CACHE_ASSOCIATIVITY_LAST:
+            break;
+        }
+
+        switch (policy) {
+        case VIR_DOMAIN_CACHE_POLICY_NONE:
+            virBufferAddLit(&buf, ",policy=none");
+            break;
+        case VIR_DOMAIN_CACHE_POLICY_WRITEBACK:
+            virBufferAddLit(&buf, ",policy=write-back");
+            break;
+        case VIR_DOMAIN_CACHE_POLICY_WRITETHROUGH:
+            virBufferAddLit(&buf, ",policy=write-through");
+            break;
+        case VIR_DOMAIN_CACHE_POLICY_LAST:
+            break;
+        }
+
+        if (line > 0)
+            virBufferAsprintf(&buf, ",line=%u", line);
+
+        virCommandAddArg(cmd, "-numa");
+        virCommandAddArgBuffer(cmd, &buf);
+    }
+
+    return 0;
+}
+
+
+VIR_ENUM_DECL(qemuDomainMemoryHierarchy);
+VIR_ENUM_IMPL(qemuDomainMemoryHierarchy,
+              4, /* Maximum level of cache */
+              "memory", /* Special case, whole memory not specific cache */
+              "first-level",
+              "second-level",
+              "third-level");
+
+
+static int
+qemuBuildNumaHMATCommandLine(virCommandPtr cmd,
+                             const virDomainDef *def)
+{
+    size_t nlatencies;
+    size_t i;
+
+    if (!def->numa)
+        return 0;
+
+    nlatencies = virDomainNumaGetLatenciesCount(def->numa);
+    for (i = 0; i < nlatencies; i++) {
+        g_auto(virBuffer) buf = VIR_BUFFER_INITIALIZER;
+        virDomainNumaLatencyType type;
+        unsigned int initiator;
+        unsigned int target;
+        unsigned int cache;
+        virDomainMemoryLatency accessType;
+        unsigned long value;
+        const char *hierarchyStr;
+        const char *accessStr;
+
+        if (virDomainNumaGetLatency(def->numa, i,
+                                    &type, &initiator, &target,
+                                    &cache, &accessType, &value) < 0)
+            return -1;
+
+        hierarchyStr = qemuDomainMemoryHierarchyTypeToString(cache);
+        accessStr = virDomainMemoryLatencyTypeToString(accessType);
+        virBufferAsprintf(&buf,
+                          "hmat-lb,initiator=%u,target=%u,hierarchy=%s,data-type=%s-",
+                          initiator, target, hierarchyStr, accessStr);
+
+        switch (type) {
+        case VIR_DOMAIN_NUMA_LATENCY_TYPE_LATENCY:
+            virBufferAsprintf(&buf, "latency,latency=%lu", value);
+            break;
+        case VIR_DOMAIN_NUMA_LATENCY_TYPE_BANDWIDTH:
+            virBufferAsprintf(&buf, "bandwidth,bandwidth=%luK", value);
+            break;
+        }
+
+        virCommandAddArg(cmd, "-numa");
+        virCommandAddArgBuffer(cmd, &buf);
+    }
+
+    return 0;
+}
+
+
 static int
 qemuBuildNumaCommandLine(virQEMUDriverConfigPtr cfg,
                          virDomainDefPtr def,
@@ -7126,9 +7264,11 @@ qemuBuildNumaCommandLine(virQEMUDriverConfigPtr cfg,
     char *next = NULL;
     virBufferPtr nodeBackends = NULL;
     bool needBackend = false;
+    bool hmat = false;
     int rc;
     int ret = -1;
     size_t ncells = virDomainNumaGetNodeCount(def->numa);
+    ssize_t masterInitiator = -1;
 
     if (!virDomainNumatuneNodesetIsAvailable(def->numa, priv->autoNodeset))
         goto cleanup;
@@ -7138,6 +7278,16 @@ qemuBuildNumaCommandLine(virQEMUDriverConfigPtr cfg,
                                         def->os.machine))
         needBackend = true;
 
+    if (virDomainNumaHasHMAT(def->numa)) {
+        if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_NUMA_HMAT)) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                           _("HMAT is not supported with this QEMU"));
+            goto cleanup;
+        }
+        needBackend = true;
+        hmat = true;
+    }
+
     if (VIR_ALLOC_N(nodeBackends, ncells) < 0)
         goto cleanup;
@@ -7166,8 +7316,22 @@ qemuBuildNumaCommandLine(virQEMUDriverConfigPtr cfg,
         qemuBuildMemPathStr(def, cmd, priv) < 0)
         goto cleanup;
 
+    for (i = 0; i < ncells; i++) {
+        if (virDomainNumaGetNodeCpumask(def->numa, i)) {
+            masterInitiator = i;
+            break;
+        }
+    }
+
+    if (masterInitiator < 0) {
+        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                       _("At least one NUMA node has to have CPUs"));
+        goto cleanup;
+    }
+
     for (i = 0; i < ncells; i++) {
         virBitmapPtr cpumask = virDomainNumaGetNodeCpumask(def->numa, i);
+        ssize_t initiator = virDomainNumaGetNodeInitiator(def->numa, i);
 
         if (needBackend) {
             virCommandAddArg(cmd, "-object");
@@ -7192,6 +7356,13 @@ qemuBuildNumaCommandLine(virQEMUDriverConfigPtr cfg,
             }
         }
 
+        if (hmat) {
+            if (initiator < 0)
+                initiator = masterInitiator;
+
+            virBufferAsprintf(&buf, ",initiator=%zd", initiator);
+        }
+
         if (needBackend)
             virBufferAsprintf(&buf, ",memdev=ram-node%zu", i);
         else
@@ -7217,6 +7388,18 @@ qemuBuildNumaCommandLine(virQEMUDriverConfigPtr cfg,
         }
     }
 
+    if (hmat) {
+        if (qemuBuildNumaHMATCommandLine(cmd, def) < 0)
+            goto cleanup;
+
+        /* This can't be moved into any of the loops above,
+         * because hmat-cache can be specified only after hmat-lb. */
+        for (i = 0; i < ncells; i++) {
+            if (qemuBuilNumaCellCache(cmd, def, i) < 0)
+                goto cleanup;
+        }
+    }
+
     ret = 0;
 
 cleanup:
diff --git a/tests/qemuxml2argvdata/numatune-hmat.x86_64-latest.args b/tests/qemuxml2argvdata/numatune-hmat.x86_64-latest.args
new file mode 100644
index 0000000000..c52015caa8
--- /dev/null
+++ b/tests/qemuxml2argvdata/numatune-hmat.x86_64-latest.args
@@ -0,0 +1,53 @@
+LC_ALL=C \
+PATH=/bin \
+HOME=/tmp/lib/domain--1-QEMUGuest \
+USER=test \
+LOGNAME=test \
+XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest/.local/share \
+XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest/.cache \
+XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest/.config \
+QEMU_AUDIO_DRV=none \
+/usr/bin/qemu-system-x86_64 \
+-name guest=QEMUGuest,debug-threads=on \
+-S \
+-object secret,id=masterKey0,format=raw,\
+file=/tmp/lib/domain--1-QEMUGuest/master-key.aes \
+-machine pc,accel=tcg,usb=off,dump-guest-core=off,hmat=on \
+-cpu qemu64 \
+-m 12288 \
+-overcommit mem-lock=off \
+-smp 12,sockets=12,cores=1,threads=1 \
+-object memory-backend-ram,id=ram-node0,size=2147483648 \
+-numa node,nodeid=0,cpus=0-3,initiator=0,memdev=ram-node0 \
+-object memory-backend-ram,id=ram-node1,size=2147483648 \
+-numa node,nodeid=1,cpus=4-7,initiator=1,memdev=ram-node1 \
+-object memory-backend-ram,id=ram-node2,size=2147483648 \
+-numa node,nodeid=2,cpus=8-11,initiator=2,memdev=ram-node2 \
+-object memory-backend-ram,id=ram-node3,size=2147483648 \
+-numa node,nodeid=3,initiator=0,memdev=ram-node3 \
+-object memory-backend-ram,id=ram-node4,size=2147483648 \
+-numa node,nodeid=4,initiator=0,memdev=ram-node4 \
+-object memory-backend-ram,id=ram-node5,size=2147483648 \
+-numa node,nodeid=5,initiator=0,memdev=ram-node5 \
+-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,\
+latency=5 \
+-numa hmat-lb,initiator=0,target=0,hierarchy=first-level,\
+data-type=access-latency,latency=10 \
+-numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,\
+bandwidth=204800K \
+-numa hmat-cache,node-id=0,size=10K,level=1,associativity=direct,\
+policy=write-back,line=8 \
+-uuid c7a5fdb2-cdaf-9455-926a-d65c16db1809 \
+-display none \
+-no-user-config \
+-nodefaults \
+-chardev socket,id=charmonitor,fd=1729,server,nowait \
+-mon chardev=charmonitor,id=monitor,mode=control \
+-rtc base=utc \
+-no-shutdown \
+-boot strict=on \
+-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
+-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 \
+-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,\
+resourcecontrol=deny \
+-msg timestamp=on
diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c
index 98d9a7b5b2..3fdcd41317 100644
--- a/tests/qemuxml2argvtest.c
+++ b/tests/qemuxml2argvtest.c
@@ -1913,6 +1913,7 @@ mymain(void)
     DO_TEST("numatune-distances", QEMU_CAPS_NUMA, QEMU_CAPS_NUMA_DIST);
     DO_TEST("numatune-no-vcpu", NONE);
+    DO_TEST_CAPS_LATEST("numatune-hmat");
     DO_TEST("numatune-auto-nodeset-invalid", NONE);
     DO_TEST("numatune-auto-prefer",
             QEMU_CAPS_OBJECT_MEMORY_RAM,
-- 
2.26.2

On 6/24/20 10:49 AM, Michal Privoznik wrote:
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1786303
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/conf/numa_conf.c                          |   7 +
 src/qemu/qemu_command.c                       | 183 ++++++++++++++++++
 .../numatune-hmat.x86_64-latest.args          |  53 +++++
 tests/qemuxml2argvtest.c                      |   1 +
 4 files changed, 244 insertions(+)
 create mode 100644 tests/qemuxml2argvdata/numatune-hmat.x86_64-latest.args

diff --git a/src/conf/numa_conf.c b/src/conf/numa_conf.c
index b35e5040fc..7fbbe0f52e 100644
--- a/src/conf/numa_conf.c
+++ b/src/conf/numa_conf.c
@@ -1904,6 +1904,13 @@ virDomainNumaGetNodeInitiator(const virDomainNuma *numa,
     if (!numa || node >= numa->nmem_nodes)
         return -1;
 
+    /* A NUMA node which has at least one vCPU is initiator to itself by
+     * definition. */
+    if (numa->mem_nodes[node].cpumask)
+        return node;
+
+    /* For the rest, "NUMA node that has best performance (the lowest
+     * latency or largest bandwidth) to this NUMA node." */
     for (i = 0; i < numa->nlatencies; i++) {
         const virDomainNumaLatency *l = &numa->latencies[i];

diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index 628ff970bd..b28c43f110 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -6932,6 +6932,16 @@ qemuBuildMachineCommandLine(virCommandPtr cmd,
         virBufferAsprintf(&buf, ",pflash1=%s", priv->pflash1->nodeformat);
     }
 
+    if (virDomainNumaHasHMAT(def->numa)) {
+        if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_NUMA_HMAT)) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                           _("HMAT is not supported with this QEMU"));
+            return -1;
+        }
This qemuCaps validation must be moved to qemu_validate.c, into
qemuValidateDomainDefNuma(), to allow post-parse checking instead of a
runtime check. This is the original intent of qemu_validate.c. Yeah, I know
that there are still some validations being done here in qemu_command.c;
I'll come back to this work of moving this stuff to qemu_validate.c
soon (TM).

As a bonus:
+
+        virBufferAddLit(&buf, ",hmat=on");
+    }
+
     virCommandAddArgBuffer(cmd, &buf);
 
     return 0;
@@ -7114,6 +7124,134 @@ qemuBuildIOThreadCommandLine(virCommandPtr cmd,
 }

[...]

+}
+
+
 static int
 qemuBuildNumaCommandLine(virQEMUDriverConfigPtr cfg,
                          virDomainDefPtr def,
@@ -7126,9 +7264,11 @@ qemuBuildNumaCommandLine(virQEMUDriverConfigPtr cfg,
     char *next = NULL;
     virBufferPtr nodeBackends = NULL;
     bool needBackend = false;
+    bool hmat = false;
     int rc;
     int ret = -1;
     size_t ncells = virDomainNumaGetNodeCount(def->numa);
+    ssize_t masterInitiator = -1;
 
     if (!virDomainNumatuneNodesetIsAvailable(def->numa, priv->autoNodeset))
         goto cleanup;
@@ -7138,6 +7278,16 @@ qemuBuildNumaCommandLine(virQEMUDriverConfigPtr cfg,
                                         def->os.machine))
         needBackend = true;
 
+    if (virDomainNumaHasHMAT(def->numa)) {
+        if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_NUMA_HMAT)) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                           _("HMAT is not supported with this QEMU"));
+            goto cleanup;
+        }
+        needBackend = true;
+        hmat = true;
+    }
+
You can get rid of this second, duplicated validation block, since this
would already be checked once in qemuValidateDomainDefNuma().

With the validation code moved:

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
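For readers following the review, here is a rough, standalone sketch of the suggested move. The types and the function name are stand-ins invented for illustration, not libvirt's real virDomainNuma/virQEMUCaps types or the real qemuValidateDomainDefNuma() signature; the point is only that the capability check runs once at validation (define) time, so both command-line builders can drop their duplicated copies:

```c
#include <stdbool.h>
#include <stdio.h>

/* Stub stand-ins for libvirt's internal types -- illustration only. */
typedef struct { bool hasHMAT; } DemoNumaDef;
typedef struct { bool supportsNumaHmat; } DemoQemuCaps;

/* Hypothetical validate-time counterpart of the in-patch check: living in
 * a validation callback, it rejects the configuration when the domain is
 * defined, instead of failing later while the command line is built. */
static int
demoValidateDomainDefNuma(const DemoNumaDef *numa,
                          const DemoQemuCaps *caps)
{
    if (numa->hasHMAT && !caps->supportsNumaHmat) {
        fprintf(stderr, "HMAT is not supported with this QEMU\n");
        return -1; /* refuse the config up front */
    }
    return 0;
}
```

With a check like this in place, the two blocks in qemuBuildMachineCommandLine() and qemuBuildNumaCommandLine() would shrink to just emitting `,hmat=on` and the `-numa` arguments.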

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 NEWS.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/NEWS.rst b/NEWS.rst
index 2fc43c31b9..8990cdaf61 100644
--- a/NEWS.rst
+++ b/NEWS.rst
@@ -47,6 +47,11 @@ v6.5.0 (unreleased)
     alphabetical order. Hook script in old place will be executed as first
     for backward compatibility.
 
+  * Allow configuring of NUMA HMAT
+
+    Libvirt allows configuring Heterogeneous Memory Attribute Table to hint
+    software running inside the guest on optimization.
+
 * **Improvements**
 
 * **Bug fixes**
-- 
2.26.2

On 6/24/20 10:49 AM, Michal Privoznik wrote:
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 NEWS.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/NEWS.rst b/NEWS.rst
index 2fc43c31b9..8990cdaf61 100644
--- a/NEWS.rst
+++ b/NEWS.rst
@@ -47,6 +47,11 @@ v6.5.0 (unreleased)
     alphabetical order. Hook script in old place will be executed as first
     for backward compatibility.
 
+  * Allow configuring of NUMA HMAT
+
+    Libvirt allows configuring Heterogeneous Memory Attribute Table to hint
+    software running inside the guest on optimization.
+
 * **Improvements**
Please add an ACPI mention here, like "Libvirt allows configuring ACPI
Heterogeneous ....", to avoid PPC64 devs like myself having to deal with
"HMAT does not work in Power9" bugs. In fact, please mention that this is
an ACPI-exclusive feature in formatdomain.html.in (patch 08/13) as well :)

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
* **Bug fixes**
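For context, the guest XML that drives the HMAT command line discussed in this thread looks roughly like the following. This is an abbreviated sketch modelled on the series' numatune-hmat test case (two of the six cells shown); the exact element and attribute spellings are defined by the formatdomain patch 08/13, so treat this as illustrative rather than authoritative:

```xml
<cpu>
  <numa>
    <cell id='0' cpus='0-3' memory='2097152' unit='KiB'>
      <cache level='1' associativity='direct' policy='writeback'>
        <size value='10' unit='KiB'/>
        <line value='8' unit='B'/>
      </cache>
    </cell>
    <cell id='3' memory='2097152' unit='KiB'/>
    <interconnects>
      <latency initiator='0' target='0' type='access' value='5'/>
      <latency initiator='0' target='0' cache='1' type='access' value='10'/>
      <bandwidth initiator='0' target='0' type='access' value='204800' unit='KiB'/>
    </interconnects>
  </numa>
</cpu>
```

Cell 0 has vCPUs and is therefore its own initiator; cell 3 is CPU-less, so the `initiator=` on its `-numa node` line falls back to the master initiator, matching the generated `-numa hmat-lb` and `-numa hmat-cache` arguments in the test args file above.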
participants (3)
- Daniel Henrique Barboza
- Daniel P. Berrangé
- Michal Privoznik