[libvirt] [RFC PATCH v1 0/4] NUMA Host or Node Partitioning

From: Wim ten Have <wim.ten.have@oracle.com>

This patch series extends guest domain administration by adding a feature
that creates a guest with a NUMA layout, also referred to as vNUMA
(virtual NUMA).

NUMA (Non-Uniform Memory Access) is a method of configuring a cluster of
nodes within a single multiprocessing system such that each node shares
its processor-local memory with the other nodes, improving both
performance and the ability of the system to be expanded.

The illustration below shows a typical 4-node NUMA system. Within this
system, each socket is equipped with its own distinct memory, and some
also with I/O. Access to memory or I/O on remote nodes is only possible
by communicating through the "Interconnect."

    +-------------+-------+        +-------+-------------+
    |NODE0|       |       |        |       |       |NODE3|
    |     | CPU00 | CPU03 |        | CPU12 | CPU15 |     |
    |     |       |       |        |       |       |     |
    | Mem +--- Socket0 ---<-------->--- Socket3 ---+ Mem |
    |     |       |       |        |       |       |     |
    +-----+ CPU01 | CPU02 |        | CPU13 | CPU14 |     |
    | I/O |       |       |        |       |       |     |
    +-----+-------^-------+        +-------^-------+-----+
                  |                        |
                  |      Interconnect      |
                  |                        |
    +-------------v-------+        +-------v-------------+
    |NODE1|       |       |        |       |       |NODE2|
    |     | CPU04 | CPU07 |        | CPU08 | CPU11 |     |
    |     |       |       |        |       |       |     |
    | Mem +--- Socket1 ---<-------->--- Socket2 ---+ Mem |
    |     |       |       |        |       |       |     |
    +-----+ CPU05 | CPU06 |        | CPU09 | CPU10 |     |
    | I/O |       |       |        |       |       |     |
    +-----+-------+-------+        +-------+-------+-----+

Unfortunately, NUMA architectures have some drawbacks. For example, when
data is stored in memory associated with Socket2 but is accessed by a
CPU in Socket0, that CPU uses the interconnect to access the memory
associated with Socket2. These interconnect hops add data access delays.

Some high-performance software takes the NUMA architecture into account
by carefully placing data in memory and pinning the processes most
likely to access that data to CPUs with the shortest access times.
Similarly, such software can pin its I/O processes to CPUs with the
shortest access times to I/O devices. When such software is run within a
guest VM, constructing the VM such that its virtual NUMA topology
mirrors the physical NUMA topology preserves the application software's
performance.

The changes brought by this patch series add a new libvirt domain
element named <vnuma> that allows for dynamic 'host' or 'node'
partitioning of a guest, where libvirt inspects the host capabilities
and renders a best-fit guest XML whose vNUMA topology matches the host.

    <domain>
      ..
      <vnuma mode='host|node' distribution='contiguous|siblings|round-robin|interleave'>
        <memory unit='KiB'>524288</memory>
        <partition nodeset="1-4,^3" cells="8"/>
      </vnuma>
      ..
    </domain>

The content of this <vnuma> element causes libvirt to dynamically
partition the guest domain XML into a 'host' or 'node' NUMA model.

Under <vnuma mode='host' ...> the guest domain is automatically
partitioned according to the "host" capabilities.

Under <vnuma mode='node' ...> the guest domain is partitioned according
to the "nodeset" and "cells" attributes of the <partition> subelement.

The optional <vnuma> attribute distribution='type' indicates how the
guest NUMA cell cpus are distributed. It can take the following values,
illustrated by the sketch following this list:

- 'contiguous': the cpus are enumerated sequentially over the defined
  NUMA cells.
- 'siblings': the cpus are distributed over the NUMA cells matching the
  host CPU SMT model.
- 'round-robin': the cpus are distributed over the NUMA cells matching
  the host CPU topology.
- 'interleave': the cpus are interleaved, one at a time, over the NUMA
  cells.
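To make the four distribution types concrete, here is a minimal
standalone C sketch (illustrative only, not part of the series) that
restates the cpumask arithmetic the partitioning code uses; the
32-vcpu, 4-cell, 2-threads-per-core configuration is an assumption
chosen so that all four layouts differ:

#include <stdio.h>

enum { MAXVCPUS = 32, NCELLS = 4, THREADS = 2 };

/* 'contiguous': sequential vcpu blocks, one block per cell */
static int cell_contiguous(int v)
{
    int vcpu_node = (MAXVCPUS + NCELLS - 1) / NCELLS; /* vcpus per cell */
    return v / vcpu_node;
}

/* 'interleave': one vcpu at a time over the cells */
static int cell_interleave(int v)
{
    return v % NCELLS;
}

/* 'round-robin': one whole core (thread group) per cell at a time */
static int cell_round_robin(int v)
{
    return (v / THREADS) % NCELLS;
}

/* 'siblings': one socket's worth of cores per cell, SMT pairs kept */
static int cell_siblings(int v)
{
    int cores = MAXVCPUS / (NCELLS * THREADS);
    return (v % (MAXVCPUS / THREADS)) / cores;
}

int main(void)
{
    printf("vcpu  contiguous  siblings  round-robin  interleave\n");
    for (int v = 0; v < MAXVCPUS; v++)
        printf("%4d  %10d  %8d  %11d  %10d\n", v, cell_contiguous(v),
               cell_siblings(v), cell_round_robin(v), cell_interleave(v));
    return 0;
}

For instance, vcpu 9 lands in cell 1 under 'contiguous' but in cell 0
under 'round-robin', which is exactly the placement difference the
guest scheduler ends up seeing.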
The optional subelement <memory> specifies the memory size reserved for
the guest to dimension its <numa> <cell id> size. If no memory is
specified, the <vnuma> <memory> setting is acquired from the guest's
total memory, i.e. the <domain> <memory> setting.

The optional subelement <partition> is only active when <vnuma
mode='node'> is in effect and allows for defining the active "nodeset"
and "cells" to target under the "guest" domain. For example, the
"nodeset" attribute can limit the host NUMA nodes in effect under the
guest with the help of NUMA node tuning (<numatune>). Alternatively,
the "cells" attribute can define the number of vNUMA cells to render
for the guest.

We're planning a 'virsh vnuma' command to convert existing guest
domains to one of these vNUMA models.

Wim ten Have (4):
  XML definitions for guest vNUMA and parsing routines
  qemu: driver changes adding vNUMA vCPU hotplug support
  qemu: driver changes adding vNUMA memory hotplug support
  tests: add various tests to exercise vNUMA host partitioning

 docs/formatdomain.html.in                     |  94 ++++
 docs/schemas/domaincommon.rng                 |  65 +++
 src/conf/domain_conf.c                        | 482 +++++++++++++++++-
 src/conf/domain_conf.h                        |   2 +
 src/conf/numa_conf.c                          | 241 ++++++++-
 src/conf/numa_conf.h                          |  58 ++-
 src/libvirt_private.syms                      |   8 +
 src/qemu/qemu_driver.c                        |  65 ++-
 src/qemu/qemu_hotplug.c                       |  95 +++-
 .../cpu-host-passthrough-nonuma.args          |  29 ++
 .../cpu-host-passthrough-nonuma.xml           |  19 +
 .../cpu-host-passthrough-numa-contiguous.args |  37 ++
 .../cpu-host-passthrough-numa-contiguous.xml  |  20 +
 .../cpu-host-passthrough-numa-interleave.args |  41 ++
 .../cpu-host-passthrough-numa-interleave.xml  |  19 +
 ...host-passthrough-numa-node-contiguous.args |  53 ++
 ...-host-passthrough-numa-node-contiguous.xml |  21 +
 ...host-passthrough-numa-node-interleave.args |  41 ++
 ...-host-passthrough-numa-node-interleave.xml |  22 +
 ...ost-passthrough-numa-node-round-robin.args | 125 +++++
 ...host-passthrough-numa-node-round-robin.xml |  21 +
 ...u-host-passthrough-numa-node-siblings.args |  32 ++
 ...pu-host-passthrough-numa-node-siblings.xml |  23 +
 ...cpu-host-passthrough-numa-round-robin.args |  37 ++
 .../cpu-host-passthrough-numa-round-robin.xml |  22 +
 .../cpu-host-passthrough-numa-siblings.args   |  37 ++
 .../cpu-host-passthrough-numa-siblings.xml    |  20 +
 .../cpu-host-passthrough-numa.args            |  37 ++
 .../cpu-host-passthrough-numa.xml             |  20 +
 tests/qemuxml2argvtest.c                      |  10 +
 30 files changed, 1765 insertions(+), 31 deletions(-)
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-nonuma.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-nonuma.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-contiguous.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-contiguous.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-interleave.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-interleave.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-contiguous.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-contiguous.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-interleave.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-interleave.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-round-robin.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-round-robin.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-siblings.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-siblings.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-round-robin.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-round-robin.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-siblings.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-siblings.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa.xml

-- 
2.21.0
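Before the patches themselves, a worked example of the dimensioning
arithmetic the series applies (a standalone sketch under assumptions:
the cover letter's <vnuma> figures of 524288 KiB and 8 cells, plus a
hypothetical 16-vcpu guest on a host with 2 SMT threads per core):

#include <stdio.h>

int main(void)
{
    unsigned long long mem = 524288;   /* <memory unit='KiB'> */
    unsigned long long ncells = 8;     /* <partition ... cells="8"/> */
    unsigned long long maxvcpus = 16;  /* <vcpu> */
    unsigned long long threads = 2;    /* host SMT threads per core */

    /* the guest is renderable only if vcpus divide evenly over cells */
    if (maxvcpus % ncells || maxvcpus < ncells * threads) {
        fprintf(stderr, "cannot arrange %llu vcpus over %llu nodes\n",
                maxvcpus, ncells);
        return 1;
    }

    printf("per-cell memory: %llu KiB\n", mem / ncells);      /* 65536 */
    printf("topology: sockets=%llu cores=%llu threads=%llu\n",
           ncells, maxvcpus / (ncells * threads), threads);   /* 8/1/2 */
    return 0;
}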

From: Wim ten Have <wim.ten.have@oracle.com>

This patch adds XML definitions for a guest vNUMA layout and the
routines to parse them. The guest vNUMA specification looks like:

    <vnuma mode='host|node'
           distribution='contiguous|siblings|round-robin|interleave'>
      <memory unit='#unitsize'>size</memory>
      <partition nodeset='#nodes' cells='#cells'/>
    </vnuma>

With mode='host' the guest XML is rendered to match the host's NUMA
topology. With mode='node' the guest XML is rendered according to the
"nodeset" and "cells" attributes of the <partition> subelement.

Signed-off-by: Wim ten Have <wim.ten.have@oracle.com>
---
 docs/formatdomain.html.in     |  94 +++++++
 docs/schemas/domaincommon.rng |  65 +++++
 src/conf/domain_conf.c        | 482 +++++++++++++++++++++++++++++++++-
 src/conf/domain_conf.h        |   2 +
 src/conf/numa_conf.c          | 241 ++++++++++++++++-
 src/conf/numa_conf.h          |  58 +++-
 src/libvirt_private.syms      |   8 +
 7 files changed, 932 insertions(+), 18 deletions(-)

diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index 962766b792d3..80165f9bd896 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -1294,6 +1294,98 @@
     </dl>
 
+    <h3><a id="elementsvNUMAPartitioning">NUMA Host or Node Partitioning</a></h3>
+
+    <p>
+      With the help of the <code>vnuma</code> element, libvirt can
+      dynamically partition a guest domain for vNUMA by rendering its XML
+      into a 'host' or 'node' <a href="#elementsNUMAtopology"><code>NUMA
+      topology</code></a> matching model.
+    </p>
+
+<pre>
+<domain>
+  ...
+  <vnuma mode='host|node' distribution='contiguous|siblings|round-robin|interleave'>
+    <memory unit='KiB'>524288</memory>
+    <partition nodeset="1-4,^3" cells="8"/>
+  </vnuma>
+  ...
+</domain>
+</pre>
+
+    <dl>
+      <dt><code>vnuma</code></dt>
+      <dd>
+        The attribute <code>mode</code> selects a specific rendering
+        method. Its value is either "host" or "node". If <code>mode</code>
+        is set to "host", the guest domain is automatically partitioned
+        to match the host NUMA topology. If <code>mode</code>
+        is "node", the guest domain is partitioned according to the
+        <code>nodeset</code> and <code>cells</code> under the
+        <code>vnuma</code> <code>partition</code> subelement.
+        <span class="since">Since 5.9</span>
+
+        The optional attribute <code>distribution</code> selects the
+        guest <a href="#elementsNUMAtopology"><code>numa</code></a>
+        <code>cell</code> <code>cpus</code> distribution. The following
+        values are allowed (<span class="since">Since 5.9</span>):
+        <dl>
+          <dt><code>contiguous</code></dt>
+          <dd> The cpus are enumerated sequentially over the
+            <a href="#elementsNUMAtopology"><code>numa</code></a> defined
+            cells.
+          </dd>
+          <dt><code>siblings</code></dt>
+          <dd> The cpus are distributed over the
+            <a href="#elementsNUMAtopology"><code>numa</code></a>
+            cells matching the host CPU SMT model.
+          </dd>
+          <dt><code>round-robin</code></dt>
+          <dd> The cpus are distributed over the
+            <a href="#elementsNUMAtopology"><code>numa</code></a>
+            cells matching the host CPU topology.
+          </dd>
+          <dt><code>interleave</code></dt>
+          <dd> The cpus are interleaved one at a time over the
+            <a href="#elementsNUMAtopology"><code>numa</code></a> cells.
+          </dd>
+        </dl>
+      </dd>
+
+      <dt><code>memory</code></dt>
+      <dd>
+        The optional subelement <code>memory</code> specifies the
+        memory size reserved for the guest assigned
+        <a href="#elementsNUMAtopology"><code>numa</code></a> cells.
+        <span class="since">Since 1.2.11</span>, one can use an additional
+        <a href="#elementsMemoryAllocation"><code>unit</code></a>
+        attribute to define the units in which this <code>memory</code>
+        size is quantified. If no <code>memory</code> is specified, its
+        value is acquired from the domain
+        <a href="#elementsMemoryAllocation">memory</a> setting.
+        <span class="since">Since 5.9</span>
+      </dd>
+
+      <dt><code>partition</code></dt>
+      <dd>
+        The optional subelement <code>partition</code> is only active when
+        <a href="#elementsvNUMAPartitioning"><code>vnuma</code></a>
+        <code>mode</code> "node" is selected and allows for defining the
+        active "nodeset" and "cells" to target under the "guest" domain.
+        For example, the specified <code>nodeset</code> can limit the
+        <a href="#elementsNUMATuning"><code>numatune</code></a> assigned
+        host NUMA nodes in effect under the "guest". Alternatively,
+        the provided <code>cells</code> attribute can define the number
+        of <a href="#elementsNUMAtopology"><code>numa</code></a> cells
+        to render.
+        <span class="since">Since 5.9</span>
+      </dd>
+    </dl>
+
+
     <h3><a id="elementsNUMATuning">NUMA Node Tuning</a></h3>
 
     <pre>
@@ -1755,6 +1847,8 @@
       </dd>
     </dl>
 
+    <h3><a id="elementsNUMAtopology">NUMA topology</a></h3>
+
     <p>
       Guest NUMA topology can be specified using the <code>numa</code> element.
       <span class="since">Since 0.9.8</span>
diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index e06f892da393..227c856a362c 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -786,6 +786,10 @@
         <ref name="cputune"/>
       </optional>
 
+      <optional>
+        <ref name="vnuma"/>
+      </optional>
+
       <optional>
         <ref name="numatune"/>
       </optional>
@@ -1062,6 +1066,67 @@
     </choice>
   </define>
 
+  <!-- All the "host vnuma" related tunables would go in the vnuma -->
+  <define name="vnuma">
+    <element name="vnuma">
+      <optional>
+        <ref name="vnumaMode"/>
+      </optional>
+      <optional>
+        <ref name="vnumaDistribution"/>
+      </optional>
+      <interleave>
+        <optional>
+          <element name="memory">
+            <ref name="scaledInteger"/>
+          </element>
+        </optional>
+        <optional>
+          <element name="partition">
+            <optional>
+              <ref name="vnumaNodeset"/>
+            </optional>
+            <optional>
+              <ref name="vnumaCells"/>
+            </optional>
+          </element>
+        </optional>
+      </interleave>
+    </element>
+  </define>
+
+  <define name="vnumaMode">
+    <attribute name="mode">
+      <choice>
+        <value>host</value>
+        <value>node</value>
+      </choice>
+    </attribute>
+  </define>
+
+  <define name="vnumaDistribution">
+    <attribute name="distribution">
+      <choice>
+        <value>contiguous</value>
+        <value>siblings</value>
+        <value>round-robin</value>
+        <value>interleave</value>
+      </choice>
+    </attribute>
+  </define>
+
+  <define name="vnumaNodeset">
+    <attribute name='nodeset'>
+      <ref name='cpuset'/>
+    </attribute>
+  </define>
+
+  <define name="vnumaCells">
+    <attribute name='cells'>
+      <ref name="positiveInteger"/>
+    </attribute>
+  </define>
+
   <!-- All the NUMA related tunables would go in the numatune -->
   <define name="numatune">
     <element name="numatune">
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 317e7846ceb0..32b29740bffd 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -1824,6 +1824,18 @@ virDomainDefSetVcpusMax(virDomainDefPtr def,
     if (def->maxvcpus == maxvcpus)
         return 0;
 
+    if (virDomainVnumaIsEnabled(def->numa)) {
+        size_t nnumaCell = virDomainNumaGetNodeCount(def->numa);
+
+        if (maxvcpus % nnumaCell) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                           _("vNUMA: the maximum vCPU count %d is not a "
+                             "multiple of the configured vNUMA node count %ld"),
+                           maxvcpus, nnumaCell);
+            return -1;
+        }
+    }
+
     if (def->maxvcpus < maxvcpus) {
         if (VIR_EXPAND_N(def->vcpus, def->maxvcpus, maxvcpus - def->maxvcpus) < 0)
             return -1;
@@ -2067,6 +2079,394 @@ virDomainDefGetVcpusTopology(const virDomainDef *def,
 }
 
 
+void
+virDomainDefSetVcpusVnuma(virDomainDefPtr def,
+                          size_t nvcpus)
+{
+    int vcpuscnt = nvcpus;
+    size_t cell, i;
+    size_t vcpu_node;
+    size_t nnumaCell = virDomainNumaGetNodeCount(def->numa);
+
+    if (!nnumaCell)
+        return;
+
+    /* vcpu_node represents the maximum vcpus per vNUMA
+     * node that theoretically could be within a set.
+     */
+    vcpu_node = (def->maxvcpus / nnumaCell) + ((def->maxvcpus % nnumaCell) ? 1 : 0);
+
+    for (i = 0; i < vcpu_node; i++) {
+        for (cell = 0; cell < nnumaCell; cell++) {
+            virDomainVcpuDefPtr vcpu;
+            size_t cid = cell * vcpu_node + i;
+
+            if (cid >= def->maxvcpus)
+                break;
+
+            vcpu = def->vcpus[cid];
+
+            if (vcpuscnt-- > 0)
+                vcpu->online = true;
+            else
+                vcpu->online = false;
+
+            /* vCPU0 cannot be hotplugged */
+            if (cid)
+                vcpu->hotpluggable = true;
+        }
+    }
+    def->individualvcpus = true;
+
+    return;
+}
+
+
+/**
+ * virDomainNumaAutoconfig: vNUMA automatic host partition processing
+ * @def: domain definition
+ * @caps: host capabilities
+ *
+ * vNUMA automatic host partitioning is requested by adding the <vnuma
+ * mode=...> element to the guest XML. See virDomainVnumaParseXML() for
+ * parsing the related XML and filling the virDomainAutoPartition structure.
+ *
+ * If the virDomainAutoPartition structure is valid, libvirt takes into
+ * account the host hardware configuration (including maxvcpus, online
+ * vcpus, and memory) and creates the guest such that vcpus and memory
+ * are spread evenly across the host.
+ *
+ * Returns 0 on success and -1 on error.
+ */
+static int
+virDomainNumaAutoconfig(virDomainDefPtr def,
+                        virCapsPtr caps)
+{
+    int ret = -1;
+    virBitmapPtr nodeset = NULL;
+    virDomainNumaPtr numa = def->numa;
+    virDomainAutoPartitionPtr avnuma;
+
+    if (!numa)
+        goto error;
+
+    if (caps &&
+        (avnuma = virDomainVnumaParseXML(numa, NULL))) {
+
+        size_t i, j, cell;
+        size_t nvcpus = 0;
+        size_t nnumaCell = 0;
+        size_t vcpu_node;
+        unsigned long long memsizeCell = 0;
+        virCapsHostPtr host = &caps->host;
+        unsigned int threads = host->cpu->threads;
+
+        if (!def->cpu) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                           _("vNUMA: unable to render <vnuma> partitioning for "
+                             "domain %s because of undefined <cpu ... /> topology."),
+                           def->name);
+            goto error;
+        }
+
+        if (!avnuma->nodeset) {
+            if (!(avnuma->nodeset = virBitmapNew(host->nnumaCell)))
+                goto cleanup;
+
+            for (i = 0; i < host->nnumaCell; i++)
+                if (virBitmapSetBit(avnuma->nodeset, i) < 0)
+                    goto cleanup;
+        }
+
+        /* Set the vNUMA cell count */
+        nnumaCell = avnuma->vcell ? avnuma->vcell : virBitmapCountBits(avnuma->nodeset);
+
+        if (!nnumaCell)
+            goto cleanup;
+
+        /* Compute the online vcpus */
+        for (i = 0; i < def->maxvcpus; i++)
+            if (def->vcpus[i]->online)
+                nvcpus++;
+
+        /* vcpu_node represents the maximum vcpus per numanode that
+         * theoretically could be within a set.
+         */
+        vcpu_node = (def->maxvcpus / nnumaCell) + ((def->maxvcpus % nnumaCell) ? 1 : 0);
+
+        /* Do the host provided "CPU topology" threads fit? */
+        threads = (nnumaCell % threads) ? 1 : threads;
+
+        /* Is it possible to render the guest for vNUMA auto partition?
+         */
+        if ((def->maxvcpus % nnumaCell) ||
+            (def->maxvcpus < (nnumaCell * threads))) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                           _("vNUMA: %ld vcpus is insufficient to "
+                             "arrange a vNUMA topology for %ld nodes."),
+                           def->maxvcpus, nnumaCell);
+            goto error;
+        }
+
+        /* Compute the memory size (memsizeCell) per arranged nnumaCell.
+         * If no memory for vNUMA auto partitioning was specified then
+         * compute its value from the total_memory configuration.
+         */
+        if ((memsizeCell = avnuma->mem / nnumaCell) == 0) {
+            unsigned long long hotplugMemory = 0;
+
+            /* Calculate the size of hotplug memory */
+            for (i = 0; i < def->nmems; i++)
+                hotplugMemory += def->mems[i]->size;
+
+            memsizeCell = (def->mem.total_memory - hotplugMemory) / nnumaCell;
+        }
+
+        /* Under vNUMA automatic host partitioning the 'memballoon' controlled
+         * cur_balloon value should reflect the guest's total_memory setting.
+         */
+        def->mem.cur_balloon = def->mem.total_memory;
+
+        /* Correct vNUMA can only be accomplished if the number of maxvcpus
+         * is a multiple of the number of physical nodes. If this is not
+         * possible we set sockets, cores and threads to 0 so libvirt creates
+         * a default topology where all vcpus appear as sockets and cores and
+         * threads are set to 1.
+         */
+        if (def->maxvcpus % (nnumaCell * threads)) {
+            VIR_WARN("Disabling guest %s auto vNUMA topology because configured "
+                     "%ld vCPUs do not match the host's %ld NUMA nodes to produce "
+                     "an evenly balanced CPU topology.",
+                     def->name, def->maxvcpus, nnumaCell);
+            def->cpu->sockets = def->cpu->cores = def->cpu->threads = 0;
+        } else {
+            /* The topology computed below aims to align the guest's sockets,
+             * cores and threads with the host's topology.
+             */
+            def->cpu->cores = def->maxvcpus / (nnumaCell * threads);
+            def->cpu->threads = threads;
+            def->cpu->sockets = nnumaCell;
+        }
+
+        /* Build the vNUMA topology. The previous configuration may
+         * have changed entirely, so free the current NUMA allocation
+         * and start over from scratch.
+         */
+        virDomainNumaFree(numa);
+        if (!(numa = virDomainNumaNew()))
+            goto error;
+
+        /* We're clean and good to rebuild the entire guest domain
+         * respecting the requested vNUMA topology provided by the
+         * <vnuma> avnuma stored objects.
+         */
+        avnuma->mem = memsizeCell * nnumaCell;
+
+        if (!virDomainNumaSetNodeCount(numa, nnumaCell))
+            goto error;
+
+        if (!(nodeset = virBitmapNewCopy(avnuma->nodeset)))
+            goto error;
+
+        for (cell = 0; cell < nnumaCell; cell++) {
+            size_t ndistances;
+            size_t vcell = cell % host->nnumaCell;
+            size_t vcpu_strt, vcpu_last, vcpu_left;
+            ssize_t node = 0;
+            unsigned int cores = def->cpu->cores;
+            virBitmapPtr cpumask = NULL;
+            virBitmapPtr vnumask = NULL;
+            virCapsHostNUMACell *numaCell = NULL;
+
+            /* per NUMA cell memory size */
+            virDomainNumaSetNodeMemorySize(numa, cell, memsizeCell);
+
+            /* per NUMA cell bind memory (mode='strict') */
+            if ((node = virBitmapNextSetBit(nodeset, (vcell-1))) < 0)
+                node = vcell - 1;
+
+            if (node >= host->nnumaCell) {
+                virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                               _("vNUMA: domain %s defined nodeset node %ld "
+                                 "is out of range. Valid range is 0-%ld"),
+                               def->name, node, (host->nnumaCell-1));
+                goto error;
+            }
+
+            if (virDomainNumatuneSetmemset(numa, cell, node,
+                                           VIR_DOMAIN_NUMATUNE_MEM_STRICT) < 0)
+                goto error;
+
+            /* per NUMA cell vcpu range to mask */
+            if (!(cpumask = virBitmapNew(def->maxvcpus)))
+                goto error;
+
+            switch (avnuma->distribution) {
+            case VIR_DOMAIN_VNUMA_DISTRIBUTION_CONTIGUOUS:
+                /* vcpus are equally balanced from 0 to the highest vcpu id
+                 * available, keeping ranges contiguous where the maximum vcpu
+                 * sets run from the lowest vNUMA cells to the highest available.
+                 */
+                vcpu_strt = cell * vcpu_node;
+                vcpu_last = MIN(vcpu_strt + vcpu_node, def->maxvcpus);
+
+                for (i = vcpu_strt; i < vcpu_last; i++) {
+                    if (virBitmapSetBitExpand(cpumask, i) < 0) {
+                        virBitmapFree(cpumask);
+                        goto error;
+                    }
+                }
+                break;
+
+            case VIR_DOMAIN_VNUMA_DISTRIBUTION_SIBLINGS:
+                /* Create vNUMA node vcpu ranges that represent a clean
+                 * processor sockets/core/threads model, placing one
+                 * socket per NUMA node.
+                 */
+                vcpu_strt = cell * cores;
+                vcpu_last = def->maxvcpus;
+                vcpu_left = def->maxvcpus / threads;
+
+                for (i = vcpu_strt; i < vcpu_last; i += vcpu_left) {
+                    for (j = 0; j < cores; j++) {
+                        unsigned int id = i + j;
+
+                        if (id < def->maxvcpus &&
+                            virBitmapSetBitExpand(cpumask, id) < 0) {
+                            virBitmapFree(cpumask);
+                            goto error;
+                        }
+                    }
+                }
+                break;
+
+            case VIR_DOMAIN_VNUMA_DISTRIBUTION_ROUNDROBIN:
+                /* Create vNUMA node vcpu ranges that round-robin
+                 * interleave one core per node over the available nodes.
+                 */
+                vcpu_strt = cell * threads;
+                vcpu_last = def->maxvcpus;
+                vcpu_left = threads * nnumaCell;
+
+                for (i = vcpu_strt; i < vcpu_last; i += vcpu_left) {
+                    for (j = 0; j < threads; j++) {
+                        unsigned int id = i + j;
+
+                        if (id < def->maxvcpus &&
+                            virBitmapSetBitExpand(cpumask, id) < 0) {
+                            virBitmapFree(cpumask);
+                            goto error;
+                        }
+                    }
+                }
+                break;
+
+            case VIR_DOMAIN_VNUMA_DISTRIBUTION_INTERLEAVE:
+                /* Distribute vCPUs over the NUMA nodes in a round-robin,
+                 * interleaved fashion, with one vCPU (thread) per node.
+                 */
+                def->cpu->sockets = def->cpu->cores = def->cpu->threads = 0;
+                for (i = cell; i < def->maxvcpus; i += nnumaCell) {
+                    if (virBitmapSetBitExpand(cpumask, i) < 0) {
+                        virBitmapFree(cpumask);
+                        goto error;
+                    }
+                }
+                break;
+
+            default:
+                virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                               _("vNUMA: domain %s non-existent vCPU distribution requested."),
+                               def->name);
+                goto error;
+                break;
+            }
+
+            if (virDomainNumaSetNodeCpumask(numa, cell, cpumask) == NULL)
+                goto error;
+
+            /* per NUMA cpus sibling vNUMA pinning */
+            numaCell = host->numaCell[node];
+            if (!(vnumask = virBitmapNew(nnumaCell * numaCell->ncpus)))
+                goto error;
+
+            for (i = 0; i < numaCell->ncpus; i++) {
+                unsigned int id = numaCell->cpus[i].id;
+
+                if (virBitmapSetBitExpand(vnumask, id) < 0) {
+                    virBitmapFree(vnumask);
+                    goto error;
+                }
+            }
+
+            for (i = 0; i < def->maxvcpus; i++) {
+                if (virBitmapIsBitSet(cpumask, i)) {
+                    if (!(def->vcpus[i]->cpumask = virBitmapNewCopy(vnumask)))
+                        goto error;
+                }
+            }
+            virBitmapFree(vnumask);
+
+            /* per NUMA cell sibling distances */
+            numaCell = host->numaCell[node];
+            switch (avnuma->mode) {
+            case VIR_DOMAIN_VNUMA_MODE_HOST:
+                ndistances = numaCell->nsiblings;
+                break;
+
+            case VIR_DOMAIN_VNUMA_MODE_NODE:
+                ndistances = 1;
+                if (avnuma->vcell)
+                    vcell = cell;
+                else
+                    if (virBitmapClearBit(nodeset, node) < 0)
+                        goto error;
+
+                break;
+
+            default:
+                goto error;
+            }
+
+            /* Set vNUMA distances */
+            if (ndistances > 1) {
+                if (virDomainNumaSetNodeDistanceCount(numa,
+                                                      vcell,
+                                                      ndistances) < 0) {
+                    virReportError(VIR_ERR_INTERNAL_ERROR,
+                                   _("vNUMA: domain %s failed to render a "
+                                     "matching vNUMA node distances set, defined "
+                                     "vNUMA nodes %ld build on %ld host nodes."),
+                                   def->name, nnumaCell, ndistances);
+                    goto error;
+                }
+
+                for (i = 0; i < ndistances; i++) {
+                    unsigned int distance = numaCell->siblings[i].distance;
+
+                    if (virDomainNumaSetNodeDistance(numa, cell, i, distance) != distance)
+                        goto error;
+                }
+            }
+        }
+
+        /* We're done - enable the vNUMA marker */
+        virDomainVnumaSetEnabled(numa, avnuma);
+
+        /* Adjust the newly created vNUMA description */
+        def->numa = numa;
+
+        /* per NUMA cpus sibling vNUMA hotplugging directives */
+        virDomainDefSetVcpusVnuma(def, virDomainDefGetVcpus(def));
+    }
+ cleanup:
+
+    ret = 0;
+
+ error:
+    virBitmapFree(nodeset);
+    return ret;
+}
+
+
 virDomainDiskDefPtr
 virDomainDiskDefNew(virDomainXMLOptionPtr xmlopt)
 {
@@ -10510,6 +10910,38 @@ virDomainDefSetMemoryTotal(virDomainDefPtr def,
 }
 
 
+/**
+ * virDomainDefSetNUMAMemoryTotal:
+ * @def: domain definition
+ * @size: size to set
+ * @caps: host capabilities
+ *
+ * A frontend to set the total memory size in @def. If the guest's
+ * configured "total_memory" setting and the requested "size" differ,
+ * call virDomainNumaAutoconfig() to evenly distribute the additional
+ * memory across all vNUMA nodes.
+ */
+int
+virDomainDefSetNUMAMemoryTotal(virDomainDefPtr def,
+                               unsigned long long size,
+                               virCapsPtr caps)
+{
+    bool DoNumaAutoConfig = (def->mem.total_memory != size);
+
+    if (DoNumaAutoConfig) {
+        if (virDomainVnumaSetMemory(def->numa, size) < 0)
+            return -1;
+
+        if (virDomainNumaAutoconfig(def, caps))
+            return -1;
+
+        if (virDomainDefPostParseMemory(def, VIR_DOMAIN_DEF_PARSE_ABI_UPDATE) < 0)
+            return -1;
+    }
+    return 0;
+}
+
+
 /**
  * virDomainDefGetMemoryTotal:
  * @def: domain definition
@@ -18809,7 +19241,8 @@ virDomainIOThreadSchedParse(xmlNodePtr node,
 static int
 virDomainVcpuParse(virDomainDefPtr def,
                    xmlXPathContextPtr ctxt,
-                   virDomainXMLOptionPtr xmlopt)
+                   virDomainXMLOptionPtr xmlopt,
+                   bool IsAvNUMA)
 {
     int n;
     xmlNodePtr vcpuNode;
@@ -18876,6 +19309,15 @@ virDomainVcpuParse(virDomainDefPtr def,
     if (virDomainDefSetVcpusMax(def, maxvcpus, xmlopt) < 0)
         return -1;
 
+    /* If vNUMA applies, def->numa is reinitialized later */
+    if (IsAvNUMA) {
+
+        if (virDomainDefSetVcpus(def, vcpus) < 0)
+            return -1;
+
+        return 0;
+    }
+
     if ((n = virXPathNodeSet("./vcpus/vcpu", ctxt, &nodes)) < 0)
         return -1;
 
@@ -19746,6 +20188,7 @@ virDomainDefParseXML(xmlDocPtr xml,
     char *netprefix = NULL;
     g_autofree xmlNodePtr *nodes = NULL;
     g_autofree char *tmp = NULL;
+    bool IsAvNUMA;
 
     if (flags & VIR_DOMAIN_DEF_PARSE_VALIDATE_SCHEMA) {
         g_autofree char *schema = NULL;
@@ -19871,6 +20314,8 @@ virDomainDefParseXML(xmlDocPtr xml,
     }
     VIR_FREE(tmp);
 
+    IsAvNUMA = virDomainVnumaParseXML(def->numa, ctxt) ? true : false;
+
     tmp = virXPathString("string(./memoryBacking/source/@type)", ctxt);
     if (tmp) {
         if ((def->mem.source = virDomainMemorySourceTypeFromString(tmp)) <= 0) {
@@ -19986,7 +20431,7 @@ virDomainDefParseXML(xmlDocPtr xml,
                                 &def->mem.swap_hard_limit) < 0)
         goto error;
 
-    if (virDomainVcpuParse(def, ctxt, xmlopt) < 0)
+    if (virDomainVcpuParse(def, ctxt, xmlopt, IsAvNUMA) < 0)
         goto error;
 
     if (virDomainDefParseIOThreads(def, ctxt) < 0)
@@ -20059,14 +20504,16 @@ virDomainDefParseXML(xmlDocPtr xml,
             goto error;
     }
 
-    if ((n = virXPathNodeSet("./cputune/vcpupin", ctxt, &nodes)) < 0)
-        goto error;
-
-    for (i = 0; i < n; i++) {
-        if (virDomainVcpuPinDefParseXML(def, nodes[i]))
+    if (!IsAvNUMA) {
+        if ((n = virXPathNodeSet("./cputune/vcpupin", ctxt, &nodes)) < 0)
             goto error;
+
+        for (i = 0; i < n; i++) {
+            if (virDomainVcpuPinDefParseXML(def, nodes[i]))
+                goto error;
+        }
+        VIR_FREE(nodes);
     }
-    VIR_FREE(nodes);
 
     if ((n = virXPathNodeSet("./cputune/emulatorpin", ctxt, &nodes)) < 0) {
         virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
@@ -20173,6 +20620,10 @@ virDomainDefParseXML(xmlDocPtr xml,
     if (virDomainNumaDefCPUParseXML(def->numa, ctxt) < 0)
         goto error;
 
+    /* Check and update the guest's XML vNUMA topology if needed */
+    if (virDomainNumaAutoconfig(def, caps))
+        goto error;
+
     if (virDomainNumaGetCPUCountTotal(def->numa) > virDomainDefGetVcpusMax(def)) {
         virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                        _("Number of CPUs in <numa> exceeds the"
@@ -20186,10 +20637,11 @@ virDomainDefParseXML(xmlDocPtr xml,
         goto error;
     }
 
-    if (virDomainNumatuneParseXML(def->numa,
-                                  def->placement_mode ==
-                                  VIR_DOMAIN_CPU_PLACEMENT_MODE_STATIC,
-                                  ctxt) < 0)
+    if (!virDomainVnumaIsEnabled(def->numa) &&
+        (virDomainNumatuneParseXML(def->numa,
+                                   def->placement_mode ==
+                                   VIR_DOMAIN_CPU_PLACEMENT_MODE_STATIC,
+                                   ctxt) < 0))
         goto error;
 
     if (virDomainNumatuneHasPlacementAuto(def->numa) &&
@@ -28496,6 +28948,9 @@ virDomainDefFormatInternalSetRootName(virDomainDefPtr def,
     if (virDomainMemtuneFormat(buf, &def->mem) < 0)
         goto error;
 
+    if (virDomainVnumaFormatXML(buf, def->numa) < 0)
+        goto error;
+
     if (virDomainCpuDefFormat(buf, def) < 0)
         goto error;
 
@@ -29148,6 +29603,9 @@ virDomainSaveConfig(const char *configDir,
 {
     g_autofree char *xml = NULL;
 
+    if (virDomainNumaAutoconfig(def, caps) < 0)
+        return -1;
+
     if (!(xml = virDomainDefFormat(def, caps,
                                    VIR_DOMAIN_DEF_FORMAT_SECURE)))
         return -1;
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 5a17acedf299..0db77d9247a1 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -2535,6 +2535,7 @@ struct _virDomainDef {
 
 unsigned long long virDomainDefGetMemoryInitial(const virDomainDef *def);
 void virDomainDefSetMemoryTotal(virDomainDefPtr def, unsigned long long size);
+int virDomainDefSetNUMAMemoryTotal(virDomainDefPtr def, unsigned long long size, virCapsPtr caps);
 unsigned long long virDomainDefGetMemoryTotal(const virDomainDef *def);
 bool virDomainDefHasMemoryHotplug(const virDomainDef *def);
 
@@ -2816,6 +2817,7 @@ int virDomainDefSetVcpusMax(virDomainDefPtr def,
 bool virDomainDefHasVcpusOffline(const virDomainDef *def);
 unsigned int virDomainDefGetVcpusMax(const virDomainDef *def);
 int virDomainDefSetVcpus(virDomainDefPtr def, unsigned int vcpus);
+void virDomainDefSetVcpusVnuma(virDomainDefPtr def, size_t vcpus);
 unsigned int virDomainDefGetVcpus(const virDomainDef *def);
 virBitmapPtr virDomainDefGetOnlineVcpumap(const virDomainDef *def);
 virDomainVcpuDefPtr virDomainDefGetVcpu(virDomainDefPtr def, unsigned int vcpu)
diff --git a/src/conf/numa_conf.c b/src/conf/numa_conf.c
index 6720d5620d1d..8e6ef4008b8d 100644
--- a/src/conf/numa_conf.c
+++ b/src/conf/numa_conf.c
@@ -45,6 +45,20 @@ VIR_ENUM_IMPL(virDomainNumatuneMemMode,
               "interleave",
 );
 
+VIR_ENUM_IMPL(virDomainVnumaMode,
+              VIR_DOMAIN_VNUMA_MODE_LAST,
+              "host",
+              "node",
+);
+
+VIR_ENUM_IMPL(virDomainVnumaDistribution,
+              VIR_DOMAIN_VNUMA_DISTRIBUTION_LAST,
+              "contiguous",
+              "siblings",
+              "round-robin",
+              "interleave",
+);
+
 VIR_ENUM_IMPL(virDomainNumatunePlacement,
               VIR_DOMAIN_NUMATUNE_PLACEMENT_LAST,
               "default",
@@ -90,6 +104,7 @@ struct _virDomainNuma {
     size_t nmem_nodes;
 
     /* Future NUMA tuning related stuff should go here. */
+    virDomainAutoPartitionPtr avnuma;
 };
 
 
@@ -353,6 +368,156 @@ virDomainNumatuneFormatXML(virBufferPtr buf,
     return 0;
 }
 
+int
+virDomainVnumaFormatXML(virBufferPtr buf,
+                        virDomainNumaPtr numa)
+{
+    char *nodeset = NULL;
+
+    if (numa && virDomainVnumaIsEnabled(numa)) {
+
+        virBufferAddLit(buf, "<vnuma");
+        virBufferAsprintf(buf, " mode='%s'",
+                          virDomainVnumaModeTypeToString(numa->avnuma->mode));
+        virBufferAsprintf(buf, " distribution='%s'",
+                          virDomainVnumaDistributionTypeToString(numa->avnuma->distribution));
+        virBufferAddLit(buf, ">\n");
+
+        virBufferAdjustIndent(buf, 2);
+        virBufferAsprintf(buf, "<memory unit='KiB'>%llu</memory>\n",
+                          numa->avnuma->mem);
+
+        if (numa->avnuma->mode == VIR_DOMAIN_VNUMA_MODE_NODE) {
+            if ((nodeset = virBitmapFormat(numa->avnuma->nodeset))) {
+                virBufferAsprintf(buf, "<partition nodeset='%s'", nodeset);
+                VIR_FREE(nodeset);
+            }
+
+            if (numa->avnuma->vcell)
+                virBufferAsprintf(buf, " cells='%u'", numa->avnuma->vcell);
+            virBufferAddLit(buf, "/>\n");
+        }
+        virBufferAdjustIndent(buf, -2);
+
+        virBufferAddLit(buf, "</vnuma>\n");
+    }
+
+    return 0;
+}
+
+virDomainAutoPartitionPtr
+virDomainVnumaParseXML(virDomainNumaPtr numa,
+                       xmlXPathContextPtr ctxt)
+{
+    int ret = -1;
+    char *tmp = NULL;
+    xmlNodePtr node, oldnode;
+    virDomainAutoPartitionPtr avnuma = NULL;
+
+    if (!numa)
+        return NULL;
+
+    if (!ctxt)
+        return avnuma = numa->avnuma;
+
+    oldnode = ctxt->node;
+    node = virXPathNode("./vnuma[1]", ctxt);
+    if (node) {
+        int mode = -1;
+        int distribution = VIR_DOMAIN_VNUMA_DISTRIBUTION_CONTIGUOUS;
+        unsigned int maxvcell = 0;
+        unsigned long long mem = 0L;
+        virBitmapPtr nodeset = NULL;
+
+        if (!virXMLNodeNameEqual(node, "vnuma")) {
+            virReportError(VIR_ERR_XML_ERROR, "%s",
+                           _("domain definition does not contain expected 'vnuma' element"));
+            goto cleanup;
+        }
+
+        if (VIR_ALLOC(avnuma) < 0)
+            goto cleanup;
+
+        /* There has to be a valid vnuma mode setting */
+        if (!(tmp = virXMLPropString(node, "mode"))) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                           _("No vNUMA 'mode' specified for automatic host partitioning"));
+            goto cleanup;
+        }
+
+        if ((mode = virDomainVnumaModeTypeFromString(tmp)) < 0) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                           _("Unsupported automatic vNUMA partitioning mode '%s'"), tmp);
+            goto cleanup;
+        }
+        VIR_FREE(tmp);
+
+        /* If specified, get the vcpu 'distribution' type */
+        if ((tmp = virXMLPropString(node, "distribution")) &&
+            (distribution = virDomainVnumaDistributionTypeFromString(tmp)) < 0) {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                           _("Unsupported automatic vNUMA partitioning distribution '%s'"), tmp);
+            goto cleanup;
+        }
+        VIR_FREE(tmp);
+
+        /* Obtain the designated <vnuma mode='node'> attributes */
+        ctxt->node = node;
+        switch (mode) {
+        case VIR_DOMAIN_VNUMA_MODE_NODE:
+            if ((node = virXPathNode("./partition[1]", ctxt))) {
+
+                /* Get the host <partition> nodeset='#nodeset' for <numatune> */
+                if ((tmp = virXMLPropString(node, "nodeset"))) {
+                    if (virBitmapParse(tmp, &nodeset, VIR_DOMAIN_CPUMASK_LEN) < 0)
+                        goto cleanup;
+                    VIR_FREE(tmp);
+                }
+
+                /* Get the fictitious <partition> cells='#count' attribute */
+                if ((tmp = virXMLPropString(node, "cells"))) {
+                    if (virStrToLong_ui(tmp, NULL, 10, &maxvcell) < 0) {
+                        virReportError(VIR_ERR_XML_ERROR, "%s",
+                                       _("maximum vcpus count must be an integer"));
+                        goto cleanup;
+                    }
+                    VIR_FREE(tmp);
+                }
+            }
+            break;
+
+        case VIR_DOMAIN_VNUMA_MODE_HOST:
+        default:
+            break;
+        }
+
+        /* Get the <memory> size to render the <numa> nodes with */
+        if (virDomainParseMemory("./memory[1]", NULL, ctxt,
+                                 &mem, false, true) < 0)
+            goto cleanup;
+
+        /* We're set and good to go */
+        avnuma->mode = mode;
+        avnuma->distribution = distribution;
+        avnuma->nodeset = nodeset;
+        avnuma->mem = mem;
+        avnuma->vcell = maxvcell;
+
+        numa->avnuma = avnuma;
+    }
+    ret = 0;
+
+ cleanup:
+    if (ret) {
+        VIR_FREE(tmp);
+        VIR_FREE(avnuma);
+        avnuma = NULL;
+    }
+    ctxt->node = oldnode;
+
+    return avnuma;
+}
+
 void
 virDomainNumaFree(virDomainNumaPtr numa)
 {
@@ -572,6 +737,76 @@ virDomainNumatuneSet(virDomainNumaPtr numa,
     return ret;
 }
 
+int
+virDomainNumatuneSetmemset(virDomainNumaPtr numa,
+                           size_t cell,
+                           size_t node,
+                           int mode)
+{
+    int ret = -1;
+    virDomainNumaNodePtr mem_node = &numa->mem_nodes[cell];
+
+    /* Get out if this is under control of numad! */
+    if (numa->memory.specified)
+        goto cleanup;
+
+    /* Get out if numa does not apply */
+    if (cell > numa->nmem_nodes)
+        goto cleanup;
+
+    /* Get out if mode is out of range */
+    if (mode < 0 || mode >= VIR_DOMAIN_NUMATUNE_MEM_LAST) {
+        virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                       _("Unsupported numatune mode '%d'"),
+                       mode);
+        goto cleanup;
+    }
+
+    /* Force the numatune/memset setting */
+    if (!(mem_node->nodeset = virBitmapNew(numa->nmem_nodes)) ||
+        virBitmapSetBitExpand(mem_node->nodeset, node) < 0) {
+        virBitmapFree(mem_node->nodeset);
+        goto cleanup;
+    }
+    mem_node->mode = mode;
+
+    ret = 0;
+
+ cleanup:
+    return ret;
+}
+
+bool
+virDomainVnumaIsEnabled(virDomainNumaPtr numa)
+{
+    if (numa && numa->avnuma)
+        return numa->avnuma->specified;
+
+    return false;
+}
+
+void
+virDomainVnumaSetEnabled(virDomainNumaPtr numa,
+                         virDomainAutoPartitionPtr avnuma)
+{
+    if (numa && avnuma) {
+        numa->avnuma = avnuma;
+        numa->avnuma->specified = true;
+    }
+}
+
+int
+virDomainVnumaSetMemory(virDomainNumaPtr numa,
+                        unsigned long long size)
+{
+    if (!numa)
+        return -1;
+
+    numa->avnuma->mem = size;
+
+    return 0;
+}
+
 static bool
 virDomainNumaNodesEqual(virDomainNumaPtr n1,
                         virDomainNumaPtr n2)
@@ -1273,7 +1508,7 @@ virDomainNumaSetNodeDistance(virDomainNumaPtr numa,
 }
 
 
-size_t
+int
 virDomainNumaSetNodeDistanceCount(virDomainNumaPtr numa,
                                   size_t node,
                                   size_t ndistances)
@@ -1285,11 +1520,11 @@ virDomainNumaSetNodeDistanceCount(virDomainNumaPtr numa,
         virReportError(VIR_ERR_INTERNAL_ERROR,
                        _("Cannot alter an existing nmem_nodes distances set for node: %zu"),
                        node);
-        return 0;
+        return -1;
     }
 
     if (VIR_ALLOC_N(distances, ndistances) < 0)
-        return 0;
+        return -1;
 
     numa->mem_nodes[node].distances = distances;
     numa->mem_nodes[node].ndistances = ndistances;
diff --git a/src/conf/numa_conf.h b/src/conf/numa_conf.h
index e76a09c20cdc..bdc1deb6e143 100644
--- a/src/conf/numa_conf.h
+++ b/src/conf/numa_conf.h
@@ -32,6 +32,9 @@
 typedef struct _virDomainNuma virDomainNuma;
 typedef virDomainNuma *virDomainNumaPtr;
 
+typedef struct _virDomainAutoPartition virDomainAutoPartition;
+typedef virDomainAutoPartition *virDomainAutoPartitionPtr;
+
 typedef enum {
     VIR_DOMAIN_NUMATUNE_PLACEMENT_DEFAULT = 0,
     VIR_DOMAIN_NUMATUNE_PLACEMENT_STATIC,
@@ -43,6 +46,24 @@ typedef enum {
 VIR_ENUM_DECL(virDomainNumatunePlacement);
 VIR_ENUM_DECL(virDomainNumatuneMemMode);
 
+typedef enum {
+    VIR_DOMAIN_VNUMA_MODE_HOST = 0,
+    VIR_DOMAIN_VNUMA_MODE_NODE,
+
+    VIR_DOMAIN_VNUMA_MODE_LAST
+} virDomainVnumaMode;
+VIR_ENUM_DECL(virDomainVnumaMode);
+
+typedef enum {
+    VIR_DOMAIN_VNUMA_DISTRIBUTION_CONTIGUOUS = 0,
+    VIR_DOMAIN_VNUMA_DISTRIBUTION_SIBLINGS,
+    VIR_DOMAIN_VNUMA_DISTRIBUTION_ROUNDROBIN,
+    VIR_DOMAIN_VNUMA_DISTRIBUTION_INTERLEAVE,
+
+    VIR_DOMAIN_VNUMA_DISTRIBUTION_LAST
+} virDomainVnumaDistribution;
+VIR_ENUM_DECL(virDomainVnumaDistribution);
+
 typedef enum {
     VIR_DOMAIN_MEMORY_ACCESS_DEFAULT = 0,  /*  No memory access defined */
     VIR_DOMAIN_MEMORY_ACCESS_SHARED,    /* Memory access is set as shared */
@@ -52,6 +73,14 @@ typedef enum {
 } virDomainMemoryAccess;
 VIR_ENUM_DECL(virDomainMemoryAccess);
 
+struct _virDomainAutoPartition {
+    bool specified;             /* Auto vNUMA active */
+    int mode;                   /* Auto vNUMA mode */
+    int distribution;           /* Auto vNUMA distribution */
+    unsigned long long mem;     /* Auto vNUMA total memory */
+    unsigned int vcell;         /* Auto vNUMA node Cell */
+    virBitmapPtr nodeset;       /* Auto vNUMA host nodes where this guest node resides */
+};
 
 virDomainNumaPtr virDomainNumaNew(void);
 void virDomainNumaFree(virDomainNumaPtr numa);
@@ -67,9 +96,19 @@ int virDomainNumatuneParseXML(virDomainNumaPtr numa,
 int virDomainNumatuneFormatXML(virBufferPtr buf,
                                virDomainNumaPtr numatune)
     ATTRIBUTE_NONNULL(1);
 
+virDomainAutoPartitionPtr virDomainVnumaParseXML(virDomainNumaPtr numa,
+                                                 xmlXPathContextPtr ctxt)
+    ATTRIBUTE_NONNULL(1);
+
+int virDomainVnumaFormatXML(virBufferPtr buf, virDomainNumaPtr numa)
+    ATTRIBUTE_NONNULL(1);
+
 /*
  * Getters
  */
+bool virDomainVnumaIsEnabled(virDomainNumaPtr numa)
+    ATTRIBUTE_NONNULL(1);
+
 int virDomainNumatuneGetMode(virDomainNumaPtr numatune,
                              int cellid,
                              virDomainNumatuneMemMode *mode);
@@ -134,6 +173,19 @@ int virDomainNumatuneSet(virDomainNumaPtr numa,
                          virBitmapPtr nodeset)
     ATTRIBUTE_NONNULL(1);
 
+void virDomainVnumaSetEnabled(virDomainNumaPtr numa,
+                              virDomainAutoPartitionPtr avnuma)
+    ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2);
+int virDomainVnumaSetMemory(virDomainNumaPtr numa,
+                            unsigned long long size)
+    ATTRIBUTE_NONNULL(1);
+
+int virDomainNumatuneSetmemset(virDomainNumaPtr numa,
+                               size_t cell,
+                               size_t node,
+                               int mode)
+    ATTRIBUTE_NONNULL(1);
+
 size_t virDomainNumaSetNodeCount(virDomainNumaPtr numa,
                                  size_t nmem_nodes)
     ATTRIBUTE_NONNULL(1);
@@ -149,9 +201,9 @@ int virDomainNumaSetNodeDistance(virDomainNumaPtr numa,
                                  unsigned int value)
     ATTRIBUTE_NONNULL(1);
 
-size_t virDomainNumaSetNodeDistanceCount(virDomainNumaPtr numa,
-                                         size_t node,
-                                         size_t ndistances)
+int virDomainNumaSetNodeDistanceCount(virDomainNumaPtr numa,
+                                      size_t node,
+                                      size_t ndistances)
     ATTRIBUTE_NONNULL(1);
 
 virBitmapPtr virDomainNumaSetNodeCpumask(virDomainNumaPtr numa,
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index 17977229d18f..7f7c3fdeafaa 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -311,8 +311,10 @@ virDomainDefParseNode;
 virDomainDefParseString;
 virDomainDefPostParse;
 virDomainDefSetMemoryTotal;
+virDomainDefSetNUMAMemoryTotal;
 virDomainDefSetVcpus;
 virDomainDefSetVcpusMax;
+virDomainDefSetVcpusVnuma;
 virDomainDefValidate;
 virDomainDefVcpuOrderClear;
 virDomainDeleteConfig;
@@ -828,7 +830,13 @@ virDomainNumatuneParseXML;
 virDomainNumatunePlacementTypeFromString;
 virDomainNumatunePlacementTypeToString;
 virDomainNumatuneSet;
+virDomainNumatuneSetmemset;
 virDomainNumatuneSpecifiedMaxNode;
+virDomainVnumaFormatXML;
+virDomainVnumaIsEnabled;
+virDomainVnumaParseXML;
+virDomainVnumaSetEnabled;
+virDomainVnumaSetMemory;
 
 # conf/nwfilter_conf.h
-- 
2.21.0

From: Wim ten Have <wim.ten.have@oracle.com>

Add support for hot plugging/unplugging vCPUs in vNUMA partitioned
KVM guests.

Signed-off-by: Wim ten Have <wim.ten.have@oracle.com>
Signed-off-by: Menno Lageman <menno.lageman@oracle.com>
---
 src/qemu/qemu_driver.c  |  6 ++-
 src/qemu/qemu_hotplug.c | 95 ++++++++++++++++++++++++++++++++++++++---
 2 files changed, 94 insertions(+), 7 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 71947efa4e50..e64afcb8efc9 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -4965,14 +4965,16 @@ qemuDomainSetVcpusMax(virQEMUDriverPtr driver,
         return -1;
     }
 
-    if (virDomainNumaGetCPUCountTotal(persistentDef->numa) > nvcpus) {
+    if (!virDomainVnumaIsEnabled(persistentDef->numa) &&
+        virDomainNumaGetCPUCountTotal(persistentDef->numa) > nvcpus) {
         virReportError(VIR_ERR_INVALID_ARG, "%s",
                        _("Number of CPUs in <numa> exceeds the desired "
                          "maximum vcpu count"));
         return -1;
     }
 
-    if (virDomainDefGetVcpusTopology(persistentDef, &topologycpus) == 0 &&
+    if (!virDomainVnumaIsEnabled(persistentDef->numa) &&
+        virDomainDefGetVcpusTopology(persistentDef, &topologycpus) == 0 &&
         nvcpus != topologycpus) {
         /* allow setting a valid vcpu count for the topology so an invalid
          * setting may be corrected via this API */
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index 2d47f7461f93..2d48c5bba762 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -6081,6 +6081,60 @@ qemuDomainHotplugAddVcpu(virQEMUDriverPtr driver,
 }
 
 
+/**
+ * qemuDomainGetNumaMappedVcpuEntry:
+ *
+ * In case of a vNUMA guest description we need the node-mapped vcpu
+ * to ensure that guest vcpus are hot-plugged or hot-unplugged in a
+ * round-robin fashion with whole cores on the same NUMA node, so that
+ * they get sibling host CPUs.
+ *
+ * 2 NUMA node system, 2 threads/core:
+ *        +---+---+---+---+---+---+---+---+---+---+--//
+ *  vcpu  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |...
+ *        +---+---+---+---+---+---+---+---+---+---+--//
+ *  NUMA  \------/ \-----/ \-----/ \-----/ \-----/ \-//
+ *  node      0       1       0       1       0    ...
+ *
+ *  bit      0   1   0   1   2   3   2   3   4   5 ...
+ *
+ * 4 NUMA node system, 2 threads/core:
+ *        +---+---+---+---+---+---+---+---+---+---+--//
+ *  vcpu  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |...
+ *        +---+---+---+---+---+---+---+---+---+---+--//
+ *  NUMA  \------/ \-----/ \-----/ \-----/ \-----/ \-//
+ *  node      0       1       2       3       0    ...
+ *
+ *  bit      0   1   0   1   0   1   0   1   2   3 ...
+ *
+ */
+static ssize_t
+qemuDomainGetNumaMappedVcpuEntry(virDomainDefPtr def,
+                                 ssize_t vcpu)
+{
+    virBitmapPtr nodecpumask = NULL;
+    size_t ncells = virDomainNumaGetNodeCount(def->numa);
+    size_t threads = def->cpu->threads ? def->cpu->threads : 1;
+    ssize_t node, bit, pcpu = -1;
+
+    if (!ncells)
+        return vcpu;
+
+    node = (vcpu / threads) % ncells;
+    nodecpumask = virDomainNumaGetNodeCpumask(def->numa, node);
+
+    bit = ((vcpu / (threads * ncells)) * threads) + (vcpu % threads);
+
+    while (((pcpu = virBitmapNextSetBit(nodecpumask, pcpu)) >= 0) && bit--);
+
+    /* GIGO: Garbage In? Garbage Out! */
+    pcpu = (pcpu < 0) ? vcpu : pcpu;
+
+    return pcpu;
+}
+
+
 /**
  * qemuDomainSelectHotplugVcpuEntities:
  *
@@ -6104,7 +6158,27 @@ qemuDomainSelectHotplugVcpuEntities(virDomainDefPtr def,
     qemuDomainVcpuPrivatePtr vcpupriv;
     unsigned int maxvcpus = virDomainDefGetVcpusMax(def);
     unsigned int curvcpus = virDomainDefGetVcpus(def);
-    ssize_t i;
+    ssize_t i, target;
+    size_t threads = def->cpu->threads;
+    size_t nnumaCell = virDomainNumaGetNodeCount(def->numa);
+    size_t minvcpus = nnumaCell * threads;
+    bool HasAutonuma = virDomainVnumaIsEnabled(def->numa);
+
+    /* If SMT topology is in place, check that the number of vcpus meets
+     * the following constraints:
+     * - at least one fully used core is assigned on each NUMA node
+     * - cores must be used fully, i.e. all threads of a core are
+     *   assigned to the same guest
+     */
+    if (HasAutonuma && threads &&
+        (nvcpus < minvcpus || (nvcpus - minvcpus) % threads)) {
+        virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                       _("vNUMA: guest %s configured %d vcpus setting "
+                         "does not fit the vNUMA topology for at "
+                         "least one whole core per vNUMA node."),
+                       def->name, nvcpus);
+        goto error;
+    }
 
     if (!(ret = virBitmapNew(maxvcpus)))
         return NULL;
@@ -6113,7 +6187,9 @@ qemuDomainSelectHotplugVcpuEntities(virDomainDefPtr def,
         *enable = true;
 
         for (i = 0; i < maxvcpus && curvcpus < nvcpus; i++) {
-            vcpu = virDomainDefGetVcpu(def, i);
+
+            target = qemuDomainGetNumaMappedVcpuEntry(def, i);
+            vcpu = virDomainDefGetVcpu(def, target);
             vcpupriv = QEMU_DOMAIN_VCPU_PRIVATE(vcpu);
 
             if (vcpu->online)
@@ -6130,14 +6206,17 @@ qemuDomainSelectHotplugVcpuEntities(virDomainDefPtr def,
                                  "desired vcpu count"));
                 goto error;
             }
+            VIR_DEBUG("guest %s hotplug target vcpu = %zd\n", def->name, target);
 
-            ignore_value(virBitmapSetBit(ret, i));
+            ignore_value(virBitmapSetBit(ret, target));
         }
     } else {
         *enable = false;
 
         for (i = maxvcpus - 1; i >= 0 && curvcpus > nvcpus; i--) {
-            vcpu = virDomainDefGetVcpu(def, i);
+
+            target = qemuDomainGetNumaMappedVcpuEntry(def, i);
+            vcpu = virDomainDefGetVcpu(def, target);
             vcpupriv = QEMU_DOMAIN_VCPU_PRIVATE(vcpu);
 
             if (!vcpu->online)
@@ -6157,8 +6236,9 @@ qemuDomainSelectHotplugVcpuEntities(virDomainDefPtr def,
                                  "desired vcpu count"));
                 goto error;
             }
+            VIR_DEBUG("guest %s hotunplug target vcpu = %zd\n", def->name, target);
 
-            ignore_value(virBitmapSetBit(ret, i));
+            ignore_value(virBitmapSetBit(ret, target));
         }
     }
 
@@ -6241,6 +6321,11 @@ qemuDomainSetVcpusConfig(virDomainDefPtr def,
     if (curvcpus == nvcpus)
         return;
 
+    if (virDomainVnumaIsEnabled(def->numa)) {
+        virDomainDefSetVcpusVnuma(def, nvcpus);
+        return;
+    }
+
     if (curvcpus < nvcpus) {
         for (i = 0; i < maxvcpus; i++) {
             vcpu = virDomainDefGetVcpu(def, i);
-- 
2.21.0
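The mapping tables in the qemuDomainGetNumaMappedVcpuEntry() comment can
be reproduced outside libvirt. Below is a standalone sketch
(illustrative only, not part of the patch) that applies the same
node/bit formulas for the assumed 2-cell, 2-threads-per-core case of
the first table:

#include <stdio.h>

enum { NCELLS = 2, THREADS = 2, NVCPUS = 10 };

int main(void)
{
    printf("vcpu -> (vNUMA node, bit within that node's cpumask)\n");
    for (long v = 0; v < NVCPUS; v++) {
        long node = (v / THREADS) % NCELLS;
        long bit = (v / (THREADS * NCELLS)) * THREADS + v % THREADS;
        printf("%4ld -> (%ld, %ld)\n", v, node, bit);
    }
    return 0;
}

The output matches the diagram: vcpus 0,1 map to bits 0,1 on node 0,
vcpus 2,3 to bits 0,1 on node 1, vcpus 4,5 to bits 2,3 on node 0, and
so on, so whole cores are always plugged onto the same node.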

From: Wim ten Have <wim.ten.have@oracle.com>

Add support for hot plugging memory into vNUMA partitioned KVM guests.
Hot plugging memory without a target <numa> node will result in evenly
distributing the added memory across all vNUMA nodes.

Signed-off-by: Wim ten Have <wim.ten.have@oracle.com>
---
 src/qemu/qemu_driver.c | 59 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 53 insertions(+), 6 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index e64afcb8efc9..8d1f0bf13cb7 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -2348,9 +2348,12 @@ static int qemuDomainSetMemoryFlags(virDomainPtr dom, unsigned long newmem,
     }
 
     if (persistentDef) {
-        /* resizing memory with NUMA nodes specified doesn't work as there
-         * is no way to change the individual node sizes with this API */
-        if (virDomainNumaGetNodeCount(persistentDef->numa) > 0) {
+        /* Resizing memory with NUMA nodes specified doesn't work, as there
+         * is no way to change the individual node sizes with this API, but
+         * when vNUMA automatic partitioning is in effect resizing is possible.
+         */
+        if (!virDomainVnumaIsEnabled(persistentDef->numa) &&
+            virDomainNumaGetNodeCount(persistentDef->numa) > 0) {
             virReportError(VIR_ERR_OPERATION_INVALID, "%s",
                            _("initial memory size of a domain with NUMA "
                              "nodes cannot be modified with this API"));
@@ -2365,7 +2368,12 @@ static int qemuDomainSetMemoryFlags(virDomainPtr dom, unsigned long newmem,
             goto endjob;
         }
 
-        virDomainDefSetMemoryTotal(persistentDef, newmem);
+        if (virDomainDefSetNUMAMemoryTotal(persistentDef, newmem, driver->caps) < 0) {
+            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                           _("failed to distribute newly configured "
+                             "memory across NUMA nodes"));
+            goto endjob;
+        }
 
         if (persistentDef->mem.cur_balloon > newmem)
             persistentDef->mem.cur_balloon = newmem;
@@ -2378,6 +2386,18 @@ static int qemuDomainSetMemoryFlags(virDomainPtr dom, unsigned long newmem,
         /* resize the current memory */
         unsigned long oldmax = 0;
 
+        if ((def &&
+             virDomainVnumaIsEnabled(def->numa) &&
+             virDomainNumaGetNodeCount(def->numa) > 0) ||
+            (persistentDef &&
+             virDomainVnumaIsEnabled(persistentDef->numa) &&
+             virDomainNumaGetNodeCount(persistentDef->numa) > 0)) {
+            virReportError(VIR_ERR_OPERATION_INVALID, "%s",
+                           _("the current memory size of a domain with NUMA "
+                             "nodes cannot be modified with this API"));
+            goto endjob;
+        }
+
         if (def)
             oldmax = virDomainDefGetMemoryTotal(def);
         if (persistentDef) {
@@ -7820,6 +7840,7 @@ qemuDomainAttachDeviceLive(virDomainObjPtr vm,
 {
     int ret = -1;
     const char *alias = NULL;
+    virDomainMemoryDefPtr mem;
 
     switch ((virDomainDeviceType)dev->type) {
     case VIR_DOMAIN_DEVICE_DISK:
@@ -7895,8 +7916,34 @@ qemuDomainAttachDeviceLive(virDomainObjPtr vm,
     case VIR_DOMAIN_DEVICE_MEMORY:
         /* note that qemuDomainAttachMemory always consumes dev->data.memory
          * and dispatches DeviceAdded event on success */
-        ret = qemuDomainAttachMemory(driver, vm,
-                                     dev->data.memory);
+        mem = dev->data.memory;
+        if (mem->targetNode >= 0) {
+            ret = qemuDomainAttachMemory(driver, vm,
+                                         dev->data.memory);
+        } else {
+            size_t i, ncells = virDomainNumaGetNodeCount(vm->def->numa);
+            unsigned long long memsizeCell = dev->data.memory->size / ncells;
+
+            for (i = 0; i < ncells; i++) {
+
+                if (VIR_ALLOC(mem) < 0) {
+                    ret = -1;
+                    break;
+                }
+
+                memcpy(mem, dev->data.memory, sizeof(virDomainMemoryDef));
+
+                /* give each per-node device its own copy of the source
+                 * nodes bitmap rather than aliasing the original */
+                if (dev->data.memory->sourceNodes &&
+                    !(mem->sourceNodes = virBitmapNewCopy(dev->data.memory->sourceNodes))) {
+                    VIR_FREE(mem);
+                    ret = -1;
+                    break;
+                }
+
+                mem->size = memsizeCell;
+                mem->targetNode = i;
+
+                ret = qemuDomainAttachMemory(driver, vm, mem);
+            }
+            virDomainMemoryDefFree(dev->data.memory);
+        }
         dev->data.memory = NULL;
         break;
-- 
2.21.0
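The even split is plain division; here is a minimal sketch of the
per-node sizing the loop above performs (hypothetical figures: a 4 GiB
DIMM hot plugged without a targetNode into a guest with 4 vNUMA cells):

#include <stdio.h>

int main(void)
{
    unsigned long long requestKiB = 4ULL * 1024 * 1024; /* 4 GiB DIMM */
    size_t ncells = 4;
    unsigned long long perCell = requestKiB / ncells;

    /* one DIMM per vNUMA cell, as in the qemuDomainAttachDeviceLive loop */
    for (size_t node = 0; node < ncells; node++)
        printf("attach: size=%llu KiB targetNode=%zu\n", perCell, node);
    return 0;
}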

From: Wim ten Have <wim.ten.have@oracle.com>

Tests for the new <vnuma> element and its variations.

Signed-off-by: Wim ten Have <wim.ten.have@oracle.com>
---
 .../cpu-host-passthrough-nonuma.args          |  29 ++++
 .../cpu-host-passthrough-nonuma.xml           |  19 +++
 .../cpu-host-passthrough-numa-contiguous.args |  37 ++++++
 .../cpu-host-passthrough-numa-contiguous.xml  |  20 +++
 .../cpu-host-passthrough-numa-interleave.args |  41 ++++++
 .../cpu-host-passthrough-numa-interleave.xml  |  19 +++
 ...host-passthrough-numa-node-contiguous.args |  53 ++++++++
 ...-host-passthrough-numa-node-contiguous.xml |  21 +++
 ...host-passthrough-numa-node-interleave.args |  41 ++++++
 ...-host-passthrough-numa-node-interleave.xml |  22 +++
 ...ost-passthrough-numa-node-round-robin.args | 125 ++++++++++++++++++
 ...host-passthrough-numa-node-round-robin.xml |  21 +++
 ...u-host-passthrough-numa-node-siblings.args |  32 +++++
 ...pu-host-passthrough-numa-node-siblings.xml |  23 ++++
 ...cpu-host-passthrough-numa-round-robin.args |  37 ++++++
 .../cpu-host-passthrough-numa-round-robin.xml |  22 +++
 .../cpu-host-passthrough-numa-siblings.args   |  37 ++++++
 .../cpu-host-passthrough-numa-siblings.xml    |  20 +++
 .../cpu-host-passthrough-numa.args            |  37 ++++++
 .../cpu-host-passthrough-numa.xml             |  20 +++
 tests/qemuxml2argvtest.c                      |  10 ++
 21 files changed, 686 insertions(+)
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-nonuma.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-nonuma.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-contiguous.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-contiguous.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-interleave.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-interleave.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-contiguous.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-contiguous.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-interleave.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-interleave.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-round-robin.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-round-robin.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-siblings.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-siblings.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-round-robin.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-round-robin.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-siblings.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa-siblings.xml
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa.args
 create mode 100644 tests/qemuxml2argvdata/cpu-host-passthrough-numa.xml

diff --git a/tests/qemuxml2argvdata/cpu-host-passthrough-nonuma.args b/tests/qemuxml2argvdata/cpu-host-passthrough-nonuma.args
new file mode 100644
index 000000000000..197bda882a01
--- /dev/null
+++ b/tests/qemuxml2argvdata/cpu-host-passthrough-nonuma.args
@@ -0,0 +1,29 @@
+LC_ALL=C \
+PATH=/bin \
+HOME=/tmp/lib/domain--1-QEMUGuest1 \
+USER=test \
+LOGNAME=test \
+XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \
+XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \
+XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \
+QEMU_AUDIO_DRV=none \
+/usr/bin/qemu-system-x86_64 \
+-name QEMUGuest1 \
+-S \
+-machine pc,accel=kvm,usb=off,dump-guest-core=off \
+-cpu host \
+-m 214 \
+-realtime mlock=off \
+-smp 1,sockets=1,cores=1,threads=1 \
+-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \
+-display none \
+-no-user-config \
+-nodefaults \
+-chardev socket,id=charmonitor,path=/tmp/lib/domain--1-QEMUGuest1/monitor.sock,\
+server,nowait \
+-mon chardev=charmonitor,id=monitor,mode=control \
+-rtc base=utc \
+-no-shutdown \
+-no-acpi \
+-usb \
+-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3
diff --git a/tests/qemuxml2argvdata/cpu-host-passthrough-nonuma.xml b/tests/qemuxml2argvdata/cpu-host-passthrough-nonuma.xml
new file mode 100644
index 000000000000..c7838aed8e12
--- /dev/null
+++ b/tests/qemuxml2argvdata/cpu-host-passthrough-nonuma.xml
@@ -0,0 +1,19 @@
+<domain type='kvm'>
+  <name>QEMUGuest1</name>
+  <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
+  <memory unit='KiB'>219100</memory>
+  <currentMemory unit='KiB'>219100</currentMemory>
+  <vcpu placement='static'>1</vcpu>
+  <os>
+    <type arch='x86_64' machine='pc'>hvm</type>
+    <boot dev='network'/>
+  </os>
+  <cpu mode='host-passthrough' check='none'/>
+  <clock offset='utc'/>
+  <on_poweroff>destroy</on_poweroff>
+  <on_reboot>restart</on_reboot>
+  <on_crash>destroy</on_crash>
+  <devices>
+    <emulator>/usr/bin/qemu-system-x86_64</emulator>
+  </devices>
+</domain>
diff --git a/tests/qemuxml2argvdata/cpu-host-passthrough-numa-contiguous.args b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-contiguous.args
new file mode 100644
index 000000000000..6a59cf7b44e6
--- /dev/null
+++ b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-contiguous.args
@@ -0,0 +1,37 @@
+LC_ALL=C \
+PATH=/bin \
+HOME=/tmp/lib/domain--1-QEMUGuest1 \
+USER=test \
+LOGNAME=test \
+XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \
+XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \
+XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \
+QEMU_AUDIO_DRV=none \
+/usr/bin/qemu-system-x86_64 \
+-name QEMUGuest1 \
+-S \
+-machine pc,accel=kvm,usb=off,dump-guest-core=off \
+-cpu host \
+-m 216 \
+-realtime mlock=off \
+-smp 1,maxcpus=16,sockets=4,cores=4,threads=1 \
+-object memory-backend-ram,id=ram-node0,size=56623104,host-nodes=0,policy=bind \
+-numa node,nodeid=0,cpus=0-3,memdev=ram-node0 \
+-object memory-backend-ram,id=ram-node1,size=56623104,host-nodes=1,policy=bind \
+-numa node,nodeid=1,cpus=4-7,memdev=ram-node1 \
+-object memory-backend-ram,id=ram-node2,size=56623104,host-nodes=2,policy=bind \
+-numa node,nodeid=2,cpus=8-11,memdev=ram-node2 \
+-object memory-backend-ram,id=ram-node3,size=56623104,host-nodes=3,policy=bind \
+-numa node,nodeid=3,cpus=12-15,memdev=ram-node3 \
+-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \
+-display none \
+-no-user-config \
+-nodefaults \
+-chardev socket,id=charmonitor,path=/tmp/lib/domain--1-QEMUGuest1/monitor.sock,\
+server,nowait \
+-mon chardev=charmonitor,id=monitor,mode=control \
+-rtc base=utc \
+-no-shutdown \
+-no-acpi \
+-usb \
+-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3
diff --git a/tests/qemuxml2argvdata/cpu-host-passthrough-numa-contiguous.xml b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-contiguous.xml
new file mode 100644
index 000000000000..7fcb0e8997d9
--- /dev/null
+++ b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-contiguous.xml
@@ -0,0 +1,20 @@
+<domain type='kvm'>
+  <name>QEMUGuest1</name>
+  <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
+  <memory unit='KiB'>219100</memory>
+  <currentMemory unit='KiB'>219100</currentMemory>
+  <vcpu placement='static'>16</vcpu>
+  <vnuma mode='host' distribution='contiguous'/>
+  <os>
+    <type arch='x86_64' machine='pc'>hvm</type>
+    <boot dev='network'/>
+  </os>
+  <cpu mode='host-passthrough' check='none'/>
+  <clock offset='utc'/>
+  <on_poweroff>destroy</on_poweroff>
+  <on_reboot>restart</on_reboot>
+  <on_crash>destroy</on_crash>
+  <devices>
+    <emulator>/usr/bin/qemu-system-x86_64</emulator>
+  </devices>
+</domain>
diff --git a/tests/qemuxml2argvdata/cpu-host-passthrough-numa-interleave.args b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-interleave.args
new file mode 100644
index 000000000000..58bb366062f5
--- /dev/null
+++ b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-interleave.args
@@ -0,0 +1,41 @@
+LC_ALL=C \
+PATH=/bin \
+HOME=/tmp/lib/domain--1-QEMUGuest1 \
+USER=test \
+LOGNAME=test \
+XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \
+XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \
+XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \
+QEMU_AUDIO_DRV=none \
+/usr/bin/qemu-system-x86_64 \
+-name QEMUGuest1 \
+-S \
+-machine pc,accel=kvm,usb=off,dump-guest-core=off \
+-cpu host \
+-m 216 \
+-realtime mlock=off \
+-smp 1,maxcpus=24,sockets=24,cores=1,threads=1 \
+-object memory-backend-ram,id=ram-node0,size=56623104,host-nodes=0,policy=bind \
+-numa node,nodeid=0,cpus=0,cpus=4,cpus=8,cpus=12,cpus=16,cpus=20,\
+memdev=ram-node0 \
+-object memory-backend-ram,id=ram-node1,size=56623104,host-nodes=1,policy=bind \
+-numa node,nodeid=1,cpus=1,cpus=5,cpus=9,cpus=13,cpus=17,cpus=21,\
+memdev=ram-node1 \
+-object memory-backend-ram,id=ram-node2,size=56623104,host-nodes=2,policy=bind \
+-numa node,nodeid=2,cpus=2,cpus=6,cpus=10,cpus=14,cpus=18,cpus=22,\
+memdev=ram-node2 \
+-object memory-backend-ram,id=ram-node3,size=56623104,host-nodes=3,policy=bind \
+-numa node,nodeid=3,cpus=3,cpus=7,cpus=11,cpus=15,cpus=19,cpus=23,\
+memdev=ram-node3 \
+-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \
+-display none \
+-no-user-config \
+-nodefaults \
+-chardev socket,id=charmonitor,path=/tmp/lib/domain--1-QEMUGuest1/monitor.sock,\
+server,nowait \
+-mon chardev=charmonitor,id=monitor,mode=control \
+-rtc base=utc \
+-no-shutdown \
+-no-acpi \
+-usb \
+-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3
diff --git a/tests/qemuxml2argvdata/cpu-host-passthrough-numa-interleave.xml b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-interleave.xml
new file mode 100644
index 000000000000..86e385808511
--- /dev/null
+++ b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-interleave.xml
@@ -0,0 +1,19 @@
+<domain type='kvm'>
+  <name>QEMUGuest1</name>
+  <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
+  <memory unit='KiB'>219100</memory>
+  <currentMemory unit='KiB'>219100</currentMemory>
+  <vcpu placement='static'>24</vcpu>
+  <vnuma mode='host' distribution='interleave'/>
+  <os>
+    <type arch='x86_64' machine='pc'>hvm</type>
+  </os>
+  <cpu mode='host-passthrough' check='none'/>
+  <clock offset='utc'/>
+  <on_poweroff>destroy</on_poweroff>
+  <on_reboot>restart</on_reboot>
+  <on_crash>destroy</on_crash>
+  <devices>
+    <emulator>/usr/bin/qemu-system-x86_64</emulator>
+  </devices>
+</domain>
diff --git a/tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-contiguous.args b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-contiguous.args
new file mode 100644
index 000000000000..bd360976e553
--- /dev/null
+++ b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-contiguous.args
@@ -0,0 +1,53 @@
+LC_ALL=C \
+PATH=/bin \
+HOME=/tmp/lib/domain--1-QEMUGuest1 \
+USER=test \
+LOGNAME=test \
+XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \
+XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \
+XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \
+QEMU_AUDIO_DRV=none \
+/usr/bin/qemu-system-x86_64 \
+-name QEMUGuest1 \
+-S \
+-machine pc,accel=kvm,usb=off,dump-guest-core=off \
+-cpu host \
+-m 32768 \
+-realtime mlock=off \
+-smp 1,maxcpus=16,sockets=8,cores=2,threads=1 \
+-object memory-backend-ram,id=ram-node0,size=4294967296,host-nodes=1,\
+policy=bind \
+-numa node,nodeid=0,cpus=0-1,memdev=ram-node0 \
+-object memory-backend-ram,id=ram-node1,size=4294967296,host-nodes=1,\
+policy=bind \
+-numa node,nodeid=1,cpus=2-3,memdev=ram-node1 \
+-object memory-backend-ram,id=ram-node2,size=4294967296,host-nodes=2,\
+policy=bind \
+-numa node,nodeid=2,cpus=4-5,memdev=ram-node2 \
+-object memory-backend-ram,id=ram-node3,size=4294967296,host-nodes=2,\
+policy=bind \
+-numa node,nodeid=3,cpus=6-7,memdev=ram-node3 \
+-object memory-backend-ram,id=ram-node4,size=4294967296,host-nodes=1,\
+policy=bind \
+-numa node,nodeid=4,cpus=8-9,memdev=ram-node4 \
+-object memory-backend-ram,id=ram-node5,size=4294967296,host-nodes=1,\
+policy=bind \
+-numa node,nodeid=5,cpus=10-11,memdev=ram-node5 \
+-object memory-backend-ram,id=ram-node6,size=4294967296,host-nodes=2,\
+policy=bind \
+-numa node,nodeid=6,cpus=12-13,memdev=ram-node6 \
+-object memory-backend-ram,id=ram-node7,size=4294967296,host-nodes=2,\
+policy=bind \
+-numa node,nodeid=7,cpus=14-15,memdev=ram-node7 \
+-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \
+-display none \
+-no-user-config \
+-nodefaults \
+-chardev socket,id=charmonitor,path=/tmp/lib/domain--1-QEMUGuest1/monitor.sock,\
+server,nowait \
+-mon chardev=charmonitor,id=monitor,mode=control \
+-rtc base=utc \
+-no-shutdown \
+-no-acpi \
+-usb \
+-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3
diff --git a/tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-contiguous.xml b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-contiguous.xml
new file mode 100644
index 000000000000..4c71ca30cc47
--- /dev/null
+++ b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-contiguous.xml
@@ -0,0 +1,21 @@
+<domain type='kvm'>
+  <name>QEMUGuest1</name>
+  <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
+  <memory unit='GiB'>32</memory>
+  <vcpu placement='static'>16</vcpu>
+  <vnuma mode='node' distribution='contiguous'>
+    <partition nodeset='1,2' cells='8'/>
+  </vnuma>
+  <os>
+    <type arch='x86_64' machine='pc'>hvm</type>
+    <boot dev='network'/>
+  </os>
+  <cpu mode='host-passthrough' check='none'/>
+  <clock offset='utc'/>
+  <on_poweroff>destroy</on_poweroff>
+  <on_reboot>restart</on_reboot>
+  <on_crash>destroy</on_crash>
+  <devices>
+    <emulator>/usr/bin/qemu-system-x86_64</emulator>
+  </devices>
+</domain>
diff --git a/tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-interleave.args b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-interleave.args
new file mode 100644
index 000000000000..c7e591ebf48b
--- /dev/null
+++ b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-interleave.args
@@ -0,0 +1,41 @@
+LC_ALL=C \
+PATH=/bin \
+HOME=/tmp/lib/domain--1-QEMUGuest1 \
+USER=test \
+LOGNAME=test \
+XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \
+XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \
+XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \
+QEMU_AUDIO_DRV=none \
+/usr/bin/qemu-system-x86_64 \
+-name QEMUGuest1 \
+-S \
+-machine
pc,accel=kvm,usb=off,dump-guest-core=off \ +-cpu host \ +-m 18 \ +-realtime mlock=off \ +-smp 1,maxcpus=24,sockets=24,cores=1,threads=1 \ +-object memory-backend-ram,id=ram-node0,size=3145728,host-nodes=0,policy=bind \ +-numa node,nodeid=0,cpus=0,cpus=6,cpus=12,cpus=18,memdev=ram-node0 \ +-object memory-backend-ram,id=ram-node1,size=3145728,host-nodes=1,policy=bind \ +-numa node,nodeid=1,cpus=1,cpus=7,cpus=13,cpus=19,memdev=ram-node1 \ +-object memory-backend-ram,id=ram-node2,size=3145728,host-nodes=2,policy=bind \ +-numa node,nodeid=2,cpus=2,cpus=8,cpus=14,cpus=20,memdev=ram-node2 \ +-object memory-backend-ram,id=ram-node3,size=3145728,host-nodes=3,policy=bind \ +-numa node,nodeid=3,cpus=3,cpus=9,cpus=15,cpus=21,memdev=ram-node3 \ +-object memory-backend-ram,id=ram-node4,size=3145728,host-nodes=0,policy=bind \ +-numa node,nodeid=4,cpus=4,cpus=10,cpus=16,cpus=22,memdev=ram-node4 \ +-object memory-backend-ram,id=ram-node5,size=3145728,host-nodes=1,policy=bind \ +-numa node,nodeid=5,cpus=5,cpus=11,cpus=17,cpus=23,memdev=ram-node5 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,path=/tmp/lib/domain--1-QEMUGuest1/monitor.sock,\ +server,nowait \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-no-acpi \ +-usb \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 diff --git a/tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-interleave.xml b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-interleave.xml new file mode 100644 index 000000000000..ddfb8c06b4f2 --- /dev/null +++ b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-interleave.xml @@ -0,0 +1,22 @@ +<domain type='kvm'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219100</memory> + <currentMemory unit='KiB'>219100</currentMemory> + <vcpu placement='static'>24</vcpu> + <vnuma mode='node' distribution='interleave'> + <memory unit='Kib'>12345</memory> + <partition cells='6'/> + </vnuma> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + </os> + <cpu mode='host-passthrough' check='none'/> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-x86_64</emulator> + </devices> +</domain> diff --git a/tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-round-robin.args b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-round-robin.args new file mode 100644 index 000000000000..3758c10a2e18 --- /dev/null +++ b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-round-robin.args @@ -0,0 +1,125 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/tmp/lib/domain--1-QEMUGuest1 \ +USER=test \ +LOGNAME=test \ +XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \ +XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \ +XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \ +QEMU_AUDIO_DRV=none \ +/usr/bin/qemu-system-x86_64 \ +-name QEMUGuest1 \ +-S \ +-machine pc,accel=kvm,usb=off,dump-guest-core=off \ +-cpu host \ +-m 2097152 \ +-realtime mlock=off \ +-smp 1,maxcpus=128,sockets=32,cores=4,threads=1 \ +-object memory-backend-ram,id=ram-node0,size=68719476736,host-nodes=0,\ +policy=bind \ +-numa node,nodeid=0,cpus=0,cpus=32,cpus=64,cpus=96,memdev=ram-node0 \ +-object memory-backend-ram,id=ram-node1,size=68719476736,host-nodes=1,\ +policy=bind \ +-numa node,nodeid=1,cpus=1,cpus=33,cpus=65,cpus=97,memdev=ram-node1 \ +-object 
memory-backend-ram,id=ram-node2,size=68719476736,host-nodes=2,\ +policy=bind \ +-numa node,nodeid=2,cpus=2,cpus=34,cpus=66,cpus=98,memdev=ram-node2 \ +-object memory-backend-ram,id=ram-node3,size=68719476736,host-nodes=3,\ +policy=bind \ +-numa node,nodeid=3,cpus=3,cpus=35,cpus=67,cpus=99,memdev=ram-node3 \ +-object memory-backend-ram,id=ram-node4,size=68719476736,host-nodes=0,\ +policy=bind \ +-numa node,nodeid=4,cpus=4,cpus=36,cpus=68,cpus=100,memdev=ram-node4 \ +-object memory-backend-ram,id=ram-node5,size=68719476736,host-nodes=1,\ +policy=bind \ +-numa node,nodeid=5,cpus=5,cpus=37,cpus=69,cpus=101,memdev=ram-node5 \ +-object memory-backend-ram,id=ram-node6,size=68719476736,host-nodes=2,\ +policy=bind \ +-numa node,nodeid=6,cpus=6,cpus=38,cpus=70,cpus=102,memdev=ram-node6 \ +-object memory-backend-ram,id=ram-node7,size=68719476736,host-nodes=3,\ +policy=bind \ +-numa node,nodeid=7,cpus=7,cpus=39,cpus=71,cpus=103,memdev=ram-node7 \ +-object memory-backend-ram,id=ram-node8,size=68719476736,host-nodes=0,\ +policy=bind \ +-numa node,nodeid=8,cpus=8,cpus=40,cpus=72,cpus=104,memdev=ram-node8 \ +-object memory-backend-ram,id=ram-node9,size=68719476736,host-nodes=1,\ +policy=bind \ +-numa node,nodeid=9,cpus=9,cpus=41,cpus=73,cpus=105,memdev=ram-node9 \ +-object memory-backend-ram,id=ram-node10,size=68719476736,host-nodes=2,\ +policy=bind \ +-numa node,nodeid=10,cpus=10,cpus=42,cpus=74,cpus=106,memdev=ram-node10 \ +-object memory-backend-ram,id=ram-node11,size=68719476736,host-nodes=3,\ +policy=bind \ +-numa node,nodeid=11,cpus=11,cpus=43,cpus=75,cpus=107,memdev=ram-node11 \ +-object memory-backend-ram,id=ram-node12,size=68719476736,host-nodes=0,\ +policy=bind \ +-numa node,nodeid=12,cpus=12,cpus=44,cpus=76,cpus=108,memdev=ram-node12 \ +-object memory-backend-ram,id=ram-node13,size=68719476736,host-nodes=1,\ +policy=bind \ +-numa node,nodeid=13,cpus=13,cpus=45,cpus=77,cpus=109,memdev=ram-node13 \ +-object memory-backend-ram,id=ram-node14,size=68719476736,host-nodes=2,\ +policy=bind \ +-numa node,nodeid=14,cpus=14,cpus=46,cpus=78,cpus=110,memdev=ram-node14 \ +-object memory-backend-ram,id=ram-node15,size=68719476736,host-nodes=3,\ +policy=bind \ +-numa node,nodeid=15,cpus=15,cpus=47,cpus=79,cpus=111,memdev=ram-node15 \ +-object memory-backend-ram,id=ram-node16,size=68719476736,host-nodes=0,\ +policy=bind \ +-numa node,nodeid=16,cpus=16,cpus=48,cpus=80,cpus=112,memdev=ram-node16 \ +-object memory-backend-ram,id=ram-node17,size=68719476736,host-nodes=1,\ +policy=bind \ +-numa node,nodeid=17,cpus=17,cpus=49,cpus=81,cpus=113,memdev=ram-node17 \ +-object memory-backend-ram,id=ram-node18,size=68719476736,host-nodes=2,\ +policy=bind \ +-numa node,nodeid=18,cpus=18,cpus=50,cpus=82,cpus=114,memdev=ram-node18 \ +-object memory-backend-ram,id=ram-node19,size=68719476736,host-nodes=3,\ +policy=bind \ +-numa node,nodeid=19,cpus=19,cpus=51,cpus=83,cpus=115,memdev=ram-node19 \ +-object memory-backend-ram,id=ram-node20,size=68719476736,host-nodes=0,\ +policy=bind \ +-numa node,nodeid=20,cpus=20,cpus=52,cpus=84,cpus=116,memdev=ram-node20 \ +-object memory-backend-ram,id=ram-node21,size=68719476736,host-nodes=1,\ +policy=bind \ +-numa node,nodeid=21,cpus=21,cpus=53,cpus=85,cpus=117,memdev=ram-node21 \ +-object memory-backend-ram,id=ram-node22,size=68719476736,host-nodes=2,\ +policy=bind \ +-numa node,nodeid=22,cpus=22,cpus=54,cpus=86,cpus=118,memdev=ram-node22 \ +-object memory-backend-ram,id=ram-node23,size=68719476736,host-nodes=3,\ +policy=bind \ +-numa 
node,nodeid=23,cpus=23,cpus=55,cpus=87,cpus=119,memdev=ram-node23 \ +-object memory-backend-ram,id=ram-node24,size=68719476736,host-nodes=0,\ +policy=bind \ +-numa node,nodeid=24,cpus=24,cpus=56,cpus=88,cpus=120,memdev=ram-node24 \ +-object memory-backend-ram,id=ram-node25,size=68719476736,host-nodes=1,\ +policy=bind \ +-numa node,nodeid=25,cpus=25,cpus=57,cpus=89,cpus=121,memdev=ram-node25 \ +-object memory-backend-ram,id=ram-node26,size=68719476736,host-nodes=2,\ +policy=bind \ +-numa node,nodeid=26,cpus=26,cpus=58,cpus=90,cpus=122,memdev=ram-node26 \ +-object memory-backend-ram,id=ram-node27,size=68719476736,host-nodes=3,\ +policy=bind \ +-numa node,nodeid=27,cpus=27,cpus=59,cpus=91,cpus=123,memdev=ram-node27 \ +-object memory-backend-ram,id=ram-node28,size=68719476736,host-nodes=0,\ +policy=bind \ +-numa node,nodeid=28,cpus=28,cpus=60,cpus=92,cpus=124,memdev=ram-node28 \ +-object memory-backend-ram,id=ram-node29,size=68719476736,host-nodes=1,\ +policy=bind \ +-numa node,nodeid=29,cpus=29,cpus=61,cpus=93,cpus=125,memdev=ram-node29 \ +-object memory-backend-ram,id=ram-node30,size=68719476736,host-nodes=2,\ +policy=bind \ +-numa node,nodeid=30,cpus=30,cpus=62,cpus=94,cpus=126,memdev=ram-node30 \ +-object memory-backend-ram,id=ram-node31,size=68719476736,host-nodes=3,\ +policy=bind \ +-numa node,nodeid=31,cpus=31,cpus=63,cpus=95,cpus=127,memdev=ram-node31 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,path=/tmp/lib/domain--1-QEMUGuest1/monitor.sock,\ +server,nowait \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-no-acpi \ +-usb \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 diff --git a/tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-round-robin.xml b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-round-robin.xml new file mode 100644 index 000000000000..7d89a6d05303 --- /dev/null +++ b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-round-robin.xml @@ -0,0 +1,21 @@ +<domain type='kvm'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='TiB'>2</memory> + <vcpu placement='static'>128</vcpu> + <vnuma mode='node' distribution='round-robin'> + <partition cells='32'/> + </vnuma> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='network'/> + </os> + <cpu mode='host-passthrough' check='none'/> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-x86_64</emulator> + </devices> +</domain> diff --git a/tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-siblings.args b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-siblings.args new file mode 100644 index 000000000000..7b7cb522073b --- /dev/null +++ b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-siblings.args @@ -0,0 +1,32 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/tmp/lib/domain--1-QEMUGuest1 \ +USER=test \ +LOGNAME=test \ +XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \ +XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \ +XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \ +QEMU_AUDIO_DRV=none \ +/usr/bin/qemu-system-x86_64 \ +-name QEMUGuest1 \ +-S \ +-machine pc,accel=kvm,usb=off,dump-guest-core=off \ +-cpu host \ +-m 4096 \ +-realtime mlock=off \ +-smp 1,maxcpus=16,sockets=1,cores=16,threads=1 \ +-object memory-backend-ram,id=ram-node0,size=4294967296,host-nodes=0,\ +policy=bind \ +-numa 
node,nodeid=0,cpus=0-15,memdev=ram-node0 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,path=/tmp/lib/domain--1-QEMUGuest1/monitor.sock,\ +server,nowait \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-no-acpi \ +-usb \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 diff --git a/tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-siblings.xml b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-siblings.xml new file mode 100644 index 000000000000..8fd65ac571c9 --- /dev/null +++ b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-node-siblings.xml @@ -0,0 +1,23 @@ +<domain type='kvm'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219100</memory> + <currentMemory unit='KiB'>219100</currentMemory> + <vcpu placement='static'>16</vcpu> + <vnuma mode='node' distribution='siblings'> + <memory unit='GiB'>4</memory> + <partition nodeset='0' cells='1'/> + </vnuma> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='network'/> + </os> + <cpu mode='host-passthrough' check='none'/> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-x86_64</emulator> + </devices> +</domain> diff --git a/tests/qemuxml2argvdata/cpu-host-passthrough-numa-round-robin.args b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-round-robin.args new file mode 100644 index 000000000000..8628e8be6c71 --- /dev/null +++ b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-round-robin.args @@ -0,0 +1,37 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/tmp/lib/domain--1-QEMUGuest1 \ +USER=test \ +LOGNAME=test \ +XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \ +XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \ +XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \ +QEMU_AUDIO_DRV=none \ +/usr/bin/qemu-system-x86_64 \ +-name QEMUGuest1 \ +-S \ +-machine pc,accel=kvm,usb=off,dump-guest-core=off \ +-cpu host \ +-m 216 \ +-realtime mlock=off \ +-smp 1,maxcpus=16,sockets=4,cores=4,threads=1 \ +-object memory-backend-ram,id=ram-node0,size=56623104,host-nodes=0,policy=bind \ +-numa node,nodeid=0,cpus=0,cpus=4,cpus=8,cpus=12,memdev=ram-node0 \ +-object memory-backend-ram,id=ram-node1,size=56623104,host-nodes=1,policy=bind \ +-numa node,nodeid=1,cpus=1,cpus=5,cpus=9,cpus=13,memdev=ram-node1 \ +-object memory-backend-ram,id=ram-node2,size=56623104,host-nodes=2,policy=bind \ +-numa node,nodeid=2,cpus=2,cpus=6,cpus=10,cpus=14,memdev=ram-node2 \ +-object memory-backend-ram,id=ram-node3,size=56623104,host-nodes=3,policy=bind \ +-numa node,nodeid=3,cpus=3,cpus=7,cpus=11,cpus=15,memdev=ram-node3 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,path=/tmp/lib/domain--1-QEMUGuest1/monitor.sock,\ +server,nowait \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-no-acpi \ +-usb \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 diff --git a/tests/qemuxml2argvdata/cpu-host-passthrough-numa-round-robin.xml b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-round-robin.xml new file mode 100644 index 000000000000..d4795c549f62 --- /dev/null +++ b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-round-robin.xml @@ -0,0 +1,22 @@ +<domain type='kvm'> + <name>QEMUGuest1</name> + 
<uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219100</memory> + <currentMemory unit='KiB'>219100</currentMemory> + <vcpu placement='static'>16</vcpu> + <vnuma mode='host' distribution='round-robin'> + <memory unit='KiB'>219100</memory> + </vnuma> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='network'/> + </os> + <cpu mode='host-passthrough' check='none'/> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-x86_64</emulator> + </devices> +</domain> diff --git a/tests/qemuxml2argvdata/cpu-host-passthrough-numa-siblings.args b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-siblings.args new file mode 100644 index 000000000000..6a59cf7b44e6 --- /dev/null +++ b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-siblings.args @@ -0,0 +1,37 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/tmp/lib/domain--1-QEMUGuest1 \ +USER=test \ +LOGNAME=test \ +XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \ +XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \ +XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \ +QEMU_AUDIO_DRV=none \ +/usr/bin/qemu-system-x86_64 \ +-name QEMUGuest1 \ +-S \ +-machine pc,accel=kvm,usb=off,dump-guest-core=off \ +-cpu host \ +-m 216 \ +-realtime mlock=off \ +-smp 1,maxcpus=16,sockets=4,cores=4,threads=1 \ +-object memory-backend-ram,id=ram-node0,size=56623104,host-nodes=0,policy=bind \ +-numa node,nodeid=0,cpus=0-3,memdev=ram-node0 \ +-object memory-backend-ram,id=ram-node1,size=56623104,host-nodes=1,policy=bind \ +-numa node,nodeid=1,cpus=4-7,memdev=ram-node1 \ +-object memory-backend-ram,id=ram-node2,size=56623104,host-nodes=2,policy=bind \ +-numa node,nodeid=2,cpus=8-11,memdev=ram-node2 \ +-object memory-backend-ram,id=ram-node3,size=56623104,host-nodes=3,policy=bind \ +-numa node,nodeid=3,cpus=12-15,memdev=ram-node3 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,path=/tmp/lib/domain--1-QEMUGuest1/monitor.sock,\ +server,nowait \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-no-acpi \ +-usb \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 diff --git a/tests/qemuxml2argvdata/cpu-host-passthrough-numa-siblings.xml b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-siblings.xml new file mode 100644 index 000000000000..9cf3794a4c11 --- /dev/null +++ b/tests/qemuxml2argvdata/cpu-host-passthrough-numa-siblings.xml @@ -0,0 +1,20 @@ +<domain type='kvm'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219100</memory> + <currentMemory unit='KiB'>219100</currentMemory> + <vcpu placement='static'>16</vcpu> + <vnuma mode='host' distribution='siblings'/> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='network'/> + </os> + <cpu mode='host-passthrough' check='none'/> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-x86_64</emulator> + </devices> +</domain> diff --git a/tests/qemuxml2argvdata/cpu-host-passthrough-numa.args b/tests/qemuxml2argvdata/cpu-host-passthrough-numa.args new file mode 100644 index 000000000000..6a59cf7b44e6 --- /dev/null +++ b/tests/qemuxml2argvdata/cpu-host-passthrough-numa.args @@ -0,0 +1,37 @@ +LC_ALL=C \ +PATH=/bin \ +HOME=/tmp/lib/domain--1-QEMUGuest1 \ +USER=test \ +LOGNAME=test \ 
+XDG_DATA_HOME=/tmp/lib/domain--1-QEMUGuest1/.local/share \ +XDG_CACHE_HOME=/tmp/lib/domain--1-QEMUGuest1/.cache \ +XDG_CONFIG_HOME=/tmp/lib/domain--1-QEMUGuest1/.config \ +QEMU_AUDIO_DRV=none \ +/usr/bin/qemu-system-x86_64 \ +-name QEMUGuest1 \ +-S \ +-machine pc,accel=kvm,usb=off,dump-guest-core=off \ +-cpu host \ +-m 216 \ +-realtime mlock=off \ +-smp 1,maxcpus=16,sockets=4,cores=4,threads=1 \ +-object memory-backend-ram,id=ram-node0,size=56623104,host-nodes=0,policy=bind \ +-numa node,nodeid=0,cpus=0-3,memdev=ram-node0 \ +-object memory-backend-ram,id=ram-node1,size=56623104,host-nodes=1,policy=bind \ +-numa node,nodeid=1,cpus=4-7,memdev=ram-node1 \ +-object memory-backend-ram,id=ram-node2,size=56623104,host-nodes=2,policy=bind \ +-numa node,nodeid=2,cpus=8-11,memdev=ram-node2 \ +-object memory-backend-ram,id=ram-node3,size=56623104,host-nodes=3,policy=bind \ +-numa node,nodeid=3,cpus=12-15,memdev=ram-node3 \ +-uuid c7a5fdbd-edaf-9455-926a-d65c16db1809 \ +-display none \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,path=/tmp/lib/domain--1-QEMUGuest1/monitor.sock,\ +server,nowait \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc \ +-no-shutdown \ +-no-acpi \ +-usb \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 diff --git a/tests/qemuxml2argvdata/cpu-host-passthrough-numa.xml b/tests/qemuxml2argvdata/cpu-host-passthrough-numa.xml new file mode 100644 index 000000000000..eae23a5ea24f --- /dev/null +++ b/tests/qemuxml2argvdata/cpu-host-passthrough-numa.xml @@ -0,0 +1,20 @@ +<domain type='kvm'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219100</memory> + <currentMemory unit='KiB'>219100</currentMemory> + <vcpu placement='static'>16</vcpu> + <vnuma mode='host'/> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='network'/> + </os> + <cpu mode='host-passthrough' check='none'/> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu-system-x86_64</emulator> + </devices> +</domain> diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index 122e14b07175..9554ac0c6fc3 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -1732,6 +1732,16 @@ mymain(void) ARG_FLAGS, FLAG_SKIP_LEGACY_CPUS | FLAG_EXPECT_FAILURE, ARG_QEMU_CAPS, NONE); DO_TEST("cpu-host-passthrough", QEMU_CAPS_KVM); + DO_TEST("cpu-host-passthrough-numa", QEMU_CAPS_NUMA, QEMU_CAPS_OBJECT_MEMORY_RAM); + DO_TEST("cpu-host-passthrough-numa-contiguous", QEMU_CAPS_NUMA, QEMU_CAPS_OBJECT_MEMORY_RAM); + DO_TEST("cpu-host-passthrough-numa-node-contiguous", QEMU_CAPS_NUMA, QEMU_CAPS_OBJECT_MEMORY_RAM); + DO_TEST("cpu-host-passthrough-numa-siblings", QEMU_CAPS_NUMA, QEMU_CAPS_OBJECT_MEMORY_RAM); + DO_TEST("cpu-host-passthrough-numa-node-siblings", QEMU_CAPS_NUMA, QEMU_CAPS_OBJECT_MEMORY_RAM); + DO_TEST("cpu-host-passthrough-numa-round-robin", QEMU_CAPS_NUMA, QEMU_CAPS_OBJECT_MEMORY_RAM); + DO_TEST("cpu-host-passthrough-numa-node-round-robin", QEMU_CAPS_NUMA, QEMU_CAPS_OBJECT_MEMORY_RAM); + DO_TEST("cpu-host-passthrough-numa-interleave", QEMU_CAPS_NUMA, QEMU_CAPS_OBJECT_MEMORY_RAM); + DO_TEST("cpu-host-passthrough-numa-node-interleave", QEMU_CAPS_NUMA, QEMU_CAPS_OBJECT_MEMORY_RAM); + DO_TEST("cpu-host-passthrough-nonuma", QEMU_CAPS_NUMA); DO_TEST_FAILURE("cpu-qemu-host-passthrough", QEMU_CAPS_KVM); qemuTestSetHostArch(driver.caps, VIR_ARCH_S390X); -- 2.21.0
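Assuming an autotools build tree, the new cases can be exercised standalone with the qemuxml2argv test binary; VIR_TEST_DEBUG and VIR_TEST_RANGE are the test suite's usual environment knobs (the exact test numbers depend on the tree):

  $ make -C tests qemuxml2argvtest
  $ VIR_TEST_DEBUG=1 ./tests/qemuxml2argvtest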

On Mon, Oct 21, 2019 at 09:21:04PM +0200, Wim Ten Have wrote:
> From: Wim ten Have <wim.ten.have@oracle.com>
>
> This patch extends guest domain administration by adding a feature
> that creates a guest with a NUMA layout, also referred to as vNUMA
> (Virtual NUMA).
Errr, that feature already exists. You can create a guest NUMA layout
with this:

  <domain>
    <cpu>
      ...
      <numa>
        <cell id='0' cpus='0-3' memory='512000' unit='KiB' discard='yes'/>
        <cell id='1' cpus='4-7' memory='512000' unit='KiB' memAccess='shared'/>
      </numa>
      ...
    </cpu>
  </domain>

[snip]
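The host-side binding half of the mechanism also exists already via
<numatune>; a minimal sketch, with illustrative node numbers:

  <domain>
    <numatune>
      <memory mode='strict' nodeset='0-1'/>
      <memnode cellid='0' mode='strict' nodeset='0'/>
      <memnode cellid='1' mode='strict' nodeset='1'/>
    </numatune>
  </domain>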
> The changes brought by this patch series add a new libvirt domain
> element named <vnuma> that allows for dynamic 'host' or 'node'
> partitioning of a guest where libvirt inspects the host capabilities
> and renders a best guest XML design holding a host matching vNUMA
> topology.
>
>   <domain>
>     ..
>     <vnuma mode='host|node'
>            distribution='contiguous|siblings|round-robin|interleave'>
>       <memory unit='KiB'>524288</memory>
>       <partition nodeset="1-4,^3" cells="8"/>
>     </vnuma>
>     ..
>   </domain>
> The content of this <vnuma> element causes libvirt to dynamically
> partition the guest domain XML into a 'host' or 'node' numa model.
>
> Under <vnuma mode='host' ... > the guest domain is automatically
> partitioned according to the "host" capabilities.
>
> Under <vnuma mode='node' ... > the guest domain is partitioned
> according to the nodeset and cells under the vnuma partition
> subelement.
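For reference, the tests in the series exercise exactly these two
shapes; abridged from the cpu-host-passthrough-numa-contiguous and
-numa-node-contiguous data above:

  <!-- mode='host': cells and sizes derived from host capabilities -->
  <vcpu placement='static'>16</vcpu>
  <vnuma mode='host' distribution='contiguous'/>

  <!-- mode='node': cells and nodeset taken from <partition> -->
  <vcpu placement='static'>16</vcpu>
  <vnuma mode='node' distribution='contiguous'>
    <partition nodeset='1,2' cells='8'/>
  </vnuma>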
> The optional <vnuma> attribute distribution='type' is to indicate the
> guest numa cell cpus distribution. This distribution='type' can have
> the following values:
> - 'contiguous' delivery, under which the cpus enumerate sequentially
>   over the numa defined cells.
> - 'siblings' cpus are distributed over the numa cells matching the
>   host CPU SMT model.
> - 'round-robin' cpus are distributed over the numa cells matching the
>   host CPU topology.
> - 'interleave' cpus are interleaved one at a time over the numa cells.
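The .args files in the series show how two of these distributions land
on the QEMU command line for a 16-vCPU, 4-cell guest (abridged):

  # distribution='contiguous'
  -numa node,nodeid=0,cpus=0-3,memdev=ram-node0
  -numa node,nodeid=1,cpus=4-7,memdev=ram-node1

  # distribution='round-robin'
  -numa node,nodeid=0,cpus=0,cpus=4,cpus=8,cpus=12,memdev=ram-node0
  -numa node,nodeid=1,cpus=1,cpus=5,cpus=9,cpus=13,memdev=ram-node1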
> The optional subelement <memory> specifies the memory size reserved
> for the guest to dimension its <numa> <cell id> size. If no memory is
> specified, the <vnuma> <memory> setting is acquired from the guest's
> total memory, <domain> <memory> setting.
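The cpu-host-passthrough-numa-node-siblings test above shows the
override in action; the per-cell size comes from <vnuma> <memory>, not
from the domain total:

  <memory unit='KiB'>219100</memory>
  ...
  <vnuma mode='node' distribution='siblings'>
    <memory unit='GiB'>4</memory>
    <partition nodeset='0' cells='1'/>
  </vnuma>

which yields a single 4 GiB cell (-m 4096, size=4294967296) rather
than one dimensioned from the 219100 KiB domain memory.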
This seems to be just implementing some specific policies to
automagically configure the NUMA config. This is all already possible
for the mgmt apps to do with the existing XML configs we expose, AFAIK.

Libvirt's goal is to /not/ implement specific policies like this, but
instead expose the mechanism for apps to use to define policies as
they see fit.

Regards,
Daniel

--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
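To make the mechanism-versus-policy point concrete, a rough sketch of
such an app-side policy (a hypothetical helper, not part of libvirt or
this series) that renders the existing <numa><cell> XML for a
contiguous split:

  # Hypothetical app-side policy: emit the existing <numa><cell> XML
  # for a contiguous vCPU/memory split -- no new libvirt element needed.
  def numa_cells(vcpus, cells, total_kib):
      per_cell = vcpus // cells          # assumes vcpus % cells == 0
      mem = total_kib // cells
      for cell in range(cells):
          first = cell * per_cell
          yield ("<cell id='%d' cpus='%d-%d' memory='%d' unit='KiB'/>"
                 % (cell, first, first + per_cell - 1, mem))

  # 16 vCPUs over 4 cells, 219100 KiB total:
  #   <cell id='0' cpus='0-3' memory='54775' unit='KiB'/> ... and so on.
  for line in numa_cells(16, 4, 219100):
      print(line)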