From: Wim ten Have <wim.ten.have(a)oracle.com>
Add libvirtd NUMA cell domain administration functionality to
describe underlying cell id sibling distances in full fashion
when configuring HVM guests.
Schema updates are made to docs/schemas/cputypes.rng enforcing domain
administration to follow the syntax below the numa cell id and
docs/schemas/basictypes.rng to add "numaDistanceValue".
A minimum value of 10 representing the LOCAL_DISTANCE as 0-9 are
reserved values and can not be used as System Locality Distance Information.
A value of 20 represents the default setting of REMOTE_DISTANCE
where a maximum value of 255 represents UNREACHABLE.
Effectively any cell sibling can be assigned a distance value where
practically 'LOCAL_DISTANCE <= value <= UNREACHABLE'.
[below is an example of a 4 node setup]
<cpu>
<numa>
<cell id='0' cpus='0' memory='2097152'
unit='KiB'>
<distances>
<sibling id='0' value='10'/>
<sibling id='1' value='21'/>
<sibling id='2' value='31'/>
<sibling id='3' value='41'/>
</distances>
</cell>
<cell id='1' cpus='1' memory='2097152'
unit='KiB'>
<distances>
<sibling id='0' value='21'/>
<sibling id='1' value='10'/>
<sibling id='2' value='31'/>
<sibling id='3' value='41'/>
</distances>
</cell>
<cell id='2' cpus='2' memory='2097152'
unit='KiB'>
<distances>
<sibling id='0' value='31'/>
<sibling id='1' value='21'/>
<sibling id='2' value='10'/>
<sibling id='3' value='21'/>
</distances>
<cell id='3' cpus='3' memory='2097152'
unit='KiB'>
<distances>
<sibling id='0' value='41'/>
<sibling id='1' value='31'/>
<sibling id='2' value='21'/>
<sibling id='3' value='10'/>
</distances>
</cell>
</numa>
</cpu>
Whenever a sibling id the cell LOCAL_DISTANCE does apply and for any
sibling id not being covered a default of REMOTE_DISTANCE is used
for internal computations.
Signed-off-by: Wim ten Have <wim.ten.have(a)oracle.com>
---
Changes on v1:
- Add changes to docs/formatdomain.html.in describing schema update.
Changes on v2:
- Automatically apply distance symmetry maintaining cell <-> sibling.
- Check for maximum '255' on numaDistanceValue.
- Automatically complete empty distance ranges.
- Check that sibling_id's are in range with cell identifiers.
- Allow non-contiguous ranges, starting from any node id.
- Respect parameters as ATTRIBUTE_NONNULL fix functions and callers.
- Add and apply topology for LOCAL_DISTANCE=10 and REMOTE_DISTANCE=20.
Changes on v3
- Add UNREACHABLE if one locality is unreachable from another.
- Add code cleanup aligning function naming in a separated patch.
- Add numa related driver code in a separated patch.
- Remove <choice> from numaDistanceValue schema/basictypes.rng
- Correct doc changes.
---
docs/formatdomain.html.in | 63 +++++++++++++-
docs/schemas/basictypes.rng | 7 ++
docs/schemas/cputypes.rng | 18 ++++
src/conf/numa_conf.c | 200 +++++++++++++++++++++++++++++++++++++++++++-
4 files changed, 284 insertions(+), 4 deletions(-)
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index 8ca7637..c453d44 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -1529,7 +1529,68 @@
</p>
<p>
- This guest NUMA specification is currently available only for QEMU/KVM.
+ This guest NUMA specification is currently available only for
+ QEMU/KVM and Xen. Whereas Xen driver also allows for a distinct
+ description of NUMA arranged <code>sibling</code>
<code>cell</code>
+ <code>distances</code> <span class="since">Since
3.6.0</span>.
+ </p>
+
+ <p>
+ Under NUMA h/w architecture, distinct resources such as memory
+ create a designated distance between <code>cell</code> and
+ <code>siblings</code> that now can be described with the help of
+ <code>distances</code>. A detailed description can be found within
+ the ACPI (Advanced Configuration and Power Interface Specification)
+ within the chapter explaining the system's SLIT (System Locality
+ Distance Information Table).
+ </p>
+
+<pre>
+...
+<cpu>
+ ...
+ <numa>
+ <cell id='0' cpus='0,4-7' memory='512000'
unit='KiB'>
+ <distances>
+ <sibling id='0' value='10'/>
+ <sibling id='1' value='21'/>
+ <sibling id='2' value='31'/>
+ <sibling id='3' value='41'/>
+ </distances>
+ </cell>
+ <cell id='1' cpus='1,8-10,12-15' memory='512000'
unit='KiB' memAccess='shared'>
+ <distances>
+ <sibling id='0' value='21'/>
+ <sibling id='1' value='10'/>
+ <sibling id='2' value='21'/>
+ <sibling id='3' value='31'/>
+ </distances>
+ </cell>
+ <cell id='2' cpus='2,11' memory='512000'
unit='KiB' memAccess='shared'>
+ <distances>
+ <sibling id='0' value='31'/>
+ <sibling id='1' value='21'/>
+ <sibling id='2' value='10'/>
+ <sibling id='3' value='21'/>
+ </distances>
+ </cell>
+ <cell id='3' cpus='3' memory='512000'
unit='KiB'>
+ <distances>
+ <sibling id='0' value='41'/>
+ <sibling id='1' value='31'/>
+ <sibling id='2' value='21'/>
+ <sibling id='3' value='10'/>
+ </distances>
+ </cell>
+ </numa>
+ ...
+</cpu>
+...</pre>
+
+ <p>
+ Under Xen driver, if no <code>distances</code> are given to describe
+ the SLIT data between different cells, it will default to a scheme
+ using 10 for local and 20 for remote distances.
</p>
<h3><a id="elementsEvents">Events
configuration</a></h3>
diff --git a/docs/schemas/basictypes.rng b/docs/schemas/basictypes.rng
index 1ea667c..1a18cd3 100644
--- a/docs/schemas/basictypes.rng
+++ b/docs/schemas/basictypes.rng
@@ -77,6 +77,13 @@
</choice>
</define>
+ <define name="numaDistanceValue">
+ <data type="unsignedInt">
+ <param name="minInclusive">10</param>
+ <param name="maxInclusive">255</param>
+ </data>
+ </define>
+
<define name="pciaddress">
<optional>
<attribute name="domain">
diff --git a/docs/schemas/cputypes.rng b/docs/schemas/cputypes.rng
index 3eef16a..c45b6df 100644
--- a/docs/schemas/cputypes.rng
+++ b/docs/schemas/cputypes.rng
@@ -129,6 +129,24 @@
</choice>
</attribute>
</optional>
+ <optional>
+ <element name="distances">
+ <oneOrMore>
+ <ref name="numaDistance"/>
+ </oneOrMore>
+ </element>
+ </optional>
+ </element>
+ </define>
+
+ <define name="numaDistance">
+ <element name="sibling">
+ <attribute name="id">
+ <ref name="unsignedInt"/>
+ </attribute>
+ <attribute name="value">
+ <ref name="numaDistanceValue"/>
+ </attribute>
</element>
</define>
diff --git a/src/conf/numa_conf.c b/src/conf/numa_conf.c
index b71dc01..5db4311 100644
--- a/src/conf/numa_conf.c
+++ b/src/conf/numa_conf.c
@@ -29,6 +29,15 @@
#include "virnuma.h"
#include "virstring.h"
+/*
+ * Distance definitions defined Conform ACPI 2.0 SLIT.
+ * See include/linux/topology.h
+ */
+#define LOCAL_DISTANCE 10
+#define REMOTE_DISTANCE 20
+/* SLIT entry value is a one-byte unsigned integer. */
+#define UNREACHABLE 255
+
#define VIR_FROM_THIS VIR_FROM_DOMAIN
VIR_ENUM_IMPL(virDomainNumatuneMemMode,
@@ -48,6 +57,8 @@ VIR_ENUM_IMPL(virDomainMemoryAccess, VIR_DOMAIN_MEMORY_ACCESS_LAST,
"shared",
"private")
+typedef struct _virDomainNumaDistance virDomainNumaDistance;
+typedef virDomainNumaDistance *virDomainNumaDistancePtr;
typedef struct _virDomainNumaNode virDomainNumaNode;
typedef virDomainNumaNode *virDomainNumaNodePtr;
@@ -66,6 +77,12 @@ struct _virDomainNuma {
virBitmapPtr nodeset; /* host memory nodes where this guest node resides */
virDomainNumatuneMemMode mode; /* memory mode selection */
virDomainMemoryAccess memAccess; /* shared memory access configuration */
+
+ struct _virDomainNumaDistance {
+ unsigned int value; /* locality value for node i->j or j->i */
+ unsigned int cellid;
+ } *distances; /* remote node distances */
+ size_t ndistances;
} *mem_nodes; /* guest node configuration */
size_t nmem_nodes;
@@ -686,6 +703,153 @@ virDomainNumatuneNodesetIsAvailable(virDomainNumaPtr numatune,
}
+static int
+virDomainNumaDefNodeDistanceParseXML(virDomainNumaPtr def,
+ xmlXPathContextPtr ctxt,
+ unsigned int cur_cell)
+{
+ int ret = -1;
+ int sibling;
+ char *tmp = NULL;
+ xmlNodePtr *nodes = NULL;
+ size_t i, ndistances = def->nmem_nodes;
+
+ if (!ndistances)
+ return 0;
+
+ /* check if NUMA distances definition is present */
+ if (!virXPathNode("./distances[1]", ctxt))
+ return 0;
+
+ if ((sibling = virXPathNodeSet("./distances[1]/sibling", ctxt, &nodes))
<= 0) {
+ virReportError(VIR_ERR_XML_ERROR, "%s",
+ _("NUMA distances defined without siblings"));
+ goto cleanup;
+ }
+
+ for (i = 0; i < sibling; i++) {
+ virDomainNumaDistancePtr ldist, rdist;
+ unsigned int sibling_id, sibling_value;
+
+ /* siblings are in order of parsing or explicitly numbered */
+ if (!(tmp = virXMLPropString(nodes[i], "id"))) {
+ virReportError(VIR_ERR_XML_ERROR,
+ _("Missing 'id' attribute in NUMA "
+ "distances under 'cell id %d'"),
+ cur_cell);
+ goto cleanup;
+ }
+
+ /* The "id" needs to be applicable */
+ if (virStrToLong_uip(tmp, NULL, 10, &sibling_id) < 0) {
+ virReportError(VIR_ERR_XML_ERROR,
+ _("Invalid 'id' attribute in NUMA "
+ "distances for sibling: '%s'"),
+ tmp);
+ goto cleanup;
+ }
+ VIR_FREE(tmp);
+
+ /* The "id" needs to be within numa/cell range */
+ if (sibling_id >= ndistances) {
+ virReportError(VIR_ERR_XML_ERROR,
+ _("There is no cell administrated matching "
+ "'sibling_id %d' under NUMA 'cell id
%d' "),
+ sibling_id, cur_cell);
+ goto cleanup;
+ }
+
+ /* We need a locality value. Check and correct
+ * distance to local and distance to remote node.
+ */
+ if (!(tmp = virXMLPropString(nodes[i], "value"))) {
+ virReportError(VIR_ERR_XML_ERROR,
+ _("Missing 'value' attribute in NUMA distances
"
+ "under 'cell id %d' for 'sibling id
%d'"),
+ cur_cell, sibling_id);
+ goto cleanup;
+ }
+
+ /* The "value" needs to be applicable */
+ if (virStrToLong_uip(tmp, NULL, 10, &sibling_value) < 0) {
+ virReportError(VIR_ERR_XML_ERROR,
+ _("Invalid 'value' attribute in NUMA "
+ "distances for value: '%s'"),
+ tmp);
+ goto cleanup;
+ }
+ VIR_FREE(tmp);
+
+ /* LOCAL_DISTANCE <= "value" <= UNREACHABLE */
+ if (sibling_value < LOCAL_DISTANCE ||
+ sibling_value > UNREACHABLE) {
+ virReportError(VIR_ERR_XML_ERROR,
+ _("Out of range value '%d' set for "
+ "'sibling id %d' under NUMA 'cell id
%d' "),
+ sibling_value, sibling_id, cur_cell);
+ goto cleanup;
+ }
+
+ ldist = def->mem_nodes[cur_cell].distances;
+ if (!ldist) {
+ if (def->mem_nodes[cur_cell].ndistances) {
+ virReportError(VIR_ERR_XML_ERROR,
+ _("Invalid 'ndistances' set in NUMA "
+ "distances for sibling id: '%d'"),
+ cur_cell);
+ goto cleanup;
+ }
+
+ if (VIR_ALLOC_N(ldist, ndistances) < 0)
+ goto cleanup;
+
+ if (!ldist[cur_cell].value)
+ ldist[cur_cell].value = LOCAL_DISTANCE;
+ ldist[cur_cell].cellid = cur_cell;
+ def->mem_nodes[cur_cell].ndistances = ndistances;
+ }
+
+ ldist[sibling_id].cellid = sibling_id;
+ ldist[sibling_id].value = sibling_value;
+ def->mem_nodes[cur_cell].distances = ldist;
+
+ rdist = def->mem_nodes[sibling_id].distances;
+ if (!rdist) {
+ if (def->mem_nodes[sibling_id].ndistances) {
+ virReportError(VIR_ERR_XML_ERROR,
+ _("Invalid 'ndistances' set in NUMA "
+ "distances for sibling id: '%d'"),
+ sibling_id);
+ goto cleanup;
+ }
+
+ if (VIR_ALLOC_N(rdist, ndistances) < 0)
+ goto cleanup;
+
+ if (!rdist[sibling_id].value)
+ rdist[sibling_id].value = LOCAL_DISTANCE;
+ rdist[sibling_id].cellid = sibling_id;
+ def->mem_nodes[sibling_id].ndistances = ndistances;
+ }
+
+ rdist[cur_cell].cellid = cur_cell;
+ rdist[cur_cell].value = sibling_value;
+ def->mem_nodes[sibling_id].distances = rdist;
+ }
+
+ ret = 0;
+
+ cleanup:
+ if (ret) {
+ for (i = 0; i < ndistances; i++)
+ VIR_FREE(def->mem_nodes[i].distances);
+ }
+ VIR_FREE(nodes);
+ VIR_FREE(tmp);
+
+ return ret;
+}
+
int
virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
xmlXPathContextPtr ctxt)
@@ -694,7 +858,7 @@ virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
xmlNodePtr oldNode = ctxt->node;
char *tmp = NULL;
int n;
- size_t i;
+ size_t i, j;
int ret = -1;
/* check if NUMA definition is present */
@@ -712,7 +876,6 @@ virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
def->nmem_nodes = n;
for (i = 0; i < n; i++) {
- size_t j;
int rc;
unsigned int cur_cell = i;
@@ -788,6 +951,10 @@ virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
def->mem_nodes[cur_cell].memAccess = rc;
VIR_FREE(tmp);
}
+
+ /* Parse NUMA distances info */
+ if (virDomainNumaDefNodeDistanceParseXML(def, ctxt, cur_cell) < 0)
+ goto cleanup;
}
ret = 0;
@@ -815,6 +982,8 @@ virDomainNumaDefCPUFormatXML(virBufferPtr buf,
virBufferAddLit(buf, "<numa>\n");
virBufferAdjustIndent(buf, 2);
for (i = 0; i < ncells; i++) {
+ int ndistances;
+
memAccess = virDomainNumaGetNodeMemoryAccessMode(def, i);
if (!(cpustr = virBitmapFormat(virDomainNumaGetNodeCpumask(def, i))))
@@ -829,7 +998,32 @@ virDomainNumaDefCPUFormatXML(virBufferPtr buf,
if (memAccess)
virBufferAsprintf(buf, " memAccess='%s'",
virDomainMemoryAccessTypeToString(memAccess));
- virBufferAddLit(buf, "/>\n");
+
+ ndistances = def->mem_nodes[i].ndistances;
+ if (!ndistances) {
+ virBufferAddLit(buf, "/>\n");
+ } else {
+ size_t j;
+ virDomainNumaDistancePtr distances = def->mem_nodes[i].distances;
+
+ virBufferAddLit(buf, ">\n");
+ virBufferAdjustIndent(buf, 2);
+ virBufferAddLit(buf, "<distances>\n");
+ virBufferAdjustIndent(buf, 2);
+ for (j = 0; j < ndistances; j++) {
+ if (distances[j].value) {
+ virBufferAddLit(buf, "<sibling");
+ virBufferAsprintf(buf, " id='%d'",
distances[j].cellid);
+ virBufferAsprintf(buf, " value='%d'",
distances[j].value);
+ virBufferAddLit(buf, "/>\n");
+ }
+ }
+ virBufferAdjustIndent(buf, -2);
+ virBufferAddLit(buf, "</distances>\n");
+ virBufferAdjustIndent(buf, -2);
+ virBufferAddLit(buf, "</cell>\n");
+ }
+
VIR_FREE(cpustr);
}
virBufferAdjustIndent(buf, -2);
--
2.9.5