On Thu, Aug 31, 2017 at 04:02:38PM +0200, Wim Ten Have wrote:
From: Wim ten Have <wim.ten.have(a)oracle.com>
I haven't seen the previous versions, so sorry if I point out something
that got changed already.
Add libvirtd NUMA cell domain administration functionality to
describe underlying cell id sibling distances in full fashion
when configuring HVM guests.
[below is an example of a 4 node setup]
<cpu>
<numa>
<cell id='0' cpus='0' memory='2097152'
unit='KiB'>
<distances>
<sibling id='0' value='10'/>
<sibling id='1' value='21'/>
<sibling id='2' value='31'/>
<sibling id='3' value='41'/>
</distances>
</cell>
<cell id='1' cpus='1' memory='2097152'
unit='KiB'>
<distances>
<sibling id='0' value='21'/>
<sibling id='1' value='10'/>
<sibling id='2' value='31'/>
<sibling id='3' value='41'/>
</distances>
</cell>
<cell id='2' cpus='2' memory='2097152'
unit='KiB'>
<distances>
<sibling id='0' value='31'/>
<sibling id='1' value='21'/>
<sibling id='2' value='10'/>
<sibling id='3' value='21'/>
</distances>
<cell id='3' cpus='3' memory='2097152'
unit='KiB'>
<distances>
<sibling id='0' value='41'/>
<sibling id='1' value='31'/>
<sibling id='2' value='21'/>
<sibling id='3' value='10'/>
</distances>
</cell>
</numa>
</cpu>
Are all those required? I don't see the point in requiring duplicate
values.
Changes under this commit concern all those that require adding
the valid data structures, virDomainNuma* functional routines and the
XML doc schema enhancements to enforce appropriate administration.
I don't understand the point of this paragraph.
These changes are accompanied with topic related documentation under
docs/formatdomain.html within the "CPU model and topology" extending the
"Guest NUMA topology" paragraph.
This can be seen from the patch.
For terminology we refer to sockets as "nodes" where access
to each
others' distinct resources such as memory make them "siblings" with a
designated "distance" between them. A specific design is described under
the ACPI (Advanced Configuration and Power Interface Specification)
within the chapter explaining the system's SLIT (System Locality Distance
Information Table).
The above paragraph is also being added in the docs.
These patches extend core libvirt's XML description of a virtual
machine's
hardware to include NUMA distance information for sibling nodes, which
is then passed to Xen guests via libxl. Recently qemu landed support for
constructing the SLIT since commit 0f203430dd ("numa: Allow setting NUMA
distance for different NUMA nodes"), hence these core libvirt extensions
can also help other drivers in supporting this feature.
This is not true, libxl changes are in separate patch. There's a lot of
copy-paste from the cover-letter...
These changes alter the docs/schemas/cputypes.rng enforcing
domain administration to follow the syntax below per numa cell id.
These changes also alter docs/schemas/basictypes.rng to add
"numaDistanceValue" which is an "unsignedInt" with a minimum value
of 10 as 0-9 are reserved values and can not be used as System
Locality Distance Information Table data.
Signed-off-by: Wim ten Have <wim.ten.have(a)oracle.com>
---
Changes on v1:
- Add changes to docs/formatdomain.html.in describing schema update.
Changes on v2:
- Automatically apply distance symmetry maintaining cell <-> sibling.
- Check for maximum '255' on numaDistanceValue.
- Automatically complete empty distance ranges.
- Check that sibling_id's are in range with cell identifiers.
- Allow non-contiguous ranges, starting from any node id.
- Respect parameters as ATTRIBUTE_NONNULL fix functions and callers.
- Add and apply topoly for LOCAL_DISTANCE=10 and REMOTE_DISTANCE=20.
---
docs/formatdomain.html.in | 70 +++++++++-
docs/schemas/basictypes.rng | 9 ++
docs/schemas/cputypes.rng | 18 +++
src/conf/cpu_conf.c | 2 +-
src/conf/numa_conf.c | 323 +++++++++++++++++++++++++++++++++++++++++++-
src/conf/numa_conf.h | 25 +++-
src/libvirt_private.syms | 6 +
7 files changed, 444 insertions(+), 9 deletions(-)
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index 8ca7637..19a2ac7 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -1529,7 +1529,75 @@
</p>
<p>
- This guest NUMA specification is currently available only for QEMU/KVM.
+ This guest NUMA specification is currently available only for
+ QEMU/KVM and Xen. Whereas Xen driver also allows for a distinct
+ description of NUMA arranged <code>sibling</code>
<code>cell</code>
+ <code>distances</code> <span class="since">Since
3.6.0</span>.
+ </p>
+
+ <p>
+ Under NUMA h/w architecture, distinct resources such as memory
+ create a designated distance between <code>cell</code> and
+ <code>siblings</code> that now can be described with the help of
+ <code>distances</code>. A detailed describtion can be found within
+ the ACPI (Advanced Configuration and Power Interface Specification)
+ within the chapter explaining the system's SLIT (System Locality
+ Distance Information Table).
+ </p>
+
+<pre>
+...
+<cpu>
+ ...
+ <numa>
+ <cell id='0' cpus='0,4-7' memory='512000'
unit='KiB'>
+ <distances>
+ <sibling id='0' value='10'/>
+ <sibling id='1' value='21'/>
+ <sibling id='2' value='31'/>
+ <sibling id='3' value='41'/>
+ </distances>
+ </cell>
+ <cell id='1' cpus='1,8-10,12-15' memory='512000'
unit='KiB' memAccess='shared'>
+ <distances>
+ <sibling id='0' value='21'/>
+ <sibling id='1' value='10'/>
+ <sibling id='2' value='21'/>
+ <sibling id='3' value='31'/>
+ </distances>
+ </cell>
+ <cell id='2' cpus='2,11' memory='512000'
unit='KiB' memAccess='shared'>
+ <distances>
+ <sibling id='0' value='31'/>
+ <sibling id='1' value='21'/>
+ <sibling id='2' value='10'/>
+ <sibling id='3' value='21'/>
+ </distances>
+ </cell>
+ <cell id='3' cpus='3' memory='512000'
unit='KiB'>
+ <distances>
+ <sibling id='0' value='41'/>
+ <sibling id='1' value='31'/>
+ <sibling id='2' value='21'/>
+ <sibling id='3' value='10'/>
+ </distances>
+ </cell>
+ </numa>
+ ...
+</cpu>
+...</pre>
+
+ <p>
+ Under Xen driver, if no <code>distances</code> are given to describe
+ the SLIT data between different cells, it will default to a scheme
+ using 10 for local and 20 for remote distances. Made defaults
+ for guest OS when no SLIT is specified.
+ <br/>See include/linux/topology.h under
+ <pre>
+ /* Conform to ACPI 2.0 SLIT distance definitions */
+ #define LOCAL_DISTANCE 10
+ #define REMOTE_DISTANCE 20
+ </pre>
I wouldn't include the code in the <pre>, I don't see the point for that.
</p>
<h3><a id="elementsEvents">Events
configuration</a></h3>
diff --git a/docs/schemas/basictypes.rng b/docs/schemas/basictypes.rng
index 1ea667c..bbef282 100644
--- a/docs/schemas/basictypes.rng
+++ b/docs/schemas/basictypes.rng
@@ -77,6 +77,15 @@
</choice>
</define>
+ <define name="numaDistanceValue">
+ <choice>
+ <data type="unsignedInt">
+ <param name="minInclusive">10</param>
+ <param name="maxInclusive">255</param>
+ </data>
+ </choice>
Why <choice>?
+ </define>
+
<define name="pciaddress">
<optional>
<attribute name="domain">
diff --git a/docs/schemas/cputypes.rng b/docs/schemas/cputypes.rng
index 3eef16a..c45b6df 100644
--- a/docs/schemas/cputypes.rng
+++ b/docs/schemas/cputypes.rng
@@ -129,6 +129,24 @@
</choice>
</attribute>
</optional>
+ <optional>
+ <element name="distances">
+ <oneOrMore>
+ <ref name="numaDistance"/>
+ </oneOrMore>
+ </element>
+ </optional>
+ </element>
+ </define>
+
+ <define name="numaDistance">
+ <element name="sibling">
+ <attribute name="id">
+ <ref name="unsignedInt"/>
+ </attribute>
+ <attribute name="value">
+ <ref name="numaDistanceValue"/>
+ </attribute>
</element>
</define>
diff --git a/src/conf/cpu_conf.c b/src/conf/cpu_conf.c
index c21d11d..8d804a1 100644
--- a/src/conf/cpu_conf.c
+++ b/src/conf/cpu_conf.c
@@ -642,7 +642,7 @@ virCPUDefFormatBufFull(virBufferPtr buf,
if (virCPUDefFormatBuf(&childrenBuf, def, updateCPU) < 0)
goto cleanup;
- if (virDomainNumaDefCPUFormat(&childrenBuf, numa) < 0)
+ if (virDomainNumaDefCPUFormatXML(&childrenBuf, numa) < 0)
goto cleanup;
Changing function names should be separate patch. Why is this
changed anyway?
if (virBufferCheckError(&attributeBuf) < 0 ||
diff --git a/src/conf/numa_conf.c b/src/conf/numa_conf.c
index bfd3703..fb2a74c 100644
--- a/src/conf/numa_conf.c
+++ b/src/conf/numa_conf.c
@@ -29,6 +29,13 @@
#include "virnuma.h"
#include "virstring.h"
+/*
+ * Distance definitions defined Conform ACPI 2.0 SLIT.
+ * See include/linux/topology.h
+ */
+#define LOCAL_DISTANCE 10
+#define REMOTE_DISTANCE 20
+
#define VIR_FROM_THIS VIR_FROM_DOMAIN
VIR_ENUM_IMPL(virDomainNumatuneMemMode,
@@ -48,6 +55,8 @@ VIR_ENUM_IMPL(virDomainMemoryAccess, VIR_DOMAIN_MEMORY_ACCESS_LAST,
"shared",
"private")
+typedef struct _virDomainNumaDistance virDomainNumaDistance;
+typedef virDomainNumaDistance *virDomainNumaDistancePtr;
typedef struct _virDomainNumaNode virDomainNumaNode;
typedef virDomainNumaNode *virDomainNumaNodePtr;
@@ -66,6 +75,12 @@ struct _virDomainNuma {
virBitmapPtr nodeset; /* host memory nodes where this guest node resides */
virDomainNumatuneMemMode mode; /* memory mode selection */
virDomainMemoryAccess memAccess; /* shared memory access configuration */
+
+ struct _virDomainNumaDistance {
+ unsigned int value; /* locality value for node i->j or j->i */
+ unsigned int cellid;
+ } *distances; /* remote node distances */
+ size_t ndistances;
} *mem_nodes; /* guest node configuration */
size_t nmem_nodes;
@@ -686,6 +701,128 @@ virDomainNumatuneNodesetIsAvailable(virDomainNumaPtr numatune,
}
+static int
+virDomainNumaDefNodeDistanceParseXML(virDomainNumaPtr def,
+ xmlXPathContextPtr ctxt,
+ unsigned int cur_cell)
+{
+ int ret = -1;
+ int sibling;
+ char *tmp = NULL;
+ xmlNodePtr *nodes = NULL;
+ size_t i, ndistances = def->nmem_nodes;
+
+ if (!ndistances)
+ return 0;
+
+ if (!virXPathNode("./distances[1]/sibling", ctxt))
+ return 0;
+
+ if ((sibling = virXPathNodeSet("./distances[1]/sibling", ctxt, &nodes))
<= 0) {
+ virReportError(VIR_ERR_XML_ERROR, "%s",
+ _("NUMA distances defined without siblings"));
+ goto cleanup;
+ }
+
Aren't these two blocks of code checking the same thing?
+ for (i = 0; i < sibling; i++) {
+ virDomainNumaDistancePtr ldist, rdist;
+ unsigned int sibling_id, sibling_value;
+
+ /* siblings are in order of parsing or explicitly numbered */
+ if ((tmp = virXMLPropString(nodes[i], "id"))) {
+ if (virStrToLong_uip(tmp, NULL, 10, &sibling_id) < 0) {
+ virReportError(VIR_ERR_XML_ERROR,
+ _("Invalid 'id' attribute in NUMA distances
for sibling: '%s'"),
+ tmp);
+ goto cleanup;
+ }
+ }
+ VIR_FREE(tmp);
+
+ if (sibling_id >= ndistances) {
Isn't sibling_id uninitialized in case there is no 'id' specified?
+ virReportError(VIR_ERR_XML_ERROR,
+ _("No cell administration for 'sibling_id %d'
under NUMA cell '%d' "),
+ sibling_id, cur_cell);
+ goto cleanup;
+ }
+
+ /* We need a locality value. Check and correct
+ * distance to local and distance to remote node.
+ */
+ if (!(tmp = virXMLPropString(nodes[i], "value"))) {
+ virReportError(VIR_ERR_XML_ERROR,
+ _("Missing 'value' attribute in NUMA distances
for sibling id: '%d'"),
+ sibling_id);
+ goto cleanup;
+ }
+
+ /* It needs to be applicable */
+ if (virStrToLong_uip(tmp, NULL, 10, &sibling_value) < 0) {
+ virReportError(VIR_ERR_XML_ERROR,
+ _("Invalid 'value' attribute in NUMA distances
for sibling id: '%d'"),
+ sibling_id);
+ goto cleanup;
+ }
+ VIR_FREE(tmp);
+
+ ldist = def->mem_nodes[cur_cell].distances;
+ if (!ldist) {
+ if (def->mem_nodes[cur_cell].ndistances) {
+ virReportError(VIR_ERR_XML_ERROR,
+ _("Erratic 'ndistances' set in NUMA distances
for sibling id: '%d'"),
+ cur_cell);
+ goto cleanup;
+ }
+
+ if (VIR_ALLOC_N(ldist, ndistances) < 0)
+ goto cleanup;
+
+ if (!ldist[cur_cell].value)
+ ldist[cur_cell].value = LOCAL_DISTANCE;
+ ldist[cur_cell].cellid = cur_cell;
+ def->mem_nodes[cur_cell].ndistances = ndistances;
+ }
+
+ ldist[sibling_id].cellid = sibling_id;
+ ldist[sibling_id].value = sibling_value;
I don't see the check for 10 <= sibling_value <= 255 anywhere.
+ def->mem_nodes[cur_cell].distances = ldist;
+
+ rdist = def->mem_nodes[sibling_id].distances;
+ if (!rdist) {
+ if (def->mem_nodes[sibling_id].ndistances) {
+ virReportError(VIR_ERR_XML_ERROR,
+ _("Erratic 'ndistances' set in NUMA distances
for sibling id: '%d'"),
+ sibling_id);
+ goto cleanup;
+ }
+
+ if (VIR_ALLOC_N(rdist, ndistances) < 0)
+ goto cleanup;
+
+ if (!rdist[sibling_id].value)
+ rdist[sibling_id].value = LOCAL_DISTANCE;
+ rdist[sibling_id].cellid = sibling_id;
+ def->mem_nodes[sibling_id].ndistances = ndistances;
+ }
+
+ rdist[cur_cell].cellid = cur_cell;
+ rdist[cur_cell].value = sibling_value;
+ def->mem_nodes[sibling_id].distances = rdist;
+ }
+
+ ret = 0;
+
+ cleanup:
+ if (ret) {
+ for (i = 0; i < ndistances; i++)
+ VIR_FREE(def->mem_nodes[i].distances);
+ }
+ VIR_FREE(nodes);
+ VIR_FREE(tmp);
+
+ return ret;
+}
+
int
virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
xmlXPathContextPtr ctxt)
@@ -694,7 +831,7 @@ virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
xmlNodePtr oldNode = ctxt->node;
char *tmp = NULL;
int n;
- size_t i;
+ size_t i, j;
int ret = -1;
/* check if NUMA definition is present */
@@ -712,7 +849,6 @@ virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
def->nmem_nodes = n;
for (i = 0; i < n; i++) {
- size_t j;
int rc;
unsigned int cur_cell = i;
@@ -788,6 +924,10 @@ virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
def->mem_nodes[cur_cell].memAccess = rc;
VIR_FREE(tmp);
}
+
+ /* Parse NUMA distances info */
+ if (virDomainNumaDefNodeDistanceParseXML(def, ctxt, cur_cell) < 0)
+ goto cleanup;
}
ret = 0;
@@ -801,8 +941,8 @@ virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
int
-virDomainNumaDefCPUFormat(virBufferPtr buf,
- virDomainNumaPtr def)
+virDomainNumaDefCPUFormatXML(virBufferPtr buf,
+ virDomainNumaPtr def)
Same here with the rename.
{
virDomainMemoryAccess memAccess;
char *cpustr;
@@ -815,6 +955,8 @@ virDomainNumaDefCPUFormat(virBufferPtr buf,
virBufferAddLit(buf, "<numa>\n");
virBufferAdjustIndent(buf, 2);
for (i = 0; i < ncells; i++) {
+ int ndistances;
+
memAccess = virDomainNumaGetNodeMemoryAccessMode(def, i);
if (!(cpustr = virBitmapFormat(virDomainNumaGetNodeCpumask(def, i))))
@@ -829,7 +971,32 @@ virDomainNumaDefCPUFormat(virBufferPtr buf,
if (memAccess)
virBufferAsprintf(buf, " memAccess='%s'",
virDomainMemoryAccessTypeToString(memAccess));
- virBufferAddLit(buf, "/>\n");
+
+ ndistances = def->mem_nodes[i].ndistances;
+ if (!ndistances) {
+ virBufferAddLit(buf, "/>\n");
+ } else {
+ size_t j;
+ virDomainNumaDistancePtr distances = def->mem_nodes[i].distances;
+
+ virBufferAddLit(buf, ">\n");
+ virBufferAdjustIndent(buf, 2);
+ virBufferAddLit(buf, "<distances>\n");
+ virBufferAdjustIndent(buf, 2);
+ for (j = 0; j < ndistances; j++) {
+ if (distances[j].value) {
+ virBufferAddLit(buf, "<sibling");
+ virBufferAsprintf(buf, " id='%d'",
distances[j].cellid);
+ virBufferAsprintf(buf, " value='%d'",
distances[j].value);
+ virBufferAddLit(buf, "/>\n");
+ }
+ }
+ virBufferAdjustIndent(buf, -2);
+ virBufferAddLit(buf, "</distances>\n");
+ virBufferAdjustIndent(buf, -2);
+ virBufferAddLit(buf, "</cell>\n");
+ }
+
VIR_FREE(cpustr);
}
virBufferAdjustIndent(buf, -2);
@@ -922,13 +1089,146 @@ virDomainNumaCheckABIStability(virDomainNumaPtr src,
size_t
virDomainNumaGetNodeCount(virDomainNumaPtr numa)
{
- if (!numa)
+ if (!numa || !numa->mem_nodes)
Just "why?" ^^
return 0;
return numa->nmem_nodes;
}
+size_t
+virDomainNumaSetNodeCount(virDomainNumaPtr numa, size_t nmem_nodes)
+{
+ if (!nmem_nodes) {
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+ _("Cannot set an empty nmem_nodes set"));
+ return 0;
+ }
+
+ if (numa->mem_nodes) {
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+ _("Cannot alter an existing nmem_nodes set"));
+ return 0;
+ }
+
+ if (VIR_ALLOC_N(numa->mem_nodes, nmem_nodes) < 0)
+ return 0;
+
+ numa->nmem_nodes = nmem_nodes;
+
+ return numa->nmem_nodes;
+}
+
Are you introducing a function that is not used anywhere as a part of a
patch that does something else as well? I can't see the point of this
function really... It looks like you are guarding against user input
which doesn't really makes sense here for me. Maybe I need to see
where it is used but it's not in this patch...
+size_t
+virDomainNumaGetNodeDistance(virDomainNumaPtr numa,
+ size_t node,
+ size_t cellid)
+{
+ virDomainNumaDistancePtr distances = NULL;
+
+ if (node < numa->nmem_nodes)
+ distances = numa->mem_nodes[node].distances;
+
+ /*
+ * Present the configured distance value. If
+ * out of range or not available set the platform
+ * defined default for local and remote nodes.
+ */
+ if (!distances ||
+ !distances[cellid].value ||
+ !numa->mem_nodes[node].ndistances)
+ return (node == cellid) ? \
+ LOCAL_DISTANCE : REMOTE_DISTANCE;
+
+ return distances[cellid].value;
+}
+
+
+int
+virDomainNumaSetNodeDistance(virDomainNumaPtr numa,
+ size_t node,
+ size_t cellid,
+ unsigned int value)
+{
+ virDomainNumaDistancePtr distances;
+
+ if (node >= numa->nmem_nodes) {
+ virReportError(VIR_ERR_INTERNAL_ERROR,
+ _("Argument 'node' %zu outranges defined number
"
+ "of NUMA nodes"), node);
+ return -1;
+ }
+
+ distances = numa->mem_nodes[node].distances;
+ if (!distances ||
+ cellid >= numa->mem_nodes[node].ndistances) {
+ virReportError(VIR_ERR_XML_ERROR, "%s",
+ _("Arguments under memnode element do not "
+ "correspond with existing guest's NUMA cell"));
+ return -1;
+ }
+
+ /*
+ * Advanced Configuration and Power Interface
+ * Specification version 6.1. Chapter 5.2.17
+ * System Locality Distance Information Table
+ * ... Distance values of 0-9 are reserved.
+ */
+ if (value < LOCAL_DISTANCE) {
+ virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+ _("Distance value of %d is in 0-9 reserved range"),
+ value);
+ return -1;
+ }
+
+ if (value > LOCAL_DISTANCE && node == cellid) {
+ virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+ _("Distance value %d under node %zu does "
+ "not meet LOCAL_DISTANCE of 10"),
Maybe you meant REMOTE_DISTANCE here? If there were tests it would be
visible. Similarly if this function was used anywhere in this patch. I
don't feel like reading further, other may continue the review.
Have a nice day,
Martin