[libvirt] [PATCH v2 0/2] Expose distance between host NUMA nodes

diff to v1: -slight change in the XML format -added description to distances value Michal Privoznik (2): virnuma: Introduce virNumaGetDistances virCaps: Expose distance between host NUMA nodes docs/schemas/capability.rng | 15 +++++++++++ src/conf/capabilities.c | 19 +++++++++++++- src/conf/capabilities.h | 13 +++++++++- src/libvirt_private.syms | 1 + src/libxl/libxl_conf.c | 4 +-- src/nodeinfo.c | 61 ++++++++++++++++++++++++++++++++++++++++++--- src/test/test_driver.c | 2 +- src/util/virnuma.c | 55 ++++++++++++++++++++++++++++++++++++++++ src/util/virnuma.h | 3 +++ src/xen/xend_internal.c | 3 ++- tests/vircapstest.c | 4 +-- 11 files changed, 169 insertions(+), 11 deletions(-) -- 2.0.0

The API gets a NUMA node and find distances to other nodes. The distances are returned in an array. If an item X within the array equals to value of zero, then there's no such node as X. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virnuma.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++ src/util/virnuma.h | 3 +++ 3 files changed, 59 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 4c1f61c..d73a9f5 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1652,6 +1652,7 @@ virNodeSuspendGetTargetMask; virDomainNumatuneMemModeTypeFromString; virDomainNumatuneMemModeTypeToString; virNumaGetAutoPlacementAdvice; +virNumaGetDistances; virNumaGetMaxNode; virNumaGetNodeMemory; virNumaIsAvailable; diff --git a/src/util/virnuma.c b/src/util/virnuma.c index 1e099eb..e8ceec8 100644 --- a/src/util/virnuma.c +++ b/src/util/virnuma.c @@ -217,6 +217,61 @@ virNumaGetMaxNode(void) /** + * virNumaGetDistances: + * @node: identifier of the requested NUMA node + * @distances: array of distances to sibling nodes + * @ndistances: size of @distances + * + * Get array of distances to sibling nodes from @node. If a + * distances[x] equals to zero, the node x is not enabled or + * doesn't exist. As a special case, if @node itself refers to + * disabled or nonexistent NUMA node, then @distances and + * @ndistances are set to NULL and zero respectively. + * + * The distances are a bit of magic. For a local node the value + * is 10, for remote it's typically 20 meaning that time penalty + * for accessing a remote node is two time bigger than when + * accessing a local node. + * + * Returns 0 on success, -1 otherwise. + */ +int +virNumaGetDistances(int node, + int **distances, + int *ndistances) +{ + int ret = -1; + int max_node; + size_t i; + + if (!numa_bitmask_isbitset(numa_nodes_ptr, node)) { + VIR_DEBUG("Node %d does not exist", node); + *distances = NULL; + *ndistances = 0; + return 0; + } + + if ((max_node = virNumaGetMaxNode()) < 0) + goto cleanup; + + if (VIR_ALLOC_N(*distances, max_node) < 0) + goto cleanup; + + *ndistances = max_node + 1; + + for (i = 0; i<= max_node; i++) { + if (!numa_bitmask_isbitset(numa_nodes_ptr, i)) + continue; + + (*distances)[i] = numa_distance(node, i); + } + + ret = 0; + cleanup: + return ret; +} + +/** * virNumaGetNodeMemory: * @node: identifier of the requested NUMA node * @memsize: returns the total size of memory in the NUMA node diff --git a/src/util/virnuma.h b/src/util/virnuma.h index 8464b19..fe1e966 100644 --- a/src/util/virnuma.h +++ b/src/util/virnuma.h @@ -58,6 +58,9 @@ int virNumaSetupMemoryPolicy(virNumaTuneDef numatune, bool virNumaIsAvailable(void); int virNumaGetMaxNode(void); +int virNumaGetDistances(int node, + int **distances, + int *ndistances); int virNumaGetNodeMemory(int node, unsigned long long *memsize, unsigned long long *memfree); -- 2.0.0

On Tue, Jun 03, 2014 at 03:26:58PM +0200, Michal Privoznik wrote:
The API gets a NUMA node and find distances to other nodes. The distances are returned in an array. If an item X within the array equals to value of zero, then there's no such node as X.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virnuma.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++ src/util/virnuma.h | 3 +++ 3 files changed, 59 insertions(+)
ACK Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

If user or management application wants to create a guest, it may be useful to know the cost of internode latencies before the guest resources are pinned. For example: <capabilities> <host> ... <topology> <cells num='2'> <cell id='0'> <memory unit='KiB'>4004132</memory> <distances> <sibling id='0' value='10'/> <sibling id='1' value='20'/> </distances> <cpus num='2'> <cpu id='0' socket_id='0' core_id='0' siblings='0'/> <cpu id='2' socket_id='0' core_id='2' siblings='2'/> </cpus> </cell> <cell id='1'> <memory unit='KiB'>4030064</memory> <distances> <sibling id='0' value='20'/> <sibling id='1' value='10'/> </distances> <cpus num='2'> <cpu id='1' socket_id='0' core_id='0' siblings='1'/> <cpu id='3' socket_id='0' core_id='2' siblings='3'/> </cpus> </cell> </cells> </topology> ... </host> ... </capabilities> we can see the distance from node1 to node0 is 20 and within nodes 10. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/schemas/capability.rng | 15 +++++++++++ src/conf/capabilities.c | 19 +++++++++++++- src/conf/capabilities.h | 13 +++++++++- src/libxl/libxl_conf.c | 4 +-- src/nodeinfo.c | 61 ++++++++++++++++++++++++++++++++++++++++++--- src/test/test_driver.c | 2 +- src/xen/xend_internal.c | 3 ++- tests/vircapstest.c | 4 +-- 8 files changed, 110 insertions(+), 11 deletions(-) diff --git a/docs/schemas/capability.rng b/docs/schemas/capability.rng index d2d9776..0c95c05 100644 --- a/docs/schemas/capability.rng +++ b/docs/schemas/capability.rng @@ -189,6 +189,21 @@ </optional> <optional> + <element name='distances'> + <zeroOrMore> + <element name='sibling'> + <attribute name='id'> + <ref name='unsignedInt'/> + </attribute> + <attribute name='value'> + <ref name='unsignedInt'/> + </attribute> + </element> + </zeroOrMore> + </element> + </optional> + + <optional> <element name='cpus'> <attribute name='num'> <ref name='unsignedInt'/> diff --git a/src/conf/capabilities.c b/src/conf/capabilities.c index cf474d7..9971eb2 100644 --- a/src/conf/capabilities.c +++ b/src/conf/capabilities.c @@ -107,6 +107,7 @@ virCapabilitiesFreeHostNUMACell(virCapsHostNUMACellPtr cell) virCapabilitiesClearHostNUMACellCPUTopology(cell->cpus, cell->ncpus); VIR_FREE(cell->cpus); + VIR_FREE(cell->siblings); VIR_FREE(cell); } @@ -286,8 +287,10 @@ int virCapabilitiesAddHostNUMACell(virCapsPtr caps, int num, int ncpus, + int nsiblings, unsigned long long mem, - virCapsHostNUMACellCPUPtr cpus) + virCapsHostNUMACellCPUPtr cpus, + virCapsHostNUMACellSiblingInfoPtr siblings) { virCapsHostNUMACellPtr cell; @@ -302,6 +305,8 @@ virCapabilitiesAddHostNUMACell(virCapsPtr caps, cell->num = num; cell->mem = mem; cell->cpus = cpus; + cell->siblings = siblings; + cell->nsiblings = nsiblings; caps->host.numaCell[caps->host.nnumaCell++] = cell; @@ -766,6 +771,18 @@ virCapabilitiesFormatNUMATopology(virBufferPtr buf, virBufferAsprintf(buf, "<memory unit='KiB'>%llu</memory>\n", cells[i]->mem); + if (cells[i]->nsiblings) { + virBufferAddLit(buf, "<distances>\n"); + virBufferAdjustIndent(buf, 2); + for (j = 0; j < cells[i]->nsiblings; j++) { + virBufferAsprintf(buf, "<sibling id='%d' value='%d'/>\n", + cells[i]->siblings[j].node, + cells[i]->siblings[j].distance); + } + virBufferAdjustIndent(buf, -2); + virBufferAddLit(buf, "</distances>\n"); + } + virBufferAsprintf(buf, "<cpus num='%d'>\n", cells[i]->ncpus); virBufferAdjustIndent(buf, 2); for (j = 0; j < cells[i]->ncpus; j++) { diff --git a/src/conf/capabilities.h b/src/conf/capabilities.h index ba99e1a..5da31a8 100644 --- a/src/conf/capabilities.h +++ b/src/conf/capabilities.h @@ -95,6 +95,13 @@ struct _virCapsHostNUMACellCPU { virBitmapPtr siblings; }; +typedef struct _virCapsHostNUMACellSiblingInfo virCapsHostNUMACellSiblingInfo; +typedef virCapsHostNUMACellSiblingInfo *virCapsHostNUMACellSiblingInfoPtr; +struct _virCapsHostNUMACellSiblingInfo { + int node; /* foreign NUMA node */ + unsigned int distance; /* distance to the node */ +}; + typedef struct _virCapsHostNUMACell virCapsHostNUMACell; typedef virCapsHostNUMACell *virCapsHostNUMACellPtr; struct _virCapsHostNUMACell { @@ -102,6 +109,8 @@ struct _virCapsHostNUMACell { int ncpus; unsigned long long mem; /* in kibibytes */ virCapsHostNUMACellCPUPtr cpus; + int nsiblings; + virCapsHostNUMACellSiblingInfoPtr siblings; }; typedef struct _virCapsHostSecModelLabel virCapsHostSecModelLabel; @@ -194,8 +203,10 @@ extern int virCapabilitiesAddHostNUMACell(virCapsPtr caps, int num, int ncpus, + int nsiblings, unsigned long long mem, - virCapsHostNUMACellCPUPtr cpus); + virCapsHostNUMACellCPUPtr cpus, + virCapsHostNUMACellSiblingInfoPtr siblings); extern int diff --git a/src/libxl/libxl_conf.c b/src/libxl/libxl_conf.c index 14742de..9f35249 100644 --- a/src/libxl/libxl_conf.c +++ b/src/libxl/libxl_conf.c @@ -208,8 +208,8 @@ libxlCapsInitNuma(libxl_ctx *ctx, virCapsPtr caps) continue; if (virCapabilitiesAddHostNUMACell(caps, i, nr_cpus_node[i], - numa_info[i].size / 1024, - cpus[i]) < 0) { + 0, numa_info[i].size / 1024, + cpus[i], NULL) < 0) { virCapabilitiesClearHostNUMACellCPUTopology(cpus[i], nr_cpus_node[i]); goto cleanup; diff --git a/src/nodeinfo.c b/src/nodeinfo.c index 56f2b02..f76a91e 100644 --- a/src/nodeinfo.c +++ b/src/nodeinfo.c @@ -1644,9 +1644,9 @@ nodeCapsInitNUMAFake(virCapsPtr caps ATTRIBUTE_UNUSED) } if (virCapabilitiesAddHostNUMACell(caps, 0, - ncpus, + ncpus, 0, nodeinfo.memory, - cpus) < 0) + cpus, NULL) < 0) goto error; return 0; @@ -1748,6 +1748,53 @@ virNodeCapsFillCPUInfo(int cpu_id ATTRIBUTE_UNUSED, #endif } +static int +virNodeCapsGetSiblingInfo(int node, + virCapsHostNUMACellSiblingInfoPtr *siblings, + int *nsiblings) +{ + virCapsHostNUMACellSiblingInfoPtr tmp = NULL; + int tmp_size = 0; + int ret = -1; + int *distances = NULL; + int ndistances = 0; + size_t i; + + if (virNumaGetDistances(node, &distances, &ndistances) < 0) + goto cleanup; + + if (!distances) { + *siblings = NULL; + *nsiblings = 0; + return 0; + } + + if (VIR_ALLOC_N(tmp, ndistances) < 0) + goto cleanup; + + for (i = 0; i < ndistances; i++) { + if (!distances[i]) + continue; + + tmp[tmp_size].node = i; + tmp[tmp_size].distance = distances[i]; + tmp_size++; + } + + if (VIR_REALLOC_N(tmp, tmp_size) < 0) + goto cleanup; + + *siblings = tmp; + *nsiblings = tmp_size; + tmp = NULL; + tmp_size = 0; + ret = 0; + cleanup: + VIR_FREE(distances); + VIR_FREE(tmp); + return ret; +} + int nodeCapsInitNUMA(virCapsPtr caps) { @@ -1755,6 +1802,8 @@ nodeCapsInitNUMA(virCapsPtr caps) unsigned long long memory; virCapsHostNUMACellCPUPtr cpus = NULL; virBitmapPtr cpumap = NULL; + virCapsHostNUMACellSiblingInfoPtr siblings = NULL; + int nsiblings; int ret = -1; int ncpus = 0; int cpu; @@ -1794,14 +1843,19 @@ nodeCapsInitNUMA(virCapsPtr caps) } } + if (virNodeCapsGetSiblingInfo(n, &siblings, &nsiblings) < 0) + goto cleanup; + /* Detect the amount of memory in the numa cell in KiB */ virNumaGetNodeMemory(n, &memory, NULL); memory >>= 10; - if (virCapabilitiesAddHostNUMACell(caps, n, ncpus, memory, cpus) < 0) + if (virCapabilitiesAddHostNUMACell(caps, n, ncpus, nsiblings, + memory, cpus, siblings) < 0) goto cleanup; cpus = NULL; + siblings = NULL; } ret = 0; @@ -1812,6 +1866,7 @@ nodeCapsInitNUMA(virCapsPtr caps) virBitmapFree(cpumap); VIR_FREE(cpus); + VIR_FREE(siblings); if (ret < 0) VIR_FREE(cpus); diff --git a/src/test/test_driver.c b/src/test/test_driver.c index 37756e7..b204e7d 100644 --- a/src/test/test_driver.c +++ b/src/test/test_driver.c @@ -337,7 +337,7 @@ testBuildCapabilities(virConnectPtr conn) if (virCapabilitiesAddHostNUMACell(caps, i, privconn->cells[i].numCpus, - 0, cpu_cells) < 0) + 0, 0, cpu_cells, NULL) < 0) goto error; } diff --git a/src/xen/xend_internal.c b/src/xen/xend_internal.c index c30b136..f4c824e 100644 --- a/src/xen/xend_internal.c +++ b/src/xen/xend_internal.c @@ -1100,7 +1100,8 @@ sexpr_to_xend_topology(const struct sexpr *root, virCapsPtr caps) } virBitmapFree(cpuset); - if (virCapabilitiesAddHostNUMACell(caps, cell, nb_cpus, 0, cpuInfo) < 0) + if (virCapabilitiesAddHostNUMACell(caps, cell, nb_cpus, + 0, 0, cpuInfo, NULL) < 0) goto error; cpuInfo = NULL; } diff --git a/tests/vircapstest.c b/tests/vircapstest.c index 821a92b..9df074a 100644 --- a/tests/vircapstest.c +++ b/tests/vircapstest.c @@ -64,9 +64,9 @@ buildNUMATopology(int seq) id++; if (virCapabilitiesAddHostNUMACell(caps, cell_id + seq, - MAX_CPUS_IN_CELL, + MAX_CPUS_IN_CELL, 0, MAX_MEM_IN_CELL, - cell_cpus) < 0) + cell_cpus, NULL) < 0) goto error; cell_cpus = NULL; -- 2.0.0

On Tue, Jun 03, 2014 at 03:26:59PM +0200, Michal Privoznik wrote:
If user or management application wants to create a guest, it may be useful to know the cost of internode latencies before the guest resources are pinned. For example:
<capabilities>
<host> ... <topology> <cells num='2'> <cell id='0'> <memory unit='KiB'>4004132</memory> <distances> <sibling id='0' value='10'/> <sibling id='1' value='20'/> </distances> <cpus num='2'> <cpu id='0' socket_id='0' core_id='0' siblings='0'/> <cpu id='2' socket_id='0' core_id='2' siblings='2'/> </cpus> </cell> <cell id='1'> <memory unit='KiB'>4030064</memory> <distances> <sibling id='0' value='20'/> <sibling id='1' value='10'/> </distances> <cpus num='2'> <cpu id='1' socket_id='0' core_id='0' siblings='1'/> <cpu id='3' socket_id='0' core_id='2' siblings='3'/> </cpus> </cell> </cells> </topology> ... </host> ... </capabilities>
we can see the distance from node1 to node0 is 20 and within nodes 10.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- docs/schemas/capability.rng | 15 +++++++++++ src/conf/capabilities.c | 19 +++++++++++++- src/conf/capabilities.h | 13 +++++++++- src/libxl/libxl_conf.c | 4 +-- src/nodeinfo.c | 61 ++++++++++++++++++++++++++++++++++++++++++--- src/test/test_driver.c | 2 +- src/xen/xend_internal.c | 3 ++- tests/vircapstest.c | 4 +-- 8 files changed, 110 insertions(+), 11 deletions(-)
I notice our docs for the capabilities XML are non-existant. Separately it'd be nice to start fixing this.
@@ -766,6 +771,18 @@ virCapabilitiesFormatNUMATopology(virBufferPtr buf, virBufferAsprintf(buf, "<memory unit='KiB'>%llu</memory>\n", cells[i]->mem);
+ if (cells[i]->nsiblings) { + virBufferAddLit(buf, "<distances>\n"); + virBufferAdjustIndent(buf, 2); + for (j = 0; j < cells[i]->nsiblings; j++) { + virBufferAsprintf(buf, "<sibling id='%d' value='%d'/>\n", + cells[i]->siblings[j].node, + cells[i]->siblings[j].distance); + } + virBufferAdjustIndent(buf, -2); + virBufferAddLit(buf, "</distances>\n"); + }
Indentation is a bit off - should be 4 space indent rather than 2
@@ -194,8 +203,10 @@ extern int virCapabilitiesAddHostNUMACell(virCapsPtr caps, int num, int ncpus, + int nsiblings,
I'd have a slight prefence for 'nsiblings' to be immediately before the 'siblings' parameter, to make it more obvious that ther're related. While you're changing it, I'd probably also shuffle 'mem' before 'ncpus' so thjat 'ncpus' and 'cpus' are adjacent.
unsigned long long mem, - virCapsHostNUMACellCPUPtr cpus); + virCapsHostNUMACellCPUPtr cpus, + virCapsHostNUMACellSiblingInfoPtr siblings);
extern int
ACK with those nits fixed Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
participants (2)
-
Daniel P. Berrange
-
Michal Privoznik