[libvirt] [PATCH 0/5] qemu: process: Don't try to use NUMA nodes without memory from numad advice

Cgroups code fails if it's instructed to bind memory usage of a qemu process to a NUMA node which does not have any memory. First four patches are preliminary cleanups and the fix to keep the autoCpuset private data entry in the status XML necessary to modify autoNodeset in the future. The last patch modifies autoNodeset to be an intersection of the advice from numad with host NUMA node set containing memory, so that we don't ask for impossible things when setting up cgroups. Peter Krempa (5): util: bitmap: Modify virBitmapSubtract to virBitmapIntersect qemu: domain: Extract parsing and formatting of priv->autoNodeset qemu: domain: Store and restore autoCpuset to status XML qemu: process: Extract gathering of 'numad' placement into a function qemu: process: Don't put memoryless NUMA nodes into autoNodeset src/libvirt_private.syms | 2 +- src/qemu/qemu_domain.c | 102 ++++++++++++++++++++++++++++++++++++----------- src/qemu/qemu_process.c | 73 +++++++++++++++++++++++---------- src/util/virbitmap.c | 14 +++---- src/util/virbitmap.h | 2 +- tests/qemuxml2xmltest.c | 2 +- tests/virbitmaptest.c | 14 +++---- 7 files changed, 147 insertions(+), 62 deletions(-) -- 2.12.2

Since virBitmapSubtract is unused modify it to perform bitmap intersection. --- src/libvirt_private.syms | 2 +- src/util/virbitmap.c | 14 +++++++------- src/util/virbitmap.h | 2 +- tests/virbitmaptest.c | 14 +++++++------- 4 files changed, 16 insertions(+), 16 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 3f94521df..4ad1f7a60 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1295,6 +1295,7 @@ virBitmapEqual; virBitmapFormat; virBitmapFree; virBitmapGetBit; +virBitmapIntersect; virBitmapIsAllClear; virBitmapIsAllSet; virBitmapIsBitSet; @@ -1315,7 +1316,6 @@ virBitmapSetBit; virBitmapSetBitExpand; virBitmapSize; virBitmapString; -virBitmapSubtract; virBitmapToData; virBitmapToDataBuf; diff --git a/src/util/virbitmap.c b/src/util/virbitmap.c index eac63d997..a5077fe7c 100644 --- a/src/util/virbitmap.c +++ b/src/util/virbitmap.c @@ -1091,15 +1091,15 @@ virBitmapOverlaps(virBitmapPtr b1, } /** - * virBitmapSubtract: - * @a: minuend/result - * @b: subtrahend + * virBitmapIntersect: + * @a: bitmap, modified to contain result + * @b: bitmap * - * Performs bitwise subtraction: a = a - b + * Performs intersection of two bitmaps: a = intersect(a, b) */ void -virBitmapSubtract(virBitmapPtr a, - virBitmapPtr b) +virBitmapIntersect(virBitmapPtr a, + virBitmapPtr b) { size_t i; size_t max = a->map_len; @@ -1108,5 +1108,5 @@ virBitmapSubtract(virBitmapPtr a, max = b->map_len; for (i = 0; i < max; i++) - a->map[i] &= ~b->map[i]; + a->map[i] &= b->map[i]; } diff --git a/src/util/virbitmap.h b/src/util/virbitmap.h index 36282af1c..ffa3c42d7 100644 --- a/src/util/virbitmap.h +++ b/src/util/virbitmap.h @@ -143,7 +143,7 @@ bool virBitmapOverlaps(virBitmapPtr b1, virBitmapPtr b2) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); -void virBitmapSubtract(virBitmapPtr a, virBitmapPtr b) +void virBitmapIntersect(virBitmapPtr a, virBitmapPtr b) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); #endif diff --git a/tests/virbitmaptest.c b/tests/virbitmaptest.c index e5305d022..882c71544 100644 --- a/tests/virbitmaptest.c +++ b/tests/virbitmaptest.c @@ -589,7 +589,7 @@ test11(const void *opaque) virBitmapParse(data->res, &resmap, 256) < 0) goto cleanup; - virBitmapSubtract(amap, bmap); + virBitmapIntersect(amap, bmap); if (!virBitmapEqual(amap, resmap)) { fprintf(stderr, "\n bitmap subtraction failed: '%s'-'%s'!='%s'\n", @@ -700,13 +700,13 @@ mymain(void) ret = -1; virTestCounterReset("test11-"); - TESTBINARYOP("0", "0", "0,^0", test11); - TESTBINARYOP("0-3", "0", "1-3", test11); - TESTBINARYOP("0-3", "0,3", "1-2", test11); + TESTBINARYOP("0", "0", "0", test11); + TESTBINARYOP("0-3", "0", "0", test11); + TESTBINARYOP("0-3", "0,3", "0,3", test11); TESTBINARYOP("0,^0", "0", "0,^0", test11); - TESTBINARYOP("0-3", "0-3", "0,^0", test11); - TESTBINARYOP("0-3", "0,^0", "0-3", test11); - TESTBINARYOP("0,2", "1,3", "0,2", test11); + TESTBINARYOP("0-3", "0-3", "0-3", test11); + TESTBINARYOP("0-3", "0,^0", "0,^0", test11); + TESTBINARYOP("0,2", "1,3", "0,^0", test11); if (virTestRun("test12", test12, NULL) < 0) ret = -1; -- 2.12.2

On Wed, 2017-07-12 at 15:44 +0200, Peter Krempa wrote:
Since virBitmapSubtract is unused modify it to perform bitmap intersection.
Normally I would ask you to remove the old function in one patch and introduce the new one in another, but I can see you manage to reduce the test suite churn by squeezing both changes together, so I'm okay with it :)
diff --git a/tests/virbitmaptest.c b/tests/virbitmaptest.c index e5305d022..882c71544 100644 --- a/tests/virbitmaptest.c +++ b/tests/virbitmaptest.c @@ -589,7 +589,7 @@ test11(const void *opaque) virBitmapParse(data->res, &resmap, 256) < 0) goto cleanup; - virBitmapSubtract(amap, bmap); + virBitmapIntersect(amap, bmap); if (!virBitmapEqual(amap, resmap)) { fprintf(stderr, "\n bitmap subtraction failed: '%s'-'%s'!='%s'\n",
You'll want to update the error message as well. Reviewed-by: Andrea Bolognani <abologna@redhat.com> -- Andrea Bolognani / Red Hat / Virtualization

Move the code to separate functions to avoid complicating the existing ones with changes. --- src/qemu/qemu_domain.c | 84 +++++++++++++++++++++++++++++++++++--------------- 1 file changed, 60 insertions(+), 24 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 6772f1ae2..92e5609ad 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -1761,6 +1761,29 @@ qemuDomainObjPrivateXMLFormatVcpus(virBufferPtr buf, static int +qemuDomainObjPtrivateXMLFormatAutomaticPlacement(virBufferPtr buf, + qemuDomainObjPrivatePtr priv) +{ + char *nodeset = NULL; + int ret = -1; + + if (!priv->autoNodeset) + return 0; + + if (!(nodeset = virBitmapFormat(priv->autoNodeset))) + goto cleanup; + + virBufferAsprintf(buf, "<numad nodeset='%s'/>\n", nodeset); + + ret = 0; + + cleanup: + VIR_FREE(nodeset); + return ret; +} + + +static int qemuDomainObjPrivateXMLFormat(virBufferPtr buf, virDomainObjPtr vm) { @@ -1869,15 +1892,8 @@ qemuDomainObjPrivateXMLFormat(virBufferPtr buf, virBufferAddLit(buf, "</devices>\n"); } - if (priv->autoNodeset) { - char *nodeset = virBitmapFormat(priv->autoNodeset); - - if (!nodeset) - return -1; - - virBufferAsprintf(buf, "<numad nodeset='%s'/>\n", nodeset); - VIR_FREE(nodeset); - } + if (qemuDomainObjPtrivateXMLFormatAutomaticPlacement(buf, priv) < 0) + return -1; /* Various per-domain paths */ virBufferEscapeString(buf, "<libDir path='%s'/>\n", priv->libDir); @@ -1936,6 +1952,40 @@ qemuDomainObjPrivateXMLParseVcpu(xmlNodePtr node, static int +qemuDomainObjPrivateXMLParseAutomaticPlacement(virQEMUDriverPtr driver, + qemuDomainObjPrivatePtr priv, + xmlXPathContextPtr ctxt) +{ + virCapsPtr caps = NULL; + char *nodeset; + int ret = -1; + + nodeset = virXPathString("string(./numad/@nodeset)", ctxt); + + if (!nodeset) + return 0; + + if (!(caps = virQEMUDriverGetCapabilities(driver, false))) + goto cleanup; + + if (virBitmapParse(nodeset, &priv->autoNodeset, caps->host.nnumaCell_max) < 0) + goto cleanup; + + if (!(priv->autoCpuset = virCapabilitiesGetCpusForNodemask(caps, + priv->autoNodeset))) + goto cleanup; + + ret = 0; + + cleanup: + virObjectUnref(caps); + VIR_FREE(nodeset); + + return ret; +} + + +static int qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, virDomainObjPtr vm, virDomainDefParserConfigPtr config) @@ -1949,7 +1999,6 @@ qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, xmlNodePtr *nodes = NULL; xmlNodePtr node = NULL; virQEMUCapsPtr qemuCaps = NULL; - virCapsPtr caps = NULL; if (VIR_ALLOC(priv->monConfig) < 0) goto error; @@ -2133,21 +2182,9 @@ qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, } VIR_FREE(nodes); - if (!(caps = virQEMUDriverGetCapabilities(driver, false))) + if (qemuDomainObjPrivateXMLParseAutomaticPlacement(driver, priv, ctxt) < 0) goto error; - if ((tmp = virXPathString("string(./numad/@nodeset)", ctxt))) { - if (virBitmapParse(tmp, &priv->autoNodeset, - caps->host.nnumaCell_max) < 0) - goto error; - - if (!(priv->autoCpuset = virCapabilitiesGetCpusForNodemask(caps, - priv->autoNodeset))) - goto error; - } - virObjectUnref(caps); - VIR_FREE(tmp); - if ((tmp = virXPathString("string(./libDir/@path)", ctxt))) priv->libDir = tmp; if ((tmp = virXPathString("string(./channelTargetDir/@path)", ctxt))) @@ -2175,7 +2212,6 @@ qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, virStringListFree(priv->qemuDevices); priv->qemuDevices = NULL; virObjectUnref(qemuCaps); - virObjectUnref(caps); return -1; } -- 2.12.2

On Wed, 2017-07-12 at 15:44 +0200, Peter Krempa wrote:
static int +qemuDomainObjPrivateXMLParseAutomaticPlacement(virQEMUDriverPtr driver, + qemuDomainObjPrivatePtr priv, + xmlXPathContextPtr ctxt)
I think you should invert the first and last arguments to keep the signature closer to that of the calling function. Reviewed-by: Andrea Bolognani <abologna@redhat.com> -- Andrea Bolognani / Red Hat / Virtualization

Decouple them by storing them in the XML separately rather than regenerating them. This will simplify upcoming fixes. --- src/qemu/qemu_domain.c | 32 +++++++++++++++++++++++++------- tests/qemuxml2xmltest.c | 2 +- 2 files changed, 26 insertions(+), 8 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 92e5609ad..8c29db15f 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -1765,20 +1765,30 @@ qemuDomainObjPtrivateXMLFormatAutomaticPlacement(virBufferPtr buf, qemuDomainObjPrivatePtr priv) { char *nodeset = NULL; + char *cpuset = NULL; int ret = -1; - if (!priv->autoNodeset) + if (!priv->autoNodeset && !priv->autoCpuset) return 0; - if (!(nodeset = virBitmapFormat(priv->autoNodeset))) + if (priv->autoNodeset && + !((nodeset = virBitmapFormat(priv->autoNodeset)))) goto cleanup; - virBufferAsprintf(buf, "<numad nodeset='%s'/>\n", nodeset); + if (priv->autoCpuset && + !((cpuset = virBitmapFormat(priv->autoCpuset)))) + goto cleanup; + + virBufferAddLit(buf, "<numad"); + virBufferEscapeString(buf, " nodeset='%s'", nodeset); + virBufferEscapeString(buf, " cpuset='%s'", cpuset); + virBufferAddLit(buf, "/>\n"); ret = 0; cleanup: VIR_FREE(nodeset); + VIR_FREE(cpuset); return ret; } @@ -1958,11 +1968,13 @@ qemuDomainObjPrivateXMLParseAutomaticPlacement(virQEMUDriverPtr driver, { virCapsPtr caps = NULL; char *nodeset; + char *cpuset; int ret = -1; nodeset = virXPathString("string(./numad/@nodeset)", ctxt); + cpuset = virXPathString("string(./numad/@cpuset)", ctxt); - if (!nodeset) + if (!nodeset && !cpuset) return 0; if (!(caps = virQEMUDriverGetCapabilities(driver, false))) @@ -1971,15 +1983,21 @@ qemuDomainObjPrivateXMLParseAutomaticPlacement(virQEMUDriverPtr driver, if (virBitmapParse(nodeset, &priv->autoNodeset, caps->host.nnumaCell_max) < 0) goto cleanup; - if (!(priv->autoCpuset = virCapabilitiesGetCpusForNodemask(caps, - priv->autoNodeset))) - goto cleanup; + if (cpuset) { + if (virBitmapParse(cpuset, &priv->autoCpuset, VIR_DOMAIN_CPUMASK_LEN) < 0) + goto cleanup; + } else { + if (!(priv->autoCpuset = virCapabilitiesGetCpusForNodemask(caps, + priv->autoNodeset))) + goto cleanup; + } ret = 0; cleanup: virObjectUnref(caps); VIR_FREE(nodeset); + VIR_FREE(cpuset); return ret; } diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index 79347671f..24d1eb33c 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -95,7 +95,7 @@ static const char testStatusXMLPrefixFooter[] = " <device alias='net0'/>\n" " <device alias='usb'/>\n" " </devices>\n" -" <numad nodeset='0-2'/>\n" +" <numad nodeset='0-2' cpuset='1,3'/>\n" " <libDir path='/tmp'/>\n" " <channelTargetDir path='/tmp/channel'/>\n"; -- 2.12.2

On Wed, 2017-07-12 at 15:44 +0200, Peter Krempa wrote:
@@ -1765,20 +1765,30 @@ qemuDomainObjPtrivateXMLFormatAutomaticPlacement(virBufferPtr buf, qemuDomainObjPrivatePtr priv) { [...] - virBufferAsprintf(buf, "<numad nodeset='%s'/>\n", nodeset); + if (priv->autoCpuset && + !((cpuset = virBitmapFormat(priv->autoCpuset)))) + goto cleanup;
The previous implementation didn't format the <numad> element at all if there was nodeset, whereas the new one will always format it. You could add if (!nodeset && !cpuset) goto cleanup; here to restore that behavior, but maybe you just decided it's not worth it. Just felt like I should point that out.
+ virBufferAddLit(buf, "<numad"); + virBufferEscapeString(buf, " nodeset='%s'", nodeset); + virBufferEscapeString(buf, " cpuset='%s'", cpuset);
I had to look up the implementation of virBufferEscapeString() to figure out that nodeset or cpuset possibly being NULL is handled automatically by that function, which I found to be a pretty surprising behavior. Not a problem with your patch though :)
@@ -1958,11 +1968,13 @@ qemuDomainObjPrivateXMLParseAutomaticPlacement(virQEMUDriverPtr driver, { virCapsPtr caps = NULL; char *nodeset; + char *cpuset; int ret = -1; nodeset = virXPathString("string(./numad/@nodeset)", ctxt); + cpuset = virXPathString("string(./numad/@cpuset)", ctxt); - if (!nodeset) + if (!nodeset && !cpuset) return 0;
I don't think you want this hunk. With the new condition, you will perform an early exit only if both nodeset and cpuset are NULL; if nodeset is NULL but cpuset isn't, the first call to virBitmapParse() a few lines below will error out. But leaving the condition alone makes sure all scenarios are handled successfully. Reviewed-by: Andrea Bolognani <abologna@redhat.com> -- Andrea Bolognani / Red Hat / Virtualization

On Thu, Jul 20, 2017 at 14:12:44 +0200, Andrea Bolognani wrote:
On Wed, 2017-07-12 at 15:44 +0200, Peter Krempa wrote:
@@ -1765,20 +1765,30 @@ qemuDomainObjPtrivateXMLFormatAutomaticPlacement(virBufferPtr buf, qemuDomainObjPrivatePtr priv) { [...] - virBufferAsprintf(buf, "<numad nodeset='%s'/>\n", nodeset); + if (priv->autoCpuset && + !((cpuset = virBitmapFormat(priv->autoCpuset)))) + goto cleanup;
The previous implementation didn't format the <numad> element at all if there was nodeset, whereas the new one will always format it. You could add
if (!nodeset && !cpuset) goto cleanup;
If you call virBitmapFormat on an empty or NULL bitmap you still get a (empty) string allocated so that condition is basically identical to the one that's already there due to how the bitmaps are formatted: if (!priv->autoNodeset && !priv->autoCpuset) return 0; if (priv->autoNodeset && !((nodeset = virBitmapFormat(priv->autoNodeset)))) goto cleanup; if (priv->autoCpuset && !((cpuset = virBitmapFormat(priv->autoCpuset)))) goto cleanup;
here to restore that behavior, but maybe you just decided it's not worth it. Just felt like I should point that out.
+ virBufferAddLit(buf, "<numad"); + virBufferEscapeString(buf, " nodeset='%s'", nodeset); + virBufferEscapeString(buf, " cpuset='%s'", cpuset);
I had to look up the implementation of virBufferEscapeString() to figure out that nodeset or cpuset possibly being NULL is handled automatically by that function, which I found to be a pretty surprising behavior. Not a problem with your patch though :)
@@ -1958,11 +1968,13 @@ qemuDomainObjPrivateXMLParseAutomaticPlacement(virQEMUDriverPtr driver, { virCapsPtr caps = NULL; char *nodeset; + char *cpuset; int ret = -1; nodeset = virXPathString("string(./numad/@nodeset)", ctxt); + cpuset = virXPathString("string(./numad/@cpuset)", ctxt); - if (!nodeset) + if (!nodeset && !cpuset) return 0;
I don't think you want this hunk.
With the new condition, you will perform an early exit only if both nodeset and cpuset are NULL; if nodeset is NULL but cpuset isn't, the first call to virBitmapParse() a few lines below will error out.
That shouldn't ever happen (tm) :D I'll can add a condition that if nodeset is not in the XML the parsing will be skipped. So in that case only cpuset can be present (for future use). I'll also add a note that accessing autoNodeset in the else branch of if (cpuset) is safe.

On Thu, 2017-07-20 at 15:46 +0200, Peter Krempa wrote:
- virBufferAsprintf(buf, "<numad nodeset='%s'/>\n", nodeset); + if (priv->autoCpuset && + !((cpuset = virBitmapFormat(priv->autoCpuset)))) + goto cleanup;
The previous implementation didn't format the <numad> element at all if there was nodeset, whereas the new one will always format it. You could add
if (!nodeset && !cpuset) goto cleanup;
If you call virBitmapFormat on an empty or NULL bitmap you still get a (empty) string allocated so that condition is basically identical to the one that's already there due to how the bitmaps are formatted:
if (!priv->autoNodeset && !priv->autoCpuset) return 0;
if (priv->autoNodeset && !((nodeset = virBitmapFormat(priv->autoNodeset)))) goto cleanup;
if (priv->autoCpuset && !((cpuset = virBitmapFormat(priv->autoCpuset)))) goto cleanup;
Oh, you're right. Nevermind then.
- if (!nodeset) + if (!nodeset && !cpuset) return 0;
I don't think you want this hunk.
With the new condition, you will perform an early exit only if both nodeset and cpuset are NULL; if nodeset is NULL but cpuset isn't, the first call to virBitmapParse() a few lines below will error out.
That shouldn't ever happen (tm) :D
I'll can add a condition that if nodeset is not in the XML the parsing will be skipped. So in that case only cpuset can be present (for future use).
I'll also add a note that accessing autoNodeset in the else branch of if (cpuset) is safe.
Works for me :) -- Andrea Bolognani / Red Hat / Virtualization

Remove the code from qemuProcessPrepareDomain so that it won't get even more bloated. --- src/qemu/qemu_process.c | 61 ++++++++++++++++++++++++++++++++----------------- 1 file changed, 40 insertions(+), 21 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index e6522a294..01fe33c92 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -5410,6 +5410,44 @@ qemuProcessUpdateGuestCPU(virDomainDefPtr def, } +static int +qemuProcessPrepareDomainNUMAPlacement(virDomainObjPtr vm, + virCapsPtr caps) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + char *nodesetstr = NULL; + int ret = -1; + + /* Get the advisory nodeset from numad if 'placement' of + * either <vcpu> or <numatune> is 'auto'. + */ + if (!virDomainDefNeedsPlacementAdvice(vm->def)) + return 0; + + nodesetstr = virNumaGetAutoPlacementAdvice(virDomainDefGetVcpus(vm->def), + virDomainDefGetMemoryTotal(vm->def)); + + if (!nodesetstr) + goto cleanup; + + VIR_DEBUG("Nodeset returned from numad: %s", nodesetstr); + + if (virBitmapParse(nodesetstr, &priv->autoNodeset, + VIR_DOMAIN_CPUMASK_LEN) < 0) + goto cleanup; + + if (!(priv->autoCpuset = virCapabilitiesGetCpusForNodemask(caps, + priv->autoNodeset))) + goto cleanup; + + ret = 0; + + cleanup: + VIR_FREE(nodesetstr); + return ret; +} + + /** * qemuProcessPrepareDomain * @@ -5430,7 +5468,6 @@ qemuProcessPrepareDomain(virConnectPtr conn, { int ret = -1; size_t i; - char *nodeset = NULL; qemuDomainObjPrivatePtr priv = vm->privateData; virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); virCapsPtr caps; @@ -5448,25 +5485,8 @@ qemuProcessPrepareDomain(virConnectPtr conn, } virDomainAuditSecurityLabel(vm, true); - /* Get the advisory nodeset from numad if 'placement' of - * either <vcpu> or <numatune> is 'auto'. - */ - if (virDomainDefNeedsPlacementAdvice(vm->def)) { - nodeset = virNumaGetAutoPlacementAdvice(virDomainDefGetVcpus(vm->def), - virDomainDefGetMemoryTotal(vm->def)); - if (!nodeset) - goto cleanup; - - VIR_DEBUG("Nodeset returned from numad: %s", nodeset); - - if (virBitmapParse(nodeset, &priv->autoNodeset, - VIR_DOMAIN_CPUMASK_LEN) < 0) - goto cleanup; - - if (!(priv->autoCpuset = virCapabilitiesGetCpusForNodemask(caps, - priv->autoNodeset))) - goto cleanup; - } + if (qemuProcessPrepareDomainNUMAPlacement(vm, caps) < 0) + goto cleanup; } /* Whether we should use virtlogd as stdio handler for character @@ -5537,7 +5557,6 @@ qemuProcessPrepareDomain(virConnectPtr conn, ret = 0; cleanup: - VIR_FREE(nodeset); virObjectUnref(caps); virObjectUnref(cfg); return ret; -- 2.12.2

On Wed, 2017-07-12 at 15:44 +0200, Peter Krempa wrote:
+static int +qemuProcessPrepareDomainNUMAPlacement(virDomainObjPtr vm, + virCapsPtr caps) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + char *nodesetstr = NULL; + int ret = -1; + + /* Get the advisory nodeset from numad if 'placement' of + * either <vcpu> or <numatune> is 'auto'. + */ + if (!virDomainDefNeedsPlacementAdvice(vm->def)) + return 0; + + nodesetstr = virNumaGetAutoPlacementAdvice(virDomainDefGetVcpus(vm->def), + virDomainDefGetMemoryTotal(vm->def)); + + if (!nodesetstr) + goto cleanup; + + VIR_DEBUG("Nodeset returned from numad: %s", nodesetstr); + + if (virBitmapParse(nodesetstr, &priv->autoNodeset, + VIR_DOMAIN_CPUMASK_LEN) < 0)
This call is not any longer than others before or after it, so there's no reason IMHO to distribute it among two lines. You could even rename 'nodesetstr' to 'nodeset', which is the name you've used for the same purpose in other places, and shorten it a bit further ;) Reviewed-by: Andrea Bolognani <abologna@redhat.com> -- Andrea Bolognani / Red Hat / Virtualization

'numad' may return a nodeset which contains NUMA nodes without memory for certain configurations. Since cgroups code will not be happy using nodes without memory we need to store only numa nodes with memory in autoNodeset. On the other hand autoCpuset should contain cpus also for nodes which do not have any memory. --- src/qemu/qemu_process.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 01fe33c92..78c5692a0 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -5416,6 +5416,8 @@ qemuProcessPrepareDomainNUMAPlacement(virDomainObjPtr vm, { qemuDomainObjPrivatePtr priv = vm->privateData; char *nodesetstr = NULL; + virBitmapPtr numadNodeset = NULL; + virBitmapPtr hostMemoryNodeset = NULL; int ret = -1; /* Get the advisory nodeset from numad if 'placement' of @@ -5430,20 +5432,30 @@ qemuProcessPrepareDomainNUMAPlacement(virDomainObjPtr vm, if (!nodesetstr) goto cleanup; + if (!(hostMemoryNodeset = virNumaGetHostMemoryNodeset())) + goto cleanup; + VIR_DEBUG("Nodeset returned from numad: %s", nodesetstr); - if (virBitmapParse(nodesetstr, &priv->autoNodeset, - VIR_DOMAIN_CPUMASK_LEN) < 0) + if (virBitmapParse(nodesetstr, &numadNodeset, VIR_DOMAIN_CPUMASK_LEN) < 0) goto cleanup; - if (!(priv->autoCpuset = virCapabilitiesGetCpusForNodemask(caps, - priv->autoNodeset))) + /* numad may return a nodeset that only contains cpus but cgroups don't play + * well with that. Set the autoCpuset from all cpus from that nodeset, but + * assign autoNodeset only with nodes containing memory. */ + if (!(priv->autoCpuset = virCapabilitiesGetCpusForNodemask(caps, numadNodeset))) goto cleanup; + virBitmapIntersect(numadNodeset, hostMemoryNodeset); + + VIR_STEAL_PTR(priv->autoNodeset, numadNodeset); + ret = 0; cleanup: VIR_FREE(nodesetstr); + virBitmapFree(numadNodeset); + virBitmapFree(hostMemoryNodeset); return ret; } -- 2.12.2

On Wed, 2017-07-12 at 15:44 +0200, Peter Krempa wrote:
'numad' may return a nodeset which contains NUMA nodes without memory for certain configurations. Since cgroups code will not be happy using nodes without memory we need to store only numa nodes with memory in autoNodeset. On the other hand autoCpuset should contain cpus also for nodes which do not have any memory. --- src/qemu/qemu_process.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-)
Reviewed-by: Andrea Bolognani <abologna@redhat.com> -- Andrea Bolognani / Red Hat / Virtualization

Hello Peter, This is my first review. I tested the patch series on my system and it works fine. The refactoring part and the logic for autoCpuset and autoNodeset looks good. ACK from my side. Thanks, Nitesh. On Wed, Jul 12, 2017 at 7:14 PM, Peter Krempa <pkrempa@redhat.com> wrote:
Cgroups code fails if it's instructed to bind memory usage of a qemu process to a NUMA node which does not have any memory.
First four patches are preliminary cleanups and the fix to keep the autoCpuset private data entry in the status XML necessary to modify autoNodeset in the future.
The last patch modifies autoNodeset to be an intersection of the advice from numad with host NUMA node set containing memory, so that we don't ask for impossible things when setting up cgroups.
Peter Krempa (5): util: bitmap: Modify virBitmapSubtract to virBitmapIntersect qemu: domain: Extract parsing and formatting of priv->autoNodeset qemu: domain: Store and restore autoCpuset to status XML qemu: process: Extract gathering of 'numad' placement into a function qemu: process: Don't put memoryless NUMA nodes into autoNodeset
src/libvirt_private.syms | 2 +- src/qemu/qemu_domain.c | 102 ++++++++++++++++++++++++++++++ ++++++----------- src/qemu/qemu_process.c | 73 +++++++++++++++++++++++---------- src/util/virbitmap.c | 14 +++---- src/util/virbitmap.h | 2 +- tests/qemuxml2xmltest.c | 2 +- tests/virbitmaptest.c | 14 +++---- 7 files changed, 147 insertions(+), 62 deletions(-)
-- 2.12.2
participants (3)
-
Andrea Bolognani
-
Nitesh Konkar
-
Peter Krempa