[libvirt] [PATCH v2 1/2] add nodeset='all' for interleave mode

sometimes we hoped that the memory of vm can be evenly distributed in all nodes according to interleave mode. But different hosts has different node number. So we add nodeset='all' for interleave mode. Signed-off-by: Peng Hao <peng.hao2@zte.com.cn> --- src/conf/numa_conf.c | 77 ++++++++++++++++++++++++++++++++++++------- 1 files changed, 64 insertions(+), 13 deletions(-) diff --git a/src/conf/numa_conf.c b/src/conf/numa_conf.c index 97a3ca4..a336a62 100644 --- a/src/conf/numa_conf.c +++ b/src/conf/numa_conf.c @@ -29,6 +29,9 @@ #include "virnuma.h" #include "virstring.h" +#if WITH_NUMACTL +#include <numa.h> +#endif /* * Distance definitions defined Conform ACPI 2.0 SLIT. * See include/linux/topology.h @@ -66,6 +69,7 @@ typedef virDomainNumaNode *virDomainNumaNodePtr; struct _virDomainNuma { struct { bool specified; + bool allnode; virBitmapPtr nodeset; virDomainNumatuneMemMode mode; virDomainNumatunePlacement placement; @@ -259,13 +263,21 @@ virDomainNumatuneParseXML(virDomainNumaPtr numa, tmp = virXMLPropString(node, "nodeset"); if (tmp) { - if (virBitmapParse(tmp, &nodeset, VIR_DOMAIN_CPUMASK_LEN) < 0) - goto cleanup; - - if (virBitmapIsAllClear(nodeset)) { + if (STREQ(tmp, "all") && !virNumaIsAvailable()) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("Invalid value of 'nodeset': %s"), tmp); + _("Invalid nodeset=%s when numactl is not supported"), tmp); goto cleanup; + } else if (STREQ(tmp, "all") && virNumaIsAvailable()) { + numa->memory.allnode = true; + } else { + if (virBitmapParse(tmp, &nodeset, VIR_DOMAIN_CPUMASK_LEN) < 0) + goto cleanup; + + if (virBitmapIsAllClear(nodeset)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Invalid value of 'nodeset': %s"), tmp); + goto cleanup; + } } VIR_FREE(tmp); @@ -319,10 +331,14 @@ virDomainNumatuneFormatXML(virBufferPtr buf, virBufferAsprintf(buf, "<memory mode='%s' ", tmp); if (numatune->memory.placement == VIR_DOMAIN_NUMATUNE_PLACEMENT_STATIC) { - if (!(nodeset = virBitmapFormat(numatune->memory.nodeset))) - return -1; - virBufferAsprintf(buf, "nodeset='%s'/>\n", nodeset); - VIR_FREE(nodeset); + if (numatune->memory.allnode == true) { + virBufferAddLit(buf, "nodeset='all'/>\n"); + } else { + if (!(nodeset = virBitmapFormat(numatune->memory.nodeset))) + return -1; + virBufferAsprintf(buf, "nodeset='%s'/>\n", nodeset); + VIR_FREE(nodeset); + } } else if (numatune->memory.placement) { tmp = virDomainNumatunePlacementTypeToString(numatune->memory.placement); virBufferAsprintf(buf, "placement='%s'/>\n", tmp); @@ -489,6 +505,37 @@ virDomainNumatuneMaybeFormatNodeset(virDomainNumaPtr numatune, return 0; } +#if WITH_NUMACTL +static int +makeAllnodeBitmap(virDomainNumaPtr numa) +{ + size_t i = 0, maxnode = 0; + virBitmapPtr bitmap = NULL; + + if ((bitmap = virBitmapNew(VIR_DOMAIN_CPUMASK_LEN)) == NULL) + return -1; + virBitmapClearAll(bitmap); + maxnode = numa_max_node(); + for (i = 0; i <= maxnode; i++) { + if (virBitmapSetBit(bitmap, i) < 0) { + virBitmapFree(bitmap); + return -1; + } + } + + virBitmapFree(numa->memory.nodeset); + numa->memory.nodeset = bitmap; + + return 0; +} +#else +static int +makeAllnodeBitmap(virDomainNumaPtr numa) +{ + return -1; +} +#endif + int virDomainNumatuneSet(virDomainNumaPtr numa, bool placement_static, @@ -538,20 +585,28 @@ virDomainNumatuneSet(virDomainNumaPtr numa, } if (placement == VIR_DOMAIN_NUMATUNE_PLACEMENT_DEFAULT) { - if (numa->memory.nodeset || placement_static) + if (numa->memory.nodeset || placement_static || numa->memory.allnode) placement = VIR_DOMAIN_NUMATUNE_PLACEMENT_STATIC; else placement = VIR_DOMAIN_NUMATUNE_PLACEMENT_AUTO; } if (placement == VIR_DOMAIN_NUMATUNE_PLACEMENT_STATIC && - !numa->memory.nodeset) { + !numa->memory.nodeset && !numa->memory.allnode) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("nodeset for NUMA memory tuning must be set " "if 'placement' is 'static'")); goto cleanup; } + if (placement == VIR_DOMAIN_NUMATUNE_PLACEMENT_STATIC && + mode == VIR_DOMAIN_NUMATUNE_MEM_INTERLEAVE && + numa->memory.allnode && virNumaIsAvailable()) { + + if (makeAllnodeBitmap(numa) < 0) + goto cleanup; + } + /* setting nodeset when placement auto is invalid */ if (placement == VIR_DOMAIN_NUMATUNE_PLACEMENT_AUTO && numa->memory.nodeset) { -- 1.8.3.1

Signed-off-by: Peng Hao <peng.hao2@zte.com.cn> --- docs/formatdomain.html.in | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 1f12ab5..9aca921 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -1242,6 +1242,7 @@ ... <numatune> <memory mode="strict" nodeset="1-4,^3"/> + <memory mode="interleave" nodeset="all"/> <memnode cellid="0" mode="strict" nodeset="1"/> <memnode cellid="2" mode="preferred" nodeset="2"/> </numatune> @@ -1264,7 +1265,8 @@ attributes. Attribute <code>mode</code> is either 'interleave', 'strict', or 'preferred', defaults to 'strict'. Attribute <code>nodeset</code> specifies the NUMA nodes, using the same syntax as - attribute <code>cpuset</code> of element <code>vcpu</code>. Attribute + attribute <code>cpuset</code> of element <code>vcpu</code>.For 'interleave' + mode, 'nodeset' can be set to 'all' which stands for all nodes in host. Attribute <code>placement</code> (<span class='since'>since 0.9.12</span>) can be used to indicate the memory placement mode for domain process, its value can be either "static" or "auto", defaults to <code>placement</code> of -- 1.8.3.1

On 9/29/18 6:58 AM, Peng Hao wrote:
Signed-off-by: Peng Hao <peng.hao2@zte.com.cn> --- docs/formatdomain.html.in | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
This should have been merged with patch 1
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index 1f12ab5..9aca921 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -1242,6 +1242,7 @@ ... <numatune> <memory mode="strict" nodeset="1-4,^3"/> + <memory mode="interleave" nodeset="all"/> <memnode cellid="0" mode="strict" nodeset="1"/> <memnode cellid="2" mode="preferred" nodeset="2"/> </numatune> @@ -1264,7 +1265,8 @@ attributes. Attribute <code>mode</code> is either 'interleave', 'strict', or 'preferred', defaults to 'strict'. Attribute <code>nodeset</code> specifies the NUMA nodes, using the same syntax as - attribute <code>cpuset</code> of element <code>vcpu</code>. Attribute + attribute <code>cpuset</code> of element <code>vcpu</code>.For 'interleave'
s/.F/. F/
+ mode, 'nodeset' can be set to 'all' which stands for all nodes in host. Attribute
Like noted before I think perhaps all would be useful for other {cpu|node}set values. Also when you add a new feature, then you need to note a <span class='since'>Since 4.9.0</span> on the new attribute option "all". (or whatever version this would be added). John
<code>placement</code> (<span class='since'>since 0.9.12</span>) can be used to indicate the memory placement mode for domain process, its value can be either "static" or "auto", defaults to <code>placement</code> of

On 9/29/18 6:58 AM, Peng Hao wrote:
sometimes we hoped that the memory of vm can be evenly distributed in all nodes according to interleave mode. But different hosts has different node number. So we add nodeset='all' for interleave mode.
Signed-off-by: Peng Hao <peng.hao2@zte.com.cn> --- src/conf/numa_conf.c | 77 ++++++++++++++++++++++++++++++++++++------- 1 files changed, 64 insertions(+), 13 deletions(-)
Not so sure you followed Michal's v1 review comments with respect to needing an RNG update and test coverage (xml2xml and xml2argv). For the RNG, look at docs/schemas/*.rng and search on "nodeset" and "cpuset". Eventually you will find where nodeset gets defined for the schema and perhaps be able to figure out a way to adjust the pattern that's allowable for the schema. The XML file you create that would have "all" would need to pass a virt-xml-validate type command. For the xml2xml and xml2argv you will need to make sure your grammar (RNG) change works for XML processing and it would create the command line you expect. Next, do you think the very narrow band of numatune / memory / nodeset would be the only "set" to benefit from having being able to specify "in some manner" all the for a set? As noted above nodeset and cpuset are reusable throughout libvirt in various forms. Dig deeper - think of how that string is presented and parsed (hint: see virBitmap*). Knowing how many "nodes" there are is part of the <numa> child of <cpu>, right? While knowing that "all" means one thing "today" it could mean something different on some subsequent change and if the numa topology is changed there's lots of code currently around that would seemingly need to handle "all". Anyway, back to the cpuset/nodeset presentation. If today it's "A,C,E" or "A-E,^B,^D", then why not devise a way to use "0,~0" meaning 0 and the one's complement of 0 (e.g. everything else). That way if I wanted to, I could "0,~0,^4" meaning everything except 4. Of course the downside is that I need to know what the guest maximum is. The only way to know that is in post parse processing. So doing this may involve moving the translation of @nodeset and @cpuset... Whether any of this is of use to anyone - who knows. I just think using "all" is not the best idea. If you've defined your guest to have 4 vCPUs and/or use 4 NUMA nodes, then it should be very simple to say 0-3 for the cpuset or nodeset. Allowing "all" is in a way the lazy way and may not be true at some point in the future. Maybe someone adds a new node or vCPU to the guest and wants it to perform some other specific task. If adding that "misses" someone setting "all" for some nodeset, then that usage model would be unexpected. As an aside, are there hugepage implications to this? Or other nodeset configuration options related to NUMA that I'm not as aware of...
diff --git a/src/conf/numa_conf.c b/src/conf/numa_conf.c index 97a3ca4..a336a62 100644 --- a/src/conf/numa_conf.c +++ b/src/conf/numa_conf.c @@ -29,6 +29,9 @@ #include "virnuma.h" #include "virstring.h"
+#if WITH_NUMACTL +#include <numa.h> +#endif
Personally, probably should stick to only having src/util/virnuma.c do this.
/* * Distance definitions defined Conform ACPI 2.0 SLIT. * See include/linux/topology.h @@ -66,6 +69,7 @@ typedef virDomainNumaNode *virDomainNumaNodePtr; struct _virDomainNuma { struct { bool specified; + bool allnode; virBitmapPtr nodeset; virDomainNumatuneMemMode mode; virDomainNumatunePlacement placement; @@ -259,13 +263,21 @@ virDomainNumatuneParseXML(virDomainNumaPtr numa,
tmp = virXMLPropString(node, "nodeset"); if (tmp) { - if (virBitmapParse(tmp, &nodeset, VIR_DOMAIN_CPUMASK_LEN) < 0) - goto cleanup; - - if (virBitmapIsAllClear(nodeset)) { + if (STREQ(tmp, "all") && !virNumaIsAvailable()) {
What does it matter? Seems to me there'd already be some check related to NUMA availability in the event some nodeset range was defined. Doing this in grammar checking doesn't feel right. I took a cursory look, but you can dig deeper yourself to understand what happens if nodeset is defined and NUMA isn't available otherwise.
virReportError(VIR_ERR_CONFIG_UNSUPPORTED, - _("Invalid value of 'nodeset': %s"), tmp); + _("Invalid nodeset=%s when numactl is not supported"), tmp);
^ The alignment here is wrong.
goto cleanup; + } else if (STREQ(tmp, "all") && virNumaIsAvailable()) { + numa->memory.allnode = true;
So "all" would mean we wait until some time in the future to determine what that means as opposed to setting those up now. What happens to various checks that may expect that numa->memory.nodeset contains something? What other checks are out there that will be missed?
+ } else { + if (virBitmapParse(tmp, &nodeset, VIR_DOMAIN_CPUMASK_LEN) < 0) + goto cleanup; + + if (virBitmapIsAllClear(nodeset)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("Invalid value of 'nodeset': %s"), tmp); + goto cleanup; + } }
VIR_FREE(tmp); @@ -319,10 +331,14 @@ virDomainNumatuneFormatXML(virBufferPtr buf, virBufferAsprintf(buf, "<memory mode='%s' ", tmp);
if (numatune->memory.placement == VIR_DOMAIN_NUMATUNE_PLACEMENT_STATIC) { - if (!(nodeset = virBitmapFormat(numatune->memory.nodeset))) - return -1; - virBufferAsprintf(buf, "nodeset='%s'/>\n", nodeset); - VIR_FREE(nodeset); + if (numatune->memory.allnode == true) {
There would be no need for the == true comparison, it's a boolean operation.
+ virBufferAddLit(buf, "nodeset='all'/>\n"); + } else { + if (!(nodeset = virBitmapFormat(numatune->memory.nodeset))) + return -1; + virBufferAsprintf(buf, "nodeset='%s'/>\n", nodeset); + VIR_FREE(nodeset); + } } else if (numatune->memory.placement) { tmp = virDomainNumatunePlacementTypeToString(numatune->memory.placement); virBufferAsprintf(buf, "placement='%s'/>\n", tmp); @@ -489,6 +505,37 @@ virDomainNumatuneMaybeFormatNodeset(virDomainNumaPtr numatune, return 0; }
+#if WITH_NUMACTL +static int +makeAllnodeBitmap(virDomainNumaPtr numa)
Method name doesn't adhere to standards in https://libvirt.org/hacking.html
+{ + size_t i = 0, maxnode = 0; + virBitmapPtr bitmap = NULL; + + if ((bitmap = virBitmapNew(VIR_DOMAIN_CPUMASK_LEN)) == NULL)
Consider, if (!(bitmap = xxx())))
+ return -1; + virBitmapClearAll(bitmap); + maxnode = numa_max_node();
I see a virNumaGetMaxNode... So are we apply "all" to the guest or host configuration? I guess that part just isn't clear enough to me. Configuring NUMA isn't in my normal list of things to look at, but since the patch has been sitting for a bit I've looked.
+ for (i = 0; i <= maxnode; i++) {
So what is the value of @maxnode in relation to the value of VIR_DOMAIN_CPUMASK_LEN (1024)? Less I hope... Furthermore if counting starts at 0, then it should be < not <=, right?
+ if (virBitmapSetBit(bitmap, i) < 0) { + virBitmapFree(bitmap); + return -1; + } + } + + virBitmapFree(numa->memory.nodeset); + numa->memory.nodeset = bitmap; + + return 0; +} +#else +static int
^^^^ This wouldn't be static in a src/util/vir*.c module...
+makeAllnodeBitmap(virDomainNumaPtr numa)
And @numa would be ATTRIBUTE_UNUSED; otherwise, there'd be compiler issues.
+{ + return -1;
This would be wrong because there'd be a failure that some parsing failing for some unknown reason. You'd need to add a real error message.
+} +#endif + int virDomainNumatuneSet(virDomainNumaPtr numa, bool placement_static, @@ -538,20 +585,28 @@ virDomainNumatuneSet(virDomainNumaPtr numa, }
if (placement == VIR_DOMAIN_NUMATUNE_PLACEMENT_DEFAULT) { - if (numa->memory.nodeset || placement_static) + if (numa->memory.nodeset || placement_static || numa->memory.allnode) placement = VIR_DOMAIN_NUMATUNE_PLACEMENT_STATIC; else placement = VIR_DOMAIN_NUMATUNE_PLACEMENT_AUTO; }
if (placement == VIR_DOMAIN_NUMATUNE_PLACEMENT_STATIC && - !numa->memory.nodeset) { + !numa->memory.nodeset && !numa->memory.allnode) {
So if the next hunk would theoretically fill in numa->memory.nodeset for you, then why modify this check now? Why wouldn't the next hunk go above this hunk?
virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("nodeset for NUMA memory tuning must be set " "if 'placement' is 'static'")); goto cleanup; }
+ if (placement == VIR_DOMAIN_NUMATUNE_PLACEMENT_STATIC && + mode == VIR_DOMAIN_NUMATUNE_MEM_INTERLEAVE && + numa->memory.allnode && virNumaIsAvailable()) {
Seems to me XML post parse processing would have needed to check that if you had this allnode set, then you better also have static and interleave (and of course that Numa is available). That way, all you'd have to do is determine that allnode was set and call your function to do that. Those are "my" thoughts on the matter. Still not clear as to the need. I'm 50/50 on it. I see benefit, but there are concerns especially as how things relate to migration and domain save/restore. John
+ + if (makeAllnodeBitmap(numa) < 0) + goto cleanup; + } + /* setting nodeset when placement auto is invalid */ if (placement == VIR_DOMAIN_NUMATUNE_PLACEMENT_AUTO && numa->memory.nodeset) {

On Thu, Oct 11, 2018 at 12:03:44PM -0400, John Ferlan wrote:
On 9/29/18 6:58 AM, Peng Hao wrote:
sometimes we hoped that the memory of vm can be evenly distributed in all nodes according to interleave mode. But different hosts has different node number. So we add nodeset='all' for interleave mode.
Signed-off-by: Peng Hao <peng.hao2@zte.com.cn> --- src/conf/numa_conf.c | 77 ++++++++++++++++++++++++++++++++++++------- 1 files changed, 64 insertions(+), 13 deletions(-)
Anyway, back to the cpuset/nodeset presentation. If today it's "A,C,E" or "A-E,^B,^D", then why not devise a way to use "0,~0" meaning 0 and the one's complement of 0 (e.g. everything else). That way if I wanted to, I could "0,~0,^4" meaning everything except 4. Of course the downside is that I need to know what the guest maximum is. The only way to know that is in post parse processing. So doing this may involve moving the translation of @nodeset and @cpuset...
Adding ~0 will break applications parsing the attribute in the same way "all" will, as they will reject '~' as an invalid value. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Sat, Sep 29, 2018 at 06:58:01PM +0800, Peng Hao wrote:
sometimes we hoped that the memory of vm can be evenly distributed in all nodes according to interleave mode. But different hosts has different node number. So we add nodeset='all' for interleave mode.
Signed-off-by: Peng Hao <peng.hao2@zte.com.cn> --- src/conf/numa_conf.c | 77 ++++++++++++++++++++++++++++++++++++------- 1 files changed, 64 insertions(+), 13 deletions(-)
IMHO this is not a viable change as it breaks backcompatibility for apps parsing the XML, by changing the data type used. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
participants (3)
-
Daniel P. Berrangé
-
John Ferlan
-
Peng Hao