[libvirt] [PATCH 0/3] Guest NUMA topology support - v1

Hi, This patch series adds support for specifying NUMA topology for guests. <cpu> ... <topology sockets='2' cores='4' threads='2'/> <numa> <cell cpus='0-7' mems='512000'/> <cell cpus='8-15' mems='512000'/> </numa> ... </cpu> This change allows libvirt to generate -numa options for QEMU/KVM. Changes since v0 ---------------- - Patch re-organization as per danpb's comments. - Use of virBufferAsPrintf instead of snprintf as per danpb's comments. - Use "cell" instead of "node" as per danpb's suggestion. - Add a helper to truncate virBuffer as per Eric's suggestion. - Add documentation. - Add testcases - Add basic validity check for cpus specified in <numa> ... </numa> Testing -------- The patches pass all tests except domainschema test. I think this is not a real failure, but the test is probably failing to pickup the right domaincommon.rng file. Will investigate further. TEST: domainschematest ........................................ 40 ........................................ 80 .........!.............................. 120 ........................................ 160 .................!...................... 200 ........................................ 240 ........................................ 280 ........................................ 320 ... 323 FAILED daemon-conf fails, but it fails even without my patches too. I guess my patches are really affecting it. TEST: daemon-conf .....!!!!!!...!!!!!!!!!!!!!!!!!!!./daemon-conf: line 98: kill: (7811) - No such process ! 34 FAILED v0 -- - https://www.redhat.com/archives/libvir-list/2011-October/msg00025.html RFC --- - http://permalink.gmane.org/gmane.comp.emulators.libvirt/44626 Regards, Bharata.

XML definitions for guest NUMA and parsing routines. From: Bharata B Rao <bharata@linux.vnet.ibm.com> This patch adds XML definitions for guest NUMA specification and contains routines to parse the same. The guest NUMA specification looks like this: <cpu> ... <topology sockets='2' cores='4' threads='2'/> <numa> <cell cpus='0-7' mems='512000'/> <cell cpus='8-15' mems='512000'/> </numa> ... </cpu> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> --- docs/formatdomain.html.in | 32 +++++++++++++++++++++ docs/schemas/domaincommon.rng | 32 +++++++++++++++++++++ src/conf/cpu_conf.c | 62 +++++++++++++++++++++++++++++++++++++++++ src/conf/cpu_conf.h | 12 ++++++++ src/conf/domain_conf.c | 7 +++++ 5 files changed, 145 insertions(+), 0 deletions(-) diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index c3e7752..8d070ca 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -628,6 +628,38 @@ </dd> </dl> + <p> + Guest NUMA topology can be specifed using <code>numa</code> element. + <span class="since">Since X.X.X</span> + </p> + +<pre> + ... + <cpu> + ... + <numa> + <cell cpus='0-3' mems='512000'/> + <cell cpus='4-7' mems='512000'/> + </numa> + ... + </cpu> + ...</pre> + + <p> + Each <code>cell</code> element specifies a NUMA cell or a NUMA node. + <code>cpus</code> specifies the CPU or range of CPUs that are part of + the node. <code>mems</code> specifies the node memory in kilobytes + (i.e. blocks of 1024 bytes). Each cell or node is assigned cellid + or nodeid in the increasing order starting from 0. + </p> + + <p> + This guest NUMA specification translates to <code>-numa</code> command + line option for QEMU/KVM. For the above example, the following QEMU + command line option is generated: + <code>-numa node,nodeid=0,cpus=0-3,mems=512000 -numa node,nodeid=1,cpus=4-7,mems=512000</code> + </p> + <h3><a name="elementsLifecycle">Lifecycle control</a></h3> <p> diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index b6f858e..b09060a 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -2311,6 +2311,9 @@ <zeroOrMore> <ref name="cpuFeature"/> </zeroOrMore> + <optional> + <ref name="cpuNuma"/> + </optional> </interleave> </group> </choice> @@ -2371,6 +2374,25 @@ </element> </define> + <define name="cpuNuma"> + <element name="numa"> + <oneOrMore> + <ref name="numaCell"/> + </oneOrMore> + </element> + </define> + + <define name="numaCell"> + <element name="cell"> + <attribute name="cpus"> + <ref name="Cellcpus"/> + </attribute> + <attribute name="mems"> + <ref name="Cellmems"/> + </attribute> + </element> + </define> + <!-- System information specification: Placeholder for system specific informations likes the ones @@ -2745,4 +2767,14 @@ <param name="pattern">[a-zA-Z0-9_\.:]+</param> </data> </define> + <define name="Cellcpus"> + <data type="string"> + <param name="pattern">([0-9]+(-[0-9]+)?|\^[0-9]+)(,([0-9]+(-[0-9]+)?|\^[0-9]+))*</param> + </data> + </define> + <define name="Cellmems"> + <data type="unsignedInt"> + <param name="pattern">[0-9]+</param> + </data> + </define> </grammar> diff --git a/src/conf/cpu_conf.c b/src/conf/cpu_conf.c index 41e997e..e55897f 100644 --- a/src/conf/cpu_conf.c +++ b/src/conf/cpu_conf.c @@ -28,6 +28,7 @@ #include "util.h" #include "buf.h" #include "cpu_conf.h" +#include "domain_conf.h" #define VIR_FROM_THIS VIR_FROM_CPU @@ -67,6 +68,10 @@ virCPUDefFree(virCPUDefPtr def) VIR_FREE(def->features[i].name); VIR_FREE(def->features); + for (i = 0 ; i < def->ncells ; i++) + VIR_FREE(def->cells[i].cpumask); + VIR_FREE(def->cells); + VIR_FREE(def); } @@ -109,6 +114,19 @@ no_memory: return NULL; } +static int +virCPUDefNumaCPUs(virCPUDefPtr def) +{ + int i, j, ncpus = 0; + + for (i = 0; i < def->ncells; i++) { + for (j = 0; j < VIR_DOMAIN_CPUMASK_LEN; j++) { + if (def->cells[i].cpumask[j]) + ncpus++; + } + } + return ncpus; +} virCPUDefPtr virCPUDefParseXML(const xmlNodePtr node, @@ -289,6 +307,50 @@ virCPUDefParseXML(const xmlNodePtr node, def->features[i].policy = policy; } + if (virXPathNode("./numa[1]", ctxt)) { + VIR_FREE(nodes); + n = virXPathNodeSet("./numa[1]/cell", ctxt, &nodes); + if (n < 0 || n == 0) { + virCPUReportError(VIR_ERR_INTERNAL_ERROR, + "%s", _("NUMA topology defined without NUMA cells")); + goto error; + } + + if (VIR_RESIZE_N(def->cells, def->ncells_max, + def->ncells, n) < 0) + goto no_memory; + + def->ncells = n; + + for (i = 0 ; i < n ; i++) { + char *cpus; + int cpumasklen = VIR_DOMAIN_CPUMASK_LEN; + unsigned long ul; + int ret; + + def->cells[i].cellid = i; + cpus = virXMLPropString(nodes[i], "cpus"); + + if (VIR_ALLOC_N(def->cells[i].cpumask, cpumasklen) < 0) + goto no_memory; + + if (virDomainCpuSetParse((const char **)&cpus, + 0, def->cells[i].cpumask, + cpumasklen) < 0) + goto error; + + ret = virXPathULong("string(./numa[1]/cell/@mems)", + ctxt, &ul); + if (ret < 0) { + virCPUReportError(VIR_ERR_INTERNAL_ERROR, + "%s", _("Missing 'mems' attribute in NUMA topology")); + goto error; + } + def->cells[i].mem = (unsigned int) ul; + } + def->cells_cpus = virCPUDefNumaCPUs(def); + } + cleanup: VIR_FREE(nodes); diff --git a/src/conf/cpu_conf.h b/src/conf/cpu_conf.h index 4406cba..bf4270a 100644 --- a/src/conf/cpu_conf.h +++ b/src/conf/cpu_conf.h @@ -67,6 +67,14 @@ struct _virCPUFeatureDef { int policy; /* enum virCPUFeaturePolicy */ }; +typedef struct _virCellDef virCellDef; +typedef virCellDef *virCellDefPtr; +struct _virCellDef { + int cellid; + char *cpumask; /* CPUs that are part of this node */ + unsigned int mem; /* Node memory */ +}; + typedef struct _virCPUDef virCPUDef; typedef virCPUDef *virCPUDefPtr; struct _virCPUDef { @@ -81,6 +89,10 @@ struct _virCPUDef { size_t nfeatures; size_t nfeatures_max; virCPUFeatureDefPtr features; + size_t ncells; + size_t ncells_max; + virCellDefPtr cells; + unsigned int cells_cpus; }; diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 6e2d421..02fc1e7 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -7511,6 +7511,13 @@ static virDomainDefPtr virDomainDefParseXML(virCapsPtr caps, if (def->cpu == NULL) goto error; + + if (def->cpu->cells_cpus > def->maxvcpus) { + virDomainReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Number of CPUs in <numa> exceeds the" + " <vcpu> count")); + goto error; + } } if ((node = virXPathNode("./sysinfo[1]", ctxt)) != NULL) {

On 11/06/2011 06:57 AM, Bharata B Rao wrote:
XML definitions for guest NUMA and parsing routines.
From: Bharata B Rao<bharata@linux.vnet.ibm.com>
This patch adds XML definitions for guest NUMA specification and contains routines to parse the same. The guest NUMA specification looks like this:
<cpu> ... <topology sockets='2' cores='4' threads='2'/> <numa> <cell cpus='0-7' mems='512000'/> <cell cpus='8-15' mems='512000'/> </numa> ... </cpu>
Signed-off-by: Bharata B Rao<bharata@linux.vnet.ibm.com> ---
+<p> + Guest NUMA topology can be specifed using<code>numa</code> element. +<span class="since">Since X.X.X</span>
Let's just put 0.9.8 here. It's easier at feature freeze time to grep and replace a concrete 0.9.8 into a different numbering scheme (0.10.0, 1.0.0, ?) if we decide on something different than 0.9.8, than it is to remember to also search for X.X.X.
+</p> + +<pre> + ... +<cpu> + ... +<numa> +<cell cpus='0-3' mems='512000'/> +<cell cpus='4-7' mems='512000'/>
I understand 'cpus' (a valid word meaning multiple cpu units), but 'mems' seems odd; I think it would be better naming this attribute 'memory' to match our <memory> element at the top level. Just because qemu's command line names the option mems= doesn't mean we should be stuck making our XML inconsistent.
+</numa> + ... +</cpu> + ...</pre> + +<p> + Each<code>cell</code> element specifies a NUMA cell or a NUMA node. +<code>cpus</code> specifies the CPU or range of CPUs that are part of + the node.<code>mems</code> specifies the node memory in kilobytes + (i.e. blocks of 1024 bytes). Each cell or node is assigned cellid + or nodeid in the increasing order starting from 0.
I agree with doing things in 1024-byte blocks [1], since <memory> and <currentMemory> are also in that unit. [1] 1024-byte blocks is technically kibibytes, not kilobytes; but you're copying from existing text, so at least we're consistent, not to mention fitting right in with the wikipedia complaint that KiB has had slow adoption by the computer industry: https://secure.wikimedia.org/wikipedia/en/wiki/Kibibyte :)
+</p> + +<p> + This guest NUMA specification translates to<code>-numa</code> command + line option for QEMU/KVM. For the above example, the following QEMU + command line option is generated: +<code>-numa node,nodeid=0,cpus=0-3,mems=512000 -numa node,nodeid=1,cpus=4-7,mems=512000</code>
This paragraph is not necessary. We don't need to give one hypervisor-specific example of how the XML is translated; it is sufficient to simply document the XML semantics in a way that can be implemented by any number of hypervisors.
+ +<define name="numaCell"> +<element name="cell"> +<attribute name="cpus"> +<ref name="Cellcpus"/>
Typically, ref names start with a lower case letter.
+</attribute> +<attribute name="mems"> +<ref name="Cellmems"/> +</attribute>
Is it possible for these attributes to be optional? That is, on the qemu line, can I specify cpus= but not mems=, or mems= but not cpus=? If so, then the attributes need to be optional in the schema, and the code behave with sane defaults when one of the two attributes is not present.
@@ -2745,4 +2767,14 @@ <param name="pattern">[a-zA-Z0-9_\.:]+</param> </data> </define> +<define name="Cellcpus"> +<data type="string"> +<param name="pattern">([0-9]+(-[0-9]+)?|\^[0-9]+)(,([0-9]+(-[0-9]+)?|\^[0-9]+))*</param>
This looks like a repeat of <define name="cpuset">; if so, let's reuse that define instead of making a new one (and if not, why are we introducing yet another syntax?).
+</data> +</define> +<define name="Cellmems"> +<data type="unsignedInt"> +<param name="pattern">[0-9]+</param>
Likewise, this looks like a repeat of <define name="memoryKB">, so lets reuse that.
@@ -109,6 +114,19 @@ no_memory: return NULL; }
+static int +virCPUDefNumaCPUs(virCPUDefPtr def) +{ + int i, j, ncpus = 0; + + for (i = 0; i< def->ncells; i++) { + for (j = 0; j< VIR_DOMAIN_CPUMASK_LEN; j++) { + if (def->cells[i].cpumask[j]) + ncpus++; + } + }
Can this loop be made any faster by using count_one_bits?
+ return ncpus; +}
virCPUDefPtr virCPUDefParseXML(const xmlNodePtr node, @@ -289,6 +307,50 @@ virCPUDefParseXML(const xmlNodePtr node, def->features[i].policy = policy; }
+ if (virXPathNode("./numa[1]", ctxt)) { + VIR_FREE(nodes); + n = virXPathNodeSet("./numa[1]/cell", ctxt,&nodes); + if (n< 0 || n == 0) {
This looks a bit funny, compared to if (n <= 0).
+ for (i = 0 ; i< n ; i++) { + char *cpus; + int cpumasklen = VIR_DOMAIN_CPUMASK_LEN; + unsigned long ul; + int ret; + + def->cells[i].cellid = i; + cpus = virXMLPropString(nodes[i], "cpus"); + + if (VIR_ALLOC_N(def->cells[i].cpumask, cpumasklen)< 0) + goto no_memory; + + if (virDomainCpuSetParse((const char **)&cpus, + 0, def->cells[i].cpumask, + cpumasklen)< 0)
Does this behave properly if the cpus=... attribute was missing?
+ goto error; + + ret = virXPathULong("string(./numa[1]/cell/@mems)", + ctxt,&ul); + if (ret< 0) { + virCPUReportError(VIR_ERR_INTERNAL_ERROR, + "%s", _("Missing 'mems' attribute in NUMA topology")); + goto error;
Especially since you checked for a missing mems= counterpart? And back to my earlier question of whether both attributes are really mandatory, or whether just one in isolation can make sense.
+++ b/src/conf/cpu_conf.h @@ -67,6 +67,14 @@ struct _virCPUFeatureDef { int policy; /* enum virCPUFeaturePolicy */ };
+typedef struct _virCellDef virCellDef; +typedef virCellDef *virCellDefPtr; +struct _virCellDef { + int cellid; + char *cpumask; /* CPUs that are part of this node */ + unsigned int mem; /* Node memory */
I'd write this as /* Node memory in kB */, to make it clear what scale is in use. I saw the code for parsing the XML, but what about the code for generating the xml during 'virsh dumpxml' (aka virDomainDefFormat)? The patch is incomplete without that. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On Mon, Nov 07, 2011 at 11:35:27AM -0700, Eric Blake wrote:
On 11/06/2011 06:57 AM, Bharata B Rao wrote:
XML definitions for guest NUMA and parsing routines.
From: Bharata B Rao<bharata@linux.vnet.ibm.com>
This patch adds XML definitions for guest NUMA specification and contains routines to parse the same. The guest NUMA specification looks like this:
<cpu> ... <topology sockets='2' cores='4' threads='2'/> <numa> <cell cpus='0-7' mems='512000'/> <cell cpus='8-15' mems='512000'/> </numa> ... </cpu>
Signed-off-by: Bharata B Rao<bharata@linux.vnet.ibm.com> ---
+<p> + Guest NUMA topology can be specifed using<code>numa</code> element. +<span class="since">Since X.X.X</span>
Let's just put 0.9.8 here. It's easier at feature freeze time to grep and replace a concrete 0.9.8 into a different numbering scheme (0.10.0, 1.0.0, ?) if we decide on something different than 0.9.8, than it is to remember to also search for X.X.X.
Ok, changed to 0.9.8 in v3.
+</p> + +<pre> + ... +<cpu> + ... +<numa> +<cell cpus='0-3' mems='512000'/> +<cell cpus='4-7' mems='512000'/>
I understand 'cpus' (a valid word meaning multiple cpu units), but 'mems' seems odd; I think it would be better naming this attribute 'memory' to match our <memory> element at the top level. Just because qemu's command line names the option mems= doesn't mean we should be stuck making our XML inconsistent.
Changed to memory.
+</numa> + ... +</cpu> + ...</pre> + +<p> + Each<code>cell</code> element specifies a NUMA cell or a NUMA node. +<code>cpus</code> specifies the CPU or range of CPUs that are part of + the node.<code>mems</code> specifies the node memory in kilobytes + (i.e. blocks of 1024 bytes). Each cell or node is assigned cellid + or nodeid in the increasing order starting from 0.
I agree with doing things in 1024-byte blocks [1], since <memory> and <currentMemory> are also in that unit.
[1] 1024-byte blocks is technically kibibytes, not kilobytes; but you're copying from existing text, so at least we're consistent, not to mention fitting right in with the wikipedia complaint that KiB has had slow adoption by the computer industry: https://secure.wikimedia.org/wikipedia/en/wiki/Kibibyte :)
+</p> + +<p> + This guest NUMA specification translates to<code>-numa</code> command + line option for QEMU/KVM. For the above example, the following QEMU + command line option is generated: +<code>-numa node,nodeid=0,cpus=0-3,mems=512000 -numa node,nodeid=1,cpus=4-7,mems=512000</code>
This paragraph is not necessary. We don't need to give one hypervisor-specific example of how the XML is translated; it is sufficient to simply document the XML semantics in a way that can be implemented by any number of hypervisors.
Ok, removed this paragraph.
+ +<define name="numaCell"> +<element name="cell"> +<attribute name="cpus"> +<ref name="Cellcpus"/>
Typically, ref names start with a lower case letter.
Ok, changed.
+</attribute> +<attribute name="mems"> +<ref name="Cellmems"/> +</attribute>
Is it possible for these attributes to be optional? That is, on the qemu line, can I specify cpus= but not mems=, or mems= but not cpus=? If so, then the attributes need to be optional in the schema, and the code behave with sane defaults when one of the two attributes is not present.
Actually qemu allows both cpus and mem to be optional. But if we allow that, we will have specifications like this: <numa> <cell /> <cell /> </numa> That looks ugly. Either both of them should allowed to be optional or none of them, I am going for the latter specification. IOW lets make both of them mandatory in a numa cell.
@@ -2745,4 +2767,14 @@ <param name="pattern">[a-zA-Z0-9_\.:]+</param> </data> </define> +<define name="Cellcpus"> +<data type="string"> +<param name="pattern">([0-9]+(-[0-9]+)?|\^[0-9]+)(,([0-9]+(-[0-9]+)?|\^[0-9]+))*</param>
This looks like a repeat of <define name="cpuset">; if so, let's reuse that define instead of making a new one (and if not, why are we introducing yet another syntax?).
Fixed.
+</data> +</define> +<define name="Cellmems"> +<data type="unsignedInt"> +<param name="pattern">[0-9]+</param>
Likewise, this looks like a repeat of <define name="memoryKB">, so lets reuse that.
Done.
@@ -109,6 +114,19 @@ no_memory: return NULL; }
+static int +virCPUDefNumaCPUs(virCPUDefPtr def) +{ + int i, j, ncpus = 0; + + for (i = 0; i< def->ncells; i++) { + for (j = 0; j< VIR_DOMAIN_CPUMASK_LEN; j++) { + if (def->cells[i].cpumask[j]) + ncpus++; + } + }
Can this loop be made any faster by using count_one_bits?
Yes provided we start storing CPUs in bitmasks rather than char arrays. But I saw other parts of the code (like cpuset) storing CPUs in array and hence used the same for numa too. In any case I got rid of this since we can arrive at this information from other means.
+ return ncpus; +}
virCPUDefPtr virCPUDefParseXML(const xmlNodePtr node, @@ -289,6 +307,50 @@ virCPUDefParseXML(const xmlNodePtr node, def->features[i].policy = policy; }
+ if (virXPathNode("./numa[1]", ctxt)) { + VIR_FREE(nodes); + n = virXPathNodeSet("./numa[1]/cell", ctxt,&nodes); + if (n< 0 || n == 0) {
This looks a bit funny, compared to if (n <= 0).
Yes. I first accounted for the error case and later added a check for zero cells in <numa>. Fixed this.
+ for (i = 0 ; i< n ; i++) { + char *cpus; + int cpumasklen = VIR_DOMAIN_CPUMASK_LEN; + unsigned long ul; + int ret; + + def->cells[i].cellid = i; + cpus = virXMLPropString(nodes[i], "cpus"); + + if (VIR_ALLOC_N(def->cells[i].cpumask, cpumasklen)< 0) + goto no_memory; + + if (virDomainCpuSetParse((const char **)&cpus, + 0, def->cells[i].cpumask, + cpumasklen)< 0)
Does this behave properly if the cpus=... attribute was missing?
This was buggy. Fixed now.
+ goto error; + + ret = virXPathULong("string(./numa[1]/cell/@mems)", + ctxt,&ul); + if (ret< 0) { + virCPUReportError(VIR_ERR_INTERNAL_ERROR, + "%s", _("Missing 'mems' attribute in NUMA topology")); + goto error;
Especially since you checked for a missing mems= counterpart? And back to my earlier question of whether both attributes are really mandatory, or whether just one in isolation can make sense.
As explained above, I made both of them mandatory now.
+++ b/src/conf/cpu_conf.h @@ -67,6 +67,14 @@ struct _virCPUFeatureDef { int policy; /* enum virCPUFeaturePolicy */ };
+typedef struct _virCellDef virCellDef; +typedef virCellDef *virCellDefPtr; +struct _virCellDef { + int cellid; + char *cpumask; /* CPUs that are part of this node */ + unsigned int mem; /* Node memory */
I'd write this as /* Node memory in kB */, to make it clear what scale is in use.
Done.
I saw the code for parsing the XML, but what about the code for generating the xml during 'virsh dumpxml' (aka virDomainDefFormat)? The patch is incomplete without that.
Thanks for pointing this out. Added code for this now. Regards, Bharata.

Routine to truncate virBuffer From: Bharata B Rao <bharata@linux.vnet.ibm.com> Add a helper to truncate virBuffer. /** * virBufferTruncate: * @buf: the buffer * @len: number of bytes by which the buffer is truncated * * Truncate the buffer by @len bytes. * * Returns zero on success or -1 on error */ int virBufferTruncate(virBufferPtr buf, unsigned int len) This doesn't reduce the buffer size, but instead reduces the in-use buffer. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> --- src/libvirt_private.syms | 1 + src/util/buf.c | 25 +++++++++++++++++++++++++ src/util/buf.h | 1 + tests/virbuftest.c | 3 ++- 4 files changed, 29 insertions(+), 1 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 6a1562e..3a2ae86 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -33,6 +33,7 @@ virBufferEscapeString; virBufferFreeAndReset; virBufferGetIndent; virBufferStrcat; +virBufferTruncate; virBufferURIEncodeString; virBufferUse; virBufferVasprintf; diff --git a/src/util/buf.c b/src/util/buf.c index 5043128..4bba350 100644 --- a/src/util/buf.c +++ b/src/util/buf.c @@ -123,6 +123,31 @@ virBufferGrow(virBufferPtr buf, unsigned int len) } /** + * virBufferTruncate: + * @buf: the buffer + * @len: number of bytes by which the buffer is truncated + * + * Truncate the buffer by @len bytes. + * + * Returns zero on success or -1 on error + */ +int +virBufferTruncate(virBufferPtr buf, unsigned int len) +{ + if (buf->error) + return -1; + + if ((signed)(buf->use - len) < 0) { + virBufferSetError(buf, -1); + return -1; + } + + buf->use -= len; + buf->content[buf->use] = '\0'; + return 0; +} + +/** * virBufferAdd: * @buf: the buffer to append to * @str: the string diff --git a/src/util/buf.h b/src/util/buf.h index 1a8ebfb..76f48e0 100644 --- a/src/util/buf.h +++ b/src/util/buf.h @@ -63,5 +63,6 @@ void virBufferURIEncodeString(virBufferPtr buf, const char *str); void virBufferAdjustIndent(virBufferPtr buf, int indent); int virBufferGetIndent(const virBufferPtr buf, bool dynamic); +int virBufferTruncate(virBufferPtr buf, unsigned int len); #endif /* __VIR_BUFFER_H__ */ diff --git a/tests/virbuftest.c b/tests/virbuftest.c index 51a62fe..2230cb0 100644 --- a/tests/virbuftest.c +++ b/tests/virbuftest.c @@ -123,7 +123,8 @@ static int testBufAutoIndent(const void *data ATTRIBUTE_UNUSED) virBufferAddChar(buf, '\n'); virBufferEscapeShell(buf, " 11"); virBufferAddChar(buf, '\n'); - + virBufferAsprintf(buf, "%d", 12); + virBufferTruncate(buf, 4); result = virBufferContentAndReset(buf); if (!result || STRNEQ(result, expected)) { virtTestDifference(stderr, expected, result);

On 11/06/2011 06:58 AM, Bharata B Rao wrote:
Routine to truncate virBuffer
From: Bharata B Rao<bharata@linux.vnet.ibm.com>
Add a helper to truncate virBuffer.
/** * virBufferTruncate:
+++ b/src/util/buf.c @@ -123,6 +123,31 @@ virBufferGrow(virBufferPtr buf, unsigned int len) }
/** + * virBufferTruncate: + * @buf: the buffer + * @len: number of bytes by which the buffer is truncated + * + * Truncate the buffer by @len bytes. + * + * Returns zero on success or -1 on error + */ +int +virBufferTruncate(virBufferPtr buf, unsigned int len)
What good is returning an error, given that callers already have to use virBufferError for detecting other errors, and given that you aren't marking the function as ATTRIBUTE_RETURN_CHECK to require callers to respect that error? One of the points of the virBuffer API is that most functions can return void (for example, see virBufferAdjustIndent), because the user doesn't care about error checking until the end.
+{ + if (buf->error) + return -1; + + if ((signed)(buf->use - len)< 0) {
I tend to write the cast as (int), not (signed), if a cast is even needed. You don't catch all possible overflows, though - if buf->use is 2, and len is 4294967295 (aka (unsigned)-1), then you've proceeded to expand(!) buf->use to 3, skipping an uninitialized byte. A better filter would be: if (len > buf->use)
+ virBufferSetError(buf, -1); + return -1; + } + + buf->use -= len; + buf->content[buf->use] = '\0';
This looks correct, once you filter out invalid len first.
+++ b/tests/virbuftest.c @@ -123,7 +123,8 @@ static int testBufAutoIndent(const void *data ATTRIBUTE_UNUSED) virBufferAddChar(buf, '\n'); virBufferEscapeShell(buf, " 11"); virBufferAddChar(buf, '\n'); - + virBufferAsprintf(buf, "%d", 12); + virBufferTruncate(buf, 4);
That gives no change in the expected output. I'd almost rather see: virBufferAsprintf(buf, "%d", 123); virBufferTruncate(buf, 1); as well as an update to the expected output to see output ending in "12". You're on the right track, but I don't think it is worth applying this until after 0.9.7 is released. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On Mon, Nov 07, 2011 at 11:12:07AM -0700, Eric Blake wrote:
You're on the right track, but I don't think it is worth applying this until after 0.9.7 is released.
Sure, no problem. Thanks for your detailed review. I will respond to individual bits of your comments when I prepare v3 of the patchset. Regards, Bharata.

On Mon, Nov 07, 2011 at 11:12:07AM -0700, Eric Blake wrote:
On 11/06/2011 06:58 AM, Bharata B Rao wrote:
Routine to truncate virBuffer
From: Bharata B Rao<bharata@linux.vnet.ibm.com>
Add a helper to truncate virBuffer.
/** * virBufferTruncate:
+++ b/src/util/buf.c @@ -123,6 +123,31 @@ virBufferGrow(virBufferPtr buf, unsigned int len) }
/** + * virBufferTruncate: + * @buf: the buffer + * @len: number of bytes by which the buffer is truncated + * + * Truncate the buffer by @len bytes. + * + * Returns zero on success or -1 on error + */ +int +virBufferTruncate(virBufferPtr buf, unsigned int len)
What good is returning an error, given that callers already have to use virBufferError for detecting other errors, and given that you aren't marking the function as ATTRIBUTE_RETURN_CHECK to require callers to respect that error? One of the points of the virBuffer API is that most functions can return void (for example, see virBufferAdjustIndent), because the user doesn't care about error checking until the end.
Ok, point taken.
+{ + if (buf->error) + return -1; + + if ((signed)(buf->use - len)< 0) {
I tend to write the cast as (int), not (signed), if a cast is even needed. You don't catch all possible overflows, though - if buf->use is 2, and len is 4294967295 (aka (unsigned)-1), then you've proceeded to expand(!) buf->use to 3, skipping an uninitialized byte. A better filter would be:
if (len > buf->use)
+ virBufferSetError(buf, -1); + return -1; + } + + buf->use -= len; + buf->content[buf->use] = '\0';
This looks correct, once you filter out invalid len first.
Agreed.
+++ b/tests/virbuftest.c @@ -123,7 +123,8 @@ static int testBufAutoIndent(const void *data ATTRIBUTE_UNUSED) virBufferAddChar(buf, '\n'); virBufferEscapeShell(buf, " 11"); virBufferAddChar(buf, '\n'); - + virBufferAsprintf(buf, "%d", 12); + virBufferTruncate(buf, 4);
That gives no change in the expected output. I'd almost rather see:
virBufferAsprintf(buf, "%d", 123); virBufferTruncate(buf, 1);
as well as an update to the expected output to see output ending in "12".
The idea was to test with minimal change and hence resorted to this. In any case, I am dropping this patch for the reason explained in another thread. Regards, Bharata.

qemu: Generate -numa option From: Bharata B Rao <bharata@linux.vnet.ibm.com> Add routines to generate -numa QEMU command line option based on <numa> ... </numa> XML specifications. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> --- src/qemu/qemu_command.c | 71 ++++++++++++++++++++ tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.args | 5 + tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml | 25 +++++++ tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.args | 6 ++ tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.xml | 25 +++++++ tests/qemuxml2argvtest.c | 2 + 6 files changed, 134 insertions(+), 0 deletions(-) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.args create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.args create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.xml diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index dc92fa3..b042e34 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -3251,6 +3251,74 @@ qemuBuildSmpArgStr(const virDomainDefPtr def, return virBufferContentAndReset(&buf); } +static int +qemuBuildNumaCPUArgStr(char *cpumask, virBufferPtr buf) +{ + int i, first, last; + int cpuSet = 0; + + for (i = 0; i < VIR_DOMAIN_CPUMASK_LEN; i++) { + if (cpumask[i]) { + if (cpuSet) + last = i; + else { + first = last = i; + cpuSet = 1; + } + } else { + if (!cpuSet) + continue; + if (first == last) + virBufferAsprintf(buf, "%d,", first); + else + virBufferAsprintf(buf, "%d-%d,", first, last); + cpuSet = 0; + } + } + + if (cpuSet) { + if (first == last) + virBufferAsprintf(buf, "%d,", first); + else + virBufferAsprintf(buf, "%d-%d,", first, last); + } + + /* Remove the trailing comma */ + return virBufferTruncate(buf, 1); +} + +static int +qemuBuildNumaArgStr(const virDomainDefPtr def, virCommandPtr cmd) +{ + int i; + char *node; + virBuffer buf = VIR_BUFFER_INITIALIZER; + + for (i = 0; i < def->cpu->ncells; i++) { + virCommandAddArg(cmd, "-numa"); + virBufferAsprintf(&buf, "%s", "node"); + virBufferAsprintf(&buf, ",nodeid=%d", def->cpu->cells[i].cellid); + virBufferAsprintf(&buf, ",cpus="); + + if (qemuBuildNumaCPUArgStr(def->cpu->cells[i].cpumask, &buf)) + goto error; + + virBufferAsprintf(&buf, ",mems=%d", def->cpu->cells[i].mem); + + if (virBufferError(&buf)) + goto error; + + node = virBufferContentAndReset(&buf); + virCommandAddArg(cmd, node); + VIR_FREE(node); + } + return 0; + +error: + virBufferFreeAndReset(&buf); + virReportOOMError(); + return -1; +} /* * Constructs a argv suitable for launching qemu with config defined @@ -3414,6 +3482,9 @@ qemuBuildCommandLine(virConnectPtr conn, virCommandAddArg(cmd, smp); VIR_FREE(smp); + if (def->cpu && def->cpu->ncells && qemuBuildNumaArgStr(def, cmd)) + goto error; + if (qemuCapsGet(qemuCaps, QEMU_CAPS_NAME)) { virCommandAddArg(cmd, "-name"); if (driver->setProcessName && diff --git a/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.args b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.args new file mode 100644 index 0000000..e8fd277 --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.args @@ -0,0 +1,5 @@ +LC_ALL=C PATH=/bin HOME=/home/test USER=test LOGNAME=test ./qemu.sh -S -M pc \ +-m 214 -smp 16 -numa node,nodeid=0,cpus=0-7,mems=109550 \ +-numa node,nodeid=1,cpus=8-15,mems=109550 -nographic -monitor \ +unix:/tmp/test-monitor,server,nowait -no-acpi -boot n -net none -serial none \ +-parallel none -usb diff --git a/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml new file mode 100644 index 0000000..07fe891 --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa1.xml @@ -0,0 +1,25 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory>219100</memory> + <currentMemory>219100</currentMemory> + <vcpu>16</vcpu> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='network'/> + </os> + <cpu> + <topology sockets="2" cores="4" threads="2"/> + <numa> + <cell cpus="0-7" mems="109550"/> + <cell cpus="8-15" mems="109550"/> + </numa> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/./qemu.sh</emulator> + </devices> +</domain> diff --git a/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.args b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.args new file mode 100644 index 0000000..bc96cef --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.args @@ -0,0 +1,6 @@ +LC_ALL=C PATH=/bin HOME=/home/test USER=test LOGNAME=test ./qemu.sh -S -M pc \ +-m 214 -smp 16,sockets=2,cores=4,threads=2 \ +-numa node,nodeid=0,cpus=0-7,mems=109550 \ +-numa node,nodeid=1,cpus=8-15,mems=109550 -nographic -monitor \ +unix:/tmp/test-monitor,server,nowait -no-acpi -boot n -net none -serial none \ +-parallel none -usb diff --git a/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.xml b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.xml new file mode 100644 index 0000000..07fe891 --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-cpu-numa2.xml @@ -0,0 +1,25 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory>219100</memory> + <currentMemory>219100</currentMemory> + <vcpu>16</vcpu> + <os> + <type arch='x86_64' machine='pc'>hvm</type> + <boot dev='network'/> + </os> + <cpu> + <topology sockets="2" cores="4" threads="2"/> + <numa> + <cell cpus="0-7" mems="109550"/> + <cell cpus="8-15" mems="109550"/> + </numa> + </cpu> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/./qemu.sh</emulator> + </devices> +</domain> diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c index 4d6db01..da8c016 100644 --- a/tests/qemuxml2argvtest.c +++ b/tests/qemuxml2argvtest.c @@ -578,6 +578,8 @@ mymain(void) DO_TEST("cpu-exact1", false, NONE); DO_TEST("cpu-exact2", false, NONE); DO_TEST("cpu-strict1", false, NONE); + DO_TEST("cpu-numa1", false, NONE); + DO_TEST("cpu-numa2", false, QEMU_CAPS_SMP_TOPOLOGY); DO_TEST("memtune", false, QEMU_CAPS_NAME); DO_TEST("blkiotune", false, QEMU_CAPS_NAME);

On 11/06/2011 06:59 AM, Bharata B Rao wrote:
qemu: Generate -numa option
From: Bharata B Rao<bharata@linux.vnet.ibm.com>
Add routines to generate -numa QEMU command line option based on <numa> ...</numa> XML specifications.
Signed-off-by: Bharata B Rao<bharata@linux.vnet.ibm.com>
+static int +qemuBuildNumaCPUArgStr(char *cpumask, virBufferPtr buf) +{ + int i, first, last; + int cpuSet = 0; +
What happens if cpumask is all 0?
+ for (i = 0; i< VIR_DOMAIN_CPUMASK_LEN; i++) { + if (cpumask[i]) {
This if branch is skipped,
+ if (cpuSet) + last = i; + else { + first = last = i; + cpuSet = 1; + } + } else { + if (!cpuSet) + continue;
so this branch always continues,
+ if (first == last) + virBufferAsprintf(buf, "%d,", first); + else + virBufferAsprintf(buf, "%d-%d,", first, last); + cpuSet = 0; + } + } + + if (cpuSet) {
and this if is skipped,
+ if (first == last) + virBufferAsprintf(buf, "%d,", first); + else + virBufferAsprintf(buf, "%d-%d,", first, last); + } + + /* Remove the trailing comma */ + return virBufferTruncate(buf, 1);
meaning that nothing was appended to buf, and you are now stripping unknown text, rather than a comma you just added. Do we need a sanity check to ensure that the cpumask specifies at least one cpu? And if so, would that mask be better here, or up front at the xml parsing time?
+} + +static int +qemuBuildNumaArgStr(const virDomainDefPtr def, virCommandPtr cmd) +{ + int i; + char *node; + virBuffer buf = VIR_BUFFER_INITIALIZER; + + for (i = 0; i< def->cpu->ncells; i++) { + virCommandAddArg(cmd, "-numa"); + virBufferAsprintf(&buf, "%s", "node");
More efficient as virBufferAddLit(&buf, "node"), or...
+ virBufferAsprintf(&buf, ",nodeid=%d", def->cpu->cells[i].cellid);
merge these two into a single: virBufferAsprintf(&buf, "node,nodeid=%d", ...);
+ virBufferAsprintf(&buf, ",cpus=");
Again, with no % in the format string, it is more efficient to use virBufferAddLit.
+ + if (qemuBuildNumaCPUArgStr(def->cpu->cells[i].cpumask,&buf))
Generally, we prefer an explicit < 0 comparison when checking for failure.
+ goto error; + + virBufferAsprintf(&buf, ",mems=%d", def->cpu->cells[i].mem); +
Why do we need to bother with stripping a trailing comma in qemuBuildNumaCPUArgStr, if we will just be adding a comma back again here as the very next statement? You could skip all the hassle of adding virBufferTruncate by just transferring the comma out of this statement and into qemuBuildNumaCPUArgStr (that said, I still think virBufferTruncate will be a useful addition in other contexts).
+ if (virBufferError(&buf)) + goto error; + + node = virBufferContentAndReset(&buf); + virCommandAddArg(cmd, node); + VIR_FREE(node);
It's more efficient to replace these five lines with one: virCommandAddArgBuffer(cmd, &buf);
@@ -3414,6 +3482,9 @@ qemuBuildCommandLine(virConnectPtr conn, virCommandAddArg(cmd, smp); VIR_FREE(smp);
+ if (def->cpu&& def->cpu->ncells&& qemuBuildNumaArgStr(def, cmd))
Again, explicit < 0 check when looking for errors.
+<topology sockets="2" cores="4" threads="2"/> +<numa> +<cell cpus="0-7" mems="109550"/> +<cell cpus="8-15" mems="109550"/>
Of course, this will need tweaking to match any XML changes made in 1/3, but thanks for adding test cases! Overall, I think we'll need a v3 (you may want to use git send-email --subject-prefix=PATCHv3; it wasn't very clear from the subject line that this was already a v2 series), but I like where it's heading. -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org

On Mon, Nov 07, 2011 at 11:47:10AM -0700, Eric Blake wrote:
On 11/06/2011 06:59 AM, Bharata B Rao wrote:
qemu: Generate -numa option
From: Bharata B Rao<bharata@linux.vnet.ibm.com>
Add routines to generate -numa QEMU command line option based on <numa> ...</numa> XML specifications.
Signed-off-by: Bharata B Rao<bharata@linux.vnet.ibm.com>
+static int +qemuBuildNumaCPUArgStr(char *cpumask, virBufferPtr buf) +{ + int i, first, last; + int cpuSet = 0; +
What happens if cpumask is all 0?
+ for (i = 0; i< VIR_DOMAIN_CPUMASK_LEN; i++) { + if (cpumask[i]) {
This if branch is skipped,
+ if (cpuSet) + last = i; + else { + first = last = i; + cpuSet = 1; + } + } else { + if (!cpuSet) + continue;
so this branch always continues,
+ if (first == last) + virBufferAsprintf(buf, "%d,", first); + else + virBufferAsprintf(buf, "%d-%d,", first, last); + cpuSet = 0; + } + } + + if (cpuSet) {
and this if is skipped,
+ if (first == last) + virBufferAsprintf(buf, "%d,", first); + else + virBufferAsprintf(buf, "%d-%d,", first, last); + } + + /* Remove the trailing comma */ + return virBufferTruncate(buf, 1);
meaning that nothing was appended to buf, and you are now stripping unknown text, rather than a comma you just added. Do we need a sanity check to ensure that the cpumask specifies at least one cpu? And if so, would that mask be better here, or up front at the xml parsing time?
Good catch. I am now ensuring that this doesn't get a mask with no cpus.
+} + +static int +qemuBuildNumaArgStr(const virDomainDefPtr def, virCommandPtr cmd) +{ + int i; + char *node; + virBuffer buf = VIR_BUFFER_INITIALIZER; + + for (i = 0; i< def->cpu->ncells; i++) { + virCommandAddArg(cmd, "-numa"); + virBufferAsprintf(&buf, "%s", "node");
More efficient as virBufferAddLit(&buf, "node"), or...
Right.
+ virBufferAsprintf(&buf, ",nodeid=%d", def->cpu->cells[i].cellid);
merge these two into a single:
virBufferAsprintf(&buf, "node,nodeid=%d", ...);
+ virBufferAsprintf(&buf, ",cpus=");
Again, with no % in the format string, it is more efficient to use virBufferAddLit.
+ + if (qemuBuildNumaCPUArgStr(def->cpu->cells[i].cpumask,&buf))
Generally, we prefer an explicit < 0 comparison when checking for failure.
Ok as you prefer :)
+ goto error; + + virBufferAsprintf(&buf, ",mems=%d", def->cpu->cells[i].mem); +
Why do we need to bother with stripping a trailing comma in qemuBuildNumaCPUArgStr, if we will just be adding a comma back again here as the very next statement? You could skip all the hassle of adding virBufferTruncate by just transferring the comma out of this statement and into qemuBuildNumaCPUArgStr
This is what I mentioned in my last iteration. But since you hinted at extending virBuffer, I went on that path. Another motivation was to keep qemuBuildNumaArgStr() generic enough so that it gives out comma/dash separate CPU string without the ending comma in case this functionality is needed elsewhere in libvirt. But I guess I will stick with not appending comma to "memory" in which case this extension to virBuffer isn't needed.
(that said, I still think virBufferTruncate will be a useful addition in other contexts).
I think we should add virBufferTruncate only when we get any user for that. Hence I am not including virBufferTruncate in v3 patchset.
+ if (virBufferError(&buf)) + goto error; + + node = virBufferContentAndReset(&buf); + virCommandAddArg(cmd, node); + VIR_FREE(node);
It's more efficient to replace these five lines with one:
virCommandAddArgBuffer(cmd, &buf);
I can't because this is in a loop and I want to break out from the loop in case of buffer error. virCommandAddArgBuffer doesn't allow that. But I did replace the last 3 lines with virCommandAddArgBuffer :)
@@ -3414,6 +3482,9 @@ qemuBuildCommandLine(virConnectPtr conn, virCommandAddArg(cmd, smp); VIR_FREE(smp);
+ if (def->cpu&& def->cpu->ncells&& qemuBuildNumaArgStr(def, cmd))
Again, explicit < 0 check when looking for errors.
Sure.
+<topology sockets="2" cores="4" threads="2"/> +<numa> +<cell cpus="0-7" mems="109550"/> +<cell cpus="8-15" mems="109550"/>
Of course, this will need tweaking to match any XML changes made in 1/3, but thanks for adding test cases!
Tweaked the testcases. BTW the testcases were useful to me as they uncovered 2 bugs in my code!
Overall, I think we'll need a v3 (you may want to use git send-email --subject-prefix=PATCHv3; it wasn't very clear from the subject line that this was already a v2 series), but I like where it's heading.
Will ensure [PATCH v3] for the next iteration. Thanks. v3 on its way. Regards, Bharata.

On 11/06/2011 06:55 AM, Bharata B Rao wrote:
daemon-conf fails, but it fails even without my patches too. I guess my patches are really affecting it.
TEST: daemon-conf .....!!!!!!...!!!!!!!!!!!!!!!!!!!./daemon-conf: line 98: kill: (7811) - No such process ! 34 FAILED
That's a (known) bug in the testsuite; we really need to fix that test to skip instead of fail if you don't have the right prereqs installed. Basically, you need cyrus-sasl-devel installed for the test to have a chance of passing: https://www.redhat.com/archives/libvir-list/2010-March/msg00395.html -- Eric Blake eblake@redhat.com +1-801-349-2682 Libvirt virtualization library http://libvirt.org
participants (2)
-
Bharata B Rao
-
Eric Blake