Re: [libvirt] [PATCH 1/3] XML definitions for guest NUMA

Friday, 11 November 2011

On Mon, Nov 07, 2011 at 11:35:27AM -0700, Eric Blake wrote:
...
 On 11/06/2011 06:57 AM, Bharata B Rao wrote:
 >XML definitions for guest NUMA and parsing routines.
 >
 >From: Bharata B Rao&lt;bharata(a)linux.vnet.ibm.com&gt;
 >
 >This patch adds XML definitions for guest NUMA specification and contains
 >routines to parse the same. The guest NUMA specification looks like this:
 >
 ><cpu>
 >         ...
 >         <topology sockets='2' cores='4' threads='2'/>
 >         <numa>
 >                 <cell cpus='0-7' mems='512000'/>
 >                 <cell cpus='8-15' mems='512000'/>
 >         </numa>
 >         ...
 ></cpu>
 >
 >Signed-off-by: Bharata B Rao&lt;bharata(a)linux.vnet.ibm.com&gt;
 >---

 >+<p>
 >+      Guest NUMA topology can be specifed using<code>numa</code> 
element.
 >+<span class="since">Since X.X.X</span>

 Let's just put 0.9.8 here.  It's easier at feature freeze time to
 grep and replace a concrete 0.9.8 into a different numbering scheme
 (0.10.0, 1.0.0, ?) if we decide on something different than 0.9.8,
 than it is to remember to also search for X.X.X. 
Ok, changed to 0.9.8 in v3.

...

 >+</p>
 >+
 >+<pre>
 >+  ...
 >+&lt;cpu&gt;
 >+    ...
 >+&lt;numa&gt;
 >+&lt;cell cpus='0-3' mems='512000'/&gt;
 >+&lt;cell cpus='4-7' mems='512000'/&gt;

 I understand 'cpus' (a valid word meaning multiple cpu units), but
 'mems' seems odd; I think it would be better naming this attribute
 'memory' to match our <memory> element at the top level.  Just
 because qemu's command line names the option mems= doesn't mean we
 should be stuck making our XML inconsistent. 
Changed to memory.

...

 >+&lt;/numa&gt;
 >+    ...
 >+&lt;/cpu&gt;
 >+  ...</pre>
 >+
 >+<p>
 >+      Each<code>cell</code>  element specifies a NUMA cell or a NUMA
node.
 >+<code>cpus</code>  specifies the CPU or range of CPUs that are part of
 >+      the node.<code>mems</code>  specifies the node memory in
kilobytes
 >+      (i.e. blocks of 1024 bytes). Each cell or node is assigned cellid
 >+      or nodeid in the increasing order starting from 0.

 I agree with doing things in 1024-byte blocks [1], since <memory>
 and <currentMemory> are also in that unit.

 [1] 1024-byte blocks is technically kibibytes, not kilobytes; but
 you're copying from existing text, so at least we're consistent, not
 to mention fitting right in with the wikipedia complaint that KiB
 has had slow adoption by the computer industry:
 https://secure.wikimedia.org/wikipedia/en/wiki/Kibibyte :)

 >+</p>
 >+
 >+<p>
 >+      This guest NUMA specification translates to<code>-numa</code> 
command
 >+      line option for QEMU/KVM. For the above example, the following QEMU
 >+      command line option is generated:
 >+<code>-numa node,nodeid=0,cpus=0-3,mems=512000 -numa
node,nodeid=1,cpus=4-7,mems=512000</code>

 This paragraph is not necessary.  We don't need to give one
 hypervisor-specific example of how the XML is translated; it is
 sufficient to simply document the XML semantics in a way that can be
 implemented by any number of hypervisors. 
Ok, removed this paragraph.

...

 >+
 >+<define name="numaCell">
 >+<element name="cell">
 >+<attribute name="cpus">
 >+<ref name="Cellcpus"/>

 Typically, ref names start with a lower case letter. 
Ok, changed.

...

 >+</attribute>
 >+<attribute name="mems">
 >+<ref name="Cellmems"/>
 >+</attribute>

 Is it possible for these attributes to be optional?  That is, on the
 qemu line, can I specify cpus= but not mems=, or mems= but not
 cpus=? If so, then the attributes need to be optional in the schema,
 and the code behave with sane defaults when one of the two
 attributes is not present. 
Actually qemu allows both cpus and mem to be optional. But if we allow
that, we will have specifications like this:

<numa>
  <cell />
  <cell />
</numa>

That looks ugly.

Either both of them should allowed to be optional or none of them, I am
going for the latter specification. IOW lets make both of them
mandatory in a numa cell.

...

 >@@ -2745,4 +2767,14 @@
 >        <param name="pattern">[a-zA-Z0-9_\.:]+</param>
 >      </data>
 >    </define>
 >+<define name="Cellcpus">
 >+<data type="string">
 >+<param
name="pattern">([0-9]+(-[0-9]+)?|\^[0-9]+)(,([0-9]+(-[0-9]+)?|\^[0-9]+))*</param>

 This looks like a repeat of <define name="cpuset">; if so, let's
 reuse that define instead of making a new one (and if not, why are
 we introducing yet another syntax?). 
Fixed.

...

 >+</data>
 >+</define>
 >+<define name="Cellmems">
 >+<data type="unsignedInt">
 >+<param name="pattern">[0-9]+</param>

 Likewise, this looks like a repeat of <define name="memoryKB">, so
 lets reuse that. 
Done.

...

 >@@ -109,6 +114,19 @@ no_memory:
 >      return NULL;
 >  }
 >
 >+static int
 >+virCPUDefNumaCPUs(virCPUDefPtr def)
 >+{
 >+    int i, j, ncpus = 0;
 >+
 >+    for (i = 0; i<  def->ncells; i++) {
 >+        for (j = 0; j<  VIR_DOMAIN_CPUMASK_LEN; j++) {
 >+            if (def->cells[i].cpumask[j])
 >+                ncpus++;
 >+        }
 >+    }

 Can this loop be made any faster by using count_one_bits? 
Yes provided we start storing CPUs in bitmasks rather than
char arrays. But I saw other parts of the code (like cpuset)
storing CPUs in array and hence used the same for numa too.
In any case I got rid of this since we can arrive at this information
from other means.

...

 >+    return ncpus;
 >+}
 >
 >  virCPUDefPtr
 >  virCPUDefParseXML(const xmlNodePtr node,
 >@@ -289,6 +307,50 @@ virCPUDefParseXML(const xmlNodePtr node,
 >          def->features[i].policy = policy;
 >      }
 >
 >+    if (virXPathNode("./numa[1]", ctxt)) {
 >+        VIR_FREE(nodes);
 >+        n = virXPathNodeSet("./numa[1]/cell", ctxt,&nodes);
 >+        if (n<  0 || n == 0) {

 This looks a bit funny, compared to if (n <= 0). 
Yes. I first accounted for the error case and later added a check for
zero cells in <numa>. Fixed this.

...

 >+        for (i = 0 ; i<  n ; i++) {
 >+            char *cpus;
 >+            int cpumasklen = VIR_DOMAIN_CPUMASK_LEN;
 >+            unsigned long ul;
 >+            int ret;
 >+
 >+            def->cells[i].cellid = i;
 >+            cpus = virXMLPropString(nodes[i], "cpus");
 >+
 >+            if (VIR_ALLOC_N(def->cells[i].cpumask, cpumasklen)<  0)
 >+                goto no_memory;
 >+
 >+            if (virDomainCpuSetParse((const char **)&cpus,
 >+                                 0, def->cells[i].cpumask,
 >+                                 cpumasklen)<  0)

 Does this behave properly if the cpus=... attribute was missing? 
This was buggy. Fixed now.

...

 >+                goto error;
 >+
 >+            ret = virXPathULong("string(./numa[1]/cell/@mems)",
 >+                            ctxt,&ul);
 >+            if (ret<  0) {
 >+                virCPUReportError(VIR_ERR_INTERNAL_ERROR,
 >+                    "%s", _("Missing 'mems' attribute in NUMA
topology"));
 >+                goto error;

 Especially since you checked for a missing mems= counterpart?  And
 back to my earlier question of whether both attributes are really
 mandatory, or whether just one in isolation can make sense. 
As explained above, I made both of them mandatory now.

...

 >+++ b/src/conf/cpu_conf.h
 >@@ -67,6 +67,14 @@ struct _virCPUFeatureDef {
 >      int policy;         /* enum virCPUFeaturePolicy */
 >  };
 >
 >+typedef struct _virCellDef virCellDef;
 >+typedef virCellDef *virCellDefPtr;
 >+struct _virCellDef {
 >+   int cellid;
 >+   char *cpumask;	/* CPUs that are part of this node */
 >+   unsigned int mem;	/* Node memory */

 I'd write this as /* Node memory in kB */, to make it clear what
 scale is in use. 
Done.

...

 I saw the code for parsing the XML, but what about the code for
 generating the xml during 'virsh dumpxml' (aka virDomainDefFormat)?
 The patch is incomplete without that. 
Thanks for pointing this out. Added code for this now.

Regards,
Bharata.

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

Re: [libvirt] [PATCH 1/3] XML definitions for guest NUMA