On 11/06/2011 06:57 AM, Bharata B Rao wrote:
XML definitions for guest NUMA and parsing routines.
From: Bharata B Rao<bharata(a)linux.vnet.ibm.com>
This patch adds XML definitions for guest NUMA specification and contains
routines to parse the same. The guest NUMA specification looks like this:
<cpu>
...
<topology sockets='2' cores='4' threads='2'/>
<numa>
<cell cpus='0-7' mems='512000'/>
<cell cpus='8-15' mems='512000'/>
</numa>
...
</cpu>
Signed-off-by: Bharata B Rao<bharata(a)linux.vnet.ibm.com>
---
+<p>
+ Guest NUMA topology can be specifed using<code>numa</code> element.
+<span class="since">Since X.X.X</span>
Let's just put 0.9.8 here. It's easier at feature freeze time to grep
and replace a concrete 0.9.8 into a different numbering scheme (0.10.0,
1.0.0, ?) if we decide on something different than 0.9.8, than it is to
remember to also search for X.X.X.
+</p>
+
+<pre>
+ ...
+<cpu>
+ ...
+<numa>
+<cell cpus='0-3' mems='512000'/>
+<cell cpus='4-7' mems='512000'/>
I understand 'cpus' (a valid word meaning multiple cpu units), but
'mems' seems odd; I think it would be better naming this attribute
'memory' to match our <memory> element at the top level. Just because
qemu's command line names the option mems= doesn't mean we should be
stuck making our XML inconsistent.
+</numa>
+ ...
+</cpu>
+ ...</pre>
+
+<p>
+ Each<code>cell</code> element specifies a NUMA cell or a NUMA node.
+<code>cpus</code> specifies the CPU or range of CPUs that are part of
+ the node.<code>mems</code> specifies the node memory in kilobytes
+ (i.e. blocks of 1024 bytes). Each cell or node is assigned cellid
+ or nodeid in the increasing order starting from 0.
I agree with doing things in 1024-byte blocks [1], since <memory> and
<currentMemory> are also in that unit.
[1] 1024-byte blocks is technically kibibytes, not kilobytes; but you're
copying from existing text, so at least we're consistent, not to mention
fitting right in with the wikipedia complaint that KiB has had slow
adoption by the computer industry:
https://secure.wikimedia.org/wikipedia/en/wiki/Kibibyte :)
+</p>
+
+<p>
+ This guest NUMA specification translates to<code>-numa</code>
command
+ line option for QEMU/KVM. For the above example, the following QEMU
+ command line option is generated:
+<code>-numa node,nodeid=0,cpus=0-3,mems=512000 -numa
node,nodeid=1,cpus=4-7,mems=512000</code>
This paragraph is not necessary. We don't need to give one
hypervisor-specific example of how the XML is translated; it is
sufficient to simply document the XML semantics in a way that can be
implemented by any number of hypervisors.
+
+<define name="numaCell">
+<element name="cell">
+<attribute name="cpus">
+<ref name="Cellcpus"/>
Typically, ref names start with a lower case letter.
+</attribute>
+<attribute name="mems">
+<ref name="Cellmems"/>
+</attribute>
Is it possible for these attributes to be optional? That is, on the
qemu line, can I specify cpus= but not mems=, or mems= but not cpus=?
If so, then the attributes need to be optional in the schema, and the
code behave with sane defaults when one of the two attributes is not
present.
@@ -2745,4 +2767,14 @@
<param name="pattern">[a-zA-Z0-9_\.:]+</param>
</data>
</define>
+<define name="Cellcpus">
+<data type="string">
+<param
name="pattern">([0-9]+(-[0-9]+)?|\^[0-9]+)(,([0-9]+(-[0-9]+)?|\^[0-9]+))*</param>
This looks like a repeat of <define name="cpuset">; if so, let's reuse
that define instead of making a new one (and if not, why are we
introducing yet another syntax?).
+</data>
+</define>
+<define name="Cellmems">
+<data type="unsignedInt">
+<param name="pattern">[0-9]+</param>
Likewise, this looks like a repeat of <define name="memoryKB">, so lets
reuse that.
@@ -109,6 +114,19 @@ no_memory:
return NULL;
}
+static int
+virCPUDefNumaCPUs(virCPUDefPtr def)
+{
+ int i, j, ncpus = 0;
+
+ for (i = 0; i< def->ncells; i++) {
+ for (j = 0; j< VIR_DOMAIN_CPUMASK_LEN; j++) {
+ if (def->cells[i].cpumask[j])
+ ncpus++;
+ }
+ }
Can this loop be made any faster by using count_one_bits?
+ return ncpus;
+}
virCPUDefPtr
virCPUDefParseXML(const xmlNodePtr node,
@@ -289,6 +307,50 @@ virCPUDefParseXML(const xmlNodePtr node,
def->features[i].policy = policy;
}
+ if (virXPathNode("./numa[1]", ctxt)) {
+ VIR_FREE(nodes);
+ n = virXPathNodeSet("./numa[1]/cell", ctxt,&nodes);
+ if (n< 0 || n == 0) {
This looks a bit funny, compared to if (n <= 0).
+ for (i = 0 ; i< n ; i++) {
+ char *cpus;
+ int cpumasklen = VIR_DOMAIN_CPUMASK_LEN;
+ unsigned long ul;
+ int ret;
+
+ def->cells[i].cellid = i;
+ cpus = virXMLPropString(nodes[i], "cpus");
+
+ if (VIR_ALLOC_N(def->cells[i].cpumask, cpumasklen)< 0)
+ goto no_memory;
+
+ if (virDomainCpuSetParse((const char **)&cpus,
+ 0, def->cells[i].cpumask,
+ cpumasklen)< 0)
Does this behave properly if the cpus=... attribute was missing?
+ goto error;
+
+ ret = virXPathULong("string(./numa[1]/cell/@mems)",
+ ctxt,&ul);
+ if (ret< 0) {
+ virCPUReportError(VIR_ERR_INTERNAL_ERROR,
+ "%s", _("Missing 'mems' attribute in NUMA
topology"));
+ goto error;
Especially since you checked for a missing mems= counterpart? And back
to my earlier question of whether both attributes are really mandatory,
or whether just one in isolation can make sense.
+++ b/src/conf/cpu_conf.h
@@ -67,6 +67,14 @@ struct _virCPUFeatureDef {
int policy; /* enum virCPUFeaturePolicy */
};
+typedef struct _virCellDef virCellDef;
+typedef virCellDef *virCellDefPtr;
+struct _virCellDef {
+ int cellid;
+ char *cpumask; /* CPUs that are part of this node */
+ unsigned int mem; /* Node memory */
I'd write this as /* Node memory in kB */, to make it clear what scale
is in use.
I saw the code for parsing the XML, but what about the code for
generating the xml during 'virsh dumpxml' (aka virDomainDefFormat)? The
patch is incomplete without that.
--
Eric Blake eblake(a)redhat.com +1-801-349-2682
Libvirt virtualization library
http://libvirt.org