On Mon, Nov 07, 2011 at 11:35:27AM -0700, Eric Blake wrote:
On 11/06/2011 06:57 AM, Bharata B Rao wrote:
>XML definitions for guest NUMA and parsing routines.
>
>From: Bharata B Rao<bharata(a)linux.vnet.ibm.com>
>
>This patch adds XML definitions for guest NUMA specification and contains
>routines to parse the same. The guest NUMA specification looks like this:
>
><cpu>
> ...
> <topology sockets='2' cores='4' threads='2'/>
> <numa>
> <cell cpus='0-7' mems='512000'/>
> <cell cpus='8-15' mems='512000'/>
> </numa>
> ...
></cpu>
>
>Signed-off-by: Bharata B Rao<bharata(a)linux.vnet.ibm.com>
>---
>+<p>
>+ Guest NUMA topology can be specifed using<code>numa</code>
element.
>+<span class="since">Since X.X.X</span>
Let's just put 0.9.8 here. It's easier at feature freeze time to
grep and replace a concrete 0.9.8 into a different numbering scheme
(0.10.0, 1.0.0, ?) if we decide on something different than 0.9.8,
than it is to remember to also search for X.X.X.
Ok, changed to 0.9.8 in v3.
>+</p>
>+
>+<pre>
>+ ...
>+<cpu>
>+ ...
>+<numa>
>+<cell cpus='0-3' mems='512000'/>
>+<cell cpus='4-7' mems='512000'/>
I understand 'cpus' (a valid word meaning multiple cpu units), but
'mems' seems odd; I think it would be better naming this attribute
'memory' to match our <memory> element at the top level. Just
because qemu's command line names the option mems= doesn't mean we
should be stuck making our XML inconsistent.
Changed to memory.
>+</numa>
>+ ...
>+</cpu>
>+ ...</pre>
>+
>+<p>
>+ Each<code>cell</code> element specifies a NUMA cell or a NUMA
node.
>+<code>cpus</code> specifies the CPU or range of CPUs that are part of
>+ the node.<code>mems</code> specifies the node memory in
kilobytes
>+ (i.e. blocks of 1024 bytes). Each cell or node is assigned cellid
>+ or nodeid in the increasing order starting from 0.
I agree with doing things in 1024-byte blocks [1], since <memory>
and <currentMemory> are also in that unit.
[1] 1024-byte blocks is technically kibibytes, not kilobytes; but
you're copying from existing text, so at least we're consistent, not
to mention fitting right in with the wikipedia complaint that KiB
has had slow adoption by the computer industry:
https://secure.wikimedia.org/wikipedia/en/wiki/Kibibyte :)
>+</p>
>+
>+<p>
>+ This guest NUMA specification translates to<code>-numa</code>
command
>+ line option for QEMU/KVM. For the above example, the following QEMU
>+ command line option is generated:
>+<code>-numa node,nodeid=0,cpus=0-3,mems=512000 -numa
node,nodeid=1,cpus=4-7,mems=512000</code>
This paragraph is not necessary. We don't need to give one
hypervisor-specific example of how the XML is translated; it is
sufficient to simply document the XML semantics in a way that can be
implemented by any number of hypervisors.
Ok, removed this paragraph.
>+
>+<define name="numaCell">
>+<element name="cell">
>+<attribute name="cpus">
>+<ref name="Cellcpus"/>
Typically, ref names start with a lower case letter.
Ok, changed.
>+</attribute>
>+<attribute name="mems">
>+<ref name="Cellmems"/>
>+</attribute>
Is it possible for these attributes to be optional? That is, on the
qemu line, can I specify cpus= but not mems=, or mems= but not
cpus=? If so, then the attributes need to be optional in the schema,
and the code behave with sane defaults when one of the two
attributes is not present.
Actually qemu allows both cpus and mem to be optional. But if we allow
that, we will have specifications like this:
<numa>
<cell />
<cell />
</numa>
That looks ugly.
Either both of them should allowed to be optional or none of them, I am
going for the latter specification. IOW lets make both of them
mandatory in a numa cell.
>@@ -2745,4 +2767,14 @@
> <param name="pattern">[a-zA-Z0-9_\.:]+</param>
> </data>
> </define>
>+<define name="Cellcpus">
>+<data type="string">
>+<param
name="pattern">([0-9]+(-[0-9]+)?|\^[0-9]+)(,([0-9]+(-[0-9]+)?|\^[0-9]+))*</param>
This looks like a repeat of <define name="cpuset">; if so, let's
reuse that define instead of making a new one (and if not, why are
we introducing yet another syntax?).
Fixed.
>+</data>
>+</define>
>+<define name="Cellmems">
>+<data type="unsignedInt">
>+<param name="pattern">[0-9]+</param>
Likewise, this looks like a repeat of <define name="memoryKB">, so
lets reuse that.
Done.
>@@ -109,6 +114,19 @@ no_memory:
> return NULL;
> }
>
>+static int
>+virCPUDefNumaCPUs(virCPUDefPtr def)
>+{
>+ int i, j, ncpus = 0;
>+
>+ for (i = 0; i< def->ncells; i++) {
>+ for (j = 0; j< VIR_DOMAIN_CPUMASK_LEN; j++) {
>+ if (def->cells[i].cpumask[j])
>+ ncpus++;
>+ }
>+ }
Can this loop be made any faster by using count_one_bits?
Yes provided we start storing CPUs in bitmasks rather than
char arrays. But I saw other parts of the code (like cpuset)
storing CPUs in array and hence used the same for numa too.
In any case I got rid of this since we can arrive at this information
from other means.
>+ return ncpus;
>+}
>
> virCPUDefPtr
> virCPUDefParseXML(const xmlNodePtr node,
>@@ -289,6 +307,50 @@ virCPUDefParseXML(const xmlNodePtr node,
> def->features[i].policy = policy;
> }
>
>+ if (virXPathNode("./numa[1]", ctxt)) {
>+ VIR_FREE(nodes);
>+ n = virXPathNodeSet("./numa[1]/cell", ctxt,&nodes);
>+ if (n< 0 || n == 0) {
This looks a bit funny, compared to if (n <= 0).
Yes. I first accounted for the error case and later added a check for
zero cells in <numa>. Fixed this.
>+ for (i = 0 ; i< n ; i++) {
>+ char *cpus;
>+ int cpumasklen = VIR_DOMAIN_CPUMASK_LEN;
>+ unsigned long ul;
>+ int ret;
>+
>+ def->cells[i].cellid = i;
>+ cpus = virXMLPropString(nodes[i], "cpus");
>+
>+ if (VIR_ALLOC_N(def->cells[i].cpumask, cpumasklen)< 0)
>+ goto no_memory;
>+
>+ if (virDomainCpuSetParse((const char **)&cpus,
>+ 0, def->cells[i].cpumask,
>+ cpumasklen)< 0)
Does this behave properly if the cpus=... attribute was missing?
This was buggy. Fixed now.
>+ goto error;
>+
>+ ret = virXPathULong("string(./numa[1]/cell/@mems)",
>+ ctxt,&ul);
>+ if (ret< 0) {
>+ virCPUReportError(VIR_ERR_INTERNAL_ERROR,
>+ "%s", _("Missing 'mems' attribute in NUMA
topology"));
>+ goto error;
Especially since you checked for a missing mems= counterpart? And
back to my earlier question of whether both attributes are really
mandatory, or whether just one in isolation can make sense.
As explained above, I made both of them mandatory now.
>+++ b/src/conf/cpu_conf.h
>@@ -67,6 +67,14 @@ struct _virCPUFeatureDef {
> int policy; /* enum virCPUFeaturePolicy */
> };
>
>+typedef struct _virCellDef virCellDef;
>+typedef virCellDef *virCellDefPtr;
>+struct _virCellDef {
>+ int cellid;
>+ char *cpumask; /* CPUs that are part of this node */
>+ unsigned int mem; /* Node memory */
I'd write this as /* Node memory in kB */, to make it clear what
scale is in use.
Done.
I saw the code for parsing the XML, but what about the code for
generating the xml during 'virsh dumpxml' (aka virDomainDefFormat)?
The patch is incomplete without that.
Thanks for pointing this out. Added code for this now.
Regards,
Bharata.