Daniel Veillard wrote:
1) Provide a function describing the topology as an XML instance:
char * virNodeGetTopology(virConnectPtr conn);
which would return an XML instance as in virConnectGetCapabilities. I
toyed with the idea of extending virConnectGetCapabilities() to add a
topology section in case of NUMA support at the hypervisor level, but
it was looking to me that the two might be used at different times
and separating both might be a bit cleaner, but I could be convinced
otherwise.
I'd definitely prefer to extend virConnectGetCapabilities XML. It
avoids changing the remote driver and language bindings, and really
callers only need to pull capabilities once per connection.
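
As a rough sketch of what that would mean for a caller: fetch the
capabilities string once, then look for a topology section in it. The
helper name find_topology is hypothetical, and the plain substring
search stands in for a real XML parser, so treat this as an illustration
of the idea rather than how libvirt or its clients would do it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Locate a <topology>...</topology> section inside a capabilities XML
 * string. Returns a pointer to the opening tag and stores the section
 * length in *len, or returns NULL when no such section is present
 * (for example when the hypervisor has no NUMA support).
 * Hypothetical helper: substring search only, not real XML parsing.
 */
const char *find_topology(const char *caps_xml, size_t *len)
{
    static const char open_tag[] = "<topology>";
    static const char close_tag[] = "</topology>";
    const char *start, *end;

    assert(caps_xml != NULL && len != NULL);

    start = strstr(caps_xml, open_tag);
    if (start == NULL)
        return NULL;
    end = strstr(start, close_tag);
    if (end == NULL)
        return NULL;

    *len = (size_t)(end - start) + strlen(close_tag);
    return start;
}
```

A caller would run this once on the string it got back for the
connection and keep the result around, rather than making a second
round trip for topology alone.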
---------------------------------
<topology>
  <cells num='2'>
    <cell id='0'>
      <cpus num='2'>
        <cpu id='0'/>
        <cpu id='1'/>
      </cpus>
      <memory size='2097152'/>
    </cell>
    <cell id='1'>
      <cpus num='2'>
        <cpu id='2'/>
        <cpu id='3'/>
      </cpus>
      <memory size='2097152'/>
    </cell>
  </cells>
</topology>
---------------------------------
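
To make the shape of that XML concrete, here is a minimal sketch of how
a driver might hold the same data in memory and print it in the proposed
format. The struct numa_cell, the MAX_CPUS_PER_CELL limit and
format_topology are all made up for illustration, and the unit of the
memory size is left as whatever the size attribute ends up meaning.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_CPUS_PER_CELL 8   /* arbitrary limit for this sketch */

/* One cell: its id, the CPUs local to it, and its local memory size. */
struct numa_cell {
    int id;
    int ncpus;
    int cpus[MAX_CPUS_PER_CELL];
    unsigned long mem_size;
};

/* Append formatted text at buf + *off; 0 on success, -1 if it does not fit. */
static int append(char *buf, size_t buflen, size_t *off, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf + *off, buflen - *off, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= buflen - *off)
        return -1;
    *off += (size_t)n;
    return 0;
}

/* Write the cells as a <topology> section; returns length, or -1 on overflow. */
int format_topology(const struct numa_cell *cells, int ncells,
                    char *buf, size_t buflen)
{
    size_t off = 0;
    int i, j;

    assert(cells != NULL && buf != NULL && buflen > 0 && ncells >= 0);

    if (append(buf, buflen, &off,
               "<topology>\n  <cells num='%d'>\n", ncells) < 0)
        return -1;
    for (i = 0; i < ncells; i++) {
        const struct numa_cell *c = &cells[i];

        assert(c->ncpus >= 0 && c->ncpus <= MAX_CPUS_PER_CELL);
        if (append(buf, buflen, &off,
                   "    <cell id='%d'>\n      <cpus num='%d'>\n",
                   c->id, c->ncpus) < 0)
            return -1;
        for (j = 0; j < c->ncpus; j++)
            if (append(buf, buflen, &off,
                       "        <cpu id='%d'/>\n", c->cpus[j]) < 0)
                return -1;
        if (append(buf, buflen, &off,
                   "      </cpus>\n      <memory size='%lu'/>\n    </cell>\n",
                   c->mem_size) < 0)
            return -1;
    }
    if (append(buf, buflen, &off, "  </cells>\n</topology>\n") < 0)
        return -1;
    return (int)off;
}
```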
A few things to note:
- the <cells> element lists the top-level sibling cells
Not <nodes>?
- the <cell> element describes, as children, the resources available,
like the list of CPUs and the size of the local memory; that could
be extended by disk descriptions too
<disk dev='/dev/sdb'/>
and possibly other special devices (no idea what ATM).
- in case of deeper hierarchical topology one may need to be able to
name sub-cells and the format could be extended for example as
<cells num='2'>
  <cells num='2'>
    <cell id='1'>
      ...
    </cell>
    <cell id='2'>
      ...
    </cell>
  </cells>
  <cells num='2'>
    <cell id='3'>
      ...
    </cell>
    <cell id='4'>
      ...
    </cell>
  </cells>
</cells>
But that can be discussed/changed when the need arises :-)
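
For what it's worth, a client reading the nested form would probably
end up with a small tree. A minimal sketch, assuming a hypothetical
struct topo_node where a group stands for <cells> and a leaf for
<cell>; none of these names come from the proposal itself:

```c
#include <stddef.h>

/*
 * A node in the hierarchy: either a <cells> group holding children,
 * or a leaf <cell>. Hypothetical layout for illustration.
 */
struct topo_node {
    int id;                            /* meaningful for leaf cells only */
    int is_group;                      /* 1 for <cells>, 0 for <cell> */
    size_t nchildren;                  /* used when is_group is set */
    const struct topo_node *children;
};

/* Count the leaf cells below a node by walking the tree recursively. */
size_t count_leaf_cells(const struct topo_node *node)
{
    size_t i, total = 0;

    if (node == NULL)
        return 0;
    if (!node->is_group)
        return 1;
    for (i = 0; i < node->nchildren; i++)
        total += count_leaf_cells(&node->children[i]);
    return total;
}
```

With the two-level example above (two groups of two cells each), the
walk finds four leaf cells, which is what a flat consumer would want
to iterate over.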
Especially note that 4 (or more) socket AMDs have a topology like this,
with two different penalties for reaching nodes which are one and two
hops away. Do we have a way to describe the penalties along different
paths?
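
One possible way to express this, purely as a sketch: a per-pair table
of hop counts. The numbers below assume four sockets linked in a
square (0-1, 0-2, 1-3, 2-3), which is only an example layout, and the
names hops and cell_distance are made up here.

```c
#define NCELLS 4

/*
 * Hypothetical hop counts between four cells linked in a square.
 * Diagonally opposite cells (0 and 3, 1 and 2) are two hops apart.
 */
static const int hops[NCELLS][NCELLS] = {
    { 0, 1, 1, 2 },
    { 1, 0, 2, 1 },
    { 1, 2, 0, 1 },
    { 2, 1, 1, 0 },
};

/* Return the hop count between two cells, or -1 for an invalid cell id. */
int cell_distance(int from, int to)
{
    if (from < 0 || from >= NCELLS || to < 0 || to >= NCELLS)
        return -1;
    return hops[from][to];
}
```

A nested <cells> hierarchy alone cannot say this, since in the square
every cell has two near neighbours and one far one, and no grouping
into pairs captures both near links.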
2) Function to get the free memory of a given cell:
unsigned long virNodeGetCellFreeMemory(virConnectPtr conn, int cell);
That's relatively simple and would match the request from the initial mail,
but I'm wondering a bit. If the program tries to do a best placement it
will usually run that request for a number of cells, no? Maybe a call
returning the memory amounts for a range of cells would be more appropriate.
Yes, I guess they'd want to get the free memory for all nodes. But IBM
will have a better idea about this.
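
To show what the range variant buys a placement tool, here is a small
sketch. It does not call libvirt; get_cells_free_memory reads from a
plain array standing in for what the hypervisor would report, and both
function names and their parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy the free memory of up to max_cells cells, starting at start_cell,
 * from table (ncells entries) into out. Returns the number of entries
 * filled, or -1 if the range is invalid. The table is a stand-in for
 * the hypervisor; a real call would take a connection instead.
 */
int get_cells_free_memory(const unsigned long *table, int ncells,
                          int start_cell, int max_cells,
                          unsigned long *out)
{
    int i, n;

    assert(table != NULL && out != NULL);

    if (start_cell < 0 || start_cell >= ncells || max_cells <= 0)
        return -1;
    n = ncells - start_cell;
    if (n > max_cells)
        n = max_cells;
    for (i = 0; i < n; i++)
        out[i] = table[start_cell + i];
    return n;
}

/*
 * Given n free-memory values for cells first_id, first_id + 1, ...,
 * return the id of the cell with the most free memory, or -1 if n <= 0.
 * Ties go to the lowest cell id.
 */
int cell_with_most_free(const unsigned long *free_mem, int n, int first_id)
{
    int i, best = 0;

    if (n <= 0)
        return -1;
    for (i = 1; i < n; i++)
        if (free_mem[i] > free_mem[best])
            best = i;
    return first_id + best;
}
```

With the single-cell call the same placement decision needs one request
per cell, which adds up over the remote driver.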
Rich.
--
Emerging Technologies, Red Hat -
http://et.redhat.com/~rjones/