Re: [Libvir] Extending libvirt to probe NUMA topology

6 Sep 2007


On Thu, Sep 06, 2007 at 03:40:23PM +0100, Richard W.M. Jones wrote:
...
Daniel Veillard wrote:
...
1) Provide a function describing the topology as an XML instance:
char *	virNodeGetTopology(virConnectPtr conn);
...
which would return an XML instance as in virConnectGetCapabilities. I
toyed with the idea of extending virConnectGetCapabilities() to add a
topology section in case of NUMA support at the hypervisor level, but
it looked to me as though the two might be used at different times,
so keeping them separate might be a bit cleaner, but I could be
convinced otherwise.
I'd definitely prefer to extend virConnectGetCapabilities XML.  It 
avoids changing the remote driver and language bindings, and really 
callers only need to pull capabilities once per connection.
Yeah, I understand that concern; it simplifies a lot of stuff internally,
but the goal at the library level is to simplify the user code even if that
means a more complex implementation. However, if people think they don't
need a separate call then I'm really fine with this.
...
...
---------------------------------
<topology>
 <cells num='2'>
   <cell id='0'>
     <cpus num='2'>
       <cpu id='0'/>
       <cpu id='1'/>
     </cpus>
     <memory size='2097152'/>
   </cell>
   <cell id='1'>
     <cpus num='2'>
       <cpu id='2'/>
       <cpu id='3'/>
     </cpus>
     <memory size='2097152'/>
   </cell>
 </cells>
</topology>
---------------------------------
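As a rough illustration of how a client might hold this topology once the XML has been parsed, here is a minimal C sketch. The struct, the array, and the helper cell_of_cpu() are hypothetical names made up for this example and are not part of libvirt; the table is simply the two-cell example above typed in by hand, and the unit of the memory size is left as it appears in the XML, since the proposal does not state it.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CPUS_PER_CELL 8   /* arbitrary bound for this sketch */

/* Hypothetical in-memory mirror of one <cell> element. */
struct numa_cell {
    int id;                        /* <cell id='...'> */
    int ncpus;                     /* <cpus num='...'> */
    int cpus[MAX_CPUS_PER_CELL];   /* <cpu id='...'/> entries */
    unsigned long memory;          /* <memory size='...'/>, unit as in the XML */
};

/* The two-cell example from the XML above, written out by hand. */
static const struct numa_cell example_cells[] = {
    { 0, 2, { 0, 1 }, 2097152UL },
    { 1, 2, { 2, 3 }, 2097152UL },
};
#define EXAMPLE_NCELLS (sizeof(example_cells) / sizeof(example_cells[0]))

/* Return the id of the cell that contains the given CPU, or -1 if none. */
static int cell_of_cpu(const struct numa_cell *cells, size_t ncells, int cpu)
{
    assert(cells != NULL || ncells == 0);
    for (size_t i = 0; i < ncells; i++)
        for (int j = 0; j < cells[i].ncpus; j++)
            if (cells[i].cpus[j] == cpu)
                return cells[i].id;
    return -1;
}
```

With this table, cell_of_cpu(example_cells, EXAMPLE_NCELLS, 3) yields cell 1, which is the kind of lookup a placement tool would need before pinning a guest's vCPUs.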
A few things to note:
  - the <cells> element lists the top sibling cells
Not <nodes>?
A Node in libvirt terminology is a single physical machine; cell is
a well accepted term, I think, for a sub-node within a NUMA box.
...
...
- the <cell> element describes, as children, the available resources,
    such as the list of CPUs and the size of the local memory; that could
    be extended with disk descriptions too
    <disk dev='/dev/sdb'/>
    and possibly other special devices (no idea which ATM).
- in case of deeper hierarchical topology one may need to be able to
    name sub-cells and the format could be extended for example as
    <cells num='2'>
      <cells num='2'>
        <cell id='1'>
          ...
        </cell>
        <cell id='2'>
          ...
        </cell>
      </cells>
      <cells num='2'>
        <cell id='3'>
          ...
        </cell>
        <cell id='4'>
          ...
        </cell>
      </cells>
    </cells>
    But that can be discussed/changed when the need arises :-)
Especially note that 4 (or more) socket AMDs have a topology like this, 
with two different penalties for reaching nodes which are one and two 
hops away.  Do we have a way to describe the penalties along different 
paths?
As hinted in my mail, I think the access costs will have to be added
separately, probably as an array map, unless people come up with a more
intelligent way of exposing that information.
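To make the "array map" idea concrete, here is a minimal C sketch of a per-pair cost table. Everything in it is an assumption for illustration: the four cells are taken to be wired as a ring (0-1-2-3-0), the entries are plain hop counts rather than measured penalties, and cell_hops, access_cost() and nearest_cell() are hypothetical names, not libvirt API.

```c
#include <assert.h>

#define NCELLS 4

/* Hypothetical hop counts between cells, assuming a ring 0-1-2-3-0.
 * Row = source cell, column = target cell; 0 means local access. */
static const int cell_hops[NCELLS][NCELLS] = {
    { 0, 1, 2, 1 },
    { 1, 0, 1, 2 },
    { 2, 1, 0, 1 },
    { 1, 2, 1, 0 },
};

/* Look up the cost of reaching memory on cell 'to' from cell 'from'. */
static int access_cost(int from, int to)
{
    assert(from >= 0 && from < NCELLS && to >= 0 && to < NCELLS);
    return cell_hops[from][to];
}

/* Return the cheapest remote cell as seen from 'from'.
 * Ties are broken in favour of the lowest cell id. */
static int nearest_cell(int from)
{
    int best = -1;
    for (int c = 0; c < NCELLS; c++) {
        if (c == from)
            continue;
        if (best < 0 || access_cost(from, c) < access_cost(from, best))
            best = c;
    }
    return best;
}
```

A flat matrix like this can express the one-hop and two-hop cases mentioned above without requiring the XML nesting to mirror the interconnect, which is one reason to keep costs separate from the cell hierarchy.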
...
...
2) Function to get the free memory of a given cell:
unsigned long virNodeGetCellFreeMemory(virConnectPtr conn, int cell);
That's relatively simple and would match the request from the initial mail,
but I'm wondering a bit. If the program tries to do a best placement it
will usually run that request for a number of cells, no? Maybe a call
returning the memory amounts for a range of cells would be more
appropriate.
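A minimal C sketch of what such a range query could look like from the caller's side follows. It is only an illustration under stated assumptions: get_cells_free_memory() and pick_cell() are hypothetical names, the function reads from a made-up static table instead of asking a hypervisor, and the connection argument is left out so that the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Made-up per-cell free memory figures standing in for the hypervisor. */
static const unsigned long fake_free[] = {
    512000UL, 1536000UL, 256000UL, 1024000UL
};
#define FAKE_NCELLS ((int)(sizeof(fake_free) / sizeof(fake_free[0])))

/* Hypothetical range query: fill 'out' with the free memory of up to
 * 'max_cells' cells starting at 'start_cell'.
 * Returns the number of entries filled, or -1 on bad arguments. */
static int get_cells_free_memory(unsigned long *out, int start_cell,
                                 int max_cells)
{
    if (out == NULL || start_cell < 0 || start_cell >= FAKE_NCELLS ||
        max_cells <= 0)
        return -1;

    int n = 0;
    while (n < max_cells && start_cell + n < FAKE_NCELLS) {
        out[n] = fake_free[start_cell + n];
        n++;
    }
    return n;
}

/* Simplest possible placement: one call, then pick the cell with the
 * most free memory. Returns the cell id, or -1 on failure. */
static int pick_cell(void)
{
    unsigned long free_mem[8];
    int n = get_cells_free_memory(free_mem, 0, 8);
    if (n <= 0)
        return -1;
    assert(n <= 8);

    int best = 0;
    for (int i = 1; i < n; i++)
        if (free_mem[i] > free_mem[best])
            best = i;
    return best;
}
```

The point of the range form is that a placement routine like pick_cell() needs a single call rather than one call per cell, which matters most when each call is a round trip to a remote daemon.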
Yes, I guess they'd want to get the free memory for all nodes.  But IBM 
will have a better idea about this.
Well I'm looking for feedback :-)

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard@redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/