[Libvir] [RFC][PATCH 0/2] Tested NUMA patches for available memory and topology

I have tested the patches on a NUMA and a non-NUMA configuration, and they fundamentally appear to work. The first patch is for accessing available memory on a per-node basis. The second patch is for accessing NUMA node topology.

I've gotten some helpful suggestions about my string parsing code, introducing me to sscanf :-). I've become convinced that there are more elegant ways to do this which would have about the same level of error checking. However, I am out of time if Daniel wants to check this in this week. So I offer what I do have, which functions, but is not elegant. I have been playing with other ways to do this and am not far from finished. So, Daniel, you need to tell me if you want to take this code, and possibly upgrade to something more compact later, or if you'd like to wait for the next revision.

One point to comment on for posterity is that the string returned from xend is not what might be expected. An example:

    printf("string is %s\n", tempstr);

would print

    string is node0:0\n node1:1

So, this means that the '\n' (a single newline character) sent by the xend python code is somehow translated to "\n" (a backslash followed by the letter n). And the trailing '\n' that should be there isn't. Instead there is a '\0'. Looking at the xend code, it would appear this string should look like:

    "node0:0\n node1:1\n"

where these are '\n'.

-- Elizabeth Kon (Beth) IBM Linux Technology Center Open Hypervisor Team email: eak@us.ibm.com
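To make the quirk described above concrete, here is a minimal sketch of an sscanf-based parser that tolerates both a real newline and the literal two-character "\n" as a separator. It is not the code from the patches: the function name parse_node_to_cpu, the fixed-size output arrays, and the assumption that every entry has exactly the form node<N>:<cpu> (as in the example string) are all illustrative.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the patches): parse entries of the form
 * "node<N>:<cpu>" separated by blanks, real newlines, or the literal
 * two-character sequence backslash + 'n'.
 * Returns the number of entries stored, or -1 on malformed input or
 * when more than `max` entries are present. */
static int parse_node_to_cpu(const char *str, int *nodes, int *cpus, int max)
{
    const char *p = str;
    int count = 0;

    assert(str != NULL && nodes != NULL && cpus != NULL);

    while (*p != '\0') {
        int node, cpu, used = 0;

        /* Skip separators, including the literal "\n" seen from xend. */
        if (*p == ' ' || *p == '\t' || *p == '\n') {
            p++;
            continue;
        }
        if (p[0] == '\\' && p[1] == 'n') {
            p += 2;
            continue;
        }

        if (count >= max)
            return -1;

        /* %n stores how many characters this call consumed. */
        if (sscanf(p, "node%d:%d%n", &node, &cpu, &used) != 2 || used <= 0)
            return -1;
        if (node < 0 || cpu < 0)
            return -1;

        nodes[count] = node;
        cpus[count] = cpu;
        count++;
        p += used;
    }
    return count;
}
```

Treating the literal "\n" as just another separator means the same routine would keep working if xend later started sending real newlines.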

On Fri, Sep 28, 2007 at 12:06:24AM -0400, beth kon wrote:
I have tested the patches on a NUMA and a non-NUMA configuration, and they fundamentally appear to work. The first patch is for accessing available memory on a per-node basis.
The second patch is for accessing NUMA node topology. I've gotten some helpful suggestions about my string parsing code, introducing me to sscanf :-). I've become convinced that there are more elegant ways to do this which would have about the same level of error checking. However, I am out of time if Daniel wants to check this in this week. So I offer what I do have, which functions, but is not elegant. I have been playing with other ways to do this and am not far from finished. So, Daniel, you need to tell me if you want to take this code, and possibly upgrade to something more compact later, or if you'd like to wait for the next revision.
At this point I won't be too picky about parsing the xend output for the NUMA topology: as long as it works, is tested, and has no obvious hole, that's good enough. So I have committed the 4 patches to CVS:
- my initial patch
- your 2 patches
- the virsh freecell extension

I had to clean up a few things, for example the warnings raised by Rich. I also added the new call to the exported symbol list in the library, extended the virsh man page, and added the new call to the entry point list support page.

However, I think there are at least a few things still left to be done before pushing this in a new release:
- if possible, get remote operations for the new call
- a bit more testing; for example I found out

    virsh # freecell 0
    0: 64339968 kB
    virsh # freecell
    Total: 64339968 kB
    virsh # freecell 1
    libvir: Xen error : invalid argument in xenHypervisorNodeGetCellsFreeMemory: invalid argument
    virsh # freecell -1
    Total: 64339968 kB
    virsh # freecell -2
    -2: 0 kB

  we should probably see an error in the last 2 cases instead
- isolate as a separate call what is the total sum of free memory available on the Node
- on NUMA boxes, in the capability dump I would like to see the amount of memory available on the cell; see https://www.redhat.com/archives/libvir-list/2007-September/msg00015.html <memory size='2097152'/> in the topology example
- now that the code is in CVS, reorganize a bit, for example move back generic code to xen_unified.c

Anyway it looks like we are in good shape, thanks a lot !

Daniel
-- Red Hat Virtualization group http://redhat.com/virtualization/ Daniel Veillard | virtualization library http://libvirt.org/ veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
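The freecell session above accepts -1 and -2 without complaint. One possible way to reject such input before it reaches the driver is sketched below. This is an illustration only, not the virsh implementation: parse_cell_arg is a hypothetical helper, and the choice to reject every negative value is an assumption about the desired behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not virsh code): parse a user-supplied cell number.
 * Returns 0 and stores the value in *cell on success; returns -1 for an
 * empty string, trailing junk, a negative value, or a value that does
 * not fit in an int. */
static int parse_cell_arg(const char *arg, int *cell)
{
    char *end = NULL;
    long val;

    assert(cell != NULL);

    if (arg == NULL || *arg == '\0')
        return -1;

    errno = 0;
    val = strtol(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0')
        return -1;
    if (val < 0 || val > INT_MAX)
        return -1;

    *cell = (int)val;
    return 0;
}
```

With a check like this, "no cell argument given" would need to be tracked separately (for example with a flag) rather than encoded as a negative number the user can also type.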

Daniel Veillard wrote:
- isolate as a separate call what is the total sum of free memory available on the Node
There is currently no way to get that information from Xen.
-- Elizabeth Kon (Beth) IBM Linux Technology Center Open Hypervisor Team email: eak@us.ibm.com

* Elizabeth Kon <eak@us.ibm.com> [2007-09-28 11:27]:
Daniel Veillard wrote:
- isolate as a separate call what is the total sum of free memory available on the Node
There is currently no way to get that information from Xen.
no, we can always get a total of _free_ memory, we just don't have a call for _total_ ram (ie, free and non-free) -- only what's in the heap (free mem).
-- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx (512) 838-9253 T/L: 678-9253 ryanh@us.ibm.com

Ryan Harper wrote:
no, we can always get a total of _free_ memory, we just don't have a call for _total_ ram (ie, free and non-free) -- only what's in the heap (free mem).
I asked DV about this off-list and he said he actually wanted total, not free. DV please correct me if I misunderstood.
-- Elizabeth Kon (Beth) IBM Linux Technology Center Open Hypervisor Team email: eak@us.ibm.com

* Elizabeth Kon <eak@us.ibm.com> [2007-09-28 12:32]:
I asked DV about this off-list and he said he actually wanted total, not free. DV please correct me if I misunderstood.
Ah, OK - the text as written mentions _free_ - which is why I responded.
-- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx (512) 838-9253 T/L: 678-9253 ryanh@us.ibm.com

On Fri, Sep 28, 2007 at 12:41:21PM -0500, Ryan Harper wrote:
Ah, OK - the text as written mentions _free_ - which is why I responded.
It seems a bit silly to me to have topology information about which CPUs are part of the same Cell (i.e. share the same memory costs) but be unable to find out how much memory is actually local to that cell. Sure, the current free heap on that cell helps to place new jobs, but it's only a temporary view.
Daniel
-- Red Hat Virtualization group http://redhat.com/virtualization/ Daniel Veillard | virtualization library http://libvirt.org/ veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

* Daniel Veillard <veillard@redhat.com> [2007-09-28 12:59]:
It seems a bit silly to me to have topology information about which CPUs are part of the same Cell (i.e. share the same memory costs) but be unable to find out how much memory is actually local to that cell. Sure, the current free heap on that cell helps to place new jobs, but it's only a temporary view.
I don't see how having the total changes anything - we need current free to determine where the next (even first) vm should go.
-- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx (512) 838-9253 T/L: 678-9253 ryanh@us.ibm.com

On Fri, Sep 28, 2007 at 01:08:08PM -0500, Ryan Harper wrote:
I don't see how having the total changes anything - we need current free to determine where the next (even first) vm should go.
While it's not technically necessary, it will help with a UI visualization of the host's allocation state, which IMHO is pretty important because we need good visualization to help users understand this stuff.
BTW, does the Xen model allow for the fact that you can have NUMA cells which only have memory, i.e. no CPUs attached? This is something that's possible in ia64 boxes...
Dan.
-- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

* Daniel P. Berrange <berrange@redhat.com> [2007-09-28 13:54]:
While it's not technically necessary, it will help with a UI visualization of the host's allocation state, which IMHO is pretty important because we need good visualization to help users understand this stuff.
Fair enough. Currently xen doesn't give us this information directly -- the raw SRAT table parsing includes physical addr ranges that could be used to calculate the total size of a numa-node. I might have an old patch that exported the physical ranges for each node as part of the physinfo hcall -- that wasn't accepted by the Xen folks at the time.
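The calculation hinted at here, deriving a node's total size from its physical address ranges, can be sketched as follows. The struct layout, the half-open [start, end) convention, and the function name are assumptions made for illustration; they do not describe Xen's actual SRAT data structures or the old physinfo patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one physical address range and the node owning it.
 * The range is half-open: [start, end). */
struct mem_range {
    int node;
    uint64_t start;
    uint64_t end;
};

/* Sum the sizes, in bytes, of all ranges that belong to `node`.
 * Ranges with end <= start are treated as empty and skipped. */
static uint64_t node_total_bytes(const struct mem_range *ranges, size_t n,
                                 int node)
{
    uint64_t total = 0;
    size_t i;

    assert(ranges != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (ranges[i].node == node && ranges[i].end > ranges[i].start)
            total += ranges[i].end - ranges[i].start;
    }
    return total;
}
```

A node can own several disjoint ranges, which is why the sizes are summed instead of taking the highest end address minus the lowest start address.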
BTW, does the Xen model allow for the fact that you can have NUMA cells which only have memory, i.e. no CPUs attached? This is something that's possible in ia64 boxes...
AFAIK, yes. The Xen NUMA parsing and data structures are based upon Linux NUMA which as I understand it handles the above case.
-- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx (512) 838-9253 T/L: 678-9253 ryanh@us.ibm.com

On Fri, Sep 28, 2007 at 10:46:04AM -0400, Daniel Veillard wrote:
On Fri, Sep 28, 2007 at 12:06:24AM -0400, beth kon wrote: However I think there is at least a few things still left to be done before pushing this in a new release: - if possible get remote operations for the new call
Well, no progress on that; I assume the release will go out without it.
- isolate as a separate call what is the total sum of free memory available on the Node
Done: virNodeGetFreeMemory(), but it's plugged in only for Xen; there is no remote support (where it would really be useful) and no QEMU back-end.
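As a way to picture how a node-wide figure relates to the per-cell values shown by freecell, the sketch below simply adds up an array of per-cell free amounts. It is an assumption-laden illustration: total_free is a hypothetical helper, and the thread does not say how virNodeGetFreeMemory() obtains its value on Xen.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: add up per-cell free memory values.
 * The unit is whatever the caller's array uses. */
static unsigned long long total_free(const unsigned long long *cell_free,
                                     size_t ncells)
{
    unsigned long long total = 0;
    size_t i;

    assert(cell_free != NULL || ncells == 0);

    for (i = 0; i < ncells; i++)
        total += cell_free[i];
    return total;
}
```

On the single-cell box from the earlier session, the per-cell value and the total coincide, which matches the identical numbers printed for "freecell 0" and plain "freecell".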
- a bit more testing for example I found out
    virsh # freecell -1
    Total: 64339968 kB
    virsh # freecell -2
    -2: 0 kB
With the previous change this could be checked.
- on NUMA boxes in the capability dump I would like to see the amount of memory available on the cell see https://www.redhat.com/archives/libvir-list/2007-September/msg00015.html <memory size='2097152'/> in the topology example.
Apparently this is not available (the installed memory per cell, not what is currently free).
- now that the code is in CVS reorganize a bit for example move back generic code to xen_unified.c
I didn't do that, I was afraid of breaking stuff I could not test at this stage.
Daniel
-- Red Hat Virtualization group http://redhat.com/virtualization/ Daniel Veillard | virtualization library http://libvirt.org/ veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

On Sun, Sep 30, 2007 at 09:09:31AM -0400, Daniel Veillard wrote:
On Fri, Sep 28, 2007 at 10:46:04AM -0400, Daniel Veillard wrote:
On Fri, Sep 28, 2007 at 12:06:24AM -0400, beth kon wrote: However I think there is at least a few things still left to be done before pushing this in a new release: - if possible get remote operations for the new call
Well, no progress on that; I assume the release will go out without it.
- isolate as a separate call what is the total sum of free memory available on the Node
Done: virNodeGetFreeMemory(), but it's plugged in only for Xen; there is no remote support (where it would really be useful) and no QEMU back-end.
Unless someone else was already planning to look at QEMU/Remote stuff I'll take a look it starting this
- a bit more testing for example I found out
    virsh # freecell -1
    Total: 64339968 kB
    virsh # freecell -2
    -2: 0 kB
With the previous change this could be checked.
- on NUMA boxes in the capability dump I would like to see the amount of memory available on the cell see https://www.redhat.com/archives/libvir-list/2007-September/msg00015.html <memory size='2097152'/> in the topology example.
Apparently this is not available (the installed memory per cell, not what is currently free).
It is possible on Linux (i.e. the QEMU/KVM driver will have it) and there are patches for Xen that we just need to push upstream. So I don't see a problem noting that this will be added in the future.
Dan.
-- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|