[Libvir] RFC PATCH - Initial NodeGetCellsFreeMemory patch

Here is my first pass at a patch for accessing the available memory associated with each NUMA cell through libvirt. The initial framework patch provided by Daniel Veillard is a prereq of this patch: https://www.redhat.com/archives/libvir-list/2007-September/msg00069.html

I have not yet tested this. I'm just distributing it for comments.

A few comments/questions:

1) I think I got the versioning stuff right, but that deserves a close look.

2) In Daniel's patch, libvirt.h and libvirt.h.in were identical. I assumed the patch would be applied before running autogen.sh, and should only contain changes in libvirt.h.in, so that's how I did mine. Let me know if I misunderstand something here.

3) I had to put #ifdef PROXY around calls to virXenErrorFunc in xenHypervisorNodeGetCellsFreeMemory to get this to build. I haven't figured out how the proxy code works, so I am not sure if this is the right approach.

--
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: eak@us.ibm.com

On Thu, Sep 20, 2007 at 05:10:49PM -0400, beth kon wrote:
Here is my first pass at a patch for accessing the available memory associated with each NUMA cell through libvirt.
Thanks !
The initial framework patch provided by Daniel Veillard is a prereq of this patch: https://www.redhat.com/archives/libvir-list/2007-September/msg00069.html
That applied fine to my tree using CVS HEAD + the patch.
I have not yet tested this. I'm just distributing for comments.
A few comments/questions: 1) I think I got the versioning stuff right but that deserves a close look.
I looked at it; it looks fine, but it needs testing for the differentiation between 3.1, which doesn't have the feature, and current xen-unstable, which does.
2) In Daniel's patch, libvirt.h and libvirt.h.in were identical. I
nearly identical, yes
assumed the patch would be applied before running autogen.sh, and should only contain changes in libvirt.h.in, so that's how I did mine. Let me know if I misunderstand something here.
That's right, I just provided both as that's what CVS diff gave after autogen, no problem here.
3) I had to put #ifdef PROXY around calls to virXenErrorFunc in xenHypervisorNodeGetCellsFreeMemory to get this to build. I haven't figured out how the proxy code works so am not sure if this is the right approach.
It compiles only a subset of the xen_internal.c code. Don't worry too much about it; the proxy code's fate is to disappear at some point.
 int virNodeGetCellsFreeMemory(virConnectPtr conn,
-                              unsigned long *freeMems,
-                              int nbCells);
+                              long long *freeMems,
+                              int startCell,
+                              int maxCells);
okay, so switch to long long and add a start cell number, perfect.
 /**
  * virNodeGetCellFreeMemory:
+ * @conn: pointer to the hypervisor connection
- * @freeMems: pointer to the array of unsigned long
- * @nbCells: number of entries available in freeMems
+ * @freeMems: pointer to the array of long long
+ * @startCell: index of first cell to return freeMems info on (0 to
+ *             maxCells - 1).
+ * @maxCells: number of entries available in freeMems (-1 for sum of
+ *            free memory across all cells, returned in freeMems[0])
  *
- * This call allows to ask the amount of free memory in each NUMA cell.
+ * This call is a request for the amount of free memory, either in the whole
+ * node, or in each NUMA cell.
  * The @freeMems array must be allocated by the caller and will be filled
  * with the amounts of free memory in kilobytes for each cell starting
- * from cell #0 and up to @nbCells -1 or the number of cell in the Node
+ * from cell #0 and up to @maxCells -1 or the number of cells in the Node
  * (which can be found using virNodeGetInfo() see the nodes entry in the
  * structure).
+ * If maxCells == -1, the free memory will be summed across all cells and
+ * returned in freeMems[0].
Hum, that -1 handling is a bit surprising, but apparently that's what the Xen hypercall does. It's a bit convoluted as an API for getting the full free memory on a Node; maybe this should be separated out as a different API, since people looking for this information may not think to look at NUMA-specific entry points. We don't currently have a simple API to get the free memory on a Node; maybe that should be added.
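To make the two modes concrete, here is a hypothetical caller-side sketch assuming the prototype proposed in this patch; the buffer size of 16 is purely illustrative.

    /* Hypothetical caller-side sketch of the two modes under discussion;
     * assumes the prototype proposed in this patch. */
    long long total;
    long long perCell[16];          /* illustrative size only */
    int filled;

    /* maxCells == -1: one value, the free memory summed across all cells,
     * returned in the first array entry */
    if (virNodeGetCellsFreeMemory(conn, &total, 0, -1) < 0)
        return -1;

    /* per-cell mode: fill up to 16 entries starting at cell 0 */
    filled = virNodeGetCellsFreeMemory(conn, perCell, 0, 16);
    if (filled < 0)
        return -1;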
+struct xen_v2s4_availheap {
+    uint32_t min_bitwidth;  /* Smallest address width (zero if don't care). */
+    uint32_t max_bitwidth;  /* Largest address width (zero if don't care). */
I'm a bit puzzled by those 2 fields; I still wonder what they are about :-)
+    int32_t  node;          /* NUMA node (-1 for sum across all nodes). */
+    uint64_t avail_bytes;   /* Bytes available in the specified region. */
+};
+
+typedef struct xen_v2s4_availheap xen_v2s4_availheap;
+

[...]

@@ -2012,7 +2028,7 @@ xenHypervisorInit(void)
 #endif
         return(-1);
     }
-    /* Currently consider RHEL5.0 Fedora7 and xen-unstable */
+    /* Currently consider RHEL5.0 Fedora7, xen-3.1, and xen-unstable */
     sys_interface_version = 2; /* XEN_SYSCTL_INTERFACE_VERSION */
     if (virXen_getdomaininfo(fd, 0, &info) == 1) {
         /* RHEL 5.0 */
@@ -2035,7 +2051,7 @@ xenHypervisorInit(void)
     sys_interface_version = 3; /* XEN_SYSCTL_INTERFACE_VERSION */
     if (virXen_getdomaininfo(fd, 0, &info) == 1) {
-        /* xen-unstable */
+        /* xen-3.1 */
         dom_interface_version = 5; /* XEN_DOMCTL_INTERFACE_VERSION */
         if (virXen_getvcpusinfo(fd, 0, 0, ipt, NULL, 0) == 0){
 #ifdef DEBUG
@@ -2045,6 +2061,18 @@ xenHypervisorInit(void)
         }
     }
+    sys_interface_version = 4; /* XEN_SYSCTL_INTERFACE_VERSION */
+    if (virXen_getdomaininfo(fd, 0, &info) == 1) {
+        /* xen-unstable */
+        dom_interface_version = 5; /* XEN_DOMCTL_INTERFACE_VERSION */
+        if (virXen_getvcpusinfo(fd, 0, 0, ipt, NULL, 0) == 0){
+#ifdef DEBUG
+            fprintf(stderr, "Using hypervisor call v2, sys ver4 dom ver5\n");
+#endif
+            goto done;
+        }
+    }
+
     hypervisor_version = 1;
     sys_interface_version = -1;
     if (virXen_getdomaininfo(fd, 0, &info) == 1) {
That looks right, I agree, but it needs testing, for example on a normal F-7 which has 3.1 and on one with an updated xen.
+xenHypervisorNodeGetCellsFreeMemory(virConnectPtr conn, long long *freeMems,
+                                    int startCell, int maxCells)
 {
-    if ((conn == NULL) || (freeMems == NULL) || (nbCells < 0))
-        return -1;
+    xen_op_v2_sys op_sys;
+    int i, ret, nbCells;
+    virNodeInfo nodeInfo;
+    virNodeInfoPtr nodeInfoPtr = &nodeInfo;
+    xenUnifiedPrivatePtr priv;
+
+    if ((conn == NULL) || (freeMems == NULL) || (maxCells < 0))
+        return -1;
Hum, actually, that would catch the (maxCells == -1) case, so that won't work, and it won't catch maxCells == 0, which could lead to a segfault. Maybe (maxCells == 0) || (maxCells < -1) should be used instead.
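A minimal sketch of the check Daniel suggests, as a fragment in the style of the driver function being reviewed; it still lets the maxCells == -1 summed case through while rejecting a zero-length buffer.

    /* Sketch of the suggested test: allow maxCells == -1 (summed mode) and
     * any positive count, reject 0 and anything below -1. */
    if ((conn == NULL) || (freeMems == NULL) ||
        (maxCells == 0) || (maxCells < -1))
        return -1;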
+    /* get actual number of cells */
+    if (xenDaemonNodeGetInfo(conn, nodeInfoPtr)) {
Hum, since the number of cells is static, maybe this should be stored permanently in a variable on init. The xenDaemon RPC will be orders of magnitude more expensive than the direct hypercall below.
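One possible shape for that caching, sketched here with a hypothetical cached field; neither the nbNodeCells name nor its presence in the private structure is in the current code.

    /* Hypothetical sketch: cache the (static) cell count after the first
     * lookup so later calls skip the expensive xend RPC.  The nbNodeCells
     * field is illustrative, not part of the existing struct. */
    if (priv->nbNodeCells <= 0) {
        virNodeInfo nodeInfo;

        if (xenDaemonNodeGetInfo(conn, &nodeInfo))
            return -1;
        priv->nbNodeCells = nodeInfo.nodes;
    }
    nbCells = priv->nbNodeCells;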
+        virXenError(VIR_ERR_XEN_CALL, " cannot determine actual number of cells", 0);
+        return -1;
+    }
+    nbCells = nodeInfoPtr->nodes;
+
+    if (maxCells > nbCells)
+        maxCells = nbCells;
Hum ... maxCells is the number of entries in the array. I'm afraid you will need two counters, or I misunderstood the way the API works (possible :-). I would fill in freeMems[] starting at index 0 for startCell, and going up. That feels more coherent than leaving uninitialized values at the start of the array:

    for (i = startCell, j = 0; (i < nbCells) && (j < maxCells); i++, j++) {
        op_sys.u.availheap.node = i;
        ret = xenHypervisorDoV2Sys(priv->handle, &op_sys);
        if (ret < 0)
            return(-1);
        freeMems[j] = op_sys.u.availheap.avail_bytes;
    }
    return (j);
+
+    if (startCell > nbCells - 1)
+        return -1;
+
+    for (i = startCell; i < maxCells; i++) {
+        op_sys.u.availheap.node = i;
+        ret = xenHypervisorDoV2Sys(priv->handle, &op_sys);
+        if (ret < 0)
+            return(-1);
+        freeMems[i] = op_sys.u.availheap.avail_bytes;
+    }
+    return (maxCells - startCell);
 }
A few things to check, but that looks like a good start :-) Thanks a lot!

Daniel

--
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard@redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/

Daniel Veillard wrote:
+ * If maxCells == -1, the free memory will be summed across all cells and
+ * returned in freeMems[0].
Hum, that -1 handling is a bit surprising, but apparently that's what the Xen hypercall does. It's a bit convoluted as an API for getting the full free memory on a Node; maybe this should be separated out as a different API, since people looking for this information may not think to look at NUMA-specific entry points. We don't currently have a simple API to get the free memory on a Node; maybe that should be added.
Having the separate semantics is necessary because it seems like this can be done in a single hypercall if the caller only needs to know the total free memory.

Rich.

--
Emerging Technologies, Red Hat - http://et.redhat.com/~rjones/
Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street,
Windsor, Berkshire, SL4 1TE, United Kingdom.
Registered in England and Wales under Company Registration No. 03798903
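To illustrate the single-hypercall path mentioned above, here is a driver-side sketch based on the availheap structure quoted earlier; it is a fragment in the style of the function under review (op_sys, priv and freeMems come from that context), and the sysctl command constant name is an assumption, not code from the patch.

    /* Assumed sketch: one availheap hypercall with node == -1 returns the
     * free memory summed across all nodes.  XEN_V2S4_GETAVAILHEAP is a
     * placeholder name for the sysctl command constant. */
    memset(&op_sys, 0, sizeof(op_sys));
    op_sys.cmd = XEN_V2S4_GETAVAILHEAP;      /* hypothetical constant */
    op_sys.u.availheap.min_bitwidth = 0;     /* don't care */
    op_sys.u.availheap.max_bitwidth = 0;     /* don't care */
    op_sys.u.availheap.node = -1;            /* sum across all nodes */
    if (xenHypervisorDoV2Sys(priv->handle, &op_sys) < 0)
        return -1;
    freeMems[0] = op_sys.u.availheap.avail_bytes;
    return 1;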

beth kon wrote:
Here is my first pass at a patch for accessing the available memory associated with each NUMA cell through libvirt.
The initial framework patch provided by Daniel Veillard is a prereq of this patch: https://www.redhat.com/archives/libvir-list/2007-September/msg00069.html
I have not yet tested this. I'm just distributing for comments.
A few comments/questions:

1) I think I got the versioning stuff right, but that deserves a close look.

2) In Daniel's patch, libvirt.h and libvirt.h.in were identical. I assumed the patch would be applied before running autogen.sh, and should only contain changes in libvirt.h.in, so that's how I did mine. Let me know if I misunderstand something here.
That's fine, although be aware that libvirt.h only gets recreated when you run ./configure explicitly. I think there should be a dependency in the Makefile so that it can be recreated from config.status, but that's another bug, so don't worry about it for this patch.
3) I had to put #ifdef PROXY around calls to virXenErrorFunc in xenHypervisorNodeGetCellsFreeMemory to get this to build. I haven't figured out how the proxy code works so am not sure if this is the right approach.
As general background, the Xen proxy was an earlier attempt to allow querying Xen without having to be root. The Xen proxy was a setuid binary which contained only "safe" read-only code. It was built from the same sources as libvirt, but with any code thought to be unsafe disabled (#ifndef PROXY ... #endif). In any case the proxy ability has been completely superseded by a combination of remote access and PolicyKit. In fact I think Dan was planning to remove the Xen proxy entirely.

Comments on the code itself:

 int virNodeGetCellsFreeMemory(virConnectPtr conn,
-                              unsigned long *freeMems,
-                              int nbCells);
+                              long long *freeMems,
+                              int startCell,
+                              int maxCells);

I guess Daniel Veillard can comment on this change. Did we decide that this wouldn't be useful on realistic topologies because interesting cell ranges would never be contiguous?

+ * @freeMems: pointer to the array of long long
+ * @startCell: index of first cell to return freeMems info on (0 to
+ *             maxCells - 1).
+ * @maxCells: number of entries available in freeMems (-1 for sum of
+ *            free memory across all cells, returned in freeMems[0])

I'm slightly confused by this documentation. If maxCells > 0, does this only fill a subset of the array freeMems?

-    if ((freeMems == NULL) || (nbCells <= 0)) {
+    if ((freeMems == NULL) || (maxCells <= 0) || (startCell > maxCells - 1)) {
         virLibConnError(conn, VIR_ERR_INVALID_ARG, __FUNCTION__);
         return (-1);
     }

But we allow maxCells == -1, so this test is wrong.

+    if ((conn == NULL) || (freeMems == NULL) || (maxCells < 0))
+        return -1;

You should always call the error function before returning an error. Unfortunately the error handling in libvirt is over-engineered in the extreme and, what is more, in order to get around this over-engineering we have defined "simpler" wrapper error functions in different / incompatible ways in each file. In this particular file (xen_internal.c) the "simpler" error function is called virXenErrorFunc.

+    /* get actual number of cells */
+    if (xenDaemonNodeGetInfo(conn, nodeInfoPtr)) {
+        virXenError(VIR_ERR_XEN_CALL, " cannot determine actual number of cells", 0);
+        return -1;
+    }
+    nbCells = nodeInfoPtr->nodes;

Is there a way for me to find out the total number of cells (nbCells)?

+    if (maxCells > nbCells)
+        maxCells = nbCells;

I would prefer to return an error here, but it's not too bad because the return value tells us how many cells were filled.

+    if (startCell > nbCells - 1)
+        return -1;

Surely, the condition should be `startCell + maxCells > nbCells'?

Rich.

--
Emerging Technologies, Red Hat - http://et.redhat.com/~rjones/
Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street,
Windsor, Berkshire, SL4 1TE, United Kingdom.
Registered in England and Wales under Company Registration No. 03798903
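A sketch of how the libvirt.c entry-point test quoted above could be adjusted so that the maxCells == -1 case is still accepted; this is an illustration of the point, not the final patch.

    /* Sketch: accept maxCells == -1 (summed mode) and any positive count,
     * reject 0, anything below -1, and a negative startCell. */
    if ((freeMems == NULL) || (maxCells == 0) || (maxCells < -1) ||
        (startCell < 0)) {
        virLibConnError(conn, VIR_ERR_INVALID_ARG, __FUNCTION__);
        return (-1);
    }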

Richard W.M. Jones wrote:
beth kon wrote:
Comments on the code itself:
+ * @freeMems: pointer to the array of long long
+ * @startCell: index of first cell to return freeMems info on (0 to
+ *             maxCells - 1).
+ * @maxCells: number of entries available in freeMems (-1 for sum of
+ *            free memory across all cells, returned in freeMems[0])
I'm slightly confused by this documentation. If maxCells > 0, does this only fill a subset of the array freeMems?
maxCells is the size of the freeMems array. I will clarify.
-    if ((freeMems == NULL) || (nbCells <= 0)) {
+    if ((freeMems == NULL) || (maxCells <= 0) || (startCell > maxCells - 1)) {
         virLibConnError(conn, VIR_ERR_INVALID_ARG, __FUNCTION__);
         return (-1);
     }
But we allow maxCells == -1, so this test is wrong.
+    if ((conn == NULL) || (freeMems == NULL) || (maxCells < 0))
+        return -1;
yep. I will clean up the maxCells and startCell code.
You should always call the error function before returning an error. Unfortunately the error handling in libvirt is over-engineered in the extreme and, what is more, in order to get around this over-engineering we have defined "simpler" wrapper error functions in different / incompatible ways in each file. In this particular file (xen_internal.c) the "simpler" error function is called virXenErrorFunc.
I saw other cases where this was done and was following that precedent, though usually I did call an error function. In any case, it is obviously better to do so, so I will correct this.
+    /* get actual number of cells */
+    if (xenDaemonNodeGetInfo(conn, nodeInfoPtr)) {
+        virXenError(VIR_ERR_XEN_CALL, " cannot determine actual number of cells", 0);
+        return -1;
+    }
+    nbCells = nodeInfoPtr->nodes;
Is there a way for me to find out the total number of cells (nbCells)?
It is returned by virNodeGetInfo, and also in the virNodeGetTopology XML string (which I'm currently working on). Do you think it should also be included here? I would think it reasonable to assume the caller has done virNodeGetInfo before virNodeGetCellsFreeMemory, though it is easy enough to add here.
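For reference, a minimal caller-side sketch using the existing virNodeGetInfo() API to discover the cell count before sizing the freeMems array:

    virNodeInfo nodeInfo;
    int nbCells;

    if (virNodeGetInfo(conn, &nodeInfo) < 0)
        return -1;
    nbCells = nodeInfo.nodes;   /* total number of NUMA cells on the node */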
+    if (maxCells > nbCells)
+        maxCells = nbCells;
I would prefer to return an error here, but it's not too bad because the return value tells us how many cells were filled.
+    if (startCell > nbCells - 1)
+        return -1;
Surely, the condition should be `startCell + maxCells > nbCells'?
I will rework all this so we can discuss the new version. Thanks for the comments!
Rich.
--
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: eak@us.ibm.com

Richard W.M. Jones wrote:
+    if (startCell > nbCells - 1)
+        return -1;
Surely, the condition should be `startCell + maxCells > nbCells'?
Actually I will discuss this now, since I plan to leave it as is. First I'll make sure the terms are clear:

- startCell is the first cell to gather freeMems info on.
- maxCells is the maximum number of entries available in freeMems (otherwise the buffer would be overrun).
- nbCells is the total number of cells on the node.

I assume the caller could legally specify a maxCells value that is greater than nbCells; it just means they provided a bigger buffer than necessary. My test here is just verifying that the startCell provided is not out of range of the number of cells that exist on the node. freeMems entries will be returned for up to maxCells cells, starting from startCell and ending at either nbCells or after maxCells entries, whichever comes first. Does that make sense?
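A hypothetical example of those semantics, assuming an 8-cell node and that freeMems is filled from index 0 as Daniel suggested; the numbers are purely illustrative.

    /* 8-cell node (nbCells == 8): the caller provides a 4-entry buffer and
     * starts at cell 2, so cells 2..5 are reported and the call returns 4. */
    long long freeMems[4];
    int filled = virNodeGetCellsFreeMemory(conn, freeMems, 2, 4);

    /* With startCell == 6 and the same buffer, only cells 6 and 7 exist,
     * so the call stops at nbCells and returns 2. */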
Rich.
--
Elizabeth Kon (Beth)
IBM Linux Technology Center
Open Hypervisor Team
email: eak@us.ibm.com

beth kon wrote:
I assume the caller could legally specify a maxCells value that is greater than nbCells. It just means they provided a bigger buffer than necessary.
I don't think that would be a problem.

Rich.

--
Emerging Technologies, Red Hat - http://et.redhat.com/~rjones/
Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street,
Windsor, Berkshire, SL4 1TE, United Kingdom.
Registered in England and Wales under Company Registration No. 03798903