[libvirt] [PATCH 0/4] libxl: implement some chunks of the NUMA interface

Hi Jim, Everyone,

First patch series, so maybe a small introduction is required: I'm Dario and I work for Citrix on improving the NUMA support of Xen.

This patch series implements some of the missing bits and pieces in the libxl driver regarding obtaining per-host and per-domain NUMA related information. It's not the full libvirt NUMA interface, since we don't yet have everything we would need for that in Xen (we will in the next version, 4.4), but it's certainly better than having nothing! :-)

Basically, I'm enhancing capability reporting to cover NUMA topology (patch 01), and I'm implementing nodeGetCellsFreeMemory (patch 02) and virDomainGetNumaParameters (patch 04) for the libxl driver. This last one requires at least Xen 4.3, so I put the implementation within the proper #ifdef-ery.

What I'm really not sure about is patch 03, which is something I need if I want patch 04 to function properly. Basically, it is about advertising that the libxl driver supports VIR_TYPED_PARAM_STRING. I looked at how that is done in the qemu driver, but I'm not entirely sure I completely understood the logic behind it, so, please, tell me if I missed or misinterpreted anything! In particular, should I have added more of those "flags &= ~VIR_TYPED_PARAM_STRING_OKAY;" statements, as happens in the qemu driver? If yes, in which functions?

Finally, allow me to say that I had been wanting to start hacking a bit on libvirt for a while. I'm really glad I've eventually been able to do so, and I definitely plan to continue (with particular focus on NUMA related stuff).

Comments are of course more than welcome.
:-) Thanks and Regards,
Dario
---
Dario Faggioli (4):
      libxl: implement NUMA capabilities reporting
      libxl: implement per NUMA node free memory reporting
      libxl: advertise the support for VIR_TYPED_PARAM_STRING
      libxl: implement virDomainGetNumaParameters

 src/libxl/libxl_conf.c   | 128 +++++++++++++++++++++++++++++-
 src/libxl/libxl_driver.c | 199 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 326 insertions(+), 1 deletion(-)

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

Starting from Xen 4.2, libxl has all the bits and pieces in place for retrieving an adequate amount of information about the host NUMA topology. It is therefore possible, after a bit of shuffling, to arrange that information in the way libvirt wants to present it to the outside world.

Therefore, with this patch, the <topology> section of the host capabilities is properly populated when running on Xen, so that we can figure out whether or not we're running on a NUMA host, and what its characteristics are.

[raistlin@Zhaman ~]$ sudo virsh --connect xen:/// capabilities
<capabilities>
  <host>
    <cpu>
    ....
    <topology>
      <cells num='2'>
        <cell id='0'>
          <memory unit='KiB'>6291456</memory>
          <cpus num='8'>
            <cpu id='0' socket_id='1' core_id='0' siblings='0-1'/>
            <cpu id='1' socket_id='1' core_id='0' siblings='0-1'/>
            <cpu id='2' socket_id='1' core_id='1' siblings='2-3'/>
            <cpu id='3' socket_id='1' core_id='1' siblings='2-3'/>
            <cpu id='4' socket_id='1' core_id='9' siblings='4-5'/>
            <cpu id='5' socket_id='1' core_id='9' siblings='4-5'/>
            <cpu id='6' socket_id='1' core_id='10' siblings='6-7'/>
            <cpu id='7' socket_id='1' core_id='10' siblings='6-7'/>
          </cpus>
        </cell>
        <cell id='1'>
          <memory unit='KiB'>6881280</memory>
          <cpus num='8'>
            <cpu id='8' socket_id='0' core_id='0' siblings='8-9'/>
            <cpu id='9' socket_id='0' core_id='0' siblings='8-9'/>
            <cpu id='10' socket_id='0' core_id='1' siblings='10-11'/>
            <cpu id='11' socket_id='0' core_id='1' siblings='10-11'/>
            <cpu id='12' socket_id='0' core_id='9' siblings='12-13'/>
            <cpu id='13' socket_id='0' core_id='9' siblings='12-13'/>
            <cpu id='14' socket_id='0' core_id='10' siblings='14-15'/>
            <cpu id='15' socket_id='0' core_id='10' siblings='14-15'/>
          </cpus>
        </cell>
      </cells>
    </topology>
  </host>
....
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
 src/libxl/libxl_conf.c | 128 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 127 insertions(+), 1 deletion(-)

diff --git a/src/libxl/libxl_conf.c b/src/libxl/libxl_conf.c
index e170357..2a9bcf0 100644
--- a/src/libxl/libxl_conf.c
+++ b/src/libxl/libxl_conf.c
@@ -163,6 +163,8 @@ libxlBuildCapabilities(virArch hostarch,
 static virCapsPtr
 libxlMakeCapabilitiesInternal(virArch hostarch,
                               libxl_physinfo *phy_info,
+                              libxl_numainfo *numa_info, int nr_nodes,
+                              libxl_cputopology *cpu_topo, int nr_cpus,
                               char *capabilities)
 {
     char *str, *token;
@@ -173,6 +175,12 @@ libxlMakeCapabilitiesInternal(virArch hostarch,
     int host_pae = 0;
     struct guest_arch guest_archs[32];
     int nr_guest_archs = 0;
+
+    /* For building NUMA capabilities */
+    virCapsHostNUMACellCPUPtr *cpus = NULL;
+    int *nr_cpus_node = NULL;
+    bool numa_failed = false;
+
     virCapsPtr caps = NULL;

     memset(guest_archs, 0, sizeof(guest_archs));
@@ -280,6 +288,100 @@ libxlMakeCapabilitiesInternal(virArch hostarch,
                                        nr_guest_archs)) == NULL)
         goto no_memory;

+    /* What about NUMA? */
+    if (!numa_info || !cpu_topo)
+        return caps;
+
+    if (VIR_ALLOC_N(cpus, nr_nodes))
+        goto no_memory;
+    memset(cpus, 0, sizeof(cpus) * nr_nodes);
+
+    if (VIR_ALLOC_N(nr_cpus_node, nr_nodes)) {
+        VIR_FREE(cpus);
+        goto no_memory;
+    }
+    memset(nr_cpus_node, 0, sizeof(nr_cpus_node) * nr_nodes);
+
+    /* For each node, prepare a list of CPUs belonging to that node */
+    for (i = 0; i < nr_cpus; i++) {
+        int node = cpu_topo[i].node;
+
+        if (cpu_topo[i].core == LIBXL_CPUTOPOLOGY_INVALID_ENTRY)
+            continue;
+
+        nr_cpus_node[node]++;
+
+        if (nr_cpus_node[node] == 1) {
+            if (VIR_ALLOC(cpus[node]) < 0) {
+                numa_failed = true;
+                goto cleanup;
+            }
+        }
+        else {
+            if (VIR_REALLOC_N(cpus[node], nr_cpus_node[node]) < 0) {
+                numa_failed = true;
+                goto cleanup;
+            }
+        }
+
+        /* Mapping between what libxl tells and what libvirt wants */
+        cpus[node][nr_cpus_node[node]-1].id = i;
+        cpus[node][nr_cpus_node[node]-1].socket_id = cpu_topo[i].socket;
+        cpus[node][nr_cpus_node[node]-1].core_id = cpu_topo[i].core;
+        cpus[node][nr_cpus_node[node]-1].siblings = virBitmapNew(nr_cpus);
+
+        if (!cpus[node][nr_cpus_node[node]-1].siblings) {
+            virReportOOMError();
+            numa_failed = true;
+            goto cleanup;
+        }
+    }
+
+    /* Let's now populate the siblings bitmaps */
+    for (i = 0; i < nr_cpus; i++) {
+        int j, node = cpu_topo[i].node;
+
+        if (cpu_topo[i].core == LIBXL_CPUTOPOLOGY_INVALID_ENTRY)
+            continue;
+
+        for (j = 0; j < nr_cpus_node[node]; j++) {
+            if (cpus[node][j].core_id == cpu_topo[i].core)
+                ignore_value(virBitmapSetBit(cpus[node][j].siblings, i));
+        }
+    }
+
+    for (i = 0; i < nr_nodes; i++) {
+        if (numa_info[i].size == LIBXL_NUMAINFO_INVALID_ENTRY)
+            continue;
+
+        if (virCapabilitiesAddHostNUMACell(caps, i, nr_cpus_node[i],
+                                           numa_info[i].size / 1024,
+                                           cpus[i]) < 0) {
+            virCapabilitiesClearHostNUMACellCPUTopology(cpus[i],
+                                                        nr_cpus_node[i]);
+            numa_failed = true;
+            goto cleanup;
+        }
+
+        /* This is safe, as the CPU list is now stored in the NUMA cell */
+        cpus[i] = NULL;
+    }
+
+cleanup:
+    if (numa_failed) {
+        /* Looks like something went wrong. Well, that's bad, but probably
+         * not enough to break the whole driver, so we log and carry on */
+        for (i = 0; i < nr_nodes; i++) {
+            VIR_FREE(cpus[i]);
+        }
+        VIR_WARN("Failed to retrieve and build host NUMA topology properly,\n"
+                 "disabling NUMA capabilities");
+        virCapabilitiesFreeNUMAInfo(caps);
+    }
+
+    VIR_FREE(cpus);
+    VIR_FREE(nr_cpus_node);
+
     return caps;

 no_memory:
@@ -772,7 +874,11 @@ libxlMakeCapabilities(libxl_ctx *ctx)
 {
     int err;
     libxl_physinfo phy_info;
+    libxl_numainfo *numa_info = NULL;
+    libxl_cputopology *cpu_topo = NULL;
     const libxl_version_info *ver_info;
+    int nr_nodes = 0, nr_cpus = 0;
+    virCapsPtr caps;

     err = regcomp(&xen_cap_rec, xen_cap_re, REG_EXTENDED);
     if (err != 0) {
@@ -796,9 +902,29 @@ libxlMakeCapabilities(libxl_ctx *ctx)
         return NULL;
     }

-    return libxlMakeCapabilitiesInternal(virArchFromHost(),
+    /* Let's try to fetch NUMA info, but it is not critical if we fail */
+    numa_info = libxl_get_numainfo(ctx, &nr_nodes);
+    if (numa_info == NULL)
+        VIR_WARN("libxl_get_numainfo failed to retrieve NUMA data");
+    else {
+        /* If the above failed, we'd have no NUMa caps anyway! */
+        cpu_topo = libxl_get_cpu_topology(ctx, &nr_cpus);
+        if (cpu_topo == NULL) {
+            VIR_WARN("libxl_get_cpu_topology failed to retrieve topology");
+            libxl_numainfo_list_free(numa_info, nr_nodes);
+        }
+    }
+
+    caps = libxlMakeCapabilitiesInternal(virArchFromHost(),
                                          &phy_info,
+                                         numa_info, nr_nodes,
+                                         cpu_topo, nr_cpus,
                                          ver_info->capabilities);
+
+    libxl_cputopology_list_free(cpu_topo, nr_cpus);
+    libxl_numainfo_list_free(numa_info, nr_nodes);
+
+    return caps;
 }

 int

On 06/28/2013 08:32 AM, Dario Faggioli wrote:
Starting from Xen 4.2, libxl has all the bits and pieces are in
s/are in/in/
place for retrieving an adequate amount of information about the host NUMA topology. It is therefore possible, after a bit of shuffling, to arrange those information in the way libvirt wants to present them to the outside world.
Therefore, with this patch, the <topology> section of the host capabilities is properly populated, when running on Xen, so that we can figure out whether or not we're running on a NUMA host, and what its characteristics are.
[raistlin@Zhaman ~]$ sudo virsh --connect xen:/// capabilities
<capabilities>
  <host>
    <cpu>
    ....
    <topology>
      <cells num='2'>
        <cell id='0'>
          <memory unit='KiB'>6291456</memory>
          <cpus num='8'>
            <cpu id='0' socket_id='1' core_id='0' siblings='0-1'/>
            <cpu id='1' socket_id='1' core_id='0' siblings='0-1'/>
            <cpu id='2' socket_id='1' core_id='1' siblings='2-3'/>
            <cpu id='3' socket_id='1' core_id='1' siblings='2-3'/>
            <cpu id='4' socket_id='1' core_id='9' siblings='4-5'/>
            <cpu id='5' socket_id='1' core_id='9' siblings='4-5'/>
            <cpu id='6' socket_id='1' core_id='10' siblings='6-7'/>
            <cpu id='7' socket_id='1' core_id='10' siblings='6-7'/>
          </cpus>
        </cell>
        <cell id='1'>
          <memory unit='KiB'>6881280</memory>
          <cpus num='8'>
            <cpu id='8' socket_id='0' core_id='0' siblings='8-9'/>
            <cpu id='9' socket_id='0' core_id='0' siblings='8-9'/>
            <cpu id='10' socket_id='0' core_id='1' siblings='10-11'/>
            <cpu id='11' socket_id='0' core_id='1' siblings='10-11'/>
            <cpu id='12' socket_id='0' core_id='9' siblings='12-13'/>
            <cpu id='13' socket_id='0' core_id='9' siblings='12-13'/>
            <cpu id='14' socket_id='0' core_id='10' siblings='14-15'/>
            <cpu id='15' socket_id='0' core_id='10' siblings='14-15'/>
          </cpus>
        </cell>
      </cells>
    </topology>
  </host>
....
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
 src/libxl/libxl_conf.c | 128 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 127 insertions(+), 1 deletion(-)
diff --git a/src/libxl/libxl_conf.c b/src/libxl/libxl_conf.c
index e170357..2a9bcf0 100644
--- a/src/libxl/libxl_conf.c
+++ b/src/libxl/libxl_conf.c
@@ -163,6 +163,8 @@ libxlBuildCapabilities(virArch hostarch,
 static virCapsPtr
 libxlMakeCapabilitiesInternal(virArch hostarch,
                               libxl_physinfo *phy_info,
+                              libxl_numainfo *numa_info, int nr_nodes,
+                              libxl_cputopology *cpu_topo, int nr_cpus,
The most prominent pattern in libvirt is one param per line after the first, if they all don't fit in 80 columns. E.g.

libxlMakeCapabilitiesInternal(virArch hostarch,
                              libxl_physinfo *phy_info,
                              libxl_numainfo *numa_info,
                              int nr_nodes,
                              libxl_cputopology *cpu_topo,
                              int nr_cpus,
                              ...)
                              char *capabilities)
 {
     char *str, *token;
@@ -173,6 +175,12 @@ libxlMakeCapabilitiesInternal(virArch hostarch,
     int host_pae = 0;
     struct guest_arch guest_archs[32];
     int nr_guest_archs = 0;
+
+    /* For building NUMA capabilities */
+    virCapsHostNUMACellCPUPtr *cpus = NULL;
+    int *nr_cpus_node = NULL;
+    bool numa_failed = false;
+
No need for the extra whitespace between these local variables.
virCapsPtr caps = NULL;
     memset(guest_archs, 0, sizeof(guest_archs));
@@ -280,6 +288,100 @@ libxlMakeCapabilitiesInternal(virArch hostarch,
                                        nr_guest_archs)) == NULL)
         goto no_memory;
+ /* What about NUMA? */
What about it? I think the comment should just say "Include NUMA information if available" or similar :).
+    if (!numa_info || !cpu_topo)
+        return caps;
+
+    if (VIR_ALLOC_N(cpus, nr_nodes))
+        goto no_memory;
+    memset(cpus, 0, sizeof(cpus) * nr_nodes);
VIR_ALLOC_N will already zero-fill the memory. From its documentation in viralloc.h:

 * Allocate an array of 'count' elements, each sizeof(*ptr)
 * bytes long and store the address of allocated memory in
 * 'ptr'. Fill the newly allocated memory with zeros.
+
+    if (VIR_ALLOC_N(nr_cpus_node, nr_nodes)) {
+        VIR_FREE(cpus);
+        goto no_memory;
+    }
+    memset(nr_cpus_node, 0, sizeof(nr_cpus_node) * nr_nodes);
Same here.
+
+    /* For each node, prepare a list of CPUs belonging to that node */
+    for (i = 0; i < nr_cpus; i++) {
+        int node = cpu_topo[i].node;
+
+        if (cpu_topo[i].core == LIBXL_CPUTOPOLOGY_INVALID_ENTRY)
+            continue;
+
+        nr_cpus_node[node]++;
+
+        if (nr_cpus_node[node] == 1) {
+            if (VIR_ALLOC(cpus[node]) < 0) {
+                numa_failed = true;
+                goto cleanup;
+            }
+        }
+        else {
+            if (VIR_REALLOC_N(cpus[node], nr_cpus_node[node]) < 0) {
virReportOOMError() ?
+                numa_failed = true;
+                goto cleanup;
+            }
+        }
+
+        /* Mapping between what libxl tells and what libvirt wants */
+        cpus[node][nr_cpus_node[node]-1].id = i;
+        cpus[node][nr_cpus_node[node]-1].socket_id = cpu_topo[i].socket;
+        cpus[node][nr_cpus_node[node]-1].core_id = cpu_topo[i].core;
+        cpus[node][nr_cpus_node[node]-1].siblings = virBitmapNew(nr_cpus);
+
+        if (!cpus[node][nr_cpus_node[node]-1].siblings) {
+            virReportOOMError();
+            numa_failed = true;
+            goto cleanup;
+        }
+    }
+
+    /* Let's now populate the siblings bitmaps */
+    for (i = 0; i < nr_cpus; i++) {
+        int j, node = cpu_topo[i].node;
+
+        if (cpu_topo[i].core == LIBXL_CPUTOPOLOGY_INVALID_ENTRY)
+            continue;
+
+        for (j = 0; j < nr_cpus_node[node]; j++) {
+            if (cpus[node][j].core_id == cpu_topo[i].core)
+                ignore_value(virBitmapSetBit(cpus[node][j].siblings, i));
+        }
+    }
+
+    for (i = 0; i < nr_nodes; i++) {
+        if (numa_info[i].size == LIBXL_NUMAINFO_INVALID_ENTRY)
+            continue;
+
+        if (virCapabilitiesAddHostNUMACell(caps, i, nr_cpus_node[i],
+                                           numa_info[i].size / 1024,
+                                           cpus[i]) < 0) {
On my non-NUMA test machine I have the cell memory reported as

<memory unit='KiB'>9175040</memory>

The machine has 8G of memory, running xen 4.3 rc6, with dom0_mem=1024M. 'xl info --numa' says

total_memory           : 8190
...
numa_info              :
node:    memsize    memfree    distances
   0:       8960       7116      10

Why is the node memsize > total_memory?
+            virCapabilitiesClearHostNUMACellCPUTopology(cpus[i],
+                                                        nr_cpus_node[i]);
+            numa_failed = true;
+            goto cleanup;
+        }
+
+        /* This is safe, as the CPU list is now stored in the NUMA cell */
+        cpus[i] = NULL;
+    }
+
+cleanup:
+    if (numa_failed) {
+        /* Looks like something went wrong. Well, that's bad, but probably
+         * not enough to break the whole driver, so we log and carry on */
+        for (i = 0; i < nr_nodes; i++) {
+            VIR_FREE(cpus[i]);
+        }
+        VIR_WARN("Failed to retrieve and build host NUMA topology properly,\n"
+                 "disabling NUMA capabilities");
+        virCapabilitiesFreeNUMAInfo(caps);
+    }
+
+    VIR_FREE(cpus);
+    VIR_FREE(nr_cpus_node);
Hmm, I'm beginning to think the numa additions to libxlMakeCapabilitiesInternal() should be in a helper function, e.g. libxlMakeNumaCapabilities(), and called when numa_info and cpu_topo are provided. Regards, Jim
+ return caps;
 no_memory:
@@ -772,7 +874,11 @@ libxlMakeCapabilities(libxl_ctx *ctx)
 {
     int err;
     libxl_physinfo phy_info;
+    libxl_numainfo *numa_info = NULL;
+    libxl_cputopology *cpu_topo = NULL;
     const libxl_version_info *ver_info;
+    int nr_nodes = 0, nr_cpus = 0;
+    virCapsPtr caps;
     err = regcomp(&xen_cap_rec, xen_cap_re, REG_EXTENDED);
     if (err != 0) {
@@ -796,9 +902,29 @@ libxlMakeCapabilities(libxl_ctx *ctx)
         return NULL;
     }
-    return libxlMakeCapabilitiesInternal(virArchFromHost(),
+    /* Let's try to fetch NUMA info, but it is not critical if we fail */
+    numa_info = libxl_get_numainfo(ctx, &nr_nodes);
+    if (numa_info == NULL)
+        VIR_WARN("libxl_get_numainfo failed to retrieve NUMA data");
+    else {
+        /* If the above failed, we'd have no NUMa caps anyway! */
+        cpu_topo = libxl_get_cpu_topology(ctx, &nr_cpus);
+        if (cpu_topo == NULL) {
+            VIR_WARN("libxl_get_cpu_topology failed to retrieve topology");
+            libxl_numainfo_list_free(numa_info, nr_nodes);
+        }
+    }
+
+    caps = libxlMakeCapabilitiesInternal(virArchFromHost(),
                                          &phy_info,
+                                         numa_info, nr_nodes,
+                                         cpu_topo, nr_cpus,
                                          ver_info->capabilities);
+
+    libxl_cputopology_list_free(cpu_topo, nr_cpus);
+    libxl_numainfo_list_free(numa_info, nr_nodes);
+
+    return caps;
 }
int

On lun, 2013-07-01 at 16:47 -0600, Jim Fehlig wrote:
On 06/28/2013 08:32 AM, Dario Faggioli wrote:
Starting from Xen 4.2, libxl has all the bits and pieces are in
s/are in/in/
Uups! Will fix, thanks.
diff --git a/src/libxl/libxl_conf.c b/src/libxl/libxl_conf.c
index e170357..2a9bcf0 100644
 static virCapsPtr
 libxlMakeCapabilitiesInternal(virArch hostarch,
                               libxl_physinfo *phy_info,
+                              libxl_numainfo *numa_info, int nr_nodes,
+                              libxl_cputopology *cpu_topo, int nr_cpus,
The most prominent pattern in libvirt is one param per line after the first, if they all don't fit in 80 columns. E.g.
libxlMakeCapabilitiesInternal(virArch hostarch,
                              libxl_physinfo *phy_info,
                              libxl_numainfo *numa_info,
                              int nr_nodes,
                              libxl_cputopology *cpu_topo,
                              int nr_cpus,
                              ...)
Ok.
                              char *capabilities)
 {
     char *str, *token;
@@ -173,6 +175,12 @@ libxlMakeCapabilitiesInternal(virArch hostarch,
     int host_pae = 0;
     struct guest_arch guest_archs[32];
     int nr_guest_archs = 0;
+
+    /* For building NUMA capabilities */
+    virCapsHostNUMACellCPUPtr *cpus = NULL;
+    int *nr_cpus_node = NULL;
+    bool numa_failed = false;
+
No need for the extra whitespace between these local variables.
Killed.
virCapsPtr caps = NULL;
     memset(guest_archs, 0, sizeof(guest_archs));
@@ -280,6 +288,100 @@ libxlMakeCapabilitiesInternal(virArch hostarch,
                                        nr_guest_archs)) == NULL)
         goto no_memory;
+ /* What about NUMA? */
What about it? I think the comment should just say "Include NUMA information if available" or similar :).
Agreed.
+    if (!numa_info || !cpu_topo)
+        return caps;
+
+    if (VIR_ALLOC_N(cpus, nr_nodes))
+        goto no_memory;
+    memset(cpus, 0, sizeof(cpus) * nr_nodes);
VIR_ALLOC_N will already zero-fill the memory. From its documentation in viralloc.h
Right. I even looked it up (or so I remember), but then I wasn't sure it was doing that. Anyway, thanks, I'll get rid of the explicit zero-filling part.
+        if (nr_cpus_node[node] == 1) {
+            if (VIR_ALLOC(cpus[node]) < 0) {
+                numa_failed = true;
+                goto cleanup;
+            }
+        }
+        else {
+            if (VIR_REALLOC_N(cpus[node], nr_cpus_node[node]) < 0) {
virReportOOMError() ?
Sounds reasonable. Will do.
+cleanup:
+    if (numa_failed) {
+        /* Looks like something went wrong. Well, that's bad, but probably
+         * not enough to break the whole driver, so we log and carry on */
+        for (i = 0; i < nr_nodes; i++) {
+            VIR_FREE(cpus[i]);
+        }
+        VIR_WARN("Failed to retrieve and build host NUMA topology properly,\n"
+                 "disabling NUMA capabilities");
+        virCapabilitiesFreeNUMAInfo(caps);
+    }
+
+    VIR_FREE(cpus);
+    VIR_FREE(nr_cpus_node);
Hmm, I'm beginning to think the numa additions to libxlMakeCapabilitiesInternal() should be in a helper function, e.g. libxlMakeNumaCapabilities(), and called when numa_info and cpu_topo are provided.
Yeah, it's a bit long, isn't it. I agree with the above and will make it a helper for v2.

Thanks and Regards,
Dario

[Moving the conversation to xen-devel and adding Jan, as that seems more appropriate]

[Jan, this came up as I'm implementing some NUMA bits in libvirt but, as you see, the core of Jim's question is purely about Xen]

On lun, 2013-07-01 at 16:47 -0600, Jim Fehlig wrote:
On my non-NUMA test machine I have the cell memory reported as
<memory unit='KiB'>9175040</memory>
Which is 8960, if divided by 1024, so at least it's consistent. However...
The machine has 8G of memory, running xen 4.3 rc6, with dom0_mem=1024M. 'xl info --numa' says
total_memory           : 8190
...
numa_info              :
node:    memsize    memfree    distances
   0:       8960       7116      10
Why is the node memsize > total_memory?
Mmm... Interesting question. I really never paid attention to this... Jan (or anyone else), is that something known and/or expected?

I went checking this down in Xen, and here's what I found. total_memory is:

  info.total_pages / ((1 << 20) / vinfo->pagesize)

where 'info' is what libxl_get_physinfo() provides. In turn, libxl_get_physinfo() is xc_physinfo(), which is XEN_SYSCTL_physinfo, which uses total_pages, which is assigned the number of pages, down in __start_xen(), as it results from parsing the E820 map (looking for RAM blocks).

OTOH, memsize comes from libxl_get_numainfo(), which is xc_numainfo(), which is XEN_SYSCTL_numainfo, which puts in memsize what node_spanned_pages(<node_id>) says. That seems to come, on a NUMA box, from the parsing of SRAT, and on a non-NUMA box, from just (end_pfn - start_pfn) (in pages, of course).

Anyway, on my NUMA box, I see something similar to what Jim sees on a non-NUMA one:

# xl info -n
...
total_memory           : 12285
...
numa_info              :
node:    memsize    memfree    distances
   0:       6144         23     10,20
   1:       6720        104     20,10

where 6144 + 6720 = 12864 > 12285.

Looking at what Xen says during boot, I see this (the [*], [+], [=] and [|] are mine):

(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 0000000000096000 (usable)
(XEN)  00000000000f0000 - 0000000000100000 (reserved)   [*]
(XEN)  0000000000100000 - 00000000dbdf9c00 (usable)
(XEN)  00000000dbdf9c00 - 00000000dbe4bc00 (ACPI NVS)   [+]
(XEN)  00000000dbe4bc00 - 00000000dbe4dc00 (ACPI data)  [=]
(XEN)  00000000dbe4dc00 - 00000000dc000000 (reserved)   [|]
(XEN)  00000000f8000000 - 00000000fd000000 (reserved)
(XEN)  00000000fe000000 - 00000000fed00400 (reserved)
(XEN)  00000000fee00000 - 00000000fef00000 (reserved)
(XEN)  00000000ffb00000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 0000000324000000 (usable)
...
(XEN) System RAM: 12285MB (12580412kB)

And my math says that 12285MB is the sum of the areas marked as (usable), i.e., I guess, what during parsing is recognised as E820_RAM... which makes total sense.
A bit below that I have this:

(XEN) SRAT: Node 1 PXM 1 0-dc000000
(XEN) SRAT: Node 1 PXM 1 100000000-1a4000000
(XEN) SRAT: Node 0 PXM 0 1a4000000-324000000

which, after the needed calculations, gives exactly the same results as the memsize values in `xl info -n'.

Now, if I add up the [*], [+], [=] and [|] regions above, and then subtract that from node 1's PXMs, I see that node 1 has only ~6141MB of usable RAM, instead of 6720MB. And in fact, 6720 - 6141 = 579, just as much as 12864 - 12285 = 579.

So, if I haven't messed up the calculations, it looks like Xen, when reporting to the upper layers the amount of memory it has available, filters out the non-RAM regions if this happens via XEN_SYSCTL_physinfo (i.e., by parsing E820), while it does not do that if this happens via XEN_SYSCTL_numainfo (i.e., by parsing the ACPI SRAT).

What I'm not sure about is whether or not that was known/intended, and whether or not it is something we should be concerned about.

Thanks and Regards,
Dario

By providing the implementation of nodeGetCellsFreeMemory for the driver. This is all just a matter of properly formatting, in a way that libvirt likes, what Xen provides via libxl_get_numainfo().

[raistlin@Zhaman ~]$ sudo virsh --connect xen:/// freecell --all
    0:     25004 KiB
    1:    105848 KiB
--------------------
Total:    130852 KiB

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
 src/libxl/libxl_driver.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c
index 9f52394..a3a9171 100644
--- a/src/libxl/libxl_driver.c
+++ b/src/libxl/libxl_driver.c
@@ -4098,6 +4098,51 @@ libxlNodeGetFreeMemory(virConnectPtr conn)
 }

 static int
+libxlNodeGetCellsFreeMemory(virConnectPtr conn,
+                            unsigned long long *freeMems,
+                            int startCell,
+                            int maxCells)
+{
+    int n, lastCell, numCells;
+    int ret = -1, nr_nodes = 0;
+    libxl_numainfo *numa_info = NULL;
+    libxlDriverPrivatePtr driver = conn->privateData;
+
+    if (virNodeGetCellsFreeMemoryEnsureACL(conn) < 0)
+        return -1;
+
+    /* Early failure is probably worth just a warning */
+    numa_info = libxl_get_numainfo(driver->ctx, &nr_nodes);
+    if (numa_info == NULL || nr_nodes == 0) {
+        VIR_WARN("libxl_get_numainfo failed to retrieve NUMA data");
+        return 0;
+    }
+
+    /* Check/sanitize the cell range */
+    if (startCell > nr_nodes) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("start cell %d out of range (0-%d)"),
+                       startCell, nr_nodes);
+        goto cleanup;
+    }
+    lastCell = startCell + maxCells - 1;
+    if (lastCell > nr_nodes)
+        lastCell = nr_nodes;
+
+    for (numCells = 0, n = startCell; n <= lastCell; n++) {
+        if (numa_info[n].size == LIBXL_NUMAINFO_INVALID_ENTRY)
+            freeMems[numCells++] = 0;
+        else
+            freeMems[numCells++] = numa_info[n].free;
+    }
+    ret = numCells;
+
+cleanup:
+    libxl_numainfo_list_free(numa_info, nr_nodes);
+    return ret;
+}
+
+static int
 libxlConnectDomainEventRegister(virConnectPtr conn,
                                 virConnectDomainEventCallback callback,
                                 void *opaque, virFreeCallback freecb)
@@ -4683,6 +4728,7 @@ static virDriver libxlDriver = {
     .domainSetSchedulerParameters = libxlDomainSetSchedulerParameters, /* 0.9.0 */
     .domainSetSchedulerParametersFlags = libxlDomainSetSchedulerParametersFlags, /* 0.9.2 */
     .nodeGetFreeMemory = libxlNodeGetFreeMemory, /* 0.9.0 */
+    .nodeGetCellsFreeMemory = libxlNodeGetCellsFreeMemory, /* 1.1.1 */
     .connectDomainEventRegister = libxlConnectDomainEventRegister, /* 0.9.0 */
     .connectDomainEventDeregister = libxlConnectDomainEventDeregister, /* 0.9.0 */
     .domainManagedSave = libxlDomainManagedSave, /* 0.9.2 */

On 06/28/2013 08:32 AM, Dario Faggioli wrote:
By providing the implementation of nodeGetCellsFreeMemory for the driver. This is all just a matter of properly formatting, in a way that libvirt like, what Xen provides via libxl_get_numainfo().
[raistlin@Zhaman ~]$ sudo virsh --connect xen:/// freecell --all
    0:     25004 KiB
    1:    105848 KiB
--------------------
Total:    130852 KiB
Yeah, rather straightforward patch, which looks good on my non-NUMA machine as well:

# virsh freecell --all
    0:   7287216 KiB
--------------------
Total:   7287216 KiB

# virsh freecell --cellno 0
    0:   7287216 KiB

ACK and pushed.

Regards,
Jim

On lun, 2013-07-01 at 17:27 -0600, Jim Fehlig wrote:
Yeah, rather straight forward patch, which looks good on my non-NUMA machine as well
# virsh freecell --all
    0:   7287216 KiB
--------------------
Total:   7287216 KiB

# virsh freecell --cellno 0
    0:   7287216 KiB
ACK and pushed.
Cool, thanks! I will take this out of my queue.

Regards,
Dario

domainGetNumaParameters has a string typed parameter, hence it is necessary for the libxl driver to support this. This change implements the connectSupportsFeature hook for the libxl driver, advertising that VIR_DRV_FEATURE_TYPED_PARAM_STRING is supported.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Eric Blake <eblake@redhat.com>
---
 src/libxl/libxl_driver.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c
index a3a9171..53af609 100644
--- a/src/libxl/libxl_driver.c
+++ b/src/libxl/libxl_driver.c
@@ -4660,6 +4660,20 @@ libxlConnectListAllDomains(virConnectPtr conn,
     return ret;
 }

+/* Which features are supported by this driver? */
+static int
+libxlConnectSupportsFeature(virConnectPtr conn, int feature)
+{
+    if (virConnectSupportsFeatureEnsureACL(conn) < 0)
+        return -1;
+
+    switch (feature) {
+    case VIR_DRV_FEATURE_TYPED_PARAM_STRING:
+        return 1;
+    default:
+        return 0;
+    }
+}

 static virDriver libxlDriver = {
@@ -4740,6 +4754,7 @@ static virDriver libxlDriver = {
     .connectDomainEventRegisterAny = libxlConnectDomainEventRegisterAny, /* 0.9.0 */
     .connectDomainEventDeregisterAny = libxlConnectDomainEventDeregisterAny, /* 0.9.0 */
     .connectIsAlive = libxlConnectIsAlive, /* 0.9.8 */
+    .connectSupportsFeature = libxlConnectSupportsFeature, /* 1.1.1 */
 };

 static virStateDriver libxlStateDriver = {

On 06/28/2013 08:32 AM, Dario Faggioli wrote:
domainGetNumaParameters has a string typed parameter, hence it is necessary for the libxl driver to support this.
This change implements the connectSupportsFeature hook for the libxl driver, advertising that VIR_DRV_FEATURE_TYPED_PARAM_STRING is something good to go.
s/something good to go/supported/
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Cc: Eric Blake <eblake@redhat.com> --- src/libxl/libxl_driver.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+)
diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c
index a3a9171..53af609 100644
--- a/src/libxl/libxl_driver.c
+++ b/src/libxl/libxl_driver.c
@@ -4660,6 +4660,20 @@ libxlConnectListAllDomains(virConnectPtr conn,
     return ret;
 }
+/* Which features are supported by this driver? */
+static int
+libxlConnectSupportsFeature(virConnectPtr conn, int feature)
+{
+    if (virConnectSupportsFeatureEnsureACL(conn) < 0)
+        return -1;
+
+    switch (feature) {
+    case VIR_DRV_FEATURE_TYPED_PARAM_STRING:
+        return 1;
+    default:
+        return 0;
+    }
+}
 static virDriver libxlDriver = {
@@ -4740,6 +4754,7 @@ static virDriver libxlDriver = {
     .connectDomainEventRegisterAny = libxlConnectDomainEventRegisterAny, /* 0.9.0 */
     .connectDomainEventDeregisterAny = libxlConnectDomainEventDeregisterAny, /* 0.9.0 */
     .connectIsAlive = libxlConnectIsAlive, /* 0.9.8 */
+    .connectSupportsFeature = libxlConnectSupportsFeature, /* 1.1.1 */
 };
static virStateDriver libxlStateDriver = {
As was mentioned in the reply to 0/4, libxlDomainGetSchedulerParametersFlags() will have to accommodate this change. Regards, Jim

Although, having it depending on Xen >= 4.3 (by using the proper libxl feature flag).

Xen currently implements a NUMA placement policy which is basically the same as the 'interleaved' policy of `numactl', although it can be applied on a subset of the available nodes. We therefore hardcode "interleave" as 'numa_mode', and we use the newly introduced libxl interface to figure out what nodes a domain spans ('numa_nodeset').

With this change, it is now possible to query the NUMA node affinity of a running domain:

[raistlin@Zhaman ~]$ sudo virsh --connect xen:/// list
 Id    Name                           State
----------------------------------------------------
 23    F18_x64                        running

[raistlin@Zhaman ~]$ sudo virsh --connect xen:/// numatune 23
numa_mode      : interleave
numa_nodeset   : 1

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
 src/libxl/libxl_driver.c | 138 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 138 insertions(+)

diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c
index 53af609..9bd6d99 100644
--- a/src/libxl/libxl_driver.c
+++ b/src/libxl/libxl_driver.c
@@ -28,6 +28,7 @@
 #include <math.h>

 #include <libxl.h>
+#include <libxl_utils.h>
 #include <fcntl.h>
 #include <regex.h>

@@ -4514,6 +4515,140 @@ libxlDomainSetSchedulerParameters(virDomainPtr dom, virTypedParameterPtr params,
     return libxlDomainSetSchedulerParametersFlags(dom, params, nparams, 0);
 }

+/* NUMA node affinity information is available through libxl
+ * starting from Xen 4.3. */
+#ifdef LIBXL_HAVE_DOMAIN_NODEAFFINITY
+
+/* Number of Xen NUMA parameters */
+#define LIBXL_NUMA_NPARAM 2
+
+static int
+libxlDomainGetNumaParameters(virDomainPtr dom,
+                             virTypedParameterPtr params,
+                             int *nparams,
+                             unsigned int flags)
+{
+    libxlDriverPrivatePtr driver = dom->conn->privateData;
+    libxlDomainObjPrivatePtr priv;
+    virDomainObjPtr vm;
+
+    libxl_bitmap nodemap;
+    virBitmapPtr nodes = NULL;
+    char *nodeset = NULL;
+
+    int rc, ret = -1;
+    int i, j;
+
+    /* In Xen 4.3, it is possible to query the NUMA node affinity of a domain
+     * via libxl, but not to change it. We therefore only allow AFFECT_LIVE. */
+    virCheckFlags(VIR_DOMAIN_AFFECT_LIVE |
+                  VIR_TYPED_PARAM_STRING_OKAY, -1);
+
+    libxlDriverLock(driver);
+    vm = virDomainObjListFindByUUID(driver->domains, dom->uuid);
+    libxlDriverUnlock(driver);
+
+    if (!vm) {
+        virReportError(VIR_ERR_NO_DOMAIN, "%s",
+                       _("no domain with matching uuid"));
+        goto cleanup;
+    }
+
+    if (virDomainGetNumaParametersEnsureACL(dom->conn, vm->def) < 0)
+        goto cleanup;
+
+    if (!virDomainObjIsActive(vm)) {
+        virReportError(VIR_ERR_OPERATION_INVALID, "%s",
+                       _("Domain is not running"));
+        goto cleanup;
+    }
+
+    priv = vm->privateData;
+
+    libxl_bitmap_init(&nodemap);
+
+    if ((*nparams) == 0) {
+        *nparams = LIBXL_NUMA_NPARAM;
+        ret = 0;
+        goto cleanup;
+    }
+
+    for (i = 0; i < LIBXL_NUMA_NPARAM && i < *nparams; i++) {
+        virMemoryParameterPtr param = &params[i];
+
+        switch (i) {
+        case 0:
+            /* NUMA mode */
+
+            /* Xen implements something that is really close to numactl's
+             * 'interleave' policy (see `man 8 numactl' for details). */
+            if (virTypedParameterAssign(param, VIR_DOMAIN_NUMA_MODE,
+                                        VIR_TYPED_PARAM_INT,
+                                        VIR_DOMAIN_NUMATUNE_MEM_INTERLEAVE) < 0)
+                goto cleanup;
+
+            break;
+
+        case 1:
+            /* Node affinity */
+
+            /* Let's allocate both libxl and libvirt bitmaps */
+            if (libxl_node_bitmap_alloc(priv->ctx, &nodemap, 0) ||
+                !(nodes = virBitmapNew(libxl_get_max_nodes(priv->ctx)))) {
+                virReportOOMError();
+                goto cleanup;
+            }
+
+            rc = libxl_domain_get_nodeaffinity(priv->ctx, vm->def->id,
+                                               &nodemap);
+            if (rc != 0) {
+                virReportSystemError(-rc, "%s",
+                                     _("unable to get numa affinity"));
+                goto cleanup;
+            }
+
+            /* First, we convert libxl_bitmap into virBitmap. After that,
+             * we format virBitmap as a string that can be returned. */
+            virBitmapClearAll(nodes);
+            libxl_for_each_set_bit(j, nodemap) {
+                if (virBitmapSetBit(nodes, j)) {
+                    virReportError(VIR_ERR_INTERNAL_ERROR,
+                                   _("Node %d out of range"), j);
+                    goto cleanup;
+                }
+            }
+
+            nodeset = virBitmapFormat(nodes);
+            if (!nodeset && VIR_STRDUP(nodeset, "") < 0)
+                goto cleanup;
+
+            if (virTypedParameterAssign(param, VIR_DOMAIN_NUMA_NODESET,
+                                        VIR_TYPED_PARAM_STRING, nodeset) < 0)
+                goto cleanup;
+
+            nodeset = NULL;
+
+            break;
+
+        default:
+            break;
+        }
+    }
+
+    if (*nparams > LIBXL_NUMA_NPARAM)
+        *nparams = LIBXL_NUMA_NPARAM;
+    ret = 0;
+
+cleanup:
+    VIR_FREE(nodeset);
+    virBitmapFree(nodes);
+    libxl_bitmap_dispose(&nodemap);
+    if (vm)
+        virObjectUnlock(vm);
+    return ret;
+}
+#endif
+
 static int
 libxlDomainIsActive(virDomainPtr dom)
 {
@@ -4741,6 +4876,9 @@ static virDriver libxlDriver = {
     .domainGetSchedulerParametersFlags = libxlDomainGetSchedulerParametersFlags, /* 0.9.2 */
     .domainSetSchedulerParameters = libxlDomainSetSchedulerParameters, /* 0.9.0 */
     .domainSetSchedulerParametersFlags = libxlDomainSetSchedulerParametersFlags, /* 0.9.2 */
+#ifdef LIBXL_HAVE_DOMAIN_NODEAFFINITY
+    .domainGetNumaParameters = libxlDomainGetNumaParameters, /* 1.1.1 */
+#endif
     .nodeGetFreeMemory = libxlNodeGetFreeMemory, /* 0.9.0 */
     .nodeGetCellsFreeMemory = libxlNodeGetCellsFreeMemory, /* 1.1.1 */
     .connectDomainEventRegister = libxlConnectDomainEventRegister, /* 0.9.0 */

On 06/28/2013 08:33 AM, Dario Faggioli wrote:
Although, having it depending on Xen >= 4.3 (by using the proper libxl feature flag).
Xen currently implements a NUMA placement policy which is basically the same as the 'interleaved' policy of `numactl', although it can be applied on a subset of the available nodes. We therefore hardcode "interleave" as 'numa_mode', and we use the newly introduced libxl interface to figure out what nodes a domain spans ('numa_nodeset').
With this change, it is now possible to query the NUMA node affinity of a running domain:
[raistlin@Zhaman ~]$ sudo virsh --connect xen:/// list
 Id    Name                           State
----------------------------------------------------
 23    F18_x64                        running

[raistlin@Zhaman ~]$ sudo virsh --connect xen:/// numatune 23
numa_mode      : interleave
numa_nodeset   : 1
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> --- src/libxl/libxl_driver.c | 138 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 138 insertions(+)
diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c
index 53af609..9bd6d99 100644
--- a/src/libxl/libxl_driver.c
+++ b/src/libxl/libxl_driver.c
@@ -28,6 +28,7 @@
 #include <math.h>

 #include <libxl.h>
+#include <libxl_utils.h>
 #include <fcntl.h>
 #include <regex.h>

@@ -4514,6 +4515,140 @@ libxlDomainSetSchedulerParameters(virDomainPtr dom, virTypedParameterPtr params,
     return libxlDomainSetSchedulerParametersFlags(dom, params, nparams, 0);
 }

+/* NUMA node affinity information is available through libxl
+ * starting from Xen 4.3. */
+#ifdef LIBXL_HAVE_DOMAIN_NODEAFFINITY
+
+/* Number of Xen NUMA parameters */
+#define LIBXL_NUMA_NPARAM 2
Is there a similar definition in Xen? E.g. would future changes to libxl adding more parameters, but neglecting to update here, cause problems?
+static int
+libxlDomainGetNumaParameters(virDomainPtr dom,
+                             virTypedParameterPtr params,
+                             int *nparams,
+                             unsigned int flags)
+{
+    libxlDriverPrivatePtr driver = dom->conn->privateData;
+    libxlDomainObjPrivatePtr priv;
+    virDomainObjPtr vm;
+
+    libxl_bitmap nodemap;
+    virBitmapPtr nodes = NULL;
+    char *nodeset = NULL;
+
+    int rc, ret = -1;
+    int i, j;
No need for the extra whitespace between these local variables.
+    /* In Xen 4.3, it is possible to query the NUMA node affinity of a domain
+     * via libxl, but not to change it. We therefore only allow AFFECT_LIVE. */
+    virCheckFlags(VIR_DOMAIN_AFFECT_LIVE |
+                  VIR_TYPED_PARAM_STRING_OKAY, -1);
[...]
+            rc = libxl_domain_get_nodeaffinity(priv->ctx, vm->def->id,
+                                               &nodemap);
Fits on one line, albeit with nothing to spare :).
+            if (rc != 0) {
+                virReportSystemError(-rc, "%s",
+                                     _("unable to get numa affinity"));
+                goto cleanup;
+            }
[...]
@@ -4741,6 +4876,9 @@ static virDriver libxlDriver = {
     .domainGetSchedulerParametersFlags = libxlDomainGetSchedulerParametersFlags, /* 0.9.2 */
     .domainSetSchedulerParameters = libxlDomainSetSchedulerParameters, /* 0.9.0 */
     .domainSetSchedulerParametersFlags = libxlDomainSetSchedulerParametersFlags, /* 0.9.2 */
+#ifdef LIBXL_HAVE_DOMAIN_NODEAFFINITY
+    .domainGetNumaParameters = libxlDomainGetNumaParameters, /* 1.1.1 */
+#endif
     .nodeGetFreeMemory = libxlNodeGetFreeMemory, /* 0.9.0 */
     .nodeGetCellsFreeMemory = libxlNodeGetCellsFreeMemory, /* 1.1.1 */
     .connectDomainEventRegister = libxlConnectDomainEventRegister, /* 0.9.0 */
Looks good with the exception of these minor nits, but needs to go along with an updated 3/4. Regards, Jim

On 07/02/2013 02:22 PM, Jim Fehlig wrote:
On 06/28/2013 08:33 AM, Dario Faggioli wrote:
Although, having it depending on Xen >= 4.3 (by using the proper libxl feature flag).
Xen currently implements a NUMA placement policy which is basically the same as the 'interleaved' policy of `numactl', although it can be applied on a subset of the available nodes. We therefore hardcode "interleave" as 'numa_mode', and we use the newly introduced libxl interface to figure out what nodes a domain spans ('numa_nodeset').
With this change, it is now possible to query the NUMA node affinity of a running domain:
[raistlin@Zhaman ~]$ sudo virsh --connect xen:/// list
 Id    Name                           State
----------------------------------------------------
 23    F18_x64                        running

[raistlin@Zhaman ~]$ sudo virsh --connect xen:/// numatune 23
numa_mode      : interleave
numa_nodeset   : 1
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> --- src/libxl/libxl_driver.c | 138 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 138 insertions(+)
diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c
index 53af609..9bd6d99 100644
[...]
+/* NUMA node affinity information is available through libxl
+ * starting from Xen 4.3. */
+#ifdef LIBXL_HAVE_DOMAIN_NODEAFFINITY
+
+/* Number of Xen NUMA parameters */
+#define LIBXL_NUMA_NPARAM 2
Is there a similar definition in Xen? E.g. would future changes to libxl adding more parameters, but neglecting to update here, cause problems?
BTW, with cppi installed this fails 'make syntax-check'

preprocessor_indentation
cppi: src/libxl/libxl_driver.c: line 4523: not properly indented
maint.mk: incorrect preprocessor indentation
make: *** [sc_preprocessor_indentation] Error 1

Preprocessor nesting is indented as follows

#if ...
# define ...
# if ...
#  define ...
# endif
#endif

Regards,
Jim

On mar, 2013-07-02 at 14:35 -0600, Jim Fehlig wrote:
BTW, with cppi installed this fails 'make syntax-check'
preprocessor_indentation
cppi: src/libxl/libxl_driver.c: line 4523: not properly indented
maint.mk: incorrect preprocessor indentation
make: *** [sc_preprocessor_indentation] Error 1
Preprocessor nesting is indented as follows
#if ...
# define ...
# if ...
#  define ...
# endif
#endif
Oh, I see... I did try to install cppi, but `yum search' found nothing, probably because of a typo on my part, since I'm finding it (and have installed it) now! :-)

Will fix this.

Thanks and Regards,
Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

On mar, 2013-07-02 at 14:22 -0600, Jim Fehlig wrote:
On 06/28/2013 08:33 AM, Dario Faggioli wrote:
diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c index 53af609..9bd6d99 100644
+/* NUMA node affinity information is available through libxl
+ * starting from Xen 4.3. */
+#ifdef LIBXL_HAVE_DOMAIN_NODEAFFINITY
+
+/* Number of Xen NUMA parameters */
+#define LIBXL_NUMA_NPARAM 2
Is there a similar definition in Xen? E.g. would future changes to libxl adding more parameters, but neglecting to update here, cause problems?
There is nothing like this in Xen. I went for this #define based approach because it is exactly what the qemu driver does for NUMA parameters and, also, what the libxl driver does for scheduler parameters. I therefore think that this could be considered safe, from that point of view.

What could happen is that more NUMA (and, perhaps, even more scheduler) parameters pop up in the future. However, given libxl's commitment to API stability, this will be advertised with something like a LIBXL_HAVE_THIS_NEW_NUMA_PARAM symbol. That means that, as soon as we want to support it, we will end up with something like:

#ifdef LIBXL_HAVE_THIS_NEW_NUMA_PARAM
# define LIBXL_NUMA_NPARAM 3
#else
# define LIBXL_NUMA_NPARAM 2
#endif

Is that reasonable?
+static int
+libxlDomainGetNumaParameters(virDomainPtr dom,
+                             virTypedParameterPtr params,
+                             int *nparams,
+                             unsigned int flags)
+{
+    libxlDriverPrivatePtr driver = dom->conn->privateData;
+    libxlDomainObjPrivatePtr priv;
+    virDomainObjPtr vm;
+
+    libxl_bitmap nodemap;
+    virBitmapPtr nodes = NULL;
+    char *nodeset = NULL;
+
+    int rc, ret = -1;
+    int i, j;
No need for the extra whitespace between these local variables.
Ok, done.
+        case 1:
+            /* Node affinity */
+
+            /* Let's allocate both libxl and libvirt bitmaps */
+            if (libxl_node_bitmap_alloc(priv->ctx, &nodemap, 0) ||
+                !(nodes = virBitmapNew(libxl_get_max_nodes(priv->ctx)))) {
+                virReportOOMError();
+                goto cleanup;
+            }
+
+            rc = libxl_domain_get_nodeaffinity(priv->ctx, vm->def->id,
+                                               &nodemap);
Fits on one line, albeit with nothing to spare :).
Does it? According to my editor here, if I put it on just one line, it ends at column 82. :-( I'm therefore changing it to

  rc = libxl_domain_get_nodeaffinity(priv->ctx, vm->def->id, &nodemap);

as you mentioned in another e-mail.
@@ -4741,6 +4876,9 @@ static virDriver libxlDriver = {
     .domainGetSchedulerParametersFlags = libxlDomainGetSchedulerParametersFlags, /* 0.9.2 */
     .domainSetSchedulerParameters = libxlDomainSetSchedulerParameters, /* 0.9.0 */
     .domainSetSchedulerParametersFlags = libxlDomainSetSchedulerParametersFlags, /* 0.9.2 */
+#ifdef LIBXL_HAVE_DOMAIN_NODEAFFINITY
+    .domainGetNumaParameters = libxlDomainGetNumaParameters, /* 1.1.1 */
+#endif
     .nodeGetFreeMemory = libxlNodeGetFreeMemory, /* 0.9.0 */
     .nodeGetCellsFreeMemory = libxlNodeGetCellsFreeMemory, /* 1.1.1 */
     .connectDomainEventRegister = libxlConnectDomainEventRegister, /* 0.9.0 */
Looks good with the exception of these minor nits, but needs to go along with an updated 3/4.
Sure.

Thanks and Regards,
Dario

On 06/28/2013 08:32 AM, Dario Faggioli wrote:
Hi Jim, Everyone,
First patch series, so maybe a small introduction is required: I'm Dario and I work for Citrix on improving the NUMA support of Xen.
Welcome Dario!
This patch series implements some of the missing bits and pieces, in the libxl driver, regarding obtaining per-host and per-domain NUMA related information. It's not the full libvirt NUMA interface, since we don't have all we would need for that in Xen yet (we will in the next version, 4.4), but it's certainly better than having nothing! :-)
Yep, incremental improvements are fine.
Basically, I'm enhancing capability reporting, to cover NUMA topology (patch 01), and I'm implementing nodeGetCellsFreeMemory (patch 02) and virDomainGetNumaParameters (patch 04) for the libxl driver. This last one requires at least Xen 4.3, so I put the implementation within the proper #ifdef-ery.
Ok, I'll comment in the individual patches too.
What I'm really not sure about is patch 03, which is something I need if I want patch 04 to function properly. Basically it is about advertising that the libxl driver supports VIR_TYPED_PARAM_STRING. I looked at how that is done in the qemu driver, but I'm not entirely sure I completely understood the logic behind it, so, please, tell me if I missed or misinterpreted anything!
I think we'll need another libvirt dev more familiar with this to verify, but afaict advertising that the driver supports VIR_TYPED_PARAM_STRING is required when returning a string in virTypedParam, for compatibility with older libvirt clients. Without it, strings wouldn't be returned to newer clients that support VIR_TYPED_PARAM_STRING.
In particular, should I have added more of those "flags &= ~VIR_TYPED_PARAM_STRING_OKAY;" statements, as it happens in the qemu driver? If yes, in which functions?
Once the driver advertises VIR_TYPED_PARAM_STRING, I think the functions taking virTypedParamPtr as a parameter must accommodate that, at least the getters. E.g. qemuDomainGetSchedulerParametersFlags() in the qemu driver has

    virCheckFlags(VIR_DOMAIN_AFFECT_LIVE |
                  VIR_DOMAIN_AFFECT_CONFIG |
                  VIR_TYPED_PARAM_STRING_OKAY, -1);

    /* We don't return strings, and thus trivially support this flag. */
    flags &= ~VIR_TYPED_PARAM_STRING_OKAY;

I think the same applies to libxlDomainGetSchedulerParametersFlags(), which looks to be the only getter function in the libxl driver taking virTypedParam.
Finally, allow me to say that it was a while that I wanted start hacking a bit on libvirt. I'm really glad I've eventually been able to do so, and I definitely plan to continue (with particular focus on NUMA related stuff).
Great to hear and looking forward to your work! Regards, Jim

On 07/01/2013 03:20 PM, Jim Fehlig wrote:
What I'm really not sure about is patch 03, which is something I need if I want patch 04 to function properly. Basically it is about advertising that the libxl driver supports VIR_TYPED_PARAM_STRING. I looked at how that is done in the qemu driver, but I'm not entirely sure I completely understood the logic behind it, so, please, tell me if I missed or misinterpreted anything!
I think we'll need another libvirt dev more familiar with this to verify, but afaict advertising that the driver supports VIR_TYPED_PARAM_STRING is required when returning a string in virTypedParam, for compatibility with older libvirt clients. Without it, strings wouldn't be returned to newer clients that support VIR_TYPED_PARAM_STRING.
Any new API added after VIR_TYPED_PARAM_STRING was added can unconditionally assume that all callers are prepared to handle the string. But for APIs that were created before string support was added, even if your driver doesn't implement the API until now, you are correct that you must advertise support for strings before returning any strings. About the only API I know of off-hand that uses typed parameters but was added after string support is the latest incantation of migration via parameters.

--
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On lun, 2013-07-01 at 15:27 -0600, Eric Blake wrote:
On 07/01/2013 03:20 PM, Jim Fehlig wrote:
What I'm really not sure about is patch 03, which is something I need if I want patch 04 to function properly. Basically it is about advertising that the libxl driver supports VIR_TYPED_PARAM_STRING. I looked at how that is done in the qemu driver, but I'm not entirely sure I completely understood the logic behind it, so, please, tell me if I missed or misinterpreted anything!
I think we'll need another libvirt dev more familiar with this to verify, but afaict advertising that the driver supports VIR_TYPED_PARAM_STRING is required when returning a string in virTypedParam, for compatibility with older libvirt clients. Without it, strings wouldn't be returned to newer clients that support VIR_TYPED_PARAM_STRING.
Any new API added after VIR_TYPED_PARAM_STRING was added can unconditionally assume that all callers are prepared to handle the string. But for APIs that were created before string support was added, even if your driver doesn't implement the API until now, you are correct that you must advertise support for strings before returning any strings.
Ok, I think I've understood this better now... I'll do as Jim suggests, and add the proper bits to the functions in the libxl driver that need them.

Thanks and Regards,
Dario

On lun, 2013-07-01 at 15:20 -0600, Jim Fehlig wrote:
On 06/28/2013 08:32 AM, Dario Faggioli wrote:
Hi Jim, Everyone,
First patch series, so maybe a small introduction is required: I'm Dario and I work for Citrix on improving the NUMA support of Xen.
Welcome Dario!
Hey! :-)
Basically, I'm enhancing capability reporting, to cover NUMA topology (patch 01), and I'm implementing nodeGetCellsFreeMemory (patch 02) and virDomainGetNumaParameters (patch 04) for the libxl driver. This last one requires at least Xen 4.3, so I put the implementation within the proper #ifdef-ery.
Ok, I'll comment in the individual patches too.
So, I just released v2, in which I think I addressed all the comments raised in this round... Let me know what you think.

Thanks again and Regards,
Dario
participants (3):
- Dario Faggioli
- Eric Blake
- Jim Fehlig