[libvirt] [PATCH v3 0/3] libxl: implement some chunks of the NUMA interface

Hi everyone again,

3rd round for the initial NUMA support in the libxl driver. I think I addressed all the comments that were raised during the v2 review (more details in the single changelogs). Let me know if there's more.

Thanks and Regards,
Dario

---

Dario Faggioli (3):
      libxl: implement NUMA capabilities reporting
      libxl: advertise the support for VIR_TYPED_PARAM_STRING
      libxl: implement virDomainGetNumaParameters

 src/libxl/libxl_conf.c   | 148 ++++++++++++++++++++++++++++++++++++++++++
 src/libxl/libxl_driver.c | 161 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 307 insertions(+), 2 deletions(-)

Starting from Xen 4.2, libxl has all the bits and pieces in place for retrieving an adequate amount of information about the host NUMA topology. It is therefore possible, after a bit of shuffling, to arrange that information in the way libvirt wants to present it to the outside world.

Therefore, with this patch, the <topology> section of the host capabilities is properly populated when running on Xen, so that we can figure out whether or not we're running on a NUMA host, and what its characteristics are.

[raistlin@Zhaman ~]$ sudo virsh --connect xen:/// capabilities
<capabilities>
  <host>
    <cpu>
    ....
    <topology>
      <cells num='2'>
        <cell id='0'>
          <memory unit='KiB'>6291456</memory>
          <cpus num='8'>
            <cpu id='0' socket_id='1' core_id='0' siblings='0-1'/>
            <cpu id='1' socket_id='1' core_id='0' siblings='0-1'/>
            <cpu id='2' socket_id='1' core_id='1' siblings='2-3'/>
            <cpu id='3' socket_id='1' core_id='1' siblings='2-3'/>
            <cpu id='4' socket_id='1' core_id='9' siblings='4-5'/>
            <cpu id='5' socket_id='1' core_id='9' siblings='4-5'/>
            <cpu id='6' socket_id='1' core_id='10' siblings='6-7'/>
            <cpu id='7' socket_id='1' core_id='10' siblings='6-7'/>
          </cpus>
        </cell>
        <cell id='1'>
          <memory unit='KiB'>6881280</memory>
          <cpus num='8'>
            <cpu id='8' socket_id='0' core_id='0' siblings='8-9'/>
            <cpu id='9' socket_id='0' core_id='0' siblings='8-9'/>
            <cpu id='10' socket_id='0' core_id='1' siblings='10-11'/>
            <cpu id='11' socket_id='0' core_id='1' siblings='10-11'/>
            <cpu id='12' socket_id='0' core_id='9' siblings='12-13'/>
            <cpu id='13' socket_id='0' core_id='9' siblings='12-13'/>
            <cpu id='14' socket_id='0' core_id='10' siblings='14-15'/>
            <cpu id='15' socket_id='0' core_id='10' siblings='14-15'/>
          </cpus>
        </cell>
      </cells>
    </topology>
  </host>
....

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Changes from v2:
 * iterators turned from int to size_t;
 * fixed wrong sibling maps if on same node but different socket;
 * code motion and error handling, as requested during review.

Changes from v1:
 * fixed a typo in the commit message, as requested during review;
 * fixed coding style (one function parameter per line and no spaces between variable definitions), as requested during review;
 * avoid zero-filling memory after VIR_ALLOC_N(), since it does that already, as requested during review;
 * improved out of memory error reporting, as requested during review;
 * libxlMakeNumaCapabilities() created, accommodating all the NUMA-related additions, instead of having them within libxlMakeCapabilitiesInternal(), as suggested during review.
---
 src/libxl/libxl_conf.c | 148 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 147 insertions(+), 1 deletion(-)

diff --git a/src/libxl/libxl_conf.c b/src/libxl/libxl_conf.c
index 4a0fba9..c097d1e 100644
--- a/src/libxl/libxl_conf.c
+++ b/src/libxl/libxl_conf.c
@@ -161,6 +161,117 @@ libxlBuildCapabilities(virArch hostarch,
 }

 static virCapsPtr
+libxlMakeNumaCapabilities(libxl_numainfo *numa_info,
+                          int nr_nodes,
+                          libxl_cputopology *cpu_topo,
+                          int nr_cpus,
+                          virCapsPtr caps)
+{
+    virCapsHostNUMACellCPUPtr *cpus = NULL;
+    int *nr_cpus_node = NULL;
+    bool numa_failed = false;
+    size_t i;
+
+    if (VIR_ALLOC_N(cpus, nr_nodes)) {
+        virReportOOMError();
+        return caps;
+    }
+
+    if (VIR_ALLOC_N(nr_cpus_node, nr_nodes)) {
+        VIR_FREE(cpus);
+        virReportOOMError();
+        return caps;
+    }
+
+    /* For each node, prepare a list of CPUs belonging to that node */
+    for (i = 0; i < nr_cpus; i++) {
+        int node = cpu_topo[i].node;
+
+        if (cpu_topo[i].core == LIBXL_CPUTOPOLOGY_INVALID_ENTRY)
+            continue;
+
+        nr_cpus_node[node]++;
+
+        if (nr_cpus_node[node] == 1) {
+            if (VIR_ALLOC(cpus[node]) < 0) {
+                virReportOOMError();
+                numa_failed = true;
+                goto cleanup;
+            }
+        }
+        else {
+            if (VIR_REALLOC_N(cpus[node], nr_cpus_node[node]) < 0) {
+                virReportOOMError();
+                numa_failed = true;
+                goto cleanup;
+            }
+        }
+
+        /* Mapping between what libxl tells and what libvirt wants */
+        cpus[node][nr_cpus_node[node]-1].id = i;
+        cpus[node][nr_cpus_node[node]-1].socket_id = cpu_topo[i].socket;
+        cpus[node][nr_cpus_node[node]-1].core_id = cpu_topo[i].core;
+        /* Allocate the siblings maps. We will be filling them later */
+        cpus[node][nr_cpus_node[node]-1].siblings = virBitmapNew(nr_cpus);
+        if (!cpus[node][nr_cpus_node[node]-1].siblings) {
+            virReportOOMError();
+            numa_failed = true;
+            goto cleanup;
+        }
+    }
+
+    /* Let's now populate the siblings bitmaps */
+    for (i = 0; i < nr_cpus; i++) {
+        int node = cpu_topo[i].node;
+        size_t j;
+
+        if (cpu_topo[i].core == LIBXL_CPUTOPOLOGY_INVALID_ENTRY)
+            continue;
+
+        for (j = 0; j < nr_cpus_node[node]; j++) {
+            if (cpus[node][j].socket_id == cpu_topo[i].socket &&
+                cpus[node][j].core_id == cpu_topo[i].core)
+                ignore_value(virBitmapSetBit(cpus[node][j].siblings, i));
+        }
+    }
+
+    for (i = 0; i < nr_nodes; i++) {
+        if (numa_info[i].size == LIBXL_NUMAINFO_INVALID_ENTRY)
+            continue;
+
+        if (virCapabilitiesAddHostNUMACell(caps, i, nr_cpus_node[i],
+                                           numa_info[i].size / 1024,
+                                           cpus[i]) < 0) {
+            virCapabilitiesClearHostNUMACellCPUTopology(cpus[i],
+                                                        nr_cpus_node[i]);
+            numa_failed = true;
+            goto cleanup;
+        }
+
+        /* This is safe, as the CPU list is now stored in the NUMA cell */
+        cpus[i] = NULL;
+    }
+
+ cleanup:
+
+    if (numa_failed) {
+        /* Looks like something went wrong. Well, that's bad, but probably
+         * not enough to break the whole driver, so we log and carry on */
+        for (i = 0; i < nr_nodes; i++) {
+            VIR_FREE(cpus[i]);
+        }
+        VIR_WARN("Failed to retrieve and build host NUMA topology properly,\n"
+                 "disabling NUMA capabilities");
+        virCapabilitiesFreeNUMAInfo(caps);
+    }
+
+    VIR_FREE(cpus);
+    VIR_FREE(nr_cpus_node);
+
+    return caps;
+}
+
+static virCapsPtr
 libxlMakeCapabilitiesInternal(virArch hostarch,
                               libxl_physinfo *phy_info,
                               char *capabilities)
@@ -764,7 +875,11 @@ libxlMakeCapabilities(libxl_ctx *ctx)
 {
     int err;
     libxl_physinfo phy_info;
+    libxl_numainfo *numa_info = NULL;
+    libxl_cputopology *cpu_topo = NULL;
     const libxl_version_info *ver_info;
+    int nr_nodes = 0, nr_cpus = 0;
+    virCapsPtr caps;

     err = regcomp(&xen_cap_rec, xen_cap_re, REG_EXTENDED);
     if (err != 0) {
@@ -788,9 +903,40 @@ libxlMakeCapabilities(libxl_ctx *ctx)
         return NULL;
     }

-    return libxlMakeCapabilitiesInternal(virArchFromHost(),
+    caps = libxlMakeCapabilitiesInternal(virArchFromHost(),
                                          &phy_info,
                                          ver_info->capabilities);
+
+    /* Check if caps is valid. If it is, it must remain so till the end! */
+    if (caps == NULL)
+        goto out;
+
+    /* Let's try to fetch NUMA info now (not critical in case we fail) */
+    numa_info = libxl_get_numainfo(ctx, &nr_nodes);
+    if (numa_info == NULL)
+        VIR_WARN("libxl_get_numainfo failed to retrieve NUMA data");
+    else {
+        /* If the above failed, we'd have no NUMa caps anyway! */
+        cpu_topo = libxl_get_cpu_topology(ctx, &nr_cpus);
+        if (cpu_topo == NULL) {
+            VIR_WARN("libxl_get_cpu_topology failed to retrieve topology");
+            libxl_numainfo_list_free(numa_info, nr_nodes);
+        }
+        else {
+            /* And add topology information to caps */
+            caps = libxlMakeNumaCapabilities(numa_info,
+                                             nr_nodes,
+                                             cpu_topo,
+                                             nr_cpus,
+                                             caps);
+        }
+    }
+
+    libxl_cputopology_list_free(cpu_topo, nr_cpus);
+    libxl_numainfo_list_free(numa_info, nr_nodes);
+
+out:
+    return caps;
 }

 int

On Sat, Jul 13, 2013 at 02:27:03AM +0200, Dario Faggioli wrote:
Starting from Xen 4.2, libxl has all the bits and pieces in place for retrieving an adequate amount of information about the host NUMA topology. It is therefore possible, after a bit of shuffling, to arrange that information in the way libvirt wants to present it to the outside world.
Therefore, with this patch, the <topology> section of the host capabilities is properly populated when running on Xen, so that we can figure out whether or not we're running on a NUMA host, and what its characteristics are.
[raistlin@Zhaman ~]$ sudo virsh --connect xen:/// capabilities
<capabilities>
  <host>
    <cpu>
    ....
    <topology>
      <cells num='2'>
        <cell id='0'>
          <memory unit='KiB'>6291456</memory>
          <cpus num='8'>
            <cpu id='0' socket_id='1' core_id='0' siblings='0-1'/>
            <cpu id='1' socket_id='1' core_id='0' siblings='0-1'/>
            <cpu id='2' socket_id='1' core_id='1' siblings='2-3'/>
            <cpu id='3' socket_id='1' core_id='1' siblings='2-3'/>
            <cpu id='4' socket_id='1' core_id='9' siblings='4-5'/>
            <cpu id='5' socket_id='1' core_id='9' siblings='4-5'/>
            <cpu id='6' socket_id='1' core_id='10' siblings='6-7'/>
            <cpu id='7' socket_id='1' core_id='10' siblings='6-7'/>
          </cpus>
        </cell>
        <cell id='1'>
          <memory unit='KiB'>6881280</memory>
          <cpus num='8'>
            <cpu id='8' socket_id='0' core_id='0' siblings='8-9'/>
            <cpu id='9' socket_id='0' core_id='0' siblings='8-9'/>
            <cpu id='10' socket_id='0' core_id='1' siblings='10-11'/>
            <cpu id='11' socket_id='0' core_id='1' siblings='10-11'/>
            <cpu id='12' socket_id='0' core_id='9' siblings='12-13'/>
            <cpu id='13' socket_id='0' core_id='9' siblings='12-13'/>
            <cpu id='14' socket_id='0' core_id='10' siblings='14-15'/>
            <cpu id='15' socket_id='0' core_id='10' siblings='14-15'/>
          </cpus>
        </cell>
      </cells>
    </topology>
  </host>
....
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Changes from v2:
 * iterators turned from int to size_t;
 * fixed wrong sibling maps if on same node but different socket;
 * code motion and error handling, as requested during review.

Changes from v1:
 * fixed a typo in the commit message, as requested during review;
 * fixed coding style (one function parameter per line and no spaces between variable definitions), as requested during review;
 * avoid zero-filling memory after VIR_ALLOC_N(), since it does that already, as requested during review;
 * improved out of memory error reporting, as requested during review;
 * libxlMakeNumaCapabilities() created, accommodating all the NUMA-related additions, instead of having them within libxlMakeCapabilitiesInternal(), as suggested during review.
---
 src/libxl/libxl_conf.c | 148 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 147 insertions(+), 1 deletion(-)
diff --git a/src/libxl/libxl_conf.c b/src/libxl/libxl_conf.c
index 4a0fba9..c097d1e 100644
--- a/src/libxl/libxl_conf.c
+++ b/src/libxl/libxl_conf.c
@@ -161,6 +161,117 @@ libxlBuildCapabilities(virArch hostarch,
 }

 static virCapsPtr
+libxlMakeNumaCapabilities(libxl_numainfo *numa_info,
+                          int nr_nodes,
+                          libxl_cputopology *cpu_topo,
+                          int nr_cpus,
+                          virCapsPtr caps)
+{
+    virCapsHostNUMACellCPUPtr *cpus = NULL;
+    int *nr_cpus_node = NULL;
+    bool numa_failed = false;
+    size_t i;
+
+    if (VIR_ALLOC_N(cpus, nr_nodes)) {
+        virReportOOMError();
+        return caps;
+    }
+
+    if (VIR_ALLOC_N(nr_cpus_node, nr_nodes)) {
+        VIR_FREE(cpus);
+        virReportOOMError();
+        return caps;
+    }
Afraid some more changes have landed in GIT that affect you. All the VIR_*ALLOC* functions, except those suffixed _QUIET, will now call virReportOOMError() for you. So you can remove virReportOOMError() here and in all the other places related to these alloc functions. Also, you should be checking 'VIR_ALLOC_N() < 0', not just for a non-zero value.
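[Editor's note: for reference, a minimal sketch of how the two allocation checks might look once rebased on that change. This is not from the patch; it only assumes the updated VIR_ALLOC_N() semantics described above, where OOM is already reported and failure is signalled by a negative return value.]

    /* Sketch only: VIR_ALLOC_N() reports OOM itself and returns < 0 on failure */
    if (VIR_ALLOC_N(cpus, nr_nodes) < 0)
        return caps;

    if (VIR_ALLOC_N(nr_cpus_node, nr_nodes) < 0) {
        VIR_FREE(cpus);
        return caps;
    }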
+
+    /* For each node, prepare a list of CPUs belonging to that node */
+    for (i = 0; i < nr_cpus; i++) {
+        int node = cpu_topo[i].node;
+
+        if (cpu_topo[i].core == LIBXL_CPUTOPOLOGY_INVALID_ENTRY)
+            continue;
+
+        nr_cpus_node[node]++;
+
+        if (nr_cpus_node[node] == 1) {
+            if (VIR_ALLOC(cpus[node]) < 0) {
+                virReportOOMError();
+                numa_failed = true;
+                goto cleanup;
+            }
+        }
+        else {
+            if (VIR_REALLOC_N(cpus[node], nr_cpus_node[node]) < 0) {
+                virReportOOMError();
+                numa_failed = true;
+                goto cleanup;
+            }
+        }
+
+        /* Mapping between what libxl tells and what libvirt wants */
+        cpus[node][nr_cpus_node[node]-1].id = i;
+        cpus[node][nr_cpus_node[node]-1].socket_id = cpu_topo[i].socket;
+        cpus[node][nr_cpus_node[node]-1].core_id = cpu_topo[i].core;
+        /* Allocate the siblings maps. We will be filling them later */
+        cpus[node][nr_cpus_node[node]-1].siblings = virBitmapNew(nr_cpus);
+        if (!cpus[node][nr_cpus_node[node]-1].siblings) {
+            virReportOOMError();
+            numa_failed = true;
+            goto cleanup;
+        }
+    }
+
+    /* Let's now populate the siblings bitmaps */
+    for (i = 0; i < nr_cpus; i++) {
+        int node = cpu_topo[i].node;
+        size_t j;
+
+        if (cpu_topo[i].core == LIBXL_CPUTOPOLOGY_INVALID_ENTRY)
+            continue;
+
+        for (j = 0; j < nr_cpus_node[node]; j++) {
+            if (cpus[node][j].socket_id == cpu_topo[i].socket &&
+                cpus[node][j].core_id == cpu_topo[i].core)
+                ignore_value(virBitmapSetBit(cpus[node][j].siblings, i));
+        }
+    }
+
+    for (i = 0; i < nr_nodes; i++) {
+        if (numa_info[i].size == LIBXL_NUMAINFO_INVALID_ENTRY)
+            continue;
+
+        if (virCapabilitiesAddHostNUMACell(caps, i, nr_cpus_node[i],
+                                           numa_info[i].size / 1024,
+                                           cpus[i]) < 0) {
+            virCapabilitiesClearHostNUMACellCPUTopology(cpus[i],
+                                                        nr_cpus_node[i]);
+            numa_failed = true;
+            goto cleanup;
+        }
+
+        /* This is safe, as the CPU list is now stored in the NUMA cell */
+        cpus[i] = NULL;
+    }
+
+ cleanup:
+
+    if (numa_failed) {
+        /* Looks like something went wrong. Well, that's bad, but probably
+         * not enough to break the whole driver, so we log and carry on */
+        for (i = 0; i < nr_nodes; i++) {
+            VIR_FREE(cpus[i]);
+        }
+        VIR_WARN("Failed to retrieve and build host NUMA topology properly,\n"
+                 "disabling NUMA capabilities");
+        virCapabilitiesFreeNUMAInfo(caps);
This is wrong. All the cases of error leading to this point are out of memory errors. It is madness to pretend everything is OK and carry on here. This should be treated as a proper error.
@@ -788,9 +903,40 @@ libxlMakeCapabilities(libxl_ctx *ctx)
         return NULL;
     }

-    return libxlMakeCapabilitiesInternal(virArchFromHost(),
+    caps = libxlMakeCapabilitiesInternal(virArchFromHost(),
                                          &phy_info,
                                          ver_info->capabilities);
+
+    /* Check if caps is valid. If it is, it must remain so till the end! */
+    if (caps == NULL)
+        goto out;
+
+    /* Let's try to fetch NUMA info now (not critical in case we fail) */
+    numa_info = libxl_get_numainfo(ctx, &nr_nodes);
+    if (numa_info == NULL)
+        VIR_WARN("libxl_get_numainfo failed to retrieve NUMA data");
Under what scenario can libxl_get_numainfo() return NULL? Unless this is a valid, expected scenario, we should treat this as an error.
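[Editor's note: for illustration, treating the failure as a hard error could look roughly like the sketch below. This is not from the patch; the 'error' label and the message text are hypothetical.]

    numa_info = libxl_get_numainfo(ctx, &nr_nodes);
    if (numa_info == NULL) {
        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                       _("libxl_get_numainfo failed to retrieve NUMA data"));
        goto error;  /* hypothetical error path that ends up returning NULL */
    }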
+    else {
+        /* If the above failed, we'd have no NUMa caps anyway! */
+        cpu_topo = libxl_get_cpu_topology(ctx, &nr_cpus);
+        if (cpu_topo == NULL) {
+            VIR_WARN("libxl_get_cpu_topology failed to retrieve topology");
+            libxl_numainfo_list_free(numa_info, nr_nodes);
+        }
Likewise here.
+        else {
+            /* And add topology information to caps */
+            caps = libxlMakeNumaCapabilities(numa_info,
+                                             nr_nodes,
+                                             cpu_topo,
+                                             nr_cpus,
+                                             caps);
+        }
+    }
+
+    libxl_cputopology_list_free(cpu_topo, nr_cpus);
+    libxl_numainfo_list_free(numa_info, nr_nodes);
+
+out:
s/out/cleanup/ is our preferred label naming
+    return caps;
 }
Regards,
Daniel

On mar, 2013-07-16 at 10:41 +0100, Daniel P. Berrange wrote:
On Sat, Jul 13, 2013 at 02:27:03AM +0200, Dario Faggioli wrote:
[..]
diff --git a/src/libxl/libxl_conf.c b/src/libxl/libxl_conf.c
index 4a0fba9..c097d1e 100644
--- a/src/libxl/libxl_conf.c
+++ b/src/libxl/libxl_conf.c
@@ -161,6 +161,117 @@ libxlBuildCapabilities(virArch hostarch,
 }

 static virCapsPtr
+libxlMakeNumaCapabilities(libxl_numainfo *numa_info,
+                          int nr_nodes,
+                          libxl_cputopology *cpu_topo,
+                          int nr_cpus,
+                          virCapsPtr caps)
+{
+    virCapsHostNUMACellCPUPtr *cpus = NULL;
+    int *nr_cpus_node = NULL;
+    bool numa_failed = false;
+    size_t i;
+
+    if (VIR_ALLOC_N(cpus, nr_nodes)) {
+        virReportOOMError();
+        return caps;
+    }
+
+    if (VIR_ALLOC_N(nr_cpus_node, nr_nodes)) {
+        VIR_FREE(cpus);
+        virReportOOMError();
+        return caps;
+    }
Afraid some more changes have landed in GIT that affect you.
All the VIR_*ALLOC* functions, except those suffixed _QUIET, will now call virReportOOMError() for you. So you can remove virReportOOMError() here and in all the other places related to these alloc functions.
Yeah, I saw that after sending the patches. I can't access my testbox for this whole week, so a repost will have to wait until Monday. Once at home, I'll rebase on top of that and resubmit. Thanks for pointing that out.
Also, you should be checking 'VIR_ALLOC_N() < 0', not just for a non-zero value.
Ok, I see. Will do that.
+ cleanup:
+
+    if (numa_failed) {
+        /* Looks like something went wrong. Well, that's bad, but probably
+         * not enough to break the whole driver, so we log and carry on */
+        for (i = 0; i < nr_nodes; i++) {
+            VIR_FREE(cpus[i]);
+        }
+        VIR_WARN("Failed to retrieve and build host NUMA topology properly,\n"
+                 "disabling NUMA capabilities");
+        virCapabilitiesFreeNUMAInfo(caps);
This is wrong. All the cases of error leading to this point are out of memory errors. It is madness to pretend everything is OK and carry on here. This should be treated as a proper error.
Fair enough, will do that too.
@@ -788,9 +903,40 @@ libxlMakeCapabilities(libxl_ctx *ctx)
         return NULL;
     }

-    return libxlMakeCapabilitiesInternal(virArchFromHost(),
+    caps = libxlMakeCapabilitiesInternal(virArchFromHost(),
                                          &phy_info,
                                          ver_info->capabilities);
+
+    /* Check if caps is valid. If it is, it must remain so till the end! */
+    if (caps == NULL)
+        goto out;
+
+    /* Let's try to fetch NUMA info now (not critical in case we fail) */
+    numa_info = libxl_get_numainfo(ctx, &nr_nodes);
+    if (numa_info == NULL)
+        VIR_WARN("libxl_get_numainfo failed to retrieve NUMA data");
Under what scenario can libxl_get_numainfo() return NULL? Unless this is a valid, expected scenario, we should treat this as an error.
There are indeed a couple of possible reasons. Actually, I saw that the qemu driver does pretty much the same, i.e., if retrieving NUMA information fails, it gives up on that, but does not make things explode, and I really think that makes sense.

The actual possible failure reasons are: (1) it cannot prepare the parameters for the hypercall, or (2) the hypercall fails. It is true that, in both cases, something really serious might have happened, but there is no way to tell that from here. Thus, I honestly think that trying to carry on is sound... If it is really the case that some critical component died, we'll find out soon enough.

What do you think?
+    else {
+        /* If the above failed, we'd have no NUMa caps anyway! */
+        cpu_topo = libxl_get_cpu_topology(ctx, &nr_cpus);
+        if (cpu_topo == NULL) {
+            VIR_WARN("libxl_get_cpu_topology failed to retrieve topology");
+            libxl_numainfo_list_free(numa_info, nr_nodes);
+        }
Likewise here.
Likewise here for me too. :-)
+        else {
+            /* And add topology information to caps */
+            caps = libxlMakeNumaCapabilities(numa_info,
+                                             nr_nodes,
+                                             cpu_topo,
+                                             nr_cpus,
+                                             caps);
+        }
+    }
+
+    libxl_cputopology_list_free(cpu_topo, nr_cpus);
+    libxl_numainfo_list_free(numa_info, nr_nodes);
+
+out:
s/out/cleanup/ is our preferred label naming
Ok.

Thanks a lot for the review,
Dario

On Tue, Jul 16, 2013 at 11:10:25AM +0100, Dario Faggioli wrote:
On mar, 2013-07-16 at 10:41 +0100, Daniel P. Berrange wrote:
On Sat, Jul 13, 2013 at 02:27:03AM +0200, Dario Faggioli wrote:
@@ -788,9 +903,40 @@ libxlMakeCapabilities(libxl_ctx *ctx)
         return NULL;
     }

-    return libxlMakeCapabilitiesInternal(virArchFromHost(),
+    caps = libxlMakeCapabilitiesInternal(virArchFromHost(),
                                          &phy_info,
                                          ver_info->capabilities);
+
+    /* Check if caps is valid. If it is, it must remain so till the end! */
+    if (caps == NULL)
+        goto out;
+
+    /* Let's try to fetch NUMA info now (not critical in case we fail) */
+    numa_info = libxl_get_numainfo(ctx, &nr_nodes);
+    if (numa_info == NULL)
+        VIR_WARN("libxl_get_numainfo failed to retrieve NUMA data");
Under what scenario can libxl_get_numainfo() return NULL? Unless this is a valid, expected scenario, we should treat this as an error.
There are indeed a couple of possible reasons. Actually, I saw that the qemu driver does pretty much the same, i.e., if retrieving NUMA information fails, it gives up on that, but does not make things explode, and I really think it is something that makes sense.
The reason the QEMU driver does that is that libnuma will return an error if the host machine does not expose NUMA info in its BIOS. This is an expected, valid scenario, so we have to ignore the error and libnuma provides no way to distinguish this valid scenario from other errors.
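[Editor's note: a rough illustration of the libnuma behaviour described above, not libvirt code. numa_available() returns -1 both when the host exposes no NUMA information and on genuine failures, so a caller cannot tell the two apart and typically just skips building NUMA capabilities.]

    #include <numa.h>

    /* Returns 1 if the NUMA API is usable on this host, 0 otherwise.
     * A negative value from numa_available() covers both "no NUMA info
     * exposed" and real errors -- they are indistinguishable here. */
    static int
    host_has_numa(void)
    {
        return numa_available() >= 0;
    }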
The actual possible failure reasons are: (1) it cannot prepare the parameters for the hypercall, or (2) the hypercall fails. It is true that, in both cases, something really serious might have happened, but there is no way to tell it from here. Thus, I honestly think that trying to carry on is sound... If it is really the case that some critical component died, we'll find out soon enough.
The only scenario in which it is acceptable to ignore the failure is if the physical hardware does not support NUMA. The question is whether the Xen API lets you distinguish that scenario from other types of errors.

Daniel

On mar, 2013-07-16 at 11:18 +0100, Daniel P. Berrange wrote:
On Tue, Jul 16, 2013 at 11:10:25AM +0100, Dario Faggioli wrote:
There are indeed a couple of possible reasons. Actually, I saw that the qemu driver does pretty much the same, i.e., if retrieving NUMA information fails, it gives up on that, but does not make things explode, and I really think it is something that makes sense.
The reason the QEMU driver does that is that libnuma will return an error if the host machine does not expose NUMA info in its BIOS. This is an expected, valid scenario, so we have to ignore the error and libnuma provides no way to distinguish this valid scenario from other errors.
Ok, I see.
The actual possible failure reasons are: (1) it cannot prepare the parameters for the hypercall, or (2) the hypercall fails. It is true that, in both cases, something really serious might have happened, but there is no way to tell it from here. Thus, I honestly think that trying to carry on is sound... If it is really the case that some critical component died, we'll find out soon enough.
The only scenario in which it is acceptable to ignore the failure is if the physical hardware does not support NUMA. The question is whether the Xen API lets you distinguish that scenario, from other types of errors.
Mmm... I see what you mean. Well, I guess it depends on what you mean by 'not support'. If we're talking about a box with only 1 NUMA node, and we call that a non-NUMA box, then libxl will just tell you that there is only that 1 node, and that is already being taken care of properly (see Jim's testing of the series). If "host machine does not expose NUMA info in its BIOS" means something different than that, then I've just never put my hands on such a machine, so I really am not sure what Xen would tell libxl about it.

Anyway, it now looks to me too that I should not special-case errors happening in libxl_get_numainfo(), and should treat them like all the other ones. I'll do that.

Thanks again and Regards,
Dario

domainGetNumaParameters has a string typed parameter, hence it is necessary for the libxl driver to support this.

This change implements the connectSupportsFeature hook for the libxl driver, advertising that VIR_DRV_FEATURE_TYPED_PARAM_STRING is supported.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Eric Blake <eblake@redhat.com>
---
Changes from v1:
 * commit message improved, as requested during review;
 * added VIR_TYPED_PARAM_STRING_OKAY handling code in
   libxlDomainGetSchedulerParametersFlags();
---
 src/libxl/libxl_driver.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c
index 21c5162..a5addab 100644
--- a/src/libxl/libxl_driver.c
+++ b/src/libxl/libxl_driver.c
@@ -4342,7 +4342,10 @@ libxlDomainGetSchedulerParametersFlags(virDomainPtr dom,
     libxl_scheduler sched_id;
     int ret = -1;

-    virCheckFlags(0, -1);
+    virCheckFlags(VIR_TYPED_PARAM_STRING_OKAY, -1);
+
+    /* We don't return strings, and thus trivially support this flag. */
+    flags &= ~VIR_TYPED_PARAM_STRING_OKAY;

     libxlDriverLock(driver);
     vm = virDomainObjListFindByUUID(driver->domains, dom->uuid);
@@ -4643,6 +4646,20 @@ libxlConnectListAllDomains(virConnectPtr conn,
     return ret;
 }

+/* Which features are supported by this driver? */
+static int
+libxlConnectSupportsFeature(virConnectPtr conn, int feature)
+{
+    if (virConnectSupportsFeatureEnsureACL(conn) < 0)
+        return -1;
+
+    switch (feature) {
+    case VIR_DRV_FEATURE_TYPED_PARAM_STRING:
+        return 1;
+    default:
+        return 0;
+    }
+}

 static virDriver libxlDriver = {
@@ -4723,6 +4740,7 @@ static virDriver libxlDriver = {
     .connectDomainEventRegisterAny = libxlConnectDomainEventRegisterAny, /* 0.9.0 */
     .connectDomainEventDeregisterAny = libxlConnectDomainEventDeregisterAny, /* 0.9.0 */
     .connectIsAlive = libxlConnectIsAlive, /* 0.9.8 */
+    .connectSupportsFeature = libxlConnectSupportsFeature, /* 1.1.1 */
 };

 static virStateDriver libxlStateDriver = {

On Sat, Jul 13, 2013 at 02:27:10AM +0200, Dario Faggioli wrote:
domainGetNumaParameters has a string typed parameter, hence it is necessary for the libxl driver to support this.
This change implements the connectSupportsFeature hook for the libxl driver, advertising that VIR_DRV_FEATURE_TYPED_PARAM_STRING is supported.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Eric Blake <eblake@redhat.com>
---
Changes from v1:
 * commit message improved, as requested during review;
 * added VIR_TYPED_PARAM_STRING_OKAY handling code in
   libxlDomainGetSchedulerParametersFlags();
---
 src/libxl/libxl_driver.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)
ACK

Daniel

Daniel P. Berrange wrote:
On Sat, Jul 13, 2013 at 02:27:10AM +0200, Dario Faggioli wrote:
domainGetNumaParameters has a string typed parameter, hence it is necessary for the libxl driver to support this.
This change implements the connectSupportsFeature hook for the libxl driver, advertising that VIR_DRV_FEATURE_TYPED_PARAM_STRING is supported.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Eric Blake <eblake@redhat.com>
---
Changes from v1:
 * commit message improved, as requested during review;
 * added VIR_TYPED_PARAM_STRING_OKAY handling code in
   libxlDomainGetSchedulerParametersFlags();
---
 src/libxl/libxl_driver.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)
ACK
I've pushed patches 2 and 3 now. Thanks Dario!

Regards,
Jim

On mer, 2013-07-17 at 10:59 -0600, Jim Fehlig wrote:
Daniel P. Berrange wrote:
On Sat, Jul 13, 2013 at 02:27:10AM +0200, Dario Faggioli wrote:
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Cc: Eric Blake <eblake@redhat.com>
---
Changes from v1:
 * commit message improved, as requested during review;
 * added VIR_TYPED_PARAM_STRING_OKAY handling code in
   libxlDomainGetSchedulerParametersFlags();
---
 src/libxl/libxl_driver.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)
ACK
I've pushed patches 2 and 3 now. Thanks Dario!
Sweet... Thanks to you! As I said, I'm away from my dev/test environment right now, and I'll be on vacation Mon to Wed (inclusive) next week. I'll send an updated version of the last remaining patch ASAP.

Thanks again and Regards,
Dario

It depends on Xen >= 4.3, though (by using the proper libxl feature flag).

Xen currently implements a NUMA placement policy which is basically the same as the 'interleaved' policy of `numactl', although it can be applied on a subset of the available nodes. We therefore hardcode "interleave" as 'numa_mode', and we use the newly introduced libxl interface to figure out what nodes a domain spans ('numa_nodeset').

With this change, it is now possible to query the NUMA node affinity of a running domain:

[raistlin@Zhaman ~]$ sudo virsh --connect xen:/// list
 Id    Name                           State
----------------------------------------------------
 23    F18_x64                        running

[raistlin@Zhaman ~]$ sudo virsh --connect xen:/// numatune 23
numa_mode      : interleave
numa_nodeset   : 1

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Changes from v2:
 * iterators turned from int to size_t

Changes from v1:
 * fixed coding style, as requested during review;
 * VIR_TYPED_PARAM_STRING_OKAY handled more properly;
---
 src/libxl/libxl_driver.c | 141 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 141 insertions(+)

diff --git a/src/libxl/libxl_driver.c b/src/libxl/libxl_driver.c
index a5addab..358d387 100644
--- a/src/libxl/libxl_driver.c
+++ b/src/libxl/libxl_driver.c
@@ -28,6 +28,7 @@
 #include <math.h>

 #include <libxl.h>
+#include <libxl_utils.h>
 #include <fcntl.h>
 #include <regex.h>

@@ -4499,6 +4500,143 @@ libxlDomainSetSchedulerParameters(virDomainPtr dom, virTypedParameterPtr params,
     return libxlDomainSetSchedulerParametersFlags(dom, params, nparams, 0);
 }

+/* NUMA node affinity information is available through libxl
+ * starting from Xen 4.3. */
+#ifdef LIBXL_HAVE_DOMAIN_NODEAFFINITY
+
+/* Number of Xen NUMA parameters */
+# define LIBXL_NUMA_NPARAM 2
+
+static int
+libxlDomainGetNumaParameters(virDomainPtr dom,
+                             virTypedParameterPtr params,
+                             int *nparams,
+                             unsigned int flags)
+{
+    libxlDriverPrivatePtr driver = dom->conn->privateData;
+    libxlDomainObjPrivatePtr priv;
+    virDomainObjPtr vm;
+    libxl_bitmap nodemap;
+    virBitmapPtr nodes = NULL;
+    char *nodeset = NULL;
+    int rc, ret = -1;
+    size_t i, j;
+
+    /* In Xen 4.3, it is possible to query the NUMA node affinity of a domain
+     * via libxl, but not to change it. We therefore only allow AFFECT_LIVE. */
+    virCheckFlags(VIR_DOMAIN_AFFECT_LIVE |
+                  VIR_TYPED_PARAM_STRING_OKAY, -1);
+
+    /* We blindly return a string, and let libvirt.c and remote_driver.c do
+     * the filtering on behalf of older clients that can't parse it. */
+    flags &= ~VIR_TYPED_PARAM_STRING_OKAY;
+
+    libxlDriverLock(driver);
+    vm = virDomainObjListFindByUUID(driver->domains, dom->uuid);
+    libxlDriverUnlock(driver);
+
+    if (!vm) {
+        virReportError(VIR_ERR_NO_DOMAIN, "%s",
+                       _("no domain with matching uuid"));
+        goto cleanup;
+    }
+
+    if (virDomainGetNumaParametersEnsureACL(dom->conn, vm->def) < 0)
+        goto cleanup;
+
+    if (!virDomainObjIsActive(vm)) {
+        virReportError(VIR_ERR_OPERATION_INVALID, "%s",
+                       _("Domain is not running"));
+        goto cleanup;
+    }
+
+    priv = vm->privateData;
+
+    libxl_bitmap_init(&nodemap);
+
+    if ((*nparams) == 0) {
+        *nparams = LIBXL_NUMA_NPARAM;
+        ret = 0;
+        goto cleanup;
+    }
+
+    for (i = 0; i < LIBXL_NUMA_NPARAM && i < *nparams; i++) {
+        virMemoryParameterPtr param = &params[i];
+
+        switch (i) {
+        case 0:
+            /* NUMA mode */
+
+            /* Xen implements something that is really close to numactl's
+             * 'interleave' policy (see `man 8 numactl' for details). */
+            if (virTypedParameterAssign(param, VIR_DOMAIN_NUMA_MODE,
+                                        VIR_TYPED_PARAM_INT,
+                                        VIR_DOMAIN_NUMATUNE_MEM_INTERLEAVE) < 0)
+                goto cleanup;
+
+            break;
+
+        case 1:
+            /* Node affinity */
+
+            /* Let's allocate both libxl and libvirt bitmaps */
+            if (libxl_node_bitmap_alloc(priv->ctx, &nodemap, 0) ||
+                !(nodes = virBitmapNew(libxl_get_max_nodes(priv->ctx)))) {
+                virReportOOMError();
+                goto cleanup;
+            }
+
+            rc = libxl_domain_get_nodeaffinity(priv->ctx,
+                                               vm->def->id,
+                                               &nodemap);
+            if (rc != 0) {
+                virReportSystemError(-rc, "%s",
+                                     _("unable to get numa affinity"));
+                goto cleanup;
+            }
+
+            /* First, we convert libxl_bitmap into virBitmap. After that,
+             * we format virBitmap as a string that can be returned. */
+            virBitmapClearAll(nodes);
+            libxl_for_each_set_bit(j, nodemap) {
+                if (virBitmapSetBit(nodes, j)) {
+                    virReportError(VIR_ERR_INTERNAL_ERROR,
+                                   _("Node %zu out of range"), j);
+                    goto cleanup;
+                }
+            }
+
+            nodeset = virBitmapFormat(nodes);
+            if (!nodeset && VIR_STRDUP(nodeset, "") < 0)
+                goto cleanup;
+
+            if (virTypedParameterAssign(param, VIR_DOMAIN_NUMA_NODESET,
+                                        VIR_TYPED_PARAM_STRING, nodeset) < 0)
+                goto cleanup;
+
+            nodeset = NULL;
+
+            break;
+
+        default:
+            break;
+        }
+    }
+
+    if (*nparams > LIBXL_NUMA_NPARAM)
+        *nparams = LIBXL_NUMA_NPARAM;
+    ret = 0;
+
+cleanup:
+    VIR_FREE(nodeset);
+    virBitmapFree(nodes);
+    libxl_bitmap_dispose(&nodemap);
+    if (vm)
+        virObjectUnlock(vm);
+    return ret;
+}
+#endif
+
 static int
 libxlDomainIsActive(virDomainPtr dom)
 {
@@ -4727,6 +4865,9 @@ static virDriver libxlDriver = {
     .domainGetSchedulerParametersFlags = libxlDomainGetSchedulerParametersFlags, /* 0.9.2 */
     .domainSetSchedulerParameters = libxlDomainSetSchedulerParameters, /* 0.9.0 */
     .domainSetSchedulerParametersFlags = libxlDomainSetSchedulerParametersFlags, /* 0.9.2 */
+#ifdef LIBXL_HAVE_DOMAIN_NODEAFFINITY
+    .domainGetNumaParameters = libxlDomainGetNumaParameters, /* 1.1.1 */
+#endif
     .nodeGetFreeMemory = libxlNodeGetFreeMemory, /* 0.9.0 */
     .nodeGetCellsFreeMemory = libxlNodeGetCellsFreeMemory, /* 1.1.1 */
     .connectDomainEventRegister = libxlConnectDomainEventRegister, /* 0.9.0 */
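[Editor's note: for completeness, a hedged client-side sketch, not part of the patch, showing how this new driver entry point is exercised through the public virDomainGetNumaParameters() API. The domain name "F18_x64" is just the one from the example above; error handling is kept minimal.]

    #include <stdio.h>
    #include <stdlib.h>
    #include <libvirt/libvirt.h>

    int main(void)
    {
        virConnectPtr conn = virConnectOpen("xen:///");
        virDomainPtr dom = conn ? virDomainLookupByName(conn, "F18_x64") : NULL;
        virTypedParameterPtr params = NULL;
        int nparams = 0, i;

        if (!dom)
            return 1;

        /* First call with *nparams == 0 only reports how many parameters exist */
        if (virDomainGetNumaParameters(dom, NULL, &nparams, 0) == 0 && nparams > 0) {
            params = calloc(nparams, sizeof(*params));
            if (params &&
                virDomainGetNumaParameters(dom, params, &nparams,
                                           VIR_DOMAIN_AFFECT_LIVE) == 0) {
                for (i = 0; i < nparams; i++) {
                    if (params[i].type == VIR_TYPED_PARAM_STRING)
                        printf("%-14s: %s\n", params[i].field, params[i].value.s);
                    else if (params[i].type == VIR_TYPED_PARAM_INT)
                        printf("%-14s: %d\n", params[i].field, params[i].value.i);
                }
            }
            virTypedParamsFree(params, nparams);
        }

        virDomainFree(dom);
        virConnectClose(conn);
        return 0;
    }

This mirrors the query-count-then-fetch pattern that virsh numatune uses to print numa_mode and numa_nodeset.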

On Sat, Jul 13, 2013 at 02:27:18AM +0200, Dario Faggioli wrote:
It depends on Xen >= 4.3, though (by using the proper libxl feature flag).
Xen currently implements a NUMA placement policy which is basically the same as the 'interleaved' policy of `numactl', although it can be applied on a subset of the available nodes. We therefore hardcode "interleave" as 'numa_mode', and we use the newly introduced libxl interface to figure out what nodes a domain spans ('numa_nodeset').
With this change, it is now possible to query the NUMA node affinity of a running domain:
[raistlin@Zhaman ~]$ sudo virsh --connect xen:/// list
 Id    Name                           State
----------------------------------------------------
 23    F18_x64                        running

[raistlin@Zhaman ~]$ sudo virsh --connect xen:/// numatune 23
numa_mode      : interleave
numa_nodeset   : 1
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Changes from v2:
 * iterators turned from int to size_t
Changes from v1:
 * fixed coding style, as requested during review;
 * VIR_TYPED_PARAM_STRING_OKAY handled more properly;
---
 src/libxl/libxl_driver.c | 141 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 141 insertions(+)
ACK

Daniel