On mar, 2013-07-16 at 10:41 +0100, Daniel P. Berrange wrote:
On Sat, Jul 13, 2013 at 02:27:03AM +0200, Dario Faggioli wrote:
[..]
> diff --git a/src/libxl/libxl_conf.c b/src/libxl/libxl_conf.c
> index 4a0fba9..c097d1e 100644
> --- a/src/libxl/libxl_conf.c
> +++ b/src/libxl/libxl_conf.c
> @@ -161,6 +161,117 @@ libxlBuildCapabilities(virArch hostarch,
> }
>
> static virCapsPtr
> +libxlMakeNumaCapabilities(libxl_numainfo *numa_info,
> + int nr_nodes,
> + libxl_cputopology *cpu_topo,
> + int nr_cpus,
> + virCapsPtr caps)
> +{
> + virCapsHostNUMACellCPUPtr *cpus = NULL;
> + int *nr_cpus_node = NULL;
> + bool numa_failed = false;
> + size_t i;
> +
> + if (VIR_ALLOC_N(cpus, nr_nodes)) {
> + virReportOOMError();
> + return caps;
> + }
> +
> + if (VIR_ALLOC_N(nr_cpus_node, nr_nodes)) {
> + VIR_FREE(cpus);
> + virReportOOMError();
> + return caps;
> + }
Afraid some more changes have landed in GIT that affect you.
All the VIR_*ALLOC* functions, except those suffixed _QUIET
will now call virReportOOMError() for you. So you can remove
virReportOOMError here and all other places related to these
alloc fnuctions.
Yeah, I saw that after sending the patches. I can't access my testbox
for this whole week, so a repost will have to wait until Monday. Once at
home, I'll rebase on top of that and resubmit.
Thanks for pointing that out.
Also, you should be checking 'VIR_ALLOC_N() < 0', not
just
for a non-zero value.
Ok, I see. Will do that.
> + cleanup:
> +
> + if (numa_failed) {
> + /* Looks like something went wrong. Well, that's bad, but probably
> + * not enough to break the whole driver, so we log and carry on */
> + for (i = 0; i < nr_nodes; i++) {
> + VIR_FREE(cpus[i]);
> + }
> + VIR_WARN("Failed to retrieve and build host NUMA topology
properly,\n"
> + "disabling NUMA capabilities");
> + virCapabilitiesFreeNUMAInfo(caps);
This is wrong. All the cases of error leading to this point are out of
memory errors. It is madness to prevent everything is ok & carry on here.
This should be treated as a proper error.
Fair enough, will do that too.
> @@ -788,9 +903,40 @@ libxlMakeCapabilities(libxl_ctx *ctx)
> return NULL;
> }
>
> - return libxlMakeCapabilitiesInternal(virArchFromHost(),
> + caps = libxlMakeCapabilitiesInternal(virArchFromHost(),
> &phy_info,
> ver_info->capabilities);
> +
> + /* Check if caps is valid. If it is, it must remain so till the end! */
> + if (caps == NULL)
> + goto out;
> +
> + /* Let's try to fetch NUMA info now (not critical in case we fail) */
> + numa_info = libxl_get_numainfo(ctx, &nr_nodes);
> + if (numa_info == NULL)
> + VIR_WARN("libxl_get_numainfo failed to retrieve NUMA data");
Under what scenario can libxl_get_numainfo() return NULL ? Unless this
is an valid expected scenario, we should treat this is an error.
There are indeed a couple of possible reasons. Actually, I saw that the
qemu driver does pretty much the same, i.e., if retrieving NUMA
information fails, it gives up on that, but does not make things
explode, and I really think it is something that makes sense.
The actual possible failure reasons are: (1) it cannot prepare the
parameters for the hypercall, or (2) the hypercall fails. It is true
that, in both cases, something really serious might have happened, but
there is no way to tell it from here. Thus, I honestly think that trying
to carry on is sound... If it is really the case that some critical
component died, we'll find out soon enough.
What do you think?
> + else {
> + /* If the above failed, we'd have no NUMa caps anyway! */
> + cpu_topo = libxl_get_cpu_topology(ctx, &nr_cpus);
> + if (cpu_topo == NULL) {
> + VIR_WARN("libxl_get_cpu_topology failed to retrieve
topology");
> + libxl_numainfo_list_free(numa_info, nr_nodes);
> + }
Likewise here.
Likewise here for me too. :-)
> + else {
> + /* And add topology information to caps */
> + caps = libxlMakeNumaCapabilities(numa_info,
> + nr_nodes,
> + cpu_topo,
> + nr_cpus,
> + caps);
> + }
> + }
> +
> + libxl_cputopology_list_free(cpu_topo, nr_cpus);
> + libxl_numainfo_list_free(numa_info, nr_nodes);
> +
> +out:
s/out/cleanup/ is our preferred label naming
Ok.
Thanks a lot for the review,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D,
http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)