On Tue, Jul 16, 2013 at 11:10:25AM +0100, Dario Faggioli wrote:
> On Tue, 2013-07-16 at 10:41 +0100, Daniel P. Berrange wrote:
> > On Sat, Jul 13, 2013 at 02:27:03AM +0200, Dario Faggioli wrote:
> > > @@ -788,9 +903,40 @@ libxlMakeCapabilities(libxl_ctx *ctx)
> > > return NULL;
> > > }
> > >
> > > - return libxlMakeCapabilitiesInternal(virArchFromHost(),
> > > + caps = libxlMakeCapabilitiesInternal(virArchFromHost(),
> > > &phy_info,
> > > ver_info->capabilities);
> > > +
> > > + /* Check if caps is valid. If it is, it must remain so till the end! */
> > > + if (caps == NULL)
> > > + goto out;
> > > +
> > > + /* Let's try to fetch NUMA info now (not critical in case we fail) */
> > > + numa_info = libxl_get_numainfo(ctx, &nr_nodes);
> > > + if (numa_info == NULL)
> > > + VIR_WARN("libxl_get_numainfo failed to retrieve NUMA data");
> >
> > Under what scenario can libxl_get_numainfo() return NULL? Unless this
> > is a valid, expected scenario, we should treat this as an error.
> >
> There are indeed a couple of possible reasons. Actually, I saw that the
> qemu driver does pretty much the same, i.e., if retrieving NUMA
> information fails, it gives up on that, but does not make things
> explode, and I really think it is something that makes sense.

The reason the QEMU driver does that is that libnuma will return an
error if the host machine does not expose NUMA info in its BIOS. This
is an expected, valid scenario, so we have to ignore the error and
libnuma provides no way to distinguish this valid scenario from other
errors.

> The actual possible failure reasons are: (1) it cannot prepare the
> parameters for the hypercall, or (2) the hypercall fails. It is true
> that, in both cases, something really serious might have happened, but
> there is no way to tell it from here. Thus, I honestly think that trying
> to carry on is sound... If it is really the case that some critical
> component died, we'll find out soon enough.

The only scenario in which it is acceptable to ignore the failure
is if the physical hardware does not support NUMA. The question is
whether the Xen API lets you distinguish that scenario, from other
types of errors.
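
A minimal sketch of the policy described above, assuming the lookup could report "no NUMA on this host" separately from a real failure. Nothing here is libxl or libvirt API: `numa_query_status`, `host_caps` and `caps_apply_numa` are hypothetical names used only for illustration, and whether Xen can actually make this distinction is the open question.

```c
#include <stdio.h>

/* Hypothetical outcome of a NUMA topology query (not a libxl type). */
typedef enum {
    NUMA_QUERY_OK,          /* topology was retrieved */
    NUMA_QUERY_UNSUPPORTED, /* host has no NUMA: expected, safe to ignore */
    NUMA_QUERY_FAILED       /* any other failure: report it as an error */
} numa_query_status;

/* Hypothetical stand-in for the capabilities object being built. */
typedef struct {
    int nr_numa_nodes;      /* 0 means no NUMA information attached */
} host_caps;

/*
 * Apply the policy to an already valid caps object.
 * Returns 0 if caps is still usable, -1 if the caller should fail.
 */
int caps_apply_numa(host_caps *caps, numa_query_status status, int nr_nodes)
{
    switch (status) {
    case NUMA_QUERY_OK:
        caps->nr_numa_nodes = nr_nodes;
        return 0;
    case NUMA_QUERY_UNSUPPORTED:
        /* Not an error: simply advertise no NUMA topology. */
        caps->nr_numa_nodes = 0;
        return 0;
    case NUMA_QUERY_FAILED:
    default:
        fprintf(stderr, "NUMA query failed, refusing to build capabilities\n");
        return -1;
    }
}
```

If the underlying call only ever returns NULL with no further detail, the last two cases collapse into one, and the choice is between warning and carrying on or failing outright.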

Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|