Re: [libvirt] [Libguestfs] Quantifying libvirt errors in launching the libguestfs appliance

Thursday, 14 January 2016

On Thu, Jan 14, 2016 at 10:51:47AM +0100, Jiri Denemark wrote:
...
 On Wed, Jan 13, 2016 at 16:25:14 +0100, Martin Kletzander wrote:
 > On Wed, Jan 13, 2016 at 10:18:42AM +0000, Richard W.M. Jones wrote:
 > >As people may know, we frequently encounter errors caused by libvirt
 > >when running the libguestfs appliance.
 > >
 > >I wanted to find out exactly how frequently these happen and classify
 > >the errors, so I ran the 'virt-df' tool overnight 1700 times.  This
 > >tool runs several parallel qemu:///session libvirt connections, each
 > >creating a short-lived appliance guest.
 > >
 > >Note that I have added Cole's patch to fix
 > >https://bugzilla.redhat.com/1271183
 > >"XML-RPC error : Cannot write data: Transport endpoint is not
 > >connected"
 > >
 > >Results:
 > >
 > >The test failed 538 times (32% of the time), which is pretty dismal.
 > >To be fair, virt-df is aggressive about how it launches parallel
 > >libvirt connections.  Most other virt-* tools use only a single
 > >libvirt connection and are consequently more reliable.
 > >
 > >Of the failures, 518 (96%) were of the form:
 > >
 > >  process exited while connecting to monitor: qemu: could not load kernel '/home/rjones/d/libguestfs/tmp/.guestfs-1000/appliance.d/kernel': Permission denied
 > >
 > >which is https://bugzilla.redhat.com/921135 or maybe
 > >https://bugzilla.redhat.com/1269975.  It's not clear to me if these
 > >bugs have different causes, but if they do then potentially we're
 > >seeing a mix of both since my test has no way to distinguish them.
 > >
 > 
 > It looks like the same problem to me, and the same one we have
 > discussed a bunch of times without, apparently, reaching a conclusion.
 > 
 > For each of the kernels, libvirt labels them (with both DAC and SELinux
 > labels), then proceeds to launch qemu.  If this is done in parallel, the
 > race is pretty obvious.  Could you remind me why you couldn't use
 > <seclabel model='none'/> or <seclabel relabel='no'/> or something that
 > would mitigate this?  If we cannot use this, then we need to implement
 > the <seclabel/> element for kernel and initrd.

 Hmm, can't we just label kernel and initrd files the same way we label
 <shareable/> disk images, i.e., with a non-exclusive label so that all QEMU
 processes can access them, and avoid removing the label once a domain
 disappears?

We actually should treat it in the same way as <readonly/> disks,
and give it a shared read-only label. And indeed we *do* that.

The difference comes in the restore step - where we blow away the
readonly label and put it back to the original. For disks we never
restore readonly/shared labels, but for kernels we do. If we just
kill the restore step for kernels too, we should be fine AFAICT.
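
To make the interleaving concrete, here is a minimal sketch of the race in
Python. It is a toy model under stated assumptions, not libvirt code: the
label strings, the KernelFile class, and the run() helper are all
hypothetical, and the only behaviour modelled is "label on start, optionally
restore on shutdown" for one kernel file shared by two domains.

```python
# Toy model of the label/restore race; none of this is libvirt code.
ORIGINAL = "original"            # placeholder for the file's pre-existing label
SHARED_RO = "shared-readonly"    # placeholder for the shared read-only label


class KernelFile:
    """A single kernel image shared by several short-lived domains."""

    def __init__(self):
        self.label = ORIGINAL

    def readable_by_qemu(self):
        # In this model qemu can open the file only while it carries
        # the shared read-only label.
        return self.label == SHARED_RO


def run(events, restore_on_shutdown):
    """Replay (domain, step) events; return the domains whose qemu failed."""
    kernel = KernelFile()
    failures = []
    for domain, step in events:
        if step == "label":
            kernel.label = SHARED_RO
        elif step == "qemu_open":
            if not kernel.readable_by_qemu():
                failures.append(domain)  # models "Permission denied"
        elif step == "shutdown" and restore_on_shutdown:
            kernel.label = ORIGINAL      # the restore step discussed above


    return failures


# Domain B labels the kernel, then A shuts down before B's qemu opens it.
interleaving = [
    ("A", "label"), ("A", "qemu_open"),
    ("B", "label"),
    ("A", "shutdown"),
    ("B", "qemu_open"),
]

print(run(interleaving, restore_on_shutdown=True))   # ['B']
print(run(interleaving, restore_on_shutdown=False))  # []
```

With the restore step enabled, domain B fails in this model because A's
shutdown resets the label between B's labelling and B's qemu start; with the
restore skipped, the same interleaving succeeds.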

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
