
On Thu, Jan 14, 2016 at 10:51:47AM +0100, Jiri Denemark wrote:
On Wed, Jan 13, 2016 at 16:25:14 +0100, Martin Kletzander wrote:
On Wed, Jan 13, 2016 at 10:18:42AM +0000, Richard W.M. Jones wrote:
As people may know, we frequently encounter errors caused by libvirt when running the libguestfs appliance.
I wanted to find out exactly how frequently these happen and classify the errors, so I ran the 'virt-df' tool overnight 1700 times. This tool runs several parallel qemu:///session libvirt connections, each creating a short-lived appliance guest.
Note that I have added Cole's patch to fix https://bugzilla.redhat.com/1271183 "XML-RPC error : Cannot write data: Transport endpoint is not connected"
Results:
The test failed 538 times (32% of the time), which is pretty dismal. To be fair, virt-df is aggressive about how it launches parallel libvirt connections. Most other virt-* tools use only a single libvirt connection and are consequently more reliable.
Of the failures, 518 (96%) were of the form:
process exited while connecting to monitor: qemu: could not load kernel '/home/rjones/d/libguestfs/tmp/.guestfs-1000/appliance.d/kernel': Permission denied
which is https://bugzilla.redhat.com/921135 or maybe https://bugzilla.redhat.com/1269975. It's not clear to me if these bugs have different causes, but if they do then potentially we're seeing a mix of both since my test has no way to distinguish them.
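As a quick sanity check of the percentages quoted above, the arithmetic can be reproduced from the three counts given in the message (1700 runs, 538 failures, 518 "Permission denied" failures). The variable names are just illustrative; nothing here comes from the test harness itself.

```python
# Counts taken from the message above; names are illustrative only.
runs = 1700
failures = 538
kernel_permission_failures = 518

failure_rate = failures / runs                       # share of runs that failed
kernel_share = kernel_permission_failures / failures  # share of failures with this error
other_failures = failures - kernel_permission_failures

print(f"{failure_rate:.1%} of runs failed")
print(f"{kernel_share:.1%} of failures were 'could not load kernel'")
print(f"{other_failures} failures had some other cause")
```

The rounded figures agree with the 32% and 96% reported, and the remainder is the set of failures not covered by the two bugs mentioned.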
It looks like the same problem to me, and the same one we have talked about a number of times without, apparently, reaching a conclusion.
For each of the kernels, libvirt labels them (with both DAC and SELinux labels), then proceeds to launch qemu. If this is done in parallel, the race is pretty obvious. Could you remind me why you couldn't use <seclabel model='none'/> or <seclabel relabel='no'/> or something that would mitigate this? If we cannot use this, then we need to implement the <seclabel/> element for kernel and initrd.
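To make the race concrete, here is a minimal toy model in Python. It is not libvirt code: the `KernelFile` class, the label strings, and the function names are all hypothetical, and it assumes the conflicting step is one domain's label being put back to the original while another domain is still starting. It replays one bad ordering of two domains sharing the same kernel file.

```python
class KernelFile:
    """Stand-in for the shared appliance kernel; only tracks its current label."""

    def __init__(self):
        self.label = "original"


def relabel_for_qemu(kernel):
    # Hypothetical: what the security driver does before launching QEMU.
    kernel.label = "qemu-readable"


def restore_label(kernel):
    # Hypothetical: what happens when a domain goes away.
    kernel.label = "original"


def qemu_load_kernel(kernel):
    # QEMU can only read the kernel while it carries the QEMU-readable label.
    if kernel.label != "qemu-readable":
        raise PermissionError("could not load kernel: Permission denied")


def simulate_race():
    kernel = KernelFile()
    relabel_for_qemu(kernel)   # domain A: label the kernel
    qemu_load_kernel(kernel)   # domain A: QEMU starts fine
    relabel_for_qemu(kernel)   # domain B: label the same kernel
    restore_label(kernel)      # domain A: appliance exits, label is put back
    try:
        qemu_load_kernel(kernel)  # domain B: QEMU starts too late
    except PermissionError as exc:
        return str(exc)
    return "ok"


print(simulate_race())
```

With short-lived appliances launched in parallel, this ordering only needs domain A to finish in the window between domain B's relabel and domain B's QEMU opening the file.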
Hmm, can't we just label kernel and initrd files the same way we label <shareable/> disk images, i.e., with a non-exclusive label so that all QEMU processes can access them, and avoid removing the label once a domain disappears?
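The effect of not removing the label can be checked on a toy model by enumerating every ordering of two domains' steps. This is only a sketch under simplifying assumptions: each domain is reduced to three atomic steps (label, open, restore), the label strings and the `restore_kernel_label` flag are hypothetical, and none of it reflects libvirt's actual code paths.

```python
from itertools import combinations

# Per-domain order: relabel the kernel, QEMU opens it, label restored on teardown.
STEPS = ("label", "open", "restore")


def interleavings():
    """Yield every order-preserving merge of two domains' step sequences."""
    total = 2 * len(STEPS)
    for a_slots in combinations(range(total), len(STEPS)):
        order, a_i, b_i = [], 0, 0
        for pos in range(total):
            if pos in a_slots:
                order.append(("A", STEPS[a_i]))
                a_i += 1
            else:
                order.append(("B", STEPS[b_i]))
                b_i += 1
        yield order


def fails(order, restore_kernel_label):
    """Return True if some QEMU finds the kernel without the shared label."""
    label = "original"
    for _domain, step in order:
        if step == "label":
            label = "shared-ro"
        elif step == "open" and label != "shared-ro":
            return True
        elif step == "restore" and restore_kernel_label:
            label = "original"
    return False


def count_failures(restore_kernel_label):
    return sum(fails(order, restore_kernel_label) for order in interleavings())


print("with restore:   ", count_failures(True))
print("without restore:", count_failures(False))
```

In this model 6 of the 20 possible orderings fail when the label is restored, and none fail when it is left in place, which is the behaviour being proposed for kernel and initrd files.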
We actually should treat it in the same way as <readonly/> disks, and give it a shared read-only label. And indeed we *do* that. The difference comes in the restore step - where we blow away the readonly label and put it back to the original. For disks we never restore readonly/shared labels, but for kernels we do. If we just kill the restore step for kernels too, we should be fine AFAICT.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|