[libvirt] Easy reproducer for multiple races and segfaults in libvirtd

You need to read the instructions at the top, and download the following appliance too:

http://libguestfs.org/download/binaries/appliance/appliance-1.18.9.tar.xz

So far I've filed the following bugs:

https://bugzilla.redhat.com/show_bug.cgi?id=875741
https://bugzilla.redhat.com/show_bug.cgi?id=877110
https://bugzilla.redhat.com/show_bug.cgi?id=877312
https://bugzilla.redhat.com/show_bug.cgi?id=877429
https://bugzilla.redhat.com/show_bug.cgi?id=877430

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-p2v converts physical machines to virtual machines. Boot with a
live CD or over the network (PXE) and turn machines into Xen guests.
http://et.redhat.com/~rjones/virt-p2v

On Fri, Nov 16, 2012 at 02:16:04PM +0000, Richard W.M. Jones wrote:
> You need to read the instructions at the top, and download the following appliance too:
> http://libguestfs.org/download/binaries/appliance/appliance-1.18.9.tar.xz
> So far I've filed the following bugs:
> https://bugzilla.redhat.com/show_bug.cgi?id=875741
> https://bugzilla.redhat.com/show_bug.cgi?id=877110
> https://bugzilla.redhat.com/show_bug.cgi?id=877312
> https://bugzilla.redhat.com/show_bug.cgi?id=877429
> https://bugzilla.redhat.com/show_bug.cgi?id=877430

Thanks for the reproducer program, that should make life much easier.

Just to confirm, you're seeing these problems on both 1.0.0 and current git master?

Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Fri, Nov 16, 2012 at 02:40:31PM +0000, Daniel P. Berrange wrote:
> Thanks for the reproducer program, that should make life much easier.
> Just to confirm, you're seeing these problems on both 1.0.0 and current git master?

Actually I'm testing libvirt-0.10.2.1-2.fc18.x86_64 & libvirt from git, and seeing roughly the same set of problems with both. Didn't try 1.0.0 at all.

To use libvirt from git, I'm doing:

  killall libvirtd lt-libvirtd
  ~/d/libvirt/run ./test-parallel

Plus I should note a few things about my environment:

- Fedora 18
- 16 GB of RAM (if you don't have that, reduce NR_THREADS in the test)
- baremetal with KVM on a very fast Intel Sandybridge (I doubt this is reproducible in a VM)
- I've configured core_pattern and ulimit to capture coredumps in /tmp

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
libguestfs lets you edit virtual machines. Supports shell scripting,
bindings from many languages. http://libguestfs.org
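
For anyone trying to reproduce the segfaults, the core dump setup mentioned in the environment list could look roughly like the sketch below. The thread does not say which pattern or limits were actually used, so the pattern string and the APPLY switch are assumptions made for illustration. Writing core_pattern needs root, so that step is opt-in here.

```shell
#!/bin/sh
# Sketch only: the exact pattern used in the thread is not known.
# Hypothetical pattern: %e = executable name, %p = PID of the dumping process.
CORE_PATTERN="/tmp/core.%e.%p"

# Raise the core-size limit for this shell and for anything started from it.
# This can fail if the hard limit is lower; warn instead of aborting.
ulimit -c unlimited 2>/dev/null || echo "warning: could not raise core limit" >&2

# Changing the kernel-wide setting needs root, so only do it when asked
# (APPLY=1 is a switch invented for this sketch).
if [ "${APPLY:-0}" = 1 ]; then
    echo "$CORE_PATTERN" > /proc/sys/kernel/core_pattern
fi

echo "wanted pattern:  $CORE_PATTERN"
echo "core size limit: $(ulimit -c)"
```

The limit set by ulimit only applies to processes started from the same shell, so the test (and any libvirtd it spawns) would need to be launched from there. A daemon started some other way, for example by the init system, would not inherit it.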

On Fri, Nov 16, 2012 at 02:48:03PM +0000, Richard W.M. Jones wrote:
> Plus I should note a few things about my environment: [..]

and news just in:

- reproducible on two different Intel machines (both F18 & baremetal)

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-top is 'top' for virtual machines. Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://et.redhat.com/~rjones/virt-top

On Fri, Nov 16, 2012 at 02:48:03PM +0000, Richard W.M. Jones wrote:
> Actually I'm testing libvirt-0.10.2.1-2.fc18.x86_64 & libvirt from git, and seeing roughly the same set of problems with both. Didn't try 1.0.0 at all.
> To use libvirt from git, I'm doing:
>   killall libvirtd lt-libvirtd
>   ~/d/libvirt/run ./test-parallel
> Plus I should note a few things about my environment:
> - Fedora 18
> - 16 GB of RAM (if you don't have that, reduce NR_THREADS in the test)
> - baremetal with KVM on a very fast Intel Sandybridge (I doubt this is reproducible in a VM)
> - I've configured core_pattern and ulimit to capture coredumps in /tmp

I've run the test on several machines, and finally found one which would reproduce the "Operation is not valid" bug. I don't see any of the other BZs you list above occurring.

At least I can now investigate what's gone wrong with 877430.

Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Fri, Nov 16, 2012 at 03:31:43PM +0000, Daniel P. Berrange wrote:
> I've run the test on several machines, and finally found one which would reproduce the "Operation is not valid" bug. I don't see any of the other BZs you list above occurring.
> At least I can now investigate what's gone wrong with 877430.

Of the two machines I'm using, 877430 is most "popular" by far on the slower machine. Segfaults in libvirtd also happen on the slower machine, but much less regularly, and because of a configuration error I didn't manage to catch a core dump yet.

875741 happens most frequently on the faster machine.

This might be caused by the relative speed or it might be because of some other combination of installed software. In any case, it takes at least 10 minutes on the faster machine (and usually longer) to get an error.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)
http://cocan.org/getting_started_with_ocaml_on_red_hat_and_fedora

- Move the socket close after virConnectClose. This was probably what caused #877430, so that wasn't a real bug.
- Update some error messages.
- Turn off debugging by default. If it's working, it should be silent.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
New in Fedora 11: Fedora Windows cross-compiler. Compile Windows
programs, test, and build Windows installers. Over 70 libraries supported
http://fedoraproject.org/wiki/MinGW http://www.annexia.org/fedora_mingw

participants (2)
- Daniel P. Berrange
- Richard W.M. Jones