Ping?
Christophe
On Wed, Jun 20, 2012 at 12:29:46PM +0200, Christophe Fergeau wrote:
Hey,
While testing Boxes, I noticed that if I started a VM and then
left it running in the background, it usually crashed after
a while. When this happens, the backtrace is:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6671898 in gvir_event_handle_dispatch (source=0x7fffdc001f60,
condition=G_IO_IN, opaque=0x7fffdc0021a0) at libvirt-glib-event.c:135
135 (data->cb)(data->watch, data->fd, events, data->opaque);
(gdb) bt
#0 0x00007ffff6671898 in gvir_event_handle_dispatch (source=0x7fffdc001f60,
condition=G_IO_IN, opaque=0x7fffdc0021a0) at libvirt-glib-event.c:135
#1 0x00007ffff231dcb4 in g_io_unix_dispatch (source=0x40e34a0,
callback=0x7ffff66717d8 <gvir_event_handle_dispatch>,
user_data=0x7fffdc0021a0) at giounix.c:166
#2 0x00007ffff22cef15 in g_main_dispatch (context=0x7007f0) at gmain.c:2539
#3 0x00007ffff22cfbda in g_main_context_dispatch (context=0x7007f0)
at gmain.c:3075
#4 0x00007ffff22cfdbd in g_main_context_iterate (context=0x7007f0, block=1,
dispatch=1, self=0x15d3180) at gmain.c:3146
#5 0x00007ffff22cfe81 in g_main_context_iteration (context=0x7007f0,
may_block=1) at gmain.c:3207
#6 0x00007ffff2ad7e7c in g_application_run (application=0x70b890, argc=0,
argv=0x0) at gapplication.c:1607
#7 0x000000000041df98 in boxes_app_run (self=0x1527010) at app.c:1254
#8 0x0000000000450134 in _vala_main (args=0x7fffffffd708, args_length1=1)
at main.c:729
#9 0x000000000045019e in main (argc=1, argv=0x7fffffffd708) at main.c:740
and the corresponding gvir_event_handle structure is corrupted:
(gdb) p *data
$3 = {watch = 0, fd = 0, events = 1886417008, enabled = 1886417008,
channel = 0x7070707070707070, source = 1886417008, cb = 0x7070707070707070,
opaque = 0x7070707070707070, ff = 0x7ffff60cfba0 <virNetSocketEventFree>}
(gdb) p/x *data
$4 = {watch = 0x0, fd = 0x0, events = 0x70707070, enabled = 0x70707070,
channel = 0x7070707070707070, source = 0x70707070, cb = 0x7070707070707070,
opaque = 0x7070707070707070, ff = 0x7ffff60cfba0}
(I'm running with MALLOC_PERTURB_ set, so the 0x70707070 pattern
indicates memory that was freed or never initialized.)
Right before this happens, the libvirt-glib event debugging log has:
(gnome-boxes:29577): Libvirt.GLib-DEBUG: Remove handle 0x7fffdc0021a0 1 27
(gnome-boxes:29577): Libvirt.GLib-DEBUG: Update handle 0x7fffdc0021a0 1 27 1
(gnome-boxes:29577): Libvirt.GLib-DEBUG: Update for missing handle watch 1
(gnome-boxes:29577): Libvirt.GLib-DEBUG: Dispatch handler 0x7fffdc0021a0 0 0 1 0x7070707070707070
that is, libvirt removes a handle watch, and then updates this handle (in
other words, it calls virEventPollUpdateHandle after having called
virEventPollRemoveHandle on the same handle).
I don't know if it's legit for libvirt to do that, but we can make
libvirt-glib robust against this situation, which is what patch 5/5 does.
The first 3 patches are small cleanups, and 4/5 fixes a potential race
which I found while looking at this code.
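To illustrate the kind of guard I mean, here is a minimal standalone sketch.
It is not the code from patch 5/5 and it does not use GLib: every demo_*
name is made up for this mail, and it assumes that removing a handle only
marks it and that the memory is freed later by a cleanup pass. The important
part is that update and dispatch both refuse to touch a handle once removal
has been requested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins: these names are illustrative, not libvirt-glib's. */
typedef void (*demo_handle_cb)(int watch, int fd, int events, void *opaque);

struct demo_handle {
    int watch;
    int fd;
    int events;
    int armed;      /* 1 while an event source would be attached */
    int removed;    /* 1 once removal was requested; the free is deferred */
    demo_handle_cb cb;
    void *opaque;
};

#define DEMO_MAX_HANDLES 16
static struct demo_handle *demo_handles[DEMO_MAX_HANDLES];
static int demo_next_watch = 1;

static struct demo_handle *demo_find(int watch)
{
    for (size_t i = 0; i < DEMO_MAX_HANDLES; i++)
        if (demo_handles[i] && demo_handles[i]->watch == watch)
            return demo_handles[i];
    return NULL;
}

/* Register a handle; returns its watch id, or -1 on failure. */
int demo_add(int fd, int events, demo_handle_cb cb, void *opaque)
{
    for (size_t i = 0; i < DEMO_MAX_HANDLES; i++) {
        if (demo_handles[i])
            continue;
        struct demo_handle *h = calloc(1, sizeof(*h));
        if (!h)
            return -1;
        h->watch = demo_next_watch++;
        h->fd = fd;
        h->events = events;
        h->armed = events != 0;
        h->cb = cb;
        h->opaque = opaque;
        demo_handles[i] = h;
        return h->watch;
    }
    return -1;
}

/* Request removal: disarm now, leave the memory for demo_cleanup(). */
int demo_remove(int watch)
{
    struct demo_handle *h = demo_find(watch);
    if (!h || h->removed)
        return -1;
    h->armed = 0;
    h->removed = 1;
    return 0;
}

/* Update the event mask. A handle that was already removed is never
 * re-armed, which is the remove-then-update case from the log. */
int demo_update(int watch, int events)
{
    struct demo_handle *h = demo_find(watch);
    if (!h || h->removed)
        return -1;
    h->events = events;
    h->armed = events != 0;
    return 0;
}

/* Dispatch only to live, armed handles; returns 1 if the callback ran. */
int demo_dispatch(int watch, int events)
{
    struct demo_handle *h = demo_find(watch);
    if (!h || h->removed || !h->armed)
        return 0;
    h->cb(h->watch, h->fd, events, h->opaque);
    return 1;
}

/* Deferred free, e.g. what an idle callback would do in a real main loop. */
void demo_cleanup(void)
{
    for (size_t i = 0; i < DEMO_MAX_HANDLES; i++) {
        struct demo_handle *h = demo_handles[i];
        if (h && h->removed) {
            assert(!h->armed);  /* a removed handle is never armed */
            free(h);
            demo_handles[i] = NULL;
        }
    }
}

/* Example callback: counts invocations through the opaque pointer. */
void demo_count_cb(int watch, int fd, int events, void *opaque)
{
    (void)watch;
    (void)fd;
    (void)events;
    ++*(int *)opaque;
}
```

With this, calling demo_remove() and then demo_update() on the same watch
makes the update fail instead of re-arming the handle, so a later
demo_dispatch() never reaches the callback, whether or not demo_cleanup()
has run in between.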
Christophe