[libvirt] [libvirt-glib] mainloop integration fixes

Hey, While testing Boxes, I've noticed that if I started a VM and then let it in the background for a while, it usually crashed after a while. When this happens, the backtrace is: Program received signal SIGSEGV, Segmentation fault. 0x00007ffff6671898 in gvir_event_handle_dispatch (source=0x7fffdc001f60, condition=G_IO_IN, opaque=0x7fffdc0021a0) at libvirt-glib-event.c:135 135 (data->cb)(data->watch, data->fd, events, data->opaque); (gdb) bt #0 0x00007ffff6671898 in gvir_event_handle_dispatch (source=0x7fffdc001f60, condition=G_IO_IN, opaque=0x7fffdc0021a0) at libvirt-glib-event.c:135 #1 0x00007ffff231dcb4 in g_io_unix_dispatch (source=0x40e34a0, callback=0x7ffff66717d8 <gvir_event_handle_dispatch>, user_data=0x7fffdc0021a0) at giounix.c:166 #2 0x00007ffff22cef15 in g_main_dispatch (context=0x7007f0) at gmain.c:2539 #3 0x00007ffff22cfbda in g_main_context_dispatch (context=0x7007f0) at gmain.c:3075 #4 0x00007ffff22cfdbd in g_main_context_iterate (context=0x7007f0, block=1, dispatch=1, self=0x15d3180) at gmain.c:3146 #5 0x00007ffff22cfe81 in g_main_context_iteration (context=0x7007f0, may_block=1) at gmain.c:3207 #6 0x00007ffff2ad7e7c in g_application_run (application=0x70b890, argc=0, argv=0x0) at gapplication.c:1607 #7 0x000000000041df98 in boxes_app_run (self=0x1527010) at app.c:1254 #8 0x0000000000450134 in _vala_main (args=0x7fffffffd708, args_length1=1) at main.c:729 #9 0x000000000045019e in main (argc=1, argv=0x7fffffffd708) at main.c:740 and the corresponding gvir_event_handle structure is corrupted: (gdb) p *data $3 = {watch = 0, fd = 0, events = 1886417008, enabled = 1886417008, channel = 0x7070707070707070, source = 1886417008, cb = 0x7070707070707070, opaque = 0x7070707070707070, ff = 0x7ffff60cfba0 <virNetSocketEventFree>} (gdb) p/x *data $4 = {watch = 0x0, fd = 0x0, events = 0x70707070, enabled = 0x70707070, channel = 0x7070707070707070, source = 0x70707070, cb = 0x7070707070707070, opaque = 0x7070707070707070, ff = 0x7ffff60cfba0} (I'm running with MALLOC_PERTURB_ set so the 0x70707070 pattern indicates uninitialized memory). Right before this happens, the libvirt-glib event debugging log has: (gnome-boxes:29577): Libvirt.GLib-DEBUG: Remove handle 0x7fffdc0021a0 1 27 (gnome-boxes:29577): Libvirt.GLib-DEBUG: Update handle 0x7fffdc0021a0 1 27 1 (gnome-boxes:29577): Libvirt.GLib-DEBUG: Update for missing handle watch 1 (gnome-boxes:29577): Libvirt.GLib-DEBUG: Dispatch handler 0x7fffdc0021a0 0 0 1 0x7070707070707070 that is, libvirt removes a handle watch, and then updates this handle (in other words, it calls virEventPollUpdateHandle after having called virEventPollRemoveHandle on the same handle). I don't know if it's legit for libvirt to do that, but we can make libvirt-glib robust against this situation, which is what patch 5/5 does. The first 3 patches are small cleanups, and 4/5 fixes a potential race which I found while looking at this code. Christophe

Make sure the watch ID and the address of the structure libvirt-glib uses to watch the handle are always printed to make tracking operation on the same watch easier. --- libvirt-glib/libvirt-glib-event.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/libvirt-glib/libvirt-glib-event.c b/libvirt-glib/libvirt-glib-event.c index c3eb3c8..f09d6c8 100644 --- a/libvirt-glib/libvirt-glib-event.c +++ b/libvirt-glib/libvirt-glib-event.c @@ -130,7 +130,7 @@ gvir_event_handle_dispatch(GIOChannel *source G_GNUC_UNUSED, if (condition & G_IO_ERR) events |= VIR_EVENT_HANDLE_ERROR; - g_debug("Dispatch handler %d %d %p\n", data->fd, events, data->opaque); + g_debug("Dispatch handler %p %d %d %d %p\n", data, data->watch, data->fd, events, data->opaque); (data->cb)(data->watch, data->fd, events, data->opaque); @@ -166,7 +166,7 @@ gvir_event_handle_add(int fd, data->channel = g_io_channel_unix_new(fd); data->ff = ff; - g_debug("Add handle %d %d %p\n", data->fd, events, data->opaque); + g_debug("Add handle %p %d %d %d %p\n", data, data->watch, data->fd, events, data->opaque); data->source = g_io_add_watch(data->channel, cond, @@ -278,7 +278,7 @@ gvir_event_handle_remove(int watch) goto cleanup; } - g_debug("Remove handle %d %d\n", watch, data->fd); + g_debug("Remove handle %p %d %d\n", data, watch, data->fd); if (!data->source) goto cleanup; -- 1.7.10.2

On Wed, Jun 20, 2012 at 12:29:47PM +0200, Christophe Fergeau wrote:
Make sure the watch ID and the address of the structure libvirt-glib uses to watch the handle are always printed to make tracking operation on the same watch easier. --- libvirt-glib/libvirt-glib-event.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/libvirt-glib/libvirt-glib-event.c b/libvirt-glib/libvirt-glib-event.c index c3eb3c8..f09d6c8 100644 --- a/libvirt-glib/libvirt-glib-event.c +++ b/libvirt-glib/libvirt-glib-event.c @@ -130,7 +130,7 @@ gvir_event_handle_dispatch(GIOChannel *source G_GNUC_UNUSED, if (condition & G_IO_ERR) events |= VIR_EVENT_HANDLE_ERROR;
- g_debug("Dispatch handler %d %d %p\n", data->fd, events, data->opaque); + g_debug("Dispatch handler %p %d %d %d %p\n", data, data->watch, data->fd, events, data->opaque);
(data->cb)(data->watch, data->fd, events, data->opaque);
@@ -166,7 +166,7 @@ gvir_event_handle_add(int fd, data->channel = g_io_channel_unix_new(fd); data->ff = ff;
- g_debug("Add handle %d %d %p\n", data->fd, events, data->opaque); + g_debug("Add handle %p %d %d %d %p\n", data, data->watch, data->fd, events, data->opaque);
data->source = g_io_add_watch(data->channel, cond, @@ -278,7 +278,7 @@ gvir_event_handle_remove(int watch) goto cleanup; }
- g_debug("Remove handle %d %d\n", watch, data->fd); + g_debug("Remove handle %p %d %d\n", data, watch, data->fd);
if (!data->source) goto cleanup;
ACK, trivial Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

All timeout and watch additions/updates/removals are logged except successful watch updates. This commit adds such a log message. --- libvirt-glib/libvirt-glib-event.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/libvirt-glib/libvirt-glib-event.c b/libvirt-glib/libvirt-glib-event.c index f09d6c8..c720c1f 100644 --- a/libvirt-glib/libvirt-glib-event.c +++ b/libvirt-glib/libvirt-glib-event.c @@ -219,6 +219,8 @@ gvir_event_handle_update(int watch, goto cleanup; } + g_debug("Update handle %p %d %d %d\n", data, watch, data->fd, events); + if (events) { GIOCondition cond = 0; if (events == data->events) -- 1.7.10.2

On Wed, Jun 20, 2012 at 12:29:48PM +0200, Christophe Fergeau wrote:
All timeout and watch additions/updates/removals are logged except successful watch updates. This commit adds such a log message. --- libvirt-glib/libvirt-glib-event.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/libvirt-glib/libvirt-glib-event.c b/libvirt-glib/libvirt-glib-event.c index f09d6c8..c720c1f 100644 --- a/libvirt-glib/libvirt-glib-event.c +++ b/libvirt-glib/libvirt-glib-event.c @@ -219,6 +219,8 @@ gvir_event_handle_update(int watch, goto cleanup; }
+ g_debug("Update handle %p %d %d %d\n", data, watch, data->fd, events); + if (events) { GIOCondition cond = 0; if (events == data->events)
ACK, trivial Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

--- libvirt-glib/libvirt-glib-event.c | 18 ++++++------------ 1 file changed, 6 insertions(+), 12 deletions(-) diff --git a/libvirt-glib/libvirt-glib-event.c b/libvirt-glib/libvirt-glib-event.c index c720c1f..c2eeb7a 100644 --- a/libvirt-glib/libvirt-glib-event.c +++ b/libvirt-glib/libvirt-glib-event.c @@ -183,7 +183,7 @@ gvir_event_handle_add(int fd, } static struct gvir_event_handle * -gvir_event_handle_find(int watch, guint *idx) +gvir_event_handle_find(int watch) { guint i; @@ -196,8 +196,6 @@ gvir_event_handle_find(int watch, guint *idx) } if (h->watch == watch) { - if (idx != NULL) - *idx = i; return h; } } @@ -213,7 +211,7 @@ gvir_event_handle_update(int watch, g_mutex_lock(eventlock); - data = gvir_event_handle_find(watch, NULL); + data = gvir_event_handle_find(watch); if (!data) { g_debug("Update for missing handle watch %d", watch); goto cleanup; @@ -270,11 +268,10 @@ gvir_event_handle_remove(int watch) { struct gvir_event_handle *data; int ret = -1; - guint idx; g_mutex_lock(eventlock); - data = gvir_event_handle_find(watch, &idx); + data = gvir_event_handle_find(watch); if (!data) { g_debug("Remove of missing watch %d", watch); goto cleanup; @@ -343,7 +340,7 @@ gvir_event_timeout_add(int interval, static struct gvir_event_timeout * -gvir_event_timeout_find(int timer, guint *idx) +gvir_event_timeout_find(int timer) { guint i; @@ -358,8 +355,6 @@ gvir_event_timeout_find(int timer, guint *idx) } if (t->timer == timer) { - if (idx != NULL) - *idx = i; return t; } } @@ -376,7 +371,7 @@ gvir_event_timeout_update(int timer, g_mutex_lock(eventlock); - data = gvir_event_timeout_find(timer, NULL); + data = gvir_event_timeout_find(timer); if (!data) { g_debug("Update of missing timer %d", timer); goto cleanup; @@ -421,12 +416,11 @@ static int gvir_event_timeout_remove(int timer) { struct gvir_event_timeout *data; - guint idx; int ret = -1; g_mutex_lock(eventlock); - data = gvir_event_timeout_find(timer, &idx); + data = gvir_event_timeout_find(timer); if (!data) { g_debug("Remove of missing timer %d", timer); goto cleanup; -- 1.7.10.2

On Wed, Jun 20, 2012 at 12:29:49PM +0200, Christophe Fergeau wrote:
--- libvirt-glib/libvirt-glib-event.c | 18 ++++++------------ 1 file changed, 6 insertions(+), 12 deletions(-)
diff --git a/libvirt-glib/libvirt-glib-event.c b/libvirt-glib/libvirt-glib-event.c index c720c1f..c2eeb7a 100644 --- a/libvirt-glib/libvirt-glib-event.c +++ b/libvirt-glib/libvirt-glib-event.c @@ -183,7 +183,7 @@ gvir_event_handle_add(int fd, }
static struct gvir_event_handle * -gvir_event_handle_find(int watch, guint *idx) +gvir_event_handle_find(int watch) { guint i;
@@ -196,8 +196,6 @@ gvir_event_handle_find(int watch, guint *idx) }
if (h->watch == watch) { - if (idx != NULL) - *idx = i; return h; } } @@ -213,7 +211,7 @@ gvir_event_handle_update(int watch,
g_mutex_lock(eventlock);
- data = gvir_event_handle_find(watch, NULL); + data = gvir_event_handle_find(watch); if (!data) { g_debug("Update for missing handle watch %d", watch); goto cleanup; @@ -270,11 +268,10 @@ gvir_event_handle_remove(int watch) { struct gvir_event_handle *data; int ret = -1; - guint idx;
g_mutex_lock(eventlock);
- data = gvir_event_handle_find(watch, &idx); + data = gvir_event_handle_find(watch); if (!data) { g_debug("Remove of missing watch %d", watch); goto cleanup; @@ -343,7 +340,7 @@ gvir_event_timeout_add(int interval,
static struct gvir_event_timeout * -gvir_event_timeout_find(int timer, guint *idx) +gvir_event_timeout_find(int timer) { guint i;
@@ -358,8 +355,6 @@ gvir_event_timeout_find(int timer, guint *idx) }
if (t->timer == timer) { - if (idx != NULL) - *idx = i; return t; } } @@ -376,7 +371,7 @@ gvir_event_timeout_update(int timer,
g_mutex_lock(eventlock);
- data = gvir_event_timeout_find(timer, NULL); + data = gvir_event_timeout_find(timer); if (!data) { g_debug("Update of missing timer %d", timer); goto cleanup; @@ -421,12 +416,11 @@ static int gvir_event_timeout_remove(int timer) { struct gvir_event_timeout *data; - guint idx; int ret = -1;
g_mutex_lock(eventlock);
- data = gvir_event_timeout_find(timer, &idx); + data = gvir_event_timeout_find(timer); if (!data) { g_debug("Remove of missing timer %d", timer); goto cleanup;
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Timeout and watch deletion is done from an idle callback. However, we cannot assume that all libvirt event calls (the callbacks passed to virEventRegisterImpl) will be done from the mainloop thread. It's thus possible that a libvirt event call will run a thread while one of the idle deletion callbacks runs. Given that the 'handles' and 'timeouts' arrays are shared among all threads, we need to make sure we hold the 'eventlock' mutex before modifying them. --- libvirt-glib/libvirt-glib-event.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/libvirt-glib/libvirt-glib-event.c b/libvirt-glib/libvirt-glib-event.c index c2eeb7a..587a59e 100644 --- a/libvirt-glib/libvirt-glib-event.c +++ b/libvirt-glib/libvirt-glib-event.c @@ -255,11 +255,15 @@ _event_handle_remove(gpointer data) { struct gvir_event_handle *h = data; + g_mutex_lock(eventlock); + if (h->ff) (h->ff)(h->opaque); g_ptr_array_remove_fast(handles, h); + g_mutex_unlock(eventlock); + return FALSE; } @@ -404,11 +408,15 @@ _event_timeout_remove(gpointer data) { struct gvir_event_timeout *t = data; + g_mutex_lock(eventlock); + if (t->ff) (t->ff)(t->opaque); g_ptr_array_remove_fast(timeouts, t); + g_mutex_unlock(eventlock); + return FALSE; } -- 1.7.10.2

On Wed, Jun 20, 2012 at 12:29:50PM +0200, Christophe Fergeau wrote:
Timeout and watch deletion is done from an idle callback. However, we cannot assume that all libvirt event calls (the callbacks passed to virEventRegisterImpl) will be done from the mainloop thread. It's thus possible that a libvirt event call will run a thread while one of the idle deletion callbacks runs. Given that the 'handles' and 'timeouts' arrays are shared among all threads, we need to make sure we hold the 'eventlock' mutex before modifying them. --- libvirt-glib/libvirt-glib-event.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/libvirt-glib/libvirt-glib-event.c b/libvirt-glib/libvirt-glib-event.c index c2eeb7a..587a59e 100644 --- a/libvirt-glib/libvirt-glib-event.c +++ b/libvirt-glib/libvirt-glib-event.c @@ -255,11 +255,15 @@ _event_handle_remove(gpointer data) { struct gvir_event_handle *h = data;
+ g_mutex_lock(eventlock); + if (h->ff) (h->ff)(h->opaque);
g_ptr_array_remove_fast(handles, h);
+ g_mutex_unlock(eventlock); + return FALSE; }
@@ -404,11 +408,15 @@ _event_timeout_remove(gpointer data) { struct gvir_event_timeout *t = data;
+ g_mutex_lock(eventlock); + if (t->ff) (t->ff)(t->opaque);
g_ptr_array_remove_fast(timeouts, t);
+ g_mutex_unlock(eventlock); + return FALSE; }
ACK, clear thread safety bug Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

The deletion of libvirt timeouts/watches is done in 2 steps: - the first step is synchronous and unregisters the timeout/watch from glib mainloop - the second step is asynchronous and triggered from the first step. It releases the memory used for bookkeeping for the timeout/watch being deleted This is done this way to avoid some possible deadlocks when reentering the sync callback while freeing the memory associated with the timeout/watch. However, it's possible to call gvir_event_update_handle after gvir_event_remove_handle but before _event_handle_remove does the final cleanup. When this happen, _update_handle will reregister the handle with glib mainloop, and bad things will happen when a glib callback is triggered for this event after _event_handle_remove has freed the memory associated with this handle watch. This commit marks the timeouts and watches as removed in the synchronous _remove callback and makes sure removed timeouts/watches are ignored in _update callbacks. --- libvirt-glib/libvirt-glib-event.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/libvirt-glib/libvirt-glib-event.c b/libvirt-glib/libvirt-glib-event.c index 587a59e..657c1bf 100644 --- a/libvirt-glib/libvirt-glib-event.c +++ b/libvirt-glib/libvirt-glib-event.c @@ -87,7 +87,7 @@ struct gvir_event_handle int watch; int fd; int events; - int enabled; + int removed; GIOChannel *channel; guint source; virEventHandleCallback cb; @@ -99,6 +99,7 @@ struct gvir_event_timeout { int timer; int interval; + int removed; guint source; virEventTimeoutCallback cb; void *opaque; @@ -195,7 +196,7 @@ gvir_event_handle_find(int watch) continue; } - if (h->watch == watch) { + if ((h->watch == watch) && !h->removed) { return h; } } @@ -289,6 +290,11 @@ gvir_event_handle_remove(int watch) g_source_remove(data->source); data->source = 0; data->events = 0; + /* since the actual watch deletion is done asynchronously, a handle_update call may + * reschedule the watch before it's fully deleted, that's why we need to mark it as + * 'removed' to prevent reuse + */ + data->removed = TRUE; g_idle_add(_event_handle_remove, data); ret = 0; @@ -358,7 +364,7 @@ gvir_event_timeout_find(int timer) continue; } - if (t->timer == timer) { + if ((t->timer == timer) && !t->removed) { return t; } } @@ -441,6 +447,11 @@ gvir_event_timeout_remove(int timer) g_source_remove(data->source); data->source = 0; + /* since the actual timeout deletion is done asynchronously, a timeout_update call may + * reschedule the timeout before it's fully deleted, that's why we need to mark it as + * 'removed' to prevent reuse + */ + data->removed = TRUE; g_idle_add(_event_timeout_remove, data); ret = 0; -- 1.7.10.2

On Wed, Jun 20, 2012 at 12:29:51PM +0200, Christophe Fergeau wrote:
The deletion of libvirt timeouts/watches is done in 2 steps: - the first step is synchronous and unregisters the timeout/watch from glib mainloop - the second step is asynchronous and triggered from the first step. It releases the memory used for bookkeeping for the timeout/watch being deleted
This is done this way to avoid some possible deadlocks when reentering the sync callback while freeing the memory associated with the timeout/watch.
However, it's possible to call gvir_event_update_handle after gvir_event_remove_handle but before _event_handle_remove does the final cleanup. When this happen, _update_handle will reregister the handle with glib mainloop, and bad things will happen when a glib callback is triggered for this event after _event_handle_remove has freed the memory associated with this handle watch.
That's clearly possible, but it is also illegal use of the events API by some client code, since the timer/watch id is not valid after calling remove(). Do you actually see this occurring in real life ?
This commit marks the timeouts and watches as removed in the synchronous _remove callback and makes sure removed timeouts/watches are ignored in _update callbacks. --- libvirt-glib/libvirt-glib-event.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/libvirt-glib/libvirt-glib-event.c b/libvirt-glib/libvirt-glib-event.c index 587a59e..657c1bf 100644 --- a/libvirt-glib/libvirt-glib-event.c +++ b/libvirt-glib/libvirt-glib-event.c @@ -87,7 +87,7 @@ struct gvir_event_handle int watch; int fd; int events; - int enabled; + int removed; GIOChannel *channel; guint source; virEventHandleCallback cb; @@ -99,6 +99,7 @@ struct gvir_event_timeout { int timer; int interval; + int removed; guint source; virEventTimeoutCallback cb; void *opaque; @@ -195,7 +196,7 @@ gvir_event_handle_find(int watch) continue; }
- if (h->watch == watch) { + if ((h->watch == watch) && !h->removed) { return h; } } @@ -289,6 +290,11 @@ gvir_event_handle_remove(int watch) g_source_remove(data->source); data->source = 0; data->events = 0; + /* since the actual watch deletion is done asynchronously, a handle_update call may + * reschedule the watch before it's fully deleted, that's why we need to mark it as + * 'removed' to prevent reuse + */ + data->removed = TRUE; g_idle_add(_event_handle_remove, data);
ret = 0; @@ -358,7 +364,7 @@ gvir_event_timeout_find(int timer) continue; }
- if (t->timer == timer) { + if ((t->timer == timer) && !t->removed) { return t; } } @@ -441,6 +447,11 @@ gvir_event_timeout_remove(int timer)
g_source_remove(data->source); data->source = 0; + /* since the actual timeout deletion is done asynchronously, a timeout_update call may + * reschedule the timeout before it's fully deleted, that's why we need to mark it as + * 'removed' to prevent reuse + */ + data->removed = TRUE; g_idle_add(_event_timeout_remove, data);
ret = 0;
ACK, because it doesn't hurt to be paranoid against badly written code. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Mon, Jun 25, 2012 at 11:03:16AM +0100, Daniel P. Berrange wrote:
On Wed, Jun 20, 2012 at 12:29:51PM +0200, Christophe Fergeau wrote:
The deletion of libvirt timeouts/watches is done in 2 steps: - the first step is synchronous and unregisters the timeout/watch from glib mainloop - the second step is asynchronous and triggered from the first step. It releases the memory used for bookkeeping for the timeout/watch being deleted
This is done this way to avoid some possible deadlocks when reentering the sync callback while freeing the memory associated with the timeout/watch.
However, it's possible to call gvir_event_update_handle after gvir_event_remove_handle but before _event_handle_remove does the final cleanup. When this happen, _update_handle will reregister the handle with glib mainloop, and bad things will happen when a glib callback is triggered for this event after _event_handle_remove has freed the memory associated with this handle watch.
That's clearly possible, but it is also illegal use of the events API by some client code, since the timer/watch id is not valid after calling remove(). Do you actually see this occurring in real life ?
Yes, I've seen this with gnome-boxes when starting a VM and then letting it run in the background, see the cover letter for a few more details I've just added if (h->removed) G_BREAKPOINT; to gvir_event_timeout_find to catch a backtrace when this happens: (gdb) bt #0 gvir_event_handle_find (watch=1) at libvirt-glib-event.c:202 #1 0x00007ffff6670b78 in gvir_event_handle_update (watch=1, events=1) at libvirt-glib-event.c:218 #2 0x00007ffff5f400e9 in virEventUpdateHandle (watch=1, events=1) at util/event.c:70 #3 0x00007ffff60823f1 in virNetSocketUpdateIOCallback (sock=0x7fffdc001fe0, events=1) at rpc/virnetsocket.c:1384 #4 0x00007ffff606e74e in virNetClientIOUpdateCallback (client=0x7fffdc001e20, enableCallback=true) at rpc/virnetclient.c:1454 #5 0x00007ffff606ecec in virNetClientIncomingEvent (sock=0x7fffdc001fe0, events=2, opaque=0x7fffdc001e20) at rpc/virnetclient.c:1659 #6 0x00007ffff608213c in virNetSocketEventHandle (watch=1, fd=26, events=2, opaque=0x7fffdc001fe0) at rpc/virnetsocket.c:1314 #7 0x00007ffff66708cd in gvir_event_handle_dispatch (source=0x7fffdc001f60, condition=G_IO_OUT, opaque=0x7fffdc001930) at libvirt-glib-event.c:136 #8 0x00007ffff1e20cb4 in g_io_unix_dispatch (source=0x7fffdc0017d0, callback=0x7ffff66707f8 <gvir_event_handle_dispatch>, user_data=0x7fffdc001930) at giounix.c:166 #9 0x00007ffff1dd1f15 in g_main_dispatch (context=0x6fea30) at gmain.c:2539 #10 0x00007ffff1dd2bda in g_main_context_dispatch (context=0x6fea30) at gmain.c:3075 #11 0x00007ffff1dd2dbd in g_main_context_iterate (context=0x6fea30, block=1, dispatch=1, self=0x728780) at gmain.c:3146 #12 0x00007ffff1dd2e81 in g_main_context_iteration (context=0x6fea30, may_block=1) at gmain.c:3207 #13 0x00007ffff25dbe7c in g_application_run (application=0x709890, argc=0, argv=0x0) at gapplication.c:1607 #14 0x000000000041dec8 in boxes_app_run (self=0x1525010) at app.c:1187 #15 0x00000000004506f4 in _vala_main (args=0x7fffffffd738, args_length1=1) at main.c:729 #16 0x000000000045075e in main (argc=1, argv=0x7fffffffd738) at main.c:740 It happens at more or less random times, so I'm not sure if I'll be able to catch a backtrace when the watch is removed. Christophe

On Mon, Jun 25, 2012 at 01:04:30PM +0200, Christophe Fergeau wrote:
It happens at more or less random times, so I'm not sure if I'll be able to catch a backtrace when the watch is removed.
Luckily watches don't get removed very often, so I managed to catch both the remove and the following update in gdb: Breakpoint 1, gvir_event_handle_remove (watch=1) at libvirt-glib-event.c:276 276 { (gdb) bt #0 gvir_event_handle_remove (watch=1) at libvirt-glib-event.c:276 #1 0x00007ffff5f4013a in virEventRemoveHandle (watch=1) at util/event.c:84 #2 0x00007ffff6082498 in virNetSocketRemoveIOCallback (sock=0x7fffdc001fe0) at rpc/virnetsocket.c:1399 #3 0x00007ffff606ec96 in virNetClientIncomingEvent (sock=0x7fffdc001fe0, events=2, opaque=0x7fffdc001e20) at rpc/virnetclient.c:1647 #4 0x00007ffff608213c in virNetSocketEventHandle (watch=1, fd=26, events=2, opaque=0x7fffdc001fe0) at rpc/virnetsocket.c:1314 #5 0x00007ffff66708cd in gvir_event_handle_dispatch (source=0x7fffdc001f60, condition=G_IO_OUT, opaque=0x7fffdc001930) at libvirt-glib-event.c:136 #6 0x00007ffff1e20cb4 in g_io_unix_dispatch (source=0x7fffb8002c00, callback=0x7ffff66707f8 <gvir_event_handle_dispatch>, user_data=0x7fffdc001930) at giounix.c:166 #7 0x00007ffff1dd1f15 in g_main_dispatch (context=0x6fea30) at gmain.c:2539 #8 0x00007ffff1dd2bda in g_main_context_dispatch (context=0x6fea30) at gmain.c:3075 #9 0x00007ffff1dd2dbd in g_main_context_iterate (context=0x6fea30, block=1, dispatch=1, self=0x1582780) at gmain.c:3146 #10 0x00007ffff1dd2e81 in g_main_context_iteration (context=0x6fea30, may_block=1) at gmain.c:3207 #11 0x00007ffff25dbe7c in g_application_run (application=0x709890, argc=0, argv=0x0) at gapplication.c:1607 #12 0x000000000041dec8 in boxes_app_run (self=0x1525010) at app.c:1187 #13 0x00000000004506f4 in _vala_main (args=0x7fffffffd738, args_length1=1) at main.c:729 #14 0x000000000045075e in main (argc=1, argv=0x7fffffffd738) at main.c:740 (gdb) c Continuing. [Thread 0x7fffd5853700 (LWP 30606) exited] Program received signal SIGTRAP, Trace/breakpoint trap. gvir_event_handle_find (watch=1) at libvirt-glib-event.c:202 202 if ((h->watch == watch) && !h->removed) { (gdb) bt #0 gvir_event_handle_find (watch=1) at libvirt-glib-event.c:202 #1 0x00007ffff6670b78 in gvir_event_handle_update (watch=1, events=1) at libvirt-glib-event.c:218 #2 0x00007ffff5f400e9 in virEventUpdateHandle (watch=1, events=1) at util/event.c:70 #3 0x00007ffff60823f1 in virNetSocketUpdateIOCallback (sock=0x7fffdc001fe0, events=1) at rpc/virnetsocket.c:1384 #4 0x00007ffff606e74e in virNetClientIOUpdateCallback (client=0x7fffdc001e20, enableCallback=true) at rpc/virnetclient.c:1454 #5 0x00007ffff606ecec in virNetClientIncomingEvent (sock=0x7fffdc001fe0, events=2, opaque=0x7fffdc001e20) at rpc/virnetclient.c:1659 #6 0x00007ffff608213c in virNetSocketEventHandle (watch=1, fd=26, events=2, opaque=0x7fffdc001fe0) at rpc/virnetsocket.c:1314 #7 0x00007ffff66708cd in gvir_event_handle_dispatch (source=0x7fffdc001f60, condition=G_IO_OUT, opaque=0x7fffdc001930) at libvirt-glib-event.c:136 #8 0x00007ffff1e20cb4 in g_io_unix_dispatch (source=0x7fffb8002c00, callback=0x7ffff66707f8 <gvir_event_handle_dispatch>, user_data=0x7fffdc001930) at giounix.c:166 #9 0x00007ffff1dd1f15 in g_main_dispatch (context=0x6fea30) at gmain.c:2539 #10 0x00007ffff1dd2bda in g_main_context_dispatch (context=0x6fea30) at gmain.c:3075 #11 0x00007ffff1dd2dbd in g_main_context_iterate (context=0x6fea30, block=1, dispatch=1, self=0x1582780) at gmain.c:3146 #12 0x00007ffff1dd2e81 in g_main_context_iteration (context=0x6fea30, may_block=1) at gmain.c:3207 #13 0x00007ffff25dbe7c in g_application_run (application=0x709890, argc=0, argv=0x0) at gapplication.c:1607 #14 0x000000000041dec8 in boxes_app_run (self=0x1525010) at app.c:1187 #15 0x00000000004506f4 in _vala_main (args=0x7fffffffd738, args_length1=1) at main.c:729 #16 0x000000000045075e in main (argc=1, argv=0x7fffffffd738) at main.c:740 (gdb) Hope that helps, Christophe

Ping ? Christophe On Wed, Jun 20, 2012 at 12:29:46PM +0200, Christophe Fergeau wrote:
Hey,
While testing Boxes, I've noticed that if I started a VM and then let it in the background for a while, it usually crashed after a while. When this happens, the backtrace is:
Program received signal SIGSEGV, Segmentation fault. 0x00007ffff6671898 in gvir_event_handle_dispatch (source=0x7fffdc001f60, condition=G_IO_IN, opaque=0x7fffdc0021a0) at libvirt-glib-event.c:135 135 (data->cb)(data->watch, data->fd, events, data->opaque); (gdb) bt #0 0x00007ffff6671898 in gvir_event_handle_dispatch (source=0x7fffdc001f60, condition=G_IO_IN, opaque=0x7fffdc0021a0) at libvirt-glib-event.c:135 #1 0x00007ffff231dcb4 in g_io_unix_dispatch (source=0x40e34a0, callback=0x7ffff66717d8 <gvir_event_handle_dispatch>, user_data=0x7fffdc0021a0) at giounix.c:166 #2 0x00007ffff22cef15 in g_main_dispatch (context=0x7007f0) at gmain.c:2539 #3 0x00007ffff22cfbda in g_main_context_dispatch (context=0x7007f0) at gmain.c:3075 #4 0x00007ffff22cfdbd in g_main_context_iterate (context=0x7007f0, block=1, dispatch=1, self=0x15d3180) at gmain.c:3146 #5 0x00007ffff22cfe81 in g_main_context_iteration (context=0x7007f0, may_block=1) at gmain.c:3207 #6 0x00007ffff2ad7e7c in g_application_run (application=0x70b890, argc=0, argv=0x0) at gapplication.c:1607 #7 0x000000000041df98 in boxes_app_run (self=0x1527010) at app.c:1254 #8 0x0000000000450134 in _vala_main (args=0x7fffffffd708, args_length1=1) at main.c:729 #9 0x000000000045019e in main (argc=1, argv=0x7fffffffd708) at main.c:740
and the corresponding gvir_event_handle structure is corrupted:
(gdb) p *data $3 = {watch = 0, fd = 0, events = 1886417008, enabled = 1886417008, channel = 0x7070707070707070, source = 1886417008, cb = 0x7070707070707070, opaque = 0x7070707070707070, ff = 0x7ffff60cfba0 <virNetSocketEventFree>} (gdb) p/x *data $4 = {watch = 0x0, fd = 0x0, events = 0x70707070, enabled = 0x70707070, channel = 0x7070707070707070, source = 0x70707070, cb = 0x7070707070707070, opaque = 0x7070707070707070, ff = 0x7ffff60cfba0}
(I'm running with MALLOC_PERTURB_ set so the 0x70707070 pattern indicates uninitialized memory).
Right before this happens, the libvirt-glib event debugging log has:
(gnome-boxes:29577): Libvirt.GLib-DEBUG: Remove handle 0x7fffdc0021a0 1 27
(gnome-boxes:29577): Libvirt.GLib-DEBUG: Update handle 0x7fffdc0021a0 1 27 1
(gnome-boxes:29577): Libvirt.GLib-DEBUG: Update for missing handle watch 1
(gnome-boxes:29577): Libvirt.GLib-DEBUG: Dispatch handler 0x7fffdc0021a0 0 0 1 0x7070707070707070
that is, libvirt removes a handle watch, and then updates this handle (in other words, it calls virEventPollUpdateHandle after having called virEventPollRemoveHandle on the same handle). I don't know if it's legit for libvirt to do that, but we can make libvirt-glib robust against this situation, which is what patch 5/5 does. The first 3 patches are small cleanups, and 4/5 fixes a potential race which I found while looking at this code.
Christophe
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
participants (2)
-
Christophe Fergeau
-
Daniel P. Berrange