[libvirt] Deadlock when using custom handlers

Hi, virt-viewer is using it's own virEventRegisterImpl. With current libvirt this can deadlock when connection to nonexistant URIs like qemu+ssh:///unknownhost.example.com/system like: 23:47:00.338: 1526: debug : doRemoteOpen:503 : proceeding with name = qemu:///system 23:47:00.338: 1526: debug : doRemoteOpen:513 : Connecting with transport 2 23:47:00.338: 1526: debug : virCommandRunAsync:2042 : About to run LC_ALL=C LD_LIBRARY_PATH=/var/scratch/debian/libvirt/upstream/libvirt/src/.libs/ PATH=/usr/local/bin:/usr/bin:/bin:/usr/games:/home/agx/bin:/sbin:/usr/sbin:/usr/lib/git-core HOME=/home/agx USER=agx LOGNAME=agx SSH_AUTH_SOCK=/tmp/keyring-RNvlnT/ssh DISPLAY=:0 ssh pustekiste nc -U /var/run/libvirt/libvirt-sock-ro 23:47:00.339: 1526: debug : virCommandRunAsync:2058 : Command result 0, with PID 1527 23:47:00.339: 1526: debug : virNetSocketNew:115 : localAddr=(nil) remoteAddr=(nil) fd=6 errfd=8 pid=1527 23:47:00.339: 1526: debug : virNetSocketNew:173 : sock=0x8926d20 localAddrStr=(null) remoteAddrStr=(null) ** (virt-viewer:1526): DEBUG: Add handle 6 1 0x8926d20 23:47:00.339: 1526: debug : virNetClientNew:160 : client=0xb5b5f008 refs=2 23:47:00.339: 1526: debug : doRemoteOpen:640 : Trying authentication 23:47:00.339: 1526: debug : virNetMessageNew:44 : msg=0xb5b1e008 23:47:00.340: 1526: debug : virNetMessageEncodePayload:255 : Encode length as 28 23:47:00.340: 1526: debug : virNetClientIO:1035 : program=536903814 version=1 serial=0 proc=66 type=0 length=28 dispatch=(nil) 23:47:00.340: 1526: debug : virNetClientIO:1103 : We have the buck 0x8939940 0x8939940 23:47:00.350: 1526: debug : virNetClientIOEventLoop:979 : Giving up the buck due to I/O error 0x8939940 (nil) 23:47:00.350: 1526: debug : virNetClientIO:1130 : All done with our call (nil) 0x8939940 -1 23:47:00.350: 1526: debug : virNetMessageFree:57 : msg=0xb5b1e008 ** (virt-viewer:1526): DEBUG: Remove handle 1 6 <deadlock> Gdb displays the deadlock nicely: #0 0xb7710424 in __kernel_vsyscall () #1 0xb692cf02 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:142 #2 0xb692839b in _L_lock_728 () from /lib/i386-linux-gnu/i686/cmov/libpthread.so.0 #3 0xb69281c1 in __pthread_mutex_lock (mutex=0x95e1c30) at pthread_mutex_lock.c:61 #4 0xb698fd07 in virMutexLock (m=0x95e1c30) at util/threads-pthread.c:85 #5 0xb6a5dd56 in virNetSocketEventFree (opaque=0x95e1c30) at rpc/virnetsocket.c:1147 #6 0x080514a1 in virt_viewer_events_remove_handle (watch=1) at virt-viewer-events.c:178 #7 0xb697264e in virEventRemoveHandle (watch=1) at util/event.c:84 #8 0xb6a608ed in virNetSocketRemoveIOCallback (sock=0x95e1c30) at rpc/virnetsocket.c:1221 #9 0xb6a57114 in virNetClientClose (client=0xb5b7e008) at rpc/virnetclient.c:280 #10 0xb6a46fc4 in doRemoteOpen (conn=0x95e1aa8, priv=0x95b1370, auth=0xbfc3b178, flags=1) at remote/remote_driver.c:705 #11 0xb6a49612 in remoteOpen (conn=0x95e1aa8, auth=0xbfc3b178, flags=1) at remote/remote_driver.c:820 #12 0xb69f70c6 in do_open (name=0x95aac10 "qemu+ssh://pustekiste:2222/system", auth=0xbfc3b178, flags=1) at libvirt.c:1054 #13 0xb69f9b88 in virConnectOpenAuth (name=0x95aac10 "qemu+ssh://pustekiste:2222/system", auth=0xbfc3b178, flags=1) at libvirt.c:1280 #14 0x080500cf in virt_viewer_start (app=0x95a1010) at virt-viewer.c:483 #15 0x08053be8 in virt_viewer_app_start (self=0x95a1010) at virt-viewer-app.c:1044 #16 0x0804f54d in main (argc=1, argv=0xbfc3b3f4) at virt-viewer-main.c:120 which is the sock->lock: virNetSocketRemoveIOCallback holds sock->lock -> virEventRemoveHandle -> impl_remove_handle -> opaque->ff -> virNetSocketEventFree want to hold sock->lock Working around this by removing the locks from virNetSocketRemoveIOCallback leads to another deadlock: #0 0xb77de424 in __kernel_vsyscall () #1 0xb69faf02 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:142 #2 0xb69f639b in _L_lock_728 () from /lib/i386-linux-gnu/i686/cmov/libpthread.so.0 #3 0xb69f61c1 in __pthread_mutex_lock (mutex=0xb5c4c00c) at pthread_mutex_lock.c:61 #4 0xb6a5dd07 in virMutexLock (m=0xb5c4c00c) at util/threads-pthread.c:85 #5 0xb6b24a98 in virNetClientLock (client=0xb5c4c008) at rpc/virnetclient.c:99 #6 virNetClientFree (client=0xb5c4c008) at rpc/virnetclient.c:243 #7 0xb6b250b7 in virNetClientEventFree (opaque=0xb5c4c008) at rpc/virnetclient.c:117 #8 0xb6b2bd82 in virNetSocketEventFree (opaque=0x8aabc30) at rpc/virnetsocket.c:1156 #9 0x080514a1 in virt_viewer_events_remove_handle (watch=1) at virt-viewer-events.c:178 #10 0xb6a4064e in virEventRemoveHandle (watch=1) at util/event.c:84 #11 0xb6b2e8e5 in virNetSocketRemoveIOCallback (sock=0x8aabc30) at rpc/virnetsocket.c:1219 #12 0xb6b25114 in virNetClientClose (client=0xb5c4c008) at rpc/virnetclient.c:280 #13 0xb6b14fc4 in doRemoteOpen (conn=0x8aabaa8, priv=0x8a7b370, auth=0xbfbf1798, flags=1) at remote/remote_driver.c:705 #14 0xb6b17612 in remoteOpen (conn=0x8aabaa8, auth=0xbfbf1798, flags=1) at remote/remote_driver.c:820 #15 0xb6ac50c6 in do_open (name=0x8a74c10 "qemu+ssh://pustekiste:2222/system", auth=0xbfbf1798, flags=1) at libvirt.c:1054 #16 0xb6ac7b88 in virConnectOpenAuth (name=0x8a74c10 "qemu+ssh://pustekiste:2222/system", auth=0xbfbf1798, flags=1) at libvirt.c:1280 #17 0x080500cf in virt_viewer_start (app=0x8a6b010) at virt-viewer.c:483 #18 0x08053be8 in virt_viewer_app_start (self=0x8a6b010) at virt-viewer-app.c:1044 #19 0x0804f54d in main (argc=1, argv=0xbfbf1a14) at virt-viewer-main.c:120 which is the virNetClient lock: virNetClientClose holds client->lock -> virNetSocketRemoveIOCallback -> virEventRemoveHandle -> impl_remove_handle -> virNetSocketEventFree -> virNetClientEventFree wants the lock I didn't see a simple way to fix this but would welcome any suggestions. Cheers, -- Guido

On Fri, Aug 12, 2011 at 11:54:28PM +0200, Guido Günther wrote:
Hi, virt-viewer is using it's own virEventRegisterImpl. With current libvirt this can deadlock when connection to nonexistant URIs like
qemu+ssh:///unknownhost.example.com/system Gdb displays the deadlock nicely:
#0 0xb7710424 in __kernel_vsyscall () #1 0xb692cf02 in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:142 #2 0xb692839b in _L_lock_728 () from /lib/i386-linux-gnu/i686/cmov/libpthread.so.0 #3 0xb69281c1 in __pthread_mutex_lock (mutex=0x95e1c30) at pthread_mutex_lock.c:61 #4 0xb698fd07 in virMutexLock (m=0x95e1c30) at util/threads-pthread.c:85 #5 0xb6a5dd56 in virNetSocketEventFree (opaque=0x95e1c30) at rpc/virnetsocket.c:1147 #6 0x080514a1 in virt_viewer_events_remove_handle (watch=1) at virt-viewer-events.c:178 #7 0xb697264e in virEventRemoveHandle (watch=1) at util/event.c:84 #8 0xb6a608ed in virNetSocketRemoveIOCallback (sock=0x95e1c30) at rpc/virnetsocket.c:1221 #9 0xb6a57114 in virNetClientClose (client=0xb5b7e008) at rpc/virnetclient.c:280 #10 0xb6a46fc4 in doRemoteOpen (conn=0x95e1aa8, priv=0x95b1370, auth=0xbfc3b178, flags=1) at remote/remote_driver.c:705 #11 0xb6a49612 in remoteOpen (conn=0x95e1aa8, auth=0xbfc3b178, flags=1) at remote/remote_driver.c:820 #12 0xb69f70c6 in do_open (name=0x95aac10 "qemu+ssh://pustekiste:2222/system", auth=0xbfc3b178, flags=1) at libvirt.c:1054 #13 0xb69f9b88 in virConnectOpenAuth (name=0x95aac10 "qemu+ssh://pustekiste:2222/system", auth=0xbfc3b178, flags=1) at libvirt.c:1280 #14 0x080500cf in virt_viewer_start (app=0x95a1010) at virt-viewer.c:483 #15 0x08053be8 in virt_viewer_app_start (self=0x95a1010) at virt-viewer-app.c:1044 #16 0x0804f54d in main (argc=1, argv=0xbfc3b3f4) at virt-viewer-main.c:120
which is the sock->lock:
virNetSocketRemoveIOCallback holds sock->lock -> virEventRemoveHandle -> impl_remove_handle -> opaque->ff -> virNetSocketEventFree want to hold sock->lock
In the default libvirt event loop, the 'ff' callback is always invoked from a "clean" stack in the event loop, so you never have this problem with re-entrancy.
Working around this by removing the locks from virNetSocketRemoveIOCallback leads to another deadlock:
Yeah this is not a viable approach.
I didn't see a simple way to fix this but would welcome any suggestions.
IMHO we just have to document that event loop implementations should make sure that the 'ff' callbacks are always invoked from a clean stack. In the case of virt-viewer, this means changing it to register a g_idle callback function to invoke the 'ff' callback. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Hi Daniel, On Sat, Aug 13, 2011 at 08:57:45PM -0700, Daniel P. Berrange wrote:
On Fri, Aug 12, 2011 at 11:54:28PM +0200, Guido Günther wrote: [..snip..] In the default libvirt event loop, the 'ff' callback is always invoked from a "clean" stack in the event loop, so you never have this problem with re-entrancy.
Working around this by removing the locks from virNetSocketRemoveIOCallback leads to another deadlock:
Yeah this is not a viable approach. Sure. This was only to see what else fails.
I didn't see a simple way to fix this but would welcome any suggestions.
IMHO we just have to document that event loop implementations should make sure that the 'ff' callbacks are always invoked from a clean stack. In the case of virt-viewer, this means changing it to register a g_idle callback function to invoke the 'ff' callback.
Patch for virt-viewer attached. I'll come up with a doc patch for libvirt once I have a bit more time. Cheers -- Guido

On Tue, Aug 16, 2011 at 02:24:53PM +0200, Guido Günther wrote:
Hi Daniel, On Sat, Aug 13, 2011 at 08:57:45PM -0700, Daniel P. Berrange wrote:
On Fri, Aug 12, 2011 at 11:54:28PM +0200, Guido Günther wrote: [..snip..] In the default libvirt event loop, the 'ff' callback is always invoked from a "clean" stack in the event loop, so you never have this problem with re-entrancy.
Working around this by removing the locks from virNetSocketRemoveIOCallback leads to another deadlock:
Yeah this is not a viable approach. Sure. This was only to see what else fails.
I didn't see a simple way to fix this but would welcome any suggestions.
IMHO we just have to document that event loop implementations should make sure that the 'ff' callbacks are always invoked from a clean stack. In the case of virt-viewer, this means changing it to register a g_idle callback function to invoke the 'ff' callback.
Patch for virt-viewer attached. I'll come up with a doc patch for libvirt once I have a bit more time.
Thanks, I have applied this patch. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Tue, Aug 16, 2011 at 02:24:53PM +0200, Guido Günther wrote:
Hi Daniel, On Sat, Aug 13, 2011 at 08:57:45PM -0700, Daniel P. Berrange wrote:
On Fri, Aug 12, 2011 at 11:54:28PM +0200, Guido Günther wrote: [..snip..] In the default libvirt event loop, the 'ff' callback is always invoked from a "clean" stack in the event loop, so you never have this problem with re-entrancy.
Working around this by removing the locks from virNetSocketRemoveIOCallback leads to another deadlock:
Yeah this is not a viable approach. Sure. This was only to see what else fails.
I didn't see a simple way to fix this but would welcome any suggestions.
IMHO we just have to document that event loop implementations should make sure that the 'ff' callbacks are always invoked from a clean stack. In the case of virt-viewer, this means changing it to register a g_idle callback function to invoke the 'ff' callback.
Patch for virt-viewer attached. I'll come up with a doc patch for libvirt once I have a bit more time.
As promised here's the patch for libvirt. O.k. to apply? Cheers, -- Guido

On Mon, Oct 03, 2011 at 10:41:17PM +0200, Guido Günther wrote:
On Tue, Aug 16, 2011 at 02:24:53PM +0200, Guido Günther wrote:
Hi Daniel, On Sat, Aug 13, 2011 at 08:57:45PM -0700, Daniel P. Berrange wrote:
On Fri, Aug 12, 2011 at 11:54:28PM +0200, Guido Günther wrote: [..snip..] In the default libvirt event loop, the 'ff' callback is always invoked from a "clean" stack in the event loop, so you never have this problem with re-entrancy.
Working around this by removing the locks from virNetSocketRemoveIOCallback leads to another deadlock:
Yeah this is not a viable approach. Sure. This was only to see what else fails.
I didn't see a simple way to fix this but would welcome any suggestions.
IMHO we just have to document that event loop implementations should make sure that the 'ff' callbacks are always invoked from a clean stack. In the case of virt-viewer, this means changing it to register a g_idle callback function to invoke the 'ff' callback.
Patch for virt-viewer attached. I'll come up with a doc patch for libvirt once I have a bit more time.
As promised here's the patch for libvirt. O.k. to apply? Cheers, -- Guido
From c2e2be4e377871ace12bb7adfde130e9b8779ab5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Guido=20G=C3=BCnther?= <agx@sigxcpu.org> Date: Mon, 3 Oct 2011 22:24:13 +0200 Subject: [PATCH] Document that ff callbacks need to be invoked from a clean stack.
--- include/libvirt/libvirt.h.in | 6 ++++-- 1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index a3c581d..bd7a0f7 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -2243,13 +2243,15 @@ typedef void (*virEventHandleCallback)(int watch, int fd, int events, void *opaq * @opaque: user data to pass to the callback * @ff: the callback invoked to free opaque data blob * - * Part of the EventImpl, this callback Adds a file handle callback to + * Part of the EventImpl, this callback adds a file handle callback to * listen for specific events. The same file handle can be registered * multiple times provided the requested event sets are non-overlapping * * If the opaque user data requires free'ing when the handle * is unregistered, then a 2nd callback can be supplied for - * this purpose. + * this purpose. This callback needs to be invoked from a clean stack. + * If 'ff' callbacks are invoked directly from the virEventRemoveHandleFunc + * they will likely deadlock in libvirt. * * Returns a handle watch number to be used for updating * and unregistering for events
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Tue, Oct 04, 2011 at 09:00:17AM +0100, Daniel P. Berrange wrote:
On Mon, Oct 03, 2011 at 10:41:17PM +0200, Guido Günther wrote:
On Tue, Aug 16, 2011 at 02:24:53PM +0200, Guido Günther wrote:
Hi Daniel, On Sat, Aug 13, 2011 at 08:57:45PM -0700, Daniel P. Berrange wrote:
On Fri, Aug 12, 2011 at 11:54:28PM +0200, Guido Günther wrote: [..snip..] In the default libvirt event loop, the 'ff' callback is always invoked from a "clean" stack in the event loop, so you never have this problem with re-entrancy.
Working around this by removing the locks from virNetSocketRemoveIOCallback leads to another deadlock:
Yeah this is not a viable approach. Sure. This was only to see what else fails.
I didn't see a simple way to fix this but would welcome any suggestions.
IMHO we just have to document that event loop implementations should make sure that the 'ff' callbacks are always invoked from a clean stack. In the case of virt-viewer, this means changing it to register a g_idle callback function to invoke the 'ff' callback.
Patch for virt-viewer attached. I'll come up with a doc patch for libvirt once I have a bit more time.
As promised here's the patch for libvirt. O.k. to apply? Cheers, -- Guido
From c2e2be4e377871ace12bb7adfde130e9b8779ab5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Guido=20G=C3=BCnther?= <agx@sigxcpu.org> Date: Mon, 3 Oct 2011 22:24:13 +0200 Subject: [PATCH] Document that ff callbacks need to be invoked from a clean stack.
--- include/libvirt/libvirt.h.in | 6 ++++-- 1 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index a3c581d..bd7a0f7 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -2243,13 +2243,15 @@ typedef void (*virEventHandleCallback)(int watch, int fd, int events, void *opaq * @opaque: user data to pass to the callback * @ff: the callback invoked to free opaque data blob * - * Part of the EventImpl, this callback Adds a file handle callback to + * Part of the EventImpl, this callback adds a file handle callback to * listen for specific events. The same file handle can be registered * multiple times provided the requested event sets are non-overlapping * * If the opaque user data requires free'ing when the handle * is unregistered, then a 2nd callback can be supplied for - * this purpose. + * this purpose. This callback needs to be invoked from a clean stack. + * If 'ff' callbacks are invoked directly from the virEventRemoveHandleFunc + * they will likely deadlock in libvirt. * * Returns a handle watch number to be used for updating * and unregistering for events
ACK Pushed. Thanks. -- Guido
participants (2)
-
Daniel P. Berrange
-
Guido Günther