[libvirt] [PATCH] Fix (rare) deadlock in QEMU monitor callbacks

From: "Daniel P. Berrange" <berrange@redhat.com> Some users report (very rarely) seeing a deadlock in the QEMU monitor callbacks Thread 10 (Thread 0x7fcd11e20700 (LWP 26753)): at util/threads-pthread.c:85 at conf/domain_conf.c:14256 vm=0x7fcccc00a850) at qemu/qemu_process.c:1026 at qemu/qemu_monitor.c:249 at util/virobject.c:139 at qemu/qemu_monitor.c:860 vm=vm@entry=0x7fcccc00a850, reason=reason@entry=VIR_DOMAIN_SHUTOFF_DESTROYED, flags=flags@entry=0) at qemu/qemu_process.c:4057 flags=<optimized out>) at qemu/qemu_driver.c:1977 domain=domain@entry=0x7fccf00c1830, flags=1) at libvirt.c:2256 At frame #10 we are holding the domain lock, we call into qemuProcessStop() to cleanup QEMU, which triggers the monitor to close, which invokes qemuProcessHandleMonitorDestroy() which tries to obtain the domain lock again. This is a non-recursive lock, hence hang. Since qemuMonitorPtr is a virObject, the unref call in qemuProcessHandleMonitorDestroy no longer needs mutex protection. The assignment of priv->mon = NULL, can be instead done by the caller of qemuMonitorClose(), thus removing all need for locking. Signed-off-by: Daniel P. 
Berrange <berrange@redhat.com> --- src/qemu/qemu_process.c | 15 +++++---------- 1 file changed, 5 insertions(+), 10 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index ade64b7..c8c188a 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1019,17 +1019,10 @@ no_memory: } -static void qemuProcessHandleMonitorDestroy(qemuMonitorPtr mon, +static void qemuProcessHandleMonitorDestroy(qemuMonitorPtr mon ATTRIBUTE_UNUSED, virDomainObjPtr vm) { - qemuDomainObjPrivatePtr priv; - - virDomainObjLock(vm); - priv = vm->privateData; - if (priv->mon == mon) - priv->mon = NULL; - if (virObjectUnref(vm)) - virDomainObjUnlock(vm); + virObjectUnref(vm); } static int @@ -3995,8 +3988,10 @@ void qemuProcessStop(struct qemud_driver *driver, priv->agentError = false; } - if (priv->mon) + if (priv->mon) { qemuMonitorClose(priv->mon); + priv->mon = NULL; + } if (priv->monConfig) { if (priv->monConfig->type == VIR_DOMAIN_CHR_TYPE_UNIX) -- 1.7.11.2

On Wed, Sep 26, 2012 at 03:56:52PM +0100, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
Some users report (very rarely) seeing a deadlock in the QEMU monitor callbacks
Thread 10 (Thread 0x7fcd11e20700 (LWP 26753)):
    at util/threads-pthread.c:85
    at conf/domain_conf.c:14256
    vm=0x7fcccc00a850) at qemu/qemu_process.c:1026
    at qemu/qemu_monitor.c:249
    at util/virobject.c:139
    at qemu/qemu_monitor.c:860
    vm=vm@entry=0x7fcccc00a850, reason=reason@entry=VIR_DOMAIN_SHUTOFF_DESTROYED,
    flags=flags@entry=0) at qemu/qemu_process.c:4057
    flags=<optimized out>) at qemu/qemu_driver.c:1977
    domain=domain@entry=0x7fccf00c1830, flags=1) at libvirt.c:2256
Argh, GIT ate my stack trace because the first letter was '#'!

Thread 10 (Thread 0x7fcd11e20700 (LWP 26753)):
#0  0x00000030d0e0de4d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00000030d0e09ca6 in _L_lock_840 () from /lib64/libpthread.so.0
#2  0x00000030d0e09ba8 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007fcd162f416d in virMutexLock (m=<optimized out>)
    at util/threads-pthread.c:85
#4  0x00007fcd1632c651 in virDomainObjLock (obj=<optimized out>)
    at conf/domain_conf.c:14256
#5  0x00007fcd0daf05cc in qemuProcessHandleMonitorDestroy (mon=0x7fcccc0029e0,
    vm=0x7fcccc00a850) at qemu/qemu_process.c:1026
#6  0x00007fcd0db01710 in qemuMonitorDispose (obj=0x7fcccc0029e0)
    at qemu/qemu_monitor.c:249
#7  0x00007fcd162fd4e3 in virObjectUnref (anyobj=<optimized out>)
    at util/virobject.c:139
#8  0x00007fcd0db027a9 in qemuMonitorClose (mon=<optimized out>)
    at qemu/qemu_monitor.c:860
#9  0x00007fcd0daf61ad in qemuProcessStop (driver=driver@entry=0x7fcd04079d50,
    vm=vm@entry=0x7fcccc00a850, reason=reason@entry=VIR_DOMAIN_SHUTOFF_DESTROYED,
    flags=flags@entry=0) at qemu/qemu_process.c:4057
#10 0x00007fcd0db323cf in qemuDomainDestroyFlags (dom=<optimized out>,
    flags=<optimized out>) at qemu/qemu_driver.c:1977
#11 0x00007fcd1637ff51 in virDomainDestroyFlags (
    domain=domain@entry=0x7fccf00c1830, flags=1) at libvirt.c:2256
At frame #10 we are holding the domain lock; we call into qemuProcessStop() to clean up QEMU, which triggers the monitor to close, which invokes qemuProcessHandleMonitorDestroy(), which tries to obtain the domain lock again. This is a non-recursive lock, hence the hang.
Since qemuMonitorPtr is a virObject, the unref call in qemuProcessHandleMonitorDestroy no longer needs mutex protection. The assignment of priv->mon = NULL can instead be done by the caller of qemuMonitorClose(), thus removing all need for locking.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
---
 src/qemu/qemu_process.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index ade64b7..c8c188a 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -1019,17 +1019,10 @@ no_memory:
 }
 
-static void qemuProcessHandleMonitorDestroy(qemuMonitorPtr mon,
+static void qemuProcessHandleMonitorDestroy(qemuMonitorPtr mon ATTRIBUTE_UNUSED,
                                             virDomainObjPtr vm)
 {
-    qemuDomainObjPrivatePtr priv;
-
-    virDomainObjLock(vm);
-    priv = vm->privateData;
-    if (priv->mon == mon)
-        priv->mon = NULL;
-    if (virObjectUnref(vm))
-        virDomainObjUnlock(vm);
+    virObjectUnref(vm);
 }
 
 static int
@@ -3995,8 +3988,10 @@ void qemuProcessStop(struct qemud_driver *driver,
         priv->agentError = false;
     }
 
-    if (priv->mon)
+    if (priv->mon) {
         qemuMonitorClose(priv->mon);
+        priv->mon = NULL;
+    }
 
     if (priv->monConfig) {
         if (priv->monConfig->type == VIR_DOMAIN_CHR_TYPE_UNIX)
-- 
1.7.11.2
-- 
|: http://berrange.com        -o-  http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org         -o-               http://virt-manager.org    :|
|: http://autobuild.org       -o-    http://search.cpan.org/~danberr/      :|
|: http://entangle-photo.org  -o-        http://live.gnome.org/gtk-vnc     :|

On 09/26/2012 08:56 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
Some users report (very rarely) seeing a deadlock in the QEMU monitor callbacks
Thread 10 (Thread 0x7fcd11e20700 (LWP 26753)):
    at util/threads-pthread.c:85
    at conf/domain_conf.c:14256
    vm=0x7fcccc00a850) at qemu/qemu_process.c:1026
    at qemu/qemu_monitor.c:249
    at util/virobject.c:139
    at qemu/qemu_monitor.c:860
    vm=vm@entry=0x7fcccc00a850, reason=reason@entry=VIR_DOMAIN_SHUTOFF_DESTROYED,
    flags=flags@entry=0) at qemu/qemu_process.c:4057
    flags=<optimized out>) at qemu/qemu_driver.c:1977
    domain=domain@entry=0x7fccf00c1830, flags=1) at libvirt.c:2256
At frame #10 we are holding the domain lock; we call into qemuProcessStop() to clean up QEMU, which triggers the monitor to close, which invokes qemuProcessHandleMonitorDestroy(), which tries to obtain the domain lock again. This is a non-recursive lock, hence the hang.
Since qemuMonitorPtr is a virObject, the unref call in qemuProcessHandleMonitorDestroy no longer needs mutex protection. The assignment of priv->mon = NULL can instead be done by the caller of qemuMonitorClose(), thus removing all need for locking.
ACK.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On Wed, Sep 26, 2012 at 03:56:52PM +0100, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
Some users report (very rarely) seeing a deadlock in the QEMU monitor callbacks
This is actually not rare. I hit it again today, twice, and I can ~50% reliably reproduce it using the parallel mount-local test in libguestfs.

I have added this patch to my libvirt, and will see how it goes ...

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://et.redhat.com/~rjones/virt-top

On Sat, Sep 29, 2012 at 08:30:10PM +0100, Richard W.M. Jones wrote:
On Wed, Sep 26, 2012 at 03:56:52PM +0100, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
Some users report (very rarely) seeing a deadlock in the QEMU monitor callbacks
This is actually not rare. I hit it again today, twice, and I can ~50% reliably reproduce it using the parallel mount-local test in libguestfs.
I have added this patch to my libvirt, and will see how it goes ...
Looks good. I did ~100 cycles of that one test without any deadlocks.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://et.redhat.com/~rjones/virt-df/
participants (3)
- Daniel P. Berrange
- Eric Blake
- Richard W.M. Jones