Re: [libvirt] [PATCH 03/14] Fix crash when deleting monitor while a command is in progress

3 Dec 2009


      On Thu, Dec 03, 2009 at 12:47:02AM +0100, Matthias Bolte wrote:
...
2009/11/26 Daniel P. Berrange <berrange@redhat.com>:
...
If QEMU shuts down while we're in the middle of processing a
monitor command, the monitor will be freed, and upon cleaning
up we attempt to do  qemuMonitorUnlock(priv->mon) when priv->mon
is NULL.
To address this we introduce proper reference counting into
the qemuMonitorPtr object, and hold an extra reference whenever
executing a command.
* src/qemu/qemu_driver.c: Hold a reference on the monitor while
 executing commands, and only NULL-ify the priv->mon field when
 the last reference is released
* src/qemu/qemu_monitor.h, src/qemu/qemu_monitor.c: Add reference
 counting to handle safe deletion of monitor objects
The locking pattern below results in destroying a locked mutex. It
this intended?
No, that's a bug, the qemuMonitorUnref() call itself should have
called unlock if the ref count hit 0, before destroying it.
...
qemuMonitorLock(mon);
[...]
if (qemuMonitorUnref(mon) > 0)
    qemuMonitorUnlock(mon);
Well, this patch makes the TCK deadlock for me, seems to be a lock
ordering issue combined with a race condition; it doesn't happen every
run. I don't understand all details of the locking and refcounting
scheme of the QEMU monitor yet, it's quite complex and gets even more
complex.
Yes, that problem shown by valgrind will indeed deadlock. I'll fix
this. The domain object lock must never be acquired if the thread
holds the monitor lock already. We must have strict ordering from
top to bottom (driver -> domain object -> qemu monitor)


Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|