[libvirt] [PATCH 0/2] qemu: Fix DomainObj refcounting/hashtable races

Both of these bugs were found while looking for the cause of: https://bugzilla.redhat.com/show_bug.cgi?id=670848

The end result of that bug was a crash due to an attempt to double-free a virDomainObj, and both of these bugs could have contributed to that crash. There *may* be other places where virDomainRemoveInactive is called without holding the driver lock (e.g., the error handling of qemuProcessReconnect?), but they can be looked at later; the occurrence fixed here is almost certainly part of the cause of the crash in Bug 670848.

Note that these bugs showed up when testing transient domains, which seem to be used less often than persistent domains, but which put more stress on all of the code that adds and removes virDomainObjs (a persistent domain's virDomainObj is created when libvirtd starts and remains alive until libvirtd is terminated, but a transient domain's virDomainObj dies when the domain is shut down). It would probably be a good idea to add some transient domain stress testing to the libvirt-tck.

This was found while researching the root cause of: https://bugzilla.redhat.com/show_bug.cgi?id=670848

virDomainUnref should only be called with the lock held for the virDomainObj in question. However, when a transient qemu domain gets EOF on its monitor socket, it queues an event which frees the monitor, which unref's the virDomainObj without first locking it. If another thread has already locked the virDomainObj, the modification of the refcount could potentially be corrupted. In an extreme case, it could also be potentially unlocked by virDomainObjFree, thus left open to modification by anyone else who would have otherwise waited for the lock (not to mention the fact that they would be accessing freed data!).

The solution is to have qemuMonitorFree lock the domain object right before unrefing it. Since the caller to qemuMonitorFree doesn't expect this lock to be held, if the refcount doesn't go all the way to 0, qemuMonitorFree must unlock it after the unref.
---
 src/qemu/qemu_process.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index c419c75..1d67b3c 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -596,7 +596,9 @@ static void qemuProcessHandleMonitorDestroy(qemuMonitorPtr mon,
     qemuDomainObjPrivatePtr priv = vm->privateData;
     if (priv->mon == mon)
         priv->mon = NULL;
-    virDomainObjUnref(vm);
+    virDomainObjLock(vm);
+    if (virDomainObjUnref(vm) > 0)
+        virDomainObjUnlock(vm);
 }

 static qemuMonitorCallbacks monitorCallbacks = {
--
1.7.3.4
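(For context: the conditional unlock in the patch works because of the unref semantics the message relies on. Below is a minimal, self-contained sketch of those semantics, written with hypothetical names and plain pthreads rather than the actual libvirt source: the caller must hold the object's mutex, and when the count reaches zero the object -- mutex included -- is freed, leaving nothing to unlock.)

#include <pthread.h>
#include <stdlib.h>

typedef struct {
    pthread_mutex_t lock;   /* the per-object lock discussed above */
    int refs;
    /* ... domain state ... */
} domainObjSketch;

/* Caller must hold obj->lock. Returns the remaining reference count;
 * when it returns 0, obj has already been unlocked and freed, so the
 * caller must not touch it again (and must not try to unlock it). */
static int
domainObjUnrefSketch(domainObjSketch *obj)
{
    int refs = --obj->refs;
    if (refs == 0) {
        pthread_mutex_unlock(&obj->lock);
        pthread_mutex_destroy(&obj->lock);
        free(obj);              /* obj, including its mutex, is gone */
    }
    return refs;
}

Hence the shape of the patch: lock, unref, and call virDomainObjUnlock only when the returned count is still positive; on a return of 0 the object is already gone.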

On 03/03/2011 12:47 PM, Laine Stump wrote:
This was found while researching the root cause of:
https://bugzilla.redhat.com/show_bug.cgi?id=670848
virDomainUnref should only be called with the lock held for the virDomainObj in question. However, when a transient qemu domain gets EOF on its monitor socket, it queues an event which frees the monitor, which unref's the virDomainObj without first locking it. If another thread has already locked the virDomainObj, the modification of the refcount could potentially be corrupted. In an extreme case, it could also be potentially unlocked by virDomainObjFree, thus left open to modification by anyone else who would have otherwise waited for the lock (not to mention the fact that they would be accessing freed data!).
The solution is to have qemuMonitorFree lock the domain object right before unrefing it. Since the caller to qemuMonitorFree doesn't expect this lock to be held, if the refcount doesn't go all the way to 0, qemuMonitorFree must unlock it after the unref.
Nice writeup. However, just looking at this:
---
 src/qemu/qemu_process.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index c419c75..1d67b3c 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -596,7 +596,9 @@ static void qemuProcessHandleMonitorDestroy(qemuMonitorPtr mon,
     qemuDomainObjPrivatePtr priv = vm->privateData;
     if (priv->mon == mon)
         priv->mon = NULL;
Hmm - we're modifying vm (by changing priv->mon)...
-    virDomainObjUnref(vm);
+    virDomainObjLock(vm);
...prior to obtaining the lock. That sounds wrong. Do things still work for you if you move the virDomainObjLock(vm) prior to the point where we change priv->mon?
+    if (virDomainObjUnref(vm) > 0)
+        virDomainObjUnlock(vm);
 }
static qemuMonitorCallbacks monitorCallbacks = {

(Eric pointed out in IRC that I should be acquiring the domainobj lock prior to modifying obj->privateData->mon)

This was found while researching the root cause of: https://bugzilla.redhat.com/show_bug.cgi?id=670848

virDomainUnref should only be called with the lock held for the virDomainObj in question. However, when a transient qemu domain gets EOF on its monitor socket, it queues an event which frees the monitor, which unref's the virDomainObj without first locking it. If another thread has already locked the virDomainObj, the modification of the refcount could potentially be corrupted. In an extreme case, it could also be potentially unlocked by virDomainObjFree, thus left open to modification by anyone else who would have otherwise waited for the lock (not to mention the fact that they would be accessing freed data!).

The solution is to have qemuMonitorFree lock the domain object right before unrefing it. Since the caller to qemuMonitorFree doesn't expect this lock to be held, if the refcount doesn't go all the way to 0, qemuMonitorFree must unlock it after the unref.
---
 src/qemu/qemu_process.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index c419c75..9084725 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -593,10 +593,14 @@ no_memory:
 static void qemuProcessHandleMonitorDestroy(qemuMonitorPtr mon,
                                             virDomainObjPtr vm)
 {
-    qemuDomainObjPrivatePtr priv = vm->privateData;
+    qemuDomainObjPrivatePtr priv;
+
+    virDomainObjLock(vm);
+    priv = vm->privateData;
     if (priv->mon == mon)
         priv->mon = NULL;
-    virDomainObjUnref(vm);
+    if (virDomainObjUnref(vm) > 0)
+        virDomainObjUnlock(vm);
 }

 static qemuMonitorCallbacks monitorCallbacks = {
--
1.7.3.4

On 03/03/2011 02:10 PM, Laine Stump wrote:
(Eric pointed out in IRC that I should be acquiring the domainobj lock prior to modifying obj->privateData->mon)
The solution is to have qemuMonitorFree lock the domain object right before unrefing it. Since the caller to qemuMonitorFree doesn't expect this lock to be held, if the refcount doesn't go all the way to 0, qemuMonitorFree must unlock it after the unref.
---
 src/qemu/qemu_process.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index c419c75..9084725 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -593,10 +593,14 @@ no_memory:
 static void qemuProcessHandleMonitorDestroy(qemuMonitorPtr mon,
                                             virDomainObjPtr vm)
 {
-    qemuDomainObjPrivatePtr priv = vm->privateData;
+    qemuDomainObjPrivatePtr priv;
+
+    virDomainObjLock(vm);
+    priv = vm->privateData;
     if (priv->mon == mon)
         priv->mon = NULL;
-    virDomainObjUnref(vm);
+    if (virDomainObjUnref(vm) > 0)
+        virDomainObjUnlock(vm);
 }
ACK from me for following the rules in src/qemu/THREADS.txt, although you may want to wait for danpb to weigh in.

On Thu, Mar 03, 2011 at 04:10:07PM -0500, Laine Stump wrote:
(Eric pointed out in IRC that I should be acquiring the domainobj lock prior to modifying obj->privateData->mon)
This was found while researching the root cause of:
https://bugzilla.redhat.com/show_bug.cgi?id=670848
virDomainUnref should only be called with the lock held for the virDomainObj in question. However, when a transient qemu domain gets EOF on its monitor socket, it queues an event which frees the monitor, which unref's the virDomainObj without first locking it. If another thread has already locked the virDomainObj, the modification of the refcount could potentially be corrupted. In an extreme case, it could also be potentially unlocked by virDomainObjFree, thus left open to modification by anyone else who would have otherwise waited for the lock (not to mention the fact that they would be accessing freed data!).
The solution is to have qemuMonitorFree lock the domain object right before unrefing it. Since the caller to qemuMonitorFree doesn't expect this lock to be held, if the refcount doesn't go all the way to 0, qemuMonitorFree must unlock it after the unref.
---
 src/qemu/qemu_process.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index c419c75..9084725 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -593,10 +593,14 @@ no_memory:
 static void qemuProcessHandleMonitorDestroy(qemuMonitorPtr mon,
                                             virDomainObjPtr vm)
 {
-    qemuDomainObjPrivatePtr priv = vm->privateData;
+    qemuDomainObjPrivatePtr priv;
+
+    virDomainObjLock(vm);
+    priv = vm->privateData;
     if (priv->mon == mon)
         priv->mon = NULL;
-    virDomainObjUnref(vm);
+    if (virDomainObjUnref(vm) > 0)
+        virDomainObjUnlock(vm);
 }
static qemuMonitorCallbacks monitorCallbacks = {
ACK

Daniel

On 03/03/2011 12:47 PM, Laine Stump wrote:
This was found while researching the root cause of:
https://bugzilla.redhat.com/show_bug.cgi?id=670848
virDomainUnref should only be called with the lock held for the virDomainObj in question. However, when a transient qemu domain gets EOF on its monitor socket, it queues an event which frees the monitor, which unref's the virDomainObj without first locking it. If another thread has already locked the virDomainObj, the modification of the refcount could potentially be corrupted. In an extreme case, it could also be potentially unlocked by virDomainObjFree, thus left open to modification by anyone else who would have otherwise waited for the lock (not to mention the fact that they would be accessing freed data!).
The solution is to have qemuMonitorFree lock the domain object right before unrefing it. Since the caller to qemuMonitorFree doesn't expect this lock to be held, if the refcount doesn't go all the way to 0, qemuMonitorFree must unlock it after the unref.
---
 src/qemu/qemu_process.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index c419c75..1d67b3c 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -596,7 +596,9 @@ static void qemuProcessHandleMonitorDestroy(qemuMonitorPtr mon,
     qemuDomainObjPrivatePtr priv = vm->privateData;
     if (priv->mon == mon)
         priv->mon = NULL;
-    virDomainObjUnref(vm);
+    virDomainObjLock(vm);
+    if (virDomainObjUnref(vm) > 0)
+        virDomainObjUnlock(vm);
Phooey, this causes 'virsh save domain file' to deadlock.

I'm still looking into why, but it may be that we're going about fixing this wrong. Maybe we should be making sure that qemuProcessStop ensures that the monitor callback function won't trigger if there is no domain for it to trigger on.

On 03/04/2011 02:46 PM, Eric Blake wrote:
+++ b/src/qemu/qemu_process.c
@@ -596,7 +596,9 @@ static void qemuProcessHandleMonitorDestroy(qemuMonitorPtr mon,
     qemuDomainObjPrivatePtr priv = vm->privateData;
     if (priv->mon == mon)
         priv->mon = NULL;
-    virDomainObjUnref(vm);
+    virDomainObjLock(vm);
+    if (virDomainObjUnref(vm) > 0)
+        virDomainObjUnlock(vm);
Phooey, this causes 'virsh save domain file' to deadlock.
I'm still looking into why, but it may be that we're going about fixing this wrong. Maybe we should be making sure that qemuProcessStop ensures that the monitor callback function won't trigger if there is no domain for it to trigger on.
I think the real reason for the deadlock is that:

- qemudDomainSaveFlag holds the domain lock while it calls qemuDomainEventQueue, which requests the event loop lock

- the event loop thread holds the event loop lock, and can make several different callbacks that can result in requesting a domain lock

I counted 29 callers of qemuDomainEventQueue; the majority of those were in situations where the domain lock had already been dropped (in fact, it makes sense to generate the event while you own the domain, but wait to fire it off until after you are done with the domain). But these culprits either held a domain lock, or I couldn't quickly see if they might hold a domain lock higher in the call stack:

  qemuMigrationSetOffline in qemu_migration.c has the domain lock
  qemudDomainSaveFlag in qemu_driver.c
  qemuDomainRevertToSnapshot in qemu_driver.c

Not sure about:

  qemuMigrationFinish in qemu_migration.c
  qemudNotifyLoadDomain in qemu_driver.c
  qemudDomainSaveImageStart in qemu_driver.c
  qemuDomainObjStart in qemu_driver.c

Of those, qemuDomainRevertToSnapshot can easily be fixed by swapping the unlock and event queue calls. But the rest could probably use a helper method that increases the vm ref count, unlocks the domain lock, fires the event, grabs the domain lock, reduces the vm ref count, and checks that the domain is still active. However, I ran out of time to code that up for now, and am not positive that it is the right solution. Dan, your thoughts?
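(A sketch of the helper described in the last paragraph might look like the following. The name and shape are hypothetical -- nothing like it exists in the tree -- though virDomainObjRef/Unref/Lock/Unlock, virDomainObjIsActive, and qemuDomainEventQueue are the real libvirt helpers it would build on.)

/* Queue an event for a vm whose lock is currently held, without holding
 * that lock across the event-loop call. Returns false if the domain was
 * freed or went inactive while unlocked; when it returns false because
 * the domain was freed, the caller must not touch (or unlock) vm again. */
static bool
qemuDomainEventQueueUnlocked(struct qemud_driver *driver,
                             virDomainObjPtr vm,
                             virDomainEventPtr event)
{
    virDomainObjRef(vm);        /* keep vm alive while we drop its lock */
    virDomainObjUnlock(vm);

    qemuDomainEventQueue(driver, event);  /* no domain lock held here */

    virDomainObjLock(vm);
    if (virDomainObjUnref(vm) == 0)
        return false;           /* vm was freed while we were unlocked */
    return virDomainObjIsActive(vm);
}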

On 03/04/2011 06:03 PM, Eric Blake wrote:
I think the real reason for the deadlock is that:
qemudDomainSaveFlag holds the domain lock while it calls qemuDomainEventQueue, which requests the event loop lock
the event loop thread holds the event loop lock, and can make several different callbacks that can result in requesting a domain lock
I counted 29 callers of qemuDomainEventQueue; the majority of those were in situations where the domain lock had already been dropped (in fact, it makes sense to generate the event while you own the domain, but wait to fire it off until after you are done with the domain).
But the rest could probably use a helper method that increases the vm ref count, unlocks the domain lock, fires the event, grabs the domain lock, reduces the vm ref count, and checks that the domain is still active.
Or better yet, a helper function qemuDomainEventQueueLocked, which takes a locked vm as an additional parameter and stashes the event in the domain object; then teach virDomainObjUnlock that if an event is stashed in the domain, it should call virDomainEventQueuePush at that point. That is, instead of worrying about unlocking and relocking in place, we delay firing the event until the point where we know there is no longer a domain lock.
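(In sketch form, this stash-and-defer idea might look like the code below. It is entirely hypothetical: the queuedEvent fields and both functions do not exist in libvirt, and for clarity the deferred firing is shown as a separate unlock variant rather than a change to virDomainObjUnlock itself; qemuDomainEventQueue stands in for the virDomainEventQueuePush call Eric mentions.)

/* Called with vm's lock held: just remember the event. */
static void
qemuDomainEventQueueLocked(struct qemud_driver *driver,
                           virDomainObjPtr vm,
                           virDomainEventPtr event)
{
    vm->queuedEvent = event;          /* hypothetical new field */
    vm->queuedEventDriver = driver;   /* hypothetical new field */
}

/* Unlock variant that fires any stashed event only after the domain
 * lock has been dropped, avoiding the lock-ordering problem above. */
static void
virDomainObjUnlockAndFlushEvents(virDomainObjPtr vm)
{
    virDomainEventPtr event = vm->queuedEvent;
    struct qemud_driver *driver = vm->queuedEventDriver;

    vm->queuedEvent = NULL;
    virDomainObjUnlock(vm);

    if (event)
        qemuDomainEventQueue(driver, event);
}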

On Fri, Mar 04, 2011 at 06:03:36PM -0700, Eric Blake wrote:
On 03/04/2011 02:46 PM, Eric Blake wrote:
+++ b/src/qemu/qemu_process.c
@@ -596,7 +596,9 @@ static void qemuProcessHandleMonitorDestroy(qemuMonitorPtr mon,
     qemuDomainObjPrivatePtr priv = vm->privateData;
     if (priv->mon == mon)
         priv->mon = NULL;
-    virDomainObjUnref(vm);
+    virDomainObjLock(vm);
+    if (virDomainObjUnref(vm) > 0)
+        virDomainObjUnlock(vm);
Phooey, this causes 'virsh save domain file' to deadlock.
I'm still looking into why, but it may be that we're going about fixing this wrong. Maybe we should be making sure that qemuProcessStop ensures that the monitor callback function won't trigger if there is no domain for it to trigger on.
I think the real reason for the deadlock is that:
qemudDomainSaveFlag holds the domain lock while it calls qemuDomainEventQueue, which requests the event loop lock
the event loop thread holds the event loop lock, and can make several different callbacks that can result in requesting a domain lock
I don't think this can be the deadlock scenario. The event loop lock is completely isolated from other locks, because it is only ever held while code in event.c is executing. It is always released when invoking any external callbacks, so you can't have any lock ordering dependencies wrt the event loop lock.

Regards,
Daniel

On 03/07/2011 03:03 AM, Daniel P. Berrange wrote:
I think the real reason for the deadlock is that:
qemudDomainSaveFlag holds the domain lock while it calls qemuDomainEventQueue, which requests the event loop lock
the event loop thread holds the event loop lock, and can make several different callbacks that can result in requesting a domain lock
I don't think this can be the deadlock scenario. The event loop lock is completely isolated from other locks, because it is only ever held while code in event.c is executing. It is always released when invoking any external callbacks, so you can't have any lock ordering dependencies wrt the event loop lock.
But right now, the event loop lock _is_ held while calling the destroy callbacks; see virEventCleanupHandles. Would it be better to break the deadlock by dropping the event loop lock there?

Oh, serves me right for reading email in order; I just noticed that this has been suggested already:
https://www.redhat.com/archives/libvir-list/2011-March/msg00218.html
https://www.redhat.com/archives/libvir-list/2011-March/msg00232.html
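(Concretely, the suggested fix would drop the lock just around the user-supplied free callback. A simplified sketch of that piece of the virEventCleanupHandles loop -- not the actual event.c code, and with the handle-removal bookkeeping elided:)

/* for each handle marked deleted inside the cleanup loop: */
virFreeCallback ff = eventLoop.handles[i].ff;
void *opaque = eventLoop.handles[i].opaque;

if (ff) {
    virMutexUnlock(&eventLoop.lock);  /* callback may need other locks */
    ff(opaque);
    virMutexLock(&eventLoop.lock);
}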

This was also found while investigating https://bugzilla.redhat.com/show_bug.cgi?id=670848

An EOF on a domain's monitor socket results in an event being queued to handle the EOF. The handler calls qemuProcessHandleMonitorEOF. If it is a transient domain, this leads to a call to virDomainRemoveInactive, which removes the domain from the driver's hashtable and unref's it. Nowhere in this code is the qemu driver lock acquired.

However, all modifications to the driver's domain hashtable *must* be done while holding the driver lock, otherwise the hashtable can become corrupt, and (even more likely) another thread could call a different hashtable function and acquire a pointer to the domain that is in the process of being destroyed.

To prevent such a disaster, qemuProcessHandleMonitorEOF must get the qemu driver lock *before* it gets the DomainObj's lock, and hold it until it is finished with the DomainObj. This guarantees that nobody else modifies the hashtable at the same time, and that anyone who had already gotten the DomainObj from the hashtable prior to this call has finished with it before we remove/destroy it.
---
 src/qemu/qemu_process.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 1d67b3c..6a65a33 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -107,11 +107,13 @@ qemuProcessHandleMonitorEOF(qemuMonitorPtr mon ATTRIBUTE_UNUSED,

     VIR_DEBUG("Received EOF on %p '%s'", vm, vm->def->name);

+    qemuDriverLock(driver);
     virDomainObjLock(vm);

     if (!virDomainObjIsActive(vm)) {
         VIR_DEBUG("Domain %p is not active, ignoring EOF", vm);
         virDomainObjUnlock(vm);
+        qemuDriverUnlock(driver);
         return;
     }

@@ -137,10 +139,9 @@ qemuProcessHandleMonitorEOF(qemuMonitorPtr mon ATTRIBUTE_UNUSED,
     virDomainObjUnlock(vm);

     if (event) {
-        qemuDriverLock(driver);
         qemuDomainEventQueue(driver, event);
-        qemuDriverUnlock(driver);
     }
+    qemuDriverUnlock(driver);
 }
--
1.7.3.4
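(The resulting lock discipline, reduced to a sketch: qemuDriverLock, virDomainObjLock and friends are the real libvirt helpers, while the function name is hypothetical and the handler body is elided.)

static void
handleMonitorEOFSketch(struct qemud_driver *driver, virDomainObjPtr vm)
{
    qemuDriverLock(driver);    /* 1. driver lock: guards driver->domains  */
    virDomainObjLock(vm);      /* 2. only then the per-domain object lock */

    /* ... handle the EOF; for a transient domain this may end in
     * virDomainRemoveInactive(&driver->domains, vm), which modifies the
     * hashtable and is therefore only safe under the driver lock ... */

    virDomainObjUnlock(vm);
    qemuDriverUnlock(driver);  /* release in reverse order of acquisition */
}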

On 03/03/2011 12:47 PM, Laine Stump wrote:
This was also found while investigating
https://bugzilla.redhat.com/show_bug.cgi?id=670848
An EOF on a domain's monitor socket results in an event being queued to handle the EOF. The handler calls qemuProcessHandleMonitorEOF. If it is a transient domain, this leads to a call to virDomainRemoveInactive, which removes the domain from the driver's hashtable and unref's it. Nowhere in this code is the qemu driver lock acquired.
However, all modifications to the driver's domain hashtable *must* be done while holding the driver lock, otherwise the hashtable can become corrupt, and (even more likely) another thread could call a different hashtable function and acquire a pointer to the domain that is in the process of being destroyed.
To prevent such a disaster, qemuProcessHandleMonitorEOF must get the qemu driver lock *before* it gets the DomainObj's lock, and hold it until it is finished with the DomainObj. This guarantees that nobody else modifies the hashtable at the same time, and that anyone who had already gotten the DomainObj from the hashtable prior to this call has finished with it before we remove/destroy it.
---
 src/qemu/qemu_process.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)
Makes sense to me, given src/qemu/THREADS.txt, but you may want to wait for danpb's ACK as well.

On Thu, Mar 03, 2011 at 02:47:38PM -0500, Laine Stump wrote:
This was also found while investigating
https://bugzilla.redhat.com/show_bug.cgi?id=670848
An EOF on a domain's monitor socket results in an event being queued to handle the EOF. The handler calls qemuProcessHandleMonitorEOF. If it is a transient domain, this leads to a call to virDomainRemoveInactive, which removes the domain from the driver's hashtable and unref's it. Nowhere in this code is the qemu driver lock acquired.
However, all modifications to the driver's domain hashtable *must* be done while holding the driver lock, otherwise the hashtable can become corrupt, and (even more likely) another thread could call a different hashtable function and acquire a pointer to the domain that is in the process of being destroyed.
To prevent such a disaster, qemuProcessHandleMonitorEOF must get the qemu driver lock *before* it gets the DomainObj's lock, and hold it until it is finished with the DomainObj. This guarantees that nobody else modifies the hashtable at the same time, and that anyone who had already gotten the DomainObj from the hashtable prior to this call has finished with it before we remove/destroy it.
---
 src/qemu/qemu_process.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 1d67b3c..6a65a33 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -107,11 +107,13 @@ qemuProcessHandleMonitorEOF(qemuMonitorPtr mon ATTRIBUTE_UNUSED,
VIR_DEBUG("Received EOF on %p '%s'", vm, vm->def->name);
+    qemuDriverLock(driver);
     virDomainObjLock(vm);

     if (!virDomainObjIsActive(vm)) {
         VIR_DEBUG("Domain %p is not active, ignoring EOF", vm);
         virDomainObjUnlock(vm);
+        qemuDriverUnlock(driver);
         return;
     }
@@ -137,10 +139,9 @@ qemuProcessHandleMonitorEOF(qemuMonitorPtr mon ATTRIBUTE_UNUSED,
     virDomainObjUnlock(vm);

     if (event) {
-        qemuDriverLock(driver);
         qemuDomainEventQueue(driver, event);
-        qemuDriverUnlock(driver);
     }
+    qemuDriverUnlock(driver);
 }
ACK

Daniel
participants (3): Daniel P. Berrange, Eric Blake, Laine Stump