[libvirt] [PATCH 0/2] Fix domain list race crash

This series fixes the race between freeing and accessing of domain objects and provides a reproducer case to test it. Peter Krempa (2): conf: Fix race between looking up a domain object and freeing it DO NOT APPLY UPSTREAM: Reproducer of domain free crash src/conf/domain_conf.c | 8 ++++++++ src/qemu/qemu_driver.c | 6 ++++++ 2 files changed, 14 insertions(+) -- 1.8.1.5

This patch fixes crash of the daemon that happens due to the following race condition: Let's have two threads in the libvirtd daemon's qemu driver: A - thread executing a API call to get information about a domain B - thread executing undefine on the same domain Assume following serialization of operations done by the threads: 1) A has the lock on the domain object and is executing some code prior to virDomainObjListRemove() 2) B takes the lock on the domain object list, looks up the domain object pointer and blocks in the attempt to lock the domain object as A is holding the lock 3) A reaches virDomainObjListRemove() and unlocks the lock on the domain object 4) A blocks on the attempt to get the domain list lock 5) B is able to lock the domain object now and unlocks the domain list 6) A is now able to lock the domain list, and shed the last reference on the tomain object, this triggers the freeing function. 6) B starts executing the code on the pointer that is being freed 7) The libvirtd daemon crashes while attempting to access invalid pointer in thread B. This patch fixes the race by acquiring a reference on the domain object before unlocking it in virDomainObjListRemove() and re-locks the object prior to removing and freeing it. This ensures that no thread that does not hold a reference on the domain object expects the pointer to be valid and the monitor code expects the object to vanish. This is a minimal fix of the problem, but a better solution will be to switch to full reference counting for domain objects. --- src/conf/domain_conf.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 03e5740..cafef0c 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -2238,10 +2238,14 @@ void virDomainObjListRemove(virDomainObjListPtr doms, char uuidstr[VIR_UUID_STRING_BUFLEN]; virUUIDFormat(dom->def->uuid, uuidstr); + virObjectRef(dom); virObjectUnlock(dom); virObjectLock(doms); + virObjectLock(dom); virHashRemoveEntry(doms->objs, uuidstr); + virObjectUnlock(dom); + virObjectUnref(dom); virObjectUnlock(doms); } -- 1.8.1.5

On 04/09/2013 03:02 PM, Peter Krempa wrote:
1 file changed, 4 insertions(+)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 03e5740..cafef0c 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -2238,10 +2238,14 @@ void virDomainObjListRemove(virDomainObjListPtr doms, char uuidstr[VIR_UUID_STRING_BUFLEN];
virUUIDFormat(dom->def->uuid, uuidstr); + virObjectRef(dom); virObjectUnlock(dom);
virObjectLock(doms); + virObjectLock(dom); virHashRemoveEntry(doms->objs, uuidstr); + virObjectUnlock(dom); + virObjectUnref(dom); virObjectUnlock(doms); }
+1, small fix big impact... -- Mit freundlichen Grüßen/Kind Regards Viktor Mihajlovski IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Martina Köderitz Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294

On 04/09/2013 07:02 AM, Peter Krempa wrote:
This patch fixes crash of the daemon that happens due to the following race condition:
Let's have two threads in the libvirtd daemon's qemu driver: A - thread executing a API call to get information about a domain B - thread executing undefine on the same domain
You mixed up the two threads here, compared to your description below. In the description, A is undefining the domain, while B is attempting to get information.
Assume following serialization of operations done by the threads: 1) A has the lock on the domain object and is executing some code prior to virDomainObjListRemove() 2) B takes the lock on the domain object list, looks up the domain object pointer and blocks in the attempt to lock the domain object as A is holding the lock 3) A reaches virDomainObjListRemove() and unlocks the lock on the domain object 4) A blocks on the attempt to get the domain list lock 5) B is able to lock the domain object now and unlocks the domain list 6) A is now able to lock the domain list, and shed the last reference on the
s/shed/sheds/
tomain object, this triggers the freeing function.
s/tomain/domain/
6) B starts executing the code on the pointer that is being freed 7) The libvirtd daemon crashes while attempting to access invalid pointer in thread B.
Yep, that sequence matches what the reproducer in 2/2 was exposing.
This patch fixes the race by acquiring a reference on the domain object before unlocking it in virDomainObjListRemove() and re-locks the object prior to removing and freeing it. This ensures that no thread that does not hold a reference on the domain object expects the pointer to be valid and the monitor code expects the object to vanish.
Double negative; I would write: This ensures that no thread holds a lock on the domain object at the time it is removed from the list, and that doing a list lookup will never find a domain that is about to vanish.
This is a minimal fix of the problem, but a better solution will be to switch to full reference counting for domain objects. --- src/conf/domain_conf.c | 4 ++++ 1 file changed, 4 insertions(+)
Commit message needs help, but the code is correct. ACK
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 03e5740..cafef0c 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -2238,10 +2238,14 @@ void virDomainObjListRemove(virDomainObjListPtr doms, char uuidstr[VIR_UUID_STRING_BUFLEN];
virUUIDFormat(dom->def->uuid, uuidstr); + virObjectRef(dom); virObjectUnlock(dom);
virObjectLock(doms); + virObjectLock(dom); virHashRemoveEntry(doms->objs, uuidstr); + virObjectUnlock(dom); + virObjectUnref(dom); virObjectUnlock(doms); }
-- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 04/09/13 16:52, Eric Blake wrote:
On 04/09/2013 07:02 AM, Peter Krempa wrote:
This patch fixes crash of the daemon that happens due to the following race condition:
...
--- src/conf/domain_conf.c | 4 ++++ 1 file changed, 4 insertions(+)
Commit message needs help, but the code is correct.
ACK
Thanks. I fixed the commit message and pushed this patch. Peter

The issue is triggered using the following shell oneliner with this patch applied: virsh undefine domain & sleep .1; virsh dominfo domain --- src/conf/domain_conf.c | 4 ++++ src/qemu/qemu_driver.c | 6 ++++++ 2 files changed, 10 insertions(+) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index cafef0c..8e1ccda 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -2241,8 +2241,12 @@ void virDomainObjListRemove(virDomainObjListPtr doms, virObjectRef(dom); virObjectUnlock(dom); + printf(" DEBUG: about to remove dom %s from list\n", uuidstr); + sleep(2); + virObjectLock(doms); virObjectLock(dom); + printf(" DEBUG: locked, removing dom %s from list\n", uuidstr); virHashRemoveEntry(doms->objs, uuidstr); virObjectUnlock(dom); virObjectUnref(dom); diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 2c0d7d1..76daa01 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -2296,10 +2296,16 @@ static int qemuDomainGetInfo(virDomainPtr dom, int ret = -1; int err; unsigned long long balloon; + char uuidstr[VIR_UUID_STRING_BUFLEN]; if (!(vm = qemuDomObjFromDomain(dom))) goto cleanup; + virUUIDFormat(dom->uuid, uuidstr); + printf(" DEBUG: about to look up info on %s\n", uuidstr); + sleep(5); + printf(" DEBUG: looking up info on %s\n", uuidstr); + info->state = virDomainObjGetState(vm, NULL); if (!virDomainObjIsActive(vm)) { -- 1.8.1.5
participants (3)
-
Eric Blake
-
Peter Krempa
-
Viktor Mihajlovski