When one domain is being undefined and at the same time started, for
example, there is a possibility of a rare problem occuring.
- Thread 1 does virDomainUndefine(), has the lock, checks that the
domain is active and because it's not, calls
virDomainObjListRemove().
- Thread 2 does virDomainCreate() and tries to lock the domain.
- Thread 1 needs to lock domain list in order to remove the domain from
it, but must unlock domain first (proper order is to lock domain list
first and the domain itself second).
- Thread 2 grabs the lock, starts the domain and releases the lock.
- Thread 1 grabs the lock and removes the domain from list.
With this patch:
- The undefining domain gets marked as "to undefine" before it is
unlocked.
- If domain is found in any of the search APIs, it's returned only if
it is not marked as "to undefine". The check is done while the
domain is locked.
Resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=1150505
Signed-off-by: Martin Kletzander <mkletzan(a)redhat.com>
---
src/conf/domain_conf.c | 23 ++++++++++++++++++++---
src/conf/domain_conf.h | 1 +
2 files changed, 21 insertions(+), 3 deletions(-)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 43574e1..b92a58a 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -1054,8 +1054,13 @@ virDomainObjPtr virDomainObjListFindByID(virDomainObjListPtr
doms,
virDomainObjPtr obj;
virObjectLock(doms);
obj = virHashSearch(doms->objs, virDomainObjListSearchID, &id);
- if (obj)
+ if (obj) {
virObjectLock(obj);
+ if (obj->undefining) {
+ virObjectUnlock(obj);
+ obj = NULL;
+ }
+ }
virObjectUnlock(doms);
return obj;
}
I find this too hackish, Wouldn't it be better to hold the domain list
locked during the whole undefine procedure? On one hand the throughput
would get hurt but on the other, the code would be cleaner. Or maybe we
can just make the Undefine() API to grab the QEMU_JOB_MODIFY job which
would make the APIs serialize.
Michal