On Thu, Oct 30, 2014 at 05:44:23PM +0000, Daniel P. Berrange wrote:
On Thu, Oct 30, 2014 at 06:39:56PM +0100, Michal Privoznik wrote:
> On 30.10.2014 16:04, Martin Kletzander wrote:
> >When one domain is being undefined and at the same time started, for
> >example, there is a possibility of a rare problem occuring.
> >
> > - Thread 1 does virDomainUndefine(), has the lock, checks that the
> > domain is active and because it's not, calls
> > virDomainObjListRemove().
> >
> > - Thread 2 does virDomainCreate() and tries to lock the domain.
> >
> > - Thread 1 needs to lock domain list in order to remove the domain from
> > it, but must unlock domain first (proper order is to lock domain list
> > first and the domain itself second).
> >
> > - Thread 2 grabs the lock, starts the domain and releases the lock.
> >
> > - Thread 1 grabs the lock and removes the domain from list.
> >
> >With this patch:
> >
> > - The undefining domain gets marked as "to undefine" before it is
> > unlocked.
> >
> > - If domain is found in any of the search APIs, it's returned only if
> > it is not marked as "to undefine". The check is done while the
> > domain is locked.
> >
> >Resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=1150505
> >
> >Signed-off-by: Martin Kletzander <mkletzan(a)redhat.com>
> >---
> > src/conf/domain_conf.c | 23 ++++++++++++++++++++---
> > src/conf/domain_conf.h | 1 +
> > 2 files changed, 21 insertions(+), 3 deletions(-)
> >
> >diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
> >index 43574e1..b92a58a 100644
> >--- a/src/conf/domain_conf.c
> >+++ b/src/conf/domain_conf.c
> >@@ -1054,8 +1054,13 @@ virDomainObjPtr
virDomainObjListFindByID(virDomainObjListPtr doms,
> > virDomainObjPtr obj;
> > virObjectLock(doms);
> > obj = virHashSearch(doms->objs, virDomainObjListSearchID, &id);
> >- if (obj)
> >+ if (obj) {
> > virObjectLock(obj);
> >+ if (obj->undefining) {
> >+ virObjectUnlock(obj);
> >+ obj = NULL;
> >+ }
> >+ }
> > virObjectUnlock(doms);
> > return obj;
> > }
>
> I find this too hackish, Wouldn't it be better to hold the domain list
> locked during the whole undefine procedure? On one hand the throughput would
> get hurt but on the other, the code would be cleaner. Or maybe we can just
> make the Undefine() API to grab the QEMU_JOB_MODIFY job which would make the
> APIs serialize.
Yep, it feels like "Undefine" could be treated as a job, so that all
the other APIs get blocked/serialized by it.
That won't work for anything else than QEMU. I tried creating
something like a job in the generic level which would fix it in
e.g. LXC as well.
Martin