"Daniel P. Berrange" <berrange(a)redhat.com> wrote:
On Wed, Jan 30, 2008 at 06:36:37PM +0100, Jim Meyering wrote:
> "Daniel P. Berrange" <berrange(a)redhat.com> wrote:
>
> > On Wed, Jan 30, 2008 at 06:25:19PM +0100, Jim Meyering wrote:
> >> Without the following patch, this command would hang
> >>
> >> printf 'domuuid fc4\ndomstate fc4\n' \
> >> | ./virsh --connect test://$PWD/../docs/testnode.xml
> >>
> >> with this stack trace:
> >>
> >> __lll_lock_wait ...
> >> _L_lock_105 ...
> >> __pthread_mutex_lock ...
> >> virUnrefDomain (domain=0x6a8b30) at hash.c:884
> >> virDomainFree (domain=0x6a8b30) at libvirt.c:1211
> >> cmdDomstate (ctl=0x7fffea723390, cmd=0x6a8b10) at virsh.c:677
> >> vshCommandRun (ctl=0x7fffea723390, cmd=0x6a8b10) at virsh.c:4033
> >> main (argc=3, argv=0x7fffea7234e8) at virsh.c:501
> >>
> >> The problem is that cmdDomuuid didn't call virDomainFree(dom), so
> >> cmdDomstate hangs trying to do the Unref.
> >
> > I don't understand why it would hang simply because it forgot to free
> > an object. Surely it ought to just be a memory leak not hang ? Each
> > individual libvirt API must always have a matched lock & unlock pair
> > in all codepaths, so a hang should only occur if an unlock is missing
> > in some codepath.
>
> Well maybe virDomainFree is misnamed then.
> It does more:
>
> The missing virDomainFree calls cause that because
> they are what is supposed to do the unlock.
Not, its a bug in virUnrefDomain/Network - it calls mutex_lock() twice
in one codepath, instead of calling unlock().
Of course your patch to avoid the memory leak is still needed - so ACK
to that, but the locking flaw needs this patch:
Index: src/hash.c
===================================================================
RCS file: /data/cvs/libvirt/src/hash.c,v
retrieving revision 1.29
diff -u -p -r1.29 hash.c
--- src/hash.c 29 Jan 2008 18:15:54 -0000 1.29
+++ src/hash.c 30 Jan 2008 17:42:32 -0000
@@ -881,7 +881,7 @@ virUnrefDomain(virDomainPtr domain) {
return (0);
}
- pthread_mutex_lock(&domain->conn->lock);
+ pthread_mutex_unlock(&domain->conn->lock);
return (refs);
}
@@ -1013,7 +1013,7 @@ virUnrefNetwork(virNetworkPtr network) {
return (0);
}
- pthread_mutex_lock(&network->conn->lock);
+ pthread_mutex_unlock(&network->conn->lock);
return (refs);
}
Oh! Two locks in a row.
Amazing that such a problem wasn't exposed sooner.
Ack.