
"Daniel P. Berrange" <berrange@redhat.com> wrote:
On Wed, Jan 30, 2008 at 06:36:37PM +0100, Jim Meyering wrote:
"Daniel P. Berrange" <berrange@redhat.com> wrote:
On Wed, Jan 30, 2008 at 06:25:19PM +0100, Jim Meyering wrote:
Without the following patch, this command would hang
printf 'domuuid fc4\ndomstate fc4\n' \ | ./virsh --connect test://$PWD/../docs/testnode.xml
with this stack trace:
__lll_lock_wait ... _L_lock_105 ... __pthread_mutex_lock ... virUnrefDomain (domain=0x6a8b30) at hash.c:884 virDomainFree (domain=0x6a8b30) at libvirt.c:1211 cmdDomstate (ctl=0x7fffea723390, cmd=0x6a8b10) at virsh.c:677 vshCommandRun (ctl=0x7fffea723390, cmd=0x6a8b10) at virsh.c:4033 main (argc=3, argv=0x7fffea7234e8) at virsh.c:501
The problem is that cmdDomuuid didn't call virDomainFree(dom), so cmdDomstate hangs trying to do the Unref.
I don't understand why it would hang simply because it forgot to free an object. Surely it ought to just be a memory leak not hang ? Each individual libvirt API must always have a matched lock & unlock pair in all codepaths, so a hang should only occur if an unlock is missing in some codepath.
Well maybe virDomainFree is misnamed then. It does more:
The missing virDomainFree calls cause that because they are what is supposed to do the unlock.
Not, its a bug in virUnrefDomain/Network - it calls mutex_lock() twice in one codepath, instead of calling unlock().
Of course your patch to avoid the memory leak is still needed - so ACK to that, but the locking flaw needs this patch:
Index: src/hash.c =================================================================== RCS file: /data/cvs/libvirt/src/hash.c,v retrieving revision 1.29 diff -u -p -r1.29 hash.c --- src/hash.c 29 Jan 2008 18:15:54 -0000 1.29 +++ src/hash.c 30 Jan 2008 17:42:32 -0000 @@ -881,7 +881,7 @@ virUnrefDomain(virDomainPtr domain) { return (0); }
- pthread_mutex_lock(&domain->conn->lock); + pthread_mutex_unlock(&domain->conn->lock); return (refs); }
@@ -1013,7 +1013,7 @@ virUnrefNetwork(virNetworkPtr network) { return (0); }
- pthread_mutex_lock(&network->conn->lock); + pthread_mutex_unlock(&network->conn->lock); return (refs); }
Oh! Two locks in a row. Amazing that such a problem wasn't exposed sooner. Ack.