Re: [Libvir] [PATCH] avoid virsh hang due to missing virDomainFree(dom) call

30 Jan 2008

"Daniel P. Berrange" <berrange@redhat.com> wrote:
...
On Wed, Jan 30, 2008 at 06:36:37PM +0100, Jim Meyering wrote:
...
"Daniel P. Berrange" <berrange@redhat.com> wrote:
...
On Wed, Jan 30, 2008 at 06:25:19PM +0100, Jim Meyering wrote:
...
Without the following patch, this command would hang
printf 'domuuid fc4\ndomstate fc4\n' \
      | ./virsh --connect test://$PWD/../docs/testnode.xml
with this stack trace:
__lll_lock_wait ...
    _L_lock_105 ...
    __pthread_mutex_lock ...
    virUnrefDomain (domain=0x6a8b30) at hash.c:884
    virDomainFree (domain=0x6a8b30) at libvirt.c:1211
    cmdDomstate (ctl=0x7fffea723390, cmd=0x6a8b10) at virsh.c:677
    vshCommandRun (ctl=0x7fffea723390, cmd=0x6a8b10) at virsh.c:4033
    main (argc=3, argv=0x7fffea7234e8) at virsh.c:501
The problem is that cmdDomuuid didn't call virDomainFree(dom), so
cmdDomstate hangs trying to do the Unref.
I don't understand why it would hang simply because it forgot to free
an object. Surely it ought to just be a memory leak not hang ? Each
individual libvirt API must always have a matched lock & unlock pair
in all codepaths, so a hang should only occur if an unlock is missing
in some codepath.
Well maybe virDomainFree is misnamed then.
It does more:
The missing virDomainFree calls cause that because
they are what is supposed to do the unlock.
Not, its a bug in virUnrefDomain/Network - it calls mutex_lock() twice
in one codepath, instead of calling  unlock().
Of course your patch to avoid the memory leak is still needed - so ACK
to that, but the locking flaw needs this patch:
Index: src/hash.c
===================================================================
RCS file: /data/cvs/libvirt/src/hash.c,v
retrieving revision 1.29
diff -u -p -r1.29 hash.c

--- src/hash.c	29 Jan 2008 18:15:54 -0000	1.29
+++ src/hash.c	30 Jan 2008 17:42:32 -0000
@@ -881,7 +881,7 @@ virUnrefDomain(virDomainPtr domain) {
         return (0);
     }
-    pthread_mutex_lock(&domain->conn->lock);
+    pthread_mutex_unlock(&domain->conn->lock);
     return (refs);
 }
@@ -1013,7 +1013,7 @@ virUnrefNetwork(virNetworkPtr network) {
         return (0);
     }
-    pthread_mutex_lock(&network->conn->lock);
+    pthread_mutex_unlock(&network->conn->lock);
     return (refs);
 }
Oh!  Two locks in a row.
Amazing that such a problem wasn't exposed sooner.
Ack.