[Libvir] [PATCH] avoid virsh hang due to missing virDomainFree(dom) call

Without the following patch, this command would hang printf 'domuuid fc4\ndomstate fc4\n' \ | ./virsh --connect test://$PWD/../docs/testnode.xml with this stack trace: __lll_lock_wait ... _L_lock_105 ... __pthread_mutex_lock ... virUnrefDomain (domain=0x6a8b30) at hash.c:884 virDomainFree (domain=0x6a8b30) at libvirt.c:1211 cmdDomstate (ctl=0x7fffea723390, cmd=0x6a8b10) at virsh.c:677 vshCommandRun (ctl=0x7fffea723390, cmd=0x6a8b10) at virsh.c:4033 main (argc=3, argv=0x7fffea7234e8) at virsh.c:5015 The problem is that cmdDomuuid didn't call virDomainFree(dom), so cmdDomstate hangs trying to do the Unref. I then did a mind-numbing audit of all of the other cmd* functions in virsh.c and found two other cases in which a "dom" may not be released. Here's the patch: Avoid virsh hang due to missing virDomainFree(dom) calls * src/virsh.c (cmdDomuuid): Add missing virDomainFree call. (cmdAttachDevice): Likewise. (cmdDetachDevice): Likewise. --- src/virsh.c | 9 +++++++-- 1 files changed, 7 insertions(+), 2 deletions(-) diff --git a/src/virsh.c b/src/virsh.c index 0d93fb6..487f256 100644 --- a/src/virsh.c +++ b/src/virsh.c @@ -2161,6 +2161,7 @@ cmdDomuuid(vshControl * ctl, vshCmd * cmd) else vshError(ctl, FALSE, "%s", _("failed to get domain UUID")); + virDomainFree(dom); return TRUE; } @@ -3038,8 +3039,10 @@ cmdAttachDevice(vshControl * ctl, vshCmd * cmd) return FALSE; } - if (virFileReadAll(from, VIRSH_MAX_XML_FILE, &buffer) < 0) + if (virFileReadAll(from, VIRSH_MAX_XML_FILE, &buffer) < 0) { + virDomainFree(dom); return FALSE; + } ret = virDomainAttachDevice(dom, buffer); free (buffer); @@ -3092,8 +3095,10 @@ cmdDetachDevice(vshControl * ctl, vshCmd * cmd) return FALSE; } - if (virFileReadAll(from, VIRSH_MAX_XML_FILE, &buffer) < 0) + if (virFileReadAll(from, VIRSH_MAX_XML_FILE, &buffer) < 0) { + virDomainFree(dom); return FALSE; + } ret = virDomainDetachDevice(dom, buffer); free (buffer); -- 1.5.4.rc5.1.g0fa73

On Wed, Jan 30, 2008 at 06:25:19PM +0100, Jim Meyering wrote:
Without the following patch, this command would hang
printf 'domuuid fc4\ndomstate fc4\n' \ | ./virsh --connect test://$PWD/../docs/testnode.xml
with this stack trace:
__lll_lock_wait ... _L_lock_105 ... __pthread_mutex_lock ... virUnrefDomain (domain=0x6a8b30) at hash.c:884 virDomainFree (domain=0x6a8b30) at libvirt.c:1211 cmdDomstate (ctl=0x7fffea723390, cmd=0x6a8b10) at virsh.c:677 vshCommandRun (ctl=0x7fffea723390, cmd=0x6a8b10) at virsh.c:4033 main (argc=3, argv=0x7fffea7234e8) at virsh.c:501
The problem is that cmdDomuuid didn't call virDomainFree(dom), so cmdDomstate hangs trying to do the Unref.
I don't understand why it would hang simply because it forgot to free an object. Surely it ought to just be a memory leak not hang ? Each individual libvirt API must always have a matched lock & unlock pair in all codepaths, so a hang should only occur if an unlock is missing in some codepath. Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

"Daniel P. Berrange" <berrange@redhat.com> wrote:
On Wed, Jan 30, 2008 at 06:25:19PM +0100, Jim Meyering wrote:
Without the following patch, this command would hang
printf 'domuuid fc4\ndomstate fc4\n' \ | ./virsh --connect test://$PWD/../docs/testnode.xml
with this stack trace:
__lll_lock_wait ... _L_lock_105 ... __pthread_mutex_lock ... virUnrefDomain (domain=0x6a8b30) at hash.c:884 virDomainFree (domain=0x6a8b30) at libvirt.c:1211 cmdDomstate (ctl=0x7fffea723390, cmd=0x6a8b10) at virsh.c:677 vshCommandRun (ctl=0x7fffea723390, cmd=0x6a8b10) at virsh.c:4033 main (argc=3, argv=0x7fffea7234e8) at virsh.c:501
The problem is that cmdDomuuid didn't call virDomainFree(dom), so cmdDomstate hangs trying to do the Unref.
I don't understand why it would hang simply because it forgot to free an object. Surely it ought to just be a memory leak not hang ? Each individual libvirt API must always have a matched lock & unlock pair in all codepaths, so a hang should only occur if an unlock is missing in some codepath.
Well maybe virDomainFree is misnamed then. It does more: The missing virDomainFree calls cause that because they are what is supposed to do the unlock. virDomainFree calls virUnrefDomain, which calls virReleaseDomain (when refs hits 0), which calls pthread_mutex_unlock.

On Wed, Jan 30, 2008 at 06:36:37PM +0100, Jim Meyering wrote:
"Daniel P. Berrange" <berrange@redhat.com> wrote:
On Wed, Jan 30, 2008 at 06:25:19PM +0100, Jim Meyering wrote:
Without the following patch, this command would hang
printf 'domuuid fc4\ndomstate fc4\n' \ | ./virsh --connect test://$PWD/../docs/testnode.xml
with this stack trace:
__lll_lock_wait ... _L_lock_105 ... __pthread_mutex_lock ... virUnrefDomain (domain=0x6a8b30) at hash.c:884 virDomainFree (domain=0x6a8b30) at libvirt.c:1211 cmdDomstate (ctl=0x7fffea723390, cmd=0x6a8b10) at virsh.c:677 vshCommandRun (ctl=0x7fffea723390, cmd=0x6a8b10) at virsh.c:4033 main (argc=3, argv=0x7fffea7234e8) at virsh.c:501
The problem is that cmdDomuuid didn't call virDomainFree(dom), so cmdDomstate hangs trying to do the Unref.
I don't understand why it would hang simply because it forgot to free an object. Surely it ought to just be a memory leak not hang ? Each individual libvirt API must always have a matched lock & unlock pair in all codepaths, so a hang should only occur if an unlock is missing in some codepath.
Well maybe virDomainFree is misnamed then. It does more:
The missing virDomainFree calls cause that because they are what is supposed to do the unlock.
Not, its a bug in virUnrefDomain/Network - it calls mutex_lock() twice in one codepath, instead of calling unlock(). Of course your patch to avoid the memory leak is still needed - so ACK to that, but the locking flaw needs this patch: Index: src/hash.c =================================================================== RCS file: /data/cvs/libvirt/src/hash.c,v retrieving revision 1.29 diff -u -p -r1.29 hash.c --- src/hash.c 29 Jan 2008 18:15:54 -0000 1.29 +++ src/hash.c 30 Jan 2008 17:42:32 -0000 @@ -881,7 +881,7 @@ virUnrefDomain(virDomainPtr domain) { return (0); } - pthread_mutex_lock(&domain->conn->lock); + pthread_mutex_unlock(&domain->conn->lock); return (refs); } @@ -1013,7 +1013,7 @@ virUnrefNetwork(virNetworkPtr network) { return (0); } - pthread_mutex_lock(&network->conn->lock); + pthread_mutex_unlock(&network->conn->lock); return (refs); } Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

"Daniel P. Berrange" <berrange@redhat.com> wrote:
On Wed, Jan 30, 2008 at 06:36:37PM +0100, Jim Meyering wrote:
"Daniel P. Berrange" <berrange@redhat.com> wrote:
On Wed, Jan 30, 2008 at 06:25:19PM +0100, Jim Meyering wrote:
Without the following patch, this command would hang
printf 'domuuid fc4\ndomstate fc4\n' \ | ./virsh --connect test://$PWD/../docs/testnode.xml
with this stack trace:
__lll_lock_wait ... _L_lock_105 ... __pthread_mutex_lock ... virUnrefDomain (domain=0x6a8b30) at hash.c:884 virDomainFree (domain=0x6a8b30) at libvirt.c:1211 cmdDomstate (ctl=0x7fffea723390, cmd=0x6a8b10) at virsh.c:677 vshCommandRun (ctl=0x7fffea723390, cmd=0x6a8b10) at virsh.c:4033 main (argc=3, argv=0x7fffea7234e8) at virsh.c:501
The problem is that cmdDomuuid didn't call virDomainFree(dom), so cmdDomstate hangs trying to do the Unref.
I don't understand why it would hang simply because it forgot to free an object. Surely it ought to just be a memory leak not hang ? Each individual libvirt API must always have a matched lock & unlock pair in all codepaths, so a hang should only occur if an unlock is missing in some codepath.
Well maybe virDomainFree is misnamed then. It does more:
The missing virDomainFree calls cause that because they are what is supposed to do the unlock.
Not, its a bug in virUnrefDomain/Network - it calls mutex_lock() twice in one codepath, instead of calling unlock().
Of course your patch to avoid the memory leak is still needed - so ACK to that, but the locking flaw needs this patch:
Index: src/hash.c =================================================================== RCS file: /data/cvs/libvirt/src/hash.c,v retrieving revision 1.29 diff -u -p -r1.29 hash.c --- src/hash.c 29 Jan 2008 18:15:54 -0000 1.29 +++ src/hash.c 30 Jan 2008 17:42:32 -0000 @@ -881,7 +881,7 @@ virUnrefDomain(virDomainPtr domain) { return (0); }
- pthread_mutex_lock(&domain->conn->lock); + pthread_mutex_unlock(&domain->conn->lock); return (refs); }
@@ -1013,7 +1013,7 @@ virUnrefNetwork(virNetworkPtr network) { return (0); }
- pthread_mutex_lock(&network->conn->lock); + pthread_mutex_unlock(&network->conn->lock); return (refs); }
Oh! Two locks in a row. Amazing that such a problem wasn't exposed sooner. Ack.

On Wed, Jan 30, 2008 at 07:00:30PM +0100, Jim Meyering wrote:
"Daniel P. Berrange" <berrange@redhat.com> wrote:
Not, its a bug in virUnrefDomain/Network - it calls mutex_lock() twice in one codepath, instead of calling unlock().
Of course your patch to avoid the memory leak is still needed - so ACK to that, but the locking flaw needs this patch:
Index: src/hash.c =================================================================== RCS file: /data/cvs/libvirt/src/hash.c,v retrieving revision 1.29 diff -u -p -r1.29 hash.c --- src/hash.c 29 Jan 2008 18:15:54 -0000 1.29 +++ src/hash.c 30 Jan 2008 17:42:32 -0000 @@ -881,7 +881,7 @@ virUnrefDomain(virDomainPtr domain) { return (0); }
- pthread_mutex_lock(&domain->conn->lock); + pthread_mutex_unlock(&domain->conn->lock); return (refs); }
@@ -1013,7 +1013,7 @@ virUnrefNetwork(virNetworkPtr network) { return (0); }
- pthread_mutex_lock(&network->conn->lock); + pthread_mutex_unlock(&network->conn->lock); return (refs); }
Oh! Two locks in a row. Amazing that such a problem wasn't exposed sooner.
Well I only committed this flaw last week, so not much time to expose it. Fixed in CVS now. Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|
participants (2)
-
Daniel P. Berrange
-
Jim Meyering