[libvirt] [PATCH] Domain/Net object cleanups after remote error

Domain and Net objects were not being cleaned up properly when reporting errors from the remote driver. Attached patch fixes this. Thanks, Cole diff --git a/src/remote_internal.c b/src/remote_internal.c index 51e8eb7..80f6ce6 100644 --- a/src/remote_internal.c +++ b/src/remote_internal.c @@ -4606,6 +4606,10 @@ server_error (virConnectPtr conn, remote_error *err) err->str3 ? *err->str3 : NULL, err->int1, err->int2, "%s", err->message ? *err->message : NULL); + if (dom) + virDomainFree(dom); + if (net) + virNetworkFree(dom); } /* get_nonnull_domain and get_nonnull_network turn an on-wire

Cole Robinson wrote:
Domain and Net objects were not being cleaned up properly when reporting errors from the remote driver. Attached patch fixes this.
Stupid typo. Correct patch attached. - Cole diff --git a/src/remote_internal.c b/src/remote_internal.c index 51e8eb7..80f6ce6 100644 --- a/src/remote_internal.c +++ b/src/remote_internal.c @@ -4606,6 +4606,10 @@ server_error (virConnectPtr conn, remote_error *err) err->str3 ? *err->str3 : NULL, err->int1, err->int2, "%s", err->message ? *err->message : NULL); + if (dom) + virDomainFree(dom); + if (net) + virNetworkFree(net); } /* get_nonnull_domain and get_nonnull_network turn an on-wire

On Mon, May 19, 2008 at 04:58:38PM -0400, Cole Robinson wrote:
diff --git a/src/remote_internal.c b/src/remote_internal.c index 51e8eb7..80f6ce6 100644 --- a/src/remote_internal.c +++ b/src/remote_internal.c @@ -4606,6 +4606,10 @@ server_error (virConnectPtr conn, remote_error *err) err->str3 ? *err->str3 : NULL, err->int1, err->int2, "%s", err->message ? *err->message : NULL); + if (dom) + virDomainFree(dom); + if (net) + virNetworkFree(net); }
Extra long hmmmmmmmmmmm ... This is right, the domain and network are leaked. However the virterror structure containing these pointers will be used later. (__virRaiseError saves it and callers access it later). However^2 these entries in the virterror structure are deprecated. I added a patch last month which adds a big warning about using these entries. At most callers should look at the pointers themselves and shouldn't dereference them (which would be the only safe thing to do given that the pointers have just been freed). But maybe there is old code around using the dom/net fields in virterror ... It's a bad idea even to look at those fields really. Rich. -- Richard Jones, Emerging Technologies, Red Hat http://et.redhat.com/~rjones virt-top is 'top' for virtual machines. Tiny program with many powerful monitoring features, net stats, disk stats, logging, etc. http://et.redhat.com/~rjones/virt-top

On Tue, May 20, 2008 at 10:44:52AM +0100, Richard W.M. Jones wrote:
On Mon, May 19, 2008 at 04:58:38PM -0400, Cole Robinson wrote:
diff --git a/src/remote_internal.c b/src/remote_internal.c index 51e8eb7..80f6ce6 100644 --- a/src/remote_internal.c +++ b/src/remote_internal.c @@ -4606,6 +4606,10 @@ server_error (virConnectPtr conn, remote_error *err) err->str3 ? *err->str3 : NULL, err->int1, err->int2, "%s", err->message ? *err->message : NULL); + if (dom) + virDomainFree(dom); + if (net) + virNetworkFree(net); }
Extra long hmmmmmmmmmmm ...
This is right, the domain and network are leaked.
However the virterror structure containing these pointers will be used later. (__virRaiseError saves it and callers access it later).
However^2 these entries in the virterror structure are deprecated. I added a patch last month which adds a big warning about using these entries. At most callers should look at the pointers themselves and shouldn't dereference them (which would be the only safe thing to do given that the pointers have just been freed).
Yes, that should be safe - the only domain / network object set in an error condition is one that is passed in via the original caller. So there should still be a reference count held on these objects further up the call stack and thus this code won't actually free them immediately, merely decrement the ref count. Can probably confirm this by turning on debug mode and checking the messages. Dan. -- |: Red Hat, Engineering, Boston -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

Daniel P. Berrange wrote:
On Tue, May 20, 2008 at 10:44:52AM +0100, Richard W.M. Jones wrote:
On Mon, May 19, 2008 at 04:58:38PM -0400, Cole Robinson wrote:
diff --git a/src/remote_internal.c b/src/remote_internal.c index 51e8eb7..80f6ce6 100644 --- a/src/remote_internal.c +++ b/src/remote_internal.c @@ -4606,6 +4606,10 @@ server_error (virConnectPtr conn, remote_error *err) err->str3 ? *err->str3 : NULL, err->int1, err->int2, "%s", err->message ? *err->message : NULL); + if (dom) + virDomainFree(dom); + if (net) + virNetworkFree(net); } Extra long hmmmmmmmmmmm ...
This is right, the domain and network are leaked.
However the virterror structure containing these pointers will be used later. (__virRaiseError saves it and callers access it later).
However^2 these entries in the virterror structure are deprecated. I added a patch last month which adds a big warning about using these entries. At most callers should look at the pointers themselves and shouldn't dereference them (which would be the only safe thing to do given that the pointers have just been freed).
Yes, that should be safe - the only domain / network object set in an error condition is one that is passed in via the original caller. So there should still be a reference count held on these objects further up the call stack and thus this code won't actually free them immediately, merely decrement the ref count. Can probably confirm this by turning on debug mode and checking the messages.
Dan.
I just tested without this patch, but with the other destroy leak fixes I sent, and everything looks to be okay. So this patch can be ignored. Thanks, Cole

On Tue, May 20, 2008 at 09:32:28AM -0400, Cole Robinson wrote:
Daniel P. Berrange wrote:
On Tue, May 20, 2008 at 10:44:52AM +0100, Richard W.M. Jones wrote:
On Mon, May 19, 2008 at 04:58:38PM -0400, Cole Robinson wrote:
diff --git a/src/remote_internal.c b/src/remote_internal.c index 51e8eb7..80f6ce6 100644 --- a/src/remote_internal.c +++ b/src/remote_internal.c @@ -4606,6 +4606,10 @@ server_error (virConnectPtr conn, remote_error *err) err->str3 ? *err->str3 : NULL, err->int1, err->int2, "%s", err->message ? *err->message : NULL); + if (dom) + virDomainFree(dom); + if (net) + virNetworkFree(net); } Extra long hmmmmmmmmmmm ...
This is right, the domain and network are leaked.
However the virterror structure containing these pointers will be used later. (__virRaiseError saves it and callers access it later).
However^2 these entries in the virterror structure are deprecated. I added a patch last month which adds a big warning about using these entries. At most callers should look at the pointers themselves and shouldn't dereference them (which would be the only safe thing to do given that the pointers have just been freed).
Yes, that should be safe - the only domain / network object set in an error condition is one that is passed in via the original caller. So there should still be a reference count held on these objects further up the call stack and thus this code won't actually free them immediately, merely decrement the ref count. Can probably confirm this by turning on debug mode and checking the messages.
I just tested without this patch, but with the other destroy leak fixes I sent, and everything looks to be okay. So this patch can be ignored.
I'm not entirely convinced yet - the code certainly suggests to me that we need to free these. Only a couple of lines further up we obtain a referenced object /* Get the domain and network, if set. */ dom = err->dom ? get_nonnull_domain (conn, *err->dom) : NULL; net = err->net ? get_nonnull_network (conn, *err->net) : NULL; And __virRaiseError() does *not* own the reference after being called, thus we need to explicitly release the reference ourselves. Dan. -- |: Red Hat, Engineering, Boston -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On Tue, May 20, 2008 at 02:39:52PM +0100, Daniel P. Berrange wrote:
I'm not entirely convinced yet - the code certainly suggests to me that we need to free these. Only a couple of lines further up we obtain a referenced object
/* Get the domain and network, if set. */ dom = err->dom ? get_nonnull_domain (conn, *err->dom) : NULL; net = err->net ? get_nonnull_network (conn, *err->net) : NULL;
And __virRaiseError() does *not* own the reference after being called, thus we need to explicitly release the reference ourselves.
Agreed with Dan ... I'm not sure how the leaks can go away with the other patch. OTOH the ref-counting stuff is complicated. Rich. -- Richard Jones, Emerging Technologies, Red Hat http://et.redhat.com/~rjones virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into Xen guests. http://et.redhat.com/~rjones/virt-p2v

Richard W.M. Jones wrote:
On Tue, May 20, 2008 at 02:39:52PM +0100, Daniel P. Berrange wrote:
I'm not entirely convinced yet - the code certainly suggests to me that we need to free these. Only a couple of lines further up we obtain a referenced object
/* Get the domain and network, if set. */ dom = err->dom ? get_nonnull_domain (conn, *err->dom) : NULL; net = err->net ? get_nonnull_network (conn, *err->net) : NULL;
And __virRaiseError() does *not* own the reference after being called, thus we need to explicitly release the reference ourselves.
Agreed with Dan ... I'm not sure how the leaks can go away with the other patch. OTOH the ref-counting stuff is complicated.
Rich.
I haven't really traced the code path in detail to figure out what was taking place but I'll explain what I was seeing: Connect to qemu driver with virsh. I have one defined vm currently shutoff. Trying to 'shutdown' the vm seems to cause no reference count leaks, and I can do it until I'm blue in the face. But once I try to 'destroy' the domain, a reference leak pops up. Now every subsequent 'destroy' or 'shutdown' command creates another leak (I didn't try any other commands). I thought I traced it to the error and remote.c leaks, and when fixed sequentially all problems went away. But now I'm not so sure what state all the code was in. I'll try this again from the original unpatched state and try to figure out exactly what was happening.

On Tue, May 20, 2008 at 09:57:02AM -0400, Cole Robinson wrote:
I'll try this again from the original unpatched state and try to figure out exactly what was happening.
Do you trigger an error at any point in the testing? You could try doing that (eg. try migrating a QEMU domain). Rich. -- Richard Jones, Emerging Technologies, Red Hat http://et.redhat.com/~rjones virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into Xen guests. http://et.redhat.com/~rjones/virt-p2v

Richard W.M. Jones wrote:
On Tue, May 20, 2008 at 09:57:02AM -0400, Cole Robinson wrote:
I'll try this again from the original unpatched state and try to figure out exactly what was happening.
Do you trigger an error at any point in the testing? You could try doing that (eg. try migrating a QEMU domain).
Rich.
Yes the whole time I am triggering errors, as I am trying to shutdown or destroy an already shutdown domain. That's the interesting part: once destroy errors once, even subsequent shutdown errors start leaking. - Cole

On Tue, May 20, 2008 at 10:11:57AM -0400, Cole Robinson wrote:
Richard W.M. Jones wrote:
On Tue, May 20, 2008 at 09:57:02AM -0400, Cole Robinson wrote:
I'll try this again from the original unpatched state and try to figure out exactly what was happening.
Do you trigger an error at any point in the testing? You could try doing that (eg. try migrating a QEMU domain).
Yes the whole time I am triggering errors, as I am trying to shutdown or destroy an already shutdown domain. That's the interesting part: once destroy errors once, even subsequent shutdown errors start leaking.
Ok, so the problem is a very complex one and here's what's going on.... - Storing the domain/network objects in the virError struct does not increment the reference count. - When a domain ref count drops to zero, the virReleaseDomain internal method will null-ify the dom/net field in any virError struct still refering to them. - In the remote daemon the paradigm for the helpers is usually if (virDomainCreate (dom) == -1) { virDomainFree(dom); return -1; } virDomainFree(dom); return 0; Both of those virDomainFree calls will release the last reference and thus, even if virDomainCreate() set the 'dom' field in the virError object it will now get nullified. - The remote daemon code which serializes the error object runs *after* that snippet I show above. Thus in the *absence* of memory leaks in the remote daemon helper methods it is *impossible* for the 'dom' or 'net' fields to ever be set in the error object serialized on the wire. So there *is* a memory leak in server_error() on the client end, but it is impossible to trigger it unless there is also a memory leak in the server end :-) Now the virDoaminDestroy() method helper does currently have a memory leak, hence why you saw the corresponding leak on the client side. Fixing the server side leak, hid the client side leak. Also, once a leak occurs on the server end, it will live forever, so once one method leaks, you'll see the client side leaks for all other methods which trigger an error too. So we *do* need the patch to the server_error() method - at very least for old daemons which will set the dom/net fields in the errors they return. We also ought to fix the daemon so that it correctly captures and serializes the error objects. This is a very non-trivial fix, since the code serializing the errors is running far too late. We need to capture the error much earlier, which will probably require quite a large change. So I vote for applying all Cole's patches which do indeed fix a number of memory leaks. Fixing the daemon to correctly serialize errors with a dom/net object can be done later unless someone has burning desire to tackle it now Dan. -- |: Red Hat, Engineering, Boston -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On Tue, May 20, 2008 at 03:38:01PM +0100, Daniel P. Berrange wrote:
So I vote for applying all Cole's patches which do indeed fix a number of memory leaks. Fixing the daemon to correctly serialize errors with a dom/net object can be done later unless someone has burning desire to tackle it now
Those fields are nothing but trouble. Any chance we can make a special exception and get rid of them in a way which causes code that depends on them to compile-fail with a meaningful error message, and yet doesn't break old running code linking to a newer library? Rich. -- Richard Jones, Emerging Technologies, Red Hat http://et.redhat.com/~rjones virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into Xen guests. http://et.redhat.com/~rjones/virt-p2v

"Richard W.M. Jones" <rjones@redhat.com> wrote:
On Tue, May 20, 2008 at 03:38:01PM +0100, Daniel P. Berrange wrote:
So I vote for applying all Cole's patches which do indeed fix a number of memory leaks. Fixing the daemon to correctly serialize errors with a dom/net object can be done later unless someone has burning desire to tackle it now
Those fields are nothing but trouble. Any chance we can make a special exception and get rid of them in a way which causes code that depends on them to compile-fail with a meaningful error message, and yet doesn't break old running code linking to a newer library?
Hi Rich. Good idea. As a matter of fact, with gcc, there is a way to do that for variables. I'd seen this applied to functions, but not to variables. From gcc's extend.texi: @cindex @code{deprecated} attribute The @code{deprecated} attribute results in a warning if the variable is used anywhere in the source file. This is useful when identifying variables that are expected to be removed in a future version of a program. The warning also includes the location of the declaration of the deprecated variable, to enable users to easily find further information about why the variable is deprecated, or what they should do instead. Note that the warning only occurs for uses: @smallexample extern int old_var __attribute__ ((deprecated)); extern int old_var; int new_fn () @{ return old_var; @} @end smallexample So I tried it and it works like a charm: $ cat k.c struct foo { int ii; int jj __attribute__((deprecated)); }; static int bar() { struct foo o; o.jj = 1; }; $ gcc -c k.c k.c: In function 'bar': k.c:9: warning: 'jj' is deprecated (declared at k.c:3)

On Mon, May 19, 2008 at 04:58:38PM -0400, Cole Robinson wrote:
Cole Robinson wrote:
Domain and Net objects were not being cleaned up properly when reporting errors from the remote driver. Attached patch fixes this.
Stupid typo. Correct patch attached.
I've applied this to CVS now. Dan. -- |: Red Hat, Engineering, Boston -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
participants (4)
-
Cole Robinson
-
Daniel P. Berrange
-
Jim Meyering
-
Richard W.M. Jones