2011/7/27 Matthias Bolte <matthias.bolte(a)googlemail.com>:
doRemoteClose doesn't free the virNetClientPtr and this creates
a
260kb leak per remote connection. This happens because
virNetClientFree doesn't remove the last ref, because virNetClientNew
creates the virNetClientPtr with a refcount of 2.
static virNetClientPtr virNetClientNew(virNetSocketPtr sock,
const char *hostname)
{
[...]
client->refs = 1;
[...]
/* Set up a callback to listen on the socket data */
client->refs++;
if (virNetSocketAddIOCallback(client->sock,
VIR_EVENT_HANDLE_READABLE,
virNetClientIncomingEvent,
client,
virNetClientEventFree) < 0) {
client->refs--;
VIR_DEBUG("Failed to add event watch, disabling events");
}
[...]
}
virNetClientNew adds a ref before calling virNetSocketAddIOCallback
but only removes it when virNetSocketAddIOCallback fails. This seems
wrong too me and the ref should be removed in the success case too.
The same pattern was added in 0302391ee643ad91fdc6d2ecf7e4cf0fc9724516
(Fix race in ref counting when handling RPC jobs)
--- a/src/rpc/virnetserverclient.c
+++ b/src/rpc/virnetserverclient.c
@@ -763,10 +763,12 @@ readmore:
/* Send off to for normal dispatch to workers */
if (msg) {
+ client->refs++;
if (!client->dispatchFunc ||
client->dispatchFunc(client, msg,
client->dispatchOpaque) < 0) {
virNetMessageFree(msg);
client->wantClose = true;
+ client->refs--;
return;
}
}
Again, this seems wrong and the ref should be removed in the success
case here too. Before I spent time to figure out how the refcounting
is supposed to work here I just report it and hope that Dan can easily
fix this.
Okay, after some discussion on IRC with Michal and Eric and further
testing I think that the ref counting is okay here.
virNetClientNew adds the second ref because the free callback
(virNetClientEventFree) passed to virNetSocketAddIOCallback will call
virNetClientFree.
Another observation is that virNetClientFree calls
virNetSocketRemoveIOCallback when the last ref was removed. Actually
that's pointless because virNetSocketRemoveIOCallback might indirectly
removed the last ref. So this looks like an ordering problem. The
ordering gets fixed by calling virNetClientClose before
virNetClientFree, because virNetClientClose calls
virNetSocketRemoveIOCallback. Another problem is that
virNetSocketRemoveIOCallback only indirectly results in a call to
virNetClientFree, because the event loop will call virNetClientFree
when removing callbacks marked for deletion --
virNetSocketRemoveIOCallback does this marking.
The memory leak I saw was due to virsh calling
virEventRegisterDefaultImpl, but not calling virEventRunDefaultImpl.
Because the event loop is initialized, the call to
virNetSocketAddIOCallback in virNetClientNew succeeds. But virsh not
driving the event loop results in not removing callbacks that were
marked for deletion. Finally this leaks the virNetClientPtr using in
the remote driver. I used a virsh -c qemu:///system to test.
I was able fix this by calling virEventRunDefaultImpl after
virConnectClose in virsh. But I don't think that this is the correct
general solution.
To me the general problem seems to be the entanglement between the
virNetClientPtr refcounting and the event loop. I'm not sure how to
improve this in general. Maybe should have a thread calling
virEventRunDefaultImpl as the comment on virEventRunDefaultImpl
suggest.
--
Matthias Bolte
http://photron.blogspot.com