[libvirt] [PATCH v2 1/2] remote: set neventCallbacks to zero at DEREG_CB

set neventCallbacks to zero after VIR_FREE(eventCallbacks) was called --- daemon/remote.c | 1 + 1 file changed, 1 insertion(+) diff --git a/daemon/remote.c b/daemon/remote.c index 07557e9..cbcb6e8 100644 --- a/daemon/remote.c +++ b/daemon/remote.c @@ -1686,6 +1686,7 @@ void remoteRelayConnectionClosedEvent(virConnectPtr conn ATTRIBUTE_UNUSED, int r VIR_WARN("unexpected %s event deregister failure", name); \ } \ VIR_FREE(eventCallbacks); \ + neventCallbacks = 0; \ } while (0); /* -- 2.8.3

base on commit fe8f1c8b8650c8ab5e7d27338f4538582651bd14, we solve libvirt coredump problem, but it introduce a memory leak sense: 1. one client register a domain event such as reboot event 2. and client was terminated unexpectly, such as kill -9, then this client object will not free at libvirtd service program. remoteDispatchConnectDomainEventCallbackRegisterAny reference the client, but when client was terminated before it call deRegisterAny,the reference of client will not reduced to zero. so the memory leak take place. this patch will deRegister all event callbacks when client was close. --- daemon/remote.c | 46 ++++++++++++++++++++++++++++------------------ 1 file changed, 28 insertions(+), 18 deletions(-) diff --git a/daemon/remote.c b/daemon/remote.c index cbcb6e8..466b2e7 100644 --- a/daemon/remote.c +++ b/daemon/remote.c @@ -1689,23 +1689,10 @@ void remoteRelayConnectionClosedEvent(virConnectPtr conn ATTRIBUTE_UNUSED, int r neventCallbacks = 0; \ } while (0); -/* - * You must hold lock for at least the client - * We don't free stuff here, merely disconnect the client's - * network socket & resources. - * We keep the libvirt connection open until any async - * jobs have finished, then clean it up elsewhere - */ -void remoteClientFreeFunc(void *data) +static void +remoteClientFreePrivateCallbacks(struct daemonClientPrivate *priv) { - struct daemonClientPrivate *priv = data; - - /* Deregister event delivery callback */ - if (priv->conn) { - virIdentityPtr sysident = virIdentityGetSystem(); - - virIdentitySetCurrent(sysident); - + if (priv && priv->conn) { DEREG_CB(priv->conn, priv->domainEventCallbacks, priv->ndomainEventCallbacks, virConnectDomainEventDeregisterAny, "domain"); @@ -1730,16 +1717,38 @@ void remoteClientFreeFunc(void *data) remoteRelayConnectionClosedEvent) < 0) VIR_WARN("unexpected close callback event deregister failure"); } + } +} +#undef DEREG_CB + +/* + * You must hold lock for at least the client + * We don't free stuff here, merely disconnect the client's + * network socket & resources. + * We keep the libvirt connection open until any async + * jobs have finished, then clean it up elsewhere + */ +void remoteClientFreeFunc(void *data) +{ + struct daemonClientPrivate *priv = data; + + /* Deregister event delivery callback */ + if (priv->conn) { + virIdentityPtr sysident = virIdentityGetSystem(); + + virIdentitySetCurrent(sysident); + + remoteClientFreePrivateCallbacks(priv); virConnectClose(priv->conn); virIdentitySetCurrent(NULL); virObjectUnref(sysident); - } + } VIR_FREE(priv); + } -#undef DEREG_CB static void remoteClientCloseFunc(virNetServerClientPtr client) @@ -1747,6 +1756,7 @@ static void remoteClientCloseFunc(virNetServerClientPtr client) struct daemonClientPrivate *priv = virNetServerClientGetPrivateData(client); daemonRemoveAllClientStreams(priv->streams); + remoteClientFreePrivateCallbacks(priv); } -- 2.8.3

On 11/06/2017 12:47 AM, xinhua.Cao wrote:
base on commit fe8f1c8b8650c8ab5e7d27338f4538582651bd14, we solve libvirt coredump problem, but it introduce a memory leak sense:
1. one client register a domain event such as reboot event 2. and client was terminated unexpectly, such as kill -9,
unexpectedly
then this client object will not free at libvirtd service program.
remoteDispatchConnectDomainEventCallbackRegisterAny reference the client, but when client was terminated before it call deRegisterAny,the reference of client will not reduced to zero. so the memory leak take place.
What's the difference between the path that would terminate normally and the one that terminates abnormally. It seems to me there might be (an) extra Ref(s) on the client which prevents virNetServerClientDispose from calling it's @privateDataFreeFunc which would call remoteClientFreeFunc since remoteClientInitHook worked properly. That's what needs to be found.
this patch will deRegister all event callbacks when client was close. ---
FWIW: This commit message was a bit difficult to follow - I know it's primarily a language barrier thing - so maybe if you just construct some function code flow for both initialization/open and destruction/close it'd help... Paths that work and the path that you think is broken... That can be done in this section after the "---" and before the changed API's. I assume you've done a lot of research, but the details of that research can be difficult to just "pick up on". Typically someone will provide some sort of valgrind output as an example. I just have a sense that this could boil down to a path (or two) that are doing the virObjectRef without the expected virObjectUnref occurring - something perhaps to do with how @wantClose is handled in the abnormal termination case. Nothing jumps out at me though even after looking at the code for a while.
daemon/remote.c | 46 ++++++++++++++++++++++++++++------------------ 1 file changed, 28 insertions(+), 18 deletions(-)
diff --git a/daemon/remote.c b/daemon/remote.c index cbcb6e8..466b2e7 100644 --- a/daemon/remote.c +++ b/daemon/remote.c @@ -1689,23 +1689,10 @@ void remoteRelayConnectionClosedEvent(virConnectPtr conn ATTRIBUTE_UNUSED, int r neventCallbacks = 0; \ } while (0);
-/* - * You must hold lock for at least the client - * We don't free stuff here, merely disconnect the client's - * network socket & resources. - * We keep the libvirt connection open until any async - * jobs have finished, then clean it up elsewhere - */ -void remoteClientFreeFunc(void *data) +static void +remoteClientFreePrivateCallbacks(struct daemonClientPrivate *priv) { - struct daemonClientPrivate *priv = data; - - /* Deregister event delivery callback */ - if (priv->conn) { - virIdentityPtr sysident = virIdentityGetSystem(); - - virIdentitySetCurrent(sysident); - + if (priv && priv->conn) {
Well each of the callers has already dereferenced @priv so it's a bit too late to check it now!
DEREG_CB(priv->conn, priv->domainEventCallbacks, priv->ndomainEventCallbacks, virConnectDomainEventDeregisterAny, "domain"); @@ -1730,16 +1717,38 @@ void remoteClientFreeFunc(void *data) remoteRelayConnectionClosedEvent) < 0) VIR_WARN("unexpected close callback event deregister failure"); } + } +} +#undef DEREG_CB +
There should be another blank line here.
+/* + * You must hold lock for at least the client + * We don't free stuff here, merely disconnect the client's + * network socket & resources. + * We keep the libvirt connection open until any async + * jobs have finished, then clean it up elsewhere + */ +void remoteClientFreeFunc(void *data) +{ + struct daemonClientPrivate *priv = data; + + /* Deregister event delivery callback */ + if (priv->conn) { + virIdentityPtr sysident = virIdentityGetSystem(); + + virIdentitySetCurrent(sysident); + + remoteClientFreePrivateCallbacks(priv);
virConnectClose(priv->conn);
virIdentitySetCurrent(NULL); virObjectUnref(sysident); - }
+ } VIR_FREE(priv); + } -#undef DEREG_CB
static void remoteClientCloseFunc(virNetServerClientPtr client) @@ -1747,6 +1756,7 @@ static void remoteClientCloseFunc(virNetServerClientPtr client) struct daemonClientPrivate *priv = virNetServerClientGetPrivateData(client);
daemonRemoveAllClientStreams(priv->streams); + remoteClientFreePrivateCallbacks(priv);
Perhaps this just gets a "if (priv->conn)" before calling... Still if as I assume the missing Unref is found (someday), then theoretically remoteClientFreeFunc would be called - of course the nEventCallbacks = 0 will ensure the Dereg's dont' get called again. So it shoudn't be a problem John
}

在 2017/11/11 5:43, John Ferlan 写道:
On 11/06/2017 12:47 AM, xinhua.Cao wrote:
base on commit fe8f1c8b8650c8ab5e7d27338f4538582651bd14, we solve libvirt coredump problem, but it introduce a memory leak sense:
1. one client register a domain event such as reboot event 2. and client was terminated unexpectly, such as kill -9, unexpectedly
then this client object will not free at libvirtd service program.
remoteDispatchConnectDomainEventCallbackRegisterAny reference the client, but when client was terminated before it call deRegisterAny,the reference of client will not reduced to zero. so the memory leak take place. What's the difference between the path that would terminate normally and the one that terminates abnormally. It seems to me there might be (an) extra Ref(s) on the client which prevents virNetServerClientDispose from calling it's @privateDataFreeFunc which would call remoteClientFreeFunc since remoteClientInitHook worked properly.
That's what needs to be found.
this patch will deRegister all event callbacks when client was close. --- FWIW: This commit message was a bit difficult to follow - I know it's primarily a language barrier thing - so maybe if you just construct some function code flow for both initialization/open and destruction/close it'd help... Paths that work and the path that you think is broken... That can be done in this section after the "---" and before the changed API's.
I assume you've done a lot of research, but the details of that research can be difficult to just "pick up on". Typically someone will provide some sort of valgrind output as an example.
I just have a sense that this could boil down to a path (or two) that are doing the virObjectRef without the expected virObjectUnref occurring - something perhaps to do with how @wantClose is handled in the abnormal termination case. Nothing jumps out at me though even after looking at the code for a while.
daemon/remote.c | 46 ++++++++++++++++++++++++++++------------------ 1 file changed, 28 insertions(+), 18 deletions(-)
diff --git a/daemon/remote.c b/daemon/remote.c index cbcb6e8..466b2e7 100644 --- a/daemon/remote.c +++ b/daemon/remote.c @@ -1689,23 +1689,10 @@ void remoteRelayConnectionClosedEvent(virConnectPtr conn ATTRIBUTE_UNUSED, int r neventCallbacks = 0; \ } while (0);
-/* - * You must hold lock for at least the client - * We don't free stuff here, merely disconnect the client's - * network socket & resources. - * We keep the libvirt connection open until any async - * jobs have finished, then clean it up elsewhere - */ -void remoteClientFreeFunc(void *data) +static void +remoteClientFreePrivateCallbacks(struct daemonClientPrivate *priv) { - struct daemonClientPrivate *priv = data; - - /* Deregister event delivery callback */ - if (priv->conn) { - virIdentityPtr sysident = virIdentityGetSystem(); - - virIdentitySetCurrent(sysident); - + if (priv && priv->conn) { Well each of the callers has already dereferenced @priv so it's a bit too late to check it now! yes, I also think this check is too late. this placement can only check
If client terminate normally, client would call virConnectDomainEventDeregisterAny to clear eventCallbacks. Then libvirt daemon will call remoteDispatchConnectDomainEventDeregisterAny -> remoteEventCallbackFree -> virObjectUnref(callback->client) to free eventCallbacks and unRef client. just think if client don't call virConnectDomainEventDeregisterAny and be killed. then there is no way to call call remoteDispatchConnectDomainEventDeregisterAny -> remoteEventCallbackFree -> virObjectUnref(callback->client) so client's ref can't down to zero. so remoteClientFreeFunc can't be called. then client object memory leak. The difference is call remoteDispatchConnectDomainEventDeregisterAny or don't call remoteDispatchConnectDomainEventDeregisterAny to Unref(callback->client). here is my test code #include<stdio.h> #include<pthread.h> #include<libvirt/libvirt.h> int _DomainStatusCallback(virConnectPtr conn, virDomainPtr dom, int event, int detail, void *opaque); void timeOutCallback(int timer, void *opaque); int run = 1; int _DomainStatusCallback(virConnectPtr conn, virDomainPtr dom, int event, int detail, void *opaque) { char uuid[128] = { 0 }; (void) virDomainGetUUIDString(dom, uuid); printf("domain %s come a event %d\n", uuid, event); } void _timeOutCallback(int timer, void *opaque) { printf("time out is comming\n"); } void *event_thread(void *opaque) { int i =0; if (virEventRegisterDefaultImpl() < 0) { return NULL; } virConnectPtr conn = virConnectOpen("qemu:///system"); if (conn == NULL) { return NULL; } int callback = virConnectDomainEventRegisterAny(conn, NULL, VIR_DOMAIN_EVENT_ID_LIFECYCLE, _DomainStatusCallback, NULL, NULL); int timeout = virEventAddTimeout(1000,_timeOutCallback,NULL,NULL); while (run && virConnectIsAlive(conn) == 1) { virEventRunDefaultImpl(); } sleep(5); virConnectDomainEventDeregisterAny(conn, callback); virConnectClose(conn); } int main(int argc, char *argv[]) { pthread_t pid; int ret = 0; ret = pthread_create(&pid, NULL, event_thread, NULL); sleep(100); run = 0; sleep(10); } when I don't call remoteClientFreePrivateCallbacks at remoteClientCloseFunc, here is client refs change when I kill the client. (gdb) c Continuing. Hardware watchpoint 6: *(int *) 0x5637edc888f4 Old value = 4 New value = 5 virObjectRef (anyobj=anyobj@entry=0x5637edc888f0) at util/virobject.c:296 296 PROBE(OBJECT_REF, "obj=%p", obj); (gdb) bt #0 virObjectRef (anyobj=anyobj@entry=0x5637edc888f0) at util/virobject.c:296 #1 0x00005637ec61d643 in remoteDispatchConnectDomainEventCallbackRegisterAny (server=<optimized out>, msg=<optimized out>, ret=0x7f054c0008e0, args=0x7f054c0008c0, rerr=0x7f057ba6cc90, client=0x5637edc888f0) at remote.c:4418 #2 remoteDispatchConnectDomainEventCallbackRegisterAnyHelper (server=<optimized out>, client=0x5637edc888f0, msg=<optimized out>, rerr=0x7f057ba6cc90, args=0x7f054c0008c0, ret=0x7f054c0008e0) at remote_dispatch.h:424 #3 0x00005637ec6538a8 in virNetServerProgramDispatchCall (msg=0x5637edc89d40, client=0x5637edc888f0, server=0x5637edc6bf10, prog=0x5637edc84810) at rpc/virnetserverprogram.c:474 #4 virNetServerProgramDispatch (prog=0x5637edc84810, server=server@entry=0x5637edc6bf10, client=0x5637edc888f0, msg=0x5637edc89d40) at rpc/virnetserverprogram.c:311 #5 0x00005637ec64f44d in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x5637edc6bf10) at rpc/virnetserver.c:148 #6 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x5637edc6bf10) at rpc/virnetserver.c:169 #7 0x00007f058de6c311 in virThreadPoolWorker (opaque=opaque@entry=0x5637edc603e0) at util/virthreadpool.c:167 #8 0x00007f058de6b628 in virThreadHelper (data=<optimized out>) at util/virthread.c:219 #9 0x00007f058ae55dc5 in start_thread () from /usr/lib64/libpthread.so.0 #10 0x00007f058ab836fd in clone () from /usr/lib64/libc.so.6 Old value = 5 New value = 4 virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259 259 PROBE(OBJECT_UNREF, "obj=%p", obj); Continuing. [Switching to Thread 0x7f058eb1f840 (LWP 3878)] Hardware watchpoint 9: *(int *) 0x5637edc88634 Old value = 4 New value = 5 virObjectRef (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:296 296 PROBE(OBJECT_REF, "obj=%p", obj); #0 virObjectRef (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:296 #1 0x00005637ec65299d in virNetServerClientClose (client=0x5637edc88630) at rpc/virnetserverclient.c:978 #2 0x00005637ec650728 in virNetServerProcessClients (srv=0x5637edc6bf10) at rpc/virnetserver.c:861 #3 0x00007f058df8b099 in daemonServerProcessClients (payload=<optimized out>, key=<optimized out>, opaque=<optimized out>) at rpc/virnetdaemon.c:764 #4 0x00007f058de1921d in virHashForEach (table=0x5637edc6d950, iter=iter@entry=0x7f058df8b090 <daemonServerProcessClients>, data=data@entry=0x0) at util/virhash.c:606 #5 0x00007f058df8c14f in virNetDaemonRun (dmn=0x5637edc6f270) at rpc/virnetdaemon.c:831 #6 0x00005637ec610c48 in main (argc=<optimized out>, argv=<optimized out>) at libvirtd.c:1605 Continuing. Hardware watchpoint 9: *(int *) 0x5637edc88634 Old value = 5 New value = 4 virObjectUnref (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:259 259 PROBE(OBJECT_UNREF, "obj=%p", obj); #0 virObjectUnref (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:259 #1 0x00005637ec6529bd in virNetServerClientClose (client=0x5637edc88630) at rpc/virnetserverclient.c:982 #2 0x00005637ec650728 in virNetServerProcessClients (srv=0x5637edc6bf10) at rpc/virnetserver.c:861 #3 0x00007f058df8b099 in daemonServerProcessClients (payload=<optimized out>, key=<optimized out>, opaque=<optimized out>) at rpc/virnetdaemon.c:764 #4 0x00007f058de1921d in virHashForEach (table=0x5637edc6d950, iter=iter@entry=0x7f058df8b090 <daemonServerProcessClients>, data=data@entry=0x0) at util/virhash.c:606 #5 0x00007f058df8c14f in virNetDaemonRun (dmn=0x5637edc6f270) at rpc/virnetdaemon.c:831 #6 0x00005637ec610c48 in main (argc=<optimized out>, argv=<optimized out>) at libvirtd.c:1605 Continuing. Hardware watchpoint 9: *(int *) 0x5637edc88634 Old value = 4 New value = 5 virObjectRef (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:296 296 PROBE(OBJECT_REF, "obj=%p", obj); #0 virObjectRef (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:296 #1 0x00005637ec6529d1 in virNetServerClientClose (client=0x5637edc88630) at rpc/virnetserverclient.c:987 #2 0x00005637ec650728 in virNetServerProcessClients (srv=0x5637edc6bf10) at rpc/virnetserver.c:861 #3 0x00007f058df8b099 in daemonServerProcessClients (payload=<optimized out>, key=<optimized out>, opaque=<optimized out>) at rpc/virnetdaemon.c:764 #4 0x00007f058de1921d in virHashForEach (table=0x5637edc6d950, iter=iter@entry=0x7f058df8b090 <daemonServerProcessClients>, data=data@entry=0x0) at util/virhash.c:606 #5 0x00007f058df8c14f in virNetDaemonRun (dmn=0x5637edc6f270) at rpc/virnetdaemon.c:831 #6 0x00005637ec610c48 in main (argc=<optimized out>, argv=<optimized out>) at libvirtd.c:1605 Continuing. Hardware watchpoint 9: *(int *) 0x5637edc88634 Old value = 5 New value = 4 virObjectUnref (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:259 259 PROBE(OBJECT_UNREF, "obj=%p", obj); #0 virObjectUnref (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:259 #1 0x00005637ec6529ee in virNetServerClientClose (client=0x5637edc88630) at rpc/virnetserverclient.c:991 #2 0x00005637ec650728 in virNetServerProcessClients (srv=0x5637edc6bf10) at rpc/virnetserver.c:861 #3 0x00007f058df8b099 in daemonServerProcessClients (payload=<optimized out>, key=<optimized out>, opaque=<optimized out>) at rpc/virnetdaemon.c:764 #4 0x00007f058de1921d in virHashForEach (table=0x5637edc6d950, iter=iter@entry=0x7f058df8b090 <daemonServerProcessClients>, data=data@entry=0x0) at util/virhash.c:606 #5 0x00007f058df8c14f in virNetDaemonRun (dmn=0x5637edc6f270) at rpc/virnetdaemon.c:831 #6 0x00005637ec610c48 in main (argc=<optimized out>, argv=<optimized out>) at libvirtd.c:1605 Continuing. Hardware watchpoint 9: *(int *) 0x5637edc88634 Old value = 4 New value = 3 virObjectUnref (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:259 259 PROBE(OBJECT_UNREF, "obj=%p", obj); #0 virObjectUnref (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:259 #1 0x00005637ec650705 in virNetServerProcessClients (srv=0x5637edc6bf10) at rpc/virnetserver.c:873 #2 0x00007f058df8b099 in daemonServerProcessClients (payload=<optimized out>, key=<optimized out>, opaque=<optimized out>) at rpc/virnetdaemon.c:764 #3 0x00007f058de1921d in virHashForEach (table=0x5637edc6d950, iter=iter@entry=0x7f058df8b090 <daemonServerProcessClients>, data=data@entry=0x0) at util/virhash.c:606 #4 0x00007f058df8c14f in virNetDaemonRun (dmn=0x5637edc6f270) at rpc/virnetdaemon.c:831 #5 0x00005637ec610c48 in main (argc=<optimized out>, argv=<optimized out>) at libvirtd.c:1605 Continuing. Hardware watchpoint 9: *(int *) 0x5637edc88634 Old value = 3 New value = 2 virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259 259 PROBE(OBJECT_UNREF, "obj=%p", obj); #0 virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259 #1 0x00007f058de4199b in virObjectUnref (anyobj=<optimized out>) at util/virobject.c:265 #2 0x00007f058de0db34 in virEventPollCleanupTimeouts () at util/vireventpoll.c:543 #3 0x00007f058de0ea7e in virEventPollRunOnce () at util/vireventpoll.c:628 #4 0x00007f058de0d892 in virEventRunDefaultImpl () at util/virevent.c:314 #5 0x00007f058df8c12d in virNetDaemonRun (dmn=0x5637edc6f270) at rpc/virnetdaemon.c:824 #6 0x00005637ec610c48 in main (argc=<optimized out>, argv=<optimized out>) at libvirtd.c:1605 Continuing. Hardware watchpoint 9: *(int *) 0x5637edc88634 Old value = 2 New value = 1 virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259 259 PROBE(OBJECT_UNREF, "obj=%p", obj); #0 virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259 #1 0x00005637ec656bee in virNetSocketEventFree (opaque=opaque@entry=0x5637edc8a5b0) at rpc/virnetsocket.c:2152 #2 0x00007f058de0dd34 in virEventPollCleanupHandles () at util/vireventpoll.c:592 #3 0x00007f058de0ea83 in virEventPollRunOnce () at util/vireventpoll.c:629 #4 0x00007f058de0d892 in virEventRunDefaultImpl () at util/virevent.c:314 #5 0x00007f058df8c12d in virNetDaemonRun (dmn=0x5637edc6f270) at rpc/virnetdaemon.c:824 #6 0x00005637ec610c48 in main (argc=<optimized out>, argv=<optimized out>) at libvirtd.c:1605 Continuing. so the client refs alway have 1 when I register one event callbacks. priv->conn.
DEREG_CB(priv->conn, priv->domainEventCallbacks, priv->ndomainEventCallbacks, virConnectDomainEventDeregisterAny, "domain"); @@ -1730,16 +1717,38 @@ void remoteClientFreeFunc(void *data) remoteRelayConnectionClosedEvent) < 0) VIR_WARN("unexpected close callback event deregister failure"); } + } +} +#undef DEREG_CB +
There should be another blank line here. OK, I lost a blank line.
+/* + * You must hold lock for at least the client + * We don't free stuff here, merely disconnect the client's + * network socket & resources. + * We keep the libvirt connection open until any async + * jobs have finished, then clean it up elsewhere + */ +void remoteClientFreeFunc(void *data) +{ + struct daemonClientPrivate *priv = data; + + /* Deregister event delivery callback */ + if (priv->conn) { + virIdentityPtr sysident = virIdentityGetSystem(); + + virIdentitySetCurrent(sysident); + + remoteClientFreePrivateCallbacks(priv);
virConnectClose(priv->conn);
virIdentitySetCurrent(NULL); virObjectUnref(sysident); - }
+ } VIR_FREE(priv); + } -#undef DEREG_CB
static void remoteClientCloseFunc(virNetServerClientPtr client) @@ -1747,6 +1756,7 @@ static void remoteClientCloseFunc(virNetServerClientPtr client) struct daemonClientPrivate *priv = virNetServerClientGetPrivateData(client);
daemonRemoveAllClientStreams(priv->streams); + remoteClientFreePrivateCallbacks(priv); Perhaps this just gets a "if (priv->conn)" before calling...
Still if as I assume the missing Unref is found (someday), then theoretically remoteClientFreeFunc would be called - of course the nEventCallbacks = 0 will ensure the Dereg's dont' get called again. So it shoudn't be a problem In my opinion, it is safe to check point is Null. But check everywhere is stupid. so it is hard to determine where need check.:-)
John
}
.

On 11/11/2017 03:20 AM, xinhua.Cao wrote:
在 2017/11/11 5:43, John Ferlan 写道:
On 11/06/2017 12:47 AM, xinhua.Cao wrote:
base on commit fe8f1c8b8650c8ab5e7d27338f4538582651bd14, we solve libvirt coredump problem, but it introduce a memory leak sense:
1. one client register a domain event such as reboot event 2. and client was terminated unexpectly, such as kill -9, unexpectedly
then this client object will not free at libvirtd service program.
remoteDispatchConnectDomainEventCallbackRegisterAny reference the client, but when client was terminated before it call deRegisterAny,the reference of client will not reduced to zero. so the memory leak take place. What's the difference between the path that would terminate normally and the one that terminates abnormally. It seems to me there might be (an) extra Ref(s) on the client which prevents virNetServerClientDispose from calling it's @privateDataFreeFunc which would call remoteClientFreeFunc since remoteClientInitHook worked properly.
That's what needs to be found.
If client terminate normally, client would call virConnectDomainEventDeregisterAny to clear eventCallbacks. Then libvirt daemon will call remoteDispatchConnectDomainEventDeregisterAny -> remoteEventCallbackFree -> virObjectUnref(callback->client) to free eventCallbacks and unRef client. just think if client don't call virConnectDomainEventDeregisterAny and be killed. then there is no way to call call remoteDispatchConnectDomainEventDeregisterAny -> remoteEventCallbackFree -> virObjectUnref(callback->client) so client's ref can't down to zero. so remoteClientFreeFunc can't be called. then client object memory leak.
So the client/sample code is not calling Deregister because it has no "handling" of an abnormal exit. That is no signal handler to call Deregister in the event some outside force kills the client/sample code while it's processing. So a different solution would be to add signal handling to the code in order to handle the kill and make the Deregister (and close) calls.
The difference is call remoteDispatchConnectDomainEventDeregisterAny or don't call remoteDispatchConnectDomainEventDeregisterAny to Unref(callback->client).
IIUC, the whole reason the Deregister code is in the Free Func is to "handle" the case where that last client reference was Unref'd, but someone didn't perform the actual Deregister. The logic predates my libvirt involvement... Still because of commit id 'fe8f1c8b' where we generate a REF for the Register and that's transparent to the consumer (e.g. how would they know they need to ensure that Deregister is called), thus the purpose of this patch is to find a way to Deregister if it's determined that the consumer hasn't by the time of the "last" REF we'd have [trying to write a commit message here too]. This solution to this problem is to alter the processing to have the remoteClientCloseFunc handle performing the Deregister calls instead of the remoteClientFreeFunc because there's no way FreeFunc would be called unless the Deregister was already called. In fact, if you think about it if the CloseFunc does the Deregister why would the FreeFunc also include that call? IOW: It should only need to be in one or the other. I know you've already posted patch v3; however, I also think you need to read commit id '8294aa0c'. If you're moving the Deregister handling to the close func, then I would think that the "reason" for that commit still applies. When you generate a v4, my suggestion is to generate 2 patches. 1. Just the remoteClientFreePrivateCallbacks separation including the sysident handling. From my read of the code, the virConnectClose doesn't have the same issue. 2. Move the call to remoteClientFreePrivateCallbacks from FreeFunc to CloseFunc with a commit message that explains why this is necessary since commit id 'fe8f1c8b' (BTW: No need to supply the complete hash, 8 significant digits are usually plenty). I supplied a bit more details w/r/t what I found for REF/UNREF - I assume it essentially matches what you've found. But having it here certainly helps - so thanks for providing it along with the example.
here is my test code
#include<stdio.h> #include<pthread.h> #include<libvirt/libvirt.h>
int _DomainStatusCallback(virConnectPtr conn, virDomainPtr dom, int event, int detail, void *opaque); void timeOutCallback(int timer, void *opaque);
int run = 1;
int _DomainStatusCallback(virConnectPtr conn, virDomainPtr dom, int event, int detail, void *opaque) { char uuid[128] = { 0 }; (void) virDomainGetUUIDString(dom, uuid); printf("domain %s come a event %d\n", uuid, event); }
void _timeOutCallback(int timer, void *opaque) { printf("time out is comming\n"); }
void *event_thread(void *opaque) { int i =0; if (virEventRegisterDefaultImpl() < 0) { return NULL; }
virConnectPtr conn = virConnectOpen("qemu:///system"); if (conn == NULL) { return NULL; }
int callback = virConnectDomainEventRegisterAny(conn, NULL, VIR_DOMAIN_EVENT_ID_LIFECYCLE, _DomainStatusCallback, NULL, NULL); int timeout = virEventAddTimeout(1000,_timeOutCallback,NULL,NULL);
while (run && virConnectIsAlive(conn) == 1) { virEventRunDefaultImpl(); }
sleep(5); virConnectDomainEventDeregisterAny(conn, callback); virConnectClose(conn); }
int main(int argc, char *argv[]) { pthread_t pid; int ret = 0; ret = pthread_create(&pid, NULL, event_thread, NULL); sleep(100); run = 0; sleep(10); }
when I don't call remoteClientFreePrivateCallbacks at remoteClientCloseFunc, here is client refs change when I kill the client. (gdb) c Continuing. Hardware watchpoint 6: *(int *) 0x5637edc888f4
Old value = 4 New value = 5 virObjectRef (anyobj=anyobj@entry=0x5637edc888f0) at util/virobject.c:296 296 PROBE(OBJECT_REF, "obj=%p", obj); (gdb) bt #0 virObjectRef (anyobj=anyobj@entry=0x5637edc888f0) at util/virobject.c:296 #1 0x00005637ec61d643 in remoteDispatchConnectDomainEventCallbackRegisterAny (server=<optimized out>, msg=<optimized out>, ret=0x7f054c0008e0, args=0x7f054c0008c0, rerr=0x7f057ba6cc90, client=0x5637edc888f0) at remote.c:4418
Although the line numbers are different with top of tree, this is the RegisterAny REF on the client... [...] So that says to me we generally run with 3-4 REFs (without any Register Event occurring). AFAICT, the following is the list: 1. Adding to the srv->clients (virNetServerAddClient) 2. Adding IO Callback (virNetServerClientRegisterEvent) 3. Adding Keep Alive (virNetServerClientInitKeepAlive and virKeepAliveStart) The 4th is a bit trickier, but I believe it's REF'd during the call to virNetServerClientDispatchRead...
Old value = 5 New value = 4 virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259 259 PROBE(OBJECT_UNREF, "obj=%p", obj); Continuing. [Switching to Thread 0x7f058eb1f840 (LWP 3878)] Hardware watchpoint 9: *(int *) 0x5637edc88634
I'm assuming this comes from the opposite end of the DispatchRead - there's no trace here to prove it, but that's not necessarily surprising either as it's not easy to find where the Unref would naturally occur without an error path occurring....
Old value = 4 New value = 5 virObjectRef (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:296 296 PROBE(OBJECT_REF, "obj=%p", obj); #0 virObjectRef (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:296 #1 0x00005637ec65299d in virNetServerClientClose (client=0x5637edc88630) at rpc/virnetserverclient.c:978
[...]
Old value = 5 New value = 4 virObjectUnref (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:259 259 PROBE(OBJECT_UNREF, "obj=%p", obj); #0 virObjectUnref (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:259 #1 0x00005637ec6529bd in virNetServerClientClose (client=0x5637edc88630) at rpc/virnetserverclient.c:982
This pair of REF/UNREF in virNetServerClientClose when client->keepalive is set. [...]
Old value = 4 New value = 5 virObjectRef (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:296 296 PROBE(OBJECT_REF, "obj=%p", obj); #0 virObjectRef (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:296 #1 0x00005637ec6529d1 in virNetServerClientClose (client=0x5637edc88630) at rpc/virnetserverclient.c:987
[...]
Old value = 5 New value = 4 virObjectUnref (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:259 259 PROBE(OBJECT_UNREF, "obj=%p", obj); #0 virObjectUnref (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:259 #1 0x00005637ec6529ee in virNetServerClientClose (client=0x5637edc88630) at rpc/virnetserverclient.c:991
This pair of REF/UNREF in virNetServerClientClose when client->privateDataCloseFunc is set. [...]
Old value = 4 New value = 3 virObjectUnref (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:259 259 PROBE(OBJECT_UNREF, "obj=%p", obj); #0 virObjectUnref (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:259 #1 0x00005637ec650705 in virNetServerProcessClients (srv=0x5637edc6bf10) at rpc/virnetserver.c:873
Again my line numbers are off a bit, but this would be the for the UNREF after VIR_DELETE_ELEMENT when removed from srv->clients [...]
Old value = 3 New value = 2 virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259 259 PROBE(OBJECT_UNREF, "obj=%p", obj); #0 virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259 #1 0x00007f058de4199b in virObjectUnref (anyobj=<optimized out>) at util/virobject.c:265 #2 0x00007f058de0db34 in virEventPollCleanupTimeouts () at util/vireventpoll.c:543
[...] This one is the opposite end of the virObjectRef done for the keep alive timer. See virNetServerClientInitKeepAlive and virKeepAliveStart usage of virEventAddTimeout.
Old value = 2 New value = 1 virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259 259 PROBE(OBJECT_UNREF, "obj=%p", obj); #0 virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259 #1 0x00005637ec656bee in virNetSocketEventFree (opaque=opaque@entry=0x5637edc8a5b0) at rpc/virnetsocket.c:2152 #2 0x00007f058de0dd34 in virEventPollCleanupHandles () at util/vireventpoll.c:592
[...] This one is the opposite end of the virObjectRef done in virNetServerClientRegisterEvent prior to virNetSocketAddIOCallback which calls virEventAddHandle. Theoretically the "last" REF... John
Continuing.
this patch will deRegister all event callbacks when client was close. --- FWIW: This commit message was a bit difficult to follow - I know it's primarily a language barrier thing - so maybe if you just construct some function code flow for both initialization/open and destruction/close it'd help... Paths that work and the path that you think is broken... That can be done in this section after the "---" and before the changed API's.
I assume you've done a lot of research, but the details of that research can be difficult to just "pick up on". Typically someone will provide some sort of valgrind output as an example.
I just have a sense that this could boil down to a path (or two) that are doing the virObjectRef without the expected virObjectUnref occurring - something perhaps to do with how @wantClose is handled in the abnormal termination case. Nothing jumps out at me though even after looking at the code for a while.
daemon/remote.c | 46 ++++++++++++++++++++++++++++------------------ 1 file changed, 28 insertions(+), 18 deletions(-)
diff --git a/daemon/remote.c b/daemon/remote.c index cbcb6e8..466b2e7 100644 --- a/daemon/remote.c +++ b/daemon/remote.c @@ -1689,23 +1689,10 @@ void remoteRelayConnectionClosedEvent(virConnectPtr conn ATTRIBUTE_UNUSED, int r neventCallbacks = 0; \ } while (0); -/* - * You must hold lock for at least the client - * We don't free stuff here, merely disconnect the client's - * network socket & resources. - * We keep the libvirt connection open until any async - * jobs have finished, then clean it up elsewhere - */ -void remoteClientFreeFunc(void *data) +static void +remoteClientFreePrivateCallbacks(struct daemonClientPrivate *priv) { - struct daemonClientPrivate *priv = data; - - /* Deregister event delivery callback */ - if (priv->conn) { - virIdentityPtr sysident = virIdentityGetSystem(); - - virIdentitySetCurrent(sysident); - + if (priv && priv->conn) { Well each of the callers has already dereferenced @priv so it's a bit too late to check it now! yes, I also think this check is too late. this placement can only check
so the client refs alway have 1 when I register one event callbacks. priv->conn.
DEREG_CB(priv->conn, priv->domainEventCallbacks, priv->ndomainEventCallbacks, virConnectDomainEventDeregisterAny, "domain"); @@ -1730,16 +1717,38 @@ void remoteClientFreeFunc(void *data) remoteRelayConnectionClosedEvent) < 0) VIR_WARN("unexpected close callback event deregister failure"); } + } +} +#undef DEREG_CB + There should be another blank line here. OK, I lost a blank line.
+/* + * You must hold lock for at least the client + * We don't free stuff here, merely disconnect the client's + * network socket & resources. + * We keep the libvirt connection open until any async + * jobs have finished, then clean it up elsewhere + */ +void remoteClientFreeFunc(void *data) +{ + struct daemonClientPrivate *priv = data; + + /* Deregister event delivery callback */ + if (priv->conn) { + virIdentityPtr sysident = virIdentityGetSystem(); + + virIdentitySetCurrent(sysident); + + remoteClientFreePrivateCallbacks(priv); virConnectClose(priv->conn); virIdentitySetCurrent(NULL); virObjectUnref(sysident); - } + } VIR_FREE(priv); + } -#undef DEREG_CB static void remoteClientCloseFunc(virNetServerClientPtr client) @@ -1747,6 +1756,7 @@ static void remoteClientCloseFunc(virNetServerClientPtr client) struct daemonClientPrivate *priv = virNetServerClientGetPrivateData(client); daemonRemoveAllClientStreams(priv->streams); + remoteClientFreePrivateCallbacks(priv); Perhaps this just gets a "if (priv->conn)" before calling...
Still if as I assume the missing Unref is found (someday), then theoretically remoteClientFreeFunc would be called - of course the nEventCallbacks = 0 will ensure the Dereg's dont' get called again. So it shoudn't be a problem In my opinion, it is safe to check point is Null. But check everywhere is stupid. so it is hard to determine where need check.:-)
John
}
.

在 2017/11/12 22:45, John Ferlan 写道:
On 11/11/2017 03:20 AM, xinhua.Cao wrote:
在 2017/11/11 5:43, John Ferlan 写道:
On 11/06/2017 12:47 AM, xinhua.Cao wrote:
base on commit fe8f1c8b8650c8ab5e7d27338f4538582651bd14, we solve libvirt coredump problem, but it introduce a memory leak sense:
1. one client register a domain event such as reboot event 2. and client was terminated unexpectly, such as kill -9, unexpectedly
then this client object will not free at libvirtd service program.
remoteDispatchConnectDomainEventCallbackRegisterAny reference the client, but when client was terminated before it call deRegisterAny,the reference of client will not reduced to zero. so the memory leak take place. What's the difference between the path that would terminate normally and the one that terminates abnormally. It seems to me there might be (an) extra Ref(s) on the client which prevents virNetServerClientDispose from calling it's @privateDataFreeFunc which would call remoteClientFreeFunc since remoteClientInitHook worked properly.
That's what needs to be found. If client terminate normally, client would call virConnectDomainEventDeregisterAny to clear eventCallbacks. Then libvirt daemon will call remoteDispatchConnectDomainEventDeregisterAny -> remoteEventCallbackFree -> virObjectUnref(callback->client) to free eventCallbacks and unRef client. just think if client don't call virConnectDomainEventDeregisterAny and be killed. then there is no way to call call remoteDispatchConnectDomainEventDeregisterAny -> remoteEventCallbackFree -> virObjectUnref(callback->client) so client's ref can't down to zero. so remoteClientFreeFunc can't be called. then client object memory leak.
So the client/sample code is not calling Deregister because it has no "handling" of an abnormal exit. That is no signal handler to call Deregister in the event some outside force kills the client/sample code while it's processing.
So a different solution would be to add signal handling to the code in order to handle the kill and make the Deregister (and close) calls.
The difference is call remoteDispatchConnectDomainEventDeregisterAny or don't call remoteDispatchConnectDomainEventDeregisterAny to Unref(callback->client).
IIUC, the whole reason the Deregister code is in the Free Func is to "handle" the case where that last client reference was Unref'd, but someone didn't perform the actual Deregister. The logic predates my libvirt involvement...
Still because of commit id 'fe8f1c8b' where we generate a REF for the Register and that's transparent to the consumer (e.g. how would they know they need to ensure that Deregister is called), thus the purpose of this patch is to find a way to Deregister if it's determined that the consumer hasn't by the time of the "last" REF we'd have [trying to write a commit message here too].
This solution to this problem is to alter the processing to have the remoteClientCloseFunc handle performing the Deregister calls instead of the remoteClientFreeFunc because there's no way FreeFunc would be called unless the Deregister was already called. In fact, if you think about it if the CloseFunc does the Deregister why would the FreeFunc also include that call? IOW: It should only need to be in one or the other.
I know you've already posted patch v3; however, I also think you need to read commit id '8294aa0c'. If you're moving the Deregister handling to the close func, then I would think that the "reason" for that commit still applies.
When you generate a v4, my suggestion is to generate 2 patches.
1. Just the remoteClientFreePrivateCallbacks separation including the sysident handling. From my read of the code, the virConnectClose doesn't have the same issue. I am unfamiliarity with sysident. then research on it. It used for
2. Move the call to remoteClientFreePrivateCallbacks from FreeFunc to CloseFunc with a commit message that explains why this is necessary since commit id 'fe8f1c8b' (BTW: No need to supply the complete hash, 8 significant digits are usually plenty).
signal handler can handle such as kill -15 signal, But can't handle kill -9 signal. so we also need to resolve it in libvirt polkit check but I still can't confirm it. this two advices are very useful. thanks very much.
I supplied a bit more details w/r/t what I found for REF/UNREF - I assume it essentially matches what you've found. But having it here certainly helps - so thanks for providing it along with the example.
here is my test code
#include<stdio.h> #include<pthread.h> #include<libvirt/libvirt.h>
int _DomainStatusCallback(virConnectPtr conn, virDomainPtr dom, int event, int detail, void *opaque); void timeOutCallback(int timer, void *opaque);
int run = 1;
int _DomainStatusCallback(virConnectPtr conn, virDomainPtr dom, int event, int detail, void *opaque) { char uuid[128] = { 0 }; (void) virDomainGetUUIDString(dom, uuid); printf("domain %s come a event %d\n", uuid, event); }
void _timeOutCallback(int timer, void *opaque) { printf("time out is comming\n"); }
void *event_thread(void *opaque) { int i =0; if (virEventRegisterDefaultImpl() < 0) { return NULL; }
virConnectPtr conn = virConnectOpen("qemu:///system"); if (conn == NULL) { return NULL; }
int callback = virConnectDomainEventRegisterAny(conn, NULL, VIR_DOMAIN_EVENT_ID_LIFECYCLE, _DomainStatusCallback, NULL, NULL); int timeout = virEventAddTimeout(1000,_timeOutCallback,NULL,NULL);
while (run && virConnectIsAlive(conn) == 1) { virEventRunDefaultImpl(); }
sleep(5); virConnectDomainEventDeregisterAny(conn, callback); virConnectClose(conn); }
int main(int argc, char *argv[]) { pthread_t pid; int ret = 0; ret = pthread_create(&pid, NULL, event_thread, NULL); sleep(100); run = 0; sleep(10); }
when I don't call remoteClientFreePrivateCallbacks at remoteClientCloseFunc, here is client refs change when I kill the client. (gdb) c Continuing. Hardware watchpoint 6: *(int *) 0x5637edc888f4
Old value = 4 New value = 5 virObjectRef (anyobj=anyobj@entry=0x5637edc888f0) at util/virobject.c:296 296 PROBE(OBJECT_REF, "obj=%p", obj); (gdb) bt #0 virObjectRef (anyobj=anyobj@entry=0x5637edc888f0) at util/virobject.c:296 #1 0x00005637ec61d643 in remoteDispatchConnectDomainEventCallbackRegisterAny (server=<optimized out>, msg=<optimized out>, ret=0x7f054c0008e0, args=0x7f054c0008c0, rerr=0x7f057ba6cc90, client=0x5637edc888f0) at remote.c:4418 Although the line numbers are different with top of tree, this is the RegisterAny REF on the client...
[...]
So that says to me we generally run with 3-4 REFs (without any Register Event occurring). AFAICT, the following is the list:
1. Adding to the srv->clients (virNetServerAddClient) 2. Adding IO Callback (virNetServerClientRegisterEvent) 3. Adding Keep Alive (virNetServerClientInitKeepAlive and virKeepAliveStart)
The 4th is a bit trickier, but I believe it's REF'd during the call to virNetServerClientDispatchRead...
Old value = 5 New value = 4 virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259 259 PROBE(OBJECT_UNREF, "obj=%p", obj); Continuing. [Switching to Thread 0x7f058eb1f840 (LWP 3878)] Hardware watchpoint 9: *(int *) 0x5637edc88634 I'm assuming this comes from the opposite end of the DispatchRead - there's no trace here to prove it, but that's not necessarily surprising either as it's not easy to find where the Unref would naturally occur without an error path occurring....
Old value = 4 New value = 5 virObjectRef (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:296 296 PROBE(OBJECT_REF, "obj=%p", obj); #0 virObjectRef (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:296 #1 0x00005637ec65299d in virNetServerClientClose (client=0x5637edc88630) at rpc/virnetserverclient.c:978 [...]
Old value = 5 New value = 4 virObjectUnref (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:259 259 PROBE(OBJECT_UNREF, "obj=%p", obj); #0 virObjectUnref (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:259 #1 0x00005637ec6529bd in virNetServerClientClose (client=0x5637edc88630) at rpc/virnetserverclient.c:982 This pair of REF/UNREF in virNetServerClientClose when client->keepalive is set.
[...]
Old value = 4 New value = 5 virObjectRef (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:296 296 PROBE(OBJECT_REF, "obj=%p", obj); #0 virObjectRef (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:296 #1 0x00005637ec6529d1 in virNetServerClientClose (client=0x5637edc88630) at rpc/virnetserverclient.c:987 [...]
Old value = 5 New value = 4 virObjectUnref (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:259 259 PROBE(OBJECT_UNREF, "obj=%p", obj); #0 virObjectUnref (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:259 #1 0x00005637ec6529ee in virNetServerClientClose (client=0x5637edc88630) at rpc/virnetserverclient.c:991 This pair of REF/UNREF in virNetServerClientClose when client->privateDataCloseFunc is set.
[...]
Old value = 4 New value = 3 virObjectUnref (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:259 259 PROBE(OBJECT_UNREF, "obj=%p", obj); #0 virObjectUnref (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:259 #1 0x00005637ec650705 in virNetServerProcessClients (srv=0x5637edc6bf10) at rpc/virnetserver.c:873 Again my line numbers are off a bit, but this would be the for the UNREF after VIR_DELETE_ELEMENT when removed from srv->clients
[...]
Old value = 3 New value = 2 virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259 259 PROBE(OBJECT_UNREF, "obj=%p", obj); #0 virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259 #1 0x00007f058de4199b in virObjectUnref (anyobj=<optimized out>) at util/virobject.c:265 #2 0x00007f058de0db34 in virEventPollCleanupTimeouts () at util/vireventpoll.c:543 [...]
This one is the opposite end of the virObjectRef done for the keep alive timer. See virNetServerClientInitKeepAlive and virKeepAliveStart usage of virEventAddTimeout.
Old value = 2 New value = 1 virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259 259 PROBE(OBJECT_UNREF, "obj=%p", obj); #0 virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259 #1 0x00005637ec656bee in virNetSocketEventFree (opaque=opaque@entry=0x5637edc8a5b0) at rpc/virnetsocket.c:2152 #2 0x00007f058de0dd34 in virEventPollCleanupHandles () at util/vireventpoll.c:592 [...]
This one is the opposite end of the virObjectRef done in virNetServerClientRegisterEvent prior to virNetSocketAddIOCallback which calls virEventAddHandle.
Theoretically the "last" REF...
John
Continuing.
this patch will deRegister all event callbacks when client was close. --- FWIW: This commit message was a bit difficult to follow - I know it's primarily a language barrier thing - so maybe if you just construct some function code flow for both initialization/open and destruction/close it'd help... Paths that work and the path that you think is broken... That can be done in this section after the "---" and before the changed API's.
I assume you've done a lot of research, but the details of that research can be difficult to just "pick up on". Typically someone will provide some sort of valgrind output as an example.
I just have a sense that this could boil down to a path (or two) that are doing the virObjectRef without the expected virObjectUnref occurring - something perhaps to do with how @wantClose is handled in the abnormal termination case. Nothing jumps out at me though even after looking at the code for a while.
daemon/remote.c | 46 ++++++++++++++++++++++++++++------------------ 1 file changed, 28 insertions(+), 18 deletions(-)
diff --git a/daemon/remote.c b/daemon/remote.c index cbcb6e8..466b2e7 100644 --- a/daemon/remote.c +++ b/daemon/remote.c @@ -1689,23 +1689,10 @@ void remoteRelayConnectionClosedEvent(virConnectPtr conn ATTRIBUTE_UNUSED, int r neventCallbacks = 0; \ } while (0); -/* - * You must hold lock for at least the client - * We don't free stuff here, merely disconnect the client's - * network socket & resources. - * We keep the libvirt connection open until any async - * jobs have finished, then clean it up elsewhere - */ -void remoteClientFreeFunc(void *data) +static void +remoteClientFreePrivateCallbacks(struct daemonClientPrivate *priv) { - struct daemonClientPrivate *priv = data; - - /* Deregister event delivery callback */ - if (priv->conn) { - virIdentityPtr sysident = virIdentityGetSystem(); - - virIdentitySetCurrent(sysident); - + if (priv && priv->conn) { Well each of the callers has already dereferenced @priv so it's a bit too late to check it now! yes, I also think this check is too late. this placement can only check
so the client refs alway have 1 when I register one event callbacks. priv->conn.
DEREG_CB(priv->conn, priv->domainEventCallbacks, priv->ndomainEventCallbacks, virConnectDomainEventDeregisterAny, "domain"); @@ -1730,16 +1717,38 @@ void remoteClientFreeFunc(void *data)
remoteRelayConnectionClosedEvent) < 0) VIR_WARN("unexpected close callback event deregister failure"); } + } +} +#undef DEREG_CB + There should be another blank line here. OK, I lost a blank line. +/* + * You must hold lock for at least the client + * We don't free stuff here, merely disconnect the client's + * network socket & resources. + * We keep the libvirt connection open until any async + * jobs have finished, then clean it up elsewhere + */ +void remoteClientFreeFunc(void *data) +{ + struct daemonClientPrivate *priv = data; + + /* Deregister event delivery callback */ + if (priv->conn) { + virIdentityPtr sysident = virIdentityGetSystem(); + + virIdentitySetCurrent(sysident); + + remoteClientFreePrivateCallbacks(priv); virConnectClose(priv->conn); virIdentitySetCurrent(NULL); virObjectUnref(sysident); - } + } VIR_FREE(priv); + } -#undef DEREG_CB static void remoteClientCloseFunc(virNetServerClientPtr client) @@ -1747,6 +1756,7 @@ static void remoteClientCloseFunc(virNetServerClientPtr client) struct daemonClientPrivate *priv = virNetServerClientGetPrivateData(client); daemonRemoveAllClientStreams(priv->streams); + remoteClientFreePrivateCallbacks(priv); Perhaps this just gets a "if (priv->conn)" before calling...
Still if as I assume the missing Unref is found (someday), then theoretically remoteClientFreeFunc would be called - of course the nEventCallbacks = 0 will ensure the Dereg's dont' get called again. So it shoudn't be a problem In my opinion, it is safe to check point is Null. But check everywhere is stupid. so it is hard to determine where need check.:-) John
}
.
.

On Mon, Nov 13, 2017 at 09:07:12PM +0800, xinhua.Cao wrote:
在 2017/11/12 22:45, John Ferlan 写道:
On 11/11/2017 03:20 AM, xinhua.Cao wrote:
在 2017/11/11 5:43, John Ferlan 写道:
On 11/06/2017 12:47 AM, xinhua.Cao wrote:
base on commit fe8f1c8b8650c8ab5e7d27338f4538582651bd14, we solve libvirt coredump problem, but it introduce a memory leak sense:
1. one client register a domain event such as reboot event 2. and client was terminated unexpectly, such as kill -9, unexpectedly
then this client object will not free at libvirtd service program.
remoteDispatchConnectDomainEventCallbackRegisterAny reference the client, but when client was terminated before it call deRegisterAny,the reference of client will not reduced to zero. so the memory leak take place. What's the difference between the path that would terminate normally and the one that terminates abnormally. It seems to me there might be (an) extra Ref(s) on the client which prevents virNetServerClientDispose from calling it's @privateDataFreeFunc which would call remoteClientFreeFunc since remoteClientInitHook worked properly.
That's what needs to be found. If client terminate normally, client would call virConnectDomainEventDeregisterAny to clear eventCallbacks. Then libvirt daemon will call remoteDispatchConnectDomainEventDeregisterAny -> remoteEventCallbackFree -> virObjectUnref(callback->client) to free eventCallbacks and unRef client. just think if client don't call virConnectDomainEventDeregisterAny and be killed. then there is no way to call call remoteDispatchConnectDomainEventDeregisterAny -> remoteEventCallbackFree -> virObjectUnref(callback->client) so client's ref can't down to zero. so remoteClientFreeFunc can't be called. then client object memory leak.
So the client/sample code is not calling Deregister because it has no "handling" of an abnormal exit. That is no signal handler to call Deregister in the event some outside force kills the client/sample code while it's processing.
So a different solution would be to add signal handling to the code in order to handle the kill and make the Deregister (and close) calls.
signal handler can handle such as kill -15 signal, But can't handle kill -9 signal. so we also need to resolve it in libvirt
Basically we need to handle bad behaving clients in the daemon anyway. I tried running libvirtd under valgrind and there was no extra leak with a client which didn't deregister the callback. How did you encounter this? I really want to find the place that I think needs to be changed, but without a reproducer this is just a guessing game.

在 2017/11/13 22:27, Martin Kletzander 写道:
On Mon, Nov 13, 2017 at 09:07:12PM +0800, xinhua.Cao wrote:
在 2017/11/12 22:45, John Ferlan 写道:
On 11/11/2017 03:20 AM, xinhua.Cao wrote:
在 2017/11/11 5:43, John Ferlan 写道:
On 11/06/2017 12:47 AM, xinhua.Cao wrote:
base on commit fe8f1c8b8650c8ab5e7d27338f4538582651bd14, we solve libvirt coredump problem, but it introduce a memory leak sense:
1. one client register a domain event such as reboot event 2. and client was terminated unexpectly, such as kill -9, unexpectedly
then this client object will not free at libvirtd service program.
remoteDispatchConnectDomainEventCallbackRegisterAny reference the client, but when client was terminated before it call deRegisterAny,the reference of client will not reduced to zero. so the memory leak take place. What's the difference between the path that would terminate normally and the one that terminates abnormally. It seems to me there might be (an) extra Ref(s) on the client which prevents virNetServerClientDispose from calling it's @privateDataFreeFunc which would call remoteClientFreeFunc since remoteClientInitHook worked properly.
That's what needs to be found. If client terminate normally, client would call virConnectDomainEventDeregisterAny to clear eventCallbacks. Then libvirt daemon will call remoteDispatchConnectDomainEventDeregisterAny -> remoteEventCallbackFree -> virObjectUnref(callback->client) to free eventCallbacks and unRef client. just think if client don't call virConnectDomainEventDeregisterAny and be killed. then there is no way to call call remoteDispatchConnectDomainEventDeregisterAny -> remoteEventCallbackFree -> virObjectUnref(callback->client) so client's ref can't down to zero. so remoteClientFreeFunc can't be called. then client object memory leak.
So the client/sample code is not calling Deregister because it has no "handling" of an abnormal exit. That is no signal handler to call Deregister in the event some outside force kills the client/sample code while it's processing.
So a different solution would be to add signal handling to the code in order to handle the kill and make the Deregister (and close) calls.
signal handler can handle such as kill -15 signal, But can't handle kill -9 signal. so we also need to resolve it in libvirt
Basically we need to handle bad behaving clients in the daemon anyway. I tried running libvirtd under valgrind and there was no extra leak with a client which didn't deregister the callback. How did you encounter this? I really want to find the place that I think needs to be changed, but without a reproducer this is just a guessing game. event mechanism is a very complicated. I guess it was freed at libvirtd deamon quit at qemuStateCleanup it cleared all event by virObjectUnref(qemu_driver->domainEventState), so client was also cleared at this moment. but I can't gdb at this point(qemuStateCleanup) when libvirtd recieve kill -15 signal.
participants (3)
-
John Ferlan
-
Martin Kletzander
-
xinhua.Cao