在 2017/11/12 22:45, John Ferlan 写道:
On 11/11/2017 03:20 AM, xinhua.Cao wrote:
>
> 在 2017/11/11 5:43, John Ferlan 写道:
>> On 11/06/2017 12:47 AM, xinhua.Cao wrote:
>>> base on commit fe8f1c8b8650c8ab5e7d27338f4538582651bd14, we solve
>>> libvirt coredump problem, but it introduce a memory leak sense:
>>>
>>> 1. one client register a domain event such as reboot event
>>> 2. and client was terminated unexpectly, such as kill -9,
>> unexpectedly
>>
>>> then this client object will not free at libvirtd service program.
>>>
>>> remoteDispatchConnectDomainEventCallbackRegisterAny reference the
>>> client,
>>> but when client was terminated before it call deRegisterAny,the
>>> reference
>>> of client will not reduced to zero. so the memory leak take place.
>> What's the difference between the path that would terminate normally and
>> the one that terminates abnormally. It seems to me there might be (an)
>> extra Ref(s) on the client which prevents virNetServerClientDispose from
>> calling it's @privateDataFreeFunc which would call remoteClientFreeFunc
>> since remoteClientInitHook worked properly.
>>
>> That's what needs to be found.
> If client terminate normally, client would call
> virConnectDomainEventDeregisterAny to
> clear eventCallbacks. Then libvirt daemon will call
> remoteDispatchConnectDomainEventDeregisterAny
> -> remoteEventCallbackFree -> virObjectUnref(callback->client) to free
> eventCallbacks and unRef client.
> just think if client don't call virConnectDomainEventDeregisterAny and
> be killed. then there is no way to call call
> remoteDispatchConnectDomainEventDeregisterAny -> remoteEventCallbackFree
> -> virObjectUnref(callback->client)
> so client's ref can't down to zero. so remoteClientFreeFunc can't be
> called. then client object memory leak.
So the client/sample code is not calling Deregister because it has no
"handling" of an abnormal exit. That is no signal handler to call
Deregister in the event some outside force kills the client/sample code
while it's processing.
So a different solution would be to add signal handling to the code in
order to handle the kill and make the Deregister (and close) calls.
signal handler
can handle such as kill -15 signal, But can't handle kill
-9 signal.
so we also need to resolve it in libvirt
> The difference is call
remoteDispatchConnectDomainEventDeregisterAny or
> don't call remoteDispatchConnectDomainEventDeregisterAny
> to Unref(callback->client).
>
IIUC, the whole reason the Deregister code is in the Free Func is to
"handle" the case where that last client reference was Unref'd, but
someone didn't perform the actual Deregister. The logic predates my
libvirt involvement...
Still because of commit id 'fe8f1c8b' where we generate a REF for the
Register and that's transparent to the consumer (e.g. how would they
know they need to ensure that Deregister is called), thus the purpose of
this patch is to find a way to Deregister if it's determined that the
consumer hasn't by the time of the "last" REF we'd have [trying to
write
a commit message here too].
This solution to this problem is to alter the processing to have the
remoteClientCloseFunc handle performing the Deregister calls instead of
the remoteClientFreeFunc because there's no way FreeFunc would be called
unless the Deregister was already called. In fact, if you think about it
if the CloseFunc does the Deregister why would the FreeFunc also include
that call? IOW: It should only need to be in one or the other.
I know you've already posted patch v3; however, I also think you need to
read commit id '8294aa0c'. If you're moving the Deregister handling to
the close func, then I would think that the "reason" for that commit
still applies.
When you generate a v4, my suggestion is to generate 2 patches.
1. Just the remoteClientFreePrivateCallbacks separation including the
sysident handling. From my read of the code, the virConnectClose doesn't
have the same issue.
I am unfamiliarity with sysident. then research on it. It
used for
polkit check
but I still can't confirm it.
2. Move the call to remoteClientFreePrivateCallbacks from FreeFunc
to
CloseFunc with a commit message that explains why this is necessary
since commit id 'fe8f1c8b' (BTW: No need to supply the complete hash, 8
significant digits are usually plenty).
this two advices are very useful. thanks
very much.
I supplied a bit more details w/r/t what I found for REF/UNREF - I
assume it essentially matches what you've found. But having it here
certainly helps - so thanks for providing it along with the example.
> here is my test code
>
> #include<stdio.h>
> #include<pthread.h>
> #include<libvirt/libvirt.h>
>
> int _DomainStatusCallback(virConnectPtr conn, virDomainPtr dom, int event,
> int detail, void *opaque);
> void timeOutCallback(int timer, void *opaque);
>
> int run = 1;
>
> int _DomainStatusCallback(virConnectPtr conn, virDomainPtr dom, int event,
> int detail, void *opaque)
> {
> char uuid[128] = { 0 };
> (void) virDomainGetUUIDString(dom, uuid);
> printf("domain %s come a event %d\n", uuid, event);
> }
>
> void _timeOutCallback(int timer, void *opaque)
> {
> printf("time out is comming\n");
> }
>
> void *event_thread(void *opaque)
> {
> int i =0;
> if (virEventRegisterDefaultImpl() < 0)
> {
> return NULL;
> }
>
> virConnectPtr conn = virConnectOpen("qemu:///system");
> if (conn == NULL)
> {
> return NULL;
> }
>
> int callback = virConnectDomainEventRegisterAny(conn, NULL,
> VIR_DOMAIN_EVENT_ID_LIFECYCLE,
> _DomainStatusCallback, NULL, NULL);
> int timeout = virEventAddTimeout(1000,_timeOutCallback,NULL,NULL);
>
> while (run && virConnectIsAlive(conn) == 1)
> {
> virEventRunDefaultImpl();
> }
>
> sleep(5);
> virConnectDomainEventDeregisterAny(conn, callback);
> virConnectClose(conn);
> }
>
>
> int main(int argc, char *argv[])
> {
> pthread_t pid;
> int ret = 0;
> ret = pthread_create(&pid, NULL, event_thread, NULL);
> sleep(100);
> run = 0;
> sleep(10);
> }
>
> when I don't call remoteClientFreePrivateCallbacks at
> remoteClientCloseFunc,
> here is client refs change when I kill the client.
> (gdb) c
> Continuing.
> Hardware watchpoint 6: *(int *) 0x5637edc888f4
>
> Old value = 4
> New value = 5
> virObjectRef (anyobj=anyobj@entry=0x5637edc888f0) at util/virobject.c:296
> 296 PROBE(OBJECT_REF, "obj=%p", obj);
> (gdb) bt
> #0 virObjectRef (anyobj=anyobj@entry=0x5637edc888f0) at
> util/virobject.c:296
> #1 0x00005637ec61d643 in
> remoteDispatchConnectDomainEventCallbackRegisterAny (server=<optimized
> out>, msg=<optimized out>,
> ret=0x7f054c0008e0, args=0x7f054c0008c0, rerr=0x7f057ba6cc90,
> client=0x5637edc888f0) at remote.c:4418
Although the line numbers are different with top of tree, this is the
RegisterAny REF on the client...
[...]
So that says to me we generally run with 3-4 REFs (without any Register
Event occurring). AFAICT, the following is the list:
1. Adding to the srv->clients (virNetServerAddClient)
2. Adding IO Callback (virNetServerClientRegisterEvent)
3. Adding Keep Alive (virNetServerClientInitKeepAlive and
virKeepAliveStart)
The 4th is a bit trickier, but I believe it's REF'd during the call to
virNetServerClientDispatchRead...
> Old value = 5
> New value = 4
> virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259
> 259 PROBE(OBJECT_UNREF, "obj=%p", obj);
> Continuing.
> [Switching to Thread 0x7f058eb1f840 (LWP 3878)]
> Hardware watchpoint 9: *(int *) 0x5637edc88634
I'm assuming this comes from the opposite end of the DispatchRead -
there's no trace here to prove it, but that's not necessarily surprising
either as it's not easy to find where the Unref would naturally occur
without an error path occurring....
> Old value = 4
> New value = 5
> virObjectRef (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:296
> 296 PROBE(OBJECT_REF, "obj=%p", obj);
> #0 virObjectRef (anyobj=anyobj@entry=0x5637edc88630) at
> util/virobject.c:296
> #1 0x00005637ec65299d in virNetServerClientClose
> (client=0x5637edc88630) at rpc/virnetserverclient.c:978
[...]
> Old value = 5
> New value = 4
> virObjectUnref (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:259
> 259 PROBE(OBJECT_UNREF, "obj=%p", obj);
> #0 virObjectUnref (anyobj=anyobj@entry=0x5637edc88630) at
> util/virobject.c:259
> #1 0x00005637ec6529bd in virNetServerClientClose
> (client=0x5637edc88630) at rpc/virnetserverclient.c:982
This pair of REF/UNREF in virNetServerClientClose when client->keepalive
is set.
[...]
> Old value = 4
> New value = 5
> virObjectRef (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:296
> 296 PROBE(OBJECT_REF, "obj=%p", obj);
> #0 virObjectRef (anyobj=anyobj@entry=0x5637edc88630) at
> util/virobject.c:296
> #1 0x00005637ec6529d1 in virNetServerClientClose
> (client=0x5637edc88630) at rpc/virnetserverclient.c:987
[...]
> Old value = 5
> New value = 4
> virObjectUnref (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:259
> 259 PROBE(OBJECT_UNREF, "obj=%p", obj);
> #0 virObjectUnref (anyobj=anyobj@entry=0x5637edc88630) at
> util/virobject.c:259
> #1 0x00005637ec6529ee in virNetServerClientClose
> (client=0x5637edc88630) at rpc/virnetserverclient.c:991
This pair of REF/UNREF in virNetServerClientClose when
client->privateDataCloseFunc is set.
[...]
> Old value = 4
> New value = 3
> virObjectUnref (anyobj=anyobj@entry=0x5637edc88630) at util/virobject.c:259
> 259 PROBE(OBJECT_UNREF, "obj=%p", obj);
> #0 virObjectUnref (anyobj=anyobj@entry=0x5637edc88630) at
> util/virobject.c:259
> #1 0x00005637ec650705 in virNetServerProcessClients
> (srv=0x5637edc6bf10) at rpc/virnetserver.c:873
Again my line numbers are off a bit, but this would be the for the UNREF
after VIR_DELETE_ELEMENT when removed from srv->clients
[...]
> Old value = 3
> New value = 2
> virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259
> 259 PROBE(OBJECT_UNREF, "obj=%p", obj);
> #0 virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259
> #1 0x00007f058de4199b in virObjectUnref (anyobj=<optimized out>) at
> util/virobject.c:265
> #2 0x00007f058de0db34 in virEventPollCleanupTimeouts () at
> util/vireventpoll.c:543
[...]
This one is the opposite end of the virObjectRef done for the keep alive
timer. See virNetServerClientInitKeepAlive and virKeepAliveStart usage
of virEventAddTimeout.
> Old value = 2
> New value = 1
> virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259
> 259 PROBE(OBJECT_UNREF, "obj=%p", obj);
> #0 virObjectUnref (anyobj=0x5637edc88630) at util/virobject.c:259
> #1 0x00005637ec656bee in virNetSocketEventFree
> (opaque=opaque@entry=0x5637edc8a5b0) at rpc/virnetsocket.c:2152
> #2 0x00007f058de0dd34 in virEventPollCleanupHandles () at
> util/vireventpoll.c:592
[...]
This one is the opposite end of the virObjectRef done in
virNetServerClientRegisterEvent prior to virNetSocketAddIOCallback which
calls virEventAddHandle.
Theoretically the "last" REF...
John
> Continuing.
>
> so the client refs alway have 1 when I register one event callbacks.
>>> this patch will deRegister all event callbacks when client was close.
>>> ---
>> FWIW: This commit message was a bit difficult to follow - I know it's
>> primarily a language barrier thing - so maybe if you just construct some
>> function code flow for both initialization/open and destruction/close
>> it'd help... Paths that work and the path that you think is broken...
>> That can be done in this section after the "---" and before the
changed
>> API's.
>>
>> I assume you've done a lot of research, but the details of that research
>> can be difficult to just "pick up on". Typically someone will provide
>> some sort of valgrind output as an example.
>>
>> I just have a sense that this could boil down to a path (or two) that
>> are doing the virObjectRef without the expected virObjectUnref occurring
>> - something perhaps to do with how @wantClose is handled in the abnormal
>> termination case. Nothing jumps out at me though even after looking at
>> the code for a while.
>>
>>
>>> daemon/remote.c | 46 ++++++++++++++++++++++++++++------------------
>>> 1 file changed, 28 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/daemon/remote.c b/daemon/remote.c
>>> index cbcb6e8..466b2e7 100644
>>> --- a/daemon/remote.c
>>> +++ b/daemon/remote.c
>>> @@ -1689,23 +1689,10 @@ void
>>> remoteRelayConnectionClosedEvent(virConnectPtr conn ATTRIBUTE_UNUSED,
>>> int r
>>> neventCallbacks = 0; \
>>> } while (0);
>>> -/*
>>> - * You must hold lock for at least the client
>>> - * We don't free stuff here, merely disconnect the client's
>>> - * network socket & resources.
>>> - * We keep the libvirt connection open until any async
>>> - * jobs have finished, then clean it up elsewhere
>>> - */
>>> -void remoteClientFreeFunc(void *data)
>>> +static void
>>> +remoteClientFreePrivateCallbacks(struct daemonClientPrivate *priv)
>>> {
>>> - struct daemonClientPrivate *priv = data;
>>> -
>>> - /* Deregister event delivery callback */
>>> - if (priv->conn) {
>>> - virIdentityPtr sysident = virIdentityGetSystem();
>>> -
>>> - virIdentitySetCurrent(sysident);
>>> -
>>> + if (priv && priv->conn) {
>> Well each of the callers has already dereferenced @priv so it's a bit
>> too late to check it now!
> yes, I also think this check is too late. this placement can only check
> priv->conn.
>>> DEREG_CB(priv->conn, priv->domainEventCallbacks,
>>> priv->ndomainEventCallbacks,
>>> virConnectDomainEventDeregisterAny, "domain");
>>> @@ -1730,16 +1717,38 @@ void remoteClientFreeFunc(void *data)
>>>
>>> remoteRelayConnectionClosedEvent) < 0)
>>> VIR_WARN("unexpected close callback event
>>> deregister failure");
>>> }
>>> + }
>>> +}
>>> +#undef DEREG_CB
>>> +
>> There should be another blank line here.
> OK, I lost a blank line.
>>> +/*
>>> + * You must hold lock for at least the client
>>> + * We don't free stuff here, merely disconnect the client's
>>> + * network socket & resources.
>>> + * We keep the libvirt connection open until any async
>>> + * jobs have finished, then clean it up elsewhere
>>> + */
>>> +void remoteClientFreeFunc(void *data)
>>> +{
>>> + struct daemonClientPrivate *priv = data;
>>> +
>>> + /* Deregister event delivery callback */
>>> + if (priv->conn) {
>>> + virIdentityPtr sysident = virIdentityGetSystem();
>>> +
>>> + virIdentitySetCurrent(sysident);
>>> +
>>> + remoteClientFreePrivateCallbacks(priv);
>>> virConnectClose(priv->conn);
>>> virIdentitySetCurrent(NULL);
>>> virObjectUnref(sysident);
>>> - }
>>> + }
>>> VIR_FREE(priv);
>>> +
>>> }
>>> -#undef DEREG_CB
>>> static void remoteClientCloseFunc(virNetServerClientPtr client)
>>> @@ -1747,6 +1756,7 @@ static void
>>> remoteClientCloseFunc(virNetServerClientPtr client)
>>> struct daemonClientPrivate *priv =
>>> virNetServerClientGetPrivateData(client);
>>> daemonRemoveAllClientStreams(priv->streams);
>>> + remoteClientFreePrivateCallbacks(priv);
>> Perhaps this just gets a "if (priv->conn)" before calling...
>>
>> Still if as I assume the missing Unref is found (someday), then
>> theoretically remoteClientFreeFunc would be called - of course the
>> nEventCallbacks = 0 will ensure the Dereg's dont' get called again. So
>> it shoudn't be a problem
> In my opinion, it is safe to check point is Null. But check everywhere
> is stupid.
> so it is hard to determine where need check.:-)
>> John
>>
>>> }
>>>
>> .
>>
>
.