Michal,
please check my comments in the following.
>On 07/21/2017 08:20 AM, liu.yunh@zte.com.cn wrote:
>> Hi Michal,
>>
>> This problem is triggerred by libvirt python's example event-test.py. the original examples has resouce leak issue
>>
>> at the remove_handle and remove_timer.
>>
>> with "python -u event-test.py" run this example and "systemctl restart libvirtd.service" will trigger resource leak problem.
>>
>> with lsof -p <event-test.pid> can see socket handler's number increased , after restart libvirtd.serivce each time.
>
>This is interesting. When I try this out, the python script just gets
>disconnected and never connects back. So I don't see any number (FD)
>getting increased.
>
we are evaluating the event driven framework in the event-test.py example. because it's only illurstrate one shot connection, we modify virEventLoopPureStart with a loop
to allow the thread reconnect to libvirtd.service. the original example as you seen, once the libvirtd.service was restart, the example stop running. so you also need do little modification
to allow the thread reconnect to libvirtd.service. the resource leak problem will be seen after the modification.
>>
>> the reason is remove_handle and remove_timer do not return the remove handle information to libvirt-python's framework.
>>
>> little patch was apply to this example, to fix this problem.
>>
>> Now, run this example again and restart libvirtd.service , call sequence virNetSocketRemoveIOCallback->virNetSocketEventFree
>>
>> can be observed , the no-recursive mutex, lock with recursive issue can be seen.
>
>Recursive mutexes are usually sign of bad design. Anyway in this case,
>one lock should be held by thread doing virNetSocketRemoveIOCallback().
>The other (event loop) should be trying to lock the socket lock from
>virNetSocketEventFree(). Since these are two different threads, each one
>of them is modifying the state of the socket we have to have them use
>the lock.
>
>>
>> you can check the detail stack trace and our comments about the lock's issue in function virNetSocketEventFree in the following.
>>
>>
>>
>>
>> ====================================================================
>>
>> def add_timer(self, interval, cb, opaque):
>>
>> timerID = self.nextTimerID + 1
>>
>> self.nextTimerID = self.nextTimerID + 1
>>
>>
>>
>>
>> h = self.virEventLoopPureTimer(timerID, interval, cb, opaque)
>>
>> self.timers.append(h)
>>
>> - self.timers_opaque[timerID] = opaque
>>
>> self.interrupt()
>>
>>
>>
>>
>> debug("Add timer %d interval %d" % (timerID, interval))
>>
>>
>>
>>
>> return timerID
>>
>>
>>
>>
>>
>>
>>
>> def remove_handle(self, handleID):
>>
>> handles = []
>>
>> - opaque = None
>>
>> for h in self.handles:
>>
>> if h.get_id() == handleID:
>>
>> self.poll.unregister(h.get_fd())
>>
>> - opaque = self.opaques[handleID]
>>
>> - del self.opaques[handleID]
>>
>> debug("Remove handle %d fd %d" % (handleID, h.get_fd()))
>>
>> else:
>>
>> handles.append(h)
>>
>> self.handles = handles
>>
>> self.interrupt()
>>
>> - return opaque
>>
>>
>>
>>
>> # Stop firing the periodic timer
>>
>> def remove_timer(self, timerID):
>>
>> timers = []
>>
>> - opaque = None
>>
>> for h in self.timers:
>>
>> if h.get_id() != timerID:
>>
>> timers.append(h)
>>
>> - else:
>>
>> - opaque = self.timers_opaque[timerID]
>>
>> debug("Remove timer %d" % timerID)
>>
>> self.timers = timers
>>
>> self.interrupt()
>>
>> - return opaque
>
>
>I don't see this code anywhere and thus cannot perform the changes
>you've suggested. Sorry. Is this current git HEAD?
>
>>
>> ====================================================================================
>>
>>
>>
>>
>>
>>>> On 07/15/2017 05:00 PM, Peng Hao wrote:
>>
>>>> virNetSocketRemoveIOCallback get sock's ObjectLock and will call
>>
>>>> virNetSocketEventFree. virNetSocketEventFree may be free sock
>>
>>>> object and virNetSocketRemoveIOCallback will access a null pointer
>>
>>>> in release sock's ObjectLock.
>>
>>>>
>>
>>>> Signed-off-by: Liu Yun <liu.yunh@zte.com.cn>
>>
>>>> Signed-off-by: Peng Hao <peng.hao2@zte.com.cn>
>>
>>>> ---
>>
>>>> src/rpc/virnetsocket.c | 6 +++---
>>
>>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>
>>>>
>>
>>>
>>
>>> I don't think this can work.
>>
>>>
>>
>>>> diff --git a/src/rpc/virnetsocket.c b/src/rpc/virnetsocket.c
>>
>>>> index d228c8a..8b550e8 100644
>>
>>>> --- a/src/rpc/virnetsocket.c
>>
>>>> +++ b/src/rpc/virnetsocket.c
>>
>>>> @@ -2140,14 +2140,12 @@ static void virNetSocketEventFree(void *opaque)
>>
>>>> virFreeCallback ff
>>
>>>> void *eopaque
>>
>>>>
>>
>>>> - virObjectLock(sock)
>>
>>>> ff = sock->ff
>>
>>>> eopaque = sock->opaque
>>
>>>> sock->func = NULL
>>
>>>> sock->ff = NULL
>>
>>>> sock->opaque = NULL
>>
>>>> - virObjectUnlock(sock)
>>
>>>
>>
>>> I think we need the lock here. This function is called from the event
>>
>>> loop thread. So even if virNetSocketUpdateIOCallback() locks the @socket
>>
>>> this code can see it unlocked. Or locked. But the crucial part is it's
>>
>>> modifying the object and thus should have lock held.
>>
>>>
>>
>> I have check the code , in default implementation of eventPoll, virEventPollRunOnce always dispatch and clear in one thread loop,
>>
>> so, the lock in the virNetSocketEventFree may be unnessary.
>>
>>
>>
>>
>>>> -
>>
>>>> +
>>
>>>> if (ff)
>>
>>>> ff(eopaque)
>>
>>>>
>>
>>>> @@ -2207,6 +2205,7 @@ void virNetSocketUpdateIOCallback(virNetSocketPtr sock,
>>
>>>>
>>
>>>> void virNetSocketRemoveIOCallback(virNetSocketPtr sock)
>>
>>>> {
>>
>>>> + virObjectRef(sock)
>>
>>
>>
>>
>> This should be mistake when generate the patch. The correct one is
>>
>> + virObjectUnref(sock)
>>
>>>> virObjectLock(sock)
>>
>>>
>>
>>> I think this is what actually fixes your problem. However, I also think
>>
>>> it introduces uneven ratio of ref:unref calls.
>>
>>>
>>
>>>>
>>
>>>> if (sock->watch < 0) {
>>
>>>> @@ -2220,6 +2219,7 @@ void virNetSocketRemoveIOCallback(virNetSocketPtr sock)
>>
>>>> sock->watch = -1
>>
>>>>
>>
>>>> virObjectUnlock(sock)
>>
>>>> + virObjectRef(sock)
>>
>>
>>
>>
>>
>> This should be mistake when generate the patch. The correct one is
>>
>>
>> + virObjectUnref(sock)
>>
>>>
>>
>>> It definitely does so because you ref twice. Anyway, do you perhaps have
>>
>>> a backtrace to share?
>>
>
>Can you provide the output of 't a a bt'? I wonder if this is the only
>thread (and thus something left socket locked) or we have some deadlock
>here.
following is the all threads bactrace information. it show only one thread to accquire the lock.
(gdb) info threads
Id Target Id Frame
2 Thread 0x7ff3c6362740 (LWP 20078) sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
* 1 Thread 0x7ff3b48d4700 (LWP 20081) __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
(gdb) thread 2
[Switching to thread 2 (Thread 0x7ff3c6362740 (LWP 20078))]
#0 sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
85 movq %rax, %rcx
(gdb) bt
#0 sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
#1 0x00007ff3c5eb17b5 in PyThread_acquire_lock () from /lib64/libpython2.7.so.1.0
#2 0x00007ff3c5e808a6 in PyEval_RestoreThread () from /lib64/libpython2.7.so.1.0
#3 0x00007ff3b50fe086 in time_sleep () from /usr/lib64/python2.7/lib-dynload/timemodule.so
#4 0x00007ff3c5e85aa4 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#5 0x00007ff3c5e870bd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#6 0x00007ff3c5e8576f in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#7 0x00007ff3c5e870bd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#8 0x00007ff3c5e871c2 in PyEval_EvalCode () from /lib64/libpython2.7.so.1.0
#9 0x00007ff3c5ea05ff in run_mod () from /lib64/libpython2.7.so.1.0
#10 0x00007ff3c5ea17be in PyRun_FileExFlags () from /lib64/libpython2.7.so.1.0
#11 0x00007ff3c5ea2a49 in PyRun_SimpleFileExFlags () from /lib64/libpython2.7.so.1.0
#12 0x00007ff3c5eb3b9f in Py_Main () from /lib64/libpython2.7.so.1.0
#13 0x00007ff3c50e0b15 in __libc_start_main (main=0x4006f0 <main>, argc=3, ubp_av=0x7ffca58ef528, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffca58ef518)
at libc-start.c:274
#14 0x0000000000400721 in _start ()
(gdb) thread 1
[Switching to thread 1 (Thread 0x7ff3b48d4700 (LWP 20081))]
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135 2: movl %edx, %eax
(gdb) bt
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007ff3c5b92d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2 0x00007ff3c5b92c08 in __GI___pthread_mutex_lock (mutex=mutex@entry=0xa6b9c0) at pthread_mutex_lock.c:64
#3 0x00007ff3be351e15 in virMutexLock (m=m@entry=0xa6b9c0) at util/virthread.c:89
#4 0x00007ff3be3338ae in virObjectLock (anyobj=anyobj@entry=0xa6b9b0) at util/virobject.c:323
#5 0x00007ff3be47a52c in virNetSocketEventFree (opaque=0xa6b9b0) at rpc/virnetsocket.c:2134
#6 0x00007ff3be82af87 in libvirt_virEventRemoveHandleFunc (watch=<optimized out>) at libvirt-override.c:5496
#7 0x00007ff3be47dc69 in virNetSocketRemoveIOCallback (sock=0xa6b9b0) at rpc/virnetsocket.c:2212
#8 0x00007ff3be469d76 in virNetClientMarkClose (client=0xa6bcb0, reason=0) at rpc/virnetclient.c:779
#9 0x00007ff3be46a0eb in virNetClientIncomingEvent (sock=0xa6b9b0, events=9, opaque=0xa6bcb0) at rpc/virnetclient.c:1985
#10 0x00007ff3be81e347 in libvirt_virEventInvokeHandleCallback (self=<optimized out>, args=<optimized out>) at libvirt-override.c:5718
#11 0x00007ff3c5e85aa4 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#12 0x00007ff3c5e870bd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#13 0x00007ff3c5e8576f in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#14 0x00007ff3c5e85860 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#15 0x00007ff3c5e85860 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#16 0x00007ff3c5e85860 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#17 0x00007ff3c5e870bd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#18 0x00007ff3c5e1405d in function_call () from /lib64/libpython2.7.so.1.0
#19 0x00007ff3c5def0b3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#20 0x00007ff3c5e822f7 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#21 0x00007ff3c5e85860 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#22 0x00007ff3c5e85860 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#23 0x00007ff3c5e870bd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#24 0x00007ff3c5e13f68 in function_call () from /lib64/libpython2.7.so.1.0
#25 0x00007ff3c5def0b3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#26 0x00007ff3c5dfe0a5 in instancemethod_call () from /lib64/libpython2.7.so.1.0
#27 0x00007ff3c5def0b3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#28 0x00007ff3c5e80f07 in PyEval_CallObjectWithKeywords () from /lib64/libpython2.7.so.1.0
#29 0x00007ff3c5eb5842 in t_bootstrap () from /lib64/libpython2.7.so.1.0
#30 0x00007ff3c5b90dc5 in start_thread (arg=0x7ff3b48d4700) at pthread_create.c:308
#31 0x00007ff3c51b521d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb)
>
>>
>>
>> #0 __lll_lock_wait ()
>>
>> at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>>
>> #1 0x00007fde6207cd02 in _L_lock_791 () from /lib64/libpthread.so.0
>>
>> #2 0x00007fde6207cc08 in __GI___pthread_mutex_lock (
>>
>> mutex=mutex@entry=0x119c3e0) at pthread_mutex_lock.c:64
>>
>> #3 0x00007fde5a97ee15 in virMutexLock (m=m@entry=0x119c3e0)
>>
>> at util/virthread.c:89
>>
>> #4 0x00007fde5a9608ae in virObjectLock (anyobj=anyobj@entry=0x119c3d0)
>>
>> at util/virobject.c:323
>>
>> #5 0x00007fde5aaa752c in virNetSocketEventFree (opaque=0x119c3d0)
>>
>> at rpc/virnetsocket.c:2134
>>
>> #6 0x00007fde5ae57f87 in libvirt_virEventRemoveHandleFunc (
>>
>> watch=<optimized out>) at libvirt-override.c:5496
>>
>> #7 0x00007fde5aaaac69 in virNetSocketRemoveIOCallback (sock=0x119c3d0)
>>
>> at rpc/virnetsocket.c:2212
>>
>> #8 0x00007fde5aa96d76 in virNetClientMarkClose (client=0x119c650, reason=0)
>>
>> at rpc/virnetclient.c:779
>>
>> #9 0x00007fde5aa970eb in virNetClientIncomingEvent (sock=0x119c3d0, events=9,
>>
>> opaque=0x119c650) at rpc/virnetclient.c:1985
>>
>> #10 0x00007fde5ae4b347 in libvirt_virEventInvokeHandleCallback (
>>
>> self=<optimized out>, args=<optimized out>) at libvirt-override.c:5718
>>
>> #11 0x00007fde6236faa4 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
>>
>> #12 0x00007fde623710bd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
>>
>> ---Type <return> to continue, or q <return> to quit---
>>
>> #13 0x00007fde6236f76f in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
>>
>> #14 0x00007fde6236f860 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
>>
>> #15 0x00007fde6236f860 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
>>
>> #16 0x00007fde6236f860 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
>>
>> #17 0x00007fde623710bd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
>>
>> #18 0x00007fde622fe05d in function_call () from /lib64/libpython2.7.so.1.0
>>
>> #19 0x00007fde622d90b3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
>>
>> #20 0x00007fde6236c2f7 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
>>
>> #21 0x00007fde6236f860 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
>>
>> #22 0x00007fde6236f860 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
>>
>> #23 0x00007fde623710bd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
>>
>> #24 0x00007fde622fdf68 in function_call () from /lib64/libpython2.7.so.1.0
>>
>> #25 0x00007fde622d90b3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
>>
>> #26 0x00007fde622e80a5 in instancemethod_call ()
>>
>>
>
>Michal