On 07/21/2017 08:20 AM, liu.yunh(a)zte.com.cn wrote:
Hi Michal,
This problem is triggerred by libvirt python's example event-test.py. the
original examples has resouce leak issue
at the remove_handle and remove_timer.
with "python -u event-test.py" run this example and "systemctl restart
libvirtd.service" will trigger resource leak problem.
with lsof -p <event-test.pid> can see socket handler's number increased , after
restart libvirtd.serivce each time.
This is interesting. When I try this out, the python script just gets
disconnected and never connects back. So I don't see any number (FD)
getting increased.
the reason is remove_handle and remove_timer do not return the remove handle
information to libvirt-python's framework.
little patch was apply to this example, to fix this problem.
Now, run this example again and restart libvirtd.service , call sequence
virNetSocketRemoveIOCallback->virNetSocketEventFree
can be observed , the no-recursive mutex, lock with recursive issue can be seen.
Recursive mutexes are usually sign of bad design. Anyway in this case,
one lock should be held by thread doing virNetSocketRemoveIOCallback().
The other (event loop) should be trying to lock the socket lock from
virNetSocketEventFree(). Since these are two different threads, each one
of them is modifying the state of the socket we have to have them use
the lock.
you can check the detail stack trace and our comments about the lock's issue in
function virNetSocketEventFree in the following.
====================================================================
def add_timer(self, interval, cb, opaque):
timerID = self.nextTimerID + 1
self.nextTimerID = self.nextTimerID + 1
h = self.virEventLoopPureTimer(timerID, interval, cb, opaque)
self.timers.append(h)
- self.timers_opaque[timerID] = opaque
self.interrupt()
debug("Add timer %d interval %d" % (timerID, interval))
return timerID
def remove_handle(self, handleID):
handles = []
- opaque = None
for h in self.handles:
if h.get_id() == handleID:
self.poll.unregister(h.get_fd())
- opaque = self.opaques[handleID]
- del self.opaques[handleID]
debug("Remove handle %d fd %d" % (handleID, h.get_fd()))
else:
handles.append(h)
self.handles = handles
self.interrupt()
- return opaque
# Stop firing the periodic timer
def remove_timer(self, timerID):
timers = []
- opaque = None
for h in self.timers:
if h.get_id() != timerID:
timers.append(h)
- else:
- opaque = self.timers_opaque[timerID]
debug("Remove timer %d" % timerID)
self.timers = timers
self.interrupt()
- return opaque
I don't see this code anywhere and thus cannot perform the changes
you've suggested. Sorry. Is this current git HEAD?
====================================================================================
>> On 07/15/2017 05:00 PM, Peng Hao wrote:
>> virNetSocketRemoveIOCallback get sock's ObjectLock and will call
>> virNetSocketEventFree. virNetSocketEventFree may be free sock
>> object and virNetSocketRemoveIOCallback will access a null pointer
>> in release sock's ObjectLock.
>>
>> Signed-off-by: Liu Yun <liu.yunh(a)zte.com.cn>
>> Signed-off-by: Peng Hao <peng.hao2(a)zte.com.cn>
>> ---
>> src/rpc/virnetsocket.c | 6 +++---
>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>
>
> I don't think this can work.
>
>> diff --git a/src/rpc/virnetsocket.c b/src/rpc/virnetsocket.c
>> index d228c8a..8b550e8 100644
>> --- a/src/rpc/virnetsocket.c
>> +++ b/src/rpc/virnetsocket.c
>> @@ -2140,14 +2140,12 @@ static void virNetSocketEventFree(void *opaque)
>> virFreeCallback ff
>> void *eopaque
>>
>> - virObjectLock(sock)
>> ff = sock->ff
>> eopaque = sock->opaque
>> sock->func = NULL
>> sock->ff = NULL
>> sock->opaque = NULL
>> - virObjectUnlock(sock)
>
> I think we need the lock here. This function is called from the event
> loop thread. So even if virNetSocketUpdateIOCallback() locks the @socket
> this code can see it unlocked. Or locked. But the crucial part is it's
> modifying the object and thus should have lock held.
>
I have check the code , in default implementation of eventPoll, virEventPollRunOnce
always dispatch and clear in one thread loop,
so, the lock in the virNetSocketEventFree may be unnessary.
>> -
>> +
>> if (ff)
>> ff(eopaque)
>>
>> @@ -2207,6 +2205,7 @@ void virNetSocketUpdateIOCallback(virNetSocketPtr sock,
>>
>> void virNetSocketRemoveIOCallback(virNetSocketPtr sock)
>> {
>> + virObjectRef(sock)
This should be mistake when generate the patch. The correct one is
+ virObjectUnref(sock)
>> virObjectLock(sock)
>
> I think this is what actually fixes your problem. However, I also think
> it introduces uneven ratio of ref:unref calls.
>
>>
>> if (sock->watch < 0) {
>> @@ -2220,6 +2219,7 @@ void virNetSocketRemoveIOCallback(virNetSocketPtr sock)
>> sock->watch = -1
>>
>> virObjectUnlock(sock)
>> + virObjectRef(sock)
This should be mistake when generate the patch. The correct one is
+ virObjectUnref(sock)
>
> It definitely does so because you ref twice. Anyway, do you perhaps have
> a backtrace to share?
Can you provide the output of 't a a bt'? I wonder if this is the only
thread (and thus something left socket locked) or we have some deadlock
here.
#0 __lll_lock_wait ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007fde6207cd02 in _L_lock_791 () from /lib64/libpthread.so.0
#2 0x00007fde6207cc08 in __GI___pthread_mutex_lock (
mutex=mutex@entry=0x119c3e0) at pthread_mutex_lock.c:64
#3 0x00007fde5a97ee15 in virMutexLock (m=m@entry=0x119c3e0)
at util/virthread.c:89
#4 0x00007fde5a9608ae in virObjectLock (anyobj=anyobj@entry=0x119c3d0)
at util/virobject.c:323
#5 0x00007fde5aaa752c in virNetSocketEventFree (opaque=0x119c3d0)
at rpc/virnetsocket.c:2134
#6 0x00007fde5ae57f87 in libvirt_virEventRemoveHandleFunc (
watch=<optimized out>) at libvirt-override.c:5496
#7 0x00007fde5aaaac69 in virNetSocketRemoveIOCallback (sock=0x119c3d0)
at rpc/virnetsocket.c:2212
#8 0x00007fde5aa96d76 in virNetClientMarkClose (client=0x119c650, reason=0)
at rpc/virnetclient.c:779
#9 0x00007fde5aa970eb in virNetClientIncomingEvent (sock=0x119c3d0, events=9,
opaque=0x119c650) at rpc/virnetclient.c:1985
#10 0x00007fde5ae4b347 in libvirt_virEventInvokeHandleCallback (
self=<optimized out>, args=<optimized out>) at libvirt-override.c:5718
#11 0x00007fde6236faa4 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#12 0x00007fde623710bd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
---Type <return> to continue, or q <return> to quit---
#13 0x00007fde6236f76f in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#14 0x00007fde6236f860 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#15 0x00007fde6236f860 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#16 0x00007fde6236f860 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#17 0x00007fde623710bd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#18 0x00007fde622fe05d in function_call () from /lib64/libpython2.7.so.1.0
#19 0x00007fde622d90b3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#20 0x00007fde6236c2f7 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#21 0x00007fde6236f860 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#22 0x00007fde6236f860 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#23 0x00007fde623710bd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#24 0x00007fde622fdf68 in function_call () from /lib64/libpython2.7.so.1.0
#25 0x00007fde622d90b3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#26 0x00007fde622e80a5 in instancemethod_call ()
Michal