Hi Michal,

    This problem is triggerred by libvirt python's example event-test.py. the original examples has resouce leak issue

at the remove_handle and remove_timer. 

    with "python -u event-test.py" run this example and "systemctl restart libvirtd.service" will trigger resource leak problem.

with lsof -p <event-test.pid> can see socket handler's number increased , after restart libvirtd.serivce each time.

    the reason is remove_handle and remove_timer do not return the remove handle information to libvirt-python's framework. 

little patch was apply to this example, to fix this problem.

   Now, run this example again and restart libvirtd.service , call sequence virNetSocketRemoveIOCallback->virNetSocketEventFree 

can be observed , the no-recursive mutex, lock with recursive issue can be seen. 

    you can check the detail stack trace and our comments about the lock's issue in function virNetSocketEventFree  in the following.


  ====================================================================  

 def add_timer(self, interval, cb, opaque):

        timerID = self.nextTimerID + 1

        self.nextTimerID = self.nextTimerID + 1


        h = self.virEventLoopPureTimer(timerID, interval, cb, opaque)

        self.timers.append(h)

-       self.timers_opaque[timerID] = opaque

        self.interrupt()


        debug("Add timer %d interval %d" % (timerID, interval))


        return timerID



     def remove_handle(self, handleID):

         handles = []

-        opaque = None

         for h in self.handles:

             if h.get_id() == handleID:

                 self.poll.unregister(h.get_fd())

-                opaque = self.opaques[handleID]

-                del self.opaques[handleID]

                 debug("Remove handle %d fd %d" % (handleID, h.get_fd()))

             else:

                 handles.append(h)

         self.handles = handles

         self.interrupt()

-        return opaque


     # Stop firing the periodic timer

     def remove_timer(self, timerID):

         timers = []

-        opaque = None

         for h in self.timers:

             if h.get_id() != timerID:

                 timers.append(h)

-            else:

-                opaque = self.timers_opaque[timerID]

                 debug("Remove timer %d" % timerID)

         self.timers = timers

         self.interrupt()

-        return opaque

====================================================================================


>>On 07/15/2017 05:00 PM, Peng Hao wrote:

>> virNetSocketRemoveIOCallback get sock's ObjectLock and will call

>> virNetSocketEventFree. virNetSocketEventFree may be free sock

>> object and virNetSocketRemoveIOCallback will access a null pointer

>> in release sock's ObjectLock.

>> 

>> Signed-off-by: Liu Yun <liu.yunh@zte.com.cn>

>> Signed-off-by: Peng Hao <peng.hao2@zte.com.cn>

>> ---

>>  src/rpc/virnetsocket.c | 6 +++---

>>  1 file changed, 3 insertions(+), 3 deletions(-)

>> 

>

>I don't think this can work.

>

>> diff --git a/src/rpc/virnetsocket.c b/src/rpc/virnetsocket.c

>> index d228c8a..8b550e8 100644

>> --- a/src/rpc/virnetsocket.c

>> +++ b/src/rpc/virnetsocket.c

>> @@ -2140,14 +2140,12 @@ static void virNetSocketEventFree(void *opaque)

>>      virFreeCallback ff;

>>      void *eopaque;

>>  

>> -    virObjectLock(sock);

>>      ff = sock->ff;

>>      eopaque = sock->opaque;

>>      sock->func = NULL;

>>      sock->ff = NULL;

>>      sock->opaque = NULL;

>> -    virObjectUnlock(sock);

>

>I think we need the lock here. This function is called from the event

>loop thread. So even if virNetSocketUpdateIOCallback() locks the @socket

>this code can see it unlocked. Or locked. But the crucial part is it's

>modifying the object and thus should have lock held.

>

   I have check the code , in default implementation of eventPoll, virEventPollRunOnce always dispatch and clear in one thread loop,

so, the lock in the virNetSocketEventFree may be unnessary.


>> -

>> +  

>>      if (ff)

>>          ff(eopaque);

>>  

>> @@ -2207,6 +2205,7 @@ void virNetSocketUpdateIOCallback(virNetSocketPtr sock,

>>  

>>  void virNetSocketRemoveIOCallback(virNetSocketPtr sock)

>>  {

>> +    virObjectRef(sock);


This should be mistake when generate the patch. The correct one is 

     +    virObjectUnref(sock);

>>      virObjectLock(sock);

>

>I think this is what actually fixes your problem. However, I also think

>it introduces uneven ratio of ref:unref calls.

>

>>  

>>      if (sock->watch < 0) {

>> @@ -2220,6 +2219,7 @@ void virNetSocketRemoveIOCallback(virNetSocketPtr sock)

>>      sock->watch = -1;

>>  

>>      virObjectUnlock(sock);

>> +    virObjectRef(sock);


This should be mistake when generate the patch. The correct one is 

     +    virObjectUnref(sock);

>

>It definitely does so because you ref twice. Anyway, do you perhaps have

>a backtrace to share?

    

#0  __lll_lock_wait ()

    at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135

#1  0x00007fde6207cd02 in _L_lock_791 () from /lib64/libpthread.so.0

#2  0x00007fde6207cc08 in __GI___pthread_mutex_lock (

    mutex=mutex@entry=0x119c3e0) at pthread_mutex_lock.c:64

#3  0x00007fde5a97ee15 in virMutexLock (m=m@entry=0x119c3e0)

    at util/virthread.c:89

#4  0x00007fde5a9608ae in virObjectLock (anyobj=anyobj@entry=0x119c3d0)

    at util/virobject.c:323

#5  0x00007fde5aaa752c in virNetSocketEventFree (opaque=0x119c3d0)

    at rpc/virnetsocket.c:2134

#6  0x00007fde5ae57f87 in libvirt_virEventRemoveHandleFunc (

    watch=<optimized out>) at libvirt-override.c:5496

#7  0x00007fde5aaaac69 in virNetSocketRemoveIOCallback (sock=0x119c3d0)

    at rpc/virnetsocket.c:2212

#8  0x00007fde5aa96d76 in virNetClientMarkClose (client=0x119c650, reason=0)

    at rpc/virnetclient.c:779

#9  0x00007fde5aa970eb in virNetClientIncomingEvent (sock=0x119c3d0, events=9,

    opaque=0x119c650) at rpc/virnetclient.c:1985

#10 0x00007fde5ae4b347 in libvirt_virEventInvokeHandleCallback (

    self=<optimized out>, args=<optimized out>) at libvirt-override.c:5718

#11 0x00007fde6236faa4 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0

#12 0x00007fde623710bd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0

---Type <return> to continue, or q <return> to quit---

#13 0x00007fde6236f76f in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0

#14 0x00007fde6236f860 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0

#15 0x00007fde6236f860 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0

#16 0x00007fde6236f860 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0

#17 0x00007fde623710bd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0

#18 0x00007fde622fe05d in function_call () from /lib64/libpython2.7.so.1.0

#19 0x00007fde622d90b3 in PyObject_Call () from /lib64/libpython2.7.so.1.0

#20 0x00007fde6236c2f7 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0

#21 0x00007fde6236f860 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0

#22 0x00007fde6236f860 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0

#23 0x00007fde623710bd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0

#24 0x00007fde622fdf68 in function_call () from /lib64/libpython2.7.so.1.0

#25 0x00007fde622d90b3 in PyObject_Call () from /lib64/libpython2.7.so.1.0

#26 0x00007fde622e80a5 in instancemethod_call ()


>

>Michal



BestRegards

  LiuYun

原始邮件