
On 07/22/2017 05:07 AM, liu.yunh@zte.com.cn wrote:
On 07/21/2017 08:20 AM, liu.yunh@zte.com.cn wrote:
Hi Michal,
This problem is triggered by the libvirt-python example event-test.py. The original example has a resource leak
in remove_handle and remove_timer.
Running the example with "python -u event-test.py" and then running "systemctl restart libvirtd.service" triggers the resource leak.
"lsof -p <event-test.pid>" shows the number of socket handles increasing each time libvirtd.service is restarted.
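The lsof check above can be approximated from Python, which makes it easier to log the count after each restart. This is a minimal sketch, assuming a Linux host with /proc mounted. The helper name count_socket_fds is made up for illustration and is not part of event-test.py or libvirt.

```python
import os


def count_socket_fds(pid="self"):
    """Count the file descriptors of a process that refer to sockets.

    Linux only: relies on /proc/<pid>/fd, where each socket shows up
    as a symlink whose target looks like "socket:[12345]".
    """
    fd_dir = "/proc/%s/fd" % pid
    count = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        if target.startswith("socket:"):
            count += 1
    return count
```

Calling count_socket_fds(<event-test.pid>) before and after each "systemctl restart libvirtd.service" should show the same growth that lsof reports, if the leak is present.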
This is interesting. When I try this out, the python script just gets
disconnected and never connects back. So I don't see any number (FD)
getting increased.
We are evaluating the event-driven framework in the event-test.py example. Because it only illustrates a one-shot connection, we modified virEventLoopPureStart with a loop
to allow the thread to reconnect to libvirtd.service. As you have seen, the original example stops running once libvirtd.service is restarted, so you also need to make a small modification
to allow the thread to reconnect to libvirtd.service. The resource leak shows up after that modification.
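The actual modification is not included in this thread, so the following is only a sketch of what such a reconnect loop might look like, not the poster's code. The names reconnect_loop, connect and run_session are hypothetical. In a real script connect would presumably wrap something like libvirt.openReadOnly(uri), and run_session would block until the connection is reported closed.

```python
import time


def reconnect_loop(connect, run_session, max_attempts=None,
                   delay=1.0, sleep=time.sleep):
    """Connect, run one session, and reconnect when the session ends.

    connect()         -> returns an object with a close() method
    run_session(conn) -> blocks until the connection is lost
    max_attempts      -> None means retry forever
    Returns the number of connection attempts made.
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            conn = connect()
        except Exception:
            # Daemon is probably still restarting; wait and try again.
            sleep(delay)
            continue
        try:
            run_session(conn)
        finally:
            # Always release the old connection before making a new one,
            # otherwise every reconnect leaves a descriptor behind.
            conn.close()
        sleep(delay)
    return attempts
```

The point of the finally clause is that each pass through the loop releases what it opened. A loop that skips this step would leak one connection per restart regardless of what remove_handle and remove_timer do.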
Ah, so I couldn't have applied that patch, because there's nothing to apply it to. Is your code available somewhere so that I can do the changes? Also, is it possible that there's a problem in your code?
<trim/>
Can you provide the output of 't a a bt' (thread apply all bt)? I wonder if this is the only
thread (and thus something left the socket locked) or we have some deadlock
here.
The following is the backtrace of all threads. It shows only one thread trying to acquire the lock.
(gdb) info threads
Id Target Id Frame
2 Thread 0x7ff3c6362740 (LWP 20078) sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
* 1 Thread 0x7ff3b48d4700 (LWP 20081) __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
(gdb) thread 2
[Switching to thread 2 (Thread 0x7ff3c6362740 (LWP 20078))]
#0 sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
85 movq %rax, %rcx
(gdb) bt
#0 sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
#1 0x00007ff3c5eb17b5 in PyThread_acquire_lock () from /lib64/libpython2.7.so.1.0
#2 0x00007ff3c5e808a6 in PyEval_RestoreThread () from /lib64/libpython2.7.so.1.0
#3 0x00007ff3b50fe086 in time_sleep () from /usr/lib64/python2.7/lib-dynload/timemodule.so
#4 0x00007ff3c5e85aa4 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#5 0x00007ff3c5e870bd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#6 0x00007ff3c5e8576f in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#7 0x00007ff3c5e870bd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#8 0x00007ff3c5e871c2 in PyEval_EvalCode () from /lib64/libpython2.7.so.1.0
#9 0x00007ff3c5ea05ff in run_mod () from /lib64/libpython2.7.so.1.0
#10 0x00007ff3c5ea17be in PyRun_FileExFlags () from /lib64/libpython2.7.so.1.0
#11 0x00007ff3c5ea2a49 in PyRun_SimpleFileExFlags () from /lib64/libpython2.7.so.1.0
#12 0x00007ff3c5eb3b9f in Py_Main () from /lib64/libpython2.7.so.1.0
#13 0x00007ff3c50e0b15 in __libc_start_main (main=0x4006f0 <main>, argc=3, ubp_av=0x7ffca58ef528, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffca58ef518)
at libc-start.c:274
#14 0x0000000000400721 in _start ()
(gdb) thread 1
[Switching to thread 1 (Thread 0x7ff3b48d4700 (LWP 20081))]
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135 2: movl %edx, %eax
(gdb) bt
#0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007ff3c5b92d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2 0x00007ff3c5b92c08 in __GI___pthread_mutex_lock (mutex=mutex@entry=0xa6b9c0) at pthread_mutex_lock.c:64
#3 0x00007ff3be351e15 in virMutexLock (m=m@entry=0xa6b9c0) at util/virthread.c:89
#4 0x00007ff3be3338ae in virObjectLock (anyobj=anyobj@entry=0xa6b9b0) at util/virobject.c:323
#5 0x00007ff3be47a52c in virNetSocketEventFree (opaque=0xa6b9b0) at rpc/virnetsocket.c:2134
#6 0x00007ff3be82af87 in libvirt_virEventRemoveHandleFunc (watch=<optimized out>) at libvirt-override.c:5496
#7 0x00007ff3be47dc69 in virNetSocketRemoveIOCallback (sock=0xa6b9b0) at rpc/virnetsocket.c:2212
#8 0x00007ff3be469d76 in virNetClientMarkClose (client=0xa6bcb0, reason=0) at rpc/virnetclient.c:779
#9 0x00007ff3be46a0eb in virNetClientIncomingEvent (sock=0xa6b9b0, events=9, opaque=0xa6bcb0) at rpc/virnetclient.c:1985
#10 0x00007ff3be81e347 in libvirt_virEventInvokeHandleCallback (self=<optimized out>, args=<optimized out>) at libvirt-override.c:5718
#11 0x00007ff3c5e85aa4 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#12 0x00007ff3c5e870bd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#13 0x00007ff3c5e8576f in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#14 0x00007ff3c5e85860 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#15 0x00007ff3c5e85860 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#16 0x00007ff3c5e85860 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#17 0x00007ff3c5e870bd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#18 0x00007ff3c5e1405d in function_call () from /lib64/libpython2.7.so.1.0
#19 0x00007ff3c5def0b3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#20 0x00007ff3c5e822f7 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#21 0x00007ff3c5e85860 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#22 0x00007ff3c5e85860 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#23 0x00007ff3c5e870bd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#24 0x00007ff3c5e13f68 in function_call () from /lib64/libpython2.7.so.1.0
#25 0x00007ff3c5def0b3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#26 0x00007ff3c5dfe0a5 in instancemethod_call () from /lib64/libpython2.7.so.1.0
#27 0x00007ff3c5def0b3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#28 0x00007ff3c5e80f07 in PyEval_CallObjectWithKeywords () from /lib64/libpython2.7.so.1.0
#29 0x00007ff3c5eb5842 in t_bootstrap () from /lib64/libpython2.7.so.1.0
#30 0x00007ff3c5b90dc5 in start_thread (arg=0x7ff3b48d4700) at pthread_create.c:308
#31 0x00007ff3c51b521d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
(gdb)
Okay, so there is no deadlock in the sense of two threads each holding one lock and waiting for the other's. However, this looks like somebody left the socket locked (e.g. a bug in your code?). Alternatively, the socket might have been freed, and thus the subsequent lock attempt just hangs (as a result of undefined behaviour).

Anyway, unless I can reproduce the problem I am hesitant to merge the patch (esp. if it doesn't look right). Or can you provide a small reproducer?

Michal
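
For illustration of the "left locked" case: in the thread 1 backtrace, frames #5 and #7 both refer to the same socket object (0xa6b9b0). Whether frame #7 still holds that object's lock when frame #5 tries to take it is not established in this thread, so the following is not a diagnosis. It only shows, using Python's own threading primitives and assuming a non-recursive mutex, how a single thread can block on a lock it already holds with no second thread involved. The helper name relock_same_thread is made up for this sketch.

```python
import threading


def relock_same_thread(lock, timeout=0.1):
    """Return True if the current thread can take `lock` a second time."""
    lock.acquire()
    try:
        # Without the timeout, a non-reentrant lock would hang here forever.
        got_it = lock.acquire(timeout=timeout)
        if got_it:
            lock.release()
        return got_it
    finally:
        lock.release()


print(relock_same_thread(threading.Lock()))   # False: non-reentrant lock
print(relock_same_thread(threading.RLock()))  # True: reentrant lock
```

If something similar happens here, a small reproducer would show it without needing a libvirtd restart at all.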