
On 24.07.2013 12:29, Daniel P. Berrange wrote:
On Wed, Jul 24, 2013 at 12:15:32PM +0200, Michal Privoznik wrote:
There's a race in the LXC driver causing a deadlock. If a domain is destroyed immediately after being started, the deadlock can occur. When a domain is started, the event loop tries to connect to the monitor. If the connection succeeds, virLXCProcessMonitorInitNotify() is called with @mon->client locked. The first thing the callee does is virObjectLock(vm). So the locking order is: 1) @mon->client, 2) @vm.
However, if there's another thread executing virDomainDestroy on the very same domain, the first thing done there is locking the @vm. Then the corresponding libvirt_lxc process is killed and the monitor is closed by calling virLXCMonitorClose(). That callee tries to lock @mon->client too, so the locking order is the reverse of the first case. This situation results in a deadlock and an unresponsive libvirtd (since the event loop is involved).
The proper solution is to unlock the @vm in virLXCMonitorClose prior to entering virNetClientClose(). See the backtrace below:
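For illustration, the lock-order inversion described above can be reproduced with a minimal standalone program. This is only a sketch: the two mutexes stand in for @mon->client and @vm, and none of it is libvirt code.

    /* AB-BA deadlock sketch: "client" and "vm" stand in for
     * @mon->client and @vm; compile with -lpthread. */
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static pthread_mutex_t client = PTHREAD_MUTEX_INITIALIZER; /* @mon->client */
    static pthread_mutex_t vm = PTHREAD_MUTEX_INITIALIZER;     /* @vm */

    /* Event-loop path: lock order is 1) client, 2) vm */
    static void *init_notify(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&client);   /* held while dispatching the event */
        sleep(1);                      /* widen the race window */
        pthread_mutex_lock(&vm);       /* blocks: the destroyer holds vm */
        puts("init notify done");
        pthread_mutex_unlock(&vm);
        pthread_mutex_unlock(&client);
        return NULL;
    }

    /* Destroy path: lock order is reversed, 1) vm, 2) client */
    static void *destroy(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&vm);       /* virDomainDestroy locks the domain */
        sleep(1);
        pthread_mutex_lock(&client);   /* blocks: the event loop holds client */
        puts("destroy done");
        pthread_mutex_unlock(&client);
        pthread_mutex_unlock(&vm);
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, init_notify, NULL);
        pthread_create(&t2, NULL, destroy, NULL);
        pthread_join(t1, NULL);        /* never returns: AB-BA deadlock */
        pthread_join(t2, NULL);
        return 0;
    }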
Hmm, I think I'd say that the flaw is in the way virLXCProcessMonitorInitNotify is invoked. In the QEMU driver monitor, we unlock the monitor before invoking any callbacks. In the LXC driver monitor we're invoking the callbacks with the monitor lock held. I think we need to make the LXC monitor's locking with respect to callbacks do what QEMU does, and unlock the monitor. See QEMU_MONITOR_CALLBACK in qemu_monitor.c.
Daniel
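For context, here is a simplified sketch of the unlock-before-callback pattern Daniel is referring to. The real implementation is the QEMU_MONITOR_CALLBACK macro in qemu_monitor.c; the type and field names below are illustrative stand-ins, not libvirt code.

    #include <pthread.h>

    /* Hypothetical monitor type; "lock" and "refs" play the roles of the
     * virObject lock and reference count. */
    typedef struct _sketchMonitor sketchMonitor;
    struct _sketchMonitor {
        pthread_mutex_t lock;
        int refs;
        void *vm;
        void (*initNotify)(sketchMonitor *mon, void *vm, int initpid);
    };

    /* Emit the "init" event QEMU-style: keep a reference, but drop the
     * monitor lock for the duration of the callback, so the callback can
     * lock @vm without holding any monitor-side lock. Caller holds
     * mon->lock on entry. */
    static void sketchMonitorEmitInit(sketchMonitor *mon, int initpid)
    {
        mon->refs++;                        /* keep mon alive while unlocked */
        if (mon->initNotify) {
            pthread_mutex_unlock(&mon->lock);
            mon->initNotify(mon, mon->vm, initpid);
            pthread_mutex_lock(&mon->lock);
        }
        mon->refs--;
    }

The reference taken before unlocking is what keeps the monitor object from being freed while its lock is dropped.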
I don't think so. It's not the monitor lock that is causing the deadlock here. In fact, the monitor is unlocked (note __owner = 0 in the lock dump at the end):

Thread 1 (Thread 0x7f35a348e740 (LWP 18839)):
#0  0x00007f35a0481714 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f35a047d16c in _L_lock_516 () from /lib64/libpthread.so.0
#2  0x00007f35a047cfbb in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f35a29ab83f in virMutexLock (m=0x7f3588024e80) at util/virthreadpthread.c:85
#4  0x00007f35a2994d62 in virObjectLock (anyobj=0x7f3588024e70) at util/virobject.c:320
#5  0x00007f358ed5bbd7 in virLXCProcessMonitorInitNotify (mon=0x7f3560000ab0, initpid=29062, vm=0x7f3588024e70) at lxc/lxc_process.c:601
#6  0x00007f358ed59fd3 in virLXCMonitorHandleEventInit (prog=0x7f35600087b0, client=0x7f3560001fd0, evdata=0x7f35a53bc1e0, opaque=0x7f3560000ab0) at lxc/lxc_monitor.c:109
#7  0x00007f35a2ad2206 in virNetClientProgramDispatch (prog=0x7f35600087b0, client=0x7f3560001fd0, msg=0x7f3560002038) at rpc/virnetclientprogram.c:259
#8  0x00007f35a2acf0a0 in virNetClientCallDispatchMessage (client=0x7f3560001fd0) at rpc/virnetclient.c:1019
#9  0x00007f35a2acf72b in virNetClientCallDispatch (client=0x7f3560001fd0) at rpc/virnetclient.c:1140
#10 0x00007f35a2acfdb1 in virNetClientIOHandleInput (client=0x7f3560001fd0) at rpc/virnetclient.c:1312
#11 0x00007f35a2ad0fc1 in virNetClientIncomingEvent (sock=0x7f3560008350, events=1, opaque=0x7f3560001fd0) at rpc/virnetclient.c:1832
#12 0x00007f35a2ae6238 in virNetSocketEventHandle (watch=47, fd=40, events=1, opaque=0x7f3560008350) at rpc/virnetsocket.c:1695
#13 0x00007f35a296f33f in virEventPollDispatchHandles (nfds=22, fds=0x7f35a53bc7a0) at util/vireventpoll.c:498
#14 0x00007f35a296fb62 in virEventPollRunOnce () at util/vireventpoll.c:645
#15 0x00007f35a296dad1 in virEventRunDefaultImpl () at util/virevent.c:273
#16 0x00007f35a2ad69ee in virNetServerRun (srv=0x7f35a53b09d0) at rpc/virnetserver.c:1097
#17 0x00007f35a34e5b6b in main (argc=2, argv=0x7fffe188e778) at libvirtd.c:1512

(gdb) up
#1  0x00007f35a047d16c in _L_lock_516 () from /lib64/libpthread.so.0
(gdb) up
#2  0x00007f35a047cfbb in pthread_mutex_lock () from /lib64/libpthread.so.0
(gdb)
#3  0x00007f35a29ab83f in virMutexLock (m=0x7f3588024e80) at util/virthreadpthread.c:85
85          pthread_mutex_lock(&m->lock);
(gdb)
#4  0x00007f35a2994d62 in virObjectLock (anyobj=0x7f3588024e70) at util/virobject.c:320
320         virMutexLock(&obj->lock);
(gdb)
#5  0x00007f358ed5bbd7 in virLXCProcessMonitorInitNotify (mon=0x7f3560000ab0, initpid=29062, vm=0x7f3588024e70) at lxc/lxc_process.c:601
601         virObjectLock(vm);
(gdb) p *mon
$1 = {parent = {parent = {magic = 3405643812, refs = 2, klass = 0x7f3588102cb0},
      lock = {lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0,
              __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},
              __size = '\000' <repeats 39 times>, __align = 0}}},
     vm = 0x7f3588024e70,
     cb = {destroy = 0x0, eofNotify = 0x0,
           exitNotify = 0x7f358ed5b928 <virLXCProcessMonitorExitNotify>,
           initNotify = 0x7f358ed5bb8b <virLXCProcessMonitorInitNotify>},
     client = 0x7f3560001fd0, program = 0x7f35600087b0}
(gdb)

Michal
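For reference, the change proposed in the first message (dropping the @vm lock in virLXCMonitorClose() around virNetClientClose()) might look roughly like the sketch below. It is an illustration based only on the fields visible in the dump above (mon->client, mon->vm), not the actual patch.

    void virLXCMonitorClose(virLXCMonitorPtr mon)
    {
        if (mon->client) {
            virNetClientPtr client = mon->client;

            mon->client = NULL;
            /* virNetClientClose() takes the client lock internally, so
             * release @vm first: this restores the client -> vm ordering
             * used by the event-loop thread and breaks the AB-BA cycle. */
            virObjectUnlock(mon->vm);
            virNetClientClose(client);
            virObjectLock(mon->vm);
            virObjectUnref(client);
        }
    }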