On 24.07.2013 12:29, Daniel P. Berrange wrote:
On Wed, Jul 24, 2013 at 12:15:32PM +0200, Michal Privoznik wrote:
> There's a race in the lxc driver causing a deadlock. If a domain is
> destroyed immediately after being started, the deadlock can occur. When
> a domain is started, the event loop tries to connect to the monitor. If
> the connection succeeds, virLXCProcessMonitorInitNotify() is called with
> @mon->client locked. The first thing that callee does is
> virObjectLock(vm). So the order of locking is: 1) @mon->client, 2) @vm.
>
> However, if there's another thread executing virDomainDestroy on the
> very same domain, the first thing done there is locking the @vm. Then,
> the corresponding libvirt_lxc process is killed and the monitor is closed
> by calling virLXCMonitorClose(). This callee tries to lock @mon->client
> too. So the order is reversed compared to the first case. This situation
> results in a deadlock and an unresponsive libvirtd (since the event loop
> is involved).
>
> The proper solution is to unlock the @vm in virLXCMonitorClose prior
> to entering virNetClientClose(). See the backtrace as follows:
Hmm, I think I'd say that the flaw is in the way virLXCProcessMonitorInitNotify
is invoked. In the QEMU driver monitor, we unlock the monitor before invoking
any callbacks. In the LXC driver monitor we're invoking the callbacks with
the monitor lock held. I think we need to make the LXC monitor locking wrt
callbacks do what QEMU does, and unlock the monitor. See QEMU_MONITOR_CALLBACK
in qemu_monitor.c
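
Roughly, the pattern I mean looks like the sketch below. This is only an
illustration: demo_monitor, demo_on_init and the demo_dispatch_* functions
are made-up names, not libvirt code, and the reference counting that a real
implementation would need around the callback is left out. The callback
uses pthread_mutex_trylock() to record whether the monitor lock was free
while it ran.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a monitor object; not the libvirt struct. */
typedef struct demo_monitor demo_monitor;
typedef void (*demo_init_cb)(demo_monitor *mon, void *opaque);

struct demo_monitor {
    pthread_mutex_t lock;
    demo_init_cb init_notify;
    void *opaque;
    int lock_was_free_in_cb;   /* set by the callback, for illustration only */
};

/* Callback: records whether the monitor lock was free when it ran. */
static void demo_on_init(demo_monitor *mon, void *opaque)
{
    (void)opaque;
    if (pthread_mutex_trylock(&mon->lock) == 0) {
        mon->lock_was_free_in_cb = 1;
        pthread_mutex_unlock(&mon->lock);
    } else {
        mon->lock_was_free_in_cb = 0;
    }
}

/* Invokes the callback with the monitor lock still held. */
static void demo_dispatch_locked(demo_monitor *mon)
{
    pthread_mutex_lock(&mon->lock);
    if (mon->init_notify)
        mon->init_notify(mon, mon->opaque);
    pthread_mutex_unlock(&mon->lock);
}

/* Copies the callback out, drops the lock, calls it, then relocks. */
static void demo_dispatch_unlocked(demo_monitor *mon)
{
    pthread_mutex_lock(&mon->lock);
    demo_init_cb cb = mon->init_notify;
    void *opaque = mon->opaque;
    pthread_mutex_unlock(&mon->lock);

    if (cb)
        cb(mon, opaque);

    pthread_mutex_lock(&mon->lock);
    /* ... any remaining work that needs the monitor lock ... */
    pthread_mutex_unlock(&mon->lock);
}

/* Returns 1 if the callback saw the lock free, 0 if it was held. */
static int demo_run(int drop_lock)
{
    demo_monitor mon;

    pthread_mutex_init(&mon.lock, NULL);
    mon.init_notify = demo_on_init;
    mon.opaque = NULL;
    mon.lock_was_free_in_cb = -1;

    if (drop_lock)
        demo_dispatch_unlocked(&mon);
    else
        demo_dispatch_locked(&mon);

    pthread_mutex_destroy(&mon.lock);
    return mon.lock_was_free_in_cb;
}
```

With this sketch, demo_run(0) reports that the lock was held during the
callback and demo_run(1) reports that it was free, so a callback that goes
on to take another lock no longer nests it inside the monitor lock.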
Daniel
I don't think so. It's not the monitor lock that is causing the deadlock
here. In fact, the monitor is unlocked:
Thread 1 (Thread 0x7f35a348e740 (LWP 18839)):
#0 0x00007f35a0481714 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f35a047d16c in _L_lock_516 () from /lib64/libpthread.so.0
#2 0x00007f35a047cfbb in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f35a29ab83f in virMutexLock (m=0x7f3588024e80) at util/virthreadpthread.c:85
#4 0x00007f35a2994d62 in virObjectLock (anyobj=0x7f3588024e70) at util/virobject.c:320
#5 0x00007f358ed5bbd7 in virLXCProcessMonitorInitNotify (mon=0x7f3560000ab0,
initpid=29062, vm=0x7f3588024e70) at lxc/lxc_process.c:601
#6 0x00007f358ed59fd3 in virLXCMonitorHandleEventInit (prog=0x7f35600087b0,
client=0x7f3560001fd0, evdata=0x7f35a53bc1e0, opaque=0x7f3560000ab0) at
lxc/lxc_monitor.c:109
#7 0x00007f35a2ad2206 in virNetClientProgramDispatch (prog=0x7f35600087b0,
client=0x7f3560001fd0, msg=0x7f3560002038) at rpc/virnetclientprogram.c:259
#8 0x00007f35a2acf0a0 in virNetClientCallDispatchMessage (client=0x7f3560001fd0) at
rpc/virnetclient.c:1019
#9 0x00007f35a2acf72b in virNetClientCallDispatch (client=0x7f3560001fd0) at
rpc/virnetclient.c:1140
#10 0x00007f35a2acfdb1 in virNetClientIOHandleInput (client=0x7f3560001fd0) at
rpc/virnetclient.c:1312
#11 0x00007f35a2ad0fc1 in virNetClientIncomingEvent (sock=0x7f3560008350, events=1,
opaque=0x7f3560001fd0) at rpc/virnetclient.c:1832
#12 0x00007f35a2ae6238 in virNetSocketEventHandle (watch=47, fd=40, events=1,
opaque=0x7f3560008350) at rpc/virnetsocket.c:1695
#13 0x00007f35a296f33f in virEventPollDispatchHandles (nfds=22, fds=0x7f35a53bc7a0) at
util/vireventpoll.c:498
#14 0x00007f35a296fb62 in virEventPollRunOnce () at util/vireventpoll.c:645
#15 0x00007f35a296dad1 in virEventRunDefaultImpl () at util/virevent.c:273
#16 0x00007f35a2ad69ee in virNetServerRun (srv=0x7f35a53b09d0) at rpc/virnetserver.c:1097
#17 0x00007f35a34e5b6b in main (argc=2, argv=0x7fffe188e778) at libvirtd.c:1512
(gdb) up
#1 0x00007f35a047d16c in _L_lock_516 () from /lib64/libpthread.so.0
(gdb) up
#2 0x00007f35a047cfbb in pthread_mutex_lock () from /lib64/libpthread.so.0
(gdb)
#3 0x00007f35a29ab83f in virMutexLock (m=0x7f3588024e80) at util/virthreadpthread.c:85
85 pthread_mutex_lock(&m->lock);
(gdb)
#4 0x00007f35a2994d62 in virObjectLock (anyobj=0x7f3588024e70) at util/virobject.c:320
320 virMutexLock(&obj->lock);
(gdb)
#5 0x00007f358ed5bbd7 in virLXCProcessMonitorInitNotify (mon=0x7f3560000ab0,
initpid=29062, vm=0x7f3588024e70) at lxc/lxc_process.c:601
601 virObjectLock(vm);
(gdb) p *mon
$1 = {parent = {parent = {magic = 3405643812, refs = 2, klass = 0x7f3588102cb0}, lock =
{lock = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins
= 0, __list = {__prev = 0x0, __next = 0x0}},
__size = '\000' <repeats 39 times>, __align = 0}}}, vm =
0x7f3588024e70, cb = {destroy = 0x0, eofNotify = 0x0, exitNotify = 0x7f358ed5b928
<virLXCProcessMonitorExitNotify>,
initNotify = 0x7f358ed5bb8b <virLXCProcessMonitorInitNotify>}, client =
0x7f3560001fd0, program = 0x7f35600087b0}
(gdb)
Michal