Hi all:
I encountered with an AB deadlock and libvirtd crash problem the other day.
The process to generate the problems:
1 use the command "virsh create ***.xml" to create a vm
2 after vm is running , use the command "virsh console ***" to connect vm with
console
3 after connection, use the command "virsh destroy ****" to destroy vm and
delete it
Then, an AB lock problem occurs, or libvirtd got crashed.
AB LOCK stack:
[Switching to thread 1 (Thread 0x7ff96bf3d7a0 (LWP 9772))]
#0 0x00007ff967b49324 in __lll_lock_wait () from /lib64/libpthread.so.0
#0 0x00007ff967b49324 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007ff967b44669 in _L_lock_1008 () from /lib64/libpthread.so.0
#2 0x00007ff967b4447e in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007ff96b3bd96c in virChrdevFDStreamCloseCb () from /usr/lib64/libvirt.so.0
#4 0x00007ff96b3c9a34 in virFDStreamCloseInt () from /usr/lib64/libvirt.so.0
#5 0x00007ff96b40ac3e in virStreamAbort () from /usr/lib64/libvirt.so.0
#6 0x00007ff96bfa242a in daemonStreamHandleAbort ()
#7 0x00007ff96bfa2803 in daemonStreamEvent ()
#8 0x00007ff96b3c9bfc in virFDStreamEvent () from /usr/lib64/libvirt.so.0
#9 0x00007ff96b3083aa in virEventPollRunOnce () from /usr/lib64/libvirt.so.0
#10 0x00007ff96b307042 in virEventRunDefaultImpl () from /usr/lib64/libvirt.so.0
#11 0x00007ff96b4503dd in virNetDaemonRun () from /usr/lib64/libvirt.so.0
#12 0x00007ff96bf72528 in main ()
[Switching to thread 12 (Thread 0x7ff9640ff700 (LWP 9778))]
#0 0x00007ff967b49324 in __lll_lock_wait () from /lib64/libpthread.so.0
#0 0x00007ff967b49324 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007ff967b44669 in _L_lock_1008 () from /lib64/libpthread.so.0
#2 0x00007ff967b4447e in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007ff96b3c8f26 in virFDStreamSetInternalCloseCb () from /usr/lib64/libvirt.so.0
#4 0x00007ff96b30f7b9 in virHashForEach () from /usr/lib64/libvirt.so.0
#5 0x00007ff96b3be1a2 in virChrdevFree () from /usr/lib64/libvirt.so.0
#6 0x00007ff96055980f in qemuDomainObjPrivateFree () from
/usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#7 0x00007ff96b367014 in virDomainObjDispose () from /usr/lib64/libvirt.so.0
#8 0x00007ff96b330c33 in virObjectUnref () from /usr/lib64/libvirt.so.0
#9 0x00007ff96b35edb9 in virDomainObjEndAPI () from /usr/lib64/libvirt.so.0
#10 0x00007ff9605c6e89 in qemuDomainUndefineFlags () from
/usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#11 0x00007ff96b3e35ac in virDomainUndefine () from /usr/lib64/libvirt.so.0
#12 0x00007ff96bf996db in remoteDispatchDomainUndefineHelper ()
#13 0x00007ff96b4537ef in virNetServerProgramDispatch () from /usr/lib64/libvirt.so.0
#14 0x00007ff96b45205e in virNetServerProcessMsg () from /usr/lib64/libvirt.so.0
#15 0x00007ff96b4520e8 in virNetServerHandleJob () from /usr/lib64/libvirt.so.0
#16 0x00007ff96b349f94 in virThreadPoolWorker () from /usr/lib64/libvirt.so.0
#17 0x00007ff96b3494a8 in virThreadHelper () from /usr/lib64/libvirt.so.0
#18 0x00007ff967b42806 in start_thread () from /lib64/libpthread.so.0
#19 0x00007ff96789d67d in clone () from /lib64/libc.so.6
#20 0x0000000000000000 in ?? ()
coredump stack
(gdb) f 1
#1 0x00007f1672a7c73f in virMutexLock (m=0x68) at util/virthread.c:89
89 pthread_mutex_lock(&m->lock);
(gdb) bt
#0 0x00007f1670ace444 in pthread_mutex_lock () from /lib64/libpthread.so.0
#1 0x00007f1672a7c73f in virMutexLock (m=0x68) at util/virthread.c:89
#2 0x00007f1672b2fda2 in virFDStreamSetInternalCloseCb (st=0x7f1673a1a370, cb=0x0,
opaque=0x0, fcb=0x0) at fdstream.c:796
#3 0x00007f1672b1f5ad in virChrdevFreeClearCallbacks (payload=0x7f1673a1a370,
name=0x7f1673a12520, data=0x0) at conf/virchrdev.c:301
#4 0x00007f1672a36c9b in virHashForEach (table=0x7f1673a0ba20, iter=0x7f1672b1f567
<virChrdevFreeClearCallbacks>, data=0x0) at util/virhash.c:521
#5 0x00007f1672b1f626 in virChrdevFree (devs=0x7f1673a2f820) at conf/virchrdev.c:316
#6 0x00007f1669cca988 in qemuDomainObjPrivateFree (data=0x7f1673a1afa0) at
qemu/qemu_domain.c:496
#7 0x00007f1672a9e293 in virDomainObjDispose (obj=0x7f1673a16400) at
conf/domain_conf.c:2545
#8 0x00007f1672a5b326 in virObjectUnref (anyobj=0x7f1673a16400) at util/virobject.c:265
#9 0x00007f1672a9e7a5 in virDomainObjEndAPI (vm=0x7f166b07d8d0) at
conf/domain_conf.c:2684
#10 0x00007f1669d3ff00 in qemuDomainDestroyFlags (dom=0x7f16600017f0, flags=0) at
qemu/qemu_driver.c:2264
#11 0x00007f1669d3ff69 in qemuDomainDestroy (dom=0x7f16600017f0) at
qemu/qemu_driver.c:2273
#12 0x00007f1672b39333 in virDomainDestroy (domain=0x7f16600017f0) at
libvirt-domain.c:483
#13 0x00007f16737167a5 in remoteDispatchDomainDestroy (server=0x7f167398f4f0,
client=0x7f1673a172f0, msg=0x7f1673a2fc70, rerr=0x7f166b07db50, args=0x7f1673a2f9b0)
at remote_dispatch.h:4002
#14 0x00007f1673716648 in remoteDispatchDomainDestroyHelper (server=0x7f167398f4f0,
client=0x7f1673a172f0, msg=0x7f1673a2fc70, rerr=0x7f166b07db50, args=0x7f1673a2f9b0,
ret=
0x7f1673a10e10) at remote_dispatch.h:3977
#15 0x00007f1672bd8a9b in virNetServerProgramDispatchCall (prog=0x7f167399c610,
server=0x7f167398f4f0, client=0x7f1673a172f0, msg=0x7f1673a2fc70)
at rpc/virnetserverprogram.c:437
#16 0x00007f1672bd85fd in virNetServerProgramDispatch (prog=0x7f167399c610,
server=0x7f167398f4f0, client=0x7f1673a172f0, msg=0x7f1673a2fc70) at
rpc/virnetserverprogram.c:307
#17 0x00007f1672bd2a08 in virNetServerProcessMsg (srv=0x7f167398f4f0,
client=0x7f1673a172f0, prog=0x7f167399c610, msg=0x7f1673a2fc70) at
rpc/virnetserver.c:136
#18 0x00007f1672bd2aed in virNetServerHandleJob (jobOpaque=0x7f1673a1d780,
opaque=0x7f167398f4f0) at rpc/virnetserver.c:157
#19 0x00007f1672a7d833 in virThreadPoolWorker (opaque=0x7f167399bad0) at
util/virthreadpool.c:145
#20 0x00007f1672a7cbec in virThreadHelper (data=0x7f167399c980) at util/virthread.c:206
#21 0x00007f1670acc806 in start_thread () from /lib64/libpthread.so.0
#22 0x00007f167082767d in clone () from /lib64/libc.so.6
#23 0x0000000000000000 in ?? ()
(gdb) bt
#0 0x00007f1670ad3324 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007f1670ace669 in _L_lock_1008 () from /lib64/libpthread.so.0
#2 0x00007f1670ace47e in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007f1672a7c73f in virMutexLock (m=0x7f1673a2f820) at util/virthread.c:89
#4 0x00007f1672b1f3fe in virChrdevFDStreamCloseCb (st=0x7f1673a1a370,
opaque=0x7f1673a149e0) at conf/virchrdev.c:254
#5 0x00007f1672b2e8a8 in virFDStreamCloseInt (st=0x7f1673a1a370, streamAbort=true) at
fdstream.c:329
#6 0x00007f1672b2e9d5 in virFDStreamAbort (st=0x7f1673a1a370) at fdstream.c:355
#7 0x00007f1672b7d856 in virStreamAbort (stream=0x7f1673a1a370) at libvirt-stream.c:663
#8 0x00007f16737497e8 in daemonStreamHandleAbort (client=0x7f1673a1b1b0,
stream=0x7f1673a1ec50, msg=0x7f1673a0d660) at stream.c:613
#9 0x00007f16737499fa in daemonStreamHandleWrite (client=0x7f1673a1b1b0,
stream=0x7f1673a1ec50) at stream.c:662
#10 0x00007f167374866b in daemonStreamEvent (st=0x7f1673a1a370, events=14,
opaque=0x7f1673a1b1b0) at stream.c:138
#11 0x00007f1672b2e17a in virFDStreamEvent (watch=18, fd=17, events=14,
opaque=0x7f1673a1a370) at fdstream.c:173
#12 0x00007f1672a2c2cb in virEventPollDispatchHandles (nfds=12, fds=0x7f1673a2fda0) at
util/vireventpoll.c:509
#13 0x00007f1672a2cb08 in virEventPollRunOnce () at util/vireventpoll.c:658
#14 0x00007f1672a2a9e0 in virEventRunDefaultImpl () at util/virevent.c:308
#15 0x00007f1672bd25d2 in virNetDaemonRun (dmn=0x7f1673990ed0) at rpc/virnetdaemon.c:707
#16 0x00007f167370cad4 in main (argc=3, argv=0x7ffd63276288) at libvirtd.c:1581
The reason of this problem is that fdstream abort event or close event occured at the same
time, libvirtd doesn't deal with the synchronousness well enough.
the flows about fdStream is bellow
1、qemuDomainDefineXMLFlags -> virDomainObjListAdd -> qemuDomainObjPrivateAlloc
-> virChrdevAlloc -> virHashCreate
2、qemuDomainOpenConsole -> virChrdevOpen -> virHashAddEntry(devs->hash, path,
st)
3、virDomainObjDispose -> privateDataFreeFunc (qemuDomainObjPrivateFree) - >
virChrdevFree(*dev locked*) -> virChrdevFreeClearCallbacks - >
virFDStreamSetInternalCloseCb(*fdst locked*)
4、virFDStreamCloseInt (*fdst locked*) -> icbFreeOpaque(virChrdevFDStreamCloseCb (*dev
locked*)) -> virHashRemoveEntry
The AB lock problem is obviouse: in step 3, it locks chardev before fdst, and in step 4,
it's the opsite way.
The reason of libvirtd crash is that: in virFDStreamCloseInt function we set fdst to NULL,
while in virFDStreamSetInternalCloseCb we use fdst->lock, note that fdst has already
been freed.
another crash problem occurs because that when virChrdevFree was earlier finished,
dev->hash freed and all date of hash is freed, but fdstream event flow use fdStream
after hash free. Ahha~~, libvirtd coredump.
All of those problem is because clear vm flow and fdStream flow concurs synchronously.
then I fix this problem by modify virChrdevFree()
void virChrdevFree(virChrdevsPtr devs)
{
if (!devs || !devs->hash)
return;
for (;;) {
virMutexLock(&devs->lock);
if (0 == virHashSize(devs->hash)) {
virMutexUnlock(&devs->lock);
break;
}
virMutexUnlock(&devs->lock);
usleep(10 * 1000);
}
virMutexLock(&devs->lock);
virHashFree(devs->hash);
virMutexUnlock(&devs->lock);
virMutexDestroy(&devs->lock);
VIR_FREE(devs);
}
If the chardev is removed by fdStream close or fdStream abort when vm destroy or shutdown,
the modification works well. But I'm not sure is that all chardev would be removed
when we clear vm, if not, it would always sleep here.
Another solution is as follows:
virMutexLock(vm);
virChrdevFree();
virMutexUnlock(vm);
virMutexLock(vm);
virFDStreamCloseInt();
virMutexUnlock(vm);
I lock vm before calling these 2 functions, which makes them run sequently.
Do you have other better idea to solve this problem. thanks in advance.
Thanks in advance.
Wei Fuqiang