On Fri, May 18, 2018 at 05:30:00PM +0100, Daniel P. Berrangé wrote:
I've just found a deadlock in nwfilter caused by something in
libpcap.
There are 2 VMs running with the nwfilter stress test, but the deadlock
in fact only involves one VM.
Three threads in libvirtd
Thread 1 (Thread 0x7f3a26f726c0 (LWP 15384)):
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007f3a22ae1cc4 in __GI___pthread_mutex_lock (mutex=0x7f39fc023de0) at
../nptl/pthread_mutex_lock.c:78
#2 0x00007f3a264817f5 in virMutexLock (m=<optimized out>) at util/virthread.c:89
#3 0x00007f3a2645c34b in virObjectLock (anyobj=anyobj@entry=0x7f39fc023dd0) at
util/virobject.c:429
#4 0x00007f39deca2b93 in qemuProcessHandleEvent (mon=<optimized out>,
vm=0x7f39fc023dd0, eventName=0x5601c3940500 "SHUTDOWN",
seconds=1526658178, micros=453867, details=0x5601c3a468b0
"{\"guest\":false}", opaque=0x7f39ac11ba20) at
qemu/qemu_process.c:630
#5 0x00007f39decc2686 in qemuMonitorEmitEvent (mon=mon@entry=0x7f39fc049d70,
event=event@entry=0x5601c3940500 "SHUTDOWN",
seconds=1526658178, micros=453867, details=0x5601c3a468b0
"{\"guest\":false}") at qemu/qemu_monitor.c:1336
#6 0x00007f39decd6cce in qemuMonitorJSONIOProcessEvent (obj=0x5601c391fcc0,
mon=0x7f39fc049d70) at qemu/qemu_monitor_json.c:174
#7 qemuMonitorJSONIOProcessLine (mon=mon@entry=0x7f39fc049d70, line=<optimized
out>, msg=msg@entry=0x0)
at qemu/qemu_monitor_json.c:211
#8 0x00007f39decd6f59 in qemuMonitorJSONIOProcess (mon=mon@entry=0x7f39fc049d70,
data=0x5601c3964670 "{\"timestamp\": {\"seconds\":
1526658178, \"microseconds\": 453867}, \"event\":
\"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n",
len=111, msg=msg@entry=0x0) at qemu/qemu_monitor_json.c:253
#9 0x00007f39decc11fc in qemuMonitorIOProcess (mon=0x7f39fc049d70) at
qemu/qemu_monitor.c:446
#10 qemuMonitorIO (watch=watch@entry=323, fd=<optimized out>, events=0,
events@entry=1, opaque=opaque@entry=0x7f39fc049d70)
at qemu/qemu_monitor.c:706
#11 0x00007f3a26425743 in virEventPollDispatchHandles (fds=0x5601c39a3410,
nfds=<optimized out>) at util/vireventpoll.c:508
#12 virEventPollRunOnce () at util/vireventpoll.c:657
#13 0x00007f3a26423e71 in virEventRunDefaultImpl () at util/virevent.c:327
#14 0x00007f3a26549ff5 in virNetDaemonRun (dmn=0x5601c38cd1b0) at rpc/virnetdaemon.c:850
#15 0x00005601c281544a in main (argc=<optimized out>, argv=<optimized out>) at
remote/remote_daemon.c:1483
Thread 4 (Thread 0x7f3a0e011700 (LWP 15400)):
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007f3a22ae1d64 in __GI___pthread_mutex_lock (mutex=0x7f39ac1cb3e0) at
../nptl/pthread_mutex_lock.c:113
#2 0x00007f3a264817f5 in virMutexLock (m=<optimized out>) at util/virthread.c:89
#3 0x00007f39f518653f in virNWFilterLockIface (ifname=ifname@entry=0x7f39fc025b60
"tck-vm2-if0")
at nwfilter/nwfilter_learnipaddr.c:197
#4 0x00007f39f517aae5 in _virNWFilterTeardownFilter (ifname=0x7f39fc025b60
"tck-vm2-if0") at nwfilter/nwfilter_gentech_driver.c:1021
#5 0x00007f39f517b5fc in virNWFilterTeardownFilter (net=0x7f39fc044f80) at
nwfilter/nwfilter_gentech_driver.c:1039
#6 0x00007f3a264da77c in virDomainConfVMNWFilterTeardown (vm=vm@entry=0x7f39fc023dd0) at
conf/domain_nwfilter.c:65
#7 0x00007f39decab77a in qemuProcessStop (driver=driver@entry=0x7f39ac11ba20,
vm=0x7f39fc023dd0,
reason=reason@entry=VIR_DOMAIN_SHUTOFF_DESTROYED,
asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_NONE, flags=0)
at qemu/qemu_process.c:6767
#8 0x00007f39decfd47e in qemuDomainDestroyFlags (dom=<optimized out>,
flags=<optimized out>) at qemu/qemu_driver.c:2237
#9 0x00007f3a2661c593 in virDomainDestroy (domain=domain@entry=0x7f39f00191e0) at
libvirt-domain.c:475
#10 0x00005601c284052a in remoteDispatchDomainDestroy (server=0x5601c38cd240,
msg=0x5601c3a189e0, args=<optimized out>,
rerr=0x7f3a0e010b60, client=0x5601c393d9f0) at
remote/remote_daemon_dispatch_stubs.h:4615
#11 remoteDispatchDomainDestroyHelper (server=0x5601c38cd240, client=0x5601c393d9f0,
msg=0x5601c3a189e0, rerr=0x7f3a0e010b60,
args=<optimized out>, ret=0x7f39f0004970) at
remote/remote_daemon_dispatch_stubs.h:4591
#12 0x00007f3a265443f4 in virNetServerProgramDispatchCall (msg=0x5601c3a189e0,
client=0x5601c393d9f0, server=0x5601c38cd240,
prog=0x5601c3938d70) at rpc/virnetserverprogram.c:437
#13 virNetServerProgramDispatch (prog=0x5601c3938d70, server=server@entry=0x5601c38cd240,
client=0x5601c393d9f0, msg=0x5601c3a189e0)
at rpc/virnetserverprogram.c:304
#14 0x00007f3a2654a778 in virNetServerProcessMsg (msg=<optimized out>,
prog=<optimized out>, client=<optimized out>,
srv=0x5601c38cd240) at rpc/virnetserver.c:145
#15 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x5601c38cd240) at
rpc/virnetserver.c:166
#16 0x00007f3a264822f0 in virThreadPoolWorker (opaque=opaque@entry=0x5601c38d9da0) at
util/virthreadpool.c:167
#17 0x00007f3a26481678 in virThreadHelper (data=<optimized out>) at
util/virthread.c:206
#18 0x00007f3a22adf564 in start_thread (arg=<optimized out>) at
pthread_create.c:463
#19 0x00007f3a2281331f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Thread 19 (Thread 0x7f39ceacb700 (LWP 17216)):
#0 0x00007f3a22808929 in __GI___poll (fds=fds@entry=0x7f39ceaca930, nfds=nfds@entry=1,
timeout=-1)
at ../sysdeps/unix/sysv/linux/poll.c:29
#1 0x00007f39f4f3409d in poll (__timeout=<optimized out>, __nfds=1,
__fds=0x7f39ceaca930) at /usr/include/bits/poll2.h:46
#2 pcap_wait_for_frames_mmap (handle=handle@entry=0x7f39ac1a7260) at ./pcap-linux.c:4561
#3 0x00007f39f4f38b98 in pcap_read_linux_mmap_v3 (handle=0x7f39ac1a7260, max_packets=1,
callback=0x7f39f4f339f0 <pcap_oneshot_mmap>, user=0x7f39ceaca9d0
"@\252\254\316\071\177") at ./pcap-linux.c:5059
#4 0x00007f39f4f3d632 in pcap_next (p=p@entry=0x7f39ac1a7260, h=h@entry=0x7f39ceacaa40)
at ./pcap.c:243
#5 0x00007f39f5186ace in learnIPAddressThread (arg=0x7f39fc058f10) at
nwfilter/nwfilter_learnipaddr.c:486
#6 0x00007f3a264816a2 in virThreadHelper (data=<optimized out>) at
util/virthread.c:206
#7 0x00007f3a22adf564 in start_thread (arg=<optimized out>) at
pthread_create.c:463
#8 0x00007f3a2281331f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
So in summary:

Thread 1 is responding to the QEMU SHUTDOWN event
  - Waiting on the VM mutex, held by thread 4

Thread 4 is responding to virDomainDestroy
  - Holds the VM mutex
  - Holds the nwfilter update mutex
  - Waiting on the interface mutex, held by thread 19

Thread 19 is doing nwfilter IP address learning
  - Holds the interface mutex
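For illustration, the same blocking chain can be reproduced standalone with
plain pthread mutexes standing in for the libvirt locks; this is just a rough
sketch with invented names, not libvirt code:

/* Rough standalone sketch of the blocking chain above.  All names are
 * invented for illustration.  Build with: cc -o chain chain.c -lpthread */
#include <pthread.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t vm_lock     = PTHREAD_MUTEX_INITIALIZER; /* VM object mutex */
static pthread_mutex_t update_lock = PTHREAD_MUTEX_INITIALIZER; /* nwfilter update mutex */
static pthread_mutex_t iface_lock  = PTHREAD_MUTEX_INITIALIZER; /* interface mutex */

/* "Thread 19": holds the interface lock, then blocks forever the way
 * pcap_next() does once the underlying poll() timeout is -1. */
static void *learn_thread(void *arg)
{
    pthread_mutex_lock(&iface_lock);
    printf("learn: holding iface lock, stuck in poll(-1)\n");
    poll(NULL, 0, -1);                  /* stands in for pcap_next() never returning */
    pthread_mutex_unlock(&iface_lock);
    return NULL;
}

/* "Thread 4": holds the VM and update locks, then waits on the interface
 * lock that learn_thread() will never release. */
static void *destroy_thread(void *arg)
{
    pthread_mutex_lock(&vm_lock);
    pthread_mutex_lock(&update_lock);
    printf("destroy: holding vm + update locks, waiting for iface lock\n");
    pthread_mutex_lock(&iface_lock);    /* blocks forever */
    pthread_mutex_unlock(&iface_lock);
    pthread_mutex_unlock(&update_lock);
    pthread_mutex_unlock(&vm_lock);
    return NULL;
}

/* "Thread 1": the monitor event handler just wants the VM lock. */
static void *event_thread(void *arg)
{
    printf("event: waiting for vm lock\n");
    pthread_mutex_lock(&vm_lock);       /* blocks forever, so the event loop is stuck */
    pthread_mutex_unlock(&vm_lock);
    return NULL;
}

int main(void)
{
    pthread_t t19, t4, t1;
    pthread_create(&t19, NULL, learn_thread, NULL);
    sleep(1);   /* crude ordering so the locks are taken as in the backtraces */
    pthread_create(&t4, NULL, destroy_thread, NULL);
    sleep(1);
    pthread_create(&t1, NULL, event_thread, NULL);
    pthread_join(t1, NULL);             /* never completes */
    return 0;
}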
Thread 19 is using libpcap to capture packets, and has told it to use a
500ms timeout. So it is sitting in pcap_next(), waiting in poll(), while
holding the interface lock. The code is written such that pcap_next() is
expected to return on hitting the 500ms timeout, at which point the thread
checks whether learning has been asked to terminate; if so, it releases the
interface lock, allowing thread 4 to continue.
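For context, the pattern the learning thread is relying on looks roughly like
this (a simplified sketch, not the actual nwfilter_learnipaddr.c code; the
terminate flag is a stand-in for the real termination request):

#include <pcap.h>
#include <stdbool.h>

/* Simplified sketch of what the learning thread expects: open the capture
 * with a 500ms read timeout, and rely on pcap_next() returning (possibly
 * NULL) when the timeout expires so the terminate flag gets rechecked. */
static void learn_loop(const char *ifname, volatile bool *terminate)
{
    char errbuf[PCAP_ERRBUF_SIZE];
    struct pcap_pkthdr hdr;
    const u_char *pkt;

    pcap_t *handle = pcap_open_live(ifname, 65536, 1, 500 /* ms */, errbuf);
    if (!handle)
        return;

    while (!*terminate) {
        /* Expected: returns NULL after ~500ms with no traffic.
         * Observed: with TPACKET_V3 the underlying poll() uses -1,
         * so this never returns on an idle interface. */
        pkt = pcap_next(handle, &hdr);
        if (pkt) {
            /* ... inspect the packet and learn the IP address ... */
        }
    }

    pcap_close(handle);
}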
As we can see from the stack trace though, pcap_next() is not honouring
our 500ms timeout, and has done poll() with an infinite timeout.
This seems to be intentional behaviour from libpcap to ignore our requested
timeout....
} else if (handlep->timeout > 0) {
#ifdef HAVE_TPACKET3
/*
* For TPACKET_V3, the timeout is handled by the kernel,
* so block forever; that way, we don't get extra timeouts.
* Don't do that if we have a broken TPACKET_V3, though.
*/
if (handlep->tp_version == TPACKET_V3 && !broken_tpacket_v3)
handlep->poll_timeout = -1; /* block forever, let TPACKET_V3 wake us up */
else
#endif
handlep->poll_timeout = handlep->timeout; /* block for that amount of time */
We've taken the path tp_version == TPACKET_V3 && !broken_tpacket_v3.
This is the case for anything with kernel >= 3.19
/*
* OK, that was a valid minor version.
* Is this 3.19 or newer?
*/
if (major >= 4 || (major == 3 && minor >= 19)) {
/* Yes. TPACKET_V3 works correctly. */
broken_tpacket_v3 = 0;
}
I've no clue what it is about TPACKET_V3 that is supposed to wake us
up from poll, but it clearly isn't happening :-(
Sorry, me neither, but I remembered something about timeouts being fishy in pcap
back when I used it. And looking at the man page now, it seems I was right:
    NOTE: the read timeout cannot be used to cause calls that read packets
    to return within a limited period of time, because, on some platforms,
    the read timeout isn't supported, and, on other platforms, the timer
    doesn't start until at least one packet arrives.  This means that the
    read timeout should NOT be used, for example, in an interactive
    application to allow the packet capture loop to ``poll'' for user input
    periodically, as there's no guarantee that a call reading packets will
    return after the timeout expires even if no packets have arrived.
I'm not sure why libpcap deliberately doesn't honour the timeout in your
particular case, but maybe that excerpt helps.
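If it helps, one common way around this is to do the waiting yourself on the
capture descriptor instead of trusting the read timeout; roughly something
like the following untested sketch (assuming pcap_get_selectable_fd() is
usable for this kind of handle on your platform):

#include <pcap.h>
#include <poll.h>
#include <stdbool.h>

/* Untested sketch: put the handle into non-blocking mode and do our own
 * poll() with a real 500ms timeout, so the loop regains control even when
 * libpcap decides to block forever internally. */
static void capture_with_bounded_wait(pcap_t *handle, volatile bool *terminate,
                                      pcap_handler cb, u_char *user)
{
    char errbuf[PCAP_ERRBUF_SIZE];
    int fd;

    if (pcap_setnonblock(handle, 1, errbuf) < 0)
        return;
    if ((fd = pcap_get_selectable_fd(handle)) < 0)
        return;     /* not all platforms provide a pollable descriptor */

    while (!*terminate) {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int n = poll(&pfd, 1, 500 /* ms */);
        if (n > 0)
            pcap_dispatch(handle, -1, cb, user);
        /* n == 0: timeout expired, loop back and recheck the terminate flag */
    }
}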
Good luck with figuring this out.
Have a nice weekend,
Martin