On 04/14/2015 11:31 AM, Ian Jackson wrote:
Konrad Rzeszutek Wilk writes ("libvirtd live-locking on CTX_LOCK
when doing 'virsh <domid> save /tmp/blah' with guest corrupting memory (on
purpose)."):
> It looks like thread #10 is blocking in libxl_read_exactly waiting
> for 'libxl-save-helper'. Said application (see below) has dispatched
> a message through helper_getreply and is blocking on __read_nocancel.
This is not supposed to block.
helper_stdout_readable assumes that the fd is actually readable.
However, for complicated reasons it can happen in a multithreaded
program that the fd was _previously_ readable and is now no longer.
This was not clearly documented in the internal API documentation.
I have produced what I think are two patches that will fix this. I
have compiled them but I haven't tested them. Konrad, are you able to
check whether they fix your bug ?
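
To make the failure mode described above concrete (a callback that assumes
its fd is still readable and then blocks in read), here is a minimal sketch
of re-checking readability before a blocking read. It is not either of the
patches mentioned in this thread. fd_still_readable and read_header_if_ready
are hypothetical names, not libxl functions, and the 2-byte header size is
taken only from the sz=2 seen in the backtrace. It assumes the caller already
holds whatever lock serialises readers of the fd, so that nothing else can
drain the fd between the poll and the read.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not a libxl function).
 * Returns 1 if a read on fd would not block right now (data, EOF or error
 * pending), 0 if it would block, -1 if poll itself failed. */
static int fd_still_readable(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int r;

    do {
        r = poll(&pfd, 1, 0);          /* timeout 0: poll never sleeps */
    } while (r < 0 && errno == EINTR);

    if (r < 0)
        return -1;
    return (r > 0 && (pfd.revents & (POLLIN | POLLHUP | POLLERR))) ? 1 : 0;
}

/* Hypothetical callback body (not a libxl function).
 * Returns 1 if a full 2-byte header was read, 0 if the wakeup was stale
 * (nothing to read, so we return instead of blocking), -1 on EOF or error. */
static int read_header_if_ready(int fd, unsigned char hdr[2])
{
    int ready = fd_still_readable(fd);
    size_t got = 0;

    if (ready <= 0)
        return ready;                  /* stale wakeup or poll failure */

    /* Assumption: once the first byte is available, the writer delivers
     * the rest of the header promptly, so these reads do not stall. */
    while (got < 2) {
        ssize_t n = read(fd, hdr + got, 2 - got);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;                 /* EOF or read error */
        got += (size_t)n;
    }
    return 1;
}
```

With a check like this, a stale wakeup makes the callback return to the
event loop rather than sit in read() while holding the lock that other
threads are waiting for. Setting O_NONBLOCK on the fd would be another way
to get the same effect.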
I too saw this bug just before Konrad's report, but the patches don't seem to
help. Running a script that continually saves and restores domains will
eventually lock libvirtd with essentially the same traces reported by Konrad:
Thread 4 (Thread 0x7fffee3a0700 (LWP 39068)):
#0 0x00007ffff3a9aa9d in read () from /lib64/libpthread.so.0
#1 0x00007ffff4540ea0 in libxl_read_exactly (ctx=0x7fffe00445e0, fd=37,
data=0x7fffee39f36e,
sz=2, source=0x7fffc80010c0 "domain 6 save/restore helper stdout pipe",
what=0x7ffff458112a "ipc msg header") at libxl_utils.c:430
#2 0x00007ffff454913a in helper_stdout_readable (egc=0x7fffee39f540,
ev=0x7fffc8002038, fd=37,
events=3, revents=1) at libxl_save_callout.c:281
#3 0x00007ffff454fafb in afterpoll_internal (egc=0x7fffee39f540,
poller=0x7fffe0000a00, nfds=4,
fds=0x7fffe0000930, now=...) at libxl_event.c:1185
#4 0x00007ffff455127a in eventloop_iteration (egc=0x7fffee39f540,
poller=0x7fffe0000a00)
at libxl_event.c:1645
#5 0x00007ffff4551df1 in libxl__ao_inprogress (ao=0x7fffc8001060,
file=0x7ffff4575e1b "libxl.c",
line=982, func=0x7ffff4578750 <__func__.17561>
"libxl_domain_suspend") at
libxl_event.c:1896
#6 0x00007ffff450e051 in libxl_domain_suspend (ctx=0x7fffe00445e0, domid=6,
fd=29, flags=0,
ao_how=0x0) at libxl.c:982
#7 0x00007fffe8774636 in libxlDoDomainSave (driver=0x7fffe011f1c0,
vm=0x7fffe004f950,
to=0x7fffc8000990 "/tmp/sles12gm-pv.img") at libxl/libxl_driver.c:1584
#8 0x00007fffe8774a35 in libxlDomainSaveFlags (dom=0x7fffc8000de0,
to=0x7fffc8000990 "/tmp/sles12gm-pv.img", dxml=0x0, flags=0) at
libxl/libxl_driver.c:1653
#9 0x00007fffe8774b11 in libxlDomainSave (dom=0x7fffc8000de0,
to=0x7fffc8000990 "/tmp/sles12gm-pv.img") at libxl/libxl_driver.c:1678
#10 0x00007ffff751db15 in virDomainSave (domain=0x7fffc8000de0,
to=0x7fffc80009d0 "/tmp/sles12gm-pv.img") at libvirt-domain.c:839
...
Thread 1 (Thread 0x7ffff7fc18c0 (LWP 39059)):
#0 0x00007ffff3a9a7bc in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007ffff3a964a4 in _L_lock_952 () from /lib64/libpthread.so.0
#2 0x00007ffff3a96306 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007ffff454caf6 in libxl__ctx_lock (ctx=0x7fffe00445e0) at
libxl_internal.h:3268
#4 0x00007ffff454fe98 in libxl_osevent_occurred_fd (ctx=0x7fffe00445e0,
for_libxl=0x7fffe004f210, fd=32, events_ign=0, revents_ign=1) at
libxl_event.c:1242
#5 0x00007fffe8770573 in libxlFDEventCallback (watch=24, fd=32, vir_events=1,
fd_info=0x555555896c60) at libxl/libxl_driver.c:123
#6 0x00007ffff73f71bc in virEventPollDispatchHandles (nfds=14, fds=0x555555897fa0)
at util/vireventpoll.c:508
#7 0x00007ffff73f79f9 in virEventPollRunOnce () at util/vireventpoll.c:657
#8 0x00007ffff73f58fa in virEventRunDefaultImpl () at util/virevent.c:308
#9 0x00005555555c2131 in virNetServerRun (srv=0x555555889980) at
rpc/virnetserver.c:1139
#10 0x000055555556cf88 in main (argc=2, argv=0x7fffffffe378) at libvirtd.c:1489
Regards,
Jim