Many thanks to Daniel Veillard for the Xen storage patch that was ACKed
here yesterday! It does indeed seem to fix the crashing and lockup
problems I was seeing when performing many simultaneous operations.
However, it may have exposed something else that I find curious. I
have a set of twelve VMs across two hosts (seven on one, five on the
other). At night they're all paused, and in the morning they're all
resumed at once. For the past two evenings, the pause operation has
stalled for two of the VMs. The first time it was one VM per host;
last night both stalled VMs were on the seven-domain host.
The stalled connections are mostly harmless: libvirtd still accepts new
connections, but the ssh/netcat processes that forward the local socket
for the stalled clients are still running.
I've attached (okay, pasted inline) a traceback of the daemon that had
two ghost connections, and from my inexperienced read-through it looks
as though libvirtd doesn't even know that anyone's trying to talk to it.
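For anyone else reading the trace, here is how I understand the pattern it suggests, as a rough Python model rather than libvirtd's actual code: a fixed pool of workers parks in a condition-variable wait (the role virCondWait seems to play inside qemudWorker), and only the main loop, which sits in poll() via virEventRunOnce, wakes one by queueing a request. The names WorkerPool, submit and wait_idle are my own invention, and I'm assuming the daemon hands a request to a worker only after it has read the whole thing off the client socket.

```python
import threading
from collections import deque


class WorkerPool:
    """Hypothetical model of a daemon's worker pool (NOT libvirtd's code)."""

    def __init__(self, nworkers):
        lock = threading.Lock()
        # Both conditions share one lock, like a single mutex guarding the queue.
        self._work = threading.Condition(lock)  # signalled when a job is queued
        self._idle = threading.Condition(lock)  # signalled when a worker parks
        self._jobs = deque()
        self._parked = 0
        self._nworkers = nworkers
        for _ in range(nworkers):
            threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            with self._work:
                self._parked += 1
                self._idle.notify_all()
                # Park until a job is queued (assumed analogue of virCondWait
                # inside qemudWorker, going only by the traceback).
                while not self._jobs:
                    self._work.wait()
                self._parked -= 1
                job = self._jobs.popleft()
            job()  # run the request without holding the lock

    def submit(self, job):
        """Assumed to be called by the event loop once a request is fully read."""
        with self._work:
            self._jobs.append(job)
            self._work.notify()

    def wait_idle(self, timeout):
        """Return True once every worker is parked and nothing is queued."""
        with self._idle:
            return self._idle.wait_for(
                lambda: self._parked == self._nworkers and not self._jobs,
                timeout)
```

In this model, wait_idle() returning True (every worker parked, nothing queued) is what the five qemudWorker threads sitting in virCondWait look like: no request from the stalled clients ever made it past the event loop to a worker.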
This isn't much to go on, but I felt it was better to post this than to
let it drop on the floor, since there does seem to be a pattern so far
(we'll see how things go tonight, and I'll be able to do more
destructive testing next week).
(gdb) thread apply all bt

Thread 6 (Thread 0x7f5ca3af7950 (LWP 889)):
#0 0x00007f5ca84dab99 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/libpthread.so.0
#1 0x00007f5ca9b43369 in virCondWait (c=0x65a63c, m=0x80)
   at threads-pthread.c:81
#2 0x0000000000412ab5 in qemudWorker (data=<value optimized out>)
   at qemud.c:1445
#3 0x00007f5ca84d63f7 in start_thread () from /lib/libpthread.so.0
#4 0x00007f5ca8245b3d in clone () from /lib/libc.so.6
#5 0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7f5c9bfff950 (LWP 28425)):
#0 0x00007f5ca84dab99 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/libpthread.so.0
#1 0x00007f5ca9b43369 in virCondWait (c=0x65a63c, m=0x80)
   at threads-pthread.c:81
#2 0x0000000000412ab5 in qemudWorker (data=<value optimized out>)
   at qemud.c:1445
#3 0x00007f5ca84d63f7 in start_thread () from /lib/libpthread.so.0
#4 0x00007f5ca8245b3d in clone () from /lib/libc.so.6
#5 0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7f5ca52fa950 (LWP 5493)):
#0 0x00007f5ca84dab99 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/libpthread.so.0
#1 0x00007f5ca9b43369 in virCondWait (c=0x65a63c, m=0x80)
   at threads-pthread.c:81
#2 0x0000000000412ab5 in qemudWorker (data=<value optimized out>)
   at qemud.c:1445
#3 0x00007f5ca84d63f7 in start_thread () from /lib/libpthread.so.0
#4 0x00007f5ca8245b3d in clone () from /lib/libc.so.6
#5 0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7f5ca4af9950 (LWP 25795)):
#0 0x00007f5ca84dab99 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/libpthread.so.0
#1 0x00007f5ca9b43369 in virCondWait (c=0x65a63c, m=0x80)
   at threads-pthread.c:81
#2 0x0000000000412ab5 in qemudWorker (data=<value optimized out>)
   at qemud.c:1445
#3 0x00007f5ca84d63f7 in start_thread () from /lib/libpthread.so.0
#4 0x00007f5ca8245b3d in clone () from /lib/libc.so.6
#5 0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f5ca2af5950 (LWP 25810)):
#0 0x00007f5ca84dab99 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /lib/libpthread.so.0
#1 0x00007f5ca9b43369 in virCondWait (c=0x65a63c, m=0x80)
   at threads-pthread.c:81
#2 0x0000000000412ab5 in qemudWorker (data=<value optimized out>)
   at qemud.c:1445
#3 0x00007f5ca84d63f7 in start_thread () from /lib/libpthread.so.0
#4 0x00007f5ca8245b3d in clone () from /lib/libc.so.6
#5 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f5caa853780 (LWP 864)):
#0 0x00007f5ca823cc86 in poll () from /lib/libc.so.6
#1 0x000000000040f1d6 in virEventRunOnce () at event.c:542
#2 0x00000000004116cd in qemudRunLoop (server=0x65a610) at qemud.c:2067
#3 0x0000000000414bbe in main (argc=4, argv=<value optimized out>)
   at qemud.c:2921
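To check that reading without eyeballing every frame, a small Python helper can group the threads in a "thread apply all bt" dump by their innermost few frames. It's a hypothetical sketch of my own (the name summarize and the three-frame cutoff are arbitrary choices), and it only understands the simple frame layout shown above.

```python
import re
from collections import Counter

THREAD_RE = re.compile(r"^Thread (\d+) ")
# "#N 0xADDR in func (...)"; gdb omits the "0xADDR in" part for some frames.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(\S+)")


def summarize(bt_text, depth=3):
    """Count threads by the names of their innermost `depth` frames."""
    stacks = {}
    current = None
    for raw in bt_text.splitlines():
        line = raw.strip()
        m = THREAD_RE.match(line)
        if m:
            current = stacks.setdefault(int(m.group(1)), [])
            continue
        m = FRAME_RE.match(line)
        if m and current is not None:
            num, func = int(m.group(1)), m.group(2)
            # Accept frames only in sequence, so a stray repeated "#0" is skipped.
            if num == len(current) and num < depth:
                current.append(func.split("@")[0])  # drop "@@GLIBC_x" suffix
    return Counter(tuple(frames) for frames in stacks.values())
```

Fed the full dump above (e.g. saved as bt.txt and read with summarize(open("bt.txt").read())), it should report five threads in pthread_cond_wait / virCondWait / qemudWorker and one in poll / virEventRunOnce / qemudRunLoop, i.e. nothing besides the idle workers and the event loop.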
--
BitKeeper, how quaint.                          Nick Moffitt
    -- Alan Cox                                 nick(a)zork.net