Howdy, y'all.
I'm having issues with virDomainRestore failing, particularly under load
-- even in 0.7.0, when there's no need to parse through qemu's output to
find the monitor PTY.
Judging from strace output of libvirtd and the qemu processes it
spawns, this happens when qemu blocks on the migrate -incoming and
ceases to respond to the monitor socket. Some versions of qemu can even
go into this state before the monitor socket is opened. Either way,
libvirt times out while attempting to open the monitor socket or while
trying to read from it, and then kills the qemu instance it spawned
while that instance is still attempting to migrate in its old saved
state.
Both qemu-0.11.0-rc1 and qemu-kvm master have some form of blocking
in -incoming exec: which can prevent libvirt from successfully carrying
through a resume; I have reproduced the issue (and maintain logs from
strace, available on request) irrespective of the state of Chris
Lalancette's "Fix detached migration with exec" and "Allow monitor
interaction when using migrate -exec" patches. The qemu binaries being
used _appear_ to correctly allow monitor interaction prior to -incoming
exec:... completion when interactively invoked in the trivial case shown
below:
$ qemu-system-x86_64 \
-monitor stdio \
-nographic \
-serial file:/dev/null \
-incoming 'exec:sleep 5; echo DONE >&2; kill $PPID' \
/dev/null
QEMU 0.10.91 monitor - type 'help' for more information
(qemu) DONE
$
...however, whether these same binaries work as expected when invoked
from libvirt by our automated test system under load is
nondeterministic. (I have yet to reproduce the issue in a low-load
environment using "virsh restore".)
Is someone else working on this? Is a known-good (or believed-good)
libvirt/qemu pair available? What can I do to help in getting this issue
resolved?
Thanks!
---
libvirt-0.7.0 + qemu-kvm-0.11.0-rc1
qemudReadMonitorOutput:728 : internal error Timed out while reading
monitor startup output
libvirt-0.6.5 + qemu-kvm-0.11.0-rc1
error : qemudReadMonitorOutput:705 : internal error Timed out while
reading monitor startup output
error : qemudWaitForMonitor:1003 : internal error unable to start guest:
char device redirected to /dev/pts/9
libvir: QEMU error : internal error unable to start guest: char device
redirected to /dev/pts/9
^^ particularly interesting, as the above line should have been eaten by
qemudExtractMonitorPath rather than emitted as error text
---
<aliguori> -incoming is blocking
<aliguori> you cannot interact with the monitor during -incoming
<mDuff> ...shouldn't we always be opening the monitor before starting
the blocking -incoming bits, though? I don't always see that happening
(and have an strace handy where it certainly doesn't).
<aliguori> no
<aliguori> well, i think they added some patches for that
<aliguori> but originally, that's not how it worked
<aliguori> and i think it's silly to work that way
<aliguori> -incoming should mean, wait patiently for an incoming migration
<aliguori> there's no point in interfacing with the monitor in the interim
<mDuff> I agree that interacting may not be called for, but at least
connect()ing -- if it's a UNIX socket, the other side won't be able to
connect at all until qemu goes first...
<aliguori> heh, well....
<aliguori> that particular race condition is addressed by -daemonize
<aliguori> because that's generally true
<aliguori> you don't know how long qemu will take to open the monitor
<aliguori> but -daemonize gives you notification because it
doesn't daemonize the process until you've gotten to the point where all
sockets are open
<aliguori> but IIRC, libvirt doesn't use -daemonize
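
To illustrate the connect() race discussed in the log above, here is a
minimal sketch of a client that retries connect() on a UNIX socket until
a deadline, rather than failing the first time the listener is not
there. It is not libvirt's code; the helper name connect_with_retry, the
socket path, and the timeout and interval values are all hypothetical.

```python
import errno
import socket
import time


def connect_with_retry(path, timeout=30.0, interval=0.1):
    """Retry connect() on a UNIX socket until it succeeds or `timeout` expires.

    Hypothetical helper: `path`, `timeout`, and `interval` are illustrative
    values and are not taken from libvirt or qemu.
    """
    deadline = time.monotonic() + timeout
    while True:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.connect(path)
            return sock
        except OSError as exc:
            sock.close()
            # ENOENT: the socket file has not been created yet.
            # ECONNREFUSED: the file exists but nobody is listening on it.
            # Both mean the server side is not ready; anything else is fatal.
            if exc.errno not in (errno.ENOENT, errno.ECONNREFUSED):
                raise
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"no listener on {path} after {timeout}s"
                ) from exc
        time.sleep(interval)
```

A loop like this only covers the "socket not open yet" half of the
problem; it does nothing for a monitor that is connected but
unresponsive while -incoming blocks. As aliguori notes, -daemonize
would make the loop unnecessary, since the process is not daemonized
until all sockets are open.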