On Mon, Sep 28, 2009 at 6:43 AM, Daniel P. Berrange <berrange(a)redhat.com> wrote:
The flaw in QEMU is depressingly obvious
static int stdio_pclose(void *opaque)
{
    QEMUFileStdio *s = opaque;
    pclose(s->stdio_file);
    qemu_free(s);
    return 0;
}
Notice how it completely discards the exit status returned by
pclose() and just pretends everything always worked :-(
If this were handling errors correctly, you'd at least see QEMU
exiting rather than hanging around broken.
Ugh, indeed. I'll submit a patch for that later today, if nobody beats me to
it.
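
For reference, a minimal sketch of the kind of fix I have in mind. This is not the actual patch: the struct below is a stand-in for QEMU's QEMUFileStdio with only the one field used here, free() stands in for qemu_free(), and collapsing every failure into -1 is my assumption about what callers would want.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Stand-in for QEMU's QEMUFileStdio; only the field used here is shown. */
typedef struct QEMUFileStdio {
    FILE *stdio_file;
} QEMUFileStdio;

/*
 * Close the pipe and report failure instead of discarding the status.
 * Returns 0 if the child exited with status 0, -1 otherwise.
 */
static int stdio_pclose(void *opaque)
{
    QEMUFileStdio *s = opaque;
    int status = pclose(s->stdio_file);

    free(s); /* qemu_free(s) in the QEMU tree */

    if (status == -1) {
        return -1; /* pclose() itself failed */
    }
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        return -1; /* child was killed by a signal or exited non-zero */
    }
    return 0;
}
```

Whatever calls the close hook would still have to check the return value for this to change the visible behavior.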
Hmm, this does look problematic - we need the monitor to be responsive
in order to do things like CPU pinning. We need the monitor to be
non-responsive to ensure 'cont' doesn't run until migration has finished.
We can't have it both ways, and the former wins since we need that to be
done before ever letting QEMU start allocating guest RAM pages. So relying
on 'cont' to block is not good. Is the 'cont' even necessary - I remember
seeing somewhere that QEMU unconditionally started its CPUs after an
incoming migration finished?
I've seen patches to change that behavior, so IMHO it's probably not too safe
to depend on it being one way or the other across the versions of QEMU that
libvirt supports.
What I'm tempted to do is append a command that writes a sigil to stderr to
the end of the exec: migration lines specified by libvirt, and then wait for
either that sigil or an error to show up in the log for that domain before
issuing the cont. If my memory is at all correct, libvirt should already have
some helper functions useful for that purpose.
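
Roughly, the wait could look like the sketch below. It is standalone C rather than libvirt code, and it uses none of libvirt's existing helpers. The function names, the sigil and error strings passed in, the 100 ms poll interval and the start offset (so that stale output from an earlier run is skipped) are all assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <time.h>

enum log_result { LOG_PENDING = 0, LOG_DONE = 1, LOG_ERROR = -1 };

/*
 * Scan the log once, starting at start_offset. The first line containing
 * either marker decides the result. A marker split across two fgets()
 * reads (a line longer than the buffer) would be missed.
 */
static enum log_result scan_log(const char *path, long start_offset,
                                const char *sigil, const char *error_marker)
{
    char line[1024];
    enum log_result result = LOG_PENDING;
    FILE *fp = fopen(path, "r");

    if (fp == NULL) {
        return LOG_PENDING; /* log not created yet; keep waiting */
    }
    if (fseek(fp, start_offset, SEEK_SET) != 0) {
        fclose(fp);
        return LOG_PENDING;
    }
    while (fgets(line, sizeof(line), fp) != NULL) {
        if (strstr(line, error_marker) != NULL) {
            result = LOG_ERROR;
            break;
        }
        if (strstr(line, sigil) != NULL) {
            result = LOG_DONE;
            break;
        }
    }
    fclose(fp);
    return result;
}

/*
 * Poll the log every 100 ms until the sigil or an error shows up, or
 * until roughly timeout_ms has elapsed (then LOG_PENDING is returned).
 */
static enum log_result wait_for_sigil(const char *path, long start_offset,
                                      const char *sigil,
                                      const char *error_marker,
                                      int timeout_ms)
{
    const struct timespec delay = { 0, 100 * 1000 * 1000 };
    int waited_ms = 0;

    for (;;) {
        enum log_result r = scan_log(path, start_offset, sigil, error_marker);

        if (r != LOG_PENDING || waited_ms >= timeout_ms) {
            return r;
        }
        nanosleep(&delay, NULL);
        waited_ms += 100;
    }
}
```

The cont would only be issued on LOG_DONE. LOG_ERROR and a timeout would both be treated as a failed incoming migration.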
Does this sound like a reasonable approach?