As noted in another message, the problem I was seeing is a race
condition in qemudDomainRestore(), not a problem with my modifications to
qemudDomainSave(). Here's some discussion about that problem from IRC,
with a question at the bottom:
<laine> Does anyone else see a failure of domain restore (immediately
after domain save)? I'm very definitely seeing it on my machine with
F12+updates testing and libvirt built from unpatched sources.
<laine> It's very reproducible - with virsh I do "save domain
filename", then "restore filename" and it pretty much always gives me
a black screen. Then I force shutdown the guest (with virt-manager)
and do "restore filename" again. Tada! It's restored and running!
[...]
<danpb> laine: possible race condition
<danpb> laine: try putting a sleep(10) before the qemuMonitorStartCPUs
in qemudDomainRestore()
Dan's suggestion *did* eliminate the failures.
[...]
<danpb> laine: this sounds like the issue with libvirt prematurely
starting execution of the CPUs before QEMU has even started restoring
(or something like that)
<danpb> laine: search the archives for a mail from Charles Duffy on
this subject some time ago
Here's the BZ filed by Charles Duffy
https://bugzilla.redhat.com/show_bug.cgi?id=537938
It looks like he's dealing with a race condition earlier in the restore,
since his solution was to wait for the migration process to terminate
somewhere inside qemudStartVMDaemon(), rather than waiting until
qemudStartVMDaemon() has finished (which is what the current code does).
Since that wait has already completed by the time Dan's sleep(10) runs in
my test, I don't think Charles' patch would help this situation.
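For reference, the wait Charles describes amounts to reaping the migration helper process before going any further. A minimal POSIX sketch of that step, assuming the helper is a child process whose pid the caller holds; the function name wait_for_restore_helper is hypothetical and this is not libvirt's actual code:

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>   /* fork()/_exit() on the side that spawns the helper */

/*
 * Hypothetical helper (not libvirt code): block until the migration
 * helper process `pid` has terminated, so the caller does not proceed
 * while the helper is still running.
 *
 * Returns the helper's exit status (0-255) if it exited normally, or -1
 * if waitpid() failed or the helper was killed by a signal.
 */
static int wait_for_restore_helper(pid_t pid)
{
    int status;
    pid_t ret;

    /* pid <= 0 would make waitpid() wait for *any* child; refuse that. */
    assert(pid > 0);

    /* Restart the wait if a signal handler interrupts it. */
    do {
        ret = waitpid(pid, &status, 0);
    } while (ret == -1 && errno == EINTR);

    if (ret == -1)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;
}
```

Where this call sits (inside qemudStartVMDaemon() or after it returns) only changes how early the helper is reaped; either way it runs before the sleep(10) in my test, which is why it doesn't cover the window that sleep hides.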
So is there something libvirt can wait on here to ensure the restore has
completed before it starts the CPUs? Or is the problem in qemu? (I'm still
running qemu 0.11; I'll also try upgrading to 0.12 to see whether the
behavior changes.)
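Dan's sleep(10) hides the race rather than removing it. If libvirt had a condition it could check (which is exactly the open question above), the fixed delay could become a bounded poll. A minimal sketch, assuming a caller-supplied predicate; wait_until_ready, ready_fn and countdown_ready are hypothetical names, and nothing here is an existing libvirt or QEMU interface:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical readiness check: should return true once QEMU has finished
 * loading the incoming state. What libvirt could actually query here is
 * the open question; this is only a placeholder signature. */
typedef bool (*ready_fn)(void *opaque);

/* Call `ready` every `interval_ms` milliseconds until it returns true or
 * about `timeout_ms` milliseconds of sleeping have elapsed. `waited`
 * counts requested sleep time, not wall-clock time, so the real deadline
 * can run slightly long. Returns true if ready, false on timeout. */
static bool wait_until_ready(ready_fn ready, void *opaque,
                             long timeout_ms, long interval_ms)
{
    struct timespec delay;
    long waited = 0;

    assert(interval_ms > 0);
    delay.tv_sec = interval_ms / 1000;
    delay.tv_nsec = (interval_ms % 1000) * 1000000L;

    for (;;) {
        if (ready(opaque))
            return true;
        if (waited >= timeout_ms)
            return false;
        nanosleep(&delay, NULL);  /* a signal just cuts one interval short */
        waited += interval_ms;
    }
}

/* Stand-in predicate for exercising the loop without QEMU: reports ready
 * on the n-th call, where n (>= 1) is the initial value of *(int *)opaque. */
static bool countdown_ready(void *opaque)
{
    int *remaining = opaque;

    if (*remaining > 0)
        (*remaining)--;
    return *remaining == 0;
}
```

With a real check in place of countdown_ready, the restore path would call wait_until_ready() and invoke qemuMonitorStartCPUs() only when it returns true, reporting an error on timeout instead of resuming a guest whose state may not have been loaded yet.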