[libvirt] Silently ignored virDomainRestore failures

25 Sep 2009

Howdy, all.

I maintain a test infrastructure which makes heavy use of virDomainSave 
and virDomainRestore, and have been seeing occasional cases where my 
saved images are for some reason not restored correctly -- and, indeed, 
the incoming migration streams are not even read in their entirety.

While this generally appears to be caused by issues outside of libvirt's 
purview, one unfortunate issue is that libvirt can report success 
performing a restore even when the operation is effectively an abject 
failure.

Consider the following snippet, taken from one of my 
/var/log/libvirt/qemu/<domain>.log files:

LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin USER=root LOGNAME=root 
/usr/bin/qemu-kvm -S -M pc-0.11 -m 512 -smp 1 <...lots of arguments 
here...> -incoming exec:cat
cat: write error: Broken pipe

This leaves a running qemu hosting a catatonic guest -- but the libvirt
client (connecting through the Python bindings) received a status of
success for the restore that launched it.

libvirt's mechanism for validating a successful restore consists of 
running a "cont" command on the guest, and then checking 
virGetLastError(); AIUI, it is expected that the "cont" will not be able 
to run until the restore is completed, as the monitor should not be 
responsive until that time. Browsing through qemudMonitorSendCont (and 
qemudMonitorCommandWithHandler, which it calls), I don't see anything 
which looks at the log file with the stderr output from qemu to 
determine whether an error actually occurred. (As an aside, "info
history" on the guest's monitor socket indicates that this "cont" was
indeed issued.)
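
To make the missing check concrete, here is a rough Python sketch of a
log inspection that could run on the client side today: record the size
of the domain's log before the restore, then scan only what was appended
afterwards for a known failure string. The function name and the pattern
list are hypothetical and not anything libvirt provides; the single
pattern is taken from the log excerpt above.

```python
# Hypothetical failure signatures; the only entry comes from the log above.
FAILURE_PATTERNS = ("write error: Broken pipe",)


def restore_errors_since(log_path, start_offset, patterns=FAILURE_PATTERNS):
    """Return log lines appended after start_offset that match a pattern."""
    # Read in binary mode so that seeking to a byte offset is well defined.
    with open(log_path, "rb") as log:
        log.seek(start_offset)
        appended = log.read().decode("utf-8", "replace")
    return [line for line in appended.splitlines()
            if any(pattern in line for pattern in patterns)]
```

The intended use would be to take os.path.getsize() of the log just
before calling restore, and to treat a non-empty result afterwards as a
failure even though the call itself reported success. This only catches
failures whose text is known in advance, so it is a stopgap at best.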

Should the existing cont+virGetLastError() approach be sufficient to 
handle this class of error? If not, is there any guidance on what would 
constitute a better system? (I suppose we could add something to the
exec: to affirmatively indicate on stderr that the decompressor [or cat,
if not using one] exited successfully, and check for that marker in the
log file... but that seems quite a dirty hack.)
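
For the sake of discussion, a minimal Python sketch of that marker idea
follows. It assumes that qemu hands the exec: command to a shell, so
that "&&" and the stderr redirect behave as written, and that the qemu
command line occupies a single line of the log. The marker string and
both function names are made up for illustration.

```python
SUCCESS_MARKER = "RESTORE_STREAM_OK"  # hypothetical marker string


def incoming_exec_arg(command="cat", marker=SUCCESS_MARKER):
    """Build an -incoming value that echoes the marker to stderr only if
    the command exits with status 0."""
    return "exec:%s && echo %s >&2" % (command, marker)


def stream_completed(log_text, marker=SUCCESS_MARKER):
    """Return True if the marker appears on a line of its own after the
    most recent qemu launch line in the log."""
    lines = log_text.splitlines()
    start = 0
    for index, line in enumerate(lines):
        # The launch line also contains the marker text (inside the
        # -incoming argument), so only lines after it are considered.
        if "-incoming" in line:
            start = index + 1
    return any(line.strip() == marker for line in lines[start:])
```

With the log above, cat exits non-zero after the broken pipe, the echo
never runs, and stream_completed() returns False. Matching the marker
only as a whole line after the last launch line avoids false positives
from the logged command line itself and from earlier restores recorded
in the same file.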

Thanks!

Charles Duffy