
On 08/18/2011 11:42 PM, Osier Yang wrote:
Remember that 'migrate' is a long-running async job command, and can be interrupted. That is, 'service libvirtd restart' is a legal action to take during step 3, and it is not as severe as a libvirtd crash. We have also recently added patches to remember async job status across libvirtd restarts, with the intention of making it legal to restart libvirtd in the middle of an async job. (Whether the async job should still succeed, or should remove the save file, is a slightly different question; but removing the save file would require that we save in the XML the name of the file to remove if libvirtd is restarted.)
Hmm, what about restarting libvirtd in the middle of a managed save?
The domain would then be restored automatically from the corrupt save image. Do we simply report an error like "image is corrupt" and give up on starting the domain? That might not be good, since the user would see a domain that was running fail to start after libvirtd restarts.
Or do we want the managed save to still succeed? If so, we might need to:
1) continue the managed save job (since we already support remembering the async job status across libvirtd restarts)
2) restore from the saved image completed in 1).
I think the easiest approach is: if we restart libvirtd, and see that an async job for save-to-file was in progress, then we abort the job (leaving the file marked unfinished, whether it was managed save or user save), and log the error.

On managed restore (virDomainCreate or autostart), if the save file exists but is incomplete, then log the fact that the file is unusable, then unlink() the file and proceed to do a normal boot (nothing we can do to recover the lost autosave, but we can at least clean up on the user's behalf).

On user restore (virDomainRestore), if the save file exists but is incomplete, report the error to the user. No unlink(), and no rebooting the guest; it's up to the user to decide how to handle the failed save.

But if we can figure out how to do better, by making a libvirtd restart able to complete the save process rather than ditch it, then that would be nicer. It's just that I don't know how easy that would be, and we have to start this patch somewhere.

--
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
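
For illustration only, the restore-time policy proposed above could be sketched roughly as follows. This is not libvirt code: the names RestoreKind, Action and decide_restore are hypothetical, and the is_complete flag stands in for however the save file ends up being marked as unfinished, which the proposal leaves open. The sketch also assumes the save file exists when it is called.

```python
import os
from enum import Enum


class RestoreKind(Enum):
    MANAGED = "managed"  # virDomainCreate or autostart
    USER = "user"        # virDomainRestore


class Action(Enum):
    RESTORE = "restore"            # image is usable, restore from it
    NORMAL_BOOT = "normal-boot"    # discard the image and boot fresh
    REPORT_ERROR = "report-error"  # leave the image alone, tell the user


def decide_restore(path, is_complete, kind, log=print):
    """Pick an action for an existing save image at restore time.

    is_complete is a hypothetical flag; how an unfinished file is
    detected is an assumption and is not specified here.
    """
    if is_complete:
        return Action.RESTORE

    if kind is RestoreKind.MANAGED:
        # The autosave is lost; clean up on the user's behalf.
        log(f"save image {path} is incomplete and unusable; removing it")
        os.unlink(path)
        return Action.NORMAL_BOOT

    # User restore: no unlink(), no reboot; the user decides what to do.
    log(f"save image {path} is incomplete")
    return Action.REPORT_ERROR
```

The only asymmetry is in the incomplete case: a managed restore removes the file and falls back to a normal boot, while a user restore leaves the file in place and surfaces the error.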