[Libvir] Proposal: libvirt should remove or rename a save file after a successful restore

Chris Lalancette found this problem: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=246105 <quote> Once an "xm restore" has successfully completed, the -save file that it came from should really be deleted. Restoring a domain a second time from the same save file will result in disk corruption; the kernel VM state (in the saved file) will be in an inconsistent state with respect to what is actually on disk (in the domain disk file). The only valid use I can see for the -save file after a restore is possibly for some crash analysis, but even that could be worked around by renaming the save file after a success. In particular, I just ran into this situation (and I'm worried customers might do the same): [and then he goes on to describe a scenario in which you can commonly hit this situation] </quote> So the obvious and simple solution, it seems to me, is just to change libvirt.c:virDomainRestore so that if the driver underneath it returns from the restore without error, then we either remove or rename the restore file. It's a trivial patch -- what do people think? My thoughts: * Removing the file might seem quite abrupt/unexpected, and there is the possibility of data loss. * You have to rename the file as a hidden file (dotfile) in order for xendomains to ignore it. However these files are big and having large, hidden files which build up in number over time sounds like it could be a bad thing. * This might be considered a significant behaviour change. * It's easier to handle this for the remote case if we make the change inside libvirt, rather than in virsh etc. Rich. -- Emerging Technologies, Red Hat - http://et.redhat.com/~rjones/ Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SL4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 03798903

On Mon, Jul 02, 2007 at 04:38:48PM +0100, Richard W.M. Jones wrote:
So the obvious and simple solution, it seems to me, is just to change libvirt.c:virDomainRestore so that if the driver underneath it returns from the restore without error, then we either remove or rename the restore file.
It's a trivial patch -- what do people think?
This is not correct as it assumes that the filesystem(s) cannot be rolled back to the state corresponding to that known by the saved domain. regards john

On Mon, Jul 02, 2007 at 04:38:48PM +0100, Richard W.M. Jones wrote:
Chris Lalancette found this problem:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=246105
<quote> Once an "xm restore" has successfully completed, the -save file that it came from should really be deleted. Restoring a domain a second time from the same save file will result in disk corruption; the kernel VM state (in the saved file) will be in an inconsistent state with respect to what is actually on disk (in the domain disk file). The only valid use I can see for the -save file after a restore is possibly for some crash analysis, but even that could be worked around by renaming the save file after a success. In particular, I just ran into this situation (and I'm worried customers might do the same): [and then he goes on to describe a scenario in which you can commonly hit this situation] </quote>
So the obvious and simple solution, it seems to me, is just to change libvirt.c:virDomainRestore so that if the driver underneath it returns from the restore without error, then we either remove or rename the restore file.
It's a trivial patch -- what do people think?
My thoughts: * Removing the file might seem quite abrupt/unexpected, and there is the possibility of data loss.
That would be placing policy into the API which IMHO is wrong. Policy should be in tools instead. So making virsh/virt-manager remove the file is good, but we shouldn't force it on all users of the API. It is quite possible that a user wishes to restore from the same file mulitple times - assuming they know the caveats about snapshotting the filesystem to match their VM state.
* You have to rename the file as a hidden file (dotfile) in order for xendomains to ignore it. However these files are big and having large, hidden files which build up in number over time sounds like it could be a bad thing.
I don't like the idea of renaming just to get around xendomains issues. Ad-hoc saved VMs should not be put in the same directory that xendomains uses for its managed saved-on-shutdown process. This bug should should fixed separately.
* This might be considered a significant behaviour change. * It's easier to handle this for the remote case if we make the change inside libvirt, rather than in virsh etc.
We can't change the existing API semantics in this way. At best we could add an additional variant of the virDomainRestore which added a 'flags' parameter perhaps. Longer term we need to consider expanding the save/restore APIs because both QEMU & Xen now have an idea of 'managed save images' as well as external save images. WIth managed images, you don't supply a filename, instead you simply tell it to save the VM (optionally with a 'tag' to identify the image). XenD / QEMU can list previously saved images, and optionally restore from one, or delete one. This supports the idea of snapshots/checkpoints too. So I think we'll ultimately need several more APIs in this area. The managed save/restore APIs will end up being separate from the existing unmanaged APIs though. Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|
participants (3)
-
Daniel P. Berrange
-
John Levon
-
Richard W.M. Jones