[libvirt-users] managedsave results in unexpected shutdown from inside Windows

Hi,

I'm facing a strange behaviour of Windows 2008 guests. After a virsh managedsave and start, the Windows guest opens an "unexpected shutdown" window. The window looks like this one (randomly taken from the web): http://toastytech.com/guis/srv2k3login2.jpg

This happens for all virtualized Windows guests on the hypervisor.

Has anyone else experienced such an issue?

Thanks,

--
Nicolas Sebrecht

2013/3/12 Nicolas Sebrecht <nsebrecht@piing.fr>
Hi,
I'm facing a strange behaviour of Windows 2008 guests. After a virsh managedsave and start, the Windows guest opens an "unexpected shutdown" window.
The window looks like this one (randomly taken from the web):
http://toastytech.com/guis/srv2k3login2.jpg
This happens for all virtualized Windows guests on the hypervisor.
Has anyone else experienced such an issue?
Thanks,
Your pic shows that you are using a Windows 2003 guest, not Windows 2008. As far as I know, if a Windows 2003 guest is powered off directly (not a normal shutdown), then this window will open at the next boot.

The 12/03/13, Gao Yongwei wrote:
2013/3/12 Nicolas Sebrecht <nsebrecht@piing.fr>
Hi,
I'm facing a strange behaviour of Windows 2008 guests. After a virsh managedsave and start, the Windows guest opens an "unexpected shutdown" window.
The window looks like this one (randomly taken from the web):
http://toastytech.com/guis/srv2k3login2.jpg
This happens for all virtualized Windows guests on the hypervisor.
Has anyone else experienced such an issue?
Thanks,
Your pic shows that you are using a Windows 2003 guest, not Windows 2008; as
Yes, I didn't find a 2008 picture. :-)
far as I know, if a Windows 2003 guest is powered off directly (not a normal shutdown), then this window will open at the next boot.
Yes, this is the window we get on poweroff. The thing is that this window pops up after a managedsave... Odd.

--
Nicolas Sebrecht

On 03/12/2013 07:35 AM, Nicolas Sebrecht wrote:
Hi,
I'm facing a strange behaviour of Windows 2008 guests. After a virsh managedsave and start, the Windows guest opens an "unexpected shutdown" window.
That shouldn't be happening - a start after a managedsave should be restoring the guest to the same state as at the save. It sounds like you may have run into a corrupted managedsave file, so libvirt punted and booted the guest from scratch instead of restoring state; booting from scratch without a clean shutdown would explain the symptoms of the OS complaining.

Did you upgrade qemu in between when you saved your guest and restarted it? If so, this may be more of a qemu bug about not handling incoming migration of data generated from an older qemu. Are you sure that the managed save data was not corrupted, such as a power outage occurring before the managed save file was completely flushed to disk?

--
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

The 13/03/13, Eric Blake wrote:
That shouldn't be happening - a start after a managedsave should be restoring the guest to the same state as at the save. It sounds like you may have run into a corrupted managedsave file, so libvirt punted and booted the guest from scratch instead of restoring state; booting from scratch without a clean shutdown would explain the symptoms of the OS complaining.
Did you upgrade qemu in between when you saved your guest and restarted it? If so, this may be more of a qemu bug about not handling incoming migration of data generated from an older qemu. Are you sure that the managed save data was not corrupted, such as a power outage occurring before the managed save file was completely flushed to disk?
I use a script started each night to (managed)save the guests and upload them to an FTP server. Basically, I do:

virsh managedsave guest
<upload managedsave file>
<upload guest disks>
virsh start guest

So, there is no qemu update and the hypervisor uptime is pretty high. I'll investigate the corruption possibility.

Thanks for your feedback.

--
Nicolas Sebrecht

On 03/13/2013 09:25 AM, Nicolas Sebrecht wrote:
I use a script started each night to (managed)save the guests and upload them to an FTP server. Basically, I do:
virsh managedsave guest
<upload managedsave file>
<upload guest disks>
virsh start guest
You might want to look into external snapshots as a more efficient way of taking guest snapshots. With virsh managedsave/start, you are guaranteed guest downtime of several seconds at a minimum, and you have to do the disk management yourself; but with external snapshots, you can take a consistent picture of guest state at a point in time with sub-second downtime (the same way live migration between hosts has sub-second downtime).

--
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

The 13/03/13, Eric Blake wrote:
You might want to look into external snapshots as a more efficient way of taking guest snapshots.
I have guests with raw disks due to Windows performance issues. It would be very welcome to have minimal downtime as some disks are quite large (terabytes) and the allowed downtime window is very short. Let's try external snapshots for guest "VM" while running:

# cd virtuals/images
# virsh
virsh> snapshot-create-as VM snap1 "snap1 VM" --memspec file=VM.save,snapshot=external --diskspec vda,snapshot=external,file=VM-snap1.img
Domain snapshot snap1 created
virsh> exit
# ls VM-snap1.img
ls: cannot access VM-snap1.img: No such file or directory
#

Ooch! <investigating...>

# ls /VM-snap1.img
/VM-snap1.img
# ls /VM.save
/VM.save
#

Surprising! I would have expected the files to be stored in virtuals/images. This is not the point for now, let's continue.

# virsh snapshot-list VM
 Name                 Creation Time             State
------------------------------------------------------------
 snap1                2013-03-14 12:20:01 +0100 running
#

USE CASE 1: restoring from backing file
=======================================

# virsh shutdown VM

I can't find a snapshot-* doing what I want (snapshot-revert expects to revert a snapshot), trying restore.

# virsh restore /VM.save
Domain restored from /VM.save
# LANG=C virsh snapshot-list VM
 Name                 Creation Time             State
------------------------------------------------------------
 snap1                2013-03-14 12:20:01 +0100 running
#

As we might expect, the snapshot is still there.

# virsh snapshot-delete VM snap1
error: Failed to delete snapshot snap1
error: unsupported configuration: deletion of 1 external disk snapshots not supported yet
#

A bit annoying. Now, it seems that I have to manually delete garbage. Actually, I've tried and I had to delete /VM.save, /VM-snap1.img, /var/lib/libvirt/qemu/snapshot/VM/snap1.xml and restart libvirt (no snapshot-refresh).

USE CASE 2: the files are saved in another place, let's merge back the changes
==============================================================================

The idea is to merge VM-snap1.img back to VM.raw with minimal downtime. I can't find a command for that, let's try manually.

# virsh managedsave VM
# qemu-img commit /VM-snap1.img
# rm /VM-snap1.img /VM.save
# virsh start VM
error: Failed to start domain VM
error: cannot open file 'VM-snap1.img': No such file or directory
# virsh edit VM
<virsh edit VM to come back to vda -> VM.raw>
# virsh start VM
error: Failed to start domain VM
error: cannot open file 'VM-snap1.img': No such file or directory
#

Looks like the complaint comes from the XML state header.

# virsh save-image-edit /var/lib/libvirt/qemu/save/VM.save
<virsh edit VM to come back to vda -> VM.raw>
error: operation failed : new xml too large to fit in file
#

Stuck. :-/

--
Nicolas Sebrecht

On 03/14/2013 06:29 AM, Nicolas Sebrecht wrote:
The 13/03/13, Eric Blake wrote:
You might want to look into external snapshots as a more efficient way of taking guest snapshots.
I have guests with raw disks due to Windows performance issues. It would be very welcome to have minimal downtime as some disks are quite large (terabytes) and the allowed downtime window is very short. Let's try external snapshots for guest "VM" while running:
Do be aware that an external snapshot means you are no longer using a raw image - it forces you to use a qcow2 file that wraps your raw image. With new enough libvirt and qemu, it is also possible to use 'virsh blockcopy' instead of snapshots as a backup mechanism, and THAT works with raw images without forcing your VM to use qcow2. But right now, it only works with transient guests (getting it to work for persistent guests requires a persistent bitmap feature that has been proposed for qemu 1.5, along with more libvirt work to take advantage of persistent bitmaps). There's also a proposal on the qemu lists to add a block-backup job, which I would need to expose in libvirt, which has even nicer backup semantics than blockcopy, and does not need a persistent bitmap.
# cd virtuals/images
# virsh
virsh> snapshot-create-as VM snap1 "snap1 VM" --memspec file=VM.save,snapshot=external --diskspec vda,snapshot=external,file=VM-snap1.img
Domain snapshot snap1 created
virsh> exit
# ls VM-snap1.img
ls: cannot access VM-snap1.img: No such file or directory
Specify an absolute path, not a relative one.
#
Ooch! <investigating...>
# ls /VM-snap1.img
/VM-snap1.img
# ls /VM.save
/VM.save
#
Surprising! I would have expected the files to be stored in virtuals/images. This is not the point for now, let's continue.
Actually, it would probably be better if libvirtd errored out on relative path names (relative to what? libvirtd runs in '/', and has no idea what directory virsh was running in), and therefore virsh should be nice and convert names to absolute before handing them to libvirtd.
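For example, the same command with absolute paths (the /virtuals/images prefix is just illustrative; adjust it to wherever your images really live) would have put the files where you expected:

virsh snapshot-create-as VM snap1 "snap1 VM" \
  --memspec file=/virtuals/images/VM.save,snapshot=external \
  --diskspec vda,snapshot=external,file=/virtuals/images/VM-snap1.img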
# virsh snapshot-list VM
 Name                 Creation Time             State
------------------------------------------------------------
 snap1                2013-03-14 12:20:01 +0100 running
#
USE CASE 1: restoring from backing file
=======================================
# virsh shutdown VM
I can't find a snapshot-* doing what I want (snapshot-revert expects to revert a snapshot), trying restore.
Correct - we still don't have 'snapshot-revert' wired up in libvirt to revert to an external snapshot - we have ideas on what needs to happen, but it will take time to get that code into the code base. So for now, you have to do that manually.
# virsh restore /VM.save
Domain restored from /VM.save
Hmm. This restored the memory state from the point at which the snapshot was taken, but unless you were careful to check that the saved state referred to the base file name and not the just-created qcow2 wrapper from when you took the snapshot, then your disks might be in an inconsistent state with the memory you are loading. Not good. Also, restoring from the base image means that you are invalidating the contents of the qcow2 file for everything that took place after the snapshot was taken.
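As a quick sanity check before restoring, you can dump the XML embedded in the save image and look at which file vda points to, something along these lines:

virsh save-image-dumpxml /VM.save | grep "source file"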
# LANG=C virsh snapshot-list VM
 Name                 Creation Time             State
------------------------------------------------------------
 snap1                2013-03-14 12:20:01 +0100 running
#
As we might expect, the snapshot is still there.
# virsh snapshot-delete VM snap1
error: Failed to delete snapshot snap1
error: unsupported configuration: deletion of 1 external disk snapshots not supported yet
Yeah, again a known limitation. Once you change state behind libvirt's back (because libvirt doesn't yet have snapshot-revert wired up to do things properly), you generally have to 'virsh snapshot-delete --metadata VM snap1' to tell libvirt to forget the snapshot existed, but without trying to delete any files, since you did the file deletion manually.
#
A bit annoying. Now, it seems that I have to manually delete garbage. Actually, I've tried and I had to delete /VM.save, /VM-snap1.img, /var/lib/libvirt/qemu/snapshot/VM/snap1.xml and restart libvirt (no snapshot-refresh).
You have to delete /VM.save and /VM-snap1.img yourself, but you should have used 'virsh snapshot-delete --metadata' instead of mucking around in /var/lib (that directory should not be managed manually).
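In other words, with the names from your run, the cleanup should have been just:

virsh snapshot-delete --metadata VM snap1
rm /VM.save /VM-snap1.img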
USE CASE 2: the files are saved in another place, let's merge back the changes
==============================================================================
The idea is to merge VM-snap1.img back to VM.raw with minimal downtime. I can't find a command for that, let's try manually.
Here, qemu is at fault. They have not yet given us a command to do that with minimal downtime. They HAVE given us 'virsh blockcommit', but it is currently limited to reducing chains of length 3 or longer to chains of at least 2. It IS possible to merge back into a single file while the guest remains live, by using 'virsh blockpull', but that single file will end up being qcow2; and it takes the time proportional to the size of the entire disk, rather than to the size of the changes since the snapshot was taken. Again, here's hoping that qemu 1.5 gives us live commit support, for taking a chain of 2 down to a single raw image.
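For reference, the flatten-while-running alternative is a one-liner (but the result stays qcow2, and it copies data proportional to the whole disk rather than to the delta):

virsh blockpull dom vda --wait --verbose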
# virsh managedsave VM
# qemu-img commit /VM-snap1.img
# rm /VM-snap1.img /VM.save
# virsh start VM
error: Failed to start domain VM
error: cannot open file 'VM-snap1.img': No such file or directory
You went behind libvirt's back and removed /VM-snap1.img, but failed to update the managedsave image to record the location of the new filename.
# virsh edit VM
<virsh edit VM to come back to vda -> VM.raw>
Yes, but you still have the managedsave image in the way.
# virsh start VM
error: Failed to start domain VM
error: cannot open file 'VM-snap1.img': No such file or directory
Try 'virsh managedsave-remove VM' to get the broken managedsave image out of the way. Or, if you are brave, and insist on rebooting from the memory state at which the managedsave image was taken but are sure you have tweaked the disks correctly to match the same point in time, then you can use 'virsh save-image-edit /path/to/managedsave' (assuming you know where to look in /etc to find where the managedsave file was stored internally by libvirt). Since modifying files in /etc is not typically recommended, I will assume that if you can find the right file to edit, you are already brave enough to take on the consequences of going behind libvirt's back. At any rate, editing the managed save image to point back to the correct raw file name, followed by 'virsh start', will let you resume with memory restored to the point of your managed save (and hopefully you pointed the disks to the same point in time).
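In other words, the simple (state-discarding) recovery is roughly:

virsh managedsave-remove VM
virsh start VM    # fresh boot from the now-committed raw disk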
#
Looks like the complaint comes from the XML state header.
# virsh save-image-edit /var/lib/libvirt/qemu/save/VM.save
<virsh edit VM to come back to vda -> VM.raw>
error: operation failed : new xml too large to fit in file
#
Aha - so you ARE brave, and DID try to edit the managedsave file. I'm surprised that you really hit a case where your edits pushed the XML over a 4096-byte boundary. Can you come up with a way to (temporarily) use shorter names, such as having /VM-snap1.img be a symlink to the real file, just long enough for you to get the domain booted again?

Also, I hope that you did your experimentation on a throwaway VM, and not on a production one, in case you did manage to fubar things to the point of data corruption by mismatching disk state vs. memory state. Yes, I know that reverting to snapshots is still very much a work in progress in libvirt, and that you are not the first to ask these sorts of questions (reading the list archives will show that this topic comes up quite frequently).

--
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

The 14/03/13, Eric Blake wrote:
On 03/14/2013 06:29 AM, Nicolas Sebrecht wrote:
The 13/03/13, Eric Blake wrote:
You might want to look into external snapshots as a more efficient way of taking guest snapshots.
I have guests with raw disks due to Windows performance issues. It would be very welcome to have minimal downtime as some disks are quite large (terabytes) and the allowed downtime window is very short. Let's try external snapshots for guest "VM" while running:
Do be aware that an external snapshot means you are no longer using a raw image - it forces you to use a qcow2 file that wraps your raw image.
Yes, that's what I understood from the man pages. This would not be a problem as long as it is only a temporary state while doing the backups.

To resume the context for further readers from the archives: the idea is to use external snapshots in order to have minimal downtime, instead of using managedsave (aka hibernate). This is possible even though not all features are implemented in libvirt yet (it depends on the original disk format and the development state).

Here are the basic steps (see the sketch after this list). This is still not that simple and there are tricky parts in the way.

Usual workflow (use case 2)
===========================

Step 1: create external snapshot for all VM disks (includes VM state).
Step 2: do the backups manually while the VM is still running (original disks and memory state).
Step 3: save and halt the vm state once backups are finished.
Step 4: merge the snapshots (qcow2 disk wrappers) back to their backing file.
Step 5: start the VM.

Restarting from the backup (use case 1)
=======================================

Step A: shutdown the running VM and move it out the way.
Step B: restore the backing files and state file from the archives of step 2.
Step C: restore the VM. (still not sure on that one, see below)

I wish to provide a more detailed procedure in the future.
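As a rough, shell-script-style sketch of use case 2 for a single-disk guest (all file names and paths below are made up for illustration; the save-image-edit step is the manual fix-up discussed earlier in the thread, and it opens an interactive editor):

# Step 1: external snapshot of disk + memory at the backup point
virsh snapshot-create-as VM backup-snap "nightly backup" \
  --memspec file=/virtuals/images/VM.mem,snapshot=external \
  --diskspec vda,snapshot=external,file=/virtuals/images/VM-backup.qcow2
# Step 2: copy the now-quiescent backing file and the memory state away
cp /virtuals/images/VM.raw /virtuals/images/VM.mem /backups/
# Step 3: save the running state and stop the guest
virsh save VM /virtuals/images/VM-running.save
# Step 4: fold the qcow2 wrapper back into the raw backing file
qemu-img commit /virtuals/images/VM-backup.qcow2
# Step 5: point the saved image back at VM.raw (format raw), then resume
virsh save-image-edit /virtuals/images/VM-running.save
virsh restore /virtuals/images/VM-running.save
# cleanup: forget the snapshot metadata and remove the leftovers
virsh snapshot-delete --metadata VM backup-snap
rm /virtuals/images/VM-backup.qcow2 /virtuals/images/VM.mem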
With new enough libvirt and qemu, it is also possible to use 'virsh blockcopy' instead of snapshots as a backup mechanism, and THAT works with raw images without forcing your VM to use qcow2. But right now, it only works with transient guests (getting it to work for persistent guests requires a persistent bitmap feature that has been proposed for qemu 1.5, along with more libvirt work to take advantage of persistent bitmaps).
Fine. Sadly, my guests are not transient. It appears I'm in the worst case for all options. :-)
There's also a proposal on the qemu lists to add a block-backup job, which I would need to expose in libvirt, which has even nicer backup semantics than blockcopy, and does not need a persistent bitmap.
Ok.
Surprising! I would have expected the files to be stored in virtuals/images. This is not the point for now, let's continue.
Actually, it would probably be better if libvirtd errored out on relative path names (relative to what? libvirtd runs in '/', and has no idea what directory virsh was running in), and therefore virsh should be nice and convert names to absolute before handing them to libvirtd.
Ok. I guess an error for relative paths would be fine to avoid unexpected paths. All embedded consoles I know support relative paths (e.g. python, irb, the rails console, etc.).
USE CASE 1: restoring from backing file
=======================================
<...>
Correct - we still don't have 'snapshot-revert' wired up in libvirt to revert to an external snapshot - we have ideas on what needs to happen, but it will take time to get that code into the code base. So for now, you have to do that manually.
Fine.
# virsh restore /VM.save
Domain restored from /VM.save
Hmm. This restored the memory state from the point at which the snapshot was taken, but unless you were careful to check that the saved state referred to the base file name and not the just-created qcow2 wrapper from when you took the snapshot, then your disks might be in an inconsistent state with the memory you are loading. Not good. Also, restoring from the base image means that you are invalidating the contents of the qcow2 file for everything that took place after the snapshot was taken.
Wait, wait, wait. Here, I want to restore my backup. So, I use the first memory state created in use case 2 with the snapshot-create-as command. Sorry, I didn't use the use case numbers in chronological order; I'll keep them as they are to avoid adding confusion.
From what I've checked with save-image-edit, this memory state points to the VM.raw disk (what I would expect).
Here is where we are in the workflow (step C) for what we are talking about:

Step 1: create external snapshot for all VM disks (includes VM state).
Step 2: do the backups manually while the VM is still running (original disks and memory state).
Step 3: save and halt the vm state once backups are finished.
Step 4: merge the snapshots (qcow2 disk wrappers) back to their backing file.
Step 5: start the VM.

<For whatever reason, I have to restore the backup from step 2>

Step A: shutdown the running VM and move it out the way.
Step B: restore the backing files and state file from the archives of step 2.
Step C: restore the VM.

So, yes: this is the memory state from the point at which the snapshot was taken, but I clearly expect it to point to the backing file only.
Yeah, again a known limitation. Once you change state behind libvirt's back (because libvirt doesn't yet have snapshot-revert wired up to do things properly), you generally have to 'virsh snapshot-delete --metadata VM snap1' to tell libvirt to forget the snapshot existed, but without trying to delete any files, since you did the file deletion manually.
Good, this is what I was missing.
You have to delete /VM.save and /VM-snap1.img yourself, but you should have used 'virsh snapshot-delete --metadata' instead of mucking around in /var/lib (that directory should not be managed manually).
Ok.
USE CASE 2: the files are saved in another place, let's merge back the changes
==============================================================================
The idea is to merge VM-snap1.img back to VM.raw with minimal downtime. I can't find a command for that, let's try manually.
Here, qemu is at fault. They have not yet given us a command to do that with minimal downtime. They HAVE given us 'virsh blockcommit', but it is currently limited to reducing chains of length 3 or longer to chains of at least 2. It IS possible to merge back into a single file while the guest remains live, by using 'virsh blockpull', but that single file will end up being qcow2; and it takes the time proportional to the size of the entire disk, rather than to the size of the changes since the snapshot was taken. Again, here's hoping that qemu 1.5 gives us live commit support, for taking a chain of 2 down to a single raw image.
Ok, good to know.
You went behind libvirt's back and removed /VM-snap1.img, but failed to update the managedsave image to record the location of the new filename.
<...>
Yes, but you still have the managedsave image in the way.
Right.
# virsh start VM
error: Failed to start domain VM
error: cannot open file 'VM-snap1.img': No such file or directory
Try 'virsh managedsave-remove VM' to get the broken managedsave image out of the way.
Well, no. I would expect to come back to the exact same environment as after the backup. To do so, I expect to be able to do steps 3, 4 and 5 cleanly.

Step 3: save and halt the vm state once backups are finished.
Step 4: merge the snapshots (qcow2 disk wrappers) back to their backing file.
Step 5: start the VM.
Or, if you are brave, and insist on rebooting from the memory state at which the managedsave image was taken but are sure you have tweaked the disks correctly to match the same point in time, then you can use 'virsh save-image-edit /path/to/managedsave' (assuming you know where to look in /etc to find where the managedsave file was stored internally by libvirt). Since modifying files in /etc is not typically recommended, I will assume that if you can find the right file to edit, you are already brave enough to take on the consequences of going behind libvirt's back. At any rate, editing the managed save image to point back to the correct raw file name, followed by 'virsh start', will let you resume with memory restored to the point of your managed save (and hopefully you pointed the disks to the same point in time).
Exactly.
Looks like the complaint comes from the XML state header.
# virsh save-image-edit /var/lib/libvirt/qemu/save/VM.save
<virsh edit VM to come back to vda -> VM.raw>
error: operation failed : new xml too large to fit in file
#
Aha - so you ARE brave, and DID try to edit the managedsave file. I'm surprised that you really hit a case where your edits pushed the XML over a 4096-byte boundary. Can you come up with a way to (temporarily) use shorter names, such as having /VM-snap1.img be a symlink to the real file, just long enough for you to get the domain booted again?
Excellent. I don't know why I didn't think about trying that. Tested, and the symlink trick works fine. I had to change the disk format in the memory header, of course. BTW, I guess I can prevent that by giving the snapshot an absolute path that is longer than the original disk path, so that the later edit only shrinks the XML.
Also, I hope that you did your experimentation on a throwaway VM, and not on a production one, in case you did manage to fubar things to the point of data corruption by mismatching disk state vs. memory state.
I did everything in a testing environment where breaking guests or hypervisor does not matter.
Yes, I know that reverting to snapshots is still very much a work in progress in libvirt, and that you are not the first to ask these sorts of questions (reading the list archives will show that this topic comes up quite frequently).
While this is WIP and as users regularly ask for it, I wonder if it would be worth writing a simple and dirty python script for others. What I would provide would ask each one to hack the script for their own environment.

Thanks a lot for your patience and interesting information.

--
Nicolas Sebrecht

On 03/15/2013 06:17 AM, Nicolas Sebrecht wrote:
Here are the basic steps. This is still not that simple and there are tricky parts in the way.
Usual workflow (use case 2)
===========================
Step 1: create external snapshot for all VM disks (includes VM state).
Step 2: do the backups manually while the VM is still running (original disks and memory state).
Step 3: save and halt the vm state once backups are finished.
Step 4: merge the snapshots (qcow2 disk wrappers) back to their backing file.
Step 5: start the VM.
This involves guest downtime, longer according to how much state changed since the snapshot.
Restarting from the backup (use case 1)
=======================================
Step A: shutdown the running VM and move it out the way.
Step B: restore the backing files and state file from the archives of step 2.
Step C: restore the VM. (still not sure on that one, see below)
I wish to provide a more detailed procedure in the future.
With new enough libvirt and qemu, it is also possible to use 'virsh blockcopy' instead of snapshots as a backup mechanism, and THAT works with raw images without forcing your VM to use qcow2. But right now, it only works with transient guests (getting it to work for persistent guests requires a persistent bitmap feature that has been proposed for qemu 1.5, along with more libvirt work to take advantage of persistent bitmaps).
Fine. Sadly, my guests are not transient.
Guests can be made temporarily transient. That is, the following sequence has absolute minimal guest downtime, and can be done without any qcow2 files in the mix. For a guest with a single disk, there is ZERO! downtime:

virsh dumpxml --security-info dom > dom.xml
virsh undefine dom
virsh blockcopy dom vda /path/to/backup --wait --verbose --finish
virsh define dom.xml

For a guest with multiple disks, the downtime can be sub-second, if you script things correctly (the downtime lasts for the duration between the suspend and resume, but the steps done in that time are all fast):

virsh dumpxml --security-info dom > dom.xml
virsh undefine dom
virsh blockcopy dom vda /path/to/backup-vda
virsh blockcopy dom vdb /path/to/backup-vdb
polling loop - check periodically until 'virsh blockjob dom vda' and 'virsh blockjob dom vdb' both show 100% completion
virsh suspend dom
virsh blockjob dom vda --abort
virsh blockjob dom vdb --abort
virsh resume dom
virsh define dom.xml

In other words, 'blockcopy' is my current preferred method of online guest backup, even though I'm still waiting for qemu improvements to make it even nicer.
It appears I'm in the worst case for all options. :-)
Not if you don't mind being temporarily transient.
There's also a proposal on the qemu lists to add a block-backup job, which I would need to expose in libvirt, which has even nicer backup semantics than blockcopy, and does not need a persistent bitmap.
Ok.
For that, I will probably be adding a 'virsh blockbackup dom vda' command.
Surprising! I would have expect files to be stored in virtuals/images. This is not the point for now, let's continue.
Actually, it would probably be better if libvirtd errored out on relative path names (relative to what? libvirtd runs in '/', and has no idea what directory virsh was running in), and therefore virsh should be nice and convert names to absolute before handing them to libvirtd.
Ok. I guess an error for relative paths would be fine to avoid unexpected paths. All embedded console I know support relative path (e.g.: python, irb, rails console, etc).
virsh would still support relative paths, it's just the underlying libvirtd that should require absolute paths (in other words, the UI should do the normalizing, so that the RPC is unambiguous; right now the RPC is doing the normalization, but to the wrong directory because it doesn't know what the working directory of the UI is). This is a bug, but easy enough to fix, and in the meantime, easy enough for you to work around (use absolute instead of relative, until libvirt 1.0.4 is out).
Here is where we are in the workflow (step C) for what we are talking about:
Step 1: create external snapshot for all VM disks (includes VM state).
Step 2: do the backups manually while the VM is still running (original disks and memory state).
During this step, the qcow2 files created in step 1 are getting larger proportional to the amount of changes done in the guest; obviously, the faster you can complete it, the smaller the deltas will be, and the faster your later merge steps will be. Since later merge steps have to be done while the guest is halted, it's good to keep small size in mind. More on this thought below...
Step 3: save and halt the vm state once backups are finished.
By 'halt the vm state', do you mean power it down, so that you would be doing a fresh boot (aka 'virsh shutdown dom', do your work including 'virsh edit dom', 'virsh start dom')? Or do you mean 'take yet another snapshot', so that you stop qemu, manipulate things to point to the right files, then start a new qemu picking up at the same point where the running guest left off (aka 'virsh save dom file', do your work including 'virsh save-image-edit file', 'virsh restore file')?

My advice: Don't use managedsave. At this point, it just adds more confusion, and you are better off directly using 'virsh save' (managedsave is just a special case of 'virsh save', where libvirt picks the file name on your behalf, and where 'virsh start' is smart enough to behave like 'virsh restore file' on that managed name - but that extra magic in 'virsh start' makes life that much harder for you to modify what the guest will start with).
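For concreteness, a rough sketch of the two options (file names are illustrative, single disk vda, snapshot overlay snap1.qcow2):

# offline variant: fresh boot after merging the overlay
virsh shutdown dom
qemu-img commit /path/to/snap1.qcow2
virsh edit dom      # point vda back at the raw file
virsh start dom

# online variant: resume exactly where the guest left off
virsh save dom /path/to/dom.state
qemu-img commit /path/to/snap1.qcow2
virsh save-image-edit /path/to/dom.state    # point vda back at the raw file
virsh restore /path/to/dom.state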
Step 4: merge the snapshots (qcow2 disk wrappers) back to their backing file.
This step is done with raw qemu-img commands at the moment, and takes time proportional to the size of the qcow2 data.
Step 5: start the VM.
Based on how you stopped the vm in step 3, this is either via 'virsh start' (assuming you did 'virsh shutdown') or via 'virsh restore' (assuming you did 'virsh save'; with the special case that if you did 'virsh managedsave', 'virsh start' behaves like 'virsh restore').

...As mentioned above, the time taken in step 2 can affect how big the delta is, and therefore how long step 4 lasts (while the guest is offline). If your original disk is huge, and copying it to your backup takes a long amount of time, it may pay to do an iterative approach:

start with raw image:
  raw.img

create the external snapshot at the point you care about:
  raw.img <- snap1.qcow2

transfer raw.img and the vmstate file to backup storage, taking as long as needed (gigabytes of data, so on the order of minutes, during which the qcow2 files can build up to megabytes in size):
  raw.img <- snap1.qcow2

create another external snapshot, but this time with --disk-only --no-metadata (we don't plan on reverting to this point in time):
  raw.img <- snap1.qcow2 <- snap2.qcow2

use 'virsh blockcommit dom vda --base /path/to/raw --top /path/to/snap1 --wait --verbose'; this takes time for megabytes of storage, but not gigabytes, so it is faster than the time to copy raw.img, which means snap2.qcow2 will hold less delta data than snap1.qcow2:
  raw.img <- snap2.qcow2

now stop the guest, commit snap2.qcow2 into raw.img, and restart the guest.

By doing an iteration, you've reduced the size of the file that has to be committed while the guest is offline, and may be able to achieve a noticeable reduction in guest downtime.
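Spelled out with concrete commands (names and paths are made up; assumes a single disk vda and the 'virsh save' variant for the final offline window):

# snapshot 1: the point in time to back up (disk + memory)
virsh snapshot-create-as dom snap1 \
  --memspec file=/images/dom.mem,snapshot=external \
  --diskspec vda,snapshot=external,file=/images/snap1.qcow2
# long-running copy of the backing file and memory state to backup storage
cp /images/raw.img /images/dom.mem /backup/
# snapshot 2: disk only, no libvirt metadata; snap1.qcow2 stops growing
virsh snapshot-create-as dom --no-metadata --disk-only \
  --diskspec vda,snapshot=external,file=/images/snap2.qcow2
# fold snap1 into raw.img while the guest keeps running (chain of 3 -> 2)
virsh blockcommit dom vda --base /images/raw.img --top /images/snap1.qcow2 --wait --verbose
# short offline window: save, commit the small snap2 delta, fix the path, resume
virsh save dom /images/dom.state
qemu-img commit /images/snap2.qcow2
virsh save-image-edit /images/dom.state    # point vda back at raw.img (format raw)
virsh restore /images/dom.state
rm /images/snap1.qcow2 /images/snap2.qcow2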
<For whatever reason, I have to restore the backup from step 2> Step A: shutdown the running VM and move it out the way.
Here, 'virsh destroy' is fine, if you don't care about maintaining parallel branches of execution past the snapshot point. If you DO plan on toggling between two parallel branches from a common snapshot, be sure to take another snapshot at this point in time.
Step B: restore the backing files and state file from the archives of step 2. Step C: restore the VM.
Here, you need to use 'virsh restore' on the file that holds the vm state from the point of the snapshot.
So, yes: this is the memory state from the point at which the snapshot was taken but I clearly expect it to point to the backing file only.
You can double-check what it points to with 'virsh save-image-dumpxml', to make sure.
Yeah, again a known limitation. Once you change state behind libvirt's back (because libvirt doesn't yet have snapshot-revert wired up to do things properly), you generally have to 'virsh snapshot-delete --metadata VM snap1' to tell libvirt to forget the snapshot existed, but without trying to delete any files, since you did the file deletion manually.
Good, this is what I was missing.
In fact, if you KNOW you don't care about libvirt tracking snapshots, you can do 'virsh snapshot-create[-as] --no-metadata dom ...' in the first place, so that you get the side effects of external file creation without any of the (soon-to-be-useless) metadata in the first place.
Try 'virsh managedsave-remove VM' to get the broken managedsave image out of the way.
Well, no. I would expect to come back to the exact same environment as after the backup. To do so, I expect to be able to do steps 3, 4 and 5 cleanly.
Again, my recommendation is to NOT use managedsave. It changes what 'virsh start' will do: if there is a managedsave image present, it takes precedence over anything you do in 'virsh edit', unless you use 'virsh start --force-boot' to intentionally discard the managedsave image. On the other hand, since the managedsave image is the only record of the running vm state, you don't want it discarded, which means your attempt to use 'virsh edit' are useless, and you are forced to use 'virsh save-image-edit' on a file that should be internal to libvirt.
Step 3: save and halt the vm state once backups are finished.
Step 4: merge the snapshots (qcow2 disk wrappers) back to their backing file.
Step 5: start the VM.
Sounds like we discussed this up above - you can either do this offline (virsh shutdown, edit, virsh start) or on a running image (virsh save, edit, virsh restore).
# virsh save-image-edit /var/lib/libvirt/qemu/save/VM.save
<virsh edit VM to come back to vda -> VM.raw>
error: operation failed : new xml too large to fit in file
#
Aha - so you ARE brave, and DID try to edit the managedsave file. I'm surprised that you really hit a case where your edits pushed the XML over a 4096-byte boundary. Can you come up with a way to (temporarily) use shorter names, such as having /VM-snap1.img be a symlink to the real file, just long enough for you to get the domain booted again?
Excellent. I don't know why I didn't think about trying that. Tested and the symlink trick works fine. I had to change the disk format in the memory header, of course.
BTW, I guess I can prevent that by giving the snapshot an absolute path that is longer than the original disk path.
Yeah, being more careful about the saved image that you create in the first place will make it less likely that changing the save image adds enough content to push XML over a 4096-byte boundary.
Also, I hope that you did your experimentation on a throwaway VM, and not on a production one, in case you did manage to fubar things to the point of data corruption by mismatching disk state vs. memory state.
I did everything in a testing environment where breaking guests or hypervisor does not matter.
Always good advice, when trying something new and potentially dangerous :)

--
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

I'm splitting my answer into different mails, one mail per strategy, so that I don't mix things up.

The 15/03/13, Eric Blake wrote:
On 03/15/2013 06:17 AM, Nicolas Sebrecht wrote:
Here are the basic steps. This is still not that simple and there are tricky parts in the way.
Usual workflow (use case 2)
===========================
Step 1: create external snapshot for all VM disks (includes VM state).
Step 2: do the backups manually while the VM is still running (original disks and memory state).
Step 3: save and halt the vm state once backups are finished.
Step 4: merge the snapshots (qcow2 disk wrappers) back to their backing file.
Step 5: start the VM.
This involves guest downtime, longer according to how much state changed since the snapshot.
Yes.
Guests can be made temporarily transient. That is, the following sequence has absolute minimal guest downtime, and can be done without any qcow2 files in the mix. For a guest with a single disk, there is ZERO! downtime:
virsh dumpxml --security-info dom > dom.xml
virsh undefine dom
virsh blockcopy dom vda /path/to/backup --wait --verbose --finish
virsh define dom.xml
For a guest with multiple disks, the downtime can be sub-second, if you script things correctly (the downtime lasts for the duration between the suspend and resume, but the steps done in that time are all fast):
virsh dumpxml --security-info dom > dom.xml
virsh undefine dom
virsh blockcopy dom vda /path/to/backup-vda
virsh blockcopy dom vdb /path/to/backup-vdb
polling loop - check periodically until 'virsh blockjob dom vda' and 'virsh blockjob dom vdb' both show 100% completion
virsh suspend dom
virsh blockjob dom vda --abort
virsh blockjob dom vdb --abort
virsh resume dom
virsh define dom.xml
In other words, 'blockcopy' is my current preferred method of online guest backup, even though I'm still waiting for qemu improvements to make it even nicer.
Thanks for the procedure. The hypervisor in production I'm working on is running libvirt v0.9.8, and blockcopy was not supported at that time. Also, I'm seeing that blockcopy mirrors the disks from "old" to "new" until --abort or --pivot is passed to blockjob. The problem is that the guest I target in production is too constrained (one disk is very large and mirroring it is not possible).
Here is where we are in the workflow (step C) for what we are talking about:
Step 1: create external snapshot for all VM disks (includes VM state).
Step 2: do the backups manually while the VM is still running (original disks and memory state).
During this step, the qcow2 files created in step 1 are getting larger proportional to the amount of changes done in the guest; obviously, the faster you can complete it, the smaller the deltas will be, and the faster your later merge steps will be. Since later merge steps have to be done while the guest is halted, it's good to keep small size in mind. More on this thought below...
Right. It still has to be tested against a real guest. I expect the merge to be small enough as the script is run nightly. At the time I expect this step to start (between 00:00 and 02:00 a.m.), nobody will be using the guests.
Step 3: save and halt the vm state once backups are finished.
By 'halt the vm state', do you mean power it down, so that you would be doing a fresh boot (aka 'virsh shutdown dom', do your work including 'virsh edit dom', 'virsh start dom')? Or do you mean 'take yet another snapshot', so that you stop qemu, manipulate things to point to the right files, then start a new qemu picking up at the same point where the running guest left off (aka 'virsh save dom file', do your work including 'virsh save-image-edit file', 'virsh restore file')?
I meant the latter, yes. I should have said "virsh save && virsh destroy" and later do the "virsh restore".
My advice: Don't use managedsave. At this point, it just adds more confusion, and you are better off directly using 'virsh save' (managedsave is just a special case of 'virsh save', where libvirt picks the file name on your behalf, and where 'virsh start' is smart enough to behave like 'virsh restore file' on that managed name - but that extra magic in 'virsh start' makes life that much harder for you to modify what the guest will start with).
Yes. I realized that from the previous test. Thanks for clearly confirming it.
Step 4: merge the snapshots (qcow2 disk wrappers) back to their backing file.
This step is done with raw qemu-img commands at the moment, and takes time proportional to the size of the qcow2 data.
Right.
So, yes: this is the memory state from the point at which the snapshot was taken but I clearly expect it to point to the backing file only.
You can double-check what it points to with 'virsh save-image-dumpxml', to make sure.
Ok.
In fact, if you KNOW you don't care about libvirt tracking snapshots, you can do 'virsh snapshot-create[-as] --no-metadata dom ...' in the first place, so that you get the side effects of external file creation without any of the (soon-to-be-useless) metadata in the first place.
Ok.
Excellent. I don't know why I didn't think about trying that. Tested and the symlink trick works fine. I had to change the disk format in the memory header, of course.
BTW, I guess I can prevent that by giving the snapshot an absolute path that is longer than the original disk path.
Yeah, being more careful about the saved image that you create in the first place will make it less likely that changing the save image adds enough content to push XML over a 4096-byte boundary.
Good!

--
Nicolas Sebrecht

The 15/03/13, Eric Blake wrote:
On 03/15/2013 06:17 AM, Nicolas Sebrecht wrote:
<...>
Here is where we are in the workflow (step C) for what we are talking about:
Step 1: create external snapshot for all VM disks (includes VM state).
Step 2: do the backups manually while the VM is still running (original disks and memory state).
During this step, the qcow2 files created in step 1 are getting larger proportional to the amount of changes done in the guest; obviously, the faster you can complete it, the smaller the deltas will be, and the faster your later merge steps will be. Since later merge steps have to be done while the guest is halted, it's good to keep small size in mind. More on this thought below...
Step 3: save and halt the vm state once backups are finished.
Step 3: virsh save && virsh destroy.
Step 4: merge the snapshots (qcow2 disk wrappers) back to their backing file.
This step is done with raw qemu-img commands at the moment, and takes time proportional to the size of the qcow2 data.
Step 5: start the VM.
Step 5: virsh restore.
...As mentioned above, the time taken in step 2 can affect how big the delta is, and therefore how long step 4 lasts (while the guest is offline). If your original disk is huge, and copying it to your backup takes a long amount of time, it may pay to do an iterative approach:
start with raw image: raw.img
create the external snapshot at the point you care about raw.img <- snap1.qcow2
transfer raw.img and vmstate file to backup storage, taking as long as needed (gigabytes of data, so on the order of minutes, during which the qcow2 files can build up to megabytes in size) raw.img <- snap1.qcow2
create another external snapshot, but this time with --disk-only --no-metadata (we don't plan on reverting to this point in time) raw.img <- snap1.qcow2 <- snap2.qcow2
use 'virsh blockcommit dom vda --base /path/to/raw --top /path/to/snap1 --wait --verbose'; this takes time for megabytes of storage, but not gigabytes, so it is faster than the time to copy raw.img, which means snap2.qcow2 will hold less delta data than snap1.qcow2 raw.img <- snap2.qcow2
now stop the guest, commit snap2.qcow2 into raw.img, and restart the guest
By doing an iteration, you've reduced the size of the file that has to be committed while the guest is offline; and may be able to achieve a noticeable reduction in guest downtime.
Looks like a very good optimization for a recent enough libvirt. Thanks!

--
Nicolas Sebrecht

The 15/03/13, Eric Blake wrote:
On 03/15/2013 06:17 AM, Nicolas Sebrecht wrote:
Here are the basic steps. This is still not that simple and there are tricky parts in the way.
Usual workflow (use case 2)
===========================
Step 1: create external snapshot for all VM disks (includes VM state).
Step 2: do the backups manually while the VM is still running (original disks and memory state).
Step 3: save and halt the vm state once backups are finished.
Step 4: merge the snapshots (qcow2 disk wrappers) back to their backing file.
Step 5: start the VM.
This involves guest downtime, longer according to how much state changed since the snapshot.
Right.
Restarting from the backup (use case 1)
=======================================
Step A: shutdown the running VM and move it out the way.
Step B: restore the backing files and state file from the archives of step 2.
Step C: restore the VM. (still not sure on that one, see below)
I wish to provide a more detailed procedure in the future.
With new enough libvirt and qemu, it is also possible to use 'virsh blockcopy' instead of snapshots as a backup mechanism, and THAT works with raw images without forcing your VM to use qcow2. But right now, it only works with transient guests (getting it to work for persistent guests requires a persistent bitmap feature that has been proposed for qemu 1.5, along with more libvirt work to take advantage of persistent bitmaps).
Fine. Sadly, my guests are not transient.
Guests can be made temporarily transient. That is, the following sequence has absolute minimal guest downtime, and can be done without any qcow2 files in the mix. For a guest with a single disk, there is ZERO! downtime:
virsh dumpxml --security-info dom > dom.xml
virsh undefine dom
virsh blockcopy dom vda /path/to/backup --wait --verbose --finish
virsh define dom.xml
For a guest with multiple disks, the downtime can be sub-second, if you script things correctly (the downtime lasts for the duration between the suspend and resume, but the steps done in that time are all fast):
virsh dumpxml --security-info dom > dom.xml
virsh undefine dom
virsh blockcopy dom vda /path/to/backup-vda
virsh blockcopy dom vdb /path/to/backup-vdb
polling loop - check periodically until 'virsh blockjob dom vda' and 'virsh blockjob dom vdb' both show 100% completion
virsh suspend dom
virsh blockjob dom vda --abort
virsh blockjob dom vdb --abort
virsh resume dom
virsh define dom.xml
In other words, 'blockcopy' is my current preferred method of online guest backup, even though I'm still waiting for qemu improvements to make it even nicer.
As I understand the man-page, blockcopy (without --shallow) creates a new disk file for a disk by merging all the current files if there are more than one.

Unless --finish/--pivot is passed to blockcopy, or until --abort/--pivot/--async is passed to blockjob, the original disks (before blockcopy started) and the new disk created by blockcopy are both mirrored. Only --pivot makes use of the new disk. So with --finish or --abort, we get a backup of a running guest. Nice! Except maybe that the backup doesn't include the memory state.

In order to include the memory state in the backup, I guess the pause/resume is inevitable:

virsh dumpxml --security-info dom > dom.xml
virsh undefine dom
virsh blockcopy dom vda /path/to/backup-vda
polling loop - check periodically until 'virsh blockjob dom vda' shows 100% completion
virsh suspend dom
virsh save dom /path/to/memory-backup --running
virsh blockjob dom vda --abort
virsh resume dom
virsh define dom.xml

I'd say that the man page misses the information that these commands can be run on a running guest, even though the mirroring feature might imply it. I would also add a "sync" command just after the first command as a safety measure, to ensure the xml is kept on disk.

The main drawback I can see is that the hypervisor must have at least as much free disk space as the disks to back up... or have the path/to/backups be a remote mount point.

Now, I wonder: if I change my backup strategy and mount the remote host holding the backups locally on the hypervisor (via nfs, iSCSI, sshfs, etc), should I expect write performance degradation? I mean, does the running guest wait for writes to both underlying mirrored disks (cache is set to none for the current disks)?

--
Nicolas Sebrecht

On 03/18/2013 05:39 AM, Nicolas Sebrecht wrote:
The 15/03/13, Eric Blake wrote:
On 03/15/2013 06:17 AM, Nicolas Sebrecht wrote:
In other words, 'blockcopy' is my current preferred method of online guest backup, even though I'm still waiting for qemu improvements to make it even nicer.
As I understand the man-page, blockcopy (without --shallow) creates a new disk file of a disk by merging all the current files if there are more than one.
Correct.
Unless --finish/--pivot is passed to blockcopy or until --abort/--pivot/--async is passed to blockjob, the original disks (before blockcopy started) and the new disk created by blockcopy are both mirrored.
Only --pivot makes use of the new disk. So with --finish or --abort, we get a backup of a running guest. Nice! Except maybe that the backup doesn't include the memory state.
Indeed, sounds like room for a future addition to libvirt, if qemu 1.5 gives us the additional hooks that I'd like to have. In particular, there is talk on the upstream qemu list about adding a blockbackup operation that would make capturing memory state and starting a backup job easier to manage than the current approach of starting a mirroring job, then capturing memory state and ending the mirroring job.
In order to include the memory state in the backup, I guess the pause/resume is inevitable:
virsh dumpxml --security-info dom > dom.xml
virsh undefine dom
virsh blockcopy dom vda /path/to/backup-vda
polling loop - check periodically until 'virsh blockjob dom vda' shows 100% completion
virsh suspend dom
virsh save dom /path/to/memory-backup --running
virsh blockjob dom vda --abort
virsh resume dom
virsh define dom.xml
A live-migration solution would have less downtime, but potentially take up more disk space. Again, there is talk on the upstream qemu list about how to optimize the amount of disk space taken by a live migration to a seekable file.
I'd say that the man page misses the information that these commands can be run on a running guest, even though the mirroring feature might imply it.
Care to write a patch? The virsh man page is generated from libvirt.git:tools/virsh.pod.
I would also add a "sync" command just after the first command as a safety measure, to ensure the xml is kept on disk.
Again, if libvirt can be improved to do better external snapshot management, a sync of the xml/memory state file should be part of this effort.
The main drawback I can see is that the hypervisor must have at least as much free disk space as the disks to back up... or have the path/to/backups be a remote mount point.
Yes, but that's true of any backup operation - you are committing to the disk space for both the live and the backup, in some form or another. Also, disk mirroring to a remote point is fairly easy to set up, by using nbd:... instead of /path/to/file as the destination for the blockcopy, and having an NBD server listening on the remote destination side.
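A rough sketch of what that could look like (host name, port and size are made up, and the exact destination syntax accepted by blockcopy depends on your libvirt/qemu versions, so treat this as illustrative only):

# on the backup host: create a destination image and export it over NBD
qemu-img create -f raw /srv/backups/dom-vda.img 500G
qemu-nbd --persistent --port 10809 /srv/backups/dom-vda.img
# on the hypervisor: point the copy at the NBD export instead of a local file
virsh blockcopy dom vda nbd:backuphost:10809 --wait --verbose --finish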
Now, I wonder: if I change my backup strategy and mount the remote host holding the backups locally on the hypervisor (via nfs, iSCSI, sshfs, etc), should I expect write performance degradation? I mean, does the running guest wait for writes to both underlying mirrored disks (cache is set to none for the current disks)?
Probably a question better asked on the qemu list, but yes, my understanding is that if you have a disk mirror or backup job set up, then qemu has to manage to flush I/O to two locations instead of one, and might end up slightly slowing the guest as a result.

--
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

The 20/03/13, Eric Blake wrote:
On 03/18/2013 05:39 AM, Nicolas Sebrecht wrote:
I'd say that the man page misses the information that these commands can be run on a running guest, even though the mirroring feature might imply it.
Care to write a patch? The virsh man page is generated from libvirt.git:tools/virsh.pod.
Sure, I wish to write one but I'm very busy these days. Will do my best.
The main drawback I can see is that the hypervisor must have at least as much free disk space as the disks to back up... or have the path/to/backups be a remote mount point.
Yes, but that's true of any backup operation - you are committing to the disk space for both the live and the backup, in some form or another. Also, disk mirroring to a remote point is fairly easy to set up, by using nbd:... instead of /path/to/file as the destination for the blockcopy, and having an NBD server listening on the remote destination side.
Don't know NBD very well, but I'll investigate it. Thanks for the input.
Now, I wonder: if I change my backup strategy and mount the remote host holding the backups locally on the hypervisor (via nfs, iSCSI, sshfs, etc), should I expect write performance degradation? I mean, does the running guest wait for writes to both underlying mirrored disks (cache is set to none for the current disks)?
Probably a question better asked on the qemu list, but yes, my understanding is that if you have a disk mirror or backup job set up, then qemu has to manage to flush I/O to two locations instead of one, and might end up slightly slowing the guest as a result.
Ok. Thank you very much, Eric, for this whole thread and your efforts in giving me all the information I was missing. Very much appreciated.

--
Nicolas Sebrecht

The 13/03/13, Eric Blake wrote:
On 03/13/2013 09:25 AM, Nicolas Sebrecht wrote:
I use a script started each night to (managed)save the guests and upload them to an FTP server. Basically, I do:
virsh managedsave guest
<upload managedsave file>
<upload guest disks>
virsh start guest
You might want to look into external snapshots as a more efficient way of taking guest snapshots. With virsh managedsave/start, you are guaranteed guest downtime of several seconds at a minimum, and you have to do the disk management yourself; but with external snapshots, you can take a consistent picture of guest state at a point in time with sub-second downtime (the same way live migration between hosts has sub-second downtime).
I'm coming back to say that we now have a working solution in production based on external snapshots. It's written in python to execute the process detailed in this thread. The downtime is a few seconds.

Thanks again.

--
Nicolas Sebrecht