
On 15/03/13, Eric Blake wrote:
On 03/15/2013 06:17 AM, Nicolas Sebrecht wrote:
Here are the basic steps. This is still not that simple and there are tricky parts along the way.
Usual workflow (use case 2)
===========================
Step 1: create an external snapshot for all VM disks (includes VM state).
Step 2: do the backups manually while the VM is still running (original disks and memory state).
Step 3: save and halt the VM state once the backups are finished.
Step 4: merge the snapshots (qcow2 disk wrappers) back into their backing files.
Step 5: start the VM.
This involves guest downtime, which grows with how much state changed since the snapshot.
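For illustration, on a single-disk guest the steps would be roughly the following sketch (domain name, disk target and every path are placeholders, and re-pointing the domain at the merged image is one of the tricky parts I mentioned; the exact --memspec/--diskspec spelling may need adjusting to your libvirt version):

# step 1: external snapshot (disk + memory) taken while the guest runs
virsh snapshot-create-as dom backup --atomic \
    --memspec file=/var/lib/libvirt/save/dom.mem,snapshot=external \
    --diskspec vda,snapshot=external,file=/var/lib/libvirt/images/dom-backup.qcow2
# step 2: the original image is now a stable backing file; copy it and the memory file away
cp /var/lib/libvirt/images/dom.img /backups/
cp /var/lib/libvirt/save/dom.mem /backups/
# step 3: save and halt the guest once the copies are finished
virsh save dom /var/lib/libvirt/save/dom.state
# step 4: merge the qcow2 wrapper back into its backing file
qemu-img commit /var/lib/libvirt/images/dom-backup.qcow2
# step 5: re-point vda at the merged dom.img (one of the tricky parts,
#         e.g. with 'virsh save-image-edit'), drop the wrapper, then restart
virsh restore /var/lib/libvirt/save/dom.state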
Right.
Restarting from the backup (use case 1)
=======================================
Step A: shut down the running VM and move it out of the way.
Step B: restore the backing files and state file from the archives of step 2.
Step C: restore the VM. (Still not sure about that one, see below.)
I wish to provide a more detailed procedure in the future.
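Roughly, something like this sketch (all names and paths are placeholders; step C is the part I'm not sure about):

# step A: shut down the running VM and move its image out of the way
virsh shutdown dom
mv /var/lib/libvirt/images/dom.img /var/lib/libvirt/images/dom.img.broken
# step B: bring back the backing file (and memory state) archived at step 2
cp /backups/dom.img /var/lib/libvirt/images/dom.img
# step C: either boot straight from the restored disk...
virsh start dom
# ...or, if the memory file from step 1 can simply be fed to 'virsh restore'
# (that's what I still have to check), resume from it instead:
# virsh restore /backups/dom.mem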
With new enough libvirt and qemu, it is also possible to use 'virsh blockcopy' instead of snapshots as a backup mechanism, and THAT works with raw images without forcing your VM to use qcow2. But right now, it only works with transient guests (getting it to work for persistent guests requires a persistent bitmap feature that has been proposed for qemu 1.5, along with more libvirt work to take advantage of persistent bitmaps).
Fine. Sadly, my guests are not transient.
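(For the record, whether a given guest is persistent shows up in 'virsh dominfo':

virsh dominfo dom | grep Persistent
)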
Guests can be made temporarily transient. That is, the following sequence has absolutely minimal guest downtime, and can be done without any qcow2 files in the mix. For a guest with a single disk, there is ZERO downtime:
virsh dumpxml --security-info dom > dom.xml
virsh undefine dom
virsh blockcopy dom vda /path/to/backup --wait --verbose --finish
virsh define dom.xml
For a guest with multiple disks, the downtime can be sub-second, if you script things correctly (the downtime lasts for the duration between the suspend and resume, but the steps done in that time are all fast):
virsh dumpxml --security-info dom > dom.xml
virsh undefine dom
virsh blockcopy dom vda /path/to/backup-vda
virsh blockcopy dom vdb /path/to/backup-vdb
polling loop - check periodically until 'virsh blockjob dom vda' and
  'virsh blockjob dom vdb' both show 100% completion
virsh suspend dom
virsh blockjob dom vda --abort
virsh blockjob dom vdb --abort
virsh resume dom
virsh define dom.xml
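The polling loop can be as simple as this sketch (it just greps the progress line printed by 'virsh blockjob'; the exact output format may differ across libvirt versions):

# keep waiting until both block copy jobs report 100 % (mirroring phase reached)
while ! virsh blockjob dom vda | grep -q '100 %' || \
      ! virsh blockjob dom vdb | grep -q '100 %'; do
    sleep 5
done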
In other words, 'blockcopy' is my current preferred method of online guest backup, even though I'm still waiting for qemu improvements to make it even nicer.
As I understand the man page, blockcopy (without --shallow) creates a new disk file for a disk, merging all the current files if there is more than one. Unless --finish/--pivot is passed to blockcopy, or until --abort/--pivot/--async is passed to blockjob, the original disks (as they were before blockcopy started) and the new disk created by blockcopy are both mirrored. Only --pivot makes the guest switch to the new disk.

So with --finish or --abort, we get a backup of a running guest. Nice! Except, maybe, that the backup doesn't include the memory state. In order to include the memory state in the backup, I guess the pause/resume is inevitable:

virsh dumpxml --security-info dom > dom.xml
virsh undefine dom
virsh blockcopy dom vda /path/to/backup-vda
polling loop - check periodically until 'virsh blockjob dom vda' shows
  100% completion
virsh suspend dom
virsh save dom /path/to/memory-backup --running
virsh blockjob dom vda --abort
virsh resume dom
virsh define dom.xml

I'd say the man page is missing the information that these commands can be run against a running guest, even though the mirroring feature might imply it. I would also add a "sync" command just after the first command, as a safety measure to ensure the XML is kept on disk.

The main drawback I can see is that the hypervisor must have at least as much free disk space as the disks to back up... or have path/to/backups be a remote mount point.

Now I wonder: if I change my backup strategy and mount the remote host that stores the backups locally on the hypervisor (via NFS, iSCSI, sshfs, etc.), should I expect write performance degradation? I mean, does the running guest wait for the writes to both underlying mirrored disks (cache is set to none for the current disks)?

-- 
Nicolas Sebrecht