On 15/03/13, Eric Blake wrote:
> On 03/15/2013 06:17 AM, Nicolas Sebrecht wrote:
>> Here are the basic steps. This is still not that simple and there are
>> tricky parts along the way.
>>
>> Usual workflow (use case 2)
>> ===========================
>>
>> Step 1: create an external snapshot for all VM disks (includes the VM state).
>> Step 2: do the backups manually while the VM is still running (original
>>         disks and memory state).
>> Step 3: save the VM state and halt it once backups are finished.
>> Step 4: merge the snapshots (qcow2 disk wrappers) back into their backing files.
>> Step 5: start the VM.
> This involves guest downtime, longer according to how much state changed
> since the snapshot.
Right.
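
Concretely, I imagine the workflow quoted above looking roughly like this for
a guest 'dom' with a single disk 'vda' (just a sketch, untested; the snapshot
name, paths and image names below are made up):

# Step 1: external snapshot of the disk plus the VM memory state,
# while the guest keeps running
virsh snapshot-create-as dom backup-snap \
    --memspec /backups/dom-mem.img,snapshot=external \
    --diskspec vda,snapshot=external,file=/backups/dom-vda.overlay.qcow2

# Step 2: the original image is now a stable backing file; archive it
# together with the memory file while the guest writes to the overlay
cp /var/lib/libvirt/images/dom-vda.img /backups/archive/
cp /backups/dom-mem.img /backups/archive/

# Step 3/4: stop the guest, then merge the overlay back into its backing file
virsh shutdown dom
qemu-img commit /backups/dom-vda.overlay.qcow2

# Step 5: repoint vda at the original image (e.g. with 'virsh edit dom'),
# then start the guest again
virsh start dom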
>> Restarting from the backup (use case 1)
>> =======================================
>>
>> Step A: shut down the running VM and move it out of the way.
>> Step B: restore the backing files and state file from the archives of step 2.
>> Step C: restore the VM. (still not sure about that one, see below)
>>
>> I wish to provide a more detailed procedure in the future.
>>
>>
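
For steps A-C, I currently picture something along these lines, assuming the
archive holds both the backing files and a saved memory state (again only a
sketch with made-up names):

# Step A: stop the VM and set its current image aside
virsh shutdown dom
mv /var/lib/libvirt/images/dom-vda.img /var/lib/libvirt/images/dom-vda.img.bad
# Step B: bring the archived backing file (and the state file) back in place
cp /backups/archive/dom-vda.img /var/lib/libvirt/images/
# Step C: restore from the saved memory state, or simply boot the restored disk
virsh restore /backups/archive/dom-state.save    # or: virsh start dom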
>>> With new enough libvirt and qemu, it is also possible to use 'virsh
>>> blockcopy' instead of snapshots as a backup mechanism, and THAT works
>>> with raw images without forcing your VM to use qcow2. But right now, it
>>> only works with transient guests (getting it to work for persistent
>>> guests requires a persistent bitmap feature that has been proposed for
>>> qemu 1.5, along with more libvirt work to take advantage of persistent
>>> bitmaps).
>>
>> Fine. Sadly, my guests are not transient.
> Guests can be made temporarily transient. That is, the following
> sequence has absolute minimal guest downtime, and can be done without
> any qcow2 files in the mix. For a guest with a single disk, there is
> ZERO downtime:
>
> virsh dumpxml --security-info dom > dom.xml
> virsh undefine dom
> virsh blockcopy dom vda /path/to/backup --wait --verbose --finish
> virsh define dom.xml
>
> For a guest with multiple disks, the downtime can be sub-second, if you
> script things correctly (the downtime lasts for the duration between the
> suspend and resume, but the steps done in that time are all fast):
>
> virsh dumpxml --security-info dom > dom.xml
> virsh undefine dom
> virsh blockcopy dom vda /path/to/backup-vda
> virsh blockcopy dom vdb /path/to/backup-vdb
> polling loop - check periodically until 'virsh blockjob dom vda' and
> 'virsh blockjob dom vdb' both show 100% completion
> virsh suspend dom
> virsh blockjob dom vda --abort
> virsh blockjob dom vdb --abort
> virsh resume dom
> virsh define dom.xml
>
> In other words, 'blockcopy' is my current preferred method of online
> guest backup, even though I'm still waiting for qemu improvements to
> make it even nicer.
As I understand the man page, blockcopy (without --shallow) creates a new
standalone file for a disk, merging all of its current files (the backing
chain) into one if there is more than one.
Unless --finish/--pivot is passed to blockcopy, or until
--abort/--pivot/--async is passed to blockjob, the original disk (the one in
use before blockcopy started) and the new disk created by blockcopy are kept
mirrored.

Only --pivot makes the guest actually use the new disk. So with --finish or
--abort, we get a backup of a running guest. Nice! Except, maybe, that the
backup doesn't include the memory state.
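
For contrast, switching the guest over to the new copy instead of keeping the
original would be the same command with --pivot (a sketch, not something I
have tried here):

virsh blockcopy dom vda /path/to/new-image --wait --verbose --pivot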
In order to include the memory state in the backup, I guess a pause/resume
is unavoidable:
virsh dumpxml --security-info dom > dom.xml
virsh undefine dom
virsh blockcopy dom vda /path/to/backup-vda
polling loop - check periodically until 'virsh blockjob dom vda'
shows 100% completion
virsh suspend dom
virsh save dom /path/to/memory-backup --running
virsh blockjob dom vda --abort
virsh resume dom
virsh define dom.xml
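
The "polling loop" above could be scripted with something as simple as this
(a rough sketch; it assumes the default 'virsh blockjob' output shows a
percentage like "Block Copy: [100 %]" once the mirror is complete, which may
vary between versions):

while ! virsh blockjob dom vda | grep -q '100 %'; do
    sleep 5
done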
I'd say the man page is missing the information that these commands can be
run against a running guest, even though the mirroring feature might imply it.
I would also add a "sync" command just after the first command as a safety
measure, to ensure the XML is actually written to disk.
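
That is, trivially:

virsh dumpxml --security-info dom > dom.xml
sync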
The main drawback I can see is that the hypervisor must have at least as much
free disk space as the disks to back up... or have /path/to/backups on a
remote mount point.
Now, I wonder: if I change the backup strategy and mount the remote host
storing the backups locally on the hypervisor (via NFS, iSCSI, sshfs, etc.),
should I expect write performance degradation? I mean, does the running guest
wait for the writes to both mirrored disks to complete (cache is set to none
for the current disks)?
--
Nicolas Sebrecht