I'm splitting my answer into separate mails, one mail per strategy, to
keep them from getting mixed up.
The 15/03/13, Eric Blake wrote:
On 03/15/2013 06:17 AM, Nicolas Sebrecht wrote:
> Here are the basic steps. This is still not that simple and there are
> tricky parts along the way.
>
> Usual workflow (use case 2)
> ===========================
>
> Step 1: create external snapshot for all VM disks (includes VM state).
> Step 2: do the backups manually while the VM is still running
> (original disks and memory state).
> Step 3: save and halt the vm state once backups are finished.
> Step 4: merge the snapshots (qcow2 disk wrappers) back to their backing file.
> Step 5: start the VM.
This involves guest downtime, which grows with how much state has
changed since the snapshot.
Yes.
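For reference, the five quoted steps could be sketched as a single shell
function. This is only a sketch of the flow, not a tested
implementation: the guest name "dom", the disk "vda", the snapshot name,
and every path below are made-up examples.

```shell
# Sketch of the snapshot/backup/merge workflow; defined only, since it
# needs a live libvirt host to actually run.
backup_merge_workflow() {
    dom=$1
    # Step 1: external snapshot of all disks plus the memory state.
    virsh snapshot-create-as "$dom" backup-snap \
        --memspec file=/snapshots/"$dom".mem,snapshot=external \
        --diskspec vda,snapshot=external,file=/snapshots/"$dom"-vda.qcow2
    # Step 2: the guest now writes to the qcow2 wrapper, so the original
    # disk is stable and can be copied while the guest keeps running.
    cp /vm/"$dom"-vda.img /backup/
    cp /snapshots/"$dom".mem /backup/
    # Step 3: save the running state and stop the guest.
    virsh save "$dom" /tmp/"$dom".save
    # Step 4: merge the wrapper's changes down into its backing file.
    qemu-img commit /snapshots/"$dom"-vda.qcow2
    # The save image still references the wrapper; its embedded XML must
    # be edited (e.g. with `virsh save-image-edit`) to point at the
    # backing file and its format before restoring.
    # Step 5: resume the guest from the saved state.
    virsh restore /tmp/"$dom".save
}
```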
Guests can be made temporarily transient. That is, the following
sequence has absolute minimal guest downtime, and can be done without
any qcow2 files in the mix. For a guest with a single disk, there is
ZERO! downtime:
virsh dumpxml --security-info dom > dom.xml
virsh undefine dom
virsh blockcopy dom vda /path/to/backup --wait --verbose --finish
virsh define dom.xml
For a guest with multiple disks, the downtime can be sub-second, if you
script things correctly (the downtime lasts for the duration between the
suspend and resume, but the steps done in that time are all fast):
virsh dumpxml --security-info dom > dom.xml
virsh undefine dom
virsh blockcopy dom vda /path/to/backup-vda
virsh blockcopy dom vdb /path/to/backup-vdb
polling loop - check periodically until 'virsh blockjob dom vda' and
'virsh blockjob dom vdb' both show 100% completion
virsh suspend dom
virsh blockjob dom vda --abort
virsh blockjob dom vdb --abort
virsh resume dom
virsh define dom.xml
In other words, 'blockcopy' is my current preferred method of online
guest backup, even though I'm still waiting for qemu improvements to
make it even nicer.
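With the polling loop spelled out, the multi-disk sequence above might
look like the sketch below. The helper assumes `virsh blockjob` reports
progress in a form like "Block Copy: [100 %]"; the exact output format
can vary between libvirt versions, so check yours. All names and paths
are examples.

```shell
# Helper: succeed when a `virsh blockjob` output line reports 100 %.
# The "[100 %]" format is an assumption about virsh's progress output.
job_complete() {
    case "$1" in
        *"100 %"*) return 0 ;;
        *)         return 1 ;;
    esac
}

# Full sequence (defined only, not run here; needs a live host and a
# guest "dom" with disks vda and vdb).
blockcopy_backup() {
    dom=$1
    virsh dumpxml --security-info "$dom" > "$dom.xml"
    virsh undefine "$dom"
    virsh blockcopy "$dom" vda /path/to/backup-vda
    virsh blockcopy "$dom" vdb /path/to/backup-vdb
    # Poll until both copies reach 100 %.
    until job_complete "$(virsh blockjob "$dom" vda)" &&
          job_complete "$(virsh blockjob "$dom" vdb)"; do
        sleep 5
    done
    # The downtime window: suspend, abort both mirror jobs, resume.
    virsh suspend "$dom"
    virsh blockjob "$dom" vda --abort
    virsh blockjob "$dom" vdb --abort
    virsh resume "$dom"
    virsh define "$dom.xml"
}
```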
Thanks for the procedure. The production hypervisor I'm working on is
running libvirt v0.9.8, and blockcopy was not supported at that time.
Also, I'm seeing that blockcopy mirrors the disks from "old" to "new"
until --abort or --pivot is passed to blockjob. The problem is that the
guest I'm targeting in production is too constrained (one disk is very
large and mirroring it is not possible).
> Here is where we are in the workflow (step C) for what we are
> talking about:
>
> Step 1: create external snapshot for all VM disks (includes VM state).
> Step 2: do the backups manually while the VM is still running
> (original disks and memory state).
During this step, the qcow2 files created in step 1 are getting larger
proportional to the amount of changes done in the guest; obviously, the
faster you can complete it, the smaller the deltas will be, and the
faster your later merge steps will be. Since later merge steps have to
be done while the guest is halted, it's good to keep small size in mind.
More on this thought below...
Right. It still has to be tested against a real guest. I expect the
merge to be small enough as the script is run nightly. At the time I
expect this step to start (between 00:00 and 02:00 a.m.), nobody will be
using the guests.
> Step 3: save and halt the vm state once backups are finished.
By 'halt the vm state', do you mean power it down, so that you would be
doing a fresh boot (aka 'virsh shutdown dom', do your work including
'virsh edit dom', 'virsh start dom')? Or do you mean 'take yet another
snapshot', so that you stop qemu, manipulate things to point to the
right files, then start a new qemu picking up at the same point where
the running guest left off (aka 'virsh save dom file', do your work
including 'virsh save-image-edit file', 'virsh restore file')?
I meant the latter, yes. I should have said "virsh save && virsh
destroy" and later do the "virsh restore".
My advice: Don't use managedsave. At this point, it just adds more
confusion, and you are better off directly using 'virsh save'
(managedsave is just a special case of 'virsh save', where libvirt picks
the file name on your behalf, and where 'virsh start' is smart enough to
behave like 'virsh restore file' on that managed name - but that extra
magic in 'virsh start' makes life that much harder for you to modify
what the guest will start with).
Yes. I realized that from the previous test. Thanks for clearly
confirming it.
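The plain-save cycle suggested above might look like the following
sketch; the file paths are made-up examples, and `save-image-edit`
opens the embedded XML in an editor, so this is an interactive step.

```shell
# Stop/modify/restart cycle using plain `virsh save` rather than
# managedsave, so the save file name stays under our control.
save_edit_restore() {
    dom=$1
    # Save memory state to an explicit file; this also stops the guest.
    virsh save "$dom" /tmp/"$dom".save
    # Inspect, then modify, the XML embedded in the save image.
    virsh save-image-dumpxml /tmp/"$dom".save
    virsh save-image-edit /tmp/"$dom".save
    # Resume from the (possibly edited) save file.
    virsh restore /tmp/"$dom".save
}
```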
> Step 4: merge the snapshots (qcow2 disk wrappers) back to their
> backing file.
This step is done with raw qemu-img commands at the moment, and takes
time proportional to the size of the qcow2 data.
Right.
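The raw qemu-img side of the merge might look like this sketch; the
wrapper filename is an example, and the function is defined only since
it needs real image files to run.

```shell
# Merge an external-snapshot wrapper back into its backing file.
merge_wrapper() {
    wrapper=$1   # e.g. /snapshots/dom-vda.qcow2
    # Show the chain: the wrapper's "backing file" is the original disk.
    qemu-img info "$wrapper"
    # Commit the wrapper's changes down into its backing file; the time
    # taken is proportional to the amount of qcow2 data.
    qemu-img commit "$wrapper"
    # After a successful commit the wrapper can be discarded.
    rm "$wrapper"
}
```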
> So, yes: this is the memory state from the point at which the snapshot
> was taken but I clearly expect it to point to the backing file only.
You can double-check what it points to with 'virsh save-image-dumpxml',
to make sure.
Ok.
In fact, if you KNOW you don't care about libvirt tracking snapshots,
you can do 'virsh snapshot-create[-as] --no-metadata dom ...' in the
first place, so that you get the side effects of external file creation
without any of the (soon-to-be-useless) metadata.
Ok.
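Applied to step 1, that suggestion might look like this sketch; the disk
name and paths are examples.

```shell
# External snapshot of disk + memory with no libvirt snapshot metadata,
# so there is nothing to clean up after merging the wrapper away.
snapshot_no_metadata() {
    dom=$1
    virsh snapshot-create-as "$dom" --no-metadata \
        --memspec file=/snapshots/"$dom".mem,snapshot=external \
        --diskspec vda,snapshot=external,file=/snapshots/"$dom"-vda.qcow2
}
```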
> Excellent. I don't know why I didn't think about trying that. Tested
> and the symlink trick works fine. I had to change the disk format in
> the memory header, of course.
>
> BTW, I guess I can prevent that by giving the snapshot an absolute
> path longer than the original disk path.
Yeah, being more careful about the saved image that you create in the
first place will make it less likely that changing the save image adds
enough content to push XML over a 4096-byte boundary.
Good!
--
Nicolas Sebrecht