[libvirt-users] Backup a VM (using live external snapshot and blockcommit)

Hi.

I'm following up here on a conversation that was started on Kashyap's website [1].

We have a server that we use as a host for virtual machines using KVM (virt-manager is used for VM creation), and we would like to set up VM backups. Basically, we're thinking of a backup schedule like "keep 7 daily and 4 weekly backups". We'd rather not shut down the VM every day, so live backups would be nice.

I've been doing my best with documentation found on the Internet. It is likely that the path I chose was not the best, so feel free to tell me if I'm asking the wrong questions and should be proceeding totally differently.

AFAIU, backups can be done at the filesystem level (using LVM) or at the virtualization level (using libvirt). We chose the libvirt way. AFAIU, live backups using libvirt may be done thanks to blockcommit, as explained here on the wiki [2].

-> Considering our use case, is this the recommended way?

Assuming yes, here is the plan. I wrote a script that does

# Create snapshot
virsh snapshot-create-as --domain $VM_NAME snap --diskspec vda,file=$VM_DIR/"$VM_NAME"-snap.qcow2 --disk-only --atomic --no-metadata --quiesce
# Copy frozen backing file
cp $VM_DIR/"$VM_NAME".qcow2 $SNAP_FILEPATH
# Blockcommit snapshot back into backing file
virsh blockcommit $VM_NAME vda --active --pivot
# Remove snapshot file
rm $VM_DIR/"$VM_NAME"-snap.qcow2

Variables should be self-explanatory:
- VM_DIR is the directory where the VMs are stored
- VM_NAME is the name of the VM, and its qcow2 file is called VM_NAME.qcow2
- SNAP_FILEPATH is the full path (including name) where the backup should be created

Using this scheme, we only keep snapshots for the duration of the VM file copy, which is less than a minute. The backing chain is at most 'back <- snap', and most of the time just 'back'.

If something ever happens to the VM (human error while logged in as root, attack from the Internet, ...), we'll turn off the VM, replace its qcow2 file and turn it back on.
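The four steps above can be wrapped in a bash function with every variable quoted and `set -euo pipefail`, so that a failed step stops the script before the overlay is deleted. This is a minimal sketch, not a tested tool: the function name backup_vm and the paths in the usage comment are hypothetical, and it assumes bash, a single disk named vda, and a running qemu-guest-agent in the guest (required by --quiesce).

```shell
#!/bin/bash
# Sketch of the live backup sequence discussed in this thread.
# Assumptions: bash, one disk named "vda", qemu-guest-agent in the guest.
# backup_vm is a hypothetical helper name, not part of libvirt.
set -euo pipefail

backup_vm() {
    local vm_name=$1 vm_dir=$2 snap_filepath=$3
    local base="$vm_dir/$vm_name.qcow2"
    local overlay="$vm_dir/$vm_name-snap.qcow2"

    # 1. New guest writes go to the overlay; the base image stops changing.
    virsh snapshot-create-as --domain "$vm_name" snap \
        --diskspec "vda,file=$overlay" \
        --disk-only --atomic --no-metadata --quiesce

    # 2. Copy the now-frozen base image while the guest keeps running.
    cp "$base" "$snap_filepath"

    # 3. Merge the overlay back into the base and pivot the guest onto it.
    virsh blockcommit "$vm_name" vda --active --pivot

    # 4. The guest no longer uses the overlay, so it can be deleted.
    #    When backup_vm is called directly, "set -e" aborts before this
    #    line if any earlier step failed.
    rm "$overlay"
}

# Example call (placeholder paths):
#   backup_vm myvm /var/lib/libvirt/images "/backup/myvm-$(date +%F).qcow2"
```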
I understand that this method only saves disk states, so the VM will be started as if it had been powered off suddenly while running (not quite: thanks to the '--quiesce' option, at least the disks are in a sane state). Not perfect, but better than nothing. Those backups are meant to be used only when all else has failed, anyway; it's not daily use.

-> Does this make sense? Am I missing a feature or even a different approach that would make things simpler or more secure? Am I using libvirt snapshots for something they're not meant for?

-> Anything wrong about my snapshot-create-as and blockcommit command lines? May I remove the snapshot with only a rm command?

Now, a few side questions, as I might have messed up the VM I was experimenting with.

I used the same command lines as described above, except I didn't pass the '--no-metadata' option. Once the backing file was copied, I deleted the snapshot qcow2 file and thought I was done with it, until I realized the snapshot was still listed by virsh snapshot-list. And I couldn't find a way to delete it. (For the record, I asked on serverfault about that [3].)

Ultimately, I found the snapshot's .xml descriptor and deleted it (in fact, moved it) while libvirtd was down. Now, the snapshot is not listed anymore.

-> Am I getting away with it? Are there still some traces of that snapshot? Is my VM in an unsafe state? Anything I should do about it?

-> What would be the proper way of dropping an external snapshot that was created without the '--no-metadata' option, then blockcommitted? I understand libvirt doesn't do it yet.

Thanks for any hint. I naively thought our use case was pretty usual, and I must admit I didn't think I'd have to dive into this complexity, which is why I'm thinking there might be a more "common" way...

[1] http://kashyapc.com/2014/10/07/libvirt-blockcommit-shorten-disk-image-chain-...
[2] http://wiki.libvirt.org/page/Live-disk-backup-with-active-blockcommit
[3] http://serverfault.com/questions/721216/delete-orphan-libvirt-snapshot

-- Jérôme

On 09/11/2015 06:45 AM, Jérôme wrote:
AFAIU, live backups using libvirt may be done thanks to blockcommit as explained here on the wiki [2].
-> Considering our use case, is this the recommended way?
Yes, using active block-commit is the ideal way to perform a live backup.
Assuming yes, here is the plan.
I wrote a script that does
# Create snapshot
virsh snapshot-create-as --domain $VM_NAME snap --diskspec vda,file=$VM_DIR/"$VM_NAME"-snap.qcow2 --disk-only --atomic --no-metadata --quiesce
# Copy frozen backing file
cp $VM_DIR/"$VM_NAME".qcow2 $SNAP_FILEPATH
# Blockcommit snapshot back into backing file
virsh blockcommit $VM_NAME vda --active --pivot
# Remove snapshot file
rm $VM_DIR/"$VM_NAME"-snap.qcow2
Yep, that about covers it. Note that the --quiesce step in snapshot creation requires qemu-guest-agent running in the guest, and that you trust interaction with your guest.
I understand that this method only saves disk states, so the VM will be started as if it had been powered-off suddenly while running (not quite: thanks to the '--quiesce' option, at least the disks are in a sane state). Not perfect but better than nothing. Those backups are meant to be used only when all else failed, anyway, it's not daily use.
Yep.
-> Does this make sense? Am I missing a feature or even a different approach that would make things simpler or more secure? Am I using libvirt snapshots for something they're not meant for?
No, you're spot on for one of the useful use cases of snapshots.
-> Anything wrong about my snapshot-create-as and blockcommit command lines? May I remove the snapshot with only a rm command?
Looks correct to me, and matches my recent KVM Forum slides: http://events.linuxfoundation.org/sites/events/files/slides/2015-qcow2-expan...
Now, a few side questions, as I might have messed up the VM I was experimenting with.
I used the same command lines as described above, except I didn't pass the '--no-metadata' option. Once the backing file was copied, I deleted the snapshot qcow2 file and thought I was done with it, until I realized the snapshot was still listed by virsh snapshot-list. And I couldn't find a way to delete it. (For the record, I asked on serverfault about that [3].)
virsh snapshot-delete --metadata $dom $badname
to remove $badname snapshot that no longer exists because you changed things behind the scenes.
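Eric's one-liner can be guarded so that it only runs when libvirt still lists the snapshot. A hedged sketch: the helper name drop_snapshot_metadata is hypothetical. The --metadata flag tells libvirt to forget the snapshot record while leaving disk images alone, which is what is wanted after the overlay has already been committed and removed.

```shell
#!/bin/bash
# Sketch: make libvirt forget an external snapshot whose overlay was
# already committed and deleted by hand, without touching any image.
# drop_snapshot_metadata is a hypothetical helper name.
set -euo pipefail

drop_snapshot_metadata() {
    local dom=$1 name=$2
    local names
    # List snapshot names only; captured first so grep cannot cut the pipe.
    names=$(virsh snapshot-list "$dom" --name)
    if grep -Fxq -- "$name" <<< "$names"; then
        # --metadata: delete libvirt's record, leave disk images alone.
        virsh snapshot-delete --metadata "$dom" "$name"
    else
        echo "no snapshot named '$name' for domain '$dom'" >&2
    fi
}
```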
Ultimately, I found the snapshot's .xml descriptor and deleted it (in fact, moved it) while libvirtd was down. Now, the snapshot is not listed anymore.
-> Am I getting away with it? Are there still some traces of that snapshot? Is my VM in an unsafe state? Anything I should do about it?
Directly manipulating .xml files behind libvirt's back is not ideal; better is to use libvirt APIs (the way snapshot-delete --metadata does).
-> What would be the proper way of dropping an external snapshot that was created without the '--no-metadata' option, then blockcommitted? I understand libvirt doesn't do it yet.
Thanks for any hint. I naively thought our use case was pretty usual, and I must admit I didn't think I'd have to dive into this complexity, which is why I'm thinking there might be a more "common" way...
Nope, right now, there is still some user burden rather than a one-command-does-it-all virsh wrapper. But fortunately it is not too bad (you proved it is scriptable), and you discovered the correct sequencing.

--
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Hi Eric. Thank you so much for your quick and reassuring answer.

On 2015-09-11 17:05, Eric Blake wrote:
Yes, using active block-commit is the ideal way to perform a live backup.
Great.
Yep, that about covers it. Note that the --quiesce step in snapshot creation requires qemu-guest-agent running in the guest, and that you trust interaction with your guest.
Yes, I think I get this, although I can't really picture what those cases could be. We're using Debian Jessie and I installed qemu-guest-agent. Other VMs could run other systems, but most likely Linux-based ones. Do you mean that, in cases where you shouldn't trust the guest, using '--quiesce' might end up being worse than nothing? Or just useless?
-> Anything wrong about my snapshot-create-as and blockcommit command lines? May I remove the snapshot with only a rm command?
Looks correct to me, and matches my recent KVM Forum slides: http://events.linuxfoundation.org/sites/events/files/slides/2015-qcow2-expan...
I'll have a look at these, thanks.
Now, a few side questions, as I might have messed up the VM I was experimenting with.
I used the same command lines as described above, except I didn't pass the '--no-metadata' option. Once the backing file was copied, I deleted the snapshot qcow2 file and thought I was done with it, until I realized the snapshot was still listed by virsh snapshot-list. And I couldn't find a way to delete it. (For the record, I asked on serverfault about that [3].)
virsh snapshot-delete --metadata $dom $badname
to remove $badname snapshot that no longer exists because you changed things behind the scenes.
Before removing the .xml file, I tried the command indicated in the wiki [1], with no success:

"NOTE-2: Optionally, you can also supply '--no-metadata' option to tell libvirt to not track the snapshot metadata -- this is useful currently as at a later point when you merge snapshot files, then you have to explicitly clean the libvirt metadata (by invoking: virsh snapshot-delete vm1 --delete --current -- repeat this as needed.)"

Shouldn't the
virsh snapshot-delete vm1 --delete --current
be rephrased as
virsh snapshot-delete vm1 --metadata --current
? I see '--delete' is not listed in the man page.

Or even
virsh snapshot-delete vm1 --metadata $badname
since, after the blockcommit, the snapshot is unused and I'm not sure it is considered current.

Anyway, I'm glad you confirm I now have the correct sequence.

Thanks again. Enjoy the weekend.

[1] http://wiki.libvirt.org/page/Live-disk-backup-with-active-blockcommit

-- Jérôme

On 09/11/2015 10:18 AM, Jérôme wrote:
Yep, that about covers it. Note that the --quiesce step in snapshot creation requires qemu-guest-agent running in the guest, and that you trust interaction with your guest.
Yes, I think I get this, although I can't really picture what those cases could be. We're using Debian Jessie and I installed qemu-guest-agent. Other VMs could run other systems, but most likely Linux-based ones.
qga with support for quiesce has also been ported to Windows guests.
Do you mean that, in cases where you shouldn't trust the guest, using '--quiesce' might end up being worse than nothing? Or just useless?
If the agent is not running, using --quiesce will fail the entire command; you'd learn pretty quickly to retry without --quiesce for guests that don't know how to handle it. But if the guest is malicious, it can pretend to be a guest agent, but intentionally refuse to reply to the --quiesce request, and leave libvirt hung waiting for a reply. So it boils down to whether you trust your guests to be reasonable with their guest agent connection (fine if it is your own guests, not so much if you are hosting a cloud for other people's guests).
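Since a missing agent makes the whole command fail, the "retry without --quiesce" advice can be scripted: try with --quiesce first, and on failure retry without it while recording that the result is only crash-consistent. A hedged sketch with the same disk and naming assumptions as the original script; create_overlay is a hypothetical name. It does not address the malicious-guest case above, where the agent never answers and the first command hangs instead of failing.

```shell
#!/bin/bash
# Sketch: create the temporary overlay, falling back to a non-quiesced
# snapshot when the guest agent cannot handle --quiesce.
# create_overlay is a hypothetical helper; it prints which mode was used.
set -euo pipefail

create_overlay() {
    local vm_name=$1 overlay=$2
    local args=(snapshot-create-as --domain "$vm_name" snap
                --diskspec "vda,file=$overlay"
                --disk-only --atomic --no-metadata)

    if virsh "${args[@]}" --quiesce; then
        echo quiesced
    else
        # The quiesced attempt failed as a whole (--atomic), so retry
        # and accept a crash-consistent disk state instead.
        echo "quiesce failed for $vm_name, retrying without it" >&2
        virsh "${args[@]}"
        echo crash-consistent
    fi
}
```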
-> Anything wrong about my snapshot-create-as and blockcommit command lines? May I remove the snapshot with only a rm command?
Looks correct to me, and matches my recent KVM Forum slides: http://events.linuxfoundation.org/sites/events/files/slides/2015-qcow2-expan...
I'll have a look at these, thanks.
The libvirt commands were towards the end, in part 3; but the first two parts might give a better understanding of the overall operations of what is happening.
virsh snapshot-delete --metadata $dom $badname
to remove $badname snapshot that no longer exists because you changed things behind the scenes.
Before removing the .xml file, I tried the command indicated in the wiki [1] with no success.
"NOTE-2: Optionally, you can also supply '--no-metadata' option to tell libvirt to not track the snapshot metadata -- this is useful currently as at a later point when you merge snapshot files, then you have to explicitly clean the libvirt metadata (by invoking: virsh snapshot-delete vm1 --delete --current -- repeat this as needed.)"
Shouldn't the
virsh snapshot-delete vm1 --delete --current
be rephrased as
virsh snapshot-delete vm1 --metadata --current
Yep, sounds like a bug in the wiki, so I fixed it.

--
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
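The corrected wiki command ends with "repeat this as needed". One way to script that repetition is to loop while libvirt still reports a current snapshot. This is a sketch under an assumption worth checking on your libvirt version: that virsh snapshot-current exits with an error once the domain has no current snapshot. The helper name is hypothetical, and because it forgets every snapshot it reaches through the current pointer, it should only be used when all of them have already been committed.

```shell
#!/bin/bash
# Sketch: repeat "virsh snapshot-delete DOM --metadata --current" until
# libvirt reports no current snapshot. Only metadata is removed; disk
# images are left untouched. clear_current_snapshot_metadata is hypothetical.
set -euo pipefail

clear_current_snapshot_metadata() {
    local dom=$1
    # Assumed: snapshot-current exits non-zero once there is no current snapshot.
    while virsh snapshot-current --name "$dom" > /dev/null 2>&1; do
        virsh snapshot-delete "$dom" --metadata --current
    done
}
```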

[. . .] On Fri, Sep 11, 2015 at 10:45:34AM -0600, Eric Blake wrote:
On 09/11/2015 10:18 AM, Jérôme wrote:
Yep, that about covers it. Note that the --quiesce step in snapshot creation requires qemu-guest-agent running in the guest, and that you trust interaction with your guest.
Yes, I think I get this, although I can't really picture what those cases could be. We're using Debian Jessie and I installed qemu-guest-agent. Other VMs could run other systems, but most likely Linux-based ones.
qga with support for quiesce has also been ported to Windows guests.
Do you mean that, in cases where you shouldn't trust the guest, using '--quiesce' might end up being worse than nothing? Or just useless?
If the agent is not running, using --quiesce will fail the entire command; you'd learn pretty quickly to retry without --quiesce for guests that don't know how to handle it. But if the guest is malicious, it can pretend to be a guest agent, but intentionally refuse to reply to the --quiesce request, and leave libvirt hung waiting for a reply. So it boils down to whether you trust your guests to be reasonable with their guest agent connection (fine if it is your own guests, not so much if you are hosting a cloud for other people's guests).
-> Anything wrong about my snapshot-create-as and blockcommit command lines? May I remove the snapshot with only a rm command?
Looks correct to me, and matches my recent KVM Forum slides: http://events.linuxfoundation.org/sites/events/files/slides/2015-qcow2-expan...
I'll have a look at these, thanks.
Yes, I highly recommend it. This talk gives excellent under-the-hood details of virtual machine disk image backing chain management. Associated video: https://www.youtube.com/watch?v=etIGp12RHRE
The libvirt commands were towards the end, in part 3; but the first two parts might give a better understanding of the overall operations of what is happening.
virsh snapshot-delete --metadata $dom $badname
to remove $badname snapshot that no longer exists because you changed things behind the scenes.
Before removing the .xml file, I tried the command indicated in the wiki [1] with no success.
"NOTE-2: Optionally, you can also supply '--no-metadata' option to tell libvirt to not track the snapshot metadata -- this is useful currently as at a later point when you merge snapshot files, then you have to explicitly clean the libvirt metadata (by invoking: virsh snapshot-delete vm1 --delete --current -- repeat this as needed.)"
Shouldn't the
virsh snapshot-delete vm1 --delete --current
be rephrased as
virsh snapshot-delete vm1 --metadata --current
Yep, sounds like a bug in the wiki, so I fixed it.
Indeed, it was a typo. I didn't even notice it until now as I just type these commands from muscle memory. Thanks, Eric, for fixing it (and for all the detailed responses).

-- /kashyap

On Fri, 11 Sep 2015 10:45:34 -0600, Eric Blake <eblake@redhat.com> wrote:
But if the guest is malicious, it can pretend to be a guest agent, but intentionally refuse to reply to the --quiesce request, and leave libvirt hung waiting for a reply. So it boils down to whether you trust your guests to be reasonable with their guest agent connection (fine if it is your own guests, not so much if you are hosting a cloud for other people's guests).
Of course. I didn't think of this use case.
virsh snapshot-delete vm1 --metadata --current
Yep, sounds like a bug in the wiki, so I fixed it.
I thought so, but didn't dare be too assertive about it. Glad it is fixed. Hopefully, it will save someone some time and trouble. Let this be my micro-contribution...

Thanks again.

-- Jérôme
participants (3)
- Eric Blake
- Jérôme
- Kashyap Chamarthy