Hello everyone,

I'm seeking guidance on *best practices* for virtual
machine recovery using external disk snapshots, particularly in a storage
environment with ZFS. My current snapshot and recovery *workflow* involves:
- Keeping VM disks & state on a ZFS volume;
- Creating external KVM/Libvirt disk-only snapshots, resulting in deltas
  kept on the volume, next to the disk images;
- Capturing the entire VM state through ZFS snapshots;
- VM recovery through ZFS snapshot clones.

I am particularly interested in
obtaining an app-consistent recovery, for which I need to revert to the KVM
snapshot of the VM in order to get the clean state that a quiesced snapshot
can offer. From reading other posts in the archive and forums, it is clear
to me that I cannot simply revert to the VM's snapshot if it is a disk-only
one, and that I have to manage such snapshots manually. Thus, my question
is: *what is the best practice in order to recover the VM to the external
disk snapshot that we have*?

*What I have tried*, which worked, although I'm not sure it is the best
practice: on a VM with only one snapshot, I changed the disk source files
(which were pointing to the deltas) to the ones referenced by their
backingStore source files, effectively making the disks use the state from
snapshot time. This only works for shut-off VMs, as live VMs cannot have
their disk sources changed, of course. Thus, for powered-on VMs in the use
case with only one snapshot, I chose to use `virDomainBlockPull` in order
to have the app-consistent state pulled into the current disk (which was,
and still is, pointing to the delta).

*My concerns* about the approach I
took relate mostly to the scalability and safety of the whole process:
- I am not sure how I could revert to the current snapshot again after the
  operations I did: for powered-off VMs, the disk images will change once
  we start using the VM, and for powered-on VMs, the blockpull will alter
  the deltas which the disks were pointing to;
- I don't see how I could apply this method in a scalable way if the VM
  had more than one snapshot, at least for powered-on VMs.
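
To make the shut-off case concrete, the disk source change described above
can be sketched roughly as follows. This is only an illustration, not a
recommended procedure: it works on a domain XML string (for example the
output of `virsh dumpxml`) using the Python standard library, it never talks
to libvirt, and it assumes file-backed disks whose `<backingStore>` element
carries a `<source file=...>` child. The helper name
`revert_disks_to_backing`, the sample domain and the `/tank/vm/...` paths
are all made up for the example.

```python
import xml.etree.ElementTree as ET


def revert_disks_to_backing(domain_xml: str) -> str:
    """Point each file-backed disk at its first backingStore image.

    Hypothetical helper: it only edits the XML string it is given and
    does not touch libvirt or any image file.
    """
    root = ET.fromstring(domain_xml)
    for disk in root.findall("./devices/disk"):
        source = disk.find("source")
        backing = disk.find("backingStore")
        if source is None or backing is None:
            continue
        backing_source = backing.find("source")
        if backing_source is None or "file" not in backing_source.attrib:
            continue  # an empty <backingStore/> marks the end of the chain

        # Use the snapshot-time image as the disk source.
        source.set("file", backing_source.get("file"))

        # Match the driver type to the backing image format, if declared.
        fmt = backing.find("format")
        driver = disk.find("driver")
        if fmt is not None and driver is not None and fmt.get("type"):
            driver.set("type", fmt.get("type"))

        # Promote the remainder of the chain one level up.
        nested = backing.find("backingStore")
        position = list(disk).index(backing)
        disk.remove(backing)
        if nested is not None:
            disk.insert(position, nested)
    return ET.tostring(root, encoding="unicode")


# Made-up sample: one disk whose active image is a delta over a raw base.
SAMPLE = """
<domain type='kvm'>
  <name>demo</name>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/tank/vm/demo.snap1'/>
      <backingStore type='file'>
        <format type='raw'/>
        <source file='/tank/vm/demo.img'/>
        <backingStore/>
      </backingStore>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""

if __name__ == "__main__":
    print(revert_disks_to_backing(SAMPLE))
```

The resulting XML would then still have to be applied to the shut-off VM
(for instance with `virsh define`), and the delta file is left untouched on
the volume, which is exactly where the first concern above comes from.
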
Thus, I thought I should seek some advice from you guys and see if there's
another, smarter way that I can do this.

Thanks a lot for your time,
Alex Serban