Hello everyone,

I'm seeking guidance on best practices for virtual machine recovery using external disk snapshots, particularly in a storage environment with ZFS.

My current snapshot and recovery workflow involves:
- Keeping VM disks and state on a ZFS volume;
- Creating external KVM/Libvirt disk-only snapshots, which leaves the deltas on the volume, next to the disk images;
- Capturing the entire VM state through ZFS snapshots;
- Recovering VMs through ZFS snapshot clones.

I am particularly interested in an app-consistent recovery, for which I need to revert to the KVM snapshot of the VM, to get the clean state that a quiesced snapshot can offer. From other posts in the archive and on forums, it is clear to me that I cannot simply revert to the VM's snapshot if it is a disk-only one, and that I have to manage such snapshots manually.

Thus, my question is: what is the best practice for recovering the VM to the external disk snapshot that we have?

What I have tried, and it worked, though I'm not sure it is best practice: on a VM with only one snapshot, I changed the disk source files (which were pointing to the deltas) to the files referenced by their backingStore sources, effectively making the disks use the state from snapshot time. This only works for shut-off VMs, as live VMs cannot have their disk sources changed, of course. For powered-on VMs with only one snapshot, I instead chose to use `virDomainBlockPull` to have the app-consistent state pulled onto the current disk (which was, and still is, pointing to the delta).

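To make the shut-off case concrete, here is a minimal sketch of the source swap described above. It only rewrites the domain XML text and does not talk to libvirt. It assumes the XML (for example from `virsh dumpxml`) contains a `<backingStore>` element under each snapshotted disk; the function name `point_disks_at_backing` is made up for illustration.

```python
import xml.etree.ElementTree as ET


def point_disks_at_backing(domain_xml):
    """Return domain XML in which every disk that has a backing image
    uses that backing image as its <source> (one level down the chain).

    Assumption: the input XML carries <backingStore> elements, which
    libvirt may only report for some domains and versions.
    """
    root = ET.fromstring(domain_xml)
    for disk in root.findall("./devices/disk"):
        source = disk.find("source")
        backing = disk.find("backingStore")
        if source is None or backing is None:
            continue
        backing_source = backing.find("source")
        if backing_source is None:
            continue  # empty <backingStore/>: this disk has no backing file

        # Point the disk at the image that held the state at snapshot time.
        source.attrib.clear()
        source.attrib.update(backing_source.attrib)

        # Keep the disk type and driver format in line with the backing image.
        if backing.get("type"):
            disk.set("type", backing.get("type"))
        fmt = backing.find("format")
        driver = disk.find("driver")
        if fmt is not None and driver is not None and fmt.get("type"):
            driver.set("type", fmt.get("type"))

        # Drop the stale chain description; it described the old top image.
        disk.remove(backing)
    return ET.tostring(root, encoding="unicode")
```

The result would then be redefined for the shut-off VM, for example with `virsh define`. The delta files themselves are left untouched on disk by this sketch.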
My concerns about this approach are mostly about scalability and the safety of the whole process:
- I am not sure how I could revert to the current snapshot again after these operations: for powered-off VMs, the disk images will change once we start using the VM, and for powered-on VMs, the block pull will alter the deltas that the disks were pointing to;
- I don't see how I could apply this method in a scalable way if the VM had more than one snapshot, at least for powered-on VMs.

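To illustrate what the multi-snapshot case looks like, the sketch below lists each disk's image chain from the domain XML, active delta first and base image last. It does not address the concerns above; it only shows the structure that any per-snapshot revert would have to pick a layer from. It assumes the XML contains nested `<backingStore>` elements, and the helper names `backing_chain` and `chains_by_target` are made up for illustration.

```python
import xml.etree.ElementTree as ET


def backing_chain(disk):
    """Return the image paths of one <disk>, top (active delta) first."""
    chain = []
    node = disk
    while node is not None:
        source = node.find("source")
        if source is None:
            break  # an empty <backingStore/> marks the end of the chain
        # File-backed, block-backed and network disks use different attributes.
        chain.append(source.get("file") or source.get("dev") or source.get("name"))
        node = node.find("backingStore")
    return chain


def chains_by_target(domain_xml):
    """Map each disk target (e.g. 'vda') to its chain of image paths."""
    root = ET.fromstring(domain_xml)
    result = {}
    for disk in root.findall("./devices/disk"):
        target = disk.find("target")
        if target is not None:
            result[target.get("dev")] = backing_chain(disk)
    return result
```

With two snapshots taken, a disk's list would hold three entries: the current delta, the delta created by the first snapshot, and the original image.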
Thus, I thought I should ask for your advice and see if there is another, smarter way to do this.

Thanks a lot for your time,
Alex Serban