On Thu, May 02, 2024 at 14:14:25 +0300, Marius Vollmer wrote:
Hi!
I am trying to improve the support for VM snapshots in the Cockpit web
console, and I am afraid I have questions...
We have been asked to prefer the "external" over the "internal" snapshot
format, at least on RHEL. I haven't yet figured out why, and
consequently I am struggling with deciding how hard the Cockpit UI
should push people towards external snapshots.
This is because development preferentially went into external
snapshots. This unfortunately also meant that internal snapshots were
neglected.
So, what's wrong with internal snapshots?
Firstly, internal snapshots only work with storage formats that support
them. Basically, you can't snapshot a VM with a 'raw' disk. That's not the
case with external snapshots, as the snapshot is done as a qcow2 overlay.
Currently, internal snapshots (at least when done via libvirt) don't
allow you to do a partial (not all disks) snapshot and don't work with
UEFI (historical reasons -> memory image would be stored in the UEFI
image). Both of those can be solved but will require some work.
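To make the "partial snapshot" point concrete, here is a minimal sketch of
how an external snapshot request that skips some disks could be described
in libvirt's <domainsnapshot> XML. The helper name, the disk targets and
the file paths are made up for illustration; only the XML element and
attribute names come from libvirt's snapshot XML format.

```python
import xml.etree.ElementTree as ET


def external_snapshot_xml(name, disks, memory_file=None):
    """Build <domainsnapshot> XML for an external snapshot.

    `disks` maps a disk target (e.g. "vda") to the path of the new qcow2
    overlay, or to None to leave that disk out of the snapshot.
    `memory_file` is where the memory image goes; None means disk-only.
    (Hypothetical helper, not part of libvirt or Cockpit.)
    """
    root = ET.Element("domainsnapshot")
    ET.SubElement(root, "name").text = name

    if memory_file is None:
        ET.SubElement(root, "memory", snapshot="no")
    else:
        ET.SubElement(root, "memory", snapshot="external", file=memory_file)

    disks_el = ET.SubElement(root, "disks")
    for target, overlay in disks.items():
        if overlay is None:
            # Excluded disk: this is the "partial" part of the snapshot.
            ET.SubElement(disks_el, "disk", name=target, snapshot="no")
        else:
            disk = ET.SubElement(disks_el, "disk", name=target,
                                 snapshot="external")
            ET.SubElement(disk, "driver", type="qcow2")
            ET.SubElement(disk, "source", file=overlay)

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(external_snapshot_xml(
        "before-upgrade",
        {"vda": "/var/lib/libvirt/images/demo.before-upgrade.qcow2",
         "vdb": None},
    ))
```

The resulting string is what one would hand to libvirt's snapshot creation
API (snapshotCreateXML() in the Python bindings); a disk-only request like
the one printed above would additionally need the disk-only creation flag.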
I heard they are "unreliable", but how so in detail? Does the data
structure inside the qcow2 files get corrupted easily?
I don't think this ever was true.
Do they behave
poorly when the snapshot process runs out of disk in the middle?
The main problem is that the VM is paused and the interaction with
libvirt blocks until the snapshot is done.
The only disk consumption is from when the memory is snapshotted; in
either case that should lead to a failure in snapshotting, and the VM
should continue.
That
sort of thing would help me a lot to figure out what Cockpit should be
doing on platforms other than RHEL.
I don't think having different behaviour is a good idea.
And how well (or how soon) can external snapshots be expected to
work?
Very technically, libvirt already expects external snapshots to work.
That said, it's a relatively recent implementation, so there may be bugs.
I have severely messed up my libvirt state a couple of times while
playing around with them, and my confidence in them right now isn't
great. :-) Are you surprised by this? Or are external snapshots not yet
considered ready?
Please do report them, including the steps by which you managed to break
things.
(Most recent example from my experiments: Deleting a full system
snapshot of a paused machine fails with "internal error: unable to
execute QEMU command 'block-commit': Block node is read-only". Reverting
to it works and after that it can also be deleted, all while the VM is
paused.)
Ah right. This is a corner case oversight in the implementation. QEMU
apparently removes write flags from the storage while the VM is paused.
Once again please report this in our issue tracker.
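For the issue report, a short list of reproducer steps helps. Below is a
rough sketch that writes the sequence described above as virsh commands.
The exact commands and options used in the original experiment are not
stated, so the ordering (pause first, then snapshot), the domain name, the
snapshot name and the memory file path are all assumptions; the helper
functions are hypothetical.

```python
import shlex


def paused_delete_reproducer(domain, snapshot, memory_file):
    """Return virsh command lines for the paused-VM delete failure.

    Assumed sequence: pause the VM, take a full system (memory + disk)
    external snapshot, try to delete it, then revert and delete again.
    """
    memspec = f"file={memory_file},snapshot=external"
    return [
        ["virsh", "suspend", domain],
        ["virsh", "snapshot-create-as", domain, snapshot,
         "--memspec", memspec],
        # Reported to fail with "Block node is read-only" while paused.
        ["virsh", "snapshot-delete", domain, snapshot],
        # Reported workaround: revert first, then the delete succeeds.
        ["virsh", "snapshot-revert", domain, snapshot],
        ["virsh", "snapshot-delete", domain, snapshot],
    ]


def as_shell(steps):
    """Render the steps as copy-pasteable shell lines."""
    return "\n".join(shlex.join(step) for step in steps)


if __name__ == "__main__":
    print(as_shell(paused_delete_reproducer(
        "demo", "snap1", "/var/lib/libvirt/images/demo.snap1.mem")))
```

Pasting the printed lines into the report, together with the libvirt and
QEMU versions, should be enough for someone else to retrace the steps.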