On 07/06/2011 02:47 AM, Nicolas Sebrecht wrote:
> And even if you have control over which of the two images to
> delete, you may also want to have control over the final filename used
> for the merged image (that is, in the 5% dirty case, use the
> snapshot->base merge followed by rename(base,snapshot), rather than
> wasting time on the base->snapshot merge, to still get the end result
> that the final filename is snapshot).
I agree with your analysis. The current behaviour is blocking us from
using qcow2 snapshots, while your RFC fixes the issues.
But in your last sentence, what do you mean by "to still get the end
result that the final filename is snapshot"? By end result, are you
talking about:
1) the filename (which would mean we could have a moving path for disks)
Starting from a single file base.img, if I take an external snapshot, I
would now have base.img as the read-only base and new.img as the live
file with a backing of base.img. If I then want to merge the two files,
I want to choose which file to keep:
- If I want to keep base.img, then I either merge the dirty blocks from
  new.img back to base.img, or I merge the clean blocks from base.img to
  new.img then rename new.img to base.img.
- If I want to keep new.img, then I either merge clean blocks from
  base.img to new.img, or I merge dirty blocks from new.img to base.img
  then rename base.img to new.img.
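
The four combinations above can be sketched as a small planning function.
This is only an illustration, not anything qemu or libvirt implements: the
function name merge_plan, the dirty_fraction estimate, and the 0.5
threshold are all assumptions made for the example. It picks the merge
direction that copies fewer blocks, then adds a rename only if the
surviving file does not already have the wanted name.

```python
def merge_plan(keep, dirty_fraction, base="base.img", new="new.img"):
    """Return the steps needed to end up with one file named `keep`.

    dirty_fraction is a hypothetical estimate (0.0 to 1.0) of how much of
    `new` differs from `base`. The 0.5 threshold is an assumption.
    """
    if keep not in (base, new):
        raise ValueError("keep must be one of the two image names")

    if dirty_fraction <= 0.5:
        # Few dirty blocks: cheaper to push them down into the base.
        survivor = base
        steps = [f"merge dirty blocks from {new} into {base}"]
    else:
        # Mostly dirty: cheaper to pull the remaining clean blocks up.
        survivor = new
        steps = [f"merge clean blocks from {base} into {new}"]

    # Rename only when the merged data sits under the other name.
    if survivor != keep:
        steps.append(f"rename {survivor} to {keep}")
    return steps


if __name__ == "__main__":
    # The "5% dirty, keep the snapshot name" case from the quoted text.
    for step in merge_plan("new.img", 0.05):
        print(step)
```

The point of separating the two decisions is that the merge direction is a
cost question while the final file name is a policy question.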
But your point about changing file names is also an important
consideration - when libvirt tells qemu to do a snapshot, what is really
happening (and should qemu support both modes of operation)?
1. libvirt pre-creates an empty file new.img with correct permissions,
then tells qemu that file name; qemu uses the existing base.img as the
read-only base and makes all further edits into the file new.img; so
libvirt has to update the domain XML
2. libvirt creates a hard link new.img as an alternate name to base.img,
then tells qemu that file name; qemu then opens new.img, unlinks
base.img, and recreates base.img with new.img as the backing file,
making all further edits to the new inode but existing base.img file
name; so libvirt does not have to edit domain XML. Except that
permissions may prevent qemu from re-creating a file, and truncating a
hard-linked file is insufficient. So this method would involve some
additional handshaking steps, where qemu would have to get help from
libvirt in re-creating the new file. So this is a non-starter.
3. libvirt renames base.img to new.img while qemu still has the fd open,
then creates base.img with the right permissions, and tells qemu to make
the snapshot into base.img with a backing of new.img; all further edits
go into the file base.img which now has a backing file of new.img. But
this method implies that qemu has to either trust that libvirt did the
rename correctly or verify it (by comparing the inode from an fstat of
the existing open fd against a stat of the backing file name), and that
libvirt has to pass two filenames instead of one. It also implies that
you can rename an in-use file (renaming devices doesn't work as well,
and this isn't portable to mingw). So this also sounds like a non-starter.
4. Any other possibilities?
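
Two of the file-system behaviours that options 2 and 3 rely on can be
checked with a short POSIX-only sketch. The function names are made up
for the example, and this is not qemu or libvirt code. The first demo
renames a file while a descriptor is open and compares fstat of the fd
with stat of the new name, which is the inode check mentioned in option
3. The second shows why truncating a hard-linked file is insufficient in
option 2: both names share one inode, so the data is gone under both.

```python
import os
import tempfile


def same_file(fd, path):
    """True if the open descriptor and the path refer to the same inode."""
    a = os.fstat(fd)
    b = os.stat(path)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def rename_check_demo():
    """Option 3: rename base.img to new.img while it is still open."""
    with tempfile.TemporaryDirectory() as d:
        base = os.path.join(d, "base.img")
        new = os.path.join(d, "new.img")
        with open(base, "wb") as f:
            f.write(b"guest data")
        fd = os.open(base, os.O_RDONLY)
        try:
            os.rename(base, new)           # allowed on POSIX for open files
            renamed_ok = same_file(fd, new)
            with open(base, "wb"):         # fresh, empty base.img
                pass
            fresh_differs = not same_file(fd, base)
        finally:
            os.close(fd)
        return renamed_ok, fresh_differs


def hardlink_truncate_demo():
    """Option 2: truncating one name empties the shared inode."""
    with tempfile.TemporaryDirectory() as d:
        base = os.path.join(d, "base.img")
        new = os.path.join(d, "new.img")
        with open(base, "wb") as f:
            f.write(b"guest data")
        os.link(base, new)                 # new.img is another name for base.img
        with open(base, "wb"):             # truncate in place, no unlink
            pass
        return os.path.getsize(new)        # size seen through the other name


if __name__ == "__main__":
    print(rename_check_demo())
    print(hardlink_truncate_demo())
```

Neither demo runs as written on platforms where an open file cannot be
renamed, which is the portability concern raised for option 3.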
In other words, it looks like we are stuck with updating XML to track
new file names any time we take a snapshot.
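
For illustration only, the XML update in option 1 amounts to pointing the
disk's source at the new file. The sketch below does that on a trimmed
domain XML string with Python's standard xml.etree module. The image
paths, the domain name, and the helper name retarget_disk are
hypothetical, and this is plain XML editing rather than how libvirt
tracks the change internally.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical domain XML with a single file-backed disk.
DOMAIN_XML = """
<domain type='kvm'>
  <name>demo</name>
  <devices>
    <disk type='file' device='disk'>
      <source file='/var/lib/libvirt/images/base.img'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""


def retarget_disk(xml_text, old_path, new_path):
    """Return the XML with every disk source equal to old_path replaced."""
    root = ET.fromstring(xml_text)
    for disk in root.iter("disk"):
        source = disk.find("source")
        if source is not None and source.get("file") == old_path:
            source.set("file", new_path)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(retarget_disk(
        DOMAIN_XML,
        "/var/lib/libvirt/images/base.img",
        "/var/lib/libvirt/images/new.img",
    ))
```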
--
Eric Blake eblake(a)redhat.com +1-801-349-2682
Libvirt virtualization library
http://libvirt.org