On Tue, Jul 5, 2011 at 8:59 PM, Eric Blake <eblake@redhat.com> wrote:
On 07/04/2011 08:19 AM, Stefan Hajnoczi wrote:
> On Thu, Jun 16, 2011 at 6:41 AM, Eric Blake <eblake@redhat.com> wrote:
>
> Robert, Fernando, Jagane: I have CCed you because we have discussed
> snapshot APIs and I thought you'd be interested in Eric's work to
> build them for libvirt.
>
> Does each volume have its own independent snapshot namespace? It may
> be wise to document that snapshot namespaces are *not* independent
> because storage backends may not be able to provide these semantics.
Good question, and I'm not quite sure on the best way to represent this.
For qcow2 internal snapshots, the answer is obvious - each qcow2 image
has its own snapshot namespace (and you can currently view this with
qemu-img snapshot -l). But it looks like the existing drive to add qemu
snapshot support to live domains is focusing solely on external
snapshots (that is, a qcow2 snapshot involves creating a new filename,
then marking the old filename as a read-only backing image of the new
qcow2 filename; where qcow2 can even be used as a snapshot around a raw
file).
For external snapshots, I'm not sure whether the namespace should be
specific to a storage pool (that is, any additional metadata that
libvirt needs to track snapshot relationships should be stored in the
same directory as the disk images themselves) or specific to the libvirt
host (that is, just as libvirt's notion of a persistent storage pool
happens to be stored in /etc/libvirt/storage/pool.xml, libvirt should
also manage a file /etc/libvirt/storage/pool/snapshot.xml to describe
all snapshots tracked within "pool"). But I'm certainly thinking that
it is more likely to be a pool-wide namespace, rather than a
storage-volume local namespace.
Pool-wide seems reasonable.
>
> Is this function necessary when you already have
> virStorageVolSnapshotListNames()?
Maybe I need to understand why we have it for the virDomainSnapshot
case, and whether it still makes sense for a disk image that is not
associated with a domain. To some degree, I think it is necessary,
to reflect the fact that with internal qcow2 snapshots, I can do:
qemu-img snapshot -c one file
run then stop vm to modify file
qemu-img snapshot -c two file
qemu-img snapshot -a one file
run then stop vm to modify file
qemu-img snapshot -c three file
with the resulting hierarchy:
one -> two
\-> three
On the other hand, qemu-img doesn't appear to list any hierarchies
between internal snapshots - that is, while 'qemu-img snapshot -l' will
list one, two, and three, it gives no indication that three depends on
one but not two, nor whether the current state of the file would be a
delta against three, two, one, or even parent-less.
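For reference, 'qemu-img snapshot -l' on the file above prints a flat
table along these lines (column widths and timestamps here are only
illustrative), with no parent/child information at all:

Snapshot list:
ID        TAG            VM SIZE                DATE       VM CLOCK
1         one                  0 2011-07-05 10:00:00   00:00:00.000
2         two                  0 2011-07-05 10:05:00   00:00:00.000
3         three                0 2011-07-05 10:10:00   00:00:00.000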
There is no explicit relationship between internal qcow2 snapshots.
qcow2 does reference counting of the actual data clusters and tables
but the snapshot itself is oblivious. You can delete "one" without
affecting "two" or "three". There is no dependency relationship
between snapshots themselves, only reference counts on data clusters
and tables.
Here is how qcow2 snapshot operations work:
1. Create snapshot
   Increment reference counts for entire active image.
   Copy active L1 table into the snapshot data structure.
2. Activate snapshot
   Decrement reference counts for entire active image.
   Copy snapshot L1 table into active data structure.
   Increment reference counts for entire active image.
3. Delete snapshot
   Decrement reference counts for entire snapshot image.
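A tiny made-up example of how the reference counts behave (the numbers
are illustrative only):

cluster X is referenced by the active image and by snapshot "one"
    -> refcount(X) == 2, so a guest write to X triggers copy-on-write
delete snapshot "one"
    -> refcount(X) drops to 1; "two" and "three" are unaffected unless
       they also reference X
a cluster whose refcount reaches 0 is free for reuse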
This also starts to get into questions about the ability to split a
qcow2 image with internal snapshots. That is, if I have a single file
with snapshot one and a delta against that snapshot as the current disk
state, it would be nice to create a new qcow2 file with identical
contents to snapshot one, then rebase the existing qcow2 file to have a
backing file of my new clone file and delete the internal snapshot from
the original file. But this starts to sound like work on live block
copy APIs. For an offline storage volume, we can do things manually
(qemu-img snapshot -c to temporarily create yet another snapshot point
to later return to, qemu-img snapshot -a to revert to the snapshot of
interest, then qemu-img convert to copy off the contents, then qemu-img
snapshot -a to the temporary state, then qemu-img snapshot -d to clean
up both the temporary and the original snapshot). But for a storage volume currently in
use by qemu, this would imply a new qemu command to have qemu assist in
streaming out the contents of the snapshot state.
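Spelled out, that manual offline sequence might look roughly like this
(file and snapshot names are made up, reusing the hierarchy from
earlier in the thread):

qemu-img snapshot -c tmp orig.qcow2      # temporary snapshot of the current state
qemu-img snapshot -a one orig.qcow2      # revert to the snapshot of interest
qemu-img convert -O qcow2 orig.qcow2 clone.qcow2   # copy off its contents
qemu-img snapshot -a tmp orig.qcow2      # return to the saved current state
qemu-img snapshot -d tmp orig.qcow2      # delete the temporary snapshot
qemu-img snapshot -d one orig.qcow2      # delete the now-cloned snapshot

with clone.qcow2 then available to serve as the backing file for a
rebased orig.qcow2.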
The current live block copy/image streaming APIs do not know about
internal snapshots. Copying the contents of a snapshot while the VM is
running is technically doable but there is no API and no code for it
in QEMU.
>> /* Return the most recent snapshot of a volume, if one exists, or NULL
>> on failure. Flags is 0 for now. */
>> virStorageVolSnapshotPtr virStorageVolSnapshotCurrent(virStorageVolPtr
>> vol, unsigned int flags);
>
> The name should include "revert". This looks like a shortcut function
> for virStorageVolRevertToSnapshot().
No, it was intended as a counterpart to virDomainSnapshotCurrent, which
returns the "current snapshot" if there is one. But again, it may be
that while a "current snapshot" makes sense for a domain, it might not
make sense for a storage volume in isolation.
qcow2 internal snapshots always copy metadata to the "active" image;
they do not allow you to update existing snapshots in place. In that
sense a qcow2 image only has one current snapshot, the active image.
The way to update a snapshot is to delete it and create a new one with
the same name.
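In qemu-img terms that would simply be something like (the name "foo"
is made up):

qemu-img snapshot -d foo file.qcow2    # drop the old snapshot "foo"
qemu-img snapshot -c foo file.qcow2    # recreate it from the current active image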
Stefan