On 13/03/2012 23:20, Eric Blake wrote:
virDomainSnapshotCreateXML will learn a new flag:
VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC. If this flag is present, then
libvirt guarantees that the snapshot operation will either succeed, or
report failure without changing domain XML or qemu runtime state. If
present, the creation API will fail if qemu lacks the 'transaction'
command and more than one disk snapshot was requested in the
<domainsnapshot> XML. If this flag is not present, then libvirt will use
'transaction' if available but fall back to 'blockdev-snapshot-sync' so
that it works with older qemu; in that case, on failure the caller has
to check virDomainGetXMLDesc to see whether a partial snapshot occurred.
This flag will be implied by any other part
of the API that requires the use of 'transaction'.
Fine.
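A minimal sketch of the command selection the ATOMIC flag implies. It assumes the caller already knows whether qemu advertises 'transaction' and how many disks the <domainsnapshot> XML names. SNAP_FLAG_ATOMIC and the snap_cmd enum are hypothetical stand-ins (the bit value is made up), not libvirt's actual code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit standing in for the proposed
 * VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC; the value is made up. */
#define SNAP_FLAG_ATOMIC (1u << 0)

typedef enum {
    SNAP_CMD_FAIL,          /* reject up front; domain XML and qemu untouched */
    SNAP_CMD_TRANSACTION,   /* one 'transaction' covering every disk */
    SNAP_CMD_BLOCKDEV_SYNC  /* one 'blockdev-snapshot-sync' per disk */
} snap_cmd;

static snap_cmd choose_snapshot_cmd(unsigned int flags,
                                    bool qemu_has_transaction,
                                    size_t ndisks)
{
    /* Prefer 'transaction' whenever qemu provides it, flag or not. */
    if (qemu_has_transaction)
        return SNAP_CMD_TRANSACTION;

    /* Without 'transaction' a multi-disk request cannot be made atomic,
     * so refuse it when the caller asked for atomicity. */
    if ((flags & SNAP_FLAG_ATOMIC) && ndisks > 1)
        return SNAP_CMD_FAIL;

    /* Best-effort fallback: a failure partway through may leave some disks
     * snapshotted, so the caller must inspect virDomainGetXMLDesc. */
    return SNAP_CMD_BLOCKDEV_SYNC;
}
```

With a single disk the fallback is already all-or-nothing, which is presumably why the rule only applies when more than one disk is requested.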
The VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT flag was added to
virDomainSnapshotCreateXML in 0.9.10, with semantics that it would stop
libvirt from complaining if a regular file already existed as the
snapshot destination, but without interacting with qemu, which would
blindly overwrite the contents of that file. Since this flag is
relatively new, and has not had much use, I propose to slightly alter
its documented semantics to now interact with the qemu 1.1 feature being
added as part of 'transaction'. If qemu supports 'transaction', then
presence of this flag implies that libvirt will explicitly request
'mode':'existing' for each snapshot, which tells qemu to open the
existing file without writing any new metadata, and that the caller is
responsible for ensuring that the file has identical guest contents
(generally by creating a qcow2 file with the current file as backing
image and no additional contents). Additionally, libvirt will now
require the file to already exist (in 0.9.10, libvirt silently ignored
a missing file even when the flag was requested).
Presence of the flag without qemu support for 'transaction' will now
fail (that is, VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT will now imply
VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC).
Also looks ok.
Absence of the flag means that libvirt will rely on qemu's default of
'mode':'absolute-paths', and will require that the file does not exist
as a regular file; this maps to qemu 1.0 always writing a new qcow2
header with an absolute backing file name. If we later want to expose
additional modes, like 'no-backing-file', it would be done via
per-<disk> annotations in the <domainsnapshot> XML rather than via new
flags, but for this proposal, I think oVirt is okay using the flag to
set a single policy for all disks mentioned in a given snapshot request.
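The per-disk rules for REUSE_EXT and for its absence can be sketched as a small decision function. This assumes libvirt has already classified the destination (for example with stat() and S_ISREG) into one of three hypothetical states; SNAP_FLAG_REUSE_EXT, dest_state and snap_mode are made-up names for illustration only.

```c
#include <stdbool.h>

/* Hypothetical bit standing in for VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT;
 * the value is made up. */
#define SNAP_FLAG_REUSE_EXT (1u << 0)

typedef enum {
    DEST_MISSING,       /* nothing at the destination path */
    DEST_REGULAR_FILE,  /* an existing regular file */
    DEST_OTHER          /* exists but is not a regular file, e.g. a block device */
} dest_state;

typedef enum {
    SNAP_MODE_REJECT,         /* fail the request before talking to qemu */
    SNAP_MODE_EXISTING,       /* send 'mode':'existing' */
    SNAP_MODE_ABSOLUTE_PATHS  /* send no mode; qemu's default applies */
} snap_mode;

static snap_mode choose_snapshot_mode(unsigned int flags,
                                      bool qemu_has_transaction,
                                      dest_state dest)
{
    if (flags & SNAP_FLAG_REUSE_EXT) {
        /* 'mode':'existing' comes with 'transaction', so the flag now
         * fails on older qemu (it implies ATOMIC). */
        if (!qemu_has_transaction)
            return SNAP_MODE_REJECT;
        /* Unlike 0.9.10, a missing file is now an error; matching guest
         * contents remain the caller's responsibility. */
        if (dest == DEST_MISSING)
            return SNAP_MODE_REJECT;
        return SNAP_MODE_EXISTING;
    }

    /* Without the flag, never let qemu overwrite an existing regular file. */
    if (dest == DEST_REGULAR_FILE)
        return SNAP_MODE_REJECT;

    /* qemu writes a new qcow2 header with an absolute backing file name. */
    return SNAP_MODE_ABSOLUTE_PATHS;
}
```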
virDomainSnapshotCreateXML's xml argument, <domainsnapshot>, will learn
an optional <mirror> sub-element for each <disk>. While the
'transaction' command supports multiple mirrors in one transaction, for
now libvirt will enforce at most one mirror, which should be sufficient
for oVirt's needs. (Adding support for the rest of the power of
'transaction' is probably best left to a new libvirt API, but that's
outside the scope of this proposal.) As an example,
  <domainsnapshot>
    <disks>
      <disk name='/src/base.img' snapshot='external'>
        <source file='/src/snap.img'/>
        <mirror file='/dest/snap.img'/>
      </disk>
    </disks>
  </domainsnapshot>
would create a new libvirt snapshot object with /src/snap.img as the
new read-write image and /dest/snap.img as the new write-only mirror.
On success, this rewrites the domain's live XML to point to
/src/snap.img as its current file.
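A sketch of how a client might generate the request above and apply the "at most one mirror" rule before calling virDomainSnapshotCreateXML. The snap_disk struct and build_snapshot_xml() are hypothetical helpers, not libvirt API, and paths are assumed not to need XML escaping.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical description of one <disk> in the request. */
struct snap_disk {
    const char *name;    /* disk to snapshot, e.g. "/src/base.img" */
    const char *source;  /* new read-write image, e.g. "/src/snap.img" */
    const char *mirror;  /* new write-only mirror, or NULL for none */
};

/* Append formatted text at buf + *off; fail instead of truncating. */
static int append(char *buf, size_t buflen, size_t *off, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf + *off, buflen - *off, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= buflen - *off)
        return -1;
    *off += (size_t)n;
    return 0;
}

/* Returns the XML length, or -1 if more than one <mirror> is requested
 * or buf is too small. */
static int build_snapshot_xml(const struct snap_disk *disks, size_t ndisks,
                              char *buf, size_t buflen)
{
    size_t off = 0, nmirrors = 0, i;

    for (i = 0; i < ndisks; i++)
        if (disks[i].mirror)
            nmirrors++;
    if (nmirrors > 1)
        return -1;  /* proposed limit: at most one mirror per request */

    if (append(buf, buflen, &off, "<domainsnapshot>\n  <disks>\n") < 0)
        return -1;
    for (i = 0; i < ndisks; i++) {
        if (append(buf, buflen, &off,
                   "    <disk name='%s' snapshot='external'>\n"
                   "      <source file='%s'/>\n",
                   disks[i].name, disks[i].source) < 0)
            return -1;
        if (disks[i].mirror &&
            append(buf, buflen, &off, "      <mirror file='%s'/>\n",
                   disks[i].mirror) < 0)
            return -1;
        if (append(buf, buflen, &off, "    </disk>\n") < 0)
            return -1;
    }
    if (append(buf, buflen, &off, "  </disks>\n</domainsnapshot>\n") < 0)
        return -1;
    return (int)off;
}
```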
This is an awfully low-level API; you're designing for oVirt rather than
for everything else. The problem here is twofold:
1) you're defining a snapshot that cannot be started without losing the
mirrors.
2) in case the snapshotting is aborted early for any reason, oVirt has
to do a rebase operation manually. This is currently O(size-of-disk),
not O(changes-in-the-last-image), so it wastes both disk space and time.
If it works, I cannot really say "don't do it", but I think the oVirt
mirrored snapshots idea is a dead end and a workaround for the lack of
block device streaming (which is now supported). You could have a
simpler, high-level API based on streaming rather than snapshotting.
So, if you have /src/disk.img as your image, you would have a new API:
  virDomainBlockCopy(dom, "disk",
                     "/dst/disk.img", "/src/base.img",
                     bandwidth, flags)
which would do all that is needed:
- start mirroring writes to /dst/disk.img; no snapshotting needed. A
  flag VIR_DOMAIN_BLOCK_COPY_REUSE_EXT would let you specify the
  "existing" mode. Another flag, VIR_DOMAIN_BLOCK_COPY_CREATE_RAW, would
  use the raw format on the destination and specify the no-backing-file
  mode (of course only valid if base == NULL).
- call virDomainBlockRebase(dom, "disk", "/src/base.img", bandwidth, 0)
  to start the streaming job.
If something doesn't work here, it's a QEMU bug.
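A sketch of the argument checks the proposed virDomainBlockCopy would need when choosing how to open the destination, before the streaming job is started with virDomainBlockRebase. The flag names come from the proposal, but their bit values, the copy_mode enum and block_copy_mode() are hypothetical. Two points are assumptions rather than part of the proposal: the two flags are treated as mutually exclusive, and the default falls back to qemu's usual absolute-paths behavior.

```c
#include <stddef.h>

/* Flag names follow the proposal; the bit values are made up. */
#define BLOCK_COPY_REUSE_EXT  (1u << 0)
#define BLOCK_COPY_CREATE_RAW (1u << 1)

typedef enum {
    COPY_INVALID,             /* reject the call */
    COPY_MODE_EXISTING,       /* open the caller-prepared destination */
    COPY_MODE_NO_BACKING,     /* raw destination without a backing file */
    COPY_MODE_ABSOLUTE_PATHS  /* new qcow2 destination backed by base */
} copy_mode;

static copy_mode block_copy_mode(const char *base, unsigned int flags)
{
    /* Assumption: each flag selects a different mode, so both is an error. */
    if ((flags & BLOCK_COPY_REUSE_EXT) && (flags & BLOCK_COPY_CREATE_RAW))
        return COPY_INVALID;

    /* A raw file cannot name a backing file; only base == NULL, where the
     * whole chain is copied, leaves nothing to point back to. */
    if (flags & BLOCK_COPY_CREATE_RAW)
        return base == NULL ? COPY_MODE_NO_BACKING : COPY_INVALID;

    if (flags & BLOCK_COPY_REUSE_EXT)
        return COPY_MODE_EXISTING;

    /* Assumption: otherwise behave like the snapshot default. */
    return COPY_MODE_ABSOLUTE_PATHS;
}
```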
Finally, virDomainSnapshotDelete will learn a new flag,
VIR_DOMAIN_SNAPSHOT_DELETE_REOPEN_MIRROR, which says that the libvirt
snapshot object will be deleted, but only after first calling the qemu
'drive-reopen' monitor command for all disks that had a <mirror> in the
associated snapshot object. That is, for the above example, this would
reopen the disk from its current read-write image, /src/snap.img, over
to the second storage domain's /dest/snap.img with its accompanying
mirrored backing chain. On success, this rewrites the domain's live XML
to point to the just-opened mirror location. Using this flag will fail
if the libvirt snapshot being deleted is not the current snapshot, or if
it does not have any mirrored disks.
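The preconditions for the REOPEN_MIRROR flag can be sketched as a planning step that runs before any monitor command is sent. It returns the disks that would get 'drive-reopen' and fails in the two cases described above. The snap_disk_state struct and plan_reopen_mirror() are hypothetical. After each reopen succeeded, the caller would record the mirror path as the disk's new source in the live XML.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-disk state recorded in a libvirt snapshot object. */
struct snap_disk_state {
    const char *source;  /* current read-write image, e.g. "/src/snap.img" */
    const char *mirror;  /* mirror from the snapshot XML, or NULL */
};

/* Fills reopen[] with the indices of mirrored disks and returns how many
 * there are, or -1 if the delete-with-reopen request must fail. */
static int plan_reopen_mirror(bool snapshot_is_current,
                              const struct snap_disk_state *disks,
                              size_t ndisks, size_t *reopen, size_t reopen_len)
{
    size_t i, n = 0;

    if (!snapshot_is_current)
        return -1;  /* only the current snapshot may be reopened */

    for (i = 0; i < ndisks; i++) {
        if (!disks[i].mirror)
            continue;
        if (n >= reopen_len)
            return -1;  /* caller's array is too small */
        reopen[n++] = i;
    }
    return n == 0 ? -1 : (int)n;  /* no mirrored disks: nothing to reopen */
}
```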
I think you also need VIR_DOMAIN_SNAPSHOT_DELETE_REMOVE_MIRROR, to be
used in case of abort so that the domain can actually be started. Or it
could be an event MIRROR_DROPPED or something like that.
Paolo