Hi all.
The purpose of this API is to help with making host-side backups of running
domains without having to suspend them until the backup finishes. The API can
be used to create a consistent snapshot of a disk assigned to a domain, which
can later be used for making backups. For better consistency, libvirt will try
to notify the domain to make data on disk consistent. A guest agent is needed
for this to work, and libvirt will talk to it either through the hypervisor
API or directly.
Libvirt already provides snapshot support with the virDomainSnapshot* APIs,
but they serve a slightly different purpose. Domain snapshots consist of the
current state of all disks and the runtime state (if the domain is running),
taken at the same time. Since memory is also saved, domain snapshot creation
can take some time, during which the domain may need to be suspended. The main
benefit of these APIs is that they can be used to create checkpoints of a
domain in a known good state to which the domain can later be reverted.
The disk snapshot API needs to be general and flexible enough to support
various storage and snapshot methods. Examples of what can be used for
creating disk snapshots are: QEMU qcow2 snapshots, LVM snapshots, filesystems
with snapshot support (btrfs, zfs, ...), and enterprise storage.
Moreover, the method used to create a disk snapshot does not have to be
determined by disk type. One can have a qcow2 disk stored on btrfs inside an
LVM logical volume; a snapshot of such a disk can be taken at any of the three
levels.
Hypervisor support: QEMU supports the snapshot_blkdev monitor command for
creating qcow2 snapshots. VMware does not seem to support per-device
snapshots, only per-VM snapshots without memory. VirtualBox provides
IMedium::createDiffStorage() and IMedium::mergeTo() for per-device snapshots,
but they do not seem to work on disks in use.
However, full support from hypervisors is not strictly required, since
snapshots can be done outside of them, although at the possible cost of losing
consistency.
As different snapshot methods may require different input data, a new
diskSnapshot XML element is introduced to describe the disk snapshot method:

    <diskSnapshot method='qemu|lvm|btrfs|...'>
      <!-- required data depending on the method -->
    </diskSnapshot>

For the qemu method, the element can be very simple:

    <diskSnapshot method='qemu'/>

For lvm, the logical volume device needs to be specified in case it is not the
same as the disk to be snapshotted:

    <diskSnapshot method='lvm'>
      <device path='/dev/vg/lv4'/>
    </diskSnapshot>

Enterprise storage would need its own ways of identifying volumes to be
snapshotted.
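
To make the two element shapes above concrete, here is a minimal sketch of a
helper that formats a diskSnapshot element. formatDiskSnapshot() is a
hypothetical name used only for illustration, it is not part of libvirt, and
it does no XML escaping.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of libvirt: formats a <diskSnapshot>
 * element for the given method. devicePath may be NULL when the method
 * needs no extra data (e.g. 'qemu'). Returns 0 on success, -1 if the
 * buffer is too small. No XML escaping is done in this sketch. */
static int
formatDiskSnapshot(char *buf, size_t buflen,
                   const char *method, const char *devicePath)
{
    int n;

    assert(buf != NULL && method != NULL);

    if (devicePath == NULL)
        n = snprintf(buf, buflen, "<diskSnapshot method='%s'/>", method);
    else
        n = snprintf(buf, buflen,
                     "<diskSnapshot method='%s'>"
                     "<device path='%s'/>"
                     "</diskSnapshot>",
                     method, devicePath);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

Calling it with ("qemu", NULL) gives the self-closing form, while
("lvm", "/dev/vg/lv4") gives the form with a nested device element.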
The diskSnapshot element can be used inside the disk element in domain XML:

    <domain ...>
      ...
      <devices>
        ...
        <disk ...>
          ...
          <diskSnapshot ...>
            ...
          </diskSnapshot>
        </disk>
      </devices>
    </domain>

The "disk" part of the "diskSnapshot" element name may seem redundant, but I
wanted to avoid confusion with the snapshot XML element used for domain
snapshots.
The existing virDomainUpdateDevice API can be used to alter the snapshot
method on existing disk devices.
To create a snapshot of a disk, the following API is introduced:

    int
    virDomainDiskSnapshotCreate(virDomainPtr domain,
                                const char *disk,
                                const char *method,
                                const char *name,
                                char **modifiedDisk,
                                char **backupSource,
                                unsigned int flags);

    @domain
        pointer to domain object
    @disk
        XML definition of the disk to snapshot
    @method
        optional <diskSnapshot> XML element overriding the one from <disk>
        element; if none of them is specified, default method according to
        disk type (and pool) is used; e.g., qcow disk => qemu method; logical
        volume in LVM pool => lvm method
    @name
        snapshot name
    @modifiedDisk
        place where to store modified XML description of the disk which the
        domain is now using; NULL is stored if domain is still using the
        original disk (snapshot was created separately)
    @backupSource
        place where to store 'source' element from disk XML describing the
        disk which can be used to take backups of 'disk' (i.e., read-only and
        immutable snapshot); it might either be the same as provided in 'disk'
        or something else (depending on method/implementation used); e.g., for
        qemu method, the element describes previous disk source; lvm creates a
        new device for snapshot and keeps writing into the original device
    @flags
        OR'ed set of flags:
        - VIR_DOMAIN_DISK_SNAPSHOT_QUIESCE_REQUIRED -- if no guest agent is
          running/answering requests for consistent disk state, fail the API;
          otherwise, the snapshot will be done regardless

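
The method precedence and the flag semantics described above can be sketched
roughly as follows. This is only an illustration under assumptions: both
helper names are hypothetical, the numeric value of the flag is made up, and
the "qcow2" and "logical" spellings (as well as checking the disk format
before the pool type) are my guesses at how the defaults would be keyed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Flag name is from the proposal; the numeric value is assumed here. */
#define VIR_DOMAIN_DISK_SNAPSHOT_QUIESCE_REQUIRED (1u << 0)

/* Hypothetical helper: choose the snapshot method.
 * Precedence: the @method override, then the <diskSnapshot> found inside
 * <disk>, then a default derived from disk format / pool type.
 * Returns NULL when no default is known. */
static const char *
resolveSnapshotMethod(const char *overrideMethod,
                      const char *diskMethod,
                      const char *diskFormat,
                      const char *poolType)
{
    if (overrideMethod != NULL)
        return overrideMethod;
    if (diskMethod != NULL)
        return diskMethod;
    /* Assumed spellings: "qcow2" disk format, "logical" pool type. */
    if (diskFormat != NULL && strcmp(diskFormat, "qcow2") == 0)
        return "qemu";
    if (poolType != NULL && strcmp(poolType, "logical") == 0)
        return "lvm";
    return NULL;
}

/* Hypothetical helper: decide whether to go ahead with the snapshot.
 * guestQuiesced is non-zero when the guest agent confirmed a consistent
 * on-disk state. Returns 0 to proceed, -1 to fail the call. */
static int
checkQuiesce(int guestQuiesced, unsigned int flags)
{
    if (!guestQuiesced &&
        (flags & VIR_DOMAIN_DISK_SNAPSHOT_QUIESCE_REQUIRED))
        return -1;
    return 0;
}
```

Without the flag, checkQuiesce() lets the snapshot proceed even when the guest
could not be quiesced, which matches the "done regardless" wording above.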
I have a slight feeling that the API is a bit over-engineered, but I'm not
entirely sure whether it can be simplified and still provide the flexibility
and future-compatibility. I have this feeling especially about the
backupSource output parameter, which could possibly be replaced with a simple
char * (eventually returned directly by the API instead of int) containing a
file/device path.
Another thing which is not strictly needed is modifiedDisk. The caller can ask
for the domain XML and look up the device there if needed, but that would be
quite complicated. Thus returning it from this API seemed useful and logical
too, since the API is possibly changing the disk XML and it makes sense to
return the changes.
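
To illustrate what the two output parameters buy the caller, here is a rough
caller-side sketch. It does not call libvirt: mockDiskSnapshotCreate() is a
stand-in I made up that imitates only the lvm case described above (domain
keeps the original disk, a new snapshot device is reported), and both the
device path layout and the "caller frees both strings" ownership rule are
assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Mock with the proposed output parameters; NOT a libvirt function.
 * It imitates the 'lvm' case from the proposal: the domain keeps
 * writing to the original device, so *modifiedDisk is set to NULL and
 * *backupSource describes a new snapshot device. The device path
 * layout is invented for this sketch. */
static int
mockDiskSnapshotCreate(const char *name,
                       char **modifiedDisk,
                       char **backupSource)
{
    const char *fmt = "<source dev='/dev/vg/%s'/>";
    size_t len = strlen(fmt) + strlen(name) + 1;
    char *src = malloc(len);

    if (src == NULL)
        return -1;
    snprintf(src, len, fmt, name);

    *modifiedDisk = NULL;
    *backupSource = src;
    return 0;
}

/* Caller side: both outputs are assumed to be owned by the caller.
 * Returns 1 if the domain's disk definition changed, 0 if it did not,
 * -1 on error. On success *backupSource must be freed by the caller. */
static int
snapshotForBackup(const char *name, char **backupSource)
{
    char *modifiedDisk = NULL;
    int changed;

    if (mockDiskSnapshotCreate(name, &modifiedDisk, backupSource) < 0)
        return -1;

    /* NULL means the domain still uses the original disk, so there is
     * no need to fetch and parse the whole domain XML to find out. */
    changed = (modifiedDisk != NULL);
    free(modifiedDisk);
    return changed;
}
```

With a plain char * return value for the backup path, the caller would lose
the rest of the source element but would also have one string less to free.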
Deleting/merging snapshots previously created by virDomainDiskSnapshotCreate
is not covered by this proposal and will need to be added in the future to
complete disk snapshot support.
Jirka