Re: [libvirt] RFC: API additions for enhanced snapshot support

21 Jun 2011


      On 06/21/2011 04:30 AM, Daniel P. Berrange wrote:
...
...
Upstream qemu is developing a 'live snapshot' feature, which allows the
creation of a snapshot without the several seconds of downtime
required by the current 'savevm' monitor command, as well as a means for
controlling applications (libvirt) to request that qemu pause I/O to a
particular disk, then externally perform a snapshot, then tell qemu to
resume I/O (perhaps on a different file name or fd from the host, but
with no change to the contents seen by the guest).  Eventually, these
changes will make it possible for libvirt to create fast snapshots of
LVM partitions or btrfs files for guest disk images, as well as to
Actually, IIUC, the QEMU 'live snapshot' feature is only for special
disk formats like qcow2, qed, etc.
Does anyone have pointers to the qemu implementation of monitor commands
used for live snapshot?
...
For formats like LVM, btrfs, SCSI, etc., libvirt will have to do all
the work of creating the snapshot, possibly then telling QEMU to
switch the backing file of a virtual disk to the new image (if the
snapshot mechanism works that way).
Yes, that was what I was envisioning.
...
...
select which disks are saved in a snapshot (that is, save a
crash-consistent state of a subset of disks, without the corresponding
RAM state, rather than making a full system restore point); the latter
would work best with guest cooperation to quiesce disks before qemu
pauses I/O to that disk, but that is an orthogonal enhancement.
At the very least, you need a way to stop QEMU writing to the disk
for a period of time, whether or not the guest is quiesced. There
are basically 3 options:
 1. Pause the guest CPUs (e.g. 'stop' on the monitor)
 2. QEMU queues I/O from guest in memory temporarily (does not currently exist)
 3. QEMU tells guest to quiesce I/O temporarily (does not currently exist)
To perform a snapshot, libvirt would need to do:
 1. Stop I/O using one of the 3 methods above
 2. If disk is a special format
      - Ask QEMU to snapshot it
    Else
      - Create snapshot ourselves
      - Update QEMU disk backing path (optional)
 3. Resume I/O
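
As a rough sketch of the three steps above (not libvirt code): the
FakeMonitor class, its method names, and snapshot_disk() are all
hypothetical stand-ins invented for illustration, and the set of formats
that QEMU snapshots itself is an assumption based on the "qcow2, qed, etc."
remark earlier in the thread.  Only option 1 (pausing the guest CPUs) is
modelled, since the other two options do not exist yet.

```python
from contextlib import contextmanager

# Assumption for illustration: formats where QEMU itself takes the snapshot.
QEMU_SNAPSHOT_FORMATS = {"qcow2", "qed"}


class FakeMonitor:
    """Hypothetical stand-in for a QEMU monitor connection; records calls."""

    def __init__(self):
        self.log = []

    def stop(self):
        self.log.append("stop")

    def cont(self):
        self.log.append("cont")

    def snapshot(self, disk):
        self.log.append(f"qemu-snapshot {disk}")

    def change_backing_path(self, disk, path):
        self.log.append(f"repoint {disk} -> {path}")


@contextmanager
def io_paused(monitor):
    """Steps 1 and 3: pause the guest CPUs, and always resume afterwards."""
    monitor.stop()
    try:
        yield
    finally:
        monitor.cont()


def snapshot_disk(monitor, disk, fmt, create_external, new_path=None):
    """Step 2: QEMU snapshots special formats; otherwise we do it ourselves."""
    with io_paused(monitor):
        if fmt in QEMU_SNAPSHOT_FORMATS:
            monitor.snapshot(disk)
        else:
            create_external(disk)      # e.g. an LVM or btrfs snapshot
            if new_path is not None:   # optional: repoint QEMU at the new image
                monitor.change_backing_path(disk, new_path)
    return monitor.log
```

The context manager is there so that I/O is resumed even if the external
snapshot step fails partway through.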
It is step 2B (create the snapshot ourselves) where the proposed
virStorageVolSnapshot* APIs would be useful.  The remaining steps also
need implementation, but I believe that they can fit into existing APIs
by the use of new flag values, rather than requiring any new API.
...
...
However, my first goal with API enhancements is to merely prove that
libvirt can manage a live snapshot by using qemu-img on a qcow2 image
rather than the current 'savevm' approach of qemu doing all the work.
FYI, QEMU developers are adamant that if the disk image is open
by QEMU you should, in general, not do anything using qemu-img
on that disk image.
Agreed.  And I further think that we need to expend some effort making
the new image locking code also play well with libvirt - that is, any
virStorageVol API that can modify a disk image (rather than just do a
read-only operation describing the image) should probably be taught to
fail if any active domain is also using that image.  Conversely, if a
long-running virStorageVol API is started on a volume, then an attempt
to virDomainStart a domain should see that the volume is already in use
and fail just as if the volume had been locked by another running domain.
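
To make the two-way exclusion concrete, here is a minimal sketch.  It is
an in-memory toy and does not use the actual lock manager code: the
VolumeLocks registry, VolumeBusyError, start_domain(), and modify_volume()
are hypothetical names chosen for illustration only.

```python
class VolumeBusyError(RuntimeError):
    """Raised when a volume is already claimed by another user."""


class VolumeLocks:
    """Hypothetical in-memory registry mapping volume path -> owner."""

    def __init__(self):
        self._owners = {}

    def acquire(self, path, owner):
        holder = self._owners.get(path)
        if holder is not None and holder != owner:
            raise VolumeBusyError(f"{path} is in use by {holder}")
        self._owners[path] = owner

    def release(self, path, owner):
        if self._owners.get(path) == owner:
            del self._owners[path]

    def holder(self, path):
        return self._owners.get(path)


def start_domain(locks, name, disks):
    """Claim every disk or none: undo partial claims on the first conflict."""
    owner = f"domain:{name}"
    taken = []
    try:
        for path in disks:
            locks.acquire(path, owner)
            taken.append(path)
    except VolumeBusyError:
        for path in taken:
            locks.release(path, owner)
        raise


def modify_volume(locks, path, operation):
    """Run a modifying storage operation only while holding the volume."""
    owner = f"storage-op:{path}"
    locks.acquire(path, owner)   # fails if an active domain holds the image
    try:
        return operation(path)
    finally:
        locks.release(path, owner)
```

With this shape, a modifying storage call on an image held by a running
domain raises VolumeBusyError, and a domain start attempted while a
long-running storage operation holds the volume fails the same way.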
...
libvirt does currently do things like querying
disk capacity, but we can get away with that because it is an
invariant section of the header. We certainly can't create internal
snapshots with qemu-img while the guest is live. Creating external
snapshots with qemu-img is probably OK, but when I've suggested
this before QEMU developers were unhappy with even that.
Basically, my proposed virStorageVolSnapshot APIs should only be used on
inactive volumes; for a running domain, you should always go through the
existing virDomainSnapshot API, which can then make appropriate
decisions whether to do external snapshots, or whether to have qemu do
the work because the image is qcow2.  I think we're in agreement here,
and that it still doesn't impact the decision for adding new API for
offline snapshot management.
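
Spelled out as a decision function, the split described above looks
roughly like this.  It is only a sketch: SnapshotRoute and pick_route()
are hypothetical names, and the qcow2 test is a simplification of
whatever checks virDomainSnapshot would really make.

```python
from enum import Enum


class SnapshotRoute(Enum):
    STORAGE_VOL_API = "offline, via the proposed virStorageVolSnapshot* APIs"
    DOMAIN_API_QEMU = "online, virDomainSnapshot with qemu doing the work"
    DOMAIN_API_EXTERNAL = "online, virDomainSnapshot with an external snapshot"


def pick_route(in_use_by_running_domain, image_format):
    """Inactive volumes use the storage API; running domains never do."""
    if not in_use_by_running_domain:
        return SnapshotRoute.STORAGE_VOL_API
    if image_format == "qcow2":
        return SnapshotRoute.DOMAIN_API_QEMU
    return SnapshotRoute.DOMAIN_API_EXTERNAL
```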
...
What I'm not seeing here is how these APIs all relate to the existing
support we have in virStorageVol APIs for creating snapshots. This is
already implemented for LVM, QCow, QCow2.
The only existing snapshot API that I found was
virDomainSnapshotCreateXML, which only works on qcow2 (not lvm or qcow),
and which works either online (via qemu) or offline (via qemu-img).  But
I could have overlooked something - where is the existing API for
creating an LVM snapshot?  For volume creation, I'm aware of code for
specifying a backing file for an existing file, but backing files aren't
necessarily the same as snapshots, are they?
...
The snapshots are created by
specifying a backing file in the initial volume description. Depending
on the storage type, the backing file for a snapshot can be writable,
or readonly. Snapshots appear as just more storage volumes, and are not
restricted to being within the same pool as the original volume. You can
also mix storage formats, eg, create a Qcow2 volume with backing file
on LVM, which is itself a snapshot of another LVM volume.
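
For reference, a volume description with a backing file can be generated
along these lines.  This is a sketch: the element names follow the storage
volume XML format as I understand it (a <backingStore> element holding
<path> and <format>), while the helper name snapshot_volume_xml() and the
example volume names and paths are made up.

```python
import xml.etree.ElementTree as ET


def snapshot_volume_xml(name, capacity_bytes, fmt, backing_path, backing_fmt):
    """Build a volume description whose backingStore points at the original."""
    vol = ET.Element("volume")
    ET.SubElement(vol, "name").text = name
    # Capacity is given in bytes here (no unit attribute).
    ET.SubElement(vol, "capacity").text = str(capacity_bytes)
    target = ET.SubElement(vol, "target")
    ET.SubElement(target, "format", type=fmt)
    backing = ET.SubElement(vol, "backingStore")
    ET.SubElement(backing, "path").text = backing_path
    ET.SubElement(backing, "format", type=backing_fmt)
    return ET.tostring(vol, encoding="unicode")


# Hypothetical example: a qcow2 volume backed by an LVM logical volume.
xml_desc = snapshot_volume_xml(
    "guest-snap.qcow2", 10 * 1024**3, "qcow2", "/dev/vg0/guest-base", "raw"
)
```

The resulting string is the kind of description that would be handed to
virStorageVolCreateXML on whichever pool should hold the new volume, which
need not be the pool holding the backing volume.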
The QCow2 internal snapshots don't really fit into our existing model,
since they don't have extra associated external files, so maybe we do
still want some of these explicit APIs to query snapshots against
volumes.
-- 
Eric Blake   eblake@redhat.com    +1-801-349-2682
Libvirt virtualization library http://libvirt.org