Hey Kashyap,
I've started reading this to learn about libvirt snapshot implementation,
and noticed a few typos (I think Eric already pointed out some of these),
On Tue, Oct 23, 2012 at 03:28:06PM +0530, Kashyap Chamarthy wrote:
---
docs/snapshots-blockcommit-blockpull.rst | 646 ++++++++++++++++++++++++++++++
1 files changed, 646 insertions(+), 0 deletions(-)
create mode 100644 docs/snapshots-blockcommit-blockpull.rst
diff --git a/docs/snapshots-blockcommit-blockpull.rst
b/docs/snapshots-blockcommit-blockpull.rst
new file mode 100644
index 0000000000000000000000000000000000000000..99c30223a004ee5291e2914b788ac7fe04eee3c8
--- /dev/null
+++ b/docs/snapshots-blockcommit-blockpull.rst
@@ -0,0 +1,646 @@
+.. ----------------------------------------------------------------------
+ Note: All these tests were performed with latest qemu-git,libvirt-git (as of
+ 20-Oct-2012 on a Fedora-18 alpha machine
+.. ----------------------------------------------------------------------
+
+
+Introduction
+============
+
+A virtual machine snapshot is a view of a virtual machine(its OS & all its
+applications) at a given point in time. So that, one can revert to a known sane
+state, or take backups while the guest is running live. So, before we dive into
+snapshots, let's have an understanding of backing files and overlays.
+
+
+
+QCOW2 backing files & overlays
+------------------------------
+
+In essence, QCOW2(Qemu Copy-On-Write) gives you an ability to create a base-image,
+and create several 'disposable' copy-on-write overlay disk images on top of the
+base image(also called backing file). Backing files and overlays are
+extremely useful to rapidly instantiate thin-privisoned virtual machines(more on
provisioned
+it below). Especially quite useful in development & test
environments, so that
+one could quickly revert to a known state & discard the overlay.
+
+**Figure-1**
+
+::
+
+ .--------------. .-------------. .-------------. .-------------.
+ | | | | | | | |
+ | RootBase |<---| Overlay-1 |<---| Overlay-1A <--- | Overlay-1B |
+ | (raw/qcow2) | | (qcow2) | | (qcow2) | | (qcow2) |
+ '--------------' '-------------' '-------------'
'-------------'
+
+The above figure illustrates - RootBase is the backing file for Overlay-1, which
+in turn is backing file for Overlay-2, which in turn is backing file for
+Overlay-3.
Text is about overlay 1, 2 , 3, and the image has 1, 1A and 1B.
+
+**Figure-2**
+::
+
+ .-----------. .-----------. .------------. .------------. .------------.
+ | | | | | | | | | |
+ | RootBase |<--- Overlay-1 |<--- Overlay-1A <--- Overlay-1B <---
Overlay-1C |
+ | | | | | | | | | (Active) |
+ '-----------' '-----------' '------------'
'------------' '------------'
+ ^ ^
+ | |
+ | | .-----------. .------------.
+ | | | | | |
+ | '-------| Overlay-2 |<---| Overlay-2A |
+ | | | | (Active) |
+ | '-----------' '------------'
+ |
+ |
+ | .-----------. .------------.
+ | | | | |
+ '------------| Overlay-3 |<---| Overlay-3A |
+ | | | (Active) |
+ '-----------' '------------'
+
+The above figure is just another representation which indicates, we can use a
+'single' backing file, and create several overlays -- which can be used
further,
+to create overlays on top of them.
+
+
+**NOTE**: Backing files are always opened **read-only**. In other words, once
+ an overlay is created, its backing file should not be modified(as the
+ overlay depends on a particular state of the backing file). Refer
+ below ('blockcommit' section) for relevant info on this.
+
+
+**Example** :
+
+::
+
+ [FedoraBase.img] ----- <- [Fedora-guest-1.qcow2] <- [Fed-w-updates.qcow2]
<- [Fedora-guest-with-updates-1A]
+ \
+ \--- <- [Fedora-guest-2.qcow2] <- [Fed-w-updates.qcow2]
<- [Fedora-guest-with-updates-2A]
+
+(Arrow to be read as Fed-w-updates.qcow2 has Fedora-guest-1.qcow2 as its backing file.)
+
+In the above example, say, *FedoraBase.img* has a freshly installed Fedora-17 OS on it,
+and let's establish it as our backing file. Now, FedoraBase can be used as a
+read-only 'template' to quickly instantiate two(or more) thinly provisioned
+Fedora-17 guests(say Fedora-guest-1.qcow2, Fedora-guest-2.qcow2) by creating
+QCOW2 overlay files pointing to our backing file. Also, the example & *Figure-2*
+above illustrate that a single root-base image(FedoraBase.img) can be used
+to create multiple overlays -- which can subsequently have their own overlays.
+
+
+ To create two thinly-provisioned Fedora clones(or overlays) using a single
+ backing file, we can invoke qemu-img as below: ::
+
+
+ # qemu-img create -b /export/vmimages/RootBase.img -f qcow2 \
+ /export/vmimages/Fedora-guest-1.qcow2
+
+ # qemu-img create -b /export/vmimages/RootBase.img -f qcow2 \
+ /export/vmimages/Fedora-guest-2.qcow2
+
+ Now, both the above images *Fedora-guest-1* & *Fedora-guest-2* are ready to
+ boot. Continuting with our example, say, now you want to instantiate a
Continuing
+ Fedora-17 guest, but this time, with full Fedora updates. This
can be
+ accomplished by creating another overlay(Fedora-guest-with-updates-1A) - but
+ this overly would point to 'Fed-w-updates.qcow2' as its backing file (which
overlay
+ has the full Fedora updates) ::
+
+ # qemu-img create -b /export/vmimages/Fed-w-updates.qcow2 -f qcow2 \
+ /export/vmimages/Fedora-guest-with-updates-1A.qcow2
+
+
+ Information about a disk image, like virtual size, disk size, backing file(if it
+ exists) can be obtained by using 'qemu-img' as below:
+ ::
+
+ # qemu-img info /export/vmimages/Fedora-guest-with-updates-1A.qcow2
+
+ NOTE: With latest qemu, an entire backing chain can be recursively
+ enumerated by doing:
+ ::
+
+ # qemu-img info --backing-chain
/export/vmimages/Fedora-guest-with-updates-1A.qcow2
+
+
+
+Snapshot Terminology:
+---------------------
+
+ - **Internal Snapshots** -- A single qcow2 image file holds both the saved state
+ & the delta since that saved point. This can be further classified as :-
+
+ (1) **Internal disk snapshot**: The state of the virtual disk at a given
+ point in time. Both the snapshot & delta since the snapshot are
+ stored in the same qcow2 file. Can be taken when the guest is
'live'
+ or 'offline'.
+
+ - Libvirt uses QEMU's 'qemu-img' command when the guest is
'offline'.
+ - Libvirt uses QEMU's 'savevm' command when the guest is
'live'.
+
+ (2) **Internal system checkpoint**: RAM state, device state & the
+ disk-state of a running guest, are all stored in the same originial
+ qcow2 file. Can be taken when the guest is running 'live'.
+
+ - Libvirt uses QEMU's 'savevm' command when the guest is
'live'
+
+
+ - **External Snapshots** -- Here, when a snapshot is taken, the saved state will
+ be stored in one file(from that point, it becomes a read-only backing
+ file) & a new file(overlay) will track the deltas from that saved state.
+ This can be further classified as :-
+
+ (1) **External disk snapshot**: The snapshot of the disk is saved in one
+ file, and the delta since the snapshot is tracked in a new qcow2
+ file. Can be taken when the guest is 'live' or 'offline'.
+
+ - Libvirt uses QEMU's 'transaction' cmd under the hood,
when the
+ guest is 'live'.
+
+ - Libvirt uses QEMU's 'qemu-img' cmd under the hood when
the
+ guest is 'offline'(this implementation is in progress, as of
+ writing this).
+
+ (2) **External system checkpoint**: Here, the guest's disk-state will be
+ saved in one file, its RAM & device-state will be saved in another
+ new file (This implementation is in progress upstream libvirt, as of
+ writing this).
+
+
+
+ - **VM State**: Saves the RAM & device state of a running guest(not
'disk-state') to
+ a file, so that it can be restored later. This simliar to doing hibernate
This is similar
If I'm not mistaken there's a big difference between this and hibernate in
that when coming back from hibernate the guest OS knows its clock is out of
sync, but when restoring RAM & device state, it doesn't know that, and the
out of sync clock can confuse some OSes (Windows).
+ of the system. (NOTE: The disk-state should be unmodified at
the time of
+ restoration.)
+
+ - Libvirt uses QEMU's 'migrate' (to file) cmd under the hood.
+
+
+
+Creating snapshots
+==================
+ - Whenever an 'external' snapshot is issued, a /new/ overlay image is
+ created to facilitate guest writes, and the previous image becomes a
+ snapshot.
+
+ - **Create a disk-only internal snapshot**
+
+ (1) If I have a guest named 'f17vm1', to create an offline or online
+ 'internal' snapshot called 'snap1' with description
'snap1-desc' ::
+
+ # virsh snapshot-create-as f17vm1 snap1 snap1-desc
+
+ (2) List the snapshot ; and query using *qemu-img* tool to view
+ the image info & its internal snapshot details ::
+
+ # virsh snapshot-list f17vm1
+ # qemu-img info /home/kashyap/vmimages/f17vm1.qcow2
+
+
+
+ - **Create a disk-only external snapshot** :
+
+ (1) List the block device associated with the guest. ::
+
+ # virsh domblklist f17-base
+ Target Source
+ ---------------------------------------------
+ vda /export/vmimages/f17-base.qcow2
+
+ #
+
+ (2) Create external disk-only snapshot (while the guest is *running*). ::
+
+ # virsh snapshot-create-as --domain f17-base snap1 snap1-desc \
+ --disk-only --diskspec
vda,snapshot=external,file=/export/vmimages/sn1-of-f17-base.qcow2 \
+ --atomic
+ Domain snapshot snap1 created
+ #
+
+ * Once the above command is issued, the original disk-image
+ of f17-base will become the backing_file & a new overlay
+ image is created to track the new changes. Here on, libvirt
+ will use this overlay for further write operations(while
+ using the original image as a read-only backing_file).
+
+ (3) Now, list the block device associated(use cmd from step-1, above)
+ with the guest,again, to ensure it reflects the new overlay image as
+ the current block device in use. ::
+
+ # virsh domblklist f17-base
+ Target Source
+ ----------------------------------------------------
+ vda /export/vmimages/sn1-of-f17-base.qcow2
+
+ #
+
+
+
+
+Reverting to snapshots
+======================
+As of writing this, reverting to 'Internal Snapshots'(system checkpoint or
+disk-only) is possible.
+
+ To revert to a snapshot named 'snap1' of domain f17vm1 ::
+
+ # virsh snapshot-revert --domain f17vm1 snap1
+
+Reverting to 'external disk snapshots' using *snapshot-revert* is a little more
+tricky, as it involves slightly complicated process of dealing with additional
+snapshot files - whether to merge 'base' images into 'top' or to merge
other way
+round ('top' into 'base').
+
+That said, there are a couple of ways to deal with external snapshot files by
+merging them to reduce the external snapshot disk image chain by performing
+either a **blockpull** or **blockcommit** (more on this below).
+
+Further improvements on this front is in work upstream libvirt as of writing
+this.
+
+
+
+Merging snapshot files
+======================
+External snapshots are incredibly useful. But, with plenty of external snapshot
+files, there comes a problem of maintaining and tracking all these inidivdual
individual
+files. At a later point in time, we might want to 'merge'
some of these snapshot
+files (either backing_files into overlays or vice-versa) to reduce the length of
+the image chain. To accomplish that, there are two mechanisms:
+
+ + blockcommit: merges data from **top** into **base** (in other
+ words, merge overlays into backing files).
+
+
+ + blockpull: Populates a disk image with data from its backing file. Or
+ merges data from **base** into **top** (in other words, merge backing files
+ into overlays).
+ blockcommit: merges...
+ blockpull: Populates...
The case is inconsistent here
+
+
+blockcommit
+-----------
+
+Block Commit allows you to merge from a 'top' image(within a disk backing file
+chain) into a lower-level 'base' image. To rephrase, it allows you to
+merge overlays into backing files. Once the **blockcommit** operation is finished,
+any portion that depends on the 'top' image, will now be pointing to the
'base'.
+
+This is useful in flattening(or collapsing or reducing) backing file chain
+length after taking several external snapshots.
+
+
+Let's understand with an illustration below:
+
+We have a base image called 'RootBase', which has a disk image chain with 4
+external snapshots. With 'Active' as the current active-layer, where
'live' guest
+writes happen. There are a few possibilities of resulting image chains that we
+can end up with, using 'blockcommit' :
+
+ (1) Data from Snap-1, Snap-2 and Snap-3 can be merged into 'RootBase'
+ (resulting in RootBase becoming the backing_file of 'Active', and thus
+ invalidating Snap-1, Snap-2, & Snap-3).
+
+ (2) Data from Snap-1 and Snap-2 can be merged into RootBase(resulting in
+ Rootbase becoming the backing_file of Snap-3, and thus invalidating
+ Snap-1 & Snap-2).
+
+ (3) Data from Snap-1 can be merged into RootBase(resulting in RootBase
+ becoming the backing_file of Snap-2, and thus invalidating Snap-1).
+
+ (4) Data from Snap-2 can be merged into Snap-1(resulting in Snap-1 becoming
+ the backing_file of Snap-3, and thus invalidating Snap-2).
+
+ (5) Data from Snap-3 can be merged into Snap-2(resulting in Snap-2 becoming
+ the backing_file for 'Active', and thus invalidating Snap-3).
+
+ (6) Data from Snap-2 and Snap-3 can be merged into Snap-1(resulting in
+ Snap-1 becoming the backing_file of 'Active', and thus invalidating
+ Snap-2 & Snap-3).
+
+ NOTE: Eventually(not supported in qemu as of writing this), we can also
+ merge down the 'Active' layer(the top-most overlay) into its
+ backing_files. Once it is supported, the 'top' argument can become
backing_file instead of backing_files ?
+ optional, and default to active layer.
+
+
+(The below figure illustrates case (6) from the above)
+
+**Figure-3**
+::
+
+ .------------. .------------. .------------. .------------. .------------.
+ | | | | | | | | | |
+ | RootBase <--- Snap-1 <--- Snap-2 <--- Snap-3 <--- Snap-4
|
+ | | | | | | | | | (Active) |
+ '------------' '------------' '------------'
'------------' '------------'
+ / |
+ / |
+ / commit data |
+ / |
+ / |
+ / |
+ v commit data |
+ .------------. .------------. <--------------------' .------------.
+ | | | | | |
+ | RootBase <--- Snap-1 |<---------------------------------| Snap-4 |
+ | | | | Backing File | (Active) |
+ '------------' '------------'
'------------'
+
+For instance, if we have the below scenario:
+
+ Actual: [base] <- sn1 <- sn2 <- sn3 <- sn4(this is active)
+
+ Desired: [base] <- sn1 <- sn4 (thus invalidating sn2,sn3)
+
+ Any of the below two methods is valid (as of 17-Oct-2012 qemu-git). With
+ method-a, operation will be faster & correct if we don't care about
+ sn2(because, it'll be invalidated). Note that, method-b is slower, but sn2
+ will remain valid. (Also note that, the guest is 'live' in all these
cases).
+
+ **(method-a)**:
+ ::
+
+ # virsh blockcommit --domain f17 vda --base /export/vmimages/sn1.qcow2 --top
/export/vmimages/sn3.qcow2 --wait --verbose
+
+ [OR]
+
+ **(method-b)**:
+ ::
+
+ # virsh blockcommit --domain f17 vda --base /export/vmimages/sn2.qcow2
--top /export/vmimages/sn3.qcow2 --wait --verbose
+ # virsh blockcommit --domain f17 vda --base /export/vmimages/sn1.qcow2
--top /export/vmimages/sn2.qcow2 --wait --verbose
+
+ NOTE: If we had to do manually with *qemu-img* cmd, we can only do method-b at
the moment.
+
+
+**Figure-4**
+::
+
+ .------------. .------------. .------------. .------------. .------------.
+ | | | | | | | | | |
+ | RootBase <--- Snap-1 <--- Snap-2 <--- Snap-3 <--- Snap-4
|
+ | | | | | | | | | (Active) |
+ '------------' '------------' '------------'
'------------' '------------'
+ / | |
+ / | |
+ / | |
+ commit data / commit data | |
+ / | |
+ / | commit data |
+ v | |
+ .------------.<----------------------|-------------' .------------.
+ | |<----------------------' | |
+ | RootBase | | Snap-4 |
+ | |<-------------------------------------------------| (Active) |
+ '------------' Backing File
'------------'
+
+
+The above figure is another representation of reducing the disk image chain
+using blockcommit. Data from Snap-1, Snap-2, Snap-3 are merged(/committed)
+into RootBase, & now the current 'Active' image now pointing to
'RootBase' as its
+backing file(instead of Snap-3, which was the case *before* blockcommit). Note
+that, now intermediate images Snap-1, Snap-1, Snap-3 will be invalidated(as they were
+dependent on a particular state of RootBase).
+
+blockpull
+---------
+Block Pull(also called 'Block Stream' in QEMU's paralance) allows you to
merge
parlance
+into 'base' from a 'top' image(within a disk backing
file chain). To rephrase it
+allows merging backing files into an overlay(active). This works in the
+opposite side of 'blockcommit' to flatten the snapshot chain. At the moment,
+**blockpull** can pull only into the active layer(the top-most image). It's
+worth noting here that, intermediate images are not invalidated once a blockpull
+operation is complete (while blockcommit, invalidates them).
I wouldn't put a ',' inside the parentheses.
+
+
+Consider the below illustration:
+
+**Figure-5**
+::
+
+ .------------. .------------. .------------. .------------. .------------.
+ | | | | | | | | | |
+ | RootBase <--- Snap-1 <--- Snap-2 <--- Snap-3 <--- Snap-4
|
+ | | | | | | | | | (Active) |
+ '------------' '------------' '------------'
'------------' '------------'
+ | | \
+ | | \
+ | | \
+ | | \ stream data
+ | | stream data \
+ | stream data | \
+ | | v
+ .------------. | '---------------> .------------.
+ | | '---------------------------------> | |
+ | RootBase | | Snap-4 |
+ | | <---------------------------------------- | (Active) |
+ '------------' Backing File
'------------'
+
+
+
+The above figure illustrates that, using block-copy we can pull data from
+Snap-1, Snap-2 and Snap-3 into the 'Active' layer, resulting in
'RootBase'
+becoming the backing file for the 'Active' image (instead of 'Snap-3',
which was
+the case before doing the blockpull operation).
+
+The command flow would be:
+ (1) Assuming a external disk-only snapshot was created as mentioned in
+ *Creating Snapshots* section:
+
+ (2) A blockpull operation can be issued this way, to achieve the desired
+ state of *Figure-5*-- [RootBase] <- [Active]. ::
+
+ # virsh blockpull --domain RootBase --path
var/lib/libvirt/images/active.qcow2 --base /var/lib/libvirt/images/RootBase.qcow2 --wait
--verbose
+
+
+ As a follow up, we can do the below to clean-up the snapshot *tracking*
+ metadata by libvirt (note: the below does not 'remove' the files, it
+ just cleans up the snapshot tracking metadata). ::
+
+ # virsh snapshot-delete --domain RootBase Snap-3 --metadata
+ # virsh snapshot-delete --domain RootBase Snap-2 --metadata
+ # virsh snapshot-delete --domain RootBase Snap-1 --metadata
+
+
+
+
+**Figure-6**
+::
+
+ .------------. .------------. .------------. .------------. .------------.
+ | | | | | | | | | |
+ | RootBase <--- Snap-1 <--- Snap-2 <--- Snap-3 <--- Snap-4
|
+ | | | | | | | | | (Active) |
+ '------------' '------------' '------------'
'------------' '------------'
+ | | | \
+ | | | \
+ | | | \ stream data
+ | | | stream data \
+ | | | \
+ | | stream data | \
+ | stream data | '------------------> v
+ | | .--------------.
+ | '---------------------------------> | |
+ | | Snap-4 |
+ '----------------------------------------------------> | (Active) |
+ '--------------'
+ 'Standalone'
+ (w/o backing
+ file)
+
+The above figure illustrates, once blockpull operation is complete, by
+pulling/streaming data from RootBase, Snap-1, Snap-2, Snap-3 into 'Active', all
+the backing files can be discarded and 'Active' now will be a standalone image
+without any backing files.
+
+Command flow would be:
+ (0) Assuming 4 external disk-only (live) snapshots were created as
+ mentioned in *Creating Snapshots* section,
+
+ (1) Let's check the snapshot overlay images size *before* blockpull operation
(note the image of 'Active'):
+ ::
+
+ # ls -lash /var/lib/libvirt/images/RootBase.img
+ 608M -rw-r--r--. 1 qemu qemu 1.0G Oct 11 17:54
/var/lib/libvirt/images/RootBase.img
+
+ # ls -lash /var/lib/libvirt/images/*Snap*
+ 840K -rw-------. 1 qemu qemu 896K Oct 11 17:56
/var/lib/libvirt/images/Snap-1.qcow2
+ 392K -rw-------. 1 qemu qemu 448K Oct 11 17:56
/var/lib/libvirt/images/Snap-2.qcow2
+ 456K -rw-------. 1 qemu qemu 512K Oct 11 17:56
/var/lib/libvirt/images/Snap-3.qcow2
+ 2.9M -rw-------. 1 qemu qemu 3.0M Oct 11 18:10
/var/lib/libvirt/images/Active.qcow2
+
+ (2) Also, check the disk image information of 'Active'. It can noticed that
+ 'Active' has Snap-3 as its backing file. ::
+
+ # qemu-img info /var/lib/libvirt/images/Active.qcow2
+ image: /var/lib/libvirt/images/Active.qcow2
+ file format: qcow2
+ virtual size: 1.0G (1073741824 bytes)
+ disk size: 2.9M
+ cluster_size: 65536
+ backing file: /var/lib/libvirt/images/Snap-3.qcow2
+
+ (3) Do the **blockpull** operation. ::
+
+ # virsh blockpull --domain ptest2-base --path
/var/lib/libvirt/images/Active.qcow2 --wait --verbose
+ Block Pull: [100 %]
+ Pull complete
+
+ (4) Let's again check the snapshot overlay images size *after*
+ blockpull operation. It can be noticed, 'Active' is now considerably
larger. ::
+
+ # ls -lash /var/lib/libvirt/images/*Snap*
+ 840K -rw-------. 1 qemu qemu 896K Oct 11 17:56
/var/lib/libvirt/images/Snap-1.qcow2
+ 392K -rw-------. 1 qemu qemu 448K Oct 11 17:56
/var/lib/libvirt/images/Snap-2.qcow2
+ 456K -rw-------. 1 qemu qemu 512K Oct 11 17:56
/var/lib/libvirt/images/Snap-3.qcow2
+ 1011M -rw-------. 1 qemu qemu 3.0M Oct 11 18:29
/var/lib/libvirt/images/Active.qcow2
+
+
+ (5) Also, check the disk image information of 'Active'. It can now be
+ noticed that 'Active' is a standalone image without any backing file -
+ which is the desired state of *Figure-6*.::
+
+ # qemu-img info /var/lib/libvirt/images/Active.qcow2
+ image: /var/lib/libvirt/images/Active.qcow2
+ file format: qcow2
+ virtual size: 1.0G (1073741824 bytes)
+ disk size: 1.0G
+ cluster_size: 65536
+
+ (6) We can now clean-up the snapshot tracking metadata by libvirt to
+ reflect the new reality ::
+
+ # virsh snapshot-delete --domain RootBase Snap-3 --metadata
+
+ (7) Optionally, one can check, the guest disk contents by invoking
+ *guestfish* tool(part of *libguestfs*) **READ-ONLY** (*--ro* option
+ below does it) as below ::
+
+ # guestfish --ro -i -a /var/lib/libvirt/images/Active.qcow2
+
+
+Deleting snapshots (and 'offline commit')
+=========================================
+
+Deleting (live/offline) *Internal Snapshots* (where the originial & all the named
snapshots
original
All in all, this is a very interesting and useful doc for me )
Thanks,
Christophe