2010/3/30 Chris Lalancette <clalance(a)redhat.com>:
Hello,
After our discussions about the snapshot API last week, I went ahead and implemented
quite a bit of the API. I also went back to the ESX, Virtualbox, and QEMU APIs to
try and make sure our APIs matched up. What's below is my revised API based on
that survey. Following my revised API are notes that I took regarding how the
libvirt API matches up to the various APIs, and some questions about semantics that
I had while doing the survey. More comments and questions are welcome.
/* Start the guest from the snapshot "snapshot" */
int virDomainCreateFromSnapshot(virDomainSnapshotPtr snapshot,
unsigned int flags);
Will it be enforced that the domain is shut down before this function can be called?
ESX doesn't have such a restriction. I'm not sure about other hypervisors.
/* Rename the snapshot */
/* Do we really need this? In theory we could use
* virsh snapshot-edit <domain> <name> and then detect
* name changes, but that will require a UUID, which may
* or may not be overkill
*/
int virDomainSnapshotRename(virDomainSnapshotPtr snapshot,
char *newname,
unsigned int flags);
/* Delete a snapshot - with no flags, the snapshot is not used anymore,
* but also not removed. With a MERGE flag, it merges the snapshot into
* the parent snapshot (or the base image, if there is no parent snapshot).
Note that "merging into the parent" seems to be the wrong term, even
in case of ESX. See detailed discussion below.
* Note that if other snapshots would be discarded because of this
* MERGE action, this operation will fail. If that is really what is intended,
* use MERGE_FORCE.
*
* With a DISCARD flag, it deletes the snapshot. Note that if children snapshots
* would be discarded because of this delete action, this operation will
* fail. If this is really what is intended, use DISCARD_FORCE.
*
* MERGE, MERGE_FORCE, DISCARD, and DISCARD_FORCE are mutually-exclusive.
*
* Note that this operation can happen when the domain is running or shut
* down, though this is hypervisor specific */
typedef enum {
VIR_DOMAIN_SNAPSHOT_DELETE_MERGE,
VIR_DOMAIN_SNAPSHOT_DELETE_MERGE_FORCE,
VIR_DOMAIN_SNAPSHOT_DELETE_DISCARD,
VIR_DOMAIN_SNAPSHOT_DELETE_DISCARD_FORCE,
} virDomainSnapshotDelete;
int virDomainSnapshotDelete(virDomainSnapshotPtr snapshot,
unsigned int flags);
int virDomainSnapshotFree(virDomainSnapshotPtr snapshot);
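To illustrate the "mutually exclusive" rule from the comment above, here is a minimal sketch of how a driver could reject combined flags, assuming the four flags were given distinct bit values. The enum and function names are hypothetical and not part of the proposed API, which does not assign numeric values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values for illustration only; the proposal above does
 * not specify numeric values for the delete flags. */
enum snap_delete_flag {
    SNAP_DELETE_MERGE         = 1 << 0,
    SNAP_DELETE_MERGE_FORCE   = 1 << 1,
    SNAP_DELETE_DISCARD       = 1 << 2,
    SNAP_DELETE_DISCARD_FORCE = 1 << 3
};

/* Returns true if flags is 0 (no action requested) or exactly one of the
 * four known flags; false for unknown bits or any combination. */
bool snap_delete_flags_valid(unsigned int flags)
{
    const unsigned int known = SNAP_DELETE_MERGE | SNAP_DELETE_MERGE_FORCE |
                               SNAP_DELETE_DISCARD | SNAP_DELETE_DISCARD_FORCE;

    if (flags & ~known)
        return false;
    /* Clearing the lowest set bit leaves 0 when at most one bit is set. */
    return (flags & (flags - 1)) == 0;
}

/* Usage example. */
void snap_delete_flags_example(void)
{
    assert(snap_delete_flags_valid(0));
    assert(snap_delete_flags_valid(SNAP_DELETE_MERGE));
    assert(!snap_delete_flags_valid(SNAP_DELETE_MERGE | SNAP_DELETE_DISCARD));
}
```

With distinct bits, "no flags" (0) stays distinguishable from MERGE, which would not be the case if the enum values simply started counting at 0.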
NOTE: During snapshot creation, *none* of the fields are required. That is,
you can call virDomainSnapshotCreateXML() with an XML of
"<domainsnapshot/>".
In this case, the individual driver will make up a <name> and <uuid> for you,
Does <uuid> here refer to a snapshot UUID? As said before, there is no
easy way to have a UUID per snapshot with ESX. We could store
<uuid>:<name> in the name field on the ESX side, but that's not a
really good way to do it.
the <creationdate> will be set to the current time+date, <description> will be
empty, <state> will be the current state of the VM, and <parent> will
be set to the current snapshot (if any). If you do want to specify some
fields during virDomainSnapshotCreateXML(), note that the only ones that are
settable are <name>, <uuid>, and <description>; the rest are ignored, and
filled in by the driver when the snapshot is actually created.
NOTE: <state> refers to the state of the VM when the snapshot was taken.
<domainsnapshot>
<name>XYZ</name>
<creationdate>...</creationdate>
<description>...</description>
<state>RUNNING</state>
<domain>
<uuid>XXXXX-XXXX-XXXX-XXXX-XXXXXXXXX</uuid>
</domain>
<parent>
<name>ABC</name>
</parent>
</domainsnapshot>
The virsh commands will be:
virsh snapshot-create <dom> <xmlfile>
virsh snapshot-list <dom>
virsh snapshot-dumpxml <dom> <name>
virsh start-with-snapshot <dom> <snapshotname>
virsh snapshot-delete <dom> <snapshotname> [--merge|--mergeforce|--delete|--deleteforce]
virsh snapshot-delete-all <dom>
Possible issues:
1) I don't see a way to support "managed" save/restore and snapshotting with
this API. I think we'll have to have a separate API for managed save/restore.
What's "managed" save/restore and snapshotting?
3) Do we need a snapshot UUID? Virtualbox allows you to have multiple snapshots
with the same name, differentiated by UUID. Confusingly, they also have a
"FindByName" method that returns the first snapshot matching a given name in
depth-first search order. For qemu, if you specify the same name twice it
overwrites the previous one with the new one. I don't know what ESX does here.
ESX 4.0 allows multiple snapshots with the same name. I think this is
because ESX 4.0 has an integer ID per snapshot. I'm not sure if ESX
3.5 allows multiple snapshots with the same name, because the ID field
was added in ESX 4.0. I assume ESX 3.5 doesn't allow multiple
snapshots with the same name, but I currently have no ESX 3.5 at hand
to test.
We could use this integer ID and convert it to UUID format, but you
won't be able to set the UUID: it'll be read-only and only available
on ESX 4.0 and above.
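A rough sketch of what such a conversion could look like, assuming the ESX snapshot ID fits in 32 bits. The byte layout (ID in the last four bytes, everything else zero) and the function names are my own choices for illustration; ESX defines no such mapping, and the result is not a version-tagged RFC 4122 UUID.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define UUID_RAW_LEN 16
#define UUID_STR_LEN 37   /* 36 characters plus the terminating NUL */

/* Pack a 32-bit snapshot ID big-endian into the last four bytes of an
 * otherwise zeroed 16-byte buffer. The layout is an assumption of this
 * sketch only. */
void snapshot_id_to_uuid(uint32_t id, unsigned char uuid[UUID_RAW_LEN])
{
    memset(uuid, 0, UUID_RAW_LEN);
    uuid[12] = (unsigned char)(id >> 24);
    uuid[13] = (unsigned char)(id >> 16);
    uuid[14] = (unsigned char)(id >> 8);
    uuid[15] = (unsigned char)id;
}

/* Format raw UUID bytes in the usual 8-4-4-4-12 hex notation. */
void uuid_format(const unsigned char uuid[UUID_RAW_LEN], char out[UUID_STR_LEN])
{
    snprintf(out, UUID_STR_LEN,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x%02x%02x%02x%02x",
             uuid[0], uuid[1], uuid[2], uuid[3],
             uuid[4], uuid[5], uuid[6], uuid[7],
             uuid[8], uuid[9], uuid[10], uuid[11],
             uuid[12], uuid[13], uuid[14], uuid[15]);
}

/* Usage example: snapshot ID 42 maps to a UUID ending in ...2a. */
void snapshot_id_to_uuid_example(void)
{
    unsigned char raw[UUID_RAW_LEN];
    char str[UUID_STR_LEN];

    snapshot_id_to_uuid(42, raw);
    uuid_format(raw, str);
    assert(strcmp(str, "00000000-0000-0000-0000-00000000002a") == 0);
}
```

The mapping is reversible, so a lookup by UUID could be turned back into a lookup by ID, but such a UUID is only unique within one domain.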
Mapping of our interface to various hypervisors:
+-------------------------------+-----------------+-------------------+------------------------------+
| Libvirt                       | Qemu            | Virtualbox        | ESX                          |
+-------------------------------+-----------------+-------------------+------------------------------+
| virDomainSnapshotCreateXML    | monitor command | takeSnapshot      | CreateSnapshot_task          |
|                               | "savevm"; if    | Snapshots can     | takes a name, description,   |
|                               | snapshot name   | be taken on       | memory (true/false) and      |
|                               | is already in   | powered off,      | quiesce (true/false).        |
|                               | use, replaces   | saved, running,   | What does "memory" mean?     |
If memory is true, ESX snapshots the memory of the domain too,
otherwise only a disk snapshot is created.
Creating a disk-only snapshot is nearly instant, while creating a
memory snapshot also requires a notable amount of time to write the
memory image to disk.
|                               | the previous    | or paused VMs.    | Should we model "quiesce"    |
The vSphere API docs give a good description of what the quiesce option does:
"If TRUE and the virtual machine is powered on when the snapshot is
taken, VMware Tools is used to quiesce the file system in the virtual
machine. This assures that a disk snapshot represents a consistent
state of the guest file systems. If the virtual machine is powered off
or VMware Tools are not available, the quiesce flag is ignored."
I assume "quiesce the file system" means to flush write caches and
stuff like that.
This option is important if you want to create a disk-only snapshot of
a running domain.
|                               | snapshot. Also  | The snapshot is   | Trees of snapshots are       |
|                               | qemu-img        | always taken      | supported. What happens      |
|                               | snapshot -c can | against the       | on a duplicate name? What    |
|                               | be used to      | current snapshot. | state(s) can a VM be in      |
|                               | create a        | What happens on   | when calling this? Does      |
|                               | disk-only       | a duplicate       | a VM get paused when this    |
|                               | snapshot. What  | name? Trees of    | is called?                   |
In case of ESX the domain can be in any state when a snapshot is created.
If the domain is running when you create a snapshot then the domain is
_not_ paused during the snapshot creation.
I tested it and the memory snapshot represents the state at the time
the snapshot command was issued.
|                               | happens if the  | snapshots are     |                              |
|                               | VM is running   | not currently     |                              |
|                               | when you do     | supported.        |                              |
|                               | this? Trees of  | Taking a snapshot |                              |
|                               | snapshots seem  | of a running VM   |                              |
|                               | to be supported | pauses the VM     |                              |
|                               | VM gets paused  | before taking the |                              |
|                               | while this is   | snapshot.         |                              |
|                               | happening. What |                   |                              |
|                               | states can the  |                   |                              |
|                               | VM be in?       |                   |                              |
+-------------------------------+-----------------+-------------------+------------------------------+
+-------------------------------+-----------------+-------------------+------------------------------+
| virDomainSnapshotDelete       | monitor command | deleteSnapshot    | RemoveSnapshot_Task          |
|                               | "delvm". What   | deletes the       | removes this snapshot and    |
|                               | happens if the  | specified         | deletes any associated       |
|                               | snapshot is in  | snapshot. Takes   | storage. Operates on a       |
|                               | use? What       | an ID. The VM     | VirtualMachineSnapshot       |
|                               | states can the  | must be off.      | object. What states can      |
|                               | VM be in? Also  | Differences to    | the VM be in? What           |
|                               | qemu-img        | children          | happens if this snapshot     |
|                               | snapshot -d     | snapshots will be | is in-use? What happens      |
|                               | <name> <file>   | merged with the   | to parents and children?     |
|                               | command can be  | children to keep  |                              |
|                               | used. What      | children valid.   |                              |
|                               | happens if the  | Parent for this   |                              |
|                               | disk is in-use? | snapshot will     |                              |
|                               | What happens to | become parent of  |                              |
|                               | parents and     | any children      |                              |
|                               | children?       | snapshots.        |                              |
|                               | How do we       |                   |                              |
|                               | handle merges?  |                   |                              |
+-------------------------------+-----------------+-------------------+------------------------------+
The domain can be in any state when deleting a snapshot, even if you
delete the current snapshot. VMware has some documentation about how a
snapshot is merged into its parent:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&...
And some more general docs about snapshots:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&...
Regarding what gets merged and where, I should define the terms I'm
using first.
A <--1-- B <--2-- C <--3-- current
+ <--4-- D
I intentionally draw the arrows directed from child to parent.
A, B, C, D are what I call a snapshot, a point in "time" I can switch
to. The disk differences between these points are stored in COW sparse
images, here shown as 1, 2, 3, 4. The current state of the domain is
denoted by the "current" item.
Each snapshot is associated with a disk image: A is associated with
the base image, B with sparse image 1, C with 2 and so on. A special
case is sparse image 3: it's not associated with a snapshot, but with
the current state. Also each snapshot can be associated with a memory
image (not shown here).
The current snapshot in this case is C. If the domain writes changes
to disk, these changes get stored in sparse image 3. If you switch to
another snapshot from here then the changes in 3 are lost, because you
cannot go back to a point where you could access the changes in 3
again.
Now let's delete B. In this case the memory image associated with B is
just discarded and 1 and 2 are merged into 5. That's what I was
referring to when I said ESX merges snapshots into the parent.
A <------5------- C <--3-- current
+ <--4-- D
But this only happens for snapshots like B, that have a parent and a
child (C is such a snapshot too, even if its child isn't an actual
snapshot). If you delete D in this example, then the changes in sparse
image 4 are discarded, because there is no place where they could be
merged. Merging 4 into the base image would alter A; merging 4 and 5
would alter C.
Now as I think of this in detail, it seems that the term "merging into
the parent" is wrong.
In the next example we have snapshot E with parent B.
A <--1-- B <--2-- C <--3-- current
         + <--6-- E
Now what's going to happen if we delete B? In order to preserve C and
E, the changes in 1 need to be merged into 2 and 6; this results in 1
+ 2 = 5 and 1 + 6 = 7. Or rephrased: B is merged into its children C
and E.
A <------5------- C <--3-- current
+ <------7------- E
So, virDomainSnapshotDelete's semantics for VirtualBox and ESX seem to
be the same. I just used the wrong words to describe it at first.
Sorry for that.
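The "merge into the children" behaviour from the last example can be modelled in a few lines of C. This is only a sketch of the semantics as I described them above, not how ESX or VirtualBox implement it: struct snap and snap_delete() are made-up names, and the delta is just a label string standing in for the sparse image(s) between a snapshot and its parent.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_DELTA 64

struct snap {
    const char *name;
    int parent;              /* index of the parent, -1 for the base image */
    char delta[MAX_DELTA];   /* label of the image(s) holding the diff to the parent */
    int deleted;
};

/* Delete snapshot 'victim': its delta is folded into every child, and the
 * children are reparented to the victim's parent. Returns 0 on success. */
int snap_delete(struct snap *s, int count, int victim)
{
    if (victim < 0 || victim >= count || s[victim].deleted || s[victim].parent < 0)
        return -1;   /* deleting the base is out of scope for this sketch */

    /* First pass: make sure every merged label fits before changing anything. */
    for (int i = 0; i < count; i++) {
        if (!s[i].deleted && s[i].parent == victim &&
            strlen(s[victim].delta) + 1 + strlen(s[i].delta) >= MAX_DELTA)
            return -1;
    }

    /* Second pass: fold the victim's delta into each child and reparent it. */
    for (int i = 0; i < count; i++) {
        char merged[MAX_DELTA];

        if (s[i].deleted || s[i].parent != victim)
            continue;
        snprintf(merged, sizeof(merged), "%s+%s", s[victim].delta, s[i].delta);
        strcpy(s[i].delta, merged);
        s[i].parent = s[victim].parent;
    }

    s[victim].deleted = 1;   /* with no children, the delta is simply dropped */
    return 0;
}

/* Usage example: A <--1-- B <--2-- C, plus B <--6-- E; then delete B. */
void snap_delete_example(void)
{
    struct snap t[] = {
        { "A", -1, "base", 0 },
        { "B",  0, "1",    0 },
        { "C",  1, "2",    0 },
        { "E",  1, "6",    0 },
    };

    assert(snap_delete(t, 4, 1) == 0);
    assert(t[2].parent == 0 && strcmp(t[2].delta, "1+2") == 0);
    assert(t[3].parent == 0 && strcmp(t[3].delta, "1+6") == 0);
}
```

The "current" state could be modelled as just another child node, which is why deleting C in the diagrams behaves like deleting B. Deleting a leaf such as D takes the same code path with zero children, so its delta is discarded.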
Matthias