On 03/30/2010 08:14 PM, Matthias Bolte wrote:
2010/3/30 Chris Lalancette <clalance(a)redhat.com>:
> Hello,
> After our discussions about the snapshot API last week, I went ahead and implemented
> quite a bit of the API. I also went back to the ESX, Virtualbox, and QEMU API's to
> try and make sure our API's matched up. What's below is my revised API based on
> that survey. Following my revised API are notes that I took regarding how the
> libvirt API matches up to the various API's, and some questions about semantics that
> I had while doing the survey. More comments and questions are welcome.
> /* Start the guest from the snapshot "snapshot" */
> int virDomainCreateFromSnapshot(virDomainSnapshotPtr snapshot,
> unsigned int flags);
Will it be enforced that the domain is shut down in order to call this function?
ESX doesn't have such a restriction. Not sure about other hypervisors.
Heh, I was just going through that myself. No, it's not required to be
shut down in general; qemu supports both modes. I've updated the documentation
for this call.
<snip>
> * Note that if other snapshots would be discarded because of this
> * MERGE action, this operation will fail. If that is really what is intended,
> * use MERGE_FORCE.
> *
> * With a DISCARD flag, it deletes the snapshot. Note that if children snapshots
> * would be discarded because of this delete action, this operation will
> * fail. If this is really what is intended, use DISCARD_FORCE.
> *
> * MERGE, MERGE_FORCE, DISCARD, and DISCARD_FORCE are mutually-exclusive.
> *
> * Note that this operation can happen when the domain is running or shut
> * down, though this is hypervisor specific */
> typedef enum {
> VIR_DOMAIN_SNAPSHOT_DELETE_MERGE,
> VIR_DOMAIN_SNAPSHOT_DELETE_MERGE_FORCE,
> VIR_DOMAIN_SNAPSHOT_DELETE_DISCARD,
> VIR_DOMAIN_SNAPSHOT_DELETE_DISCARD_FORCE,
> } virDomainSnapshotDelete;
> int virDomainSnapshotDelete(virDomainSnapshotPtr snapshot,
> unsigned int flags);
>
> int virDomainSnapshotFree(virDomainSnapshotPtr snapshot);
>
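As a rough illustration of the "mutually-exclusive" rule in the comment above, a driver could reject any flags value that does not name exactly one action. This is only a sketch, not libvirt code: the SKETCH_* names, the bit values, and the helper function are all assumptions made for the example, since the quoted enum assigns no explicit values.

```c
/* Hypothetical bit assignments; the quoted proposal gives the enumerators
 * no explicit values, so these are assumptions made for this sketch. */
enum sketch_snapshot_delete_flags {
    SKETCH_DELETE_MERGE         = 1 << 0,
    SKETCH_DELETE_MERGE_FORCE   = 1 << 1,
    SKETCH_DELETE_DISCARD       = 1 << 2,
    SKETCH_DELETE_DISCARD_FORCE = 1 << 3,
};

/* Return 1 if exactly one of the four known actions is requested, else 0. */
int sketch_delete_flags_valid(unsigned int flags)
{
    const unsigned int known = SKETCH_DELETE_MERGE |
                               SKETCH_DELETE_MERGE_FORCE |
                               SKETCH_DELETE_DISCARD |
                               SKETCH_DELETE_DISCARD_FORCE;

    if (flags & ~known)
        return 0;                       /* an unknown bit is set */
    if (flags == 0)
        return 0;                       /* no action requested */
    return (flags & (flags - 1)) == 0;  /* true only when a single bit is set */
}
```

With this layout, passing MERGE alone is accepted, while MERGE combined with DISCARD is rejected before the hypervisor is ever contacted.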
> NOTE: During snapshot creation, *none* of the fields are required. That is,
> you can call virDomainSnapshotCreateXML() with an XML of "<domainsnapshot/>".
> In this case, the individual driver will make up a <name> and <uuid> for you,
Does <uuid> here refer to a snapshot UUID? As said before, there is no
easy way to have a UUID per snapshot with ESX. Well, we could store
<uuid>:<name> in the name field on the ESX side, but that's not a
really good way to do it.
Yeah, agreed, that was a leftover I forgot to edit out. See my reply to
Jiri Denemark, but essentially I'm content to declare duplicate names
unsupported/undefined, and not deal with UUIDs at all. I've removed
mention of UUIDs from the documentation now.
<snip>
> The virsh commands will be:
> virsh snapshot-create <dom> <xmlfile>
> virsh snapshot-list <dom>
> virsh snapshot-dumpxml <dom> <name>
> virsh start-with-snapshot <dom> <snapshotname>
> virsh snapshot-delete <dom> <snapshotname> [--merge|--mergeforce|--delete|--deleteforce]
> virsh snapshot-delete-all <dom>
>
> Possible issues:
> 1) I don't see a way to support "managed" save/restore and snapshotting with
> this API. I think we'll have to have a separate API for managed save/restore.
What's "managed" save/restore and snapshotting?
Oops, yeah, that's a personal note that I didn't really expound upon. One of the
reasons that I originally started down the path of implementing snapshotting was
to implement save/restore for guests during host shutdown
and startup. Because of the way autostart works within libvirt, we can't have
an external script (a la xendomains) do this; it needs to be handled inside the
libvirt daemon itself, and our current save/restore API is not sufficient for
this. That being said, after all of the discussions we have had about
this snapshotting API, I don't think it will be appropriate to shoehorn this
"managed" save/restore into this API, and we'll need a separate API for that.
> 3) Do we need a snapshot UUID? Virtualbox allows you to have multiple snapshots
> with the same name, differentiated by UUID. Confusingly, they also have a
> "FindByName" method that returns the first depth-first search snapshot that matches
> a given name. For qemu, if you specify the same name twice it overwrites the previous
> one with the new one. I don't know what ESX does here.
ESX 4.0 allows multiple snapshots with the same name. I think this is
because ESX 4.0 has an integer ID per snapshot. I'm not sure if ESX
3.5 allows multiple snapshots with the same name, because the ID field
was added in ESX 4.0. I assume ESX 3.5 doesn't allow multiple
snapshots with the same name, but I currently have no ESX 3.5 at hand
to test.
We could use this integer ID and convert it to UUID format, but you
won't be able to set the UUID, it'll be read-only and only available
on ESX 4.0 and above.
Yeah, again, I'm happy to drop UUID and declare duplicate names unsupported
unless there is a good use case.
> Mapping of our interface to various hypervisors:
> +-------------------------------+-----------------+-------------------+------------------------------+
> | Libvirt                       | Qemu            | Virtualbox        | ESX                          |
> +-------------------------------+-----------------+-------------------+------------------------------+
> | virDomainSnapshotCreateXML    | monitor command | takeSnapshot      | CreateSnapshot_task          |
> |                               | "savevm"; if    | Snapshots can     | takes a name, description,   |
> |                               | snapshot name   | be taken on       | memory (true/false) and      |
> |                               | is already in   | powered off,      | quiesce (true/false).        |
> |                               | use, replaces   | saved, running,   | What does "memory" mean?     |
If memory is true, ESX snapshots the memory of the domain too,
otherwise only a disk snapshot is created.
Creating a disk-only snapshot is nearly instant, while creating a
memory snapshot also requires a notable amount of time to write the
memory image to disk.
Sorry, I misread the documentation yesterday. That's fairly clear.
What's less clear to me is what happens when you take a disk-only snapshot,
and then try to RevertToSnapshot from a running VM. What happens in that case?
> |                               | the previous    | or paused VMs.    | Should we model "quiesce"    |
The vSphere API docs give a good description of what the quiesce option does:
"If TRUE and the virtual machine is powered on when the snapshot is
taken, VMware Tools is used to quiesce the file system in the virtual
machine. This assures that a disk snapshot represents a consistent
state of the guest file systems. If the virtual machine is powered off
or VMware Tools are not available, the quiesce flag is ignored."
I assume "quiesce the file system" means to flush write caches and
stuff like that.
This option is important if you want to create a disk-only snapshot of
a running domain.
Exactly. I'm not sure this is going to be possible in general (and
I guess it's not even really possible in ESX unless you install VMware
Tools inside the guest). I'm inclined not to model it at the moment,
although I could be convinced otherwise.
> |                               | snapshot. Also  | The snapshot is   | Trees of snapshots are       |
> |                               | qemu-img        | always taken      | supported. What happens      |
> |                               | snapshot -c can | against the       | on a duplicate name? What    |
> |                               | be used to      | current snapshot. | state(s) can a VM be in      |
> |                               | create a        | What happens on   | when calling this? Does      |
> |                               | disk-only       | a duplicate       | a VM get paused when this    |
> |                               | snapshot. What  | name? Trees of    | is called?                   |
In case of ESX the domain can be in any state when a snapshot is created.
If the domain is running when you create a snapshot then the domain is
_not_ paused during the snapshot creation.
I tested it and the memory snapshot represents the state at the time
the snapshot command was issued.
OK, great. I'll update these notes about that.
> |                               | happens if the  | snapshots are     |                              |
> |                               | VM is running   | not currently     |                              |
> |                               | when you do     | supported.        |                              |
> |                               | this? Trees of  | Taking a snapshot |                              |
> |                               | snapshots seem  | of a running VM   |                              |
> |                               | to be supported | pauses the VM     |                              |
> |                               | VM gets paused  | before taking the |                              |
> |                               | while this is   | snapshot.         |                              |
> |                               | happening. What |                   |                              |
> |                               | states can the  |                   |                              |
> |                               | VM be in?       |                   |                              |
> +-------------------------------+-----------------+-------------------+------------------------------+
> +-------------------------------+-----------------+-------------------+------------------------------+
> | virDomainSnapshotDelete       | monitor command | deleteSnapshot    | RemoveSnapshot_Task          |
> |                               | "delvm". What   | deletes the       | removes this snapshot and    |
> |                               | happens if the  | specified         | deletes any associated       |
> |                               | snapshot is in  | snapshot. Takes   | storage. Operates on a       |
> |                               | use? What       | an ID. The VM     | VirtualMachineSnapshot       |
> |                               | states can the  | must be off.      | object. What states can      |
> |                               | VM be in? Also  | Differences to    | the VM be in? What           |
> |                               | qemu-img        | children          | happens if this snapshot     |
> |                               | snapshot -d     | snapshots will be | is in-use? What happens      |
> |                               | <name> <file>   | merged with the   | to parents and children?     |
> |                               | command can be  | children to keep  |                              |
> |                               | used. What      | children valid.   |                              |
> |                               | happens if the  | Parent for this   |                              |
> |                               | disk is in-use? | snapshot will     |                              |
> |                               | What happens to | become parent of  |                              |
> |                               | parents and     | any children      |                              |
> |                               | children?       | snapshots.        |                              |
> |                               | How do we       |                   |                              |
> |                               | handle merges?  |                   |                              |
> +-------------------------------+-----------------+-------------------+------------------------------+
The domain can be in any state when deleting a snapshot, even if you
delete the current snapshot. VMware has some documentation about how a
snapshot is merged into its parent:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&...
And some more general docs about snapshots:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&...
Regarding what gets merged and where, I should define the terms I'm
using first.
A <--1-- B <--2-- C <--3-- current
+ <--4-- D
I intentionally drew the arrows directed from child to parent.
A, B, C, D are what I call a snapshot, a point in "time" I can switch
to. The disk differences between these points are stored in COW sparse
images, here shown as 1, 2, 3, 4. The current state of the domain is
denoted by the "current" item.
Each snapshot is associated with a disk image: A is associated with
the base image, B with sparse image 1, C with 2 and so on. A special
case is sparse image 3, it's not associated with a snapshot, but with
the current state. Also each snapshot can be associated with a memory
image (not shown here).
The current snapshot in this case is C. If the domain writes changes
to disk, these changes get stored in sparse image 3. If you switch to
another snapshot from here then the changes in 3 are lost, because you
cannot go back to a point where you could access the changes in 3
again.
Now let's delete B. In this case the memory image associated with B is
just discarded, and 1 and 2 are merged into 5. That's what I was
referring to when I said ESX merges snapshots into the parent.
A <------5------- C <--3-- current
+ <--4-- D
But this only happens for snapshots like B, that have a parent and a
child (C is such a snapshot too, even if its child isn't an actual
snapshot). If you delete D in this example, then the changes in sparse
image 4 are discarded, because there is no place where they could be
merged. Merging 4 into the base image would alter A, merging 4 and 5
would alter C.
Now as I think of this in detail, it seems that the term "merging into
the parent" is wrong.
In the next example we have snapshot E with parent B.
A <--1-- B <--2-- C <--3-- current
         + <--6-- E
Now what's going to happen if we delete B? In order to preserve C and
E, the changes in 1 need to be merged into 2 and 6; this results in 1
+ 2 = 5 and 1 + 6 = 7. Or rephrased: B is merged into its children C
and E.
A <------5------- C <--3-- current
+ <------7------- E
So, virDomainSnapshotDelete's semantics for VirtualBox and ESX seem to
be the same. I just used the wrong words to describe it at first.
Sorry for that.
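The "merged into its children" behaviour described above can be modelled in a few lines of C. This is purely an illustrative sketch under stated assumptions, not ESX, VirtualBox, or libvirt code: the Snapshot and SnapshotTree types, the tree_add/tree_delete helpers, and the use of string labels such as "1+2" to stand in for merged sparse images are all hypothetical. Deleting the base snapshot is not covered by the discussion, so the sketch simply refuses it.

```c
#include <stdio.h>
#include <string.h>

#define MAX_SNAPSHOTS 16
#define MAX_DELTA_LEN 64

/* Hypothetical model: each snapshot records its parent and a label for the
 * COW delta image that separates it from that parent. */
typedef struct {
    char name;                  /* e.g. 'A' */
    int parent;                 /* index of the parent, -1 for the base */
    char delta[MAX_DELTA_LEN];  /* e.g. "1", or "1+2" after a merge */
    int deleted;
} Snapshot;

typedef struct {
    Snapshot snaps[MAX_SNAPSHOTS];
    int count;
} SnapshotTree;

/* Append a snapshot; returns its index, or -1 if the tree is full. */
int tree_add(SnapshotTree *t, char name, int parent, const char *delta)
{
    if (t->count >= MAX_SNAPSHOTS)
        return -1;
    Snapshot *s = &t->snaps[t->count];
    s->name = name;
    s->parent = parent;
    snprintf(s->delta, sizeof(s->delta), "%s", delta);
    s->deleted = 0;
    return t->count++;
}

/* Delete snapshot idx: fold its delta into every child and reparent each
 * child to idx's parent. A leaf's delta is simply discarded.
 * Returns the number of children merged, or -1 on error. */
int tree_delete(SnapshotTree *t, int idx)
{
    if (idx < 0 || idx >= t->count || t->snaps[idx].deleted)
        return -1;
    Snapshot *victim = &t->snaps[idx];
    if (victim->parent < 0)
        return -1;              /* base image: not modelled in this sketch */

    int merged = 0;
    for (int i = 0; i < t->count; i++) {
        Snapshot *child = &t->snaps[i];
        if (child->deleted || child->parent != idx)
            continue;
        char combined[MAX_DELTA_LEN];
        snprintf(combined, sizeof(combined), "%s+%s",
                 victim->delta, child->delta);
        memcpy(child->delta, combined, sizeof(combined));
        child->parent = victim->parent;
        merged++;
    }
    victim->deleted = 1;
    return merged;
}
```

Building the tree A, B (parent A, delta "1"), C (parent B, delta "2"), E (parent B, delta "6") and then deleting B leaves C and E as children of A with deltas "1+2" and "1+6", which corresponds to images 5 and 7 in the diagrams above. Deleting a leaf such as E merges nothing and just drops its delta.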
OK, that's very interesting to know. So VirtualBox and ESX seem to do the
same thing here. This is the last thing I have to do testing on with qemu
to get its semantics; I'll get to that today, and then we can look again
at the semantics of the flags to virDomainSnapshotDelete.
--
Chris Lalancette