[libvirt] [RFC]: Snapshot API v3

Hello,

After our discussions about the snapshot API last week, I went ahead and implemented quite a bit of the API. I also went back to the ESX, Virtualbox, and QEMU APIs to try to make sure our APIs matched up. What's below is my revised API based on that survey. Following my revised API are notes that I took regarding how the libvirt API matches up to the various APIs, and some questions about semantics that I had while doing the survey. More comments and questions are welcome.

/* NOTE: struct _virDomainSnapshot is a private structure, ala
 * struct _virDomain. */
typedef struct _virDomainSnapshot virDomainSnapshot;

/* Take a snapshot of the current VM state. Throws an error if
 * the VM is not currently running */
virDomainSnapshotPtr virDomainSnapshotCreateXML(virDomainPtr domain,
                                                const char *xmlDesc,
                                                unsigned int flags);

/* Dump the XML of a snapshot */
/* NOTE: see below for proposed XML */
char *virDomainSnapshotGetXMLDesc(virDomainSnapshotPtr snapshot,
                                  unsigned int flags);

/* Return the number of snapshots for this domain */
int virDomainSnapshotNum(virDomainPtr domain, unsigned int flags);

/* Get the names of all snapshots for this domain */
int virDomainListSnapshotNames(virDomainPtr domain, char **names, int nameslen,
                               unsigned int flags);

/* Get a handle to a named snapshot */
virDomainSnapshotPtr virDomainSnapshotLookupByName(virDomainPtr domain,
                                                   const char *name,
                                                   unsigned int flags);

/* Get a handle to the current in-use snapshot for the domain */
virDomainSnapshotPtr virDomainSnapshotCurrent(virDomainPtr domain,
                                              unsigned int flags);

/* Start the guest from the snapshot "snapshot" */
int virDomainCreateFromSnapshot(virDomainSnapshotPtr snapshot,
                                unsigned int flags);

/* Rename the snapshot */
/* Do we really need this? In theory we could use
 * virsh snapshot-edit <domain> <name> and then detect
 * name changes, but that will require a UUID, which may
 * or may not be overkill */
int virDomainSnapshotRename(virDomainSnapshotPtr snapshot, char *newname,
                            unsigned int flags);

/* Delete a snapshot - with no flags, the snapshot is not used anymore,
 * but also not removed. With a MERGE flag, it merges the snapshot into
 * the parent snapshot (or the base image, if there is no parent snapshot).
 * Note that if other snapshots would be discarded because of this
 * MERGE action, this operation will fail. If that is really what is intended,
 * use MERGE_FORCE.
 *
 * With a DISCARD flag, it deletes the snapshot. Note that if children snapshots
 * would be discarded because of this delete action, this operation will
 * fail. If this is really what is intended, use DISCARD_FORCE.
 *
 * MERGE, MERGE_FORCE, DISCARD, and DISCARD_FORCE are mutually-exclusive.
 *
 * Note that this operation can happen when the domain is running or shut
 * down, though this is hypervisor specific */
typedef enum {
    VIR_DOMAIN_SNAPSHOT_DELETE_MERGE,
    VIR_DOMAIN_SNAPSHOT_DELETE_MERGE_FORCE,
    VIR_DOMAIN_SNAPSHOT_DELETE_DISCARD,
    VIR_DOMAIN_SNAPSHOT_DELETE_DISCARD_FORCE,
} virDomainSnapshotDelete;

int virDomainSnapshotDelete(virDomainSnapshotPtr snapshot,
                            unsigned int flags);

int virDomainSnapshotFree(virDomainSnapshotPtr snapshot);

NOTE: During snapshot creation, *none* of the fields are required. That is, you can call virDomainSnapshotCreateXML() with an XML of "<domainsnapshot/>". In this case, the individual driver will make up a <name> and <uuid> for you, the <creationdate> will be set to the current time+date, <description> will be empty, <state> will be the current state of the VM, and <parent> will be set to the current snapshot (if any). If you do want to specify some fields during virDomainSnapshotCreateXML(), note that the only ones that are settable are <name>, <uuid>, and <description>; the rest are ignored, and filled in by the driver when the snapshot is actually created.

NOTE: <state> refers to the state of the VM when the snapshot was taken.

<domainsnapshot>
   <name>XYZ</name>
   <creationdate>...</creationdate>
   <description>...</description>
   <state>RUNNING</state>
   <domain>
      <uuid>XXXXX-XXXX-XXXX-XXXX-XXXXXXXXX</uuid>
   </domain>
   <parent>
      <name>ABC</name>
   </parent>
</domainsnapshot>

The virsh commands will be:

virsh snapshot-create <dom> <xmlfile>
virsh snapshot-list <dom>
virsh snapshot-dumpxml <dom> <name>
virsh start-with-snapshot <dom> <snapshotname>
virsh snapshot-delete <dom> <snapshotname> [--merge|--mergeforce|--delete|--deleteforce]
virsh snapshot-delete-all <dom>

Possible issues:

1) I don't see a way to support "managed" save/restore and snapshotting with this API. I think we'll have to have a separate API for managed save/restore.
2) What is the semantic for deleting snapshots from a running domain? Virtualbox seems to not allow you to manipulate snapshots while the domain is running. Qemu does allow this, but it's currently unclear what the exact semantics are. VMware seems to allow manipulation of snapshots while the domain is running.
3) Do we need a snapshot UUID? Virtualbox allows you to have multiple snapshots with the same name, differentiated by UUID. Confusingly, they also have a "FindByName" method that returns the first depth-first search snapshot that matches a given name. For qemu, if you specify the same name twice it overwrites the previous one with the new one. I don't know what ESX does here.

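To make the creation rules in the NOTE above concrete, here is a minimal sketch of how a driver might fill in a snapshot definition. It is not libvirt code: struct snap_def, snap_def_fill() and the generated "snap-<timestamp>" name are made up for illustration, the <uuid> field is left out, and a real driver would work on parsed XML rather than plain strings.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical in-memory form of <domainsnapshot>; not a libvirt type. */
struct snap_def {
    char name[64];
    char description[128];
    char parent[64];        /* empty string means "no parent" */
    char state[16];
    time_t creation;
};

/* Fill in a definition following the NOTE: the caller may set the name
 * and description; parent, state and creation date always come from the
 * driver. domain_state must not be NULL. */
void snap_def_fill(struct snap_def *def,
                   const char *requested_name,
                   const char *requested_desc,
                   const char *current_snapshot,
                   const char *domain_state,
                   time_t now)
{
    memset(def, 0, sizeof(*def));

    if (requested_name && *requested_name)
        snprintf(def->name, sizeof(def->name), "%s", requested_name);
    else
        /* Made-up naming scheme; a real driver may choose differently. */
        snprintf(def->name, sizeof(def->name), "snap-%lld", (long long)now);

    /* No description requested means an empty description. */
    if (requested_desc)
        snprintf(def->description, sizeof(def->description), "%s",
                 requested_desc);

    /* The parent is whatever snapshot is current, if there is one. */
    if (current_snapshot)
        snprintf(def->parent, sizeof(def->parent), "%s", current_snapshot);

    snprintf(def->state, sizeof(def->state), "%s", domain_state);
    def->creation = now;
}
```

Calling snap_def_fill(&def, NULL, NULL, "ABC", "RUNNING", now) corresponds to passing "<domainsnapshot/>": the name is generated, the description stays empty, and the parent becomes the current snapshot ABC.
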
Mapping of our interface to various hypervisors:

+-------------------------------+-----------------+-------------------+------------------------------+
| Libvirt                       | Qemu            | Virtualbox        | ESX                          |
+-------------------------------+-----------------+-------------------+------------------------------+
| virDomainSnapshotCreateXML    | monitor command | takeSnapshot      | CreateSnapshot_task          |
|                               | "savevm"; if    | Snapshots can     | takes a name, description,   |
|                               | snapshot name   | be taken on       | memory (true/false) and      |
|                               | is already in   | powered off,      | quiesce (true/false).        |
|                               | use, replaces   | saved, running,   | What does "memory" mean?     |
|                               | the previous    | or paused VMs.    | Should we model "quiesce"    |
|                               | snapshot. Also  | The snapshot is   | Trees of snapshots are       |
|                               | qemu-img        | always taken      | supported. What happens      |
|                               | snapshot -c can | against the       | on a duplicate name? What    |
|                               | be used to      | current snapshot. | state(s) can a VM be in      |
|                               | create a        | What happens on   | when calling this? Does      |
|                               | disk-only       | a duplicate       | a VM get paused when this    |
|                               | snapshot. What  | name? Trees of    | is called?                   |
|                               | happens if the  | snapshots are     |                              |
|                               | VM is running   | not currently     |                              |
|                               | when you do     | supported.        |                              |
|                               | this? Trees of  | Taking a snapshot |                              |
|                               | snapshots seem  | of a running VM   |                              |
|                               | to be supported | pauses the VM     |                              |
|                               | VM gets paused  | before taking the |                              |
|                               | while this is   | snapshot.         |                              |
|                               | happening. What |                   |                              |
|                               | states can the  |                   |                              |
|                               | VM be in?       |                   |                              |
+-------------------------------+-----------------+-------------------+------------------------------+
| virDomainSnapshotGetXMLDesc   | Libvirt qemu    | GetId             | VirtualMachine object->      |
|                               | snapshot        | GetDescription    | snapshot-> rootSnapshotList  |
|                               | metadata        | GetTimeStamp      | [i].createTime               |
|                               |                 | GetParent         |    .description              |
|                               |                 | GetCurrentSnapshot|    .name                     |
|                               |                 |                   |    .id                       |
|                               |                 |                   |    .state                    |
|                               |                 |                   |    .quiesced                 |
|                               |                 |                   |    .vm                       |
+-------------------------------+-----------------+-------------------+------------------------------+
| virDomainSnapshotNum          | Libvirt qemu    | GetSnapshotCount  | VirtualMachine object->      |
|                               | snapshot        |                   | snapshot-> rootSnapshotList  |
|                               | metadata        |                   |                              |
+-------------------------------+-----------------+-------------------+------------------------------+
| virDomainListSnapshotNames    | Libvirt qemu    | GetSnapshotCount  | VirtualMachine object->      |
|                               | snapshot        | GetChildren       | snapshot-> rootSnapshotList  |
|                               | metadata, or    |                   |                              |
|                               | listvm monitor  |                   |                              |
|                               | command or      |                   |                              |
|                               | qemu-img        |                   |                              |
|                               | snapshot -l     |                   |                              |
|                               | <file>. If the  |                   |                              |
|                               | VM has multiple |                   |                              |
|                               | disks and the   |                   |                              |
|                               | disks have      |                   |                              |
|                               | different       |                   |                              |
|                               | snapshots, what |                   |                              |
|                               | do you do?      |                   |                              |
+-------------------------------+-----------------+-------------------+------------------------------+
| virDomainSnapshotLookupByName | "               | findSnapshot      | VirtualMachine object->      |
|                               |                 | Takes a name and  | snapshot-> rootSnapshotList  |
|                               |                 | returns a         |                              |
|                               |                 | snapshot object.  |                              |
|                               |                 | In case of        |                              |
|                               |                 | multiple snapshots|                              |
|                               |                 | with the same     |                              |
|                               |                 | name, it returns  |                              |
|                               |                 | the first object  |                              |
|                               |                 | from a depth      |                              |
|                               |                 | first search.     |                              |
+-------------------------------+-----------------+-------------------+------------------------------+
| virDomainCreateFromSnapshot   | qemu-img        | restoreSnapshot   | RevertToSnapshot_Task        |
|                               | snapshot -a     | Takes a snapshot  | Changes execution state      |
|                               | <snapname>      | object, and       | of VM to state of this       |
|                               | <file>. What    | resets the VM's   | snapshot. Takes a            |
|                               | happens if the  | state to that of  | snapshot object, an          |
|                               | VM has multiple | the snapshot.     | optional host, and           |
|                               | files with      | If this is a      | suppressPowerOn, which       |
|                               | different       | snapshot taken    | forces the VM to the off     |
|                               | snapshots?      | against a running | state regardless of the      |
|                               |                 | machine, then the | state when the snapshot      |
|                               |                 | memory is         | was taken. Implies that      |
|                               |                 | restored as well. | without this flag, VM        |
|                               |                 | Does *not* start  | starts in whatever           |
|                               |                 | the VM. The VM    | <state> was when snapshot    |
|                               |                 | must be off for   | was taken.                   |
|                               |                 | this operation to |                              |
|                               |                 | succeed.          |                              |
+-------------------------------+-----------------+-------------------+------------------------------+
| virDomainSnapshotDelete       | monitor command | deleteSnapshot    | RemoveSnapshot_Task          |
|                               | "delvm". What   | deletes the       | removes this snapshot and    |
|                               | happens if the  | specified         | deletes any associated       |
|                               | snapshot is in  | snapshot. Takes   | storage. Operates on a       |
|                               | use? What       | an ID. The VM     | VirtualMachineSnapshot       |
|                               | states can the  | must be off.      | object. What states can      |
|                               | VM be in? Also  | Differences to    | the VM be in? What           |
|                               | qemu-img        | children          | happens if this snapshot     |
|                               | snapshot -d     | snapshots will be | is in-use? What happens      |
|                               | <name> <file>   | merged with the   | to parents and children?     |
|                               | command can be  | children to keep  |                              |
|                               | used. What      | children valid.   |                              |
|                               | happens if the  | Parent for this   |                              |
|                               | disk is in-use? | snapshot will     |                              |
|                               | What happens to | become parent of  |                              |
|                               | parents and     | any children      |                              |
|                               | children?       | snapshots.        |                              |
|                               | How do we       |                   |                              |
|                               | handle merges?  |                   |                              |
+-------------------------------+-----------------+-------------------+------------------------------+
| virDomainSnapshotCurrent      | Libvirt qemu    | currentSnapshot   | VirtualMachine object->      |
|                               | snapshot        |                   | snapshot-> currentSnapshot   |
|                               | metadata        |                   |                              |
+-------------------------------+-----------------+-------------------+------------------------------+
| virDomainSnapshotRename       | Libvirt qemu    | ISnapshot->name   | RenameSnapshot               |
|                               | snapshot        |                   |                              |
|                               | metadata        |                   |                              |
+-------------------------------+-----------------+-------------------+------------------------------+
| virDomainSnapshotDelete with  | Libvirt qemu    | deleteSnapshot    | RemoveAllSnapshots_Task      |
| DISCARD_FORCE flag against    | snapshot        | with manual depth |                              |
| root snapshot                 | metadata        | deletion of       |                              |
|                               |                 | children          |                              |
+-------------------------------+-----------------+-------------------+------------------------------+
|                               |                 |                   | RevertToCurrentSnapshot_Task |
+-------------------------------+-----------------+-------------------+------------------------------+

Attribute mapping:

+----------------+----------+-------------+----------------------------+
| Libvirt        | Qemu     | Virtualbox  | ESX                        |
+----------------+----------+-------------+----------------------------+
| <name>         | TAG      | name        | name                       |
+----------------+----------+-------------+----------------------------+
| <creationdate> | DATE     | timeStamp   | createTime                 |
+----------------+----------+-------------+----------------------------+
| <description>  | Libvirt  | description | description                |
|                | qemu     |             |                            |
|                | metadata |             |                            |
+----------------+----------+-------------+----------------------------+
| <state>        | Libvirt  | online      | state (this is the *power* |
|                | qemu     |             | state of the VM when this  |
|                | metadata |             | snapshot was taken)        |
+----------------+----------+-------------+----------------------------+
| <domain><uuid> | Libvirt  | machine     | vm                         |
|                | qemu     |             |                            |
|                | metadata |             |                            |
+----------------+----------+-------------+----------------------------+
| <parent><name> | Libvirt  | parent      | N/A                        |
|                | qemu     |             |                            |
|                | metadata |             |                            |
+----------------+----------+-------------+----------------------------+
| N/A            | N/A      | id (uuid)   | id                         |
+----------------+----------+-------------+----------------------------+
| N/A            | N/A      | children    | childSnapshotList          |
+----------------+----------+-------------+----------------------------+
| N/A            | N/A      | N/A         | memory                     |
+----------------+----------+-------------+----------------------------+
| N/A            | N/A      | N/A         | quiesced                   |
+----------------+----------+-------------+----------------------------+
| N/A            | N/A      | N/A         | backupManifest             |
+----------------+----------+-------------+----------------------------+
| N/A            | N/A      | N/A         | replaySupported            |
+----------------+----------+-------------+----------------------------+

-- Chris Lalancette

Hi. ...
/* NOTE: struct _virDomainSnapshot is a private structure, ala
 * struct _virDomain. */
typedef struct _virDomainSnapshot virDomainSnapshot;
/* Take a snapshot of the current VM state. Throws an error if
 * the VM is not currently running */
virDomainSnapshotPtr virDomainSnapshotCreateXML(virDomainPtr domain,
                                                const char *xmlDesc,
                                                unsigned int flags);
This is probably a leftover from previous versions, but... why do we restrict this API only for running VMs? ...
/* Delete a snapshot - with no flags, the snapshot is not used anymore,
 * but also not removed. With a MERGE flag, it merges the snapshot into
 * the parent snapshot (or the base image, if there is no parent snapshot).
 * Note that if other snapshots would be discarded because of this
 * MERGE action, this operation will fail. If that is really what is intended,
 * use MERGE_FORCE.
 *
 * With a DISCARD flag, it deletes the snapshot. Note that if children snapshots
 * would be discarded because of this delete action, this operation will
 * fail. If this is really what is intended, use DISCARD_FORCE.
 *
 * MERGE, MERGE_FORCE, DISCARD, and DISCARD_FORCE are mutually-exclusive.
 *
 * Note that this operation can happen when the domain is running or shut
 * down, though this is hypervisor specific */
typedef enum {
    VIR_DOMAIN_SNAPSHOT_DELETE_MERGE,
    VIR_DOMAIN_SNAPSHOT_DELETE_MERGE_FORCE,
    VIR_DOMAIN_SNAPSHOT_DELETE_DISCARD,
    VIR_DOMAIN_SNAPSHOT_DELETE_DISCARD_FORCE,
} virDomainSnapshotDelete;
Merging a snapshot into its parent is probably not the best semantics for MERGE flag as hypervisors differ in the way merging is implemented. As you also mention below, VirtualBox merges into all children instead of a parent. We should allow for both cases. However it influences several things. Firstly, it makes MERGE_FORCE unnecessary for child merging, which is not a big deal as it can just be treated in the same way as MERGE. Secondly, it makes a huge difference when deleting a snapshot with no child. In one case it results in changes being merged and in the other case it results in changes being dropped.
One option is to refine the semantics to something like:
- MERGE: merge changes into other snapshot(s) and fail if it would require any snapshot to be discarded (even the one which was supposed to be merged)
- MERGE_FORCE: really merge even discarding other snapshots but fail if the snapshot itself would actually be discarded
- DISCARD: discard the snapshot and fail if other snapshots would be discarded
- DISCARD_FORCE: discard, no matter what
Another option would be to introduce several different APIs for merging into children, merging into parent, and discarding. That would allow drivers to implement only the supported methods, or even all of them for a very flexible hypervisor.
And the third option I see would be distinguishing merge direction using new flags.
Personally, I like the second option best as it provides the easiest way for applications to detect unsupported behavior.
...
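The first option can be written down as a small decision function. This is only a sketch under one reading of the four bullets above: the enum and struct names are local to the example (they are not the proposed libvirt enum), and working out what a delete would actually do on a given hypervisor is assumed to happen elsewhere in the driver.

```c
#include <stdbool.h>

/* Names mirror the proposed flags but are local to this sketch. */
enum snap_delete_mode {
    SNAP_DELETE_MERGE,
    SNAP_DELETE_MERGE_FORCE,
    SNAP_DELETE_DISCARD,
    SNAP_DELETE_DISCARD_FORCE
};

/* What the driver has determined would happen on this hypervisor. */
struct snap_delete_effect {
    bool target_changes_lost;  /* the snapshot's own changes would be dropped */
    int  others_discarded;     /* number of other snapshots that would go */
};

/* Return true if the delete may proceed under the refined semantics. */
bool snap_delete_allowed(enum snap_delete_mode mode,
                         struct snap_delete_effect effect)
{
    switch (mode) {
    case SNAP_DELETE_MERGE:
        /* Nothing at all may be discarded, not even the target. */
        return !effect.target_changes_lost && effect.others_discarded == 0;
    case SNAP_DELETE_MERGE_FORCE:
        /* Other snapshots may go, but the target must really be merged. */
        return !effect.target_changes_lost;
    case SNAP_DELETE_DISCARD:
        /* The target is dropped; other snapshots must survive. */
        return effect.others_discarded == 0;
    case SNAP_DELETE_DISCARD_FORCE:
        return true;
    }
    return false;
}
```

With this shape, the "snapshot with no child" case shows up as target_changes_lost being true on a merge-into-children hypervisor, so MERGE and MERGE_FORCE fail there instead of silently dropping data.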
Possible issues: 1) I don't see a way to support "managed" save/restore and snapshotting with this API. I think we'll have to have a separate API for managed save/restore. 2) What is the semantic for deleting snapshots from a running domain? Virtualbox seems to not allow you to manipulate snapshots while the domain is running. Qemu does allow this, but it's currently unclear what the exact semantics are. VMware seems to allow manipulation of snapshots while the domain is running. 3) Do we need a snapshot UUID? Virtualbox allows you to have multiple snapshots with the same name, differentiated by UUID. Confusingly, they also have a "FindByName" method that returns the first depth-first search snapshot that matches a given name. For qemu, if you specify the same name twice it overwrites the previous one with the new one. I don't know what ESX does here.
Libvirt uses/generates UUIDs for almost everything (networks, vms, ...) so it might be more consistent to have UUID in snapshot as well.
Jirka

2010/3/30 Jiri Denemark <jdenemar@redhat.com>:
Hi.
...
/* NOTE: struct _virDomainSnapshot is a private structure, ala
 * struct _virDomain. */
typedef struct _virDomainSnapshot virDomainSnapshot;
/* Take a snapshot of the current VM state. Throws an error if
 * the VM is not currently running */
virDomainSnapshotPtr virDomainSnapshotCreateXML(virDomainPtr domain,
                                                const char *xmlDesc,
                                                unsigned int flags);
This is probably a leftover from previous versions, but... why do we restrict this API only for running VMs?
Yep, if a domain is not running you'll just get a disk snapshot without a memory snapshot.
...
Possible issues: 1) I don't see a way to support "managed" save/restore and snapshotting with this API. I think we'll have to have a separate API for managed save/restore. 2) What is the semantic for deleting snapshots from a running domain? Virtualbox seems to not allow you to manipulate snapshots while the domain is running. Qemu does allow this, but it's currently unclear what the exact semantics are. VMware seems to allow manipulation of snapshots while the domain is running. 3) Do we need a snapshot UUID? Virtualbox allows you to have multiple snapshots with the same name, differentiated by UUID. Confusingly, they also have a "FindByName" method that returns the first depth-first search snapshot that matches a given name. For qemu, if you specify the same name twice it overwrites the previous one with the new one. I don't know what ESX does here.
Libvirt uses/generates UUIDs for almost everything (networks, vms, ...) so it might be more consistent to have UUID in snapshot as well.
ESX snapshots don't have a UUID, and there is no easy way to store a UUID-to-snapshot mapping.
Matthias

On 03/30/2010 02:52 PM, Matthias Bolte wrote:
2010/3/30 Jiri Denemark <jdenemar@redhat.com>:
Hi.
...
/* NOTE: struct _virDomainSnapshot is a private structure, ala
 * struct _virDomain. */
typedef struct _virDomainSnapshot virDomainSnapshot;
/* Take a snapshot of the current VM state. Throws an error if
 * the VM is not currently running */
virDomainSnapshotPtr virDomainSnapshotCreateXML(virDomainPtr domain,
                                                const char *xmlDesc,
                                                unsigned int flags);
This is probably a leftover from previous versions, but... why do we restrict this API only for running VMs?
Yep, if a domain is not running you'll just get a disk snapshot without a memory snapshot.
Is taking a halted disk snapshot something we might want to allow with a flag, though? For that matter, is this API useful for taking a disk snapshot and disregarding a memory snapshot even of a running VM?
--
Eric Blake eblake@redhat.com +1-801-349-2682
Libvirt virtualization library http://libvirt.org

On 03/30/2010 05:26 PM, Eric Blake wrote:
On 03/30/2010 02:52 PM, Matthias Bolte wrote:
2010/3/30 Jiri Denemark <jdenemar@redhat.com>:
Hi.
...
/* NOTE: struct _virDomainSnapshot is a private structure, ala
 * struct _virDomain. */
typedef struct _virDomainSnapshot virDomainSnapshot;
/* Take a snapshot of the current VM state. Throws an error if
 * the VM is not currently running */
virDomainSnapshotPtr virDomainSnapshotCreateXML(virDomainPtr domain,
                                                const char *xmlDesc,
                                                unsigned int flags);
This is probably a leftover from previous versions, but... why do we restrict this API only for running VMs?
Yep, if a domain is not running you'll just get a disk snapshot without a memory snapshot.
Is taking a halted disk snapshot something we might want to allow with a flag, though? For that matter, is this API useful for taking a disk snapshot and disregarding a memory snapshot even of a running VM?
The problem with disregarding a memory snapshot of a running guest is that it is very easy to get inconsistent snapshots; that is, the guest could have data cached in memory that is not on disk when you snapshot, and then you have completely inconsistent results on disk.
ESX allows you to do a disk-only snapshot of a running guest, but they cheat; they also have a way to quiesce the guest (force writes), thus ensuring a consistent snapshot.
Virtualbox doesn't give you a choice in the matter; if the guest is off when you take a snapshot, you get a disk snapshot, and if it's on when you take a snapshot, you get a disk+memory snapshot.
Qemu allows both usages, although it must have the consistency problems I mentioned above for a disk-only snapshot on a running guest.
For now, I think we can stick with the disk-only semantic for shutoff guests, and the disk+memory semantic for running guests. If it turns out that there is a need for a disk-only semantic for running guests, this should be easy to add later via a flag to virDomainSnapshotCreateXML.
-- Chris Lalancette
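As a rough illustration of the rule proposed above (disk-only for shutoff guests, disk+memory for running guests), the choice could be expressed like this. Everything here is hypothetical: the types are local to the sketch, and disk_only stands in for the flag that might be added to virDomainSnapshotCreateXML later; no such flag exists in the proposal.

```c
#include <stdbool.h>

/* Only the two states discussed in the thread are modelled. */
enum dom_state { DOM_SHUTOFF, DOM_RUNNING };

struct snap_contents {
    bool disk;
    bool memory;
};

/* disk_only is a HYPOTHETICAL request flag; passing false gives the
 * behaviour agreed on for now. */
struct snap_contents snap_contents_for(enum dom_state state, bool disk_only)
{
    /* Every snapshot includes the disk state. */
    struct snap_contents c = { true, false };

    /* Memory is captured only for a running guest that did not opt out. */
    if (state == DOM_RUNNING && !disk_only)
        c.memory = true;
    return c;
}
```

Note that the sketch says nothing about consistency: a disk-only snapshot of a running guest would still need something like the ESX quiesce step to be safe.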

On Tue, Mar 30, 2010 at 06:39:02PM -0400, Chris Lalancette wrote:
On 03/30/2010 05:26 PM, Eric Blake wrote:
On 03/30/2010 02:52 PM, Matthias Bolte wrote:
Yep, if a domain is not running you'll just get a disk snapshot without a memory snapshot.
Is taking a halted disk snapshot something we might want to allow with a flag, though? For that matter, is this API useful for taking a disk snapshot and disregarding a memory snapshot even of a running VM?
The problem with disregarding a memory snapshot of a running guest is that it is very easy to get inconsistent snapshots; that is, the guest could have data cached in memory that is not on disk when you snapshot, and then you have completely inconsistent results on disk.
ESX allows you to do a disk only snapshot of a running guest, but they cheat; they also have a way to quiesce the guest (force writes), thus ensuring a consistent snapshot.
Virtualbox doesn't give you a choice in the matter; if the guest is off when you take a snapshot, you get a disk snapshot, and if it's on when you take a snapshot, you get a disk+memory snapshot.
Qemu allows both usages, although it must have the consistency problems I mentioned above for a disk-only snapshot on a running guest.
For now, I think we can stick with the disk-only semantic for shutoff guests, and the disk+memory semantic for running guests. If it turns out that there is a need for a disk-only semantic for running guests, this should be easy to add later via a flag to virDomainSnapshotCreateXML.
Yes, that sounds like a reasonable approach to me.
Daniel
--
Daniel Veillard      | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
daniel@veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library http://libvirt.org/

On 03/30/2010 04:40 PM, Jiri Denemark wrote:
Hi.
...
/* NOTE: struct _virDomainSnapshot is a private structure, ala
 * struct _virDomain. */
typedef struct _virDomainSnapshot virDomainSnapshot;
/* Take a snapshot of the current VM state. Throws an error if
 * the VM is not currently running */
virDomainSnapshotPtr virDomainSnapshotCreateXML(virDomainPtr domain,
                                                const char *xmlDesc,
                                                unsigned int flags);
This is probably a leftover from previous versions, but... why do we restrict this API only for running VMs?
Oops, yeah, you are right, I just forgot to change the comment. It now says: /* Take a snapshot of the current VM state. */
...
/* Delete a snapshot - with no flags, the snapshot is not used anymore,
 * but also not removed. With a MERGE flag, it merges the snapshot into
 * the parent snapshot (or the base image, if there is no parent snapshot).
 * Note that if other snapshots would be discarded because of this
 * MERGE action, this operation will fail. If that is really what is intended,
 * use MERGE_FORCE.
 *
 * With a DISCARD flag, it deletes the snapshot. Note that if children snapshots
 * would be discarded because of this delete action, this operation will
 * fail. If this is really what is intended, use DISCARD_FORCE.
 *
 * MERGE, MERGE_FORCE, DISCARD, and DISCARD_FORCE are mutually-exclusive.
 *
 * Note that this operation can happen when the domain is running or shut
 * down, though this is hypervisor specific */
typedef enum {
    VIR_DOMAIN_SNAPSHOT_DELETE_MERGE,
    VIR_DOMAIN_SNAPSHOT_DELETE_MERGE_FORCE,
    VIR_DOMAIN_SNAPSHOT_DELETE_DISCARD,
    VIR_DOMAIN_SNAPSHOT_DELETE_DISCARD_FORCE,
} virDomainSnapshotDelete;
Merging a snapshot into its parent is probably not the best semantics for MERGE flag as hypervisors differ in the way merging is implemented. As you
Yeah, you are right, we can't just declare this. It's also an open question to me what qemu does about merging (though bugs with loadvm/delvm are preventing me from testing this at the moment).
also mention below, VirtualBox merges into all children instead of a parent. We should allow for both cases. However it influences several things. Firstly, it makes MERGE_FORCE unnecessary for child merging, which is not a big deal as it can just be treated in the same way as MERGE. Secondly, it makes a huge difference when deleting a snapshot with no child. In one case it results in changes being merged and in the other case it results in changes being dropped.
One option is to refine the semantics to something like:
- MERGE: merge changes into other snapshot(s) and fail if it would require any snapshot to be discarded (even the one which was supposed to be merged)
- MERGE_FORCE: really merge even discarding other snapshots but fail if the snapshot itself would actually be discarded
- DISCARD: discard the snapshot and fail if other snapshots would be discarded
- DISCARD_FORCE: discard, no matter what
The problem with declaring these semantics is that they are somewhat confusing, so application developers will probably get them wrong. On the other hand it does allow us to declare a semantic that all 3 hypervisors probably can support, unlike the options below.
Another option would be to introduce several different APIs for merging into children, merging into parent, and discarding. That would allow drivers to implement only supported methods. Even all of them for a very flexible hypervisor.
The problem with this one is that it will be difficult for application writers to write a single application that handles all of the hypervisors. Imagine trying to write a GUI around this, and you'll see what I mean. If we were to add 3 new APIs like this, we would also probably want to add some data to the capabilities XML for the particular hypervisors so you could grey out specific options in a GUI.
On the other hand, it does solve the problem with merging parents vs. children, and also defuses Paolo's concern about the "SnapshotDelete" API sometimes deleting data and sometimes modifying the base image.
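To picture what "grey out specific options" would need from the capabilities data, here is a small sketch. The capability bits and the helper are invented for this example; nothing in the proposal defines them, and the real information would come from the capabilities XML rather than a bitmask.

```c
#include <stdbool.h>

/* HYPOTHETICAL per-hypervisor delete capabilities, one bit per operation. */
enum snap_delete_cap {
    SNAP_CAP_MERGE_PARENT   = 1 << 0,
    SNAP_CAP_MERGE_CHILDREN = 1 << 1,
    SNAP_CAP_DISCARD        = 1 << 2
};

/* A GUI would call this once per button and disable the ones that
 * return false for the connected hypervisor. */
bool snap_delete_op_supported(unsigned int caps, enum snap_delete_cap op)
{
    return (caps & (unsigned int)op) != 0;
}
```

An application would build the mask once per connection, for example SNAP_CAP_MERGE_CHILDREN | SNAP_CAP_DISCARD for a driver that can only merge into children, and consult it before offering each operation.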
And the third option I see would be distinguishing merge direction using new flags.
This one is like the second option, in that you don't know which particular directions a particular hypervisor supports. You'd still need to add capabilities XML for each hypervisor. I have to say that after thinking about these 3 options, I like the first option the best. While it's slightly confusing, it is a good semantic. I'll update the documentation for this.
Personally, I like the second option best as it provides the easiest way for application to detect unsupported behavior.
...
Possible issues: 1) I don't see a way to support "managed" save/restore and snapshotting with this API. I think we'll have to have a separate API for managed save/restore. 2) What is the semantic for deleting snapshots from a running domain? Virtualbox seems to not allow you to manipulate snapshots while the domain is running. Qemu does allow this, but it's currently unclear what the exact semantics are. VMware seems to allow manipulation of snapshots while the domain is running. 3) Do we need a snapshot UUID? Virtualbox allows you to have multiple snapshots with the same name, differentiated by UUID. Confusingly, they also have a "FindByName" method that returns the first depth-first search snapshot that matches a given name. For qemu, if you specify the same name twice it overwrites the previous one with the new one. I don't know what ESX does here.
Libvirt uses/generates UUIDs for almost everything (networks, vms, ...) so it might be more consistent to have UUID in snapshot as well.
Yeah, that is true, which is why I'm waffling on it. In general it seems like superfluous information, except in the one case of virtualbox duplicate names (ESX doesn't allow duplicate names, I don't think, and qemu blows away duplicates). My inclination is to declare the semantics of duplicate names to be undefined, since it doesn't seem to be a very useful feature.
-- Chris Lalancette
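For reference, the Virtualbox lookup behavior mentioned in issue 3 (return the first snapshot found by a depth-first search) can be sketched as follows. The node layout is made up for the example, and visiting a node before its children (pre-order) is an assumption; the thread only says "first depth-first search snapshot".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical snapshot tree node: first child plus next sibling. */
struct snap_node {
    const char *name;
    const char *uuid;
    const struct snap_node *first_child;
    const struct snap_node *next_sibling;
};

/* Pre-order depth-first search over a node and its following siblings.
 * Returns the first node whose name matches, or NULL if there is none. */
const struct snap_node *snap_find_by_name(const struct snap_node *node,
                                          const char *name)
{
    for (; node != NULL; node = node->next_sibling) {
        const struct snap_node *hit;

        if (strcmp(node->name, name) == 0)
            return node;

        /* Search the whole subtree before moving to the next sibling. */
        hit = snap_find_by_name(node->first_child, name);
        if (hit != NULL)
            return hit;
    }
    return NULL;
}
```

With two snapshots of the same name, which one is returned depends purely on tree position, which is part of why leaving duplicate names undefined looks attractive.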

2010/3/31 Chris Lalancette <clalance@redhat.com>:
On 03/30/2010 04:40 PM, Jiri Denemark wrote:
Hi.
...
/* NOTE: struct _virDomainSnapshot is a private structure, ala
 * struct _virDomain. */
typedef struct _virDomainSnapshot virDomainSnapshot;
/* Take a snapshot of the current VM state. Throws an error if
 * the VM is not currently running */
virDomainSnapshotPtr virDomainSnapshotCreateXML(virDomainPtr domain,
                                                const char *xmlDesc,
                                                unsigned int flags);
This is probably a leftover from previous versions, but... why do we restrict this API only for running VMs?
Oops, yeah, you are right, I just forgot to change the comment. It now says:
/* Take a snapshot of the current VM state. */
...
/* Delete a snapshot - with no flags, the snapshot is not used anymore,
 * but also not removed. With a MERGE flag, it merges the snapshot into
 * the parent snapshot (or the base image, if there is no parent snapshot).
 * Note that if other snapshots would be discarded because of this
 * MERGE action, this operation will fail. If that is really what is intended,
 * use MERGE_FORCE.
 *
 * With a DISCARD flag, it deletes the snapshot. Note that if children snapshots
 * would be discarded because of this delete action, this operation will
 * fail. If this is really what is intended, use DISCARD_FORCE.
 *
 * MERGE, MERGE_FORCE, DISCARD, and DISCARD_FORCE are mutually-exclusive.
 *
 * Note that this operation can happen when the domain is running or shut
 * down, though this is hypervisor specific */
typedef enum {
    VIR_DOMAIN_SNAPSHOT_DELETE_MERGE,
    VIR_DOMAIN_SNAPSHOT_DELETE_MERGE_FORCE,
    VIR_DOMAIN_SNAPSHOT_DELETE_DISCARD,
    VIR_DOMAIN_SNAPSHOT_DELETE_DISCARD_FORCE,
} virDomainSnapshotDelete;
Merging a snapshot into its parent is probably not the best semantics for MERGE flag as hypervisors differ in the way merging is implemented. As you
Yeah, you are right, we can't just declare this. It's also an open question to me what qemu does about merging (though bugs with loadvm/delvm are preventing me from testing this at the moment).
And as described in my other response ESX does merge-into-children too, but I didn't realize this until now. I just used the wrong words to describe it at first. So, after I thought about this in detail, I'd like to ask: Is there something like a merge-into-parent semantic at all?
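To pin down what merge-into-children means for the snapshot tree, here is a minimal sketch of the bookkeeping only: when a snapshot is deleted, each of its children takes over the deleted snapshot's parent, as the Virtualbox column of the mapping table describes. The flat-array layout and all names are assumptions for this example, and the actual merging of disk differences into the children is not modelled.

```c
#define SNAP_MAX 16

/* Flat snapshot table: parent[i] is the index of i's parent, or -1 for
 * a root. alive[i] becomes 0 once snapshot i has been deleted. */
struct snap_tree {
    int parent[SNAP_MAX];
    int alive[SNAP_MAX];
    int count;
};

/* Delete `victim` with merge-into-children semantics.
 * Returns the number of children that were reparented, or -1 if the
 * index is invalid or the snapshot is already gone. */
int snap_delete_merge_children(struct snap_tree *t, int victim)
{
    int moved = 0;
    int i;

    if (t->count > SNAP_MAX || victim < 0 || victim >= t->count ||
        !t->alive[victim])
        return -1;

    for (i = 0; i < t->count; i++) {
        if (t->alive[i] && t->parent[i] == victim) {
            /* The child now hangs off the victim's own parent. */
            t->parent[i] = t->parent[victim];
            moved++;
        }
    }
    t->alive[victim] = 0;
    return moved;
}
```

A return value of 0 is the interesting case from the discussion above: the snapshot had no children, so there was nothing to merge its changes into.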
also mention below, VirtualBox merges into all children instead of a parent. We should allow for both cases. However, it influences several things. Firstly, it makes MERGE_FORCE unnecessary for child merging, which is not a big deal as it can just be treated in the same way as MERGE. Secondly, it makes a huge difference when deleting a snapshot with no child. In one case it results in changes being merged and in the other case it results in changes being dropped.
One option is to refine the semantics to something like:
- MERGE: merge changes into other snapshot(s) and fail if it would require any snapshot to be discarded (even the one which was supposed to be merged)
- MERGE_FORCE: really merge even discarding other snapshots but fail if the snapshot itself would actually be discarded
- DISCARD: discard the snapshot and fail if other snapshots would be discarded
- DISCARD_FORCE: discard, no matter what
The problem with declaring these semantics is that they are somewhat confusing, so application developers will probably get them wrong. On the other hand it does allow us to declare a semantic that all 3 hypervisors probably can support, unlike the options below.
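As a rough illustration of the four refined flag semantics listed above, here is a minimal sketch in C. Everything in it is hypothetical: the `SNAP_DELETE_*` names, their bit values (the RFC's enum assigns none), the `delete_effect` struct, and `snap_delete_check()` are made up for this example and are not part of the proposed libvirt API. It assumes the driver can predict, before acting, whether the snapshot's own changes would be dropped and whether other snapshots would be lost.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; the RFC's enum does not assign any. */
enum {
    SNAP_DELETE_MERGE         = 1 << 0,
    SNAP_DELETE_MERGE_FORCE   = 1 << 1,
    SNAP_DELETE_DISCARD       = 1 << 2,
    SNAP_DELETE_DISCARD_FORCE = 1 << 3,
};

/* Toy model of what the driver predicts the deletion would do. */
struct delete_effect {
    bool drops_own_changes;  /* the snapshot's changes cannot be merged anywhere */
    bool discards_others;    /* some other snapshot would be lost */
};

/* Returns 0 if the request may proceed, -1 if it must fail. */
static int snap_delete_check(unsigned int flags, struct delete_effect e)
{
    switch (flags) {
    case SNAP_DELETE_MERGE:
        /* fail if anything at all would be discarded */
        return (e.drops_own_changes || e.discards_others) ? -1 : 0;
    case SNAP_DELETE_MERGE_FORCE:
        /* other snapshots may go, but the merge itself must happen */
        return e.drops_own_changes ? -1 : 0;
    case SNAP_DELETE_DISCARD:
        /* dropping this snapshot is fine, losing others is not */
        return e.discards_others ? -1 : 0;
    case SNAP_DELETE_DISCARD_FORCE:
        return 0;
    default:
        /* zero flags (a separate case in the RFC) or several flags
         * at once: rejected in this sketch */
        return -1;
    }
}
```

Comparing `flags` against exactly one value per case is what enforces the "mutually-exclusive" rule here; a real driver would also have to handle the no-flags case described in the API comment.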
Another option would be to introduce several different APIs for merging into children, merging into parent, and discarding. That would allow drivers to implement only supported methods. Even all of them for a very flexible hypervisor.
The problem with this one is that it will be difficult for application writers to write a single application that handles all of the hypervisors. Imagine trying to write a GUI around this, and you'll see what I mean. If we were to add 3 new API's like this, we would also probably want to add some data to the capabilities XML for the particular hypervisors so you could grey out specific options in a GUI.
On the other hand, it does solve the problem with merging parents vs. children, and also defuses Paolo's concern about the "SnapshotDelete" API sometimes deleting data and sometimes modifying the base image.
Again, I think there is nothing like "merging parents vs. children". The only possibility is merging into children. Or am I totally confused now?
And the third option I see would be distinguishing merge direction using new flags.
This one is like the second option, in that you don't know which particular directions a particular hypervisor supports. You'd still need to add capabilities XML for each hypervisor.
I have to say that after thinking about these 3 options, I like the first option the best. While it's slightly confusing, it is a good semantic. I'll update the documentation for this.
Personally, I like the second option best as it provides the easiest way for application to detect unsupported behavior.
...
Possible issues:
1) I don't see a way to support "managed" save/restore and snapshotting with this API. I think we'll have to have a separate API for managed save/restore.
2) What is the semantic for deleting snapshots from a running domain? Virtualbox seems to not allow you to manipulate snapshots while the domain is running. Qemu does allow this, but it's currently unclear what the exact semantics are. VMware seems to allow manipulation of snapshots while the domain is running.
3) Do we need a snapshot UUID? Virtualbox allows you to have multiple snapshots with the same name, differentiated by UUID. Confusingly, they also have a "FindByName" method that returns the first depth-first search snapshot that matches a given name. For qemu, if you specify the same name twice it overwrites the previous one with the new one. I don't know what ESX does here.
Libvirt uses/generates UUIDs for almost everything (networks, vms, ...) so it might be more consistent to have UUID in snapshot as well.
Yeah, that is true, which is why I'm waffling with it. In general it seems like superfluous information, except in the one case of virtualbox duplicate names (ESX doesn't allow duplicate names, I don't think, and qemu blows away duplicates). My inclination is to declare the semantics of duplicate names to be undefined, since it doesn't seem to be a very useful feature.
As said in another response: ESX 4.0 allows duplicate names. It distinguishes snapshots based on their ID. But this ID was added in ESX 4.0. I think ESX 3.5 doesn't allow duplicate names. With ESX 4.0 we could use the ID to derive a read-only UUID from it. With ESX 3.5 we would need to abuse the name field of a snapshot on the ESX side to store a UUID per snapshot.
Matthias

On Tue, Mar 30, 2010 at 06:13:07PM -0400, Chris Lalancette wrote:
On 03/30/2010 04:40 PM, Jiri Denemark wrote: [...]
also mention below, VirtualBox merges into all children instead of a parent. We should allow for both cases. However, it influences several things. Firstly, it makes MERGE_FORCE unnecessary for child merging, which is not a big deal as it can just be treated in the same way as MERGE. Secondly, it makes a huge difference when deleting a snapshot with no child. In one case it results in changes being merged and in the other case it results in changes being dropped.
One option is to refine the semantics to something like:
- MERGE: merge changes into other snapshot(s) and fail if it would require any snapshot to be discarded (even the one which was supposed to be merged)
- MERGE_FORCE: really merge even discarding other snapshots but fail if the snapshot itself would actually be discarded
- DISCARD: discard the snapshot and fail if other snapshots would be discarded
- DISCARD_FORCE: discard, no matter what
The problem with declaring these semantics is that they are somewhat confusing, so application developers will probably get them wrong. On the other hand it does allow us to declare a semantic that all 3 hypervisors probably can support, unlike the options below.
Well, I think if we document the 4 different flags with small graphic examples showing what happens to hierarchies of snapshots on each operation, I don't see why users would be confused. The only other option to avoid user confusion is to reduce capabilities drastically, and I don't think we want to go there.
Libvirt uses/generates UUIDs for almost everything (networks, vms, ...) so it might be more consistent to have UUID in snapshot as well.
Yeah, that is true, which is why I'm waffling with it. In general it seems like superfluous information, except in the one case of virtualbox duplicate names (ESX doesn't allow duplicate names, I don't think, and qemu blows away duplicates). My inclination is to declare the semantics of duplicate names to be undefined, since it doesn't seem to be a very useful feature.
IMHO a UUID allows us to name things uniquely when we can't check for uniqueness at creation (e.g. the domain has been moving around and we don't have shared storage for snapshots). That's an edge case, but if also using a UUID is cheap I would go for it.
Daniel
-- 
Daniel Veillard      | libxml Gnome XML XSLT toolkit http://xmlsoft.org/
daniel@veillard.com  | Rpmfind RPM search engine http://rpmfind.net/
http://veillard.com/ | virtualization library http://libvirt.org/

2010/3/30 Chris Lalancette <clalance@redhat.com>:
Hello, After our discussions about the snapshot API last week, I went ahead and implemented quite a bit of the API. I also went back to the ESX, Virtualbox, and QEMU API's to try and make sure our API's matched up. What's below is my revised API based on that survey. Following my revised API are notes that I took regarding how the libvirt API matches up to the various API's, and some questions about semantics that I had while doing the survey. More comments and questions are welcome.
/* Start the guest from the snapshot "snapshot" */
int virDomainCreateFromSnapshot(virDomainSnapshotPtr snapshot,
                                unsigned int flags);
Will it be enforced that the domain is shutdown in order to call this function? ESX doesn't have such a restriction. Not sure about other hypervisors.
/* Rename the snapshot */
/* Do we really need this? In theory we could use
 * virsh snapshot-edit <domain> <name> and then detect
 * name changes, but that will require a UUID, which may
 * or may not be overkill */
int virDomainSnapshotRename(virDomainSnapshotPtr snapshot,
                            char *newname,
                            unsigned int flags);
/* Delete a snapshot - with no flags, the snapshot is not used anymore,
 * but also not removed. With a MERGE flag, it merges the snapshot into
 * the parent snapshot (or the base image, if there is no parent snapshot).
Note that "merging into the parent" seems to be the wrong term, even in case of ESX. See detailed discussion below.
 * Note that if other snapshots would be discarded because of this
 * MERGE action, this operation will fail. If that is really what is intended,
 * use MERGE_FORCE.
 *
 * With a DISCARD flag, it deletes the snapshot. Note that if children snapshots
 * would be discarded because of this delete action, this operation will
 * fail. If this is really what is intended, use DISCARD_FORCE.
 *
 * MERGE, MERGE_FORCE, DISCARD, and DISCARD_FORCE are mutually-exclusive.
 *
 * Note that this operation can happen when the domain is running or shut
 * down, though this is hypervisor specific */
typedef enum {
    VIR_DOMAIN_SNAPSHOT_DELETE_MERGE,
    VIR_DOMAIN_SNAPSHOT_DELETE_MERGE_FORCE,
    VIR_DOMAIN_SNAPSHOT_DELETE_DISCARD,
    VIR_DOMAIN_SNAPSHOT_DELETE_DISCARD_FORCE,
} virDomainSnapshotDelete;
int virDomainSnapshotDelete(virDomainSnapshotPtr snapshot,
                            unsigned int flags);
int virDomainSnapshotFree(virDomainSnapshotPtr snapshot);
NOTE: During snapshot creation, *none* of the fields are required. That is, you can call virDomainSnapshotCreateXML() with an XML of "<domainsnapshot/>". In this case, the individual driver will make up a <name> and <uuid> for you,
Does <uuid> here refer to a snapshot UUID? As said before, there is no easy way to have a UUID per snapshot with ESX. Well, we could store <uuid>:<name> in the name field on the ESX side, but that's not a really good way to do it.
the <creationdate> will be set to the current time+date, <description> will be empty, <state> will be the current state of the VM, and <parent> will be set to the current snapshot (if any). If you do want to specify some fields during virDomainSnapshotCreateXML(), note that the only ones that are settable are <name>, <uuid>, and <description>; the rest are ignored, and filled in by the driver when the snapshot is actually created. NOTE: <state> refers to the state of the VM when the snapshot was taken.
<domainsnapshot>
  <name>XYZ</name>
  <creationdate>...</creationdate>
  <description>...</description>
  <state>RUNNING</state>
  <domain>
    <uuid>XXXXX-XXXX-XXXX-XXXX-XXXXXXXXX</uuid>
  </domain>
  <parent>
    <name>ABC</name>
  </parent>
</domainsnapshot>
The virsh commands will be:
virsh snapshot-create <dom> <xmlfile>
virsh snapshot-list <dom>
virsh snapshot-dumpxml <dom> <name>
virsh start-with-snapshot <dom> <snapshotname>
virsh snapshot-delete <dom> <snapshotname> [--merge|--mergeforce|--delete|--deleteforce]
virsh snapshot-delete-all <dom>
Possible issues: 1) I don't see a way to support "managed" save/restore and snapshotting with this API. I think we'll have to have a separate API for managed save/restore.
What's "managed" save/restore and snapshotting?
3) Do we need a snapshot UUID? Virtualbox allows you to have multiple snapshots with the same name, differentiated by UUID. Confusingly, they also have a "FindByName" method that returns the first depth-first search snapshot that matches a given name. For qemu, if you specify the same name twice it overwrites the previous one with the new one. I don't know what ESX does here.
ESX 4.0 allows multiple snapshots with the same name. I think this is because ESX 4.0 has an integer ID per snapshot. I'm not sure if ESX 3.5 allows multiple snapshots with the same name, because the ID field was added in ESX 4.0. I assume ESX 3.5 doesn't allow multiple snapshots with the same name, but I have currently no ESX 3.5 at hand to test. We could use this integer ID and convert it to UUID format, but you won't be able to set the UUID, it'll be read-only and only available on ESX 4.0 and above.
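To make the "convert the integer ID to UUID format" idea concrete, here is a minimal sketch in C. It is only one possible encoding and purely an assumption of this example: the function name, the buffer-length macro, and the choice to place the ID in the last group are all hypothetical. The result is UUID-shaped but carries no UUID version or variant bits.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SNAP_UUID_STRING_BUFLEN 37  /* 36 characters plus the terminating NUL */

/* Render a per-snapshot integer ID as a UUID-shaped string by zero-padding
 * it into the last group. Returns 0 on success, -1 on a formatting error. */
static int snapshot_id_to_uuid_string(unsigned int id,
                                      char out[SNAP_UUID_STRING_BUFLEN])
{
    int n = snprintf(out, SNAP_UUID_STRING_BUFLEN,
                     "00000000-0000-0000-0000-%012x", id);
    return n == SNAP_UUID_STRING_BUFLEN - 1 ? 0 : -1;
}
```

Because the string is derived from the ID, it stays read-only from the caller's point of view, which matches the limitation described above.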
Mapping of our interface to various hypervisors:
+-------------------------------+-----------------+-------------------+------------------------------+
| Libvirt                       | Qemu            | Virtualbox        | ESX                          |
+-------------------------------+-----------------+-------------------+------------------------------+
| virDomainSnapshotCreateXML    | monitor command | takeSnapshot      | CreateSnapshot_task          |
|                               | "savevm"; if    | Snapshots can     | takes a name, description,   |
|                               | snapshot name   | be taken on       | memory (true/false) and      |
|                               | is already in   | powered off,      | quiesce (true/false).        |
|                               | use, replaces   | saved, running,   | What does "memory" mean?     |
If memory is true, ESX snapshots the memory of the domain too, otherwise only a disk snapshot is created. Creating a disk-only snapshot is nearly instant, while creating a memory snapshot also requires a notable amount of time to write the memory image to disk.
|                               | the previous    | or paused VMs.    | Should we model "quiesce"    |
The vSphere API docs give a good description of what the quiesce option does: "If TRUE and the virtual machine is powered on when the snapshot is taken, VMware Tools is used to quiesce the file system in the virtual machine. This assures that a disk snapshot represents a consistent state of the guest file systems. If the virtual machine is powered off or VMware Tools are not available, the quiesce flag is ignored." I assume "quiesce the file system" means to flush write caches and stuff like that. This option is important if you want to create a disk-only snapshot of a running domain.
|                               | snapshot. Also  | The snapshot is   | Trees of snapshots are       |
|                               | qemu-img        | always taken      | supported. What happens      |
|                               | snapshot -c can | against the       | on a duplicate name? What    |
|                               | be used to      | current snapshot. | state(s) can a VM be in      |
|                               | create a        | What happens on   | when calling this? Does      |
|                               | disk-only       | a duplicate       | a VM get paused when this    |
|                               | snapshot. What  | name? Trees of    | is called?                   |
In case of ESX the domain can be in any state when a snapshot is created. If the domain is running when you create a snapshot then the domain is _not_ paused during the snapshot creation. I tested it and the memory snapshot represents the state at the time the snapshot command was issued.
|                               | happens if the  | snapshots are     |                              |
|                               | VM is running   | not currently     |                              |
|                               | when you do     | supported.        |                              |
|                               | this? Trees of  | Taking a snapshot |                              |
|                               | snapshots seem  | of a running VM   |                              |
|                               | to be supported | pauses the VM     |                              |
|                               | VM gets paused  | before taking the |                              |
|                               | while this is   | snapshot.         |                              |
|                               | happening. What |                   |                              |
|                               | states can the  |                   |                              |
|                               | VM be in?       |                   |                              |
+-------------------------------+-----------------+-------------------+------------------------------+
+-------------------------------+-----------------+-------------------+------------------------------+
| virDomainSnapshotDelete       | monitor command | deleteSnapshot    | RemoveSnapshot_Task          |
|                               | "delvm". What   | deletes the       | removes this snapshot and    |
|                               | happens if the  | specified         | deletes any associated       |
|                               | snapshot is in  | snapshot. Takes   | storage. Operates on a       |
|                               | use? What       | an ID. The VM     | VirtualMachineSnapshot       |
|                               | states can the  | must be off.      | object. What states can      |
|                               | VM be in? Also  | Differences to    | the VM be in? What           |
|                               | qemu-img        | children          | happens if this snapshot     |
|                               | snapshot -d     | snapshots will be | is in-use? What happens      |
|                               | <name> <file>   | merged with the   | to parents and children?     |
|                               | command can be  | children to keep  |                              |
|                               | used. What      | children valid.   |                              |
|                               | happens if the  | Parent for this   |                              |
|                               | disk is in-use? | snapshot will     |                              |
|                               | What happens to | become parent of  |                              |
|                               | parents and     | any children      |                              |
|                               | children?       | snapshots.        |                              |
|                               | How do we       |                   |                              |
|                               | handle merges?  |                   |                              |
+-------------------------------+-----------------+-------------------+------------------------------+
The domain can be in any state when deleting a snapshot, even if you delete the current snapshot. VMware has some documentation about how a snapshot is merged into its parent:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002836
And some more general docs about snapshots:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1015180
Regarding what gets merged and where, I should define the terms I'm using first.
A <--1-- B <--2-- C <--3-- current + <--4-- D
I intentionally draw the arrows directed from child to parent.
A, B, C, D are what I call a snapshot, a point in "time" I can switch to. The disk differences between these points are stored in COW sparse images, here shown as 1, 2, 3, 4. The current state of the domain is denoted by the "current" item.
Each snapshot is associated with a disk image: A is associated with the base image, B with sparse image 1, C with 2 and so on. A special case is sparse image 3: it's not associated with a snapshot, but with the current state. Also, each snapshot can be associated with a memory image (not shown here).
The current snapshot in this case is C. If the domain writes changes to disk, these changes get stored in sparse image 3. If you switch to another snapshot from here then the changes in 3 are lost, because you cannot go back to a point where you could access the changes in 3 again.
Now let's delete B. In this case the memory image associated with B is just discarded and 1 and 2 are merged into 5. That's what I was referring to when I said ESX merges snapshots into the parent.
A <------5------- C <--3-- current + <--4-- D
But this only happens for snapshots like B, that have a parent and a child (C is such a snapshot too, even if its child isn't an actual snapshot). If you delete D in this example, then the changes in sparse image 4 are discarded, because there is no place where they could be merged. Merging 4 into the base image would alter A, merging 4 and 5 would alter C.
Now as I think of this in detail, it seems that the term "merging into the parent" is wrong.
In the next example we have snapshot E with parent B.
A <--1-- B <--2-- C <--3-- current + <--6-- E
Now what's going to happen if we delete B? In order to preserve C and E, the changes in 1 need to be merged into 2 and 6; this results in 1 + 2 = 5 and 1 + 6 = 7. Or rephrased: B is merged into its children C and E.
A <------5------- C <--3-- current + <------7------- E
So, virDomainSnapshotDelete's semantic for VirtualBox and ESX seems to be the same. I just used the wrong words to describe it at first. Sorry for that.
Matthias
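The merge-into-children behaviour described above can be sketched with a toy snapshot table in C. This is only an illustration under stated assumptions: the `struct snap` layout, the use of a bitmask to stand for "which sparse images make up this delta", and the function name are all hypothetical, and the "current" state and memory images are not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: each snapshot records its parent and the set of sparse
 * images (bit n set = sparse image n) separating it from that parent. */
struct snap {
    int alive;
    int parent;          /* index into the table, -1 for the root */
    unsigned int delta;  /* sparse images between parent and this snapshot */
};

/* Delete snapshot idx by merging its delta into every child and making
 * its parent the new parent of those children. Returns the number of
 * children that absorbed the delta; 0 means the snapshot was a leaf and
 * its delta was simply dropped. Returns -1 for an invalid index. */
static int snap_delete_into_children(struct snap *t, size_t n, int idx)
{
    int merged = 0;

    if (idx < 0 || (size_t)idx >= n || !t[idx].alive)
        return -1;

    for (size_t i = 0; i < n; i++) {
        if (t[i].alive && t[i].parent == idx) {
            t[i].delta |= t[idx].delta;   /* e.g. images 1 and 2 become "5" */
            t[i].parent = t[idx].parent;  /* grandparent becomes parent */
            merged++;
        }
    }
    t[idx].alive = 0;
    t[idx].delta = 0;
    return merged;
}
```

With A, B, C and E laid out as in the last diagram, deleting B leaves C holding images 1 and 2 and E holding images 1 and 6, both with A as parent. Deleting a leaf such as E afterwards returns 0, which corresponds to the case where the changes have no place to be merged and are discarded.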

On 03/30/2010 08:14 PM, Matthias Bolte wrote:
2010/3/30 Chris Lalancette <clalance@redhat.com>:
Hello, After our discussions about the snapshot API last week, I went ahead and implemented quite a bit of the API. I also went back to the ESX, Virtualbox, and QEMU API's to try and make sure our API's matched up. What's below is my revised API based on that survey. Following my revised API are notes that I took regarding how the libvirt API matches up to the various API's, and some questions about semantics that I had while doing the survey. More comments and questions are welcome.
/* Start the guest from the snapshot "snapshot" */ int virDomainCreateFromSnapshot(virDomainSnapshotPtr snapshot, unsigned int flags);
Will it be enforced that the domain is shutdown in order to call this function?
ESX doesn't have such a restriction. Not sure about other hypervisors.
Heh, I was just going through that myself. No, it's not required to be shutdown in general; qemu supports both modes. I've updated the documentation for this call. <snip>
* Note that if other snapshots would be discarded because of this * MERGE action, this operation will fail. If that is really what is intended, * use MERGE_FORCE. * * With a DISCARD flag, it deletes the snapshot. Note that if children snapshots * would be discarded because of this delete action, this operation will * fail. If this is really what is intended, use DISCARD_FORCE. * * MERGE, MERGE_FORCE, DISCARD, and DISCARD_FORCE are mutually-exclusive. * * Note that this operation can happen when the domain is running or shut * down, though this is hypervisor specific */ typedef enum { VIR_DOMAIN_SNAPSHOT_DELETE_MERGE, VIR_DOMAIN_SNAPSHOT_DELETE_MERGE_FORCE, VIR_DOMAIN_SNAPSHOT_DELETE_DISCARD, VIR_DOMAIN_SNAPSHOT_DELETE_DISCARD_FORCE, } virDomainSnapshotDelete; int virDomainSnapshotDelete(virDomainSnapshotPtr snapshot, unsigned int flags);
int virDomainSnapshotFree(virDomainSnapshotPtr snapshot);
NOTE: During snapshot creation, *none* of the fields are required. That is, you can call virDomainSnapshotCreateXML() with an XML of "<domainsnapshot/>". In this case, the individual driver will make up a <name> and <uuid> for you,
Does <uuid> here refer to a snapshot UUID? As said before, there is no easy way to have a UUID per snapshot with ESX. Well, we could store <uuid>:<name> in the name field on the ESX side, but that's not a really good way to do it.
Yeah, agreed, that was a leftover I forgot to edit out. See my reply to Jiri Denemark, but essentially I'm content to declare duplicate names unsupported/undefined, and not deal with UUID's at all. I've removed mention of UUID's from the documentation now. <snip>
The virsh commands will be:
virsh snapshot-create <dom> <xmlfile>
virsh snapshot-list <dom>
virsh snapshot-dumpxml <dom> <name>
virsh start-with-snapshot <dom> <snapshotname>
virsh snapshot-delete <dom> <snapshotname> [--merge|--mergeforce|--delete|--deleteforce]
virsh snapshot-delete-all <dom>
Possible issues: 1) I don't see a way to support "managed" save/restore and snapshotting with this API. I think we'll have to have a separate API for managed save/restore.
What's "managed" save/restore and snapshotting?
Oops, yeah, that's a personal note that I didn't really expound upon. One of the reasons that I originally started down the path of implementing snapshotting was to implement save/restore for guest during host shutdown and startup. Because of the way autostart works within libvirt, we can't have an external script (ala xendomains) do this; it needs to be handled inside the libvirt daemon itself, and our current save/restore API is not sufficient for this. That being said, after all of the discussions we have had about this snapshotting API, I don't think it will be appropriate to shoehorn this "managed" save/restore into this API, and we'll need a separate API for that.
3) Do we need a snapshot UUID? Virtualbox allows you to have multiple snapshots with the same name, differentiated by UUID. Confusingly, they also have a "FindByName" method that returns the first depth-first search snapshot that matches a given name. For qemu, if you specify the same name twice it overwrites the previous one with the new one. I don't know what ESX does here.
ESX 4.0 allows multiple snapshots with the same name. I think this is because ESX 4.0 has an integer ID per snapshot. I'm not sure if ESX 3.5 allows multiple snapshots with the same name, because the ID field was added in ESX 4.0. I assume ESX 3.5 doesn't allow multiple snapshots with the same name, but I have currently no ESX 3.5 at hand to test.
We could use this integer ID and convert it to UUID format, but you won't be able to set the UUID, it'll be read-only and only available on ESX 4.0 and above.
Yeah, again, I'm happy to drop UUID and declare duplicate names unsupported unless there is a good use case.
Mapping of our interface to various hypervisors: +-------------------------------+-----------------+-------------------+------------------------------+ | Libvirt | Qemu | Virtualbox | ESX | +-------------------------------+-----------------+-------------------+------------------------------+ | virDomainSnapshotCreateXML | monitor command | takeSnapshot | CreateSnapshot_task | | | "savevm"; if | Snapshots can | takes a name, description, | | | snapshot name | be taken on | memory (true/false) and | | | is already in | powered off, | quiesce (true/false). | | | use, replaces | saved, running, | What does "memory" mean? |
If memory is true, ESX snapshots the memory of the domain too, otherwise only a disk snapshot is created.
Creating a disk-only snapshot is nearly instant, while creating a memory snapshot also requires a notable amount of time to write the memory image to disk.
Sorry, I misread the documentation yesterday. That's fairly clear. What's less clear to me is what happens when you take a disk-only snapshot, and then try to RevertToSnapshot from a running VM. What happens in that case?
| | the previous | or paused VMs. | Should we model "quiesce" |
The vSphere API docs give a good description of what the quiesce option does:
"If TRUE and the virtual machine is powered on when the snapshot is taken, VMware Tools is used to quiesce the file system in the virtual machine. This assures that a disk snapshot represents a consistent state of the guest file systems. If the virtual machine is powered off or VMware Tools are not available, the quiesce flag is ignored."
I assume "quiesce the file system" means to flush write caches and stuff like that.
This option is important if you want to create a disk-only snapshot of a running domain.
Exactly. I'm not sure this is going to be possible in general (and I guess it's not even really possible in ESX unless you install VMware Tools inside the guest). I'm inclined not to model it at the moment, although I could be convinced otherwise.
| | snapshot. Also | The snapshot is | Trees of snapshots are | | | qemu-img | always taken | supported. What happens | | | snapshot -c can | against the | on a duplicate name? What | | | be used to | current snapshot. | state(s) can a VM be in | | | create a | What happens on | when calling this? Does | | | disk-only | a duplicate | a VM get paused when this | | | snapshot. What | name? Trees of | is called? |
In case of ESX the domain can be in any state when a snapshot is created.
If the domain is running when you create a snapshot then the domain is _not_ paused during the snapshot creation.
I tested it and the memory snapshot represents the state at the time the snapshot command was issued.
OK, great. I'll update these notes about that.
| | happens if the | snapshots are | | | | VM is running | not currently | | | | when you do | supported. | | | | this? Trees of | Taking a snapshot | | | | snapshots seem | of a running VM | | | | to be supported | pauses the VM | | | | VM gets paused | before taking the | | | | while this is | snapshot. | | | | happening. What | | | | | states can the | | | | | VM be in? | | | +-------------------------------+-----------------+-------------------+------------------------------+
+-------------------------------+-----------------+-------------------+------------------------------+ | virDomainSnapshotDelete | monitor command | deleteSnapshot | RemoveSnapshot_Task | | | "delvm". What | deletes the | removes this snapshot and | | | happens if the | specified | deletes any associated | | | snapshot is in | snapshot. Takes | storage. Operates on a | | | use? What | an ID. The VM | VirtualMachineSnapshot | | | states can the | must be off. | object. What states can | | | VM be in? Also | Differences to | the VM be in? What | | | qemu-img | children | happens if this snapshot | | | snapshot -d | snapshots will be | is in-use? What happens | | | <name> <file> | merged with the | to parents and children? | | | command can be | children to keep | | | | used. What | children valid. | | | | happens if the | Parent for this | | | | disk is in-use? | snapshot will | | | | What happens to | become parent of | | | | parents and | any children | | | | children? | snapshots. | | | | How do we | | | | | handle merges? | | | +-------------------------------+-----------------+-------------------+------------------------------+
The domain can be in any state when deleting a snapshot, even if you delete the current snapshot. VMware has some documentation about how a snapshot is merged into its parent:
And some more general docs about snapshots:
Regarding what gets merged and where, I should define the terms I'm using first.
A <--1-- B <--2-- C <--3-- current + <--4-- D
I intentionally draw the arrows directed from child to parent.
A, B, C, D are what I call a snapshot, a point in "time" I can switch to. The disk differences between these points are stored in COW sparse images, here shown as 1, 2, 3, 4. The current state of the domain is denoted by the "current" item.
Each snapshot is associated with a disk image: A is associated with the base image, B with sparse image 1, C with 2 and so on. A special case is sparse image 3, it's not associated with a snapshot, but with the current state. Also each snapshot can be associated with a memory image (not shown here).
The current snapshot in this case is C. If the domain writes changes to disk, these changes get stored in sparse image 3. If you switch to another snapshot from here then the changes in 3 are lost, because you cannot go back to a point where you could access the changes in 3 again.
Now let's delete B. In this case the memory image associated with B is just discarded and 1 and 2 are merged into 5. That's what I was referring to when I said ESX merges snapshots into the parent.
A <------5------- C <--3-- current + <--4-- D
But this only happens for snapshots like B, that have a parent and a child (C is such a snapshot too, even if its child isn't an actual snapshot). If you delete D in this example, then the changes in sparse image 4 are discarded, because there is no place where they could be merged. Merging 4 in the base image would alter A, merging 4 and 5 would alter C.
Now as I think of this in detail, it seems that the term "merging into the parent" is wrong.
In the next example we have snapshot E with parent B.
A <--1-- B <--2-- C <--3-- current + <--6-- E
Now what's going to happen if we delete B? In order to preserve C and E, the changes in 1 need to be merged into 2 and 6; this results in 1 + 2 = 5 and 1 + 6 = 7. Or rephrased: B is merged into its children C and E.
A <------5------- C <--3-- current + <------7------- E
So, virDomainSnapshotDelete's semantic for VirtualBox and ESX seems to be the same. I just used the wrong words to describe it at first. Sorry for that.
OK, that's very interesting to know. So VirtualBox and ESX seem to do the same thing here. This is the last thing I have to test with qemu to get its semantics; I'll get to that today, and then we can look again at the semantics of the flags to virDomainSnapshotDelete.
-- 
Chris Lalancette

2010/3/31 Chris Lalancette <clalance@redhat.com>:
On 03/30/2010 08:14 PM, Matthias Bolte wrote:
2010/3/30 Chris Lalancette <clalance@redhat.com>:
Hello, After our discussions about the snapshot API last week, I went ahead and implemented quite a bit of the API. I also went back to the ESX, Virtualbox, and QEMU API's to try and make sure our API's matched up. What's below is my revised API based on that survey. Following my revised API are notes that I took regarding how the libvirt API matches up to the various API's, and some questions about semantics that I had while doing the survey. More comments and questions are welcome.
Mapping of our interface to various hypervisors: +-------------------------------+-----------------+-------------------+------------------------------+ | Libvirt | Qemu | Virtualbox | ESX | +-------------------------------+-----------------+-------------------+------------------------------+ | virDomainSnapshotCreateXML | monitor command | takeSnapshot | CreateSnapshot_task | | | "savevm"; if | Snapshots can | takes a name, description, | | | snapshot name | be taken on | memory (true/false) and | | | is already in | powered off, | quiesce (true/false). | | | use, replaces | saved, running, | What does "memory" mean? |
If memory is true, ESX snapshots the memory of the domain too, otherwise only a disk snapshot is created.
Creating a disk-only snapshot is nearly instant, while creating a memory snapshot also requires a notable amount of time to write the memory image to disk.
Sorry, I misread the documentation yesterday. That's fairly clear. What's less clear to me is what happens when you take a disk-only snapshot, and then try to RevertToSnapshot from a running VM. What happens in that case?
If the domain is running and you revert to a disk-only snapshot then the domain gets shut down. Or rephrased: the power state associated with a disk-only snapshot is always powered-off, even if the domain was running while the disk-only snapshot was created.
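The revert behaviour described above can be sketched as a small state model. This is only an illustration of the rule as stated in this thread; the enum, the struct, and both function names are hypothetical and are not part of the vSphere or libvirt APIs. It also assumes that a snapshot taken with memory records whatever state the domain was in at creation time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical power states; not the vSphere or libvirt enums. */
typedef enum {
    POWER_OFF,
    POWER_RUNNING,
    POWER_PAUSED
} power_state;

/* Hypothetical snapshot record, for illustration only. */
typedef struct {
    bool has_memory;             /* the "memory" argument at creation time */
    power_state state_at_create; /* what the domain was doing at creation */
} snapshot_info;

/* Power state associated with a snapshot: a disk-only snapshot has no
 * memory image to resume from, so it always counts as powered off. */
power_state snapshot_power_state(const snapshot_info *snap)
{
    assert(snap != NULL);
    if (!snap->has_memory)
        return POWER_OFF;
    return snap->state_at_create;
}

/* State of the domain after a revert: the domain's current state does not
 * matter, it simply takes on the state associated with the snapshot. */
power_state state_after_revert(power_state current, const snapshot_info *snap)
{
    (void)current;
    return snapshot_power_state(snap);
}
```

Under this model, reverting a running domain to a disk-only snapshot yields POWER_OFF, which matches the "domain gets shut down" description.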
|                               | the previous    | or paused VMs.    | Should we model "quiesce"    |
The vSphere API docs give a good description of what the quiesce option does:
"If TRUE and the virtual machine is powered on when the snapshot is taken, VMware Tools is used to quiesce the file system in the virtual machine. This assures that a disk snapshot represents a consistent state of the guest file systems. If the virtual machine is powered off or VMware Tools are not available, the quiesce flag is ignored."
I assume "quiesce the file system" means to flush write caches and stuff like that.
This option is important if you want to create a disk-only snapshot of a running domain.
Exactly. I'm not sure this is going to be possible in general (and I guess it's not even really possible in ESX unless you install VMware Tools inside the guest). I'm inclined not to model it at the moment, although I could be convinced otherwise.
Yes, quiesce requires VMware Tools to be installed in the guest. Matthias
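The conditions in the quoted vSphere documentation boil down to a single predicate. A minimal sketch follows; the function name and its parameters are made up for illustration and do not correspond to any vSphere or libvirt call.

```c
#include <stdbool.h>

/* Hypothetical helper: does a requested quiesce actually happen?
 * Per the quoted docs, the flag is ignored when the virtual machine is
 * powered off or when VMware Tools are not available in the guest. */
bool quiesce_takes_effect(bool quiesce_requested,
                          bool powered_on,
                          bool tools_available)
{
    return quiesce_requested && powered_on && tools_available;
}
```

This is also why the option is hard to model generically: two of the three inputs depend on guest state that a hypervisor-independent API cannot guarantee.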

...
/* Get a handle to the current in-use snapshot for the domain */ virDomainSnapshotPtr virDomainSnapshotCurrent(virDomainPtr domain, unsigned int flags);
This call needs to be changed a bit to allow distinguishing between an error and a domain with no snapshots (i.e., no current snapshot): int virDomainSnapshotCurrent(virDomainPtr domain, virDomainSnapshotPtr *snapshot, unsigned int flags); Jirka
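To make the out-parameter proposal concrete, here is a self-contained sketch using mock types rather than the real libvirt structures. The return convention (0 on success, -1 on error, with *snapshot set to NULL when there is no current snapshot) is an assumption; the message above only gives the signature.

```c
#include <stddef.h>

/* Mock stand-ins for virDomainSnapshot and virDomain; illustration only. */
typedef struct mock_snapshot { const char *name; } mock_snapshot;
typedef struct mock_domain { mock_snapshot *current; } mock_domain;

/* Out-parameter variant: the return value reports success or failure,
 * while *snapshot carries the result. Assumed convention:
 *   return -1               -> error, *snapshot untouched
 *   return 0, *snapshot NULL -> domain has no current snapshot
 *   return 0, *snapshot set  -> handle to the current snapshot */
int mock_snapshot_current(mock_domain *dom,
                          mock_snapshot **snapshot,
                          unsigned int flags)
{
    if (dom == NULL || snapshot == NULL || flags != 0)
        return -1;
    *snapshot = dom->current;
    return 0;
}
```

With this shape a caller can tell "no current snapshot" apart from a real failure by checking the return value first and the pointer second.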

On Wed, Mar 31, 2010 at 06:16:12PM +0200, Jiri Denemark wrote:
...
/* Get a handle to the current in-use snapshot for the domain */ virDomainSnapshotPtr virDomainSnapshotCurrent(virDomainPtr domain, unsigned int flags);
This call needs to be changed a bit to allow distinguishing between an error and a domain with no snapshots (i.e., no current snapshot):
int virDomainSnapshotCurrent(virDomainPtr domain, virDomainSnapshotPtr *snapshot, unsigned int flags);
I think it'd be nicer to keep the first syntax, but add a way to query whether a guest has a current snapshot:
int virDomainSnapshotHasCurrent(virDomainPtr dom);
That way, virDomainSnapshotCurrent() has no need to distinguish between the 'no snapshots' error and other types of errors.
Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
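The query-first alternative can be sketched the same way, again with mock types instead of the real libvirt structures. The 1/0/-1 return convention for the query is an assumption, as is the helper demo_current_name, which exists only to show the calling pattern.

```c
#include <stddef.h>

/* Mock stand-ins for virDomainSnapshot and virDomain; illustration only. */
typedef struct demo_snapshot { const char *name; } demo_snapshot;
typedef struct demo_domain { demo_snapshot *current; } demo_domain;

/* Query variant. Assumed convention: 1 if the domain has a current
 * snapshot, 0 if it has none, -1 on error. */
int demo_snapshot_has_current(const demo_domain *dom)
{
    if (dom == NULL)
        return -1;
    return dom->current != NULL ? 1 : 0;
}

/* Original pointer-returning shape: NULL always means failure, and
 * "no current snapshot" is just one of the possible failures. */
demo_snapshot *demo_snapshot_current(const demo_domain *dom,
                                     unsigned int flags)
{
    if (dom == NULL || flags != 0)
        return NULL;
    return dom->current;
}

/* Hypothetical caller: ask first, then fetch, so a NULL from the fetch
 * can be treated as a genuine error. */
const char *demo_current_name(const demo_domain *dom, const char *fallback)
{
    demo_snapshot *snap;

    if (demo_snapshot_has_current(dom) != 1)
        return fallback;
    snap = demo_snapshot_current(dom, 0);
    return snap != NULL ? snap->name : fallback;
}
```

The trade-off against the out-parameter form is two calls instead of one, in exchange for keeping the handle-returning signature consistent with the other lookup functions in the proposal.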
participants (6)
- Chris Lalancette
- Daniel P. Berrange
- Daniel Veillard
- Eric Blake
- Jiri Denemark
- Matthias Bolte