[libvirt] [PATCHv4 0/4] Add initial support for reverting snapshots

I'm reposting another version of the series. This version fixes small mistakes in patches 1-3 and reposts patch 4 as it got missing due to a sending error. Peter Krempa (4): qemu: Split out guts of qemuDomainSaveImageStartVM() to allow reuse snapshot: Add flag VIR_DOMAIN_SNAPSHOT_REVERT_STOPPED snapshot: Add flag to allow hypervisor restart when reverting snapshots snapshot: qemu: Implement reverting of external snapshots include/libvirt/libvirt.h.in | 3 + src/libvirt.c | 36 ++-- src/qemu/qemu_driver.c | 389 ++++++++++++++++++++++++++++++++++++++++--- tools/virsh-snapshot.c | 7 + tools/virsh.pod | 19 ++- 5 files changed, 410 insertions(+), 44 deletions(-) -- 1.8.0

The workhorse part of qemuDomainSaveImageStartVM can be reused while loading external snapshots. This patch splits the code out into a new function, qemuDomainSaveImageLoad, that does not set lifecycle events.
---
No changes, ACKed.
---
 src/qemu/qemu_driver.c | 53 ++++++++++++++++++++++++++++++++++----------------
 1 file changed, 36 insertions(+), 17 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 595c452..34aef0c 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -4952,19 +4952,19 @@ error:
     return -1;
 }

-static int ATTRIBUTE_NONNULL(4) ATTRIBUTE_NONNULL(5) ATTRIBUTE_NONNULL(6)
-qemuDomainSaveImageStartVM(virConnectPtr conn,
-                           struct qemud_driver *driver,
-                           virDomainObjPtr vm,
-                           int *fd,
-                           const struct qemud_save_header *header,
-                           const char *path,
-                           bool start_paused)
+/* this helper loads the save image into a new qemu process */
+static int
+qemuDomainSaveImageLoad(virConnectPtr conn,
+                        struct qemud_driver *driver,
+                        virDomainObjPtr vm,
+                        int *fd,
+                        const struct qemud_save_header *header,
+                        const char *path)
+
 {
-    int ret = -1;
-    virDomainEventPtr event;
     int intermediatefd = -1;
     virCommandPtr cmd = NULL;
+    int ret = -1;

     if (header->version == 2) {
         const char *prog = qemudSaveCompressionTypeToString(header->compressed);
@@ -4972,7 +4972,7 @@ qemuDomainSaveImageStartVM(virConnectPtr conn,
             virReportError(VIR_ERR_OPERATION_FAILED,
                            _("Invalid compressed save format %d"),
                            header->compressed);
-            goto out;
+            goto cleanup;
         }

         if (header->compressed != QEMUD_SAVE_FORMAT_RAW) {
@@ -4988,7 +4988,7 @@ qemuDomainSaveImageStartVM(virConnectPtr conn,
                                _("Failed to start decompression binary %s"),
                                prog);
                 *fd = intermediatefd;
-                goto out;
+                goto cleanup;
             }
         }
     }
@@ -5018,6 +5018,30 @@ qemuDomainSaveImageStartVM(virConnectPtr conn,
         ret = -1;
     }

+cleanup:
+    virCommandFree(cmd);
+    if (virSecurityManagerRestoreSavedStateLabel(driver->securityManager,
+                                                 vm->def, path) < 0)
+        VIR_WARN("failed to restore save state label on %s", path);
+
+    return ret;
+}
+
+static int ATTRIBUTE_NONNULL(4) ATTRIBUTE_NONNULL(5) ATTRIBUTE_NONNULL(6)
+qemuDomainSaveImageStartVM(virConnectPtr conn,
+                           struct qemud_driver *driver,
+                           virDomainObjPtr vm,
+                           int *fd,
+                           const struct qemud_save_header *header,
+                           const char *path,
+                           bool start_paused)
+{
+    virDomainEventPtr event;
+    int ret = -1;
+
+    ret = qemuDomainSaveImageLoad(conn, driver, vm, fd, header, path);
+
     if (ret < 0) {
         virDomainAuditStart(vm, "restored", false);
         goto out;
@@ -5058,11 +5082,6 @@ qemuDomainSaveImageStartVM(virConnectPtr conn,
     ret = 0;

 out:
-    virCommandFree(cmd);
-    if (virSecurityManagerRestoreSavedStateLabel(driver->securityManager,
-                                                 vm->def, path) < 0)
-        VIR_WARN("failed to restore save state label on %s", path);
-
     return ret;
 }
--
1.8.0

The current snapshot reverting API supports changing the state of the machine to either running or paused after the snapshot is reverted. This patch adds the ability to revert to the stopped state.
---
Fixed comment regarding use of this flag on transient domains.
---
 include/libvirt/libvirt.h.in |  1 +
 src/libvirt.c                | 31 ++++++++++++++++---------------
 tools/virsh-snapshot.c       |  3 +++
 tools/virsh.pod              | 15 +++++++++------
 4 files changed, 29 insertions(+), 21 deletions(-)

diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in
index 49a361a..d3ee588 100644
--- a/include/libvirt/libvirt.h.in
+++ b/include/libvirt/libvirt.h.in
@@ -3875,6 +3875,7 @@ typedef enum {
     VIR_DOMAIN_SNAPSHOT_REVERT_RUNNING = 1 << 0, /* Run after revert */
     VIR_DOMAIN_SNAPSHOT_REVERT_PAUSED  = 1 << 1, /* Pause after revert */
     VIR_DOMAIN_SNAPSHOT_REVERT_FORCE   = 1 << 2, /* Allow risky reverts */
+    VIR_DOMAIN_SNAPSHOT_REVERT_STOPPED = 1 << 3, /* Revert into stopped state */
 } virDomainSnapshotRevertFlags;

 /* Revert the domain to a point-in-time snapshot.  The
diff --git a/src/libvirt.c b/src/libvirt.c
index bdb1dc6..1e04d27 100644
--- a/src/libvirt.c
+++ b/src/libvirt.c
@@ -18638,17 +18638,17 @@ error:
 *
 * Revert the domain to a given snapshot.
 *
- * Normally, the domain will revert to the same state the domain was
- * in while the snapshot was taken (whether inactive, running, or
- * paused), except that disk snapshots default to reverting to
- * inactive state.  Including VIR_DOMAIN_SNAPSHOT_REVERT_RUNNING in
- * @flags overrides the snapshot state to guarantee a running domain
- * after the revert; or including VIR_DOMAIN_SNAPSHOT_REVERT_PAUSED in
- * @flags guarantees a paused domain after the revert.  These two
- * flags are mutually exclusive.  While a persistent domain does not
- * need either flag, it is not possible to revert a transient domain
- * into an inactive state, so transient domains require the use of one
- * of these two flags.
+ * Normally, the domain will revert to the same state the domain was in while
+ * the snapshot was taken (whether inactive, running, or paused), except that
+ * disk snapshots default to reverting to inactive state. Including
+ * VIR_DOMAIN_SNAPSHOT_REVERT_RUNNING in @flags overrides the snapshot state to
+ * guarantee a running domain after the revert; or including
+ * VIR_DOMAIN_SNAPSHOT_REVERT_PAUSED in @flags guarantees a paused domain after
+ * the revert. With VIR_DOMAIN_SNAPSHOT_REVERT_STOPPED contained in the flags
+ * the domain's memory state is not restored. These three flags are mutually
+ * exclusive. While a persistent domain does not need any of these flags,
+ * it is not possible to revert a transient domain into an inactive state,
+ * so transient domains require the use of either the running or paused flag.
 *
 * Reverting to any snapshot discards all configuration changes made since
 * the last snapshot.  Additionally, reverting to a snapshot from a running
@@ -18697,11 +18697,12 @@ virDomainRevertToSnapshot(virDomainSnapshotPtr snapshot,
         goto error;
     }

-    if ((flags & VIR_DOMAIN_SNAPSHOT_REVERT_RUNNING) &&
-        (flags & VIR_DOMAIN_SNAPSHOT_REVERT_PAUSED)) {
+    if ((!!(flags & VIR_DOMAIN_SNAPSHOT_REVERT_RUNNING) +
+         !!(flags & VIR_DOMAIN_SNAPSHOT_REVERT_PAUSED) +
+         !!(flags & VIR_DOMAIN_SNAPSHOT_REVERT_STOPPED)) > 1) {
         virReportInvalidArg(flags,
-                            _("running and paused flags in %s are mutually exclusive"),
-                            __FUNCTION__);
+                            _("running, paused and stopped flags in %s are "
+                              "mutually exclusive"), __FUNCTION__);
         goto error;
     }

diff --git a/tools/virsh-snapshot.c b/tools/virsh-snapshot.c
index 0bd9583..57d2baf 100644
--- a/tools/virsh-snapshot.c
+++ b/tools/virsh-snapshot.c
@@ -1517,6 +1517,7 @@ static const vshCmdOptDef opts_snapshot_revert[] = {
     {"current", VSH_OT_BOOL, 0, N_("revert to current snapshot")},
     {"running", VSH_OT_BOOL, 0, N_("after reverting, change state to running")},
     {"paused", VSH_OT_BOOL, 0, N_("after reverting, change state to paused")},
+    {"stopped", VSH_OT_BOOL, 0, N_("after reverting, change state to stopped")},
     {"force", VSH_OT_BOOL, 0, N_("try harder on risky reverts")},
     {NULL, 0, 0, NULL}
 };
@@ -1536,6 +1537,8 @@ cmdDomainSnapshotRevert(vshControl *ctl, const vshCmd *cmd)
         flags |= VIR_DOMAIN_SNAPSHOT_REVERT_RUNNING;
     if (vshCommandOptBool(cmd, "paused"))
         flags |= VIR_DOMAIN_SNAPSHOT_REVERT_PAUSED;
+    if (vshCommandOptBool(cmd, "stopped"))
+        flags |= VIR_DOMAIN_SNAPSHOT_REVERT_STOPPED;
     /* We want virsh snapshot-revert --force to work even when talking
      * to older servers that did the unsafe revert by default but
      * reject the flag, so we probe without the flag, and only use it
diff --git a/tools/virsh.pod b/tools/virsh.pod
index 29be39e..e940c5f 100644
--- a/tools/virsh.pod
+++ b/tools/virsh.pod
@@ -2810,7 +2810,7 @@ Output the name of the parent snapshot, if any, for the given I<snapshot>,
 or for the current snapshot with I<--current>.

 =item B<snapshot-revert> I<domain> {I<snapshot> | I<--current>}
-[{I<--running> | I<--paused>}] [I<--force>]
+[{I<--running> | I<--paused> | I<--stopped>}] [I<--force>]

 Revert the given domain to the snapshot specified by I<snapshot>, or to
 the current snapshot with I<--current>.  Be aware
@@ -2821,11 +2821,14 @@ the original snapshot was taken.

 Normally, reverting to a snapshot leaves the domain in the state it was
 at the time the snapshot was created, except that a disk snapshot with
-no vm state leaves the domain in an inactive state.  Passing either the
-I<--running> or I<--paused> flag will perform additional state changes
-(such as booting an inactive domain, or pausing a running domain).  Since
-transient domains cannot be inactive, it is required to use one of these
-flags when reverting to a disk snapshot of a transient domain.
+no vm state leaves the domain in an inactive state. Passing one of the
+I<--running>, I<--paused> or I<--stopped> flag will perform additional
+state changes such as booting an inactive domain, pausing a running domain
+or shutting the domain down after the snapshot is reverted. Since
+transient domains cannot be inactive, it is required to use one of
+I<--running> or I<--paused> flags when reverting to a disk snapshot of a
+transient domain. The I<--stopped> flag cannot be used on snapshots
+of transient domains.

 There are two cases where a snapshot revert involves extra risk, which
 requires the use of I<--force> to proceed.  One is the case of a
--
1.8.0

On 11/19/2012 09:06 AM, Peter Krempa wrote:
The current snapshot reverting API supports changing the state of the machine to either running or paused after the snapshot is reverted.
This patch adds the ability to revert to the stopped state.
---
Fixed comment regarding use of this flag on transient domains.
---
 include/libvirt/libvirt.h.in |  1 +
 src/libvirt.c                | 31 ++++++++++++++++---------------
 tools/virsh-snapshot.c       |  3 +++
 tools/virsh.pod              | 15 +++++++++------
 4 files changed, 29 insertions(+), 21 deletions(-)
ACK.

--
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Some hypervisors require a respawn of the hypervisor process to allow reverting to some snapshot states. This patch adds a flag to override the default safe approach that does not allow this. When this flag is specified, the hypervisor driver should re-emit events to allow management apps to reconnect.

This flag is meant as a lesser way to enforce the restart of the hypervisor, which is a fairly common possibility compared to the other meanings the existing force flag has.
---
Force now selects this flag and this flag can be used with internal snapshots too.
---
 include/libvirt/libvirt.h.in | 2 ++
 src/libvirt.c                | 5 +++++
 src/qemu/qemu_driver.c       | 8 ++++++--
 tools/virsh-snapshot.c       | 4 ++++
 tools/virsh.pod              | 6 +++++-
 5 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in
index d3ee588..ac24ed8 100644
--- a/include/libvirt/libvirt.h.in
+++ b/include/libvirt/libvirt.h.in
@@ -3876,6 +3876,8 @@ typedef enum {
     VIR_DOMAIN_SNAPSHOT_REVERT_PAUSED  = 1 << 1, /* Pause after revert */
     VIR_DOMAIN_SNAPSHOT_REVERT_FORCE   = 1 << 2, /* Allow risky reverts */
     VIR_DOMAIN_SNAPSHOT_REVERT_STOPPED = 1 << 3, /* Revert into stopped state */
+    VIR_DOMAIN_SNAPSHOT_REVERT_RESPAWN = 1 << 4, /* Allow restarting of the
+                                                    hypervisor */
 } virDomainSnapshotRevertFlags;

 /* Revert the domain to a point-in-time snapshot.  The
diff --git a/src/libvirt.c b/src/libvirt.c
index 1e04d27..8b7de9f 100644
--- a/src/libvirt.c
+++ b/src/libvirt.c
@@ -18650,6 +18650,11 @@ error:
 * it is not possible to revert a transient domain into an inactive state,
 * so transient domains require the use of either the running or paused flag.
 *
+ * Some snapshot operations may require a restart of the hypervisor to complete
+ * successfuly. This is normally not allowed. To override this behavior add
+ * VIR_DOMAIN_SNAPSHOT_REVERT_RESPAWN to @flags. The hypervisor driver should
+ * re-emit the appropriate events to allow reconnect of management applications.
+ *
 * Reverting to any snapshot discards all configuration changes made since
 * the last snapshot.  Additionally, reverting to a snapshot from a running
 * domain is a form of data loss, since it discards whatever is in the
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 34aef0c..2178798 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -12129,7 +12129,11 @@ static int qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot,

     virCheckFlags(VIR_DOMAIN_SNAPSHOT_REVERT_RUNNING |
                   VIR_DOMAIN_SNAPSHOT_REVERT_PAUSED |
-                  VIR_DOMAIN_SNAPSHOT_REVERT_FORCE, -1);
+                  VIR_DOMAIN_SNAPSHOT_REVERT_FORCE |
+                  VIR_DOMAIN_SNAPSHOT_REVERT_RESPAWN, -1);
+
+    if (flags & VIR_DOMAIN_SNAPSHOT_REVERT_FORCE)
+        flags |= VIR_DOMAIN_SNAPSHOT_REVERT_RESPAWN;

     /* We have the following transitions, which create the following events:
      * 1. inactive -> inactive: none
@@ -12249,7 +12253,7 @@ static int qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot,

     if (config && !virDomainDefCheckABIStability(vm->def, config)) {
         virErrorPtr err = virGetLastError();

-        if (!(flags & VIR_DOMAIN_SNAPSHOT_REVERT_FORCE)) {
+        if (!(flags & VIR_DOMAIN_SNAPSHOT_REVERT_RESPAWN)) {
             /* Re-spawn error using correct category. */
             if (err->code == VIR_ERR_CONFIG_UNSUPPORTED)
                 virReportError(VIR_ERR_SNAPSHOT_REVERT_RISKY, "%s",
diff --git a/tools/virsh-snapshot.c b/tools/virsh-snapshot.c
index 57d2baf..8897b7e 100644
--- a/tools/virsh-snapshot.c
+++ b/tools/virsh-snapshot.c
@@ -1519,6 +1519,8 @@ static const vshCmdOptDef opts_snapshot_revert[] = {
     {"paused", VSH_OT_BOOL, 0, N_("after reverting, change state to paused")},
     {"stopped", VSH_OT_BOOL, 0, N_("after reverting, change state to stopped")},
     {"force", VSH_OT_BOOL, 0, N_("try harder on risky reverts")},
+    {"allow-respawn", VSH_OT_BOOL, 0,
+     N_("allow respawn of hypervisor on certain operations")},
     {NULL, 0, 0, NULL}
 };
@@ -1539,6 +1541,8 @@ cmdDomainSnapshotRevert(vshControl *ctl, const vshCmd *cmd)
         flags |= VIR_DOMAIN_SNAPSHOT_REVERT_PAUSED;
     if (vshCommandOptBool(cmd, "stopped"))
         flags |= VIR_DOMAIN_SNAPSHOT_REVERT_STOPPED;
+    if (vshCommandOptBool(cmd, "allow-respawn"))
+        flags |= VIR_DOMAIN_SNAPSHOT_REVERT_RESPAWN;
     /* We want virsh snapshot-revert --force to work even when talking
      * to older servers that did the unsafe revert by default but
      * reject the flag, so we probe without the flag, and only use it
diff --git a/tools/virsh.pod b/tools/virsh.pod
index e940c5f..0269709 100644
--- a/tools/virsh.pod
+++ b/tools/virsh.pod
@@ -2810,7 +2810,7 @@ Output the name of the parent snapshot, if any, for the given I<snapshot>,
 or for the current snapshot with I<--current>.

 =item B<snapshot-revert> I<domain> {I<snapshot> | I<--current>}
-[{I<--running> | I<--paused> | I<--stopped>}] [I<--force>]
+[{I<--running> | I<--paused> | I<--stopped>}] [I<--force>] [I<--respawn>]

 Revert the given domain to the snapshot specified by I<snapshot>, or to
 the current snapshot with I<--current>.  Be aware
@@ -2830,6 +2830,10 @@ I<--running> or I<--paused> flags when reverting to a disk snapshot of a
 transient domain. The I<--stopped> flag cannot be used on snapshots
 of transient domains.

+Some snapshot revert approaches may require a respawn of the hypervisor
+process. This is not allowed by default. You may specify I<--allow-respawn>
+to override this limit.
+
 There are two cases where a snapshot revert involves extra risk, which
 requires the use of I<--force> to proceed.  One is the case of a
--
1.8.0

On 11/19/2012 09:06 AM, Peter Krempa wrote:
Some hypervisors require a respawn of the hypervisor process to allow reverting to some snapshot states. This patch adds a flag to override the default safe approach that does not allow this. When this flag is specified, the hypervisor driver should re-emit events to allow management apps to reconnect.
This flag is meant as a lesser way to enforce the restart of the hypervisor, which is a fairly common possibility compared to the other meanings the existing force flag has.
---
Force now selects this flag and this flag can be used with internal snapshots too.
---
@@ -12129,7 +12129,11 @@ static int qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot,
     virCheckFlags(VIR_DOMAIN_SNAPSHOT_REVERT_RUNNING |
                   VIR_DOMAIN_SNAPSHOT_REVERT_PAUSED |
-                  VIR_DOMAIN_SNAPSHOT_REVERT_FORCE, -1);
+                  VIR_DOMAIN_SNAPSHOT_REVERT_FORCE |
+                  VIR_DOMAIN_SNAPSHOT_REVERT_RESPAWN, -1);
+
+    if (flags & VIR_DOMAIN_SNAPSHOT_REVERT_FORCE)
+        flags |= VIR_DOMAIN_SNAPSHOT_REVERT_RESPAWN;
Up to here looks fine.
     /* We have the following transitions, which create the following events:
      * 1. inactive -> inactive: none
@@ -12249,7 +12253,7 @@ static int qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot,
Hmm, this is incomplete. Earlier, inside a VIR_DOMAIN_SNAPSHOT_REVERT_FORCE check, we have:

        if (virDomainObjIsActive(vm) &&
            !(snap->def->state == VIR_DOMAIN_RUNNING
              || snap->def->state == VIR_DOMAIN_PAUSED) &&
            (flags & (VIR_DOMAIN_SNAPSHOT_REVERT_RUNNING |
                      VIR_DOMAIN_SNAPSHOT_REVERT_PAUSED))) {
            virReportError(VIR_ERR_SNAPSHOT_REVERT_RISKY, "%s",
                           _("must respawn qemu to start inactive snapshot"));
            goto cleanup;

which should be converted to the new flag.
     if (config && !virDomainDefCheckABIStability(vm->def, config)) {
         virErrorPtr err = virGetLastError();
-        if (!(flags & VIR_DOMAIN_SNAPSHOT_REVERT_FORCE)) {
+        if (!(flags & VIR_DOMAIN_SNAPSHOT_REVERT_RESPAWN)) {
             /* Re-spawn error using correct category. */
I think this one is correct, though.
@@ -2830,6 +2830,10 @@ I<--running> or I<--paused> flags when reverting to a disk snapshot of a
 transient domain. The I<--stopped> flag cannot be used on snapshots
 of transient domains.
+Some snapshot revert approaches may require a respawn of the hypervisor
+process. This is not allowed by default. You may specify I<--allow-respawn>
+to override this limit.
It might be worth mentioning that --force implies --allow-respawn (that is, --force is a superset of --allow-respawn, in that it can force situations that --allow-respawn will not).

--
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On 11/19/2012 03:53 PM, Eric Blake wrote:
On 11/19/2012 09:06 AM, Peter Krempa wrote:
Some hypervisors require a respawn of the hypervisor process to allow reverting to some snapshot states. This patch adds a flag to override the default safe approach that does not allow this. When this flag is specified, the hypervisor driver should re-emit events to allow management apps to reconnect.
@@ -2830,6 +2830,10 @@ I<--running> or I<--paused> flags when reverting to a disk snapshot of a
 transient domain. The I<--stopped> flag cannot be used on snapshots
 of transient domains.
+Some snapshot revert approaches may require a respawn of the hypervisor
+process. This is not allowed by default. You may specify I<--allow-respawn>
+to override this limit.
It might be worth mentioning that --force implies --allow-respawn (that is, --force is a superset of --allow-respawn, in that it can force situations that --allow-respawn will not).
One more possible problem - the existing virsh code fakes '_FORCE' even for older servers, by first trying without _FORCE, and then only adding it if it would make a difference. Should we also fake '_RESPAWN' in the same manner? Then again, if --allow-respawn fails because the server is too old, you can still use --force; so I think we're okay.

Here's what I'm changing before my repost:

diff --git i/src/libvirt.c w/src/libvirt.c
index a8ce47b..762d829 100644
--- i/src/libvirt.c
+++ w/src/libvirt.c
@@ -18712,8 +18712,9 @@ error:
 *
 * Some snapshot operations may require a restart of the hypervisor to complete
 * successfuly. This is normally not allowed. To override this behavior add
- * VIR_DOMAIN_SNAPSHOT_REVERT_RESPAWN to @flags. The hypervisor driver should
- * re-emit the appropriate events to allow reconnect of management applications.
+ * VIR_DOMAIN_SNAPSHOT_REVERT_RESPAWN to @flags. The hypervisor driver will
+ * re-emit the appropriate events to allow reconnect of management
+ * applications. This flag is implied by VIR_DOMAIN_SNAPSHOT_REVERT_FORCE.
 *
 * Reverting to any snapshot discards all configuration changes made since
 * the last snapshot.  Additionally, reverting to a snapshot from a running
@@ -18735,7 +18736,8 @@ error:
 * such as VNC or Spice graphics) - this condition arises from active
 * snapshots that are provably ABI incomaptible, as well as from
 * inactive snapshots with a @flags request to start the domain after
- * the revert.
+ * the revert (this latter situation is also covered via the weaker
+ * VIR_DOMAIN_SNAPSHOT_REVERT_RESPAWN, as of libvirt 1.0.1).
 *
 * Returns 0 if the creation is successful, -1 on error.
 */
diff --git i/src/qemu/qemu_driver.c w/src/qemu/qemu_driver.c
index 3de8197..b873604 100644
--- i/src/qemu/qemu_driver.c
+++ w/src/qemu/qemu_driver.c
@@ -12182,24 +12182,23 @@ static int qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot,
                        "yet"));
         goto cleanup;
     }
-    if (!(flags & VIR_DOMAIN_SNAPSHOT_REVERT_FORCE)) {
-        if (!snap->def->dom) {
-            virReportError(VIR_ERR_SNAPSHOT_REVERT_RISKY,
-                           _("snapshot '%s' lacks domain '%s' rollback info"),
-                           snap->def->name, vm->def->name);
-            goto cleanup;
-        }
-        if (virDomainObjIsActive(vm) &&
-            !(snap->def->state == VIR_DOMAIN_RUNNING
-              || snap->def->state == VIR_DOMAIN_PAUSED) &&
-            (flags & (VIR_DOMAIN_SNAPSHOT_REVERT_RUNNING |
-                      VIR_DOMAIN_SNAPSHOT_REVERT_PAUSED))) {
-            virReportError(VIR_ERR_SNAPSHOT_REVERT_RISKY, "%s",
-                           _("must respawn qemu to start inactive snapshot"));
-            goto cleanup;
-        }
+    if (!(flags & VIR_DOMAIN_SNAPSHOT_REVERT_FORCE) &&
+        !snap->def->dom) {
+        virReportError(VIR_ERR_SNAPSHOT_REVERT_RISKY,
+                       _("snapshot '%s' lacks domain '%s' rollback info"),
+                       snap->def->name, vm->def->name);
+        goto cleanup;
+    }
+    if (!(flags & VIR_DOMAIN_SNAPSHOT_REVERT_RESPAWN) &&
+        virDomainObjIsActive(vm) &&
+        !(snap->def->state == VIR_DOMAIN_RUNNING
+          || snap->def->state == VIR_DOMAIN_PAUSED) &&
+        (flags & (VIR_DOMAIN_SNAPSHOT_REVERT_RUNNING |
+                  VIR_DOMAIN_SNAPSHOT_REVERT_PAUSED))) {
+        virReportError(VIR_ERR_SNAPSHOT_REVERT_RISKY, "%s",
+                       _("must respawn qemu to start inactive snapshot"));
+        goto cleanup;
     }
-
     if (vm->current_snapshot) {
         vm->current_snapshot->def->current = false;
diff --git i/tools/virsh.pod w/tools/virsh.pod
index 01b3625..cb41352 100644
--- i/tools/virsh.pod
+++ w/tools/virsh.pod
@@ -2845,25 +2845,18 @@ I<--running> or I<--paused> flags when reverting to a disk snapshot of a
 transient domain. The I<--stopped> flag cannot be used on snapshots
 of transient domains.

-Some snapshot revert approaches may require a respawn of the hypervisor
-process. This is not allowed by default. You may specify I<--allow-respawn>
-to override this limit.
-
-There are two cases where a snapshot revert involves extra risk, which
-requires the use of I<--force> to proceed.  One is the case of a
+There are some cases where a snapshot revert involves extra risk, and
+where the revert does not succeed by default but can happen with extra
+flags. The first case is any time that reverting from a running domain
+would require respawning the hypervisor, rather than reusing the existing
+one; this action can impact SPICE and VNC connections, and requires the
+I<--allow-respawn> or I<--force> flag. Another problem is the case of a
 snapshot that lacks full domain information for reverting
 configuration (such as snapshots created prior to libvirt 0.9.5);
 since libvirt cannot prove that the current configuration matches what
 was in use at the time of the snapshot, supplying I<--force> assures
 libvirt that the snapshot is compatible with the current configuration
-(and if it is not, the domain will likely fail to run).  The other is
-the case of reverting from a running domain to an active state where a
-new hypervisor has to be created rather than reusing the existing
-hypervisor, because it implies drawbacks such as breaking any existing
-VNC or Spice connections; this condition happens with an active
-snapshot that uses a provably incompatible configuration, as well as
-with an inactive snapshot that is combined with the I<--start> or
-I<--pause> flag.
+(and if it is not, the domain will likely fail to run).

 =item B<snapshot-delete> I<domain> {I<snapshot> | I<--current>}
 [I<--metadata>] [{I<--children> | I<--children-only>}]

--
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

This patch adds support for reverting of external snapshots. The support is still somewhat limited: you can only revert to a snapshot that has no children, or delete the children whose image chains would be invalidated.

While reverting an external snapshot, the domain has to be taken offline, as there's no way to start a migration from file on a running machine. This poses a few difficulties when the user has virt-viewer attached, as appropriate events need to be re-emitted even if the machine doesn't change states.
---
Adapt for the revert flag and a few minor fixes.
---
 src/qemu/qemu_driver.c | 330 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 325 insertions(+), 5 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 2178798..d66551b 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -12112,6 +12112,315 @@ qemuDomainSnapshotRevertInactive(struct qemud_driver *driver,
     return ret > 0 ? -1 : ret;
 }

+/* helper to create lifecycle for possible outcomes of reverting snapshots
+ * the reemit parameter is used when specific events need to be reemitted
+ * so that virt-viewer for example is able to re-connect */
+static int
+qemuDomainSnapshotCreateEvent(struct qemud_driver *driver,
+                              virDomainObjPtr vm,
+                              virDomainState old_state,
+                              virDomainState new_state,
+                              bool reemit)
+{
+    int reason = 0; /* generic unknown reason */
+    int type = -1;
+    virDomainEventPtr event;
+
+    switch (new_state) {
+    case VIR_DOMAIN_RUNNING:
+        switch (old_state) {
+        case VIR_DOMAIN_RUNNING:
+            if (!reemit)
+                break;
+            /* fallthrough */
+        case VIR_DOMAIN_NOSTATE:
+        case VIR_DOMAIN_SHUTOFF:
+        case VIR_DOMAIN_CRASHED:
+        case VIR_DOMAIN_BLOCKED:
+        case VIR_DOMAIN_SHUTDOWN:
+            type = VIR_DOMAIN_EVENT_STARTED;
+            reason = VIR_DOMAIN_EVENT_STARTED_FROM_SNAPSHOT;
+            break;
+
+        case VIR_DOMAIN_PMSUSPENDED:
+        case VIR_DOMAIN_PAUSED:
+            type = VIR_DOMAIN_EVENT_RESUMED;
+            reason = VIR_DOMAIN_EVENT_RESUMED_FROM_SNAPSHOT;
+            break;
+
+        case VIR_DOMAIN_LAST:
+            /* no event */
+            break;
+        }
+        break;
+
+    case VIR_DOMAIN_PAUSED:
+        switch (old_state) {
+        case VIR_DOMAIN_NOSTATE:
+        case VIR_DOMAIN_SHUTOFF:
+        case VIR_DOMAIN_CRASHED:
+            /* the machine was started here, so we need an additional event */
+            event = virDomainEventNewFromObj(vm,
+                        VIR_DOMAIN_EVENT_STARTED,
+                        VIR_DOMAIN_EVENT_STARTED_FROM_SNAPSHOT);
+            if (!event)
+                goto no_memory;
+            qemuDomainEventQueue(driver, event);
+            /* fallthrough! */
+        case VIR_DOMAIN_PMSUSPENDED:
+        case VIR_DOMAIN_RUNNING:
+        case VIR_DOMAIN_SHUTDOWN:
+        case VIR_DOMAIN_BLOCKED:
+            type = VIR_DOMAIN_EVENT_SUSPENDED;
+            reason = VIR_DOMAIN_EVENT_SUSPENDED_FROM_SNAPSHOT;
+            break;
+
+        case VIR_DOMAIN_PAUSED:
+        case VIR_DOMAIN_LAST:
+            /* no event */
+            break;
+        }
+        break;
+
+    case VIR_DOMAIN_SHUTOFF:
+        switch (old_state) {
+        case VIR_DOMAIN_NOSTATE:
+        case VIR_DOMAIN_BLOCKED:
+        case VIR_DOMAIN_SHUTDOWN:
+        case VIR_DOMAIN_PAUSED:
+        case VIR_DOMAIN_RUNNING:
+        case VIR_DOMAIN_PMSUSPENDED:
+            type = VIR_DOMAIN_EVENT_STOPPED;
+            reason = VIR_DOMAIN_EVENT_STOPPED_FROM_SNAPSHOT;
+            break;
+
+        case VIR_DOMAIN_SHUTOFF:
+        case VIR_DOMAIN_CRASHED:
+        case VIR_DOMAIN_LAST:
+            /* no event */
+            break;
+        }
+        break;
+
+    /* fallthrough */
+    case VIR_DOMAIN_NOSTATE:
+    case VIR_DOMAIN_BLOCKED:
+    case VIR_DOMAIN_SHUTDOWN:
+    case VIR_DOMAIN_CRASHED:
+    case VIR_DOMAIN_LAST:
+    case VIR_DOMAIN_PMSUSPENDED:
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("Suspicious domain state '%d' after snapshot"),
+                       new_state);
+        return -1;
+    }
+
+    if (!(event = virDomainEventNewFromObj(vm, type, reason)))
+        goto no_memory;
+
+    qemuDomainEventQueue(driver, event);
+
+    return 0;
+
+no_memory:
+    virReportOOMError();
+    return -1;
+}
+
+
+/* The domain and driver is expected to be locked */
+static int
+qemuDomainSnapshotRevertExternal(virConnectPtr conn,
+                                 struct qemud_driver *driver,
+                                 virDomainObjPtr vm,
+                                 virDomainSnapshotObjPtr snap,
+                                 unsigned int flags)
+{
+    int ret = -1;
+    int loadfd = -1;
+
+    virDomainDefPtr save_def = NULL;
+    struct qemud_save_header header;
+    virFileWrapperFdPtr wrapperFd = NULL;
+
+    virDomainState old_state = virDomainObjGetState(vm, NULL);
+    virDomainDefPtr config = NULL;
+
+    /* check if domain is running and should be running afterwards:
+     * qemu has to be re-started to allow the migration from file
+     */
+    if (!(flags & VIR_DOMAIN_SNAPSHOT_REVERT_RESPAWN) &&
+        virDomainObjIsActive(vm) &&
+        !(snap->def->state == VIR_DOMAIN_DISK_SNAPSHOT ||
+          snap->def->state == VIR_DOMAIN_SHUTOFF)) {
+        virReportError(VIR_ERR_SNAPSHOT_REVERT_RISKY,
+                       _("Reverting of snapshot '%s' needs to restart the "
+                         "hypervisor."), snap->def->name);
+        goto cleanup;
+    }
+
+    /* check if snapshot has children */
+    if (snap->nchildren > 0) {
+        if (!(flags & VIR_DOMAIN_SNAPSHOT_REVERT_FORCE)) {
+            virReportError(VIR_ERR_SNAPSHOT_REVERT_RISKY, "%s",
+                           _("Reverting of snapshots with children "
+                             "is not implemented yet."));
+            goto cleanup;
+        } else {
+            VIR_WARN("Invalidating image chain of snapshot '%s'.",
+                     snap->def->name);
+        }
+    }
+
+    if (vm->current_snapshot) {
+        vm->current_snapshot->def->current = false;
+        if (qemuDomainSnapshotWriteMetadata(vm, vm->current_snapshot,
+                                            driver->snapshotDir) < 0)
+            goto cleanup;
+        vm->current_snapshot = NULL;
+        /* XXX Should we restore vm->current_snapshot after this point
+         * in the failure cases where we know there was no change? */
+    }
+
+    /* Prepare to copy the snapshot inactive xml as the config of this
+     * domain. Easiest way is by a round trip through xml.
+     *
+     * XXX Should domain snapshots track live xml rather
+     * than inactive xml? */
+    snap->def->current = true;
+    if (snap->def->dom) {
+        char *xml;
+        if (!(xml = qemuDomainDefFormatXML(driver,
+                                           snap->def->dom,
+                                           VIR_DOMAIN_XML_INACTIVE |
+                                           VIR_DOMAIN_XML_SECURE |
+                                           VIR_DOMAIN_XML_MIGRATABLE)))
+            goto cleanup;
+
+        config = virDomainDefParseString(driver->caps, xml,
+                                         QEMU_EXPECTED_VIRT_TYPES,
+                                         VIR_DOMAIN_XML_INACTIVE);
+        VIR_FREE(xml);
+        if (!config)
+            goto cleanup;
+    } else {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("Domain definition not found in snapshot '%s'"),
+                       snap->def->name);
+        goto cleanup;
+    }
+
+    /* try to open the memory save image if one exists and it's needed */
+    if (!(flags & VIR_DOMAIN_SNAPSHOT_REVERT_STOPPED) &&
+        (snap->def->state == VIR_DOMAIN_RUNNING ||
+         snap->def->state == VIR_DOMAIN_PAUSED ||
+         flags & VIR_DOMAIN_SNAPSHOT_REVERT_RUNNING ||
+         flags & VIR_DOMAIN_SNAPSHOT_REVERT_PAUSED)) {
+
+        if (snap->def->memory == VIR_DOMAIN_SNAPSHOT_LOCATION_EXTERNAL) {
+            if (!snap->def->file) {
+                virReportError(VIR_ERR_INTERNAL_ERROR,
+                               _("Memory image path does not exist for "
+                                 "snapshot '%s'"), snap->def->name);
+                goto cleanup;
+            }
+
+            if (!virFileExists(snap->def->file)) {
+                virReportError(VIR_ERR_INTERNAL_ERROR,
+                               _("memory save file '%s' does not exist."),
+                               snap->def->file);
+                goto cleanup;
+            }
+
+            /* open the save image file */
+            if ((loadfd = qemuDomainSaveImageOpen(driver, snap->def->file,
+                                                  &save_def, &header,
+                                                  false, &wrapperFd, NULL,
+                                                  -1, false, false)) < 0)
+                goto cleanup;
+
+            /* opening succeeded, there's a big probability the revert will work.
+             * We can now get rid of the active qemu, if it runs */
+        } else {
+            virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                           _("Can't revert snapshot to running state: "
+                             "Memory image not found"));
+            goto cleanup;
+        }
+    }
+
+    if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_MODIFY) < 0)
+        goto cleanup;
+
+    /* Now we destroy the (possibly) running qemu process.
+     * Unfortunately, the guest can be restored only using incoming migration.
+     */
+    if (virDomainObjIsActive(vm)) {
+        qemuProcessStop(driver, vm,
+                        VIR_DOMAIN_SHUTOFF_FROM_SNAPSHOT, 0);
+        virDomainAuditStop(vm, "from-snapshot");
+    }
+
+    /* assign domain definition */
+    virDomainObjAssignDef(vm, config, false);
+    config = NULL;
+
+    /* wipe and re-create disk images - qemu-img doesn't care if it exists*/
+    if (qemuDomainSnapshotCreateInactiveExternal(driver, vm, snap, false) < 0)
+        goto endjob;
+
+    /* save the config files */
+    if (virDomainSaveConfig(driver->configDir, vm->def) < 0) {
+        virReportError(VIR_ERR_OPERATION_INVALID, "%s",
+                       _("Write to config file failed"));
+        goto endjob;
+    }
+
+    /* re-start the guest if necessary */
+    if (loadfd > 0) {
+        if ((ret = qemuDomainSaveImageLoad(conn, driver,
+                                           vm, &loadfd,
+                                           &header, snap->def->file)) < 0) {
+            virDomainAuditStart(vm, "from-snapshot", false);
+            goto endjob;
+        }
+
+        virDomainAuditStart(vm, "from-snapshot", true);
+
+        if ((snap->def->state == VIR_DOMAIN_RUNNING &&
+             !(flags & VIR_DOMAIN_SNAPSHOT_REVERT_PAUSED)) ||
+            flags & VIR_DOMAIN_SNAPSHOT_REVERT_RUNNING) {
+            if (qemuProcessStartCPUs(driver, vm, conn,
+                                     VIR_DOMAIN_RUNNING_FROM_SNAPSHOT,
+                                     QEMU_ASYNC_JOB_NONE) < 0)
+                goto endjob;
+        }
+    }
+
+    /* create corresponding life cycle events */
+    if (qemuDomainSnapshotCreateEvent(driver, vm, old_state,
+                                      virDomainObjGetState(vm, NULL),
+                                      !!(flags & VIR_DOMAIN_SNAPSHOT_REVERT_RESPAWN)) < 0)
+        goto endjob;
+
+    ret = 0;
+
+endjob:
+    if (qemuDomainObjEndJob(driver, vm) == 0) {
+        vm = NULL;
+    } else if (ret < 0 && !vm->persistent) {
+        qemuDomainRemoveInactive(driver, vm);
+        vm = NULL;
+    }
+
+cleanup:
+    virDomainDefFree(config);
+    virDomainDefFree(save_def);
+    VIR_FORCE_CLOSE(loadfd);
+    virFileWrapperFdFree(wrapperFd);
+
+    return ret;
+}
+
 static int qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot,
                                       unsigned int flags)
 {
@@ -12128,8 +12437,9 @@ static int qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot,
     virDomainDefPtr config = NULL;

     virCheckFlags(VIR_DOMAIN_SNAPSHOT_REVERT_RUNNING |
- VIR_DOMAIN_SNAPSHOT_REVERT_PAUSED | - VIR_DOMAIN_SNAPSHOT_REVERT_FORCE | + VIR_DOMAIN_SNAPSHOT_REVERT_PAUSED | + VIR_DOMAIN_SNAPSHOT_REVERT_STOPPED | + VIR_DOMAIN_SNAPSHOT_REVERT_FORCE | VIR_DOMAIN_SNAPSHOT_REVERT_RESPAWN, -1); if (flags & VIR_DOMAIN_SNAPSHOT_REVERT_FORCE) @@ -12176,12 +12486,21 @@ static int qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, "to revert to inactive snapshot")); goto cleanup; } - if (snap->def->state == VIR_DOMAIN_DISK_SNAPSHOT) { + + /* Check if reverting an external snapshot */ + if (virDomainSnapshotIsExternal(snap)) { + ret = qemuDomainSnapshotRevertExternal(snapshot->domain->conn, + driver, vm, snap, flags); + goto cleanup; + } + + if (flags & (VIR_DOMAIN_SNAPSHOT_REVERT_STOPPED)) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", - _("revert to external disk snapshot not supported " - "yet")); + _("VIR_DOMAIN_SNAPSHOT_REVERT_STOPPED is not supported " + "with internal snapshots")); goto cleanup; } + if (!(flags & VIR_DOMAIN_SNAPSHOT_REVERT_FORCE)) { if (!snap->def->dom) { virReportError(VIR_ERR_SNAPSHOT_REVERT_RISKY, @@ -12233,6 +12552,7 @@ static int qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, goto cleanup; } + /* Internal snapshot revert code starts below */ if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_MODIFY) < 0) goto cleanup; -- 1.8.0

On 11/19/2012 09:06 AM, Peter Krempa wrote:
This patch adds support for reverting external snapshots. The support is still somewhat limited (you can only revert to a snapshot that has no children, or delete the children whose image chains would be invalidated).
While reverting an external snapshot, the domain has to be taken offline, as there's no way to start an incoming migration from a file on a running machine. This poses a few difficulties when the user has virt-viewer attached, as the appropriate events need to be re-emitted even if the machine doesn't change state.
---
Adapt for the revert flag and a few minor fixes.
---
 src/qemu/qemu_driver.c | 330 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 325 insertions(+), 5 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 2178798..d66551b 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -12112,6 +12112,315 @@ qemuDomainSnapshotRevertInactive(struct qemud_driver *driver, return ret > 0 ? -1 : ret; }
+/* helper to create lifecycle for possible outcomes of reverting snapshots + * the reemit parameter is used when specific events need to be reemitted
Punctuation helps; since these two lines are different sentences: s/snapshots the/snapshots. The/
+ * so that virt-viewer for example is able to re-connect */ +static int +qemuDomainSnapshotCreateEvent(struct qemud_driver *driver,
+        }
+        break;
+
+        /* fallthrough */
Spurious comment.
+/* The domain and driver is expected to be locked */ +static int +qemuDomainSnapshotRevertExternal(virConnectPtr conn, + struct qemud_driver *driver, + virDomainObjPtr vm, + virDomainSnapshotObjPtr snap, + unsigned int flags) +{ + int ret = -1; + int loadfd = -1; + + virDomainDefPtr save_def = NULL; + struct qemud_save_header header; + virFileWrapperFdPtr wrapperFd = NULL; + + virDomainState old_state = virDomainObjGetState(vm, NULL); + virDomainDefPtr config = NULL; + + /* check if domain is running and should be running afterwards: + * qemu has to be re-started to allow the migration from file + */ + if (!(flags & VIR_DOMAIN_SNAPSHOT_REVERT_RESPAWN) && + virDomainObjIsActive(vm) && + !(snap->def->state == VIR_DOMAIN_DISK_SNAPSHOT || + snap->def->state == VIR_DOMAIN_SHUTOFF)) { + virReportError(VIR_ERR_SNAPSHOT_REVERT_RISKY, + _("Reverting of snapshot '%s' needs to restart the " + "hypervisor."), snap->def->name); + goto cleanup; + }
Makes sense.
+ + /* check if snapshot has children */ + if (snap->nchildren > 0) { + if (!(flags & VIR_DOMAIN_SNAPSHOT_REVERT_FORCE)) { + virReportError(VIR_ERR_SNAPSHOT_REVERT_RISKY, "%s", + _("Reverting of snapshots with children " + "is not implemented yet.")); + goto cleanup; + } else { + VIR_WARN("Invalidating image chain of snapshot '%s'.", + snap->def->name); + } + }
Aha - the difference between --force and --allow-respawn becomes even more apparent.
+ + if (vm->current_snapshot) { + vm->current_snapshot->def->current = false; + if (qemuDomainSnapshotWriteMetadata(vm, vm->current_snapshot, + driver->snapshotDir) < 0) + goto cleanup; + vm->current_snapshot = NULL; + /* XXX Should we restore vm->current_snapshot after this point + * in the failure cases where we know there was no change? */ + } + + /* Prepare to copy the snapshot inactive xml as the config of this + * domain. Easiest way is by a round trip through xml.
I've got a pending patch that may make this easier; depending on which of us gets something working and pushed first, I can rebase it to cover this additional call site.
+ * + * XXX Should domain snapshots track live xml rather + * than inactive xml? */
I think we can do a subsequent patch that cleans up all instances of this comment - by now, we should be pretty happy with non-live XML for external checkpoints, as we have proven that it is sufficient for our needs, and matches with the fact that we have to respawn qemu (live XML is only useful when re-using an existing qemu process).
+ snap->def->current = true; + if (snap->def->dom) { + char *xml; + if (!(xml = qemuDomainDefFormatXML(driver, + snap->def->dom, + VIR_DOMAIN_XML_INACTIVE | + VIR_DOMAIN_XML_SECURE | + VIR_DOMAIN_XML_MIGRATABLE))) + goto cleanup; + + config = virDomainDefParseString(driver->caps, xml, + QEMU_EXPECTED_VIRT_TYPES, + VIR_DOMAIN_XML_INACTIVE); + VIR_FREE(xml); + if (!config) + goto cleanup; + } else { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Domain definition not found in snapshot '%s'"), + snap->def->name); + goto cleanup; + } + + /* try to open the memory save image if one exists and it's needed */ + if (!(flags & VIR_DOMAIN_SNAPSHOT_REVERT_STOPPED) && + (snap->def->state == VIR_DOMAIN_RUNNING || + snap->def->state == VIR_DOMAIN_PAUSED || + flags & VIR_DOMAIN_SNAPSHOT_REVERT_RUNNING || + flags & VIR_DOMAIN_SNAPSHOT_REVERT_PAUSED)) { + + if (snap->def->memory == VIR_DOMAIN_SNAPSHOT_LOCATION_EXTERNAL) { + if (!snap->def->file) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Memory image path does not exist for " + "snapshot '%s'"), snap->def->name); + goto cleanup; + } + + if (!virFileExists(snap->def->file)) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("memory save file '%s' does not exist."), + snap->def->file); + goto cleanup; + } + + /* open the save image file */ + if ((loadfd = qemuDomainSaveImageOpen(driver, snap->def->file, + &save_def, &header, + false, &wrapperFd, NULL, + -1, false, false)) < 0) + goto cleanup; + + /* opening succeeded, there's a big probability the revert will work. + * We can now get rid of the active qemu, if it runs */ + } else { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("Can't revert snapshot to running state: " + "Memory image not found"));
Well, _technically_ we could revert into a running state by reverting the disk image and then booting it (of course, the guest would end up doing an fsck or such). But saving that for a followup patch is okay, given that we are still doing incremental improvements. In fact, for a disk-snapshot or offline external snapshot, there is no memory state to revert to, but that's expected. The only time where a missing memory state is unexpected is if the snapshot state was running or paused.
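The distinction drawn here can be expressed as a small predicate. The sketch below uses mock state and flag values (the names are illustrative, not the real libvirt constants) and mirrors the condition guarding the memory-image lookup in the patch: a missing memory image only matters when the revert is supposed to end in a live state and REVERT_STOPPED was not given.

```c
#include <assert.h>
#include <stdbool.h>

/* Mock states and flags; the values and names are illustrative only. */
typedef enum {
    STATE_SHUTOFF,
    STATE_RUNNING,
    STATE_PAUSED,
    STATE_DISK_SNAPSHOT
} snapState;

#define FLAG_REVERT_RUNNING (1 << 0)
#define FLAG_REVERT_PAUSED  (1 << 1)
#define FLAG_REVERT_STOPPED (1 << 2)

/* A missing memory image is only an error when the revert should end up
 * in a live state -- the snapshot was taken running/paused, or the caller
 * explicitly asked for RUNNING/PAUSED -- and the caller did not override
 * with REVERT_STOPPED. */
static bool
memoryImageRequired(snapState state, unsigned int flags)
{
    if (flags & FLAG_REVERT_STOPPED)
        return false;
    return state == STATE_RUNNING || state == STATE_PAUSED ||
           (flags & (FLAG_REVERT_RUNNING | FLAG_REVERT_PAUSED)) != 0;
}
```

For a disk-snapshot or offline external snapshot this predicate is false, matching the observation that the absence of memory state is expected there.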
+ goto cleanup; + } + } + + if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_MODIFY) < 0) + goto cleanup; + + /* Now we destroy the (possibly) running qemu process. + * Unfortunately, the guest can be restored only using incoming migration. + */ + if (virDomainObjIsActive(vm)) { + qemuProcessStop(driver, vm, + VIR_DOMAIN_SHUTOFF_FROM_SNAPSHOT, 0); + virDomainAuditStop(vm, "from-snapshot"); + } + + /* assign domain definition */ + virDomainObjAssignDef(vm, config, false); + config = NULL; + + /* wipe and re-create disk images - qemu-img doesn't care if it exists*/ + if (qemuDomainSnapshotCreateInactiveExternal(driver, vm, snap, false) < 0) + goto endjob;
This comment says you are throwing away state, but without the use of any flag guarding things that the user meant to throw away state. I'm a bit worried that we're missing a flag here to state that we are reverting to the point of the external snapshot _and_ want to throw away all changes that happened in the external file. Also, does qemuDomainSnapshotCreateInactiveExternal(,,,false) actually do what you want here? I thought that invocation fails if the file already exists (and passing true assumes that the file already exists, but does not otherwise touch it). Do we instead need to change that boolean into an 'int', with multiple values (0 and 1 for when the user first creates the snapshot, depending on whether they passed the REUSE_EXT flag, and 2 for this function, which says to forcefully recreate)?
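A minimal sketch of the suggested boolean-to-int change; the enum and helper names below are hypothetical, not actual libvirt identifiers, and only illustrate the three creation modes being proposed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical replacement for the boolean parameter of
 * qemuDomainSnapshotCreateInactiveExternal(); names are illustrative. */
typedef enum {
    SNAPSHOT_FILE_CREATE = 0,   /* fail if the file already exists */
    SNAPSHOT_FILE_REUSE = 1,    /* file must pre-exist (REUSE_EXT); leave it alone */
    SNAPSHOT_FILE_RECREATE = 2, /* forcefully re-create, wiping prior contents */
} snapshotFileMode;

/* Decide whether an already-existing file is acceptable for a given mode. */
static bool
snapshotFileModeAllowsExisting(snapshotFileMode mode)
{
    return mode != SNAPSHOT_FILE_CREATE;
}
```

The revert path would then pass the "forcefully recreate" mode, keeping the two snapshot-creation behaviors unchanged.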
+ + /* save the config files */ + if (virDomainSaveConfig(driver->configDir, vm->def) < 0) { + virReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("Write to config file failed")); + goto endjob; + } + + /* re-start the guest if necessary */ + if (loadfd > 0) {
>= 0 (not that we plan on it ever being libvirtd's stdin, but otherwise tools like Coverity will warn about a possible off-by-one)
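A tiny illustration of the off-by-one being pointed out: file descriptor 0 is a valid descriptor, so only negative values should mean "no fd". The helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* fd 0 is valid; `fd > 0` would silently skip the load if the
 * descriptor ever happened to be 0. */
static bool
haveSaveImageFd(int fd)
{
    return fd >= 0;
}
```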
+ if ((ret = qemuDomainSaveImageLoad(conn, driver, + vm, &loadfd, + &header, snap->def->file)) < 0) { + virDomainAuditStart(vm, "from-snapshot", false); + goto endjob; + } + + virDomainAuditStart(vm, "from-snapshot", true); + + if ((snap->def->state == VIR_DOMAIN_RUNNING && + !(flags & VIR_DOMAIN_SNAPSHOT_REVERT_PAUSED)) || + flags & VIR_DOMAIN_SNAPSHOT_REVERT_RUNNING) { + if (qemuProcessStartCPUs(driver, vm, conn, + VIR_DOMAIN_RUNNING_FROM_SNAPSHOT, + QEMU_ASYNC_JOB_NONE) < 0) + goto endjob; + } + } + + /* create corresponding life cycle events */ + if (qemuDomainSnapshotCreateEvent(driver, vm, old_state, + virDomainObjGetState(vm, NULL), + !!(flags & VIR_DOMAIN_SNAPSHOT_REVERT_RESPAWN)) < 0) + goto endjob; + + ret = 0; +endjob: + if (qemuDomainObjEndJob(driver, vm) == 0) { + vm = NULL; + } else if (ret < 0 && !vm->persistent) { + qemuDomainRemoveInactive(driver, vm); + vm = NULL; + } + +cleanup: + virDomainDefFree(config); + virDomainDefFree(save_def); + VIR_FORCE_CLOSE(loadfd); + virFileWrapperFdFree(wrapperFd); +
Overall, looks relatively good.
+ return ret; +} + static int qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, unsigned int flags) { @@ -12128,8 +12437,9 @@ static int qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, virDomainDefPtr config = NULL;
virCheckFlags(VIR_DOMAIN_SNAPSHOT_REVERT_RUNNING | - VIR_DOMAIN_SNAPSHOT_REVERT_PAUSED | - VIR_DOMAIN_SNAPSHOT_REVERT_FORCE | + VIR_DOMAIN_SNAPSHOT_REVERT_PAUSED | + VIR_DOMAIN_SNAPSHOT_REVERT_STOPPED | + VIR_DOMAIN_SNAPSHOT_REVERT_FORCE | VIR_DOMAIN_SNAPSHOT_REVERT_RESPAWN, -1);
if (flags & VIR_DOMAIN_SNAPSHOT_REVERT_FORCE) @@ -12176,12 +12486,21 @@ static int qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, "to revert to inactive snapshot")); goto cleanup; } - if (snap->def->state == VIR_DOMAIN_DISK_SNAPSHOT) { + + /* Check if reverting an external snapshot */ + if (virDomainSnapshotIsExternal(snap)) { + ret = qemuDomainSnapshotRevertExternal(snapshot->domain->conn, + driver, vm, snap, flags); + goto cleanup; + } + + if (flags & (VIR_DOMAIN_SNAPSHOT_REVERT_STOPPED)) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", - _("revert to external disk snapshot not supported " - "yet")); + _("VIR_DOMAIN_SNAPSHOT_REVERT_STOPPED is not supported " + "with internal snapshots"));
Technically, it _could_ be; but as a followup patch (that is, do a loadvm into paused state, then kill the VM; thus effectively reverting to _just_ the disk state of that internal checkpoint).
goto cleanup; } + if (!(flags & VIR_DOMAIN_SNAPSHOT_REVERT_FORCE)) { if (!snap->def->dom) { virReportError(VIR_ERR_SNAPSHOT_REVERT_RISKY, @@ -12233,6 +12552,7 @@ static int qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, goto cleanup; }
+ /* Internal snapshot revert code starts below */ if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_MODIFY) < 0) goto cleanup;
I'm still the most worried about the concept of whether we are blindly discarding user disk state since the snapshot was created, and whether we are properly recreating external files if that's what the user really wanted. I'm not sure whether to say ACK and then fix the fallout, or to wait a bit longer to see what else you come up with in the series. I guess we still have a few more development days before the 1.0.1 freeze to decide, so for now, I'd like to see what you have in store for snapshot deletion, and to also get my patches posted for snapshot revert-and-create, before I decide on this one. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 11/19/2012 04:51 PM, Eric Blake wrote:
On 11/19/2012 09:06 AM, Peter Krempa wrote:
This patch adds support for reverting external snapshots. The support is still somewhat limited (you can only revert to a snapshot that has no children, or delete the children whose image chains would be invalidated).
While reverting an external snapshot, the domain has to be taken offline, as there's no way to start an incoming migration from a file on a running machine. This poses a few difficulties when the user has virt-viewer attached, as the appropriate events need to be re-emitted even if the machine doesn't change state. --- Adapt for the revert flag and a few minor fixes.
+    /* wipe and re-create disk images - qemu-img doesn't care if it exists */
+    if (qemuDomainSnapshotCreateInactiveExternal(driver, vm, snap, false) < 0)
+        goto endjob;
This comment says you are throwing away state, but without the use of any flag guarding things that the user meant to throw away state. I'm a bit worried that we're missing a flag here to state that we are reverting to the point of the external snapshot _and_ want to throw away all changes that happened in the external file.
Also, does qemuDomainSnapshotCreateInactiveExternal(,,,false) actually do what you want here? I thought that invocation fails if the file already exists (and passing true assumes that the file already exists, but does not otherwise touch it). Do we instead need to change that boolean into an 'int', with multiple values (0 and 1 for when the user first creates the snapshot, depending on whether they passed the REUSE_EXT flag, and 2 for this function, which says to forcefully recreate)?
After thinking a bit more, I think we need the following additional flags to virDomainRevertToSnapshot, and require that exactly one of these three mutually exclusive flags is present when reverting to an external snapshot:

VIR_DOMAIN_SNAPSHOT_REVERT_FOLLOW - revert to the current state of disks after an external snapshot (I'm about to post work on this flag - yes, it will conflict with what you've done so far)

VIR_DOMAIN_SNAPSHOT_REVERT_RESET - revert to the point where an external snapshot was taken, and reset the external files to be exactly that state again (the semantics of _this_ patch)

VIR_DOMAIN_SNAPSHOT_REVERT_COMMIT - revert to the current state of disks after an external snapshot, but commit the state of the external files into the backing files. Not possible if there is another snapshot branched off the same backing file (unless we want to let _FORCE allow us to potentially invalidate those other branches)

I can also see the use for a revert-and-delete operation rolled into one, via a new flag:

VIR_DOMAIN_SNAPSHOT_REVERT_DISCARD - after reverting to a snapshot, delete it (that is, we no longer plan to revert to it again in the future). When mixed with FOLLOW, we keep the backing chain with no change to disk contents; when mixed with RESET, we keep the backing chain but reset it to start fresh; and when mixed with COMMIT, we shorten the backing chain back to its pre-snapshot length.

[I was debating whether the combination delete-and-revert fits better as part of the revert operation, via the REVERT_DISCARD flag, or whether it fits better as part of the delete operation, via a new flag there; but if we make it part of delete, then delete has to learn whether the domain should be running, paused, or stopped after an associated revert, whereas revert already has that logic.]
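The "exactly one of three" rule proposed above can be validated with a standard bit trick. The flag values below are placeholders (real values would be assigned in include/libvirt/libvirt.h.in without colliding with the existing VIR_DOMAIN_SNAPSHOT_REVERT_* flags); DISCARD may combine with any of the three modes, so it is excluded from the check.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit values for the proposed flags; illustrative only. */
enum {
    REVERT_FOLLOW  = (1 << 5),
    REVERT_RESET   = (1 << 6),
    REVERT_COMMIT  = (1 << 7),
    REVERT_DISCARD = (1 << 8),
};

/* Exactly one of FOLLOW, RESET, COMMIT must be set for an external
 * revert; "x & (x - 1)" clears the lowest set bit, so the result is 0
 * iff at most one bit was set. */
static bool
revertModeFlagsValid(unsigned int flags)
{
    unsigned int mode = flags & (REVERT_FOLLOW | REVERT_RESET | REVERT_COMMIT);
    return mode != 0 && (mode & (mode - 1)) == 0;
}
```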
I'm still the most worried about the concept of whether we are blindly discarding user disk state since the snapshot was created, and whether we are properly recreating external files if that's what the user really wanted. I'm not sure whether to say ACK and then fix the fallout, or to wait a bit longer to see what else you come up with in the series. I guess we still have a few more development days before the 1.0.1 freeze to decide, so for now, I'd like to see what you have in store for snapshot deletion, and to also get my patches posted for snapshot revert-and-create, before I decide on this one.
I think I could ACK this patch if you respun it to require my proposed RESET flag to match your intent of discarding user disk state. Also, before you decide too much, you'd better read up on my respin with my proposed FOLLOW flag.

On 11/21/12 01:37, Eric Blake wrote:
On 11/19/2012 04:51 PM, Eric Blake wrote:
On 11/19/2012 09:06 AM, Peter Krempa wrote:
This patch adds support for reverting external snapshots. The support is still somewhat limited (you can only revert to a snapshot that has no children, or delete the children whose image chains would be invalidated).
While reverting an external snapshot, the domain has to be taken offline, as there's no way to start an incoming migration from a file on a running machine. This poses a few difficulties when the user has virt-viewer attached, as the appropriate events need to be re-emitted even if the machine doesn't change state. --- Adapt for the revert flag and a few minor fixes.
+    /* wipe and re-create disk images - qemu-img doesn't care if it exists */
+    if (qemuDomainSnapshotCreateInactiveExternal(driver, vm, snap, false) < 0)
+        goto endjob;
This comment says you are throwing away state, but without the use of any flag guarding things that the user meant to throw away state. I'm a bit worried that we're missing a flag here to state that we are reverting to the point of the external snapshot _and_ want to throw away all changes that happened in the external file.
Also, does qemuDomainSnapshotCreateInactiveExternal(,,,false) actually do what you want here? I thought that invocation fails if the file already exists (and passing true assumes that the file already exists, but does not otherwise touch it). Do we instead need to change that boolean into an 'int', with multiple values (0 and 1 for when the user first creates the snapshot, depending on whether they passed the REUSE_EXT flag, and 2 for this function, which says to forcefully recreate)?
After thinking a bit more, I think we need the following additional flags to virDomainRevertToSnapshot, and require that exactly one of these three mutually exclusive flags is present when reverting to an external snapshot:
VIR_DOMAIN_SNAPSHOT_REVERT_FOLLOW - revert to the current state of disks after an external snapshot (I'm about to post work on this flag - yes, it will conflict with what you've done so far)
With this flag, the description of the API function is starting to drift away from its original "revertTo" semantic. This flag is useful in the meaning it has, but I'm not quite sure I like the implications it would have:

1) You cannot do this with a checkpoint unless you wish to revert without the memory state.

2) Even if you shut down the machine correctly, you will need to remember that and revert the checkpoint to the offline state with this flag.

This is the original reason I was talking about automatic meta-snapshots that could be used to revert to the current state of a branch that was left. When you want to leave a branch in a live state you have to create a checkpoint in any case, so this flag would be usable just for disk snapshots.
VIR_DOMAIN_SNAPSHOT_REVERT_RESET - revert to the point where an external snapshot was taken, and reset the external files to be exactly that state again (the semantics of _this_ patch)
and the original meaning of the "revertTo" semantic of this API
VIR_DOMAIN_SNAPSHOT_REVERT_COMMIT - revert to the current state of disks after an external snapshot, but commit the state of the external files into the backing files. Not possible if there is another snapshot branched off the same backing file (unless we want to let _FORCE allow us to potentially invalidate those other branches)
... and also invalidates checkpoints.
I can also see the use for a revert-and-delete operation rolled into one, via a new flag:
VIR_DOMAIN_SNAPSHOT_REVERT_DISCARD - after reverting to a snapshot, delete it (that is, we no longer plan to revert to it again in the future). When mixed with FOLLOW, we keep the backing chain with no change to disk contents; when mixed with RESET, we keep the backing chain but reset the backing chain to start fresh; and when mixed with COMMIT, we shorten the backing chain back to its pre-snapshot length.
This might be useful, but is basically syntax sugar (once snapshot deletion is in place).
[I was debating whether the combination delete-and-revert fits better as part of the revert operation, via the REVERT_DISCARD flag, or whether it fits better as part of the delete operation, via a new flag there; but if we make it part of delete, then delete has to learn whether the domain should be running, paused, or stopped after an associated revert, whereas revert already has that logic.]
I agree, this should be part of the revert operation.
I'm still the most worried about the concept of whether we are blindly discarding user disk state since the snapshot was created, and whether we are properly recreating external files if that's what the user really wanted. I'm not sure whether to say ACK and then fix the fallout, or to wait a bit longer to see what else you come up with in the series. I guess we still have a few more development days before the 1.0.1 freeze to decide, so for now, I'd like to see what you have in store for snapshot deletion, and to also get my patches posted for snapshot revert-and-create, before I decide on this one.
Right now, the only way this API can be used is if the snapshot is the last in the chain or if you specify the FORCE flag. So, in the current state of libvirt, you only lose the changes that are not part of a checkpoint/snapshot when you revert somewhere, _or_ you specified --force and thereby signed off on whatever may happen. As for re-creating the disk images, I tried the approach manually and qemu-img doesn't care whether the file exists; it simply overwrites it.
I think I could ACK this patch if you respun it to require my proposed RESET flag to match your intent of discarding user disk state. Also, before you decide too much, you'd better read up on my respin with my proposed FOLLOW flag.

On 11/21/2012 07:16 AM, Peter Krempa wrote:
+    /* wipe and re-create disk images - qemu-img doesn't care if it exists */
+    if (qemuDomainSnapshotCreateInactiveExternal(driver, vm, snap, false) < 0)
+        goto endjob;
This comment says you are throwing away state, but without the use of any flag guarding things that the user meant to throw away state. I'm a bit worried that we're missing a flag here to state that we are reverting to the point of the external snapshot _and_ want to throw away all changes that happened in the external file.
Also, does qemuDomainSnapshotCreateInactiveExternal(,,,false) actually do what you want here? I thought that invocation fails if the file already exists (and passing true assumes that the file already exists, but does not otherwise touch it). Do we instead need to change that boolean into an 'int', with multiple values (0 and 1 for when the user first creates the snapshot, depending on whether they passed the REUSE_EXT flag, and 2 for this function, which says to forcefully recreate)?
After thinking a bit more, I think we need the following additional flags to virDomainRevertToSnapshot, and require that exactly one of these three mutually exclusive flags is present when reverting to an external snapshot:
VIR_DOMAIN_SNAPSHOT_REVERT_FOLLOW - revert to the current state of disks after an external snapshot (I'm about to post work on this flag - yes, it will conflict with what you've done so far)
With this flag, the description of the API function is starting to drift away from its original "revertTo" semantic. This flag is useful in the meaning it has, but I'm not quite sure I like the implications it would have:
1) You cannot do this with a checkpoint unless you wish to revert without the memory state.
Yes, but if you were careful to shut down first before leaving a branch, that's what you want; and if you (management app) took a snapshot prior to leaving a branch, then you would revert to that NEW snapshot, rather than using revert-with-follow to the old snapshot.
2) Even if you shut down the machine correctly, you will need to remember that and revert the checkpoint to the offline state with this flag.
That's where the already-existing REVERT_RUNNING and REVERT_PAUSED flags come into play - revert-with-follow would default to shut off, but you can request that the guest be booted from the current disk state.
This is the original reason I was talking about automatic meta-snapshots that could be used to revert to the current state of a branch that was left. When you want to leave a branch in a live state you have to create a checkpoint in any case, so this flag would be usable just for disk snapshots.
Rather, the result of using this flag would be as if the automatic meta-snapshot was always a disk snapshot. If you leave an offline branch, all is well; if you leave an online branch, then the disk may have lost pending I/O (so don't revert from a running machine, if you want to return to that branch). The idea of doing an automatic meta-snapshot prior to leaving a branch may still make sense, but would it be internal or external? Offline snapshots (whether internal or external) are relatively fast; but online snapshots take time no matter whether they are internal or external (with internal having the further complication of having to use qcow2 disks). I guess we'll have to continue thinking about automatic snapshots before leaving a branch, but it doesn't affect the usefulness of your patches so far.
VIR_DOMAIN_SNAPSHOT_REVERT_RESET - revert to the point where an external snapshot was taken, and reset the external files to be exactly that state again (the semantics of _this_ patch)
and the original meaning of the "revertTo" semantic of this API
Hmm, maybe we don't need a flag here after all for your patch. Thinking about the internal snapshot case, it's important to remember that there is always an implicit state (the current), as well as any number of named states in a qcow2 file. Each cluster has a reference count based on how many entry states can reach that cluster (with copy-on-write semantics any time the current state goes to write to a cluster). Consider the following setup of snapshots (in 'virsh snapshot-list --tree' format):

a
  |
  +- b
      |
      +- c
      |   |
      |   +- d
      |
      +- e

created by taking snapshot 'a', taking snapshot 'b', taking snapshot 'c', taking snapshot 'd', reverting to snapshot 'b', then taking snapshot 'e'. The action of reverting to 'b' did NOT invalidate snapshots c or d, but did discard any state changes that happened after 'd'. Another way of looking at this is that reverting to 'b' really means changing the implicit 'current' tip to initially be identical to 'b'.

Looking at your patch, you are refusing to revert to anything but a leaf snapshot; that's good, because unlike the internal case, the moment we wipe an external file, we have lost any internal snapshots it may contain, and we have also invalidated any further external snapshots that use it as a backing file. So, your code will permit a revert to exactly one of two places: 'd' (if already executing on branch c/d) and 'e' (if already executing on branch e), and if there is no automatic snapshot taken at the tip, then the user is already throwing away running state, the same as they would be for reverting to an internal snapshot.

Okay, I think you've convinced me that your patch doesn't need a flag, and that it matches the semantics of internal revert, or else fails because it can't match the semantics (at which point, you can use --force to throw things away, or my revert-and-create to create a new branch from arbitrary points rather than leaf snapshots).
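The leaf-only restriction discussed here boils down to a one-line check on the child count (mirroring the snap->nchildren test in the patch). A sketch with a hypothetical helper name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* External revert (without --force) is only safe to a leaf snapshot:
 * wiping the external file of a non-leaf would invalidate every
 * descendant's backing chain. */
static bool
externalRevertAllowed(size_t nchildren, bool force)
{
    return nchildren == 0 || force;
}
```

In the example tree above, 'd' and 'e' are the only snapshots with a zero child count, so they are the only permitted targets without --force.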
VIR_DOMAIN_SNAPSHOT_REVERT_COMMIT - revert to the current state of disks after an external snapshot, but commit the state of the external files into the backing files. Not possible if there is another snapshot branched off the same backing file (unless we want to let _FORCE allow us to potentially invalidate those other branches)
... and also invalidates checkpoints.
Well, here we could recreate the checkpoint at the time of the new revert. After all, a commit operation (without a discard) says that we don't need to go back to the earlier point in time, but we still want to remember something at the current point in time. Also, this operation would be time-consuming (disk commits are slower than creating new qcow2 wrappers), so it would need to be under an async job and allow cancellation (but what does that mean to the snapshot if you cancel half-way through?).
I can also see the use for a revert-and-delete operation rolled into one, via a new flag:
VIR_DOMAIN_SNAPSHOT_REVERT_DISCARD - after reverting to a snapshot, delete it (that is, we no longer plan to revert to it again in the future). When mixed with FOLLOW, we keep the backing chain with no change to disk contents; when mixed with RESET, we keep the backing chain but reset the backing chain to start fresh; and when mixed with COMMIT, we shorten the backing chain back to its pre-snapshot length.
This might be useful, but is basically syntax sugar (once snapshot deletion is in place).
Not necessarily sugar - if you follow my argument about COMMIT needing to re-create the checkpoint state to the current point in time without this flag, then using this flag can omit that recreation step and definitely save time. But we can worry about that more when we actually get around to writing a COMMIT implementation.
Right now, the only way this API can be used is if the snapshot is the last in the chain or if you specify the FORCE flag. So with the current state of libvirt, if you revert somewhere you lose only the changes that are not part of a checkpoint/snapshot, _or_ you specified --force and thereby signed off on losing whatever that may destroy.
So the --force flag does let the user invalidate other snapshots/backing file chains/whatever, but in that case they asked for it. Meanwhile, since your version can only revert to leaf snapshots, and my revert-and-create can revert anywhere (to create a branch), and my proposed revert-and-follow flag can then switch branches (but up to the management to ensure sane state before leaving a branch), I think we are doing pretty good on the revert front. I'll post a rebase of your patch series merged with mine (as I'm finding I need some of your code to implement my revert-and-follow stuff).
As for re-creating the disk images, I tried the approach manually and qemu-img doesn't care whether the file already exists; it just overwrites it.
Indeed, now that I think about it more: via the public API, we reject things up-front if the image file already exists; and since our backdoor didn't repeat the check, then we get the right behavior for both the public API (the file doesn't exist, so create it as empty) and your revert API (forcefully recreate the file as empty). Cool!
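(Roughly, the behavior being described - recreating an image path is an unconditional truncate-and-write, so an existing file is silently replaced. A rough analogy with plain files, no qemu-img required:)

```python
# Illustration (my analogy, not qemu-img itself): "create" semantics
# that do not check for an existing file simply replace whatever was
# there, which is exactly what the revert path wants.
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "disk.qcow2")
with open(path, "w") as f:      # first "create": file now exists
    f.write("old guest data")
with open(path, "w") as f:      # "recreate": silently overwrites
    pass                        # fresh, empty image
assert os.path.getsize(path) == 0
```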
I think I could ACK this patch if you respun it to require my proposed RESET flag to match your intent of discarding user disk state. Also, before you decide too much, you'd better read up on my respin with my proposed FOLLOW flag.
Updated plan of attack - I'll repost your series with the final tweaks I think it needs as part of my series, and you can take that as my ACK of your code. So, I'm now working on the revert-with-follow semantics, and am assuming that you are working on the delete code. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 19.11.2012 17:06, Peter Krempa wrote:
We should really split qemu_driver.c - which now has ~15K LOC - e.g. by moving the snapshot code to a separate file. Michal

On 11/19/12 18:19, Michal Privoznik wrote:
We should really split qemu_driver.c - which has now ~15K LOC - e.g. by moving snapshot code to a separate file.
Michal
I'm planning on doing that when the snapshot work starts to wrap up, so I don't cause merge conflicts. Peter

On 11/19/2012 09:06 AM, Peter Krempa wrote:
Is there still a handy git repo holding these changes? My prior ACK on patch 1 still stands; I'm now looking at the others.