[libvirt] [PATCH 0/3] atomic snapshots, round 1

I figured it's better to post a small patch series for incremental review than to slam the list with a 20-patch series. I'm in the middle of coding up my RFC [1] for interaction with the new qemu 1.1 'transaction' monitor command, and this is the first step.

[1] https://www.redhat.com/archives/libvir-list/2012-March/msg00578.html

To avoid a merge conflict, this patch [2] needs to be applied first.

[2] https://www.redhat.com/archives/libvir-list/2012-March/msg00744.html

Eric Blake (3):
  snapshot: add qemu capability for 'transaction' command
  snapshot: add atomic create flag
  snapshot: rudimentary qemu support for atomic disk snapshot

 include/libvirt/libvirt.h.in |    2 ++
 src/libvirt.c                |   21 ++++++++++++++-------
 src/qemu/qemu_capabilities.c |    1 +
 src/qemu/qemu_capabilities.h |    1 +
 src/qemu/qemu_driver.c       |   40 +++++++++++++++++++++++++++-------------
 src/qemu/qemu_monitor_json.c |    3 +++
 tools/virsh.c                |    6 ++++++
 tools/virsh.pod              |   16 ++++++++++++++--
 8 files changed, 68 insertions(+), 22 deletions(-)

-- 
1.7.7.6

We need a capability bit to gracefully error out if some of the additions in future patches can't be implemented by the running qemu.

* src/qemu/qemu_capabilities.h (QEMU_CAPS_TRANSACTION): New cap.
* src/qemu/qemu_capabilities.c (qemuCaps): Name it.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONCheckCommands): Set it.
---

I haven't yet decided if I need to check for 'block-mirror' as an independent capability, but that can be a later patch.

 src/qemu/qemu_capabilities.c |    1 +
 src/qemu/qemu_capabilities.h |    1 +
 src/qemu/qemu_monitor_json.c |    3 +++
 3 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index ace5011..0e09d6d 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -155,6 +155,7 @@ VIR_ENUM_IMPL(qemuCaps, QEMU_CAPS_LAST,
               "system_wakeup",
               "scsi-disk.channel",
               "scsi-block",
+              "transaction",
     );

 struct qemu_feature_flags {
diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h
index 62b4270..78cdbe0 100644
--- a/src/qemu/qemu_capabilities.h
+++ b/src/qemu/qemu_capabilities.h
@@ -123,6 +123,7 @@ enum qemuCapsFlags {
     QEMU_CAPS_WAKEUP = 86, /* system_wakeup monitor command */
     QEMU_CAPS_SCSI_DISK_CHANNEL = 87, /* Is scsi-disk.channel available? */
     QEMU_CAPS_SCSI_BLOCK = 88, /* -device scsi-block */
+    QEMU_CAPS_TRANSACTION = 89, /* transaction monitor command */

     QEMU_CAPS_LAST, /* this must always be the last item */
 };
diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c
index ce68e69..4dd6924 100644
--- a/src/qemu/qemu_monitor_json.c
+++ b/src/qemu/qemu_monitor_json.c
@@ -884,6 +884,9 @@ qemuMonitorJSONCheckCommands(qemuMonitorPtr mon,

         if (STREQ(name, "system_wakeup"))
             qemuCapsSet(qemuCaps, QEMU_CAPS_WAKEUP);
+
+        if (STREQ(name, "transaction"))
+            qemuCapsSet(qemuCaps, QEMU_CAPS_TRANSACTION);
     }

     ret = 0;

-- 
1.7.7.6
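For readers following along, this is how such a capability bit is typically consumed once probed; a minimal sketch, not taken from the series. The wrapper function name is invented, while qemuCapsGet() and qemuReportError() are the internal helpers used elsewhere in these patches:

static int
exampleRequireTransaction(virDomainObjPtr vm)
{
    /* Hypothetical helper: refuse gracefully when the running qemu
     * never advertised the 'transaction' command at probe time. */
    qemuDomainObjPrivatePtr priv = vm->privateData;

    if (!qemuCapsGet(priv->qemuCaps, QEMU_CAPS_TRANSACTION)) {
        qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
                        _("this qemu lacks the 'transaction' command"));
        return -1;
    }
    return 0; /* safe to build and send a 'transaction' request */
}

Probing once at monitor startup and gating later features on the cached bit is the pattern the rest of the series relies on.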

On 03/16/2012 11:05 PM, Eric Blake wrote:
We need a capability bit to gracefully error out if some of the additions in future patches can't be implemented by the running qemu.
* src/qemu/qemu_capabilities.h (QEMU_CAPS_TRANSACTION): New cap.
* src/qemu/qemu_capabilities.c (qemuCaps): Name it.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONCheckCommands): Set it.
---
ACK, Peter

Right now, it is appallingly easy to cause qemu disk snapshots to alter a domain then fail; for example, by requesting a two-disk snapshot where the second disk name resides on read-only storage. In this failure scenario, libvirt reports failure, but modifies the live domain XML in-place to record that the first disk snapshot was taken; and places a difficult burden on the management app to grab the XML and reparse it to see which disks, if any, were altered by the partial snapshot.

This patch adds a new flag where implementations can request that the hypervisor make snapshots atomically; either no changes to XML occur, or all disks were altered as a group. If you request the flag, you either get outright failure up front, or you take advantage of hypervisor abilities to make an atomic snapshot. Of course, drivers should prefer the atomic means even without the flag explicitly requested.

There's no way to make snapshots 100% bulletproof - even if the hypervisor does it perfectly atomically, we could run out of memory during the followup tasks of updating our in-memory XML, and report a failure. However, these sorts of catastrophic failures are rare and unlikely, and it is still nicer to know that either all snapshots happened or none of them, as that is an easier state to recover from.

* include/libvirt/libvirt.h.in (VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC): New flag.
* src/libvirt.c (virDomainSnapshotCreateXML): Document it.
* tools/virsh.c (cmdSnapshotCreate, cmdSnapshotCreateAs): Expose it.
* tools/virsh.pod (snapshot-create, snapshot-create-as): Document it.
---

 include/libvirt/libvirt.h.in |    2 ++
 src/libvirt.c                |   21 ++++++++++++++-------
 tools/virsh.c                |    6 ++++++
 tools/virsh.pod              |   16 ++++++++++++++--
 4 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in
index 7d41642..4566580 100644
--- a/include/libvirt/libvirt.h.in
+++ b/include/libvirt/libvirt.h.in
@@ -3289,6 +3289,8 @@ typedef enum {
                                                       quiesce all mounted
                                                       file systems within
                                                       the domain */
+    VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC     = (1 << 7), /* atomically avoid
+                                                         partial changes */
 } virDomainSnapshotCreateFlags;

 /* Take a snapshot of the current VM state */
diff --git a/src/libvirt.c b/src/libvirt.c
index 7f8d42c..48d08c4 100644
--- a/src/libvirt.c
+++ b/src/libvirt.c
@@ -17119,14 +17119,21 @@ virDomainSnapshotGetConnect(virDomainSnapshotPtr snapshot)
  * @flags includes VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT, then existing
  * destination files are instead truncated and reused.
  *
- * Returns an (opaque) virDomainSnapshotPtr on success, NULL on failure.
  * Be aware that although libvirt prefers to report errors up front with
- * no other effect, there are certain types of failures where a failure
- * can occur even though the guest configuration was changed (for
- * example, if a disk snapshot request over two disks only fails on the
- * second disk, leaving the first disk altered); so after getting a NULL
- * return, it can be wise to use virDomainGetXMLDesc() to determine if
- * any partial changes occurred.
+ * no other effect, some hypervisors have certain types of failures where
+ * the overall command can easily fail even though the guest configuration
+ * was partially altered (for example, if a disk snapshot request for two
+ * disks fails on the second disk, but the first disk alteration cannot be
+ * rolled back).  If this API call fails, it is therefore normally
+ * necessary to follow up with virDomainGetXMLDesc() and check each disk
+ * to determine if any partial changes occurred.  However, if @flags
+ * contains VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC, then libvirt guarantees
+ * that this command will not alter any disks unless the entire set of
+ * changes can be done atomically, making failure recovery simpler (note
+ * that it is still possible to fail after disks have changed, but only
+ * in the much rarer cases of running out of memory or disk space).
+ *
+ * Returns an (opaque) virDomainSnapshotPtr on success, NULL on failure.
  */
 virDomainSnapshotPtr
 virDomainSnapshotCreateXML(virDomainPtr domain,
diff --git a/tools/virsh.c b/tools/virsh.c
index e48c56a..980e206 100644
--- a/tools/virsh.c
+++ b/tools/virsh.c
@@ -15715,6 +15715,7 @@ static const vshCmdOptDef opts_snapshot_create[] = {
     {"disk-only", VSH_OT_BOOL, 0, N_("capture disk state but not vm state")},
     {"reuse-external", VSH_OT_BOOL, 0, N_("reuse any existing external files")},
     {"quiesce", VSH_OT_BOOL, 0, N_("quiesce guest's file systems")},
+    {"atomic", VSH_OT_BOOL, 0, N_("require atomic operation")},
     {NULL, 0, 0, NULL}
 };

@@ -15741,6 +15742,8 @@ cmdSnapshotCreate(vshControl *ctl, const vshCmd *cmd)
         flags |= VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT;
     if (vshCommandOptBool(cmd, "quiesce"))
         flags |= VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE;
+    if (vshCommandOptBool(cmd, "atomic"))
+        flags |= VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC;

     if (!vshConnectionUsability(ctl, ctl->conn))
         goto cleanup;
@@ -15851,6 +15854,7 @@ static const vshCmdOptDef opts_snapshot_create_as[] = {
     {"disk-only", VSH_OT_BOOL, 0, N_("capture disk state but not vm state")},
     {"reuse-external", VSH_OT_BOOL, 0, N_("reuse any existing external files")},
     {"quiesce", VSH_OT_BOOL, 0, N_("quiesce guest's file systems")},
+    {"atomic", VSH_OT_BOOL, 0, N_("require atomic operation")},
     {"diskspec", VSH_OT_ARGV, 0,
      N_("disk attributes: disk[,snapshot=type][,driver=type][,file=name]")},
     {NULL, 0, 0, NULL}
@@ -15878,6 +15882,8 @@ cmdSnapshotCreateAs(vshControl *ctl, const vshCmd *cmd)
         flags |= VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT;
     if (vshCommandOptBool(cmd, "quiesce"))
         flags |= VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE;
+    if (vshCommandOptBool(cmd, "atomic"))
+        flags |= VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC;

     if (!vshConnectionUsability(ctl, ctl->conn))
         goto cleanup;
diff --git a/tools/virsh.pod b/tools/virsh.pod
index ecf6aaa..232bf3c 100644
--- a/tools/virsh.pod
+++ b/tools/virsh.pod
@@ -2270,7 +2270,7 @@ used to represent properties of snapshots.
 =item B<snapshot-create> I<domain> [I<xmlfile>] {[I<--redefine>
 [I<--current>]] | [I<--no-metadata>] [I<--halt>] [I<--disk-only>]
 [I<--reuse-external>]
-[I<--quiesce>]}
+[I<--quiesce>] [I<--atomic>]}

 Create a snapshot for domain I<domain> with the properties specified in
 I<xmlfile>. Normally, the only properties settable for a domain snapshot
@@ -2317,6 +2317,12 @@ to freeze and unfreeze domain's mounted file systems. However,
 if domain has no guest agent, snapshot creation will fail.
 Currently, this requires I<--disk-only> to be passed as well.

+If I<--atomic> is specified, libvirt will guarantee that the snapshot
+either succeeds, or fails with no changes; not all hypervisors support
+this.  If this flag is not specified, then some hypervisors may fail
+after partially performing the action, and B<dumpxml> must be used to
+see whether any partial changes occurred.
+
 Existence of snapshot metadata will prevent attempts to B<undefine>
 a persistent domain.  However, for transient domains, snapshot
 metadata is silently lost when the domain quits running (whether
@@ -2324,7 +2330,7 @@ by command such as B<destroy> or by internal guest action).

 =item B<snapshot-create-as> I<domain> {[I<--print-xml>]
 | [I<--no-metadata>] [I<--halt>] [I<--reuse-existing>]} [I<name>]
-[I<description>] [I<--disk-only> [I<--quiesce>]
+[I<description>] [I<--disk-only> [I<--quiesce>] [I<--atomic>]
 [[I<--diskspec>] B<diskspec>]...]

 Create a snapshot for domain I<domain> with the given <name> and
@@ -2364,6 +2370,12 @@ treat the snapshot as current, and cannot revert to the snapshot
 unless B<snapshot-create> is later used to teach libvirt about the
 metadata again).  This flag is incompatible with I<--print-xml>.

+If I<--atomic> is specified, libvirt will guarantee that the snapshot
+either succeeds, or fails with no changes; not all hypervisors support
+this.  If this flag is not specified, then some hypervisors may fail
+after partially performing the action, and B<dumpxml> must be used to
+see whether any partial changes occurred.
+
 =item B<snapshot-current> I<domain> {[I<--name>] | [I<--security-info>]
 | [I<snapshotname>]}

-- 
1.7.7.6
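To make the new flag's contract concrete, here is a minimal client-side sketch, not part of the patch; the connection URI, domain name, and snapshot XML are placeholders. It shows the recovery simplification the flag buys: on failure, no per-disk dumpxml inspection pass is needed.

#include <stdio.h>
#include <libvirt/libvirt.h>
#include <libvirt/virterror.h>

int main(void)
{
    virConnectPtr conn = virConnectOpen("qemu:///system");
    virDomainPtr dom = NULL;
    virDomainSnapshotPtr snap = NULL;
    /* Placeholder snapshot XML; a real caller lists <disks> here. */
    const char *xml = "<domainsnapshot/>";

    if (!conn || !(dom = virDomainLookupByName(conn, "example")))
        goto error;

    snap = virDomainSnapshotCreateXML(dom, xml,
                                      VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY |
                                      VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC);
    if (!snap) {
        /* With ATOMIC, failure means no disks were altered (barring
         * the rare late OOM/ENOSPC cases), so the caller can simply
         * report and retry instead of auditing every disk. */
        virErrorPtr err = virGetLastError();
        fprintf(stderr, "snapshot failed: %s\n",
                err ? err->message : "unknown error");
        goto error;
    }
    virDomainSnapshotFree(snap);
    virDomainFree(dom);
    virConnectClose(conn);
    return 0;

error:
    if (dom)
        virDomainFree(dom);
    if (conn)
        virConnectClose(conn);
    return 1;
}

The equivalent request from virsh, per the virsh.pod additions above, is 'snapshot-create-as example --disk-only --atomic'.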

On 03/16/2012 11:05 PM, Eric Blake wrote:

Right now, it is appallingly easy to cause qemu disk snapshots to alter a domain then fail; for example, by requesting a two-disk snapshot where the second disk name resides on read-only storage. In this failure scenario, libvirt reports failure, but modifies the live domain XML in-place to record that the first disk snapshot was taken; and places a difficult burden on the management app to grab the XML and reparse it to see which disks, if any, were altered by the partial snapshot.
This patch adds a new flag where implementations can request that the hypervisor make snapshots atomically; either no changes to XML occur, or all disks were altered as a group. If you request the flag, you either get outright failure up front, or you take advantage of hypervisor abilities to make an atomic snapshot. Of course, drivers should prefer the atomic means even without the flag explicitly requested.
There's no way to make snapshots 100% bulletproof - even if the hypervisor does it perfectly atomically, we could run out of memory during the followup tasks of updating our in-memory XML, and report a failure. However, these sorts of catastrophic failures are rare and unlikely, and it is still nicer to know that either all snapshots happened or none of them, as that is an easier state to recover from.
* include/libvirt/libvirt.h.in (VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC): New flag.
* src/libvirt.c (virDomainSnapshotCreateXML): Document it.
* tools/virsh.c (cmdSnapshotCreate, cmdSnapshotCreateAs): Expose it.
* tools/virsh.pod (snapshot-create, snapshot-create-as): Document it.
---

...

+ * that it is still possible to fail after disks have changed, but only
+ * in the much rarer cases of running out of memory or disk space).

(sometimes it's not that rare (but then you've also got some other problems) :D)

ACK, Peter

Taking an external snapshot of just one disk is atomic, without having to pause and resume the VM. This also paves the way for later patches to interact with the new qemu 'transaction' monitor command.

The various scenarios when requesting atomic are:
  online, 1 disk, old qemu - safe, allowed by this patch
  online, more than 1 disk, old qemu - failure, this patch
  offline snapshot - safe, once a future patch implements offline disk snapshot
  online, 1 or more disks, new qemu - safe, once future patch uses transaction

* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateXML): Support new flag for single-disk setups.
(qemuDomainSnapshotDiskPrepare): Check for atomic here.
(qemuDomainSnapshotCreateDiskActive): Skip pausing the VM when atomic supported.
---

 src/qemu/qemu_driver.c |   40 +++++++++++++++++++++++++++-------------
 1 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 13ca92f..a46ce10 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -9712,13 +9712,16 @@ endjob:
 static int
 qemuDomainSnapshotDiskPrepare(virDomainObjPtr vm, virDomainSnapshotDefPtr def,
-                              bool allow_reuse)
+                              unsigned int *flags)
 {
     int ret = -1;
     int i;
     bool found = false;
     bool active = virDomainObjIsActive(vm);
     struct stat st;
+    bool allow_reuse = (*flags & VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT) != 0;
+    bool atomic = (*flags & VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC) != 0;
+    int external = 0;

     for (i = 0; i < def->ndisks; i++) {
         virDomainSnapshotDiskDefPtr disk = &def->disks[i];
@@ -9773,6 +9776,7 @@ qemuDomainSnapshotDiskPrepare(virDomainObjPtr vm, virDomainSnapshotDefPtr def,
                 goto cleanup;
             }
             found = true;
+            external++;
             break;

         case VIR_DOMAIN_DISK_SNAPSHOT_NO:
@@ -9792,6 +9796,13 @@ qemuDomainSnapshotDiskPrepare(virDomainObjPtr vm, virDomainSnapshotDefPtr def,
                         "selected for snapshot"));
         goto cleanup;
     }
+    if (atomic && external > 1) {
+        qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                        _("atomic snapshot of multiple disks is unsupported"));
+        goto cleanup;
+    } else if (external == 1) {
+        *flags |= VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC;
+    }

     ret = 0;

@@ -9920,6 +9931,7 @@ qemuDomainSnapshotCreateDiskActive(virConnectPtr conn,
     int i;
     bool persist = false;
     int thaw = 0; /* 1 if freeze succeeded, -1 if freeze failed */
+    bool atomic = (flags & VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC) != 0;

     if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_MODIFY) < 0)
         return -1;
@@ -9944,14 +9956,14 @@ qemuDomainSnapshotCreateDiskActive(virConnectPtr conn,
         }
     }

-    if (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) {
-        /* In qemu, snapshot_blkdev on a single disk will pause cpus,
-         * but this confuses libvirt since notifications are not given
-         * when qemu resumes.  And for multiple disks, libvirt must
-         * pause externally to get all snapshots to be at the same
-         * point in time.  For simplicitly, we always pause ourselves
-         * rather than relying on qemu doing pause.
-         */
+    /* For multiple disks, libvirt must pause externally to get all
+     * snapshots to be at the same point in time, unless qemu supports
+     * transactions.  For a single disk, snapshot is atomic without
+     * requiring a pause.  Thanks to qemuDomainSnapshotDiskPrepare, if
+     * we got to this point, the atomic flag now says whether we need
+     * to pause, and a capability bit says whether to use transaction.
+     */
+    if (!atomic && virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) {
         if (qemuProcessStopCPUs(driver, vm, VIR_DOMAIN_PAUSED_SAVE,
                                 QEMU_ASYNC_JOB_NONE) < 0)
             goto cleanup;
@@ -10069,7 +10081,8 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain,
                   VIR_DOMAIN_SNAPSHOT_CREATE_HALT |
                   VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY |
                   VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT |
-                  VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE, NULL);
+                  VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE |
+                  VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC, NULL);

     if ((flags & VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE) &&
         !(flags & VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY)) {
@@ -10211,14 +10224,15 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain,
         goto cleanup;

     if (flags & VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY) {
-        bool allow_reuse;
+        qemuDomainObjPrivatePtr priv = vm->privateData;

-        allow_reuse = (flags & VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT) != 0;
         if (virDomainSnapshotAlignDisks(def,
                                         VIR_DOMAIN_DISK_SNAPSHOT_EXTERNAL,
                                         false) < 0)
             goto cleanup;
-        if (qemuDomainSnapshotDiskPrepare(vm, def, allow_reuse) < 0)
+        if (qemuCapsGet(priv->qemuCaps, QEMU_CAPS_TRANSACTION))
+            flags |= VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC;
+        if (qemuDomainSnapshotDiskPrepare(vm, def, &flags) < 0)
             goto cleanup;
         def->state = VIR_DOMAIN_DISK_SNAPSHOT;
     } else {

-- 
1.7.7.6

On 03/16/2012 04:05 PM, Eric Blake wrote:
Taking an external snapshot of just one disk is atomic, without having to pause and resume the VM. This also paves the way for later patches to interact with the new qemu 'transaction' monitor command.
The various scenarios when requesting atomic are:
  online, 1 disk, old qemu - safe, allowed by this patch
  online, more than 1 disk, old qemu - failure, this patch
  offline snapshot - safe, once a future patch implements offline disk snapshot
  online, 1 or more disks, new qemu - safe, once future patch uses transaction
This is incomplete:
@@ -10069,7 +10081,8 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain,
                   VIR_DOMAIN_SNAPSHOT_CREATE_HALT |
                   VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY |
                   VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT |
-                  VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE, NULL);
+                  VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE |
+                  VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC, NULL);
By accepting the new 'atomic' flag, we now have to ensure it works for all modes. As written, v1 covered the disk-snapshot mode. The live system checkpoint mode is already atomic ('savevm' is a single monitor command; if it can fail partway through, that's a bug in qemu). But for offline internal snapshots, we were making iterative calls to 'qemu-img snapshot -c', and not cleaning up after partial failure. I'm sending a v2 to fix that.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Taking an external snapshot of just one disk is atomic, without having to pause and resume the VM. This also paves the way for later patches to interact with the new qemu 'transaction' monitor command.

The various scenarios when requesting atomic are:
  online, 1 disk, old qemu - safe, allowed by this patch
  online, more than 1 disk, old qemu - failure, this patch
  offline snapshot - safe, once a future patch implements offline disk snapshot
  online, 1 or more disks, new qemu - safe, once future patch uses transaction

Taking an online system checkpoint snapshot is atomic, since it is done via a single 'savevm' monitor command. Taking an offline system checkpoint snapshot currently uses multiple qemu-img invocations with no provision for cleanup of partial failure, so for now we mark it unsupported.

* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateXML): Support new flag for single-disk setups.
(qemuDomainSnapshotDiskPrepare): Check for atomic here.
(qemuDomainSnapshotCreateDiskActive): Skip pausing the VM when atomic supported.
(qemuDomainSnapshotIsAllowed): Use bool instead of int.
---

Patches 1/3 and 2/3 unchanged.
v2: consider interactions of atomic with non-disk-snapshots

 src/qemu/qemu_driver.c |   56 +++++++++++++++++++++++++++++++++--------------
 1 files changed, 39 insertions(+), 17 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 85f8cf7..9e62738 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -9535,7 +9535,8 @@ cleanup:
     return ret;
 }

-static int qemuDomainSnapshotIsAllowed(virDomainObjPtr vm)
+static bool
+qemuDomainSnapshotIsAllowed(virDomainObjPtr vm)
 {
     int i;

@@ -9550,11 +9551,11 @@ static int qemuDomainSnapshotIsAllowed(virDomainObjPtr vm)
             qemuReportError(VIR_ERR_OPERATION_INVALID,
                             _("Disk '%s' does not support snapshotting"),
                             vm->def->disks[i]->src);
-            return 0;
+            return false;
         }
     }

-    return 1;
+    return true;
 }

 static int
@@ -9712,13 +9713,17 @@ endjob:
 static int
 qemuDomainSnapshotDiskPrepare(virDomainObjPtr vm, virDomainSnapshotDefPtr def,
-                              bool allow_reuse)
+                              unsigned int *flags)
 {
     int ret = -1;
     int i;
     bool found = false;
     bool active = virDomainObjIsActive(vm);
     struct stat st;
+    bool allow_reuse = (*flags & VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT) != 0;
+    bool atomic = (*flags & VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC) != 0;
+    int external = 0;
+    qemuDomainObjPrivatePtr priv = vm->privateData;

     for (i = 0; i < def->ndisks; i++) {
         virDomainSnapshotDiskDefPtr disk = &def->disks[i];
@@ -9773,6 +9778,7 @@ qemuDomainSnapshotDiskPrepare(virDomainObjPtr vm, virDomainSnapshotDefPtr def,
                 goto cleanup;
             }
             found = true;
+            external++;
             break;

         case VIR_DOMAIN_DISK_SNAPSHOT_NO:
@@ -9792,6 +9798,17 @@ qemuDomainSnapshotDiskPrepare(virDomainObjPtr vm, virDomainSnapshotDefPtr def,
                         "selected for snapshot"));
         goto cleanup;
     }
+    if (active) {
+        if (external == 1 ||
+            qemuCapsGet(priv->qemuCaps, QEMU_CAPS_TRANSACTION)) {
+            *flags |= VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC;
+        } else if (atomic && external > 1) {
+            qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                            _("atomic live snapshot of multiple disks "
+                              "is unsupported"));
+            goto cleanup;
+        }
+    }

     ret = 0;

@@ -9920,6 +9937,7 @@ qemuDomainSnapshotCreateDiskActive(virConnectPtr conn,
     int i;
     bool persist = false;
     int thaw = 0; /* 1 if freeze succeeded, -1 if freeze failed */
+    bool atomic = (flags & VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC) != 0;

     if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_MODIFY) < 0)
         return -1;
@@ -9944,14 +9962,14 @@ qemuDomainSnapshotCreateDiskActive(virConnectPtr conn,
         }
     }

-    if (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) {
-        /* In qemu, snapshot_blkdev on a single disk will pause cpus,
-         * but this confuses libvirt since notifications are not given
-         * when qemu resumes.  And for multiple disks, libvirt must
-         * pause externally to get all snapshots to be at the same
-         * point in time.  For simplicitly, we always pause ourselves
-         * rather than relying on qemu doing pause.
-         */
+    /* For multiple disks, libvirt must pause externally to get all
+     * snapshots to be at the same point in time, unless qemu supports
+     * transactions.  For a single disk, snapshot is atomic without
+     * requiring a pause.  Thanks to qemuDomainSnapshotDiskPrepare, if
+     * we got to this point, the atomic flag now says whether we need
+     * to pause, and a capability bit says whether to use transaction.
+     */
+    if (!atomic && virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) {
         if (qemuProcessStopCPUs(driver, vm, VIR_DOMAIN_PAUSED_SAVE,
                                 QEMU_ASYNC_JOB_NONE) < 0)
             goto cleanup;
@@ -10070,7 +10088,8 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain,
                   VIR_DOMAIN_SNAPSHOT_CREATE_HALT |
                   VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY |
                   VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT |
-                  VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE, NULL);
+                  VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE |
+                  VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC, NULL);

     if ((flags & VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE) &&
         !(flags & VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY)) {
@@ -10212,14 +10231,11 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain,
         goto cleanup;

     if (flags & VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY) {
-        bool allow_reuse;
-
-        allow_reuse = (flags & VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT) != 0;
         if (virDomainSnapshotAlignDisks(def,
                                         VIR_DOMAIN_DISK_SNAPSHOT_EXTERNAL,
                                         false) < 0)
             goto cleanup;
-        if (qemuDomainSnapshotDiskPrepare(vm, def, allow_reuse) < 0)
+        if (qemuDomainSnapshotDiskPrepare(vm, def, &flags) < 0)
             goto cleanup;
         def->state = VIR_DOMAIN_DISK_SNAPSHOT;
     } else {
@@ -10279,6 +10295,12 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain,
                                                &vm, snap, flags) < 0)
             goto cleanup;
     } else if (!virDomainObjIsActive(vm)) {
+        if (flags & VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC) {
+            qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                            _("atomic snapshots of inactive domains not "
+                              "implemented yet"));
+            goto cleanup;
+        }
         if (qemuDomainSnapshotCreateInactive(driver, vm, snap) < 0)
             goto cleanup;
     } else {

-- 
1.7.7.6
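Distilled from the v2 hunks above, the promotion/rejection logic in qemuDomainSnapshotDiskPrepare() amounts to a small predicate; a standalone sketch with an invented function name, for readers skimming the diff:

#include <stdbool.h>

/* Sketch of v2's decision: a live external snapshot is atomic when it
 * touches one disk, or when qemu offers 'transaction' for several;
 * an explicit ATOMIC request outside those cases is rejected up front. */
static bool
snapshotCanBeAtomic(bool active, int external, bool has_transaction)
{
    if (!active)
        return false; /* offline path is handled separately (see 4/3) */
    return external == 1 || has_transaction;
}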

On 03/17/2012 05:33 PM, Eric Blake wrote:
Taking an external snapshot of just one disk is atomic, without having to pause and resume the VM. This also paves the way for later patches to interact with the new qemu 'transaction' monitor command.
The various scenarios when requesting atomic are:
  online, 1 disk, old qemu - safe, allowed by this patch
  online, more than 1 disk, old qemu - failure, this patch
  offline snapshot - safe, once a future patch implements offline disk snapshot
  online, 1 or more disks, new qemu - safe, once future patch uses transaction
Taking an online system checkpoint snapshot is atomic, since it is done via a single 'savevm' monitor command.
Taking an offline system checkpoint snapshot currently uses multiple qemu-img invocations with no provision for cleanup of partial failure, so for now we mark it unsupported.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateXML): Support new flag for single-disk setups.
(qemuDomainSnapshotDiskPrepare): Check for atomic here.
(qemuDomainSnapshotCreateDiskActive): Skip pausing the VM when atomic supported.
(qemuDomainSnapshotIsAllowed): Use bool instead of int.
---
Patches 1/3 and 2/3 unchanged.
v2: consider interactions of atomic with non-disk-snapshots
 src/qemu/qemu_driver.c |   56 +++++++++++++++++++++++++++++++++--------------
 1 files changed, 39 insertions(+), 17 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 85f8cf7..9e62738 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -10279,6 +10295,12 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain,
                                                &vm, snap, flags) < 0)
             goto cleanup;
     } else if (!virDomainObjIsActive(vm)) {
+        if (flags & VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC) {
+            qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                            _("atomic snapshots of inactive domains not "
+                              "implemented yet"));
+            goto cleanup;
I wonder if we shouldn't add an error code for unimplemented functionality, to differentiate it from unsupported configurations. (Well, a configuration that uses unimplemented functionality is basically an unsupported configuration, so I don't really mind.)
+        }
         if (qemuDomainSnapshotCreateInactive(driver, vm, snap) < 0)
             goto cleanup;
     } else {
ACK, Peter.

On 03/19/2012 06:20 AM, Peter Krempa wrote:
On 03/17/2012 05:33 PM, Eric Blake wrote:
Taking an external snapshot of just one disk is atomic, without having to pause and resume the VM. This also paves the way for later patches to interact with the new qemu 'transaction' monitor command.
+++ b/src/qemu/qemu_driver.c
@@ -10279,6 +10295,12 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain,
                                                &vm, snap, flags) < 0)
             goto cleanup;
     } else if (!virDomainObjIsActive(vm)) {
+        if (flags & VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC) {
+            qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                            _("atomic snapshots of inactive domains not "
+                              "implemented yet"));
+            goto cleanup;
I wonder if we shouldn't add an error code for unimplemented functionality, to differentiate it from unsupported configurations. (Well, a configuration that uses unimplemented functionality is basically an unsupported configuration, so I don't really mind.)
This exact same error message is removed in 4/3, so I don't think it quite matters _what_ we put here. I guess I could rebase the series to swap 3/4 and 4/4, to avoid the churn.
+        }
         if (qemuDomainSnapshotCreateInactive(driver, vm, snap) < 0)
             goto cleanup;
     } else {
ACK,
Thanks for the reviews. I may wait another day or two before actually pushing, since I'm still in the middle of testing other snapshot-related patches, just to make sure my tests don't turn up any other needed last-minute changes to the ones I've posted so far.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

Offline internal snapshots can be rolled back with just a little bit of refactoring, meaning that we are now automatically atomic.

* src/qemu/qemu_domain.c (qemuDomainSnapshotForEachQcow2): Move guts...
(qemuDomainSnapshotForEachQcow2Raw): ...to new helper, to allow rollbacks.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateXML): Offline snapshots are now atomic.
---

v2: new patch.

 src/qemu/qemu_domain.c |   55 ++++++++++++++++++++++++++++++-----------------
 src/qemu/qemu_driver.c |    6 ------
 2 files changed, 35 insertions(+), 26 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 625c595..a845480 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -1,7 +1,7 @@
 /*
  * qemu_domain.h: QEMU domain private state
  *
- * Copyright (C) 2006-2011 Red Hat, Inc.
+ * Copyright (C) 2006-2012 Red Hat, Inc.
  * Copyright (C) 2006 Daniel P. Berrange
  *
  * This library is free software; you can redistribute it and/or
@@ -1489,24 +1489,17 @@ cleanup:

 /* The domain is expected to be locked and inactive. Return -1 on normal
  * failure, 1 if we skipped a disk due to try_all.  */
-int
-qemuDomainSnapshotForEachQcow2(struct qemud_driver *driver,
-                               virDomainObjPtr vm,
-                               virDomainSnapshotObjPtr snap,
-                               const char *op,
-                               bool try_all)
+static int
+qemuDomainSnapshotForEachQcow2Raw(struct qemud_driver *driver,
+                                  virDomainDefPtr def,
+                                  const char *name,
+                                  const char *op,
+                                  bool try_all,
+                                  int ndisks)
 {
     const char *qemuimgarg[] = { NULL, "snapshot", NULL, NULL, NULL, NULL };
     int i;
     bool skipped = false;
-    virDomainDefPtr def;
-
-    /* Prefer action on the disks in use at the time the snapshot was
-     * created; but fall back to current definition if dealing with a
-     * snapshot created prior to libvirt 0.9.5.  */
-    def = snap->def->dom;
-    if (!def)
-        def = vm->def;

     qemuimgarg[0] = qemuFindQemuImgBinary(driver);
     if (qemuimgarg[0] == NULL) {
@@ -1515,13 +1508,10 @@ qemuDomainSnapshotForEachQcow2(struct qemud_driver *driver,
     }

     qemuimgarg[2] = op;
-    qemuimgarg[3] = snap->def->name;
+    qemuimgarg[3] = name;

-    for (i = 0; i < def->ndisks; i++) {
+    for (i = 0; i < ndisks; i++) {
         /* FIXME: we also need to handle LVM here */
-        /* FIXME: if we fail halfway through this loop, we are in an
-         * inconsistent state.  I'm not quite sure what to do about that
-         */
         if (def->disks[i]->device == VIR_DOMAIN_DISK_DEVICE_DISK) {
             if (!def->disks[i]->driverType ||
                 STRNEQ(def->disks[i]->driverType, "qcow2")) {
@@ -1533,6 +1523,11 @@ qemuDomainSnapshotForEachQcow2(struct qemud_driver *driver,
                               def->disks[i]->dst);
                     skipped = true;
                     continue;
+                } else if (STREQ(op, "-c") && i) {
+                    /* We must roll back partial creation by deleting
+                     * all earlier snapshots.  */
+                    qemuDomainSnapshotForEachQcow2Raw(driver, def, name,
+                                                      "-d", false, i);
                 }
                 qemuReportError(VIR_ERR_OPERATION_INVALID,
                                 _("Disk device '%s' does not support"
@@ -1558,6 +1553,26 @@ qemuDomainSnapshotForEachQcow2(struct qemud_driver *driver,
     return skipped ? 1 : 0;
 }

+/* The domain is expected to be locked and inactive. Return -1 on normal
+ * failure, 1 if we skipped a disk due to try_all.  */
+int
+qemuDomainSnapshotForEachQcow2(struct qemud_driver *driver,
+                               virDomainObjPtr vm,
+                               virDomainSnapshotObjPtr snap,
+                               const char *op,
+                               bool try_all)
+{
+    /* Prefer action on the disks in use at the time the snapshot was
+     * created; but fall back to current definition if dealing with a
+     * snapshot created prior to libvirt 0.9.5.  */
+    virDomainDefPtr def = snap->def->dom;
+
+    if (!def)
+        def = vm->def;
+    return qemuDomainSnapshotForEachQcow2Raw(driver, def, snap->def->name,
+                                             op, try_all, def->ndisks);
+}
+
 /* Discard one snapshot (or its metadata), without reparenting any children.  */
 int
 qemuDomainSnapshotDiscard(struct qemud_driver *driver,
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 9e62738..4bbfabd 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -10295,12 +10295,6 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain,
                                                &vm, snap, flags) < 0)
             goto cleanup;
     } else if (!virDomainObjIsActive(vm)) {
-        if (flags & VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC) {
-            qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
-                            _("atomic snapshots of inactive domains not "
-                              "implemented yet"));
-            goto cleanup;
-        }
         if (qemuDomainSnapshotCreateInactive(driver, vm, snap) < 0)
             goto cleanup;
     } else {

-- 
1.7.7.6
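The shape of the rollback in qemuDomainSnapshotForEachQcow2Raw() generalizes beyond qemu-img; as a standalone sketch (the function pointer types and wrapper are invented placeholders, not libvirt APIs), the all-or-nothing loop is:

#include <stddef.h>

typedef int  (*apply_fn)(size_t i); /* 0 on success, -1 on failure */
typedef void (*undo_fn)(size_t i);  /* inverse of a successful apply */

/* Apply items 0..n-1; on the first failure, delete the partial work
 * already done (in forward order, as the patch does) and fail. */
static int
apply_all_or_nothing(size_t n, apply_fn apply, undo_fn undo)
{
    size_t i, j;

    for (i = 0; i < n; i++) {
        if (apply(i) < 0) {
            for (j = 0; j < i; j++)
                undo(j);
            return -1;
        }
    }
    return 0;
}

The key property, as with the per-disk qemu-img calls, is that the undo step must be applied only to the items that actually succeeded, never to the one that failed.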

On 03/17/2012 05:33 PM, Eric Blake wrote:
Offline internal snapshots can be rolled back with just a little bit of refactoring, meaning that we are now automatically atomic.
* src/qemu/qemu_domain.c (qemuDomainSnapshotForEachQcow2): Move guts...
(qemuDomainSnapshotForEachQcow2Raw): ...to new helper, to allow rollbacks.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateXML): Offline snapshots are now atomic.
---

You roll back changes to disks if other disks are not snapshotable, but later on, when the actual qemu-img command is run and fails, the rollback is not performed.
I'd suggest squashing in:

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index a845480..8dde3c9 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -1512,70 +1512,75 @@ qemuDomainSnapshotForEachQcow2Raw(struct qemud_driver *driver,

     for (i = 0; i < ndisks; i++) {
         /* FIXME: we also need to handle LVM here */
         if (def->disks[i]->device == VIR_DOMAIN_DISK_DEVICE_DISK) {
             if (!def->disks[i]->driverType ||
                 STRNEQ(def->disks[i]->driverType, "qcow2")) {
                 if (try_all) {
                     /* Continue on even in the face of error, since other
                      * disks in this VM may have the same snapshot name.
                      */
                     VIR_WARN("skipping snapshot action on %s",
                              def->disks[i]->dst);
                     skipped = true;
                     continue;
                 } else if (STREQ(op, "-c") && i) {
                     /* We must roll back partial creation by deleting
                      * all earlier snapshots.  */
                     qemuDomainSnapshotForEachQcow2Raw(driver, def, name,
                                                       "-d", false, i);
                 }
                 qemuReportError(VIR_ERR_OPERATION_INVALID,
                                 _("Disk device '%s' does not support"
                                   " snapshotting"),
                                 def->disks[i]->dst);
                 return -1;
             }

             qemuimgarg[4] = def->disks[i]->src;

             if (virRun(qemuimgarg, NULL) < 0) {
                 if (try_all) {
                     VIR_WARN("skipping snapshot action on %s",
                              def->disks[i]->dst);
                     skipped = true;
                     continue;
+                } else if (STREQ(op, "-c") && i) {
+                    /* We must roll back partial creation by deleting
+                     * all earlier snapshots.  */
+                    qemuDomainSnapshotForEachQcow2Raw(driver, def, name,
+                                                      "-d", false, i);
                 }
                 return -1;
             }
         }
     }

     return skipped ? 1 : 0;
 }

If you don't add this change, the result is the following. Try to make a snapshot of a domain that has one of its images missing:

virsh # snapshot-create-as Bare2 test1 test --atomic
error: internal error Child process (/usr/bin/qemu-img snapshot -c test1
/var/lib/libvirt/images/d2.qcow2) status unexpected: exit status 1

But the first image is still modified:

# qemu-img snapshot -l d.qcow2
Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
1         test1                     0 2012-03-19 14:26:50   00:00:00.000

Otherwise looks good. ACK with that suggested change.

Peter

On 03/19/2012 07:40 AM, Peter Krempa wrote:
On 03/17/2012 05:33 PM, Eric Blake wrote:
Offline internal snapshots can be rolled back with just a little bit of refactoring, meaning that we are now automatically atomic.
* src/qemu/qemu_domain.c (qemuDomainSnapshotForEachQcow2): Move guts...
(qemuDomainSnapshotForEachQcow2Raw): ...to new helper, to allow rollbacks.
* src/qemu/qemu_driver.c (qemuDomainSnapshotCreateXML): Offline snapshots are now atomic.
---

You roll back changes to disks if other disks are not snapshotable, but later on, when the actual qemu-img command is run and fails, the rollback is not performed.
Good catch.
I'd suggest squashing in:
             if (virRun(qemuimgarg, NULL) < 0) {
                 if (try_all) {
                     VIR_WARN("skipping snapshot action on %s",
                              def->disks[i]->dst);
                     skipped = true;
                     continue;
+                } else if (STREQ(op, "-c") && i) {
+                    /* We must roll back partial creation by deleting
+                     * all earlier snapshots.  */
+                    qemuDomainSnapshotForEachQcow2Raw(driver, def, name,
+                                                      "-d", false, i);
                 }
Yep, that looks right. Thanks for the test case.
Otherwise looks good. ACK with that suggested change.
Same story as for 3/3 - I'll wait to push this until I've run a few more tests for the rest of my pending series, in case I find any more last-minute issues.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On 03/19/2012 03:25 PM, Eric Blake wrote:
On 03/19/2012 07:40 AM, Peter Krempa wrote:
On 03/17/2012 05:33 PM, Eric Blake wrote:
Offline internal snapshots can be rolled back with just a little bit of refactoring, meaning that we are now automatically atomic.
* src/qemu/qemu_domain.c (qemuDomainSnapshotForEachQcow2): Move guts... (qemuDomainSnapshotForEachQcow2Raw): ...to new helper, to allow rollbacks. * src/qemu/qemu_driver.c (qemuDomainSnapshotCreateXML): Offline snapshots are now atomic. ---
Otherwise looks good. ACK with that suggested change.
Same story as for 3/3 - I'll wait to push this until I've run a few more tests for the rest of my pending series, in case I find any more last-minute issues.
I've pushed round 1 now.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org