On 10/23/2012 09:12 AM, Peter Krempa wrote:
> This patch adds support to take external system checkpoints.
> The functionality is layered on top of the previous disk-only snapshot
> code. When the checkpoint is requested the domain memory is saved to the
> memory image file using live migration to file. (The user may specify to
> take the memory image while the guest is paused with the
> VIR_DOMAIN_SNAPSHOT_CREATE_PAUSE flag.)

Outdated comment, but looks like you've touched that up on your git tree.

> This operation pauses the guest.
> After the guest is paused the disk snapshot is taken.

The guest will pause either because _LIVE was not present (we paused up
front), or because qemu will pause the domain when live migration
converges. Then, while the guest is paused, we take the disk snapshot,
then resume the guest, and voila! external snapshot!

> The memory save image shares format with the image created by the
> virDomainSave() API.

Hmm, wonder if we should document somewhere that:

  virDomainSave(dom, "/path/to/file")

is now shorthand for:

  virDomainSnapshotCreateXML(dom, "
      <domainsnapshot>
        <memory file='/path/to/file'/>
        <disks>
          <disk name='vda' snapshot='no'/> <!-- etc. for each disk -->
        </disks>
      </domainsnapshot>
  ", VIR_DOMAIN_SNAPSHOT_CREATE_NO_METADATA)

It also makes me wonder if virDomainSaveFlags should learn a _LIVE flag.
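
If it did, the caller side would be trivial - just a sketch, and
VIR_DOMAIN_SAVE_LIVE is hypothetical (today the only save/restore flags
are _BYPASS_CACHE, _RUNNING and _PAUSED):

  /* hypothetical flag, does not exist yet */
  virDomainSaveFlags(dom, "/path/to/file", NULL, VIR_DOMAIN_SAVE_LIVE);
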
Consider what happens when you save domain memory but not disks, and
then let the domain continue running - since you didn't snapshot the
disks, any modification made to them will invalidate the saved
memory. That said, it is possible to create a domain that only uses
read-only disks and stores everything in tmpfs. In fact, one of my
favorite test cases while working on block-commit was a domain based on
a live iso image - precisely because I can then attach any number
(including 0) of other disks which won't be touched by the live OS, at
which point you really can make a case for a live ram-only snapshot.
> +static int
> +qemuDomainSnapshotCreateActiveExternal(virConnectPtr conn,
> +                                       struct qemud_driver *driver,
> +                                       virDomainObjPtr *vmptr,
> +                                       virDomainSnapshotObjPtr snap,
> +                                       unsigned int flags)
> +{
> +    bool resume = false;
> +    int ret = -1;
> +    virDomainObjPtr vm = *vmptr;
> +    qemuDomainObjPrivatePtr priv = vm->privateData;
> +    char *xml = NULL;
> +    bool memory = snap->def->memory == VIR_DOMAIN_SNAPSHOT_LOCATION_EXTERNAL;
> +    bool atomic = !!(flags & VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC);
> +    bool transaction = qemuCapsGet(priv->caps, QEMU_CAPS_TRANSACTION);
> +
> +    if (qemuDomainObjBeginAsyncJobWithDriver(driver, vm,
> +                                             QEMU_ASYNC_JOB_SNAPSHOT) < 0)
> +        goto cleanup;

Hmm, here we are starting an async job that can be canceled any time we
drop locks. But the biggest part of this job is waiting for the
migration to file to complete; once that part is done, we will drop
locks several more times while issuing a 'transaction' or a series of
disk snapshot monitor commands, and I don't think the cancellation code
is prepared to handle that case. I wonder if we need to separate this
into two jobs - one async to do the migration to file, and one sync to
do the disk snapshots, so that we only allow cancellation during the
first half (the second half is fast enough that we aren't starving
access to the domain in the process).
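
Roughly this shape is what I have in mind - just a sketch, with error and
cleanup paths elided, the xml/resume setup omitted, and assuming the usual
begin/end job helpers:

  /* phase 1: async (cancellable) job covering only the migration to file */
  if (qemuDomainObjBeginAsyncJobWithDriver(driver, vm,
                                           QEMU_ASYNC_JOB_SNAPSHOT) < 0)
      goto cleanup;
  ret = qemuDomainSaveMemory(driver, vm, snap->def->file, xml,
                             QEMUD_SAVE_FORMAT_RAW, resume, 0,
                             QEMU_ASYNC_JOB_SNAPSHOT);
  if (qemuDomainObjEndAsyncJob(driver, vm) == 0)
      vm = NULL;
  if (ret < 0 || !vm)
      goto cleanup;

  /* phase 2: ordinary modify job for the disk snapshots - fast, and with
   * no cancellation window */
  if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_MODIFY) < 0)
      goto cleanup;
  ret = qemuDomainSnapshotCreateDiskActive(driver, vm, snap, flags,
                                           QEMU_ASYNC_JOB_NONE);
  if (qemuDomainObjEndJob(driver, vm) == 0)
      vm = NULL;
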
> +    /* we need to resume the guest only if it was previously running */
> +    if (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) {
> +        resume = true;

I haven't checked closely - does this new async job allow the user to
pause and continue the guest at will? I know with live migration we
allow the client to pause the guest if they are tired of waiting on
things to converge (in fact, virsh even has a migrate option that
auto-pauses the guest if things take too long). Hmm, while I know we
can pause a live migration, what I don't know is if we allow resuming a
guest while migration is ongoing.
I'm worried that we are missing the interaction that allows a user to
pause a live snapshot if things aren't converging fast enough. I would
be okay if our design is that, for the initial implementation, we reject
pause and resume attempts during a migration, and add that ability
later, but it is something we need to keep in mind. And it does make
the question of whether to resume a bit trickier - it is no longer a
matter of whether the guest was running before we started the snapshot,
but also whether the guest has been paused later on during a live
snapshot. Similar to live migration, I'm assuming that it is always
safe to pause, but that once things are paused, we probably want to
reject resuming until the snapshot operation completes.
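
If we go the "reject it for now" route, maybe something as simple as this
in qemuDomainResume would do - sketch only, and it assumes the resume job
is allowed to run at all while the async snapshot job holds the domain
(which depends on how the job mask is set up):

  qemuDomainObjPrivatePtr priv = vm->privateData;

  /* refuse to resume while an external snapshot job is active; this
   * restriction can be lifted later once the interactions are sorted out */
  if (priv->job.asyncJob == QEMU_ASYNC_JOB_SNAPSHOT) {
      virReportError(VIR_ERR_OPERATION_INVALID, "%s",
                     _("cannot resume domain while an external snapshot "
                       "is in progress"));
      goto endjob;
  }
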
> +
> +        /* For multiple disks, libvirt must pause externally to get all
> +         * snapshots to be at the same point in time, unless qemu supports
> +         * transactions. For a single disk, snapshot is atomic without
> +         * requiring a pause. Thanks to qemuDomainSnapshotPrepare, if
> +         * we got to this point, the atomic flag now says whether we need
> +         * to pause, and a capability bit says whether to use transaction.
> +         */
> +        if (flags & VIR_DOMAIN_SNAPSHOT_CREATE_PAUSE ||
> +            (atomic && !transaction)) {

I looked in your tree to see how you changed things for the _LIVE flag:

  +        if (!(flags & VIR_DOMAIN_SNAPSHOT_CREATE_LIVE) ||
  +            (!memory && atomic && !transaction)) {

That's not quite the right logic. You want to pause if there is a
memory snapshot but no _LIVE flag, or if there are multiple disks but no
transaction support. Something like:

  if ((memory && !(flags & VIR_DOMAIN_SNAPSHOT_CREATE_LIVE)) ||
      (atomic && !transaction))

> +            if (qemuProcessStopCPUs(driver, vm, VIR_DOMAIN_PAUSED_SNAPSHOT,
> +                                    QEMU_ASYNC_JOB_SNAPSHOT) < 0)
> +                goto endjob;
> +
> +            if (!virDomainObjIsActive(vm)) {
> +                virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> +                               _("guest unexpectedly quit"));
> +                goto endjob;
> +            }
> +        }
> +    }
> +
> +    /* do the memory snapshot if necessary */
> +    if (memory) {
> +        if (!(xml = qemuDomainDefFormatLive(driver, vm->def, true, false)))
> +            goto endjob;
> +
> +        if ((ret = qemuDomainSaveMemory(driver, vm, snap->def->file,
> +                                        xml, QEMUD_SAVE_FORMAT_RAW,
> +                                        resume, 0,
> +                                        QEMU_ASYNC_JOB_SNAPSHOT)) < 0)
> +            goto endjob;

Be aware that at this point, even if _LIVE was specified, the domain is
now paused (qemu pauses the domain once things converge) - you may need
to account for this later.

> +    }
> +
> +    /* the domain is now paused if:
> +     * - a memory snapshot was requested
> +     * - an atomic snapshot was requested AND
> +     *   qemu does not support transactions

Yep, should have read further - the comment looks right.

> +     *
> +     * Next we snapshot the disks.
> +     */
> +    if ((ret = qemuDomainSnapshotCreateDiskActive(driver, vm, snap, flags,
> +                                                  QEMU_ASYNC_JOB_SNAPSHOT)) < 0)
> +        goto endjob;
> +
> +    /* the snapshot is complete now */
> +    if (flags & VIR_DOMAIN_SNAPSHOT_CREATE_HALT) {
> +        virDomainEventPtr event;

>  static virDomainSnapshotPtr
>  qemuDomainSnapshotCreateXML(virDomainPtr domain,
>                              const char *xmlDesc,
> @@ -11190,7 +11241,8 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain,
>                    VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY |
>                    VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT |
>                    VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE |
> -                  VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC, NULL);
> +                  VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC |
> +                  VIR_DOMAIN_SNAPSHOT_CREATE_PAUSE, NULL);

s/PAUSE/LIVE/, obviously

>      if ((flags & VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE) &&
>          !(flags & VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY)) {
> @@ -11397,16 +11449,22 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain,
>          /* XXX Should we validate that the redefined snapshot even
>           * makes sense, such as checking that qemu-img recognizes the
>           * snapshot name in at least one of the domain's disks? */
> -    } else if (flags & VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY) {
> -        if (qemuDomainSnapshotCreateDiskActive(domain->conn, driver,
> -                                               &vm, snap, flags) < 0)
> -            goto cleanup;
> -    } else if (!virDomainObjIsActive(vm)) {
> -        if (qemuDomainSnapshotCreateInactive(driver, vm, snap) < 0)
> -            goto cleanup;
> +    } else if (virDomainObjIsActive(vm)) {
> +        if (flags & VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY ||
> +            snap->def->memory == VIR_DOMAIN_SNAPSHOT_LOCATION_EXTERNAL) {
> +            /* external checkpoint or disk snapshot */
> +            if (qemuDomainSnapshotCreateActiveExternal(domain->conn, driver,
> +                                                       &vm, snap, flags) < 0)
> +                goto cleanup;
> +        } else {
> +            /* internal checkpoint */
> +            if (qemuDomainSnapshotCreateActiveInternal(domain->conn, driver,
> +                                                       &vm, snap, flags) < 0)
> +                goto cleanup;

We should fail internal checkpoint if _LIVE is present, since qemu's
'savevm' does not support a live mode.
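
Maybe just an up-front check in that branch, along these lines (sketch;
the exact error code and wording are up for grabs):

  } else {
      /* internal checkpoint */
      if (flags & VIR_DOMAIN_SNAPSHOT_CREATE_LIVE) {
          virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s",
                         _("live snapshot creation is supported only "
                           "with external checkpoints"));
          goto cleanup;
      }
      if (qemuDomainSnapshotCreateActiveInternal(domain->conn, driver,
                                                 &vm, snap, flags) < 0)
          goto cleanup;
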
--
Eric Blake   eblake@redhat.com   +1-919-301-3266
Libvirt virtualization library
http://libvirt.org