[libvirt] [PATCH 0/7] External checkpoint support

This series adds support for external system checkpoints.

This series is designed to be applied on top of:
http://www.redhat.com/archives/libvir-list/2012-September/msg00594.html

This first series adds support for taking the snapshots. I'll follow up
with adding support to revert snapshots or to add snapshots with
different parents, but I'd like to start rounds of review on this until
the design settles.

Peter

Peter Krempa (7):
  qemu: Fix misleading comment for qemuDomainObjBeginJobWithDriver()
  qemu: Split out code to save domain memory to allow reuse
  snapshot: Add flag to enable creating checkpoints in paused state
  snapshot: qemu: Add async job type for snapshots
  snapshot: qemu: Rename qemuDomainSnapshotCreateActive
  snapshot: qemu: Add support for external checkpoints
  snapshot: qemu: Remove restrictions preventing external snapshots

 include/libvirt/libvirt.h.in |   4 +
 src/qemu/qemu_domain.c       |   6 +-
 src/qemu/qemu_domain.h       |   1 +
 src/qemu/qemu_driver.c       | 474 +++++++++++++++++++++++++------------------
 src/qemu/qemu_process.c      |  19 ++
 tools/virsh-domain-monitor.c |   2 +
 tools/virsh-snapshot.c       |   6 +
 tools/virsh.pod              |  12 +-
 8 files changed, 326 insertions(+), 198 deletions(-)

-- 
1.7.12

The comment stated that you may call qemuDomainObjBeginJobWithDriver()
without passing qemud_driver to signalize it's not locked.
qemuDomainObjBeginJobWithDriver still accesses the qemud_driver
structure and the lock singalling is done through a separate parameter.
---
 src/qemu/qemu_domain.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 17ae3b9..19ba099 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -884,8 +884,7 @@ int qemuDomainObjBeginAsyncJob(struct qemud_driver *driver,
 }
 
 /*
- * obj must be locked before calling. If qemud_driver is passed, it MUST be
- * locked; otherwise it MUST NOT be locked.
+ * obj and qemud_driver must be locked before calling.
  *
  * This must be called by anything that will change the VM state
  * in any way, or anything that will use the QEMU monitor.
-- 
1.7.12

On 10/09/2012 07:39 AM, Peter Krempa wrote:
The comment stated that you may call qemuDomainObjBeginJobWithDriver() without passing qemud_driver to signalize it's not locked.
signalize is a valid spelling, but unusual; I'd write it 'signal'.
qemuDomainObjBeginJobWithDriver still accesses the qemud_driver structure and the lock singalling is done through a separate parameter.
s/singalling/signaling/
--- src/qemu/qemu_domain.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 17ae3b9..19ba099 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -884,8 +884,7 @@ int qemuDomainObjBeginAsyncJob(struct qemud_driver *driver,
 }
 
 /*
- * obj must be locked before calling. If qemud_driver is passed, it MUST be
- * locked; otherwise it MUST NOT be locked.
+ * obj and qemud_driver must be locked before calling.
ACK. This one can be applied independently of my pending patches
(aargh, I still need to get those rebased with comments addressed...).

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On 10/11/12 05:25, Eric Blake wrote:
On 10/09/2012 07:39 AM, Peter Krempa wrote:
The comment stated that you may call qemuDomainObjBeginJobWithDriver() without passing qemud_driver to signalize it's not locked.
signalize is a valid spelling, but unusual; I'd write it 'signal'.
qemuDomainObjBeginJobWithDriver still accesses the qemud_driver structure and the lock singalling is done through a separate parameter.
s/singalling/signaling/
--- src/qemu/qemu_domain.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
ACK. This one can be applied independently of my pending patches (aargh, I still need to get those rebased with comments addressed...).
I fixed the spelling and pushed. Peter

The code that saves domain memory by migration to file can be reused while doing external snapshots of a machine. This patch extracts the common code and places it in a separate function. --- src/qemu/qemu_driver.c | 225 ++++++++++++++++++++++++++++--------------------- 1 file changed, 127 insertions(+), 98 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 896ecdd..b8ea289 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -2604,7 +2604,7 @@ bswap_header(struct qemud_save_header *hdr) { /* return -errno on failure, or 0 on success */ static int -qemuDomainSaveHeader(int fd, const char *path, char *xml, +qemuDomainSaveHeader(int fd, const char *path, const char *xml, struct qemud_save_header *header) { int ret = 0; @@ -2751,98 +2751,38 @@ cleanup: return fd; } -/* This internal function expects the driver lock to already be held on - * entry and the vm must be active + locked. Vm will be unlocked and - * potentially free'd after this returns (eg transient VMs are freed - * shutdown). So 'vm' must not be referenced by the caller after - * this returns (whether returning success or failure). 
- */ +/* Helper function to execute a migration to file with a correct save header + * the caller needs to make sure that the processors are stopped and do all other + * actions besides saving memory */ static int -qemuDomainSaveInternal(struct qemud_driver *driver, virDomainPtr dom, - virDomainObjPtr vm, const char *path, - int compressed, const char *xmlin, unsigned int flags) +qemuDomainSaveMemory(struct qemud_driver *driver, + virDomainObjPtr vm, + const char *path, + const char *xml, + int compressed, + bool was_running, + unsigned int flags, + enum qemuDomainAsyncJob asyncJob) { - char *xml = NULL; struct qemud_save_header header; bool bypassSecurityDriver = false; - int ret = -1; - int rc; - virDomainEventPtr event = NULL; - qemuDomainObjPrivatePtr priv; bool needUnlink = false; - size_t len; - unsigned long long offset; - unsigned long long pad; + int ret = -1; int fd = -1; int directFlag = 0; virFileWrapperFdPtr wrapperFd = NULL; unsigned int wrapperFlags = VIR_FILE_WRAPPER_NON_BLOCKING; - - if (qemuProcessAutoDestroyActive(driver, vm)) { - virReportError(VIR_ERR_OPERATION_INVALID, - "%s", _("domain is marked for auto destroy")); - goto cleanup; - } + unsigned long long pad; + unsigned long long offset; + size_t len; memset(&header, 0, sizeof(header)); memcpy(header.magic, QEMUD_SAVE_PARTIAL, sizeof(header.magic)); header.version = QEMUD_SAVE_VERSION; + header.was_running = was_running ? 
1 : 0; header.compressed = compressed; - priv = vm->privateData; - - if (qemuDomainObjBeginAsyncJobWithDriver(driver, vm, - QEMU_ASYNC_JOB_SAVE) < 0) - goto cleanup; - - memset(&priv->job.info, 0, sizeof(priv->job.info)); - priv->job.info.type = VIR_DOMAIN_JOB_UNBOUNDED; - - /* Pause */ - if (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) { - header.was_running = 1; - if (qemuProcessStopCPUs(driver, vm, VIR_DOMAIN_PAUSED_SAVE, - QEMU_ASYNC_JOB_SAVE) < 0) - goto endjob; - - if (!virDomainObjIsActive(vm)) { - virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("guest unexpectedly quit")); - goto endjob; - } - } - /* libvirt.c already guaranteed these two flags are exclusive. */ - if (flags & VIR_DOMAIN_SAVE_RUNNING) - header.was_running = 1; - else if (flags & VIR_DOMAIN_SAVE_PAUSED) - header.was_running = 0; - - /* Get XML for the domain. Restore needs only the inactive xml, - * including secure. We should get the same result whether xmlin - * is NULL or whether it was the live xml of the domain moments - * before. 
*/ - if (xmlin) { - virDomainDefPtr def = NULL; - - if (!(def = virDomainDefParseString(driver->caps, xmlin, - QEMU_EXPECTED_VIRT_TYPES, - VIR_DOMAIN_XML_INACTIVE))) { - goto endjob; - } - if (!virDomainDefCheckABIStability(vm->def, def)) { - virDomainDefFree(def); - goto endjob; - } - xml = qemuDomainDefFormatLive(driver, def, true, true); - } else { - xml = qemuDomainDefFormatLive(driver, vm->def, true, true); - } - if (!xml) { - virReportError(VIR_ERR_OPERATION_FAILED, - "%s", _("failed to get domain xml")); - goto endjob; - } len = strlen(xml) + 1; offset = sizeof(header) + len; @@ -2859,7 +2799,7 @@ qemuDomainSaveInternal(struct qemud_driver *driver, virDomainPtr dom, ((offset + pad) % QEMU_MONITOR_MIGRATE_TO_FILE_BS)); if (VIR_EXPAND_N(xml, len, pad) < 0) { virReportOOMError(); - goto endjob; + goto cleanup; } offset += pad; header.xml_len = len; @@ -2877,22 +2817,21 @@ qemuDomainSaveInternal(struct qemud_driver *driver, virDomainPtr dom, fd = qemuOpenFile(driver, path, O_WRONLY | O_TRUNC | O_CREAT | directFlag, &needUnlink, &bypassSecurityDriver); if (fd < 0) - goto endjob; + goto cleanup; + if (!(wrapperFd = virFileWrapperFdNew(&fd, path, wrapperFlags))) - goto endjob; + goto cleanup; /* Write header to file, followed by XML */ - if (qemuDomainSaveHeader(fd, path, xml, &header) < 0) { - VIR_FORCE_CLOSE(fd); - goto endjob; - } + if (qemuDomainSaveHeader(fd, path, xml, &header) < 0) + goto cleanup; /* Perform the migration */ if (qemuMigrationToFile(driver, vm, fd, offset, path, qemuCompressProgramName(compressed), bypassSecurityDriver, - QEMU_ASYNC_JOB_SAVE) < 0) - goto endjob; + asyncJob) < 0) + goto cleanup; /* Touch up file header to mark image complete. */ @@ -2902,26 +2841,120 @@ qemuDomainSaveInternal(struct qemud_driver *driver, virDomainPtr dom, * that's acceptable. 
*/ if (VIR_CLOSE(fd) < 0) { virReportSystemError(errno, _("unable to close %s"), path); - goto endjob; + goto cleanup; } + if (virFileWrapperFdClose(wrapperFd) < 0) - goto endjob; - fd = qemuOpenFile(driver, path, O_WRONLY, NULL, NULL); - if (fd < 0) - goto endjob; + goto cleanup; + + if ((fd = qemuOpenFile(driver, path, O_WRONLY, NULL, NULL)) < 0) + goto cleanup; memcpy(header.magic, QEMUD_SAVE_MAGIC, sizeof(header.magic)); + if (safewrite(fd, &header, sizeof(header)) != sizeof(header)) { virReportSystemError(errno, _("unable to write %s"), path); - goto endjob; + goto cleanup; } + if (VIR_CLOSE(fd) < 0) { virReportSystemError(errno, _("unable to close %s"), path); - goto endjob; + goto cleanup; } ret = 0; +cleanup: + VIR_FORCE_CLOSE(fd); + virFileWrapperFdFree(wrapperFd); + + if (ret != 0 && needUnlink) + unlink(path); + + return ret; +} + +/* This internal function expects the driver lock to already be held on + * entry and the vm must be active + locked. Vm will be unlocked and + * potentially free'd after this returns (eg transient VMs are freed + * shutdown). So 'vm' must not be referenced by the caller after + * this returns (whether returning success or failure). 
+ */ +static int +qemuDomainSaveInternal(struct qemud_driver *driver, virDomainPtr dom, + virDomainObjPtr vm, const char *path, + int compressed, const char *xmlin, unsigned int flags) +{ + char *xml = NULL; + bool was_running = false; + int ret = -1; + int rc; + virDomainEventPtr event = NULL; + qemuDomainObjPrivatePtr priv = vm->privateData; + + if (qemuProcessAutoDestroyActive(driver, vm)) { + virReportError(VIR_ERR_OPERATION_INVALID, + "%s", _("domain is marked for auto destroy")); + goto cleanup; + } + + if (qemuDomainObjBeginAsyncJobWithDriver(driver, vm, + QEMU_ASYNC_JOB_SAVE) < 0) + + memset(&priv->job.info, 0, sizeof(priv->job.info)); + priv->job.info.type = VIR_DOMAIN_JOB_UNBOUNDED; + + /* Pause */ + if (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) { + was_running = true; + if (qemuProcessStopCPUs(driver, vm, VIR_DOMAIN_PAUSED_SAVE, + QEMU_ASYNC_JOB_SAVE) < 0) + goto endjob; + + if (!virDomainObjIsActive(vm)) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("guest unexpectedly quit")); + goto endjob; + } + } + + /* libvirt.c already guaranteed these two flags are exclusive. */ + if (flags & VIR_DOMAIN_SAVE_RUNNING) + was_running = true; + else if (flags & VIR_DOMAIN_SAVE_PAUSED) + was_running = false; + + /* Get XML for the domain. Restore needs only the inactive xml, + * including secure. We should get the same result whether xmlin + * is NULL or whether it was the live xml of the domain moments + * before. 
*/ + if (xmlin) { + virDomainDefPtr def = NULL; + + if (!(def = virDomainDefParseString(driver->caps, xmlin, + QEMU_EXPECTED_VIRT_TYPES, + VIR_DOMAIN_XML_INACTIVE))) { + goto endjob; + } + if (!virDomainDefCheckABIStability(vm->def, def)) { + virDomainDefFree(def); + goto endjob; + } + xml = qemuDomainDefFormatLive(driver, def, true, true); + } else { + xml = qemuDomainDefFormatLive(driver, vm->def, true, true); + } + if (!xml) { + virReportError(VIR_ERR_OPERATION_FAILED, + "%s", _("failed to get domain xml")); + goto endjob; + } + + ret = qemuDomainSaveMemory(driver, vm, path, xml, compressed, + was_running, flags, QEMU_ASYNC_JOB_SAVE); + if (ret < 0) + goto endjob; + /* Shut it down */ qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_SAVED, 0); virDomainAuditStop(vm, "saved"); @@ -2937,7 +2970,7 @@ qemuDomainSaveInternal(struct qemud_driver *driver, virDomainPtr dom, endjob: if (vm) { if (ret != 0) { - if (header.was_running && virDomainObjIsActive(vm)) { + if (was_running && virDomainObjIsActive(vm)) { rc = qemuProcessStartCPUs(driver, vm, dom->conn, VIR_DOMAIN_RUNNING_SAVE_CANCELED, QEMU_ASYNC_JOB_SAVE); @@ -2950,11 +2983,7 @@ endjob: } cleanup: - VIR_FORCE_CLOSE(fd); - virFileWrapperFdFree(wrapperFd); VIR_FREE(xml); - if (ret != 0 && needUnlink) - unlink(path); if (event) qemuDomainEventQueue(driver, event); if (vm) -- 1.7.12
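For readers following the refactoring: the XML padding arithmetic that
moves into qemuDomainSaveMemory() can be exercised on its own. The save
image is laid out as header + (padded) XML + migration stream, with the
stream start rounded up to a block boundary. The constants below are
illustrative stand-ins (the real values are QEMU_MONITOR_MIGRATE_TO_FILE_BS
and friends in the qemu driver), so treat this as a sketch of the layout,
not the driver code:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative constants; the real values live in the qemu driver. */
#define MIGRATE_TO_FILE_BS (1024ULL * 1024)   /* assumed block size */
#define XML_EXTRA_PAD 1024ULL                 /* assumed minimum slack */

/* Return the offset at which migration data would start: the XML is
 * padded so that header + XML ends on a MIGRATE_TO_FILE_BS boundary,
 * mirroring the arithmetic in qemuDomainSaveMemory(). */
unsigned long long
migration_data_offset(size_t header_len, size_t xml_len)
{
    unsigned long long offset = header_len + xml_len;
    unsigned long long pad = XML_EXTRA_PAD;

    /* round (offset + pad) up to the next whole block */
    pad += MIGRATE_TO_FILE_BS - ((offset + pad) % MIGRATE_TO_FILE_BS);
    return offset + pad;
}
```

With a 512-byte header and ~1 KiB of XML the migration stream lands on
the first 1 MiB boundary, which is what the padding is for.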

The default behavior while creating external checkpoints is to let the guest run while the memory state is caputred. This leads to a larger save file but minimizes the time needed to take the checkpoint. This patch adds a flag that causes the guest to be paused before taking the snapshot. *include/libvirt/libvirt.h.in: - add new paused reason: VIR_DOMAIN_PAUSED_SNAPSHOT - add new flag for takin snapshot: VIR_DOMAIN_SNAPSHOT_CREATE_PAUSE *tools/virsh-domain-monitor.c: - add string representation for VIR_DOMAIN_PAUSED_SNAPSHOT *tools/virsh-snapshot.c: - add support for VIR_DOMAIN_SNAPSHOT_CREATE_PAUSE *tools/virsh.pod: - add docs for --pause option added to use VIR_DOMAIN_SNAPSHOT_CREATE_PAUSE flag --- include/libvirt/libvirt.h.in | 4 ++++ tools/virsh-domain-monitor.c | 2 ++ tools/virsh-snapshot.c | 6 ++++++ tools/virsh.pod | 12 ++++++++++-- 4 files changed, 22 insertions(+), 2 deletions(-) diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index 81f12a4..376d6d5 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -179,6 +179,7 @@ typedef enum { VIR_DOMAIN_PAUSED_WATCHDOG = 6, /* paused due to a watchdog event */ VIR_DOMAIN_PAUSED_FROM_SNAPSHOT = 7, /* paused after restoring from snapshot */ VIR_DOMAIN_PAUSED_SHUTTING_DOWN = 8, /* paused during shutdown process */ + VIR_DOMAIN_PAUSED_SNAPSHOT = 9, /* paused while creating a snaphot */ #ifdef VIR_ENUM_SENTINELS VIR_DOMAIN_PAUSED_LAST @@ -3663,6 +3664,9 @@ typedef enum { the domain */ VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC = (1 << 7), /* atomically avoid partial changes */ + VIR_DOMAIN_SNAPSHOT_CREATE_PAUSE = (1 << 8), /* pause the guest + before taking + checkpoint */ } virDomainSnapshotCreateFlags; /* Take a snapshot of the current VM state */ diff --git a/tools/virsh-domain-monitor.c b/tools/virsh-domain-monitor.c index b264f15..faf23fe 100644 --- a/tools/virsh-domain-monitor.c +++ b/tools/virsh-domain-monitor.c @@ -223,6 +223,8 @@ vshDomainStateReasonToString(int state, 
int reason) return N_("from snapshot"); case VIR_DOMAIN_PAUSED_SHUTTING_DOWN: return N_("shutting down"); + case VIR_DOMAIN_PAUSED_SNAPSHOT: + return N_("creating snapshot"); case VIR_DOMAIN_PAUSED_UNKNOWN: case VIR_DOMAIN_PAUSED_LAST: ; diff --git a/tools/virsh-snapshot.c b/tools/virsh-snapshot.c index b828371..05ee7bd 100644 --- a/tools/virsh-snapshot.c +++ b/tools/virsh-snapshot.c @@ -127,6 +127,7 @@ static const vshCmdOptDef opts_snapshot_create[] = { {"reuse-external", VSH_OT_BOOL, 0, N_("reuse any existing external files")}, {"quiesce", VSH_OT_BOOL, 0, N_("quiesce guest's file systems")}, {"atomic", VSH_OT_BOOL, 0, N_("require atomic operation")}, + {"pause", VSH_OT_BOOL, 0, N_("pause guest before taking snapshot")}, {NULL, 0, 0, NULL} }; @@ -155,6 +156,8 @@ cmdSnapshotCreate(vshControl *ctl, const vshCmd *cmd) flags |= VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE; if (vshCommandOptBool(cmd, "atomic")) flags |= VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC; + if (vshCommandOptBool(cmd, "pause")) + flags |= VIR_DOMAIN_SNAPSHOT_CREATE_PAUSE; dom = vshCommandOptDomain(ctl, cmd, NULL); if (dom == NULL) @@ -263,6 +266,7 @@ static const vshCmdOptDef opts_snapshot_create_as[] = { {"reuse-external", VSH_OT_BOOL, 0, N_("reuse any existing external files")}, {"quiesce", VSH_OT_BOOL, 0, N_("quiesce guest's file systems")}, {"atomic", VSH_OT_BOOL, 0, N_("require atomic operation")}, + {"pause", VSH_OT_BOOL, 0, N_("pause guest before taking snapshot")}, {"diskspec", VSH_OT_ARGV, 0, N_("disk attributes: disk[,snapshot=type][,driver=type][,file=name]")}, {NULL, 0, 0, NULL} @@ -292,6 +296,8 @@ cmdSnapshotCreateAs(vshControl *ctl, const vshCmd *cmd) flags |= VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE; if (vshCommandOptBool(cmd, "atomic")) flags |= VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC; + if (vshCommandOptBool(cmd, "pause")) + flags |= VIR_DOMAIN_SNAPSHOT_CREATE_PAUSE; dom = vshCommandOptDomain(ctl, cmd, NULL); if (dom == NULL) diff --git a/tools/virsh.pod b/tools/virsh.pod index ac8a00f..b02ca45 100644 --- 
a/tools/virsh.pod +++ b/tools/virsh.pod @@ -2575,7 +2575,7 @@ used to represent properties of snapshots. =item B<snapshot-create> I<domain> [I<xmlfile>] {[I<--redefine> [I<--current>]] | [I<--no-metadata>] [I<--halt>] [I<--disk-only>] [I<--reuse-external>] -[I<--quiesce>] [I<--atomic>]} +[I<--quiesce>] [I<--atomic>] [I<--pause>]} Create a snapshot for domain I<domain> with the properties specified in I<xmlfile>. Normally, the only properties settable for a domain snapshot @@ -2628,6 +2628,10 @@ this. If this flag is not specified, then some hypervisors may fail after partially performing the action, and B<dumpxml> must be used to see whether any partial changes occurred. +If I<--pause> is specified, libvirt pauses the guest before taking the +snapshot to decrease size of the memory image in case of external +checkpoints. Afterwards the guest is resumed. + Existence of snapshot metadata will prevent attempts to B<undefine> a persistent domain. However, for transient domains, snapshot metadata is silently lost when the domain quits running (whether @@ -2636,7 +2640,7 @@ by command such as B<destroy> or by internal guest action). =item B<snapshot-create-as> I<domain> {[I<--print-xml>] | [I<--no-metadata>] [I<--halt>] [I<--reuse-external>]} [I<name>] [I<description>] [I<--disk-only> [I<--quiesce>] [I<--atomic>] -[[I<--diskspec>] B<diskspec>]...] +[I<--pause>] [[I<--diskspec>] B<diskspec>]...] Create a snapshot for domain I<domain> with the given <name> and <description>; if either value is omitted, libvirt will choose a @@ -2681,6 +2685,10 @@ this. If this flag is not specified, then some hypervisors may fail after partially performing the action, and B<dumpxml> must be used to see whether any partial changes occurred. +If I<--pause> is specified, libvirt pauses the guest before taking the +snapshot to decrease size of the memory image in case of external +checkpoints. Afterwards the guest is resumed. 
+ =item B<snapshot-current> I<domain> {[I<--name>] | [I<--security-info>] | [I<snapshotname>]} -- 1.7.12
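The virsh side of this patch boils down to translating boolean options
into flag bits, and the driver later rejects --quiesce without
--disk-only. A self-contained sketch with stand-in bit values (ATOMIC
and the new PAUSE bit match the header diff above; the others are
assumptions, verify against libvirt.h.in before relying on them):

```c
#include <assert.h>

/* Stand-in bit values; ATOMIC/PAUSE mirror the diff, the rest assumed. */
enum {
    SNAP_DISK_ONLY = 1 << 4,
    SNAP_QUIESCE   = 1 << 6,
    SNAP_ATOMIC    = 1 << 7,
    SNAP_PAUSE     = 1 << 8,   /* the new flag added by this patch */
};

/* Fold boolean options into a flags word, as cmdSnapshotCreate does. */
unsigned int
snapshot_flags(int disk_only, int quiesce, int atomic, int pause)
{
    unsigned int flags = 0;
    if (disk_only) flags |= SNAP_DISK_ONLY;
    if (quiesce)   flags |= SNAP_QUIESCE;
    if (atomic)    flags |= SNAP_ATOMIC;
    if (pause)     flags |= SNAP_PAUSE;
    return flags;
}

/* The driver rejects QUIESCE without DISK_ONLY; return 0 if valid. */
int
snapshot_flags_valid(unsigned int flags)
{
    if ((flags & SNAP_QUIESCE) && !(flags & SNAP_DISK_ONLY))
        return -1;
    return 0;
}
```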

On 10/09/2012 07:39 AM, Peter Krempa wrote:
The default behavior while creating external checkpoints is to let the guest run while the memory state is caputred. This leads to a larger
s/caputred/captured/
save file but minimizes the time needed to take the checkpoint.
I'd rather swap the defaults. After all, virDomainMigrate and virDomainCoreDump default to offline, and require a _LIVE flag to force live saves, as live is trickier to implement and might not converge, so it is not a good default.
This patch adds a flag that causes the guest to be paused before taking the snapshot. *include/libvirt/libvirt.h.in: - add new paused reason: VIR_DOMAIN_PAUSED_SNAPSHOT - add new flag for takin snapshot: VIR_DOMAIN_SNAPSHOT_CREATE_PAUSE *tools/virsh-domain-monitor.c: - add string representation for VIR_DOMAIN_PAUSED_SNAPSHOT *tools/virsh-snapshot.c: - add support for VIR_DOMAIN_SNAPSHOT_CREATE_PAUSE *tools/virsh.pod: - add docs for --pause option added to use VIR_DOMAIN_SNAPSHOT_CREATE_PAUSE flag
You'll therefore need to rework this patch into selecting VIR_DOMAIN_SNAPSHOT_CREATE_LIVE with opposite semantics, and default to paused.
--- include/libvirt/libvirt.h.in | 4 ++++ tools/virsh-domain-monitor.c | 2 ++ tools/virsh-snapshot.c | 6 ++++++ tools/virsh.pod | 12 ++++++++++-- 4 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index 81f12a4..376d6d5 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -179,6 +179,7 @@ typedef enum { VIR_DOMAIN_PAUSED_WATCHDOG = 6, /* paused due to a watchdog event */ VIR_DOMAIN_PAUSED_FROM_SNAPSHOT = 7, /* paused after restoring from snapshot */ VIR_DOMAIN_PAUSED_SHUTTING_DOWN = 8, /* paused during shutdown process */ + VIR_DOMAIN_PAUSED_SNAPSHOT = 9, /* paused while creating a snaphot */
Eep - VIR_DOMAIN_PAUSED_FROM_SNAPSHOT and VIR_DOMAIN_PAUSED_SNAPSHOT -
do we really need two values?  I'd rather you just reuse the existing
value.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
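If the default were swapped as suggested earlier in this review, the
pause decision might look like the sketch below: pause by default, with
an explicit _LIVE flag opting into a live memory copy, combined with
the atomic-without-transaction rule used elsewhere in the series. This
is purely an illustration of the review suggestion, not code from any
posted patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper sketching the proposed inverted semantics:
 * paused is the default, a _LIVE flag requests a live snapshot. */
bool
must_pause_before_snapshot(bool flag_live, bool atomic_requested,
                           bool qemu_has_transaction)
{
    /* even a live snapshot must pause when atomicity is requested and
     * qemu lacks the 'transaction' command */
    if (atomic_requested && !qemu_has_transaction)
        return true;
    return !flag_live;   /* paused unless the caller opts into live */
}
```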

The new external system checkpoints will require an async job while the
snapshot is taken. This patch adds QEMU_ASYNC_JOB_SNAPSHOT to track
this job type.
---
 src/qemu/qemu_domain.c  |  3 +++
 src/qemu/qemu_domain.h  |  1 +
 src/qemu/qemu_process.c | 19 +++++++++++++++++++
 3 files changed, 23 insertions(+)

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 19ba099..a414ece 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -68,6 +68,7 @@ VIR_ENUM_IMPL(qemuDomainAsyncJob, QEMU_ASYNC_JOB_LAST,
               "migration in",
               "save",
               "dump",
+              "snapshot",
 );
 
 
@@ -84,6 +85,7 @@ qemuDomainAsyncJobPhaseToString(enum qemuDomainAsyncJob job,
     case QEMU_ASYNC_JOB_DUMP:
     case QEMU_ASYNC_JOB_NONE:
     case QEMU_ASYNC_JOB_LAST:
+    case QEMU_ASYNC_JOB_SNAPSHOT:
         ; /* fall through */
     }
 
@@ -106,6 +108,7 @@ qemuDomainAsyncJobPhaseFromString(enum qemuDomainAsyncJob job,
     case QEMU_ASYNC_JOB_DUMP:
     case QEMU_ASYNC_JOB_NONE:
     case QEMU_ASYNC_JOB_LAST:
+    case QEMU_ASYNC_JOB_SNAPSHOT:
         ; /* fall through */
     }
 
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index 97e4346..86448d6 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -88,6 +88,7 @@ enum qemuDomainAsyncJob {
     QEMU_ASYNC_JOB_MIGRATION_IN,
     QEMU_ASYNC_JOB_SAVE,
     QEMU_ASYNC_JOB_DUMP,
+    QEMU_ASYNC_JOB_SNAPSHOT,
 
     QEMU_ASYNC_JOB_LAST
 };
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 31909b7..0168f0e 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -2972,6 +2972,25 @@ qemuProcessRecoverJob(struct qemud_driver *driver,
         }
         break;
 
+    case QEMU_ASYNC_JOB_SNAPSHOT:
+        qemuDomainObjEnterMonitor(driver, vm);
+        ignore_value(qemuMonitorMigrateCancel(priv->mon));
+        qemuDomainObjExitMonitor(driver, vm);
+        /* resume the domain but only if it was paused as a result of
+         * creating the snapshot. */
+        if (state == VIR_DOMAIN_PAUSED &&
+            ((job->asyncJob == QEMU_ASYNC_JOB_SNAPSHOT &&
+              reason == VIR_DOMAIN_PAUSED_MIGRATION) ||
+             reason == VIR_DOMAIN_PAUSED_UNKNOWN)) {
+            if (qemuProcessStartCPUs(driver, vm, conn,
+                                     VIR_DOMAIN_RUNNING_UNPAUSED,
+                                     QEMU_ASYNC_JOB_NONE) < 0) {
+                VIR_WARN("Could not resume domain %s after snapshot",
+                         vm->def->name);
+            }
+        }
+        break;
+
     case QEMU_ASYNC_JOB_NONE:
     case QEMU_ASYNC_JOB_LAST:
         break;
-- 
1.7.12
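The recovery condition in qemuProcessRecoverJob() compresses to a small
predicate: resume only when the pause was (apparently) caused by the
interrupted snapshot job itself. A sketch, with stand-in enums for the
libvirt values:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in values for the libvirt enums used in qemuProcessRecoverJob(). */
enum dom_state    { DOM_RUNNING, DOM_PAUSED };
enum pause_reason { PAUSED_MIGRATION, PAUSED_UNKNOWN, PAUSED_USER };
enum async_job    { JOB_SNAPSHOT, JOB_OTHER };

/* Resume CPUs after libvirtd restarts mid-snapshot only when the pause
 * looks like it came from the snapshot job (migration-to-file pauses
 * with a migration reason) or the reason is unknown. */
bool
should_resume_after_recovery(enum dom_state state, enum async_job job,
                             enum pause_reason reason)
{
    return state == DOM_PAUSED &&
           ((job == JOB_SNAPSHOT && reason == PAUSED_MIGRATION) ||
            reason == PAUSED_UNKNOWN);
}
```

A user-initiated pause (PAUSED_USER here) is deliberately left alone.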

Until now, libvirt supported only internal snapshots for active guests.
This patch renames the function to
qemuDomainSnapshotCreateActiveInternal to prepare the ground for
external active snapshots.
---
 src/qemu/qemu_driver.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index b8ea289..297ae29 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -10447,13 +10447,14 @@ qemuDomainSnapshotCreateInactive(struct qemud_driver *driver,
     return qemuDomainSnapshotForEachQcow2(driver, vm, snap, "-c", false);
 }
 
+
 /* The domain is expected to be locked and active. */
 static int
-qemuDomainSnapshotCreateActive(virConnectPtr conn,
-                               struct qemud_driver *driver,
-                               virDomainObjPtr *vmptr,
-                               virDomainSnapshotObjPtr snap,
-                               unsigned int flags)
+qemuDomainSnapshotCreateActiveInternal(virConnectPtr conn,
+                                       struct qemud_driver *driver,
+                                       virDomainObjPtr *vmptr,
+                                       virDomainSnapshotObjPtr snap,
+                                       unsigned int flags)
 {
     virDomainObjPtr vm = *vmptr;
     qemuDomainObjPrivatePtr priv = vm->privateData;
@@ -11280,8 +11281,8 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain,
         if (qemuDomainSnapshotCreateInactive(driver, vm, snap) < 0)
             goto cleanup;
     } else {
-        if (qemuDomainSnapshotCreateActive(domain->conn, driver,
-                                           &vm, snap, flags) < 0)
+        if (qemuDomainSnapshotCreateActiveInternal(domain->conn, driver,
+                                                   &vm, snap, flags) < 0)
             goto cleanup;
     }
 
-- 
1.7.12

This patch adds support to take external system checkpoints. The functionality is layered on top of the previous disk-only snapshot code. When the checkpoint is requested the domain memory is saved to the memory image file using live migration to file. (The user may specify to do take the memory image while the guest is paused with the VIR_DOMAIN_SNAPSHOT_CREATE_PAUSE flag.) This operation pauses the guest. After the guest is paused the disk snapshot is taken. The memory save image shares format with the image created by virDomainSave() API. --- src/qemu/qemu_driver.c | 212 +++++++++++++++++++++++++++++++------------------ 1 file changed, 135 insertions(+), 77 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 297ae29..5cace0c 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -10846,31 +10846,25 @@ cleanup: /* The domain is expected to be locked and active. */ static int -qemuDomainSnapshotCreateDiskActive(virConnectPtr conn, - struct qemud_driver *driver, - virDomainObjPtr *vmptr, +qemuDomainSnapshotCreateDiskActive(struct qemud_driver *driver, + virDomainObjPtr vm, virDomainSnapshotObjPtr snap, - unsigned int flags) + unsigned int flags, + enum qemuDomainAsyncJob asyncJob) { - virDomainObjPtr vm = *vmptr; qemuDomainObjPrivatePtr priv = vm->privateData; virJSONValuePtr actions = NULL; - bool resume = false; int ret = -1; int i; bool persist = false; int thaw = 0; /* 1 if freeze succeeded, -1 if freeze failed */ - bool atomic = (flags & VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC) != 0; bool reuse = (flags & VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT) != 0; virCgroupPtr cgroup = NULL; - if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_MODIFY) < 0) - return -1; - if (!virDomainObjIsActive(vm)) { virReportError(VIR_ERR_OPERATION_INVALID, "%s", _("domain is not running")); - goto endjob; + goto cleanup; } if (qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_DEVICES) && @@ -10878,7 +10872,7 @@ 
qemuDomainSnapshotCreateDiskActive(virConnectPtr conn, virReportError(VIR_ERR_INTERNAL_ERROR, _("Unable to find cgroup for %s"), vm->def->name); - goto endjob; + goto cleanup; } /* 'cgroup' is still NULL if cgroups are disabled. */ @@ -10890,34 +10884,14 @@ qemuDomainSnapshotCreateDiskActive(virConnectPtr conn, if (qemuDomainSnapshotFSFreeze(driver, vm) < 0) { /* helper reported the error */ thaw = -1; - goto endjob; + goto cleanup; } else { thaw = 1; } } - /* For multiple disks, libvirt must pause externally to get all - * snapshots to be at the same point in time, unless qemu supports - * transactions. For a single disk, snapshot is atomic without - * requiring a pause. Thanks to qemuDomainSnapshotPrepare, if - * we got to this point, the atomic flag now says whether we need - * to pause, and a capability bit says whether to use transaction. - */ - if (!atomic && virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) { - if (qemuProcessStopCPUs(driver, vm, VIR_DOMAIN_PAUSED_SAVE, - QEMU_ASYNC_JOB_NONE) < 0) - goto cleanup; - - resume = true; - if (!virDomainObjIsActive(vm)) { - virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("guest unexpectedly quit")); - goto cleanup; - } - } if (qemuCapsGet(priv->caps, QEMU_CAPS_TRANSACTION)) { - actions = virJSONValueNewArray(); - if (!actions) { + if (!(actions = virJSONValueNewArray())) { virReportOOMError(); goto cleanup; } @@ -10928,7 +10902,9 @@ qemuDomainSnapshotCreateDiskActive(virConnectPtr conn, * Based on earlier qemuDomainSnapshotPrepare, all * disks in this list are now either SNAPSHOT_NO, or * SNAPSHOT_EXTERNAL with a valid file name and qcow2 format. 
 */
-    qemuDomainObjEnterMonitorWithDriver(driver, vm);
+    if (qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob) < 0)
+        goto cleanup;
+
     for (i = 0; i < snap->def->ndisks; i++) {
         virDomainDiskDefPtr persistDisk = NULL;
@@ -10982,9 +10958,96 @@ qemuDomainSnapshotCreateDiskActive(virConnectPtr conn,
         }
     }
     qemuDomainObjExitMonitorWithDriver(driver, vm);
-    if (ret < 0)
+
+cleanup:
+    virCgroupFree(&cgroup);
+
+    if (ret == 0 || !qemuCapsGet(priv->caps, QEMU_CAPS_TRANSACTION)) {
+        if (virDomainSaveStatus(driver->caps, driver->stateDir, vm) < 0 ||
+            (persist && virDomainSaveConfig(driver->configDir, vm->newDef) < 0))
+            ret = -1;
+    }
+
+    if (thaw != 0 &&
+        qemuDomainSnapshotFSThaw(driver, vm, thaw > 0) < 0) {
+        /* helper reported the error, if it was needed */
+        if (thaw > 0)
+            ret = -1;
+    }
+
+    return ret;
+}
+
+
+static int
+qemuDomainSnapshotCreateActiveExternal(virConnectPtr conn,
+                                       struct qemud_driver *driver,
+                                       virDomainObjPtr *vmptr,
+                                       virDomainSnapshotObjPtr snap,
+                                       unsigned int flags)
+{
+    bool resume = false;
+    int ret = -1;
+    virDomainObjPtr vm = *vmptr;
+    qemuDomainObjPrivatePtr priv = vm->privateData;
+    char *xml = NULL;
+    bool memory = snap->def->memory == VIR_DOMAIN_SNAPSHOT_LOCATION_EXTERNAL;
+    bool atomic = !!(flags & VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC);
+    bool transaction = qemuCapsGet(priv->caps, QEMU_CAPS_TRANSACTION);
+
+    if (qemuDomainObjBeginAsyncJobWithDriver(driver, vm,
+                                             QEMU_ASYNC_JOB_SNAPSHOT) < 0)
         goto cleanup;
+
+    /* we need to resume the guest only if it was previously running */
+    if (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) {
+        resume = true;
+
+        /* For multiple disks, libvirt must pause externally to get all
+         * snapshots to be at the same point in time, unless qemu supports
+         * transactions.  For a single disk, snapshot is atomic without
+         * requiring a pause.  Thanks to qemuDomainSnapshotPrepare, if
+         * we got to this point, the atomic flag now says whether we need
+         * to pause, and a capability bit says whether to use transaction.
+         */
+        if (flags & VIR_DOMAIN_SNAPSHOT_CREATE_PAUSE ||
+            (atomic && !transaction)) {
+            if (qemuProcessStopCPUs(driver, vm, VIR_DOMAIN_PAUSED_SNAPSHOT,
+                                    QEMU_ASYNC_JOB_SNAPSHOT) < 0)
+                goto endjob;
+
+            if (!virDomainObjIsActive(vm)) {
+                virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                               _("guest unexpectedly quit"));
+                goto endjob;
+            }
+        }
+    }
+
+    /* do the memory snapshot if necessary */
+    if (memory) {
+        if (!(xml = qemuDomainDefFormatLive(driver, vm->def, true, false)))
+            goto endjob;
+
+        if ((ret = qemuDomainSaveMemory(driver, vm, snap->def->file,
+                                        xml, QEMUD_SAVE_FORMAT_RAW,
+                                        resume, 0,
+                                        QEMU_ASYNC_JOB_SNAPSHOT)) < 0)
+            goto endjob;
+    }
+
+    /* the domain is now paused if:
+     * - a memory snapshot was requested
+     * - an atomic snapshot was requested AND
+     *   qemu does not support transactions
+     *
+     * Next we snapshot the disks.
+     */
+    if ((ret = qemuDomainSnapshotCreateDiskActive(driver, vm, snap, flags,
+                                                  QEMU_ASYNC_JOB_SNAPSHOT)) < 0)
+        goto endjob;
+
+    /* the snapshot is complete now */
     if (flags & VIR_DOMAIN_SNAPSHOT_CREATE_HALT) {
         virDomainEventPtr event;
@@ -10994,53 +11057,41 @@ qemuDomainSnapshotCreateDiskActive(virConnectPtr conn,
         virDomainAuditStop(vm, "from-snapshot");
         /* We already filtered the _HALT flag for persistent domains
          * only, so this end job never drops the last reference.
          */
-        ignore_value(qemuDomainObjEndJob(driver, vm));
+        ignore_value(qemuDomainObjEndAsyncJob(driver, vm));
         resume = false;
-        thaw = 0;
         vm = NULL;
         if (event)
             qemuDomainEventQueue(driver, event);
     }
 
-cleanup:
-    if (resume && virDomainObjIsActive(vm)) {
-        if (qemuProcessStartCPUs(driver, vm, conn,
-                                 VIR_DOMAIN_RUNNING_UNPAUSED,
-                                 QEMU_ASYNC_JOB_NONE) < 0 &&
-            virGetLastError() == NULL) {
-            virReportError(VIR_ERR_OPERATION_FAILED, "%s",
-                           _("resuming after snapshot failed"));
-            goto endjob;
-        }
-    }
-
-    if (vm && (ret == 0 ||
-               !qemuCapsGet(priv->caps, QEMU_CAPS_TRANSACTION))) {
-        if (virDomainSaveStatus(driver->caps, driver->stateDir, vm) < 0 ||
-            (persist &&
-             virDomainSaveConfig(driver->configDir, vm->newDef) < 0))
-            ret = -1;
-    }
+    ret = 0;
 
 endjob:
-    if (cgroup)
-        virCgroupFree(&cgroup);
-    if (vm && thaw != 0 &&
-        qemuDomainSnapshotFSThaw(driver, vm, thaw > 0) < 0) {
-        /* helper reported the error, if it was needed */
-        if (thaw > 0)
-            ret = -1;
-    }
-    if (vm && (qemuDomainObjEndJob(driver, vm) == 0)) {
+    if (vm && !qemuDomainObjEndAsyncJob(driver, vm)) {
         /* Only possible if a transient vm quit while our locks were down,
-         * in which case we don't want to save snapshot metadata. */
+         * in which case we don't want to save snapshot metadata.
+         */
         *vmptr = NULL;
         ret = -1;
     }
 
+cleanup:
+    VIR_FREE(xml);
+    if (resume && vm && virDomainObjIsActive(vm) &&
+        qemuProcessStartCPUs(driver, vm, conn,
+                             VIR_DOMAIN_RUNNING_UNPAUSED,
+                             QEMU_ASYNC_JOB_NONE) < 0 &&
+        virGetLastError() == NULL) {
+        virReportError(VIR_ERR_OPERATION_FAILED, "%s",
+                       _("resuming after snapshot failed"));
+
+        return -1;
+    }
+
     return ret;
 }
 
+
 static virDomainSnapshotPtr
 qemuDomainSnapshotCreateXML(virDomainPtr domain,
                             const char *xmlDesc,
@@ -11066,7 +11117,8 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain,
                                   VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY |
                                   VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT |
                                   VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE |
-                                  VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC, NULL);
+                                  VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC |
+                                  VIR_DOMAIN_SNAPSHOT_CREATE_PAUSE, NULL);
 
     if ((flags & VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE) &&
         !(flags & VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY)) {
@@ -11273,16 +11325,22 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain,
             /* XXX Should we validate that the redefined snapshot even
              * makes sense, such as checking that qemu-img recognizes the
              * snapshot name in at least one of the domain's disks?
              */
-    } else if (flags & VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY) {
-        if (qemuDomainSnapshotCreateDiskActive(domain->conn, driver,
-                                               &vm, snap, flags) < 0)
-            goto cleanup;
-    } else if (!virDomainObjIsActive(vm)) {
-        if (qemuDomainSnapshotCreateInactive(driver, vm, snap) < 0)
-            goto cleanup;
+    } else if (virDomainObjIsActive(vm)) {
+        if (flags & VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY ||
+            snap->def->memory == VIR_DOMAIN_SNAPSHOT_LOCATION_EXTERNAL) {
+            /* external checkpoint or disk snapshot */
+            if (qemuDomainSnapshotCreateActiveExternal(domain->conn, driver,
+                                                       &vm, snap, flags) < 0)
+                goto cleanup;
+        } else {
+            /* internal checkpoint */
+            if (qemuDomainSnapshotCreateActiveInternal(domain->conn, driver,
+                                                       &vm, snap, flags) < 0)
+                goto cleanup;
+        }
     } else {
-        if (qemuDomainSnapshotCreateActiveInternal(domain->conn, driver,
-                                                   &vm, snap, flags) < 0)
+        /* inactive */
+        if (qemuDomainSnapshotCreateInactive(driver, vm, snap) < 0)
            goto cleanup;
     }
-- 
1.7.12

Some of the pre-snapshot checks have restrictions wired in regarding configuration options that influence the taking of external checkpoints. This patch removes the restrictions that would inhibit taking such a snapshot. --- src/qemu/qemu_driver.c | 26 ++++++++++++-------------- 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 5cace0c..3a9f42e 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -10585,13 +10585,6 @@ qemuDomainSnapshotPrepare(virDomainObjPtr vm, virDomainSnapshotDefPtr def,
             break;
 
         case VIR_DOMAIN_SNAPSHOT_LOCATION_EXTERNAL:
-            if (def->state != VIR_DOMAIN_DISK_SNAPSHOT) {
-                virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
-                               _("system checkpoint external snapshot for "
-                                 "disk %s not implemented yet"),
-                               disk->name);
-                goto cleanup;
-            }
             if (!disk->driverType) {
                 if (!(disk->driverType = strdup("qcow2"))) {
                     virReportOOMError();
@@ -10634,12 +10627,15 @@ qemuDomainSnapshotPrepare(virDomainObjPtr vm, virDomainSnapshotDefPtr def,
         }
     }
 
-    if (!found) {
+    /* external snapshot is possible without specifying a disk to snapshot */
+    if (!found &&
+        !(def->memory == VIR_DOMAIN_SNAPSHOT_LOCATION_EXTERNAL)) {
         virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
-                       _("snapshots require at least one disk to be "
-                         "selected for snapshot"));
+                       _("internal and disk-only snapshots require at least "
+                         "one disk to be selected for snapshot"));
         goto cleanup;
     }
+
     if (def->state != VIR_DOMAIN_DISK_SNAPSHOT && active) {
         if (external == 1 ||
             qemuCapsGet(priv->caps, QEMU_CAPS_TRANSACTION)) {
@@ -11255,7 +11251,8 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain,
             snap = other;
         }
         if (def->dom) {
-            if (def->state == VIR_DOMAIN_DISK_SNAPSHOT) {
+            if (def->state == VIR_DOMAIN_DISK_SNAPSHOT ||
+                def->memory == VIR_DOMAIN_SNAPSHOT_LOCATION_EXTERNAL) {
                 align_location = VIR_DOMAIN_SNAPSHOT_LOCATION_EXTERNAL;
                 align_match = false;
             }
@@ -11285,9 +11282,10 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain,
                 def->memory = VIR_DOMAIN_SNAPSHOT_LOCATION_NONE;
         } else {
             def->state = virDomainObjGetState(vm, NULL);
-            def->memory = (def->state == VIR_DOMAIN_SHUTOFF ?
-                           VIR_DOMAIN_SNAPSHOT_LOCATION_NONE :
-                           VIR_DOMAIN_SNAPSHOT_LOCATION_INTERNAL);
+            if (def->memory == VIR_DOMAIN_SNAPSHOT_LOCATION_EXTERNAL) {
+                align_match = false;
+                align_location = VIR_DOMAIN_SNAPSHOT_LOCATION_EXTERNAL;
+            }
         }
         if (virDomainSnapshotAlignDisks(def, align_location,
                                         align_match) < 0 ||
-- 
1.7.12
Participants (2):
- Eric Blake
- Peter Krempa