[PATCH 0/9] Introduce instant memory snapshots

Hi, everyone. The details for instant snapshot are given in main patch "qemu: support instant snapshots". Here I only want to mention two moments. First as instant snapshot API is synchronous we need to add an event that snapshot moment in time is passed so that client is free to make changes to guest. This way client can make use of instant snapshot while memory is still being written to disk. Second there is work in progress on adding asynchronous revert to snapshot [1]. The idea is to write memory during snapshot in such a way that it is possible to do a revert to snapshot in background. Namely start guest on revert with only minimal amount of memory restored from disk and then restore memory in background in postcopy migration fashion. In order to make it possible snapshot memory should be written to disk not just as plain migration stream. Thus the [1] patch adds a special tool to write the migration stream during snapshot. So if/when patch [1] will be merged I plan to add 'instant revert' mode to snapshot too. [1] [RFC PATCH v2 0/7] migration/snapshot: External snapshot utility https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg03619.html Nikolay Shirokovskiy (9): qemu: make snap possibility check before pausing CPUS qemu: snapshot: remove redundant flag setting qemu: don't bother saving ret code from qemuDomainSaveMemory qemu: move virQEMUSaveDataFree auto decl to header qemu: factor out qemuDomainSnapshotSaveMemory qemu: introduce qemuMigrationParamsSetCapability qemu: support instant mode in qemuMigrationSrcToFile qemu: support instant snapshots virsh: add --instant flag to snapshot-create docs/manpages/virsh.rst | 8 +- include/libvirt/libvirt-domain-snapshot.h | 2 + src/qemu/qemu_driver.c | 4 +- src/qemu/qemu_migration.c | 20 ++- src/qemu/qemu_migration.h | 1 + src/qemu/qemu_migration_params.c | 13 ++ src/qemu/qemu_migration_params.h | 5 + src/qemu/qemu_saveimage.c | 4 +- src/qemu/qemu_saveimage.h | 2 + src/qemu/qemu_snapshot.c | 158 +++++++++++++++------- tools/virsh-snapshot.c | 6 + 11 files changed, 162 insertions(+), 61 deletions(-) -- 2.27.0

This way we won't unnecessaryly pause VMs. Looks like pausing does not afect qemuMigrationSrcIsAllowed checks so we can check without pausing. Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- src/qemu/qemu_snapshot.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_snapshot.c b/src/qemu/qemu_snapshot.c index d105eead27..007c55b82a 100644 --- a/src/qemu/qemu_snapshot.c +++ b/src/qemu/qemu_snapshot.c @@ -1363,6 +1363,10 @@ qemuSnapshotCreateActiveExternal(virQEMUDriver *driver, virQEMUSaveData *data = NULL; g_autoptr(GHashTable) blockNamedNodeData = NULL; + /* check if migration is possible */ + if (memory && !qemuMigrationSrcIsAllowed(driver, vm, false, 0)) + return -1; + /* If quiesce was requested, then issue a freeze command, and a * counterpart thaw command when it is actually sent to agent. * The command will fail if the guest is paused or the guest agent @@ -1423,10 +1427,6 @@ qemuSnapshotCreateActiveExternal(virQEMUDriver *driver, /* do the memory snapshot if necessary */ if (memory) { - /* check if migration is possible */ - if (!qemuMigrationSrcIsAllowed(driver, vm, false, 0)) - goto cleanup; - priv->job.current->statsType = QEMU_DOMAIN_JOB_STATS_TYPE_SAVEDUMP; /* allow the migration job to be cancelled or the domain to be paused */ -- 2.27.0

On Thu, Nov 11, 2021 at 11:55:46 +0300, Nikolay Shirokovskiy wrote:
This way we won't unnecessaryly pause VMs. Looks like pausing does not afect qemuMigrationSrcIsAllowed checks so we can check without pausing.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- src/qemu/qemu_snapshot.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
Reviewed-by: Peter Krempa <pkrempa@redhat.com>

It is already set above. Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- src/qemu/qemu_snapshot.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/src/qemu/qemu_snapshot.c b/src/qemu/qemu_snapshot.c index 007c55b82a..aa9b2ebbd9 100644 --- a/src/qemu/qemu_snapshot.c +++ b/src/qemu/qemu_snapshot.c @@ -1413,8 +1413,6 @@ qemuSnapshotCreateActiveExternal(virQEMUDriver *driver, _("guest unexpectedly quit")); goto cleanup; } - - resume = true; } } -- 2.27.0

On Thu, Nov 11, 2021 at 11:55:47 +0300, Nikolay Shirokovskiy wrote:
It is already set above.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- src/qemu/qemu_snapshot.c | 2 -- 1 file changed, 2 deletions(-)
Reviewed-by: Peter Krempa <pkrempa@redhat.com>

It does not make a difference. qemuDomainSaveMemory ret code is either 0 or -1. If it is -1 then ret is already set to -1. If it is 0 then it is overwritten by qemuSnapshotCreateActiveExternalDisks anyway. Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- src/qemu/qemu_snapshot.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/qemu/qemu_snapshot.c b/src/qemu/qemu_snapshot.c index aa9b2ebbd9..8656affa25 100644 --- a/src/qemu/qemu_snapshot.c +++ b/src/qemu/qemu_snapshot.c @@ -1451,9 +1451,9 @@ qemuSnapshotCreateActiveExternal(virQEMUDriver *driver, memory_existing = virFileExists(snapdef->memorysnapshotfile); - if ((ret = qemuSaveImageCreate(driver, vm, snapdef->memorysnapshotfile, - data, compressor, 0, - QEMU_ASYNC_JOB_SNAPSHOT)) < 0) + if (qemuSaveImageCreate(driver, vm, snapdef->memorysnapshotfile, + data, compressor, 0, + QEMU_ASYNC_JOB_SNAPSHOT) < 0) goto cleanup; /* the memory image was created, remove it on errors */ -- 2.27.0

On Thu, Nov 11, 2021 at 11:55:48 +0300, Nikolay Shirokovskiy wrote:
It does not make a difference. qemuDomainSaveMemory ret code is either 0 or -1. If it is -1 then ret is already set to -1. If it is 0 then it is overwritten by qemuSnapshotCreateActiveExternalDisks anyway.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- src/qemu/qemu_snapshot.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
Reviewed-by: Peter Krempa <pkrempa@redhat.com>

In order to use it in qemu_driver.c which has of plenty of virQEMUSaveDataFree calls. Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- src/qemu/qemu_saveimage.c | 1 - src/qemu/qemu_saveimage.h | 1 + 2 files changed, 1 insertion(+), 1 deletion(-) diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c index e14e2987f1..e03b79c303 100644 --- a/src/qemu/qemu_saveimage.c +++ b/src/qemu/qemu_saveimage.c @@ -88,7 +88,6 @@ virQEMUSaveDataFree(virQEMUSaveData *data) g_free(data); } -G_DEFINE_AUTOPTR_CLEANUP_FUNC(virQEMUSaveData, virQEMUSaveDataFree); /** * This function steals @domXML on success. diff --git a/src/qemu/qemu_saveimage.h b/src/qemu/qemu_saveimage.h index 45c5f35e11..0710426742 100644 --- a/src/qemu/qemu_saveimage.h +++ b/src/qemu/qemu_saveimage.h @@ -112,3 +112,4 @@ virQEMUSaveDataNew(char *domXML, void virQEMUSaveDataFree(virQEMUSaveData *data); +G_DEFINE_AUTOPTR_CLEANUP_FUNC(virQEMUSaveData, virQEMUSaveDataFree); -- 2.27.0

On Thu, Nov 11, 2021 at 11:55:49 +0300, Nikolay Shirokovskiy wrote:
In order to use it in qemu_driver.c which has of plenty of virQEMUSaveDataFree calls.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- src/qemu/qemu_saveimage.c | 1 - src/qemu/qemu_saveimage.h | 1 + 2 files changed, 1 insertion(+), 1 deletion(-)
Reviewed-by: Peter Krempa <pkrempa@redhat.com>

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- src/qemu/qemu_snapshot.c | 83 ++++++++++++++++++++++++---------------- 1 file changed, 49 insertions(+), 34 deletions(-) diff --git a/src/qemu/qemu_snapshot.c b/src/qemu/qemu_snapshot.c index 8656affa25..7e4dadb876 100644 --- a/src/qemu/qemu_snapshot.c +++ b/src/qemu/qemu_snapshot.c @@ -1339,6 +1339,54 @@ qemuSnapshotCreateActiveExternalDisks(virDomainObj *vm, return 0; } +static int +qemuSnapshotSaveMemory(virQEMUDriver *driver, + virDomainObj *vm, + virDomainSnapshotDef *snapdef, + bool running, + virQEMUDriverConfig *cfg) +{ + qemuDomainObjPrivate *priv = vm->privateData; + g_autoptr(virCommand) compressor = NULL; + g_autoptr(virQEMUSaveData) data = NULL; + g_autofree char *xml = NULL; + int compressed; + + priv->job.current->statsType = QEMU_DOMAIN_JOB_STATS_TYPE_SAVEDUMP; + + /* allow the migration job to be cancelled or the domain to be paused */ + qemuDomainObjSetAsyncJobMask(vm, (QEMU_JOB_DEFAULT_MASK | + JOB_MASK(QEMU_JOB_SUSPEND) | + JOB_MASK(QEMU_JOB_MIGRATION_OP))); + + if ((compressed = qemuSaveImageGetCompressionProgram(cfg->snapshotImageFormat, + &compressor, + "snapshot", false)) < 0) + return -1; + + if (!(xml = qemuDomainDefFormatLive(driver, priv->qemuCaps, + vm->def, priv->origCPU, + true, true)) || + !(snapdef->cookie = (virObject *) qemuDomainSaveCookieNew(vm))) + return -1; + + if (!(data = virQEMUSaveDataNew(xml, + (qemuDomainSaveCookie *) snapdef->cookie, + running, compressed, driver->xmlopt))) + return -1; + xml = NULL; + + if (qemuSaveImageCreate(driver, vm, snapdef->memorysnapshotfile, + data, compressor, 0, + QEMU_ASYNC_JOB_SNAPSHOT) < 0) + return -1; + + /* forbid any further manipulation */ + qemuDomainObjSetAsyncJobMask(vm, QEMU_JOB_DEFAULT_MASK); + + return 0; +} + static int qemuSnapshotCreateActiveExternal(virQEMUDriver *driver, @@ -1351,16 +1399,12 @@ qemuSnapshotCreateActiveExternal(virQEMUDriver *driver, bool resume = false; int ret = -1; qemuDomainObjPrivate *priv = vm->privateData; - g_autofree char *xml = NULL; virDomainSnapshotDef *snapdef = virDomainSnapshotObjGetDef(snap); bool memory = snapdef->memory == VIR_DOMAIN_SNAPSHOT_LOCATION_EXTERNAL; bool memory_unlink = false; bool memory_existing = false; bool thaw = false; bool pmsuspended = false; - int compressed; - g_autoptr(virCommand) compressor = NULL; - virQEMUSaveData *data = NULL; g_autoptr(GHashTable) blockNamedNodeData = NULL; /* check if migration is possible */ @@ -1425,43 +1469,15 @@ qemuSnapshotCreateActiveExternal(virQEMUDriver *driver, /* do the memory snapshot if necessary */ if (memory) { - priv->job.current->statsType = QEMU_DOMAIN_JOB_STATS_TYPE_SAVEDUMP; - - /* allow the migration job to be cancelled or the domain to be paused */ - qemuDomainObjSetAsyncJobMask(vm, (QEMU_JOB_DEFAULT_MASK | - JOB_MASK(QEMU_JOB_SUSPEND) | - JOB_MASK(QEMU_JOB_MIGRATION_OP))); - - if ((compressed = qemuSaveImageGetCompressionProgram(cfg->snapshotImageFormat, - &compressor, - "snapshot", false)) < 0) - goto cleanup; - - if (!(xml = qemuDomainDefFormatLive(driver, priv->qemuCaps, - vm->def, priv->origCPU, - true, true)) || - !(snapdef->cookie = (virObject *) qemuDomainSaveCookieNew(vm))) - goto cleanup; - - if (!(data = virQEMUSaveDataNew(xml, - (qemuDomainSaveCookie *) snapdef->cookie, - resume, compressed, driver->xmlopt))) - goto cleanup; - xml = NULL; - memory_existing = virFileExists(snapdef->memorysnapshotfile); - if (qemuSaveImageCreate(driver, vm, snapdef->memorysnapshotfile, - data, compressor, 0, - QEMU_ASYNC_JOB_SNAPSHOT) < 0) + if (qemuSnapshotSaveMemory(driver, vm, snapdef, resume, cfg) < 0) goto cleanup; /* the memory image was created, remove it on errors */ if (!memory_existing) memory_unlink = true; - /* forbid any further manipulation */ - qemuDomainObjSetAsyncJobMask(vm, QEMU_JOB_DEFAULT_MASK); } /* the domain is now paused if a memory snapshot was requested */ @@ -1521,7 +1537,6 @@ qemuSnapshotCreateActiveExternal(virQEMUDriver *driver, qemuDomainObjEndAgentJob(vm); } - virQEMUSaveDataFree(data); if (memory_unlink && ret < 0) unlink(snapdef->memorysnapshotfile); -- 2.27.0

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- src/qemu/qemu_migration_params.c | 12 ++++++++++++ src/qemu/qemu_migration_params.h | 4 ++++ 2 files changed, 16 insertions(+) diff --git a/src/qemu/qemu_migration_params.c b/src/qemu/qemu_migration_params.c index b6c582aaca..11081dc11c 100644 --- a/src/qemu/qemu_migration_params.c +++ b/src/qemu/qemu_migration_params.c @@ -1125,6 +1125,18 @@ qemuMigrationParamsGetULL(qemuMigrationParams *migParams, } +void +qemuMigrationParamsSetCapability(qemuMigrationParams *migParams, + qemuMigrationCapability cap, + bool value) +{ + if (value) + ignore_value(virBitmapSetBit(migParams->caps, cap)); + else + ignore_value(virBitmapClearBit(migParams->caps, cap)); +} + + /** * qemuMigrationParamsCheck: * diff --git a/src/qemu/qemu_migration_params.h b/src/qemu/qemu_migration_params.h index f770bd2576..5ca171226f 100644 --- a/src/qemu/qemu_migration_params.h +++ b/src/qemu/qemu_migration_params.h @@ -133,6 +133,10 @@ qemuMigrationParamsGetULL(qemuMigrationParams *migParams, unsigned long long *value); void +qemuMigrationParamsSetCapability(qemuMigrationParams *migParams, + qemuMigrationCapability cap, + bool value); +void qemuMigrationParamsSetBlockDirtyBitmapMapping(qemuMigrationParams *migParams, virJSONValue **params); -- 2.27.0

We only need to turn QEMU_MIGRATION_CAP_BACKGROUND_SNAPSHOT migration capabiliti for this. Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- src/qemu/qemu_driver.c | 4 ++-- src/qemu/qemu_migration.c | 20 +++++++++++++++----- src/qemu/qemu_migration.h | 1 + src/qemu/qemu_migration_params.c | 1 + src/qemu/qemu_migration_params.h | 1 + src/qemu/qemu_saveimage.c | 3 ++- src/qemu/qemu_saveimage.h | 1 + src/qemu/qemu_snapshot.c | 5 +++-- 8 files changed, 26 insertions(+), 10 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 6a77d9f513..8edc38d343 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -2855,7 +2855,7 @@ qemuDomainSaveInternal(virQEMUDriver *driver, xml = NULL; ret = qemuSaveImageCreate(driver, vm, path, data, compressor, - flags, QEMU_ASYNC_JOB_SAVE); + flags, false, QEMU_ASYNC_JOB_SAVE); if (ret < 0) goto endjob; @@ -3238,7 +3238,7 @@ doCoreDump(virQEMUDriver *driver, if (!qemuMigrationSrcIsAllowed(driver, vm, false, 0)) goto cleanup; - rc = qemuMigrationSrcToFile(driver, vm, fd, compressor, + rc = qemuMigrationSrcToFile(driver, vm, fd, compressor, false, QEMU_ASYNC_JOB_DUMP); } diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 9729041846..18a01df3d0 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -5898,6 +5898,7 @@ int qemuMigrationSrcToFile(virQEMUDriver *driver, virDomainObj *vm, int fd, virCommand *compressor, + bool instant, qemuDomainAsyncJob asyncJob) { qemuDomainObjPrivate *priv = vm->privateData; @@ -5924,6 +5925,11 @@ qemuMigrationSrcToFile(virQEMUDriver *driver, virDomainObj *vm, QEMU_DOMAIN_MIG_BANDWIDTH_MAX * 1024 * 1024) < 0) return -1; + if (instant) + qemuMigrationParamsSetCapability(migParams, + QEMU_MIGRATION_CAP_BACKGROUND_SNAPSHOT, + true); + if (qemuMigrationParamsApply(driver, vm, asyncJob, migParams) < 0) return -1; @@ -6018,11 +6024,15 @@ qemuMigrationSrcToFile(virQEMUDriver *driver, virDomainObj *vm, /* Restore max migration bandwidth */ if (virDomainObjIsActive(vm)) { if (bwParam) { - if (qemuMigrationParamsSetULL(migParams, - QEMU_MIGRATION_PARAM_MAX_BANDWIDTH, - saveMigBandwidth * 1024 * 1024) == 0) - ignore_value(qemuMigrationParamsApply(driver, vm, asyncJob, - migParams)); + qemuMigrationParamsSetULL(migParams, + QEMU_MIGRATION_PARAM_MAX_BANDWIDTH, + saveMigBandwidth * 1024 * 1024); + if (instant) + qemuMigrationParamsSetCapability(migParams, + QEMU_MIGRATION_CAP_BACKGROUND_SNAPSHOT, + false); + ignore_value(qemuMigrationParamsApply(driver, vm, asyncJob, + migParams)); } else { if (qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob) == 0) { qemuMonitorSetMigrationSpeed(priv->mon, saveMigBandwidth); diff --git a/src/qemu/qemu_migration.h b/src/qemu/qemu_migration.h index dd74f8bc88..9d91fbc882 100644 --- a/src/qemu/qemu_migration.h +++ b/src/qemu/qemu_migration.h @@ -209,6 +209,7 @@ qemuMigrationSrcToFile(virQEMUDriver *driver, virDomainObj *vm, int fd, virCommand *compressor, + bool instant, qemuDomainAsyncJob asyncJob) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; diff --git a/src/qemu/qemu_migration_params.c b/src/qemu/qemu_migration_params.c index 11081dc11c..148aa2d12d 100644 --- a/src/qemu/qemu_migration_params.c +++ b/src/qemu/qemu_migration_params.c @@ -90,6 +90,7 @@ VIR_ENUM_IMPL(qemuMigrationCapability, "late-block-activate", "multifd", "dirty-bitmaps", + "background-snapshot", ); diff --git a/src/qemu/qemu_migration_params.h b/src/qemu/qemu_migration_params.h index 5ca171226f..844ebf92eb 100644 --- a/src/qemu/qemu_migration_params.h +++ b/src/qemu/qemu_migration_params.h @@ -40,6 +40,7 @@ typedef enum { QEMU_MIGRATION_CAP_LATE_BLOCK_ACTIVATE, QEMU_MIGRATION_CAP_MULTIFD, QEMU_MIGRATION_CAP_BLOCK_DIRTY_BITMAPS, + QEMU_MIGRATION_CAP_BACKGROUND_SNAPSHOT, QEMU_MIGRATION_CAP_LAST } qemuMigrationCapability; diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c index e03b79c303..b2b48610e6 100644 --- a/src/qemu/qemu_saveimage.c +++ b/src/qemu/qemu_saveimage.c @@ -258,6 +258,7 @@ qemuSaveImageCreate(virQEMUDriver *driver, virQEMUSaveData *data, virCommand *compressor, unsigned int flags, + bool instant, qemuDomainAsyncJob asyncJob) { g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver); @@ -295,7 +296,7 @@ qemuSaveImageCreate(virQEMUDriver *driver, goto cleanup; /* Perform the migration */ - if (qemuMigrationSrcToFile(driver, vm, fd, compressor, asyncJob) < 0) + if (qemuMigrationSrcToFile(driver, vm, fd, compressor, instant, asyncJob) < 0) goto cleanup; /* Touch up file header to mark image complete. */ diff --git a/src/qemu/qemu_saveimage.h b/src/qemu/qemu_saveimage.h index 0710426742..2e87d6485a 100644 --- a/src/qemu/qemu_saveimage.h +++ b/src/qemu/qemu_saveimage.h @@ -96,6 +96,7 @@ qemuSaveImageCreate(virQEMUDriver *driver, virQEMUSaveData *data, virCommand *compressor, unsigned int flags, + bool instant, qemuDomainAsyncJob asyncJob); int diff --git a/src/qemu/qemu_snapshot.c b/src/qemu/qemu_snapshot.c index 7e4dadb876..b521634f2a 100644 --- a/src/qemu/qemu_snapshot.c +++ b/src/qemu/qemu_snapshot.c @@ -1344,6 +1344,7 @@ qemuSnapshotSaveMemory(virQEMUDriver *driver, virDomainObj *vm, virDomainSnapshotDef *snapdef, bool running, + bool instant, virQEMUDriverConfig *cfg) { qemuDomainObjPrivate *priv = vm->privateData; @@ -1377,7 +1378,7 @@ qemuSnapshotSaveMemory(virQEMUDriver *driver, xml = NULL; if (qemuSaveImageCreate(driver, vm, snapdef->memorysnapshotfile, - data, compressor, 0, + data, compressor, 0, instant, QEMU_ASYNC_JOB_SNAPSHOT) < 0) return -1; @@ -1471,7 +1472,7 @@ qemuSnapshotCreateActiveExternal(virQEMUDriver *driver, if (memory) { memory_existing = virFileExists(snapdef->memorysnapshotfile); - if (qemuSnapshotSaveMemory(driver, vm, snapdef, resume, cfg) < 0) + if (qemuSnapshotSaveMemory(driver, vm, snapdef, resume, false, cfg) < 0) goto cleanup; /* the memory image was created, remove it on errors */ -- 2.27.0

Usual snapshot with memory of a running domain with first saves domain memory to disk and then make a disk snapshot. As result we get snapshot at moment in time much later then client asked for snapshot if domain memory is large. So basically you need to wait several minutes or even several tens of minutes before making guest unsafe changes. This patch adds instant mode to snapshot with memory of a running domain. In this mode snapshot is done almost at the moment of client request. It does not depends of domain memory size. So client can proceed with unsafe changes immediately (We need an event to notify client though as snapshot API itself is still synchronous and API will finish when domain memory will be stored to disk). I dared to call this snapshot mode instant instead of background as it named in QEMU. IMHO in case of libvirt API name background be confused with asynchronous snapshot API which is not true. Instant mode basically just stops guest CPUs, makes disks snapshot and then start QEMU's background memory snapshot. Background here means if guest writes some region in memory then this memory first is written to disk so that eventually domain memory written to disk corresponds to moment of starting background snapshot. Note that background snapshot starts guest CPUs right after snapshot start and do not stop CPUs after snapshot is finished unlikely to usual memory snapshots or migration. Nevertheless qemuSnapshotCreateActiveExternal calls qemuProcessStartCPUs in instant mode in order to lock domain images and other tasks we do not do in resume handler. Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- include/libvirt/libvirt-domain-snapshot.h | 2 + src/qemu/qemu_snapshot.c | 68 ++++++++++++++++++----- 2 files changed, 57 insertions(+), 13 deletions(-) diff --git a/include/libvirt/libvirt-domain-snapshot.h b/include/libvirt/libvirt-domain-snapshot.h index 90673ed0fb..2661ba2556 100644 --- a/include/libvirt/libvirt-domain-snapshot.h +++ b/include/libvirt/libvirt-domain-snapshot.h @@ -73,6 +73,8 @@ typedef enum { running */ VIR_DOMAIN_SNAPSHOT_CREATE_VALIDATE = (1 << 9), /* validate the XML against the schema */ + VIR_DOMAIN_SNAPSHOT_CREATE_INSTANT = (1 << 10),/* snapshot at the moment + of call */ } virDomainSnapshotCreateFlags; /* Take a snapshot of the current VM state */ diff --git a/src/qemu/qemu_snapshot.c b/src/qemu/qemu_snapshot.c index b521634f2a..14c4a64d52 100644 --- a/src/qemu/qemu_snapshot.c +++ b/src/qemu/qemu_snapshot.c @@ -1398,6 +1398,7 @@ qemuSnapshotCreateActiveExternal(virQEMUDriver *driver, { virObjectEvent *event; bool resume = false; + bool instant = false; int ret = -1; qemuDomainObjPrivate *priv = vm->privateData; virDomainSnapshotDef *snapdef = virDomainSnapshotObjGetDef(snap); @@ -1442,13 +1443,25 @@ qemuSnapshotCreateActiveExternal(virQEMUDriver *driver, if (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_PMSUSPENDED) { pmsuspended = true; } else if (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) { + if (flags & VIR_DOMAIN_SNAPSHOT_CREATE_INSTANT) { + if (!qemuMigrationCapsGet(vm, QEMU_MIGRATION_CAP_BACKGROUND_SNAPSHOT)) { + virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", + _("Migration option 'background-snapshot'" + " is not supported by QEMU binary")); + goto cleanup; + } + + instant = true; + } + /* For full system external snapshots (those with memory), the guest * must pause (either by libvirt up front, or by qemu after * _LIVE converges). */ if (memory) resume = true; - if (memory && !(flags & VIR_DOMAIN_SNAPSHOT_CREATE_LIVE)) { + if (memory && + (!(flags & VIR_DOMAIN_SNAPSHOT_CREATE_LIVE) || instant)) { if (qemuProcessStopCPUs(driver, vm, VIR_DOMAIN_PAUSED_SNAPSHOT, QEMU_ASYNC_JOB_SNAPSHOT) < 0) goto cleanup; @@ -1472,21 +1485,39 @@ qemuSnapshotCreateActiveExternal(virQEMUDriver *driver, if (memory) { memory_existing = virFileExists(snapdef->memorysnapshotfile); - if (qemuSnapshotSaveMemory(driver, vm, snapdef, resume, false, cfg) < 0) - goto cleanup; + if (instant) { + if ((ret = qemuSnapshotCreateActiveExternalDisks(vm, snap, + blockNamedNodeData, flags, + QEMU_ASYNC_JOB_SNAPSHOT)) < 0) + goto cleanup; - /* the memory image was created, remove it on errors */ - if (!memory_existing) - memory_unlink = true; + if (qemuSnapshotSaveMemory(driver, vm, snapdef, resume, true, cfg) < 0) + goto cleanup; - } + /* the memory image was created, remove it on errors */ + if (!memory_existing) + memory_unlink = true; + } else { + if (qemuSnapshotSaveMemory(driver, vm, snapdef, resume, false, cfg) < 0) + goto cleanup; - /* the domain is now paused if a memory snapshot was requested */ + /* the memory image was created, remove it on errors */ + if (!memory_existing) + memory_unlink = true; - if ((ret = qemuSnapshotCreateActiveExternalDisks(vm, snap, - blockNamedNodeData, flags, - QEMU_ASYNC_JOB_SNAPSHOT)) < 0) - goto cleanup; + /* the domain is now paused if a memory snapshot was requested */ + if ((ret = qemuSnapshotCreateActiveExternalDisks(vm, snap, + blockNamedNodeData, flags, + QEMU_ASYNC_JOB_SNAPSHOT)) < 0) + goto cleanup; + } + } else { + /* the domain is now paused if a memory snapshot was requested */ + if ((ret = qemuSnapshotCreateActiveExternalDisks(vm, snap, + blockNamedNodeData, flags, + QEMU_ASYNC_JOB_SNAPSHOT)) < 0) + goto cleanup; + } /* the snapshot is complete now */ if (flags & VIR_DOMAIN_SNAPSHOT_CREATE_HALT) { @@ -1575,7 +1606,8 @@ qemuSnapshotCreateXML(virDomainPtr domain, VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE | VIR_DOMAIN_SNAPSHOT_CREATE_ATOMIC | VIR_DOMAIN_SNAPSHOT_CREATE_LIVE | - VIR_DOMAIN_SNAPSHOT_CREATE_VALIDATE, NULL); + VIR_DOMAIN_SNAPSHOT_CREATE_VALIDATE | + VIR_DOMAIN_SNAPSHOT_CREATE_INSTANT, NULL); VIR_REQUIRE_FLAG_RET(VIR_DOMAIN_SNAPSHOT_CREATE_QUIESCE, VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY, @@ -1584,6 +1616,16 @@ qemuSnapshotCreateXML(virDomainPtr domain, VIR_DOMAIN_SNAPSHOT_CREATE_REDEFINE, NULL); + VIR_EXCLUSIVE_FLAGS_RET(VIR_DOMAIN_SNAPSHOT_CREATE_INSTANT, + VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY, + NULL); + VIR_EXCLUSIVE_FLAGS_RET(VIR_DOMAIN_SNAPSHOT_CREATE_INSTANT, + VIR_DOMAIN_SNAPSHOT_CREATE_REDEFINE, + NULL); + VIR_REQUIRE_FLAG_RET(VIR_DOMAIN_SNAPSHOT_CREATE_INSTANT, + VIR_DOMAIN_SNAPSHOT_CREATE_LIVE, + NULL); + if ((redefine && !(flags & VIR_DOMAIN_SNAPSHOT_CREATE_CURRENT)) || (flags & VIR_DOMAIN_SNAPSHOT_CREATE_NO_METADATA)) update_current = false; -- 2.27.0

On Thu, Nov 11, 2021 at 11:55:53 +0300, Nikolay Shirokovskiy wrote:
Usual snapshot with memory of a running domain with first saves domain memory to disk and then make a disk snapshot. As result we get snapshot at moment in time much later then client asked for snapshot if domain memory is large. So basically you need to wait several minutes or even several tens of minutes before making guest unsafe changes.
Note that the API contract states that making changes is safe ONLY after the virDomainSnapshotCreate API returns success, and ...
This patch adds instant mode to snapshot with memory of a running domain. In this mode snapshot is done almost at the moment of client request. It does not depends of domain memory size. So client can proceed with unsafe changes immediately (We need an event to notify client though as snapshot API itself is still synchronous and API will finish when domain memory will be stored to disk).
... this doesn't really change that. This gives users a point after which changes done to the VM can be reverted IF the snapshot completes successfully. It can still fail and the point in time the user wished to capture will be lost forever. Users will still need to wait for the API to finish to be sure. This must me emphasised in the documentation. Also for users of the new mode won't have any benefit as virsh will be still waiting for the API to finish. In general the new mode is definitely better, but we must not oversell it.
I dared to call this snapshot mode instant instead of background as it named in QEMU. IMHO in case of libvirt API name background be confused with asynchronous snapshot API which is not true.
This is okay. Many features have different names in libvirt.
Instant mode basically just stops guest CPUs, makes disks snapshot and then start QEMU's background memory snapshot. Background here means if guest writes some region in memory then this memory first is written to disk so that eventually domain memory written to disk corresponds to moment of starting background snapshot.
Note that background snapshot starts guest CPUs right after snapshot start and do not stop CPUs after snapshot is finished unlikely to usual memory snapshots or migration. Nevertheless qemuSnapshotCreateActiveExternal calls qemuProcessStartCPUs in instant mode in order to lock domain images and other tasks we do not do in resume handler.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- include/libvirt/libvirt-domain-snapshot.h | 2 + src/qemu/qemu_snapshot.c | 68 ++++++++++++++++++----- 2 files changed, 57 insertions(+), 13 deletions(-)
diff --git a/include/libvirt/libvirt-domain-snapshot.h b/include/libvirt/libvirt-domain-snapshot.h index 90673ed0fb..2661ba2556 100644 --- a/include/libvirt/libvirt-domain-snapshot.h +++ b/include/libvirt/libvirt-domain-snapshot.h @@ -73,6 +73,8 @@ typedef enum { running */ VIR_DOMAIN_SNAPSHOT_CREATE_VALIDATE = (1 << 9), /* validate the XML against the schema */ + VIR_DOMAIN_SNAPSHOT_CREATE_INSTANT = (1 << 10),/* snapshot at the moment + of call */ } virDomainSnapshotCreateFlags;
We generally require that additions to public API are in a separate patch. Additionally you must also add a broader description of the implications of the flag in the function docs for virDomainSnapshotCreateXML in libvirt-domain-snapshot.c.
/* Take a snapshot of the current VM state */ diff --git a/src/qemu/qemu_snapshot.c b/src/qemu/qemu_snapshot.c index b521634f2a..14c4a64d52 100644 --- a/src/qemu/qemu_snapshot.c +++ b/src/qemu/qemu_snapshot.c @@ -1398,6 +1398,7 @@ qemuSnapshotCreateActiveExternal(virQEMUDriver *driver, { virObjectEvent *event; bool resume = false; + bool instant = false; int ret = -1; qemuDomainObjPrivate *priv = vm->privateData; virDomainSnapshotDef *snapdef = virDomainSnapshotObjGetDef(snap); @@ -1442,13 +1443,25 @@ qemuSnapshotCreateActiveExternal(virQEMUDriver *driver, if (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_PMSUSPENDED) { pmsuspended = true; } else if (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) { + if (flags & VIR_DOMAIN_SNAPSHOT_CREATE_INSTANT) { + if (!qemuMigrationCapsGet(vm, QEMU_MIGRATION_CAP_BACKGROUND_SNAPSHOT)) { + virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", + _("Migration option 'background-snapshot'" + " is not supported by QEMU binary"));
Don't linebreak error messages. Also users don't need to know which qemu feature we are using and that it's a migration. "Instant snapshots are not supported by this QEMU binary". Also we'll probably need a check rejecting known-unsupported VM configs such as when hugepages are in use as I remember the error from qemu being extremely cryptic.
+ goto cleanup; + } + + instant = true; + } +

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- docs/manpages/virsh.rst | 8 +++++++- tools/virsh-snapshot.c | 6 ++++++ 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/docs/manpages/virsh.rst b/docs/manpages/virsh.rst index 5f5ccfeafe..64f0eec4a2 100644 --- a/docs/manpages/virsh.rst +++ b/docs/manpages/virsh.rst @@ -6948,7 +6948,7 @@ snapshot-create snapshot-create domain [xmlfile] {[--redefine [--current]] | [--no-metadata] [--halt] [--disk-only] [--reuse-external] - [--quiesce] [--atomic] [--live]} [--validate] + [--quiesce] [--atomic] [--live]} [--validate] [--instant] Create a snapshot for domain *domain* with the properties specified in *xmlfile*. Optionally, the *--validate* option can be passed to @@ -7010,6 +7010,12 @@ the guest is running. Both disk snapshot and domain memory snapshot are taken. This increases the size of the memory image of the external snapshot. This is currently supported only for full system external snapshots. +If *--instant* is specified then snapshot will be at the moment the +snapshot is called. Ordinary snapshot is done at the moment the call +is finished. The time difference can be significant if domain memory +is large. The flag only makes sense in case of snapshots with memory. + + Existence of snapshot metadata will prevent attempts to ``undefine`` a persistent guest. However, for transient domains, snapshot metadata is silently lost when the domain quits running (whether diff --git a/tools/virsh-snapshot.c b/tools/virsh-snapshot.c index 154e82b48b..72ed1b2a52 100644 --- a/tools/virsh-snapshot.c +++ b/tools/virsh-snapshot.c @@ -152,6 +152,10 @@ static const vshCmdOptDef opts_snapshot_create[] = { .type = VSH_OT_BOOL, .help = N_("validate the XML against the schema"), }, + {.name = "instant", + .type = VSH_OT_BOOL, + .help = N_("snapshot at the moment of call"), + }, {.name = NULL} }; @@ -183,6 +187,8 @@ cmdSnapshotCreate(vshControl *ctl, const vshCmd *cmd) flags |= VIR_DOMAIN_SNAPSHOT_CREATE_LIVE; if (vshCommandOptBool(cmd, "validate")) flags |= VIR_DOMAIN_SNAPSHOT_CREATE_VALIDATE; + if (vshCommandOptBool(cmd, "instant")) + flags |= VIR_DOMAIN_SNAPSHOT_CREATE_INSTANT; if (!(dom = virshCommandOptDomain(ctl, cmd, NULL))) return false; -- 2.27.0

On Thu, Nov 11, 2021 at 11:55:54 +0300, Nikolay Shirokovskiy wrote:
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com> --- docs/manpages/virsh.rst | 8 +++++++- tools/virsh-snapshot.c | 6 ++++++ 2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/docs/manpages/virsh.rst b/docs/manpages/virsh.rst index 5f5ccfeafe..64f0eec4a2 100644 --- a/docs/manpages/virsh.rst +++ b/docs/manpages/virsh.rst @@ -6948,7 +6948,7 @@ snapshot-create
snapshot-create domain [xmlfile] {[--redefine [--current]] | [--no-metadata] [--halt] [--disk-only] [--reuse-external] - [--quiesce] [--atomic] [--live]} [--validate] + [--quiesce] [--atomic] [--live]} [--validate] [--instant]
Create a snapshot for domain *domain* with the properties specified in *xmlfile*. Optionally, the *--validate* option can be passed to @@ -7010,6 +7010,12 @@ the guest is running. Both disk snapshot and domain memory snapshot are taken. This increases the size of the memory image of the external snapshot. This is currently supported only for full system external snapshots.
+If *--instant* is specified then snapshot will be at the moment the +snapshot is called. Ordinary snapshot is done at the moment the call
This is misleading and racy. Users must wait for the event to consider it at least somewhat safe. Virsh man page must not encourage assuming that it's already in the somewhat safe state.
+is finished. The time difference can be significant if domain memory +is large. The flag only makes sense in case of snapshots with memory. + + Existence of snapshot metadata will prevent attempts to ``undefine`` a persistent guest. However, for transient domains, snapshot metadata is silently lost when the domain quits running (whether diff --git a/tools/virsh-snapshot.c b/tools/virsh-snapshot.c index 154e82b48b..72ed1b2a52 100644 --- a/tools/virsh-snapshot.c +++ b/tools/virsh-snapshot.c @@ -152,6 +152,10 @@ static const vshCmdOptDef opts_snapshot_create[] = { .type = VSH_OT_BOOL, .help = N_("validate the XML against the schema"), }, + {.name = "instant", + .type = VSH_OT_BOOL, + .help = N_("snapshot at the moment of call"), + }, {.name = NULL} };
@@ -183,6 +187,8 @@ cmdSnapshotCreate(vshControl *ctl, const vshCmd *cmd) flags |= VIR_DOMAIN_SNAPSHOT_CREATE_LIVE; if (vshCommandOptBool(cmd, "validate")) flags |= VIR_DOMAIN_SNAPSHOT_CREATE_VALIDATE; + if (vshCommandOptBool(cmd, "instant")) + flags |= VIR_DOMAIN_SNAPSHOT_CREATE_INSTANT;
The implementation in the qemu driver interlocks with the _LIVE flag. In virsh we mirror these interlocks as it then reports proper flag names to the user. See VSH_EXCLUSIVE_OPTION...
if (!(dom = virshCommandOptDomain(ctl, cmd, NULL))) return false; -- 2.27.0

On Thu, Nov 11, 2021 at 11:55:45 +0300, Nikolay Shirokovskiy wrote:
Hi, everyone.
The details for instant snapshot are given in main patch "qemu: support instant snapshots". Here I only want to mention two moments.
First as instant snapshot API is synchronous we need to add an event that snapshot moment in time is passed so that client is free to make changes to guest. This way client can make use of instant snapshot while memory is still being written to disk.
This is a nice replacement for --live. I had a dummy implementation for this but I put it aside for the following reasons. 1) It doesn't work with all VM configs. My original approach wanted to use it as a full replacement for the snapshot code. That wouldn't work. Yours uses a new flag which is okay as it sidesteps the issue. (One detail, docs don't mention the caveat, more on that inline). 2) The format change for instant load Below you mention that you want to change the image format for instant load. Introducing the feature now, without the image format change means that there will be a need for yet another flag switching to the new format. I'd strongly suggest you reconsider whether to push this now or just simply go with the full new format once it's ready in qemu.
Second there is work in progress on adding asynchronous revert to snapshot [1]. The idea is to write memory during snapshot in such a way that it is possible to do a revert to snapshot in background. Namely start guest on revert with only minimal amount of memory restored from disk and then restore memory in background in postcopy migration fashion. In order to make it possible snapshot memory should be written to disk not just as plain migration stream. Thus the [1] patch adds a special tool to write the migration stream during snapshot.
So if/when patch [1] will be merged I plan to add 'instant revert' mode to snapshot too.
[1] [RFC PATCH v2 0/7] migration/snapshot: External snapshot utility https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg03619.html
I'll add few review points inline. Also note that Pavel Hrdina might want to comment too as he is currently making good progress on making reversion of external snapshots work.

пт, 12 нояб. 2021 г. в 10:39, Peter Krempa <pkrempa@redhat.com>:
On Thu, Nov 11, 2021 at 11:55:45 +0300, Nikolay Shirokovskiy wrote:
Hi, everyone.
The details for instant snapshot are given in main patch "qemu: support instant snapshots". Here I only want to mention two moments.
First as instant snapshot API is synchronous we need to add an event that snapshot moment in time is passed so that client is free to make changes to guest. This way client can make use of instant snapshot while memory is still being written to disk.
This is a nice replacement for --live. I had a dummy implementation for this but I put it aside for the following reasons.
1) It doesn't work with all VM configs.
My original approach wanted to use it as a full replacement for the snapshot code. That wouldn't work.
Yours uses a new flag which is okay as it sidesteps the issue. (One detail, docs don't mention the caveat, more on that inline).
2) The format change for instant load
Below you mention that you want to change the image format for instant load. Introducing the feature now, without the image format change means that there will be a need for yet another flag switching to the new format.
I'd strongly suggest you reconsider whether to push this now or just simply go with the full new format once it's ready in qemu.
Hi, Peter. Thank you for your review. Ok, let's wait for new format in qemu. Nikolay
Second there is work in progress on adding asynchronous revert to snapshot [1]. The idea is to write memory during snapshot in such a way that it is possible to do a revert to snapshot in background. Namely start guest on revert with only minimal amount of memory restored from disk and then restore memory in background in postcopy migration fashion. In order to make it possible snapshot memory should be written to disk not just as plain migration stream. Thus the [1] patch adds a special tool to write the migration stream during snapshot.
So if/when patch [1] will be merged I plan to add 'instant revert' mode to snapshot too.
[1] [RFC PATCH v2 0/7] migration/snapshot: External snapshot utility https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg03619.html
I'll add few review points inline.
Also note that Pavel Hrdina might want to comment too as he is currently making good progress on making reversion of external snapshots work.
participants (2)
-
Nikolay Shirokovskiy
-
Peter Krempa