[PATCH 00/20] qemu: support mapped-ram+directio+multifd

This series is essentially V1 of a prior RFC [1] to support QEMU's
mapped-ram stream format [2] and migration capability. Along with
supporting mapped-ram, it implements a design approach we discussed for
supporting parallel save/restore [3]. In summary, the approach is:

1. Add mapped-ram migration capability
2. Steal an element from save header 'unused' for a 'features' variable
   and bump save version to 3
3. Add /etc/libvirt/qemu.conf knob for the save format version,
   defaulting to latest v3
4. Use v3 (aka mapped-ram) by default
5. Use mapped-ram with BYPASS_CACHE for v3, old approach for v2
6. include: Define constants for parallel save/restore
7. qemu: Add support for parallel save. Implies mapped-ram, reject if v2
8. qemu: Add support for parallel restore. Implies mapped-ram. Reject if v2
9. tools: add parallel parameter to virsh save command
10. tools: add parallel parameter to virsh restore command

With this series, saving and restoring using mapped-ram is enabled by
default if the underlying QEMU advertises the mapped-ram migration
capability. It can be disabled by changing the 'save_image_version'
setting in qemu.conf.

To use mapped-ram with QEMU:
- The 'mapped-ram' migration capability must be set to true
- The 'multifd' migration capability must be set to true and the
  'multifd-channels' migration parameter must be set to a value >= 1
- QEMU must be provided an fdset containing the migration fd(s)
- The 'migrate' qmp command is invoked with a URI referencing the fdset
  and an offset where to start reading or writing the data stream, e.g.

  {"execute":"migrate",
   "arguments":{"detach":true,"resume":false,
                "uri":"file:/dev/fdset/0,offset=0x11921"}}

The mapped-ram stream, in conjunction with direct IO and multifd, can
significantly improve the time required to save VM memory state. The
following tables compare mapped-ram with the existing, sequential save
stream. In all cases, the save and restore operations are to/from a
block device comprised of two NVMe disks in RAID0 configuration with
xfs (~8600MiB/s). The values in the 'save time' and 'restore time'
columns were scraped from the 'real' time reported by time(1). The
'Size' and 'Blocks' columns were provided by the corresponding outputs
of stat(1).
VM: 32G RAM, 1 vcpu, idle (shortly after boot)

                       |  save   | restore |
                       |  time   |  time   |     Size     | Blocks
-----------------------+---------+---------+--------------+--------
legacy                 | 6.193s  | 4.399s  |   985744812  | 1925288
-----------------------+---------+---------+--------------+--------
mapped-ram             | 5.109s  | 1.176s  |  34368554354 | 1774472
-----------------------+---------+---------+--------------+--------
legacy + direct IO     | 5.725s  | 4.512s  |   985765251  | 1925328
-----------------------+---------+---------+--------------+--------
mapped-ram + direct IO | 4.627s  | 1.490s  |  34368554354 | 1774304
-----------------------+---------+---------+--------------+--------
mapped-ram + direct IO |         |         |              |
 + multifd-channels=8  | 4.421s  | 0.845s  |  34368554318 | 1774312
-------------------------------------------------------------------

VM: 32G RAM, 30G dirty, 1 vcpu in tight loop dirtying memory

                       |  save   | restore |
                       |  time   |  time   |     Size     | Blocks
-----------------------+---------+---------+--------------+---------
legacy                 | 25.800s | 14.332s |  33154309983 | 64754512
-----------------------+---------+---------+--------------+---------
mapped-ram             | 18.742s | 15.027s |  34368559228 | 64617160
-----------------------+---------+---------+--------------+---------
legacy + direct IO     | 13.115s | 18.050s |  33154310496 | 64754520
-----------------------+---------+---------+--------------+---------
mapped-ram + direct IO | 13.623s | 15.959s |  34368557392 | 64662040
-----------------------+---------+---------+--------------+---------
mapped-ram + direct IO |         |         |              |
 + multifd-channels=8  |  6.994s |  6.470s |  34368554980 | 64665776
--------------------------------------------------------------------

As can be seen from the tables, one caveat of mapped-ram is that the
logical file size of a saved image is basically equivalent to the VM
memory size. Note however that mapped-ram typically uses fewer blocks
on disk.

Support for mapped-ram+direct-io only recently landed in upstream QEMU
and will first appear in the 9.1 release, which may complicate merging
support in libvirt. Specifically, I'm not sure how to detect if the
combination is supported by QEMU. Suggestions welcomed.

Similar to the RFC, V1 ignores compression. libvirt currently supports
compression by connecting the output of QEMU's save stream to the
specified compression program via a pipe. This approach is incompatible
with mapped-ram since the fd provided to QEMU must be seekable. In
general, we can consider mapped-ram and compression incompatible and
document that they cannot be used together.

[1] https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/message/EF6Y...
[2] https://gitlab.com/qemu-project/qemu/-/blob/master/docs/devel/migration/mapp...
[3] https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/message/K4BD...
Claudio Fontana (2):
  include: Define constants for parallel save/restore
  tools: add parallel parameter to virsh restore command

Jim Fehlig (17):
  lib: virDomainSaveParams: Ensure absolute save path
  qemu_fd: Add function to retrieve fdset ID
  qemu: Add function to check capability in migration params
  qemu: Add function to get bool value from migration params
  qemu: Add mapped-ram migration capability
  qemu: Add function to get migration params for save
  qemu: QEMU_SAVE_VERSION: Bump to version 3
  qemu: conf: Add setting for save image version
  qemu: Add helper function for creating save image fd
  qemu: Add support for mapped-ram on save
  qemu: Decompose qemuSaveImageOpen
  qemu: Move creation of qemuProcessIncomingDef struct
  qemu: Apply migration parameters in qemuMigrationDstRun
  qemu: Add support for mapped-ram on restore
  qemu: Support O_DIRECT with mapped-ram on save
  qemu: Support O_DIRECT with mapped-ram on restore
  qemu: Add support for parallel save and restore

Li Zhang (1):
  tools: add parallel parameter to virsh save command

 docs/manpages/virsh.rst            |   9 +-
 include/libvirt/libvirt-domain.h   |  13 ++
 src/libvirt-domain.c               |  52 +++++--
 src/qemu/libvirtd_qemu.aug         |   1 +
 src/qemu/qemu.conf.in              |   6 +
 src/qemu/qemu_conf.c               |  16 +++
 src/qemu/qemu_conf.h               |   5 +
 src/qemu/qemu_driver.c             | 104 +++++++++----
 src/qemu/qemu_fd.c                 |  18 +++
 src/qemu/qemu_fd.h                 |   3 +
 src/qemu/qemu_migration.c          | 192 +++++++++++++++++--------
 src/qemu/qemu_migration.h          |   9 +-
 src/qemu/qemu_migration_params.c   |  86 ++++++++++++
 src/qemu/qemu_migration_params.h   |  17 +++
 src/qemu/qemu_monitor.c            |  39 ++++++
 src/qemu/qemu_monitor.h            |   5 +
 src/qemu/qemu_process.c            | 120 +++++++++++-----
 src/qemu/qemu_process.h            |  19 ++-
 src/qemu/qemu_saveimage.c          | 216 ++++++++++++++++++++---------
 src/qemu/qemu_saveimage.h          |  35 +++--
 src/qemu/qemu_snapshot.c           |  26 ++--
 src/qemu/test_libvirtd_qemu.aug.in |   1 +
 tools/virsh-domain.c               |  79 +++++++++--
 23 files changed, 827 insertions(+), 244 deletions(-)

-- 
2.35.3

When invoking virDomainSaveParams with a relative path, the image is
saved to the daemon's CWD, which in most cases is '/'. Ensure a
relative path is converted to absolute before invoking the driver
'domainSaveParams' function.

Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 src/libvirt-domain.c | 46 +++++++++++++++++++++++++++++++++++---------
 1 file changed, 37 insertions(+), 9 deletions(-)

diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c
index 7c6b93963c..5aedae1b49 100644
--- a/src/libvirt-domain.c
+++ b/src/libvirt-domain.c
@@ -1020,6 +1020,11 @@ virDomainSaveParams(virDomainPtr domain,
                     unsigned int flags)
 {
     virConnectPtr conn;
+    virTypedParameterPtr params_copy = NULL;
+    int nparams_copy = 0;
+    const char *to = NULL;
+    g_autofree char *absolute_to = NULL;
+    int ret = -1;
 
     VIR_DOMAIN_DEBUG(domain, "params=%p, nparams=%d, flags=0x%x",
                      params, nparams, flags);
@@ -1030,23 +1035,46 @@ virDomainSaveParams(virDomainPtr domain,
     virCheckDomainReturn(domain, -1);
     conn = domain->conn;
 
-    virCheckReadOnlyGoto(conn->flags, error);
+    virCheckReadOnlyGoto(conn->flags, done);
 
     VIR_EXCLUSIVE_FLAGS_GOTO(VIR_DOMAIN_SAVE_RUNNING,
                              VIR_DOMAIN_SAVE_PAUSED,
-                             error);
+                             done);
+
+    /* We must absolutize the file path as the save is done out of process */
+    virTypedParamsCopy(&params_copy, params, nparams);
+    nparams_copy = nparams;
+
+    if (virTypedParamsGetString(params_copy, nparams_copy,
+                                VIR_DOMAIN_SAVE_PARAM_FILE, &to) < 0)
+        goto done;
+
+    if (to) {
+        if (!(absolute_to = g_canonicalize_filename(to, NULL))) {
+            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                           _("could not build absolute output file path"));
+            goto done;
+        }
+
+        if (virTypedParamsReplaceString(&params_copy, &nparams_copy,
+                                        VIR_DOMAIN_SAVE_PARAM_FILE,
+                                        absolute_to) < 0)
+            goto done;
+    }
 
     if (conn->driver->domainSaveParams) {
-        if (conn->driver->domainSaveParams(domain, params, nparams, flags) < 0)
-            goto error;
-        return 0;
+        if (conn->driver->domainSaveParams(domain, params_copy, nparams_copy, flags) < 0)
+            goto done;
+        ret = 0;
+    } else {
+        virReportUnsupportedError();
     }
 
-    virReportUnsupportedError();
+ done:
+    if (ret < 0)
+        virDispatchError(domain->conn);
+    virTypedParamsFree(params_copy, nparams_copy);
 
- error:
-    virDispatchError(domain->conn);
-    return -1;
+    return ret;
 }
-- 
2.35.3
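
[For illustration, a minimal client-side sketch (not part of the patch)
of the call path this change affects, using only the public
typed-parameter API; the relative path "save.img" is an arbitrary
example:]

#include <libvirt/libvirt.h>

/* Save a domain via virDomainSaveParams() using a relative path.
 * Without this patch the daemon resolves "save.img" against its own
 * CWD (usually '/'); with it, the client library canonicalizes the
 * path before calling into the driver. */
static int
save_with_relative_path(virDomainPtr dom)
{
    virTypedParameterPtr params = NULL;
    int nparams = 0;
    int maxparams = 0;
    int rc;

    if (virTypedParamsAddString(&params, &nparams, &maxparams,
                                VIR_DOMAIN_SAVE_PARAM_FILE,
                                "save.img") < 0)
        return -1;

    rc = virDomainSaveParams(dom, params, nparams, 0);
    virTypedParamsFree(params, nparams);
    return rc;
}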

On Thu, Aug 08, 2024 at 05:37:54PM -0600, Jim Fehlig via Devel wrote:
When invoking virDomainSaveParams with a relative path, the image is saved to the daemon's CWD, which in most cases is '/'. Ensure a relative path is converted to absolute before invoking the driver 'domainSaveParams' function.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 src/libvirt-domain.c | 46 +++++++++++++++++++++++++++++++++++---------
 1 file changed, 37 insertions(+), 9 deletions(-)

diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c
index 7c6b93963c..5aedae1b49 100644
--- a/src/libvirt-domain.c
+++ b/src/libvirt-domain.c
@@ -1020,6 +1020,11 @@ virDomainSaveParams(virDomainPtr domain,
                     unsigned int flags)
The same change appears to be needed in virDomainRestoreParams ?
 {
     virConnectPtr conn;
+    virTypedParameterPtr params_copy = NULL;
+    int nparams_copy = 0;
+    const char *to = NULL;
+    g_autofree char *absolute_to = NULL;
+    int ret = -1;

     VIR_DOMAIN_DEBUG(domain, "params=%p, nparams=%d, flags=0x%x",
                      params, nparams, flags);
@@ -1030,23 +1035,46 @@ virDomainSaveParams(virDomainPtr domain,
     virCheckDomainReturn(domain, -1);
     conn = domain->conn;

-    virCheckReadOnlyGoto(conn->flags, error);
+    virCheckReadOnlyGoto(conn->flags, done);

     VIR_EXCLUSIVE_FLAGS_GOTO(VIR_DOMAIN_SAVE_RUNNING,
                              VIR_DOMAIN_SAVE_PAUSED,
-                             error);
+                             done);
+
+    /* We must absolutize the file path as the save is done out of process */
+    virTypedParamsCopy(&params_copy, params, nparams);
+    nparams_copy = nparams;
+
+    if (virTypedParamsGetString(params_copy, nparams_copy,
+                                VIR_DOMAIN_SAVE_PARAM_FILE, &to) < 0)
+        goto done;
+
+    if (to) {
+        if (!(absolute_to = g_canonicalize_filename(to, NULL))) {
+            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                           _("could not build absolute output file path"));
+            goto done;
+        }
+
+        if (virTypedParamsReplaceString(&params_copy, &nparams_copy,
+                                        VIR_DOMAIN_SAVE_PARAM_FILE,
+                                        absolute_to) < 0)
+            goto done;
+    }

     if (conn->driver->domainSaveParams) {
-        if (conn->driver->domainSaveParams(domain, params, nparams, flags) < 0)
-            goto error;
-        return 0;
+        if (conn->driver->domainSaveParams(domain, params_copy, nparams_copy, flags) < 0)
+            goto done;
+        ret = 0;
+    } else {
+        virReportUnsupportedError();
     }

-    virReportUnsupportedError();
+ done:
+    if (ret < 0)
+        virDispatchError(domain->conn);
+    virTypedParamsFree(params_copy, nparams_copy);

- error:
-    virDispatchError(domain->conn);
-    return -1;
+    return ret;
 }

-- 
2.35.3
With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Add new function qemuFDPassGetId() for retrieving the fdset ID of a
provided qemuFDPass object.

Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 src/qemu/qemu_fd.c | 18 ++++++++++++++++++
 src/qemu/qemu_fd.h |  3 +++
 2 files changed, 21 insertions(+)

diff --git a/src/qemu/qemu_fd.c b/src/qemu/qemu_fd.c
index e847056573..3ae4a87a20 100644
--- a/src/qemu/qemu_fd.c
+++ b/src/qemu/qemu_fd.c
@@ -276,6 +276,24 @@ qemuFDPassGetPath(qemuFDPass *fdpass)
 }
 
 
+/**
+ * qemuFDPassGetId:
+ * @fdpass: The fd passing helper struct
+ * @id: An out parameter for providing the fdset ID
+ *
+ * Returns 0 on success, -1 on error.
+ */
+int
+qemuFDPassGetId(qemuFDPass *fdpass, unsigned int *id)
+{
+    if (!fdpass)
+        return -1;
+
+    *id = fdpass->fdSetID;
+    return 0;
+}
+
+
 struct _qemuFDPassDirect {
     int fd;
     char *name;
diff --git a/src/qemu/qemu_fd.h b/src/qemu/qemu_fd.h
index cd0ff2c690..dfcd5fb754 100644
--- a/src/qemu/qemu_fd.h
+++ b/src/qemu/qemu_fd.h
@@ -58,6 +58,9 @@ qemuFDPassTransferMonitorRollback(qemuFDPass *fdpass,
 const char *
 qemuFDPassGetPath(qemuFDPass *fdpass);
 
+int
+qemuFDPassGetId(qemuFDPass *fdpass, unsigned int *id);
+
 typedef struct _qemuFDPassDirect qemuFDPassDirect;
-- 
2.35.3
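
[A minimal usage sketch, assuming a hypothetical caller that builds the
mapped-ram 'file:' URI described in the cover letter; the helper name
and offset are illustrative, not from this series:]

/* Hypothetical usage (not from this patch): convert a qemuFDPass into
 * the "file:/dev/fdset/N,offset=0x..." URI accepted by the 'migrate'
 * QMP command for mapped-ram. */
static char *
saveImageBuildFdsetURI(qemuFDPass *fdPass, unsigned long long offset)
{
    unsigned int fdsetId;

    if (qemuFDPassGetId(fdPass, &fdsetId) < 0)
        return NULL;

    return g_strdup_printf("file:/dev/fdset/%u,offset=0x%llx",
                           fdsetId, offset);
}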

On Thu, Aug 08, 2024 at 05:37:55PM -0600, Jim Fehlig via Devel wrote:
Add new function qemuFDPassGetId() for retrieving the fdset ID of a provided qemuFDPass object.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 src/qemu/qemu_fd.c | 18 ++++++++++++++++++
 src/qemu/qemu_fd.h |  3 +++
 2 files changed, 21 insertions(+)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Add new function qemuMigrationParamsCapEnabled() to check if a
capability is set in the caller-provided migration parameters.

Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 src/qemu/qemu_migration_params.c | 16 ++++++++++++++++
 src/qemu/qemu_migration_params.h |  4 ++++
 2 files changed, 20 insertions(+)

diff --git a/src/qemu/qemu_migration_params.c b/src/qemu/qemu_migration_params.c
index 98822012cc..c3c9120c22 100644
--- a/src/qemu/qemu_migration_params.c
+++ b/src/qemu/qemu_migration_params.c
@@ -1303,6 +1303,22 @@ qemuMigrationParamsGetULL(qemuMigrationParams *migParams,
 }
 
 
+/**
+ * Returns true if @cap is enabled in @migParams, false otherwise.
+ */
+bool
+qemuMigrationParamsCapEnabled(qemuMigrationParams *migParams,
+                              qemuMigrationCapability cap)
+{
+    bool enabled = false;
+
+    if (migParams)
+        ignore_value(virBitmapGetBit(migParams->caps, cap, &enabled));
+
+    return enabled;
+}
+
+
 /**
  * qemuMigrationParamsCheck:
  *
diff --git a/src/qemu/qemu_migration_params.h b/src/qemu/qemu_migration_params.h
index df67f1fb92..020071e8ef 100644
--- a/src/qemu/qemu_migration_params.h
+++ b/src/qemu/qemu_migration_params.h
@@ -136,6 +136,10 @@ qemuMigrationParamsGetULL(qemuMigrationParams *migParams,
                           qemuMigrationParam param,
                           unsigned long long *value);
 
+bool
+qemuMigrationParamsCapEnabled(qemuMigrationParams *migParams,
+                              qemuMigrationCapability cap);
+
 void
 qemuMigrationParamsSetBlockDirtyBitmapMapping(qemuMigrationParams *migParams,
                                               virJSONValue **params);
-- 
2.35.3
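
[A minimal in-tree sketch of the intended use, assuming the
QEMU_MIGRATION_CAP_MAPPED_RAM value that a later patch in this series
adds; illustrative fragment only:]

/* Illustrative only: dispatch the save path on whether mapped-ram was
 * enabled in the prepared migration parameters. */
if (qemuMigrationParamsCapEnabled(migParams, QEMU_MIGRATION_CAP_MAPPED_RAM)) {
    /* mapped-ram save path: requires a seekable fd */
} else {
    /* legacy sequential stream */
}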

On Thu, Aug 08, 2024 at 05:37:56PM -0600, Jim Fehlig via Devel wrote:
Add new function qemuMigrationParamsCapEnabled() to check if a capability is set in the caller-provided migration parameters.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 src/qemu/qemu_migration_params.c | 16 ++++++++++++++++
 src/qemu/qemu_migration_params.h |  4 ++++
 2 files changed, 20 insertions(+)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 src/qemu/qemu_migration_params.c | 18 ++++++++++++++++++
 src/qemu/qemu_migration_params.h |  5 +++++
 2 files changed, 23 insertions(+)

diff --git a/src/qemu/qemu_migration_params.c b/src/qemu/qemu_migration_params.c
index c3c9120c22..daa52269f4 100644
--- a/src/qemu/qemu_migration_params.c
+++ b/src/qemu/qemu_migration_params.c
@@ -1303,6 +1303,24 @@ qemuMigrationParamsGetULL(qemuMigrationParams *migParams,
 }
 
 
+int
+qemuMigrationParamsGetBool(qemuMigrationParams *migParams,
+                           qemuMigrationParam param,
+                           bool *value)
+{
+    if (!migParams || !value)
+        return 0;
+
+    if (qemuMigrationParamsCheckType(param, QEMU_MIGRATION_PARAM_TYPE_BOOL) < 0)
+        return -1;
+
+    *value = migParams->params[param].set ?
+             migParams->params[param].value.b : false;
+
+    return 0;
+}
+
+
 /**
  * Returns true if @cap is enabled in @migParams, false otherwise.
  */
diff --git a/src/qemu/qemu_migration_params.h b/src/qemu/qemu_migration_params.h
index 020071e8ef..64b2dde59b 100644
--- a/src/qemu/qemu_migration_params.h
+++ b/src/qemu/qemu_migration_params.h
@@ -136,6 +136,11 @@ qemuMigrationParamsGetULL(qemuMigrationParams *migParams,
                           qemuMigrationParam param,
                           unsigned long long *value);
 
+int
+qemuMigrationParamsGetBool(qemuMigrationParams *migParams,
+                           qemuMigrationParam param,
+                           bool *value);
+
 bool
 qemuMigrationParamsCapEnabled(qemuMigrationParams *migParams,
                               qemuMigrationCapability cap);
-- 
2.35.3
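
[A minimal usage sketch; QEMU_MIGRATION_PARAM_DIRECT_IO is a
hypothetical example value here (a direct-io parameter only enters the
picture in later patches of this series):]

/* Illustrative only: read a boolean migration parameter, treating an
 * unset parameter as false. */
bool directio = false;

if (qemuMigrationParamsGetBool(migParams, QEMU_MIGRATION_PARAM_DIRECT_IO,
                               &directio) < 0)
    return -1;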

On Thu, Aug 08, 2024 at 05:37:57PM -0600, Jim Fehlig via Devel wrote:
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 src/qemu/qemu_migration_params.c | 18 ++++++++++++++++++
 src/qemu/qemu_migration_params.h |  5 +++++
 2 files changed, 23 insertions(+)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Add the mapped-ram migration capability introduced in QEMU 9.0.

Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 src/qemu/qemu_migration_params.c | 1 +
 src/qemu/qemu_migration_params.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/src/qemu/qemu_migration_params.c b/src/qemu/qemu_migration_params.c
index daa52269f4..03937df2d3 100644
--- a/src/qemu/qemu_migration_params.c
+++ b/src/qemu/qemu_migration_params.c
@@ -106,6 +106,7 @@ VIR_ENUM_IMPL(qemuMigrationCapability,
               "zero-copy-send",
               "postcopy-preempt",
               "switchover-ack",
+              "mapped-ram",
 );
 
diff --git a/src/qemu/qemu_migration_params.h b/src/qemu/qemu_migration_params.h
index 64b2dde59b..91a1965a74 100644
--- a/src/qemu/qemu_migration_params.h
+++ b/src/qemu/qemu_migration_params.h
@@ -42,6 +42,7 @@ typedef enum {
     QEMU_MIGRATION_CAP_ZERO_COPY_SEND,
     QEMU_MIGRATION_CAP_POSTCOPY_PREEMPT,
     QEMU_MIGRATION_CAP_SWITCHOVER_ACK,
+    QEMU_MIGRATION_CAP_MAPPED_RAM,
 
     QEMU_MIGRATION_CAP_LAST
 } qemuMigrationCapability;
-- 
2.35.3

On Thu, Aug 08, 2024 at 05:37:58PM -0600, Jim Fehlig via Devel wrote:
Add the mapped-ram migration capability introduced in QEMU 9.0.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 src/qemu/qemu_migration_params.c | 1 +
 src/qemu/qemu_migration_params.h | 1 +
 2 files changed, 2 insertions(+)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Introduce qemuMigrationParamsForSave() to create a qemuMigrationParams
object initialized with appropriate migration capabilities and
parameters for a save operation.

Note that the mapped-ram capability also requires the multifd
capability. For now, the number of multifd channels is set to 1.
Future work to support parallel save/restore can set the number of
channels to a user-specified value.

Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 src/qemu/qemu_migration.c        |  2 +-
 src/qemu/qemu_migration_params.c | 21 +++++++++++++++++++++
 src/qemu/qemu_migration_params.h |  3 +++
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index 4fd7a0aafb..c1bdaa273b 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -6965,7 +6965,7 @@ qemuMigrationSrcToFile(virQEMUDriver *driver, virDomainObj *vm,
 
     /* Increase migration bandwidth to unlimited since target is a file.
      * Failure to change migration speed is not fatal. */
-    if (!(migParams = qemuMigrationParamsNew()))
+    if (!(migParams = qemuMigrationParamsForSave(false)))
         return -1;
 
     if (qemuMigrationParamsSetULL(migParams,
diff --git a/src/qemu/qemu_migration_params.c b/src/qemu/qemu_migration_params.c
index 03937df2d3..503d6165b0 100644
--- a/src/qemu/qemu_migration_params.c
+++ b/src/qemu/qemu_migration_params.c
@@ -783,6 +783,27 @@ qemuMigrationParamsFromFlags(virTypedParameterPtr params,
 }
 
 
+qemuMigrationParams *
+qemuMigrationParamsForSave(bool mappedRam)
+{
+    g_autoptr(qemuMigrationParams) saveParams = NULL;
+
+    if (!(saveParams = qemuMigrationParamsNew()))
+        return NULL;
+
+    if (mappedRam) {
+        if (virBitmapSetBit(saveParams->caps, QEMU_MIGRATION_CAP_MAPPED_RAM) < 0)
+            return NULL;
+        if (virBitmapSetBit(saveParams->caps, QEMU_MIGRATION_CAP_MULTIFD) < 0)
+            return NULL;
+        saveParams->params[QEMU_MIGRATION_PARAM_MULTIFD_CHANNELS].value.i = 1;
+        saveParams->params[QEMU_MIGRATION_PARAM_MULTIFD_CHANNELS].set = true;
+    }
+
+    return g_steal_pointer(&saveParams);
+}
+
+
 int
 qemuMigrationParamsDump(qemuMigrationParams *migParams,
                         virTypedParameterPtr *params,
diff --git a/src/qemu/qemu_migration_params.h b/src/qemu/qemu_migration_params.h
index 91a1965a74..fe239d9a8f 100644
--- a/src/qemu/qemu_migration_params.h
+++ b/src/qemu/qemu_migration_params.h
@@ -86,6 +86,9 @@ qemuMigrationParamsFromFlags(virTypedParameterPtr params,
                              unsigned int flags,
                              qemuMigrationParty party);
 
+qemuMigrationParams *
+qemuMigrationParamsForSave(bool mappedRam);
+
 int
 qemuMigrationParamsDump(qemuMigrationParams *migParams,
                         virTypedParameterPtr *params,
-- 
2.35.3
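
[For context, the save path added later in the series uses the new
constructor roughly as follows; condensed from patch 10 of this
series:]

/* Condensed from patch 10: enable mapped-ram in the save parameters
 * when the image header advertises the feature. */
mappedRam = data->header.features & QEMU_SAVE_FEATURE_MAPPED_RAM;
if (!(saveParams = qemuMigrationParamsForSave(mappedRam)))
    goto endjob;

ret = qemuSaveImageCreate(driver, vm, path, data, compressor,
                          saveParams, flags, VIR_ASYNC_JOB_SAVE);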

On Thu, Aug 08, 2024 at 05:37:59PM -0600, Jim Fehlig via Devel wrote:
Introduce qemuMigrationParamsForSave() to create a qemuMigrationParams object initialized with appropriate migration capabilities and parameters for a save operation.
Note that the mapped-ram capability also requires the multifd capability. For now, the number of multifd channels is set to 1. Future work to support parallel save/restore can set the number of channels to a user-specified value.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 src/qemu/qemu_migration.c        |  2 +-
 src/qemu/qemu_migration_params.c | 21 +++++++++++++++++++++
 src/qemu/qemu_migration_params.h |  3 +++
 3 files changed, 25 insertions(+), 1 deletion(-)
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

QEMU's new mapped-ram stream format [1] is incompatible with the
existing sequential stream format. An older libvirt+QEMU that does not
support mapped-ram must not attempt to restore a mapped-ram saved
image. Currently the only way to achieve this is to bump
QEMU_SAVE_VERSION.

To avoid future version bumps, add a new 'features' element to the
saved image header. The element is used now to indicate the use of the
mapped-ram feature, and provides a mechanism to support future save
image features.

[1] https://gitlab.com/qemu-project/qemu/-/blob/master/docs/devel/migration/mapp...

Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 src/qemu/qemu_saveimage.c | 7 +++++++
 src/qemu/qemu_saveimage.h | 9 +++++++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c
index 018ab5a222..50fec33f54 100644
--- a/src/qemu/qemu_saveimage.c
+++ b/src/qemu/qemu_saveimage.c
@@ -74,6 +74,7 @@ qemuSaveImageBswapHeader(virQEMUSaveHeader *hdr)
     hdr->was_running = GUINT32_SWAP_LE_BE(hdr->was_running);
     hdr->compressed = GUINT32_SWAP_LE_BE(hdr->compressed);
     hdr->cookieOffset = GUINT32_SWAP_LE_BE(hdr->cookieOffset);
+    hdr->features = GUINT32_SWAP_LE_BE(hdr->features);
 }
 
 
@@ -637,6 +638,12 @@ qemuSaveImageOpen(virQEMUDriver *driver,
         return -1;
     }
 
+    if (header->features && header->features != QEMU_SAVE_FEATURE_MAPPED_RAM) {
+        virReportError(VIR_ERR_OPERATION_FAILED, "%s",
+                       _("image contains unsupported features"));
+        return -1;
+    }
+
     if (header->cookieOffset)
         xml_len = header->cookieOffset;
     else
diff --git a/src/qemu/qemu_saveimage.h b/src/qemu/qemu_saveimage.h
index e541792153..9dd7de292d 100644
--- a/src/qemu/qemu_saveimage.h
+++ b/src/qemu/qemu_saveimage.h
@@ -28,10 +28,14 @@
  */
 #define QEMU_SAVE_MAGIC   "LibvirtQemudSave"
 #define QEMU_SAVE_PARTIAL "LibvirtQemudPart"
-#define QEMU_SAVE_VERSION 2
+#define QEMU_SAVE_VERSION 3
 
 G_STATIC_ASSERT(sizeof(QEMU_SAVE_MAGIC) == sizeof(QEMU_SAVE_PARTIAL));
 
+typedef enum {
+    QEMU_SAVE_FEATURE_MAPPED_RAM = 1 << 0,
+} qemuSaveFeatures;
+
 typedef struct _virQEMUSaveHeader virQEMUSaveHeader;
 struct _virQEMUSaveHeader {
     char magic[sizeof(QEMU_SAVE_MAGIC)-1];
@@ -40,7 +44,8 @@ struct _virQEMUSaveHeader {
     uint32_t was_running;
     uint32_t compressed;
     uint32_t cookieOffset;
-    uint32_t unused[14];
+    uint32_t features;
+    uint32_t unused[13];
 };
-- 
2.35.3

On Thu, Aug 08, 2024 at 05:38:00PM -0600, Jim Fehlig via Devel wrote:
QEMU's new mapped-ram stream format [1] is incompatible with the existing sequential stream format. An older libvirt+QEMU that does not support mapped-ram must not attempt to restore a mapped-ram saved image. Currently the only way to achieve this is to bump QEMU_SAVE_VERSION.
To avoid future version bumps, add a new 'features' element to the saved image header. The element is used now to indicate the use of the mapped-ram feature, and provides a mechanism to support future save image features.
[1] https://gitlab.com/qemu-project/qemu/-/blob/master/docs/devel/migration/mapp...
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 src/qemu/qemu_saveimage.c | 7 +++++++
 src/qemu/qemu_saveimage.h | 9 +++++++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c
index 018ab5a222..50fec33f54 100644
--- a/src/qemu/qemu_saveimage.c
+++ b/src/qemu/qemu_saveimage.c
@@ -74,6 +74,7 @@ qemuSaveImageBswapHeader(virQEMUSaveHeader *hdr)
     hdr->was_running = GUINT32_SWAP_LE_BE(hdr->was_running);
     hdr->compressed = GUINT32_SWAP_LE_BE(hdr->compressed);
     hdr->cookieOffset = GUINT32_SWAP_LE_BE(hdr->cookieOffset);
+    hdr->features = GUINT32_SWAP_LE_BE(hdr->features);
 }

@@ -637,6 +638,12 @@ qemuSaveImageOpen(virQEMUDriver *driver,
         return -1;
     }
+ if (header->features && header->features != QEMU_SAVE_FEATURE_MAPPED_RAM) {
Can simplify:

  if ((header->features & ~(QEMU_SAVE_FEATURE_MAPPED_RAM)) != 0)

and make the code pattern future proof by anticipating:

  if ((header->features & ~(QEMU_SAVE_FEATURE_MAPPED_RAM | QEMU_SAVE_FEATURE_BLAH)) != 0)
+        virReportError(VIR_ERR_OPERATION_FAILED, "%s",
+                       _("image contains unsupported features"));
+        return -1;
+    }
+
     if (header->cookieOffset)
         xml_len = header->cookieOffset;
     else
diff --git a/src/qemu/qemu_saveimage.h b/src/qemu/qemu_saveimage.h
index e541792153..9dd7de292d 100644
--- a/src/qemu/qemu_saveimage.h
+++ b/src/qemu/qemu_saveimage.h
@@ -28,10 +28,14 @@
  */
 #define QEMU_SAVE_MAGIC   "LibvirtQemudSave"
 #define QEMU_SAVE_PARTIAL "LibvirtQemudPart"
-#define QEMU_SAVE_VERSION 2
+#define QEMU_SAVE_VERSION 3
This will make *all* save images incompatible with old libvirt, even if they're not using mapped ram. This feels sub-optimal to me. I figure we should use v2, unless the new header fields are non-zero.
G_STATIC_ASSERT(sizeof(QEMU_SAVE_MAGIC) == sizeof(QEMU_SAVE_PARTIAL));
+typedef enum {
+    QEMU_SAVE_FEATURE_MAPPED_RAM = 1 << 0,
+} qemuSaveFeatures;
+
 typedef struct _virQEMUSaveHeader virQEMUSaveHeader;
 struct _virQEMUSaveHeader {
     char magic[sizeof(QEMU_SAVE_MAGIC)-1];
@@ -40,7 +44,8 @@ struct _virQEMUSaveHeader {
     uint32_t was_running;
     uint32_t compressed;
     uint32_t cookieOffset;
-    uint32_t unused[14];
+    uint32_t features;
Some features might be possible to restore from, even if the current impl doesn't know about it. In qcow2 they added "compatible features" and "incompatible features" fields to reflect that some are backwards compatible. So how about we do the same here with two 32-bit uints.
+    uint32_t unused[13];
 };

-- 
2.35.3
With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
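
[A minimal sketch of the qcow2-style split Daniel describes above, with
hypothetical field and flag names; none of these appear in the series:]

#include <stdint.h>

/* Hypothetical two-field scheme: unknown compatible bits are ignored,
 * unknown incompatible bits make the image unreadable. */
typedef struct {
    uint32_t compat_features;    /* safe to ignore if not understood */
    uint32_t incompat_features;  /* must be understood to restore */
} saveFeatureFields;

#define SAVE_INCOMPAT_MAPPED_RAM (1U << 0)
#define SAVE_INCOMPAT_SUPPORTED  SAVE_INCOMPAT_MAPPED_RAM

static int
saveImageCheckFeatures(const saveFeatureFields *f)
{
    /* Reject only unknown *incompatible* bits; compat bits never block
     * a restore. */
    if ((f->incompat_features & ~SAVE_INCOMPAT_SUPPORTED) != 0)
        return -1;
    return 0;
}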

On Thu, Oct 10, 2024 at 02:06:47PM +0100, Daniel P. Berrangé wrote:
On Thu, Aug 08, 2024 at 05:38:00PM -0600, Jim Fehlig via Devel wrote:
QEMU's new mapped-ram stream format [1] is incompatible with the existing sequential stream format. An older libvirt+QEMU that does not support mapped-ram must not attempt to restore a mapped-ram saved image. Currently the only way to achieve this is to bump QEMU_SAVE_VERSION.
To avoid future version bumps, add a new 'features' element to the saved image header. The element is used now to indicate the use of the mapped-ram feature, and provides a mechanism to support future save image features.
[1] https://gitlab.com/qemu-project/qemu/-/blob/master/docs/devel/migration/mapp...
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 src/qemu/qemu_saveimage.c | 7 +++++++
 src/qemu/qemu_saveimage.h | 9 +++++++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c
index 018ab5a222..50fec33f54 100644
--- a/src/qemu/qemu_saveimage.c
+++ b/src/qemu/qemu_saveimage.c
@@ -74,6 +74,7 @@ qemuSaveImageBswapHeader(virQEMUSaveHeader *hdr)
     hdr->was_running = GUINT32_SWAP_LE_BE(hdr->was_running);
     hdr->compressed = GUINT32_SWAP_LE_BE(hdr->compressed);
     hdr->cookieOffset = GUINT32_SWAP_LE_BE(hdr->cookieOffset);
+    hdr->features = GUINT32_SWAP_LE_BE(hdr->features);
 }

@@ -637,6 +638,12 @@ qemuSaveImageOpen(virQEMUDriver *driver,
         return -1;
     }
I'm not sure the code above this point is backwards compatible with v2. The code does 'saferead(.... sizeof(*header))', which will be over-reading data if the file was saved with v2.....
+ if (header->features && header->features != QEMU_SAVE_FEATURE_MAPPED_RAM) {
...and this check will be validating whatever random data followed the header in old libvirt save images. We must only validate 'features' for v3 images.
Can simplify:

  if ((header->features & ~(QEMU_SAVE_FEATURE_MAPPED_RAM)) != 0)

and make the code pattern future proof by anticipating:

  if ((header->features & ~(QEMU_SAVE_FEATURE_MAPPED_RAM | QEMU_SAVE_FEATURE_BLAH)) != 0)
+        virReportError(VIR_ERR_OPERATION_FAILED, "%s",
+                       _("image contains unsupported features"));
+        return -1;
+    }
+
     if (header->cookieOffset)
         xml_len = header->cookieOffset;
     else
diff --git a/src/qemu/qemu_saveimage.h b/src/qemu/qemu_saveimage.h
index e541792153..9dd7de292d 100644
--- a/src/qemu/qemu_saveimage.h
+++ b/src/qemu/qemu_saveimage.h
@@ -28,10 +28,14 @@
  */
 #define QEMU_SAVE_MAGIC   "LibvirtQemudSave"
 #define QEMU_SAVE_PARTIAL "LibvirtQemudPart"
-#define QEMU_SAVE_VERSION 2
+#define QEMU_SAVE_VERSION 3
This will make *all* save images incompatible with old libvirt, even if they're not using mapped ram. This feels sub-optimal to me. I figure we should use v2, unless the new header fields are non-zero.
Perhaps just don't change this version until the next patch
G_STATIC_ASSERT(sizeof(QEMU_SAVE_MAGIC) == sizeof(QEMU_SAVE_PARTIAL));
+typedef enum {
+    QEMU_SAVE_FEATURE_MAPPED_RAM = 1 << 0,
+} qemuSaveFeatures;
+
 typedef struct _virQEMUSaveHeader virQEMUSaveHeader;
 struct _virQEMUSaveHeader {
     char magic[sizeof(QEMU_SAVE_MAGIC)-1];
@@ -40,7 +44,8 @@ struct _virQEMUSaveHeader {
     uint32_t was_running;
     uint32_t compressed;
     uint32_t cookieOffset;
-    uint32_t unused[14];
+    uint32_t features;
Some features might be possible to restore from, even if the current impl doesn't know about it.
In qcow2 they added "compatible features" and "incompatible features" fields to reflect that some are backwards compatible. So how about we do the same here with two 32-bit uints.
+    uint32_t unused[13];
 };

-- 
2.35.3
With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

On 10/10/24 07:06, Daniel P. Berrangé wrote:
On Thu, Aug 08, 2024 at 05:38:00PM -0600, Jim Fehlig via Devel wrote:
QEMU's new mapped-ram stream format [1] is incompatible with the existing sequential stream format. An older libvirt+QEMU that does not support mapped-ram must not attempt to restore a mapped-ram saved image. Currently the only way to achieve this is to bump QEMU_SAVE_VERSION.
To avoid future version bumps, add a new 'features' element to the saved image header. The element is used now to indicate the use of the mapped-ram feature, and provides a mechanism to support future save image features.
[1] https://gitlab.com/qemu-project/qemu/-/blob/master/docs/devel/migration/mapp...
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 src/qemu/qemu_saveimage.c | 7 +++++++
 src/qemu/qemu_saveimage.h | 9 +++++++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c
index 018ab5a222..50fec33f54 100644
--- a/src/qemu/qemu_saveimage.c
+++ b/src/qemu/qemu_saveimage.c
@@ -74,6 +74,7 @@ qemuSaveImageBswapHeader(virQEMUSaveHeader *hdr)
     hdr->was_running = GUINT32_SWAP_LE_BE(hdr->was_running);
     hdr->compressed = GUINT32_SWAP_LE_BE(hdr->compressed);
     hdr->cookieOffset = GUINT32_SWAP_LE_BE(hdr->cookieOffset);
+    hdr->features = GUINT32_SWAP_LE_BE(hdr->features);
 }

@@ -637,6 +638,12 @@ qemuSaveImageOpen(virQEMUDriver *driver,
         return -1;
     }
+ if (header->features && header->features != QEMU_SAVE_FEATURE_MAPPED_RAM) {
Can simplify:

  if ((header->features & ~(QEMU_SAVE_FEATURE_MAPPED_RAM)) != 0)

and make the code pattern future proof by anticipating:

  if ((header->features & ~(QEMU_SAVE_FEATURE_MAPPED_RAM | QEMU_SAVE_FEATURE_BLAH)) != 0)
+        virReportError(VIR_ERR_OPERATION_FAILED, "%s",
+                       _("image contains unsupported features"));
+        return -1;
+    }
+
     if (header->cookieOffset)
         xml_len = header->cookieOffset;
     else
diff --git a/src/qemu/qemu_saveimage.h b/src/qemu/qemu_saveimage.h
index e541792153..9dd7de292d 100644
--- a/src/qemu/qemu_saveimage.h
+++ b/src/qemu/qemu_saveimage.h
@@ -28,10 +28,14 @@
  */
 #define QEMU_SAVE_MAGIC   "LibvirtQemudSave"
 #define QEMU_SAVE_PARTIAL "LibvirtQemudPart"
-#define QEMU_SAVE_VERSION 2
+#define QEMU_SAVE_VERSION 3
This will make *all* save images incompatible with old libvirt, even if they're not using mapped ram. This feels sub-optimal to me. I figure we should use v2, unless the new header fields are non-zero.
G_STATIC_ASSERT(sizeof(QEMU_SAVE_MAGIC) == sizeof(QEMU_SAVE_PARTIAL));
+typedef enum {
+    QEMU_SAVE_FEATURE_MAPPED_RAM = 1 << 0,
+} qemuSaveFeatures;
+
 typedef struct _virQEMUSaveHeader virQEMUSaveHeader;
 struct _virQEMUSaveHeader {
     char magic[sizeof(QEMU_SAVE_MAGIC)-1];
@@ -40,7 +44,8 @@ struct _virQEMUSaveHeader {
     uint32_t was_running;
     uint32_t compressed;
     uint32_t cookieOffset;
-    uint32_t unused[14];
+    uint32_t features;
Some features might be possible to restore from, even if the current impl doesn't know about it.
In qcow2 they added "compatible features" and "incompatible features" fields to reflect that some are backwards compatible. So how about we do the same here with two 32-bit uints.
Unfortunately we'll still need the version bump to prevent older
libvirt from restoring a mapped-ram saved image, but we can avoid
adding the 'features' field if we use the existing 'compressed' field
(renamed to 'format', see commit bd6d7ebf62 and related b0dc8a923d) to
control mapped-ram. And if we use the existing field for mapped-ram,
do we even need a version setting in qemu.conf?

Martin seemed receptive to using *_image_format for mapped-ram. What
are your opinions on that?

Regards,
Jim
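
[A rough sketch of what reusing the renamed 'format' field could look
like; the enum and function names here are illustrative, not code from
libvirt or this series:]

#include <stdbool.h>
#include <stdint.h>

/* Illustrative only: fold mapped-ram into the save image 'format'
 * field (the renamed 'compressed' field), instead of adding a
 * separate 'features' field. */
typedef enum {
    SAVE_FORMAT_RAW = 0,
    SAVE_FORMAT_GZIP,
    /* ... other compression formats ... */
    SAVE_FORMAT_MAPPED_RAM,    /* hypothetical new value */
} saveFormat;

static bool
saveFormatIsMappedRam(uint32_t format)
{
    return format == SAVE_FORMAT_MAPPED_RAM;
}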

Hi Daniel,

Have you had time to consider the idea of using the existing
*_image_format settings in qemu.conf to control mapped-ram? I'd like
to know if that's a possible approach before reworking this patch.

Regards,
Jim

On 10/10/24 15:03, Jim Fehlig wrote:
On 10/10/24 07:06, Daniel P. Berrangé wrote:
On Thu, Aug 08, 2024 at 05:38:00PM -0600, Jim Fehlig via Devel wrote:
QEMU's new mapped-ram stream format [1] is incompatible with the existing sequential stream format. An older libvirt+QEMU that does not support mapped-ram must not attempt to restore a mapped-ram saved image. Currently the only way to achieve this is to bump QEMU_SAVE_VERSION.
To avoid future version bumps, add a new 'features' element to the saved image header. The element is used now to indicate the use of the mapped-ram feature, and provides a mechanism to support future save image features.
[1] https://gitlab.com/qemu-project/qemu/-/blob/master/docs/devel/migration/mapp...
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 src/qemu/qemu_saveimage.c | 7 +++++++
 src/qemu/qemu_saveimage.h | 9 +++++++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c
index 018ab5a222..50fec33f54 100644
--- a/src/qemu/qemu_saveimage.c
+++ b/src/qemu/qemu_saveimage.c
@@ -74,6 +74,7 @@ qemuSaveImageBswapHeader(virQEMUSaveHeader *hdr)
     hdr->was_running = GUINT32_SWAP_LE_BE(hdr->was_running);
     hdr->compressed = GUINT32_SWAP_LE_BE(hdr->compressed);
     hdr->cookieOffset = GUINT32_SWAP_LE_BE(hdr->cookieOffset);
+    hdr->features = GUINT32_SWAP_LE_BE(hdr->features);
 }

@@ -637,6 +638,12 @@ qemuSaveImageOpen(virQEMUDriver *driver,
         return -1;
     }

+    if (header->features && header->features != QEMU_SAVE_FEATURE_MAPPED_RAM) {
Can simplify:

  if ((header->features & ~(QEMU_SAVE_FEATURE_MAPPED_RAM)) != 0)

and make the code pattern future proof by anticipating:

  if ((header->features & ~(QEMU_SAVE_FEATURE_MAPPED_RAM | QEMU_SAVE_FEATURE_BLAH)) != 0)
+        virReportError(VIR_ERR_OPERATION_FAILED, "%s",
+                       _("image contains unsupported features"));
+        return -1;
+    }
+
     if (header->cookieOffset)
         xml_len = header->cookieOffset;
     else
diff --git a/src/qemu/qemu_saveimage.h b/src/qemu/qemu_saveimage.h
index e541792153..9dd7de292d 100644
--- a/src/qemu/qemu_saveimage.h
+++ b/src/qemu/qemu_saveimage.h
@@ -28,10 +28,14 @@
  */
 #define QEMU_SAVE_MAGIC   "LibvirtQemudSave"
 #define QEMU_SAVE_PARTIAL "LibvirtQemudPart"
-#define QEMU_SAVE_VERSION 2
+#define QEMU_SAVE_VERSION 3
This will make *all* save images incompatible with old libvirt, even if they're not using mapped ram. This feels sub-optimal to me. I figure we should use v2, unless the new header fields are non-zero.
 G_STATIC_ASSERT(sizeof(QEMU_SAVE_MAGIC) == sizeof(QEMU_SAVE_PARTIAL));

+typedef enum {
+    QEMU_SAVE_FEATURE_MAPPED_RAM = 1 << 0,
+} qemuSaveFeatures;
+
 typedef struct _virQEMUSaveHeader virQEMUSaveHeader;
 struct _virQEMUSaveHeader {
     char magic[sizeof(QEMU_SAVE_MAGIC)-1];
@@ -40,7 +44,8 @@ struct _virQEMUSaveHeader {
     uint32_t was_running;
     uint32_t compressed;
     uint32_t cookieOffset;
-    uint32_t unused[14];
+    uint32_t features;
Some features might be possible to restore from, even if the current impl doesn't know about it.
In qcow2 they added "compatible features" and "incompatible features" fields to reflect that some are backwards compatible. So how about we do the same here with two 32-bit uints.
Unfortunately we'll still need the version bump to prevent older libvirt from restoring a mapped-ram saved image, but we can avoid adding the 'features' field if we use the existing 'compressed' field (renamed to 'format', see commit bd6d7ebf62 and related b0dc8a923d) to control mapped-ram. And if we use the existing field for mapped-ram, do we even need a version setting in qemu.conf?
Martin seemed receptive to using *_image_format for mapped-ram. What are your opinions on that?
Regards, Jim

Add a 'save_image_version' setting to qemu.conf to control the image
version when saving a VM with 'virsh save' or 'virsh managedsave'.
Default to the new version 3.

Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 src/qemu/libvirtd_qemu.aug         |  1 +
 src/qemu/qemu.conf.in              |  6 ++++++
 src/qemu/qemu_conf.c               | 16 ++++++++++++++++
 src/qemu/qemu_conf.h               |  5 +++++
 src/qemu/qemu_driver.c             |  3 +--
 src/qemu/qemu_saveimage.c          | 11 ++++++-----
 src/qemu/qemu_saveimage.h          |  8 ++++----
 src/qemu/qemu_snapshot.c           |  4 ++--
 src/qemu/test_libvirtd_qemu.aug.in |  1 +
 9 files changed, 42 insertions(+), 13 deletions(-)

diff --git a/src/qemu/libvirtd_qemu.aug b/src/qemu/libvirtd_qemu.aug
index 2b6526538f..acbfcd9fc3 100644
--- a/src/qemu/libvirtd_qemu.aug
+++ b/src/qemu/libvirtd_qemu.aug
@@ -92,6 +92,7 @@ module Libvirtd_qemu =
                  | str_array_entry "namespaces"
 
    let save_entry = str_entry "save_image_format"
+                 | int_entry "save_image_version"
                  | str_entry "dump_image_format"
                  | str_entry "snapshot_image_format"
                  | str_entry "auto_dump_path"
diff --git a/src/qemu/qemu.conf.in b/src/qemu/qemu.conf.in
index 6bc2140dcb..b5df8c1cc6 100644
--- a/src/qemu/qemu.conf.in
+++ b/src/qemu/qemu.conf.in
@@ -590,6 +590,11 @@
 # at scheduled saving, and it is an error if the specified save_image_format
 # is not valid, or the requested compression program can't be found.
 #
+# save_image_version is applicable when using 'virsh save' or 'virsh managed-save'.
+# Currently only versions 2 and 3 are supported, with version 3 being the default.
+# If saved images must be compatible with an older libvirt without this setting,
+# then set save_image_version to 2.
+#
 # dump_image_format is used when you use 'virsh dump' at emergency
 # crashdump, and if the specified dump_image_format is not valid, or
 # the requested compression program can't be found, this falls
@@ -601,6 +606,7 @@
 # or the requested compression program can't be found.
 #
 #save_image_format = "raw"
+#save_image_version = 3
 #dump_image_format = "raw"
 #snapshot_image_format = "raw"
 
diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c
index b36bede6c3..ab4122708c 100644
--- a/src/qemu/qemu_conf.c
+++ b/src/qemu/qemu_conf.c
@@ -66,6 +66,10 @@ VIR_LOG_INIT("qemu.qemu_conf");
 #define QEMU_MIGRATION_PORT_MIN 49152
 #define QEMU_MIGRATION_PORT_MAX 49215
 
+/* Need to reconcile definition here and in qemu_saveimage.h */
+#define QEMU_SAVE_VERSION 3
+
+
 VIR_ENUM_IMPL(virQEMUSchedCore,
               QEMU_SCHED_CORE_LAST,
               "none",
@@ -246,6 +250,8 @@ virQEMUDriverConfig *virQEMUDriverConfigNew(bool privileged,
     cfg->migrationPortMin = QEMU_MIGRATION_PORT_MIN;
     cfg->migrationPortMax = QEMU_MIGRATION_PORT_MAX;
 
+    cfg->saveImageVersion = QEMU_SAVE_IMAGE_VERSION;
+
     /* For privileged driver, try and find hugetlbfs mounts automatically.
      * Non-privileged driver requires admin to create a dir for the
      * user, chown it, and then let user configure it manually. */
@@ -608,6 +614,16 @@ static int
 virQEMUDriverConfigLoadSaveEntry(virQEMUDriverConfig *cfg,
                                  virConf *conf)
 {
+    if (virConfGetValueUInt(conf, "save_image_version", &cfg->saveImageVersion) < 0)
+        return -1;
+    if (cfg->saveImageVersion < QEMU_SAVE_IMAGE_VERSION_MIN ||
+        cfg->saveImageVersion > QEMU_SAVE_IMAGE_VERSION) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("Invalid save image version %1$u"),
+                       cfg->saveImageVersion);
+        return -1;
+    }
+
     if (virConfGetValueString(conf, "save_image_format", &cfg->saveImageFormat) < 0)
         return -1;
     if (virConfGetValueString(conf, "dump_image_format", &cfg->dumpImageFormat) < 0)
diff --git a/src/qemu/qemu_conf.h b/src/qemu/qemu_conf.h
index aa1e1a626c..55abede7e3 100644
--- a/src/qemu/qemu_conf.h
+++ b/src/qemu/qemu_conf.h
@@ -45,6 +45,10 @@
 
 #define QEMU_DRIVER_NAME "QEMU"
 
+#define QEMU_SAVE_IMAGE_VERSION_MIN 2
+#define QEMU_SAVE_IMAGE_VERSION 3
+
+
 typedef enum {
     QEMU_SCHED_CORE_NONE = 0,
     QEMU_SCHED_CORE_VCPUS,
@@ -193,6 +197,7 @@ struct _virQEMUDriverConfig {
     bool securityRequireConfined;
 
     char *saveImageFormat;
+    unsigned int saveImageVersion;
     char *dumpImageFormat;
     char *snapshotImageFormat;
 
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 736602333e..6c0d3e097c 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -2683,8 +2683,7 @@ qemuDomainSaveInternal(virQEMUDriver *driver,
     if (!(cookie = qemuDomainSaveCookieNew(vm)))
         goto endjob;
 
-    if (!(data = virQEMUSaveDataNew(xml, cookie, was_running, compressed,
-                                    driver->xmlopt)))
+    if (!(data = virQEMUSaveDataNew(driver, xml, cookie, was_running, compressed)))
         goto endjob;
     xml = NULL;
 
diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c
index 50fec33f54..bffa0a3397 100644
--- a/src/qemu/qemu_saveimage.c
+++ b/src/qemu/qemu_saveimage.c
@@ -95,25 +95,26 @@ G_DEFINE_AUTOPTR_CLEANUP_FUNC(virQEMUSaveData, virQEMUSaveDataFree);
  * This function steals @domXML on success.
  */
 virQEMUSaveData *
-virQEMUSaveDataNew(char *domXML,
+virQEMUSaveDataNew(virQEMUDriver *driver,
+                   char *domXML,
                    qemuDomainSaveCookie *cookieObj,
                    bool running,
-                   int compressed,
-                   virDomainXMLOption *xmlopt)
+                   int compressed)
 {
     virQEMUSaveData *data = NULL;
     virQEMUSaveHeader *header;
+    g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver);
 
     data = g_new0(virQEMUSaveData, 1);
 
     if (cookieObj &&
         !(data->cookie = virSaveCookieFormat((virObject *) cookieObj,
-                                             virDomainXMLOptionGetSaveCookie(xmlopt))))
+                                             virDomainXMLOptionGetSaveCookie(driver->xmlopt))))
         goto error;
 
     header = &data->header;
     memcpy(header->magic, QEMU_SAVE_PARTIAL, sizeof(header->magic));
-    header->version = QEMU_SAVE_VERSION;
+    header->version = cfg->saveImageVersion;
     header->was_running = running ? 1 : 0;
     header->compressed = compressed;
 
diff --git a/src/qemu/qemu_saveimage.h b/src/qemu/qemu_saveimage.h
index 9dd7de292d..3cee846f14 100644
--- a/src/qemu/qemu_saveimage.h
+++ b/src/qemu/qemu_saveimage.h
@@ -28,7 +28,7 @@
  */
 #define QEMU_SAVE_MAGIC   "LibvirtQemudSave"
 #define QEMU_SAVE_PARTIAL "LibvirtQemudPart"
-#define QEMU_SAVE_VERSION 3
+#define QEMU_SAVE_VERSION QEMU_SAVE_IMAGE_VERSION
 
 G_STATIC_ASSERT(sizeof(QEMU_SAVE_MAGIC) == sizeof(QEMU_SAVE_PARTIAL));
 
@@ -123,11 +123,11 @@ virQEMUSaveDataWrite(virQEMUSaveData *data,
                      const char *path);
 
 virQEMUSaveData *
-virQEMUSaveDataNew(char *domXML,
+virQEMUSaveDataNew(virQEMUDriver *driver,
+                   char *domXML,
                    qemuDomainSaveCookie *cookieObj,
                    bool running,
-                   int compressed,
-                   virDomainXMLOption *xmlopt);
+                   int compressed);
 
 void
 virQEMUSaveDataFree(virQEMUSaveData *data);
diff --git a/src/qemu/qemu_snapshot.c b/src/qemu/qemu_snapshot.c
index f5260c4a22..1d75208814 100644
--- a/src/qemu/qemu_snapshot.c
+++ b/src/qemu/qemu_snapshot.c
@@ -1390,9 +1390,9 @@ qemuSnapshotCreateActiveExternal(virQEMUDriver *driver,
         !(snapdef->cookie = (virObject *) qemuDomainSaveCookieNew(vm)))
         goto cleanup;
 
-    if (!(data = virQEMUSaveDataNew(xml,
+    if (!(data = virQEMUSaveDataNew(driver, xml,
                                     (qemuDomainSaveCookie *) snapdef->cookie,
-                                    resume, compressed, driver->xmlopt)))
+                                    resume, compressed)))
         goto cleanup;
     xml = NULL;
 
diff --git a/src/qemu/test_libvirtd_qemu.aug.in b/src/qemu/test_libvirtd_qemu.aug.in
index b97e6de11e..8bcb332020 100644
--- a/src/qemu/test_libvirtd_qemu.aug.in
+++ b/src/qemu/test_libvirtd_qemu.aug.in
@@ -70,6 +70,7 @@ module Test_libvirtd_qemu =
 { "8" = "/dev/userfaultfd" }
 }
 { "save_image_format" = "raw" }
+{ "save_image_version" = "3" }
 { "dump_image_format" = "raw" }
 { "snapshot_image_format" = "raw" }
 { "auto_dump_path" = "/var/lib/libvirt/qemu/dump" }
-- 
2.35.3

On Thu, Aug 08, 2024 at 05:38:01PM -0600, Jim Fehlig via Devel wrote:
Add a 'save_image_version' setting to qemu.conf to control the image version when saving a VM with 'virsh save' or 'virsh managedsave'. Default to the new version 3.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 src/qemu/libvirtd_qemu.aug         |  1 +
 src/qemu/qemu.conf.in              |  6 ++++++
 src/qemu/qemu_conf.c               | 16 ++++++++++++++++
 src/qemu/qemu_conf.h               |  5 +++++
 src/qemu/qemu_driver.c             |  3 +--
 src/qemu/qemu_saveimage.c          | 11 ++++++-----
 src/qemu/qemu_saveimage.h          |  8 ++++----
 src/qemu/qemu_snapshot.c           |  4 ++--
 src/qemu/test_libvirtd_qemu.aug.in |  1 +
 9 files changed, 42 insertions(+), 13 deletions(-)

diff --git a/src/qemu/libvirtd_qemu.aug b/src/qemu/libvirtd_qemu.aug
index 2b6526538f..acbfcd9fc3 100644
--- a/src/qemu/libvirtd_qemu.aug
+++ b/src/qemu/libvirtd_qemu.aug
@@ -92,6 +92,7 @@ module Libvirtd_qemu =
                  | str_array_entry "namespaces"

    let save_entry = str_entry "save_image_format"
+                 | int_entry "save_image_version"
                  | str_entry "dump_image_format"
                  | str_entry "snapshot_image_format"
                  | str_entry "auto_dump_path"
diff --git a/src/qemu/qemu.conf.in b/src/qemu/qemu.conf.in
index 6bc2140dcb..b5df8c1cc6 100644
--- a/src/qemu/qemu.conf.in
+++ b/src/qemu/qemu.conf.in
@@ -590,6 +590,11 @@
 # at scheduled saving, and it is an error if the specified save_image_format
 # is not valid, or the requested compression program can't be found.
 #
+# save_image_version is applicable when using 'virsh save' or 'virsh managed-save'.
+# Currently only versions 2 and 3 are supported, with version 3 being the default.
+# If saved images must be compatible with an older libvirt without this setting,
+# then set save_image_version to 2.
+#
 # dump_image_format is used when you use 'virsh dump' at emergency
 # crashdump, and if the specified dump_image_format is not valid, or
 # the requested compression program can't be found, this falls
@@ -601,6 +606,7 @@
 # or the requested compression program can't be found.
 #
 #save_image_format = "raw"
+#save_image_version = 3
 #dump_image_format = "raw"
 #snapshot_image_format = "raw"

diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c
index b36bede6c3..ab4122708c 100644
--- a/src/qemu/qemu_conf.c
+++ b/src/qemu/qemu_conf.c
@@ -66,6 +66,10 @@ VIR_LOG_INIT("qemu.qemu_conf");
 #define QEMU_MIGRATION_PORT_MIN 49152
 #define QEMU_MIGRATION_PORT_MAX 49215

+/* Need to reconcile definition here and in qemu_saveimage.h */
+#define QEMU_SAVE_VERSION 3
You've not used this - instead you used QEMU_SAVE_IMAGE_VERSION defined in qemu_conf.h
+
+
 VIR_ENUM_IMPL(virQEMUSchedCore,
               QEMU_SCHED_CORE_LAST,
               "none",
@@ -246,6 +250,8 @@ virQEMUDriverConfig *virQEMUDriverConfigNew(bool privileged,
     cfg->migrationPortMin = QEMU_MIGRATION_PORT_MIN;
     cfg->migrationPortMax = QEMU_MIGRATION_PORT_MAX;

+    cfg->saveImageVersion = QEMU_SAVE_IMAGE_VERSION;
+
     /* For privileged driver, try and find hugetlbfs mounts automatically.
      * Non-privileged driver requires admin to create a dir for the
      * user, chown it, and then let user configure it manually. */
@@ -608,6 +614,16 @@ static int
 virQEMUDriverConfigLoadSaveEntry(virQEMUDriverConfig *cfg,
                                  virConf *conf)
 {
+    if (virConfGetValueUInt(conf, "save_image_version", &cfg->saveImageVersion) < 0)
+        return -1;
+    if (cfg->saveImageVersion < QEMU_SAVE_IMAGE_VERSION_MIN ||
+        cfg->saveImageVersion > QEMU_SAVE_IMAGE_VERSION) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("Invalid save image version %1$u"),
+                       cfg->saveImageVersion);
+        return -1;
+    }
+
     if (virConfGetValueString(conf, "save_image_format", &cfg->saveImageFormat) < 0)
         return -1;
     if (virConfGetValueString(conf, "dump_image_format", &cfg->dumpImageFormat) < 0)
diff --git a/src/qemu/qemu_conf.h b/src/qemu/qemu_conf.h
index aa1e1a626c..55abede7e3 100644
--- a/src/qemu/qemu_conf.h
+++ b/src/qemu/qemu_conf.h
@@ -45,6 +45,10 @@

 #define QEMU_DRIVER_NAME "QEMU"

+#define QEMU_SAVE_IMAGE_VERSION_MIN 2
+#define QEMU_SAVE_IMAGE_VERSION 3
Call the second one 'MAX' for parity, and also to make it more obvious that we're not automatically using the 'MAX' by default.
+
+
 typedef enum {
     QEMU_SCHED_CORE_NONE = 0,
     QEMU_SCHED_CORE_VCPUS,
@@ -193,6 +197,7 @@ struct _virQEMUDriverConfig {
     bool securityRequireConfined;

     char *saveImageFormat;
+    unsigned int saveImageVersion;
     char *dumpImageFormat;
     char *snapshotImageFormat;

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 736602333e..6c0d3e097c 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -2683,8 +2683,7 @@ qemuDomainSaveInternal(virQEMUDriver *driver,
     if (!(cookie = qemuDomainSaveCookieNew(vm)))
         goto endjob;

-    if (!(data = virQEMUSaveDataNew(xml, cookie, was_running, compressed,
-                                    driver->xmlopt)))
+    if (!(data = virQEMUSaveDataNew(driver, xml, cookie, was_running, compressed)))
         goto endjob;
     xml = NULL;

diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c
index 50fec33f54..bffa0a3397 100644
--- a/src/qemu/qemu_saveimage.c
+++ b/src/qemu/qemu_saveimage.c
@@ -95,25 +95,26 @@ G_DEFINE_AUTOPTR_CLEANUP_FUNC(virQEMUSaveData, virQEMUSaveDataFree);
  * This function steals @domXML on success.
  */
 virQEMUSaveData *
-virQEMUSaveDataNew(char *domXML,
+virQEMUSaveDataNew(virQEMUDriver *driver,
+                   char *domXML,
                    qemuDomainSaveCookie *cookieObj,
                    bool running,
-                   int compressed,
-                   virDomainXMLOption *xmlopt)
+                   int compressed)
 {
     virQEMUSaveData *data = NULL;
     virQEMUSaveHeader *header;
+    g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver);

     data = g_new0(virQEMUSaveData, 1);

     if (cookieObj &&
         !(data->cookie = virSaveCookieFormat((virObject *) cookieObj,
-                                             virDomainXMLOptionGetSaveCookie(xmlopt))))
+                                             virDomainXMLOptionGetSaveCookie(driver->xmlopt))))
         goto error;

     header = &data->header;
     memcpy(header->magic, QEMU_SAVE_PARTIAL, sizeof(header->magic));
-    header->version = QEMU_SAVE_VERSION;
+    header->version = cfg->saveImageVersion;
     header->was_running = running ? 1 : 0;
     header->compressed = compressed;

diff --git a/src/qemu/qemu_saveimage.h b/src/qemu/qemu_saveimage.h
index 9dd7de292d..3cee846f14 100644
--- a/src/qemu/qemu_saveimage.h
+++ b/src/qemu/qemu_saveimage.h
@@ -28,7 +28,7 @@
  */
 #define QEMU_SAVE_MAGIC   "LibvirtQemudSave"
 #define QEMU_SAVE_PARTIAL "LibvirtQemudPart"
-#define QEMU_SAVE_VERSION 3
+#define QEMU_SAVE_VERSION QEMU_SAVE_IMAGE_VERSION
Surely we shouldn't need this constant anymore since you changed the qemu_saveimage.c code to use 'cfg->saveImageVersion' instead of 'QEMU_SAVE_VERSION'
With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Move the code in qemuSaveImageCreate that opens, labels, and wraps the
save image fd to a helper function, providing more flexibility for
upcoming mapped-ram support.

Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 src/qemu/qemu_saveimage.c | 65 +++++++++++++++++++++++++++------------
 1 file changed, 45 insertions(+), 20 deletions(-)

diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c
index bffa0a3397..6f2ce40124 100644
--- a/src/qemu/qemu_saveimage.c
+++ b/src/qemu/qemu_saveimage.c
@@ -361,6 +361,50 @@ qemuSaveImageDecompressionStop(virCommand *cmd,
 }
 
 
+static int
+qemuSaveImageCreateFd(virQEMUDriver *driver,
+                      virDomainObj *vm,
+                      const char *path,
+                      virFileWrapperFd **wrapperFd,
+                      bool *needUnlink,
+                      unsigned int flags)
+{
+    g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver);
+    int ret = -1;
+    VIR_AUTOCLOSE fd = -1;
+    int directFlag = 0;
+    unsigned int wrapperFlags = VIR_FILE_WRAPPER_NON_BLOCKING;
+
+    if (flags & VIR_DOMAIN_SAVE_BYPASS_CACHE) {
+        wrapperFlags |= VIR_FILE_WRAPPER_BYPASS_CACHE;
+        directFlag = virFileDirectFdFlag();
+        if (directFlag < 0) {
+            virReportError(VIR_ERR_OPERATION_FAILED, "%s",
+                           _("bypass cache unsupported by this system"));
+            return -1;
+        }
+    }
+
+    fd = virQEMUFileOpenAs(cfg->user, cfg->group, false, path,
+                           O_WRONLY | O_TRUNC | O_CREAT | directFlag,
+                           needUnlink);
+
+    if (fd < 0)
+        return -1;
+
+    if (qemuSecuritySetImageFDLabel(driver->securityManager, vm->def, fd) < 0)
+        return -1;
+
+    if (!(*wrapperFd = virFileWrapperFdNew(&fd, path, wrapperFlags)))
+        return -1;
+
+    ret = fd;
+    fd = -1;
+
+    return ret;
+}
+
+
 /* Helper function to execute a migration to file with a correct save header
  * the caller needs to make sure that the processors are stopped and do all other
  * actions besides saving memory */
@@ -377,33 +421,14 @@ qemuSaveImageCreate(virQEMUDriver *driver,
     bool needUnlink = false;
     int ret = -1;
     int fd = -1;
-    int directFlag = 0;
     virFileWrapperFd *wrapperFd = NULL;
-    unsigned int wrapperFlags = VIR_FILE_WRAPPER_NON_BLOCKING;
 
     /* Obtain the file handle. */
-    if ((flags & VIR_DOMAIN_SAVE_BYPASS_CACHE)) {
-        wrapperFlags |= VIR_FILE_WRAPPER_BYPASS_CACHE;
-        directFlag = virFileDirectFdFlag();
-        if (directFlag < 0) {
-            virReportError(VIR_ERR_OPERATION_FAILED, "%s",
-                           _("bypass cache unsupported by this system"));
-            goto cleanup;
-        }
-    }
+    fd = qemuSaveImageCreateFd(driver, vm, path, &wrapperFd, &needUnlink, flags);
 
-    fd = virQEMUFileOpenAs(cfg->user, cfg->group, false, path,
-                           O_WRONLY | O_TRUNC | O_CREAT | directFlag,
-                           &needUnlink);
     if (fd < 0)
         goto cleanup;
 
-    if (qemuSecuritySetImageFDLabel(driver->securityManager, vm->def, fd) < 0)
-        goto cleanup;
-
-    if (!(wrapperFd = virFileWrapperFdNew(&fd, path, wrapperFlags)))
-        goto cleanup;
-
     if (virQEMUSaveDataWrite(data, fd, path) < 0)
         goto cleanup;
-- 
2.35.3

Introduce support for QEMU's new mapped-ram stream format [1]. mapped-ram
is enabled by default if the underlying QEMU advertises the mapped-ram
migration capability. It can be disabled by changing the
'save_image_version' setting in qemu.conf to version '2'.

To use mapped-ram with QEMU:
- The 'mapped-ram' migration capability must be set to true
- The 'multifd' migration capability must be set to true and the
  'multifd-channels' migration parameter must be set to 1
- QEMU must be provided an fdset containing the migration fd
- The 'migrate' qmp command is invoked with a URI referencing the fdset
  and an offset where to start writing the data stream, e.g.

  {"execute":"migrate",
   "arguments":{"detach":true,"resume":false,
                "uri":"file:/dev/fdset/0,offset=0x11921"}}

The mapped-ram stream, in conjunction with direct IO and multifd support
provided by subsequent patches, can significantly improve the time
required to save VM memory state. The following tables compare mapped-ram
with the existing, sequential save stream. In all cases, the save and
restore operations are to/from a block device comprised of two NVMe disks
in RAID0 configuration with xfs (~8600MiB/s). The values in the 'save
time' and 'restore time' columns were scraped from the 'real' time
reported by time(1). The 'Size' and 'Blocks' columns were provided by
the corresponding outputs of stat(1).

VM: 32G RAM, 1 vcpu, idle (shortly after boot)

                       |   save  | restore |
                       |   time  |   time  |     Size     |  Blocks
-----------------------+---------+---------+--------------+--------
legacy                 |  6.193s |  4.399s |    985744812 | 1925288
-----------------------+---------+---------+--------------+--------
mapped-ram             |  5.109s |  1.176s |  34368554354 | 1774472
-----------------------+---------+---------+--------------+--------
legacy + direct IO     |  5.725s |  4.512s |    985765251 | 1925328
-----------------------+---------+---------+--------------+--------
mapped-ram + direct IO |  4.627s |  1.490s |  34368554354 | 1774304
-----------------------+---------+---------+--------------+--------
mapped-ram + direct IO |         |         |              |
  + multifd-channels=8 |  4.421s |  0.845s |  34368554318 | 1774312
-------------------------------------------------------------------

VM: 32G RAM, 30G dirty, 1 vcpu in tight loop dirtying memory

                       |   save  | restore |
                       |   time  |   time  |     Size     |  Blocks
-----------------------+---------+---------+--------------+---------
legacy                 | 25.800s | 14.332s |  33154309983 | 64754512
-----------------------+---------+---------+--------------+---------
mapped-ram             | 18.742s | 15.027s |  34368559228 | 64617160
-----------------------+---------+---------+--------------+---------
legacy + direct IO     | 13.115s | 18.050s |  33154310496 | 64754520
-----------------------+---------+---------+--------------+---------
mapped-ram + direct IO | 13.623s | 15.959s |  34368557392 | 64662040
-----------------------+---------+---------+--------------+---------
mapped-ram + direct IO |         |         |              |
  + multifd-channels=8 |  6.994s |  6.470s |  34368554980 | 64665776
--------------------------------------------------------------------

As can be seen from the tables, one caveat of mapped-ram is the logical
file size of a saved image is basically equivalent to the VM memory size.
Note however that mapped-ram typically uses fewer blocks on disk.

Another caveat of mapped-ram is the requirement for a seekable file
descriptor, which currently makes it incompatible with libvirt's support
for save image compression.

Also note the mapped-ram stream is incompatible with the existing stream
format, hence mapped-ram cannot be used to restore an image saved with
the existing format and vice versa.

[1] https://gitlab.com/qemu-project/qemu/-/blob/master/docs/devel/migration/mapp...

Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 src/qemu/qemu_driver.c    |  20 ++++--
 src/qemu/qemu_migration.c | 139 ++++++++++++++++++++++++++------------
 src/qemu/qemu_migration.h |   4 +-
 src/qemu/qemu_monitor.c   |  36 ++++++++++
 src/qemu/qemu_monitor.h   |   4 ++
 src/qemu/qemu_saveimage.c |  43 +++++++++---
 src/qemu/qemu_saveimage.h |   2 +
 src/qemu/qemu_snapshot.c  |   9 ++-
 8 files changed, 195 insertions(+), 62 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 6c0d3e097c..977763e5d8 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -2617,6 +2617,8 @@ qemuDomainSaveInternal(virQEMUDriver *driver,
     qemuDomainObjPrivate *priv = vm->privateData;
     virQEMUSaveData *data = NULL;
     g_autoptr(qemuDomainSaveCookie) cookie = NULL;
+    g_autoptr(qemuMigrationParams) saveParams = NULL;
+    bool mappedRam = false;
 
     if (virDomainObjBeginAsyncJob(vm, VIR_ASYNC_JOB_SAVE,
                                   VIR_DOMAIN_JOB_OPERATION_SAVE, flags) < 0)
@@ -2683,12 +2685,17 @@ qemuDomainSaveInternal(virQEMUDriver *driver,
     if (!(cookie = qemuDomainSaveCookieNew(vm)))
         goto endjob;
 
-    if (!(data = virQEMUSaveDataNew(driver, xml, cookie, was_running, compressed)))
+
+    if (!(data = virQEMUSaveDataNew(driver, vm, xml, cookie, was_running, compressed)))
         goto endjob;
     xml = NULL;
 
+    mappedRam = data->header.features & QEMU_SAVE_FEATURE_MAPPED_RAM;
+    if (!(saveParams = qemuMigrationParamsForSave(mappedRam)))
+        goto endjob;
+
     ret = qemuSaveImageCreate(driver, vm, path, data, compressor,
-                              flags, VIR_ASYNC_JOB_SAVE);
+                              saveParams, flags, VIR_ASYNC_JOB_SAVE);
     if (ret < 0)
         goto endjob;
 
@@ -3122,6 +3129,8 @@ doCoreDump(virQEMUDriver *driver,
                                       memory_dump_format) < 0)
             goto cleanup;
     } else {
+        g_autoptr(qemuMigrationParams) dump_params = NULL;
+
         if (dumpformat != VIR_DOMAIN_CORE_DUMP_FORMAT_RAW) {
             virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s",
                            _("kdump-compressed format is only supported with memory-only dump"));
@@ -3131,8 +3140,11 @@ doCoreDump(virQEMUDriver *driver,
         if (!qemuMigrationSrcIsAllowed(vm, false, VIR_ASYNC_JOB_DUMP, 0))
             goto cleanup;
 
-        if (qemuMigrationSrcToFile(driver, vm, fd, compressor,
-                                   VIR_ASYNC_JOB_DUMP) < 0)
+        if (!(dump_params = qemuMigrationParamsNew()))
+            goto cleanup;
+
+        if (qemuMigrationSrcToFile(driver, vm, &fd, compressor,
+                                   dump_params, dump_flags, VIR_ASYNC_JOB_DUMP) < 0)
             goto cleanup;
     }
 
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index c1bdaa273b..3ab5e2fa30 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -6944,46 +6944,17 @@ qemuMigrationProcessUnattended(virQEMUDriver *driver,
 }
 
 
-/* Helper function called while vm is active.
- */
-int
-qemuMigrationSrcToFile(virQEMUDriver *driver, virDomainObj *vm,
-                       int fd,
-                       virCommand *compressor,
-                       virDomainAsyncJob asyncJob)
+static int
+qemuMigrationSrcToLegacyFile(virQEMUDriver *driver,
+                             virDomainObj *vm,
+                             int fd,
+                             virCommand *compressor,
+                             virDomainAsyncJob asyncJob)
 {
     qemuDomainObjPrivate *priv = vm->privateData;
-    int rc;
     int ret = -1;
     int pipeFD[2] = { -1, -1 };
-    unsigned long saveMigBandwidth = priv->migMaxBandwidth;
     char *errbuf = NULL;
-    virErrorPtr orig_err = NULL;
-    g_autoptr(qemuMigrationParams) migParams = NULL;
-
-    if (qemuMigrationSetDBusVMState(driver, vm) < 0)
-        return -1;
-
-    /* Increase migration bandwidth to unlimited since target is a file.
-     * Failure to change migration speed is not fatal. */
-    if (!(migParams = qemuMigrationParamsForSave(false)))
-        return -1;
-
-    if (qemuMigrationParamsSetULL(migParams,
-                                  QEMU_MIGRATION_PARAM_MAX_BANDWIDTH,
-                                  QEMU_DOMAIN_MIG_BANDWIDTH_MAX * 1024 * 1024) < 0)
-        return -1;
-
-    if (qemuMigrationParamsApply(vm, asyncJob, migParams, 0) < 0)
-        return -1;
-
-    priv->migMaxBandwidth = QEMU_DOMAIN_MIG_BANDWIDTH_MAX;
-
-    if (!virDomainObjIsActive(vm)) {
-        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
-                       _("guest unexpectedly quit"));
-        /* nothing to tear down */
-        return -1;
-    }
 
     if (compressor && virPipe(pipeFD) < 0)
         return -1;
@@ -7000,7 +6971,7 @@ qemuMigrationSrcToFile(virQEMUDriver *driver, virDomainObj *vm,
         goto cleanup;
 
     if (!compressor) {
-        rc = qemuMonitorMigrateToFd(priv->mon, 0, fd);
+        ret = qemuMonitorMigrateToFd(priv->mon, 0, fd);
     } else {
         virCommandSetInputFD(compressor, pipeFD[0]);
         virCommandSetOutputFD(compressor, &fd);
@@ -7016,12 +6987,99 @@ qemuMigrationSrcToFile(virQEMUDriver *driver, virDomainObj *vm,
             qemuDomainObjExitMonitor(vm);
             goto cleanup;
         }
-        rc = qemuMonitorMigrateToFd(priv->mon, 0, pipeFD[1]);
+        ret = qemuMonitorMigrateToFd(priv->mon, 0, pipeFD[1]);
         if (VIR_CLOSE(pipeFD[0]) < 0 ||
             VIR_CLOSE(pipeFD[1]) < 0)
             VIR_WARN("failed to close intermediate pipe");
     }
     qemuDomainObjExitMonitor(vm);
+
+ cleanup:
+    VIR_FORCE_CLOSE(pipeFD[0]);
+    VIR_FORCE_CLOSE(pipeFD[1]);
+
+    if (errbuf) {
+        VIR_DEBUG("Compression binary stderr: %s", NULLSTR(errbuf));
+        VIR_FREE(errbuf);
+    }
+
+    return ret;
+}
+
+
+static int
+qemuMigrationSrcToMappedFile(virQEMUDriver *driver,
+                             virDomainObj *vm,
+                             int *fd,
+                             unsigned int flags,
+                             virDomainAsyncJob asyncJob)
+{
+    int ret;
+
+    /* mapped-ram does not support directIO */
+    if ((flags & VIR_DOMAIN_SAVE_BYPASS_CACHE)) {
+        virReportError(VIR_ERR_OPERATION_FAILED, "%s",
+                       _("bypass cache unsupported by this system"));
+        return -1;
+    }
+
+    if (qemuSecuritySetImageFDLabel(driver->securityManager, vm->def, *fd) < 0)
+        return -1;
+
+    if (qemuDomainObjEnterMonitorAsync(vm, asyncJob) < 0)
+        return -1;
+
+    ret = qemuMonitorMigrateToFdSet(vm, 0, fd);
+    qemuDomainObjExitMonitor(vm);
+    return ret;
+}
+
+
+/* Helper function called while vm is active. */
+int
+qemuMigrationSrcToFile(virQEMUDriver *driver, virDomainObj *vm,
+                       int *fd,
+                       virCommand *compressor,
+                       qemuMigrationParams *migParams,
+                       unsigned int flags,
+                       virDomainAsyncJob asyncJob)
+{
+    qemuDomainObjPrivate *priv = vm->privateData;
+    int rc;
+    int ret = -1;
+    unsigned long saveMigBandwidth = priv->migMaxBandwidth;
+    virErrorPtr orig_err = NULL;
+
+    if (qemuMigrationSetDBusVMState(driver, vm) < 0)
+        return -1;
+
+    /* Increase migration bandwidth to unlimited since target is a file.
+     * Failure to change migration speed is not fatal.
+     */
+    if (migParams &&
+        qemuMigrationParamsSetULL(migParams,
+                                  QEMU_MIGRATION_PARAM_MAX_BANDWIDTH,
+                                  QEMU_DOMAIN_MIG_BANDWIDTH_MAX * 1024 * 1024) < 0)
+        return -1;
+
+    if (qemuMigrationParamsApply(vm, asyncJob, migParams, 0) < 0)
+        return -1;
+
+    priv->migMaxBandwidth = QEMU_DOMAIN_MIG_BANDWIDTH_MAX;
+
+    if (!virDomainObjIsActive(vm)) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("guest unexpectedly quit"));
+        /* nothing to tear down */
+        return -1;
+    }
+
+    if (migParams &&
+        qemuMigrationParamsCapEnabled(migParams, QEMU_MIGRATION_CAP_MAPPED_RAM)) {
+        rc = qemuMigrationSrcToMappedFile(driver, vm, fd, flags, asyncJob);
+    } else {
+        rc = qemuMigrationSrcToLegacyFile(driver, vm, *fd, compressor, asyncJob);
+    }
+
     if (rc < 0)
         goto cleanup;
 
@@ -7057,13 +7115,6 @@ qemuMigrationSrcToFile(virQEMUDriver *driver, virDomainObj *vm,
         priv->migMaxBandwidth = saveMigBandwidth;
     }
 
-    VIR_FORCE_CLOSE(pipeFD[0]);
-    VIR_FORCE_CLOSE(pipeFD[1]);
-    if (errbuf) {
-        VIR_DEBUG("Compression binary stderr: %s", NULLSTR(errbuf));
-        VIR_FREE(errbuf);
-    }
-
     virErrorRestore(&orig_err);
 
     return ret;
diff --git a/src/qemu/qemu_migration.h b/src/qemu/qemu_migration.h
index ed62fd4a91..039a926e69 100644
--- a/src/qemu/qemu_migration.h
+++ b/src/qemu/qemu_migration.h
@@ -236,8 +236,10 @@ qemuMigrationSrcIsAllowed(virDomainObj *vm,
 int
 qemuMigrationSrcToFile(virQEMUDriver *driver,
                        virDomainObj *vm,
-                       int fd,
+                       int *fd,
                        virCommand *compressor,
+                       qemuMigrationParams *migParams,
+                       unsigned int flags,
                        virDomainAsyncJob asyncJob)
     ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT;
 
diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
index b1c0c6a064..9a454a1d08 100644
--- a/src/qemu/qemu_monitor.c
+++ b/src/qemu/qemu_monitor.c
@@ -2237,6 +2237,42 @@ qemuMonitorMigrateToFd(qemuMonitor *mon,
 }
 
 
+int
+qemuMonitorMigrateToFdSet(virDomainObj *vm,
+                          unsigned int flags,
+                          int *fd)
+{
+    qemuDomainObjPrivate *priv = vm->privateData;
+    qemuMonitor *mon = priv->mon;
+    off_t offset;
+    g_autoptr(qemuFDPass) fdPassMigrate = NULL;
+    unsigned int setId;
+    g_autofree char *uri = NULL;
+    int ret;
+
+    VIR_DEBUG("fd=%d flags=0x%x", *fd, flags);
+
+    QEMU_CHECK_MONITOR(mon);
+
+    if ((offset = lseek(*fd, 0, SEEK_CUR)) == -1) {
+        virReportSystemError(errno,
+                             "%s", _("failed to seek on file descriptor"));
+        return -1;
+    }
+
+    fdPassMigrate = qemuFDPassNew("migrate", priv);
+    qemuFDPassAddFD(fdPassMigrate, fd, "-buffered-fd");
+    if (qemuFDPassTransferMonitor(fdPassMigrate, mon) < 0)
+        return -1;
+
+    qemuFDPassGetId(fdPassMigrate, &setId);
+    uri = g_strdup_printf("file:/dev/fdset/%u,offset=%#lx", setId, offset);
+    ret = qemuMonitorJSONMigrate(mon, flags, uri);
+
+    return ret;
+}
+
+
 int
 qemuMonitorMigrateToHost(qemuMonitor *mon,
                          unsigned int flags,
diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h
index 76c859a888..ebacdf110e 100644
--- a/src/qemu/qemu_monitor.h
+++ b/src/qemu/qemu_monitor.h
@@ -844,6 +844,10 @@ int qemuMonitorMigrateToFd(qemuMonitor *mon,
                            unsigned int flags,
                            int fd);
 
+int qemuMonitorMigrateToFdSet(virDomainObj *vm,
+                              unsigned int flags,
+                              int *fd);
+
 int qemuMonitorMigrateToHost(qemuMonitor *mon,
                              unsigned int flags,
                              const char *protocol,
diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c
index 6f2ce40124..98a1ad638d 100644
--- a/src/qemu/qemu_saveimage.c
+++ b/src/qemu/qemu_saveimage.c
@@ -96,6 +96,7 @@ G_DEFINE_AUTOPTR_CLEANUP_FUNC(virQEMUSaveData, virQEMUSaveDataFree);
  */
 virQEMUSaveData *
 virQEMUSaveDataNew(virQEMUDriver *driver,
+                   virDomainObj *vm,
                    char *domXML,
                    qemuDomainSaveCookie *cookieObj,
                    bool running,
@@ -115,6 +116,19 @@ virQEMUSaveDataNew(virQEMUDriver *driver,
     header = &data->header;
     memcpy(header->magic, QEMU_SAVE_PARTIAL, sizeof(header->magic));
     header->version = cfg->saveImageVersion;
+
+    /* Enable mapped-ram feature if available and save version >= 3 */
+    if (header->version >= QEMU_SAVE_VERSION &&
+        qemuMigrationCapsGet(vm, QEMU_MIGRATION_CAP_MAPPED_RAM)) {
+        if (compressed != QEMU_SAVE_FORMAT_RAW) {
+            virReportError(VIR_ERR_OPERATION_FAILED,
+                           _("compression is not supported with save image version %1$u"),
+                           header->version);
+            goto error;
+        }
+        header->features |= QEMU_SAVE_FEATURE_MAPPED_RAM;
+    }
+
     header->was_running = running ? 1 : 0;
     header->compressed = compressed;
 
@@ -366,6 +380,7 @@ qemuSaveImageCreateFd(virQEMUDriver *driver,
                       virDomainObj *vm,
                       const char *path,
                       virFileWrapperFd *wrapperFd,
+                      bool mappedRam,
                       bool *needUnlink,
                       unsigned int flags)
 {
@@ -375,7 +390,7 @@ qemuSaveImageCreateFd(virQEMUDriver *driver,
     int directFlag = 0;
     unsigned int wrapperFlags = VIR_FILE_WRAPPER_NON_BLOCKING;
 
-    if (flags & VIR_DOMAIN_SAVE_BYPASS_CACHE) {
+    if (!mappedRam && flags & VIR_DOMAIN_SAVE_BYPASS_CACHE) {
         wrapperFlags |= VIR_FILE_WRAPPER_BYPASS_CACHE;
         directFlag = virFileDirectFdFlag();
         if (directFlag < 0) {
@@ -395,7 +410,7 @@ qemuSaveImageCreateFd(virQEMUDriver *driver,
     if (qemuSecuritySetImageFDLabel(driver->securityManager, vm->def, fd) < 0)
         return -1;
 
-    if (!(wrapperFd = virFileWrapperFdNew(&fd, path, wrapperFlags)))
+    if (!mappedRam && !(wrapperFd = virFileWrapperFdNew(&fd, path, wrapperFlags)))
        return -1;
 
     ret = fd;
@@ -414,6 +429,7 @@ qemuSaveImageCreate(virQEMUDriver *driver,
                     const char *path,
                     virQEMUSaveData *data,
                     virCommand *compressor,
+                    qemuMigrationParams *saveParams,
                     unsigned int flags,
                     virDomainAsyncJob asyncJob)
 {
@@ -422,9 +438,10 @@ qemuSaveImageCreate(virQEMUDriver *driver,
     int ret = -1;
     int fd = -1;
     virFileWrapperFd *wrapperFd = NULL;
+    bool mappedRam = data->header.features & QEMU_SAVE_FEATURE_MAPPED_RAM;
 
     /* Obtain the file handle. */
-    fd = qemuSaveImageCreateFd(driver, vm, path, wrapperFd, &needUnlink, flags);
+    fd = qemuSaveImageCreateFd(driver, vm, path, wrapperFd, mappedRam, &needUnlink, flags);
 
     if (fd < 0)
         goto cleanup;
@@ -433,7 +450,7 @@ qemuSaveImageCreate(virQEMUDriver *driver,
         goto cleanup;
 
     /* Perform the migration */
-    if (qemuMigrationSrcToFile(driver, vm, fd, compressor, asyncJob) < 0)
+    if (qemuMigrationSrcToFile(driver, vm, &fd, compressor, saveParams, flags, asyncJob) < 0)
         goto cleanup;
 
     /* Touch up file header to mark image complete. */
@@ -441,14 +458,18 @@ qemuSaveImageCreate(virQEMUDriver *driver,
     /* Reopen the file to touch up the header, since we aren't set
      * up to seek backwards on wrapperFd.  The reopened fd will
      * trigger a single page of file system cache pollution, but
-     * that's acceptable. */
-    if (VIR_CLOSE(fd) < 0) {
-        virReportSystemError(errno, _("unable to close %1$s"), path);
-        goto cleanup;
-    }
+     * that's acceptable.
+     * If using mapped-ram, the fd was passed to qemu, so no need
+     * to close it.
+     */
+    if (!mappedRam) {
+        if (VIR_CLOSE(fd) < 0) {
+            virReportSystemError(errno, _("unable to close %1$s"), path);
+            goto cleanup;
+        }
 
-    if (qemuDomainFileWrapperFDClose(vm, wrapperFd) < 0)
-        goto cleanup;
+        if (qemuDomainFileWrapperFDClose(vm, wrapperFd) < 0)
+            goto cleanup;
+    }
 
     if ((fd = qemuDomainOpenFile(cfg, vm->def, path, O_WRONLY, NULL)) < 0 ||
         virQEMUSaveDataFinish(data, &fd, path) < 0)
diff --git a/src/qemu/qemu_saveimage.h b/src/qemu/qemu_saveimage.h
index 3cee846f14..eb2835f11b 100644
--- a/src/qemu/qemu_saveimage.h
+++ b/src/qemu/qemu_saveimage.h
@@ -114,6 +114,7 @@ qemuSaveImageCreate(virQEMUDriver *driver,
                     const char *path,
                     virQEMUSaveData *data,
                     virCommand *compressor,
+                    qemuMigrationParams *saveParams,
                     unsigned int flags,
                     virDomainAsyncJob asyncJob);
 
@@ -124,6 +125,7 @@ virQEMUSaveDataWrite(virQEMUSaveData *data,
 
 virQEMUSaveData *
 virQEMUSaveDataNew(virQEMUDriver *driver,
+                   virDomainObj *vm,
                    char *domXML,
                    qemuDomainSaveCookie *cookieObj,
                    bool running,
diff --git a/src/qemu/qemu_snapshot.c b/src/qemu/qemu_snapshot.c
index 1d75208814..9833fb6206 100644
--- a/src/qemu/qemu_snapshot.c
+++ b/src/qemu/qemu_snapshot.c
@@ -1367,6 +1367,8 @@ qemuSnapshotCreateActiveExternal(virQEMUDriver *driver,
 
     /* do the memory snapshot if necessary */
     if (memory) {
+        g_autoptr(qemuMigrationParams) snap_params = NULL;
+
         /* check if migration is possible */
         if (!qemuMigrationSrcIsAllowed(vm, false, VIR_ASYNC_JOB_SNAPSHOT, 0))
             goto cleanup;
@@ -1390,7 +1392,7 @@ qemuSnapshotCreateActiveExternal(virQEMUDriver *driver,
             !(snapdef->cookie = (virObject *) qemuDomainSaveCookieNew(vm)))
             goto cleanup;
 
-        if (!(data = virQEMUSaveDataNew(driver, xml,
+        if (!(data = virQEMUSaveDataNew(driver, vm, xml,
                                         (qemuDomainSaveCookie *) snapdef->cookie,
                                         resume, compressed)))
             goto cleanup;
@@ -1398,8 +1400,11 @@ qemuSnapshotCreateActiveExternal(virQEMUDriver *driver,
 
     memory_existing = virFileExists(snapdef->memorysnapshotfile);
 
+    if (!(snap_params = qemuMigrationParamsNew()))
+        goto cleanup;
+
     if ((ret = qemuSaveImageCreate(driver, vm, snapdef->memorysnapshotfile,
-                                   data, compressor, 0,
+                                   data, compressor, snap_params, 0,
                                    VIR_ASYNC_JOB_SNAPSHOT)) < 0)
         goto cleanup;
 
-- 
2.35.3
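[For reference, the QMP conversation described in the commit message amounts to roughly the following; the fdset id, offset, and opaque string are illustrative values (the fd itself travels to QEMU over the monitor socket via SCM_RIGHTS alongside 'add-fd'), so the exact commands libvirt issues may differ:

  {"execute":"migrate-set-capabilities",
   "arguments":{"capabilities":[{"capability":"mapped-ram","state":true},
                                {"capability":"multifd","state":true}]}}
  {"execute":"migrate-set-parameters",
   "arguments":{"multifd-channels":1}}
  {"execute":"add-fd",
   "arguments":{"fdset-id":0,"opaque":"migrate-buffered-fd"}}
  {"execute":"migrate",
   "arguments":{"detach":true,"resume":false,
                "uri":"file:/dev/fdset/0,offset=0x11921"}}
]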

On Thu, Aug 08, 2024 at 05:38:03PM -0600, Jim Fehlig via Devel wrote:
Introduce support for QEMU's new mapped-ram stream format [1]. mapped-ram is enabled by default if the underlying QEMU advertises the mapped-ram migration capability. It can be disabled by changing the 'save_image_version' setting in qemu.conf to version '2'.
To use mapped-ram with QEMU:
- The 'mapped-ram' migration capability must be set to true
- The 'multifd' migration capability must be set to true and the
  'multifd-channels' migration parameter must be set to 1
- QEMU must be provided an fdset containing the migration fd
- The 'migrate' qmp command is invoked with a URI referencing the fdset
  and an offset where to start writing the data stream, e.g.
{"execute":"migrate", "arguments":{"detach":true,"resume":false, "uri":"file:/dev/fdset/0,offset=0x11921"}}
The mapped-ram stream, in conjunction with direct IO and multifd support provided by subsequent patches, can significantly improve the time required to save VM memory state. The following tables compare mapped-ram with the existing, sequential save stream. In all cases, the save and restore operations are to/from a block device comprised of two NVMe disks in RAID0 configuration with xfs (~8600MiB/s). The values in the 'save time' and 'restore time' columns were scraped from the 'real' time reported by time(1). The 'Size' and 'Blocks' columns were provided by the corresponding outputs of stat(1).
VM: 32G RAM, 1 vcpu, idle (shortly after boot)
                       |   save  | restore |
                       |   time  |   time  |     Size     |  Blocks
-----------------------+---------+---------+--------------+--------
legacy                 |  6.193s |  4.399s |    985744812 | 1925288
-----------------------+---------+---------+--------------+--------
mapped-ram             |  5.109s |  1.176s |  34368554354 | 1774472
I'm surprised by the restore time speed up, as I didn't think mapped-ram should make any perf difference without direct IO and multifd.
-----------------------+---------+---------+--------------+--------
legacy + direct IO     |  5.725s |  4.512s |    985765251 | 1925328
-----------------------+---------+---------+--------------+--------
mapped-ram + direct IO |  4.627s |  1.490s |  34368554354 | 1774304
Still somewhat surprised by the speed up on restore here too
-----------------------+---------+---------+--------------+--------
mapped-ram + direct IO |         |         |              |
  + multifd-channels=8 |  4.421s |  0.845s |  34368554318 | 1774312
-------------------------------------------------------------------
VM: 32G RAM, 30G dirty, 1 vcpu in tight loop dirtying memory
                       |   save  | restore |
                       |   time  |   time  |     Size     |  Blocks
-----------------------+---------+---------+--------------+---------
legacy                 | 25.800s | 14.332s |  33154309983 | 64754512
-----------------------+---------+---------+--------------+---------
mapped-ram             | 18.742s | 15.027s |  34368559228 | 64617160
-----------------------+---------+---------+--------------+---------
legacy + direct IO     | 13.115s | 18.050s |  33154310496 | 64754520
-----------------------+---------+---------+--------------+---------
mapped-ram + direct IO | 13.623s | 15.959s |  34368557392 | 64662040
These figures make more sense with restore time matching save time more or less.
-----------------------+---------+---------+--------------+---------
mapped-ram + direct IO |         |         |              |
  + multifd-channels=8 |  6.994s |  6.470s |  34368554980 | 64665776
--------------------------------------------------------------------
As can be seen from the tables, one caveat of mapped-ram is the logical file size of a saved image is basically equivalent to the VM memory size. Note however that mapped-ram typically uses fewer blocks on disk.
Another caveat of mapped-ram is the requirement for a seekable file descriptor, which currently makes it incompatible with libvirt's support for save image compression. Also note the mapped-ram stream is incompatible with the existing stream format, hence mapped-ram cannot be used to restore an image saved with the existing format and vice versa.
[1] https://gitlab.com/qemu-project/qemu/-/blob/master/docs/devel/migration/mapp...
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 src/qemu/qemu_driver.c    |  20 ++++--
 src/qemu/qemu_migration.c | 139 ++++++++++++++++++++++++++------------
 src/qemu/qemu_migration.h |   4 +-
 src/qemu/qemu_monitor.c   |  36 ++++++++++
 src/qemu/qemu_monitor.h   |   4 ++
 src/qemu/qemu_saveimage.c |  43 +++++++++---
 src/qemu/qemu_saveimage.h |   2 +
 src/qemu/qemu_snapshot.c  |   9 ++-
 8 files changed, 195 insertions(+), 62 deletions(-)
diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c
index 6f2ce40124..98a1ad638d 100644
--- a/src/qemu/qemu_saveimage.c
+++ b/src/qemu/qemu_saveimage.c
@@ -96,6 +96,7 @@ G_DEFINE_AUTOPTR_CLEANUP_FUNC(virQEMUSaveData, virQEMUSaveDataFree);
  */
 virQEMUSaveData *
 virQEMUSaveDataNew(virQEMUDriver *driver,
+                   virDomainObj *vm,
                    char *domXML,
                    qemuDomainSaveCookie *cookieObj,
                    bool running,
@@ -115,6 +116,19 @@ virQEMUSaveDataNew(virQEMUDriver *driver,
     header = &data->header;
     memcpy(header->magic, QEMU_SAVE_PARTIAL, sizeof(header->magic));
     header->version = cfg->saveImageVersion;
+
+    /* Enable mapped-ram feature if available and save version >= 3 */
+    if (header->version >= QEMU_SAVE_VERSION &&
+        qemuMigrationCapsGet(vm, QEMU_MIGRATION_CAP_MAPPED_RAM)) {
+        if (compressed != QEMU_SAVE_FORMAT_RAW) {
+            virReportError(VIR_ERR_OPERATION_FAILED,
+                           _("compression is not supported with save image version %1$u"),
+                           header->version);
+            goto error;
+        }
+        header->features |= QEMU_SAVE_FEATURE_MAPPED_RAM;
+    }
If the QEMU we're using doesn't have CAP_MAPPED_RAM, then I think we should NOT default to Version 3 save images, as that's creating a backcompat problem for zero user benefit.

This suggests that in qemu_conf.c, we should initialize the default value to '0', and then in this code, if we see version 0 we should pick either 2 or 3 depending on mapped ram.
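[In code form, the suggestion would look roughly like the following; this is a sketch only, and the exact placement of the check is up to the next revision:

    /* sketch: treat a configured save_image_version of 0 as
     * "choose automatically based on QEMU's capabilities" */
    if (cfg->saveImageVersion == 0) {
        if (qemuMigrationCapsGet(vm, QEMU_MIGRATION_CAP_MAPPED_RAM))
            header->version = 3;  /* mapped-ram capable QEMU */
        else
            header->version = 2;  /* older QEMU: keep the v2 format */
    } else {
        header->version = cfg->saveImageVersion;
    }
]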
+
     header->was_running = running ? 1 : 0;
     header->compressed = compressed;
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Daniel P. Berrangé <berrange@redhat.com> writes:
On Thu, Aug 08, 2024 at 05:38:03PM -0600, Jim Fehlig via Devel wrote:
Introduce support for QEMU's new mapped-ram stream format [1]. mapped-ram is enabled by default if the underlying QEMU advertises the mapped-ram migration capability. It can be disabled by changing the 'save_image_version' setting in qemu.conf to version '2'.
To use mapped-ram with QEMU:
- The 'mapped-ram' migration capability must be set to true
- The 'multifd' migration capability must be set to true and the
  'multifd-channels' migration parameter must be set to 1
- QEMU must be provided an fdset containing the migration fd
- The 'migrate' qmp command is invoked with a URI referencing the fdset
  and an offset where to start writing the data stream, e.g.
{"execute":"migrate", "arguments":{"detach":true,"resume":false, "uri":"file:/dev/fdset/0,offset=0x11921"}}
The mapped-ram stream, in conjunction with direct IO and multifd support provided by subsequent patches, can significantly improve the time required to save VM memory state. The following tables compare mapped-ram with the existing, sequential save stream. In all cases, the save and restore operations are to/from a block device comprised of two NVMe disks in RAID0 configuration with xfs (~8600MiB/s). The values in the 'save time' and 'restore time' columns were scraped from the 'real' time reported by time(1). The 'Size' and 'Blocks' columns were provided by the corresponding outputs of stat(1).
VM: 32G RAM, 1 vcpu, idle (shortly after boot)
                       |   save  | restore |
                       |   time  |   time  |     Size     |  Blocks
-----------------------+---------+---------+--------------+--------
legacy                 |  6.193s |  4.399s |    985744812 | 1925288
-----------------------+---------+---------+--------------+--------
mapped-ram             |  5.109s |  1.176s |  34368554354 | 1774472
I'm surprised by the restore time speed up, as I didn't think mapped-ram should make any perf difference without direct IO and multifd.
-----------------------+---------+---------+--------------+--------
legacy + direct IO     |  5.725s |  4.512s |    985765251 | 1925328
-----------------------+---------+---------+--------------+--------
mapped-ram + direct IO |  4.627s |  1.490s |  34368554354 | 1774304
Still somewhat surprised by the speed up on restore here too
Hmm, I'm thinking this might be caused by zero page handling. The non
mapped-ram path has an extra buffer_is_zero() and memset() of the hva
page.

Now, is it an issue that mapped-ram skips that memset? I assume guest
memory will always be clear at the start of migration. There won't be a
situation where the destination VM starts with memory already dirty...
*and* the save file is also different, otherwise it wouldn't make any
difference.
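[For readers following along: the legacy load path being referred to does roughly the following per zero-flagged page; this is a paraphrase of QEMU's ram.c, not the exact code:

    /* legacy stream: zero pages are flagged at save time and the
     * destination clears each one on load, touching every hva page */
    if (ch != 0 || !buffer_is_zero(host, TARGET_PAGE_SIZE))
        memset(host, ch, TARGET_PAGE_SIZE);

mapped-ram instead leaves pages absent from the file untouched, which is the memset being discussed.]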
-----------------------+---------+---------+--------------+--------
mapped-ram + direct IO |         |         |              |
  + multifd-channels=8 |  4.421s |  0.845s |  34368554318 | 1774312
-------------------------------------------------------------------
VM: 32G RAM, 30G dirty, 1 vcpu in tight loop dirtying memory
                       |   save  | restore |
                       |   time  |   time  |     Size     |  Blocks
-----------------------+---------+---------+--------------+---------
legacy                 | 25.800s | 14.332s |  33154309983 | 64754512
-----------------------+---------+---------+--------------+---------
mapped-ram             | 18.742s | 15.027s |  34368559228 | 64617160
-----------------------+---------+---------+--------------+---------
legacy + direct IO     | 13.115s | 18.050s |  33154310496 | 64754520
-----------------------+---------+---------+--------------+---------
mapped-ram + direct IO | 13.623s | 15.959s |  34368557392 | 64662040
These figures make more sense with restore time matching save time more or less.
-----------------------+---------+---------+--------------+---------
mapped-ram + direct IO |         |         |              |
  + multifd-channels=8 |  6.994s |  6.470s |  34368554980 | 64665776
--------------------------------------------------------------------
As can be seen from the tables, one caveat of mapped-ram is the logical file size of a saved image is basically equivalent to the VM memory size. Note however that mapped-ram typically uses fewer blocks on disk.
Another caveat of mapped-ram is the requirement for a seekable file descriptor, which currently makes it incompatible with libvirt's support for save image compression. Also note the mapped-ram stream is incompatible with the existing stream format, hence mapped-ram cannot be used to restore an image saved with the existing format and vice versa.
[1] https://gitlab.com/qemu-project/qemu/-/blob/master/docs/devel/migration/mapp...
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 src/qemu/qemu_driver.c    |  20 ++++--
 src/qemu/qemu_migration.c | 139 ++++++++++++++++++++++++++------------
 src/qemu/qemu_migration.h |   4 +-
 src/qemu/qemu_monitor.c   |  36 ++++++++++
 src/qemu/qemu_monitor.h   |   4 ++
 src/qemu/qemu_saveimage.c |  43 +++++++++---
 src/qemu/qemu_saveimage.h |   2 +
 src/qemu/qemu_snapshot.c  |   9 ++-
 8 files changed, 195 insertions(+), 62 deletions(-)
diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c
index 6f2ce40124..98a1ad638d 100644
--- a/src/qemu/qemu_saveimage.c
+++ b/src/qemu/qemu_saveimage.c
@@ -96,6 +96,7 @@ G_DEFINE_AUTOPTR_CLEANUP_FUNC(virQEMUSaveData, virQEMUSaveDataFree);
  */
 virQEMUSaveData *
 virQEMUSaveDataNew(virQEMUDriver *driver,
+                   virDomainObj *vm,
                    char *domXML,
                    qemuDomainSaveCookie *cookieObj,
                    bool running,
@@ -115,6 +116,19 @@ virQEMUSaveDataNew(virQEMUDriver *driver,
     header = &data->header;
     memcpy(header->magic, QEMU_SAVE_PARTIAL, sizeof(header->magic));
     header->version = cfg->saveImageVersion;
+
+    /* Enable mapped-ram feature if available and save version >= 3 */
+    if (header->version >= QEMU_SAVE_VERSION &&
+        qemuMigrationCapsGet(vm, QEMU_MIGRATION_CAP_MAPPED_RAM)) {
+        if (compressed != QEMU_SAVE_FORMAT_RAW) {
+            virReportError(VIR_ERR_OPERATION_FAILED,
+                           _("compression is not supported with save image version %1$u"),
+                           header->version);
+            goto error;
+        }
+        header->features |= QEMU_SAVE_FEATURE_MAPPED_RAM;
+    }
If the QEMU we're using doesn't have CAP_MAPPED_RAM, then I think we should NOT default to Version 3 save images, as that's creating a backcompat problem for zero user benefit.
This suggests that in qemu_conf.c, we should initialize the default value to '0', and then in this code, if we see version 0 we should pick either 2 or 3 depending on mapped ram.
+
     header->was_running = running ? 1 : 0;
     header->compressed = compressed;
With regards, Daniel

On Thu, Oct 10, 2024 at 12:06:51PM -0300, Fabiano Rosas wrote:
Daniel P. Berrangé <berrange@redhat.com> writes:
On Thu, Aug 08, 2024 at 05:38:03PM -0600, Jim Fehlig via Devel wrote:
Introduce support for QEMU's new mapped-ram stream format [1]. mapped-ram is enabled by default if the underlying QEMU advertises the mapped-ram migration capability. It can be disabled by changing the 'save_image_version' setting in qemu.conf to version '2'.
To use mapped-ram with QEMU:
- The 'mapped-ram' migration capability must be set to true
- The 'multifd' migration capability must be set to true and the
  'multifd-channels' migration parameter must be set to 1
- QEMU must be provided an fdset containing the migration fd
- The 'migrate' qmp command is invoked with a URI referencing the fdset
  and an offset where to start writing the data stream, e.g.
{"execute":"migrate", "arguments":{"detach":true,"resume":false, "uri":"file:/dev/fdset/0,offset=0x11921"}}
The mapped-ram stream, in conjunction with direct IO and multifd support provided by subsequent patches, can significantly improve the time required to save VM memory state. The following tables compare mapped-ram with the existing, sequential save stream. In all cases, the save and restore operations are to/from a block device comprised of two NVMe disks in RAID0 configuration with xfs (~8600MiB/s). The values in the 'save time' and 'restore time' columns were scraped from the 'real' time reported by time(1). The 'Size' and 'Blocks' columns were provided by the corresponding outputs of stat(1).
VM: 32G RAM, 1 vcpu, idle (shortly after boot)
                       |   save  | restore |
                       |   time  |   time  |     Size     |  Blocks
-----------------------+---------+---------+--------------+--------
legacy                 |  6.193s |  4.399s |    985744812 | 1925288
-----------------------+---------+---------+--------------+--------
mapped-ram             |  5.109s |  1.176s |  34368554354 | 1774472
I'm surprised by the restore time speed up, as I didn't think mapped-ram should make any perf difference without direct IO and multifd.
-----------------------+---------+---------+--------------+--------
legacy + direct IO     |  5.725s |  4.512s |    985765251 | 1925328
-----------------------+---------+---------+--------------+--------
mapped-ram + direct IO |  4.627s |  1.490s |  34368554354 | 1774304
Still somewhat surprised by the speed up on restore here too
Hmm, I'm thinking this might be caused by zero page handling. The non mapped-ram path has an extra buffer_is_zero() and memset() of the hva page.
Now, is it an issue that mapped-ram skips that memset? I assume guest memory will always be clear at the start of migration. There won't be a situation where the destination VM starts with memory already dirty... *and* the save file is also different, otherwise it wouldn't make any difference.
Consider the snapshot use case. You're running the VM, so memory has
arbitrary contents, now you restore to a saved snapshot. QEMU remains
running this whole time and you can't assume initial memory is zeroed.
Surely we need the memset?

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Daniel P. Berrangé <berrange@redhat.com> writes:
On Thu, Oct 10, 2024 at 12:06:51PM -0300, Fabiano Rosas wrote:
Daniel P. Berrangé <berrange@redhat.com> writes:
On Thu, Aug 08, 2024 at 05:38:03PM -0600, Jim Fehlig via Devel wrote:
Introduce support for QEMU's new mapped-ram stream format [1]. mapped-ram is enabled by default if the underlying QEMU advertises the mapped-ram migration capability. It can be disabled by changing the 'save_image_version' setting in qemu.conf to version '2'.
To use mapped-ram with QEMU:
- The 'mapped-ram' migration capability must be set to true
- The 'multifd' migration capability must be set to true and the
  'multifd-channels' migration parameter must be set to 1
- QEMU must be provided an fdset containing the migration fd
- The 'migrate' qmp command is invoked with a URI referencing the fdset
  and an offset where to start writing the data stream, e.g.
{"execute":"migrate", "arguments":{"detach":true,"resume":false, "uri":"file:/dev/fdset/0,offset=0x11921"}}
The mapped-ram stream, in conjunction with direct IO and multifd support provided by subsequent patches, can significantly improve the time required to save VM memory state. The following tables compare mapped-ram with the existing, sequential save stream. In all cases, the save and restore operations are to/from a block device comprised of two NVMe disks in RAID0 configuration with xfs (~8600MiB/s). The values in the 'save time' and 'restore time' columns were scraped from the 'real' time reported by time(1). The 'Size' and 'Blocks' columns were provided by the corresponding outputs of stat(1).
VM: 32G RAM, 1 vcpu, idle (shortly after boot)
                       |   save  | restore |
                       |   time  |   time  |     Size     |  Blocks
-----------------------+---------+---------+--------------+--------
legacy                 |  6.193s |  4.399s |    985744812 | 1925288
-----------------------+---------+---------+--------------+--------
mapped-ram             |  5.109s |  1.176s |  34368554354 | 1774472
I'm surprised by the restore time speed up, as I didn't think mapped-ram should make any perf difference without direct IO and multifd.
-----------------------+---------+---------+--------------+--------
legacy + direct IO     |  5.725s |  4.512s |    985765251 | 1925328
-----------------------+---------+---------+--------------+--------
mapped-ram + direct IO |  4.627s |  1.490s |  34368554354 | 1774304
Still somewhat surprised by the speed up on restore here too
Hmm, I'm thinking this might be caused by zero page handling. The non mapped-ram path has an extra buffer_is_zero() and memset() of the hva page.
Now, is it an issue that mapped-ram skips that memset? I assume guest memory will always be clear at the start of migration. There won't be a situation where the destination VM starts with memory already dirty... *and* the save file is also different, otherwise it wouldn't make any difference.
Consider the snapshot use case. You're running the VM, so memory has arbitrary contents, now you restore to a saved snapshot. QEMU remains running this whole time and you can't assume initial memory is zeroed. Surely we need the memset?
Hmm, I probably have a big gap in my knowledge here, but savevm doesn't hook into file migration, so there's no way to load a snapshot with mapped-ram that I know of. Is this something that libvirt enables somehow? There would be no -incoming on the cmdline.
With regards, Daniel

On Thu, Oct 10, 2024 at 02:52:56PM -0300, Fabiano Rosas wrote:
Daniel P. Berrangé <berrange@redhat.com> writes:
On Thu, Oct 10, 2024 at 12:06:51PM -0300, Fabiano Rosas wrote:
Daniel P. Berrangé <berrange@redhat.com> writes:
On Thu, Aug 08, 2024 at 05:38:03PM -0600, Jim Fehlig via Devel wrote:
Introduce support for QEMU's new mapped-ram stream format [1]. mapped-ram is enabled by default if the underlying QEMU advertises the mapped-ram migration capability. It can be disabled by changing the 'save_image_version' setting in qemu.conf to version '2'.
To use mapped-ram with QEMU:
- The 'mapped-ram' migration capability must be set to true
- The 'multifd' migration capability must be set to true and the
  'multifd-channels' migration parameter must be set to 1
- QEMU must be provided an fdset containing the migration fd
- The 'migrate' qmp command is invoked with a URI referencing the fdset
  and an offset where to start writing the data stream, e.g.
{"execute":"migrate", "arguments":{"detach":true,"resume":false, "uri":"file:/dev/fdset/0,offset=0x11921"}}
The mapped-ram stream, in conjunction with direct IO and multifd support provided by subsequent patches, can significantly improve the time required to save VM memory state. The following tables compare mapped-ram with the existing, sequential save stream. In all cases, the save and restore operations are to/from a block device comprised of two NVMe disks in RAID0 configuration with xfs (~8600MiB/s). The values in the 'save time' and 'restore time' columns were scraped from the 'real' time reported by time(1). The 'Size' and 'Blocks' columns were provided by the corresponding outputs of stat(1).
VM: 32G RAM, 1 vcpu, idle (shortly after boot)
                       |   save  | restore |
                       |   time  |   time  |     Size     |  Blocks
-----------------------+---------+---------+--------------+--------
legacy                 |  6.193s |  4.399s |    985744812 | 1925288
-----------------------+---------+---------+--------------+--------
mapped-ram             |  5.109s |  1.176s |  34368554354 | 1774472
I'm surprised by the restore time speed up, as I didn't think mapped-ram should make any perf difference without direct IO and multifd.
-----------------------+---------+---------+--------------+--------
legacy + direct IO     |  5.725s |  4.512s |    985765251 | 1925328
-----------------------+---------+---------+--------------+--------
mapped-ram + direct IO |  4.627s |  1.490s |  34368554354 | 1774304
Still somewhat surprised by the speed up on restore here too
Hmm, I'm thinking this might be caused by zero page handling. The non mapped-ram path has an extra buffer_is_zero() and memset() of the hva page.
Now, is it an issue that mapped-ram skips that memset? I assume guest memory will always be clear at the start of migration. There won't be a situation where the destination VM starts with memory already dirty... *and* the save file is also different, otherwise it wouldn't make any difference.
Consider the snapshot use case. You're running the VM, so memory has arbitrary contents, now you restore to a saved snapshot. QEMU remains running this whole time and you can't assume initial memory is zeroed. Surely we need the memset?
Hmm, I probably have a big gap in my knowledge here, but savevm doesn't hook into file migration, so there's no way to load a snapshot with mapped-ram that I know of. Is this something that libvirt enables somehow? There would be no -incoming on the cmdline.
Oops, yes, I always forget savevm is off in its own little world.

Upstream we've talked about making savevm be a facade around the
'migrate' command, but no one has ever made a PoC.

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Daniel P. Berrangé <berrange@redhat.com> writes:
On Thu, Oct 10, 2024 at 02:52:56PM -0300, Fabiano Rosas wrote:
Daniel P. Berrangé <berrange@redhat.com> writes:
On Thu, Oct 10, 2024 at 12:06:51PM -0300, Fabiano Rosas wrote:
Daniel P. Berrangé <berrange@redhat.com> writes:
On Thu, Aug 08, 2024 at 05:38:03PM -0600, Jim Fehlig via Devel wrote:
Introduce support for QEMU's new mapped-ram stream format [1]. mapped-ram is enabled by default if the underlying QEMU advertises the mapped-ram migration capability. It can be disabled by changing the 'save_image_version' setting in qemu.conf to version '2'.
To use mapped-ram with QEMU:
- The 'mapped-ram' migration capability must be set to true
- The 'multifd' migration capability must be set to true and the
  'multifd-channels' migration parameter must be set to 1
- QEMU must be provided an fdset containing the migration fd
- The 'migrate' qmp command is invoked with a URI referencing the fdset
  and an offset where to start writing the data stream, e.g.
{"execute":"migrate", "arguments":{"detach":true,"resume":false, "uri":"file:/dev/fdset/0,offset=0x11921"}}
The mapped-ram stream, in conjunction with direct IO and multifd support provided by subsequent patches, can significantly improve the time required to save VM memory state. The following tables compare mapped-ram with the existing, sequential save stream. In all cases, the save and restore operations are to/from a block device comprised of two NVMe disks in RAID0 configuration with xfs (~8600MiB/s). The values in the 'save time' and 'restore time' columns were scraped from the 'real' time reported by time(1). The 'Size' and 'Blocks' columns were provided by the corresponding outputs of stat(1).
VM: 32G RAM, 1 vcpu, idle (shortly after boot)
                       |   save  | restore |
                       |   time  |   time  |     Size     |  Blocks
-----------------------+---------+---------+--------------+--------
legacy                 |  6.193s |  4.399s |    985744812 | 1925288
-----------------------+---------+---------+--------------+--------
mapped-ram             |  5.109s |  1.176s |  34368554354 | 1774472
I'm surprised by the restore time speed up, as I didn't think mapped-ram should make any perf difference without direct IO and multifd.
-----------------------+---------+---------+--------------+--------
legacy + direct IO     |  5.725s |  4.512s |    985765251 | 1925328
-----------------------+---------+---------+--------------+--------
mapped-ram + direct IO |  4.627s |  1.490s |  34368554354 | 1774304
Still somewhat surprised by the speed up on restore here too
Hmm, I'm thinking this might be caused by zero page handling. The non mapped-ram path has an extra buffer_is_zero() and memset() of the hva page.
Now, is it an issue that mapped-ram skips that memset? I assume guest memory will always be clear at the start of migration. There won't be a situation where the destination VM starts with memory already dirty... *and* the save file is also different, otherwise it wouldn't make any difference.
Consider the snapshot use case. You're running the VM, so memory has arbitrary contents, now you restore to a saved snapshot. QEMU remains running this whole time and you can't assume initial memory is zeroed. Surely we need the memset?
Hmm, I probably have a big gap in my knowledge here, but savevm doesn't hook into file migration, so there's no way to load a snapshot with mapped-ram that I know of. Is this something that libvirt enables somehow? There would be no -incoming on the cmdline.
Oops, yes, I always forget savevm is off in its own little world.
Upstream we've talked about making savevm be a facade around the 'migrate' command, but no one has ever made a PoC.
Yeah, that would be nice. Once I learn how the data ends up in the qcow2 image, maybe I can look into adding a new 'snapshot' migration mode to QEMU.
With regards, Daniel

Split the reading of libvirt's save image metadata from the opening of
the fd that will be passed to QEMU. This provides flexibility for an
upcoming patch adding mapped-ram support for restore.

Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 src/qemu/qemu_driver.c    | 37 ++++++++--------
 src/qemu/qemu_saveimage.c | 89 ++++++++++++++++++++++++---------------
 src/qemu/qemu_saveimage.h | 16 ++++---
 src/qemu/qemu_snapshot.c  |  9 ++--
 4 files changed, 89 insertions(+), 62 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 977763e5d8..87d75b6baa 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -5783,9 +5783,12 @@ qemuDomainRestoreInternal(virConnectPtr conn,
     if (flags & VIR_DOMAIN_SAVE_RESET_NVRAM)
         reset_nvram = true;
 
-    fd = qemuSaveImageOpen(driver, NULL, path, &def, &data,
+    if (qemuSaveImageGetMetadata(driver, NULL, path, &def, &data, false) < 0)
+        goto cleanup;
+
+    fd = qemuSaveImageOpen(driver, path,
                            (flags & VIR_DOMAIN_SAVE_BYPASS_CACHE) != 0,
-                           &wrapperFd, false, false);
+                           &wrapperFd, false);
     if (fd < 0)
         goto cleanup;
 
@@ -5914,15 +5917,11 @@ qemuDomainSaveImageGetXMLDesc(virConnectPtr conn, const char *path,
     virQEMUDriver *driver = conn->privateData;
     char *ret = NULL;
     g_autoptr(virDomainDef) def = NULL;
-    int fd = -1;
     virQEMUSaveData *data = NULL;
 
     virCheckFlags(VIR_DOMAIN_SAVE_IMAGE_XML_SECURE, NULL);
 
-    fd = qemuSaveImageOpen(driver, NULL, path, &def, &data,
-                           false, NULL, false, false);
-
-    if (fd < 0)
+    if (qemuSaveImageGetMetadata(driver, NULL, path, &def, &data, false) < 0)
         goto cleanup;
 
     if (virDomainSaveImageGetXMLDescEnsureACL(conn, def) < 0)
@@ -5932,7 +5931,6 @@ qemuDomainSaveImageGetXMLDesc(virConnectPtr conn, const char *path,
 
 cleanup:
     virQEMUSaveDataFree(data);
-    VIR_FORCE_CLOSE(fd);
     return ret;
 }
 
@@ -5956,8 +5954,10 @@ qemuDomainSaveImageDefineXML(virConnectPtr conn, const char *path,
     else if (flags & VIR_DOMAIN_SAVE_PAUSED)
         state = 0;
 
-    fd = qemuSaveImageOpen(driver, NULL, path, &def, &data,
-                           false, NULL, true, false);
+    if (qemuSaveImageGetMetadata(driver, NULL, path, &def, &data, false) < 0)
+        goto cleanup;
+
+    fd = qemuSaveImageOpen(driver, path, 0, NULL, false);
     if (fd < 0)
         goto cleanup;
 
@@ -6015,7 +6015,6 @@ qemuDomainManagedSaveGetXMLDesc(virDomainPtr dom, unsigned int flags)
     g_autofree char *path = NULL;
     char *ret = NULL;
     g_autoptr(virDomainDef) def = NULL;
-    int fd = -1;
     virQEMUSaveData *data = NULL;
     qemuDomainObjPrivate *priv;
 
@@ -6037,15 +6036,14 @@ qemuDomainManagedSaveGetXMLDesc(virDomainPtr dom, unsigned int flags)
         goto cleanup;
     }
 
-    if ((fd = qemuSaveImageOpen(driver, priv->qemuCaps, path, &def, &data,
-                                false, NULL, false, false)) < 0)
+    if (qemuSaveImageGetMetadata(driver, priv->qemuCaps, path,
+                                 &def, &data, false) < 0)
        goto cleanup;
 
     ret = qemuDomainDefFormatXML(driver, priv->qemuCaps, def, flags);
 
 cleanup:
     virQEMUSaveDataFree(data);
-    VIR_FORCE_CLOSE(fd);
     virDomainObjEndAPI(&vm);
     return ret;
 }
@@ -6101,14 +6099,17 @@ qemuDomainObjRestore(virConnectPtr conn,
     virQEMUSaveData *data = NULL;
     virFileWrapperFd *wrapperFd = NULL;
 
-    fd = qemuSaveImageOpen(driver, NULL, path, &def, &data,
-                           bypass_cache, &wrapperFd, false, true);
-    if (fd < 0) {
-        if (fd == -3)
+    ret = qemuSaveImageGetMetadata(driver, NULL, path, &def, &data, true);
+    if (ret < 0) {
+        if (ret == -3)
             ret = 1;
         goto cleanup;
     }
 
+    fd = qemuSaveImageOpen(driver, path, bypass_cache, &wrapperFd, false);
+    if (fd < 0)
+        goto cleanup;
+
     if (virHookPresent(VIR_HOOK_DRIVER_QEMU)) {
         int hookret;
 
diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c
index 98a1ad638d..125064ab66 100644
--- a/src/qemu/qemu_saveimage.c
+++ b/src/qemu/qemu_saveimage.c
@@ -570,58 +570,35 @@ qemuSaveImageGetCompressionProgram(const char *imageFormat,
 
 
 /**
- * qemuSaveImageOpen:
+ * qemuSaveImageGetMetadata:
  * @driver: qemu driver data
  * @qemuCaps: pointer to qemuCaps if the domain is running or NULL
  * @path: path of the save image
  * @ret_def: returns domain definition created from the XML stored in the image
 * @ret_data: returns structure filled with data from the image header
- * @bypass_cache: bypass cache when opening the file
- * @wrapperFd: returns the file wrapper structure
- * @open_write: open the file for writing (for updates)
 * @unlink_corrupt: remove the image file if it is corrupted
 *
- * Returns the opened fd of the save image file and fills the appropriate fields
- * on success. On error returns -1 on most failures, -3 if corrupt image was
- * unlinked (no error raised).
+ * Open the save image file, read libvirt's save image metadata, and populate
+ * the @ret_def and @ret_data structures. Returns 0 on success and -1 on most
+ * failures. Returns -3 if corrupt image was unlinked (no error raised).
 */
 int
-qemuSaveImageOpen(virQEMUDriver *driver,
-                  virQEMUCaps *qemuCaps,
-                  const char *path,
-                  virDomainDef **ret_def,
-                  virQEMUSaveData **ret_data,
-                  bool bypass_cache,
-                  virFileWrapperFd **wrapperFd,
-                  bool open_write,
-                  bool unlink_corrupt)
+qemuSaveImageGetMetadata(virQEMUDriver *driver,
+                         virQEMUCaps *qemuCaps,
+                         const char *path,
+                         virDomainDef **ret_def,
+                         virQEMUSaveData **ret_data,
+                         bool unlink_corrupt)
 {
     g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver);
     VIR_AUTOCLOSE fd = -1;
-    int ret = -1;
     g_autoptr(virQEMUSaveData) data = NULL;
     virQEMUSaveHeader *header;
     g_autoptr(virDomainDef) def = NULL;
-    int oflags = open_write ? O_RDWR : O_RDONLY;
     size_t xml_len;
     size_t cookie_len;
 
-    if (bypass_cache) {
-        int directFlag = virFileDirectFdFlag();
-        if (directFlag < 0) {
-            virReportError(VIR_ERR_OPERATION_FAILED, "%s",
-                           _("bypass cache unsupported by this system"));
-            return -1;
-        }
-        oflags |= directFlag;
-    }
-
-    if ((fd = qemuDomainOpenFile(cfg, NULL, path, oflags, NULL)) < 0)
-        return -1;
-
-    if (bypass_cache &&
-        !(*wrapperFd = virFileWrapperFdNew(&fd, path,
-                                           VIR_FILE_WRAPPER_BYPASS_CACHE)))
+    if ((fd = qemuDomainOpenFile(cfg, NULL, path, O_RDONLY, NULL)) < 0)
         return -1;
 
     data = g_new0(virQEMUSaveData, 1);
@@ -725,6 +702,50 @@ qemuSaveImageOpen(virQEMUDriver *driver,
     *ret_def = g_steal_pointer(&def);
     *ret_data = g_steal_pointer(&data);
 
+    return 0;
+}
+
+
+/**
+ * qemuSaveImageOpen:
+ * @driver: qemu driver data
+ * @path: path of the save image
+ * @bypass_cache: bypass cache when opening the file
+ * @wrapperFd: returns the file wrapper structure
+ * @open_write: open the file for writing (for updates)
+ *
+ * Returns the opened fd of the save image file on success, -1 on failure.
+ */
+int
+qemuSaveImageOpen(virQEMUDriver *driver,
+                  const char *path,
+                  bool bypass_cache,
+                  virFileWrapperFd **wrapperFd,
+                  bool open_write)
+{
+    g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver);
+    VIR_AUTOCLOSE fd = -1;
+    int ret = -1;
+    int oflags = open_write ? O_RDWR : O_RDONLY;
+
+    if (bypass_cache) {
+        int directFlag = virFileDirectFdFlag();
+        if (directFlag < 0) {
+            virReportError(VIR_ERR_OPERATION_FAILED, "%s",
+                           _("bypass cache unsupported by this system"));
+            return -1;
+        }
+        oflags |= directFlag;
+    }
+
+    if ((fd = qemuDomainOpenFile(cfg, NULL, path, oflags, NULL)) < 0)
+        return -1;
+
+    if (bypass_cache &&
+        !(*wrapperFd = virFileWrapperFdNew(&fd, path,
+                                           VIR_FILE_WRAPPER_BYPASS_CACHE)))
+        return -1;
+
     ret = fd;
     fd = -1;
 
diff --git a/src/qemu/qemu_saveimage.h b/src/qemu/qemu_saveimage.h
index eb2835f11b..e101fdba6e 100644
--- a/src/qemu/qemu_saveimage.h
+++ b/src/qemu/qemu_saveimage.h
@@ -74,17 +74,21 @@ qemuSaveImageStartVM(virConnectPtr conn,
                      virDomainAsyncJob asyncJob)
     ATTRIBUTE_NONNULL(4) ATTRIBUTE_NONNULL(5) ATTRIBUTE_NONNULL(6);
 
+int
+qemuSaveImageGetMetadata(virQEMUDriver *driver,
+                         virQEMUCaps *qemuCaps,
+                         const char *path,
+                         virDomainDef **ret_def,
+                         virQEMUSaveData **ret_data,
+                         bool unlink_corrupt)
+    ATTRIBUTE_NONNULL(4) ATTRIBUTE_NONNULL(5);
+
 int
 qemuSaveImageOpen(virQEMUDriver *driver,
-                  virQEMUCaps *qemuCaps,
                   const char *path,
-                  virDomainDef **ret_def,
-                  virQEMUSaveData **ret_data,
                   bool bypass_cache,
                   virFileWrapperFd **wrapperFd,
-                  bool open_write,
-                  bool unlink_corrupt)
-    ATTRIBUTE_NONNULL(3) ATTRIBUTE_NONNULL(4);
+                  bool open_write);
 
 int
 qemuSaveImageGetCompressionProgram(const char *imageFormat,
diff --git a/src/qemu/qemu_snapshot.c b/src/qemu/qemu_snapshot.c
index 9833fb6206..7d05ce76f4 100644
--- a/src/qemu/qemu_snapshot.c
+++ b/src/qemu/qemu_snapshot.c
@@ -2118,11 +2118,12 @@ qemuSnapshotRevertExternalPrepare(virDomainObj *vm,
     g_autoptr(virDomainDef) savedef = NULL;
 
     memdata->path = snapdef->memorysnapshotfile;
-    memdata->fd = qemuSaveImageOpen(driver, NULL, memdata->path,
-                                    &savedef, &memdata->data,
-                                    false, NULL,
-                                    false, false);
+    if (qemuSaveImageGetMetadata(driver, NULL, memdata->path, &savedef,
+                                 &memdata->data, false) < 0)
+        return -1;
 
+    memdata->fd = qemuSaveImageOpen(driver, memdata->path,
+                                    false, NULL, false);
     if (memdata->fd < 0)
         return -1;
 
-- 
2.35.3
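[To make the motivation concrete: with the metadata available before the fd is opened, a later restore path can choose open flags per format. A hypothetical caller sketch, with the mappedRam handling assumed rather than taken from this patch:

    if (qemuSaveImageGetMetadata(driver, NULL, path, &def, &data, false) < 0)
        goto cleanup;

    /* hypothetical: mapped-ram images manage their own I/O via the fdset
     * handed to QEMU, so only the legacy stream wants the wrapper/O_DIRECT */
    mappedRam = data->header.features & QEMU_SAVE_FEATURE_MAPPED_RAM;
    fd = qemuSaveImageOpen(driver, path, bypass_cache && !mappedRam,
                           &wrapperFd, false);
]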

qemuProcessStartWithMemoryState() is the only caller of qemuProcessStart() that uses the qemuProcessIncomingDef struct. Move creation of the struct to qemuProcessStartWithMemoryState(). Signed-off-by: Jim Fehlig <jfehlig@suse.com> --- src/qemu/qemu_process.c | 44 ++++++++++++++++++++--------------------- src/qemu/qemu_process.h | 2 +- 2 files changed, 22 insertions(+), 24 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index cec739c984..7eeb85062c 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -8191,7 +8191,7 @@ qemuProcessStart(virConnectPtr conn, virDomainObj *vm, virCPUDef *updatedCPU, virDomainAsyncJob asyncJob, - const char *migrateFrom, + qemuProcessIncomingDef *incoming, int migrateFd, const char *migratePath, virDomainMomentObj *snapshot, @@ -8199,7 +8199,6 @@ qemuProcessStart(virConnectPtr conn, unsigned int flags) { qemuDomainObjPrivate *priv = vm->privateData; - qemuProcessIncomingDef *incoming = NULL; unsigned int stopFlags; bool relabel = false; bool relabelSavedState = false; @@ -8207,11 +8206,11 @@ qemuProcessStart(virConnectPtr conn, int rv; VIR_DEBUG("conn=%p driver=%p vm=%p name=%s id=%d asyncJob=%s " - "migrateFrom=%s migrateFd=%d migratePath=%s " + "incoming=%p migrateFd=%d migratePath=%s " "snapshot=%p vmop=%d flags=0x%x", conn, driver, vm, vm->def->name, vm->def->id, virDomainAsyncJobTypeToString(asyncJob), - NULLSTR(migrateFrom), migrateFd, NULLSTR(migratePath), + incoming, migrateFd, NULLSTR(migratePath), snapshot, vmop, flags); virCheckFlagsGoto(VIR_QEMU_PROCESS_START_COLD | @@ -8220,20 +8219,13 @@ qemuProcessStart(virConnectPtr conn, VIR_QEMU_PROCESS_START_GEN_VMID | VIR_QEMU_PROCESS_START_RESET_NVRAM, cleanup); - if (!migrateFrom && !snapshot) + if (!incoming && !snapshot) flags |= VIR_QEMU_PROCESS_START_NEW; if (qemuProcessInit(driver, vm, updatedCPU, - asyncJob, !!migrateFrom, flags) < 0) + asyncJob, !!incoming, flags) < 0) goto cleanup; - if (migrateFrom) { - incoming = qemuProcessIncomingDefNew(priv->qemuCaps, NULL, migrateFrom, - migrateFd, migratePath); - if (!incoming) - goto stop; - } - if (qemuProcessPrepareDomain(driver, vm, flags) < 0) goto stop; @@ -8286,14 +8278,13 @@ qemuProcessStart(virConnectPtr conn, qemuSecurityRestoreSavedStateLabel(driver->securityManager, vm->def, migratePath) < 0) VIR_WARN("failed to restore save state label on %s", migratePath); - qemuProcessIncomingDefFree(incoming); return ret; stop: stopFlags = 0; if (!relabel) stopFlags |= VIR_QEMU_PROCESS_STOP_NO_RELABEL; - if (migrateFrom) + if (incoming) stopFlags |= VIR_QEMU_PROCESS_STOP_MIGRATED; if (priv->mon) qemuMonitorSetDomainLog(priv->mon, NULL, NULL, NULL); @@ -8347,20 +8338,24 @@ qemuProcessStartWithMemoryState(virConnectPtr conn, VIR_AUTOCLOSE intermediatefd = -1; g_autoptr(virCommand) cmd = NULL; g_autofree char *errbuf = NULL; - const char *migrateFrom = NULL; + qemuProcessIncomingDef *incoming = NULL; int rc = 0; + int ret = -1; + + incoming = qemuProcessIncomingDefNew(priv->qemuCaps, NULL, "stdio", + *fd, path); + if (!incoming) + return -1; if (data) { if (virSaveCookieParseString(data->cookie, (virObject **)&cookie, virDomainXMLOptionGetSaveCookie(driver->xmlopt)) < 0) - return -1; + goto cleanup; if (qemuSaveImageDecompressionStart(data, fd, &intermediatefd, &errbuf, &cmd) < 0) { - return -1; + goto cleanup; } - - migrateFrom = "stdio"; } /* No cookie means libvirt which saved the domain was too old to mess up @@ -8373,7 +8368,7 @@ qemuProcessStartWithMemoryState(virConnectPtr conn, priv->disableSlirp = true; if 
(qemuProcessStart(conn, driver, vm, cookie ? cookie->cpu : NULL, - asyncJob, migrateFrom, *fd, path, snapshot, + asyncJob, incoming, *fd, path, snapshot, VIR_NETDEV_VPORT_PROFILE_OP_RESTORE, start_flags) == 0) *started = true; @@ -8385,14 +8380,17 @@ qemuProcessStartWithMemoryState(virConnectPtr conn, virDomainAuditStart(vm, reason, *started); if (!*started || rc < 0) - return -1; + goto cleanup; /* qemuProcessStart doesn't unset the qemu error reporting infrastructure * in case of migration (which is used in this case) so we need to reset it * so that the handle to virtlogd is not held open unnecessarily */ qemuMonitorSetDomainLog(qemuDomainGetMonitor(vm), NULL, NULL, NULL); + ret = 0; - return 0; + cleanup: + qemuProcessIncomingDefFree(incoming); + return ret; } diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h index 2324aeb7bd..e48d53dc46 100644 --- a/src/qemu/qemu_process.h +++ b/src/qemu/qemu_process.h @@ -84,7 +84,7 @@ int qemuProcessStart(virConnectPtr conn, virDomainObj *vm, virCPUDef *updatedCPU, virDomainAsyncJob asyncJob, - const char *migrateFrom, + qemuProcessIncomingDef *incoming, int stdin_fd, const char *stdin_path, virDomainMomentObj *snapshot, -- 2.35.3
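Condensed from the diff above, the lifecycle the struct now follows inside qemuProcessStartWithMemoryState (a sketch with error paths trimmed; priv, fd, path, snapshot and start_flags are in scope as in the patch):

qemuProcessIncomingDef *incoming = NULL;
int ret = -1;

/* Build the incoming-migration definition up front... */
if (!(incoming = qemuProcessIncomingDefNew(priv->qemuCaps, NULL, "stdio",
                                           *fd, path)))
    return -1;

/* ...hand it to qemuProcessStart instead of a raw migrateFrom string... */
if (qemuProcessStart(conn, driver, vm, NULL, asyncJob, incoming,
                     *fd, path, snapshot,
                     VIR_NETDEV_VPORT_PROFILE_OP_RESTORE, start_flags) < 0)
    goto cleanup;

ret = 0;
 cleanup:
    /* ...and free it on every exit path. */
    qemuProcessIncomingDefFree(incoming);
    return ret;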

Similar to qemuMigrationSrcRun, apply migration parameters in qemuMigrationDstRun. This allows callers to create customized migration parameters, but delegates their application to the function performing the migration. Signed-off-by: Jim Fehlig <jfehlig@suse.com> --- src/qemu/qemu_migration.c | 15 +++++++++------ src/qemu/qemu_migration.h | 4 +++- src/qemu/qemu_process.c | 2 +- 3 files changed, 13 insertions(+), 8 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 3ab5e2fa30..fd897b4ed1 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -2285,13 +2285,19 @@ qemuMigrationDstGetURI(const char *migrateFrom, int qemuMigrationDstRun(virDomainObj *vm, const char *uri, - virDomainAsyncJob asyncJob) + virDomainAsyncJob asyncJob, + qemuMigrationParams *migParams, + unsigned int flags) { qemuDomainObjPrivate *priv = vm->privateData; int rv; VIR_DEBUG("Setting up incoming migration with URI %s", uri); + if (migParams && qemuMigrationParamsApply(vm, asyncJob, + migParams, flags) < 0) + return -1; + if (qemuDomainObjEnterMonitorAsync(vm, asyncJob) < 0) return -1; @@ -3207,10 +3213,6 @@ qemuMigrationDstPrepareActive(virQEMUDriver *driver, goto error; } - if (qemuMigrationParamsApply(vm, VIR_ASYNC_JOB_MIGRATION_IN, - migParams, flags) < 0) - goto error; - if (mig->nbd && flags & (VIR_MIGRATE_NON_SHARED_DISK | VIR_MIGRATE_NON_SHARED_INC)) { const char *nbdTLSAlias = NULL; @@ -3242,7 +3244,8 @@ qemuMigrationDstPrepareActive(virQEMUDriver *driver, } if (qemuMigrationDstRun(vm, incoming->uri, - VIR_ASYNC_JOB_MIGRATION_IN) < 0) + VIR_ASYNC_JOB_MIGRATION_IN, + migParams, flags) < 0) goto error; if (qemuProcessFinishStartup(driver, vm, VIR_ASYNC_JOB_MIGRATION_IN, diff --git a/src/qemu/qemu_migration.h b/src/qemu/qemu_migration.h index 039a926e69..5529f2ee21 100644 --- a/src/qemu/qemu_migration.h +++ b/src/qemu/qemu_migration.h @@ -281,7 +281,9 @@ qemuMigrationDstGetURI(const char *migrateFrom, int qemuMigrationDstRun(virDomainObj *vm, const char *uri, - virDomainAsyncJob asyncJob); + virDomainAsyncJob asyncJob, + qemuMigrationParams *migParams, + unsigned int flags); void qemuMigrationSrcPostcopyFailed(virDomainObj *vm); diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 7eeb85062c..1d7f214212 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -8248,7 +8248,7 @@ qemuProcessStart(virConnectPtr conn, relabel = true; if (incoming) { - if (qemuMigrationDstRun(vm, incoming->uri, asyncJob) < 0) + if (qemuMigrationDstRun(vm, incoming->uri, asyncJob, NULL, 0) < 0) goto stop; } else { /* Refresh state of devices from QEMU. During migration this happens -- 2.35.3
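A sketch of the calling convention this enables (hypothetical caller; incoming->uri and flags as used in the hunks above, error handling trimmed):

g_autoptr(qemuMigrationParams) migParams = NULL;

if (!(migParams = qemuMigrationParamsNew()))
    return -1;

/* ...caller-specific capabilities and parameters set on migParams... */

/* qemuMigrationDstRun now applies the parameters itself before issuing
 * migrate-incoming; passing NULL for migParams skips application. */
if (qemuMigrationDstRun(vm, incoming->uri, VIR_ASYNC_JOB_MIGRATION_IN,
                        migParams, flags) < 0)
    return -1;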

Add support for the mapped-ram migration capability on restore. Using mapped-ram with QEMU to restore an image requires the same steps as saving: - The 'mapped-ram' migration capability must be set to true - The 'multifd' migration capability must be set to true and the 'multifd-channels' migration parameter must be set to a value >= 1 - QEMU must be provided an fdset containing the migration fd(s) - The 'migrate-incoming' qmp command is invoked with a URI referencing the fdset and an offset where to start reading the data stream, e.g. {"execute":"migrate-incoming", "arguments":{"uri":"file:/dev/fdset/0,offset=0x119eb"}} Signed-off-by: Jim Fehlig <jfehlig@suse.com> --- src/qemu/qemu_driver.c | 26 +++++++++++++----- src/qemu/qemu_migration.c | 11 ++++---- src/qemu/qemu_process.c | 58 +++++++++++++++++++++++++++++++-------- src/qemu/qemu_process.h | 15 ++++++---- src/qemu/qemu_saveimage.c | 9 ++++-- src/qemu/qemu_saveimage.h | 2 ++ src/qemu/qemu_snapshot.c | 8 +++--- 7 files changed, 92 insertions(+), 37 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 87d75b6baa..6d0f52951c 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -1611,7 +1611,7 @@ static virDomainPtr qemuDomainCreateXML(virConnectPtr conn, } if (qemuProcessStart(conn, driver, vm, NULL, VIR_ASYNC_JOB_START, - NULL, -1, NULL, NULL, + NULL, -1, NULL, NULL, NULL, VIR_NETDEV_VPORT_PROFILE_OP_CREATE, start_flags) < 0) { virDomainAuditStart(vm, "booted", false); @@ -5774,6 +5774,8 @@ qemuDomainRestoreInternal(virConnectPtr conn, virFileWrapperFd *wrapperFd = NULL; bool hook_taint = false; bool reset_nvram = false; + bool mapped_ram = false; + g_autoptr(qemuMigrationParams) restoreParams = NULL; virCheckFlags(VIR_DOMAIN_SAVE_BYPASS_CACHE | VIR_DOMAIN_SAVE_RUNNING | @@ -5786,9 +5788,13 @@ qemuDomainRestoreInternal(virConnectPtr conn, if (qemuSaveImageGetMetadata(driver, NULL, path, &def, &data, false) < 0) goto cleanup; + mapped_ram = data->header.features & QEMU_SAVE_FEATURE_MAPPED_RAM; + if (!(restoreParams = qemuMigrationParamsForSave(mapped_ram))) + return -1; + fd = qemuSaveImageOpen(driver, path, (flags & VIR_DOMAIN_SAVE_BYPASS_CACHE) != 0, - &wrapperFd, false); + mapped_ram, &wrapperFd, false); if (fd < 0) goto cleanup; @@ -5842,7 +5848,7 @@ qemuDomainRestoreInternal(virConnectPtr conn, if (qemuProcessBeginJob(vm, VIR_DOMAIN_JOB_OPERATION_RESTORE, flags) < 0) goto cleanup; - ret = qemuSaveImageStartVM(conn, driver, vm, &fd, data, path, + ret = qemuSaveImageStartVM(conn, driver, vm, &fd, data, path, restoreParams, false, reset_nvram, VIR_ASYNC_JOB_START); qemuProcessEndJob(vm); @@ -5957,7 +5963,7 @@ qemuDomainSaveImageDefineXML(virConnectPtr conn, const char *path, if (qemuSaveImageGetMetadata(driver, NULL, path, &def, &data, false) < 0) goto cleanup; - fd = qemuSaveImageOpen(driver, path, 0, NULL, false); + fd = qemuSaveImageOpen(driver, path, 0, false, NULL, false); if (fd < 0) goto cleanup; @@ -6098,6 +6104,8 @@ qemuDomainObjRestore(virConnectPtr conn, g_autofree char *xmlout = NULL; virQEMUSaveData *data = NULL; virFileWrapperFd *wrapperFd = NULL; + bool mapped_ram = false; + g_autoptr(qemuMigrationParams) restoreParams = NULL; ret = qemuSaveImageGetMetadata(driver, NULL, path, &def, &data, true); if (ret < 0) { @@ -6106,7 +6114,11 @@ qemuDomainObjRestore(virConnectPtr conn, goto cleanup; } - fd = qemuSaveImageOpen(driver, path, bypass_cache, &wrapperFd, false); + mapped_ram = data->header.features & QEMU_SAVE_FEATURE_MAPPED_RAM; + if (!(restoreParams =
qemuMigrationParamsForSave(mapped_ram))) + return -1; + + fd = qemuSaveImageOpen(driver, path, bypass_cache, mapped_ram, &wrapperFd, false); if (fd < 0) goto cleanup; @@ -6148,7 +6160,7 @@ qemuDomainObjRestore(virConnectPtr conn, virDomainObjAssignDef(vm, &def, true, NULL); - ret = qemuSaveImageStartVM(conn, driver, vm, &fd, data, path, + ret = qemuSaveImageStartVM(conn, driver, vm, &fd, data, path, restoreParams, start_paused, reset_nvram, asyncJob); cleanup: @@ -6349,7 +6361,7 @@ qemuDomainObjStart(virConnectPtr conn, } ret = qemuProcessStart(conn, driver, vm, NULL, asyncJob, - NULL, -1, NULL, NULL, + NULL, -1, NULL, NULL, NULL, VIR_NETDEV_VPORT_PROFILE_OP_CREATE, start_flags); virDomainAuditStart(vm, "booted", ret >= 0); if (ret >= 0) { diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index fd897b4ed1..35d3e26908 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -2953,9 +2953,8 @@ qemuMigrationDstPrepare(virDomainObj *vm, const char *protocol, const char *listenAddress, unsigned short port, - int fd) + int *fd) { - qemuDomainObjPrivate *priv = vm->privateData; g_autofree char *migrateFrom = NULL; if (tunnel) { @@ -3009,8 +3008,8 @@ qemuMigrationDstPrepare(virDomainObj *vm, migrateFrom = g_strdup_printf(incFormat, protocol, listenAddress, port); } - return qemuProcessIncomingDefNew(priv->qemuCaps, listenAddress, - migrateFrom, fd, NULL); + return qemuProcessIncomingDefNew(vm, listenAddress, + migrateFrom, fd, NULL, NULL); } @@ -3154,7 +3153,7 @@ qemuMigrationDstPrepareActive(virQEMUDriver *driver, if (!(incoming = qemuMigrationDstPrepare(vm, tunnel, protocol, listenAddress, port, - dataFD[0]))) + &dataFD[0]))) goto error; if (qemuProcessPrepareDomain(driver, vm, startFlags) < 0) @@ -3524,7 +3523,7 @@ qemuMigrationDstPrepareResume(virQEMUDriver *driver, priv->origname = g_strdup(origname); if (!(incoming = qemuMigrationDstPrepare(vm, false, protocol, - listenAddress, port, -1))) + listenAddress, port, NULL))) goto cleanup; if (qemuDomainObjEnterMonitorAsync(vm, VIR_ASYNC_JOB_MIGRATION_IN) < 0) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 1d7f214212..b02cd84aff 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -4680,6 +4680,7 @@ qemuProcessIncomingDefFree(qemuProcessIncomingDef *inc) g_free(inc->address); g_free(inc->uri); + qemuFDPassFree(inc->fdPassMigrate); g_free(inc); } @@ -4693,26 +4694,54 @@ qemuProcessIncomingDefFree(qemuProcessIncomingDef *inc) * qemuProcessIncomingDefFree will NOT close it. 
*/ qemuProcessIncomingDef * -qemuProcessIncomingDefNew(virQEMUCaps *qemuCaps, +qemuProcessIncomingDefNew(virDomainObj *vm, const char *listenAddress, const char *migrateFrom, - int fd, - const char *path) + int *fd, + const char *path, + virQEMUSaveData *data) { + qemuDomainObjPrivate *priv = vm->privateData; qemuProcessIncomingDef *inc = NULL; - if (qemuMigrationDstCheckProtocol(qemuCaps, migrateFrom) < 0) + if (qemuMigrationDstCheckProtocol(priv->qemuCaps, migrateFrom) < 0) return NULL; inc = g_new0(qemuProcessIncomingDef, 1); inc->address = g_strdup(listenAddress); - inc->uri = qemuMigrationDstGetURI(migrateFrom, fd); + if (data) { + size_t offset = sizeof(virQEMUSaveHeader) + data->header.data_len; + + if (data->header.features & QEMU_SAVE_FEATURE_MAPPED_RAM) { + unsigned int fdsetId; + + inc->fdPassMigrate = qemuFDPassNew("migrate", priv); + qemuFDPassAddFD(inc->fdPassMigrate, fd, "-buffered-fd"); + qemuFDPassGetId(inc->fdPassMigrate, &fdsetId); + inc->uri = g_strdup_printf("file:/dev/fdset/%u,offset=%#lx", fdsetId, offset); + } else { + g_autofree char *tmp = g_new(char, offset); + + /* HACK: + * We provide qemu the offset in case of mapped-ram, but must set + * the file offset for the legacy save format. Unfortunately we + * can't lseek when the fd is wrapped by virFileWrapperFd, so + * we do a needless read instead. + */ + if (saferead(*fd, tmp, offset) != offset) + VIR_DEBUG("failed to read from save file"); + inc->uri = qemuMigrationDstGetURI(migrateFrom, *fd); + } + } else { + inc->uri = qemuMigrationDstGetURI(migrateFrom, *fd); + } + if (!inc->uri) goto error; - inc->fd = fd; + inc->fd = *fd; inc->path = path; return inc; @@ -7782,8 +7811,11 @@ qemuProcessLaunch(virConnectPtr conn, &nnicindexes, &nicindexes))) goto cleanup; - if (incoming && incoming->fd != -1) - virCommandPassFD(cmd, incoming->fd, 0); + if (incoming) { + if (incoming->fd != -1) + virCommandPassFD(cmd, incoming->fd, 0); + qemuFDPassTransferCommand(incoming->fdPassMigrate, cmd); + } /* now that we know it is about to start call the hook if present */ if (qemuProcessStartHook(driver, vm, @@ -8195,6 +8227,7 @@ qemuProcessStart(virConnectPtr conn, int migrateFd, const char *migratePath, virDomainMomentObj *snapshot, + qemuMigrationParams *migParams, virNetDevVPortProfileOp vmop, unsigned int flags) { @@ -8248,7 +8281,7 @@ qemuProcessStart(virConnectPtr conn, relabel = true; if (incoming) { - if (qemuMigrationDstRun(vm, incoming->uri, asyncJob, NULL, 0) < 0) + if (qemuMigrationDstRun(vm, incoming->uri, asyncJob, migParams, 0) < 0) goto stop; } else { /* Refresh state of devices from QEMU. 
During migration this happens @@ -8302,6 +8335,7 @@ qemuProcessStart(virConnectPtr conn, * @path: path to memory state file * @snapshot: internal snapshot to load when starting QEMU process or NULL * @data: data from memory state file or NULL + * @migParams: Migration params to use on restore or NULL * @asyncJob: type of asynchronous job * @start_flags: flags to start QEMU process with * @reason: audit log reason @@ -8328,6 +8362,7 @@ qemuProcessStartWithMemoryState(virConnectPtr conn, const char *path, virDomainMomentObj *snapshot, virQEMUSaveData *data, + qemuMigrationParams *migParams, virDomainAsyncJob asyncJob, unsigned int start_flags, const char *reason, @@ -8342,8 +8377,7 @@ qemuProcessStartWithMemoryState(virConnectPtr conn, int rc = 0; int ret = -1; - incoming = qemuProcessIncomingDefNew(priv->qemuCaps, NULL, "stdio", - *fd, path); + incoming = qemuProcessIncomingDefNew(vm, NULL, "stdio", fd, path, data); if (!incoming) return -1; @@ -8369,7 +8403,7 @@ qemuProcessStartWithMemoryState(virConnectPtr conn, if (qemuProcessStart(conn, driver, vm, cookie ? cookie->cpu : NULL, asyncJob, incoming, *fd, path, snapshot, - VIR_NETDEV_VPORT_PROFILE_OP_RESTORE, + migParams, VIR_NETDEV_VPORT_PROFILE_OP_RESTORE, start_flags) == 0) *started = true; diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h index e48d53dc46..93699da8bd 100644 --- a/src/qemu/qemu_process.h +++ b/src/qemu/qemu_process.h @@ -54,14 +54,17 @@ struct _qemuProcessIncomingDef { char *address; /* address where QEMU is supposed to listen */ char *uri; /* used when calling migrate-incoming QMP command */ int fd; /* for fd:N URI */ + qemuFDPass *fdPassMigrate; /* for file:/dev/fdset/n,offset=x URI */ const char *path; /* path associated with fd */ }; -qemuProcessIncomingDef *qemuProcessIncomingDefNew(virQEMUCaps *qemuCaps, - const char *listenAddress, - const char *migrateFrom, - int fd, - const char *path); +qemuProcessIncomingDef *qemuProcessIncomingDefNew(virDomainObj *vm, + const char *listenAddress, + const char *migrateFrom, + int *fd, + const char *path, + virQEMUSaveData *data); + void qemuProcessIncomingDefFree(qemuProcessIncomingDef *inc); int qemuProcessBeginJob(virDomainObj *vm, @@ -88,6 +91,7 @@ int qemuProcessStart(virConnectPtr conn, int stdin_fd, const char *stdin_path, virDomainMomentObj *snapshot, + qemuMigrationParams *migParams, virNetDevVPortProfileOp vmop, unsigned int flags); @@ -98,6 +102,7 @@ int qemuProcessStartWithMemoryState(virConnectPtr conn, const char *path, virDomainMomentObj *snapshot, virQEMUSaveData *data, + qemuMigrationParams *migParams, virDomainAsyncJob asyncJob, unsigned int start_flags, const char *reason, diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c index 125064ab66..b99e0de1ff 100644 --- a/src/qemu/qemu_saveimage.c +++ b/src/qemu/qemu_saveimage.c @@ -711,6 +711,7 @@ qemuSaveImageGetMetadata(virQEMUDriver *driver, * @driver: qemu driver data * @path: path of the save image * @bypass_cache: bypass cache when opening the file + * @mapped_ram: Image contains mapped-ram save format * @wrapperFd: returns the file wrapper structure * @open_write: open the file for writing (for updates) * @@ -720,6 +721,7 @@ int qemuSaveImageOpen(virQEMUDriver *driver, const char *path, bool bypass_cache, + bool mapped_ram, virFileWrapperFd **wrapperFd, bool open_write) { @@ -741,7 +743,7 @@ qemuSaveImageOpen(virQEMUDriver *driver, if ((fd = qemuDomainOpenFile(cfg, NULL, path, oflags, NULL)) < 0) return -1; - if (bypass_cache && + if (!mapped_ram && bypass_cache && !(*wrapperFd = 
virFileWrapperFdNew(&fd, path, VIR_FILE_WRAPPER_BYPASS_CACHE))) return -1; @@ -760,6 +762,7 @@ qemuSaveImageStartVM(virConnectPtr conn, int *fd, virQEMUSaveData *data, const char *path, + qemuMigrationParams *restoreParams, bool start_paused, bool reset_nvram, virDomainAsyncJob asyncJob) @@ -776,8 +779,8 @@ qemuSaveImageStartVM(virConnectPtr conn, start_flags |= VIR_QEMU_PROCESS_START_RESET_NVRAM; if (qemuProcessStartWithMemoryState(conn, driver, vm, fd, path, NULL, data, - asyncJob, start_flags, "restored", - &started) < 0) { + restoreParams, asyncJob, start_flags, + "restored", &started) < 0) { goto cleanup; } diff --git a/src/qemu/qemu_saveimage.h b/src/qemu/qemu_saveimage.h index e101fdba6e..2007784e88 100644 --- a/src/qemu/qemu_saveimage.h +++ b/src/qemu/qemu_saveimage.h @@ -69,6 +69,7 @@ qemuSaveImageStartVM(virConnectPtr conn, int *fd, virQEMUSaveData *data, const char *path, + qemuMigrationParams *restoreParams, bool start_paused, bool reset_nvram, virDomainAsyncJob asyncJob) @@ -87,6 +88,7 @@ int qemuSaveImageOpen(virQEMUDriver *driver, const char *path, bool bypass_cache, + bool mapped_ram, virFileWrapperFd **wrapperFd, bool open_write); diff --git a/src/qemu/qemu_snapshot.c b/src/qemu/qemu_snapshot.c index 7d05ce76f4..79c20d48f4 100644 --- a/src/qemu/qemu_snapshot.c +++ b/src/qemu/qemu_snapshot.c @@ -2123,7 +2123,7 @@ qemuSnapshotRevertExternalPrepare(virDomainObj *vm, return -1; memdata->fd = qemuSaveImageOpen(driver, memdata->path, - false, NULL, false); + false, false, NULL, false); if (memdata->fd < 0) return -1; @@ -2366,7 +2366,7 @@ qemuSnapshotRevertActive(virDomainObj *vm, if (qemuProcessStartWithMemoryState(snapshot->domain->conn, driver, vm, &memdata.fd, memdata.path, loadSnap, - memdata.data, VIR_ASYNC_JOB_SNAPSHOT, + memdata.data, NULL, VIR_ASYNC_JOB_SNAPSHOT, start_flags, "from-snapshot", &started) < 0) { if (started) { @@ -2520,7 +2520,7 @@ qemuSnapshotRevertInactive(virDomainObj *vm, rc = qemuProcessStart(snapshot->domain->conn, driver, vm, NULL, VIR_ASYNC_JOB_SNAPSHOT, NULL, -1, NULL, NULL, - VIR_NETDEV_VPORT_PROFILE_OP_CREATE, + NULL, VIR_NETDEV_VPORT_PROFILE_OP_CREATE, start_flags); virDomainAuditStart(vm, "from-snapshot", rc >= 0); if (rc < 0) { @@ -2997,7 +2997,7 @@ qemuSnapshotDeleteExternalPrepare(virDomainObj *vm, if (!virDomainObjIsActive(vm)) { if (qemuProcessStart(NULL, driver, vm, NULL, VIR_ASYNC_JOB_SNAPSHOT, - NULL, -1, NULL, NULL, + NULL, -1, NULL, NULL, NULL, VIR_NETDEV_VPORT_PROFILE_OP_CREATE, VIR_QEMU_PROCESS_START_PAUSED) < 0) { return -1; -- 2.35.3
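The heart of the restore side is the URI handed to 'migrate-incoming'. Re-flowed from the qemuProcessIncomingDefNew hunk above for readability (a sketch, not a verbatim excerpt):

/* Offset of the QEMU migration stream within the save image: the fixed
 * libvirt header plus the variable-length XML/cookie data. */
size_t offset = sizeof(virQEMUSaveHeader) + data->header.data_len;
unsigned int fdsetId;

inc->fdPassMigrate = qemuFDPassNew("migrate", priv);
qemuFDPassAddFD(inc->fdPassMigrate, fd, "-buffered-fd");
qemuFDPassGetId(inc->fdPassMigrate, &fdsetId);

/* Yields e.g. "file:/dev/fdset/0,offset=0x119eb", matching the QMP
 * example in the commit message. */
inc->uri = g_strdup_printf("file:/dev/fdset/%u,offset=%#lx", fdsetId, offset);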

When using the mapped-ram migration capability, direct IO is enabled by setting the "direct-io" migration parameter to "true" and passing QEMU an additional fd with O_DIRECT set. Signed-off-by: Jim Fehlig <jfehlig@suse.com> --- src/qemu/qemu_driver.c | 9 +++++---- src/qemu/qemu_migration.c | 32 ++++++++++++++++++++++++++------ src/qemu/qemu_migration.h | 1 + src/qemu/qemu_migration_params.c | 11 ++++++++++- src/qemu/qemu_migration_params.h | 3 ++- src/qemu/qemu_monitor.c | 7 +++++-- src/qemu/qemu_monitor.h | 3 ++- src/qemu/qemu_saveimage.c | 2 +- 8 files changed, 52 insertions(+), 16 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 6d0f52951c..0025bad6e7 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -2691,7 +2691,7 @@ qemuDomainSaveInternal(virQEMUDriver *driver, xml = NULL; mappedRam = data->header.features & QEMU_SAVE_FEATURE_MAPPED_RAM; - if (!(saveParams = qemuMigrationParamsForSave(mappedRam))) + if (!(saveParams = qemuMigrationParamsForSave(mappedRam, flags))) goto endjob; ret = qemuSaveImageCreate(driver, vm, path, data, compressor, @@ -3143,7 +3143,7 @@ doCoreDump(virQEMUDriver *driver, if (!(dump_params = qemuMigrationParamsNew())) goto cleanup; - if (qemuMigrationSrcToFile(driver, vm, &fd, compressor, + if (qemuMigrationSrcToFile(driver, vm, path, &fd, compressor, dump_params, dump_flags, VIR_ASYNC_JOB_DUMP) < 0) goto cleanup; } @@ -5789,7 +5789,7 @@ qemuDomainRestoreInternal(virConnectPtr conn, goto cleanup; mapped_ram = data->header.features & QEMU_SAVE_FEATURE_MAPPED_RAM; - if (!(restoreParams = qemuMigrationParamsForSave(mapped_ram))) + if (!(restoreParams = qemuMigrationParamsForSave(mapped_ram, flags))) return -1; fd = qemuSaveImageOpen(driver, path, @@ -6115,7 +6115,8 @@ qemuDomainObjRestore(virConnectPtr conn, } mapped_ram = data->header.features & QEMU_SAVE_FEATURE_MAPPED_RAM; - if (!(restoreParams = qemuMigrationParamsForSave(mapped_ram))) + if (!(restoreParams = qemuMigrationParamsForSave(mapped_ram, + bypass_cache ? VIR_DOMAIN_SAVE_BYPASS_CACHE : 0))) + return -1; fd = qemuSaveImageOpen(driver, path, bypass_cache, mapped_ram, &wrapperFd, false); diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 35d3e26908..daa42bcfe4 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -7012,17 +7012,36 @@ qemuMigrationSrcToLegacyFile(virQEMUDriver *driver, static int qemuMigrationSrcToMappedFile(virQEMUDriver *driver, virDomainObj *vm, + const char *path, int *fd, unsigned int flags, virDomainAsyncJob asyncJob) { + g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver); + VIR_AUTOCLOSE directFd = -1; + int directFlag = 0; + bool needUnlink = false; int ret; - /* mapped-ram does not support directIO */ + /* When using directio with mapped-ram, qemu needs two fds. One with + * O_DIRECT set for writing the memory, and another without it set for + * writing small bits of unaligned state.
*/ if ((flags & VIR_DOMAIN_SAVE_BYPASS_CACHE)) { - virReportError(VIR_ERR_OPERATION_FAILED, "%s", - _("bypass cache unsupported by this system")); - return -1; + directFlag = virFileDirectFdFlag(); + if (directFlag < 0) { + virReportError(VIR_ERR_OPERATION_FAILED, "%s", + _("bypass cache unsupported by this system")); + return -1; + } + directFd = virQEMUFileOpenAs(cfg->user, cfg->group, false, path, + O_WRONLY | directFlag, &needUnlink); + + if (directFd < 0) + return -1; + + if (qemuSecuritySetImageFDLabel(driver->securityManager, vm->def, directFd) < 0) + return -1; + } if (qemuSecuritySetImageFDLabel(driver->securityManager, vm->def, *fd) < 0) @@ -7031,7 +7050,7 @@ qemuMigrationSrcToMappedFile(virQEMUDriver *driver, if (qemuDomainObjEnterMonitorAsync(vm, asyncJob) < 0) return -1; - ret = qemuMonitorMigrateToFdSet(vm, 0, fd); + ret = qemuMonitorMigrateToFdSet(vm, 0, fd, &directFd); qemuDomainObjExitMonitor(vm); return ret; } @@ -7040,6 +7059,7 @@ qemuMigrationSrcToMappedFile(virQEMUDriver *driver, /* Helper function called while vm is active. */ int qemuMigrationSrcToFile(virQEMUDriver *driver, virDomainObj *vm, + const char *path, int *fd, virCommand *compressor, qemuMigrationParams *migParams, @@ -7077,7 +7097,7 @@ qemuMigrationSrcToFile(virQEMUDriver *driver, virDomainObj *vm, if (migParams && qemuMigrationParamsCapEnabled(migParams, QEMU_MIGRATION_CAP_MAPPED_RAM)) { - rc = qemuMigrationSrcToMappedFile(driver, vm, fd, flags, asyncJob); + rc = qemuMigrationSrcToMappedFile(driver, vm, path, fd, flags, asyncJob); } else { rc = qemuMigrationSrcToLegacyFile(driver, vm, *fd, compressor, asyncJob); } diff --git a/src/qemu/qemu_migration.h b/src/qemu/qemu_migration.h index 5529f2ee21..51d5b680bc 100644 --- a/src/qemu/qemu_migration.h +++ b/src/qemu/qemu_migration.h @@ -236,6 +236,7 @@ qemuMigrationSrcIsAllowed(virDomainObj *vm, int qemuMigrationSrcToFile(virQEMUDriver *driver, virDomainObj *vm, + const char *path, int *fd, virCommand *compressor, qemuMigrationParams *migParams, diff --git a/src/qemu/qemu_migration_params.c b/src/qemu/qemu_migration_params.c index 503d6165b0..8f6003005c 100644 --- a/src/qemu/qemu_migration_params.c +++ b/src/qemu/qemu_migration_params.c @@ -129,6 +129,7 @@ VIR_ENUM_IMPL(qemuMigrationParam, "multifd-compression", "multifd-zlib-level", "multifd-zstd-level", + "direct-io", ); typedef struct _qemuMigrationParamsAlwaysOnItem qemuMigrationParamsAlwaysOnItem; @@ -319,6 +320,9 @@ static const qemuMigrationParamInfoItem qemuMigrationParamInfo[] = { [QEMU_MIGRATION_PARAM_MULTIFD_ZSTD_LEVEL] = { .type = QEMU_MIGRATION_PARAM_TYPE_INT, }, + [QEMU_MIGRATION_PARAM_DIRECT_IO] = { + .type = QEMU_MIGRATION_PARAM_TYPE_BOOL, + }, }; G_STATIC_ASSERT(G_N_ELEMENTS(qemuMigrationParamInfo) == QEMU_MIGRATION_PARAM_LAST); @@ -784,7 +788,7 @@ qemuMigrationParamsFromFlags(virTypedParameterPtr params, qemuMigrationParams * -qemuMigrationParamsForSave(bool mappedRam) +qemuMigrationParamsForSave(bool mappedRam, unsigned int flags) { g_autoptr(qemuMigrationParams) saveParams = NULL; @@ -798,6 +802,11 @@ qemuMigrationParamsForSave(bool mappedRam) return NULL; saveParams->params[QEMU_MIGRATION_PARAM_MULTIFD_CHANNELS].value.i = 1; saveParams->params[QEMU_MIGRATION_PARAM_MULTIFD_CHANNELS].set = true; + + if (flags & VIR_DOMAIN_SAVE_BYPASS_CACHE) { + saveParams->params[QEMU_MIGRATION_PARAM_DIRECT_IO].value.b = true; + saveParams->params[QEMU_MIGRATION_PARAM_DIRECT_IO].set = true; + } } return g_steal_pointer(&saveParams); diff --git a/src/qemu/qemu_migration_params.h 
b/src/qemu/qemu_migration_params.h index fe239d9a8f..9700469b5e 100644 --- a/src/qemu/qemu_migration_params.h +++ b/src/qemu/qemu_migration_params.h @@ -65,6 +65,7 @@ typedef enum { QEMU_MIGRATION_PARAM_MULTIFD_COMPRESSION, QEMU_MIGRATION_PARAM_MULTIFD_ZLIB_LEVEL, QEMU_MIGRATION_PARAM_MULTIFD_ZSTD_LEVEL, + QEMU_MIGRATION_PARAM_DIRECT_IO, QEMU_MIGRATION_PARAM_LAST } qemuMigrationParam; @@ -87,7 +88,7 @@ qemuMigrationParamsFromFlags(virTypedParameterPtr params, qemuMigrationParty party); qemuMigrationParams * -qemuMigrationParamsForSave(bool mappedRam); +qemuMigrationParamsForSave(bool mappedRam, unsigned int flags); int qemuMigrationParamsDump(qemuMigrationParams *migParams, diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 9a454a1d08..832ede639e 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -2240,7 +2240,8 @@ qemuMonitorMigrateToFd(qemuMonitor *mon, int qemuMonitorMigrateToFdSet(virDomainObj *vm, unsigned int flags, - int *fd) + int *fd, + int *directFd) { qemuDomainObjPrivate *priv = vm->privateData; qemuMonitor *mon = priv->mon; @@ -2250,7 +2251,7 @@ qemuMonitorMigrateToFdSet(virDomainObj *vm, g_autofree char *uri = NULL; int ret; - VIR_DEBUG("fd=%d flags=0x%x", *fd, flags); + VIR_DEBUG("fd=%d directFd=%d flags=0x%x", *fd, *directFd, flags); QEMU_CHECK_MONITOR(mon); @@ -2262,6 +2263,8 @@ qemuMonitorMigrateToFdSet(virDomainObj *vm, fdPassMigrate = qemuFDPassNew("migrate", priv); qemuFDPassAddFD(fdPassMigrate, fd, "-buffered-fd"); + if (*directFd != -1) + qemuFDPassAddFD(fdPassMigrate, directFd, "-directio-fd"); if (qemuFDPassTransferMonitor(fdPassMigrate, mon) < 0) return -1; diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index ebacdf110e..63385e93f3 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -846,7 +846,8 @@ int qemuMonitorMigrateToFd(qemuMonitor *mon, int qemuMonitorMigrateToFdSet(virDomainObj *vm, unsigned int flags, - int *fd); + int *fd, + int *directFd); int qemuMonitorMigrateToHost(qemuMonitor *mon, unsigned int flags, diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c index b99e0de1ff..8ffd26d57a 100644 --- a/src/qemu/qemu_saveimage.c +++ b/src/qemu/qemu_saveimage.c @@ -450,7 +450,7 @@ qemuSaveImageCreate(virQEMUDriver *driver, goto cleanup; /* Perform the migration */ - if (qemuMigrationSrcToFile(driver, vm, &fd, compressor, saveParams, flags, asyncJob) < 0) + if (qemuMigrationSrcToFile(driver, vm, path, &fd, compressor, saveParams, flags, asyncJob) < 0) goto cleanup; /* Touch up file header to mark image complete. */ -- 2.35.3
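Re-flowed from the qemuMigrationSrcToMappedFile hunk above, the save-side dual-fd arrangement (sketch; cfg, path, fd, needUnlink and ret as declared in the patch):

/* fd 1: opened with O_DIRECT, used by QEMU for the page-aligned
 * memory writes. */
int directFlag = virFileDirectFdFlag();
VIR_AUTOCLOSE directFd = -1;

if (directFlag < 0)
    return -1;    /* O_DIRECT unsupported on this system */

if ((directFd = virQEMUFileOpenAs(cfg->user, cfg->group, false, path,
                                  O_WRONLY | directFlag, &needUnlink)) < 0)
    return -1;

/* fd 2: the caller's *fd, without O_DIRECT, used for small bits of
 * unaligned state. Both fds end up in a single fdset for QEMU. */
ret = qemuMonitorMigrateToFdSet(vm, 0, fd, &directFd);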

When using the mapped-ram migration capability, direct IO is enabled by setting the "direct-io" migration parameter to "true" and passing QEMU an additional fd with O_DIRECT set. Signed-off-by: Jim Fehlig <jfehlig@suse.com> --- src/qemu/qemu_migration.c | 11 ++++++----- src/qemu/qemu_process.c | 30 ++++++++++++++++++++++++++---- src/qemu/qemu_process.h | 6 ++++-- 3 files changed, 36 insertions(+), 11 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index daa42bcfe4..56199c616b 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -2948,7 +2948,8 @@ qemuMigrationDstPrepareCleanup(virQEMUDriver *driver, } static qemuProcessIncomingDef * -qemuMigrationDstPrepare(virDomainObj *vm, +qemuMigrationDstPrepare(virQEMUDriver *driver, + virDomainObj *vm, bool tunnel, const char *protocol, const char *listenAddress, @@ -3008,8 +3009,8 @@ qemuMigrationDstPrepare(virDomainObj *vm, migrateFrom = g_strdup_printf(incFormat, protocol, listenAddress, port); } - return qemuProcessIncomingDefNew(vm, listenAddress, - migrateFrom, fd, NULL, NULL); + return qemuProcessIncomingDefNew(driver, vm, listenAddress, + migrateFrom, fd, NULL, NULL, NULL); } @@ -3151,7 +3152,7 @@ qemuMigrationDstPrepareActive(virQEMUDriver *driver, goto error; stopProcess = true; - if (!(incoming = qemuMigrationDstPrepare(vm, tunnel, protocol, + if (!(incoming = qemuMigrationDstPrepare(driver, vm, tunnel, protocol, listenAddress, port, &dataFD[0]))) goto error; @@ -3522,7 +3523,7 @@ qemuMigrationDstPrepareResume(virQEMUDriver *driver, priv->origname = g_strdup(origname); - if (!(incoming = qemuMigrationDstPrepare(vm, false, protocol, + if (!(incoming = qemuMigrationDstPrepare(driver, vm, false, protocol, listenAddress, port, NULL))) goto cleanup; diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index b02cd84aff..e10894d0f9 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -4694,13 +4694,16 @@ qemuProcessIncomingDefFree(qemuProcessIncomingDef *inc) * qemuProcessIncomingDefFree will NOT close it. */ qemuProcessIncomingDef * -qemuProcessIncomingDefNew(virDomainObj *vm, +qemuProcessIncomingDefNew(virQEMUDriver *driver, + virDomainObj *vm, const char *listenAddress, const char *migrateFrom, int *fd, const char *path, - virQEMUSaveData *data) + virQEMUSaveData *data, + qemuMigrationParams *migParams) { + g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver); qemuDomainObjPrivate *priv = vm->privateData; qemuProcessIncomingDef *inc = NULL; @@ -4715,10 +4718,28 @@ qemuProcessIncomingDefNew(virDomainObj *vm, size_t offset = sizeof(virQEMUSaveHeader) + data->header.data_len; if (data->header.features & QEMU_SAVE_FEATURE_MAPPED_RAM) { + bool directio = false; unsigned int fdsetId; inc->fdPassMigrate = qemuFDPassNew("migrate", priv); - qemuFDPassAddFD(inc->fdPassMigrate, fd, "-buffered-fd"); + /* When using directio with mapped-ram, qemu needs an fd without + * O_DIRECT set for reading small bits of unaligned state. 
*/ + if (qemuMigrationParamsGetBool(migParams, + QEMU_MIGRATION_PARAM_DIRECT_IO, + &directio) < 0) + goto error; + + if (directio) { + VIR_AUTOCLOSE bufferedFd = -1; + + if ((bufferedFd = qemuDomainOpenFile(cfg, NULL, path, O_RDONLY, NULL)) < 0) + goto error; + + qemuFDPassAddFD(inc->fdPassMigrate, &bufferedFd, "-buffered-fd"); + qemuFDPassAddFD(inc->fdPassMigrate, fd, "-directio-fd"); + } else { + qemuFDPassAddFD(inc->fdPassMigrate, fd, "-buffered-fd"); + } qemuFDPassGetId(inc->fdPassMigrate, &fdsetId); inc->uri = g_strdup_printf("file:/dev/fdset/%u,offset=%#lx", fdsetId, offset); } else { @@ -8377,7 +8398,8 @@ qemuProcessStartWithMemoryState(virConnectPtr conn, int rc = 0; int ret = -1; - incoming = qemuProcessIncomingDefNew(vm, NULL, "stdio", fd, path, data); + incoming = qemuProcessIncomingDefNew(driver, vm, NULL, "stdio", + fd, path, data, migParams); if (!incoming) return -1; diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h index 93699da8bd..0e6aedf75d 100644 --- a/src/qemu/qemu_process.h +++ b/src/qemu/qemu_process.h @@ -58,12 +58,14 @@ struct _qemuProcessIncomingDef { const char *path; /* path associated with fd */ }; -qemuProcessIncomingDef *qemuProcessIncomingDefNew(virDomainObj *vm, +qemuProcessIncomingDef *qemuProcessIncomingDefNew(virQEMUDriver *driver, + virDomainObj *vm, const char *listenAddress, const char *migrateFrom, int *fd, const char *path, - virQEMUSaveData *data); + virQEMUSaveData *data, + qemuMigrationParams *migParams); void qemuProcessIncomingDefFree(qemuProcessIncomingDef *inc); -- 2.35.3
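The restore-side mirror image, condensed from the qemuProcessIncomingDefNew hunk above (sketch; error labels and surrounding context trimmed). The caller's fd was opened with O_DIRECT, so a second buffered fd is opened for QEMU's unaligned reads, and QEMU selects the right one from the fdset by its opaque suffix:

bool directio = false;

if (qemuMigrationParamsGetBool(migParams,
                               QEMU_MIGRATION_PARAM_DIRECT_IO,
                               &directio) < 0)
    goto error;

if (directio) {
    VIR_AUTOCLOSE bufferedFd = -1;

    if ((bufferedFd = qemuDomainOpenFile(cfg, NULL, path, O_RDONLY, NULL)) < 0)
        goto error;

    qemuFDPassAddFD(inc->fdPassMigrate, &bufferedFd, "-buffered-fd");
    qemuFDPassAddFD(inc->fdPassMigrate, fd, "-directio-fd");
} else {
    qemuFDPassAddFD(inc->fdPassMigrate, fd, "-buffered-fd");
}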

From: Claudio Fontana <cfontana@suse.de> Add a new VIR_DOMAIN_SAVE_PARALLEL flag to the save and restore APIs, which can be used to specify the use of multiple, parallel channels for saving a domain. The number of parallel channels can be set using the VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS typed parameter. Signed-off-by: Claudio Fontana <cfontana@suse.de> Signed-off-by: Jim Fehlig <jfehlig@suse.com> --- include/libvirt/libvirt-domain.h | 13 +++++++++++++ src/libvirt-domain.c | 6 ++++++ 2 files changed, 19 insertions(+) diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h index 4266237abe..f4bf9c3cdb 100644 --- a/include/libvirt/libvirt-domain.h +++ b/include/libvirt/libvirt-domain.h @@ -1596,6 +1596,7 @@ typedef enum { VIR_DOMAIN_SAVE_RUNNING = 1 << 1, /* Favor running over paused (Since: 0.9.5) */ VIR_DOMAIN_SAVE_PAUSED = 1 << 2, /* Favor paused over running (Since: 0.9.5) */ VIR_DOMAIN_SAVE_RESET_NVRAM = 1 << 3, /* Re-initialize NVRAM from template (Since: 8.1.0) */ + VIR_DOMAIN_SAVE_PARALLEL = 1 << 4, /* Save and restore using parallel channels (Since: 10.6.0) */ } virDomainSaveRestoreFlags; int virDomainSave (virDomainPtr domain, @@ -1641,6 +1642,18 @@ int virDomainRestoreParams (virConnectPtr conn, */ # define VIR_DOMAIN_SAVE_PARAM_DXML "dxml" +/** + * VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS: + * + * This optional parameter mirrors the migration parameter + * VIR_MIGRATE_PARAM_PARALLEL_CONNECTIONS. + * + * It specifies the number of extra connections to use when transferring state. + * + * Since: 10.6.0 + */ +# define VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS "parallel.connections" + /* See below for virDomainSaveImageXMLFlags */ char * virDomainSaveImageGetXMLDesc (virConnectPtr conn, const char *file, diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c index 5aedae1b49..32508a4ede 100644 --- a/src/libvirt-domain.c +++ b/src/libvirt-domain.c @@ -1010,6 +1010,9 @@ virDomainSaveFlags(virDomainPtr domain, const char *to, * If VIR_DOMAIN_SAVE_PARAM_FILE is not provided then a managed save is * performed (see virDomainManagedSave). * + * See VIR_DOMAIN_SAVE_PARAM_* for a detailed description of accepted save + * parameters. + * * Returns 0 in case of success and -1 in case of failure. * * Since: 8.4.0 @@ -1222,6 +1225,9 @@ virDomainRestoreFlags(virConnectPtr conn, const char *from, const char *dxml, * now, VIR_DOMAIN_SAVE_PARAM_FILE is required but this requirement may * be lifted in the future. * + * See VIR_DOMAIN_SAVE_PARAM_* for a detailed description of accepted + * restore parameters. + * * Returns 0 in case of success and -1 in case of failure. * * Since: 8.4.0 -- 2.35.3
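For illustration, a hypothetical client-side use of the new flag and parameter through the public API (the function name and channel count are invented for the example; error reporting trimmed):

#include <libvirt/libvirt.h>

static int
save_domain_parallel(virDomainPtr dom, const char *file, int nchannels)
{
    virTypedParameterPtr params = NULL;
    int nparams = 0;
    int maxparams = 0;
    int rc = -1;

    if (virTypedParamsAddString(&params, &nparams, &maxparams,
                                VIR_DOMAIN_SAVE_PARAM_FILE, file) < 0 ||
        virTypedParamsAddInt(&params, &nparams, &maxparams,
                             VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS,
                             nchannels) < 0)
        goto cleanup;

    rc = virDomainSaveParams(dom, params, nparams, VIR_DOMAIN_SAVE_PARALLEL);

 cleanup:
    virTypedParamsFree(params, nparams);
    return rc;
}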

On Thu, Aug 08, 2024 at 05:38:10PM -0600, Jim Fehlig via Devel wrote:
From: Claudio Fontana <cfontana@suse.de>
Add a new VIR_DOMAIN_SAVE_PARALLEL flag to the save and restore APIs, which can be used to specify the use of multiple, parallel channels for saving a domain. The number of parallel channels can be set using the VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS typed parameter.
Signed-off-by: Claudio Fontana <cfontana@suse.de> Signed-off-by: Jim Fehlig <jfehlig@suse.com> --- include/libvirt/libvirt-domain.h | 13 +++++++++++++ src/libvirt-domain.c | 6 ++++++ 2 files changed, 19 insertions(+)
diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h index 4266237abe..f4bf9c3cdb 100644 --- a/include/libvirt/libvirt-domain.h +++ b/include/libvirt/libvirt-domain.h @@ -1596,6 +1596,7 @@ typedef enum { VIR_DOMAIN_SAVE_RUNNING = 1 << 1, /* Favor running over paused (Since: 0.9.5) */ VIR_DOMAIN_SAVE_PAUSED = 1 << 2, /* Favor paused over running (Since: 0.9.5) */ VIR_DOMAIN_SAVE_RESET_NVRAM = 1 << 3, /* Re-initialize NVRAM from template (Since: 8.1.0) */ + VIR_DOMAIN_SAVE_PARALLEL = 1 << 4, /* Save and restore using parallel channels (Since: 10.6.0) */ } virDomainSaveRestoreFlags;
int virDomainSave (virDomainPtr domain, @@ -1641,6 +1642,18 @@ int virDomainRestoreParams (virConnectPtr conn, */ # define VIR_DOMAIN_SAVE_PARAM_DXML "dxml"
+/** + * VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS: + * + * This optional parameter mirrors the migration parameter + * VIR_MIGRATE_PARAM_PARALLEL_CONNECTIONS. + * + * It specifies the number of extra connections to use when transferring state. + * + * Since: 10.6.0 + */ +# define VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS "parallel.connections"
"CONNECTIONS" is perhaps the wrong name, as that's kinda an impl detail. How about VIR_MIGRATE_PARAM_PARALLEL_CPUS to reflect the number of CPUs we're going to burn time on.
With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On 10/10/24 15:29, Daniel P. Berrangé wrote:
On Thu, Aug 08, 2024 at 05:38:10PM -0600, Jim Fehlig via Devel wrote:
+/** + * VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS: + * + * This optional parameter mirrors the migration parameter + * VIR_MIGRATE_PARAM_PARALLEL_CONNECTIONS. + * + * It specifies the number of extra connections to use when transferring state. + * + * Since: 10.6.0 + */ +# define VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS "parallel.connections"
"CONNECTIONS" is perhaps the wrong name, as that's kinda an impl
I would suggest CONNECTIONS; it is the correct name.
detail. How about VIR_MIGRATE_PARAM_PARALLEL_CPUS to reflect the number of CPUs we're going to burn time on.
This is another aspect of it, but I would suggest that _this_ is much more an implementation detail (inside qemu) on how the parallel connection is actually managed. Maybe in the future there will be a pool of cpus serving this, or another optional way to serve the connections. Considering the fact that libvirt _already has_ for regular migrations {.typedParam = VIR_MIGRATE_PARAM_PARALLEL_CONNECTIONS, .param = QEMU_MIGRATION_PARAM_MULTIFD_CHANNELS, .party = QEMU_MIGRATION_SOURCE | QEMU_MIGRATION_DESTINATION}, which controls exactly the same QEMU parameter (ie the multifd-channels), using anything other than --parallel-connections is in my opinion very surprising for users (who would know what to expect from the same command line option), and honestly simply absurd. Thanks, Claudio

On Mon, Oct 14, 2024 at 06:00:53PM +0200, Claudio Fontana wrote:
On 10/10/24 15:29, Daniel P. Berrangé wrote:
On Thu, Aug 08, 2024 at 05:38:10PM -0600, Jim Fehlig via Devel wrote:
+/** + * VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS: + * + * This optional parameter mirrors the migration parameter + * VIR_MIGRATE_PARAM_PARALLEL_CONNECTIONS. + * + * It specifies the number of extra connections to use when transferring state. + * + * Since: 10.6.0 + */ +# define VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS "parallel.connections"
"CONNECTIONS" is perhaps the wrong name, as that's kinda an impl
I would suggest CONNECTIONS; it is the correct name.
detail. How about VIR_MIGRATE_PARAM_PARALLEL_CPUS to reflect the number of CPUs we're going to burn time on.
This is another aspect of it, but I would suggest that _this_ is much more an implementation detail (inside qemu) on how the parallel connection is actually managed. Maybe in the future there will be a pool of cpus serving this, or another optional way to serve the connections.
Considering the fact that libvirt _already has_ for regular migrations
{.typedParam = VIR_MIGRATE_PARAM_PARALLEL_CONNECTIONS, .param = QEMU_MIGRATION_PARAM_MULTIFD_CHANNELS, .party = QEMU_MIGRATION_SOURCE | QEMU_MIGRATION_DESTINATION},
which controls exactly the same QEMU parameter (ie the multifd-channels), using anything other than --parallel-connections is in my opinion very surprising for users (who would know what to expect from the same command line option), and honestly simply absurd.
This is comparing two different scenarios which work in varying ways. In the case of live migration there are network connections that we're controlling. So calling the parameter "parallel connections" is contextually relevant, though we could equally have called it "parallel vcpus" - depends whether you're thinking about multifd as being there to max out TCP connections, or to max out our local CPU usage. In the case of migrating to a file though, there is no concept of network connections at all. QEMU is opening a file and directly writing to that file from multiple threads. Calling the parameter 'parallel connections' makes no sense in this context, as there are no connections in use. "parallel CPUs" reflects that we're consuming multiple CPUs to write out data, though we could also call it "parallel threads" for similar reasons. With regards, Daniel

On 10/14/24 11:42, Daniel P. Berrangé wrote:
This is comparing two different scenarios which work in varying ways.
In the case of live migration there are network connections that we're controlling. So calling the parameter "parallel connections" is contextually relevant, though we could equally have called it "parallel vcpus" - depends whether you're thinking about multifd as being there to max out TCP connections, or to max out our local CPU usage.
In the case of migrating to a file though, there is no concept of network connections at all. QEMU is opening a file and directly writing to that file from multiple threads. Calling the parameter 'parallel connections' makes no sense in this context, as there are no connections in use. "parallel CPUs" reflects that we're consuming multiple CPUs to write out data, though we could also call it "parallel threads" for similar reasons.
I can change this to PARALLEL_CPUS if really desired, but there's something about it that doesn't feel right. It's hard to put into words, probably because the reasons are mostly subjective. Claudio already raised the objective ones. I do prefer PARALLEL_THREADS over PARALLEL_CPUS. How about reusing the name already selected by QEMU and going with PARALLEL_CHANNELS? Regards, Jim

On 10/14/24 21:41, Jim Fehlig via Devel wrote:
On 10/14/24 11:42, Daniel P. Berrangé wrote:
This is comparing two different scenarios which work in varying ways.
In the case of live migration there are network connections that
this is a fair point
we're controlling. So calling the parameter "parallel connections" is contextually relevant, though we could equally have called it "parallel vcpus" - depends whether you're thinking about multifd as being there to max out TCP connections, or to max out our local CPU usage.
In the case of migrating to a file though, there is no concept of network connections at all. QEMU is opening a file and directly writing to that file from multiple threads. Calling the parameter 'parallel connections' makes no sense in this context, as there are no connections in use. "parallel CPUs" reflects that we're consuming multiple CPUs to write out data, though we could also call it "parallel threads" for similar reasons.
here I disagree, because we don't know what QEMU will do to serve those multifd channels (threads, coroutines, or whatever construct might be implemented in the future to improve efficiency).
I can change this to PARALLEL_CPUS if really desired, but there's something about it that doesn't feel right. It's hard to put into words, probably because the reasons are mostly subjective. Claudio already raised the objective ones.
I do prefer PARALLEL_THREADS over PARALLEL_CPUS. How about reusing the name already selected by QEMU and going with PARALLEL_CHANNELS?
Good idea, PARALLEL_CHANNELS seems to be the best to me.
Regards, Jim
Claudio

Add support for parallel save and restore by mapping libvirt's
"parallel-connections" parameter to QEMU's "multifd-channels" migration
parameter.

Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 src/qemu/qemu_driver.c           | 31 ++++++++++++++++++++-----------
 src/qemu/qemu_migration_params.c | 25 +++++++++++++++++++++++--
 src/qemu/qemu_migration_params.h |  5 ++++-
 3 files changed, 47 insertions(+), 14 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 0025bad6e7..fce599b321 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -2608,6 +2608,8 @@ qemuDomainSaveInternal(virQEMUDriver *driver,
                        int compressed,
                        virCommand *compressor,
                        const char *xmlin,
+                       virTypedParameterPtr params,
+                       int nparams,
                        unsigned int flags)
 {
     g_autofree char *xml = NULL;
@@ -2691,7 +2693,7 @@ qemuDomainSaveInternal(virQEMUDriver *driver,
     xml = NULL;
 
     mappedRam = data->header.features & QEMU_SAVE_FEATURE_MAPPED_RAM;
-    if (!(saveParams = qemuMigrationParamsForSave(mappedRam, flags)))
+    if (!(saveParams = qemuMigrationParamsForSave(params, nparams, mappedRam, flags)))
         goto endjob;
 
     ret = qemuSaveImageCreate(driver, vm, path, data, compressor,
@@ -2777,7 +2779,7 @@ qemuDomainManagedSaveHelper(virQEMUDriver *driver,
     VIR_INFO("Saving state of domain '%s' to '%s'", vm->def->name, path);
 
     if (qemuDomainSaveInternal(driver, vm, path, compressed,
-                               compressor, dxml, flags) < 0)
+                               compressor, dxml, NULL, 0, flags) < 0)
         return -1;
 
     vm->hasManagedSave = true;
@@ -2816,7 +2818,7 @@ qemuDomainSaveFlags(virDomainPtr dom, const char *path, const char *dxml,
         goto cleanup;
 
     ret = qemuDomainSaveInternal(driver, vm, path, compressed,
-                                 compressor, dxml, flags);
+                                 compressor, dxml, NULL, 0, flags);
 
 cleanup:
     virDomainObjEndAPI(&vm);
@@ -2846,13 +2848,16 @@ qemuDomainSaveParams(virDomainPtr dom,
 
     virCheckFlags(VIR_DOMAIN_SAVE_BYPASS_CACHE |
                   VIR_DOMAIN_SAVE_RUNNING |
-                  VIR_DOMAIN_SAVE_PAUSED, -1);
+                  VIR_DOMAIN_SAVE_PAUSED |
+                  VIR_DOMAIN_SAVE_PARALLEL, -1);
 
     if (virTypedParamsValidate(params, nparams,
                                VIR_DOMAIN_SAVE_PARAM_FILE, VIR_TYPED_PARAM_STRING,
                                VIR_DOMAIN_SAVE_PARAM_DXML, VIR_TYPED_PARAM_STRING,
+                               VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS,
+                               VIR_TYPED_PARAM_INT,
                                NULL) < 0)
         return -1;
@@ -2884,7 +2889,7 @@ qemuDomainSaveParams(virDomainPtr dom,
         goto cleanup;
 
     ret = qemuDomainSaveInternal(driver, vm, to, compressed,
-                                 compressor, dxml, flags);
+                                 compressor, dxml, params, nparams, flags);
 
 cleanup:
     virDomainObjEndAPI(&vm);
@@ -5759,6 +5764,8 @@ static int
 qemuDomainRestoreInternal(virConnectPtr conn,
                           const char *path,
                           const char *dxml,
+                          virTypedParameterPtr params,
+                          int nparams,
                           unsigned int flags,
                           int (*ensureACL)(virConnectPtr, virDomainDef *))
 {
@@ -5780,7 +5787,8 @@ qemuDomainRestoreInternal(virConnectPtr conn,
     virCheckFlags(VIR_DOMAIN_SAVE_BYPASS_CACHE |
                   VIR_DOMAIN_SAVE_RUNNING |
                   VIR_DOMAIN_SAVE_PAUSED |
-                  VIR_DOMAIN_SAVE_RESET_NVRAM, -1);
+                  VIR_DOMAIN_SAVE_RESET_NVRAM |
+                  VIR_DOMAIN_SAVE_PARALLEL, -1);
 
     if (flags & VIR_DOMAIN_SAVE_RESET_NVRAM)
         reset_nvram = true;
@@ -5789,7 +5797,7 @@ qemuDomainRestoreInternal(virConnectPtr conn,
         goto cleanup;
 
     mapped_ram = data->header.features & QEMU_SAVE_FEATURE_MAPPED_RAM;
-    if (!(restoreParams = qemuMigrationParamsForSave(mapped_ram, flags)))
+    if (!(restoreParams = qemuMigrationParamsForSave(params, nparams, mapped_ram, flags)))
         return -1;
 
     fd = qemuSaveImageOpen(driver, path,
@@ -5871,7 +5879,7 @@ qemuDomainRestoreFlags(virConnectPtr conn,
                        const char *dxml,
                        unsigned int flags)
 {
-    return qemuDomainRestoreInternal(conn, path, dxml, flags,
+    return qemuDomainRestoreInternal(conn, path, dxml, NULL, 0, flags,
                                      virDomainRestoreFlagsEnsureACL);
 }
@@ -5879,7 +5887,7 @@ static int
 qemuDomainRestore(virConnectPtr conn,
                   const char *path)
 {
-    return qemuDomainRestoreInternal(conn, path, NULL, 0,
+    return qemuDomainRestoreInternal(conn, path, NULL, NULL, 0, 0,
                                      virDomainRestoreEnsureACL);
 }
@@ -5895,6 +5903,7 @@ qemuDomainRestoreParams(virConnectPtr conn,
     if (virTypedParamsValidate(params, nparams,
                                VIR_DOMAIN_SAVE_PARAM_FILE, VIR_TYPED_PARAM_STRING,
                                VIR_DOMAIN_SAVE_PARAM_DXML, VIR_TYPED_PARAM_STRING,
+                               VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS, VIR_TYPED_PARAM_INT,
                                NULL) < 0)
         return -1;
@@ -5911,7 +5920,7 @@ qemuDomainRestoreParams(virConnectPtr conn,
         return -1;
     }
 
-    ret = qemuDomainRestoreInternal(conn, path, dxml, flags,
+    ret = qemuDomainRestoreInternal(conn, path, dxml, params, nparams, flags,
                                     virDomainRestoreParamsEnsureACL);
     return ret;
 }
@@ -6115,7 +6124,7 @@ qemuDomainObjRestore(virConnectPtr conn,
     }
 
     mapped_ram = data->header.features & QEMU_SAVE_FEATURE_MAPPED_RAM;
-    if (!(restoreParams = qemuMigrationParamsForSave(mapped_ram,
+    if (!(restoreParams = qemuMigrationParamsForSave(NULL, 0, mapped_ram,
                                                      bypass_cache ? VIR_DOMAIN_SAVE_BYPASS_CACHE : 0)))
         return -1;

diff --git a/src/qemu/qemu_migration_params.c b/src/qemu/qemu_migration_params.c
index 8f6003005c..8c656af163 100644
--- a/src/qemu/qemu_migration_params.c
+++ b/src/qemu/qemu_migration_params.c
@@ -788,7 +788,10 @@ qemuMigrationParamsFromFlags(virTypedParameterPtr params,
 
 qemuMigrationParams *
-qemuMigrationParamsForSave(bool mappedRam, unsigned int flags)
+qemuMigrationParamsForSave(virTypedParameterPtr params,
+                           int nparams,
+                           bool mappedRam,
+                           unsigned int flags)
 {
     g_autoptr(qemuMigrationParams) saveParams = NULL;
@@ -800,7 +803,25 @@ qemuMigrationParamsForSave(bool mappedRam, unsigned int flags)
             return NULL;
         if (virBitmapSetBit(saveParams->caps, QEMU_MIGRATION_CAP_MULTIFD) < 0)
             return NULL;
-        saveParams->params[QEMU_MIGRATION_PARAM_MULTIFD_CHANNELS].value.i = 1;
+
+        if (flags & VIR_DOMAIN_SAVE_PARALLEL) {
+            int nchannels;
+
+            if (params && virTypedParamsGetInt(params, nparams,
+                                               VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS,
+                                               &nchannels) < 0)
+                return NULL;
+
+            if (nchannels < 1) {
+                virReportError(VIR_ERR_INVALID_ARG, "%s",
+                               _("number of parallel save channels cannot be less than 1"));
+                return NULL;
+            }
+
+            saveParams->params[QEMU_MIGRATION_PARAM_MULTIFD_CHANNELS].value.i = nchannels;
+        } else {
+            saveParams->params[QEMU_MIGRATION_PARAM_MULTIFD_CHANNELS].value.i = 1;
+        }
         saveParams->params[QEMU_MIGRATION_PARAM_MULTIFD_CHANNELS].set = true;
 
         if (flags & VIR_DOMAIN_SAVE_BYPASS_CACHE) {

diff --git a/src/qemu/qemu_migration_params.h b/src/qemu/qemu_migration_params.h
index 9700469b5e..a672b3d875 100644
--- a/src/qemu/qemu_migration_params.h
+++ b/src/qemu/qemu_migration_params.h
@@ -88,7 +88,10 @@ qemuMigrationParamsFromFlags(virTypedParameterPtr params,
                              qemuMigrationParty party);
 
 qemuMigrationParams *
-qemuMigrationParamsForSave(bool mappedRam, unsigned int flags);
+qemuMigrationParamsForSave(virTypedParameterPtr params,
+                           int nparams,
+                           bool mappedRam,
+                           unsigned int flags);
 
 int
 qemuMigrationParamsDump(qemuMigrationParams *migParams,
--
2.35.3
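
[Editor's note: as a quick illustration of the caller side of this change, the sketch below (not part of the patch) drives the new code path through the public API. It assumes the VIR_DOMAIN_SAVE_PARALLEL flag and VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS typed parameter defined earlier in the series; everything else is existing libvirt API.]

#include <libvirt/libvirt.h>

/* Save 'dom' to 'path' using 8 parallel connections. Returns 0 on success. */
static int
save_domain_parallel(virDomainPtr dom, const char *path)
{
    virTypedParameterPtr params = NULL;
    int nparams = 0;
    int maxparams = 0;
    int ret = -1;

    /* Target file plus the requested number of parallel connections,
     * which this patch maps to QEMU's multifd-channels parameter. */
    if (virTypedParamsAddString(&params, &nparams, &maxparams,
                                VIR_DOMAIN_SAVE_PARAM_FILE, path) < 0 ||
        virTypedParamsAddInt(&params, &nparams, &maxparams,
                             VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS, 8) < 0)
        goto cleanup;

    /* With the validation discussed later in the thread, this fails
     * unless the image is written with the mapped-ram feature. */
    ret = virDomainSaveParams(dom, params, nparams, VIR_DOMAIN_SAVE_PARALLEL);

 cleanup:
    virTypedParamsFree(params, nparams);
    return ret;
}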

On Thu, Aug 08, 2024 at 05:38:11PM -0600, Jim Fehlig via Devel wrote:
Add support for parallel save and restore by mapping libvirt's "parallel-connections" parameter to QEMU's "multifd-channels" migration parameter.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 src/qemu/qemu_driver.c           | 31 ++++++++++++++++++++-----------
 src/qemu/qemu_migration_params.c | 25 +++++++++++++++++++++++--
 src/qemu/qemu_migration_params.h |  5 ++++-
 3 files changed, 47 insertions(+), 14 deletions(-)
Unless I'm missing something, nothing in this patch is validating that the image is about to be written in v3 format with MAPPED_RAM set. We must reject VIR_DOMAIN_SAVE_PARALLEL if we're not using MAPPED_RAM. Likewise, when restoring, we must reject PARALLEL if the loaded image doesn't have the MAPPED_RAM feature set.
[snip]
With regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

On 10/10/24 07:43, Daniel P. Berrangé wrote:
On Thu, Aug 08, 2024 at 05:38:11PM -0600, Jim Fehlig via Devel wrote:
Add support for parallel save and restore by mapping libvirt's "parallel-connections" parameter to QEMU's "multifd-channels" migration parameter.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 src/qemu/qemu_driver.c           | 31 ++++++++++++++++++++-----------
 src/qemu/qemu_migration_params.c | 25 +++++++++++++++++++++++--
 src/qemu/qemu_migration_params.h |  5 ++++-
 3 files changed, 47 insertions(+), 14 deletions(-)
Unless I'm missing something, nothing in this patch is validating that the image is about to be written in v3 format with MAPPED_RAM set. We must reject VIR_DOMAIN_SAVE_PARALLEL if we're not using MAPPED_RAM. Likewise, when restoring, we must reject PARALLEL if the loaded image doesn't have the MAPPED_RAM feature set.
Yeah, good catch. The existing logic essentially ignores PARALLEL if mapped-ram is not supported. [snip]
diff --git a/src/qemu/qemu_migration_params.c b/src/qemu/qemu_migration_params.c
index 8f6003005c..8c656af163 100644
--- a/src/qemu/qemu_migration_params.c
+++ b/src/qemu/qemu_migration_params.c
@@ -788,7 +788,10 @@ qemuMigrationParamsFromFlags(virTypedParameterPtr params,
 
 qemuMigrationParams *
-qemuMigrationParamsForSave(bool mappedRam, unsigned int flags)
+qemuMigrationParamsForSave(virTypedParameterPtr params,
+                           int nparams,
+                           bool mappedRam,
+                           unsigned int flags)
 {
     g_autoptr(qemuMigrationParams) saveParams = NULL;
@@ -800,7 +803,25 @@ qemuMigrationParamsForSave(bool mappedRam, unsigned int flags)
             return NULL;
         if (virBitmapSetBit(saveParams->caps, QEMU_MIGRATION_CAP_MULTIFD) < 0)
             return NULL;
-        saveParams->params[QEMU_MIGRATION_PARAM_MULTIFD_CHANNELS].value.i = 1;
+
+        if (flags & VIR_DOMAIN_SAVE_PARALLEL) {
+            int nchannels;
+
+            if (params && virTypedParamsGetInt(params, nparams,
+                                               VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS,
+                                               &nchannels) < 0)
+                return NULL;
+
+            if (nchannels < 1) {
+                virReportError(VIR_ERR_INVALID_ARG, "%s",
+                               _("number of parallel save channels cannot be less than 1"));
+                return NULL;
+            }
+
+            saveParams->params[QEMU_MIGRATION_PARAM_MULTIFD_CHANNELS].value.i = nchannels;
+        } else {
+            saveParams->params[QEMU_MIGRATION_PARAM_MULTIFD_CHANNELS].value.i = 1;
+        }
It's not shown in the diff, but this block is nestled within an 'if (mappedRam)' block. I've squashed the below hunk into this patch in my local branch.

Regards,
Jim

diff --git a/src/qemu/qemu_migration_params.c b/src/qemu/qemu_migration_params.c
index 8c656af163..6fa09c272e 100644
--- a/src/qemu/qemu_migration_params.c
+++ b/src/qemu/qemu_migration_params.c
@@ -795,6 +795,12 @@ qemuMigrationParamsForSave(virTypedParameterPtr params,
 {
     g_autoptr(qemuMigrationParams) saveParams = NULL;
 
+    if (flags & VIR_DOMAIN_SAVE_PARALLEL && !mappedRam) {
+        virReportError(VIR_ERR_INVALID_ARG, "%s",
+                       _("Parallel save is not available with this QEMU binary"));
+        return NULL;
+    }
+
     if (!(saveParams = qemuMigrationParamsNew()))
         return NULL;

From: Li Zhang <lizhang@suse.de>

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 tools/virsh-domain.c | 41 +++++++++++++++++++++++++++++++++++++----
 1 file changed, 37 insertions(+), 4 deletions(-)

diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
index 50e80689a2..ec0e43ae7b 100644
--- a/tools/virsh-domain.c
+++ b/tools/virsh-domain.c
@@ -4145,6 +4145,14 @@ static const vshCmdOptDef opts_save[] = {
      .type = VSH_OT_BOOL,
      .help = N_("avoid file system cache when saving")
     },
+    {.name = "parallel",
+     .type = VSH_OT_BOOL,
+     .help = N_("enable parallel save")
+    },
+    {.name = "parallel-connections",
+     .type = VSH_OT_INT,
+     .help = N_("number of extra connections to use for parallel save")
+    },
     {.name = "xml",
      .type = VSH_OT_STRING,
      .unwanted_positional = true,
@@ -4175,6 +4183,11 @@ doSave(void *opaque)
     g_autoptr(virshDomain) dom = NULL;
     const char *name = NULL;
     const char *to = NULL;
+    virTypedParameterPtr params = NULL;
+    int nparams = 0;
+    int maxparams = 0;
+    int nchannels = 0;
+    int rv = -1;
     unsigned int flags = 0;
     const char *xmlfile = NULL;
     g_autofree char *xml = NULL;
@@ -4188,15 +4201,29 @@ doSave(void *opaque)
         goto out_sig;
 #endif /* !WIN32 */
 
-    if (vshCommandOptString(ctl, cmd, "file", &to) < 0)
-        goto out;
-
     if (vshCommandOptBool(cmd, "bypass-cache"))
         flags |= VIR_DOMAIN_SAVE_BYPASS_CACHE;
     if (vshCommandOptBool(cmd, "running"))
         flags |= VIR_DOMAIN_SAVE_RUNNING;
     if (vshCommandOptBool(cmd, "paused"))
        flags |= VIR_DOMAIN_SAVE_PAUSED;
+    if (vshCommandOptBool(cmd, "parallel"))
+        flags |= VIR_DOMAIN_SAVE_PARALLEL;
+
+    if (vshCommandOptString(ctl, cmd, "file", &to) < 0)
+        goto out;
+    if (to &&
+        virTypedParamsAddString(&params, &nparams, &maxparams,
+                                VIR_DOMAIN_SAVE_PARAM_FILE, to) < 0)
+        goto out;
+
+    if ((rv = vshCommandOptInt(ctl, cmd, "parallel-connections", &nchannels)) < 0) {
+        goto out;
+    } else if (rv > 0) {
+        if (virTypedParamsAddInt(&params, &nparams, &maxparams,
+                                 VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS, nchannels) < 0)
+            goto out;
+    }
 
     if (vshCommandOptString(ctl, cmd, "xml", &xmlfile) < 0)
         goto out;
@@ -4209,9 +4236,13 @@ doSave(void *opaque)
         vshReportError(ctl);
         goto out;
     }
+    if (xml &&
+        virTypedParamsAddString(&params, &nparams, &maxparams,
+                                VIR_DOMAIN_SAVE_PARAM_DXML, xml) < 0)
+        goto out;
 
     if (flags || xml) {
-        rc = virDomainSaveFlags(dom, to, xml, flags);
+        rc = virDomainSaveParams(dom, params, nparams, flags);
     } else {
         rc = virDomainSave(dom, to);
     }
@@ -4224,6 +4255,8 @@ doSave(void *opaque)
     data->ret = 0;
 
 out:
+    virTypedParamsFree(params, nparams);
+
 #ifndef WIN32
     pthread_sigmask(SIG_SETMASK, &oldsigmask, NULL);
 out_sig:
--
2.35.3
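
[Editor's note: for reference, an example invocation of the new options; the domain name and path are made up:

  virsh save testvm /data/testvm.sav --parallel --parallel-connections 8

--parallel sets VIR_DOMAIN_SAVE_PARALLEL and the connection count is added as the VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS typed parameter, so virsh takes the virDomainSaveParams() path in the hunk above rather than virDomainSave().]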

On Thu, Aug 08, 2024 at 05:38:12PM -0600, Jim Fehlig via Devel wrote:
From: Li Zhang <lizhang@suse.de>
Signed-off-by: Claudio Fontana <cfontana@suse.de>
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 tools/virsh-domain.c | 41 +++++++++++++++++++++++++++++++++++++----
 1 file changed, 37 insertions(+), 4 deletions(-)
diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
index 50e80689a2..ec0e43ae7b 100644
--- a/tools/virsh-domain.c
+++ b/tools/virsh-domain.c
@@ -4145,6 +4145,14 @@ static const vshCmdOptDef opts_save[] = {
      .type = VSH_OT_BOOL,
      .help = N_("avoid file system cache when saving")
     },
+    {.name = "parallel",
+     .type = VSH_OT_BOOL,
+     .help = N_("enable parallel save")
+    },
+    {.name = "parallel-connections",
+     .type = VSH_OT_INT,
+     .help = N_("number of extra connections to use for parallel save")
s/connections/cpus/ throughout this patch
[snip]
With regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

From: Claudio Fontana <cfontana@suse.de>

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 docs/manpages/virsh.rst |  9 +++++++--
 tools/virsh-domain.c    | 38 ++++++++++++++++++++++++++++++++++----
 2 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/docs/manpages/virsh.rst b/docs/manpages/virsh.rst
index fa038e4547..2217728efd 100644
--- a/docs/manpages/virsh.rst
+++ b/docs/manpages/virsh.rst
@@ -3915,12 +3915,13 @@ restore
 ::
 
    restore state-file [--bypass-cache] [--xml file]
-      [{--running | --paused}] [--reset-nvram]
+      [{--running | --paused}] [--reset-nvram] [--parallel] [--parallel-connections]
 
 Restores a domain from a ``virsh save`` state file. See *save* for more info.
 
 If *--bypass-cache* is specified, the restore will avoid the file system
-cache, although this may slow down the operation.
+cache; depending on the specific scenario this may slow down or speed up
+the operation.
 
 *--xml* ``file`` is usually omitted, but can be used to supply an
 alternative XML file for use on the restored guest with changes only
@@ -3936,6 +3937,10 @@ domain should be started in.
 If *--reset-nvram* is specified, any existing NVRAM file will be deleted
 and re-initialized from its pristine template.
 
+*--parallel* option will cause the save data to be loaded using the number
+of parallel connections specified with *--parallel-connections*. Parallel
+connections will help speed up large restore operations.
+
 ``Note``: To avoid corrupting file system contents within the domain, you
 should not reuse the saved state file for a second ``restore`` unless you
 have also reverted all storage volumes back to the same contents as when

diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
index ec0e43ae7b..73c04390cc 100644
--- a/tools/virsh-domain.c
+++ b/tools/virsh-domain.c
@@ -5265,6 +5265,14 @@ static const vshCmdOptDef opts_restore[] = {
      .type = VSH_OT_BOOL,
      .help = N_("avoid file system cache when restoring")
     },
+    {.name = "parallel",
+     .type = VSH_OT_BOOL,
+     .help = N_("enable parallel restore")
+    },
+    {.name = "parallel-connections",
+     .type = VSH_OT_INT,
+     .help = N_("number of connections to use for parallel restore")
+    },
     {.name = "xml",
      .type = VSH_OT_STRING,
      .unwanted_positional = true,
@@ -5294,13 +5302,16 @@ cmdRestore(vshControl *ctl, const vshCmd *cmd)
     const char *xmlfile = NULL;
     g_autofree char *xml = NULL;
     virshControl *priv = ctl->privData;
+    virTypedParameterPtr params = NULL;
+    int nparams = 0;
+    int maxparams = 0;
+    int nchannels = 0;
     int rc;
 
-    if (vshCommandOptString(ctl, cmd, "file", &from) < 0)
-        return false;
-
     if (vshCommandOptBool(cmd, "bypass-cache"))
         flags |= VIR_DOMAIN_SAVE_BYPASS_CACHE;
+    if (vshCommandOptBool(cmd, "parallel"))
+        flags |= VIR_DOMAIN_SAVE_PARALLEL;
     if (vshCommandOptBool(cmd, "running"))
         flags |= VIR_DOMAIN_SAVE_RUNNING;
     if (vshCommandOptBool(cmd, "paused"))
@@ -5308,15 +5319,34 @@ cmdRestore(vshControl *ctl, const vshCmd *cmd)
     if (vshCommandOptBool(cmd, "reset-nvram"))
         flags |= VIR_DOMAIN_SAVE_RESET_NVRAM;
 
+    if (vshCommandOptString(ctl, cmd, "file", &from) < 0)
+        return false;
+    if (from &&
+        virTypedParamsAddString(&params, &nparams, &maxparams,
+                                VIR_DOMAIN_SAVE_PARAM_FILE, from) < 0)
+        return false;
+
     if (vshCommandOptString(ctl, cmd, "xml", &xmlfile) < 0)
         return false;
 
     if (xmlfile &&
         virFileReadAll(xmlfile, VSH_MAX_XML_FILE, &xml) < 0)
         return false;
+    if (xml &&
+        virTypedParamsAddString(&params, &nparams, &maxparams,
+                                VIR_DOMAIN_SAVE_PARAM_DXML, xml) < 0)
+        return false;
+
+    if ((rc = vshCommandOptInt(ctl, cmd, "parallel-connections", &nchannels)) < 0) {
+        return false;
+    } else if (rc > 0) {
+        if (virTypedParamsAddInt(&params, &nparams, &maxparams,
+                                 VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS, nchannels) < 0)
+            return false;
+    }
 
     if (flags || xml) {
-        rc = virDomainRestoreFlags(priv->conn, from, xml, flags);
+        rc = virDomainRestoreParams(priv->conn, params, nparams, flags);
     } else {
         rc = virDomainRestore(priv->conn, from);
     }
--
2.35.3

Happy new year everyone! One resolution I have is to revive this topic and strive to get the feature merged :-).

On 8/8/24 17:37, Jim Fehlig wrote:
This series is essentially V1 of a prior RFC [1] to support QEMU's mapped-ram stream format [2] and migration capability. Along with supporting mapped-ram, it implements a design approach we discussed for supporting parallel save/restore [3]. In summary, the approach is
1. Add mapped-ram migration capability 2. Steal an element from save header 'unused' for a 'features' variable and bump save version to 3.
IIRC, the work stalled while looking for agreement on this part of the approach. It's implemented in patch 7, where I asked about using the 'format' field of SaveImageHeader instead of introducing another field:

https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/message/NV4E4O2STJQU7F52RFHWLE52C42NX75E/

In fact, after looking again with fresher eyes, I'm wondering if adding a new 'format' is enough? I.e., add a new element to the virQEMUSaveFormat enum. An older libvirt would refuse to process an image saved with the new format. This should allow us to avoid bumping the save version, which in turn avoids the version control knob in qemu.conf. Defaulting to mapped-ram would be a matter of changing the existing 'save_image_format' from 'raw' to 'sparse' (or whatever we want to call it).

Does this seem reasonable? Have I forgotten anything since this work stalled?

Regards,
Jim
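
[Editor's note: a sketch of that idea, assuming the current enumerators in src/qemu/qemu_saveimage.h; the 'sparse' name and its value are hypothetical placeholders, not code from the series:

typedef enum {
    QEMU_SAVE_FORMAT_RAW = 0,
    QEMU_SAVE_FORMAT_GZIP = 1,
    QEMU_SAVE_FORMAT_BZIP2 = 2,
    QEMU_SAVE_FORMAT_XZ = 3,
    QEMU_SAVE_FORMAT_LZOP = 4,
    /* hypothetical: mapped-ram image; not a compressed stream like the others */
    QEMU_SAVE_FORMAT_SPARSE = 5,

    QEMU_SAVE_FORMAT_LAST
} virQEMUSaveFormat;

Since image load already rejects a header 'format' value it does not recognize, an older libvirt would fail with a clear error rather than misreading the image, which is the property the proposal relies on.]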
[snip]
Claudio Fontana (2):
  include: Define constants for parallel save/restore
  tools: add parallel parameter to virsh restore command

Jim Fehlig (17):
  lib: virDomainSaveParams: Ensure absolute save path
  qemu_fd: Add function to retrieve fdset ID
  qemu: Add function to check capability in migration params
  qemu: Add function to get bool value from migration params
  qemu: Add mapped-ram migration capability
  qemu: Add function to get migration params for save
  qemu: QEMU_SAVE_VERSION: Bump to version 3
  qemu: conf: Add setting for save image version
  qemu: Add helper function for creating save image fd
  qemu: Add support for mapped-ram on save
  qemu: Decompose qemuSaveImageOpen
  qemu: Move creation of qemuProcessIncomingDef struct
  qemu: Apply migration parameters in qemuMigrationDstRun
  qemu: Add support for mapped-ram on restore
  qemu: Support O_DIRECT with mapped-ram on save
  qemu: Support O_DIRECT with mapped-ram on restore
  qemu: Add support for parallel save and restore

Li Zhang (1):
  tools: add parallel parameter to virsh save command

 docs/manpages/virsh.rst            |   9 +-
 include/libvirt/libvirt-domain.h   |  13 ++
 src/libvirt-domain.c               |  52 +++++--
 src/qemu/libvirtd_qemu.aug         |   1 +
 src/qemu/qemu.conf.in              |   6 +
 src/qemu/qemu_conf.c               |  16 +++
 src/qemu/qemu_conf.h               |   5 +
 src/qemu/qemu_driver.c             | 104 +++++++++-----
 src/qemu/qemu_fd.c                 |  18 +++
 src/qemu/qemu_fd.h                 |   3 +
 src/qemu/qemu_migration.c          | 192 +++++++++++++++++--------
 src/qemu/qemu_migration.h          |   9 +-
 src/qemu/qemu_migration_params.c   |  86 ++++++++++++
 src/qemu/qemu_migration_params.h   |  17 +++
 src/qemu/qemu_monitor.c            |  39 ++++++
 src/qemu/qemu_monitor.h            |   5 +
 src/qemu/qemu_process.c            | 120 +++++++++++-----
 src/qemu/qemu_process.h            |  19 ++-
 src/qemu/qemu_saveimage.c          | 216 ++++++++++++++++++++---------
 src/qemu/qemu_saveimage.h          |  35 +++--
 src/qemu/qemu_snapshot.c           |  26 ++--
 src/qemu/test_libvirtd_qemu.aug.in |   1 +
 tools/virsh-domain.c               |  79 +++++++++--
 23 files changed, 827 insertions(+), 244 deletions(-)

On 1/8/25 16:38, Jim Fehlig wrote:
Happy new year everyone! One resolution I have is to revive this topic and strive to get the feature merged :-).
On 8/8/24 17:37, Jim Fehlig wrote:
This series is essentially V1 of a prior RFC [1] to support QEMU's mapped-ram stream format [2] and migration capability. Along with supporting mapped-ram, it implements a design approach we discussed for supporting parallel save/restore [3]. In summary, the approach is
1. Add mapped-ram migration capability 2. Steal an element from save header 'unused' for a 'features' variable and bump save version to 3.
IIRC, the work stalled while looking for agreement on this part of the approach. It's implemented in patch 7, where I asked about using the 'format' field of SaveImageHeader instead of introducing another field:
https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/message/NV4E4O2STJQU7F52RFHWLE52C42NX75E/
In fact, after looking again with fresher eyes, I'm wondering if adding a new 'format' is enough? I.e., add a new element to virQEMUSaveFormat enum. An older libvirt would refuse to process an image saved with the new format. This should allow us to avoid bumping the save version, which in turn avoids the version control knob in qemu.conf. Defaulting to mapped-ram would be a matter of changing the existing 'save_image_format' from 'raw' to 'sparse' (or whatever we want to call it).
Does this seem reasonable? Have I forgotten anything since this work stalled?
FYI, I've been adjusting the series according to this proposal. Looks good so far. I won't be able to work on it tomorrow, but should have an updated patchset posted soon.

Regards,
Jim