[libvirt RFCv11 00/33] multifd save restore prototype

This is v11 of the multifd save prototype, which focuses on saving to a
single file instead of requiring multiple separate files for multifd
channels. This series demonstrates a way to save and restore from a
single file by using interleaved channels of a size equal to the
transfer buffer size (1MB), and relying on UNIX holes to avoid wasting
physical disk space due to channels of different size.

KNOWN ISSUES:

a) still applies only to save/restore (no managed save etc)

b) this is not yet done in QEMU itself, where it could be possible to
   teach QEMU to migrate directly to a file or block device in a
   block-aligned way by altering the migration stream code and the
   state migration code of all devices.

changes from v10:

* virfile: add new API virFileDiskCopyChannel, which extends the
  existing virFileDiskCopy to work with parallel channels in the file.
* drop use of the virthread API, use GLib for threads.
* pass only a single FD to the multifd-helper, which will then open
  additional FDs as required for the multithreaded I/O.
* simplify the virQEMUSaveFd API, separating the initialization from
  the addition of extra channels.
* adapt all documentation to mention a single file instead of multiple.
* remove the "Lim" versions of virFileDirectRead and Write, as they
  are not needed.

---

changes from v9:

* exposed virFileDirectAlign
* separated the >= 2 QEMU_SAVE_VERSION change into its own patch
* reworked the write code to add the alignment padding to data_len,
  making the on-disk format compatible when loaded by an older libvirt.
* reworked the read code to use direct I/O APIs only for actual direct
  I/O file descriptors, so as to make old images work with newer
  libvirt.

---

changes from v8:

* rebased on master
* reordered patches to add more upstreamable content at the start
* split the introduction of virQEMUSaveFd, so the first part is
  multifd-free
* new virQEMUSaveDataRead as a mirror of virQEMUSaveDataWrite
* introduced the virFileDirect API, using it in the virFileDisk
  operations and for virQEMUSaveRead and virQEMUSaveWrite

---

changes from v7:

* [ base params API and iohelper refactoring upstreamed ]
* extended the QEMU save image format more, to record the number of
  multifd channels on save. Made the data header struct packed.
* removed --parallel-connections from the restore command, as it is
  now useless due to the QEMU save image format extension.
* separated out patches to expose the migration_params APIs to
  saveimage, including qemuMigrationParamsSetString, SetCap, SetInt.
* fixed bugs in the ImageOpen patch (missing saveFd init), removed
  some whitespace, and fixed some convoluted code paths for return
  value -3.

---

changes from v6:

* improved error path handling, with error messages and especially
  cancellation of the qemu process on error during restore.
* split patches more and reordered them to keep general refactoring at
  the beginning, before the --parallel stuff is introduced.
* improved multifd compression support, including adding an enum and
  extending the QEMU save image format to record the compression used
  on save, and pick it up automatically on restore.

---

changes from v4:

* runIO renamed to virFileDiskCopy and its arguments rethought
* renamed the new APIs from ...ParametersFlags to ...Params
* introduce the new virDomainSaveParams and virDomainRestoreParams
  without any additional parameters, so they can be upstreamed first.
* solved the issue in the gendispatch.pl script generating code that
  was missing the conn parameter.
---

changes from v3:

* reordered the series to have all helper-related changes at the start
* solved all reported issues from ninja test, including documentation
* fixed most of the broken migration capabilities code (likely still
  imperfect)
* added G_GNUC_UNUSED as needed
* after multifd restore, added what I think were the missing
  operations: qemuProcessRefreshState(), qemuProcessStartCPUs(), and
  most importantly virDomainObjSave(). The domain now starts running
  after restore without further encouragement.
* removed the sleep(10) from the multifd-helper

---

changes from v2:

* added the ability to restore the VM from disk using multifd
* fixed the multifd-helper to work in both directions, assuming the
  need to listen for save, and connect for restore.
* fixed a large number of bugs, and probably introduced some :-)

---

Claudio Fontana (33):
  virfile: introduce virFileDirect APIs
  virfile: use virFileDirect API in runIOCopy
  qemu: saveimage: rework image read/write to be O_DIRECT friendly
  qemu: saveimage: assume future formats will also support compression
  virfile: virFileDiskCopy: prepare for O_DIRECT files without wrapper
  qemu: saveimage: introduce virQEMUSaveFd
  qemu: saveimage: convert qemuSaveImageCreate to use virQEMUSaveFd
  qemu: saveimage: convert qemuSaveImageOpen to use virQEMUSaveFd
  tools: prepare doSave to use parameters
  tools: prepare cmdRestore to use parameters
  libvirt: add new VIR_DOMAIN_SAVE_PARALLEL flag and parameter
  qemu: add stub support for VIR_DOMAIN_SAVE_PARALLEL in save
  qemu: add stub support for VIR_DOMAIN_SAVE_PARALLEL in restore
  virfile: add new API virFileDiskCopyChannel
  multifd-helper: new helper for parallel save/restore
  qemu: saveimage: update virQEMUSaveFd struct for parallel save
  qemu: saveimage: wire up saveimage code with the multifd helper
  qemu: capabilities: add multifd to the probed migration capabilities
  qemu: saveimage: add multifd related fields to save format
  qemu: migration_params: add APIs to set Int and Cap
  qemu: migration: implement qemuMigrationSrcToFilesMultiFd for save
  qemu: add parameter to qemuMigrationDstRun to skip waiting
  qemu: implement qemuSaveImageLoadMultiFd for restore
  tools: add parallel parameter to virsh save command
  tools: add parallel parameter to virsh restore command
  qemu: add migration parameter multifd-compression
  libvirt: add new VIR_DOMAIN_SAVE_PARAM_PARALLEL_COMPRESSION
  qemu: saveimage: add parallel compression argument to ImageCreate
  qemu: saveimage: add stub support for multifd compression parameter
  qemu: migration: expose qemuMigrationParamsSetString
  qemu: saveimage: implement multifd-compression in parallel save
  qemu: saveimage: restore compressed parallel images
  tools: add parallel-compression parameter to virsh save command

 docs/manpages/virsh.rst                       |  26 +-
 include/libvirt/libvirt-domain.h              |  24 +
 po/POTFILES                                   |   1 +
 src/libvirt_private.syms                      |   6 +
 src/qemu/qemu_capabilities.c                  |   4 +
 src/qemu/qemu_capabilities.h                  |   2 +
 src/qemu/qemu_driver.c                        | 146 ++--
 src/qemu/qemu_migration.c                     | 160 ++--
 src/qemu/qemu_migration.h                     |  16 +-
 src/qemu/qemu_migration_params.c              |  71 +-
 src/qemu/qemu_migration_params.h              |  15 +
 src/qemu/qemu_process.c                       |   3 +-
 src/qemu/qemu_process.h                       |   5 +-
 src/qemu/qemu_saveimage.c                     | 703 +++++++++++++-----
 src/qemu/qemu_saveimage.h                     |  69 +-
 src/qemu/qemu_snapshot.c                      |   6 +-
 src/util/iohelper.c                           |   3 +
 src/util/meson.build                          |  16 +
 src/util/multifd-helper.c                     | 359 +++++++++
 src/util/virfile.c                            | 391 +++++++---
 src/util/virfile.h                            |  11 +
 .../caps_4.0.0.aarch64.xml                    |   1 +
 .../qemucapabilitiesdata/caps_4.0.0.ppc64.xml |   1 +
 .../caps_4.0.0.riscv32.xml                    |   1 +
 .../caps_4.0.0.riscv64.xml                    |   1 +
 .../qemucapabilitiesdata/caps_4.0.0.s390x.xml |   1 +
 .../caps_4.0.0.x86_64.xml                     |   1 +
 .../caps_4.1.0.x86_64.xml                     |   1 +
 .../caps_4.2.0.aarch64.xml                    |   1 +
 .../qemucapabilitiesdata/caps_4.2.0.ppc64.xml |   1 +
 .../qemucapabilitiesdata/caps_4.2.0.s390x.xml |   1 +
 .../caps_4.2.0.x86_64.xml                     |   1 +
 .../caps_5.0.0.aarch64.xml                    |   2 +
 .../qemucapabilitiesdata/caps_5.0.0.ppc64.xml |   2 +
 .../caps_5.0.0.riscv64.xml                    |   2 +
 .../caps_5.0.0.x86_64.xml                     |   2 +
 .../qemucapabilitiesdata/caps_5.1.0.sparc.xml |   2 +
 .../caps_5.1.0.x86_64.xml                     |   2 +
 .../caps_5.2.0.aarch64.xml                    |   2 +
 .../qemucapabilitiesdata/caps_5.2.0.ppc64.xml |   2 +
 .../caps_5.2.0.riscv64.xml                    |   2 +
 .../qemucapabilitiesdata/caps_5.2.0.s390x.xml |   2 +
 .../caps_5.2.0.x86_64.xml                     |   2 +
 .../caps_6.0.0.aarch64.xml                    |   2 +
 .../qemucapabilitiesdata/caps_6.0.0.s390x.xml |   2 +
 .../caps_6.0.0.x86_64.xml                     |   2 +
 .../caps_6.1.0.x86_64.xml                     |   2 +
 .../caps_6.2.0.aarch64.xml                    |   2 +
 .../qemucapabilitiesdata/caps_6.2.0.ppc64.xml |   2 +
 .../caps_6.2.0.x86_64.xml                     |   2 +
 .../caps_7.0.0.aarch64.xml                    |   2 +
 .../qemucapabilitiesdata/caps_7.0.0.ppc64.xml |   2 +
 .../caps_7.0.0.x86_64.xml                     |   2 +
 .../caps_7.1.0.x86_64.xml                     |   2 +
 tools/virsh-domain.c                          | 101 ++-
 55 files changed, 1748 insertions(+), 445 deletions(-)
 create mode 100644 src/util/multifd-helper.c

-- 
2.26.2
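For reference, a minimal sketch of the interleaved on-disk layout described
above, under the cover letter's stated assumptions (1MB records, channels
striped round-robin after the aligned libvirt header); the helper function
and its names are illustrative, not code from the series:

    #include <stdint.h>

    #define RECORD_SIZE (1024 * 1024) /* equal to the transfer buffer size */

    /* Channel c stores its n-th record at a fixed, computable offset;
     * records never written by shorter channels remain UNIX holes. */
    static inline uint64_t
    channel_record_offset(uint64_t header_end,  /* aligned end of libvirt header */
                          unsigned int channel, /* 0 .. nchannels - 1 */
                          unsigned int nchannels,
                          uint64_t record)      /* n-th record of this channel */
    {
        return header_end + (record * nchannels + channel) * (uint64_t)RECORD_SIZE;
    }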

[libvirt RFCv11 01/33] virfile: introduce virFileDirect APIs

These functions help with allocating buffers, aligning, reading and
writing files opened with O_DIRECT.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
---
 src/libvirt_private.syms |   6 ++
 src/util/virfile.c       | 174 +++++++++++++++++++++++++++++++++++++++
 src/util/virfile.h       |   9 ++
 3 files changed, 189 insertions(+)

diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index bfedd85326..702f640b5d 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -2240,7 +2240,13 @@ virFileComparePaths;
 virFileCopyACLs;
 virFileDataSync;
 virFileDeleteTree;
+virFileDirectAlign;
+virFileDirectBufferNew;
+virFileDirectCopyBuf;
 virFileDirectFdFlag;
+virFileDirectRead;
+virFileDirectReadCopy;
+virFileDirectWrite;
 virFileExists;
 virFileFclose;
 virFileFdopen;
diff --git a/src/util/virfile.c b/src/util/virfile.c
index 99da058db3..e8500704e5 100644
--- a/src/util/virfile.c
+++ b/src/util/virfile.c
@@ -192,6 +192,135 @@ virFileDirectFdFlag(void)
     return O_DIRECT ? O_DIRECT : -1;
 }
 
+/**
+ * virFileDirectAlign: align a value, which may include a pointer.
+ *
+ * @value: the value that will be incremented to reach alignment
+ *
+ * Returns the aligned value.
+ */
+uintptr_t
+virFileDirectAlign(uintptr_t value)
+{
+    return (value + VIR_FILE_DIRECT_ALIGN_MASK) & ~VIR_FILE_DIRECT_ALIGN_MASK;
+}
+
+/**
+ * virFileDirectRead: perform a single aligned read.
+ * @fd: O_DIRECT file descriptor
+ * @buf: aligned buffer to read into
+ * @count: the desired read size. Note that if unaligned,
+ *         extra bytes will be read based on alignment.
+ *
+ * Note that buf should be able to contain count plus the alignment!
+ *
+ * Returns < 0 with errno set on error, or the number of bytes read,
+ * which may be smaller or even greater than count.
+ */
+ssize_t
+virFileDirectRead(int fd, void *buf, size_t count)
+{
+    size_t aligned_count = virFileDirectAlign(count);
+
+    while (count > 0) {
+        ssize_t r = read(fd, buf, aligned_count);
+        if (r < 0 && errno == EINTR)
+            continue;
+        return r;
+    }
+    return 0;
+}
+
+/**
+ * virFileDirectCopyBuf: copy buffer contents to a destination in memory
+ * @buf: aligned buffer to copy from
+ * @count: amount of data to copy
+ * @dst: the destination in memory
+ * @dst_len: the maximum the destination can hold
+ *
+ * Copies up to count bytes from buf into dst, but not more than dst_len.
+ * Increments the buf and dst pointers, and decrements the maximum the
+ * destination can hold (dst_len).
+ *
+ * Returns the amount copied.
+ */
+size_t
+virFileDirectCopyBuf(void **buf, size_t count, void **dst, size_t *dst_len)
+{
+    size_t to_copy;
+
+    to_copy = count > *dst_len ? *dst_len : count;
+    memcpy(*dst, *buf, to_copy);
+    *dst_len -= to_copy;
+    *(char **)dst += to_copy;
+    *(char **)buf += to_copy;
+    return to_copy;
+}
+
+/**
+ * virFileDirectReadCopy: read an fd and copy the contents to memory
+ * @fd: O_DIRECT file descriptor
+ * @buf: aligned buffer to read the fd into, and then copy from
+ * @buflen: size of the buffer
+ * @dst: the destination in memory
+ * @dst_len: the maximum the destination can hold
+ *
+ * Reads data from the fd file descriptor into the buffer,
+ * then copies it to the destination, filling it up to dst_len.
+ *
+ * Returns < 0 with errno set on error,
+ * or the number of bytes read, which may be past the requested dst_len,
+ * or may be smaller if the fd does not contain enough data.
+ *
+ * The buf pointer is updated to point to any excess data in the buffer.
+ */
+ssize_t
+virFileDirectReadCopy(int fd, void **buf, size_t buflen, void *dst, size_t dst_len)
+{
+    ssize_t nread = 0;
+    void *d = dst;
+    char *s = *buf;
+
+    while (dst_len > 0) {
+        ssize_t rv;
+        *buf = s;
+        rv = virFileDirectRead(fd, s, dst_len < buflen ? dst_len : buflen);
+        if (rv < 0)
+            return rv;
+        if (rv == 0)
+            return nread; /* not enough data to fulfill the request */
+
+        nread += rv; /* note, we might read past the requested len */
+        virFileDirectCopyBuf(buf, rv, &d, &dst_len);
+    }
+    return nread;
+}
+
+/**
+ * virFileDirectWrite: perform a single aligned write.
+ * @fd: O_DIRECT file descriptor to write to
+ * @buf: aligned buffer to write from
+ * @count: the desired write size. Note that if unaligned,
+ *         extra zero bytes will be written based on alignment.
+ *
+ * Returns < 0 with errno set on error, or the number of bytes written,
+ * which may be smaller or even greater than count.
+ */
+ssize_t
+virFileDirectWrite(int fd, void *buf, size_t count)
+{
+    size_t aligned_count = virFileDirectAlign(count);
+
+    if (aligned_count > count)
+        memset((char *)buf + count, 0, aligned_count - count);
+
+    while (count > 0) {
+        ssize_t r = write(fd, buf, aligned_count); /* sc_avoid_write */
+        if (r < 0 && errno == EINTR)
+            continue;
+        return r;
+    }
+    return 0;
+}
+
 /* Opaque type for managing a wrapper around a fd.  For now,
  * read-write is not supported, just a single direction. */
 struct _virFileWrapperFd {
@@ -202,6 +331,41 @@ struct _virFileWrapperFd {
 
 #ifndef WIN32
 
+/**
+ * virFileDirectBufferNew: allocate a buffer and return the first
+ * block-aligned address in it.
+ *
+ * @alloc_base: pointer to the to-be-allocated memory buffer
+ * @buflen: desired length, which should be greater than the alignment
+ *
+ * Allocate a memory area large enough to accommodate an aligned
+ * buffer of size buflen.
+ *
+ * On success, *alloc_base is set to the newly allocated memory,
+ * and the aligned buffer within it is returned.
+ *
+ * On failure, *alloc_base is set to NULL and the function
+ * returns NULL.
+ */
+void *
+virFileDirectBufferNew(void **alloc_base, size_t buflen)
+{
+    void *buf;
+
+    buflen = virFileDirectAlign(buflen);
+
+# if WITH_POSIX_MEMALIGN
+    if (posix_memalign(alloc_base, VIR_FILE_DIRECT_ALIGN_MASK + 1, buflen)) {
+        *alloc_base = NULL;
+        return NULL;
+    }
+    buf = *alloc_base;
+# else
+    *alloc_base = g_malloc(buflen + VIR_FILE_DIRECT_ALIGN_MASK);
+    buf = (void *)virFileDirectAlign((uintptr_t)*alloc_base);
+# endif
+    return buf;
+}
+
 # ifdef __linux__
 
 /**
@@ -372,6 +536,16 @@ virFileWrapperFdNew(int *fd, const char *name, unsigned int flags)
     return NULL;
 }
 #else /* WIN32 */
+
+void *
+virFileDirectBufferNew(void **alloc_base G_GNUC_UNUSED,
+                       size_t buflen G_GNUC_UNUSED)
+{
+    virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                   _("virFileDirectBufferNew unsupported on this platform"));
+    return NULL;
+}
+
 virFileWrapperFd *
 virFileWrapperFdNew(int *fd G_GNUC_UNUSED,
                     const char *name G_GNUC_UNUSED,
diff --git a/src/util/virfile.h b/src/util/virfile.h
index 8e378efe30..844261e0a4 100644
--- a/src/util/virfile.h
+++ b/src/util/virfile.h
@@ -104,6 +104,15 @@ typedef struct _virFileWrapperFd virFileWrapperFd;
 
 int virFileDirectFdFlag(void);
 
+#define VIR_FILE_DIRECT_ALIGN_MASK ((64 * 1024) - 1)
+
+void *virFileDirectBufferNew(void **alloc_base, size_t buflen);
+uintptr_t virFileDirectAlign(uintptr_t value);
+ssize_t virFileDirectRead(int fd, void *buf, size_t count);
+ssize_t virFileDirectWrite(int fd, void *buf, size_t count);
+size_t virFileDirectCopyBuf(void **buf, size_t count, void **dst, size_t *dst_len);
+ssize_t virFileDirectReadCopy(int fd, void **buf, size_t buflen, void *dst, size_t dst_len);
+
 typedef enum {
     VIR_FILE_WRAPPER_BYPASS_CACHE   = (1 << 0),
     VIR_FILE_WRAPPER_NON_BLOCKING   = (1 << 1),
-- 
2.26.2
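As a usage illustration (not part of the patch), the intended calling
pattern for the new helpers might look as follows; copy_one_chunk() is a
hypothetical caller and both fds are assumed to be opened with O_DIRECT:

    /* Allocate an aligned buffer, read one chunk, write it back out. */
    static ssize_t
    copy_one_chunk(int fd_in, int fd_out)
    {
        g_autofree void *base = NULL;
        size_t buflen = 1024 * 1024; /* already a multiple of the 64KiB alignment */
        char *buf = virFileDirectBufferNew(&base, buflen);
        ssize_t got;

        if (!buf)
            return -1;
        if ((got = virFileDirectRead(fd_in, buf, buflen)) < 0)
            return -1;
        /* a short final chunk is zero-padded up to the block size on write */
        if (got > 0 && virFileDirectWrite(fd_out, buf, got) < 0)
            return -1;
        return got;
    }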

[libvirt RFCv11 02/33] virfile: use virFileDirect API in runIOCopy

Signed-off-by: Claudio Fontana <cfontana@suse.de>
---
 src/util/virfile.c | 28 ++++++++--------------------
 1 file changed, 8 insertions(+), 20 deletions(-)

diff --git a/src/util/virfile.c b/src/util/virfile.c
index e8500704e5..770649108f 100644
--- a/src/util/virfile.c
+++ b/src/util/virfile.c
@@ -4776,20 +4776,14 @@ static off_t
 runIOCopy(const struct runIOParams p)
 {
     g_autofree void *base = NULL; /* Location to be freed */
-    char *buf = NULL; /* Aligned location within base */
-    size_t buflen = 1024*1024;
-    intptr_t alignMask = 64*1024 - 1;
     off_t total = 0;
+    size_t buflen = 1024*1024;
+    char *buf = virFileDirectBufferNew(&base, buflen);
 
-# if WITH_POSIX_MEMALIGN
-    if (posix_memalign(&base, alignMask + 1, buflen))
-        abort();
-    buf = base;
-# else
-    buf = g_new0(char, buflen + alignMask);
-    base = buf;
-    buf = (char *) (((intptr_t) base + alignMask) & ~alignMask);
-# endif
+    if (!buf) {
+        virReportSystemError(errno, _("Failed to allocate aligned memory in function %s"), __FUNCTION__);
+        return -5;
+    }
 
     while (1) {
         ssize_t got;
@@ -4801,9 +4795,7 @@ runIOCopy(const struct runIOParams p)
          * In other cases using saferead reduces number of syscalls.
          */
         if (!p.isWrite && p.isDirect) {
-            if ((got = read(p.fdin, buf, buflen)) < 0 &&
-                errno == EINTR)
-                continue;
+            got = virFileDirectRead(p.fdin, buf, buflen);
         } else {
             got = saferead(p.fdin, buf, buflen);
         }
@@ -4819,11 +4811,7 @@ runIOCopy(const struct runIOParams p)
 
         /* handle last write size align in direct case */
         if (got < buflen && p.isDirect && p.isWrite) {
-            ssize_t aligned_got = (got + alignMask) & ~alignMask;
-
-            memset(buf + got, 0, aligned_got - got);
-
-            if (safewrite(p.fdout, buf, aligned_got) < 0) {
+            if (virFileDirectWrite(p.fdout, buf, got) < 0) {
                 virReportSystemError(errno, _("Unable to write %s"), p.fdoutname);
                 return -3;
             }
-- 
2.26.2

[libvirt RFCv11 03/33] qemu: saveimage: rework image read/write to be O_DIRECT friendly

Maintain a compatible image format (still QEMU_SAVE_VERSION 2), but
change the on-disk representation to be more O_DIRECT friendly:

1) ensure that the header struct fields are packed, so we can be sure
   no padding will ever ruin the day in the future

2) finish the libvirt header (header + xml + cookie) with zero padding,
   in order to ensure that the QEMU VM state (the "QEVM" magic) is
   aligned.

Adapt the read and write of the libvirt header accordingly.

Old images should still be loadable after this change, and the new
images should still be loadable by an old libvirt as well.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
---
 src/qemu/qemu_saveimage.c | 267 ++++++++++++++++++++++++--------------
 src/qemu/qemu_saveimage.h |  20 +--
 2 files changed, 182 insertions(+), 105 deletions(-)

diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c
index 4fd4c5cfcd..276033ff6a 100644
--- a/src/qemu/qemu_saveimage.c
+++ b/src/qemu/qemu_saveimage.c
@@ -139,12 +139,12 @@ virQEMUSaveDataWrite(virQEMUSaveData *data,
                      int fd,
                      const char *path)
 {
+    g_autofree void *base = NULL;
+    char *buf, *cur;
     virQEMUSaveHeader *header = &data->header;
     size_t len;
     size_t xml_len;
     size_t cookie_len = 0;
-    size_t zerosLen = 0;
-    g_autofree char *zeros = NULL;
 
     xml_len = strlen(data->xml) + 1;
     if (data->cookie)
@@ -165,42 +165,190 @@ virQEMUSaveDataWrite(virQEMUSaveData *data,
             return -1;
         }
     }
+    /* align data_len */
+    len = sizeof(*header) + header->data_len;
+    header->data_len += virFileDirectAlign(len) - len;
 
-    zerosLen = header->data_len - len;
-    zeros = g_new0(char, zerosLen);
+    buf = virFileDirectBufferNew(&base, sizeof(*header) + header->data_len);
+    cur = buf;
 
     if (data->cookie)
         header->cookieOffset = xml_len;
 
-    if (safewrite(fd, header, sizeof(*header)) != sizeof(*header)) {
-        virReportSystemError(errno,
-                             _("failed to write header to domain save file '%s'"),
-                             path);
-        return -1;
+    memcpy(cur, header, sizeof(*header));
+    cur += sizeof(*header);
+    memcpy(cur, data->xml, xml_len);
+    cur += xml_len;
+    if (data->cookie) {
+        memcpy(cur, data->cookie, cookie_len);
+        cur += cookie_len;
     }
 
-    if (safewrite(fd, data->xml, xml_len) != xml_len) {
+    if (virFileDirectWrite(fd, buf, sizeof(*header) + header->data_len) < 0) {
         virReportSystemError(errno,
-                             _("failed to write domain xml to '%s'"),
+                             _("failed to write libvirt header of domain save file '%s'"),
                              path);
         return -1;
     }
 
-    if (data->cookie &&
-        safewrite(fd, data->cookie, cookie_len) != cookie_len) {
+    return 0;
+}
+
+/* virQEMUSaveDataRead:
+ *
+ * Reads libvirt's header (including domain XML) from a saved image.
+ *
+ * Returns -1 on generic failure, -3 on a corrupted image, or 0 on success.
+ */
+int
+virQEMUSaveDataRead(virQEMUSaveData *data,
+                    int fd,
+                    const char *path)
+{
+    g_autofree void *base = NULL;
+    virQEMUSaveHeader *header = &data->header;
+    size_t xml_len;
+    size_t cookie_len;
+    ssize_t rv;
+    void *dst;
+    int oflags = fcntl(fd, F_GETFL);
+    bool isDirect = O_DIRECT && (oflags & O_DIRECT);
+    size_t buflen = 1024 * 1024;
+    char *buf = NULL;
+    void *src = NULL;
+
+    header = &data->header;
+    if (isDirect) {
+        buf = virFileDirectBufferNew(&base, buflen);
+        src = buf;
+        rv = virFileDirectReadCopy(fd, &src, buflen, header, sizeof(*header));
+    } else {
+        rv = saferead(fd, header, sizeof(*header));
+    }
+    if (rv < 0) {
         virReportSystemError(errno,
-                             _("failed to write cookie to '%s'"),
+                             _("failed to read libvirt header of domain save file '%s'"),
                              path);
         return -1;
     }
+    if (rv < sizeof(*header)) {
+        virReportError(VIR_ERR_OPERATION_FAILED,
+                       _("domain save file '%s' libvirt header appears truncated"),
+                       path);
+        return -3;
+    }
+    rv -= sizeof(*header);
 
-    if (safewrite(fd, zeros, zerosLen) != zerosLen) {
-        virReportSystemError(errno,
-                             _("failed to write padding to '%s'"),
-                             path);
+    if (memcmp(header->magic, QEMU_SAVE_MAGIC, sizeof(header->magic)) != 0) {
+        if (memcmp(header->magic, QEMU_SAVE_PARTIAL, sizeof(header->magic)) == 0) {
+            virReportError(VIR_ERR_OPERATION_FAILED,
+                           _("domain save file '%s' seems incomplete"),
+                           path);
+            return -3;
+        }
+        virReportError(VIR_ERR_OPERATION_FAILED, "%s",
+                       _("image magic is incorrect"));
+        return -1;
+    }
+    if (header->version > QEMU_SAVE_VERSION) {
+        /* convert endianness and try again */
+        qemuSaveImageBswapHeader(header);
+    }
+    if (header->version > QEMU_SAVE_VERSION) {
+        virReportError(VIR_ERR_OPERATION_FAILED,
+                       _("image version is not supported (%d > %d)"),
+                       header->version, QEMU_SAVE_VERSION);
         return -1;
     }
+    if (header->cookieOffset)
+        xml_len = header->cookieOffset;
+    else
+        xml_len = header->data_len;
+    if (xml_len <= 0) {
+        virReportError(VIR_ERR_OPERATION_FAILED,
+                       _("invalid xml length: %lu"), (unsigned long)xml_len);
+        return -1;
+    }
+    if (header->data_len < xml_len) {
+        virReportError(VIR_ERR_OPERATION_FAILED,
+                       _("invalid cookieOffset: %u"), header->cookieOffset);
+        return -1;
+    }
+    cookie_len = header->data_len - xml_len;
+    data->xml = g_new0(char, xml_len);
+    dst = data->xml;
+    if (rv > 0) {
+        if (isDirect) {
+            rv -= virFileDirectCopyBuf(&src, rv, &dst, &xml_len);
+        } else {
+            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                           _("unexpected leftover data in non-direct mode"));
+            return -1;
+        }
+    }
+    if (xml_len > 0) {
+        if (isDirect) {
+            rv = virFileDirectReadCopy(fd, &src, buflen, dst, xml_len);
+        } else {
+            rv = saferead(fd, dst, xml_len);
+        }
+        if (rv < 0) {
+            virReportSystemError(errno,
+                                 _("failed to read libvirt xml in domain save file '%s'"),
+                                 path);
+            return -1;
+        }
+        if (rv < xml_len) {
+            virReportError(VIR_ERR_OPERATION_FAILED,
+                           _("domain save file '%s' xml seems incomplete"),
+                           path);
+            return -3;
+        }
+        rv -= xml_len;
+    }
+    if (cookie_len > 0) {
+        data->cookie = g_new0(char, cookie_len);
+        dst = data->cookie;
+        if (rv > 0) {
+            if (isDirect) {
+                rv -= virFileDirectCopyBuf(&src, rv, &dst, &cookie_len);
+            } else {
+                virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                               _("unexpected leftover data in non-direct mode"));
+                return -1;
+            }
+        }
+        if (cookie_len > 0) {
+            if (isDirect) {
+                rv = virFileDirectReadCopy(fd, &src, buflen, dst, cookie_len);
+            } else {
+                rv = saferead(fd, dst, cookie_len);
+            }
+            if (rv < 0) {
+                virReportSystemError(errno,
                                     _("failed to read libvirt cookie in domain save file '%s'"),
                                     path);
+                return -1;
+            }
+            if (rv < cookie_len) {
+                virReportError(VIR_ERR_OPERATION_FAILED,
+                               _("domain save file '%s' cookie seems incomplete"),
+                               path);
+                return -3;
+            }
+            rv -= cookie_len;
+        }
+    }
+    /*
+     * we should now be aligned and ready to read the QEVM.
+     */
+    if (rv > 0) {
+        if (isDirect) {
+            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                           _("unexpected unaligned data in direct mode"));
+        } else {
+            virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                           _("unexpected leftover data in non-direct mode"));
+            return -1;
+        }
+    }
     return 0;
 }
 
@@ -444,11 +592,8 @@ qemuSaveImageOpen(virQEMUDriver *driver,
     VIR_AUTOCLOSE fd = -1;
     int ret = -1;
     g_autoptr(virQEMUSaveData) data = NULL;
-    virQEMUSaveHeader *header;
     g_autoptr(virDomainDef) def = NULL;
     int oflags = open_write ? O_RDWR : O_RDONLY;
-    size_t xml_len;
-    size_t cookie_len;
 
     if (bypass_cache) {
         int directFlag = virFileDirectFdFlag();
@@ -469,89 +614,17 @@ qemuSaveImageOpen(virQEMUDriver *driver,
         return -1;
 
     data = g_new0(virQEMUSaveData, 1);
-
-    header = &data->header;
-    if (saferead(fd, header, sizeof(*header)) != sizeof(*header)) {
-        if (unlink_corrupt) {
+    ret = virQEMUSaveDataRead(data, fd, path);
+    if (ret < 0) {
+        if (unlink_corrupt && ret == -3) {
             if (unlink(path) < 0) {
                 virReportSystemError(errno,
                                      _("cannot remove corrupt file: %s"),
                                      path);
                 return -1;
-            } else {
-                return -3;
             }
         }
-
-        virReportError(VIR_ERR_OPERATION_FAILED,
-                       "%s", _("failed to read qemu header"));
-        return -1;
-    }
-
-    if (memcmp(header->magic, QEMU_SAVE_MAGIC, sizeof(header->magic)) != 0) {
-        if (memcmp(header->magic, QEMU_SAVE_PARTIAL, sizeof(header->magic)) == 0) {
-            if (unlink_corrupt) {
-                if (unlink(path) < 0) {
-                    virReportSystemError(errno,
-                                         _("cannot remove corrupt file: %s"),
-                                         path);
-                    return -1;
-                } else {
-                    return -3;
-                }
-            }
-
-            virReportError(VIR_ERR_OPERATION_FAILED, "%s",
-                           _("save image is incomplete"));
-            return -1;
-        }
-
-        virReportError(VIR_ERR_OPERATION_FAILED, "%s",
-                       _("image magic is incorrect"));
-        return -1;
-    }
-
-    if (header->version > QEMU_SAVE_VERSION) {
-        /* convert endianness and try again */
-        qemuSaveImageBswapHeader(header);
-    }
-
-    if (header->version > QEMU_SAVE_VERSION) {
-        virReportError(VIR_ERR_OPERATION_FAILED,
-                       _("image version is not supported (%d > %d)"),
-                       header->version, QEMU_SAVE_VERSION);
-        return -1;
-    }
-
-    if (header->data_len <= 0) {
-        virReportError(VIR_ERR_OPERATION_FAILED,
-                       _("invalid header data length: %d"), header->data_len);
-        return -1;
-    }
-
-    if (header->cookieOffset)
-        xml_len = header->cookieOffset;
-    else
-        xml_len = header->data_len;
-
-    cookie_len = header->data_len - xml_len;
-
-    data->xml = g_new0(char, xml_len);
-
-    if (saferead(fd, data->xml, xml_len) != xml_len) {
-        virReportError(VIR_ERR_OPERATION_FAILED,
-                       "%s", _("failed to read domain XML"));
-        return -1;
-    }
-
-    if (cookie_len > 0) {
-        data->cookie = g_new0(char, cookie_len);
-
-        if (saferead(fd, data->cookie, cookie_len) != cookie_len) {
-            virReportError(VIR_ERR_OPERATION_FAILED, "%s",
-                           _("failed to read cookie"));
-            return -1;
-        }
+        return ret;
     }
 
     /* Create a domain from this XML */
diff --git a/src/qemu/qemu_saveimage.h b/src/qemu/qemu_saveimage.h
index 391cd55ed0..f12374b3d7 100644
--- a/src/qemu/qemu_saveimage.h
+++ b/src/qemu/qemu_saveimage.h
@@ -36,14 +36,14 @@ G_STATIC_ASSERT(sizeof(QEMU_SAVE_MAGIC) == sizeof(QEMU_SAVE_PARTIAL));
 
 typedef struct _virQEMUSaveHeader virQEMUSaveHeader;
 struct _virQEMUSaveHeader {
-    char magic[sizeof(QEMU_SAVE_MAGIC)-1];
-    uint32_t version;
-    uint32_t data_len;
-    uint32_t was_running;
-    uint32_t compressed;
-    uint32_t cookieOffset;
-    uint32_t unused[14];
-};
+    char magic[sizeof(QEMU_SAVE_MAGIC)-1]; /* 16 bytes */
+    uint32_t version;                      /*  4 bytes */
+    uint32_t data_len;                     /*  4 bytes */
+    uint32_t was_running;                  /*  4 bytes */
+    uint32_t compressed;                   /*  4 bytes */
+    uint32_t cookieOffset;                 /*  4 bytes */
+    uint32_t unused[14];                   /* 56 bytes */
+} ATTRIBUTE_PACKED;                        /* = 92 bytes */
 
 typedef struct _virQEMUSaveData virQEMUSaveData;
@@ -103,6 +103,10 @@ int
 virQEMUSaveDataWrite(virQEMUSaveData *data,
                      int fd,
                      const char *path);
+int
+virQEMUSaveDataRead(virQEMUSaveData *data,
+                    int fd,
+                    const char *path);
 
 virQEMUSaveData *
 virQEMUSaveDataNew(char *domXML,
-- 
2.26.2
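To make the effect of the new padding concrete, a self-contained sketch
(an illustration, not code from the patch; align_up() mirrors
virFileDirectAlign, and 92 is the packed header size annotated above):

    #define ALIGN_MASK ((64 * 1024) - 1)

    static size_t
    align_up(size_t v)
    {
        return (v + ALIGN_MASK) & ~ALIGN_MASK;
    }

    /* data_len covers xml + cookie; the write path extends it with zero
     * padding so the QEMU "QEVM" stream starts at an aligned offset. */
    static size_t
    padded_data_len(size_t data_len)
    {
        size_t len = 92 + data_len;              /* sizeof(virQEMUSaveHeader) */
        return data_len + (align_up(len) - len); /* 92 + result is aligned */
    }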

[libvirt RFCv11 04/33] qemu: saveimage: assume future formats will also support compression

Change the version check to assume that even future versions of the
format will continue to support compression.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
---
 src/qemu/qemu_saveimage.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c
index 276033ff6a..af97f45114 100644
--- a/src/qemu/qemu_saveimage.c
+++ b/src/qemu/qemu_saveimage.c
@@ -674,7 +674,7 @@ qemuSaveImageStartVM(virConnectPtr conn,
                                virDomainXMLOptionGetSaveCookie(driver->xmlopt)) < 0)
         goto cleanup;
 
-    if ((header->version == 2) &&
+    if ((header->version >= 2) &&
         (header->compressed != QEMU_SAVE_FORMAT_RAW)) {
         if (!(cmd = qemuSaveImageGetCompressionCommand(header->compressed)))
             goto cleanup;
-- 
2.26.2

[libvirt RFCv11 05/33] virfile: virFileDiskCopy: prepare for O_DIRECT files without wrapper

We will allow the use of already open fds that are not empty, for both
read and write, as long as they are properly aligned. Adapt the
truncation to take the initial offset into account.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
---
 src/util/virfile.c | 44 +++++++++++++++++++++++---------------------
 1 file changed, 23 insertions(+), 21 deletions(-)

diff --git a/src/util/virfile.c b/src/util/virfile.c
index 770649108f..201d7f4e64 100644
--- a/src/util/virfile.c
+++ b/src/util/virfile.c
@@ -4811,14 +4811,24 @@ runIOCopy(const struct runIOParams p)
 
         /* handle last write size align in direct case */
         if (got < buflen && p.isDirect && p.isWrite) {
-            if (virFileDirectWrite(p.fdout, buf, got) < 0) {
+            ssize_t nwritten = virFileDirectWrite(p.fdout, buf, got);
+            if (nwritten < 0) {
                 virReportSystemError(errno, _("Unable to write %s"), p.fdoutname);
                 return -3;
             }
-
-            if (!p.isBlockDev && ftruncate(p.fdout, total) < 0) {
-                virReportSystemError(errno, _("Unable to truncate %s"), p.fdoutname);
-                return -4;
+            if (!p.isBlockDev) {
+                off_t off = lseek(p.fdout, (off_t)0, SEEK_CUR);
+                if (off < 0) {
+                    virReportSystemError(errno, "%s", _("Failed to lseek to get current file offset"));
+                    return -6;
+                }
+                if (nwritten > got) {
+                    off -= nwritten - got;
+                }
+                if (ftruncate(p.fdout, off) < 0) {
+                    virReportSystemError(errno, _("Unable to truncate %s"), p.fdoutname);
+                    return -4;
+                }
             }
 
             break;
@@ -4898,23 +4908,15 @@ virFileDiskCopy(int disk_fd, const char *disk_path, int remote_fd, const char *r
                        (oflags & O_ACCMODE));
         goto cleanup;
     }
-    /* To make the implementation simpler, we give up on any
-     * attempt to use O_DIRECT in a non-trivial manner. */
     if (!p.isBlockDev && p.isDirect) {
-        off_t off;
-        if (p.isWrite) {
-            /*
-             * note: for write we do not only check that disk_fd is seekable,
-             * we also want to know that the file is empty, so we need SEEK_END.
-             */
-            if ((off = lseek(disk_fd, 0, SEEK_END)) != 0) {
-                virReportSystemError(off < 0 ? errno : EINVAL, "%s",
-                                     _("O_DIRECT write needs empty seekable file"));
-                goto cleanup;
-            }
-        } else if ((off = lseek(disk_fd, 0, SEEK_CUR)) != 0) {
-            virReportSystemError(off < 0 ? errno : EINVAL, "%s",
-                                 _("O_DIRECT read needs entire seekable file"));
+        off_t off = lseek(disk_fd, 0, SEEK_CUR);
+        if (off < 0) {
+            virReportSystemError(errno, "%s", _("O_DIRECT needs a seekable file"));
+            goto cleanup;
+        }
+        if (virFileDirectAlign(off) != off) {
+            /* we could write some zeroes, but maybe it is safer to just fail */
+            virReportSystemError(EINVAL, "%s", _("O_DIRECT attempted with unaligned file pointer"));
             goto cleanup;
         }
     }
-- 
2.26.2
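The truncation logic above can be summarized in a short fragment (an
illustration under the patch's assumptions, not additional code): after
the zero-padded final O_DIRECT write, a regular file is truncated back to
the true data end so the padding does not survive on disk:

    ssize_t nwritten = virFileDirectWrite(fd, buf, got); /* may write > got */
    off_t off = lseek(fd, 0, SEEK_CUR);                  /* end of padded write */
    if (nwritten > got)
        off -= nwritten - got;                           /* drop the padding */
    if (ftruncate(fd, off) < 0)
        return -1;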

[libvirt RFCv11 06/33] qemu: saveimage: introduce virQEMUSaveFd

Use this data type to encapsulate the pathname, file descriptor,
wrapper, and need to unlink. We also carry along the oflags used to
open the file, and the driver configuration.

This will make management of the resources associated with an FD used
for QEMU save/restore much easier, reducing the amount of explicit
cleanup required.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
---
 src/qemu/qemu_saveimage.c | 115 ++++++++++++++++++++++++++++++++++++++
 src/qemu/qemu_saveimage.h |  23 ++++++++
 2 files changed, 138 insertions(+)

diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c
index af97f45114..4c440098fa 100644
--- a/src/qemu/qemu_saveimage.c
+++ b/src/qemu/qemu_saveimage.c
@@ -396,6 +396,121 @@ qemuSaveImageGetCompressionCommand(virQEMUSaveFormat compression)
     return ret;
 }
 
+/*
+ * virQEMUSaveFdInit: initialize a virQEMUSaveFd
+ *
+ * @saveFd: the structure to initialize
+ * @base: the file name
+ * @oflags: the file descriptor open flags
+ * @cfg: the driver config
+ *
+ * Returns -1 on error, 0 on success;
+ * in both cases virQEMUSaveFdFini must be called to free resources.
+ */
+int virQEMUSaveFdInit(virQEMUSaveFd *saveFd, const char *base,
+                      int oflags, virQEMUDriverConfig *cfg)
+{
+    unsigned int wrapperFlags = VIR_FILE_WRAPPER_NON_BLOCKING;
+    bool isCreat = oflags & O_CREAT;
+    bool isDirect = O_DIRECT && (oflags & O_DIRECT);
+
+    saveFd->oflags = oflags;
+    saveFd->cfg = cfg;
+
+    if (isDirect)
+        wrapperFlags |= VIR_FILE_WRAPPER_BYPASS_CACHE;
+
+    saveFd->path = g_strdup(base);
+    saveFd->wrapper = NULL;
+    if (isCreat) {
+        saveFd->fd = virQEMUFileOpenAs(cfg->user, cfg->group, false, saveFd->path,
+                                       oflags, &saveFd->need_unlink);
+    } else {
+        saveFd->fd = qemuDomainOpenFile(cfg, NULL, saveFd->path, oflags, NULL);
+    }
+    if (saveFd->fd < 0)
+        return -1;
+    /*
+     * For O_CREAT, we always add the wrapper,
+     * and for !O_CREAT, we only add the wrapper if using O_DIRECT.
+     */
+    if (isDirect || isCreat) {
+        saveFd->wrapper = virFileWrapperFdNew(&saveFd->fd, saveFd->path, wrapperFlags);
+        if (!saveFd->wrapper)
+            return -1;
+    }
+    return 0;
+}
+
+/*
+ * virQEMUSaveFdClose: close a virQEMUSaveFd descriptor with a normal close.
+ *
+ * @saveFd: the saveFd structure with the file descriptors to close
+ * @vm: the virDomainObj (necessary to release the lock), or NULL
+ *
+ * If saveFd is NULL, the function will return success.
+ *
+ * Returns -1 on error, 0 on success.
+ */
+int virQEMUSaveFdClose(virQEMUSaveFd *saveFd, virDomainObj *vm)
+{
+    if (!saveFd)
+        return 0;
+
+    if (VIR_CLOSE(saveFd->fd) < 0) {
+        virReportSystemError(errno, _("unable to close %s"), saveFd->path);
+        return -1;
+    }
+    if (vm) {
+        if (qemuDomainFileWrapperFDClose(vm, saveFd->wrapper) < 0)
+            return -1;
+    } else {
+        if (virFileWrapperFdClose(saveFd->wrapper) < 0)
+            return -1;
+    }
+    return 0;
+}
+
+/*
+ * virQEMUSaveFdFini: finalize a virQEMUSaveFd
+ *
+ * @saveFd: the saveFd structure containing the resources to free
+ * @vm: the virDomainObj (necessary to release the lock for long close
+ *      operations), or NULL
+ * @ret: the current operation result (< 0 is failure)
+ *
+ * If saveFd is NULL, the return value will be unchanged.
+ *
+ * Returns ret, or -1 if an error is detected.
+ */
+int virQEMUSaveFdFini(virQEMUSaveFd *saveFd, virDomainObj *vm, int ret)
+{
+    if (!saveFd)
+        return ret;
+    VIR_FORCE_CLOSE(saveFd->fd);
+    if (vm) {
+        if (qemuDomainFileWrapperFDClose(vm, saveFd->wrapper) < 0)
+            ret = -1;
+    } else {
+        if (virFileWrapperFdClose(saveFd->wrapper) < 0)
+            ret = -1;
+    }
+
+    if (ret < 0 && saveFd->need_unlink && saveFd->path) {
+        if (unlink(saveFd->path) < 0) {
+            virReportSystemError(errno, _("cannot remove file: %s"),
+                                 saveFd->path);
+        }
+    }
+    if (saveFd->wrapper) {
+        virFileWrapperFdFree(saveFd->wrapper);
+        saveFd->wrapper = NULL;
+    }
+
+    g_free(saveFd->path);
+    saveFd->path = NULL;
+    return ret;
+}
 
 /* Helper function to execute a migration to file with a correct save header
  * the caller needs to make sure that the processors are stopped and do all other
diff --git a/src/qemu/qemu_saveimage.h b/src/qemu/qemu_saveimage.h
index f12374b3d7..5768e31fa6 100644
--- a/src/qemu/qemu_saveimage.h
+++ b/src/qemu/qemu_saveimage.h
@@ -54,6 +54,29 @@ struct _virQEMUSaveData {
 };
 
+typedef struct _virQEMUSaveFd virQEMUSaveFd;
+struct _virQEMUSaveFd {
+    char *path;
+    int fd;
+    int oflags;
+    bool need_unlink;
+    virFileWrapperFd *wrapper;
+    virQEMUDriverConfig *cfg;
+};
+
+#define QEMU_SAVEFD_INVALID (virQEMUSaveFd) {               \
+        .path = NULL, .fd = -1, .oflags = 0,                \
+        .need_unlink = false, .wrapper = NULL, .cfg = NULL, \
+}
+
+int virQEMUSaveFdInit(virQEMUSaveFd *saveFd, const char *base,
+                      int oflags, virQEMUDriverConfig *cfg)
+    ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(4);
+
+int virQEMUSaveFdClose(virQEMUSaveFd *saveFd, virDomainObj *vm);
+
+int virQEMUSaveFdFini(virQEMUSaveFd *saveFd, virDomainObj *vm, int ret);
+
 virDomainDef *
 qemuSaveImageUpdateDef(virQEMUDriver *driver,
                        virDomainDef *def,
-- 
2.26.2
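As an illustration of the intended lifecycle (a sketch, not code from the
series; the following patches convert the callers to this pattern):

    virQEMUSaveFd saveFd = QEMU_SAVEFD_INVALID;
    int ret = -1;

    if (virQEMUSaveFdInit(&saveFd, path, O_RDONLY, cfg) < 0)
        goto cleanup;               /* Fini must run even after a failed Init */

    /* ... read or write through saveFd.fd ... */

    if (virQEMUSaveFdClose(&saveFd, vm) < 0) /* graceful close, flushes wrapper */
        goto cleanup;
    ret = 0;

 cleanup:
    ret = virQEMUSaveFdFini(&saveFd, vm, ret); /* frees resources, unlinks on error */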

[libvirt RFCv11 07/33] qemu: saveimage: convert qemuSaveImageCreate to use virQEMUSaveFd

Now that we have introduced virQEMUSaveFd, use it in the creation of a
new save image.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
---
 src/qemu/qemu_saveimage.c | 54 +++++++++++----------------------------
 1 file changed, 15 insertions(+), 39 deletions(-)

diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c
index 4c440098fa..2a58bd23b8 100644
--- a/src/qemu/qemu_saveimage.c
+++ b/src/qemu/qemu_saveimage.c
@@ -525,41 +525,31 @@ qemuSaveImageCreate(virQEMUDriver *driver,
                     virDomainAsyncJob asyncJob)
 {
     g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver);
-    bool needUnlink = false;
+    virQEMUSaveFd saveFd = QEMU_SAVEFD_INVALID;
+    unsigned int oflags = O_WRONLY | O_TRUNC | O_CREAT;
     int ret = -1;
-    int fd = -1;
-    int directFlag = 0;
-    virFileWrapperFd *wrapperFd = NULL;
-    unsigned int wrapperFlags = VIR_FILE_WRAPPER_NON_BLOCKING;
 
-    /* Obtain the file handle.  */
-    if ((flags & VIR_DOMAIN_SAVE_BYPASS_CACHE)) {
-        wrapperFlags |= VIR_FILE_WRAPPER_BYPASS_CACHE;
-        directFlag = virFileDirectFdFlag();
-        if (directFlag < 0) {
+    if (flags & VIR_DOMAIN_SAVE_BYPASS_CACHE) {
+        if (virFileDirectFdFlag() < 0) {
             virReportError(VIR_ERR_OPERATION_FAILED, "%s",
                            _("bypass cache unsupported by this system"));
-            goto cleanup;
+            return -1;
         }
+        oflags |= O_DIRECT;
     }
 
-    fd = virQEMUFileOpenAs(cfg->user, cfg->group, false, path,
-                           O_WRONLY | O_TRUNC | O_CREAT | directFlag,
-                           &needUnlink);
-    if (fd < 0)
+    if (virQEMUSaveFdInit(&saveFd, path, oflags, cfg) < 0)
         goto cleanup;
-
-    if (qemuSecuritySetImageFDLabel(driver->securityManager, vm->def, fd) < 0)
+    if (qemuSecuritySetImageFDLabel(driver->securityManager, vm->def, saveFd.fd) < 0)
         goto cleanup;
-
-    if (!(wrapperFd = virFileWrapperFdNew(&fd, path, wrapperFlags)))
+    if (virQEMUSaveDataWrite(data, saveFd.fd, saveFd.path) < 0)
         goto cleanup;
 
-    if (virQEMUSaveDataWrite(data, fd, path) < 0)
+    /* Perform the migration */
+    if (qemuMigrationSrcToFile(driver, vm, saveFd.fd, compressor, asyncJob) < 0)
         goto cleanup;
 
-    /* Perform the migration */
-    if (qemuMigrationSrcToFile(driver, vm, fd, compressor, asyncJob) < 0)
+    if (virQEMUSaveFdClose(&saveFd, vm) < 0)
         goto cleanup;
 
     /* Touch up file header to mark image complete. */
@@ -568,29 +558,15 @@ qemuSaveImageCreate(virQEMUDriver *driver,
      * up to seek backwards on wrapperFd.  The reopened fd will
      * trigger a single page of file system cache pollution, but
      * that's acceptable.  */
-    if (VIR_CLOSE(fd) < 0) {
-        virReportSystemError(errno, _("unable to close %s"), path);
-        goto cleanup;
-    }
-
-    if (qemuDomainFileWrapperFDClose(vm, wrapperFd) < 0)
-        goto cleanup;
-
-    if ((fd = qemuDomainOpenFile(cfg, vm->def, path, O_WRONLY, NULL)) < 0 ||
-        virQEMUSaveDataFinish(data, &fd, path) < 0)
+    if ((saveFd.fd = qemuDomainOpenFile(cfg, vm->def, saveFd.path, O_WRONLY, NULL)) < 0 ||
+        virQEMUSaveDataFinish(data, &saveFd.fd, saveFd.path) < 0)
         goto cleanup;
 
     ret = 0;
 
 cleanup:
-    VIR_FORCE_CLOSE(fd);
-    if (qemuDomainFileWrapperFDClose(vm, wrapperFd) < 0)
-        ret = -1;
-    virFileWrapperFdFree(wrapperFd);
-
-    if (ret < 0 && needUnlink)
-        unlink(path);
-
+    ret = virQEMUSaveFdFini(&saveFd, vm, ret);
     return ret;
 }
-- 
2.26.2

[libvirt RFCv11 08/33] qemu: saveimage: convert qemuSaveImageOpen to use virQEMUSaveFd

All the logic to open an fd, create a wrapper, etc. is boilerplate code
that is best reused, so change the Open function to take an existing,
already initialized virQEMUSaveFd. Adapt all callers of
qemuSaveImageOpen.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
---
 src/qemu/qemu_driver.c    | 101 ++++++++++++++++++++++----------------
 src/qemu/qemu_saveimage.c |  56 +++++----------------
 src/qemu/qemu_saveimage.h |   9 ++--
 3 files changed, 73 insertions(+), 93 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 0c6645ed89..9ad4945928 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -5852,12 +5852,13 @@ qemuDomainRestoreInternal(virConnectPtr conn,
     virDomainObj *vm = NULL;
     g_autofree char *xmlout = NULL;
     const char *newxml = dxml;
-    int fd = -1;
     int ret = -1;
     virQEMUSaveData *data = NULL;
-    virFileWrapperFd *wrapperFd = NULL;
+    virQEMUSaveFd saveFd = QEMU_SAVEFD_INVALID;
     bool hook_taint = false;
     bool reset_nvram = false;
+    g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver);
+    int oflags = O_RDONLY;
 
     virCheckFlags(VIR_DOMAIN_SAVE_BYPASS_CACHE |
                   VIR_DOMAIN_SAVE_RUNNING |
@@ -5867,10 +5868,17 @@ qemuDomainRestoreInternal(virConnectPtr conn,
     if (flags & VIR_DOMAIN_SAVE_RESET_NVRAM)
         reset_nvram = true;
 
-    fd = qemuSaveImageOpen(driver, NULL, path, &def, &data,
-                           (flags & VIR_DOMAIN_SAVE_BYPASS_CACHE) != 0,
-                           &wrapperFd, false, false);
-    if (fd < 0)
+    if (flags & VIR_DOMAIN_SAVE_BYPASS_CACHE) {
+        if (virFileDirectFdFlag() < 0) {
+            virReportError(VIR_ERR_OPERATION_FAILED, "%s",
+                           _("bypass cache unsupported by this system"));
+            return -1;
+        }
+        oflags |= O_DIRECT;
+    }
+    if (virQEMUSaveFdInit(&saveFd, path, oflags, cfg) < 0)
+        return -1;
+    if (qemuSaveImageOpen(driver, NULL, &def, &data, false, &saveFd) < 0)
         goto cleanup;
 
     if (ensureACL(conn, def) < 0)
@@ -5924,16 +5932,13 @@ qemuDomainRestoreInternal(virConnectPtr conn,
                         flags) < 0)
         goto cleanup;
 
-    ret = qemuSaveImageStartVM(conn, driver, vm, &fd, data, path,
+    ret = qemuSaveImageStartVM(conn, driver, vm, &saveFd.fd, data, path,
                                false, reset_nvram, VIR_ASYNC_JOB_START);
 
     qemuProcessEndJob(vm);
 
 cleanup:
-    VIR_FORCE_CLOSE(fd);
-    if (virFileWrapperFdClose(wrapperFd) < 0)
-        ret = -1;
-    virFileWrapperFdFree(wrapperFd);
+    ret = virQEMUSaveFdFini(&saveFd, vm, ret);
     virQEMUSaveDataFree(data);
     if (vm && ret < 0)
         qemuDomainRemoveInactive(driver, vm);
@@ -5999,15 +6004,15 @@ qemuDomainSaveImageGetXMLDesc(virConnectPtr conn, const char *path,
     virQEMUDriver *driver = conn->privateData;
     char *ret = NULL;
     g_autoptr(virDomainDef) def = NULL;
-    int fd = -1;
     virQEMUSaveData *data = NULL;
+    g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver);
+    virQEMUSaveFd saveFd = QEMU_SAVEFD_INVALID;
 
     virCheckFlags(VIR_DOMAIN_SAVE_IMAGE_XML_SECURE, NULL);
 
-    fd = qemuSaveImageOpen(driver, NULL, path, &def, &data,
-                           false, NULL, false, false);
-
-    if (fd < 0)
+    if (virQEMUSaveFdInit(&saveFd, path, O_RDONLY, cfg) < 0)
+        return NULL;
+    if (qemuSaveImageOpen(driver, NULL, &def, &data, false, &saveFd) < 0)
         goto cleanup;
 
     if (virDomainSaveImageGetXMLDescEnsureACL(conn, def) < 0)
@@ -6017,7 +6022,8 @@ qemuDomainSaveImageGetXMLDesc(virConnectPtr conn, const char *path,
 
 cleanup:
     virQEMUSaveDataFree(data);
-    VIR_FORCE_CLOSE(fd);
+    if (virQEMUSaveFdFini(&saveFd, NULL, ret ? 0 : -1) < 0)
+        ret = NULL;
     return ret;
 }
 
@@ -6029,8 +6035,9 @@ qemuDomainSaveImageDefineXML(virConnectPtr conn, const char *path,
     int ret = -1;
     g_autoptr(virDomainDef) def = NULL;
     g_autoptr(virDomainDef) newdef = NULL;
-    int fd = -1;
     virQEMUSaveData *data = NULL;
+    virQEMUSaveFd saveFd = QEMU_SAVEFD_INVALID;
+    g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver);
     int state = -1;
 
     virCheckFlags(VIR_DOMAIN_SAVE_RUNNING |
@@ -6041,10 +6048,9 @@ qemuDomainSaveImageDefineXML(virConnectPtr conn, const char *path,
     else if (flags & VIR_DOMAIN_SAVE_PAUSED)
         state = 0;
 
-    fd = qemuSaveImageOpen(driver, NULL, path, &def, &data,
-                           false, NULL, true, false);
-
-    if (fd < 0)
+    if (virQEMUSaveFdInit(&saveFd, path, O_RDWR, cfg) < 0)
+        return -1;
+    if (qemuSaveImageOpen(driver, NULL, &def, &data, false, &saveFd) < 0)
         goto cleanup;
 
     if (virDomainSaveImageDefineXMLEnsureACL(conn, def) < 0)
@@ -6071,15 +6077,15 @@ qemuDomainSaveImageDefineXML(virConnectPtr conn, const char *path,
                                     VIR_DOMAIN_XML_MIGRATABLE)))
         goto cleanup;
 
-    if (lseek(fd, 0, SEEK_SET) != 0) {
+    if (lseek(saveFd.fd, 0, SEEK_SET) != 0) {
         virReportSystemError(errno, _("cannot seek in '%s'"), path);
         goto cleanup;
     }
 
-    if (virQEMUSaveDataWrite(data, fd, path) < 0)
+    if (virQEMUSaveDataWrite(data, saveFd.fd, path) < 0)
         goto cleanup;
 
-    if (VIR_CLOSE(fd) < 0) {
+    if (virQEMUSaveFdClose(&saveFd, NULL) < 0) {
         virReportSystemError(errno, _("failed to write header data to '%s'"), path);
         goto cleanup;
     }
@@ -6087,8 +6093,8 @@ qemuDomainSaveImageDefineXML(virConnectPtr conn, const char *path,
     ret = 0;
 
 cleanup:
-    VIR_FORCE_CLOSE(fd);
     virQEMUSaveDataFree(data);
+    ret = virQEMUSaveFdFini(&saveFd, NULL, ret);
     return ret;
 }
 
@@ -6100,8 +6106,9 @@ qemuDomainManagedSaveGetXMLDesc(virDomainPtr dom, unsigned int flags)
     g_autofree char *path = NULL;
     char *ret = NULL;
     g_autoptr(virDomainDef) def = NULL;
-    int fd = -1;
     virQEMUSaveData *data = NULL;
+    virQEMUSaveFd saveFd = QEMU_SAVEFD_INVALID;
+    g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver);
     qemuDomainObjPrivate *priv;
 
     virCheckFlags(VIR_DOMAIN_SAVE_IMAGE_XML_SECURE, NULL);
@@ -6122,15 +6129,18 @@ qemuDomainManagedSaveGetXMLDesc(virDomainPtr dom, unsigned int flags)
         goto cleanup;
     }
 
-    if ((fd = qemuSaveImageOpen(driver, priv->qemuCaps, path, &def, &data,
-                                false, NULL, false, false)) < 0)
+    if (virQEMUSaveFdInit(&saveFd, path, O_RDONLY, cfg) < 0)
+        goto cleanup;
+    if (qemuSaveImageOpen(driver, priv->qemuCaps, &def, &data, false,
+                          &saveFd) < 0)
         goto cleanup;
 
     ret = qemuDomainDefFormatXML(driver, priv->qemuCaps, def, flags);
 
 cleanup:
     virQEMUSaveDataFree(data);
-    VIR_FORCE_CLOSE(fd);
+    if (virQEMUSaveFdFini(&saveFd, vm, ret ? 0 : -1) < 0)
+        ret = NULL;
     virDomainObjEndAPI(&vm);
     return ret;
 }
@@ -6180,20 +6190,30 @@ qemuDomainObjRestore(virConnectPtr conn,
 {
     g_autoptr(virDomainDef) def = NULL;
     qemuDomainObjPrivate *priv = vm->privateData;
-    int fd = -1;
     int ret = -1;
     g_autofree char *xmlout = NULL;
     virQEMUSaveData *data = NULL;
-    virFileWrapperFd *wrapperFd = NULL;
+    virQEMUSaveFd saveFd = QEMU_SAVEFD_INVALID;
+    int oflags = O_RDONLY;
+    g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver);
 
-    fd = qemuSaveImageOpen(driver, NULL, path, &def, &data,
-                           bypass_cache, &wrapperFd, false, true);
-    if (fd < 0) {
-        if (fd == -3)
+    if (bypass_cache) {
+        if (virFileDirectFdFlag() < 0) {
+            virReportError(VIR_ERR_OPERATION_FAILED, "%s",
+                           _("bypass cache unsupported by this system"));
+            return -1;
+        }
+        oflags |= O_DIRECT;
+    }
+    if (virQEMUSaveFdInit(&saveFd, path, oflags, cfg) < 0)
+        goto cleanup;
+    ret = qemuSaveImageOpen(driver, NULL, &def, &data, true, &saveFd);
+    if (ret < 0) {
+        if (ret == -3)
             ret = 1;
         goto cleanup;
     }
-
+    ret = -1;
     if (virHookPresent(VIR_HOOK_DRIVER_QEMU)) {
         int hookret;
 
@@ -6233,15 +6253,12 @@ qemuDomainObjRestore(virConnectPtr conn,
 
     virDomainObjAssignDef(vm, &def, true, NULL);
 
-    ret = qemuSaveImageStartVM(conn, driver, vm, &fd, data, path,
+    ret = qemuSaveImageStartVM(conn, driver, vm, &saveFd.fd, data, path,
                                start_paused, reset_nvram, asyncJob);
 
 cleanup:
     virQEMUSaveDataFree(data);
-    VIR_FORCE_CLOSE(fd);
-    if (virFileWrapperFdClose(wrapperFd) < 0)
-        ret = -1;
-    virFileWrapperFdFree(wrapperFd);
+    ret = virQEMUSaveFdFini(&saveFd, vm, ret);
     return ret;
 }
 
diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c
index 2a58bd23b8..dee87ecfad 100644
--- a/src/qemu/qemu_saveimage.c
+++ b/src/qemu/qemu_saveimage.c
@@ -659,61 +659,30 @@ qemuSaveImageGetCompressionProgram(const char *imageFormat,
  * @path: path of the save image
  * @ret_def: returns domain definition created from the XML stored in the image
  * @ret_data: returns structure filled with data from the image header
- * @bypass_cache: bypass cache when opening the file
- * @wrapperFd: returns the file wrapper structure
- * @open_write: open the file for writing (for updates)
- * @unlink_corrupt: remove the image file if it is corrupted
+ * @unlink_corrupt: mark the image file for removal if it is corrupted
+ * @saveFd: the save file
  *
- * Returns the opened fd of the save image file and fills the appropriate fields
- * on success. On error returns -1 on most failures, -3 if corrupt image was
- * unlinked (no error raised).
+ * Returns 0 on success or -1 on failure.
+ * -3 is a special failure in which the saveFd has been marked for unlinking.
+ * On success, the appropriate fields are filled.
  */
 int
 qemuSaveImageOpen(virQEMUDriver *driver,
                   virQEMUCaps *qemuCaps,
-                  const char *path,
                   virDomainDef **ret_def,
                   virQEMUSaveData **ret_data,
-                  bool bypass_cache,
-                  virFileWrapperFd **wrapperFd,
-                  bool open_write,
-                  bool unlink_corrupt)
+                  bool unlink_corrupt,
+                  virQEMUSaveFd *saveFd)
 {
-    g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver);
-    VIR_AUTOCLOSE fd = -1;
-    int ret = -1;
+    int ret;
     g_autoptr(virQEMUSaveData) data = NULL;
     g_autoptr(virDomainDef) def = NULL;
-    int oflags = open_write ? O_RDWR : O_RDONLY;
-
-    if (bypass_cache) {
-        int directFlag = virFileDirectFdFlag();
-        if (directFlag < 0) {
-            virReportError(VIR_ERR_OPERATION_FAILED, "%s",
-                           _("bypass cache unsupported by this system"));
-            return -1;
-        }
-        oflags |= directFlag;
-    }
-
-    if ((fd = qemuDomainOpenFile(cfg, NULL, path, oflags, NULL)) < 0)
-        return -1;
-
-    if (bypass_cache &&
-        !(*wrapperFd = virFileWrapperFdNew(&fd, path,
-                                           VIR_FILE_WRAPPER_BYPASS_CACHE)))
-        return -1;
 
     data = g_new0(virQEMUSaveData, 1);
-    ret = virQEMUSaveDataRead(data, fd, path);
+    ret = virQEMUSaveDataRead(data, saveFd->fd, saveFd->path);
     if (ret < 0) {
         if (unlink_corrupt && ret == -3) {
-            if (unlink(path) < 0) {
-                virReportSystemError(errno,
-                                     _("cannot remove corrupt file: %s"),
-                                     path);
-                return -1;
-            }
+            saveFd->need_unlink = true;
         }
         return ret;
     }
@@ -727,10 +696,7 @@ qemuSaveImageOpen(virQEMUDriver *driver,
     *ret_def = g_steal_pointer(&def);
     *ret_data = g_steal_pointer(&data);
 
-    ret = fd;
-    fd = -1;
-
-    return ret;
+    return 0;
 }
 
 int
diff --git a/src/qemu/qemu_saveimage.h b/src/qemu/qemu_saveimage.h
index 5768e31fa6..6db4e120cb 100644
--- a/src/qemu/qemu_saveimage.h
+++ b/src/qemu/qemu_saveimage.h
@@ -97,14 +97,11 @@ qemuSaveImageStartVM(virConnectPtr conn,
 int
 qemuSaveImageOpen(virQEMUDriver *driver,
                   virQEMUCaps *qemuCaps,
-                  const char *path,
                   virDomainDef **ret_def,
                   virQEMUSaveData **ret_data,
-                  bool bypass_cache,
-                  virFileWrapperFd **wrapperFd,
-                  bool open_write,
-                  bool unlink_corrupt)
-    ATTRIBUTE_NONNULL(3) ATTRIBUTE_NONNULL(4);
+                  bool unlink_corrupt,
+                  virQEMUSaveFd *saveFd)
+    ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(3) ATTRIBUTE_NONNULL(6);
 
 int
 qemuSaveImageGetCompressionProgram(const char *imageFormat,
-- 
2.26.2

[libvirt RFCv11 09/33] tools: prepare doSave to use parameters

Signed-off-by: Claudio Fontana <cfontana@suse.de>
---
 tools/virsh-domain.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
index bac4cc0fb6..120d0ef5f0 100644
--- a/tools/virsh-domain.c
+++ b/tools/virsh-domain.c
@@ -4203,6 +4203,9 @@ doSave(void *opaque)
     g_autoptr(virshDomain) dom = NULL;
     const char *name = NULL;
     const char *to = NULL;
+    virTypedParameterPtr params = NULL;
+    int nparams = 0;
+    int maxparams = 0;
     unsigned int flags = 0;
     const char *xmlfile = NULL;
     g_autofree char *xml = NULL;
@@ -4216,9 +4219,12 @@ doSave(void *opaque)
         goto out_sig;
 #endif /* !WIN32 */
 
-    if (vshCommandOptStringReq(ctl, cmd, "file", &to) < 0)
+    if (vshCommandOptStringReq(ctl, cmd, "file", &to) < 0) {
         goto out;
-
+    } else if (virTypedParamsAddString(&params, &nparams, &maxparams,
+                                       VIR_DOMAIN_SAVE_PARAM_FILE, to) < 0) {
+        goto out;
+    }
     if (vshCommandOptBool(cmd, "bypass-cache"))
         flags |= VIR_DOMAIN_SAVE_BYPASS_CACHE;
     if (vshCommandOptBool(cmd, "running"))
@@ -4232,10 +4238,14 @@ doSave(void *opaque)
     if (!(dom = virshCommandOptDomain(ctl, cmd, &name)))
         goto out;
 
-    if (xmlfile &&
-        virFileReadAll(xmlfile, VSH_MAX_XML_FILE, &xml) < 0) {
-        vshReportError(ctl);
-        goto out;
+    if (xmlfile) {
+        if (virFileReadAll(xmlfile, VSH_MAX_XML_FILE, &xml) < 0) {
+            vshReportError(ctl);
+            goto out;
+        } else if (virTypedParamsAddString(&params, &nparams, &maxparams,
+                                           VIR_DOMAIN_SAVE_PARAM_DXML, xml) < 0) {
+            goto out;
+        }
     }
 
     if (flags || xml) {
@@ -4252,6 +4262,7 @@ doSave(void *opaque)
     data->ret = 0;
 
  out:
+    virTypedParamsFree(params, nparams);
 #ifndef WIN32
     pthread_sigmask(SIG_SETMASK, &oldsigmask, NULL);
  out_sig:
-- 
2.26.2

[libvirt RFCv11 10/33] tools: prepare cmdRestore to use parameters

Signed-off-by: Claudio Fontana <cfontana@suse.de>
---
 tools/virsh-domain.c | 32 ++++++++++++++++++++++----------
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
index 120d0ef5f0..a3f007a1d2 100644
--- a/tools/virsh-domain.c
+++ b/tools/virsh-domain.c
@@ -5316,15 +5316,21 @@ static bool
 cmdRestore(vshControl *ctl, const vshCmd *cmd)
 {
     const char *from = NULL;
+    virTypedParameterPtr params = NULL;
+    int nparams = 0;
+    int maxparams = 0;
     unsigned int flags = 0;
     const char *xmlfile = NULL;
     g_autofree char *xml = NULL;
     virshControl *priv = ctl->privData;
-    int rc;
-
-    if (vshCommandOptStringReq(ctl, cmd, "file", &from) < 0)
-        return false;
+    int rc = -1;
 
+    if (vshCommandOptStringReq(ctl, cmd, "file", &from) < 0) {
+        goto out;
+    } else if (virTypedParamsAddString(&params, &nparams, &maxparams,
+                                       VIR_DOMAIN_SAVE_PARAM_FILE, from) < 0) {
+        goto out;
+    }
     if (vshCommandOptBool(cmd, "bypass-cache"))
         flags |= VIR_DOMAIN_SAVE_BYPASS_CACHE;
     if (vshCommandOptBool(cmd, "running"))
@@ -5335,11 +5341,15 @@ cmdRestore(vshControl *ctl, const vshCmd *cmd)
         flags |= VIR_DOMAIN_SAVE_RESET_NVRAM;
 
     if (vshCommandOptStringReq(ctl, cmd, "xml", &xmlfile) < 0)
-        return false;
+        goto out;
 
-    if (xmlfile &&
-        virFileReadAll(xmlfile, VSH_MAX_XML_FILE, &xml) < 0)
-        return false;
+    if (xmlfile) {
+        if (virFileReadAll(xmlfile, VSH_MAX_XML_FILE, &xml) < 0)
+            goto out;
+        else if (virTypedParamsAddString(&params, &nparams, &maxparams,
+                                         VIR_DOMAIN_SAVE_PARAM_DXML, xml) < 0)
+            goto out;
+    }
 
     if (flags || xml) {
         rc = virDomainRestoreFlags(priv->conn, from, xml, flags);
@@ -5349,11 +5359,13 @@ cmdRestore(vshControl *ctl, const vshCmd *cmd)
 
     if (rc < 0) {
         vshError(ctl, _("Failed to restore domain from %s"), from);
-        return false;
+        goto out;
     }
 
     vshPrintExtra(ctl, _("Domain restored from %s\n"), from);
-    return true;
+ out:
+    virTypedParamsFree(params, nparams);
+    return rc >= 0;
 }
 
 /*
-- 
2.26.2

[libvirt RFCv11 11/33] libvirt: add new VIR_DOMAIN_SAVE_PARALLEL flag and parameter

In order to enable the parallel save functionality, we will need a new
flag and a parameter to specify the number of extra connections to use.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
---
 include/libvirt/libvirt-domain.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h
index 24846046aa..0b73ee27b6 100644
--- a/include/libvirt/libvirt-domain.h
+++ b/include/libvirt/libvirt-domain.h
@@ -1555,6 +1555,7 @@ typedef enum {
     VIR_DOMAIN_SAVE_RUNNING      = 1 << 1, /* Favor running over paused (Since: 0.9.5) */
     VIR_DOMAIN_SAVE_PAUSED       = 1 << 2, /* Favor paused over running (Since: 0.9.5) */
     VIR_DOMAIN_SAVE_RESET_NVRAM  = 1 << 3, /* Re-initialize NVRAM from template (Since: 8.1.0) */
+    VIR_DOMAIN_SAVE_PARALLEL     = 1 << 4, /* Parallel Save/Restore (Since: 8.4.0) */
 } virDomainSaveRestoreFlags;
 
 int                     virDomainSave           (virDomainPtr domain,
@@ -1600,6 +1601,18 @@ int                 virDomainRestoreParams  (virConnectPtr conn,
  */
 # define VIR_DOMAIN_SAVE_PARAM_DXML "dxml"
 
+/**
+ * VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS:
+ *
+ * This optional parameter mirrors the migration parameter
+ * VIR_MIGRATE_PARAM_PARALLEL_CONNECTIONS.
+ *
+ * It specifies the number of extra connections to use when transferring state.
+ *
+ * Since: 8.4.0
+ */
+# define VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS "parallel.connections"
+
 /* See below for virDomainSaveImageXMLFlags */
 char *          virDomainSaveImageGetXMLDesc (virConnectPtr conn,
                                               const char *file,
-- 
2.26.2
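From a client's point of view, the new flag and parameter are used
together with the existing typed-parameter helpers; a sketch (the file
name and connection count are placeholders):

    virTypedParameterPtr params = NULL;
    int nparams = 0;
    int maxparams = 0;

    if (virTypedParamsAddString(&params, &nparams, &maxparams,
                                VIR_DOMAIN_SAVE_PARAM_FILE, "/tmp/dom.sav") < 0 ||
        virTypedParamsAddInt(&params, &nparams, &maxparams,
                             VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS, 4) < 0)
        goto error;

    if (virDomainSaveParams(dom, params, nparams, VIR_DOMAIN_SAVE_PARALLEL) < 0)
        goto error;

    virTypedParamsFree(params, nparams);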

[libvirt RFCv11 12/33] qemu: add stub support for VIR_DOMAIN_SAVE_PARALLEL in save

And its companion parameter, VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
---
 src/qemu/qemu_driver.c    | 18 ++++++++++++------
 src/qemu/qemu_saveimage.c |  1 +
 src/qemu/qemu_saveimage.h |  1 +
 src/qemu/qemu_snapshot.c  |  2 +-
 4 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 9ad4945928..5a8d3d1309 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -2650,7 +2650,7 @@ static int
 qemuDomainSaveInternal(virQEMUDriver *driver,
                        virDomainObj *vm, const char *path,
                        int compressed, virCommand *compressor,
-                       const char *xmlin, unsigned int flags)
+                       const char *xmlin, int nconn, unsigned int flags)
 {
     g_autofree char *xml = NULL;
     bool was_running = false;
@@ -2731,7 +2731,7 @@ qemuDomainSaveInternal(virQEMUDriver *driver,
     xml = NULL;
 
     ret = qemuSaveImageCreate(driver, vm, path, data, compressor,
-                              flags, VIR_ASYNC_JOB_SAVE);
+                              nconn, flags, VIR_ASYNC_JOB_SAVE);
     if (ret < 0)
         goto endjob;
 
@@ -2809,7 +2809,7 @@ qemuDomainManagedSaveHelper(virQEMUDriver *driver,
     VIR_INFO("Saving state of domain '%s' to '%s'", vm->def->name, path);
 
     if (qemuDomainSaveInternal(driver, vm, path, compressed,
-                               compressor, dxml, flags) < 0)
+                               compressor, dxml, -1, flags) < 0)
         return -1;
 
     vm->hasManagedSave = true;
@@ -2848,7 +2848,7 @@ qemuDomainSaveFlags(virDomainPtr dom, const char *path, const char *dxml,
         goto cleanup;
 
     ret = qemuDomainSaveInternal(driver, vm, path, compressed,
-                                 compressor, dxml, flags);
+                                 compressor, dxml, -1, flags);
 
 cleanup:
     virDomainObjEndAPI(&vm);
@@ -2875,16 +2875,20 @@ qemuDomainSaveParams(virDomainPtr dom,
     const char *dxml = NULL;
     int compressed;
     int ret = -1;
+    int nconn = 2;
 
     virCheckFlags(VIR_DOMAIN_SAVE_BYPASS_CACHE |
                   VIR_DOMAIN_SAVE_RUNNING |
-                  VIR_DOMAIN_SAVE_PAUSED, -1);
+                  VIR_DOMAIN_SAVE_PAUSED |
+                  VIR_DOMAIN_SAVE_PARALLEL, -1);
 
     if (virTypedParamsValidate(params, nparams,
                                VIR_DOMAIN_SAVE_PARAM_FILE, VIR_TYPED_PARAM_STRING,
                                VIR_DOMAIN_SAVE_PARAM_DXML, VIR_TYPED_PARAM_STRING,
+                               VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS,
+                               VIR_TYPED_PARAM_INT,
                                NULL) < 0)
         return -1;
 
@@ -2894,6 +2898,8 @@ qemuDomainSaveParams(virDomainPtr dom,
     if (virTypedParamsGetString(params, nparams,
                                 VIR_DOMAIN_SAVE_PARAM_DXML, &dxml) < 0)
         return -1;
+    if (virTypedParamsGetInt(params, nparams, VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS, &nconn) < 0)
+        return -1;
 
     if (!(vm = qemuDomainObjFromDomain(dom)))
         goto cleanup;
@@ -2916,7 +2922,7 @@ qemuDomainSaveParams(virDomainPtr dom,
         goto cleanup;
 
     ret = qemuDomainSaveInternal(driver, vm, to, compressed,
-                                 compressor, dxml, flags);
+                                 compressor, dxml, nconn, flags);
 
 cleanup:
     virDomainObjEndAPI(&vm);
diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c
index dee87ecfad..70ded3c439 100644
--- a/src/qemu/qemu_saveimage.c
+++ b/src/qemu/qemu_saveimage.c
@@ -521,6 +521,7 @@ qemuSaveImageCreate(virQEMUDriver *driver,
                     const char *path,
                     virQEMUSaveData *data,
                     virCommand *compressor,
+                    int nconn G_GNUC_UNUSED,
                     unsigned int flags,
                     virDomainAsyncJob asyncJob)
 {
diff --git a/src/qemu/qemu_saveimage.h b/src/qemu/qemu_saveimage.h
index 6db4e120cb..9b04af2fb9 100644
--- a/src/qemu/qemu_saveimage.h
+++ b/src/qemu/qemu_saveimage.h
@@ -116,6 +116,7 @@ qemuSaveImageCreate(virQEMUDriver *driver,
                     const char *path,
                     virQEMUSaveData *data,
                     virCommand *compressor,
+                    int nconn,
                     unsigned int flags,
                     virDomainAsyncJob asyncJob);
 
diff --git a/src/qemu/qemu_snapshot.c b/src/qemu/qemu_snapshot.c
index c6beeda1d0..3437c3e0cb 100644
--- a/src/qemu/qemu_snapshot.c
+++ b/src/qemu/qemu_snapshot.c
@@ -1457,7 +1457,7 @@ qemuSnapshotCreateActiveExternal(virQEMUDriver *driver, memory_existing = virFileExists(snapdef->memorysnapshotfile); if ((ret = qemuSaveImageCreate(driver, vm, snapdef->memorysnapshotfile, - data, compressor, 0, + data, compressor, -1, 0, VIR_ASYNC_JOB_SNAPSHOT)) < 0) goto cleanup; -- 2.26.2
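For illustration, a minimal client-side sketch (not part of the patch) of driving the new flag and parameter. The only assumptions are the VIR_DOMAIN_SAVE_PARALLEL flag and the VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS parameter introduced by this series; everything else is the existing typed-parameter API:

#include <libvirt/libvirt.h>

static int
save_parallel(virDomainPtr dom, const char *path, int nconn)
{
    virTypedParameterPtr params = NULL;
    int nparams = 0;
    int maxparams = 0;
    int ret;

    /* the file to save to, plus the requested number of extra channels */
    if (virTypedParamsAddString(&params, &nparams, &maxparams,
                                VIR_DOMAIN_SAVE_PARAM_FILE, path) < 0 ||
        virTypedParamsAddInt(&params, &nparams, &maxparams,
                             VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS,
                             nconn) < 0) {
        virTypedParamsFree(params, nparams);
        return -1;
    }

    ret = virDomainSaveParams(dom, params, nparams, VIR_DOMAIN_SAVE_PARALLEL);
    virTypedParamsFree(params, nparams);
    return ret;
}

Note that when the parameter is omitted, the driver falls back to nconn = 2, as initialized in the qemuDomainSaveParams hunk above.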

Signed-off-by: Claudio Fontana <cfontana@suse.de> --- src/qemu/qemu_driver.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 5a8d3d1309..986bbf9e58 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -5869,7 +5869,8 @@ qemuDomainRestoreInternal(virConnectPtr conn, virCheckFlags(VIR_DOMAIN_SAVE_BYPASS_CACHE | VIR_DOMAIN_SAVE_RUNNING | VIR_DOMAIN_SAVE_PAUSED | - VIR_DOMAIN_SAVE_RESET_NVRAM, -1); + VIR_DOMAIN_SAVE_RESET_NVRAM | + VIR_DOMAIN_SAVE_PARALLEL, -1); if (flags & VIR_DOMAIN_SAVE_RESET_NVRAM) reset_nvram = true; -- 2.26.2
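The restore direction needs no channel-count parameter: since the v7 changes, the number of channels is recorded in the save image itself. A corresponding sketch, assuming only the virDomainRestoreParams API introduced earlier in the series:

static int
restore_parallel(virConnectPtr conn, const char *path)
{
    virTypedParameterPtr params = NULL;
    int nparams = 0;
    int maxparams = 0;
    int ret = -1;

    if (virTypedParamsAddString(&params, &nparams, &maxparams,
                                VIR_DOMAIN_SAVE_PARAM_FILE, path) == 0)
        ret = virDomainRestoreParams(conn, params, nparams,
                                     VIR_DOMAIN_SAVE_PARALLEL);
    virTypedParamsFree(params, nparams);
    return ret;
}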

allow interleaved parallel write to a single file, using a record size equal to the io buffer size (1MB). Signed-off-by: Claudio Fontana <cfontana@suse.de> --- src/util/iohelper.c | 3 + src/util/virfile.c | 151 +++++++++++++++++++++++++++++--------------- src/util/virfile.h | 2 + 3 files changed, 106 insertions(+), 50 deletions(-) diff --git a/src/util/iohelper.c b/src/util/iohelper.c index 055540c8c4..dcbdda366f 100644 --- a/src/util/iohelper.c +++ b/src/util/iohelper.c @@ -85,6 +85,9 @@ main(int argc, char **argv) if (fd < 0 || virFileDiskCopy(fd, path, -1, "stdio") < 0) goto error; + if (VIR_CLOSE(fd) < 0) + goto error; + return 0; error: diff --git a/src/util/virfile.c b/src/util/virfile.c index 201d7f4e64..f9ae7d94c4 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -4761,6 +4761,9 @@ struct runIOParams { const char *fdinname; int fdout; const char *fdoutname; + int idx; + int nchannels; + off_t total; }; /** @@ -4779,12 +4782,18 @@ runIOCopy(const struct runIOParams p) off_t total = 0; size_t buflen = 1024*1024; char *buf = virFileDirectBufferNew(&base, buflen); + int diskfd = p.isWrite ? p.fdout : p.fdin; if (!buf) { virReportSystemError(errno, _("Failed to allocate aligned memory in function %s"), __FUNCTION__); return -5; } - + if (p.idx >= 0) { + if (lseek(diskfd, p.idx * buflen, SEEK_CUR) < 0) { + virReportSystemError(errno, "%s", _("Failed to lseek to file channel offset")); + return -6; + } + } while (1) { ssize_t got; @@ -4808,7 +4817,12 @@ runIOCopy(const struct runIOParams p) break; total += got; - + if (p.idx >= 0 && !p.isWrite && total > p.total) { + /* do not write to socket too much for this channel, according to CLIA */ + off_t difference = total - p.total; + got -= difference; + total -= difference; + } /* handle last write size align in direct case */ if (got < buflen && p.isDirect && p.isWrite) { ssize_t nwritten = virFileDirectWrite(p.fdout, buf, got); @@ -4816,7 +4830,7 @@ runIOCopy(const struct runIOParams p) virReportSystemError(errno, _("Unable to write %s"), p.fdoutname); return -3; } - if (!p.isBlockDev) { + if (!p.isBlockDev && p.idx < 0) { off_t off = lseek(p.fdout, (off_t)0, SEEK_CUR); if (off < 0) { virReportSystemError(errno, "%s", _("Failed to lseek to get current file offset")); @@ -4824,6 +4838,7 @@ runIOCopy(const struct runIOParams p) } if (nwritten > got) { off -= nwritten - got; + total -= nwritten - got; } if (ftruncate(p.fdout, off) < 0) { virReportSystemError(errno, _("Unable to truncate %s"), p.fdoutname); @@ -4838,51 +4853,61 @@ runIOCopy(const struct runIOParams p) virReportSystemError(errno, _("Unable to write %s"), p.fdoutname); return -3; } + if (p.idx >= 0) { + if (!p.isWrite && total >= p.total) { + /* done for this channel */ + break; + } + /* move channel cursor to the next record */ + if (lseek(diskfd, buflen * (p.nchannels - 1), SEEK_CUR) < 0) { + virReportSystemError(errno, "%s", _("Failed to lseek to next channel record")); + return -7; + } + } } return total; } /** - * virFileDiskCopy: run IO to copy data between storage and a pipe or socket. - * - * @disk_fd: the already open regular file or block device - * @disk_path: the pathname corresponding to disk_fd (for error reporting) - * @remote_fd: the pipe or socket - * Use -1 to auto-choose between STDIN or STDOUT. - * @remote_path: the pathname corresponding to remote_fd (for error reporting) - * - * Note that the direction of the transfer is detected based on the @disk_fd - * file access mode (man 2 open). 
Therefore @disk_fd must be opened with - * O_RDONLY or O_WRONLY. O_RDWR is not supported. - * - * virFileDiskCopy always closes the file descriptor disk_fd, - * and any error during close(2) is reported and considered a failure. - * - * Returns: bytes transferred or < 0 on failure. + * virFileDiskCopyChannel: like virFileDiskCopy, channel interleaved read/write + * ... + * @idx: channel index + * @nchannels: total number of channels */ off_t -virFileDiskCopy(int disk_fd, const char *disk_path, int remote_fd, const char *remote_path) +virFileDiskCopyChannel(int disk_fd, const char *disk_path, int remote_fd, const char *remote_path, + int idx, int nchannels, off_t total) { - int ret = -1; - off_t total = 0; + off_t new_total = -1; struct stat sb; struct runIOParams p; int oflags = -1; + if ((nchannels == 0) || + (nchannels > 0 && idx >= nchannels) || + (nchannels > 0 && idx < 0) || + (nchannels < 0 && idx >= 0)) { + virReportSystemError(EINVAL, "%s", _("Invalid channel arguments")); + goto out; + } + p.idx = idx; + p.nchannels = nchannels; + p.total = total; + oflags = fcntl(disk_fd, F_GETFL); if (oflags < 0) { virReportSystemError(errno, _("unable to determine access mode of %s"), disk_path); - goto cleanup; + goto out; } if (fstat(disk_fd, &sb) < 0) { virReportSystemError(errno, _("unable to stat file descriptor %d path %s"), disk_fd, disk_path); - goto cleanup; + goto out; } p.isBlockDev = S_ISBLK(sb.st_mode); p.isDirect = O_DIRECT && (oflags & O_DIRECT); @@ -4906,53 +4931,79 @@ virFileDiskCopy(int disk_fd, const char *disk_path, int remote_fd, const char *r default: virReportSystemError(EINVAL, _("Unable to process file with flags %d"), (oflags & O_ACCMODE)); - goto cleanup; + goto out; } if (!p.isBlockDev && p.isDirect) { off_t off = lseek(disk_fd, 0, SEEK_CUR); if (off < 0) { virReportSystemError(errno, "%s", _("O_DIRECT needs a seekable file")); - goto cleanup; + goto out; } if (virFileDirectAlign(off) != off) { /* we could write some zeroes, but maybe it is safer to just fail */ virReportSystemError(EINVAL, "%s", _("O_DIRECT attempted with unaligned file pointer")); - goto cleanup; + goto out; } } - total = runIOCopy(p); - if (total < 0) - goto cleanup; - - /* Ensure all data is written */ - if (virFileDataSync(p.fdout) < 0) { - if (errno != EINVAL && errno != EROFS) { - /* fdatasync() may fail on some special FDs, e.g. 
pipes */ - virReportSystemError(errno, _("unable to fsync %s"), p.fdoutname); - goto cleanup; + new_total = runIOCopy(p); + if (new_total < 0) + goto out; + + if (p.idx < 0 && p.isWrite) { + /* without channels we can run the fdatasync here */ + if (virFileDataSync(disk_fd) < 0) { + if (errno != EINVAL && errno != EROFS) { + virReportSystemError(errno, _("unable to fsyncdata %s"), p.fdoutname); + new_total = -1; + goto out; + } } } - ret = 0; - - cleanup: - if (VIR_CLOSE(disk_fd) < 0 && ret == 0) { - virReportSystemError(errno, _("Unable to close %s"), disk_path); - ret = -1; - } - return ret; + out: + return new_total; } #else /* WIN32 */ off_t -virFileDiskCopy(int disk_fd G_GNUC_UNUSED, - const char *disk_path G_GNUC_UNUSED, - int remote_fd G_GNUC_UNUSED, - const char *remote_path G_GNUC_UNUSED) +virFileDiskCopyChannel(int disk_fd G_GNUC_UNUSED, + const char *disk_path G_GNUC_UNUSED, + int remote_fd G_GNUC_UNUSED, + const char *remote_path G_GNUC_UNUSED, + int idx G_GNUC_UNUSED, + int nchannels G_GNUC_UNUSED, + off_t total G_GNUC_UNUSED) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("virFileDiskCopy unsupported on this platform")); + _("virFileDiskCopyChannel unsupported on this platform")); return -1; } #endif /* WIN32 */ + +/** + * virFileDiskCopy: run IO to copy data between storage and a pipe or socket. + * + * @disk_fd: the already open regular file or block device + * @disk_path: the pathname corresponding to disk_fd (for error reporting) + * @remote_fd: the pipe or socket + * Use -1 to auto-choose between STDIN or STDOUT. + * @remote_path: the pathname corresponding to remote_fd (for error reporting) + * + * Note that the direction of the transfer is detected based on the @disk_fd + * file access mode (man 2 open). Therefore @disk_fd must be opened with + * O_RDONLY or O_WRONLY. O_RDWR is not supported. + * + * virFileDiskCopy always closes the file descriptor disk_fd, + * and any error during close(2) is reported and considered a failure. + * + * Returns: bytes transferred or < 0 on failure. + */ + +off_t +virFileDiskCopy(int disk_fd, const char *disk_path, + int remote_fd, const char *remote_path) +{ + return virFileDiskCopyChannel(disk_fd, disk_path, remote_fd, remote_path, + -1, -1, 0); +} diff --git a/src/util/virfile.h b/src/util/virfile.h index 844261e0a4..4d75389c84 100644 --- a/src/util/virfile.h +++ b/src/util/virfile.h @@ -394,3 +394,5 @@ int virFileSetCOW(const char *path, virTristateBool state); off_t virFileDiskCopy(int disk_fd, const char *disk_path, int remote_fd, const char *remote_path); +off_t virFileDiskCopyChannel(int disk_fd, const char *disk_path, int remote_fd, const char *remote_path, + int idx, int nchannels, off_t total); -- 2.26.2
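To make the record interleaving explicit, here is a small illustrative helper (not in the patch) computing where record k of a given channel lands; it is derived directly from the two lseek() calls in runIOCopy above:

#include <stddef.h>
#include <sys/types.h>

#define RECORD_SIZE (1024 * 1024)  /* equals the 1MB I/O buffer size */

/* offset of the k-th record of channel `idx` out of `nchannels`
 * interleaved channels, counted from the start of the data area:
 * each channel starts idx records in, then strides over the records
 * of all the other channels */
static off_t
channel_record_offset(off_t data_start, int idx, int nchannels, size_t k)
{
    return data_start + ((off_t)k * nchannels + (off_t)idx) * RECORD_SIZE;
}

A channel that runs out of data simply stops writing its slots; the resulting gaps are sparse-file holes, so channels of different length do not waste physical disk space.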

On 6/7/22 11:19, Claudio Fontana wrote:
allow interleaved parallel write to a single file, using a record size equal to the io buffer size (1MB).
[...]

+ * virFileDiskCopy always closes the file descriptor disk_fd,
+ * and any error during close(2) is reported and considered a failure.
This is no longer true: the close now needs to be done by the caller, outside of virFileDiskCopy (hence the new VIR_CLOSE(fd) in iohelper.c).

For the save direction, this helper listens on a unix socket which QEMU connects to for multifd migration to a file. For the restore direction, this helper connects to a unix socket QEMU listens at for multifd migration from a file. The file descriptor is passed as a command line parameter, and interleaved channels are used to allow reading/writing to different parts of the file depending on the channel used. Signed-off-by: Claudio Fontana <cfontana@suse.de> --- po/POTFILES | 1 + src/util/meson.build | 16 ++ src/util/multifd-helper.c | 359 ++++++++++++++++++++++++++++++++++++++ 3 files changed, 376 insertions(+) create mode 100644 src/util/multifd-helper.c diff --git a/po/POTFILES b/po/POTFILES index faaba53c8f..97ecbb0ead 100644 --- a/po/POTFILES +++ b/po/POTFILES @@ -241,6 +241,7 @@ src/storage_file/storage_source.c src/storage_file/storage_source_backingstore.c src/test/test_driver.c src/util/iohelper.c +src/util/multifd-helper.c src/util/viralloc.c src/util/virarptable.c src/util/viraudit.c diff --git a/src/util/meson.build b/src/util/meson.build index 07ae94631c..2e08ed8745 100644 --- a/src/util/meson.build +++ b/src/util/meson.build @@ -179,6 +179,11 @@ io_helper_sources = [ 'virfile.c', ] +multifd_helper_sources = [ + 'multifd-helper.c', + 'virfile.c', +] + virt_util_lib = static_library( 'virt_util', [ @@ -220,6 +225,17 @@ if conf.has('WITH_LIBVIRTD') libutil_dep, ], } + virt_helpers += { + 'name': 'libvirt_multifd_helper', + 'sources': [ + files(multifd_helper_sources), + dtrace_gen_headers, + ], + 'deps': [ + acl_dep, + libutil_dep, + ], + } endif util_inc_dir = include_directories('.') diff --git a/src/util/multifd-helper.c b/src/util/multifd-helper.c new file mode 100644 index 0000000000..6d26e2210a --- /dev/null +++ b/src/util/multifd-helper.c @@ -0,0 +1,359 @@ +/* + * multifd-helper.c: listens on Unix socket to perform I/O to multiple files + * + * Copyright (C) 2022 SUSE LLC + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library. If not, see + * <http://www.gnu.org/licenses/>. + * + * This has been written to support QEMU multifd migration to file, + * allowing better use of cpu resources to speed up the save/restore. 
+ */ + +#include <config.h> + +#include <unistd.h> +#include <fcntl.h> +#include <stdlib.h> +#include <sys/types.h> +#include <sys/stat.h> +#include <sys/socket.h> +#include <sys/un.h> + +#include "virfile.h" +#include "virerror.h" +#include "virstring.h" +#include "virgettext.h" + +#define VIR_FROM_THIS VIR_FROM_STORAGE + +typedef struct _multiFdConnData multiFdConnData; +struct _multiFdConnData { + int idx; + int nchannels; + int clientfd; + int filefd; + int oflags; + const char *sun_path; + const char *disk_path; + off_t total; + + GThread *tid; +}; + +typedef struct _multiFdThreadArgs multiFdThreadArgs; +struct _multiFdThreadArgs { + int nchannels; + multiFdConnData *conn; /* contains main fd + nchannels */ + const char *sun_path; /* unix socket name to use for the server */ + const char *disk_path; /* disk pathname */ + struct sockaddr_un serv_addr; + + off_t total; +}; + +static gpointer clientThreadFunc(void *a) +{ + multiFdConnData *c = a; + c->total = virFileDiskCopyChannel(c->filefd, c->disk_path, c->clientfd, c->sun_path, + c->idx, c->nchannels, c->total); + return &c->total; +} + +static off_t waitClientThreads(multiFdConnData *conn, int n) +{ + int idx; + off_t total = 0; + + for (idx = 0; idx < n; idx++) { + multiFdConnData *c = &conn[idx]; + off_t *ctotal; + + ctotal = g_thread_join(c->tid); + if (*ctotal < (off_t)0) { + total = -1; + } else if (total >= 0) { + total += *ctotal; + } + if (VIR_CLOSE(c->clientfd) < 0) { + total = -1; + } + } + return total; +} + +static gpointer loadThreadFunc(void *a) +{ + multiFdThreadArgs *args = a; + int idx; + args->total = -1; + + for (idx = 0; idx < args->nchannels; idx++) { + /* Perform outgoing connections */ + multiFdConnData *c = &args->conn[idx]; + c->clientfd = socket(AF_UNIX, SOCK_STREAM, 0); + if (c->clientfd < 0) { + virReportSystemError(errno, "%s", _("loadThread: socket() failed")); + goto cleanup; + } + if (connect(c->clientfd, (const struct sockaddr *)&args->serv_addr, + sizeof(struct sockaddr_un)) < 0) { + virReportSystemError(errno, "%s", _("loadThread: connect() failed")); + goto cleanup; + } + c->tid = g_thread_new("libvirt_multifd_load", &clientThreadFunc, c); + } + args->total = waitClientThreads(args->conn, args->nchannels); + + cleanup: + for (idx = 0; idx < args->nchannels; idx++) { + multiFdConnData *c = &args->conn[idx]; + VIR_FORCE_CLOSE(c->clientfd); + } + return &args->total; +} + +static gpointer saveThreadFunc(void *a) +{ + multiFdThreadArgs *args = a; + int idx; + const char buf[1] = {'R'}; + int sockfd; + + args->total = -1; + + if ((sockfd = socket(AF_UNIX, SOCK_STREAM, 0)) < 0) { + virReportSystemError(errno, "%s", _("saveThread: socket() failed")); + return &args->total; + } + unlink(args->sun_path); + if (bind(sockfd, (struct sockaddr *)&args->serv_addr, sizeof(args->serv_addr)) < 0) { + virReportSystemError(errno, "%s", _("saveThread: bind() failed")); + goto cleanup; + } + if (listen(sockfd, args->nchannels) < 0) { + virReportSystemError(errno, "%s", _("saveThread: listen() failed")); + goto cleanup; + } + /* signal that the server is ready */ + if (safewrite(STDOUT_FILENO, &buf, 1) != 1) { + virReportSystemError(errno, "%s", _("saveThread: safewrite failed")); + goto cleanup; + } + for (idx = 0; idx < args->nchannels; idx++) { + /* Wait for incoming connection. 
*/ + multiFdConnData *c = &args->conn[idx]; + if ((c->clientfd = accept(sockfd, NULL, NULL)) < 0) { + virReportSystemError(errno, "%s", _("saveThread: accept() failed")); + goto cleanup; + } + c->tid = g_thread_new("libvirt_multifd_save", &clientThreadFunc, c); + } + args->total = waitClientThreads(args->conn, args->nchannels); + + cleanup: + for (idx = 0; idx < args->nchannels; idx++) { + multiFdConnData *c = &args->conn[idx]; + VIR_FORCE_CLOSE(c->clientfd); + } + if (VIR_CLOSE(sockfd) < 0) + args->total = -1; + return &args->total; +} + +static int readCLIA(int disk_fd, int nchannels, multiFdConnData *conn) +{ + int idx; + g_autofree void *base = NULL; /* Location to be freed */ + size_t buflen = virFileDirectAlign(nchannels * 8); + int64_t *buf = virFileDirectBufferNew(&base, buflen); + ssize_t got = saferead(disk_fd, buf, buflen); + + if (got < buflen) + return -1; + + for (idx = 0; idx < nchannels; idx++) { + multiFdConnData *c = &conn[idx]; + c->total = buf[idx]; + } + return 0; +} + +static int writeCLIA(int disk_fd, int nchannels, multiFdConnData *conn) +{ + int idx; + g_autofree void *base = NULL; /* Location to be freed */ + size_t buflen = virFileDirectAlign(nchannels * 8); + int64_t *buf = virFileDirectBufferNew(&base, buflen); + + for (idx = 0; idx < nchannels; idx++) { + multiFdConnData *c = &conn[idx]; + buf[idx] = c->total; + } + if (safewrite(disk_fd, buf, buflen) < buflen) + return -1; + return 0; +} + +static const char *program_name; + +G_GNUC_NORETURN static void +usage(int status) +{ + if (status) { + fprintf(stderr, _("%s: try --help for more details"), program_name); + } else { + fprintf(stderr, _("Usage: %s UNIX_SOCNAME DISK_PATHNAME N MAINFD"), program_name); + } + exit(status); +} + +int +main(int argc, char **argv) +{ + GThread *tid; + GThreadFunc func; + multiFdThreadArgs args = { 0 }; + multiFdConnData *mc; + int idx; + off_t clia_off, data_off, *total; + + program_name = argv[0]; + if (virGettextInitialize() < 0 || + virErrorInitialize() < 0) { + fprintf(stderr, _("%s: initialization failed\n"), program_name); + exit(EXIT_FAILURE); + } + + if (argc > 1 && STREQ(argv[1], "--help")) + usage(EXIT_SUCCESS); + if (argc < 5) + usage(EXIT_FAILURE); + + args.sun_path = argv[1]; + args.disk_path = argv[2]; + if (virStrToLong_i(argv[3], NULL, 10, &args.nchannels) < 0) { + fprintf(stderr, _("%s: malformed number of channels N %s\n"), program_name, argv[3]); + usage(EXIT_FAILURE); + } + /* consider the main channel as just another channel */ + args.nchannels += 1; + args.conn = g_new0(multiFdConnData, args.nchannels); + + /* set main channel connection data */ + mc = &args.conn[0]; + mc->idx = 0; + mc->nchannels = args.nchannels; + if (virStrToLong_i(argv[4], NULL, 10, &mc->filefd) < 0) { + fprintf(stderr, _("%s: malformed MAINFD %s\n"), program_name, argv[4]); + usage(EXIT_FAILURE); + } + +#ifndef F_GETFL +# error "multifd-helper requires F_GETFL parameter of fcntl" +#endif + + mc->oflags = fcntl(mc->filefd, F_GETFL); + mc->sun_path = args.sun_path; + mc->disk_path = args.disk_path; + clia_off = lseek(mc->filefd, 0, SEEK_CUR); + if (clia_off < 0) { + fprintf(stderr, _("%s: failed to seek %s\n"), program_name, args.disk_path); + exit(EXIT_FAILURE); + } + if ((mc->oflags & O_ACCMODE) == O_RDONLY) { + func = loadThreadFunc; + /* set totals from the Channel Length Indicators Area */ + if (readCLIA(mc->filefd, args.nchannels, args.conn) < 0) { + fprintf(stderr, _("%s: failed to read CLIA\n"), program_name); + exit(EXIT_FAILURE); + } + } else { + func = saveThreadFunc; + /* 
skip Channel Length Indicators Area */ + if (lseek(mc->filefd, virFileDirectAlign(args.nchannels * 8), SEEK_CUR) < 0) { + fprintf(stderr, _("%s: failed to seek %s\n"), program_name, args.disk_path); + exit(EXIT_FAILURE); + } + mc->total = 0; + } + if ((data_off = lseek(mc->filefd, 0, SEEK_CUR)) < 0) { + fprintf(stderr, _("%s: failed to seek %s\n"), program_name, args.disk_path); + exit(EXIT_FAILURE); + } + + /* initialize channels */ + for (idx = 1; idx < args.nchannels; idx++) { + multiFdConnData *c = &args.conn[idx]; + c->idx = idx; + c->nchannels = args.nchannels; + c->oflags = mc->oflags & ~(O_TRUNC | O_CREAT); + c->filefd = open(args.disk_path, c->oflags); + if (c->filefd < 0) { + fprintf(stderr, _("%s: failed to open %s\n"), program_name, args.disk_path); + exit(EXIT_FAILURE); + } + c->sun_path = args.sun_path; + c->disk_path = args.disk_path; + if (mc->total == 0) + c->total = 0; + if (lseek(c->filefd, data_off, SEEK_SET) < 0) { + fprintf(stderr, _("%s: failed to seek %s\n"), program_name, args.disk_path); + exit(EXIT_FAILURE); + } + } + + /* initialize server address structure */ + memset(&args.serv_addr, 0, sizeof(args.serv_addr)); + args.serv_addr.sun_family = AF_UNIX; + virStrcpyStatic(args.serv_addr.sun_path, args.sun_path); + + tid = g_thread_new("libvirt_multifd_func", func, &args); + + total = g_thread_join(tid); + if (*total < 0) { + exit(EXIT_FAILURE); + } + if (func == saveThreadFunc) { + /* write CLIA */ + if (lseek(mc->filefd, clia_off, SEEK_SET) < 0) { + fprintf(stderr, _("%s: failed to seek %s\n"), program_name, args.disk_path); + exit(EXIT_FAILURE); + } + /* set totals into the Channel Length Indicators Area */ + if (writeCLIA(mc->filefd, args.nchannels, args.conn) < 0) { + fprintf(stderr, _("%s: failed to write CLIA\n"), program_name); + exit(EXIT_FAILURE); + } + if (lseek(mc->filefd, 0, SEEK_END) < 0) { + fprintf(stderr, _("%s: failed to seek %s\n"), program_name, args.disk_path); + exit(EXIT_FAILURE); + } + if (virFileDataSync(mc->filefd) < 0) { + if (errno != EINVAL && errno != EROFS) { + fprintf(stderr, _("%s: failed to fsyncdata %s\n"), program_name, args.disk_path); + exit(EXIT_FAILURE); + } + } + } + /* close up */ + for (idx = 0; idx < args.nchannels; idx++) { + multiFdConnData *c = &args.conn[idx]; + if (VIR_CLOSE(c->filefd) < 0) { + fprintf(stderr, _("%s: failed to close %s\n"), program_name, args.disk_path); + exit(EXIT_FAILURE); + } + } + exit(EXIT_SUCCESS); +} -- 2.26.2
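Putting the helper and virFileDiskCopyChannel together, the resulting single-file layout looks roughly as follows (a sketch inferred from the code above, not a formal on-disk specification):

+----------------------------+
| save header + domain XML   |  written by the driver before the
+----------------------------+  helper starts      <- clia_off
| CLIA: int64_t totals[n+1]  |  per-channel byte counts, padded to
+----------------------------+  virFileDirectAlign(), n+1 channels
| record 0  (channel 0)      |  <- data_off: 1MB records, assigned
| record 1  (channel 1)      |     round-robin to the n+1 channels;
| ...                        |     shorter channels leave holes
+----------------------------+

On save the CLIA is written last, once every channel's total is known; on restore it is read first, so each channel knows how many bytes to feed to its socket.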

add nchannels to the struct and adapt the APIs slightly. Add a new API virQEMUSaveFdAddChannels to set the number of channels, which might not be known at Init time. Signed-off-by: Claudio Fontana <cfontana@suse.de> --- src/qemu/qemu_driver.c | 10 +++++----- src/qemu/qemu_saveimage.c | 27 ++++++++++++++++++++++++--- src/qemu/qemu_saveimage.h | 6 +++++- 3 files changed, 34 insertions(+), 9 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 986bbf9e58..d403ff6bf2 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -5883,7 +5883,7 @@ qemuDomainRestoreInternal(virConnectPtr conn, } oflags |= O_DIRECT; } - if (virQEMUSaveFdInit(&saveFd, path, oflags, cfg) < 0) + if (virQEMUSaveFdInit(&saveFd, path, oflags, cfg, false) < 0) return -1; if (qemuSaveImageOpen(driver, NULL, &def, &data, false, &saveFd) < 0) goto cleanup; @@ -6017,7 +6017,7 @@ qemuDomainSaveImageGetXMLDesc(virConnectPtr conn, const char *path, virCheckFlags(VIR_DOMAIN_SAVE_IMAGE_XML_SECURE, NULL); - if (virQEMUSaveFdInit(&saveFd, path, O_RDONLY, cfg) < 0) + if (virQEMUSaveFdInit(&saveFd, path, O_RDONLY, cfg, false) < 0) return NULL; if (qemuSaveImageOpen(driver, NULL, &def, &data, false, &saveFd) < 0) goto cleanup; @@ -6055,7 +6055,7 @@ qemuDomainSaveImageDefineXML(virConnectPtr conn, const char *path, else if (flags & VIR_DOMAIN_SAVE_PAUSED) state = 0; - if (virQEMUSaveFdInit(&saveFd, path, O_RDWR, cfg) < 0) + if (virQEMUSaveFdInit(&saveFd, path, O_RDWR, cfg, false) < 0) return -1; if (qemuSaveImageOpen(driver, NULL, &def, &data, false, &saveFd) < 0) goto cleanup; @@ -6136,7 +6136,7 @@ qemuDomainManagedSaveGetXMLDesc(virDomainPtr dom, unsigned int flags) goto cleanup; } - if (virQEMUSaveFdInit(&saveFd, path, O_RDONLY, cfg) < 0) + if (virQEMUSaveFdInit(&saveFd, path, O_RDONLY, cfg, false) < 0) goto cleanup; if (qemuSaveImageOpen(driver, priv->qemuCaps, &def, &data, false, &saveFd) < 0) @@ -6212,7 +6212,7 @@ qemuDomainObjRestore(virConnectPtr conn, } oflags |= O_DIRECT; } - if (virQEMUSaveFdInit(&saveFd, path, oflags, cfg) < 0) + if (virQEMUSaveFdInit(&saveFd, path, oflags, cfg, false) < 0) goto cleanup; ret = qemuSaveImageOpen(driver, NULL, &def, &data, true, &saveFd); if (ret < 0) { diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c index 70ded3c439..6418114d8a 100644 --- a/src/qemu/qemu_saveimage.c +++ b/src/qemu/qemu_saveimage.c @@ -403,12 +403,13 @@ qemuSaveImageGetCompressionCommand(virQEMUSaveFormat compression) * @base: the file name * @oflags the file descriptor open flags * @cfg: the driver config + * @parallel: whether parallel save/restore is enabled * * Returns -1 on error, 0 on success, * and in both cases virQEMUSaveFdFini must be called to free resources. */ int virQEMUSaveFdInit(virQEMUSaveFd *saveFd, const char *base, - int oflags, virQEMUDriverConfig *cfg) + int oflags, virQEMUDriverConfig *cfg, bool parallel) { unsigned int wrapperFlags = VIR_FILE_WRAPPER_NON_BLOCKING; bool isCreat = oflags & O_CREAT; @@ -430,11 +431,14 @@ int virQEMUSaveFdInit(virQEMUSaveFd *saveFd, const char *base, } if (saveFd->fd < 0) return -1; + + saveFd->nchannels = 0; /* + * iohelper Wrapper is never required for multifd parallel save. * For O_CREAT, we always add the wrapper, * and for !O_CREAT, we only add the wrapper if using O_DIRECT. 
*/ - if (isDirect || isCreat) { + if (!parallel && (isDirect || isCreat)) { saveFd->wrapper = virFileWrapperFdNew(&saveFd->fd, saveFd->path, wrapperFlags); if (!saveFd->wrapper) return -1; @@ -442,6 +446,23 @@ int virQEMUSaveFdInit(virQEMUSaveFd *saveFd, const char *base, return 0; } +/* + * virQEMUSaveFdAddChannels: add multifd channels to a virQEMUSaveFd + * + * @saveFd: the structure to add channels to. + * @nchannels:number of multifd channels, must be > 0. + * + * Returns -1 on error, 0 on success. + */ +int virQEMUSaveFdAddChannels(virQEMUSaveFd *saveFd, int nchannels) +{ + if (nchannels > 0) { + saveFd->nchannels = nchannels; + return 0; + } + return -1; +} + /* * virQEMUSaveFdClose: close a virQEMUSaveFd descriptor with normal close. * @@ -539,7 +560,7 @@ qemuSaveImageCreate(virQEMUDriver *driver, oflags |= O_DIRECT; } - if (virQEMUSaveFdInit(&saveFd, path, oflags, cfg) < 0) + if (virQEMUSaveFdInit(&saveFd, path, oflags, cfg, false) < 0) goto cleanup; if (qemuSecuritySetImageFDLabel(driver->securityManager, vm->def, saveFd.fd) < 0) goto cleanup; diff --git a/src/qemu/qemu_saveimage.h b/src/qemu/qemu_saveimage.h index 9b04af2fb9..dcc9aa7d63 100644 --- a/src/qemu/qemu_saveimage.h +++ b/src/qemu/qemu_saveimage.h @@ -62,17 +62,21 @@ struct _virQEMUSaveFd { bool need_unlink; virFileWrapperFd *wrapper; virQEMUDriverConfig *cfg; + int nchannels; }; #define QEMU_SAVEFD_INVALID (virQEMUSaveFd) { \ .path = NULL, .fd = -1, .oflags = 0, \ .need_unlink = false, .wrapper = NULL, .cfg = NULL, \ + .nchannels = 0, \ } int virQEMUSaveFdInit(virQEMUSaveFd *saveFd, const char *base, - int oflags, virQEMUDriverConfig *cfg) + int oflags, virQEMUDriverConfig *cfg, bool parallel) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(4); +int virQEMUSaveFdAddChannels(virQEMUSaveFd *saveFd, int nchannels); + int virQEMUSaveFdClose(virQEMUSaveFd *saveFd, virDomainObj *vm); int virQEMUSaveFdFini(virQEMUSaveFd *saveFd, virDomainObj *vm, int ret); -- 2.26.2
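In other words, the intended calling sequence becomes (a fragment-level sketch; the parallel flag and the nconn plumbing only land in the following patches):

/* inside a qemuSaveImageCreate-style caller */
virQEMUSaveFd saveFd = QEMU_SAVEFD_INVALID;
bool parallel = !!(flags & VIR_DOMAIN_SAVE_PARALLEL);

/* no iohelper wrapper is installed when parallel save is requested */
if (virQEMUSaveFdInit(&saveFd, path, oflags, cfg, parallel) < 0)
    return -1;

/* the channel count may only become known after Init, e.g. from params */
if (nconn > 0 && virQEMUSaveFdAddChannels(&saveFd, nconn) < 0)
    goto cleanup;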

prepare to use the multifd helper. Signed-off-by: Claudio Fontana <cfontana@suse.de> --- src/qemu/qemu_saveimage.c | 41 ++++++++++++++++++++++++++++++++++----- 1 file changed, 36 insertions(+), 5 deletions(-) diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c index 6418114d8a..8af9b92b35 100644 --- a/src/qemu/qemu_saveimage.c +++ b/src/qemu/qemu_saveimage.c @@ -17,6 +17,7 @@ */ #include <config.h> +#include <configmake.h> #include "qemu_saveimage.h" #include "qemu_domain.h" @@ -542,7 +543,7 @@ qemuSaveImageCreate(virQEMUDriver *driver, const char *path, virQEMUSaveData *data, virCommand *compressor, - int nconn G_GNUC_UNUSED, + int nconn, unsigned int flags, virDomainAsyncJob asyncJob) { @@ -566,10 +567,40 @@ qemuSaveImageCreate(virQEMUDriver *driver, goto cleanup; if (virQEMUSaveDataWrite(data, saveFd.fd, saveFd.path) < 0) goto cleanup; - - /* Perform the migration */ - if (qemuMigrationSrcToFile(driver, vm, saveFd.fd, compressor, asyncJob) < 0) - goto cleanup; + if (nconn > 0) { + if (virQEMUSaveFdAddChannels(&saveFd, nconn) < 0) + goto cleanup; + } + if (flags & VIR_DOMAIN_SAVE_PARALLEL) { + g_autoptr(virCommand) cmd = NULL; + g_autofree char *helper_path = NULL; + qemuDomainObjPrivate *priv = vm->privateData; + g_autofree char *sun_path = g_strdup_printf("%s/save-multifd.sock", priv->libDir); + char buf[1]; + int helper_out = -1; + if (!(helper_path = virFileFindResource("libvirt_multifd_helper", + abs_top_builddir "/src", + LIBEXECDIR))) + goto cleanup; + cmd = virCommandNewArgList(helper_path, sun_path, saveFd.path, NULL); + virCommandAddArgFormat(cmd, "%d", nconn); + virCommandAddArgFormat(cmd, "%d", saveFd.fd); + virCommandPassFD(cmd, saveFd.fd, 0); + virCommandSetOutputFD(cmd, &helper_out); /* should create pipe automagically */ + if (virCommandRunAsync(cmd, NULL) < 0) + goto cleanup; + if (saferead(helper_out, &buf, 1) != 1 || buf[0] != 'R') + goto cleanup; + if (chown(sun_path, cfg->user, cfg->group) < 0) + goto cleanup; + /* still using single fd migration for now */ + if (qemuMigrationSrcToFile(driver, vm, saveFd.fd, compressor, asyncJob) < 0) + goto cleanup; + } else { + /* Perform non-parallel migration to file */ + if (qemuMigrationSrcToFile(driver, vm, saveFd.fd, compressor, asyncJob) < 0) + goto cleanup; + } if (virQEMUSaveFdClose(&saveFd, vm) < 0) goto cleanup; -- 2.26.2

Signed-off-by: Claudio Fontana <cfontana@suse.de> Reviewed-by: Ani Sinha <ani@anisinha.ca> --- src/qemu/qemu_capabilities.c | 2 ++ src/qemu/qemu_capabilities.h | 1 + tests/qemucapabilitiesdata/caps_4.0.0.aarch64.xml | 1 + tests/qemucapabilitiesdata/caps_4.0.0.ppc64.xml | 1 + tests/qemucapabilitiesdata/caps_4.0.0.riscv32.xml | 1 + tests/qemucapabilitiesdata/caps_4.0.0.riscv64.xml | 1 + tests/qemucapabilitiesdata/caps_4.0.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_4.0.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_4.1.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_4.2.0.aarch64.xml | 1 + tests/qemucapabilitiesdata/caps_4.2.0.ppc64.xml | 1 + tests/qemucapabilitiesdata/caps_4.2.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_4.2.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_5.0.0.aarch64.xml | 1 + tests/qemucapabilitiesdata/caps_5.0.0.ppc64.xml | 1 + tests/qemucapabilitiesdata/caps_5.0.0.riscv64.xml | 1 + tests/qemucapabilitiesdata/caps_5.0.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_5.1.0.sparc.xml | 1 + tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_5.2.0.aarch64.xml | 1 + tests/qemucapabilitiesdata/caps_5.2.0.ppc64.xml | 1 + tests/qemucapabilitiesdata/caps_5.2.0.riscv64.xml | 1 + tests/qemucapabilitiesdata/caps_5.2.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_6.0.0.aarch64.xml | 1 + tests/qemucapabilitiesdata/caps_6.0.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_6.2.0.aarch64.xml | 1 + tests/qemucapabilitiesdata/caps_6.2.0.ppc64.xml | 1 + tests/qemucapabilitiesdata/caps_6.2.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_7.0.0.aarch64.xml | 1 + tests/qemucapabilitiesdata/caps_7.0.0.ppc64.xml | 1 + tests/qemucapabilitiesdata/caps_7.0.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_7.1.0.x86_64.xml | 1 + 35 files changed, 36 insertions(+) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index d0c8217825..44e3f9b81f 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -676,6 +676,7 @@ VIR_ENUM_IMPL(virQEMUCaps, /* 430 */ "chardev.qemu-vdagent", /* QEMU_CAPS_CHARDEV_QEMU_VDAGENT */ "display-dbus", /* QEMU_CAPS_DISPLAY_DBUS */ + "migrate-multifd", /* QEMU_CAPS_MIGRATE_MULTIFD */ ); @@ -1234,6 +1235,7 @@ struct virQEMUCapsStringFlags virQEMUCapsCommands[] = { struct virQEMUCapsStringFlags virQEMUCapsMigration[] = { { "rdma-pin-all", QEMU_CAPS_MIGRATE_RDMA }, + { "multifd", QEMU_CAPS_MIGRATE_MULTIFD }, }; /* Use virQEMUCapsQMPSchemaQueries for querying parameters of events */ diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index 26cd655db3..4c7e031e28 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -651,6 +651,7 @@ typedef enum { /* virQEMUCapsFlags grouping marker for syntax-check */ /* 430 */ QEMU_CAPS_CHARDEV_QEMU_VDAGENT, /* -chardev qemu-vdagent */ QEMU_CAPS_DISPLAY_DBUS, /* -display dbus */ + QEMU_CAPS_MIGRATE_MULTIFD, /* migrate can set multifd parameter */ QEMU_CAPS_LAST /* this must always be the last item */ } virQEMUCapsFlags; diff --git a/tests/qemucapabilitiesdata/caps_4.0.0.aarch64.xml b/tests/qemucapabilitiesdata/caps_4.0.0.aarch64.xml index 7e0b8fbddf..c3df1b403d 100644 --- a/tests/qemucapabilitiesdata/caps_4.0.0.aarch64.xml +++ b/tests/qemucapabilitiesdata/caps_4.0.0.aarch64.xml @@ -147,6 +147,7 @@ <flag name='input-linux'/> 
<flag name='query-display-options'/> <flag name='virtio-blk.queue-size'/> + <flag name='migrate-multifd'/> <version>4000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>61700240</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_4.0.0.ppc64.xml b/tests/qemucapabilitiesdata/caps_4.0.0.ppc64.xml index 19bbbd1de3..17b1c6dc29 100644 --- a/tests/qemucapabilitiesdata/caps_4.0.0.ppc64.xml +++ b/tests/qemucapabilitiesdata/caps_4.0.0.ppc64.xml @@ -152,6 +152,7 @@ <flag name='input-linux'/> <flag name='query-display-options'/> <flag name='virtio-blk.queue-size'/> + <flag name='migrate-multifd'/> <version>4000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>42900240</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_4.0.0.riscv32.xml b/tests/qemucapabilitiesdata/caps_4.0.0.riscv32.xml index ef6f04f54b..8c0ca3cd92 100644 --- a/tests/qemucapabilitiesdata/caps_4.0.0.riscv32.xml +++ b/tests/qemucapabilitiesdata/caps_4.0.0.riscv32.xml @@ -144,6 +144,7 @@ <flag name='input-linux'/> <flag name='query-display-options'/> <flag name='virtio-blk.queue-size'/> + <flag name='migrate-multifd'/> <version>4000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>0</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_4.0.0.riscv64.xml b/tests/qemucapabilitiesdata/caps_4.0.0.riscv64.xml index 7c65aff290..08b1f87f08 100644 --- a/tests/qemucapabilitiesdata/caps_4.0.0.riscv64.xml +++ b/tests/qemucapabilitiesdata/caps_4.0.0.riscv64.xml @@ -144,6 +144,7 @@ <flag name='input-linux'/> <flag name='query-display-options'/> <flag name='virtio-blk.queue-size'/> + <flag name='migrate-multifd'/> <version>4000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>0</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_4.0.0.s390x.xml b/tests/qemucapabilitiesdata/caps_4.0.0.s390x.xml index 9b5ed96ba3..a024383b72 100644 --- a/tests/qemucapabilitiesdata/caps_4.0.0.s390x.xml +++ b/tests/qemucapabilitiesdata/caps_4.0.0.s390x.xml @@ -114,6 +114,7 @@ <flag name='input-linux'/> <flag name='query-display-options'/> <flag name='virtio-blk.queue-size'/> + <flag name='migrate-multifd'/> <version>4000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>39100240</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_4.0.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_4.0.0.x86_64.xml index 3cf6a66389..d110bd0eaf 100644 --- a/tests/qemucapabilitiesdata/caps_4.0.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_4.0.0.x86_64.xml @@ -187,6 +187,7 @@ <flag name='input-linux'/> <flag name='query-display-options'/> <flag name='virtio-blk.queue-size'/> + <flag name='migrate-multifd'/> <version>4000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100240</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_4.1.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_4.1.0.x86_64.xml index 5daa7bda75..50e80aab9f 100644 --- a/tests/qemucapabilitiesdata/caps_4.1.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_4.1.0.x86_64.xml @@ -194,6 +194,7 @@ <flag name='input-linux'/> <flag name='query-display-options'/> <flag name='virtio-blk.queue-size'/> + <flag name='migrate-multifd'/> <version>4001000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100241</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_4.2.0.aarch64.xml b/tests/qemucapabilitiesdata/caps_4.2.0.aarch64.xml index 833bf955f9..250bf8a433 100644 --- a/tests/qemucapabilitiesdata/caps_4.2.0.aarch64.xml +++ b/tests/qemucapabilitiesdata/caps_4.2.0.aarch64.xml @@ -162,6 +162,7 @@ 
<flag name='input-linux'/> <flag name='query-display-options'/> <flag name='virtio-blk.queue-size'/> + <flag name='migrate-multifd'/> <version>4001050</version> <kvmVersion>0</kvmVersion> <microcodeVersion>61700242</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_4.2.0.ppc64.xml b/tests/qemucapabilitiesdata/caps_4.2.0.ppc64.xml index 1586f28ca5..06432c25bd 100644 --- a/tests/qemucapabilitiesdata/caps_4.2.0.ppc64.xml +++ b/tests/qemucapabilitiesdata/caps_4.2.0.ppc64.xml @@ -159,6 +159,7 @@ <flag name='input-linux'/> <flag name='query-display-options'/> <flag name='virtio-blk.queue-size'/> + <flag name='migrate-multifd'/> <version>4001050</version> <kvmVersion>0</kvmVersion> <microcodeVersion>42900242</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_4.2.0.s390x.xml b/tests/qemucapabilitiesdata/caps_4.2.0.s390x.xml index ce5b782afd..1bd6f44f16 100644 --- a/tests/qemucapabilitiesdata/caps_4.2.0.s390x.xml +++ b/tests/qemucapabilitiesdata/caps_4.2.0.s390x.xml @@ -127,6 +127,7 @@ <flag name='input-linux'/> <flag name='query-display-options'/> <flag name='virtio-blk.queue-size'/> + <flag name='migrate-multifd'/> <version>4002000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>39100242</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_4.2.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_4.2.0.x86_64.xml index 9ee4c0534d..d96f13b650 100644 --- a/tests/qemucapabilitiesdata/caps_4.2.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_4.2.0.x86_64.xml @@ -205,6 +205,7 @@ <flag name='input-linux'/> <flag name='query-display-options'/> <flag name='virtio-blk.queue-size'/> + <flag name='migrate-multifd'/> <version>4002000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100242</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_5.0.0.aarch64.xml b/tests/qemucapabilitiesdata/caps_5.0.0.aarch64.xml index 29ee31473f..2a7f5074bc 100644 --- a/tests/qemucapabilitiesdata/caps_5.0.0.aarch64.xml +++ b/tests/qemucapabilitiesdata/caps_5.0.0.aarch64.xml @@ -174,6 +174,7 @@ <flag name='virtio-blk.queue-size'/> <flag name='memory-backend-file.prealloc-threads'/> <flag name='virtio-iommu-pci'/> + <flag name='migrate-multifd'/> <version>5000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>61700241</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_5.0.0.ppc64.xml b/tests/qemucapabilitiesdata/caps_5.0.0.ppc64.xml index 1fdec901a6..d2c083e8f7 100644 --- a/tests/qemucapabilitiesdata/caps_5.0.0.ppc64.xml +++ b/tests/qemucapabilitiesdata/caps_5.0.0.ppc64.xml @@ -180,6 +180,7 @@ <flag name='virtio-blk.queue-size'/> <flag name='memory-backend-file.prealloc-threads'/> <flag name='virtio-iommu-pci'/> + <flag name='migrate-multifd'/> <version>5000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>42900241</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_5.0.0.riscv64.xml b/tests/qemucapabilitiesdata/caps_5.0.0.riscv64.xml index 3b58e7fece..933b4870cd 100644 --- a/tests/qemucapabilitiesdata/caps_5.0.0.riscv64.xml +++ b/tests/qemucapabilitiesdata/caps_5.0.0.riscv64.xml @@ -166,6 +166,7 @@ <flag name='virtio-blk.queue-size'/> <flag name='memory-backend-file.prealloc-threads'/> <flag name='virtio-iommu-pci'/> + <flag name='migrate-multifd'/> <version>5000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>0</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_5.0.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_5.0.0.x86_64.xml index bee5a84cf9..9bb48e1982 100644 --- 
a/tests/qemucapabilitiesdata/caps_5.0.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_5.0.0.x86_64.xml @@ -214,6 +214,7 @@ <flag name='virtio-blk.queue-size'/> <flag name='memory-backend-file.prealloc-threads'/> <flag name='virtio-iommu-pci'/> + <flag name='migrate-multifd'/> <version>5000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100241</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_5.1.0.sparc.xml b/tests/qemucapabilitiesdata/caps_5.1.0.sparc.xml index e6a3ed5ec0..f01cd25278 100644 --- a/tests/qemucapabilitiesdata/caps_5.1.0.sparc.xml +++ b/tests/qemucapabilitiesdata/caps_5.1.0.sparc.xml @@ -86,6 +86,7 @@ <flag name='input-linux'/> <flag name='query-display-options'/> <flag name='memory-backend-file.prealloc-threads'/> + <flag name='migrate-multifd'/> <version>5001000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>0</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml index 070f64cb1c..f4446674a4 100644 --- a/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml @@ -218,6 +218,7 @@ <flag name='memory-backend-file.prealloc-threads'/> <flag name='virtio-iommu-pci'/> <flag name='virtio-net.rss'/> + <flag name='migrate-multifd'/> <version>5001000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100242</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_5.2.0.aarch64.xml b/tests/qemucapabilitiesdata/caps_5.2.0.aarch64.xml index 8e17532f3a..f3e3640657 100644 --- a/tests/qemucapabilitiesdata/caps_5.2.0.aarch64.xml +++ b/tests/qemucapabilitiesdata/caps_5.2.0.aarch64.xml @@ -181,6 +181,7 @@ <flag name='memory-backend-file.prealloc-threads'/> <flag name='virtio-iommu-pci'/> <flag name='virtio-net.rss'/> + <flag name='migrate-multifd'/> <version>5002000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>61700243</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_5.2.0.ppc64.xml b/tests/qemucapabilitiesdata/caps_5.2.0.ppc64.xml index df02e264d7..86d021c2fc 100644 --- a/tests/qemucapabilitiesdata/caps_5.2.0.ppc64.xml +++ b/tests/qemucapabilitiesdata/caps_5.2.0.ppc64.xml @@ -185,6 +185,7 @@ <flag name='memory-backend-file.prealloc-threads'/> <flag name='virtio-iommu-pci'/> <flag name='virtio-net.rss'/> + <flag name='migrate-multifd'/> <version>5002000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>42900243</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_5.2.0.riscv64.xml b/tests/qemucapabilitiesdata/caps_5.2.0.riscv64.xml index 7cb4383693..2079257759 100644 --- a/tests/qemucapabilitiesdata/caps_5.2.0.riscv64.xml +++ b/tests/qemucapabilitiesdata/caps_5.2.0.riscv64.xml @@ -171,6 +171,7 @@ <flag name='memory-backend-file.prealloc-threads'/> <flag name='virtio-iommu-pci'/> <flag name='virtio-net.rss'/> + <flag name='migrate-multifd'/> <version>5002000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>0</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_5.2.0.s390x.xml b/tests/qemucapabilitiesdata/caps_5.2.0.s390x.xml index 30d10236e9..d800e35096 100644 --- a/tests/qemucapabilitiesdata/caps_5.2.0.s390x.xml +++ b/tests/qemucapabilitiesdata/caps_5.2.0.s390x.xml @@ -139,6 +139,7 @@ <flag name='memory-backend-file.prealloc-threads'/> <flag name='virtio-iommu-pci'/> <flag name='virtio-net.rss'/> + <flag name='migrate-multifd'/> <version>5002000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>39100243</microcodeVersion> diff --git 
a/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml index cad4ed40e6..f189e299c3 100644 --- a/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml @@ -222,6 +222,7 @@ <flag name='memory-backend-file.prealloc-threads'/> <flag name='virtio-iommu-pci'/> <flag name='virtio-net.rss'/> + <flag name='migrate-multifd'/> <version>5002000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100243</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_6.0.0.aarch64.xml b/tests/qemucapabilitiesdata/caps_6.0.0.aarch64.xml index 4b4cc2d3aa..e86a2cf5d6 100644 --- a/tests/qemucapabilitiesdata/caps_6.0.0.aarch64.xml +++ b/tests/qemucapabilitiesdata/caps_6.0.0.aarch64.xml @@ -189,6 +189,7 @@ <flag name='memory-backend-file.prealloc-threads'/> <flag name='virtio-iommu-pci'/> <flag name='virtio-net.rss'/> + <flag name='migrate-multifd'/> <version>6000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>61700242</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_6.0.0.s390x.xml b/tests/qemucapabilitiesdata/caps_6.0.0.s390x.xml index 06543071aa..bbc5825def 100644 --- a/tests/qemucapabilitiesdata/caps_6.0.0.s390x.xml +++ b/tests/qemucapabilitiesdata/caps_6.0.0.s390x.xml @@ -147,6 +147,7 @@ <flag name='memory-backend-file.prealloc-threads'/> <flag name='virtio-iommu-pci'/> <flag name='virtio-net.rss'/> + <flag name='migrate-multifd'/> <version>6000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>39100242</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml index 8c61bf8a84..496dd5564c 100644 --- a/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml @@ -231,6 +231,7 @@ <flag name='memory-backend-file.prealloc-threads'/> <flag name='virtio-iommu-pci'/> <flag name='virtio-net.rss'/> + <flag name='migrate-multifd'/> <version>6000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100242</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml index afd8f606eb..fab5e40e35 100644 --- a/tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml @@ -236,6 +236,7 @@ <flag name='virtio-iommu-pci'/> <flag name='virtio-net.rss'/> <flag name='chardev.qemu-vdagent'/> + <flag name='migrate-multifd'/> <version>6001000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100243</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_6.2.0.aarch64.xml b/tests/qemucapabilitiesdata/caps_6.2.0.aarch64.xml index 86fc46918f..a83b8c8d77 100644 --- a/tests/qemucapabilitiesdata/caps_6.2.0.aarch64.xml +++ b/tests/qemucapabilitiesdata/caps_6.2.0.aarch64.xml @@ -201,6 +201,7 @@ <flag name='virtio-iommu-pci'/> <flag name='virtio-net.rss'/> <flag name='chardev.qemu-vdagent'/> + <flag name='migrate-multifd'/> <version>6001050</version> <kvmVersion>0</kvmVersion> <microcodeVersion>61700244</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_6.2.0.ppc64.xml b/tests/qemucapabilitiesdata/caps_6.2.0.ppc64.xml index 983b54430d..bfacae9b3b 100644 --- a/tests/qemucapabilitiesdata/caps_6.2.0.ppc64.xml +++ b/tests/qemucapabilitiesdata/caps_6.2.0.ppc64.xml @@ -196,6 +196,7 @@ <flag name='memory-backend-file.prealloc-threads'/> <flag name='virtio-iommu-pci'/> <flag name='virtio-net.rss'/> + <flag name='migrate-multifd'/> 
<version>6002000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>42900244</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_6.2.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_6.2.0.x86_64.xml index 19605d93ae..1293cd1bf1 100644 --- a/tests/qemucapabilitiesdata/caps_6.2.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_6.2.0.x86_64.xml @@ -238,6 +238,7 @@ <flag name='virtio-iommu-pci'/> <flag name='virtio-net.rss'/> <flag name='chardev.qemu-vdagent'/> + <flag name='migrate-multifd'/> <version>6002000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100244</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_7.0.0.aarch64.xml b/tests/qemucapabilitiesdata/caps_7.0.0.aarch64.xml index e24e2235fb..16626a1342 100644 --- a/tests/qemucapabilitiesdata/caps_7.0.0.aarch64.xml +++ b/tests/qemucapabilitiesdata/caps_7.0.0.aarch64.xml @@ -209,6 +209,7 @@ <flag name='virtio-iommu.boot-bypass'/> <flag name='virtio-net.rss'/> <flag name='chardev.qemu-vdagent'/> + <flag name='migrate-multifd'/> <version>6002092</version> <kvmVersion>0</kvmVersion> <microcodeVersion>61700243</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_7.0.0.ppc64.xml b/tests/qemucapabilitiesdata/caps_7.0.0.ppc64.xml index 83e0f50e3a..9f4242c5c0 100644 --- a/tests/qemucapabilitiesdata/caps_7.0.0.ppc64.xml +++ b/tests/qemucapabilitiesdata/caps_7.0.0.ppc64.xml @@ -213,6 +213,7 @@ <flag name='virtio-iommu.boot-bypass'/> <flag name='virtio-net.rss'/> <flag name='chardev.qemu-vdagent'/> + <flag name='migrate-multifd'/> <version>7000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>42900243</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_7.0.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_7.0.0.x86_64.xml index 05f844fd5b..1d003a6fb7 100644 --- a/tests/qemucapabilitiesdata/caps_7.0.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_7.0.0.x86_64.xml @@ -243,6 +243,7 @@ <flag name='virtio-net.rss'/> <flag name='chardev.qemu-vdagent'/> <flag name='display-dbus'/> + <flag name='migrate-multifd'/> <version>7000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100243</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_7.1.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_7.1.0.x86_64.xml index bdc613a54a..80762d9fe9 100644 --- a/tests/qemucapabilitiesdata/caps_7.1.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_7.1.0.x86_64.xml @@ -243,6 +243,7 @@ <flag name='virtio-net.rss'/> <flag name='chardev.qemu-vdagent'/> <flag name='display-dbus'/> + <flag name='migrate-multifd'/> <version>7000050</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100244</microcodeVersion> -- 2.26.2

add both the multifd compression format and the number of multifd channels to the save image header Signed-off-by: Claudio Fontana <cfontana@suse.de> --- src/qemu/qemu_saveimage.c | 17 +++++++++++++++++ src/qemu/qemu_saveimage.h | 4 +++- 2 files changed, 20 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c index 8af9b92b35..7d8527596a 100644 --- a/src/qemu/qemu_saveimage.c +++ b/src/qemu/qemu_saveimage.c @@ -67,6 +67,23 @@ VIR_ENUM_IMPL(qemuSaveCompression, "lzop", ); +typedef enum { + QEMU_SAVE_MULTIFD_COMP_NONE = 0, + QEMU_SAVE_MULTIFD_COMP_ZLIB = 1, + QEMU_SAVE_MULTIFD_COMP_ZSTD = 2, + + /* used for the on-disk format, do not change/re-use numbers */ + QEMU_SAVE_MULTIFD_COMP_LAST +} virQEMUSaveMultiFdComp; + +VIR_ENUM_DECL(qemuSaveMultiFdComp); +VIR_ENUM_IMPL(qemuSaveMultiFdComp, + QEMU_SAVE_MULTIFD_COMP_LAST, + "none", + "zlib", + "zstd", +); + static inline void qemuSaveImageBswapHeader(virQEMUSaveHeader *hdr) { diff --git a/src/qemu/qemu_saveimage.h b/src/qemu/qemu_saveimage.h index dcc9aa7d63..2af873dea9 100644 --- a/src/qemu/qemu_saveimage.h +++ b/src/qemu/qemu_saveimage.h @@ -42,7 +42,9 @@ struct _virQEMUSaveHeader { uint32_t was_running; /* 4 bytes */ uint32_t compressed; /* 4 bytes */ uint32_t cookieOffset; /* 4 bytes */ - uint32_t unused[14]; /* 56 bytes */ + uint16_t multifd_channels; /* 2 bytes */ + uint16_t multifd_comp; /* 2 bytes */ + uint32_t unused[13]; /* 52 bytes */ } ATTRIBUTE_PACKED; /* = 92 bytes */ -- 2.26.2
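The two new uint16_t fields occupy exactly the space of one of the former uint32_t padding slots, so the packed header keeps its 92-byte on-disk size and images written by older libvirt remain readable. A standalone sketch of the byte accounting, using a hypothetical mirror of the struct tail shown above rather than the real libvirt type:

    #include <assert.h>
    #include <stdint.h>

    /* Mirrors only the tail of virQEMUSaveHeader from the hunk above;
     * the fields preceding it account for the remaining 24 of 92 bytes. */
    struct tail {
        uint32_t was_running;       /* 4 bytes */
        uint32_t compressed;        /* 4 bytes */
        uint32_t cookieOffset;      /* 4 bytes */
        uint16_t multifd_channels;  /* 2 bytes */
        uint16_t multifd_comp;      /* 2 bytes */
        uint32_t unused[13];        /* 52 bytes */
    } __attribute__((packed));

    int main(void)
    {
        /* 4 + 4 + 4 + 2 + 2 + 52 == 68, the same as the old
         * 4 + 4 + 4 + 56 layout with unused[14]. */
        assert(sizeof(struct tail) == 68);
        return 0;
    }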

similarly to qemuMigrationParamsSetULL, we need to be able to set fields from qemu_saveimage. Signed-off-by: Claudio Fontana <cfontana@suse.de> --- src/qemu/qemu_migration_params.c | 22 ++++++++++++++++++++++ src/qemu/qemu_migration_params.h | 9 +++++++++ 2 files changed, 31 insertions(+) diff --git a/src/qemu/qemu_migration_params.c b/src/qemu/qemu_migration_params.c index 95fd773645..fad61631fa 100644 --- a/src/qemu/qemu_migration_params.c +++ b/src/qemu/qemu_migration_params.c @@ -1109,6 +1109,28 @@ qemuMigrationParamsFetch(virQEMUDriver *driver, } +void +qemuMigrationParamsSetCap(qemuMigrationParams *migParams, + qemuMigrationCapability mcap) +{ + ignore_value(virBitmapSetBit(migParams->caps, mcap)); +} + + +int +qemuMigrationParamsSetInt(qemuMigrationParams *migParams, + qemuMigrationParam param, + int value) +{ + if (qemuMigrationParamsCheckType(param, QEMU_MIGRATION_PARAM_TYPE_INT) < 0) + return -1; + + migParams->params[param].value.i = value; + migParams->params[param].set = true; + return 0; +} + + int qemuMigrationParamsSetULL(qemuMigrationParams *migParams, qemuMigrationParam param, diff --git a/src/qemu/qemu_migration_params.h b/src/qemu/qemu_migration_params.h index a0909b9f3d..036c4e7582 100644 --- a/src/qemu/qemu_migration_params.h +++ b/src/qemu/qemu_migration_params.h @@ -123,6 +123,15 @@ qemuMigrationParamsFetch(virQEMUDriver *driver, int asyncJob, qemuMigrationParams **migParams); +void +qemuMigrationParamsSetCap(qemuMigrationParams *migParams, + qemuMigrationCapability mcap); + +int +qemuMigrationParamsSetInt(qemuMigrationParams *migParams, + qemuMigrationParam param, + int value); + int qemuMigrationParamsSetULL(qemuMigrationParams *migParams, qemuMigrationParam param, -- 2.26.2
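A later patch in the series uses the two new setters as follows (condensed from the qemuMigrationSrcToFileAux hunk; nchannels is the channel count requested by the caller): allocate a parameter object, enable the multifd capability, set the channel count, then apply them to QEMU:

    g_autoptr(qemuMigrationParams) migParams = NULL;

    if (!(migParams = qemuMigrationParamsNew()))
        return -1;

    qemuMigrationParamsSetCap(migParams, QEMU_MIGRATION_CAP_MULTIFD);
    if (qemuMigrationParamsSetInt(migParams,
                                  QEMU_MIGRATION_PARAM_MULTIFD_CHANNELS,
                                  nchannels) < 0)
        return -1;

    if (qemuMigrationParamsApply(driver, vm, asyncJob, migParams) < 0)
        return -1;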

implement a function similar to qemuMigrationSrcToFile that migrates to multiple files using QEMU multifd, and use it for VIR_DOMAIN_SAVE_PARALLEL saves. Signed-off-by: Claudio Fontana <cfontana@suse.de> --- src/qemu/qemu_migration.c | 131 +++++++++++++++++++++++++------------- src/qemu/qemu_migration.h | 7 ++ src/qemu/qemu_saveimage.c | 8 ++- 3 files changed, 98 insertions(+), 48 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index f130ccbe26..fa6cddcc59 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -5875,13 +5875,14 @@ qemuMigrationDstFinish(virQEMUDriver *driver, return dom; } - /* Helper function called while vm is active. */ -int -qemuMigrationSrcToFile(virQEMUDriver *driver, virDomainObj *vm, - int fd, - virCommand *compressor, - virDomainAsyncJob asyncJob) +static int +qemuMigrationSrcToFileAux(virQEMUDriver *driver, virDomainObj *vm, + int fd, + virCommand *compressor, + virDomainAsyncJob asyncJob, + const char *sun_path, + int nchannels) { qemuDomainObjPrivate *priv = vm->privateData; bool bwParam = virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_MIGRATION_PARAM_BANDWIDTH); @@ -5892,24 +5893,26 @@ qemuMigrationSrcToFile(virQEMUDriver *driver, virDomainObj *vm, char *errbuf = NULL; virErrorPtr orig_err = NULL; g_autoptr(qemuMigrationParams) migParams = NULL; + bool needParams = (bwParam || sun_path); + if (sun_path && !virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_MIGRATE_MULTIFD)) { + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", + _("QEMU does not seem to support multifd migration, required for parallel migration to files")); + return -1; + } if (qemuMigrationSetDBusVMState(driver, vm) < 0) return -1; /* Increase migration bandwidth to unlimited since target is a file. * Failure to change migration speed is not fatal. */ - if (bwParam) { - if (!(migParams = qemuMigrationParamsNew())) - return -1; + if (needParams && !((migParams = qemuMigrationParamsNew()))) + return -1; + if (bwParam) { if (qemuMigrationParamsSetULL(migParams, QEMU_MIGRATION_PARAM_MAX_BANDWIDTH, QEMU_DOMAIN_MIG_BANDWIDTH_MAX * 1024 * 1024) < 0) return -1; - - if (qemuMigrationParamsApply(driver, vm, asyncJob, migParams) < 0) - return -1; - priv->migMaxBandwidth = QEMU_DOMAIN_MIG_BANDWIDTH_MAX; } else { if (qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob) == 0) { @@ -5920,6 +5923,17 @@ qemuMigrationSrcToFile(virQEMUDriver *driver, virDomainObj *vm, } } + if (sun_path) { + qemuMigrationParamsSetCap(migParams, QEMU_MIGRATION_CAP_MULTIFD); + if (qemuMigrationParamsSetInt(migParams, + QEMU_MIGRATION_PARAM_MULTIFD_CHANNELS, + nchannels) < 0) + return -1; + } + + if (needParams && qemuMigrationParamsApply(driver, vm, asyncJob, migParams) < 0) + return -1; + if (!virDomainObjIsActive(vm)) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("guest unexpectedly quit")); @@ -5927,45 +5941,53 @@ qemuMigrationSrcToFile(virQEMUDriver *driver, virDomainObj *vm, return -1; } - if (compressor && virPipe(pipeFD) < 0) + if (!sun_path && compressor && virPipe(pipeFD) < 0) return -1; - /* All right! We can use fd migration, which means that qemu - * doesn't have to open() the file, so while we still have to - * grant SELinux access, we can do it on fd and avoid cleanup - * later, as well as skip futzing with cgroup. */ - if (qemuSecuritySetImageFDLabel(driver->securityManager, vm->def, - compressor ? 
pipeFD[1] : fd) < 0) - goto cleanup; - if (qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob) < 0) goto cleanup; - if (!compressor) { - rc = qemuMonitorMigrateToFd(priv->mon, - QEMU_MONITOR_MIGRATE_BACKGROUND, - fd); + if (sun_path) { + rc = qemuMonitorMigrateToSocket(priv->mon, + QEMU_MONITOR_MIGRATE_BACKGROUND, + sun_path); } else { - virCommandSetInputFD(compressor, pipeFD[0]); - virCommandSetOutputFD(compressor, &fd); - virCommandSetErrorBuffer(compressor, &errbuf); - virCommandDoAsyncIO(compressor); - if (virSetCloseExec(pipeFD[1]) < 0) { - virReportSystemError(errno, "%s", - _("Unable to set cloexec flag")); - qemuDomainObjExitMonitor(vm); - goto cleanup; - } - if (virCommandRunAsync(compressor, NULL) < 0) { - qemuDomainObjExitMonitor(vm); + /* + * All right! We can use fd migration, which means that qemu + * doesn't have to open() the file, so while we still have to + * grant SELinux access, we can do it on fd and avoid cleanup + * later, as well as skip futzing with cgroup. + */ + if (qemuSecuritySetImageFDLabel(driver->securityManager, vm->def, + compressor ? pipeFD[1] : fd) < 0) goto cleanup; + + if (!compressor) { + rc = qemuMonitorMigrateToFd(priv->mon, + QEMU_MONITOR_MIGRATE_BACKGROUND, + fd); + } else { + virCommandSetInputFD(compressor, pipeFD[0]); + virCommandSetOutputFD(compressor, &fd); + virCommandSetErrorBuffer(compressor, &errbuf); + virCommandDoAsyncIO(compressor); + if (virSetCloseExec(pipeFD[1]) < 0) { + virReportSystemError(errno, "%s", + _("Unable to set cloexec flag")); + qemuDomainObjExitMonitor(vm); + goto cleanup; + } + if (virCommandRunAsync(compressor, NULL) < 0) { + qemuDomainObjExitMonitor(vm); + goto cleanup; + } + rc = qemuMonitorMigrateToFd(priv->mon, + QEMU_MONITOR_MIGRATE_BACKGROUND, + pipeFD[1]); + if (VIR_CLOSE(pipeFD[0]) < 0 || + VIR_CLOSE(pipeFD[1]) < 0) + VIR_WARN("failed to close intermediate pipe"); } - rc = qemuMonitorMigrateToFd(priv->mon, - QEMU_MONITOR_MIGRATE_BACKGROUND, - pipeFD[1]); - if (VIR_CLOSE(pipeFD[0]) < 0 || - VIR_CLOSE(pipeFD[1]) < 0) - VIR_WARN("failed to close intermediate pipe"); } qemuDomainObjExitMonitor(vm); if (rc < 0) @@ -5986,7 +6008,7 @@ qemuMigrationSrcToFile(virQEMUDriver *driver, virDomainObj *vm, goto cleanup; } - if (compressor && virCommandWait(compressor, NULL) < 0) + if (!sun_path && compressor && virCommandWait(compressor, NULL) < 0) goto cleanup; qemuDomainEventEmitJobCompleted(driver, vm); @@ -6025,6 +6047,25 @@ qemuMigrationSrcToFile(virQEMUDriver *driver, virDomainObj *vm, return ret; } +int +qemuMigrationSrcToFile(virQEMUDriver *driver, virDomainObj *vm, + int fd, + virCommand *compressor, + virDomainAsyncJob asyncJob) +{ + return qemuMigrationSrcToFileAux(driver, vm, fd, compressor, + asyncJob, NULL, -1); +} + +int +qemuMigrationSrcToFilesMultiFd(virQEMUDriver *driver, virDomainObj *vm, + virDomainAsyncJob asyncJob, + const char *sun_path, + int nchannels) +{ + return qemuMigrationSrcToFileAux(driver, vm, -1, NULL, + asyncJob, sun_path, nchannels); +} int qemuMigrationSrcCancel(virQEMUDriver *driver, diff --git a/src/qemu/qemu_migration.h b/src/qemu/qemu_migration.h index a8afa66119..ddc8e65489 100644 --- a/src/qemu/qemu_migration.h +++ b/src/qemu/qemu_migration.h @@ -213,6 +213,13 @@ qemuMigrationSrcToFile(virQEMUDriver *driver, virDomainAsyncJob asyncJob) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; +int +qemuMigrationSrcToFilesMultiFd(virQEMUDriver *driver, virDomainObj *vm, + virDomainAsyncJob asyncJob, + const char *sun_path, + int nchannels) + ATTRIBUTE_NONNULL(1) 
ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT; + int qemuMigrationSrcCancel(virQEMUDriver *driver, virDomainObj *vm); diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c index 7d8527596a..8abb919459 100644 --- a/src/qemu/qemu_saveimage.c +++ b/src/qemu/qemu_saveimage.c @@ -578,8 +578,11 @@ qemuSaveImageCreate(virQEMUDriver *driver, oflags |= O_DIRECT; } - if (virQEMUSaveFdInit(&saveFd, path, oflags, cfg, false) < 0) + if (virQEMUSaveFdInit(&saveFd, path, oflags, cfg, flags & VIR_DOMAIN_SAVE_PARALLEL) < 0) goto cleanup; + if (nconn > 0) { + data->header.multifd_channels = nconn; + } if (qemuSecuritySetImageFDLabel(driver->securityManager, vm->def, saveFd.fd) < 0) goto cleanup; if (virQEMUSaveDataWrite(data, saveFd.fd, saveFd.path) < 0) @@ -610,8 +613,7 @@ qemuSaveImageCreate(virQEMUDriver *driver, goto cleanup; if (chown(sun_path, cfg->user, cfg->group) < 0) goto cleanup; - /* still using single fd migration for now */ - if (qemuMigrationSrcToFile(driver, vm, saveFd.fd, compressor, asyncJob) < 0) + if (qemuMigrationSrcToFilesMultiFd(driver, vm, asyncJob, sun_path, nconn) < 0) goto cleanup; } else { /* Perform non-parallel migration to file */ -- 2.26.2
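With this in place the save path has two entry points; condensed from the qemuSaveImageCreate hunk above:

    if (flags & VIR_DOMAIN_SAVE_PARALLEL) {
        /* QEMU migrates to the UNIX socket; the multifd helper
         * interleaves the channels into the single save file */
        if (qemuMigrationSrcToFilesMultiFd(driver, vm, asyncJob,
                                           sun_path, nconn) < 0)
            goto cleanup;
    } else {
        /* classic single-fd migration, optionally through a compressor */
        if (qemuMigrationSrcToFile(driver, vm, saveFd.fd,
                                   compressor, asyncJob) < 0)
            goto cleanup;
    }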

The distinction on whether to wait for the migration completion or not was made on the async job type, but with the future addition of multifd migration from files, we need a way to avoid waiting, so we can prepare multifd migration parameters before starting the transfers. Adapt all callers. Signed-off-by: Claudio Fontana <cfontana@suse.de> --- src/qemu/qemu_driver.c | 8 ++++---- src/qemu/qemu_migration.c | 18 ++++++++++-------- src/qemu/qemu_migration.h | 3 ++- src/qemu/qemu_process.c | 3 ++- src/qemu/qemu_process.h | 5 +++-- src/qemu/qemu_saveimage.c | 4 +++- src/qemu/qemu_saveimage.h | 1 + src/qemu/qemu_snapshot.c | 4 ++-- 8 files changed, 27 insertions(+), 19 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index d403ff6bf2..3d2b5d78e6 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -1639,7 +1639,7 @@ static virDomainPtr qemuDomainCreateXML(virConnectPtr conn, } if (qemuProcessStart(conn, driver, vm, NULL, VIR_ASYNC_JOB_START, - NULL, -1, NULL, NULL, + NULL, -1, NULL, false, NULL, VIR_NETDEV_VPORT_PROFILE_OP_CREATE, start_flags) < 0) { virDomainAuditStart(vm, "booted", false); @@ -5940,7 +5940,7 @@ qemuDomainRestoreInternal(virConnectPtr conn, goto cleanup; ret = qemuSaveImageStartVM(conn, driver, vm, &saveFd.fd, data, path, - false, reset_nvram, VIR_ASYNC_JOB_START); + false, reset_nvram, true, VIR_ASYNC_JOB_START); qemuProcessEndJob(vm); @@ -6261,7 +6261,7 @@ qemuDomainObjRestore(virConnectPtr conn, virDomainObjAssignDef(vm, &def, true, NULL); ret = qemuSaveImageStartVM(conn, driver, vm, &saveFd.fd, data, path, - start_paused, reset_nvram, asyncJob); + start_paused, reset_nvram, true, asyncJob); cleanup: virQEMUSaveDataFree(data); @@ -6515,7 +6515,7 @@ qemuDomainObjStart(virConnectPtr conn, } ret = qemuProcessStart(conn, driver, vm, NULL, asyncJob, - NULL, -1, NULL, NULL, + NULL, -1, NULL, false, NULL, VIR_NETDEV_VPORT_PROFILE_OP_CREATE, start_flags); virDomainAuditStart(vm, "booted", ret >= 0); if (ret >= 0) { diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index fa6cddcc59..439dd7478d 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -2122,7 +2122,8 @@ int qemuMigrationDstRun(virQEMUDriver *driver, virDomainObj *vm, const char *uri, - virDomainAsyncJob asyncJob) + virDomainAsyncJob asyncJob, + bool wait) { qemuDomainObjPrivate *priv = vm->privateData; int rv; @@ -2143,14 +2144,15 @@ qemuMigrationDstRun(virQEMUDriver *driver, if (rv < 0) return -1; - if (asyncJob == VIR_ASYNC_JOB_MIGRATION_IN) { - /* qemuMigrationDstWaitForCompletion is called from the Finish phase */ - return 0; + if (wait) { + /* + * the Migration Finish phase, as well as the multifd load from files, + * need to call qemuMigrationDstWaitForCompletion separately, not here. 
+ */ + if (qemuMigrationDstWaitForCompletion(driver, vm, asyncJob, false) < 0) + return -1; } - if (qemuMigrationDstWaitForCompletion(driver, vm, asyncJob, false) < 0) - return -1; - return 0; } @@ -3024,7 +3026,7 @@ qemuMigrationDstPrepareAny(virQEMUDriver *driver, } if (qemuMigrationDstRun(driver, vm, incoming->uri, - VIR_ASYNC_JOB_MIGRATION_IN) < 0) + VIR_ASYNC_JOB_MIGRATION_IN, false) < 0) goto stopjob; if (qemuProcessFinishStartup(driver, vm, VIR_ASYNC_JOB_MIGRATION_IN, diff --git a/src/qemu/qemu_migration.h b/src/qemu/qemu_migration.h index ddc8e65489..c3c48c19c0 100644 --- a/src/qemu/qemu_migration.h +++ b/src/qemu/qemu_migration.h @@ -255,7 +255,8 @@ int qemuMigrationDstRun(virQEMUDriver *driver, virDomainObj *vm, const char *uri, - virDomainAsyncJob asyncJob); + virDomainAsyncJob asyncJob, + bool wait); void qemuMigrationAnyPostcopyFailed(virQEMUDriver *driver, diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 1593ca7933..40870b8f7a 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -7742,6 +7742,7 @@ qemuProcessStart(virConnectPtr conn, const char *migrateFrom, int migrateFd, const char *migratePath, + bool wait_incoming, virDomainMomentObj *snapshot, virNetDevVPortProfileOp vmop, unsigned int flags) @@ -7804,7 +7805,7 @@ qemuProcessStart(virConnectPtr conn, relabel = true; if (incoming) { - if (qemuMigrationDstRun(driver, vm, incoming->uri, asyncJob) < 0) + if (qemuMigrationDstRun(driver, vm, incoming->uri, asyncJob, wait_incoming) < 0) goto stop; } else { /* Refresh state of devices from QEMU. During migration this happens diff --git a/src/qemu/qemu_process.h b/src/qemu/qemu_process.h index 9866d2bf35..aad34ae0d8 100644 --- a/src/qemu/qemu_process.h +++ b/src/qemu/qemu_process.h @@ -86,8 +86,9 @@ int qemuProcessStart(virConnectPtr conn, virCPUDef *updatedCPU, virDomainAsyncJob asyncJob, const char *migrateFrom, - int stdin_fd, - const char *stdin_path, + int fd, + const char *migratePath, + bool wait_incoming, virDomainMomentObj *snapshot, virNetDevVPortProfileOp vmop, unsigned int flags); diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c index 8abb919459..9c86e20f30 100644 --- a/src/qemu/qemu_saveimage.c +++ b/src/qemu/qemu_saveimage.c @@ -780,6 +780,7 @@ qemuSaveImageStartVM(virConnectPtr conn, const char *path, bool start_paused, bool reset_nvram, + bool wait_incoming, virDomainAsyncJob asyncJob) { qemuDomainObjPrivate *priv = vm->privateData; @@ -834,7 +835,8 @@ qemuSaveImageStartVM(virConnectPtr conn, priv->disableSlirp = true; if (qemuProcessStart(conn, driver, vm, cookie ? cookie->cpu : NULL, - asyncJob, "stdio", *fd, path, NULL, + asyncJob, "stdio", *fd, path, wait_incoming, + NULL, VIR_NETDEV_VPORT_PROFILE_OP_RESTORE, start_flags) == 0) started = true; diff --git a/src/qemu/qemu_saveimage.h b/src/qemu/qemu_saveimage.h index 2af873dea9..91ea8dcbf2 100644 --- a/src/qemu/qemu_saveimage.h +++ b/src/qemu/qemu_saveimage.h @@ -97,6 +97,7 @@ qemuSaveImageStartVM(virConnectPtr conn, const char *path, bool start_paused, bool reset_nvram, + bool wait_incoming, virDomainAsyncJob asyncJob) ATTRIBUTE_NONNULL(4) ATTRIBUTE_NONNULL(5) ATTRIBUTE_NONNULL(6); diff --git a/src/qemu/qemu_snapshot.c b/src/qemu/qemu_snapshot.c index 3437c3e0cb..dec653c6d3 100644 --- a/src/qemu/qemu_snapshot.c +++ b/src/qemu/qemu_snapshot.c @@ -2092,7 +2092,7 @@ qemuSnapshotRevertActive(virDomainObj *vm, rc = qemuProcessStart(snapshot->domain->conn, driver, vm, cookie ? 
cookie->cpu : NULL, - VIR_ASYNC_JOB_START, NULL, -1, NULL, snap, + VIR_ASYNC_JOB_START, NULL, -1, NULL, false, snap, VIR_NETDEV_VPORT_PROFILE_OP_CREATE, start_flags); virDomainAuditStart(vm, "from-snapshot", rc >= 0); @@ -2215,7 +2215,7 @@ qemuSnapshotRevertInactive(virDomainObj *vm, start_flags |= paused ? VIR_QEMU_PROCESS_START_PAUSED : 0; rc = qemuProcessStart(snapshot->domain->conn, driver, vm, NULL, - VIR_ASYNC_JOB_START, NULL, -1, NULL, NULL, + VIR_ASYNC_JOB_START, NULL, -1, NULL, false, NULL, VIR_NETDEV_VPORT_PROFILE_OP_CREATE, start_flags); virDomainAuditStart(vm, "from-snapshot", rc >= 0); -- 2.26.2
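Condensed, the two calling conventions after this change:

    /* ordinary start/restore paths block until the incoming
     * migration has completed */
    if (qemuMigrationDstRun(driver, vm, incoming->uri, asyncJob,
                            true) < 0)
        goto stop;

    /* the migration Finish phase, and later the multifd restore,
     * pass false and call qemuMigrationDstWaitForCompletion()
     * themselves once all parameters are in place */
    if (qemuMigrationDstRun(driver, vm, incoming->uri,
                            VIR_ASYNC_JOB_MIGRATION_IN, false) < 0)
        goto stopjob;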

use multifd to restore parallel images, if VIR_DOMAIN_SAVE_PARALLEL is enabled. Signed-off-by: Claudio Fontana <cfontana@suse.de> --- src/qemu/qemu_driver.c | 16 +++++- src/qemu/qemu_migration.c | 2 +- src/qemu/qemu_migration.h | 6 ++ src/qemu/qemu_saveimage.c | 114 +++++++++++++++++++++++++++++++++++++- src/qemu/qemu_saveimage.h | 8 ++- 5 files changed, 139 insertions(+), 7 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 3d2b5d78e6..c3b2afe0d8 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -5883,7 +5883,7 @@ qemuDomainRestoreInternal(virConnectPtr conn, } oflags |= O_DIRECT; } - if (virQEMUSaveFdInit(&saveFd, path, oflags, cfg, false) < 0) + if (virQEMUSaveFdInit(&saveFd, path, oflags, cfg, flags & VIR_DOMAIN_SAVE_PARALLEL) < 0) return -1; if (qemuSaveImageOpen(driver, NULL, &def, &data, false, &saveFd) < 0) goto cleanup; @@ -5939,8 +5939,18 @@ qemuDomainRestoreInternal(virConnectPtr conn, flags) < 0) goto cleanup; - ret = qemuSaveImageStartVM(conn, driver, vm, &saveFd.fd, data, path, - false, reset_nvram, true, VIR_ASYNC_JOB_START); + if (flags & VIR_DOMAIN_SAVE_PARALLEL) { + ret = qemuSaveImageLoadMultiFd(conn, vm, data, reset_nvram, + &saveFd, VIR_ASYNC_JOB_START); + + } else if (data->header.multifd_channels != 0) { + virReportError(VIR_ERR_OPERATION_FAILED, "%s", + _("save file seems to contain multifd channels information, and restore is not flagged as 'parallel'")); + ret = -1; + } else { + ret = qemuSaveImageStartVM(conn, driver, vm, &saveFd.fd, data, path, + false, reset_nvram, true, VIR_ASYNC_JOB_START); + } qemuProcessEndJob(vm); diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 439dd7478d..c214f88dd1 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1920,7 +1920,7 @@ qemuMigrationSrcWaitForCompletion(virQEMUDriver *driver, } -static int +int qemuMigrationDstWaitForCompletion(virQEMUDriver *driver, virDomainObj *vm, virDomainAsyncJob asyncJob, diff --git a/src/qemu/qemu_migration.h b/src/qemu/qemu_migration.h index c3c48c19c0..38f4877cf0 100644 --- a/src/qemu/qemu_migration.h +++ b/src/qemu/qemu_migration.h @@ -191,6 +191,12 @@ qemuMigrationDstFinish(virQEMUDriver *driver, int retcode, bool v3proto); +int +qemuMigrationDstWaitForCompletion(virQEMUDriver *driver, + virDomainObj *vm, + virDomainAsyncJob asyncJob, + bool postcopy); + int qemuMigrationSrcConfirm(virQEMUDriver *driver, virDomainObj *vm, diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c index 9c86e20f30..2ed02d31de 100644 --- a/src/qemu/qemu_saveimage.c +++ b/src/qemu/qemu_saveimage.c @@ -643,6 +643,105 @@ qemuSaveImageCreate(virQEMUDriver *driver, } +int qemuSaveImageLoadMultiFd(virConnectPtr conn, virDomainObj *vm, + virQEMUSaveData *data, bool reset_nvram, + virQEMUSaveFd *saveFd, virDomainAsyncJob asyncJob) +{ + virQEMUDriver *driver = conn->privateData; + qemuDomainObjPrivate *priv = vm->privateData; + g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver); + g_autoptr(virCommand) cmd = NULL; + g_autofree char *helper_path = NULL; + g_autofree char *sun_path = g_strdup_printf("%s/restore-multifd.sock", cfg->saveDir); + bool qemu_started = false; + int ret = -1; + + if (!(helper_path = virFileFindResource("libvirt_multifd_helper", + abs_top_builddir "/src", + LIBEXECDIR))) + goto cleanup; + cmd = virCommandNewArgList(helper_path, sun_path, saveFd->path, NULL); + virCommandAddArgFormat(cmd, "%d", saveFd->nchannels); + virCommandAddArgFormat(cmd, "%d", saveFd->fd); + 
virCommandPassFD(cmd, saveFd->fd, 0); + + if (qemuSaveImageStartVM(conn, driver, vm, NULL, data, sun_path, + false, reset_nvram, false, asyncJob) < 0) + goto cleanup; + + qemu_started = true; + + if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_MIGRATE_MULTIFD)) { + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", + _("QEMU does not seem to support multifd migration")); + goto cleanup; + } else { + g_autoptr(qemuMigrationParams) migParams = qemuMigrationParamsNew(); + bool bwParam = virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_MIGRATION_PARAM_BANDWIDTH); + + if (bwParam) { + if (qemuMigrationParamsSetULL(migParams, + QEMU_MIGRATION_PARAM_MAX_BANDWIDTH, + QEMU_DOMAIN_MIG_BANDWIDTH_MAX * 1024 * 1024) < 0) + goto cleanup; + priv->migMaxBandwidth = QEMU_DOMAIN_MIG_BANDWIDTH_MAX; + } else { + if (qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob) == 0) { + qemuMonitorSetMigrationSpeed(priv->mon, + QEMU_DOMAIN_MIG_BANDWIDTH_MAX); + priv->migMaxBandwidth = QEMU_DOMAIN_MIG_BANDWIDTH_MAX; + qemuDomainObjExitMonitor(vm); + } + } + qemuMigrationParamsSetCap(migParams, QEMU_MIGRATION_CAP_MULTIFD); + if (qemuMigrationParamsSetInt(migParams, + QEMU_MIGRATION_PARAM_MULTIFD_CHANNELS, + saveFd->nchannels) < 0) + goto cleanup; + if (qemuMigrationParamsApply(driver, vm, asyncJob, migParams) < 0) + goto cleanup; + + if (!virDomainObjIsActive(vm)) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("guest unexpectedly quit")); + goto cleanup; + } + /* multifd helper can now connect, then wait for migration to complete */ + if (virCommandRunAsync(cmd, NULL) < 0) + goto cleanup; + + if (qemuMigrationDstWaitForCompletion(driver, vm, asyncJob, false) < 0) + goto cleanup; + + if (qemuProcessRefreshState(driver, vm, asyncJob) < 0) + goto cleanup; + + /* run 'cont' on the destination */ + if (qemuProcessStartCPUs(driver, vm, + VIR_DOMAIN_RUNNING_RESTORED, + asyncJob) < 0) { + if (virGetLastErrorCode() == VIR_ERR_OK) + virReportError(VIR_ERR_OPERATION_FAILED, + "%s", _("failed to resume domain")); + goto cleanup; + } + if (virDomainObjSave(vm, driver->xmlopt, cfg->stateDir) < 0) { + VIR_WARN("Failed to save status on vm %s", vm->def->name); + goto cleanup; + } + } + qemuDomainEventEmitJobCompleted(driver, vm); + ret = 0; + + cleanup: + if (ret < 0 && qemu_started) { + qemuProcessStop(driver, vm, VIR_DOMAIN_SHUTOFF_FAILED, + asyncJob, VIR_QEMU_PROCESS_STOP_MIGRATED); + } + return ret; +} + + /* qemuSaveImageGetCompressionProgram: * @imageFormat: String representation from qemu.conf for the compression * image format being used (dump, save, or snapshot). @@ -758,6 +857,10 @@ qemuSaveImageOpen(virQEMUDriver *driver, } return ret; } + if (data->header.multifd_channels > 0) { + if (virQEMUSaveFdAddChannels(saveFd, data->header.multifd_channels) < 0) + return -1; + } /* Create a domain from this XML */ if (!(def = virDomainDefParseString(data->xml, driver->xmlopt, qemuCaps, @@ -788,6 +891,7 @@ qemuSaveImageStartVM(virConnectPtr conn, bool started = false; virObjectEvent *event; VIR_AUTOCLOSE intermediatefd = -1; + g_autofree char *migrate_from = NULL; g_autoptr(virCommand) cmd = NULL; g_autofree char *errbuf = NULL; g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver); @@ -834,8 +938,14 @@ qemuSaveImageStartVM(virConnectPtr conn, if (cookie && !cookie->slirpHelper) priv->disableSlirp = true; + if (fd) { + migrate_from = g_strdup("stdio"); + } else { + migrate_from = g_strdup_printf("unix://%s", path); + } + if (qemuProcessStart(conn, driver, vm, cookie ? 
cookie->cpu : NULL, - asyncJob, "stdio", *fd, path, wait_incoming, + asyncJob, migrate_from, fd ? *fd : -1, path, wait_incoming, NULL, VIR_NETDEV_VPORT_PROFILE_OP_RESTORE, start_flags) == 0) @@ -859,7 +969,7 @@ qemuSaveImageStartVM(virConnectPtr conn, VIR_DEBUG("Decompression binary stderr: %s", NULLSTR(errbuf)); virErrorRestore(&orig_err); } - if (VIR_CLOSE(*fd) < 0) { + if (fd && VIR_CLOSE(*fd) < 0) { virReportSystemError(errno, _("cannot close file: %s"), path); rc = -1; } diff --git a/src/qemu/qemu_saveimage.h b/src/qemu/qemu_saveimage.h index 91ea8dcbf2..158bcaff50 100644 --- a/src/qemu/qemu_saveimage.h +++ b/src/qemu/qemu_saveimage.h @@ -99,7 +99,7 @@ qemuSaveImageStartVM(virConnectPtr conn, bool reset_nvram, bool wait_incoming, virDomainAsyncJob asyncJob) - ATTRIBUTE_NONNULL(4) ATTRIBUTE_NONNULL(5) ATTRIBUTE_NONNULL(6); + ATTRIBUTE_NONNULL(5) ATTRIBUTE_NONNULL(6); int qemuSaveImageOpen(virQEMUDriver *driver, @@ -117,6 +117,12 @@ qemuSaveImageGetCompressionProgram(const char *imageFormat, bool use_raw_on_fail) ATTRIBUTE_NONNULL(2); +int qemuSaveImageLoadMultiFd(virConnectPtr conn, virDomainObj *vm, + virQEMUSaveData *data, bool reset_nvram, + virQEMUSaveFd *saveFd, virDomainAsyncJob asyncJob) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3) + ATTRIBUTE_NONNULL(5) G_GNUC_WARN_UNUSED_RESULT; + int qemuSaveImageCreate(virQEMUDriver *driver, virDomainObj *vm, -- 2.26.2
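The helper receives the socket path, the image path, the channel count and the already-open image fd (see the virCommand construction above); for restore it has to connect to the socket QEMU listens on, once per channel, and feed each connection data from the file. A hypothetical standalone sketch of that per-channel connect, with the parallel file I/O omitted (this is not the helper's actual code):

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    /* Open one migration channel by connecting to QEMU's
     * incoming UNIX socket; returns the fd or -1. */
    static int channel_connect(const char *sun_path)
    {
        struct sockaddr_un addr = { .sun_family = AF_UNIX };
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);

        if (fd < 0)
            return -1;
        strncpy(addr.sun_path, sun_path, sizeof(addr.sun_path) - 1);
        if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
            close(fd);
            return -1;
        }
        return fd;
    }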

Signed-off-by: Claudio Fontana <cfontana@suse.de> --- docs/manpages/virsh.rst | 11 ++++++++++- tools/virsh-domain.c | 24 ++++++++++++++++++++++-- 2 files changed, 32 insertions(+), 3 deletions(-) diff --git a/docs/manpages/virsh.rst b/docs/manpages/virsh.rst index 965b89c99d..9f33b0aaef 100644 --- a/docs/manpages/virsh.rst +++ b/docs/manpages/virsh.rst @@ -3803,6 +3803,7 @@ save :: save domain state-file [--bypass-cache] [--xml file] + [--parallel] [--parallel-connections connections] [{--running | --paused}] [--verbose] Saves a running domain (RAM, but not disk state) to a state file so that @@ -3810,8 +3811,10 @@ it can be restored later. Once saved, the domain will no longer be running on the system, thus the memory allocated for the domain will be free for other domains to use. ``virsh restore`` restores from this state file. + If *--bypass-cache* is specified, the save will avoid the file system -cache, although this may slow down the operation. +cache; depending on the specific scenario this may slow down or speed up +the operation. The progress may be monitored using ``domjobinfo`` virsh command and canceled with ``domjobabort`` command (sent by another virsh instance). Another option @@ -3833,6 +3836,12 @@ based on the state the domain was in when the save was done; passing either the *--running* or *--paused* flag will allow overriding which state the ``restore`` should use. +*--parallel* option will cause the save data to be sent over multiple +parallel connections to the file. The number of extra connections +can be set using *--parallel-connections*. + +Parallel connections help in speeding up large save operations. + Domain saved state files assume that disk images will be unchanged between the creation and restore point. For a more complete system restore point, where the disk state is saved alongside the memory diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index a3f007a1d2..7a313cb29e 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -4174,6 +4174,14 @@ static const vshCmdOptDef opts_save[] = { .type = VSH_OT_BOOL, .help = N_("avoid file system cache when saving") }, + {.name = "parallel", + .type = VSH_OT_BOOL, + .help = N_("enable parallel save") + }, + {.name = "parallel-connections", + .type = VSH_OT_INT, + .help = N_("number of extra connections to use for parallel save") + }, {.name = "xml", .type = VSH_OT_STRING, .completer = virshCompletePathLocalExisting, @@ -4206,6 +4214,8 @@ doSave(void *opaque) virTypedParameterPtr params = NULL; int nparams = 0; int maxparams = 0; + int intOpt = 0; + int rv = -1; unsigned int flags = 0; const char *xmlfile = NULL; g_autofree char *xml = NULL; @@ -4227,6 +4237,15 @@ doSave(void *opaque) } if (vshCommandOptBool(cmd, "bypass-cache")) flags |= VIR_DOMAIN_SAVE_BYPASS_CACHE; + if (vshCommandOptBool(cmd, "parallel")) + flags |= VIR_DOMAIN_SAVE_PARALLEL; + if ((rv = vshCommandOptInt(ctl, cmd, "parallel-connections", &intOpt)) < 0) { + goto out; + } else if (rv > 0) { + if (virTypedParamsAddInt(¶ms, &nparams, &maxparams, + VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS, intOpt) < 0) + goto out; + } if (vshCommandOptBool(cmd, "running")) flags |= VIR_DOMAIN_SAVE_RUNNING; if (vshCommandOptBool(cmd, "paused")) @@ -4247,8 +4266,9 @@ doSave(void *opaque) goto out; } } - - if (flags || xml) { + if (flags & VIR_DOMAIN_SAVE_PARALLEL) { + rc = virDomainSaveParams(dom, params, nparams, flags); + } else if (flags || xml) { rc = virDomainSaveFlags(dom, to, xml, flags); } else { rc = virDomainSave(dom, to); -- 2.26.2
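An illustrative invocation (domain name, path and connection count are example values):

    virsh save mydom /var/tmp/mydom.sav \
          --parallel --parallel-connections 4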

Signed-off-by: Claudio Fontana <cfontana@suse.de> --- docs/manpages/virsh.rst | 11 +++++++++-- tools/virsh-domain.c | 10 +++++++++- 2 files changed, 18 insertions(+), 3 deletions(-) diff --git a/docs/manpages/virsh.rst b/docs/manpages/virsh.rst index 9f33b0aaef..5abb03be9a 100644 --- a/docs/manpages/virsh.rst +++ b/docs/manpages/virsh.rst @@ -3754,12 +3754,13 @@ restore :: restore state-file [--bypass-cache] [--xml file] - [{--running | --paused}] [--reset-nvram] + [{--running | --paused}] [--reset-nvram] [--parallel] Restores a domain from a ``virsh save`` state file. See *save* for more info. If *--bypass-cache* is specified, the restore will avoid the file system -cache, although this may slow down the operation. +cache; depending on the specific scenario this may slow down or speed up +the operation. *--xml* ``file`` is usually omitted, but can be used to supply an alternative XML file for use on the restored guest with changes only @@ -3775,6 +3776,12 @@ domain should be started in. If *--reset-nvram* is specified, any existing NVRAM file will be deleted and re-initialized from its pristine template. +*--parallel* option will cause the save data to be loaded using parallel +connections. The state file stores the number of channels in the file, +so there is no need to specify the number of parallel connections. + +Parallel connections may help in speeding up large restore operations. + ``Note``: To avoid corrupting file system contents within the domain, you should not reuse the saved state file for a second ``restore`` unless you have also reverted all storage volumes back to the same contents as when diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index 7a313cb29e..98a1825c98 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -5312,6 +5312,10 @@ static const vshCmdOptDef opts_restore[] = { .type = VSH_OT_BOOL, .help = N_("avoid file system cache when restoring") }, + {.name = "parallel", + .type = VSH_OT_BOOL, + .help = N_("enable parallel restore") + }, {.name = "xml", .type = VSH_OT_STRING, .completer = virshCompletePathLocalExisting, @@ -5353,6 +5357,8 @@ cmdRestore(vshControl *ctl, const vshCmd *cmd) } if (vshCommandOptBool(cmd, "bypass-cache")) flags |= VIR_DOMAIN_SAVE_BYPASS_CACHE; + if (vshCommandOptBool(cmd, "parallel")) + flags |= VIR_DOMAIN_SAVE_PARALLEL; if (vshCommandOptBool(cmd, "running")) flags |= VIR_DOMAIN_SAVE_RUNNING; if (vshCommandOptBool(cmd, "paused")) @@ -5371,7 +5377,9 @@ cmdRestore(vshControl *ctl, const vshCmd *cmd) goto out; } - if (flags || xml) { + if (flags & VIR_DOMAIN_SAVE_PARALLEL) { + rc = virDomainRestoreParams(priv->conn, params, nparams, flags); + } else if (flags || xml) { rc = virDomainRestoreFlags(priv->conn, from, xml, flags); } else { rc = virDomainRestore(priv->conn, from); -- 2.26.2
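The matching restore needs no connection count, since it is read back from the image header:

    virsh restore /var/tmp/mydom.sav --parallel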

add it to both capabilities and migration parameters Signed-off-by: Claudio Fontana <cfontana@suse.de> --- src/qemu/qemu_capabilities.c | 2 ++ src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_migration_params.c | 2 ++ src/qemu/qemu_migration_params.h | 1 + tests/qemucapabilitiesdata/caps_5.0.0.aarch64.xml | 1 + tests/qemucapabilitiesdata/caps_5.0.0.ppc64.xml | 1 + tests/qemucapabilitiesdata/caps_5.0.0.riscv64.xml | 1 + tests/qemucapabilitiesdata/caps_5.0.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_5.1.0.sparc.xml | 1 + tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_5.2.0.aarch64.xml | 1 + tests/qemucapabilitiesdata/caps_5.2.0.ppc64.xml | 1 + tests/qemucapabilitiesdata/caps_5.2.0.riscv64.xml | 1 + tests/qemucapabilitiesdata/caps_5.2.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_6.0.0.aarch64.xml | 1 + tests/qemucapabilitiesdata/caps_6.0.0.s390x.xml | 1 + tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_6.2.0.aarch64.xml | 1 + tests/qemucapabilitiesdata/caps_6.2.0.ppc64.xml | 1 + tests/qemucapabilitiesdata/caps_6.2.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_7.0.0.aarch64.xml | 1 + tests/qemucapabilitiesdata/caps_7.0.0.ppc64.xml | 1 + tests/qemucapabilitiesdata/caps_7.0.0.x86_64.xml | 1 + tests/qemucapabilitiesdata/caps_7.1.0.x86_64.xml | 1 + 26 files changed, 28 insertions(+) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 44e3f9b81f..065d45e1f9 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -677,6 +677,7 @@ VIR_ENUM_IMPL(virQEMUCaps, "chardev.qemu-vdagent", /* QEMU_CAPS_CHARDEV_QEMU_VDAGENT */ "display-dbus", /* QEMU_CAPS_DISPLAY_DBUS */ "migrate-multifd", /* QEMU_CAPS_MIGRATE_MULTIFD */ + "migration-param.multifd-compression", /* QEMU_CAPS_MIGRATION_PARAM_MULTIFD_COMPRESSION */ ); @@ -1613,6 +1614,7 @@ static struct virQEMUCapsStringFlags virQEMUCapsQMPSchemaQueries[] = { { "migrate-set-parameters/arg-type/downtime-limit", QEMU_CAPS_MIGRATION_PARAM_DOWNTIME }, { "migrate-set-parameters/arg-type/xbzrle-cache-size", QEMU_CAPS_MIGRATION_PARAM_XBZRLE_CACHE_SIZE }, { "migrate-set-parameters/arg-type/block-bitmap-mapping/bitmaps/transform", QEMU_CAPS_MIGRATION_PARAM_BLOCK_BITMAP_MAPPING }, + { "migrate-set-parameters/arg-type/multifd-compression", QEMU_CAPS_MIGRATION_PARAM_MULTIFD_COMPRESSION }, { "nbd-server-start/arg-type/tls-creds", QEMU_CAPS_NBD_TLS }, { "nbd-server-add/arg-type/bitmap", QEMU_CAPS_NBD_BITMAP }, { "netdev_add/arg-type/+vhost-vdpa", QEMU_CAPS_NETDEV_VHOST_VDPA }, diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index 4c7e031e28..ceac26a320 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -652,6 +652,7 @@ typedef enum { /* virQEMUCapsFlags grouping marker for syntax-check */ QEMU_CAPS_CHARDEV_QEMU_VDAGENT, /* -chardev qemu-vdagent */ QEMU_CAPS_DISPLAY_DBUS, /* -display dbus */ QEMU_CAPS_MIGRATE_MULTIFD, /* migrate can set multifd parameter */ + QEMU_CAPS_MIGRATION_PARAM_MULTIFD_COMPRESSION, /* multifd-compression in migrate-set-parameters */ QEMU_CAPS_LAST /* this must always be the last item */ } virQEMUCapsFlags; diff --git a/src/qemu/qemu_migration_params.c b/src/qemu/qemu_migration_params.c index fad61631fa..43975640d9 100644 --- a/src/qemu/qemu_migration_params.c +++ b/src/qemu/qemu_migration_params.c @@ -115,6 +115,7 @@ 
VIR_ENUM_IMPL(qemuMigrationParam, "xbzrle-cache-size", "max-postcopy-bandwidth", "multifd-channels", + "multifd-compression", ); typedef struct _qemuMigrationParamsAlwaysOnItem qemuMigrationParamsAlwaysOnItem; @@ -234,6 +235,7 @@ static const qemuMigrationParamType qemuMigrationParamTypes[] = { [QEMU_MIGRATION_PARAM_XBZRLE_CACHE_SIZE] = QEMU_MIGRATION_PARAM_TYPE_ULL, [QEMU_MIGRATION_PARAM_MAX_POSTCOPY_BANDWIDTH] = QEMU_MIGRATION_PARAM_TYPE_ULL, [QEMU_MIGRATION_PARAM_MULTIFD_CHANNELS] = QEMU_MIGRATION_PARAM_TYPE_INT, + [QEMU_MIGRATION_PARAM_MULTIFD_COMPRESSION] = QEMU_MIGRATION_PARAM_TYPE_STRING, }; G_STATIC_ASSERT(G_N_ELEMENTS(qemuMigrationParamTypes) == QEMU_MIGRATION_PARAM_LAST); diff --git a/src/qemu/qemu_migration_params.h b/src/qemu/qemu_migration_params.h index 036c4e7582..900ad7cfc8 100644 --- a/src/qemu/qemu_migration_params.h +++ b/src/qemu/qemu_migration_params.h @@ -60,6 +60,7 @@ typedef enum { QEMU_MIGRATION_PARAM_XBZRLE_CACHE_SIZE, QEMU_MIGRATION_PARAM_MAX_POSTCOPY_BANDWIDTH, QEMU_MIGRATION_PARAM_MULTIFD_CHANNELS, + QEMU_MIGRATION_PARAM_MULTIFD_COMPRESSION, QEMU_MIGRATION_PARAM_LAST } qemuMigrationParam; diff --git a/tests/qemucapabilitiesdata/caps_5.0.0.aarch64.xml b/tests/qemucapabilitiesdata/caps_5.0.0.aarch64.xml index 2a7f5074bc..de995c4434 100644 --- a/tests/qemucapabilitiesdata/caps_5.0.0.aarch64.xml +++ b/tests/qemucapabilitiesdata/caps_5.0.0.aarch64.xml @@ -175,6 +175,7 @@ <flag name='memory-backend-file.prealloc-threads'/> <flag name='virtio-iommu-pci'/> <flag name='migrate-multifd'/> + <flag name='migration-param.multifd-compression'/> <version>5000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>61700241</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_5.0.0.ppc64.xml b/tests/qemucapabilitiesdata/caps_5.0.0.ppc64.xml index d2c083e8f7..a366282d55 100644 --- a/tests/qemucapabilitiesdata/caps_5.0.0.ppc64.xml +++ b/tests/qemucapabilitiesdata/caps_5.0.0.ppc64.xml @@ -181,6 +181,7 @@ <flag name='memory-backend-file.prealloc-threads'/> <flag name='virtio-iommu-pci'/> <flag name='migrate-multifd'/> + <flag name='migration-param.multifd-compression'/> <version>5000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>42900241</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_5.0.0.riscv64.xml b/tests/qemucapabilitiesdata/caps_5.0.0.riscv64.xml index 933b4870cd..c350c1b9a9 100644 --- a/tests/qemucapabilitiesdata/caps_5.0.0.riscv64.xml +++ b/tests/qemucapabilitiesdata/caps_5.0.0.riscv64.xml @@ -167,6 +167,7 @@ <flag name='memory-backend-file.prealloc-threads'/> <flag name='virtio-iommu-pci'/> <flag name='migrate-multifd'/> + <flag name='migration-param.multifd-compression'/> <version>5000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>0</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_5.0.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_5.0.0.x86_64.xml index 9bb48e1982..938775e7bb 100644 --- a/tests/qemucapabilitiesdata/caps_5.0.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_5.0.0.x86_64.xml @@ -215,6 +215,7 @@ <flag name='memory-backend-file.prealloc-threads'/> <flag name='virtio-iommu-pci'/> <flag name='migrate-multifd'/> + <flag name='migration-param.multifd-compression'/> <version>5000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100241</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_5.1.0.sparc.xml b/tests/qemucapabilitiesdata/caps_5.1.0.sparc.xml index f01cd25278..cef1e418ba 100644 --- a/tests/qemucapabilitiesdata/caps_5.1.0.sparc.xml +++ 
b/tests/qemucapabilitiesdata/caps_5.1.0.sparc.xml @@ -87,6 +87,7 @@ <flag name='query-display-options'/> <flag name='memory-backend-file.prealloc-threads'/> <flag name='migrate-multifd'/> + <flag name='migration-param.multifd-compression'/> <version>5001000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>0</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml index f4446674a4..c78c078b24 100644 --- a/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_5.1.0.x86_64.xml @@ -219,6 +219,7 @@ <flag name='virtio-iommu-pci'/> <flag name='virtio-net.rss'/> <flag name='migrate-multifd'/> + <flag name='migration-param.multifd-compression'/> <version>5001000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100242</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_5.2.0.aarch64.xml b/tests/qemucapabilitiesdata/caps_5.2.0.aarch64.xml index f3e3640657..9426ef386c 100644 --- a/tests/qemucapabilitiesdata/caps_5.2.0.aarch64.xml +++ b/tests/qemucapabilitiesdata/caps_5.2.0.aarch64.xml @@ -182,6 +182,7 @@ <flag name='virtio-iommu-pci'/> <flag name='virtio-net.rss'/> <flag name='migrate-multifd'/> + <flag name='migration-param.multifd-compression'/> <version>5002000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>61700243</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_5.2.0.ppc64.xml b/tests/qemucapabilitiesdata/caps_5.2.0.ppc64.xml index 86d021c2fc..c39e51d595 100644 --- a/tests/qemucapabilitiesdata/caps_5.2.0.ppc64.xml +++ b/tests/qemucapabilitiesdata/caps_5.2.0.ppc64.xml @@ -186,6 +186,7 @@ <flag name='virtio-iommu-pci'/> <flag name='virtio-net.rss'/> <flag name='migrate-multifd'/> + <flag name='migration-param.multifd-compression'/> <version>5002000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>42900243</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_5.2.0.riscv64.xml b/tests/qemucapabilitiesdata/caps_5.2.0.riscv64.xml index 2079257759..59742d607f 100644 --- a/tests/qemucapabilitiesdata/caps_5.2.0.riscv64.xml +++ b/tests/qemucapabilitiesdata/caps_5.2.0.riscv64.xml @@ -172,6 +172,7 @@ <flag name='virtio-iommu-pci'/> <flag name='virtio-net.rss'/> <flag name='migrate-multifd'/> + <flag name='migration-param.multifd-compression'/> <version>5002000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>0</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_5.2.0.s390x.xml b/tests/qemucapabilitiesdata/caps_5.2.0.s390x.xml index d800e35096..bc7e701f1a 100644 --- a/tests/qemucapabilitiesdata/caps_5.2.0.s390x.xml +++ b/tests/qemucapabilitiesdata/caps_5.2.0.s390x.xml @@ -140,6 +140,7 @@ <flag name='virtio-iommu-pci'/> <flag name='virtio-net.rss'/> <flag name='migrate-multifd'/> + <flag name='migration-param.multifd-compression'/> <version>5002000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>39100243</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml index f189e299c3..b1aff5f7b9 100644 --- a/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_5.2.0.x86_64.xml @@ -223,6 +223,7 @@ <flag name='virtio-iommu-pci'/> <flag name='virtio-net.rss'/> <flag name='migrate-multifd'/> + <flag name='migration-param.multifd-compression'/> <version>5002000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100243</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_6.0.0.aarch64.xml 
b/tests/qemucapabilitiesdata/caps_6.0.0.aarch64.xml index e86a2cf5d6..c3dae8c8db 100644 --- a/tests/qemucapabilitiesdata/caps_6.0.0.aarch64.xml +++ b/tests/qemucapabilitiesdata/caps_6.0.0.aarch64.xml @@ -190,6 +190,7 @@ <flag name='virtio-iommu-pci'/> <flag name='virtio-net.rss'/> <flag name='migrate-multifd'/> + <flag name='migration-param.multifd-compression'/> <version>6000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>61700242</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_6.0.0.s390x.xml b/tests/qemucapabilitiesdata/caps_6.0.0.s390x.xml index bbc5825def..654eb92d4f 100644 --- a/tests/qemucapabilitiesdata/caps_6.0.0.s390x.xml +++ b/tests/qemucapabilitiesdata/caps_6.0.0.s390x.xml @@ -148,6 +148,7 @@ <flag name='virtio-iommu-pci'/> <flag name='virtio-net.rss'/> <flag name='migrate-multifd'/> + <flag name='migration-param.multifd-compression'/> <version>6000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>39100242</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml index 496dd5564c..2350d61f93 100644 --- a/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_6.0.0.x86_64.xml @@ -232,6 +232,7 @@ <flag name='virtio-iommu-pci'/> <flag name='virtio-net.rss'/> <flag name='migrate-multifd'/> + <flag name='migration-param.multifd-compression'/> <version>6000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100242</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml index fab5e40e35..2e682ae4ba 100644 --- a/tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_6.1.0.x86_64.xml @@ -237,6 +237,7 @@ <flag name='virtio-net.rss'/> <flag name='chardev.qemu-vdagent'/> <flag name='migrate-multifd'/> + <flag name='migration-param.multifd-compression'/> <version>6001000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100243</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_6.2.0.aarch64.xml b/tests/qemucapabilitiesdata/caps_6.2.0.aarch64.xml index a83b8c8d77..ee90c76c30 100644 --- a/tests/qemucapabilitiesdata/caps_6.2.0.aarch64.xml +++ b/tests/qemucapabilitiesdata/caps_6.2.0.aarch64.xml @@ -202,6 +202,7 @@ <flag name='virtio-net.rss'/> <flag name='chardev.qemu-vdagent'/> <flag name='migrate-multifd'/> + <flag name='migration-param.multifd-compression'/> <version>6001050</version> <kvmVersion>0</kvmVersion> <microcodeVersion>61700244</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_6.2.0.ppc64.xml b/tests/qemucapabilitiesdata/caps_6.2.0.ppc64.xml index bfacae9b3b..07b06a36f6 100644 --- a/tests/qemucapabilitiesdata/caps_6.2.0.ppc64.xml +++ b/tests/qemucapabilitiesdata/caps_6.2.0.ppc64.xml @@ -197,6 +197,7 @@ <flag name='virtio-iommu-pci'/> <flag name='virtio-net.rss'/> <flag name='migrate-multifd'/> + <flag name='migration-param.multifd-compression'/> <version>6002000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>42900244</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_6.2.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_6.2.0.x86_64.xml index 1293cd1bf1..8597367f20 100644 --- a/tests/qemucapabilitiesdata/caps_6.2.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_6.2.0.x86_64.xml @@ -239,6 +239,7 @@ <flag name='virtio-net.rss'/> <flag name='chardev.qemu-vdagent'/> <flag name='migrate-multifd'/> + <flag name='migration-param.multifd-compression'/> <version>6002000</version> 
<kvmVersion>0</kvmVersion> <microcodeVersion>43100244</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_7.0.0.aarch64.xml b/tests/qemucapabilitiesdata/caps_7.0.0.aarch64.xml index 16626a1342..b15510d7d1 100644 --- a/tests/qemucapabilitiesdata/caps_7.0.0.aarch64.xml +++ b/tests/qemucapabilitiesdata/caps_7.0.0.aarch64.xml @@ -210,6 +210,7 @@ <flag name='virtio-net.rss'/> <flag name='chardev.qemu-vdagent'/> <flag name='migrate-multifd'/> + <flag name='migration-param.multifd-compression'/> <version>6002092</version> <kvmVersion>0</kvmVersion> <microcodeVersion>61700243</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_7.0.0.ppc64.xml b/tests/qemucapabilitiesdata/caps_7.0.0.ppc64.xml index 9f4242c5c0..4017023486 100644 --- a/tests/qemucapabilitiesdata/caps_7.0.0.ppc64.xml +++ b/tests/qemucapabilitiesdata/caps_7.0.0.ppc64.xml @@ -214,6 +214,7 @@ <flag name='virtio-net.rss'/> <flag name='chardev.qemu-vdagent'/> <flag name='migrate-multifd'/> + <flag name='migration-param.multifd-compression'/> <version>7000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>42900243</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_7.0.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_7.0.0.x86_64.xml index 1d003a6fb7..74f2e28a06 100644 --- a/tests/qemucapabilitiesdata/caps_7.0.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_7.0.0.x86_64.xml @@ -244,6 +244,7 @@ <flag name='chardev.qemu-vdagent'/> <flag name='display-dbus'/> <flag name='migrate-multifd'/> + <flag name='migration-param.multifd-compression'/> <version>7000000</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100243</microcodeVersion> diff --git a/tests/qemucapabilitiesdata/caps_7.1.0.x86_64.xml b/tests/qemucapabilitiesdata/caps_7.1.0.x86_64.xml index 80762d9fe9..efbfc94dfe 100644 --- a/tests/qemucapabilitiesdata/caps_7.1.0.x86_64.xml +++ b/tests/qemucapabilitiesdata/caps_7.1.0.x86_64.xml @@ -244,6 +244,7 @@ <flag name='chardev.qemu-vdagent'/> <flag name='display-dbus'/> <flag name='migrate-multifd'/> + <flag name='migration-param.multifd-compression'/> <version>7000050</version> <kvmVersion>0</kvmVersion> <microcodeVersion>43100244</microcodeVersion> -- 2.26.2

Signed-off-by: Claudio Fontana <cfontana@suse.de> --- include/libvirt/libvirt-domain.h | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h index 0b73ee27b6..0858d02e48 100644 --- a/include/libvirt/libvirt-domain.h +++ b/include/libvirt/libvirt-domain.h @@ -1613,6 +1613,17 @@ int virDomainRestoreParams (virConnectPtr conn, */ # define VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS "parallel.connections" +/** + * VIR_DOMAIN_SAVE_PARAM_PARALLEL_COMPRESSION: + * + * This optional parameter is used in conjunction with the flag + * VIR_DOMAIN_SAVE_PARALLEL during save to ask the hypervisor to + * compress the parallel migration channels using this algorithm. + * + * Since: 8.4.0 + */ +# define VIR_DOMAIN_SAVE_PARAM_PARALLEL_COMPRESSION "parallel.compression" + /* See below for virDomainSaveImageXMLFlags */ char * virDomainSaveImageGetXMLDesc (virConnectPtr conn, const char *file, -- 2.26.2
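A sketch of how an application might request a compressed parallel save through the public API once the rest of the series is applied. VIR_DOMAIN_SAVE_PARAM_FILE comes from the already-upstreamed base params API, the path and channel count are example values, and "zstd" is one of the formats wired up later in the series; error handling is omitted:

    virTypedParameterPtr params = NULL;
    int nparams = 0;
    int maxparams = 0;

    virTypedParamsAddString(&params, &nparams, &maxparams,
                            VIR_DOMAIN_SAVE_PARAM_FILE, "/var/tmp/mydom.sav");
    virTypedParamsAddInt(&params, &nparams, &maxparams,
                         VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS, 4);
    virTypedParamsAddString(&params, &nparams, &maxparams,
                            VIR_DOMAIN_SAVE_PARAM_PARALLEL_COMPRESSION, "zstd");

    virDomainSaveParams(dom, params, nparams, VIR_DOMAIN_SAVE_PARALLEL);
    virTypedParamsFree(params, nparams);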

Signed-off-by: Claudio Fontana <cfontana@suse.de> --- src/qemu/qemu_driver.c | 11 ++++++----- src/qemu/qemu_saveimage.c | 1 + src/qemu/qemu_saveimage.h | 1 + src/qemu/qemu_snapshot.c | 2 +- 4 files changed, 9 insertions(+), 6 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index c3b2afe0d8..8839e0031d 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -2650,7 +2650,8 @@ static int qemuDomainSaveInternal(virQEMUDriver *driver, virDomainObj *vm, const char *path, int compressed, virCommand *compressor, - const char *xmlin, int nconn, unsigned int flags) + const char *xmlin, int nconn, const char *pcomp, + unsigned int flags) { g_autofree char *xml = NULL; bool was_running = false; @@ -2731,7 +2732,7 @@ qemuDomainSaveInternal(virQEMUDriver *driver, xml = NULL; ret = qemuSaveImageCreate(driver, vm, path, data, compressor, - nconn, flags, VIR_ASYNC_JOB_SAVE); + nconn, pcomp, flags, VIR_ASYNC_JOB_SAVE); if (ret < 0) goto endjob; @@ -2809,7 +2810,7 @@ qemuDomainManagedSaveHelper(virQEMUDriver *driver, VIR_INFO("Saving state of domain '%s' to '%s'", vm->def->name, path); if (qemuDomainSaveInternal(driver, vm, path, compressed, - compressor, dxml, -1, flags) < 0) + compressor, dxml, -1, NULL, flags) < 0) return -1; vm->hasManagedSave = true; @@ -2848,7 +2849,7 @@ qemuDomainSaveFlags(virDomainPtr dom, const char *path, const char *dxml, goto cleanup; ret = qemuDomainSaveInternal(driver, vm, path, compressed, - compressor, dxml, -1, flags); + compressor, dxml, -1, NULL, flags); cleanup: virDomainObjEndAPI(&vm); @@ -2922,7 +2923,7 @@ qemuDomainSaveParams(virDomainPtr dom, goto cleanup; ret = qemuDomainSaveInternal(driver, vm, to, compressed, - compressor, dxml, nconn, flags); + compressor, dxml, nconn, NULL, flags); cleanup: virDomainObjEndAPI(&vm); diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c index 2ed02d31de..505d9b615f 100644 --- a/src/qemu/qemu_saveimage.c +++ b/src/qemu/qemu_saveimage.c @@ -561,6 +561,7 @@ qemuSaveImageCreate(virQEMUDriver *driver, virQEMUSaveData *data, virCommand *compressor, int nconn, + const char *pcomp G_GNUC_UNUSED, unsigned int flags, virDomainAsyncJob asyncJob) { diff --git a/src/qemu/qemu_saveimage.h b/src/qemu/qemu_saveimage.h index 158bcaff50..8542cd9081 100644 --- a/src/qemu/qemu_saveimage.h +++ b/src/qemu/qemu_saveimage.h @@ -130,6 +130,7 @@ qemuSaveImageCreate(virQEMUDriver *driver, virQEMUSaveData *data, virCommand *compressor, int nconn, + const char *pcomp, unsigned int flags, virDomainAsyncJob asyncJob); diff --git a/src/qemu/qemu_snapshot.c b/src/qemu/qemu_snapshot.c index dec653c6d3..e1c18bbaff 100644 --- a/src/qemu/qemu_snapshot.c +++ b/src/qemu/qemu_snapshot.c @@ -1457,7 +1457,7 @@ qemuSnapshotCreateActiveExternal(virQEMUDriver *driver, memory_existing = virFileExists(snapdef->memorysnapshotfile); if ((ret = qemuSaveImageCreate(driver, vm, snapdef->memorysnapshotfile, - data, compressor, -1, 0, + data, compressor, -1, NULL, 0, VIR_ASYNC_JOB_SNAPSHOT)) < 0) goto cleanup; -- 2.26.2

Signed-off-by: Claudio Fontana <cfontana@suse.de> --- src/qemu/qemu_driver.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 8839e0031d..52f3300c55 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -2874,6 +2874,7 @@ qemuDomainSaveParams(virDomainPtr dom, g_autoptr(virCommand) compressor = NULL; const char *to = NULL; const char *dxml = NULL; + const char *pcomp = NULL; int compressed; int ret = -1; int nconn = 2; @@ -2890,6 +2891,8 @@ qemuDomainSaveParams(virDomainPtr dom, VIR_TYPED_PARAM_STRING, VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS, VIR_TYPED_PARAM_INT, + VIR_DOMAIN_SAVE_PARAM_PARALLEL_COMPRESSION, + VIR_TYPED_PARAM_STRING, NULL) < 0) return -1; @@ -2901,6 +2904,8 @@ qemuDomainSaveParams(virDomainPtr dom, return -1; if (virTypedParamsGetInt(params, nparams, VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS, &nconn) < 0) return -1; + if (virTypedParamsGetString(params, nparams, VIR_DOMAIN_SAVE_PARAM_PARALLEL_COMPRESSION, &pcomp) < 0) + return -1; if (!(vm = qemuDomainObjFromDomain(dom))) goto cleanup; @@ -2923,7 +2928,7 @@ qemuDomainSaveParams(virDomainPtr dom, goto cleanup; ret = qemuDomainSaveInternal(driver, vm, to, compressed, - compressor, dxml, nconn, NULL, flags); + compressor, dxml, nconn, pcomp, flags); cleanup: virDomainObjEndAPI(&vm); -- 2.26.2

change from static to external linkage, and move the function close to the other similar ones, near qemuMigrationParamsSetULL. Signed-off-by: Claudio Fontana <cfontana@suse.de> --- src/qemu/qemu_migration_params.c | 47 +++++++++++++++----------------- src/qemu/qemu_migration_params.h | 5 ++++ 2 files changed, 27 insertions(+), 25 deletions(-) diff --git a/src/qemu/qemu_migration_params.c b/src/qemu/qemu_migration_params.c index 43975640d9..3efe40fc66 100644 --- a/src/qemu/qemu_migration_params.c +++ b/src/qemu/qemu_migration_params.c @@ -900,31 +900,6 @@ qemuMigrationParamsApply(virQEMUDriver *driver, } -/** - * qemuMigrationParamsSetString: - * @migrParams: migration parameter object - * @param: parameter to set - * @value: new value - * - * Enables and sets the migration parameter @param in @migrParams. Returns 0 on - * success and -1 on error. Libvirt error is reported. - */ -static int -qemuMigrationParamsSetString(qemuMigrationParams *migParams, - qemuMigrationParam param, - const char *value) -{ - if (qemuMigrationParamsCheckType(param, QEMU_MIGRATION_PARAM_TYPE_STRING) < 0) - return -1; - - migParams->params[param].value.s = g_strdup(value); - - migParams->params[param].set = true; - - return 0; -} - - /* qemuMigrationParamsEnableTLS * @driver: pointer to qemu driver * @vm: domain object @@ -1146,6 +1121,28 @@ qemuMigrationParamsSetULL(qemuMigrationParams *migParams, return 0; } +/** + * qemuMigrationParamsSetString: + * @migrParams: migration parameter object + * @param: parameter to set + * @value: new value + * + * Enables and sets the migration parameter @param in @migrParams. Returns 0 on + * success and -1 on error. Libvirt error is reported. + */ +int +qemuMigrationParamsSetString(qemuMigrationParams *migParams, + qemuMigrationParam param, + const char *value) +{ + if (qemuMigrationParamsCheckType(param, QEMU_MIGRATION_PARAM_TYPE_STRING) < 0) + return -1; + + migParams->params[param].value.s = g_strdup(value); + migParams->params[param].set = true; + return 0; +} + /** * Returns -1 on error, diff --git a/src/qemu/qemu_migration_params.h b/src/qemu/qemu_migration_params.h index 900ad7cfc8..f69bb9b302 100644 --- a/src/qemu/qemu_migration_params.h +++ b/src/qemu/qemu_migration_params.h @@ -138,6 +138,11 @@ qemuMigrationParamsSetULL(qemuMigrationParams *migParams, qemuMigrationParam param, unsigned long long value); +int +qemuMigrationParamsSetString(qemuMigrationParams *migParams, + qemuMigrationParam param, + const char *value); + int qemuMigrationParamsGetULL(qemuMigrationParams *migParams, qemuMigrationParam param, -- 2.26.2
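The final patches in the series then use it from the parallel save path to forward the compression format; condensed from the qemuMigrationSrcToFileAux hunk below:

    if (qemuMigrationParamsSetString(migParams,
                                     QEMU_MIGRATION_PARAM_MULTIFD_COMPRESSION,
                                     pcomp) < 0)
        return -1;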

Signed-off-by: Claudio Fontana <cfontana@suse.de>
---
 src/qemu/qemu_migration.c | 17 +++++++++++++----
 src/qemu/qemu_migration.h |  2 +-
 src/qemu/qemu_saveimage.c | 15 ++++++++++++---
 3 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index c214f88dd1..5127dc07fe 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -5884,7 +5884,7 @@ qemuMigrationSrcToFileAux(virQEMUDriver *driver, virDomainObj *vm,
                           virCommand *compressor,
                           virDomainAsyncJob asyncJob,
                           const char *sun_path,
-                          int nchannels)
+                          int nchannels, const char *pcomp)
 {
     qemuDomainObjPrivate *priv = vm->privateData;
     bool bwParam = virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_MIGRATION_PARAM_BANDWIDTH);
@@ -5931,6 +5931,15 @@ qemuMigrationSrcToFileAux(virQEMUDriver *driver, virDomainObj *vm,
                                         QEMU_MIGRATION_PARAM_MULTIFD_CHANNELS,
                                         nchannels) < 0)
             return -1;
+        if (virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_MIGRATION_PARAM_MULTIFD_COMPRESSION)) {
+            if (qemuMigrationParamsSetString(migParams,
+                                             QEMU_MIGRATION_PARAM_MULTIFD_COMPRESSION, pcomp) < 0)
+                return -1;
+        } else {
+            virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s",
+                           _("QEMU does not seem to support multifd compression"));
+            return -1;
+        }
     }

     if (needParams && qemuMigrationParamsApply(driver, vm, asyncJob, migParams) < 0)
@@ -6056,17 +6065,17 @@ qemuMigrationSrcToFile(virQEMUDriver *driver, virDomainObj *vm,
                        virDomainAsyncJob asyncJob)
 {
     return qemuMigrationSrcToFileAux(driver, vm, fd, compressor,
-                                     asyncJob, NULL, -1);
+                                     asyncJob, NULL, -1, NULL);
 }

 int
 qemuMigrationSrcToFilesMultiFd(virQEMUDriver *driver, virDomainObj *vm,
                                virDomainAsyncJob asyncJob,
                                const char *sun_path,
-                               int nchannels)
+                               int nchannels, const char *pcomp)
 {
     return qemuMigrationSrcToFileAux(driver, vm, -1, NULL,
-                                     asyncJob, sun_path, nchannels);
+                                     asyncJob, sun_path, nchannels, pcomp);
 }

 int
diff --git a/src/qemu/qemu_migration.h b/src/qemu/qemu_migration.h
index 38f4877cf0..d6185770b2 100644
--- a/src/qemu/qemu_migration.h
+++ b/src/qemu/qemu_migration.h
@@ -223,7 +223,7 @@ int
 qemuMigrationSrcToFilesMultiFd(virQEMUDriver *driver, virDomainObj *vm,
                                virDomainAsyncJob asyncJob,
                                const char *sun_path,
-                               int nchannels)
+                               int nchannels, const char *pcomp)
     ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT;

 int
diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c
index 505d9b615f..f60040ef25 100644
--- a/src/qemu/qemu_saveimage.c
+++ b/src/qemu/qemu_saveimage.c
@@ -561,12 +561,13 @@ qemuSaveImageCreate(virQEMUDriver *driver,
                     virQEMUSaveData *data,
                     virCommand *compressor,
                     int nconn,
-                    const char *pcomp G_GNUC_UNUSED,
+                    const char *pcomp,
                     unsigned int flags,
                     virDomainAsyncJob asyncJob)
 {
     g_autoptr(virQEMUDriverConfig) cfg = virQEMUDriverGetConfig(driver);
     virQEMUSaveFd saveFd = QEMU_SAVEFD_INVALID;
+    virQEMUSaveMultiFdComp multiComp = QEMU_SAVE_MULTIFD_COMP_NONE;
     unsigned int oflags = O_WRONLY | O_TRUNC | O_CREAT;
     int ret = -1;

@@ -578,11 +579,19 @@ qemuSaveImageCreate(virQEMUDriver *driver,
         }
         oflags |= O_DIRECT;
     }
-
+    if (!pcomp || !pcomp[0]) {
+        pcomp = qemuSaveMultiFdCompTypeToString(QEMU_SAVE_MULTIFD_COMP_NONE);
+    }
     if (virQEMUSaveFdInit(&saveFd, path, oflags, cfg, flags & VIR_DOMAIN_SAVE_PARALLEL) < 0)
         goto cleanup;
     if (nconn > 0) {
         data->header.multifd_channels = nconn;
+        if ((multiComp = qemuSaveMultiFdCompTypeFromString(pcomp)) < 0) {
+            virReportError(VIR_ERR_OPERATION_FAILED,
+                           _("Invalid %s multifd compression format specified"), pcomp);
+            goto cleanup;
+        }
+        data->header.multifd_comp = multiComp;
     }

     if (qemuSecuritySetImageFDLabel(driver->securityManager, vm->def, saveFd.fd) < 0)
         goto cleanup;
@@ -614,7 +623,7 @@ qemuSaveImageCreate(virQEMUDriver *driver,
             goto cleanup;
         if (chown(sun_path, cfg->user, cfg->group) < 0)
             goto cleanup;
-        if (qemuMigrationSrcToFilesMultiFd(driver, vm, asyncJob, sun_path, nconn) < 0)
+        if (qemuMigrationSrcToFilesMultiFd(driver, vm, asyncJob, sun_path, nconn, pcomp) < 0)
             goto cleanup;
     } else {
         /* Perform non-parallel migration to file */
-- 
2.26.2

Signed-off-by: Claudio Fontana <cfontana@suse.de>
---
 src/qemu/qemu_saveimage.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c
index f60040ef25..d3757254a9 100644
--- a/src/qemu/qemu_saveimage.c
+++ b/src/qemu/qemu_saveimage.c
@@ -708,6 +708,23 @@ int qemuSaveImageLoadMultiFd(virConnectPtr conn, virDomainObj *vm,
                                    QEMU_MIGRATION_PARAM_MULTIFD_CHANNELS,
                                    saveFd->nchannels) < 0)
         goto cleanup;
+    if (virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_MIGRATION_PARAM_MULTIFD_COMPRESSION)) {
+        const char *pcomp = qemuSaveMultiFdCompTypeToString(data->header.multifd_comp);
+        if (!pcomp) {
+            virReportError(VIR_ERR_OPERATION_UNSUPPORTED,
+                           _("libvirt does not support parallel compression type %u"),
+                           data->header.multifd_comp);
+            goto cleanup;
+        }
+        if (qemuMigrationParamsSetString(migParams,
+                                         QEMU_MIGRATION_PARAM_MULTIFD_COMPRESSION,
+                                         pcomp) < 0)
+            goto cleanup;
+    } else if (data->header.multifd_comp != QEMU_SAVE_MULTIFD_COMP_NONE) {
+        virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s",
+                       _("QEMU does not seem to support multifd compression"));
+        goto cleanup;
+    }

     if (qemuMigrationParamsApply(driver, vm, asyncJob, migParams) < 0)
         goto cleanup;
-- 
2.26.2

this completes the save side of the parallel compression support.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
---
 docs/manpages/virsh.rst |  4 ++++
 tools/virsh-domain.c    | 12 ++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/docs/manpages/virsh.rst b/docs/manpages/virsh.rst
index 5abb03be9a..19fe94ed73 100644
--- a/docs/manpages/virsh.rst
+++ b/docs/manpages/virsh.rst
@@ -3811,6 +3811,7 @@ save
    save domain state-file [--bypass-cache] [--xml file]
       [--parallel] [--parallel-connections connections]
+      [--parallel-compression algo]
       [{--running | --paused}] [--verbose]

 Saves a running domain (RAM, but not disk state) to a state file so that
@@ -3849,6 +3850,9 @@ can be set using *--parallel-connections*.
 Parallel connections help in speeding up large save operations.

+*--parallel-compression* can be used to ask the hypervisor to provide
+compressed channels in the save stream using algorithm ``algo``.
+
 Domain saved state files assume that disk images will be unchanged
 between the creation and restore point. For a more complete system
 restore point, where the disk state is saved alongside the memory
diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
index 98a1825c98..dfb62b592e 100644
--- a/tools/virsh-domain.c
+++ b/tools/virsh-domain.c
@@ -4182,6 +4182,10 @@ static const vshCmdOptDef opts_save[] = {
      .type = VSH_OT_INT,
      .help = N_("number of extra connections to use for parallel save")
     },
+    {.name = "parallel-compression",
+     .type = VSH_OT_STRING,
+     .help = N_("compression algorithm and format for parallel save")
+    },
     {.name = "xml",
      .type = VSH_OT_STRING,
      .completer = virshCompletePathLocalExisting,
@@ -4211,6 +4215,7 @@ doSave(void *opaque)
     g_autoptr(virshDomain) dom = NULL;
     const char *name = NULL;
     const char *to = NULL;
+    const char *pcomp = NULL;
     virTypedParameterPtr params = NULL;
     int nparams = 0;
     int maxparams = 0;
@@ -4246,6 +4251,13 @@ doSave(void *opaque)
                                  VIR_DOMAIN_SAVE_PARAM_PARALLEL_CONNECTIONS, intOpt) < 0)
             goto out;
     }
+    if ((rv = vshCommandOptStringReq(ctl, cmd, "parallel-compression", &pcomp)) < 0) {
+        goto out;
+    } else {
+        if (virTypedParamsAddString(&params, &nparams, &maxparams,
+                                    VIR_DOMAIN_SAVE_PARAM_PARALLEL_COMPRESSION, pcomp) < 0)
+            goto out;
+    }
     if (vshCommandOptBool(cmd, "running"))
         flags |= VIR_DOMAIN_SAVE_RUNNING;
     if (vshCommandOptBool(cmd, "paused"))
-- 
2.26.2
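
With the option wired up, a parallel compressed save might be invoked like this (illustrative domain name and path; "zstd" again stands in for whichever algorithm names QEMU's multifd-compression parameter accepts):

virsh save bigvm /mnt/nvme/bigvm.sav --bypass-cache \
      --parallel --parallel-connections 4 --parallel-compression zstd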

Hello Daniel and all @libvirt,

reviving this old thread for multifd save/restore (for high performance migration of a large VM to an nvme disk and vice versa), now that we have made substantial progress on point b) below :-)

There is a lot of ongoing work in QEMU, spearheaded by Fabiano, to migrate in an optimized way to/from disk by leveraging, among other things, multifd and file offsets.

A couple of questions come to mind for the libvirt side though.

In terms of our use case, we would need to trigger these migrations from virsh save, restore, managedsave / start.

1) Can you confirm this is still a good target? It would seem right from my perspective to hook up save/restore first, and then reuse the same mechanism for managedsave / start.

2) Do we expect to pass a filename or a file descriptor from libvirt into QEMU? As is, libvirt today generally passes an already opened file descriptor to QEMU for migrations, roughly:

{"execute": "getfd", "arguments": {"fdname":"migrate"}}   (passing the already open fd from libvirt, e.g. 10)
{"execute": "migrate", "arguments": {"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"}}

Do we want to change libvirt to migrate to a file: URI? Does this have consequences for "labeling" / security sandboxing?

Or would it be better to continue opening the fd in libvirt, writing the libvirt header, and then passing the existing open fd to QEMU, using the QMP command "getfd" followed by "migrate"? In this second case we would need to inform QEMU of the offset into the already open fd.

Internally, the QEMU multifd code just reads and writes using pread and pwrite, so there is in any case just one fd to worry about; but who should own it, libvirt or QEMU?

3) How do we deal with O_DIRECT? In the prototype we were setting O_DIRECT on the fd from libvirt in response to the user request for --bypass-cache, which is needed 99% of the time with large VMs. I think I remember that we plan to write from libvirt normally (without O_DIRECT) and then set the flag later, but should libvirt or QEMU set the O_DIRECT flag? This likely depends on who owns the fd?

Thank you for your thoughts from the libvirt side and your help in resurrecting this topic,

Claudio

On Wed, Oct 11, 2023 at 03:46:59PM +0200, Claudio Fontana wrote:
In terms of our use case, we would need to trigger these migrations from virsh save, restore, managedsave / start.
1) Can you confirm this is still a good target?
IIRC the 'dump' command also has a codepath that can exercise the migrate-to-file logic too.
It would seem right from my perspective to hook up save/restore first, and then reuse the same mechanism for managedsave / start.
All of save, restore, managedsave, start, dump end up calling into the same internal helper methods. So once you update these helpers, you essentially get all the commands converted in one go.
2) Do we expect to pass filename or file descriptor from libvirt into QEMU?
As is, libvirt today generally passes an already opened file descriptor to QEMU for migrations, roughly:
{"execute": "getfd", "arguments": {"fdname":"migrate"}} (passing the already open fd from libvirt f.e. 10) {"execute": "migrate", "arguments": {"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"}}'
Do we want to change libvirt to migrate to a file: URI ? Does this have consequence for "labeling" / security sandboxing?
Or would it be better to continue opening the fd in libvirt, writing the libvirt header, and then passing the existing open fd to QEMU, using QMP command "getfd", followed by "migrate"? In this second case we would need to inform QEMU of the offset into the already open fd.
How about both :-)

The current migration 'fd' protocol technically can cope with any type of FD being passed on; QEMU doesn't try to interpret the FD type to any significant degree.

The 'file' protocol is explicitly providing a migration transport supporting random access I/O to storage. As such we can specify the offset too.

Now the neat trick is that the 'file' protocol impl uses qio_channel_file, and this in turn uses qemu_open, which supports FD passing. Instead of using 'getfd' though, we have to use 'add-fd'. Anyway, this lets us do FD passing as normal, while also letting us specify the offset:

{"execute": "add-fd", "arguments": {"fdset-id":"migrate"}}
{"execute": "migrate", "arguments": {"detach":true,"blk":false,"inc":false,"uri":"file:/dev/fdset/migrate,offset=124456"}}
Internally, the QEMU multifd code just reads and writes using pread, pwrite, so there is in any case just one fd to worry about, but who should own it, libvirt or QEMU?
How about both :-)

Libvirt will open the file in order to write its header. Then libvirt passes the open FD to QEMU, specifying the offset, and QEMU does its thing with vmstate, etc., and closes the FD when it's done. Libvirt's copy of the FD is still open, and libvirt can finalize its header and close the FD.
3) How do we deal with O_DIRECT? In the prototype we were setting the O_DIRECT on the fd from libvirt in response to the user request for --bypass-cache, which is needed 99% of the time with large VMs. I think I remember that we plan to write from libvirt normally (without O_DIRECT) and then set the flag later, but should libvirt or QEMU set the O_DIRECT flag? This likely depends on who owns the fd?
For O_DIRECT, the 'file' protocol should gain a new parameter 'bypass_cache: bool'. If this is set to 'true' then QEMU can set O_DIRECT on the FD it opens or receives from libvirt.

Libvirt probably just has to be careful to unset O_DIRECT at the end before it finalizes the header.

With regards,
Daniel

-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Hi Daniel, thanks for your answer,

On 10/11/23 16:05, Daniel P. Berrangé wrote:
On Wed, Oct 11, 2023 at 03:46:59PM +0200, Claudio Fontana wrote:
In terms of our use case, we would need to trigger these migrations from virsh save, restore, managedsave / start.
1) Can you confirm this is still a good target?
IIRC the 'dump' command also has a codepath that can exercise the migrate-to-file logic too.
It would seem right from my perspective to hook up save/restore first, and then reuse the same mechanism for managedsave / start.
All of save, restore, managedsave, start, dump end up calling into the same internal helper methods. So once you update these helpers, you essentially get all the commands converted in one go.
ok
2) Do we expect to pass filename or file descriptor from libvirt into QEMU?
As is, libvirt today generally passes an already opened file descriptor to QEMU for migrations, roughly:
{"execute": "getfd", "arguments": {"fdname":"migrate"}} (passing the already open fd from libvirt f.e. 10) {"execute": "migrate", "arguments": {"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"}}'
Do we want to change libvirt to migrate to a file: URI ? Does this have consequence for "labeling" / security sandboxing?
Or would it be better to continue opening the fd in libvirt, writing the libvirt header, and then passing the existing open fd to QEMU, using QMP command "getfd", followed by "migrate"? In this second case we would need to inform QEMU of the offset into the already open fd.
How about both :-)
The current migration 'fd' protocol technically can cope with any type of FD being passed on. QEMU doesn't try to interpret the FD type right to any significant degree.
The 'file' protocol is explicitly providing a migration transport supporting random access I/O to storage. As such we can specify the offset too.
Now the neat trick is that 'file' protocol impl uses qio_channel_file and this in turn uses qemu_open, which supports FD passing.
Interesting!
Instead of using 'getfd' though we have to use 'add-fd'.
Anyway, this lets us do FD passing as normal, whle also letting us specify the offset.
{"execute": "add-fd", "arguments": {"fdset-id":"migrate"}} {"execute": "migrate", "arguments": {"detach":true,"blk":false,"inc":false,"uri":"file:/dev/fdset/migrate,offset=124456"}}'
Internally, the QEMU multifd code just reads and writes using pread, pwrite, so there is in any case just one fd to worry about, but who should own it, libvirt or QEMU?
How about both :-)
I need to familiarize a bit with this, there are pieces I am missing. Can you correct me here?

OPTION 1)

libvirt opens the file and has the FD, writes the header, and marks the offset; then we dup the FD in libvirt for the benefit of QEMU, optionally set the flags of the dup to O_DIRECT (the usual case) depending on --bypass-cache, and pass the duped FD to QEMU. QEMU does all the pread/pwrite on it at the correct offset (since it knows it from the file:// URI's optional offset parameter); then libvirt closes the duped fd, rewrites the header using the original fd (needed to update the metadata), and closes the original fd.

OPTION 2)

libvirt opens the file and has the FD, writes the header, and marks the offset; then we pass the FD to QEMU. QEMU dups the FD and sets it to O_DIRECT depending on a passed parameter, does all the pread/pwrite on it at the correct offset (since it knows it from the file:// URI's optional offset parameter), and closes the duped FD. libvirt rewrites the header using the original fd (needed to update the metadata) and closes the original fd.

I don't remember if the QEMU changes for the file offsets optimization are already "block friendly", i.e. whether they operate correctly whatever the state of O_DIRECT or ~O_DIRECT; I think so. They have been thought out with O_DIRECT in mind.

So I would tend to see OPTION 1) as more attractive, as QEMU does not need to care about another parameter: whatever has been chosen in libvirt in terms of bypass cache is handled in libvirt.

Please correct my understanding where needed, thanks!

Claudio
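
For concreteness, here is a minimal C sketch of the OPTION 1) fd handling (hypothetical helper name, not actual libvirt code). One caveat the sketch makes explicit: O_DIRECT set via fcntl() is a file status flag living on the open file description, which a plain dup() shares with the original fd, so a truly independent flag would require opening the path a second time instead of dup():

#define _GNU_SOURCE   /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical OPTION 1) helper: produce the fd that would be handed
 * to QEMU, optionally with O_DIRECT in response to --bypass-cache. */
static int
prepareQemuFd(int savefd, bool bypassCache)
{
    int qemufd = dup(savefd);

    if (qemufd < 0)
        return -1;

    if (bypassCache) {
        int fl = fcntl(qemufd, F_GETFL);

        /* Note: this toggles O_DIRECT for *both* descriptors, since
         * dup()'ed fds share one open file description. */
        if (fl < 0 || fcntl(qemufd, F_SETFL, fl | O_DIRECT) < 0) {
            close(qemufd);
            return -1;
        }
    }
    return qemufd;   /* to be passed to QEMU via 'add-fd' */
}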

On Wed, Oct 11, 2023 at 04:56:12PM +0200, Claudio Fontana wrote:
On 10/11/23 16:05, Daniel P. Berrangé wrote:
Instead of using 'getfd' though we have to use 'add-fd'.
Anyway, this lets us do FD passing as normal, whle also letting us specify the offset.
{"execute": "add-fd", "arguments": {"fdset-id":"migrate"}} {"execute": "migrate", "arguments": {"detach":true,"blk":false,"inc":false,"uri":"file:/dev/fdset/migrate,offset=124456"}}'
Internally, the QEMU multifd code just reads and writes using pread, pwrite, so there is in any case just one fd to worry about, but who should own it, libvirt or QEMU?
How about both :-)
I need to familiarize a bit with this, there are pieces I am missing. Can you correct here?
OPTION 1)
libvirt opens the file and has the FD, writes the header, marks the offset, then we dup the FD in libvirt for the benefit of QEMU, optionally set the flags of the dup to "O_DIRECT" (the usual case) depending on --bypass-cache, pass the duped FD to QEMU, QEMU does all the pread/pwrite on it with the correct offset (since it knows it from the file:// URI optional offset parameter), then libvirt closes the duped fd libvirt rewrites the header using the original fd (needed to update the metadata), libvirt closes the original fd
OPTION 2)
libvirt opens the file and has the FD, writes the header, marks the offset, then we pass the FD to QEMU, QEMU dups the FD and sets it as "O_DIRECT" depending on a passed parameter, QEMU does all the pread/pwrite on it with the correct offset (since it knows it from the file:// URI optional offset parameter), QEMU closes the duped FD, libvirt rewrites the header using the original fd (needed to update the metadata), libvirt closes the original fd
I don't remember if QEMU changes for the file offsets optimization are already "block friendly" ie they operate correctly whatever the state of O_DIRECT or ~O_DIRECT, I think so. They have been thought with O_DIRECT in mind.
The 'file' protocol as it exists currently is not O_DIRECT capable. It is not writing aligned buffers to aligned offsets in the file. It is still running the regular old migration stream format over the file, not taking advantage of it being random access.

What's needed is the followup "fixed ram" format adaptation. Use of that format should imply O_DIRECT, so in fact we don't need an explicit 'bypass_cache' parameter in QAPI, just a way to ask for the 'fixed ram' format.
So I would tend to see OPTION 1) as more attractive as QEMU does not need to care about another parameter, whatever has been chosen in libvirt in terms of bypass cache is handled in libvirt.
The 'fixed ram' format will only take care of I/O for the main RAM blocks, which are nicely aligned and can be written to aligned file offsets. The general device vmstate I/O probably can't be assumed to be aligned. While we could futz around with QEMUFile so that it bounce-buffers vmstate to an aligned region and flushes it in page-sized chunks, that's probably too much of a pain.

IOW, actually I think what QEMU would likely want to do is:

1. qemu_open -> get a FD *without* O_DIRECT set
2. write some vmstate stuff
3. turn on O_DIRECT
4. write RAM in fixed locations
5. turn off O_DIRECT
6. write remaining vmstate

With regards,
Daniel

-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
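
A minimal sketch of steps 3-5, assuming Linux semantics where O_DIRECT can be toggled on an open fd with fcntl(F_SETFL) and direct writes need page-aligned buffers, lengths, and offsets (helper names are made up; this is not QEMU code):

#define _GNU_SOURCE   /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static int
set_direct(int fd, int on)
{
    int fl = fcntl(fd, F_GETFL);

    if (fl < 0)
        return -1;
    return fcntl(fd, F_SETFL, on ? (fl | O_DIRECT) : (fl & ~O_DIRECT));
}

/* Write one RAM chunk at a fixed file offset; len and off are assumed
 * to be multiples of the page size, as the 'fixed ram' layout implies. */
static int
write_ram_direct(int fd, const void *ram, size_t len, off_t off)
{
    long pg = sysconf(_SC_PAGESIZE);
    void *buf = NULL;
    ssize_t done = -1;

    if (set_direct(fd, 1) < 0)                /* 3. turn on O_DIRECT */
        return -1;
    if (posix_memalign(&buf, pg, len) == 0) {
        memcpy(buf, ram, len);                /* aligned bounce buffer */
        done = pwrite(fd, buf, len, off);     /* 4. fixed file location */
        free(buf);
    }
    if (set_direct(fd, 0) < 0)                /* 5. back off for vmstate */
        return -1;
    return done == (ssize_t)len ? 0 : -1;
}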

On 10/11/23 17:29, Daniel P. Berrangé wrote:
On Wed, Oct 11, 2023 at 04:56:12PM +0200, Claudio Fontana wrote:
On 10/11/23 16:05, Daniel P. Berrangé wrote:
Instead of using 'getfd' though we have to use 'add-fd'.
Anyway, this lets us do FD passing as normal, whle also letting us specify the offset.
{"execute": "add-fd", "arguments": {"fdset-id":"migrate"}} {"execute": "migrate", "arguments": {"detach":true,"blk":false,"inc":false,"uri":"file:/dev/fdset/migrate,offset=124456"}}'
Hi Daniel,

the "add-fd" part is the one I don't understand at all. Should we actually pass an fd there, like with "getfd", already open on the savevm file? Something in pseudocode like:

virsh qemu-monitor-command --pass-fds 10 --cmd='{"execute": "add-fd", "arguments": {"fdset-id":10}}'

Should we use "opaque" instead of "fdset-id" if we want to actually set it to "migrate"? And how do we reference it later?

virsh qemu-monitor-command --cmd='{"execute": "migrate", "arguments": {"detach":true,"blk":false,"inc":false,"uri":"file:/dev/fdset/migrate,offset=124456"}}'

"opaque" does not seem to get me a reachable /dev/fdset/migrate though.

I can currently trigger the migration to the URI file:/mnt/nvme/savevm, so that seems to work fine; it's the file:/dev/fdset part that I am still unable to glue together.

Thanks for any idea,

Claudio

On Mon, Oct 23, 2023 at 04:06:53PM +0200, Claudio Fontana wrote:
On 10/11/23 17:29, Daniel P. Berrangé wrote:
On Wed, Oct 11, 2023 at 04:56:12PM +0200, Claudio Fontana wrote:
On 10/11/23 16:05, Daniel P. Berrangé wrote:
Instead of using 'getfd' though we have to use 'add-fd'.
Anyway, this lets us do FD passing as normal, whle also letting us specify the offset.
{"execute": "add-fd", "arguments": {"fdset-id":"migrate"}} {"execute": "migrate", "arguments": {"detach":true,"blk":false,"inc":false,"uri":"file:/dev/fdset/migrate,offset=124456"}}'
Hi Daniel,
the "add-fd" is the part that I don't understand at all,
should we actually pass an fd there like with fd-get, already open with the savevm file? Something in pseudocode like:
virsh qemu-monitor-command --pass-fds 10 --cmd='{"execute": "add-fd", "arguments": {"fdset-id":10}} ?
should we use "opaque" instead of "fdset-id" if you want to actually set it to "migrate"? And how to reference it later?
virsh qemu-monitor-command --cmd='{"execute": "migrate", "arguments": {"detach":true,"blk":false,"inc":false,"uri":"file:/dev/fdset/migrate,offset=124456"}}
?
"opaque" does not seem to get me a reachable /dev/fdset/migrate though.
I can currently trigger the migration to the URI file:/mnt/nvme/savevm so that seems to work fine, it's the file:/dev/fdset part that I am still unable to glue together.
Sorry, I was mis-remembering the fdset usage. They have to be used via a numeric ID value, not a named set. The numeric IDs don't match the FD numbers either. So I think it would be:

virsh qemu-monitor-command --pass-fds 10 --cmd='{"execute": "add-fd", "arguments": {"fdset-id":7}}'

virsh qemu-monitor-command --cmd='{"execute": "migrate", "arguments": {"detach":true,"blk":false,"inc":false,"uri":"file:/dev/fdset/7,offset=124456"}}'

With regards,
Daniel

-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
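
For reference, 'add-fd' also replies with the numeric fdset ID and the fd number as seen inside QEMU, confirming what /dev/fdset/7 will resolve to; with illustrative values:

{"return": {"fdset-id": 7, "fd": 23}}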