Hello Daniel and all @libvirt,
reviving this old thread for multifd save/restore (for high performance migration of a
large VM to an nvme disk and viceversa),
now that we have made substantial progress on point b) below :-)
There is a lot of ongoing work in QEMU spearheaded by Fabiano to migrate in an optimized
way now to/from disk by leveraging among others multifd and file offsets.
A couple questions that come to mind for the libvirt side though.
In terms of our use case, we would need to trigger these migrations from virsh save,
restore, managedsave / start.
1) Can you confirm this is still a good target?
It would seem right from my perspective to hook up save/restore first, and then reuse the
same mechanism for managedsave / start.
2) Do we expect to pass filename or file descriptor from libvirt into QEMU?
As is, libvirt today generally passes an already opened file descriptor to QEMU for
migrations, roughly:
{"execute": "getfd", "arguments":
{"fdname":"migrate"}} (passing the already open fd from libvirt f.e.
10)
{"execute": "migrate", "arguments":
{"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"}}'
Do we want to change libvirt to migrate to a file: URI ? Does this have consequence for
"labeling" / security sandboxing?
Or would it be better to continue opening the fd in libvirt, writing the libvirt header,
and then passing the existing open fd to QEMU, using QMP command "getfd",
followed by "migrate"? In this second case we would need to inform QEMU of the
offset into the already open fd.
Internally, the QEMU multifd code just reads and writes using pread, pwrite, so there is
in any case just one fd to worry about,
but who should own it, libvirt or QEMU?
3) How do we deal with O_DIRECT? In the prototype we were setting the O_DIRECT on the fd
from libvirt in response to the user request for --bypass-cache,
which is needed 99% of the time with large VMs. I think I remember that we plan to write
from libvirt normally (without O_DIRECT) and then set the flag later,
but should libvirt or QEMU set the O_DIRECT flag? This likely depends on who owns the fd?
Thank you for your thoughts from the libvirt side and your help in resurrecting this
topic,
Claudio
On 6/7/22 11:19, Claudio Fontana wrote:
This is v11 of the multifd save prototype, which focuses on saving
to a single file instead of requiring multiple separate files for
multifd channels.
This series demonstrates a way to save and restore from a single file
by using interleaved channels of a size equal to the transfer buffer
size (1MB), and relying on UNIX holes to avoid wasting physical disk
space due to channels of different size.
KNOWN ISSUES:
a) still applies only to save/restore (no managed save etc)
b) this is not not done in QEMU, where it could be possible to teach
QEMU to migrate directly to a file or block device in a
block-aligned way by altering the migration stream code and the
state migration code of all devices.
changes from v10:
* virfile: add new API virFileDiskCopyChannel, which extends the
existing virFileDiskCopy to work with parallel channels in the file.
* drop use of virthread API, use GLIB for threads.
* pass only a single FD to the multifd-helper, which will then open
additional FDs as required for the multithreaded I/O.
* simplify virQEMUSaveFd API, separating the initialization from the
addition of extra channels.
* adapt all documentation to mention a single file instead of multiple.
* remove the "Lim" versions of virFileDirectRead and Write, they are
not needed.
---
changes from v9:
* exposed virFileDirectAlign
* separated the >= 2 QEMU_SAVE_VERSION change in own patch
* reworked the write code to add the alignment padding to the
data_len, making the on disk format compatible when loaded
from an older libvirt.
* reworked the read code to use direct I/O APIs only for actual
direct I/O file descriptors, so as to make old images work
with newer libvirt.
---
changes from v8:
* rebased on master
* reordered patches to add more upstreamable content at the start
* split introduction of virQEMUSaveFd, so the first part is multifd-free
* new virQEMUSaveDataRead as a mirror of virQEMUSaveDataWrite
* introduced virFileDirect API, using it in virFileDisk operations and
for virQEMUSaveRead and virQEMUSaveWrite
---
changes from v7:
* [ base params API and iohelper refactoring upstreamed ]
* extended the QEMU save image format more, to record the nr
of multifd channels on save. Made the data header struct packed.
* removed --parallel-connections from the restore command, as now
it is useless due to QEMU save image format extension.
* separate out patches to expose migration_params APIs to saveimage,
including qemuMigrationParamsSetString, SetCap, SetInt.
* fixed bugs in the ImageOpen patch (missing saveFd init), removed
some whitespace, and fixed some convoluted code paths for return
value -3.
---
changes from v6:
* improved error path handling, with error messages and especially
cancellation of qemu process on error during restore.
* split patches more and reordered them to keep general refactoring
at the beginning before the --parallel stuff is introduced.
* improved multifd compression support, including adding an enum
and extending the QEMU save image format to record the compression
used on save, and pick it up automatically on restore.
---
changes from v4:
* runIO renamed to virFileDiskCopy and rethought arguments
* renamed new APIs from ...ParametersFlags to ...Params
* introduce the new virDomainSaveParams and virDomainRestoreParams
without any additional parameters, so they can be upstreamed first.
* solved the issue in the gendispatch.pl script generating code that
was missing the conn parameter.
---
changes from v3:
* reordered series to have all helper-related change at the start
* solved all reported issues from ninja test, including documentation
* fixed most broken migration capabilities code (still imperfect likely)
* added G_GNUC_UNUSED as needed
* after multifd restore, added what I think were the missing operations:
qemuProcessRefreshState(),
qemuProcessStartCPUs() - most importantly,
virDomainObjSave()
The domain now starts running after restore without further encouragement
* removed the sleep(10) from the multifd-helper
---
changes from v2:
* added ability to restore the VM from disk using multifd
* fixed the multifd-helper to work in both directions,
assuming the need to listen for save, and connect for restore.
* fixed a large number of bugs, and probably introduced some :-)
---
Claudio Fontana (33):
virfile: introduce virFileDirect APIs
virfile: use virFileDirect API in runIOCopy
qemu: saveimage: rework image read/write to be O_DIRECT friendly
qemu: saveimage: assume future formats will also support compression
virfile: virFileDiskCopy: prepare for O_DIRECT files without wrapper
qemu: saveimage: introduce virQEMUSaveFd
qemu: saveimage: convert qemuSaveImageCreate to use virQEMUSaveFd
qemu: saveimage: convert qemuSaveImageOpen to use virQEMUSaveFd
tools: prepare doSave to use parameters
tools: prepare cmdRestore to use parameters
libvirt: add new VIR_DOMAIN_SAVE_PARALLEL flag and parameter
qemu: add stub support for VIR_DOMAIN_SAVE_PARALLEL in save
qemu: add stub support for VIR_DOMAIN_SAVE_PARALLEL in restore
virfile: add new API virFileDiskCopyChannel
multifd-helper: new helper for parallel save/restore
qemu: saveimage: update virQEMUSaveFd struct for parallel save
qemu: saveimage: wire up saveimage code with the multifd helper
qemu: capabilities: add multifd to the probed migration capabilities
qemu: saveimage: add multifd related fields to save format
qemu: migration_params: add APIs to set Int and Cap
qemu: migration: implement qemuMigrationSrcToFilesMultiFd for save
qemu: add parameter to qemuMigrationDstRun to skip waiting
qemu: implement qemuSaveImageLoadMultiFd for restore
tools: add parallel parameter to virsh save command
tools: add parallel parameter to virsh restore command
qemu: add migration parameter multifd-compression
libvirt: add new VIR_DOMAIN_SAVE_PARAM_PARALLEL_COMPRESSION
qemu: saveimage: add parallel compression argument to ImageCreate
qemu: saveimage: add stub support for multifd compression parameter
qemu: migration: expose qemuMigrationParamsSetString
qemu: saveimage: implement multifd-compression in parallel save
qemu: saveimage: restore compressed parallel images
tools: add parallel-compression parameter to virsh save command
docs/manpages/virsh.rst | 26 +-
include/libvirt/libvirt-domain.h | 24 +
po/POTFILES | 1 +
src/libvirt_private.syms | 6 +
src/qemu/qemu_capabilities.c | 4 +
src/qemu/qemu_capabilities.h | 2 +
src/qemu/qemu_driver.c | 146 ++--
src/qemu/qemu_migration.c | 160 ++--
src/qemu/qemu_migration.h | 16 +-
src/qemu/qemu_migration_params.c | 71 +-
src/qemu/qemu_migration_params.h | 15 +
src/qemu/qemu_process.c | 3 +-
src/qemu/qemu_process.h | 5 +-
src/qemu/qemu_saveimage.c | 703 +++++++++++++-----
src/qemu/qemu_saveimage.h | 69 +-
src/qemu/qemu_snapshot.c | 6 +-
src/util/iohelper.c | 3 +
src/util/meson.build | 16 +
src/util/multifd-helper.c | 359 +++++++++
src/util/virfile.c | 391 +++++++---
src/util/virfile.h | 11 +
.../caps_4.0.0.aarch64.xml | 1 +
.../qemucapabilitiesdata/caps_4.0.0.ppc64.xml | 1 +
.../caps_4.0.0.riscv32.xml | 1 +
.../caps_4.0.0.riscv64.xml | 1 +
.../qemucapabilitiesdata/caps_4.0.0.s390x.xml | 1 +
.../caps_4.0.0.x86_64.xml | 1 +
.../caps_4.1.0.x86_64.xml | 1 +
.../caps_4.2.0.aarch64.xml | 1 +
.../qemucapabilitiesdata/caps_4.2.0.ppc64.xml | 1 +
.../qemucapabilitiesdata/caps_4.2.0.s390x.xml | 1 +
.../caps_4.2.0.x86_64.xml | 1 +
.../caps_5.0.0.aarch64.xml | 2 +
.../qemucapabilitiesdata/caps_5.0.0.ppc64.xml | 2 +
.../caps_5.0.0.riscv64.xml | 2 +
.../caps_5.0.0.x86_64.xml | 2 +
.../qemucapabilitiesdata/caps_5.1.0.sparc.xml | 2 +
.../caps_5.1.0.x86_64.xml | 2 +
.../caps_5.2.0.aarch64.xml | 2 +
.../qemucapabilitiesdata/caps_5.2.0.ppc64.xml | 2 +
.../caps_5.2.0.riscv64.xml | 2 +
.../qemucapabilitiesdata/caps_5.2.0.s390x.xml | 2 +
.../caps_5.2.0.x86_64.xml | 2 +
.../caps_6.0.0.aarch64.xml | 2 +
.../qemucapabilitiesdata/caps_6.0.0.s390x.xml | 2 +
.../caps_6.0.0.x86_64.xml | 2 +
.../caps_6.1.0.x86_64.xml | 2 +
.../caps_6.2.0.aarch64.xml | 2 +
.../qemucapabilitiesdata/caps_6.2.0.ppc64.xml | 2 +
.../caps_6.2.0.x86_64.xml | 2 +
.../caps_7.0.0.aarch64.xml | 2 +
.../qemucapabilitiesdata/caps_7.0.0.ppc64.xml | 2 +
.../caps_7.0.0.x86_64.xml | 2 +
.../caps_7.1.0.x86_64.xml | 2 +
tools/virsh-domain.c | 101 ++-
55 files changed, 1748 insertions(+), 445 deletions(-)
create mode 100644 src/util/multifd-helper.c