Daniel P. Berrangé <berrange@redhat.com> writes:
> On Thu, Aug 08, 2024 at 05:38:03PM -0600, Jim Fehlig via Devel wrote:
>> Introduce support for QEMU's new mapped-ram stream format [1].
>> mapped-ram is enabled by default if the underlying QEMU advertises
>> the mapped-ram migration capability. It can be disabled by changing
>> the 'save_image_version' setting in qemu.conf to version '2'.
>>
>> To use mapped-ram with QEMU:
>> - The 'mapped-ram' migration capability must be set to true
>> - The 'multifd' migration capability must be set to true and
>>   the 'multifd-channels' migration parameter must be set to 1
>> - QEMU must be provided an fdset containing the migration fd
>> - The 'migrate' qmp command is invoked with a URI referencing the
>>   fdset and an offset where to start writing the data stream, e.g.
>>
>>   {"execute":"migrate",
>>    "arguments":{"detach":true,"resume":false,
>>                 "uri":"file:/dev/fdset/0,offset=0x11921"}}
>>
>> The mapped-ram stream, in conjunction with direct IO and multifd
>> support provided by subsequent patches, can significantly improve
>> the time required to save VM memory state. The following tables
>> compare mapped-ram with the existing, sequential save stream. In
>> all cases, the save and restore operations are to/from a block
>> device comprised of two NVMe disks in RAID0 configuration with
>> xfs (~8600MiB/s). The values in the 'save time' and 'restore time'
>> columns were scraped from the 'real' time reported by time(1). The
>> 'Size' and 'Blocks' columns were provided by the corresponding
>> outputs of stat(1).
>>
>> VM: 32G RAM, 1 vcpu, idle (shortly after boot)
>>
>>                        | save    | restore |
>>                        | time    | time    |     Size     | Blocks
>> -----------------------+---------+---------+--------------+--------
>> legacy                 | 6.193s  | 4.399s  |   985744812  | 1925288
>> -----------------------+---------+---------+--------------+--------
>> mapped-ram             | 5.109s  | 1.176s  |  34368554354 | 1774472
> I'm surprised by the restore time speed up, as I didn't think
> mapped-ram should make any perf difference without direct IO
> and multifd.
>> -----------------------+---------+---------+--------------+--------
>> legacy + direct IO     | 5.725s  | 4.512s  |   985765251  | 1925328
>> -----------------------+---------+---------+--------------+--------
>> mapped-ram + direct IO | 4.627s  | 1.490s  |  34368554354 | 1774304
> Still somewhat surprised by the speed up on restore here too
Hmm, I'm thinking this might be caused by zero page handling. The non
mapped-ram path has an extra buffer_is_zero() and memset() of the hva
page.
Now, is it an issue that mapped-ram skips that memset? I assume guest
memory will always be clear at the start of migration. There won't be a
situation where the destination VM starts with memory already
dirty... *and* the save file is also different, otherwise it wouldn't
make any difference.
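
To make that concrete, the load-side difference I have in mind is roughly
this (heavily simplified sketch, not the actual QEMU code; buffer_is_zero()
is the helper from qemu/cutils.h, and the bitmap test just stands in for
mapped-ram's block-bitmap lookup):

  #include <stdbool.h>
  #include <stddef.h>
  #include <string.h>
  #include "qemu/cutils.h"   /* buffer_is_zero() */

  /* legacy stream: every zero-page record still touches the destination
   * (hva) page - check it and clear it if it isn't zero already */
  static void load_zero_page_legacy(void *host_page, size_t page_size)
  {
      if (!buffer_is_zero(host_page, page_size))
          memset(host_page, 0, page_size);
  }

  /* mapped-ram: a clear bit in the file's block bitmap means the page is
   * never read nor written at all, relying on destination RAM already
   * being zero-filled */
  static bool page_present_mapped_ram(const unsigned long *bitmap,
                                      size_t page_index)
  {
      size_t bits = 8 * sizeof(unsigned long);

      return !!(bitmap[page_index / bits] & (1UL << (page_index % bits)));
  }
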
>
>> -----------------------+---------+---------+--------------+--------
>> mapped-ram + direct IO |         |         |              |
>>  + multifd-channels=8  | 4.421s  | 0.845s  |  34368554318 | 1774312
>> -------------------------------------------------------------------
>>
>> VM: 32G RAM, 30G dirty, 1 vcpu in tight loop dirtying memory
>>
>>                        | save    | restore |
>>                        | time    | time    |     Size     | Blocks
>> -----------------------+---------+---------+--------------+---------
>> legacy                 | 25.800s | 14.332s |  33154309983 | 64754512
>> -----------------------+---------+---------+--------------+---------
>> mapped-ram             | 18.742s | 15.027s |  34368559228 | 64617160
>> -----------------------+---------+---------+--------------+---------
>> legacy + direct IO     | 13.115s | 18.050s |  33154310496 | 64754520
>> -----------------------+---------+---------+--------------+---------
>> mapped-ram + direct IO | 13.623s | 15.959s |  34368557392 | 64662040
>
> These figures make more sense with restore time matching save time
> more or less.
>
>> -----------------------+---------+---------+--------------+---------
>> mapped-ram + direct IO |         |         |              |
>>  + multifd-channels=8  | 6.994s  | 6.470s  |  34368554980 | 64665776
>> --------------------------------------------------------------------
>>
>> As can be seen from the tables, one caveat of mapped-ram is that the
>> logical file size of a saved image is basically equivalent to the VM
>> memory size.
>> Note however that mapped-ram typically uses fewer blocks on disk.
>>
>> Another caveat of mapped-ram is the requirement for a seekable file
>> descriptor, which currently makes it incompatible with libvirt's
>> support for save image compression. Also note the mapped-ram stream
>> is incompatible with the existing stream format, hence mapped-ram
>> cannot be used to restore an image saved with the existing format
>> and vice versa.
>>
>> [1] https://gitlab.com/qemu-project/qemu/-/blob/master/docs/devel/migration/m...
>>
>> Signed-off-by: Jim Fehlig <jfehlig@suse.com>
>> ---
>>  src/qemu/qemu_driver.c    |  20 ++++--
>>  src/qemu/qemu_migration.c | 139 ++++++++++++++++++++++++++------------
>>  src/qemu/qemu_migration.h |   4 +-
>>  src/qemu/qemu_monitor.c   |  36 ++++++++++
>>  src/qemu/qemu_monitor.h   |   4 ++
>>  src/qemu/qemu_saveimage.c |  43 +++++++++---
>>  src/qemu/qemu_saveimage.h |   2 +
>>  src/qemu/qemu_snapshot.c  |   9 ++-
>>  8 files changed, 195 insertions(+), 62 deletions(-)
>>
>
>
>
>> diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c
>> index 6f2ce40124..98a1ad638d 100644
>> --- a/src/qemu/qemu_saveimage.c
>> +++ b/src/qemu/qemu_saveimage.c
>> @@ -96,6 +96,7 @@ G_DEFINE_AUTOPTR_CLEANUP_FUNC(virQEMUSaveData, virQEMUSaveDataFree);
>>   */
>>  virQEMUSaveData *
>>  virQEMUSaveDataNew(virQEMUDriver *driver,
>> +                   virDomainObj *vm,
>>                     char *domXML,
>>                     qemuDomainSaveCookie *cookieObj,
>>                     bool running,
>> @@ -115,6 +116,19 @@ virQEMUSaveDataNew(virQEMUDriver *driver,
>>      header = &data->header;
>>      memcpy(header->magic, QEMU_SAVE_PARTIAL, sizeof(header->magic));
>>      header->version = cfg->saveImageVersion;
>> +
>> +    /* Enable mapped-ram feature if available and save version >= 3 */
>> +    if (header->version >= QEMU_SAVE_VERSION &&
>> +        qemuMigrationCapsGet(vm, QEMU_MIGRATION_CAP_MAPPED_RAM)) {
>> +        if (compressed != QEMU_SAVE_FORMAT_RAW) {
>> +            virReportError(VIR_ERR_OPERATION_FAILED,
>> +                           _("compression is not supported with save image version %1$u"),
>> +                           header->version);
>> +            goto error;
>> +        }
>> +        header->features |= QEMU_SAVE_FEATURE_MAPPED_RAM;
>> +    }
>
> If the QEMU we're using doesn't have CAP_MAPPED_RAM, then I think
> we should NOT default to Version 3 save images, as that's creating
> a backcompat problem for zero user benefit.
>
> This suggests that in qemu_conf.c, we should initialize the
> default value to '0', and then in this code, if we see
> version 0 we should pick either 2 or 3 depending on mapped
> ram.
>
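Something like this, presumably (just a sketch of the defaulting you
describe; the identifiers are the ones visible in the patch context above,
and treating 0 as "not set by the admin" is the assumption):

    header->version = cfg->saveImageVersion;

    if (header->version == 0) {
        if (qemuMigrationCapsGet(vm, QEMU_MIGRATION_CAP_MAPPED_RAM))
            header->version = QEMU_SAVE_VERSION;  /* 3, mapped-ram capable */
        else
            header->version = 2;                  /* pre-mapped-ram format */
    }
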
>> +
>>      header->was_running = running ? 1 : 0;
>>      header->compressed = compressed;
>>
>
> With regards,
> Daniel