Daniel P. Berrangé <berrange@redhat.com> writes:
> On Thu, Oct 10, 2024 at 02:52:56PM -0300, Fabiano Rosas wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>>
>> > On Thu, Oct 10, 2024 at 12:06:51PM -0300, Fabiano Rosas wrote:
>> >> Daniel P. Berrangé <berrange@redhat.com> writes:
>> >>
>> >> > On Thu, Aug 08, 2024 at 05:38:03PM -0600, Jim Fehlig via Devel wrote:
>> >> >> Introduce support for QEMU's new mapped-ram stream format [1].
>> >> >> mapped-ram is enabled by default if the underlying QEMU advertises
>> >> >> the mapped-ram migration capability. It can be disabled by changing
>> >> >> the 'save_image_version' setting in qemu.conf to version '2'.
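
For reference, the fallback described above is a one-line setting in
libvirt's /etc/libvirt/qemu.conf (the file path is the usual libvirt
default; the comment is illustrative):

  # fall back to the older, sequential save stream format
  save_image_version = 2
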
>> >> >>
>> >> >> To use mapped-ram with QEMU:
>> >> >> - The 'mapped-ram' migration capability must be set to true
>> >> >> - The 'multifd' migration capability must be set to true and
>> >> >>   the 'multifd-channels' migration parameter must be set to 1
>> >> >> - QEMU must be provided an fdset containing the migration fd
>> >> >> - The 'migrate' qmp command is invoked with a URI referencing the
>> >> >>   fdset and an offset where to start writing the data stream, e.g.
>> >> >>
>> >> >>   {"execute":"migrate",
>> >> >>    "arguments":{"detach":true,"resume":false,
>> >> >>     "uri":"file:/dev/fdset/0,offset=0x11921"}}
>> >> >>
>> >> >> The mapped-ram stream, in conjunction with direct IO and multifd
>> >> >> support provided by subsequent patches, can significantly improve
>> >> >> the time required to save VM memory state. The following tables
>> >> >> compare mapped-ram with the existing, sequential save stream. In
>> >> >> all cases, the save and restore operations are to/from a block
>> >> >> device comprised of two NVMe disks in RAID0 configuration with
>> >> >> xfs (~8600MiB/s). The values in the 'save time' and 'restore time'
>> >> >> columns were scraped from the 'real' time reported by time(1). The
>> >> >> 'Size' and 'Blocks' columns were provided by the corresponding
>> >> >> outputs of stat(1).
>> >> >>
>> >> >> VM: 32G RAM, 1 vcpu, idle (shortly after boot)
>> >> >>
>> >> >>                        | save    | restore |
>> >> >>                        | time    | time    | Size         | Blocks
>> >> >> -----------------------+---------+---------+--------------+--------
>> >> >> legacy                 | 6.193s  | 4.399s  | 985744812    | 1925288
>> >> >> -----------------------+---------+---------+--------------+--------
>> >> >> mapped-ram             | 5.109s  | 1.176s  | 34368554354  | 1774472
>> >> >
>> >> > I'm surprised by the restore time speed up, as I didn't think
>> >> > mapped-ram should make any perf difference without direct IO
>> >> > and multifd.
>> >> >
>> >> >> -----------------------+---------+---------+--------------+--------
>> >> >> legacy + direct IO     | 5.725s  | 4.512s  | 985765251    | 1925328
>> >> >> -----------------------+---------+---------+--------------+--------
>> >> >> mapped-ram + direct IO | 4.627s  | 1.490s  | 34368554354  | 1774304
>> >> >
>> >> > Still somewhat surprised by the speed up on restore here too
>> >>
>> >> Hmm, I'm thinking this might be caused by zero page handling. The non
>> >> mapped-ram path has an extra buffer_is_zero() and memset() of the hva
>> >> page.
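
For contrast, the legacy load path's zero-page handling looks roughly like
QEMU's ram_handle_compressed() (renamed ram_handle_zero() in newer trees);
a simplified sketch, not the verbatim source:

  /* Fill a guest page with 'ch', skipping the write when zero-filling
   * a page that is already all zeroes (the common case). */
  void ram_handle_compressed(void *host, uint8_t ch, uint64_t size)
  {
      if (ch != 0 || !buffer_is_zero(host, size)) {
          memset(host, ch, size);
      }
  }

mapped-ram leaves such pages untouched on load, which is exactly the
buffer_is_zero()/memset() cost being discussed.
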
>> >>
>> >> Now, is it an issue that mapped-ram skips that memset? I assume guest
>> >> memory will always be clear at the start of migration. There won't be a
>> >> situation where the destination VM starts with memory already
>> >> dirty... *and* the save file is also different, otherwise it wouldn't
>> >> make any difference.
>> >
>> > Consider the snapshot use case. You're running the VM, so memory
>> > has arbitrary contents, now you restore to a saved snapshot. QEMU
>> > remains running this whole time and you can't assume initial
>> > memory is zeroed. Surely we need the memset ?
>>
>> Hmm, I probably have a big gap in my knowledge here, but savevm doesn't
>> hook into file migration, so there's no way to load a snapshot with
>> mapped-ram that I know of. Is this something that libvirt enables
>> somehow? There would be no -incoming on the cmdline.
> Oops, yes, I always forget savevm is off in its own little world.
>
> Upstream we've been talking about making savevm be a facade around the
> 'migrate' command, but no one has ever made a PoC.

Yeah, that would be nice. Once I learn how the data ends up in the qcow2
image, maybe I can look into adding a new 'snapshot' migration mode to
QEMU.

>
> With regards,
> Daniel