Re: Revisiting parallel save/restore

26 Apr 2024

On Wed, Apr 17, 2024 at 05:12:27PM -0600, Jim Fehlig via Devel wrote:
...
A good starting point on this journey is supporting the new mapped-ram
capability in qemu 9.0 [2]. Since mapped-ram is a new on-disk format, I
assume we'll need a new QEMU_SAVE_VERSION 3 when using it? Otherwise I'm not
sure how to detect if a saved image is in mapped-ram format vs the existing,
sequential stream format.
Yes, we'll need to support 'mapped-ram', so it's a good first step.

A question is whether we make that feature mandatory for all save images,
or implied by another feature (parallel save), or a directly controllable,
opt-in feature.

The first option breaks back compat with existing libvirt, while the latter
two options are net new, so they don't have compat implications.

In terms of actual data blocks written on disk, mapped-ram should be the
same size as, or smaller than, the existing format.

In terms of logical file size, however, mapped-ram will almost always be
larger.

This is because mapped-ram will result in a file whose logical size matches
the guest RAM size, plus some header overhead, while being sparse, so not
all blocks are written.

If tools handling save images aren't sparse-aware this could come across
as a surprise and even be considered a regression.
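
To illustrate the sparse-awareness point, here is a minimal C sketch of how a tool could report both sizes for a save image. The struct and function names are hypothetical and are not libvirt API. The sketch also assumes Linux, where st_blocks is counted in 512-byte units.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <sys/stat.h>

/* Hypothetical helper type: two views of one save image's size. */
struct image_sizes {
    uint64_t logical;   /* st_size: what `ls -l` reports */
    uint64_t allocated; /* bytes actually backed by disk blocks */
};

/* Returns 0 on success, -1 if the file cannot be stat'ed. */
int image_sizes_get(const char *path, struct image_sizes *out)
{
    struct stat st;

    if (stat(path, &st) < 0)
        return -1;
    out->logical = (uint64_t)st.st_size;
    /* Linux counts st_blocks in 512-byte units; POSIX leaves the unit
     * implementation-defined, so this is not portable everywhere. */
    out->allocated = (uint64_t)st.st_blocks * 512;
    return 0;
}

/* Sparse: fewer bytes allocated than the logical size suggests. */
bool image_is_sparse(const struct image_sizes *s)
{
    return s->allocated < s->logical;
}
```

A tool that only looks at `logical` would see roughly the guest RAM size for a mapped-ram image. Copying such files with hole-preserving tools (e.g. GNU `cp --sparse=always`) avoids inflating them on disk.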

Mapped-ram is needed for parallel saves since it lets each thread write
to a specific region of the file.

Mapped-ram is good for non-parallel saves too, though, because RAM is
mapped into the file with suitable alignment to allow O_DIRECT to be used.
Currently libvirt has to tunnel the stream over its iohelper to futz with
the alignment needed for O_DIRECT. This makes mapped-ram desirable to use
in general, but back compat hurts...
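
O_DIRECT is the crux here. It generally requires the buffer address, the file offset and the length to be multiples of the underlying logical block size, and a sequential stream does not naturally provide that. Below is a minimal C sketch of the check. The helper names are hypothetical. The 4096-byte alignment is an assumption, since the real requirement depends on the device and filesystem.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round value up to a multiple of align; align must be a power of two. */
uint64_t align_up(uint64_t value, uint64_t align)
{
    return (value + align - 1) & ~(align - 1);
}

bool is_aligned(uint64_t value, uint64_t align)
{
    return (value & (align - 1)) == 0;
}

/* True if a read/write could be issued on an O_DIRECT fd as-is,
 * i.e. without bouncing through a helper that fixes up alignment. */
bool dio_request_ok(const void *buf, uint64_t offset, uint64_t len,
                    uint64_t align)
{
    return is_aligned((uint64_t)(uintptr_t)buf, align) &&
           is_aligned(offset, align) &&
           is_aligned(len, align);
}
```

For example, with a 4096-byte alignment, `align_up(4097, 4096)` is 8192. If RAM pages start at an aligned offset in the file, page-sized writes from aligned buffers pass this check directly.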

Looking at what we did in the past

The first time, we stole an element from 'uint32_t unused[..]' in the
save header to add the 'compressed' field, and bumped the version.
This prevented old libvirt from reading the files. This was needed
because adding compression was a backwards-incompatible change. We
could have carried on using version 1 for uncompressed images, but we
didn't for some reason. It was a hard compat break.

The next time, we stole an element from 'uint32_t unused[..]' in the
save header to add the 'cookie_len' field, but did NOT bump the
version. 'unused' is always all zeroes, so new libvirt could detect
whether the cookie was present by the len being non-zero. Old libvirt
would still load the image, but would ignore the cookie data. This
was largely harmless.
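
The 'unused' trick can be sketched in C as follows. The struct is a hypothetical, simplified header and not libvirt's actual layout. It only shows why a zero-filled reserved slot works as a presence indicator.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified save header (NOT libvirt's real layout). */
struct save_header {
    uint32_t version;
    uint32_t data_len;
    uint32_t compressed;  /* first field taken from unused[] */
    uint32_t cookie_len;  /* second field taken from unused[] */
    uint32_t unused[12];  /* always written as zeroes */
};

/* Images written before cookie_len existed have a zero in that slot,
 * because it used to be part of unused[]; non-zero means a cookie
 * of that length follows the header. */
bool header_has_cookie(const struct save_header *hdr)
{
    return hdr->cookie_len != 0;
}
```

An old reader never inspects that slot. That fits the behavior described above: the image still loads and the cookie is ignored.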

This time mapped-ram is an incompatible change, so we need to ensure
old libvirt won't try to read the files. That suggests either a save
version bump, or abusing the 'compressed' field to indicate
'mapped-ram' as a form of compression.

If we did a save version bump, we might want to carry on using
v2 for images that don't use mapped-ram.
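
To make the two options concrete, here is a hypothetical C model of the header checks an old loader performs before reading the payload. An unknown version causes a refusal, and so does an unknown 'compressed' value. The constants and names are illustrative assumptions, not libvirt's real values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants only. */
enum { OLD_MAX_SAVE_VERSION = 2 };

enum save_format {
    SAVE_FORMAT_RAW = 0,
    SAVE_FORMAT_GZIP = 1,
    SAVE_FORMAT_LAST_KNOWN = SAVE_FORMAT_GZIP, /* newest format old libvirt knows */
    SAVE_FORMAT_MAPPED_RAM = 16,               /* hypothetical new value */
};

struct save_header_min {
    uint32_t version;
    uint32_t compressed;
};

/* Models an *old* loader's sanity checks on the header. */
bool old_loader_accepts(const struct save_header_min *hdr)
{
    if (hdr->version > OLD_MAX_SAVE_VERSION)
        return false;   /* option 1: version bump */
    if (hdr->compressed > SAVE_FORMAT_LAST_KNOWN)
        return false;   /* option 2: mapped-ram as a "compression" */
    return true;
}
```

Either way, old libvirt refuses mapped-ram images. Carrying on with v2 for images without mapped-ram keeps those readable by old libvirt.
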
...
IIUC, mapped-ram cannot be used with the existing 'fd:' migration URI and
instead must use 'file:'. Does qemu advertise support for that? I couldn't
find it. If not, 'file:' (available in qemu 8.2) predates mapped-ram, so in
theory we could live without the advertisement.
'mapped-ram' is reported in QMP as a MigrationCapability, so I think we
can probe for it directly.

Yes, it is exclusively for use with the 'file:' protocol. If we want to use
FD passing, we can still do that with 'file:' by using QEMU's generic
/dev/fdset/NNN approach, as we do with block devices.
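
Here is a C sketch of the fd-passing variant. It assumes the caller has already registered the fd with QEMU via the QMP `add-fd` command under some fdset id, and then points the 'file:' migration URI at /dev/fdset/ followed by that id. The function name is hypothetical. The `,offset=` suffix reflects our understanding of QEMU's 'file:' URI syntax, so check the QEMU docs for your version. The intent is to let QEMU start writing after a header libvirt has already written.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper: format a 'file:' URI referring to an fdset.
 * Returns the URI length, or -1 on encoding error or truncation. */
int build_fdset_file_uri(char *buf, size_t buflen,
                         unsigned fdset_id, uint64_t offset)
{
    int n = snprintf(buf, buflen, "file:/dev/fdset/%u,offset=%" PRIu64,
                     fdset_id, offset);

    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return n;
}
```

For fdset 3 and a 4096-byte header, this yields `file:/dev/fdset/3,offset=4096`.
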
...
It's also not clear when we want to enable the mapped-ram capability. Should
it always be enabled if supported by the underlying qemu? One motivation for
creating mapped-ram was to support direct-io of the migration stream in
qemu, in which case it could be tied to VIR_DOMAIN_SAVE_BYPASS_CACHE. E.g.
the mapped-ram capability is enabled when the user specifies
VIR_DOMAIN_SAVE_BYPASS_CACHE && the user-provided path results in a seekable
fd && qemu supports mapped-ram?
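
The proposed rule can be written down as a small predicate. In this C sketch the names are hypothetical. `qemu_has_mapped_ram` is assumed to come from probing the QMP `query-migrate-capabilities` output for 'mapped-ram'. Seekability is tested with lseek(), which fails with ESPIPE on pipes and sockets.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* lseek() on a pipe, FIFO or socket fails (ESPIPE); on a bad fd, EBADF. */
bool fd_is_seekable(int fd)
{
    return lseek(fd, 0, SEEK_CUR) != (off_t)-1;
}

/* Hypothetical policy helper mirroring the question above:
 * BYPASS_CACHE requested && seekable fd && QEMU supports mapped-ram. */
bool want_mapped_ram(bool bypass_cache, bool seekable,
                     bool qemu_has_mapped_ram)
{
    return bypass_cache && seekable && qemu_has_mapped_ram;
}
```

A caller would evaluate `want_mapped_ram(flags & VIR_DOMAIN_SAVE_BYPASS_CACHE, fd_is_seekable(fd), caps_has_mapped_ram)`. Here `caps_has_mapped_ram` is a placeholder for the probed capability.
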
One option is to be lazy and have a /etc/libvirt/qemu.conf setting for the
save format version, defaulting to the latest, v3. Add a release note that
admin/host provisioning apps must set it to v2 if back compat is
needed with old libvirt. If we assume new -> old save image loading
is relatively rare, that's probably good enough.

IOW, we can

 * Bump save version to 3
 * Use v3 by default
 * Add a SAVE_PARALLEL flag which implies mapped-ram, reject
   if v2
 * Use mapped-ram with BYPASS_CACHE for v3, the old approach for v2
 * Steal another unused field to indicate use of mapped-ram,
   or perhaps future-proof it by declaring a 'features'
   field. That way we don't need to bump the version again; we
   just make sure that the libvirt loading an image supports all
   set features.
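
Taken together, the plan above might look roughly like this C sketch. Every name here (flags, feature bits, helpers) is hypothetical rather than existing libvirt API. QEMU capability probing is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define SAVE_FLAG_BYPASS_CACHE  (1u << 0)   /* hypothetical */
#define SAVE_FLAG_PARALLEL      (1u << 1)   /* hypothetical */

#define SAVE_FEATURE_MAPPED_RAM (1u << 0)
/* Feature bits this (imaginary) libvirt build can load. */
#define SAVE_FEATURES_KNOWN     (SAVE_FEATURE_MAPPED_RAM)

struct save_plan {
    uint32_t version;   /* from qemu.conf; 3 by default */
    uint32_t features;  /* stored in the header's 'features' field */
};

/* Returns 0 and fills *plan, or -1 if the request must be rejected. */
int plan_save(uint32_t version, unsigned flags, struct save_plan *plan)
{
    bool mapped_ram;

    if (version != 2 && version != 3)
        return -1;
    if ((flags & SAVE_FLAG_PARALLEL) && version == 2)
        return -1;   /* SAVE_PARALLEL implies mapped-ram: needs v3 */

    /* v3: mapped-ram for PARALLEL or BYPASS_CACHE; v2: old approach. */
    mapped_ram = version == 3 &&
                 (flags & (SAVE_FLAG_PARALLEL | SAVE_FLAG_BYPASS_CACHE));
    plan->version = version;
    plan->features = mapped_ram ? SAVE_FEATURE_MAPPED_RAM : 0;
    return 0;
}

/* Load side: refuse images that set any feature bit we don't know. */
bool can_load_features(uint32_t header_features)
{
    return (header_features & ~(uint32_t)SAVE_FEATURES_KNOWN) == 0;
}
```

The benefit of the mask is that a future format change adds a bit rather than a new version. A loader that doesn't recognise the bit refuses the image instead of misreading it.
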
...
Looking ahead, should the mapped-ram capability be required for supporting
the VIR_DOMAIN_SAVE_PARALLEL flag? As I understand, parallel save/restore
was another motivation for creating the mapped-ram feature. It allows
multifd threads to write exclusively to the offsets provided by mapped-ram.
Can multiple multifd threads concurrently write to an fd without mapped-ram?
Yes, mapped-ram should be a pre-requisite.
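
The property mapped-ram provides can be sketched in C. Every guest page has a fixed home in the file, so independent writers can use pwrite() with explicit offsets. They never touch the shared file position that plain write() calls on one fd would race on.

This is a simplified model under stated assumptions: 4 KiB pages, one contiguous page area after a one-page header, and hypothetical names. QEMU's actual mapped-ram layout is more involved. For brevity, the workers run one after another in reverse order rather than on threads, which is enough to show that completion order does not matter.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define GUEST_PAGE_SIZE 4096u   /* assumption: 4 KiB guest pages */

/* Fixed file offset of a page: no dependence on write order. */
uint64_t page_offset(uint64_t pages_base, uint64_t page_index)
{
    return pages_base + page_index * GUEST_PAGE_SIZE;
}

/* One "channel": writes pages whose index % nworkers == worker.
 * pwrite() does not move the file position, so workers sharing an fd
 * need no coordination with each other. */
int write_worker_pages(int fd, unsigned worker, unsigned nworkers,
                       uint64_t npages, uint64_t pages_base)
{
    unsigned char page[GUEST_PAGE_SIZE];

    if (nworkers == 0)
        return -1;
    for (uint64_t i = worker; i < npages; i += nworkers) {
        memset(page, (int)(i & 0xff), sizeof(page));   /* fake contents */
        if (pwrite(fd, page, sizeof(page),
                   (off_t)page_offset(pages_base, i)) != (ssize_t)sizeof(page))
            return -1;
    }
    return 0;
}

/* Runs 4 workers in reverse order, then verifies each page.
 * Returns 0 on success. */
int demo_any_order(void)
{
    const unsigned nworkers = 4;
    const uint64_t npages = 16, base = GUEST_PAGE_SIZE; /* 1-page header */
    unsigned char page[GUEST_PAGE_SIZE];
    FILE *f = tmpfile();
    int rc = 0;

    if (!f)
        return -1;
    for (unsigned w = nworkers; w-- > 0;)
        if (write_worker_pages(fileno(f), w, nworkers, npages, base) < 0)
            rc = -1;
    for (uint64_t i = 0; rc == 0 && i < npages; i++) {
        if (pread(fileno(f), page, sizeof(page),
                  (off_t)page_offset(base, i)) != (ssize_t)sizeof(page) ||
            page[0] != (unsigned char)(i & 0xff))
            rc = -1;
    }
    fclose(f);
    return rc;
}
```

In a real implementation, each worker would run on its own thread (e.g. one per multifd channel). Because the offsets are fixed, the resulting file is the same.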

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Daniel P. Berrangé