
On 4/26/24 4:04 AM, Daniel P. Berrangé wrote:
On Wed, Apr 17, 2024 at 05:12:27PM -0600, Jim Fehlig via Devel wrote:
A good starting point on this journey is supporting the new mapped-ram capability in qemu 9.0 [2]. Since mapped-ram is a new on-disk format, I assume we'll need a new QEMU_SAVE_VERSION 3 when using it? Otherwise I'm not sure how to detect if a saved image is in mapped-ram format vs the existing, sequential stream format.
Yes, we'll need to support 'mapped-ram', so that's a good first step.
A question is whether we make that feature mandatory for all save images, implied by another feature (parallel save), or a directly controllable feature with opt-in.
It feels more like an implementation detail.
The former breaks back compat with existing libvirt, while the latter 2 options are net new so don't have compat implications.
In terms of actual data blocks written to disk, mapped-ram should be the same size, or smaller, than the existing format.
In terms of logical file size, however, mapped-ram will almost always be larger.
Correct. E.g. from a mostly idle 8G VM:

# stat existing-format.sav
  Size: 510046983    Blocks: 996192    IO Block: 4096    regular file
# stat mapped-ram-format.sav
  Size: 8597730739   Blocks: 956200    IO Block: 4096    regular file

The upside is mapped-ram is bounded, unlike the existing stream format, which can result in actual file sizes much greater than RAM size when the VM runs a memory-intensive workload.
This is because mapped-ram will result in a file whose logical size matches the guest RAM size, plus some header overhead, while being sparse so not all blocks are written.
If tools handling save images aren't sparse-aware this could come across as a surprise and even be considered a regression.
Yes, I already had visions of the phone ringing off the hook with people asking "why are my save images suddenly huge?". But maybe it's tolerable once they see the actual blocks used, and when combined with parallel saves they could also be asking "why are saves suddenly so fast?" :-).
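(As a minimal sketch of the "actual blocks used" point: a sparse-aware tool only needs stat(). The file name below is just illustrative.)

#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    struct stat sb;

    /* illustrative name; substitute the real save image path */
    if (stat("mapped-ram-format.sav", &sb) < 0) {
        perror("stat");
        return 1;
    }

    /* st_size is the logical size (~RAM size plus header overhead for
     * mapped-ram); st_blocks counts 512-byte units actually allocated */
    printf("logical:   %lld bytes\n", (long long)sb.st_size);
    printf("allocated: %lld bytes\n", (long long)sb.st_blocks * 512);
    return 0;
}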
Mapped ram is needed for parallel saves since it lets each thread write to a specific region of the file.
Mapped ram is good for non-parallel saves too though, because the mapping of RAM into the file is aligned suitably to allow O_DIRECT to be used. Currently libvirt has to tunnel over its iohelper to futz with the alignment needed for O_DIRECT. This makes mapped-ram desirable to use in general, but back compat hurts...
My POC avoids the use of the iohelper with mapped-ram. When direct-io has been requested, it provides qemu with two fds: one opened with O_DIRECT and one without.
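Roughly like this (a simplified sketch; error reporting and the actual fd-passing plumbing to qemu are omitted, and the helper name is made up):

#define _GNU_SOURCE  /* for O_DIRECT */
#include <fcntl.h>
#include <unistd.h>

/* Open the save file twice: one fd with O_DIRECT for the page-aligned
 * RAM blocks, one buffered fd for the unaligned headers/metadata.
 * Both fds are then handed to qemu (e.g. grouped into a single fdset)
 * so it can choose the appropriate one for each access. */
static int
open_save_fds(const char *path, int *fd_buffered, int *fd_direct)
{
    if ((*fd_buffered = open(path, O_CREAT | O_WRONLY, 0600)) < 0)
        return -1;

    if ((*fd_direct = open(path, O_WRONLY | O_DIRECT)) < 0) {
        close(*fd_buffered);
        return -1;
    }
    return 0;
}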
Looking at what we did in the past:
The first time, we stole an element from 'uint32_t unused[..]' in the save header to add the 'compressed' field, and bumped the version. This prevented old libvirt from reading the files, which was needed since adding compression was a non-backwards-compatible change. We could have carried on using version 1 for non-compressed images, but we didn't for some reason. It was a hard compat break.
The next time, we stole an element from 'uint32_t unused[..]' in the save header to add the 'cookie_len' field, but did NOT bump the version. 'unused' is always all zeroes, so new libvirt could detect whether the cookie was present by the len being non-zero. Old libvirt would still load the image, but would ignore the cookie data. This was largely harmless.
This time mapped-ram is a non-compatible change, so we need to ensure old libvirt won't try to read the files. That suggests either a save version bump, or abusing the 'compressed' field to indicate 'mapped-ram' as a form of compression.
If we did a save version bump, we might want to carry on using v2 for non-mapped-ram images.
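For reference, the header being discussed looks roughly like this (paraphrased from libvirt's qemu_saveimage.h; the exact size of unused[] is from memory):

#include <stdint.h>

#define QEMU_SAVE_MAGIC "LibvirtQemudSave"

struct _virQEMUSaveHeader {
    char magic[sizeof(QEMU_SAVE_MAGIC) - 1];
    uint32_t version;     /* bumped to 2 when 'compressed' was added */
    uint32_t data_len;
    uint32_t was_running;
    uint32_t compressed;  /* stolen from unused[], hard compat break */
    uint32_t cookie_len;  /* stolen from unused[], no version bump */
    uint32_t unused[15];  /* always zero, so new fields are detectable */
};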
IIUC, mapped-ram cannot be used with the existing 'fd:' migration URI and instead must use 'file:'. Does qemu advertise support for that? I couldn't find it. If not, 'file:' (available since qemu 8.2) predates mapped-ram, so in theory we could live without the advertisement.
'mapped-ram' is reported in QMP as a MigrationCapability, so I think we can probe for it directly.
Yes, mapped-ram is reported. Sorry for not being clear, but I was asking whether qemu advertises support for the 'file:' migration URI it gained in 8.2. Probably not a problem either way, since 'file:' predates mapped-ram.
Yes, it is exclusively for use with the 'file:' protocol. If we want to use FD passing, we can still do that with 'file:' by using QEMU's generic /dev/fdset/NNN approach, as we do with block devices.
It's also not clear when we want to enable the mapped-ram capability. Should it always be enabled if supported by the underlying qemu? One motivation for creating mapped-ram was to support direct-io of the migration stream in qemu, in which case it could be tied to VIR_DOMAIN_SAVE_BYPASS_CACHE. E.g. the mapped-ram capability is enabled when the user specifies VIR_DOMAIN_SAVE_BYPASS_CACHE && the user-provided path results in a seekable fd && qemu supports mapped-ram?
One option is to be lazy and add a /etc/libvirt/qemu.conf setting for the save format version, defaulting to the latest v3. Release note that admin/host provisioning apps must set it to v2 if back compat is needed with old libvirt. If we assume new -> old save image loading is relatively rare, that's probably good enough.
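E.g. something like this (the setting name is invented here for illustration):

# /etc/libvirt/qemu.conf
#
# Version used when creating new save images. Set to 2 if images must
# remain loadable by older libvirt that lacks v3/mapped-ram support.
#save_image_version = 3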
IOW, we can
* Bump the save version to 3
* Use v3 by default
* Add a SAVE_PARALLEL flag which implies mapped-ram; reject it if v2
* Use mapped RAM with BYPASS_CACHE for v3, the old approach for v2
* Steal another unused field to indicate use of mapped-ram, or perhaps future-proof it by declaring a 'features' field (rough sketch below), so we don't need to bump the version again, just make sure that the libvirt loading an image supports all set features
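The 'features' idea could be as simple as this (names invented here):

#include <stdint.h>

/* hypothetical feature bits for a v3 header */
#define QEMU_SAVE_FEATURE_MAPPED_RAM  (1u << 0)

#define QEMU_SAVE_SUPPORTED_FEATURES  QEMU_SAVE_FEATURE_MAPPED_RAM

/* On load, refuse any image with a feature bit we don't recognize;
 * new optional features then never require another version bump. */
static int
check_save_features(uint32_t features)
{
    if (features & ~QEMU_SAVE_SUPPORTED_FEATURES)
        return -1;  /* image requires a newer libvirt */
    return 0;
}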
This sounds like a reasonable start. Thanks for the feedback. Regards, Jim
Looking ahead, should the mapped-ram capability be required for supporting the VIR_DOMAIN_SAVE_PARALLEL flag? As I understand it, parallel save/restore was another motivation for creating the mapped-ram feature; it allows multifd threads to write exclusively to the offsets provided by mapped-ram. Can multiple multifd threads concurrently write to an fd without mapped-ram?
Yes, mapped-ram should be a pre-requisite.
With regards, Daniel