
On 4/26/24 4:04 AM, Daniel P. Berrangé wrote:
On Wed, Apr 17, 2024 at 05:12:27PM -0600, Jim Fehlig via Devel wrote:
A good starting point on this journey is supporting the new mapped-ram capability in qemu 9.0 [2]. Since mapped-ram is a new on-disk format, I assume we'll need a new QEMU_SAVE_VERSION 3 when using it? Otherwise I'm not sure how to detect if a saved image is in mapped-ram format vs the existing, sequential stream format.
Yes, we'll need to be supporting 'mapped-ram', so a good first step.
A question is whether we make that feature mandatory for all save images, or implied by another feature (parallel save), or a directly controllable feature with opt-in.
The former breaks back compat with existing libvirt, while the latter 2 options are net new so don't have compat implications.
In terms of actual data blocks written on disk, mapped-ram should be the same size, or smaller, than the existing format.
In terms of logical file size, however, mapped-ram will almost always be larger.
This is because mapped-ram will result in a file whose logical size matches the guest RAM size, plus some header overhead, while being sparse so not all blocks are written.
If tools handling save images aren't sparse-aware this could come across as a surprise and even be considered a regression.
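To make the logical size vs. written blocks distinction concrete, here is a small sketch. It does not use QEMU at all; it creates a sparse file whose logical size equals a made-up guest RAM size and writes only a few "dirty" pages. GUEST_RAM_BYTES, the page list and the file name are assumptions for illustration only.

```python
import os
import tempfile

GUEST_RAM_BYTES = 64 * 1024 * 1024   # hypothetical guest RAM size
PAGE = 4096

def make_sparse_image(path, ram_bytes, dirty_pages):
    """Create a file of logical size ram_bytes, writing only dirty_pages."""
    with open(path, "wb") as f:
        f.truncate(ram_bytes)            # sets the logical size only
        for page in dirty_pages:
            f.seek(page * PAGE)
            f.write(b"\xaa" * PAGE)

def sizes(path):
    """Return (logical size, bytes actually allocated) for path."""
    st = os.stat(path)
    # st_blocks is counted in 512-byte units on Linux
    return st.st_size, st.st_blocks * 512

with tempfile.TemporaryDirectory() as d:
    image = os.path.join(d, "guest.sav")
    make_sparse_image(image, GUEST_RAM_BYTES, dirty_pages=[0, 100, 9000])
    logical, allocated = sizes(image)
    print(f"logical={logical} allocated={allocated}")
```

On a filesystem with sparse file support the allocated figure should come out far below the logical one, which is the gap a non-sparse-aware tool (e.g. a naive copy) would fill in with zeroes.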
Mapped ram is needed for parallel saves since it lets each thread write to a specific region of the file.
Mapped ram is good for non-parallel saves too though, because the mapping of RAM into the file is aligned suitably to allow for O_DIRECT to be used. Currently libvirt has to tunnel over its iohelper to futz with the alignment needed for O_DIRECT. This makes it desirable to use in general, but back compat hurts...
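A rough sketch of the alignment idea, not QEMU's actual file layout: if the RAM region starts at an offset rounded up to the direct-I/O alignment, every page then lands at an aligned offset. The 4096-byte alignment, the header length and the function names are all assumptions for illustration.

```python
DIRECT_IO_ALIGN = 4096   # assumed O_DIRECT alignment requirement
PAGE_SIZE = 4096         # assumed guest page size
HEADER_BYTES = 1234      # hypothetical, deliberately unaligned header length

def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)

def page_offset(page_index):
    """File offset of a guest page when the RAM region starts aligned."""
    ram_start = align_up(HEADER_BYTES, DIRECT_IO_ALIGN)
    return ram_start + page_index * PAGE_SIZE

print(page_offset(0), page_offset(3))
```

With the sequential stream format there is no such fixed mapping, which is why something like the iohelper has to fix up alignment today.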
Looking at what we did in the past
First time, we stole an element from 'uint32_t unused[..]' in the save header, to add the 'compressed' field, and bumped the version. This prevented old libvirt reading the files. This was needed as adding compression was a non-backwards compatible change. We could have carried on using version 1 for non-compressed files, but we didn't for some reason. It was a hard compat break.
Hmm, libvirt's implementation of compression seems to conflict with mapped-ram. AFAIK, mapped-ram requires a seekable fd. Should the two be mutually exclusive?
Next time, we stole an element from 'uint32_t unused[..]' in the save header, to add the 'cookie_len' field, but did NOT bump the version. 'unused' is always all zeroes, so new libvirt could detect whether the cookie was present by the len being non-zero. Old libvirt would still load the image, but would be ignoring the cookie data. This was largely harmless.
This time mapped-ram is a non-compatible change, so we need to ensure old libvirt won't try to read the files, which suggests either a save version bump, or we could abuse the 'compressed' field to indicate 'mapped-ram' as a form of compression.
If we did a save version bump, we might want to carry on using v2 for non mapped ram.
IIUC, mapped-ram cannot be used with the existing 'fd:' migration URI and instead must use 'file:'. Does qemu advertise support for that? I couldn't find it. If not, 'file:' (available in qemu 8.2) predates mapped-ram, so in theory we could live without the advertisement.
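The two compat mechanisms described above (a version bump that old readers reject, vs. a field stolen from 'unused' that old readers silently ignore) can be sketched like this. The layout, magic string and field order here are a simplified stand-in, NOT libvirt's real save header; only the field names 'compressed' and 'cookie_len' come from the discussion.

```python
import struct

# Hypothetical simplified layout, NOT libvirt's actual header:
# 16-byte magic, then version, data_len, was_running, compressed,
# cookie_len, followed by 15 spare uint32 slots (little-endian).
HEADER = struct.Struct("<16s5I15I")
MAGIC = b"ExampleSaveMagic"      # made-up 16-byte magic
SUPPORTED_VERSION = 2            # what this pretend reader understands

def pack_header(version, data_len, compressed=0, cookie_len=0):
    """Build a header; the spare slots are always written as zeroes."""
    return HEADER.pack(MAGIC, version, data_len, 1, compressed,
                       cookie_len, *([0] * 15))

def read_header(raw):
    """Parse a header, refusing images newer than we understand."""
    fields = HEADER.unpack(raw[:HEADER.size])
    magic, version, _data_len, _running, compressed, cookie_len = fields[:6]
    if magic != MAGIC:
        raise ValueError("not a save image")
    if version > SUPPORTED_VERSION:
        # Hard compat break: a bumped version stops old readers cold.
        raise ValueError(f"image version {version} is too new")
    # Soft extension: a zero in the stolen slot means "no cookie".
    return {"version": version, "compressed": compressed,
            "has_cookie": cookie_len != 0}

print(read_header(pack_header(2, 100, cookie_len=8)))
```

A reader written before 'cookie_len' existed would simply never look at that slot, which is why that change was harmless, while a v3 image trips the version check.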
'mapped-ram' is reported in QMP as a MigrationCapability, so I think we can probe for it directly.
Yes, it is exclusively for use with 'file:' protocol. If we want to use FD passing, then we can still do that with 'file:', by using QEMU's generic /dev/fdset/NNN approach we have with block devices.
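For the probing part, a minimal sketch of checking a 'query-migrate-capabilities' reply for 'mapped-ram' and building the matching 'migrate-set-capabilities' command. The sample reply below is hand-written for illustration, not captured from a real QEMU, and the helper names are made up; no QMP connection is opened here.

```python
import json

def supports_capability(reply, name):
    """reply is the parsed JSON result of 'query-migrate-capabilities'."""
    return any(cap.get("capability") == name
               for cap in reply.get("return", []))

def enable_capability_cmd(name):
    """Build the QMP command that turns a migration capability on."""
    return json.dumps({
        "execute": "migrate-set-capabilities",
        "arguments": {
            "capabilities": [{"capability": name, "state": True}],
        },
    })

# Hand-written, abbreviated sample reply (illustrative only).
SAMPLE_REPLY = json.loads("""
{"return": [
  {"capability": "xbzrle", "state": false},
  {"capability": "multifd", "state": false},
  {"capability": "mapped-ram", "state": false}
]}
""")

if supports_capability(SAMPLE_REPLY, "mapped-ram"):
    print(enable_capability_cmd("mapped-ram"))
```

An older QEMU would just not list the capability, so its absence from the reply is the "not supported" signal.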
It's also not clear when we want to enable the mapped-ram capability. Should it always be enabled if supported by the underlying qemu? One motivation for creating the mapped-ram was to support direct-io of the migration stream in qemu, in which case it could be tied to VIR_DOMAIN_SAVE_BYPASS_CACHE. E.g. the mapped-ram capability is enabled when user specifies VIR_DOMAIN_SAVE_BYPASS_CACHE && user-provided path results in a seekable fd && qemu supports mapped-ram?
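The condition proposed here could be sketched as follows. This is only one reading of the proposal, not existing libvirt logic: the flag value mirrors libvirt's public VIR_DOMAIN_SAVE_BYPASS_CACHE, while the helper names and the idea of passing QEMU capabilities as a set of strings are assumptions.

```python
import os
import tempfile

VIR_DOMAIN_SAVE_BYPASS_CACHE = 1 << 0   # mirrors libvirt's public flag value

def is_seekable(fd):
    """True if fd supports lseek (pipes and sockets do not)."""
    try:
        os.lseek(fd, 0, os.SEEK_CUR)
        return True
    except OSError:
        return False

def use_mapped_ram(flags, fd, qemu_caps):
    """BYPASS_CACHE requested, target fd seekable, QEMU has mapped-ram."""
    return (bool(flags & VIR_DOMAIN_SAVE_BYPASS_CACHE)
            and is_seekable(fd)
            and "mapped-ram" in qemu_caps)

with tempfile.TemporaryFile() as f:
    print(use_mapped_ram(VIR_DOMAIN_SAVE_BYPASS_CACHE, f.fileno(),
                         {"mapped-ram"}))
```

The seekable check is also where a compression pipeline would fall out, since piping through a compressor hands QEMU a non-seekable fd.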
One option is to be lazy and have a /etc/libvirt/qemu.conf for the save format version, defaulting to latest v3. Release note that admin/host provisioning apps must set it to v2 if back compat is needed with old libvirt. If we assume new -> old save image loading is relatively rare, that's probably good enough.
IOW, we can
* Bump save version to 3
* Use v3 by default
Using mapped-ram by default but not supporting compression would be a regression, right? E.g. 'virsh save vm-name /some/path' would suddenly fail if user's /etc/libvirt/qemu.conf contained 'save_image_format = "lzop"'.
Regards, Jim
* Add a SAVE_PARALLEL flag which implies mapped-ram, reject if v2
* Use mapped RAM with BYPASS_CACHE for v3, old approach for v2
* Steal another unused field to indicate use of mapped-ram, or perhaps future proof it by declaring a 'features' field. So we don't need to bump version again, just make sure that the libvirt loading an image supports all set features.
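The 'features' field idea in the last bullet boils down to a bitmask check at load time. A minimal sketch, with entirely hypothetical bit assignments and names:

```python
# Hypothetical bit assignments for a 'features' header field.
FEATURE_MAPPED_RAM = 1 << 0
FEATURE_FUTURE_THING = 1 << 1    # stands in for something added later

# What this pretend loader knows how to handle.
KNOWN_FEATURES = FEATURE_MAPPED_RAM

def unsupported_features(image_features, known=KNOWN_FEATURES):
    """Return the feature bits set in the image that the loader lacks."""
    return image_features & ~known

def can_load(image_features, known=KNOWN_FEATURES):
    """An image is loadable only if every set feature bit is understood."""
    return unsupported_features(image_features, known) == 0

print(can_load(FEATURE_MAPPED_RAM))
print(can_load(FEATURE_MAPPED_RAM | FEATURE_FUTURE_THING))
```

Since 'unused' has always been written as zeroes, existing v2 images would read back as "no features set" under such a scheme.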
Looking ahead, should the mapped-ram capability be required for supporting the VIR_DOMAIN_SAVE_PARALLEL flag? As I understand, parallel save/restore was another motivation for creating the mapped-ram feature. It allows multifd threads to write exclusively to the offsets provided by mapped-ram. Can multiple multifd threads concurrently write to an fd without mapped-ram?
Yes, mapped-ram should be a pre-requisite.
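To illustrate why fixed offsets make parallel writes safe: each worker below writes its pages with pwrite() at an offset derived from the page index, so no two writers touch the same region and no shared file position is involved. Python threads stand in for multifd channels here; this is not how QEMU implements it, and the sizes and names are made up.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 4096
NUM_PAGES = 64

def write_page(fd, index):
    """Write one page at its fixed offset; content encodes the index."""
    data = bytes([index % 256]) * PAGE_SIZE
    os.pwrite(fd, data, index * PAGE_SIZE)

def parallel_save(path, num_pages, workers=4):
    """Fill the file from several threads, each page at its own offset."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        os.ftruncate(fd, num_pages * PAGE_SIZE)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(lambda i: write_page(fd, i), range(num_pages)))
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "ram.img")
    parallel_save(path, NUM_PAGES)
    with open(path, "rb") as f:
        saved = f.read()
    print(len(saved))
```

With a sequential stream there is no equivalent: the position of each page in the file depends on everything written before it, so writers would have to be serialized.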
With regards, Daniel