Re: [libvirt RFCv11 00/33] multifd save restore prototype

On 11/27/23 11:18, Daniel P. Berrangé wrote:
On Mon, Nov 27, 2023 at 10:43:58AM +0100, Claudio Fontana wrote:
Hi all,
I understand there has been some movement in this topic as the fixed-offset ram and multifd code evolves.
I think I understood that now the idea is to pass from libvirt to QEMU two file descriptors, one for writing metadata, and a separate one for the actual memory pages, which is the one that can potentially be O_DIRECT.
We determined that O_DIRECT changes propagate across dup'd file descriptors, so we have only two choices
* 1 FD, and QEMU has to take care to toggle O_DIRECT on/off repeatedly depending on what phase it is in
* 2 FDs, one with and one without O_DIRECT
Either is viable for libvirt. I have a mild preference for having 1 FD, but not enough to call it a design blocker. So it is at the discretion of whoever implements the QEMU part.
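To make the dup() point concrete, here is a minimal sketch of why the two options above are the only ones. File status flags live on the open file description, which dup'd descriptors share, so a flag set through one descriptor is visible through the other. The helper names are made up for illustration and are not QEMU or libvirt functions; the check takes the flag as a parameter so it can be tried with O_APPEND on filesystems that reject O_DIRECT.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc only exposes O_DIRECT with this defined */
#endif
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: turn one file status flag (O_DIRECT, O_APPEND, ...)
 * on or off for fd. Returns 0 on success, -1 on error (errno from fcntl). */
static int set_status_flag(int fd, int flag, int enable)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    flags = enable ? (flags | flag) : (flags & ~flag);
    return fcntl(fd, F_SETFL, flags) < 0 ? -1 : 0;
}

/* Hypothetical check: returns 1 if setting `flag` on fd shows up on a
 * dup() of fd, 0 if it does not, -1 on error. The original flags of fd
 * are restored before returning. */
static int flag_shared_with_dup(int fd, int flag)
{
    int result = -1;
    int orig;
    int copy;

    assert(fd >= 0);

    orig = fcntl(fd, F_GETFL);
    if (orig < 0)
        return -1;

    copy = dup(fd);
    if (copy < 0)
        return -1;

    if (set_status_flag(fd, flag, 1) == 0) {
        int seen = fcntl(copy, F_GETFL);

        if (seen >= 0)
            result = (seen & flag) ? 1 : 0;
        /* Put the shared flags back the way we found them. */
        (void)fcntl(fd, F_SETFL, orig);
    }

    close(copy);
    return result;
}
```

With a single FD, QEMU would call something like set_status_flag(fd, O_DIRECT, 1) before the page-aligned memory writes and set_status_flag(fd, O_DIRECT, 0) before unaligned metadata writes. With two FDs, each one must come from its own open() call, because a dup() of the first would share the flag.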
1) I would assume that libvirt would then need to check if the user requested --parallel / --parallel-connections to enable QEMU multifd.
Yes
2) I would also assume that libvirt would check the presence of --bypass-cache as the condition to set O_DIRECT on the second (memory pages fd), and to enable QEMU "io-direct" feature.
Yes
3) I would tentatively suggest that when it comes to fixed-ram-offset, the condition to enable that one is a check like the one currently in libvirt:
src/util/virfile.c::virFileDiskCopy()
i.e. checking that we are writing to a seekable file that is not S_ISBLK.
Does this match your understanding/reasoning?
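The condition in point 3 could look roughly like the sketch below. It illustrates the kind of check described (seekable, and not a block device) and is not a copy of what virFileDiskCopy() does; both function names are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: true if fd refers to something we can seek on
 * that is not a block device. */
static bool fd_is_seekable_non_block(int fd)
{
    struct stat sb;

    assert(fd >= 0);

    if (fstat(fd, &sb) < 0)
        return false;

    /* Block devices are excluded explicitly, even though they seek fine. */
    if (S_ISBLK(sb.st_mode))
        return false;

    /* lseek() fails with ESPIPE on pipes, FIFOs and sockets. */
    return lseek(fd, 0, SEEK_CUR) != (off_t)-1;
}

/* Hypothetical convenience wrapper for an existing path. */
static bool path_is_seekable_non_block(const char *path)
{
    bool ok;
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return false;
    ok = fd_is_seekable_non_block(fd);
    close(fd);
    return ok;
}
```

A regular file passes this check. A pipe (for example when the save stream goes through a helper process) or a block device does not, and the save would fall back to the old streaming format.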
Both the io-direct and fixed-ram-offset features are dependent on new QEMU impls, so there is a mild backwards compatibility concern.
i.e., let's say we are running QEMU 9.0.0, but with the old machine type pc-i440fx-8.0.0, and we save the state, but then want to restore with QEMU 8.2.0.
Essentially we must NOT use io-direct/fixed-ram-offset if we want the ability to migrate to older QEMU. At the same time I would like us to be able to take advantage of this new QEMU support to the greatest extent possible, even if not doing the --parallel stuff which was the original motivator.
Thus, we need some way to decide whether to use the new or the old on-disk format.
I wonder if having a setting in '/etc/libvirt/qemu.conf' is sufficient, or whether we must expose a flag via the API.
With regards, Daniel
Thanks Daniel,
that's an interesting point. The new fixed-ram-offset format is a QEMU format, and as such I presume that in theory there is a
qemu_saveimage.h:#define QEMU_SAVE_VERSION 2
that could be bumped to 3 if this new format is used?
But then again, libvirt would need to decide whether to save in "old QEMU compatibility mode" or in the new QEMU_SAVE_VERSION 3 mode that allows for fixed-ram-offset.
Maybe a new libvirt option for controlling which QEMU_SAVE_VERSION format to use for the save, with the default being v2 for backward compatibility reasons?
Thanks, Claudio

On Mon, Nov 27, 2023 at 11:40:29AM +0100, Claudio Fontana wrote:
What makes me reluctant to add public API is that generally I feel that things like file format versions are internal/private impl details that mgmt apps should not need to care about.
Even the compatibility problem should only be a short-term issue, since there are only limited version combinations that end up getting used in practice, and so-called "backwards" version compat usage is even rarer.
I guess if we stick with a qemu.conf setting initially, we can revisit later if we find an API control knob is required.
With regards, Daniel

On 11/27/23 11:50, Daniel P. Berrangé wrote:
I think qemu.conf also works; I'd still default to v2 for a while. We can document that, in order to get these additional features, setting v3 in qemu.conf is required to save in the new format.
Thanks, Claudio
Participants (2): Claudio Fontana, Daniel P. Berrangé