Hi Daniel,
thanks for your answer,
On 10/11/23 16:05, Daniel P. Berrangé wrote:
On Wed, Oct 11, 2023 at 03:46:59PM +0200, Claudio Fontana wrote:
> In terms of our use case, we would need to trigger these migrations from virsh save,
> restore, managedsave / start.
>
> 1) Can you confirm this is still a good target?
IIRC the 'dump' command also has a codepath that can exercise
the migrate-to-file logic too.
> It would seem right from my perspective to hook up save/restore first, and then reuse
> the same mechanism for managedsave / start.
All of save, restore, managedsave, start, dump end up calling
into the same internal helper methods. So once you update these
helpers, you essentially get all the commands converted in one
go.
ok
> 2) Do we expect to pass filename or file descriptor from libvirt into QEMU?
>
>
> As is, libvirt today generally passes an already opened file descriptor to QEMU for
> migrations, roughly:
>
> {"execute": "getfd", "arguments": {"fdname":"migrate"}} (passing the already open fd from libvirt, e.g. 10)
> {"execute": "migrate", "arguments": {"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"}}
>
> Do we want to change libvirt to migrate to a file: URI? Does this have consequences
> for "labeling" / security sandboxing?
>
> Or would it be better to continue opening the fd in libvirt, writing the libvirt
> header, and then passing the existing open fd to QEMU, using QMP command "getfd",
> followed by "migrate"? In this second case we would need to inform QEMU of
> the offset into the already open fd.
How about both :-)
The current migration 'fd' protocol technically can cope with
any type of FD being passed on. QEMU doesn't try to interpret
the FD type to any significant degree.
The 'file' protocol is explicitly providing a migration transport
supporting random access I/O to storage. As such we can specify
the offset too.
Now the neat trick is that 'file' protocol impl uses
qio_channel_file and this in turn uses qemu_open,
which supports FD passing.
Interesting!
Instead of using 'getfd' though we have to use 'add-fd'.
Anyway, this lets us do FD passing as normal, while also
letting us specify the offset.
{"execute": "add-fd", "arguments": {"fdset-id":"migrate"}}
{"execute": "migrate", "arguments": {"detach":true,"blk":false,"inc":false,"uri":"file:/dev/fdset/migrate,offset=124456"}}
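
As a rough illustration of the 'add-fd' plus 'file:' URI sequence, here is a minimal Python sketch that only builds the two QMP commands as JSON text. The helper name build_migrate_to_file_cmds is hypothetical. It assumes a numeric fdset id in place of the "migrate" placeholder, since as far as I know QEMU's 'add-fd' takes an integer 'fdset-id' and the /dev/fdset/ path is followed by that number. Passing the descriptor itself alongside 'add-fd' (as ancillary data on the monitor socket) is not shown.

```python
import json


def build_migrate_to_file_cmds(fdset_id: int, offset: int) -> list[str]:
    """Build the add-fd and migrate QMP commands (hypothetical helper).

    Assumption: fdset_id is numeric and offset is the byte position
    right after the libvirt header in the save file.
    """
    add_fd = {"execute": "add-fd", "arguments": {"fdset-id": fdset_id}}
    migrate = {
        "execute": "migrate",
        "arguments": {
            "detach": True,
            "blk": False,
            "inc": False,
            # 'file' transport pointed at the fd set, with the data offset
            "uri": f"file:/dev/fdset/{fdset_id},offset={offset}",
        },
    }
    return [json.dumps(add_fd), json.dumps(migrate)]


if __name__ == "__main__":
    for line in build_migrate_to_file_cmds(1, 124456):
        print(line)
```

The offset is carried in the URI, so the fd can be handed over at any file position.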
> Internally, the QEMU multifd code just reads and writes using pread, pwrite, so there
> is in any case just one fd to worry about,
> but who should own it, libvirt or QEMU?
How about both :-)
I need to familiarize myself a bit with this, there are pieces I am missing. Can you
correct me here?
OPTION 1)
libvirt opens the file and has the FD, writes the header, marks the offset,
then we dup the FD in libvirt for the benefit of QEMU, optionally set the flags of the dup
to "O_DIRECT" (the usual case) depending on --bypass-cache,
pass the duped FD to QEMU,
QEMU does all the pread/pwrite on it with the correct offset (since it knows it from the
file:// URI optional offset parameter),
then libvirt closes the duped fd
libvirt rewrites the header using the original fd (needed to update the metadata),
libvirt closes the original fd
OPTION 2)
libvirt opens the file and has the FD, writes the header, marks the offset,
then we pass the FD to QEMU,
QEMU dups the FD and sets it as "O_DIRECT" depending on a passed parameter,
QEMU does all the pread/pwrite on it with the correct offset (since it knows it from the
file:// URI optional offset parameter),
QEMU closes the duped FD,
libvirt rewrites the header using the original fd (needed to update the metadata),
libvirt closes the original fd
I don't remember whether the QEMU changes for the file offsets optimization are already
"block friendly", i.e. whether they operate correctly whatever the state of O_DIRECT or
~O_DIRECT.
I think so; they were designed with O_DIRECT in mind.
So I would tend to see OPTION 1) as more attractive as QEMU does not need to care about
another parameter, whatever has been chosen in libvirt in terms of bypass cache is handled
in libvirt.
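
To make the OPTION 1 sequence concrete, here is a minimal Python sketch of the fd handling only. Everything in it is an assumption for illustration: HEADER_SIZE, the header contents and the function name save_option1 are hypothetical, and a plain pwrite stands in for QEMU's writes. The O_DIRECT step on the dup is left out, since it brings alignment requirements that would obscure the flow.

```python
import os
import tempfile

HEADER_SIZE = 4096  # hypothetical header size; doubles as the data offset


def save_option1(path: str, payload: bytes) -> None:
    """Sketch of OPTION 1: libvirt owns the fd and hands QEMU a dup."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # libvirt writes a provisional header and marks the offset
        os.pwrite(fd, b"HDR:incomplete".ljust(HEADER_SIZE, b"\0"), 0)
        offset = HEADER_SIZE

        # libvirt dups the fd for the benefit of QEMU
        qemu_fd = os.dup(fd)
        try:
            # stand-in for QEMU: positioned write starting at the offset
            os.pwrite(qemu_fd, payload, offset)
        finally:
            # the duped fd is closed once the migration is done
            os.close(qemu_fd)

        # libvirt rewrites the header using the original fd
        os.pwrite(fd, b"HDR:complete".ljust(HEADER_SIZE, b"\0"), 0)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "save.img")
        save_option1(target, b"vmstate")
        print(os.path.getsize(target))
```

Because only pread/pwrite-style positioned I/O is used, neither side depends on the shared file position of the two descriptors.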
Please correct my understanding where needed, thanks!
Claudio
Libvirt will open the file, in order to write its header.
Then libvirt passes the open FD to QEMU, specifying the
offset, and QEMU does its thing with vmstate, etc and
closes the FD when it's done. libvirt's copy of the FD
is still open, and libvirt can finalize its header and
close the FD.
> 3) How do we deal with O_DIRECT? In the prototype we were setting the O_DIRECT on the
> fd from libvirt in response to the user request for --bypass-cache,
> which is needed 99% of the time with large VMs. I think I remember that we plan to
> write from libvirt normally (without O_DIRECT) and then set the flag later,
> but should libvirt or QEMU set the O_DIRECT flag? This likely depends on who owns the
> fd?
For O_DIRECT, the 'file' protocol should gain a new parameter
'bypass_cache: bool'. If this is set to 'true' then QEMU can
set O_DIRECT on the FD it opens or receives from libvirt.
Libvirt probably just has to be careful to unset O_DIRECT
at the end before it finalizes the header.
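
A small Python sketch of why the unset step matters, assuming Linux: file status flags such as O_DIRECT belong to the open file description, so a duplicated or passed descriptor and the original see the same flag. The helper names set_status_flag and has_status_flag are hypothetical, and some filesystems reject O_DIRECT, which the demo tolerates.

```python
import fcntl
import os
import tempfile


def set_status_flag(fd: int, flag: int, enable: bool) -> None:
    """Set or clear a file status flag with F_GETFL / F_SETFL."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    flags = (flags | flag) if enable else (flags & ~flag)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags)


def has_status_flag(fd: int, flag: int) -> bool:
    """Report whether the flag is currently set on the descriptor."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & flag)


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        dup_fd = os.dup(fd)  # stand-in for the descriptor QEMU receives
        try:
            # "QEMU" enables O_DIRECT on its descriptor
            set_status_flag(dup_fd, os.O_DIRECT, True)
            print("original sees O_DIRECT:", has_status_flag(fd, os.O_DIRECT))
            # "libvirt" clears it again before finalizing the header
            set_status_flag(fd, os.O_DIRECT, False)
        except OSError as err:
            print("filesystem rejected O_DIRECT:", err)
        finally:
            os.close(dup_fd)
```

With the flag cleared, the final header rewrite is not subject to the alignment constraints that O_DIRECT writes have.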
With regards,
Daniel