On 10/11/23 17:29, Daniel P. Berrangé wrote:
On Wed, Oct 11, 2023 at 04:56:12PM +0200, Claudio Fontana wrote:
>
> On 10/11/23 16:05, Daniel P. Berrangé wrote:
>>
>> Instead of using 'getfd' though we have to use 'add-fd'.
>>
>> Anyway, this lets us do FD passing as normal, while also
>> letting us specify the offset.
>>
>> {"execute": "add-fd", "arguments": {"fdset-id":"migrate"}}
>> {"execute": "migrate", "arguments": {"detach":true,"blk":false,"inc":false,"uri":"file:/dev/fdset/migrate,offset=124456"}}
Hi Daniel,
the "add-fd" is the part that I don't understand at all.
Should we actually pass an fd there, as with getfd, already opened on the savevm file?
Something like this, in pseudocode:
virsh qemu-monitor-command --pass-fds 10 --cmd='{"execute": "add-fd", "arguments": {"fdset-id":10}}'
?
Should we use "opaque" instead of "fdset-id" if you actually want to
set it to "migrate"? And how would we reference it later?
virsh qemu-monitor-command --cmd='{"execute": "migrate", "arguments": {"detach":true,"blk":false,"inc":false,"uri":"file:/dev/fdset/migrate,offset=124456"}}'
?
Using "opaque" does not seem to give me a reachable /dev/fdset/migrate, though.
I can currently trigger the migration to the URI file:/mnt/nvme/savevm, and that
works fine; it's the file:/dev/fdset part that I am still unable to glue together.
Thanks for any idea,
Claudio
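On the add-fd question, a minimal sketch of the underlying mechanism as I understand it: the descriptor never appears in the JSON at all. The client attaches it as SCM_RIGHTS ancillary data to the message carrying the add-fd (or getfd) command on the monitor's UNIX socket, which is what virsh's --pass-fds arranges. In the C below a socketpair stands in for the monitor socket, and send_with_fd()/recv_with_fd() are hypothetical helper names. My reading of the QAPI schema (worth double-checking) is that "fdset-id" is an optional integer and "opaque" is only a free-form descriptive string, and that the add-fd reply reports the numeric fdset-id, which is then referenced as /dev/fdset/<fdset-id>.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control-message buffer sized and aligned for exactly one int (one fd). */
union fd_cmsg {
    char data[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send the text `msg` on the connected UNIX socket `sock` and attach `fd`
 * as SCM_RIGHTS ancillary data; the kernel installs a new descriptor for
 * the same open file in the receiver. Returns 0 on success, -1 on error. */
static int send_with_fd(int sock, const char *msg, int fd)
{
    struct iovec iov = { .iov_base = (void *)msg, .iov_len = strlen(msg) };
    union fd_cmsg ctrl;
    struct msghdr mh;
    struct cmsghdr *cm;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&mh, 0, sizeof(mh));
    mh.msg_iov = &iov;
    mh.msg_iovlen = 1;
    mh.msg_control = ctrl.data;
    mh.msg_controllen = sizeof(ctrl.data);

    cm = CMSG_FIRSTHDR(&mh);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &fd, sizeof(int));

    return sendmsg(sock, &mh, 0) < 0 ? -1 : 0;
}

/* Receive up to len - 1 bytes of text into buf (NUL-terminated) and return
 * the fd that arrived with it, or -1 if none was attached or on error. */
static int recv_with_fd(int sock, char *buf, size_t len)
{
    struct iovec iov;
    union fd_cmsg ctrl;
    struct msghdr mh;
    struct cmsghdr *cm;
    ssize_t n;
    int fd = -1;

    assert(len > 1);
    iov.iov_base = buf;
    iov.iov_len = len - 1;
    memset(&mh, 0, sizeof(mh));
    mh.msg_iov = &iov;
    mh.msg_iovlen = 1;
    mh.msg_control = ctrl.data;
    mh.msg_controllen = sizeof(ctrl.data);

    n = recvmsg(sock, &mh, 0);
    if (n < 0)
        return -1;
    buf[n] = '\0';

    for (cm = CMSG_FIRSTHDR(&mh); cm != NULL; cm = CMSG_NXTHDR(&mh, cm)) {
        if (cm->cmsg_level == SOL_SOCKET && cm->cmsg_type == SCM_RIGHTS)
            memcpy(&fd, CMSG_DATA(cm), sizeof(int));
    }
    return fd;
}
```

The receiver gets its own descriptor number, but it refers to the same open file description, so the file offset and status flags of the sender's copy travel with it.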
>>
>>> Internally, the QEMU multifd code just reads and writes using pread and pwrite,
>>> so in any case there is just one fd to worry about,
>>> but who should own it, libvirt or QEMU?
>>
>> How about both :-)
>
> I need to familiarize myself a bit with this; there are pieces I am missing.
> Can you correct me here?
>
> OPTION 1)
>
> libvirt opens the file and has the FD, writes the header, marks the offset,
> then we dup the FD in libvirt for the benefit of QEMU, optionally set the flags of
> the dup to "O_DIRECT" (the usual case) depending on --bypass-cache,
> pass the duped FD to QEMU,
> QEMU does all the pread/pwrite on it with the correct offset (since it knows it from
> the file:// URI optional offset parameter),
> then libvirt closes the duped fd
> libvirt rewrites the header using the original fd (needed to update the metadata),
> libvirt closes the original fd
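To check the reading of OPTION 1, a minimal C sketch of the ownership sequence, with a plain pwrite() standing in for QEMU's I/O and the fd passing itself left out. save_option1(), fake_qemu_save() and the string "header" are hypothetical illustrations, not libvirt or QEMU code, and the optional O_DIRECT step is omitted.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Stand-in for QEMU: write the payload at the offset it was given in the
 * file: URI, never touching the header region before it. */
static int fake_qemu_save(int fd, const char *payload, off_t offset)
{
    size_t len = strlen(payload);
    return pwrite(fd, payload, len, offset) == (ssize_t)len ? 0 : -1;
}

/* OPTION 1: libvirt owns the file; QEMU only ever sees a dup. */
static int save_option1(const char *path, const char *hdr,
                        const char *final_hdr, const char *payload,
                        off_t *data_off)
{
    size_t hlen = strlen(hdr);
    int fd, qemu_fd, ret = -1;

    /* The rewritten header must not spill into the data region. */
    assert(strlen(final_hdr) == hlen);

    if ((fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600)) < 0)
        return -1;
    if (write(fd, hdr, hlen) != (ssize_t)hlen)      /* write the header */
        goto out;
    *data_off = (off_t)hlen;                        /* mark the offset */

    if ((qemu_fd = dup(fd)) < 0)                    /* dup for QEMU */
        goto out;
    if (fake_qemu_save(qemu_fd, payload, *data_off) < 0) {
        close(qemu_fd);
        goto out;
    }
    close(qemu_fd);                                 /* close the dup */

    if (pwrite(fd, final_hdr, hlen, 0) != (ssize_t)hlen) /* final header */
        goto out;
    ret = 0;
out:
    close(fd);                                      /* close the original */
    return ret;
}
```

Because QEMU's copy is closed before libvirt rewrites the header, the two never write concurrently, and pwrite() leaves the shared file offset untouched.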
>
>
> OPTION 2)
>
> libvirt opens the file and has the FD, writes the header, marks the offset,
> then we pass the FD to QEMU,
> QEMU dups the FD and sets O_DIRECT on it depending on a passed
> parameter,
> QEMU does all the pread/pwrite on it with the correct offset (since it knows it from
> the file:// URI optional offset parameter),
> QEMU closes the duped FD,
> libvirt rewrites the header using the original fd (needed to update the metadata),
> libvirt closes the original fd
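One detail that affects both options: dup() does not give the copy its own flags. The original and the duplicate share one open file description, hence one file offset and one set of file status flags, and O_DIRECT is a status flag changed with fcntl(F_SETFL). So setting O_DIRECT on the dup would also put libvirt's original fd into O_DIRECT mode, and the later unaligned header rewrite through it could then fail with EINVAL; only a separate open() of the file gives an independent description. The sketch below demonstrates this with O_APPEND, another status flag, because not every filesystem accepts O_DIRECT; add_status_flag() and has_status_flag() are hypothetical helpers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* OR `flag` into the file status flags of `fd`. On Linux, F_SETFL can only
 * change O_APPEND, O_ASYNC, O_DIRECT, O_NOATIME and O_NONBLOCK. */
static int add_status_flag(int fd, int flag)
{
    int fl = fcntl(fd, F_GETFL);
    if (fl < 0)
        return -1;
    return fcntl(fd, F_SETFL, fl | flag);
}

/* Report whether `flag` is currently set in the status flags of `fd`. */
static bool has_status_flag(int fd, int flag)
{
    int fl = fcntl(fd, F_GETFL);
    assert(fl >= 0);
    return (fl & flag) != 0;
}
```

Setting the flag on the dup and then querying the original shows it set on both; a fresh open() of the same path does not see it.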
>
>
> I don't remember whether the QEMU changes for the file offsets optimization are
> already "block friendly", i.e. whether they operate correctly regardless of
> whether O_DIRECT is set or not.
> I think so; they were designed with O_DIRECT in mind.
The 'file' protocol as it exists currently is not O_DIRECT
capable. It is not writing aligned buffers to aligned offsets
in the file. It is still running the regular old migration
stream format over the file, not taking advantage of it being
random access.
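For reference, what "aligned buffers to aligned offsets" means in practice: on Linux, O_DIRECT generally requires the buffer address, the transfer length and the file offset all to be multiples of the logical block size (commonly 512 or 4096 bytes; the exact rule depends on the filesystem and device). A minimal C check, with hypothetical helper names and 4096 assumed as the alignment:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>

/* Allocate `len` bytes whose address is a multiple of `align`. */
static void *alloc_aligned(size_t len, size_t align)
{
    void *p = NULL;
    return posix_memalign(&p, align, len) == 0 ? p : NULL;
}

/* True if writing `len` bytes from `buf` at file offset `off` satisfies an
 * O_DIRECT alignment requirement of `align` bytes. */
static bool odirect_ok(const void *buf, size_t len, off_t off, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
    assert(off >= 0);
    return (uintptr_t)buf % align == 0 &&
           len % align == 0 &&
           (uint64_t)off % align == 0;
}
```

A stream format that emits a small header followed by variable-sized records fails at least one of these checks on nearly every write.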
What's needed is the followup "fixed ram" format adaptation.
Use of that format should imply O_DIRECT, so in fact we
don't need an explicit 'bypass_cache' parameter in QAPI,
just a way to ask for the 'fixed ram' format.
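To illustrate why a fixed layout is what makes random access and O_DIRECT usable: if every RAM page has a predetermined, page-aligned slot in the file, pages can be written with pwrite() in any order and each write is naturally aligned. The sketch assumes a deliberately simplified layout (one RAM region starting at ram_base, page i at ram_base + i * page_size); it is not the actual "fixed ram" on-disk format, and the function names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Create/truncate the image file. Buffered I/O here; O_DIRECT could be
 * added once every write is known to be aligned. */
static int open_image(const char *path)
{
    return open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
}

/* Hypothetical layout: page `idx` always lives at ram_base + idx * page_size. */
static off_t page_file_offset(off_t ram_base, uint64_t idx, size_t page_size)
{
    assert(ram_base % (off_t)page_size == 0); /* region starts page-aligned */
    return ram_base + (off_t)(idx * page_size);
}

/* Write one page to its fixed slot. The offset depends only on the page
 * index, so pages may be written in any order or from several threads. */
static int write_page(int fd, const void *page, uint64_t idx,
                      off_t ram_base, size_t page_size)
{
    off_t off = page_file_offset(ram_base, idx, page_size);
    ssize_t n = pwrite(fd, page, page_size, off);
    return n == (ssize_t)page_size ? 0 : -1;
}
```

Pages that are never written simply leave holes in the file, which read back as zeros.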
> So I would tend to see OPTION 1) as more attractive, as QEMU does not need to care
> about another parameter: whatever has been chosen in libvirt in terms of bypass
> cache is handled in libvirt.
The 'fixed ram' format will only take care of I/O for the
main RAM blocks which are nicely aligned and can be written
to aligned file offsets. The general device vmstate I/O
probably can't be assumed to be aligned. While we could
futz around with QEMUFile so that it bounce buffers vmstate
to an aligned region and flushes it in page sized chunks
that's probably too much of a pain.
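For a sense of what that bounce-buffer approach would involve, a minimal C sketch: arbitrary-length vmstate writes are copied into a page-sized, page-aligned buffer that is only ever written out as a full chunk at an aligned file offset, with the final partial chunk zero-padded (so the real length would have to be recorded elsewhere). struct bounce, its functions and the 4096-byte chunk size are assumptions for illustration, not QEMUFile code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BOUNCE_SIZE 4096 /* assumed alignment and chunk size */

struct bounce {
    int fd;
    off_t file_off;     /* aligned offset of the next chunk */
    size_t used;        /* bytes currently buffered */
    unsigned char *buf; /* BOUNCE_SIZE-aligned allocation */
};

static int bounce_init(struct bounce *b, int fd, off_t start)
{
    void *p = NULL;
    assert(start % BOUNCE_SIZE == 0);
    if (posix_memalign(&p, BOUNCE_SIZE, BOUNCE_SIZE) != 0)
        return -1;
    b->fd = fd;
    b->file_off = start;
    b->used = 0;
    b->buf = p;
    return 0;
}

/* Write the full buffer as one aligned chunk and advance. */
static int bounce_flush_chunk(struct bounce *b)
{
    if (pwrite(b->fd, b->buf, BOUNCE_SIZE, b->file_off) != BOUNCE_SIZE)
        return -1;
    b->file_off += BOUNCE_SIZE;
    b->used = 0;
    return 0;
}

/* Accept any length; emit only complete, aligned chunks. */
static int bounce_write(struct bounce *b, const void *data, size_t len)
{
    const unsigned char *src = data;
    while (len > 0) {
        size_t n = BOUNCE_SIZE - b->used;
        if (n > len)
            n = len;
        memcpy(b->buf + b->used, src, n);
        b->used += n;
        src += n;
        len -= n;
        if (b->used == BOUNCE_SIZE && bounce_flush_chunk(b) < 0)
            return -1;
    }
    return 0;
}

/* Zero-pad and flush the last partial chunk, then free the buffer. */
static int bounce_finish(struct bounce *b)
{
    int ret = 0;
    if (b->used > 0) {
        memset(b->buf + b->used, 0, BOUNCE_SIZE - b->used);
        ret = bounce_flush_chunk(b);
    }
    free(b->buf);
    b->buf = NULL;
    return ret;
}
```

The copying and padding on every vmstate write is the overhead that makes this less attractive than simply leaving O_DIRECT off for the vmstate portion.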
IOW, I actually think what QEMU would likely want to
do is:
1. qemu_open -> get a FD *without* O_DIRECT set
2. write some vmstate stuff
3. turn on O_DIRECT
4. write RAM in fixed locations
5. turn off O_DIRECT
6. write remaining vmstate
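The six steps above map fairly directly onto fcntl(). A minimal C sketch, assuming Linux (where F_SETFL may toggle O_DIRECT on an open fd), a 4096-byte alignment, and a hypothetical layout of [vmstate head][zero padding up to 4096][RAM][vmstate tail]; set_direct() and save_sequence() are illustrative names, not QEMU functions. It falls back to buffered I/O if the filesystem refuses O_DIRECT with EINVAL.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define DIO_ALIGN 4096 /* assumed O_DIRECT alignment */

/* Turn O_DIRECT on or off for an already-open fd (steps 3 and 5). */
static int set_direct(int fd, bool on)
{
    int fl = fcntl(fd, F_GETFL);
    if (fl < 0)
        return -1;
    fl = on ? (fl | O_DIRECT) : (fl & ~O_DIRECT);
    return fcntl(fd, F_SETFL, fl);
}

/* Steps 2-6 on an fd opened without O_DIRECT (step 1). `ram` must be
 * DIO_ALIGN-aligned and ram_len a multiple of DIO_ALIGN. Returns the end
 * offset of the tail, or -1; *direct_used says whether O_DIRECT was on. */
static off_t save_sequence(int fd, const char *head, const void *ram,
                           size_t ram_len, const char *tail, bool *direct_used)
{
    size_t hlen = strlen(head), tlen = strlen(tail);
    off_t ram_off = ((off_t)hlen + DIO_ALIGN - 1) / DIO_ALIGN * DIO_ALIGN;
    off_t tail_off = ram_off + (off_t)ram_len;

    assert((uintptr_t)ram % DIO_ALIGN == 0 && ram_len % DIO_ALIGN == 0);

    /* 2. unaligned vmstate through the page cache */
    if (pwrite(fd, head, hlen, 0) != (ssize_t)hlen)
        return -1;

    /* 3. turn on O_DIRECT; tolerate filesystems that reject it */
    *direct_used = (set_direct(fd, true) == 0);
    if (!*direct_used && errno != EINVAL)
        return -1;

    /* 4. RAM from an aligned buffer to an aligned offset */
    if (pwrite(fd, ram, ram_len, ram_off) != (ssize_t)ram_len)
        return -1;

    /* 5. back to buffered I/O */
    if (*direct_used && set_direct(fd, false) < 0)
        return -1;

    /* 6. remaining, unaligned vmstate */
    if (pwrite(fd, tail, tlen, tail_off) != (ssize_t)tlen)
        return -1;
    return tail_off + (off_t)tlen;
}
```

Keeping a single fd and toggling the flag avoids the shared-description surprises of dup(), at the cost of the toggle being visible to anyone else holding a duplicate of that fd.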
With regards,
Daniel