On Wed, Oct 11, 2023 at 04:56:12PM +0200, Claudio Fontana wrote:
On 10/11/23 16:05, Daniel P. Berrangé wrote:
>
> Instead of using 'getfd' though we have to use 'add-fd'.
>
> Anyway, this lets us do FD passing as normal, while also
> letting us specify the offset.
>
> {"execute": "add-fd", "arguments": {"fdset-id":"migrate"}}
> {"execute": "migrate", "arguments": {"detach":true,"blk":false,"inc":false,"uri":"file:/dev/fdset/migrate,offset=124456"}}
>
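For reference, the add-fd plus migrate sequence quoted above could be driven from a client roughly as sketched here. This is only an illustration under assumptions: the helper names (qmp_command, send_add_fd, migrate_to_fdset) are made up, the QMP greeting and the qmp_capabilities handshake are left out, and replies are not read. It also uses a numeric fdset id, since QEMU's QAPI schema declares 'fdset-id' as an integer, where the quoted example uses the placeholder "migrate". The descriptor itself travels as SCM_RIGHTS ancillary data on the UNIX socket.

```python
import json
import os
import socket


def qmp_command(name, arguments):
    """Encode one QMP command as a JSON line (hypothetical helper)."""
    return (json.dumps({"execute": name, "arguments": arguments}) + "\r\n").encode()


def send_add_fd(sock, fd, fdset_id):
    """Send 'add-fd' with the descriptor attached as SCM_RIGHTS data.

    Requires Python 3.9+ for socket.send_fds().
    """
    payload = qmp_command("add-fd", {"fdset-id": fdset_id})
    socket.send_fds(sock, [payload], [fd])


def migrate_to_fdset(sock, fdset_id, offset):
    """Ask for a migration to the fdset, starting at the given file offset."""
    uri = f"file:/dev/fdset/{fdset_id},offset={offset}"
    args = {"detach": True, "blk": False, "inc": False, "uri": uri}
    sock.sendall(qmp_command("migrate", args))
```

The caller keeps its own copy of the descriptor after send_add_fd(), which is what makes the "both" ownership model possible.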
>> Internally, the QEMU multifd code just reads and writes using pread, pwrite, so
>> there is in any case just one fd to worry about,
>> but who should own it, libvirt or QEMU?
>
> How about both :-)
I need to familiarize myself with this a bit; there are pieces I am missing. Can you
correct me here?
OPTION 1)
libvirt opens the file and has the FD, writes the header, marks the offset,
then we dup the FD in libvirt for the benefit of QEMU, optionally setting "O_DIRECT"
  on the dup (the usual case) depending on --bypass-cache,
pass the duped FD to QEMU,
QEMU does all the pread/pwrite on it with the correct offset (since it knows it from the
  file:// URI optional offset parameter),
then libvirt closes the duped fd,
libvirt rewrites the header using the original fd (needed to update the metadata),
libvirt closes the original fd
OPTION 2)
libvirt opens the file and has the FD, writes the header, marks the offset,
then we pass the FD to QEMU,
QEMU dups the FD and sets it as "O_DIRECT" depending on a passed parameter,
QEMU does all the pread/pwrite on it with the correct offset (since it knows it from the
  file:// URI optional offset parameter),
QEMU closes the duped FD,
libvirt rewrites the header using the original fd (needed to update the metadata),
libvirt closes the original fd
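To make the two options easier to compare, here is a rough sketch of the OPTION 1 flow as seen from the libvirt side. It is not libvirt or QEMU code: the header layout (HEADER_FMT), the helper names, and the in-process pwrite that stands in for QEMU are all assumptions for illustration. One detail worth keeping in mind for either option: descriptors created with dup() share a single open file description, so file status flags changed with F_SETFL (O_DIRECT included) on one descriptor are seen through the other as well. Using pread/pwrite with explicit offsets at least avoids depending on the shared file position.

```python
import os
import struct

# Hypothetical header: 8-byte magic followed by a 64-bit payload length.
HEADER_FMT = "<8sQ"
HEADER_SIZE = struct.calcsize(HEADER_FMT)  # 16 bytes


def write_header(fd, payload_len):
    """(Re)write the header at offset 0 without moving the file position."""
    os.pwrite(fd, struct.pack(HEADER_FMT, b"SAVEIMG\0", payload_len), 0)


def save_option1(path, payload):
    """Walk through the OPTION 1 steps; returns the offset handed to 'QEMU'."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        write_header(fd, 0)       # placeholder header, length not yet known
        offset = HEADER_SIZE      # "marks the offset"
        qemu_fd = os.dup(fd)      # the descriptor that would be passed to QEMU
        try:
            # Stand-in for QEMU: write the stream at the agreed offset.
            written = os.pwrite(qemu_fd, payload, offset)
        finally:
            os.close(qemu_fd)     # close the duped fd
        write_header(fd, written) # update metadata through the original fd
    finally:
        os.close(fd)              # close the original fd
    return offset
```

OPTION 2 would differ only in which process calls dup() and close() on the second descriptor.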
I don't remember if the QEMU changes for the file offsets optimization are already
"block friendly", i.e. whether they operate correctly regardless of whether O_DIRECT
is set or not.
I think so; they were designed with O_DIRECT in mind.
The 'file' protocol as it exists currently is not O_DIRECT
capable. It is not writing aligned buffers to aligned offsets
in the file. It is still running the regular old migration
stream format over the file, not taking advantage of it being
random access.
What's needed is the followup "fixed ram" format adaptation.
Use of that format should imply O_DIRECT, so in fact we
don't need an explicit 'bypass_cache' parameter in QAPI,
just a way to ask for the 'fixed ram' format.
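As a small illustration of the alignment point, the sketch below checks and rounds up a file offset. The 4096 and 512 byte alignments are assumptions only; the real O_DIRECT requirement depends on the filesystem and the underlying device. The helper names are made up.

```python
def is_aligned(value, alignment):
    """True if value is a multiple of alignment."""
    return value % alignment == 0


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


# The offset used in the QMP example earlier in the thread:
offset = 124456
print(is_aligned(offset, 4096))  # False
print(align_up(offset, 4096))    # 126976
print(align_up(offset, 512))     # 124928
```

So an offset such as 124456 would have to be padded up before RAM pages could be written there with O_DIRECT.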
So I would tend to see OPTION 1) as more attractive, as QEMU does not
need to care about another parameter; whatever has been chosen in libvirt
in terms of bypass cache is handled in libvirt.
The 'fixed ram' format will only take care of I/O for the
main RAM blocks which are nicely aligned and can be written
to aligned file offsets. The general device vmstate I/O
probably can't be assumed to be aligned. While we could
futz around with QEMUFile so that it bounce-buffers vmstate
to an aligned region and flushes it in page-sized chunks,
that's probably too much of a pain.
IOW, I think what QEMU would likely want to
do is:
1. qemu_open -> get a FD *without* O_DIRECT set
2. write some vmstate stuff
3. turn on O_DIRECT
4. write RAM in fixed locations
5. turn off O_DIRECT
6. write remaining vmstate
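The six steps above can be sketched as follows. This is a Linux-only illustration under assumptions, not QEMU code: the function names and the file layout (vmstate at offset 0, RAM at the next page boundary, remaining vmstate after RAM) are invented, short writes are ignored for brevity, and the code falls back to buffered I/O when the filesystem refuses O_DIRECT. An anonymous mmap is used only to obtain a page-aligned buffer.

```python
import errno
import fcntl
import mmap
import os

PAGE = mmap.PAGESIZE


def set_o_direct(fd, enable):
    """Toggle O_DIRECT with F_SETFL; return False if the kernel refuses."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    flags = (flags | os.O_DIRECT) if enable else (flags & ~os.O_DIRECT)
    try:
        fcntl.fcntl(fd, fcntl.F_SETFL, flags)
    except OSError:
        return False
    return True


def write_image(path, vmstate_head, ram, vmstate_tail):
    """Write vmstate buffered and RAM with O_DIRECT; return the RAM offset."""
    assert len(ram) > 0 and len(ram) % PAGE == 0
    ram_offset = (len(vmstate_head) + PAGE - 1) // PAGE * PAGE
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)  # step 1
    buf = mmap.mmap(-1, len(ram))  # anonymous mapping, page-aligned
    try:
        os.pwrite(fd, vmstate_head, 0)                  # step 2
        buf.write(ram)
        direct = set_o_direct(fd, True)                 # step 3
        try:
            os.pwrite(fd, buf, ram_offset)              # step 4
        except OSError as e:
            if not direct or e.errno != errno.EINVAL:
                raise
            set_o_direct(fd, False)                     # fall back to buffered
            os.pwrite(fd, buf, ram_offset)
        set_o_direct(fd, False)                         # step 5
        os.pwrite(fd, vmstate_tail, ram_offset + len(ram))  # step 6
    finally:
        buf.close()
        os.close(fd)
    return ram_offset
```

Only the RAM write needs an aligned buffer, offset, and length here; the vmstate writes go through the page cache as usual.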
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|