On 4/11/22 8:53 PM, Dr. David Alan Gilbert wrote:
* Claudio Fontana (cfontana(a)suse.de) wrote:
> On 4/7/22 3:57 PM, Claudio Fontana wrote:
>> On 4/7/22 3:53 PM, Dr. David Alan Gilbert wrote:
>>> * Claudio Fontana (cfontana(a)suse.de) wrote:
>>>> On 4/5/22 10:35 AM, Dr. David Alan Gilbert wrote:
>>>>> * Claudio Fontana (cfontana(a)suse.de) wrote:
>>>>>> On 3/28/22 10:31 AM, Daniel P. Berrangé wrote:
>>>>>>> On Sat, Mar 26, 2022 at 04:49:46PM +0100, Claudio Fontana wrote:
>>>>>>>> On 3/25/22 12:29 PM, Daniel P. Berrangé wrote:
>>>>>>>>> On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
>>>>>>>>>> On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
>>>>>>>>>>> * Claudio Fontana (cfontana(a)suse.de) wrote:
>>>>>>>>>>>> On 3/17/22 2:41 PM, Claudio Fontana wrote:
>>>>>>>>>>>>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
>>>>>>>>>>>>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
>>>>>>>>>>>>>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>>>>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>>>>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>>>>>>>>> the first user is the qemu driver,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> This improves the situation by 400%.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Going through io_helper still seems to incur some penalty (~15%-ish)
>>>>>>>>>>>>>>>>>>>> compared with direct qemu migration to a nc socket to a file.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Signed-off-by: Claudio Fontana <cfontana(a)suse.de>
>>>>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
>>>>>>>>>>>>>>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
>>>>>>>>>>>>>>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
>>>>>>>>>>>>>>>>>>>>  src/util/virfile.h        |  1 +
>>>>>>>>>>>>>>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hello, I initially thought this to be a qemu performance issue,
>>>>>>>>>>>>>>>>>>>> so you can find the discussion about this in qemu-devel:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Current results show the experimental average maximum throughput
>>>>>>>>>>>>>>> migrating to /dev/null for each FdWrapper Pipe Size (as per QEMU
>>>>>>>>>>>>>>> QMP "query-migrate"; tests repeated 5 times for each).
>>>>>>>>>>>>>>> VM Size is 60G, with most of the memory effectively touched before
>>>>>>>>>>>>>>> migration, through a user application allocating and touching all
>>>>>>>>>>>>>>> memory with pseudorandom data.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 64K: 5200 Mbps (current situation)
>>>>>>>>>>>>>>> 128K: 5800 Mbps
>>>>>>>>>>>>>>> 256K: 20900 Mbps
>>>>>>>>>>>>>>> 512K: 21600 Mbps
>>>>>>>>>>>>>>> 1M: 22800 Mbps
>>>>>>>>>>>>>>> 2M: 22800 Mbps
>>>>>>>>>>>>>>> 4M: 22400 Mbps
>>>>>>>>>>>>>>> 8M: 22500 Mbps
>>>>>>>>>>>>>>> 16M: 22800 Mbps
>>>>>>>>>>>>>>> 32M: 22900 Mbps
>>>>>>>>>>>>>>> 64M: 22900 Mbps
>>>>>>>>>>>>>>> 128M: 22800 Mbps
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The above is the throughput out of patched libvirt with multiple
>>>>>>>>>>>>>>> Pipe Sizes for the FDWrapper.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ok, it's bouncing around with noise after 1 MB. So I'd suggest that
>>>>>>>>>>>>>> libvirt attempt to raise the pipe limit to 1 MB by default, but
>>>>>>>>>>>>>> not try to go higher.
>>>>>>>>>>>>>>
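For reference, raising the pipe buffer boils down to an fcntl() call; a minimal sketch (the helper name and the hard-coded 1 MiB target below are illustrative, not necessarily what the src/util/virfile.c change does):

#define _GNU_SOURCE          /* for F_SETPIPE_SZ */
#include <fcntl.h>
#include <stdio.h>

/* Illustrative helper: try to grow a pipe's buffer to 1 MiB.
 * The kernel may refuse (EPERM) if the request exceeds
 * /proc/sys/fs/pipe-max-size for unprivileged callers, in which
 * case the default 64k is kept. */
static int
examplePipeSizeRaise(int fd)
{
    const int desired = 1024 * 1024;   /* 1 MiB, per the measurements above */
    int actual = fcntl(fd, F_SETPIPE_SZ, desired);
    if (actual < 0) {
        perror("fcntl(F_SETPIPE_SZ)");
        return -1;
    }
    return actual;   /* kernel rounds up to a power-of-two number of pages */
}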
>>>>>>>>>>>>>>> As for the theoretical limit for the libvirt architecture,
>>>>>>>>>>>>>>> I ran a qemu migration directly, issuing the appropriate QMP
>>>>>>>>>>>>>>> commands, setting the same migration parameters as per libvirt,
>>>>>>>>>>>>>>> and then migrating to a socket netcatted to /dev/null via
>>>>>>>>>>>>>>> {"execute": "migrate", "arguments": { "uri": "unix:///tmp/netcat.sock" } } :
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> QMP: 37000 Mbps
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> So although the Pipe size improves things (in particular the
>>>>>>>>>>>>>>> large jump is for the 256K size, although 1M seems a very good value),
>>>>>>>>>>>>>>> there is still a second bottleneck in there somewhere that
>>>>>>>>>>>>>>> accounts for a loss of ~14200 Mbps in throughput.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Interesting addition: I tested quickly on a system with faster cpus
>>>>>>>>>>>> and larger VM sizes, up to 200GB, and the difference in throughput
>>>>>>>>>>>> libvirt vs qemu is basically the same ~14500 Mbps.
>>>>>>>>>>>>
>>>>>>>>>>>> ~50000 mbps qemu to netcat socket to /dev/null
>>>>>>>>>>>> ~35500 mbps virsh save to /dev/null
>>>>>>>>>>>>
>>>>>>>>>>>> It does not seem to be proportional to cpu speed by the looks of it
>>>>>>>>>>>> (not a totally fair comparison because the VM sizes are different).
>>>>>>>>>>>
>>>>>>>>>>> It might be closer to RAM or cache bandwidth limited though; for an extra copy.
>>>>>>>>>>
>>>>>>>>>> I was thinking about sendfile(2) in iohelper, but that probably
>>>>>>>>>> can't work as the input fd is a socket, I am getting EINVAL.
>>>>>>>>>
>>>>>>>>> Yep, sendfile() requires the input to be a mmapable FD,
>>>>>>>>> and the output to be a socket.
>>>>>>>>>
>>>>>>>>> Try splice() instead which merely requires 1 end to be a
>>>>>>>>> pipe, and the other end can be any FD afaik.
>>>>>>>>>
>>>>>>>>
>>>>>>>> I did try splice(), but performance is worse by around 500%.
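For context, a splice()-based copy loop for the iohelper would look roughly like this (a sketch only, not the code that was actually benchmarked; the scratch pipe is the obvious way to satisfy the one-end-must-be-a-pipe rule, and the 1 MiB chunk size is an assumption):

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Sketch of a splice() based copy: 'infd' is the migration socket,
 * 'outfd' the save file.  Every splice() call needs one end to be a
 * pipe, so a scratch pipe sits in the middle. */
static int
exampleSpliceCopy(int infd, int outfd)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;
    fcntl(p[1], F_SETPIPE_SZ, 1024 * 1024);     /* best effort, see above */

    for (;;) {
        ssize_t got = splice(infd, NULL, p[1], NULL, 1024 * 1024, SPLICE_F_MOVE);
        if (got == 0)
            break;                              /* EOF */
        if (got < 0)
            goto error;
        while (got > 0) {
            ssize_t put = splice(p[0], NULL, outfd, NULL, got, SPLICE_F_MOVE);
            if (put <= 0)
                goto error;
            got -= put;
        }
    }
    close(p[0]); close(p[1]);
    return 0;

 error:
    close(p[0]); close(p[1]);
    return -1;
}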
>>>>>>>
>>>>>>> Hmm, that's certainly unexpected !
>>>>>>>
>>>>>>>> Any ideas welcome,
>>>>>>>
>>>>>>> I learnt there is also a newer copy_file_range call, not sure if that's
>>>>>>> any better.
>>>>>>>
>>>>>>> You passed len as 1 MB, I wonder if passing MAXINT is viable ? We just
>>>>>>> want to copy everything IIRC.
>>>>>>>
>>>>>>> With regards,
>>>>>>> Daniel
>>>>>>>
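On the copy_file_range() idea, a sketch of such a loop (hypothetical helper; note that copy_file_range() generally wants regular files on both ends, so it may not apply when the source is the migration socket):

#define _GNU_SOURCE
#include <unistd.h>
#include <limits.h>

/* Sketch: keep calling copy_file_range() until EOF, asking for as much
 * as possible per call instead of a fixed 1 MB; the kernel copies what
 * it can and returns the amount, so looping is still required. */
static int
exampleCfrCopy(int infd, int outfd)
{
    for (;;) {
        ssize_t done = copy_file_range(infd, NULL, outfd, NULL, SSIZE_MAX, 0);
        if (done == 0)
            return 0;       /* EOF */
        if (done < 0)
            return -1;      /* e.g. EINVAL when an fd is not a regular file */
    }
}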
>>>>>>
>>>>>> Crazy idea: would trying to use the parallel migration concept for
>>>>>> migrating to/from a file make any sense?
>>>>>>
>>>>>> Not sure if the qemu multifd implementation would apply directly; maybe
>>>>>> it could be given another implementation for "toFile", trying to use
>>>>>> more than one cpu to do the transfer?
>>>>>
>>>>> I can't see a way that would help; well, I could if you could
>>>>> somehow have multiple io helper threads that dealt with it.
>>>>
>>>> The first issue I encounter here for both the "virsh save" and
>>>> "virsh restore" scenarios is that libvirt uses fd: migration, not
>>>> unix: migration.
>>>> QEMU supports multifd for unix:, tcp:, vsock: as far as I can see.
>>>>
>>>> Current save procedure in QMP in short:
>>>>
>>>> {"execute":"migrate-set-capabilities", ...}
>>>> {"execute":"migrate-set-parameters", ...}
>>>> {"execute":"getfd","arguments":{"fdname":"migrate"}, ...}  fd=26
>>>> QEMU_MONITOR_IO_SEND_FD: fd=26
>>>> {"execute":"migrate","arguments":{"uri":"fd:migrate"}, ...}
>>>>
>>>>
>>>> Current restore procedure in QMP in short:
>>>>
>>>> (start QEMU)
>>>> {"execute":"migrate-incoming","arguments":{"uri":"fd:21"}, ...}
>>>>
>>>>
>>>> Should I investigate changing libvirt to use unix: for save/restore?
>>>> Or should I look into changing qemu to somehow accept fd: for multifd,
>>>> meaning I guess providing multiple fd: uris in the migrate command?
>>>
>>> So I'm not sure this is the right direction; i.e. if multifd is the
>>> right answer to your problem.
>>
>> Of course, just exploring the space.
>
>
> I have some progress on multifd if we can call it so:
>
> I wrote a simple program that sets up a unix socket,
> listens for N_CHANNELS + 1 connections there, sets up multifd parameters,
> and runs the migration, spawning threads for each incoming connection from
> QEMU, creating a file to use to store the migration data coming from qemu
> (optionally using O_DIRECT).
>
> This program plays the role of an "iohelper"-like thing, basically just
> copying things over, making O_DIRECT possible.
>
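A simplified sketch of what that listener side can look like, just to make the flow concrete (the socket path, N_CHANNELS value and file naming are made up, error handling is minimal, and the QMP driving of the migration is omitted):

#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define N_CHANNELS 4    /* made-up value; + 1 for the main migration channel */

/* One thread per connection from QEMU: drain the stream into its own file.
 * (Enabling O_DIRECT would additionally require aligned buffers and sizes.) */
static void *drain_channel(void *arg)
{
    int conn = (int)(intptr_t)arg;
    static int next_id;
    int id = __sync_fetch_and_add(&next_id, 1);
    char path[64];
    char buf[256 * 1024];

    snprintf(path, sizeof(path), "/var/tmp/mig-chan-%d", id);
    int out = open(path, O_WRONLY | O_CREAT | O_TRUNC /* | O_DIRECT */, 0600);
    if (out < 0) {
        perror("open");
        close(conn);
        return NULL;
    }
    for (;;) {
        ssize_t got = read(conn, buf, sizeof(buf));
        if (got <= 0)
            break;
        for (ssize_t off = 0; off < got; ) {
            ssize_t put = write(out, buf + off, got - off);
            if (put < 0)
                goto done;
            off += put;
        }
    }
 done:
    close(out);
    close(conn);
    return NULL;
}

int main(void)
{
    struct sockaddr_un addr = { .sun_family = AF_UNIX,
                                .sun_path = "/var/tmp/multifd.sock" };
    pthread_t tids[N_CHANNELS + 1];
    int sock = socket(AF_UNIX, SOCK_STREAM, 0);

    unlink(addr.sun_path);
    bind(sock, (struct sockaddr *)&addr, sizeof(addr));
    listen(sock, N_CHANNELS + 1);

    /* ...at this point the real tool sets the multifd capability and
     * parameters over QMP and issues "migrate" with the unix: uri... */

    for (int i = 0; i < N_CHANNELS + 1; i++) {
        int conn = accept(sock, NULL, NULL);
        pthread_create(&tids[i], NULL, drain_channel, (void *)(intptr_t)conn);
    }
    for (int i = 0; i < N_CHANNELS + 1; i++)
        pthread_join(tids[i], NULL);
    return 0;
}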
> I save the data streams to multiple files; this works, though for the actual
> results I will have to move to a better hardware setup (enterprise nvme +
> fast cpu, under various memory configurations).
>
> The intuition would be that if we have enough cpus to spare (no libvirt in
> the picture for now, as mentioned), say, the same 4 cpus already allocated
> for the VM to run, we can use those cpus (now "free" since we suspended the
> guest) to compress each multifd channel (multifd-zstd? multifd-zlib?), thus
> reducing the amount of data that needs to go to disk.
Yes possibly; you have an advantage over normal migration, in that your
vCPUs are stopped.

Indeed, it seems to help immensely in the save vm case, cutting down the full
transfer cost (including sync).
In my experiment though the data is 90G generated via random(), so it likely
contains too many repeated patterns, and the effectiveness will likely depend
a lot on how much we can compress.
> Work in progress...
>
>>
>>> However, I think the qemu code probably really really wants to be a
>>> socket.
>>
>> Understood, I'll try to bend libvirt to use unix:/// and see how far I get,
>>
>> Thanks,
>>
>> Claudio
>>
>>>
>>> Dave
>>>
>>>>
>>>> Thank you for your help,
>>>>
>>>> Claudio
>>>>
>>
>