On 3/28/22 10:31 AM, Daniel P. Berrangé wrote:
> On Sat, Mar 26, 2022 at 04:49:46PM +0100, Claudio Fontana wrote:
>> On 3/25/22 12:29 PM, Daniel P. Berrangé wrote:
>>> On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
>>>> On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
>>>>> * Claudio Fontana (cfontana(a)suse.de) wrote:
>>>>>> On 3/17/22 2:41 PM, Claudio Fontana wrote:
>>>>>>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
>>>>>>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
>>>>>>>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
>>>>>>>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>>>> the first user is the qemu driver,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This improves the situation by 400%.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Going through the iohelper still seems to incur some penalty (~15%-ish)
>>>>>>>>>>>>>> compared with direct qemu migration to an nc socket to a file.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Signed-off-by: Claudio Fontana <cfontana(a)suse.de>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>  src/qemu/qemu_driver.c    |  6 +++---
>>>>>>>>>>>>>>  src/qemu/qemu_saveimage.c | 11 ++++++-----
>>>>>>>>>>>>>>  src/util/virfile.c        | 12 ++++++++++++
>>>>>>>>>>>>>>  src/util/virfile.h        |  1 +
>>>>>>>>>>>>>>  4 files changed, 22 insertions(+), 8 deletions(-)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello, I initially thought this to be a qemu performance issue,
>>>>>>>>>>>>>> so you can find the discussion about this in qemu-devel:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
>>>>>>>>
>>>>>>>>
>>>>>>>>> Current results show the experimental average maximum throughput
>>>>>>>>> migrating to /dev/null for each FdWrapper pipe size (as per QEMU QMP
>>>>>>>>> "query-migrate"; tests repeated 5 times for each).
>>>>>>>>> VM size is 60G, with most of the memory effectively touched before
>>>>>>>>> migration, through a user application allocating and touching all
>>>>>>>>> memory with pseudorandom data.
>>>>>>>>>
>>>>>>>>> 64K: 5200 Mbps (current situation)
>>>>>>>>> 128K: 5800 Mbps
>>>>>>>>> 256K: 20900 Mbps
>>>>>>>>> 512K: 21600 Mbps
>>>>>>>>> 1M: 22800 Mbps
>>>>>>>>> 2M: 22800 Mbps
>>>>>>>>> 4M: 22400 Mbps
>>>>>>>>> 8M: 22500 Mbps
>>>>>>>>> 16M: 22800 Mbps
>>>>>>>>> 32M: 22900 Mbps
>>>>>>>>> 64M: 22900 Mbps
>>>>>>>>> 128M: 22800 Mbps
>>>>>>>>>
>>>>>>>>> The above is the throughput out of patched libvirt with multiple
>>>>>>>>> pipe sizes for the FdWrapper.
>>>>>>>>
>>>>>>>> Ok, it's bouncing around with noise after 1 MB. So I'd suggest that
>>>>>>>> libvirt attempt to raise the pipe limit to 1 MB by default, but
>>>>>>>> not try to go higher.
>>>>>>>>
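For reference, a minimal sketch of what enlarging the pipe buffer looks like (the helper name and the 1 MB default here are illustrative, not the exact code in the patch):

#define _GNU_SOURCE
#include <fcntl.h>

/* Hypothetical helper, not the actual patch: try to grow a pipe's
 * buffer to 'size' bytes.  F_SETPIPE_SZ is Linux-specific; the kernel
 * rounds the value up to a power of two, and unprivileged processes
 * are capped by /proc/sys/fs/pipe-max-size, so failure need not be
 * fatal -- the pipe simply keeps its default 64k buffer. */
static int
set_pipe_size(int pipefd, int size)
{
    int actual = fcntl(pipefd, F_SETPIPE_SZ, size);
    if (actual < 0)
        return -1;        /* e.g. EPERM when above pipe-max-size */
    return actual;        /* the capacity actually granted */
}

/* e.g. set_pipe_size(fd, 1024 * 1024) before starting the transfer */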
>>>>>>>>> As for the theoretical limit for the libvirt architecture,
>>>>>>>>> I ran a qemu migration directly, issuing the appropriate QMP
>>>>>>>>> commands, setting the same migration parameters as per libvirt,
>>>>>>>>> and then migrating to a socket netcatted to /dev/null via
>>>>>>>>> {"execute": "migrate", "arguments": { "uri": "unix:///tmp/netcat.sock" } } :
>>>>>>>>>
>>>>>>>>> QMP: 37000 Mbps
>>>>>>>>
>>>>>>>>> So although the pipe size improves things (in particular the
>>>>>>>>> large jump is at the 256K size, though 1M seems a very good value),
>>>>>>>>> there is still a second bottleneck in there somewhere that
>>>>>>>>> accounts for a loss of ~14200 Mbps in throughput.
>>>>>>
>>>>>>
>>>>>> Interesting addition: I tested quickly on a system with faster cpus
>>>>>> and larger VM sizes, up to 200GB, and the difference in throughput
>>>>>> between libvirt and qemu is basically the same, ~14500 Mbps:
>>>>>>
>>>>>> ~50000 Mbps qemu to netcat socket to /dev/null
>>>>>> ~35500 Mbps virsh save to /dev/null
>>>>>>
>>>>>> By the looks of it, it does not seem proportional to cpu speed (not
>>>>>> a totally fair comparison because the VM sizes are different).
>>>>>
>>>>> It might be closer to RAM or cache bandwidth limited though, given the
>>>>> extra copy.
>>>>
>>>> I was thinking about sendfile(2) in the iohelper, but that probably
>>>> can't work, as the input fd is a socket; I am getting EINVAL.
>>>
>>> Yep, sendfile() requires the input to be a mmapable FD,
>>> and the output to be a socket.
>>>
>>> Try splice() instead, which merely requires one end to be a
>>> pipe; the other end can be any FD afaik.
>>>
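For context, a rough sketch of the kind of splice() loop being discussed (not the actual iohelper code; the 1 MB chunk size and the flags are illustrative):

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Drain 'pipefd' (a pipe) into 'outfd' without a userspace buffer.
 * splice() only needs one end to be a pipe; the other end can be a
 * regular file, socket, etc.  Returns 0 on EOF, -1 on error. */
static int
splice_pipe_to_fd(int pipefd, int outfd)
{
    for (;;) {
        ssize_t n = splice(pipefd, NULL, outfd, NULL,
                           1024 * 1024, SPLICE_F_MOVE | SPLICE_F_MORE);
        if (n == 0)
            return 0;     /* writer closed the pipe */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;    /* e.g. EINVAL if neither fd is a pipe */
        }
    }
}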
>>
>> I did try splice(), but performance is worse by around 500%.
>
> Hmm, that's certainly unexpected!
>
>> Any ideas welcome,
>
> I learnt there is also a newer copy_file_range call, not sure if that's
> any better.
>
> You passed len as 1 MB; I wonder if passing MAXINT is viable? We just
> want to copy everything IIRC.
>
> With regards,
> Daniel
>
Hi Daniel, I also tried up to 64MB; no improvement with splice.
I took a look at copy_file_range: it fails with EINVAL, since according to
the man pages it needs both fds to refer to regular files.
All these alternatives to the read/write API seem very situational...
It would be cool if there were an API that does the best thing to minimize
copies with whatever FDs it is passed, avoiding the need for a userspace
buffer, but it seems like there isn't one?
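Purely to illustrate how situational these calls are (a sketch, not libvirt code): the "try copy_file_range(), fall back to plain read/write when it cannot handle the fds" pattern ends up looking something like this, with an arbitrary fallback buffer size:

#define _GNU_SOURCE
#include <errno.h>
#include <unistd.h>

/* Copy up to 'len' bytes from infd to outfd.  copy_file_range() wants
 * both fds to be regular files, so with a pipe or socket on either end
 * it fails and we drop back to a buffered copy.
 * Returns bytes copied, 0 on EOF, -1 on error. */
static ssize_t
copy_chunk(int infd, int outfd, size_t len)
{
    ssize_t n = copy_file_range(infd, NULL, outfd, NULL, len, 0);
    if (n >= 0 || (errno != EINVAL && errno != EXDEV))
        return n;

    /* Fallback: ordinary read/write through a userspace buffer. */
    char buf[64 * 1024];
    if (len > sizeof(buf))
        len = sizeof(buf);
    if ((n = read(infd, buf, len)) <= 0)
        return n;
    for (ssize_t done = 0; done < n; ) {
        ssize_t w = write(outfd, buf + done, n - done);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        done += w;
    }
    return n;
}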
Ciao,
Claudio