Thanks Daniel,
On 3/25/22 11:33 AM, Daniel P. Berrangé wrote:
> On Fri, Mar 18, 2022 at 02:34:29PM +0100, Claudio Fontana wrote:
>> On 3/17/22 4:03 PM, Dr. David Alan Gilbert wrote:
>>> * Claudio Fontana (cfontana(a)suse.de) wrote:
>>>> On 3/17/22 2:41 PM, Claudio Fontana wrote:
>>>>> On 3/17/22 11:25 AM, Daniel P. Berrangé wrote:
>>>>>> On Thu, Mar 17, 2022 at 11:12:11AM +0100, Claudio Fontana wrote:
>>>>>>> On 3/16/22 1:17 PM, Claudio Fontana wrote:
>>>>>>>> On 3/14/22 6:48 PM, Daniel P. Berrangé wrote:
>>>>>>>>> On Mon, Mar 14, 2022 at 06:38:31PM +0100, Claudio Fontana wrote:
>>>>>>>>>> On 3/14/22 6:17 PM, Daniel P. Berrangé wrote:
>>>>>>>>>>> On Sat, Mar 12, 2022 at 05:30:01PM +0100, Claudio Fontana wrote:
>>>>>>>>>>>> the first user is the qemu driver,
>>>>>>>>>>>>
>>>>>>>>>>>> virsh save/resume would slow to a crawl with a default pipe size (64k).
>>>>>>>>>>>>
>>>>>>>>>>>> This improves the situation by 400%.
>>>>>>>>>>>>
>>>>>>>>>>>> Going through io_helper still seems to incur in some penalty (~15%-ish)
>>>>>>>>>>>> compared with direct qemu migration to a nc socket to a file.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Claudio Fontana <cfontana(a)suse.de>
>>>>>>>>>>>> ---
>>>>>>>>>>>> src/qemu/qemu_driver.c    |  6 +++---
>>>>>>>>>>>> src/qemu/qemu_saveimage.c | 11 ++++++-----
>>>>>>>>>>>> src/util/virfile.c        | 12 ++++++++++++
>>>>>>>>>>>> src/util/virfile.h        |  1 +
>>>>>>>>>>>> 4 files changed, 22 insertions(+), 8 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> Hello, I initially thought this to be a qemu performance issue,
>>>>>>>>>>>> so you can find the discussion about this in qemu-devel:
>>>>>>>>>>>>
>>>>>>>>>>>> "Re: bad virsh save /dev/null performance (600 MiB/s max)"
>>>>>>>>>>>>
>>>>>>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2022-03/msg03142.html
>>>>>>
>>>>>>
>>>>>>> Current results show these experimental averages maximum throughput
>>>>>>> migrating to /dev/null per each FdWrapper Pipe Size (as per QEMU QMP
>>>>>>> "query-migrate", tests repeated 5 times for each).
>>>>>>> VM Size is 60G, most of the memory effectively touched before migration,
>>>>>>> through user application allocating and touching all memory with
>>>>>>> pseudorandom data.
>>>>>>>
>>>>>>> 64K: 5200 Mbps (current situation)
>>>>>>> 128K: 5800 Mbps
>>>>>>> 256K: 20900 Mbps
>>>>>>> 512K: 21600 Mbps
>>>>>>> 1M: 22800 Mbps
>>>>>>> 2M: 22800 Mbps
>>>>>>> 4M: 22400 Mbps
>>>>>>> 8M: 22500 Mbps
>>>>>>> 16M: 22800 Mbps
>>>>>>> 32M: 22900 Mbps
>>>>>>> 64M: 22900 Mbps
>>>>>>> 128M: 22800 Mbps
>>>>>>>
>>>>>>> This above is the throughput out of patched libvirt with multiple Pipe Sizes for the FDWrapper.
>>>>>>
>>>>>> Ok, it's bouncing around with noise after 1 MB. So I'd suggest that
>>>>>> libvirt attempt to raise the pipe limit to 1 MB by default, but
>>>>>> not try to go higher.
>>>>>>
>>>>>>> As for the theoretical limit for the libvirt architecture,
>>>>>>> I ran a qemu migration directly issuing the appropriate QMP
>>>>>>> commands, setting the same migration parameters as per libvirt,
>>>>>>> and then migrating to a socket netcatted to /dev/null via
>>>>>>> {"execute": "migrate", "arguments": { "uri": "unix:///tmp/netcat.sock" } } :
>>>>>>>
>>>>>>> QMP: 37000 Mbps
>>>>>>
>>>>>>> So although the Pipe size improves things (in particular the
>>>>>>> large jump is for the 256K size, although 1M seems a very good value),
>>>>>>> there is still a second bottleneck in there somewhere that
>>>>>>> accounts for a loss of ~14200 Mbps in throughput.
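
For reference, the pipe-size change discussed above comes down to a single fcntl() call on Linux. What follows is only a minimal sketch in C, not the code from the patch: the helper name grow_pipe_best_effort() is made up for illustration, and it assumes the Linux-specific F_SETPIPE_SZ / F_GETPIPE_SZ commands. An unprivileged process cannot go beyond /proc/sys/fs/pipe-max-size, so the request is treated as best effort and the capacity actually granted is read back.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes F_SETPIPE_SZ / F_GETPIPE_SZ in glibc */
#endif
#include <fcntl.h>
#include <unistd.h>

/* Fallback for the case where <fcntl.h> was already included without
 * _GNU_SOURCE; these are the Linux values (F_LINUX_SPECIFIC_BASE + 7, + 8). */
#ifndef F_SETPIPE_SZ
#define F_SETPIPE_SZ 1031
#define F_GETPIPE_SZ 1032
#endif

/* Hypothetical helper (not a libvirt function): ask the kernel to resize the
 * buffer of the pipe behind `fd` to `wanted` bytes.  A failed request (for
 * example EPERM above pipe-max-size) is ignored on purpose, so the pipe just
 * keeps its current size.  Returns the capacity now in effect, or -1 if `fd`
 * is not a pipe. */
static int grow_pipe_best_effort(int fd, int wanted)
{
    (void)fcntl(fd, F_SETPIPE_SZ, wanted);
    return fcntl(fd, F_GETPIPE_SZ);
}
```

Calling grow_pipe_best_effort(pipefd[1], 1024 * 1024) on a fresh pipe would request the 1 MB size suggested above; the return value tells the caller what the kernel really allowed.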
>>>>
>>>>
>>>> Interesting addition: I tested quickly on a system with faster cpus and larger VM sizes, up to 200GB,
>>>> and the difference in throughput libvirt vs qemu is basically the same ~14500 Mbps.
>>>>
>>>> ~50000 Mbps qemu to netcat socket to /dev/null
>>>> ~35500 Mbps virsh save to /dev/null
>>>>
>>>> Seems it is not proportional to cpu speed by the looks of it (not a totally fair comparison because the VM sizes are different).
>>>
>>> It might be closer to RAM or cache bandwidth limited though; for an extra copy.
>>
>> I was thinking about sendfile(2) in iohelper, but that probably can't work as the input fd is a socket, I am getting EINVAL.
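
The sendfile(2) failure mentioned above can be reproduced outside of libvirt. This is a small sketch in C under stated assumptions, not iohelper code: the function name sendfile_from_socket_errno() is invented for illustration, and a socketpair stands in for the migration socket. On Linux the input descriptor of sendfile() has to support mmap-like operations, which a socket does not, so the call is expected to fail.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical reproducer: try sendfile() with a socket as the *input* fd.
 * Returns 0 if the call unexpectedly succeeded, otherwise the errno of the
 * first call that failed. */
static int sendfile_from_socket_errno(void)
{
    int sv[2];
    char path[] = "/tmp/sendfile-demo-XXXXXX";
    int out, err = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return errno;

    out = mkstemp(path);                 /* plain file as the output side */
    if (out < 0) {
        err = errno;
        close(sv[0]);
        close(sv[1]);
        return err;
    }
    unlink(path);                        /* file goes away once closed */

    /* Queue some data on the socket so a failure is not simply "no data". */
    if (write(sv[0], "data", 4) != 4)
        err = errno ? errno : EIO;
    else if (sendfile(out, sv[1], NULL, 4) < 0)
        err = errno;                     /* EINVAL is what the thread reports */

    close(out);
    close(sv[0]);
    close(sv[1]);
    return err;
}
```

splice(2) is the usual Linux alternative when one side is a pipe, but whether it would help here is not something this sketch tries to answer.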
>>
>> One thing that I noticed is:
>>
>> commit afe6e58aedcd5e27ea16184fed90b338569bd042
>> Author: Jiri Denemark <jdenemar(a)redhat.com>
>> Date: Mon Feb 6 14:40:48 2012 +0100
>>
>> util: Generalize virFileDirectFd
>>
>> virFileDirectFd was used for accessing files opened with O_DIRECT using
>> libvirt_iohelper. We will want to use the helper for accessing files
>> regardless on O_DIRECT and thus virFileDirectFd was generalized and
>> renamed to virFileWrapperFd.
>>
>>
>> And in particular the comment in src/util/virfile.c:
>>
>> /* XXX support posix_fadvise rather than O_DIRECT, if the kernel support
>> * for that is decent enough. In that case, we will also need to
>> * explicitly support VIR_FILE_WRAPPER_NON_BLOCKING since
>> * VIR_FILE_WRAPPER_BYPASS_CACHE alone will no longer require spawning
>> * iohelper.
>> */
>>
>> by Jiri Denemark.
>>
>> I have lots of questions here, and I tried to involve Jiri and Andrea Righi here, who a long time ago proposed a POSIX_FADV_NOREUSE implementation.
>>
>> 1) What is the reason iohelper was introduced?
>
> With POSIX you can't get sensible results from poll() on FDs associated with
> plain files. It will always report the file as readable/writable, and the
> userspace caller will get blocked any time the I/O operation causes the
> kernel to read/write from the underlying (potentially very slow) storage.
>
> IOW if you give QEMU an FD associated with a plain file and tell it to
> migrate to that, the guest OS will get stalled.
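
The poll() behaviour described above is easy to observe. Below is a minimal sketch in C, with a made-up helper name (poll_regular_file()) and a temporary file under /tmp assumed for the demonstration. POSIX specifies that regular files always poll TRUE for reading and writing, so poll() returns at once even though the file is empty and the actual I/O may still block on storage.

```c
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical demo: poll() an empty regular file with a zero timeout and
 * return the events reported for it (0 if the setup failed or nothing was
 * reported). */
static short poll_regular_file(void)
{
    char path[] = "/tmp/poll-demo-XXXXXX";
    struct pollfd pfd;
    int fd, n;

    fd = mkstemp(path);
    if (fd < 0)
        return 0;
    unlink(path);                        /* file goes away once closed */

    pfd.fd = fd;
    pfd.events = POLLIN | POLLOUT;
    pfd.revents = 0;

    n = poll(&pfd, 1, 0);                /* timeout 0: do not wait at all */
    close(fd);

    return n == 1 ? pfd.revents : 0;
}
```

Both POLLIN and POLLOUT come back set, which is why an event loop built around poll() gets no useful readiness information from a plain file, and why a pipe or socket in front of the file is used instead.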
We send a stop command to qemu just before migrating to a file in virsh save though, right?
With virsh restore we also first load the VM, and only then start executing it.
So for virsh save and virsh restore, this should not be a problem? Do we still need the iohelper?
The same code is used in libvirt for other commands like 'virsh dump'
and snapshots, where the VM remains live though. In general I don't
think we should remove the iohelper, because QEMU code is written from
the POV that the channels honour O_NONBLOCK.
With regards,
Daniel