Hi Peter, Hi Chuan,
On Thu, May 9, 2024 at 4:14 PM Peter Xu <peterx(a)redhat.com> wrote:
On Thu, May 09, 2024 at 04:58:34PM +0800, Zheng Chuan via wrote:
> That's good news to see the socket abstraction for RDMA!
> When I developed the series above, the biggest pain was that RDMA migration has no
> QIOChannel abstraction, so I needed to use a 'fake channel'
> for it, which is awkward in the code.
> So, as far as I know, we can do this as follows:
> i. First, we need to evaluate whether rsocket is good enough to satisfy
> our fundamental QIOChannel abstraction.
> ii. If it works, we will then see whether it gives us the opportunity to
> hide the details of the RDMA protocol
> inside rsocket by removing most of the code in rdma.c and also some hacks in the
> main migration process.
> iii. Implement advanced features like multi-fd and multi-uri for RDMA
> migration.
>
> Since I am not familiar with rsocket, I need some time to look at it and do some
> quick verification of RDMA migration based on rsocket.
> But yes, I am willing to be involved in this refactoring work and to see if we can make
> this migration feature better :)
Based on what we have now, it looks like we'd better halt the deprecation
process a bit, so I think we shouldn't need to rush it, at least in 9.1,
and we'll need to see how the refactoring goes.
It would be perfect if rsocket works; otherwise, supporting multifd with little
overhead / exported APIs would also be a good thing in general, with
whatever approach. And obviously this is all based on the premise that we can get
resources from companies to support this feature first.
Note that so far nobody has compared RDMA vs. NIC performance, so if
any of us can provide some test results, please do so. Many people are
saying RDMA is better, but I haven't yet seen any numbers comparing it with
modern TCP networks. I don't want to have old impressions floating around
even if things might have changed. When we have consolidated results, we
should share them and also reflect that in QEMU's migration docs when an
RDMA document page is ready.
I also ran a test with a Mellanox ConnectX-6 100G RoCE NIC. The
results are mixed: with fewer than 3 streams native Ethernet (TCP) is faster,
and with more than 3 streams rsocket performs better.
root@x4-right:~# iperf -c 1.1.1.16 -P 1
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 325 KByte (default)
------------------------------------------------------------
[ 3] local 1.1.1.15 port 44214 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0000-10.0000 sec 52.9 GBytes 45.4 Gbits/sec
root@x4-right:~# iperf -c 1.1.1.16 -P 2
[ 3] local 1.1.1.15 port 33118 connected with 1.1.1.16 port 5001
[ 4] local 1.1.1.15 port 33130 connected with 1.1.1.16 port 5001
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 4.00 MByte (default)
------------------------------------------------------------
[ ID] Interval Transfer Bandwidth
[ 3] 0.0000-10.0001 sec 45.0 GBytes 38.7 Gbits/sec
[ 4] 0.0000-10.0000 sec 43.9 GBytes 37.7 Gbits/sec
[SUM] 0.0000-10.0000 sec 88.9 GBytes 76.4 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 0.172/0.189/0.205/0.172 ms (tot/err) = 2/0
root@x4-right:~# iperf -c 1.1.1.16 -P 4
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 325 KByte (default)
------------------------------------------------------------
[ 5] local 1.1.1.15 port 50748 connected with 1.1.1.16 port 5001
[ 4] local 1.1.1.15 port 50734 connected with 1.1.1.16 port 5001
[ 6] local 1.1.1.15 port 50764 connected with 1.1.1.16 port 5001
[ 3] local 1.1.1.15 port 50730 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 6] 0.0000-10.0000 sec 24.7 GBytes 21.2 Gbits/sec
[ 3] 0.0000-10.0004 sec 23.6 GBytes 20.3 Gbits/sec
[ 4] 0.0000-10.0000 sec 27.8 GBytes 23.9 Gbits/sec
[ 5] 0.0000-10.0000 sec 28.0 GBytes 24.0 Gbits/sec
[SUM] 0.0000-10.0000 sec 104 GBytes 89.4 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 0.104/0.156/0.204/0.124 ms (tot/err) = 4/0
root@x4-right:~# iperf -c 1.1.1.16 -P 8
[ 4] local 1.1.1.15 port 55588 connected with 1.1.1.16 port 5001
[ 5] local 1.1.1.15 port 55600 connected with 1.1.1.16 port 5001
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 325 KByte (default)
------------------------------------------------------------
[ 10] local 1.1.1.15 port 55628 connected with 1.1.1.16 port 5001
[ 15] local 1.1.1.15 port 55648 connected with 1.1.1.16 port 5001
[ 7] local 1.1.1.15 port 55620 connected with 1.1.1.16 port 5001
[ 3] local 1.1.1.15 port 55584 connected with 1.1.1.16 port 5001
[ 14] local 1.1.1.15 port 55644 connected with 1.1.1.16 port 5001
[ 6] local 1.1.1.15 port 55610 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 6] 0.0000-10.0015 sec 8.47 GBytes 7.27 Gbits/sec
[ 4] 0.0000-10.0011 sec 8.62 GBytes 7.40 Gbits/sec
[ 7] 0.0000-10.0000 sec 18.1 GBytes 15.5 Gbits/sec
[ 14] 0.0000-10.0000 sec 8.69 GBytes 7.46 Gbits/sec
[ 5] 0.0000-10.0006 sec 18.5 GBytes 15.9 Gbits/sec
[ 10] 0.0000-10.0006 sec 16.1 GBytes 13.9 Gbits/sec
[ 3] 0.0000-10.0000 sec 17.1 GBytes 14.6 Gbits/sec
[ 15] 0.0000-10.0016 sec 8.54 GBytes 7.34 Gbits/sec
[SUM] 0.0000-10.0017 sec 104 GBytes 89.4 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 0.049/0.095/0.213/0.062 ms (tot/err) = 8/0
root@x4-right:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so iperf -c 1.1.1.16 -P 1
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 3] local 1.1.1.15 port 45596 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0000-10.0000 sec 37.8 GBytes 32.5 Gbits/sec
root@x4-right:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so iperf -c 1.1.1.16 -P 2
[ 4] local 1.1.1.15 port 46782 connected with 1.1.1.16 port 5001
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 3] local 1.1.1.15 port 43237 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 4] 0.0000-10.0000 sec 37.5 GBytes 32.2 Gbits/sec
[ 3] 0.0000-10.0000 sec 40.7 GBytes 34.9 Gbits/sec
[SUM] 0.0000-10.0000 sec 78.2 GBytes 67.2 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 5.819/6.579/7.340/7.340 ms (tot/err) = 2/0
root@x4-right:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so iperf -c 1.1.1.16 -P 4
[ 4] local 1.1.1.15 port 60385 connected with 1.1.1.16 port 5001
[ 7] local 1.1.1.15 port 55203 connected with 1.1.1.16 port 5001
[ 6] local 1.1.1.15 port 35084 connected with 1.1.1.16 port 5001
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 3] local 1.1.1.15 port 37253 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 6] 0.0000-10.0000 sec 28.4 GBytes 24.4 Gbits/sec
[ 4] 0.0000-10.0000 sec 28.3 GBytes 24.3 Gbits/sec
[ 7] 0.0000-10.0000 sec 28.4 GBytes 24.4 Gbits/sec
[ 3] 0.0000-10.0001 sec 28.2 GBytes 24.3 Gbits/sec
[SUM] 0.0000-10.0001 sec 113 GBytes 97.3 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 5.311/7.579/10.019/4.165 ms (tot/err) = 4/0
root@x4-right:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so iperf -c 1.1.1.16 -P 8
[ 8] local 1.1.1.15 port 33684 connected with 1.1.1.16 port 5001
[ 10] local 1.1.1.15 port 40620 connected with 1.1.1.16 port 5001
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 3] local 1.1.1.15 port 56988 connected with 1.1.1.16 port 5001
[ 4] local 1.1.1.15 port 51139 connected with 1.1.1.16 port 5001
[ 12] local 1.1.1.15 port 44712 connected with 1.1.1.16 port 5001
[ 5] local 1.1.1.15 port 50838 connected with 1.1.1.16 port 5001
[ 6] local 1.1.1.15 port 51334 connected with 1.1.1.16 port 5001
[ 9] local 1.1.1.15 port 40611 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0000-10.0000 sec 13.8 GBytes 11.9 Gbits/sec
[ 5] 0.0000-10.0001 sec 13.9 GBytes 11.9 Gbits/sec
[ 12] 0.0000-10.0001 sec 13.8 GBytes 11.9 Gbits/sec
[ 10] 0.0000-10.0001 sec 13.9 GBytes 11.9 Gbits/sec
[ 9] 0.0000-10.0000 sec 13.8 GBytes 11.9 Gbits/sec
[ 6] 0.0000-10.0000 sec 13.9 GBytes 11.9 Gbits/sec
[ 8] 0.0000-10.0000 sec 13.8 GBytes 11.9 Gbits/sec
[ 4] 0.0000-10.0001 sec 13.8 GBytes 11.9 Gbits/sec
[SUM] 0.0000-10.0001 sec 111 GBytes 95.1 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 5.973/10.699/15.943/4.251 ms (tot/err) = 8/0
root@x4-right:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so iperf -c 1.1.1.16 -P 1
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 3] local 1.1.1.15 port 36960 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0000-10.0000 sec 41.1 GBytes 35.3 Gbits/sec
root@x4-right:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so iperf -c 1.1.1.16 -P 2
[ 3] local 1.1.1.15 port 32799 connected with 1.1.1.16 port 5001
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 4] local 1.1.1.15 port 35912 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 4] 0.0000-10.0000 sec 36.6 GBytes 31.4 Gbits/sec
[ 3] 0.0000-10.0000 sec 36.6 GBytes 31.4 Gbits/sec
[SUM] 0.0000-10.0000 sec 73.2 GBytes 62.9 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 5.172/5.842/6.512/6.512 ms (tot/err) = 2/0
root@x4-right:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so iperf -c 1.1.1.16 -P 4
[ 4] local 1.1.1.15 port 53311 connected with 1.1.1.16 port 5001
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 3] local 1.1.1.15 port 37243 connected with 1.1.1.16 port 5001
[ 7] local 1.1.1.15 port 60801 connected with 1.1.1.16 port 5001
[ 6] local 1.1.1.15 port 49694 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 6] 0.0000-10.0000 sec 28.2 GBytes 24.2 Gbits/sec
[ 7] 0.0000-10.0000 sec 28.2 GBytes 24.3 Gbits/sec
[ 3] 0.0000-10.0000 sec 28.2 GBytes 24.2 Gbits/sec
[ 4] 0.0000-10.0000 sec 28.2 GBytes 24.2 Gbits/sec
[SUM] 0.0000-10.0000 sec 113 GBytes 96.9 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 5.570/7.762/10.045/4.265 ms (tot/err) = 4/0
root@x4-right:~#
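To make the numbers above easier to compare at a glance, here is a small sketch that tabulates the aggregate bandwidth per stream count and reports which transport came out ahead. The values are copied from the [SUM] lines in the logs above (for -P 1 there is no [SUM] line, so the single stream's bandwidth is used). The dictionary and function names are only illustrative, and the second rsocket run has no -P 8 result, so it is simply left out there.

```python
# Aggregate iperf bandwidth in Gbits/sec, copied from the logs above.
# Keys are the iperf "-P" stream counts.
tcp = {1: 45.4, 2: 76.4, 4: 89.4, 8: 89.4}
rsocket_run1 = {1: 32.5, 2: 67.2, 4: 97.3, 8: 95.1}
rsocket_run2 = {1: 35.3, 2: 62.9, 4: 96.9}  # no -P 8 in the second run


def ratios(tcp_bw, rs_bw):
    """Return {streams: rsocket/TCP bandwidth ratio} for shared stream counts."""
    return {n: rs_bw[n] / tcp_bw[n] for n in sorted(rs_bw) if n in tcp_bw}


def winners(tcp_bw, rs_bw):
    """Return {streams: "rsocket" or "tcp"} depending on which was faster."""
    return {
        n: ("rsocket" if r > 1.0 else "tcp")
        for n, r in ratios(tcp_bw, rs_bw).items()
    }


if __name__ == "__main__":
    for label, run in (("run 1", rsocket_run1), ("run 2", rsocket_run2)):
        print(f"rsocket {label} vs TCP")
        for n, r in ratios(tcp, run).items():
            print(f"  -P {n}: TCP {tcp[n]:5.1f}  rsocket {run[n]:5.1f}  ratio {r:.2f}")
```

A ratio below 1.0 means native TCP was faster for that stream count, above 1.0 means rsocket was faster. These are single 10-second runs, so small differences should be read with some caution.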
Chuan, please check the whole thread discussion, it may help to understand
what we are looking for on rdma migrations [1]. Meanwhile please feel free
to sync with Jinpu's team and see how to move forward with such a project.
We are happy to work with the community to improve RDMA migration.
Regards!
>
> --
> Peter Xu
>