
Hi Peter, Hi Chuan,

On Thu, May 9, 2024 at 4:14 PM Peter Xu <peterx@redhat.com> wrote:
On Thu, May 09, 2024 at 04:58:34PM +0800, Zheng Chuan via wrote:
It's good news to see the socket abstraction for RDMA! When I developed the series above, the biggest pain was that RDMA migration has no QIOChannel abstraction, so I needed to use a 'fake channel' for it, which is awkward in the code. So, as far as I know, we can do this as follows:
i. First, we need to evaluate whether rsocket is good enough to satisfy our fundamental QIOChannel abstraction.
ii. If it works, we will then see whether it gives us the opportunity to hide the details of the RDMA protocol inside rsocket, by removing most of the code in rdma.c and also some hacks in the main migration process.
iii. Implement advanced features like multi-fd and multi-uri for RDMA migration.
Since I am not familiar with rsocket, I need some time to look at it and do some quick verification of RDMA migration based on rsocket. But yes, I am willing to be involved in this refactoring work and to see if we can make this migration feature better :)
Based on what we have now, it looks like we'd better halt the deprecation process for a bit, so I think we shouldn't need to rush it, at least in 9.1, and we'll need to see how the refactoring goes.
It'll be perfect if rsocket works; otherwise, supporting multifd with little overhead / exported APIs would also be a good thing in general, whatever the approach. And obviously this all depends on first getting resources from companies to support this feature.
Note that so far nobody has compared RDMA vs. NIC performance, so if any of us can provide some test results, please do so. Many people are saying RDMA is better, but I haven't yet seen any numbers comparing it with modern TCP networks. I don't want old impressions floating around if things might have changed. When we have consolidated results, we should share them and also reflect them in QEMU's migration docs when an RDMA documentation page is ready.
I also ran tests with a Mellanox ConnectX-6 100G RoCE NIC. The results are mixed: with fewer than 3 streams native Ethernet is faster, and with more than 3 streams rsocket performs better.

root@x4-right:~# iperf -c 1.1.1.16 -P 1
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 325 KByte (default)
------------------------------------------------------------
[ 3] local 1.1.1.15 port 44214 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0000-10.0000 sec 52.9 GBytes 45.4 Gbits/sec

root@x4-right:~# iperf -c 1.1.1.16 -P 2
[ 3] local 1.1.1.15 port 33118 connected with 1.1.1.16 port 5001
[ 4] local 1.1.1.15 port 33130 connected with 1.1.1.16 port 5001
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 4.00 MByte (default)
------------------------------------------------------------
[ ID] Interval Transfer Bandwidth
[ 3] 0.0000-10.0001 sec 45.0 GBytes 38.7 Gbits/sec
[ 4] 0.0000-10.0000 sec 43.9 GBytes 37.7 Gbits/sec
[SUM] 0.0000-10.0000 sec 88.9 GBytes 76.4 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 0.172/0.189/0.205/0.172 ms (tot/err) = 2/0

root@x4-right:~# iperf -c 1.1.1.16 -P 4
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 325 KByte (default)
------------------------------------------------------------
[ 5] local 1.1.1.15 port 50748 connected with 1.1.1.16 port 5001
[ 4] local 1.1.1.15 port 50734 connected with 1.1.1.16 port 5001
[ 6] local 1.1.1.15 port 50764 connected with 1.1.1.16 port 5001
[ 3] local 1.1.1.15 port 50730 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 6] 0.0000-10.0000 sec 24.7 GBytes 21.2 Gbits/sec
[ 3] 0.0000-10.0004 sec 23.6 GBytes 20.3 Gbits/sec
[ 4] 0.0000-10.0000 sec 27.8 GBytes 23.9 Gbits/sec
[ 5] 0.0000-10.0000 sec 28.0 GBytes 24.0 Gbits/sec
[SUM] 0.0000-10.0000 sec 104 GBytes 89.4 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 0.104/0.156/0.204/0.124 ms (tot/err) = 4/0

root@x4-right:~# iperf -c 1.1.1.16 -P 8
[ 4] local 1.1.1.15 port 55588 connected with 1.1.1.16 port 5001
[ 5] local 1.1.1.15 port 55600 connected with 1.1.1.16 port 5001
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 325 KByte (default)
------------------------------------------------------------
[ 10] local 1.1.1.15 port 55628 connected with 1.1.1.16 port 5001
[ 15] local 1.1.1.15 port 55648 connected with 1.1.1.16 port 5001
[ 7] local 1.1.1.15 port 55620 connected with 1.1.1.16 port 5001
[ 3] local 1.1.1.15 port 55584 connected with 1.1.1.16 port 5001
[ 14] local 1.1.1.15 port 55644 connected with 1.1.1.16 port 5001
[ 6] local 1.1.1.15 port 55610 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 6] 0.0000-10.0015 sec 8.47 GBytes 7.27 Gbits/sec
[ 4] 0.0000-10.0011 sec 8.62 GBytes 7.40 Gbits/sec
[ 7] 0.0000-10.0000 sec 18.1 GBytes 15.5 Gbits/sec
[ 14] 0.0000-10.0000 sec 8.69 GBytes 7.46 Gbits/sec
[ 5] 0.0000-10.0006 sec 18.5 GBytes 15.9 Gbits/sec
[ 10] 0.0000-10.0006 sec 16.1 GBytes 13.9 Gbits/sec
[ 3] 0.0000-10.0000 sec 17.1 GBytes 14.6 Gbits/sec
[ 15] 0.0000-10.0016 sec 8.54 GBytes 7.34 Gbits/sec
[SUM] 0.0000-10.0017 sec 104 GBytes 89.4 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 0.049/0.095/0.213/0.062 ms (tot/err) = 8/0

root@x4-right:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so iperf -c 1.1.1.16 -P 1
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 3] local 1.1.1.15 port 45596 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0000-10.0000 sec 37.8 GBytes 32.5 Gbits/sec

root@x4-right:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so iperf -c 1.1.1.16 -P 2
[ 4] local 1.1.1.15 port 46782 connected with 1.1.1.16 port 5001
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 3] local 1.1.1.15 port 43237 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 4] 0.0000-10.0000 sec 37.5 GBytes 32.2 Gbits/sec
[ 3] 0.0000-10.0000 sec 40.7 GBytes 34.9 Gbits/sec
[SUM] 0.0000-10.0000 sec 78.2 GBytes 67.2 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 5.819/6.579/7.340/7.340 ms (tot/err) = 2/0

root@x4-right:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so iperf -c 1.1.1.16 -P 4
[ 4] local 1.1.1.15 port 60385 connected with 1.1.1.16 port 5001
[ 7] local 1.1.1.15 port 55203 connected with 1.1.1.16 port 5001
[ 6] local 1.1.1.15 port 35084 connected with 1.1.1.16 port 5001
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 3] local 1.1.1.15 port 37253 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 6] 0.0000-10.0000 sec 28.4 GBytes 24.4 Gbits/sec
[ 4] 0.0000-10.0000 sec 28.3 GBytes 24.3 Gbits/sec
[ 7] 0.0000-10.0000 sec 28.4 GBytes 24.4 Gbits/sec
[ 3] 0.0000-10.0001 sec 28.2 GBytes 24.3 Gbits/sec
[SUM] 0.0000-10.0001 sec 113 GBytes 97.3 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 5.311/7.579/10.019/4.165 ms (tot/err) = 4/0

root@x4-right:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so iperf -c 1.1.1.16 -P 8
[ 8] local 1.1.1.15 port 33684 connected with 1.1.1.16 port 5001
[ 10] local 1.1.1.15 port 40620 connected with 1.1.1.16 port 5001
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 3] local 1.1.1.15 port 56988 connected with 1.1.1.16 port 5001
[ 4] local 1.1.1.15 port 51139 connected with 1.1.1.16 port 5001
[ 12] local 1.1.1.15 port 44712 connected with 1.1.1.16 port 5001
[ 5] local 1.1.1.15 port 50838 connected with 1.1.1.16 port 5001
[ 6] local 1.1.1.15 port 51334 connected with 1.1.1.16 port 5001
[ 9] local 1.1.1.15 port 40611 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0000-10.0000 sec 13.8 GBytes 11.9 Gbits/sec
[ 5] 0.0000-10.0001 sec 13.9 GBytes 11.9 Gbits/sec
[ 12] 0.0000-10.0001 sec 13.8 GBytes 11.9 Gbits/sec
[ 10] 0.0000-10.0001 sec 13.9 GBytes 11.9 Gbits/sec
[ 9] 0.0000-10.0000 sec 13.8 GBytes 11.9 Gbits/sec
[ 6] 0.0000-10.0000 sec 13.9 GBytes 11.9 Gbits/sec
[ 8] 0.0000-10.0000 sec 13.8 GBytes 11.9 Gbits/sec
[ 4] 0.0000-10.0001 sec 13.8 GBytes 11.9 Gbits/sec
[SUM] 0.0000-10.0001 sec 111 GBytes 95.1 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 5.973/10.699/15.943/4.251 ms (tot/err) = 8/0

root@x4-right:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so iperf -c 1.1.1.16 -P 1
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 3] local 1.1.1.15 port 36960 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0000-10.0000 sec 41.1 GBytes 35.3 Gbits/sec

root@x4-right:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so iperf -c 1.1.1.16 -P 2
[ 3] local 1.1.1.15 port 32799 connected with 1.1.1.16 port 5001
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 4] local 1.1.1.15 port 35912 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 4] 0.0000-10.0000 sec 36.6 GBytes 31.4 Gbits/sec
[ 3] 0.0000-10.0000 sec 36.6 GBytes 31.4 Gbits/sec
[SUM] 0.0000-10.0000 sec 73.2 GBytes 62.9 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 5.172/5.842/6.512/6.512 ms (tot/err) = 2/0

root@x4-right:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so iperf -c 1.1.1.16 -P 4
[ 4] local 1.1.1.15 port 53311 connected with 1.1.1.16 port 5001
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 3] local 1.1.1.15 port 37243 connected with 1.1.1.16 port 5001
[ 7] local 1.1.1.15 port 60801 connected with 1.1.1.16 port 5001
[ 6] local 1.1.1.15 port 49694 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 6] 0.0000-10.0000 sec 28.2 GBytes 24.2 Gbits/sec
[ 7] 0.0000-10.0000 sec 28.2 GBytes 24.3 Gbits/sec
[ 3] 0.0000-10.0000 sec 28.2 GBytes 24.2 Gbits/sec
[ 4] 0.0000-10.0000 sec 28.2 GBytes 24.2 Gbits/sec
[SUM] 0.0000-10.0000 sec 113 GBytes 96.9 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 5.570/7.762/10.045/4.265 ms (tot/err) = 4/0
root@x4-right:~#
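
To make the comparison easier to read, here is a minimal Python sketch that tabulates the aggregate bandwidth from the iperf runs above. The numbers are copied from the [SUM] lines (for -P 1 there is no [SUM] line, so the single stream's value is used). The names TCP, RSOCKET_RUNS, rsocket_mean and compare are only illustrative, and averaging the two rsocket runs is my own assumption about how to summarize them, not part of the original measurement.

```python
# Aggregate iperf bandwidth in Gbits/sec, keyed by the number of
# parallel streams (-P). Values are copied from the output above.
TCP = {1: 45.4, 2: 76.4, 4: 89.4, 8: 89.4}

# Two rsocket runs (LD_PRELOAD of librspreload.so).
# The second run has no -P 8 result in the output above.
RSOCKET_RUNS = [
    {1: 32.5, 2: 67.2, 4: 97.3, 8: 95.1},
    {1: 35.3, 2: 62.9, 4: 96.9},
]


def rsocket_mean(streams):
    """Mean rsocket bandwidth over the runs that include this stream count."""
    values = [run[streams] for run in RSOCKET_RUNS if streams in run]
    return sum(values) / len(values)


def compare():
    """Return (streams, tcp, rsocket, percent difference vs. TCP) rows."""
    rows = []
    for streams in sorted(TCP):
        tcp = TCP[streams]
        rs = rsocket_mean(streams)
        rows.append((streams, tcp, rs, (rs - tcp) / tcp * 100.0))
    return rows


if __name__ == "__main__":
    print(f"{'streams':>7} {'TCP':>8} {'rsocket':>8} {'diff %':>8}")
    for streams, tcp, rs, diff in compare():
        print(f"{streams:>7} {tcp:>8.1f} {rs:>8.1f} {diff:>+8.1f}")
```

A negative difference means native TCP was faster for that stream count, a positive one means rsocket was faster. This only restates the single-host-pair results above; it says nothing about actual migration throughput.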
Chuan, please check the whole thread discussion, it may help to understand what we are looking for on rdma migrations [1]. Meanwhile please feel free to sync with Jinpu's team and see how to move forward with such a project.
We are happy to work with the community to improve RDMA migration.
[1] https://lore.kernel.org/qemu-devel/87frwatp7n.fsf@suse.de/
Thanks,
Regards!
-- Peter Xu