Hello Peter and all,
I did a comparison of VM live-migration speeds between RDMA and TCP/IP
on our servers and plotted the results to get an initial impression.
Unfortunately, the Ethernet NICs are not recent ones, so the comparison
may not be very meaningful. I can repeat it on servers with more recent
Ethernet NICs and keep you updated.
It seems that the benefits of RDMA become obvious when the VM has a
large memory and is running a memory-intensive workload.
Best regards,
Yu Zhang @ IONOS Cloud
On Thu, May 9, 2024 at 4:14 PM Peter Xu <peterx(a)redhat.com> wrote:
On Thu, May 09, 2024 at 04:58:34PM +0800, Zheng Chuan via wrote:
> It's good news to see the socket abstraction for RDMA!
> When I developed the series above, the biggest pain was that RDMA
> migration has no QIOChannel abstraction, so I needed to use a 'fake
> channel' for it, which is awkward to implement.
> So, as far as I know, we can do this as follows:
> i. First, we need to evaluate whether rsocket is good enough to satisfy
>    our fundamental QIOChannel abstraction.
> ii. If it works, we then see whether it gives us the opportunity to hide
>    the details of the RDMA protocol inside rsocket, by removing most of
>    the code in rdma.c and some of the hacks in the main migration process.
> iii. Implement advanced features like multi-fd and multi-uri for RDMA
>    migration.
>
> Since I am not familiar with rsocket, I need some time to look at it and
> do some quick verification of RDMA migration based on rsocket.
> But yes, I am willing to be involved in this refactoring work and to see
> if we can make this migration feature better :)
Based on what we have now, it looks like we'd better pause the deprecation
process for a while, so I don't think we need to rush it, at least not in
9.1, and we'll need to see how the refactoring goes.
It'll be perfect if rsocket works; otherwise, supporting multifd with little
overhead / exported APIs would also be a good thing in general, whatever
the approach. And obviously this all depends on whether we can get
resources from companies to support this feature first.
Note that so far nobody has compared RDMA vs. NIC performance, so if any
of us can provide some test results, please do so. Many people are saying
RDMA is better, but I haven't yet seen any numbers comparing it with
modern TCP networks. I don't want to have old impressions floating around
if things might have changed. When we have consolidated results, we
should share them and also reflect that in QEMU's migration docs when an
RDMA document page is ready.
Chuan, please check the whole thread discussion; it may help to understand
what we are looking for in RDMA migration [1]. Meanwhile, please feel free
to sync with Jinpu's team and see how to move forward with such a project.
[1] https://lore.kernel.org/qemu-devel/87frwatp7n.fsf@suse.de/
Thanks,
--
Peter Xu