On Mon, May 06, 2024 at 02:06:28AM +0000, Gonglei (Arei) wrote:
Hi, Peter
Hey, Lei,
Happy to see you around again after years.
RDMA features high bandwidth, low latency (in a non-blocking lossless
network), and direct remote memory access that bypasses the CPU (as you
know, CPU resources are expensive for cloud vendors, which is one of the
reasons why we introduced offload cards), which TCP does not have.
Isn't using offload cards another cost, vs. preparing more CPU resources?
In some scenarios where fast live migration is needed (extremely short
interruption duration and migration duration), RDMA is very useful. To
this end, we have also developed RDMA support for multifd.
Will any of you upstream that work? I'm curious how intrusive it would be
to add it to multifd; if it can keep to only 5 exported functions, like
what rdma.h does right now, that'll be pretty nice. We also want to make
sure it works with arbitrarily sized loads and buffers, e.g. VFIO is
considering adding IO loads to multifd channels too.
One thing to note is that the question here is not purely a performance
comparison between RDMA and NICs. It's about helping us decide whether to
drop RDMA; IOW, even if RDMA performs well, the community still has the
right to drop it if nobody can actively work on and maintain it. It's just
that if NICs can perform as well, that's one more reason to drop it,
unless companies can help provide good support and work together.
Thanks,
--
Peter Xu