Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

8 Apr 2024

      On Mon, Apr 08, 2024 at 04:07:20PM +0200, Jinpu Wang wrote:
...
Hi Peter,
Jinpu,

Thanks for joining the discussion.
...
On Tue, Apr 2, 2024 at 11:24 PM Peter Xu <peterx@redhat.com> wrote:
...
On Mon, Apr 01, 2024 at 11:26:25PM +0200, Yu Zhang wrote:
...
Hello Peter and Zhjian,
Thank you so much for letting me know about this. I'm also a bit surprised at
the plan for deprecating the RDMA migration subsystem.
It's not too late.  Since it looks like we do have users who were not yet
notified of this, we'll redo the deprecation procedure even if it'll be the
final plan, and it'll be 2 releases after this.
...
...
IMHO it's more important to know whether there are still users and whether
they would still like to see it around.
...
I admit RDMA migration lacked testing (unit/CI tests), which led to a few
obvious bugs being noticed too late.
Yes, we are a user of this subsystem. I was unaware of the lack of test coverage
for this part. As soon as 8.2 was released, I saw that many of the migration
test cases failed and came to realize that there might be a bug between 8.1
and 8.2, but was unable to confirm and report it quickly to you.
The maintenance of this part could be too costly or difficult from
your point of view.
It may or may not be too costly, it's just that we need real users of RDMA
taking some care of it.  Having it stay broken for more than one release is
definitely a sign of a lack of users.  It signals to the community that we
should consider dropping some features so that we can get the best use of
the community resources for the things that may have a broader audience.
One major thing missing is an RDMA tester to guard all the merges against
breaking the RDMA paths, hopefully in CI.  That should not rely on RDMA
hardware but just sanity check that the migration+rdma code runs fine.  RDMA
taught us the lesson, so we're requesting CI coverage for all other new
features that will be merged, at least for the migration subsystem.  In the
future we plan to not merge anything that is not covered by CI unless
extremely necessary.
For sure CI is not the only missing part, but I'd say we should start with
it, then someone should also take care of the code even if only in
maintenance mode (no new feature to add on top).
...
My concern is, this plan will force a few QEMU users (not sure how many) like
us either to stick to the RDMA migration by using an increasingly older
version of QEMU, or to abandon the currently used RDMA migration.
RDMA doesn't get new features anyway.  If there's a specific use case for RDMA
migration, would it work if such a scenario used the old binary?  Is it
possible to switch to the TCP protocol with some good NICs?
We have used rdma migration with HCAs from Nvidia for years; our
experience is that RDMA migration works better than tcp (over ipoib).
Please bear with me, as I know little about rdma stuff.

I'm actually pretty confused (and have been for a long time..) about why we
need to operate with rdma contexts when ipoib seems to provide all the tcp
layers.  I mean, can it work with the current "tcp:" protocol over ipoib
even if there's rdma/ib hardware underneath?  Is it because of performance
improvements that we must use a separate path compared to the generic
"tcp:" protocol here?
...
Switching back to TCP will lead us to the old problems which were
solved by RDMA migration.
Can you elaborate on the problems, and why tcp won't work in this case?  They
may not be directly relevant to the issue we're discussing, but I'm happy
to learn more.

Which NICs were you testing with before?  Was the test carried out with
modern ones (50Gbps-200Gbps NICs), or was it done when such hardware was
not yet common?

From what I recently learned about the new Intel hardware, at least the
models that support QPL, it's easy to achieve 50Gbps+ on a single core.

https://lore.kernel.org/r/PH7PR11MB5941A91AC1E514BCC32896A6A3342@PH7PR11MB59...

Quote from Yuan:

  Yes, I use iperf3 to check the bandwidth for one core; the bandwidth is 60Gbps.
  [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
  [  5]   0.00-1.00   sec  7.00 GBytes  60.1 Gbits/sec    0   2.87 MBytes
  [  5]   1.00-2.00   sec  7.05 GBytes  60.6 Gbits/sec    0   2.87 MBytes

  And in the live migration test, a multifd thread's CPU utilization is almost 100%
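
As a sanity check on the figures quoted above, the Transfer and Bitrate
columns can be reconciled with a few lines of Python.  This is only a sketch:
it assumes iperf3 reports GBytes in binary units (1024^3 bytes) and Gbits/sec
in decimal units (10^9 bits), and the helper name is made up for illustration.

```python
# Assumption: iperf3 prints "GBytes" as 1024**3 bytes and "Gbits/sec" as
# 10**9 bits per second. The helper name below is hypothetical.
BYTES_PER_GBYTE = 1024 ** 3
BITS_PER_GBIT = 10 ** 9


def gbytes_to_gbits_per_sec(gbytes: float, seconds: float) -> float:
    """Convert a transfer size over an interval into a bitrate."""
    if seconds <= 0:
        raise ValueError("interval must be positive")
    return gbytes * BYTES_PER_GBYTE * 8 / BITS_PER_GBIT / seconds


# The two one-second intervals from the quoted output.
for transferred in (7.00, 7.05):
    rate = gbytes_to_gbits_per_sec(transferred, 1.0)
    print(f"{transferred:.2f} GBytes in 1 s -> {rate:.1f} Gbits/sec")
```

Under those unit assumptions, both intervals come out at the quoted 60.1 and
60.6 Gbits/sec.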

It boils down to what old problems were there with tcp first, though.
...
...
To the best of our knowledge, RDMA users are rare, and please let anyone know
if you are aware of such users.  IIUC the major reason why RDMA stopped being
the trend is that networks are not what they were ten years ago.  I don't
think I have good knowledge of RDMA or networking at all, but my
understanding is that it's pretty easy to find a modern NIC that outperforms
RDMA, so it may make little sense to maintain multiple protocols, considering
that the RDMA migration code is so special that it has the most custom code
compared to other protocols.
+cc some guys from Huawei.
I'm surprised RDMA users are rare; I guess maybe many are just
working with a different code base.
Yes, please cc whoever might be interested (or surprised.. :) to know this,
and let's be open to all possibilities.

I don't think it makes sense to deprecate a feature without a good reason if
it has a lot of users.  However, there's always the resource limitation
we're facing, so it's still possible that this gets deprecated if nobody is
working on it in our upstream branch.  Say, if people use private branches
anyway to support rdma without collaborating upstream, then keeping such a
feature upstream may not make much sense either, unless there's some way to
collaborate.  We'll see.

It seems there can still be people joining this discussion.  I'll hold off
a bit on merging this patch to leave enough of a window for anyone to chime in.

Thanks,
...
...
Thx!
Jinpu Wang
...
-- 
Peter Xu