Hi, Peter
RDMA features high bandwidth, low latency (in a non-blocking lossless network),
and direct remote memory access that bypasses the CPU, features that TCP does
not have. (As you know, CPU resources are expensive for cloud vendors, which is
one of the reasons why we introduced offload cards.) This is very useful in
scenarios where fast live migration is needed (extremely short interruption and
migration durations). To this end, we have also developed RDMA support for
multifd.
Regards,
-Gonglei
-----Original Message-----
From: Peter Xu [mailto:peterx@redhat.com]
Sent: Wednesday, May 1, 2024 11:31 PM
To: Daniel P. Berrangé <berrange(a)redhat.com>
Cc: Markus Armbruster <armbru(a)redhat.com>; Michael Galaxy
<mgalaxy(a)akamai.com>; Yu Zhang <yu.zhang(a)ionos.com>; Zhijian Li (Fujitsu)
<lizhijian(a)fujitsu.com>; Jinpu Wang <jinpu.wang(a)ionos.com>; Elmar Gerdes
<elmar.gerdes(a)ionos.com>; qemu-devel(a)nongnu.org; Yuval Shaia
<yuval.shaia.ml(a)gmail.com>; Kevin Wolf <kwolf(a)redhat.com>; Prasanna
Kumar Kalever <prasanna.kalever(a)redhat.com>; Cornelia Huck
<cohuck(a)redhat.com>; Michael Roth <michael.roth(a)amd.com>; Prasanna
Kumar Kalever <prasanna4324(a)gmail.com>; integration(a)gluster.org; Paolo
Bonzini <pbonzini(a)redhat.com>; qemu-block(a)nongnu.org;
devel(a)lists.libvirt.org; Hanna Reitz <hreitz(a)redhat.com>; Michael S. Tsirkin
<mst(a)redhat.com>; Thomas Huth <thuth(a)redhat.com>; Eric Blake
<eblake(a)redhat.com>; Song Gao <gaosong(a)loongson.cn>; Marc-André
Lureau <marcandre.lureau(a)redhat.com>; Alex Bennée
<alex.bennee(a)linaro.org>; Wainer dos Santos Moschetta
<wainersm(a)redhat.com>; Beraldo Leal <bleal(a)redhat.com>; Gonglei (Arei)
<arei.gonglei(a)huawei.com>; Pannengyuan <pannengyuan(a)huawei.com>
Subject: Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling
On Tue, Apr 30, 2024 at 09:00:49AM +0100, Daniel P. Berrangé wrote:
> On Tue, Apr 30, 2024 at 09:15:03AM +0200, Markus Armbruster wrote:
> > Peter Xu <peterx(a)redhat.com> writes:
> >
> > > On Mon, Apr 29, 2024 at 08:08:10AM -0500, Michael Galaxy wrote:
> > >> Hi All (and Peter),
> > >
> > > Hi, Michael,
> > >
> > >>
> > >> My name is Michael Galaxy (formerly Hines). Yes, I changed my
> > >> last name (highly irregular for a male) and yes, that's my real
> > >> last name: https://www.linkedin.com/in/mrgalaxy/)
> > >>
> > >> I'm the original author of the RDMA implementation. I've been
> > >> discussing with Yu Zhang for a little bit about potentially
> > >> handing over maintainership of the codebase to his team.
> > >>
> > >> I simply have zero access to RoCE or Infiniband hardware at all,
> > >> unfortunately, so I've never been able to run tests or use what I
> > >> wrote at work, and as all of you know, if you don't have a way to
> > >> test something, then you can't maintain it.
> > >>
> > >> Yu Zhang put a (very kind) proposal forward to me to ask the
> > >> community if they feel comfortable training his team to maintain
> > >> the codebase (and run tests) while they learn about it.
> > >
> > > The "while learning" part is fine at least to me. IMHO the
> > > "ownership" of the code, or say, taking over the responsibility,
> > > may or may not need 100% mastery of the code base first. There
> > > should still be some fundamental confidence to work on the code
> > > though as a starting point, then it's about a serious use case to
> > > back this up, and careful testing while getting more familiar with it.
> >
> > How much experience we expect of maintainers depends on the
> > subsystem and other circumstances. The hard requirement isn't
> > experience, it's trust. See the recent attack on xz.
> >
> > I do not mean to express any doubts whatsoever on Yu Zhang's integrity!
> > I'm merely reminding y'all what's at stake.
>
> I think we shouldn't overly obsess[1] about 'xz', because the
> overwhelmingly common scenario is that volunteer maintainers are
> honest people. QEMU is in a massively better peer review situation.
> With xz there was basically no oversight of the new maintainer. With
> QEMU, we have oversight from 1000's of people on the list, a huge pool
> of general maintainers, the specific migration maintainers, and the
> release manager merging code.
>
> With a lack of historical experience with QEMU maintainership, I'd
> suggest that new RDMA volunteers would start by adding themselves to the
> "MAINTAINERS" file with only the 'Reviewer' classification. The main
> migration maintainers would still handle pull requests, but wait for a
> R-b from one of the RDMA volunteers. After some period of time the RDMA
> folks could graduate to full maintainer status if the migration
> maintainers needed to reduce their load.
> I suspect that might prove unnecessary though, given RDMA isn't an
> area of code with a high turnover of patches.
Right, and we can do that as a start; it also follows our normal rule of
starting as Reviewers when taking over maintenance of something. I even
considered Zhijian to be the previous rdma go-to guy / maintainer no matter
what role he used to have in the MAINTAINERS file.
Here IMHO it's more about whether any company would like to stand up and
provide help, without yet tying that to being able to send pull requests in the
near future or even in the longer term.
What worries me more is whether we really want to keep rdma in qemu, and
that's also why I was trying to request some serious performance
measurements comparing rdma vs. nics. And here when I say "we" I mean both
the QEMU community and any company that will support keeping rdma around.
The problem is: if NICs are now fast enough to perform at least as well as
rdma, and if that path has a lower overall maintenance cost, does it mean that
rdma migration will only be used by whoever already has it in their products
and wants to keep it? In that case we should simply ask new users to stick with
tcp, and the number of rdma users should only drop, not increase.
It also seems destined that most new migration features will not support
rdma: see how many old features we are dropping in migration now (which rdma
_might_ still leverage, but maybe not), and how much of what we add is
multifd-related, which will probably not apply to rdma at all. So in general
what I am worried about is a lose-lose situation, where the company might find
it easier to either stick with an old qemu (depending on whether other new
features are needed besides RDMA alone), or do periodic rebases with RDMA kept
downstream only.
So even if we want to keep RDMA around, I hope we can take this chance to at
least get a clear picture of when we should still suggest that a new user use
RDMA (with the reasons behind it). Or we simply shouldn't suggest that any new
user use RDMA at all (because at the very least it'll miss out on many new
migration features).
Thanks,
--
Peter Xu