Hi Gonglei,
On Tue, May 28, 2024 at 11:06 AM Gonglei (Arei) <arei.gonglei(a)huawei.com> wrote:
Hi Peter,
> -----Original Message-----
> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Wednesday, May 22, 2024 6:15 AM
> To: Yu Zhang <yu.zhang(a)ionos.com>
> Cc: Michael Galaxy <mgalaxy(a)akamai.com>; Jinpu Wang
> <jinpu.wang(a)ionos.com>; Elmar Gerdes <elmar.gerdes(a)ionos.com>;
> zhengchuan <zhengchuan(a)huawei.com>; Gonglei (Arei)
> <arei.gonglei(a)huawei.com>; Daniel P. Berrangé <berrange(a)redhat.com>;
> Markus Armbruster <armbru(a)redhat.com>; Zhijian Li (Fujitsu)
> <lizhijian(a)fujitsu.com>; qemu-devel(a)nongnu.org; Yuval Shaia
> <yuval.shaia.ml(a)gmail.com>; Kevin Wolf <kwolf(a)redhat.com>; Prasanna
> Kumar Kalever <prasanna.kalever(a)redhat.com>; Cornelia Huck
> <cohuck(a)redhat.com>; Michael Roth <michael.roth(a)amd.com>; Prasanna
> Kumar Kalever <prasanna4324(a)gmail.com>; Paolo Bonzini
> <pbonzini(a)redhat.com>; qemu-block(a)nongnu.org; devel(a)lists.libvirt.org;
> Hanna Reitz <hreitz(a)redhat.com>; Michael S. Tsirkin <mst(a)redhat.com>;
> Thomas Huth <thuth(a)redhat.com>; Eric Blake <eblake(a)redhat.com>; Song
> Gao <gaosong(a)loongson.cn>; Marc-André Lureau
> <marcandre.lureau(a)redhat.com>; Alex Bennée <alex.bennee(a)linaro.org>;
> Wainer dos Santos Moschetta <wainersm(a)redhat.com>; Beraldo Leal
> <bleal(a)redhat.com>; Pannengyuan <pannengyuan(a)huawei.com>;
> Xiexiangyou <xiexiangyou(a)huawei.com>; Fabiano Rosas <farosas(a)suse.de>
> Subject: Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling
>
> On Fri, May 17, 2024 at 03:01:59PM +0200, Yu Zhang wrote:
> > Hello Michael and Peter,
>
> Hi,
>
> >
> > Exactly, not so compelling, as I did it first only on servers widely
> > used for production in our data center. The network adapters are
> >
> > Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720
> > 2-port Gigabit Ethernet PCIe
>
> Hmm... I definitely think Jinpu's Mellanox ConnectX-6 looks more reasonable.
>
>
> https://lore.kernel.org/qemu-devel/CAMGffEn-DKpMZ4tA71MJYdyemg0Zda15wVAqk81vXtKzx-LfJQ(a)mail.gmail.com/
>
> I really appreciate everyone helping with the testing.
>
> > InfiniBand controller: Mellanox Technologies MT27800 Family
> > [ConnectX-5]
> >
> > which doesn't meet our purpose. I can choose RDMA or TCP for VM
> > migration. RDMA traffic is through InfiniBand and TCP through Ethernet
> > on these two hosts. One is standby while the other is active.
> >
> > Now I'll try on a server with more recent Ethernet and InfiniBand
> > network adapters. One of them has:
> > BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev 01)
> >
> > The comparison between RDMA and TCP on the same NIC could make more
> > sense.
>
> It looks to me like NICs are powerful now, but again, as I mentioned, I don't
> think it's a reason we need to deprecate rdma, especially if QEMU's rdma
> migration has the chance to be refactored using rsocket.
>
> Is there anyone who has started looking in that direction? Would it make sense
> for us to start some PoC now?
>
My team has finished the PoC refactoring, which works well.
Progress:
1. Implement io/channel-rdma.c,
2. Add the unit test tests/unit/test-io-channel-rdma.c and verify that it passes,
3. Remove the original code from migration/rdma.c,
4. Rewrite the rdma_start_outgoing_migration and rdma_start_incoming_migration logic,
5. Remove all rdma_xxx functions from migration/ram.c (to prevent RDMA live migration
from polluting the core logic of live migration),
6. Test RDMA live migration over soft-RoCE (the software implementation of RoCE).
The test is successful.
We will submit the patchset later.
Thanks for working on this PoC and sharing your progress; we are
looking forward to the patchset.
Regards,
-Gonglei
Regards!
Jinpu
>
> > Thanks,
> >
> > --
> > Peter Xu
>