Hi Peter,
-----Original Message-----
From: Peter Xu [mailto:peterx@redhat.com]
Sent: Wednesday, May 22, 2024 6:15 AM
To: Yu Zhang <yu.zhang(a)ionos.com>
Cc: Michael Galaxy <mgalaxy(a)akamai.com>; Jinpu Wang
<jinpu.wang(a)ionos.com>; Elmar Gerdes <elmar.gerdes(a)ionos.com>;
zhengchuan <zhengchuan(a)huawei.com>; Gonglei (Arei)
<arei.gonglei(a)huawei.com>; Daniel P. Berrangé <berrange(a)redhat.com>;
Markus Armbruster <armbru(a)redhat.com>; Zhijian Li (Fujitsu)
<lizhijian(a)fujitsu.com>; qemu-devel(a)nongnu.org; Yuval Shaia
<yuval.shaia.ml(a)gmail.com>; Kevin Wolf <kwolf(a)redhat.com>; Prasanna
Kumar Kalever <prasanna.kalever(a)redhat.com>; Cornelia Huck
<cohuck(a)redhat.com>; Michael Roth <michael.roth(a)amd.com>; Prasanna
Kumar Kalever <prasanna4324(a)gmail.com>; Paolo Bonzini
<pbonzini(a)redhat.com>; qemu-block(a)nongnu.org; devel(a)lists.libvirt.org;
Hanna Reitz <hreitz(a)redhat.com>; Michael S. Tsirkin <mst(a)redhat.com>;
Thomas Huth <thuth(a)redhat.com>; Eric Blake <eblake(a)redhat.com>; Song
Gao <gaosong(a)loongson.cn>; Marc-André Lureau
<marcandre.lureau(a)redhat.com>; Alex Bennée <alex.bennee(a)linaro.org>;
Wainer dos Santos Moschetta <wainersm(a)redhat.com>; Beraldo Leal
<bleal(a)redhat.com>; Pannengyuan <pannengyuan(a)huawei.com>;
Xiexiangyou <xiexiangyou(a)huawei.com>; Fabiano Rosas <farosas(a)suse.de>
Subject: Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling
On Fri, May 17, 2024 at 03:01:59PM +0200, Yu Zhang wrote:
> Hello Michael and Peter,
Hi,
>
> Exactly, not so compelling, as I did it first only on servers widely
> used for production in our data center. The network adapters are
>
> Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720
> 2-port Gigabit Ethernet PCIe
Hmm... I definitely think Jinpu's Mellanox ConnectX-6 looks more reasonable.
https://lore.kernel.org/qemu-devel/CAMGffEn-DKpMZ4tA71MJYdyemg0Zda15wVAqk81vXtKzx-LfJQ(a)mail.gmail.com/
Many thanks to everyone helping with the testing.
> InfiniBand controller: Mellanox Technologies MT27800 Family
> [ConnectX-5]
>
> which doesn't meet our purpose. I can choose RDMA or TCP for VM
> migration. RDMA traffic is through InfiniBand and TCP through Ethernet
> on these two hosts. One is standby while the other is active.
>
> Now I'll try on a server with more recent Ethernet and InfiniBand
> network adapters. One of them has:
> BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev 01)
>
> The comparison between RDMA and TCP on the same NIC could make more
> sense.
It looks to me that NICs are powerful now, but again, as I mentioned, I don't
think it's a reason we need to deprecate rdma, especially if QEMU's rdma
migration has the chance to be refactored using rsocket.
Is there anyone who has started looking in that direction? Would it make sense
to start a PoC now?
My team has finished the PoC refactoring, and it works well.
Progress:
1. Implemented io/channel-rdma.c,
2. Added the unit test tests/unit/test-io-channel-rdma.c and verified that it passes,
3. Removed the original code from migration/rdma.c,
4. Rewrote the rdma_start_outgoing_migration and rdma_start_incoming_migration logic,
5. Removed all rdma_xxx functions from migration/ram.c (to prevent RDMA live migration
from polluting the core logic of live migration),
6. Tested RDMA live migration with soft-RoCE (the software RoCE implementation).
The test was successful.
We will submit the patchset later.
Regards,
-Gonglei
Thanks,
--
Peter Xu