On Wed, May 29, 2024 at 4:43 AM Gonglei (Arei) <arei.gonglei(a)huawei.com> wrote:
Hi,
> -----Original Message-----
> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Tuesday, May 28, 2024 11:55 PM
> > > > Exactly, not so compelling, as I did it first only on servers
> > > > widely used for production in our data center. The network
> > > > adapters are
> > > >
> > > > Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme
> > > > BCM5720 2-port Gigabit Ethernet PCIe
> > >
> > > Hmm... I definitely think Jinpu's Mellanox ConnectX-6 looks more
> > > reasonable.
> > >
> > > https://lore.kernel.org/qemu-devel/CAMGffEn-DKpMZ4tA71MJYdyemg0Zda15wVAqk81vXtKzx-LfJQ(a)mail.gmail.com/
> > >
> > > Thanks a lot to everyone helping with the testing.
> > >
> > > > InfiniBand controller: Mellanox Technologies MT27800 Family
> > > > [ConnectX-5]
> > > >
> > > > which doesn't meet our purpose. I can choose RDMA or TCP for VM
> > > > migration. RDMA traffic goes through InfiniBand and TCP through
> > > > Ethernet on these two hosts. One is standby while the other is
> > > > active.
> > > >
> > > > Now I'll try on a server with more recent Ethernet and InfiniBand
> > > > network adapters. One of them has:
> > > > BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev 01)
> > > >
> > > > The comparison between RDMA and TCP on the same NIC could make
> > > > more sense.
> > >
> > > It looks to me like NICs are powerful now, but again, as I mentioned, I
> > > don't think that's a reason to deprecate RDMA, especially if
> > > QEMU's RDMA migration has the chance to be refactored using rsocket.
> > >
> > > Has anyone started looking into that direction? Would it
> > > make sense to start a PoC now?
> > >
> >
> > My team has finished a PoC refactoring, which works well.
> >
> > Progress:
> > 1. Implement io/channel-rdma.c.
> > 2. Add the unit test tests/unit/test-io-channel-rdma.c and verify that it
> >    passes.
> > 3. Remove the original code from migration/rdma.c.
> > 4. Rewrite the rdma_start_outgoing_migration and
> >    rdma_start_incoming_migration logic.
> > 5. Remove all rdma_xxx functions from migration/ram.c (to prevent RDMA
> >    live migration from polluting the core logic of live migration).
> > 6. Test RDMA live migration with soft-RoCE, the software implementation
> >    of RoCE. The test is successful.
> >
> > We will submit the patchset later.
>
> That's great news, thank you!
>
> --
> Peter Xu
For RDMA programming, the current mainstream approach is to use rdma_cm to
establish a connection and then use verbs to transmit data.
rdma_cm and ibverbs each create their own FD, and the two FDs have different
responsibilities: the rdma_cm fd notifies connection-establishment events, and
the verbs fd notifies new CQEs. When poll/epoll is performed directly on the
rdma_cm fd, only a POLLIN event can be monitored, meaning that an rdma_cm event
has occurred. Likewise, when the verbs fd is polled/epolled directly, only a
POLLIN event can be monitored, indicating that a new CQE has been generated.
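
A minimal sketch of the two-FD model described above, assuming a plain poll()
loop. In a real program cm_fd would be the fd member of the struct
rdma_event_channel returned by rdma_create_event_channel(), and comp_fd the fd
member of the struct ibv_comp_channel returned by ibv_create_comp_channel().
Here both are treated as ordinary descriptors so the sketch runs without RDMA
hardware; the helper which_ready() and the CM_EVENT/CQ_EVENT bits are
hypothetical names used only for illustration.

```c
#include <poll.h>
#include <unistd.h>   /* pipe()/write() for the pipe-based stand-ins */

/* Hypothetical bit flags reporting which channel has work pending. */
enum { CM_EVENT = 1, CQ_EVENT = 2 };

/*
 * Wait up to timeout_ms for either descriptor to become readable.
 * cm_fd:   stand-in for the rdma_cm event channel fd (connection events).
 * comp_fd: stand-in for the verbs completion channel fd (new CQEs).
 * Returns a mask of CM_EVENT/CQ_EVENT, or 0 on timeout or error.
 */
static int which_ready(int cm_fd, int comp_fd, int timeout_ms)
{
    struct pollfd pfds[2] = {
        { .fd = cm_fd,   .events = POLLIN },
        { .fd = comp_fd, .events = POLLIN },
    };
    int mask = 0;

    if (poll(pfds, 2, timeout_ms) <= 0)
        return 0;
    if (pfds[0].revents & POLLIN)
        mask |= CM_EVENT;   /* real code would call rdma_get_cm_event() */
    if (pfds[1].revents & POLLIN)
        mask |= CQ_EVENT;   /* real code would call ibv_get_cq_event() */
    return mask;
}
```

Two pipe() pairs can stand in for the channels: writing a byte into the
"completion" pipe makes which_ready() report CQ_EVENT only. With real verbs,
the CQ_EVENT branch would also acknowledge the event and re-arm notification
with ibv_req_notify_cq() before polling again.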
rsocket is a sub-module of the rdma_cm library (librdmacm) and provides RDMA
calls that closely mirror the socket interfaces. However, it returns only the
rdma_cm fd, which is used for connection-setup events, and does not expose the
verbs fd (whose readable/writable events signal data). Those data events can
only be monitored through rsocket's own rpoll interface. QEMU, however, uses
ppoll to monitor the rdma_cm fd (obtained via the raccept API) and therefore
cannot see the verbs fd events. Only some hacky workarounds can address this
problem.
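
To make the rsocket problem concrete, here is a minimal sketch that uses pipes
as stand-ins. Data arrives on a descriptor the event loop was never given (the
hidden verbs fd), while the loop only watches the rdma_cm fd returned alongside
raccept(). poll() is used instead of ppoll() for portability; ppoll() differs
only in its timespec timeout and signal mask, not in which descriptors it can
observe. The helper loop_sees_event() is a hypothetical name.

```c
#include <poll.h>
#include <unistd.h>   /* pipe()/write() for the pipe-based stand-ins */

/*
 * Hypothetical stand-in for one iteration of a main loop that can only
 * wait on the descriptors it has been given. With rsocket that is just the
 * rdma_cm fd; the verbs completion fd stays hidden inside the library.
 * Returns 1 if watched_fd is readable right now, 0 otherwise.
 */
static int loop_sees_event(int watched_fd)
{
    struct pollfd pfd = { .fd = watched_fd, .events = POLLIN };

    /* Zero timeout: a non-blocking check. */
    if (poll(&pfd, 1, 0) <= 0)
        return 0;
    return (pfd.revents & POLLIN) != 0;
}
```

Writing one byte into the "verbs" pipe and then calling loop_sees_event() on the
"rdma_cm" pipe returns 0: the data is pending, but a loop that never received
the verbs fd has no way to wake up for it. Only a call that knows about the
hidden fd, which with rsocket means rpoll(), can observe it.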
Do you guys have any ideas? Thanks.