On Wed, Apr 10, 2024 at 09:49:15AM -0400, Peter Xu wrote:
On Wed, Apr 10, 2024 at 02:28:59AM +0000, Zhijian Li (Fujitsu) via wrote:
>
>
> on 4/10/2024 3:46 AM, Peter Xu wrote:
>
> >> Is there a document/link about the unit tests/CI for migration tests?
> >> Why are those tests missing?
> >> Is it hard or very special to set up an environment for that? Maybe we
> >> can help in this regard.
> > See tests/qtest/migration-test.c. We put most of our migration tests
> > there and that's covered in CI.
> >
> > I think one major issue is that CI systems don't normally have rdma devices.
> > Can the rdma migration test be carried out without real hardware?
>
> Yeah, RXE (aka Soft-RoCE) is able to emulate RDMA, for example:
> $ sudo rdma link add rxe_eth0 type rxe netdev eth0  # on host
> Then we get a new RDMA interface, "rxe_eth0", which is able to do
> QEMU RDMA migration.
>
> Also, the loopback (lo) device is able to emulate the RDMA interface
> "rxe_lo"; however, when I tried (years ago) to do RDMA migration over
> this interface (rdma:127.0.0.1:3333), something went wrong, so I gave
> up enabling the RDMA migration qtest at that time.
Thanks, Zhijian.

I'm not sure adding an emulated rdma link is doable on CI systems, though.
Maybe someone more familiar with how CI works can chime in.

Some people got dropped from the cc list for an unknown reason; I'm adding
them back (Fabiano, Peter Maydell, Phil). Let's make sure nobody is dropped
by accident.

I'll try to summarize what is still missing; I think these would be a great
help if we don't want to deprecate rdma migration:
1) Either a CI test covering at least the major RDMA paths, or at least
periodic tests for each QEMU release, will be needed.
2) Some performance tests comparing modern RDMA and NIC devices would be
welcome. The current understanding is that modern NICs can perform
similarly to RDMA, so it's debatable why we still maintain so much
rdma-specific code.
3) No need for solid patchsets for this one, but some plan to improve the
RDMA migration code so that it is not so isolated from the rest of the
protocols.
4) Someone to look after this code for real.
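
As a rough illustration of 1): since most CI runners have no RDMA hardware,
a job would first need to detect whether any RDMA device (real or an rxe
link like the one Zhijian describes) exists, and skip the RDMA case
otherwise. A minimal sketch, assuming RDMA devices show up under the usual
Linux sysfs path /sys/class/infiniband; the helper name rdma_device_count
is hypothetical and not part of QEMU:

```shell
#!/bin/sh
# Hypothetical helper (not part of QEMU): count RDMA devices, hardware or
# Soft-RoCE (rxe), listed under a sysfs directory. Assumes Linux exposes one
# entry per RDMA device under /sys/class/infiniband (the default below).
rdma_device_count() {
    dir=${1:-/sys/class/infiniband}
    if [ -d "$dir" ]; then
        # One directory entry per RDMA device.
        ls "$dir" | wc -l
    else
        # Directory missing: RDMA core not loaded, so no devices.
        echo 0
    fi
}

# Example use in a CI job: skip instead of failing when no device exists.
if [ "$(rdma_device_count)" -eq 0 ]; then
    echo "skip: no RDMA device found (an rxe link could provide one)"
fi
```

Skipping rather than failing keeps the job green on runners without RDMA,
while still exercising the RDMA paths wherever an rxe link can be created.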
For 2) and 3) more info is here:
https://lore.kernel.org/r/ZhWa0YeAb9ySVKD1@x1n
Here 4) may be the most important, as Markus pointed out. We just haven't
gotten there yet in the discussion, but maybe Markus is right that we
should talk about that first.
Thanks,
--
Peter Xu