Hey, Dave!
On Wed, Jun 05, 2024 at 12:31:56AM +0000, Dr. David Alan Gilbert wrote:
> * Michael Galaxy (mgalaxy@akamai.com) wrote:
> > One thing to keep in mind here (despite me not having any hardware
> > to test) was that one of the original goals here in the RDMA
> > implementation was not simply raw throughput nor raw latency, but a
> > lack of CPU utilization in kernel space due to the offload. While it
> > is entirely possible that newer hardware w/ TCP might compete, the
> > significant reductions in CPU usage in the TCP/IP stack were a big
> > win at the time.
> >
> > Just something to consider while you're doing the testing...
> I just noticed this thread; some random notes from a somewhat
> fragmented memory of this:
>
> a) Long long ago, I also tried rsocket;
>    https://lists.gnu.org/archive/html/qemu-devel/2015-01/msg02040.html
>    as I remember the library was quite flaky at the time.

Hmm, interesting. It also looks like there's a thread doing rpoll().
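For reference, rsockets mirror the BSD socket calls one-for-one, so I'd
expect that thread to sit in something like the below (a minimal sketch;
rs_fd/buf/timeout_ms are placeholders, not anything in the series):

    #include <poll.h>
    #include <rdma/rsocket.h>   /* librdmacm's rsocket API */

    /* Same shape as poll(): rsocket()/rconnect()/rsend()/rrecv()/rpoll()
     * stand in for the usual socket()/connect()/send()/recv()/poll(). */
    struct pollfd pfd = { .fd = rs_fd, .events = POLLIN };

    int ready = rpoll(&pfd, 1, timeout_ms);
    if (ready > 0 && (pfd.revents & POLLIN)) {
        ssize_t n = rrecv(rs_fd, buf, buflen, 0);
        /* ... feed n bytes into the migration stream ... */
    }
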
Btw, not sure whether you noticed, but there's the series posted for the
latest rsocket conversion here:
https://lore.kernel.org/r/1717503252-51884-1-git-send-email-arei.gonglei@...
I hope Lei and his team have tested >4G mem; otherwise it's definitely
worth checking. Lei also mentioned in the cover letter that they found
rsocket bugs, but I'm not sure what those are about.

> b) A lot of the complexity in the rdma migration code comes from
>    emulating a stream to carry the migration control data and
>    interleaving that with the actual RAM copy. I believe the original
>    design used a separate TCP socket for the control data, and just
>    used the RDMA for the data - that should be a lot simpler (but alas
>    was rejected in review early on).
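
That split would also be easy to reason about: the control stream only
needs to carry small descriptors while the pages move over RDMA. A
hypothetical sketch of what such a control message could look like
(ram_ctrl_msg and its fields are made up for illustration, not the
current protocol):

    #include <stdint.h>

    /* Hypothetical control message, sent over a plain TCP stream; the
     * page contents themselves travel only over the RDMA channel. */
    struct ram_ctrl_msg {
        uint64_t guest_addr;  /* start of the page range */
        uint64_t len;         /* length of the range in bytes */
        uint32_t rkey;        /* rkey of the registered region */
        uint32_t flags;       /* e.g. end-of-iteration marker */
    };

    /* Source: send(tcp_fd, &msg, sizeof(msg), 0), then post RDMA writes
     * for the data - no stream emulation needed on the RDMA side. */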

> c) I can't remember the last benchmarks I did; but I think I did
>    manage to beat RDMA with multifd; but yes, multifd does eat host
>    CPU whereas RDMA barely uses a whisper.

I think my first impression on this matter came from you, on exactly
this point. :)

> d) The 'zero-copy-send' option in migrate may well get some of that
>    CPU time back; but if I remember we were still bottlenecked on
>    the receive side. (I can't remember if zero-copy-send worked with
>    multifd?)

Yes, and zero-copy requires multifd for now. I think it's because we
didn't want to complicate the header processing in the migration
stream, where the data may not be page aligned.
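
For context, zero-copy-send is plain kernel MSG_ZEROCOPY underneath,
which is exactly why it wants whole, untouched pages rather than
header-interleaved writes; the mechanism is roughly (minimal sketch,
error handling omitted, fd/page are placeholders):

    #include <sys/socket.h>
    #include <linux/errqueue.h>  /* SO_EE_ORIGIN_ZEROCOPY completions */

    int one = 1;
    setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one));

    /* Kernel pins and transmits from the page directly; it must stay
     * untouched until the completion arrives. */
    send(fd, page, page_size, MSG_ZEROCOPY);

    /* Completions come back on the socket error queue. */
    char control[128];
    struct msghdr msg = { .msg_control = control,
                          .msg_controllen = sizeof(control) };
    recvmsg(fd, &msg, MSG_ERRQUEUE);

Multifd fits that model nicely because the data channels send whole
pages with no headers mixed in.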

> e) Someone made a good suggestion (sorry can't remember who) - that the
>    RDMA migration structure was the wrong way around - it should be the
>    destination which initiates an RDMA read, rather than the source
>    doing a write; then things might become a LOT simpler; you just need
>    to send page ranges to the destination and it can pull it.
>    That might work nicely for postcopy.
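
For concreteness, the pull model on the destination side would mostly
be posting reads against whatever ranges the source advertises; a
minimal verbs sketch (qp, mr, and the remote addr/rkey are all assumed
to come from that page-range message):

    #include <stdint.h>
    #include <infiniband/verbs.h>

    /* Destination pulls one advertised range into its own RAM block. */
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_ram,  /* where the pages land */
        .length = range_len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr = {
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_RDMA_READ,
        .send_flags = IBV_SEND_SIGNALED, /* completion = data landed */
        .wr.rdma.remote_addr = src_addr, /* from the page-range message */
        .wr.rdma.rkey        = src_rkey,
    };
    struct ibv_send_wr *bad = NULL;
    ibv_post_send(qp, &wr, &bad);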

I'm not sure whether it'll still be a problem if the rdma recv side is
based on zero-copy. It would be a matter of whether atomicity can be
guaranteed, so that the guest vcpus never see a partially copied page
during in-flight DMAs. UFFDIO_COPY (or friends) is currently the only
solution for that.
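
For reference, that atomic placement is a single ioctl today; roughly
(minimal sketch, error handling omitted, uffd/guest_addr/recv_buf are
placeholders):

    #include <sys/ioctl.h>
    #include <linux/userfaultfd.h>

    /* Installs the page atomically: a vcpu faulting on dst either waits
     * or sees the full copy, never partial contents. */
    struct uffdio_copy copy = {
        .dst  = (unsigned long)guest_addr,  /* faulting guest page */
        .src  = (unsigned long)recv_buf,    /* page we just received */
        .len  = page_size,
        .mode = 0,                          /* 0 = wake the faulting vcpu */
    };
    ioctl(uffd, UFFDIO_COPY, &copy);

With a zero-copy receive straight into guest memory there'd be no such
atomic install step, hence the concern.
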
Thanks,
--
Peter Xu