Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

5 Jun 2024


      * Peter Xu (peterx@redhat.com) wrote:
...
Hey, Dave!
Hey!
...
On Wed, Jun 05, 2024 at 12:31:56AM +0000, Dr. David Alan Gilbert wrote:
...
* Michael Galaxy (mgalaxy@akamai.com) wrote:
...
One thing to keep in mind here (despite me not having any hardware to test)
was that one of the original goals here
in the RDMA implementation was not simply raw throughput nor raw latency,
but a lack of CPU utilization in kernel
space due to the offload. While it is entirely possible that newer hardware
w/ TCP might compete, the significant
reductions in CPU usage in the TCP/IP stack were a big win at the time.
Just something to consider while you're doing the testing.
I just noticed this thread; some random notes from a somewhat
fragmented memory of this:
a) Long long ago, I also tried rsocket;
      https://lists.gnu.org/archive/html/qemu-devel/2015-01/msg02040.html
     as I remember the library was quite flaky at the time.
Hmm, interesting.  It also looks like there's a thread doing rpoll().
Yeh, I can't actually remember much more about what I did back then!
...
Btw, not sure whether you noticed, but there's the series posted for the
latest rsocket conversion here:
https://lore.kernel.org/r/1717503252-51884-1-git-send-email-arei.gonglei@hua...
Oh I hadn't; I think all of the stack of qemu's file abstractions had
changed in the ~10 years since I wrote my version!
...
I hope Lei and his team have tested >4G mem, otherwise it's definitely worth
checking.  Lei also mentioned in the cover letter that they found rsocket
bugs, but I'm not sure what those are about.
It would probably be a good idea to keep track of what bugs
are in flight with it, and try it on a few RDMA cards to see
what problems get triggered.
I think I reported a few at the time, but I gave up after
feeling it was getting very hacky.
...
Yes, and zero-copy requires multifd for now. I think it's because we didn't
want to complicate the header processing in the migration stream, where the
data may not be page aligned.
Ah yes.
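
To illustrate the alignment point above, here is a minimal sketch. It is
an assumption-laden toy, not QEMU's actual wire format: the 8-byte header
length and the helper names are hypothetical. It shows only that when a
header precedes every page in one byte stream, the page payloads stop
landing on page-aligned offsets, whereas a channel carrying bare pages
keeps them aligned.

```python
PAGE_SIZE = 4096  # assumed page size for the illustration


def payload_offsets(header_len, n_pages, page_size=PAGE_SIZE):
    """Return the stream offset of each page payload when every page is
    preceded by a header of header_len bytes (hypothetical layout)."""
    offsets = []
    pos = 0
    for _ in range(n_pages):
        pos += header_len      # skip the per-page header
        offsets.append(pos)    # payload starts here
        pos += page_size       # skip the payload itself
    return offsets


def aligned(offsets, page_size=PAGE_SIZE):
    """True for each offset that is a multiple of the page size."""
    return [off % page_size == 0 for off in offsets]


# Headers interleaved with page data in a single stream.
interleaved = payload_offsets(header_len=8, n_pages=3)
# Page data carried on its own, with headers sent elsewhere.
separate = payload_offsets(header_len=0, n_pages=3)

print(interleaved, aligned(interleaved))
print(separate, aligned(separate))
```

With these made-up numbers the interleaved payloads start at offsets 8,
4112 and 8216, none of which is page aligned.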
...
e) Someone made a good suggestion (sorry can't remember who) - that the
     RDMA migration structure was the wrong way around - it should be the
     destination which initiates an RDMA read, rather than the source
     doing a write; then things might become a LOT simpler; you just need
     to send page ranges to the destination and it can pull it.
     That might work nicely for postcopy.
I'm not sure whether it'll still be a problem if the rdma recv side is based on
zero-copy.  It would be a matter of whether atomicity can be guaranteed, so
that the guest vcpus never see a partially copied page during in-flight
DMAs.  UFFDIO_COPY (or friends) is currently the only solution for that.
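
The atomicity concern can be shown with a toy model. This is only a
sketch under stated assumptions: ToyGuestMemory and both helper functions
are hypothetical, and nothing here calls userfaultfd or RDMA. It contrasts
data landing directly in an already-mapped page, where a reader can
observe a half-written page, with filling a private staging buffer and
publishing the page in one step, which is roughly the property that
UFFDIO_COPY gives.

```python
PAGE_SIZE = 4096  # assumed page size for the illustration


class ToyGuestMemory:
    """Hypothetical stand-in for guest RAM: page number -> contents.
    A missing entry models a page that is not mapped yet."""

    def __init__(self):
        self.pages = {}

    def read(self, pfn):
        # None models "the vcpu would fault and wait for the page".
        return self.pages.get(pfn)


def direct_write_observed_midway(mem, pfn, data):
    """Write into an already-mapped page in two halves and return what a
    reader would see between the two halves."""
    half = len(data) // 2
    mem.pages[pfn] = bytearray(len(data))   # mapped, zero-filled
    mem.pages[pfn][:half] = data[:half]     # first half has landed
    seen = bytes(mem.pages[pfn])            # reader looks now: torn page
    mem.pages[pfn][half:] = data[half:]     # second half lands
    return seen


def staged_install_observed_midway(mem, pfn, data):
    """Fill a private buffer in two halves, then publish the page in one
    step. Return what a reader would see between the two halves."""
    half = len(data) // 2
    staging = bytearray(len(data))
    staging[:half] = data[:half]
    seen = mem.read(pfn)                    # page still unmapped: None
    staging[half:] = data[half:]
    mem.pages[pfn] = bytes(staging)         # single publish step
    return seen


page = b"\xab" * PAGE_SIZE
mem = ToyGuestMemory()
torn = direct_write_observed_midway(mem, 0, page)
early = staged_install_observed_midway(mem, 1, page)
```

In the first case the reader sees half real data and half zeroes; in the
second it sees either nothing (and would wait) or the whole page.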
Yes, but even ignoring that (and the UFFDIO_CONTINUE idea you mention), if
the destination can issue an RDMA read itself, it doesn't need to send messages
to the source to ask for a page fetch; it just goes and grabs it itself,
that's got to be good for latency.
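
A back-of-the-envelope sketch of that latency argument follows. Everything
in it is an assumption for illustration: the function names, the
microsecond figures, and the simplification that the only difference
between the two paths is the time the source spends handling the request
in software. Transfer time and NIC processing are ignored.

```python
def fetch_via_request_us(one_way_us, source_handling_us):
    """Destination sends a page request, the source software handles it,
    then the page travels back (hypothetical model)."""
    return one_way_us + source_handling_us + one_way_us


def fetch_via_rdma_read_us(one_way_us):
    """Destination issues the RDMA read itself; in this model the source
    side adds no software handling time."""
    return 2 * one_way_us


# Made-up numbers, in microseconds.
ONE_WAY_US = 5
SOURCE_HANDLING_US = 20

print(fetch_via_request_us(ONE_WAY_US, SOURCE_HANDLING_US))
print(fetch_via_rdma_read_us(ONE_WAY_US))
```

Under this model the saving equals the source-side handling time, so the
benefit depends entirely on how large that term is in practice.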

Dave
...
Thanks,
-- 
Peter Xu
-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/