
* Peter Xu (peterx@redhat.com) wrote:
Hey, Dave!
Hey!
On Wed, Jun 05, 2024 at 12:31:56AM +0000, Dr. David Alan Gilbert wrote:
* Michael Galaxy (mgalaxy@akamai.com) wrote:
One thing to keep in mind here (despite me not having any hardware to test) was that one of the original goals here in the RDMA implementation was not simply raw throughput nor raw latency, but a lack of CPU utilization in kernel space due to the offload. While it is entirely possible that newer hardware w/ TCP might compete, the significant reductions in CPU usage in the TCP/IP stack were a big win at the time.
Just something to consider while you're doing the testing........
I just noticed this thread; some random notes from a somewhat fragmented memory of this:
a) Long long ago, I also tried rsocket: https://lists.gnu.org/archive/html/qemu-devel/2015-01/msg02040.html. As I remember, the library was quite flaky at the time.
Hmm, interesting. There also looks to be a thread doing rpoll().
Yeh, I can't actually remember much more about what I did back then!
Btw, not sure whether you noticed, but there's the series posted for the latest rsocket conversion here:
https://lore.kernel.org/r/1717503252-51884-1-git-send-email-arei.gonglei@hua...
Oh I hadn't; I think all of the stack of qemu's file abstractions had changed in the ~10 years since I wrote my version!
I hope Lei and his team have tested >4G mem; otherwise it's definitely worth checking. Lei also mentioned in the cover letter that they found rsocket bugs, but I'm not sure what those are about.
It would probably be a good idea to keep track of what bugs are in flight with it, and try it on a few RDMA cards to see what problems get triggered. I think I reported a few at the time, but I gave up after feeling it was getting very hacky.
Yes, and zero-copy requires multifd for now. I think it's because we didn't want to complicate the header processing in the migration stream, where data may not be page aligned.
Ah yes.
e) Someone made a good suggestion (sorry can't remember who) - that the RDMA migration structure was the wrong way around - it should be the destination which initiates an RDMA read, rather than the source doing a write; then things might become a LOT simpler; you just need to send page ranges to the destination and it can pull it. That might work nicely for postcopy.
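To make the "send page ranges and let the destination pull" idea a bit more concrete, here is a rough sketch of what the destination side might look like. Everything in it is hypothetical: `struct page_range`, `fake_rdma_read()` and `pull_range()` are illustration-only names, not QEMU or libibverbs APIs, and a plain memcpy() stands in for the RDMA read so the sketch stays self-contained.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical descriptor the source would send: a run of guest pages. */
struct page_range {
    uint64_t first_page;  /* index of the first page in the run */
    uint64_t npages;      /* number of consecutive pages */
};

/*
 * Stand-in for an RDMA read. In a real setup the destination would post
 * a read work request against the source's registered memory; here it is
 * just a local copy (assumption for illustration).
 */
static void fake_rdma_read(uint8_t *local, const uint8_t *remote, size_t len)
{
    memcpy(local, remote, len);
}

/*
 * Destination-driven pull: given a range announced by the source, fetch
 * those pages directly. Returns the number of bytes pulled, or 0 if the
 * range does not fit inside guest RAM.
 */
static size_t pull_range(uint8_t *dst_ram, const uint8_t *src_ram,
                         uint64_t total_pages, struct page_range r)
{
    /* Bounds check written to avoid overflow in first_page + npages. */
    if (r.first_page > total_pages || r.npages > total_pages - r.first_page) {
        return 0;
    }

    size_t off = (size_t)r.first_page * SKETCH_PAGE_SIZE;
    size_t len = (size_t)r.npages * SKETCH_PAGE_SIZE;

    fake_rdma_read(dst_ram + off, src_ram + off, len);
    return len;
}
```

The point of the shape is that the source only has to describe ranges; the destination decides when to fetch them, which is what would make it a natural fit for postcopy.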
I'm not sure whether it'll still be a problem if the rdma recv side is based on zero-copy. It would be a matter of whether atomicity can be guaranteed, so that the guest vcpus never see a partially copied page during in-flight DMAs. UFFDIO_COPY (or friends) is currently the only solution for that.
Yes, but even ignoring that (and the UFFDIO_CONTINUE idea you mention), if the destination can issue an RDMA read itself, it doesn't need to send messages to the source to ask for a page fetch; it just goes and grabs it itself, that's got to be good for latency.
Dave
Thanks,
-- Peter Xu
-- 
 -----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/