Hello folks,
I recently discovered that the max_client_requests configuration option
interacts with the keep-alive RPC and can cause a connection timeout, and I
wanted to verify that my understanding is correct.
Let me outline the context. Under certain circumstances, one of the
connected clients in my setup issued multiple concurrent long-running
live-migration requests and reached the default limit of five concurrent
requests. This triggered RX throttling, so the server stopped reading
incoming data from that client's file descriptor. Meanwhile, the server
kept issuing keep-alive "ping" requests but, because of the RX throttling,
never read the client's "pong" responses. As a result, after the default
five "ping" attempts with a five-second interval each, the server concluded
the client was dead and closed the connection with the "connection closed
due to keepalive timeout" message.
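For reference, the limits involved here correspond, as far as I can tell,
to the following libvirtd.conf settings (defaults shown):

```
# /etc/libvirt/libvirtd.conf (default values)
max_client_requests = 5   # concurrent requests per client before RX throttling
keepalive_interval = 5    # seconds between keep-alive "ping" probes
keepalive_count = 5       # unanswered probes before the connection is closed
```

So a client that hits the request limit has roughly keepalive_interval *
keepalive_count = 25 seconds before the server gives up on it.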
The idea of throttling makes perfect sense: the server prevents a single
client from hogging the worker thread pool (and avoids unbounded growth of
its memory footprint, which would occur if libvirtd continued parsing the
incoming data and queuing the requests). What concerns me is that the
server drops the connection for clients that are still alive merely
because they are throttled.
One approach to this problem is to implement EAGAIN-like handling: keep
parsing incoming data above the limit and respond to ordinary requests
with an error, but handle the keep-alive RPCs gracefully. However, I see
two problems here: it is either a backwards-compatibility concern if
implemented unconditionally, or configuration-space pollution if
implemented conditionally.
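To make the proposal concrete, here is a minimal sketch of the dispatch
decision I have in mind. The names (decide_dispatch, the enum values) are
purely illustrative and do not correspond to libvirt's actual internals:

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    DISPATCH_QUEUE,     /* below the limit: parse and queue normally */
    DISPATCH_KEEPALIVE, /* always process keep-alive, even when throttled */
    DISPATCH_REJECT     /* above the limit: parse, reply with an
                         * EAGAIN-like error instead of going unread */
} dispatch_action;

static dispatch_action
decide_dispatch(bool is_keepalive,
                size_t active_requests,
                size_t max_client_requests)
{
    /* Keep-alive traffic bypasses the throttle so the connection
     * is not torn down while the client waits on long-running calls. */
    if (is_keepalive)
        return DISPATCH_KEEPALIVE;

    if (active_requests >= max_client_requests)
        return DISPATCH_REJECT;

    return DISPATCH_QUEUE;
}
```

With this shape, a throttled client's keep-alive probes are still answered,
so the connection survives, while ordinary requests above the limit get an
immediate error rather than being silently left unread on the socket.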
What is the community's opinion on the above issue?
Kind regards,
Ivan