Re: [libvirt] [PATCH v2 1/2] rpc: Report proper close reason for keepalive disconnections

5 Dec 2014

On Mon, Dec 01, 2014 at 12:00:52 +0100, Martin Kletzander wrote:
...
Whenever client socket was disconnected due to keepalive timeout, the
I/O event loop did not exit and continued until the point where the
hangup was found.  Ending with an error right away when the
keepalive times out takes care of this problem.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
---
 src/rpc/virkeepalive.c | 11 ++++++-----
 src/rpc/virnetclient.c |  1 +
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/src/rpc/virkeepalive.c b/src/rpc/virkeepalive.c
index c882313..f382f93 100644
--- a/src/rpc/virkeepalive.c
+++ b/src/rpc/virkeepalive.c
@@ -136,11 +136,12 @@ virKeepAliveTimerInternal(virKeepAlivePtr ka,
           ka, ka->client, ka->countToDeath, timeval);
if (ka->countToDeath == 0) {
-        VIR_WARN("No response from client %p after %d keepalive messages in"
-                 " %d seconds",
-                 ka->client,
-                 ka->count,
-                 timeval);
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("No response from client %p after %d keepalive messages in"
+                         " %d seconds"),
+                       ka->client,
+                       ka->count,
+                       timeval);
         return true;
     } else {
         ka->countToDeath--;
diff --git a/src/rpc/virnetclient.c b/src/rpc/virnetclient.c
index 8657b0e..ac4850c 100644
--- a/src/rpc/virnetclient.c
+++ b/src/rpc/virnetclient.c
@@ -1525,6 +1525,7 @@ static int virNetClientIOEventLoop(virNetClientPtr client,
if (virKeepAliveTrigger(client->keepalive, &msg)) {
             virNetClientMarkClose(client, VIR_CONNECT_CLOSE_REASON_KEEPALIVE);
+            goto error;
         } else if (msg && virNetClientQueueNonBlocking(client, msg) < 0) {
             VIR_WARN("Could not queue keepalive request");
             virNetMessageFree(msg);
I don't think this patch is quite right. The first hunk is OK, but the
second one is problematic. We don't have an event loop here (sigh) and
thus we use the passing the buck algorithm to process requests from
several threads. So in general, in case of any error (not just keepalive
timeout), we should make sure we process all possibly pending data on
the socket first and let completed requests finish. Only unfinished
requests should report the error. We should first finish the rest of the
loop and only jump to the error path once we processed possibly
completed requests. Moreover we should make sure the right error is
reported by all threads, i.e., we either need to carry the original
error in client and copy it from there to the thread local error object
or each thread needs to report the error again (which is the current
approach with "received hangup / error event on socket" message at the
end of the loop).

Welcome to the client's IO loop code :-)

Jirka

    

Re: [libvirt] [PATCH v2 1/2] rpc: Report proper close reason for keepalive disconnections

Jiri Denemark