[libvirt] [PATCH v2 0/2] virsh: Don't wait for reconnection when it's unnecessary

Before when disconnected due to keepalive timeout: $ virsh -k1 -K1 list 2014-12-01 10:58:42.725+0000: 1643: info : libvirt version: 1.2.11 2014-12-01 10:58:42.725+0000: 1643: warning : virKeepAliveTimerInternal:143 : No response from client 0x7fa8b7c46f70 after 1 keepalive messages in 2 seconds 2014-12-01 10:58:42.725+0000: 1644: warning : virKeepAliveTimerInternal:143 : No response from client 0x7fa8b7c46f70 after 1 keepalive messages in 2 seconds error: Failed to list domains * virsh hangs here until reconnected or killed, if reconnected: error: internal error: received hangup / error event on socket After (the same scenario): $ virsh -k1 -K1 list error: Failed to list domains error: internal error: No response from client 0x7ff0e9545f70 after 1 keepalive messages in 2 seconds * virsh doesn't hang :) Martin Kletzander (2): rpc: Report proper close reason for keepalive disconnections virsh: Don't reconnect after the command when disconnected src/rpc/virkeepalive.c | 11 ++++++----- src/rpc/virnetclient.c | 1 + tools/virsh.c | 3 --- 3 files changed, 7 insertions(+), 8 deletions(-) -- 2.1.3

Whenever client socket was disconnected due to keepalive timeout, the I/O event loop did not exit and continued until the point where the hangup was found. Ending with an error right away when the keepalive times out takes care of this problem. Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/rpc/virkeepalive.c | 11 ++++++----- src/rpc/virnetclient.c | 1 + 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/src/rpc/virkeepalive.c b/src/rpc/virkeepalive.c index c882313..f382f93 100644 --- a/src/rpc/virkeepalive.c +++ b/src/rpc/virkeepalive.c @@ -136,11 +136,12 @@ virKeepAliveTimerInternal(virKeepAlivePtr ka, ka, ka->client, ka->countToDeath, timeval); if (ka->countToDeath == 0) { - VIR_WARN("No response from client %p after %d keepalive messages in" - " %d seconds", - ka->client, - ka->count, - timeval); + virReportError(VIR_ERR_INTERNAL_ERROR, + _("No response from client %p after %d keepalive messages in" + " %d seconds"), + ka->client, + ka->count, + timeval); return true; } else { ka->countToDeath--; diff --git a/src/rpc/virnetclient.c b/src/rpc/virnetclient.c index 8657b0e..ac4850c 100644 --- a/src/rpc/virnetclient.c +++ b/src/rpc/virnetclient.c @@ -1525,6 +1525,7 @@ static int virNetClientIOEventLoop(virNetClientPtr client, if (virKeepAliveTrigger(client->keepalive, &msg)) { virNetClientMarkClose(client, VIR_CONNECT_CLOSE_REASON_KEEPALIVE); + goto error; } else if (msg && virNetClientQueueNonBlocking(client, msg) < 0) { VIR_WARN("Could not queue keepalive request"); virNetMessageFree(msg); -- 2.1.3

On Mon, Dec 01, 2014 at 12:00:52 +0100, Martin Kletzander wrote:
Whenever client socket was disconnected due to keepalive timeout, the I/O event loop did not exit and continued until the point where the hangup was found. Ending with an error right away when the keepalive times out takes care of this problem.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/rpc/virkeepalive.c | 11 ++++++----- src/rpc/virnetclient.c | 1 + 2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/src/rpc/virkeepalive.c b/src/rpc/virkeepalive.c index c882313..f382f93 100644 --- a/src/rpc/virkeepalive.c +++ b/src/rpc/virkeepalive.c @@ -136,11 +136,12 @@ virKeepAliveTimerInternal(virKeepAlivePtr ka, ka, ka->client, ka->countToDeath, timeval);
if (ka->countToDeath == 0) { - VIR_WARN("No response from client %p after %d keepalive messages in" - " %d seconds", - ka->client, - ka->count, - timeval); + virReportError(VIR_ERR_INTERNAL_ERROR, + _("No response from client %p after %d keepalive messages in" + " %d seconds"), + ka->client, + ka->count, + timeval); return true; } else { ka->countToDeath--; diff --git a/src/rpc/virnetclient.c b/src/rpc/virnetclient.c index 8657b0e..ac4850c 100644 --- a/src/rpc/virnetclient.c +++ b/src/rpc/virnetclient.c @@ -1525,6 +1525,7 @@ static int virNetClientIOEventLoop(virNetClientPtr client,
if (virKeepAliveTrigger(client->keepalive, &msg)) { virNetClientMarkClose(client, VIR_CONNECT_CLOSE_REASON_KEEPALIVE); + goto error; } else if (msg && virNetClientQueueNonBlocking(client, msg) < 0) { VIR_WARN("Could not queue keepalive request"); virNetMessageFree(msg);
I don't think this patch is quite right. The first hunk is OK, but the second one is problematic. We don't have an event loop here (sigh) and thus we use the passing the buck algorithm to process requests from several threads. So in general, in case of any error (not just keepalive timeout), we should make sure we process all possibly pending data on the socket first and let completed requests finish. Only unfinished requests should report the error. We should first finish the rest of the loop and only jump to the error path once we processed possibly completed requests. Moreover we should make sure the right error is reported by all threads, i.e., we either need to carry the original error in client and copy it from there to the thread local error object or each thread needs to report the error again (which is the current approach with "received hangup / error event on socket" message at the end of the loop). Welcome to the client's IO loop code :-) Jirka

On Fri, Dec 05, 2014 at 12:30:17PM +0100, Jiri Denemark wrote:
On Mon, Dec 01, 2014 at 12:00:52 +0100, Martin Kletzander wrote:
Whenever client socket was disconnected due to keepalive timeout, the I/O event loop did not exit and continued until the point where the hangup was found. Ending with an error right away when the keepalive times out takes care of this problem.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/rpc/virkeepalive.c | 11 ++++++----- src/rpc/virnetclient.c | 1 + 2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/src/rpc/virkeepalive.c b/src/rpc/virkeepalive.c index c882313..f382f93 100644 --- a/src/rpc/virkeepalive.c +++ b/src/rpc/virkeepalive.c @@ -136,11 +136,12 @@ virKeepAliveTimerInternal(virKeepAlivePtr ka, ka, ka->client, ka->countToDeath, timeval);
if (ka->countToDeath == 0) { - VIR_WARN("No response from client %p after %d keepalive messages in" - " %d seconds", - ka->client, - ka->count, - timeval); + virReportError(VIR_ERR_INTERNAL_ERROR, + _("No response from client %p after %d keepalive messages in" + " %d seconds"), + ka->client, + ka->count, + timeval); return true; } else { ka->countToDeath--; diff --git a/src/rpc/virnetclient.c b/src/rpc/virnetclient.c index 8657b0e..ac4850c 100644 --- a/src/rpc/virnetclient.c +++ b/src/rpc/virnetclient.c @@ -1525,6 +1525,7 @@ static int virNetClientIOEventLoop(virNetClientPtr client,
if (virKeepAliveTrigger(client->keepalive, &msg)) { virNetClientMarkClose(client, VIR_CONNECT_CLOSE_REASON_KEEPALIVE); + goto error; } else if (msg && virNetClientQueueNonBlocking(client, msg) < 0) { VIR_WARN("Could not queue keepalive request"); virNetMessageFree(msg);
I don't think this patch is quite right. The first hunk is OK, but the second one is problematic. We don't have an event loop here (sigh) and thus we use the passing the buck algorithm to process requests from several threads. So in general, in case of any error (not just keepalive timeout), we should make sure we process all possibly pending data on the socket first and let completed requests finish. Only unfinished requests should report the error. We should first finish the rest of the loop and only jump to the error path once we processed possibly completed requests. Moreover we should make sure the right error is reported by all threads, i.e., we either need to carry the original error in client and copy it from there to the thread local error object or each thread needs to report the error again (which is the current approach with "received hangup / error event on socket" message at the end of the loop).
Welcome to the client's IO loop code :-)
So if I understand it properly, I cannot use virReportError(), I cannot finish the execution with any data left in the socket to read... What *can* I do? :D As you ACKed the second patch, which can go in without this one, would you be OK with it going it with the previous series as well (all together), i.e.: https://www.redhat.com/archives/libvir-list/2014-November/msg01081.html ttps://www.redhat.com/archives/libvir-list/2014-November/msg01082.html *and* 2/2 of this v2? Martin

Each command that needs a connection causes a new connection to be made. Reconnecting after a command failed is pointless, mainly when there is no other command to run. Removeing three lines of code takes care of that and keeps virsh working as it should. Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- tools/virsh.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/tools/virsh.c b/tools/virsh.c index bcfa561..0ead9ae 100644 --- a/tools/virsh.c +++ b/tools/virsh.c @@ -1958,9 +1958,6 @@ vshCommandRun(vshControl *ctl, const vshCmd *cmd) if (!ret) vshReportError(ctl); - if (!ret && disconnected != 0) - vshReconnect(ctl); - if (STREQ(cmd->def->name, "quit") || STREQ(cmd->def->name, "exit")) /* hack ... */ return ret; -- 2.1.3

On Mon, Dec 01, 2014 at 12:00:53 +0100, Martin Kletzander wrote:
Each command that needs a connection causes a new connection to be made. Reconnecting after a command failed is pointless, mainly when there is no other command to run. Removeing three lines of code takes care of that and keeps virsh working as it should.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- tools/virsh.c | 3 --- 1 file changed, 3 deletions(-)
diff --git a/tools/virsh.c b/tools/virsh.c index bcfa561..0ead9ae 100644 --- a/tools/virsh.c +++ b/tools/virsh.c @@ -1958,9 +1958,6 @@ vshCommandRun(vshControl *ctl, const vshCmd *cmd) if (!ret) vshReportError(ctl);
- if (!ret && disconnected != 0) - vshReconnect(ctl); - if (STREQ(cmd->def->name, "quit") || STREQ(cmd->def->name, "exit")) /* hack ... */ return ret;
ACK Jirka

On Fri, Dec 05, 2014 at 12:32:28PM +0100, Jiri Denemark wrote:
On Mon, Dec 01, 2014 at 12:00:53 +0100, Martin Kletzander wrote:
Each command that needs a connection causes a new connection to be made. Reconnecting after a command failed is pointless, mainly when there is no other command to run. Removeing three lines of code takes care of that and keeps virsh working as it should.
Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- tools/virsh.c | 3 --- 1 file changed, 3 deletions(-)
diff --git a/tools/virsh.c b/tools/virsh.c index bcfa561..0ead9ae 100644 --- a/tools/virsh.c +++ b/tools/virsh.c @@ -1958,9 +1958,6 @@ vshCommandRun(vshControl *ctl, const vshCmd *cmd) if (!ret) vshReportError(ctl);
- if (!ret && disconnected != 0) - vshReconnect(ctl); - if (STREQ(cmd->def->name, "quit") || STREQ(cmd->def->name, "exit")) /* hack ... */ return ret;
ACK
Pushed, thanks. Martin
participants (2)
-
Jiri Denemark
-
Martin Kletzander