Re: [libvirt] [dbus RFC 06/11] connect: implement reconnect functionality to libvirt

23 Jan 2018

On Tue, Jan 23, 2018 at 10:02:24AM +0000, Daniel P. Berrange wrote:
...
On Mon, Jan 22, 2018 at 06:16:04PM +0100, Pavel Hrdina wrote:
...
If the connection dies for some reason between D-Bus calls we need
to properly reconnect because the current connection is not usable
anymore.  Without this the only solution would be to restart the
libvirt-dbus daemon.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
---
 src/connect.c | 37 +++++++++++++++++++++++++++----------
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/src/connect.c b/src/connect.c
index 9de764c..10183f3 100644
--- a/src/connect.c
+++ b/src/connect.c
@@ -4,6 +4,7 @@
 #include "util.h"
 #include <errno.h>
+#include <stdbool.h>
 #include <stdlib.h>
 static int virtDBusConnectCredType[] = {
@@ -34,12 +35,34 @@ static virConnectAuth virtDBusConnectAuth = {
     NULL,
 };
+static void
+virtDBusConnectClose(virtDBusConnect *connect,
+                     bool deregisterEvents)
+{
+
+    for (int i = 0; i < VIR_DOMAIN_EVENT_ID_LAST; i += 1) {
+        if (connect->callback_ids[i] >= 0) {
+            if (deregisterEvents) {
+                virConnectDomainEventDeregisterAny(connect->connection,
+                                                   connect->callback_ids[i]);
+            }
+            connect->callback_ids[i] = -1;
+        }
+    }
+
+    virConnectClose(connect->connection);
I think it is prudent to set connect->connection = NULL at this
point.
Right, I'll fix that.
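
A minimal sketch of the suggested fix, written against stand-in types so it compiles without libvirt. Every `sketch_*` name is hypothetical: `sketch_conn_close()` stands in for virConnectClose() and `sketch_deregister()` for virConnectDomainEventDeregisterAny(). The early return on a NULL connection is an extra assumption, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for VIR_DOMAIN_EVENT_ID_LAST. */
#define SKETCH_EVENT_ID_LAST 4

/* Hypothetical stand-in for a libvirt connection object. */
typedef struct {
    int open;          /* 1 while the connection is open */
    int deregistered;  /* number of callbacks deregistered so far */
} sketch_conn;

/* Hypothetical stand-in for virtDBusConnect. */
typedef struct {
    sketch_conn *connection;
    int callback_ids[SKETCH_EVENT_ID_LAST];
} sketch_connect;

/* Stands in for virConnectDomainEventDeregisterAny(). */
static void sketch_deregister(sketch_conn *conn, int callback_id)
{
    (void)callback_id;
    conn->deregistered += 1;
}

/* Stands in for virConnectClose(). */
static void sketch_conn_close(sketch_conn *conn)
{
    conn->open = 0;
}

void sketch_connect_close(sketch_connect *connect, bool deregister_events)
{
    assert(connect != NULL);

    /* Assumption: closing an already-closed connect is a no-op. */
    if (!connect->connection)
        return;

    for (int i = 0; i < SKETCH_EVENT_ID_LAST; i += 1) {
        if (connect->callback_ids[i] >= 0) {
            if (deregister_events)
                sketch_deregister(connect->connection,
                                  connect->callback_ids[i]);
            connect->callback_ids[i] = -1;
        }
    }

    sketch_conn_close(connect->connection);
    /* The point of the review comment: drop the stale pointer so a
     * later open sees "no connection" instead of a closed handle. */
    connect->connection = NULL;
}
```

With the pointer reset, the `if (connect->connection)` check in the open path cannot be fooled by a handle that has already been closed.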
...
...
 static int
 virtDBusConnectOpen(virtDBusConnect *connect,
                     sd_bus_error *error)
 {
-    if (connect->connection)
-        return 0;
+    if (connect->connection) {
+        if (virConnectIsAlive(connect->connection))
This means that every single dbus call runs an extra RPC call to ping
the server for liveness.
That's not a huge problem, but at some point I'd recommend that you
just use the close callback to immediately detect connection failure,
close the connection & run a background job to re-open it.
Presumably you're going to issue dbus signals for domain lifecycle
events.  Using the close callback & job to re-open means your
lifecycle events will start working again in the shortest amount
of time, instead of waiting for the next method call to detect
the failure.
I knew that this implementation was too easy.  I'll prepare a followup
patch to use close callback instead.

Pavel
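
The close-callback approach discussed above could look roughly like the sketch below. It is not the follow-up patch. It uses stand-in types so that it compiles without libvirt, and every `sketch_*` name is hypothetical. In libvirt itself the hook is virConnectRegisterCloseCallback(), whose callback receives the connection, a reason code and an opaque pointer. Here the "background job" is modeled as a plain function that an event loop would call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct sketch_conn sketch_conn;

/* Same shape as libvirt's close callback: connection, reason, opaque. */
typedef void (*sketch_close_cb)(sketch_conn *conn, int reason, void *opaque);

/* Hypothetical stand-in for a libvirt connection object. */
struct sketch_conn {
    bool alive;
    sketch_close_cb on_close;
    void *opaque;
};

/* Hypothetical stand-in for virtDBusConnect. */
typedef struct {
    sketch_conn *connection;
    bool reconnect_pending;  /* set by the close callback */
    int opens;               /* how many times we (re)opened */
} sketch_connect;

/* Backing storage for the single fake connection. */
static sketch_conn sketch_storage;

/* Close callback: note the failure immediately and request a re-open.
 * A real implementation would also release the libvirt connection
 * reference; that step is omitted from this sketch. */
static void sketch_on_close(sketch_conn *conn, int reason, void *opaque)
{
    sketch_connect *connect = opaque;
    (void)conn;
    (void)reason;
    connect->connection = NULL;
    connect->reconnect_pending = true;
}

/* Open path: no per-call liveness ping, only a pointer check. */
static int sketch_open(sketch_connect *connect)
{
    assert(connect != NULL);
    if (connect->connection)
        return 0;

    sketch_storage.alive = true;
    sketch_storage.on_close = sketch_on_close;  /* register the callback */
    sketch_storage.opaque = connect;
    connect->connection = &sketch_storage;
    connect->reconnect_pending = false;
    connect->opens += 1;
    return 0;
}

/* Simulates the library noticing that the connection died. */
static void sketch_simulate_failure(sketch_conn *conn)
{
    conn->alive = false;
    if (conn->on_close)
        conn->on_close(conn, 0, conn->opaque);
}

/* Body of the background job that re-opens the connection. */
static void sketch_reconnect_job(sketch_connect *connect)
{
    if (connect->reconnect_pending)
        sketch_open(connect);
}
```

The trade-off matches the review: the open path no longer pays an RPC round trip per D-Bus call, and event delivery can resume as soon as the job runs rather than on the next method call.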