
On Tue, Jan 23, 2018 at 10:02:24AM +0000, Daniel P. Berrange wrote:
On Mon, Jan 22, 2018 at 06:16:04PM +0100, Pavel Hrdina wrote:
If the connection dies for some reason between D-Bus calls we need to properly reconnect because the current connection is not usable anymore. Without this the only solution would be to restart the libvirt-dbus daemon.
Signed-off-by: Pavel Hrdina <phrdina@redhat.com> --- src/connect.c | 37 +++++++++++++++++++++++++++---------- 1 file changed, 27 insertions(+), 10 deletions(-)
diff --git a/src/connect.c b/src/connect.c index 9de764c..10183f3 100644 --- a/src/connect.c +++ b/src/connect.c @@ -4,6 +4,7 @@ #include "util.h"
#include <errno.h> +#include <stdbool.h> #include <stdlib.h>
static int virtDBusConnectCredType[] = { @@ -34,12 +35,34 @@ static virConnectAuth virtDBusConnectAuth = { NULL, };
+static void +virtDBusConnectClose(virtDBusConnect *connect, + bool deregisterEvents) +{ + + for (int i = 0; i < VIR_DOMAIN_EVENT_ID_LAST; i += 1) { + if (connect->callback_ids[i] >= 0) { + if (deregisterEvents) { + virConnectDomainEventDeregisterAny(connect->connection, + connect->callback_ids[i]); + } + connect->callback_ids[i] = -1; + } + } + + virConnectClose(connect->connection);
I think it is prudent to set connect->connection = NULL at this point.
Right, I'll fix that.
static int virtDBusConnectOpen(virtDBusConnect *connect, sd_bus_error *error) { - if (connect->connection) - return 0; + if (connect->connection) { + if (virConnectIsAlive(connect->connection))
This means that every single dbus call runs an extra RPC call to ping the server for liveliness.
That's not a huge problem, but at some point I'd recommend that just use the close callback to immediately detect connection failure and close the connection & run background job to re-open it.
Presumably you're going to issue dbus signals for domain lifecycle events. Using the close callback & job to re-open means your lifecycle events will start working again in the shortest amount of time, instead of waiting for the next methd call to detect the failure.
I knew that this implementation was too easy. I'll prepare a followup patch to use close callback instead. Pavel