On Tue, Jan 23, 2018 at 10:02:24AM +0000, Daniel P. Berrange wrote:
On Mon, Jan 22, 2018 at 06:16:04PM +0100, Pavel Hrdina wrote:
> If the connection dies for some reason between D-Bus calls we need
> to properly reconnect because the current connection is not usable
> anymore. Without this the only solution would be to restart the
> libvirt-dbus daemon.
>
> Signed-off-by: Pavel Hrdina <phrdina(a)redhat.com>
> ---
> src/connect.c | 37 +++++++++++++++++++++++++++----------
> 1 file changed, 27 insertions(+), 10 deletions(-)
>
> diff --git a/src/connect.c b/src/connect.c
> index 9de764c..10183f3 100644
> --- a/src/connect.c
> +++ b/src/connect.c
> @@ -4,6 +4,7 @@
> #include "util.h"
>
> #include <errno.h>
> +#include <stdbool.h>
> #include <stdlib.h>
>
> static int virtDBusConnectCredType[] = {
> @@ -34,12 +35,34 @@ static virConnectAuth virtDBusConnectAuth = {
> NULL,
> };
>
> +static void
> +virtDBusConnectClose(virtDBusConnect *connect,
> + bool deregisterEvents)
> +{
> +
> + for (int i = 0; i < VIR_DOMAIN_EVENT_ID_LAST; i += 1) {
> + if (connect->callback_ids[i] >= 0) {
> + if (deregisterEvents) {
> + virConnectDomainEventDeregisterAny(connect->connection,
> + connect->callback_ids[i]);
> + }
> + connect->callback_ids[i] = -1;
> + }
> + }
> +
> + virConnectClose(connect->connection);
I think it is prudent to set connect->connection = NULL at this
point.
Right, I'll fix that.
> static int
> virtDBusConnectOpen(virtDBusConnect *connect,
> sd_bus_error *error)
> {
> - if (connect->connection)
> - return 0;
> + if (connect->connection) {
> + if (virConnectIsAlive(connect->connection))
This means that every single dbus call runs an extra RPC call to ping
the server for liveliness.
That's not a huge problem, but at some point I'd recommend that
just use the close callback to immediately detect connection failure
and close the connection & run background job to re-open it.
Presumably you're going to issue dbus signals for domain lifecycle
events. Using the close callback & job to re-open means your
lifecycle events will start working again in the shortest amount
of time, instead of waiting for the next methd call to detect
the failure.
I knew that this implementation was too easy. I'll prepare a followup
patch to use close callback instead.
Pavel