[libvirt] [PATCH 0/3] fix qemu:///session startup errors (bz 1271183)

libguestfs and gnome-boxes users have been reporting intermittent qemu:///session launch errors for a while: https://bugzilla.redhat.com/show_bug.cgi?id=1271183 This series fixes it, see individual patches for details Cole Robinson (3): rpc: socket: Minor cleanups rpc: socket: Explicitly error if we exceed retry count rpc: socket: Don't repeatedly attempt to launch daemon src/rpc/virnetsocket.c | 28 +++++++++++++++++++++------- 1 file changed, 21 insertions(+), 7 deletions(-) -- 2.5.0

- Add some debugging - Make the loop dependent only on retries - Make it explicit that connect(2) success exits the loop - Invert the error checking logic --- src/rpc/virnetsocket.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/src/rpc/virnetsocket.c b/src/rpc/virnetsocket.c index 4f67c8f..80c21c1 100644 --- a/src/rpc/virnetsocket.c +++ b/src/rpc/virnetsocket.c @@ -620,6 +620,9 @@ int virNetSocketNewConnectUNIX(const char *path, char *rundir = NULL; int ret = -1; + VIR_DEBUG("path=%s spawnDaemon=%d binary=%s", path, spawnDaemon, + NULLSTR(binary)); + memset(&localAddr, 0, sizeof(localAddr)); memset(&remoteAddr, 0, sizeof(remoteAddr)); @@ -680,10 +683,15 @@ int virNetSocketNewConnectUNIX(const char *path, if (remoteAddr.data.un.sun_path[0] == '@') remoteAddr.data.un.sun_path[0] = '\0'; - while (retries && - connect(fd, &remoteAddr.data.sa, remoteAddr.len) < 0) { - if (!(spawnDaemon && (errno == ENOENT || - errno == ECONNREFUSED))) { + while (retries) { + if (connect(fd, &remoteAddr.data.sa, remoteAddr.len) == 0) { + VIR_DEBUG("connect() succeeded"); + break; + } + VIR_DEBUG("connect() failed, errno=%d", errno); + + if (!spawnDaemon || + (errno != ENOENT && errno != ECONNREFUSED)) { virReportSystemError(errno, _("Failed to connect socket to '%s'"), path); goto cleanup; -- 2.5.0

When we autolaunch libvirtd for session URIs, we spin in a retry loop waiting for the daemon to start and the connect(2) to succeed. However if we exceed the retry count, we don't explicitly raise an error, which can yield a slew of different error messages elsewhere in the code. Explicitly raise the last connect(2) failure if we run out of retries. --- src/rpc/virnetsocket.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/rpc/virnetsocket.c b/src/rpc/virnetsocket.c index 80c21c1..2d6d44f 100644 --- a/src/rpc/virnetsocket.c +++ b/src/rpc/virnetsocket.c @@ -690,7 +690,9 @@ int virNetSocketNewConnectUNIX(const char *path, } VIR_DEBUG("connect() failed, errno=%d", errno); + retries--; if (!spawnDaemon || + retries == 0 || (errno != ENOENT && errno != ECONNREFUSED)) { virReportSystemError(errno, _("Failed to connect socket to '%s'"), path); @@ -700,7 +702,6 @@ int virNetSocketNewConnectUNIX(const char *path, if (virNetSocketForkDaemon(binary) < 0) goto cleanup; - retries--; usleep(5000); } -- 2.5.0

On Mon, Jan 11, 2016 at 08:22:33PM -0500, Cole Robinson wrote:
When we autolaunch libvirtd for session URIs, we spin in a retry loop waiting for the daemon to start and the connect(2) to succeed.
However if we exceed the retry count, we don't explicitly raise an error, which can yield a slew of different error messages elsewhere in the code.
Explicitly raise the last connect(2) failure if we run out of retries. --- src/rpc/virnetsocket.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/rpc/virnetsocket.c b/src/rpc/virnetsocket.c index 80c21c1..2d6d44f 100644 --- a/src/rpc/virnetsocket.c +++ b/src/rpc/virnetsocket.c @@ -690,7 +690,9 @@ int virNetSocketNewConnectUNIX(const char *path, } VIR_DEBUG("connect() failed, errno=%d", errno);
+ retries--; if (!spawnDaemon || + retries == 0 || (errno != ENOENT && errno != ECONNREFUSED)) { virReportSystemError(errno, _("Failed to connect socket to '%s'"), path);
Good catch, although I would also like to see the 'retries' being used somewhere in a debug message, so that it is visible at least in the debug logs why that happened.

On 01/12/2016 01:31 AM, Martin Kletzander wrote:
On Mon, Jan 11, 2016 at 08:22:33PM -0500, Cole Robinson wrote:
When we autolaunch libvirtd for session URIs, we spin in a retry loop waiting for the daemon to start and the connect(2) to succeed.
However if we exceed the retry count, we don't explicitly raise an error, which can yield a slew of different error messages elsewhere in the code.
Explicitly raise the last connect(2) failure if we run out of retries. --- src/rpc/virnetsocket.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/rpc/virnetsocket.c b/src/rpc/virnetsocket.c index 80c21c1..2d6d44f 100644 --- a/src/rpc/virnetsocket.c +++ b/src/rpc/virnetsocket.c @@ -690,7 +690,9 @@ int virNetSocketNewConnectUNIX(const char *path, } VIR_DEBUG("connect() failed, errno=%d", errno);
+ retries--; if (!spawnDaemon || + retries == 0 || (errno != ENOENT && errno != ECONNREFUSED)) { virReportSystemError(errno, _("Failed to connect socket to '%s'"), path);
Good catch, although I would also like to see the 'retries' being used somewhere in a debug message, so that it is visible at least in the debug logs why that happened.
Good point, I squashed retries into the above VIR_DEBUG message, and pushed this series. Thanks for the review - Cole

On every socket connect(2) attempt we were re-launching session libvirtd, up to 100 times in 5 seconds. This understandably caused some weird load races and intermittent qemu:///session startup failures https://bugzilla.redhat.com/show_bug.cgi?id=1271183 --- src/rpc/virnetsocket.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/src/rpc/virnetsocket.c b/src/rpc/virnetsocket.c index 2d6d44f..368774a 100644 --- a/src/rpc/virnetsocket.c +++ b/src/rpc/virnetsocket.c @@ -619,6 +619,7 @@ int virNetSocketNewConnectUNIX(const char *path, virSocketAddr remoteAddr; char *rundir = NULL; int ret = -1; + bool daemonLaunched = false; VIR_DEBUG("path=%s spawnDaemon=%d binary=%s", path, spawnDaemon, NULLSTR(binary)); @@ -699,8 +700,12 @@ int virNetSocketNewConnectUNIX(const char *path, goto cleanup; } - if (virNetSocketForkDaemon(binary) < 0) - goto cleanup; + if (!daemonLaunched) { + if (virNetSocketForkDaemon(binary) < 0) + goto cleanup; + + daemonLaunched = true; + } usleep(5000); } -- 2.5.0

On Mon, Jan 11, 2016 at 08:22:31PM -0500, Cole Robinson wrote:
libguestfs and gnome-boxes users have been reporting intermittent qemu:///session launch errors for a while:
https://bugzilla.redhat.com/show_bug.cgi?id=1271183
This series fixes it, see individual patches for details
We missed these problems when reworking that part of the code, but looking at them, they look like they were pre-existing (or at least some of them). Anyway, nise to have this cleaned up before the release. ACK and safe for freeze (with the tiny nit adjusted in 2/3).
Cole Robinson (3): rpc: socket: Minor cleanups rpc: socket: Explicitly error if we exceed retry count rpc: socket: Don't repeatedly attempt to launch daemon
src/rpc/virnetsocket.c | 28 +++++++++++++++++++++------- 1 file changed, 21 insertions(+), 7 deletions(-)
-- 2.5.0
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
participants (2)
-
Cole Robinson
-
Martin Kletzander