On 09/08/2014 01:46 AM, Martin Kletzander wrote:
This way it behaves more like the daemon itself does (acquiring a
pidfile, deleting the socket before binding, etc.).
Resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=927369
Resolves:
https://bugzilla.redhat.com/show_bug.cgi?id=1138604
Signed-off-by: Martin Kletzander <mkletzan(a)redhat.com>
---
src/rpc/virnetsocket.c | 57 ++++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 51 insertions(+), 6 deletions(-)
diff --git a/src/rpc/virnetsocket.c b/src/rpc/virnetsocket.c
index 42184bd..18a5a8f 100644
--- a/src/rpc/virnetsocket.c
+++ b/src/rpc/virnetsocket.c
@@ -51,9 +51,11 @@
#include "virlog.h"
#include "virfile.h"
#include "virthread.h"
+#include "virpidfile.h"
#include "virprobe.h"
#include "virprocess.h"
#include "virstring.h"
+#include "dirname.h"
#include "passfd.h"
#if WITH_SSH2
@@ -544,7 +546,10 @@ int virNetSocketNewConnectUNIX(const char *path,
const char *binary,
virNetSocketPtr *retsock)
{
+ char *binname = NULL;
+ char *pidpath = NULL;
int fd, passfd = -1;
+ int pidfd = -1;
virSocketAddr localAddr;
virSocketAddr remoteAddr;
@@ -583,16 +588,45 @@ int virNetSocketNewConnectUNIX(const char *path,
goto error;
}
+ if (!(binname = last_component(binary)) || binname[0] == '\0') {
+ virReportError(VIR_ERR_INTERNAL_ERROR,
+ _("Cannot determine basename for binary
'%s'"),
+ binary);
+ goto error;
+ }
+
+ if (virPidFileConstructPath(false, NULL, binname, &pidpath) < 0)
+ goto error;
+
+ if ((pidfd = virPidFileAcquirePath(pidpath, false, getpid())) < 0) {
+ /*
+ * This can happen in a very rare case of two clients spawning two
+ * daemons at the same time, and the error in the logs that gets
+ * reset here can be a clue to some future debugging.
+ */
+ virResetLastError();
+ spawnDaemon = false;
Since this is the failure path of connect(), the pidpath will be lost if
the subsequent connect() succeeds.
+ goto retry;
+ }
+
if ((passfd = socket(PF_UNIX, SOCK_STREAM, 0)) < 0) {
virReportSystemError(errno, "%s", _("Failed to create
socket"));
goto error;
}
/*
- * We have to fork() here, because umask() is set
- * per-process, chmod() is racy and fchmod() has undefined
- * behaviour on sockets according to POSIX, so it doesn't
- * work outside Linux.
+ * We already even acquired the pidfile, so noone else should be using
s/noone/no one/
+ * @path right now. So we're OK to unlink it and paying
attention to
+ * the return value makes no real sense here. Only if it's not an
+ * abstract socket, of course.
+ */
+ if (path[0] != '@')
+ unlink(path);
+
+ /*
+ * We have to fork() here, because umask() is set per-process, chmod()
+ * is racy and fchmod() has undefined behaviour on sockets according to
+ * POSIX, so it doesn't work outside Linux.
*/
if ((pid = virFork()) < 0)
goto error;
@@ -612,8 +646,9 @@ int virNetSocketNewConnectUNIX(const char *path,
/*
* OK, so the subprocces failed to bind() the socket. This may mean
Not related but, s/subprocces/subprocess [or I guess technically child]
* that another daemon was starting at the same time and
succeeded
- * with its bind(). So we'll try connecting again, but this time
- * without spawning the daemon.
+ * with its bind() (even though it should not happen because we
+ * using a pidfile for the race). So we'll try connecting again,
+ * but this time without spawning the daemon.
*/
spawnDaemon = false;
goto retry;
Like in the previous goto, since this is the failure path of a connect()
and if the subsequent one succeeds, then 'pidfd', 'pidpath',
'passfd'
are all leaked.
@@ -632,6 +667,12 @@ int virNetSocketNewConnectUNIX(const char
*path,
goto error;
}
+ /*
+ * Do we need to eliminate the super-rare race here any more? It would
+ * need incorporating the following VIR_FORCE_CLOSE() into a
+ * virCommandHook inside a virNetSocketForkDaemon().
+ */
+ VIR_FORCE_CLOSE(pidfd);
Well - you caused me to look... it's interesting to note that
"virNetSocketForkDaemon()" is bounded by "#ifndef WIN32", but it's
a bit
different here. Yes, I know it says *UNIX in the prototype, but yet
virNetSocketNewConnectUNIX is bounded by "#ifdef HAVE_SYS_UN_H"...
Of course any time there's a "timing window" available - something will
eventually find it (whether or not there a full-moon, on the 13th of the
month, when tides are extraordinarily high, or whatever other strange
thing happens when software goes bad).
Would another option to tighten the window be to pass the 'pidfd' (by
reference to ensure a VIR_FREE(*pidfd); does what you expect) to the
call and do the close "later" or closer to the condition causing the
race? Optionally adjust the code to return 'cmd' and run from here
assuming that's where the race is.
Perhaps eblake will have an opinion and thoughts on the 'super-rare'
condition...
if (virNetSocketForkDaemon(binary, passfd) < 0)
goto error;
}
@@ -648,8 +689,12 @@ int virNetSocketNewConnectUNIX(const char *path,
return 0;
error:
+ if (pidfd > 0)
+ virPidFileDeletePath(pidpath);
Is '0' a valid/possible 'pidfd'? Other places use pidfd != -1 and
virPidFileReleasePath() which does the CLOSE as well and changes the
order to avoid a different race.
John
+ VIR_FREE(pidpath);
VIR_FORCE_CLOSE(fd);
VIR_FORCE_CLOSE(passfd);
+ VIR_FORCE_CLOSE(pidfd);
if (spawnDaemon)
unlink(path);
return -1;