Re: [libvirt] [PATCH 2/4] virNetSocket: Be more safe with fork() around virNetSocketDupFD()

26 Sep 2018

On 09/25/2018 06:25 PM, John Ferlan wrote:
...
On 9/21/18 5:29 AM, Michal Privoznik wrote:
...
If there was a caller which would dup the client FD without
CLOEXEC flag and later decided to change the flag it wouldn't be
safe to do because fork() might have had occurred meantime.
blank line
...
Switch to the other pattern - always dup FD with the flag set and
let callers clear the flag if they need to do so.
I don't see this last sentence as happening in this patch...
...
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/libxl/libxl_migration.c |  4 ++--
 src/qemu/qemu_migration.c   |  2 +-
 src/rpc/virnetclient.c      | 10 +++++++++-
 src/rpc/virnetsocket.c      |  7 ++-----
 src/rpc/virnetsocket.h      |  2 +-
 5 files changed, 15 insertions(+), 10 deletions(-)
I'm with Marc on this - documenting virNetSocketDupFD is a good thing.
Although it could be considered out of the scope of what you're doing,
since you are changing the semantics describing the nuance that fork
brings about w/r/t fd duplication in the code rather than the commit
message doesn't make someone have to go back and look at the commit
message to gain more insight.  It also should provide the reader with
the knowledge of what virSetInherit would provide to remove if so desired.
Of course while you're at it, documenting virNetClientDupFD similarly
would be quite beneficial.
...

diff --git a/src/libxl/libxl_migration.c b/src/libxl/libxl_migration.c
index fc7ccb53d0..5eb8eb1203 100644
--- a/src/libxl/libxl_migration.c
+++ b/src/libxl/libxl_migration.c
@@ -310,7 +310,7 @@ libxlMigrateDstReceive(virNetSocketPtr sock,
     }
     VIR_DEBUG("Accepted migration connection."
               "  Spawning thread to process migration data");
-    recvfd = virNetSocketDupFD(client_sock, true);
+    recvfd = virNetSocketDupFD(client_sock);
     virObjectUnref(client_sock);
/*
@@ -1254,7 +1254,7 @@ libxlDomainMigrationSrcPerform(libxlDriverPrivatePtr driver,
         goto cleanup;
     }
-    sockfd = virNetSocketDupFD(sock, true);
+    sockfd = virNetSocketDupFD(sock);
     virObjectUnref(sock);
if (virDomainLockProcessPause(driver->lockManager, vm, &priv->lockState) < 0)
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index 825a9d399b..129be0a11a 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -3303,7 +3303,7 @@ qemuMigrationSrcConnect(virQEMUDriverPtr driver,
     if (virNetSocketNewConnectTCP(host, port,
                                   AF_UNSPEC,
                                   &sock) == 0) {
-        spec->dest.fd.qemu = virNetSocketDupFD(sock, true);
+        spec->dest.fd.qemu = virNetSocketDupFD(sock);
         virObjectUnref(sock);
     }
     if (qemuSecurityClearSocketLabel(driver->securityManager, vm->def) < 0 ||
diff --git a/src/rpc/virnetclient.c b/src/rpc/virnetclient.c
index b4d8fb2187..40ed3fde6d 100644
--- a/src/rpc/virnetclient.c
+++ b/src/rpc/virnetclient.c
@@ -715,8 +715,16 @@ int virNetClientGetFD(virNetClientPtr client)
 int virNetClientDupFD(virNetClientPtr client, bool cloexec)
 {
     int fd;
+
     virObjectLock(client);
-    fd = virNetSocketDupFD(client->sock, cloexec);
+
+    fd = virNetSocketDupFD(client->sock);
+    if (!cloexec && fd >= 0 && virSetInherit(fd, true)) {
+        virReportSystemError(errno, "%s",
+                             _("Cannot disable close-on-exec flag"));
+        VIR_FORCE_CLOSE(fd);
+    }
+
Caveat: my sock knowledge of setting or duplication of the socket fd
isn't that deep, especially as it relates to fork "mechanics".
So prior to this change, the call to virNetSocketDupFD(fd, false) would:
fd = dup(sock->fd);
Or essentially create a new socket fd from the existing one copying
everything it already has...
With this change you have (discounting some new error paths)
fd = fcntl(sock->fd, F_DUPFD_CLOEXEC, 0);
        fcntl(fd, F_SETFD, ~FD_CLOEXEC)
which seems to be a nop and/or unnecessary to me.
Beyond that if the sock->fd didn't have CLOEXEC already, then what
impact does briefly changing CLOEXEC have on something that arrives
while set?
So imagine the following happening. There are two threads A,B and they
do the following:

    thread A        |  thread B   | child
    ----------------+-------------+--------
    copy = dup(fd); |             |
                    |  fork();    |
    fcntl(copy);    |             |
                    |             |  exec();

(the fcntl() call is there to set CLOEXEC flag)

This will obviously not work as expected because when B does fork() both
@copy and @fd descriptors are copied to child. So even when A then tries
to set CLOEXEC flag it won't matter as it doesn't affect FDs inside
child. In the end, @copy is leaked to a process that child exec()-s.


My patches make this safer:

    thread A          |  thread B   | child
    ------------------+-------------+--------
    copy = fcntl(fd); |             |
                      |  XXX        |
                      |  fork();    |
                      |             |  exec();

(here, the fcntl() call is to atomically duplicate the @fd and set
CLOEXEC flag)

It's clean to see that no matter where fork() is placed, @copy is not
leaked into the new process. Either @copy exists in child but it's
closed on exec() (because of the flag), or fork() happened before
fcntl() call and there is no @copy in child at all.

The XXX can be replaced with virSetInherit() if the code intentionally
wants to leak @copy into exec().
...
And of course if sock->fd had CLOEXEC and then you clear it on the
'shared' socket, that doesn't seem right either.
In fact it does. When starting a domain, we fork() plenty of times.
Also, fork() happens in secdriver transactions, when fetching qemu caps,
on every virCommandRun(), etc. What I'm trying to say is that fork() +
exec() happen pretty asynchronously and therefore we can't do dup() +
fcntl() because that is not atomic (as shown above).

In fact, it's fairly easy to see this problem in action. What was
sufficient for me was to start two domains at once from a loop. As I say
in the cover letter. While one thread was doing the metada lock (where
connection FD to virtlockd is duplicated), the other did fork() to start
qemu and the connection FD got leaked there.
...
Seems to me the dup() does the trick and then lets the caller decide
what CLOEXEC semantics they want with the duplicated fd without perhaps
changing the sock->fd semantics.
No, this is not atomic and therefore it's plain wrong.

Michal