Re: [libvirt] [PATCH 2/4] virNetSocket: Be more safe with fork() around virNetSocketDupFD()

26 Sep 2018

On 9/26/18 5:46 AM, Michal Privoznik wrote:
...
On 09/25/2018 06:25 PM, John Ferlan wrote:
...
On 9/21/18 5:29 AM, Michal Privoznik wrote:
...
If there was a caller which would dup the client FD without
CLOEXEC flag and later decided to change the flag it wouldn't be
safe to do because fork() might have had occurred meantime.
blank line
...
Switch to the other pattern - always dup FD with the flag set and
let callers clear the flag if they need to do so.
I don't see this last sentence as happening in this patch...
...
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/libxl/libxl_migration.c |  4 ++--
 src/qemu/qemu_migration.c   |  2 +-
 src/rpc/virnetclient.c      | 10 +++++++++-
 src/rpc/virnetsocket.c      |  7 ++-----
 src/rpc/virnetsocket.h      |  2 +-
 5 files changed, 15 insertions(+), 10 deletions(-)
I'm with Marc on this - documenting virNetSocketDupFD is a good thing.
Although it could be considered out of the scope of what you're doing,
since you are changing the semantics describing the nuance that fork
brings about w/r/t fd duplication in the code rather than the commit
message doesn't make someone have to go back and look at the commit
message to gain more insight.  It also should provide the reader with
the knowledge of what virSetInherit would provide to remove if so desired.
Of course while you're at it, documenting virNetClientDupFD similarly
would be quite beneficial.
...

diff --git a/src/libxl/libxl_migration.c b/src/libxl/libxl_migration.c
index fc7ccb53d0..5eb8eb1203 100644
--- a/src/libxl/libxl_migration.c
+++ b/src/libxl/libxl_migration.c
@@ -310,7 +310,7 @@ libxlMigrateDstReceive(virNetSocketPtr sock,
     }
     VIR_DEBUG("Accepted migration connection."
               "  Spawning thread to process migration data");
-    recvfd = virNetSocketDupFD(client_sock, true);
+    recvfd = virNetSocketDupFD(client_sock);
     virObjectUnref(client_sock);
/*
@@ -1254,7 +1254,7 @@ libxlDomainMigrationSrcPerform(libxlDriverPrivatePtr driver,
         goto cleanup;
     }
-    sockfd = virNetSocketDupFD(sock, true);
+    sockfd = virNetSocketDupFD(sock);
     virObjectUnref(sock);
if (virDomainLockProcessPause(driver->lockManager, vm, &priv->lockState) < 0)
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index 825a9d399b..129be0a11a 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -3303,7 +3303,7 @@ qemuMigrationSrcConnect(virQEMUDriverPtr driver,
     if (virNetSocketNewConnectTCP(host, port,
                                   AF_UNSPEC,
                                   &sock) == 0) {
-        spec->dest.fd.qemu = virNetSocketDupFD(sock, true);
+        spec->dest.fd.qemu = virNetSocketDupFD(sock);
         virObjectUnref(sock);
     }
     if (qemuSecurityClearSocketLabel(driver->securityManager, vm->def) < 0 ||
diff --git a/src/rpc/virnetclient.c b/src/rpc/virnetclient.c
index b4d8fb2187..40ed3fde6d 100644
--- a/src/rpc/virnetclient.c
+++ b/src/rpc/virnetclient.c
@@ -715,8 +715,16 @@ int virNetClientGetFD(virNetClientPtr client)
 int virNetClientDupFD(virNetClientPtr client, bool cloexec)
 {
     int fd;
+
     virObjectLock(client);
-    fd = virNetSocketDupFD(client->sock, cloexec);
+
+    fd = virNetSocketDupFD(client->sock);
+    if (!cloexec && fd >= 0 && virSetInherit(fd, true)) {
+        virReportSystemError(errno, "%s",
+                             _("Cannot disable close-on-exec flag"));
+        VIR_FORCE_CLOSE(fd);
+    }
+
Caveat: my sock knowledge of setting or duplication of the socket fd
isn't that deep, especially as it relates to fork "mechanics".
So prior to this change, the call to virNetSocketDupFD(fd, false) would:
fd = dup(sock->fd);
Or essentially create a new socket fd from the existing one copying
everything it already has...
With this change you have (discounting some new error paths)
fd = fcntl(sock->fd, F_DUPFD_CLOEXEC, 0);
        fcntl(fd, F_SETFD, ~FD_CLOEXEC)
which seems to be a nop and/or unnecessary to me.
Beyond that if the sock->fd didn't have CLOEXEC already, then what
impact does briefly changing CLOEXEC have on something that arrives
while set?
So imagine the following happening. There are two threads A,B and they
do the following:
thread A        |  thread B   | child
    ----------------+-------------+--------
    copy = dup(fd); |             |
                    |  fork();    |
    fcntl(copy);    |             |
                    |             |  exec();
(the fcntl() call is there to set CLOEXEC flag)
This will obviously not work as expected because when B does fork() both
@copy and @fd descriptors are copied to child. So even when A then tries
to set CLOEXEC flag it won't matter as it doesn't affect FDs inside
child. In the end, @copy is leaked to a process that child exec()-s.
So I'm going to have to read more docs then, <sigh>. Is there any
question why documenting what these functions do and what edge
conditions are being avoided is so important? IOW: Which way to call the
function based on what the caller really wants.

I see that the dup() man page says it's a copy of the original except
that the FD_CLOEXEC is not set and fcntl(F_DUPFD_CLOEXEC) is a copy of
the original except that FD_CLOEXEC is set. That would also seem to
imply to me that fcntl(F_DUPFD) would be the same as dup(), but who
knows at this point.

So let's go back to your example - "thread A" calls virNetSocketDupFD:

    if (cloexec)
        fd = fcntl(sock->fd, F_DUPFD_CLOEXEC, 0);
    else
        fd = dup(sock->fd);

So using your logic, that means @cloexec == false meaning the caller
wants a duplicate socket without the FD_CLOEXEC. Then you note that "the
fcntl() call is there to set CLOEXEC flag" - so that means to me someone
called the function with @false and then set the flag afterwards. Which
as you note is a problem, which I agree; however, isn't that a problem
with the caller? If the caller called w/ @cloexec == true and then
disabled - they'd have the same issue. That's why the code is written as
it is, isn't it?

I'm trying to consider this patch on it's own merits and now you're
requiring every fd to be duplicated with the CLOEXEC flag and then
afterwards it's cleared if the virNetClientDupFD consumer wanted it
cleared. It seems to me that this patch is adding and unnecessary step
that causes the threadB/child processing problem.

Maybe I just don't know enough about this socket duplication processing,
so hopefully you'll find someone that does. Because I'm totally missing
the point.

John
...
My patches make this safer:
thread A          |  thread B   | child
    ------------------+-------------+--------
    copy = fcntl(fd); |             |
                      |  XXX        |
                      |  fork();    |
                      |             |  exec();
(here, the fcntl() call is to atomically duplicate the @fd and set
CLOEXEC flag)
It's clean to see that no matter where fork() is placed, @copy is not
leaked into the new process. Either @copy exists in child but it's
closed on exec() (because of the flag), or fork() happened before
fcntl() call and there is no @copy in child at all.
The XXX can be replaced with virSetInherit() if the code intentionally
wants to leak @copy into exec().
...
And of course if sock->fd had CLOEXEC and then you clear it on the
'shared' socket, that doesn't seem right either.
In fact it does. When starting a domain, we fork() plenty of times.
Also, fork() happens in secdriver transactions, when fetching qemu caps,
on every virCommandRun(), etc. What I'm trying to say is that fork() +
exec() happen pretty asynchronously and therefore we can't do dup() +
fcntl() because that is not atomic (as shown above).
In fact, it's fairly easy to see this problem in action. What was
sufficient for me was to start two domains at once from a loop. As I say
in the cover letter. While one thread was doing the metada lock (where
connection FD to virtlockd is duplicated), the other did fork() to start
qemu and the connection FD got leaked there.
...
Seems to me the dup() does the trick and then lets the caller decide
what CLOEXEC semantics they want with the duplicated fd without perhaps
changing the sock->fd semantics.
No, this is not atomic and therefore it's plain wrong.
Michal