[PATCH v2 0/5] Improve closing of FDs for child processes

This is a v2 of: https://listman.redhat.com/archives/libvir-list/2023-June/240351.html Hopefully, I've implemented all Dan's suggestions. Michal Prívozník (5): virfile: Introduce virCloseRange() virfile: Introduce virCloseFrom() vircommand: Unify mass FD closing vircommand: Introduce virCommandMassCloseRange() src: Detect close_range syscall during virGlobalInit() src/libvirt.c | 4 + src/libvirt_private.syms | 4 + src/util/vircommand.c | 160 ++++++++++++++++++++++++--------------- src/util/virfile.c | 110 +++++++++++++++++++++++++++ src/util/virfile.h | 5 ++ tests/commandtest.c | 2 + 6 files changed, 222 insertions(+), 63 deletions(-) -- 2.41.0

Linux gained new close_range() syscall (in v5.9) that allows closing a range of FDs in a single syscall. Ideally, we would use it to close FDs when spawning a process (e.g. via virCommand module). Glibc has close_range() wrapper over the syscall, which falls back to iterative closing of all FDs inside the range if running under older kernel. We don't wan that as in that case we might just close opened FDs (see Linux version of virCommandMassClose()). And musl doesn't have close_range() at all. Therefore, call syscall directly. Now, mass close of FDs happen in a fork()-ed off child. While it could detect whether the kernel does support close_range(), it has no way of passing this info back to the parent and thus each child would need to query it again and again. Since this can't change while we are running we can cache the information - hence virCloseRangeInit(). Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 3 ++ src/util/virfile.c | 89 ++++++++++++++++++++++++++++++++++++++++ src/util/virfile.h | 4 ++ 3 files changed, 96 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index da60c965dd..3782f7f3c7 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2269,6 +2269,9 @@ saferead; safewrite; safezero; virBuildPathInternal; +virCloseRange; +virCloseRangeInit; +virCloseRangeIsSupported; virDirClose; virDirCreate; virDirIsEmpty; diff --git a/src/util/virfile.c b/src/util/virfile.c index fe456596ae..7696910e00 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -87,6 +87,7 @@ #include "virlog.h" #include "virprocess.h" #include "virstring.h" +#include "virthread.h" #include "virutil.h" #include "virsocket.h" @@ -109,6 +110,9 @@ VIR_LOG_INIT("util.file"); # define O_DIRECT 0 #endif +static virOnceControl virCloseRangeOnce = VIR_ONCE_CONTROL_INITIALIZER; +static bool virCloseRangeSupported; + int virFileClose(int *fdptr, virFileCloseFlags flags) { int saved_errno = 0; @@ -176,6 +180,91 @@ FILE *virFileFdopen(int *fdptr, const char *mode) } +static int +virCloseRangeImpl(unsigned int first G_GNUC_UNUSED, + unsigned int last G_GNUC_UNUSED) +{ +#if defined(WITH_SYS_SYSCALL_H) && defined(__NR_close_range) + return syscall(__NR_close_range, first, last, 0); +#endif + + errno = ENOSYS; + return -1; +} + + +static void +virCloseRangeOnceInit(void) +{ + int fd[2] = { -1, -1}; + + if (virPipeQuiet(fd) < 0) + return; + + VIR_FORCE_CLOSE(fd[1]); + if (virCloseRangeImpl(fd[0], fd[0]) < 0) { + VIR_FORCE_CLOSE(fd[0]); + return; + } + + virCloseRangeSupported = true; +} + + +/** + * virCloseRange: + * + * Closes all open file descriptors from @first to @last (included). + * + * Returns: 0 on success, + * -1 on failure (with errno set). + */ +int +virCloseRange(unsigned int first, + unsigned int last) +{ + if (virCloseRangeInit() < 0) + return -1; + + if (!virCloseRangeSupported) { + errno = ENOSYS; + return -1; + } + + return virCloseRangeImpl(first, last); +} + + +/** + * virCloseRangeInit: + * + * Detects whether close_range() is available and cache the result. + */ +int +virCloseRangeInit(void) +{ + if (virOnce(&virCloseRangeOnce, virCloseRangeOnceInit) < 0) + return -1; + + return 0; +} + + +/** + * virCloseRangeIsSupported: + * + * Returns whether close_range() is supported or not. + */ +bool +virCloseRangeIsSupported(void) +{ + if (virCloseRangeInit() < 0) + return false; + + return virCloseRangeSupported; +} + + /** * virFileDirectFdFlag: * diff --git a/src/util/virfile.h b/src/util/virfile.h index 60bb1d64e7..be0b02fdf0 100644 --- a/src/util/virfile.h +++ b/src/util/virfile.h @@ -61,6 +61,10 @@ static inline void virForceCloseHelper(int *fd) ignore_value(virFileClose(fd, VIR_FILE_CLOSE_PRESERVE_ERRNO)); } +int virCloseRange(unsigned int from, unsigned int to); +int virCloseRangeInit(void); +bool virCloseRangeIsSupported(void); + /* For use on normal paths; caller must check return value, and failure sets errno per close. */ #define VIR_CLOSE(FD) virFileClose(&(FD), 0) -- 2.41.0

On a Tuesday in 2023, Michal Privoznik wrote:
Linux gained new close_range() syscall (in v5.9) that allows closing a range of FDs in a single syscall. Ideally, we would use it to close FDs when spawning a process (e.g. via virCommand module).
Glibc has close_range() wrapper over the syscall, which falls back to iterative closing of all FDs inside the range if running under older kernel. We don't wan that as in that case we might
want
just close opened FDs (see Linux version of virCommandMassClose()). And musl doesn't have close_range() at all. Therefore, call syscall directly.
Now, mass close of FDs happen in a fork()-ed off child. While it
happens
could detect whether the kernel does support close_range(), it has no way of passing this info back to the parent and thus each child would need to query it again and again.
Since this can't change while we are running we can cache the information - hence virCloseRangeInit().
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 3 ++ src/util/virfile.c | 89 ++++++++++++++++++++++++++++++++++++++++ src/util/virfile.h | 4 ++ 3 files changed, 96 insertions(+)
diff --git a/src/util/virfile.c b/src/util/virfile.c index fe456596ae..7696910e00 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c +static void +virCloseRangeOnceInit(void) +{ + int fd[2] = { -1, -1};
Uneven spacing.
+ + if (virPipeQuiet(fd) < 0) + return; + + VIR_FORCE_CLOSE(fd[1]); + if (virCloseRangeImpl(fd[0], fd[0]) < 0) { + VIR_FORCE_CLOSE(fd[0]); + return; + } + + virCloseRangeSupported = true; +} + +
Reviewed-by: Ján Tomko <jtomko@redhat.com> Jano

It is handy to close all FDs from given FD to infinity. On FreeBSD the libc even has a function for that: closefrom(). It was ported to glibc too, but not musl. At least glibc implementation falls back to calling: close_range(from, ~0U, 0); Now that we have a wrapper for close_range() we implement closefrom() trivially. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virfile.c | 21 +++++++++++++++++++++ src/util/virfile.h | 1 + 3 files changed, 23 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 3782f7f3c7..9477a07834 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2269,6 +2269,7 @@ saferead; safewrite; safezero; virBuildPathInternal; +virCloseFrom; virCloseRange; virCloseRangeInit; virCloseRangeIsSupported; diff --git a/src/util/virfile.c b/src/util/virfile.c index 7696910e00..c74bdd1264 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -265,6 +265,27 @@ virCloseRangeIsSupported(void) } +/** + * virCloseFrom: + * + * Closes all open file descriptors greater than or equal to @fromfd. + * + * Returns: 0 on success, + * -1 on error (with errno set). + */ +int +virCloseFrom(int fromfd) +{ +#ifdef __FreeBSD__ + /* FreeBSD has closefrom() since FreeBSD-8.0, i.e. since 2009. */ + closefrom(fromfd); + return 0; +#else /* !__FreeBSD__ */ + return virCloseRange(fromfd, ~0U); +#endif /* !__FreeBSD__ */ +} + + /** * virFileDirectFdFlag: * diff --git a/src/util/virfile.h b/src/util/virfile.h index be0b02fdf0..adc032ba33 100644 --- a/src/util/virfile.h +++ b/src/util/virfile.h @@ -64,6 +64,7 @@ static inline void virForceCloseHelper(int *fd) int virCloseRange(unsigned int from, unsigned int to); int virCloseRangeInit(void); bool virCloseRangeIsSupported(void); +int virCloseFrom(int fromfd); /* For use on normal paths; caller must check return value, and failure sets errno per close. */ -- 2.41.0

We have two version of mass FD closing: one for FreeBSD (because it has closefrom()) and the other for everything else. But now that we have closefrom() wrapper even for Linux, we can unify these two. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/util/vircommand.c | 85 +++++++++++++------------------------------ 1 file changed, 25 insertions(+), 60 deletions(-) diff --git a/src/util/vircommand.c b/src/util/vircommand.c index 49abb53c28..867f45b57b 100644 --- a/src/util/vircommand.c +++ b/src/util/vircommand.c @@ -526,60 +526,6 @@ virCommandMassCloseGetFDsGeneric(virCommand *cmd G_GNUC_UNUSED, } # endif /* !__linux__ */ -# ifdef __FreeBSD__ - -static int -virCommandMassClose(virCommand *cmd, - int childin, - int childout, - int childerr) -{ - int lastfd = -1; - int fd = -1; - size_t i; - - /* - * Two phases of closing. - * - * The first (inefficient) phase iterates over FDs, - * preserving certain FDs we need to pass down, and - * closing others. The number of iterations is bounded - * to the number of the biggest FD we need to preserve. - * - * The second (speedy) phase uses closefrom() to cull - * all remaining FDs in the process. - * - * Usually the first phase will be fairly quick only - * processing a handful of low FD numbers, and thus using - * closefrom() is a massive win for high ulimit() NFILES - * values. - */ - lastfd = MAX(lastfd, childin); - lastfd = MAX(lastfd, childout); - lastfd = MAX(lastfd, childerr); - - for (i = 0; i < cmd->npassfd; i++) - lastfd = MAX(lastfd, cmd->passfd[i].fd); - - for (fd = 0; fd <= lastfd; fd++) { - if (fd == childin || fd == childout || fd == childerr) - continue; - if (!virCommandFDIsSet(cmd, fd)) { - int tmpfd = fd; - VIR_MASS_CLOSE(tmpfd); - } else if (virSetInherit(fd, true) < 0) { - virReportSystemError(errno, _("failed to preserve fd %1$d"), fd); - return -1; - } - } - - closefrom(lastfd + 1); - - return 0; -} - -# else /* ! __FreeBSD__ */ - static int virCommandMassClose(virCommand *cmd, int childin, @@ -588,7 +534,9 @@ virCommandMassClose(virCommand *cmd, { g_autoptr(virBitmap) fds = NULL; int openmax = sysconf(_SC_OPEN_MAX); + int lastfd = -1; int fd = -1; + size_t i; /* In general, it is not safe to call malloc() between fork() and exec() * because the child might have forked at the worst possible time, i.e. @@ -605,16 +553,23 @@ virCommandMassClose(virCommand *cmd, fds = virBitmapNew(openmax); -# ifdef __linux__ +# ifdef __linux__ if (virCommandMassCloseGetFDsLinux(cmd, fds) < 0) return -1; -# else +# else if (virCommandMassCloseGetFDsGeneric(cmd, fds) < 0) return -1; -# endif +# endif + + lastfd = MAX(lastfd, childin); + lastfd = MAX(lastfd, childout); + lastfd = MAX(lastfd, childerr); + + for (i = 0; i < cmd->npassfd; i++) + lastfd = MAX(lastfd, cmd->passfd[i].fd); fd = virBitmapNextSetBit(fds, 2); - for (; fd >= 0; fd = virBitmapNextSetBit(fds, fd)) { + for (; fd >= 0 && fd <= lastfd; fd = virBitmapNextSetBit(fds, fd)) { if (fd == childin || fd == childout || fd == childerr) continue; if (!virCommandFDIsSet(cmd, fd)) { @@ -626,11 +581,21 @@ virCommandMassClose(virCommand *cmd, } } + if (virCloseFrom(lastfd + 1) < 0) { + if (errno != ENOSYS) + return -1; + + if (fd > 0) { + for (; fd >= 0; fd = virBitmapNextSetBit(fds, fd)) { + int tmpfd = fd; + VIR_MASS_CLOSE(tmpfd); + } + } + } + return 0; } -# endif /* ! __FreeBSD__ */ - /* * virExec: -- 2.41.0

This is brand new way of closing FDs before exec(). We need to close all FDs except those we want to explicitly pass to avoid leaking FDs into the child. Historically, we've done this by either iterating over all opened FDs and closing them one by one (or preserving them), or by iterating over an FD interval [2 ... N] and closing them one by one followed by calling closefrom(N + 1). This is a lot of syscalls. That's why Linux kernel developers introduced new close_from syscall. It closes all FDs within given range, in a single syscall. Since we keep list of FDs we want to preserve and pass to the child process, we can use this syscall to close all FDs in between. We don't even need to care about opened FDs. Of course, we have to check whether the syscall is available and fall back to the old implementation if it isn't. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/util/vircommand.c | 77 ++++++++++++++++++++++++++++++++++++++++--- tests/commandtest.c | 2 ++ 2 files changed, 75 insertions(+), 4 deletions(-) diff --git a/src/util/vircommand.c b/src/util/vircommand.c index 867f45b57b..5f094c625a 100644 --- a/src/util/vircommand.c +++ b/src/util/vircommand.c @@ -527,10 +527,10 @@ virCommandMassCloseGetFDsGeneric(virCommand *cmd G_GNUC_UNUSED, # endif /* !__linux__ */ static int -virCommandMassClose(virCommand *cmd, - int childin, - int childout, - int childerr) +virCommandMassCloseFrom(virCommand *cmd, + int childin, + int childout, + int childerr) { g_autoptr(virBitmap) fds = NULL; int openmax = sysconf(_SC_OPEN_MAX); @@ -597,6 +597,75 @@ virCommandMassClose(virCommand *cmd, } +static int +virCommandMassCloseRange(virCommand *cmd, + int childin, + int childout, + int childerr) +{ + g_autoptr(virBitmap) fds = virBitmapNew(0); + ssize_t first; + ssize_t last; + size_t i; + + virBitmapSetBitExpand(fds, childin); + virBitmapSetBitExpand(fds, childout); + virBitmapSetBitExpand(fds, childerr); + + for (i = 0; i < cmd->npassfd; i++) { + int fd = cmd->passfd[i].fd; + + virBitmapSetBitExpand(fds, fd); + + if (virSetInherit(fd, true) < 0) { + virReportSystemError(errno, _("failed to preserve fd %1$d"), fd); + return -1; + } + } + + first = 2; + while ((last = virBitmapNextSetBit(fds, first)) >= 0) { + if (first + 1 == last) { + first = last; + continue; + } + + /* Preserve @first and @last and close everything in between. */ + if (virCloseRange(first + 1, last - 1) < 0) { + virReportSystemError(errno, + _("Unable to mass close FDs (first=%1$zd, last=%2$zd)"), + first + 1, last - 1); + return -1; + } + + first = last; + } + + if (virCloseRange(first + 1, ~0U) < 0) { + virReportSystemError(errno, + _("Unable to mass close FDs (first=%1$zd, last=%2$d"), + first + 1, ~0U); + return -1; + } + + return 0; +} + + + +static int +virCommandMassClose(virCommand *cmd, + int childin, + int childout, + int childerr) +{ + if (virCloseRangeIsSupported()) + return virCommandMassCloseRange(cmd, childin, childout, childerr); + + return virCommandMassCloseFrom(cmd, childin, childout, childerr); +} + + /* * virExec: * @cmd virCommand * containing all information about the program to diff --git a/tests/commandtest.c b/tests/commandtest.c index 688cf59160..aa108ce583 100644 --- a/tests/commandtest.c +++ b/tests/commandtest.c @@ -1247,6 +1247,8 @@ mymain(void) setpgid(0, 0); ignore_value(setsid()); + virCloseRangeInit(); + /* Our test expects particular fd values; to get that, we must not * leak fds that we inherited from a lazy parent. At the same * time, virInitialize may open some fds (perhaps via third-party -- 2.41.0

The whole purpose of virCloseRangeInit() is to be called somewhere during initialization (ideally before first virExec() or virCommandRun()), so that the rest of the code already knows kernel capabilities. While I can put the call somewhere into remote_daemon.c (when a daemon initializes), we might call virCommand*() even from client library (i.e. no daemon). Therefore, put it into virGlobalInit() with the rest of initialization code. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/src/libvirt.c b/src/libvirt.c index 69d5b13bff..26c3fe454f 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -231,6 +231,10 @@ virGlobalInit(void) goto error; } + /* Do this upfront rather than every time a child is spawned. */ + if (virCloseRangeInit() < 0) + goto error; + if (virLogSetFromEnv() < 0) goto error; -- 2.41.0

On a Tuesday in 2023, Michal Privoznik wrote:
This is a v2 of:
https://listman.redhat.com/archives/libvir-list/2023-June/240351.html
Hopefully, I've implemented all Dan's suggestions.
Michal Prívozník (5): virfile: Introduce virCloseRange() virfile: Introduce virCloseFrom() vircommand: Unify mass FD closing vircommand: Introduce virCommandMassCloseRange() src: Detect close_range syscall during virGlobalInit()
src/libvirt.c | 4 + src/libvirt_private.syms | 4 + src/util/vircommand.c | 160 ++++++++++++++++++++++++--------------- src/util/virfile.c | 110 +++++++++++++++++++++++++++ src/util/virfile.h | 5 ++ tests/commandtest.c | 2 + 6 files changed, 222 insertions(+), 63 deletions(-)
Reviewed-by: Ján Tomko <jtomko@redhat.com> Jano

On Tue, Aug 22, 2023 at 3:35 PM Michal Privoznik <mprivozn@redhat.com> wrote:
This is a v2 of:
https://listman.redhat.com/archives/libvir-list/2023-June/240351.html
Hopefully, I've implemented all Dan's suggestions.
Michal Prívozník (5): virfile: Introduce virCloseRange() virfile: Introduce virCloseFrom() vircommand: Unify mass FD closing vircommand: Introduce virCommandMassCloseRange() src: Detect close_range syscall during virGlobalInit()
src/libvirt.c | 4 + src/libvirt_private.syms | 4 + src/util/vircommand.c | 160 ++++++++++++++++++++++++--------------- src/util/virfile.c | 110 +++++++++++++++++++++++++++ src/util/virfile.h | 5 ++ tests/commandtest.c | 2 + 6 files changed, 222 insertions(+), 63 deletions(-)
Reviewed-by: Kristina Hanicova <khanicov@redhat.com> Kristina
participants (3)
-
Ján Tomko
-
Kristina Hanicova
-
Michal Privoznik