On 7/3/19 2:19 AM, Michal Privoznik wrote:
When spawning a child process, between fork() and exec() we close
all file descriptors and keep only those the caller wants us to
pass onto the child. The problem is how we do that. Currently, we
get the limit of opened files and then iterate through each one
of them and either close() it or make it survive exec(). This
approach is suboptimal (although, not that much in default
configurations where the limit is pretty low - 1024). We have
/proc where we can learn what FDs we hold open and thus we can
selectively close only those.
Signed-off-by: Michal Privoznik <mprivozn(a)redhat.com>
---
src/util/vircommand.c | 78 ++++++++++++++++++++++++++++++++++++++-----
1 file changed, 70 insertions(+), 8 deletions(-)
+# ifdef __linux__
+/* On Linux, we can utilize procfs and read the table of opened
+ * FDs and selectively close only those FDs we don't want to pass
+ * onto child process (well, the one we will exec soon since this
+ * is called from the child). */
+static int
+virCommandMassCloseGetFDsLinux(virCommandPtr cmd ATTRIBUTE_UNUSED,
+ virBitmapPtr fds)
+{
+ DIR *dp = NULL;
+ struct dirent *entry;
+ const char *dirName = "/proc/self/fd";
+ int rc;
+ int ret = -1;
+
+ if (virDirOpen(&dp, dirName) < 0)
+ return -1;
Unfortunately, POSIX says that opendir()/readdir() are not async-safe
(the list of async-safe functions is
https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html...;
some implementations of opendir() call malloc(), and malloc() is
definitely not async-safe - if you ever fork() in one thread while
another thread is locked inside a malloc, your child process is
deadlocked if it has to malloc because the forked process no longer has
the thread to release the malloc lock). It also says readdir() is not
threadafe
(
https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html...)
Therefore, opendir() (hidden in virDirOpen) and readdir() (hidden in
virDirRead) are generally unsafe to be used between fork() of a
multi-threaded app and the subsequent exec(). But maybe you are safe
because this code is only compiled on Linux? Are we absolutely sure this
can't deadlock, in spite of violating POSIX constraints on async-safety?
http://austingroupbugs.net/view.php?id=696 points out some interesting
problems from the POSIX side - readdir_r() is absolutely broken, and
readdir() being marked as thread-unsafe may be too strict. But then
again,
http://austingroupbugs.net/view.php?id=75 states
"The application shall not modify the structure to which the
return value of readdir() points, nor any storage areas pointed to
by pointers within the structure. The returned pointer, and
pointers within the structure, might be invalidated or the
structure or the storage areas might be overwritten by a
subsequent call to readdir() on the same directory stream.
They shall not be affected by a call to readdir() on a different
directory stream."
where it may be the intent that if opendir() doesn't take any locks or
other async-unsafe actions, using opendir() after fork() may be safe
after all (readdir on a DIR* that was opened in the parent is not, but
readdir() on a fresh DIR* opened in the child might be safe, even if not
currently strictly portable).
+
+ while ((rc = virDirRead(dp, &entry, dirName)) > 0) {
+ int fd;
+
+ if (virStrToLong_i(entry->d_name, NULL, 10, &fd) < 0) {
+ virReportError(VIR_ERR_INTERNAL_ERROR,
+ _("unable to parse FD: %s"),
+ entry->d_name);
+ goto cleanup;
+ }
+
+ if (virBitmapSetBit(fds, fd) < 0) {
Is our use of virBitmap likewise avoiding use of malloc()?
+ virReportError(VIR_ERR_INTERNAL_ERROR,
+ _("unable to set FD as open: %d"),
+ fd);
+ goto cleanup;
+ }
+ }
+
+ if (rc < 0)
+ goto cleanup;
+
+ ret = 0;
+ cleanup:
+ VIR_DIR_CLOSE(dp);
+ return ret;
+}
+
+# else /* !__linux__ */
+
+static int
+virCommandMassCloseGetFDsGeneric(virCommandPtr cmd ATTRIBUTE_UNUSED,
+ virBitmapPtr fds)
+{
+ virBitmapSetAll(fds);
+ return 0;
+}
+# endif /* !__linux__ */
+
static int
virCommandMassClose(virCommandPtr cmd,
int childin,
int childout,
int childerr)
{
+ VIR_AUTOPTR(virBitmap) fds = NULL;
int openmax = sysconf(_SC_OPEN_MAX);
- int fd;
- int tmpfd;
+ int fd = -1;
- if (openmax < 0) {
- virReportSystemError(errno, "%s",
- _("sysconf(_SC_OPEN_MAX) failed"));
+ if (!(fds = virBitmapNew(openmax)))
Would it be better to pre-alloc the bitmap in the parent, prior to
fork()ing, to ensure that we are not using malloc() in between fork and
exec?
return -1;
- }
- for (fd = 3; fd < openmax; fd++) {
+# ifdef __linux__
+ if (virCommandMassCloseGetFDsLinux(cmd, fds) < 0)
+ return -1;
+# else
+ if (virCommandMassCloseGetFDsGeneric(cmd, fds) < 0)
+ return -1;
+# endif
+
+ fd = virBitmapNextSetBit(fds, -1);
+ for (; fd >= 0; fd = virBitmapNextSetBit(fds, fd)) {
if (fd == childin || fd == childout || fd == childerr)
continue;
if (!virCommandFDIsSet(cmd, fd)) {
- tmpfd = fd;
+ int tmpfd = fd;
VIR_MASS_CLOSE(tmpfd);
} else if (virSetInherit(fd, true) < 0) {
virReportSystemError(errno, _("failed to preserve fd %d"), fd);
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization:
qemu.org |
libvirt.org