[libvirt] [PATCHv2 0/7] lxc: honor mount namespaces

We are still awaiting a CVE number to be assigned, but Reco reported in Debian bug #732394 that a malicious guest could cause virDomainShutdown and virDomainReboot to cause the host to misbehave, if the host blindly follows symlinks in its own mount namespace instead of the guest's namespace. I have not yet tried to patch the bugs in virDomainDeviceAttach dereferencing /dev from the wrong namespace, which also suffers from the same vulnerability, but virProcessRunInMountNamespace should also be usable in that situation. While working on this series, I found several issues with virFork and virt-login-shell; since those are also related to correct namespace usage, I've bundled everything into one series; but the CVE is not fixed until patch 7/7 plus the future patch to /dev. I've done some pretty decent testing on the new virt-login-shell, but did not get as much testing on virDomainReboot. Since this series does address a CVE, and also regressions caused by our previous CVE fix in the same area of code (CVE-2013-4400 is unfortunately a poor example of shipping "fixes" without testing that the code still worked), I'd definitely appreciate a close review. Patch 6/7 is interesting: it uses virFork to use the mount namespace without impacting the parent process. However, since setns() is thread-safe, I wonder if it would be simpler to instead use pthread_create to do the callback within the same process instead of having to create a separate process, as that would make for easier coordination for passing the results back to the remaining threads that have not changed namespace. Thankfully, I think we came up with a good abstraction - I'm fairly confident that 6/7 could be rewritten to use pthread_create without changing the function signatures, in which case patch 7/7 would not need any changes to pick up the changed backend. Eric Blake (7): virt-login-shell: fix regressions in behavior virFork: simplify semantics virt-login-shell: use single instead of double fork virt-login-shell: saner exit value virsh: report exit status of failed lxc-enter-namespace lxc: add virProcessRunInMountNamespace lxc: security fix for virInitctlSetRunLevel src/internal.h | 7 +++ src/libvirt.c | 2 +- src/libvirt_private.syms | 1 + src/lxc/lxc_driver.c | 38 ++++++------ src/util/vircommand.c | 128 +++++++++++++++----------------------- src/util/vircommand.h | 2 +- src/util/virfile.c | 25 ++------ src/util/virinitctl.c | 28 ++++----- src/util/virinitctl.h | 5 +- src/util/virprocess.c | 81 ++++++++++++++++++++++-- src/util/virprocess.h | 11 ++++ tools/virsh-domain.c | 34 ++++++---- tools/virsh.pod | 3 +- tools/virt-login-shell.c | 151 ++++++++++++++++++++------------------------- tools/virt-login-shell.pod | 23 ++++++- 15 files changed, 299 insertions(+), 240 deletions(-) -- 1.8.4.2

Our fixes for CVE-2013-4400 were so effective at "fixing" bugs in virt-login-shell that we ended up fixing it into a useless do-nothing program. Commit 3e2f27e1 picked the name LIBVIRT_SETUID_RPC_CLIENT for the witness macro when we are doing secure compilation. But commit 9cd6a57d checked whether the name IN_VIRT_LOGIN_SHELL, from an earlier version of the patch series, was defined; with the net result that virt-login-shell invariably detected that it was setuid and failed virInitialize. Commit b7fcc799 closed all fds larger than stderr, but in the wrong place. Looking at the larger context, we mistakenly did the close in between obtaining the set of namespace fds, then actually using those fds to switch namespace, which means that virt-login-shell will ALWAYS fail. This is the minimal patch to fix the regressions, although further patches are also worth having to clean up poor semantics of the resulting program (for example, it is rude to not pass on the exit status of the wrapped program back to the invoking shell). * tools/virt-login-shell.c (main): Don't close fds until after namespace swap. * src/libvirt.c (virGlobalInit): Use correct macro. Signed-off-by: Eric Blake <eblake@redhat.com> --- src/libvirt.c | 2 +- tools/virt-login-shell.c | 19 ++++++++++--------- 2 files changed, 11 insertions(+), 10 deletions(-) diff --git a/src/libvirt.c b/src/libvirt.c index 77f481e..6aaa3a4 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -409,7 +409,7 @@ virGlobalInit(void) virErrorInitialize() < 0) goto error; -#ifndef IN_VIRT_LOGIN_SHELL +#ifndef LIBVIRT_SETUID_RPC_CLIENT if (virIsSUID()) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("libvirt.so is not safe to use from setuid programs")); diff --git a/tools/virt-login-shell.c b/tools/virt-login-shell.c index 8b35e34..ec78fbc 100644 --- a/tools/virt-login-shell.c +++ b/tools/virt-login-shell.c @@ -317,15 +317,6 @@ main(int argc, char **argv) int openmax = sysconf(_SC_OPEN_MAX); int fd; - if (openmax < 0) { - virReportSystemError(errno, "%s", - _("sysconf(_SC_OPEN_MAX) failed")); - return EXIT_FAILURE; - } - for (fd = 3; fd < openmax; fd++) { - int tmpfd = fd; - VIR_MASS_CLOSE(tmpfd); - } /* Fork once because we don't want to affect * virt-login-shell's namespace itself @@ -354,6 +345,16 @@ main(int argc, char **argv) if (ret < 0) return EXIT_FAILURE; + if (openmax < 0) { + virReportSystemError(errno, "%s", + _("sysconf(_SC_OPEN_MAX) failed")); + return EXIT_FAILURE; + } + for (fd = 3; fd < openmax; fd++) { + int tmpfd = fd; + VIR_MASS_CLOSE(tmpfd); + } + if (virFork(&ccpid) < 0) return EXIT_FAILURE; -- 1.8.4.2

The old semantics of virFork() violates the priciple of good usability: it requires the caller to check the pid argument after use, *even when virFork returned -1*, in order to properly abort a child process that failed setup done immediately after fork() - that is, the caller must call _exit() in the child. While uses in virfile.c did this correctly, uses in 'virsh lxc-enter-namespace' and 'virt-login-shell' would happily return from the calling function in both the child and the parent, leading to very confusing results. [Thankfully, I found the problem by inspection, and can't actually trigger the double return on error without an LD_PRELOAD library.] It is much better if the semantics of virFork are impossible to abuse. Looking at virFork(), the parent could only ever return -1 with a non-negative pid if it misused pthread_sigmask, but this never happens. The child could return -1 with non-negative pid if it fails to set up signals correctly, which is less likely but harder to guarantee - but if we instead make the child call _exit() at that point instead of forcing the caller to do it, then it is harder for the caller to misuse the virFork interface. Note that once we make that change, the semantics of the return value and of the pid argument are now redundant (a -1 return now happens only for failure to fork, a child 0 return only happens for a successful 0 pid, and a parent 0 return only happens for a successful non-zero pid), so we might as well return the pid directly rather than an integer of whether it succeeded or failed; this is also good from the interface design perspective as users are already familiar with fork() semantics. One last change in this patch: before returning the pid directly, I found cases where using virProcessWait unconditionally on a cleanup path of a virFork's -1 pid return would be nicer if there were a way to avoid it overwriting an earlier message. While such paths are a bit harder to come by with my change to a direct pid return, I decided to keep the virProcessWait change in this patch. * src/util/virprocess.c (virProcessWait): Tweak semantics when pid is -1. * src/util/vircommand.h (virFork): Change signature. * src/util/vircommand.c (virFork): Guarantee that child will only return on success, to simplify callers. Return pid rather than status, now that the situations are always the same. (virExec): Adjust caller, also avoid open-coding process death. * src/util/virfile.c (virFileAccessibleAs, virFileOpenForked) (virDirCreate): Adjust callers. * tools/virsh-domain.c (cmdLxcEnterNamespace): Likewise. Signed-off-by: Eric Blake <eblake@redhat.com> --- src/util/vircommand.c | 127 ++++++++++++++++++----------------------------- src/util/vircommand.h | 2 +- src/util/virfile.c | 25 ++-------- src/util/virprocess.c | 15 +++--- tools/virsh-domain.c | 7 +-- tools/virt-login-shell.c | 12 +++-- 6 files changed, 75 insertions(+), 113 deletions(-) diff --git a/src/util/vircommand.c b/src/util/vircommand.c index a52a1ab..ad767d7 100644 --- a/src/util/vircommand.c +++ b/src/util/vircommand.c @@ -193,28 +193,27 @@ virCommandFDSet(virCommandPtr cmd, /** * virFork: - * @pid - a pointer to a pid_t that will receive the return value from - * fork() * - * fork a new process while avoiding various race/deadlock conditions - * - * on return from virFork(), if *pid < 0, the fork failed and there is - * no new process. Otherwise, just like fork(), if *pid == 0, it is the - * child process returning, and if *pid > 0, it is the parent. - * - * Even if *pid >= 0, if the return value from virFork() is < 0, it - * indicates a failure that occurred in the parent or child process - * after the fork. In this case, the child process should call - * _exit(EXIT_FAILURE) after doing any additional error reporting. + * Wrapper around fork() that avoids various race/deadlock conditions. + * + * Like fork(), there are several return possibilities: + * 1. No child was created: the return is -1, errno is set, and an error + * message has been reported. The semantics of virWaitProcess() recognize + * this to avoid clobbering the error message from here. + * 2. This is the parent: the return is > 0. The parent can now attempt + * to interact with the child (but be aware that unlike raw fork(), the + * child may not return - some failures in the child result in this + * function calling _exit(255) if the child cannot be set up correctly). + * 3. This is the child: the return is 0. If this happens, the parent + * is also guaranteed to return. */ -int -virFork(pid_t *pid) +pid_t +virFork(void) { sigset_t oldmask, newmask; struct sigaction sig_action; - int saved_errno, ret = -1; - - *pid = -1; + int saved_errno; + pid_t pid; /* * Need to block signals now, so that child process can safely @@ -222,54 +221,47 @@ virFork(pid_t *pid) */ sigfillset(&newmask); if (pthread_sigmask(SIG_SETMASK, &newmask, &oldmask) != 0) { - saved_errno = errno; virReportSystemError(errno, "%s", _("cannot block signals")); - goto cleanup; + return -1; } /* Ensure we hold the logging lock, to protect child processes * from deadlocking on another thread's inherited mutex state */ virLogLock(); - *pid = fork(); + pid = fork(); saved_errno = errno; /* save for caller */ /* Unlock for both parent and child process */ virLogUnlock(); - if (*pid < 0) { + if (pid < 0) { /* attempt to restore signal mask, but ignore failure, to - avoid obscuring the fork failure */ + * avoid obscuring the fork failure */ ignore_value(pthread_sigmask(SIG_SETMASK, &oldmask, NULL)); virReportSystemError(saved_errno, "%s", _("cannot fork child process")); - goto cleanup; - } - - if (*pid) { + errno = saved_errno; + } else if (pid) { /* parent process */ /* Restore our original signal mask now that the child is - safely running */ - if (pthread_sigmask(SIG_SETMASK, &oldmask, NULL) != 0) { - saved_errno = errno; /* save for caller */ - virReportSystemError(errno, "%s", _("cannot unblock signals")); - goto cleanup; - } - ret = 0; + * safely running. Only documented failures are EFAULT (not + * possible, since we are using just-grabbed mask) or EINVAL + * (not possible, since we are using correct arguments). */ + ignore_value(pthread_sigmask(SIG_SETMASK, &oldmask, NULL)); } else { - /* child process */ int logprio; size_t i; - /* Remove any error callback so errors in child now - get sent to stderr where they stand a fighting chance - of being seen / logged */ + /* Remove any error callback so errors in child now get sent + * to stderr where they stand a fighting chance of being seen + * and logged */ virSetErrorFunc(NULL, NULL); virSetErrorLogPriorityFunc(NULL); @@ -280,36 +272,30 @@ virFork(pid_t *pid) virLogSetDefaultPriority(logprio); /* Clear out all signal handlers from parent so nothing - unexpected can happen in our child once we unblock - signals */ + * unexpected can happen in our child once we unblock + * signals */ sig_action.sa_handler = SIG_DFL; sig_action.sa_flags = 0; sigemptyset(&sig_action.sa_mask); for (i = 1; i < NSIG; i++) { - /* Only possible errors are EFAULT or EINVAL - The former wont happen, the latter we - expect, so no need to check return value */ - - sigaction(i, &sig_action, NULL); + /* Only possible errors are EFAULT or EINVAL The former + * won't happen, the latter we expect, so no need to check + * return value */ + ignore_value(sigaction(i, &sig_action, NULL)); } - /* Unmask all signals in child, since we've no idea - what the caller's done with their signal mask - and don't want to propagate that to children */ + /* Unmask all signals in child, since we've no idea what the + * caller's done with their signal mask and don't want to + * propagate that to children */ sigemptyset(&newmask); if (pthread_sigmask(SIG_SETMASK, &newmask, NULL) != 0) { - saved_errno = errno; /* save for caller */ virReportSystemError(errno, "%s", _("cannot unblock signals")); - goto cleanup; + virDispatchError(NULL); + _exit(255); } - ret = 0; } - -cleanup: - if (ret < 0) - errno = saved_errno; - return ret; + return pid; } /* @@ -407,7 +393,7 @@ virExec(virCommandPtr cmd) int tmpfd; char *binarystr = NULL; const char *binary = NULL; - int forkRet, ret; + int ret; struct sigaction waxon, waxoff; gid_t *groups = NULL; int ngroups; @@ -484,17 +470,13 @@ virExec(virCommandPtr cmd) if ((ngroups = virGetGroupList(cmd->uid, cmd->gid, &groups)) < 0) goto cleanup; - forkRet = virFork(&pid); + pid = virFork(); if (pid < 0) { goto cleanup; } if (pid) { /* parent */ - if (forkRet < 0) { - goto cleanup; - } - VIR_FORCE_CLOSE(null); if (cmd->outfdptr && *cmd->outfdptr == -1) { VIR_FORCE_CLOSE(pipeout[1]); @@ -514,14 +496,6 @@ virExec(virCommandPtr cmd) } /* child */ - - if (forkRet < 0) { - /* The fork was successful, but after that there was an error - * in the child (which was already logged). - */ - goto fork_error; - } - openmax = sysconf(_SC_OPEN_MAX); if (openmax < 0) { virReportSystemError(errno, "%s", @@ -592,12 +566,10 @@ virExec(virCommandPtr cmd) if (pid > 0) { if (cmd->pidfile && (virPidFileWritePath(cmd->pidfile, pid) < 0)) { - kill(pid, SIGTERM); - usleep(500*1000); - kill(pid, SIGTERM); - virReportSystemError(errno, - _("could not write pidfile %s for %d"), - cmd->pidfile, pid); + if (virProcessKillPainfully(pid, true) >= 0) + virReportSystemError(errno, + _("could not write pidfile %s for %d"), + cmd->pidfile, pid); goto fork_error; } _exit(0); @@ -778,10 +750,9 @@ virExec(virCommandPtr cmd ATTRIBUTE_UNUSED) return -1; } -int -virFork(pid_t *pid) +pid_t +virFork(void) { - *pid = -1; errno = ENOTSUP; return -1; diff --git a/src/util/vircommand.h b/src/util/vircommand.h index e977f93..4372a7f 100644 --- a/src/util/vircommand.h +++ b/src/util/vircommand.h @@ -33,7 +33,7 @@ typedef virCommand *virCommandPtr; * call any function that is not async-signal-safe. */ typedef int (*virExecHook)(void *data); -int virFork(pid_t *pid) ATTRIBUTE_RETURN_CHECK; +pid_t virFork(void) ATTRIBUTE_RETURN_CHECK; int virRun(const char *const*argv, int *status) ATTRIBUTE_RETURN_CHECK; diff --git a/src/util/virfile.c b/src/util/virfile.c index 631cd06..92d4815 100644 --- a/src/util/virfile.c +++ b/src/util/virfile.c @@ -1730,7 +1730,7 @@ virFileAccessibleAs(const char *path, int mode, if (ngroups < 0) return -1; - forkRet = virFork(&pid); + pid = virFork(); if (pid < 0) { VIR_FREE(groups); @@ -1740,8 +1740,7 @@ virFileAccessibleAs(const char *path, int mode, if (pid) { /* parent */ VIR_FREE(groups); if (virProcessWait(pid, &status) < 0) { - /* virProcessWait() already - * reported error */ + /* virProcessWait() already reported error */ return -1; } @@ -1842,7 +1841,6 @@ virFileOpenForked(const char *path, int openflags, mode_t mode, int waitret, status, ret = 0; int fd = -1; int pair[2] = { -1, -1 }; - int forkRet; gid_t *groups; int ngroups; @@ -1864,7 +1862,7 @@ virFileOpenForked(const char *path, int openflags, mode_t mode, return ret; } - forkRet = virFork(&pid); + pid = virFork(); if (pid < 0) return -errno; @@ -1872,15 +1870,8 @@ virFileOpenForked(const char *path, int openflags, mode_t mode, /* child */ - VIR_FORCE_CLOSE(pair[0]); /* preserves errno */ - if (forkRet < 0) { - /* error encountered and logged in virFork() after the fork. */ - ret = -errno; - goto childerror; - } - /* set desired uid/gid, then attempt to create the file */ - + VIR_FORCE_CLOSE(pair[0]); if (virSetUIDGID(uid, gid, groups, ngroups) < 0) { ret = -errno; goto childerror; @@ -2151,7 +2142,7 @@ virDirCreate(const char *path, if (ngroups < 0) return -errno; - int forkRet = virFork(&pid); + pid = virFork(); if (pid < 0) { ret = -errno; @@ -2181,13 +2172,7 @@ parenterror: /* child */ - if (forkRet < 0) { - /* error encountered and logged in virFork() after the fork. */ - goto childerror; - } - /* set desired uid/gid, then attempt to create the directory */ - if (virSetUIDGID(uid, gid, groups, ngroups) < 0) { ret = -errno; goto childerror; diff --git a/src/util/virprocess.c b/src/util/virprocess.c index 9fc3207..5fbed25 100644 --- a/src/util/virprocess.c +++ b/src/util/virprocess.c @@ -151,10 +151,12 @@ virProcessAbort(pid_t pid) * @exitstatus: optional status collection * * Wait for a child process to complete. - * Return -1 on any error waiting for - * completion. Returns 0 if the command - * finished with the exit status set. If @exitstatus is NULL, then the - * child must exit with status 0 for this to succeed. + * If @pid is -1, do nothing, but return -1 (useful for error cleanup, + * and assumes an earlier message was already issued). Otherwise, + * return -1 on any error waiting for completion, with an error message + * issued. If @exitstatus is NULL, require the process to complete normally + * with status 0; otherwise, populate @exitstatus (whether the process + * completed normally or was terminated by a signal). */ int virProcessWait(pid_t pid, int *exitstatus) @@ -163,8 +165,9 @@ virProcessWait(pid_t pid, int *exitstatus) int status; if (pid <= 0) { - virReportSystemError(EINVAL, _("unable to wait for process %lld"), - (long long) pid); + if (pid != -1) + virReportSystemError(EINVAL, _("unable to wait for process %lld"), + (long long) pid); return -1; } diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index 760dca5..d9b542e 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -8134,9 +8134,10 @@ cmdLxcEnterNamespace(vshControl *ctl, const vshCmd *cmd) } /* Fork once because we don't want to affect - * virsh's namespace itself + * virsh's namespace itself, and because user namespace + * can only be changed in single-threaded process */ - if (virFork(&pid) < 0) + if ((pid = virFork()) < 0) goto cleanup; if (pid == 0) { if (setlabel && @@ -8157,7 +8158,7 @@ cmdLxcEnterNamespace(vshControl *ctl, const vshCmd *cmd) /* Fork a second time because entering the * pid namespace only takes effect after fork */ - if (virFork(&pid) < 0) + if ((pid = virFork()) < 0) _exit(255); if (pid == 0) { execv(cmdargv[0], cmdargv); diff --git a/tools/virt-login-shell.c b/tools/virt-login-shell.c index ec78fbc..e26b7a1 100644 --- a/tools/virt-login-shell.c +++ b/tools/virt-login-shell.c @@ -309,7 +309,10 @@ main(int argc, char **argv) if (virDomainGetSecurityLabel(dom, seclabel) < 0) goto cleanup; - if (virFork(&cpid) < 0) + /* Fork once because we don't want to affect virt-login-shell's + * namespace itself. FIXME: is this necessary? + */ + if ((cpid = virFork()) < 0) goto cleanup; if (cpid == 0) { @@ -318,9 +321,6 @@ main(int argc, char **argv) int openmax = sysconf(_SC_OPEN_MAX); int fd; - /* Fork once because we don't want to affect - * virt-login-shell's namespace itself - */ if (virSetUIDGID(0, 0, NULL, 0) < 0) return EXIT_FAILURE; @@ -355,7 +355,9 @@ main(int argc, char **argv) VIR_MASS_CLOSE(tmpfd); } - if (virFork(&ccpid) < 0) + /* A fork is required to create new process in correct pid + * namespace. */ + if ((ccpid = virFork()) < 0) return EXIT_FAILURE; if (ccpid == 0) { -- 1.8.4.2

Note that 'virsh lxc-enter-namespace' must double-fork, for two reasons: some namespaces can only be done from a single thread, while virsh is multithreaded; and because virsh can be run in batch mode where we must not corrupt the namespace of that execution upon return from the subsidiary command. When virt-login-shell was first written, it blindly copied from 'virsh lxc-enter-namespace', including the double-fork. But neither of the reasons for double forking apply to virt-login-shell (we are single-threaded, and we have nothing to do after the child completes that would require us to preserve a namespace), so we can simplify life by using a single fork. In turn, this will make it easier for a future patch to pass the child's exit status on to the invoking shell. In flattening to a single fork, note that closing the fds must be done after fork, because the parent process still needs to use fds to control the virConnectPtr; meanwhile, chdir can be done prior to forking (in fact, it's easier to report errors on anything attempted before forking). * tools/virt-login-shell.c (main): Single rather than double fork. (virLoginShellFini): Delete, by inlining actions instead. Signed-off-by: Eric Blake <eblake@redhat.com> --- tools/virt-login-shell.c | 116 ++++++++++++++++++----------------------------- 1 file changed, 43 insertions(+), 73 deletions(-) diff --git a/tools/virt-login-shell.c b/tools/virt-login-shell.c index e26b7a1..bc8394e 100644 --- a/tools/virt-login-shell.c +++ b/tools/virt-login-shell.c @@ -41,24 +41,8 @@ #include "vircommand.h" #define VIR_FROM_THIS VIR_FROM_NONE -static ssize_t nfdlist; -static int *fdlist; static const char *conf_file = SYSCONFDIR "/libvirt/virt-login-shell.conf"; -static void virLoginShellFini(virConnectPtr conn, virDomainPtr dom) -{ - size_t i; - - for (i = 0; i < nfdlist; i++) - VIR_FORCE_CLOSE(fdlist[i]); - VIR_FREE(fdlist); - nfdlist = 0; - if (dom) - virDomainFree(dom); - if (conn) - virConnectClose(conn); -} - static int virLoginShellAllowedUser(virConfPtr conf, const char *name, gid_t *groups) @@ -187,7 +171,6 @@ main(int argc, char **argv) pid_t cpid; int ret = EXIT_FAILURE; int status; - int status2; uid_t uid = getuid(); gid_t gid = getgid(); char *name = NULL; @@ -201,6 +184,10 @@ main(int argc, char **argv) int longindex = -1; int ngroups; gid_t *groups = NULL; + ssize_t nfdlist = 0; + int *fdlist = NULL; + int openmax; + size_t i; struct option opt[] = { {"help", no_argument, NULL, 'h'}, @@ -298,6 +285,13 @@ main(int argc, char **argv) } } + openmax = sysconf(_SC_OPEN_MAX); + if (openmax < 0) { + virReportSystemError(errno, "%s", + _("sysconf(_SC_OPEN_MAX) failed")); + goto cleanup; + } + if ((nfdlist = virDomainLxcOpenNamespace(dom, &fdlist, 0)) < 0) goto cleanup; if (VIR_ALLOC(secmodel) < 0) @@ -308,76 +302,52 @@ main(int argc, char **argv) goto cleanup; if (virDomainGetSecurityLabel(dom, seclabel) < 0) goto cleanup; + if (virSetUIDGID(0, 0, NULL, 0) < 0) + goto cleanup; + if (virDomainLxcEnterSecurityLabel(secmodel, seclabel, NULL, 0) < 0) + goto cleanup; + if (nfdlist > 0 && + virDomainLxcEnterNamespace(dom, nfdlist, fdlist, NULL, NULL, 0) < 0) + goto cleanup; + if (virSetUIDGID(uid, gid, groups, ngroups) < 0) + goto cleanup; + if (chdir(homedir) < 0) { + virReportSystemError(errno, _("Unable to chdir(%s)"), homedir); + goto cleanup; + } - /* Fork once because we don't want to affect virt-login-shell's - * namespace itself. FIXME: is this necessary? - */ + /* A fork is required to create new process in correct pid namespace. */ if ((cpid = virFork()) < 0) goto cleanup; if (cpid == 0) { - pid_t ccpid; - - int openmax = sysconf(_SC_OPEN_MAX); - int fd; + int tmpfd; - if (virSetUIDGID(0, 0, NULL, 0) < 0) - return EXIT_FAILURE; - - if (virDomainLxcEnterSecurityLabel(secmodel, - seclabel, - NULL, - 0) < 0) - return EXIT_FAILURE; - - if (nfdlist > 0) { - if (virDomainLxcEnterNamespace(dom, - nfdlist, - fdlist, - NULL, - NULL, - 0) < 0) - return EXIT_FAILURE; - } - - ret = virSetUIDGID(uid, gid, groups, ngroups); - VIR_FREE(groups); - if (ret < 0) - return EXIT_FAILURE; - - if (openmax < 0) { - virReportSystemError(errno, "%s", - _("sysconf(_SC_OPEN_MAX) failed")); - return EXIT_FAILURE; - } - for (fd = 3; fd < openmax; fd++) { - int tmpfd = fd; + for (i = 3; i < openmax; i++) { + tmpfd = i; VIR_MASS_CLOSE(tmpfd); } - - /* A fork is required to create new process in correct pid - * namespace. */ - if ((ccpid = virFork()) < 0) + if (execv(shargv[0], (char *const*) shargv) < 0) { + virReportSystemError(errno, _("Unable to exec shell %s"), + shargv[0]); return EXIT_FAILURE; - - if (ccpid == 0) { - if (chdir(homedir) < 0) { - virReportSystemError(errno, _("Unable to chdir(%s)"), homedir); - return EXIT_FAILURE; - } - if (execv(shargv[0], (char *const*) shargv) < 0) { - virReportSystemError(errno, _("Unable to exec shell %s"), - shargv[0]); - return EXIT_FAILURE; - } } - return virProcessWait(ccpid, &status2); + return EXIT_SUCCESS; } - ret = virProcessWait(cpid, &status); + + if (virProcessWait(cpid, &status) < 0) + goto cleanup; + ret = EXIT_SUCCESS; cleanup: + for (i = 0; i < nfdlist; i++) + VIR_FORCE_CLOSE(fdlist[i]); + VIR_FREE(fdlist); virConfFree(conf); - virLoginShellFini(conn, dom); + if (dom) + virDomainFree(dom); + if (conn) + virConnectClose(conn); virStringFreeList(shargv); VIR_FREE(name); VIR_FREE(homedir); -- 1.8.4.2

virt-login-shell was exiting with status 0, regardless of what the wrapped shell returned. This is unkind to users; we should behave more like env(1), nice(1), su(1), and other wrapper programs, by preserving the invoked application's status (which includes the distinction between death due to signal vs. normal death). * src/internal.h (EXIT_CANCELED, EXIT_CANNOT_INVOKE) (EXIT_ENOENT): New enum. * src/util/vircommand.c (virFork): Document specific exit value if child aborts early. * tools/virt-login-shell.c (main): Pass through child exit status. * tools/virt-login-shell.pod: Document exit status. Signed-off-by: Eric Blake <eblake@redhat.com> --- src/internal.h | 7 +++++++ src/util/vircommand.c | 5 +++-- tools/virt-login-shell.c | 44 ++++++++++++++++++++++++++++---------------- tools/virt-login-shell.pod | 23 ++++++++++++++++++++++- 4 files changed, 60 insertions(+), 19 deletions(-) diff --git a/src/internal.h b/src/internal.h index 5e29694..c90c83f 100644 --- a/src/internal.h +++ b/src/internal.h @@ -431,5 +431,12 @@ #NAME ": " FMT, __VA_ARGS__); # endif +/* Specific error values for use in forwarding programs such as + * virt-login-shell; these values match what GNU env does. */ +enum { + EXIT_CANCELED = 125, /* Failed before attempting exec */ + EXIT_CANNOT_INVOKE = 126, /* Exists but couldn't exec */ + EXIT_ENOENT = 127, /* Could not find program to exec */ +}; #endif /* __VIR_INTERNAL_H__ */ diff --git a/src/util/vircommand.c b/src/util/vircommand.c index ad767d7..3bbbe5f 100644 --- a/src/util/vircommand.c +++ b/src/util/vircommand.c @@ -203,7 +203,8 @@ virCommandFDSet(virCommandPtr cmd, * 2. This is the parent: the return is > 0. The parent can now attempt * to interact with the child (but be aware that unlike raw fork(), the * child may not return - some failures in the child result in this - * function calling _exit(255) if the child cannot be set up correctly). + * function calling _exit(EXIT_CANCELED) if the child cannot be set up + * correctly). * 3. This is the child: the return is 0. If this happens, the parent * is also guaranteed to return. */ @@ -292,7 +293,7 @@ virFork(void) if (pthread_sigmask(SIG_SETMASK, &newmask, NULL) != 0) { virReportSystemError(errno, "%s", _("cannot unblock signals")); virDispatchError(NULL); - _exit(255); + _exit(EXIT_CANCELED); } } return pid; diff --git a/tools/virt-login-shell.c b/tools/virt-login-shell.c index bc8394e..2172325 100644 --- a/tools/virt-login-shell.c +++ b/tools/virt-login-shell.c @@ -21,13 +21,14 @@ */ #include <config.h> -#include <stdarg.h> -#include <getopt.h> -#include <stdio.h> #include <errno.h> -#include <stdlib.h> #include <fnmatch.h> +#include <getopt.h> #include <locale.h> +#include <signal.h> +#include <stdarg.h> +#include <stdio.h> +#include <stdlib.h> #include "internal.h" #include "virerror.h" @@ -168,8 +169,8 @@ main(int argc, char **argv) { virConfPtr conf = NULL; const char *login_shell_path = conf_file; - pid_t cpid; - int ret = EXIT_FAILURE; + pid_t cpid = -1; + int ret = EXIT_CANCELED; int status; uid_t uid = getuid(); gid_t gid = getgid(); @@ -195,8 +196,8 @@ main(int argc, char **argv) {NULL, 0, NULL, 0} }; if (virInitialize() < 0) { - fprintf(stderr, _("Failed to initialize libvirt Error Handling")); - return EXIT_FAILURE; + fprintf(stderr, _("Failed to initialize libvirt error handling")); + return EXIT_CANCELED; } setenv("PATH", "/bin:/usr/bin", 1); @@ -231,7 +232,7 @@ main(int argc, char **argv) case '?': default: usage(); - exit(EXIT_FAILURE); + exit(EXIT_CANCELED); } } @@ -330,15 +331,13 @@ main(int argc, char **argv) if (execv(shargv[0], (char *const*) shargv) < 0) { virReportSystemError(errno, _("Unable to exec shell %s"), shargv[0]); - return EXIT_FAILURE; + virDispatchError(NULL); + return errno == ENOENT ? EXIT_ENOENT : EXIT_CANNOT_INVOKE; } - return EXIT_SUCCESS; } - if (virProcessWait(cpid, &status) < 0) - goto cleanup; - ret = EXIT_SUCCESS; - + /* At this point, the parent is now waiting for the child to exit, + * but as that may take a long time, we release resources now. */ cleanup: for (i = 0; i < nfdlist; i++) VIR_FORCE_CLOSE(fdlist[i]); @@ -354,7 +353,20 @@ cleanup: VIR_FREE(seclabel); VIR_FREE(secmodel); VIR_FREE(groups); - if (ret) + + if (virProcessWait(cpid, &status) == 0) { + if (WIFEXITED(status)) { + ret = WEXITSTATUS(status); + } else if (WIFSIGNALED(status)) { + /* Try to kill the parent with the same signal as the + * child, but if that fails, at least give a decent exit + * status value. */ + raise(WTERMSIG(status)); + ret = 128 + WTERMSIG(status); + } + } + + if (virGetLastError()) virDispatchError(NULL); return ret; } diff --git a/tools/virt-login-shell.pod b/tools/virt-login-shell.pod index bcd7855..c1d0cb3 100644 --- a/tools/virt-login-shell.pod +++ b/tools/virt-login-shell.pod @@ -4,7 +4,7 @@ virt-login-shell - tool to execute a shell within a container matching the users =head1 SYNOPSIS -B<virt-login-shell> +B<virt-login-shell> [I<OPTION>] =head1 DESCRIPTION @@ -47,6 +47,27 @@ variable in /etc/libvirt/virt-login-shell.conf. eg. allowed_users = [ "tom", "dick", "harry" ] +=head1 EXIT STATUS + +B<virt-login-shell> normally returns the exit status of the command it +executed. If the command was killed by a signal, but that signal is not +fatal to virt-login-shell, then it returns the signal number plus 128. + +Exit status generated by B<virt-login-shell> itself: + +=over 4 + +=item B<0> An option was used to learn more about this binary. + +=item B<125> Generic error before attempting execution of the configured +shell; for example, if libvirtd is not running. + +=item B<126> The configured shell exists but could not be executed. + +=item B<127> The configured shell could not be found. + +=back + =head1 BUGS Report any bugs discovered to the libvirt community via the mailing -- 1.8.4.2

'virsh lxc-enter-namespace' does not have a way to reflect exit status to the caller in single-command mode, but we might as well at least report the exit status. Prior to this patch, $ virsh -c lxc:/// lxc-enter-namespace shell /bin/sh 'exit 3'; echo $? 1 now it gives some details: $ virsh -c lxc:/// lxc-enter-namespace shell /bin/sh -c 'exit 3'; echo $? error: internal error: Child process (31557) unexpected exit status 3 1 Also useful: $ virsh -c lxc:/// lxc-enter-namespace shell /bin/sh -c 'kill $$'; echo $? error: internal error: Child process (31585) unexpected fatal signal 15 1 * tools/virsh-domain.c (cmdLxcEnterNamespace): Avoid magic numbers. Dispatch any error. * tools/virsh.pod: Document that non-zero exit status is collapsed. Signed-off-by: Eric Blake <eblake@redhat.com> --- tools/virsh-domain.c | 27 +++++++++++++++++++-------- tools/virsh.pod | 3 ++- 2 files changed, 21 insertions(+), 9 deletions(-) diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index d9b542e..4dc6410 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -8140,12 +8140,14 @@ cmdLxcEnterNamespace(vshControl *ctl, const vshCmd *cmd) if ((pid = virFork()) < 0) goto cleanup; if (pid == 0) { + int status; + if (setlabel && virDomainLxcEnterSecurityLabel(secmodel, seclabel, NULL, 0) < 0) - _exit(255); + _exit(EXIT_CANCELED); if (virDomainLxcEnterNamespace(dom, nfdlist, @@ -8153,27 +8155,36 @@ cmdLxcEnterNamespace(vshControl *ctl, const vshCmd *cmd) NULL, NULL, 0) < 0) - _exit(255); + _exit(EXIT_CANCELED); /* Fork a second time because entering the * pid namespace only takes effect after fork */ if ((pid = virFork()) < 0) - _exit(255); + _exit(EXIT_CANCELED); if (pid == 0) { execv(cmdargv[0], cmdargv); - _exit(255); + _exit(errno == ENOENT ? EXIT_ENOENT : EXIT_CANNOT_INVOKE); + } + if (virProcessWait(pid, &status) < 0) + _exit(EXIT_CANNOT_INVOKE); + if (WIFSIGNALED(status)) { + raise(WTERMSIG(status)); + status = WTERMSIG(status) + 128; + } else if (WIFEXITED(status)) { + status = WEXITSTATUS(status); } else { - if (virProcessWait(pid, NULL) < 0) - _exit(255); + status = EXIT_CANNOT_INVOKE; } - _exit(0); + _exit(status); } else { for (i = 0; i < nfdlist; i++) VIR_FORCE_CLOSE(fdlist[i]); VIR_FREE(fdlist); - if (virProcessWait(pid, NULL) < 0) + if (virProcessWait(pid, NULL) < 0) { + vshReportError(ctl); goto cleanup; + } } ret = true; diff --git a/tools/virsh.pod b/tools/virsh.pod index 3534b54..9b24a69 100644 --- a/tools/virsh.pod +++ b/tools/virsh.pod @@ -3284,7 +3284,8 @@ Enter the namespace of I<domain> and execute the command C</path/to/binary> passing the requested args. The binary path is relative to the container root filesystem, not the host root filesystem. The binary will inherit the environment variables / console visible to virsh. This command only works -when connected to the LXC hypervisor driver. +when connected to the LXC hypervisor driver. This command succeeds only +if C</path/to/binary> has 0 exit status. =back -- 1.8.4.2

Implement virProcessRunInMountNamespace, which runs callback of type virProcessNamespaceCallback in a container namespace. Idea by Dan Berrange, based on an initial report by Reco <recoverym4n@gmail.com> at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=732394 Signed-off-by: Eric Blake <eblake@redhat.com> --- setns() is a per-thread call. Would it be any simpler to just pthread_create() a short-lived helper thread, so that we don't have to worry about full-blown async-signal safety, and so that the thread can pass more information back rather than the limitation of an exit status? --- src/libvirt_private.syms | 1 + src/util/virprocess.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++ src/util/virprocess.h | 11 ++++++++ 3 files changed, 78 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 2dbb8f8..e210fd0 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1646,6 +1646,7 @@ virProcessGetNamespaces; virProcessGetStartTime; virProcessKill; virProcessKillPainfully; +virProcessRunInMountNamespace; virProcessSetAffinity; virProcessSetMaxFiles; virProcessSetMaxMemLock; diff --git a/src/util/virprocess.c b/src/util/virprocess.c index 5fbed25..c99b75a 100644 --- a/src/util/virprocess.c +++ b/src/util/virprocess.c @@ -31,6 +31,7 @@ # include <sys/resource.h> #endif #include <sched.h> +#include <stdlib.h> #ifdef __FreeBSD__ # include <sys/param.h> @@ -39,6 +40,7 @@ #endif #include "viratomic.h" +#include "vircommand.h" #include "virprocess.h" #include "virerror.h" #include "viralloc.h" @@ -850,3 +852,67 @@ int virProcessGetStartTime(pid_t pid, return 0; } #endif + + +#ifdef HAVE_SETNS +/* Run cb(opaque) in the mount namespace of pid. Return -1 with error + * message raised if we fail to run the child, if the child dies from + * a signal, or if the child has status EXIT_CANCELED; otherwise + * return the exit status of the child. */ +int +virProcessRunInMountNamespace(pid_t pid, + virProcessNamespaceCallback cb, + void *opaque) +{ + char *path = NULL; + int ret = -1; + int cpid = -1; + int fd = -1; + int status; + + if (virAsprintf(&path, "/proc/%llu/ns/mnt", (unsigned long long)pid) < 0) + goto cleanup; + + if ((fd = open(path, O_RDONLY)) < 0) { + virReportSystemError(errno, "%s", + _("Kernel does not provide mount namespace")); + goto cleanup; + } + + if ((cpid = virFork() < 0)) + goto cleanup; + if (cpid == 0) { + /* child */ + if (setns(fd, 0) == -1) + _exit(EXIT_CANCELED); + + ret = cb(pid, opaque); + _exit(ret < 0 ? EXIT_CANCELED : ret); + } + + /* parent */ + if (virProcessWait(cpid, &status) < 0) + goto cleanup; + if (!WIFEXITED(status) || WEXITSTATUS(status) == EXIT_CANCELED) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("mount namespace callback did not complete normally")); + goto cleanup; + } + ret = WEXITSTATUS(status); + +cleanup: + VIR_FREE(path); + VIR_FORCE_CLOSE(fd); + return ret; +} +#else /* !HAVE_SETNS */ +int +virProcessRunInMountNamespace(pid_t pid ATTRIBUTE_UNUSED, + virProcessNamespaceCallback cb ATTRIBUTE_UNUSED, + void *opaque ATTRIBUTE_UNUSED) +{ + virReportSystemError(ENOSYS, "%s", + _("Mount namespaces are not available on this platform")); + return -1; +} +#endif diff --git a/src/util/virprocess.h b/src/util/virprocess.h index 9f77bc5..75c7d1b 100644 --- a/src/util/virprocess.h +++ b/src/util/virprocess.h @@ -60,4 +60,15 @@ int virProcessSetNamespaces(size_t nfdlist, int virProcessSetMaxMemLock(pid_t pid, unsigned long long bytes); int virProcessSetMaxProcesses(pid_t pid, unsigned int procs); int virProcessSetMaxFiles(pid_t pid, unsigned int files); + +/* Callback to run code within the mount namespace tied to the given + * pid. This function must use only async-signal-safe functions, as + * it gets run after a fork of a multi-threaded process. The return + * value of this function is passed to _exit(), except that a + * negative value is treated as EXIT_CANCELED. */ +typedef int (*virProcessNamespaceCallback)(pid_t pid, void *opaque); + +int virProcessRunInMountNamespace(pid_t pid, + virProcessNamespaceCallback cb, + void *opaque); #endif /* __VIR_PROCESS_H__ */ -- 1.8.4.2

On Mon, Dec 23, 2013 at 10:55:50PM -0700, Eric Blake wrote: [..snip..]
+ if (virAsprintf(&path, "/proc/%llu/ns/mnt", (unsigned long long)pid) < 0) + goto cleanup; + + if ((fd = open(path, O_RDONLY)) < 0) { + virReportSystemError(errno, "%s", + _("Kernel does not provide mount namespace")); + goto cleanup; + }
So in case mount namespaces are unavailable we'll fail these operations entirely? I think this is the right thing to do but it will break distros that have a too old kernel. So shutting down of containers will no longer work (as it did before). Cheers, -- Guido

On 01/08/2014 12:10 PM, Guido Günther wrote:
On Mon, Dec 23, 2013 at 10:55:50PM -0700, Eric Blake wrote: [..snip..]
+ if (virAsprintf(&path, "/proc/%llu/ns/mnt", (unsigned long long)pid) < 0) + goto cleanup; + + if ((fd = open(path, O_RDONLY)) < 0) { + virReportSystemError(errno, "%s", + _("Kernel does not provide mount namespace")); + goto cleanup; + }
So in case mount namespaces are unavailable we'll fail these operations entirely? I think this is the right thing to do but it will break distros that have a too old kernel. So shutting down of containers will no longer work (as it did before).
We'll fail the attempt to use initctl as the shutdown mechanism, but should still gracefully fall back to the attempt to use signals (once this patch is in [1]). Or, if the user explicitly requested intictl only, then they WANT to know that initctl didn't work. [1] https://www.redhat.com/archives/libvir-list/2014-January/msg00277.html -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Wed, Jan 08, 2014 at 12:26:41PM -0700, Eric Blake wrote:
On 01/08/2014 12:10 PM, Guido Günther wrote:
On Mon, Dec 23, 2013 at 10:55:50PM -0700, Eric Blake wrote: [..snip..]
+ if (virAsprintf(&path, "/proc/%llu/ns/mnt", (unsigned long long)pid) < 0) + goto cleanup; + + if ((fd = open(path, O_RDONLY)) < 0) { + virReportSystemError(errno, "%s", + _("Kernel does not provide mount namespace")); + goto cleanup; + }
So in case mount namespaces are unavailable we'll fail these operations entirely? I think this is the right thing to do but it will break distros that have a too old kernel. So shutting down of containers will no longer work (as it did before).
We'll fail the attempt to use initctl as the shutdown mechanism, but should still gracefully fall back to the attempt to use signals (once this patch is in [1]). Or, if the user explicitly requested intictl only, then they WANT to know that initctl didn't work.
Ahh...the signals. Forgat that we have this too. Thanks! -- Guido

On 12/23/2013 10:55 PM, Eric Blake wrote:
Implement virProcessRunInMountNamespace, which runs callback of type virProcessNamespaceCallback in a container namespace.
Idea by Dan Berrange, based on an initial report by Reco <recoverym4n@gmail.com> at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=732394
Signed-off-by: Eric Blake <eblake@redhat.com>
---
setns() is a per-thread call. Would it be any simpler to just pthread_create() a short-lived helper thread, so that we don't have to worry about full-blown async-signal safety, and so that the thread can pass more information back rather than the limitation of an exit status?
The more I look at the LXC device hotplug, the more I keep coming back to this question. Writing async-signal-safe functions whose only way of communicating back to the parent is through an exit status is tough; doing the callback as a dedicated thread (since setns() is a per-thread call, as long as you aren't worried about the pid namespace) seems like it would be a lot more manageable for having the temporary thread still take full advantage of libvirt error reporting. But I'm not sure if there are any implications I'm overlooking by the idea of using a dedicated thread. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

Use helper virProcessRunInMountNamespace in lxcDomainShutdownFlags and lxcDomainReboot. Otherwise, a malicious guest could use symlinks to force the host to manipulate the wrong file in the host's namespace. Idea by Dan Berrange, based on an initial report by Reco <recoverym4n@gmail.com> at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=732394 Signed-off-by: Eric Blake <eblake@redhat.com> --- src/lxc/lxc_driver.c | 38 ++++++++++++++++++++------------------ src/util/virinitctl.c | 28 +++++++++++----------------- src/util/virinitctl.h | 5 ++--- 3 files changed, 33 insertions(+), 38 deletions(-) diff --git a/src/lxc/lxc_driver.c b/src/lxc/lxc_driver.c index e5298d1..30d65c8 100644 --- a/src/lxc/lxc_driver.c +++ b/src/lxc/lxc_driver.c @@ -2694,12 +2694,20 @@ lxcConnectListAllDomains(virConnectPtr conn, static int +lxcDomainInitctlCallback(pid_t pid ATTRIBUTE_UNUSED, + void *opaque) +{ + int *command = opaque; + return virInitctlSetRunLevel(*command); +} + + +static int lxcDomainShutdownFlags(virDomainPtr dom, unsigned int flags) { virLXCDomainObjPrivatePtr priv; virDomainObjPtr vm; - char *vroot = NULL; int ret = -1; int rc; @@ -2726,16 +2734,14 @@ lxcDomainShutdownFlags(virDomainPtr dom, goto cleanup; } - if (virAsprintf(&vroot, "/proc/%llu/root", - (unsigned long long)priv->initpid) < 0) - goto cleanup; - if (flags == 0 || (flags & VIR_DOMAIN_SHUTDOWN_INITCTL)) { - if ((rc = virInitctlSetRunLevel(VIR_INITCTL_RUNLEVEL_POWEROFF, - vroot)) < 0) { + int command = VIR_INITCTL_RUNLEVEL_POWEROFF; + + if ((rc = virProcessRunInMountNamespace(priv->initpid, + lxcDomainInitctlCallback, + &command)) < 0) goto cleanup; - } if (rc == 0 && flags != 0 && ((flags & ~VIR_DOMAIN_SHUTDOWN_INITCTL) == 0)) { virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", @@ -2761,7 +2767,6 @@ lxcDomainShutdownFlags(virDomainPtr dom, ret = 0; cleanup: - VIR_FREE(vroot); if (vm) virObjectUnlock(vm); return ret; @@ -2773,13 +2778,13 @@ lxcDomainShutdown(virDomainPtr dom) return lxcDomainShutdownFlags(dom, 0); } + static int lxcDomainReboot(virDomainPtr dom, unsigned int flags) { virLXCDomainObjPrivatePtr priv; virDomainObjPtr vm; - char *vroot = NULL; int ret = -1; int rc; @@ -2806,16 +2811,14 @@ lxcDomainReboot(virDomainPtr dom, goto cleanup; } - if (virAsprintf(&vroot, "/proc/%llu/root", - (unsigned long long)priv->initpid) < 0) - goto cleanup; - if (flags == 0 || (flags & VIR_DOMAIN_REBOOT_INITCTL)) { - if ((rc = virInitctlSetRunLevel(VIR_INITCTL_RUNLEVEL_REBOOT, - vroot)) < 0) { + int command = VIR_INITCTL_RUNLEVEL_REBOOT; + + if ((rc = virProcessRunInMountNamespace(priv->initpid, + lxcDomainInitctlCallback, + &command)) < 0) goto cleanup; - } if (rc == 0 && flags != 0 && ((flags & ~VIR_DOMAIN_SHUTDOWN_INITCTL) == 0)) { virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", @@ -2841,7 +2844,6 @@ lxcDomainReboot(virDomainPtr dom, ret = 0; cleanup: - VIR_FREE(vroot); if (vm) virObjectUnlock(vm); return ret; diff --git a/src/util/virinitctl.c b/src/util/virinitctl.c index 64bc23a..029dbb9 100644 --- a/src/util/virinitctl.c +++ b/src/util/virinitctl.c @@ -1,7 +1,7 @@ /* * virinitctl.c: API for talking to init systems via initctl * - * Copyright (C) 2012 Red Hat, Inc. + * Copyright (C) 2012-2013 Red Hat, Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -111,16 +111,18 @@ struct virInitctlRequest { # endif /* - * Send a message to init to change the runlevel + * Send a message to init to change the runlevel. This function is + * asynchronous-signal-safe (thus safe to use after fork of a + * multithreaded parent) - which is good, because it should only be + * used after forking and entering correct namespace. * * Returns 1 on success, 0 if initctl does not exist, -1 on error */ -int virInitctlSetRunLevel(virInitctlRunLevel level, - const char *vroot) +int +virInitctlSetRunLevel(virInitctlRunLevel level) { struct virInitctlRequest req; int fd = -1; - char *path = NULL; int ret = -1; memset(&req, 0, sizeof(req)); @@ -131,36 +133,28 @@ int virInitctlSetRunLevel(virInitctlRunLevel level, /* Yes it is an 'int' field, but wants a numeric character. Go figure */ req.runlevel = '0' + level; - if (vroot) { - if (virAsprintf(&path, "%s/%s", vroot, VIR_INITCTL_FIFO) < 0) - return -1; - } else { - if (VIR_STRDUP(path, VIR_INITCTL_FIFO) < 0) - return -1; - } - - if ((fd = open(path, O_WRONLY|O_NONBLOCK|O_CLOEXEC|O_NOCTTY)) < 0) { + if ((fd = open(VIR_INITCTL_FIFO, + O_WRONLY|O_NONBLOCK|O_CLOEXEC|O_NOCTTY)) < 0) { if (errno == ENOENT) { ret = 0; goto cleanup; } virReportSystemError(errno, _("Cannot open init control %s"), - path); + VIR_INITCTL_FIFO); goto cleanup; } if (safewrite(fd, &req, sizeof(req)) != sizeof(req)) { virReportSystemError(errno, _("Failed to send request to init control %s"), - path); + VIR_INITCTL_FIFO); goto cleanup; } ret = 1; cleanup: - VIR_FREE(path); VIR_FORCE_CLOSE(fd); return ret; } diff --git a/src/util/virinitctl.h b/src/util/virinitctl.h index 862539f..18052c0 100644 --- a/src/util/virinitctl.h +++ b/src/util/virinitctl.h @@ -1,7 +1,7 @@ /* * virinitctl.h: API for talking to init systems via initctl * - * Copyright (C) 2012 Red Hat, Inc. + * Copyright (C) 2012-2013 Red Hat, Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -37,7 +37,6 @@ enum virInitctlRunLevel { VIR_INITCTL_RUNLEVEL_LAST }; -int virInitctlSetRunLevel(virInitctlRunLevel level, - const char *vroot); +int virInitctlSetRunLevel(virInitctlRunLevel level); #endif -- 1.8.4.2

On 12/23/2013 10:55 PM, Eric Blake wrote:
We are still awaiting a CVE number to be assigned,
Wow, that was fast. I just learned that this is assigned CVE-2013-6456.
but Reco reported in Debian bug #732394 that a malicious guest could cause virDomainShutdown and virDomainReboot to cause the host to misbehave, if the host blindly follows symlinks in its own mount namespace instead of the guest's namespace.
-- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

Hi. On Mon, 23 Dec 2013 23:00:43 -0700 Eric Blake <eblake@redhat.com> wrote:
On 12/23/2013 10:55 PM, Eric Blake wrote:
We are still awaiting a CVE number to be assigned,
Wow, that was fast. I just learned that this is assigned CVE-2013-6456.
Exellent job with these patches. Was worth waiting for them, that's for sure. I've applied these patches to my testing environment, and, to my big surprise, with these patches enabled, 'virsh -c lxc:// shutdown' forces libvirtd to terminate itself and all its' children. A relevant part of strace is (3933 is a parent, 4038 is a child from the fork, syscall_308 is setns): 3933 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID| CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f90ed0aa9d0) = 4038 3933 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 3933 syscall_308(0x16, 0, 0, 0xf5c, 0xf5d, …) = -1 (errno 22) 3933 exit_group(125) = ? 4038 syscall_308(0x16, 0, 0, 0, 0, …) = -1 (errno 22) 4038 open("/dev/initctl", O_WRONLY|O_NOCTTY|O_NONBLOCK|O_CLOEXEC) = -1 ENOENT (No such file or directory) 4038 exit_group(0) = ? My best guess it that changed virFork forces libvirtd to misbehave, but I could be wrong. Sincerely yours, Reco.

On 12/24/2013 12:08 AM, Reco wrote:
A relevant part of strace is (3933 is a parent, 4038 is a child from the fork, syscall_308 is setns):
3933 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID| CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f90ed0aa9d0) = 4038 3933 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 3933 syscall_308(0x16, 0, 0, 0xf5c, 0xf5d, …) = -1 (errno 22)
Whoops - why is the parent trying to call the same thing...
3933 exit_group(125) = ? 4038 syscall_308(0x16, 0, 0, 0, 0, …) = -1 (errno 22)
...as the child? Oh, I see. [I _did_ say I hadn't tested patches 6 and 7 as much as the first five.] Please squash in this typo fix to 6/7, and that should clean up the problem (but I do appreciate you testing it): diff --git i/src/util/virprocess.c w/src/util/virprocess.c index c99b75a..e069483 100644 --- i/src/util/virprocess.c +++ w/src/util/virprocess.c @@ -879,7 +879,7 @@ virProcessRunInMountNamespace(pid_t pid, goto cleanup; } - if ((cpid = virFork() < 0)) + if ((cpid = virFork()) < 0) goto cleanup; if (cpid == 0) { /* child */ -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Tue, 24 Dec 2013 06:29:11 -0700 Eric Blake <eblake@redhat.com> wrote:
diff --git i/src/util/virprocess.c w/src/util/virprocess.c index c99b75a..e069483 100644 --- i/src/util/virprocess.c +++ w/src/util/virprocess.c @@ -879,7 +879,7 @@ virProcessRunInMountNamespace(pid_t pid, goto cleanup; }
- if ((cpid = virFork() < 0)) + if ((cpid = virFork()) < 0) goto cleanup; if (cpid == 0) { /* child */
Thanks, that solves it. With this extra patch libvirtd writes to the container's /dev/initctl only and terminates child process only. Reco

On 12/24/2013 06:45 AM, Reco wrote:
On Tue, 24 Dec 2013 06:29:11 -0700 Eric Blake <eblake@redhat.com> wrote:
diff --git i/src/util/virprocess.c w/src/util/virprocess.c index c99b75a..e069483 100644 --- i/src/util/virprocess.c +++ w/src/util/virprocess.c @@ -879,7 +879,7 @@ virProcessRunInMountNamespace(pid_t pid, goto cleanup; }
- if ((cpid = virFork() < 0)) + if ((cpid = virFork()) < 0) goto cleanup; if (cpid == 0) { /* child */
Thanks, that solves it. With this extra patch libvirtd writes to the container's /dev/initctl only and terminates child process only.
Thanks again for the functional review. I'm still waiting for a code review from anyone willing, since this does fix a security issue and I don't want to introduce an unintentional regression. And I guess there's still the need to fix the access to the namespace /dev during device hotplog... -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 01/07/2014 12:18 PM, Eric Blake wrote:
On 12/24/2013 06:45 AM, Reco wrote:
On Tue, 24 Dec 2013 06:29:11 -0700 Eric Blake <eblake@redhat.com> wrote:
diff --git i/src/util/virprocess.c w/src/util/virprocess.c index c99b75a..e069483 100644 --- i/src/util/virprocess.c +++ w/src/util/virprocess.c @@ -879,7 +879,7 @@ virProcessRunInMountNamespace(pid_t pid, goto cleanup; }
- if ((cpid = virFork() < 0)) + if ((cpid = virFork()) < 0) goto cleanup; if (cpid == 0) { /* child */
Thanks, that solves it. With this extra patch libvirtd writes to the container's /dev/initctl only and terminates child process only.
Thanks again for the functional review. I'm still waiting for a code review from anyone willing, since this does fix a security issue and I don't want to introduce an unintentional regression. And I guess there's still the need to fix the access to the namespace /dev during device hotplog...
Yes, device hotplug has the same problem. ACK to this serial. Thanks!

On 01/07/2014 06:32 PM, Gao feng wrote:
On 01/07/2014 12:18 PM, Eric Blake wrote:
On 12/24/2013 06:45 AM, Reco wrote:
On Tue, 24 Dec 2013 06:29:11 -0700 Eric Blake <eblake@redhat.com> wrote:
diff --git i/src/util/virprocess.c w/src/util/virprocess.c index c99b75a..e069483 100644 --- i/src/util/virprocess.c +++ w/src/util/virprocess.c @@ -879,7 +879,7 @@ virProcessRunInMountNamespace(pid_t pid, goto cleanup; }
- if ((cpid = virFork() < 0)) + if ((cpid = virFork()) < 0) goto cleanup; if (cpid == 0) { /* child */
Thanks, that solves it. With this extra patch libvirtd writes to the container's /dev/initctl only and terminates child process only.
Thanks again for the functional review. I'm still waiting for a code review from anyone willing, since this does fix a security issue and I don't want to introduce an unintentional regression. And I guess there's still the need to fix the access to the namespace /dev during device hotplog...
Yes, device hotplug has the same problem. ACK to this serial.
s/serial/series/ (English is weird) I've pushed patch 1, but am seeing if I can work up patches for the /dev issue before I push any others (in particular, if that work turns up any need to rethink the strategy, I'd like to avoid the churn - because I still want this CVE fixed in time for the 1.2.1 release). -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
participants (4)
-
Eric Blake
-
Gao feng
-
Guido Günther
-
Reco