[libvirt] [PATCH 00/15] Permit setting capabilities for uid!=0 processes

There are a bunch of patches here, but each is small and single-purpose to make reviewing easier (and also so that any potential regressions can be easily bisected). The original purpose of the patches was to permit setting CAP_COMPROMISE_KERNEL for non-root qemu processes, since Fedora 18 now requires that in order for generic PCI passthrough to work (the alternative was to always run qemu as root). Although we may not actually want to do that part (if we can convince kernel people to implement CAP_COMPROMISE_KERNEL such that it's only required when *opening* the necessary sysfs file (done by libvirt), rather than for every read/write (done by qemu), then we will not need CAP_COMPROMISE_KERNEL for qemu), but that is just a couple lines in the final patch, and the rest of the series is still useful, as it make dropping/keeping caps truly work for non-root child processes - this has never before been the case. (for example, CAP_SYS_RAWIO is needed for generic scsi passthrough to work, and until now the only way to have that was to run *all* qemus as root). A bit higher level description of what I've done with all the patches: 1) remove the programmable "hook" from virExecWithHook(), since that function was only called from one place, and always with the same hook function. Rename virExecWithHook() to virExec(), and replace the call to that hook with inline code. 2) give virCommand an API to set the intended uid/gid of the command that's going to be run, and use that instead of a "user hook" where appropriate (in the process completely eliminating two hook functions). 3) Also add an API to virCommand to do the final "set the process label" step for selinux/apparmor. 4) Add a new API to the security driver (and use it from qemu) called virSecurityManagerSetChildProcessLabel() which a) is called prior to virCommandRun() rather than from a command "hook" function, b) takes a virCommand, and c) rather than immediately performing the operation (as virSecurityManagerSetProcessLabel() did), merely stores the necessary information in the virCommand so that virExec can perform the operation (setting selinux label, setuid/gid, etc) 5) make a new function combining the setting of uid/gid and maintaining of capabilities, because that is the only way you can set uid!=0 and still maintain capabilities. Use this in virExec() 6) *Finally* set the CAP_COMPROMISE_KERNEL capability unconditionally for all qemu processes. (If we really do have to do this, we may want to consider making it a qemu.conf setting). Laine Stump (15): util: eliminate generic hook from virExecWithHook util: eliminate extra args from virExec util: refactor virCommandHook into virExec and virCommandHandshakeChild util: add virCommandSetUID and virCommandSetGID util: make virSetUIDGID a NOP when uid or gid is -1 qemu: replace exec hook with virCommandSetUID/GID in qemuCaps* qemu: replace exec hook with virCommandSetUID/GID in storage_backend build: define SECDRIVER_LIBS in Makefile.am util: add security label setting to virCommand security: add new virSecurityManagerSetChildProcessLabel API qemu: let virCommand set child process security labels/uid/gid util: drop capabilities immediately after changing uid/gid of child util: virSetUIDGIDWithCaps - change uid while keeping caps util: maintain caps when running command with uid != 0 qemu: set CAP_COMPROMISE_KERNEL so that pci passthrough works src/Makefile.am | 35 ++- src/libvirt_private.syms | 5 + src/qemu/qemu_capabilities.c | 61 ++--- src/qemu/qemu_process.c | 23 +- src/security/security_apparmor.c | 41 +++- src/security/security_dac.c | 24 +- src/security/security_driver.h | 6 +- src/security/security_manager.c | 13 +- src/security/security_manager.h | 6 +- src/security/security_nop.c | 10 +- src/security/security_selinux.c | 34 ++- src/security/security_stack.c | 20 +- src/storage/storage_backend.c | 28 +-- src/util/vircommand.c | 481 ++++++++++++++++++--------------------- src/util/vircommand.h | 9 +- src/util/virutil.c | 113 ++++++++- src/util/virutil.h | 1 + 17 files changed, 543 insertions(+), 367 deletions(-) -- 1.8.1

virExecWithHook is only called from one place, so it always has the same "hook" function (virHookCommand), and the data sent to that function is always a virCommandPtr, so eliminate the function and generic data from the arglist, and replace it with "virCommandPtr cmd". The call to (hook)(data) is replaced with "virHookCommand(cmd)". Finally, virExecWithHook is renamed to virExec. Indentation has been updated only for code that will remain after the next patch, which will remove all other args to virExec (since they are now redundant, as they're all members of virCommandPtr). --- src/util/vircommand.c | 79 ++++++++++++++++++++++++--------------------------- 1 file changed, 37 insertions(+), 42 deletions(-) diff --git a/src/util/vircommand.c b/src/util/vircommand.c index db7dbca..02bf50d 100644 --- a/src/util/vircommand.c +++ b/src/util/vircommand.c @@ -1,7 +1,7 @@ /* * vircommand.c: Child command execution * - * Copyright (C) 2010-2012 Red Hat, Inc. + * Copyright (C) 2010-2013 Red Hat, Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -45,7 +45,7 @@ #define VIR_FROM_THIS VIR_FROM_NONE -/* Flags for virExecWithHook */ +/* Flags for virExec */ enum { VIR_EXEC_NONE = 0, VIR_EXEC_NONBLOCK = (1 << 0), @@ -104,6 +104,8 @@ struct _virCommand { unsigned long long capabilities; }; +static int virCommandHook(virCommandPtr cmd); + /* * virCommandFDIsSet: * @fd: FD to test @@ -390,21 +392,19 @@ prepareStdFd(int fd, int std) * VIR_EXEC_NONE : Default function behavior * VIR_EXEC_NONBLOCK : Set child process output fd's as non-blocking * VIR_EXEC_DAEMON : Daemonize the child process - * @hook optional virExecHook function to call prior to exec - * @data data to pass to the hook function + * @cmd virCommandPtr to pass to virCommandHook * @pidfile path to use as pidfile for daemonized process (needs DAEMON flag) * @capabilities capabilities to keep */ static int -virExecWithHook(const char *const*argv, +virExec(virCommandPtr cmd, + const char *const*argv, const char *const*envp, const int *keepfd, int keepfd_size, pid_t *retpid, int infd, int *outfd, int *errfd, unsigned int flags, - virExecHook hook, - void *data, char *pidfile, unsigned long long capabilities) { @@ -417,6 +417,7 @@ virExecWithHook(const char *const*argv, int tmpfd; const char *binary = NULL; int forkRet; + struct sigaction waxon, waxoff; if (argv[0][0] != '/') { if (!(binary = virFindFileInPath(argv[0]))) { @@ -602,33 +603,30 @@ virExecWithHook(const char *const*argv, } } - if (hook) { - /* virFork reset all signal handlers to the defaults. - * This is good for the child process, but our hook - * risks running something that generates SIGPIPE, - * so we need to temporarily block that again - */ - struct sigaction waxon, waxoff; - memset(&waxoff, 0, sizeof(waxoff)); - waxoff.sa_handler = SIG_IGN; - sigemptyset(&waxoff.sa_mask); - memset(&waxon, 0, sizeof(waxon)); - if (sigaction(SIGPIPE, &waxoff, &waxon) < 0) { - virReportSystemError(errno, "%s", - _("Could not disable SIGPIPE")); - goto fork_error; - } + /* virFork reset all signal handlers to the defaults. + * This is good for the child process, but our hook + * risks running something that generates SIGPIPE, + * so we need to temporarily block that again + */ + memset(&waxoff, 0, sizeof(waxoff)); + waxoff.sa_handler = SIG_IGN; + sigemptyset(&waxoff.sa_mask); + memset(&waxon, 0, sizeof(waxon)); + if (sigaction(SIGPIPE, &waxoff, &waxon) < 0) { + virReportSystemError(errno, "%s", + _("Could not disable SIGPIPE")); + goto fork_error; + } - if ((hook)(data) != 0) { - VIR_DEBUG("Hook function failed."); - goto fork_error; - } + if (virCommandHook(cmd) != 0) { + VIR_DEBUG("Hook function failed."); + goto fork_error; + } - if (sigaction(SIGPIPE, &waxon, NULL) < 0) { - virReportSystemError(errno, "%s", - _("Could not re-enable SIGPIPE")); - goto fork_error; - } + if (sigaction(SIGPIPE, &waxon, NULL) < 0) { + virReportSystemError(errno, "%s", + _("Could not re-enable SIGPIPE")); + goto fork_error; } /* The steps above may need todo something privileged, so @@ -712,7 +710,8 @@ virRun(const char *const *argv ATTRIBUTE_UNUSED, } static int -virExecWithHook(const char *const*argv ATTRIBUTE_UNUSED, +virExec(virCommandPtr cmd ATTRIBUTE_UNUSED, + const char *const*argv ATTRIBUTE_UNUSED, const char *const*envp ATTRIBUTE_UNUSED, const int *keepfd ATTRIBUTE_UNUSED, int keepfd_size ATTRIBUTE_UNUSED, @@ -721,15 +720,13 @@ virExecWithHook(const char *const*argv ATTRIBUTE_UNUSED, int *outfd ATTRIBUTE_UNUSED, int *errfd ATTRIBUTE_UNUSED, int flags_unused ATTRIBUTE_UNUSED, - virExecHook hook ATTRIBUTE_UNUSED, - void *data ATTRIBUTE_UNUSED, char *pidfile ATTRIBUTE_UNUSED, unsigned long long capabilities ATTRIBUTE_UNUSED) { /* XXX: Some day we can implement pieces of virCommand/virExec on * top of _spawn() or CreateProcess(), but we can't implement * everything, since mingw completely lacks fork(), so we cannot - * run hook code in the child. */ + * run our own code in the child process. */ virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("virExec is not implemented for WIN32")); return -1; @@ -2061,9 +2058,8 @@ virCommandRun(virCommandPtr cmd, int *exitstatus) * Perform all virCommand-specific actions, along with the user hook. */ static int -virCommandHook(void *data) +virCommandHook(virCommandPtr cmd) { - virCommandPtr cmd = data; int res = 0; if (cmd->hook) { @@ -2401,7 +2397,8 @@ virCommandRunAsync(virCommandPtr cmd, pid_t *pid) VIR_DEBUG("About to run %s", str ? str : cmd->args[0]); VIR_FREE(str); - ret = virExecWithHook((const char *const *)cmd->args, + ret = virExec(cmd, + (const char *const *)cmd->args, (const char *const *)cmd->env, cmd->preserve, cmd->preserve_size, @@ -2410,8 +2407,6 @@ virCommandRunAsync(virCommandPtr cmd, pid_t *pid) cmd->outfdptr, cmd->errfdptr, cmd->flags, - virCommandHook, - cmd, cmd->pidfile, cmd->capabilities); @@ -2549,7 +2544,7 @@ void virCommandAbort(virCommandPtr cmd ATTRIBUTE_UNUSED) { /* Mingw lacks WNOHANG and kill(). But since we haven't ported - * virExecWithHook to mingw yet, there's no process to be killed, + * virExec to mingw yet, there's no process to be killed, * making this implementation trivially correct for now :) */ } #endif -- 1.8.1

On 02/07/2013 02:37 PM, Laine Stump wrote:
virExecWithHook is only called from one place, so it always has the same "hook" function (virHookCommand), and the data sent to that function is always a virCommandPtr, so eliminate the function and generic data from the arglist, and replace it with "virCommandPtr cmd". The call to (hook)(data) is replaced with "virHookCommand(cmd)". Finally, virExecWithHook is renamed to virExec.
Indentation has been updated only for code that will remain after the next patch, which will remove all other args to virExec (since they are now redundant, as they're all members of virCommandPtr). --- src/util/vircommand.c | 79 ++++++++++++++++++++++++--------------------------- 1 file changed, 37 insertions(+), 42 deletions(-)
ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

All args except "cmd" in the call to virExec are now redundant, since they can all be found in cmd, so remove the args and reference the data directly in cmd. One exception to this is that "infd" was being modified within virExec, and modifying the original in cmd caused make check failures, so cmd->infd is copied to a local, and the local is used during virExec(). --- src/util/vircommand.c | 142 +++++++++++++++++--------------------------------- 1 file changed, 49 insertions(+), 93 deletions(-) diff --git a/src/util/vircommand.c b/src/util/vircommand.c index 02bf50d..b41c78b 100644 --- a/src/util/vircommand.c +++ b/src/util/vircommand.c @@ -378,40 +378,18 @@ prepareStdFd(int fd, int std) } /* - * @argv argv to exec - * @envp optional environment to use for exec - * @keepfd options fd_ret to keep open for child process - * @retpid optional pointer to store child process pid - * @infd optional file descriptor to use as child input, otherwise /dev/null - * @outfd optional pointer to communicate output fd behavior - * outfd == NULL : Use /dev/null - * *outfd == -1 : Use a new fd - * *outfd != -1 : Use *outfd - * @errfd optional pointer to communcate error fd behavior. See outfd - * @flags possible combination of the following: - * VIR_EXEC_NONE : Default function behavior - * VIR_EXEC_NONBLOCK : Set child process output fd's as non-blocking - * VIR_EXEC_DAEMON : Daemonize the child process - * @cmd virCommandPtr to pass to virCommandHook - * @pidfile path to use as pidfile for daemonized process (needs DAEMON flag) - * @capabilities capabilities to keep + * virExec: + * @cmd virCommandPtr containing all information about the program to + * exec. */ static int -virExec(virCommandPtr cmd, - const char *const*argv, - const char *const*envp, - const int *keepfd, - int keepfd_size, - pid_t *retpid, - int infd, int *outfd, int *errfd, - unsigned int flags, - char *pidfile, - unsigned long long capabilities) +virExec(virCommandPtr cmd) { pid_t pid; int null = -1, i, openmax; int pipeout[2] = {-1,-1}; int pipeerr[2] = {-1,-1}; + int childin = cmd->infd; int childout = -1; int childerr = -1; int tmpfd; @@ -419,41 +397,41 @@ virExec(virCommandPtr cmd, int forkRet; struct sigaction waxon, waxoff; - if (argv[0][0] != '/') { - if (!(binary = virFindFileInPath(argv[0]))) { + if (cmd->args[0][0] != '/') { + if (!(binary = virFindFileInPath(cmd->args[0]))) { virReportSystemError(ENOENT, _("Cannot find '%s' in path"), - argv[0]); + cmd->args[0]); return -1; } } else { - binary = argv[0]; + binary = cmd->args[0]; } - if (infd < 0) { + if (childin < 0) { if (getDevNull(&null) < 0) goto cleanup; - infd = null; + childin = null; } - if (outfd != NULL) { - if (*outfd == -1) { + if (cmd->outfdptr != NULL) { + if (*cmd->outfdptr == -1) { if (pipe2(pipeout, O_CLOEXEC) < 0) { virReportSystemError(errno, "%s", _("cannot create pipe")); goto cleanup; } - if ((flags & VIR_EXEC_NONBLOCK) && + if ((cmd->flags & VIR_EXEC_NONBLOCK) && virSetNonBlock(pipeout[0]) == -1) { - virReportSystemError(errno, - "%s", _("Failed to set non-blocking file descriptor flag")); + virReportSystemError(errno, "%s", + _("Failed to set non-blocking file descriptor flag")); goto cleanup; } childout = pipeout[1]; } else { - childout = *outfd; + childout = *cmd->outfdptr; } } else { if (getDevNull(&null) < 0) @@ -461,26 +439,26 @@ virExec(virCommandPtr cmd, childout = null; } - if (errfd != NULL) { - if (errfd == outfd) { + if (cmd->errfdptr != NULL) { + if (cmd->errfdptr == cmd->outfdptr) { childerr = childout; - } else if (*errfd == -1) { + } else if (*cmd->errfdptr == -1) { if (pipe2(pipeerr, O_CLOEXEC) < 0) { virReportSystemError(errno, "%s", _("Failed to create pipe")); goto cleanup; } - if ((flags & VIR_EXEC_NONBLOCK) && + if ((cmd->flags & VIR_EXEC_NONBLOCK) && virSetNonBlock(pipeerr[0]) == -1) { - virReportSystemError(errno, - "%s", _("Failed to set non-blocking file descriptor flag")); + virReportSystemError(errno, "%s", + _("Failed to set non-blocking file descriptor flag")); goto cleanup; } childerr = pipeerr[1]; } else { - childerr = *errfd; + childerr = *cmd->errfdptr; } } else { if (getDevNull(&null) < 0) @@ -500,18 +478,18 @@ virExec(virCommandPtr cmd, } VIR_FORCE_CLOSE(null); - if (outfd && *outfd == -1) { + if (cmd->outfdptr && *cmd->outfdptr == -1) { VIR_FORCE_CLOSE(pipeout[1]); - *outfd = pipeout[0]; + *cmd->outfdptr = pipeout[0]; } - if (errfd && *errfd == -1) { + if (cmd->errfdptr && *cmd->errfdptr == -1) { VIR_FORCE_CLOSE(pipeerr[1]); - *errfd = pipeerr[0]; + *cmd->errfdptr = pipeerr[0]; } - *retpid = pid; + cmd->pid = pid; - if (binary != argv[0]) + if (binary != cmd->args[0]) VIR_FREE(binary); return 0; @@ -528,9 +506,10 @@ virExec(virCommandPtr cmd, openmax = sysconf(_SC_OPEN_MAX); for (i = 3; i < openmax; i++) { - if (i == infd || i == childout || i == childerr) + if (i == childin || i == childout || i == childerr) continue; - if (!keepfd || !virCommandFDIsSet(i, keepfd, keepfd_size)) { + if (!cmd->preserve || + !virCommandFDIsSet(i, cmd->preserve, cmd->preserve_size)) { tmpfd = i; VIR_MASS_CLOSE(tmpfd); } else if (virSetInherit(i, true) < 0) { @@ -539,7 +518,7 @@ virExec(virCommandPtr cmd, } } - if (prepareStdFd(infd, STDIN_FILENO) < 0) { + if (prepareStdFd(childin, STDIN_FILENO) < 0) { virReportSystemError(errno, "%s", _("failed to setup stdin file handle")); goto fork_error; @@ -555,9 +534,9 @@ virExec(virCommandPtr cmd, goto fork_error; } - if (infd != STDIN_FILENO && infd != null && infd != childerr && - infd != childout) - VIR_FORCE_CLOSE(infd); + if (childin != STDIN_FILENO && childin != null && + childin != childerr && childin != childout) + VIR_FORCE_CLOSE(childin); if (childout > STDERR_FILENO && childout != null && childout != childerr) VIR_FORCE_CLOSE(childout); if (childerr > STDERR_FILENO && childerr != null) @@ -569,7 +548,7 @@ virExec(virCommandPtr cmd, /* Daemonize as late as possible, so the parent process can detect * the above errors with wait* */ - if (flags & VIR_EXEC_DAEMON) { + if (cmd->flags & VIR_EXEC_DAEMON) { if (setsid() < 0) { virReportSystemError(errno, "%s", _("cannot become session leader")); @@ -590,13 +569,13 @@ virExec(virCommandPtr cmd, } if (pid > 0) { - if (pidfile && (virPidFileWritePath(pidfile,pid) < 0)) { + if (cmd->pidfile && (virPidFileWritePath(cmd->pidfile, pid) < 0)) { kill(pid, SIGTERM); usleep(500*1000); kill(pid, SIGTERM); virReportSystemError(errno, _("could not write pidfile %s for %d"), - pidfile, pid); + cmd->pidfile, pid); goto fork_error; } _exit(0); @@ -631,21 +610,21 @@ virExec(virCommandPtr cmd, /* The steps above may need todo something privileged, so * we delay clearing capabilities until the last minute */ - if (capabilities || (flags & VIR_EXEC_CLEAR_CAPS)) - if (virSetCapabilities(capabilities) < 0) + if (cmd->capabilities || (cmd->flags & VIR_EXEC_CLEAR_CAPS)) + if (virSetCapabilities(cmd->capabilities) < 0) goto fork_error; /* Close logging again to ensure no FDs leak to child */ virLogReset(); - if (envp) - execve(binary, (char **) argv, (char**)envp); + if (cmd->env) + execve(binary, cmd->args, cmd->env); else - execv(binary, (char **) argv); + execv(binary, cmd->args); virReportSystemError(errno, _("cannot execute binary %s"), - argv[0]); + cmd->args[0]); fork_error: virDispatchError(NULL); @@ -655,7 +634,7 @@ virExec(virCommandPtr cmd, /* This is cleanup of parent process only - child should never jump here on error */ - if (binary != argv[0]) + if (binary != cmd->args[0]) VIR_FREE(binary); /* NB we don't virReportError() on any failures here @@ -710,18 +689,7 @@ virRun(const char *const *argv ATTRIBUTE_UNUSED, } static int -virExec(virCommandPtr cmd ATTRIBUTE_UNUSED, - const char *const*argv ATTRIBUTE_UNUSED, - const char *const*envp ATTRIBUTE_UNUSED, - const int *keepfd ATTRIBUTE_UNUSED, - int keepfd_size ATTRIBUTE_UNUSED, - pid_t *retpid ATTRIBUTE_UNUSED, - int infd ATTRIBUTE_UNUSED, - int *outfd ATTRIBUTE_UNUSED, - int *errfd ATTRIBUTE_UNUSED, - int flags_unused ATTRIBUTE_UNUSED, - char *pidfile ATTRIBUTE_UNUSED, - unsigned long long capabilities ATTRIBUTE_UNUSED) +virExec(virCommandPtr cmd ATTRIBUTE_UNUSED) { /* XXX: Some day we can implement pieces of virCommand/virExec on * top of _spawn() or CreateProcess(), but we can't implement @@ -2397,19 +2365,7 @@ virCommandRunAsync(virCommandPtr cmd, pid_t *pid) VIR_DEBUG("About to run %s", str ? str : cmd->args[0]); VIR_FREE(str); - ret = virExec(cmd, - (const char *const *)cmd->args, - (const char *const *)cmd->env, - cmd->preserve, - cmd->preserve_size, - &cmd->pid, - cmd->infd, - cmd->outfdptr, - cmd->errfdptr, - cmd->flags, - cmd->pidfile, - cmd->capabilities); - + ret = virExec(cmd); VIR_DEBUG("Command result %d, with PID %d", ret, (int)cmd->pid); -- 1.8.1

On 02/07/2013 02:37 PM, Laine Stump wrote:
All args except "cmd" in the call to virExec are now redundant, since they can all be found in cmd, so remove the args and reference the data directly in cmd. One exception to this is that "infd" was being modified within virExec, and modifying the original in cmd caused make check failures, so cmd->infd is copied to a local, and the local is used during virExec(). --- src/util/vircommand.c | 142 +++++++++++++++++--------------------------------- 1 file changed, 49 insertions(+), 93 deletions(-)
ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

--- src/util/vircommand.c | 149 +++++++++++++++++++++++--------------------------- 1 file changed, 68 insertions(+), 81 deletions(-) diff --git a/src/util/vircommand.c b/src/util/vircommand.c index b41c78b..ba81c14 100644 --- a/src/util/vircommand.c +++ b/src/util/vircommand.c @@ -104,7 +104,7 @@ struct _virCommand { unsigned long long capabilities; }; -static int virCommandHook(virCommandPtr cmd); +static int virCommandHandshakeChild(virCommandPtr cmd); /* * virCommandFDIsSet: @@ -394,7 +394,7 @@ virExec(virCommandPtr cmd) int childerr = -1; int tmpfd; const char *binary = NULL; - int forkRet; + int forkRet, ret; struct sigaction waxon, waxoff; if (cmd->args[0][0] != '/') { @@ -597,11 +597,26 @@ virExec(virCommandPtr cmd) goto fork_error; } - if (virCommandHook(cmd) != 0) { - VIR_DEBUG("Hook function failed."); - goto fork_error; + if (cmd->hook) { + VIR_DEBUG("Run hook %p %p", cmd->hook, cmd->opaque); + ret = cmd->hook(cmd->opaque); + VIR_DEBUG("Done hook %d", ret); + if (ret < 0) + goto fork_error; } + if (cmd->pwd) { + VIR_DEBUG("Running child in %s", cmd->pwd); + if (chdir(cmd->pwd) < 0) { + virReportSystemError(errno, + _("Unable to change to %s"), cmd->pwd); + goto fork_error; + } + } + + if (virCommandHandshakeChild(cmd) < 0) + goto fork_error; + if (sigaction(SIGPIPE, &waxon, NULL) < 0) { virReportSystemError(errno, "%s", _("Could not re-enable SIGPIPE")); @@ -2022,82 +2037,6 @@ virCommandRun(virCommandPtr cmd, int *exitstatus) } -/* - * Perform all virCommand-specific actions, along with the user hook. - */ -static int -virCommandHook(virCommandPtr cmd) -{ - int res = 0; - - if (cmd->hook) { - VIR_DEBUG("Run hook %p %p", cmd->hook, cmd->opaque); - res = cmd->hook(cmd->opaque); - VIR_DEBUG("Done hook %d", res); - } - if (res == 0 && cmd->pwd) { - VIR_DEBUG("Running child in %s", cmd->pwd); - res = chdir(cmd->pwd); - if (res < 0) { - virReportSystemError(errno, - _("Unable to change to %s"), cmd->pwd); - } - } - if (cmd->handshake) { - char c = res < 0 ? '0' : '1'; - int rv; - VIR_DEBUG("Notifying parent for handshake start on %d", - cmd->handshakeWait[1]); - if (safewrite(cmd->handshakeWait[1], &c, sizeof(c)) != sizeof(c)) { - virReportSystemError(errno, "%s", - _("Unable to notify parent process")); - return -1; - } - - /* On failure we pass the error message back to parent, - * so they don't have to dig through stderr logs - */ - if (res < 0) { - virErrorPtr err = virGetLastError(); - const char *msg = err ? err->message : - _("Unknown failure during hook execution"); - size_t len = strlen(msg) + 1; - if (safewrite(cmd->handshakeWait[1], msg, len) != len) { - virReportSystemError(errno, "%s", - _("Unable to send error to parent")); - return -1; - } - return -1; - } - - VIR_DEBUG("Waiting on parent for handshake complete on %d", - cmd->handshakeNotify[0]); - if ((rv = saferead(cmd->handshakeNotify[0], &c, - sizeof(c))) != sizeof(c)) { - if (rv < 0) - virReportSystemError(errno, "%s", - _("Unable to wait on parent process")); - else - virReportSystemError(EIO, "%s", - _("libvirtd quit during handshake")); - return -1; - } - if (c != '1') { - virReportSystemError(EINVAL, - _("Unexpected confirm code '%c' from parent"), - c); - return -1; - } - VIR_FORCE_CLOSE(cmd->handshakeWait[1]); - VIR_FORCE_CLOSE(cmd->handshakeNotify[0]); - } - - VIR_DEBUG("Hook is done %d", res); - - return res; -} - - static void virCommandHandleReadWrite(int watch, int fd, int events, void *opaque) { @@ -2546,6 +2485,54 @@ void virCommandRequireHandshake(virCommandPtr cmd) cmd->handshake = true; } +/* virCommandHandshakeChild: + * + * child side of handshake - called by child process in virExec() to + * indicate to parent that the child process has successfully + * completed its pre-exec initialization. + */ +static int +virCommandHandshakeChild(virCommandPtr cmd) +{ + char c = '1'; + int rv; + + if (!cmd->handshake) + return true; + + VIR_DEBUG("Notifying parent for handshake start on %d", + cmd->handshakeWait[1]); + if (safewrite(cmd->handshakeWait[1], &c, sizeof(c)) != sizeof(c)) { + virReportSystemError(errno, "%s", + _("Unable to notify parent process")); + return -1; + } + + VIR_DEBUG("Waiting on parent for handshake complete on %d", + cmd->handshakeNotify[0]); + if ((rv = saferead(cmd->handshakeNotify[0], &c, + sizeof(c))) != sizeof(c)) { + if (rv < 0) + virReportSystemError(errno, "%s", + _("Unable to wait on parent process")); + else + virReportSystemError(EIO, "%s", + _("libvirtd quit during handshake")); + return -1; + } + if (c != '1') { + virReportSystemError(EINVAL, + _("Unexpected confirm code '%c' from parent"), + c); + return -1; + } + VIR_FORCE_CLOSE(cmd->handshakeWait[1]); + VIR_FORCE_CLOSE(cmd->handshakeNotify[0]); + + VIR_DEBUG("Handshake with parent is done"); + return 0; +} + /** * virCommandHandshakeWait: * @cmd: command to wait on -- 1.8.1

On 02/07/2013 02:37 PM, Laine Stump wrote:
--- src/util/vircommand.c | 149 +++++++++++++++++++++++--------------------------- 1 file changed, 68 insertions(+), 81 deletions(-)
ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

If a uid and/or gid is specified for a command, it will be set just after the user-supplied post-fork "hook" function is called. The intent is that this can replace user hook functions that set uid/gid. This moves the setting of uid/gid and dropping of capabilities closer to each other, which is important since the two should really be done at the same time (libcapng provides a single function that does both, which we will be unable to use, but want to mimic as closely as possible). --- src/libvirt_private.syms | 2 ++ src/util/vircommand.c | 26 ++++++++++++++++++++++++++ src/util/vircommand.h | 6 +++++- 3 files changed, 33 insertions(+), 1 deletion(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 57e3eb4..83d83ad 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -159,12 +159,14 @@ virCommandRun; virCommandRunAsync; virCommandSetErrorBuffer; virCommandSetErrorFD; +virCommandSetGID; virCommandSetInputBuffer; virCommandSetInputFD; virCommandSetOutputBuffer; virCommandSetOutputFD; virCommandSetPidFile; virCommandSetPreExecHook; +virCommandSetUID; virCommandSetWorkingDirectory; virCommandToString; virCommandTransferFD; diff --git a/src/util/vircommand.c b/src/util/vircommand.c index ba81c14..65838d1 100644 --- a/src/util/vircommand.c +++ b/src/util/vircommand.c @@ -101,6 +101,8 @@ struct _virCommand { char *pidfile; bool reap; + uid_t uid; + gid_t gid; unsigned long long capabilities; }; @@ -605,6 +607,12 @@ virExec(virCommandPtr cmd) goto fork_error; } + if (cmd->uid > 0 || cmd->gid > 0) { + VIR_DEBUG("Setting child uid:gid to %u:%u", cmd->uid, cmd->gid); + if (virSetUIDGID(cmd->uid, cmd->gid) < 0) + goto fork_error; + } + if (cmd->pwd) { VIR_DEBUG("Running child in %s", cmd->pwd); if (chdir(cmd->pwd) < 0) { @@ -905,6 +913,24 @@ virCommandSetPidFile(virCommandPtr cmd, const char *pidfile) } +void +virCommandSetGID(virCommandPtr cmd, gid_t gid) +{ + if (!cmd || cmd->has_error) + return; + + cmd->gid = gid; +} + +void +virCommandSetUID(virCommandPtr cmd, uid_t uid) +{ + if (!cmd || cmd->has_error) + return; + + cmd->uid = uid; +} + /** * virCommandClearCaps: * @cmd: the command to modify diff --git a/src/util/vircommand.h b/src/util/vircommand.h index c1a2e24..ac940f0 100644 --- a/src/util/vircommand.h +++ b/src/util/vircommand.h @@ -1,7 +1,7 @@ /* * vircommand.h: Child command execution * - * Copyright (C) 2010-2011 Red Hat, Inc. + * Copyright (C) 2010-2011, 2013 Red Hat, Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -61,6 +61,10 @@ void virCommandTransferFD(virCommandPtr cmd, void virCommandSetPidFile(virCommandPtr cmd, const char *pidfile) ATTRIBUTE_NONNULL(2); +void virCommandSetGID(virCommandPtr cmd, gid_t gid); + +void virCommandSetUID(virCommandPtr cmd, uid_t uid); + void virCommandClearCaps(virCommandPtr cmd); void virCommandAllowCap(virCommandPtr cmd, -- 1.8.1

On 02/07/2013 02:37 PM, Laine Stump wrote:
If a uid and/or gid is specified for a command, it will be set just after the user-supplied post-fork "hook" function is called.
The intent is that this can replace user hook functions that set uid/gid. This moves the setting of uid/gid and dropping of capabilities closer to each other, which is important since the two should really be done at the same time (libcapng provides a single function that does both, which we will be unable to use, but want to mimic as closely as possible). --- src/libvirt_private.syms | 2 ++ src/util/vircommand.c | 26 ++++++++++++++++++++++++++ src/util/vircommand.h | 6 +++++- 3 files changed, 33 insertions(+), 1 deletion(-)
+++ b/src/util/vircommand.c @@ -101,6 +101,8 @@ struct _virCommand { char *pidfile; bool reap;
+ uid_t uid; + gid_t gid; unsigned long long capabilities; };
@@ -605,6 +607,12 @@ virExec(virCommandPtr cmd) goto fork_error; }
+ if (cmd->uid > 0 || cmd->gid > 0) {
This says we can't explicitly request to run as uid 0. Wouldn't it be better to pre-initialize these two fields to (uid_t)-1 and (gid_t)-1 when the virCommandPtr is first allocated, and then check if they have been changed away from -1 here?
+ VIR_DEBUG("Setting child uid:gid to %u:%u", cmd->uid, cmd->gid);
Not portable to cygwin; you have to cast uid_t and gid_t to int before sending it through *printf (see src/util/virutil.c for examples). -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

setregid() and setreuid() already interpret -1 as a NOP, so this is just an optimization for those, but we are also calling getpwuid_r and initgroups, and it's unclear what the former would do with a uid of -1. --- src/util/virutil.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/util/virutil.c b/src/util/virutil.c index 24ba954..fddc39e 100644 --- a/src/util/virutil.c +++ b/src/util/virutil.c @@ -2687,7 +2687,7 @@ virSetUIDGID(uid_t uid, gid_t gid) int err; char *buf = NULL; - if (gid > 0) { + if (gid != -1 && gid > 0) { if (setregid(gid, gid) < 0) { virReportSystemError(err = errno, _("cannot change to '%d' group"), @@ -2696,7 +2696,7 @@ virSetUIDGID(uid_t uid, gid_t gid) } } - if (uid > 0) { + if (uid != -1 && uid > 0) { # ifdef HAVE_INITGROUPS struct passwd pwd, *pwd_result; size_t bufsize; -- 1.8.1

On 02/07/2013 02:37 PM, Laine Stump wrote:
setregid() and setreuid() already interpret -1 as a NOP, so this is just an optimization for those, but we are also calling getpwuid_r and initgroups, and it's unclear what the former would do with a uid of -1. --- src/util/virutil.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/util/virutil.c b/src/util/virutil.c index 24ba954..fddc39e 100644 --- a/src/util/virutil.c +++ b/src/util/virutil.c @@ -2687,7 +2687,7 @@ virSetUIDGID(uid_t uid, gid_t gid) int err; char *buf = NULL;
- if (gid > 0) { + if (gid != -1 && gid > 0) {
gid_t might be an unsigned type, or it might be a signed type. Really, the only time we should not attempt setregid is if it it was -1; or if we are optimizing for gid==0; but we can't really use gid > 0 as a valid test. Also, the width of gid_t is not mandated by POSIX, so the only portable way to compare to -1 is with a cast. I think you want: if (gid && gid != (gid_t)-1) {
@@ -2696,7 +2696,7 @@ virSetUIDGID(uid_t uid, gid_t gid) } }
- if (uid > 0) { + if (uid != -1 && uid > 0) {
Likewise, you want: if (uid && uid != (uid_t)-1) { I'm not clear on whether avoiding these functions for uid/gid==0 makes sense, or if you instead want: if (uid != (uid_t)-1) { -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

Setting the uid/gid of the child process was the only thing done by the hook function in this case, and that can now be done more simply with virCommandSetUID/GID. --- src/qemu/qemu_capabilities.c | 61 +++++++++++--------------------------------- 1 file changed, 15 insertions(+), 46 deletions(-) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index e390cb1..be9b69a 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -279,37 +279,10 @@ static const char *qemuCapsArchToString(virArch arch) } -struct _qemuCapsHookData { - uid_t runUid; - gid_t runGid; -}; -typedef struct _qemuCapsHookData qemuCapsHookData; -typedef qemuCapsHookData *qemuCapsHookDataPtr; - -static int qemuCapsHook(void * data) -{ - int ret; - qemuCapsHookDataPtr hookData = data; - - if (!hookData) { - virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("QEMU uid:gid not specified by caller")); - ret = -1; - goto cleanup; - } - - VIR_DEBUG("Switch QEMU uid:gid to %d:%d", - hookData->runUid, hookData->runGid); - ret = virSetUIDGID(hookData->runUid, hookData->runGid); - -cleanup: - return ret; -} - static virCommandPtr qemuCapsProbeCommand(const char *qemu, qemuCapsPtr caps, - qemuCapsHookDataPtr hookData) + uid_t runUid, gid_t runGid) { virCommandPtr cmd = virCommandNew(qemu); @@ -322,7 +295,8 @@ qemuCapsProbeCommand(const char *qemu, virCommandAddEnvPassCommon(cmd); virCommandClearCaps(cmd); - virCommandSetPreExecHook(cmd, qemuCapsHook, hookData); + virCommandSetGID(cmd, runGid); + virCommandSetUID(cmd, runUid); return cmd; } @@ -416,7 +390,7 @@ no_memory: } static int -qemuCapsProbeMachineTypes(qemuCapsPtr caps, qemuCapsHookDataPtr hookData) +qemuCapsProbeMachineTypes(qemuCapsPtr caps, uid_t runUid, gid_t runGid) { char *output; int ret = -1; @@ -433,7 +407,7 @@ qemuCapsProbeMachineTypes(qemuCapsPtr caps, qemuCapsHookDataPtr hookData) return -1; } - cmd = qemuCapsProbeCommand(caps->binary, caps, hookData); + cmd = qemuCapsProbeCommand(caps->binary, caps, runUid, runGid); virCommandAddArgList(cmd, "-M", "?", NULL); virCommandSetOutputBuffer(cmd, &output); @@ -572,7 +546,7 @@ cleanup: } static int -qemuCapsProbeCPUModels(qemuCapsPtr caps, qemuCapsHookDataPtr hookData) +qemuCapsProbeCPUModels(qemuCapsPtr caps, uid_t runUid, gid_t runGid) { char *output = NULL; int ret = -1; @@ -590,7 +564,7 @@ qemuCapsProbeCPUModels(qemuCapsPtr caps, qemuCapsHookDataPtr hookData) return 0; } - cmd = qemuCapsProbeCommand(caps->binary, caps, hookData); + cmd = qemuCapsProbeCommand(caps->binary, caps, runUid, runGid); virCommandAddArgList(cmd, "-cpu", "?", NULL); virCommandSetOutputBuffer(cmd, &output); @@ -1601,7 +1575,7 @@ qemuCapsParseDeviceStr(qemuCapsPtr caps, const char *str) static int qemuCapsExtractDeviceStr(const char *qemu, qemuCapsPtr caps, - qemuCapsHookDataPtr hookData) + uid_t runUid, gid_t runGid) { char *output = NULL; virCommandPtr cmd; @@ -1615,7 +1589,7 @@ qemuCapsExtractDeviceStr(const char *qemu, * understand '-device name,?', and always exits with status 1 for * the simpler '-device ?', so this function is really only useful * if -help includes "device driver,?". */ - cmd = qemuCapsProbeCommand(qemu, caps, hookData); + cmd = qemuCapsProbeCommand(qemu, caps, runUid, runGid); virCommandAddArgList(cmd, "-device", "?", "-device", "pci-assign,?", @@ -2183,7 +2157,6 @@ qemuCapsInitHelp(qemuCapsPtr caps, uid_t runUid, gid_t runGid) char *help = NULL; int ret = -1; const char *tmp; - qemuCapsHookData hookData; VIR_DEBUG("caps=%p", caps); @@ -2196,9 +2169,7 @@ qemuCapsInitHelp(qemuCapsPtr caps, uid_t runUid, gid_t runGid) caps->arch = virArchFromHost(); } - hookData.runUid = runUid; - hookData.runGid = runGid; - cmd = qemuCapsProbeCommand(caps->binary, NULL, &hookData); + cmd = qemuCapsProbeCommand(caps->binary, NULL, runUid, runGid); virCommandAddArgList(cmd, "-help", NULL); virCommandSetOutputBuffer(cmd, &help); @@ -2227,13 +2198,13 @@ qemuCapsInitHelp(qemuCapsPtr caps, uid_t runUid, gid_t runGid) * understands the 0.13.0+ notion of "-device driver,". */ if (qemuCapsGet(caps, QEMU_CAPS_DEVICE) && strstr(help, "-device driver,?") && - qemuCapsExtractDeviceStr(caps->binary, caps, &hookData) < 0) + qemuCapsExtractDeviceStr(caps->binary, caps, runUid, runGid) < 0) goto cleanup; - if (qemuCapsProbeCPUModels(caps, &hookData) < 0) + if (qemuCapsProbeCPUModels(caps, runUid, runGid) < 0) goto cleanup; - if (qemuCapsProbeMachineTypes(caps, &hookData) < 0) + if (qemuCapsProbeMachineTypes(caps, runUid, runGid) < 0) goto cleanup; ret = 0; @@ -2329,7 +2300,6 @@ qemuCapsInitQMP(qemuCapsPtr caps, char *monarg = NULL; char *monpath = NULL; char *pidfile = NULL; - qemuCapsHookData hookData; char *archstr; pid_t pid = 0; virDomainObj vm; @@ -2383,9 +2353,8 @@ qemuCapsInitQMP(qemuCapsPtr caps, NULL); virCommandAddEnvPassCommon(cmd); virCommandClearCaps(cmd); - hookData.runUid = runUid; - hookData.runGid = runGid; - virCommandSetPreExecHook(cmd, qemuCapsHook, &hookData); + virCommandSetGID(cmd, runGid); + virCommandSetUID(cmd, runUid); if (virCommandRun(cmd, &status) < 0) goto cleanup; -- 1.8.1

On Thu, Feb 07, 2013 at 04:37:47PM -0500, Laine Stump wrote:
Setting the uid/gid of the child process was the only thing done by the hook function in this case, and that can now be done more simply with virCommandSetUID/GID. --- src/qemu/qemu_capabilities.c | 61 +++++++++++--------------------------------- 1 file changed, 15 insertions(+), 46 deletions(-)
ACK, this is the kind of nice benefit we get from good apis :-) Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

--- src/storage/storage_backend.c | 28 ++++++---------------------- 1 file changed, 6 insertions(+), 22 deletions(-) diff --git a/src/storage/storage_backend.c b/src/storage/storage_backend.c index cab72c6..044024a 100644 --- a/src/storage/storage_backend.c +++ b/src/storage/storage_backend.c @@ -1,7 +1,7 @@ /* * storage_backend.c: internal storage driver backend contract * - * Copyright (C) 2007-2012 Red Hat, Inc. + * Copyright (C) 2007-2013 Red Hat, Inc. * Copyright (C) 2007-2008 Daniel P. Berrange * * This library is free software; you can redistribute it and/or @@ -533,24 +533,6 @@ cleanup: return ret; } -struct hookdata { - virStorageVolDefPtr vol; - bool skip; -}; - -static int virStorageBuildSetUIDHook(void *data) { - struct hookdata *tmp = data; - virStorageVolDefPtr vol = tmp->vol; - - if (tmp->skip) - return 0; - - if (virSetUIDGID(vol->target.perms.uid, vol->target.perms.gid) < 0) - return -1; - - return 0; -} - static int virStorageBackendCreateExecCommand(virStoragePoolObjPtr pool, virStorageVolDefPtr vol, virCommandPtr cmd) { @@ -558,7 +540,6 @@ static int virStorageBackendCreateExecCommand(virStoragePoolObjPtr pool, gid_t gid; uid_t uid; int filecreated = 0; - struct hookdata data = {vol, false}; if ((pool->def->type == VIR_STORAGE_POOL_NETFS) && (((getuid() == 0) @@ -567,7 +548,8 @@ static int virStorageBackendCreateExecCommand(virStoragePoolObjPtr pool, || ((vol->target.perms.gid != -1) && (vol->target.perms.gid != getgid())))) { - virCommandSetPreExecHook(cmd, virStorageBuildSetUIDHook, &data); + virCommandSetUID(cmd, vol->target.perms.uid); + virCommandSetGID(cmd, vol->target.perms.gid); if (virCommandRun(cmd, NULL) == 0) { /* command was successfully run, check if the file was created */ @@ -576,7 +558,9 @@ static int virStorageBackendCreateExecCommand(virStoragePoolObjPtr pool, } } - data.skip = true; + /* don't change uid/gid if we retry */ + virCommandSetUID(cmd, 0); + virCommandSetGID(cmd, 0); if (!filecreated) { if (virCommandRun(cmd, NULL) < 0) { -- 1.8.1

On Thu, Feb 07, 2013 at 04:37:48PM -0500, Laine Stump wrote:
--- src/storage/storage_backend.c | 28 ++++++---------------------- 1 file changed, 6 insertions(+), 22 deletions(-)
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 02/07/2013 02:37 PM, Laine Stump wrote:
--- src/storage/storage_backend.c | 28 ++++++---------------------- 1 file changed, 6 insertions(+), 22 deletions(-)
@@ -576,7 +558,9 @@ static int virStorageBackendCreateExecCommand(virStoragePoolObjPtr pool, } }
- data.skip = true; + /* don't change uid/gid if we retry */ + virCommandSetUID(cmd, 0); + virCommandSetGID(cmd, 0);
Hmm, so you are reusing an existing virCommand, but want to change it to no longer attempt uid/gid change (that is, inherit the uid/gid of the current libvirtd). If you refactor things in earlier patches to use -1 as the no-op, and allow an attempt to change to id 0, then this needs alteration to -1. And per the man page of setfsuid, there really are reasons why one would attempt to change uid to 0, even when already executing as uid 0 - it forces Linux to resync the fsuid back to 0. True, not much code plays with fsuid, and it is rare to have a program where fsuid differs from uid, but I'd rather our wrapper lets us expose full kernel/glibc rules on id setting, than to accidentally short-circuit away something that has important side-effects in odd corner cases. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

This makes it simpler to include the necessary system security driver libraries for a particular system. For this patch, several existing conditional sections from the Makfile were replaced; I'll later be adding SECDRIVER_LIBS to libvirt_util_la_LIBADD, because vircommand.c will be calling a function from $securitylib. --- src/Makefile.am | 32 +++++++++++--------------------- 1 file changed, 11 insertions(+), 21 deletions(-) diff --git a/src/Makefile.am b/src/Makefile.am index f6162df..2e68e96 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -32,6 +32,13 @@ nodist_conf_DATA = THREAD_LIBS = $(LIB_PTHREAD) $(LTLIBMULTITHREAD) +if WITH_SECDRIVER_SELINUX +SECDRIVER_LIBS = $(SELINUX_LIBS) +endif +if WITH_SECDRIVER_APPARMOR +SECDRIVER_LIBS = $(APPARMOR_LIBS) +endif + if WITH_NETWORK UUID=$(shell uuidgen 2>/dev/null) endif @@ -986,12 +993,7 @@ if WITH_BLKID libvirt_driver_lxc_impl_la_CFLAGS += $(BLKID_CFLAGS) libvirt_driver_lxc_impl_la_LIBADD += $(BLKID_LIBS) endif -if WITH_SECDRIVER_SELINUX -libvirt_driver_lxc_impl_la_LIBADD += $(SELINUX_LIBS) -endif -if WITH_SECDRIVER_APPARMOR -libvirt_driver_lxc_impl_la_LIBADD += $(APPARMOR_LIBS) -endif +libvirt_driver_lxc_impl_la_LIBADD += $(SECDRIVER_LIBS) libvirt_driver_lxc_impl_la_SOURCES = $(LXC_DRIVER_SOURCES) conf_DATA += lxc/lxc.conf @@ -1158,12 +1160,7 @@ libvirt_driver_storage_impl_la_CFLAGS = \ -I$(top_srcdir)/src/conf $(AM_CFLAGS) libvirt_driver_storage_impl_la_LDFLAGS = $(AM_LDFLAGS) libvirt_driver_storage_impl_la_LIBADD = -if WITH_SECDRIVER_SELINUX -libvirt_driver_storage_impl_la_LIBADD += $(SELINUX_LIBS) -endif -if WITH_SECDRIVER_APPARMOR -libvirt_driver_storage_impl_la_LIBADD += $(APPARMOR_LIBS) -endif +libvirt_driver_storage_impl_la_LIBADD += $(SECDRIVER_LIBS) if WITH_BLKID libvirt_driver_storage_impl_la_CFLAGS += $(BLKID_CFLAGS) libvirt_driver_storage_impl_la_LIBADD += $(BLKID_LIBS) @@ -1276,16 +1273,14 @@ libvirt_la_BUILT_LIBADD += libvirt_security_manager.la libvirt_security_manager_la_CFLAGS = \ -I$(top_srcdir)/src/conf $(AM_CFLAGS) libvirt_security_manager_la_LDFLAGS = $(AM_LDFLAGS) -libvirt_security_manager_la_LIBADD = +libvirt_security_manager_la_LIBADD = $(SECDRIVER_LIBS) if WITH_SECDRIVER_SELINUX libvirt_security_manager_la_SOURCES += $(SECURITY_DRIVER_SELINUX_SOURCES) libvirt_security_manager_la_CFLAGS += $(SELINUX_CFLAGS) -libvirt_security_manager_la_LIBADD += $(SELINUX_LIBS) endif if WITH_SECDRIVER_APPARMOR libvirt_security_manager_la_SOURCES += $(SECURITY_DRIVER_APPARMOR_SOURCES) libvirt_security_manager_la_CFLAGS += $(APPARMOR_CFLAGS) -libvirt_security_manager_la_LIBADD += $(APPARMOR_LIBS) endif # Add all conditional sources just in case... @@ -1943,12 +1938,7 @@ libvirt_lxc_LDADD = \ if WITH_DTRACE_PROBES libvirt_lxc_LDADD += libvirt_probes.lo endif -if WITH_SECDRIVER_SELINUX -libvirt_lxc_LDADD += $(SELINUX_LIBS) -endif -if WITH_SECDRIVER_APPARMOR -libvirt_lxc_LDADD += $(APPARMOR_LIBS) -endif +libvirt_lxc_LDADD += $(SECDRIVER_LIBS) libvirt_lxc_CFLAGS = \ -I$(top_srcdir)/src/conf \ $(AM_CFLAGS) \ -- 1.8.1

On Thu, Feb 07, 2013 at 04:37:49PM -0500, Laine Stump wrote:
This makes it simpler to include the necessary system security driver libraries for a particular system. For this patch, several existing conditional sections from the Makfile were replaced; I'll later be adding SECDRIVER_LIBS to libvirt_util_la_LIBADD, because vircommand.c will be calling a function from $securitylib. --- src/Makefile.am | 32 +++++++++++--------------------- 1 file changed, 11 insertions(+), 21 deletions(-)
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

virCommand gets the new API virCommandSetSecLabel(), which saves a copy of a null-terminated string in the virCommand. During virCommandRun, if the seclabel is non-NULL and we've been compiled with a security driver, the appropriate security library function is called to set the label for the child process. In the case of SELinux, setexeccon_raw() is called, and for AppArmor, aa_change_profile() is called. This functionality has been added so that users of virCommand can use the upcoming virSecurityManagerSetChildProcessLabel() prior to running a child process, rather than needing to setup a hook function to be called (and in turn call virSecurityManagerSetProcessLabel()) *during* the setup of the child process. --- src/Makefile.am | 3 ++- src/libvirt_private.syms | 1 + src/util/vircommand.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++ src/util/vircommand.h | 3 +++ 4 files changed, 61 insertions(+), 1 deletion(-) diff --git a/src/Makefile.am b/src/Makefile.am index 2e68e96..41cef96 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -763,7 +763,8 @@ libvirt_util_la_CFLAGS = $(CAPNG_CFLAGS) $(YAJL_CFLAGS) $(LIBNL_CFLAGS) \ $(DBUS_CFLAGS) $(LDEXP_LIBM) libvirt_util_la_LIBADD = $(CAPNG_LIBS) $(YAJL_LIBS) $(LIBNL_LIBS) \ $(THREAD_LIBS) $(AUDIT_LIBS) $(DEVMAPPER_LIBS) \ - $(LIB_CLOCK_GETTIME) $(DBUS_LIBS) $(MSCOM_LIBS) $(LIBXML_LIBS) + $(LIB_CLOCK_GETTIME) $(DBUS_LIBS) $(MSCOM_LIBS) $(LIBXML_LIBS) \ + $(SECDRIVER_LIBS) noinst_LTLIBRARIES += libvirt_conf.la diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 83d83ad..4ac2d52 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -166,6 +166,7 @@ virCommandSetOutputBuffer; virCommandSetOutputFD; virCommandSetPidFile; virCommandSetPreExecHook; +virCommandSetSecLabel; virCommandSetUID; virCommandSetWorkingDirectory; virCommandToString; diff --git a/src/util/vircommand.c b/src/util/vircommand.c index 65838d1..3eb8465 100644 --- a/src/util/vircommand.c +++ b/src/util/vircommand.c @@ -33,6 +33,12 @@ # include <cap-ng.h> #endif +#if defined(WITH_SECDRIVER_SELINUX) +# include <selinux/selinux.h> +#elif defined(WITH_SECDRIVER_APPARMOR) +# include <sys/apparmor.h> +#endif + #include "vircommand.h" #include "viralloc.h" #include "virerror.h" @@ -104,6 +110,7 @@ struct _virCommand { uid_t uid; gid_t gid; unsigned long long capabilities; + char *seclabel; }; static int virCommandHandshakeChild(virCommandPtr cmd); @@ -607,6 +614,29 @@ virExec(virCommandPtr cmd) goto fork_error; } + if (cmd->seclabel) { + VIR_DEBUG("Setting child security label to %s", cmd->seclabel); +# if defined(WITH_SECDRIVER_SELINUX) + if (setexeccon_raw(cmd->seclabel) == -1) { + virReportSystemError(errno, + _("unable to set security context '%s' " + "for '%s'"), + cmd->seclabel, cmd->args[0]); + if (security_getenforce() == 1) + goto fork_error; + } +# elif defined(WITH_SECDRIVER_APPARMOR) + if (aa_change_profile(cmd->seclabel) < 0) { + virReportSystemError(errno, + _("unable to set AppArmor profile '%s' " + "for '%s'"), + cmd->seclabel, cmd->args[0]); + goto fork_error; + } +# endif + + } + if (cmd->uid > 0 || cmd->gid > 0) { VIR_DEBUG("Setting child uid:gid to %u:%u", cmd->uid, cmd->gid); if (virSetUIDGID(cmd->uid, cmd->gid) < 0) @@ -964,6 +994,30 @@ virCommandAllowCap(virCommandPtr cmd, } +/** + * virCommandSetSecLabel: + * @cmd: the command to modify + * @label: the label to use + * + * Saves a copy of @label to use when calling the appropriate security + * driver after the child process has been started. In the case of + * SELinux, this label will be sent to setexeccon_raw(), and in the + * case of AppArmor, it will be sent to aa_change_profile(). If + * neither of these is configured into libvirt, or if label is NULL, + * nothing will be done. + */ +void +virCommandSetSecLabel(virCommandPtr cmd, const char *label) +{ + if (!cmd || cmd->has_error) + return; + + VIR_FREE(cmd->seclabel); + if (label && !(cmd->seclabel = strdup(label))) + cmd->has_error = ENOMEM; + return; +} + /** * virCommandDaemonize: @@ -2712,6 +2766,7 @@ virCommandFree(virCommandPtr cmd) VIR_FREE(cmd->transfer); VIR_FREE(cmd->preserve); + VIR_FREE(cmd->seclabel); VIR_FREE(cmd); } diff --git a/src/util/vircommand.h b/src/util/vircommand.h index ac940f0..6d76d42 100644 --- a/src/util/vircommand.h +++ b/src/util/vircommand.h @@ -70,6 +70,9 @@ void virCommandClearCaps(virCommandPtr cmd); void virCommandAllowCap(virCommandPtr cmd, int capability); +void virCommandSetSecLabel(virCommandPtr cmd, + const char *label); + void virCommandDaemonize(virCommandPtr cmd); void virCommandNonblockingFDs(virCommandPtr cmd); -- 1.8.1

On Thu, Feb 07, 2013 at 04:37:50PM -0500, Laine Stump wrote:
virCommand gets the new API virCommandSetSecLabel(), which saves a copy of a null-terminated string in the virCommand. During virCommandRun, if the seclabel is non-NULL and we've been compiled with a security driver, the appropriate security library function is called to set the label for the child process. In the case of SELinux, setexeccon_raw() is called, and for AppArmor, aa_change_profile() is called.
This functionality has been added so that users of virCommand can use the upcoming virSecurityManagerSetChildProcessLabel() prior to running a child process, rather than needing to setup a hook function to be called (and in turn call virSecurityManagerSetProcessLabel()) *during* the setup of the child process. ---
+#if defined(WITH_SECDRIVER_SELINUX) +# include <selinux/selinux.h> +#elif defined(WITH_SECDRIVER_APPARMOR) +# include <sys/apparmor.h> +#endif
[snip]
+/** + * virCommandSetSecLabel: + * @cmd: the command to modify + * @label: the label to use + * + * Saves a copy of @label to use when calling the appropriate security + * driver after the child process has been started. In the case of + * SELinux, this label will be sent to setexeccon_raw(), and in the + * case of AppArmor, it will be sent to aa_change_profile(). If + * neither of these is configured into libvirt, or if label is NULL, + * nothing will be done. + */ +void +virCommandSetSecLabel(virCommandPtr cmd, const char *label) +{ + if (!cmd || cmd->has_error) + return; + + VIR_FREE(cmd->seclabel); + if (label && !(cmd->seclabel = strdup(label))) + cmd->has_error = ENOMEM; + return; +}
It is technically possible to build libvirt with both apparmour and selinux enabled, and choose between the impl with a libvirtd config. This means we need to have separate methods for each in virCommand. So I'd suggest a pair of methods virCommandSetSELinuxLabel(...) virCommandSetAppArmourProfile(...) Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 02/08/2013 11:23 AM, Daniel P. Berrange wrote:
On Thu, Feb 07, 2013 at 04:37:50PM -0500, Laine Stump wrote:
virCommand gets the new API virCommandSetSecLabel(), which saves a copy of a null-terminated string in the virCommand. During virCommandRun, if the seclabel is non-NULL and we've been compiled with a security driver, the appropriate security library function is called to set the label for the child process. In the case of SELinux, setexeccon_raw() is called, and for AppArmor, aa_change_profile() is called.
This functionality has been added so that users of virCommand can use the upcoming virSecurityManagerSetChildProcessLabel() prior to running a child process, rather than needing to setup a hook function to be called (and in turn call virSecurityManagerSetProcessLabel()) *during* the setup of the child process. ---
+#if defined(WITH_SECDRIVER_SELINUX) +# include <selinux/selinux.h> +#elif defined(WITH_SECDRIVER_APPARMOR) +# include <sys/apparmor.h> +#endif [snip]
+/** + * virCommandSetSecLabel: + * @cmd: the command to modify + * @label: the label to use + * + * Saves a copy of @label to use when calling the appropriate security + * driver after the child process has been started. In the case of + * SELinux, this label will be sent to setexeccon_raw(), and in the + * case of AppArmor, it will be sent to aa_change_profile(). If + * neither of these is configured into libvirt, or if label is NULL, + * nothing will be done. + */ +void +virCommandSetSecLabel(virCommandPtr cmd, const char *label) +{ + if (!cmd || cmd->has_error) + return; + + VIR_FREE(cmd->seclabel); + if (label && !(cmd->seclabel = strdup(label))) + cmd->has_error = ENOMEM; + return; +} It is technically possible to build libvirt with both apparmour and selinux enabled, and choose between the impl with a libvirtd config.
Until I broke it in the previous patch :-). I'll fix that as well.
This means we need to have separate methods for each in virCommand. So I'd suggest a pair of methods
virCommandSetSELinuxLabel(...) virCommandSetAppArmourProfile(...)
Daniel

The existing virSecurityManagerSetProcessLabel() API is designed so that it must be called after forking the child process, but before exec'ing the child. Due to the way the virCommand API works, that means it needs to be put in a "hook" function that virCommand is told to call out to at that time. Setting the child process label is a basic enough need when executing any process that virCommand should have a method of doing that. But virCommand must be told what label to set, and only the security driver knows the answer to that question. The new virSecurityManagerSet*Child*ProcessLabel() API is the way to transfer the knowledge about what label to set from the security driver to the virCommand object. It is given a virCommandPtr, and each security driver calls the appropriate virCommand* API to tell virCommand what to do between fork and exec. 1) in the case of the DAC security driver, it calls virCommandSetUID/GID() to set a uid and gid that must be set for the child process. 2) for the SELinux and AppArmor security drivers, it calls virCommandSetSecLabel() to save a copy of the char* that will be sent to each driver's respective "SetProcessLabel" API *after forking the child process*. With this new API in place, we will be able to remove virSecurityManagerSetProcessLabel() from any virCommand pre-exec hooks. (Unfortunately, the LXC driver uses clone() rather than virCommand, so it can't take advantage of this new security driver API, meaning that we need to keep around the older virSecurityManagerSetProcessLabel(), at least for now.) --- src/libvirt_private.syms | 1 + src/security/security_apparmor.c | 41 +++++++++++++++++++++++++++++++++++++++- src/security/security_dac.c | 24 ++++++++++++++++++++++- src/security/security_driver.h | 6 +++++- src/security/security_manager.c | 13 ++++++++++++- src/security/security_manager.h | 6 +++++- src/security/security_nop.c | 10 +++++++++- src/security/security_selinux.c | 34 ++++++++++++++++++++++++++++++++- src/security/security_stack.c | 20 +++++++++++++++++++- 9 files changed, 147 insertions(+), 8 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 4ac2d52..e50b63d 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1066,6 +1066,7 @@ virSecurityManagerRestoreHostdevLabel; virSecurityManagerRestoreImageLabel; virSecurityManagerRestoreSavedStateLabel; virSecurityManagerSetAllLabel; +virSecurityManagerSetChildProcessLabel; virSecurityManagerSetDaemonSocketLabel; virSecurityManagerSetHostdevLabel; virSecurityManagerSetHugepages; diff --git a/src/security/security_apparmor.c b/src/security/security_apparmor.c index bf795b0..4a81118 100644 --- a/src/security/security_apparmor.c +++ b/src/security/security_apparmor.c @@ -1,7 +1,7 @@ /* * AppArmor security driver for libvirt * - * Copyright (C) 2011 Red Hat, Inc. + * Copyright (C) 2011, 2013 Red Hat, Inc. * Copyright (C) 2009-2010 Canonical Ltd. * * This library is free software; you can redistribute it and/or @@ -626,6 +626,44 @@ AppArmorSetSecurityProcessLabel(virSecurityManagerPtr mgr, virDomainDefPtr def) return rc; } +/* Called directly by API user prior to virCommandRun(). virCommandRun() + * will then call aa_change_profile() (if a cmd->seclabel has been set) + * *after forking the child process*. + */ +static int +AppArmorSetSecurityChildProcessLabel(virSecurityManagerPtr mgr, + virDomainDefPtr def, + virCommandPtr cmd) +{ + int rc = -1; + char *profile_name = NULL; + const virSecurityLabelDefPtr secdef = + virDomainDefGetSecurityLabelDef(def, SECURITY_APPARMOR_NAME); + + if (!secdef) + goto cleanup; + + if (STRNEQ(virSecurityManagerGetModel(mgr), secdef->model)) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("security label driver mismatch: " + "\'%s\' model configured for domain, but " + "hypervisor driver is \'%s\'."), + secdef->model, virSecurityManagerGetModel(mgr)); + if (use_apparmor() > 0) + goto cleanup; + } + + if ((profile_name = get_profile_name(def)) == NULL) + goto cleanup; + + virCommandSetSecLabel(cmd, profile_name); + rc = 0; + + cleanup: + VIR_FREE(profile_name); + return rc; +} + static int AppArmorSetSecurityDaemonSocketLabel(virSecurityManagerPtr mgr ATTRIBUTE_UNUSED, virDomainDefPtr vm ATTRIBUTE_UNUSED) @@ -926,6 +964,7 @@ virSecurityDriver virAppArmorSecurityDriver = { .domainGetSecurityProcessLabel = AppArmorGetSecurityProcessLabel, .domainSetSecurityProcessLabel = AppArmorSetSecurityProcessLabel, + .domainSetSecurityChildProcessLabel = AppArmorSetSecurityChildProcessLabel, .domainSetSecurityAllLabel = AppArmorSetSecurityAllLabel, .domainRestoreSecurityAllLabel = AppArmorRestoreSecurityAllLabel, diff --git a/src/security/security_dac.c b/src/security/security_dac.c index ae489e2..b115bb0 100644 --- a/src/security/security_dac.c +++ b/src/security/security_dac.c @@ -1,5 +1,5 @@ /* - * Copyright (C) 2010-2012 Red Hat, Inc. + * Copyright (C) 2010-2013 Red Hat, Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -878,6 +878,27 @@ virSecurityDACSetProcessLabel(virSecurityManagerPtr mgr, static int +virSecurityDACSetChildProcessLabel(virSecurityManagerPtr mgr, + virDomainDefPtr def ATTRIBUTE_UNUSED, + virCommandPtr cmd) +{ + uid_t user; + gid_t group; + virSecurityDACDataPtr priv = virSecurityManagerGetPrivateData(mgr); + + if (virSecurityDACGetIds(def, priv, &user, &group)) + return -1; + + VIR_DEBUG("Setting child to drop privileges of DEF to %u:%u", + (unsigned int) user, (unsigned int) group); + + virCommandSetUID(cmd, user); + virCommandSetGID(cmd, group); + return 0; +} + + +static int virSecurityDACVerify(virSecurityManagerPtr mgr ATTRIBUTE_UNUSED, virDomainDefPtr def ATTRIBUTE_UNUSED) { @@ -1072,6 +1093,7 @@ virSecurityDriver virSecurityDriverDAC = { .domainGetSecurityProcessLabel = virSecurityDACGetProcessLabel, .domainSetSecurityProcessLabel = virSecurityDACSetProcessLabel, + .domainSetSecurityChildProcessLabel = virSecurityDACSetChildProcessLabel, .domainSetSecurityAllLabel = virSecurityDACSetSecurityAllLabel, .domainRestoreSecurityAllLabel = virSecurityDACRestoreSecurityAllLabel, diff --git a/src/security/security_driver.h b/src/security/security_driver.h index 6b775ab..73d4d3b 100644 --- a/src/security/security_driver.h +++ b/src/security/security_driver.h @@ -1,5 +1,5 @@ /* - * Copyright (C) 2008, 2010 Red Hat, Inc. + * Copyright (C) 2008, 2010, 2013 Red Hat, Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -92,6 +92,9 @@ typedef int (*virSecurityDomainGetProcessLabel) (virSecurityManagerPtr mgr, virSecurityLabelPtr sec); typedef int (*virSecurityDomainSetProcessLabel) (virSecurityManagerPtr mgr, virDomainDefPtr def); +typedef int (*virSecurityDomainSetChildProcessLabel) (virSecurityManagerPtr mgr, + virDomainDefPtr def, + virCommandPtr cmd); typedef int (*virSecurityDomainSecurityVerify) (virSecurityManagerPtr mgr, virDomainDefPtr def); typedef int (*virSecurityDomainSetImageFDLabel) (virSecurityManagerPtr mgr, @@ -131,6 +134,7 @@ struct _virSecurityDriver { virSecurityDomainGetProcessLabel domainGetSecurityProcessLabel; virSecurityDomainSetProcessLabel domainSetSecurityProcessLabel; + virSecurityDomainSetChildProcessLabel domainSetSecurityChildProcessLabel; virSecurityDomainSetAllLabel domainSetSecurityAllLabel; virSecurityDomainRestoreAllLabel domainRestoreSecurityAllLabel; diff --git a/src/security/security_manager.c b/src/security/security_manager.c index 593c00b..7fb75dd 100644 --- a/src/security/security_manager.c +++ b/src/security/security_manager.c @@ -1,7 +1,7 @@ /* * security_manager.c: Internal security manager API * - * Copyright (C) 2010-2011 Red Hat, Inc. + * Copyright (C) 2010-2011, 2013 Red Hat, Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -437,6 +437,17 @@ int virSecurityManagerSetProcessLabel(virSecurityManagerPtr mgr, return -1; } +int virSecurityManagerSetChildProcessLabel(virSecurityManagerPtr mgr, + virDomainDefPtr vm, + virCommandPtr cmd) +{ + if (mgr->drv->domainSetSecurityChildProcessLabel) + return mgr->drv->domainSetSecurityChildProcessLabel(mgr, vm, cmd); + + virReportError(VIR_ERR_NO_SUPPORT, __FUNCTION__); + return -1; +} + int virSecurityManagerVerify(virSecurityManagerPtr mgr, virDomainDefPtr def) { diff --git a/src/security/security_manager.h b/src/security/security_manager.h index dc09c7c..f0fdd15 100644 --- a/src/security/security_manager.h +++ b/src/security/security_manager.h @@ -1,7 +1,7 @@ /* * security_manager.h: Internal security manager API * - * Copyright (C) 2010-2011 Red Hat, Inc. + * Copyright (C) 2010-2011, 2013 Red Hat, Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -24,6 +24,7 @@ # define VIR_SECURITY_MANAGER_H__ # include "domain_conf.h" +# include "vircommand.h" typedef struct _virSecurityManager virSecurityManager; typedef virSecurityManager *virSecurityManagerPtr; @@ -102,6 +103,9 @@ int virSecurityManagerGetProcessLabel(virSecurityManagerPtr mgr, virSecurityLabelPtr sec); int virSecurityManagerSetProcessLabel(virSecurityManagerPtr mgr, virDomainDefPtr def); +int virSecurityManagerSetChildProcessLabel(virSecurityManagerPtr mgr, + virDomainDefPtr def, + virCommandPtr cmd); int virSecurityManagerVerify(virSecurityManagerPtr mgr, virDomainDefPtr def); int virSecurityManagerSetImageFDLabel(virSecurityManagerPtr mgr, diff --git a/src/security/security_nop.c b/src/security/security_nop.c index b6eb3f6..29ead09 100644 --- a/src/security/security_nop.c +++ b/src/security/security_nop.c @@ -1,5 +1,5 @@ /* - * Copyright (C) 2010-2011 Red Hat, Inc. + * Copyright (C) 2010-2011, 2013 Red Hat, Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -157,6 +157,13 @@ static int virSecurityDomainSetProcessLabelNop(virSecurityManagerPtr mgr ATTRIBU return 0; } +static int virSecurityDomainSetChildProcessLabelNop(virSecurityManagerPtr mgr ATTRIBUTE_UNUSED, + virDomainDefPtr vm ATTRIBUTE_UNUSED, + virCommandPtr cmd ATTRIBUTE_UNUSED) +{ + return 0; +} + static int virSecurityDomainVerifyNop(virSecurityManagerPtr mgr ATTRIBUTE_UNUSED, virDomainDefPtr def ATTRIBUTE_UNUSED) { @@ -207,6 +214,7 @@ virSecurityDriver virSecurityDriverNop = { .domainGetSecurityProcessLabel = virSecurityDomainGetProcessLabelNop, .domainSetSecurityProcessLabel = virSecurityDomainSetProcessLabelNop, + .domainSetSecurityChildProcessLabel = virSecurityDomainSetChildProcessLabelNop, .domainSetSecurityAllLabel = virSecurityDomainSetAllLabelNop, .domainRestoreSecurityAllLabel = virSecurityDomainRestoreAllLabelNop, diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c index 2f5012d..2f25bc8 100644 --- a/src/security/security_selinux.c +++ b/src/security/security_selinux.c @@ -1,5 +1,5 @@ /* - * Copyright (C) 2008-2012 Red Hat, Inc. + * Copyright (C) 2008-2013 Red Hat, Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -1859,6 +1859,37 @@ virSecuritySELinuxSetSecurityProcessLabel(virSecurityManagerPtr mgr, } static int +virSecuritySELinuxSetSecurityChildProcessLabel(virSecurityManagerPtr mgr, + virDomainDefPtr def, + virCommandPtr cmd) +{ + /* TODO: verify DOI */ + virSecurityLabelDefPtr secdef; + + secdef = virDomainDefGetSecurityLabelDef(def, SECURITY_SELINUX_NAME); + if (secdef == NULL) + return -1; + + if (secdef->label == NULL) + return 0; + + VIR_DEBUG("label=%s", secdef->label); + if (!STREQ(virSecurityManagerGetModel(mgr), secdef->model)) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("security label driver mismatch: " + "'%s' model configured for domain, but " + "hypervisor driver is '%s'."), + secdef->model, virSecurityManagerGetModel(mgr)); + if (security_getenforce() == 1) + return -1; + } + + /* save in cmd to be set after fork/before child process is exec'ed */ + virCommandSetSecLabel(cmd, secdef->label); + return 0; +} + +static int virSecuritySELinuxSetSecurityDaemonSocketLabel(virSecurityManagerPtr mgr, virDomainDefPtr def) { @@ -2261,6 +2292,7 @@ virSecurityDriver virSecurityDriverSELinux = { .domainGetSecurityProcessLabel = virSecuritySELinuxGetSecurityProcessLabel, .domainSetSecurityProcessLabel = virSecuritySELinuxSetSecurityProcessLabel, + .domainSetSecurityChildProcessLabel = virSecuritySELinuxSetSecurityChildProcessLabel, .domainSetSecurityAllLabel = virSecuritySELinuxSetSecurityAllLabel, .domainRestoreSecurityAllLabel = virSecuritySELinuxRestoreSecurityAllLabel, diff --git a/src/security/security_stack.c b/src/security/security_stack.c index 8e1e5f9..dd321f7 100644 --- a/src/security/security_stack.c +++ b/src/security/security_stack.c @@ -1,5 +1,5 @@ /* - * Copyright (C) 2010-2011 Red Hat, Inc. + * Copyright (C) 2010-2011, 2013 Red Hat, Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -367,6 +367,23 @@ virSecurityStackSetProcessLabel(virSecurityManagerPtr mgr, } static int +virSecurityStackSetChildProcessLabel(virSecurityManagerPtr mgr, + virDomainDefPtr vm, + virCommandPtr cmd) +{ + virSecurityStackDataPtr priv = virSecurityManagerGetPrivateData(mgr); + virSecurityStackItemPtr item = priv->itemsHead; + int rc = 0; + + for (; item; item = item->next) { + if (virSecurityManagerSetChildProcessLabel(item->securityManager, vm, cmd) < 0) + rc = -1; + } + + return rc; +} + +static int virSecurityStackGetProcessLabel(virSecurityManagerPtr mgr, virDomainDefPtr vm, pid_t pid, @@ -540,6 +557,7 @@ virSecurityDriver virSecurityDriverStack = { .domainGetSecurityProcessLabel = virSecurityStackGetProcessLabel, .domainSetSecurityProcessLabel = virSecurityStackSetProcessLabel, + .domainSetSecurityChildProcessLabel = virSecurityStackSetChildProcessLabel, .domainSetSecurityAllLabel = virSecurityStackSetSecurityAllLabel, .domainRestoreSecurityAllLabel = virSecurityStackRestoreSecurityAllLabel, -- 1.8.1

On Thu, Feb 07, 2013 at 04:37:51PM -0500, Laine Stump wrote:
The existing virSecurityManagerSetProcessLabel() API is designed so that it must be called after forking the child process, but before exec'ing the child. Due to the way the virCommand API works, that means it needs to be put in a "hook" function that virCommand is told to call out to at that time.
Setting the child process label is a basic enough need when executing any process that virCommand should have a method of doing that. But virCommand must be told what label to set, and only the security driver knows the answer to that question.
The new virSecurityManagerSet*Child*ProcessLabel() API is the way to transfer the knowledge about what label to set from the security driver to the virCommand object. It is given a virCommandPtr, and each security driver calls the appropriate virCommand* API to tell virCommand what to do between fork and exec.
1) in the case of the DAC security driver, it calls virCommandSetUID/GID() to set a uid and gid that must be set for the child process.
2) for the SELinux and AppArmor security drivers, it calls virCommandSetSecLabel() to save a copy of the char* that will be sent to each driver's respective "SetProcessLabel" API *after forking the child process*.
This will need a tweak based on the change I suggested to the previous patch.
diff --git a/src/security/security_apparmor.c b/src/security/security_apparmor.c index bf795b0..4a81118 100644 --- a/src/security/security_apparmor.c +++ b/src/security/security_apparmor.c @@ -1,7 +1,7 @@ /* * AppArmor security driver for libvirt * - * Copyright (C) 2011 Red Hat, Inc. + * Copyright (C) 2011, 2013 Red Hat, Inc.
You can turn this into a range '2011-2013' - there were plenty of Red Hat changes to this file throughout the last year(s). Likewise for all others. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

The qemu driver had been calling virSecurityManagerSetProcessLabel() from a "pre-exec hook" function that is run after the child is forked, but before exec'ing qemu. This is problematic because the uid and gid of the child are set by the security driver, but capabilities are dropped by virCommand - such separation doesn't work; the two operations must be done together or the capabilities do not transfer properly to the child process. This patch switches to using virSecurityManagerSetChildProcessLabel(), which is called prior to virCommandRun() (rather than being called *during* virCommandrun() by the hook function), and doesn't set the UID/GID/security label directly, but instead merely informs virCommand what it should set them all to when the time is appropriate. This lets virCommand choose to do the uid/gid and caps dropping all at the same time if it wants (it does *want* to, but isn't doing so yet; that's for an upcoming patch). --- src/qemu/qemu_process.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 30f923a..a343ac3 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1,7 +1,7 @@ /* * qemu_process.h: QEMU process management * - * Copyright (C) 2006-2012 Red Hat, Inc. + * Copyright (C) 2006-2013 Red Hat, Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -2756,10 +2756,6 @@ static int qemuProcessHook(void *data) if (qemuProcessInitNumaMemoryPolicy(h->vm, h->nodemask) < 0) goto cleanup; - VIR_DEBUG("Setting up security labelling"); - if (virSecurityManagerSetProcessLabel(h->driver->securityManager, h->vm->def) < 0) - goto cleanup; - ret = 0; cleanup: @@ -3889,6 +3885,12 @@ int qemuProcessStart(virConnectPtr conn, virCommandSetPreExecHook(cmd, qemuProcessHook, &hookData); + VIR_DEBUG("Setting up security labelling"); + if (virSecurityManagerSetChildProcessLabel(driver->securityManager, + vm->def, cmd) < 0) { + goto cleanup; + } + virCommandSetOutputFD(cmd, &logfile); virCommandSetErrorFD(cmd, &logfile); virCommandNonblockingFDs(cmd); -- 1.8.1

On Thu, Feb 07, 2013 at 04:37:52PM -0500, Laine Stump wrote:
The qemu driver had been calling virSecurityManagerSetProcessLabel() from a "pre-exec hook" function that is run after the child is forked, but before exec'ing qemu. This is problematic because the uid and gid of the child are set by the security driver, but capabilities are dropped by virCommand - such separation doesn't work; the two operations must be done together or the capabilities do not transfer properly to the child process.
This patch switches to using virSecurityManagerSetChildProcessLabel(), which is called prior to virCommandRun() (rather than being called *during* virCommandrun() by the hook function), and doesn't set the UID/GID/security label directly, but instead merely informs virCommand what it should set them all to when the time is appropriate.
This lets virCommand choose to do the uid/gid and caps dropping all at the same time if it wants (it does *want* to, but isn't doing so yet; that's for an upcoming patch). --- src/qemu/qemu_process.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

This is an interim measure to make sure everything still works in this order. The next step will be to perform capabilities drop and setuid/gid as a single operation (which is the only way to keep any capabilities when switching to a non-root uid). --- src/util/vircommand.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/src/util/vircommand.c b/src/util/vircommand.c index 3eb8465..2eb11f5 100644 --- a/src/util/vircommand.c +++ b/src/util/vircommand.c @@ -643,6 +643,12 @@ virExec(virCommandPtr cmd) goto fork_error; } + /* The steps above may need todo something privileged, so + * we delay clearing capabilities until the last minute */ + if (cmd->capabilities || (cmd->flags & VIR_EXEC_CLEAR_CAPS)) + if (virSetCapabilities(cmd->capabilities) < 0) + goto fork_error; + if (cmd->pwd) { VIR_DEBUG("Running child in %s", cmd->pwd); if (chdir(cmd->pwd) < 0) { @@ -661,12 +667,6 @@ virExec(virCommandPtr cmd) goto fork_error; } - /* The steps above may need todo something privileged, so - * we delay clearing capabilities until the last minute */ - if (cmd->capabilities || (cmd->flags & VIR_EXEC_CLEAR_CAPS)) - if (virSetCapabilities(cmd->capabilities) < 0) - goto fork_error; - /* Close logging again to ensure no FDs leak to child */ virLogReset(); -- 1.8.1

On 02/07/2013 02:37 PM, Laine Stump wrote:
This is an interim measure to make sure everything still works in this order. The next step will be to perform capabilities drop and setuid/gid as a single operation (which is the only way to keep any capabilities when switching to a non-root uid). --- src/util/vircommand.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

Normally when a process' uid is changed to non-0, all the capabilities bits are cleared, even those explicitly set with calls to capng_update()/capng_apply() made immediately before setuid. And *after* the process' uid has been changed, it no longer has the necessary privileges to add capabilities back to the process. In order to set a non-0 uid while still maintaining any capabilities bits, it is necessary to either call capng_change_id() (which unfortunately doesn't not currently call initgroups to setup auxiliary group membership), or to perform the small amount of calisthenics contained in the new utility function virSetUIDGIDWithCaps(). A short description of what is done: 1) clear all capabilities then set all those desired by the caller (in capBits) plus CAP_SETGID, CAP_SETUID, and CAP_SETPCAP (which is needed to change the capabilities bounding set). 2) call prctl(), telling it that we want to maintain current capabilities across an upcoming setuid(). 3) switch to the new uid/gid 4) again call prctl(), telling it we will no longer want capabilities maintained if this process does another setuid(). 5) clear the capabilities that we added to allow us to setuid/setgid/change the bounding set (unless they were also requested by the caller via the virCommand API). Because the modification/maintaining of capabilities is intermingled with setting the uid, this is necessarily done in a single function, rather than having two independent functions. Note that, due to the way that effective capabilities are computed (at time of execve) for a process that has uid != 0, the *file* capabilities of the binary being executed must also have the desired capabilities bit(s) set (see "man 7 capabilities"). This can be done with the "filecap" command (e.g. "filecap /usr/bin/qemu-kvm sys_rawio"). --- src/libvirt_private.syms | 1 + src/util/virutil.c | 109 +++++++++++++++++++++++++++++++++++++++++++++++ src/util/virutil.h | 1 + 3 files changed, 111 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index e50b63d..aa42499 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1310,6 +1310,7 @@ virSetDeviceUnprivSGIO; virSetInherit; virSetNonBlock; virSetUIDGID; +virSetUIDGIDWithCaps; virSkipSpaces; virSkipSpacesAndBackslash; virSkipSpacesBackwards; diff --git a/src/util/virutil.c b/src/util/virutil.c index fddc39e..7b2886f 100644 --- a/src/util/virutil.c +++ b/src/util/virutil.c @@ -60,6 +60,7 @@ #endif #if WITH_CAPNG # include <cap-ng.h> +# include <sys/prctl.h> #endif #if defined HAVE_MNTENT_H && defined HAVE_GETMNTENT_R # include <mntent.h> @@ -2990,6 +2991,114 @@ virGetGroupName(gid_t gid ATTRIBUTE_UNUSED) } #endif /* HAVE_GETPWUID_R */ +#if WITH_CAPNG +/* Set the real and effective uid and gid to the given values, while + * maintaining the capabilities indicated by bits in @capBits. return + * 0 on success, -1 on failure (the original system error remains in + * errno). + */ +int +virSetUIDGIDWithCaps(uid_t uid, gid_t gid, unsigned long long capBits) +{ + int ii, capng_ret, ret = -1; + bool need_setgid = false, need_setuid = false; + bool need_prctl = false, need_setpcap = false; + + /* First drop all caps except those in capBits + the extra ones we + * need to change uid/gid and change the capabilities bounding + * set. + */ + + capng_clear(CAPNG_SELECT_BOTH); + + for (ii = 0; ii <= CAP_LAST_CAP; ii++) { + if (capBits & (1ULL << ii)) { + capng_update(CAPNG_ADD, + CAPNG_EFFECTIVE|CAPNG_INHERITABLE| + CAPNG_PERMITTED|CAPNG_BOUNDING_SET, + ii); + } + } + + if (gid != -1 && gid > 0 && !capng_have_capability(CAPNG_EFFECTIVE, CAP_SETGID)) { + need_setgid = true; + capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETGID); + } + if (uid != -1 && uid > 0 && !capng_have_capability(CAPNG_EFFECTIVE, CAP_SETUID)) { + need_setuid = true; + capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETUID); + } +# ifdef PR_CAPBSET_DROP + // If newer kernel, we need also need setpcap to change the bounding set + if ((capBits || need_setgid || need_setuid) && + !capng_have_capability(CAPNG_EFFECTIVE, CAP_SETPCAP)) { + need_setpcap = true; + } + if (need_setpcap) + capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETPCAP); +# endif + + need_prctl = capBits || need_setgid || need_setuid || need_setpcap; + + // Tell system we want to keep caps across uid change + if (need_prctl && prctl(PR_SET_KEEPCAPS, 1, 0, 0, 0)) { + virReportSystemError(errno, "%s", + _("prctl failed to set KEEPCAPS")); + goto cleanup; + } + + // Change to the temp capabilities + if ((capng_ret = capng_apply(CAPNG_SELECT_BOTH)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("cannot apply process capabilities %d"), capng_ret); + goto cleanup; + } + + /* Set UID/GID */ + if (virSetUIDGID(uid, gid) < 0) + goto cleanup; + + // Tell it we are done keeping capabilities + if (need_prctl && prctl(PR_SET_KEEPCAPS, 0, 0, 0, 0)) { + virReportSystemError(errno, "%s", + _("prctl failed to reset KEEPCAPS")); + goto cleanup; + } + + /* Drop the caps that allow setuid/gid (unless they were requested) */ + if (need_setgid) + capng_update(CAPNG_DROP, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETGID); + if (need_setuid) + capng_update(CAPNG_DROP, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETUID); + /* Throw away CAP_SETPCAP so no more changes */ + if (need_setpcap) + capng_update(CAPNG_DROP, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETPCAP); + + if (need_prctl && ((capng_ret = capng_apply(CAPNG_SELECT_BOTH)) < 0)) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("cannot apply process capabilities %d"), capng_ret); + ret = -1; + goto cleanup; + } + + ret = 0; +cleanup: + return ret; +} + +#else +/* + * On platforms without libcapng, the capabilities setting is treated + * as a NOP. + */ + +int +virSetUIDGIDWithCaps(uid_t uid, gid_t gid, unsigned long long capBits) +{ + return virSetUIDGID(uid, gid); +} +#endif + #if defined HAVE_MNTENT_H && defined HAVE_GETMNTENT_R /* search /proc/mounts for mount point of *type; return pointer to diff --git a/src/util/virutil.h b/src/util/virutil.h index 4201aa1..2dc3403 100644 --- a/src/util/virutil.h +++ b/src/util/virutil.h @@ -54,6 +54,7 @@ int virPipeReadUntilEOF(int outfd, int errfd, char **outbuf, char **errbuf); int virSetUIDGID(uid_t uid, gid_t gid); +int virSetUIDGIDWithCaps(uid_t uid, gid_t gid, unsigned long long capBits); int virFileReadLimFD(int fd, int maxlen, char **buf) ATTRIBUTE_RETURN_CHECK; -- 1.8.1

On 02/07/2013 02:37 PM, Laine Stump wrote:
Normally when a process' uid is changed to non-0, all the capabilities bits are cleared, even those explicitly set with calls to capng_update()/capng_apply() made immediately before setuid. And *after* the process' uid has been changed, it no longer has the necessary privileges to add capabilities back to the process.
In order to set a non-0 uid while still maintaining any capabilities bits, it is necessary to either call capng_change_id() (which unfortunately doesn't not currently call initgroups to setup auxiliary
s/doesn't not/doesn't/
group membership), or to perform the small amount of calisthenics contained in the new utility function virSetUIDGIDWithCaps().
Note that, due to the way that effective capabilities are computed (at time of execve) for a process that has uid != 0, the *file* capabilities of the binary being executed must also have the desired capabilities bit(s) set (see "man 7 capabilities"). This can be done with the "filecap" command (e.g. "filecap /usr/bin/qemu-kvm sys_rawio").
Sheesh - does that mean that we have to fix up qemu packaging for Fedora to include that capability by default?
+#if WITH_CAPNG +/* Set the real and effective uid and gid to the given values, while + * maintaining the capabilities indicated by bits in @capBits. return + * 0 on success, -1 on failure (the original system error remains in + * errno). + */ +int +virSetUIDGIDWithCaps(uid_t uid, gid_t gid, unsigned long long capBits) +{ + int ii, capng_ret, ret = -1; + bool need_setgid = false, need_setuid = false; + bool need_prctl = false, need_setpcap = false; + + /* First drop all caps except those in capBits + the extra ones we + * need to change uid/gid and change the capabilities bounding + * set. + */ + + capng_clear(CAPNG_SELECT_BOTH); + + for (ii = 0; ii <= CAP_LAST_CAP; ii++) { + if (capBits & (1ULL << ii)) { + capng_update(CAPNG_ADD, + CAPNG_EFFECTIVE|CAPNG_INHERITABLE| + CAPNG_PERMITTED|CAPNG_BOUNDING_SET, + ii); + } + } + + if (gid != -1 && gid > 0 && !capng_have_capability(CAPNG_EFFECTIVE, CAP_SETGID)) {
Another site that needs casting (gid != (gid_t) -1); and do we really want to short-circuit the gid>0 case?
+ need_setgid = true; + capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETGID); + } + if (uid != -1 && uid > 0 && !capng_have_capability(CAPNG_EFFECTIVE, CAP_SETUID)) { + need_setuid = true; + capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETUID); + } +# ifdef PR_CAPBSET_DROP + // If newer kernel, we need also need setpcap to change the bounding set
Prefer /**/ comment style, please.
+ if ((capBits || need_setgid || need_setuid) && + !capng_have_capability(CAPNG_EFFECTIVE, CAP_SETPCAP)) { + need_setpcap = true; + } + if (need_setpcap) + capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETPCAP);
Odd spacing.
+# endif + + need_prctl = capBits || need_setgid || need_setuid || need_setpcap; + + // Tell system we want to keep caps across uid change
More comment style.
+ if (need_prctl && prctl(PR_SET_KEEPCAPS, 1, 0, 0, 0)) { + virReportSystemError(errno, "%s", + _("prctl failed to set KEEPCAPS")); + goto cleanup; + } + + // Change to the temp capabilities
And again.
+ if ((capng_ret = capng_apply(CAPNG_SELECT_BOTH)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("cannot apply process capabilities %d"), capng_ret); + goto cleanup; + } + + /* Set UID/GID */ + if (virSetUIDGID(uid, gid) < 0) + goto cleanup; + + // Tell it we are done keeping capabilities
and again
+} + +#else +/* + * On platforms without libcapng, the capabilities setting is treated + * as a NOP. + */ + +int +virSetUIDGIDWithCaps(uid_t uid, gid_t gid, unsigned long long capBits)
Need ATTRIBUTE_UNUSED on capBits.
+{ + return virSetUIDGID(uid, gid); +} +#endif +
#if defined HAVE_MNTENT_H && defined HAVE_GETMNTENT_R /* search /proc/mounts for mount point of *type; return pointer to diff --git a/src/util/virutil.h b/src/util/virutil.h index 4201aa1..2dc3403 100644 --- a/src/util/virutil.h +++ b/src/util/virutil.h @@ -54,6 +54,7 @@ int virPipeReadUntilEOF(int outfd, int errfd, char **outbuf, char **errbuf);
int virSetUIDGID(uid_t uid, gid_t gid); +int virSetUIDGIDWithCaps(uid_t uid, gid_t gid, unsigned long long capBits);
int virFileReadLimFD(int fd, int maxlen, char **buf) ATTRIBUTE_RETURN_CHECK;
-- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 02/08/2013 02:01 PM, Eric Blake wrote:
On 02/07/2013 02:37 PM, Laine Stump wrote:
Normally when a process' uid is changed to non-0, all the capabilities bits are cleared, even those explicitly set with calls to capng_update()/capng_apply() made immediately before setuid. And *after* the process' uid has been changed, it no longer has the necessary privileges to add capabilities back to the process.
In order to set a non-0 uid while still maintaining any capabilities bits, it is necessary to either call capng_change_id() (which unfortunately doesn't not currently call initgroups to setup auxiliary s/doesn't not/doesn't/
group membership), or to perform the small amount of calisthenics contained in the new utility function virSetUIDGIDWithCaps().
Note that, due to the way that effective capabilities are computed (at time of execve) for a process that has uid != 0, the *file* capabilities of the binary being executed must also have the desired capabilities bit(s) set (see "man 7 capabilities"). This can be done with the "filecap" command (e.g. "filecap /usr/bin/qemu-kvm sys_rawio"). Sheesh - does that mean that we have to fix up qemu packaging for Fedora to include that capability by default?
Well, the kernel guys actually have a patch I'm about to try out that will hopefully eliminate the need for CAP_COMPROMISE_KERNEL, so we won't need it for that. And the demand to have generic SCSI command passthrough is apparently so small that nobody has even tried it (or possibly they just all gave up and set qemu user to root). But at any rate, for now I think we can just leave qemu as it is and let people set the filecap themselves if they need this (unusual) capability; if there is enough whining, then maybe we can pass it along to qemu.
+#if WITH_CAPNG +/* Set the real and effective uid and gid to the given values, while + * maintaining the capabilities indicated by bits in @capBits. return + * 0 on success, -1 on failure (the original system error remains in + * errno). + */ +int +virSetUIDGIDWithCaps(uid_t uid, gid_t gid, unsigned long long capBits) +{ + int ii, capng_ret, ret = -1; + bool need_setgid = false, need_setuid = false; + bool need_prctl = false, need_setpcap = false; + + /* First drop all caps except those in capBits + the extra ones we + * need to change uid/gid and change the capabilities bounding + * set. + */ + + capng_clear(CAPNG_SELECT_BOTH); + + for (ii = 0; ii <= CAP_LAST_CAP; ii++) { + if (capBits & (1ULL << ii)) { + capng_update(CAPNG_ADD, + CAPNG_EFFECTIVE|CAPNG_INHERITABLE| + CAPNG_PERMITTED|CAPNG_BOUNDING_SET, + ii); + } + } + + if (gid != -1 && gid > 0 && !capng_have_capability(CAPNG_EFFECTIVE, CAP_SETGID)) {
Another site that needs casting (gid != (gid_t) -1); and do we really want to short-circuit the gid>0 case?
Yeah, I've fixed all of those.
+ need_setgid = true; + capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETGID); + } + if (uid != -1 && uid > 0 && !capng_have_capability(CAPNG_EFFECTIVE, CAP_SETUID)) { + need_setuid = true; + capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETUID); + } +# ifdef PR_CAPBSET_DROP + // If newer kernel, we need also need setpcap to change the bounding set Prefer /**/ comment style, please.
Oops. That's due to pasting the source of capng_change_id(). I'll fix all of those.
+} + +#else +/* + * On platforms without libcapng, the capabilities setting is treated + * as a NOP. + */ + +int +virSetUIDGIDWithCaps(uid_t uid, gid_t gid, unsigned long long capBits) Need ATTRIBUTE_UNUSED on capBits.
Right.

virCommand was previously calling virSetUIDGID() to change the uid and gid of the child process, then separately calling virSetCapabilities(). This did not work if the desired uid was != 0, since a setuid to anything other than 0 normally clears all capabilities bits. The solution is to use the new virSetUIDGIDWithCaps(), sending it the uid, gid, and capabilities bits. This will get the new process setup properly. Since the static functions virSetCapabilities() and virClearCapabilities are no longer called, they have been removed. NOTE: When combined with "filecap $path-to-qemu sys_rawio", this patch will make CAP_SYS_RAWIO (which is required for passthrough of generic scsi commands to a guest - see commits e8daeeb, 177db08, 397e6a7, and 74e0349) be retained by qemu when necessary. Apparently that capability has been broken for non-root qemu every since it was originally added. --- src/util/vircommand.c | 76 ++++++--------------------------------------------- 1 file changed, 8 insertions(+), 68 deletions(-) diff --git a/src/util/vircommand.c b/src/util/vircommand.c index 2eb11f5..2419315 100644 --- a/src/util/vircommand.c +++ b/src/util/vircommand.c @@ -177,65 +177,6 @@ virCommandFDSet(int fd, #ifndef WIN32 -static int virClearCapabilities(void) ATTRIBUTE_UNUSED; - -# if WITH_CAPNG -static int virClearCapabilities(void) -{ - int ret; - - capng_clear(CAPNG_SELECT_BOTH); - - if ((ret = capng_apply(CAPNG_SELECT_BOTH)) < 0) { - virReportError(VIR_ERR_INTERNAL_ERROR, - _("cannot clear process capabilities %d"), ret); - return -1; - } - - return 0; -} - -/** - * virSetCapabilities: - * @capabilities - capability flag to set. - * In case of 0, this function is identical to - * virClearCapabilities() - * - */ -static int virSetCapabilities(unsigned long long capabilities) -{ - int ret, i; - - capng_clear(CAPNG_SELECT_BOTH); - - for (i = 0; i <= CAP_LAST_CAP; i++) { - if (capabilities & (1ULL << i)) - capng_update(CAPNG_ADD, CAPNG_BOUNDING_SET, i); - } - - if ((ret = capng_apply(CAPNG_SELECT_BOTH)) < 0) { - virReportError(VIR_ERR_INTERNAL_ERROR, - _("cannot apply process capabilities %d"), ret); - return -1; - } - - return 0; -} -# else -static int virClearCapabilities(void) -{ -// VIR_WARN("libcap-ng support not compiled in, unable to clear " -// "capabilities"); - return 0; -} - -static int -virSetCapabilities(unsigned long long capabilities ATTRIBUTE_UNUSED) -{ - return 0; -} -# endif - /** * virFork: * @pid - a pointer to a pid_t that will receive the return value from @@ -637,18 +578,17 @@ virExec(virCommandPtr cmd) } - if (cmd->uid > 0 || cmd->gid > 0) { - VIR_DEBUG("Setting child uid:gid to %u:%u", cmd->uid, cmd->gid); - if (virSetUIDGID(cmd->uid, cmd->gid) < 0) + /* The steps above may need todo something privileged, so we delay + * setuid and clearing capabilities until the last minute. + */ + if (cmd->uid > 0 || cmd->gid > 0 || + cmd->capabilities || (cmd->flags & VIR_EXEC_CLEAR_CAPS)) { + VIR_DEBUG("Setting child uid:gid to %u:%u with caps %llx", + cmd->uid, cmd->gid, cmd->capabilities); + if (virSetUIDGIDWithCaps(cmd->uid, cmd->gid, cmd->capabilities) < 0) goto fork_error; } - /* The steps above may need todo something privileged, so - * we delay clearing capabilities until the last minute */ - if (cmd->capabilities || (cmd->flags & VIR_EXEC_CLEAR_CAPS)) - if (virSetCapabilities(cmd->capabilities) < 0) - goto fork_error; - if (cmd->pwd) { VIR_DEBUG("Running child in %s", cmd->pwd); if (chdir(cmd->pwd) < 0) { -- 1.8.1

On 02/07/2013 02:37 PM, Laine Stump wrote:
virCommand was previously calling virSetUIDGID() to change the uid and gid of the child process, then separately calling virSetCapabilities(). This did not work if the desired uid was != 0, since a setuid to anything other than 0 normally clears all capabilities bits.
The solution is to use the new virSetUIDGIDWithCaps(), sending it the uid, gid, and capabilities bits. This will get the new process setup properly.
Since the static functions virSetCapabilities() and virClearCapabilities are no longer called, they have been removed.
NOTE: When combined with "filecap $path-to-qemu sys_rawio", this patch will make CAP_SYS_RAWIO (which is required for passthrough of generic scsi commands to a guest - see commits e8daeeb, 177db08, 397e6a7, and 74e0349) be retained by qemu when necessary. Apparently that capability has been broken for non-root qemu every since it was
s/every/ever/
originally added. --- src/util/vircommand.c | 76 ++++++--------------------------------------------- 1 file changed, 8 insertions(+), 68 deletions(-)
ACK.
-# else -static int virClearCapabilities(void) -{ -// VIR_WARN("libcap-ng support not compiled in, unable to clear " -// "capabilities");
Odd that we had commented this out previously. Should patch 13/15 log any warnings when we are not preserving/clearing capabilities, rather than silently ignoring the capability request?
- if (cmd->uid > 0 || cmd->gid > 0) { - VIR_DEBUG("Setting child uid:gid to %u:%u", cmd->uid, cmd->gid); - if (virSetUIDGID(cmd->uid, cmd->gid) < 0) + /* The steps above may need todo something privileged, so we delay
As long as you are touching this comment, s/todo/to do/ (but you've moved it at least twice in this series, so it depends on how much churn you want on when you finally fix it). -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 02/08/2013 02:05 PM, Eric Blake wrote:
On 02/07/2013 02:37 PM, Laine Stump wrote:
virCommand was previously calling virSetUIDGID() to change the uid and gid of the child process, then separately calling virSetCapabilities(). This did not work if the desired uid was != 0, since a setuid to anything other than 0 normally clears all capabilities bits.
The solution is to use the new virSetUIDGIDWithCaps(), sending it the uid, gid, and capabilities bits. This will get the new process setup properly.
Since the static functions virSetCapabilities() and virClearCapabilities are no longer called, they have been removed.
NOTE: When combined with "filecap $path-to-qemu sys_rawio", this patch will make CAP_SYS_RAWIO (which is required for passthrough of generic scsi commands to a guest - see commits e8daeeb, 177db08, 397e6a7, and 74e0349) be retained by qemu when necessary. Apparently that capability has been broken for non-root qemu every since it was s/every/ever/
originally added. --- src/util/vircommand.c | 76 ++++++--------------------------------------------- 1 file changed, 8 insertions(+), 68 deletions(-) ACK.
-# else -static int virClearCapabilities(void) -{ -// VIR_WARN("libcap-ng support not compiled in, unable to clear " -// "capabilities"); Odd that we had commented this out previously.
1) Even more odd is that this function wasn't being called from anywhere, but still existed *and* didn't cause a compile error. I had thought that unreferenced static functions always caused a warning (which we promote to an error). 2) There are several places where we call virCommandClearCapabilities() without regard to whether or not WITH_CAPNG is actually set. Systems without WITH_CAPNG would have tons of warnings, which I'm guessing people don't want. If we want to do a warning or error, maybe we could do it at configure time (maybe require people to specifically say "--without-capng" or something).
Should patch 13/15 log any warnings when we are not preserving/clearing capabilities, rather than silently ignoring the capability request?
- if (cmd->uid > 0 || cmd->gid > 0) { - VIR_DEBUG("Setting child uid:gid to %u:%u", cmd->uid, cmd->gid); - if (virSetUIDGID(cmd->uid, cmd->gid) < 0) + /* The steps above may need todo something privileged, so we delay
As long as you are touching this comment, s/todo/to do/ (but you've moved it at least twice in this series, so it depends on how much churn you want on when you finally fix it).
Okay. I'll do it here.

Any system with CAP_COMPROMISE_KERNEL available in the kernel was not able to perform PCI passthrough device assignment without 1) running qemu as root *and* 2) setting "clear_emulator_capabilities=0" in /etc/libvirt/qemu.conf. This patch is the final piece to make pci passthrough once again work properly with a non-root qemu. It sets CAP_COMPROMISE_KERNEL; now that virCommand is properly setup to honor that request for non-root child processes, it will actually do some good. It is still necessary to set the file capability for the qemu binary, however (see the rules for determining effective caps of a process running as non-root in "man 7 capabilities"). This can be done with: filecap $path-to-qemu-binary compromise_kernel --- src/qemu/qemu_process.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index a343ac3..537c515 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -3859,6 +3859,17 @@ int qemuProcessStart(virConnectPtr conn, if (cfg->clearEmulatorCapabilities) virCommandClearCaps(cmd); +#ifdef CAP_COMPROMISE_KERNEL + /* On kernels that have this capability, we must set it for qemu + * in order for PCI passthrough with -device pci-assign to work + * (the qemu binary must also have that *file* capability, set + * with "filecap /usr/bin/qemu-kvm compromise_kernel", for + * example). (either that, or qemu must be run as root, with + * "clear_emulator_capabilities=0" in /etc/libvirt/qemu.conf). + */ + virCommandAllowCap(cmd, CAP_COMPROMISE_KERNEL); +#endif + /* in case a certain disk is desirous of CAP_SYS_RAWIO, add this */ for (i = 0; i < vm->def->ndisks; i++) { virDomainDiskDefPtr disk = vm->def->disks[i]; -- 1.8.1

On 02/07/2013 02:37 PM, Laine Stump wrote:
Any system with CAP_COMPROMISE_KERNEL available in the kernel was not able to perform PCI passthrough device assignment without 1) running qemu as root *and* 2) setting "clear_emulator_capabilities=0" in /etc/libvirt/qemu.conf.
This patch is the final piece to make pci passthrough once again work properly with a non-root qemu. It sets CAP_COMPROMISE_KERNEL; now that virCommand is properly setup to honor that request for non-root child processes, it will actually do some good.
It is still necessary to set the file capability for the qemu binary, however (see the rules for determining effective caps of a process running as non-root in "man 7 capabilities"). This can be done with:
filecap $path-to-qemu-binary compromise_kernel
Sounds like something that should be done by default at least for the Fedora packaging of qemu - that is, if the kernel folks don't honor our request to make CAP_COMPROMISE_KERNEL needed only on open() rather than all read()/write(). We may not need this patch, if the kernel folks are sensible. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Fri, Feb 08, 2013 at 12:07:16PM -0700, Eric Blake wrote:
On 02/07/2013 02:37 PM, Laine Stump wrote:
Any system with CAP_COMPROMISE_KERNEL available in the kernel was not able to perform PCI passthrough device assignment without 1) running qemu as root *and* 2) setting "clear_emulator_capabilities=0" in /etc/libvirt/qemu.conf.
This patch is the final piece to make pci passthrough once again work properly with a non-root qemu. It sets CAP_COMPROMISE_KERNEL; now that virCommand is properly setup to honor that request for non-root child processes, it will actually do some good.
It is still necessary to set the file capability for the qemu binary, however (see the rules for determining effective caps of a process running as non-root in "man 7 capabilities"). This can be done with:
filecap $path-to-qemu-binary compromise_kernel
Sounds like something that should be done by default at least for the Fedora packaging of qemu - that is, if the kernel folks don't honor our request to make CAP_COMPROMISE_KERNEL needed only on open() rather than all read()/write().
We may not need this patch, if the kernel folks are sensible.
Yes, I want to push this back onto the kernel developers. IMHO this is a userspace ABI change they've made here. The secureboot stuff should be a complete no-op if the kernel is not booted in secureboot mode, but the current kernel patch does not satisfy that. I don't think it should be libvirt or KVM's job to fix this kernel breakage. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 02/07/2013 04:37 PM, Laine Stump wrote:
There are a bunch of patches here, but each is small and single-purpose to make reviewing easier (and also so that any potential regressions can be easily bisected).
The original purpose of the patches was to permit setting CAP_COMPROMISE_KERNEL for non-root qemu processes, since Fedora 18 now requires that in order for generic PCI passthrough to work (the alternative was to always run qemu as root).
I forgot to mention the BZ I opened to track this: https://bugzilla.redhat.com/show_bug.cgi?id=908888
participants (3)
-
Daniel P. Berrange
-
Eric Blake
-
Laine Stump