[libvirt] [PATCHv2 00/15] Permit setting capabilities for uid!=0 processes

(and also properly clear capabilities bounding set for child processes) This replaces the earlier version of the same patches: https://www.redhat.com/archives/libvir-list/2013-February/msg00446.html Many of these patches were already ACKed in the first version. If I haven't modified them (other than rebasing) I note that in the subject. Other patches have been modified based on Dan and Eric's reviews of V1. Also, since I first posted the series, we've had a talk with some kernel people who are amenable to modify the semantics of CAP_COMPROMISE_KERNEL such that it's only checked on open(), not on read or write. If this is done, we will (thankfully!) no longer need to set CAP_COMPROMISE_KERNEL for the qemu process. However, the first 14 patches in this series are still useful. Two big changes are: 1) it allows setting CAP_SYS_RAWIO when needed for generic scsi command passthrough, and 2) the bounding set of child processes will now be properly cleared (it previously wasn't, even though the code looked like it *should* have been doing so). Here is the blurb from V1's intro: There are a bunch of patches here, but each is small and single-purpose to make reviewing easier (and also so that any potential regressions can be easily bisected). The original purpose of the patches was to permit setting CAP_COMPROMISE_KERNEL for non-root qemu processes, since Fedora 18 now requires that in order for generic PCI passthrough to work (the alternative was to always run qemu as root). Although we may not actually want to do that part (if we can convince kernel people to implement CAP_COMPROMISE_KERNEL such that it's only required when *opening* the necessary sysfs file (done by libvirt), rather than for every read/write (done by qemu), then we will not need CAP_COMPROMISE_KERNEL for qemu), but that is just a couple lines in the final patch, and the rest of the series is still useful, as it make dropping/keeping caps truly work for non-root child processes - this has never before been the case. (for example, CAP_SYS_RAWIO is needed for generic scsi passthrough to work, and until now the only way to have that was to run *all* qemus as root). A bit higher level description of what I've done with all the patches: 1) remove the programmable "hook" from virExecWithHook(), since that function was only called from one place, and always with the same hook function. Rename virExecWithHook() to virExec(), and replace the call to that hook with inline code. 2) give virCommand an API to set the intended uid/gid of the command that's going to be run, and use that instead of a "user hook" where appropriate (in the process completely eliminating two hook functions). 3) Also add an API to virCommand to do the final "set the process label" step for selinux/apparmor. 4) Add a new API to the security driver (and use it from qemu) called virSecurityManagerSetChildProcessLabel() which a) is called prior to virCommandRun() rather than from a command "hook" function, b) takes a virCommand, and c) rather than immediately performing the operation (as virSecurityManagerSetProcessLabel() did), merely stores the necessary information in the virCommand so that virExec can perform the operation (setting selinux label, setuid/gid, etc) 5) make a new function combining the setting of uid/gid and maintaining of capabilities, because that is the only way you can set uid!=0 and still maintain capabilities. Use this in virExec() 6) *Finally* set the CAP_COMPROMISE_KERNEL capability unconditionally for all qemu processes. (If we really do have to do this, we may want to consider making it a qemu.conf setting). Laine Stump (15): util: eliminate generic hook from virExecWithHook util: eliminate extra args from virExec util: refactor virCommandHook into virExec and virCommandHandshakeChild util: add virCommandSetUID and virCommandSetGID util: make virSetUIDGID a NOP only when uid or gid is -1 qemu: replace exec hook with virCommandSetUID/GID in qemuCaps* qemu: replace exec hook with virCommandSetUID/GID in storage_backend build: define SECDRIVER_LIBS in Makefile.am util: add security label setting to virCommand security: add new virSecurityManagerSetChildProcessLabel API qemu: let virCommand set child process security labels/uid/gid util: drop capabilities immediately after changing uid/gid of child util: virSetUIDGIDWithCaps - change uid while keeping caps util: maintain caps when running command with uid != 0 qemu: set CAP_COMPROMISE_KERNEL so that pci passthrough works src/Makefile.am | 36 ++- src/libvirt_private.syms | 6 + src/qemu/qemu_capabilities.c | 64 ++--- src/qemu/qemu_process.c | 23 +- src/security/security_apparmor.c | 42 +++- src/security/security_dac.c | 24 +- src/security/security_driver.h | 6 +- src/security/security_manager.c | 13 +- src/security/security_manager.h | 6 +- src/security/security_nop.c | 10 +- src/security/security_selinux.c | 32 +++ src/security/security_stack.c | 20 +- src/storage/storage_backend.c | 28 +-- src/util/vircommand.c | 523 ++++++++++++++++++++------------------- src/util/vircommand.h | 12 +- src/util/virutil.c | 115 ++++++++- src/util/virutil.h | 1 + 17 files changed, 595 insertions(+), 366 deletions(-) -- 1.8.1

virExecWithHook is only called from one place, so it always has the same "hook" function (virHookCommand), and the data sent to that function is always a virCommandPtr, so eliminate the function and generic data from the arglist, and replace it with "virCommandPtr cmd". The call to (hook)(data) is replaced with "virHookCommand(cmd)". Finally, virExecWithHook is renamed to virExec. Indentation has been updated only for code that will remain after the next patch, which will remove all other args to virExec (since they are now redundant, as they're all members of virCommandPtr). --- Change from V1: rebased. src/util/vircommand.c | 79 ++++++++++++++++++++++++--------------------------- 1 file changed, 37 insertions(+), 42 deletions(-) diff --git a/src/util/vircommand.c b/src/util/vircommand.c index db7dbca..02bf50d 100644 --- a/src/util/vircommand.c +++ b/src/util/vircommand.c @@ -1,7 +1,7 @@ /* * vircommand.c: Child command execution * - * Copyright (C) 2010-2012 Red Hat, Inc. + * Copyright (C) 2010-2013 Red Hat, Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -45,7 +45,7 @@ #define VIR_FROM_THIS VIR_FROM_NONE -/* Flags for virExecWithHook */ +/* Flags for virExec */ enum { VIR_EXEC_NONE = 0, VIR_EXEC_NONBLOCK = (1 << 0), @@ -104,6 +104,8 @@ struct _virCommand { unsigned long long capabilities; }; +static int virCommandHook(virCommandPtr cmd); + /* * virCommandFDIsSet: * @fd: FD to test @@ -390,21 +392,19 @@ prepareStdFd(int fd, int std) * VIR_EXEC_NONE : Default function behavior * VIR_EXEC_NONBLOCK : Set child process output fd's as non-blocking * VIR_EXEC_DAEMON : Daemonize the child process - * @hook optional virExecHook function to call prior to exec - * @data data to pass to the hook function + * @cmd virCommandPtr to pass to virCommandHook * @pidfile path to use as pidfile for daemonized process (needs DAEMON flag) * @capabilities capabilities to keep */ static int -virExecWithHook(const char *const*argv, +virExec(virCommandPtr cmd, + const char *const*argv, const char *const*envp, const int *keepfd, int keepfd_size, pid_t *retpid, int infd, int *outfd, int *errfd, unsigned int flags, - virExecHook hook, - void *data, char *pidfile, unsigned long long capabilities) { @@ -417,6 +417,7 @@ virExecWithHook(const char *const*argv, int tmpfd; const char *binary = NULL; int forkRet; + struct sigaction waxon, waxoff; if (argv[0][0] != '/') { if (!(binary = virFindFileInPath(argv[0]))) { @@ -602,33 +603,30 @@ virExecWithHook(const char *const*argv, } } - if (hook) { - /* virFork reset all signal handlers to the defaults. - * This is good for the child process, but our hook - * risks running something that generates SIGPIPE, - * so we need to temporarily block that again - */ - struct sigaction waxon, waxoff; - memset(&waxoff, 0, sizeof(waxoff)); - waxoff.sa_handler = SIG_IGN; - sigemptyset(&waxoff.sa_mask); - memset(&waxon, 0, sizeof(waxon)); - if (sigaction(SIGPIPE, &waxoff, &waxon) < 0) { - virReportSystemError(errno, "%s", - _("Could not disable SIGPIPE")); - goto fork_error; - } + /* virFork reset all signal handlers to the defaults. + * This is good for the child process, but our hook + * risks running something that generates SIGPIPE, + * so we need to temporarily block that again + */ + memset(&waxoff, 0, sizeof(waxoff)); + waxoff.sa_handler = SIG_IGN; + sigemptyset(&waxoff.sa_mask); + memset(&waxon, 0, sizeof(waxon)); + if (sigaction(SIGPIPE, &waxoff, &waxon) < 0) { + virReportSystemError(errno, "%s", + _("Could not disable SIGPIPE")); + goto fork_error; + } - if ((hook)(data) != 0) { - VIR_DEBUG("Hook function failed."); - goto fork_error; - } + if (virCommandHook(cmd) != 0) { + VIR_DEBUG("Hook function failed."); + goto fork_error; + } - if (sigaction(SIGPIPE, &waxon, NULL) < 0) { - virReportSystemError(errno, "%s", - _("Could not re-enable SIGPIPE")); - goto fork_error; - } + if (sigaction(SIGPIPE, &waxon, NULL) < 0) { + virReportSystemError(errno, "%s", + _("Could not re-enable SIGPIPE")); + goto fork_error; } /* The steps above may need todo something privileged, so @@ -712,7 +710,8 @@ virRun(const char *const *argv ATTRIBUTE_UNUSED, } static int -virExecWithHook(const char *const*argv ATTRIBUTE_UNUSED, +virExec(virCommandPtr cmd ATTRIBUTE_UNUSED, + const char *const*argv ATTRIBUTE_UNUSED, const char *const*envp ATTRIBUTE_UNUSED, const int *keepfd ATTRIBUTE_UNUSED, int keepfd_size ATTRIBUTE_UNUSED, @@ -721,15 +720,13 @@ virExecWithHook(const char *const*argv ATTRIBUTE_UNUSED, int *outfd ATTRIBUTE_UNUSED, int *errfd ATTRIBUTE_UNUSED, int flags_unused ATTRIBUTE_UNUSED, - virExecHook hook ATTRIBUTE_UNUSED, - void *data ATTRIBUTE_UNUSED, char *pidfile ATTRIBUTE_UNUSED, unsigned long long capabilities ATTRIBUTE_UNUSED) { /* XXX: Some day we can implement pieces of virCommand/virExec on * top of _spawn() or CreateProcess(), but we can't implement * everything, since mingw completely lacks fork(), so we cannot - * run hook code in the child. */ + * run our own code in the child process. */ virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("virExec is not implemented for WIN32")); return -1; @@ -2061,9 +2058,8 @@ virCommandRun(virCommandPtr cmd, int *exitstatus) * Perform all virCommand-specific actions, along with the user hook. */ static int -virCommandHook(void *data) +virCommandHook(virCommandPtr cmd) { - virCommandPtr cmd = data; int res = 0; if (cmd->hook) { @@ -2401,7 +2397,8 @@ virCommandRunAsync(virCommandPtr cmd, pid_t *pid) VIR_DEBUG("About to run %s", str ? str : cmd->args[0]); VIR_FREE(str); - ret = virExecWithHook((const char *const *)cmd->args, + ret = virExec(cmd, + (const char *const *)cmd->args, (const char *const *)cmd->env, cmd->preserve, cmd->preserve_size, @@ -2410,8 +2407,6 @@ virCommandRunAsync(virCommandPtr cmd, pid_t *pid) cmd->outfdptr, cmd->errfdptr, cmd->flags, - virCommandHook, - cmd, cmd->pidfile, cmd->capabilities); @@ -2549,7 +2544,7 @@ void virCommandAbort(virCommandPtr cmd ATTRIBUTE_UNUSED) { /* Mingw lacks WNOHANG and kill(). But since we haven't ported - * virExecWithHook to mingw yet, there's no process to be killed, + * virExec to mingw yet, there's no process to be killed, * making this implementation trivially correct for now :) */ } #endif -- 1.8.1

On 02/12/2013 01:15 PM, Laine Stump wrote:
virExecWithHook is only called from one place, so it always has the same "hook" function (virHookCommand), and the data sent to that function is always a virCommandPtr, so eliminate the function and generic data from the arglist, and replace it with "virCommandPtr cmd". The call to (hook)(data) is replaced with "virHookCommand(cmd)". Finally, virExecWithHook is renamed to virExec.
Indentation has been updated only for code that will remain after the next patch, which will remove all other args to virExec (since they are now redundant, as they're all members of virCommandPtr). --- Change from V1: rebased.
+static int virCommandHook(virCommandPtr cmd); +
Is it worth a separate patch that just does code motion to hoist the function, to avoid a forward declaration? (I'm a fan of topological sorting of static functions, where it makes sense, if you haven't guessed.) But no impact to this particular patch, so the ack stands. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Tue, Feb 12, 2013 at 03:15:35PM -0500, Laine Stump wrote:
virExecWithHook is only called from one place, so it always has the same "hook" function (virHookCommand), and the data sent to that function is always a virCommandPtr, so eliminate the function and generic data from the arglist, and replace it with "virCommandPtr cmd". The call to (hook)(data) is replaced with "virHookCommand(cmd)". Finally, virExecWithHook is renamed to virExec.
Indentation has been updated only for code that will remain after the next patch, which will remove all other args to virExec (since they are now redundant, as they're all members of virCommandPtr). --- Change from V1: rebased.
src/util/vircommand.c | 79 ++++++++++++++++++++++++--------------------------- 1 file changed, 37 insertions(+), 42 deletions(-)
If you have a series like this where a number of patches have been ACKd on previous posting, just merge the ACKd patches. Don't keep re-posting them, unless they depend on earlier patches in the series which are not yet ACKd. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

All args except "cmd" in the call to virExec are now redundant, since they can all be found in cmd, so remove the args and reference the data directly in cmd. One exception to this is that "infd" was being modified within virExec, and modifying the original in cmd caused make check failures, so cmd->infd is copied to a local, and the local is used during virExec(). --- Change from V1: rebased. src/util/vircommand.c | 142 +++++++++++++++++--------------------------------- 1 file changed, 49 insertions(+), 93 deletions(-) diff --git a/src/util/vircommand.c b/src/util/vircommand.c index 02bf50d..b41c78b 100644 --- a/src/util/vircommand.c +++ b/src/util/vircommand.c @@ -378,40 +378,18 @@ prepareStdFd(int fd, int std) } /* - * @argv argv to exec - * @envp optional environment to use for exec - * @keepfd options fd_ret to keep open for child process - * @retpid optional pointer to store child process pid - * @infd optional file descriptor to use as child input, otherwise /dev/null - * @outfd optional pointer to communicate output fd behavior - * outfd == NULL : Use /dev/null - * *outfd == -1 : Use a new fd - * *outfd != -1 : Use *outfd - * @errfd optional pointer to communcate error fd behavior. See outfd - * @flags possible combination of the following: - * VIR_EXEC_NONE : Default function behavior - * VIR_EXEC_NONBLOCK : Set child process output fd's as non-blocking - * VIR_EXEC_DAEMON : Daemonize the child process - * @cmd virCommandPtr to pass to virCommandHook - * @pidfile path to use as pidfile for daemonized process (needs DAEMON flag) - * @capabilities capabilities to keep + * virExec: + * @cmd virCommandPtr containing all information about the program to + * exec. */ static int -virExec(virCommandPtr cmd, - const char *const*argv, - const char *const*envp, - const int *keepfd, - int keepfd_size, - pid_t *retpid, - int infd, int *outfd, int *errfd, - unsigned int flags, - char *pidfile, - unsigned long long capabilities) +virExec(virCommandPtr cmd) { pid_t pid; int null = -1, i, openmax; int pipeout[2] = {-1,-1}; int pipeerr[2] = {-1,-1}; + int childin = cmd->infd; int childout = -1; int childerr = -1; int tmpfd; @@ -419,41 +397,41 @@ virExec(virCommandPtr cmd, int forkRet; struct sigaction waxon, waxoff; - if (argv[0][0] != '/') { - if (!(binary = virFindFileInPath(argv[0]))) { + if (cmd->args[0][0] != '/') { + if (!(binary = virFindFileInPath(cmd->args[0]))) { virReportSystemError(ENOENT, _("Cannot find '%s' in path"), - argv[0]); + cmd->args[0]); return -1; } } else { - binary = argv[0]; + binary = cmd->args[0]; } - if (infd < 0) { + if (childin < 0) { if (getDevNull(&null) < 0) goto cleanup; - infd = null; + childin = null; } - if (outfd != NULL) { - if (*outfd == -1) { + if (cmd->outfdptr != NULL) { + if (*cmd->outfdptr == -1) { if (pipe2(pipeout, O_CLOEXEC) < 0) { virReportSystemError(errno, "%s", _("cannot create pipe")); goto cleanup; } - if ((flags & VIR_EXEC_NONBLOCK) && + if ((cmd->flags & VIR_EXEC_NONBLOCK) && virSetNonBlock(pipeout[0]) == -1) { - virReportSystemError(errno, - "%s", _("Failed to set non-blocking file descriptor flag")); + virReportSystemError(errno, "%s", + _("Failed to set non-blocking file descriptor flag")); goto cleanup; } childout = pipeout[1]; } else { - childout = *outfd; + childout = *cmd->outfdptr; } } else { if (getDevNull(&null) < 0) @@ -461,26 +439,26 @@ virExec(virCommandPtr cmd, childout = null; } - if (errfd != NULL) { - if (errfd == outfd) { + if (cmd->errfdptr != NULL) { + if (cmd->errfdptr == cmd->outfdptr) { childerr = childout; - } else if (*errfd == -1) { + } else if (*cmd->errfdptr == -1) { if (pipe2(pipeerr, O_CLOEXEC) < 0) { virReportSystemError(errno, "%s", _("Failed to create pipe")); goto cleanup; } - if ((flags & VIR_EXEC_NONBLOCK) && + if ((cmd->flags & VIR_EXEC_NONBLOCK) && virSetNonBlock(pipeerr[0]) == -1) { - virReportSystemError(errno, - "%s", _("Failed to set non-blocking file descriptor flag")); + virReportSystemError(errno, "%s", + _("Failed to set non-blocking file descriptor flag")); goto cleanup; } childerr = pipeerr[1]; } else { - childerr = *errfd; + childerr = *cmd->errfdptr; } } else { if (getDevNull(&null) < 0) @@ -500,18 +478,18 @@ virExec(virCommandPtr cmd, } VIR_FORCE_CLOSE(null); - if (outfd && *outfd == -1) { + if (cmd->outfdptr && *cmd->outfdptr == -1) { VIR_FORCE_CLOSE(pipeout[1]); - *outfd = pipeout[0]; + *cmd->outfdptr = pipeout[0]; } - if (errfd && *errfd == -1) { + if (cmd->errfdptr && *cmd->errfdptr == -1) { VIR_FORCE_CLOSE(pipeerr[1]); - *errfd = pipeerr[0]; + *cmd->errfdptr = pipeerr[0]; } - *retpid = pid; + cmd->pid = pid; - if (binary != argv[0]) + if (binary != cmd->args[0]) VIR_FREE(binary); return 0; @@ -528,9 +506,10 @@ virExec(virCommandPtr cmd, openmax = sysconf(_SC_OPEN_MAX); for (i = 3; i < openmax; i++) { - if (i == infd || i == childout || i == childerr) + if (i == childin || i == childout || i == childerr) continue; - if (!keepfd || !virCommandFDIsSet(i, keepfd, keepfd_size)) { + if (!cmd->preserve || + !virCommandFDIsSet(i, cmd->preserve, cmd->preserve_size)) { tmpfd = i; VIR_MASS_CLOSE(tmpfd); } else if (virSetInherit(i, true) < 0) { @@ -539,7 +518,7 @@ virExec(virCommandPtr cmd, } } - if (prepareStdFd(infd, STDIN_FILENO) < 0) { + if (prepareStdFd(childin, STDIN_FILENO) < 0) { virReportSystemError(errno, "%s", _("failed to setup stdin file handle")); goto fork_error; @@ -555,9 +534,9 @@ virExec(virCommandPtr cmd, goto fork_error; } - if (infd != STDIN_FILENO && infd != null && infd != childerr && - infd != childout) - VIR_FORCE_CLOSE(infd); + if (childin != STDIN_FILENO && childin != null && + childin != childerr && childin != childout) + VIR_FORCE_CLOSE(childin); if (childout > STDERR_FILENO && childout != null && childout != childerr) VIR_FORCE_CLOSE(childout); if (childerr > STDERR_FILENO && childerr != null) @@ -569,7 +548,7 @@ virExec(virCommandPtr cmd, /* Daemonize as late as possible, so the parent process can detect * the above errors with wait* */ - if (flags & VIR_EXEC_DAEMON) { + if (cmd->flags & VIR_EXEC_DAEMON) { if (setsid() < 0) { virReportSystemError(errno, "%s", _("cannot become session leader")); @@ -590,13 +569,13 @@ virExec(virCommandPtr cmd, } if (pid > 0) { - if (pidfile && (virPidFileWritePath(pidfile,pid) < 0)) { + if (cmd->pidfile && (virPidFileWritePath(cmd->pidfile, pid) < 0)) { kill(pid, SIGTERM); usleep(500*1000); kill(pid, SIGTERM); virReportSystemError(errno, _("could not write pidfile %s for %d"), - pidfile, pid); + cmd->pidfile, pid); goto fork_error; } _exit(0); @@ -631,21 +610,21 @@ virExec(virCommandPtr cmd, /* The steps above may need todo something privileged, so * we delay clearing capabilities until the last minute */ - if (capabilities || (flags & VIR_EXEC_CLEAR_CAPS)) - if (virSetCapabilities(capabilities) < 0) + if (cmd->capabilities || (cmd->flags & VIR_EXEC_CLEAR_CAPS)) + if (virSetCapabilities(cmd->capabilities) < 0) goto fork_error; /* Close logging again to ensure no FDs leak to child */ virLogReset(); - if (envp) - execve(binary, (char **) argv, (char**)envp); + if (cmd->env) + execve(binary, cmd->args, cmd->env); else - execv(binary, (char **) argv); + execv(binary, cmd->args); virReportSystemError(errno, _("cannot execute binary %s"), - argv[0]); + cmd->args[0]); fork_error: virDispatchError(NULL); @@ -655,7 +634,7 @@ virExec(virCommandPtr cmd, /* This is cleanup of parent process only - child should never jump here on error */ - if (binary != argv[0]) + if (binary != cmd->args[0]) VIR_FREE(binary); /* NB we don't virReportError() on any failures here @@ -710,18 +689,7 @@ virRun(const char *const *argv ATTRIBUTE_UNUSED, } static int -virExec(virCommandPtr cmd ATTRIBUTE_UNUSED, - const char *const*argv ATTRIBUTE_UNUSED, - const char *const*envp ATTRIBUTE_UNUSED, - const int *keepfd ATTRIBUTE_UNUSED, - int keepfd_size ATTRIBUTE_UNUSED, - pid_t *retpid ATTRIBUTE_UNUSED, - int infd ATTRIBUTE_UNUSED, - int *outfd ATTRIBUTE_UNUSED, - int *errfd ATTRIBUTE_UNUSED, - int flags_unused ATTRIBUTE_UNUSED, - char *pidfile ATTRIBUTE_UNUSED, - unsigned long long capabilities ATTRIBUTE_UNUSED) +virExec(virCommandPtr cmd ATTRIBUTE_UNUSED) { /* XXX: Some day we can implement pieces of virCommand/virExec on * top of _spawn() or CreateProcess(), but we can't implement @@ -2397,19 +2365,7 @@ virCommandRunAsync(virCommandPtr cmd, pid_t *pid) VIR_DEBUG("About to run %s", str ? str : cmd->args[0]); VIR_FREE(str); - ret = virExec(cmd, - (const char *const *)cmd->args, - (const char *const *)cmd->env, - cmd->preserve, - cmd->preserve_size, - &cmd->pid, - cmd->infd, - cmd->outfdptr, - cmd->errfdptr, - cmd->flags, - cmd->pidfile, - cmd->capabilities); - + ret = virExec(cmd); VIR_DEBUG("Command result %d, with PID %d", ret, (int)cmd->pid); -- 1.8.1

--- change from V1: rebased src/util/vircommand.c | 149 +++++++++++++++++++++++--------------------------- 1 file changed, 68 insertions(+), 81 deletions(-) diff --git a/src/util/vircommand.c b/src/util/vircommand.c index b41c78b..ba81c14 100644 --- a/src/util/vircommand.c +++ b/src/util/vircommand.c @@ -104,7 +104,7 @@ struct _virCommand { unsigned long long capabilities; }; -static int virCommandHook(virCommandPtr cmd); +static int virCommandHandshakeChild(virCommandPtr cmd); /* * virCommandFDIsSet: @@ -394,7 +394,7 @@ virExec(virCommandPtr cmd) int childerr = -1; int tmpfd; const char *binary = NULL; - int forkRet; + int forkRet, ret; struct sigaction waxon, waxoff; if (cmd->args[0][0] != '/') { @@ -597,11 +597,26 @@ virExec(virCommandPtr cmd) goto fork_error; } - if (virCommandHook(cmd) != 0) { - VIR_DEBUG("Hook function failed."); - goto fork_error; + if (cmd->hook) { + VIR_DEBUG("Run hook %p %p", cmd->hook, cmd->opaque); + ret = cmd->hook(cmd->opaque); + VIR_DEBUG("Done hook %d", ret); + if (ret < 0) + goto fork_error; } + if (cmd->pwd) { + VIR_DEBUG("Running child in %s", cmd->pwd); + if (chdir(cmd->pwd) < 0) { + virReportSystemError(errno, + _("Unable to change to %s"), cmd->pwd); + goto fork_error; + } + } + + if (virCommandHandshakeChild(cmd) < 0) + goto fork_error; + if (sigaction(SIGPIPE, &waxon, NULL) < 0) { virReportSystemError(errno, "%s", _("Could not re-enable SIGPIPE")); @@ -2022,82 +2037,6 @@ virCommandRun(virCommandPtr cmd, int *exitstatus) } -/* - * Perform all virCommand-specific actions, along with the user hook. - */ -static int -virCommandHook(virCommandPtr cmd) -{ - int res = 0; - - if (cmd->hook) { - VIR_DEBUG("Run hook %p %p", cmd->hook, cmd->opaque); - res = cmd->hook(cmd->opaque); - VIR_DEBUG("Done hook %d", res); - } - if (res == 0 && cmd->pwd) { - VIR_DEBUG("Running child in %s", cmd->pwd); - res = chdir(cmd->pwd); - if (res < 0) { - virReportSystemError(errno, - _("Unable to change to %s"), cmd->pwd); - } - } - if (cmd->handshake) { - char c = res < 0 ? '0' : '1'; - int rv; - VIR_DEBUG("Notifying parent for handshake start on %d", - cmd->handshakeWait[1]); - if (safewrite(cmd->handshakeWait[1], &c, sizeof(c)) != sizeof(c)) { - virReportSystemError(errno, "%s", - _("Unable to notify parent process")); - return -1; - } - - /* On failure we pass the error message back to parent, - * so they don't have to dig through stderr logs - */ - if (res < 0) { - virErrorPtr err = virGetLastError(); - const char *msg = err ? err->message : - _("Unknown failure during hook execution"); - size_t len = strlen(msg) + 1; - if (safewrite(cmd->handshakeWait[1], msg, len) != len) { - virReportSystemError(errno, "%s", - _("Unable to send error to parent")); - return -1; - } - return -1; - } - - VIR_DEBUG("Waiting on parent for handshake complete on %d", - cmd->handshakeNotify[0]); - if ((rv = saferead(cmd->handshakeNotify[0], &c, - sizeof(c))) != sizeof(c)) { - if (rv < 0) - virReportSystemError(errno, "%s", - _("Unable to wait on parent process")); - else - virReportSystemError(EIO, "%s", - _("libvirtd quit during handshake")); - return -1; - } - if (c != '1') { - virReportSystemError(EINVAL, - _("Unexpected confirm code '%c' from parent"), - c); - return -1; - } - VIR_FORCE_CLOSE(cmd->handshakeWait[1]); - VIR_FORCE_CLOSE(cmd->handshakeNotify[0]); - } - - VIR_DEBUG("Hook is done %d", res); - - return res; -} - - static void virCommandHandleReadWrite(int watch, int fd, int events, void *opaque) { @@ -2546,6 +2485,54 @@ void virCommandRequireHandshake(virCommandPtr cmd) cmd->handshake = true; } +/* virCommandHandshakeChild: + * + * child side of handshake - called by child process in virExec() to + * indicate to parent that the child process has successfully + * completed its pre-exec initialization. + */ +static int +virCommandHandshakeChild(virCommandPtr cmd) +{ + char c = '1'; + int rv; + + if (!cmd->handshake) + return true; + + VIR_DEBUG("Notifying parent for handshake start on %d", + cmd->handshakeWait[1]); + if (safewrite(cmd->handshakeWait[1], &c, sizeof(c)) != sizeof(c)) { + virReportSystemError(errno, "%s", + _("Unable to notify parent process")); + return -1; + } + + VIR_DEBUG("Waiting on parent for handshake complete on %d", + cmd->handshakeNotify[0]); + if ((rv = saferead(cmd->handshakeNotify[0], &c, + sizeof(c))) != sizeof(c)) { + if (rv < 0) + virReportSystemError(errno, "%s", + _("Unable to wait on parent process")); + else + virReportSystemError(EIO, "%s", + _("libvirtd quit during handshake")); + return -1; + } + if (c != '1') { + virReportSystemError(EINVAL, + _("Unexpected confirm code '%c' from parent"), + c); + return -1; + } + VIR_FORCE_CLOSE(cmd->handshakeWait[1]); + VIR_FORCE_CLOSE(cmd->handshakeNotify[0]); + + VIR_DEBUG("Handshake with parent is done"); + return 0; +} + /** * virCommandHandshakeWait: * @cmd: command to wait on -- 1.8.1

If a uid and/or gid is specified for a command, it will be set just after the user-supplied post-fork "hook" function is called. The intent is that this can replace user hook functions that set uid/gid. This moves the setting of uid/gid and dropping of capabilities closer to each other, which is important since the two should really be done at the same time (libcapng provides a single function that does both, which we will be unable to use, but want to mimic as closely as possible). --- Change from V1: * only bypass uid/gid setting if they are -1 * cast -1 to ([gu]id_t) when comparing to a [gu]id_t * cast uid and gid to (int) for printing src/libvirt_private.syms | 2 ++ src/util/vircommand.c | 29 +++++++++++++++++++++++++++++ src/util/vircommand.h | 6 +++++- 3 files changed, 36 insertions(+), 1 deletion(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index b9d45a2..511a686 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -158,12 +158,14 @@ virCommandRun; virCommandRunAsync; virCommandSetErrorBuffer; virCommandSetErrorFD; +virCommandSetGID; virCommandSetInputBuffer; virCommandSetInputFD; virCommandSetOutputBuffer; virCommandSetOutputFD; virCommandSetPidFile; virCommandSetPreExecHook; +virCommandSetUID; virCommandSetWorkingDirectory; virCommandToString; virCommandTransferFD; diff --git a/src/util/vircommand.c b/src/util/vircommand.c index ba81c14..84d275f 100644 --- a/src/util/vircommand.c +++ b/src/util/vircommand.c @@ -101,6 +101,8 @@ struct _virCommand { char *pidfile; bool reap; + uid_t uid; + gid_t gid; unsigned long long capabilities; }; @@ -605,6 +607,13 @@ virExec(virCommandPtr cmd) goto fork_error; } + if (cmd->uid != (uid_t)-1 || cmd->gid != (gid_t)-1) { + VIR_DEBUG("Setting child uid:gid to %d:%d", + (int)cmd->uid, (int)cmd->gid); + if (virSetUIDGID(cmd->uid, cmd->gid) < 0) + goto fork_error; + } + if (cmd->pwd) { VIR_DEBUG("Running child in %s", cmd->pwd); if (chdir(cmd->pwd) < 0) { @@ -767,6 +776,8 @@ virCommandNewArgs(const char *const*args) cmd->infd = cmd->outfd = cmd->errfd = -1; cmd->inWatch = cmd->outWatch = cmd->errWatch = -1; cmd->pid = -1; + cmd->uid = -1; + cmd->gid = -1; virCommandAddArgSet(cmd, args); @@ -905,6 +916,24 @@ virCommandSetPidFile(virCommandPtr cmd, const char *pidfile) } +void +virCommandSetGID(virCommandPtr cmd, gid_t gid) +{ + if (!cmd || cmd->has_error) + return; + + cmd->gid = gid; +} + +void +virCommandSetUID(virCommandPtr cmd, uid_t uid) +{ + if (!cmd || cmd->has_error) + return; + + cmd->uid = uid; +} + /** * virCommandClearCaps: * @cmd: the command to modify diff --git a/src/util/vircommand.h b/src/util/vircommand.h index c1a2e24..a4022fa 100644 --- a/src/util/vircommand.h +++ b/src/util/vircommand.h @@ -1,7 +1,7 @@ /* * vircommand.h: Child command execution * - * Copyright (C) 2010-2011 Red Hat, Inc. + * Copyright (C) 2010-2013 Red Hat, Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -61,6 +61,10 @@ void virCommandTransferFD(virCommandPtr cmd, void virCommandSetPidFile(virCommandPtr cmd, const char *pidfile) ATTRIBUTE_NONNULL(2); +void virCommandSetGID(virCommandPtr cmd, gid_t gid); + +void virCommandSetUID(virCommandPtr cmd, uid_t uid); + void virCommandClearCaps(virCommandPtr cmd); void virCommandAllowCap(virCommandPtr cmd, -- 1.8.1

On 02/12/2013 01:15 PM, Laine Stump wrote:
If a uid and/or gid is specified for a command, it will be set just after the user-supplied post-fork "hook" function is called.
The intent is that this can replace user hook functions that set uid/gid. This moves the setting of uid/gid and dropping of capabilities closer to each other, which is important since the two should really be done at the same time (libcapng provides a single function that does both, which we will be unable to use, but want to mimic as closely as possible). --- Change from V1: * only bypass uid/gid setting if they are -1 * cast -1 to ([gu]id_t) when comparing to a [gu]id_t * cast uid and gid to (int) for printing
src/libvirt_private.syms | 2 ++ src/util/vircommand.c | 29 +++++++++++++++++++++++++++++ src/util/vircommand.h | 6 +++++- 3 files changed, 36 insertions(+), 1 deletion(-)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index b9d45a2..511a686 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -158,12 +158,14 @@ virCommandRun; virCommandRunAsync; virCommandSetErrorBuffer; virCommandSetErrorFD; +virCommandSetGID; virCommandSetInputBuffer; virCommandSetInputFD; virCommandSetOutputBuffer; virCommandSetOutputFD; virCommandSetPidFile; virCommandSetPreExecHook; +virCommandSetUID;
Is it common enough to set both gid/uid at once, in order to make this a single function virCommandSetUIDGID?
@@ -605,6 +607,13 @@ virExec(virCommandPtr cmd) goto fork_error; }
+ if (cmd->uid != (uid_t)-1 || cmd->gid != (gid_t)-1) { + VIR_DEBUG("Setting child uid:gid to %d:%d", + (int)cmd->uid, (int)cmd->gid); + if (virSetUIDGID(cmd->uid, cmd->gid) < 0)
In fact, down at a lower layer in the stack, we pass both ids at once. Hmm, in the chown() case, doing both at once lets you use one syscall instead of two; but in the set*id() functions, it's separate syscalls for uid vs. gid no matter what we do, so I guess it doesn't really matter whether it is two separate calls or one combined call higher up in the stack. But what you have with separate calls works, so I don't mind whether you keep it as-is, to save the hassle of rippling a combined call through the rest of the series. ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 02/12/2013 07:16 PM, Eric Blake wrote:
On 02/12/2013 01:15 PM, Laine Stump wrote:
If a uid and/or gid is specified for a command, it will be set just after the user-supplied post-fork "hook" function is called.
The intent is that this can replace user hook functions that set uid/gid. This moves the setting of uid/gid and dropping of capabilities closer to each other, which is important since the two should really be done at the same time (libcapng provides a single function that does both, which we will be unable to use, but want to mimic as closely as possible). --- Change from V1: * only bypass uid/gid setting if they are -1 * cast -1 to ([gu]id_t) when comparing to a [gu]id_t * cast uid and gid to (int) for printing
src/libvirt_private.syms | 2 ++ src/util/vircommand.c | 29 +++++++++++++++++++++++++++++ src/util/vircommand.h | 6 +++++- 3 files changed, 36 insertions(+), 1 deletion(-)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index b9d45a2..511a686 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -158,12 +158,14 @@ virCommandRun; virCommandRunAsync; virCommandSetErrorBuffer; virCommandSetErrorFD; +virCommandSetGID; virCommandSetInputBuffer; virCommandSetInputFD; virCommandSetOutputBuffer; virCommandSetOutputFD; virCommandSetPidFile; virCommandSetPreExecHook; +virCommandSetUID; Is it common enough to set both gid/uid at once, in order to make this a single function virCommandSetUIDGID?
Well, at the bottom of the call chain, all three operations (setuid, setgid, and set/clear capabilities) have to be done in a single function, but setting capabilities is necessarily a separate function at the higher level because you need to call it multiple times to set multiple capabilities, and it seemed like a natural extension to make setuid and setgid separate functions at this level too; seems to go along with the general philosophy in virCommand of having many separate simple calls, rather than a few really complicated ones.
@@ -605,6 +607,13 @@ virExec(virCommandPtr cmd) goto fork_error; }
+ if (cmd->uid != (uid_t)-1 || cmd->gid != (gid_t)-1) { + VIR_DEBUG("Setting child uid:gid to %d:%d", + (int)cmd->uid, (int)cmd->gid); + if (virSetUIDGID(cmd->uid, cmd->gid) < 0) In fact, down at a lower layer in the stack, we pass both ids at once.
Hmm, in the chown() case, doing both at once lets you use one syscall instead of two; but in the set*id() functions, it's separate syscalls for uid vs. gid no matter what we do, so I guess it doesn't really matter whether it is two separate calls or one combined call higher up in the stack. But what you have with separate calls works, so I don't mind whether you keep it as-is, to save the hassle of rippling a combined call through the rest of the series.
ACK.

Rather than treating uid:gid of 0:0 as a NOP, we blindly pass that through to the lower layers. However, we *do* check for a requested value of "-1" to mean "don't change this setting". setregid() and setreuid() already interpret -1 as a NOP, so this is just an optimization, but we are also calling getpwuid_r and initgroups, and it's unclear what the former would do with a uid of -1. --- Change from V1: * only bypass uid/gid setting if they are -1 rather than > 0 * cast -1 to ([gu]id_t) when comparing to a [gu]id_t src/util/virutil.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/util/virutil.c b/src/util/virutil.c index 24ba954..0d7db00 100644 --- a/src/util/virutil.c +++ b/src/util/virutil.c @@ -2687,7 +2687,7 @@ virSetUIDGID(uid_t uid, gid_t gid) int err; char *buf = NULL; - if (gid > 0) { + if (gid != (gid_t)-1) { if (setregid(gid, gid) < 0) { virReportSystemError(err = errno, _("cannot change to '%d' group"), @@ -2696,7 +2696,7 @@ virSetUIDGID(uid_t uid, gid_t gid) } } - if (uid > 0) { + if (uid != (uid_t)-1) { # ifdef HAVE_INITGROUPS struct passwd pwd, *pwd_result; size_t bufsize; -- 1.8.1

On 02/12/2013 01:15 PM, Laine Stump wrote:
Rather than treating uid:gid of 0:0 as a NOP, we blindly pass that through to the lower layers. However, we *do* check for a requested value of "-1" to mean "don't change this setting". setregid() and setreuid() already interpret -1 as a NOP, so this is just an optimization, but we are also calling getpwuid_r and initgroups, and it's unclear what the former would do with a uid of -1. --- Change from V1: * only bypass uid/gid setting if they are -1 rather than > 0 * cast -1 to ([gu]id_t) when comparing to a [gu]id_t
ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

Setting the uid/gid of the child process was the only thing done by the hook function in this case, and that can now be done more simply with virCommandSetUID/GID. --- Change from V1: rebased. src/qemu/qemu_capabilities.c | 64 +++++++++++++------------------------------- 1 file changed, 18 insertions(+), 46 deletions(-) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 4efe052..51fc9dc 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -279,37 +279,10 @@ static const char *virQEMUCapsArchToString(virArch arch) } -struct _virQEMUCapsHookData { - uid_t runUid; - gid_t runGid; -}; -typedef struct _virQEMUCapsHookData virQEMUCapsHookData; -typedef virQEMUCapsHookData *virQEMUCapsHookDataPtr; - -static int virQEMUCapsHook(void * data) -{ - int ret; - virQEMUCapsHookDataPtr hookData = data; - - if (!hookData) { - virReportError(VIR_ERR_INTERNAL_ERROR, "%s", - _("QEMU uid:gid not specified by caller")); - ret = -1; - goto cleanup; - } - - VIR_DEBUG("Switch QEMU uid:gid to %d:%d", - hookData->runUid, hookData->runGid); - ret = virSetUIDGID(hookData->runUid, hookData->runGid); - -cleanup: - return ret; -} - static virCommandPtr virQEMUCapsProbeCommand(const char *qemu, virQEMUCapsPtr qemuCaps, - virQEMUCapsHookDataPtr hookData) + uid_t runUid, gid_t runGid) { virCommandPtr cmd = virCommandNew(qemu); @@ -322,7 +295,8 @@ virQEMUCapsProbeCommand(const char *qemu, virCommandAddEnvPassCommon(cmd); virCommandClearCaps(cmd); - virCommandSetPreExecHook(cmd, virQEMUCapsHook, hookData); + virCommandSetGID(cmd, runGid); + virCommandSetUID(cmd, runUid); return cmd; } @@ -416,7 +390,8 @@ no_memory: } static int -virQEMUCapsProbeMachineTypes(virQEMUCapsPtr qemuCaps, virQEMUCapsHookDataPtr hookData) +virQEMUCapsProbeMachineTypes(virQEMUCapsPtr qemuCaps, + uid_t runUid, gid_t runGid) { char *output; int ret = -1; @@ -433,7 +408,7 @@ virQEMUCapsProbeMachineTypes(virQEMUCapsPtr qemuCaps, virQEMUCapsHookDataPtr hoo return -1; } - cmd = virQEMUCapsProbeCommand(qemuCaps->binary, qemuCaps, hookData); + cmd = virQEMUCapsProbeCommand(qemuCaps->binary, qemuCaps, runUid, runGid); virCommandAddArgList(cmd, "-M", "?", NULL); virCommandSetOutputBuffer(cmd, &output); @@ -572,7 +547,7 @@ cleanup: } static int -virQEMUCapsProbeCPUModels(virQEMUCapsPtr qemuCaps, virQEMUCapsHookDataPtr hookData) +virQEMUCapsProbeCPUModels(virQEMUCapsPtr qemuCaps, uid_t runUid, gid_t runGid) { char *output = NULL; int ret = -1; @@ -590,7 +565,7 @@ virQEMUCapsProbeCPUModels(virQEMUCapsPtr qemuCaps, virQEMUCapsHookDataPtr hookDa return 0; } - cmd = virQEMUCapsProbeCommand(qemuCaps->binary, qemuCaps, hookData); + cmd = virQEMUCapsProbeCommand(qemuCaps->binary, qemuCaps, runUid, runGid); virCommandAddArgList(cmd, "-cpu", "?", NULL); virCommandSetOutputBuffer(cmd, &output); @@ -1601,7 +1576,7 @@ virQEMUCapsParseDeviceStr(virQEMUCapsPtr qemuCaps, const char *str) static int virQEMUCapsExtractDeviceStr(const char *qemu, virQEMUCapsPtr qemuCaps, - virQEMUCapsHookDataPtr hookData) + uid_t runUid, gid_t runGid) { char *output = NULL; virCommandPtr cmd; @@ -1615,7 +1590,7 @@ virQEMUCapsExtractDeviceStr(const char *qemu, * understand '-device name,?', and always exits with status 1 for * the simpler '-device ?', so this function is really only useful * if -help includes "device driver,?". */ - cmd = virQEMUCapsProbeCommand(qemu, qemuCaps, hookData); + cmd = virQEMUCapsProbeCommand(qemu, qemuCaps, runUid, runGid); virCommandAddArgList(cmd, "-device", "?", "-device", "pci-assign,?", @@ -2183,7 +2158,6 @@ virQEMUCapsInitHelp(virQEMUCapsPtr qemuCaps, uid_t runUid, gid_t runGid) char *help = NULL; int ret = -1; const char *tmp; - virQEMUCapsHookData hookData; VIR_DEBUG("qemuCaps=%p", qemuCaps); @@ -2196,9 +2170,7 @@ virQEMUCapsInitHelp(virQEMUCapsPtr qemuCaps, uid_t runUid, gid_t runGid) qemuCaps->arch = virArchFromHost(); } - hookData.runUid = runUid; - hookData.runGid = runGid; - cmd = virQEMUCapsProbeCommand(qemuCaps->binary, NULL, &hookData); + cmd = virQEMUCapsProbeCommand(qemuCaps->binary, NULL, runUid, runGid); virCommandAddArgList(cmd, "-help", NULL); virCommandSetOutputBuffer(cmd, &help); @@ -2227,13 +2199,15 @@ virQEMUCapsInitHelp(virQEMUCapsPtr qemuCaps, uid_t runUid, gid_t runGid) * understands the 0.13.0+ notion of "-device driver,". */ if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE) && strstr(help, "-device driver,?") && - virQEMUCapsExtractDeviceStr(qemuCaps->binary, qemuCaps, &hookData) < 0) + virQEMUCapsExtractDeviceStr(qemuCaps->binary, + qemuCaps, runUid, runGid) < 0) { goto cleanup; + } - if (virQEMUCapsProbeCPUModels(qemuCaps, &hookData) < 0) + if (virQEMUCapsProbeCPUModels(qemuCaps, runUid, runGid) < 0) goto cleanup; - if (virQEMUCapsProbeMachineTypes(qemuCaps, &hookData) < 0) + if (virQEMUCapsProbeMachineTypes(qemuCaps, runUid, runGid) < 0) goto cleanup; ret = 0; @@ -2329,7 +2303,6 @@ virQEMUCapsInitQMP(virQEMUCapsPtr qemuCaps, char *monarg = NULL; char *monpath = NULL; char *pidfile = NULL; - virQEMUCapsHookData hookData; char *archstr; pid_t pid = 0; virDomainObj vm; @@ -2383,9 +2356,8 @@ virQEMUCapsInitQMP(virQEMUCapsPtr qemuCaps, NULL); virCommandAddEnvPassCommon(cmd); virCommandClearCaps(cmd); - hookData.runUid = runUid; - hookData.runGid = runGid; - virCommandSetPreExecHook(cmd, virQEMUCapsHook, &hookData); + virCommandSetGID(cmd, runGid); + virCommandSetUID(cmd, runUid); if (virCommandRun(cmd, &status) < 0) goto cleanup; -- 1.8.1

On 02/12/2013 01:15 PM, Laine Stump wrote:
Setting the uid/gid of the child process was the only thing done by the hook function in this case, and that can now be done more simply with virCommandSetUID/GID. --- Change from V1: rebased.
static virCommandPtr virQEMUCapsProbeCommand(const char *qemu, virQEMUCapsPtr qemuCaps, - virQEMUCapsHookDataPtr hookData) + uid_t runUid, gid_t runGid) { virCommandPtr cmd = virCommandNew(qemu);
@@ -322,7 +295,8 @@ virQEMUCapsProbeCommand(const char *qemu,
virCommandAddEnvPassCommon(cmd); virCommandClearCaps(cmd); - virCommandSetPreExecHook(cmd, virQEMUCapsHook, hookData); + virCommandSetGID(cmd, runGid); + virCommandSetUID(cmd, runUid);
Back to my argument in 4/15 - it looks funny to see virQEMUCapsProbeCommand taking two arguments, just to call into two virCommandSet* calls with one argument, just to be combined back into virSetUIDGID with two arguments. But again, I'm not going to insist on a respin for a cosmetic difference. ACK still stands. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

--- Change from V1: * reset uid/gid to -1 rather than 0 when re-using virCommand src/storage/storage_backend.c | 28 ++++++---------------------- 1 file changed, 6 insertions(+), 22 deletions(-) diff --git a/src/storage/storage_backend.c b/src/storage/storage_backend.c index cab72c6..b0ddd46 100644 --- a/src/storage/storage_backend.c +++ b/src/storage/storage_backend.c @@ -1,7 +1,7 @@ /* * storage_backend.c: internal storage driver backend contract * - * Copyright (C) 2007-2012 Red Hat, Inc. + * Copyright (C) 2007-2013 Red Hat, Inc. * Copyright (C) 2007-2008 Daniel P. Berrange * * This library is free software; you can redistribute it and/or @@ -533,24 +533,6 @@ cleanup: return ret; } -struct hookdata { - virStorageVolDefPtr vol; - bool skip; -}; - -static int virStorageBuildSetUIDHook(void *data) { - struct hookdata *tmp = data; - virStorageVolDefPtr vol = tmp->vol; - - if (tmp->skip) - return 0; - - if (virSetUIDGID(vol->target.perms.uid, vol->target.perms.gid) < 0) - return -1; - - return 0; -} - static int virStorageBackendCreateExecCommand(virStoragePoolObjPtr pool, virStorageVolDefPtr vol, virCommandPtr cmd) { @@ -558,7 +540,6 @@ static int virStorageBackendCreateExecCommand(virStoragePoolObjPtr pool, gid_t gid; uid_t uid; int filecreated = 0; - struct hookdata data = {vol, false}; if ((pool->def->type == VIR_STORAGE_POOL_NETFS) && (((getuid() == 0) @@ -567,7 +548,8 @@ static int virStorageBackendCreateExecCommand(virStoragePoolObjPtr pool, || ((vol->target.perms.gid != -1) && (vol->target.perms.gid != getgid())))) { - virCommandSetPreExecHook(cmd, virStorageBuildSetUIDHook, &data); + virCommandSetUID(cmd, vol->target.perms.uid); + virCommandSetGID(cmd, vol->target.perms.gid); if (virCommandRun(cmd, NULL) == 0) { /* command was successfully run, check if the file was created */ @@ -576,7 +558,9 @@ static int virStorageBackendCreateExecCommand(virStoragePoolObjPtr pool, } } - data.skip = true; + /* don't change uid/gid if we retry */ + virCommandSetUID(cmd, -1); + virCommandSetGID(cmd, -1); if (!filecreated) { if (virCommandRun(cmd, NULL) < 0) { -- 1.8.1

On 02/12/2013 01:15 PM, Laine Stump wrote:
--- Change from V1: * reset uid/gid to -1 rather than 0 when re-using virCommand
src/storage/storage_backend.c | 28 ++++++---------------------- 1 file changed, 6 insertions(+), 22 deletions(-)
ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

This makes it simpler to include the necessary system security driver libraries for a particular system. For this patch, several existing conditional sections from the Makfile were replaced; I'll later be adding SECDRIVER_LIBS to libvirt_util_la_LIBADD, because vircommand.c will be calling a function from $securitylib. --- Change from V1: * initialize SECDRIVER_LIBS to empty * allow both WITH_SECDRIVER_SELINUX and WITH_SECDRIVER_APPARMOR in the same build src/Makefile.am | 33 ++++++++++++--------------------- 1 file changed, 12 insertions(+), 21 deletions(-) diff --git a/src/Makefile.am b/src/Makefile.am index d554aa1..12319e0 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -32,6 +32,14 @@ nodist_conf_DATA = THREAD_LIBS = $(LIB_PTHREAD) $(LTLIBMULTITHREAD) +SECDRIVER_LIBS = +if WITH_SECDRIVER_SELINUX +SECDRIVER_LIBS += $(SELINUX_LIBS) +endif +if WITH_SECDRIVER_APPARMOR +SECDRIVER_LIBS += $(APPARMOR_LIBS) +endif + if WITH_NETWORK UUID=$(shell uuidgen 2>/dev/null) endif @@ -987,12 +995,7 @@ if WITH_BLKID libvirt_driver_lxc_impl_la_CFLAGS += $(BLKID_CFLAGS) libvirt_driver_lxc_impl_la_LIBADD += $(BLKID_LIBS) endif -if WITH_SECDRIVER_SELINUX -libvirt_driver_lxc_impl_la_LIBADD += $(SELINUX_LIBS) -endif -if WITH_SECDRIVER_APPARMOR -libvirt_driver_lxc_impl_la_LIBADD += $(APPARMOR_LIBS) -endif +libvirt_driver_lxc_impl_la_LIBADD += $(SECDRIVER_LIBS) libvirt_driver_lxc_impl_la_SOURCES = $(LXC_DRIVER_SOURCES) conf_DATA += lxc/lxc.conf @@ -1159,12 +1162,7 @@ libvirt_driver_storage_impl_la_CFLAGS = \ -I$(top_srcdir)/src/conf $(AM_CFLAGS) libvirt_driver_storage_impl_la_LDFLAGS = $(AM_LDFLAGS) libvirt_driver_storage_impl_la_LIBADD = -if WITH_SECDRIVER_SELINUX -libvirt_driver_storage_impl_la_LIBADD += $(SELINUX_LIBS) -endif -if WITH_SECDRIVER_APPARMOR -libvirt_driver_storage_impl_la_LIBADD += $(APPARMOR_LIBS) -endif +libvirt_driver_storage_impl_la_LIBADD += $(SECDRIVER_LIBS) if WITH_BLKID libvirt_driver_storage_impl_la_CFLAGS += $(BLKID_CFLAGS) libvirt_driver_storage_impl_la_LIBADD += $(BLKID_LIBS) @@ -1277,16 +1275,14 @@ libvirt_la_BUILT_LIBADD += libvirt_security_manager.la libvirt_security_manager_la_CFLAGS = \ -I$(top_srcdir)/src/conf $(AM_CFLAGS) libvirt_security_manager_la_LDFLAGS = $(AM_LDFLAGS) -libvirt_security_manager_la_LIBADD = +libvirt_security_manager_la_LIBADD = $(SECDRIVER_LIBS) if WITH_SECDRIVER_SELINUX libvirt_security_manager_la_SOURCES += $(SECURITY_DRIVER_SELINUX_SOURCES) libvirt_security_manager_la_CFLAGS += $(SELINUX_CFLAGS) -libvirt_security_manager_la_LIBADD += $(SELINUX_LIBS) endif if WITH_SECDRIVER_APPARMOR libvirt_security_manager_la_SOURCES += $(SECURITY_DRIVER_APPARMOR_SOURCES) libvirt_security_manager_la_CFLAGS += $(APPARMOR_CFLAGS) -libvirt_security_manager_la_LIBADD += $(APPARMOR_LIBS) endif # Add all conditional sources just in case... @@ -1944,12 +1940,7 @@ libvirt_lxc_LDADD = \ if WITH_DTRACE_PROBES libvirt_lxc_LDADD += libvirt_probes.lo endif -if WITH_SECDRIVER_SELINUX -libvirt_lxc_LDADD += $(SELINUX_LIBS) -endif -if WITH_SECDRIVER_APPARMOR -libvirt_lxc_LDADD += $(APPARMOR_LIBS) -endif +libvirt_lxc_LDADD += $(SECDRIVER_LIBS) libvirt_lxc_CFLAGS = \ -I$(top_srcdir)/src/conf \ $(AM_CFLAGS) \ -- 1.8.1

On 02/12/2013 01:15 PM, Laine Stump wrote:
This makes it simpler to include the necessary system security driver libraries for a particular system. For this patch, several existing conditional sections from the Makfile were replaced; I'll later be adding SECDRIVER_LIBS to libvirt_util_la_LIBADD, because vircommand.c will be calling a function from $securitylib. --- Change from V1: * initialize SECDRIVER_LIBS to empty * allow both WITH_SECDRIVER_SELINUX and WITH_SECDRIVER_APPARMOR in the same build
Do you have a handy pointer for how to get apparmor headers installed for building with both libraries on Fedora?
src/Makefile.am | 33 ++++++++++++--------------------- 1 file changed, 12 insertions(+), 21 deletions(-)
ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 02/12/2013 07:24 PM, Eric Blake wrote:
On 02/12/2013 01:15 PM, Laine Stump wrote:
This makes it simpler to include the necessary system security driver libraries for a particular system. For this patch, several existing conditional sections from the Makfile were replaced; I'll later be adding SECDRIVER_LIBS to libvirt_util_la_LIBADD, because vircommand.c will be calling a function from $securitylib. --- Change from V1: * initialize SECDRIVER_LIBS to empty * allow both WITH_SECDRIVER_SELINUX and WITH_SECDRIVER_APPARMOR in the same build Do you have a handy pointer for how to get apparmor headers installed for building with both libraries on Fedora?
No idea. I just learned from Dan that it was at least theoretically possible, and am counting on debian/ubuntu people to actually test this bit, since I don't have any system that uses AppArmor.

Laine Stump wrote:
On 02/12/2013 07:24 PM, Eric Blake wrote:
On 02/12/2013 01:15 PM, Laine Stump wrote:
This makes it simpler to include the necessary system security driver libraries for a particular system. For this patch, several existing conditional sections from the Makfile were replaced; I'll later be adding SECDRIVER_LIBS to libvirt_util_la_LIBADD, because vircommand.c will be calling a function from $securitylib. --- Change from V1: * initialize SECDRIVER_LIBS to empty * allow both WITH_SECDRIVER_SELINUX and WITH_SECDRIVER_APPARMOR in the same build
Do you have a handy pointer for how to get apparmor headers installed for building with both libraries on Fedora?
No idea. I just learned from Dan that it was at least theoretically possible, and am counting on debian/ubuntu people to actually test this bit, since I don't have any system that uses AppArmor.
I verified this builds with both apparmor and selinux enabled. Regards, Jim

virCommand gets two new APIs: virCommandSetSELinuxLabel() and virCommandSetAppArmorProfile(), which both save a copy of a null-terminated string in the virCommand. During virCommandRun, if the string is non-NULL and we've been compiled with AppArmor and/or SELinux security driver support, the appropriate security library function is called for the child process, using the string that was previously set. In the case of SELinux, setexeccon_raw() is called, and for AppArmor, aa_change_profile() is called. This functionality has been added so that users of virCommand can use the upcoming virSecurityManagerSetChildProcessLabel() prior to running a child process, rather than needing to setup a hook function to be called (and in turn call virSecurityManagerSetProcessLabel()) *during* the setup of the child process. --- Change from V1: * V1 had a single API that did double duty for both SELinux and AppArmor (because I didn't realize both could be built in simultaneously). V1 treats each separately, with two different APIs. src/Makefile.am | 3 +- src/libvirt_private.syms | 2 + src/util/vircommand.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++ src/util/vircommand.h | 6 +++ 4 files changed, 105 insertions(+), 1 deletion(-) diff --git a/src/Makefile.am b/src/Makefile.am index 12319e0..0d2f2f8 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -765,7 +765,8 @@ libvirt_util_la_CFLAGS = $(CAPNG_CFLAGS) $(YAJL_CFLAGS) $(LIBNL_CFLAGS) \ $(DBUS_CFLAGS) $(LDEXP_LIBM) libvirt_util_la_LIBADD = $(CAPNG_LIBS) $(YAJL_LIBS) $(LIBNL_LIBS) \ $(THREAD_LIBS) $(AUDIT_LIBS) $(DEVMAPPER_LIBS) \ - $(LIB_CLOCK_GETTIME) $(DBUS_LIBS) $(MSCOM_LIBS) $(LIBXML_LIBS) + $(LIB_CLOCK_GETTIME) $(DBUS_LIBS) $(MSCOM_LIBS) $(LIBXML_LIBS) \ + $(SECDRIVER_LIBS) noinst_LTLIBRARIES += libvirt_conf.la diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 511a686..8241e6f 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -156,6 +156,7 @@ virCommandPreserveFD; virCommandRequireHandshake; virCommandRun; virCommandRunAsync; +virCommandSetAppArmorProfile; virCommandSetErrorBuffer; virCommandSetErrorFD; virCommandSetGID; @@ -165,6 +166,7 @@ virCommandSetOutputBuffer; virCommandSetOutputFD; virCommandSetPidFile; virCommandSetPreExecHook; +virCommandSetSELinuxLabel; virCommandSetUID; virCommandSetWorkingDirectory; virCommandToString; diff --git a/src/util/vircommand.c b/src/util/vircommand.c index 84d275f..9727809 100644 --- a/src/util/vircommand.c +++ b/src/util/vircommand.c @@ -33,6 +33,13 @@ # include <cap-ng.h> #endif +#if defined(WITH_SECDRIVER_SELINUX) +# include <selinux/selinux.h> +#endif +#if defined(WITH_SECDRIVER_APPARMOR) +# include <sys/apparmor.h> +#endif + #include "vircommand.h" #include "viralloc.h" #include "virerror.h" @@ -104,6 +111,12 @@ struct _virCommand { uid_t uid; gid_t gid; unsigned long long capabilities; +#if defined(WITH_SECDRIVER_SELINUX) + char *seLinuxLabel; +#endif +#if defined(WITH_SECDRIVER_APPARMOR) + char *appArmorProfile; +#endif }; static int virCommandHandshakeChild(virCommandPtr cmd); @@ -607,6 +620,32 @@ virExec(virCommandPtr cmd) goto fork_error; } +# if defined(WITH_SECDRIVER_SELINUX) + if (cmd->seLinuxLabel) { + VIR_DEBUG("Setting child security label to %s", cmd->seLinuxLabel); + if (setexeccon_raw(cmd->seLinuxLabel) == -1) { + virReportSystemError(errno, + _("unable to set SELinux security context " + "'%s' for '%s'"), + cmd->seLinuxLabel, cmd->args[0]); + if (security_getenforce() == 1) + goto fork_error; + } + } +# endif +# if defined(WITH_SECDRIVER_APPARMOR) + if (cmd->appArmorProfile) { + VIR_DEBUG("Setting child AppArmor profile to %s", cmd->appArmorProfile); + if (aa_change_profile(cmd->appArmorProfile) < 0) { + virReportSystemError(errno, + _("unable to set AppArmor profile '%s' " + "for '%s'"), + cmd->appArmorProfile, cmd->args[0]); + goto fork_error; + } + } +# endif + if (cmd->uid != (uid_t)-1 || cmd->gid != (gid_t)-1) { VIR_DEBUG("Setting child uid:gid to %d:%d", (int)cmd->uid, (int)cmd->gid); @@ -967,6 +1006,56 @@ virCommandAllowCap(virCommandPtr cmd, } +/** + * virCommandSetSELinuxLabel: + * @cmd: the command to modify + * @label: the SELinux label to use for the child process + * + * Saves a copy of @label to use when setting the SELinux context + * label (with setexeccon_raw()) after the child process has been + * started. If SELinux isn't compiled into libvirt, or if label is + * NULL, nothing will be done. + */ +void +virCommandSetSELinuxLabel(virCommandPtr cmd, + const char *label ATTRIBUTE_UNUSED) +{ + if (!cmd || cmd->has_error) + return; + +#if defined(WITH_SECDRIVER_SELINUX) + VIR_FREE(cmd->seLinuxLabel); + if (label && !(cmd->seLinuxLabel = strdup(label))) + cmd->has_error = ENOMEM; +#endif + return; +} + + +/** + * virCommandSetAppArmorProfile: + * @cmd: the command to modify + * @profile: the AppArmor profile to use + * + * Saves a copy of @profile to use when aa_change_profile() after the + * child process has been started. If AppArmor support isn't + * configured into libvirt, or if profile is NULL, nothing will be done. + */ +void +virCommandSetAppArmorProfile(virCommandPtr cmd, + const char *profile ATTRIBUTE_UNUSED) +{ + if (!cmd || cmd->has_error) + return; + +#if defined(WITH_SECDRIVER_APPARMOR) + VIR_FREE(cmd->appArmorProfile); + if (profile && !(cmd->appArmorProfile = strdup(profile))) + cmd->has_error = ENOMEM; +#endif + return; +} + /** * virCommandDaemonize: @@ -2715,6 +2804,12 @@ virCommandFree(virCommandPtr cmd) VIR_FREE(cmd->transfer); VIR_FREE(cmd->preserve); +#if defined(WITH_SECDRIVER_SELINUX) + VIR_FREE(cmd->seLinuxLabel); +#endif +#if defined(WITH_SECDRIVER_APPARMOR) + VIR_FREE(cmd->appArmorProfile); +#endif VIR_FREE(cmd); } diff --git a/src/util/vircommand.h b/src/util/vircommand.h index a4022fa..6c13795 100644 --- a/src/util/vircommand.h +++ b/src/util/vircommand.h @@ -70,6 +70,12 @@ void virCommandClearCaps(virCommandPtr cmd); void virCommandAllowCap(virCommandPtr cmd, int capability); +void virCommandSetSELinuxLabel(virCommandPtr cmd, + const char *label); + +void virCommandSetAppArmorProfile(virCommandPtr cmd, + const char *profile); + void virCommandDaemonize(virCommandPtr cmd); void virCommandNonblockingFDs(virCommandPtr cmd); -- 1.8.1

On 02/12/2013 01:15 PM, Laine Stump wrote:
virCommand gets two new APIs: virCommandSetSELinuxLabel() and virCommandSetAppArmorProfile(), which both save a copy of a null-terminated string in the virCommand. During virCommandRun, if the string is non-NULL and we've been compiled with AppArmor and/or SELinux security driver support, the appropriate security library function is called for the child process, using the string that was previously set. In the case of SELinux, setexeccon_raw() is called, and for AppArmor, aa_change_profile() is called.
This functionality has been added so that users of virCommand can use the upcoming virSecurityManagerSetChildProcessLabel() prior to running a child process, rather than needing to setup a hook function to be called (and in turn call virSecurityManagerSetProcessLabel()) *during* the setup of the child process. --- Change from V1:
* V1 had a single API that did double duty for both SELinux and AppArmor (because I didn't realize both could be built in simultaneously). V1 treats each separately, with two different APIs.
ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

The existing virSecurityManagerSetProcessLabel() API is designed so that it must be called after forking the child process, but before exec'ing the child. Due to the way the virCommand API works, that means it needs to be put in a "hook" function that virCommand is told to call out to at that time. Setting the child process label is a basic enough need when executing any process that virCommand should have a method of doing that. But virCommand must be told what label to set, and only the security driver knows the answer to that question. The new virSecurityManagerSet*Child*ProcessLabel() API is the way to transfer the knowledge about what label to set from the security driver to the virCommand object. It is given a virCommandPtr, and each security driver calls the appropriate virCommand* API to tell virCommand what to do between fork and exec. 1) in the case of the DAC security driver, it calls virCommandSetUID/GID() to set a uid and gid that must be set for the child process. 2) for the SELinux security driver, it calls virCommandSetSELinuxLabel() to save a copy of the char* that will be sent to setexeccon_raw() *after forking the child process*. 3) for the AppArmor security drivers, it calls virCommandSetAppArmorProfile() to save a copy of the char* that will be sent to aa_change_profile() *after forking the child process*. With this new API in place, we will be able to remove virSecurityManagerSetProcessLabel() from any virCommand pre-exec hooks. (Unfortunately, the LXC driver uses clone() rather than virCommand, so it can't take advantage of this new security driver API, meaning that we need to keep around the older virSecurityManagerSetProcessLabel(), at least for now.) --- Change from V1: rebased, Copyright dates simplified. src/libvirt_private.syms | 1 + src/security/security_apparmor.c | 42 +++++++++++++++++++++++++++++++++++++++- src/security/security_dac.c | 24 ++++++++++++++++++++++- src/security/security_driver.h | 6 +++++- src/security/security_manager.c | 13 ++++++++++++- src/security/security_manager.h | 6 +++++- src/security/security_nop.c | 10 +++++++++- src/security/security_selinux.c | 32 ++++++++++++++++++++++++++++++ src/security/security_stack.c | 20 ++++++++++++++++++- 9 files changed, 147 insertions(+), 7 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 8241e6f..dcdcb67 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1068,6 +1068,7 @@ virSecurityManagerRestoreHostdevLabel; virSecurityManagerRestoreImageLabel; virSecurityManagerRestoreSavedStateLabel; virSecurityManagerSetAllLabel; +virSecurityManagerSetChildProcessLabel; virSecurityManagerSetDaemonSocketLabel; virSecurityManagerSetHostdevLabel; virSecurityManagerSetHugepages; diff --git a/src/security/security_apparmor.c b/src/security/security_apparmor.c index 532b21b..ddc1fe4 100644 --- a/src/security/security_apparmor.c +++ b/src/security/security_apparmor.c @@ -1,7 +1,7 @@ /* * AppArmor security driver for libvirt * - * Copyright (C) 2011 Red Hat, Inc. + * Copyright (C) 2011-2013 Red Hat, Inc. * Copyright (C) 2009-2010 Canonical Ltd. * * This library is free software; you can redistribute it and/or @@ -627,6 +627,45 @@ AppArmorSetSecurityProcessLabel(virSecurityManagerPtr mgr ATTRIBUTE_UNUSED, return rc; } +/* Called directly by API user prior to virCommandRun(). + * virCommandRun() will then call aa_change_profile() (if a + * cmd->appArmorProfile has been set) *after forking the child + * process*. + */ +static int +AppArmorSetSecurityChildProcessLabel(virSecurityManagerPtr mgr ATTRIBUTE_UNUSED, + virDomainDefPtr def, + virCommandPtr cmd) +{ + int rc = -1; + char *profile_name = NULL; + const virSecurityLabelDefPtr secdef = + virDomainDefGetSecurityLabelDef(def, SECURITY_APPARMOR_NAME); + + if (!secdef) + goto cleanup; + + if (STRNEQ(SECURITY_APPARMOR_NAME, secdef->model)) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("security label driver mismatch: " + "\'%s\' model configured for domain, but " + "hypervisor driver is \'%s\'."), + secdef->model, SECURITY_APPARMOR_NAME); + if (use_apparmor() > 0) + goto cleanup; + } + + if ((profile_name = get_profile_name(def)) == NULL) + goto cleanup; + + virCommandSetAppArmorProfile(cmd, profile_name); + rc = 0; + + cleanup: + VIR_FREE(profile_name); + return rc; +} + static int AppArmorSetSecurityDaemonSocketLabel(virSecurityManagerPtr mgr ATTRIBUTE_UNUSED, virDomainDefPtr vm ATTRIBUTE_UNUSED) @@ -927,6 +966,7 @@ virSecurityDriver virAppArmorSecurityDriver = { .domainGetSecurityProcessLabel = AppArmorGetSecurityProcessLabel, .domainSetSecurityProcessLabel = AppArmorSetSecurityProcessLabel, + .domainSetSecurityChildProcessLabel = AppArmorSetSecurityChildProcessLabel, .domainSetSecurityAllLabel = AppArmorSetSecurityAllLabel, .domainRestoreSecurityAllLabel = AppArmorRestoreSecurityAllLabel, diff --git a/src/security/security_dac.c b/src/security/security_dac.c index ae489e2..b115bb0 100644 --- a/src/security/security_dac.c +++ b/src/security/security_dac.c @@ -1,5 +1,5 @@ /* - * Copyright (C) 2010-2012 Red Hat, Inc. + * Copyright (C) 2010-2013 Red Hat, Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -878,6 +878,27 @@ virSecurityDACSetProcessLabel(virSecurityManagerPtr mgr, static int +virSecurityDACSetChildProcessLabel(virSecurityManagerPtr mgr, + virDomainDefPtr def ATTRIBUTE_UNUSED, + virCommandPtr cmd) +{ + uid_t user; + gid_t group; + virSecurityDACDataPtr priv = virSecurityManagerGetPrivateData(mgr); + + if (virSecurityDACGetIds(def, priv, &user, &group)) + return -1; + + VIR_DEBUG("Setting child to drop privileges of DEF to %u:%u", + (unsigned int) user, (unsigned int) group); + + virCommandSetUID(cmd, user); + virCommandSetGID(cmd, group); + return 0; +} + + +static int virSecurityDACVerify(virSecurityManagerPtr mgr ATTRIBUTE_UNUSED, virDomainDefPtr def ATTRIBUTE_UNUSED) { @@ -1072,6 +1093,7 @@ virSecurityDriver virSecurityDriverDAC = { .domainGetSecurityProcessLabel = virSecurityDACGetProcessLabel, .domainSetSecurityProcessLabel = virSecurityDACSetProcessLabel, + .domainSetSecurityChildProcessLabel = virSecurityDACSetChildProcessLabel, .domainSetSecurityAllLabel = virSecurityDACSetSecurityAllLabel, .domainRestoreSecurityAllLabel = virSecurityDACRestoreSecurityAllLabel, diff --git a/src/security/security_driver.h b/src/security/security_driver.h index 6b775ab..cc401e1 100644 --- a/src/security/security_driver.h +++ b/src/security/security_driver.h @@ -1,5 +1,5 @@ /* - * Copyright (C) 2008, 2010 Red Hat, Inc. + * Copyright (C) 2008, 2010-2013 Red Hat, Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -92,6 +92,9 @@ typedef int (*virSecurityDomainGetProcessLabel) (virSecurityManagerPtr mgr, virSecurityLabelPtr sec); typedef int (*virSecurityDomainSetProcessLabel) (virSecurityManagerPtr mgr, virDomainDefPtr def); +typedef int (*virSecurityDomainSetChildProcessLabel) (virSecurityManagerPtr mgr, + virDomainDefPtr def, + virCommandPtr cmd); typedef int (*virSecurityDomainSecurityVerify) (virSecurityManagerPtr mgr, virDomainDefPtr def); typedef int (*virSecurityDomainSetImageFDLabel) (virSecurityManagerPtr mgr, @@ -131,6 +134,7 @@ struct _virSecurityDriver { virSecurityDomainGetProcessLabel domainGetSecurityProcessLabel; virSecurityDomainSetProcessLabel domainSetSecurityProcessLabel; + virSecurityDomainSetChildProcessLabel domainSetSecurityChildProcessLabel; virSecurityDomainSetAllLabel domainSetSecurityAllLabel; virSecurityDomainRestoreAllLabel domainRestoreSecurityAllLabel; diff --git a/src/security/security_manager.c b/src/security/security_manager.c index 50962ba..c621366 100644 --- a/src/security/security_manager.c +++ b/src/security/security_manager.c @@ -1,7 +1,7 @@ /* * security_manager.c: Internal security manager API * - * Copyright (C) 2010-2011 Red Hat, Inc. + * Copyright (C) 2010-2013 Red Hat, Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -571,6 +571,17 @@ int virSecurityManagerSetProcessLabel(virSecurityManagerPtr mgr, return -1; } +int virSecurityManagerSetChildProcessLabel(virSecurityManagerPtr mgr, + virDomainDefPtr vm, + virCommandPtr cmd) +{ + if (mgr->drv->domainSetSecurityChildProcessLabel) + return mgr->drv->domainSetSecurityChildProcessLabel(mgr, vm, cmd); + + virReportError(VIR_ERR_NO_SUPPORT, __FUNCTION__); + return -1; +} + int virSecurityManagerVerify(virSecurityManagerPtr mgr, virDomainDefPtr def) { diff --git a/src/security/security_manager.h b/src/security/security_manager.h index 8e8accf..711b354 100644 --- a/src/security/security_manager.h +++ b/src/security/security_manager.h @@ -1,7 +1,7 @@ /* * security_manager.h: Internal security manager API * - * Copyright (C) 2010-2011 Red Hat, Inc. + * Copyright (C) 2010-2013 Red Hat, Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -24,6 +24,7 @@ # define VIR_SECURITY_MANAGER_H__ # include "domain_conf.h" +# include "vircommand.h" typedef struct _virSecurityManager virSecurityManager; typedef virSecurityManager *virSecurityManagerPtr; @@ -103,6 +104,9 @@ int virSecurityManagerGetProcessLabel(virSecurityManagerPtr mgr, virSecurityLabelPtr sec); int virSecurityManagerSetProcessLabel(virSecurityManagerPtr mgr, virDomainDefPtr def); +int virSecurityManagerSetChildProcessLabel(virSecurityManagerPtr mgr, + virDomainDefPtr def, + virCommandPtr cmd); int virSecurityManagerVerify(virSecurityManagerPtr mgr, virDomainDefPtr def); int virSecurityManagerSetImageFDLabel(virSecurityManagerPtr mgr, diff --git a/src/security/security_nop.c b/src/security/security_nop.c index b6eb3f6..2b9767e 100644 --- a/src/security/security_nop.c +++ b/src/security/security_nop.c @@ -1,5 +1,5 @@ /* - * Copyright (C) 2010-2011 Red Hat, Inc. + * Copyright (C) 2010-2013 Red Hat, Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -157,6 +157,13 @@ static int virSecurityDomainSetProcessLabelNop(virSecurityManagerPtr mgr ATTRIBU return 0; } +static int virSecurityDomainSetChildProcessLabelNop(virSecurityManagerPtr mgr ATTRIBUTE_UNUSED, + virDomainDefPtr vm ATTRIBUTE_UNUSED, + virCommandPtr cmd ATTRIBUTE_UNUSED) +{ + return 0; +} + static int virSecurityDomainVerifyNop(virSecurityManagerPtr mgr ATTRIBUTE_UNUSED, virDomainDefPtr def ATTRIBUTE_UNUSED) { @@ -207,6 +214,7 @@ virSecurityDriver virSecurityDriverNop = { .domainGetSecurityProcessLabel = virSecurityDomainGetProcessLabelNop, .domainSetSecurityProcessLabel = virSecurityDomainSetProcessLabelNop, + .domainSetSecurityChildProcessLabel = virSecurityDomainSetChildProcessLabelNop, .domainSetSecurityAllLabel = virSecurityDomainSetAllLabelNop, .domainRestoreSecurityAllLabel = virSecurityDomainRestoreAllLabelNop, diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c index 8173b20..a61e0f0 100644 --- a/src/security/security_selinux.c +++ b/src/security/security_selinux.c @@ -1859,6 +1859,37 @@ virSecuritySELinuxSetSecurityProcessLabel(virSecurityManagerPtr mgr ATTRIBUTE_UN } static int +virSecuritySELinuxSetSecurityChildProcessLabel(virSecurityManagerPtr mgr ATTRIBUTE_UNUSED, + virDomainDefPtr def, + virCommandPtr cmd) +{ + /* TODO: verify DOI */ + virSecurityLabelDefPtr secdef; + + secdef = virDomainDefGetSecurityLabelDef(def, SECURITY_SELINUX_NAME); + if (secdef == NULL) + return -1; + + if (secdef->label == NULL) + return 0; + + VIR_DEBUG("label=%s", secdef->label); + if (!STREQ(SECURITY_SELINUX_NAME, secdef->model)) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("security label driver mismatch: " + "'%s' model configured for domain, but " + "hypervisor driver is '%s'."), + secdef->model, SECURITY_SELINUX_NAME); + if (security_getenforce() == 1) + return -1; + } + + /* save in cmd to be set after fork/before child process is exec'ed */ + virCommandSetSELinuxLabel(cmd, secdef->label); + return 0; +} + +static int virSecuritySELinuxSetSecurityDaemonSocketLabel(virSecurityManagerPtr mgr ATTRIBUTE_UNUSED, virDomainDefPtr def) { @@ -2261,6 +2292,7 @@ virSecurityDriver virSecurityDriverSELinux = { .domainGetSecurityProcessLabel = virSecuritySELinuxGetSecurityProcessLabel, .domainSetSecurityProcessLabel = virSecuritySELinuxSetSecurityProcessLabel, + .domainSetSecurityChildProcessLabel = virSecuritySELinuxSetSecurityChildProcessLabel, .domainSetSecurityAllLabel = virSecuritySELinuxSetSecurityAllLabel, .domainRestoreSecurityAllLabel = virSecuritySELinuxRestoreSecurityAllLabel, diff --git a/src/security/security_stack.c b/src/security/security_stack.c index e2d0b1d..14d757d 100644 --- a/src/security/security_stack.c +++ b/src/security/security_stack.c @@ -1,5 +1,5 @@ /* - * Copyright (C) 2010-2011 Red Hat, Inc. + * Copyright (C) 2010-2013 Red Hat, Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -367,6 +367,23 @@ virSecurityStackSetProcessLabel(virSecurityManagerPtr mgr, } static int +virSecurityStackSetChildProcessLabel(virSecurityManagerPtr mgr, + virDomainDefPtr vm, + virCommandPtr cmd) +{ + virSecurityStackDataPtr priv = virSecurityManagerGetPrivateData(mgr); + virSecurityStackItemPtr item = priv->itemsHead; + int rc = 0; + + for (; item; item = item->next) { + if (virSecurityManagerSetChildProcessLabel(item->securityManager, vm, cmd) < 0) + rc = -1; + } + + return rc; +} + +static int virSecurityStackGetProcessLabel(virSecurityManagerPtr mgr, virDomainDefPtr vm, pid_t pid, @@ -540,6 +557,7 @@ virSecurityDriver virSecurityDriverStack = { .domainGetSecurityProcessLabel = virSecurityStackGetProcessLabel, .domainSetSecurityProcessLabel = virSecurityStackSetProcessLabel, + .domainSetSecurityChildProcessLabel = virSecurityStackSetChildProcessLabel, .domainSetSecurityAllLabel = virSecurityStackSetSecurityAllLabel, .domainRestoreSecurityAllLabel = virSecurityStackRestoreSecurityAllLabel, -- 1.8.1

On 02/12/2013 01:15 PM, Laine Stump wrote:
The existing virSecurityManagerSetProcessLabel() API is designed so that it must be called after forking the child process, but before exec'ing the child. Due to the way the virCommand API works, that means it needs to be put in a "hook" function that virCommand is told to call out to at that time.
With this new API in place, we will be able to remove virSecurityManagerSetProcessLabel() from any virCommand pre-exec hooks.
(Unfortunately, the LXC driver uses clone() rather than virCommand, so it can't take advantage of this new security driver API, meaning that we need to keep around the older virSecurityManagerSetProcessLabel(), at least for now.) --- Change from V1: rebased, Copyright dates simplified.
ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

The qemu driver had been calling virSecurityManagerSetProcessLabel() from a "pre-exec hook" function that is run after the child is forked, but before exec'ing qemu. This is problematic because the uid and gid of the child are set by the security driver, but capabilities are dropped by virCommand - such separation doesn't work; the two operations must be done together or the capabilities do not transfer properly to the child process. This patch switches to using virSecurityManagerSetChildProcessLabel(), which is called prior to virCommandRun() (rather than being called *during* virCommandrun() by the hook function), and doesn't set the UID/GID/security label directly, but instead merely informs virCommand what it should set them all to when the time is appropriate. This lets virCommand choose to do the uid/gid and caps dropping all at the same time if it wants (it does *want* to, but isn't doing so yet; that's for an upcoming patch). --- Change from V1: rebased. src/qemu/qemu_process.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 12e3544..6c70246 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -1,7 +1,7 @@ /* * qemu_process.h: QEMU process management * - * Copyright (C) 2006-2012 Red Hat, Inc. + * Copyright (C) 2006-2013 Red Hat, Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -2829,10 +2829,6 @@ static int qemuProcessHook(void *data) if (qemuProcessInitNumaMemoryPolicy(h->vm, h->nodemask) < 0) goto cleanup; - VIR_DEBUG("Setting up security labelling"); - if (virSecurityManagerSetProcessLabel(h->driver->securityManager, h->vm->def) < 0) - goto cleanup; - ret = 0; cleanup: @@ -3972,6 +3968,12 @@ int qemuProcessStart(virConnectPtr conn, virCommandSetPreExecHook(cmd, qemuProcessHook, &hookData); + VIR_DEBUG("Setting up security labelling"); + if (virSecurityManagerSetChildProcessLabel(driver->securityManager, + vm->def, cmd) < 0) { + goto cleanup; + } + virCommandSetOutputFD(cmd, &logfile); virCommandSetErrorFD(cmd, &logfile); virCommandNonblockingFDs(cmd); -- 1.8.1

This is an interim measure to make sure everything still works in this order. The next step will be to perform capabilities drop and setuid/gid as a single operation (which is the only way to keep any capabilities when switching to a non-root uid). --- Change from V1: rebased. src/util/vircommand.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/src/util/vircommand.c b/src/util/vircommand.c index 9727809..4b1fc8d 100644 --- a/src/util/vircommand.c +++ b/src/util/vircommand.c @@ -653,6 +653,12 @@ virExec(virCommandPtr cmd) goto fork_error; } + /* The steps above may need todo something privileged, so + * we delay clearing capabilities until the last minute */ + if (cmd->capabilities || (cmd->flags & VIR_EXEC_CLEAR_CAPS)) + if (virSetCapabilities(cmd->capabilities) < 0) + goto fork_error; + if (cmd->pwd) { VIR_DEBUG("Running child in %s", cmd->pwd); if (chdir(cmd->pwd) < 0) { @@ -671,12 +677,6 @@ virExec(virCommandPtr cmd) goto fork_error; } - /* The steps above may need todo something privileged, so - * we delay clearing capabilities until the last minute */ - if (cmd->capabilities || (cmd->flags & VIR_EXEC_CLEAR_CAPS)) - if (virSetCapabilities(cmd->capabilities) < 0) - goto fork_error; - /* Close logging again to ensure no FDs leak to child */ virLogReset(); -- 1.8.1

Normally when a process' uid is changed to non-0, all the capabilities bits are cleared, even those explicitly set with calls to capng_update()/capng_apply() made immediately before setuid. And *after* the process' uid has been changed, it no longer has the necessary privileges to add capabilities back to the process. In order to set a non-0 uid while still maintaining any capabilities bits, it is necessary to either call capng_change_id() (which unfortunately doesn't currently call initgroups to setup auxiliary group membership), or to perform the small amount of calisthenics contained in the new utility function virSetUIDGIDWithCaps(). Another very important difference between the capabilities setting/clearing in virSetUIDGIDWithCaps() and virCommand's virSetCapabilities() (which it will replace in the next patch) is that the new function properly clears the capabilities bounding set, so it will not be possible for a child process to set any new capabilities. A short description of what is done by virSetUIDGIDWithCaps(): 1) clear all capabilities then set all those desired by the caller (in capBits) plus CAP_SETGID, CAP_SETUID, and CAP_SETPCAP (which is needed to change the capabilities bounding set). 2) call prctl(), telling it that we want to maintain current capabilities across an upcoming setuid(). 3) switch to the new uid/gid 4) again call prctl(), telling it we will no longer want capabilities maintained if this process does another setuid(). 5) clear the capabilities that we added to allow us to setuid/setgid/change the bounding set (unless they were also requested by the caller via the virCommand API). Because the modification/maintaining of capabilities is intermingled with setting the uid, this is necessarily done in a single function, rather than having two independent functions. Note that, due to the way that effective capabilities are computed (at time of execve) for a process that has uid != 0, the *file* capabilities of the binary being executed must also have the desired capabilities bit(s) set (see "man 7 capabilities"). This can be done with the "filecap" command. (e.g. "filecap /usr/bin/qemu-kvm sys_rawio"). --- Change from V1: * properly cast when comparing gid/uid, and only short circuit for -1 (not 0) * fix // style comments * add ATTRIBUTE_UNUSED where appropriate for capBits argument. src/libvirt_private.syms | 1 + src/util/virutil.c | 111 +++++++++++++++++++++++++++++++++++++++++++++++ src/util/virutil.h | 1 + 3 files changed, 113 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index dcdcb67..d8d5877 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1312,6 +1312,7 @@ virSetDeviceUnprivSGIO; virSetInherit; virSetNonBlock; virSetUIDGID; +virSetUIDGIDWithCaps; virSkipSpaces; virSkipSpacesAndBackslash; virSkipSpacesBackwards; diff --git a/src/util/virutil.c b/src/util/virutil.c index 0d7db00..28fcc2f 100644 --- a/src/util/virutil.c +++ b/src/util/virutil.c @@ -60,6 +60,7 @@ #endif #if WITH_CAPNG # include <cap-ng.h> +# include <sys/prctl.h> #endif #if defined HAVE_MNTENT_H && defined HAVE_GETMNTENT_R # include <mntent.h> @@ -2990,6 +2991,116 @@ virGetGroupName(gid_t gid ATTRIBUTE_UNUSED) } #endif /* HAVE_GETPWUID_R */ +#if WITH_CAPNG +/* Set the real and effective uid and gid to the given values, while + * maintaining the capabilities indicated by bits in @capBits. return + * 0 on success, -1 on failure (the original system error remains in + * errno). + */ +int +virSetUIDGIDWithCaps(uid_t uid, gid_t gid, unsigned long long capBits) +{ + int ii, capng_ret, ret = -1; + bool need_setgid = false, need_setuid = false; + bool need_prctl = false, need_setpcap = false; + + /* First drop all caps except those in capBits + the extra ones we + * need to change uid/gid and change the capabilities bounding + * set. + */ + + capng_clear(CAPNG_SELECT_BOTH); + + for (ii = 0; ii <= CAP_LAST_CAP; ii++) { + if (capBits & (1ULL << ii)) { + capng_update(CAPNG_ADD, + CAPNG_EFFECTIVE|CAPNG_INHERITABLE| + CAPNG_PERMITTED|CAPNG_BOUNDING_SET, + ii); + } + } + + if (gid != (gid_t)-1 && + !capng_have_capability(CAPNG_EFFECTIVE, CAP_SETGID)) { + need_setgid = true; + capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETGID); + } + if (uid != (uid_t)-1 && + !capng_have_capability(CAPNG_EFFECTIVE, CAP_SETUID)) { + need_setuid = true; + capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETUID); + } +# ifdef PR_CAPBSET_DROP + /* If newer kernel, we need also need setpcap to change the bounding set */ + if ((capBits || need_setgid || need_setuid) && + !capng_have_capability(CAPNG_EFFECTIVE, CAP_SETPCAP)) { + need_setpcap = true; + } + if (need_setpcap) + capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETPCAP); +# endif + + need_prctl = capBits || need_setgid || need_setuid || need_setpcap; + + /* Tell system we want to keep caps across uid change */ + if (need_prctl && prctl(PR_SET_KEEPCAPS, 1, 0, 0, 0)) { + virReportSystemError(errno, "%s", + _("prctl failed to set KEEPCAPS")); + goto cleanup; + } + + /* Change to the temp capabilities */ + if ((capng_ret = capng_apply(CAPNG_SELECT_BOTH)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("cannot apply process capabilities %d"), capng_ret); + goto cleanup; + } + + if (virSetUIDGID(uid, gid) < 0) + goto cleanup; + + /* Tell it we are done keeping capabilities */ + if (need_prctl && prctl(PR_SET_KEEPCAPS, 0, 0, 0, 0)) { + virReportSystemError(errno, "%s", + _("prctl failed to reset KEEPCAPS")); + goto cleanup; + } + + /* Drop the caps that allow setuid/gid (unless they were requested) */ + if (need_setgid) + capng_update(CAPNG_DROP, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETGID); + if (need_setuid) + capng_update(CAPNG_DROP, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETUID); + /* Throw away CAP_SETPCAP so no more changes */ + if (need_setpcap) + capng_update(CAPNG_DROP, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETPCAP); + + if (need_prctl && ((capng_ret = capng_apply(CAPNG_SELECT_BOTH)) < 0)) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("cannot apply process capabilities %d"), capng_ret); + ret = -1; + goto cleanup; + } + + ret = 0; +cleanup: + return ret; +} + +#else +/* + * On platforms without libcapng, the capabilities setting is treated + * as a NOP. + */ + +int +virSetUIDGIDWithCaps(uid_t uid, gid_t gid, + unsigned long long capBits ATTRIBUTE_UNUSED) +{ + return virSetUIDGID(uid, gid); +} +#endif + #if defined HAVE_MNTENT_H && defined HAVE_GETMNTENT_R /* search /proc/mounts for mount point of *type; return pointer to diff --git a/src/util/virutil.h b/src/util/virutil.h index 4201aa1..2dc3403 100644 --- a/src/util/virutil.h +++ b/src/util/virutil.h @@ -54,6 +54,7 @@ int virPipeReadUntilEOF(int outfd, int errfd, char **outbuf, char **errbuf); int virSetUIDGID(uid_t uid, gid_t gid); +int virSetUIDGIDWithCaps(uid_t uid, gid_t gid, unsigned long long capBits); int virFileReadLimFD(int fd, int maxlen, char **buf) ATTRIBUTE_RETURN_CHECK; -- 1.8.1

On 02/12/2013 01:15 PM, Laine Stump wrote:
Normally when a process' uid is changed to non-0, all the capabilities bits are cleared, even those explicitly set with calls to capng_update()/capng_apply() made immediately before setuid. And *after* the process' uid has been changed, it no longer has the necessary privileges to add capabilities back to the process.
Because the modification/maintaining of capabilities is intermingled with setting the uid, this is necessarily done in a single function, rather than having two independent functions.
Note that, due to the way that effective capabilities are computed (at time of execve) for a process that has uid != 0, the *file* capabilities of the binary being executed must also have the desired capabilities bit(s) set (see "man 7 capabilities"). This can be done with the "filecap" command. (e.g. "filecap /usr/bin/qemu-kvm sys_rawio"). --- Change from V1: * properly cast when comparing gid/uid, and only short circuit for -1 (not 0) * fix // style comments * add ATTRIBUTE_UNUSED where appropriate for capBits argument.
ACK with nits fixed:
@@ -2990,6 +2991,116 @@ virGetGroupName(gid_t gid ATTRIBUTE_UNUSED) } #endif /* HAVE_GETPWUID_R */
+#if WITH_CAPNG +/* Set the real and effective uid and gid to the given values, while + * maintaining the capabilities indicated by bits in @capBits. return
s/return/Return/ -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Tue, Feb 12, 2013 at 2:15 PM, Laine Stump <laine@laine.org> wrote:
Normally when a process' uid is changed to non-0, all the capabilities bits are cleared, even those explicitly set with calls to capng_update()/capng_apply() made immediately before setuid. And *after* the process' uid has been changed, it no longer has the necessary privileges to add capabilities back to the process.
In order to set a non-0 uid while still maintaining any capabilities bits, it is necessary to either call capng_change_id() (which unfortunately doesn't currently call initgroups to setup auxiliary group membership), or to perform the small amount of calisthenics contained in the new utility function virSetUIDGIDWithCaps().
Another very important difference between the capabilities setting/clearing in virSetUIDGIDWithCaps() and virCommand's virSetCapabilities() (which it will replace in the next patch) is that the new function properly clears the capabilities bounding set, so it will not be possible for a child process to set any new capabilities.
A short description of what is done by virSetUIDGIDWithCaps():
1) clear all capabilities then set all those desired by the caller (in capBits) plus CAP_SETGID, CAP_SETUID, and CAP_SETPCAP (which is needed to change the capabilities bounding set).
2) call prctl(), telling it that we want to maintain current capabilities across an upcoming setuid().
3) switch to the new uid/gid
4) again call prctl(), telling it we will no longer want capabilities maintained if this process does another setuid().
5) clear the capabilities that we added to allow us to setuid/setgid/change the bounding set (unless they were also requested by the caller via the virCommand API).
Because the modification/maintaining of capabilities is intermingled with setting the uid, this is necessarily done in a single function, rather than having two independent functions.
Note that, due to the way that effective capabilities are computed (at time of execve) for a process that has uid != 0, the *file* capabilities of the binary being executed must also have the desired capabilities bit(s) set (see "man 7 capabilities"). This can be done with the "filecap" command. (e.g. "filecap /usr/bin/qemu-kvm sys_rawio"). --- Change from V1: * properly cast when comparing gid/uid, and only short circuit for -1 (not 0) * fix // style comments * add ATTRIBUTE_UNUSED where appropriate for capBits argument.
src/libvirt_private.syms | 1 + src/util/virutil.c | 111 +++++++++++++++++++++++++++++++++++++++++++++++ src/util/virutil.h | 1 + 3 files changed, 113 insertions(+)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index dcdcb67..d8d5877 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1312,6 +1312,7 @@ virSetDeviceUnprivSGIO; virSetInherit; virSetNonBlock; virSetUIDGID; +virSetUIDGIDWithCaps; virSkipSpaces; virSkipSpacesAndBackslash; virSkipSpacesBackwards; diff --git a/src/util/virutil.c b/src/util/virutil.c index 0d7db00..28fcc2f 100644 --- a/src/util/virutil.c +++ b/src/util/virutil.c @@ -60,6 +60,7 @@ #endif #if WITH_CAPNG # include <cap-ng.h> +# include <sys/prctl.h> #endif #if defined HAVE_MNTENT_H && defined HAVE_GETMNTENT_R # include <mntent.h> @@ -2990,6 +2991,116 @@ virGetGroupName(gid_t gid ATTRIBUTE_UNUSED) } #endif /* HAVE_GETPWUID_R */
+#if WITH_CAPNG +/* Set the real and effective uid and gid to the given values, while + * maintaining the capabilities indicated by bits in @capBits. return + * 0 on success, -1 on failure (the original system error remains in + * errno). + */ +int +virSetUIDGIDWithCaps(uid_t uid, gid_t gid, unsigned long long capBits) +{ + int ii, capng_ret, ret = -1; + bool need_setgid = false, need_setuid = false; + bool need_prctl = false, need_setpcap = false; + + /* First drop all caps except those in capBits + the extra ones we + * need to change uid/gid and change the capabilities bounding + * set. + */ + + capng_clear(CAPNG_SELECT_BOTH); + + for (ii = 0; ii <= CAP_LAST_CAP; ii++) { + if (capBits & (1ULL << ii)) { + capng_update(CAPNG_ADD, + CAPNG_EFFECTIVE|CAPNG_INHERITABLE| + CAPNG_PERMITTED|CAPNG_BOUNDING_SET, + ii); + } + } + + if (gid != (gid_t)-1 && + !capng_have_capability(CAPNG_EFFECTIVE, CAP_SETGID)) { + need_setgid = true; + capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETGID); + } + if (uid != (uid_t)-1 && + !capng_have_capability(CAPNG_EFFECTIVE, CAP_SETUID)) { + need_setuid = true; + capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETUID); + } +# ifdef PR_CAPBSET_DROP + /* If newer kernel, we need also need setpcap to change the bounding set */ + if ((capBits || need_setgid || need_setuid) && + !capng_have_capability(CAPNG_EFFECTIVE, CAP_SETPCAP)) { + need_setpcap = true; + } + if (need_setpcap) + capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETPCAP); +# endif + + need_prctl = capBits || need_setgid || need_setuid || need_setpcap; + + /* Tell system we want to keep caps across uid change */ + if (need_prctl && prctl(PR_SET_KEEPCAPS, 1, 0, 0, 0)) { + virReportSystemError(errno, "%s", + _("prctl failed to set KEEPCAPS")); + goto cleanup; + } + + /* Change to the temp capabilities */ + if ((capng_ret = capng_apply(CAPNG_SELECT_BOTH)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("cannot apply process capabilities %d"), capng_ret); + goto cleanup; + } + + if (virSetUIDGID(uid, gid) < 0) + goto cleanup; + + /* Tell it we are done keeping capabilities */ + if (need_prctl && prctl(PR_SET_KEEPCAPS, 0, 0, 0, 0)) { + virReportSystemError(errno, "%s", + _("prctl failed to reset KEEPCAPS")); + goto cleanup; + } + + /* Drop the caps that allow setuid/gid (unless they were requested) */ + if (need_setgid) + capng_update(CAPNG_DROP, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETGID); + if (need_setuid) + capng_update(CAPNG_DROP, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETUID); + /* Throw away CAP_SETPCAP so no more changes */ + if (need_setpcap) + capng_update(CAPNG_DROP, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETPCAP); + + if (need_prctl && ((capng_ret = capng_apply(CAPNG_SELECT_BOTH)) < 0)) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("cannot apply process capabilities %d"), capng_ret); + ret = -1; + goto cleanup; + } + + ret = 0; +cleanup: + return ret; +} + +#else +/* + * On platforms without libcapng, the capabilities setting is treated + * as a NOP. + */ + +int +virSetUIDGIDWithCaps(uid_t uid, gid_t gid, + unsigned long long capBits ATTRIBUTE_UNUSED) +{ + return virSetUIDGID(uid, gid); +} +#endif +
#if defined HAVE_MNTENT_H && defined HAVE_GETMNTENT_R /* search /proc/mounts for mount point of *type; return pointer to diff --git a/src/util/virutil.h b/src/util/virutil.h index 4201aa1..2dc3403 100644 --- a/src/util/virutil.h +++ b/src/util/virutil.h @@ -54,6 +54,7 @@ int virPipeReadUntilEOF(int outfd, int errfd, char **outbuf, char **errbuf);
int virSetUIDGID(uid_t uid, gid_t gid); +int virSetUIDGIDWithCaps(uid_t uid, gid_t gid, unsigned long long capBits);
int virFileReadLimFD(int fd, int maxlen, char **buf) ATTRIBUTE_RETURN_CHECK;
-- 1.8.1
The following error bisect's down to this commit when running out of my local checkout for testing. 2013-02-16 05:16:55.102+0000: 29992: error : virCommandWait:2270 : internal error Child process (LC_ALL=C LD_LIBRARY_PATH=/home/cardoe/work/libvirt/src/.libs PATH=/usr/local/bin:/usr/bin:/bin:/opt/bin:/usr/x86_64-pc-linux-gnu/gcc-bin/4.6.3:/usr/games/bin HOME=/home/cardoe USER=cardoe LOGNAME=cardoe /usr/bin/qemu-kvm -help) unexpected exit status 1: libvir: error : internal error cannot apply process capabilities -1 -- Doug Goldstein

On 02/16/2013 12:20 AM, Doug Goldstein wrote:
On Tue, Feb 12, 2013 at 2:15 PM, Laine Stump <laine@laine.org> wrote:
Normally when a process' uid is changed to non-0, all the capabilities bits are cleared, even those explicitly set with calls to capng_update()/capng_apply() made immediately before setuid. And *after* the process' uid has been changed, it no longer has the necessary privileges to add capabilities back to the process.
In order to set a non-0 uid while still maintaining any capabilities bits, it is necessary to either call capng_change_id() (which unfortunately doesn't currently call initgroups to setup auxiliary group membership), or to perform the small amount of calisthenics contained in the new utility function virSetUIDGIDWithCaps().
Another very important difference between the capabilities setting/clearing in virSetUIDGIDWithCaps() and virCommand's virSetCapabilities() (which it will replace in the next patch) is that the new function properly clears the capabilities bounding set, so it will not be possible for a child process to set any new capabilities.
A short description of what is done by virSetUIDGIDWithCaps():
1) clear all capabilities then set all those desired by the caller (in capBits) plus CAP_SETGID, CAP_SETUID, and CAP_SETPCAP (which is needed to change the capabilities bounding set).
2) call prctl(), telling it that we want to maintain current capabilities across an upcoming setuid().
3) switch to the new uid/gid
4) again call prctl(), telling it we will no longer want capabilities maintained if this process does another setuid().
5) clear the capabilities that we added to allow us to setuid/setgid/change the bounding set (unless they were also requested by the caller via the virCommand API).
Because the modification/maintaining of capabilities is intermingled with setting the uid, this is necessarily done in a single function, rather than having two independent functions.
Note that, due to the way that effective capabilities are computed (at time of execve) for a process that has uid != 0, the *file* capabilities of the binary being executed must also have the desired capabilities bit(s) set (see "man 7 capabilities"). This can be done with the "filecap" command. (e.g. "filecap /usr/bin/qemu-kvm sys_rawio"). --- Change from V1: * properly cast when comparing gid/uid, and only short circuit for -1 (not 0) * fix // style comments * add ATTRIBUTE_UNUSED where appropriate for capBits argument.
src/libvirt_private.syms | 1 + src/util/virutil.c | 111 +++++++++++++++++++++++++++++++++++++++++++++++ src/util/virutil.h | 1 + 3 files changed, 113 insertions(+)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index dcdcb67..d8d5877 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1312,6 +1312,7 @@ virSetDeviceUnprivSGIO; virSetInherit; virSetNonBlock; virSetUIDGID; +virSetUIDGIDWithCaps; virSkipSpaces; virSkipSpacesAndBackslash; virSkipSpacesBackwards; diff --git a/src/util/virutil.c b/src/util/virutil.c index 0d7db00..28fcc2f 100644 --- a/src/util/virutil.c +++ b/src/util/virutil.c @@ -60,6 +60,7 @@ #endif #if WITH_CAPNG # include <cap-ng.h> +# include <sys/prctl.h> #endif #if defined HAVE_MNTENT_H && defined HAVE_GETMNTENT_R # include <mntent.h> @@ -2990,6 +2991,116 @@ virGetGroupName(gid_t gid ATTRIBUTE_UNUSED) } #endif /* HAVE_GETPWUID_R */
+#if WITH_CAPNG +/* Set the real and effective uid and gid to the given values, while + * maintaining the capabilities indicated by bits in @capBits. return + * 0 on success, -1 on failure (the original system error remains in + * errno). + */ +int +virSetUIDGIDWithCaps(uid_t uid, gid_t gid, unsigned long long capBits) +{ + int ii, capng_ret, ret = -1; + bool need_setgid = false, need_setuid = false; + bool need_prctl = false, need_setpcap = false; + + /* First drop all caps except those in capBits + the extra ones we + * need to change uid/gid and change the capabilities bounding + * set. + */ + + capng_clear(CAPNG_SELECT_BOTH); + + for (ii = 0; ii <= CAP_LAST_CAP; ii++) { + if (capBits & (1ULL << ii)) { + capng_update(CAPNG_ADD, + CAPNG_EFFECTIVE|CAPNG_INHERITABLE| + CAPNG_PERMITTED|CAPNG_BOUNDING_SET, + ii); + } + } + + if (gid != (gid_t)-1 && + !capng_have_capability(CAPNG_EFFECTIVE, CAP_SETGID)) { + need_setgid = true; + capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETGID); + } + if (uid != (uid_t)-1 && + !capng_have_capability(CAPNG_EFFECTIVE, CAP_SETUID)) { + need_setuid = true; + capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETUID); + } +# ifdef PR_CAPBSET_DROP + /* If newer kernel, we need also need setpcap to change the bounding set */ + if ((capBits || need_setgid || need_setuid) && + !capng_have_capability(CAPNG_EFFECTIVE, CAP_SETPCAP)) { + need_setpcap = true; + } + if (need_setpcap) + capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETPCAP); +# endif + + need_prctl = capBits || need_setgid || need_setuid || need_setpcap; + + /* Tell system we want to keep caps across uid change */ + if (need_prctl && prctl(PR_SET_KEEPCAPS, 1, 0, 0, 0)) { + virReportSystemError(errno, "%s", + _("prctl failed to set KEEPCAPS")); + goto cleanup; + } + + /* Change to the temp capabilities */ + if ((capng_ret = capng_apply(CAPNG_SELECT_BOTH)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("cannot apply process capabilities %d"), capng_ret); + goto cleanup; + } + + if (virSetUIDGID(uid, gid) < 0) + goto cleanup; + + /* Tell it we are done keeping capabilities */ + if (need_prctl && prctl(PR_SET_KEEPCAPS, 0, 0, 0, 0)) { + virReportSystemError(errno, "%s", + _("prctl failed to reset KEEPCAPS")); + goto cleanup; + } + + /* Drop the caps that allow setuid/gid (unless they were requested) */ + if (need_setgid) + capng_update(CAPNG_DROP, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETGID); + if (need_setuid) + capng_update(CAPNG_DROP, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETUID); + /* Throw away CAP_SETPCAP so no more changes */ + if (need_setpcap) + capng_update(CAPNG_DROP, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETPCAP); + + if (need_prctl && ((capng_ret = capng_apply(CAPNG_SELECT_BOTH)) < 0)) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("cannot apply process capabilities %d"), capng_ret); + ret = -1; + goto cleanup; + } + + ret = 0; +cleanup: + return ret; +} + +#else +/* + * On platforms without libcapng, the capabilities setting is treated + * as a NOP. + */ + +int +virSetUIDGIDWithCaps(uid_t uid, gid_t gid, + unsigned long long capBits ATTRIBUTE_UNUSED) +{ + return virSetUIDGID(uid, gid); +} +#endif +
#if defined HAVE_MNTENT_H && defined HAVE_GETMNTENT_R /* search /proc/mounts for mount point of *type; return pointer to diff --git a/src/util/virutil.h b/src/util/virutil.h index 4201aa1..2dc3403 100644 --- a/src/util/virutil.h +++ b/src/util/virutil.h @@ -54,6 +54,7 @@ int virPipeReadUntilEOF(int outfd, int errfd, char **outbuf, char **errbuf);
int virSetUIDGID(uid_t uid, gid_t gid); +int virSetUIDGIDWithCaps(uid_t uid, gid_t gid, unsigned long long capBits);
int virFileReadLimFD(int fd, int maxlen, char **buf) ATTRIBUTE_RETURN_CHECK;
-- 1.8.1 The following error bisect's down to this commit when running out of my local checkout for testing.
2013-02-16 05:16:55.102+0000: 29992: error : virCommandWait:2270 : internal error Child process (LC_ALL=C LD_LIBRARY_PATH=/home/cardoe/work/libvirt/src/.libs PATH=/usr/local/bin:/usr/bin:/bin:/opt/bin:/usr/x86_64-pc-linux-gnu/gcc-bin/4.6.3:/usr/games/bin HOME=/home/cardoe USER=cardoe LOGNAME=cardoe /usr/bin/qemu-kvm -help) unexpected exit status 1: libvir: error : internal error cannot apply process capabilities -1
Ugh. Can you manage to get that trapped in gdb and find out the value of uid, gid, and capBits, as well as whether it is failing on the first call to capng_apply() or the second (they both have the same error messsage. (Whatever happened to the function name/line number that used to be logged with the error messages?) I wonder if perhaps on debian it's failing the capng_apply() call that happens after the uid is changed...

On Sat, Feb 16, 2013 at 4:53 PM, Laine Stump <laine@laine.org> wrote:
On 02/16/2013 12:20 AM, Doug Goldstein wrote:
On Tue, Feb 12, 2013 at 2:15 PM, Laine Stump <laine@laine.org> wrote:
Normally when a process' uid is changed to non-0, all the capabilities bits are cleared, even those explicitly set with calls to capng_update()/capng_apply() made immediately before setuid. And *after* the process' uid has been changed, it no longer has the necessary privileges to add capabilities back to the process.
In order to set a non-0 uid while still maintaining any capabilities bits, it is necessary to either call capng_change_id() (which unfortunately doesn't currently call initgroups to setup auxiliary group membership), or to perform the small amount of calisthenics contained in the new utility function virSetUIDGIDWithCaps().
Another very important difference between the capabilities setting/clearing in virSetUIDGIDWithCaps() and virCommand's virSetCapabilities() (which it will replace in the next patch) is that the new function properly clears the capabilities bounding set, so it will not be possible for a child process to set any new capabilities.
A short description of what is done by virSetUIDGIDWithCaps():
1) clear all capabilities then set all those desired by the caller (in capBits) plus CAP_SETGID, CAP_SETUID, and CAP_SETPCAP (which is needed to change the capabilities bounding set).
2) call prctl(), telling it that we want to maintain current capabilities across an upcoming setuid().
3) switch to the new uid/gid
4) again call prctl(), telling it we will no longer want capabilities maintained if this process does another setuid().
5) clear the capabilities that we added to allow us to setuid/setgid/change the bounding set (unless they were also requested by the caller via the virCommand API).
Because the modification/maintaining of capabilities is intermingled with setting the uid, this is necessarily done in a single function, rather than having two independent functions.
Note that, due to the way that effective capabilities are computed (at time of execve) for a process that has uid != 0, the *file* capabilities of the binary being executed must also have the desired capabilities bit(s) set (see "man 7 capabilities"). This can be done with the "filecap" command. (e.g. "filecap /usr/bin/qemu-kvm sys_rawio"). --- Change from V1: * properly cast when comparing gid/uid, and only short circuit for -1 (not 0) * fix // style comments * add ATTRIBUTE_UNUSED where appropriate for capBits argument.
src/libvirt_private.syms | 1 + src/util/virutil.c | 111 +++++++++++++++++++++++++++++++++++++++++++++++ src/util/virutil.h | 1 + 3 files changed, 113 insertions(+)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index dcdcb67..d8d5877 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1312,6 +1312,7 @@ virSetDeviceUnprivSGIO; virSetInherit; virSetNonBlock; virSetUIDGID; +virSetUIDGIDWithCaps; virSkipSpaces; virSkipSpacesAndBackslash; virSkipSpacesBackwards; diff --git a/src/util/virutil.c b/src/util/virutil.c index 0d7db00..28fcc2f 100644 --- a/src/util/virutil.c +++ b/src/util/virutil.c @@ -60,6 +60,7 @@ #endif #if WITH_CAPNG # include <cap-ng.h> +# include <sys/prctl.h> #endif #if defined HAVE_MNTENT_H && defined HAVE_GETMNTENT_R # include <mntent.h> @@ -2990,6 +2991,116 @@ virGetGroupName(gid_t gid ATTRIBUTE_UNUSED) } #endif /* HAVE_GETPWUID_R */
+#if WITH_CAPNG +/* Set the real and effective uid and gid to the given values, while + * maintaining the capabilities indicated by bits in @capBits. return + * 0 on success, -1 on failure (the original system error remains in + * errno). + */ +int +virSetUIDGIDWithCaps(uid_t uid, gid_t gid, unsigned long long capBits) +{ + int ii, capng_ret, ret = -1; + bool need_setgid = false, need_setuid = false; + bool need_prctl = false, need_setpcap = false; + + /* First drop all caps except those in capBits + the extra ones we + * need to change uid/gid and change the capabilities bounding + * set. + */ + + capng_clear(CAPNG_SELECT_BOTH); + + for (ii = 0; ii <= CAP_LAST_CAP; ii++) { + if (capBits & (1ULL << ii)) { + capng_update(CAPNG_ADD, + CAPNG_EFFECTIVE|CAPNG_INHERITABLE| + CAPNG_PERMITTED|CAPNG_BOUNDING_SET, + ii); + } + } + + if (gid != (gid_t)-1 && + !capng_have_capability(CAPNG_EFFECTIVE, CAP_SETGID)) { + need_setgid = true; + capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETGID); + } + if (uid != (uid_t)-1 && + !capng_have_capability(CAPNG_EFFECTIVE, CAP_SETUID)) { + need_setuid = true; + capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETUID); + } +# ifdef PR_CAPBSET_DROP + /* If newer kernel, we need also need setpcap to change the bounding set */ + if ((capBits || need_setgid || need_setuid) && + !capng_have_capability(CAPNG_EFFECTIVE, CAP_SETPCAP)) { + need_setpcap = true; + } + if (need_setpcap) + capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETPCAP); +# endif + + need_prctl = capBits || need_setgid || need_setuid || need_setpcap; + + /* Tell system we want to keep caps across uid change */ + if (need_prctl && prctl(PR_SET_KEEPCAPS, 1, 0, 0, 0)) { + virReportSystemError(errno, "%s", + _("prctl failed to set KEEPCAPS")); + goto cleanup; + } + + /* Change to the temp capabilities */ + if ((capng_ret = capng_apply(CAPNG_SELECT_BOTH)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("cannot apply process capabilities %d"), capng_ret); + goto cleanup; + } + + if (virSetUIDGID(uid, gid) < 0) + goto cleanup; + + /* Tell it we are done keeping capabilities */ + if (need_prctl && prctl(PR_SET_KEEPCAPS, 0, 0, 0, 0)) { + virReportSystemError(errno, "%s", + _("prctl failed to reset KEEPCAPS")); + goto cleanup; + } + + /* Drop the caps that allow setuid/gid (unless they were requested) */ + if (need_setgid) + capng_update(CAPNG_DROP, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETGID); + if (need_setuid) + capng_update(CAPNG_DROP, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETUID); + /* Throw away CAP_SETPCAP so no more changes */ + if (need_setpcap) + capng_update(CAPNG_DROP, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETPCAP); + + if (need_prctl && ((capng_ret = capng_apply(CAPNG_SELECT_BOTH)) < 0)) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("cannot apply process capabilities %d"), capng_ret); + ret = -1; + goto cleanup; + } + + ret = 0; +cleanup: + return ret; +} + +#else +/* + * On platforms without libcapng, the capabilities setting is treated + * as a NOP. + */ + +int +virSetUIDGIDWithCaps(uid_t uid, gid_t gid, + unsigned long long capBits ATTRIBUTE_UNUSED) +{ + return virSetUIDGID(uid, gid); +} +#endif +
#if defined HAVE_MNTENT_H && defined HAVE_GETMNTENT_R /* search /proc/mounts for mount point of *type; return pointer to diff --git a/src/util/virutil.h b/src/util/virutil.h index 4201aa1..2dc3403 100644 --- a/src/util/virutil.h +++ b/src/util/virutil.h @@ -54,6 +54,7 @@ int virPipeReadUntilEOF(int outfd, int errfd, char **outbuf, char **errbuf);
int virSetUIDGID(uid_t uid, gid_t gid); +int virSetUIDGIDWithCaps(uid_t uid, gid_t gid, unsigned long long capBits);
int virFileReadLimFD(int fd, int maxlen, char **buf) ATTRIBUTE_RETURN_CHECK;
-- 1.8.1 The following error bisect's down to this commit when running out of my local checkout for testing.
2013-02-16 05:16:55.102+0000: 29992: error : virCommandWait:2270 : internal error Child process (LC_ALL=C LD_LIBRARY_PATH=/home/cardoe/work/libvirt/src/.libs PATH=/usr/local/bin:/usr/bin:/bin:/opt/bin:/usr/x86_64-pc-linux-gnu/gcc-bin/4.6.3:/usr/games/bin HOME=/home/cardoe USER=cardoe LOGNAME=cardoe /usr/bin/qemu-kvm -help) unexpected exit status 1: libvir: error : internal error cannot apply process capabilities -1
Ugh. Can you manage to get that trapped in gdb and find out the value of uid, gid, and capBits, as well as whether it is failing on the first call to capng_apply() or the second (they both have the same error messsage. (Whatever happened to the function name/line number that used to be logged with the error messages?) I wonder if perhaps on debian it's failing the capng_apply() call that happens after the uid is changed...
Oops. Guess that would have been helpful to include. Its Gentoo btw, not Debian. Its in the first call. I guess the exit message is overriding the original line number in the error message. 2013-02-17 18:08:03.696+0000: 21164: debug : virExec:641 : Setting child uid:gid to 0:0 with caps 0 2013-02-17 18:08:03.696+0000: 21164: error : virSetUIDGIDWithCaps:3055 : internal error cannot apply process capabilities -1 -- Doug Goldstein

On Sat, Feb 16, 2013 at 05:53:05PM -0500, Laine Stump wrote:
On 02/16/2013 12:20 AM, Doug Goldstein wrote:
On Tue, Feb 12, 2013 at 2:15 PM, Laine Stump <laine@laine.org> wrote:
Normally when a process' uid is changed to non-0, all the capabilities bits are cleared, even those explicitly set with calls to capng_update()/capng_apply() made immediately before setuid. And *after* the process' uid has been changed, it no longer has the necessary privileges to add capabilities back to the process.
In order to set a non-0 uid while still maintaining any capabilities bits, it is necessary to either call capng_change_id() (which unfortunately doesn't currently call initgroups to setup auxiliary group membership), or to perform the small amount of calisthenics contained in the new utility function virSetUIDGIDWithCaps().
Another very important difference between the capabilities setting/clearing in virSetUIDGIDWithCaps() and virCommand's virSetCapabilities() (which it will replace in the next patch) is that the new function properly clears the capabilities bounding set, so it will not be possible for a child process to set any new capabilities.
A short description of what is done by virSetUIDGIDWithCaps():
1) clear all capabilities then set all those desired by the caller (in capBits) plus CAP_SETGID, CAP_SETUID, and CAP_SETPCAP (which is needed to change the capabilities bounding set).
2) call prctl(), telling it that we want to maintain current capabilities across an upcoming setuid().
3) switch to the new uid/gid
4) again call prctl(), telling it we will no longer want capabilities maintained if this process does another setuid().
5) clear the capabilities that we added to allow us to setuid/setgid/change the bounding set (unless they were also requested by the caller via the virCommand API).
Because the modification/maintaining of capabilities is intermingled with setting the uid, this is necessarily done in a single function, rather than having two independent functions.
Note that, due to the way that effective capabilities are computed (at time of execve) for a process that has uid != 0, the *file* capabilities of the binary being executed must also have the desired capabilities bit(s) set (see "man 7 capabilities"). This can be done with the "filecap" command. (e.g. "filecap /usr/bin/qemu-kvm sys_rawio"). --- Change from V1: * properly cast when comparing gid/uid, and only short circuit for -1 (not 0) * fix // style comments * add ATTRIBUTE_UNUSED where appropriate for capBits argument.
src/libvirt_private.syms | 1 + src/util/virutil.c | 111 +++++++++++++++++++++++++++++++++++++++++++++++ src/util/virutil.h | 1 + 3 files changed, 113 insertions(+)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index dcdcb67..d8d5877 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1312,6 +1312,7 @@ virSetDeviceUnprivSGIO; virSetInherit; virSetNonBlock; virSetUIDGID; +virSetUIDGIDWithCaps; virSkipSpaces; virSkipSpacesAndBackslash; virSkipSpacesBackwards; diff --git a/src/util/virutil.c b/src/util/virutil.c index 0d7db00..28fcc2f 100644 --- a/src/util/virutil.c +++ b/src/util/virutil.c @@ -60,6 +60,7 @@ #endif #if WITH_CAPNG # include <cap-ng.h> +# include <sys/prctl.h> #endif #if defined HAVE_MNTENT_H && defined HAVE_GETMNTENT_R # include <mntent.h> @@ -2990,6 +2991,116 @@ virGetGroupName(gid_t gid ATTRIBUTE_UNUSED) } #endif /* HAVE_GETPWUID_R */
+#if WITH_CAPNG +/* Set the real and effective uid and gid to the given values, while + * maintaining the capabilities indicated by bits in @capBits. return + * 0 on success, -1 on failure (the original system error remains in + * errno). + */ +int +virSetUIDGIDWithCaps(uid_t uid, gid_t gid, unsigned long long capBits) +{ + int ii, capng_ret, ret = -1; + bool need_setgid = false, need_setuid = false; + bool need_prctl = false, need_setpcap = false; + + /* First drop all caps except those in capBits + the extra ones we + * need to change uid/gid and change the capabilities bounding + * set. + */ + + capng_clear(CAPNG_SELECT_BOTH); + + for (ii = 0; ii <= CAP_LAST_CAP; ii++) { + if (capBits & (1ULL << ii)) { + capng_update(CAPNG_ADD, + CAPNG_EFFECTIVE|CAPNG_INHERITABLE| + CAPNG_PERMITTED|CAPNG_BOUNDING_SET, + ii); + } + } + + if (gid != (gid_t)-1 && + !capng_have_capability(CAPNG_EFFECTIVE, CAP_SETGID)) { + need_setgid = true; + capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETGID); + } + if (uid != (uid_t)-1 && + !capng_have_capability(CAPNG_EFFECTIVE, CAP_SETUID)) { + need_setuid = true; + capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETUID); + } +# ifdef PR_CAPBSET_DROP + /* If newer kernel, we need also need setpcap to change the bounding set */ + if ((capBits || need_setgid || need_setuid) && + !capng_have_capability(CAPNG_EFFECTIVE, CAP_SETPCAP)) { + need_setpcap = true; + } + if (need_setpcap) + capng_update(CAPNG_ADD, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETPCAP); +# endif + + need_prctl = capBits || need_setgid || need_setuid || need_setpcap; + + /* Tell system we want to keep caps across uid change */ + if (need_prctl && prctl(PR_SET_KEEPCAPS, 1, 0, 0, 0)) { + virReportSystemError(errno, "%s", + _("prctl failed to set KEEPCAPS")); + goto cleanup; + } + + /* Change to the temp capabilities */ + if ((capng_ret = capng_apply(CAPNG_SELECT_BOTH)) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("cannot apply process capabilities %d"), capng_ret); + goto cleanup; + } + + if (virSetUIDGID(uid, gid) < 0) + goto cleanup; + + /* Tell it we are done keeping capabilities */ + if (need_prctl && prctl(PR_SET_KEEPCAPS, 0, 0, 0, 0)) { + virReportSystemError(errno, "%s", + _("prctl failed to reset KEEPCAPS")); + goto cleanup; + } + + /* Drop the caps that allow setuid/gid (unless they were requested) */ + if (need_setgid) + capng_update(CAPNG_DROP, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETGID); + if (need_setuid) + capng_update(CAPNG_DROP, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETUID); + /* Throw away CAP_SETPCAP so no more changes */ + if (need_setpcap) + capng_update(CAPNG_DROP, CAPNG_EFFECTIVE|CAPNG_PERMITTED, CAP_SETPCAP); + + if (need_prctl && ((capng_ret = capng_apply(CAPNG_SELECT_BOTH)) < 0)) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("cannot apply process capabilities %d"), capng_ret); + ret = -1; + goto cleanup; + } + + ret = 0; +cleanup: + return ret; +} + +#else +/* + * On platforms without libcapng, the capabilities setting is treated + * as a NOP. + */ + +int +virSetUIDGIDWithCaps(uid_t uid, gid_t gid, + unsigned long long capBits ATTRIBUTE_UNUSED) +{ + return virSetUIDGID(uid, gid); +} +#endif +
#if defined HAVE_MNTENT_H && defined HAVE_GETMNTENT_R /* search /proc/mounts for mount point of *type; return pointer to diff --git a/src/util/virutil.h b/src/util/virutil.h index 4201aa1..2dc3403 100644 --- a/src/util/virutil.h +++ b/src/util/virutil.h @@ -54,6 +54,7 @@ int virPipeReadUntilEOF(int outfd, int errfd, char **outbuf, char **errbuf);
int virSetUIDGID(uid_t uid, gid_t gid); +int virSetUIDGIDWithCaps(uid_t uid, gid_t gid, unsigned long long capBits);
int virFileReadLimFD(int fd, int maxlen, char **buf) ATTRIBUTE_RETURN_CHECK;
-- 1.8.1 The following error bisect's down to this commit when running out of my local checkout for testing.
2013-02-16 05:16:55.102+0000: 29992: error : virCommandWait:2270 : internal error Child process (LC_ALL=C LD_LIBRARY_PATH=/home/cardoe/work/libvirt/src/.libs PATH=/usr/local/bin:/usr/bin:/bin:/opt/bin:/usr/x86_64-pc-linux-gnu/gcc-bin/4.6.3:/usr/games/bin HOME=/home/cardoe USER=cardoe LOGNAME=cardoe /usr/bin/qemu-kvm -help) unexpected exit status 1: libvir: error : internal error cannot apply process capabilities -1
Ugh. Can you manage to get that trapped in gdb and find out the value of uid, gid, and capBits, as well as whether it is failing on the first call to capng_apply() or the second (they both have the same error messsage. (Whatever happened to the function name/line number that used to be logged with the error messages?) I wonder if perhaps on debian it's failing the capng_apply() call that happens after the uid is changed...
It's uid = 0, gid = 0 (as can be seen when running with LIBVIRT_DEBUG=1) . See 20130217173308.GA11314@bogon.sigxcpu.org for a proposed fix. Cheers, -- Guido
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On 02/18/2013 10:09 AM, Guido Günther wrote:
On Sat, Feb 16, 2013 at 05:53:05PM -0500, Laine Stump wrote:
On 02/16/2013 12:20 AM, Doug Goldstein wrote:
The following error bisect's down to this commit when running out of my local checkout for testing.
2013-02-16 05:16:55.102+0000: 29992: error : virCommandWait:2270 : internal error Child process (LC_ALL=C LD_LIBRARY_PATH=/home/cardoe/work/libvirt/src/.libs PATH=/usr/local/bin:/usr/bin:/bin:/opt/bin:/usr/x86_64-pc-linux-gnu/gcc-bin/4.6.3:/usr/games/bin HOME=/home/cardoe USER=cardoe LOGNAME=cardoe /usr/bin/qemu-kvm -help) unexpected exit status 1: libvir: error : internal error cannot apply process capabilities -1
Ugh. Can you manage to get that trapped in gdb and find out the value of uid, gid, and capBits, as well as whether it is failing on the first call to capng_apply() or the second (they both have the same error messsage. (Whatever happened to the function name/line number that used to be logged with the error messages?) I wonder if perhaps on debian it's failing the capng_apply() call that happens after the uid is changed... It's uid = 0, gid = 0 (as can be seen when running with LIBVIRT_DEBUG=1) . See 20130217173308.GA11314@bogon.sigxcpu.org for a proposed fix.
Ah, good. I see that one now (after searching through my google apps spam folder.. grumble grumble.) I was suspicious there might be some fallout due to no longer treating uid=0 as "ignore/don't change", but didn't think to try session mode libvirtd. Thanks for figuring it out.

virCommand was previously calling virSetUIDGID() to change the uid and gid of the child process, then separately calling virSetCapabilities(). This did not work if the desired uid was != 0, since a setuid to anything other than 0 normally clears all capabilities bits. The solution is to use the new virSetUIDGIDWithCaps(), sending it the uid, gid, and capabilities bits. This will get the new process setup properly. Since the static functions virSetCapabilities() and virClearCapabilities are no longer called, they have been removed. NOTE: When combined with "filecap $path-to-qemu sys_rawio", this patch will make CAP_SYS_RAWIO (which is required for passthrough of generic scsi commands to a guest - see commits e8daeeb, 177db08, 397e6a7, and 74e0349) be retained by qemu when necessary. Apparently that capability has been broken for non-root qemu every since it was originally added. --- Change from V1: s/todo/to do/ in comment I didn't do anything about issuing a warning if CAPNG isn't present, because we previously haven't done that, and I think it would clutter the log terribly on any platform that didn't have libcapng. src/util/vircommand.c | 77 ++++++--------------------------------------------- 1 file changed, 8 insertions(+), 69 deletions(-) diff --git a/src/util/vircommand.c b/src/util/vircommand.c index 4b1fc8d..a30621b 100644 --- a/src/util/vircommand.c +++ b/src/util/vircommand.c @@ -183,65 +183,6 @@ virCommandFDSet(int fd, #ifndef WIN32 -static int virClearCapabilities(void) ATTRIBUTE_UNUSED; - -# if WITH_CAPNG -static int virClearCapabilities(void) -{ - int ret; - - capng_clear(CAPNG_SELECT_BOTH); - - if ((ret = capng_apply(CAPNG_SELECT_BOTH)) < 0) { - virReportError(VIR_ERR_INTERNAL_ERROR, - _("cannot clear process capabilities %d"), ret); - return -1; - } - - return 0; -} - -/** - * virSetCapabilities: - * @capabilities - capability flag to set. - * In case of 0, this function is identical to - * virClearCapabilities() - * - */ -static int virSetCapabilities(unsigned long long capabilities) -{ - int ret, i; - - capng_clear(CAPNG_SELECT_BOTH); - - for (i = 0; i <= CAP_LAST_CAP; i++) { - if (capabilities & (1ULL << i)) - capng_update(CAPNG_ADD, CAPNG_BOUNDING_SET, i); - } - - if ((ret = capng_apply(CAPNG_SELECT_BOTH)) < 0) { - virReportError(VIR_ERR_INTERNAL_ERROR, - _("cannot apply process capabilities %d"), ret); - return -1; - } - - return 0; -} -# else -static int virClearCapabilities(void) -{ -// VIR_WARN("libcap-ng support not compiled in, unable to clear " -// "capabilities"); - return 0; -} - -static int -virSetCapabilities(unsigned long long capabilities ATTRIBUTE_UNUSED) -{ - return 0; -} -# endif - /** * virFork: * @pid - a pointer to a pid_t that will receive the return value from @@ -646,19 +587,17 @@ virExec(virCommandPtr cmd) } # endif - if (cmd->uid != (uid_t)-1 || cmd->gid != (gid_t)-1) { - VIR_DEBUG("Setting child uid:gid to %d:%d", - (int)cmd->uid, (int)cmd->gid); - if (virSetUIDGID(cmd->uid, cmd->gid) < 0) + /* The steps above may need to do something privileged, so we delay + * setuid and clearing capabilities until the last minute. + */ + if (cmd->uid != (uid_t)-1 || cmd->gid != (gid_t)-1 || + cmd->capabilities || (cmd->flags & VIR_EXEC_CLEAR_CAPS)) { + VIR_DEBUG("Setting child uid:gid to %d:%d with caps %llx", + (int)cmd->uid, (int)cmd->gid, cmd->capabilities); + if (virSetUIDGIDWithCaps(cmd->uid, cmd->gid, cmd->capabilities) < 0) goto fork_error; } - /* The steps above may need todo something privileged, so - * we delay clearing capabilities until the last minute */ - if (cmd->capabilities || (cmd->flags & VIR_EXEC_CLEAR_CAPS)) - if (virSetCapabilities(cmd->capabilities) < 0) - goto fork_error; - if (cmd->pwd) { VIR_DEBUG("Running child in %s", cmd->pwd); if (chdir(cmd->pwd) < 0) { -- 1.8.1

On 02/12/2013 01:15 PM, Laine Stump wrote:
virCommand was previously calling virSetUIDGID() to change the uid and gid of the child process, then separately calling virSetCapabilities(). This did not work if the desired uid was != 0, since a setuid to anything other than 0 normally clears all capabilities bits.
The solution is to use the new virSetUIDGIDWithCaps(), sending it the uid, gid, and capabilities bits. This will get the new process setup properly.
Since the static functions virSetCapabilities() and virClearCapabilities are no longer called, they have been removed.
NOTE: When combined with "filecap $path-to-qemu sys_rawio", this patch will make CAP_SYS_RAWIO (which is required for passthrough of generic scsi commands to a guest - see commits e8daeeb, 177db08, 397e6a7, and 74e0349) be retained by qemu when necessary. Apparently that capability has been broken for non-root qemu every since it was
s/every/ever/
originally added. --- Change from V1: s/todo/to do/ in comment
I didn't do anything about issuing a warning if CAPNG isn't present, because we previously haven't done that, and I think it would clutter the log terribly on any platform that didn't have libcapng.
Fair enough. Maybe if there were a way to do a one-shot logging it might be helpful; but as there is no change in logging behavior (not logging either before or after this patch), that could be deferred to a later patch if we ever want it. ACK. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

(NOTE: This patch will hopefully *not* be needed, as some kernel people are trying to eliminate the need by changing CAP_COMPROMISE_KERNEL so that it's only checked on open(), not on read/write. I'm only including it here for completeness, but have no plans to push it.) Any system with CAP_COMPROMISE_KERNEL available in the kernel was not able to perform PCI passthrough device assignment without 1) running qemu as root *and* 2) setting "clear_emulator_capabilities=0" in /etc/libvirt/qemu.conf. This patch is the final piece to make pci passthrough once again work properly with a non-root qemu. It sets CAP_COMPROMISE_KERNEL; now that virCommand is properly setup to honor that request for non-root child processes, it will actually do some good. It is still necessary to set the file capability for the qemu binary, however (see the rules for determining effective caps of a process running as non-root in "man 7 capabilities"). This can be done with: filecap $path-to-qemu-binary compromise_kernel --- Change from V1: rebased. src/qemu/qemu_process.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 6c70246..b681ad5 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -3942,6 +3942,17 @@ int qemuProcessStart(virConnectPtr conn, if (cfg->clearEmulatorCapabilities) virCommandClearCaps(cmd); +#ifdef CAP_COMPROMISE_KERNEL + /* On kernels that have this capability, we must set it for qemu + * in order for PCI passthrough with -device pci-assign to work + * (the qemu binary must also have that *file* capability, set + * with "filecap /usr/bin/qemu-kvm compromise_kernel", for + * example). (either that, or qemu must be run as root, with + * "clear_emulator_capabilities=0" in /etc/libvirt/qemu.conf). + */ + virCommandAllowCap(cmd, CAP_COMPROMISE_KERNEL); +#endif + /* in case a certain disk is desirous of CAP_SYS_RAWIO, add this */ for (i = 0; i < vm->def->ndisks; i++) { virDomainDiskDefPtr disk = vm->def->disks[i]; -- 1.8.1

On 02/12/2013 01:15 PM, Laine Stump wrote:
(and also properly clear capabilities bounding set for child processes)
This replaces the earlier version of the same patches:
https://www.redhat.com/archives/libvir-list/2013-February/msg00446.html
Many of these patches were already ACKed in the first version. If I haven't modified them (other than rebasing) I note that in the subject. Other patches have been modified based on Dan and Eric's reviews of V1.
v1 ACK stands on all the patches I didn't comment on; so 1-14 are ready to push if you fix a few last-minute nits. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
participants (6)
-
Daniel P. Berrange
-
Doug Goldstein
-
Eric Blake
-
Guido Günther
-
Jim Fehlig
-
Laine Stump