[libvirt] [PATCH 0/4] Remaining patches from VFIO series

I've pushed everything else from all 3 VFIO series. Patch 1/4 in this series had questions from Eric about whether it is the right way to go, or if we want to do something more limited: https://www.redhat.com/archives/libvir-list/2013-April/msg01864.html Eric and danpb had both raised issues with Patch 2/4, so I redid it addressing all the points they brought up, and it's now ready for review: https://www.redhat.com/archives/libvir-list/2013-April/msg01853.html https://www.redhat.com/archives/libvir-list/2013-April/msg01869.html 3/4 and 4/4 were ACKed, but depend on 2/4, so I couldn't push them. If these patches are ACKed and DV wants to make the RC1 snapshot before I wake up, anyone else feel free to push these 4 patches as they are (assuming they're ACKed, of course :-) Laine Stump (4): util: new virCommandSetMax(MemLock|Processes|Files) qemu: use new virCommandSetMax(Processes|Files) qemu: set qemu process' RLIMIT_MEMLOCK when VFIO is used qemu: add VFIO devices to cgroup ACL configure.ac | 2 +- src/libvirt_private.syms | 6 ++ src/qemu/qemu_cgroup.c | 11 ++++ src/qemu/qemu_command.c | 25 +++++--- src/qemu/qemu_hotplug.c | 27 ++++++--- src/qemu/qemu_process.c | 38 +----------- src/util/vircommand.c | 38 ++++++++++++ src/util/vircommand.h | 4 ++ src/util/virprocess.c | 152 ++++++++++++++++++++++++++++++++++++++++++++++- src/util/virprocess.h | 5 +- 10 files changed, 255 insertions(+), 53 deletions(-) -- 1.7.11.7

This patch adds two sets of functions: 1) lower level virProcessSet*() functions that will immediately set the RLIMIT_MEMLOCK. RLIMIT_NPROC, or RLIMIT_NOFILE of either the current process (using setrlimit()) or any other process (using prlimit()). "current process" is indicated by passing a 0 for pid. 2) functions for virCommand* that will setup a virCommand object to set those limits at a later time just after it has forked a new process, but before it execs the new program. configure.ac has prlimit and setrlimit added to the list of functions to check for, and the low level functions log an "unsupported" error) on platforms that don't support those functions. --- configure.ac | 2 +- src/libvirt_private.syms | 6 ++ src/util/vircommand.c | 38 ++++++++++++ src/util/vircommand.h | 4 ++ src/util/virprocess.c | 152 ++++++++++++++++++++++++++++++++++++++++++++++- src/util/virprocess.h | 5 +- 6 files changed, 204 insertions(+), 3 deletions(-) diff --git a/configure.ac b/configure.ac index 89dae3d..23c24d2 100644 --- a/configure.ac +++ b/configure.ac @@ -194,7 +194,7 @@ dnl Availability of various common functions (non-fatal if missing), dnl and various less common threadsafe functions AC_CHECK_FUNCS_ONCE([cfmakeraw geteuid getgid getgrnam_r getmntent_r \ getpwuid_r getuid initgroups kill mmap newlocale posix_fallocate \ - posix_memalign regexec sched_getaffinity setns symlink]) + posix_memalign prlimit regexec sched_getaffinity setns setrlimit symlink]) dnl Availability of pthread functions (if missing, win32 threading is dnl assumed). Because of $LIB_PTHREAD, we cannot use AC_CHECK_FUNCS_ONCE. diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 2a2c40e..0bb6f5f 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1185,6 +1185,9 @@ virCommandSetErrorFD; virCommandSetGID; virCommandSetInputBuffer; virCommandSetInputFD; +virCommandSetMaxFiles; +virCommandSetMaxMemLock; +virCommandSetMaxProcesses; virCommandSetOutputBuffer; virCommandSetOutputFD; virCommandSetPidFile; @@ -1668,6 +1671,9 @@ virProcessGetNamespaces; virProcessKill; virProcessKillPainfully; virProcessSetAffinity; +virProcessSetMaxFiles; +virProcessSetMaxMemLock; +virProcessSetMaxProcesses; virProcessSetNamespaces; virProcessTranslateStatus; virProcessWait; diff --git a/src/util/vircommand.c b/src/util/vircommand.c index ac56a63..98521ec 100644 --- a/src/util/vircommand.c +++ b/src/util/vircommand.c @@ -107,6 +107,10 @@ struct _virCommand { char *pidfile; bool reap; + unsigned long long maxMemLock; + unsigned int maxProcesses; + unsigned int maxFiles; + uid_t uid; gid_t gid; unsigned long long capabilities; @@ -598,6 +602,13 @@ virExec(virCommandPtr cmd) goto fork_error; } + if (virProcessSetMaxMemLock(0, cmd->maxMemLock) < 0) + goto fork_error; + if (virProcessSetMaxProcesses(0, cmd->maxProcesses) < 0) + goto fork_error; + if (virProcessSetMaxFiles(0, cmd->maxFiles) < 0) + goto fork_error; + if (cmd->hook) { VIR_DEBUG("Run hook %p %p", cmd->hook, cmd->opaque); ret = cmd->hook(cmd->opaque); @@ -958,6 +969,33 @@ virCommandSetUID(virCommandPtr cmd, uid_t uid) cmd->uid = uid; } +void +virCommandSetMaxMemLock(virCommandPtr cmd, unsigned long long bytes) +{ + if (!cmd || cmd->has_error) + return; + + cmd->maxMemLock = bytes; +} + +void +virCommandSetMaxProcesses(virCommandPtr cmd, unsigned int procs) +{ + if (!cmd || cmd->has_error) + return; + + cmd->maxProcesses = procs; +} + +void +virCommandSetMaxFiles(virCommandPtr cmd, unsigned int files) +{ + if (!cmd || cmd->has_error) + return; + + cmd->maxFiles = files; +} + /** * virCommandClearCaps: * @cmd: the command to modify diff --git a/src/util/vircommand.h b/src/util/vircommand.h index 6c13795..18568fe 100644 --- a/src/util/vircommand.h +++ b/src/util/vircommand.h @@ -65,6 +65,10 @@ void virCommandSetGID(virCommandPtr cmd, gid_t gid); void virCommandSetUID(virCommandPtr cmd, uid_t uid); +void virCommandSetMaxMemLock(virCommandPtr cmd, unsigned long long bytes); +void virCommandSetMaxProcesses(virCommandPtr cmd, unsigned int procs); +void virCommandSetMaxFiles(virCommandPtr cmd, unsigned int files); + void virCommandClearCaps(virCommandPtr cmd); void virCommandAllowCap(virCommandPtr cmd, diff --git a/src/util/virprocess.c b/src/util/virprocess.c index a492bd1..fb81805 100644 --- a/src/util/virprocess.c +++ b/src/util/virprocess.c @@ -1,7 +1,7 @@ /* * virprocess.c: interaction with processes * - * Copyright (C) 2010-2012 Red Hat, Inc. + * Copyright (C) 2010-2013 Red Hat, Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -27,6 +27,10 @@ #include <signal.h> #include <errno.h> #include <sys/wait.h> +#if HAVE_SETRLIMIT +# include <sys/time.h> +# include <sys/resource.h> +#endif #include <sched.h> #include "virprocess.h" @@ -605,3 +609,149 @@ int virProcessSetNamespaces(size_t nfdlist ATTRIBUTE_UNUSED, return -1; } #endif /* ! HAVE_SETNS */ + +#if HAVE_PRLIMIT +static int +virProcessPrLimit(pid_t pid, int resource, struct rlimit *rlim) +{ + return prlimit(pid, resource, rlim, NULL); +} +#else /* ! HAVE_PRLIMIT */ +static int +virProcessPrLimit(pid_t pid ATTRIBUTE_UNUSED, + int resource ATTRIBUTE_UNUSED, + struct rlimit *rlim ATTRIBUTE_UNUSED) +{ + errno = ENOSYS; + return -1; +} +#endif /* ! HAVE_PRLIMIT */ + +#if HAVE_SETRLIMIT && defined(RLIMIT_MEMLOCK) +int +virProcessSetMaxMemLock(pid_t pid, unsigned long long bytes) +{ + struct rlimit rlim; + + if (bytes == 0) + return 0; + + rlim.rlim_cur = rlim.rlim_max = bytes; + if (pid == 0) { + if (setrlimit(RLIMIT_MEMLOCK, &rlim) < 0) { + virReportSystemError(errno, + _("cannot limit locked memory to %llu"), + bytes); + return -1; + } + } else { + if (virProcessPrLimit(pid, RLIMIT_MEMLOCK, &rlim) < 0) { + virReportSystemError(errno, + _("cannot limit locked memory " + "of process %lld to %llu"), + (long long int)pid, bytes); + return -1; + } + } + return 0; +} +#else /* ! (HAVE_SETRLIMIT && defined(RLIMIT_MEMLOCK)) */ +int +virProcessSetMaxMemLock(pid_t pid ATTRIBUTE_UNUSED, unsigned long long bytes) +{ + if (bytes == 0) + return 0; + + virReportSystemError(ENOSYS, "%s", _("Not supported on this platform")); + return -1; +} +#endif /* ! (HAVE_SETRLIMIT && defined(RLIMIT_MEMLOCK)) */ + + +#if HAVE_SETRLIMIT && defined(RLIMIT_NPROC) +int +virProcessSetMaxProcesses(pid_t pid, unsigned int procs) +{ + struct rlimit rlim; + + if (procs == 0) + return 0; + + rlim.rlim_cur = rlim.rlim_max = procs; + if (pid == 0) { + if (setrlimit(RLIMIT_NPROC, &rlim) < 0) { + virReportSystemError(errno, + _("cannot limit number of subprocesses to %u"), + procs); + return -1; + } + } else { + if (virProcessPrLimit(pid, RLIMIT_NPROC, &rlim) < 0) { + virReportSystemError(errno, + _("cannot limit number of subprocesses " + "of process %lld to %u"), + (long long int)pid, procs); + return -1; + } + } + return 0; +} +#else /* ! (HAVE_SETRLIMIT && defined(RLIMIT_NPROC)) */ +int +virProcessSetMaxProcesses(pid_t pid ATTRIBUTE_UNUSED, unsigned int procs) +{ + if (procs == 0) + return 0; + + virReportSystemError(ENOSYS, "%s", _("Not supported on this platform")); + return -1; +} +#endif /* ! (HAVE_SETRLIMIT && defined(RLIMIT_NPROC)) */ + +#if HAVE_SETRLIMIT && defined(RLIMIT_NOFILE) +int +virProcessSetMaxFiles(pid_t pid, unsigned int files) +{ + struct rlimit rlim; + + if (files == 0) + return 0; + + /* Max number of opened files is one greater than actual limit. See + * man setrlimit. + * + * NB: That indicates to me that we would want the following code + * to say "files - 1", but the original of this code in + * qemu_process.c also had files + 1, so this preserves current + * behavior. + */ + rlim.rlim_cur = rlim.rlim_max = files + 1; + if (pid == 0) { + if (setrlimit(RLIMIT_NOFILE, &rlim) < 0) { + virReportSystemError(errno, + _("cannot limit number of open files to %u"), + files); + return -1; + } + } else { + if (virProcessPrLimit(pid, RLIMIT_NOFILE, &rlim) < 0) { + virReportSystemError(errno, + _("cannot limit number of open files " + "of process %lld to %u"), + (long long int)pid, files); + return -1; + } + } + return 0; +} +#else /* ! (HAVE_SETRLIMIT && defined(RLIMIT_NOFILE)) */ +int +virProcessSetMaxFiles(pid_t pid ATTRIBUTE_UNUSED, unsigned int files) +{ + if (files == 0) + return 0; + + virReportSystemError(ENOSYS, "%s", _("Not supported on this platform")); + return -1; +} +#endif /* ! (HAVE_SETRLIMIT && defined(RLIMIT_NOFILE)) */ diff --git a/src/util/virprocess.h b/src/util/virprocess.h index 53475d3..5dea334 100644 --- a/src/util/virprocess.h +++ b/src/util/virprocess.h @@ -1,7 +1,7 @@ /* * virprocess.h: interaction with processes * - * Copyright (C) 2010-2012 Red Hat, Inc. + * Copyright (C) 2010-2013 Red Hat, Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public @@ -54,4 +54,7 @@ int virProcessGetNamespaces(pid_t pid, int virProcessSetNamespaces(size_t nfdlist, int *fdlist); +int virProcessSetMaxMemLock(pid_t pid, unsigned long long bytes); +int virProcessSetMaxProcesses(pid_t pid, unsigned int procs); +int virProcessSetMaxFiles(pid_t pid, unsigned int files); #endif /* __VIR_PROCESS_H__ */ -- 1.7.11.7

On Thu, Apr 25, 2013 at 09:44:30PM -0400, Laine Stump wrote:
This patch adds two sets of functions:
1) lower level virProcessSet*() functions that will immediately set the RLIMIT_MEMLOCK. RLIMIT_NPROC, or RLIMIT_NOFILE of either the current process (using setrlimit()) or any other process (using prlimit()). "current process" is indicated by passing a 0 for pid.
2) functions for virCommand* that will setup a virCommand object to set those limits at a later time just after it has forked a new process, but before it execs the new program.
configure.ac has prlimit and setrlimit added to the list of functions to check for, and the low level functions log an "unsupported" error) on platforms that don't support those functions. --- configure.ac | 2 +- src/libvirt_private.syms | 6 ++ src/util/vircommand.c | 38 ++++++++++++ src/util/vircommand.h | 4 ++ src/util/virprocess.c | 152 ++++++++++++++++++++++++++++++++++++++++++++++- src/util/virprocess.h | 5 +- 6 files changed, 204 insertions(+), 3 deletions(-)
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

These were previously being set in a custom hook function, but now that virCommand directly supports setting them, we can eliminate that part of the hook and call the APIs directly. --- src/qemu/qemu_process.c | 38 ++------------------------------------ 1 file changed, 2 insertions(+), 36 deletions(-) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 925939d..f12d7d5 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -25,8 +25,6 @@ #include <unistd.h> #include <signal.h> #include <sys/stat.h> -#include <sys/time.h> -#include <sys/resource.h> #if defined(__linux__) # include <linux/capability.h> #elif defined(__FreeBSD__) @@ -2453,37 +2451,6 @@ qemuProcessPrepareChardevDevice(virDomainDefPtr def ATTRIBUTE_UNUSED, } -static int -qemuProcessLimits(virQEMUDriverConfigPtr cfg) -{ - struct rlimit rlim; - - if (cfg->maxProcesses > 0) { - rlim.rlim_cur = rlim.rlim_max = cfg->maxProcesses; - if (setrlimit(RLIMIT_NPROC, &rlim) < 0) { - virReportSystemError(errno, - _("cannot limit number of processes to %d"), - cfg->maxProcesses); - return -1; - } - } - - if (cfg->maxFiles > 0) { - /* Max number of opened files is one greater than - * actual limit. See man setrlimit */ - rlim.rlim_cur = rlim.rlim_max = cfg->maxFiles + 1; - if (setrlimit(RLIMIT_NOFILE, &rlim) < 0) { - virReportSystemError(errno, - _("cannot set max opened files to %d"), - cfg->maxFiles); - return -1; - } - } - - return 0; -} - - struct qemuProcessHookData { virConnectPtr conn; virDomainObjPtr vm; @@ -2526,9 +2493,6 @@ static int qemuProcessHook(void *data) if (virSecurityManagerClearSocketLabel(h->driver->securityManager, h->vm->def) < 0) goto cleanup; - if (qemuProcessLimits(h->cfg) < 0) - goto cleanup; - /* This must take place before exec(), so that all QEMU * memory allocation is on the correct NUMA node */ @@ -3697,6 +3661,8 @@ int qemuProcessStart(virConnectPtr conn, } virCommandSetPreExecHook(cmd, qemuProcessHook, &hookData); + virCommandSetMaxProcesses(cmd, cfg->maxProcesses); + virCommandSetMaxFiles(cmd, cfg->maxFiles); VIR_DEBUG("Setting up security labelling"); if (virSecurityManagerSetChildProcessLabel(driver->securityManager, -- 1.7.11.7

On Thu, Apr 25, 2013 at 09:44:31PM -0400, Laine Stump wrote:
These were previously being set in a custom hook function, but now that virCommand directly supports setting them, we can eliminate that part of the hook and call the APIs directly. --- src/qemu/qemu_process.c | 38 ++------------------------------------ 1 file changed, 2 insertions(+), 36 deletions(-)
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

VFIO requires all of the guest's memory and IO space to be lockable in RAM. The domain's max_balloon is the maximum amount of memory the domain can have (in KiB). We add a generous 1GiB to that for IO space (still much better than KVM device assignment, where the KVM module actually *ignores* the process limits and locks everything anyway), and convert from KiB to bytes. In the case of hotplug, we are changing the limit for the already existing qemu process (prlimit() is used under the hood), and for regular commandline additions of vfio devices, we schedule a call to setrlimit() that will happen after the qemu process is forked. --- src/qemu/qemu_command.c | 25 ++++++++++++++++++------- src/qemu/qemu_hotplug.c | 27 ++++++++++++++++++++------- 2 files changed, 38 insertions(+), 14 deletions(-) diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index 6f6b61b..2ce0672 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -7906,13 +7906,24 @@ qemuBuildCommandLine(virConnectPtr conn, if (hostdev->mode == VIR_DOMAIN_HOSTDEV_MODE_SUBSYS && hostdev->source.subsys.type == VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI) { - if ((hostdev->source.subsys.u.pci.backend - == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_TYPE_VFIO) && - !virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VFIO_PCI)) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", - _("VFIO PCI device assignment is not supported by " - "this version of qemu")); - goto error; + if (hostdev->source.subsys.u.pci.backend + == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_TYPE_VFIO) { + unsigned long long memKB; + + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE_VFIO_PCI)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("VFIO PCI device assignment is not " + "supported by this version of qemu")); + goto error; + } + /* VFIO requires all of the guest's memory to be + * locked resident, plus some amount for IO + * space. Alex Williamson suggested adding 1GiB for IO + * space just to be safe (some finer tuning might be + * nice, though). + */ + memKB = def->mem.max_balloon + (1024 * 1024); + virCommandSetMaxMemLock(cmd, memKB * 1024); } if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_DEVICE)) { diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index f608ec4..15cca08 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -38,6 +38,7 @@ #include "viralloc.h" #include "virpci.h" #include "virfile.h" +#include "virprocess.h" #include "qemu_cgroup.h" #include "locking/domain_lock.h" #include "network/bridge_driver.h" @@ -999,13 +1000,25 @@ int qemuDomainAttachHostPciDevice(virQEMUDriverPtr driver, &hostdev, 1) < 0) return -1; - if ((hostdev->source.subsys.u.pci.backend - == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_TYPE_VFIO) && - !virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_DEVICE_VFIO_PCI)) { - virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", - _("VFIO PCI device assignment is not supported by " - "this version of qemu")); - goto error; + if (hostdev->source.subsys.u.pci.backend + == VIR_DOMAIN_HOSTDEV_PCI_BACKEND_TYPE_VFIO) { + unsigned long long memKB; + + if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_DEVICE_VFIO_PCI)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("VFIO PCI device assignment is not " + "supported by this version of qemu")); + goto error; + } + /* VFIO requires all of the guest's memory to be locked + * resident, plus some amount for IO space. Alex Williamson + * suggested adding 1GiB for IO space just to be safe (some + * finer tuning might be nice, though). + * In this case, the guest's memory may already be locked, but + * it doesn't hurt to "change" the limit to the same value. + */ + memKB = vm->def->mem.max_balloon + (1024 * 1024); + virProcessSetMaxMemLock(vm->pid, memKB * 1024); } if (virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_DEVICE)) { -- 1.7.11.7

On Thu, Apr 25, 2013 at 09:44:32PM -0400, Laine Stump wrote:
VFIO requires all of the guest's memory and IO space to be lockable in RAM. The domain's max_balloon is the maximum amount of memory the domain can have (in KiB). We add a generous 1GiB to that for IO space (still much better than KVM device assignment, where the KVM module actually *ignores* the process limits and locks everything anyway), and convert from KiB to bytes.
In the case of hotplug, we are changing the limit for the already existing qemu process (prlimit() is used under the hood), and for regular commandline additions of vfio devices, we schedule a call to setrlimit() that will happen after the qemu process is forked. --- src/qemu/qemu_command.c | 25 ++++++++++++++++++------- src/qemu/qemu_hotplug.c | 27 ++++++++++++++++++++------- 2 files changed, 38 insertions(+), 14 deletions(-)
ACK Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

We don't know exactly the names of the VFIO devices that will be needed (and due to hotplug, we can't ever assume we won't need them at all), so we just add an ACL to allow any vfio device - they all have the major number 244 (/dev/vfio/vfio is 244,0, and the /dev/vfio/n devices are up from there). --- src/qemu/qemu_cgroup.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/src/qemu/qemu_cgroup.c b/src/qemu/qemu_cgroup.c index 891984a..ad2027d 100644 --- a/src/qemu/qemu_cgroup.c +++ b/src/qemu/qemu_cgroup.c @@ -44,6 +44,7 @@ static const char *const defaultDeviceACL[] = { }; #define DEVICE_PTY_MAJOR 136 #define DEVICE_SND_MAJOR 116 +#define DEVICE_VFIO_MAJOR 244 static int qemuSetupDiskPathAllow(virDomainDiskDefPtr disk, @@ -388,6 +389,16 @@ int qemuSetupCgroup(virQEMUDriverPtr driver, } } + rc = virCgroupAllowDeviceMajor(priv->cgroup, 'c', DEVICE_VFIO_MAJOR, + VIR_CGROUP_DEVICE_RW); + virDomainAuditCgroupMajor(vm, priv->cgroup, "allow", DEVICE_VFIO_MAJOR, + "vfio", "rw", rc == 0); + if (rc != 0) { + virReportSystemError(-rc, "%s", + _("unable to allow /dev/vfio/ devices")); + goto cleanup; + } + for (i = 0; deviceACL[i] != NULL ; i++) { if (access(deviceACL[i], F_OK) < 0) { VIR_DEBUG("Ignoring non-existant device %s", -- 1.7.11.7

On Thu, Apr 25, 2013 at 09:44:33PM -0400, Laine Stump wrote:
We don't know exactly the names of the VFIO devices that will be needed (and due to hotplug, we can't ever assume we won't need them at all), so we just add an ACL to allow any vfio device - they all have the major number 244 (/dev/vfio/vfio is 244,0, and the /dev/vfio/n devices are up from there).
We do the correct labelling of the /dev/vfio/"N" device in the security drivers, so we should be able todo the same for cgroups device ACL. Allowing all "N" is not acceptable from a security POV. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 04/26/2013 04:52 AM, Daniel P. Berrange wrote:
On Thu, Apr 25, 2013 at 09:44:33PM -0400, Laine Stump wrote:
We don't know exactly the names of the VFIO devices that will be needed (and due to hotplug, we can't ever assume we won't need them at all), so we just add an ACL to allow any vfio device - they all have the major number 244 (/dev/vfio/vfio is 244,0, and the /dev/vfio/n devices are up from there). We do the correct labelling of the /dev/vfio/"N" device in the security drivers, so we should be able todo the same for cgroups device ACL. Allowing all "N" is not acceptable from a security POV.
Up until now there hasn't been any cgroup-related code in the security drivers, though. So where should this go? Do we need a new driver backend for cgroups? This would then mean that we need to have three tiers of security drivers rather than two. Or can it just be put in the DAC driver?

On Fri, Apr 26, 2013 at 11:16:14AM -0400, Laine Stump wrote:
On 04/26/2013 04:52 AM, Daniel P. Berrange wrote:
On Thu, Apr 25, 2013 at 09:44:33PM -0400, Laine Stump wrote:
We don't know exactly the names of the VFIO devices that will be needed (and due to hotplug, we can't ever assume we won't need them at all), so we just add an ACL to allow any vfio device - they all have the major number 244 (/dev/vfio/vfio is 244,0, and the /dev/vfio/n devices are up from there). We do the correct labelling of the /dev/vfio/"N" device in the security drivers, so we should be able todo the same for cgroups device ACL. Allowing all "N" is not acceptable from a security POV.
Up until now there hasn't been any cgroup-related code in the security drivers, though. So where should this go? Do we need a new driver backend for cgroups? This would then mean that we need to have three tiers of security drivers rather than two. Or can it just be put in the DAC driver?
We manage perfectly well to configure ACLs for individual disks that a VM is given without having to wildcard allow every single /dev/sdN disk. That fact that you were able to make the security drivers label the /dev/vfio/n devices correctly, shows that the information required is available. So why can't you set the cgroups ACLs correctly here too ? There's no need to move cgroups code into any security driver. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 04/26/2013 11:42 AM, Daniel P. Berrange wrote:
On 04/26/2013 04:52 AM, Daniel P. Berrange wrote:
On Thu, Apr 25, 2013 at 09:44:33PM -0400, Laine Stump wrote:
We don't know exactly the names of the VFIO devices that will be needed (and due to hotplug, we can't ever assume we won't need them at all), so we just add an ACL to allow any vfio device - they all have the major number 244 (/dev/vfio/vfio is 244,0, and the /dev/vfio/n devices are up from there). We do the correct labelling of the /dev/vfio/"N" device in the security drivers, so we should be able todo the same for cgroups device ACL. Allowing all "N" is not acceptable from a security POV. Up until now there hasn't been any cgroup-related code in the security drivers, though. So where should this go? Do we need a new driver backend for cgroups? This would then mean that we need to have three tiers of security drivers rather than two. Or can it just be put in the DAC driver? We manage perfectly well to configure ACLs for individual disks that a VM is given without having to wildcard allow every single /dev/sdN disk. That fact that you were able to make the security drivers label
On Fri, Apr 26, 2013 at 11:16:14AM -0400, Laine Stump wrote: the /dev/vfio/n devices correctly, shows that the information required is available. So why can't you set the cgroups ACLs correctly here too ? There's no need to move cgroups code into any security driver.
Sorry, my brain combined the first and second sentences of your message, and understood that you wanted this to happen in the security driver. I'll look up what's done for disks.

On 04/26/2013 09:55 AM, Laine Stump wrote:
We manage perfectly well to configure ACLs for individual disks that a VM is given without having to wildcard allow every single /dev/sdN disk. That fact that you were able to make the security drivers label the /dev/vfio/n devices correctly, shows that the information required is available. So why can't you set the cgroups ACLs correctly here too ? There's no need to move cgroups code into any security driver.
Sorry, my brain combined the first and second sentences of your message, and understood that you wanted this to happen in the security driver. I'll look up what's done for disks.
Basically, we have code that does four related things - call into the security manager, call into the cgroup manager, call into the lock space manager, and finally audit the result. See qemuDomainPrepareDiskChainElement for an example. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 04/25/2013 09:44 PM, Laine Stump wrote:
I've pushed everything else from all 3 VFIO series.
Patch 1/4 in this series had questions from Eric about whether it is the right way to go, or if we want to do something more limited:
https://www.redhat.com/archives/libvir-list/2013-April/msg01864.html
Eric and danpb had both raised issues with Patch 2/4, so I redid it addressing all the points they brought up, and it's now ready for review:
https://www.redhat.com/archives/libvir-list/2013-April/msg01853.html https://www.redhat.com/archives/libvir-list/2013-April/msg01869.html
3/4 and 4/4 were ACKed, but depend on 2/4, so I couldn't push them.
I mixed up the ordering of these in the descriptions above - the cgroup ACL patch was 3/4, *not* 1/4. I just pushed the vir(Process|Command)SetMax(.*) patches, but not the cgroup patch.
participants (3)
-
Daniel P. Berrange
-
Eric Blake
-
Laine Stump