[libvirt] [PATCH 0 of 2] Cgroup and LXC fixes

This set moves the cgroup creation before that of the container process, thus ensuring that it is put in the cgroup as well. As a result, I noticed that we need to allow device access to /dev/pts/*, and thus added a cgroup mechanism to allow a whole major device type. The LXC driver is made to allow major type 136 as a result. Note that this doesn't seem to do much to really restrict the container. While it does prevent them from opening devices other than what are allowed, the container can still mount (or access) the cgroup filesystem and move itself out of its own group and into the unrestricted root. Further, it can just add whitelist entries for the devices it wants to gain access. I tested code to restrict the devices in the per-driver cgroup, but that appears to have no effect, because from within the container, I can still add "b 8:* rwm" to my group's devices.allow and subsequently access SCSI disks. Even still, this patch set is crucial for proper cgroup membership of the container children.

This interface provides a way to allow an entire major device type diff -r 388beff59a32 -r 48a9209668ab src/cgroup.c --- a/src/cgroup.c Wed Oct 15 10:33:01 2008 +0000 +++ b/src/cgroup.c Thu Oct 16 14:01:13 2008 -0700 @@ -761,6 +761,36 @@ return rc; } +/** + * virCgroupAllowDeviceMajor: + * + * @group: The cgroup to allow an entire device major type for + * @type: The device type (i.e., 'c' or 'b') + * @major: The major number of the device type + * + * Returns: 0 on success + */ +int virCgroupAllowDeviceMajor(virCgroupPtr group, + char type, + int major) +{ + int rc; + char *devstr = NULL; + + if (asprintf(&devstr, "%c %i:* rwm", type, major) == -1) { + rc = -ENOMEM; + goto out; + } + + rc = virCgroupSetValueStr(group, + "devices.allow", + devstr); + out: + VIR_FREE(devstr); + + return rc; +} + int virCgroupSetCpuShares(virCgroupPtr group, unsigned long shares) { return virCgroupSetValueU64(group, "cpu.shares", (uint64_t)shares); diff -r 388beff59a32 -r 48a9209668ab src/cgroup.h --- a/src/cgroup.h Wed Oct 15 10:33:01 2008 +0000 +++ b/src/cgroup.h Thu Oct 16 14:01:13 2008 -0700 @@ -35,6 +35,9 @@ char type, int major, int minor); +int virCgroupAllowDeviceMajor(virCgroupPtr group, + char type, + int major); int virCgroupSetCpuShares(virCgroupPtr group, unsigned long shares); int virCgroupGetCpuShares(virCgroupPtr group, unsigned long *shares);

On Thu, Oct 16, 2008 at 02:07:56PM -0700, Dan Smith wrote:
This interface provides a way to allow an entire major device type
ACK. Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On Thu, Oct 16, 2008 at 02:07:56PM -0700, Dan Smith wrote:
This interface provides a way to allow an entire major device type
Looks uncontroversial, +1 Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/

Without this, our container child doesn't actually end up in the cgroup, and thus runs unrestricted. Note that this does not address the container's ability to mount cgroup and move itself into the parent namespace. While making this change, it became clear that we need to allow access to the entire range of pty devices for the container console to work. This patch adds that logic as well. diff -r 48a9209668ab -r 4babf77ec518 src/lxc_container.h --- a/src/lxc_container.h Thu Oct 16 14:01:13 2008 -0700 +++ b/src/lxc_container.h Thu Oct 16 14:01:13 2008 -0700 @@ -40,6 +40,8 @@ #define LXC_DEV_MAJ_TTY 5 #define LXC_DEV_MIN_CONSOLE 1 +#define LXC_DEV_MAJ_PTY 136 + int lxcContainerSendContinue(int control); int lxcContainerStart(virDomainDefPtr def, diff -r 48a9209668ab -r 4babf77ec518 src/lxc_controller.c --- a/src/lxc_controller.c Thu Oct 16 14:01:13 2008 -0700 +++ b/src/lxc_controller.c Thu Oct 16 14:01:13 2008 -0700 @@ -102,6 +102,10 @@ if (rc != 0) goto out; } + + rc = virCgroupAllowDeviceMajor(cgroup, 'c', LXC_DEV_MAJ_PTY); + if (rc != 0) + goto out; rc = virCgroupAddTask(cgroup, getpid()); out: @@ -449,6 +453,9 @@ goto cleanup; } + if (lxcSetContainerResources(def) < 0) + goto cleanup; + if ((container = lxcContainerStart(def, nveths, veths, @@ -459,9 +466,6 @@ control[1] = -1; if (lxcControllerMoveInterfaces(nveths, veths, container) < 0) - goto cleanup; - - if (lxcSetContainerResources(def) < 0) goto cleanup; if (lxcContainerSendContinue(control[0]) < 0)

On Thu, Oct 16, 2008 at 02:07:57PM -0700, Dan Smith wrote:
Without this, our container child doesn't actually end up in the cgroup, and thus runs unrestricted. Note that this does not address the container's ability to mount cgroup and move itself into the parent namespace.
While making this change, it became clear that we need to allow access to the entire range of pty devices for the container console to work. This patch adds that logic as well.
Yep, we also need the kernel guys to finish PTS namespace virtualization so we can actually make /dev/pts private to the container :-) ACK to this patch. Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On Thu, Oct 16, 2008 at 02:07:57PM -0700, Dan Smith wrote:
Without this, our container child doesn't actually end up in the cgroup, and thus runs unrestricted. Note that this does not address the container's ability to mount cgroup and move itself into the parent namespace.
Okay this moves the initialization earlier, makes sense, +1 Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/

DS> This set moves the cgroup creation before that of the container DS> process, thus ensuring that it is put in the cgroup as well. As a DS> result, I noticed that we need to allow device access to DS> /dev/pts/*, and thus added a cgroup mechanism to allow a whole DS> major device type. The LXC driver is made to allow major type 136 DS> as a result. Were there any comments on this set? I think it would be nice to get it into the tree sooner than later so we at least have resource controls on containers :) -- Dan Smith IBM Linux Technology Center Open Hypervisor Team email: danms@us.ibm.com

On Tue, Oct 21, 2008 at 07:58:29AM -0700, Dan Smith wrote:
DS> This set moves the cgroup creation before that of the container DS> process, thus ensuring that it is put in the cgroup as well. As a DS> result, I noticed that we need to allow device access to DS> /dev/pts/*, and thus added a cgroup mechanism to allow a whole DS> major device type. The LXC driver is made to allow major type 136 DS> as a result.
Were there any comments on this set? I think it would be nice to get it into the tree sooner than later so we at least have resource controls on containers :)
sorry for the delay, please push ! Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/

DV> sorry for the delay, please push ! Done. Thanks! -- Dan Smith IBM Linux Technology Center Open Hypervisor Team email: danms@us.ibm.com
participants (3)
-
Dan Smith
-
Daniel P. Berrange
-
Daniel Veillard