[libvirt] LXC: making the private root filesystem more secure

When I wrote the private root filesystem stuff for LXC (which I just committed) I noted that we couldn't actually make this secure, because someone inside the chroot can just 'mknod' and access the host devices. What I completely forgot was that cgroups as of 2.6.26 has device ACLs If we place every container in a cgroup (which was planned anyway), then we can trivially prevent containers accessing host devices One time setup mount -t cgroups /dev/cgroups mkdir /dev/cgroups/libvirt mkdir /dev/cgroups/libvirt/lxc For each new container 'NAME' mkdir /dev/cgroups/libvirt/lxc/{NAME} echo "a" > /dev/cgroups/libvirt/lxc/{NAME}/devices.deny echo "c 1:3 rwm" > /dev/cgroups/libvirt/lxc/{NAME}/devices.allow echo "c 1:5 rwm" > /dev/cgroups/libvirt/lxc/{NAME}/devices.allow echo "c 1:7 rwm" > /dev/cgroups/libvirt/lxc/{NAME}/devices.allow echo "c 5:1 rwm" > /dev/cgroups/libvirt/lxc/{NAME}/devices.allow echo "c 1:8 rwm" > /dev/cgroups/libvirt/lxc/{NAME}/devices.allow echo "c 1:9 rwm" > /dev/cgroups/libvirt/lxc/{NAME}/devices.allow This denies all devices, and then allows null, zero, full, console, random and urandom. Allowing use of 'random' is debatable. The 'devpts' namespace stuff is also needed to provide private PTYs. The 'user' namespace stuff is needed to prevent an unprivileged user in the host OS from killing off processes with same UID inside the container. There looks to be active patchsets for both of these being discussed, so we're getting close to having a genuinely useful container based virt driver with LXC Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

DB> mkdir /dev/cgroups/libvirt/lxc/{NAME} I have a small (and not-yet-working) patch that uses libcgroup[1] to setup a cgroup per container. This provides the ability to enforce the <memory> quantity on the group through memory.limit_in_bytes. I've also got some stubs that I plan to use to provide access to cpu.shares through the scheduling parameters interface. DB> echo "a" > /dev/cgroups/libvirt/lxc/{NAME}/devices.deny DB> echo "c 1:3 rwm" > /dev/cgroups/libvirt/lxc/{NAME}/devices.allow DB> echo "c 1:5 rwm" > /dev/cgroups/libvirt/lxc/{NAME}/devices.allow DB> echo "c 1:7 rwm" > /dev/cgroups/libvirt/lxc/{NAME}/devices.allow DB> echo "c 5:1 rwm" > /dev/cgroups/libvirt/lxc/{NAME}/devices.allow DB> echo "c 1:8 rwm" > /dev/cgroups/libvirt/lxc/{NAME}/devices.allow DB> echo "c 1:9 rwm" > /dev/cgroups/libvirt/lxc/{NAME}/devices.allow Adding this functionality to what I have should be rather trivial, I think. I'm still working with the libcgroup folks to get some kinks ironed out, but I will post the patches when we get something that works against some version of libcgroup. [1]: http://libcg.sourceforge.net -- Dan Smith IBM Linux Technology Center Open Hypervisor Team email: danms@us.ibm.com

On Fri, Aug 29, 2008 at 06:59:21AM -0700, Dan Smith wrote:
DB> mkdir /dev/cgroups/libvirt/lxc/{NAME}
I have a small (and not-yet-working) patch that uses libcgroup[1] to setup a cgroup per container. This provides the ability to enforce the <memory> quantity on the group through memory.limit_in_bytes. I've also got some stubs that I plan to use to provide access to cpu.shares through the scheduling parameters interface.
DB> echo "a" > /dev/cgroups/libvirt/lxc/{NAME}/devices.deny DB> echo "c 1:3 rwm" > /dev/cgroups/libvirt/lxc/{NAME}/devices.allow DB> echo "c 1:5 rwm" > /dev/cgroups/libvirt/lxc/{NAME}/devices.allow DB> echo "c 1:7 rwm" > /dev/cgroups/libvirt/lxc/{NAME}/devices.allow DB> echo "c 5:1 rwm" > /dev/cgroups/libvirt/lxc/{NAME}/devices.allow DB> echo "c 1:8 rwm" > /dev/cgroups/libvirt/lxc/{NAME}/devices.allow DB> echo "c 1:9 rwm" > /dev/cgroups/libvirt/lxc/{NAME}/devices.allow
Adding this functionality to what I have should be rather trivial, I think.
I'm still working with the libcgroup folks to get some kinks ironed out, but I will post the patches when we get something that works against some version of libcgroup.
I just checked the libcgroup heaer file available under Fedora 9 and I'm a bit afraid of the dependancy. They expose a lot of structure, some clearly incomplete, which means liking to it in its current state may turn into a problematic dependency. Maybe I need to look further, but really all those structures should be hidden and accessors should be provided at the API level. I made that mistake in libxml2, and still have the scars ! Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/

DV> I just checked the libcgroup heaer file available under Fedora 9 DV> and I'm a bit afraid of the dependancy. They expose a lot of DV> structure, some clearly incomplete, which means liking to it in its DV> current state may turn into a problematic dependency. I've become increasingly concerned about the likelihood of converging on something stable that will work for libvirt in this area. I hate to ignore an abstraction layer that may help reduce the amount of knowledge of cgroups that has to be present in libvirt. However, I'm not sure that libcgroup is really going to provide such a layer, and thus would (as you put it) become nothing but a problematic dependency. Perhaps it makes the most sense to implement a bit of cgroup support directly into libvirt to satisfy our current needs while we wait to see if libcgroup matures? -- Dan Smith IBM Linux Technology Center Open Hypervisor Team email: danms@us.ibm.com

On Thu, Sep 04, 2008 at 12:50:35PM -0700, Dan Smith wrote:
DV> I just checked the libcgroup heaer file available under Fedora 9 DV> and I'm a bit afraid of the dependancy. They expose a lot of DV> structure, some clearly incomplete, which means liking to it in its DV> current state may turn into a problematic dependency.
I've become increasingly concerned about the likelihood of converging on something stable that will work for libvirt in this area. I hate to ignore an abstraction layer that may help reduce the amount of knowledge of cgroups that has to be present in libvirt. However, I'm not sure that libcgroup is really going to provide such a layer, and thus would (as you put it) become nothing but a problematic dependency.
Perhaps it makes the most sense to implement a bit of cgroup support directly into libvirt to satisfy our current needs while we wait to see if libcgroup matures?
Yes I don't want to presume the ability of the libcgroup to become cleaner and more stable, we can probably go with a small internal API and when/if things become nicer, then reuse libcgroup, thanks for the insights, Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/

Daniel Veillard wrote:
On Thu, Sep 04, 2008 at 12:50:35PM -0700, Dan Smith wrote:
DV> I just checked the libcgroup heaer file available under Fedora 9 DV> and I'm a bit afraid of the dependancy. They expose a lot of DV> structure, some clearly incomplete, which means liking to it in its DV> current state may turn into a problematic dependency.
Could you please elaborate on the structure exposed. We are more then willing to fix any incomplete information you are concerned about
I've become increasingly concerned about the likelihood of converging on something stable that will work for libvirt in this area. I hate to ignore an abstraction layer that may help reduce the amount of knowledge of cgroups that has to be present in libvirt. However, I'm not sure that libcgroup is really going to provide such a layer, and thus would (as you put it) become nothing but a problematic dependency.
Perhaps it makes the most sense to implement a bit of cgroup support directly into libvirt to satisfy our current needs while we wait to see if libcgroup matures?
Yes I don't want to presume the ability of the libcgroup to become cleaner and more stable, we can probably go with a small internal API and when/if things become nicer, then reuse libcgroup,
I am afraid that would be duplication of efforts and the small internal API will need a lot of work. We already deal with having things like multiple mount points and controllers mounted at several places and the associated complexity. We are willing to fix problems you see, please do complain at libcg-devel. -- Balbir

On Wed, Sep 10, 2008 at 03:10:53PM -0700, Balbir Singh wrote:
Daniel Veillard wrote:
On Thu, Sep 04, 2008 at 12:50:35PM -0700, Dan Smith wrote:
DV> I just checked the libcgroup heaer file available under Fedora 9 DV> and I'm a bit afraid of the dependancy. They expose a lot of DV> structure, some clearly incomplete, which means liking to it in its DV> current state may turn into a problematic dependency.
Could you please elaborate on the structure exposed. We are more then willing to fix any incomplete information you are concerned about
I believe Daniel is refering to the struct's in your public header file. The embedded comments themselves in libcgroup.h say that the structs will need to be extended with more fields as cgroups gets more capabilities. Adding fields to a struct will change the ABI unless care is taken to provide for extensibility. The cpu_controller and cg_group structs here are of particular concern /* * These data structures are heavily controller dependent, which means * any changes (additions/removal) of configuration items would have to * be reflected in this library. We might implement a plugin * infrastructure, so that we can deal with such changes with ease. */ struct cpu_controller { /*TODO: Add the cpu.usage file here, also need to automate this.*/ char *shares; /* Having strings helps us write them to files */ /* * XX: No it does not make a difference. It requires a fprintf anyway * so it needs the qualifier. */ }; struct cg_group { char *name; uid_t tasks_uid; gid_t tasks_gid; uid_t admin_uid; gid_t admin_gid; struct cpu_controller cpu_config; }; /* * A singly linked list suffices since we don't expect too many mount points */ struct mount_table { char *options; /* Name(s) of the controller */ char *mount_point; /* The place where the controller is mounted */ struct mount_table *next; }; /* * Maintain a list of all group names. These will be used during cleanup */ struct list_of_names { char *name; struct list_of_names *next; }; Regards, Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

DB> I believe Daniel is refering to the struct's in your public header DB> file. The embedded comments themselves in libcgroup.h say that the DB> structs will need to be extended with more fields as cgroups gets DB> more capabilities. Adding fields to a struct will change the ABI DB> unless care is taken to provide for extensibility. The DB> cpu_controller and cg_group structs here are of particular concern I think that these structures in the header file are all cruft. There are access functions to eliminate the need to know the format of the data structures. I'm not aware of any public users, but there is already a maintained set of deprecated APIs for some of this. IMHO, this is indicative of the current process of evolution that libcgroup is struggling with. I think the problems run a little deeper than that, and that they have been comprehensively discussed on libcg-devel already. I definitely think that a cgroup abstraction is beneficial, but I don't think it's in a terribly useful spot at the moment. Having monitored and participated with the libcgroup development pretty closely thus far, I'd say that going forward with an internal mini-implementation is a sane thing to do at this point to get working cgroup support in a timely fashion. When the libcgroup API and functionality settles a bit, it should be easy to start using it. -- Dan Smith IBM Linux Technology Center Open Hypervisor Team email: danms@us.ibm.com

Dan Smith wrote:
DB> I believe Daniel is refering to the struct's in your public header DB> file. The embedded comments themselves in libcgroup.h say that the DB> structs will need to be extended with more fields as cgroups gets DB> more capabilities. Adding fields to a struct will change the ABI DB> unless care is taken to provide for extensibility. The DB> cpu_controller and cg_group structs here are of particular concern
I think that these structures in the header file are all cruft. There are access functions to eliminate the need to know the format of the data structures. I'm not aware of any public users, but there is already a maintained set of deprecated APIs for some of this. IMHO, this is indicative of the current process of evolution that libcgroup is struggling with.
We intend to do that so that we have stable API.
I think the problems run a little deeper than that, and that they have been comprehensively discussed on libcg-devel already. I definitely think that a cgroup abstraction is beneficial, but I don't think it's in a terribly useful spot at the moment.
We've generally honoured all requests on the mailing list and request for new API. I've generally seen you help review code or find problems, we also encourage code contributions to fix any problems you see.
Having monitored and participated with the libcgroup development pretty closely thus far, I'd say that going forward with an internal mini-implementation is a sane thing to do at this point to get working cgroup support in a timely fashion.
And this would be lesser effort than contributing code to libcgroup to fix the problems you see? Library design is hard and were fixing things, getting them to work together. Libcgroup is an open project and no-one stops you from fixing problems you see or adding new features or even rewriting the whole thing. When the libcgroup API and
functionality settles a bit, it should be easy to start using it.
I would prefer we use libcgroup now, instead of rewriting to use libcgroup later. It seems like a lose-lose situation to me. -- Balbir

Daniel P. Berrange wrote:
On Wed, Sep 10, 2008 at 03:10:53PM -0700, Balbir Singh wrote:
Daniel Veillard wrote:
On Thu, Sep 04, 2008 at 12:50:35PM -0700, Dan Smith wrote:
DV> I just checked the libcgroup heaer file available under Fedora 9 DV> and I'm a bit afraid of the dependancy. They expose a lot of DV> structure, some clearly incomplete, which means liking to it in its DV> current state may turn into a problematic dependency.
Could you please elaborate on the structure exposed. We are more then willing to fix any incomplete information you are concerned about
I believe Daniel is refering to the struct's in your public header file. The embedded comments themselves in libcgroup.h say that the structs will need to be extended with more fields as cgroups gets more capabilities. Adding fields to a struct will change the ABI unless care is taken to provide for extensibility. The cpu_controller and cg_group structs here are of particular concern
Thanks for pin-pointing them, most of these structures are used by the configuration subsystem and not part of the core API or wrappers. Most of this stuff is going away and will not affect a single export cgroup API. As an interim arrangement we can move them to a configuration specific header. The configuration subsystem is being rewritten to reuse the existing API. It is a user of the API, not a provider. -- Balbir

On Thu, Sep 11, 2008 at 02:42:46PM +0100, Daniel P. Berrange wrote:
On Wed, Sep 10, 2008 at 03:10:53PM -0700, Balbir Singh wrote:
Daniel Veillard wrote:
On Thu, Sep 04, 2008 at 12:50:35PM -0700, Dan Smith wrote:
DV> I just checked the libcgroup heaer file available under Fedora 9 DV> and I'm a bit afraid of the dependancy. They expose a lot of DV> structure, some clearly incomplete, which means liking to it in its DV> current state may turn into a problematic dependency.
Could you please elaborate on the structure exposed. We are more then willing to fix any incomplete information you are concerned about
I believe Daniel is refering to the struct's in your public header file. The embedded comments themselves in libcgroup.h say that the structs will need to be extended with more fields as cgroups gets more capabilities. Adding fields to a struct will change the ABI unless care is taken to provide for extensibility. The cpu_controller and cg_group structs here are of particular concern
/* * These data structures are heavily controller dependent, which means * any changes (additions/removal) of configuration items would have to * be reflected in this library. We might implement a plugin * infrastructure, so that we can deal with such changes with ease. */
struct cpu_controller { /*TODO: Add the cpu.usage file here, also need to automate this.*/ char *shares; /* Having strings helps us write them to files */ /* * XX: No it does not make a difference. It requires a fprintf anyway * so it needs the qualifier. */ };
struct cg_group { char *name; uid_t tasks_uid; gid_t tasks_gid; uid_t admin_uid; gid_t admin_gid; struct cpu_controller cpu_config; };
/* * A singly linked list suffices since we don't expect too many mount points */ struct mount_table { char *options; /* Name(s) of the controller */ char *mount_point; /* The place where the controller is mounted */ struct mount_table *next; };
/* * Maintain a list of all group names. These will be used during cleanup */ struct list_of_names { char *name; struct list_of_names *next; };
Hi Daniel, I have patches posted to the list proposing to remove all these exposed structures[1]. I am reposting those patches after updating them, so that they can get reviewed and be included. These were structures which were leftover from the original subsystem specific changes. Since then we have moved to a generic design. thanks, -- regards, Dhaval [1] http://sourceforge.net/mailarchive/message.php?msg_id=20080813105716.GE19247...

On Thu, Aug 28, 2008 at 11:56:58PM +0100, Daniel P. Berrange wrote:
When I wrote the private root filesystem stuff for LXC (which I just committed) I noted that we couldn't actually make this secure, because someone inside the chroot can just 'mknod' and access the host devices.
What I completely forgot was that cgroups as of 2.6.26 has device ACLs If we place every container in a cgroup (which was planned anyway), then we can trivially prevent containers accessing host devices
One time setup
mount -t cgroups /dev/cgroups mkdir /dev/cgroups/libvirt mkdir /dev/cgroups/libvirt/lxc
For each new container 'NAME'
mkdir /dev/cgroups/libvirt/lxc/{NAME} echo "a" > /dev/cgroups/libvirt/lxc/{NAME}/devices.deny echo "c 1:3 rwm" > /dev/cgroups/libvirt/lxc/{NAME}/devices.allow echo "c 1:5 rwm" > /dev/cgroups/libvirt/lxc/{NAME}/devices.allow echo "c 1:7 rwm" > /dev/cgroups/libvirt/lxc/{NAME}/devices.allow echo "c 5:1 rwm" > /dev/cgroups/libvirt/lxc/{NAME}/devices.allow echo "c 1:8 rwm" > /dev/cgroups/libvirt/lxc/{NAME}/devices.allow echo "c 1:9 rwm" > /dev/cgroups/libvirt/lxc/{NAME}/devices.allow
This denies all devices, and then allows null, zero, full, console, random and urandom. Allowing use of 'random' is debatable.
Sounds fine to me, the first 4 sounds unavoidable, for (u)random I guess that will be needed for most setup but there we are at the limit of libvirt, i.e. start to step on the policies for the guests
The 'devpts' namespace stuff is also needed to provide private PTYs. The 'user' namespace stuff is needed to prevent an unprivileged user in the host OS from killing off processes with same UID inside the container. There looks to be active patchsets for both of these being discussed, so we're getting close to having a genuinely useful container based virt driver with LXC
Which is something I would love to see for Fedora 10, possibly as an update. Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/
participants (5)
-
Balbir Singh
-
Dan Smith
-
Daniel P. Berrange
-
Daniel Veillard
-
Dhaval Giani