[libvirt] [PATCH] Fix a compilation problem with LXC drop capabilities

The lxcContainerDropCapabilities() function requires PR_CAPBSET_DROP to be defined in order to compile, but it may not be defined in older kernels. So I made the compilation of the core of the function conditional, raise an error but still return 0 to not make the container initialization fail. But I'm unsure, should we just fail and return -1 if we can't drop capabilities instead ? Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/

On Fri, May 29, 2009 at 9:20 PM, Daniel Veillard <veillard@redhat.com> wrote:
The lxcContainerDropCapabilities() function requires PR_CAPBSET_DROP to be defined in order to compile, but it may not be defined in older kernels. So I made the compilation of the core of the function conditional, raise an error but still return 0 to not make the container initialization fail. But I'm unsure, should we just fail and return -1 if we can't drop capabilities instead ?
I think it depends on applications. AFAIK, libvirt intends to support two types of applications; application workload isolation and virtual private servers. In the latter case, we MUST drop the capability and if it fails we have to fail booting a container as well. OTOH, in the former case, we might not need to fail booting. Nonetheless, I agree with the patch because old kernels that don't support PR_CAPBSET_DROP (they would be 2.6.24 or earlier) don't have enough facilities to support VPSs (e.g., they lacks sysfs, devpts, etc.). Therefore, with the old kernels we don't need to care much about the dropping-failed-but-booting-success case. ozaki-r
Daniel
-- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/
-- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

Quoting Ryota Ozaki (ozaki.ryota@gmail.com):
On Fri, May 29, 2009 at 9:20 PM, Daniel Veillard <veillard@redhat.com> wrote:
The lxcContainerDropCapabilities() function requires PR_CAPBSET_DROP to be defined in order to compile, but it may not be defined in older kernels. So I made the compilation of the core of the function conditional, raise an error but still return 0 to not make the container initialization fail. But I'm unsure, should we just fail and return -1 if we can't drop capabilities instead ?
I think it depends on applications. AFAIK, libvirt intends to support two types of applications; application workload isolation and virtual private servers. In the latter case, we MUST drop the capability and if it fails we have to fail booting a container as well. OTOH, in the former case, we might not need to fail booting.
Nonetheless, I agree with the patch because old kernels that don't support PR_CAPBSET_DROP (they would be 2.6.24 or earlier) don't have enough facilities to support VPSs (e.g., they lacks sysfs, devpts, etc.). Therefore, with the old kernels we don't need to care much about the dropping-failed-but-booting-success case.
Hmm, yeah but note that often userspace is out of date with respect to "recent" new kernel-related defines. I do a lot of testing on a rhel 5.3 partition with spanking-new kernels, so rare is the time that I don't have to do #ifndef PR_CAPBSET_DROP #define PR_CAPBSET_DROP 24 #endif and same for clone flags (CLONE_NEWIPC), securebits, capabilities, etc. So if the prctl(PR_CAPBSET_DROP) returns -ENOSYS then absolutely I agree, but the patch just does #ifdef PR_CAPBSET_DROP which seems wrong. -serge

On Fri, May 29, 2009 at 04:42:54PM -0500, Serge E. Hallyn wrote:
Quoting Ryota Ozaki (ozaki.ryota@gmail.com):
On Fri, May 29, 2009 at 9:20 PM, Daniel Veillard <veillard@redhat.com> wrote:
The lxcContainerDropCapabilities() function requires PR_CAPBSET_DROP to be defined in order to compile, but it may not be defined in older kernels. So I made the compilation of the core of the function conditional, raise an error but still return 0 to not make the container initialization fail. But I'm unsure, should we just fail and return -1 if we can't drop capabilities instead ?
I think it depends on applications. AFAIK, libvirt intends to support two types of applications; application workload isolation and virtual private servers. In the latter case, we MUST drop the capability and if it fails we have to fail booting a container as well. OTOH, in the former case, we might not need to fail booting.
Nonetheless, I agree with the patch because old kernels that don't support PR_CAPBSET_DROP (they would be 2.6.24 or earlier) don't have enough facilities to support VPSs (e.g., they lacks sysfs, devpts, etc.). Therefore, with the old kernels we don't need to care much about the dropping-failed-but-booting-success case.
Hmm, yeah but note that often userspace is out of date with respect to "recent" new kernel-related defines. I do a lot of testing on a rhel 5.3 partition with spanking-new kernels, so rare is the time that I don't have to do
#ifndef PR_CAPBSET_DROP #define PR_CAPBSET_DROP 24 #endif
and same for clone flags (CLONE_NEWIPC), securebits, capabilities, etc.
So if the prctl(PR_CAPBSET_DROP) returns -ENOSYS then absolutely I agree, but the patch just does
#ifdef PR_CAPBSET_DROP
which seems wrong.
Well, if you have a better patch to suggest, fire :-) Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ daniel@veillard.com | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/

Quoting Daniel Veillard (veillard@redhat.com):
On Fri, May 29, 2009 at 04:42:54PM -0500, Serge E. Hallyn wrote:
Quoting Ryota Ozaki (ozaki.ryota@gmail.com):
On Fri, May 29, 2009 at 9:20 PM, Daniel Veillard <veillard@redhat.com> wrote:
The lxcContainerDropCapabilities() function requires PR_CAPBSET_DROP to be defined in order to compile, but it may not be defined in older kernels. So I made the compilation of the core of the function conditional, raise an error but still return 0 to not make the container initialization fail. But I'm unsure, should we just fail and return -1 if we can't drop capabilities instead ?
I think it depends on applications. AFAIK, libvirt intends to support two types of applications; application workload isolation and virtual private servers. In the latter case, we MUST drop the capability and if it fails we have to fail booting a container as well. OTOH, in the former case, we might not need to fail booting.
Nonetheless, I agree with the patch because old kernels that don't support PR_CAPBSET_DROP (they would be 2.6.24 or earlier) don't have enough facilities to support VPSs (e.g., they lacks sysfs, devpts, etc.). Therefore, with the old kernels we don't need to care much about the dropping-failed-but-booting-success case.
Hmm, yeah but note that often userspace is out of date with respect to "recent" new kernel-related defines. I do a lot of testing on a rhel 5.3 partition with spanking-new kernels, so rare is the time that I don't have to do
#ifndef PR_CAPBSET_DROP #define PR_CAPBSET_DROP 24 #endif
and same for clone flags (CLONE_NEWIPC), securebits, capabilities, etc.
So if the prctl(PR_CAPBSET_DROP) returns -ENOSYS then absolutely I agree, but the patch just does
#ifdef PR_CAPBSET_DROP
which seems wrong.
Well, if you have a better patch to suggest, fire :-)
No, I'm not sure how it should be handled. Some ugly autoconf thing? Maybe PR_CAPBSET_DROP should get hand-defined, and the code at runtime handles its failure? I'll take a look next week. Of course in the meantime this patch is better than nothing :) thanks, -serge

On Fri, May 29, 2009 at 04:42:54PM -0500, Serge E. Hallyn wrote:
Quoting Ryota Ozaki (ozaki.ryota@gmail.com):
On Fri, May 29, 2009 at 9:20 PM, Daniel Veillard <veillard@redhat.com> wrote:
Hmm, yeah but note that often userspace is out of date with respect to "recent" new kernel-related defines. I do a lot of testing on a rhel 5.3 partition with spanking-new kernels, so rare is the time that I don't have to do
#ifndef PR_CAPBSET_DROP #define PR_CAPBSET_DROP 24 #endif
and same for clone flags (CLONE_NEWIPC), securebits, capabilities, etc.
So if the prctl(PR_CAPBSET_DROP) returns -ENOSYS then absolutely I agree,
This makes sense as a way to deal with this problem, since it matches what we already do with the various CLONE_XXX & MOUNT_XXX flags too. NB, in the not too distant future I'm going to submit code for making the libvirtd daemon drop alot of its capabilities, including clearing the bounding set to prevent inheritance by any child processes except in required circumstances. For that I'll likely use libcap-ng so we will be able to stop callin prctl() directly in the LXC driver. Regards, Daniel [1] libcap-ng isn't technically yet announced to the world, but it'll appear real soon... -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On Mon, Jun 1, 2009 at 6:24 PM, Daniel P. Berrange <berrange@redhat.com> wrote:
On Fri, May 29, 2009 at 04:42:54PM -0500, Serge E. Hallyn wrote:
Quoting Ryota Ozaki (ozaki.ryota@gmail.com):
On Fri, May 29, 2009 at 9:20 PM, Daniel Veillard <veillard@redhat.com> wrote:
Hmm, yeah but note that often userspace is out of date with respect to "recent" new kernel-related defines. I do a lot of testing on a rhel 5.3 partition with spanking-new kernels, so rare is the time that I don't have to do
#ifndef PR_CAPBSET_DROP #define PR_CAPBSET_DROP 24 #endif
and same for clone flags (CLONE_NEWIPC), securebits, capabilities, etc.
So if the prctl(PR_CAPBSET_DROP) returns -ENOSYS then absolutely I agree,
This makes sense as a way to deal with this problem, since it matches what we already do with the various CLONE_XXX & MOUNT_XXX flags too.
That convinces me too.
NB, in the not too distant future I'm going to submit code for making the libvirtd daemon drop alot of its capabilities, including clearing the bounding set to prevent inheritance by any child processes except in required circumstances. For that I'll likely use libcap-ng so we will be able to stop callin prctl() directly in the LXC driver.
Oh, good. Will the new facility allow us to specify which capabilities to be dropped in a XML file or somewhere? Or the set of caps will be hard-coded? ozaki-r
Regards, Daniel
[1] libcap-ng isn't technically yet announced to the world, but it'll appear real soon... -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On Tue, Jun 02, 2009 at 11:15:58AM +0900, Ryota Ozaki wrote:
On Mon, Jun 1, 2009 at 6:24 PM, Daniel P. Berrange <berrange@redhat.com> wrote:
On Fri, May 29, 2009 at 04:42:54PM -0500, Serge E. Hallyn wrote:
Quoting Ryota Ozaki (ozaki.ryota@gmail.com):
On Fri, May 29, 2009 at 9:20 PM, Daniel Veillard <veillard@redhat.com> wrote:
Hmm, yeah but note that often userspace is out of date with respect to "recent" new kernel-related defines. I do a lot of testing on a rhel 5.3 partition with spanking-new kernels, so rare is the time that I don't have to do
#ifndef PR_CAPBSET_DROP #define PR_CAPBSET_DROP 24 #endif
and same for clone flags (CLONE_NEWIPC), securebits, capabilities, etc.
So if the prctl(PR_CAPBSET_DROP) returns -ENOSYS then absolutely I agree,
This makes sense as a way to deal with this problem, since it matches what we already do with the various CLONE_XXX & MOUNT_XXX flags too.
That convinces me too.
NB, in the not too distant future I'm going to submit code for making the libvirtd daemon drop alot of its capabilities, including clearing the bounding set to prevent inheritance by any child processes except in required circumstances. For that I'll likely use libcap-ng so we will be able to stop callin prctl() directly in the LXC driver.
Oh, good. Will the new facility allow us to specify which capabilities to be dropped in a XML file or somewhere? Or the set of caps will be hard-coded?
That's TBD. My currently proof of concept code is to restrict the set of capabilities to just those required by the libvirtd daemon itself. LXC is an interesting problem, because inside the container, you could argue that it should just get all capabilities. This would assume the container prevented use of the caapbilities impacting things outside the container. Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

Quoting Daniel P. Berrange (berrange@redhat.com):
On Tue, Jun 02, 2009 at 11:15:58AM +0900, Ryota Ozaki wrote:
On Mon, Jun 1, 2009 at 6:24 PM, Daniel P. Berrange <berrange@redhat.com> wrote:
On Fri, May 29, 2009 at 04:42:54PM -0500, Serge E. Hallyn wrote:
Quoting Ryota Ozaki (ozaki.ryota@gmail.com):
On Fri, May 29, 2009 at 9:20 PM, Daniel Veillard <veillard@redhat.com> wrote:
Hmm, yeah but note that often userspace is out of date with respect to "recent" new kernel-related defines. I do a lot of testing on a rhel 5.3 partition with spanking-new kernels, so rare is the time that I don't have to do
#ifndef PR_CAPBSET_DROP #define PR_CAPBSET_DROP 24 #endif
and same for clone flags (CLONE_NEWIPC), securebits, capabilities, etc.
So if the prctl(PR_CAPBSET_DROP) returns -ENOSYS then absolutely I agree,
This makes sense as a way to deal with this problem, since it matches what we already do with the various CLONE_XXX & MOUNT_XXX flags too.
That convinces me too.
NB, in the not too distant future I'm going to submit code for making the libvirtd daemon drop alot of its capabilities, including clearing the bounding set to prevent inheritance by any child processes except in required circumstances. For that I'll likely use libcap-ng so we will be able to stop callin prctl() directly in the LXC driver.
Oh, good. Will the new facility allow us to specify which capabilities to be dropped in a XML file or somewhere? Or the set of caps will be hard-coded?
That's TBD. My currently proof of concept code is to restrict the set of capabilities to just those required by the libvirtd daemon itself. LXC is an interesting problem, because inside the container, you could argue that it should just get all capabilities. This would assume the container prevented use of the caapbilities impacting things outside the container.
Yeah, once again, that's waiting on us to complete the user namespaces implementation. At that point, all capabilities granted within a container will be limited to resources owned by the container. For instance, CAP_DAC_OVERRIDE will be restricted to the container's own files. We're aware it needs to get done, and hopefully later in the year we'll take a stab at it, but lack of time doesn't allow it right now. -serge

On Mon, Jun 01, 2009 at 10:24:12AM +0100, Daniel P. Berrange wrote:
NB, in the not too distant future I'm going to submit code for making the libvirtd daemon drop alot of its capabilities, including clearing the bounding set to prevent inheritance by any child processes except in required circumstances. For that I'll likely use libcap-ng so we will be able to stop callin prctl() directly in the LXC driver.
[1] libcap-ng isn't technically yet announced to the world, but it'll appear real soon...
FYI, for those who use Linux capabilities, this is now available to the world and well worth a look if you're battling with old libcap http://people.redhat.com/sgrubb/libcap-ng/index.html Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

On Fri, May 29, 2009 at 02:20:04PM +0200, Daniel Veillard wrote:
The lxcContainerDropCapabilities() function requires PR_CAPBSET_DROP to be defined in order to compile, but it may not be defined in older kernels. So I made the compilation of the core of the function conditional, raise an error but still return 0 to not make the container initialization fail. But I'm unsure, should we just fail and return -1 if we can't drop capabilities instead ?
I think that lxcError() call should just be a VIR_WARN message here, since that mirrors what we do in a few other cases such as lack of /dev/pts private instances. The patch is good in general though Daniel
Index: src/lxc_container.c =================================================================== RCS file: /data/cvs/libxen/src/lxc_container.c,v retrieving revision 1.30 diff -u -u -p -r1.30 lxc_container.c --- src/lxc_container.c 13 May 2009 11:37:17 -0000 1.30 +++ src/lxc_container.c 29 May 2009 12:15:16 -0000 @@ -644,6 +644,7 @@ static int lxcContainerSetupMounts(virDo
static int lxcContainerDropCapabilities(virDomainDefPtr vmDef ATTRIBUTE_UNUSED) { +#ifdef PR_CAPBSET_DROP int i; const struct { int id; @@ -660,7 +661,10 @@ static int lxcContainerDropCapabilities( return -1; } } - +#else /* ! PR_CAPBSET_DROP */ + lxcError(NULL, NULL, VIR_ERR_INTERNAL_ERROR, + _("failed to drop capabilities PR_CAPBSET_DROP undefined")); +#endif return 0; }
-- Libvir-list mailing list Libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
-- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
participants (4)
-
Daniel P. Berrange
-
Daniel Veillard
-
Ryota Ozaki
-
Serge E. Hallyn