Quoting Daniel P. Berrange (berrange(a)redhat.com):
On Tue, Jun 02, 2009 at 11:15:58AM +0900, Ryota Ozaki wrote:
> On Mon, Jun 1, 2009 at 6:24 PM, Daniel P. Berrange <berrange(a)redhat.com> wrote:
> > On Fri, May 29, 2009 at 04:42:54PM -0500, Serge E. Hallyn wrote:
> >> Quoting Ryota Ozaki (ozaki.ryota(a)gmail.com):
> >> > On Fri, May 29, 2009 at 9:20 PM, Daniel Veillard <veillard(a)redhat.com> wrote:
> >>
> >> Hmm, yeah but note that often userspace is out of date with respect to
> >> "recent" new kernel-related defines. I do a lot of testing on a rhel
> >> 5.3 partition with spanking-new kernels, so rare is the time that I
> >> don't have to do
> >>
> >> #ifndef PR_CAPBSET_DROP
> >> #define PR_CAPBSET_DROP 24
> >> #endif
> >>
> >> and same for clone flags (CLONE_NEWIPC), securebits, capabilities,
> >> etc.
> >>
> >> So if the prctl(PR_CAPBSET_DROP) returns -ENOSYS then absolutely I
> >> agree,
> >
> > This makes sense as a way to deal with this problem, since it matches
> > what we already do with the various CLONE_XXX & MOUNT_XXX flags too.
>
> That convinces me too.
>
> > NB, in the not too distant future I'm going to submit code for making
> > the libvirtd daemon drop a lot of its capabilities, including clearing
> > the bounding set to prevent inheritance by any child processes except
> > in required circumstances. For that I'll likely use libcap-ng so we
> > will be able to stop calling prctl() directly in the LXC driver.
>
> Oh, good. Will the new facility allow us to specify which capabilities
> are dropped in an XML file or somewhere? Or will the set of caps be
> hard-coded?

That's TBD. My current proof-of-concept code restricts the set
of capabilities to just those required by the libvirtd daemon itself.
LXC is an interesting problem, because inside the container, you could
argue that it should just get all capabilities. This would assume the
container prevented the capabilities from impacting things outside
the container.

Yeah, once again, that's waiting on us to complete the user namespaces
implementation.

At that point, all capabilities granted within a container will be
limited to resources owned by the container. For instance,
CAP_DAC_OVERRIDE will be restricted to the container's own files.

We're aware it needs to get done, and hopefully later in the year
we'll take a stab at it, but lack of time doesn't allow it right now.
-serge
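
For reference, the fallback-define approach discussed in this thread could
look roughly like the sketch below. It is not code from libvirt: the helper
name drop_bounding_cap() and its return convention are hypothetical. Treating
EINVAL as "unsupported" alongside the ENOSYS case mentioned above is an
assumption about how an older kernel rejects an unknown prctl() option, and
EINVAL can also mean the capability number itself is invalid.

```c
#include <errno.h>
#include <sys/prctl.h>
#include <linux/capability.h>

/* Older userspace headers may lack this define; the value is the one
 * quoted in the thread. */
#ifndef PR_CAPBSET_DROP
#define PR_CAPBSET_DROP 24
#endif

/*
 * Hypothetical helper: try to remove `cap` from the calling thread's
 * capability bounding set.
 *
 * Returns  0 if the capability was dropped,
 *          1 if the kernel does not appear to support the operation
 *            (assumption: reported as ENOSYS or EINVAL),
 *         -1 on any other failure (for example EPERM without CAP_SETPCAP).
 */
int drop_bounding_cap(unsigned long cap)
{
    if (prctl(PR_CAPBSET_DROP, cap, 0, 0, 0) == 0)
        return 0;

    if (errno == ENOSYS || errno == EINVAL)
        return 1;

    return -1;
}
```

A caller could then treat a return value of 1 as "carry on without dropping",
which is the same way missing CLONE_XXX flags are tolerated, and treat -1 as a
real error.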