Daniel P. Berrange wrote:
On Tue, Jun 23, 2009 at 07:45:34PM -0700, Casey Schaufler wrote:
> Serge E. Hallyn wrote:
>
>> Quoting Daniel P. Berrange (berrange(a)redhat.com):
>>
>>
>>> This patch updates the LXC driver to make use of libcap-ng for managing
>>> process capabilities. Previously Ryota Ozaki had provided code to remove
>>> the CAP_SYS_BOOT capability inside the container, preventing host reboots.
>>> In addition to that one, I believe we should be removing the ability to
>>> load kernel modules, change the system clock and change audit/MAC settings.
>>> So this patch also clears the following:
>>>
>>> CAP_SYS_MODULE, /* No kernel module loading */
>>> CAP_SYS_TIME, /* No changing the clock */
>>> CAP_AUDIT_CONTROL, /* No messing with auditing */
>>> CAP_AUDIT_WRITE, /* No messing with auditing */
>>> CAP_MAC_ADMIN, /* No messing with LSM */
>>> CAP_MAC_OVERRIDE, /* No messing with LSM */
>>>
>>>
> What is going to run inside your container? Turning off the MAC
> capabilities can seriously limit the programs that can run inside
> it. If you can't drop CAP_DAC_OVERRIDE or CAP_KILL it's unlikely
> that it makes sense to drop CAP_MAC_OVERRIDE. Similarly, if you
> can't drop CAP_FOWNER or CAP_CHOWN you'll probably be ill advised
> to forgo CAP_MAC_ADMIN.
>
The containers are all run with
CLONE_NEWPID|CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWUSER|CLONE_NEWNET
and each has a private filesystem setup. Thus there is no need to
restrict things like CAP_FOWNER/CHOWN, since the only files the
container process can access are those within its private FS.
Likewise CAP_KILL is ok, since CLONE_NEWPID ensures the container
can only see its own processes, and none of those from the host.
Given those CLONE_* flags being set, is it safe to allow a
container to have CAP_MAC_ADMIN/CAP_MAC_OVERRIDE capabilities?
I was concerned that those capabilities may still allow the
container to perform changes that would impact MAC settings
for the host as a whole, and not be confined. If that's not
the case, then I will change the patch to not clear those
capabilities.

CAP_MAC_OVERRIDE deals with enforcement of the policy itself,
like CAP_DAC_OVERRIDE. It does not have "host as a whole"
implications. CAP_MAC_ADMIN on the other hand allows you (on
Smack at least) to change the configuration (e.g. access rules)
and definitely does have global implications.
It seems that you should be OK to allow CAP_MAC_OVERRIDE in your
scheme but not CAP_MAC_ADMIN. I think that is consistent with the
SELinux use of CAP_MAC_ADMIN (I don't think they're using
CAP_MAC_OVERRIDE at all) but you'll want to confirm that.
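
To make that concrete, here is a minimal sketch of the adjusted drop list, assuming the container init still holds CAP_SETPCAP when it runs. It is not the patch itself: the patch uses libcap-ng, while this sketch uses the raw prctl(PR_CAPBSET_DROP) interface so that it needs only the kernel headers, and it touches only the capability bounding set. The names container_drop_caps, cap_is_dropped and drop_container_caps are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <sys/prctl.h>
#include <linux/capability.h>

/*
 * Capabilities removed from the container (illustrative list).
 * CAP_SYS_BOOT comes from the earlier change; the rest follow the
 * list in the patch, minus CAP_MAC_OVERRIDE, which is retained
 * because it only bypasses policy enforcement for the process.
 */
static const int container_drop_caps[] = {
    CAP_SYS_BOOT,      /* No host reboots */
    CAP_SYS_MODULE,    /* No kernel module loading */
    CAP_SYS_TIME,      /* No changing the clock */
    CAP_AUDIT_CONTROL, /* No messing with auditing */
    CAP_AUDIT_WRITE,   /* No messing with auditing */
    CAP_MAC_ADMIN,     /* No changing the LSM configuration */
};

#define N_DROP_CAPS (sizeof(container_drop_caps) / sizeof(container_drop_caps[0]))

/* Return 1 if 'cap' is on the drop list above, 0 otherwise. */
static int cap_is_dropped(int cap)
{
    size_t i;

    for (i = 0; i < N_DROP_CAPS; i++) {
        if (container_drop_caps[i] == cap)
            return 1;
    }
    return 0;
}

/*
 * Remove every listed capability from the calling thread's bounding
 * set. Returns 0 on success, -1 on the first failure with errno set
 * by prctl() (EPERM if the caller lacks CAP_SETPCAP).
 */
static int drop_container_caps(void)
{
    size_t i;

    for (i = 0; i < N_DROP_CAPS; i++) {
        if (prctl(PR_CAPBSET_DROP, container_drop_caps[i], 0, 0, 0) < 0)
            return -1;
    }
    return 0;
}
```

The container init would call drop_container_caps() once, before exec'ing the container's own init process, and treat a -1 return as fatal. Dropping from the bounding set alone does not clear capabilities already in the permitted or effective sets, so a real implementation would need to handle those as well.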