On 12.12.2014 at 10:33, Daniel P. Berrange wrote:
On Thu, Dec 11, 2014 at 10:06:40PM +0100, Richard Weinberger wrote:
> On Tue, Dec 9, 2014 at 10:47 AM, Cédric Bosdonnat <cbosdonnat(a)suse.com> wrote:
>> Some programs want to change some values for the network interfaces
>> configuration in /proc/sys/net/ipv[46] folders. Giving RW access on them
>> allows wicked to work on openSUSE 13.2+.
>>
>> In order to mount those folders RW but keep the rest of /proc/sys RO,
>> we add temporary mounts for these folders before bind-mounting
>> /proc/sys. Those mounts will be skipped if the container doesn't have
>> its own network namespace.
>>
>> It may happen that one of the temporary mounts in the /proc filesystem
>> isn't available due to a missing kernel feature. We must not fail
>> in that case.
>
> IMHO we should drop the read-only /proc mount completely.
> The idea behind having a read-only /proc was to make a container less
> insecure because user namespaces did not exist yet.
Yep, read-only /proc was a (failed) attempt to predict the future - we
originally expected we'd need that even when user namespaces arrived,
but of course in the end it was a waste of time.
Correct. Let's reduce this waste of time and not add more code. :-)
> Now that user namespaces are mainline and considered stable, we should
> start dropping such hacks instead of adding more of them.
I'm trying to think if there are any backwards compatibility problems
if we got rid of read-only /proc but I can't imagine any app out there
is actively checking for a read-only /proc, so we'd probably be safe
to just switch it read-write.
Same here.
I'd be astonished if an application broke because /proc became rw.
BTW: While we are here, let's make /sys/ also rw.
Again, if an application can do bad things, this is a plain kernel bug.
> As a consequence of that, libvirt has to decide what kind of container it
> wants to support.
> IMHO the only sane way is to enforce user namespaces to provide
> reasonable isolation.
> If a user can do bad things with a read-write /proc, it needs to be
> fixed in the kernel and not in libvirt.
>
> Containers without user namespaces and a root within are insecure and
> broken by design.
Well, the addition of MAC can make them secure, but of course if you have
MAC, there's again no need to make /proc mount read-only.
The MAC policy has to be *perfect* and has to use whitelisting.
Also, if you make your MAC too restrictive you'll break certain programs.
You need more than just denying access to some magic files in /sys and /proc.
If you deny, for example, mount(2), many applications will break, most notably systemd.
I propose the following:
a) Make /sys and /proc read-write
b) If one creates a container without any uid/gid mapping, print a big fat warning
that such a container is not suitable for hostile guests.
If the user has a specific use case where he can trust all guests, fine. But we
have to document it clearly.
Maybe a new config flag a la <i_know_what_i_m_doing/> would help too. ;-)
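
To make (b) a bit more concrete, here is a rough sketch of the check, not libvirt code. It assumes the input is text in the format of /proc/<pid>/uid_map and gid_map (inside ID, outside ID, count per line) and treats the single entry "0 0 4294967295", which the kernel shows for the initial user namespace, as "no mapping". The function names and the warning text are made up for illustration; libvirt itself would more likely look at its own container configuration.

```python
import warnings

# Range length the kernel reports in uid_map/gid_map for the initial
# user namespace, i.e. when no real mapping is in effect.
FULL_RANGE = 4294967295


def parse_id_map(text):
    """Parse uid_map/gid_map style text into (inside, outside, count) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        entries.append(tuple(int(f) for f in fields))
    return entries


def has_real_id_mapping(text):
    """Return True if the map is neither empty nor the identity map."""
    entries = parse_id_map(text)
    if not entries:
        return False
    return entries != [(0, 0, FULL_RANGE)]


def warn_if_unmapped(uid_map_text, gid_map_text):
    """Emit the 'big fat warning' unless both uid and gid are mapped.

    Returns True if a warning was emitted. Hypothetical helper.
    """
    if has_real_id_mapping(uid_map_text) and has_real_id_mapping(gid_map_text):
        return False
    warnings.warn(
        "container has no uid/gid mapping; it is not suitable for hostile guests"
    )
    return True
```

For example, warn_if_unmapped("0 100000 65536\n", "0 100000 65536\n") stays quiet, while passing the identity map for either argument triggers the warning.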
Thanks,
//richard