Re: [libvirt] [PATCHv2] lxc: give RW access to /proc/sys/net/ipv[46] to containers

Saturday, 13 December 2014

On 12.12.2014 at 10:33, Daniel P. Berrange wrote:
...
 On Thu, Dec 11, 2014 at 10:06:40PM +0100, Richard Weinberger wrote:
> On Tue, Dec 9, 2014 at 10:47 AM, Cédric Bosdonnat <cbosdonnat(a)suse.com> wrote:
>> Some programs want to change some values for the network interfaces
>> configuration in /proc/sys/net/ipv[46] folders. Giving RW access on them
>> allows wicked to work on openSUSE 13.2+.
>>
>> In order to mount those folders RW but keep the rest of /proc/sys RO,
>> we add temporary mounts for these folders before bind-mounting
>> /proc/sys. Those mounts will be skipped if the container doesn't have
>> its own network namespace.
>>
>> It may happen that one of the temporary mounts in /proc/ filesystem
>> isn't available due to a missing kernel feature. We need not to fail
>> in that case.
>
> IMHO we should drop the read-only /proc mount completely.
> The idea behind having a read-only /proc was to make a container less
> insecure because user namespaces did not exist yet.

 Yep, read-only /proc was a (failed) attempt to predict the future - we
 originally expected we'd need that even when user namespaces arrived,
 but of course in the end it was a waste of time. 
Correct. Let's reduce this waste of time and not add more code. :-)

...
> Now as user namespaces are mainline and considered stable we should
> start dropping such hacks instead of adding more of them.

 I'm trying to think if there are any backwards compatibility problems
 if we got rid of read-only /proc but I can't imagine any app out there
 is actively checked for a read-only /proc, so we'd probably be safe
 to just switch it read-write. 
Same here.
I'd be astonished if an application broke because /proc was made rw.
BTW: While we are here, let's make /sys/ also rw.
Again, if an application can do bad things, this is a plain kernel bug.
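
To see how a given container currently has /proc and /sys mounted, one can read the mount table from inside it. A minimal Python sketch, assuming the usual /proc/self/mounts line layout (device, mount point, fstype, options, ...); the helper name mount_mode is made up for illustration:

```python
def mount_mode(mounts_text, mount_point):
    """Return 'ro' or 'rw' for mount_point, or None if it is not mounted.

    mounts_text is the content of a mount table such as /proc/self/mounts.
    When several entries share a mount point, the last one listed wins,
    because later mounts are stacked on top of earlier ones.
    """
    mode = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected fields: device, mount point, fstype, options, dump, pass
        if len(fields) < 4 or fields[1] != mount_point:
            continue
        options = fields[3].split(",")
        mode = "ro" if "ro" in options else "rw"
    return mode
```

Inside a container, mount_mode(open("/proc/self/mounts").read(), "/proc/sys") would then tell whether /proc/sys is still read-only.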

...
> As consequence of that libvirt has to decide what kind of container it
> wants to support.
> IMHO the only sane way is to enforce user namespaces to provide
> reasonable isolation.
> If an user can do bad things with a read-write /proc it need to be
> fixed in the kernel
> and not in libvirt.
>
> Containers without user namespaces and a root within are insecure and
> broken by design.

 Well addition of MAC can make them secure, but of course if you have
 MAC, there's again no need to make /proc mount read-only. 
The MAC policy has to be *perfect* and has to use white listing.
Also if you make your MAC too restrictive you'll break certain programs.
You need more than just denying access to some magic files in /sys and /proc.
If you deny, for example, mount(2), many applications will break, most notably systemd.

I propose the following:
a) Make /sys and /proc read-write
b) If one creates a container without any uid/gid mapping, print a big fat warning
that such a container is not suitable for hostile guests.
If the user has a specific use case where he can trust all guests, fine. But we
have to document it clearly.
Maybe a new config flag a la <i_know_what_i_m_doing/> would help too. ;-)
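
To illustrate b), here is a rough Python sketch of the check, not actual libvirt code. It assumes the container's uid/gid mapping is expressed as an <idmap> element with <uid> and <gid> children in the domain XML; the function name idmap_warning and the warning text are invented for this example:

```python
import xml.etree.ElementTree as ET

# Hypothetical warning text, chosen for this example only.
WARNING = ("container has no uid/gid mapping: "
           "not suitable for hostile guests")


def idmap_warning(domain_xml):
    """Return WARNING if the domain XML lacks a uid or gid mapping, else None."""
    root = ET.fromstring(domain_xml)
    idmap = root.find("idmap")
    has_uid = idmap is not None and idmap.find("uid") is not None
    has_gid = idmap is not None and idmap.find("gid") is not None
    if has_uid and has_gid:
        return None
    return WARNING
```

A caller would print the returned string, if any, when the container is defined or started.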

Thanks,
//richard
