
On 12.12.2014 at 10:33, Daniel P. Berrange wrote:
On Thu, Dec 11, 2014 at 10:06:40PM +0100, Richard Weinberger wrote:
On Tue, Dec 9, 2014 at 10:47 AM, Cédric Bosdonnat <cbosdonnat@suse.com> wrote:
Some programs want to change values of the network interface configuration under the /proc/sys/net/ipv[46] folders. Giving RW access to them allows wicked to work on openSUSE 13.2+.
In order to mount those folders RW but keep the rest of /proc/sys RO, we add temporary mounts for these folders before bind-mounting /proc/sys. Those mounts will be skipped if the container doesn't have its own network namespace.
It may happen that one of the temporary mounts in the /proc filesystem isn't available due to a missing kernel feature. We should not fail in that case.
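The ordering described above can be sketched roughly as follows. This is only a minimal Python illustration of the decision logic, not libvirt's actual C code: the function name `plan_proc_sys_mounts`, the `(action, path, mode)` tuples and the `exists` callback are all hypothetical, and the real patch may handle the temporary mounts differently.

```python
from typing import Callable, List, Tuple

# Sub-trees that should stay writable, as named in the patch description.
RW_SUBTREES = ["/proc/sys/net/ipv4", "/proc/sys/net/ipv6"]

MountStep = Tuple[str, str, str]  # (action, path, mode), hypothetical layout


def plan_proc_sys_mounts(has_private_netns: bool,
                         exists: Callable[[str], bool]) -> List[MountStep]:
    """Return the ordered mount steps for /proc/sys (hypothetical helper)."""
    steps: List[MountStep] = []
    if has_private_netns:
        for path in RW_SUBTREES:
            # A missing directory (for example a kernel without IPv6)
            # is skipped instead of being treated as an error.
            if exists(path):
                steps.append(("bind", path, "rw"))
    # /proc/sys itself is bind-mounted read-only after the RW mounts.
    steps.append(("bind", "/proc/sys", "ro"))
    return steps
```

Passing `os.path.isdir` as the `exists` callback would probe the running system; a stub callback keeps the ordering logic testable without touching any mounts.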
IMHO we should drop the read-only /proc mount completely. The idea behind having a read-only /proc was to make a container less insecure because user namespaces did not exist yet.
Yep, read-only /proc was a (failed) attempt to predict the future - we originally expected we'd need that even when user namespaces arrived, but of course in the end it was a waste of time.
Correct. Let's reduce this waste of time and not add more code. :-)
Now that user namespaces are mainline and considered stable, we should start dropping such hacks instead of adding more of them.
I'm trying to think if there are any backwards compatibility problems if we got rid of read-only /proc, but I can't imagine any app out there is actively checking for a read-only /proc, so we'd probably be safe to just switch it to read-write.
Same here. I'd be astonished if an application broke because /proc became rw. BTW: While we are here, let's make /sys also rw. Again, if an application can do bad things, this is a plain kernel bug.
As a consequence, libvirt has to decide what kind of containers it wants to support. IMHO the only sane way is to enforce user namespaces to provide reasonable isolation. If a user can do bad things with a read-write /proc, it needs to be fixed in the kernel and not in libvirt.
Containers without user namespaces and a root within are insecure and broken by design.
Well, the addition of MAC can make them secure, but of course if you have MAC, there's again no need to make the /proc mount read-only.
The MAC policy has to be *perfect* and has to use whitelisting. Also, if you make your MAC too restrictive you'll break certain programs. You need more than just denying access to some magic files in /sys and /proc. If you deny mount(2), for example, many applications will break, most notably systemd.

I propose the following:
a) Make /sys and /proc read-write.
b) If one creates a container without any uid/gid mapping, print a big fat warning that such a container is not suitable for hostile guests.

If the user has a specific use case where he can trust all guests, fine. But we have to document it clearly. Maybe a new config flag a la <i_know_what_i_m_doing/> would help too. ;-)

Thanks,
//richard
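As a rough illustration of point b), the check for a missing uid/gid mapping could look something like the sketch below. It relies on the documented format of /proc/[pid]/uid_map and gid_map (three columns: ID inside the namespace, ID outside, range length), where the initial user namespace shows a single identity entry covering 4294967295 IDs. The helper names and the warning text are hypothetical, and this is not how libvirt itself decides anything.

```python
from typing import List, Tuple

# Range length the kernel reports for the initial user namespace.
FULL_RANGE = 4294967295


def parse_id_map(text: str) -> List[Tuple[int, int, int]]:
    """Parse uid_map/gid_map content into (inside, outside, length) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            inside, outside, length = (int(f) for f in fields)
            entries.append((inside, outside, length))
    return entries


def is_unmapped(text: str) -> bool:
    """True if the map is the identity over the full ID range."""
    return parse_id_map(text) == [(0, 0, FULL_RANGE)]


def mapping_warning(uid_map: str, gid_map: str) -> str:
    """Return a warning string if either map is unmapped, else ''."""
    if is_unmapped(uid_map) or is_unmapped(gid_map):
        return "WARNING: no uid/gid mapping; not suitable for hostile guests"
    return ""
```

Inside a container one could feed it the contents of /proc/self/uid_map and /proc/self/gid_map; an empty result would mean that both IDs are remapped.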