Re: [libvirt] [PATCH] lxc: Cleaning up mount setup

Thursday, 8 January 2015

On 08.01.2015 at 14:02, Daniel P. Berrange wrote:
...
 We have historically done a number of things with LXC that are
 somewhat questionable in retrospect

  1. Mounted /proc/sys read-only, but then mounted
     /proc/sys/net/ipv* read-write again
  2. Mounted /sys read only
  3. Mounted /sys/fs/cgroup/NNN/the/guest/dir to /sys/fs/cgroup/NNN
  4. FUSE mount on /proc/meminfo

 Items 1 & 2 are pointless as they offer no security benefit either
 with or without user namespaces. Without userns it is always insecure,
 with userns it is always secure, no matter what the mount state is. 
I agree. Thanks a lot for addressing this, Daniel!

...
 Item 3 is somewhat dubious, since /proc/self/cgroup paths for
 processes are now not visible at /sys/fs/cgroup. This really
 confuses systemd inside the container making it create a broken
 layout 
The question is, how to support systemd in containers?

As of now I'm not aware of a working concept.
With current libvirt it kind of works but recently I found a very nasty issue:
See: https://www.redhat.com/archives/libvir-list/2014-November/msg01090.html

Maybe it works with cgroup namespaces, i.e. such that systemd can mount cgroupfs
within the container in a secure way.
The current discussion can be found here: https://lkml.org/lkml/2015/1/7/150

As of now I have to drop all my systemd lxc guests and will replace them with
a non-systemd distro, which is very sad. :-(
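
To make the mismatch behind item 3 concrete, here is a minimal sketch in C. It assumes a cgroup v1 layout in which each controller is mounted at <root>/<controller>; the helper name cgroup_expected_dir is hypothetical and not part of libvirt. It maps one line of /proc/self/cgroup to the directory a process would expect to find under /sys/fs/cgroup. With the guest directory mounted over the controller root, that directory is likely to be missing inside the container.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * Parse one cgroup v1 line from /proc/self/cgroup, e.g.
 *   "4:memory:/machine/guest"
 * and write the directory where that cgroup would be expected, assuming
 * the layout <mount_root>/<controller><path>. Named hierarchies such as
 * "name=systemd" are not special-cased here.
 *
 * Returns 0 on success, -1 on a malformed line or a too-small buffer.
 */
static int cgroup_expected_dir(const char *line, const char *mount_root,
                               char *out, size_t outlen)
{
    const char *first = strchr(line, ':');
    if (!first)
        return -1;

    const char *second = strchr(first + 1, ':');
    if (!second)
        return -1;

    size_t ctrl_len = (size_t)(second - (first + 1));
    const char *path = second + 1;
    size_t path_len = strcspn(path, "\n"); /* drop the trailing newline */

    int n = snprintf(out, outlen, "%s/%.*s%.*s", mount_root,
                     (int)ctrl_len, first + 1, (int)path_len, path);
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    return 0;
}
```

A caller could stat() the resulting path: if it does not exist, /proc/self/cgroup and /sys/fs/cgroup disagree about where the process lives.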

...
 Item 4 is somewhat dubious, since we're only changing some of the
 fields in /proc/meminfo. It helps apps which blindly parse
 /proc/meminfo to determine free system resources they can consume.
 Those apps are broken even without containers being involved though,
 since any application must expect to be placed inside a cgroup with
 limited resources. Faking /proc/meminfo is a pretty limited workaround
 that just delays the inevitable fixing of such apps.
You mean that tools like free(1) have to be patched to also query
memory limits from cgroupfs?
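
As a rough sketch of what such a patched tool might do, the C helpers below combine MemTotal from /proc/meminfo with a cgroup limit given in bytes, which is the format of memory.limit_in_bytes in the cgroup v1 memory controller. The function names are hypothetical, and how the tool locates and reads its own cgroup's limit file is left out.

```c
#include <stdio.h>
#include <string.h>

/*
 * Extract the MemTotal value (in kB) from the text of /proc/meminfo.
 * Returns -1 if the field is not found or cannot be parsed.
 */
static long long meminfo_total_kb(const char *meminfo)
{
    const char *p = strstr(meminfo, "MemTotal:");
    long long kb;

    if (!p || sscanf(p, "MemTotal: %lld kB", &kb) != 1)
        return -1;
    return kb;
}

/*
 * Combine the host-wide total with a cgroup memory limit in bytes.
 * A limit at or above the host total (as seen when no limit is set)
 * is ignored, so the host value is reported unchanged.
 */
static long long effective_total_kb(long long host_total_kb,
                                    unsigned long long limit_bytes)
{
    unsigned long long limit_kb = limit_bytes / 1024;

    if (host_total_kb >= 0 && limit_kb < (unsigned long long)host_total_kb)
        return (long long)limit_kb;
    return host_total_kb;
}
```

With this approach the application sees the smaller of the two values without anything faking /proc/meminfo on its behalf.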

...
 The patch that follows just removes the items 1 & 2, but I'm thinking
 we should go further and remove items 3 & 4 too.

 Changing 4 in particular is certainly classed as a guest ABI
 change, though, so it is not something distros may wish to see when
 upgrading libvirt. There is scope to argue that 1-3 are guest ABI
 changes too.

 In full machine virt world, we deal with this using machine types.
 eg each new KVM version introduces a new machine type which models
 the guest ABI in a stable fashion. Guest machine types are fixed at
 time of first deployment. So when libvirt / KVM is upgraded, existing
 guests will not see any changes, but new guests will automatically
 get the new machine type.

 I'm thinking we might want to make use of this in LXC before making
 these changes. eg introduce a new machine 'libvirt-lxc-1' to
 represent the current guest mount setup and make sure all existing
 guests get that machine type. Then introduce a new machine type
 libvirt-lxc-2 that removes all this cruft, which new guests will
 get by default.

 Alternatively we could call them 'libvirt-lxc-compat-1' and
 'libvirt-lxc-bare-1' to give a clearer indication of their
 functional difference and version them separately in the future ? 
Can we have a new machine type which enforces user namespaces?
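
The defaulting rule proposed in the quoted text could look roughly like the C sketch below. The machine names 'libvirt-lxc-1' and 'libvirt-lxc-2' come from the proposal; the function names and the idea of keying the read-only mounts off the machine name are assumptions for illustration, not existing libvirt code.

```c
#include <stddef.h>
#include <string.h>

/* Machine type names as proposed in this thread; not an existing API. */
#define LXC_MACHINE_COMPAT "libvirt-lxc-1"
#define LXC_MACHINE_LATEST "libvirt-lxc-2"

/*
 * Hypothetical defaulting rule: an explicitly configured machine type is
 * never overridden; guests defined before the upgrade get the compat
 * type, and newly defined guests get the latest one.
 */
static const char *lxc_pick_machine(const char *configured,
                                    int is_existing_guest)
{
    if (configured && configured[0] != '\0')
        return configured;
    return is_existing_guest ? LXC_MACHINE_COMPAT : LXC_MACHINE_LATEST;
}

/*
 * Hypothetical policy check: only the compat machine keeps the historic
 * read-only /proc/sys and /sys mounts (items 1 & 2).
 */
static int lxc_machine_wants_readonly_sys(const char *machine)
{
    return strcmp(machine, LXC_MACHINE_COMPAT) == 0;
}
```

Keeping the decision in one place means existing guests see no change on upgrade, which is the property machine types provide in the KVM case.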

...
 Regards,
 Daniel

 Daniel P. Berrange (1):
   lxc: Stop mouning /proc and /sys read only

  src/lxc/lxc_container.c | 15 +++++++++++----
  1 file changed, 11 insertions(+), 4 deletions(-) 
Acked-by: Richard Weinberger <richard(a)nod.at>

Thanks,
//richard
