We have historically done a number of things with LXC that are
somewhat questionable in retrospect:
1. Mounted /proc/sys read-only, but then mounted
   /proc/sys/net/ipv* read-write again
2. Mounted /sys read-only
3. Mounted /sys/fs/cgroup/NNN/the/guest/dir to /sys/fs/cgroup/NNN
4. Mounted a FUSE filesystem on /proc/meminfo
Items 1 & 2 are pointless as they offer no security benefit either
with or without user namespaces. Without userns the container is always
insecure; with userns it is always secure, no matter what the mount
state is.
Item 3 is somewhat dubious, since the /proc/self/cgroup paths for
processes are then no longer visible at /sys/fs/cgroup. This really
confuses systemd inside the container, making it create a broken
layout.
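To make the item 3 mismatch concrete, here is a minimal sketch in C of
the lookup that code inside the guest effectively performs: take a line
from /proc/self/cgroup and derive the directory it expects to find under
/sys/fs/cgroup. The helper name cgroup_dir_from_proc_line is
hypothetical, and the sketch assumes the cgroup v1 line format
"ID:controller:path" with the controller name used as the mount
directory, which does not hold for co-mounted or named hierarchies.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: given one line of /proc/self/cgroup in the
 * cgroup v1 format "ID:controller:path", write the directory that is
 * expected to exist under /sys/fs/cgroup into buf.
 * Returns 0 on success, -1 on a malformed line or a too-small buffer.
 */
static int cgroup_dir_from_proc_line(const char *line, char *buf, size_t buflen)
{
    const char *ctrl = strchr(line, ':');
    const char *path;
    size_t pathlen;
    int n;

    if (!ctrl)
        return -1;
    ctrl++;                          /* start of the controller name */

    path = strchr(ctrl, ':');
    if (!path)
        return -1;
    path++;                          /* start of the cgroup path */

    /* Ignore a trailing newline as left behind by fgets() */
    pathlen = strcspn(path, "\n");

    n = snprintf(buf, buflen, "/sys/fs/cgroup/%.*s%.*s",
                 (int)(path - 1 - ctrl), ctrl, (int)pathlen, path);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

With the item 3 bind mount in place, the directory computed this way
does not exist inside the guest, because the guest's view of
/sys/fs/cgroup/NNN is already rooted at the guest's own directory.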
Item 4 is somewhat dubious, since we're only changing some of the
fields in /proc/meminfo. It helps apps which blindly parse
/proc/meminfo to determine free system resources they can consume.
Those apps are broken even without containers being involved though,
since any application must expect to be placed inside a cgroup with
limited resources. Faking /proc/meminfo is a pretty limited workaround
that just delays the inevitable fixing of such apps.
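As an illustration of what a well-behaved application could do instead
of trusting MemTotal alone, the following sketch takes the smaller of
MemTotal and the cgroup memory limit. The function names are
hypothetical. The inputs are the text of /proc/meminfo and of a cgroup
v1 memory.limit_in_bytes file, passed in as strings so that locating the
right cgroup directory is left out of the sketch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return MemTotal in bytes from /proc/meminfo text, or 0 if absent. */
static unsigned long long meminfo_total_bytes(const char *meminfo)
{
    const char *p = meminfo;
    unsigned long long kb;

    while (p && *p) {
        /* Only matches when the current line starts with "MemTotal:" */
        if (sscanf(p, "MemTotal: %llu kB", &kb) == 1)
            return kb * 1024ULL;
        p = strchr(p, '\n');         /* advance to the next line */
        if (p)
            p++;
    }
    return 0;
}

/*
 * Effective limit: the smaller of MemTotal and the cgroup limit.
 * limit_text is the content of memory.limit_in_bytes; a value that is
 * zero or unparsable is treated as "no cgroup limit" (an assumption of
 * this sketch).
 */
static unsigned long long effective_mem_limit(const char *meminfo,
                                              const char *limit_text)
{
    unsigned long long total = meminfo_total_bytes(meminfo);
    unsigned long long limit = strtoull(limit_text, NULL, 10);

    if (total == 0)
        return limit;                /* no MemTotal: fall back to the limit */
    if (limit == 0 || limit > total)
        return total;                /* unlimited, or limit above MemTotal */
    return limit;
}
```

An application doing this gets a sensible answer whether or not
/proc/meminfo has been faked, which is why the FUSE mount only helps
apps that have not been fixed yet.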
The patch that follows just removes items 1 & 2, but I'm thinking
we should go further and remove items 3 & 4 too.
Changing 4 in particular is certainly a guest ABI change, though, so
it is not something distros may wish to see when upgrading libvirt.
There is scope to argue that 1-3 are guest ABI changes too.
In the full machine virt world, we deal with this using machine types.
For example, each new KVM version introduces a new machine type which models
the guest ABI in a stable fashion. Guest machine types are fixed at
time of first deployment. So when libvirt / KVM is upgraded, existing
guests will not see any changes, but new guests will automatically
get the new machine type.
I'm thinking we might want to make use of this in LXC before making
these changes, e.g. introduce a new machine type 'libvirt-lxc-1' to
represent the current guest mount setup and make sure all existing
guests get that machine type. Then introduce a new machine type
'libvirt-lxc-2' that removes all this cruft, which new guests will
get by default.
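A rough sketch of how a machine type could gate the legacy behaviour
follows. The two machine names come from the proposal above. The flag
names, the lxc_machine_features function, and the choice to treat a
missing machine type as 'libvirt-lxc-1' are assumptions made for
illustration, not existing libvirt code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical feature bits, one per numbered item in this mail */
enum {
    LXC_FEAT_PROC_SYS_RO   = 1 << 0, /* item 1 */
    LXC_FEAT_SYS_RO        = 1 << 1, /* item 2 */
    LXC_FEAT_CGROUP_REBIND = 1 << 2, /* item 3 */
    LXC_FEAT_FUSE_MEMINFO  = 1 << 3, /* item 4 */
};

struct lxc_machine {
    const char *name;
    unsigned int features;
};

static const struct lxc_machine lxc_machines[] = {
    /* Current guest mount setup: everything enabled */
    { "libvirt-lxc-1", LXC_FEAT_PROC_SYS_RO | LXC_FEAT_SYS_RO |
                       LXC_FEAT_CGROUP_REBIND | LXC_FEAT_FUSE_MEMINFO },
    /* New default: all of the legacy behaviour removed */
    { "libvirt-lxc-2", 0 },
};

/*
 * Look up the feature set for a machine type. A NULL machine stands
 * for an existing guest with no machine type recorded and is treated
 * as "libvirt-lxc-1". Returns 0 on success, -1 for an unknown name.
 */
static int lxc_machine_features(const char *machine, unsigned int *features)
{
    size_t i;

    if (!machine)
        machine = "libvirt-lxc-1";

    for (i = 0; i < sizeof(lxc_machines) / sizeof(lxc_machines[0]); i++) {
        if (strcmp(lxc_machines[i].name, machine) == 0) {
            *features = lxc_machines[i].features;
            return 0;
        }
    }
    return -1;
}
```

The container setup code would then test individual bits before doing
each mount, so existing guests keep their ABI and only guests created
with the new machine type see the simplified layout.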
Alternatively we could call them 'libvirt-lxc-compat-1' and
'libvirt-lxc-bare-1' to give a clearer indication of their
functional difference and version them separately in the future?
Regards,
Daniel
Daniel P. Berrange (1):
lxc: Stop mouning /proc and /sys read only
src/lxc/lxc_container.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
--
2.1.0