Re: [libvirt] [GIT PULL] namespace updates for v3.17-rc1

Wednesday, 20 August 2014

On Wed, Aug 6, 2014 at 2:57 AM, Eric W. Biederman <ebiederm(a)xmission.com> wrote:
...

 Linus,

 Please pull the for-linus branch from the git tree:

    git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-linus

    HEAD: 344470cac42e887e68cfb5bdfa6171baf27f1eb5 proc: Point /proc/mounts at
    /proc/thread-self/mounts instead of /proc/self/mounts

 This is a bunch of small changes built against 3.16-rc6.  The most
 significant change for users is the first patch which makes setns
 dramatically faster by removing unneeded rcu handling.

 The next chunk of changes are so that "mount -o remount,.." will not
 allow the user namespace root to drop flags on a mount set by the system
 wide root.  Aka this forces read-only mounts to stay read-only, no-dev
 mounts to stay no-dev, no-suid mounts to stay no-suid, no-exec mounts to
 stay no-exec and it prevents unprivileged users from messing with a
 mount's atime settings.  I have included my test case as the last patch
 in this series so people performing backports can verify this change
 works correctly.

 The next change fixes a bug in NFS that was discovered while auditing
 nsproxy users for the first optimization.  Today you can oops the kernel
 by reading /proc/fs/nfsfs/{servers,volumes} if you are clever with pid
 namespaces.  I rebased and fixed the build of the !CONFIG_NFS_FS case
 yesterday when a build bot caught my typo.  Given that no one to my
 knowledge bases anything on my tree, fixing the typo in place seems more
 responsible than requiring a typo-fix to be backported as well.

 The last change is a small semantic cleanup introducing
 /proc/thread-self and pointing /proc/mounts and /proc/net at it.  This
 prevents several kinds of problematic corner cases.  It is a
 user-visible change, so it has a minute chance of causing regressions;
 for that reason the changes to /proc/mounts and /proc/net are individual
 one-line commits that can be trivially reverted.  Unfortunately I lost and
 could not find the email of the original reporter so he is not credited.
 From at least one perspective this change to /proc/net is a regression fix
 to allow pthread /proc/net uses that were broken by the introduction of
 the network namespace.

 Eric

 Eric W. Biederman (11):
       namespaces: Use task_lock and not rcu to protect nsproxy
       mnt: Only change user settable mount flags in remount
       mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount
       mnt: Correct permission checks in do_remount

This commit breaks libvirt-lxc.
libvirt does the following in lxcContainerMountBasicFS():

        /*
         * We can't immediately set the MS_RDONLY flag when mounting filesystems
         * because (in at least some kernel versions) this will propagate back
         * to the original mount in the host OS, turning it readonly too. Thus
         * we mount the filesystem in read-write mode initially, and then do a
         * separate read-only bind mount on top of that.
         */
        bindOverReadonly = !!(mnt_mflags & MS_RDONLY);

        VIR_DEBUG("Mount %s on %s type=%s flags=%x",
                  mnt_src, mnt->dst, mnt->type, mnt_mflags & ~MS_RDONLY);
        if (mount(mnt_src, mnt->dst, mnt->type,
                  mnt_mflags & ~MS_RDONLY, NULL) < 0) {

^^^^ Here it fails for sysfs: with user namespaces we bind the existing
/sys into the container, so we would have to read out all mount flags
from the current /sys mount and pass them along. Otherwise mount() fails
with EPERM. On my test system /sys is mounted with
"rw,nosuid,nodev,noexec,relatime" and libvirt misses the relatime...

            virReportSystemError(errno,
                                 _("Failed to mount %s on %s type %s flags=%x"),
                                 mnt_src, mnt->dst, NULLSTR(mnt->type),
                                 mnt_mflags & ~MS_RDONLY);
            goto cleanup;
        }
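
The remark above says the flags of the existing /sys mount would have to
be read out before mounting. Here is a minimal sketch of one way to do
that, assuming statvfs() reports the relevant bits in f_flag. The helper
names st_flags_to_ms_flags() and get_mount_flags() are made up for
illustration and are not taken from libvirt.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc only exposes ST_NODEV, ST_NOEXEC, ST_RELATIME, ... with this */
#endif
#include <sys/mount.h>
#include <sys/statvfs.h>

/* Hypothetical helper: translate statvfs() ST_* bits into mount() MS_* bits. */
unsigned long st_flags_to_ms_flags(unsigned long st_flags)
{
    unsigned long ms_flags = 0;

    if (st_flags & ST_RDONLY)
        ms_flags |= MS_RDONLY;
    if (st_flags & ST_NOSUID)
        ms_flags |= MS_NOSUID;
    /* The remaining ST_* names are extensions, so guard each one. */
#ifdef ST_NODEV
    if (st_flags & ST_NODEV)
        ms_flags |= MS_NODEV;
#endif
#ifdef ST_NOEXEC
    if (st_flags & ST_NOEXEC)
        ms_flags |= MS_NOEXEC;
#endif
#ifdef ST_NOATIME
    if (st_flags & ST_NOATIME)
        ms_flags |= MS_NOATIME;
#endif
#ifdef ST_NODIRATIME
    if (st_flags & ST_NODIRATIME)
        ms_flags |= MS_NODIRATIME;
#endif
#ifdef ST_RELATIME
    if (st_flags & ST_RELATIME)
        ms_flags |= MS_RELATIME;
#endif
    return ms_flags;
}

/* Hypothetical helper: fetch the MS_* flags currently set on the mount at path.
 * Returns 0 on success, -1 with errno set by statvfs() on failure. */
int get_mount_flags(const char *path, unsigned long *ms_flags)
{
    struct statvfs buf;

    if (statvfs(path, &buf) < 0)
        return -1;
    *ms_flags = st_flags_to_ms_flags(buf.f_flag);
    return 0;
}
```

With something like this, the flags returned for "/sys" could be OR-ed
into mnt_mflags before the first mount() call, so that relatime and
friends are not dropped.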

        if (bindOverReadonly &&
            mount(mnt_src, mnt->dst, NULL,
                  MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) < 0) {

^^^ Here it fails because now we'd have to specify all flags that were
used for the first mount. For the procfs case that is
MS_NOSUID|MS_NOEXEC|MS_NODEV; see lxcBasicMounts[].
In this case the fix is easy: add mnt_mflags to the mount flags.

            virReportSystemError(errno,
                                 _("Failed to re-mount %s on %s flags=%x"),
                                 mnt_src, mnt->dst,
                                 MS_BIND|MS_REMOUNT|MS_RDONLY);
            goto cleanup;
        }
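
For the second failure, the "add mnt_mflags to the mount flags" fix could
look roughly like the sketch below. It is only an illustration of the
idea, not the actual libvirt patch; readonly_remount_flags() and
remount_readonly() are hypothetical names.

```c
#include <stddef.h>
#include <sys/mount.h>

/* Hypothetical helper: build the flags for the read-only bind remount while
 * keeping every flag (nosuid, nodev, noexec, ...) used for the first mount. */
unsigned long readonly_remount_flags(unsigned long mnt_mflags)
{
    return MS_BIND | MS_REMOUNT | MS_RDONLY | mnt_mflags;
}

/* Hypothetical wrapper: perform the remount with the preserved flags.
 * Returns the result of mount(): 0 on success, -1 with errno set. */
int remount_readonly(const char *src, const char *dst, unsigned long mnt_mflags)
{
    return mount(src, dst, NULL, readonly_remount_flags(mnt_mflags), NULL);
}
```

The point is that the remount no longer tries to clear flags that were
set by the first mount, which is what the new permission check rejects.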

-- 
Thanks,
//richard
