Re: [libvirt] [CFT][PATCH 00/10] Making new mounts of proc and sysfs as safe as bind mounts (take 2)

[CC'ing libvirt-lxc folks] Am 28.05.2015 um 23:32 schrieb Eric W. Biederman:
Richard Weinberger <richard@nod.at> writes:
Am 28.05.2015 um 21:57 schrieb Eric W. Biederman:
FWIW, it breaks also libvirt-lxc: Error: internal error: guest failed to start: Failed to re-mount /proc/sys on /proc/sys flags=1021: Operation not permitted
Interesting. I had not anticipated a failure there? And it is failing in remount? Oh that is interesting.
That implies that there is some flag of the original mount of /proc that the remount of /proc/sys is clearing, and that previously
The flags specified are current rdonly,remount,bind so I expect there are some other flags on proc that libvirt-lxc is clearing by accident and we did not fail before because the kernel was not enforcing things.
Please see: http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/lxc/lxc_container.c;h=9a9... lxcContainerMountBasicFS()
and: http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/lxc/lxc_container.c;h=9a9... lxcBasicMounts
What are the mount flags in a working libvirt-lxc?
See: test1:~ # cat /proc/self/mountinfo 149 147 0:56 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw 150 149 0:56 /sys /proc/sys ro,nodev,relatime - proc proc rw
If you need more info, please let me know. :-)
Oh interesting I had not realized libvirt-lxc had grown an unprivileged mode using user namespaces.
This does appear to be a classic remount bug, where you are not preserving the permissions. It appears the fact that the code failed to enforce locked permissions on the fresh mount of proc was hiding this bug until now.
I expect what you actually want is the code below:
diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c index 9a9ae5c2aaf0..f008a7484bfe 100644 --- a/src/lxc/lxc_container.c +++ b/src/lxc/lxc_container.c @@ -850,7 +850,7 @@ typedef struct {
static const virLXCBasicMountInfo lxcBasicMounts[] = { { "proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false }, - { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false, false, false }, + { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false }, { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, false, false, true }, { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, false, false, true }, { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
Or possibly just:
diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c index 9a9ae5c2aaf0..a60ccbd12bfc 100644 --- a/src/lxc/lxc_container.c +++ b/src/lxc/lxc_container.c @@ -850,7 +850,7 @@ typedef struct {
static const virLXCBasicMountInfo lxcBasicMounts[] = { { "proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false }, - { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false, false, false }, + { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, true, false, false }, { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, false, false, true }, { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, false, false, true }, { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
As the there is little point in making /proc/sys read-only in a user-namespace, as the permission checks are uid based and no-one should have the global uid 0 in your container. Making mounting /proc/sys read-only rather pointless.
Eric, using the patch below I was able to spawn a user-namespace enabled container using libvirt-lxc. :-) I had to: 1. Disable the read-only mount of /proc/sys which is anyway useless in the user-namespace case. 2. Disable the /proc/sys/net/ipv{4,6} bind mounts, this ugly hack is only needed for the non user-namespace case. 3. Remove MS_RDONLY from the sysfs mount (For the non user-namespace case we'd have to keep this, though). Daniel, I'd take this as a chance to disable all the MS_RDONLY games if user-namespace are configured. With Eric's fixes they hurt us. And as I wrote many times before if root within the user-namespace is able to do nasty things in /sys and /proc that's a plain kernel bug which needs fixing. There is no point in mounting these read-only. Except for the case then no user-namespace is used. diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c index 9a9ae5c..497e05f 100644 --- a/src/lxc/lxc_container.c +++ b/src/lxc/lxc_container.c @@ -850,10 +850,10 @@ typedef struct { static const virLXCBasicMountInfo lxcBasicMounts[] = { { "proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false }, - { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false, false, false }, - { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, false, false, true }, - { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, false, false, true }, - { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false }, + { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, true, false, false }, + { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, true, false, true }, + { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, true, false, true }, + { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false }, { "securityfs", "/sys/kernel/security", "securityfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, true, true, false }, #if WITH_SELINUX { SELINUX_MOUNT, SELINUX_MOUNT, "selinuxfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, true, true, false }, Thanks, //richard

Richard Weinberger <richard@nod.at> writes:
[CC'ing libvirt-lxc folks]
Am 28.05.2015 um 23:32 schrieb Eric W. Biederman:
Richard Weinberger <richard@nod.at> writes:
Am 28.05.2015 um 21:57 schrieb Eric W. Biederman:
FWIW, it breaks also libvirt-lxc: Error: internal error: guest failed to start: Failed to re-mount /proc/sys on /proc/sys flags=1021: Operation not permitted
Interesting. I had not anticipated a failure there? And it is failing in remount? Oh that is interesting.
That implies that there is some flag of the original mount of /proc that the remount of /proc/sys is clearing, and that previously
The flags specified are current rdonly,remount,bind so I expect there are some other flags on proc that libvirt-lxc is clearing by accident and we did not fail before because the kernel was not enforcing things.
Please see: http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/lxc/lxc_container.c;h=9a9... lxcContainerMountBasicFS()
and: http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/lxc/lxc_container.c;h=9a9... lxcBasicMounts
What are the mount flags in a working libvirt-lxc?
See: test1:~ # cat /proc/self/mountinfo 149 147 0:56 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw 150 149 0:56 /sys /proc/sys ro,nodev,relatime - proc proc rw
If you need more info, please let me know. :-)
Oh interesting I had not realized libvirt-lxc had grown an unprivileged mode using user namespaces.
This does appear to be a classic remount bug, where you are not preserving the permissions. It appears the fact that the code failed to enforce locked permissions on the fresh mount of proc was hiding this bug until now.
I expect what you actually want is the code below:
diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c index 9a9ae5c2aaf0..f008a7484bfe 100644 --- a/src/lxc/lxc_container.c +++ b/src/lxc/lxc_container.c @@ -850,7 +850,7 @@ typedef struct {
static const virLXCBasicMountInfo lxcBasicMounts[] = { { "proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false }, - { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false, false, false }, + { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false }, { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, false, false, true }, { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, false, false, true }, { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
Or possibly just:
diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c index 9a9ae5c2aaf0..a60ccbd12bfc 100644 --- a/src/lxc/lxc_container.c +++ b/src/lxc/lxc_container.c @@ -850,7 +850,7 @@ typedef struct {
static const virLXCBasicMountInfo lxcBasicMounts[] = { { "proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false }, - { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false, false, false }, + { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, true, false, false }, { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, false, false, true }, { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, false, false, true }, { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
As the there is little point in making /proc/sys read-only in a user-namespace, as the permission checks are uid based and no-one should have the global uid 0 in your container. Making mounting /proc/sys read-only rather pointless.
Eric, using the patch below I was able to spawn a user-namespace enabled container using libvirt-lxc. :-)
I am glad. I am trying to figure out which set of changes were necessary vs just nice to have, to inform that part of the conversation that is asking is there a way we can avoid breaking userspace for this security fix.
I had to: 1. Disable the read-only mount of /proc/sys which is anyway useless in the user-namespace case.
It is likely worth addressing the libvirt-lxc MS_REMOUNT code as it does not preserve any mount flags, or even have the capability to try. if (bindOverReadonly && mount(mnt_src, mnt->dst, NULL, MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) < 0) { virReportSystemError(errno, _("Failed to re-mount %s on %s flags=%x"), mnt_src, mnt->dst, MS_BIND|MS_REMOUNT|MS_RDONLY); goto cleanup; } Aka the flags during remount are hard coded (which is buggy). So I believe even without the use of user-namespaces this code does the wrong thing. Likely statvfs needs to be called to get the existing mount flags and those should be applied during remount or possibly just the mount flags from the virLXCBasicMountInfo entry should be added.
2. Disable the /proc/sys/net/ipv{4,6} bind mounts, this ugly hack is only needed for the non user-namespace case.
*Scratches my head* Why was this necessary? Those are just plain bind mounts which do not need any remount-magic so they should have just worked and preserved the existing mount flags. I agree they are unnecessary in this context but I don't expect they would have cause problems or were "wrong".
3. Remove MS_RDONLY from the sysfs mount (For the non user-namespace case we'd have to keep this, though).
Ok. I can see this as being necessary as well, and missed in the first pass because the code did not get this far. The code flow for sysfs appears to trigger the bindOverReadOnly code as MS_RDONLY is set. Then the remount clears the other mount flags on sysfs. Which previously we would have not enforced as sysfs with a network namespace is a fresh mount (and that is the bug my patchset fixes). This does very much look like a bug in libvirt-lxc clearing flags it did not intend to.
Daniel, I'd take this as a chance to disable all the MS_RDONLY games if user-namespace are configured. With Eric's fixes they hurt us. And as I wrote many times before if root within the user-namespace is able to do nasty things in /sys and /proc that's a plain kernel bug which needs fixing. There is no point in mounting these read-only. Except for the case then no user-namespace is used.
diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c index 9a9ae5c..497e05f 100644 --- a/src/lxc/lxc_container.c +++ b/src/lxc/lxc_container.c @@ -850,10 +850,10 @@ typedef struct {
static const virLXCBasicMountInfo lxcBasicMounts[] = { { "proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false }, - { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false, false, false }, - { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, false, false, true }, - { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, false, false, true }, - { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false }, + { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, true, false, false }, + { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, true, false, true }, + { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, true, false, true }, + { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false }, { "securityfs", "/sys/kernel/security", "securityfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, true, true, false }, #if WITH_SELINUX { SELINUX_MOUNT, SELINUX_MOUNT, "selinuxfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, true, true, false },
Thanks, //richard

Richard Weinberger <richard@nod.at> writes:
[CC'ing libvirt-lxc folks]
Am 28.05.2015 um 23:32 schrieb Eric W. Biederman:
Richard Weinberger <richard@nod.at> writes:
Am 28.05.2015 um 21:57 schrieb Eric W. Biederman:
FWIW, it breaks also libvirt-lxc: Error: internal error: guest failed to start: Failed to re-mount /proc/sys on /proc/sys flags=1021: Operation not permitted
Interesting. I had not anticipated a failure there? And it is failing in remount? Oh that is interesting.
That implies that there is some flag of the original mount of /proc that the remount of /proc/sys is clearing, and that previously
The flags specified are current rdonly,remount,bind so I expect there are some other flags on proc that libvirt-lxc is clearing by accident and we did not fail before because the kernel was not enforcing things.
Please see: http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/lxc/lxc_container.c;h=9a9... lxcContainerMountBasicFS()
and: http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/lxc/lxc_container.c;h=9a9... lxcBasicMounts
What are the mount flags in a working libvirt-lxc?
See: test1:~ # cat /proc/self/mountinfo 149 147 0:56 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw 150 149 0:56 /sys /proc/sys ro,nodev,relatime - proc proc rw
If you need more info, please let me know. :-)
Oh interesting I had not realized libvirt-lxc had grown an unprivileged mode using user namespaces.
This does appear to be a classic remount bug, where you are not preserving the permissions. It appears the fact that the code failed to enforce locked permissions on the fresh mount of proc was hiding this bug until now.
I expect what you actually want is the code below:
diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c index 9a9ae5c2aaf0..f008a7484bfe 100644 --- a/src/lxc/lxc_container.c +++ b/src/lxc/lxc_container.c @@ -850,7 +850,7 @@ typedef struct {
static const virLXCBasicMountInfo lxcBasicMounts[] = { { "proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false }, - { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false, false, false }, + { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false }, { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, false, false, true }, { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, false, false, true }, { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
Or possibly just:
diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c index 9a9ae5c2aaf0..a60ccbd12bfc 100644 --- a/src/lxc/lxc_container.c +++ b/src/lxc/lxc_container.c @@ -850,7 +850,7 @@ typedef struct {
static const virLXCBasicMountInfo lxcBasicMounts[] = { { "proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false }, - { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false, false, false }, + { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, true, false, false }, { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, false, false, true }, { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, false, false, true }, { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
As the there is little point in making /proc/sys read-only in a user-namespace, as the permission checks are uid based and no-one should have the global uid 0 in your container. Making mounting /proc/sys read-only rather pointless.
Eric, using the patch below I was able to spawn a user-namespace enabled container using libvirt-lxc. :-)
I had to: 1. Disable the read-only mount of /proc/sys which is anyway useless in the user-namespace case. 2. Disable the /proc/sys/net/ipv{4,6} bind mounts, this ugly hack is only needed for the non user-namespace case. 3. Remove MS_RDONLY from the sysfs mount (For the non user-namespace case we'd have to keep this, though).
Daniel, I'd take this as a chance to disable all the MS_RDONLY games if user-namespace are configured. With Eric's fixes they hurt us. And as I wrote many times before if root within the user-namespace is able to do nasty things in /sys and /proc that's a plain kernel bug which needs fixing. There is no point in mounting these read-only. Except for the case then no user-namespace is used.
For clarity the patch below appears to be the minimal change needed to fix this security issue. AKA add mnt_mflags in when remounting something read-only. /proc/sys needed to be updated so it had the proper flags to be added back in. I hope this helps. diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c index 9a9ae5c2aaf0..11e9514e0761 100644 --- a/src/lxc/lxc_container.c +++ b/src/lxc/lxc_container.c @@ -850,7 +850,7 @@ typedef struct { static const virLXCBasicMountInfo lxcBasicMounts[] = { { "proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false }, - { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false, false, false }, + { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false }, { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, false, false, true }, { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, false, false, true }, { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false }, @@ -1030,7 +1030,7 @@ static int lxcContainerMountBasicFS(bool userns_enabled, if (bindOverReadonly && mount(mnt_src, mnt->dst, NULL, - MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) < 0) { + MS_BIND|MS_REMOUNT|mnt_mflags|MS_RDONLY, NULL) < 0) { virReportSystemError(errno, _("Failed to re-mount %s on %s flags=%x"), mnt_src, mnt->dst, Eric

On Sat, Jun 06, 2015 at 01:56:54PM -0500, Eric W. Biederman wrote:
Richard Weinberger <richard@nod.at> writes:
[CC'ing libvirt-lxc folks]
Am 28.05.2015 um 23:32 schrieb Eric W. Biederman:
Richard Weinberger <richard@nod.at> writes:
Am 28.05.2015 um 21:57 schrieb Eric W. Biederman:
FWIW, it breaks also libvirt-lxc: Error: internal error: guest failed to start: Failed to re-mount /proc/sys on /proc/sys flags=1021: Operation not permitted
Interesting. I had not anticipated a failure there? And it is failing in remount? Oh that is interesting.
That implies that there is some flag of the original mount of /proc that the remount of /proc/sys is clearing, and that previously
The flags specified are current rdonly,remount,bind so I expect there are some other flags on proc that libvirt-lxc is clearing by accident and we did not fail before because the kernel was not enforcing things.
Please see: http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/lxc/lxc_container.c;h=9a9... lxcContainerMountBasicFS()
and: http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/lxc/lxc_container.c;h=9a9... lxcBasicMounts
What are the mount flags in a working libvirt-lxc?
See: test1:~ # cat /proc/self/mountinfo 149 147 0:56 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw 150 149 0:56 /sys /proc/sys ro,nodev,relatime - proc proc rw
If you need more info, please let me know. :-)
Oh interesting I had not realized libvirt-lxc had grown an unprivileged mode using user namespaces.
This does appear to be a classic remount bug, where you are not preserving the permissions. It appears the fact that the code failed to enforce locked permissions on the fresh mount of proc was hiding this bug until now.
I expect what you actually want is the code below:
diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c index 9a9ae5c2aaf0..f008a7484bfe 100644 --- a/src/lxc/lxc_container.c +++ b/src/lxc/lxc_container.c @@ -850,7 +850,7 @@ typedef struct {
static const virLXCBasicMountInfo lxcBasicMounts[] = { { "proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false }, - { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false, false, false }, + { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false }, { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, false, false, true }, { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, false, false, true }, { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
Or possibly just:
diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c index 9a9ae5c2aaf0..a60ccbd12bfc 100644 --- a/src/lxc/lxc_container.c +++ b/src/lxc/lxc_container.c @@ -850,7 +850,7 @@ typedef struct {
static const virLXCBasicMountInfo lxcBasicMounts[] = { { "proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false }, - { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false, false, false }, + { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, true, false, false }, { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, false, false, true }, { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, false, false, true }, { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
As the there is little point in making /proc/sys read-only in a user-namespace, as the permission checks are uid based and no-one should have the global uid 0 in your container. Making mounting /proc/sys read-only rather pointless.
Eric, using the patch below I was able to spawn a user-namespace enabled container using libvirt-lxc. :-)
I had to: 1. Disable the read-only mount of /proc/sys which is anyway useless in the user-namespace case. 2. Disable the /proc/sys/net/ipv{4,6} bind mounts, this ugly hack is only needed for the non user-namespace case. 3. Remove MS_RDONLY from the sysfs mount (For the non user-namespace case we'd have to keep this, though).
Daniel, I'd take this as a chance to disable all the MS_RDONLY games if user-namespace are configured. With Eric's fixes they hurt us. And as I wrote many times before if root within the user-namespace is able to do nasty things in /sys and /proc that's a plain kernel bug which needs fixing. There is no point in mounting these read-only. Except for the case then no user-namespace is used.
For clarity the patch below appears to be the minimal change needed to fix this security issue.
AKA add mnt_mflags in when remounting something read-only.
/proc/sys needed to be updated so it had the proper flags to be added back in.
I hope this helps.
diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c index 9a9ae5c2aaf0..11e9514e0761 100644 --- a/src/lxc/lxc_container.c +++ b/src/lxc/lxc_container.c @@ -850,7 +850,7 @@ typedef struct {
static const virLXCBasicMountInfo lxcBasicMounts[] = { { "proc", "/proc", "proc", MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false }, - { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false, false, false }, + { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false }, { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL, MS_BIND, false, false, true }, { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL, MS_BIND, false, false, true }, { "sysfs", "/sys", "sysfs", MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false }, @@ -1030,7 +1030,7 @@ static int lxcContainerMountBasicFS(bool userns_enabled,
if (bindOverReadonly && mount(mnt_src, mnt->dst, NULL, - MS_BIND|MS_REMOUNT|MS_RDONLY, NULL) < 0) { + MS_BIND|MS_REMOUNT|mnt_mflags|MS_RDONLY, NULL) < 0) { virReportSystemError(errno, _("Failed to re-mount %s on %s flags=%x"), mnt_src, mnt->dst,
Thanks Richard / Eric for the suggested patches. I'll apply Eric's simplified patch to libvirt now, and backport it to our stable libvirt branches. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Am 16.06.2015 um 14:31 schrieb Daniel P. Berrange:
Thanks Richard / Eric for the suggested patches. I'll apply Eric's simplified patch to libvirt now, and backport it to our stable libvirt branches.
Thank you Daniel!
participants (3)
-
Daniel P. Berrange
-
Eric W. Biederman
-
Richard Weinberger