[CC'ing libvirt-lxc folks]
Am 28.05.2015 um 23:32 schrieb Eric W. Biederman:
Richard Weinberger <richard(a)nod.at> writes:
> Am 28.05.2015 um 21:57 schrieb Eric W. Biederman:
>>> FWIW, it breaks also libvirt-lxc:
>>> Error: internal error: guest failed to start: Failed to re-mount /proc/sys on
/proc/sys flags=1021: Operation not permitted
>>
>> Interesting. I had not anticipated a failure there? And it is failing
>> in remount? Oh that is interesting.
>>
>> That implies that there is some flag of the original mount of /proc that
>> the remount of /proc/sys is clearing, and that previously
>>
>> The flags specified are current rdonly,remount,bind so I expect there
>> are some other flags on proc that libvirt-lxc is clearing by accident
>> and we did not fail before because the kernel was not enforcing things.
>
> Please see:
>
http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/lxc/lxc_container.c;h=...
> lxcContainerMountBasicFS()
>
> and:
>
http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/lxc/lxc_container.c;h=...
> lxcBasicMounts
>
>> What are the mount flags in a working libvirt-lxc?
>
> See:
> test1:~ # cat /proc/self/mountinfo
> 149 147 0:56 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
> 150 149 0:56 /sys /proc/sys ro,nodev,relatime - proc proc rw
> If you need more info, please let me know. :-)
Oh interesting I had not realized libvirt-lxc had grown an unprivileged
mode using user namespaces.
This does appear to be a classic remount bug, where you are not
preserving the permissions. It appears the fact that the code
failed to enforce locked permissions on the fresh mount of proc
was hiding this bug until now.
I expect what you actually want is the code below:
diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
index 9a9ae5c2aaf0..f008a7484bfe 100644
--- a/src/lxc/lxc_container.c
+++ b/src/lxc/lxc_container.c
@@ -850,7 +850,7 @@ typedef struct {
static const virLXCBasicMountInfo lxcBasicMounts[] = {
{ "proc", "/proc", "proc",
MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false },
- { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false,
false, false },
+ { "/proc/sys", "/proc/sys", NULL,
MS_BIND|MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
{ "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL,
MS_BIND, false, false, true },
{ "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL,
MS_BIND, false, false, true },
{ "sysfs", "/sys", "sysfs",
MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
Or possibly just:
diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
index 9a9ae5c2aaf0..a60ccbd12bfc 100644
--- a/src/lxc/lxc_container.c
+++ b/src/lxc/lxc_container.c
@@ -850,7 +850,7 @@ typedef struct {
static const virLXCBasicMountInfo lxcBasicMounts[] = {
{ "proc", "/proc", "proc",
MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false },
- { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false,
false, false },
+ { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, true,
false, false },
{ "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL,
MS_BIND, false, false, true },
{ "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL,
MS_BIND, false, false, true },
{ "sysfs", "/sys", "sysfs",
MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
As the there is little point in making /proc/sys read-only in a
user-namespace, as the permission checks are uid based and no-one should
have the global uid 0 in your container. Making mounting /proc/sys
read-only rather pointless.
Eric, using the patch below I was able to spawn a user-namespace enabled container
using libvirt-lxc. :-)
I had to:
1. Disable the read-only mount of /proc/sys which is anyway useless in the user-namespace
case.
2. Disable the /proc/sys/net/ipv{4,6} bind mounts, this ugly hack is only needed for the
non user-namespace case.
3. Remove MS_RDONLY from the sysfs mount (For the non user-namespace case we'd have to
keep this, though).
Daniel, I'd take this as a chance to disable all the MS_RDONLY games if user-namespace
are configured.
With Eric's fixes they hurt us. And as I wrote many times before if root within the
user-namespace
is able to do nasty things in /sys and /proc that's a plain kernel bug which needs
fixing. There is no
point in mounting these read-only. Except for the case then no user-namespace is used.
diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
index 9a9ae5c..497e05f 100644
--- a/src/lxc/lxc_container.c
+++ b/src/lxc/lxc_container.c
@@ -850,10 +850,10 @@ typedef struct {
static const virLXCBasicMountInfo lxcBasicMounts[] = {
{ "proc", "/proc", "proc",
MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false },
- { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, false,
false, false },
- { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL,
MS_BIND, false, false, true },
- { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL,
MS_BIND, false, false, true },
- { "sysfs", "/sys", "sysfs",
MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, false, false, false },
+ { "/proc/sys", "/proc/sys", NULL, MS_BIND|MS_RDONLY, true, false,
false },
+ { "/.oldroot/proc/sys/net/ipv4", "/proc/sys/net/ipv4", NULL,
MS_BIND, true, false, true },
+ { "/.oldroot/proc/sys/net/ipv6", "/proc/sys/net/ipv6", NULL,
MS_BIND, true, false, true },
+ { "sysfs", "/sys", "sysfs",
MS_NOSUID|MS_NOEXEC|MS_NODEV, false, false, false },
{ "securityfs", "/sys/kernel/security", "securityfs",
MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, true, true, false },
#if WITH_SELINUX
{ SELINUX_MOUNT, SELINUX_MOUNT, "selinuxfs",
MS_NOSUID|MS_NOEXEC|MS_NODEV|MS_RDONLY, true, true, false },
Thanks,
//richard