
On Mon, Jul 14, 2014 at 12:01 PM, Chen Hanxiao <chenhanxiao@cn.fujitsu.com> wrote:
Kernel commit 7dc5dbc879bd0779924b5132a48b731a0bc04a1e forbids a fresh sysfs mount when user namespaces are enabled but network namespaces are disabled. This patch creates a bind mount in that scenario.
Sorry for exhuming an already merged patch, but today I ran into a nasty issue caused by it.
Signed-off-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
---
 src/lxc/lxc_container.c | 44 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 33 insertions(+), 11 deletions(-)
diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c
index 4d89677..8a27215 100644
--- a/src/lxc/lxc_container.c
+++ b/src/lxc/lxc_container.c
@@ -815,10 +815,13 @@ static int lxcContainerSetReadOnly(void)
 }

-static int lxcContainerMountBasicFS(bool userns_enabled)
+static int lxcContainerMountBasicFS(bool userns_enabled,
+                                    bool netns_disabled)
 {
     size_t i;
     int rc = -1;
+    char* mnt_src = NULL;
+    int mnt_mflags;

     VIR_DEBUG("Mounting basic filesystems");

@@ -826,8 +829,25 @@ static int lxcContainerMountBasicFS(bool userns_enabled)
         bool bindOverReadonly;
         virLXCBasicMountInfo const *mnt = &lxcBasicMounts[i];

+        /* When enable userns but disable netns, kernel will
+         * forbid us doing a new fresh mount for sysfs.
+         * So we had to do a bind mount for sysfs instead.
+         */
+        if (userns_enabled && netns_disabled &&
+            STREQ(mnt->src, "sysfs")) {
+            if (VIR_STRDUP(mnt_src, "/sys") < 0) {
+                goto cleanup;
+            }
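To make the effect of this branch concrete, here is a minimal sketch that mirrors it as a pure function returning the mount(2) arguments instead of calling mount(2). The types basic_mount and mount_call and the function plan_basic_mount are hypothetical illustrations, not libvirt API; the base flags MS_NOSUID|MS_NODEV|MS_NOEXEC for the sysfs entry are an assumption taken from the mount call quoted in the review.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/mount.h>

/* Hypothetical stand-in for one lxcBasicMounts entry (the real libvirt
 * type is virLXCBasicMountInfo, whose full layout is not shown here). */
struct basic_mount {
    const char *src;
    const char *dst;
    const char *type;
    unsigned long mflags;
};

/* Hypothetical record of the arguments that would be passed to mount(2). */
struct mount_call {
    const char *source;
    const char *target;
    const char *fstype;
    unsigned long flags;
};

/* Mirrors the patched branch: with userns enabled and netns disabled, the
 * "sysfs" entry becomes a bind mount whose source is the path "/sys".
 * This runs after pivot_root(2), so "/sys" resolves inside the container
 * and the bind mount maps its still-empty /sys directory onto itself. */
static struct mount_call plan_basic_mount(const struct basic_mount *mnt,
                                          bool userns_enabled,
                                          bool netns_disabled)
{
    assert(mnt != NULL);

    struct mount_call call = { mnt->src, mnt->dst, mnt->type, mnt->mflags };

    if (userns_enabled && netns_disabled && strcmp(mnt->src, "sysfs") == 0) {
        call.source = "/sys";
        call.flags |= MS_BIND;  /* fstype is ignored for bind mounts */
    }
    return call;
}
```

With both booleans true, the sysfs entry yields source and target "/sys" with MS_BIND set. Per mount(2), a bind mount ignores the filesystemtype and data arguments (and all remaining flags except MS_REC), so nothing sysfs-specific is mounted.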
This is clearly broken and looks very untested to me. It will issue this mount call:

    mount("/sys", "/sys", "sysfs", MS_NOSUID|MS_NODEV|MS_NOEXEC|MS_BIND, NULL)

Because the code runs after pivot_root(2), the source "/sys" refers to the container's own, still empty /sys directory, i.e. /sys will still be empty afterwards and there will be no sysfs there at all.

As libvirt will later remount /sys read-only, creating a container will fail with the most useless error message:

    Error: internal error: guest failed to start: Unable to create directory /sys/fs/: Read-only file system

or

    Error: internal error: guest failed to start: Unable to create directory /sys/fs/cgroup: Read-only file system

Please note that changing "/sys" to "/.oldroot/sys" will not solve the issue, as this code already runs in the new namespace and therefore the old mount tree is locked, so MS_BIND is not allowed.

This brings me to the question: why do you handle the netns_disabled case at all? If no network is specified in the XML file, just create a new, empty network namespace. Bind-mounting /sys into the container is a security issue; this is why mounting sysfs without a netns was disabled in the first place.

P.S. Sorry for the grumpy mail, I've wasted almost the whole day debugging that issue.

--
Thanks,
//richard
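A minimal sketch of the alternative suggested above, under stated assumptions: container_clone_flags() and path_is_sysfs() are hypothetical helpers, not libvirt functions, and the exact namespace flags libvirt passes to clone(2) may differ. The first helper always requests CLONE_NEWNET together with CLONE_NEWUSER, so the network namespace is owned by the new user namespace, which is the condition the kernel commit referenced in the patch description checks before allowing a fresh sysfs mount. The second uses statfs(2) to confirm that /sys really is sysfs after mounting, which would have reported the empty bind mount directly instead of the later "Read-only file system" error.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/magic.h>
#include <sched.h>
#include <stdbool.h>
#include <sys/vfs.h>

/* Hypothetical helper: namespace flags for clone(2). A user namespace
 * always gets its own network namespace; if no network is defined in the
 * XML, that netns simply has no interfaces besides loopback. Without
 * userns, the (assumed) behaviour is to create a netns only when a
 * network is defined. */
static int container_clone_flags(bool userns_enabled, bool network_defined)
{
    int flags = CLONE_NEWNS | CLONE_NEWPID | CLONE_NEWUTS | CLONE_NEWIPC;

    if (userns_enabled)
        flags |= CLONE_NEWUSER | CLONE_NEWNET;
    else if (network_defined)
        flags |= CLONE_NEWNET;
    return flags;
}

/* Hypothetical helper: true only if the filesystem containing `path` is
 * sysfs. An empty /sys directory on the container's root filesystem
 * reports the root filesystem's type, so this catches the broken bind
 * mount as soon as it happens. */
static bool path_is_sysfs(const char *path)
{
    struct statfs sfs;

    assert(path != NULL);
    if (statfs(path, &sfs) < 0)
        return false;
    return sfs.f_type == SYSFS_MAGIC;
}
```

Checking the filesystem type right after the mount step turns a silent misconfiguration into an error that names the actual problem, rather than one raised much later by an unrelated directory creation.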