[libvirt] [PATCH] LXC: Improved check before mounting securityfs

Securityfs kernel support may not be available on all platforms running libvirt containers. Since securityfs receives special handling in the context of user namespaces, make an additional check to see if it is supported, by inspecting /proc/filesystems. Making this check for all lxcBasicMounts is a bit tedious, since the /proc filesystem is first unmounted from host, so the /proc/filesystems list should be saved before unmounting, to be available at all times. However, checks for the support for /proc or /sys are superfluous. In the long run, to support the addition of new filesystems in lxcBasicMounts, an additional "optional" flag should be introduced, to mark that for a specific filesystem, the code should first check for support in the kernel, before mounting it. For mandatory filesystems, if mounting them fails, creating the container fails. Right now, check for support only for securityfs, since right now it is the only special case. Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com> --- src/lxc/lxc_container.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 68 insertions(+), 1 deletion(-) diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c index c60f5d8..cead026 100644 --- a/src/lxc/lxc_container.c +++ b/src/lxc/lxc_container.c @@ -509,6 +509,71 @@ static int lxcContainerChildMountSort(const void *a, const void *b) # define MS_SLAVE (1<<19) #endif +/* + * This function attempts to detect kernel support + * for a specific filesystem type. This is done by + * inspecting /proc/filesystems. + */ +static int lxcCheckFSSupport(const char *fs_type) +{ + FILE *fp = NULL; + int ret = -1; + const char *fslist = "/proc/filesystems"; + char *line = NULL; + char *type; + size_t n; + + /* there should be no problem mounting an entry + * with NULL fs type, hence NULL fs types are + * supported */ + if (!fs_type) { + ret = 1; + goto out; + } + + VIR_DEBUG("Checking kernel support for %s in %s", fs_type, fslist); + + if (!(fp = fopen(fslist, "r"))) { + virReportSystemError(errno, + _("Unable to read %s"), + fslist); + goto out; + } + + while(getline(&line, &n, fp) > 0) { + type = strstr(line, fs_type); + + if (!type) + continue; + + /* eliminate trailing newline */ + type[strlen(type) - 1] = '\0'; + + if (STREQ(type,fs_type)) { + VIR_DEBUG("Kernel support found for %s", fs_type); + ret = 1; + goto cleanup; + } + } + + if (ferror(fp)) { + virReportSystemError(errno, + _("Error reading line from %s"), + fslist); + goto cleanup; + } + + VIR_DEBUG("No kernel support for %s", fs_type); + + ret = 0; + +cleanup: + VIR_FREE(line); + VIR_FORCE_FCLOSE(fp); +out: + return ret; +} + static int lxcContainerGetSubtree(const char *prefix, char ***mountsret, size_t *nmountsret) @@ -872,7 +937,9 @@ static int lxcContainerMountBasicFS(bool userns_enabled) continue; #endif - if (STREQ(mnt->src, "securityfs") && userns_enabled) + if (STREQ(mnt->src, "securityfs") && + (lxcCheckFSSupport(mnt->type) < 1 || + userns_enabled)) continue; if (virFileMakePath(mnt->dst) < 0) { -- 1.7.11.7

On 10/07/2013 05:52 PM, Bogdan Purcareata wrote:
Securityfs kernel support may not be available on all platforms running libvirt containers. Since securityfs receives special handling in the context of user namespaces, make an additional check to see if it is supported, by inspecting /proc/filesystems.
Making this check for all lxcBasicMounts is a bit tedious, since the /proc filesystem is first unmounted from host, so the /proc/filesystems list should be saved before unmounting, to be available at all times. However, checks for the support for /proc or /sys are superfluous.
In the long run, to support the addition of new filesystems in lxcBasicMounts, an additional "optional" flag should be introduced, to mark that for a specific filesystem, the code should first check for support in the kernel, before mounting it. For mandatory filesystems, if mounting them fails, creating the container fails.
Right now, check for support only for securityfs, since right now it is the only special case.
Signed-off-by: Bogdan Purcareata <bogdan.purcareata@freescale.com> --- src/lxc/lxc_container.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 68 insertions(+), 1 deletion(-)
Ok, I know what's wrong, please check my patch. If you think it's good, please add your Acked-by or Reviewed-by

Otherwise we can't know if securityfs is avaiabled. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> --- src/lxc/lxc_container.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c index b1f429c..a15ce59 100644 --- a/src/lxc/lxc_container.c +++ b/src/lxc/lxc_container.c @@ -768,8 +768,8 @@ static const virLXCBasicMountInfo lxcBasicMounts[] = { { "/proc/sys", "/proc/sys", NULL, NULL, MS_BIND|MS_REMOUNT|MS_RDONLY }, { "sysfs", "/sys", "sysfs", NULL, MS_NOSUID|MS_NOEXEC|MS_NODEV }, { "sysfs", "/sys", "sysfs", NULL, MS_BIND|MS_REMOUNT|MS_RDONLY }, - { "securityfs", "/sys/kernel/security", "securityfs", NULL, MS_NOSUID|MS_NOEXEC|MS_NODEV }, - { "securityfs", "/sys/kernel/security", "securityfs", NULL, MS_BIND|MS_REMOUNT|MS_RDONLY }, + { "/sys/kernel/security", "/sys/kernel/security", "securityfs", NULL, MS_NOSUID|MS_NOEXEC|MS_NODEV }, + { "/sys/kernel/security", "/sys/kernel/security", "securityfs", NULL, MS_BIND|MS_REMOUNT|MS_RDONLY }, #if WITH_SELINUX { SELINUX_MOUNT, SELINUX_MOUNT, "selinuxfs", NULL, MS_NOSUID|MS_NOEXEC|MS_NODEV }, { SELINUX_MOUNT, SELINUX_MOUNT, NULL, NULL, MS_BIND|MS_REMOUNT|MS_RDONLY }, -- 1.8.3.1

On Mon, Oct 07, 2013 at 06:48:18PM +0800, Gao feng wrote:
Otherwise we can't know if securityfs is avaiabled.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> --- src/lxc/lxc_container.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/lxc/lxc_container.c b/src/lxc/lxc_container.c index b1f429c..a15ce59 100644 --- a/src/lxc/lxc_container.c +++ b/src/lxc/lxc_container.c @@ -768,8 +768,8 @@ static const virLXCBasicMountInfo lxcBasicMounts[] = { { "/proc/sys", "/proc/sys", NULL, NULL, MS_BIND|MS_REMOUNT|MS_RDONLY }, { "sysfs", "/sys", "sysfs", NULL, MS_NOSUID|MS_NOEXEC|MS_NODEV }, { "sysfs", "/sys", "sysfs", NULL, MS_BIND|MS_REMOUNT|MS_RDONLY }, - { "securityfs", "/sys/kernel/security", "securityfs", NULL, MS_NOSUID|MS_NOEXEC|MS_NODEV }, - { "securityfs", "/sys/kernel/security", "securityfs", NULL, MS_BIND|MS_REMOUNT|MS_RDONLY }, + { "/sys/kernel/security", "/sys/kernel/security", "securityfs", NULL, MS_NOSUID|MS_NOEXEC|MS_NODEV }, + { "/sys/kernel/security", "/sys/kernel/security", "securityfs", NULL, MS_BIND|MS_REMOUNT|MS_RDONLY }, #if WITH_SELINUX { SELINUX_MOUNT, SELINUX_MOUNT, "selinuxfs", NULL, MS_NOSUID|MS_NOEXEC|MS_NODEV }, { SELINUX_MOUNT, SELINUX_MOUNT, NULL, NULL, MS_BIND|MS_REMOUNT|MS_RDONLY },
Huh, this isn't right. If this has any functional effect, then it is merely exposing a bug somewhere else. The 'src' is just an opaque string for psuedo filesystems like securityfs that shouldn't have any functional effect. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Mon, Oct 07, 2013 at 12:52:30PM +0300, Bogdan Purcareata wrote:
Securityfs kernel support may not be available on all platforms running libvirt containers. Since securityfs receives special handling in the context of user namespaces, make an additional check to see if it is supported, by inspecting /proc/filesystems.
Making this check for all lxcBasicMounts is a bit tedious, since the /proc filesystem is first unmounted from host, so the /proc/filesystems list should be saved before unmounting, to be available at all times. However, checks for the support for /proc or /sys are superfluous.
I actually don't think it is correct to base it on /proc/filesystems. The intent of this code is that the container setup match the host OS setup for these "special" filesystems. So the container should have it mounted, if and only if, the host has it mounted. We had attempted todo this by using access(/the/path), but this is flawed because a) we were looking at the wrong path (the container path, not the host path) and b) the directory can exist even if the FS isn't mounted. What we should have done here is to check whether the path in question is a mount point on the host. This should automatically do the right thing if the kernel does not have the filesystem in question compiled, as well as if it isn't mounted in the host. I'll copy you on a patch which tries todo that, so can you test it with your kenrels.
In the long run, to support the addition of new filesystems in lxcBasicMounts, an additional "optional" flag should be introduced, to mark that for a specific filesystem, the code should first check for support in the kernel, before mounting it. For mandatory filesystems, if mounting them fails, creating the container fails.
Right now, check for support only for securityfs, since right now it is the only special case.
I'm including flags in the mount table so we can avoid this special casing as you suggest. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 10/07/2013 09:04 PM, Daniel P. Berrange wrote:
On Mon, Oct 07, 2013 at 12:52:30PM +0300, Bogdan Purcareata wrote:
Securityfs kernel support may not be available on all platforms running libvirt containers. Since securityfs receives special handling in the context of user namespaces, make an additional check to see if it is supported, by inspecting /proc/filesystems.
Making this check for all lxcBasicMounts is a bit tedious, since the /proc filesystem is first unmounted from host, so the /proc/filesystems list should be saved before unmounting, to be available at all times. However, checks for the support for /proc or /sys are superfluous.
I actually don't think it is correct to base it on /proc/filesystems.
The intent of this code is that the container setup match the host OS setup for these "special" filesystems. So the container should have it mounted, if and only if, the host has it mounted.
We had attempted todo this by using access(/the/path), but this is flawed because a) we were looking at the wrong path (the container path, not the host path) and b) the directory can exist even if the FS isn't mounted.
We already mount sysfs to /sys before we mount securityfs in container, so the path /sys/kernel/securityfs is right, the securityfs directory is created when we mount sysfs. I read the codes of systemd, systemd doesn't mount securityfs in container environment, so I don't know what's problem commit 6807238d87fd93dee30038bea1e8582a5f0a9fe7 trying to resolve.
What we should have done here is to check whether the path in question is a mount point on the host. This should automatically do the right thing if the kernel does not have the filesystem in question compiled, as well as if it isn't mounted in the host. I'll copy you on a patch which tries todo that, so can you test it with your kenrels.
In the long run, to support the addition of new filesystems in lxcBasicMounts, an additional "optional" flag should be introduced, to mark that for a specific filesystem, the code should first check for support in the kernel, before mounting it. For mandatory filesystems, if mounting them fails, creating the container fails.
Right now, check for support only for securityfs, since right now it is the only special case.
I'm including flags in the mount table so we can avoid this special casing as you suggest.
Daniel
participants (3)
-
Bogdan Purcareata
-
Daniel P. Berrange
-
Gao feng