[libvirt] [PATCH] Add some notes about security considerations when using LXC

From: "Daniel P. Berrange" <berrange@redhat.com> Describe some of the issues to be aware of when configuring LXC guests with security isolation as a goal. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- docs/drvlxc.html.in | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 93 insertions(+) diff --git a/docs/drvlxc.html.in b/docs/drvlxc.html.in index 1e6aa1d..dd2e93c 100644 --- a/docs/drvlxc.html.in +++ b/docs/drvlxc.html.in @@ -168,6 +168,99 @@ Further block or character devices will be made available to containers depending on their configuration. </p> +<h2><a name="security">Security considerations</a></h2> + +<p> +The libvirt LXC driver is fairly flexible in how it can be configured, +and as such does not enforce a requirement for strict security +separation between a container and the host. This allows it to be used +in scenarios where only resource control capabilities are important, +and resource sharing is desired. Applications wishing to ensure secure +isolation between a container and the host must ensure that they are +writing a suitable configuration +</p> + +<h3><a name="securenetworking">Network isolation</a></h3> + +<p> +If the guest configuration does not list any network interfaces, +the <code>network</code> namespace will not be activated, and thus +the container will see all the host's network interfaces. This will +allow apps in the container to bind to/connect from TCP/UDP addresses +and ports from the host OS. It also allows applications to access +UNIX domain sockets associated with the host OS. +</p> + +<p> +It should be noted that <code>systemd</code> has a UNIX domain socket +hich is used for communication by <code>systemctl</code>. Thus, with a +container that shares the host's network namespace, it will be possible +for a user in the container to invoke operations on <code>systemd</code> +in the same way it could if outside the container. In particular this +would allow <code>root</code> in the container to do anything including +shutting down the host OS. If this is not desired, then applications +should either specify the UID/GID mapping in the configuration to enable +user namespaces, or should set the <code><privnet/></code> flag +in the <code><features>....</features></code> element. +</p> + + +<h3><a name="securefs">Filesystem isolation</a></h3> + +<p> +If the guest confuguration does not list any filesystems, then +the container will be setup with a root filesystem that matches +the host's root filesystem. As noted earlier, only a few locations +such as <code>/dev</code>, <code>/proc</code> and <code>/sys</code> +will be altered. This means that, in the absence of restrictions +from sVirt, a process running as user/group N:M inside the container +will be able to access alnmost exactly the same files as a process +running as user/group N:M in the host. +</p> + +<p> +There are multiple options for restricting this. It is possible to +simply map the existing root filesystem through to the container in +read-only mode. Alternatively a completely separate root filesystem +can be configured for the guest. In both cases, further sub-mounts +can be applied to customize the content that is made visible. Note +that in the absence of sVirt controls, it is still possible for the +root user in a container to unmount any sub-mounts applied. The user +namespace feature can also be used to restrict access to files based +on the UID/GID mappings. 
+</p>
+
+<h3><a name="secureusers">User and group isolation</a></h3>
+
+<p>
+If the guest configuration does not list any ID mapping, then the
+user and group IDs used inside the container will match those used
+outside the container. In addition, the capabilities associated with
+a process in the container will infer the same privileges they would
+for a process in the host. This has obvious implications for security,
+since a root user inside the container will be able to access any
+file owned by root that is visible to the container, and perform more
+or less any privileged kernel operation. In the absence of additional
+protection from sVirt, this means that the root user inside a container
+is effectively as powerful as the root user in the host. There is no
+security isolation of the root user.
+</p>
+
+<p>
+The ID mapping facility was introduced to allow for stricter control
+over the privileges of users inside the container. It allows apps to
+define rules such as "user ID 0 in the container maps to user ID 1000
+in the host". In addition the privileges associated with capabilities
+are somewhat reduced so that they can not be used to escape from the
+container environment. A full description of user namespaces is outside
+the scope of this document, however LWN has
+<a href="https://lwn.net/Articles/532593/">a good writeup on the topic</a>.
+From the libvirt POV, the key thing to remember is that defining an
+ID mapping for users and groups in the container XML configuration
+causes libvirt to activate the user namespace feature.
+</p>
+
+
 <h2><a name="activation">Systemd Socket Activation Integration</a></h2>

 <p>
-- 
1.8.3.1
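To make the three sections above concrete, here is a minimal sketch of
a guest definition that applies all of them together: the <idmap>
element activates the user namespace, the <filesystem> element supplies
a separate root mapped through read-only, and the <interface> element
gives the container a private network namespace (so the <privnet/> flag
is not needed). The element names are the LXC driver's standard domain
XML; the container name, directory path and ID ranges are purely
illustrative.

<domain type='lxc'>
  <name>sandbox</name>
  <memory unit='MiB'>128</memory>
  <os>
    <type>exe</type>
    <init>/sbin/init</init>
  </os>
  <!-- user namespace: root in the container is UID/GID 1000 on the host -->
  <idmap>
    <uid start='0' target='1000' count='1000'/>
    <gid start='0' target='1000' count='1000'/>
  </idmap>
  <devices>
    <!-- a separate root filesystem, mapped through read-only -->
    <filesystem type='mount'>
      <source dir='/export/containers/sandbox/root'/>
      <target dir='/'/>
      <readonly/>
    </filesystem>
    <!-- an interface of its own gives the container a private network namespace -->
    <interface type='network'>
      <source network='default'/>
    </interface>
    <console type='pty'/>
  </devices>
</domain>

As the filesystem section notes, without sVirt the container root can
still unmount sub-mounts, so the ID mapping carries much of the
isolation burden in this sketch.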

On 09/10/2013 04:43 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange@redhat.com>
Describe some of the issues to be aware of when configuring LXC guests with security isolation as a goal.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- docs/drvlxc.html.in | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 93 insertions(+)
(This appears to be the patch promised here [1], but the original cc doesn't show evidence of it reaching Chen: https://www.redhat.com/archives/libvir-list/2013-September/msg00474.html)
+<h2><a name="security">Security considerations</a></h2> + +<p> +The libvirt LXC driver is fairly flexible in how it can be configured, +and as such does not enforce a requirement for strict security +separation between a container and the host. This allows it to be used +in scenarios where only resource control capabilities are important, +and resource sharing is desired. Applications wishing to ensure secure +isolation between a container and the host must ensure that they are +writing a suitable configuration
s/$/./
> +It should be noted that <code>systemd</code> has a UNIX domain socket
> +hich is used for communication by <code>systemctl</code>. Thus, with a

s/hich/which/
> +If the guest confuguration does not list any filesystems, then

s/confuguration/configuration/
> +the container will be setup with a root filesystem that matches

s/setup/set up/
> +will be able to access alnmost exactly the same files as a process

s/alnmost/almost/
> +From the libvirt POV, the key thing to remember is that defining an

s/POV/point of view/
ACK with those fixes.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

-----Original Message-----
From: Daniel P. Berrange [mailto:berrange@redhat.com]
Sent: Tuesday, September 10, 2013 6:44 PM
To: libvir-list@redhat.com
Cc: Chen Hanxiao; Daniel P. Berrange
Subject: [PATCH] Add some notes about security considerations when using LXC
From: "Daniel P. Berrange" <berrange@redhat.com>
Describe some of the issues to be aware of when configuring LXC guests with security isolation as a goal.
Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- docs/drvlxc.html.in | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 93 insertions(+)
diff --git a/docs/drvlxc.html.in b/docs/drvlxc.html.in index 1e6aa1d..dd2e93c 100644 --- a/docs/drvlxc.html.in +++ b/docs/drvlxc.html.in @@ -168,6 +168,99 @@ Further block or character devices will be made available to containers depending on their configuration. </p>
+<h2><a name="security">Security considerations</a></h2> + +<p> +The libvirt LXC driver is fairly flexible in how it can be configured, +and as such does not enforce a requirement for strict security +separation between a container and the host. This allows it to be used +in scenarios where only resource control capabilities are important, +and resource sharing is desired. Applications wishing to ensure secure +isolation between a container and the host must ensure that they are +writing a suitable configuration </p> + +<h3><a name="securenetworking">Network isolation</a></h3> + +<p> +If the guest configuration does not list any network interfaces, the +<code>network</code> namespace will not be activated, and thus the +container will see all the host's network interfaces. This will allow +apps in the container to bind to/connect from TCP/UDP addresses and +ports from the host OS. It also allows applications to access UNIX +domain sockets associated with the host OS. +</p> + +<p> +It should be noted that <code>systemd</code> has a UNIX domain socket +hich is used for communication by <code>systemctl</code>. Thus, with a +container that shares the host's network namespace, it will be possible +for a user in the container to invoke operations on +<code>systemd</code> in the same way it could if outside the container. +In particular this would allow <code>root</code> in the container to do +anything including shutting down the host OS. If this is not desired, +then applications should either specify the UID/GID mapping in the +configuration to enable user namespaces, or should set the +<code><privnet/></code> flag in the <code><features>....</features></code> element. +</p>
There might be too much spotlight on 'systemd' here; users may think
this issue only arrived with systemd. Actually a RHEL 6.4 GA host
without systemd still suffers from the reboot issue, since init
systems like upstart also accept reboot requests over UNIX sockets.
+ + +<h3><a name="securefs">Filesystem isolation</a></h3> + +<p> +If the guest confuguration does not list any filesystems, then the +container will be setup with a root filesystem that matches the host's +root filesystem. As noted earlier, only a few locations such as +<code>/dev</code>, <code>/proc</code> and <code>/sys</code> will be +altered. This means that, in the absence of restrictions from sVirt, a +process running as user/group N:M inside the container will be able to +access alnmost exactly the same files as a process running as +user/group N:M in the host. +</p> + +<p> +There are multiple options for restricting this. It is possible to +simply map the existing root filesystem through to the container in +read-only mode. Alternatively a completely separate root filesystem can +be configured for the guest. In both cases, further sub-mounts can be +applied to customize the content that is made visible. Note that in the +absence of sVirt controls, it is still possible for the root user in a +container to unmount any sub-mounts applied. The user namespace feature +can also be used to restrict access to files based on the UID/GID +mappings. +</p> + +<h3><a name="secureusers">User and group isolation</a></h3> + +<p> +If the guest configuration does not list any ID mapping, then the user +and group IDs used inside the container will match those used outside +the container. In addition, the capabilities associated with a process +in the container will infer the same privileges they would for a +process in the host. This has obvious implications for security, since +a root user inside the container will be able to access any file owned +by root that is visible to the container, and perform more or less any +privileged kernel operation. In the absence of additional protection +from sVirt, this means that the root user inside a container is +effectively as powerful as the root user in the host. There is no +security isolation of the root user. +</p> + +<p> +The ID mapping facility was introduced to allow for stricter control +over the privileges of users inside the container. It allows apps to +define rules such as "user ID 0 in the container maps to user ID 1000 +in the host". In addition the privileges associated with capabilities +are somewhat reduced so that they can not be used to escape from the +container environment. A full description of user namespaces is outside +the scope of this document, however LWN has <a +href="https://lwn.net/Articles/532593/">a good writeup on the topic</a>.
s/ writeup/write-up

On 09/11/2013 10:33 AM, Chen Hanxiao wrote:
> There might be too much spotlight on 'systemd' here; users may think
> this issue only arrived with systemd. Actually a RHEL 6.4 GA host
> without systemd still suffers from the reboot issue, since init
> systems like upstart also accept reboot requests over UNIX sockets.
Yes, there are two kinds of UNIX sockets (see man 7 unix).

One is abstract: this type of UNIX socket is network-namespace aware,
and upstart uses this type to receive reboot messages. So in this case,
we should enable the net namespace.

The other one is pathname: this type is not network-namespace aware,
since it represents a file (inode). systemd uses this type of UNIX
socket to receive reboot messages, so in this case we should make sure
the file isn't shared between host and container; for systemd, this
file is /run/systemd/private.

ACK for other parts of this doc.

Thanks
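In libvirt configuration terms, those two cases map to two different
elements under <devices>: giving the container an interface of its own
activates the network namespace, which hides the host's abstract
sockets (the upstart case), while mounting a container-private
directory over the relevant host path hides pathname sockets such as
/run/systemd/private (the systemd case). A rough sketch; the network
name and source directory are illustrative:

<devices>
  <!-- private network namespace: host abstract sockets become unreachable -->
  <interface type='network'>
    <source network='default'/>
  </interface>

  <!-- private /run: hides pathname sockets such as /run/systemd/private -->
  <filesystem type='mount'>
    <source dir='/export/containers/sandbox/run'/>
    <target dir='/run'/>
  </filesystem>
</devices>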