On 09/10/2013 04:43 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange(a)redhat.com>
Describe some of the issues to be aware of when configuring LXC
guests with security isolation as a goal.
Signed-off-by: Daniel P. Berrange <berrange(a)redhat.com>
---
docs/drvlxc.html.in | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 93 insertions(+)
(This appears to be the patch promised in the thread below, but the
original cc list shows no evidence of it reaching Chen:
https://www.redhat.com/archives/libvir-list/2013-September/msg00474.html)
diff --git a/docs/drvlxc.html.in b/docs/drvlxc.html.in
index 1e6aa1d..dd2e93c 100644
--- a/docs/drvlxc.html.in
+++ b/docs/drvlxc.html.in
@@ -168,6 +168,99 @@ Further block or character devices will be made available to containers
depending on their configuration.
</p>
+<h2><a name="security">Security considerations</a></h2>
+
+<p>
+The libvirt LXC driver is fairly flexible in how it can be configured,
+and as such does not enforce a requirement for strict security
+separation between a container and the host. This allows it to be used
+in scenarios where only resource control capabilities are important,
+and resource sharing is desired. Applications wishing to ensure secure
+isolation between a container and the host must ensure that they are
+writing a suitable configuration
s/$/./
+</p>
+
+<h3><a name="securenetworking">Network isolation</a></h3>
+
+<p>
+If the guest configuration does not list any network interfaces,
+the <code>network</code> namespace will not be activated, and thus
+the container will see all the host's network interfaces. This will
+allow apps in the container to bind to/connect from TCP/UDP addresses
+and ports from the host OS. It also allows applications to access
+UNIX domain sockets associated with the host OS.
+</p>
+
+<p>
+It should be noted that <code>systemd</code> has a UNIX domain socket
+hich is used for communication by <code>systemctl</code>. Thus, with a
s/hich/which/
+container that shares the host's network namespace, it will be possible
+for a user in the container to invoke operations on <code>systemd</code>
+in the same way it could if outside the container. In particular this
+would allow <code>root</code> in the container to do anything including
+shutting down the host OS. If this is not desired, then applications
+should either specify the UID/GID mapping in the configuration to enable
+user namespaces, or should set the <code>&lt;privnet/&gt;</code> flag
+in the <code>&lt;features&gt;....&lt;/features&gt;</code> element.
+</p>
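For illustration, the second option in the paragraph above might look
like the following fragment of guest XML. This is a minimal sketch and
not part of the patch; it assumes an otherwise complete LXC guest
definition around it:

```xml
<!-- Sketch only: ask for a private network namespace even though
     the guest lists no <interface> elements. -->
<features>
  <privnet/>
</features>
```

The first option (a UID/GID mapping) uses the <idmap> element that the
user and group isolation section of the patch describes.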
+
+
+<h3><a name="securefs">Filesystem isolation</a></h3>
+
+<p>
+If the guest confuguration does not list any filesystems, then
s/confuguration/configuration/
+the container will be setup with a root filesystem that matches
s/setup/set up/
+the host's root filesystem. As noted earlier, only a few locations
+such as <code>/dev</code>, <code>/proc</code> and <code>/sys</code>
+will be altered. This means that, in the absence of restrictions
+from sVirt, a process running as user/group N:M inside the container
+will be able to access alnmost exactly the same files as a process
s/alnmost/almost/
+running as user/group N:M in the host.
+</p>
+
+<p>
+There are multiple options for restricting this. It is possible to
+simply map the existing root filesystem through to the container in
+read-only mode. Alternatively a completely separate root filesystem
+can be configured for the guest. In both cases, further sub-mounts
+can be applied to customize the content that is made visible. Note
+that in the absence of sVirt controls, it is still possible for the
+root user in a container to unmount any sub-mounts applied. The user
+namespace feature can also be used to restrict access to files based
+on the UID/GID mappings.
+</p>
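As a rough illustration of the two approaches in the paragraph above,
here is a sketch of the corresponding guest XML. It is not part of the
patch; the directory /srv/containers/demo/rootfs is a hypothetical
path, and only one of the two <filesystem> elements would be used in a
given guest:

```xml
<devices>
  <!-- Option 1 (sketch): pass the host root through read-only. -->
  <filesystem type='mount'>
    <source dir='/'/>
    <target dir='/'/>
    <readonly/>
  </filesystem>

  <!-- Option 2 (sketch): a separate root filesystem for the guest.
       The source directory is a hypothetical example path. -->
  <filesystem type='mount'>
    <source dir='/srv/containers/demo/rootfs'/>
    <target dir='/'/>
  </filesystem>
</devices>
```

Further sub-mounts would be added as extra <filesystem> elements with
a different target directory.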
+
+<h3><a name="secureusers">User and group isolation</a></h3>
+
+<p>
+If the guest configuration does not list any ID mapping, then the
+user and group IDs used inside the container will match those used
+outside the container. In addition, the capabilities associated with
+a process in the container will infer the same privileges they would
+for a process in the host. This has obvious implications for security,
+since a root user inside the container will be able to access any
+file owned by root that is visible to the container, and perform more
+or less any privileged kernel operation. In the absence of additional
+protection from sVirt, this means that the root user inside a container
+is effectively as powerful as the root user in the host. There is no
+security isolation of the root user.
+</p>
+
+<p>
+The ID mapping facility was introduced to allow for stricter control
+over the privileges of users inside the container. It allows apps to
+define rules such as "user ID 0 in the container maps to user ID 1000
+in the host". In addition the privileges associated with capabilities
+are somewhat reduced so that they can not be used to escape from the
+container environment. A full description of user namespaces is outside
+the scope of this document, however LWN has
+<a href="https://lwn.net/Articles/532593/">a good writeup on the topic</a>.
+From the libvirt POV, the key thing to remember is that defining an
s/POV/point of view/
+ID mapping for users and groups in the container XML configuration
+causes libvirt to activate the user namespace feature.
+</p>
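For illustration, the rule quoted above ("user ID 0 in the container
maps to user ID 1000 in the host") could be written roughly as below.
This is a sketch and not part of the patch; the count of 10 is an
arbitrary value chosen for the example:

```xml
<!-- Sketch only: container IDs 0..9 map to host IDs 1000..1009.
     The presence of this element is what turns on the user
     namespace for the guest. -->
<idmap>
  <uid start='0' target='1000' count='10'/>
  <gid start='0' target='1000' count='10'/>
</idmap>
```

Here start is the first ID as seen inside the container, target is the
first ID on the host, and count is the size of the mapped range.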
+
+
 <h2><a name="activation">Systemd Socket Activation Integration</a></h2>
<p>
ACK with those fixes.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library
http://libvirt.org