-----Original Message-----
From: Daniel P. Berrange [mailto:berrange@redhat.com]
Sent: Tuesday, September 10, 2013 6:44 PM
To: libvir-list(a)redhat.com
Cc: Chen Hanxiao; Daniel P. Berrange
Subject: [PATCH] Add some notes about security considerations when using LXC
From: "Daniel P. Berrange" <berrange(a)redhat.com>
Describe some of the issues to be aware of when configuring LXC guests
with security isolation as a goal.
Signed-off-by: Daniel P. Berrange <berrange(a)redhat.com>
---
 docs/drvlxc.html.in | 93 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 93 insertions(+)
diff --git a/docs/drvlxc.html.in b/docs/drvlxc.html.in
index 1e6aa1d..dd2e93c 100644
--- a/docs/drvlxc.html.in
+++ b/docs/drvlxc.html.in
@@ -168,6 +168,99 @@ Further block or character devices will be made
available to containers depending on their configuration.
</p>
+<h2><a name="security">Security considerations</a></h2>
+
+<p>
+The libvirt LXC driver is fairly flexible in how it can be configured,
+and as such does not enforce a requirement for strict security
+separation between a container and the host. This allows it to be used
+in scenarios where only resource control capabilities are important,
+and resource sharing is desired. Applications wishing to ensure secure
+isolation between a container and the host must ensure that they are
+writing a suitable configuration.
+</p>
+
+<h3><a name="securenetworking">Network isolation</a></h3>
+
+<p>
+If the guest configuration does not list any network interfaces, the
+<code>network</code> namespace will not be activated, and thus the
+container will see all the host's network interfaces. This will allow
+apps in the container to bind to/connect from TCP/UDP addresses and
+ports from the host OS. It also allows applications to access UNIX
+domain sockets associated with the host OS.
+</p>
+
+<p>
+It should be noted that <code>systemd</code> has a UNIX domain socket
+which is used for communication by <code>systemctl</code>. Thus, with a
+container that shares the host's network namespace, it will be possible
+for a user in the container to invoke operations on
+<code>systemd</code> in the same way it could if outside the container.
+In particular this would allow <code>root</code> in the container to do
+anything including shutting down the host OS. If this is not desired,
+then applications should either specify the UID/GID mapping in the
+configuration to enable user namespaces, or should set the
+<code>&lt;privnet/&gt;</code> flag in the <code>&lt;features&gt;....&lt;/features&gt;</code> element.
+</p>
This paragraph may put too much of the spotlight on 'systemd'.
Users might conclude that the issue only exists with systemd.
In fact, RHEL 6.4 GA, which does not use systemd, still suffers from the reboot issue.
Other software such as upstart can also send reboot requests to the host via UNIX sockets.
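To make the rule in the first paragraph concrete, here is a rough sketch of how an application could check a guest config before starting it. It only encodes what the text above says (no interfaces and no privnet flag means the host's network namespace is shared). The function name and the trimmed-down sample configs are hypothetical, and the element paths are my assumption about where these elements sit in the domain XML.

```python
import xml.etree.ElementTree as ET

def shares_host_network(domain_xml):
    """Return True if, by the rule in the doc text, the container would
    see the host's network interfaces."""
    root = ET.fromstring(domain_xml)
    # Assumed locations: <devices><interface> and <features><privnet/>.
    has_interface = root.find("./devices/interface") is not None
    has_privnet = root.find("./features/privnet") is not None
    return not (has_interface or has_privnet)

# Hypothetical minimal configs, trimmed to the relevant elements.
SHARED = "<domain type='lxc'><name>demo</name><devices/></domain>"
PRIVATE = ("<domain type='lxc'><name>demo</name>"
           "<features><privnet/></features><devices/></domain>")

print(shares_host_network(SHARED))   # True
print(shares_host_network(PRIVATE))  # False
```

This deliberately ignores the UID/GID mapping alternative mentioned in the paragraph, since that restricts what root can do rather than activating the network namespace.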
+
+
+<h3><a name="securefs">Filesystem isolation</a></h3>
+
+<p>
+If the guest configuration does not list any filesystems, then the
+container will be set up with a root filesystem that matches the host's
+root filesystem. As noted earlier, only a few locations such as
+<code>/dev</code>, <code>/proc</code> and <code>/sys</code> will be
+altered. This means that, in the absence of restrictions from sVirt, a
+process running as user/group N:M inside the container will be able to
+access almost exactly the same files as a process running as
+user/group N:M in the host.
+</p>
+
+<p>
+There are multiple options for restricting this. It is possible to
+simply map the existing root filesystem through to the container in
+read-only mode. Alternatively a completely separate root filesystem can
+be configured for the guest. In both cases, further sub-mounts can be
+applied to customize the content that is made visible. Note that in the
+absence of sVirt controls, it is still possible for the root user in a
+container to unmount any sub-mounts applied. The user namespace feature
+can also be used to restrict access to files based on the UID/GID
+mappings.
+</p>
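A similar sketch for the filesystem case, in case it helps readers picture the options listed above. It classifies a config by whether a filesystem targeting "/" is listed and whether it is marked read-only. The function name and sample configs are hypothetical. Treating "no entry targeting /" the same as "no filesystems listed" is my assumption and goes slightly beyond what the text states.

```python
import xml.etree.ElementTree as ET

def root_fs_exposure(domain_xml):
    """Classify how the container's root filesystem is configured."""
    root = ET.fromstring(domain_xml)
    for fs in root.findall("./devices/filesystem"):
        target = fs.find("target")
        if target is not None and target.get("dir") == "/":
            if fs.find("readonly") is not None:
                return "explicit root, read-only"
            return "explicit root, read-write"
    # Assumption: with no entry targeting "/", the host's root is used.
    return "host root"

# Hypothetical configs, trimmed to the relevant elements.
NO_FS = "<domain type='lxc'><devices/></domain>"
RO_ROOT = """<domain type='lxc'>
  <devices>
    <filesystem type='mount'>
      <source dir='/'/>
      <target dir='/'/>
      <readonly/>
    </filesystem>
  </devices>
</domain>"""

print(root_fs_exposure(NO_FS))    # host root
print(root_fs_exposure(RO_ROOT))  # explicit root, read-only
```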
+
+<h3><a name="secureusers">User and group isolation</a></h3>
+
+<p>
+If the guest configuration does not list any ID mapping, then the user
+and group IDs used inside the container will match those used outside
+the container. In addition, the capabilities associated with a process
+in the container will confer the same privileges they would for a
+process in the host. This has obvious implications for security, since
+a root user inside the container will be able to access any file owned
+by root that is visible to the container, and perform more or less any
+privileged kernel operation. In the absence of additional protection
+from sVirt, this means that the root user inside a container is
+effectively as powerful as the root user in the host. There is no
+security isolation of the root user.
+</p>
+
+<p>
+The ID mapping facility was introduced to allow for stricter control
+over the privileges of users inside the container. It allows apps to
+define rules such as "user ID 0 in the container maps to user ID 1000
+in the host". In addition the privileges associated with capabilities
+are somewhat reduced so that they cannot be used to escape from the
+container environment. A full description of user namespaces is outside
+the scope of this document, however LWN has <a
+href="https://lwn.net/Articles/532593/">a good writeup on the topic</a>.
s/writeup/write-up/
+From the libvirt POV, the key thing to remember is that defining an ID
+mapping for users and groups in the container XML configuration causes
+libvirt to activate the user namespace feature.
+</p>
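The "user ID 0 in the container maps to user ID 1000 in the host" rule might be easier to follow with a worked example. The sketch below reads start/target/count ranges from an idmap element and translates a container UID to a host UID. The helper names and the count of 10 are made up for illustration, and the element layout reflects my understanding of the idmap XML rather than anything stated in this patch.

```python
import xml.etree.ElementTree as ET

def uid_ranges(domain_xml):
    """Read (start, target, count) tuples from the <idmap><uid> entries."""
    root = ET.fromstring(domain_xml)
    return [(int(e.get("start")), int(e.get("target")), int(e.get("count")))
            for e in root.findall("./idmap/uid")]

def host_id(container_id, ranges):
    """Translate a container ID to a host ID, or None if it is unmapped."""
    for start, target, count in ranges:
        if start <= container_id < start + count:
            return target + (container_id - start)
    return None

# Hypothetical config: container IDs 0-9 map to host IDs 1000-1009.
CONFIG = """<domain type='lxc'>
  <idmap>
    <uid start='0' target='1000' count='10'/>
    <gid start='0' target='1000' count='10'/>
  </idmap>
</domain>"""

ranges = uid_ranges(CONFIG)
print(host_id(0, ranges))   # 1000
print(host_id(5, ranges))   # 1005
print(host_id(10, ranges))  # None
```

So root in this container acts as host user 1000, and any container ID outside the range has no host equivalent.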
+
+
<h2><a name="activation">Systemd Socket Activation Integration</a></h2>
<p>
--
1.8.3.1