From: "Daniel P. Berrange" <berrange(a)redhat.com>
Update the LXC driver documentation to describe the way
containers are set up by default. Also describe the common
virsh commands for managing containers and a little about
container security. Placeholders for docs about configuring
containers still to be filled in.
Signed-off-by: Daniel P. Berrange <berrange(a)redhat.com>
---
docs/drvlxc.html.in | 401 +++++++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 366 insertions(+), 35 deletions(-)
diff --git a/docs/drvlxc.html.in b/docs/drvlxc.html.in
index beff214..a745956 100644
--- a/docs/drvlxc.html.in
+++ b/docs/drvlxc.html.in
@@ -3,49 +3,100 @@
<html
xmlns="http://www.w3.org/1999/xhtml">
<body>
<h1>LXC container driver</h1>
+
+ <ul id="toc"></ul>
+
<p>
-The libvirt LXC driver manages "Linux Containers". Containers are sets of
-processes with private namespaces which can (but don't always) look like
-separate machines, but do not have their own OS. Here are two example
-configurations. The first is a very light-weight "application container"
-which does not have its own root image.
+The libvirt LXC driver manages "Linux Containers". At their simplest,
+containers can just be thought of as a collection of processes, separated
+from the main host processes via a set of resource namespaces and
+constrained via control group resource tunables. The libvirt LXC driver
+has no dependency on the LXC userspace tools hosted on sourceforge.net.
+It directly utilizes the relevant kernel features to build the container
+environment. This allows many libvirt technologies to be shared across
+both the QEMU/KVM and LXC drivers, in particular sVirt for mandatory
+access control, auditing of operations, integration with control groups
+and many other features.
</p>
- <h2><a name="project">Project Links</a></h2>
+<h2><a name="cgroups">Control group requirements</a></h2>
- <ul>
- <li>
- The <a href="http://lxc.sourceforge.net/">LXC</a> Linux
- container system
- </li>
- </ul>
+<p>
+In order to control the resource usage of processes inside containers,
+the libvirt LXC driver requires that certain cgroups controllers are
+mounted on the host OS. The minimum required controllers are 'cpuacct',
+'memory' and 'devices', while recommended extra controllers are 'cpu',
+'freezer' and 'blkio'. Libvirt will not mount the cgroups filesystem
+itself, leaving this up to the init system to take care of. Systemd will
+do the right thing in this respect, while for other init systems the
+<code>cgconfig</code> init service will be required. For further
+information, consult the general libvirt
+<a href="cgroups.html">cgroups documentation</a>.
+</p>
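+
+<p>
+On hosts where neither systemd nor the <code>cgconfig</code> service is
+available, the controllers can also be mounted manually. A minimal
+sketch (the <code>/dev/cgroup</code> mount point is arbitrary; any
+empty directory will do):
+</p>
+
+<pre>
+ # mount -t cgroup cgroup /dev/cgroup -o cpuacct,memory,devices,cpu,freezer,blkio
+</pre>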
-<h2>Cgroups Requirements</h2>
+<h2><a name="namespaces">Namespace requirements</a></h2>
<p>
-The libvirt LXC driver requires that certain cgroups controllers are
-mounted on the host OS. The minimum required controllers are 'cpuacct',
-'memory' and 'devices', while recommended extra controllers are
-'cpu', 'freezer' and 'blkio'. The /etc/cgconfig.conf &amp; cgconfig
-init service used to mount cgroups at host boot time. To manually
-mount them use:
+In order to separate processes inside a container from those in the
+primary "host" OS environment, the libvirt LXC driver requires that
+certain kernel namespaces are compiled in. Libvirt currently requires
+the 'mount', 'ipc', 'pid', and 'uts' namespaces to be available. If
+separate network interfaces are desired, then the 'net' namespace is
+required. In the near future, the 'user' namespace will optionally be
+supported.
</p>
-<pre>
- # mount -t cgroup cgroup /dev/cgroup -o cpuacct,memory,devices,cpu,freezer,blkio
-</pre>
+<p>
+<strong>NOTE: In the absence of support for the 'user' namespace,
+processes inside containers cannot be securely isolated from host
+processes without the use of a mandatory access control technology
+such as SELinux or AppArmor.</strong>
+</p>
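+
+<p>
+Whether the required namespaces are available can be checked by
+listing <code>/proc/self/ns</code> on the host. A sketch; the output
+shown assumes a kernel with all namespaces except 'user' compiled in:
+</p>
+
+<pre>
+ # ls /proc/self/ns
+ ipc  mnt  net  pid  uts
+</pre>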
+
+<h2><a name="init">Default container setup</a></h2>
+
+<h3><a name="cliargs">Command line arguments</a></h3>
<p>
-NB, the blkio controller in some kernels will not allow creation of nested
-sub-directories which will prevent correct operation of the libvirt LXC
-driver. On such kernels, it may be necessary to unmount the blkio controller.
+When the container "init" process is started, it will typically
+not be given any command line arguments (e.g. the equivalent of
+the bootloader args visible in <code>/proc/cmdline</code>). If
+any arguments are desired, they must be explicitly set in the
+container XML configuration via one or more <code>initarg</code>
+elements. For example, to run <code>systemd --unit emergency.service</code>,
+the following XML would be used:
</p>
+<pre>
+ <os>
+ <type arch='x86_64'>exe</type>
+ <init>/bin/systemd</init>
+ <initarg>--unit</initarg>
+ <initarg>emergency.service</initarg>
+ </os>
+</pre>
-<h2>Environment setup for the container init</h2>
+<h3><a name="envvars">Environment variables</a></h3>
<p>
When the container "init" process is started, it will be given several useful
-environment variables.
+environment variables. The following standard environment variables
+are mandated by the
+<a href="http://www.freedesktop.org/wiki/Software/systemd/ContainerInte...">systemd
+container interface</a> to be provided by all container technologies
+on Linux.
+</p>
+
+<dl>
+<dt>container</dt>
+<dd>The fixed string <code>libvirt-lxc</code> to identify libvirt as the creator</dd>
+<dt>container_uuid</dt>
+<dd>The UUID assigned to the container by libvirt</dd>
+<dt>PATH</dt>
+<dd>The fixed string <code>/bin:/usr/bin</code></dd>
+<dt>TERM</dt>
+<dd>The fixed string <code>linux</code></dd>
+</dl>
+
+<p>
+In addition to the standard variables, the following libvirt-specific
+environment variables are also provided.
</p>
<dl>
@@ -54,9 +105,152 @@ environment variables.
<dt>LIBVIRT_LXC_UUID</dt>
<dd>The UUID assigned to the container by libvirt</dd>
<dt>LIBVIRT_LXC_CMDLINE</dt>
-<dd>The unparsed command line arguments specified in the container configuration</dd>
+<dd>The unparsed command line arguments specified in the container configuration.
+Use of this is discouraged, in favour of passing arguments directly to the
+container init process via the <code>initarg</code> config element.</dd>
</dl>
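+
+<p>
+As an illustration, an init script inside the container could use the
+standard variables to detect that it is running under libvirt LXC. This
+is a hypothetical sketch, not a script libvirt itself installs:
+</p>
+
+<pre>
+#!/bin/sh
+# Detect whether we were launched as a libvirt LXC container
+if test "$container" = "libvirt-lxc" ; then
+    echo "Booted as libvirt LXC container $container_uuid"
+fi
+</pre>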
+<h3><a name="fsmounts">Filesystem mounts</a></h3>
+
+<p>
+In the absence of any explicit configuration, the container will
+inherit the host OS filesystem mounts. A number of mount points will
+be made read only, or re-mounted with new instances to provide
+container specific data. The following special mounts are set up
+by libvirt:
+</p>
+
+<ul>
+<li><code>/dev</code> a new "tmpfs" pre-populated with authorized device nodes</li>
+<li><code>/dev/pts</code> a new private "devpts" instance for console devices</li>
+<li><code>/sys</code> the host "sysfs" instance remounted read-only</li>
+<li><code>/proc</code> a new instance of the "proc" filesystem</li>
+<li><code>/proc/sys</code> the host "/proc/sys" bind-mounted read-only</li>
+<li><code>/sys/fs/selinux</code> the host "selinux" instance remounted read-only</li>
+<li><code>/sys/fs/cgroup/NNNN</code> the host cgroups controllers bind-mounted to
+only expose the sub-tree associated with the container</li>
+<li><code>/proc/meminfo</code> a FUSE backed file reflecting memory limits of the container</li>
+</ul>
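+
+<p>
+The resulting mount table can be inspected from inside a running
+container, for example via the <code>virsh lxc-enter-namespace</code>
+command described later in this document:
+</p>
+
+<pre>
+# virsh -c lxc:/// lxc-enter-namespace myguest -- /bin/cat /proc/mounts
+</pre>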
+
+
+<h3><a name="devnodes">Device nodes</a></h3>
+
+<p>
+The container init process will be started with <code>CAP_MKNOD</code>
+capability removed &amp; blocked from re-acquiring it. As such it will
+not be able to create any device nodes in <code>/dev</code> or anywhere
+else in its filesystems. Libvirt itself will take care of pre-populating
+the <code>/dev</code> filesystem with any devices that the container
+is authorized to use. The current devices that will be made available
+to all containers are:
+</p>
+
+<ul>
+<li><code>/dev/zero</code></li>
+<li><code>/dev/null</code></li>
+<li><code>/dev/full</code></li>
+<li><code>/dev/random</code></li>
+<li><code>/dev/urandom</code></li>
+<li><code>/dev/stdin</code> symlinked to <code>/proc/self/fd/0</code></li>
+<li><code>/dev/stdout</code> symlinked to <code>/proc/self/fd/1</code></li>
+<li><code>/dev/stderr</code> symlinked to <code>/proc/self/fd/2</code></li>
+<li><code>/dev/fd</code> symlinked to <code>/proc/self/fd</code></li>
+<li><code>/dev/ptmx</code> symlinked to <code>/dev/pts/ptmx</code></li>
+<li><code>/dev/console</code> symlinked to <code>/dev/pts/0</code></li>
+</ul>
+
+<p>
+In addition, for every console defined in the guest configuration,
+a symlink will be created from <code>/dev/ttyN</code> to the
+corresponding <code>/dev/pts/M</code> pseudo TTY device. The
+first console will be <code>/dev/tty1</code>, with further consoles
+numbered incrementally from there.
+</p>
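+
+<p>
+For example, a container configured with two consoles, which would
+appear as <code>/dev/tty1</code> and <code>/dev/tty2</code>, might use
+the following XML (a sketch using the standard console element):
+</p>
+
+<pre>
+  <devices>
+    <console type='pty'/>
+    <console type='pty'/>
+  </devices>
+</pre>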
+
+<p>
+Further block or character devices will be made available to containers
+depending on their configuration.
+</p>
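+
+<p>
+As an illustration only, a host block device might be assigned to a
+container with XML along these lines; the device paths here are
+examples, and the authoritative syntax is described in the domain XML
+format documentation:
+</p>
+
+<pre>
+  <devices>
+    <disk type='block' device='disk'>
+      <source dev='/dev/mapper/guestdata'/>
+      <target dev='sda'/>
+    </disk>
+  </devices>
+</pre>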
+
+<!--
+<h2>Container configuration</h2>
+
+<h3>Init process</h3>
+
+<h3>Console devices</h3>
+
+<h3>Filesystem devices</h3>
+
+<h3>Disk devices</h3>
+
+<h3>Block devices</h3>
+
+<h3>USB devices</h3>
+
+<h3>Character devices</h3>
+
+<h3>Network devices</h3>
+-->
+
+<h2>Container security</h2>
+
+<h3>sVirt SELinux</h3>
+
+<p>
+In the absence of the "user" namespace being used, containers cannot
+be considered secure against exploits of the host OS. The sVirt SELinux
+driver provides a way to secure containers even when the "user" namespace
+is not used. The cost is that writing a policy to allow execution of
+an arbitrary OS is not practical. The SELinux sVirt policy is typically
+tailored to work with a simpler application confinement use case,
+as provided by the "libvirt-sandbox" project.
+</p>
+
+<h3>Auditing</h3>
+
+<p>
+The LXC driver is integrated with libvirt's auditing subsystem, which
+causes audit messages to be logged whenever there is an operation
+performed against a container which has impact on host resources.
+So for example, start/stop and device hotplug will all log audit
+messages providing details about what action occurred &amp; any
+resources associated with it. There are the following 3 types of
+audit messages:
+</p>
+
+<ul>
+<li><code>VIRT_MACHINE_ID</code> - details of the SELinux process and
+image security labels assigned to the container.</li>
+<li><code>VIRT_CONTROL</code> - details of an action / operation
+performed against a container. There are the following types of
+operation:
+ <ul>
+ <li><code>op=start</code> - a container has been started. Provides
+ the machine name, uuid and PID of the <code>libvirt_lxc</code>
+ controller process</li>
+ <li><code>op=init</code> - the init PID of the container has been
+ started. Provides the machine name, uuid and PID of the
+ <code>libvirt_lxc</code> controller process and PID of the
+ init process (in the host PID namespace)</li>
+ <li><code>op=stop</code> - a container has been stopped. Provides
+ the machine name, uuid</li>
+ </ul>
+</li>
+<li><code>VIRT_RESOURCE</code> - details of a host resource
+associated with a container action.</li>
+</ul>
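+
+<p>
+These audit records can be searched on the host with the standard
+audit tools, assuming <code>auditd</code> is running, e.g.
+</p>
+
+<pre>
+# ausearch -m VIRT_CONTROL,VIRT_MACHINE_ID,VIRT_RESOURCE
+</pre>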
+
+<h3>Device access</h3>
+
+<p>
+All containers are launched with the CAP_MKNOD capability cleared
+and removed from the bounding set. Libvirt will ensure that the
+<code>/dev</code> filesystem is pre-populated with all devices that a
+container is allowed to use. In addition, the cgroup "devices"
+controller is configured to block read/write/mknod from all devices
+except those that a container is authorized to use.
+</p>
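+
+<p>
+The enforced device whitelist can be viewed from the host via the
+container's cgroup. A sketch, assuming the 'devices' controller is
+mounted under <code>/sys/fs/cgroup/devices</code>; the exact
+per-container path varies with the cgroup layout:
+</p>
+
+<pre>
+# cat /sys/fs/cgroup/devices/libvirt/lxc/myguest/devices.list
+</pre>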
+
+<h2><a name="exconfig">Example configurations</a></h2>
<h3>Example config version 1</h3>
<p></p>
@@ -121,21 +315,158 @@ debootstrap, whatever) under /opt/vm-1-root:
</domain>
</pre>
+
+<h2><a name="usage">Container usage /
management</a></h2>
+
+<p>
+As with any libvirt virtualization driver, LXC containers can be
+managed via a wide variety of libvirt based tools. At the lowest
+level the <code>virsh</code> command can be used to perform many
+tasks, by passing the <code>-c lxc:///</code> argument. As an
+alternative to repeating the URI with every command, the
<code>LIBVIRT_DEFAULT_URI</code>
+environment variable can be set to <code>lxc:///</code>. The
+examples that follow outline some common operations with virsh
+and LXC. For further details about usage of virsh consult its
+manual page.
+</p>
+
+<h3><a name="usageSave">Defining (saving) container
configuration></a></h3>
+
<p>
-In both cases, you can define and start a container using:</p>
+The <code>virsh define</code> command takes an XML configuration
+document and loads it into libvirt, saving the configuration on disk.
+</p>
+
+<pre>
+# virsh -c lxc:/// define myguest.xml
+</pre>
+
+<h3><a name="usageView">Viewing container
configuration</a></h3>
+
+<p>
+The <code>virsh dumpxml</code> command can be used to view the
+current XML configuration of a container. By default the XML
+output reflects the current state of the container. If the
+container is running, it is possible to explicitly request the
+persistent configuration, instead of the current live configuration,
+using the <code>--inactive</code> flag.
+</p>
+
+<pre>
+# virsh -c lxc:/// dumpxml myguest
+</pre>
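+
+<p>
+To view the persistent configuration of a running container:
+</p>
+
+<pre>
+# virsh -c lxc:/// dumpxml --inactive myguest
+</pre>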
+
+<h3><a name="usageStart">Starting containers</a></h3>
+
+<p>
+The <code>virsh start</code> command can be used to start a
+container from a previously defined persistent configuration.
+</p>
+
+<pre>
+# virsh -c lxc:/// start myguest
+</pre>
+
+<p>
+It is also possible to start so-called "transient" containers,
+which do not require a persistent configuration to be saved
+by libvirt, using the <code>virsh create</code> command.
+</p>
+
<pre>
-virsh --connect lxc:/// define v1.xml
-virsh --connect lxc:/// start vm1
+# virsh -c lxc:/// create myguest.xml
</pre>
-and then get a console using:
+
+
+<h3><a name="usageStop">Stopping containers</a></h3>
+
+<p>
+The <code>virsh shutdown</code> command can be used
+to request a graceful shutdown of the container. By default
+this command will first attempt to send a message to the
+init process via the <code>/dev/initctl</code> device node.
+If no such device node exists, then it will send SIGTERM
+to PID 1 inside the container.
+</p>
+
<pre>
-virsh --connect lxc:/// console vm1
+# virsh -c lxc:/// shutdown myguest
</pre>
-<p>Now doing 'ps -ef' will only show processes in the container, for
-instance. You can undefine it using
+
+<p>
+If the container does not respond to the graceful shutdown
+request, it can be forcibly stopped using the <code>virsh destroy</code>
+command.
</p>
+
<pre>
-virsh --connect lxc:/// undefine vm1
+# virsh -c lxc:/// destroy myguest
</pre>
+
+
+<h3><a name="usageReboot">Rebooting a
container</a></h3>
+
+<p>
+The <code>virsh reboot</code> command can be used
+to request a graceful reboot of the container. By default
+this command will first attempt to send a message to the
+init process via the <code>/dev/initctl</code> device node.
+If no such device node exists, then it will send SIGHUP
+to PID 1 inside the container.
+</p>
+
+<pre>
+# virsh -c lxc:/// reboot myguest
+</pre>
+
+<h3><a name="usageDelete">Undefining (deleting) a container
configuration</a></h3>
+
+<p>
+The <code>virsh undefine</code> command can be used to delete the
+persistent configuration of a container. If the guest is currently
+running, this will turn it into a "transient" guest.
+</p>
+
+<pre>
+# virsh -c lxc:/// undefine myguest
+</pre>
+
+<h3><a name="usageConnect">Connecting to a container
console</a></h3>
+
+<p>
+The <code>virsh console</code> command can be used to connect
+to the text console associated with a container. If the container
+has been configured with multiple console devices, then the
+<code>--devname</code> argument can be used to choose the
+console to connect to.
+</p>
+
+<pre>
+# virsh -c lxc:/// console myguest
+</pre>
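+
+<p>
+For example, to connect to a second console; the alias
+<code>console1</code> is an assumption here, the actual aliases are
+visible in the container XML configuration:
+</p>
+
+<pre>
+# virsh -c lxc:/// console myguest --devname console1
+</pre>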
+
+<h3><a name="usageEnter">Running commands in a
container</a></h3>
+
+<p>
+The <code>virsh lxc-enter-namespace</code> command can be used
+to enter the namespaces &amp; security context of a container
+and then execute an arbitrary command.
+</p>
+
+<pre>
+# virsh -c lxc:/// lxc-enter-namespace myguest -- /bin/ls -al /dev
+</pre>
+
+<h3><a name="usageTop">Monitoring container
utilization</a></h3>
+
+<p>
+The <code>virt-top</code> command can be used to monitor the
+activity &amp; resource utilization of all containers on a
+host.
+</p>
+
+<pre>
+# virt-top -c lxc:///
+</pre>
+
</body>
</html>
--
1.8.1.4