[Libvir] [RFC] 4 of 4 Linux Container support - start container

19 Mar 2008

This patch adds the start container support.  A couple of new source files are
added: lxc_container.h and lxc_container.c.  These contain the setup code that
runs within the container namespace prior to exec'ing the user-specified init.

This is a rough outline of the functions involved in starting a container and
the namespace and process under which they run:
lxcVmStart() - runs under caller's process
    lxcSetupTtyTunnel() - opens a tty and socket pair, tty stored in vmDef
    double fork to separate from parent process
        grandchild calls lxcStartContainer() see below
        parent continues
    wait for child process(es)
    if child process was successful, change vm state to running
    return
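
For illustration, the double fork in lxcVmStart() could look roughly like the
sketch below.  This is not the code from the patch: run_detached() and
placeholder_container() are hypothetical names, and the sketch only checks the
exit status of the intermediate child.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the work the grandchild does
 * (lxcStartContainer() in the outline above); it just reports success. */
static int placeholder_container(void)
{
    return 0;
}

/* Hypothetical helper: run fn() in a grandchild that is detached from the
 * caller.  Returns 0 if the intermediate child exited cleanly, else -1. */
static int run_detached(int (*fn)(void))
{
    int status;
    pid_t child;

    assert(fn != NULL);

    child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        /* Intermediate child: fork again, then exit right away so the
         * grandchild is orphaned and no longer tied to the caller. */
        pid_t grandchild = fork();
        if (grandchild < 0)
            _exit(1);
        if (grandchild == 0)
            _exit(fn());
        _exit(0);
    }

    /* Caller: reap the intermediate child and check how it exited. */
    if (waitpid(child, &status, 0) != child)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

A caller would then do something like run_detached(placeholder_container) and
only mark the vm as running when it returns 0.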

lxcStartContainer() - runs in parent namespace, child process from lxcVmStart
    Allocate stack for container
    clone() - child process will start in lxcChild() see below
    exit() - once lxcTtyForward returns, the container has exited
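
As a rough sketch of the "allocate stack, then clone()" step: the names
clone_and_wait(), placeholder_child() and CONTAINER_STACK_SIZE are made up for
this example, and the namespace flags are left as a parameter because this mail
does not list the ones the patch uses.  Flags such as CLONE_NEWNS need
privileges, so the sketch also works with 0.  Unlike the outline, the sketch
waits for the child so that the result can be inspected.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CONTAINER_STACK_SIZE (64 * 1024)   /* hypothetical size */

/* Hypothetical stand-in for lxcChild(): exits with the code passed in arg. */
static int placeholder_child(void *arg)
{
    return *(int *)arg;
}

/* Run fn(arg) in a clone()d child and return its exit code, or -1 on error.
 * ns_flags is where namespace flags (an assumption here) would be passed. */
static int clone_and_wait(int (*fn)(void *), void *arg, int ns_flags)
{
    char *stack;
    int status;
    pid_t pid;

    assert(fn != NULL);

    stack = malloc(CONTAINER_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows downward on most architectures, so clone() is given
     * the top of the allocation.  SIGCHLD makes the child waitable. */
    pid = clone(fn, stack + CONTAINER_STACK_SIZE, ns_flags | SIGCHLD, arg);
    if (pid < 0) {
        free(stack);
        return -1;
    }

    if (waitpid(pid, &status, 0) != pid) {
        free(stack);   /* safe: without CLONE_VM the child has its own copy */
        return -1;
    }

    free(stack);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```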

lxcChild() - runs within container, child process from clone()
    mount user filesystems
    mount container /proc
    lxcExecWithTty() - see below, will not return

lxcExecWithTty() - runs within container
    lxcSetupContainerTty() - opens tty for container
    Set up SIGCHLD handler
    fork()
        Child calls lxcExecContainerInit() see below
        Parent continues
    lxcTtyForward - shuttles data between file descriptors until flag is set
                    in this case between the master end of the container tty
                    and the master end of the parent tty
    exit() - when lxcTtyForward returns, container init has exited
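
The forwarding loop described above might be sketched as follows.  This is only
an illustration under assumptions: tty_forward(), copy_chunk(), setup_sigchld()
and the container_exited flag are hypothetical names, the descriptors are
treated as blocking, and poll() with a timeout stands in for whatever the patch
actually uses to wait.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Set by the SIGCHLD handler once the container init has exited. */
static volatile sig_atomic_t container_exited = 0;

static void on_sigchld(int sig)
{
    (void)sig;
    container_exited = 1;
}

static int setup_sigchld(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Copy one chunk from 'from' to 'to'.
 * Returns bytes copied, 0 on EOF, -1 on error. */
static ssize_t copy_chunk(int from, int to)
{
    char buf[512];
    ssize_t n, off = 0;

    assert(from >= 0 && to >= 0);

    do {
        n = read(from, buf, sizeof(buf));
    } while (n < 0 && errno == EINTR);
    if (n <= 0)
        return n;

    while (off < n) {
        ssize_t w = write(to, buf + off, (size_t)(n - off));
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        off += w;
    }
    return n;
}

/* Shuttle data in both directions until the flag is set or one side closes. */
static int tty_forward(int fd_a, int fd_b)
{
    struct pollfd fds[2];
    int i;

    fds[0].fd = fd_a;
    fds[1].fd = fd_b;
    fds[0].events = fds[1].events = POLLIN;

    while (!container_exited) {
        int rc = poll(fds, 2, 500);   /* wake up regularly to check the flag */
        if (rc < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        for (i = 0; rc > 0 && i < 2; i++) {
            if (fds[i].revents & POLLIN) {
                ssize_t n = copy_chunk(fds[i].fd, fds[1 - i].fd);
                if (n < 0)
                    return -1;
                if (n == 0)
                    return 0;         /* EOF on one side */
            } else if (fds[i].revents & (POLLHUP | POLLERR)) {
                return 0;             /* other end went away */
            }
        }
    }
    return 0;
}
```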

lxcExecContainerInit() - runs within container, child process from
                         lxcExecWithTty
    exec container's init
    if exec fails, exit()

There are (at least) a couple of issues I don't have good solutions for:
1) In this setup with a tty console, we end up with at least 2 processes per
container.  One process is running the user init.  The CMD listed under ps will
be the init as specified in the XML (unless init changes it to something else).
The other process is forwarding console traffic between the parent and container
pts.  The CMD listed in ps will depend on the mgmt app used to start the
container.  Using virsh, it's something like this outside the container:
root     10141     1 93 22:05 pts/6    00:27:50
/home/dlesko/src/dev/libvirt-ss/libvirt/src/.libs/lt-virsh -c lxc:///
and this inside the container:
root         1     0 93 22:05 pts/6    00:29:19
/home/dlesko/src/dev/libvirt-ss/libvirt/src/.libs/lt-virsh -c lxc:///
This can be a bit confusing.  I'm not sure how important it is but it would be
nice to change this to something a little more meaningful as is done by ssh.
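
One low-effort option, sketched here as an assumption rather than anything in
the patch, is prctl(PR_SET_NAME).  Note that this only changes the short
process name (the "comm" field that top and "ps -o comm" show, at most 15
characters).  It does not change the full command line in the listing above;
ssh-style title rewriting generally works by overwriting the argv area, which
is more involved.  set_short_name() and the name "lxc-console" are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/prctl.h>

/* Set the short process name ("comm").  Names longer than 15 characters
 * are truncated by the kernel.  Returns 0 on success, -1 on error. */
static int set_short_name(const char *name)
{
    assert(name != NULL);
    return prctl(PR_SET_NAME, (unsigned long)name, 0, 0, 0);
}
```

The forwarding process could call set_short_name("lxc-console") right after
the fork in lxcExecWithTty().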

2) The container can stall when nothing is connected to the parent side pty and
console output fills up the buffer.  To avoid this, we set the parent side pty
to be non-blocking.  The result of this is that we will discard any console
output once the buffer has filled.  When a user does connect to the console,
they may get a flood of (potentially very) old data.  It would be nice to be
able to provide some more recent output once someone connects to the console.
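
One possible direction, which is not what the patch does today, would be for
the forwarding process to keep the most recent output in a small ring buffer
and replay it when a client connects.  A minimal sketch, where struct
console_ring, ring_put(), ring_get_recent() and CONSOLE_RING_SIZE are all
hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define CONSOLE_RING_SIZE 4096   /* hypothetical capacity */

struct console_ring {
    char data[CONSOLE_RING_SIZE];
    size_t head;   /* next write position */
    size_t len;    /* number of valid bytes, at most CONSOLE_RING_SIZE */
};

/* Append n bytes, overwriting the oldest data once the ring is full. */
static void ring_put(struct console_ring *rb, const char *buf, size_t n)
{
    size_t i;

    assert(rb != NULL);
    for (i = 0; i < n; i++) {
        rb->data[rb->head] = buf[i];
        rb->head = (rb->head + 1) % CONSOLE_RING_SIZE;
        if (rb->len < CONSOLE_RING_SIZE)
            rb->len++;
    }
}

/* Copy up to max of the most recent bytes into out, oldest first.
 * Returns the number of bytes copied. */
static size_t ring_get_recent(const struct console_ring *rb, char *out,
                              size_t max)
{
    size_t n, start, i;

    assert(rb != NULL);
    n = rb->len < max ? rb->len : max;
    start = (rb->head + CONSOLE_RING_SIZE - n) % CONSOLE_RING_SIZE;
    for (i = 0; i < n; i++)
        out[i] = rb->data[(start + i) % CONSOLE_RING_SIZE];
    return n;
}
```

With something like this, the forwarder would call ring_put() on everything it
reads from the container tty and send ring_get_recent() to a newly connected
console, so the user sees the latest output instead of the oldest.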

-- 
Best Regards,
Dave Leskovec
IBM Linux Technology Center
Open Virtualization