This patch adds support for starting a container. Two new source files are
added, lxc_container.h and lxc_container.c. They contain the setup code that
runs within the container namespace prior to exec'ing the user-specified init.

This is a rough outline of the functions involved in starting a container,
and the namespace and process in which each one runs:
lxcVmStart() - runs in the caller's process
    lxcSetupTtyTunnel() - opens a tty and socket pair, tty stored in vmDef
    double fork to separate from the parent process
        grandchild calls lxcStartContainer(), see below
        parent continues
    wait for child process(es)
    if the child process was successful, change vm state to running
    return
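
For illustration, the double fork step could look roughly like the sketch
below. This is not the code from the patch: run_detached() and write_marker()
are made-up names, and the real lxcVmStart() also has to deal with the tty
tunnel and the vm state.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Example payload (hypothetical): write one byte to the fd passed in arg. */
int write_marker(void *arg)
{
    int fd = *(int *)arg;
    return write(fd, "x", 1) == 1 ? 0 : 1;
}

/*
 * Run fn(arg) in a grandchild that is detached from the caller.
 * Returns 0 if the intermediate child exited successfully, -1 otherwise.
 */
int run_detached(int (*fn)(void *), void *arg)
{
    pid_t child = fork();
    if (child < 0)
        return -1;

    if (child == 0) {
        /* Intermediate child: fork again and exit right away, so the
         * grandchild is reparented and the caller never has to reap it. */
        pid_t grandchild = fork();
        if (grandchild < 0)
            _exit(1);
        if (grandchild == 0)
            _exit(fn(arg));     /* grandchild does the real work */
        _exit(0);
    }

    /* Parent: reap only the short-lived intermediate child. */
    int status;
    if (waitpid(child, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Note that the return value here only says the second fork worked. Reporting
whether the container itself came up needs an extra channel between the
grandchild and the caller.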

lxcStartContainer() - runs in parent namespace, child process from lxcVmStart
    allocate stack for container
    clone() - child process will start in lxcChild(), see below
    exit() - once lxcTtyForward returns, the container has exited
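
As a rough sketch of the "allocate stack, then clone()" step, assuming Linux
and glibc: start_cloned_child(), child_exit_code() and the stack size are
illustrative names and values, not taken from the patch. The namespace flags
(for example CLONE_NEWPID | CLONE_NEWNS) need CAP_SYS_ADMIN, so they are left
to the caller here.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>   /* for waitpid() in the caller */

#define CONTAINER_STACK_SIZE (64 * 1024)    /* illustrative size */

/* Example child entry point (hypothetical): exit with the code in arg. */
int child_exit_code(void *arg)
{
    return *(int *)arg;
}

/*
 * Start child_fn(arg) via clone() on a freshly allocated stack.
 * ns_flags carries any namespace flags; pass 0 for a plain child.
 * On success returns the child's pid and stores the stack in *stack_out
 * so the caller can free it once the child has exited.
 */
pid_t start_cloned_child(int (*child_fn)(void *), void *arg,
                         int ns_flags, void **stack_out)
{
    char *stack = malloc(CONTAINER_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows downward on the usual Linux architectures, so
     * clone() is handed the top of the allocation. SIGCHLD makes the
     * child waitable with plain waitpid(). */
    pid_t pid = clone(child_fn, stack + CONTAINER_STACK_SIZE,
                      ns_flags | SIGCHLD, arg);
    if (pid < 0) {
        free(stack);
        return -1;
    }
    *stack_out = stack;
    return pid;
}
```

The return value of the child function becomes the exit status of the cloned
process, which is what the caller sees through waitpid().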

lxcChild() - runs within container, child process from clone()
    mount user filesystems
    mount container /proc
    lxcExecWithTty() - see below, will not return

lxcExecWithTty() - runs within container
    lxcSetupContainerTty() - opens tty for container
    set up SIGCHLD handler
    fork()
        child calls lxcExecContainerInit(), see below
        parent continues
    lxcTtyForward() - shuttles data between file descriptors until flag is set
        in this case between the master end of the container tty
        and the master end of the parent tty
    exit() - when lxcTtyForward returns, container init has exited

lxcExecContainerInit() - runs within container, child process from
                         lxcExecWithTty
    exec container's init
    if exec fails, exit()
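
The fork-then-exec step, with the "exit if exec fails" rule, can be sketched
as follows. spawn_init() is a hypothetical helper, not a function from the
patch, and the exit code 127 is just the conventional shell value for
"command could not be executed".

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork and exec argv[0] in the child.
 * Returns the child's pid in the parent, or -1 if fork() failed.
 */
pid_t spawn_init(char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        execv(argv[0], argv);
        /* Only reached if execv() failed: leave without running any
         * cleanup handlers inherited from the parent. */
        _exit(127);
    }
    return pid;
}
```

The parent learns about a failed exec only through the child's exit status,
so it has to be ready for the child to go away almost immediately.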

There are (at least) a couple of issues I don't have good solutions for:
1) In this setup with a tty console, we end up with at least 2 processes per
container. One process is running the user init. The CMD listed under ps will
be the init as specified in the XML (unless the init changes it to something
else). The other process is forwarding console traffic between the parent and
container pts. The CMD listed in ps for it depends on the mgmt app used to
start the container. Using virsh, it's something like this outside the
container:
  root 10141 1 93 22:05 pts/6 00:27:50
      /home/dlesko/src/dev/libvirt-ss/libvirt/src/.libs/lt-virsh -c lxc:///
and this inside the container:
  root 1 0 93 22:05 pts/6 00:29:19
      /home/dlesko/src/dev/libvirt-ss/libvirt/src/.libs/lt-virsh -c lxc:///
This can be a bit confusing. I'm not sure how important it is but it would be
nice to change this to something a little more meaningful as is done by ssh.
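
One partial option on Linux is prctl(PR_SET_NAME), sketched below. It only
changes the short process name (the "comm" field, at most 15 characters plus
the terminator), as shown by "ps -o comm". The full command line that
"ps -ef" prints comes from the process's argv memory and is not affected, so
this alone would not change the output above. set_short_name() and
get_short_name() are made-up wrappers, and "lxc-console" is just an example
name.

```c
#include <string.h>
#include <sys/prctl.h>

/* Set the short process name ("comm"); longer names are truncated. */
int set_short_name(const char *name)
{
    return prctl(PR_SET_NAME, (unsigned long)name, 0, 0, 0);
}

/* Read the short process name back into a 16-byte buffer. */
int get_short_name(char out[16])
{
    return prctl(PR_GET_NAME, (unsigned long)out, 0, 0, 0);
}
```

Changing what "ps -ef" shows would mean overwriting the argv area in place,
which is more invasive and is not shown here.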
2) The container can stall when nothing is connected to the parent side pty and
console output fills up the buffer. To avoid this, we set the parent side pty
to be non-blocking. The result of this is that we will discard any console
output once the buffer has filled. When a user does connect to the console,
they may get a flood of (potentially very) old data. It would be nice to be
able to provide some more recent output once someone connects to the console.
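
The non-blocking behaviour described above amounts to something like the
following sketch. set_nonblocking() and write_or_drop() are illustrative
names, not functions from the patch; the point is only that a full buffer
turns into EAGAIN, and the data is then thrown away instead of stalling the
writer.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Switch fd to non-blocking mode, keeping its other status flags. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Try to write len bytes to a non-blocking fd.
 * Returns the number of bytes written, 0 if the buffer was full and the
 * data was dropped, or -1 on any other error. On a short write the
 * remaining bytes are dropped as well.
 */
ssize_t write_or_drop(int fd, const void *buf, size_t len)
{
    ssize_t n = write(fd, buf, len);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return n;
}
```

With this scheme the oldest output is what stays in the buffer, which is why
a newly connected console sees stale data first.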
--
Best Regards,
Dave Leskovec
IBM Linux Technology Center
Open Virtualization