[Libvir] [RFC] 4 of 4 Linux Container support - start container

This patch adds the start container support. A couple of new source files are added - lxc_container.h and lxc_container.c. These contain the setup code that runs within the container namespace prior to exec'ing the user specified init.

This is a rough outline of the functions involved in starting a container and the namespace and process under which they run:

lxcVmStart() - runs under caller's process
    lxcSetupTtyTunnel() - opens a tty and socket pair, tty stored in vmDef
    double fork to separate from parent process
        grandchild calls lxcStartContainer() - see below
    parent continues
    wait for child process(es)
    if child process was successful, change vm state to running
    return

lxcStartContainer() - runs in parent namespace, child process from lxcVmStart
    Allocate stack for container
    clone() - child process will start in lxcChild() - see below
    exit() - once lxcTtyForward returns, the container has exited

lxcChild() - runs within container, child process from clone()
    mount user filesystems
    mount container /proc
    lxcExecWithTty() - see below, will not return

lxcExecWithTty() - runs within container
    lxcSetupContainerTty() - opens tty for container
    Set up SIGCHLD handler
    fork()
        Child calls lxcExecContainerInit() - see below
    Parent continues
    lxcTtyForward - shuttles data between file descriptors until flag is set
        in this case between the master end of the container tty and
        the master end of the parent tty
    exit() - when lxcTtyForward returns, container init has exited

lxcExecContainerInit() - runs within container, child process from lxcExecWithTty
    exec container's init
    if exec fails, exit()

There are (at least) a couple of issues I don't have good solutions for -

1) In this setup with a tty console, we end up with at least 2 processes per container. One process is running the user init. The CMD listed under ps will be the init as specified in the XML (unless the init changes it to something else). The other process is forwarding console traffic between the parent and container pts. The CMD listed in ps will depend on the mgmt app used to start the container. Using virsh, it's something like this outside the container:

    root 10141 1 93 22:05 pts/6 00:27:50 /home/dlesko/src/dev/libvirt-ss/libvirt/src/.libs/lt-virsh -c lxc:///

and this inside the container:

    root 1 0 93 22:05 pts/6 00:29:19 /home/dlesko/src/dev/libvirt-ss/libvirt/src/.libs/lt-virsh -c lxc:///

This can be a bit confusing. I'm not sure how important it is, but it would be nice to change this to something a little more meaningful, as is done by ssh.

2) The container can stall when nothing is connected to the parent side pty and console output fills up the buffer. To avoid this, we set the parent side pty to be non-blocking. The result of this is that we will discard any console output once the buffer has filled. When a user does connect to the console, they may get a flood of (potentially very) old data. It would be nice to be able to provide some more recent output once someone connects to the console.

-- 
Best Regards,
Dave Leskovec
IBM Linux Technology Center
Open Virtualization
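As a rough illustration of issue 2 above: one way to hand a newly attached console user recent output rather than very old output would be for the forwarding process to keep only the last N bytes it has seen and replay them on connect. The sketch below is in Python for brevity and is not part of the patch; the class name `RecentOutputBuffer`, the `capacity` parameter, and both method names are hypothetical.

```python
from collections import deque


class RecentOutputBuffer:
    """Hypothetical helper: retain only the newest `capacity` bytes."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest items when full.
        self._buf = deque(maxlen=capacity)

    def write(self, data):
        # Extending with a bytes object appends one int per byte.
        self._buf.extend(data)

    def snapshot(self):
        # Rebuild a bytes object from the retained byte values.
        return bytes(self._buf)


buf = RecentOutputBuffer(capacity=8)
buf.write(b"old data ")
buf.write(b"new data")
recent = buf.snapshot()  # what a newly attached console would be sent
```

The trade-off is the same one described above (some output is still discarded), but what gets dropped is the oldest data instead of the newest.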

On Wed, Mar 19, 2008 at 11:16:32PM -0700, Dave Leskovec wrote:
> This patch adds the start container support. A couple new source files are added - lxc_container.h and lxc_container.c These contain the setup code that runs within the container namespace prior to exec'ing the user specified init.
IMHO there's too much forking going on here. With the stateful driver we should have the daemon be the parent of the forked VM as per the QEMU driver. This will avoid the need to unsafely re-write the config files. It will also enable errors during the domain creation process to be correctly propagated back to the caller.

E.g., when I tested this patch 'mount' failed, but the libvirt driver still thought all was fine because this part of domain creation was being done in the double-fork()d child and thus no errors could be propagated back.

Regards,
Dan.
-- 
|: Red Hat, Engineering, Boston -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
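To make the error-propagation point concrete, here is a minimal sketch (in Python rather than the driver's C) of a parent that forks a single child and learns over a pipe whether the child's setup succeeded. It is an assumption-laden illustration, not the patch's or the QEMU driver's code: `start_with_status` and the `setup` callable are hypothetical names, and a real driver would go on to exec the container init after reporting.

```python
import os


def start_with_status(setup):
    """Fork a child, run setup() in it, and return its status string."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: run the setup step and report the outcome to the parent.
        code = 1
        try:
            os.close(read_fd)
            try:
                setup()
                os.write(write_fd, b"OK")
                code = 0
            except Exception as exc:
                os.write(write_fd, ("ERR " + str(exc)).encode())
        finally:
            # Never fall back into the parent's code path.
            os._exit(code)
    # Parent: close our copy of the write end so read() sees EOF.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks).decode()


def failing_mount():
    # Stand-in for a setup step such as a failed 'mount'.
    raise RuntimeError("mount failed")


status = start_with_status(failing_mount)
```

Because the parent is the direct parent of the child, it can both read the status message and reap the exit code; with a double fork the grandchild's result has no such direct path back.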

DB> IMHO there's too much forking going on here. With the stateful
DB> driver we should have the daemon be the parent of the forked VM as
DB> per the QEMU driver. This will avoid the need to unsafely re-write
DB> the config files. It will also enable errors during the domain
DB> creation process to be correctly propagated back to the caller.

In the case of being called from libvirtd, this seems correct. However, if you're not going through the daemon, I think that the double-fork isolation is important, no? Being a child of something as complex as a CIM server seems like a bad idea to me.

DB> eg, when I tested this patch 'mount' failed, but the libvirt
DB> driver still thought all was fine because this part of domain
DB> creation was being done in the double-fork()d child and thus no
DB> errors could be propagated back.

Perhaps the setup can be performed as part of the immediate child and then do the second fork to achieve the isolation (if desired)?

-- 
Dan Smith
IBM Linux Technology Center
Open Hypervisor Team
email: danms@us.ibm.com

On Fri, Mar 21, 2008 at 10:21:48AM -0700, Dan Smith wrote:
> DB> IMHO there's too much forking going on here. With the stateful
> DB> driver we should have the daemon be the parent of the forked VM as
> DB> per the QEMU driver. This will avoid the need to unsafely re-write
> DB> the config files. It will also enable errors during the domain
> DB> creation process to be correctly propagated back to the caller.
>
> In the case of being called from libvirtd, this seems correct. However, if you're not going through the daemon, I think that the double-fork isolation is important, no? Being a child of something as complex as a CIM server seems like a bad idea to me.

The patch I posted to make it use the state driver means all LXC driver calls go via the daemon. So nothing happens directly as a child of the calling application - it's all in the context of the daemon.

Dan.

DB> The patch I posted to make it use the state driver means all LXC
DB> driver calls go via the daemon. So nothing happens directly as a
DB> child of the calling application - it's all in the context of the
DB> daemon.

Yeah, just saw that. Even better :)

It seems to me that always being in the daemon would allow the console forwarding to happen there instead of being duplicated in every container. I don't really like the idea of having a forwarding process in every container as the parent of the container init.

-- 
Dan Smith
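A rough sketch of what a single forwarding loop in the daemon might look like, as opposed to one forwarding process per container: one select() call watches the source descriptors of every container and copies whatever is available to the matching sink. This is only an illustration in Python under stated assumptions; `forward_once` and the `routes` mapping are hypothetical, and plain pipes stand in for the container and parent ptys.

```python
import os
import select


def forward_once(routes, timeout=0.1):
    """Copy pending bytes for every (source fd -> sink fd) route.

    Returns the number of bytes moved in this pass.
    """
    readable, _, _ = select.select(list(routes), [], [], timeout)
    moved = 0
    for src in readable:
        data = os.read(src, 4096)
        if data:
            # Note: this write can block if the sink is full; a real
            # forwarder would need a policy for that case.
            os.write(routes[src], data)
            moved += len(data)
    return moved


# Two pipes stand in for the container-side and parent-side ttys.
container_out, container_in = os.pipe()
console_out, console_in = os.pipe()
os.write(container_in, b"console output")
moved = forward_once({container_out: console_in})
```

Adding a container would then mean adding an entry to `routes` rather than starting another process.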

On Fri, Mar 21, 2008 at 10:30:36AM -0700, Dan Smith wrote:
> DB> The patch I posted to make it use the state driver means all LXC
> DB> driver calls go via the daemon. So nothing happens directly as a
> DB> child of the calling application - it's all in the context of the
> DB> daemon.
>
> Yeah, just saw that. Even better :)
>
> It seems to me that always being in the daemon would allow the console forwarding to happen there instead of being duplicated in every container. I don't really like the idea of having a forwarding process in every container as the parent of the container init.

The only problem is that if we restart the daemon we lose all state, and can't control the domain anymore I'm afraid, right?

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library http://libvirt.org/
veillard@redhat.com  | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

On Fri, Mar 21, 2008 at 01:33:55PM -0400, Daniel Veillard wrote:
> The only problem is that if we restart the daemon we lose all state, and can't control the domain anymore I'm afraid, right?

Yes, but we have that problem already with the QEMU and network drivers, and there are ways to deal with that which we can implement in a generic manner which works across all drivers.

Dan.

On Fri, Mar 21, 2008 at 05:37:50PM +0000, Daniel P. Berrange wrote:
> Yes, but we have that problem already with the QEMU and network drivers, and there are ways to deal with that which we can implement in a generic manner which works across all drivers.

Among the TODOs for Fedora (but not urgent) are adding /etc/libvirt/lxc to the spec file and requesting access for libvirtd at the SELinux level.

Daniel

Dan Smith wrote:
> DB> The patch I posted to make it use the state driver means all LXC
> DB> driver calls go via the daemon. So nothing happens directly as a
> DB> child of the calling application - it's all in the context of the
> DB> daemon.
>
> Yeah, just saw that. Even better :)
>
> It seems to me that always being in the daemon would allow the console forwarding to happen there instead of being duplicated in every container. I don't really like the idea of having a forwarding process in every container as the parent of the container init.

Right. I need to understand how a devpts namespace impacts this before going too much further down that path.

-- 
Best Regards,
Dave Leskovec

Daniel P. Berrange wrote:
> The patch I posted to make it use the state driver means all LXC driver calls go via the daemon. So nothing happens directly as a child of the calling application - it's all in the context of the daemon.
>
> Dan.

Ok, so I'll remove the forks and clean up the status and state handling.

-- 
Best Regards,
Dave Leskovec
participants (4)
- Dan Smith
- Daniel P. Berrange
- Daniel Veillard
- Dave Leskovec