On Tue, Aug 23, 2011 at 4:31 PM, Daniel P. Berrange <berrange(a)redhat.com> wrote:
> On Tue, Aug 23, 2011 at 04:24:46PM +0100, Stefan Hajnoczi wrote:
> > On Tue, Aug 23, 2011 at 12:15 PM, Daniel P. Berrange
> > <berrange(a)redhat.com> wrote:
> > > I was at the KVM Forum / LinuxCon last week and there were many
> > > interesting things discussed which are relevant to ongoing libvirt
> > > development. Here is the list of things that caught my attention. If I
> > > have missed any, fill in the gaps....
> > >
> > > - Sandbox/container KVM. The Solaris port of KVM puts QEMU inside
> > > a zone so that an exploit of QEMU can't escape into the full OS.
> > > Containers are Linux's counterpart to Zones, and while they are not
> > > nearly as secure yet, it would still be worth making more use of
> > > container support to confine QEMU.
> >
> > Can you elaborate on why Linux containers are "not nearly as secure"
> > [as Solaris Zones]?
>
> Mostly because the Linux namespace functionality is far from complete,
> notably lacking proper UID/GID/capability separation, and UID/GID
> virtualization with respect to filesystems. The longer answer is here:
> https://wiki.ubuntu.com/UserNamespace
> So at this time you can't build a secure container on Linux relying
> on DAC alone. You have to add a MAC layer on top of the container
> to get the full security benefits, which obviously defeats the point of
> using the container as a backup for failure in the MAC layer.

Thanks, that is interesting. I still don't understand why that is a
problem. Linux containers (lxc) use a separate PID namespace (so no
ptrace worries), a restricted file system root (confined to a
subdirectory tree), forbid most device nodes, etc. Why does the user
namespace matter for security in this case?

I think it matters when giving multiple containers access to the same
file system. Is that what you'd like to do for libvirt?
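
To make the shared-filesystem concern concrete, here is a toy model in
Python. It is not kernel code: `File`, `host_uid`, `can_read` and the
offsets 100000/200000 are hypothetical, and the per-container offset only
stands in for the UID virtualization that the user namespace work is meant
to provide. It shows that without such a mapping, UID 1000 in one container
and UID 1000 in another are the same UID on disk, and (in this model) root
inside a container is treated as host root.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class File:
    """A file on a filesystem shared by several containers (toy model)."""
    owner_uid: int  # host UID recorded on disk
    mode: int       # permission bits, e.g. 0o600


def host_uid(container_uid: int, uid_offset: int) -> int:
    """Map a container-local UID to the UID the host kernel checks.

    uid_offset == 0 models a container with no user namespace: the UID
    inside the container *is* the host UID. A non-zero offset is a
    hypothetical per-container mapping standing in for UID virtualization.
    """
    return container_uid + uid_offset


def can_read(caller_host_uid: int, f: File) -> bool:
    """Simplified DAC read check (groups and fine-grained capabilities ignored)."""
    if caller_host_uid == 0:
        return True  # host root bypasses the permission bits
    if caller_host_uid == f.owner_uid:
        return bool(f.mode & 0o400)  # owner read bit
    return bool(f.mode & 0o004)      # "other" read bit


# Container A's UID 1000 writes a private file on the shared filesystem.
# Without user namespaces, both containers use offset 0.
private = File(owner_uid=host_uid(1000, 0), mode=0o600)
print(can_read(host_uid(1000, 0), private))  # container B, UID 1000
print(can_read(host_uid(0, 0), private))     # container B, root

# With hypothetical per-container offsets, the same UIDs no longer collide.
private_a = File(owner_uid=host_uid(1000, 100000), mode=0o600)
print(can_read(host_uid(1000, 200000), private_a))  # container B, UID 1000
print(can_read(host_uid(0, 200000), private_a))     # container B, root
```

The first two checks succeed and the last two fail: a separate PID
namespace and root directory do nothing for files that two containers can
both reach, which is where per-container UID mapping would matter.
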
Stefan