
On Tue, Aug 23, 2011 at 4:31 PM, Daniel P. Berrange <berrange@redhat.com> wrote:
On Tue, Aug 23, 2011 at 04:24:46PM +0100, Stefan Hajnoczi wrote:
On Tue, Aug 23, 2011 at 12:15 PM, Daniel P. Berrange <berrange@redhat.com> wrote:
I was at the KVM Forum / LinuxCon last week and there were many interesting things discussed which are relevant to ongoing libvirt development. Here is the list of things that caught my attention. If I have missed any, please fill in the gaps...
- Sandbox/container KVM. The Solaris port of KVM puts QEMU inside a zone so that an exploit of QEMU can't escape into the full OS. Containers are Linux's counterpart to Zones, and while not nearly as secure yet, it would still be worth using more of the container support to confine QEMU.
Can you elaborate on why Linux containers are "not nearly as secure" [as Solaris Zones]?
Mostly because the Linux namespace functionality is far from complete, notably lacking proper UID/GID/capability separation, and UID/GID virtualization wrt filesystems. The longer answer is here:
https://wiki.ubuntu.com/UserNamespace
So at this time you can't build a secure container on Linux relying on DAC alone. You have to add a MAC layer on top of the container to get the full security benefits, which obviously defeats the point of using the container as a backup for a failure in the MAC layer.
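To make "UID/GID virtualization" a bit more concrete, here is a toy Python model of the idea. It is only a sketch and not a kernel interface: the names IdRange, to_host_id, IDENTITY and SHIFTED, and the host range starting at 100000, are made up for illustration. The point is that without any ID translation, uid 0 inside the container is uid 0 on the host, so DAC checks on a shared filesystem cannot tell them apart.

```python
from typing import NamedTuple, Optional, Sequence


class IdRange(NamedTuple):
    """Hypothetical mapping of a block of container IDs onto host IDs."""
    inside_start: int   # first ID as seen inside the container
    outside_start: int  # corresponding first ID on the host
    length: int         # number of consecutive IDs covered


def to_host_id(container_id: int, ranges: Sequence[IdRange]) -> Optional[int]:
    """Translate a container UID/GID to the host ID used for DAC checks.

    Returns None when the ID is not covered by any range.
    """
    for r in ranges:
        offset = container_id - r.inside_start
        if 0 <= offset < r.length:
            return r.outside_start + offset
    return None


# No virtualization: every container ID is the same ID on the host,
# so container root is host root.
IDENTITY = [IdRange(0, 0, 2**32)]

# Assumed virtualized setup: container IDs 0..65535 are shifted into an
# unprivileged host range, so container root is an ordinary host user.
SHIFTED = [IdRange(0, 100000, 65536)]

if __name__ == "__main__":
    print("identity, container uid 0 ->", to_host_id(0, IDENTITY))
    print("shifted,  container uid 0 ->", to_host_id(0, SHIFTED))
    print("shifted,  container uid 70000 ->", to_host_id(70000, SHIFTED))
```

In the identity case a process that gains uid 0 inside the container is checked against files as host root, which is why a MAC layer is needed today to fill the gap.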
Thanks, that is interesting. I still don't understand why that is a problem. Linux containers (lxc) use a different pid namespace (no ptrace worries), a separate file system root (restricted to a subdirectory tree), forbid most device nodes, etc. Why does the user namespace matter for security in this case? I think it matters when giving multiple containers access to the same file system. Is that what you'd like to do for libvirt?

Stefan