Re: [libvirt] Notes from the KVM Forum relevant to libvirt

24 Aug 2011

On Tue, Aug 23, 2011 at 4:31 PM, Daniel P. Berrange <berrange@redhat.com> wrote:
On Tue, Aug 23, 2011 at 04:24:46PM +0100, Stefan Hajnoczi wrote:
On Tue, Aug 23, 2011 at 12:15 PM, Daniel P. Berrange
<berrange@redhat.com> wrote:
I was at the KVM Forum / LinuxCon last week and there were many
interesting things discussed which are relevant to ongoing libvirt
development. Here is the list of things that caught my attention. If I
have missed any, fill in the gaps....
 - Sandbox/container KVM.  The Solaris port of KVM puts QEMU inside
  a zone so that an exploit of QEMU can't escape into the full OS.
  Containers are Linux's counterpart to Zones, and while not nearly as
  secure yet, it would still be worth using more of the container
  support to confine QEMU.
Can you elaborate on why Linux containers are "not nearly as secure"
[as Solaris Zones]?
Mostly because the Linux namespace functionality is far from complete,
notably lacking proper UID/GID/capability separation, and UID/GID
virtualization wrt filesystems. The longer answer is here:
  https://wiki.ubuntu.com/UserNamespace
So at this time you can't build a secure container on Linux by relying
on DAC alone. You have to add a MAC layer on top of the container
to get the full security benefits, which obviously defeats the point of
using the container as a backup for failure in the MAC layer.
Thanks, that is interesting.  I still don't understand why that is a
problem.  Linux containers (lxc) use a separate pid namespace (no
ptrace worries) and file system root (restricted to a subdirectory tree),
forbid most device nodes, etc.  Why does the user namespace matter
for security in this case?

I think it matters when giving multiple containers access to the same
file system.  Is that what you'd like to do for libvirt?
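
To make the shared file system case concrete, here is a toy model in
Python.  It is only a sketch: File, host_uid() and dac_can_read() are
hypothetical names invented for illustration, not kernel, lxc or libvirt
interfaces, and the check ignores groups, capabilities and MAC entirely.
The per-container UID offsets are likewise an assumption about what a
user namespace mapping might look like.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class File:
    owner_uid: int  # UID recorded for the file on the shared file system
    mode: int       # classic permission bits, e.g. 0o600


def host_uid(container_uid, uid_offset=0):
    """Translate a UID seen inside a container into the UID that is checked.

    uid_offset=0 models containers without a user namespace: UIDs are
    global.  A distinct non-zero offset per container models a
    hypothetical UID mapping.
    """
    return container_uid + uid_offset


def dac_can_read(uid, f):
    """Simplified owner/other DAC read check (no groups, no capabilities)."""
    if uid == 0:
        return True  # root bypasses DAC in this toy model
    if uid == f.owner_uid:
        return bool(f.mode & 0o400)
    return bool(f.mode & 0o004)


# No UID mapping: uid 1000 in container A and uid 1000 in container B
# are the same identity as far as the shared file system is concerned.
secret = File(owner_uid=host_uid(1000), mode=0o600)
print(dac_can_read(host_uid(1000), secret))  # True

# Assumed per-container offsets: the two "uid 1000" users no longer collide.
secret_mapped = File(owner_uid=host_uid(1000, 100000), mode=0o600)
print(dac_can_read(host_uid(1000, 200000), secret_mapped))  # False
```

In this model, root inside an unmapped container is also plain uid 0, so
DAC alone does not stop it from reading another container's files on the
shared file system.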

Stefan