On Mon, Aug 11, 2008 at 12:17:48PM +1000, James Morris wrote:
> 4. Design Considerations
>
> 4.1 Consensus in preliminary discussion appears to be that adding
>     MAC to libvirt will be the most effective approach. Support
>     may then be extended to virsh, virt-manager, oVirt etc.
I can see a couple of immediate items to address in the libvirt space:
- Need to decide how to ensure the VM is run with the correct
security label instead of the default virt_t.
We cannot assume that all VMs have disks configured. Some VMs may
boot via PXE and use an NFS/iSCSI root filesystem that is not
visible to the host. The implication is that we can't rely on the
labelling of disk files to infer the VM's security context.
This suggests the domain XML format needs to allow for a security
context to be specified at the time the VM is defined/created in
libvirt. libvirt would then have to take steps to make sure the VM is
started with this defined context. Including the context in the XML
would also allow easy extension to the Xen XSM framework in future,
where you specify a context at the time of VM creation which is
passed on to the hypervisor.
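
As a sketch, the domain XML could grow an element along these lines
(the <seclabel> element name and the virt_isolated_t type are purely
illustrative here, not a settled schema):

  <domain type='kvm'>
    <name>demo</name>
    ...
    <!-- illustrative: security context recorded when the VM is defined -->
    <seclabel model='selinux'>
      <label>system_u:system_r:virt_isolated_t:s0</label>
    </seclabel>
  </domain>

libvirt would then launch the QEMU process under this context, rather
than letting it default to virt_t.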
- The storage XML format can already report what label a storage
volume currently has. In addition we need to be able to set the
label.
A few complications (a rough sketch follows this list)...
- We may need to set it in several places, i.e. a VM may be assigned
  a disk based on a stable path such as
    /dev/disk/by-uuid/4cb23887-0d02-4e4c-bc95-7599c85afc1a
  which is a symlink to the real (unstable) device name
    /dev/sda1
  We clearly need to set the label on the real device, but may also
  need to change the label on the symlink too?
- We can't add the new label to the SELinux policy directly,
  because the label needs to be on the unstable device name
  /dev/sdaXXX, which can change across host OS reboots.
  Do we instead add the info to the udev rules, so that when /dev is
  populated at boot time by udev the device nodes get the desired
  initial labelling? Or do we manually chcon() the device
  at the time we boot the VM?
- Some storage types don't allow per-file labelling, e.g. NFS.
  In those scenarios the storage pool is assigned a label and
  all volumes inherit it. So, if two VMs are using NFS files
  and need different labelling, they need to use different
  directories on the NFS server, so that we can have separate
  mount points with appropriate labelling for each.
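
To make this concrete, a rough sketch of the pieces involved. The
volume XML already exposes the current label in its <target>
<permissions> block, along these lines, so the missing piece is being
able to write that element back rather than just read it:

  <volume>
    <name>demo.img</name>
    <target>
      <path>/var/lib/libvirt/images/demo.img</path>
      <permissions>
        <mode>0600</mode>
        <owner>0</owner>
        <group>0</group>
        <label>system_u:object_r:virt_image_t:s0</label>
      </permissions>
    </target>
  </volume>

And the host-side relabelling steps might look like this (the
virt_image_t type is illustrative, not settled policy):

  # Resolve the stable path to the real (unstable) device node
  DEV=$(readlink -f /dev/disk/by-uuid/4cb23887-0d02-4e4c-bc95-7599c85afc1a)

  # Relabel the real device when the VM boots; a plain chcon is lost
  # if udev recreates the node, hence the udev rules question above
  chcon system_u:object_r:virt_image_t:s0 "$DEV"

  # For NFS there is no per-file labelling, so each VM gets its own
  # mount point with its own context= option
  mount -t nfs -o context=system_u:object_r:virt_image_t:s0 \
      nfsserver:/exports/vm1 /var/lib/libvirt/images/vm1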
> 4.2 Initially, sVirt should "just work" as a means to isolate VMs,
>     with minimal administrative interaction. e.g. an option is
>     added to virt-manager which allows a VM to be designated as
>     "isolated", and from then on, it is automatically run in a
>     separate security context, with policy etc. being generated
>     and managed by libvirt.
> 4.3 We need to consider very carefully exactly how VMs will be
>     launched and controlled e.g. overall security ultimately must
>     be enforced by the host kernel, even though VM launch will be
>     initially controlled by host userspace.
> 4.4 We need to consider overall management of labeling both
>     locally and in distributed environments (e.g. oVirt), as well
>     as situations where VMs may be migrated between systems,
>     backed up etc.
We need to define who/what is responsible for ensuring that all hosts
in the cluster have the same policy loaded. Typically libvirt only
aims to provide the mechanism, and not constrain what you do with it.
So perhaps libvirt merely needs to be able to report what policy
version is loaded as part of the host capabilities information.
oVirt (or FreeIPA?) would be responsible for using this info, and also
for ensuring that all hosts have the same policy if desired/required.
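
As a sketch, the host capabilities XML could grow a block along these
lines (the <secmodel> element and its children are hypothetical, not
an agreed format):

  <capabilities>
    <host>
      ...
      <!-- hypothetical: report the security model & loaded policy version -->
      <secmodel>
        <model>selinux</model>
        <policyversion>23</policyversion>
      </secmodel>
    </host>
  </capabilities>

A management app could then compare this across hosts before allowing
a VM migration, for example.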
>     One possible approach may be to represent the security label
>     as the UUID of the guest and then translate that to locally
>     meaningful values as needed.
This implies there needs to be some lookup table of UUID -> security
label mappings on every host in the cluster. This needs to be updated
whenever a new VM is created, which is a fairly significant data sync
task that someone/something needs to take care of. It would be doable
for oVirt or FreeIPA, since they have a network-wide view. virt-manager,
though, has an individual host-centric view of things - it doesn't
consider the broader picture.
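
Concretely, each host would carry something like the following (file
name, format and contexts all hypothetical):

  # /etc/libvirt/uuid-seclabels  (hypothetical mapping file)
  # VM UUID                               local security label
  f81d4fae-7dec-11d0-a765-00a0c91e6bf6    system_u:system_r:virt_isolated_t:s0:c101
  3f2504e0-4f89-11d3-9a0c-0305e82c3301    system_u:system_r:virt_isolated_t:s0:c102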
> 4.5 MAC controls/policy will need to be considered for any control
>     planes (e.g. /dev/kvm).
I should probably point out that there are in fact two ways in which
KVM/QEMU can be used on a host:

- The 'system' instance. There is one of these per host, and it
  currently runs as a privileged user (i.e. root).

- The 'session' instance. There is one of these per user, per host,
  and it runs as that unprivileged user.
The session instances can only utilize KVM acceleration if the host admin
has given them appropriate group/ACL membership to access /dev/kvm. Likewise
they can only access physical devices if they have the necessary group/ACL
membership for the device. Network access is SLIRP based unless the admin
has pre-created TUN devices & given them access.
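
e.g. granting a session instance access to KVM acceleration is just an
ACL or group tweak on the device node (user and group names here are
only examples):

  # Give user 'dan' access to KVM acceleration via an ACL
  setfacl -m u:dan:rw /dev/kvm

  # ...or via group membership, if the distro has a 'kvm' group
  # owning /dev/kvm
  usermod -a -G kvm dan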
I imagine that for this work we'll primarily target the 'system' instance,
and anything that happens to work for the 'session' instances can just be
considered a free bonus.
> 4.10 {lib}semanage needs performance optimization work to reduce
>      impact on the virt toolchain.
Specifically, in libvirt we need to avoid a dependency on Python. For oVirt
we have a requirement that the operating system for a 'managed node' (i.e.
the host running VMs) can be built into a Live CD / PXE bootable image
that is < 64 MB in size. So any new dependencies from libvirt are very
sensitive in terms of on-disk footprint.
Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|