[libvirt] [ANNOUNCE][RFC] sVirt: Integrating SELinux and Linux-based virtualization

This is to announce the formation of the sVirt project, which aims to integrate SELinux and Linux-based virtualization (KVM et al).

The idea has been discussed a few times over the last year or so, and in recent weeks, a few Fedora folk (such as Dan Walsh, Daniel Berrange and myself) have put together some requirements and had some preliminary technical discussions. The requirements document and initial technical considerations are included below inline for review and discussion.

In a nutshell, we'd like to be able to apply distinct security labels to individual VM instances and their resources, so that they can be isolated from each other via MAC policy. We want this to "just work" when configuring and managing VMs via the libvirt toolchain, e.g. provide a simple option during VM creation to make the VM "isolated", and have the toolchain do all the labeling and policy configuration behind the scenes. Greater detail may be found in the requirements document.

If you wish to contribute, please reply to this email with any enhancements to the requirements, technical ideas, issues etc. I'd suggest joining the libvir-list (in the to: line) if not already on it, as that is where the integration work is expected to happen.

There's also a wiki page: http://selinuxproject.org/page/SVirt

------------------------------------------------------------------------

11 Aug 2008

sVirt: Integration of SELinux and Linux-based Virtualization
Requirements and Design Considerations v1.0

1. Introduction

This document establishes a set of functional and technical requirements for integrating SELinux with Linux-based virtualization (i.e. Linux-as-hypervisor schemes such as KVM/Qemu and lguest).

Note that non-Linux hypervisor models such as Xen are not covered explicitly in this document, although they may be afforded some MAC coverage due to shared infrastructure (e.g. libvirt). Non-Linux hypervisor models may be considered further at a later stage.

Also, while this document focuses on SELinux as the MAC scheme, consideration should be given to ensuring support for other label-based MAC schemes.

1.1 Rationale

With increased use of virtualization, one security benefit of physically separated systems -- strong isolation -- is reduced, an issue which may be ameliorated with the application of Mandatory Access Control (MAC) security in the host system.

Integration of MAC with virtualization will also help increase the overall robustness and security assurance of both host and guest systems. Many threats arising from flaws in the VM environment, or misconfiguration, may be mitigated through tighter isolation and specific MAC policy enforcement.

By incorporating MAC support into the virtualization toolchain and documentation, users will also be able to make more use of the MAC security capabilities provided by the OS.

1.2 Use-cases

The following use-cases have been identified:

o Providing virtualized services with a level of isolation similar to that previously afforded by physical separation.

o Increased protection of the host from untrusted VM guests (e.g. for VM hosting providers, grid/cloud servers etc.).

o Increased protection of VM guests from each other in the event of host flaws or misconfiguration. Some protection may also be provided against a compromised host.

o Consolidating access to multiple networks which require strong isolation from each other (e.g. military, government, corporate extranets, internal "Chinese wall" separation etc.)

o Strongly isolating desktop applications by running them in separately labeled VMs (e.g. online banking in one VM and World of Warcraft in another; opening untrusted office documents in an isolated VM for view/print only).

o Forensic analysis of disk images, binaries, browser scripts etc. in a highly isolated VM, protecting both the host system and the integrity of the components under analysis.

o Isolated test environments, preventing e.g. simulated trades from entering a production system, leakage of sensitive customer data to internal networks etc.

o General ability to specify a wide range of high-level security goals for virtualization through flexible MAC policy.

2. Current Situation

SELinux is already able to provide general MAC protection to Linux-based virtualization, such as protecting the integrity and confidentiality of disk images and providing strong isolation of Linux hypervisor processes from the rest of the system.

There is, however, no explicit support for Linux-based virtualization in SELinux, and all VMs currently run in the same security context. Thus, there is no MAC isolation applied between VMs.

3. Functional Security Requirements

3.1 Strong isolation between active VMs

Running different VMs with different MAC labels allows stronger isolation between VMs, reducing risks arising from flaws in or misconfiguration of the VM environment. For example, the threat of a rogue VM exploiting a flaw in KVM could be greatly reduced if it is isolated from other VMs by MAC security policy.

3.2 Improved control over access to VM resources

Distinct MAC labeling of resources belonging to VMs (disk images, disk partitions etc.) binds VM instances to those resources, ensuring that VMs can only access their own resources. This can protect the VM from invalid VM resources, and protect VM resources from flawed or misconfigured VMs.

3.3 Improved control over access to shared resources

Where VMs may share resources (e.g. miscellaneous devices, virtual networking, read-only disk images etc.), fine-grained MAC policy may be specified to control information flow between VMs.

3.4 Fine-grained interaction with the host

With distinct labeling of VMs and their resources, interactions with host entities on a per-VM basis may then be controlled by MAC policy. An example of this would be to safely allow different users on the host to administer different VMs. Configuration of this could be managed via the RBAC scheme integrated with SELinux.

3.5 Coarse MAC containment of VMs

High-level security constraints may be applied to different VMs, to allow simplified lock-down of the overall system. For example, a VM may be restricted so that it cannot transmit TCP traffic on port 25 via virtual networking (i.e. the guest cannot be used as a spam bot even if it is compromised via a rootkit).

3.6 Leverage general MAC architecture

As MAC is extended to the desktop, it will be possible to apply uniform MAC policy to the OS, desktop and Linux-based virtualization components. This will provide a basis for a variety of sophisticated security applications, such as a virtualized MLS desktop with different windows running VMs at different security levels.

4. Design Considerations

4.1 Consensus in preliminary discussion appears to be that adding MAC to libvirt will be the most effective approach. Support may then be extended to virsh, virt-manager, oVirt etc.

4.2 Initially, sVirt should "just work" as a means to isolate VMs, with minimal administrative interaction. e.g. an option is added to virt-manager which allows a VM to be designated as "isolated", and from then on, it is automatically run in a separate security context, with policy etc. being generated and managed by libvirt.

4.3 We need to consider very carefully exactly how VMs will be launched and controlled, e.g. overall security ultimately must be enforced by the host kernel, even though VM launch will be initially controlled by host userspace.

4.4 We need to consider overall management of labeling both locally and in distributed environments (e.g. oVirt), as well as situations where VMs may be migrated between systems, backed up etc. One possible approach may be to represent the security label as the UUID of the guest and then translate that to locally meaningful values as needed.

4.5 MAC controls/policy will need to be considered for any control planes (e.g. /dev/kvm).

4.6 We need to ensure that we have high-level management tools available from the start, avoiding past mistakes with SELinux where we dropped a bunch of complicated pieces in the user's lap. All aspects of sVirt must be managed via the virt tools, which should present only high-level abstractions to the user (and then, only if essential).

4.7 As sVirt involves creating and managing arbitrary numbers of security labels, we will need to address the effects of label space explosion and resulting complexity. Possible approaches already discussed include:

- SELinux RBAC or IBAC mechanisms.

- MCS labels. This has the possible advantage of not requiring any policy changes for simple isolation: just have a general policy which applies to all MCS labels, and possibly customize behavior via the type. e.g.

      system_u:object_r:virt_image_t:c1
                        ^            ^
                        type         mcs label

  'c0' = isolated inactive VM image/device.
  'cN' = dynamically assigned MCS label for active VM, bound to the UUID, and translated via MCS translation file, i.e. a user running ls -Z or ps -Z will see the UUID instead of cN.
  'virt_image_t' = standard VM which is also isolated if an MCS label is present.
  'virt_image_nonet_t' = VM with no network access at all.
  etc.

  So, MCS is used to enforce VM isolation, and Type is used to enforce general security behavior. We can then provide a high-level GUI interface for selecting this behavior without the admin knowing anything about SELinux.

  Note: proof of concept testing has been performed using MCS labels, which appears to be workable at this stage.

- Utilize the new hierarchical types being proposed upstream by NEC. (No analysis done yet.)

4.8 Accessing images via shared storage will present challenges, as we do not yet have labeling for networked filesystems.

4.9 We must ensure that any MAC support integrated into libvirt is readily debuggable, e.g. hook into audispd to actively process policy violations and handle them via the virt toolchain; also develop good setroubleshoot plugins.

4.10 {lib}semanage needs performance optimization work to reduce impact on the virt toolchain.

------------------------------------------------------------------------
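As a rough sketch of how the toolchain side of 4.7 might look (illustrative code only: the category allocation scheme and context string are assumptions, while setexeccon() and security_check_context() are existing libselinux calls):

    #include <stdio.h>
    #include <unistd.h>
    #include <selinux/selinux.h>

    /* Sketch: launch qemu for an "isolated" VM under a per-VM MCS
     * category. Type enforcement (qemu_t) supplies the general policy;
     * the category (cN) isolates this instance from other VMs. */
    int launch_isolated_vm(const char *image, unsigned int category)
    {
        char con[256];

        snprintf(con, sizeof(con),
                 "system_u:system_r:qemu_t:s0:c%u", category);

        if (security_check_context(con) < 0)
            return -1;                 /* not valid under loaded policy */

        if (setexeccon(con) < 0)       /* label the next exec() */
            return -1;

        pid_t child = fork();
        if (child == 0) {
            execlp("qemu-kvm", "qemu-kvm", "-hda", image, (char *)NULL);
            _exit(127);
        }
        setexeccon(NULL);              /* clear exec context in parent */
        return child < 0 ? -1 : 0;
    }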

James Morris wrote:
This is to announce the formation of the sVirt project, which aims to integrate SELinux and Linux-based virtualization (KVM et al).
The idea has been discussed a few times over the last year or so, and in recent weeks, a few Fedora folk (such as Dan Walsh, Daniel Berrange and myself) have put together some requirements and had some preliminary technical discussions.
The requirements document and initial technical considerations are included below inline for review and discussion.
In a nutshell, we'd like to be able to apply distinct security labels to individual VM instances and their resources, so that they can be isolated from each other via MAC policy. We want this to "just work" when configuring and managing VMs via the libvirt toolchain, e.g. provide a simple option during VM creation to make the VM "isolated", and have the toolchain do all the labeling and policy configuration behind the scenes. Greater detail may be found in the requirements document.
If you wish to contribute, please reply to this email with any enhancements to the requirements, technical ideas, issues etc. I'd suggest joining the libvir-list (in the to: line) if not already on it, as that is where the integration work is expected to happen.
There's also a wiki page: http://selinuxproject.org/page/SVirt
------------------------------------------------------------------------
11 Aug 2008
sVirt: Integration of SELinux and Linux-based Virtualization
Requirements and Design Considerations v1.0
1. Introduction
This document establishes a set of functional and technical requirements for integrating SELinux with Linux-based virtualization (i.e. Linux-as-hypervisor schemes such as KVM/Qemu and lguest).
Note that non-Linux hypervisor models such as Xen are not covered explicitly in this document, although they may be afforded some MAC coverage due to shared infrastructure (e.g. libvirt). Non-Linux hypervisor models may be considered further at a later stage.
Also, while this document focuses on SELinux as the MAC scheme, consideration should be given to ensuring support for other label-based MAC schemes.
1.1 Rationale
With increased use of virtualization, one security benefit of physically separated systems -- strong isolation -- is reduced,
This issue can always be readily resolved by going back to physically separated hardware. The strength of isolation of a virtual machine mechanism is one of the important characteristics that should be considered when it is chosen over a hardware isolation scheme, but the systems available today appear to do a pretty good job on this, so I would like to see some evidence of this claim before accepting it as a primary premise.
an issue which may be ameliorated with the application of Mandatory Access Control (MAC) security in the host system.
Ok. I've been using VMs for 30 years and MAC for 20, and to my mind the only way this statement can possibly be true is if both the MAC system and the VM system are inadequate to their respective tasks.
Integration of MAC with virtualization will also help increase the overall robustness and security assurance of both host and guest systems. Many threats arising from flaws in the VM environment, or misconfiguration, may be mitigated through tighter isolation and specific MAC policy enforcement.
You are not going to solve any problems of misconfiguration this way, you are just going to make the environment more complicated and prone to misconfiguration.
By incorporating MAC support into the virtualization toolchain and documentation, users will also be able to make more use of the MAC security capabilities provided by the OS.
Would you back this assertion? VMs seem no better a vehicle for MAC integration than user sessions in any way that I can see.
1.2 Use-cases
The following use-cases have been identified:
o Providing virtualized services with a level of isolation similar to that previously afforded by physical separation.
As I mentioned before, this doesn't seem to make any sense. I can see the value in running a MAC system under a VM in limited cases, but the other way around does not seem particularly rational.
o Increased protection of the host from untrusted VM guests (e.g. for VM hosting providers, grid/cloud servers etc.).
I can see where someone who is not familiar with VMs might think this is plausible, but looking at the interfaces and separation provided by VMs makes it pretty clear that this isn't even a belts and braces level of extra protection.
o Increased protection of VM guests from each other in the event of host flaws or misconfiguration. Some protection may also be provided against a compromised host.
Flaws and misconfiguration will not be helped by this.
o Consolidating access to multiple networks which require strong isolation from each other (e.g. military, government, corporate extranets, internal "Chinese wall" separation etc.)
The VMs provide this. Isolation is easy. Sharing is what's hard.
o Strongly isolating desktop applications by running them in separately labeled VMs (e.g. online banking in one VM and World of Warcraft in another; opening untrusted office documents in an isolated VM for view/print only).
And how does a MAC environment help this when we still barely have SELinux X11 limping along?
o Forensic analysis of disk images, binaries, browser scripts etc. in a highly isolated VM, protecting both the host system and the integrity of the components under analysis.
You're just restating "stronger isolation".
o Isolated test environments, preventing e.g. simulated trades from entering a production system, leakage of sensitive customer data to internal networks etc.
You're just restating "stronger isolation".
o General ability to specify a wide range of high-level security goals for virtualization through flexible MAC policy.
What does that mean in this context? How is it useful? Usually by the time someone decides that they need to use physical separation or that they can simulate it with VMs they've already decided that fancy software schemes like MAC are insufficient, and that's often because the MAC system (SELinux or B&L) is too hard to set up for their uses.
2. Current Situation
SELinux is already able to provide general MAC protection to Linux-based virtualization, such as protecting the integrity and confidentiality of disk images and providing strong isolation of Linux hypervisor processes from the rest of the system.
There is, however, no explicit support for Linux-based virtualization in SELinux, and all VMs currently run in the same security context. Thus, there is no MAC isolation applied between VMs.
You can run them with different MLS values, can't you?
3. Functional Security Requirements
3.1 Strong isolation between active VMs
Running different VMs with different MAC labels allows stronger isolation between VMs, reducing risks arising from flaws in or misconfiguration of the VM environment. For example, the threat of a rogue VM exploiting a flaw in KVM could be greatly reduced if it is isolated from other VMs by MAC security policy.
You can run them with different MLS values, can't you?
3.2 Improved control over access to VM resources
... OK, I get the picture. You really want to run VMs under SELinux. Go wild, but I think you are significantly overstating the value and creating a project where a little bit of policy work ought to handle all but the most extreme cases. DAC, MAC, Containers, VMs, and separate hardware are all mechanisms for providing (in ascending order) measures of isolation. It makes sense to tighten a DAC system by adding MAC, or running a MAC system under a VM, but to each the sort of isolation it is good at.

On Sun, 10 Aug 2008, Casey Schaufler wrote:
1.1 Rationale
With increased use of virtualization, one security benefit of physically separated systems -- strong isolation -- is reduced,
This issue can always be readily resolved by going back to physically separated hardware. The strength of isolation of a virtual machine mechanism is one of the important characteristics that should be considered when it is chosen over a hardware isolation scheme, but the systems available today appear to do a pretty good job on this, so I would like to see some evidence of this claim before accepting it as a primary premise.
I suspect you misunderstood an important aspect of this in that we are targeting Linux-based virtualization, where the VMs are running inside Linux processes. In this case, the isolation depends on DAC in the host, and the idea behind sVirt is to apply MAC security to these VMs and their resources. Currently, all such VMs run with the same security label, and all of the resources accessed (e.g. disk images) have the same labels. We are simply proposing a mechanism to allow distinct security labels to be applied to these entities, which in the simplest case, will allow MAC policy to enforce isolation between them.
an issue which may be ameliorated with the application of Mandatory Access Control (MAC) security in the host system.
Ok. I've been using VMs for 30 years and MAC for 20, and to my mind the only way this statement can possibly be true is if both the MAC system and the VM system are inadequate to their respective tasks.
I'm not sure how you come to the conclusion that the MAC system must be inadequate, but this scheme does attempt to improve the robustness of isolation between Linux-based VMs. We're applying the idea of containing applications in the face of potential flaws and misconfiguration to the case of applications which happen to be VMs. In this sense, it is not different to containing any other form of application (e.g. a flaw in the VM might be exploited by malicious code).
Integration of MAC with virtualization will also help increase the overall robustness and security assurance of both host and guest systems. Many threats arising from flaws in the VM environment, or misconfiguration, may be mitigated through tighter isolation and specific MAC policy enforcement.
You are not going to solve any problems of misconfiguration this way, you are just going to make the environment more complicated and prone to misconfiguration.
I don't think this is valid as an absolute statement. e.g. by adding distinct labels to VMs and their resources at the toolchain level, the admin will not be involved beyond specifying that the VM is to be created as an "isolated" instance. The VMs will simply be running with different labels and bound to their own resources with those labels, enforced by MAC policy.

If an admin manually launches a VM and accidentally specifies an incorrect disk image, MAC policy will prevent the VM from accessing the image. There is nothing characteristically different between this and the general rationale for MAC in the OS, e.g. to ensure that a web server can only serve appropriately labeled content.

The risk that the MAC system itself could be misconfigured does not change with sVirt, although in the proposed scheme, VM labeling will be automated. Again, keep in mind that we're talking about Linux-based virtualization, where the VM is literally just another application running in the host OS.
By incorporating MAC support into the virtualization toolchain and documentation, users will also be able to make more use of the MAC security capabilities provided by the OS.
Would you back this assertion? VMs seem no better a vehicle for MAC integration than user sessions in any way that I can see.
I don't understand what needs to be backed here. Currently, MAC is not used to separate different Linux-based VMs, and by integrating MAC support, people will be able to further utilize MAC.
1.2 Use-cases
The following use-cases have been identified:

o Providing virtualized services with a level of isolation similar to that previously afforded by physical separation.
As I mentioned before, this doesn't seem to make any sense. I can see the value in running a MAC system under a VM in limited cases, but the other way around does not seem particularly rational.
Please be specific: in what way is it not rational?
o Increased protection of the host from untrusted VM guests (e.g. for VM hosting providers, grid/cloud servers etc.).
I can see where someone who is not familiar with VMs might think this is plausible, but looking at the interfaces and separation provided by VMs makes it pretty clear that this isn't even a belts and braces level of extra protection.
I disagree. VMs are software, and all software has bugs. If you can break out of the VM in Linux-based virtualization, you can probably get to all of the other VMs on the system (they are running in the same DAC and MAC contexts). Applying distinct labels allows MAC policy to be enforced by the host kernel so that such breaches are contained within the isolated host VM process. Or are you saying that you don't think hypervisors can be broken out of?
o Increased protection of VM guests from each other in the event of host flaws or misconfiguration. Some protection may also be provided against a compromised host.
Flaws and misconfiguration will not be helped by this.
I don't see why not. With MAC containment, if, say, a web server on the host was compromised, an attacker may be prevented from interfering with the VMs running on the system. This is already true to some extent with coarse-grained MAC.
o Consolidating access to multiple networks which require strong isolation from each other (e.g. military, government, corporate extranets, internal "Chinese wall" separation etc.)
The VMs provide this. Isolation is easy. Sharing is what's hard.
Again, it's important to understand that these VMs are merely Linux processes and are only currently afforded the same level of isolation as standard DAC. DAC is of course considered inadequate as a means to protect against a range of common threats including software flaws. Please refer to: http://www.nsa.gov/selinux/papers/inevitability/ (I know you know this, but not everyone reading this thread is familiar with the case for MAC).
o Strongly isolating desktop applications by running them in separately labeled VMs (e.g. online banking in one VM and World of Warcraft in another; opening untrusted office documents in an isolated VM for view/print only).
And how does a MAC environment help this when we still barely have SELinux X11 limping along?
When they progress to walking.
o Forensic analysis of disk images, binaries, browser scripts etc. in a highly isolated VM, protecting both the host system and the integrity of the components under analysis.
You're just restating "stronger isolation".
Yes, but these are use-cases. They are condensed into requirements further on.
o General ability to specify a wide range of high-level security goals for virtualization through flexible MAC policy.
What does that mean in this context? How is it useful?
It means going beyond simple isolation and specifying security goals which are then mapped to policy. So, you can design a system in the security problem domain, rather than, say, the security model domain (e.g. BLP). Note that this is a logical extension of the system aimed at advanced users. In the default case, VMs will simply be isolated from each other via MAC.
Usually by the time someone decides that they need to use physical separation or that they can simulate it with VMs they've already decided that fancy software schemes like MAC are insufficient, and that's often because the MAC system (SELinux or B&L) is too hard to set up for their uses.
That's not my experience, but I guess anything can happen.
2. Current Situation
SELinux is already able to provide general MAC protection to Linux-based virtualization, such as protecting the integrity and confidentiality of disk images and providing strong isolation of Linux hypervisor processes from the rest of the system. There is, however, no explicit support for Linux-based virtualization in SELinux, and all VMs currently run in the same security context. Thus, there is no MAC isolation applied between VMs.
You can run them with different MLS values, can't you?
There is no explicit support for doing this, and in fact, that is one of the things that sVirt will address. In SELinux, an MLS label is simply part of the overall security context.
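(For reference, the components of such a context break down as follows, using a label of the kind proposed above:

    system_u : system_r : qemu_t : s0:c1
      user      role      type     MLS/MCS level (sensitivity s0,
                                   category c1)

so an MCS-based sVirt scheme varies only the final field per VM.)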
OK, I get the picture. You really want to run VMs under SELinux. Go wild, but I think you are significantly overstating the value and creating a project where a little bit of policy work ought to handle all but the most extreme cases.
The proof of concept code is indeed simple policy/labeling changes, although we want to ensure that we fully understand the requirements, and implement a flexible and generally useful scheme. Support for this also needs to be built directly into the virt toolchain so that the user is provided with a complete solution, rather than a collection of pieces.
DAC, MAC, Containers, VMs, and separate hardware are all mechanisms for providing (in ascending order) measures of isolation. It makes sense to tighten a DAC system by adding MAC, or running a MAC system under a VM, but to each the sort of isolation it is good at.
So, in the case of a Linux-based VM running as a process in a DAC context, we are indeed tightening a DAC system by adding MAC.

- James

--
James Morris <jmorris@namei.org>

James Morris wrote:
On Sun, 10 Aug 2008, Casey Schaufler wrote:
1.1 Rationale
With increased use of virtualization, one security benefit of physically separated systems -- strong isolation -- is reduced,
This issue can always be readily resolved by going back to physically separated hardware. The strength of isolation of a virtual machine mechanism is one of the important characteristics that should be considered when it is chosen over a hardware isolation scheme, but the systems available today appear to do a pretty good job on this, so I would like to see some evidence of this claim before accepting it as a primary premise.
I suspect you misunderstood an important aspect of this in that we are targeting Linux-based virtualization, where the VMs are running inside Linux processes. In this case, the isolation depends on DAC in the host, and the idea behind sVirt is to apply MAC security to these VMs and their resources.
Currently, all such VMs run with the same security label, and all of the resources accessed (e.g. disk images) have the same labels.
We are simply proposing a mechanism to allow distinct security labels to be applied to these entities, which in the simplest case, will allow MAC policy to enforce isolation between them.
Well, let's look at the situation and see what sort of risk we're talking about. A VM running inside a Linux process is subject to the same kinds of vulnerabilities as any other program, to be sure. So the problem you are looking to address is that the label of the process is based on the label of the program file. This problem is hardly unique to VMs, as anyone who wants to run two web servers with different MLS values will tell you.
...
You are not going to solve any problems of misconfiguration this way, you are just going to make the environment more complicated and prone to misconfiguration.
I don't think this is valid as an absolute statement.
You're correct. It is a strong opinion that I hold. My strong opinions and $4.55 will get you a nice coffee at Peets.
...
I can see where someone who is not familiar with VMs might think this is plausible, but looking at the interfaces and separation provided by VMs makes it pretty clear that this isn't even a belts and braces level of extra protection.
I disagree.
VMs are software, and all software has bugs. If you can break out of the VM in Linux-based virtualization, you can probably get to all of the other VMs on the system (they are running in the same DAC and MAC contexts).
Assuming again that you're using program based MAC. A traditional B&L system where the process retains its label through exec() would not have this shortcoming.
Applying distinct labels allows MAC policy to be enforced by the host kernel so that such breaches are contained within the isolated host VM process.
Or are you saying that you don't think hypervisors can be broken out of?
Not any more than web servers, name servers, or game programs, many of which don't do particularly well in a program based MAC environment, either.
o Increased protection of VM guests from each other in the event of host flaws or misconfiguration. Some protection may also be provided against a compromised host.
Flaws and misconfiguration will not be helped by this.
I don't see why not. With MAC containment, if, say, a web server on the host was compromised, an attacker may be prevented from interfering with the VMs running on the system. This is already true to some extent with coarse-grained MAC.
Sure, there are some flaws and misconfigurations that will be caught, but I would never count on it as a security feature in an environment that matters.
...
OK, I get the picture. You really want to run VMs under SELinux. Go wild, but I think you are significantly overstating the value and creating a project where a little bit of policy work ought to handle all but the most extreme cases.
The proof of concept code is indeed simple policy/labeling changes, although we want to ensure that we fully understand the requirements, and implement a flexible and generally useful scheme.
Support for this also needs to be built directly into the virt toolchain so that the user is provided with a complete solution, rather than a collection of pieces.
How do you envision the Virt toolchain changing to support SELinux? I confess to being pretty curious about what you think needs to change.
DAC, MAC, Containers, VMs, and separate hardware are all mechanisms for providing (in ascending order) measures of isolation. It makes sense to tighten a DAC system by adding MAC, or running a MAC system under a VM, but to each the sort of isolation it is good at.
So, in the case of a Linux-based VM running as a process in a DAC context, we are indeed tightening a DAC system by adding MAC.
Yeah, you're right there.

On Monday 11 August 2008 19:31, James Morris <jmorris@namei.org> wrote:
I suspect you misunderstood an important aspect of this in that we are targeting Linux-based virtualization, where the VMs are running inside Linux processes. In this case, the isolation depends on DAC in the host, and the idea behind sVirt is to apply MAC security to these VMs and their resources.
Currently, all such VMs run with the same security label, and all of the resources accessed (e.g. disk images) have the same labels.
http://en.wikipedia.org/wiki/NetTop

So it's basically a free implementation of NetTop?
an issue which may be ameliorated with the application of Mandatory Access Control (MAC) security in the host system.
Ok. I've been using VMs for 30 years and MAC for 20, and to my mind the only way this statement can possibly be true is if both the MAC system and the VM system are inadequate to their respective tasks.
I'm not sure how you come to the conclusion that the MAC system must be inadequate, but this scheme does attempt to improve the robustness of isolation between Linux-based VMs.
I think that Casey's idea is that if someone breaks the VM separation then you lose it all.

For separation based on UML there are obvious benefits to having different labels for processes and files, so that if someone cracks the UML kernel then they end up with just regular user access on the Linux host. Which of course they could then try to crack with any of the usual local-root exploits.

For separation based on Xen, if someone cracks the hypervisor then you lose everything.

For KVM (which seems to be the future of Linux virtualisation) I don't know enough to comment.
We're applying the idea of containing applications in the face of potential flaws and misconfiguration to the case of applications which happen to be VMs. In this sense, it is not different to containing any other form of application (e.g. a flaw in the VM might be exploited by malicious code).
VMWare has its own device drivers. Surely someone who wanted to attack a VMWare-based system would go for the drivers, which have the ability to override any other protection mechanisms. But I think that constraining the user-space code (as done in NetTop) does provide significant benefits.
Again, keep in mind that we're talking about Linux-based virtualization, where the VM is literally just another application running in the host OS.
So by "Linux-based" you mean in contrast to Xen which has the Xen kernel (not Linux) running on the hardware?
I don't understand what needs to be backed here. Currently, MAC is not used to separate different Linux-based VMs, and by integrating MAC support, people will be able to further utilize MAC.
One thing that should be noted is the labelled networking benefits. If you had several groups of virtual servers running at different levels and wanted to prevent information leaks, then having SE Linux contexts and labelled networking could make things a little easier.

I have had some real challenges in managing firewall rules for Xen servers. My general practice is to try and make sure that there is no real need for firewalls between hosts on the same hardware (not that I want it this way - it's what technical and management issues force me to).

So for example if I have an ISP Xen server running virtual machines for a number of organisations, I make sure that they are either all within a similar trust boundary (i.e. affiliated groups) or all mutually untrusting (i.e. other IP addresses in the same net-block are treated the same as random hosts on the net).
o Increased protection of the host from untrusted VM guests (e.g. for VM hosting providers, grid/cloud servers etc.).
I can see where someone who is not familiar with VMs might think this is plausible, but looking at the interfaces and separation provided by VMs makes it pretty clear that this isn't even a belts and braces level of extra protection.
I disagree.
VMs are software, and all software has bugs. If you can break out of the VM in Linux-based virtualization, you can probably get to all of the other VMs on the system (they are running in the same DAC and MAC contexts).
Applying distinct labels allows MAC policy to be enforced by the host kernel so that such breaches are contained within the isolated host VM process.
Or are you saying that you don't think hypervisors can be broken out of?
The issue is whether the hypervisor you care about can be broken out of in that way. It seems that if someone can break out of Xen then you just lose. For KVM I don't know the situation, do you have a good reference for how it works?

http://en.wikipedia.org/wiki/Kernel-based_Virtual_Machine

The above web page says that KVM is all based in the kernel, in which case why would it be any more resilient than Xen?
o Consolidating access to multiple networks which require strong isolation from each other (e.g. military, government, corporate extranets, internal "Chinese wall" separation etc.)
The VMs provide this. Isolation is easy. Sharing is what's hard.
Again, it's important to understand that these VMs are merely Linux processes and are only currently afforded the same level of isolation as standard DAC.
How does the "VMs are merely Linux processes" fit with the description of KVM? Or are you talking about some other virtualisation system? My virtualisation experience of recent times has only been Xen sys-admin.
o Forensic analysis of disk images, binaries, browser scripts etc. in a highly isolated VM, protecting both the host system and the integrity of the components under analysis.
You're just restating "stronger isolation".
Yes, but these are use-cases. They are condensed into requirements further on.
Are there any tools that do forensics in a sensible way? Some time ago in a conversation with someone who uses forensics tools professionally I was shocked to discover that his tools didn't even do anything basic like storing SHA hashes of all the data. It seems to me that if the state of the art doesn't even use cryptographically secure hashes to preserve the trail of evidence then we can probably forget about any of the advanced stuff.
OK, I get the picture. You really want to run VMs under SELinux. Go wild, but I think you are significantly overstating the value and creating a project where a little bit of policy work ought to handle all but the most extreme cases.
The proof of concept code is indeed simple policy/labeling changes, although we want to ensure that we fully understand the requirements, and implement a flexible and generally useful scheme.
http://www.nsa.gov/seLinux/list-archive/0409/thread_body53.cfm#8690

Actually we had proof of concept for some of this with VMWare almost four years ago. If NetTop isn't enough of a proof of concept...

-- russell@coker.com.au

On Tue, Aug 12, 2008 at 03:57:46PM +1000, Russell Coker wrote:
On Monday 11 August 2008 19:31, James Morris <jmorris@namei.org> wrote:

I think that Casey's idea is that if someone breaks the VM separation then you lose it all. For separation based on UML there are obvious benefits to having different labels for processes and files, so that if someone cracks the UML kernel then they end up with just regular user access on the Linux host. Which of course they could then try to crack with any of the usual local-root exploits.
For separation based on Xen if someone cracks the hypervisor then you lose everything.
Xen is out of scope for this initial work, because we don't want to get involved in the hypervisor code, since then we have to deal with the Xen XSM framework too.
For KVM (which seems to be the future of Linux virtualisation) I don't know enough to comment.
In KVM, virtual machines really are just processes as far as the host OS is concerned. It is very similar to UML in this respect, except you don't need to have a special kernel image for your VM.
Again, keep in mind that we're talking about Linux-based virtualization, where the VM is literally just another application running in the host OS.
So by "Linux-based" you mean in contrast to Xen which has the Xen kernel (not Linux) running on the hardware?
By Linux-based we mean a virtualization platform where Linux *is* the hypervisor. This includes KVM and UML. It specifically excludes Xen, since it has a separate hypervisor underneath the host kernel. That's not to say the work couldn't be extended to Xen later; it's merely not a core focus.
The issue is whether the hypervisor you care about can be broken out of in that way. It seems that if someone can break out of Xen then you just lose. For KVM I don't know the situation, do you have a good reference for how it works?
http://en.wikipedia.org/wiki/Kernel-based_Virtual_Machine
The above web page says that KVM is all based in the kernel, in which case why would it be any more resilient than Xen?
The best way is to think of KVM as an accelerator for QEMU. If you're not already familiar, QEMU provides CPU emulation and device emulation for a wide variety of platforms. The KVM kernel module basically provides a simple API to userspace which allows QEMU's CPU emulation to be switched out in favour of the hardware virtualization capabilities available in latest-generation CPUs. The QEMU device emulation is still used.

People typically claim that Xen is more resilient than KVM because it has a separate hypervisor and thus a smaller trusted codebase. In practice this is smoke & mirrors, because to do anything useful Xen still has the Dom0 host kernel, which has access to all hardware. So I wouldn't claim either Xen or KVM is inherently more secure than the other.
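To make the accelerator framing concrete, here is roughly what the userspace side of that API looks like (a fragment with error handling omitted; the ioctls are the documented /dev/kvm interface, the surrounding code is illustrative):

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Sketch: KVM only virtualizes CPU/memory; device emulation stays
     * in this ordinary Linux process, which is what sVirt would label. */
    int start_guest(void)
    {
        int kvm = open("/dev/kvm", O_RDWR);
        if (kvm < 0 || ioctl(kvm, KVM_GET_API_VERSION, 0) < 0)
            return -1;

        int vm = ioctl(kvm, KVM_CREATE_VM, 0);    /* fd for one guest */
        int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0); /* fd for vcpu 0 */

        /* ...register guest RAM (KVM_SET_USER_MEMORY_REGION), then loop
         * on ioctl(vcpu, KVM_RUN, 0); every VM exit returns control to
         * this process for QEMU's device emulation. */
        return vcpu;
    }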
o Consolidating access to multiple networks which require strong isolation from each other (e.g. military, government, corporate extranets, internal "Chinese wall" separation etc.)
The VMs provide this. Isolation is easy. Sharing is what's hard.
Again, it's important to understand that these VMs are merely Linux processes and are only currently afforded the same level of isolation as standard DAC.
How does the "VMs are merely Linux processes" fit with the description of KVM? Or are you talking about some other virtualisation system?
Again think of the KVM kernel module as simply a CPU accelerator for QEMU. A VM is just a QEMU process which is using KVM for its CPU virtualization.

Daniel

On Tue, 12 Aug 2008, Russell Coker wrote:
having different labels for processes and files so that if someone cracks the UML kernel then they end up with just a regular user access on the Linux host. Which of course they could then try to crack with any of the usual local-root exploits.
For separation based on Xen if someone cracks the hypervisor then you lose everything.
For KVM (which seems to be the future of Linux virtualisation) I don't know enough to comment.
KVM uses a modified version of Qemu where guests run as Linux processes. There are some useful documents here: http://kvm.qumranet.com/kvmwiki/Documents (The OLS paper especially).
So by "Linux-based" you mean in contrast to Xen which has the Xen kernel (not Linux) running on the hardware?
Yes.
I don't understand what needs to be backed here. Currently, MAC is not used to separate different Linux-based VMs, and by integrating MAC support, people will be able to further utilize MAC.
One thing that should be noted is the labelled network benefits. If you had several groups of virtual servers running at different levels and wanted to prevent information leaks then having SE Linux contexts and labelled networking could make things a little easier.
I have had some real challenges in managing firewall rules for Xen servers. My general practice is to try and make sure that there is no real need for firewalls between hosts on the same hardware (not that I want it this way - it's what technical and management issues force me to).
So for example if I have an ISP Xen server running virtual machines for a number of organisations I make sure that they are either all within a similar trust boundary (IE affiliated groups) or all mutually untrusting (IE other IP addresses in the same net-block are treated the same as random hosts on the net).
Thanks for the insights -- we expect to address the virtual networking aspect in some way.
The issue is whether the hypervisor you care about can be broken out of in that way. It seems that if someone can break out of Xen then you just lose. For KVM I don't know the situation, do you have a good reference for how it works?
http://en.wikipedia.org/wiki/Kernel-based_Virtual_Machine
The above web page says that KVM is all based in the kernel, in which case why would it be any more resilient than Xen?
KVM uses a kernel module to utilize the virt hardware (which Qemu interfaces with via /dev/kvm), but the guest runs in a userspace process. I'm not comparing which is more resilient.

- James

--
James Morris <jmorris@namei.org>

On Tuesday 12 August 2008 5:57:19 am James Morris wrote:
On Tue, 12 Aug 2008, Russell Coker wrote:
One thing that should be noted is the labelled network benefits. If you had several groups of virtual servers running at different levels and wanted to prevent information leaks then having SE Linux contexts and labelled networking could make things a little easier.
I have had some real challenges in managing firewall rules for Xen servers. My general practice is to try and make sure that there is no real need for firewalls between hosts on the same hardware (not that I want it this way - it's what technical and management issues force me to).
So for example if I have an ISP Xen server running virtual machines for a number of organisations I make sure that they are either all within a similar trust boundary (IE affiliated groups) or all mutually untrusting (IE other IP addresses in the same net-block are treated the same as random hosts on the net).
Thanks for the insights -- we expect to address the virtual networking aspect in some way.
I think we could do some pretty cool things here with the new, well 2.6.25 new, network ingress/egress controls and restricting VM instances to specific interfaces and/or networks. However, we would need to settle the basic VM label management issues first. -- paul moore linux @ hp
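For example, with those controls, policy along these lines (all type names invented for illustration) could pin an isolated VM's traffic to one interface and network:

    allow qemu_isolated_t vm0_netif_t:netif { ingress egress };
    allow qemu_isolated_t vm0_node_t:node { sendto recvfrom };
    allow qemu_isolated_t vm0_packet_t:packet { send recv };

But as Paul says, this presumes the per-VM label management is already settled.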

On Mon, Aug 11, 2008 at 12:17:48PM +1000, James Morris wrote:
4. Design Considerations
4.1 Consensus in preliminary discussion appears to be that adding MAC to libvirt will be the most effective approach. Support may then be extended to virsh, virt-manager, oVirt etc.
I can see a couple of immediate items to address in the libvirt space:

- Need to decide how to ensure the VM is run with the correct security label instead of the default virt_t. We cannot assume that all VMs have disks configured: some VMs may PXE boot and use an NFS/iSCSI root filesystem, which is not visible to the host. The implication is that we can't rely on the labelling of disk files to infer the VM's security context. This suggests the domain XML format needs to allow for a security context to be specified at the time the VM is defined/created in libvirt (one possible XML shape is sketched after this list). libvirt would have to take steps to make sure the VM is started with this defined context. An approach of including the context in the XML would also allow easy extension to the Xen XSM framework in future, where you specify a context at the time of VM creation which is passed to the hypervisor.

- The storage XML format can already report what label a storage volume currently has. In addition we need to be able to set the label. A few complications...

  - We may need to set it in several places, i.e. a VM may be assigned a disk based on a stable path such as /dev/disk/by-uuid/4cb23887-0d02-4e4c-bc95-7599c85afc1a, which is a symlink to the real (unstable) device name /dev/sda1. We clearly need to set the label on the real device, but may also need to change the symlink too?

  - We can't add the new label to the SELinux policy directly, because the label needs to be on the unstable device name /dev/sdaXXX, which can change across host OS reboots. Do we instead add the info to the udev rules, so that when /dev is populated at boot time by udev the device nodes get the desired initial labelling? Or do we manually chcon() the device at the time we boot the VM?

  - Some storage types don't allow per-file labelling, e.g. NFS. In those scenarios the storage pool is assigned a label and all volumes inherit it. So, if two VMs are using NFS files and need different labelling, they need to use different directories on the NFS server, so that we can have separate mount points with appropriate labelling for each.
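For the first item, one possible shape for the domain XML addition (element and attribute names invented here, nothing agreed yet):

    <domain type='kvm'>
      <name>demo</name>
      ...
      <seclabel model='selinux'>
        <label>system_u:system_r:qemu_t:s0:c101</label>
      </seclabel>
    </domain>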
4.2 Initially, sVirt should "just work" as a means to isolate VMs, with minimal administrative interaction. e.g. an option is added to virt-manager which allows a VM to be designated as "isolated", and from then on, it is automatically run in a separate security context, with policy etc. being generated and managed by libvirt.
4.3 We need to consider very carefully exactly how VMs will be launched and controlled e.g. overall security ultimately must be enforced by the host kernel, even though VM launch will be initially controlled by host userspace.
4.4 We need to consider overall management of labeling both locally and in distributed environments (e.g. oVirt), as well as situations where VMs may be migrated between systems, backed up etc.
We need to define who/what is responsible for ensuring that all hosts in the cluster have the same policy loaded. Typically libvirt only aims to provide the mechanism, and not constrain what you do with it. So perhaps libvirt needs to merely be able to report on what policy version is loaded as part of host capabilities information. oVirt (or FreeIPA?) would be responsible for using this info, and also ensuring that all hosts have same policy if desired/required.
One possible approach may be to represent the security label as the UUID of the guest and then translate that to locally meaningful values as needed.
This implies there needs to be some lookup table of UUID -> security label mappings on every host in the cluster. This needs to be updated whenever a new VM is created, which is a fairly significant data sync task someone/thing needs to take care of. Would be doable for oVirt or FreeIPA, since they have a network-wide view. virt-manager though has individual host-centric view of things - it doesn't consider the broader picture.
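If the UUID-as-label idea is pursued, the existing MCS translation machinery could carry that mapping on each host, e.g. an entry in the style of setrans.conf (category number invented, UUID reused from the disk example above):

    s0:c101=vm-4cb23887-0d02-4e4c-bc95-7599c85afc1a

so that ps -Z / ls -Z display the UUID-derived name rather than a raw category. Keeping such entries in sync cluster-wide is exactly the data-distribution problem described here.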
4.5 MAC controls/policy will need to be considered for any control planes (e.g. /dev/kvm).
I should probably point out that there are in fact two ways in which KVM/QEMU can be used on a host:

- The 'system' instance. There is one of these per host, and it currently runs as a privileged user (i.e. root).

- The 'session' instance. There is one of these per user, per host, and it runs as the unprivileged user.

The session instances can only utilize KVM acceleration if the host admin has given them appropriate group/ACL membership to access /dev/kvm. Likewise they can only access physical devices if they have the necessary group/ACL membership for the device. Network access is SLIRP based unless the admin has pre-created TUN devices & given them access.

I imagine that for this work we'll primarily target the 'system' instance, and anything that happens to work for the 'session' instances can just be considered a free bonus.
4.10 {lib}semanage needs performance optimization work to reduce impact on the virt toolchain.
Specifically in libvirt we need to avoid a dependency on python. For oVirt we have a requirement that the operating system for a 'managed node' (i.e. the host running VMs) can be built into a Live CD / PXE bootable image that is < 64 MB in size. So any new dependencies from libvirt are very sensitive in terms of on-disk footprint.

Daniel

On Tue, 12 Aug 2008, Daniel P. Berrange wrote:
Do we instead add the info to the udev rules, so that when /dev is populated at boot time by udev the device nodes get the desired initial labelling? Or do we manually chcon() the device at the time we boot the VM?
Dan Walsh has mentioned wanting to label the device at VM launch so that MCS labels can be dynamically assigned. This raises some other possible issues such as revoking any existing access (Linux doesn't have general revocation) and having the security of the system depend on whatever is performing the relabel (although we can enforce relabelfrom/relabelto permissions). I wonder if existing work/concepts related to MLS device allocation would be useful here. See: http://sourceforge.net/projects/devallocator/

- James

--
James Morris <jmorris@namei.org>
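A minimal sketch of that relabel-at-launch idea (illustrative only; getfilecon()/setfilecon() are the existing libselinux calls, and as noted, relabeling does not revoke access through descriptors that are already open):

    #include <selinux/selinux.h>

    /* Sketch: give a device node a per-VM context just before launch.
     * The caller needs relabelfrom/relabelto under the loaded policy. */
    int assign_device_to_vm(const char *dev, const char *vm_con)
    {
        char *old = NULL;

        if (getfilecon(dev, &old) >= 0) {
            /* the old label could be audited here before the switch */
            freecon(old);
        }
        return setfilecon(dev, vm_con);
    }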

James Morris wrote:
On Tue, 12 Aug 2008, Daniel P. Berrange wrote:
Do we instead add the info to the udev rules, so that when /dev is populated at boot time by udev the device nodes get the desired initial labelling? Or do we manually chcon() the device at the time we boot the VM?
Dan Walsh has mentioned wanting to label the device at VM launch so that MCS labels can be dynamically assigned. This raises some other possible issues such as revoking any existing access (Linux doesn't have general revocation) and having the security of the system depend on whatever is performing the relabel (although we can enforce relabelfrom/relabelto permissions).
I wonder if existing work/concepts related to MLS device allocation would be useful here.
See: http://sourceforge.net/projects/devallocator/
- James

The experimenting I have done has been around labeling of the virt_image and the process with mcs labels to prevent one process from touching another process/image with a different MCS label.
system_u:system_r:qemu_t:s0:c1 can read/write system_u:system_r:virt_image_t:s0:c1

but can not read/write system_u:system_r:virt_image_t:s0:c2, or communicate with a process labeled system_u:system_r:qemu_t:s0:c2.

The idea would be to have libvirt look at the labeling of the image file and launch the qemu process with the correct type and matching MCS label. So a more advanced image might be labeled system_u:system_r:virt_image_nonet_t:s0:c1, and libvirt would launch qemu as system_u:system_r:qemu_nonet_t:s0:c1.

I have not looked into which devices on the host machine might need higher levels of protection.

In Fedora 9/Rawhide, libvirt runs as virtd_t and has a transition rule on qemu_exec_t to qemu_t. So all virtual machines run with system_u:system_r:qemu_t:s0.
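In policy terms, the Fedora 9/Rawhide behavior described above amounts to roughly the following (a sketch in standard policy syntax, supporting allow rules omitted; the MCS separation itself comes from the base MCS constraints rather than per-VM rules):

    # virtd_t exec'ing a file labeled qemu_exec_t yields a qemu_t process
    allow virtd_t qemu_t:process transition;
    type_transition virtd_t qemu_exec_t:process qemu_t;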

On Tue, Aug 12, 2008 at 09:20:41AM -0400, Daniel J Walsh wrote:
James Morris wrote:
On Tue, 12 Aug 2008, Daniel P. Berrange wrote:
Do we instead add the info to the udev rules, so that when /dev is populated at boot time by udev the device nodes get the desired initial labelling? Or do we manually chcon() the device at the time we boot the VM?
Dan Walsh has mentioned wanting to label the device at VM launch so that MCS labels can be dynamically assigned. This raises some other possible issues such as revoking any existing access (Linux doesn't have general revocation) and having the security of the system depend on whatever is performing the relabel (although we can enforce relabelfrom/relabelto permissions).
I wonder if existing work/concepts related to MLS device allocation would be useful here.
See: http://sourceforge.net/projects/devallocator/
- James

The experimenting I have done has been around labeling of the virt_image and the process with mcs labels to prevent one process from touching another process/image with a different MCS label.
system_u:system_r:qemu_t:s0:c1 can read/write system_u:system_r:virt_image_t:s0:c1
But can not read/write system_u:system_r:virt_image_t:s0:c2 or communicate with process system_u:system_r:qemu_t:s0:c2
The idea would be to have libvirt look at the labeling of the image file and launch the qemu process with the correct type and matching MCS label.
That's not going to fly for VMs without disks in the host - either totally diskless VMs, or VMs using iSCSI/NFS network blockdevices / root FS.

Daniel

Daniel P. Berrange wrote:
On Tue, Aug 12, 2008 at 09:20:41AM -0400, Daniel J Walsh wrote:
James Morris wrote:
On Tue, 12 Aug 2008, Daniel P. Berrange wrote:
Do we instead add the info to the udev rules, so when /dev is populated at boot time by udev the device nodes get the desired initial labelling? Or do we manually chcon() the device at the time we boot the VM?
Dan Walsh has mentioned wanting to label the device at VM launch so that MCS labels can be dynamically assigned. This raises some other possible issues such as revoking any existing access (Linux doesn't have general revocation) and having the security of the system depend on whatever is performing the relabel (although we can enforce relabelfrom/relabelto permissions).
I wonder if existing work/concepts related to MLS device allocation would be useful here.
See: http://sourceforge.net/projects/devallocator/
- James

The experimenting I have done has been around labeling of the virt_image and the process with MCS labels to prevent one process from touching another process/image with a different MCS label.
system_u:system_r:qemu_t:s0:c1 can read/write system_u:system_r:virt_image_t:s0:c1.
But cannot read/write system_u:system_r:virt_image_t:s0:c2 or communicate with a process running as system_u:system_r:qemu_t:s0:c2.
The idea would be to have libvirt look at the labeling of the image file and launch the qemu process with the correct type and matching MCS label.
That's not going to fly for VMs without disks in the host - either totally diskless VMs, or VMs using iSCSI/NFS network block devices / root FS.
Daniel
We could store the label to run qemu for a particular image in the libvirt database, but this mechanism would have to match up with the labeling on disk or remote storage.

system_u:system_r:qemu_nfs_t:s0:c1 can read/write system_u:system_r:nfs_t:s0.

Or you have rules that state that if virtd_t wants to start an image labeled nfs_t, it will use qemu_nfs_t.

You could still use the MCS label to prevent processes from attacking each other, but if the remote storage does not support labelling you will not be able to prevent them from attacking each other's image files.

I think libvirt being SELinux aware and having it decide which context to run qemu at is the important point. The argument about whether it needs to store the SELinux label in its database or base it off the label of the image can be hashed out later.
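As a rough shell sketch of that image-label-driven flow (hypothetical; a real implementation would presumably use libselinux calls rather than string munging, and the virt_image -> qemu type mapping is illustrative):

    # Read the security context of the disk image (GNU stat)
    IMGCTX=$(stat -c %C /var/lib/libvirt/images/vm1.img)
    # e.g. system_u:object_r:virt_image_nonet_t:s0:c1

    # Split out the type and MCS category fields (bash)
    TYPE=$(echo "$IMGCTX" | cut -d: -f3)   # virt_image_nonet_t
    MCS=$(echo "$IMGCTX" | cut -d: -f5)    # c1
    QEMUTYPE=${TYPE/virt_image/qemu}       # qemu_nonet_t

    # Launch qemu in the derived context with the matching category
    runcon "system_u:system_r:${QEMUTYPE}:s0:${MCS}" \
        qemu -hda /var/lib/libvirt/images/vm1.img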

On Tue, Aug 12, 2008 at 09:54:23AM -0400, Daniel J Walsh wrote:
Daniel P. Berrange wrote:
On Tue, Aug 12, 2008 at 09:20:41AM -0400, Daniel J Walsh wrote:
The experimenting I have done has been around labeling of the virt_image and the process with MCS labels to prevent one process from touching another process/image with a different MCS label.
system_u:system_r:qemu_t:s0:c1 can read/write system_u:system_r:virt_image_t:s0:c1.
But cannot read/write system_u:system_r:virt_image_t:s0:c2 or communicate with a process running as system_u:system_r:qemu_t:s0:c2.
The idea would be to have libvirt look at the labeling of the image file and launch the qemu process with the correct type and matching MCS label.
That's not going to fly for VMs without disks in the host - either totally diskless VMs, or VMs using iSCSI/NFS network block devices / root FS.
Daniel
We could store the label to run qemu for a particular image in the libvirt database. But this mechanism would have to match up with the labeling on disk or remote storage.
Ok, one minor point worth mentioning is that libvirt does not have a general-purpose database of configurations. The way VM configuration is stored is hypervisor-specific. In the Xen/OpenVZ/VMware case we pass the config straight through to XenD, which takes care of persisting it. In the QEMU/LXC case we store the XML config files in /etc/libvirt.
Or you have rules that state that if virtd_t wants to start an image labeled nfs_t, it will use qemu_nfs_t.
You could still use the MCS label to prevent processes from attacking each other, but if the remote storage does not support labelling you will not be able to prevent them from attacking each other's image files.
We don't need to restrict ourselves to a single NFS type qemu_nfs_t/nfs_t. A single NFS server can export many directories, each of which can be mounted with a different context.
I think libvirt being SELinux aware and having it decide which context to run qemu at is the important point.
The argument about whether it needs to store the SELinux label in its database or base it off the label of the image can be hashed out later.
So the important thing is that the label is represented in the libvirt XML format, and this XML config is persisted in a manner which is most applicable for the virtualization driver in question. While I know James wants to target KVM primarily, we need to make sure the approach we take doesn't paint us into a corner wrt supporting other virt platforms like Xen.

Our guiding rule with libvirt is that for every capability we add, we need to come up with a conceptual model that is applicable to all virtualization drivers, even if we only implement it for one particular driver.

Daniel
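To make that concrete, one hypothetical way the label might be represented in the domain XML (no such element exists in the schema today; the element and attribute names are illustrative only):

    <domain type='kvm'>
      <name>vm1</name>
      ...
      <!-- hypothetical: 'dynamic' lets the driver allocate an MCS category
           at start time, 'static' uses the given label verbatim -->
      <seclabel model='selinux' type='dynamic'>
        <label>system_u:system_r:qemu_t:s0:c1</label>
      </seclabel>
    </domain>

Each driver could then persist and enforce this however suits it - the QEMU driver by relabelling and setexeccon(), XenD by its own mechanism.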

Daniel P. Berrange wrote:
On Tue, Aug 12, 2008 at 09:54:23AM -0400, Daniel J Walsh wrote:
On Tue, Aug 12, 2008 at 09:20:41AM -0400, Daniel J Walsh wrote:
The experimenting I have done has been around labeling of the virt_image and the process with MCS labels to prevent one process from touching another process/image with a different MCS label.
system_u:system_r:qemu_t:s0:c1 can read/write system_u:system_r:virt_image_t:s0:c1.
But cannot read/write system_u:system_r:virt_image_t:s0:c2 or communicate with a process running as system_u:system_r:qemu_t:s0:c2.
The idea would be to have libvirt look at the labeling of the image file and launch the qemu process with the correct type and matching MCS label.
That's not going to fly for VMs without disks in the host - either totally diskless VMs, or VMs using iSCSI/NFS network block devices / root FS.
Daniel
We could store the label to run qemu for a particular image in the libvirt database. But this mechanism would have to match up with the labeling on disk or remote storage.
Ok, one minor point worth mentioning is that libvirt does not have a general-purpose database of configurations. The way VM configuration is stored is hypervisor-specific. In the Xen/OpenVZ/VMware case we pass the config straight through to XenD, which takes care of persisting it. In the QEMU/LXC case we store the XML config files in /etc/libvirt.
Or you have rules that state that if virtd_t wants to start an image labeled nfs_t, it will use qemu_nfs_t.
You could still use the MCS label to prevent processes from attacking each other, but if the remote storage does not support labelling you will not be able to prevent them from attacking each other's image files.
We don't need to restrict ourselves to a single NFS type qemu_nfs_t/nfs_t. A single NFS server can export many directories, each of which can be mounted with a different context.
Yes and no. The mountpoint can be labeled differently at the directory level, I believe. So you would have to store each image in its own directory and mount at that level in order for mount -o context to work.
I think libvirt being SELinux aware and having it decide which context to run qemu at is the important point.
The argument about whether it needs to store the SELinux label in its database or base it off the label of the image can be hashed out later.
So the important thing is that the label is represented in the libvirt XML format, and this XML config is persisted in a manner which is most applicable for the virtualization driver in question. While I know James wants to target KVM primarily, we need to make sure the approach we take doesn't paint us into a corner wrt supporting other virt platforms like Xen.
I probably should have used "datastore" rather than "database". Wherever this data is stored, it should be protected, as libvirt will be making security decisions based on the data. (Of course this is controlled by kernel policy; the kernel would prevent virtd_t from execing an application as sshd_t, for example.)
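For example, the kernel-side restriction would come from policy rules of roughly this shape (a simplified sketch following the Fedora policy mentioned earlier; only the named transition is permitted, so virtd_t cannot exec anything as sshd_t no matter what the datastore says):

    # virtd_t may execute qemu binaries and transition only to qemu_t
    allow virtd_t qemu_exec_t:file { getattr read execute };
    allow virtd_t qemu_t:process transition;
    type_transition virtd_t qemu_exec_t:process qemu_t;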
Our guiding rule with libvirt is that for every capability we add, we need to come up with a conceptual model that is applicable to all virtualization drivers, even if we only implement it for one particular driver.
Isn't libvirt going to be execing a program like qemu to run Xen images? If yes, then this should all work with the defined mechanism.
Daniel

On Tue, Aug 12, 2008 at 10:16:35AM -0400, Daniel J Walsh wrote:
Daniel P. Berrange wrote:
On Tue, Aug 12, 2008 at 09:54:23AM -0400, Daniel J Walsh wrote:
On Tue, Aug 12, 2008 at 09:20:41AM -0400, Daniel J Walsh wrote:
The experimenting I have done has been around labeling of the virt_image and the process with MCS labels to prevent one process from touching another process/image with a different MCS label.
system_u:system_r:qemu_t:s0:c1 can read/write system_u:system_r:virt_image_t:s0:c1.
But cannot read/write system_u:system_r:virt_image_t:s0:c2 or communicate with a process running as system_u:system_r:qemu_t:s0:c2.
The idea would be to have libvirt look at the labeling of the image file and launch the qemu process with the correct type and matching MCS label.
That's not going to fly for VMs without disks in the host - either totally diskless VMs, or VMs using iSCSI/NFS network block devices / root FS.
Daniel
We could store the label to run qemu for a particular image in the libvirt database. But this mechanism would have to match up with the labeling on disk or remote storage.
Ok, one minor point worth mentioning is that libvirt does not have a general-purpose database of configurations. The way VM configuration is stored is hypervisor-specific. In the Xen/OpenVZ/VMware case we pass the config straight through to XenD, which takes care of persisting it. In the QEMU/LXC case we store the XML config files in /etc/libvirt.
Or you have rules that state that if virtd_t wants to start an image labeled nfs_t, it will use qemu_nfs_t.
You could still use the MCS label to prevent processes from attacking each other, but if the remote storage does not support labelling you will not be able to prevent them from attacking each other's image files.
We don't need to restrict ourselves to a single NFS type qemu_nfs_t/nfs_t. A single NFS server can export many directories, each of which can be mounted with a different context.
Yes and no. The mountpoint can be labeled differently at the directory level, I believe. So you would have to store each image in its own directory and mount at that level in order for mount -o context to work.
Yes, that's what I actually meant - different directories on the NFS server for each set of disk images which need to be separated by security label.
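Something along these lines, with one mount per set of images that needs its own label (server name and paths are hypothetical):

    # Each export picks up its own context at mount time
    mount -t nfs -o context=system_u:object_r:virt_image_t:s0:c1 \
        nfsserver:/exports/vm1 /var/lib/libvirt/images/vm1
    mount -t nfs -o context=system_u:object_r:virt_image_t:s0:c2 \
        nfsserver:/exports/vm2 /var/lib/libvirt/images/vm2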
Our guiding rule with libvirt is that for every capability we add, we need to come up with a conceptual model that is applicable to all virtualization drivers, even if we only implement it for one particular driver.
Isn't libvirt going to be execing a program like qemu to run Xen images? If yes, then this should all work with the defined mechanism.
Yes & no. In the Xen case, we pass the configuration ontop XenD. This talks to the hypervisor to create the virtual machine. Some Xen guests happen to have a QEMU process to provide an emulated device model, but this isn't required by Xen. We can however pass a security label to XenD and have it do the neccessary security work at VM creation time. Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

Hi, James

Thank you for posting the announcement. This thread is very interesting. I understand, though, that many things remain to be decided.
- Utilize the new hierarchical types being proposed upstream by NEC. (No analysis done yet).
Could you point me to a document on this item?

Thanks,
Atsushi SAKAI

On Fri, 15 Aug 2008, Atsushi SAKAI wrote:
Hi, James
Thank you for posting the announcement. This thread is very interesting. I understand, though, that many things remain to be decided.
- Utilize the new hierarchical types being proposed upstream by NEC. (No analysis done yet).
Could you point me to a document on this item?
http://marc.info/?l=selinux&m=121791579130877&w=2

--
James Morris <jmorris@namei.org>
participants (7)
- Atsushi SAKAI
- Casey Schaufler
- Daniel J Walsh
- Daniel P. Berrange
- James Morris
- Paul Moore
- Russell Coker