[libvirt] Suspending access to opened/active /dev/nodes during application runtime

Problem:

Has anyone thought about a mechanism to limit/remove access to a device during an application's runtime? Meaning we have an application that has an open file descriptor to some /dev/node and, depending on *something*, it gains or loses access to it gracefully (with or without a notification, but without any fatal consequences).

Example: LXC. Imagine we have 2 separate containers. Both running full operating systems. Specifically with 2 X servers. Both running concurrently of course. Both need the same input devices (e.g. we have just one mouse). This creates a security problem when we want to have completely separate environments. One container is active (being displayed on a monitor and controlled with a mouse) while the other container runs evtest /dev/input/something and grabs the secret password the user typed in the other.

Solutions:

The complete solution would consist of 2 parts:
- a mechanism that would allow a device to be temporarily "hidden" from an open file descriptor.
- a mechanism for deciding whether an application/process/namespace should have access to a specific device at a specific moment.

Let's focus on the first problem only, as it would need to be solved first anyway. I haven't found anything that would allow me to do it. There are a lot of mechanisms that make it possible to restrict access during open():
- DAC
- ACL (controlled by hand or with uaccess)
- LSM (in general)
- device cgroups

But none of those can do a thing when the device is already opened and an application has a file descriptor. I don't see such a mechanism in the kernel sources either.

I do imagine that it would not be possible for every device to handle such a thing (dri comes to mind) without breaking something (graphics card state in the dri example). But there is a class of simple input/output devices that would handle this without problems.
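
The point about open()-time checks can be illustrated from userspace with an ordinary file standing in for a device node. This is only a minimal sketch, assuming a Linux system with Python 3; the function name and the sample payload are made up for illustration. It shows the DAC case only: once a descriptor is held, removing every permission bit from the node does not stop reads through that descriptor.

```python
import os
import tempfile


def read_after_permission_drop():
    """Open a file, strip all permission bits, then read via the old descriptor.

    Hypothetical helper for illustration; a regular temp file stands in
    for a /dev node.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"secret keystrokes")
        os.close(fd)

        # The DAC check is performed here, once, at open() time.
        held = os.open(path, os.O_RDONLY)

        # Now deny everything to ordinary users.
        os.chmod(path, 0)

        # A fresh open() is refused for an unprivileged user. Root (or any
        # process with CAP_DAC_OVERRIDE) would still be let in, so record
        # the outcome instead of assuming it.
        try:
            os.close(os.open(path, os.O_RDONLY))
            fresh_open_allowed = True
        except PermissionError:
            fresh_open_allowed = False

        # The descriptor obtained earlier keeps working regardless.
        data = os.read(held, 64)
        os.close(held)
        return data, fresh_open_allowed
    finally:
        os.unlink(path)
```

Calling `read_after_permission_drop()` returns the bytes read through the previously opened descriptor together with a flag saying whether a new open() was still permitted. The same pattern (check at open, no re-check on read) is what makes the mechanisms listed above ineffective against an already running evtest.
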
I did implement a proof-of-concept solution for the evdev driver by allowing or disallowing events that go to the evdev_client structure using some arbitrary condition. But this is far from a generic solution.

My proof-of-concept is somewhat similar to this (I just found it):
http://www.spinics.net/lists/linux-input/msg25547.html
though a little bit wider in scope. But neither is flawless or generic.

Has anyone had any thoughts about a similar problem?

--
Regards
Havner
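
The proof-of-concept idea described above (gating event delivery per client on some condition) can be modelled in a few lines of userspace Python. This is a toy model only, not the kernel patch: the class names `Device` and `Client`, the `owner` label and the `is_allowed` predicate are all hypothetical, and only the name evdev_client comes from the message.

```python
from collections import deque


class Client:
    """Stands in for one open file descriptor (an evdev_client in the kernel)."""

    def __init__(self, owner):
        self.owner = owner      # hypothetical label, e.g. a container name
        self.queue = deque()    # events delivered to this descriptor

    def read_all(self):
        """Drain and return everything delivered so far."""
        events = list(self.queue)
        self.queue.clear()
        return events


class Device:
    """Stands in for one input device with several attached clients."""

    def __init__(self, is_allowed):
        self.clients = []
        self.is_allowed = is_allowed    # the "arbitrary condition"

    def open(self, owner):
        client = Client(owner)
        self.clients.append(client)
        return client

    def dispatch(self, event):
        # Deliver the event only to clients that pass the condition right now.
        # Clients that fail it stay attached; they just see nothing.
        for client in self.clients:
            if self.is_allowed(client):
                client.queue.append(event)


# Usage: two containers share one device, only the active one gets events.
active = {"name": "A"}
mouse = Device(lambda client: client.owner == active["name"])
a = mouse.open("A")
b = mouse.open("B")

mouse.dispatch("key:p")     # container A is active
active["name"] = "B"        # switch the active container
mouse.dispatch("key:q")     # container B is active

events_a = a.read_all()     # ["key:p"]
events_b = b.read_all()     # ["key:q"]
```

Neither descriptor is closed or invalidated when the active container changes; the inactive one simply stops receiving events, which is the "gains or loses access gracefully" behaviour asked about. How the condition would be decided in a real kernel is the second, separate part of the problem.
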

On Fri, Mar 07, 2014 at 07:46:44PM +0100, Lukasz Pawelczyk wrote:
> Problem: Has anyone thought about a mechanism to limit/remove access to a device during an application's runtime? Meaning we have an application that has an open file descriptor to some /dev/node and, depending on *something*, it gains or loses access to it gracefully (with or without a notification, but without any fatal consequences).
> Example: LXC. Imagine we have 2 separate containers. Both running full operating systems. Specifically with 2 X servers. Both running concurrently of course. Both need the same input devices (e.g. we have just one mouse).

Stop right there.

If they "both" need an input device, then they should use the "shared" input device stream, i.e. evdev.

And it goes the same for every type of device the kernel is exposing to userspace, if you want to "share" them, then you need to work on changing the kernel to be able to handle shared devices.

And odds are, you will get back a big "as-if" comment from the kernel developers, as for almost all devices, they can't be shared, for very good reasons.

So work down the list of devices you really need access to, and either work to provide a way for the kernel to mediate them, or work to have only one "container" access a given device, and not have all containers access it at the same time.

This has been discussed many times in the past, on mailing lists and in person at the Linux Plumbers conference last year. This isn't a systemd issue, it is a "you are using the kernel in ways it was not designed to be used" issue.

good luck, you will need it...

greg k-h

On 7 Mar 2014, at 20:09, Greg KH <gregkh@linuxfoundation.org> wrote:
> On Fri, Mar 07, 2014 at 07:46:44PM +0100, Lukasz Pawelczyk wrote:
> > Problem: Has anyone thought about a mechanism to limit/remove access to a device during an application's runtime? Meaning we have an application that has an open file descriptor to some /dev/node and, depending on *something*, it gains or loses access to it gracefully (with or without a notification, but without any fatal consequences).
> > Example: LXC. Imagine we have 2 separate containers. Both running full operating systems. Specifically with 2 X servers. Both running concurrently of course. Both need the same input devices (e.g. we have just one mouse).
>
> Stop right there.
>
> If they "both" need an input device, then they should use the "shared" input device stream, i.e. evdev.
>
> And it goes the same for every type of device the kernel is exposing to userspace, if you want to "share" them, then you need to work on changing the kernel to be able to handle shared devices.

I think you might have misunderstood me. They are using a shared input stream (evdev in this case). The problem is that I don’t want them to eavesdrop on each other. So it’s not about making it work. It’s about making them work "in turns".

> And odds are, you will get back a big "as-if" comment from the kernel developers, as for almost all devices, they can't be shared, for very good reasons.

Evdev devices can.

--
Regards,
Havner

On Fri, Mar 07, 2014 at 09:45:28PM +0100, Lukasz Pawelczyk wrote:
> On 7 Mar 2014, at 20:09, Greg KH <gregkh@linuxfoundation.org> wrote:
> > On Fri, Mar 07, 2014 at 07:46:44PM +0100, Lukasz Pawelczyk wrote:
> > > Problem: Has anyone thought about a mechanism to limit/remove access to a device during an application's runtime? Meaning we have an application that has an open file descriptor to some /dev/node and, depending on *something*, it gains or loses access to it gracefully (with or without a notification, but without any fatal consequences).
> > > Example: LXC. Imagine we have 2 separate containers. Both running full operating systems. Specifically with 2 X servers. Both running concurrently of course. Both need the same input devices (e.g. we have just one mouse).
> >
> > Stop right there.
> >
> > If they "both" need an input device, then they should use the "shared" input device stream, i.e. evdev.
> >
> > And it goes the same for every type of device the kernel is exposing to userspace, if you want to "share" them, then you need to work on changing the kernel to be able to handle shared devices.
>
> I think you might have misunderstood me. They are using a shared input stream (evdev in this case). The problem is that I don’t want them to eavesdrop on each other. So it’s not about making it work. It’s about making them work "in turns".

See Lennart's comment about namespaces for devices, and how the kernel doesn't support it, for the answer to this.

Sorry, not going to happen, use real virtual machines if you want to do this.

greg k-h