On Fri, Jun 22, 2012 at 02:36:07PM -0400, Corey Bryant wrote:
libvirt's sVirt security driver provides SELinux MAC isolation for
QEMU guest processes and their corresponding image files. In other
words, sVirt uses SELinux to prevent a QEMU process from opening
files that do not belong to it.
sVirt provides this support by labeling guests and resources with
security labels that are stored in file system extended attributes.
Some file systems, such as NFS, do not support the extended
attribute security namespace, and therefore cannot support sVirt
isolation.
A solution to this problem is to provide fd passing support, where
libvirt opens files and passes file descriptors to QEMU. This,
along with SELinux policy to prevent QEMU from opening files, can
provide image file isolation for NFS files stored on the same NFS
mount.
This patch series adds the pass-fd QMP monitor command, which allows
an fd to be passed via SCM_RIGHTS, and returns the received file
descriptor. Support is also added to the block layer to allow QEMU
to dup the fd when the filename is of the /dev/fd/X format. This
is useful if MAC policy prevents QEMU from opening specific types
of files.
I was thinking about some of the sources of complexity when using
FD passing from libvirt and wanted to raise one idea for discussion
before we continue.
With this proposed series, we have usage akin to:
1. pass_fd FDSET={M} -> returns a string "/dev/fd/N" showing QEMU's
view of the FD
2. drive_add file=/dev/fd/N
3. if failure:
     close_fd "/dev/fd/N"
My problem is that none of this FD passing is "transactional".
If libvirtd crashes or otherwise fails between steps 1 & 2,
an FD is left open in QEMU. If libvirtd gets the failure
detection wrong in step 2 (e.g. sees an I/O failure on the monitor,
but from QEMU's POV drive_add succeeded), we could end up
telling QEMU to close an FD that it is still using for a
drive. Likewise if libvirtd fails/crashes between steps 2 & 3
we might not clean up after failure.
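
To make the two failure modes above concrete, here is a toy model in
Python. It is not QEMU code: ToyMonitor, its "pending" and "drives"
tables, and the method bodies are all invented for illustration, and
only the command names come from the flow above.

```python
import os


class ToyMonitor:
    """Toy stand-in for a monitor whose passed FDs outlive a single command."""

    def __init__(self):
        self.pending = {}  # name -> FD received via pass_fd, not yet used
        self.drives = {}   # name -> FD a drive is using

    def pass_fd(self, fd):
        # Step 1: remember the FD and tell the client what to call it.
        name = "/dev/fd/%d" % fd
        self.pending[name] = fd
        return name

    def drive_add(self, name):
        # Step 2: the drive takes over the FD.
        self.drives[name] = self.pending.pop(name)

    def close_fd(self, name):
        # Step 3: closes by number, with no check that a drive still uses it.
        self.pending.pop(name, None)
        os.close(int(name.rsplit("/", 1)[1]))
```

If the client dies right after pass_fd, the entry stays in "pending" and
the FD stays open. If the client wrongly believes drive_add failed and
sends close_fd, "drives" is left pointing at a closed FD.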
These aren't new problems with pass_fd - they existed with
getfd too of course.
If we were designing this interface with no regard for the
historical practice in QEMU, then I feel like we would not
even bother to have either 'pass_fd' or 'getfd'. We would
pass the FD(s) directly with the "drive_add" command.
Given that we have decided that attaching special semantics
to filenames matching "/dev/fd/N" is OK, then I feel we could
go one better, and allow the FD to be passed with the "drive_add"
(or other) commands directly. All we need do is define slightly
different semantics for "/dev/fd/N". Instead of it meaning
"use the process FD numbered N", we can define it to mean
"use the N'th FD set in the current context". The "context"
would be populated with all FDs received with the current
monitor command.
So now from a client's POV you'd have a flow like
* drive_add "file=/dev/fd/N" FDSET={N}
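
As a rough sketch of what the client side of that could look like, the
Python below sends one command with the FD attached as SCM_RIGHTS
ancillary data in the same sendmsg() call. The JSON command shape, the
0-based reading of N, and the send_command/recv_command/demo helpers are
assumptions made for illustration; a socketpair stands in for the
monitor socket.

```python
import array
import json
import os
import socket
import tempfile


def send_command(sock, command, fds):
    """Send one JSON command with `fds` attached as SCM_RIGHTS data."""
    payload = json.dumps(command).encode() + b"\n"
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", fds))]
    return sock.sendmsg([payload], ancillary)


def recv_command(sock, max_fds=4):
    """Receive one command and any FDs that arrived with it."""
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        4096, socket.CMSG_LEN(max_fds * fds.itemsize))
    for level, kind, cdata in ancdata:
        if level == socket.SOL_SOCKET and kind == socket.SCM_RIGHTS:
            # Ignore any trailing partial integer in the control data.
            usable = len(cdata) - (len(cdata) % fds.itemsize)
            fds.frombytes(cdata[:usable])
    return json.loads(data), list(fds)


def demo():
    client, monitor = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        with tempfile.TemporaryFile() as image:
            image.write(b"fake disk image")
            image.flush()
            # Hypothetical wire format: "/dev/fd/0" names the first attached FD.
            command = {"execute": "drive_add",
                       "arguments": {"file": "/dev/fd/0"}}
            send_command(client, command, [image.fileno()])
            received, fds = recv_command(monitor)
        # The receiver's FD is its own duplicate of the same open file.
        os.lseek(fds[0], 0, os.SEEK_SET)
        content = os.read(fds[0], 64)
        os.close(fds[0])
        return received, content
    finally:
        client.close()
        monitor.close()
```

The point is that the command and its FD travel in a single message, so
there is no window in which the FD has arrived but the command that
uses it has not.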
And in QEMU you'd have something like
* handle_monitor_command
    - recvmsg all FDs, and stash them in a thread-local "FDContext"
    - invoke monitor command handler
        - Sees file=/dev/fd/N
        - Fetch /dev/fd/N from "FDContext"
        - If success remove /dev/fd/N from "FDContext"
    - close() all FDs left in "FDContext"
The key point with this is that because the FDs are directly
associated with a monitor command, QEMU can /guarantee/ that
FDs are never leaked, regardless of client behaviour.
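
The handler flow above could be sketched roughly as follows. This is
Python rather than QEMU's C, the context is passed explicitly instead of
being thread-local, N is treated as a 0-based index, and the FDContext
methods, open_drives list and drive_add body are all hypothetical names
for illustration only.

```python
import os


class FDContext:
    """FDs received with the current monitor command, keyed by position."""

    def __init__(self, fds):
        self._fds = dict(enumerate(fds))

    @staticmethod
    def _index(path):
        prefix = "/dev/fd/"
        if not path.startswith(prefix):
            raise ValueError("not a /dev/fd/N path: %r" % path)
        return int(path[len(prefix):])

    def fetch(self, path):
        # Look up the N'th FD without taking ownership yet.
        return self._fds[self._index(path)]

    def remove(self, path):
        # The handler now owns this FD; the context must not close it.
        del self._fds[self._index(path)]

    def close_remaining(self):
        for fd in self._fds.values():
            os.close(fd)
        self._fds.clear()


def handle_monitor_command(received_fds, handler, args):
    ctx = FDContext(received_fds)
    try:
        return handler(ctx, args)
    finally:
        # Runs on success and on failure, so unclaimed FDs cannot leak.
        ctx.close_remaining()


open_drives = []  # stand-in for the block layer's list of drive FDs


def drive_add(ctx, args):
    fd = ctx.fetch(args["file"])
    os.fstat(fd)  # stand-in for opening the drive; raises OSError on a bad FD
    open_drives.append(fd)
    ctx.remove(args["file"])  # only reached if everything above succeeded
    return fd
```

Whatever the handler does, including raising half way through, the
"finally" clause closes every FD it did not explicitly remove.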
Regards,
Daniel
--
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|