
On Fri, Jun 22, 2012 at 02:36:07PM -0400, Corey Bryant wrote:
> libvirt's sVirt security driver provides SELinux MAC isolation for QEMU guest processes and their corresponding image files. In other words, sVirt uses SELinux to prevent a QEMU process from opening files that do not belong to it.
>
> sVirt provides this support by labeling guests and resources with security labels that are stored in file system extended attributes. Some file systems, such as NFS, do not support the extended attribute security namespace, and therefore cannot support sVirt isolation.
>
> A solution to this problem is to provide fd passing support, where libvirt opens files and passes file descriptors to QEMU. This, along with SELinux policy to prevent QEMU from opening files, can provide image file isolation for NFS files stored on the same NFS mount.
>
> This patch series adds the pass-fd QMP monitor command, which allows an fd to be passed via SCM_RIGHTS, and returns the received file descriptor. Support is also added to the block layer to allow QEMU to dup the fd when the filename is of the /dev/fd/X format. This is useful if MAC policy prevents QEMU from opening specific types of files.

I was thinking about some of the sources of complexity when using FD passing from libvirt, and wanted to raise one idea for discussion before we continue.

With this proposed series, we have usage akin to:

  1. pass_fd FDSET={M}  -> returns a string "/dev/fd/N" showing QEMU's view of the FD
  2. drive_add file=/dev/fd/N
  3. if failure: close_fd "/dev/fd/N"

My problem is that none of this FD passing is "transactional". If libvirtd crashes or otherwise fails between steps 1 & 2, an FD is left open in QEMU. If libvirtd gets the failure detection wrong in step 2 (e.g. it sees an I/O failure on the monitor, but from QEMU's POV drive_add succeeded), we could end up telling QEMU to close an FD that it is still using for a drive. Likewise, if libvirtd fails/crashes between steps 2 & 3, we might not clean up after failure.

These aren't new problems with pass_fd; they existed with getfd too, of course. If we were designing this interface with no regard for the historical practice in QEMU, then I feel like we would not even bother to have either 'pass_fd' or 'getfd'. We would pass the FD(s) directly with the "drive_add" command.

Given that we have decided that attaching special semantics to filenames matching "/dev/fd/N" is OK, I feel we could go one better, and allow the FD to be passed with the "drive_add" (or other) commands directly. All we need to do is define slightly different semantics for "/dev/fd/N". Instead of it meaning "use the process FD numbered N", we can define it to mean "use the N'th FD set in the current context". The "context" would be populated with all FDs received with the current monitor command.
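
To make the per-command context idea a bit more concrete, here is a minimal Python sketch. It is not QEMU code and not a proposed implementation, just an illustration under some assumptions of mine: the context is modelled as a plain list, "/dev/fd/N" is treated as a 0-based index into it, the JSON command shape only loosely resembles QMP, and the names send_command, recv_command, claim_fd and handle_monitor_command are made up for the example. The only real mechanism used is SCM_RIGHTS over a UNIX socket via sendmsg()/recvmsg().

```python
import array
import json
import os
import socket


def send_command(sock, command, fds=()):
    """Client side: send one JSON command and its FDs in a single sendmsg()."""
    payload = json.dumps(command).encode()
    ancillary = []
    if fds:
        packed = array.array("i", fds).tobytes()
        ancillary.append((socket.SOL_SOCKET, socket.SCM_RIGHTS, packed))
    sock.sendmsg([payload], ancillary)


def recv_command(sock, max_fds=16):
    """Server side: receive one command plus every FD that arrived with it."""
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        4096, socket.CMSG_LEN(max_fds * fds.itemsize))
    for level, kind, cdata in ancdata:
        if level == socket.SOL_SOCKET and kind == socket.SCM_RIGHTS:
            # Ignore any trailing partial integer in the ancillary payload.
            usable = len(cdata) - (len(cdata) % fds.itemsize)
            fds.frombytes(cdata[:usable])
    return json.loads(data), list(fds)


def claim_fd(context, filename):
    """Resolve "/dev/fd/N" to the N'th FD of this command and take ownership.

    Assumption: N is a 0-based index into the per-command context.
    """
    prefix = "/dev/fd/"
    if not filename.startswith(prefix):
        raise ValueError("not an FD reference: " + filename)
    index = int(filename[len(prefix):])
    if not 0 <= index < len(context) or context[index] is None:
        raise LookupError("no FD %d in this command's context" % index)
    # Clearing the slot marks the FD as owned by the caller.
    fd, context[index] = context[index], None
    return fd


def handle_monitor_command(sock, handlers):
    """Run one command; close every FD the handler did not claim."""
    command, context = recv_command(sock)
    try:
        handler = handlers[command["execute"]]
        return handler(command.get("arguments", {}), context)
    finally:
        for fd in context:
            if fd is not None:
                os.close(fd)
```

A client would call something like send_command(sock, {"execute": "drive_add", "arguments": {"file": "/dev/fd/0"}}, [image_fd]), and the handler would call claim_fd(context, args["file"]). Because the cleanup sits in the finally block, an FD that the handler never claims is closed whether the handler succeeds, fails, or the command name is unknown.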

So now from a client's POV you'd have a flow like

  * drive_add "file=/dev/fd/N" FDSET={N}

And in QEMU you'd have something like

  * handle_monitor_command
      - recvmsg all FDs, and stash them in a thread local "FDContext" context
      - invoke monitor command handler
          - Sees file=/dev/fd/N
          - Fetch /dev/fd/N from "FDContext"
          - If success remove /dev/fd/N from "FDContext"
      - close() all FDs left in "FDContext"

The key point with this is that because the FDs are directly associated with a monitor command, QEMU can /guarantee/ that FDs are never leaked, regardless of client behaviour.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|