On 06/26/2012 05:10 AM, Daniel P. Berrange wrote:
On Fri, Jun 22, 2012 at 02:36:07PM -0400, Corey Bryant wrote:
> libvirt's sVirt security driver provides SELinux MAC isolation for
> Qemu guest processes and their corresponding image files. In other
> words, sVirt uses SELinux to prevent a QEMU process from opening
> files that do not belong to it.
>
> sVirt provides this support by labeling guests and resources with
> security labels that are stored in file system extended attributes.
> Some file systems, such as NFS, do not support the extended
> attribute security namespace, and therefore cannot support sVirt
> isolation.
>
> A solution to this problem is to provide fd passing support, where
> libvirt opens files and passes file descriptors to QEMU. This,
> along with SELinux policy to prevent QEMU from opening files, can
> provide image file isolation for NFS files stored on the same NFS
> mount.
>
> This patch series adds the pass-fd QMP monitor command, which allows
> an fd to be passed via SCM_RIGHTS, and returns the received file
> descriptor. Support is also added to the block layer to allow QEMU
> to dup the fd when the filename is of the /dev/fd/X format. This
> is useful if MAC policy prevents QEMU from opening specific types
> of files.
I was thinking about some of the sources complexity when using
FD passing from libvirt and wanted to raise one idea for discussion
before we continue.
With this proposed series, we have usage akin to:
1. pass_fd FDSET={M} -> returns a string "/dev/fd/N" showing QEMU's
view of the FD
2. drive_add file=/dev/fd/N
3. if failure:
close_fd "/dev/fd/N"
My problem is that none of this FD passing is "transactional".
If libvirtd crashes or otherwise fails between steps 1 & 2,
a FD is left open in QEMU. If libvirtd gets the failure
detection wrong in step 2 (eg sees a I/O failure on the monitor,
but from QEMU's pov drive_add succeeed), we could end up
telling QEMU to close an FD that it is still using for a
drive. Likewise if libvirtd fails/crashes between steps 2 & 3
we might not clean up after failure.
I see what you're saying, but if libvirt crashes it seems like there are
bigger issues going on than a leaked fd. If libvirt fails, then it
should call closefd to prevent leakage.
I don't know if it really buys that much of an advantage though. I
think one major advantage to having separate commands is that other
commands can use an fd passed by pass-fd, not just drive_add.
These aren't new problems with pass_fd - they existed with
getfd too of course.
If we were designing this interface with no regard for the
historical practice in QEMU, then I feel like we would not
even bother to have either 'pass_fd' or 'getfd'. We would
pass the FD(s) directly with the "drive_add" command.
Given that we have decided that attaching special semantics
to filenames matching "/dev/fd/N" is OK, then I feel we could
go one better, and allow the FD to be passed with the "drive_add"
(or other) commands directly. All we need do is define slightly
different semantics for "/dev/fd/N". Instead of it meaning
"use the process FD numbered N", we can define it to mean
"use the n'th FD set in the current context". The "context"
would be populated with all FDs received with the monitor
current command.
So now from a client's POV you'd have a flow like
* drive_add "file=/dev/fd/N" FDSET={N}
IIUC then drive_add would loop and pass each fd in the set via SCM_RIGHTS?
And in QEMU you'd have something like
* handle_monitor_command
- recvmsg all FDs, and stash them in a thread local "FDContext"
context
- invoke monitor command handler
- Sees file=/dev/fd/N
- Fetch /dev/fd/N from "FDContext"
- If success remove /dev/fd/N from "FDContext"
- close() all FDs left in "FDContext"
The key point with this is that because the FDs are directly
associated with a monitor command, QEMU can /guarantee/ that
FDs are never leaked, regardless of client behaviour.
Wouldn't this leak fds if libvirt crashed part way through sending the
set of fds?
--
Regards,
Corey