On 08/08/2012 11:58 AM, Stefan Hajnoczi wrote:
On Wed, Aug 8, 2012 at 3:54 PM, Corey Bryant
<coreyb(a)linux.vnet.ibm.com> wrote:
>
>
> On 08/08/2012 09:04 AM, Stefan Hajnoczi wrote:
>>
>> On Tue, Aug 7, 2012 at 4:58 PM, Corey Bryant <coreyb(a)linux.vnet.ibm.com>
>> wrote:
>>>
>>> libvirt's sVirt security driver provides SELinux MAC isolation for
>>> QEMU guest processes and their corresponding image files. In other
>>> words, sVirt uses SELinux to prevent a QEMU process from opening
>>> files that do not belong to it.
>>>
>>> sVirt provides this support by labeling guests and resources with
>>> security labels that are stored in file system extended attributes.
>>> Some file systems, such as NFS, do not support the extended
>>> attribute security namespace, and therefore cannot support sVirt
>>> isolation.
>>>
>>> A solution to this problem is to provide fd passing support, where
>>> libvirt opens files and passes file descriptors to QEMU. This,
>>> along with SELinux policy to prevent QEMU from opening files, can
>>> provide image file isolation for NFS files stored on the same NFS
>>> mount.
>>>
>>> This patch series adds the add-fd, remove-fd, and query-fdsets
>>> QMP monitor commands, which allow file descriptors to be passed
>>> via SCM_RIGHTS, and assigned to specified fd sets. This allows
>>> fd sets to be created per file with fds having, for example,
>>> different access rights. When QEMU needs to reopen a file with
>>> different access rights, it can search for a matching fd in the
>>> fd set. Fd sets also allow for easy tracking of fds per file,
>>> helping to prevent fd leaks.
>>>
>>> Support is also added to the block layer to allow QEMU to dup an
>>> fd from an fdset when the filename is of the /dev/fdset/nnn format,
>>> where nnn is the fd set ID.
>>>
>>> No new SELinux policy is required to prevent open of NFS files
>>> (files with type nfs_t). The virt_use_nfs boolean type simply
>>> needs to be set to false, and open will be prevented (and dup will
>>> be allowed). For example:
>>>
>>> # setsebool virt_use_nfs 0
>>> # getsebool virt_use_nfs
>>> virt_use_nfs --> off
>>>
>>> Corey Bryant (6):
>>> qemu-char: Add MSG_CMSG_CLOEXEC flag to recvmsg
>>> qapi: Introduce add-fd, remove-fd, query-fdsets
>>> monitor: Clean up fd sets on monitor disconnect
>>> block: Convert open calls to qemu_open
>>> block: Convert close calls to qemu_close
>>> block: Enable qemu_open/close to work with fd sets
>>>
>>> block/raw-posix.c | 42 ++++-----
>>> block/raw-win32.c | 6 +-
>>> block/vdi.c | 5 +-
>>> block/vmdk.c | 25 +++--
>>> block/vpc.c | 4 +-
>>> block/vvfat.c | 16 ++--
>>> cutils.c | 5 +
>>> monitor.c | 273 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> monitor.h | 5 +
>>> osdep.c | 117 +++++++++++++++++++++++
>>> qapi-schema.json | 110 +++++++++++++++++++++
>>> qemu-char.c | 12 ++-
>>> qemu-common.h | 2 +
>>> qemu-tool.c | 20 ++++
>>> qerror.c | 4 +
>>> qerror.h | 3 +
>>> qmp-commands.hx | 131 +++++++++++++++++++++++++
>>> savevm.c | 4 +-
>>> 18 files changed, 730 insertions(+), 54 deletions(-)
>>
>>
>> Are there tests for this feature? Do you have test scripts used
>> during development?
>
>
> Yes, I have some C code that I've been using for testing. I can clean it up
> and provide it if you'd like.
That would be very useful. tests/ has test cases. For the block
layer, tests/qemu-iotests/ is especially relevant; that's where a lot
of the test cases go. If you look at test case 030, you'll see how a
Python script interacts with QMP to test image streaming.
Unfortunately, I think Python doesn't natively support SCM_RIGHTS, but
a test script would still be very useful so it can be used as a
regression test in the future.
Sure, I'll take a look. Hopefully a C test is OK if I can't use
SCM_RIGHTS in Python.
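
On the SCM_RIGHTS question: Python 3.3 and later expose socket.sendmsg() and socket.recvmsg(), which can carry ancillary data. The following is a minimal sketch of passing a descriptor over a Unix socket pair, not the test code discussed in this thread. The helper names send_fd, recv_fd and demo are made up for illustration, and the one-byte payload stands in for whatever a real monitor client would send.

```python
import array
import os
import socket
import tempfile


def send_fd(sock, fd, payload=b"x"):
    """Send one file descriptor as SCM_RIGHTS ancillary data (hypothetical helper)."""
    fds = array.array("i", [fd])
    # Send a byte of ordinary data along with the ancillary message, since
    # stream sockets on Linux need some real data to carry it.
    sock.sendmsg([payload], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock, bufsize=1024):
    """Receive the payload and one file descriptor sent by send_fd()."""
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(fds.itemsize)
    )
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Ignore a trailing partial integer, if any, before decoding.
            usable = len(cdata) - (len(cdata) % fds.itemsize)
            fds.frombytes(cdata[:usable])
    if not fds:
        raise RuntimeError("no file descriptor received")
    return data, fds[0]


def demo():
    """Pass a read-only fd across a socket pair and read through the copy."""
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"image data")
        tmp.flush()
        fd = os.open(tmp.name, os.O_RDONLY)
        try:
            send_fd(sender, fd)
            data, received = recv_fd(receiver)
            try:
                content = os.read(received, 64)
            finally:
                os.close(received)
        finally:
            os.close(fd)
            sender.close()
            receiver.close()
    return data, content
```

The receiving side never calls open() on the file itself; it only uses the descriptor it was handed, which is the property the SELinux setup described in the cover letter relies on.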
>>
>> Here's what I've gathered:
>>
>> Applications use add-fd to add file descriptors to fd sets. An fd set
>> contains one or more file descriptors, each with different access
>> modes (O_RDONLY, O_RDWR, O_WRONLY). File descriptors can be retrieved
>> from the fd set and are matched by their access modes. This allows
>> QEMU to reopen files with different access modes.
>>
>> File descriptors stay in their fd set until explicitly removed by the
>> remove-fd command or when all monitor clients have disconnected. This
>> ensures that file descriptors are not leaked after a monitor client
>> crashes. Automatic removal on monitor close is postponed until all
>> duped fds have been fd - this means QEMU can still reopen an in-use fd
>
>
> I assume you mean "... until all duped fds have been *closed* - ..."
Yes, my typo :)
Great, then your understanding of how this works is correct. :)
>> after a client disconnects.
>>
>> Does this sound right?
>
>
> Yes, exactly.
>
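
To make the summary above concrete, here is a rough sketch of an fd set that hands out a dup of whichever descriptor matches the requested access mode. It is a toy model under stated assumptions, not the code in this series: the FdSet class, its method names, and the demo function are all hypothetical, and only the access-mode matching described in the thread is modelled.

```python
import fcntl
import os
import tempfile

# Mask covering the access-mode bits (O_RDONLY, O_WRONLY, O_RDWR).
ACCESS_MODE_MASK = os.O_RDONLY | os.O_WRONLY | os.O_RDWR


class FdSet:
    """Hypothetical model of one fd set: several fds for a single file."""

    def __init__(self, fdset_id):
        self.fdset_id = fdset_id
        self.fds = []

    def add_fd(self, fd):
        self.fds.append(fd)

    def dup_matching(self, flags):
        """Return a dup of the first fd whose access mode matches flags."""
        wanted = flags & ACCESS_MODE_MASK
        for fd in self.fds:
            mode = fcntl.fcntl(fd, fcntl.F_GETFL) & ACCESS_MODE_MASK
            if mode == wanted:
                return os.dup(fd)
        raise LookupError("no fd with the requested access mode in fd set")


def demo():
    """Add a read-only and a read-write fd, then fetch each by access mode."""
    with tempfile.NamedTemporaryFile() as tmp:
        fdset = FdSet(1)
        ro = os.open(tmp.name, os.O_RDONLY)
        rw = os.open(tmp.name, os.O_RDWR)
        fdset.add_fd(ro)
        fdset.add_fd(rw)
        dup_rw = fdset.dup_matching(os.O_RDWR)
        dup_ro = fdset.dup_matching(os.O_RDONLY)
        try:
            os.write(dup_rw, b"hello")
            data = os.read(dup_ro, 16)
            try:
                os.write(dup_ro, b"x")
                ro_writable = True
            except OSError:
                # Writing through the read-only descriptor is refused.
                ro_writable = False
        finally:
            for fd in (ro, rw, dup_ro, dup_rw):
                os.close(fd)
    return data, ro_writable
```

A caller that needs to switch a file from read-only to read-write would ask the set for an O_RDWR descriptor instead of calling open() again.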
> I should point out there is an issue that needs to be cleaned up in the
> future. There are short windows of time where refcount can get to zero
> while an image file is in use. This is because the file is being reopened.
> For example, I've noticed this occurs when format= is not specified on the
> device_add command and the file is probed, and when mounting/unmounting a
> file system. Hopefully this can be treated as a follow-up issue.
The block layer doesn't treat this as a "reopen" today. Supriya
Kannery has a patch series for bdrv_reopen() which would also need to
be integrated with fd sets to ensure the refcount doesn't hit 0 and
cause a cleanup.
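
The refcount window can be illustrated with a small model. This is only a sketch under the assumptions stated in the thread (cleanup happens once the monitor has disconnected and no dup'd fds remain open); the class and function names are hypothetical and the real cleanup rules in the patches may differ.

```python
class FdSetModel:
    """Toy model of fd set lifetime; not QEMU's actual data structure."""

    def __init__(self):
        self.refcount = 0  # number of dup'd fds currently open
        self.monitor_connected = True
        self.cleaned_up = False

    def dup_fd(self):
        if self.cleaned_up:
            raise RuntimeError("fd set already cleaned up")
        self.refcount += 1

    def close_fd(self):
        self.refcount -= 1
        self._maybe_cleanup()

    def monitor_disconnect(self):
        self.monitor_connected = False
        self._maybe_cleanup()

    def _maybe_cleanup(self):
        # Cleanup is deferred while any dup'd fd is still open.
        if self.refcount == 0 and not self.monitor_connected:
            self.cleaned_up = True


def reopen_close_first(fdset):
    """Close the old fd, then dup a new one: refcount briefly reaches 0."""
    fdset.close_fd()
    fdset.dup_fd()


def reopen_open_first(fdset):
    """Dup the new fd before closing the old one: refcount stays above 0."""
    fdset.dup_fd()
    fdset.close_fd()


def demo(reopen):
    """Run one reopen ordering after the monitor client has gone away."""
    fdset = FdSetModel()
    fdset.dup_fd()              # image file in use
    fdset.monitor_disconnect()  # client disconnects while the fd is open
    try:
        reopen(fdset)
    except RuntimeError:
        return "lost"
    return "kept"
```

In this model the close-then-open ordering loses the fd set, while open-then-close keeps it, which is why a reopen path that holds a reference across the switch avoids the window.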
Great, Supriya's patches sound like what is needed. Also, I noticed
that I'm missing a patch in my series. I need to make sure that
/dev/fdset/nnn is not detected as a floppy drive (/dev/fdx). That was
causing a close/open.
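
The name clash is easy to see in a sketch. The following assumes, purely for illustration, that floppy detection is a prefix check on "/dev/fd"; the actual detection logic in the block layer is not shown in this thread, and all function names here are hypothetical.

```python
import re

# Matches the /dev/fdset/nnn form, where nnn is the fd set ID.
_FDSET_RE = re.compile(r"/dev/fdset/([0-9]+)")


def parse_fdset_id(filename):
    """Return the fd set ID for /dev/fdset/nnn names, or None otherwise."""
    match = _FDSET_RE.fullmatch(filename)
    return int(match.group(1)) if match else None


def looks_like_floppy_naive(filename):
    """Assumed prefix check; it also matches fd set names."""
    return filename.startswith("/dev/fd")


def looks_like_floppy(filename):
    """Same check, but excluding the /dev/fdset/ namespace."""
    return filename.startswith("/dev/fd") and not filename.startswith(
        "/dev/fdset/"
    )
```

With the naive check, "/dev/fdset/3" is treated as a floppy device; excluding the "/dev/fdset/" prefix keeps "/dev/fd0" working while leaving fd set names alone.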
--
Regards,
Corey