On 06/18/2011 04:50 PM, Blue Swirl wrote:
On Thu, Jun 16, 2011 at 5:48 PM, Corey
Bryant<bryntcor(a)us.ibm.com> wrote:
>
>
> On 06/15/2011 03:12 PM, Blue Swirl wrote:
>>
>> On Tue, Jun 14, 2011 at 4:31 PM, Corey Bryant<bryntcor(a)us.ibm.com>
wrote:
>>>
>>>> sVirt provides SELinux MAC isolation for Qemu guest processes and
>>>> their
>>>> corresponding resources (image files). sVirt provides this support
>>>> by labeling guests and resources with security labels that are stored
>>>> in file system extended attributes. Some file systems, such as NFS, do
>>>> not support the extended attribute security namespace, which is needed
>>>> for image file isolation when using the sVirt SELinux security driver
>>>> in libvirt.
>>>>
>>>> The proposed solution entails a combination of Qemu, libvirt, and
>>>> SELinux patches that work together to isolate multiple guests'
images
>>>> when they're stored in the same NFS mount. This results in an
>>>> environment where sVirt isolation and NFS image file isolation can
>>>> both
>>>> be provided.
>>>>
>>>> This patch contains the Qemu code to support this solution. I would
>>>> like to solicit input from the libvirt community prior to starting
>>>> the libvirt patch.
>>>>
>>>> Currently, Qemu opens an image file in addition to performing the
>>>> necessary read and write operations. The proposed solution will move
>>>> the open out of Qemu and into libvirt. Once libvirt opens an image
>>>> file for the guest, it will pass the file descriptor to Qemu via a
>>>> new fd: protocol.
>>>>
>>>> If the image file resides in an NFS mount, the following SELinux
>>>> policy
>>>> changes will provide image isolation:
>>>>
>>>> - A new SELinux boolean is created (e.g. virt_read_write_nfs) to
>>>> allow Qemu (svirt_t) to only have SELinux read and write
>>>> permissions on nfs_t files
>>>>
>>>> - Qemu (svirt_t) also gets SELinux use permissions on libvirt
>>>> (virtd_t) file descriptors
>>>>
>>>> Following is a sample invocation of Qemu using the fd: protocol on
>>>> the command line:
>>>>
>>>> qemu -drive file=fd:4,format=qcow2
>>>>
>>>> The fd: protocol is also supported by the drive_add monitor command.
>>>> This requires that the specified file descriptor is passed to the
>>>> monitor alongside a prior getfd monitor command.
>>>>
>>>> There are some additional features provided by certain image types
>>>> where Qemu reopens the image file. All of these scenarios will be
>>>> unsupported for the fd: protocol, at least for this patch:
>>>>
>>>> - The -snapshot command line option
>>>> - The savevm monitor command
>>>> - The snapshot_blkdev monitor command
>>>> - Starting Qemu with a backing file
>>
>> There's also native CDROM device. Did you consider adding an explicit
>> reopen method to block layer?
>
> Thanks. Yes it looks like I overlooked CDROM reopens.
>
> I'm not sure that I'm clear on the purpose of the reopen function. Would the
> goal be to funnel all block layer reopens through a single function,
> enabling potential future support where a privileged layer of Qemu, or
> libvirt, performs the open?
Eventually yes, but I think it would help also now by moving the
checks to a single place. It's a bit orthogonal to this patch though.
This would definitely simplify things, especially when reopen support is
added. I'm going to defer this until then.
>>>> The thought is that this support can be added in
the future, but is
>>>> not required for the initial fd: support.
>>>>
>>>> This patch was tested with the following formats: raw, cow, qcow,
>>>> qcow2, qed, and vmdk, using the fd: protocol from the command line
>>>> and the monitor. Tests were also run to verify existing file name
>>>> support and qemu-img were not regressed. Non-valid file descriptors,
>>>> fd: without format, snapshot and backing files were also tested.
>>>>
>>>> Signed-off-by: Corey Bryant<coreyb(a)linux.vnet.ibm.com>
>>>> ---
>>>> block.c | 16 ++++++++++
>>>> block.h | 1 +
>>>> block/cow.c | 5 +++
>>>> block/qcow.c | 5 +++
>>>> block/qcow2.c | 5 +++
>>>> block/qed.c | 4 ++
>>>> block/raw-posix.c | 81
>>>> +++++++++++++++++++++++++++++++++++++++++++++++------
>>>> block/vmdk.c | 5 +++
>>>> block_int.h | 1 +
>>>> blockdev.c | 10 ++++++
>>>> monitor.c | 5 +++
>>>> monitor.h | 1 +
>>>> qemu-options.hx | 8 +++--
>>>> qemu-tool.c | 5 +++
>>>> 14 files changed, 140 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/block.c b/block.c
>>>> index 24a25d5..500db84 100644
>>>> --- a/block.c
>>>> +++ b/block.c
>>>> @@ -536,6 +536,10 @@ int bdrv_open(BlockDriverState *bs, const char
>>>> *filename, int flags,
>>>> char tmp_filename[PATH_MAX];
>>>> char backing_filename[PATH_MAX];
>>>>
>>>> + if (bdrv_is_fd_protocol(bs)) {
>>>> + return -ENOTSUP;
>>>> + }
>>>> +
>>>> /* if snapshot, we create a temporary backing file and open
>>>> it
>>>> instead of opening 'filename' directly */
>>>>
>>>> @@ -585,6 +589,10 @@ int bdrv_open(BlockDriverState *bs, const char
>>>> *filename, int flags,
>>>>
>>>> /* Find the right image format driver */
>>>> if (!drv) {
>>>> + /* format must be specified for fd: protocol */
>>>> + if (bdrv_is_fd_protocol(bs)) {
>>>> + return -ENOTSUP;
>>>> + }
>>>> ret = find_image_format(filename,&drv);
>>>> }
>>>>
>>>> @@ -1460,6 +1468,11 @@ int bdrv_enable_write_cache(BlockDriverState
>>>> *bs)
>>>> return bs->enable_write_cache;
>>>> }
>>>>
>>>> +int bdrv_is_fd_protocol(BlockDriverState *bs)
>>>> +{
>>>> + return bs->fd_protocol;
>>>> +}
>>>> +
>>>> /* XXX: no longer used */
>>>> void bdrv_set_change_cb(BlockDriverState *bs,
>>>> void (*change_cb)(void *opaque, int reason),
>>>> @@ -1964,6 +1977,9 @@ int bdrv_snapshot_create(BlockDriverState *bs,
>>>> BlockDriver *drv = bs->drv;
>>>> if (!drv)
>>>> return -ENOMEDIUM;
>>>> + if (bdrv_is_fd_protocol(bs)) {
>>>> + return -ENOTSUP;
>>>> + }
>>>> if (drv->bdrv_snapshot_create)
>>>> return drv->bdrv_snapshot_create(bs, sn_info);
>>>> if (bs->file)
>>>> diff --git a/block.h b/block.h
>>>> index da7d39c..dd46d52 100644
>>>> --- a/block.h
>>>> +++ b/block.h
>>>> @@ -182,6 +182,7 @@ int bdrv_is_removable(BlockDriverState *bs);
>>>> int bdrv_is_read_only(BlockDriverState *bs);
>>>> int bdrv_is_sg(BlockDriverState *bs);
>>>> int bdrv_enable_write_cache(BlockDriverState *bs);
>>>> +int bdrv_is_fd_protocol(BlockDriverState *bs);
>>>> int bdrv_is_inserted(BlockDriverState *bs);
>>>> int bdrv_media_changed(BlockDriverState *bs);
>>>> int bdrv_is_locked(BlockDriverState *bs);
>>>> diff --git a/block/cow.c b/block/cow.c
>>>> index 4cf543c..e17f8e7 100644
>>>> --- a/block/cow.c
>>>> +++ b/block/cow.c
>>>> @@ -82,6 +82,11 @@ static int cow_open(BlockDriverState *bs, int
>>>> flags)
>>>> pstrcpy(bs->backing_file, sizeof(bs->backing_file),
>>>> cow_header.backing_file);
>>>>
>>>> + if (bs->backing_file[0] != '\0'&&
bdrv_is_fd_protocol(bs)) {
>>>> + /* backing file currently not supported by fd: protocol */
>>>> + goto fail;
>>>> + }
>>>> +
>>>> bitmap_size = ((bs->total_sectors + 7)>> 3) +
>>>> sizeof(cow_header);
>>>> s->cow_sectors_offset = (bitmap_size + 511)& ~511;
>>>> return 0;
>>>> diff --git a/block/qcow.c b/block/qcow.c
>>>> index a26c886..1e46bdb 100644
>>>> --- a/block/qcow.c
>>>> +++ b/block/qcow.c
>>>> @@ -157,6 +157,11 @@ static int qcow_open(BlockDriverState *bs, int
>>>> flags)
>>>> if (bdrv_pread(bs->file, header.backing_file_offset,
>>>> bs->backing_file, len) != len)
>>>> goto fail;
>>>> bs->backing_file[len] = '\0';
>>>> +
>>>> + if (bs->backing_file[0] != '\0'&&
bdrv_is_fd_protocol(bs)) {
>>>> + /* backing file currently not supported by fd: protocol
>>>> */
>>>> + goto fail;
>>>> + }
>>>> }
>>>> return 0;
>>>>
>>>> diff --git a/block/qcow2.c b/block/qcow2.c
>>>> index 8451ded..8b9c160 100644
>>>> --- a/block/qcow2.c
>>>> +++ b/block/qcow2.c
>>>> @@ -270,6 +270,11 @@ static int qcow2_open(BlockDriverState *bs, int
>>>> flags)
>>>> goto fail;
>>>> }
>>>> bs->backing_file[len] = '\0';
>>>> +
>>>> + if (bs->backing_file[0] != '\0'&&
bdrv_is_fd_protocol(bs)) {
>>>> + ret = -ENOTSUP;
>>>> + goto fail;
>>>> + }
>>>> }
>>>> if (qcow2_read_snapshots(bs)< 0) {
>>>> ret = -EINVAL;
>>>> diff --git a/block/qed.c b/block/qed.c
>>>> index 3970379..5028897 100644
>>>> --- a/block/qed.c
>>>> +++ b/block/qed.c
>>>> @@ -446,6 +446,10 @@ static int bdrv_qed_open(BlockDriverState *bs,
>>>> int flags)
>>>> return ret;
>>>> }
>>>>
>>>> + if (bs->backing_file[0] != '\0'&&
bdrv_is_fd_protocol(bs)) {
>>>> + return -ENOTSUP;
>>>> + }
>>>> +
>>>> if (s->header.features&
QED_F_BACKING_FORMAT_NO_PROBE) {
>>>> pstrcpy(bs->backing_format,
sizeof(bs->backing_format),
>>>> "raw");
>>>> }
>>>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>>>> index 4cd7d7a..c72de3d 100644
>>>> --- a/block/raw-posix.c
>>>> +++ b/block/raw-posix.c
>>>> @@ -28,6 +28,7 @@
>>>> #include "block_int.h"
>>>> #include "module.h"
>>>> #include "block/raw-posix-aio.h"
>>>> +#include "monitor.h"
>>>>
>>>> #ifdef CONFIG_COCOA
>>>> #include<paths.h>
>>>> @@ -183,7 +184,8 @@ static int raw_open_common(BlockDriverState *bs,
>>>> const char *filename,
>>>> int bdrv_flags, int open_flags)
>>>> {
>>>> BDRVRawState *s = bs->opaque;
>>>> - int fd, ret;
>>>> + int fd = -1;
>>>> + int ret;
>>>>
>>>> ret = raw_normalize_devicepath(&filename);
>>>> if (ret != 0) {
>>>> @@ -205,15 +207,17 @@ static int raw_open_common(BlockDriverState *bs,
>>>> const char *filename,
>>>> if (!(bdrv_flags& BDRV_O_CACHE_WB))
>>>> s->open_flags |= O_DSYNC;
>>>>
>>>> - s->fd = -1;
>>>> - fd = qemu_open(filename, s->open_flags, 0644);
>>>> - if (fd< 0) {
>>>> - ret = -errno;
>>>> - if (ret == -EROFS)
>>>> - ret = -EACCES;
>>>> - return ret;
>>>> + if (s->fd == -1) {
>>>> + fd = qemu_open(filename, s->open_flags, 0644);
>>>> + if (fd< 0) {
>>>> + ret = -errno;
>>>> + if (ret == -EROFS) {
>>>> + ret = -EACCES;
>>>> + }
>>>> + return ret;
>>>> + }
>>>> + s->fd = fd;
>>>> }
>>>> - s->fd = fd;
>>>> s->aligned_buf = NULL;
>>>>
>>>> if ((bdrv_flags& BDRV_O_NOCACHE)) {
>>>> @@ -270,6 +274,7 @@ static int raw_open(BlockDriverState *bs, const
>>>> char *filename, int flags)
>>>> {
>>>> BDRVRawState *s = bs->opaque;
>>>>
>>>> + s->fd = -1;
>>>> s->type = FTYPE_FILE;
>>>> return raw_open_common(bs, filename, flags, 0);
>>>> }
>>>> @@ -890,6 +895,60 @@ static BlockDriver bdrv_file = {
>>>> .create_options = raw_create_options,
>>>> };
>>>>
>>>> +static int raw_open_fd(BlockDriverState *bs, const char *filename,
>>>> int flags)
>>>> +{
>>>> + BDRVRawState *s = bs->opaque;
>>>> + const char *fd_str;
>>>> + int fd;
>>>> +
>>>> + /* extract the file descriptor - fail if it's not fd: */
>>>> + if (!strstart(filename, "fd:",&fd_str)) {
>>>> + return -EINVAL;
>>>> + }
>>>> +
>>>> + if (!qemu_isdigit(fd_str[0])) {
>>>> + /* get fd from monitor */
>>>> + fd = qemu_get_fd(fd_str);
>>>> + if (fd == -1) {
>>>> + return -EBADF;
>>>> + }
>>>> + } else {
>>>> + char *endptr = NULL;
>>>> +
>>>> + fd = strtol(fd_str,&endptr, 10);
>>>> + if (*endptr || (fd == 0&& fd_str == endptr)) {
>>>> + return -EBADF;
>>>> + }
>>>> + }
>>>> +
>>>> + s->fd = fd;
>>>> + s->type = FTYPE_FILE;
>>>> +
>>>> + return raw_open_common(bs, filename, flags, 0);
>>>> +}
>>>> +
>>>> +static BlockDriver bdrv_file_fd = {
>>>> + .format_name = "file",
>>>> + .protocol_name = "fd",
>>>> + .instance_size = sizeof(BDRVRawState),
>>>> + .bdrv_probe = NULL, /* no probe for protocols */
>>>> + .bdrv_file_open = raw_open_fd,
>>>> + .bdrv_read = raw_read,
>>>> + .bdrv_write = raw_write,
>>>> + .bdrv_close = raw_close,
>>>> + .bdrv_flush = raw_flush,
>>>> + .bdrv_discard = raw_discard,
>>>> +
>>>> + .bdrv_aio_readv = raw_aio_readv,
>>>> + .bdrv_aio_writev = raw_aio_writev,
>>>> + .bdrv_aio_flush = raw_aio_flush,
>>>> +
>>>> + .bdrv_truncate = raw_truncate,
>>>> + .bdrv_getlength = raw_getlength,
>>>> +
>>>> + .create_options = raw_create_options,
>>>> +};
>>>> +
>>>> /***********************************************/
>>>> /* host device */
>>>>
>>>> @@ -998,6 +1057,7 @@ static int hdev_open(BlockDriverState *bs, const
>>>> char *filename, int flags)
>>>> }
>>>> #endif
>>>>
>>>> + s->fd = -1;
>>>> s->type = FTYPE_FILE;
>>>> #if defined(__linux__)
>>>> {
>>>> @@ -1168,6 +1228,7 @@ static int floppy_open(BlockDriverState *bs,
>>>> const char *filename, int flags)
>>>> BDRVRawState *s = bs->opaque;
>>>> int ret;
>>>>
>>>> + s->fd = -1;
>>>> s->type = FTYPE_FD;
>>>>
>>>> /* open will not fail even if no floppy is inserted, so add
>>>> O_NONBLOCK */
>>>> @@ -1280,6 +1341,7 @@ static int cdrom_open(BlockDriverState *bs,
>>>> const char *filename, int flags)
>>>> {
>>>> BDRVRawState *s = bs->opaque;
>>>>
>>>> + s->fd = -1;
>>>> s->type = FTYPE_CD;
>>>>
>>>> /* open will not fail even if no CD is inserted, so add
>>>> O_NONBLOCK */
>>>> @@ -1503,6 +1565,7 @@ static void bdrv_file_init(void)
>>>> * Register all the drivers. Note that order is important, the
>>>> driver
>>>> * registered last will get probed first.
>>>> */
>>>> + bdrv_register(&bdrv_file_fd);
>>>> bdrv_register(&bdrv_file);
>>>> bdrv_register(&bdrv_host_device);
>>>> #ifdef __linux__
>>>> diff --git a/block/vmdk.c b/block/vmdk.c
>>>> index 922b23d..2ea808e 100644
>>>> --- a/block/vmdk.c
>>>> +++ b/block/vmdk.c
>>>> @@ -353,6 +353,11 @@ static int vmdk_parent_open(BlockDriverState *bs)
>>>> return -1;
>>>>
>>>> pstrcpy(bs->backing_file, end_name - p_name + 1, p_name);
>>>> +
>>>> + if (bs->backing_file[0] != '\0'&&
bdrv_is_fd_protocol(bs)) {
>>>> + /* backing file currently not supported by fd: protocol
>>>> */
>>>> + return -1;
>>>> + }
>>>> }
>>>>
>>>> return 0;
>>>> diff --git a/block_int.h b/block_int.h
>>>> index fa91337..a305ee2 100644
>>>> --- a/block_int.h
>>>> +++ b/block_int.h
>>>> @@ -152,6 +152,7 @@ struct BlockDriverState {
>>>> int encrypted; /* if true, the media is encrypted */
>>>> int valid_key; /* if true, a valid encryption key has been set */
>>>> int sg; /* if true, the device is a /dev/sg* */
>>>> + int fd_protocol; /* if true, the fd: protocol was specified */
>>
>> bool?
>>
> I was following suit here, but I agree that bool would be better. Or
> better, these could all be reduced to bit flags. My thought is that I'll
> stick with following suit here though.
OK, better convert all at once with a separate patch.
Ok
>>>> /* event callback when inserting/removing */
>>>> void (*change_cb)(void *opaque, int reason);
>>>> void *change_opaque;
>>>> diff --git a/blockdev.c b/blockdev.c
>>>> index e81e0ab..a536c20 100644
>>>> --- a/blockdev.c
>>>> +++ b/blockdev.c
>>>> @@ -546,6 +546,10 @@ DriveInfo *drive_init(QemuOpts *opts, int
>>>> default_to_scsi)
>>>>
>>>> bdrv_flags |= ro ? 0 : BDRV_O_RDWR;
>>>>
>>>> + if (strncmp(file, "fd:", 3) == 0) {
>>>> + dinfo->bdrv->fd_protocol = 1;
>>>> + }
>>>> +
>>>> ret = bdrv_open(dinfo->bdrv, file, bdrv_flags, drv);
>>>> if (ret< 0) {
>>>> error_report("could not open disk image %s: %s",
>>>> @@ -606,6 +610,12 @@ int do_snapshot_blkdev(Monitor *mon, const QDict
>>>> *qdict, QObject **ret_data)
>>>> goto out;
>>>> }
>>>>
>>>> + if (bdrv_is_fd_protocol(bs)) {
>>>> + qerror_report(QERR_UNSUPPORTED);
>>>> + ret = -1;
>>>> + goto out;
>>>> + }
>>>> +
>>>> pstrcpy(old_filename, sizeof(old_filename), bs->filename);
>>>>
>>>> old_drv = bs->drv;
>>>> diff --git a/monitor.c b/monitor.c
>>>> index 59a3e76..ea60be2 100644
>>>> --- a/monitor.c
>>>> +++ b/monitor.c
>>>> @@ -2832,6 +2832,11 @@ int monitor_get_fd(Monitor *mon, const char
>>>> *fdname)
>>>> return -1;
>>>> }
>>>>
>>>> +int qemu_get_fd(const char *fdname)
>>>> +{
>>>> + return cur_mon ? monitor_get_fd(cur_mon, fdname) : -1;
>>>> +}
>>>> +
>>>> static const mon_cmd_t mon_cmds[] = {
>>>> #include "hmp-commands.h"
>>>> { NULL, NULL, },
>>>> diff --git a/monitor.h b/monitor.h
>>>> index 4f2d328..de5b987 100644
>>>> --- a/monitor.h
>>>> +++ b/monitor.h
>>>> @@ -51,6 +51,7 @@ int monitor_read_bdrv_key_start(Monitor *mon,
>>>> BlockDriverState *bs,
>>>> void *opaque);
>>>>
>>>> int monitor_get_fd(Monitor *mon, const char *fdname);
>>>> +int qemu_get_fd(const char *fdname);
>>>>
>>>> void monitor_vprintf(Monitor *mon, const char *fmt, va_list ap)
>>>> GCC_FMT_ATTR(2, 0);
>>>> diff --git a/qemu-options.hx b/qemu-options.hx
>>>> index 1d5ad8b..f9b66f4 100644
>>>> --- a/qemu-options.hx
>>>> +++ b/qemu-options.hx
>>>> @@ -116,7 +116,7 @@ using @file{/dev/cdrom} as filename
>>>> (@pxref{host_drives}).
>>>> ETEXI
>>>>
>>>> DEF("drive", HAS_ARG, QEMU_OPTION_drive,
>>>> - "-drive
>>>> [file=file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
>>>> + "-drive
>>>> [file=[fd:]file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n"
>>>> "
[,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n"
>>>> "
[,cache=writethrough|writeback|none|unsafe][,format=f]\n"
>>>> "
[,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
>>>> @@ -129,10 +129,12 @@ STEXI
>>>> Define a new drive. Valid options are:
>>>>
>>>> @table @option
>>>> -@item file=@var{file}
>>>> +@item file=[fd:]@var{file}
>>>> This option defines which disk image (@pxref{disk_images}) to use
>>>> with
>>>> this drive. If the filename contains comma, you must double it
>>>> -(for instance, "file=my,,file" to use file
"my,file").
>>>> +(for instance, "file=my,,file" to use file
"my,file").
>>>> @option{fd:}@var{file}
>>>> +specifies the file descriptor of an already open disk image.
>>>> +@option{format=}@var{format} is required by @option{fd:}@var{file}.
>>>> @item if=@var{interface}
>>>> This option defines on which type on interface the drive is
>>>> connected.
>>>> Available types are: ide, scsi, sd, mtd, floppy, pflash, virtio.
>>>> diff --git a/qemu-tool.c b/qemu-tool.c
>>>> index 41e5c41..8fe6b8c 100644
>>>> --- a/qemu-tool.c
>>>> +++ b/qemu-tool.c
>>>> @@ -96,3 +96,8 @@ int64_t qemu_get_clock_ns(QEMUClock *clock)
>>>> {
>>>> return 0;
>>>> }
>>>> +
>>>> +int qemu_get_fd(const char *fdname)
>>>> +{
>>>> + return -1;
>>>> +}
>>>> --
>>>> 1.7.1
>>>>
>>>>
>>>>
>
>
> Regards,
> Corey Bryant
>
--
Regards,
Corey Bryant
IBM Linux Technology Center - Security
845-435-8732 (T/L: 295-8732)