[libvirt] [RFC PATCH 0/4] block: file descriptor passing using -filefd and getfd_file

libvirt's sVirt security driver provides SELinux MAC isolation for Qemu guest processes and their corresponding image files. In other words, sVirt uses SELinux to prevent a QEMU process from opening files that do not belong to it. sVirt provides this support by labeling guests and resources with security labels that are stored in file system extended attributes. Some file systems, such as NFS, do not support the extended attribute security namespace, and therefore cannot support sVirt isolation. A solution to this problem is to provide fd passing support, where libvirt opens files and passes file descriptors to QEMU. This, along with SELinux policy to prevent QEMU from opening files, can provide image file isolation for NFS files. This patch series adds the -filefd command-line option and the getfd_file monitor command. This will enable libvirt to open a file and push the corresponding filename and file descriptor to QEMU. When QEMU needs to "open" a file, it will first check if the file descriptor was passed by either of these methods before attempting to actually open the file. This series reuses the file_open function that Anthony Liguori <aliguori@us.ibm.com> created for the most recent fd passing prototype. It also reuses the test driver that Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> created for that prototype, with several modifications. Corey Bryant (4): qemu-options: Add -filefd command line option qmp/hmp: Add getfd_file monitor command block: Enable QEMU to retrieve passed fd before attempting open Example -filefd and getfd_file server block.c | 31 +++++++ block/raw-posix.c | 20 +++--- block/raw-win32.c | 4 +- block/vdi.c | 4 +- block/vmdk.c | 21 ++--- block/vpc.c | 2 +- block/vvfat.c | 4 +- block_int.h | 12 +++ hmp-commands.hx | 17 ++++ monitor.c | 70 +++++++++++++++++ monitor.h | 3 + qemu-config.c | 17 ++++ qemu-config.h | 1 + qemu-options.hx | 17 ++++ qemu-tool.c | 5 + qmp-commands.hx | 30 +++++++ test-fd-passing.c | 224 +++++++++++++++++++++++++++++++++++++++++++++++++++++ vl.c | 6 ++ 18 files changed, 459 insertions(+), 29 deletions(-) create mode 100644 test-fd-passing.c -- 1.7.7.6

This patch provides support for the -filefd command line option. This option will allow passing of a filename and its corresponding file descriptor to QEMU at exec time. Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com> --- qemu-config.c | 17 +++++++++++++++++ qemu-config.h | 1 + qemu-options.hx | 17 +++++++++++++++++ 3 files changed, 35 insertions(+), 0 deletions(-) diff --git a/qemu-config.c b/qemu-config.c index be84a03..64afac6 100644 --- a/qemu-config.c +++ b/qemu-config.c @@ -613,6 +613,22 @@ QemuOptsList qemu_boot_opts = { }, }; +QemuOptsList qemu_filefd_opts = { + .name = "filefd", + .head = QTAILQ_HEAD_INITIALIZER(qemu_filefd_opts.head), + .desc = { + { + .name = "file", + .type = QEMU_OPT_STRING, + }, { + .name = "fd", + .type = QEMU_OPT_NUMBER, + }, + { /*End of list */ } + }, +}; + + static QemuOptsList *vm_config_groups[32] = { &qemu_drive_opts, &qemu_chardev_opts, @@ -628,6 +644,7 @@ static QemuOptsList *vm_config_groups[32] = { &qemu_machine_opts, &qemu_boot_opts, &qemu_iscsi_opts, + &qemu_filefd_opts, NULL, }; diff --git a/qemu-config.h b/qemu-config.h index 6d7365d..6be54f1 100644 --- a/qemu-config.h +++ b/qemu-config.h @@ -4,6 +4,7 @@ extern QemuOptsList qemu_fsdev_opts; extern QemuOptsList qemu_virtfs_opts; extern QemuOptsList qemu_spice_opts; +extern QemuOptsList qemu_filefd_opts; QemuOptsList *qemu_find_opts(const char *group); void qemu_add_opts(QemuOptsList *list); diff --git a/qemu-options.hx b/qemu-options.hx index 8b66264..5f6782c 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -2743,6 +2743,23 @@ DEF("qtest-log", HAS_ARG, QEMU_OPTION_qtest_log, "-qtest-log LOG specify tracing options\n", QEMU_ARCH_ALL) +DEF("filefd", HAS_ARG, QEMU_OPTION_filefd, + "-filefd file=<filename>,fd=<fd>\n" + " passes a filename and its corresponding file descriptor\n", + QEMU_ARCH_ALL) +STEXI +@item -filefd file=@var{filename},fd=@var{fd} +@findex -filefd +This is used when a management application opens a file on behalf of QEMU. +Instead of performing an open, QEMU will use the fd passed on this option. +@table @option +@item file=@var{filename} +The name of the file. +@item fd=@var{fd} +The file descriptor that corresponds to @var{filename}. +@end table +ETEXI + HXCOMM This is the last statement. Insert new options before this line! STEXI @end table -- 1.7.7.6

On 05/21/2012 02:19 PM, Corey Bryant wrote:
This patch provides support for the -filefd command line option. This option will allow passing of a filename and its corresponding file descriptor to QEMU at exec time.
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
+DEF("filefd", HAS_ARG, QEMU_OPTION_filefd, + "-filefd file=<filename>,fd=<fd>\n"
I take it that if filename contains ',', then we have to escape it on the command line? Is it worth passing fd first and file second by default, as a possible way to avoid the need for escaping, or does the option parser not care about ordering? -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 05/21/2012 05:40 PM, Eric Blake wrote:
On 05/21/2012 02:19 PM, Corey Bryant wrote:
This patch provides support for the -filefd command line option. This option will allow passing of a filename and its corresponding file descriptor to QEMU at exec time.
Signed-off-by: Corey Bryant<coreyb@linux.vnet.ibm.com>
+DEF("filefd", HAS_ARG, QEMU_OPTION_filefd, + "-filefd file=<filename>,fd=<fd>\n"
I take it that if filename contains ',', then we have to escape it on the command line? Is it worth passing fd first and file second by default, as a possible way to avoid the need for escaping, or does the option parser not care about ordering?
That's a good question. The options can be ordered either way so I don't think we'll force fd to be specified first. I imagine this should behave no differently than "-drive file=xyz,if=none,...". I ran a quick test using -drive with a filename that had a comma, and (escaped or not) it failed on the option parsing. So it looks like if you have a path with a comma you're not going to have any luck. -- Regards, Corey

Am 22.05.2012 15:25, schrieb Corey Bryant:
On 05/21/2012 05:40 PM, Eric Blake wrote:
On 05/21/2012 02:19 PM, Corey Bryant wrote:
This patch provides support for the -filefd command line option. This option will allow passing of a filename and its corresponding file descriptor to QEMU at exec time.
Signed-off-by: Corey Bryant<coreyb@linux.vnet.ibm.com>
+DEF("filefd", HAS_ARG, QEMU_OPTION_filefd, + "-filefd file=<filename>,fd=<fd>\n"
I take it that if filename contains ',', then we have to escape it on the command line? Is it worth passing fd first and file second by default, as a possible way to avoid the need for escaping, or does the option parser not care about ordering?
That's a good question. The options can be ordered either way so I don't think we'll force fd to be specified first. I imagine this should behave no differently than "-drive file=xyz,if=none,...". I ran a quick test using -drive with a filename that had a comma, and (escaped or not) it failed on the option parsing. So it looks like if you have a path with a comma you're not going to have any luck.
I think you can escape it, you'd have to use a double comma. But I'd rather not introduce more of this. It's another good reason for using /dev/fd/... instead of a translation table. Kevin

On Tue, May 22, 2012 at 2:38 PM, Kevin Wolf <kwolf@redhat.com> wrote:
Am 22.05.2012 15:25, schrieb Corey Bryant:
On 05/21/2012 05:40 PM, Eric Blake wrote:
On 05/21/2012 02:19 PM, Corey Bryant wrote:
This patch provides support for the -filefd command line option. This option will allow passing of a filename and its corresponding file descriptor to QEMU at exec time.
Signed-off-by: Corey Bryant<coreyb@linux.vnet.ibm.com>
+DEF("filefd", HAS_ARG, QEMU_OPTION_filefd, + "-filefd file=<filename>,fd=<fd>\n"
I take it that if filename contains ',', then we have to escape it on the command line? Is it worth passing fd first and file second by default, as a possible way to avoid the need for escaping, or does the option parser not care about ordering?
That's a good question. The options can be ordered either way so I don't think we'll force fd to be specified first. I imagine this should behave no differently than "-drive file=xyz,if=none,...". I ran a quick test using -drive with a filename that had a comma, and (escaped or not) it failed on the option parsing. So it looks like if you have a path with a comma you're not going to have any luck.
I think you can escape it, you'd have to use a double comma. But I'd rather not introduce more of this. It's another good reason for using /dev/fd/... instead of a translation table.
I'm not sure how this solves backing file descriptor passing? How will QEMU know to use /dev/fd/10 for a backing file that is referenced from /dev/fd/9 as "backing.img"? (There was discussion about rewriting backing filenames but I think that solution is risky and we should avoid it.) Stefan

Am 22.05.2012 16:26, schrieb Stefan Hajnoczi:
On Tue, May 22, 2012 at 2:38 PM, Kevin Wolf <kwolf@redhat.com> wrote:
Am 22.05.2012 15:25, schrieb Corey Bryant:
On 05/21/2012 05:40 PM, Eric Blake wrote:
On 05/21/2012 02:19 PM, Corey Bryant wrote:
This patch provides support for the -filefd command line option. This option will allow passing of a filename and its corresponding file descriptor to QEMU at exec time.
Signed-off-by: Corey Bryant<coreyb@linux.vnet.ibm.com>
+DEF("filefd", HAS_ARG, QEMU_OPTION_filefd, + "-filefd file=<filename>,fd=<fd>\n"
I take it that if filename contains ',', then we have to escape it on the command line? Is it worth passing fd first and file second by default, as a possible way to avoid the need for escaping, or does the option parser not care about ordering?
That's a good question. The options can be ordered either way so I don't think we'll force fd to be specified first. I imagine this should behave no differently than "-drive file=xyz,if=none,...". I ran a quick test using -drive with a filename that had a comma, and (escaped or not) it failed on the option parsing. So it looks like if you have a path with a comma you're not going to have any luck.
I think you can escape it, you'd have to use a double comma. But I'd rather not introduce more of this. It's another good reason for using /dev/fd/... instead of a translation table.
I'm not sure how this solves backing file descriptor passing? How will QEMU know to use /dev/fd/10 for a backing file that is referenced from /dev/fd/9 as "backing.img"? (There was discussion about rewriting backing filenames but I think that solution is risky and we should avoid it.)
By getting an override for the backing file. I'm making something up now, but it could look like this: { "execute": "blockdev-add", "arguments": { "name": "my_backing_file", "format": "raw", "filename": "/dev/fd/10" } } { "execute": "blockdev-add", "arguments": { "name": "my_disk", "format": "qcow2", "filename": "/dev/fd/9", "backing_file": "my_backing_file" /* backing_fmt is not overridden and read from qcow2 */ } } If you absolutely must have it today (this is not meant to be used, but to push -blockdev development ;-)), it looks like this: (qemu) drive_add 0 file=/dev/fd/10,format=raw,if=none,id=my_disk { "execute" : "blockdev-snapshot-sync", "arguments" : { "device": "my_disk", "snapshot-file": "/dev/fd/9", "format": "qcow2", "mode": "existing" } } Kevin

This patch provides support for the getfd_file monitor command. This command will allow passing of a filename and its corresponding file descriptor to a guest via the monitor. This command could be followed, for example, by a drive_add command to hot attach a disk drive. Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com> --- hmp-commands.hx | 17 +++++++++++++ monitor.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ monitor.h | 3 ++ qemu-tool.c | 5 ++++ qmp-commands.hx | 30 +++++++++++++++++++++++ 5 files changed, 125 insertions(+), 0 deletions(-) diff --git a/hmp-commands.hx b/hmp-commands.hx index 18cb415..9cd5ed1 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1240,6 +1240,23 @@ used by another monitor command. ETEXI { + .name = "getfd_file", + .args_type = "filename:s", + .params = "getfd_file filename", + .help = "receive a file descriptor via SCM rights and assign it a filename", + .user_print = monitor_user_noop, + .mhandler.cmd_new = do_getfd_file, + }, + +STEXI +@item getfd_file @var{filename} +@findex getfd_file +If a file descriptor is passed alongside this command using the SCM_RIGHTS +mechanism on unix sockets, it is stored using the name @var{filename} for +later use by other monitor commands. +ETEXI + + { .name = "block_passwd", .args_type = "device:B,password:s", .params = "block_passwd device password", diff --git a/monitor.c b/monitor.c index 12a6fe2..bdf4dd8 100644 --- a/monitor.c +++ b/monitor.c @@ -163,6 +163,7 @@ struct Monitor { #endif QError *error; QLIST_HEAD(,mon_fd_t) fds; + QLIST_HEAD(,mon_fd_t) file_fds; QLIST_ENTRY(Monitor) entry; }; @@ -2256,6 +2257,42 @@ static int do_closefd(Monitor *mon, const QDict *qdict, QObject **ret_data) return -1; } +static int do_getfd_file(Monitor *mon, const QDict *qdict, QObject **ret_data) +{ + const char *filename = qdict_get_str(qdict, "filename"); + mon_fd_t *monfd; + int fd; + + fd = qemu_chr_fe_get_msgfd(mon->chr); + if (fd == -1) { + qerror_report(QERR_FD_NOT_SUPPLIED); + return -1; + } + + if (qemu_isdigit(filename[0])) { + qerror_report(QERR_INVALID_PARAMETER_VALUE, "filename", + "a name not starting with a digit"); + return -1; + } + + QLIST_FOREACH(monfd, &mon->file_fds, next) { + if (strcmp(monfd->name, filename) != 0) { + continue; + } + + close(monfd->fd); + monfd->fd = fd; + return 0; + } + + monfd = g_malloc0(sizeof(mon_fd_t)); + monfd->name = g_strdup(filename); + monfd->fd = fd; + + QLIST_INSERT_HEAD(&mon->file_fds, monfd, next); + return 0; +} + static void do_loadvm(Monitor *mon, const QDict *qdict) { int saved_vm_running = runstate_is_running(); @@ -2292,6 +2329,39 @@ int monitor_get_fd(Monitor *mon, const char *fdname) return -1; } +int monitor_get_fd_file(Monitor *mon, const char *filename, + bool take_ownership) +{ + mon_fd_t *monfd; + + QLIST_FOREACH(monfd, &mon->file_fds, next) { + int fd; + + if (strcmp(monfd->name, filename) != 0) { + continue; + } + + fd = monfd->fd; + + if (take_ownership) { + /* caller takes ownership of fd */ + QLIST_REMOVE(monfd, next); + g_free(monfd->name); + g_free(monfd); + } + + return fd; + } + + return -1; +} + +int qemu_get_fd_file(const char *filename, bool take_ownership) +{ + return cur_mon ? + monitor_get_fd_file(cur_mon, filename, take_ownership) : -1; +} + /* mon_cmds and info_cmds would be sorted at runtime */ static mon_cmd_t mon_cmds[] = { #include "hmp-commands.h" diff --git a/monitor.h b/monitor.h index 0d49800..529d8a7 100644 --- a/monitor.h +++ b/monitor.h @@ -60,6 +60,9 @@ int monitor_read_block_device_key(Monitor *mon, const char *device, void *opaque); int monitor_get_fd(Monitor *mon, const char *fdname); +int monitor_get_fd_file(Monitor *mon, const char *filename, + bool take_ownership); +int qemu_get_fd_file(const char *filename, bool take_ownership); void monitor_vprintf(Monitor *mon, const char *fmt, va_list ap) GCC_FMT_ATTR(2, 0); diff --git a/qemu-tool.c b/qemu-tool.c index 07fc4f2..d3d86bf 100644 --- a/qemu-tool.c +++ b/qemu-tool.c @@ -111,3 +111,8 @@ void migrate_add_blocker(Error *reason) void migrate_del_blocker(Error *reason) { } + +int qemu_get_fd_file(const char *fdname, bool take_ownership) +{ + return -1; +} diff --git a/qmp-commands.hx b/qmp-commands.hx index db980fa..1825a91 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -891,6 +891,36 @@ Example: EQMP { + .name = "getfd_file", + .args_type = "filename:s", + .params = "getfd_file filename", + .help = "receive a file descriptor via SCM rights and assign it a filename", + .user_print = monitor_user_noop, + .mhandler.cmd_new = do_getfd_file, + }, + + +SQMP + +getfd_file +-------------- + +Receive a file descriptor via SCM rights and assign it a filename. + +Arguments: + +- "filename": filename (json-string) + +Example: + +-> { "execute": "getfd_file", + "arguments": { "filename": "/var/lib/libvirt/images/tst.img" } } +<- { "return": {} } + + +EQMP + + { .name = "block_passwd", .args_type = "device:B,password:s", .mhandler.cmd_new = qmp_marshal_input_block_passwd, -- 1.7.7.6

On 05/21/2012 02:19 PM, Corey Bryant wrote:
This patch provides support for the getfd_file monitor command. This command will allow passing of a filename and its corresponding file descriptor to a guest via the monitor. This command could be followed, for example, by a drive_add command to hot attach a disk drive.
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
Is the only difference between 'getfd' and 'getfd_file' the fact that 'getfd' introduces an abstract namespace usable only by the fd: protocol, while the 'getfd_file' introduces a name identical to the absolute naming of the file system and usable by the file: protocol? What happens if I pass 'getfd_file' a relative file name? Must the filename passed to 'getfd_file' be in canonical form, or may it contain symlinks, .., and other non-canonical constructs? Can the 'closefd' command be used to close the fd originally given to qemu via 'getfd_file'? -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 05/21/2012 05:48 PM, Eric Blake wrote:
On 05/21/2012 02:19 PM, Corey Bryant wrote:
This patch provides support for the getfd_file monitor command. This command will allow passing of a filename and its corresponding file descriptor to a guest via the monitor. This command could be followed, for example, by a drive_add command to hot attach a disk drive.
Signed-off-by: Corey Bryant<coreyb@linux.vnet.ibm.com>
Is the only difference between 'getfd' and 'getfd_file' the fact that 'getfd' introduces an abstract namespace usable only by the fd: protocol, while the 'getfd_file' introduces a name identical to the absolute naming of the file system and usable by the file: protocol?
The only difference is that getfd passes an fdname to associate to the fd, and getfd_file passes a filename to associate to the fd. These name/fd pairs are stored separately so there won't be any conflicts (ie. fdname == filename).
What happens if I pass 'getfd_file' a relative file name? Must the filename passed to 'getfd_file' be in canonical form, or may it contain symlinks, .., and other non-canonical constructs?
As the code is now, the 'getfd_file' filename has to be the same as the 'drive_add' filename, for example. And the same goes for the '-drive' filename and the '-filfd' filename. I didn't introduce any special handling to canonicalize the filenames, but I think it is necessary. Either in QEMU or libvirt, but it probably makes more sense to canonicalize in QEMU.
Can the 'closefd' command be used to close the fd originally given to qemu via 'getfd_file'?
No, 'closefd' won't close an fd passed in by 'getfd_file'. I was thinking I should probably add a 'closefd_file' that could do this. -- Regards, Corey

On Mon, May 21, 2012 at 9:19 PM, Corey Bryant <coreyb@linux.vnet.ibm.com> wrote: I think Eric has raised the main questions about duplicating getfd and rules regarding canonical file names (QEMU mashes filenames together if the backing filename is relative!).
+ if (qemu_isdigit(filename[0])) { + qerror_report(QERR_INVALID_PARAMETER_VALUE, "filename", + "a name not starting with a digit"); + return -1; + }
What is the reason for this filename restriction?
diff --git a/qmp-commands.hx b/qmp-commands.hx index db980fa..1825a91 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -891,6 +891,36 @@ Example: EQMP
{ + .name = "getfd_file", + .args_type = "filename:s", + .params = "getfd_file filename", + .help = "receive a file descriptor via SCM rights and assign it a filename", + .user_print = monitor_user_noop, + .mhandler.cmd_new = do_getfd_file, + }, + + +SQMP + +getfd_file +-------------- + +Receive a file descriptor via SCM rights and assign it a filename. + +Arguments: + +- "filename": filename (json-string) + +Example: + +-> { "execute": "getfd_file", + "arguments": { "filename": "/var/lib/libvirt/images/tst.img" } } +<- { "return": {} } + + +EQMP
QMP commands should be added to qapi-schema.json as described in docs/writing-qmp-commands.txt. Stefan

On 05/22/2012 05:18 AM, Stefan Hajnoczi wrote:
On Mon, May 21, 2012 at 9:19 PM, Corey Bryant<coreyb@linux.vnet.ibm.com> wrote: I think Eric has raised the main questions about duplicating getfd and rules regarding canonical file names (QEMU mashes filenames together if the backing filename is relative!).
Ok, good so it sounds like we this covered in the other threads then.
+ if (qemu_isdigit(filename[0])) { + qerror_report(QERR_INVALID_PARAMETER_VALUE, "filename", + "a name not starting with a digit"); + return -1; + }
What is the reason for this filename restriction?
The reason is that I copied this from 'getfd'. :) I'll remove it.
diff --git a/qmp-commands.hx b/qmp-commands.hx index db980fa..1825a91 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -891,6 +891,36 @@ Example: EQMP
{ + .name = "getfd_file", + .args_type = "filename:s", + .params = "getfd_file filename", + .help = "receive a file descriptor via SCM rights and assign it a filename", + .user_print = monitor_user_noop, + .mhandler.cmd_new = do_getfd_file, + }, + + +SQMP + +getfd_file +-------------- + +Receive a file descriptor via SCM rights and assign it a filename. + +Arguments: + +- "filename": filename (json-string) + +Example: + +-> { "execute": "getfd_file", + "arguments": { "filename": "/var/lib/libvirt/images/tst.img" } } +<- { "return": {} } + + +EQMP
QMP commands should be added to qapi-schema.json as described in docs/writing-qmp-commands.txt.
Stefan
Ok thanks! -- Regards, Corey

On Tue, 22 May 2012 10:18:22 +0100 Stefan Hajnoczi <stefanha@gmail.com> wrote:
QMP commands should be added to qapi-schema.json as described in docs/writing-qmp-commands.txt.
Looks like there's consensus on dropping this patch and enhancing getfd to return the fd number. This would require to first convert getfd from plain HMP to the QAPI, which is basically to say more or less the same thing as Stefan said above (you could also look for examples searching for "qapi: convert" in the git log). But there's a small problem. Today getfd commands are closely tied to the Monitor. In Anthony's development tree, the getfd commands are tied to the new QMP server's session support. Asking you to integrate the new QMP server only to have the getfd command returning a simple integer would be too much, but at the same time I think you'll have to at least to break it from the monitor. This means moving its data structure away from the Monitor object and probably reworking the internal API used to get fds (ie. monitor_get_fd()). Shouldn't be hard, but you should be careful not to break external users.

On 05/22/2012 03:06 PM, Luiz Capitulino wrote:
On Tue, 22 May 2012 10:18:22 +0100 Stefan Hajnoczi<stefanha@gmail.com> wrote:
QMP commands should be added to qapi-schema.json as described in docs/writing-qmp-commands.txt.
Looks like there's consensus on dropping this patch and enhancing getfd to return the fd number. This would require to first convert getfd from plain HMP to the QAPI, which is basically to say more or less the same thing as Stefan said above (you could also look for examples searching for "qapi: convert" in the git log).
Yep, ok thanks.
But there's a small problem. Today getfd commands are closely tied to the Monitor. In Anthony's development tree, the getfd commands are tied to the new QMP server's session support.
Asking you to integrate the new QMP server only to have the getfd command returning a simple integer would be too much, but at the same time I think you'll have to at least to break it from the monitor. This means moving its data structure away from the Monitor object and probably reworking the internal API used to get fds (ie. monitor_get_fd()).
Shouldn't be hard, but you should be careful not to break external users.
Just to verify, are you talking about moving the "fds" off the Monitor struct? --> QLIST_HEAD(,mon_fd_t) fds; Was this already moved away from the Monitor struct in Anthony's development tree? If not do you have a recommendation on where to move it? I think this would make more sense to me if I took a look at the getfd code in Anthony's development tree. Is this the correct tree? I had some issues cloning it. https://github.com/aliguori/qemu-next.git -- Regards, Corey

On Tue, 22 May 2012 16:02:19 -0400 Corey Bryant <coreyb@linux.vnet.ibm.com> wrote:
But there's a small problem. Today getfd commands are closely tied to the Monitor. In Anthony's development tree, the getfd commands are tied to the new QMP server's session support.
Asking you to integrate the new QMP server only to have the getfd command returning a simple integer would be too much, but at the same time I think you'll have to at least to break it from the monitor. This means moving its data structure away from the Monitor object and probably reworking the internal API used to get fds (ie. monitor_get_fd()).
Shouldn't be hard, but you should be careful not to break external users.
Just to verify, are you talking about moving the "fds" off the Monitor struct? --> QLIST_HEAD(,mon_fd_t) fds;
Yes.
Was this already moved away from the Monitor struct in Anthony's development tree? If not do you have a recommendation on where to move it?
Yes, iirc it moved inside the new QMP server session support in Anthony's tree.
I think this would make more sense to me if I took a look at the getfd code in Anthony's development tree. Is this the correct tree? I had some issues cloning it. https://github.com/aliguori/qemu-next.git
The 'development' tree I'm referring to is the old glib branch in git://repo.or.cz/qemu/aliguori.git.

On 05/22/2012 04:26 PM, Luiz Capitulino wrote:
On Tue, 22 May 2012 16:02:19 -0400 Corey Bryant<coreyb@linux.vnet.ibm.com> wrote:
But there's a small problem. Today getfd commands are closely tied to the Monitor. In Anthony's development tree, the getfd commands are tied to the new QMP server's session support.
Asking you to integrate the new QMP server only to have the getfd command returning a simple integer would be too much, but at the same time I think you'll have to at least to break it from the monitor. This means moving its data structure away from the Monitor object and probably reworking the internal API used to get fds (ie. monitor_get_fd()).
Shouldn't be hard, but you should be careful not to break external users.
Just to verify, are you talking about moving the "fds" off the Monitor struct? --> QLIST_HEAD(,mon_fd_t) fds;
Yes.
Was this already moved away from the Monitor struct in Anthony's development tree? If not do you have a recommendation on where to move it?
Yes, iirc it moved inside the new QMP server session support in Anthony's tree.
I think this would make more sense to me if I took a look at the getfd code in Anthony's development tree. Is this the correct tree? I had some issues cloning it. https://github.com/aliguori/qemu-next.git
The 'development' tree I'm referring to is the old glib branch in git://repo.or.cz/qemu/aliguori.git.
Hmm, it looks like fds is still on the Monitor struct in that branch. I'll do some more searching later unless you have anything else you can point me to. -- Regards, Corey

On Tue, 22 May 2012 18:34:19 -0400 Corey Bryant <coreyb@linux.vnet.ibm.com> wrote:
On 05/22/2012 04:26 PM, Luiz Capitulino wrote:
On Tue, 22 May 2012 16:02:19 -0400 Corey Bryant<coreyb@linux.vnet.ibm.com> wrote:
But there's a small problem. Today getfd commands are closely tied to the Monitor. In Anthony's development tree, the getfd commands are tied to the new QMP server's session support.
Asking you to integrate the new QMP server only to have the getfd command returning a simple integer would be too much, but at the same time I think you'll have to at least to break it from the monitor. This means moving its data structure away from the Monitor object and probably reworking the internal API used to get fds (ie. monitor_get_fd()).
Shouldn't be hard, but you should be careful not to break external users.
Just to verify, are you talking about moving the "fds" off the Monitor struct? --> QLIST_HEAD(,mon_fd_t) fds;
Yes.
Was this already moved away from the Monitor struct in Anthony's development tree? If not do you have a recommendation on where to move it?
Yes, iirc it moved inside the new QMP server session support in Anthony's tree.
I think this would make more sense to me if I took a look at the getfd code in Anthony's development tree. Is this the correct tree? I had some issues cloning it. https://github.com/aliguori/qemu-next.git
The 'development' tree I'm referring to is the old glib branch in git://repo.or.cz/qemu/aliguori.git.
Hmm, it looks like fds is still on the Monitor struct in that branch.
Oh, you're right. That code is unfinished. It seems that I kept the finished version in my mind. Well, I don't think that moving the fd array to another object will buy us much, so you can keep it this way. Note that you still have to convert do_getfd() to the QAPI as pointed out by Stefan and that the monitor object (*mon pointer) won't be passed to it. You'll have to use cur_mon in qmp_getfd() (as Anthony's version does).

On 05/23/2012 09:33 AM, Luiz Capitulino wrote:
On Tue, 22 May 2012 18:34:19 -0400 Corey Bryant<coreyb@linux.vnet.ibm.com> wrote:
On 05/22/2012 04:26 PM, Luiz Capitulino wrote:
On Tue, 22 May 2012 16:02:19 -0400 Corey Bryant<coreyb@linux.vnet.ibm.com> wrote:
But there's a small problem. Today getfd commands are closely tied to the Monitor. In Anthony's development tree, the getfd commands are tied to the new QMP server's session support.
Asking you to integrate the new QMP server only to have the getfd command returning a simple integer would be too much, but at the same time I think you'll have to at least to break it from the monitor. This means moving its data structure away from the Monitor object and probably reworking the internal API used to get fds (ie. monitor_get_fd()).
Shouldn't be hard, but you should be careful not to break external users.
Just to verify, are you talking about moving the "fds" off the Monitor struct? --> QLIST_HEAD(,mon_fd_t) fds;
Yes.
Was this already moved away from the Monitor struct in Anthony's development tree? If not do you have a recommendation on where to move it?
Yes, iirc it moved inside the new QMP server session support in Anthony's tree.
I think this would make more sense to me if I took a look at the getfd code in Anthony's development tree. Is this the correct tree? I had some issues cloning it. https://github.com/aliguori/qemu-next.git
The 'development' tree I'm referring to is the old glib branch in git://repo.or.cz/qemu/aliguori.git.
Hmm, it looks like fds is still on the Monitor struct in that branch.
Oh, you're right. That code is unfinished. It seems that I kept the finished version in my mind.
Oh how I wish I could git clone some people's brains. :)
Well, I don't think that moving the fd array to another object will buy us much, so you can keep it this way. Note that you still have to convert do_getfd() to the QAPI as pointed out by Stefan and that the monitor object (*mon pointer) won't be passed to it. You'll have to use cur_mon in qmp_getfd() (as Anthony's version does).
Alright, thanks for the info. -- Regards, Corey

With this patch, when QEMU needs to "open" a file, it will first check to see if a matching filename/fd pair were passed via the -filefd command line option or the getfd_file monitor command. If a match is found, QEMU will use the passed fd and will not attempt to open the file. Otherwise, if a match is not found, QEMU will attempt to open the file on it's own. Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com> --- block.c | 31 +++++++++++++++++++++++++++++++ block/raw-posix.c | 20 ++++++++++---------- block/raw-win32.c | 4 ++-- block/vdi.c | 4 ++-- block/vmdk.c | 21 +++++++++------------ block/vpc.c | 2 +- block/vvfat.c | 4 ++-- block_int.h | 12 ++++++++++++ vl.c | 6 ++++++ 9 files changed, 75 insertions(+), 29 deletions(-) diff --git a/block.c b/block.c index af2ab4f..6472114 100644 --- a/block.c +++ b/block.c @@ -102,6 +102,37 @@ static BlockDriverState *bs_snapshots; /* If non-zero, use only whitelisted block drivers */ static int use_bdrv_whitelist; +static int filename_match(QemuOpts *opts, void *opaque) +{ + const char *file = qemu_opt_get(opts, "file"); + int fd = qemu_opt_get_number(opts, "fd", -1); + + return (strcmp((char *)opaque, file) == 0) ? fd : 0; +} + +int file_open(const char *filename, int flags, mode_t mode) +{ + int fd; + +#ifdef _WIN32 + return qemu_open(filename, flags, mode); +#else + + fd = qemu_opts_foreach(qemu_find_opts("filefd"), filename_match, + (void *)filename, 1); + if (fd != 0) { + return dup(fd); + } + + fd = qemu_get_fd_file(filename, false); + if (fd != -1) { + return dup(fd); + } + + return qemu_open(filename, flags, mode); +#endif +} + #ifdef _WIN32 static int is_windows_drive_prefix(const char *filename) { diff --git a/block/raw-posix.c b/block/raw-posix.c index 03fcfcc..4f7b40f 100644 --- a/block/raw-posix.c +++ b/block/raw-posix.c @@ -208,7 +208,7 @@ static int raw_open_common(BlockDriverState *bs, const char *filename, s->open_flags |= O_DSYNC; s->fd = -1; - fd = qemu_open(filename, s->open_flags, 0644); + fd = file_open(filename, s->open_flags, 0644); if (fd < 0) { ret = -errno; if (ret == -EROFS) @@ -568,8 +568,8 @@ static int raw_create(const char *filename, QEMUOptionParameter *options) options++; } - fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, - 0644); + fd = file_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, + 0644); if (fd < 0) { result = -errno; } else { @@ -741,7 +741,7 @@ static int hdev_open(BlockDriverState *bs, const char *filename, int flags) if ( bsdPath[ 0 ] != '\0' ) { strcat(bsdPath,"s0"); /* some CDs don't have a partition 0 */ - fd = open(bsdPath, O_RDONLY | O_BINARY | O_LARGEFILE); + fd = file_open(bsdPath, O_RDONLY | O_BINARY | O_LARGEFILE, 0); if (fd < 0) { bsdPath[strlen(bsdPath)-1] = '1'; } else { @@ -798,7 +798,7 @@ static int fd_open(BlockDriverState *bs) #endif return -EIO; } - s->fd = open(bs->filename, s->open_flags & ~O_NONBLOCK); + s->fd = file_open(bs->filename, s->open_flags & ~O_NONBLOCK, 0); if (s->fd < 0) { s->fd_error_time = get_clock(); s->fd_got_error = 1; @@ -872,7 +872,7 @@ static int hdev_create(const char *filename, QEMUOptionParameter *options) options++; } - fd = open(filename, O_WRONLY | O_BINARY); + fd = file_open(filename, O_WRONLY | O_BINARY, 0); if (fd < 0) return -errno; @@ -950,7 +950,7 @@ static int floppy_probe_device(const char *filename) if (strstart(filename, "/dev/fd", NULL)) prio = 50; - fd = open(filename, O_RDONLY | O_NONBLOCK); + fd = file_open(filename, O_RDONLY | O_NONBLOCK, 0); if (fd < 0) { goto out; } @@ -1003,7 +1003,7 @@ static void floppy_eject(BlockDriverState *bs, bool eject_flag) close(s->fd); s->fd = -1; } - fd = open(bs->filename, s->open_flags | O_NONBLOCK); + fd = file_open(bs->filename, s->open_flags | O_NONBLOCK, 0); if (fd >= 0) { if (ioctl(fd, FDEJECT, 0) < 0) perror("FDEJECT"); @@ -1053,7 +1053,7 @@ static int cdrom_probe_device(const char *filename) int prio = 0; struct stat st; - fd = open(filename, O_RDONLY | O_NONBLOCK); + fd = file_open(filename, O_RDONLY | O_NONBLOCK, 0); if (fd < 0) { goto out; } @@ -1177,7 +1177,7 @@ static int cdrom_reopen(BlockDriverState *bs) */ if (s->fd >= 0) close(s->fd); - fd = open(bs->filename, s->open_flags, 0644); + fd = file_open(bs->filename, s->open_flags, 0644); if (fd < 0) { s->fd = -1; return -EIO; diff --git a/block/raw-win32.c b/block/raw-win32.c index e4b0b75..ec4ac96 100644 --- a/block/raw-win32.c +++ b/block/raw-win32.c @@ -255,8 +255,8 @@ static int raw_create(const char *filename, QEMUOptionParameter *options) options++; } - fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, - 0644); + fd = file_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, + 0644); if (fd < 0) return -EIO; set_sparse(fd); diff --git a/block/vdi.c b/block/vdi.c index 119d3c7..b3ec9b2 100644 --- a/block/vdi.c +++ b/block/vdi.c @@ -648,8 +648,8 @@ static int vdi_create(const char *filename, QEMUOptionParameter *options) options++; } - fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE, - 0644); + fd = file_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE, + 0644); if (fd < 0) { return -errno; } diff --git a/block/vmdk.c b/block/vmdk.c index 18e9b4c..bda4c1a 100644 --- a/block/vmdk.c +++ b/block/vmdk.c @@ -1161,10 +1161,9 @@ static int vmdk_create_extent(const char *filename, int64_t filesize, VMDK4Header header; uint32_t tmp, magic, grains, gd_size, gt_size, gt_count; - fd = open( - filename, - O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE, - 0644); + fd = file_open(filename, + O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE, + 0644); if (fd < 0) { return -errno; } @@ -1484,15 +1483,13 @@ static int vmdk_create(const char *filename, QEMUOptionParameter *options) (flags & BLOCK_FLAG_COMPAT6 ? 6 : 4), total_size / (int64_t)(63 * 16 * 512)); if (split || flat) { - fd = open( - filename, - O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE, - 0644); + fd = file_open(filename, + O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE, + 0644); } else { - fd = open( - filename, - O_WRONLY | O_BINARY | O_LARGEFILE, - 0644); + fd = file_open(filename, + O_WRONLY | O_BINARY | O_LARGEFILE, + 0644); } if (fd < 0) { return -errno; diff --git a/block/vpc.c b/block/vpc.c index 5cd13d1..c1efb10 100644 --- a/block/vpc.c +++ b/block/vpc.c @@ -678,7 +678,7 @@ static int vpc_create(const char *filename, QEMUOptionParameter *options) } /* Create the file */ - fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 0644); + fd = file_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 0644); if (fd < 0) { return -EIO; } diff --git a/block/vvfat.c b/block/vvfat.c index 2dc9d50..1573d8f 100644 --- a/block/vvfat.c +++ b/block/vvfat.c @@ -1156,7 +1156,7 @@ static int open_file(BDRVVVFATState* s,mapping_t* mapping) if(!s->current_mapping || strcmp(s->current_mapping->path,mapping->path)) { /* open file */ - int fd = open(mapping->path, O_RDONLY | O_BINARY | O_LARGEFILE); + int fd = file_open(mapping->path, O_RDONLY | O_BINARY | O_LARGEFILE, 0); if(fd<0) return -1; vvfat_close_current_file(s); @@ -2215,7 +2215,7 @@ static int commit_one_file(BDRVVVFATState* s, for (i = s->cluster_size; i < offset; i += s->cluster_size) c = modified_fat_get(s, c); - fd = open(mapping->path, O_RDWR | O_CREAT | O_BINARY, 0666); + fd = file_open(mapping->path, O_RDWR | O_CREAT | O_BINARY, 0666); if (fd < 0) { fprintf(stderr, "Could not open %s... (%s, %d)\n", mapping->path, strerror(errno), errno); diff --git a/block_int.h b/block_int.h index b80e66d..f3b6144 100644 --- a/block_int.h +++ b/block_int.h @@ -453,4 +453,16 @@ void stream_start(BlockDriverState *bs, BlockDriverState *base, BlockDriverCompletionFunc *cb, void *opaque, Error **errp); +/** + * file_open: + * @filename: the filename to attempt to open + * @flags: O_ flags that determine how the file is open + * @mode: the mode to create the file with if #O_CREAT is included in @flags + * + * This function behaves just like the @open libc function. It may, however, + * get the file descriptor from the QEMU command line or monitor if QEMU is + * being run with fewer privileges. + */ +int file_open(const char *filename, int flags, mode_t mode); + #endif /* BLOCK_INT_H */ diff --git a/vl.c b/vl.c index 23ab3a3..6a55166 100644 --- a/vl.c +++ b/vl.c @@ -3201,6 +3201,12 @@ int main(int argc, char **argv, char **envp) case QEMU_OPTION_qtest_log: qtest_log = optarg; break; + case QEMU_OPTION_filefd: + opts = qemu_opts_parse(qemu_find_opts("filefd"), optarg, 0); + if (!opts) { + exit(1); + } + break; default: os_parse_cmd_args(popt->index, optarg); } -- 1.7.7.6

On 05/21/2012 02:19 PM, Corey Bryant wrote:
With this patch, when QEMU needs to "open" a file, it will first check to see if a matching filename/fd pair were passed via the -filefd command line option or the getfd_file monitor command. If a match is found, QEMU will use the passed fd and will not attempt to open the file. Otherwise, if a match is not found, QEMU will attempt to open the file on it's own.
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
+int file_open(const char *filename, int flags, mode_t mode) +{ + int fd; + +#ifdef _WIN32 + return qemu_open(filename, flags, mode); +#else
Would it be any easier to write: #ifndef _WIN32 qemu_get_fd_file() stuff #endif return qemu_open() so that you aren't repeating the return line?
+ fd = qemu_get_fd_file(filename, false); + if (fd != -1) { + return dup(fd);
Why are you dup'ing the fd? That just sounds like a way to leak fds. Remember, the existing 'getfd' monitor command doesn't dup things - it either gets consumed by a successful use of the named fd, or it remains open on failure and the user can close it by calling 'closefd'. Or, if you are intentionally allowing the user to reuse the fd for more than one qemu open instance, you need to document this point. What happens if qemu wants O_WRONLY or O_RDWR access, but the user passed in an fd with only O_RDONLY access? -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 05/21/2012 05:50 PM, Eric Blake wrote:
On 05/21/2012 02:19 PM, Corey Bryant wrote:
With this patch, when QEMU needs to "open" a file, it will first check to see if a matching filename/fd pair were passed via the -filefd command line option or the getfd_file monitor command. If a match is found, QEMU will use the passed fd and will not attempt to open the file. Otherwise, if a match is not found, QEMU will attempt to open the file on it's own.
Signed-off-by: Corey Bryant<coreyb@linux.vnet.ibm.com>
+int file_open(const char *filename, int flags, mode_t mode) +{ + int fd; + +#ifdef _WIN32 + return qemu_open(filename, flags, mode); +#else
Would it be any easier to write:
#ifndef _WIN32 qemu_get_fd_file() stuff #endif return qemu_open()
so that you aren't repeating the return line?
Yes that's easier. Thanks for the suggestion!
+ fd = qemu_get_fd_file(filename, false); + if (fd != -1) { + return dup(fd);
Why are you dup'ing the fd? That just sounds like a way to leak fds. Remember, the existing 'getfd' monitor command doesn't dup things - it either gets consumed by a successful use of the named fd, or it remains open on failure and the user can close it by calling 'closefd'.
The way it works now is that the fd/filename pairs that are passed in (either through -filefd or getfd_file) will persist on the option and monitor structures. In other words, when we find a match for a filename on the monitor structure, we don't delete it from the struct. We leave it there in case we need to open the file again. So we dup the fd and let QEMU use the new fd as if it opened it itself. This allows QEMU to close the fd on its own, and if it needs to re-open the fd, it can dup it again.
Or, if you are intentionally allowing the user to reuse the fd for more than one qemu open instance, you need to document this point.
Ok, yes. I'll document this.
What happens if qemu wants O_WRONLY or O_RDWR access, but the user passed in an fd with only O_RDONLY access?
This is an area of concern for me. And it's an area where Anthony's call-back approach was much simpler because it passed the open flags directly from QEMU to libvirt. I don't think these flags can be set through fcntl(F_SETFL). So I think they have to be set on the open() by the managing application. The problem is that, today, QEMU will open a single file several different times on initialization alone (reading cow headers and what not), and the open flags vary on these open calls. The difference with the new approach in this patch series is that the fd from a single open call is re-used for each of the "opens". -- Regards, Corey

This example demonstrates use of the -filefd command to open two disk drives at start-up time. It also demonstrates hot attaching a third disk drive with the getfd_file monitor command. I still have some learning to do with regards to QMP, so the example is using a not-so-program-friendly HMP method. Usage: ./test-fd-passing /path/hda.img /path/hdb.img /path/hdc.img Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com> --- test-fd-passing.c | 224 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 224 insertions(+), 0 deletions(-) create mode 100644 test-fd-passing.c diff --git a/test-fd-passing.c b/test-fd-passing.c new file mode 100644 index 0000000..d568198 --- /dev/null +++ b/test-fd-passing.c @@ -0,0 +1,224 @@ +/* + * QEMU -filefd and getfd_file test server + * + * Copyright IBM, Corp. 2012 + * + * Authors: + * Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> + * Corey Bryant <coreyb@linux.vnet.ibm.com> + * + * This work is licensed under the terms of the GNU LGPL, version 2 or later. + * See the COPYING.LIB file in the top-level directory. + * + * gcc -Wall -o test-fd-passing test-fd-passing.c + */ + +#define _GNU_SOURCE +#include <stdint.h> +#include <stdlib.h> +#include <string.h> +#include <stdio.h> +#include <errno.h> +#include <sys/stat.h> +#include <sys/types.h> +#include <sys/socket.h> +#include <fcntl.h> +#include <unistd.h> +#include <spawn.h> +#include <sys/socket.h> +#include <sys/un.h> + +static int openQemuMonitor(const char *monitor) +{ + int i; + int ret; + struct sockaddr_un addr; + int monfd = 0; + + if ((monfd = socket(AF_UNIX, SOCK_STREAM, 0)) < 0) { + perror("socket"); + goto error; + } + + memset(&addr, 0, sizeof(addr)); + addr.sun_family = AF_UNIX; + strcpy(addr.sun_path, monitor); + + for (i = 0; i < 100; i++) { + ret = connect(monfd, (struct sockaddr *) &addr, sizeof(addr)); + if (ret == 0) { + break; + } + usleep(.2 * 1000000); + } + + if (ret != 0) { + fprintf(stderr, "no monitor socket"); + goto error; + } + + return monfd; + +error: + close(monfd); + return -1; +} + +static int issueHMPCmdFD(int monfd,const char *data, size_t len, int fd) +{ + int ret; + struct msghdr msg; + struct iovec iov[1]; + char control[CMSG_SPACE(sizeof(int))]; + struct cmsghdr *cmsg; + + memset(&msg, 0, sizeof(msg)); + + iov[0].iov_base = (void *)data; + iov[0].iov_len = len; + + msg.msg_iov = iov; + msg.msg_iovlen = 1; + + msg.msg_control = control; + msg.msg_controllen = sizeof(control); + + cmsg = CMSG_FIRSTHDR(&msg); + cmsg->cmsg_len = CMSG_LEN(sizeof(int)); + cmsg->cmsg_level = SOL_SOCKET; + cmsg->cmsg_type = SCM_RIGHTS; + memcpy(CMSG_DATA(cmsg), &fd, sizeof(int)); + + do { + ret = sendmsg(monfd, &msg, 0); + } while (ret < 0 && errno == EINTR); + + return ret; +} + +int main(int argc, char *argv[]) { + int rc; + int fd1, fd2, hotfd, monfd=-1; + int flags = O_RDWR; + int mode = S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH; + pid_t child_pid; + char *drive_str_1 = NULL;; + char *drive_str_2 = NULL;; + char *filefd_str_1 = NULL; + char *filefd_str_2 = NULL; + char *getfd_file_str = NULL; + char *drive_add_str = NULL; + char *device_add_str = NULL; + + if (argc != 4) { + fprintf(stderr, "usage: %s <boot-image-file-1> <boot-image-file-2> <attach-image-file>\n", argv[0]); + goto error; + } + + fd1 = open(argv[1], flags, mode); + if (fd1 == -1) { + perror("open"); + goto error; + } + + fd2 = open(argv[2], flags, mode); + if (fd2 == -1) { + perror("open"); + goto error; + } + + hotfd = open(argv[3], flags, mode); + if (hotfd == -1) { + perror("open"); + goto error; + } + + asprintf(&drive_str_1, "file=%s,if=none,id=drive-virtio-disk0", argv[1]); + asprintf(&filefd_str_1, "file=%s,fd=%d", argv[1], fd1); + asprintf(&drive_str_2, "file=%s,if=none,id=drive-virtio-disk1", argv[2]); + asprintf(&filefd_str_2, "file=%s,fd=%d", argv[2], fd2); + + char *child_argv[] = { + "qemu-system-x86_64", + "-enable-kvm", + "-m", "1024", + "-chardev", + "socket,id=charmonitor,path=/var/lib/libvirt/qemu/RHEL62.monitor,server,nowait", + "-mon", + "chardev=charmonitor,id=monitor,mode=readline", + "-drive", drive_str_1, + "-device", + "virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0", + "-drive", drive_str_2, + "-device", + "virtio-blk-pci,bus=pci.0,addr=0x6,drive=drive-virtio-disk1,id=virtio-disk1", + "-filefd", filefd_str_1, + "-filefd", filefd_str_2, + "-vnc", ":0", + NULL, + }; + + if (posix_spawn(&child_pid, "/usr/local/bin/qemu-system-x86_64", + NULL, NULL, child_argv, environ) != 0) { + perror("posix_spawn\n"); + goto error; + } + + monfd = openQemuMonitor("/var/lib/libvirt/qemu/RHEL62.monitor"); + if (monfd == -1) { + goto error; + } + + asprintf(&getfd_file_str, "getfd_file %s\r\n", argv[3]); + rc = issueHMPCmdFD(monfd, getfd_file_str, + strlen(getfd_file_str), hotfd); + if (rc < 0) { + perror("issueHMPCmdFD"); + goto error; + } + + sleep(1); + asprintf(&drive_add_str, "drive_add data_drive file=%s,%s", argv[3], + "if=none,id=drive-virtio-disk2,cache=writethrough\r\n"); + rc = write(monfd, drive_add_str, strlen(drive_add_str)); + if (rc < 0) { + perror("write"); + goto error; + } + + sleep(1); + asprintf(&device_add_str, "device_add virtio-blk-pci,bus=pci.0,%s", + "addr=0x8,drive=drive-virtio-disk2,id=virtio-disk2\r\n"); + rc = write(monfd, device_add_str, strlen(device_add_str)); + if (rc < 0) { + perror("write"); + goto error; + } + +error: + if (drive_str_1) { + free(drive_str_1); + } + if (drive_str_2) { + free(drive_str_2); + } + if (filefd_str_1) { + free(filefd_str_1); + } + if (filefd_str_2) { + free(filefd_str_2); + } + if (getfd_file_str) { + free(getfd_file_str); + } + if (drive_add_str) { + free(drive_add_str); + } + if (device_add_str) { + free(device_add_str); + } + if (monfd != -1) { + close(monfd); + } + return -1; +} -- 1.7.7.6

Am 21.05.2012 22:19, schrieb Corey Bryant:
libvirt's sVirt security driver provides SELinux MAC isolation for Qemu guest processes and their corresponding image files. In other words, sVirt uses SELinux to prevent a QEMU process from opening files that do not belong to it.
sVirt provides this support by labeling guests and resources with security labels that are stored in file system extended attributes. Some file systems, such as NFS, do not support the extended attribute security namespace, and therefore cannot support sVirt isolation.
A solution to this problem is to provide fd passing support, where libvirt opens files and passes file descriptors to QEMU. This, along with SELinux policy to prevent QEMU from opening files, can provide image file isolation for NFS files.
This patch series adds the -filefd command-line option and the getfd_file monitor command. This will enable libvirt to open a file and push the corresponding filename and file descriptor to QEMU. When QEMU needs to "open" a file, it will first check if the file descriptor was passed by either of these methods before attempting to actually open the file.
I thought we decided to avoid making some file names magic, and instead go for the obvious /dev/fd/42? Kevin

On 05/22/2012 02:18 AM, Kevin Wolf wrote:
This patch series adds the -filefd command-line option and the getfd_file monitor command. This will enable libvirt to open a file and push the corresponding filename and file descriptor to QEMU. When QEMU needs to "open" a file, it will first check if the file descriptor was passed by either of these methods before attempting to actually open the file.
I thought we decided to avoid making some file names magic, and instead go for the obvious /dev/fd/42?
This doesn't make "some file names magic", it makes "all file names magic". In other words, _every_ call to open() first checks the database for an existing fd for the same file name. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

Am 22.05.2012 14:02, schrieb Eric Blake:
On 05/22/2012 02:18 AM, Kevin Wolf wrote:
This patch series adds the -filefd command-line option and the getfd_file monitor command. This will enable libvirt to open a file and push the corresponding filename and file descriptor to QEMU. When QEMU needs to "open" a file, it will first check if the file descriptor was passed by either of these methods before attempting to actually open the file.
I thought we decided to avoid making some file names magic, and instead go for the obvious /dev/fd/42?
This doesn't make "some file names magic", it makes "all file names magic". In other words, _every_ call to open() first checks the database for an existing fd for the same file name.
Depends on your definition. You call every database lookup magic, I only considered cases where the database actually contains something. But no matter if "some" or "all", there's magic and I dislike that. Kevin

On 05/22/2012 04:18 AM, Kevin Wolf wrote:
Am 21.05.2012 22:19, schrieb Corey Bryant:
libvirt's sVirt security driver provides SELinux MAC isolation for Qemu guest processes and their corresponding image files. In other words, sVirt uses SELinux to prevent a QEMU process from opening files that do not belong to it.
sVirt provides this support by labeling guests and resources with security labels that are stored in file system extended attributes. Some file systems, such as NFS, do not support the extended attribute security namespace, and therefore cannot support sVirt isolation.
A solution to this problem is to provide fd passing support, where libvirt opens files and passes file descriptors to QEMU. This, along with SELinux policy to prevent QEMU from opening files, can provide image file isolation for NFS files.
This patch series adds the -filefd command-line option and the getfd_file monitor command. This will enable libvirt to open a file and push the corresponding filename and file descriptor to QEMU. When QEMU needs to "open" a file, it will first check if the file descriptor was passed by either of these methods before attempting to actually open the file.
I thought we decided to avoid making some file names magic, and instead go for the obvious /dev/fd/42?
Kevin
I understand that open("/dev/fd/42") would be the same as dup(42), but I'm not sure that I'm entirely clear on how this would work. Could you give an example? -- Regards, Corey

Am 22.05.2012 16:30, schrieb Corey Bryant:
On 05/22/2012 04:18 AM, Kevin Wolf wrote:
Am 21.05.2012 22:19, schrieb Corey Bryant:
libvirt's sVirt security driver provides SELinux MAC isolation for Qemu guest processes and their corresponding image files. In other words, sVirt uses SELinux to prevent a QEMU process from opening files that do not belong to it.
sVirt provides this support by labeling guests and resources with security labels that are stored in file system extended attributes. Some file systems, such as NFS, do not support the extended attribute security namespace, and therefore cannot support sVirt isolation.
A solution to this problem is to provide fd passing support, where libvirt opens files and passes file descriptors to QEMU. This, along with SELinux policy to prevent QEMU from opening files, can provide image file isolation for NFS files.
This patch series adds the -filefd command-line option and the getfd_file monitor command. This will enable libvirt to open a file and push the corresponding filename and file descriptor to QEMU. When QEMU needs to "open" a file, it will first check if the file descriptor was passed by either of these methods before attempting to actually open the file.
I thought we decided to avoid making some file names magic, and instead go for the obvious /dev/fd/42?
I understand that open("/dev/fd/42") would be the same as dup(42), but I'm not sure that I'm entirely clear on how this would work. Could you give an example?
With your approach you open the file outside qemu, pass the fd to qemu along with a file name that it's supposed to replace and then you use that fake file name: (qemu) getfd_file abc (qemu) drive_add 0 file=abc,... Instead you could use the existing getfd command and avoid the translation: (qemu) getfd 42 (qemu) drive_add 0 file=/dev/fd/42,... Er, well. Just that getfd doesn't return the assigned fd today, so the management tool doesn't know it. We would have to add that. Kevin

On 05/22/2012 08:45 AM, Kevin Wolf wrote:
I understand that open("/dev/fd/42") would be the same as dup(42), but I'm not sure that I'm entirely clear on how this would work. Could you give an example?
With your approach you open the file outside qemu, pass the fd to qemu along with a file name that it's supposed to replace and then you use that fake file name:
(qemu) getfd_file abc (qemu) drive_add 0 file=abc,...
Instead you could use the existing getfd command and avoid the translation:
(qemu) getfd 42 (qemu) drive_add 0 file=/dev/fd/42,...
Er, well. Just that getfd doesn't return the assigned fd today, so the management tool doesn't know it. We would have to add that.
That actually sounds workable. As long as management knows _what_ fd qemu recieved (that is, 'getfd' is enhanced to tell libvirt the associated fd number), and as long as qemu makes the magic naming of /dev/fd/ work everywhere (even if it isn't normally part of the host OS), then libvirt could indeed reuse existing file mechanisms to open a file using an fd that it knows qemu should already own, without needing to invent a new 'getfd_file' monitor command. I guess in this instance, libvirt would have to unconditionally use 'closefd' after the command that reused the fd, since using file=/dev/fd/42 dups the fd rather than consuming it (this is different from commands that use fd:nnn to consume an fd). -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

Am 22.05.2012 17:01, schrieb Eric Blake:
On 05/22/2012 08:45 AM, Kevin Wolf wrote:
I understand that open("/dev/fd/42") would be the same as dup(42), but I'm not sure that I'm entirely clear on how this would work. Could you give an example?
With your approach you open the file outside qemu, pass the fd to qemu along with a file name that it's supposed to replace and then you use that fake file name:
(qemu) getfd_file abc (qemu) drive_add 0 file=abc,...
Instead you could use the existing getfd command and avoid the translation:
(qemu) getfd 42 (qemu) drive_add 0 file=/dev/fd/42,...
Er, well. Just that getfd doesn't return the assigned fd today, so the management tool doesn't know it. We would have to add that.
That actually sounds workable. As long as management knows _what_ fd qemu recieved (that is, 'getfd' is enhanced to tell libvirt the associated fd number), and as long as qemu makes the magic naming of /dev/fd/ work everywhere (even if it isn't normally part of the host OS), then libvirt could indeed reuse existing file mechanisms to open a file using an fd that it knows qemu should already own, without needing to invent a new 'getfd_file' monitor command.
Which OSes are you thinking of? I'm not sure if we need to support FD passing on all OSes in all places where you can pass a file name. The specific use case we have in mind is for bypassing a SELinux limitation, so that would be useful only for Linux anyway.
I guess in this instance, libvirt would have to unconditionally use 'closefd' after the command that reused the fd, since using file=/dev/fd/42 dups the fd rather than consuming it (this is different from commands that use fd:nnn to consume an fd).
I actually liked the fd: protocol approach, but Anthony doesn't seem to want it any more. But yes, I think if we use /dev/fd/42 you need to closefd unconditionally. Kevin

On 05/22/2012 10:45 AM, Kevin Wolf wrote:
Am 22.05.2012 16:30, schrieb Corey Bryant:
On 05/22/2012 04:18 AM, Kevin Wolf wrote:
Am 21.05.2012 22:19, schrieb Corey Bryant:
libvirt's sVirt security driver provides SELinux MAC isolation for Qemu guest processes and their corresponding image files. In other words, sVirt uses SELinux to prevent a QEMU process from opening files that do not belong to it.
sVirt provides this support by labeling guests and resources with security labels that are stored in file system extended attributes. Some file systems, such as NFS, do not support the extended attribute security namespace, and therefore cannot support sVirt isolation.
A solution to this problem is to provide fd passing support, where libvirt opens files and passes file descriptors to QEMU. This, along with SELinux policy to prevent QEMU from opening files, can provide image file isolation for NFS files.
This patch series adds the -filefd command-line option and the getfd_file monitor command. This will enable libvirt to open a file and push the corresponding filename and file descriptor to QEMU. When QEMU needs to "open" a file, it will first check if the file descriptor was passed by either of these methods before attempting to actually open the file.
I thought we decided to avoid making some file names magic, and instead go for the obvious /dev/fd/42?
I understand that open("/dev/fd/42") would be the same as dup(42), but I'm not sure that I'm entirely clear on how this would work. Could you give an example?
With your approach you open the file outside qemu, pass the fd to qemu along with a file name that it's supposed to replace and then you use that fake file name:
(qemu) getfd_file abc (qemu) drive_add 0 file=abc,...
Instead you could use the existing getfd command and avoid the translation:
(qemu) getfd 42 (qemu) drive_add 0 file=/dev/fd/42,...
Er, well. Just that getfd doesn't return the assigned fd today, so the management tool doesn't know it. We would have to add that.
Kevin
Thanks for the explanation. This would mean the management app that performs the open(/path/to/my.img) would have to keep a mapping of filenames (/path/to/my.img) to corresponding /dev/fd/X paths, or perhaps just keeping track of the filename and fd is enough. It sounds like this would simplify things in QEMU and get rid of any need for canonicalization of filenames in QEMU. I'm not sure why getfd would have to return the fd though. I'm assuming this would be the fd returned from open("dev/fd/42"). -- Regards, Corey

Am 22.05.2012 17:29, schrieb Corey Bryant:
On 05/22/2012 10:45 AM, Kevin Wolf wrote:
Am 22.05.2012 16:30, schrieb Corey Bryant:
On 05/22/2012 04:18 AM, Kevin Wolf wrote:
Am 21.05.2012 22:19, schrieb Corey Bryant:
libvirt's sVirt security driver provides SELinux MAC isolation for Qemu guest processes and their corresponding image files. In other words, sVirt uses SELinux to prevent a QEMU process from opening files that do not belong to it.
sVirt provides this support by labeling guests and resources with security labels that are stored in file system extended attributes. Some file systems, such as NFS, do not support the extended attribute security namespace, and therefore cannot support sVirt isolation.
A solution to this problem is to provide fd passing support, where libvirt opens files and passes file descriptors to QEMU. This, along with SELinux policy to prevent QEMU from opening files, can provide image file isolation for NFS files.
This patch series adds the -filefd command-line option and the getfd_file monitor command. This will enable libvirt to open a file and push the corresponding filename and file descriptor to QEMU. When QEMU needs to "open" a file, it will first check if the file descriptor was passed by either of these methods before attempting to actually open the file.
I thought we decided to avoid making some file names magic, and instead go for the obvious /dev/fd/42?
I understand that open("/dev/fd/42") would be the same as dup(42), but I'm not sure that I'm entirely clear on how this would work. Could you give an example?
With your approach you open the file outside qemu, pass the fd to qemu along with a file name that it's supposed to replace and then you use that fake file name:
(qemu) getfd_file abc (qemu) drive_add 0 file=abc,...
Instead you could use the existing getfd command and avoid the translation:
(qemu) getfd 42 (qemu) drive_add 0 file=/dev/fd/42,...
Er, well. Just that getfd doesn't return the assigned fd today, so the management tool doesn't know it. We would have to add that.
Thanks for the explanation. This would mean the management app that performs the open(/path/to/my.img) would have to keep a mapping of filenames (/path/to/my.img) to corresponding /dev/fd/X paths, or perhaps just keeping track of the filename and fd is enough. It sounds like this would simplify things in QEMU and get rid of any need for canonicalization of filenames in QEMU.
I don't know the implementation details of libvirt, but I would assume that they don't have to keep a name/fd map and deal with strings, but could just add the fd to some internal object representing a block device of a running domain. I would be surprised if this didn't exist.
I'm not sure why getfd would have to return the fd though. I'm assuming this would be the fd returned from open("dev/fd/42").
It would be the 42. When you pass a file descriptor via getfd, you don't know yet which number it gets assigned in qemu. Kevin

On 05/22/2012 11:39 AM, Kevin Wolf wrote:
Am 22.05.2012 17:29, schrieb Corey Bryant:
On 05/22/2012 10:45 AM, Kevin Wolf wrote:
Am 22.05.2012 16:30, schrieb Corey Bryant:
On 05/22/2012 04:18 AM, Kevin Wolf wrote:
Am 21.05.2012 22:19, schrieb Corey Bryant:
libvirt's sVirt security driver provides SELinux MAC isolation for Qemu guest processes and their corresponding image files. In other words, sVirt uses SELinux to prevent a QEMU process from opening files that do not belong to it.
sVirt provides this support by labeling guests and resources with security labels that are stored in file system extended attributes. Some file systems, such as NFS, do not support the extended attribute security namespace, and therefore cannot support sVirt isolation.
A solution to this problem is to provide fd passing support, where libvirt opens files and passes file descriptors to QEMU. This, along with SELinux policy to prevent QEMU from opening files, can provide image file isolation for NFS files.
This patch series adds the -filefd command-line option and the getfd_file monitor command. This will enable libvirt to open a file and push the corresponding filename and file descriptor to QEMU. When QEMU needs to "open" a file, it will first check if the file descriptor was passed by either of these methods before attempting to actually open the file.
I thought we decided to avoid making some file names magic, and instead go for the obvious /dev/fd/42?
I understand that open("/dev/fd/42") would be the same as dup(42), but I'm not sure that I'm entirely clear on how this would work. Could you give an example?
With your approach you open the file outside qemu, pass the fd to qemu along with a file name that it's supposed to replace and then you use that fake file name:
(qemu) getfd_file abc (qemu) drive_add 0 file=abc,...
Instead you could use the existing getfd command and avoid the translation:
(qemu) getfd 42 (qemu) drive_add 0 file=/dev/fd/42,...
Er, well. Just that getfd doesn't return the assigned fd today, so the management tool doesn't know it. We would have to add that.
Thanks for the explanation. This would mean the management app that performs the open(/path/to/my.img) would have to keep a mapping of filenames (/path/to/my.img) to corresponding /dev/fd/X paths, or perhaps just keeping track of the filename and fd is enough. It sounds like this would simplify things in QEMU and get rid of any need for canonicalization of filenames in QEMU.
I don't know the implementation details of libvirt, but I would assume that they don't have to keep a name/fd map and deal with strings, but could just add the fd to some internal object representing a block device of a running domain. I would be surprised if this didn't exist.
Ok, that's probably the case.
I'm not sure why getfd would have to return the fd though. I'm assuming this would be the fd returned from open("dev/fd/42").
It would be the 42. When you pass a file descriptor via getfd, you don't know yet which number it gets assigned in qemu.
Kevin
Sorry, I must be missing something. Isn't 42 the fd that libvirt got from the open() call? I assume you are talking about returning the fd that QEMU created as a dup. I'm still not seeing the point in returning an fd to libvirt. It seems like QEMU should just be able to dup the fd that it was passed, and close/re-dup it as needed. -- Regards, Corey

On 05/22/2012 09:29 AM, Corey Bryant wrote:
I understand that open("/dev/fd/42") would be the same as dup(42), but I'm not sure that I'm entirely clear on how this would work. Could you give an example?
Instead you could use the existing getfd command and avoid the translation:
(qemu) getfd
Really, this would be: (qemu) getfd name Here, libvirt may be passing in fd 20 (from the livirt process), which qemu then receives as fd 42 (in the qemu process). Libvirt needs to know that qemu sees the file as 42, because a file=/dev/fd/20 (from libvirt's perspective) is wrong; if qemu will be opening /dev/fd, it has to be /dev/fd/42. If you pass the fd's by inheritance at the command line when first exec'ing qemu, then libvirt's fd number _is_ qemu's fd number, so it is only the 'getfd' command that needs to be enhanced to return an fd number.
42 (qemu) drive_add 0 file=/dev/fd/42,...
Er, well. Just that getfd doesn't return the assigned fd today, so the management tool doesn't know it. We would have to add that.
Kevin
Thanks for the explanation. This would mean the management app that performs the open(/path/to/my.img) would have to keep a mapping of filenames (/path/to/my.img) to corresponding /dev/fd/X paths, or perhaps just keeping track of the filename and fd is enough. It sounds like this would simplify things in QEMU and get rid of any need for canonicalization of filenames in QEMU.
Libvirt would just track the int fd returned by 'getfd' as associated with the device it has handed to qemu, and construct a /dev/fd/X path based on that int. Not too difficult.
I'm not sure why getfd would have to return the fd though. I'm assuming this would be the fd returned from open("dev/fd/42").
No. That happens later. That is, when libvirt does 'drive_add 0 file=/dev/fd/42', then qemu does open("/dev/fd/42") and gets a _new_ fd, which is basically the result of dup(42). After the 'drive_add' succeeds, _then_ libvirt follows up with a 'closefd name' that matches the name passed in to the original 'getfd', so that qemu will call close(42) at that point. The added drive continues to use the duplicated fd. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 05/22/2012 12:15 PM, Eric Blake wrote:
On 05/22/2012 09:29 AM, Corey Bryant wrote:
I understand that open("/dev/fd/42") would be the same as dup(42), but I'm not sure that I'm entirely clear on how this would work. Could you give an example?
Instead you could use the existing getfd command and avoid the translation:
(qemu) getfd
Really, this would be:
(qemu) getfd name
Here, libvirt may be passing in fd 20 (from the livirt process), which qemu then receives as fd 42 (in the qemu process). Libvirt needs to know that qemu sees the file as 42, because a file=/dev/fd/20 (from libvirt's perspective) is wrong; if qemu will be opening /dev/fd, it has to be /dev/fd/42.
This clears things up a lot.
If you pass the fd's by inheritance at the command line when first exec'ing qemu, then libvirt's fd number _is_ qemu's fd number, so it is only the 'getfd' command that needs to be enhanced to return an fd number.
Ok
42 (qemu) drive_add 0 file=/dev/fd/42,...
Er, well. Just that getfd doesn't return the assigned fd today, so the management tool doesn't know it. We would have to add that.
Kevin
Thanks for the explanation. This would mean the management app that performs the open(/path/to/my.img) would have to keep a mapping of filenames (/path/to/my.img) to corresponding /dev/fd/X paths, or perhaps just keeping track of the filename and fd is enough. It sounds like this would simplify things in QEMU and get rid of any need for canonicalization of filenames in QEMU.
Libvirt would just track the int fd returned by 'getfd' as associated with the device it has handed to qemu, and construct a /dev/fd/X path based on that int. Not too difficult.
Ok
I'm not sure why getfd would have to return the fd though. I'm assuming this would be the fd returned from open("dev/fd/42").
No. That happens later. That is, when libvirt does 'drive_add 0 file=/dev/fd/42', then qemu does open("/dev/fd/42") and gets a _new_ fd, which is basically the result of dup(42). After the 'drive_add' succeeds, _then_ libvirt follows up with a 'closefd name' that matches the name passed in to the original 'getfd', so that qemu will call close(42) at that point. The added drive continues to use the duplicated fd.
That makes sense. Thanks! -- Regards, Corey
participants (5)
-
Corey Bryant
-
Eric Blake
-
Kevin Wolf
-
Luiz Capitulino
-
Stefan Hajnoczi