On 06/26/2012 11:37 AM, Corey Bryant wrote:
On 06/26/2012 11:03 AM, Daniel P. Berrange wrote:
> On Tue, Jun 26, 2012 at 03:11:40PM +0100, Daniel P. Berrange wrote:
>> On Tue, Jun 26, 2012 at 09:52:51AM -0400, Corey Bryant wrote:
>>>> So now from a client's POV you'd have a flow like
>>>>
>>>> * drive_add "file=/dev/fd/N" FDSET={N}
>>>
>>> IIUC then drive_add would loop and pass each fd in the set via
>>> SCM_RIGHTS?
>>
>> Yes, you'd probably use the JSON to tell QEMU exactly
>> how many FDs you're intending to pass with the command,
>> and then once the command is received, you'd have N*SCM_RIGHTS
>> messages sent/received.
>>
>>>>
>>>> And in QEMU you'd have something like
>>>>
>>>> * handle_monitor_command
>>>> - recvmsg all FDs, and stash them in a thread local
>>>> "FDContext"
>>>> context
>>>> - invoke monitor command handler
>>>> - Sees file=/dev/fd/N
>>>> - Fetch /dev/fd/N from "FDContext"
>>>> - If success remove /dev/fd/N from "FDContext"
>>>> - close() all FDs left in "FDContext"
>>>>
>>>> The key point with this is that because the FDs are directly
>>>> associated with a monitor command, QEMU can /guarantee/ that
>>>> FDs are never leaked, regardless of client behaviour.
>>>
>>> Wouldn't this leak fds if libvirt crashed part way through sending
>>> the set of fds?
>>
>> No, because you've scoped the FDs to the current monitor instance,
>> and the current command being processed you know to close all FDs
>> when the associated command has finished, as well as closing them
>> all if you see EOF on the monitor socket while in the middle of
>> receiving a command.
>
> Here is a quick proof of concept (ie untested) patch to demonstrate
> what I mean. It relies on Cory's patch which converts everything
> to use qemu_open. It is also still valuable to make the change
> to qemu_open() to support "/dev/fd/N" for passing FDs during
> QEMU initial startup for CLI args.
>
> IMHO, what I propose here is preferrable for QMP clients that
> our current plan of requiring use of 3 monitor commands (passfd,
> XXXXX, closefd).
Thanks for the PoC.
Two other required updates that I can think of would be:
1) Update change, block_stream, block_reopen, snapshot_blkdev, and
perhaps other monitor commands to support receiving fd's via SCM_RIGHTS.
Nevermind my comment. I see that your PoC supports passing nfds for any
QMP command.
I'm curious what Kevin's thoughts are on this and the overall approach.
2) Support re-opening files with different access modes (O_RDONLY,
O_WRONLY, or O_RDWR). This would be similar to the force option for
pass-fd.
I'm still not quite sure how we'd go about this. We need a way to
determine the existing QEMU fd that is to be re-associated with the new
fd, when using a /dev/fdlist/0 filename. In this new approach, 0
corresponds to an index, not an fd. The prior approach of using the
/dev/fd/nnn, where nnn corresponded to an actual QEMU fd, made this easy.
>
> From 55a264b647d90a30c1cc00cb1896535246bcd9b7 Mon Sep 17 00:00:00 2001
> From: "Daniel P. Berrange" <berrange(a)redhat.com>
> Date: Tue, 26 Jun 2012 15:48:13 +0100
> Subject: [PATCH] Support passing FDs with the monitor command that
> uses them
>
> Add support for a syntax "/dev/fdlist/N", where 'N' refers to
> the N'th FD that was passed along with the currently processing
> command in this thread.
>
> The QMP client sends
>
> {
> "execute": "drive_add",
> "arguments": {
> "filename": "/dev/fdlist/0",
> "backingfilename": "/dev/fdlist/1" },
> "nfds": 2
> }
>
> followed by "nfds" * SCM_RIGHTS packets
>
> The FDs received along with the "drive_add" command are placed
> in a thread-local list, then the QMP handler function is invoked.
> The qemu_open() function will resolve any path "/dev/fdlist/0"
> to use an FD stored in the thread-local list, and dup() it.
> One the QMP handler function is finished, all FDs in the thread
> local list are closed.
>
> THe reason for choice of "/dev/fdlist/N", is to allow the
> "/dev/fd/N" syntax to be used for FDs that are passed to QEMU
> at startup time, which will be in the global FD number namespace,
> not the thread-local index.
>
> The reason for using a thread local is to ensure the FDs are
> only made available to the currently executing monitor command
> handler in this thread, and not any other threads.
>
> The advantage over using a separate 'passfd' command, is that
> this guarentees all FDs are closed in QEMU once the QMP command
> handler is finished. There is no risk of leaks being triggered
> by the client either intentionally, or via it crashing.
>
> NB: this is a proof of concept demonstration of the overall
> architectural design. The patch would need more work before
> it was suitable for merging
>
> - Pick a better place/file for the fdlist APIs
> - Do proper error reporting in various places
> - Check for possibility that qemu_chr_fe_get_msgfd
> can block or return EAGAIN.
>
> diff --git a/monitor.c b/monitor.c
> index f6107ba..a3a4bd4 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -4490,6 +4490,8 @@ static void handle_qmp_command(JSONMessageParser
> *parser, QList *tokens)
> const mon_cmd_t *cmd;
> const char *cmd_name;
> Monitor *mon = cur_mon;
> + int nfds;
> + size_t i;
>
> args = input = NULL;
>
> @@ -4535,6 +4537,17 @@ static void
> handle_qmp_command(JSONMessageParser *parser, QList *tokens)
> goto err_out;
> }
>
> + nfds = qdict_get_int(input, "nfds");
> + for (i = 0 ; i < nfds ; i++) {
> + int fd = qemu_chr_fe_get_msgfd(mon->chr);
> + if (fd == -1) {
> + qerror_report(QERR_FD_NOT_SUPPLIED);
> + goto err_out;
> + }
> +
> + qemu_fdlist_add(fd);
> + }
> +
> if (handler_is_async(cmd)) {
> err = qmp_async_cmd_handler(mon, cmd, args);
> if (err) {
> @@ -4552,6 +4565,7 @@ err_out:
> out:
> QDECREF(input);
> QDECREF(args);
> + qemu_fdlist_closeall();
> }
>
> /**
> diff --git a/osdep.c b/osdep.c
> index 03817f0..0849cb6 100644
> --- a/osdep.c
> +++ b/osdep.c
> @@ -75,6 +75,81 @@ int qemu_madvise(void *addr, size_t len, int advice)
> #endif
> }
>
> +typedef struct QEMUFDList {
> + int *fds;
> + size_t nfds;
> +} QEMUFDList;
> +
> +static pthread_key_t fdlist_key;
> +
> +static void qemu_fdlist_cleanup(void *data)
> +{
> + size_t i;
> + QEMUFDList *fdlist = data;
> +
> + if (!fdlist)
> + return;
> +
> + for (i = 0 ; i < fdlist->nfds ; i++) {
> + close(fdlist->fds[i]);
> + }
> +}
> +
> +int qemu_fdlist_init(void)
> +{
> + if (pthread_key_create(&fdlist_key,
> + qemu_fdlist_cleanup) < 0)
> + return -1;
> +
> + return 0;
> +}
> +
> +static QEMUFDList *qemu_fdlist_for_thread(void)
> +{
> + QEMUFDList *fdlist = pthread_getspecific(fdlist_key);
> +
> + if (fdlist == NULL) {
> + fdlist = g_new0(QEMUFDList, 1);
> + pthread_setspecific(fdlist_key, fdlist);
> + }
> +
> + return fdlist;
> +}
> +
> +void qemu_fdlist_add(int fd)
> +{
> + QEMUFDList *fdlist = qemu_fdlist_for_thread();
> +
> + fdlist->fds = g_renew(int, fdlist->fds, fdlist->nfds + 1);
> + fdlist->fds[fdlist->nfds++] = fd;
> +}
> +
> +int qemu_fdlist_dup(int idx)
> +{
> + QEMUFDList *fdlist = qemu_fdlist_for_thread();
> +
> + if (idx >= fdlist->nfds) {
> + errno = EINVAL;
> + return -1;
> + }
> +
> + return dup(fdlist->fds[idx]);
> +}
> +
> +void qemu_fdlist_closeall(void)
> +{
> + QEMUFDList *fdlist = qemu_fdlist_for_thread();
> + size_t i;
> +
> + for (i = 0 ; i < fdlist->nfds ; i++) {
> + close(fdlist->fds[i]);
> + }
> + g_free(fdlist->fds);
> + fdlist->fds = NULL;
> + fdlist->nfds = 0;
> +}
> +
> +
>
> /*
> * Opens a file with FD_CLOEXEC set
> @@ -83,6 +158,26 @@ int qemu_open(const char *name, int flags, ...)
> {
> int ret;
> int mode = 0;
> + const char *p;
> +
> + /* Attempt dup of fd for pre-opened file */
> + if (strstart(name, "/dev/fdlist/", &p)) {
> + int idx;
> + int fd;
> +
> + idx = qemu_parse_fd(p);
> + if (idx == -1) {
> + return -1;
> + }
> +
> + fd = qemu_fdlist_dup(idx);
> + if (fd == -1) {
> + return -1;
> + }
> +
> + /* XXX verify flags match */
> + return fd;
> + }
>
> if (flags & O_CREAT) {
> va_list ap;
> diff --git a/qemu-common.h b/qemu-common.h
> index 8f87e41..b6b7203 100644
> --- a/qemu-common.h
> +++ b/qemu-common.h
> @@ -194,6 +194,11 @@ ssize_t qemu_send_full(int fd, const void *buf,
> size_t count, int flags)
> ssize_t qemu_recv_full(int fd, void *buf, size_t count, int flags)
> QEMU_WARN_UNUSED_RESULT;
>
> +int qemu_fdlist_init(void);
> +void qemu_fdlist_add(int fd);
> +int qemu_fdlist_dup(int idx);
> +void qemu_fdlist_closeall(void);
> +
> #ifndef _WIN32
> int qemu_eventfd(int pipefd[2]);
> int qemu_pipe(int pipefd[2]);
> diff --git a/vl.c b/vl.c
> index 1329c30..e515c84 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -2321,6 +2321,11 @@ int main(int argc, char **argv, char **envp)
> QLIST_INIT (&vm_change_state_head);
> os_setup_early_signal_handling();
>
> + if (qemu_fdlist_init() < 0) {
> + perror("fdlist");
> + exit(EXIT_FAILURE);
> + }
> +
> module_call_init(MODULE_INIT_MACHINE);
> machine = find_default_machine();
> cpu_model = NULL;
>
--
Regards,
Corey