[libvirt] [PATCH v4 0/7] file descriptor passing using pass-fd

libvirt's sVirt security driver provides SELinux MAC isolation for Qemu guest processes and their corresponding image files. In other words, sVirt uses SELinux to prevent a QEMU process from opening files that do not belong to it. sVirt provides this support by labeling guests and resources with security labels that are stored in file system extended attributes. Some file systems, such as NFS, do not support the extended attribute security namespace, and therefore cannot support sVirt isolation. A solution to this problem is to provide fd passing support, where libvirt opens files and passes file descriptors to QEMU. This, along with SELinux policy to prevent QEMU from opening files, can provide image file isolation for NFS files stored on the same NFS mount. This patch series adds the pass-fd QMP monitor command, which allows an fd to be passed via SCM_RIGHTS, and returns the received file descriptor. Support is also added to the block layer to allow QEMU to dup the fd when the filename is of the /dev/fd/X format. This is useful if MAC policy prevents QEMU from opening specific types of files. One nice thing about this approach is that no new SELinux policy is required to prevent open of NFS files (files with type nfs_t). The virt_use_nfs boolean type simply needs to be set to false, and open will be prevented (and dup will be allowed). For example: # setsebool virt_use_nfs 0 # getsebool virt_use_nfs virt_use_nfs --> off Corey Bryant (7): qemu-char: Add MSG_CMSG_CLOEXEC flag to recvmsg qapi: Convert getfd and closefd qapi: Add pass-fd QMP command qapi: Re-arrange monitor.c functions block: Prevent /dev/fd/X filename from being detected as floppy block: Convert open calls to qemu_open osdep: Enable qemu_open to dup pre-opened fd block/raw-posix.c | 22 +++++----- block/raw-win32.c | 4 +- block/vdi.c | 5 ++- block/vmdk.c | 21 ++++------ block/vpc.c | 2 +- block/vvfat.c | 21 +++++----- cutils.c | 26 +++++++++--- dump.c | 3 +- hmp-commands.hx | 6 +-- hmp.c | 18 ++++++++ hmp.h | 2 + main-loop.c | 6 +-- migration-fd.c | 2 +- monitor.c | 120 ++++++++++++++++++++++++++++++++--------------------- monitor.h | 2 +- net.c | 6 ++- osdep.c | 91 ++++++++++++++++++++++++++++++++++++++++ qapi-schema.json | 71 +++++++++++++++++++++++++++++++ qemu-char.c | 2 +- qemu-common.h | 2 +- qmp-commands.hx | 56 ++++++++++++++++++++++--- 21 files changed, 378 insertions(+), 110 deletions(-) -- 1.7.10.2

This sets the close-on-exec flag for the file descriptor received via SCM_RIGHTS. Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com> --- v4 -This patch is new in v4 (eblake@redhat.com) qemu-char.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/qemu-char.c b/qemu-char.c index c2aaaee..f890113 100644 --- a/qemu-char.c +++ b/qemu-char.c @@ -2263,7 +2263,7 @@ static ssize_t tcp_chr_recv(CharDriverState *chr, char *buf, size_t len) msg.msg_control = &msg_control; msg.msg_controllen = sizeof(msg_control); - ret = recvmsg(s->fd, &msg, 0); + ret = recvmsg(s->fd, &msg, MSG_CMSG_CLOEXEC); if (ret > 0 && s->is_unix) unix_process_msgfd(chr, &msg); -- 1.7.10.2

On 06/22/2012 12:36 PM, Corey Bryant wrote:
This sets the close-on-exec flag for the file descriptor received via SCM_RIGHTS.
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com> --- v4 -This patch is new in v4 (eblake@redhat.com)
qemu-char.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/qemu-char.c b/qemu-char.c index c2aaaee..f890113 100644 --- a/qemu-char.c +++ b/qemu-char.c @@ -2263,7 +2263,7 @@ static ssize_t tcp_chr_recv(CharDriverState *chr, char *buf, size_t len) msg.msg_control = &msg_control; msg.msg_controllen = sizeof(msg_control);
- ret = recvmsg(s->fd, &msg, 0); + ret = recvmsg(s->fd, &msg, MSG_CMSG_CLOEXEC);
MSG_CMSG_CLOEXEC is not (yet) in POSIX (although it has been proposed for addition); therefore, at the moment, it only exists on Linux and Cygwin. Does this need to have conditional code to allow compilation on BSD, such as: #ifndef MSG_CMSG_CLOEXEC # define MSG_CMSG_CLOEXEC 0 #endif as well as fallback code that sets FD_CLOEXEC manually via fcntl() when MSG_CMSG_CLOEXEC is missing? -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Fri, Jun 22, 2012 at 01:31:17PM -0600, Eric Blake wrote:
On 06/22/2012 12:36 PM, Corey Bryant wrote:
This sets the close-on-exec flag for the file descriptor received via SCM_RIGHTS.
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com> --- v4 -This patch is new in v4 (eblake@redhat.com)
qemu-char.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/qemu-char.c b/qemu-char.c index c2aaaee..f890113 100644 --- a/qemu-char.c +++ b/qemu-char.c @@ -2263,7 +2263,7 @@ static ssize_t tcp_chr_recv(CharDriverState *chr, char *buf, size_t len) msg.msg_control = &msg_control; msg.msg_controllen = sizeof(msg_control);
- ret = recvmsg(s->fd, &msg, 0); + ret = recvmsg(s->fd, &msg, MSG_CMSG_CLOEXEC);
MSG_CMSG_CLOEXEC is not (yet) in POSIX (although it has been proposed for addition); therefore, at the moment, it only exists on Linux and Cygwin. Does this need to have conditional code to allow compilation on BSD, such as:
#ifndef MSG_CMSG_CLOEXEC # define MSG_CMSG_CLOEXEC 0 #endif
as well as fallback code that sets FD_CLOEXEC manually via fcntl() when MSG_CMSG_CLOEXEC is missing?
Good point. I think the answer is yes. Just like qemu_open() we can wrap "recvmsg(2) with fd" so that platforms with MSG_CMSG_CLOEXEC use that flag and other platforms use qemu_set_cloexec(). Stefan

On 06/25/2012 05:16 AM, Stefan Hajnoczi wrote:
On Fri, Jun 22, 2012 at 01:31:17PM -0600, Eric Blake wrote:
On 06/22/2012 12:36 PM, Corey Bryant wrote:
This sets the close-on-exec flag for the file descriptor received via SCM_RIGHTS.
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com> --- v4 -This patch is new in v4 (eblake@redhat.com)
qemu-char.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/qemu-char.c b/qemu-char.c index c2aaaee..f890113 100644 --- a/qemu-char.c +++ b/qemu-char.c @@ -2263,7 +2263,7 @@ static ssize_t tcp_chr_recv(CharDriverState *chr, char *buf, size_t len) msg.msg_control = &msg_control; msg.msg_controllen = sizeof(msg_control);
- ret = recvmsg(s->fd, &msg, 0); + ret = recvmsg(s->fd, &msg, MSG_CMSG_CLOEXEC);
MSG_CMSG_CLOEXEC is not (yet) in POSIX (although it has been proposed for addition); therefore, at the moment, it only exists on Linux and Cygwin. Does this need to have conditional code to allow compilation on BSD, such as:
#ifndef MSG_CMSG_CLOEXEC # define MSG_CMSG_CLOEXEC 0 #endif
as well as fallback code that sets FD_CLOEXEC manually via fcntl() when MSG_CMSG_CLOEXEC is missing?
Good point. I think the answer is yes. Just like qemu_open() we can wrap "recvmsg(2) with fd" so that platforms with MSG_CMSG_CLOEXEC use that flag and other platforms use qemu_set_cloexec().
Stefan
Thanks for pointing this out. I'll make the updates in v5. -- Regards, Corey

Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com> --- v2: -Convert getfd and closefd to QAPI (lcapitulino@redhat.com) -Remove changes that returned fd from getfd (lcapitulino@redhat.com) -Wrap hmp_* functions around qmp_* functions (kwolf@redhat.com) -Move hmp_* functions to hmp.c (lcapitulino@redhat.com) -Drop .user_print lines (lcapitulino@redhat.com) -Use 'cmd' instead of 'cmd_new' for HMP (lcapitulino@redhat.com) -Change QMP command existance back to 0.14 (lcapitulino@redhat.com) v3: -Remove unnecessary return statements from qmp_* functions (lcapitulino@redhat.com) -Add notes to QMP command describing behavior in more detail (lcapitulino@redhat.com, eblake@redhat.com) v4: -No changes hmp-commands.hx | 6 ++---- hmp.c | 18 ++++++++++++++++++ hmp.h | 2 ++ monitor.c | 32 ++++++++++++++------------------ qapi-schema.json | 35 +++++++++++++++++++++++++++++++++++ qmp-commands.hx | 14 ++++++++++---- 6 files changed, 81 insertions(+), 26 deletions(-) diff --git a/hmp-commands.hx b/hmp-commands.hx index f5d9d91..eea8b32 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1236,8 +1236,7 @@ ETEXI .args_type = "fdname:s", .params = "getfd name", .help = "receive a file descriptor via SCM rights and assign it a name", - .user_print = monitor_user_noop, - .mhandler.cmd_new = do_getfd, + .mhandler.cmd = hmp_getfd, }, STEXI @@ -1253,8 +1252,7 @@ ETEXI .args_type = "fdname:s", .params = "closefd name", .help = "close a file descriptor previously passed via SCM rights", - .user_print = monitor_user_noop, - .mhandler.cmd_new = do_closefd, + .mhandler.cmd = hmp_closefd, }, STEXI diff --git a/hmp.c b/hmp.c index 2ce8cb9..6241856 100644 --- a/hmp.c +++ b/hmp.c @@ -999,3 +999,21 @@ void hmp_netdev_del(Monitor *mon, const QDict *qdict) qmp_netdev_del(id, &err); hmp_handle_error(mon, &err); } + +void hmp_getfd(Monitor *mon, const QDict *qdict) +{ + const char *fdname = qdict_get_str(qdict, "fdname"); + Error *errp = NULL; + + qmp_getfd(fdname, &errp); + hmp_handle_error(mon, &errp); +} + +void hmp_closefd(Monitor *mon, const QDict *qdict) +{ + const char *fdname = qdict_get_str(qdict, "fdname"); + Error *errp = NULL; + + qmp_closefd(fdname, &errp); + hmp_handle_error(mon, &errp); +} diff --git a/hmp.h b/hmp.h index 79d138d..8d2b0d7 100644 --- a/hmp.h +++ b/hmp.h @@ -64,5 +64,7 @@ void hmp_device_del(Monitor *mon, const QDict *qdict); void hmp_dump_guest_memory(Monitor *mon, const QDict *qdict); void hmp_netdev_add(Monitor *mon, const QDict *qdict); void hmp_netdev_del(Monitor *mon, const QDict *qdict); +void hmp_getfd(Monitor *mon, const QDict *qdict); +void hmp_closefd(Monitor *mon, const QDict *qdict); #endif diff --git a/monitor.c b/monitor.c index a3bc2c7..1a7f7e7 100644 --- a/monitor.c +++ b/monitor.c @@ -2182,48 +2182,45 @@ static void do_inject_mce(Monitor *mon, const QDict *qdict) } #endif -static int do_getfd(Monitor *mon, const QDict *qdict, QObject **ret_data) +void qmp_getfd(const char *fdname, Error **errp) { - const char *fdname = qdict_get_str(qdict, "fdname"); mon_fd_t *monfd; int fd; - fd = qemu_chr_fe_get_msgfd(mon->chr); + fd = qemu_chr_fe_get_msgfd(cur_mon->chr); if (fd == -1) { - qerror_report(QERR_FD_NOT_SUPPLIED); - return -1; + error_set(errp, QERR_FD_NOT_SUPPLIED); + return; } if (qemu_isdigit(fdname[0])) { - qerror_report(QERR_INVALID_PARAMETER_VALUE, "fdname", - "a name not starting with a digit"); - return -1; + error_set(errp, QERR_INVALID_PARAMETER_VALUE, "fdname", + "a name not starting with a digit"); + return; } - QLIST_FOREACH(monfd, &mon->fds, next) { + QLIST_FOREACH(monfd, &cur_mon->fds, next) { if (strcmp(monfd->name, fdname) != 0) { continue; } close(monfd->fd); monfd->fd = fd; - return 0; + return; } monfd = g_malloc0(sizeof(mon_fd_t)); monfd->name = g_strdup(fdname); monfd->fd = fd; - QLIST_INSERT_HEAD(&mon->fds, monfd, next); - return 0; + QLIST_INSERT_HEAD(&cur_mon->fds, monfd, next); } -static int do_closefd(Monitor *mon, const QDict *qdict, QObject **ret_data) +void qmp_closefd(const char *fdname, Error **errp) { - const char *fdname = qdict_get_str(qdict, "fdname"); mon_fd_t *monfd; - QLIST_FOREACH(monfd, &mon->fds, next) { + QLIST_FOREACH(monfd, &cur_mon->fds, next) { if (strcmp(monfd->name, fdname) != 0) { continue; } @@ -2232,11 +2229,10 @@ static int do_closefd(Monitor *mon, const QDict *qdict, QObject **ret_data) close(monfd->fd); g_free(monfd->name); g_free(monfd); - return 0; + return; } - qerror_report(QERR_FD_NOT_FOUND, fdname); - return -1; + error_set(errp, QERR_FD_NOT_FOUND, fdname); } static void do_loadvm(Monitor *mon, const QDict *qdict) diff --git a/qapi-schema.json b/qapi-schema.json index 3b6e346..26a6b84 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -1862,3 +1862,38 @@ # Since: 0.14.0 ## { 'command': 'netdev_del', 'data': {'id': 'str'} } + +## +# @getfd: +# +# Receive a file descriptor via SCM rights and assign it a name +# +# @fdname: file descriptor name +# +# Returns: Nothing on success +# If file descriptor was not received, FdNotSupplied +# If @fdname is not valid, InvalidParameterType +# +# Since: 0.14.0 +# +# Notes: If @fdname already exists, the file descriptor assigned to +# it will be closed and replaced by the received file +# descriptor. +# The 'closefd' command can be used to explicitly close the +# file descriptor when it is no longer needed. +## +{ 'command': 'getfd', 'data': {'fdname': 'str'} } + +## +# @closefd: +# +# Close a file descriptor previously passed via SCM rights +# +# @fdname: file descriptor name +# +# Returns: Nothing on success +# If @fdname is not found, FdNotFound +# +# Since: 0.14.0 +## +{ 'command': 'closefd', 'data': {'fdname': 'str'} } diff --git a/qmp-commands.hx b/qmp-commands.hx index 2e1a38e..e3cf3c5 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -873,8 +873,7 @@ EQMP .args_type = "fdname:s", .params = "getfd name", .help = "receive a file descriptor via SCM rights and assign it a name", - .user_print = monitor_user_noop, - .mhandler.cmd_new = do_getfd, + .mhandler.cmd_new = qmp_marshal_input_getfd, }, SQMP @@ -892,6 +891,14 @@ Example: -> { "execute": "getfd", "arguments": { "fdname": "fd1" } } <- { "return": {} } +Notes: + +(1) If the name specified by the "fdname" argument already exists, + the file descriptor assigned to it will be closed and replaced + by the received file descriptor. +(2) The 'closefd' command can be used to explicitly close the file + descriptor when it is no longer needed. + EQMP { @@ -899,8 +906,7 @@ EQMP .args_type = "fdname:s", .params = "closefd name", .help = "close a file descriptor previously passed via SCM rights", - .user_print = monitor_user_noop, - .mhandler.cmd_new = do_closefd, + .mhandler.cmd_new = qmp_marshal_input_closefd, }, SQMP -- 1.7.10.2

On Fri, 22 Jun 2012 14:36:09 -0400 Corey Bryant <coreyb@linux.vnet.ibm.com> wrote:
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
If this patch doesn't change anymore you can add: Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
--- v2: -Convert getfd and closefd to QAPI (lcapitulino@redhat.com) -Remove changes that returned fd from getfd (lcapitulino@redhat.com) -Wrap hmp_* functions around qmp_* functions (kwolf@redhat.com) -Move hmp_* functions to hmp.c (lcapitulino@redhat.com) -Drop .user_print lines (lcapitulino@redhat.com) -Use 'cmd' instead of 'cmd_new' for HMP (lcapitulino@redhat.com) -Change QMP command existance back to 0.14 (lcapitulino@redhat.com)
v3: -Remove unnecessary return statements from qmp_* functions (lcapitulino@redhat.com) -Add notes to QMP command describing behavior in more detail (lcapitulino@redhat.com, eblake@redhat.com)
v4: -No changes
hmp-commands.hx | 6 ++---- hmp.c | 18 ++++++++++++++++++ hmp.h | 2 ++ monitor.c | 32 ++++++++++++++------------------ qapi-schema.json | 35 +++++++++++++++++++++++++++++++++++ qmp-commands.hx | 14 ++++++++++---- 6 files changed, 81 insertions(+), 26 deletions(-)
diff --git a/hmp-commands.hx b/hmp-commands.hx index f5d9d91..eea8b32 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1236,8 +1236,7 @@ ETEXI .args_type = "fdname:s", .params = "getfd name", .help = "receive a file descriptor via SCM rights and assign it a name", - .user_print = monitor_user_noop, - .mhandler.cmd_new = do_getfd, + .mhandler.cmd = hmp_getfd, },
STEXI @@ -1253,8 +1252,7 @@ ETEXI .args_type = "fdname:s", .params = "closefd name", .help = "close a file descriptor previously passed via SCM rights", - .user_print = monitor_user_noop, - .mhandler.cmd_new = do_closefd, + .mhandler.cmd = hmp_closefd, },
STEXI diff --git a/hmp.c b/hmp.c index 2ce8cb9..6241856 100644 --- a/hmp.c +++ b/hmp.c @@ -999,3 +999,21 @@ void hmp_netdev_del(Monitor *mon, const QDict *qdict) qmp_netdev_del(id, &err); hmp_handle_error(mon, &err); } + +void hmp_getfd(Monitor *mon, const QDict *qdict) +{ + const char *fdname = qdict_get_str(qdict, "fdname"); + Error *errp = NULL; + + qmp_getfd(fdname, &errp); + hmp_handle_error(mon, &errp); +} + +void hmp_closefd(Monitor *mon, const QDict *qdict) +{ + const char *fdname = qdict_get_str(qdict, "fdname"); + Error *errp = NULL; + + qmp_closefd(fdname, &errp); + hmp_handle_error(mon, &errp); +} diff --git a/hmp.h b/hmp.h index 79d138d..8d2b0d7 100644 --- a/hmp.h +++ b/hmp.h @@ -64,5 +64,7 @@ void hmp_device_del(Monitor *mon, const QDict *qdict); void hmp_dump_guest_memory(Monitor *mon, const QDict *qdict); void hmp_netdev_add(Monitor *mon, const QDict *qdict); void hmp_netdev_del(Monitor *mon, const QDict *qdict); +void hmp_getfd(Monitor *mon, const QDict *qdict); +void hmp_closefd(Monitor *mon, const QDict *qdict);
#endif diff --git a/monitor.c b/monitor.c index a3bc2c7..1a7f7e7 100644 --- a/monitor.c +++ b/monitor.c @@ -2182,48 +2182,45 @@ static void do_inject_mce(Monitor *mon, const QDict *qdict) } #endif
-static int do_getfd(Monitor *mon, const QDict *qdict, QObject **ret_data) +void qmp_getfd(const char *fdname, Error **errp) { - const char *fdname = qdict_get_str(qdict, "fdname"); mon_fd_t *monfd; int fd;
- fd = qemu_chr_fe_get_msgfd(mon->chr); + fd = qemu_chr_fe_get_msgfd(cur_mon->chr); if (fd == -1) { - qerror_report(QERR_FD_NOT_SUPPLIED); - return -1; + error_set(errp, QERR_FD_NOT_SUPPLIED); + return; }
if (qemu_isdigit(fdname[0])) { - qerror_report(QERR_INVALID_PARAMETER_VALUE, "fdname", - "a name not starting with a digit"); - return -1; + error_set(errp, QERR_INVALID_PARAMETER_VALUE, "fdname", + "a name not starting with a digit"); + return; }
- QLIST_FOREACH(monfd, &mon->fds, next) { + QLIST_FOREACH(monfd, &cur_mon->fds, next) { if (strcmp(monfd->name, fdname) != 0) { continue; }
close(monfd->fd); monfd->fd = fd; - return 0; + return; }
monfd = g_malloc0(sizeof(mon_fd_t)); monfd->name = g_strdup(fdname); monfd->fd = fd;
- QLIST_INSERT_HEAD(&mon->fds, monfd, next); - return 0; + QLIST_INSERT_HEAD(&cur_mon->fds, monfd, next); }
-static int do_closefd(Monitor *mon, const QDict *qdict, QObject **ret_data) +void qmp_closefd(const char *fdname, Error **errp) { - const char *fdname = qdict_get_str(qdict, "fdname"); mon_fd_t *monfd;
- QLIST_FOREACH(monfd, &mon->fds, next) { + QLIST_FOREACH(monfd, &cur_mon->fds, next) { if (strcmp(monfd->name, fdname) != 0) { continue; } @@ -2232,11 +2229,10 @@ static int do_closefd(Monitor *mon, const QDict *qdict, QObject **ret_data) close(monfd->fd); g_free(monfd->name); g_free(monfd); - return 0; + return; }
- qerror_report(QERR_FD_NOT_FOUND, fdname); - return -1; + error_set(errp, QERR_FD_NOT_FOUND, fdname); }
static void do_loadvm(Monitor *mon, const QDict *qdict) diff --git a/qapi-schema.json b/qapi-schema.json index 3b6e346..26a6b84 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -1862,3 +1862,38 @@ # Since: 0.14.0 ## { 'command': 'netdev_del', 'data': {'id': 'str'} } + +## +# @getfd: +# +# Receive a file descriptor via SCM rights and assign it a name +# +# @fdname: file descriptor name +# +# Returns: Nothing on success +# If file descriptor was not received, FdNotSupplied +# If @fdname is not valid, InvalidParameterType +# +# Since: 0.14.0 +# +# Notes: If @fdname already exists, the file descriptor assigned to +# it will be closed and replaced by the received file +# descriptor. +# The 'closefd' command can be used to explicitly close the +# file descriptor when it is no longer needed. +## +{ 'command': 'getfd', 'data': {'fdname': 'str'} } + +## +# @closefd: +# +# Close a file descriptor previously passed via SCM rights +# +# @fdname: file descriptor name +# +# Returns: Nothing on success +# If @fdname is not found, FdNotFound +# +# Since: 0.14.0 +## +{ 'command': 'closefd', 'data': {'fdname': 'str'} } diff --git a/qmp-commands.hx b/qmp-commands.hx index 2e1a38e..e3cf3c5 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -873,8 +873,7 @@ EQMP .args_type = "fdname:s", .params = "getfd name", .help = "receive a file descriptor via SCM rights and assign it a name", - .user_print = monitor_user_noop, - .mhandler.cmd_new = do_getfd, + .mhandler.cmd_new = qmp_marshal_input_getfd, },
SQMP @@ -892,6 +891,14 @@ Example: -> { "execute": "getfd", "arguments": { "fdname": "fd1" } } <- { "return": {} }
+Notes: + +(1) If the name specified by the "fdname" argument already exists, + the file descriptor assigned to it will be closed and replaced + by the received file descriptor. +(2) The 'closefd' command can be used to explicitly close the file + descriptor when it is no longer needed. + EQMP
{ @@ -899,8 +906,7 @@ EQMP .args_type = "fdname:s", .params = "closefd name", .help = "close a file descriptor previously passed via SCM rights", - .user_print = monitor_user_noop, - .mhandler.cmd_new = do_closefd, + .mhandler.cmd_new = qmp_marshal_input_closefd, },
SQMP

On 06/26/2012 04:44 PM, Luiz Capitulino wrote:
On Fri, 22 Jun 2012 14:36:09 -0400 Corey Bryant <coreyb@linux.vnet.ibm.com> wrote:
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
If this patch doesn't change anymore you can add:
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Thanks! I'll do that. I'll assume we still want this patch even if pass-fd support is dropped.
--- v2: -Convert getfd and closefd to QAPI (lcapitulino@redhat.com) -Remove changes that returned fd from getfd (lcapitulino@redhat.com) -Wrap hmp_* functions around qmp_* functions (kwolf@redhat.com) -Move hmp_* functions to hmp.c (lcapitulino@redhat.com) -Drop .user_print lines (lcapitulino@redhat.com) -Use 'cmd' instead of 'cmd_new' for HMP (lcapitulino@redhat.com) -Change QMP command existance back to 0.14 (lcapitulino@redhat.com)
v3: -Remove unnecessary return statements from qmp_* functions (lcapitulino@redhat.com) -Add notes to QMP command describing behavior in more detail (lcapitulino@redhat.com, eblake@redhat.com)
v4: -No changes
hmp-commands.hx | 6 ++---- hmp.c | 18 ++++++++++++++++++ hmp.h | 2 ++ monitor.c | 32 ++++++++++++++------------------ qapi-schema.json | 35 +++++++++++++++++++++++++++++++++++ qmp-commands.hx | 14 ++++++++++---- 6 files changed, 81 insertions(+), 26 deletions(-)
diff --git a/hmp-commands.hx b/hmp-commands.hx index f5d9d91..eea8b32 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1236,8 +1236,7 @@ ETEXI .args_type = "fdname:s", .params = "getfd name", .help = "receive a file descriptor via SCM rights and assign it a name", - .user_print = monitor_user_noop, - .mhandler.cmd_new = do_getfd, + .mhandler.cmd = hmp_getfd, },
STEXI @@ -1253,8 +1252,7 @@ ETEXI .args_type = "fdname:s", .params = "closefd name", .help = "close a file descriptor previously passed via SCM rights", - .user_print = monitor_user_noop, - .mhandler.cmd_new = do_closefd, + .mhandler.cmd = hmp_closefd, },
STEXI diff --git a/hmp.c b/hmp.c index 2ce8cb9..6241856 100644 --- a/hmp.c +++ b/hmp.c @@ -999,3 +999,21 @@ void hmp_netdev_del(Monitor *mon, const QDict *qdict) qmp_netdev_del(id, &err); hmp_handle_error(mon, &err); } + +void hmp_getfd(Monitor *mon, const QDict *qdict) +{ + const char *fdname = qdict_get_str(qdict, "fdname"); + Error *errp = NULL; + + qmp_getfd(fdname, &errp); + hmp_handle_error(mon, &errp); +} + +void hmp_closefd(Monitor *mon, const QDict *qdict) +{ + const char *fdname = qdict_get_str(qdict, "fdname"); + Error *errp = NULL; + + qmp_closefd(fdname, &errp); + hmp_handle_error(mon, &errp); +} diff --git a/hmp.h b/hmp.h index 79d138d..8d2b0d7 100644 --- a/hmp.h +++ b/hmp.h @@ -64,5 +64,7 @@ void hmp_device_del(Monitor *mon, const QDict *qdict); void hmp_dump_guest_memory(Monitor *mon, const QDict *qdict); void hmp_netdev_add(Monitor *mon, const QDict *qdict); void hmp_netdev_del(Monitor *mon, const QDict *qdict); +void hmp_getfd(Monitor *mon, const QDict *qdict); +void hmp_closefd(Monitor *mon, const QDict *qdict);
#endif diff --git a/monitor.c b/monitor.c index a3bc2c7..1a7f7e7 100644 --- a/monitor.c +++ b/monitor.c @@ -2182,48 +2182,45 @@ static void do_inject_mce(Monitor *mon, const QDict *qdict) } #endif
-static int do_getfd(Monitor *mon, const QDict *qdict, QObject **ret_data) +void qmp_getfd(const char *fdname, Error **errp) { - const char *fdname = qdict_get_str(qdict, "fdname"); mon_fd_t *monfd; int fd;
- fd = qemu_chr_fe_get_msgfd(mon->chr); + fd = qemu_chr_fe_get_msgfd(cur_mon->chr); if (fd == -1) { - qerror_report(QERR_FD_NOT_SUPPLIED); - return -1; + error_set(errp, QERR_FD_NOT_SUPPLIED); + return; }
if (qemu_isdigit(fdname[0])) { - qerror_report(QERR_INVALID_PARAMETER_VALUE, "fdname", - "a name not starting with a digit"); - return -1; + error_set(errp, QERR_INVALID_PARAMETER_VALUE, "fdname", + "a name not starting with a digit"); + return; }
- QLIST_FOREACH(monfd, &mon->fds, next) { + QLIST_FOREACH(monfd, &cur_mon->fds, next) { if (strcmp(monfd->name, fdname) != 0) { continue; }
close(monfd->fd); monfd->fd = fd; - return 0; + return; }
monfd = g_malloc0(sizeof(mon_fd_t)); monfd->name = g_strdup(fdname); monfd->fd = fd;
- QLIST_INSERT_HEAD(&mon->fds, monfd, next); - return 0; + QLIST_INSERT_HEAD(&cur_mon->fds, monfd, next); }
-static int do_closefd(Monitor *mon, const QDict *qdict, QObject **ret_data) +void qmp_closefd(const char *fdname, Error **errp) { - const char *fdname = qdict_get_str(qdict, "fdname"); mon_fd_t *monfd;
- QLIST_FOREACH(monfd, &mon->fds, next) { + QLIST_FOREACH(monfd, &cur_mon->fds, next) { if (strcmp(monfd->name, fdname) != 0) { continue; } @@ -2232,11 +2229,10 @@ static int do_closefd(Monitor *mon, const QDict *qdict, QObject **ret_data) close(monfd->fd); g_free(monfd->name); g_free(monfd); - return 0; + return; }
- qerror_report(QERR_FD_NOT_FOUND, fdname); - return -1; + error_set(errp, QERR_FD_NOT_FOUND, fdname); }
static void do_loadvm(Monitor *mon, const QDict *qdict) diff --git a/qapi-schema.json b/qapi-schema.json index 3b6e346..26a6b84 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -1862,3 +1862,38 @@ # Since: 0.14.0 ## { 'command': 'netdev_del', 'data': {'id': 'str'} } + +## +# @getfd: +# +# Receive a file descriptor via SCM rights and assign it a name +# +# @fdname: file descriptor name +# +# Returns: Nothing on success +# If file descriptor was not received, FdNotSupplied +# If @fdname is not valid, InvalidParameterType +# +# Since: 0.14.0 +# +# Notes: If @fdname already exists, the file descriptor assigned to +# it will be closed and replaced by the received file +# descriptor. +# The 'closefd' command can be used to explicitly close the +# file descriptor when it is no longer needed. +## +{ 'command': 'getfd', 'data': {'fdname': 'str'} } + +## +# @closefd: +# +# Close a file descriptor previously passed via SCM rights +# +# @fdname: file descriptor name +# +# Returns: Nothing on success +# If @fdname is not found, FdNotFound +# +# Since: 0.14.0 +## +{ 'command': 'closefd', 'data': {'fdname': 'str'} } diff --git a/qmp-commands.hx b/qmp-commands.hx index 2e1a38e..e3cf3c5 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -873,8 +873,7 @@ EQMP .args_type = "fdname:s", .params = "getfd name", .help = "receive a file descriptor via SCM rights and assign it a name", - .user_print = monitor_user_noop, - .mhandler.cmd_new = do_getfd, + .mhandler.cmd_new = qmp_marshal_input_getfd, },
SQMP @@ -892,6 +891,14 @@ Example: -> { "execute": "getfd", "arguments": { "fdname": "fd1" } } <- { "return": {} }
+Notes: + +(1) If the name specified by the "fdname" argument already exists, + the file descriptor assigned to it will be closed and replaced + by the received file descriptor. +(2) The 'closefd' command can be used to explicitly close the file + descriptor when it is no longer needed. + EQMP
{ @@ -899,8 +906,7 @@ EQMP .args_type = "fdname:s", .params = "closefd name", .help = "close a file descriptor previously passed via SCM rights", - .user_print = monitor_user_noop, - .mhandler.cmd_new = do_closefd, + .mhandler.cmd_new = qmp_marshal_input_closefd, },
SQMP
-- Regards, Corey

On Tue, 26 Jun 2012 17:15:05 -0400 Corey Bryant <coreyb@linux.vnet.ibm.com> wrote:
On 06/26/2012 04:44 PM, Luiz Capitulino wrote:
On Fri, 22 Jun 2012 14:36:09 -0400 Corey Bryant <coreyb@linux.vnet.ibm.com> wrote:
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
If this patch doesn't change anymore you can add:
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Thanks! I'll do that. I'll assume we still want this patch even if pass-fd support is dropped.
Yes. Actually, I could cherry pick it but I believe it will cause conflicts to you if you don't include it?
--- v2: -Convert getfd and closefd to QAPI (lcapitulino@redhat.com) -Remove changes that returned fd from getfd (lcapitulino@redhat.com) -Wrap hmp_* functions around qmp_* functions (kwolf@redhat.com) -Move hmp_* functions to hmp.c (lcapitulino@redhat.com) -Drop .user_print lines (lcapitulino@redhat.com) -Use 'cmd' instead of 'cmd_new' for HMP (lcapitulino@redhat.com) -Change QMP command existance back to 0.14 (lcapitulino@redhat.com)
v3: -Remove unnecessary return statements from qmp_* functions (lcapitulino@redhat.com) -Add notes to QMP command describing behavior in more detail (lcapitulino@redhat.com, eblake@redhat.com)
v4: -No changes
hmp-commands.hx | 6 ++---- hmp.c | 18 ++++++++++++++++++ hmp.h | 2 ++ monitor.c | 32 ++++++++++++++------------------ qapi-schema.json | 35 +++++++++++++++++++++++++++++++++++ qmp-commands.hx | 14 ++++++++++---- 6 files changed, 81 insertions(+), 26 deletions(-)
diff --git a/hmp-commands.hx b/hmp-commands.hx index f5d9d91..eea8b32 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1236,8 +1236,7 @@ ETEXI .args_type = "fdname:s", .params = "getfd name", .help = "receive a file descriptor via SCM rights and assign it a name", - .user_print = monitor_user_noop, - .mhandler.cmd_new = do_getfd, + .mhandler.cmd = hmp_getfd, },
STEXI @@ -1253,8 +1252,7 @@ ETEXI .args_type = "fdname:s", .params = "closefd name", .help = "close a file descriptor previously passed via SCM rights", - .user_print = monitor_user_noop, - .mhandler.cmd_new = do_closefd, + .mhandler.cmd = hmp_closefd, },
STEXI diff --git a/hmp.c b/hmp.c index 2ce8cb9..6241856 100644 --- a/hmp.c +++ b/hmp.c @@ -999,3 +999,21 @@ void hmp_netdev_del(Monitor *mon, const QDict *qdict) qmp_netdev_del(id, &err); hmp_handle_error(mon, &err); } + +void hmp_getfd(Monitor *mon, const QDict *qdict) +{ + const char *fdname = qdict_get_str(qdict, "fdname"); + Error *errp = NULL; + + qmp_getfd(fdname, &errp); + hmp_handle_error(mon, &errp); +} + +void hmp_closefd(Monitor *mon, const QDict *qdict) +{ + const char *fdname = qdict_get_str(qdict, "fdname"); + Error *errp = NULL; + + qmp_closefd(fdname, &errp); + hmp_handle_error(mon, &errp); +} diff --git a/hmp.h b/hmp.h index 79d138d..8d2b0d7 100644 --- a/hmp.h +++ b/hmp.h @@ -64,5 +64,7 @@ void hmp_device_del(Monitor *mon, const QDict *qdict); void hmp_dump_guest_memory(Monitor *mon, const QDict *qdict); void hmp_netdev_add(Monitor *mon, const QDict *qdict); void hmp_netdev_del(Monitor *mon, const QDict *qdict); +void hmp_getfd(Monitor *mon, const QDict *qdict); +void hmp_closefd(Monitor *mon, const QDict *qdict);
#endif diff --git a/monitor.c b/monitor.c index a3bc2c7..1a7f7e7 100644 --- a/monitor.c +++ b/monitor.c @@ -2182,48 +2182,45 @@ static void do_inject_mce(Monitor *mon, const QDict *qdict) } #endif
-static int do_getfd(Monitor *mon, const QDict *qdict, QObject **ret_data) +void qmp_getfd(const char *fdname, Error **errp) { - const char *fdname = qdict_get_str(qdict, "fdname"); mon_fd_t *monfd; int fd;
- fd = qemu_chr_fe_get_msgfd(mon->chr); + fd = qemu_chr_fe_get_msgfd(cur_mon->chr); if (fd == -1) { - qerror_report(QERR_FD_NOT_SUPPLIED); - return -1; + error_set(errp, QERR_FD_NOT_SUPPLIED); + return; }
if (qemu_isdigit(fdname[0])) { - qerror_report(QERR_INVALID_PARAMETER_VALUE, "fdname", - "a name not starting with a digit"); - return -1; + error_set(errp, QERR_INVALID_PARAMETER_VALUE, "fdname", + "a name not starting with a digit"); + return; }
- QLIST_FOREACH(monfd, &mon->fds, next) { + QLIST_FOREACH(monfd, &cur_mon->fds, next) { if (strcmp(monfd->name, fdname) != 0) { continue; }
close(monfd->fd); monfd->fd = fd; - return 0; + return; }
monfd = g_malloc0(sizeof(mon_fd_t)); monfd->name = g_strdup(fdname); monfd->fd = fd;
- QLIST_INSERT_HEAD(&mon->fds, monfd, next); - return 0; + QLIST_INSERT_HEAD(&cur_mon->fds, monfd, next); }
-static int do_closefd(Monitor *mon, const QDict *qdict, QObject **ret_data) +void qmp_closefd(const char *fdname, Error **errp) { - const char *fdname = qdict_get_str(qdict, "fdname"); mon_fd_t *monfd;
- QLIST_FOREACH(monfd, &mon->fds, next) { + QLIST_FOREACH(monfd, &cur_mon->fds, next) { if (strcmp(monfd->name, fdname) != 0) { continue; } @@ -2232,11 +2229,10 @@ static int do_closefd(Monitor *mon, const QDict *qdict, QObject **ret_data) close(monfd->fd); g_free(monfd->name); g_free(monfd); - return 0; + return; }
- qerror_report(QERR_FD_NOT_FOUND, fdname); - return -1; + error_set(errp, QERR_FD_NOT_FOUND, fdname); }
static void do_loadvm(Monitor *mon, const QDict *qdict) diff --git a/qapi-schema.json b/qapi-schema.json index 3b6e346..26a6b84 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -1862,3 +1862,38 @@ # Since: 0.14.0 ## { 'command': 'netdev_del', 'data': {'id': 'str'} } + +## +# @getfd: +# +# Receive a file descriptor via SCM rights and assign it a name +# +# @fdname: file descriptor name +# +# Returns: Nothing on success +# If file descriptor was not received, FdNotSupplied +# If @fdname is not valid, InvalidParameterType +# +# Since: 0.14.0 +# +# Notes: If @fdname already exists, the file descriptor assigned to +# it will be closed and replaced by the received file +# descriptor. +# The 'closefd' command can be used to explicitly close the +# file descriptor when it is no longer needed. +## +{ 'command': 'getfd', 'data': {'fdname': 'str'} } + +## +# @closefd: +# +# Close a file descriptor previously passed via SCM rights +# +# @fdname: file descriptor name +# +# Returns: Nothing on success +# If @fdname is not found, FdNotFound +# +# Since: 0.14.0 +## +{ 'command': 'closefd', 'data': {'fdname': 'str'} } diff --git a/qmp-commands.hx b/qmp-commands.hx index 2e1a38e..e3cf3c5 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -873,8 +873,7 @@ EQMP .args_type = "fdname:s", .params = "getfd name", .help = "receive a file descriptor via SCM rights and assign it a name", - .user_print = monitor_user_noop, - .mhandler.cmd_new = do_getfd, + .mhandler.cmd_new = qmp_marshal_input_getfd, },
SQMP @@ -892,6 +891,14 @@ Example: -> { "execute": "getfd", "arguments": { "fdname": "fd1" } } <- { "return": {} }
+Notes: + +(1) If the name specified by the "fdname" argument already exists, + the file descriptor assigned to it will be closed and replaced + by the received file descriptor. +(2) The 'closefd' command can be used to explicitly close the file + descriptor when it is no longer needed. + EQMP
{ @@ -899,8 +906,7 @@ EQMP .args_type = "fdname:s", .params = "closefd name", .help = "close a file descriptor previously passed via SCM rights", - .user_print = monitor_user_noop, - .mhandler.cmd_new = do_closefd, + .mhandler.cmd_new = qmp_marshal_input_closefd, },
SQMP

On 06/26/2012 05:28 PM, Luiz Capitulino wrote:
On Tue, 26 Jun 2012 17:15:05 -0400 Corey Bryant <coreyb@linux.vnet.ibm.com> wrote:
On 06/26/2012 04:44 PM, Luiz Capitulino wrote:
On Fri, 22 Jun 2012 14:36:09 -0400 Corey Bryant <coreyb@linux.vnet.ibm.com> wrote:
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
If this patch doesn't change anymore you can add:
Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
Thanks! I'll do that. I'll assume we still want this patch even if pass-fd support is dropped.
Yes. Actually, I could cherry pick it but I believe it will cause conflicts to you if you don't include it?
It might just be easier if I keep it in my series for now, at least until we figure out what direction we're going to take regarding pass-fd vs passing fd's via SCM_RIGHTS with commands like drive_add, etc.
--- v2: -Convert getfd and closefd to QAPI (lcapitulino@redhat.com) -Remove changes that returned fd from getfd (lcapitulino@redhat.com) -Wrap hmp_* functions around qmp_* functions (kwolf@redhat.com) -Move hmp_* functions to hmp.c (lcapitulino@redhat.com) -Drop .user_print lines (lcapitulino@redhat.com) -Use 'cmd' instead of 'cmd_new' for HMP (lcapitulino@redhat.com) -Change QMP command existance back to 0.14 (lcapitulino@redhat.com)
v3: -Remove unnecessary return statements from qmp_* functions (lcapitulino@redhat.com) -Add notes to QMP command describing behavior in more detail (lcapitulino@redhat.com, eblake@redhat.com)
v4: -No changes
hmp-commands.hx | 6 ++---- hmp.c | 18 ++++++++++++++++++ hmp.h | 2 ++ monitor.c | 32 ++++++++++++++------------------ qapi-schema.json | 35 +++++++++++++++++++++++++++++++++++ qmp-commands.hx | 14 ++++++++++---- 6 files changed, 81 insertions(+), 26 deletions(-)
diff --git a/hmp-commands.hx b/hmp-commands.hx index f5d9d91..eea8b32 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1236,8 +1236,7 @@ ETEXI .args_type = "fdname:s", .params = "getfd name", .help = "receive a file descriptor via SCM rights and assign it a name", - .user_print = monitor_user_noop, - .mhandler.cmd_new = do_getfd, + .mhandler.cmd = hmp_getfd, },
STEXI @@ -1253,8 +1252,7 @@ ETEXI .args_type = "fdname:s", .params = "closefd name", .help = "close a file descriptor previously passed via SCM rights", - .user_print = monitor_user_noop, - .mhandler.cmd_new = do_closefd, + .mhandler.cmd = hmp_closefd, },
STEXI diff --git a/hmp.c b/hmp.c index 2ce8cb9..6241856 100644 --- a/hmp.c +++ b/hmp.c @@ -999,3 +999,21 @@ void hmp_netdev_del(Monitor *mon, const QDict *qdict) qmp_netdev_del(id, &err); hmp_handle_error(mon, &err); } + +void hmp_getfd(Monitor *mon, const QDict *qdict) +{ + const char *fdname = qdict_get_str(qdict, "fdname"); + Error *errp = NULL; + + qmp_getfd(fdname, &errp); + hmp_handle_error(mon, &errp); +} + +void hmp_closefd(Monitor *mon, const QDict *qdict) +{ + const char *fdname = qdict_get_str(qdict, "fdname"); + Error *errp = NULL; + + qmp_closefd(fdname, &errp); + hmp_handle_error(mon, &errp); +} diff --git a/hmp.h b/hmp.h index 79d138d..8d2b0d7 100644 --- a/hmp.h +++ b/hmp.h @@ -64,5 +64,7 @@ void hmp_device_del(Monitor *mon, const QDict *qdict); void hmp_dump_guest_memory(Monitor *mon, const QDict *qdict); void hmp_netdev_add(Monitor *mon, const QDict *qdict); void hmp_netdev_del(Monitor *mon, const QDict *qdict); +void hmp_getfd(Monitor *mon, const QDict *qdict); +void hmp_closefd(Monitor *mon, const QDict *qdict);
#endif diff --git a/monitor.c b/monitor.c index a3bc2c7..1a7f7e7 100644 --- a/monitor.c +++ b/monitor.c @@ -2182,48 +2182,45 @@ static void do_inject_mce(Monitor *mon, const QDict *qdict) } #endif
-static int do_getfd(Monitor *mon, const QDict *qdict, QObject **ret_data) +void qmp_getfd(const char *fdname, Error **errp) { - const char *fdname = qdict_get_str(qdict, "fdname"); mon_fd_t *monfd; int fd;
- fd = qemu_chr_fe_get_msgfd(mon->chr); + fd = qemu_chr_fe_get_msgfd(cur_mon->chr); if (fd == -1) { - qerror_report(QERR_FD_NOT_SUPPLIED); - return -1; + error_set(errp, QERR_FD_NOT_SUPPLIED); + return; }
if (qemu_isdigit(fdname[0])) { - qerror_report(QERR_INVALID_PARAMETER_VALUE, "fdname", - "a name not starting with a digit"); - return -1; + error_set(errp, QERR_INVALID_PARAMETER_VALUE, "fdname", + "a name not starting with a digit"); + return; }
- QLIST_FOREACH(monfd, &mon->fds, next) { + QLIST_FOREACH(monfd, &cur_mon->fds, next) { if (strcmp(monfd->name, fdname) != 0) { continue; }
close(monfd->fd); monfd->fd = fd; - return 0; + return; }
monfd = g_malloc0(sizeof(mon_fd_t)); monfd->name = g_strdup(fdname); monfd->fd = fd;
- QLIST_INSERT_HEAD(&mon->fds, monfd, next); - return 0; + QLIST_INSERT_HEAD(&cur_mon->fds, monfd, next); }
-static int do_closefd(Monitor *mon, const QDict *qdict, QObject **ret_data) +void qmp_closefd(const char *fdname, Error **errp) { - const char *fdname = qdict_get_str(qdict, "fdname"); mon_fd_t *monfd;
- QLIST_FOREACH(monfd, &mon->fds, next) { + QLIST_FOREACH(monfd, &cur_mon->fds, next) { if (strcmp(monfd->name, fdname) != 0) { continue; } @@ -2232,11 +2229,10 @@ static int do_closefd(Monitor *mon, const QDict *qdict, QObject **ret_data) close(monfd->fd); g_free(monfd->name); g_free(monfd); - return 0; + return; }
- qerror_report(QERR_FD_NOT_FOUND, fdname); - return -1; + error_set(errp, QERR_FD_NOT_FOUND, fdname); }
static void do_loadvm(Monitor *mon, const QDict *qdict) diff --git a/qapi-schema.json b/qapi-schema.json index 3b6e346..26a6b84 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -1862,3 +1862,38 @@ # Since: 0.14.0 ## { 'command': 'netdev_del', 'data': {'id': 'str'} } + +## +# @getfd: +# +# Receive a file descriptor via SCM rights and assign it a name +# +# @fdname: file descriptor name +# +# Returns: Nothing on success +# If file descriptor was not received, FdNotSupplied +# If @fdname is not valid, InvalidParameterType +# +# Since: 0.14.0 +# +# Notes: If @fdname already exists, the file descriptor assigned to +# it will be closed and replaced by the received file +# descriptor. +# The 'closefd' command can be used to explicitly close the +# file descriptor when it is no longer needed. +## +{ 'command': 'getfd', 'data': {'fdname': 'str'} } + +## +# @closefd: +# +# Close a file descriptor previously passed via SCM rights +# +# @fdname: file descriptor name +# +# Returns: Nothing on success +# If @fdname is not found, FdNotFound +# +# Since: 0.14.0 +## +{ 'command': 'closefd', 'data': {'fdname': 'str'} } diff --git a/qmp-commands.hx b/qmp-commands.hx index 2e1a38e..e3cf3c5 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -873,8 +873,7 @@ EQMP .args_type = "fdname:s", .params = "getfd name", .help = "receive a file descriptor via SCM rights and assign it a name", - .user_print = monitor_user_noop, - .mhandler.cmd_new = do_getfd, + .mhandler.cmd_new = qmp_marshal_input_getfd, },
SQMP @@ -892,6 +891,14 @@ Example: -> { "execute": "getfd", "arguments": { "fdname": "fd1" } } <- { "return": {} }
+Notes: + +(1) If the name specified by the "fdname" argument already exists, + the file descriptor assigned to it will be closed and replaced + by the received file descriptor. +(2) The 'closefd' command can be used to explicitly close the file + descriptor when it is no longer needed. + EQMP
{ @@ -899,8 +906,7 @@ EQMP .args_type = "fdname:s", .params = "closefd name", .help = "close a file descriptor previously passed via SCM rights", - .user_print = monitor_user_noop, - .mhandler.cmd_new = do_closefd, + .mhandler.cmd_new = qmp_marshal_input_closefd, },
SQMP
-- Regards, Corey

On Fri, 22 Jun 2012 14:36:09 -0400 Corey Bryant <coreyb@linux.vnet.ibm.com> wrote:
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com>
I've cherry-picked this one into the qmp tree.
--- v2: -Convert getfd and closefd to QAPI (lcapitulino@redhat.com) -Remove changes that returned fd from getfd (lcapitulino@redhat.com) -Wrap hmp_* functions around qmp_* functions (kwolf@redhat.com) -Move hmp_* functions to hmp.c (lcapitulino@redhat.com) -Drop .user_print lines (lcapitulino@redhat.com) -Use 'cmd' instead of 'cmd_new' for HMP (lcapitulino@redhat.com) -Change QMP command existance back to 0.14 (lcapitulino@redhat.com)
v3: -Remove unnecessary return statements from qmp_* functions (lcapitulino@redhat.com) -Add notes to QMP command describing behavior in more detail (lcapitulino@redhat.com, eblake@redhat.com)
v4: -No changes
hmp-commands.hx | 6 ++---- hmp.c | 18 ++++++++++++++++++ hmp.h | 2 ++ monitor.c | 32 ++++++++++++++------------------ qapi-schema.json | 35 +++++++++++++++++++++++++++++++++++ qmp-commands.hx | 14 ++++++++++---- 6 files changed, 81 insertions(+), 26 deletions(-)
diff --git a/hmp-commands.hx b/hmp-commands.hx index f5d9d91..eea8b32 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -1236,8 +1236,7 @@ ETEXI .args_type = "fdname:s", .params = "getfd name", .help = "receive a file descriptor via SCM rights and assign it a name", - .user_print = monitor_user_noop, - .mhandler.cmd_new = do_getfd, + .mhandler.cmd = hmp_getfd, },
STEXI @@ -1253,8 +1252,7 @@ ETEXI .args_type = "fdname:s", .params = "closefd name", .help = "close a file descriptor previously passed via SCM rights", - .user_print = monitor_user_noop, - .mhandler.cmd_new = do_closefd, + .mhandler.cmd = hmp_closefd, },
STEXI diff --git a/hmp.c b/hmp.c index 2ce8cb9..6241856 100644 --- a/hmp.c +++ b/hmp.c @@ -999,3 +999,21 @@ void hmp_netdev_del(Monitor *mon, const QDict *qdict) qmp_netdev_del(id, &err); hmp_handle_error(mon, &err); } + +void hmp_getfd(Monitor *mon, const QDict *qdict) +{ + const char *fdname = qdict_get_str(qdict, "fdname"); + Error *errp = NULL; + + qmp_getfd(fdname, &errp); + hmp_handle_error(mon, &errp); +} + +void hmp_closefd(Monitor *mon, const QDict *qdict) +{ + const char *fdname = qdict_get_str(qdict, "fdname"); + Error *errp = NULL; + + qmp_closefd(fdname, &errp); + hmp_handle_error(mon, &errp); +} diff --git a/hmp.h b/hmp.h index 79d138d..8d2b0d7 100644 --- a/hmp.h +++ b/hmp.h @@ -64,5 +64,7 @@ void hmp_device_del(Monitor *mon, const QDict *qdict); void hmp_dump_guest_memory(Monitor *mon, const QDict *qdict); void hmp_netdev_add(Monitor *mon, const QDict *qdict); void hmp_netdev_del(Monitor *mon, const QDict *qdict); +void hmp_getfd(Monitor *mon, const QDict *qdict); +void hmp_closefd(Monitor *mon, const QDict *qdict);
#endif diff --git a/monitor.c b/monitor.c index a3bc2c7..1a7f7e7 100644 --- a/monitor.c +++ b/monitor.c @@ -2182,48 +2182,45 @@ static void do_inject_mce(Monitor *mon, const QDict *qdict) } #endif
-static int do_getfd(Monitor *mon, const QDict *qdict, QObject **ret_data) +void qmp_getfd(const char *fdname, Error **errp) { - const char *fdname = qdict_get_str(qdict, "fdname"); mon_fd_t *monfd; int fd;
- fd = qemu_chr_fe_get_msgfd(mon->chr); + fd = qemu_chr_fe_get_msgfd(cur_mon->chr); if (fd == -1) { - qerror_report(QERR_FD_NOT_SUPPLIED); - return -1; + error_set(errp, QERR_FD_NOT_SUPPLIED); + return; }
if (qemu_isdigit(fdname[0])) { - qerror_report(QERR_INVALID_PARAMETER_VALUE, "fdname", - "a name not starting with a digit"); - return -1; + error_set(errp, QERR_INVALID_PARAMETER_VALUE, "fdname", + "a name not starting with a digit"); + return; }
- QLIST_FOREACH(monfd, &mon->fds, next) { + QLIST_FOREACH(monfd, &cur_mon->fds, next) { if (strcmp(monfd->name, fdname) != 0) { continue; }
close(monfd->fd); monfd->fd = fd; - return 0; + return; }
monfd = g_malloc0(sizeof(mon_fd_t)); monfd->name = g_strdup(fdname); monfd->fd = fd;
- QLIST_INSERT_HEAD(&mon->fds, monfd, next); - return 0; + QLIST_INSERT_HEAD(&cur_mon->fds, monfd, next); }
-static int do_closefd(Monitor *mon, const QDict *qdict, QObject **ret_data) +void qmp_closefd(const char *fdname, Error **errp) { - const char *fdname = qdict_get_str(qdict, "fdname"); mon_fd_t *monfd;
- QLIST_FOREACH(monfd, &mon->fds, next) { + QLIST_FOREACH(monfd, &cur_mon->fds, next) { if (strcmp(monfd->name, fdname) != 0) { continue; } @@ -2232,11 +2229,10 @@ static int do_closefd(Monitor *mon, const QDict *qdict, QObject **ret_data) close(monfd->fd); g_free(monfd->name); g_free(monfd); - return 0; + return; }
- qerror_report(QERR_FD_NOT_FOUND, fdname); - return -1; + error_set(errp, QERR_FD_NOT_FOUND, fdname); }
static void do_loadvm(Monitor *mon, const QDict *qdict) diff --git a/qapi-schema.json b/qapi-schema.json index 3b6e346..26a6b84 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -1862,3 +1862,38 @@ # Since: 0.14.0 ## { 'command': 'netdev_del', 'data': {'id': 'str'} } + +## +# @getfd: +# +# Receive a file descriptor via SCM rights and assign it a name +# +# @fdname: file descriptor name +# +# Returns: Nothing on success +# If file descriptor was not received, FdNotSupplied +# If @fdname is not valid, InvalidParameterType +# +# Since: 0.14.0 +# +# Notes: If @fdname already exists, the file descriptor assigned to +# it will be closed and replaced by the received file +# descriptor. +# The 'closefd' command can be used to explicitly close the +# file descriptor when it is no longer needed. +## +{ 'command': 'getfd', 'data': {'fdname': 'str'} } + +## +# @closefd: +# +# Close a file descriptor previously passed via SCM rights +# +# @fdname: file descriptor name +# +# Returns: Nothing on success +# If @fdname is not found, FdNotFound +# +# Since: 0.14.0 +## +{ 'command': 'closefd', 'data': {'fdname': 'str'} } diff --git a/qmp-commands.hx b/qmp-commands.hx index 2e1a38e..e3cf3c5 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -873,8 +873,7 @@ EQMP .args_type = "fdname:s", .params = "getfd name", .help = "receive a file descriptor via SCM rights and assign it a name", - .user_print = monitor_user_noop, - .mhandler.cmd_new = do_getfd, + .mhandler.cmd_new = qmp_marshal_input_getfd, },
SQMP @@ -892,6 +891,14 @@ Example: -> { "execute": "getfd", "arguments": { "fdname": "fd1" } } <- { "return": {} }
+Notes: + +(1) If the name specified by the "fdname" argument already exists, + the file descriptor assigned to it will be closed and replaced + by the received file descriptor. +(2) The 'closefd' command can be used to explicitly close the file + descriptor when it is no longer needed. + EQMP
{ @@ -899,8 +906,7 @@ EQMP .args_type = "fdname:s", .params = "closefd name", .help = "close a file descriptor previously passed via SCM rights", - .user_print = monitor_user_noop, - .mhandler.cmd_new = do_closefd, + .mhandler.cmd_new = qmp_marshal_input_closefd, },
SQMP

This patch adds the pass-fd QMP command using the QAPI framework. Like the getfd command, it is used to pass a file descriptor via SCM_RIGHTS and associate it with a name. However, the pass-fd command also returns the received file descriptor, which is a difference in behavior from the getfd command, which returns nothing. pass-fd also supports a boolean "force" argument that can be used to specify that an existing file descriptor is associated with the specified name, and that it should become a copy of the newly passed file descriptor. The closefd command can be used to close a file descriptor that was passed with the pass-fd command. Note that when using getfd or pass-fd, there are some commands (e.g. migrate with fd:name) that implicitly close the named fd. When this is not the case, closefd must be used to avoid fd leaks. Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com> --- v2: -Introduce new QMP command to pass/return fd (lcapitulino@redhat.com) -Use passfd as command name (berrange@redhat.com) v3: -Use pass-fd as command name (lcapitulino@redhat.com) -Fail pass-fd if fdname already exists (lcapitulino@redhat.com) -Add notes to QMP command describing behavior in more detail (lcapitulino@redhat.com, eblake@redhat.com) -Add note about fd leakage (eblake@redhat.com) v4: -Don't return same error class twice (lcapitulino@redhat.com) -Share code for qmp_gefd and qmp_pass_fd (lcapitulino@redhat.com) -Update qmp_closefd to call monitor_get_fd -Add support for "force" argument to pass-fd (eblake@redhat.com) dump.c | 3 +- migration-fd.c | 2 +- monitor.c | 96 +++++++++++++++++++++++++++++++++++------------------- monitor.h | 2 +- net.c | 6 ++-- qapi-schema.json | 36 ++++++++++++++++++++ qmp-commands.hx | 42 +++++++++++++++++++++++- 7 files changed, 146 insertions(+), 41 deletions(-) diff --git a/dump.c b/dump.c index 4412d7a..2fd8ced 100644 --- a/dump.c +++ b/dump.c @@ -836,9 +836,8 @@ void qmp_dump_guest_memory(bool paging, const char *file, bool has_begin, #if !defined(WIN32) if (strstart(file, "fd:", &p)) { - fd = monitor_get_fd(cur_mon, p); + fd = monitor_get_fd(cur_mon, p, errp); if (fd == -1) { - error_set(errp, QERR_FD_NOT_FOUND, p); return; } } diff --git a/migration-fd.c b/migration-fd.c index 50138ed..7335167 100644 --- a/migration-fd.c +++ b/migration-fd.c @@ -75,7 +75,7 @@ static int fd_close(MigrationState *s) int fd_start_outgoing_migration(MigrationState *s, const char *fdname) { - s->fd = monitor_get_fd(cur_mon, fdname); + s->fd = monitor_get_fd(cur_mon, fdname, NULL); if (s->fd == -1) { DPRINTF("fd_migration: invalid file descriptor identifier\n"); goto err_after_get_fd; diff --git a/monitor.c b/monitor.c index 1a7f7e7..3433c06 100644 --- a/monitor.c +++ b/monitor.c @@ -814,7 +814,7 @@ static int add_graphics_client(Monitor *mon, const QDict *qdict, QObject **ret_d CharDriverState *s; if (strcmp(protocol, "spice") == 0) { - int fd = monitor_get_fd(mon, fdname); + int fd = monitor_get_fd(mon, fdname, NULL); int skipauth = qdict_get_try_bool(qdict, "skipauth", 0); int tls = qdict_get_try_bool(qdict, "tls", 0); if (!using_spice) { @@ -828,18 +828,18 @@ static int add_graphics_client(Monitor *mon, const QDict *qdict, QObject **ret_d return 0; #ifdef CONFIG_VNC } else if (strcmp(protocol, "vnc") == 0) { - int fd = monitor_get_fd(mon, fdname); + int fd = monitor_get_fd(mon, fdname, NULL); int skipauth = qdict_get_try_bool(qdict, "skipauth", 0); - vnc_display_add_client(NULL, fd, skipauth); - return 0; + vnc_display_add_client(NULL, fd, skipauth); + return 0; #endif } else if ((s = qemu_chr_find(protocol)) != NULL) { - int fd = monitor_get_fd(mon, fdname); - if (qemu_chr_add_client(s, fd) < 0) { - qerror_report(QERR_ADD_CLIENT_FAILED); - return -1; - } - return 0; + int fd = monitor_get_fd(mon, fdname, NULL); + if (qemu_chr_add_client(s, fd) < 0) { + qerror_report(QERR_ADD_CLIENT_FAILED); + return -1; + } + return 0; } qerror_report(QERR_INVALID_PARAMETER, "protocol"); @@ -2182,57 +2182,69 @@ static void do_inject_mce(Monitor *mon, const QDict *qdict) } #endif -void qmp_getfd(const char *fdname, Error **errp) +static int monitor_put_fd(Monitor *mon, const char *fdname, bool replace, + bool copy, Error **errp) { mon_fd_t *monfd; int fd; - fd = qemu_chr_fe_get_msgfd(cur_mon->chr); + fd = qemu_chr_fe_get_msgfd(mon->chr); if (fd == -1) { error_set(errp, QERR_FD_NOT_SUPPLIED); - return; + return -1; } if (qemu_isdigit(fdname[0])) { error_set(errp, QERR_INVALID_PARAMETER_VALUE, "fdname", "a name not starting with a digit"); - return; + return -1; } - QLIST_FOREACH(monfd, &cur_mon->fds, next) { + QLIST_FOREACH(monfd, &mon->fds, next) { if (strcmp(monfd->name, fdname) != 0) { continue; } - close(monfd->fd); - monfd->fd = fd; - return; + if (replace) { + /* the existing fd is replaced with the new fd */ + close(monfd->fd); + monfd->fd = fd; + return fd; + } + + if (copy) { + /* the existing fd becomes a copy of the new fd */ + if (dup2(fd, monfd->fd) == -1) { + error_set(errp, QERR_UNDEFINED_ERROR); + return -1; + } + close(fd); + return monfd->fd; + } + + error_set(errp, QERR_INVALID_PARAMETER, "fdname"); + return -1; + } + + if (copy) { + error_set(errp, QERR_INVALID_PARAMETER, "fdname"); + return -1; } monfd = g_malloc0(sizeof(mon_fd_t)); monfd->name = g_strdup(fdname); monfd->fd = fd; - QLIST_INSERT_HEAD(&cur_mon->fds, monfd, next); + QLIST_INSERT_HEAD(&mon->fds, monfd, next); + return fd; } void qmp_closefd(const char *fdname, Error **errp) { - mon_fd_t *monfd; - - QLIST_FOREACH(monfd, &cur_mon->fds, next) { - if (strcmp(monfd->name, fdname) != 0) { - continue; - } - - QLIST_REMOVE(monfd, next); - close(monfd->fd); - g_free(monfd->name); - g_free(monfd); - return; + int fd = monitor_get_fd(cur_mon, fdname, errp); + if (fd != -1) { + close(fd); } - - error_set(errp, QERR_FD_NOT_FOUND, fdname); } static void do_loadvm(Monitor *mon, const QDict *qdict) @@ -2247,7 +2259,7 @@ static void do_loadvm(Monitor *mon, const QDict *qdict) } } -int monitor_get_fd(Monitor *mon, const char *fdname) +int monitor_get_fd(Monitor *mon, const char *fdname, Error **errp) { mon_fd_t *monfd; @@ -2268,9 +2280,25 @@ int monitor_get_fd(Monitor *mon, const char *fdname) return fd; } + error_set(errp, QERR_FD_NOT_FOUND, fdname); return -1; } +int64_t qmp_pass_fd(const char *fdname, bool has_force, bool force, + Error **errp) +{ + bool replace = false; + bool copy = has_force ? force : false; + return monitor_put_fd(cur_mon, fdname, replace, copy, errp); +} + +void qmp_getfd(const char *fdname, Error **errp) +{ + bool replace = true; + bool copy = false; + monitor_put_fd(cur_mon, fdname, replace, copy, errp); +} + /* mon_cmds and info_cmds would be sorted at runtime */ static mon_cmd_t mon_cmds[] = { #include "hmp-commands.h" diff --git a/monitor.h b/monitor.h index cd1d878..8fa515d 100644 --- a/monitor.h +++ b/monitor.h @@ -63,7 +63,7 @@ int monitor_read_block_device_key(Monitor *mon, const char *device, BlockDriverCompletionFunc *completion_cb, void *opaque); -int monitor_get_fd(Monitor *mon, const char *fdname); +int monitor_get_fd(Monitor *mon, const char *fdname, Error **errp); void monitor_vprintf(Monitor *mon, const char *fmt, va_list ap) GCC_FMT_ATTR(2, 0); diff --git a/net.c b/net.c index 4aa416c..28860b8 100644 --- a/net.c +++ b/net.c @@ -730,12 +730,14 @@ int qemu_find_nic_model(NICInfo *nd, const char * const *models, int net_handle_fd_param(Monitor *mon, const char *param) { int fd; + Error *local_err = NULL; if (!qemu_isdigit(param[0]) && mon) { - fd = monitor_get_fd(mon, param); + fd = monitor_get_fd(mon, param, &local_err); if (fd == -1) { - error_report("No file descriptor named %s found", param); + qerror_report_err(local_err); + error_free(local_err); return -1; } } else { diff --git a/qapi-schema.json b/qapi-schema.json index 26a6b84..2ac1261 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -1864,6 +1864,41 @@ { 'command': 'netdev_del', 'data': {'id': 'str'} } ## +# @pass-fd: +# +# Pass a file descriptor via SCM rights and assign it a name +# +# @fdname: file descriptor name +# +# @force: #optional true specifies that an existing file descriptor is +# associated with @fdname, and that it should become a copy of the +# newly passed file descriptor. +# +# Returns: The QEMU file descriptor that is assigned to @fdname +# If file descriptor was not received, FdNotSupplied +# If @fdname is not valid, InvalidParameterValue +# If @fdname already exists, and @force is not true, InvalidParameter +# If @fdname does not already exist, and @force is true, +# InvalidParameter +# If @force fails to copy the passed file descriptor, +# UndefinedError +# +# Since: 1.2.0 +# +# Notes: If @fdname already exists, and @force is not true, the +# command will fail. +# +# If @fdname already exists, and @force is true, the value of +# the existing file descriptor is returned when the command is +# successful. +# +# The 'closefd' command can be used to explicitly close the +# file descriptor when it is no longer needed. +## +{ 'command': 'pass-fd', 'data': {'fdname': 'str', '*force': 'bool'}, + 'returns': 'int' } + +## # @getfd: # # Receive a file descriptor via SCM rights and assign it a name @@ -1879,6 +1914,7 @@ # Notes: If @fdname already exists, the file descriptor assigned to # it will be closed and replaced by the received file # descriptor. +# # The 'closefd' command can be used to explicitly close the # file descriptor when it is no longer needed. ## diff --git a/qmp-commands.hx b/qmp-commands.hx index e3cf3c5..7e3c07e 100644 --- a/qmp-commands.hx +++ b/qmp-commands.hx @@ -869,9 +869,49 @@ Example: EQMP { + .name = "pass-fd", + .args_type = "fdname:s,force:b?", + .params = "pass-fd fdname force", + .help = "pass a file descriptor via SCM rights and assign it a name", + .mhandler.cmd_new = qmp_marshal_input_pass_fd, + }, + +SQMP +pass-fd +------- + +Pass a file descriptor via SCM rights and assign it a name. + +Arguments: + +- "fdname": file descriptor name (json-string) +- "force": true specifies that an existing file descriptor is associated + with "fdname", and that it should become a copy of the newly + passed file descriptor. (json-bool, optional) + +Return a json-int with the QEMU file descriptor that is assigned to "fdname". + +Example: + +-> { "execute": "pass-fd", "arguments": { "fdname": "fd1" } } +<- { "return": 42 } + +Notes: + +(1) If the name specified by the "fdname" argument already exists, and + "force" is not true, the command will fail. +(2) If the name specified by the "fdname" argument already exists, and + "force" is true, the value of the existing file descriptor is + returned when the command is successful. +(3) The 'closefd' command can be used to explicitly close the file + descriptor when it is no longer needed. + +EQMP + + { .name = "getfd", .args_type = "fdname:s", - .params = "getfd name", + .params = "getfd fdname", .help = "receive a file descriptor via SCM rights and assign it a name", .mhandler.cmd_new = qmp_marshal_input_getfd, }, -- 1.7.10.2

On 06/22/2012 12:36 PM, Corey Bryant wrote:
This patch adds the pass-fd QMP command using the QAPI framework. Like the getfd command, it is used to pass a file descriptor via SCM_RIGHTS and associate it with a name. However, the pass-fd command also returns the received file descriptor, which is a difference in behavior from the getfd command, which returns nothing. pass-fd also supports a boolean "force" argument that can be used to specify that an existing file descriptor is associated with the specified name, and that it should become a copy of the newly passed file descriptor.
+ if (replace) { + /* the existing fd is replaced with the new fd */ + close(monfd->fd); + monfd->fd = fd; + return fd; + } + + if (copy) {
Since 'replace' and 'copy' are never both true, should this instead be a tri-state enum argument instead of two independent bool arguments?
+ /* the existing fd becomes a copy of the new fd */ + if (dup2(fd, monfd->fd) == -1) {
Broken - you want to use dup3(fd, monfd->fd, O_CLOEXEC) (and fall back to dup2()+fcntl(F_GETFD/F_SETFD) on platforms where dup3 is not available; it has been proposed for the next revision of POSIX but right now is pretty much limited to Linux and Cygwin). Otherwise, you are undoing the fact that patch 1/7 just changed the 'getfd' list to always keep all its fds marked cloexec. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 06/22/2012 04:24 PM, Eric Blake wrote:
On 06/22/2012 12:36 PM, Corey Bryant wrote:
This patch adds the pass-fd QMP command using the QAPI framework. Like the getfd command, it is used to pass a file descriptor via SCM_RIGHTS and associate it with a name. However, the pass-fd command also returns the received file descriptor, which is a difference in behavior from the getfd command, which returns nothing. pass-fd also supports a boolean "force" argument that can be used to specify that an existing file descriptor is associated with the specified name, and that it should become a copy of the newly passed file descriptor.
+ if (replace) { + /* the existing fd is replaced with the new fd */ + close(monfd->fd); + monfd->fd = fd; + return fd; + } + + if (copy) {
Since 'replace' and 'copy' are never both true, should this instead be a tri-state enum argument instead of two independent bool arguments?
Sure, that makes sense. I'll do that in v5.
+ /* the existing fd becomes a copy of the new fd */ + if (dup2(fd, monfd->fd) == -1) {
Broken - you want to use dup3(fd, monfd->fd, O_CLOEXEC) (and fall back to dup2()+fcntl(F_GETFD/F_SETFD) on platforms where dup3 is not available; it has been proposed for the next revision of POSIX but right now is pretty much limited to Linux and Cygwin). Otherwise, you are undoing the fact that patch 1/7 just changed the 'getfd' list to always keep all its fds marked cloexec.
Thanks for catching this. I'll fix this in v5. In terms of platforms that support dup3 vs dup2, I'm assuming the following preprocessor checks will do what we need: #if defined(__linux__) || defined(__CYGWIN__) dup3(fd, monfd->fd, O_CLOEXEC) #else dup2()+fcntl(F_GETFD/F_SETFD) #endif -- Regards, Corey

Am 25.06.2012 16:51, schrieb Corey Bryant:
Thanks for catching this. I'll fix this in v5. In terms of platforms that support dup3 vs dup2, I'm assuming the following preprocessor checks will do what we need:
#if defined(__linux__) || defined(__CYGWIN__) dup3(fd, monfd->fd, O_CLOEXEC) #else dup2()+fcntl(F_GETFD/F_SETFD) #endif
Checking for dup3() in configure might be the better option. It's not like it has been there forever, even on Linux. Kevin

On 06/25/2012 11:34 AM, Kevin Wolf wrote:
Am 25.06.2012 16:51, schrieb Corey Bryant:
Thanks for catching this. I'll fix this in v5. In terms of platforms that support dup3 vs dup2, I'm assuming the following preprocessor checks will do what we need:
#if defined(__linux__) || defined(__CYGWIN__) dup3(fd, monfd->fd, O_CLOEXEC) #else dup2()+fcntl(F_GETFD/F_SETFD) #endif
Checking for dup3() in configure might be the better option. It's not like it has been there forever, even on Linux.
Kevin
Ah, thanks. It looks like support is already there and I can just check: #if defined(CONFIG_DUP3) -- Regards, Corey

On 06/25/2012 09:34 AM, Kevin Wolf wrote:
Am 25.06.2012 16:51, schrieb Corey Bryant:
Thanks for catching this. I'll fix this in v5. In terms of platforms that support dup3 vs dup2, I'm assuming the following preprocessor checks will do what we need:
#if defined(__linux__) || defined(__CYGWIN__) dup3(fd, monfd->fd, O_CLOEXEC) #else dup2()+fcntl(F_GETFD/F_SETFD) #endif
Checking for dup3() in configure might be the better option. It's not like it has been there forever, even on Linux.
Agreed. Furthermore, since dup3() has been proposed for addition into POSIX[1], it won't be long before other platforms add it. Always favor feature checks (a configure probe for dup3) over platform checks (hard-coding the assumption that Linux and Cygwin are the only platforms that will ever provide it). [1] http://austingroupbugs.net/view.php?id=411 -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com> --- v4: -This patch is new in v4. monitor.c | 40 ++++++++++++++++++++-------------------- 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/monitor.c b/monitor.c index 3433c06..153e949 100644 --- a/monitor.c +++ b/monitor.c @@ -2239,26 +2239,6 @@ static int monitor_put_fd(Monitor *mon, const char *fdname, bool replace, return fd; } -void qmp_closefd(const char *fdname, Error **errp) -{ - int fd = monitor_get_fd(cur_mon, fdname, errp); - if (fd != -1) { - close(fd); - } -} - -static void do_loadvm(Monitor *mon, const QDict *qdict) -{ - int saved_vm_running = runstate_is_running(); - const char *name = qdict_get_str(qdict, "name"); - - vm_stop(RUN_STATE_RESTORE_VM); - - if (load_vmstate(name) == 0 && saved_vm_running) { - vm_start(); - } -} - int monitor_get_fd(Monitor *mon, const char *fdname, Error **errp) { mon_fd_t *monfd; @@ -2299,6 +2279,26 @@ void qmp_getfd(const char *fdname, Error **errp) monitor_put_fd(cur_mon, fdname, replace, copy, errp); } +void qmp_closefd(const char *fdname, Error **errp) +{ + int fd = monitor_get_fd(cur_mon, fdname, errp); + if (fd != -1) { + close(fd); + } +} + +static void do_loadvm(Monitor *mon, const QDict *qdict) +{ + int saved_vm_running = runstate_is_running(); + const char *name = qdict_get_str(qdict, "name"); + + vm_stop(RUN_STATE_RESTORE_VM); + + if (load_vmstate(name) == 0 && saved_vm_running) { + vm_start(); + } +} + /* mon_cmds and info_cmds would be sorted at runtime */ static mon_cmd_t mon_cmds[] = { #include "hmp-commands.h" -- 1.7.10.2

On Fri, 22 Jun 2012 14:36:11 -0400 Corey Bryant <coreyb@linux.vnet.ibm.com> wrote:
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com> --- v4: -This patch is new in v4.
Can you explain why it's needed?
monitor.c | 40 ++++++++++++++++++++-------------------- 1 file changed, 20 insertions(+), 20 deletions(-)
diff --git a/monitor.c b/monitor.c index 3433c06..153e949 100644 --- a/monitor.c +++ b/monitor.c @@ -2239,26 +2239,6 @@ static int monitor_put_fd(Monitor *mon, const char *fdname, bool replace, return fd; }
-void qmp_closefd(const char *fdname, Error **errp) -{ - int fd = monitor_get_fd(cur_mon, fdname, errp); - if (fd != -1) { - close(fd); - } -} - -static void do_loadvm(Monitor *mon, const QDict *qdict) -{ - int saved_vm_running = runstate_is_running(); - const char *name = qdict_get_str(qdict, "name"); - - vm_stop(RUN_STATE_RESTORE_VM); - - if (load_vmstate(name) == 0 && saved_vm_running) { - vm_start(); - } -} - int monitor_get_fd(Monitor *mon, const char *fdname, Error **errp) { mon_fd_t *monfd; @@ -2299,6 +2279,26 @@ void qmp_getfd(const char *fdname, Error **errp) monitor_put_fd(cur_mon, fdname, replace, copy, errp); }
+void qmp_closefd(const char *fdname, Error **errp) +{ + int fd = monitor_get_fd(cur_mon, fdname, errp); + if (fd != -1) { + close(fd); + } +} + +static void do_loadvm(Monitor *mon, const QDict *qdict) +{ + int saved_vm_running = runstate_is_running(); + const char *name = qdict_get_str(qdict, "name"); + + vm_stop(RUN_STATE_RESTORE_VM); + + if (load_vmstate(name) == 0 && saved_vm_running) { + vm_start(); + } +} + /* mon_cmds and info_cmds would be sorted at runtime */ static mon_cmd_t mon_cmds[] = { #include "hmp-commands.h"

On 06/26/2012 04:45 PM, Luiz Capitulino wrote:
On Fri, 22 Jun 2012 14:36:11 -0400 Corey Bryant <coreyb@linux.vnet.ibm.com> wrote:
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com> --- v4: -This patch is new in v4.
Can you explain why it's needed?
This just made patch 3/7 easier to review.
monitor.c | 40 ++++++++++++++++++++-------------------- 1 file changed, 20 insertions(+), 20 deletions(-)
diff --git a/monitor.c b/monitor.c index 3433c06..153e949 100644 --- a/monitor.c +++ b/monitor.c @@ -2239,26 +2239,6 @@ static int monitor_put_fd(Monitor *mon, const char *fdname, bool replace, return fd; }
-void qmp_closefd(const char *fdname, Error **errp) -{ - int fd = monitor_get_fd(cur_mon, fdname, errp); - if (fd != -1) { - close(fd); - } -} - -static void do_loadvm(Monitor *mon, const QDict *qdict) -{ - int saved_vm_running = runstate_is_running(); - const char *name = qdict_get_str(qdict, "name"); - - vm_stop(RUN_STATE_RESTORE_VM); - - if (load_vmstate(name) == 0 && saved_vm_running) { - vm_start(); - } -} - int monitor_get_fd(Monitor *mon, const char *fdname, Error **errp) { mon_fd_t *monfd; @@ -2299,6 +2279,26 @@ void qmp_getfd(const char *fdname, Error **errp) monitor_put_fd(cur_mon, fdname, replace, copy, errp); }
+void qmp_closefd(const char *fdname, Error **errp) +{ + int fd = monitor_get_fd(cur_mon, fdname, errp); + if (fd != -1) { + close(fd); + } +} + +static void do_loadvm(Monitor *mon, const QDict *qdict) +{ + int saved_vm_running = runstate_is_running(); + const char *name = qdict_get_str(qdict, "name"); + + vm_stop(RUN_STATE_RESTORE_VM); + + if (load_vmstate(name) == 0 && saved_vm_running) { + vm_start(); + } +} + /* mon_cmds and info_cmds would be sorted at runtime */ static mon_cmd_t mon_cmds[] = { #include "hmp-commands.h"
-- Regards, Corey

On Tue, 26 Jun 2012 17:16:28 -0400 Corey Bryant <coreyb@linux.vnet.ibm.com> wrote:
On 06/26/2012 04:45 PM, Luiz Capitulino wrote:
On Fri, 22 Jun 2012 14:36:11 -0400 Corey Bryant <coreyb@linux.vnet.ibm.com> wrote:
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com> --- v4: -This patch is new in v4.
Can you explain why it's needed?
This just made patch 3/7 easier to review.
Better to explain it then and move it to before that patch.
monitor.c | 40 ++++++++++++++++++++-------------------- 1 file changed, 20 insertions(+), 20 deletions(-)
diff --git a/monitor.c b/monitor.c index 3433c06..153e949 100644 --- a/monitor.c +++ b/monitor.c @@ -2239,26 +2239,6 @@ static int monitor_put_fd(Monitor *mon, const char *fdname, bool replace, return fd; }
-void qmp_closefd(const char *fdname, Error **errp) -{ - int fd = monitor_get_fd(cur_mon, fdname, errp); - if (fd != -1) { - close(fd); - } -} - -static void do_loadvm(Monitor *mon, const QDict *qdict) -{ - int saved_vm_running = runstate_is_running(); - const char *name = qdict_get_str(qdict, "name"); - - vm_stop(RUN_STATE_RESTORE_VM); - - if (load_vmstate(name) == 0 && saved_vm_running) { - vm_start(); - } -} - int monitor_get_fd(Monitor *mon, const char *fdname, Error **errp) { mon_fd_t *monfd; @@ -2299,6 +2279,26 @@ void qmp_getfd(const char *fdname, Error **errp) monitor_put_fd(cur_mon, fdname, replace, copy, errp); }
+void qmp_closefd(const char *fdname, Error **errp) +{ + int fd = monitor_get_fd(cur_mon, fdname, errp); + if (fd != -1) { + close(fd); + } +} + +static void do_loadvm(Monitor *mon, const QDict *qdict) +{ + int saved_vm_running = runstate_is_running(); + const char *name = qdict_get_str(qdict, "name"); + + vm_stop(RUN_STATE_RESTORE_VM); + + if (load_vmstate(name) == 0 && saved_vm_running) { + vm_start(); + } +} + /* mon_cmds and info_cmds would be sorted at runtime */ static mon_cmd_t mon_cmds[] = { #include "hmp-commands.h"

On 06/26/2012 05:28 PM, Luiz Capitulino wrote:
On Tue, 26 Jun 2012 17:16:28 -0400 Corey Bryant <coreyb@linux.vnet.ibm.com> wrote:
On 06/26/2012 04:45 PM, Luiz Capitulino wrote:
On Fri, 22 Jun 2012 14:36:11 -0400 Corey Bryant <coreyb@linux.vnet.ibm.com> wrote:
Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com> --- v4: -This patch is new in v4.
Can you explain why it's needed?
This just made patch 3/7 easier to review.
Better to explain it then and move it to before that patch.
Sure, I can do that.
monitor.c | 40 ++++++++++++++++++++-------------------- 1 file changed, 20 insertions(+), 20 deletions(-)
diff --git a/monitor.c b/monitor.c index 3433c06..153e949 100644 --- a/monitor.c +++ b/monitor.c @@ -2239,26 +2239,6 @@ static int monitor_put_fd(Monitor *mon, const char *fdname, bool replace, return fd; }
-void qmp_closefd(const char *fdname, Error **errp) -{ - int fd = monitor_get_fd(cur_mon, fdname, errp); - if (fd != -1) { - close(fd); - } -} - -static void do_loadvm(Monitor *mon, const QDict *qdict) -{ - int saved_vm_running = runstate_is_running(); - const char *name = qdict_get_str(qdict, "name"); - - vm_stop(RUN_STATE_RESTORE_VM); - - if (load_vmstate(name) == 0 && saved_vm_running) { - vm_start(); - } -} - int monitor_get_fd(Monitor *mon, const char *fdname, Error **errp) { mon_fd_t *monfd; @@ -2299,6 +2279,26 @@ void qmp_getfd(const char *fdname, Error **errp) monitor_put_fd(cur_mon, fdname, replace, copy, errp); }
+void qmp_closefd(const char *fdname, Error **errp) +{ + int fd = monitor_get_fd(cur_mon, fdname, errp); + if (fd != -1) { + close(fd); + } +} + +static void do_loadvm(Monitor *mon, const QDict *qdict) +{ + int saved_vm_running = runstate_is_running(); + const char *name = qdict_get_str(qdict, "name"); + + vm_stop(RUN_STATE_RESTORE_VM); + + if (load_vmstate(name) == 0 && saved_vm_running) { + vm_start(); + } +} + /* mon_cmds and info_cmds would be sorted at runtime */ static mon_cmd_t mon_cmds[] = { #include "hmp-commands.h"
-- Regards, Corey

Reported-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com> --- v3: -This patch is new in v3. It was previously submitted on its own, and is now being included in this series. v4 -Moved patch to be earlier in series (lcapitulino@redhat.com) block/raw-posix.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/block/raw-posix.c b/block/raw-posix.c index 03fcfcc..0d9be0b 100644 --- a/block/raw-posix.c +++ b/block/raw-posix.c @@ -946,9 +946,11 @@ static int floppy_probe_device(const char *filename) int prio = 0; struct floppy_struct fdparam; struct stat st; + const char *p; - if (strstart(filename, "/dev/fd", NULL)) + if (strstart(filename, "/dev/fd", &p) && p[0] != '/') { prio = 50; + } fd = open(filename, O_RDONLY | O_NONBLOCK); if (fd < 0) { -- 1.7.10.2

This patch converts all block layer open calls to qemu_open. This enables all block layer open paths to dup(X) a pre-opened file descriptor if the filename is of the format /dev/fd/X. This is useful if QEMU is restricted from opening certain files. Note that this adds the O_CLOEXEC flag to the changed open paths when the O_CLOEXEC macro is defined. Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com> --- v2: -Convert calls to qemu_open instead of file_open (kwolf@redhat.com) -Mention introduction of O_CLOEXEC (kwolf@redhat.com) v3: -No changes v4: -No changes block/raw-posix.c | 18 +++++++++--------- block/raw-win32.c | 4 ++-- block/vdi.c | 5 +++-- block/vmdk.c | 21 +++++++++------------ block/vpc.c | 2 +- block/vvfat.c | 21 +++++++++++---------- 6 files changed, 35 insertions(+), 36 deletions(-) diff --git a/block/raw-posix.c b/block/raw-posix.c index 0d9be0b..68886cd 100644 --- a/block/raw-posix.c +++ b/block/raw-posix.c @@ -568,8 +568,8 @@ static int raw_create(const char *filename, QEMUOptionParameter *options) options++; } - fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, - 0644); + fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, + 0644); if (fd < 0) { result = -errno; } else { @@ -741,7 +741,7 @@ static int hdev_open(BlockDriverState *bs, const char *filename, int flags) if ( bsdPath[ 0 ] != '\0' ) { strcat(bsdPath,"s0"); /* some CDs don't have a partition 0 */ - fd = open(bsdPath, O_RDONLY | O_BINARY | O_LARGEFILE); + fd = qemu_open(bsdPath, O_RDONLY | O_BINARY | O_LARGEFILE); if (fd < 0) { bsdPath[strlen(bsdPath)-1] = '1'; } else { @@ -798,7 +798,7 @@ static int fd_open(BlockDriverState *bs) #endif return -EIO; } - s->fd = open(bs->filename, s->open_flags & ~O_NONBLOCK); + s->fd = qemu_open(bs->filename, s->open_flags & ~O_NONBLOCK); if (s->fd < 0) { s->fd_error_time = get_clock(); s->fd_got_error = 1; @@ -872,7 +872,7 @@ static int hdev_create(const char *filename, QEMUOptionParameter *options) options++; } - fd = open(filename, O_WRONLY | O_BINARY); + fd = qemu_open(filename, O_WRONLY | O_BINARY); if (fd < 0) return -errno; @@ -952,7 +952,7 @@ static int floppy_probe_device(const char *filename) prio = 50; } - fd = open(filename, O_RDONLY | O_NONBLOCK); + fd = qemu_open(filename, O_RDONLY | O_NONBLOCK); if (fd < 0) { goto out; } @@ -1005,7 +1005,7 @@ static void floppy_eject(BlockDriverState *bs, bool eject_flag) close(s->fd); s->fd = -1; } - fd = open(bs->filename, s->open_flags | O_NONBLOCK); + fd = qemu_open(bs->filename, s->open_flags | O_NONBLOCK); if (fd >= 0) { if (ioctl(fd, FDEJECT, 0) < 0) perror("FDEJECT"); @@ -1055,7 +1055,7 @@ static int cdrom_probe_device(const char *filename) int prio = 0; struct stat st; - fd = open(filename, O_RDONLY | O_NONBLOCK); + fd = qemu_open(filename, O_RDONLY | O_NONBLOCK); if (fd < 0) { goto out; } @@ -1179,7 +1179,7 @@ static int cdrom_reopen(BlockDriverState *bs) */ if (s->fd >= 0) close(s->fd); - fd = open(bs->filename, s->open_flags, 0644); + fd = qemu_open(bs->filename, s->open_flags, 0644); if (fd < 0) { s->fd = -1; return -EIO; diff --git a/block/raw-win32.c b/block/raw-win32.c index e4b0b75..8d7838d 100644 --- a/block/raw-win32.c +++ b/block/raw-win32.c @@ -255,8 +255,8 @@ static int raw_create(const char *filename, QEMUOptionParameter *options) options++; } - fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, - 0644); + fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, + 0644); if (fd < 0) return -EIO; set_sparse(fd); diff --git a/block/vdi.c b/block/vdi.c index 119d3c7..6183fdf 100644 --- a/block/vdi.c +++ b/block/vdi.c @@ -648,8 +648,9 @@ static int vdi_create(const char *filename, QEMUOptionParameter *options) options++; } - fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE, - 0644); + fd = qemu_open(filename, + O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE, + 0644); if (fd < 0) { return -errno; } diff --git a/block/vmdk.c b/block/vmdk.c index 18e9b4c..557dc1b 100644 --- a/block/vmdk.c +++ b/block/vmdk.c @@ -1161,10 +1161,9 @@ static int vmdk_create_extent(const char *filename, int64_t filesize, VMDK4Header header; uint32_t tmp, magic, grains, gd_size, gt_size, gt_count; - fd = open( - filename, - O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE, - 0644); + fd = qemu_open(filename, + O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE, + 0644); if (fd < 0) { return -errno; } @@ -1484,15 +1483,13 @@ static int vmdk_create(const char *filename, QEMUOptionParameter *options) (flags & BLOCK_FLAG_COMPAT6 ? 6 : 4), total_size / (int64_t)(63 * 16 * 512)); if (split || flat) { - fd = open( - filename, - O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE, - 0644); + fd = qemu_open(filename, + O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE, + 0644); } else { - fd = open( - filename, - O_WRONLY | O_BINARY | O_LARGEFILE, - 0644); + fd = qemu_open(filename, + O_WRONLY | O_BINARY | O_LARGEFILE, + 0644); } if (fd < 0) { return -errno; diff --git a/block/vpc.c b/block/vpc.c index 5cd13d1..60ebf5a 100644 --- a/block/vpc.c +++ b/block/vpc.c @@ -678,7 +678,7 @@ static int vpc_create(const char *filename, QEMUOptionParameter *options) } /* Create the file */ - fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 0644); + fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 0644); if (fd < 0) { return -EIO; } diff --git a/block/vvfat.c b/block/vvfat.c index 0fd3367..81c3a19 100644 --- a/block/vvfat.c +++ b/block/vvfat.c @@ -1152,16 +1152,17 @@ static inline mapping_t* find_mapping_for_cluster(BDRVVVFATState* s,int cluster_ static int open_file(BDRVVVFATState* s,mapping_t* mapping) { if(!mapping) - return -1; + return -1; if(!s->current_mapping || - strcmp(s->current_mapping->path,mapping->path)) { - /* open file */ - int fd = open(mapping->path, O_RDONLY | O_BINARY | O_LARGEFILE); - if(fd<0) - return -1; - vvfat_close_current_file(s); - s->current_fd = fd; - s->current_mapping = mapping; + strcmp(s->current_mapping->path, mapping->path)) { + /* open file */ + int fd = qemu_open(mapping->path, O_RDONLY | O_BINARY | O_LARGEFILE); + if (fd < 0) { + return -1; + } + vvfat_close_current_file(s); + s->current_fd = fd; + s->current_mapping = mapping; } return 0; } @@ -2215,7 +2216,7 @@ static int commit_one_file(BDRVVVFATState* s, for (i = s->cluster_size; i < offset; i += s->cluster_size) c = modified_fat_get(s, c); - fd = open(mapping->path, O_RDWR | O_CREAT | O_BINARY, 0666); + fd = qemu_open(mapping->path, O_RDWR | O_CREAT | O_BINARY, 0666); if (fd < 0) { fprintf(stderr, "Could not open %s... (%s, %d)\n", mapping->path, strerror(errno), errno); -- 1.7.10.2

This patch adds support to qemu_open to dup(fd) a pre-opened file descriptor if the filename is of the format /dev/fd/X. This can be used when QEMU is restricted from opening files, and the management application opens files on QEMU's behalf. If the fd was passed to the monitor with the pass-fd command, it must be explicitly closed with the 'closefd' command when it is no longer required, in order to prevent fd leaks. Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com> --- v2: -Get rid of file_open and move dup code to qemu_open (kwolf@redhat.com) -Use strtol wrapper instead of atoi (kwolf@redhat.com) v3: -Add note about fd leakage (eblake@redhat.com) v4 -Moved patch to be later in series (lcapitulino@redhat.com) -Update qemu_open to check access mode flags and set flags that can be set (eblake@redhat.com, kwolf@redhat.com) cutils.c | 26 +++++++++++++---- main-loop.c | 6 ++-- osdep.c | 91 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ qemu-common.h | 2 +- 4 files changed, 116 insertions(+), 9 deletions(-) diff --git a/cutils.c b/cutils.c index af308cd..f45d921 100644 --- a/cutils.c +++ b/cutils.c @@ -339,17 +339,33 @@ bool buffer_is_zero(const void *buf, size_t len) } #ifndef _WIN32 -/* Sets a specific flag */ -int fcntl_setfl(int fd, int flag) +/* Sets a specific flag on or off */ +int fcntl_setfl(int fd, int flag, int onoff) { int flags; + if (onoff != 0 && onoff != 1) { + return -EINVAL; + } + flags = fcntl(fd, F_GETFL); - if (flags == -1) + if (flags == -1) { return -errno; + } - if (fcntl(fd, F_SETFL, flags | flag) == -1) - return -errno; + if (onoff == 1) { + if ((flags & flag) != flag) { + if (fcntl(fd, F_SETFL, flags | flag) == -1) { + return -errno; + } + } + } else { + if ((flags & flag) == flag) { + if (fcntl(fd, F_SETFL, flags & ~flag) == -1) { + return -errno; + } + } + } return 0; } diff --git a/main-loop.c b/main-loop.c index eb3b6e6..644fcc3 100644 --- a/main-loop.c +++ b/main-loop.c @@ -75,11 +75,11 @@ static int qemu_event_init(void) if (err == -1) { return -errno; } - err = fcntl_setfl(fds[0], O_NONBLOCK); + err = fcntl_setfl(fds[0], O_NONBLOCK, 1); if (err < 0) { goto fail; } - err = fcntl_setfl(fds[1], O_NONBLOCK); + err = fcntl_setfl(fds[1], O_NONBLOCK, 1); if (err < 0) { goto fail; } @@ -154,7 +154,7 @@ static int qemu_signal_init(void) return -errno; } - fcntl_setfl(sigfd, O_NONBLOCK); + fcntl_setfl(sigfd, O_NONBLOCK, 1); qemu_set_fd_handler2(sigfd, NULL, sigfd_handler, NULL, (void *)(intptr_t)sigfd); diff --git a/osdep.c b/osdep.c index 3e6bada..a6fc758d 100644 --- a/osdep.c +++ b/osdep.c @@ -73,6 +73,63 @@ int qemu_madvise(void *addr, size_t len, int advice) #endif } +/* + * Dups an fd and sets the flags + */ +static int qemu_dup(int fd, int flags) +{ + int ret; + int serrno; + + if (flags & O_CLOEXEC) { + ret = fcntl(fd, F_DUPFD_CLOEXEC, 0); + if (ret == -1 && errno == EINVAL) { + ret = dup(fd); + if (ret == -1) { + goto fail; + } + if (fcntl_setfl(ret, O_CLOEXEC, (flags & O_CLOEXEC) ? 1 : 0) < 0) { + goto fail; + } + } + } else { + ret = dup(fd); + } + + if (ret == -1) { + goto fail; + } + + /* Truncate the file in the cases that open would truncate it */ + if (flags & O_TRUNC || + ((flags & (O_CREAT | O_EXCL)) == (O_CREAT | O_EXCL))) { + if (ftruncate(ret, 0) == -1) { + goto fail; + } + } + + if ((fcntl_setfl(ret, O_APPEND, (flags & O_APPEND) ? 1 : 0) < 0) || + (fcntl_setfl(ret, O_ASYNC, (flags & O_ASYNC) ? 1 : 0) < 0) || + (fcntl_setfl(ret, O_DIRECT, (flags & O_DIRECT) ? 1 : 0) < 0) || + (fcntl_setfl(ret, O_LARGEFILE, (flags & O_LARGEFILE) ? 1 : 0) < 0) || + (fcntl_setfl(ret, O_NDELAY, (flags & O_NDELAY) ? 1 : 0) < 0) || + (fcntl_setfl(ret, O_NOATIME, (flags & O_NOATIME) ? 1 : 0) < 0) || + (fcntl_setfl(ret, O_NOCTTY, (flags & O_NOCTTY) ? 1 : 0) < 0) || + (fcntl_setfl(ret, O_NONBLOCK, (flags & O_NONBLOCK) ? 1 : 0) < 0) || + (fcntl_setfl(ret, O_SYNC, (flags & O_SYNC) ? 1 : 0) < 0)) { + goto fail; + } + + return ret; + +fail: + serrno = errno; + if (ret != -1) { + close(ret); + } + errno = serrno; + return -1; +} /* * Opens a file with FD_CLOEXEC set @@ -82,6 +139,40 @@ int qemu_open(const char *name, int flags, ...) int ret; int mode = 0; +#ifndef _WIN32 + const char *p; + + /* Attempt dup of fd for pre-opened file */ + if (strstart(name, "/dev/fd/", &p)) { + int fd; + int eflags; + + fd = qemu_parse_fd(p); + if (fd == -1) { + return -1; + } + + /* Get the existing fd's flags */ + eflags = fcntl(fd, F_GETFL); + if (eflags == -1) { + return -1; + } + + if (((flags & O_RDWR) != (eflags & O_RDWR)) || + ((flags & O_RDONLY) != (eflags & O_RDONLY)) || + ((flags & O_WRONLY) != (eflags & O_WRONLY))) { + errno = EACCES; + return -1; + } + + if (fcntl_setfl(fd, O_CLOEXEC, 1) < 0) { + return -1; + } + + return qemu_dup(fd, flags); + } +#endif + if (flags & O_CREAT) { va_list ap; diff --git a/qemu-common.h b/qemu-common.h index 91e0562..99cbbc5 100644 --- a/qemu-common.h +++ b/qemu-common.h @@ -144,7 +144,7 @@ int qemu_strnlen(const char *s, int max_len); time_t mktimegm(struct tm *tm); int qemu_fls(int i); int qemu_fdatasync(int fd); -int fcntl_setfl(int fd, int flag); +int fcntl_setfl(int fd, int flag, int onoff); int qemu_parse_fd(const char *param); /* -- 1.7.10.2

On 06/22/2012 12:36 PM, Corey Bryant wrote:
This patch adds support to qemu_open to dup(fd) a pre-opened file descriptor if the filename is of the format /dev/fd/X.
This can be used when QEMU is restricted from opening files, and the management application opens files on QEMU's behalf.
If the fd was passed to the monitor with the pass-fd command, it must be explicitly closed with the 'closefd' command when it is no longer required, in order to prevent fd leaks.
+static int qemu_dup(int fd, int flags) +{ + int ret; + int serrno; + + if (flags & O_CLOEXEC) { + ret = fcntl(fd, F_DUPFD_CLOEXEC, 0);
F_DUPFD_CLOEXEC is required by POSIX, but not implemented on all platforms yet. Do you need to be checking with #ifdef F_DUPFD_CLOEXEC to avoid compilation failure?
+ if (ret == -1 && errno == EINVAL) { + ret = dup(fd); + if (ret == -1) { + goto fail; + } + if (fcntl_setfl(ret, O_CLOEXEC, (flags & O_CLOEXEC) ? 1 : 0) < 0) {
Broken. O_CLOEXEC _only_ affects open(); to change it on an existing fd, you have to use fcntl(F_GETFD/F_SETFD) (not F_GETFL/F_SETFL).
+ + if ((fcntl_setfl(ret, O_APPEND, (flags & O_APPEND) ? 1 : 0) < 0) || + (fcntl_setfl(ret, O_ASYNC, (flags & O_ASYNC) ? 1 : 0) < 0) || + (fcntl_setfl(ret, O_DIRECT, (flags & O_DIRECT) ? 1 : 0) < 0) || + (fcntl_setfl(ret, O_LARGEFILE, (flags & O_LARGEFILE) ? 1 : 0) < 0) ||
Pointless. O_LARGEFILE should _always_ be set, since we are compiling for 64-bit off_t always.
+ (fcntl_setfl(ret, O_NDELAY, (flags & O_NDELAY) ? 1 : 0) < 0) || + (fcntl_setfl(ret, O_NOATIME, (flags & O_NOATIME) ? 1 : 0) < 0) || + (fcntl_setfl(ret, O_NOCTTY, (flags & O_NOCTTY) ? 1 : 0) < 0) || + (fcntl_setfl(ret, O_NONBLOCK, (flags & O_NONBLOCK) ? 1 : 0) < 0) || + (fcntl_setfl(ret, O_SYNC, (flags & O_SYNC) ? 1 : 0) < 0)) {
Yuck. That's a lot of syscalls (1 per fcntl_setfl() if they are already set correctly, and 2 per fcntl_setfl() call if we are toggling each one). It might be better to combine this into at most 2 fcntl() calls, instead of a long sequence.
+ /* Get the existing fd's flags */ + eflags = fcntl(fd, F_GETFL); + if (eflags == -1) { + return -1; + } + + if (((flags & O_RDWR) != (eflags & O_RDWR)) || + ((flags & O_RDONLY) != (eflags & O_RDONLY)) || + ((flags & O_WRONLY) != (eflags & O_WRONLY))) {
Broken. O_RDWR, O_RDONLY, and O_WRONLY are NOT bitmasks, but are values in the range of O_ACCMODE. In particular, O_RDONLY==0 on some platforms (Linux), and ==1 on others (Hurd), and although POSIX recommends that O_RDWR==(O_RDONLY|O_WRONLY) for any new systems, no one has really done that except Hurd. A correct way to write this is: switch (flags & O_ACCMODE) { case O_RDWR: if ((eflags & O_ACCMODE) != O_RDWR) { goto error; break; case O_RDONLY: if ((eflags & O_ACCMODE) != O_RDONLY) { goto error; break; case O_RDONLY: if ((eflags & O_ACCMODE) != O_RDONLY) { goto error; break; default: goto error: } [Technically, POSIX also requires O_ACCMODE to include O_SEARCH and O_EXEC, although those two constants might be the same value; but right now Linux has not yet implemented that bit; but unless qemu ever gains the need to open executable binaries with O_EXEC or directories with O_SEARCH, we probably don't have to worry about that aspect of O_ACCMODE here.]
+ errno = EACCES; + return -1; + } + + if (fcntl_setfl(fd, O_CLOEXEC, 1) < 0) {
Again, broken. Besides, why are you attempting it both here and in qemu_dup()? Shouldn't once be enough?
+ return -1; + } + + return qemu_dup(fd, flags);
-- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 06/22/2012 03:58 PM, Eric Blake wrote:
On 06/22/2012 12:36 PM, Corey Bryant wrote:
This patch adds support to qemu_open to dup(fd) a pre-opened file descriptor if the filename is of the format /dev/fd/X.
This can be used when QEMU is restricted from opening files, and the management application opens files on QEMU's behalf.
If the fd was passed to the monitor with the pass-fd command, it must be explicitly closed with the 'closefd' command when it is no longer required, in order to prevent fd leaks.
+static int qemu_dup(int fd, int flags) +{ + int ret; + int serrno; + + if (flags & O_CLOEXEC) { + ret = fcntl(fd, F_DUPFD_CLOEXEC, 0);
F_DUPFD_CLOEXEC is required by POSIX, but not implemented on all platforms yet. Do you need to be checking with #ifdef F_DUPFD_CLOEXEC to avoid compilation failure?
Yes it sounds like this needs to be done. I'll fix this in v5 (as discussed in the previous email).
+ if (ret == -1 && errno == EINVAL) { + ret = dup(fd); + if (ret == -1) { + goto fail; + } + if (fcntl_setfl(ret, O_CLOEXEC, (flags & O_CLOEXEC) ? 1 : 0) < 0) {
Broken. O_CLOEXEC _only_ affects open(); to change it on an existing fd, you have to use fcntl(F_GETFD/F_SETFD) (not F_GETFL/F_SETFL).
I'll fix this in v5.
+ + if ((fcntl_setfl(ret, O_APPEND, (flags & O_APPEND) ? 1 : 0) < 0) || + (fcntl_setfl(ret, O_ASYNC, (flags & O_ASYNC) ? 1 : 0) < 0) || + (fcntl_setfl(ret, O_DIRECT, (flags & O_DIRECT) ? 1 : 0) < 0) || + (fcntl_setfl(ret, O_LARGEFILE, (flags & O_LARGEFILE) ? 1 : 0) < 0) ||
Pointless. O_LARGEFILE should _always_ be set, since we are compiling for 64-bit off_t always.
I'll remove this in v5.
+ (fcntl_setfl(ret, O_NDELAY, (flags & O_NDELAY) ? 1 : 0) < 0) || + (fcntl_setfl(ret, O_NOATIME, (flags & O_NOATIME) ? 1 : 0) < 0) || + (fcntl_setfl(ret, O_NOCTTY, (flags & O_NOCTTY) ? 1 : 0) < 0) || + (fcntl_setfl(ret, O_NONBLOCK, (flags & O_NONBLOCK) ? 1 : 0) < 0) || + (fcntl_setfl(ret, O_SYNC, (flags & O_SYNC) ? 1 : 0) < 0)) {
Yuck. That's a lot of syscalls (1 per fcntl_setfl() if they are already set correctly, and 2 per fcntl_setfl() call if we are toggling each one). It might be better to combine this into at most 2 fcntl() calls, instead of a long sequence.
I see your point. I'll call fcntl(F_GETFL) once to get the current flags, determine what needs to be set on/off, and then call fnctl(F_SETFL) once. In this case I won't be using fcntl_setfl() anymore. Do you want me to drop the fcntl_setfl() changes I made? Also, I noticed in the fnctl man page that F_SETFL: "On Linux this command can change only the O_APPEND, O_ASYNC, O_DIRECT, O_NOATIME, and O_NONBLOCK flags." So I'll only set/unset these flags.
+ /* Get the existing fd's flags */ + eflags = fcntl(fd, F_GETFL); + if (eflags == -1) { + return -1; + } + + if (((flags & O_RDWR) != (eflags & O_RDWR)) || + ((flags & O_RDONLY) != (eflags & O_RDONLY)) || + ((flags & O_WRONLY) != (eflags & O_WRONLY))) {
Broken. O_RDWR, O_RDONLY, and O_WRONLY are NOT bitmasks, but are values in the range of O_ACCMODE. In particular, O_RDONLY==0 on some platforms (Linux), and ==1 on others (Hurd), and although POSIX recommends that O_RDWR==(O_RDONLY|O_WRONLY) for any new systems, no one has really done that except Hurd.
A correct way to write this is:
switch (flags & O_ACCMODE) { case O_RDWR: if ((eflags & O_ACCMODE) != O_RDWR) { goto error; break; case O_RDONLY: if ((eflags & O_ACCMODE) != O_RDONLY) { goto error; break; case O_RDONLY: if ((eflags & O_ACCMODE) != O_RDONLY) { goto error; break; default: goto error: }
Thanks, I'll update this in v5.
[Technically, POSIX also requires O_ACCMODE to include O_SEARCH and O_EXEC, although those two constants might be the same value; but right now Linux has not yet implemented that bit; but unless qemu ever gains the need to open executable binaries with O_EXEC or directories with O_SEARCH, we probably don't have to worry about that aspect of O_ACCMODE here.]
+ errno = EACCES; + return -1; + } + + if (fcntl_setfl(fd, O_CLOEXEC, 1) < 0) {
Again, broken. Besides, why are you attempting it both here and in qemu_dup()? Shouldn't once be enough?
This is setting O_CLOEXEC for the inherited fd, and qemu_dup() is setting O_CLOEXEC for the dup'd fd, so I think this needs to remain (and use F_SETFD instead).
+ return -1; + } + + return qemu_dup(fd, flags);
-- Regards, Corey

On 06/25/2012 08:24 AM, Corey Bryant wrote:
+ if (fcntl_setfl(ret, O_CLOEXEC, (flags & O_CLOEXEC) ? 1 : 0) < 0) {
Broken. O_CLOEXEC _only_ affects open(); to change it on an existing fd, you have to use fcntl(F_GETFD/F_SETFD) (not F_GETFL/F_SETFL).
I'll fix this in v5.
Don't we already have qemu_set_cloexec() for this purpose?
I see your point. I'll call fcntl(F_GETFL) once to get the current flags, determine what needs to be set on/off, and then call fnctl(F_SETFL) once. In this case I won't be using fcntl_setfl() anymore. Do you want me to drop the fcntl_setfl() changes I made?
Or maybe make fcntl_setfl() take a mask of bits to alter? You're right that it's not worth changing if you won't take advantage of the changes.
Also, I noticed in the fnctl man page that F_SETFL: "On Linux this command can change only the O_APPEND, O_ASYNC, O_DIRECT, O_NOATIME, and O_NONBLOCK flags." So I'll only set/unset these flags.
O_NDELAY is the obsolete spelling of O_NONBLOCK; which means the only other flags in your list not supported by Linux are O_LARGEFILE (which I said was pointless), O_NOCTTY (which only has an impact at open() and not later on, so it is not worth worrying about), and O_SYNC (so for that one, you should error out if not set correctly, as the difference between O_SYNC on vs. off is significant). -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

Am 25.06.2012 16:34, schrieb Eric Blake:
Also, I noticed in the fnctl man page that F_SETFL: "On Linux this command can change only the O_APPEND, O_ASYNC, O_DIRECT, O_NOATIME, and O_NONBLOCK flags." So I'll only set/unset these flags.
O_NDELAY is the obsolete spelling of O_NONBLOCK; which means the only other flags in your list not supported by Linux are O_LARGEFILE (which I said was pointless), O_NOCTTY (which only has an impact at open() and not later on, so it is not worth worrying about), and O_SYNC (so for that one, you should error out if not set correctly, as the difference between O_SYNC on vs. off is significant).
As far as I know, newer kernels can change O_SYNC with fcntl, so it's probably worth trying, but you'd have to check if it has really been set. Kevin

On 06/25/2012 11:26 AM, Kevin Wolf wrote:
Am 25.06.2012 16:34, schrieb Eric Blake:
Also, I noticed in the fnctl man page that F_SETFL: "On Linux this command can change only the O_APPEND, O_ASYNC, O_DIRECT, O_NOATIME, and O_NONBLOCK flags." So I'll only set/unset these flags.
O_NDELAY is the obsolete spelling of O_NONBLOCK; which means the only other flags in your list not supported by Linux are O_LARGEFILE (which I said was pointless), O_NOCTTY (which only has an impact at open() and not later on, so it is not worth worrying about), and O_SYNC (so for that one, you should error out if not set correctly, as the difference between O_SYNC on vs. off is significant).
As far as I know, newer kernels can change O_SYNC with fcntl, so it's probably worth trying, but you'd have to check if it has really been set.
Kevin
Ok I'll test it out and change O_SYNC if possible. Thanks. -- Regards, Corey

On 06/25/2012 10:34 AM, Eric Blake wrote:
On 06/25/2012 08:24 AM, Corey Bryant wrote:
+ if (fcntl_setfl(ret, O_CLOEXEC, (flags & O_CLOEXEC) ? 1 : 0) < 0) {
Broken. O_CLOEXEC _only_ affects open(); to change it on an existing fd, you have to use fcntl(F_GETFD/F_SETFD) (not F_GETFL/F_SETFL).
I'll fix this in v5.
Don't we already have qemu_set_cloexec() for this purpose?
Yes, it looks that way. I'll use qemu_set_cloexec().
I see your point. I'll call fcntl(F_GETFL) once to get the current flags, determine what needs to be set on/off, and then call fnctl(F_SETFL) once. In this case I won't be using fcntl_setfl() anymore. Do you want me to drop the fcntl_setfl() changes I made?
Or maybe make fcntl_setfl() take a mask of bits to alter? You're right that it's not worth changing if you won't take advantage of the changes.
Good idea. I'll modify fcntl_setfl() to take a mask of bits to turn on or off.
Also, I noticed in the fnctl man page that F_SETFL: "On Linux this command can change only the O_APPEND, O_ASYNC, O_DIRECT, O_NOATIME, and O_NONBLOCK flags." So I'll only set/unset these flags.
O_NDELAY is the obsolete spelling of O_NONBLOCK; which means the only other flags in your list not supported by Linux are O_LARGEFILE (which I said was pointless), O_NOCTTY (which only has an impact at open() and not later on, so it is not worth worrying about), and O_SYNC (so for that one, you should error out if not set correctly, as the difference between O_SYNC on vs. off is significant).
Ok I'll take this into account. Thanks. -- Regards, Corey

On 06/25/2012 10:34 AM, Eric Blake wrote:
On 06/25/2012 08:24 AM, Corey Bryant wrote:
+ if (fcntl_setfl(ret, O_CLOEXEC, (flags & O_CLOEXEC) ? 1 : 0) < 0) {
Broken. O_CLOEXEC _only_ affects open(); to change it on an existing fd, you have to use fcntl(F_GETFD/F_SETFD) (not F_GETFL/F_SETFL).
I'll fix this in v5.
Don't we already have qemu_set_cloexec() for this purpose?
Yes, it looks that way. I'll use qemu_set_cloexec().
I see your point. I'll call fcntl(F_GETFL) once to get the current flags, determine what needs to be set on/off, and then call fnctl(F_SETFL) once. In this case I won't be using fcntl_setfl() anymore. Do you want me to drop the fcntl_setfl() changes I made?
Or maybe make fcntl_setfl() take a mask of bits to alter? You're right that it's not worth changing if you won't take advantage of the changes.
Good idea. I'll modify fcntl_setfl() to take a mask of bits to turn on or off.
Also, I noticed in the fnctl man page that F_SETFL: "On Linux this command can change only the O_APPEND, O_ASYNC, O_DIRECT, O_NOATIME, and O_NONBLOCK flags." So I'll only set/unset these flags.
O_NDELAY is the obsolete spelling of O_NONBLOCK; which means the only other flags in your list not supported by Linux are O_LARGEFILE (which I said was pointless), O_NOCTTY (which only has an impact at open() and not later on, so it is not worth worrying about), and O_SYNC (so for that one, you should error out if not set correctly, as the difference between O_SYNC on vs. off is significant).
Ok I'll take this into account. Thanks. -- Regards, Corey

On Fri, Jun 22, 2012 at 02:36:07PM -0400, Corey Bryant wrote:
libvirt's sVirt security driver provides SELinux MAC isolation for Qemu guest processes and their corresponding image files. In other words, sVirt uses SELinux to prevent a QEMU process from opening files that do not belong to it.
sVirt provides this support by labeling guests and resources with security labels that are stored in file system extended attributes. Some file systems, such as NFS, do not support the extended attribute security namespace, and therefore cannot support sVirt isolation.
A solution to this problem is to provide fd passing support, where libvirt opens files and passes file descriptors to QEMU. This, along with SELinux policy to prevent QEMU from opening files, can provide image file isolation for NFS files stored on the same NFS mount.
This patch series adds the pass-fd QMP monitor command, which allows an fd to be passed via SCM_RIGHTS, and returns the received file descriptor. Support is also added to the block layer to allow QEMU to dup the fd when the filename is of the /dev/fd/X format. This is useful if MAC policy prevents QEMU from opening specific types of files.
I was thinking about some of the sources complexity when using FD passing from libvirt and wanted to raise one idea for discussion before we continue. With this proposed series, we have usage akin to: 1. pass_fd FDSET={M} -> returns a string "/dev/fd/N" showing QEMU's view of the FD 2. drive_add file=/dev/fd/N 3. if failure: close_fd "/dev/fd/N" My problem is that none of this FD passing is "transactional". If libvirtd crashes or otherwise fails between steps 1 & 2, a FD is left open in QEMU. If libvirtd gets the failure detection wrong in step 2 (eg sees a I/O failure on the monitor, but from QEMU's pov drive_add succeeed), we could end up telling QEMU to close an FD that it is still using for a drive. Likewise if libvirtd fails/crashes between steps 2 & 3 we might not clean up after failure. These aren't new problems with pass_fd - they existed with getfd too of course. If we were designing this interface with no regard for the historical practice in QEMU, then I feel like we would not even bother to have either 'pass_fd' or 'getfd'. We would pass the FD(s) directly with the "drive_add" command. Given that we have decided that attaching special semantics to filenames matching "/dev/fd/N" is OK, then I feel we could go one better, and allow the FD to be passed with the "drive_add" (or other) commands directly. All we need do is define slightly different semantics for "/dev/fd/N". Instead of it meaning "use the process FD numbered N", we can define it to mean "use the n'th FD set in the current context". The "context" would be populated with all FDs received with the monitor current command. So now from a client's POV you'd have a flow like * drive_add "file=/dev/fd/N" FDSET={N} And in QEMU you'd have something like * handle_monitor_command - recvmsg all FDs, and stash them in a thread local "FDContext" context - invoke monitor command handler - Sees file=/dev/fd/N - Fetch /dev/fd/N from "FDContext" - If success remove /dev/fd/N from "FDContext" - close() all FDs left in "FDContext" The key point with this is that because the FDs are directly associated with a monitor command, QEMU can /guarantee/ that FDs are never leaked, regardless of client behaviour. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Am 26.06.2012 11:10, schrieb Daniel P. Berrange:
I was thinking about some of the sources complexity when using FD passing from libvirt and wanted to raise one idea for discussion before we continue.
With this proposed series, we have usage akin to:
1. pass_fd FDSET={M} -> returns a string "/dev/fd/N" showing QEMU's view of the FD 2. drive_add file=/dev/fd/N 3. if failure: close_fd "/dev/fd/N"
In fact, there are more steps: 4. use it successfully 5. close_fd "/dev/fd/N" I think it would well be possible that qemu just closes the fd when it's not used internally any more. The point of close_fd is more that libvirt still keeps a handle and can make use of this fd. One example where this is necessary is reopening the FD of a backing file in order to perform a live commit operation. Kevin

On Tue, 26 Jun 2012 13:45:52 +0200 Kevin Wolf <kwolf@redhat.com> wrote:
Am 26.06.2012 11:10, schrieb Daniel P. Berrange:
I was thinking about some of the sources complexity when using FD passing from libvirt and wanted to raise one idea for discussion before we continue.
With this proposed series, we have usage akin to:
1. pass_fd FDSET={M} -> returns a string "/dev/fd/N" showing QEMU's view of the FD 2. drive_add file=/dev/fd/N 3. if failure: close_fd "/dev/fd/N"
In fact, there are more steps:
4. use it successfully 5. close_fd "/dev/fd/N"
I think it would well be possible that qemu just closes the fd when it's not used internally any more.
pass-fd could have a flag indicating qemu to do that.

On 06/26/2012 04:50 PM, Luiz Capitulino wrote:
On Tue, 26 Jun 2012 13:45:52 +0200 Kevin Wolf <kwolf@redhat.com> wrote:
Am 26.06.2012 11:10, schrieb Daniel P. Berrange:
I was thinking about some of the sources complexity when using FD passing from libvirt and wanted to raise one idea for discussion before we continue.
With this proposed series, we have usage akin to:
1. pass_fd FDSET={M} -> returns a string "/dev/fd/N" showing QEMU's view of the FD 2. drive_add file=/dev/fd/N 3. if failure: close_fd "/dev/fd/N"
In fact, there are more steps:
4. use it successfully 5. close_fd "/dev/fd/N"
I think it would well be possible that qemu just closes the fd when it's not used internally any more.
pass-fd could have a flag indicating qemu to do that.
It seems like libvirt would be in a better position to understand when a file is no longer in use, and then it can call close_fd. No? Of course the the only fd that needs to be closed is the originally passed fd. The dup'd fd's are closed by QEMU. -- Regards, Corey

On 06/26/2012 04:28 PM, Corey Bryant wrote:
With this proposed series, we have usage akin to:
1. pass_fd FDSET={M} -> returns a string "/dev/fd/N" showing QEMU's view of the FD 2. drive_add file=/dev/fd/N 3. if failure: close_fd "/dev/fd/N"
In fact, there are more steps:
4. use it successfully 5. close_fd "/dev/fd/N"
I think it would well be possible that qemu just closes the fd when it's not used internally any more.
pass-fd could have a flag indicating qemu to do that.
It seems like libvirt would be in a better position to understand when a file is no longer in use, and then it can call close_fd. No? Of course the the only fd that needs to be closed is the originally passed fd. The dup'd fd's are closed by QEMU.
The more we discuss this, the more I'm convinced that commands like 'block-commit' that will reopen a file to change from O_RDONLY to O_RDWR will need to have an optional argument that says the name of the file to reopen. That is, I've seen three proposals: Proposal One: enhance every command to take an fd:name protocol. PRO: Successful use of a name removes the fd from the 'getfd' list. CON: Lots of commands to change. CON: The command sequence is not atomic. CON: Block layer does not have a file name tied to the fd, so the reopen operation also needs to learn the fd:name protocol, but since we're already touching the command it's not much more work. USAGE: 1. getfd FDSET={M}, ties new fd to "name" 2. drive_add fd=name - looks up the fd by name from the list 3a. on success, fd is no longer in the list, drive_add consumed it 3b. on failure, libvirt must call 'closefd name' to avoid a leak 4. getfd FDSET={M2}, ties new fd to "newname" 5. block-commit fd=newname - looks up fd by name from the list 6a. on success, fd is no longer in the list, block-commit consumed it 6b. on failure, libvirt must call 'closefd newname' to avoid a leak Proposal Two: Create a permanent name via 'getfd' or 'pass-fd', then update that name to the new fd in advance of any operation that needs to do a reopen. PRO: All other commands work without impact by using qemu_open(), by getting a clone of the permanent name. CON: The permanent name must remain open as long as qemu might need to reopen it, and reopening needs the pass-fd force option. CON: The command sequence is not atomic. USAGE: 1. pass_fd FDSET={M} -> returns an integer N (or a string "/dev/fd/N") showing QEMU's permanent name of the FD 2. drive_add file=<permanent name> means that qemu_open() will clone the fd, and drive_add is now using yet another FD while N remains open; meanwhile, the block layer knows that the drive is tied to the permanent name 3. pass-fd force FDSET={M2} -> replaces fd N with the passed M2, and still returns /dev/fd/N 4. block-commit - looks up the block-device name (/dev/fd/N), which maps back to the permanent fd, and gets a copy of the newly passed M2 5. on completion (success or failure), libvirt must call 'closefd name' to avoid a leak Proposal Three: Use thread-local fds passed alongside any command, rather than passing via a separate command PRO: All commands can now atomically handle fd passing PRO: Commands that only need a one-shot open can now use fd CON: Commands that need to reopen still need modification to allow a name override during the reopen 1. drive_add nfds=1 file="/dev/fdlist/1" FDSET={M} -> on success, the fd is used as the block file, on failure it is atomically closed, so there is no leak either way. However, the block file has no permanent name. 2. block-commit nfds=1 file-override="/dev/fdlist/1" FDSET={M2} -> file override must be passed again (since we have no guarantee that the /dev/fdlist/1 of _this_ command matches the command-local naming used in the previous command), but the fd is again used atomically Under proposal 3, there is no need to dup fd's, merely only to check that qemu_open("/dev/fdlist/n", flags) has compatible flags with the fd received via FDSET. And the fact that things are made atomic means that libvirt no longer has to worry about calling closefd, nor does it have to worry about being interrupted between two monitor commands and not knowing what state qemu is in. But it does mean teaching every possible command that wants to do a reopen to learn how to use fd overrides rather than to reuse the permanent name that was originally in place on the first open. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

Am 27.06.2012 00:54, schrieb Eric Blake:
It seems like libvirt would be in a better position to understand when a file is no longer in use, and then it can call close_fd. No? Of course the the only fd that needs to be closed is the originally passed fd. The dup'd fd's are closed by QEMU.
The more we discuss this, the more I'm convinced that commands like 'block-commit' that will reopen a file to change from O_RDONLY to O_RDWR will need to have an optional argument that says the name of the file to reopen. That is, I've seen three proposals:
Thanks for the summary. In fact, after having read this, I start thinking that we're oversimplifying things because we're only thinking about one use case, block-commit. There are more: Live snapshot doesn't only put a new image on top, it must also make the old top-level image read-only. This isn't bad per se, but it shows that QMP commands can easily become bloated if you decide to change every command. The really bad case that nobody thought of is that, when block-commit has finished, we need to switch back to read-only. This is an event that is not triggered by a certain monitor command, but that comes from inside qemu. I'm almost sure that we'll see more of this as we add more asynchronous commands. This only works if we can pass the new file descriptor in advance. It would work nicely if you go with pass-fd and actually maintain a list of file descriptors for each /dev/fd/N, along with the different flags the file descriptors are meant for. I can't see how it would work with the temporary /dev/fdlist/N or the fd:name approach because they both imply that the original file descriptors are closed by the time that the QMP command returns. Kevin

On Wed, Jun 27, 2012 at 11:16:32AM +0200, Kevin Wolf wrote:
Am 27.06.2012 00:54, schrieb Eric Blake:
It seems like libvirt would be in a better position to understand when a file is no longer in use, and then it can call close_fd. No? Of course the the only fd that needs to be closed is the originally passed fd. The dup'd fd's are closed by QEMU.
The more we discuss this, the more I'm convinced that commands like 'block-commit' that will reopen a file to change from O_RDONLY to O_RDWR will need to have an optional argument that says the name of the file to reopen. That is, I've seen three proposals:
Thanks for the summary. In fact, after having read this, I start thinking that we're oversimplifying things because we're only thinking about one use case, block-commit.
There are more: Live snapshot doesn't only put a new image on top, it must also make the old top-level image read-only. This isn't bad per se, but it shows that QMP commands can easily become bloated if you decide to change every command.
The really bad case that nobody thought of is that, when block-commit has finished, we need to switch back to read-only. This is an event that is not triggered by a certain monitor command, but that comes from inside qemu. I'm almost sure that we'll see more of this as we add more asynchronous commands.
This only works if we can pass the new file descriptor in advance. It would work nicely if you go with pass-fd and actually maintain a list of file descriptors for each /dev/fd/N, along with the different flags the file descriptors are meant for. I can't see how it would work with the temporary /dev/fdlist/N or the fd:name approach because they both imply that the original file descriptors are closed by the time that the QMP command returns.
What I don't understand though is that when you're "reopening" FDs, with the pass-fd command approach, you're merely dup'ing the original FDs that was passed in from the client. Why can't you alternatively just dup the FD you already have. I don't see why we need to keep the original FD around forever. If the QMP command handler nees the temporary /dev/fdlist/N file to exist for longer than the duration of the command, it can simply dup() it to get a permanent copy of its own. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Am 27.06.2012 11:20, schrieb Daniel P. Berrange:
On Wed, Jun 27, 2012 at 11:16:32AM +0200, Kevin Wolf wrote:
The really bad case that nobody thought of is that, when block-commit has finished, we need to switch back to read-only. This is an event that is not triggered by a certain monitor command, but that comes from inside qemu. I'm almost sure that we'll see more of this as we add more asynchronous commands.
This only works if we can pass the new file descriptor in advance. It would work nicely if you go with pass-fd and actually maintain a list of file descriptors for each /dev/fd/N, along with the different flags the file descriptors are meant for. I can't see how it would work with the temporary /dev/fdlist/N or the fd:name approach because they both imply that the original file descriptors are closed by the time that the QMP command returns.
What I don't understand though is that when you're "reopening" FDs, with the pass-fd command approach, you're merely dup'ing the original FDs that was passed in from the client. Why can't you alternatively just dup the FD you already have.
That's easy: Because I don't know that I'm dealing with an FD. I think originally the whole point in this /dev/fd/ thing was that it would transparently enable the functionality for all parts of qemu: The block layer, of course, but also new char devices, migration targets, screenshot files or whatever else can deal with files. In this design the block layer doesn't even know that it's not a regular file. If we now decided to go for a design where the /dev/fd path isn't enough, but changes are needed to each subsystem where it should work, then this point doesn't apply any more and I cast my vote for abandoning everything starting with /dev/ and introducing a proper fd: protocol in the block layer, so that the block layer at least has a chance to know that it has to do treat this file specially.
I don't see why we need to keep the original FD around forever. If the QMP command handler nees the temporary /dev/fdlist/N file to exist for longer than the duration of the command, it can simply dup() it to get a permanent copy of its own.
If we're not going to make it transparent and if we don't care about QMP command bloat, we can choose completely different designs, yes. And then qemu could probably try to internally work around a suboptimal API. Kevin

On 06/27/2012 05:51 AM, Kevin Wolf wrote:
Am 27.06.2012 11:20, schrieb Daniel P. Berrange:
On Wed, Jun 27, 2012 at 11:16:32AM +0200, Kevin Wolf wrote:
The really bad case that nobody thought of is that, when block-commit has finished, we need to switch back to read-only. This is an event that is not triggered by a certain monitor command, but that comes from inside qemu. I'm almost sure that we'll see more of this as we add more asynchronous commands.
This only works if we can pass the new file descriptor in advance. It would work nicely if you go with pass-fd and actually maintain a list of file descriptors for each /dev/fd/N, along with the different flags the file descriptors are meant for. I can't see how it would work with the temporary /dev/fdlist/N or the fd:name approach because they both imply that the original file descriptors are closed by the time that the QMP command returns.
What I don't understand though is that when you're "reopening" FDs, with the pass-fd command approach, you're merely dup'ing the original FDs that was passed in from the client. Why can't you alternatively just dup the FD you already have.
That's easy: Because I don't know that I'm dealing with an FD. I think originally the whole point in this /dev/fd/ thing was that it would transparently enable the functionality for all parts of qemu: The block layer, of course, but also new char devices, migration targets, screenshot files or whatever else can deal with files. In this design the block layer doesn't even know that it's not a regular file.
If we now decided to go for a design where the /dev/fd path isn't enough, but changes are needed to each subsystem where it should work, then this point doesn't apply any more and I cast my vote for abandoning everything starting with /dev/ and introducing a proper fd: protocol in the block layer, so that the block layer at least has a chance to know that it has to do treat this file specially.
I don't see why we need to keep the original FD around forever. If the QMP command handler nees the temporary /dev/fdlist/N file to exist for longer than the duration of the command, it can simply dup() it to get a permanent copy of its own.
If we're not going to make it transparent and if we don't care about QMP command bloat, we can choose completely different designs, yes. And then qemu could probably try to internally work around a suboptimal API.
Kevin
Are there any more thoughts on how to move forward with this? It seems like we're at a stand-still with the discussion. -- Regards, Corey

On 06/26/2012 06:54 PM, Eric Blake wrote:
On 06/26/2012 04:28 PM, Corey Bryant wrote:
With this proposed series, we have usage akin to:
1. pass_fd FDSET={M} -> returns a string "/dev/fd/N" showing QEMU's view of the FD 2. drive_add file=/dev/fd/N 3. if failure: close_fd "/dev/fd/N"
In fact, there are more steps:
4. use it successfully 5. close_fd "/dev/fd/N"
I think it would well be possible that qemu just closes the fd when it's not used internally any more.
pass-fd could have a flag indicating qemu to do that.
It seems like libvirt would be in a better position to understand when a file is no longer in use, and then it can call close_fd. No? Of course the the only fd that needs to be closed is the originally passed fd. The dup'd fd's are closed by QEMU.
The more we discuss this, the more I'm convinced that commands like 'block-commit' that will reopen a file to change from O_RDONLY to O_RDWR will need to have an optional argument that says the name of the file to reopen. That is, I've seen three proposals:
Proposal One: enhance every command to take an fd:name protocol. PRO: Successful use of a name removes the fd from the 'getfd' list. CON: Lots of commands to change. CON: The command sequence is not atomic. CON: Block layer does not have a file name tied to the fd, so the reopen operation also needs to learn the fd:name protocol, but since we're already touching the command it's not much more work. USAGE: 1. getfd FDSET={M}, ties new fd to "name" 2. drive_add fd=name - looks up the fd by name from the list 3a. on success, fd is no longer in the list, drive_add consumed it 3b. on failure, libvirt must call 'closefd name' to avoid a leak 4. getfd FDSET={M2}, ties new fd to "newname" 5. block-commit fd=newname - looks up fd by name from the list 6a. on success, fd is no longer in the list, block-commit consumed it 6b. on failure, libvirt must call 'closefd newname' to avoid a leak
Proposal Two: Create a permanent name via 'getfd' or 'pass-fd', then update that name to the new fd in advance of any operation that needs to do a reopen. PRO: All other commands work without impact by using qemu_open(), by getting a clone of the permanent name. CON: The permanent name must remain open as long as qemu might need to reopen it, and reopening needs the pass-fd force option. CON: The command sequence is not atomic. USAGE: 1. pass_fd FDSET={M} -> returns an integer N (or a string "/dev/fd/N") showing QEMU's permanent name of the FD 2. drive_add file=<permanent name> means that qemu_open() will clone the fd, and drive_add is now using yet another FD while N remains open; meanwhile, the block layer knows that the drive is tied to the permanent name 3. pass-fd force FDSET={M2} -> replaces fd N with the passed M2, and still returns /dev/fd/N 4. block-commit - looks up the block-device name (/dev/fd/N), which maps back to the permanent fd, and gets a copy of the newly passed M2 5. on completion (success or failure), libvirt must call 'closefd name' to avoid a leak
Proposal Three: Use thread-local fds passed alongside any command, rather than passing via a separate command PRO: All commands can now atomically handle fd passing PRO: Commands that only need a one-shot open can now use fd CON: Commands that need to reopen still need modification to allow a name override during the reopen 1. drive_add nfds=1 file="/dev/fdlist/1" FDSET={M} -> on success, the fd is used as the block file, on failure it is atomically closed, so there is no leak either way. However, the block file has no permanent name. 2. block-commit nfds=1 file-override="/dev/fdlist/1" FDSET={M2} -> file override must be passed again (since we have no guarantee that the /dev/fdlist/1 of _this_ command matches the command-local naming used in the previous command), but the fd is again used atomically
Under proposal 3, there is no need to dup fd's, merely only to check that qemu_open("/dev/fdlist/n", flags) has compatible flags with the fd received via FDSET. And the fact that things are made atomic means that libvirt no longer has to worry about calling closefd, nor does it have to worry about being interrupted between two monitor commands and not knowing what state qemu is in. But it does mean teaching every possible command that wants to do a reopen to learn how to use fd overrides rather than to reuse the permanent name that was originally in place on the first open.
Here's another option that Kevin and I discussed today on IRC. I've modified a few minor details since the discussion. And Kevin please correct me if anything is wrong. Proposal Four: Pass a set of fds via 'pass-fds'. The group of fds should all refer to the same file, but may have different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags. pass-fds: { 'command': 'pass-fds', 'data': {'fdsetname': 'str', '*close': 'bool'}, 'returns': 'string' } close-fds: { 'command': 'close-fds', 'data': {'fdsetname': 'str' } where: @fdsetname - the file descriptor set name @close - *if true QEMU closes the monitor fds when they expire. if false, fds remain on the monitor until close-fds command. PRO: Supports reopen PRO: All other commands work without impact by using qemu_open() PRO: No fd leakage if close==true specified CON: Determining when to close fds when close==true may be tricky USAGE: 1. pass-fd FDSET={M} -> returns a string "/dev/fdset/1") that refers to the passed set of fds. 2. drive_add file=/dev/fdset/1 -- qemu_open() uses the first fd from the list that has access flags matching the qemu_open() action flags. 3. block-commit -- qemu_open() reopens "/dev/fdset/1" by using the first fd from the list that has access flags matching the qemu_open() action flags. 4. The monitor fds are closed: A) *If @close == true, qemu closes the set of fds when the timer expires B) If @close == false, libvirt must issue close-fds command to close the set of fds *How to solve this has yet to be determined. Kevin mentioned potentially using a bottom-half or a timer in qemu_close(). -- Regards, Corey

On 07/02/2012 04:02 PM, Corey Bryant wrote:
Here's another option that Kevin and I discussed today on IRC. I've modified a few minor details since the discussion. And Kevin please correct me if anything is wrong.
Proposal Four: Pass a set of fds via 'pass-fds'. The group of fds should all refer to the same file, but may have different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags.
But this means that libvirt has to open a file O_RDWR up front for any file that it _might_ need qemu to reopen later, and that qemu is now hanging on to 2 fds per fdset instead of 1 fd for the life of any client of the fdset. I see no reason why libvirt can't pass in an O_RDWR fd when qemu only needs to use an O_RDONLY fd; making qemu insist on an O_RDONLY fd makes life harder for libvirt. On the other hand, I can see from a safety standpoint that passing in an O_RDWR fd opens up more potential for errors than passing in an O_RDONLY fd, but if you don't know up front whether you will ever need to write into a file, then it would be better to pass in O_RDONLY. At least I don't see it as a security risk: passing in O_RDWR but setting SELinux permissions on the fd to only allow read() but not write() via the labeling of the fd may be possible, so that libvirt could still prevent accidental writes into an O_RDWR file that starts life only needing to service reads.
pass-fds: { 'command': 'pass-fds', 'data': {'fdsetname': 'str', '*close': 'bool'}, 'returns': 'string' }
This still doesn't solve Dan's complaint that things are not atomic; if the monitor dies between the pass-fds and the later use of the fdset, we would need a counterpart monitor command to query what fdsets are currently known by qemu. And like you pointed out, you didn't make it clear how a timeout mechanism would be implemented to auto-close fds that are not consumed in a fixed amount of time - would that be another optional parameter to 'pass-fds'? Or do we need a way to initially create a set with only one O_RDONLY fd, but later pass in a new O_RDWR fd but associate it with the existing set rather than creating a new set (akin to the 'force' option of 'pass-fd' in proposal two)?
close-fds: { 'command': 'close-fds', 'data': {'fdsetname': 'str' } where: @fdsetname - the file descriptor set name @close - *if true QEMU closes the monitor fds when they expire. if false, fds remain on the monitor until close-fds command. PRO: Supports reopen PRO: All other commands work without impact by using qemu_open() PRO: No fd leakage if close==true specified CON: Determining when to close fds when close==true may be tricky USAGE: 1. pass-fd FDSET={M} -> returns a string "/dev/fdset/1") that refers to the passed set of fds. 2. drive_add file=/dev/fdset/1 -- qemu_open() uses the first fd from the list that has access flags matching the qemu_open() action flags. 3. block-commit -- qemu_open() reopens "/dev/fdset/1" by using the first fd from the list that has access flags matching the qemu_open() action flags. 4. The monitor fds are closed: A) *If @close == true, qemu closes the set of fds when the timer expires B) If @close == false, libvirt must issue close-fds command to close the set of fds
*How to solve this has yet to be determined. Kevin mentioned potentially using a bottom-half or a timer in qemu_close().
Still, it is certainly an option worth thinking about, and I'm hoping we can come to a solid design that everyone agrees provides the desired security and flexibility without having to recode every single existing command. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Mon, Jul 02, 2012 at 04:31:09PM -0600, Eric Blake wrote:
On 07/02/2012 04:02 PM, Corey Bryant wrote:
Here's another option that Kevin and I discussed today on IRC. I've modified a few minor details since the discussion. And Kevin please correct me if anything is wrong.
Proposal Four: Pass a set of fds via 'pass-fds'. The group of fds should all refer to the same file, but may have different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags.
But this means that libvirt has to open a file O_RDWR up front for any file that it _might_ need qemu to reopen later, and that qemu is now hanging on to 2 fds per fdset instead of 1 fd for the life of any client of the fdset.
I see no reason why libvirt can't pass in an O_RDWR fd when qemu only needs to use an O_RDONLY fd;
If libvirt has only granted read-only access to the file with sVirt, then passing a O_RDWR file handle to QEMU will result in an SELinux denial, even if QEMU doesn't try to do I/O on it. So this is out of the question. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Am 03.07.2012 00:31, schrieb Eric Blake:
On 07/02/2012 04:02 PM, Corey Bryant wrote:
Here's another option that Kevin and I discussed today on IRC. I've modified a few minor details since the discussion. And Kevin please correct me if anything is wrong.
Proposal Four: Pass a set of fds via 'pass-fds'. The group of fds should all refer to the same file, but may have different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags.
But this means that libvirt has to open a file O_RDWR up front for any file that it _might_ need qemu to reopen later, and that qemu is now hanging on to 2 fds per fdset instead of 1 fd for the life of any client of the fdset.
It doesn't have to be up front. There's no reason not to have monitor commands that add or remove fds from a given fdset later.
I see no reason why libvirt can't pass in an O_RDWR fd when qemu only needs to use an O_RDONLY fd; making qemu insist on an O_RDONLY fd makes life harder for libvirt. On the other hand, I can see from a safety standpoint that passing in an O_RDWR fd opens up more potential for errors than passing in an O_RDONLY fd,
Yes, this is exactly my consideration. Having a read-only file opened as O_RDWR gives us the chance to make stupid mistakes.
but if you don't know up front whether you will ever need to write into a file, then it would be better to pass in O_RDONLY. At least I don't see it as a security risk: passing in O_RDWR but setting SELinux permissions on the fd to only allow read() but not write() via the labeling of the fd may be possible, so that libvirt could still prevent accidental writes into an O_RDWR file that starts life only needing to service reads.
But this would assume that libvirt knows exactly when a reopen happens, for each current and future qemu version. I wouldn't want to tie qemu's internals so closely to the management application, even if libvirt could probably get it reasonably right (allow rw before sending a monitor command; revoke rw when receiving a QMP event that the commit has completed). It wouldn't work if we had a qemu-initiated ro->rw transition, but I don't think we have one at the moment.
pass-fds: { 'command': 'pass-fds', 'data': {'fdsetname': 'str', '*close': 'bool'}, 'returns': 'string' }
This still doesn't solve Dan's complaint that things are not atomic; if the monitor dies between the pass-fds and the later use of the fdset, we would need a counterpart monitor command to query what fdsets are currently known by qemu.
If you want a query-fdsets, that should be easy enough. Actually, we might be able to even make the command transactionable. This would of course require blockdev-add to be transactionable as well to be of any use.
And like you pointed out, you didn't make it clear how a timeout mechanism would be implemented to auto-close fds that are not consumed in a fixed amount of time - would that be another optional parameter to 'pass-fds'?
Do you think a timeout is helpful? Can't we just drop libvirt's reference automatically if the monitor connection gets closed? The BH/timer thing Corey mentioned is more about the qemu internal problem that during a reopen there may be a short period where the old fd is closed, but the new one not yet opened, so the fdset might need to survive a short period with refcount 0.
Or do we need a way to initially create a set with only one O_RDONLY fd, but later pass in a new O_RDWR fd but associate it with the existing set rather than creating a new set (akin to the 'force' option of 'pass-fd' in proposal two)?
As I said above, yes, I think this makes a lot sense. Kevin

On 07/02/2012 06:31 PM, Eric Blake wrote:
On 07/02/2012 04:02 PM, Corey Bryant wrote:
Here's another option that Kevin and I discussed today on IRC. I've modified a few minor details since the discussion. And Kevin please correct me if anything is wrong.
Proposal Four: Pass a set of fds via 'pass-fds'. The group of fds should all refer to the same file, but may have different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags.
But this means that libvirt has to open a file O_RDWR up front for any file that it _might_ need qemu to reopen later, and that qemu is now hanging on to 2 fds per fdset instead of 1 fd for the life of any client of the fdset.
I see no reason why libvirt can't pass in an O_RDWR fd when qemu only needs to use an O_RDONLY fd; making qemu insist on an O_RDONLY fd makes life harder for libvirt. On the other hand, I can see from a safety standpoint that passing in an O_RDWR fd opens up more potential for errors than passing in an O_RDONLY fd, but if you don't know up front whether you will ever need to write into a file, then it would be better to pass in O_RDONLY. At least I don't see it as a security risk: passing in O_RDWR but setting SELinux permissions on the fd to only allow read() but not write() via the labeling of the fd may be possible, so that libvirt could still prevent accidental writes into an O_RDWR file that starts life only needing to service reads.
pass-fds: { 'command': 'pass-fds', 'data': {'fdsetname': 'str', '*close': 'bool'}, 'returns': 'string' }
This still doesn't solve Dan's complaint that things are not atomic; if the monitor dies between the pass-fds and the later use of the fdset, we would need a counterpart monitor command to query what fdsets are currently known by qemu. And like you pointed out, you didn't make it clear how a timeout mechanism would be implemented to auto-close fds that are not consumed in a fixed amount of time - would that be another optional parameter to 'pass-fds'?
Yes see the description of the close bool on pass-fds below which determines how the fds are closed. It's not an atomic operation, but it does handle Dan's concern of preventing fd leaks. If libvirt did crash after the pass-fd, then couldn't it just abandon the fd set (which will get closed) and libvirt could send a new fd set and work with that?
Or do we need a way to initially create a set with only one O_RDONLY fd, but later pass in a new O_RDWR fd but associate it with the existing set rather than creating a new set (akin to the 'force' option of 'pass-fd' in proposal two)?
Yeah I see what you're saying. I was also wondering if it would make sense for qemu to consume the passed fds and use those (ie. no dup()). This would prevent the issues of having to close the fds that linger on the monitor. But I don't know if it's realistic for libvirt to know how many open calls qemu will need to make (this would need 1 fd passed per open call).
close-fds: { 'command': 'close-fds', 'data': {'fdsetname': 'str' } where: @fdsetname - the file descriptor set name @close - *if true QEMU closes the monitor fds when they expire. if false, fds remain on the monitor until close-fds command. PRO: Supports reopen PRO: All other commands work without impact by using qemu_open() PRO: No fd leakage if close==true specified CON: Determining when to close fds when close==true may be tricky USAGE: 1. pass-fd FDSET={M} -> returns a string "/dev/fdset/1") that refers to the passed set of fds. 2. drive_add file=/dev/fdset/1 -- qemu_open() uses the first fd from the list that has access flags matching the qemu_open() action flags. 3. block-commit -- qemu_open() reopens "/dev/fdset/1" by using the first fd from the list that has access flags matching the qemu_open() action flags. 4. The monitor fds are closed: A) *If @close == true, qemu closes the set of fds when the timer expires B) If @close == false, libvirt must issue close-fds command to close the set of fds
*How to solve this has yet to be determined. Kevin mentioned potentially using a bottom-half or a timer in qemu_close().
Still, it is certainly an option worth thinking about, and I'm hoping we can come to a solid design that everyone agrees provides the desired security and flexibility without having to recode every single existing command.
I agree. -- Regards, Corey

On 07/02/2012 06:02 PM, Corey Bryant wrote:
On 06/26/2012 06:54 PM, Eric Blake wrote:
On 06/26/2012 04:28 PM, Corey Bryant wrote:
With this proposed series, we have usage akin to:
1. pass_fd FDSET={M} -> returns a string "/dev/fd/N" showing QEMU's view of the FD 2. drive_add file=/dev/fd/N 3. if failure: close_fd "/dev/fd/N"
In fact, there are more steps:
4. use it successfully 5. close_fd "/dev/fd/N"
I think it would well be possible that qemu just closes the fd when it's not used internally any more.
pass-fd could have a flag indicating qemu to do that.
It seems like libvirt would be in a better position to understand when a file is no longer in use, and then it can call close_fd. No? Of course the the only fd that needs to be closed is the originally passed fd. The dup'd fd's are closed by QEMU.
The more we discuss this, the more I'm convinced that commands like 'block-commit' that will reopen a file to change from O_RDONLY to O_RDWR will need to have an optional argument that says the name of the file to reopen. That is, I've seen three proposals:
Proposal One: enhance every command to take an fd:name protocol. PRO: Successful use of a name removes the fd from the 'getfd' list. CON: Lots of commands to change. CON: The command sequence is not atomic. CON: Block layer does not have a file name tied to the fd, so the reopen operation also needs to learn the fd:name protocol, but since we're already touching the command it's not much more work. USAGE: 1. getfd FDSET={M}, ties new fd to "name" 2. drive_add fd=name - looks up the fd by name from the list 3a. on success, fd is no longer in the list, drive_add consumed it 3b. on failure, libvirt must call 'closefd name' to avoid a leak 4. getfd FDSET={M2}, ties new fd to "newname" 5. block-commit fd=newname - looks up fd by name from the list 6a. on success, fd is no longer in the list, block-commit consumed it 6b. on failure, libvirt must call 'closefd newname' to avoid a leak
Proposal Two: Create a permanent name via 'getfd' or 'pass-fd', then update that name to the new fd in advance of any operation that needs to do a reopen. PRO: All other commands work without impact by using qemu_open(), by getting a clone of the permanent name. CON: The permanent name must remain open as long as qemu might need to reopen it, and reopening needs the pass-fd force option. CON: The command sequence is not atomic. USAGE: 1. pass_fd FDSET={M} -> returns an integer N (or a string "/dev/fd/N") showing QEMU's permanent name of the FD 2. drive_add file=<permanent name> means that qemu_open() will clone the fd, and drive_add is now using yet another FD while N remains open; meanwhile, the block layer knows that the drive is tied to the permanent name 3. pass-fd force FDSET={M2} -> replaces fd N with the passed M2, and still returns /dev/fd/N 4. block-commit - looks up the block-device name (/dev/fd/N), which maps back to the permanent fd, and gets a copy of the newly passed M2 5. on completion (success or failure), libvirt must call 'closefd name' to avoid a leak
Proposal Three: Use thread-local fds passed alongside any command, rather than passing via a separate command PRO: All commands can now atomically handle fd passing PRO: Commands that only need a one-shot open can now use fd CON: Commands that need to reopen still need modification to allow a name override during the reopen 1. drive_add nfds=1 file="/dev/fdlist/1" FDSET={M} -> on success, the fd is used as the block file, on failure it is atomically closed, so there is no leak either way. However, the block file has no permanent name. 2. block-commit nfds=1 file-override="/dev/fdlist/1" FDSET={M2} -> file override must be passed again (since we have no guarantee that the /dev/fdlist/1 of _this_ command matches the command-local naming used in the previous command), but the fd is again used atomically
Under proposal 3, there is no need to dup fd's, merely only to check that qemu_open("/dev/fdlist/n", flags) has compatible flags with the fd received via FDSET. And the fact that things are made atomic means that libvirt no longer has to worry about calling closefd, nor does it have to worry about being interrupted between two monitor commands and not knowing what state qemu is in. But it does mean teaching every possible command that wants to do a reopen to learn how to use fd overrides rather than to reuse the permanent name that was originally in place on the first open.
Here's another option that Kevin and I discussed today on IRC. I've modified a few minor details since the discussion. And Kevin please correct me if anything is wrong.
Proposal Four: Pass a set of fds via 'pass-fds'. The group of fds should all refer to the same file, but may have different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags. pass-fds: { 'command': 'pass-fds', 'data': {'fdsetname': 'str', '*close': 'bool'}, 'returns': 'string' } close-fds: { 'command': 'close-fds', 'data': {'fdsetname': 'str' } where: @fdsetname - the file descriptor set name @close - *if true QEMU closes the monitor fds when they expire. if false, fds remain on the monitor until close-fds command. PRO: Supports reopen PRO: All other commands work without impact by using qemu_open() PRO: No fd leakage if close==true specified CON: Determining when to close fds when close==true may be tricky USAGE: 1. pass-fd FDSET={M} -> returns a string "/dev/fdset/1") that refers to the passed set of fds. 2. drive_add file=/dev/fdset/1 -- qemu_open() uses the first fd from the list that has access flags matching the qemu_open() action flags. 3. block-commit -- qemu_open() reopens "/dev/fdset/1" by using the first fd from the list that has access flags matching the qemu_open() action flags. 4. The monitor fds are closed: A) *If @close == true, qemu closes the set of fds when the timer expires B) If @close == false, libvirt must issue close-fds command to close the set of fds
*How to solve this has yet to be determined. Kevin mentioned potentially using a bottom-half or a timer in qemu_close().
Thanks again for taking time to discuss this at today's QEMU community call. Here's the proposal we discussed at the call. Please let me know if I missed anything or if there are any issues with this design. Proposal Five: New monitor commands enable adding/removing an fd to/from a set. The set of fds should all refer to the same file, but may have different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags. PRO: Supports reopen PRO: All other commands work without impact by using qemu_open() PRO: No fd leakage (fds are associated with monitor connection and, if not in use, closed when monitor disconnects) PRO: Security-wise this is ok since libvirt can manage the set of fd's (ie. it can add/remove an O_RDWR fd to/from the set as needed). CON: Not atomic (e.g. doesn't add an fd with single drive_add command). USAGE: 1. add-fd /dev/fdset/1 FDSET={M} -> qemu adds fd to set named "/dev/fdset/1" - command returns qemu fd (e.g fd=4) to caller. libvirt in-use flag turned on for fd. 2. drive_add file=/dev/fdset/1 -> qemu_open uses the first fd from the set that has access flags matching the qemu_open action flags. qemu_open increments refcount for this fd. 3. add-fd /dev/fdset/1 FDSET={M} -> qemu adds fd to set named "/dev/fdset/1" - command returns qemu fd to caller (e.g fd=5). libvirt in-use flag turned on for fd. 3. block-commit -> qemu_open reopens "/dev/fdset/1" by using the first fd from the set that has access flags matching the qemu_open action flags. qemu_open increments refcount for this fd. 4. remove-fd /dev/fdset/1 5 -> caller requests fd==5 be removed from the set. turns libvirt in-use flag off marking the fd ready to be closed when qemu is done with it. 5. qemu_close decrements refcount for fd, and closes fd when refcount is zero and libvirt in use flag is off. More functional details: -If libvirt crashes it can call "query-fd /dev/fdset/1" to determine which fds are open in the set. -If monitor connection closes, qemu will close fds that have a refcount of zero. Do we also need a qemu in-use flag in case refcount is zero and fd is still in use? -This support requires introduction of qemu_close function that will be called everywhere in block layer that close is currently called. Notes: -Patch series 1 will include support for all of the above. This will be my initial focus. -Patch series 2 will include command line support that enables association of command line fd with a monitor set. This will be my secondary focus, most likely after patch series 1 is applied. -- Regards, Corey

Am 03.07.2012 17:40, schrieb Corey Bryant:
Thanks again for taking time to discuss this at today's QEMU community call.
Here's the proposal we discussed at the call. Please let me know if I missed anything or if there are any issues with this design.
Proposal Five: New monitor commands enable adding/removing an fd to/from a set. The set of fds should all refer to the same file, but may have different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags. PRO: Supports reopen PRO: All other commands work without impact by using qemu_open() PRO: No fd leakage (fds are associated with monitor connection and, if not in use, closed when monitor disconnects) PRO: Security-wise this is ok since libvirt can manage the set of fd's (ie. it can add/remove an O_RDWR fd to/from the set as needed). CON: Not atomic (e.g. doesn't add an fd with single drive_add command). USAGE: 1. add-fd /dev/fdset/1 FDSET={M} -> qemu adds fd to set named "/dev/fdset/1" - command returns qemu fd (e.g fd=4) to caller. libvirt in-use flag turned on for fd.
I thought qemu would rather return the number of the fdset (which it also assigns if none it passed, i.e. for fdset creation). Does libvirt need the number of an individual fd? If libvirt prefers to assign fdset numbers itself, I'm not against it, it's just something that wasn't clear to me yet.
2. drive_add file=/dev/fdset/1 -> qemu_open uses the first fd from the set that has access flags matching the qemu_open action flags. qemu_open increments refcount for this fd. 3. add-fd /dev/fdset/1 FDSET={M} -> qemu adds fd to set named "/dev/fdset/1" - command returns qemu fd to caller (e.g fd=5). libvirt in-use flag turned on for fd. 3. block-commit -> qemu_open reopens "/dev/fdset/1" by using the first fd from the set that has access flags matching the qemu_open action flags. qemu_open increments refcount for this fd. 4. remove-fd /dev/fdset/1 5 -> caller requests fd==5 be removed from the set. turns libvirt in-use flag off marking the fd ready to be closed when qemu is done with it.
If we decided to not return the individual fd numbers to libvirt, file descriptors would be uniquely identified by an fdset/flags pair here.
5. qemu_close decrements refcount for fd, and closes fd when refcount is zero and libvirt in use flag is off.
The monitor could just hold another reference, then we save the additional flag. But that's a qemu implementation detail.
More functional details: -If libvirt crashes it can call "query-fd /dev/fdset/1" to determine which fds are open in the set.
We also need a query-fdsets command that lists all fdsets that exist. If we add information about single fds to the return value of it, we probably don't need a separate query-fd that operates on a single fdset.
-If monitor connection closes, qemu will close fds that have a refcount of zero. Do we also need a qemu in-use flag in case refcount is zero and fd is still in use?
In use by whom? If it's still in use in qemu (as in "in-use flag would be set") and we have a refcount of zero, then that's a bug.
-This support requires introduction of qemu_close function that will be called everywhere in block layer that close is currently called.
Notes: -Patch series 1 will include support for all of the above. This will be my initial focus. -Patch series 2 will include command line support that enables association of command line fd with a monitor set. This will be my secondary focus, most likely after patch series 1 is applied.
Thanks, this is a good and as far as I can tell complete summary of what we discussed. Kevin

On 07/03/2012 11:59 AM, Kevin Wolf wrote:
Am 03.07.2012 17:40, schrieb Corey Bryant:
Thanks again for taking time to discuss this at today's QEMU community call.
Here's the proposal we discussed at the call. Please let me know if I missed anything or if there are any issues with this design.
Proposal Five: New monitor commands enable adding/removing an fd to/from a set. The set of fds should all refer to the same file, but may have different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags. PRO: Supports reopen PRO: All other commands work without impact by using qemu_open() PRO: No fd leakage (fds are associated with monitor connection and, if not in use, closed when monitor disconnects) PRO: Security-wise this is ok since libvirt can manage the set of fd's (ie. it can add/remove an O_RDWR fd to/from the set as needed). CON: Not atomic (e.g. doesn't add an fd with single drive_add command). USAGE: 1. add-fd /dev/fdset/1 FDSET={M} -> qemu adds fd to set named "/dev/fdset/1" - command returns qemu fd (e.g fd=4) to caller. libvirt in-use flag turned on for fd.
I thought qemu would rather return the number of the fdset (which it also assigns if none it passed, i.e. for fdset creation). Does libvirt need the number of an individual fd?
If libvirt prefers to assign fdset numbers itself, I'm not against it, it's just something that wasn't clear to me yet.
That's fine. QEMU can return the fdset number or a string (/dev/fdset/1) if none is specified. And an fdset will need to be specified if adding to an existing set. I think libvirt will need the fd returned by add-fd so that it can evaluate fds returned by query-fd. It's also useful for remove-fd.
2. drive_add file=/dev/fdset/1 -> qemu_open uses the first fd from the set that has access flags matching the qemu_open action flags. qemu_open increments refcount for this fd. 3. add-fd /dev/fdset/1 FDSET={M} -> qemu adds fd to set named "/dev/fdset/1" - command returns qemu fd to caller (e.g fd=5). libvirt in-use flag turned on for fd. 3. block-commit -> qemu_open reopens "/dev/fdset/1" by using the first fd from the set that has access flags matching the qemu_open action flags. qemu_open increments refcount for this fd. 4. remove-fd /dev/fdset/1 5 -> caller requests fd==5 be removed from the set. turns libvirt in-use flag off marking the fd ready to be closed when qemu is done with it.
If we decided to not return the individual fd numbers to libvirt, file descriptors would be uniquely identified by an fdset/flags pair here.
Are you saying we'd pass the fdset name and flags parameters on remove-fd to somehow identify the fds to remove?
5. qemu_close decrements refcount for fd, and closes fd when refcount is zero and libvirt in use flag is off.
The monitor could just hold another reference, then we save the additional flag. But that's a qemu implementation detail.
I'm not sure I understand what you mean.
More functional details: -If libvirt crashes it can call "query-fd /dev/fdset/1" to determine which fds are open in the set.
We also need a query-fdsets command that lists all fdsets that exist. If we add information about single fds to the return value of it, we probably don't need a separate query-fd that operates on a single fdset.
Yes, good point. And maybe we don't need 2 commands. query-fdsets could return all the sets and all the fds that are in those sets.
-If monitor connection closes, qemu will close fds that have a refcount of zero. Do we also need a qemu in-use flag in case refcount is zero and fd is still in use?
In use by whom? If it's still in use in qemu (as in "in-use flag would be set") and we have a refcount of zero, then that's a bug.
In use by qemu. I don't think it's a bug. I think there are situations where refcount gets to zero but qemu is still using the fd.
-This support requires introduction of qemu_close function that will be called everywhere in block layer that close is currently called.
Notes: -Patch series 1 will include support for all of the above. This will be my initial focus. -Patch series 2 will include command line support that enables association of command line fd with a monitor set. This will be my secondary focus, most likely after patch series 1 is applied.
Thanks, this is a good and as far as I can tell complete summary of what we discussed.
Kevin
Definitely! Thank you for all the input. -- Regards, Corey

On 07/03/2012 10:25 AM, Corey Bryant wrote:
I thought qemu would rather return the number of the fdset (which it also assigns if none it passed, i.e. for fdset creation). Does libvirt need the number of an individual fd?
If libvirt prefers to assign fdset numbers itself, I'm not against it, it's just something that wasn't clear to me yet.
That's fine. QEMU can return the fdset number or a string (/dev/fdset/1) if none is specified. And an fdset will need to be specified if adding to an existing set.
I think libvirt will need the fd returned by add-fd so that it can evaluate fds returned by query-fd. It's also useful for remove-fd.
Correct - since we will be adding a remove-fd, then that command needs to know both the fdset name and the individual fd within the set to be removed.
2. drive_add file=/dev/fdset/1 -> qemu_open uses the first fd from the set that has access flags matching the qemu_open action flags. qemu_open increments refcount for this fd. 3. add-fd /dev/fdset/1 FDSET={M} -> qemu adds fd to set named "/dev/fdset/1" - command returns qemu fd to caller (e.g fd=5). libvirt in-use flag turned on for fd. 3. block-commit -> qemu_open reopens "/dev/fdset/1" by using the first fd from the set that has access flags matching the qemu_open action flags. qemu_open increments refcount for this fd. 4. remove-fd /dev/fdset/1 5 -> caller requests fd==5 be removed from the set. turns libvirt in-use flag off marking the fd ready to be closed when qemu is done with it.
If we decided to not return the individual fd numbers to libvirt, file descriptors would be uniquely identified by an fdset/flags pair here.
Are you saying we'd pass the fdset name and flags parameters on remove-fd to somehow identify the fds to remove?
Passing the flag parameters is not trivial, as that would mean the QMP code would have to define constants mapping to all of the O_* flags that qemu_open supports. It's easier to support closing by fd number.
5. qemu_close decrements refcount for fd, and closes fd when refcount is zero and libvirt in use flag is off.
The monitor could just hold another reference, then we save the additional flag. But that's a qemu implementation detail.
I'm not sure I understand what you mean.
pass-fd (or add-fd, whatever name we give it) adds an fd to an fdset, with initial use count of 1 (the use is the monitor). qemu_open() increments the use count. A new qemu_close() wrapper would decrement the use count. And both calling 'remove-fd', or closing the QMP monitor of an fd that has not yet been passed through 'remove-fd', serves as a way to decrement the use count. You'd still have to track whether the monitor is using an fd (to avoid over-decrementing on QMP monitor close), but by having the monitor's use also tracked under the refcount, then refcount reaching 0 is sufficient to auto-close an fd. I think that also means that re-establishing the client QMP connection would increment For some examples: 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so qemu decrements refcount; refcount of fd=4 is now 0 so qemu closes it 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount to 2 3. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so qemu decrements refcount to 1, but leaves fd=4 open because it is still in use by the block device 4. client re-establishes QMP connection, and 'query-fds' lets client learn about fd=4 still being open as part of fdset1, but also informs client that fd is not in use by the monitor 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount to 2 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in use by the monitor, refcount decremented to 1 but still left open because it is in use by the block device 4. client crashes, so all tracked fds are visited; but fd=4 is already marked as not in use by the monitor, so its refcount is unchanged 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, but the command fails for some other reason, so the refcount is still 1 at the end of the command (although it may have been temporarily incremented then decremented during the command) 3. client calls 'remove-fd fdset=1 fd=4' to deal with the failure (or QMP connection is closed), so qemu marks fd=4 as no longer in use by the monitor, refcount is now decremented to 0 and fd=4 is closed I think that covers the idea; you need a bool in_use for tracking monitor state (the monitor is in use until either a remove-fd or a monitor connection closes), as well as a ref-count.
We also need a query-fdsets command that lists all fdsets that exist. If we add information about single fds to the return value of it, we probably don't need a separate query-fd that operates on a single fdset.
Yes, good point. And maybe we don't need 2 commands. query-fdsets could return all the sets and all the fds that are in those sets.
Yes, I think a single query command is good enough here, something like: { "execute":"query-fdsets" } => { "return" : { "sets": [ { "name": "fdset1", "fds": [ { "fd": 4, "monitor": true, "refcount": 1 } ] }, { "name": "fdset2", "fds": [ { "fd": 5, "monitor": false, "refcount": 1 }, { "fd": 6, "monitor": true, "refcount": 2 } ] } ] } }
In use by whom? If it's still in use in qemu (as in "in-use flag would be set") and we have a refcount of zero, then that's a bug.
In use by qemu. I don't think it's a bug. I think there are situations where refcount gets to zero but qemu is still using the fd.
I think the refcount being non-zero _is_ what defines an fd as being in use by qemu (including use by the monitor). Any place you have to close an fd before reopening it is dangerous; the safe way is always to open with the new permissions before closing the old permissions. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 07/03/2012 01:03 PM, Eric Blake wrote:
On 07/03/2012 10:25 AM, Corey Bryant wrote:
I thought qemu would rather return the number of the fdset (which it also assigns if none it passed, i.e. for fdset creation). Does libvirt need the number of an individual fd?
If libvirt prefers to assign fdset numbers itself, I'm not against it, it's just something that wasn't clear to me yet.
That's fine. QEMU can return the fdset number or a string (/dev/fdset/1) if none is specified. And an fdset will need to be specified if adding to an existing set.
I think libvirt will need the fd returned by add-fd so that it can evaluate fds returned by query-fd. It's also useful for remove-fd.
Correct - since we will be adding a remove-fd, then that command needs to know both the fdset name and the individual fd within the set to be removed.
Ok
2. drive_add file=/dev/fdset/1 -> qemu_open uses the first fd from the set that has access flags matching the qemu_open action flags. qemu_open increments refcount for this fd. 3. add-fd /dev/fdset/1 FDSET={M} -> qemu adds fd to set named "/dev/fdset/1" - command returns qemu fd to caller (e.g fd=5). libvirt in-use flag turned on for fd. 3. block-commit -> qemu_open reopens "/dev/fdset/1" by using the first fd from the set that has access flags matching the qemu_open action flags. qemu_open increments refcount for this fd. 4. remove-fd /dev/fdset/1 5 -> caller requests fd==5 be removed from the set. turns libvirt in-use flag off marking the fd ready to be closed when qemu is done with it.
If we decided to not return the individual fd numbers to libvirt, file descriptors would be uniquely identified by an fdset/flags pair here.
Are you saying we'd pass the fdset name and flags parameters on remove-fd to somehow identify the fds to remove?
Passing the flag parameters is not trivial, as that would mean the QMP code would have to define constants mapping to all of the O_* flags that qemu_open supports. It's easier to support closing by fd number.
I understand what you were saying now, although I guess it's not applicable at this point. I'll plan on returning the fd from add-fd and passing the fd on the remove-fd command.
5. qemu_close decrements refcount for fd, and closes fd when refcount is zero and libvirt in use flag is off.
The monitor could just hold another reference, then we save the additional flag. But that's a qemu implementation detail.
I'm not sure I understand what you mean.
pass-fd (or add-fd, whatever name we give it) adds an fd to an fdset, with initial use count of 1 (the use is the monitor). qemu_open() increments the use count. A new qemu_close() wrapper would decrement the use count. And both calling 'remove-fd', or closing the QMP monitor of an fd that has not yet been passed through 'remove-fd', serves as a way to decrement the use count. You'd still have to track whether the monitor is using an fd (to avoid over-decrementing on QMP monitor close), but by having the monitor's use also tracked under the refcount, then refcount reaching 0 is sufficient to auto-close an fd. I think that also means that re-establishing the client QMP connection would increment For some examples:
Yes, I think adding a +1 to the refcount for the monitor makes sense. I'm a bit unsure how to increment the refcount when a monitor reconnects though. Maybe it is as simple as adding a +1 to each fd's refcount when the next QMP monitor connects.
1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so qemu decrements refcount; refcount of fd=4 is now 0 so qemu closes it
Just a note that the fd above also hasn't yet been referenced by a drive-add/device-add, so it will be closed in step 2.
1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount to 2 3. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so qemu decrements refcount to 1, but leaves fd=4 open because it is still in use by the block device 4. client re-establishes QMP connection, and 'query-fds' lets client learn about fd=4 still being open as part of fdset1, but also informs client that fd is not in use by the monitor
And in step 4 the QMP connection will increment the refcount +1 for all fds that persisted through the QMP disconnect. (?)
1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount to 2 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in use by the monitor, refcount decremented to 1 but still left open because it is in use by the block device 4. client crashes, so all tracked fds are visited; but fd=4 is already marked as not in use by the monitor, so its refcount is unchanged
1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, but the command fails for some other reason, so the refcount is still 1 at the end of the command (although it may have been temporarily incremented then decremented during the command) 3. client calls 'remove-fd fdset=1 fd=4' to deal with the failure (or QMP connection is closed), so qemu marks fd=4 as no longer in use by the monitor, refcount is now decremented to 0 and fd=4 is closed
I think that covers the idea; you need a bool in_use for tracking monitor state (the monitor is in use until either a remove-fd or a monitor connection closes), as well as a ref-count.
Yes, it all makes sense to me. Thanks for the scenarios.
We also need a query-fdsets command that lists all fdsets that exist. If we add information about single fds to the return value of it, we probably don't need a separate query-fd that operates on a single fdset.
Yes, good point. And maybe we don't need 2 commands. query-fdsets could return all the sets and all the fds that are in those sets.
Yes, I think a single query command is good enough here, something like:
{ "execute":"query-fdsets" } => { "return" : { "sets": [ { "name": "fdset1", "fds": [ { "fd": 4, "monitor": true, "refcount": 1 } ] }, { "name": "fdset2", "fds": [ { "fd": 5, "monitor": false, "refcount": 1 }, { "fd": 6, "monitor": true, "refcount": 2 } ] } ] } }
Ok, thanks!
In use by whom? If it's still in use in qemu (as in "in-use flag would be set") and we have a refcount of zero, then that's a bug.
In use by qemu. I don't think it's a bug. I think there are situations where refcount gets to zero but qemu is still using the fd.
I think the refcount being non-zero _is_ what defines an fd as being in use by qemu (including use by the monitor). Any place you have to close an fd before reopening it is dangerous; the safe way is always to open with the new permissions before closing the old permissions.
Maybe Kevin wants to weigh in on this. Perhaps it's an issue that can be separated from my patch series. -- Regards, Corey

On 07/03/2012 11:46 AM, Corey Bryant wrote:
Yes, I think adding a +1 to the refcount for the monitor makes sense.
I'm a bit unsure how to increment the refcount when a monitor reconnects though. Maybe it is as simple as adding a +1 to each fd's refcount when the next QMP monitor connects.
Or maybe delay a +1 until after a 'query-fds' - it is not until the monitor has reconnected and learned what fds it should be aware of that incrementing the refcount again makes sense. But that would mean making 'query-fds' track whether this is the first call since the monitor reconnected, as it shouldn't normally increase refcounts. The other alternative is that the monitor never re-increments a refcount. Once a monitor disconnects, that fd is lost to the monitor, and a reconnected monitor must pass in a new fd to be re-associated with the fdset. In other words, the monitor's use of an fd is a one-way operation, starting life in use but ending at the first disconnect or remove-fd.
1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount to 2 3. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so qemu decrements refcount to 1, but leaves fd=4 open because it is still in use by the block device 4. client re-establishes QMP connection, and 'query-fds' lets client learn about fd=4 still being open as part of fdset1, but also informs client that fd is not in use by the monitor
And in step 4 the QMP connection will increment the refcount +1 for all fds that persisted through the QMP disconnect. (?)
I'm not sure we need the refcount increment on reconnect. 'query-fds' should provide enough information for the new monitor to know what fdsets are still in use by qemu, even though they are no longer available to 'remove-fd' from the monitor, and if the monitor is worried about keeping the fd set alive it can call 'add-fd' to again associate a new fd with the set. The lifetime of a set is thus as long as any of its associated fds have a non-zero refcount. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 07/03/2012 02:00 PM, Eric Blake wrote:
On 07/03/2012 11:46 AM, Corey Bryant wrote:
Yes, I think adding a +1 to the refcount for the monitor makes sense.
I'm a bit unsure how to increment the refcount when a monitor reconnects though. Maybe it is as simple as adding a +1 to each fd's refcount when the next QMP monitor connects.
Or maybe delay a +1 until after a 'query-fds' - it is not until the monitor has reconnected and learned what fds it should be aware of that incrementing the refcount again makes sense. But that would mean making 'query-fds' track whether this is the first call since the monitor reconnected, as it shouldn't normally increase refcounts.
This doesn't sound ideal.
The other alternative is that the monitor never re-increments a refcount. Once a monitor disconnects, that fd is lost to the monitor, and a reconnected monitor must pass in a new fd to be re-associated with the fdset. In other words, the monitor's use of an fd is a one-way operation, starting life in use but ending at the first disconnect or remove-fd.
I would vote for this 2nd alternative. As long as we're not introducing an fd leak. And I don't think we are if we decrement the refcount on remove-fd or on QMP disconnect.
1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount to 2 3. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so qemu decrements refcount to 1, but leaves fd=4 open because it is still in use by the block device 4. client re-establishes QMP connection, and 'query-fds' lets client learn about fd=4 still being open as part of fdset1, but also informs client that fd is not in use by the monitor
And in step 4 the QMP connection will increment the refcount +1 for all fds that persisted through the QMP disconnect. (?)
I'm not sure we need the refcount increment on reconnect. 'query-fds' should provide enough information for the new monitor to know what fdsets are still in use by qemu, even though they are no longer available to 'remove-fd' from the monitor, and if the monitor is worried about keeping the fd set alive it can call 'add-fd' to again associate a new fd with the set. The lifetime of a set is thus as long as any of its associated fds have a non-zero refcount.
This sounds good to me. And qemu_open will need to make sure the monitor in_use flag is true before dup'ing an fd. -- Regards, Corey

Am 03.07.2012 20:21, schrieb Corey Bryant:
On 07/03/2012 02:00 PM, Eric Blake wrote:
On 07/03/2012 11:46 AM, Corey Bryant wrote:
Yes, I think adding a +1 to the refcount for the monitor makes sense.
I'm a bit unsure how to increment the refcount when a monitor reconnects though. Maybe it is as simple as adding a +1 to each fd's refcount when the next QMP monitor connects.
Or maybe delay a +1 until after a 'query-fds' - it is not until the monitor has reconnected and learned what fds it should be aware of that incrementing the refcount again makes sense. But that would mean making 'query-fds' track whether this is the first call since the monitor reconnected, as it shouldn't normally increase refcounts.
This doesn't sound ideal.
Yes, it's less than ideal.
The other alternative is that the monitor never re-increments a refcount. Once a monitor disconnects, that fd is lost to the monitor, and a reconnected monitor must pass in a new fd to be re-associated with the fdset. In other words, the monitor's use of an fd is a one-way operation, starting life in use but ending at the first disconnect or remove-fd.
I would vote for this 2nd alternative. As long as we're not introducing an fd leak. And I don't think we are if we decrement the refcount on remove-fd or on QMP disconnect.
In fact, I believe this one is even worse. I can already see hacks like adding a dummy FD with invalid flags and removing it again just to regain control over the fdset... You earlier suggestion made a lot of sense to me: Whenever a new QMP monitor is connected, increase the refcount. That is, as long as any monitor is there, don't drop any fdsets unless explicitly requested via QMP. Kevin

On 07/04/2012 04:09 AM, Kevin Wolf wrote:
Am 03.07.2012 20:21, schrieb Corey Bryant:
On 07/03/2012 02:00 PM, Eric Blake wrote:
On 07/03/2012 11:46 AM, Corey Bryant wrote:
Yes, I think adding a +1 to the refcount for the monitor makes sense.
I'm a bit unsure how to increment the refcount when a monitor reconnects though. Maybe it is as simple as adding a +1 to each fd's refcount when the next QMP monitor connects.
Or maybe delay a +1 until after a 'query-fds' - it is not until the monitor has reconnected and learned what fds it should be aware of that incrementing the refcount again makes sense. But that would mean making 'query-fds' track whether this is the first call since the monitor reconnected, as it shouldn't normally increase refcounts.
This doesn't sound ideal.
Yes, it's less than ideal.
The other alternative is that the monitor never re-increments a refcount. Once a monitor disconnects, that fd is lost to the monitor, and a reconnected monitor must pass in a new fd to be re-associated with the fdset. In other words, the monitor's use of an fd is a one-way operation, starting life in use but ending at the first disconnect or remove-fd.
I would vote for this 2nd alternative. As long as we're not introducing an fd leak. And I don't think we are if we decrement the refcount on remove-fd or on QMP disconnect.
In fact, I believe this one is even worse. I can already see hacks like adding a dummy FD with invalid flags and removing it again just to regain control over the fdset...
You earlier suggestion made a lot of sense to me: Whenever a new QMP monitor is connected, increase the refcount. That is, as long as any monitor is there, don't drop any fdsets unless explicitly requested via QMP.
Ok. So refcount would be incremented (for the fd or fdset, whatever we decide on) when QMP reconnects. I'm assuming we wouldn't wait until after a query-fds call. I'll go with this approach until someone objects. -- Regards, Corey

On Thu, 05 Jul 2012 11:06:56 -0400 Corey Bryant <coreyb@linux.vnet.ibm.com> wrote:
On 07/04/2012 04:09 AM, Kevin Wolf wrote:
Am 03.07.2012 20:21, schrieb Corey Bryant:
On 07/03/2012 02:00 PM, Eric Blake wrote:
On 07/03/2012 11:46 AM, Corey Bryant wrote:
Yes, I think adding a +1 to the refcount for the monitor makes sense.
I'm a bit unsure how to increment the refcount when a monitor reconnects though. Maybe it is as simple as adding a +1 to each fd's refcount when the next QMP monitor connects.
Or maybe delay a +1 until after a 'query-fds' - it is not until the monitor has reconnected and learned what fds it should be aware of that incrementing the refcount again makes sense. But that would mean making 'query-fds' track whether this is the first call since the monitor reconnected, as it shouldn't normally increase refcounts.
This doesn't sound ideal.
Yes, it's less than ideal.
The other alternative is that the monitor never re-increments a refcount. Once a monitor disconnects, that fd is lost to the monitor, and a reconnected monitor must pass in a new fd to be re-associated with the fdset. In other words, the monitor's use of an fd is a one-way operation, starting life in use but ending at the first disconnect or remove-fd.
I would vote for this 2nd alternative. As long as we're not introducing an fd leak. And I don't think we are if we decrement the refcount on remove-fd or on QMP disconnect.
In fact, I believe this one is even worse. I can already see hacks like adding a dummy FD with invalid flags and removing it again just to regain control over the fdset...
You earlier suggestion made a lot of sense to me: Whenever a new QMP monitor is connected, increase the refcount. That is, as long as any monitor is there, don't drop any fdsets unless explicitly requested via QMP.
Ok. So refcount would be incremented (for the fd or fdset, whatever we decide on) when QMP reconnects. I'm assuming we wouldn't wait until after a query-fds call.
I'm not sure this is a good idea because we will leak fds if the client forgets about the fds when re-connecting (ie. it was restarted) or if a different client connects to QMP. If we really want to do that, I think that the right way of doing this is to add a command for clients to re-again ownership of the fds on reconnection. But to be honest, I don't fully understand why this is needed.

On 07/09/2012 10:05 AM, Luiz Capitulino wrote:
On Thu, 05 Jul 2012 11:06:56 -0400 Corey Bryant <coreyb@linux.vnet.ibm.com> wrote:
On 07/04/2012 04:09 AM, Kevin Wolf wrote:
Am 03.07.2012 20:21, schrieb Corey Bryant:
On 07/03/2012 02:00 PM, Eric Blake wrote:
On 07/03/2012 11:46 AM, Corey Bryant wrote:
Yes, I think adding a +1 to the refcount for the monitor makes sense.
I'm a bit unsure how to increment the refcount when a monitor reconnects though. Maybe it is as simple as adding a +1 to each fd's refcount when the next QMP monitor connects.
Or maybe delay a +1 until after a 'query-fds' - it is not until the monitor has reconnected and learned what fds it should be aware of that incrementing the refcount again makes sense. But that would mean making 'query-fds' track whether this is the first call since the monitor reconnected, as it shouldn't normally increase refcounts.
This doesn't sound ideal.
Yes, it's less than ideal.
The other alternative is that the monitor never re-increments a refcount. Once a monitor disconnects, that fd is lost to the monitor, and a reconnected monitor must pass in a new fd to be re-associated with the fdset. In other words, the monitor's use of an fd is a one-way operation, starting life in use but ending at the first disconnect or remove-fd.
I would vote for this 2nd alternative. As long as we're not introducing an fd leak. And I don't think we are if we decrement the refcount on remove-fd or on QMP disconnect.
In fact, I believe this one is even worse. I can already see hacks like adding a dummy FD with invalid flags and removing it again just to regain control over the fdset...
You earlier suggestion made a lot of sense to me: Whenever a new QMP monitor is connected, increase the refcount. That is, as long as any monitor is there, don't drop any fdsets unless explicitly requested via QMP.
Ok. So refcount would be incremented (for the fd or fdset, whatever we decide on) when QMP reconnects. I'm assuming we wouldn't wait until after a query-fds call.
I'm not sure this is a good idea because we will leak fds if the client forgets about the fds when re-connecting (ie. it was restarted) or if a different client connects to QMP.
If we really want to do that, I think that the right way of doing this is to add a command for clients to re-again ownership of the fds on reconnection.
But to be honest, I don't fully understand why this is needed.
I'm not sure this is an issue with current design. I know things have changed a bit as the email threads evolved, so I'll paste the current design that I am working from. Please let me know if you still see any issues. FD passing: ----------- New monitor commands enable adding/removing an fd to/from a set. New monitor command query-fdsets enables querying of current monitor fdsets. The set of fds should all refer to the same file, with each fd having different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags. Design points: -------------- 1. add-fd -> fd is passed via SCM rights and qemu adds fd to first unused fdset (e.g. /dev/fdset/1) -> add-fd monitor function initializes the monitor inuse flag for the fdset to true -> add-fd monitor function initializes the remove flag for the fd to false -> add-fd returns fdset number and received fd number (e.g fd=3) to caller 2. drive_add file=/dev/fdset/1 -> qemu_open uses the first fd in fdset1 that has access flags matching the qemu_open action flags and has remove flag set to false -> qemu_open increments refcount for the fdset -> Need to make sure that if a command like 'device-add' fails that refcount is not incremented 3. add-fd fdset=1 -> fd is passed via SCM rights -> add-fd monitor function adds the received fd to the specified fdset (or fails if fdset doesn't exist) -> add-fd monitor function initializes the remove flag for the fd to false -> add-fd returns fdset number and received fd number (e.g fd=4) to caller 4. block-commit -> qemu_open performs "reopen" by using the first fd from the fdset that has access flags matching the qemu_open action flags and has remove flag set to false -> qemu_open increments refcount for the fdset -> Need to make sure that if a command like 'block-commit' fails that refcount is not incremented 5. remove-fd fdset=1 fd=4 -> remove-fd monitor function fails if fdset doesn't exist -> remove-fd monitor function turns on remove flag for fd=4 6. qemu_close (need to replace all close calls in block layer with qemu_close) -> qemu_close decrements refcount for fdset -> qemu_close closes all fds that have (refcount == 0 && (!inuse || remove)) -> qemu_close frees the fdset if no fds remain in it 7. disconnecting the QMP monitor -> monitor disconnect visits all fdsets on monitor and turns off monitor in-use flag for fdset 8. connecting the QMP monitor -> monitor connect visits all fdsets on monitor and turns on monitor in-use flag for fdset 9. query-fdsets -> returns all fdsets and fds that don't have remove flag on QMP command examples -------------------- -> { "execute": "add-fd", "arguments": { "fdset": 1 } } <- { "return": { "fdset": 1, "fd": 3 } } -> { "execute": "remove-fd", "arguments": { "fdset": 1, "fd": 3 } } <- { "return": {} } -> { "execute":"query-fdsets" } => <- { "return" : { "fdsets": [ { "name": "fdset1", "fds": [ { "fd": 4, "removed": false } ], "refcount": 1, "monitor": true }, { "name": "fdset2", "fds": [ { "fd": 5, "removed": false }, { "fd": 6, "removed": true } ], "refcount": 1, "monitor": true } } -- Regards, Corey

Am 09.07.2012 17:05, schrieb Corey Bryant:
I'm not sure this is an issue with current design. I know things have changed a bit as the email threads evolved, so I'll paste the current design that I am working from. Please let me know if you still see any issues.
FD passing: ----------- New monitor commands enable adding/removing an fd to/from a set. New monitor command query-fdsets enables querying of current monitor fdsets. The set of fds should all refer to the same file, with each fd having different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags.
Design points: -------------- 1. add-fd -> fd is passed via SCM rights and qemu adds fd to first unused fdset (e.g. /dev/fdset/1) -> add-fd monitor function initializes the monitor inuse flag for the fdset to true -> add-fd monitor function initializes the remove flag for the fd to false -> add-fd returns fdset number and received fd number (e.g fd=3) to caller
2. drive_add file=/dev/fdset/1 -> qemu_open uses the first fd in fdset1 that has access flags matching the qemu_open action flags and has remove flag set to false -> qemu_open increments refcount for the fdset -> Need to make sure that if a command like 'device-add' fails that refcount is not incremented
3. add-fd fdset=1 -> fd is passed via SCM rights -> add-fd monitor function adds the received fd to the specified fdset (or fails if fdset doesn't exist) -> add-fd monitor function initializes the remove flag for the fd to false -> add-fd returns fdset number and received fd number (e.g fd=4) to caller
4. block-commit -> qemu_open performs "reopen" by using the first fd from the fdset that has access flags matching the qemu_open action flags and has remove flag set to false -> qemu_open increments refcount for the fdset -> Need to make sure that if a command like 'block-commit' fails that refcount is not incremented
5. remove-fd fdset=1 fd=4 -> remove-fd monitor function fails if fdset doesn't exist -> remove-fd monitor function turns on remove flag for fd=4
What was again the reason why we keep removed fds in the fdset at all? The removed flag would make sense for a fdset after a hypothetical close-fdset call because the fdset needs to be kept around until the last user closes it, but I think removed fds can be deleted immediately. I think I might have confused remove-fd and close-fdset in earlier emails in this thread, so I hope this isn't inconsistent with what I said before.
6. qemu_close (need to replace all close calls in block layer with qemu_close) -> qemu_close decrements refcount for fdset -> qemu_close closes all fds that have (refcount == 0 && (!inuse || remove)) -> qemu_close frees the fdset if no fds remain in it
7. disconnecting the QMP monitor -> monitor disconnect visits all fdsets on monitor and turns off monitor in-use flag for fdset
And close all fds with refcount == 0.
8. connecting the QMP monitor -> monitor connect visits all fdsets on monitor and turns on monitor in-use flag for fdset
9. query-fdsets -> returns all fdsets and fds that don't have remove flag on
Kevin

On Mon, 09 Jul 2012 17:46:00 +0200 Kevin Wolf <kwolf@redhat.com> wrote:
Am 09.07.2012 17:05, schrieb Corey Bryant:
I'm not sure this is an issue with current design. I know things have changed a bit as the email threads evolved, so I'll paste the current design that I am working from. Please let me know if you still see any issues.
FD passing: ----------- New monitor commands enable adding/removing an fd to/from a set. New monitor command query-fdsets enables querying of current monitor fdsets. The set of fds should all refer to the same file, with each fd having different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags.
Design points: -------------- 1. add-fd -> fd is passed via SCM rights and qemu adds fd to first unused fdset (e.g. /dev/fdset/1)
The fdset should be specified by the client, like: { "execute": "add-fd-set", "arguments": { "set-name": "/dev/fdset/1" } }
-> add-fd monitor function initializes the monitor inuse flag for the fdset to true
Why do we need the inuse flag?
-> add-fd monitor function initializes the remove flag for the fd to false -> add-fd returns fdset number and received fd number (e.g fd=3) to caller
2. drive_add file=/dev/fdset/1 -> qemu_open uses the first fd in fdset1 that has access flags matching the qemu_open action flags and has remove flag set to false -> qemu_open increments refcount for the fdset -> Need to make sure that if a command like 'device-add' fails that refcount is not incremented
3. add-fd fdset=1 -> fd is passed via SCM rights -> add-fd monitor function adds the received fd to the specified fdset (or fails if fdset doesn't exist) -> add-fd monitor function initializes the remove flag for the fd to false -> add-fd returns fdset number and received fd number (e.g fd=4) to caller
4. block-commit -> qemu_open performs "reopen" by using the first fd from the fdset that has access flags matching the qemu_open action flags and has remove flag set to false -> qemu_open increments refcount for the fdset -> Need to make sure that if a command like 'block-commit' fails that refcount is not incremented
5. remove-fd fdset=1 fd=4 -> remove-fd monitor function fails if fdset doesn't exist -> remove-fd monitor function turns on remove flag for fd=4
What was again the reason why we keep removed fds in the fdset at all?
The removed flag would make sense for a fdset after a hypothetical close-fdset call because the fdset needs to be kept around until the last user closes it, but I think removed fds can be deleted immediately.
Agreed.

On 07/09/2012 12:18 PM, Luiz Capitulino wrote:
On Mon, 09 Jul 2012 17:46:00 +0200 Kevin Wolf <kwolf@redhat.com> wrote:
Am 09.07.2012 17:05, schrieb Corey Bryant:
I'm not sure this is an issue with current design. I know things have changed a bit as the email threads evolved, so I'll paste the current design that I am working from. Please let me know if you still see any issues.
FD passing: ----------- New monitor commands enable adding/removing an fd to/from a set. New monitor command query-fdsets enables querying of current monitor fdsets. The set of fds should all refer to the same file, with each fd having different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags.
Design points: -------------- 1. add-fd -> fd is passed via SCM rights and qemu adds fd to first unused fdset (e.g. /dev/fdset/1)
The fdset should be specified by the client, like:
{ "execute": "add-fd-set", "arguments": { "set-name": "/dev/fdset/1" } }
We could make the fdset name configurable. Then we wouldn't force clients into using the file=/dev/fdset/1 alias on commands like device_add. The risk with this is that clients need to be careful and use a unique name that doesn't conflict with any existing file names. The way it is currently, if add-fd is not given an fdset name, it will generate a new fdset and add the fd to it. add-fd always returns the fdset (int) and received fd (int) on success. (e.g. fdset=1 corresponds to file=/dev/fdset/1). Then the 2nd time you want to add an fd to this set, you have to specify fdset=1 on add-fd. I'll do whatever you all prefer. I think there are advantages in both approaches, however I'm leaning toward the current approach and forcing use of /dev/fdset/1 to keep all fd set names consistent.
-> add-fd monitor function initializes the monitor inuse flag for the fdset to true
Why do we need the inuse flag?
This helps to prevent fd leakage. Let's say the client adds fd=3 to /dev/fdset/1 and then the QMP monitor disconnects. Since the following evaluates to true when the monitor disconnects, the fd will be closed: (refcount == 0 && (!inuse || remove)) Note: refcount is incremented for the fdset on qemu_open and decremented on qemu_close, and no commands caused it to be incremented from zero in this example.
-> add-fd monitor function initializes the remove flag for the fd to false -> add-fd returns fdset number and received fd number (e.g fd=3) to caller
2. drive_add file=/dev/fdset/1 -> qemu_open uses the first fd in fdset1 that has access flags matching the qemu_open action flags and has remove flag set to false -> qemu_open increments refcount for the fdset -> Need to make sure that if a command like 'device-add' fails that refcount is not incremented
3. add-fd fdset=1 -> fd is passed via SCM rights -> add-fd monitor function adds the received fd to the specified fdset (or fails if fdset doesn't exist) -> add-fd monitor function initializes the remove flag for the fd to false -> add-fd returns fdset number and received fd number (e.g fd=4) to caller
4. block-commit -> qemu_open performs "reopen" by using the first fd from the fdset that has access flags matching the qemu_open action flags and has remove flag set to false -> qemu_open increments refcount for the fdset -> Need to make sure that if a command like 'block-commit' fails that refcount is not incremented
5. remove-fd fdset=1 fd=4 -> remove-fd monitor function fails if fdset doesn't exist -> remove-fd monitor function turns on remove flag for fd=4
What was again the reason why we keep removed fds in the fdset at all?
The removed flag would make sense for a fdset after a hypothetical close-fdset call because the fdset needs to be kept around until the last user closes it, but I think removed fds can be deleted immediately.
Agreed.
Please take a look at my recent reply to Kevin about this and let me know if it clears things up. -- Regards, Corey

On 07/09/2012 11:46 AM, Kevin Wolf wrote:
Am 09.07.2012 17:05, schrieb Corey Bryant:
I'm not sure this is an issue with current design. I know things have changed a bit as the email threads evolved, so I'll paste the current design that I am working from. Please let me know if you still see any issues.
FD passing: ----------- New monitor commands enable adding/removing an fd to/from a set. New monitor command query-fdsets enables querying of current monitor fdsets. The set of fds should all refer to the same file, with each fd having different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags.
Design points: -------------- 1. add-fd -> fd is passed via SCM rights and qemu adds fd to first unused fdset (e.g. /dev/fdset/1) -> add-fd monitor function initializes the monitor inuse flag for the fdset to true -> add-fd monitor function initializes the remove flag for the fd to false -> add-fd returns fdset number and received fd number (e.g fd=3) to caller
2. drive_add file=/dev/fdset/1 -> qemu_open uses the first fd in fdset1 that has access flags matching the qemu_open action flags and has remove flag set to false -> qemu_open increments refcount for the fdset -> Need to make sure that if a command like 'device-add' fails that refcount is not incremented
3. add-fd fdset=1 -> fd is passed via SCM rights -> add-fd monitor function adds the received fd to the specified fdset (or fails if fdset doesn't exist) -> add-fd monitor function initializes the remove flag for the fd to false -> add-fd returns fdset number and received fd number (e.g fd=4) to caller
4. block-commit -> qemu_open performs "reopen" by using the first fd from the fdset that has access flags matching the qemu_open action flags and has remove flag set to false -> qemu_open increments refcount for the fdset -> Need to make sure that if a command like 'block-commit' fails that refcount is not incremented
5. remove-fd fdset=1 fd=4 -> remove-fd monitor function fails if fdset doesn't exist -> remove-fd monitor function turns on remove flag for fd=4
What was again the reason why we keep removed fds in the fdset at all?
Because if refcount is > 0 for the fd set, then the fd could be in use by a block device. So we keep it around until refcount is decremented to zero, at which point it is safe to close.
The removed flag would make sense for a fdset after a hypothetical close-fdset call because the fdset needs to be kept around until the last user closes it, but I think removed fds can be deleted immediately.
fds in an fd set really need to be kept around until zero block devices reference them. At that point, if '(refcount == 0 && (!inuse || remove))' is true, then we'll officially close the fd.
I think I might have confused remove-fd and close-fdset in earlier emails in this thread, so I hope this isn't inconsistent with what I said before.
Ok no problem.
6. qemu_close (need to replace all close calls in block layer with qemu_close) -> qemu_close decrements refcount for fdset -> qemu_close closes all fds that have (refcount == 0 && (!inuse || remove)) -> qemu_close frees the fdset if no fds remain in it
7. disconnecting the QMP monitor -> monitor disconnect visits all fdsets on monitor and turns off monitor in-use flag for fdset
And close all fds with refcount == 0.
Yes, this makes sense. It also makes sense to close removed fds with refcount == 0 in the remove-fd function. Basically this will be the same thing we do in qemu_close. We'll close any fds that evaulate the following as true: (refcount == 0 && (!inuse || remove))
8. connecting the QMP monitor -> monitor connect visits all fdsets on monitor and turns on monitor in-use flag for fdset
9. query-fdsets -> returns all fdsets and fds that don't have remove flag on
-- Regards, Corey

On Mon, 09 Jul 2012 13:35:19 -0400 Corey Bryant <coreyb@linux.vnet.ibm.com> wrote:
On 07/09/2012 11:46 AM, Kevin Wolf wrote:
Am 09.07.2012 17:05, schrieb Corey Bryant:
I'm not sure this is an issue with current design. I know things have changed a bit as the email threads evolved, so I'll paste the current design that I am working from. Please let me know if you still see any issues.
FD passing: ----------- New monitor commands enable adding/removing an fd to/from a set. New monitor command query-fdsets enables querying of current monitor fdsets. The set of fds should all refer to the same file, with each fd having different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags.
Design points: -------------- 1. add-fd -> fd is passed via SCM rights and qemu adds fd to first unused fdset (e.g. /dev/fdset/1) -> add-fd monitor function initializes the monitor inuse flag for the fdset to true -> add-fd monitor function initializes the remove flag for the fd to false -> add-fd returns fdset number and received fd number (e.g fd=3) to caller
2. drive_add file=/dev/fdset/1 -> qemu_open uses the first fd in fdset1 that has access flags matching the qemu_open action flags and has remove flag set to false -> qemu_open increments refcount for the fdset -> Need to make sure that if a command like 'device-add' fails that refcount is not incremented
3. add-fd fdset=1 -> fd is passed via SCM rights -> add-fd monitor function adds the received fd to the specified fdset (or fails if fdset doesn't exist) -> add-fd monitor function initializes the remove flag for the fd to false -> add-fd returns fdset number and received fd number (e.g fd=4) to caller
4. block-commit -> qemu_open performs "reopen" by using the first fd from the fdset that has access flags matching the qemu_open action flags and has remove flag set to false -> qemu_open increments refcount for the fdset -> Need to make sure that if a command like 'block-commit' fails that refcount is not incremented
5. remove-fd fdset=1 fd=4 -> remove-fd monitor function fails if fdset doesn't exist -> remove-fd monitor function turns on remove flag for fd=4
What was again the reason why we keep removed fds in the fdset at all?
Because if refcount is > 0 for the fd set, then the fd could be in use by a block device. So we keep it around until refcount is decremented to zero, at which point it is safe to close.
But then the refcount is associated with the set, not with any particular fd.

On 07/09/2012 01:48 PM, Luiz Capitulino wrote:
On Mon, 09 Jul 2012 13:35:19 -0400 Corey Bryant <coreyb@linux.vnet.ibm.com> wrote:
On 07/09/2012 11:46 AM, Kevin Wolf wrote:
Am 09.07.2012 17:05, schrieb Corey Bryant:
I'm not sure this is an issue with current design. I know things have changed a bit as the email threads evolved, so I'll paste the current design that I am working from. Please let me know if you still see any issues.
FD passing: ----------- New monitor commands enable adding/removing an fd to/from a set. New monitor command query-fdsets enables querying of current monitor fdsets. The set of fds should all refer to the same file, with each fd having different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags.
Design points: -------------- 1. add-fd -> fd is passed via SCM rights and qemu adds fd to first unused fdset (e.g. /dev/fdset/1) -> add-fd monitor function initializes the monitor inuse flag for the fdset to true -> add-fd monitor function initializes the remove flag for the fd to false -> add-fd returns fdset number and received fd number (e.g fd=3) to caller
2. drive_add file=/dev/fdset/1 -> qemu_open uses the first fd in fdset1 that has access flags matching the qemu_open action flags and has remove flag set to false -> qemu_open increments refcount for the fdset -> Need to make sure that if a command like 'device-add' fails that refcount is not incremented
3. add-fd fdset=1 -> fd is passed via SCM rights -> add-fd monitor function adds the received fd to the specified fdset (or fails if fdset doesn't exist) -> add-fd monitor function initializes the remove flag for the fd to false -> add-fd returns fdset number and received fd number (e.g fd=4) to caller
4. block-commit -> qemu_open performs "reopen" by using the first fd from the fdset that has access flags matching the qemu_open action flags and has remove flag set to false -> qemu_open increments refcount for the fdset -> Need to make sure that if a command like 'block-commit' fails that refcount is not incremented
5. remove-fd fdset=1 fd=4 -> remove-fd monitor function fails if fdset doesn't exist -> remove-fd monitor function turns on remove flag for fd=4
What was again the reason why we keep removed fds in the fdset at all?
Because if refcount is > 0 for the fd set, then the fd could be in use by a block device. So we keep it around until refcount is decremented to zero, at which point it is safe to close.
But then the refcount is associated with the set, not with any particular fd.
Exactly, yes that's what we're doing. Sorry, I thought that was clear in the design overview I sent earlier today. -- Regards, Corey

Am 09.07.2012 19:35, schrieb Corey Bryant:
On 07/09/2012 11:46 AM, Kevin Wolf wrote:
Am 09.07.2012 17:05, schrieb Corey Bryant:
I'm not sure this is an issue with current design. I know things have changed a bit as the email threads evolved, so I'll paste the current design that I am working from. Please let me know if you still see any issues.
FD passing: ----------- New monitor commands enable adding/removing an fd to/from a set. New monitor command query-fdsets enables querying of current monitor fdsets. The set of fds should all refer to the same file, with each fd having different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags.
Design points: -------------- 1. add-fd -> fd is passed via SCM rights and qemu adds fd to first unused fdset (e.g. /dev/fdset/1) -> add-fd monitor function initializes the monitor inuse flag for the fdset to true -> add-fd monitor function initializes the remove flag for the fd to false -> add-fd returns fdset number and received fd number (e.g fd=3) to caller
2. drive_add file=/dev/fdset/1 -> qemu_open uses the first fd in fdset1 that has access flags matching the qemu_open action flags and has remove flag set to false -> qemu_open increments refcount for the fdset -> Need to make sure that if a command like 'device-add' fails that refcount is not incremented
3. add-fd fdset=1 -> fd is passed via SCM rights -> add-fd monitor function adds the received fd to the specified fdset (or fails if fdset doesn't exist) -> add-fd monitor function initializes the remove flag for the fd to false -> add-fd returns fdset number and received fd number (e.g fd=4) to caller
4. block-commit -> qemu_open performs "reopen" by using the first fd from the fdset that has access flags matching the qemu_open action flags and has remove flag set to false -> qemu_open increments refcount for the fdset -> Need to make sure that if a command like 'block-commit' fails that refcount is not incremented
5. remove-fd fdset=1 fd=4 -> remove-fd monitor function fails if fdset doesn't exist -> remove-fd monitor function turns on remove flag for fd=4
What was again the reason why we keep removed fds in the fdset at all?
Because if refcount is > 0 for the fd set, then the fd could be in use by a block device. So we keep it around until refcount is decremented to zero, at which point it is safe to close.
The removed flag would make sense for a fdset after a hypothetical close-fdset call because the fdset needs to be kept around until the last user closes it, but I think removed fds can be deleted immediately.
fds in an fd set really need to be kept around until zero block devices reference them. At that point, if '(refcount == 0 && (!inuse || remove))' is true, then we'll officially close the fd.
Block devices don't reference an fd in the fdset. There are two references in a block device. The first one is obviously the file descriptor they are using; it is a fd dup()ed from an fd in the fdset, but it's now independent of it. The other reference is the file name that is kept in the BlockDriverState, and it always points to "/dev/fdset/X", that is, the whole fdset instead of a single fd. What happens if you remove a file descriptor from an fdset that is in use, is that you can't reopen the fdset with the flags of the removed file descriptor any more. Which I believe is exactly the expected behaviour. libvirt would use this to revoke r/w access, for example (and which behaviour you already provide by checking removed in qemu_open). Are there any other use cases where it makes a difference whether a file descriptor is kept in the fdset with removed=1 or whether it's actually removed from the fdset?
I think I might have confused remove-fd and close-fdset in earlier emails in this thread, so I hope this isn't inconsistent with what I said before.
Ok no problem.
6. qemu_close (need to replace all close calls in block layer with qemu_close) -> qemu_close decrements refcount for fdset -> qemu_close closes all fds that have (refcount == 0 && (!inuse || remove)) -> qemu_close frees the fdset if no fds remain in it
7. disconnecting the QMP monitor -> monitor disconnect visits all fdsets on monitor and turns off monitor in-use flag for fdset
And close all fds with refcount == 0.
Yes, this makes sense.
It also makes sense to close removed fds with refcount == 0 in the remove-fd function. Basically this will be the same thing we do in qemu_close. We'll close any fds that evaulate the following as true:
(refcount == 0 && (!inuse || remove))
Yes, whatever condition we'll come up with, but it should be the same and checked in all places where its value might change. Kevin

On 07/09/2012 11:05 AM, Corey Bryant wrote:
On 07/09/2012 10:05 AM, Luiz Capitulino wrote:
On Thu, 05 Jul 2012 11:06:56 -0400 Corey Bryant <coreyb@linux.vnet.ibm.com> wrote:
On 07/04/2012 04:09 AM, Kevin Wolf wrote:
Am 03.07.2012 20:21, schrieb Corey Bryant:
On 07/03/2012 02:00 PM, Eric Blake wrote:
On 07/03/2012 11:46 AM, Corey Bryant wrote:
> > Yes, I think adding a +1 to the refcount for the monitor makes > sense. > > I'm a bit unsure how to increment the refcount when a monitor > reconnects > though. Maybe it is as simple as adding a +1 to each fd's > refcount when > the next QMP monitor connects.
Or maybe delay a +1 until after a 'query-fds' - it is not until the monitor has reconnected and learned what fds it should be aware of that incrementing the refcount again makes sense. But that would mean making 'query-fds' track whether this is the first call since the monitor reconnected, as it shouldn't normally increase refcounts.
This doesn't sound ideal.
Yes, it's less than ideal.
The other alternative is that the monitor never re-increments a refcount. Once a monitor disconnects, that fd is lost to the monitor, and a reconnected monitor must pass in a new fd to be re-associated with the fdset. In other words, the monitor's use of an fd is a one-way operation, starting life in use but ending at the first disconnect or remove-fd.
I would vote for this 2nd alternative. As long as we're not introducing an fd leak. And I don't think we are if we decrement the refcount on remove-fd or on QMP disconnect.
In fact, I believe this one is even worse. I can already see hacks like adding a dummy FD with invalid flags and removing it again just to regain control over the fdset...
You earlier suggestion made a lot of sense to me: Whenever a new QMP monitor is connected, increase the refcount. That is, as long as any monitor is there, don't drop any fdsets unless explicitly requested via QMP.
Ok. So refcount would be incremented (for the fd or fdset, whatever we decide on) when QMP reconnects. I'm assuming we wouldn't wait until after a query-fds call.
I'm not sure this is a good idea because we will leak fds if the client forgets about the fds when re-connecting (ie. it was restarted) or if a different client connects to QMP.
If we really want to do that, I think that the right way of doing this is to add a command for clients to re-again ownership of the fds on reconnection.
But to be honest, I don't fully understand why this is needed.
I'm not sure this is an issue with current design. I know things have changed a bit as the email threads evolved, so I'll paste the current design that I am working from. Please let me know if you still see any issues.
FD passing: ----------- New monitor commands enable adding/removing an fd to/from a set. New monitor command query-fdsets enables querying of current monitor fdsets. The set of fds should all refer to the same file, with each fd having different access flags (ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the matching access mode flags.
Design points: -------------- 1. add-fd -> fd is passed via SCM rights and qemu adds fd to first unused fdset (e.g. /dev/fdset/1) -> add-fd monitor function initializes the monitor inuse flag for the fdset to true -> add-fd monitor function initializes the remove flag for the fd to false -> add-fd returns fdset number and received fd number (e.g fd=3) to caller
2. drive_add file=/dev/fdset/1 -> qemu_open uses the first fd in fdset1 that has access flags matching the qemu_open action flags and has remove flag set to false -> qemu_open increments refcount for the fdset -> Need to make sure that if a command like 'device-add' fails that refcount is not incremented
3. add-fd fdset=1 -> fd is passed via SCM rights -> add-fd monitor function adds the received fd to the specified fdset (or fails if fdset doesn't exist) -> add-fd monitor function initializes the remove flag for the fd to false -> add-fd returns fdset number and received fd number (e.g fd=4) to caller
4. block-commit -> qemu_open performs "reopen" by using the first fd from the fdset that has access flags matching the qemu_open action flags and has remove flag set to false -> qemu_open increments refcount for the fdset -> Need to make sure that if a command like 'block-commit' fails that refcount is not incremented
5. remove-fd fdset=1 fd=4 -> remove-fd monitor function fails if fdset doesn't exist -> remove-fd monitor function turns on remove flag for fd=4
6. qemu_close (need to replace all close calls in block layer with qemu_close) -> qemu_close decrements refcount for fdset -> qemu_close closes all fds that have (refcount == 0 && (!inuse || remove)) -> qemu_close frees the fdset if no fds remain in it
7. disconnecting the QMP monitor -> monitor disconnect visits all fdsets on monitor and turns off monitor in-use flag for fdset
8. connecting the QMP monitor -> monitor connect visits all fdsets on monitor and turns on monitor in-use flag for fdset
9. query-fdsets -> returns all fdsets and fds that don't have remove flag on
QMP command examples -------------------- -> { "execute": "add-fd", "arguments": { "fdset": 1 } } <- { "return": { "fdset": 1, "fd": 3 } }
-> { "execute": "remove-fd", "arguments": { "fdset": 1, "fd": 3 } } <- { "return": {} }
-> { "execute":"query-fdsets" } => <- { "return" : { "fdsets": [ { "name": "fdset1", "fds": [ { "fd": 4, "removed": false } ], "refcount": 1, "monitor": true }, { "name": "fdset2", "fds": [ { "fd": 5, "removed": false }, { "fd": 6, "removed": true } ], "refcount": 1, "monitor": true } }
Here's a minor correction to query-fdsets. s/name/fdset and I changed fdset to an int to stay in sync with add-fd and remove-fd. -> { "execute":"query-fdsets" } => <- { "return" : { "fdsets": [ { "fdset": 1, "fds": [ { "fd": 4, "removed": false } ], "refcount": 1, "monitor": true }, { "fdset": 2, "fds": [ { "fd": 5, "removed": false }, { "fd": 6, "removed": true } ], "refcount": 1, "monitor": true } } -- Regards, Corey

Am 03.07.2012 19:03, schrieb Eric Blake:
2. drive_add file=/dev/fdset/1 -> qemu_open uses the first fd from the set that has access flags matching the qemu_open action flags. qemu_open increments refcount for this fd. 3. add-fd /dev/fdset/1 FDSET={M} -> qemu adds fd to set named "/dev/fdset/1" - command returns qemu fd to caller (e.g fd=5). libvirt in-use flag turned on for fd. 3. block-commit -> qemu_open reopens "/dev/fdset/1" by using the first fd from the set that has access flags matching the qemu_open action flags. qemu_open increments refcount for this fd. 4. remove-fd /dev/fdset/1 5 -> caller requests fd==5 be removed from the set. turns libvirt in-use flag off marking the fd ready to be closed when qemu is done with it.
If we decided to not return the individual fd numbers to libvirt, file descriptors would be uniquely identified by an fdset/flags pair here.
Are you saying we'd pass the fdset name and flags parameters on remove-fd to somehow identify the fds to remove?
Passing the flag parameters is not trivial, as that would mean the QMP code would have to define constants mapping to all of the O_* flags that qemu_open supports. It's easier to support closing by fd number.
Good point. So pass-fd or whatever it will be called needs to returnn both fdset number and the individual fd number.
5. qemu_close decrements refcount for fd, and closes fd when refcount is zero and libvirt in use flag is off.
The monitor could just hold another reference, then we save the additional flag. But that's a qemu implementation detail.
I'm not sure I understand what you mean.
pass-fd (or add-fd, whatever name we give it) adds an fd to an fdset, with initial use count of 1 (the use is the monitor). qemu_open() increments the use count. A new qemu_close() wrapper would decrement the use count. And both calling 'remove-fd', or closing the QMP monitor of an fd that has not yet been passed through 'remove-fd', serves as a way to decrement the use count. You'd still have to track whether the monitor is using an fd (to avoid over-decrementing on QMP monitor close), but by having the monitor's use also tracked under the refcount, then refcount reaching 0 is sufficient to auto-close an fd.
I think that also means that re-establishing the client QMP connection would increment
This is an interesting idea. I've never thought about what to do with a reconnect. It felt a bit odd at first, but now your proposal totally makes sense to me.
For some examples:
1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so qemu decrements refcount; refcount of fd=4 is now 0 so qemu closes it
Doesn't the refcount belong to an fdset rather than a single fd? As long as the fdset has still a reference, it's possible to reach the other fds in the set through a reopen. For example: 1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) as a member of fdset1, in use by monitor, refcount 1 2. client calls 'add-fd', qemu is now tracing fd=5 (O_RDONLY) as a member of fdset, in use by monitor, refcount is still 1 3. client calls 'device-add' with /dev/fdset/1 as the backing filename and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 2 4. client crashes, so all tracked fdsets are visited; fdset1 gets its refcount decreased to 1, and both member fds 4 and 5 stay open. If you had refcounted a single fd, fd 5 would be closed now and qemu couldn't switch to O_RDONLY any more.
1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount to 2 3. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so qemu decrements refcount to 1, but leaves fd=4 open because it is still in use by the block device 4. client re-establishes QMP connection, and 'query-fds' lets client learn about fd=4 still being open as part of fdset1, but also informs client that fd is not in use by the monitor
1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount to 2 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in use by the monitor, refcount decremented to 1 but still left open because it is in use by the block device 4. client crashes, so all tracked fds are visited; but fd=4 is already marked as not in use by the monitor, so its refcount is unchanged
1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, but the command fails for some other reason, so the refcount is still 1 at the end of the command (although it may have been temporarily incremented then decremented during the command) 3. client calls 'remove-fd fdset=1 fd=4' to deal with the failure (or QMP connection is closed), so qemu marks fd=4 as no longer in use by the monitor, refcount is now decremented to 0 and fd=4 is closed
Yup, apart from the comment above the examples look good to me.
I think that covers the idea; you need a bool in_use for tracking monitor state (the monitor is in use until either a remove-fd or a monitor connection closes), as well as a ref-count.
Yes, makes sense to me.
We also need a query-fdsets command that lists all fdsets that exist. If
we add information about single fds to the return value of it, we probably don't need a separate query-fd that operates on a single fdset.
Yes, good point. And maybe we don't need 2 commands. query-fdsets could return all the sets and all the fds that are in those sets.
Yes, I think a single query command is good enough here, something like:
{ "execute":"query-fdsets" } => { "return" : { "sets": [ { "name": "fdset1", "fds": [ { "fd": 4, "monitor": true, "refcount": 1 } ] }, { "name": "fdset2", "fds": [ { "fd": 5, "monitor": false, "refcount": 1 }, { "fd": 6, "monitor": true, "refcount": 2 } ] } ] } }
Looks good, this is exactly what I was thinking of.
In use by whom? If it's still in use in qemu (as in "in-use flag would be set") and we have a refcount of zero, then that's a bug.
In use by qemu. I don't think it's a bug. I think there are situations where refcount gets to zero but qemu is still using the fd.
I think the refcount being non-zero _is_ what defines an fd as being in use by qemu (including use by the monitor). Any place you have to close an fd before reopening it is dangerous; the safe way is always to open with the new permissions before closing the old permissions.
There is one case I'm aware of where we need to be careful: Before opening an image, qemu may probe the format. In this case, the image gets opened twice, and the first close comes before the second open. I'm not entirely sure how hard it would be to get rid of that behaviour. If we can't get rid of it, we have a small window that the refcount doesn't really cover, and if we weren't careful it would become racy. This is why I mentioned earlier that maybe we need to defer the refcount decrease a bit. However, I can't see how the in-use flag would make a difference there. If the refcount can't cover it, the in-use flag can't either. Kevin

On 07/04/2012 04:00 AM, Kevin Wolf wrote:
Am 03.07.2012 19:03, schrieb Eric Blake:
2. drive_add file=/dev/fdset/1 -> qemu_open uses the first fd from the set that has access flags matching the qemu_open action flags. qemu_open increments refcount for this fd. 3. add-fd /dev/fdset/1 FDSET={M} -> qemu adds fd to set named "/dev/fdset/1" - command returns qemu fd to caller (e.g fd=5). libvirt in-use flag turned on for fd. 3. block-commit -> qemu_open reopens "/dev/fdset/1" by using the first fd from the set that has access flags matching the qemu_open action flags. qemu_open increments refcount for this fd. 4. remove-fd /dev/fdset/1 5 -> caller requests fd==5 be removed from the set. turns libvirt in-use flag off marking the fd ready to be closed when qemu is done with it.
If we decided to not return the individual fd numbers to libvirt, file descriptors would be uniquely identified by an fdset/flags pair here.
Are you saying we'd pass the fdset name and flags parameters on remove-fd to somehow identify the fds to remove?
Passing the flag parameters is not trivial, as that would mean the QMP code would have to define constants mapping to all of the O_* flags that qemu_open supports. It's easier to support closing by fd number.
Good point. So pass-fd or whatever it will be called needs to returnn both fdset number and the individual fd number.
5. qemu_close decrements refcount for fd, and closes fd when refcount is zero and libvirt in use flag is off.
The monitor could just hold another reference, then we save the additional flag. But that's a qemu implementation detail.
I'm not sure I understand what you mean.
pass-fd (or add-fd, whatever name we give it) adds an fd to an fdset, with initial use count of 1 (the use is the monitor). qemu_open() increments the use count. A new qemu_close() wrapper would decrement the use count. And both calling 'remove-fd', or closing the QMP monitor of an fd that has not yet been passed through 'remove-fd', serves as a way to decrement the use count. You'd still have to track whether the monitor is using an fd (to avoid over-decrementing on QMP monitor close), but by having the monitor's use also tracked under the refcount, then refcount reaching 0 is sufficient to auto-close an fd.
I think that also means that re-establishing the client QMP connection would increment
This is an interesting idea. I've never thought about what to do with a reconnect. It felt a bit odd at first, but now your proposal totally makes sense to me.
For some examples:
1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so qemu decrements refcount; refcount of fd=4 is now 0 so qemu closes it
Doesn't the refcount belong to an fdset rather than a single fd? As long as the fdset has still a reference, it's possible to reach the other fds in the set through a reopen. For example:
1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) as a member of fdset1, in use by monitor, refcount 1 2. client calls 'add-fd', qemu is now tracing fd=5 (O_RDONLY) as a member of fdset, in use by monitor, refcount is still 1 3. client calls 'device-add' with /dev/fdset/1 as the backing filename and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 2 4. client crashes, so all tracked fdsets are visited; fdset1 gets its refcount decreased to 1, and both member fds 4 and 5 stay open.
If you had refcounted a single fd, fd 5 would be closed now and qemu couldn't switch to O_RDONLY any more.
If the O_RDONLY is for a future command, it would make more sense to add that fd before that future command. Or are you passing it at step 2 because it is needed for the probe re-open issue?
1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount to 2 3. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so qemu decrements refcount to 1, but leaves fd=4 open because it is still in use by the block device 4. client re-establishes QMP connection, and 'query-fds' lets client learn about fd=4 still being open as part of fdset1, but also informs client that fd is not in use by the monitor
1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount to 2 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in use by the monitor, refcount decremented to 1 but still left open because it is in use by the block device 4. client crashes, so all tracked fds are visited; but fd=4 is already marked as not in use by the monitor, so its refcount is unchanged
1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, but the command fails for some other reason, so the refcount is still 1 at the end of the command (although it may have been temporarily incremented then decremented during the command) 3. client calls 'remove-fd fdset=1 fd=4' to deal with the failure (or QMP connection is closed), so qemu marks fd=4 as no longer in use by the monitor, refcount is now decremented to 0 and fd=4 is closed
Yup, apart from the comment above the examples look good to me.
I think that covers the idea; you need a bool in_use for tracking monitor state (the monitor is in use until either a remove-fd or a monitor connection closes), as well as a ref-count.
Yes, makes sense to me.
We also need a query-fdsets command that lists all fdsets that exist. If
we add information about single fds to the return value of it, we probably don't need a separate query-fd that operates on a single fdset.
Yes, good point. And maybe we don't need 2 commands. query-fdsets could return all the sets and all the fds that are in those sets.
Yes, I think a single query command is good enough here, something like:
{ "execute":"query-fdsets" } => { "return" : { "sets": [ { "name": "fdset1", "fds": [ { "fd": 4, "monitor": true, "refcount": 1 } ] }, { "name": "fdset2", "fds": [ { "fd": 5, "monitor": false, "refcount": 1 }, { "fd": 6, "monitor": true, "refcount": 2 } ] } ] } }
Looks good, this is exactly what I was thinking of.
In use by whom? If it's still in use in qemu (as in "in-use flag would be set") and we have a refcount of zero, then that's a bug.
In use by qemu. I don't think it's a bug. I think there are situations where refcount gets to zero but qemu is still using the fd.
I think the refcount being non-zero _is_ what defines an fd as being in use by qemu (including use by the monitor). Any place you have to close an fd before reopening it is dangerous; the safe way is always to open with the new permissions before closing the old permissions.
There is one case I'm aware of where we need to be careful: Before opening an image, qemu may probe the format. In this case, the image gets opened twice, and the first close comes before the second open. I'm not entirely sure how hard it would be to get rid of that behaviour.
If we can't get rid of it, we have a small window that the refcount doesn't really cover, and if we weren't careful it would become racy. This is why I mentioned earlier that maybe we need to defer the refcount decrease a bit. However, I can't see how the in-use flag would make a difference there. If the refcount can't cover it, the in-use flag can't either.
Yeah this is a problem. Could we introduce another flag to cover this? -- Regards, Corey

Am 05.07.2012 16:22, schrieb Corey Bryant:
For some examples:
1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so qemu decrements refcount; refcount of fd=4 is now 0 so qemu closes it
Doesn't the refcount belong to an fdset rather than a single fd? As long as the fdset has still a reference, it's possible to reach the other fds in the set through a reopen. For example:
1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) as a member of fdset1, in use by monitor, refcount 1 2. client calls 'add-fd', qemu is now tracing fd=5 (O_RDONLY) as a member of fdset, in use by monitor, refcount is still 1 3. client calls 'device-add' with /dev/fdset/1 as the backing filename and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 2 4. client crashes, so all tracked fdsets are visited; fdset1 gets its refcount decreased to 1, and both member fds 4 and 5 stay open.
If you had refcounted a single fd, fd 5 would be closed now and qemu couldn't switch to O_RDONLY any more.
If the O_RDONLY is for a future command, it would make more sense to add that fd before that future command. Or are you passing it at step 2 because it is needed for the probe re-open issue?
Come on, requiring realistic examples isn't fair. ;-) Swap steps 2 and 3 in the example, then step 3 is just the preparation for a command that uses the new fd. The problem stays the same. Or a live commit operation could be in flight so that qemu must be able to switch on completion without libvirt interaction. There are enough reasons that an fd in an fdset exists and is not opened, but I can't think of one why it should be dropped.
In use by whom? If it's still in use in qemu (as in "in-use flag would be set") and we have a refcount of zero, then that's a bug.
In use by qemu. I don't think it's a bug. I think there are situations where refcount gets to zero but qemu is still using the fd.
I think the refcount being non-zero _is_ what defines an fd as being in use by qemu (including use by the monitor). Any place you have to close an fd before reopening it is dangerous; the safe way is always to open with the new permissions before closing the old permissions.
There is one case I'm aware of where we need to be careful: Before opening an image, qemu may probe the format. In this case, the image gets opened twice, and the first close comes before the second open. I'm not entirely sure how hard it would be to get rid of that behaviour.
If we can't get rid of it, we have a small window that the refcount doesn't really cover, and if we weren't careful it would become racy. This is why I mentioned earlier that maybe we need to defer the refcount decrease a bit. However, I can't see how the in-use flag would make a difference there. If the refcount can't cover it, the in-use flag can't either.
Yeah this is a problem. Could we introduce another flag to cover this?
Adding more refcounts or flags is not a problem, but it doesn't solve it either. The hard question is when to set that flag. I believe it may be easier to just change the block layer so that it opens files only once during bdrv_open(). Kevin

On 07/05/2012 10:51 AM, Kevin Wolf wrote:
Am 05.07.2012 16:22, schrieb Corey Bryant:
For some examples:
1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so qemu decrements refcount; refcount of fd=4 is now 0 so qemu closes it
Doesn't the refcount belong to an fdset rather than a single fd? As long as the fdset has still a reference, it's possible to reach the other fds in the set through a reopen. For example:
1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) as a member of fdset1, in use by monitor, refcount 1 2. client calls 'add-fd', qemu is now tracing fd=5 (O_RDONLY) as a member of fdset, in use by monitor, refcount is still 1 3. client calls 'device-add' with /dev/fdset/1 as the backing filename and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 2 4. client crashes, so all tracked fdsets are visited; fdset1 gets its refcount decreased to 1, and both member fds 4 and 5 stay open.
If you had refcounted a single fd, fd 5 would be closed now and qemu couldn't switch to O_RDONLY any more.
If the O_RDONLY is for a future command, it would make more sense to add that fd before that future command. Or are you passing it at step 2 because it is needed for the probe re-open issue?
Come on, requiring realistic examples isn't fair. ;-)
Heh :)
Swap steps 2 and 3 in the example, then step 3 is just the preparation for a command that uses the new fd. The problem stays the same. Or a live commit operation could be in flight so that qemu must be able to switch on completion without libvirt interaction.
Good point. I thought it would be useful to run through the examples again with each fdset having a refcount, and each fd having an in-use flag. One difference though is that refcount is only incremented/decremented when qemu_open/qemu_close are called. I think this works and covers Kevin's concerns. Actually it might be a bit simpler too. 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so it's in-use flag is on; in-use flag is turned off and fd=4 is left open because it is still in use by the block device (refcount is 1) 4. client re-establishes QMP connection, so all tracked fds are visited, and in-use flags are turned back on; 'query-fds' lets client learn about fd=4 still being open as part of fdset1 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and is left open because it is in use by the block device (refcount is 1) 4. client crashes, so all tracked fds are visited; fd=4 is not in-use but refcount is 1 so it is not closed 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, but the command fails for some other reason, so the refcount is still 0 at the end of the command (although it may have been temporarily incremented then decremented during the command) 3. client calls 'remove-fd fdset=1 fd=4' to deal with the failure (or QMP connection is closed), so qemu marks fd=4 as no longer in-use by the monitor; refcount is 0 so all fds in fdset1 are closed 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'add-fd', qemu is now tracking fd=5 in fdset1 with refcount still 0; fd=5's in-use flag is turned on 3. client crashes, so all tracked fds are visited; fd=4 and fd=5 had not yet been passed to 'remove-fd', so their in-use flags are on; in-use flags are turned off; refcount of fdset1 is 0 so qemu closes all fds in fdset1 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'add-fd', qemu is now tracking fd=5 in fdset1 with refcount still 0; fd=5's in-use flag is turned on 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and fd=4 is closed since the refcount is 0; fd=5 remains open 1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'add-fd', qemu is now tracking fd=5 (O_RDONLY) in fdset1 with refcount still 0; fd=5's in-use flag is turned on 3. client calls 'device-add' with /dev/fdset/1 as the backing filename and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 1 4. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, but fd=4 remains open because refcount of fdset1 is 1 5. client calls 'remove-fd fdset=1 fd=5', so qemu marks fd=5 as no longer in-use by the monitor, and fd=5 remains open because refcount of fdset1 is 1 6. qemu_close(fd=4) is called, refcount of fdset1 is decremented; both fds are closed since refcount is 0 and their in-use flags are off 1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'add-fd', qemu is now tracking fd=5 (O_RDONLY) in fdset1 with refcount still 0; fd=5's in-use flag is turned on 3. client calls 'device-add' with /dev/fdset/1 as the backing filename and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 1 4. client crashes, so all tracked fds are visited; fd=4 and fd=5 have not yet been passed to 'remove-fd', so their in-use flags are on; in-use flags are turned off and both fds are left open because the set is still in use by the block device (refcount is 1) 5. qemu_close(fd=4) is called, refcount of fdset1 is decremented; both fds are closed since refcount is 0 and their in-use flags are off
There are enough reasons that an fd in an fdset exists and is not opened, but I can't think of one why it should be dropped.
In use by whom? If it's still in use in qemu (as in "in-use flag would be set") and we have a refcount of zero, then that's a bug.
In use by qemu. I don't think it's a bug. I think there are situations where refcount gets to zero but qemu is still using the fd.
I think the refcount being non-zero _is_ what defines an fd as being in use by qemu (including use by the monitor). Any place you have to close an fd before reopening it is dangerous; the safe way is always to open with the new permissions before closing the old permissions.
There is one case I'm aware of where we need to be careful: Before opening an image, qemu may probe the format. In this case, the image gets opened twice, and the first close comes before the second open. I'm not entirely sure how hard it would be to get rid of that behaviour.
If we can't get rid of it, we have a small window that the refcount doesn't really cover, and if we weren't careful it would become racy. This is why I mentioned earlier that maybe we need to defer the refcount decrease a bit. However, I can't see how the in-use flag would make a difference there. If the refcount can't cover it, the in-use flag can't either.
Yeah this is a problem. Could we introduce another flag to cover this?
Adding more refcounts or flags is not a problem, but it doesn't solve it either. The hard question is when to set that flag.
I believe it may be easier to just change the block layer so that it opens files only once during bdrv_open().
Kevin
-- Regards, Corey

On 07/05/2012 12:35 PM, Corey Bryant wrote:
On 07/05/2012 10:51 AM, Kevin Wolf wrote:
Am 05.07.2012 16:22, schrieb Corey Bryant:
For some examples:
1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in use by monitor, as member of fdset1 2. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so qemu decrements refcount; refcount of fd=4 is now 0 so qemu closes it
Doesn't the refcount belong to an fdset rather than a single fd? As long as the fdset has still a reference, it's possible to reach the other fds in the set through a reopen. For example:
1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) as a member of fdset1, in use by monitor, refcount 1 2. client calls 'add-fd', qemu is now tracing fd=5 (O_RDONLY) as a member of fdset, in use by monitor, refcount is still 1 3. client calls 'device-add' with /dev/fdset/1 as the backing filename and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 2 4. client crashes, so all tracked fdsets are visited; fdset1 gets its refcount decreased to 1, and both member fds 4 and 5 stay open.
If you had refcounted a single fd, fd 5 would be closed now and qemu couldn't switch to O_RDONLY any more.
If the O_RDONLY is for a future command, it would make more sense to add that fd before that future command. Or are you passing it at step 2 because it is needed for the probe re-open issue?
Come on, requiring realistic examples isn't fair. ;-)
Heh :)
Swap steps 2 and 3 in the example, then step 3 is just the preparation for a command that uses the new fd. The problem stays the same. Or a live commit operation could be in flight so that qemu must be able to switch on completion without libvirt interaction.
Good point.
I thought it would be useful to run through the examples again with each fdset having a refcount, and each fd having an in-use flag. One difference though is that refcount is only incremented/decremented when qemu_open/qemu_close are called. I think this works and covers Kevin's concerns. Actually it might be a bit simpler too.
1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so it's in-use flag is on; in-use flag is turned off and fd=4 is left open because it is still in use by the block device (refcount is 1) 4. client re-establishes QMP connection, so all tracked fds are visited, and in-use flags are turned back on; 'query-fds' lets client learn about fd=4 still being open as part of fdset1
1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and is left open because it is in use by the block device (refcount is 1) 4. client crashes, so all tracked fds are visited; fd=4 is not in-use but refcount is 1 so it is not closed
1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, but the command fails for some other reason, so the refcount is still 0 at the end of the command (although it may have been temporarily incremented then decremented during the command) 3. client calls 'remove-fd fdset=1 fd=4' to deal with the failure (or QMP connection is closed), so qemu marks fd=4 as no longer in-use by the monitor; refcount is 0 so all fds in fdset1 are closed
1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'add-fd', qemu is now tracking fd=5 in fdset1 with refcount still 0; fd=5's in-use flag is turned on 3. client crashes, so all tracked fds are visited; fd=4 and fd=5 had not yet been passed to 'remove-fd', so their in-use flags are on; in-use flags are turned off; refcount of fdset1 is 0 so qemu closes all fds in fdset1
1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'add-fd', qemu is now tracking fd=5 in fdset1 with refcount still 0; fd=5's in-use flag is turned on 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and fd=4 is closed since the refcount is 0; fd=5 remains open
1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'add-fd', qemu is now tracking fd=5 (O_RDONLY) in fdset1 with refcount still 0; fd=5's in-use flag is turned on 3. client calls 'device-add' with /dev/fdset/1 as the backing filename and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 1 4. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, but fd=4 remains open because refcount of fdset1 is 1 5. client calls 'remove-fd fdset=1 fd=5', so qemu marks fd=5 as no longer in-use by the monitor, and fd=5 remains open because refcount of fdset1 is 1 6. qemu_close(fd=4) is called, refcount of fdset1 is decremented; both fds are closed since refcount is 0 and their in-use flags are off
1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'add-fd', qemu is now tracking fd=5 (O_RDONLY) in fdset1 with refcount still 0; fd=5's in-use flag is turned on 3. client calls 'device-add' with /dev/fdset/1 as the backing filename and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 1 4. client crashes, so all tracked fds are visited; fd=4 and fd=5 have not yet been passed to 'remove-fd', so their in-use flags are on; in-use flags are turned off and both fds are left open because the set is still in use by the block device (refcount is 1) 5. qemu_close(fd=4) is called, refcount of fdset1 is decremented; both fds are closed since refcount is 0 and their in-use flags are off
There are enough reasons that an fd in an fdset exists and is not opened, but I can't think of one why it should be dropped.
> In use by whom? If it's still in use in qemu (as in "in-use flag > would > be set") and we have a refcount of zero, then that's a bug. >
In use by qemu. I don't think it's a bug. I think there are situations where refcount gets to zero but qemu is still using the fd.
I think the refcount being non-zero _is_ what defines an fd as being in use by qemu (including use by the monitor). Any place you have to close an fd before reopening it is dangerous; the safe way is always to open with the new permissions before closing the old permissions.
There is one case I'm aware of where we need to be careful: Before opening an image, qemu may probe the format. In this case, the image gets opened twice, and the first close comes before the second open. I'm not entirely sure how hard it would be to get rid of that behaviour.
If we can't get rid of it, we have a small window that the refcount doesn't really cover, and if we weren't careful it would become racy. This is why I mentioned earlier that maybe we need to defer the refcount decrease a bit. However, I can't see how the in-use flag would make a difference there. If the refcount can't cover it, the in-use flag can't either.
Yeah this is a problem. Could we introduce another flag to cover this?
Adding more refcounts or flags is not a problem, but it doesn't solve it either. The hard question is when to set that flag.
I believe it may be easier to just change the block layer so that it opens files only once during bdrv_open().
Can this fix be delivered after the fd passing patch series? -- Regards, Corey

Am 05.07.2012 18:37, schrieb Corey Bryant:
There is one case I'm aware of where we need to be careful: Before opening an image, qemu may probe the format. In this case, the image gets opened twice, and the first close comes before the second open. I'm not entirely sure how hard it would be to get rid of that behaviour.
If we can't get rid of it, we have a small window that the refcount doesn't really cover, and if we weren't careful it would become racy. This is why I mentioned earlier that maybe we need to defer the refcount decrease a bit. However, I can't see how the in-use flag would make a difference there. If the refcount can't cover it, the in-use flag can't either.
Yeah this is a problem. Could we introduce another flag to cover this?
Adding more refcounts or flags is not a problem, but it doesn't solve it either. The hard question is when to set that flag.
I believe it may be easier to just change the block layer so that it opens files only once during bdrv_open().
Can this fix be delivered after the fd passing patch series?
Sure, we can't fix everything at once. Kevin

On 07/05/2012 10:35 AM, Corey Bryant wrote:
1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so it's in-use flag is on; in-use flag is turned off and fd=4 is left open because it is still in use by the block device (refcount is 1) 4. client re-establishes QMP connection, so all tracked fds are visited, and in-use flags are turned back on; 'query-fds' lets client learn about fd=4 still being open as part of fdset1
This says that an in-use fd comes back into use after a crash...
1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and is left open because it is in use by the block device (refcount is 1) 4. client crashes, so all tracked fds are visited; fd=4 is not in-use but refcount is 1 so it is not closed
5. client re-establishes QMP connection, so all tracked fds are visited. What happens to the fd=4 in-use flag? ...but what are the semantics here? If we _always_ turn the in-use flag back on, then that says that even though libvirt successfully asked to close the fd, the reconnection means that libvirt once again has to ask to close things. If we _never_ turn the in-use flag back on, then we've broken the first case above where we want an in-use fd to come back into use after a crash. Maybe that argues for two flags per fd: an in-use flag (there is currently a monitor connection that can manipulate the fd, either because it passed in the fd or because it reconnected) and a removed flag (a monitor called remove-fd, and no longer wants to know about the fd, even if it crashes and reconnects). An fd is then closed when 'refcount == 0 && (!inuse || removed)'. On monitor disconnect, the inuse flag is cleared, and on reconnect, it is set; but that does not impact the removed flag. And the 'query-fds' command would omit any fd with the 'removed' flag, even when the fd is still kept open internally thanks to the refcount being nonzero. But I'm definitely agreeing that tying the refcount to the set rather than to individual fds within the set makes sense; you still avoid the fd leak in that all fds in the set are closed when both the monitor has disavowed the set and when qemu_close() has finished using any of the fds. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 07/05/2012 01:00 PM, Eric Blake wrote:
On 07/05/2012 10:35 AM, Corey Bryant wrote:
1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client crashes, so all tracked fds are visited; fd=4 had not yet been passed to 'remove-fd', so it's in-use flag is on; in-use flag is turned off and fd=4 is left open because it is still in use by the block device (refcount is 1) 4. client re-establishes QMP connection, so all tracked fds are visited, and in-use flags are turned back on; 'query-fds' lets client learn about fd=4 still being open as part of fdset1
This says that an in-use fd comes back into use after a crash...
1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and is left open because it is in use by the block device (refcount is 1) 4. client crashes, so all tracked fds are visited; fd=4 is not in-use but refcount is 1 so it is not closed
5. client re-establishes QMP connection, so all tracked fds are visited. What happens to the fd=4 in-use flag?
...but what are the semantics here?
If we _always_ turn the in-use flag back on, then that says that even though libvirt successfully asked to close the fd, the reconnection means that libvirt once again has to ask to close things.
Ah, yeah I missed that.
If we _never_ turn the in-use flag back on, then we've broken the first case above where we want an in-use fd to come back into use after a crash.
Maybe that argues for two flags per fd: an in-use flag (there is currently a monitor connection that can manipulate the fd, either because it passed in the fd or because it reconnected) and a removed flag (a monitor called remove-fd, and no longer wants to know about the fd, even if it crashes and reconnects). An fd is then closed when 'refcount == 0 && (!inuse || removed)'. On monitor disconnect, the inuse flag is cleared, and on reconnect, it is set; but that does not impact the removed flag. And the 'query-fds' command would omit any fd with the 'removed' flag, even when the fd is still kept open internally thanks to the refcount being nonzero.
I agree. Having the 2 flags makes sense and solves the issues you mentioned above. -- Regards, Corey
But I'm definitely agreeing that tying the refcount to the set rather than to individual fds within the set makes sense; you still avoid the fd leak in that all fds in the set are closed when both the monitor has disavowed the set and when qemu_close() has finished using any of the fds.

Am 05.07.2012 19:00, schrieb Eric Blake:
1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and is left open because it is in use by the block device (refcount is 1) 4. client crashes, so all tracked fds are visited; fd=4 is not in-use but refcount is 1 so it is not closed
On 07/05/2012 10:35 AM, Corey Bryant wrote: 5. client re-establishes QMP connection, so all tracked fds are visited. What happens to the fd=4 in-use flag?
...but what are the semantics here?
If we _always_ turn the in-use flag back on, then that says that even though libvirt successfully asked to close the fd, the reconnection means that libvirt once again has to ask to close things.
If we _never_ turn the in-use flag back on, then we've broken the first case above where we want an in-use fd to come back into use after a crash.
Maybe that argues for two flags per fd: an in-use flag (there is currently a monitor connection that can manipulate the fd, either because it passed in the fd or because it reconnected) and a removed flag (a monitor called remove-fd, and no longer wants to know about the fd, even if it crashes and reconnects).
I was in fact just going to suggest a removed flag as well, however combined with including the monitor connections in the refcount instead of an additional flag. This would also allow to have (the currently mostly hypothetical case of) multiple QMP monitors without interesting things happening. Maybe I'm missing some point that the inuse flag would allow and including monitors in the refcount doesn't. Is there one? Kevin

On 07/06/2012 05:11 AM, Kevin Wolf wrote:
Am 05.07.2012 19:00, schrieb Eric Blake:
1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and is left open because it is in use by the block device (refcount is 1) 4. client crashes, so all tracked fds are visited; fd=4 is not in-use but refcount is 1 so it is not closed
On 07/05/2012 10:35 AM, Corey Bryant wrote: 5. client re-establishes QMP connection, so all tracked fds are visited. What happens to the fd=4 in-use flag?
...but what are the semantics here?
If we _always_ turn the in-use flag back on, then that says that even though libvirt successfully asked to close the fd, the reconnection means that libvirt once again has to ask to close things.
If we _never_ turn the in-use flag back on, then we've broken the first case above where we want an in-use fd to come back into use after a crash.
Maybe that argues for two flags per fd: an in-use flag (there is currently a monitor connection that can manipulate the fd, either because it passed in the fd or because it reconnected) and a removed flag (a monitor called remove-fd, and no longer wants to know about the fd, even if it crashes and reconnects).
I was in fact just going to suggest a removed flag as well, however combined with including the monitor connections in the refcount instead of an additional flag. This would also allow to have (the currently mostly hypothetical case of) multiple QMP monitors without interesting things happening.
Maybe I'm missing some point that the inuse flag would allow and including monitors in the refcount doesn't. Is there one?
Kevin
I think we need the granularity of having an in-use flag per fd. Here's an example of why: 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 1; fd=4's remove flag is initialized to off 2. client calls 'device-add' with /dev/fdset/1 as the backing filename; qemu_open() increments the refcount of fdset1 to 2 3. client crashes, so all fdsets are visited; fd=4 had not yet been passed to 'remove-fd', so it's remove flag is off; refcount for fdset1 is decremented to 1; fd=4 is left open because it is still in use by the block device (refcount is 1) 4. client re-establishes QMP connection, refcount for fdset1 is incremented to 2; 'query-fds' lets client learn about fd=4 still being open as part of fdset1 5. qemu_close is called for fd=4; refcount for fdset1 is decremented to 1; fd=4 remains open as part of fdset1 6. QMP disconnects; refcount for fdset is decremented to zero; fd=4 is closed and fdset1 is freed 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 1 (because of monitor reference); fd=4's remove flag is initialized to off 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, but the command fails for some other reason, so the refcount is still 1 at the end of the command (although it may have been temporarily incremented then decremented during the command) 3. client calls 'remove-fd fdset=1 fd=4' to deal with the failure (or QMP connection is closed), so qemu turns on the remove flag for fd=4; refcount is 0 so all fds in fdset1 are closed I think we need the granularity of having an in-use flag per fd. If we track monitor in-use in the reference count, then we are tracking it for the fdset and I think this could cause leaks. In the following example, we have a refcount for the fdset, an in-use flag per fd, and a remove flag per fd. We're only incrementing/decrementing refcount in qemu_open/qemu_close. In the following example, we have a refcount for the fdset, and a remove flag per fd. We're incrementing refcount when the fdset is first created, when QMP re-connects, and in qemu_open. We're decrementing refcount when QMP disconnects, and in qemu_close. 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 1; fd=4's remove flag is initialized to off 2. client crashes, so all tracked fdsets are visited; fd=4 has not yet been passed to 'remove-fd', so its remove flag is off; in-use flags are turned off and both fds are left open because the set is still in use by the block device (refcount is 1) 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 1; fd=4's remove flag is initialized to off 2. client calls 'device-add' with /dev/fdset/1 as the backing filename; qemu_open() increments the refcount of fdset1 to 2 3. client crashes, so all fdsets are visited; fd=4 had not yet been passed to 'remove-fd', so it's remove flag is off; refcount for fdset1 is decremented to 1; fd=4 is left open because it is still in use by the block device (refcount is 1) 4. client re-establishes QMP connection, refcount for fdset1 is incremented to 2; 'query-fds' lets client learn about fd=4 still being open as part of fdset1 5. qemu_close is called for fd=4; refcount for fdset1 is decremented to 1; fd=4 remains open as part of fdset1 6. QMP disconnects; refcount for fdset is decremented to zero; fd=4 is closed and fdset1 is freed 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and is left open because it is in use by the block device (refcount is 1) 4. client crashes, so all tracked fds are visited; fd=4 is not in-use but refcount is 1 so it is not closed 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, but the command fails for some other reason, so the refcount is still 0 at the end of the command (although it may have been temporarily incremented then decremented during the command) 3. client calls 'remove-fd fdset=1 fd=4' to deal with the failure (or QMP connection is closed), so qemu marks fd=4 as no longer in-use by the monitor; refcount is 0 so all fds in fdset1 are closed 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'add-fd', qemu is now tracking fd=5 in fdset1 with refcount still 0; fd=5's in-use flag is turned on 3. client crashes, so all tracked fds are visited; fd=4 and fd=5 had not yet been passed to 'remove-fd', so their in-use flags are on; in-use flags are turned off; refcount of fdset1 is 0 so qemu closes all fds in fdset1 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'add-fd', qemu is now tracking fd=5 in fdset1 with refcount still 0; fd=5's in-use flag is turned on 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and fd=4 is closed since the refcount is 0; fd=5 remains open 1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'add-fd', qemu is now tracking fd=5 (O_RDONLY) in fdset1 with refcount still 0; fd=5's in-use flag is turned on 3. client calls 'device-add' with /dev/fdset/1 as the backing filename and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 1 4. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, but fd=4 remains open because refcount of fdset1 is 1 5. client calls 'remove-fd fdset=1 fd=5', so qemu marks fd=5 as no longer in-use by the monitor, and fd=5 remains open because refcount of fdset1 is 1 6. qemu_close(fd=4) is called, refcount of fdset1 is decremented; both fds are closed since refcount is 0 and their in-use flags are off 1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'add-fd', qemu is now tracking fd=5 (O_RDONLY) in fdset1 with refcount still 0; fd=5's in-use flag is turned on 3. client calls 'device-add' with /dev/fdset/1 as the backing filename and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 1 4. client crashes, so all tracked fds are visited; fd=4 and fd=5 have not yet been passed to 'remove-fd', so their in-use flags are on; in-use flags are turned off and both fds are left open because the set is still in use by the block device (refcount is 1) 5. qemu_close(fd=4) is called, refcount of fdset1 is decremented; both fds are closed since refcount is 0 and their in-use flags are off -- Regards, Corey

Ugh... please disregard this. I hit send accidentally. On 07/06/2012 01:14 PM, Corey Bryant wrote:
On 07/06/2012 05:11 AM, Kevin Wolf wrote:
Am 05.07.2012 19:00, schrieb Eric Blake:
1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and is left open because it is in use by the block device (refcount is 1) 4. client crashes, so all tracked fds are visited; fd=4 is not in-use but refcount is 1 so it is not closed
On 07/05/2012 10:35 AM, Corey Bryant wrote: 5. client re-establishes QMP connection, so all tracked fds are visited. What happens to the fd=4 in-use flag?
...but what are the semantics here?
If we _always_ turn the in-use flag back on, then that says that even though libvirt successfully asked to close the fd, the reconnection means that libvirt once again has to ask to close things.
If we _never_ turn the in-use flag back on, then we've broken the first case above where we want an in-use fd to come back into use after a crash.
Maybe that argues for two flags per fd: an in-use flag (there is currently a monitor connection that can manipulate the fd, either because it passed in the fd or because it reconnected) and a removed flag (a monitor called remove-fd, and no longer wants to know about the fd, even if it crashes and reconnects).
I was in fact just going to suggest a removed flag as well, however combined with including the monitor connections in the refcount instead of an additional flag. This would also allow to have (the currently mostly hypothetical case of) multiple QMP monitors without interesting things happening.
Maybe I'm missing some point that the inuse flag would allow and including monitors in the refcount doesn't. Is there one?
Kevin
I think we need the granularity of having an in-use flag per fd. Here's an example of why:
1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 1; fd=4's remove flag is initialized to off 2. client calls 'device-add' with /dev/fdset/1 as the backing filename; qemu_open() increments the refcount of fdset1 to 2 3. client crashes, so all fdsets are visited; fd=4 had not yet been passed to 'remove-fd', so it's remove flag is off; refcount for fdset1 is decremented to 1; fd=4 is left open because it is still in use by the block device (refcount is 1) 4. client re-establishes QMP connection, refcount for fdset1 is incremented to 2; 'query-fds' lets client learn about fd=4 still being open as part of fdset1 5. qemu_close is called for fd=4; refcount for fdset1 is decremented to 1; fd=4 remains open as part of fdset1 6. QMP disconnects; refcount for fdset is decremented to zero; fd=4 is closed and fdset1 is freed
1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 1 (because of monitor reference); fd=4's remove flag is initialized to off 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, but the command fails for some other reason, so the refcount is still 1 at the end of the command (although it may have been temporarily incremented then decremented during the command) 3. client calls 'remove-fd fdset=1 fd=4' to deal with the failure (or QMP connection is closed), so qemu turns on the remove flag for fd=4; refcount is 0 so all fds in fdset1 are closed
I think we need the granularity of having an in-use flag per fd. If we track monitor in-use in the reference count, then we are tracking it for the fdset and I think this could cause leaks.
In the following example, we have a refcount for the fdset, an in-use flag per fd, and a remove flag per fd. We're only incrementing/decrementing refcount in qemu_open/qemu_close.
In the following example, we have a refcount for the fdset, and a remove flag per fd. We're incrementing refcount when the fdset is first created, when QMP re-connects, and in qemu_open. We're decrementing refcount when QMP disconnects, and in qemu_close.
1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 1; fd=4's remove flag is initialized to off 2. client crashes, so all tracked fdsets are visited; fd=4 has not yet been passed to 'remove-fd', so its remove flag is off; in-use flags are turned off and both fds are left open because the set is still in use by the block device (refcount is 1)
1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 1; fd=4's remove flag is initialized to off 2. client calls 'device-add' with /dev/fdset/1 as the backing filename; qemu_open() increments the refcount of fdset1 to 2 3. client crashes, so all fdsets are visited; fd=4 had not yet been passed to 'remove-fd', so it's remove flag is off; refcount for fdset1 is decremented to 1; fd=4 is left open because it is still in use by the block device (refcount is 1) 4. client re-establishes QMP connection, refcount for fdset1 is incremented to 2; 'query-fds' lets client learn about fd=4 still being open as part of fdset1 5. qemu_close is called for fd=4; refcount for fdset1 is decremented to 1; fd=4 remains open as part of fdset1 6. QMP disconnects; refcount for fdset is decremented to zero; fd=4 is closed and fdset1 is freed
1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and is left open because it is in use by the block device (refcount is 1) 4. client crashes, so all tracked fds are visited; fd=4 is not in-use but refcount is 1 so it is not closed
1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, but the command fails for some other reason, so the refcount is still 0 at the end of the command (although it may have been temporarily incremented then decremented during the command) 3. client calls 'remove-fd fdset=1 fd=4' to deal with the failure (or QMP connection is closed), so qemu marks fd=4 as no longer in-use by the monitor; refcount is 0 so all fds in fdset1 are closed
1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'add-fd', qemu is now tracking fd=5 in fdset1 with refcount still 0; fd=5's in-use flag is turned on 3. client crashes, so all tracked fds are visited; fd=4 and fd=5 had not yet been passed to 'remove-fd', so their in-use flags are on; in-use flags are turned off; refcount of fdset1 is 0 so qemu closes all fds in fdset1
1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'add-fd', qemu is now tracking fd=5 in fdset1 with refcount still 0; fd=5's in-use flag is turned on 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and fd=4 is closed since the refcount is 0; fd=5 remains open
1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'add-fd', qemu is now tracking fd=5 (O_RDONLY) in fdset1 with refcount still 0; fd=5's in-use flag is turned on 3. client calls 'device-add' with /dev/fdset/1 as the backing filename and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 1 4. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, but fd=4 remains open because refcount of fdset1 is 1 5. client calls 'remove-fd fdset=1 fd=5', so qemu marks fd=5 as no longer in-use by the monitor, and fd=5 remains open because refcount of fdset1 is 1 6. qemu_close(fd=4) is called, refcount of fdset1 is decremented; both fds are closed since refcount is 0 and their in-use flags are off
1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'add-fd', qemu is now tracking fd=5 (O_RDONLY) in fdset1 with refcount still 0; fd=5's in-use flag is turned on 3. client calls 'device-add' with /dev/fdset/1 as the backing filename and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 1 4. client crashes, so all tracked fds are visited; fd=4 and fd=5 have not yet been passed to 'remove-fd', so their in-use flags are on; in-use flags are turned off and both fds are left open because the set is still in use by the block device (refcount is 1) 5. qemu_close(fd=4) is called, refcount of fdset1 is decremented; both fds are closed since refcount is 0 and their in-use flags are off
-- Regards, Corey

On 07/06/2012 05:11 AM, Kevin Wolf wrote:
Am 05.07.2012 19:00, schrieb Eric Blake:
1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and is left open because it is in use by the block device (refcount is 1) 4. client crashes, so all tracked fds are visited; fd=4 is not in-use but refcount is 1 so it is not closed
On 07/05/2012 10:35 AM, Corey Bryant wrote: 5. client re-establishes QMP connection, so all tracked fds are visited. What happens to the fd=4 in-use flag?
...but what are the semantics here?
If we _always_ turn the in-use flag back on, then that says that even though libvirt successfully asked to close the fd, the reconnection means that libvirt once again has to ask to close things.
If we _never_ turn the in-use flag back on, then we've broken the first case above where we want an in-use fd to come back into use after a crash.
Maybe that argues for two flags per fd: an in-use flag (there is currently a monitor connection that can manipulate the fd, either because it passed in the fd or because it reconnected) and a removed flag (a monitor called remove-fd, and no longer wants to know about the fd, even if it crashes and reconnects).
I was in fact just going to suggest a removed flag as well, however combined with including the monitor connections in the refcount instead of an additional flag. This would also allow to have (the currently mostly hypothetical case of) multiple QMP monitors without interesting things happening.
Maybe I'm missing some point that the inuse flag would allow and including monitors in the refcount doesn't. Is there one?
Kevin
Ok let me try this again. I was going through some of the examples and I think we need a separate in-use flag. Otherwise, when refcount gets to 1, we don't know if it is because of a monitor reference or a block device reference. I think it would cause fds to sit on the monitor until refcount gets to zero (monitor disconnects). Here's an example without the in-use flag: 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 1 (incremented because of monitor reference); fd=4's remove flag is initialized to off 2. client calls 'device-add' with /dev/fdset/1 as the backing filename; qemu_open() increments the refcount of fdset1 to 2 3. client crashes, so all fdsets are visited; fd=4 had not yet been passed to 'remove-fd', so it's remove flag is off; refcount for fdset1 is decremented to 1; fd=4 is left open because it is still in use by the block device (refcount is 1) 4. client re-establishes QMP connection, refcount for fdset1 is incremented to 2; 'query-fds' lets client learn about fd=4 still being open as part of fdset1 5. client calls 'remove-fd fdset=1 fd=4'; qemu turns on remove flag for fd=4; but fd=4 remains open because refcount of fdset1 is 2 6. qemu_close is called for fd=4; refcount for fdset1 is decremented to 1; fd=4 remains open because monitor still references fdset1 (refcount of fdset1 is 1) 7. Sometime later.. QMP disconnects; refcount for fdset is decremented to zero; fd=4 is closed In the following case, we have an in-use and remove flag per fd and we only increment/decrement refcount on qemu_open/qemu_close: 1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's remove flag is initialized to off and in-use flag is initialized to on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename; qemu_open() increments the refcount of fdset1 to 1 3. client crashes, so all fdsets are visited; fd=4 had not yet been passed to 'remove-fd', so it's remove flag is off; fd=4's in-use flag is turned off; fd=4 is left open because it is still in-use by the block device (refcount is still 1) 4. client re-establishes QMP connection, refcount for fdset1 is still 1; fd=4's in-use flag is turned on; 'query-fds' lets client learn about fd=4 still being open as part of fdset1 5. client calls 'remove-fd fdset=1 fd=4'; qemu turns on remove flag for fd=4; but fd=4 remains open because refcount of fdset1 is 1 6. qemu_close is called for fd=4; refcount for fdset1 is decremented to 0; fd=4 is closed because (refcount == 0 && (!inuse || removed)) is true 7. Sometime later.. QMP disconnects -- Regards, Corey

On 07/06/2012 01:40 PM, Corey Bryant wrote:
On 07/06/2012 05:11 AM, Kevin Wolf wrote:
Am 05.07.2012 19:00, schrieb Eric Blake:
1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and is left open because it is in use by the block device (refcount is 1) 4. client crashes, so all tracked fds are visited; fd=4 is not in-use but refcount is 1 so it is not closed
On 07/05/2012 10:35 AM, Corey Bryant wrote: 5. client re-establishes QMP connection, so all tracked fds are visited. What happens to the fd=4 in-use flag?
...but what are the semantics here?
If we _always_ turn the in-use flag back on, then that says that even though libvirt successfully asked to close the fd, the reconnection means that libvirt once again has to ask to close things.
If we _never_ turn the in-use flag back on, then we've broken the first case above where we want an in-use fd to come back into use after a crash.
Maybe that argues for two flags per fd: an in-use flag (there is currently a monitor connection that can manipulate the fd, either because it passed in the fd or because it reconnected) and a removed flag (a monitor called remove-fd, and no longer wants to know about the fd, even if it crashes and reconnects).
I was in fact just going to suggest a removed flag as well, however combined with including the monitor connections in the refcount instead of an additional flag. This would also allow to have (the currently mostly hypothetical case of) multiple QMP monitors without interesting things happening.
Maybe I'm missing some point that the inuse flag would allow and including monitors in the refcount doesn't. Is there one?
Kevin
Ok let me try this again. I was going through some of the examples and I think we need a separate in-use flag. Otherwise, when refcount gets to 1, we don't know if it is because of a monitor reference or a block device reference. I think it would cause fds to sit on the monitor until refcount gets to zero (monitor disconnects). Here's an example without the in-use flag:
1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 1 (incremented because of monitor reference); fd=4's remove flag is initialized to off 2. client calls 'device-add' with /dev/fdset/1 as the backing filename; qemu_open() increments the refcount of fdset1 to 2 3. client crashes, so all fdsets are visited; fd=4 had not yet been passed to 'remove-fd', so it's remove flag is off; refcount for fdset1 is decremented to 1; fd=4 is left open because it is still in use by the block device (refcount is 1) 4. client re-establishes QMP connection, refcount for fdset1 is incremented to 2; 'query-fds' lets client learn about fd=4 still being open as part of fdset1 5. client calls 'remove-fd fdset=1 fd=4'; qemu turns on remove flag for fd=4; but fd=4 remains open because refcount of fdset1 is 2 6. qemu_close is called for fd=4; refcount for fdset1 is decremented to 1; fd=4 remains open because monitor still references fdset1 (refcount of fdset1 is 1) 7. Sometime later.. QMP disconnects; refcount for fdset is decremented to zero; fd=4 is closed
In the following case, we have an in-use and remove flag per fd and we only increment/decrement refcount on qemu_open/qemu_close:
1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's remove flag is initialized to off and in-use flag is initialized to on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename; qemu_open() increments the refcount of fdset1 to 1 3. client crashes, so all fdsets are visited; fd=4 had not yet been passed to 'remove-fd', so it's remove flag is off; fd=4's in-use flag is turned off; fd=4 is left open because it is still in-use by the block device (refcount is still 1) 4. client re-establishes QMP connection, refcount for fdset1 is still 1; fd=4's in-use flag is turned on; 'query-fds' lets client learn about fd=4 still being open as part of fdset1 5. client calls 'remove-fd fdset=1 fd=4'; qemu turns on remove flag for fd=4; but fd=4 remains open because refcount of fdset1 is 1 6. qemu_close is called for fd=4; refcount for fdset1 is decremented to 0; fd=4 is closed because (refcount == 0 && (!inuse || removed)) is true 7. Sometime later.. QMP disconnects
I have one modification to this. It looks the inuse flag could be at the fdset level, rather than an inuse flag per fd. Although.. you did mention multiple monitors, so perhaps inuse should be a counter? -- Regards, Corey

Am 06.07.2012 19:40, schrieb Corey Bryant:
On 07/06/2012 05:11 AM, Kevin Wolf wrote:
Am 05.07.2012 19:00, schrieb Eric Blake:
1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and is left open because it is in use by the block device (refcount is 1) 4. client crashes, so all tracked fds are visited; fd=4 is not in-use but refcount is 1 so it is not closed
On 07/05/2012 10:35 AM, Corey Bryant wrote: 5. client re-establishes QMP connection, so all tracked fds are visited. What happens to the fd=4 in-use flag?
...but what are the semantics here?
If we _always_ turn the in-use flag back on, then that says that even though libvirt successfully asked to close the fd, the reconnection means that libvirt once again has to ask to close things.
If we _never_ turn the in-use flag back on, then we've broken the first case above where we want an in-use fd to come back into use after a crash.
Maybe that argues for two flags per fd: an in-use flag (there is currently a monitor connection that can manipulate the fd, either because it passed in the fd or because it reconnected) and a removed flag (a monitor called remove-fd, and no longer wants to know about the fd, even if it crashes and reconnects).
I was in fact just going to suggest a removed flag as well, however combined with including the monitor connections in the refcount instead of an additional flag. This would also allow to have (the currently mostly hypothetical case of) multiple QMP monitors without interesting things happening.
Maybe I'm missing some point that the inuse flag would allow and including monitors in the refcount doesn't. Is there one?
Kevin
Ok let me try this again. I was going through some of the examples and I think we need a separate in-use flag. Otherwise, when refcount gets to 1, we don't know if it is because of a monitor reference or a block device reference.
Does it matter?
I think it would cause fds to sit on the monitor until refcount gets to zero (monitor disconnects). Here's an example without the in-use flag:
1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 1 (incremented because of monitor reference); fd=4's remove flag is initialized to off 2. client calls 'device-add' with /dev/fdset/1 as the backing filename; qemu_open() increments the refcount of fdset1 to 2 3. client crashes, so all fdsets are visited; fd=4 had not yet been passed to 'remove-fd', so it's remove flag is off; refcount for fdset1 is decremented to 1; fd=4 is left open because it is still in use by the block device (refcount is 1) 4. client re-establishes QMP connection, refcount for fdset1 is incremented to 2; 'query-fds' lets client learn about fd=4 still being open as part of fdset1 5. client calls 'remove-fd fdset=1 fd=4'; qemu turns on remove flag for fd=4; but fd=4 remains open because refcount of fdset1 is 2
It also decreases the reference count because the monitor doesn't use it any more.
6. qemu_close is called for fd=4; refcount for fdset1 is decremented to 1; fd=4 remains open because monitor still references fdset1 (refcount of fdset1 is 1)
So here the refcount becomes 0 and the fdset is closed.
7. Sometime later.. QMP disconnects; refcount for fdset is decremented to zero; fd=4 is closed
The only question that is a bit unclear to me is whether a remove-fd on one monitor drops the refcount only for this monitor or for all monitors. However, both options can be implemented without additional flags or counters. Kevin

On 07/09/2012 10:04 AM, Kevin Wolf wrote:
Am 06.07.2012 19:40, schrieb Corey Bryant:
On 07/06/2012 05:11 AM, Kevin Wolf wrote:
Am 05.07.2012 19:00, schrieb Eric Blake:
1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 0; fd=4's in-use flag is turned on 2. client calls 'device-add' with /dev/fdset/1 as the backing filename, so qemu_open() increments the refcount of fdset1 to 1 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no longer in-use by the monitor, and is left open because it is in use by the block device (refcount is 1) 4. client crashes, so all tracked fds are visited; fd=4 is not in-use but refcount is 1 so it is not closed
On 07/05/2012 10:35 AM, Corey Bryant wrote: 5. client re-establishes QMP connection, so all tracked fds are visited. What happens to the fd=4 in-use flag?
...but what are the semantics here?
If we _always_ turn the in-use flag back on, then that says that even though libvirt successfully asked to close the fd, the reconnection means that libvirt once again has to ask to close things.
If we _never_ turn the in-use flag back on, then we've broken the first case above where we want an in-use fd to come back into use after a crash.
Maybe that argues for two flags per fd: an in-use flag (there is currently a monitor connection that can manipulate the fd, either because it passed in the fd or because it reconnected) and a removed flag (a monitor called remove-fd, and no longer wants to know about the fd, even if it crashes and reconnects).
I was in fact just going to suggest a removed flag as well, however combined with including the monitor connections in the refcount instead of an additional flag. This would also allow to have (the currently mostly hypothetical case of) multiple QMP monitors without interesting things happening.
Maybe I'm missing some point that the inuse flag would allow and including monitors in the refcount doesn't. Is there one?
Kevin
Ok let me try this again. I was going through some of the examples and I think we need a separate in-use flag. Otherwise, when refcount gets to 1, we don't know if it is because of a monitor reference or a block device reference.
Does it matter?
I think it would cause fds to sit on the monitor until refcount gets to zero (monitor disconnects). Here's an example without the in-use flag:
1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 1 (incremented because of monitor reference); fd=4's remove flag is initialized to off 2. client calls 'device-add' with /dev/fdset/1 as the backing filename; qemu_open() increments the refcount of fdset1 to 2 3. client crashes, so all fdsets are visited; fd=4 had not yet been passed to 'remove-fd', so it's remove flag is off; refcount for fdset1 is decremented to 1; fd=4 is left open because it is still in use by the block device (refcount is 1) 4. client re-establishes QMP connection, refcount for fdset1 is incremented to 2; 'query-fds' lets client learn about fd=4 still being open as part of fdset1 5. client calls 'remove-fd fdset=1 fd=4'; qemu turns on remove flag for fd=4; but fd=4 remains open because refcount of fdset1 is 2
It also decreases the reference count because the monitor doesn't use it any more.
I don't think that will work because refcount is for the entire fdset. So we can't decrement the refcount for every fd that is removed from the fdset. I think it is much simpler if we only increment refcount for an fdset on qemu_open, and only decrement refcount on qemu_close.
6. qemu_close is called for fd=4; refcount for fdset1 is decremented to 1; fd=4 remains open because monitor still references fdset1 (refcount of fdset1 is 1)
So here the refcount becomes 0 and the fdset is closed.
7. Sometime later.. QMP disconnects; refcount for fdset is decremented to zero; fd=4 is closed
The only question that is a bit unclear to me is whether a remove-fd on one monitor drops the refcount only for this monitor or for all monitors. However, both options can be implemented without additional flags or counters.
Before we go back and forth on this thread, would you mind taking a look at the last email I sent to Luiz? It includes all the design points that I'm currently working from. I think it's a good level set and we can work off that thread if there are still any issues. -- Regards, Corey

Am 09.07.2012 17:23, schrieb Corey Bryant:
I think it would cause fds to sit on the monitor until refcount gets to zero (monitor disconnects). Here's an example without the in-use flag:
1. client calls 'add-fd', qemu is now tracking fd=4 in fdset1 with refcount of 1 (incremented because of monitor reference); fd=4's remove flag is initialized to off 2. client calls 'device-add' with /dev/fdset/1 as the backing filename; qemu_open() increments the refcount of fdset1 to 2 3. client crashes, so all fdsets are visited; fd=4 had not yet been passed to 'remove-fd', so it's remove flag is off; refcount for fdset1 is decremented to 1; fd=4 is left open because it is still in use by the block device (refcount is 1) 4. client re-establishes QMP connection, refcount for fdset1 is incremented to 2; 'query-fds' lets client learn about fd=4 still being open as part of fdset1 5. client calls 'remove-fd fdset=1 fd=4'; qemu turns on remove flag for fd=4; but fd=4 remains open because refcount of fdset1 is 2
It also decreases the reference count because the monitor doesn't use it any more.
I don't think that will work because refcount is for the entire fdset. So we can't decrement the refcount for every fd that is removed from the fdset.
I think it is much simpler if we only increment refcount for an fdset on qemu_open, and only decrement refcount on qemu_close.
Ah right... So this would only work if we had explicit fdset-create/close commands, where the former would increase the refcount and the latter decrease it (fdset-open would be optional but I like symmetry) Maybe we need (or want) that anyway, but I need to think more about it first.
6. qemu_close is called for fd=4; refcount for fdset1 is decremented to 1; fd=4 remains open because monitor still references fdset1 (refcount of fdset1 is 1)
So here the refcount becomes 0 and the fdset is closed.
7. Sometime later.. QMP disconnects; refcount for fdset is decremented to zero; fd=4 is closed
The only question that is a bit unclear to me is whether a remove-fd on one monitor drops the refcount only for this monitor or for all monitors. However, both options can be implemented without additional flags or counters.
Before we go back and forth on this thread, would you mind taking a look at the last email I sent to Luiz? It includes all the design points that I'm currently working from. I think it's a good level set and we can work off that thread if there are still any issues.
Ok, I'll have a look. Kevin

Am 27.06.2012 00:28, schrieb Corey Bryant:
On 06/26/2012 04:50 PM, Luiz Capitulino wrote:
On Tue, 26 Jun 2012 13:45:52 +0200 Kevin Wolf <kwolf@redhat.com> wrote:
Am 26.06.2012 11:10, schrieb Daniel P. Berrange:
I was thinking about some of the sources complexity when using FD passing from libvirt and wanted to raise one idea for discussion before we continue.
With this proposed series, we have usage akin to:
1. pass_fd FDSET={M} -> returns a string "/dev/fd/N" showing QEMU's view of the FD 2. drive_add file=/dev/fd/N 3. if failure: close_fd "/dev/fd/N"
In fact, there are more steps:
4. use it successfully 5. close_fd "/dev/fd/N"
I think it would well be possible that qemu just closes the fd when it's not used internally any more.
pass-fd could have a flag indicating qemu to do that.
It seems like libvirt would be in a better position to understand when a file is no longer in use, and then it can call close_fd. No? Of course the the only fd that needs to be closed is the originally passed fd. The dup'd fd's are closed by QEMU.
No, libvirt doesn't know it, because the original file descriptor is still needed when qemu decides to reopen the file. So I think qemu needs some kind of refcounting anyway. One of the references is held by libvirt and it can drop it with close_fd, and the other one would be for the BlockDriverState or whatever you use the FD with. (There's a tricky part: When do you actually close the FD? If libvirt has dropped its reference and qemu reopens, for example because it has just probed the image format, we have a short time where the refcount would be 0, but we can't drop it anyway.) Kevin

On 06/27/2012 04:43 AM, Kevin Wolf wrote:
Am 27.06.2012 00:28, schrieb Corey Bryant:
On 06/26/2012 04:50 PM, Luiz Capitulino wrote:
On Tue, 26 Jun 2012 13:45:52 +0200 Kevin Wolf <kwolf@redhat.com> wrote:
Am 26.06.2012 11:10, schrieb Daniel P. Berrange:
I was thinking about some of the sources complexity when using FD passing from libvirt and wanted to raise one idea for discussion before we continue.
With this proposed series, we have usage akin to:
1. pass_fd FDSET={M} -> returns a string "/dev/fd/N" showing QEMU's view of the FD 2. drive_add file=/dev/fd/N 3. if failure: close_fd "/dev/fd/N"
In fact, there are more steps:
4. use it successfully 5. close_fd "/dev/fd/N"
I think it would well be possible that qemu just closes the fd when it's not used internally any more.
pass-fd could have a flag indicating qemu to do that.
It seems like libvirt would be in a better position to understand when a file is no longer in use, and then it can call close_fd. No? Of course the the only fd that needs to be closed is the originally passed fd. The dup'd fd's are closed by QEMU.
No, libvirt doesn't know it, because the original file descriptor is still needed when qemu decides to reopen the file. So I think qemu needs some kind of refcounting anyway. One of the references is held by libvirt and it can drop it with close_fd, and the other one would be for the BlockDriverState or whatever you use the FD with. (There's a tricky part: When do you actually close the FD? If libvirt has dropped its reference and qemu reopens, for example because it has just probed the image format, we have a short time where the refcount would be 0, but we can't drop it anyway.)
Kevin
Yes, the refcount getting to 0 while the fd is still in use is a tough one. It just seems like the management app would be better equipped to understand when a drive is no longer needed. Isn't this what drive_del is for? If qemu is never told that the drive is no longer needed, then the fd remains on the monitor, and it's not a leak. -- Regards, Corey

On 06/26/2012 05:10 AM, Daniel P. Berrange wrote:
On Fri, Jun 22, 2012 at 02:36:07PM -0400, Corey Bryant wrote:
libvirt's sVirt security driver provides SELinux MAC isolation for Qemu guest processes and their corresponding image files. In other words, sVirt uses SELinux to prevent a QEMU process from opening files that do not belong to it.
sVirt provides this support by labeling guests and resources with security labels that are stored in file system extended attributes. Some file systems, such as NFS, do not support the extended attribute security namespace, and therefore cannot support sVirt isolation.
A solution to this problem is to provide fd passing support, where libvirt opens files and passes file descriptors to QEMU. This, along with SELinux policy to prevent QEMU from opening files, can provide image file isolation for NFS files stored on the same NFS mount.
This patch series adds the pass-fd QMP monitor command, which allows an fd to be passed via SCM_RIGHTS, and returns the received file descriptor. Support is also added to the block layer to allow QEMU to dup the fd when the filename is of the /dev/fd/X format. This is useful if MAC policy prevents QEMU from opening specific types of files.
I was thinking about some of the sources complexity when using FD passing from libvirt and wanted to raise one idea for discussion before we continue.
With this proposed series, we have usage akin to:
1. pass_fd FDSET={M} -> returns a string "/dev/fd/N" showing QEMU's view of the FD 2. drive_add file=/dev/fd/N 3. if failure: close_fd "/dev/fd/N"
My problem is that none of this FD passing is "transactional". If libvirtd crashes or otherwise fails between steps 1 & 2, a FD is left open in QEMU. If libvirtd gets the failure detection wrong in step 2 (eg sees a I/O failure on the monitor, but from QEMU's pov drive_add succeeed), we could end up telling QEMU to close an FD that it is still using for a drive. Likewise if libvirtd fails/crashes between steps 2 & 3 we might not clean up after failure.
I see what you're saying, but if libvirt crashes it seems like there are bigger issues going on than a leaked fd. If libvirt fails, then it should call closefd to prevent leakage. I don't know if it really buys that much of an advantage though. I think one major advantage to having separate commands is that other commands can use an fd passed by pass-fd, not just drive_add.
These aren't new problems with pass_fd - they existed with getfd too of course.
If we were designing this interface with no regard for the historical practice in QEMU, then I feel like we would not even bother to have either 'pass_fd' or 'getfd'. We would pass the FD(s) directly with the "drive_add" command.
Given that we have decided that attaching special semantics to filenames matching "/dev/fd/N" is OK, then I feel we could go one better, and allow the FD to be passed with the "drive_add" (or other) commands directly. All we need do is define slightly different semantics for "/dev/fd/N". Instead of it meaning "use the process FD numbered N", we can define it to mean "use the n'th FD set in the current context". The "context" would be populated with all FDs received with the monitor current command.
So now from a client's POV you'd have a flow like
* drive_add "file=/dev/fd/N" FDSET={N}
IIUC then drive_add would loop and pass each fd in the set via SCM_RIGHTS?
And in QEMU you'd have something like
* handle_monitor_command - recvmsg all FDs, and stash them in a thread local "FDContext" context - invoke monitor command handler - Sees file=/dev/fd/N - Fetch /dev/fd/N from "FDContext" - If success remove /dev/fd/N from "FDContext" - close() all FDs left in "FDContext"
The key point with this is that because the FDs are directly associated with a monitor command, QEMU can /guarantee/ that FDs are never leaked, regardless of client behaviour.
Wouldn't this leak fds if libvirt crashed part way through sending the set of fds? -- Regards, Corey

On Tue, Jun 26, 2012 at 09:52:51AM -0400, Corey Bryant wrote:
So now from a client's POV you'd have a flow like
* drive_add "file=/dev/fd/N" FDSET={N}
IIUC then drive_add would loop and pass each fd in the set via SCM_RIGHTS?
Yes, you'd probably use the JSON to tell QEMU exactly how many FDs you're intending to pass with the command, and then once the command is received, you'd have N*SCM_RIGHTS messages sent/received.
And in QEMU you'd have something like
* handle_monitor_command - recvmsg all FDs, and stash them in a thread local "FDContext" context - invoke monitor command handler - Sees file=/dev/fd/N - Fetch /dev/fd/N from "FDContext" - If success remove /dev/fd/N from "FDContext" - close() all FDs left in "FDContext"
The key point with this is that because the FDs are directly associated with a monitor command, QEMU can /guarantee/ that FDs are never leaked, regardless of client behaviour.
Wouldn't this leak fds if libvirt crashed part way through sending the set of fds?
No, because you've scoped the FDs to the current monitor instance, and the current command being processed you know to close all FDs when the associated command has finished, as well as closing them all if you see EOF on the monitor socket while in the middle of receiving a command. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On Tue, Jun 26, 2012 at 03:11:40PM +0100, Daniel P. Berrange wrote:
On Tue, Jun 26, 2012 at 09:52:51AM -0400, Corey Bryant wrote:
So now from a client's POV you'd have a flow like
* drive_add "file=/dev/fd/N" FDSET={N}
IIUC then drive_add would loop and pass each fd in the set via SCM_RIGHTS?
Yes, you'd probably use the JSON to tell QEMU exactly how many FDs you're intending to pass with the command, and then once the command is received, you'd have N*SCM_RIGHTS messages sent/received.
And in QEMU you'd have something like
* handle_monitor_command - recvmsg all FDs, and stash them in a thread local "FDContext" context - invoke monitor command handler - Sees file=/dev/fd/N - Fetch /dev/fd/N from "FDContext" - If success remove /dev/fd/N from "FDContext" - close() all FDs left in "FDContext"
The key point with this is that because the FDs are directly associated with a monitor command, QEMU can /guarantee/ that FDs are never leaked, regardless of client behaviour.
Wouldn't this leak fds if libvirt crashed part way through sending the set of fds?
No, because you've scoped the FDs to the current monitor instance, and the current command being processed you know to close all FDs when the associated command has finished, as well as closing them all if you see EOF on the monitor socket while in the middle of receiving a command.
Here is a quick proof of concept (ie untested) patch to demonstrate what I mean. It relies on Cory's patch which converts everything to use qemu_open. It is also still valuable to make the change to qemu_open() to support "/dev/fd/N" for passing FDs during QEMU initial startup for CLI args. IMHO, what I propose here is preferrable for QMP clients that our current plan of requiring use of 3 monitor commands (passfd, XXXXX, closefd).
From 55a264b647d90a30c1cc00cb1896535246bcd9b7 Mon Sep 17 00:00:00 2001 From: "Daniel P. Berrange" <berrange@redhat.com> Date: Tue, 26 Jun 2012 15:48:13 +0100 Subject: [PATCH] Support passing FDs with the monitor command that uses them
Add support for a syntax "/dev/fdlist/N", where 'N' refers to the N'th FD that was passed along with the currently processing command in this thread. The QMP client sends { "execute": "drive_add", "arguments": { "filename": "/dev/fdlist/0", "backingfilename": "/dev/fdlist/1" }, "nfds": 2 } followed by "nfds" * SCM_RIGHTS packets The FDs received along with the "drive_add" command are placed in a thread-local list, then the QMP handler function is invoked. The qemu_open() function will resolve any path "/dev/fdlist/0" to use an FD stored in the thread-local list, and dup() it. One the QMP handler function is finished, all FDs in the thread local list are closed. THe reason for choice of "/dev/fdlist/N", is to allow the "/dev/fd/N" syntax to be used for FDs that are passed to QEMU at startup time, which will be in the global FD number namespace, not the thread-local index. The reason for using a thread local is to ensure the FDs are only made available to the currently executing monitor command handler in this thread, and not any other threads. The advantage over using a separate 'passfd' command, is that this guarentees all FDs are closed in QEMU once the QMP command handler is finished. There is no risk of leaks being triggered by the client either intentionally, or via it crashing. NB: this is a proof of concept demonstration of the overall architectural design. The patch would need more work before it was suitable for merging - Pick a better place/file for the fdlist APIs - Do proper error reporting in various places - Check for possibility that qemu_chr_fe_get_msgfd can block or return EAGAIN. diff --git a/monitor.c b/monitor.c index f6107ba..a3a4bd4 100644 --- a/monitor.c +++ b/monitor.c @@ -4490,6 +4490,8 @@ static void handle_qmp_command(JSONMessageParser *parser, QList *tokens) const mon_cmd_t *cmd; const char *cmd_name; Monitor *mon = cur_mon; + int nfds; + size_t i; args = input = NULL; @@ -4535,6 +4537,17 @@ static void handle_qmp_command(JSONMessageParser *parser, QList *tokens) goto err_out; } + nfds = qdict_get_int(input, "nfds"); + for (i = 0 ; i < nfds ; i++) { + int fd = qemu_chr_fe_get_msgfd(mon->chr); + if (fd == -1) { + qerror_report(QERR_FD_NOT_SUPPLIED); + goto err_out; + } + + qemu_fdlist_add(fd); + } + if (handler_is_async(cmd)) { err = qmp_async_cmd_handler(mon, cmd, args); if (err) { @@ -4552,6 +4565,7 @@ err_out: out: QDECREF(input); QDECREF(args); + qemu_fdlist_closeall(); } /** diff --git a/osdep.c b/osdep.c index 03817f0..0849cb6 100644 --- a/osdep.c +++ b/osdep.c @@ -75,6 +75,81 @@ int qemu_madvise(void *addr, size_t len, int advice) #endif } +typedef struct QEMUFDList { + int *fds; + size_t nfds; +} QEMUFDList; + +static pthread_key_t fdlist_key; + +static void qemu_fdlist_cleanup(void *data) +{ + size_t i; + QEMUFDList *fdlist = data; + + if (!fdlist) + return; + + for (i = 0 ; i < fdlist->nfds ; i++) { + close(fdlist->fds[i]); + } +} + +int qemu_fdlist_init(void) +{ + if (pthread_key_create(&fdlist_key, + qemu_fdlist_cleanup) < 0) + return -1; + + return 0; +} + +static QEMUFDList *qemu_fdlist_for_thread(void) +{ + QEMUFDList *fdlist = pthread_getspecific(fdlist_key); + + if (fdlist == NULL) { + fdlist = g_new0(QEMUFDList, 1); + pthread_setspecific(fdlist_key, fdlist); + } + + return fdlist; +} + +void qemu_fdlist_add(int fd) +{ + QEMUFDList *fdlist = qemu_fdlist_for_thread(); + + fdlist->fds = g_renew(int, fdlist->fds, fdlist->nfds + 1); + fdlist->fds[fdlist->nfds++] = fd; +} + +int qemu_fdlist_dup(int idx) +{ + QEMUFDList *fdlist = qemu_fdlist_for_thread(); + + if (idx >= fdlist->nfds) { + errno = EINVAL; + return -1; + } + + return dup(fdlist->fds[idx]); +} + +void qemu_fdlist_closeall(void) +{ + QEMUFDList *fdlist = qemu_fdlist_for_thread(); + size_t i; + + for (i = 0 ; i < fdlist->nfds ; i++) { + close(fdlist->fds[i]); + } + g_free(fdlist->fds); + fdlist->fds = NULL; + fdlist->nfds = 0; +} + + /* * Opens a file with FD_CLOEXEC set @@ -83,6 +158,26 @@ int qemu_open(const char *name, int flags, ...) { int ret; int mode = 0; + const char *p; + + /* Attempt dup of fd for pre-opened file */ + if (strstart(name, "/dev/fdlist/", &p)) { + int idx; + int fd; + + idx = qemu_parse_fd(p); + if (idx == -1) { + return -1; + } + + fd = qemu_fdlist_dup(idx); + if (fd == -1) { + return -1; + } + + /* XXX verify flags match */ + return fd; + } if (flags & O_CREAT) { va_list ap; diff --git a/qemu-common.h b/qemu-common.h index 8f87e41..b6b7203 100644 --- a/qemu-common.h +++ b/qemu-common.h @@ -194,6 +194,11 @@ ssize_t qemu_send_full(int fd, const void *buf, size_t count, int flags) ssize_t qemu_recv_full(int fd, void *buf, size_t count, int flags) QEMU_WARN_UNUSED_RESULT; +int qemu_fdlist_init(void); +void qemu_fdlist_add(int fd); +int qemu_fdlist_dup(int idx); +void qemu_fdlist_closeall(void); + #ifndef _WIN32 int qemu_eventfd(int pipefd[2]); int qemu_pipe(int pipefd[2]); diff --git a/vl.c b/vl.c index 1329c30..e515c84 100644 --- a/vl.c +++ b/vl.c @@ -2321,6 +2321,11 @@ int main(int argc, char **argv, char **envp) QLIST_INIT (&vm_change_state_head); os_setup_early_signal_handling(); + if (qemu_fdlist_init() < 0) { + perror("fdlist"); + exit(EXIT_FAILURE); + } + module_call_init(MODULE_INIT_MACHINE); machine = find_default_machine(); cpu_model = NULL; -- 1.7.10.2 -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 06/26/2012 11:03 AM, Daniel P. Berrange wrote:
On Tue, Jun 26, 2012 at 03:11:40PM +0100, Daniel P. Berrange wrote:
On Tue, Jun 26, 2012 at 09:52:51AM -0400, Corey Bryant wrote:
So now from a client's POV you'd have a flow like
* drive_add "file=/dev/fd/N" FDSET={N}
IIUC then drive_add would loop and pass each fd in the set via SCM_RIGHTS?
Yes, you'd probably use the JSON to tell QEMU exactly how many FDs you're intending to pass with the command, and then once the command is received, you'd have N*SCM_RIGHTS messages sent/received.
And in QEMU you'd have something like
* handle_monitor_command - recvmsg all FDs, and stash them in a thread local "FDContext" context - invoke monitor command handler - Sees file=/dev/fd/N - Fetch /dev/fd/N from "FDContext" - If success remove /dev/fd/N from "FDContext" - close() all FDs left in "FDContext"
The key point with this is that because the FDs are directly associated with a monitor command, QEMU can /guarantee/ that FDs are never leaked, regardless of client behaviour.
Wouldn't this leak fds if libvirt crashed part way through sending the set of fds?
No, because you've scoped the FDs to the current monitor instance, and the current command being processed you know to close all FDs when the associated command has finished, as well as closing them all if you see EOF on the monitor socket while in the middle of receiving a command.
Here is a quick proof of concept (ie untested) patch to demonstrate what I mean. It relies on Cory's patch which converts everything to use qemu_open. It is also still valuable to make the change to qemu_open() to support "/dev/fd/N" for passing FDs during QEMU initial startup for CLI args.
IMHO, what I propose here is preferrable for QMP clients that our current plan of requiring use of 3 monitor commands (passfd, XXXXX, closefd).
Thanks for the PoC. Two other required updates that I can think of would be: 1) Update change, block_stream, block_reopen, snapshot_blkdev, and perhaps other monitor commands to support receiving fd's via SCM_RIGHTS. 2) Support re-opening files with different access modes (O_RDONLY, O_WRONLY, or O_RDWR). This would be similar to the force option for pass-fd. -- Regards, Corey
From 55a264b647d90a30c1cc00cb1896535246bcd9b7 Mon Sep 17 00:00:00 2001 From: "Daniel P. Berrange" <berrange@redhat.com> Date: Tue, 26 Jun 2012 15:48:13 +0100 Subject: [PATCH] Support passing FDs with the monitor command that uses them
Add support for a syntax "/dev/fdlist/N", where 'N' refers to the N'th FD that was passed along with the currently processing command in this thread.
The QMP client sends
{ "execute": "drive_add", "arguments": { "filename": "/dev/fdlist/0", "backingfilename": "/dev/fdlist/1" }, "nfds": 2 }
followed by "nfds" * SCM_RIGHTS packets
The FDs received along with the "drive_add" command are placed in a thread-local list, then the QMP handler function is invoked. The qemu_open() function will resolve any path "/dev/fdlist/0" to use an FD stored in the thread-local list, and dup() it. One the QMP handler function is finished, all FDs in the thread local list are closed.
THe reason for choice of "/dev/fdlist/N", is to allow the "/dev/fd/N" syntax to be used for FDs that are passed to QEMU at startup time, which will be in the global FD number namespace, not the thread-local index.
The reason for using a thread local is to ensure the FDs are only made available to the currently executing monitor command handler in this thread, and not any other threads.
The advantage over using a separate 'passfd' command, is that this guarentees all FDs are closed in QEMU once the QMP command handler is finished. There is no risk of leaks being triggered by the client either intentionally, or via it crashing.
NB: this is a proof of concept demonstration of the overall architectural design. The patch would need more work before it was suitable for merging
- Pick a better place/file for the fdlist APIs - Do proper error reporting in various places - Check for possibility that qemu_chr_fe_get_msgfd can block or return EAGAIN.
diff --git a/monitor.c b/monitor.c index f6107ba..a3a4bd4 100644 --- a/monitor.c +++ b/monitor.c @@ -4490,6 +4490,8 @@ static void handle_qmp_command(JSONMessageParser *parser, QList *tokens) const mon_cmd_t *cmd; const char *cmd_name; Monitor *mon = cur_mon; + int nfds; + size_t i;
args = input = NULL;
@@ -4535,6 +4537,17 @@ static void handle_qmp_command(JSONMessageParser *parser, QList *tokens) goto err_out; }
+ nfds = qdict_get_int(input, "nfds"); + for (i = 0 ; i < nfds ; i++) { + int fd = qemu_chr_fe_get_msgfd(mon->chr); + if (fd == -1) { + qerror_report(QERR_FD_NOT_SUPPLIED); + goto err_out; + } + + qemu_fdlist_add(fd); + } + if (handler_is_async(cmd)) { err = qmp_async_cmd_handler(mon, cmd, args); if (err) { @@ -4552,6 +4565,7 @@ err_out: out: QDECREF(input); QDECREF(args); + qemu_fdlist_closeall(); }
/** diff --git a/osdep.c b/osdep.c index 03817f0..0849cb6 100644 --- a/osdep.c +++ b/osdep.c @@ -75,6 +75,81 @@ int qemu_madvise(void *addr, size_t len, int advice) #endif }
+typedef struct QEMUFDList { + int *fds; + size_t nfds; +} QEMUFDList; + +static pthread_key_t fdlist_key; + +static void qemu_fdlist_cleanup(void *data) +{ + size_t i; + QEMUFDList *fdlist = data; + + if (!fdlist) + return; + + for (i = 0 ; i < fdlist->nfds ; i++) { + close(fdlist->fds[i]); + } +} + +int qemu_fdlist_init(void) +{ + if (pthread_key_create(&fdlist_key, + qemu_fdlist_cleanup) < 0) + return -1; + + return 0; +} + +static QEMUFDList *qemu_fdlist_for_thread(void) +{ + QEMUFDList *fdlist = pthread_getspecific(fdlist_key); + + if (fdlist == NULL) { + fdlist = g_new0(QEMUFDList, 1); + pthread_setspecific(fdlist_key, fdlist); + } + + return fdlist; +} + +void qemu_fdlist_add(int fd) +{ + QEMUFDList *fdlist = qemu_fdlist_for_thread(); + + fdlist->fds = g_renew(int, fdlist->fds, fdlist->nfds + 1); + fdlist->fds[fdlist->nfds++] = fd; +} + +int qemu_fdlist_dup(int idx) +{ + QEMUFDList *fdlist = qemu_fdlist_for_thread(); + + if (idx >= fdlist->nfds) { + errno = EINVAL; + return -1; + } + + return dup(fdlist->fds[idx]); +} + +void qemu_fdlist_closeall(void) +{ + QEMUFDList *fdlist = qemu_fdlist_for_thread(); + size_t i; + + for (i = 0 ; i < fdlist->nfds ; i++) { + close(fdlist->fds[i]); + } + g_free(fdlist->fds); + fdlist->fds = NULL; + fdlist->nfds = 0; +} + +
/* * Opens a file with FD_CLOEXEC set @@ -83,6 +158,26 @@ int qemu_open(const char *name, int flags, ...) { int ret; int mode = 0; + const char *p; + + /* Attempt dup of fd for pre-opened file */ + if (strstart(name, "/dev/fdlist/", &p)) { + int idx; + int fd; + + idx = qemu_parse_fd(p); + if (idx == -1) { + return -1; + } + + fd = qemu_fdlist_dup(idx); + if (fd == -1) { + return -1; + } + + /* XXX verify flags match */ + return fd; + }
if (flags & O_CREAT) { va_list ap; diff --git a/qemu-common.h b/qemu-common.h index 8f87e41..b6b7203 100644 --- a/qemu-common.h +++ b/qemu-common.h @@ -194,6 +194,11 @@ ssize_t qemu_send_full(int fd, const void *buf, size_t count, int flags) ssize_t qemu_recv_full(int fd, void *buf, size_t count, int flags) QEMU_WARN_UNUSED_RESULT;
+int qemu_fdlist_init(void); +void qemu_fdlist_add(int fd); +int qemu_fdlist_dup(int idx); +void qemu_fdlist_closeall(void); + #ifndef _WIN32 int qemu_eventfd(int pipefd[2]); int qemu_pipe(int pipefd[2]); diff --git a/vl.c b/vl.c index 1329c30..e515c84 100644 --- a/vl.c +++ b/vl.c @@ -2321,6 +2321,11 @@ int main(int argc, char **argv, char **envp) QLIST_INIT (&vm_change_state_head); os_setup_early_signal_handling();
+ if (qemu_fdlist_init() < 0) { + perror("fdlist"); + exit(EXIT_FAILURE); + } + module_call_init(MODULE_INIT_MACHINE); machine = find_default_machine(); cpu_model = NULL;

On 06/26/2012 11:37 AM, Corey Bryant wrote:
On 06/26/2012 11:03 AM, Daniel P. Berrange wrote:
On Tue, Jun 26, 2012 at 03:11:40PM +0100, Daniel P. Berrange wrote:
On Tue, Jun 26, 2012 at 09:52:51AM -0400, Corey Bryant wrote:
So now from a client's POV you'd have a flow like
* drive_add "file=/dev/fd/N" FDSET={N}
IIUC then drive_add would loop and pass each fd in the set via SCM_RIGHTS?
Yes, you'd probably use the JSON to tell QEMU exactly how many FDs you're intending to pass with the command, and then once the command is received, you'd have N*SCM_RIGHTS messages sent/received.
And in QEMU you'd have something like
* handle_monitor_command - recvmsg all FDs, and stash them in a thread local "FDContext" context - invoke monitor command handler - Sees file=/dev/fd/N - Fetch /dev/fd/N from "FDContext" - If success remove /dev/fd/N from "FDContext" - close() all FDs left in "FDContext"
The key point with this is that because the FDs are directly associated with a monitor command, QEMU can /guarantee/ that FDs are never leaked, regardless of client behaviour.
Wouldn't this leak fds if libvirt crashed part way through sending the set of fds?
No, because you've scoped the FDs to the current monitor instance, and the current command being processed you know to close all FDs when the associated command has finished, as well as closing them all if you see EOF on the monitor socket while in the middle of receiving a command.
Here is a quick proof of concept (ie untested) patch to demonstrate what I mean. It relies on Cory's patch which converts everything to use qemu_open. It is also still valuable to make the change to qemu_open() to support "/dev/fd/N" for passing FDs during QEMU initial startup for CLI args.
IMHO, what I propose here is preferrable for QMP clients that our current plan of requiring use of 3 monitor commands (passfd, XXXXX, closefd).
Thanks for the PoC.
Two other required updates that I can think of would be:
1) Update change, block_stream, block_reopen, snapshot_blkdev, and perhaps other monitor commands to support receiving fd's via SCM_RIGHTS.
Nevermind my comment. I see that your PoC supports passing nfds for any QMP command. I'm curious what Kevin's thoughts are on this and the overall approach.
2) Support re-opening files with different access modes (O_RDONLY, O_WRONLY, or O_RDWR). This would be similar to the force option for pass-fd.
I'm still not quite sure how we'd go about this. We need a way to determine the existing QEMU fd that is to be re-associated with the new fd, when using a /dev/fdlist/0 filename. In this new approach, 0 corresponds to an index, not an fd. The prior approach of using the /dev/fd/nnn, where nnn corresponded to an actual QEMU fd, made this easy.
From 55a264b647d90a30c1cc00cb1896535246bcd9b7 Mon Sep 17 00:00:00 2001 From: "Daniel P. Berrange" <berrange@redhat.com> Date: Tue, 26 Jun 2012 15:48:13 +0100 Subject: [PATCH] Support passing FDs with the monitor command that uses them
Add support for a syntax "/dev/fdlist/N", where 'N' refers to the N'th FD that was passed along with the currently processing command in this thread.
The QMP client sends
{ "execute": "drive_add", "arguments": { "filename": "/dev/fdlist/0", "backingfilename": "/dev/fdlist/1" }, "nfds": 2 }
followed by "nfds" * SCM_RIGHTS packets
The FDs received along with the "drive_add" command are placed in a thread-local list, then the QMP handler function is invoked. The qemu_open() function will resolve any path "/dev/fdlist/0" to use an FD stored in the thread-local list, and dup() it. One the QMP handler function is finished, all FDs in the thread local list are closed.
THe reason for choice of "/dev/fdlist/N", is to allow the "/dev/fd/N" syntax to be used for FDs that are passed to QEMU at startup time, which will be in the global FD number namespace, not the thread-local index.
The reason for using a thread local is to ensure the FDs are only made available to the currently executing monitor command handler in this thread, and not any other threads.
The advantage over using a separate 'passfd' command, is that this guarentees all FDs are closed in QEMU once the QMP command handler is finished. There is no risk of leaks being triggered by the client either intentionally, or via it crashing.
NB: this is a proof of concept demonstration of the overall architectural design. The patch would need more work before it was suitable for merging
- Pick a better place/file for the fdlist APIs - Do proper error reporting in various places - Check for possibility that qemu_chr_fe_get_msgfd can block or return EAGAIN.
diff --git a/monitor.c b/monitor.c index f6107ba..a3a4bd4 100644 --- a/monitor.c +++ b/monitor.c @@ -4490,6 +4490,8 @@ static void handle_qmp_command(JSONMessageParser *parser, QList *tokens) const mon_cmd_t *cmd; const char *cmd_name; Monitor *mon = cur_mon; + int nfds; + size_t i;
args = input = NULL;
@@ -4535,6 +4537,17 @@ static void handle_qmp_command(JSONMessageParser *parser, QList *tokens) goto err_out; }
+ nfds = qdict_get_int(input, "nfds"); + for (i = 0 ; i < nfds ; i++) { + int fd = qemu_chr_fe_get_msgfd(mon->chr); + if (fd == -1) { + qerror_report(QERR_FD_NOT_SUPPLIED); + goto err_out; + } + + qemu_fdlist_add(fd); + } + if (handler_is_async(cmd)) { err = qmp_async_cmd_handler(mon, cmd, args); if (err) { @@ -4552,6 +4565,7 @@ err_out: out: QDECREF(input); QDECREF(args); + qemu_fdlist_closeall(); }
/** diff --git a/osdep.c b/osdep.c index 03817f0..0849cb6 100644 --- a/osdep.c +++ b/osdep.c @@ -75,6 +75,81 @@ int qemu_madvise(void *addr, size_t len, int advice) #endif }
+typedef struct QEMUFDList { + int *fds; + size_t nfds; +} QEMUFDList; + +static pthread_key_t fdlist_key; + +static void qemu_fdlist_cleanup(void *data) +{ + size_t i; + QEMUFDList *fdlist = data; + + if (!fdlist) + return; + + for (i = 0 ; i < fdlist->nfds ; i++) { + close(fdlist->fds[i]); + } +} + +int qemu_fdlist_init(void) +{ + if (pthread_key_create(&fdlist_key, + qemu_fdlist_cleanup) < 0) + return -1; + + return 0; +} + +static QEMUFDList *qemu_fdlist_for_thread(void) +{ + QEMUFDList *fdlist = pthread_getspecific(fdlist_key); + + if (fdlist == NULL) { + fdlist = g_new0(QEMUFDList, 1); + pthread_setspecific(fdlist_key, fdlist); + } + + return fdlist; +} + +void qemu_fdlist_add(int fd) +{ + QEMUFDList *fdlist = qemu_fdlist_for_thread(); + + fdlist->fds = g_renew(int, fdlist->fds, fdlist->nfds + 1); + fdlist->fds[fdlist->nfds++] = fd; +} + +int qemu_fdlist_dup(int idx) +{ + QEMUFDList *fdlist = qemu_fdlist_for_thread(); + + if (idx >= fdlist->nfds) { + errno = EINVAL; + return -1; + } + + return dup(fdlist->fds[idx]); +} + +void qemu_fdlist_closeall(void) +{ + QEMUFDList *fdlist = qemu_fdlist_for_thread(); + size_t i; + + for (i = 0 ; i < fdlist->nfds ; i++) { + close(fdlist->fds[i]); + } + g_free(fdlist->fds); + fdlist->fds = NULL; + fdlist->nfds = 0; +} + +
/* * Opens a file with FD_CLOEXEC set @@ -83,6 +158,26 @@ int qemu_open(const char *name, int flags, ...) { int ret; int mode = 0; + const char *p; + + /* Attempt dup of fd for pre-opened file */ + if (strstart(name, "/dev/fdlist/", &p)) { + int idx; + int fd; + + idx = qemu_parse_fd(p); + if (idx == -1) { + return -1; + } + + fd = qemu_fdlist_dup(idx); + if (fd == -1) { + return -1; + } + + /* XXX verify flags match */ + return fd; + }
if (flags & O_CREAT) { va_list ap; diff --git a/qemu-common.h b/qemu-common.h index 8f87e41..b6b7203 100644 --- a/qemu-common.h +++ b/qemu-common.h @@ -194,6 +194,11 @@ ssize_t qemu_send_full(int fd, const void *buf, size_t count, int flags) ssize_t qemu_recv_full(int fd, void *buf, size_t count, int flags) QEMU_WARN_UNUSED_RESULT;
+int qemu_fdlist_init(void); +void qemu_fdlist_add(int fd); +int qemu_fdlist_dup(int idx); +void qemu_fdlist_closeall(void); + #ifndef _WIN32 int qemu_eventfd(int pipefd[2]); int qemu_pipe(int pipefd[2]); diff --git a/vl.c b/vl.c index 1329c30..e515c84 100644 --- a/vl.c +++ b/vl.c @@ -2321,6 +2321,11 @@ int main(int argc, char **argv, char **envp) QLIST_INIT (&vm_change_state_head); os_setup_early_signal_handling();
+ if (qemu_fdlist_init() < 0) { + perror("fdlist"); + exit(EXIT_FAILURE); + } + module_call_init(MODULE_INIT_MACHINE); machine = find_default_machine(); cpu_model = NULL;
-- Regards, Corey

On Tue, Jun 26, 2012 at 02:40:03PM -0400, Corey Bryant wrote: > > > On 06/26/2012 11:37 AM, Corey Bryant wrote: > > > > > >On 06/26/2012 11:03 AM, Daniel P. Berrange wrote: > >>On Tue, Jun 26, 2012 at 03:11:40PM +0100, Daniel P. Berrange wrote: > >>>On Tue, Jun 26, 2012 at 09:52:51AM -0400, Corey Bryant wrote: > >>>>>So now from a client's POV you'd have a flow like > >>>>> > >>>>> * drive_add "file=/dev/fd/N" FDSET={N} > >>>> > >>>>IIUC then drive_add would loop and pass each fd in the set via > >>>>SCM_RIGHTS? > >>> > >>>Yes, you'd probably use the JSON to tell QEMU exactly > >>>how many FDs you're intending to pass with the command, > >>>and then once the command is received, you'd have N*SCM_RIGHTS > >>>messages sent/received. > >>> > >>>>> > >>>>>And in QEMU you'd have something like > >>>>> > >>>>> * handle_monitor_command > >>>>> - recvmsg all FDs, and stash them in a thread local > >>>>>"FDContext" > >>>>> context > >>>>> - invoke monitor command handler > >>>>> - Sees file=/dev/fd/N > >>>>> - Fetch /dev/fd/N from "FDContext" > >>>>> - If success remove /dev/fd/N from "FDContext" > >>>>> - close() all FDs left in "FDContext" > >>>>> > >>>>>The key point with this is that because the FDs are directly > >>>>>associated with a monitor command, QEMU can /guarantee/ that > >>>>>FDs are never leaked, regardless of client behaviour. > >>>> > >>>>Wouldn't this leak fds if libvirt crashed part way through sending > >>>>the set of fds? > >>> > >>>No, because you've scoped the FDs to the current monitor instance, > >>>and the current command being processed you know to close all FDs > >>>when the associated command has finished, as well as closing them > >>>all if you see EOF on the monitor socket while in the middle of > >>>receiving a command. > >> > >>Here is a quick proof of concept (ie untested) patch to demonstrate > >>what I mean. It relies on Cory's patch which converts everything > >>to use qemu_open. It is also still valuable to make the change > >>to qemu_open() to support "/dev/fd/N" for passing FDs during > >>QEMU initial startup for CLI args. > >> > >>IMHO, what I propose here is preferrable for QMP clients that > >>our current plan of requiring use of 3 monitor commands (passfd, > >>XXXXX, closefd). > > > >Thanks for the PoC. > > > >Two other required updates that I can think of would be: > > > >1) Update change, block_stream, block_reopen, snapshot_blkdev, and > >perhaps other monitor commands to support receiving fd's via SCM_RIGHTS. > > > > Nevermind my comment. I see that your PoC supports passing nfds for > any QMP command. > > I'm curious what Kevin's thoughts are on this and the overall approach. > > >2) Support re-opening files with different access modes (O_RDONLY, > >O_WRONLY, or O_RDWR). This would be similar to the force option for > >pass-fd. > > > > I'm still not quite sure how we'd go about this. We need a way to > determine the existing QEMU fd that is to be re-associated with the > new fd, when using a /dev/fdlist/0 filename. In this new approach, > 0 corresponds to an index, not an fd. The prior approach of using > the /dev/fd/nnn, where nnn corresponded to an actual QEMU fd, made > this easy. Hmm, I'm not sure I understand what the use case is for the 'force' parameter ? In my proof of concept I left out the block of code from qemu_open() you had for dup'ing the FD with various different flags set, but that was just for brevity. I think it ought to fit in the same way, unless you're refering to a different area of the code. >> >>@@ -83,6 +158,26 @@ int qemu_open(const char *name, int flags, ...) > >> { > >> int ret; > >> int mode = 0; > >>+ const char *p; > >>+ > >>+ /* Attempt dup of fd for pre-opened file */ > >>+ if (strstart(name, "/dev/fdlist/", &p)) { > >>+ int idx; > >>+ int fd; > >>+ > >>+ idx = qemu_parse_fd(p); > >>+ if (idx == -1) { > >>+ return -1; > >>+ } > >>+ > >>+ fd = qemu_fdlist_dup(idx); > >>+ if (fd == -1) { > >>+ return -1; > >>+ } > >>+ > >>+ /* XXX verify flags match */ eg, this part of the code you had some work related to 'flags' > >>+ return fd; > >>+ } > >> > >> if (flags & O_CREAT) { > >> va_list ap; Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 06/26/2012 04:42 PM, Daniel P. Berrange wrote: > On Tue, Jun 26, 2012 at 02:40:03PM -0400, Corey Bryant wrote: >> >> >> On 06/26/2012 11:37 AM, Corey Bryant wrote: >>> >>> >>> On 06/26/2012 11:03 AM, Daniel P. Berrange wrote: >>>> On Tue, Jun 26, 2012 at 03:11:40PM +0100, Daniel P. Berrange wrote: >>>>> On Tue, Jun 26, 2012 at 09:52:51AM -0400, Corey Bryant wrote: >>>>>>> So now from a client's POV you'd have a flow like >>>>>>> >>>>>>> * drive_add "file=/dev/fd/N" FDSET={N} >>>>>> >>>>>> IIUC then drive_add would loop and pass each fd in the set via >>>>>> SCM_RIGHTS? >>>>> >>>>> Yes, you'd probably use the JSON to tell QEMU exactly >>>>> how many FDs you're intending to pass with the command, >>>>> and then once the command is received, you'd have N*SCM_RIGHTS >>>>> messages sent/received. >>>>> >>>>>>> >>>>>>> And in QEMU you'd have something like >>>>>>> >>>>>>> * handle_monitor_command >>>>>>> - recvmsg all FDs, and stash them in a thread local >>>>>>> "FDContext" >>>>>>> context >>>>>>> - invoke monitor command handler >>>>>>> - Sees file=/dev/fd/N >>>>>>> - Fetch /dev/fd/N from "FDContext" >>>>>>> - If success remove /dev/fd/N from "FDContext" >>>>>>> - close() all FDs left in "FDContext" >>>>>>> >>>>>>> The key point with this is that because the FDs are directly >>>>>>> associated with a monitor command, QEMU can /guarantee/ that >>>>>>> FDs are never leaked, regardless of client behaviour. >>>>>> >>>>>> Wouldn't this leak fds if libvirt crashed part way through sending >>>>>> the set of fds? >>>>> >>>>> No, because you've scoped the FDs to the current monitor instance, >>>>> and the current command being processed you know to close all FDs >>>>> when the associated command has finished, as well as closing them >>>>> all if you see EOF on the monitor socket while in the middle of >>>>> receiving a command. >>>> >>>> Here is a quick proof of concept (ie untested) patch to demonstrate >>>> what I mean. It relies on Cory's patch which converts everything >>>> to use qemu_open. It is also still valuable to make the change >>>> to qemu_open() to support "/dev/fd/N" for passing FDs during >>>> QEMU initial startup for CLI args. >>>> >>>> IMHO, what I propose here is preferrable for QMP clients that >>>> our current plan of requiring use of 3 monitor commands (passfd, >>>> XXXXX, closefd). >>> >>> Thanks for the PoC. >>> >>> Two other required updates that I can think of would be: >>> >>> 1) Update change, block_stream, block_reopen, snapshot_blkdev, and >>> perhaps other monitor commands to support receiving fd's via SCM_RIGHTS. >>> >> >> Nevermind my comment. I see that your PoC supports passing nfds for >> any QMP command. >> >> I'm curious what Kevin's thoughts are on this and the overall approach. >> >>> 2) Support re-opening files with different access modes (O_RDONLY, >>> O_WRONLY, or O_RDWR). This would be similar to the force option for >>> pass-fd. >>> >> >> I'm still not quite sure how we'd go about this. We need a way to >> determine the existing QEMU fd that is to be re-associated with the >> new fd, when using a /dev/fdlist/0 filename. In this new approach, >> 0 corresponds to an index, not an fd. The prior approach of using >> the /dev/fd/nnn, where nnn corresponded to an actual QEMU fd, made >> this easy. > > Hmm, I'm not sure I understand what the use case is for the 'force' > parameter ? In my proof of concept I left out the block of code > from qemu_open() you had for dup'ing the FD with various different > flags set, but that was just for brevity. I think it ought to fit > in the same way, unless you're refering to a different area of the > code. > The access modes (O_RDONLY, O_WRONLY, or O_RDWR) can only be set on the open() call, which is performed by libvirt. fcntl(F_SETFL) can't change them. So the point of pass-fd's force option is to allow libvirt to open() the same file with a new access mode, pass it to qemu, and copy it to the existing qemu fd. For example: fd1 = open("/path/to/file.img", O_RDONLY) fd2 = open("/path/to/file.img", O_RDWR) pass-fd disk0 (fd1) returns fd=3 (/dev/fd/3 in qemu has O_RDONLY) drive_add file:/dev/fd/3 pass-fd disk0 (fd2) returns fd=3 (/dev/fd/3 in qemu now has O_RDWR) This is required because the access_mode flags used by qemu_open() are the same as the passed fd's access_mode flags. And qemu will re-open files with different access modes. >>>>> @@ -83,6 +158,26 @@ int qemu_open(const char *name, int flags, ...) >>>> { >>>> int ret; >>>> int mode = 0; >>>> + const char *p; >>>> + >>>> + /* Attempt dup of fd for pre-opened file */ >>>> + if (strstart(name, "/dev/fdlist/", &p)) { >>>> + int idx; >>>> + int fd; >>>> + >>>> + idx = qemu_parse_fd(p); >>>> + if (idx == -1) { >>>> + return -1; >>>> + } >>>> + >>>> + fd = qemu_fdlist_dup(idx); >>>> + if (fd == -1) { >>>> + return -1; >>>> + } >>>> + >>>> + /* XXX verify flags match */ > > eg, this part of the code you had some work related to > 'flags' > Yes, no problem and thanks again for the code. :) I'm hoping we can get some more input soon so I can get moving in one direction or another. >>>> + return fd; >>>> + } >>>> >>>> if (flags & O_CREAT) { >>>> va_list ap; > > Daniel > -- Regards, Corey

Am 26.06.2012 20:40, schrieb Corey Bryant:
Here is a quick proof of concept (ie untested) patch to demonstrate what I mean. It relies on Cory's patch which converts everything to use qemu_open. It is also still valuable to make the change to qemu_open() to support "/dev/fd/N" for passing FDs during QEMU initial startup for CLI args.
IMHO, what I propose here is preferrable for QMP clients that our current plan of requiring use of 3 monitor commands (passfd, XXXXX, closefd).
Thanks for the PoC.
Two other required updates that I can think of would be:
1) Update change, block_stream, block_reopen, snapshot_blkdev, and perhaps other monitor commands to support receiving fd's via SCM_RIGHTS.
Nevermind my comment. I see that your PoC supports passing nfds for any QMP command.
I'm curious what Kevin's thoughts are on this and the overall approach.
I'm not against introducing this nfd thing as a general feature for QMP commands instead of a separate pass-fd command. It's not obvious to me that everyone would agree with that, so let's CC Luiz at least. The that I'm unsure about is what we should do with qemu reopening the file. If you close the fd immediately, you obviously can't do that any more. Even worse, libvirt doesn't have a unique ID for each passed file descriptor any more, so even though we have introduced a QMP feature for file descriptor passing, we would still need to touch all commands to allow assigning a new fd. I think having one stable original fd that libvirt can refer to is much nicer. Kevin

On Wed, 27 Jun 2012 10:58:01 +0200 Kevin Wolf <kwolf@redhat.com> wrote:
Am 26.06.2012 20:40, schrieb Corey Bryant:
Here is a quick proof of concept (ie untested) patch to demonstrate what I mean. It relies on Cory's patch which converts everything to use qemu_open. It is also still valuable to make the change to qemu_open() to support "/dev/fd/N" for passing FDs during QEMU initial startup for CLI args.
IMHO, what I propose here is preferrable for QMP clients that our current plan of requiring use of 3 monitor commands (passfd, XXXXX, closefd).
Thanks for the PoC.
Two other required updates that I can think of would be:
1) Update change, block_stream, block_reopen, snapshot_blkdev, and perhaps other monitor commands to support receiving fd's via SCM_RIGHTS.
Nevermind my comment. I see that your PoC supports passing nfds for any QMP command.
I'm curious what Kevin's thoughts are on this and the overall approach.
I'm not against introducing this nfd thing as a general feature for QMP commands instead of a separate pass-fd command. It's not obvious to me that everyone would agree with that, so let's CC Luiz at least.
I don't have objections either, but I want to point out two things. First, we've agreed on adding new commands instead of extending existing ones. I feel that not everyone is agreement with this and maybe we could revisit this topic, but in principle new commands should be added. Secondly, this is just a small suggestion, but we're in a point in time where several block commands are being added. I think it's a good moment to think hard how the ideal block API would look like and try to design new commands based on that.

On 06/26/2012 04:10 AM, Daniel P. Berrange wrote:
On Fri, Jun 22, 2012 at 02:36:07PM -0400, Corey Bryant wrote:
libvirt's sVirt security driver provides SELinux MAC isolation for Qemu guest processes and their corresponding image files. In other words, sVirt uses SELinux to prevent a QEMU process from opening files that do not belong to it.
sVirt provides this support by labeling guests and resources with security labels that are stored in file system extended attributes. Some file systems, such as NFS, do not support the extended attribute security namespace, and therefore cannot support sVirt isolation.
A solution to this problem is to provide fd passing support, where libvirt opens files and passes file descriptors to QEMU. This, along with SELinux policy to prevent QEMU from opening files, can provide image file isolation for NFS files stored on the same NFS mount.
This patch series adds the pass-fd QMP monitor command, which allows an fd to be passed via SCM_RIGHTS, and returns the received file descriptor. Support is also added to the block layer to allow QEMU to dup the fd when the filename is of the /dev/fd/X format. This is useful if MAC policy prevents QEMU from opening specific types of files.
I was thinking about some of the sources complexity when using FD passing from libvirt and wanted to raise one idea for discussion before we continue.
With this proposed series, we have usage akin to:
1. pass_fd FDSET={M} -> returns a string "/dev/fd/N" showing QEMU's view of the FD 2. drive_add file=/dev/fd/N 3. if failure: close_fd "/dev/fd/N"
My problem is that none of this FD passing is "transactional".
My original patch series did not suffer from this problem. QEMU owned the file descriptor once it received it from libvirt. I don't think the cited problem (QEMU failing an operation if libvirt was down) is really an actual problem since it would be libvirt that would be issuing the command in the first place (so the command would just fail which libvirt would have to assume anyway if it crashed). I really dislike where this thread has headed with /dev/fdset. This has become extremely complex and cumbersome. Perhaps we should reconsider using an RPC for QEMU to request an fd as this solves all the cited problems in a much simpler fashion. Regards, Anthony Liguori

On Mon, 09 Jul 2012 13:40:34 -0500 Anthony Liguori <aliguori@us.ibm.com> wrote:
On 06/26/2012 04:10 AM, Daniel P. Berrange wrote:
On Fri, Jun 22, 2012 at 02:36:07PM -0400, Corey Bryant wrote:
libvirt's sVirt security driver provides SELinux MAC isolation for Qemu guest processes and their corresponding image files. In other words, sVirt uses SELinux to prevent a QEMU process from opening files that do not belong to it.
sVirt provides this support by labeling guests and resources with security labels that are stored in file system extended attributes. Some file systems, such as NFS, do not support the extended attribute security namespace, and therefore cannot support sVirt isolation.
A solution to this problem is to provide fd passing support, where libvirt opens files and passes file descriptors to QEMU. This, along with SELinux policy to prevent QEMU from opening files, can provide image file isolation for NFS files stored on the same NFS mount.
This patch series adds the pass-fd QMP monitor command, which allows an fd to be passed via SCM_RIGHTS, and returns the received file descriptor. Support is also added to the block layer to allow QEMU to dup the fd when the filename is of the /dev/fd/X format. This is useful if MAC policy prevents QEMU from opening specific types of files.
I was thinking about some of the sources complexity when using FD passing from libvirt and wanted to raise one idea for discussion before we continue.
With this proposed series, we have usage akin to:
1. pass_fd FDSET={M} -> returns a string "/dev/fd/N" showing QEMU's view of the FD 2. drive_add file=/dev/fd/N 3. if failure: close_fd "/dev/fd/N"
My problem is that none of this FD passing is "transactional".
My original patch series did not suffer from this problem.
QEMU owned the file descriptor once it received it from libvirt.
I don't think the cited problem (QEMU failing an operation if libvirt was down) is really an actual problem since it would be libvirt that would be issuing the command in the first place (so the command would just fail which libvirt would have to assume anyway if it crashed).
I really dislike where this thread has headed with /dev/fdset. This has become extremely complex and cumbersome.
I agree, maybe it's time to start over and discuss the original problem again.
Perhaps we should reconsider using an RPC for QEMU to request an fd as this solves all the cited problems in a much simpler fashion.
Regards,
Anthony Liguori

On Mon, Jul 09, 2012 at 04:00:37PM -0300, Luiz Capitulino wrote:
On Mon, 09 Jul 2012 13:40:34 -0500 Anthony Liguori <aliguori@us.ibm.com> wrote:
On 06/26/2012 04:10 AM, Daniel P. Berrange wrote:
On Fri, Jun 22, 2012 at 02:36:07PM -0400, Corey Bryant wrote:
libvirt's sVirt security driver provides SELinux MAC isolation for Qemu guest processes and their corresponding image files. In other words, sVirt uses SELinux to prevent a QEMU process from opening files that do not belong to it.
sVirt provides this support by labeling guests and resources with security labels that are stored in file system extended attributes. Some file systems, such as NFS, do not support the extended attribute security namespace, and therefore cannot support sVirt isolation.
A solution to this problem is to provide fd passing support, where libvirt opens files and passes file descriptors to QEMU. This, along with SELinux policy to prevent QEMU from opening files, can provide image file isolation for NFS files stored on the same NFS mount.
This patch series adds the pass-fd QMP monitor command, which allows an fd to be passed via SCM_RIGHTS, and returns the received file descriptor. Support is also added to the block layer to allow QEMU to dup the fd when the filename is of the /dev/fd/X format. This is useful if MAC policy prevents QEMU from opening specific types of files.
I was thinking about some of the sources complexity when using FD passing from libvirt and wanted to raise one idea for discussion before we continue.
With this proposed series, we have usage akin to:
1. pass_fd FDSET={M} -> returns a string "/dev/fd/N" showing QEMU's view of the FD 2. drive_add file=/dev/fd/N 3. if failure: close_fd "/dev/fd/N"
My problem is that none of this FD passing is "transactional".
My original patch series did not suffer from this problem.
QEMU owned the file descriptor once it received it from libvirt.
I don't think the cited problem (QEMU failing an operation if libvirt was down) is really an actual problem since it would be libvirt that would be issuing the command in the first place (so the command would just fail which libvirt would have to assume anyway if it crashed).
I really dislike where this thread has headed with /dev/fdset. This has become extremely complex and cumbersome.
I agree, maybe it's time to start over and discuss the original problem again.
I must say, I'm not entirely sure of all the problems we're trying to solve anymore. I don't think we've ever clearly stated in this thread what all the requirements/problems are, so I'm finding it hard to see what the optimal solution is. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

Am 09.07.2012 20:40, schrieb Anthony Liguori:
On 06/26/2012 04:10 AM, Daniel P. Berrange wrote:
On Fri, Jun 22, 2012 at 02:36:07PM -0400, Corey Bryant wrote:
libvirt's sVirt security driver provides SELinux MAC isolation for Qemu guest processes and their corresponding image files. In other words, sVirt uses SELinux to prevent a QEMU process from opening files that do not belong to it.
sVirt provides this support by labeling guests and resources with security labels that are stored in file system extended attributes. Some file systems, such as NFS, do not support the extended attribute security namespace, and therefore cannot support sVirt isolation.
A solution to this problem is to provide fd passing support, where libvirt opens files and passes file descriptors to QEMU. This, along with SELinux policy to prevent QEMU from opening files, can provide image file isolation for NFS files stored on the same NFS mount.
This patch series adds the pass-fd QMP monitor command, which allows an fd to be passed via SCM_RIGHTS, and returns the received file descriptor. Support is also added to the block layer to allow QEMU to dup the fd when the filename is of the /dev/fd/X format. This is useful if MAC policy prevents QEMU from opening specific types of files.
I was thinking about some of the sources complexity when using FD passing from libvirt and wanted to raise one idea for discussion before we continue.
With this proposed series, we have usage akin to:
1. pass_fd FDSET={M} -> returns a string "/dev/fd/N" showing QEMU's view of the FD 2. drive_add file=/dev/fd/N 3. if failure: close_fd "/dev/fd/N"
My problem is that none of this FD passing is "transactional".
My original patch series did not suffer from this problem.
QEMU owned the file descriptor once it received it from libvirt.
I don't think the cited problem (QEMU failing an operation if libvirt was down) is really an actual problem since it would be libvirt that would be issuing the command in the first place (so the command would just fail which libvirt would have to assume anyway if it crashed).
I really dislike where this thread has headed with /dev/fdset. This has become extremely complex and cumbersome.
What exactly is complex about the interface we're going to provide? A long discussion about how to get the details implemented best doesn't mean at all that the result is complex.
Perhaps we should reconsider using an RPC for QEMU to request an fd as this solves all the cited problems in a much simpler fashion.
NACK. RPC is wrong and no way easier to handle for management. Kevin
participants (7)
-
Anthony Liguori
-
Corey Bryant
-
Daniel P. Berrange
-
Eric Blake
-
Kevin Wolf
-
Luiz Capitulino
-
Stefan Hajnoczi