
26 Jun
2012
26 Jun
'12
6:46 p.m.
On 06/26/2012 04:42 PM, Daniel P. Berrange wrote: > On Tue, Jun 26, 2012 at 02:40:03PM -0400, Corey Bryant wrote: >> >> >> On 06/26/2012 11:37 AM, Corey Bryant wrote: >>> >>> >>> On 06/26/2012 11:03 AM, Daniel P. Berrange wrote: >>>> On Tue, Jun 26, 2012 at 03:11:40PM +0100, Daniel P. Berrange wrote: >>>>> On Tue, Jun 26, 2012 at 09:52:51AM -0400, Corey Bryant wrote: >>>>>>> So now from a client's POV you'd have a flow like >>>>>>> >>>>>>> * drive_add "file=/dev/fd/N" FDSET={N} >>>>>> >>>>>> IIUC then drive_add would loop and pass each fd in the set via >>>>>> SCM_RIGHTS? >>>>> >>>>> Yes, you'd probably use the JSON to tell QEMU exactly >>>>> how many FDs you're intending to pass with the command, >>>>> and then once the command is received, you'd have N*SCM_RIGHTS >>>>> messages sent/received. >>>>> >>>>>>> >>>>>>> And in QEMU you'd have something like >>>>>>> >>>>>>> * handle_monitor_command >>>>>>> - recvmsg all FDs, and stash them in a thread local >>>>>>> "FDContext" >>>>>>> context >>>>>>> - invoke monitor command handler >>>>>>> - Sees file=/dev/fd/N >>>>>>> - Fetch /dev/fd/N from "FDContext" >>>>>>> - If success remove /dev/fd/N from "FDContext" >>>>>>> - close() all FDs left in "FDContext" >>>>>>> >>>>>>> The key point with this is that because the FDs are directly >>>>>>> associated with a monitor command, QEMU can /guarantee/ that >>>>>>> FDs are never leaked, regardless of client behaviour. >>>>>> >>>>>> Wouldn't this leak fds if libvirt crashed part way through sending >>>>>> the set of fds? >>>>> >>>>> No, because you've scoped the FDs to the current monitor instance, >>>>> and the current command being processed you know to close all FDs >>>>> when the associated command has finished, as well as closing them >>>>> all if you see EOF on the monitor socket while in the middle of >>>>> receiving a command. >>>> >>>> Here is a quick proof of concept (ie untested) patch to demonstrate >>>> what I mean. It relies on Cory's patch which converts everything >>>> to use qemu_open. It is also still valuable to make the change >>>> to qemu_open() to support "/dev/fd/N" for passing FDs during >>>> QEMU initial startup for CLI args. >>>> >>>> IMHO, what I propose here is preferrable for QMP clients that >>>> our current plan of requiring use of 3 monitor commands (passfd, >>>> XXXXX, closefd). >>> >>> Thanks for the PoC. >>> >>> Two other required updates that I can think of would be: >>> >>> 1) Update change, block_stream, block_reopen, snapshot_blkdev, and >>> perhaps other monitor commands to support receiving fd's via SCM_RIGHTS. >>> >> >> Nevermind my comment. I see that your PoC supports passing nfds for >> any QMP command. >> >> I'm curious what Kevin's thoughts are on this and the overall approach. >> >>> 2) Support re-opening files with different access modes (O_RDONLY, >>> O_WRONLY, or O_RDWR). This would be similar to the force option for >>> pass-fd. >>> >> >> I'm still not quite sure how we'd go about this. We need a way to >> determine the existing QEMU fd that is to be re-associated with the >> new fd, when using a /dev/fdlist/0 filename. In this new approach, >> 0 corresponds to an index, not an fd. The prior approach of using >> the /dev/fd/nnn, where nnn corresponded to an actual QEMU fd, made >> this easy. > > Hmm, I'm not sure I understand what the use case is for the 'force' > parameter ? In my proof of concept I left out the block of code > from qemu_open() you had for dup'ing the FD with various different > flags set, but that was just for brevity. I think it ought to fit > in the same way, unless you're refering to a different area of the > code. > The access modes (O_RDONLY, O_WRONLY, or O_RDWR) can only be set on the open() call, which is performed by libvirt. fcntl(F_SETFL) can't change them. So the point of pass-fd's force option is to allow libvirt to open() the same file with a new access mode, pass it to qemu, and copy it to the existing qemu fd. For example: fd1 = open("/path/to/file.img", O_RDONLY) fd2 = open("/path/to/file.img", O_RDWR) pass-fd disk0 (fd1) returns fd=3 (/dev/fd/3 in qemu has O_RDONLY) drive_add file:/dev/fd/3 pass-fd disk0 (fd2) returns fd=3 (/dev/fd/3 in qemu now has O_RDWR) This is required because the access_mode flags used by qemu_open() are the same as the passed fd's access_mode flags. And qemu will re-open files with different access modes. >>>>> @@ -83,6 +158,26 @@ int qemu_open(const char *name, int flags, ...) >>>> { >>>> int ret; >>>> int mode = 0; >>>> + const char *p; >>>> + >>>> + /* Attempt dup of fd for pre-opened file */ >>>> + if (strstart(name, "/dev/fdlist/", &p)) { >>>> + int idx; >>>> + int fd; >>>> + >>>> + idx = qemu_parse_fd(p); >>>> + if (idx == -1) { >>>> + return -1; >>>> + } >>>> + >>>> + fd = qemu_fdlist_dup(idx); >>>> + if (fd == -1) { >>>> + return -1; >>>> + } >>>> + >>>> + /* XXX verify flags match */ > > eg, this part of the code you had some work related to > 'flags' > Yes, no problem and thanks again for the code. :) I'm hoping we can get some more input soon so I can get moving in one direction or another. >>>> + return fd; >>>> + } >>>> >>>> if (flags & O_CREAT) { >>>> va_list ap; > > Daniel > -- Regards, Corey