Re: [libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd

26 Jun 2012

On 06/26/2012 04:42 PM, Daniel P. Berrange wrote:
> On Tue, Jun 26, 2012 at 02:40:03PM -0400, Corey Bryant wrote:
>>
>>
>> On 06/26/2012 11:37 AM, Corey Bryant wrote:
>>>
>>>
>>> On 06/26/2012 11:03 AM, Daniel P. Berrange wrote:
>>>> On Tue, Jun 26, 2012 at 03:11:40PM +0100, Daniel P. Berrange wrote:
>>>>> On Tue, Jun 26, 2012 at 09:52:51AM -0400, Corey Bryant wrote:
>>>>>>> So now from a client's POV you'd have a flow like
>>>>>>>
>>>>>>>     * drive_add "file=/dev/fd/N"  FDSET={N}
>>>>>>
>>>>>> IIUC then drive_add would loop and pass each fd in the set via
>>>>>> SCM_RIGHTS?
>>>>>
>>>>> Yes, you'd probably use the JSON to tell QEMU exactly
>>>>> how many FDs you're intending to pass with the command,
>>>>> and then once the command is received, you'd have N*SCM_RIGHTS
>>>>> messages sent/received.
>>>>>
>>>>>>>
>>>>>>> And in QEMU you'd have something like
>>>>>>>
>>>>>>>     * handle_monitor_command
>>>>>>>          - recvmsg all FDs, and stash them in a thread local
>>>>>>> "FDContext"
>>>>>>>            context
>>>>>>>          - invoke monitor command handler
>>>>>>>                - Sees file=/dev/fd/N
>>>>>>>                - Fetch /dev/fd/N from "FDContext"
>>>>>>>                - If success remove /dev/fd/N from "FDContext"
>>>>>>>          - close() all FDs left in "FDContext"
>>>>>>>
>>>>>>> The key point with this is that because the FDs are directly
>>>>>>> associated with a monitor command, QEMU can /guarantee/ that
>>>>>>> FDs are never leaked, regardless of client behaviour.
>>>>>>
>>>>>> Wouldn't this leak fds if libvirt crashed part way through sending
>>>>>> the set of fds?
>>>>>
>>>>> No, because you've scoped the FDs to the current monitor instance,
>>>>> and the current command being processed you know to close all FDs
>>>>> when the associated command has finished, as well as closing them
>>>>> all if you see EOF on the monitor socket while in the middle of
>>>>> receiving a command.
>>>>
>>>> Here is a quick proof of concept (ie untested) patch to demonstrate
>>>> what I mean. It relies on Cory's patch which converts everything
>>>> to use qemu_open. It is also still valuable to make the change
>>>> to qemu_open() to support "/dev/fd/N" for passing FDs during
>>>> QEMU initial startup for CLI args.
>>>>
>>>> IMHO, what I propose here is preferrable for QMP clients that
>>>> our current plan of requiring use of 3 monitor commands (passfd,
>>>> XXXXX, closefd).
>>>
>>> Thanks for the PoC.
>>>
>>> Two other required updates that I can think of would be:
>>>
>>> 1) Update change, block_stream, block_reopen, snapshot_blkdev, and
>>> perhaps other monitor commands to support receiving fd's via SCM_RIGHTS.
>>>
>>
>> Nevermind my comment.  I see that your PoC supports passing nfds for
>> any QMP command.
>>
>> I'm curious what Kevin's thoughts are on this and the overall approach.
>>
>>> 2) Support re-opening files with different access modes (O_RDONLY,
>>> O_WRONLY, or O_RDWR).  This would be similar to the force option for
>>> pass-fd.
>>>
>>
>> I'm still not quite sure how we'd go about this.  We need a way to
>> determine the existing QEMU fd that is to be re-associated with the
>> new fd, when using a /dev/fdlist/0 filename.  In this new approach,
>> 0 corresponds to an index, not an fd.  The prior approach of using
>> the /dev/fd/nnn, where nnn corresponded to an actual QEMU fd, made
>> this easy.
>
> Hmm, I'm not sure I understand what the use case is for the 'force'
> parameter ? In my proof of concept I left out the  block of code
> from qemu_open() you had for dup'ing the FD with various different
> flags set, but that was just for brevity. I think it ought to fit
> in the same way, unless you're refering to a different area of the
> code.
>

The access modes (O_RDONLY, O_WRONLY, or O_RDWR) can only be set on the 
open() call, which is performed by libvirt.  fcntl(F_SETFL) can't change 
them.  So the point of pass-fd's force option is to allow libvirt to 
open() the same file with a new access mode, pass it to qemu, and copy 
it to the existing qemu fd. For example:

fd1 = open("/path/to/file.img", O_RDONLY)
fd2 = open("/path/to/file.img", O_RDWR)
pass-fd disk0 (fd1) returns fd=3 (/dev/fd/3 in qemu has O_RDONLY)
drive_add file:/dev/fd/3
pass-fd disk0 (fd2) returns fd=3 (/dev/fd/3 in qemu now has O_RDWR)

This is required because the access_mode flags used by qemu_open() are 
the same as the passed fd's access_mode flags.  And qemu will re-open 
files with different access modes.

>>>>> @@ -83,6 +158,26 @@ int qemu_open(const char *name, int flags, ...)
>>>>   {
>>>>       int ret;
>>>>       int mode = 0;
>>>> +    const char *p;
>>>> +
>>>> +    /* Attempt dup of fd for pre-opened file */
>>>> +    if (strstart(name, "/dev/fdlist/", &p)) {
>>>> +        int idx;
>>>> +    int fd;
>>>> +
>>>> +        idx = qemu_parse_fd(p);
>>>> +        if (idx == -1) {
>>>> +            return -1;
>>>> +        }
>>>> +
>>>> +    fd = qemu_fdlist_dup(idx);
>>>> +    if (fd == -1) {
>>>> +        return -1;
>>>> +    }
>>>> +
>>>> +    /* XXX verify flags match */
>
> eg, this part of the code you had some work related to
> 'flags'
>

Yes, no problem and thanks again for the code. :)  I'm hoping we can get 
some more input soon so I can get moving in one direction or another.

>>>> +    return fd;
>>>> +    }
>>>>
>>>>       if (flags & O_CREAT) {
>>>>           va_list ap;
>
> Daniel
>

-- 
Regards,
Corey