On 06/26/2012 04:42 PM, Daniel P. Berrange wrote:
On Tue, Jun 26, 2012 at 02:40:03PM -0400, Corey Bryant wrote:
>
>
> On 06/26/2012 11:37 AM, Corey Bryant wrote:
>>
>>
>> On 06/26/2012 11:03 AM, Daniel P. Berrange wrote:
>>> On Tue, Jun 26, 2012 at 03:11:40PM +0100, Daniel P. Berrange wrote:
>>>> On Tue, Jun 26, 2012 at 09:52:51AM -0400, Corey Bryant wrote:
>>>>>> So now from a client's POV you'd have a flow like
>>>>>>
>>>>>> * drive_add "file=/dev/fd/N" FDSET={N}
>>>>>
>>>>> IIUC then drive_add would loop and pass each fd in the set via
>>>>> SCM_RIGHTS?
>>>>
>>>> Yes, you'd probably use the JSON to tell QEMU exactly
>>>> how many FDs you're intending to pass with the command,
>>>> and then once the command is received, you'd have N*SCM_RIGHTS
>>>> messages sent/received.
>>>>
>>>>>>
>>>>>> And in QEMU you'd have something like
>>>>>>
>>>>>> * handle_monitor_command
>>>>>> - recvmsg all FDs, and stash them in a thread local
>>>>>> "FDContext"
>>>>>> context
>>>>>> - invoke monitor command handler
>>>>>> - Sees file=/dev/fd/N
>>>>>> - Fetch /dev/fd/N from "FDContext"
>>>>>> - If success remove /dev/fd/N from
"FDContext"
>>>>>> - close() all FDs left in "FDContext"
>>>>>>
>>>>>> The key point with this is that because the FDs are directly
>>>>>> associated with a monitor command, QEMU can /guarantee/ that
>>>>>> FDs are never leaked, regardless of client behaviour.
>>>>>
>>>>> Wouldn't this leak fds if libvirt crashed part way through
sending
>>>>> the set of fds?
>>>>
>>>> No, because you've scoped the FDs to the current monitor instance,
>>>> and the current command being processed you know to close all FDs
>>>> when the associated command has finished, as well as closing them
>>>> all if you see EOF on the monitor socket while in the middle of
>>>> receiving a command.
>>>
>>> Here is a quick proof of concept (ie untested) patch to demonstrate
>>> what I mean. It relies on Cory's patch which converts everything
>>> to use qemu_open. It is also still valuable to make the change
>>> to qemu_open() to support "/dev/fd/N" for passing FDs during
>>> QEMU initial startup for CLI args.
>>>
>>> IMHO, what I propose here is preferrable for QMP clients that
>>> our current plan of requiring use of 3 monitor commands (passfd,
>>> XXXXX, closefd).
>>
>> Thanks for the PoC.
>>
>> Two other required updates that I can think of would be:
>>
>> 1) Update change, block_stream, block_reopen, snapshot_blkdev, and
>> perhaps other monitor commands to support receiving fd's via SCM_RIGHTS.
>>
>
> Nevermind my comment. I see that your PoC supports passing nfds for
> any QMP command.
>
> I'm curious what Kevin's thoughts are on this and the overall approach.
>
>> 2) Support re-opening files with different access modes (O_RDONLY,
>> O_WRONLY, or O_RDWR). This would be similar to the force option for
>> pass-fd.
>>
>
> I'm still not quite sure how we'd go about this. We need a way to
> determine the existing QEMU fd that is to be re-associated with the
> new fd, when using a /dev/fdlist/0 filename. In this new approach,
> 0 corresponds to an index, not an fd. The prior approach of using
> the /dev/fd/nnn, where nnn corresponded to an actual QEMU fd, made
> this easy.
Hmm, I'm not sure I understand what the use case is for the 'force'
parameter ? In my proof of concept I left out the block of code
from qemu_open() you had for dup'ing the FD with various different
flags set, but that was just for brevity. I think it ought to fit
in the same way, unless you're refering to a different area of the
code.
The access modes (O_RDONLY, O_WRONLY, or O_RDWR) can only be set on the
open() call, which is performed by libvirt. fcntl(F_SETFL) can't change
them. So the point of pass-fd's force option is to allow libvirt to
open() the same file with a new access mode, pass it to qemu, and copy
it to the existing qemu fd. For example:
fd1 = open("/path/to/file.img", O_RDONLY)
fd2 = open("/path/to/file.img", O_RDWR)
pass-fd disk0 (fd1) returns fd=3 (/dev/fd/3 in qemu has O_RDONLY)
drive_add file:/dev/fd/3
pass-fd disk0 (fd2) returns fd=3 (/dev/fd/3 in qemu now has O_RDWR)
This is required because the access_mode flags used by qemu_open() are
the same as the passed fd's access_mode flags. And qemu will re-open
files with different access modes.
>>>> @@ -83,6 +158,26 @@ int qemu_open(const char *name,
int flags, ...)
>>> {
>>> int ret;
>>> int mode = 0;
>>> + const char *p;
>>> +
>>> + /* Attempt dup of fd for pre-opened file */
>>> + if (strstart(name, "/dev/fdlist/", &p)) {
>>> + int idx;
>>> + int fd;
>>> +
>>> + idx = qemu_parse_fd(p);
>>> + if (idx == -1) {
>>> + return -1;
>>> + }
>>> +
>>> + fd = qemu_fdlist_dup(idx);
>>> + if (fd == -1) {
>>> + return -1;
>>> + }
>>> +
>>> + /* XXX verify flags match */
eg, this part of the code you had some work related to
'flags'
Yes, no problem and thanks again for the code. :) I'm hoping we can get
some more input soon so I can get moving in one direction or another.
>>> + return fd;
>>> + }
>>>
>>> if (flags & O_CREAT) {
>>> va_list ap;
Daniel
--
Regards,
Corey