On 07/02/2012 06:02 PM, Corey Bryant wrote:
On 06/26/2012 06:54 PM, Eric Blake wrote:
> On 06/26/2012 04:28 PM, Corey Bryant wrote:
>
>>>>> With this proposed series, we have usage akin to:
>>>>>
>>>>> 1. pass_fd FDSET={M} -> returns a string "/dev/fd/N"
showing
>>>>> QEMU's
>>>>> view of the FD
>>>>> 2. drive_add file=/dev/fd/N
>>>>> 3. if failure:
>>>>> close_fd "/dev/fd/N"
>>>>
>>>> In fact, there are more steps:
>>>>
>>>> 4. use it successfully
>>>> 5. close_fd "/dev/fd/N"
>>>>
>>>> I think it would well be possible that qemu just closes the fd when
>>>> it's
>>>> not used internally any more.
>>>
>>> pass-fd could have a flag indicating qemu to do that.
>>>
>>
>> It seems like libvirt would be in a better position to understand when a
>> file is no longer in use, and then it can call close_fd. No? Of course
>> the the only fd that needs to be closed is the originally passed fd. The
>> dup'd fd's are closed by QEMU.
>
> The more we discuss this, the more I'm convinced that commands like
> 'block-commit' that will reopen a file to change from O_RDONLY to O_RDWR
> will need to have an optional argument that says the name of the file to
> reopen. That is, I've seen three proposals:
>
> Proposal One: enhance every command to take an fd:name protocol.
> PRO: Successful use of a name removes the fd from the 'getfd' list.
> CON: Lots of commands to change.
> CON: The command sequence is not atomic.
> CON: Block layer does not have a file name tied to the fd, so the reopen
> operation also needs to learn the fd:name protocol, but since we're
> already touching the command it's not much more work.
> USAGE:
> 1. getfd FDSET={M}, ties new fd to "name"
> 2. drive_add fd=name - looks up the fd by name from the list
> 3a. on success, fd is no longer in the list, drive_add consumed it
> 3b. on failure, libvirt must call 'closefd name' to avoid a leak
> 4. getfd FDSET={M2}, ties new fd to "newname"
> 5. block-commit fd=newname - looks up fd by name from the list
> 6a. on success, fd is no longer in the list, block-commit consumed it
> 6b. on failure, libvirt must call 'closefd newname' to avoid a leak
>
> Proposal Two: Create a permanent name via 'getfd' or 'pass-fd', then
> update that name to the new fd in advance of any operation that needs to
> do a reopen.
> PRO: All other commands work without impact by using qemu_open(), by
> getting a clone of the permanent name.
> CON: The permanent name must remain open as long as qemu might need to
> reopen it, and reopening needs the pass-fd force option.
> CON: The command sequence is not atomic.
> USAGE:
> 1. pass_fd FDSET={M} -> returns an integer N (or a string "/dev/fd/N")
> showing QEMU's permanent name of the FD
> 2. drive_add file=<permanent name> means that qemu_open() will clone the
> fd, and drive_add is now using yet another FD while N remains open;
> meanwhile, the block layer knows that the drive is tied to the permanent
> name
> 3. pass-fd force FDSET={M2} -> replaces fd N with the passed M2, and
> still returns /dev/fd/N
> 4. block-commit - looks up the block-device name (/dev/fd/N), which maps
> back to the permanent fd, and gets a copy of the newly passed M2
> 5. on completion (success or failure), libvirt must call 'closefd name'
> to avoid a leak
>
> Proposal Three: Use thread-local fds passed alongside any command,
> rather than passing via a separate command
> PRO: All commands can now atomically handle fd passing
> PRO: Commands that only need a one-shot open can now use fd
> CON: Commands that need to reopen still need modification to allow a
> name override during the reopen
> 1. drive_add nfds=1 file="/dev/fdlist/1" FDSET={M} -> on success, the
fd
> is used as the block file, on failure it is atomically closed, so there
> is no leak either way. However, the block file has no permanent name.
> 2. block-commit nfds=1 file-override="/dev/fdlist/1" FDSET={M2} -> file
> override must be passed again (since we have no guarantee that the
> /dev/fdlist/1 of _this_ command matches the command-local naming used in
> the previous command), but the fd is again used atomically
>
> Under proposal 3, there is no need to dup fd's, merely only to check
> that qemu_open("/dev/fdlist/n", flags) has compatible flags with the fd
> received via FDSET. And the fact that things are made atomic means that
> libvirt no longer has to worry about calling closefd, nor does it have
> to worry about being interrupted between two monitor commands and not
> knowing what state qemu is in. But it does mean teaching every possible
> command that wants to do a reopen to learn how to use fd overrides
> rather than to reuse the permanent name that was originally in place on
> the first open.
>
Here's another option that Kevin and I discussed today on IRC. I've
modified a few minor details since the discussion. And Kevin please
correct me if anything is wrong.
Proposal Four: Pass a set of fds via 'pass-fds'. The group of fds
should all refer to the same file, but may have different access flags
(ie. O_RDWR, O_RDONLY). qemu_open can then dup the fd that has the
matching access mode flags.
pass-fds:
{ 'command': 'pass-fds',
'data': {'fdsetname': 'str', '*close':
'bool'},
'returns': 'string' }
close-fds:
{ 'command': 'close-fds',
'data': {'fdsetname': 'str' }
where:
@fdsetname - the file descriptor set name
@close - *if true QEMU closes the monitor fds when they expire.
if false, fds remain on the monitor until close-fds
command.
PRO: Supports reopen
PRO: All other commands work without impact by using qemu_open()
PRO: No fd leakage if close==true specified
CON: Determining when to close fds when close==true may be tricky
USAGE:
1. pass-fd FDSET={M} -> returns a string "/dev/fdset/1") that refers to
the passed set of fds.
2. drive_add file=/dev/fdset/1 -- qemu_open() uses the first fd from the
list that has access flags matching the qemu_open() action flags.
3. block-commit -- qemu_open() reopens "/dev/fdset/1" by using the first
fd from the list that has access flags matching the qemu_open() action
flags.
4. The monitor fds are closed:
A) *If @close == true, qemu closes the set of fds when the timer
expires
B) If @close == false, libvirt must issue close-fds command to close
the set of fds
*How to solve this has yet to be determined. Kevin mentioned
potentially using a bottom-half or a timer in qemu_close().
Thanks again for taking time to discuss this at today's QEMU community call.
Here's the proposal we discussed at the call. Please let me know if I
missed anything or if there are any issues with this design.
Proposal Five: New monitor commands enable adding/removing an fd
to/from a set. The set of fds should all refer to the same file, but
may have different access flags (ie. O_RDWR, O_RDONLY). qemu_open can
then dup the fd that has the matching access mode flags.
PRO: Supports reopen
PRO: All other commands work without impact by using qemu_open()
PRO: No fd leakage (fds are associated with monitor connection and, if
not in use, closed when monitor disconnects)
PRO: Security-wise this is ok since libvirt can manage the set of fd's
(ie. it can add/remove an O_RDWR fd to/from the set as needed).
CON: Not atomic (e.g. doesn't add an fd with single drive_add command).
USAGE:
1. add-fd /dev/fdset/1 FDSET={M} -> qemu adds fd to set named
"/dev/fdset/1" - command returns qemu fd (e.g fd=4) to caller. libvirt
in-use flag turned on for fd.
2. drive_add file=/dev/fdset/1 -> qemu_open uses the first fd from the
set that has access flags matching the qemu_open action flags.
qemu_open increments refcount for this fd.
3. add-fd /dev/fdset/1 FDSET={M} -> qemu adds fd to set named
"/dev/fdset/1" - command returns qemu fd to caller (e.g fd=5). libvirt
in-use flag turned on for fd.
3. block-commit -> qemu_open reopens "/dev/fdset/1" by using the first
fd from the set that has access flags matching the qemu_open action
flags. qemu_open increments refcount for this fd.
4. remove-fd /dev/fdset/1 5 -> caller requests fd==5 be removed from the
set. turns libvirt in-use flag off marking the fd ready to be closed
when qemu is done with it.
5. qemu_close decrements refcount for fd, and closes fd when refcount is
zero and libvirt in use flag is off.
More functional details:
-If libvirt crashes it can call "query-fd /dev/fdset/1" to determine
which fds are open in the set.
-If monitor connection closes, qemu will close fds that have a refcount
of zero. Do we also need a qemu in-use flag in case refcount is zero
and fd is still in use?
-This support requires introduction of qemu_close function that will be
called everywhere in block layer that close is currently called.
Notes:
-Patch series 1 will include support for all of the above. This will be
my initial focus.
-Patch series 2 will include command line support that enables
association of command line fd with a monitor set. This will be my
secondary focus, most likely after patch series 1 is applied.
--
Regards,
Corey