On 05/01/2012 03:56 PM, Eric Blake wrote:
> On 05/01/2012 02:25 PM, Anthony Liguori wrote:
>> Thanks for sending this out Stefan.
>
> Indeed.
>>> This series adds the -open-hook-fd command-line option. Whenever QEMU
>>> needs to open an image file it sends a request over the given UNIX
>>> domain socket. The response includes the file descriptor or an errno
>>> on failure. Please see the patches for details on the protocol.
>>>
>>> The -open-hook-fd approach allows QEMU to support file descriptor
>>> passing without changing -drive. It also supports snapshot_blkdev and
>>> other commands that re-open image files.
>>>
>>> Anthony Liguori <aliguori(a)us.ibm.com> wrote most of these patches. I
>>> added a demo -open-hook-fd server and some small fixes. Since Anthony
>>> is traveling right now I'm sending the RFC for discussion.
>> What I like about this approach is that it's useful outside the block
>> layer and is conceptually simple from a QEMU PoV. We simply delegate
>> open() to libvirt and let libvirt enforce whatever rules it wants.
>>
>> This is not meant to be an alternative to blockdev; even with
>> blockdev, I think we still want to use a mechanism like this.
> The overall series looks like it would be rather interesting. What sort
> of timing restrictions are there? For example, the proposed
> 'drive-reopen' command (probably now delegated to qemu 1.2) would mean
> that qemu would be calling back into libvirt in order to do the reopen.
> If libvirt takes its time in passing back an open fd, is it going to
> starve qemu from answering unrelated monitor commands in the meantime?
s/libvirt/kernel/g and your concerns are equally valid.

open() should never be called in a path that could block things. There's
always the possibility that we're on top of NFS and the open could time
out. For something like drive-reopen, we should use an asynchronous
open() that dispatches the open() in the posix-aio thread pool.

That's part of what's nice about this approach: we could still call
file_open() in the posix-aio thread pool...
I definitely want to make sure we avoid deadlock where libvirt is
waiting on a monitor command while the monitor command is waiting on
libvirt to pass an fd.
> Is this also an opportunity to request whether a particular fd must be
> seekable vs. acceptable as a one-pass read or write, perhaps by whether
> the command is 1 (seekable open) or 2 (one-pass open)?
I'm not really sure where the distinction lies...
I want the RPC to behave exactly like open(). So if we're assuming that open()
of a /dev/ file returns something that is ioctl()'able, then that's what libvirt
should return.
If we want to sort of do fd-transformation where a special protocol is used for
things like ioctl, that's fine, but it ought to be a different mechanism (that's
probably not nearly as generic).
> For example, migration is one-pass (and therefore libvirt passes a pipe
> which is hooked up to a helper app that uses O_DIRECT), while block
> devices must be seekable.
But migration doesn't involve doing an open(). This is not a replacement
for fd passing. This is a replacement for open(), to make up for the
facts that (1) some management tools like libvirt cannot isolate guests
with DAC and (2) SELinux cannot be used to isolate guests across all
file systems.

I would really prefer that the kernel fix this problem for us, but from
what I'm told, the problem lies in the NFS standards committee, so short
of forking the NFS protocol there isn't much the kernel can do.
Regards,
Anthony Liguori