On Thu, Oct 11, 2018 at 04:48:34PM +0100, Daniel P. Berrangé wrote:
Adding Markus since we're talking about new CLI argument and
capability reporting standards.
On Fri, Sep 14, 2018 at 05:52:30PM +0400, Marc-André Lureau wrote:
> As discussed during "[PATCH v4 00/29] vhost-user for input & GPU"
> review, let's define a common set of backend conventions to help with
> management layer implementation, and interoperability.
>
> v2:
> - drop --pidfile
> - add some notes about daemonizing & stdin/out/err
>
> Cc: libvir-list(a)redhat.com
> Cc: Gerd Hoffmann <kraxel(a)redhat.com>
> Cc: Daniel P. Berrangé <berrange(a)redhat.com>
> Cc: Changpeng Liu <changpeng.liu(a)intel.com>
> Cc: Dr. David Alan Gilbert <dgilbert(a)redhat.com>
> Cc: Felipe Franciosi <felipe(a)nutanix.com>
> Cc: Gonglei <arei.gonglei(a)huawei.com>
> Cc: Maxime Coquelin <maxime.coquelin(a)redhat.com>
> Cc: Michael S. Tsirkin <mst(a)redhat.com>
> Cc: Victor Kaplansky <victork(a)redhat.com>
> Signed-off-by: Marc-André Lureau <marcandre.lureau(a)redhat.com>
> ---
> docs/interop/vhost-user.txt | 109 +++++++++++++++++++++++++++++++++++-
> 1 file changed, 107 insertions(+), 2 deletions(-)
>
> diff --git a/docs/interop/vhost-user.txt b/docs/interop/vhost-user.txt
> index ba5e37d714..339b335e9c 100644
> --- a/docs/interop/vhost-user.txt
> +++ b/docs/interop/vhost-user.txt
> @@ -17,8 +17,13 @@ The protocol defines 2 sides of the communication, master and slave. Master is
> the application that shares its virtqueues, in our case QEMU. Slave is the
> consumer of the virtqueues.
>
> -In the current implementation QEMU is the Master, and the Slave is intended to
> -be a software Ethernet switch running in user space, such as Snabbswitch.
> +In the current implementation QEMU is the Master, and the Slave is the
> +external process consuming the virtio queues, for example a software
> +Ethernet switch running in user space, such as Snabbswitch, or a block
> +device backend processing reads and writes to a virtual disk. In order to
> +facilitate interoperability between various backend implementations,
> +it is recommended to follow the "Backend program conventions"
> +described in this document.
>
> Master and slave can be either a client (i.e. connecting) or server (listening)
> in the socket communication.
> @@ -859,3 +864,103 @@ resilient for selective requests.
> For the message types that already solicit a reply from the client, the
> presence of VHOST_USER_PROTOCOL_F_REPLY_ACK or need_reply bit being set brings
> no behavioural change. (See the 'Communication' section for details.)
> +
> +Backend program conventions
> +---------------------------
> +
> +vhost-user backends provide various services and they may need to be
> +configured manually depending on the use case. However, it is a good
> +idea to follow the conventions listed here when possible. Users, such as
> +QEMU or libvirt, can then rely on some common behaviour to avoid
> +heterogeneous configuration and management of the backend program and
> +facilitate interoperability.
> +
> +In order to be discoverable, default vhost-user backends should be
> +located under "/usr/libexec", and be named "vhost-user-$device" where
> +"$device" is the device name in lower-case following the name listed
> +in the Linux virtio_ids.h header (e.g. the VIRTIO_ID_RPROC_SERIAL
> +backend would be named "vhost-user-rproc-serial").
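For illustration only, a minimal Python sketch of the naming rule, assuming underscores in the macro name become hyphens (as the rproc-serial example implies); `backend_path` is a hypothetical helper, not an existing QEMU or libvirt function:

```python
import os

# Default location for vhost-user backends under the proposed convention.
LIBEXEC_DIR = "/usr/libexec"
VIRTIO_ID_PREFIX = "VIRTIO_ID_"


def backend_path(virtio_id: str) -> str:
    """Map a virtio_ids.h macro name to the conventional backend path.

    Hypothetical helper: drops the VIRTIO_ID_ prefix, lower-cases the rest
    and turns underscores into hyphens.
    """
    if not virtio_id.startswith(VIRTIO_ID_PREFIX):
        raise ValueError(f"not a VIRTIO_ID_* name: {virtio_id!r}")
    device = virtio_id[len(VIRTIO_ID_PREFIX):].lower().replace("_", "-")
    if not device:
        raise ValueError("empty device name")
    return os.path.join(LIBEXEC_DIR, "vhost-user-" + device)


if __name__ == "__main__":
    print(backend_path("VIRTIO_ID_RPROC_SERIAL"))  # /usr/libexec/vhost-user-rproc-serial
```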
> +
> +Mechanisms to list, and to select among alternative implementations
> +or modify the default backend are not described at this point (a
> +distribution may use update-alternatives, for example, to list and to
> +pick a different default backend).

I don't think that update-alternatives is a good thing as it presumes
that each host only needs a single preferred impl at a time.
I think we need to be able to discover all impls for a given device
type.

This feels like the same problem we tackled recently with enumerating
and choosing between multiple firmware impls.

In $git/docs/interop/firmware.json we defined a way to drop config files
into a standard directory, providing info about the firmware in a well
defined QAPI based data format.

Rather than requiring a special file naming convention I think we just
need to register config files in a particular directory, letting the
mgmt app enumerate them.

eg

  /etc/qemu/vhost-user/50-rproc-serial.json     (a default impl from QEMU)
  /etc/qemu/vhost-user/10-my-rproc-serial.json  (my replacement impl)

a file could be something pretty simple like

  {
    "name": "my-rproc-serial",
    "description": "My rproc serial impl doing foo, bar, wizz",
    "device": "rproc-serial",
    "binary": "/usr/libexec/my-awesome-rproc-serial"
  }

Mgmt apps can simply load all files in that directory to learn about
the possible impls. The file load order gives a prioritization if
multiple matches exist, or a specific impl can be requested by
name "my-rproc-serial".

This shouldn't provide full capabilities reporting though, just
enough to identify viable binaries. Capabilities should still be
reported via the binary itself so they can be dynamically tailored
based on other environmental factors.

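A rough Python sketch of how a mgmt app might consume such a directory. It assumes the /etc/qemu/vhost-user/ location and the "name"/"device"/"binary" keys from the example above, neither of which is specified anywhere yet, and that a lower filename prefix means higher priority, as in the 10-/50- example. `load_descriptors` and `select_backend` are hypothetical names:

```python
import json
import os
from typing import List, Optional

# Proposed (not yet agreed) descriptor directory.
DESCRIPTOR_DIR = "/etc/qemu/vhost-user"


def load_descriptors(directory: str = DESCRIPTOR_DIR) -> List[dict]:
    """Load every *.json descriptor in `directory`, sorted by filename.

    Lexicographic order means 10-*.json is loaded before 50-*.json and so
    takes priority over it. A missing directory yields an empty list.
    """
    try:
        names = sorted(os.listdir(directory))
    except FileNotFoundError:
        return []
    descriptors = []
    for name in names:
        if not name.endswith(".json"):
            continue
        with open(os.path.join(directory, name), encoding="utf-8") as f:
            descriptors.append(json.load(f))
    return descriptors


def select_backend(descriptors: List[dict], device: str,
                   name: Optional[str] = None) -> Optional[dict]:
    """Return the first (highest-priority) descriptor for `device`, or the
    one whose "name" matches when a specific impl is requested."""
    for desc in descriptors:
        if desc.get("device") != device:
            continue
        if name is None or desc.get("name") == name:
            return desc
    return None


if __name__ == "__main__":
    for desc in load_descriptors():
        print(desc.get("name"), desc.get("binary"))
```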
> +
> +The backend program must not daemonize itself, but it may be
> +daemonized by the management layer. It may also have restricted
> +access to the system.
> +
> +File descriptors 0, 1 and 2 will exist, and have regular
> +stdin/stdout/stderr usage (they may be redirected to /dev/null by the
> +management layer, or to a log handler).
> +
> +The backend program must end (as quickly and cleanly as possible) when
> +the SIGTERM signal is received. If it has not exited after a few
> +seconds, the management layer may send it SIGKILL.
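A minimal sketch of the management-side lifecycle these rules imply, in Python: the backend runs in the foreground with stdin on /dev/null, and on shutdown gets SIGTERM followed by SIGKILL if it has not exited. The 5 second grace period is an arbitrary stand-in for "a few seconds", and both function names are hypothetical:

```python
import subprocess
from typing import List, Optional, TextIO


def start_backend(argv: List[str], log: Optional[TextIO] = None) -> subprocess.Popen:
    """Run the backend in the foreground; the management layer owns its lifetime.

    fd 0 is /dev/null; fds 1 and 2 go to `log` if given, else to /dev/null.
    """
    out = log if log is not None else subprocess.DEVNULL
    return subprocess.Popen(argv, stdin=subprocess.DEVNULL, stdout=out, stderr=out)


def stop_backend(proc: subprocess.Popen, grace: float = 5.0) -> int:
    """Send SIGTERM, escalating to SIGKILL if the backend outlives `grace` seconds.

    Returns the exit status; on POSIX a negative value is the terminating signal.
    """
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        return proc.wait()
```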
> +
> +The following command line options have a defined behaviour. They
> +are mandatory, unless explicitly stated otherwise:
> +
> +* --socket-path=PATH
> +
> +This option specifies the location of the vhost-user Unix domain socket.
> +It is incompatible with --fd.
> +
> +* --fd=FDNUM
> +
> +When this argument is given, the backend program is started with the
> +vhost-user socket as file descriptor FDNUM. It is incompatible with
> +--socket-path.
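As a sketch of the intended argument handling, assuming Python's argparse and a hypothetical `parse_backend_args` entry point: exactly one of --socket-path and --fd must be given, while --print-capabilities short-circuits parsing so that any other options are ignored:

```python
import argparse
import sys
from typing import List, Optional


def parse_backend_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the common backend options (hypothetical entry point)."""
    if argv is None:
        argv = sys.argv[1:]
    # --print-capabilities: other options and arguments are ignored.
    if "--print-capabilities" in argv:
        return argparse.Namespace(print_capabilities=True, socket_path=None, fd=None)

    parser = argparse.ArgumentParser(prog="vhost-user-example")
    parser.add_argument("--print-capabilities", action="store_true",
                        help="print capabilities and exit")
    # Exactly one of --socket-path / --fd: argparse exits with status 2
    # if both or neither are given.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--socket-path", metavar="PATH",
                       help="location of the vhost-user Unix domain socket")
    group.add_argument("--fd", metavar="FDNUM", type=int,
                       help="already-open vhost-user socket file descriptor")
    args = parser.parse_args(argv)
    if args.fd is not None and args.fd < 0:
        parser.error("--fd must be a non-negative file descriptor number")
    return args
```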
> +
> +* --print-capabilities
> +
> +Output to stdout a line-separated list of backend capabilities, and
> +then exit successfully. Other options and arguments should be ignored,
> +and the backend program should not perform its normal function.
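On the consuming side, a minimal Python sketch of reading the line-separated output described above; `parse_capabilities` and `query_capabilities` are hypothetical helpers, and the 10 second timeout is an arbitrary choice:

```python
import subprocess
from typing import Set


def parse_capabilities(output: str) -> Set[str]:
    """Turn line-separated --print-capabilities output into a set,
    ignoring blank lines and surrounding whitespace."""
    return {line.strip() for line in output.splitlines() if line.strip()}


def query_capabilities(binary: str, timeout: float = 10.0) -> Set[str]:
    """Run `binary --print-capabilities` and parse its stdout.

    Raises CalledProcessError if the backend does not exit successfully,
    or TimeoutExpired if it does not exit within `timeout` seconds.
    """
    result = subprocess.run([binary, "--print-capabilities"],
                            stdout=subprocess.PIPE, check=True,
                            timeout=timeout, universal_newlines=True)
    return parse_capabilities(result.stdout)
```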

This is going to repeat the mistakes we've had with every other
binary in QEMU. A "simple" flag list or args sounds appealing,
but we've always been burnt by it in the medium-long term, which
is why we created QAPI.

If we're going to have any capabilities reporting, we should
model it in QAPI schema, so any '--print-capabilities' arg
should print a JSON doc following the documented schema.

While talking about QAPI, I think this is an opportunity to
also avoid the problems of CLI arg values becoming more
complex than just scalars. eg

  --socket-path=PATH

may inevitably grow more options - eg to perhaps say whether
to use it in listen or connect mode, or to indicate a reconnect
timeout, etc.

I know Markus wants to replace QemuOpts with something that
is again driven by QAPI, so that "-arg $VALUE" can handle
$VALUE being complex non-scalar data following a QAPI
schema with well defined semantics for parsing. Since we
are defining a new standard, I think we should aim to do
something better than scalar values right from the start.

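To make the point concrete, a hypothetical sketch (Python, not tied to any existing QEMU parser) of a socket option that accepts either a bare path or a structured JSON value. The "server" and "reconnect" keys are invented for illustration and only loosely echo the listen/connect and reconnect examples above:

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class SocketOption:
    path: str
    server: bool = False             # hypothetical: listen instead of connect
    reconnect: Optional[int] = None  # hypothetical: reconnect timeout, seconds


def parse_socket_option(value: str) -> SocketOption:
    """Accept a bare path (legacy scalar form) or a JSON object."""
    if not value.lstrip().startswith("{"):
        return SocketOption(path=value)
    data = json.loads(value)  # JSONDecodeError is a ValueError subclass
    unknown = set(data) - {"path", "server", "reconnect"}
    if unknown:
        raise ValueError(f"unknown socket keys: {sorted(unknown)}")
    if not isinstance(data.get("path"), str):
        raise ValueError("'path' must be a string")
    server = data.get("server", False)
    if not isinstance(server, bool):
        raise ValueError("'server' must be a boolean")
    reconnect = data.get("reconnect")
    if reconnect is not None and (isinstance(reconnect, bool)
                                  or not isinstance(reconnect, int)):
        raise ValueError("'reconnect' must be an integer")
    return SocketOption(data["path"], server, reconnect)
```

With a schema behind it, the validation above would be generated rather than hand-written, which is the main attraction of doing it in QAPI.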
> +
> +At the time of writing, there are no common capabilities. Some
> +device-specific capabilities are listed in the respective sections. By
> +convention, device-specific capabilities are prefixed by their device
> +name.
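A small Python sketch of splitting out device-specific capabilities under this convention. It matches against the full device name rather than splitting at the first hyphen, since names such as rproc-serial contain hyphens themselves; `device_capabilities` is a hypothetical helper:

```python
from typing import Iterable, Set


def device_capabilities(caps: Iterable[str], device: str) -> Set[str]:
    """Return the capabilities belonging to `device`, with the
    '<device>-' prefix stripped, e.g. 'gpu-virgl' -> 'virgl'."""
    prefix = device + "-"
    return {c[len(prefix):] for c in caps if c.startswith(prefix)}
```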
> +
> +vhost-user-input program conventions
> +------------------------------------
> +
> +Capabilities:
> +
> +input-evdev-path
> +
> + The --evdev-path command line option is supported.
> +
> +input-no-grab
> +
> + The --no-grab command line option is supported.
> +
> +* --evdev-path=PATH (optional)
> +
> +Specify the Linux input device path.
> +
> +* --no-grab (optional)
> +
> +Do not request exclusive access to the input device.
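A hedged Python sketch of how a management layer might use these capabilities before building a vhost-user-input command line, refusing options the backend did not advertise; `input_backend_argv` is hypothetical and assumes --socket-path rather than --fd:

```python
from typing import List, Optional, Set


def input_backend_argv(binary: str, socket_path: str, caps: Set[str],
                       evdev_path: Optional[str] = None,
                       no_grab: bool = False) -> List[str]:
    """Build a vhost-user-input command line, checking each optional
    argument against the capabilities reported by --print-capabilities."""
    argv = [binary, "--socket-path=" + socket_path]
    if evdev_path is not None:
        if "input-evdev-path" not in caps:
            raise ValueError("backend does not support --evdev-path")
        argv.append("--evdev-path=" + evdev_path)
    if no_grab:
        if "input-no-grab" not in caps:
            raise ValueError("backend does not support --no-grab")
        argv.append("--no-grab")
    return argv
```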
> +
> +vhost-user-gpu program conventions
> +----------------------------------
> +
> +Capabilities:
> +
> +gpu-render-node
> +
> + The --render-node command line option is supported.
> +
> +gpu-virgl
> +
> + The --virgl command line option is supported.
> +
> +* --render-node=PATH (optional)
> +
> +Specify the GPU DRM render node.
> +
> +* --virgl (optional)
> +
> +Enable virgl rendering support.
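The same check for the GPU backend, sketched table-driven in Python so each optional argument maps to the capability that gates it. `gpu_backend_argv` is hypothetical, and /dev/dri/renderD128 in the usage note is just the usual first Linux render node, not a requirement:

```python
from typing import Dict, List, Optional, Set

# Capability that must be advertised for each optional GPU argument.
GPU_OPTION_CAPS = {"--render-node": "gpu-render-node", "--virgl": "gpu-virgl"}


def gpu_backend_argv(binary: str, socket_path: str, caps: Set[str],
                     render_node: Optional[str] = None,
                     virgl: bool = False) -> List[str]:
    """Build a vhost-user-gpu command line, rejecting unadvertised options."""
    requested: Dict[str, Optional[str]] = {}
    if render_node is not None:
        requested["--render-node"] = render_node
    if virgl:
        requested["--virgl"] = None  # flag without a value
    missing = [opt for opt in requested if GPU_OPTION_CAPS[opt] not in caps]
    if missing:
        raise ValueError("unsupported by backend: " + ", ".join(missing))
    argv = [binary, "--socket-path=" + socket_path]
    for opt, value in requested.items():  # dicts keep insertion order
        argv.append(opt if value is None else f"{opt}={value}")
    return argv
```

For example, `gpu_backend_argv("/usr/libexec/vhost-user-gpu", "/tmp/gpu.sock", caps, render_node="/dev/dri/renderD128", virgl=True)` fails early with a clear error if either capability is missing, instead of leaving the backend to reject an unknown option at startup.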

As a rough illustration I mocked up a possible QAPI schema that covers
the templates describing the binaries, the format of CLI arguments, and
the data for capabilities.

Note, I can't remember what Markus had proposed for CLI arguments in
QAPI, so I invented something arbitrary but plausible.

#
# The type of device the vhost-user backend is for
#
{ 'enum': 'VHostUserBackendType',
  'data': [ 'input', 'gpu', ... ] }

#
# @type: the type of backend interface provided
# @name: short name of the impl, unique wrt @type
# @description: a human-readable description of the backend.
# @binary: fully qualified path to the binary
#
{
  'struct': 'VHostUserBackend',
  'data': {
    'type': 'VHostUserBackendType',
    'name': 'str',
    'description': 'str',
    'binary': 'str'
  }
}

#
# Command line options common to all vhost user backends
#
{
  'optionset': 'VHostUserBackendCommandLineBase',
  'data': [
    {
      'option': '--print-capabilities',
      'help': 'Print backend capabilities document'
    },
    {
      'option': '--socket',
      'data': 'ChardevSocket',
      'help': 'Socket to communicate with frontend'
    }
  ]
}

#
# Command line options for vhost user "input" backends
#
{
  'optionset': 'VHostUserBackendCommandLineInput',
  'base': 'VHostUserBackendCommandLineBase',
  'data': [
    {
      'option': '--evdev-path',
      'data': 'str',
      'help': 'The Linux input device path'
    },
    {
      'option': '--no-grab',
      'help': 'Do not request exclusive access to device'
    }
  ]
}

#
# Command line options for vhost user "gpu" backends
#
{
  'optionset': 'VHostUserBackendCommandLineGPU',
  'base': 'VHostUserBackendCommandLineBase',
  'data': [
    {
      'option': '--render-node',
      'data': 'str',
      'help': 'The GPU DRM render node path'
    },
    {
      'option': '--virgl',
      'help': 'Enable virgl rendering support'
    }
  ]
}

#
# Command line options for vhost user backends
#
{
  'union': 'VHostUserBackendCommandLine',
  'base': { 'type': 'VHostUserBackendType' },
  'discriminator': 'type',
  'data': {
    'input': 'VHostUserBackendCommandLineInput',
    'gpu': 'VHostUserBackendCommandLineGPU'
  }
}

{
  'enum': 'VHostUserBackendInputFeature',
  'data': [ 'evdev-path', 'no-grab' ]
}

#
# Capabilities reported by vhost user "input" backends
#
{
  'struct': 'VHostUserBackendCapabilitiesInput',
  'data': {
    'features': [ 'VHostUserBackendInputFeature' ]
  }
}

{
  'enum': 'VHostUserBackendGPUFeature',
  'data': [ 'render-node', 'virgl' ]
}

#
# Capabilities reported by vhost user "gpu" backends
#
{
  'struct': 'VHostUserBackendCapabilitiesGPU',
  'data': {
    'features': [ 'VHostUserBackendGPUFeature' ]
  }
}

#
# Capabilities reported by vhost user backends
#
{
  'union': 'VHostUserBackendCapabilities',
  'base': { 'type': 'VHostUserBackendType' },
  'discriminator': 'type',
  'data': {
    'input': 'VHostUserBackendCapabilitiesInput',
    'gpu': 'VHostUserBackendCapabilitiesGPU'
  }
}
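
For comparison with the line-based format, a minimal Python sketch of a management app validating a capabilities document shaped like the mock VHostUserBackendCapabilities union above, e.g. {"type": "gpu", "features": ["render-node", "virgl"]}. The feature tables are copied by hand from the mock enums, and dropping unknown features (rather than rejecting them) is an assumed design choice so that older mgmt apps tolerate newer backends:

```python
import json
from typing import Dict, FrozenSet

# Feature enums per backend type, copied from the mock schema;
# "type" is the union discriminator.
KNOWN_FEATURES: Dict[str, FrozenSet[str]] = {
    "input": frozenset({"evdev-path", "no-grab"}),
    "gpu": frozenset({"render-node", "virgl"}),
}


def parse_capabilities_doc(text: str) -> FrozenSet[str]:
    """Validate a --print-capabilities JSON document and return the
    features this management app understands."""
    doc = json.loads(text)
    if not isinstance(doc, dict):
        raise ValueError("capabilities document must be a JSON object")
    backend_type = doc.get("type")
    if not isinstance(backend_type, str) or backend_type not in KNOWN_FEATURES:
        raise ValueError(f"unknown backend type: {backend_type!r}")
    features = doc.get("features")
    if not isinstance(features, list) or not all(isinstance(f, str) for f in features):
        raise ValueError("'features' must be a list of strings")
    # Silently drop features we do not know about (assumed forward-compat policy).
    return frozenset(features) & KNOWN_FEATURES[backend_type]
```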
Regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|