Signed-off-by: Peter Krempa <pkrempa(a)redhat.com>
---
docs/api.rst | 2 +-
docs/docs.rst | 3 -
docs/internals/meson.build | 1 -
docs/internals/rpc.html.in | 914 -------------------------------
docs/kbase/index.rst | 3 +
docs/kbase/internals/meson.build | 1 +
docs/kbase/internals/rpc.rst | 781 ++++++++++++++++++++++++++
7 files changed, 786 insertions(+), 919 deletions(-)
delete mode 100644 docs/internals/rpc.html.in
create mode 100644 docs/kbase/internals/rpc.rst
diff --git a/docs/api.rst b/docs/api.rst
index d9f01fb403..325b9b840c 100644
--- a/docs/api.rst
+++ b/docs/api.rst
@@ -219,7 +219,7 @@ Daemon and Remote Access
Access to libvirt drivers is primarily handled by the libvirtd daemon
through the `remote <remote.html>`__ driver via an
-`RPC <internals/rpc.html>`__. Some hypervisors do support client-side
+`RPC <kbase/internals/rpc.html>`__. Some hypervisors do support client-side
connections and responses, such as Test, OpenVZ, VMware, VirtualBox
(vbox), ESX, Hyper-V, Xen, and Virtuozzo. The libvirtd daemon service is
started on the host at system boot time and can also be restarted at any
diff --git a/docs/docs.rst b/docs/docs.rst
index 3387dacce8..0a698913be 100644
--- a/docs/docs.rst
+++ b/docs/docs.rst
@@ -154,9 +154,6 @@ Project development
`API extensions <api_extension.html>`__
Adding new public libvirt APIs
-`RPC protocol & APIs <internals/rpc.html>`__
- RPC protocol information and API / dispatch guide
-
`Functional testing <testsuites.html>`__
Testing libvirt with
`TCK test suite <testtck.html>`__ and
diff --git a/docs/internals/meson.build b/docs/internals/meson.build
index 68a2e70a3d..cbf0623c08 100644
--- a/docs/internals/meson.build
+++ b/docs/internals/meson.build
@@ -1,5 +1,4 @@
internals_in_files = [
- 'rpc',
]
html_xslt_gen_install_dir = docs_html_dir / 'internals'
diff --git a/docs/internals/rpc.html.in b/docs/internals/rpc.html.in
deleted file mode 100644
index ceb7dba5f2..0000000000
--- a/docs/internals/rpc.html.in
+++ /dev/null
@@ -1,914 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE html>
-<html
xmlns="http://www.w3.org/1999/xhtml">
- <body>
- <h1>libvirt RPC infrastructure</h1>
-
- <ul id="toc"></ul>
-
- <p>
- libvirt includes a basic protocol and code to implement
- an extensible, secure client/server RPC service. This was
- originally designed for communication between the libvirt
- client library and the libvirtd daemon, but the code is
- now isolated to allow reuse in other areas of libvirt code.
- This document provides an overview of the protocol and
- structure / operation of the internal RPC library APIs.
- </p>
-
-
- <h2><a id="protocol">RPC protocol</a></h2>
-
- <p>
- libvirt uses a simple, variable length, packet based RPC protocol.
- All structured data within packets is encoded using the
- <a
href="https://en.wikipedia.org/wiki/External_Data_Representation&quo...
standard</a>
- as currently defined by <a
href="https://tools.ietf.org/html/rfc4506">RFC 4506</a>.
- On any connection running the RPC protocol, there can be multiple
- programs active, each supporting one or more versions. A program
- defines a set of procedures that it supports. The procedures can
- support call+reply method invocation, asynchronous events,
- and generic data streams. Method invocations can be overlapped,
- so waiting for a reply to one will not block the receipt of the
- reply to another outstanding method. The protocol was loosely
- inspired by the design of SunRPC. The definition of the RPC
- protocol is in the file <code>src/rpc/virnetprotocol.x</code>
- in the libvirt source tree.
- </p>
-
- <h3><a href="protocolframing">Packet
framing</a></h3>
-
- <p>
- On the wire, there is no explicit packet framing marker. Instead
- each packet is preceded by an unsigned 32-bit integer giving
- the total length of the packet in bytes. This length includes
- the 4-bytes of the length word itself. Conceptually the framing
- looks like this:
- </p>
-
-<pre>
-|~~~ Packet 1 ~~~|~~~ Packet 2 ~~~|~~~ Packet 3 ~~~|~~~
-
-+-------+------------+-------+------------+-------+------------+...
-| n=U32 | (n-4) * U8 | n=U32 | (n-4) * U8 | n=U32 | (n-4) * U8 |
-+-------+------------+-------+------------+-------+------------+...
-
-|~ Len ~|~ Data ~|~ Len ~|~ Data ~|~ Len ~|~ Data ~|~
-
-</pre>
-
- <h3><a href="protocoldata">Packet data</a></h3>
-
- <p>
- The data in each packet is split into two parts, a short
- fixed length header, followed by a variable length payload.
- So a packet from the illustration above is more correctly
- shown as
- </p>
-
-<pre>
-
-+-------+-------------+---------------....---+
-| n=U32 | 6*U32 | (n-(7*4))*U8 |
-+-------+-------------+---------------....---+
-
-|~ Len ~|~ Header ~|~ Payload .... ~|
-</pre>
-
-
- <h3><a href="protocolheader">Packet
header</a></h3>
- <p>
- The header contains 6 fields, encoded as signed/unsigned 32-bit
- integers.
- </p>
-
- <pre>
-+---------------+
-| program=U32 |
-+---------------+
-| version=U32 |
-+---------------+
-| procedure=S32 |
-+---------------+
-| type=S32 |
-+---------------+
-| serial=U32 |
-+---------------+
-| status=S32 |
-+---------------+
- </pre>
-
- <dl>
- <dt><code>program</code></dt>
- <dd>
- This is an arbitrarily chosen number that will uniquely
- identify the "service" running over the stream.
- </dd>
- <dt><code>version</code></dt>
- <dd>
- This is the version number of the program, by convention
- starting from '1'. When an incompatible change is made
- to a program, the version number is incremented. Ideally
- both versions will then be supported on the wire in
- parallel for backwards compatibility.
- </dd>
- <dt><code>procedure</code></dt>
- <dd>
- This is an arbitrarily chosen number that will uniquely
- identify the method call, or event associated with the
- packet. By convention, procedure numbers start from 1
- and are assigned monotonically thereafter.
- </dd>
- <dt><code>type</code></dt>
- <dd>
- <p>
- This can be one of the following enumeration values
- </p>
- <ol>
- <li>call: invocation of a method call</li>
- <li>reply: completion of a method call</li>
- <li>event: an asynchronous event</li>
- <li>stream: control info or data from a stream</li>
- </ol>
- </dd>
- <dt><code>serial</code></dt>
- <dd>
- This is a number that starts from 1 and increases
- each time a method call packet is sent. A reply or
- stream packet will have a serial number matching the
- original method call packet serial. Events always
- have the serial number set to 0.
- </dd>
- <dt><code>status</code></dt>
- <dd>
- <p>
- This can one of the following enumeration values
- </p>
- <ol>
- <li>ok: a normal packet. this is always set for method calls or events.
- For replies it indicates successful completion of the method. For
- streams it indicates confirmation of the end of file on the
stream.</li>
- <li>error: for replies this indicates that the method call failed
- and error information is being returned. For streams this indicates
- that not all data was sent and the stream has aborted</li>
- <li>continue: for streams this indicates that further data packets
- will be following</li>
- </ol>
- </dd>
- </dl>
-
- <h3><a href="protocolpayload">Packet
payload</a></h3>
-
- <p>
- The payload of a packet will vary depending on the <code>type</code>
- and <code>status</code> fields from the header.
- </p>
-
- <ul>
- <li>type=call: the in parameters for the method call, XDR encoded</li>
- <li>type=call-with-fds: number of file handles, then the in parameters for
the method call, XDR encoded, followed by the file handles</li>
- <li>type=reply+status=ok: the return value and/or out parameters for the
method call, XDR encoded</li>
- <li>type=reply+status=error: the error information for the method, a
virErrorPtr XDR encoded</li>
- <li>type=reply-with-fds+status=ok: number of file handles, the return value
and/or out parameters for the method call, XDR encoded, followed by the file
handles</li>
- <li>type=reply-with-fds+status=error: number of file handles, the error
information for the method, a virErrorPtr XDR encoded, followed by the file
handles</li>
- <li>type=event: the parameters for the event, XDR encoded</li>
- <li>type=stream+status=ok: no payload</li>
- <li>type=stream+status=error: the error information for the method, a
virErrorPtr XDR encoded</li>
- <li>type=stream+status=continue: the raw bytes of data for the stream. No XDR
encoding</li>
- </ul>
-
- <p>
- With the two packet types that support passing file descriptors, in
- between the header and the payload there will be a 4-byte integer
- specifying the number of file descriptors which are being sent.
- The actual file handles are sent after the payload has been sent.
- Each file handle has a single dummy byte transmitted as a carrier
- for the out of band file descriptor. While the sender should always
- send '\0' as the dummy byte value, the receiver ought to ignore the
- value for the sake of robustness.
- </p>
-
- <p>
- For the exact payload information for each procedure, consult the XDR protocol
- definition for the program+version in question
- </p>
-
- <h3><a id="wireexamples">Wire examples</a></h3>
-
- <p>
- The following diagrams illustrate some example packet exchanges
- between a client and server
- </p>
-
- <h4><a id="wireexamplescall">Method call</a></h4>
-
- <p>
- A single method call and successful
- reply, for a program=8, version=1, procedure=3, which 10 bytes worth
- of input args, and 4 bytes worth of return values. The overall input
- packet length is 4 + 24 + 10 == 38, and output packet length 32
- </p>
-
- <pre>
- +--+-----------------------+-----------+
-C --> |38| 8 | 1 | 3 | 0 | 1 | 0 | .o.oOo.o. | --> S (call)
- +--+-----------------------+-----------+
-
- +--+-----------------------+--------+
-C <-- |32| 8 | 1 | 3 | 1 | 1 | 0 | .o.oOo | <-- S (reply)
- +--+-----------------------+--------+
- </pre>
-
- <h4><a id="wireexamplescallerr">Method call with
error</a></h4>
-
- <p>
- An unsuccessful method call will instead return an error object
- </p>
-
- <pre>
- +--+-----------------------+-----------+
-C --> |38| 8 | 1 | 3 | 0 | 1 | 0 | .o.oOo.o. | --> S (call)
- +--+-----------------------+-----------+
-
- +--+-----------------------+--------------------------+
-C <-- |48| 8 | 1 | 3 | 2 | 1 | 0 | .o.oOo.o.oOo.o.oOo.o.oOo | <-- S
(error)
- +--+-----------------------+--------------------------+
- </pre>
-
- <h4><a id="wireexamplescallup">Method call with upload
stream</a></h4>
-
- <p>
- A method call which also involves uploading some data over
- a stream will result in
- </p>
-
- <pre>
- +--+-----------------------+-----------+
-C --> |38| 8 | 1 | 3 | 0 | 1 | 0 | .o.oOo.o. | --> S (call)
- +--+-----------------------+-----------+
-
- +--+-----------------------+--------+
-C <-- |32| 8 | 1 | 3 | 1 | 1 | 0 | .o.oOo | <-- S (reply)
- +--+-----------------------+--------+
-
- +--+-----------------------+-------------....-------+
-C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S
(stream data up)
- +--+-----------------------+-------------....-------+
- +--+-----------------------+-------------....-------+
-C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S
(stream data up)
- +--+-----------------------+-------------....-------+
- +--+-----------------------+-------------....-------+
-C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S
(stream data up)
- +--+-----------------------+-------------....-------+
- ...
- +--+-----------------------+-------------....-------+
-C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S
(stream data up)
- +--+-----------------------+-------------....-------+
- +--+-----------------------+
-C --> |24| 8 | 1 | 3 | 3 | 1 | 0 | --> S (stream finish)
- +--+-----------------------+
- +--+-----------------------+
-C <-- |24| 8 | 1 | 3 | 3 | 1 | 0 | <-- S (stream finish)
- +--+-----------------------+
- </pre>
-
- <h4><a id="wireexamplescallbi">Method call bidirectional
stream</a></h4>
-
- <p>
- A method call which also involves a bi-directional stream will
- result in
- </p>
-
- <pre>
- +--+-----------------------+-----------+
-C --> |38| 8 | 1 | 3 | 0 | 1 | 0 | .o.oOo.o. | --> S (call)
- +--+-----------------------+-----------+
-
- +--+-----------------------+--------+
-C <-- |32| 8 | 1 | 3 | 1 | 1 | 0 | .o.oOo | <-- S (reply)
- +--+-----------------------+--------+
-
- +--+-----------------------+-------------....-------+
-C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S
(stream data up)
- +--+-----------------------+-------------....-------+
- +--+-----------------------+-------------....-------+
-C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S
(stream data up)
- +--+-----------------------+-------------....-------+
- +--+-----------------------+-------------....-------+
-C <-- |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | <-- S
(stream data down)
- +--+-----------------------+-------------....-------+
- +--+-----------------------+-------------....-------+
-C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S
(stream data up)
- +--+-----------------------+-------------....-------+
- +--+-----------------------+-------------....-------+
-C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S
(stream data up)
- +--+-----------------------+-------------....-------+
- +--+-----------------------+-------------....-------+
-C <-- |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | <-- S
(stream data down)
- +--+-----------------------+-------------....-------+
- +--+-----------------------+-------------....-------+
-C <-- |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | <-- S
(stream data down)
- +--+-----------------------+-------------....-------+
- +--+-----------------------+-------------....-------+
-C <-- |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | <-- S
(stream data down)
- +--+-----------------------+-------------....-------+
- +--+-----------------------+-------------....-------+
-C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S
(stream data up)
- +--+-----------------------+-------------....-------+
- ..
- +--+-----------------------+-------------....-------+
-C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S
(stream data up)
- +--+-----------------------+-------------....-------+
- +--+-----------------------+
-C --> |24| 8 | 1 | 3 | 3 | 1 | 0 | --> S (stream finish)
- +--+-----------------------+
- +--+-----------------------+
-C <-- |24| 8 | 1 | 3 | 3 | 1 | 0 | <-- S (stream finish)
- +--+-----------------------+
- </pre>
-
-
- <h4><a id="wireexamplescallmany">Method calls
overlapping</a></h4>
- <pre>
- +--+-----------------------+-----------+
-C --> |38| 8 | 1 | 3 | 0 | 1 | 0 | .o.oOo.o. | --> S (call 1)
- +--+-----------------------+-----------+
- +--+-----------------------+-----------+
-C --> |38| 8 | 1 | 3 | 0 | 2 | 0 | .o.oOo.o. | --> S (call 2)
- +--+-----------------------+-----------+
- +--+-----------------------+--------+
-C <-- |32| 8 | 1 | 3 | 1 | 2 | 0 | .o.oOo | <-- S (reply 2)
- +--+-----------------------+--------+
- +--+-----------------------+-----------+
-C --> |38| 8 | 1 | 3 | 0 | 3 | 0 | .o.oOo.o. | --> S (call 3)
- +--+-----------------------+-----------+
- +--+-----------------------+--------+
-C <-- |32| 8 | 1 | 3 | 1 | 3 | 0 | .o.oOo | <-- S (reply 3)
- +--+-----------------------+--------+
- +--+-----------------------+-----------+
-C --> |38| 8 | 1 | 3 | 0 | 4 | 0 | .o.oOo.o. | --> S (call 4)
- +--+-----------------------+-----------+
- +--+-----------------------+--------+
-C <-- |32| 8 | 1 | 3 | 1 | 1 | 0 | .o.oOo | <-- S (reply 1)
- +--+-----------------------+--------+
- +--+-----------------------+--------+
-C <-- |32| 8 | 1 | 3 | 1 | 4 | 0 | .o.oOo | <-- S (reply 4)
- +--+-----------------------+--------+
- </pre>
-
- <h4><a id="wireexamplescallfd">Method call with passed
FD</a></h4>
-
- <p>
- A single method call with 2 passed file descriptors and successful
- reply, for a program=8, version=1, procedure=3, which 10 bytes worth
- of input args, and 4 bytes worth of return values. The number of
- file descriptors is encoded as a 32-bit int. Each file descriptor
- then has a 1 byte dummy payload. The overall input
- packet length is 4 + 24 + 4 + 2 + 10 == 44, and output packet length 32.
- </p>
-
- <pre>
- +--+-----------------------+---------------+-------+
-C --> |44| 8 | 1 | 3 | 0 | 1 | 0 | 2 | .o.oOo.o. | 0 | 0 | --> S (call)
- +--+-----------------------+---------------+-------+
-
- +--+-----------------------+--------+
-C <-- |32| 8 | 1 | 3 | 1 | 1 | 0 | .o.oOo | <-- S (reply)
- +--+-----------------------+--------+
- </pre>
-
-
- <h2><a id="security">RPC security</a></h2>
-
- <p>
- There are various things to consider to ensure an implementation
- of the RPC protocol can be satisfactorily secured
- </p>
-
- <h3><a
id="securitytls">Authentication/encryption</a></h3>
-
- <p>
- The basic RPC protocol does not define or require any specific
- authentication/encryption capabilities. A generic solution to
- providing encryption for the protocol is to run the protocol
- over a TLS encrypted data stream. x509 certificate checks can
- be done to form a crude authentication mechanism. It is also
- possible for an RPC program to negotiate an encryption /
- authentication capability, such as SASL, which may then also
- provide per-packet data encryption. Finally the protocol data
- stream can of course be tunnelled over transports such as SSH.
- </p>
-
- <h3><a id="securitylimits">Data limits</a></h3>
-
- <p>
- Although the protocol itself defines many arbitrary sized data values in the
- payloads, to avoid denial of service attack there are a number of size limit
- checks prior to encoding or decoding data. There is a limit on the maximum
- size of a single RPC message, limit on the maximum string length, and limits
- on any other parameter which uses a variable length array. These limits can
- be raised, subject to agreement between client/server, without otherwise
- breaking compatibility of the RPC data on the wire.
- </p>
-
- <h3><a id="securityvalidate">Data
validation</a></h3>
-
- <p>
- It is important that all data be fully validated before performing
- any actions based on the data. When reading an RPC packet, the
- first four bytes must be read and the max packet size limit validated,
- before any attempt is made to read the variable length packet data.
- After a complete packet has been read, the header must be decoded
- and all 6 fields fully validated, before attempting to dispatch
- the payload. Once dispatched, the payload can be decoded and passed
- on to the appropriate API for execution. The RPC code must not take
- any action based on the payload, since it has no way to validate
- the semantics of the payload data. It must delegate this to the
- execution API (e.g. corresponding libvirt public API).
- </p>
-
- <h2><a id="internals">RPC internal APIs</a></h2>
-
- <p>
- The generic internal RPC library code lives in the
<code>src/rpc/</code>
- directory of the libvirt source tree. Unless otherwise noted, the
- objects are all threadsafe. The core object types and their
- purposes are:
- </p>
-
- <h3><a id="apioverview">Overview of RPC
objects</a></h3>
-
- <p>
- The following is a high level overview of the role of each
- of the main RPC objects
- </p>
-
- <dl>
- <dt><code>virNetSASLContext *</code>
(virnetsaslcontext.h)</dt>
- <dd>The virNetSASLContext APIs maintain SASL state for a network
- service (server or client). This is primarily used on the server
- to provide an access control list of SASL usernames permitted as
- clients.
- </dd>
-
- <dt><code>virNetSASLSession *</code>
(virnetsaslcontext.h)</dt>
- <dd>The virNetSASLSession APIs maintain SASL state for a single
- network connection (socket). This is used to perform the multi-step
- SASL handshake and perform encryption/decryption of data once
- authenticated, via integration with virNetSocket.
- </dd>
-
- <dt><code>virNetTLSContext *</code>
(virnettlscontext.h)</dt>
- <dd>The virNetTLSContext APIs maintain TLS state for a network
- service (server or client). This is primarily used on the server
- to provide an access control list of x509 distinguished names, as
- well as diffie-hellman keys. It can also do validation of
- x509 certificates prior to initiating a connection, in order
- to improve detection of configuration errors.
- </dd>
-
- <dt><code>virNetTLSSession *</code>
(virnettlscontext.h)</dt>
- <dd>The virNetTLSSession APIs maintain TLS state for a single
- network connection (socket). This is used to perform the multi-step
- TLS handshake and perform encryption/decryption of data once
- authenticated, via integration with virNetSocket.
- </dd>
-
- <dt><code>virNetSocket *</code> (virnetsocket.h)</dt>
- <dd>The virNetSocket APIs provide a higher level wrapper around
- the raw BSD sockets and getaddrinfo APIs. They allow for creation
- of both server and client sockets. Data transports supported are
- TCP, UNIX, SSH tunnel or external command tunnel. Internally the
- TCP socket impl uses the getaddrinfo info APIs to ensure correct
- protocol-independent behaviour, thus supporting both IPv4 and IPv6.
- The socket APIs can be associated with a virNetSASLSession *or
- virNetTLSSession *object to allow seamless encryption/decryption
- of all writes and reads. For UNIX sockets it is possible to obtain
- the remote client user ID and process ID. Integration with the
- libvirt event loop also allows use of callbacks for notification
- of various I/O conditions
- </dd>
-
- <dt><code>virNetMessage *</code> (virnetmessage.h)</dt>
- <dd>The virNetMessage APIs provide a wrapper around the libxdr
- API calls, to facilitate processing and creation of RPC
- packets. There are convenience APIs for encoding/encoding the
- packet headers, encoding/decoding the payload using an XDR
- filter, encoding/decoding a raw payload (for streams), and
- encoding a virErrorPtr object. There is also a means to
- add to/serve from a linked-list queue of messages.</dd>
-
- <dt><code>virNetClient *</code> (virnetclient.h)</dt>
- <dd>The virNetClient APIs provide a way to connect to a
- remote server and run one or more RPC protocols over
- the connection. Connections can be made over TCP, UNIX
- sockets, SSH tunnels, or external command tunnels. There
- is support for both TLS and SASL session encryption.
- The client also supports management of multiple data streams
- over each connection. Each client object can be used from
- multiple threads concurrently, with method calls/replies
- being interleaved on the wire as required.
- </dd>
-
- <dt><code>virNetClientProgram *</code>
(virnetclientprogram.h)</dt>
- <dd>The virNetClientProgram APIs are used to register a
- program+version with the connection. This then enables
- invocation of method calls, receipt of asynchronous
- events and use of data streams, within that program+version.
- When created a set of callbacks must be supplied to take
- care of dispatching any incoming asynchronous events.
- </dd>
-
- <dt><code>virNetClientStream *</code>
(virnetclientstream.h)</dt>
- <dd>The virNetClientStream APIs are used to control transmission and
- receipt of data over a stream active on a client. Streams provide
- a low latency, unlimited length, bi-directional raw data exchange
- mechanism layered over the RPC connection
- </dd>
-
- <dt><code>virNetServer *</code> (virnetserver.h)</dt>
- <dd>The virNetServer APIs are used to manage a network server. A
- server exposed one or more programs, over one or more services.
- It manages multiple client connections invoking multiple RPC
- calls in parallel, with dispatch across multiple worker threads.
- </dd>
-
- <dt><code>virNetDaemon *</code> (virnetdaemon.h)</dt>
- <dd>The virNetDaemon APIs are used to manage a daemon process. A
- daemon is a process that might expose one or more servers. It
- handles most process-related details, network-related should
- be part of the underlying server.
- </dd>
-
- <dt><code>virNetServerClient *</code>
(virnetserverclient.h)</dt>
- <dd>The virNetServerClient APIs are used to manage I/O related
- to a single client network connection. It handles initial
- validation and routing of incoming RPC packets, and transmission
- of outgoing packets.
- </dd>
-
- <dt><code>virNetServerProgram *</code>
(virnetserverprogram.h)</dt>
- <dd>The virNetServerProgram APIs are used to provide the implementation
- of a single program/version set. Primarily this includes a set of
- callbacks used to actually invoke the APIs corresponding to
- program procedure numbers. It is responsible for all the serialization
- of payloads to/from XDR.</dd>
-
- <dt><code>virNetServerService *</code>
(virnetserverservice.h)</dt>
- <dd>The virNetServerService APIs are used to connect the server to
- one or more network protocols. A single service may involve multiple
- sockets (ie both IPv4 and IPv6). A service also has an associated
- authentication policy for incoming clients.
- </dd>
- </dl>
-
- <h3><a id="apiclientdispatch">Client RPC
dispatch</a></h3>
-
- <p>
- The client RPC code must allow for multiple overlapping RPC method
- calls to be invoked, transmission and receipt of data for multiple
- streams and receipt of asynchronous events. Understandably this
- involves coordination of multiple threads.
- </p>
-
- <p>
- The core requirement in the client dispatch code is that only
- one thread is allowed to be performing I/O on the socket at
- any time. This thread is said to be "holding the buck". When
- any other thread comes along and needs to do I/O it must place
- its packets on a queue and delegate processing of them to the
- thread that has the buck. This thread will send out the method
- call, and if it sees a reply will pass it back to the waiting
- thread. If the other thread's reply hasn't arrived, by the time
- the main thread has got its own reply, then it will transfer
- responsibility for I/O to the thread that has been waiting the
- longest. It is said to be "passing the buck" for I/O.
- </p>
-
- <p>
- When no thread is performing any RPC method call, or sending
- stream data there is still a need to monitor the socket for
- incoming I/O related to asynchronous events, or stream data
- receipt. For this task, a watch is registered with the event
- loop which triggers whenever the socket is readable. This
- watch is automatically disabled whenever any other thread
- grabs the buck, and re-enabled when the buck is released.
- </p>
-
- <h4><a id="apiclientdispatchex1">Example with buck
passing</a></h4>
-
- <p>
- In the first example, a second thread issues an API call
- while the first thread holds the buck. The reply to the
- first call arrives first, so the buck is passed to the
- second thread.
- </p>
-
- <pre>
- Thread-1
- |
- V
- Call API1()
- |
- V
- Grab Buck
- | Thread-2
- V |
- Send method1 V
- | Call API2()
- V |
- Wait I/O V
- |<--------Queue method2
- V |
- Send method2 V
- | Wait for buck
- V |
- Wait I/O |
- | |
- V |
- Recv reply1 |
- | |
- V |
- Pass the buck----->|
- | V
- V Wait I/O
- Return API1() |
- V
- Recv reply2
- |
- V
- Release the buck
- |
- V
- Return API2()
- </pre>
-
- <h4><a id="apiclientdispatchex2">Example without buck
passing</a></h4>
-
- <p>
- In this second example, a second thread issues an API call
- which is sent and replied to, before the first thread's
- API call has completed. The first thread thus notifies
- the second that its reply is ready, and there is no need
- to pass the buck
- </p>
-
- <pre>
- Thread-1
- |
- V
- Call API1()
- |
- V
- Grab Buck
- | Thread-2
- V |
- Send method1 V
- | Call API2()
- V |
- Wait I/O V
- |<--------Queue method2
- V |
- Send method2 V
- | Wait for buck
- V |
- Wait I/O |
- | |
- V |
- Recv reply2 |
- | |
- V |
- Notify reply2------>|
- | V
- V Return API2()
- Wait I/O
- |
- V
- Recv reply1
- |
- V
- Release the buck
- |
- V
- Return API1()
- </pre>
-
- <h4><a id="apiclientdispatchex3">Example with async
events</a></h4>
-
- <p>
- In this example, only one thread is present and it has to
- deal with some async events arriving. The events are actually
- dispatched to the application from the event loop thread
- </p>
-
- <pre>
- Thread-1
- |
- V
- Call API1()
- |
- V
- Grab Buck
- |
- V
- Send method1
- |
- V
- Wait I/O
- | Event thread
- V ...
- Recv event1 |
- | V
- V Wait for timer/fd
- Queue event1 |
- | V
- V Timer fires
- Wait I/O |
- | V
- V Emit event1
- Recv reply1 |
- | V
- V Wait for timer/fd
- Return API1() |
- ...
- </pre>
-
- <h3><a id="apiserverdispatch">Server RPC
dispatch</a></h3>
-
- <p>
- The RPC server code must support receipt of incoming RPC requests from
- multiple client connections, and parallel processing of all RPC
- requests, even many from a single client. This goal is achieved through
- a combination of event driven I/O, and multiple processing threads.
- </p>
-
- <p>
- The main libvirt event loop thread is responsible for performing all
- socket I/O. It will read incoming packets from clients and will
- transmit outgoing packets to clients. It will handle the I/O to/from
- streams associated with client API calls. When doing client I/O it
- will also pass the data through any applicable encryption layer
- (through use of the virNetSocket / virNetTLSSession and virNetSASLSession
- integration). What is paramount is that the event loop thread never
- do any task that can take a non-trivial amount of time.
- </p>
-
- <p>
- When reading packets, the event loop will first read the 4 byte length
- word. This is validated to make sure it does not exceed the maximum
- permissible packet size, and the client is set to allow receipt of the
- rest of the packet data. Once a complete packet has been received, the
- next step is to decode the RPC header. The header is validated to
- ensure the request is sensible, ie the server should not receive a
- method reply from a client. If the client has not yet authenticated,
- an access control list check is also performed to make sure the procedure
- is one of those allowed prior to auth. If the packet is a method
- call, it will be placed on a global processing queue. The event loop
- thread is now done with the packet for the time being.
- </p>
-
- <p>
- The server has a pool of worker threads, which wait for method call
- packets to be queued. One of them will grab the new method call off
- the queue for processing. The first step is to decode the payload of
- the packet to extract the method call arguments. The worker does not
- attempt to do any semantic validation of the arguments, except to make
- sure the size of any variable length fields is below defined limits.
- </p>
-
- <p>
- The worker now invokes the libvirt API call that corresponds to the
- procedure number in the packet header. The worker is thus kept busy
- until the API call completes. The implementation of the API call
- is responsible for doing semantic validation of parameters and any
- MAC security checks on the objects affected.
- </p>
-
- <p>
- Once the API call has completed, the worker thread will take the
- return value and output parameters, or error object and encode
- them into a reply packet. Again it does not attempt to do any
- semantic validation of output data, aside from variable length
- field limit checks. The worker thread puts the reply packet on
- the transmission queue for the client. The worker is now finished
- and goes back to wait for another incoming method call.
- </p>
-
- <p>
- The main event loop is back in charge and when the client socket
- becomes writable, it will start sending the method reply packet
- back to the client.
- </p>
-
- <p>
- At any time the libvirt connection object can emit asynchronous
- events. These are handled by callbacks in the main event thread.
- The callback will simply encode the event parameters into a new
- data packet and place the packet on the client transmission
- queue.
- </p>
-
- <p>
- Incoming and outgoing stream packets are also directly handled
- by the main event thread. When an incoming stream packet is
- received, instead of placing it in the global dispatch queue
- for the worker threads, it is sidetracked into a per-stream
- processing queue. When the stream becomes writable, queued
- incoming stream packets will be processed, passing their data
- payload on the stream. Conversely when the stream becomes
- readable, chunks of data will be read from it, encoded into
- new outgoing packets, and placed on the client's transmit
- queue.
- </p>
-
- <h4><a id="apiserverdispatchex1">Example with overlapping
methods</a></h4>
-
- <p>
- This example illustrates processing of two incoming methods with
- overlapping execution
- </p>
-
- <pre>
- Event thread Worker 1 Worker 2
- | | |
- V V V
- Wait I/O Wait Job Wait Job
- | | |
- V | |
- Recv method1 | |
- | | |
- V | |
- Queue method1 V |
- | Serve method1 |
- V | |
- Wait I/O V |
- | Call API1() |
- V | |
- Recv method2 | |
- | | |
- V | |
- Queue method2 | V
- | | Serve method2
- V V |
- Wait I/O Return API1() V
- | | Call API2()
- | V |
- V Queue reply1 |
- Send reply1 | |
- | V V
- V Wait Job Return API2()
- Wait I/O | |
- | ... V
- V Queue reply2
- Send reply2 |
- | V
- V Wait Job
- Wait I/O |
- | ...
- ...
- </pre>
-
- <h4><a id="apiserverdispatchex2">Example with stream
data</a></h4>
-
- <p>
- This example illustrates processing of stream data
- </p>
-
- <pre>
- Event thread
- |
- V
- Wait I/O
- |
- V
- Recv stream1
- |
- V
- Queue stream1
- |
- V
- Wait I/O
- |
- V
- Recv stream2
- |
- V
- Queue stream2
- |
- V
- Wait I/O
- |
- V
- Write stream1
- |
- V
- Write stream2
- |
- V
- Wait I/O
- |
- ...
- </pre>
-
- </body>
-</html>
diff --git a/docs/kbase/index.rst b/docs/kbase/index.rst
index 2125bf4252..31711d908b 100644
--- a/docs/kbase/index.rst
+++ b/docs/kbase/index.rst
@@ -94,3 +94,6 @@ Internals
`Lock managers <internals/locking.html>`__
Use lock managers to protect disk content
+
+`RPC protocol & APIs <internals/rpc.html>`__
+ RPC protocol information and API / dispatch guide
diff --git a/docs/kbase/internals/meson.build b/docs/kbase/internals/meson.build
index 8195d7caf0..879c4b2de8 100644
--- a/docs/kbase/internals/meson.build
+++ b/docs/kbase/internals/meson.build
@@ -4,6 +4,7 @@ docs_kbase_internals_files = [
'incremental-backup',
'locking',
'migration',
+ 'rpc',
]
diff --git a/docs/kbase/internals/rpc.rst b/docs/kbase/internals/rpc.rst
new file mode 100644
index 0000000000..02bc880044
--- /dev/null
+++ b/docs/kbase/internals/rpc.rst
@@ -0,0 +1,781 @@
+==========================
+libvirt RPC infrastructure
+==========================
+
+.. contents::
+
+libvirt includes a basic protocol and code to implement an extensible, secure
+client/server RPC service. This was originally designed for communication
+between the libvirt client library and the libvirtd daemon, but the code is now
+isolated to allow reuse in other areas of libvirt code. This document provides
+an overview of the protocol and structure / operation of the internal RPC
+library APIs.
+
+RPC protocol
+------------
+
+libvirt uses a simple, variable length, packet based RPC protocol. All
+structured data within packets is encoded using the `XDR
+standard <
https://en.wikipedia.org/wiki/External_Data_Representation>`__ as
+currently defined by `RFC 4506 <
https://tools.ietf.org/html/rfc4506>`__. On any
+connection running the RPC protocol, there can be multiple programs active, each
+supporting one or more versions. A program defines a set of procedures that it
+supports. The procedures can support call+reply method invocation, asynchronous
+events, and generic data streams. Method invocations can be overlapped, so
+waiting for a reply to one will not block the receipt of the reply to another
+outstanding method. The protocol was loosely inspired by the design of SunRPC.
+The definition of the RPC protocol is in the file ``src/rpc/virnetprotocol.x``
+in the libvirt source tree.
+
+`Packet framing <protocolframing>`__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+On the wire, there is no explicit packet framing marker. Instead each packet is
+preceded by an unsigned 32-bit integer giving the total length of the packet in
+bytes. This length includes the 4-bytes of the length word itself. Conceptually
+the framing looks like this:
+
+::
+
+ |~~~ Packet 1 ~~~|~~~ Packet 2 ~~~|~~~ Packet 3 ~~~|~~~
+
+ +-------+------------+-------+------------+-------+------------+...
+ | n=U32 | (n-4) * U8 | n=U32 | (n-4) * U8 | n=U32 | (n-4) * U8 |
+ +-------+------------+-------+------------+-------+------------+...
+
+ |~ Len ~|~ Data ~|~ Len ~|~ Data ~|~ Len ~|~ Data ~|~
+
+`Packet data <protocoldata>`__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The data in each packet is split into two parts, a short fixed length header,
+followed by a variable length payload. So a packet from the illustration above
+is more correctly shown as
+
+::
+
+
+ +-------+-------------+---------------....---+
+ | n=U32 | 6*U32 | (n-(7*4))*U8 |
+ +-------+-------------+---------------....---+
+
+ |~ Len ~|~ Header ~|~ Payload .... ~|
+
+`Packet header <protocolheader>`__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The header contains 6 fields, encoded as signed/unsigned 32-bit integers.
+
+::
+
+ +---------------+
+ | program=U32 |
+ +---------------+
+ | version=U32 |
+ +---------------+
+ | procedure=S32 |
+ +---------------+
+ | type=S32 |
+ +---------------+
+ | serial=U32 |
+ +---------------+
+ | status=S32 |
+ +---------------+
+
+``program``
+ This is an arbitrarily chosen number that will uniquely identify the
+ "service" running over the stream.
+``version``
+ This is the version number of the program, by convention starting from '1'.
+ When an incompatible change is made to a program, the version number is
+ incremented. Ideally both versions will then be supported on the wire in
+ parallel for backwards compatibility.
+``procedure``
+ This is an arbitrarily chosen number that will uniquely identify the method
+ call, or event associated with the packet. By convention, procedure numbers
+ start from 1 and are assigned monotonically thereafter.
+``type``
+ This can be one of the following enumeration values
+
+ #. call: invocation of a method call
+ #. reply: completion of a method call
+ #. event: an asynchronous event
+ #. stream: control info or data from a stream
+
+``serial``
+ This is a number that starts from 1 and increases each time a method call
+ packet is sent. A reply or stream packet will have a serial number matching
+ the original method call packet serial. Events always have the serial number
+ set to 0.
+``status``
+ This can one of the following enumeration values
+
+ #. ok: a normal packet. this is always set for method calls or events. For
+ replies it indicates successful completion of the method. For streams it
+ indicates confirmation of the end of file on the stream.
+ #. error: for replies this indicates that the method call failed and error
+ information is being returned. For streams this indicates that not all
+ data was sent and the stream has aborted
+ #. continue: for streams this indicates that further data packets will be
+ following
+
+`Packet payload <protocolpayload>`__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The payload of a packet will vary depending on the ``type`` and ``status``
+fields from the header.
+
+- type=call: the in parameters for the method call, XDR encoded
+- type=call-with-fds: number of file handles, then the in parameters for the
+ method call, XDR encoded, followed by the file handles
+- type=reply+status=ok: the return value and/or out parameters for the method
+ call, XDR encoded
+- type=reply+status=error: the error information for the method, a virErrorPtr
+ XDR encoded
+- type=reply-with-fds+status=ok: number of file handles, the return value
+ and/or out parameters for the method call, XDR encoded, followed by the file
+ handles
+- type=reply-with-fds+status=error: number of file handles, the error
+ information for the method, a virErrorPtr XDR encoded, followed by the file
+ handles
+- type=event: the parameters for the event, XDR encoded
+- type=stream+status=ok: no payload
+- type=stream+status=error: the error information for the method, a virErrorPtr
+ XDR encoded
+- type=stream+status=continue: the raw bytes of data for the stream. No XDR
+ encoding
+
+With the two packet types that support passing file descriptors, in between the
+header and the payload there will be a 4-byte integer specifying the number of
+file descriptors which are being sent. The actual file handles are sent after
+the payload has been sent. Each file handle has a single dummy byte transmitted
+as a carrier for the out of band file descriptor. While the sender should always
+send '\0' as the dummy byte value, the receiver ought to ignore the value for
+the sake of robustness.
+
+For the exact payload information for each procedure, consult the XDR protocol
+definition for the program+version in question
+
+Wire examples
+~~~~~~~~~~~~~
+
+The following diagrams illustrate some example packet exchanges between a client
+and server
+
+Method call
+^^^^^^^^^^^
+
+A single method call and successful reply, for a program=8, version=1,
+procedure=3, which 10 bytes worth of input args, and 4 bytes worth of return
+values. The overall input packet length is 4 + 24 + 10 == 38, and output packet
+length 32
+
+::
+
+ +--+-----------------------+-----------+
+ C --> |38| 8 | 1 | 3 | 0 | 1 | 0 | .o.oOo.o. | --> S (call)
+ +--+-----------------------+-----------+
+
+ +--+-----------------------+--------+
+ C <-- |32| 8 | 1 | 3 | 1 | 1 | 0 | .o.oOo | <-- S (reply)
+ +--+-----------------------+--------+
+
+Method call with error
+^^^^^^^^^^^^^^^^^^^^^^
+
+An unsuccessful method call will instead return an error object
+
+::
+
+ +--+-----------------------+-----------+
+ C --> |38| 8 | 1 | 3 | 0 | 1 | 0 | .o.oOo.o. | --> S (call)
+ +--+-----------------------+-----------+
+
+ +--+-----------------------+--------------------------+
+ C <-- |48| 8 | 1 | 3 | 2 | 1 | 0 | .o.oOo.o.oOo.o.oOo.o.oOo | <-- S (error)
+ +--+-----------------------+--------------------------+
+
+Method call with upload stream
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A method call which also involves uploading some data over a stream will result
+in
+
+::
+
+ +--+-----------------------+-----------+
+ C --> |38| 8 | 1 | 3 | 0 | 1 | 0 | .o.oOo.o. | --> S (call)
+ +--+-----------------------+-----------+
+
+ +--+-----------------------+--------+
+ C <-- |32| 8 | 1 | 3 | 1 | 1 | 0 | .o.oOo | <-- S (reply)
+ +--+-----------------------+--------+
+
+ +--+-----------------------+-------------....-------+
+ C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream
data up)
+ +--+-----------------------+-------------....-------+
+ +--+-----------------------+-------------....-------+
+ C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream
data up)
+ +--+-----------------------+-------------....-------+
+ +--+-----------------------+-------------....-------+
+ C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream
data up)
+ +--+-----------------------+-------------....-------+
+ ...
+ +--+-----------------------+-------------....-------+
+ C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream
data up)
+ +--+-----------------------+-------------....-------+
+ +--+-----------------------+
+ C --> |24| 8 | 1 | 3 | 3 | 1 | 0 | --> S (stream finish)
+ +--+-----------------------+
+ +--+-----------------------+
+ C <-- |24| 8 | 1 | 3 | 3 | 1 | 0 | <-- S (stream finish)
+ +--+-----------------------+
+
+Method call bidirectional stream
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A method call which also involves a bi-directional stream will result in
+
+::
+
+ +--+-----------------------+-----------+
+ C --> |38| 8 | 1 | 3 | 0 | 1 | 0 | .o.oOo.o. | --> S (call)
+ +--+-----------------------+-----------+
+
+ +--+-----------------------+--------+
+ C <-- |32| 8 | 1 | 3 | 1 | 1 | 0 | .o.oOo | <-- S (reply)
+ +--+-----------------------+--------+
+
+ +--+-----------------------+-------------....-------+
+ C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream
data up)
+ +--+-----------------------+-------------....-------+
+ +--+-----------------------+-------------....-------+
+ C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream
data up)
+ +--+-----------------------+-------------....-------+
+ +--+-----------------------+-------------....-------+
+ C <-- |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | <-- S (stream
data down)
+ +--+-----------------------+-------------....-------+
+ +--+-----------------------+-------------....-------+
+ C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream
data up)
+ +--+-----------------------+-------------....-------+
+ +--+-----------------------+-------------....-------+
+ C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream
data up)
+ +--+-----------------------+-------------....-------+
+ +--+-----------------------+-------------....-------+
+ C <-- |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | <-- S (stream
data down)
+ +--+-----------------------+-------------....-------+
+ +--+-----------------------+-------------....-------+
+ C <-- |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | <-- S (stream
data down)
+ +--+-----------------------+-------------....-------+
+ +--+-----------------------+-------------....-------+
+ C <-- |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | <-- S (stream
data down)
+ +--+-----------------------+-------------....-------+
+ +--+-----------------------+-------------....-------+
+ C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream
data up)
+ +--+-----------------------+-------------....-------+
+ ..
+ +--+-----------------------+-------------....-------+
+ C --> |38| 8 | 1 | 3 | 3 | 1 | 2 | .o.oOo.o.oOo....o.oOo. | --> S (stream
data up)
+ +--+-----------------------+-------------....-------+
+ +--+-----------------------+
+ C --> |24| 8 | 1 | 3 | 3 | 1 | 0 | --> S (stream finish)
+ +--+-----------------------+
+ +--+-----------------------+
+ C <-- |24| 8 | 1 | 3 | 3 | 1 | 0 | <-- S (stream finish)
+ +--+-----------------------+
+
+Method calls overlapping
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+::
+
+ +--+-----------------------+-----------+
+ C --> |38| 8 | 1 | 3 | 0 | 1 | 0 | .o.oOo.o. | --> S (call 1)
+ +--+-----------------------+-----------+
+ +--+-----------------------+-----------+
+ C --> |38| 8 | 1 | 3 | 0 | 2 | 0 | .o.oOo.o. | --> S (call 2)
+ +--+-----------------------+-----------+
+ +--+-----------------------+--------+
+ C <-- |32| 8 | 1 | 3 | 1 | 2 | 0 | .o.oOo | <-- S (reply 2)
+ +--+-----------------------+--------+
+ +--+-----------------------+-----------+
+ C --> |38| 8 | 1 | 3 | 0 | 3 | 0 | .o.oOo.o. | --> S (call 3)
+ +--+-----------------------+-----------+
+ +--+-----------------------+--------+
+ C <-- |32| 8 | 1 | 3 | 1 | 3 | 0 | .o.oOo | <-- S (reply 3)
+ +--+-----------------------+--------+
+ +--+-----------------------+-----------+
+ C --> |38| 8 | 1 | 3 | 0 | 4 | 0 | .o.oOo.o. | --> S (call 4)
+ +--+-----------------------+-----------+
+ +--+-----------------------+--------+
+ C <-- |32| 8 | 1 | 3 | 1 | 1 | 0 | .o.oOo | <-- S (reply 1)
+ +--+-----------------------+--------+
+ +--+-----------------------+--------+
+ C <-- |32| 8 | 1 | 3 | 1 | 4 | 0 | .o.oOo | <-- S (reply 4)
+ +--+-----------------------+--------+
+
+Method call with passed FD
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A single method call with 2 passed file descriptors and successful reply, for a
+program=8, version=1, procedure=3, which 10 bytes worth of input args, and 4
+bytes worth of return values. The number of file descriptors is encoded as a
+32-bit int. Each file descriptor then has a 1 byte dummy payload. The overall
+input packet length is 4 + 24 + 4 + 2 + 10 == 44, and output packet length 32.
+
+::
+
+ +--+-----------------------+---------------+-------+
+ C --> |44| 8 | 1 | 3 | 0 | 1 | 0 | 2 | .o.oOo.o. | 0 | 0 | --> S (call)
+ +--+-----------------------+---------------+-------+
+
+ +--+-----------------------+--------+
+ C <-- |32| 8 | 1 | 3 | 1 | 1 | 0 | .o.oOo | <-- S (reply)
+ +--+-----------------------+--------+
+
+RPC security
+------------
+
+There are various things to consider to ensure an implementation of the RPC
+protocol can be satisfactorily secured
+
+Authentication/encryption
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The basic RPC protocol does not define or require any specific
+authentication/encryption capabilities. A generic solution to providing
+encryption for the protocol is to run the protocol over a TLS encrypted data
+stream. x509 certificate checks can be done to form a crude authentication
+mechanism. It is also possible for an RPC program to negotiate an encryption /
+authentication capability, such as SASL, which may then also provide per-packet
+data encryption. Finally the protocol data stream can of course be tunnelled
+over transports such as SSH.
+
+Data limits
+~~~~~~~~~~~
+
+Although the protocol itself defines many arbitrary sized data values in the
+payloads, to avoid denial of service attack there are a number of size limit
+checks prior to encoding or decoding data. There is a limit on the maximum size
+of a single RPC message, limit on the maximum string length, and limits on any
+other parameter which uses a variable length array. These limits can be raised,
+subject to agreement between client/server, without otherwise breaking
+compatibility of the RPC data on the wire.
+
+Data validation
+~~~~~~~~~~~~~~~
+
+It is important that all data be fully validated before performing any actions
+based on the data. When reading an RPC packet, the first four bytes must be read
+and the max packet size limit validated, before any attempt is made to read the
+variable length packet data. After a complete packet has been read, the header
+must be decoded and all 6 fields fully validated, before attempting to dispatch
+the payload. Once dispatched, the payload can be decoded and passed on to the
+appropriate API for execution. The RPC code must not take any action based on
+the payload, since it has no way to validate the semantics of the payload data.
+It must delegate this to the execution API (e.g. corresponding libvirt public
+API).
+
+RPC internal APIs
+-----------------
+
+The generic internal RPC library code lives in the ``src/rpc/`` directory of the
+libvirt source tree. Unless otherwise noted, the objects are all threadsafe. The
+core object types and their purposes are:
+
+Overview of RPC objects
+~~~~~~~~~~~~~~~~~~~~~~~
+
+The following is a high level overview of the role of each of the main RPC
+objects
+
+``virNetSASLContext *`` (virnetsaslcontext.h)
+ The virNetSASLContext APIs maintain SASL state for a network service (server
+ or client). This is primarily used on the server to provide an access control
+ list of SASL usernames permitted as clients.
+``virNetSASLSession *`` (virnetsaslcontext.h)
+ The virNetSASLSession APIs maintain SASL state for a single network
+ connection (socket). This is used to perform the multi-step SASL handshake
+ and perform encryption/decryption of data once authenticated, via integration
+ with virNetSocket.
+``virNetTLSContext *`` (virnettlscontext.h)
+ The virNetTLSContext APIs maintain TLS state for a network service (server or
+ client). This is primarily used on the server to provide an access control
+ list of x509 distinguished names, as well as diffie-hellman keys. It can also
+ do validation of x509 certificates prior to initiating a connection, in order
+ to improve detection of configuration errors.
+``virNetTLSSession *`` (virnettlscontext.h)
+ The virNetTLSSession APIs maintain TLS state for a single network connection
+ (socket). This is used to perform the multi-step TLS handshake and perform
+ encryption/decryption of data once authenticated, via integration with
+ virNetSocket.
+``virNetSocket *`` (virnetsocket.h)
+ The virNetSocket APIs provide a higher level wrapper around the raw BSD
+ sockets and getaddrinfo APIs. They allow for creation of both server and
+ client sockets. Data transports supported are TCP, UNIX, SSH tunnel or
+ external command tunnel. Internally the TCP socket impl uses the getaddrinfo
+ info APIs to ensure correct protocol-independent behaviour, thus supporting
+ both IPv4 and IPv6. The socket APIs can be associated with a
+ virNetSASLSession \*or virNetTLSSession \*object to allow seamless
+ encryption/decryption of all writes and reads. For UNIX sockets it is
+ possible to obtain the remote client user ID and process ID. Integration with
+ the libvirt event loop also allows use of callbacks for notification of
+ various I/O conditions
+``virNetMessage *`` (virnetmessage.h)
+ The virNetMessage APIs provide a wrapper around the libxdr API calls, to
+ facilitate processing and creation of RPC packets. There are convenience APIs
+ for encoding/encoding the packet headers, encoding/decoding the payload using
+ an XDR filter, encoding/decoding a raw payload (for streams), and encoding a
+ virErrorPtr object. There is also a means to add to/serve from a linked-list
+ queue of messages.
+``virNetClient *`` (virnetclient.h)
+ The virNetClient APIs provide a way to connect to a remote server and run one
+ or more RPC protocols over the connection. Connections can be made over TCP,
+ UNIX sockets, SSH tunnels, or external command tunnels. There is support for
+ both TLS and SASL session encryption. The client also supports management of
+ multiple data streams over each connection. Each client object can be used
+ from multiple threads concurrently, with method calls/replies being
+ interleaved on the wire as required.
+``virNetClientProgram *`` (virnetclientprogram.h)
+ The virNetClientProgram APIs are used to register a program+version with the
+ connection. This then enables invocation of method calls, receipt of
+ asynchronous events and use of data streams, within that program+version.
+ When created a set of callbacks must be supplied to take care of dispatching
+ any incoming asynchronous events.
+``virNetClientStream *`` (virnetclientstream.h)
+ The virNetClientStream APIs are used to control transmission and receipt of
+ data over a stream active on a client. Streams provide a low latency,
+ unlimited length, bi-directional raw data exchange mechanism layered over the
+ RPC connection
+``virNetServer *`` (virnetserver.h)
+ The virNetServer APIs are used to manage a network server. A server exposed
+ one or more programs, over one or more services. It manages multiple client
+ connections invoking multiple RPC calls in parallel, with dispatch across
+ multiple worker threads.
+``virNetDaemon *`` (virnetdaemon.h)
+ The virNetDaemon APIs are used to manage a daemon process. A daemon is a
+ process that might expose one or more servers. It handles most
+ process-related details, network-related should be part of the underlying
+ server.
+``virNetServerClient *`` (virnetserverclient.h)
+ The virNetServerClient APIs are used to manage I/O related to a single client
+ network connection. It handles initial validation and routing of incoming RPC
+ packets, and transmission of outgoing packets.
+``virNetServerProgram *`` (virnetserverprogram.h)
+ The virNetServerProgram APIs are used to provide the implementation of a
+ single program/version set. Primarily this includes a set of callbacks used
+ to actually invoke the APIs corresponding to program procedure numbers. It is
+ responsible for all the serialization of payloads to/from XDR.
+``virNetServerService *`` (virnetserverservice.h)
+ The virNetServerService APIs are used to connect the server to one or more
+ network protocols. A single service may involve multiple sockets (ie both
+ IPv4 and IPv6). A service also has an associated authentication policy for
+ incoming clients.
+
+Client RPC dispatch
+~~~~~~~~~~~~~~~~~~~
+
+The client RPC code must allow for multiple overlapping RPC method calls to be
+invoked, transmission and receipt of data for multiple streams and receipt of
+asynchronous events. Understandably this involves coordination of multiple
+threads.
+
+The core requirement in the client dispatch code is that only one thread is
+allowed to be performing I/O on the socket at any time. This thread is said to
+be "holding the buck". When any other thread comes along and needs to do I/O
it
+must place its packets on a queue and delegate processing of them to the thread
+that has the buck. This thread will send out the method call, and if it sees a
+reply will pass it back to the waiting thread. If the other thread's reply
+hasn't arrived, by the time the main thread has got its own reply, then it will
+transfer responsibility for I/O to the thread that has been waiting the longest.
+It is said to be "passing the buck" for I/O.
+
+When no thread is performing any RPC method call, or sending stream data there
+is still a need to monitor the socket for incoming I/O related to asynchronous
+events, or stream data receipt. For this task, a watch is registered with the
+event loop which triggers whenever the socket is readable. This watch is
+automatically disabled whenever any other thread grabs the buck, and re-enabled
+when the buck is released.
+
+Example with buck passing
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In the first example, a second thread issues an API call while the first thread
+holds the buck. The reply to the first call arrives first, so the buck is passed
+to the second thread.
+
+::
+
+ Thread-1
+ |
+ V
+ Call API1()
+ |
+ V
+ Grab Buck
+ | Thread-2
+ V |
+ Send method1 V
+ | Call API2()
+ V |
+ Wait I/O V
+ |<--------Queue method2
+ V |
+ Send method2 V
+ | Wait for buck
+ V |
+ Wait I/O |
+ | |
+ V |
+ Recv reply1 |
+ | |
+ V |
+ Pass the buck----->|
+ | V
+ V Wait I/O
+ Return API1() |
+ V
+ Recv reply2
+ |
+ V
+ Release the buck
+ |
+ V
+ Return API2()
+
+Example without buck passing
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In this second example, a second thread issues an API call which is sent and
+replied to, before the first thread's API call has completed. The first thread
+thus notifies the second that its reply is ready, and there is no need to pass
+the buck
+
+::
+
+ Thread-1
+ |
+ V
+ Call API1()
+ |
+ V
+ Grab Buck
+ | Thread-2
+ V |
+ Send method1 V
+ | Call API2()
+ V |
+ Wait I/O V
+ |<--------Queue method2
+ V |
+ Send method2 V
+ | Wait for buck
+ V |
+ Wait I/O |
+ | |
+ V |
+ Recv reply2 |
+ | |
+ V |
+ Notify reply2------>|
+ | V
+ V Return API2()
+ Wait I/O
+ |
+ V
+ Recv reply1
+ |
+ V
+ Release the buck
+ |
+ V
+ Return API1()
+
+Example with async events
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In this example, only one thread is present and it has to deal with some async
+events arriving. The events are actually dispatched to the application from the
+event loop thread
+
+::
+
+ Thread-1
+ |
+ V
+ Call API1()
+ |
+ V
+ Grab Buck
+ |
+ V
+ Send method1
+ |
+ V
+ Wait I/O
+ | Event thread
+ V ...
+ Recv event1 |
+ | V
+ V Wait for timer/fd
+ Queue event1 |
+ | V
+ V Timer fires
+ Wait I/O |
+ | V
+ V Emit event1
+ Recv reply1 |
+ | V
+ V Wait for timer/fd
+ Return API1() |
+ ...
+
+Server RPC dispatch
+~~~~~~~~~~~~~~~~~~~
+
+The RPC server code must support receipt of incoming RPC requests from multiple
+client connections, and parallel processing of all RPC requests, even many from
+a single client. This goal is achieved through a combination of event driven
+I/O, and multiple processing threads.
+
+The main libvirt event loop thread is responsible for performing all socket I/O.
+It will read incoming packets from clients and will transmit outgoing packets to
+clients. It will handle the I/O to/from streams associated with client API
+calls. When doing client I/O it will also pass the data through any applicable
+encryption layer (through use of the virNetSocket / virNetTLSSession and
+virNetSASLSession integration). What is paramount is that the event loop thread
+never do any task that can take a non-trivial amount of time.
+
+When reading packets, the event loop will first read the 4 byte length word.
+This is validated to make sure it does not exceed the maximum permissible packet
+size, and the client is set to allow receipt of the rest of the packet data.
+Once a complete packet has been received, the next step is to decode the RPC
+header. The header is validated to ensure the request is sensible, ie the server
+should not receive a method reply from a client. If the client has not yet
+authenticated, an access control list check is also performed to make sure the
+procedure is one of those allowed prior to auth. If the packet is a method call,
+it will be placed on a global processing queue. The event loop thread is now
+done with the packet for the time being.
+
+The server has a pool of worker threads, which wait for method call packets to
+be queued. One of them will grab the new method call off the queue for
+processing. The first step is to decode the payload of the packet to extract the
+method call arguments. The worker does not attempt to do any semantic validation
+of the arguments, except to make sure the size of any variable length fields is
+below defined limits.
+
+The worker now invokes the libvirt API call that corresponds to the procedure
+number in the packet header. The worker is thus kept busy until the API call
+completes. The implementation of the API call is responsible for doing semantic
+validation of parameters and any MAC security checks on the objects affected.
+
+Once the API call has completed, the worker thread will take the return value
+and output parameters, or error object and encode them into a reply packet.
+Again it does not attempt to do any semantic validation of output data, aside
+from variable length field limit checks. The worker thread puts the reply packet
+on the transmission queue for the client. The worker is now finished and goes
+back to wait for another incoming method call.
+
+The main event loop is back in charge and when the client socket becomes
+writable, it will start sending the method reply packet back to the client.
+
+At any time the libvirt connection object can emit asynchronous events. These
+are handled by callbacks in the main event thread. The callback will simply
+encode the event parameters into a new data packet and place the packet on the
+client transmission queue.
+
+Incoming and outgoing stream packets are also directly handled by the main event
+thread. When an incoming stream packet is received, instead of placing it in the
+global dispatch queue for the worker threads, it is sidetracked into a
+per-stream processing queue. When the stream becomes writable, queued incoming
+stream packets will be processed, passing their data payload on the stream.
+Conversely when the stream becomes readable, chunks of data will be read from
+it, encoded into new outgoing packets, and placed on the client's transmit
+queue.
+
+Example with overlapping methods
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This example illustrates processing of two incoming methods with overlapping
+execution
+
+::
+
+ Event thread Worker 1 Worker 2
+ | | |
+ V V V
+ Wait I/O Wait Job Wait Job
+ | | |
+ V | |
+ Recv method1 | |
+ | | |
+ V | |
+ Queue method1 V |
+ | Serve method1 |
+ V | |
+ Wait I/O V |
+ | Call API1() |
+ V | |
+ Recv method2 | |
+ | | |
+ V | |
+ Queue method2 | V
+ | | Serve method2
+ V V |
+ Wait I/O Return API1() V
+ | | Call API2()
+ | V |
+ V Queue reply1 |
+ Send reply1 | |
+ | V V
+ V Wait Job Return API2()
+ Wait I/O | |
+ | ... V
+ V Queue reply2
+ Send reply2 |
+ | V
+ V Wait Job
+ Wait I/O |
+ | ...
+ ...
+
+Example with stream data
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+This example illustrates processing of stream data
+
+::
+
+ Event thread
+ |
+ V
+ Wait I/O
+ |
+ V
+ Recv stream1
+ |
+ V
+ Queue stream1
+ |
+ V
+ Wait I/O
+ |
+ V
+ Recv stream2
+ |
+ V
+ Queue stream2
+ |
+ V
+ Wait I/O
+ |
+ V
+ Write stream1
+ |
+ V
+ Write stream2
+ |
+ V
+ Wait I/O
+ |
+ ...
--
2.35.1