On 08/11/2011 09:12 AM, Daniel P. Berrange wrote:
From: "Daniel P. Berrange" <berrange(a)redhat.com>
* remote.html.in: Remove obsolete notes about internals of the
RPC protocol
* internals/rpc.html.in: Extensive docs on RPC protocol/API
* sitemap.html.in: Add new page
---
docs/internals/rpc.html.in | 876 ++++++++++++++++++++++++++++++++++++++++++++
docs/remote.html.in | 45 ---
docs/sitemap.html.in | 4 +
3 files changed, 880 insertions(+), 45 deletions(-)
create mode 100644 docs/internals/rpc.html.in
ACK, and very nice. Some nits below to fix first...
+<p>
+ libvirt includes a basic protocol and code to implement
+ an extensible, secure client/server RPC service. This was
+ originally designed for communication between the libvirt
+ client library and the libvirtd daemon. It is also also
+ used for communication to the virtlockd daemon and (soon)
+ for the libvirt_lxc controller process. This document
Do we want to mention "(soon)"? I find that docs tend to quickly get
outdated (a new feature gets added, and we forget to modify this file,
and then the reference to soon sounds funny). Conversely, we don't have
libvirt_lxc controller yet, so mentioning it without some disclaimer
also seems odd. I don't have any better suggestion, other than to
remember to delete "(soon)" once we do add libvirt_lxc controller.
+ so waiting for a reply to one will not block the receipt of the
+ reply to another outstanding method. The protocol was loosely
+ inspired by the design of SunRPC. The definition of the RC
s/RC/RPC/
+ protocol is in the file <code>src/rpc/virnetprotocol.x</code>
+ in the libvirt source tree.
+</p>
+
+<h3><a href="protocolframing">Packet framing</a></h3>
+
+<p>
+ On the wire, there is no explicit packet framing marker. Instead
+ each packet is preceeded by an unsigned 32-bit integer giving
s/preceeded/preceded/
+
+<h3><a href="protocolheader">Packet header</a></h3>
+<p>
+ The header contains 6 fields, encoded as signed/unsigned 32-bit
+ integers.
Mention that signed integers are two's-complement.
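To make the two's-complement point concrete, here is a minimal Python sketch (not libvirt code) of how a signed and an unsigned 32-bit field would look on the wire. It assumes XDR-style big-endian byte order, which the quoted text does not itself state, and the helper names are made up for illustration.

```python
import struct

def encode_int32(value: int) -> bytes:
    """Encode a signed 32-bit integer as big-endian two's complement."""
    return struct.pack(">i", value)

def encode_uint32(value: int) -> bytes:
    """Encode an unsigned 32-bit integer in big-endian byte order."""
    return struct.pack(">I", value)

def decode_int32(data: bytes) -> int:
    """Decode four bytes as a big-endian two's-complement integer."""
    return struct.unpack(">i", data)[0]
```

In this encoding, signed -1 and unsigned 0xFFFFFFFF produce the same four bytes, which is why the docs should say which interpretation applies to each field.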
+<dt><code>procedure</code></dt>
+<dd>
+ This is an arbitrarily chosen number that will uniqely
s/uniqely/uniquely/
+<dt><code>status</code></dt>
+<dd>
+<p>
+ This can one of the following enumeration values
+</p>
+<ol>
+<li>ok: a normal packet. this is always set for method calls or events.
+ For replies it indicates succesful completion of the method. For
s/succesful/successful/
+
+<h4><a name="wireexamplescall">Method call</a></h4>
+
+<p>
+ A single method call and succesful
s/succesful/successful/
+<h3><a name="securitylimits">Data limits</a></h3>
+
+<p>
+ Although the protocol itself defines many arbitrary sized data values in the
+ payloads, to avoid denial of service attack there are a number of size limit
+ checks prior to encoding or decoding data. There is a limit on the maximum
+ size of a single RPC message, limit on the maximum string length, and limits
+ on any other parameter which uses a variable length array. These limits can
+ be raised, subject to agreement between client/server, without otherwise
+ breaking compatibility of the RPC data on the wire.
Hmm, sounds like we might someday want some capability negotiation,
where the client can learn whether the server supports resizes, and if
so, then the client can resize these limits on the fly, instead of the
current approach of recompiling libvirt with the new limits.
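As a rough illustration of what a runtime-adjustable limit could look like, here is a Python sketch (not libvirt code) of decoding one variable-length string with the size check done before the body is touched. The limit value, the function name and the exception type are all hypothetical; the layout assumed is the usual XDR one (32-bit big-endian length, then the bytes padded to a multiple of four).

```python
import struct

MAX_STRING = 65536  # hypothetical limit, not libvirt's actual value

class LimitExceeded(ValueError):
    """Raised when a declared length is larger than the agreed limit."""

def decode_string(buf: bytes, offset: int = 0, max_len: int = MAX_STRING):
    """Decode one XDR-style string; return (text, offset of next item)."""
    if len(buf) - offset < 4:
        raise ValueError("truncated length word")
    (length,) = struct.unpack_from(">I", buf, offset)
    # Check the declared length against the limit before using it.
    if length > max_len:
        raise LimitExceeded(f"string of {length} bytes exceeds {max_len}")
    start = offset + 4
    padded_end = start + (length + 3) // 4 * 4  # XDR pads to 4 bytes
    if padded_end > len(buf):
        raise ValueError("truncated string body")
    return buf[start:start + length].decode("utf-8"), padded_end
```

Because the limit is a parameter rather than a compile-time constant, a negotiated value could in principle be passed in per connection.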
+</p>
+
+<h3><a name="securityvalidate">Data validation</a></h3>
+
+<p>
+ It is important that all data be fully validated before performing
+ any actions based on the data. When reading an RPC packet, the
+ first four bytes must be read and the max packet size limit validated,
+ before any attempt is made to read the variable length packet data.
+ After a complete packet has been read, the header must be decoded
+ and all 6 fields fully validated, before attempting to dispatch
+ the payload. Once dispatched, the payload can be decoded and passed
+ onto the appropriate API for execution. The RPC code must not take
+ any action based on the payload, since it has no way to validate
+ the semantics of the payload data. It must delegate this to the
+ execution API (eg corresponding libvirt public API).
s/eg/e.g./
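The ordering described in that paragraph can be sketched as follows. This is an illustrative Python sketch, not libvirt code, and it rests on several assumptions that are not in the quoted text: that the length word counts itself, that the six header fields are program, version, procedure, type, serial and status in that order with the signedness shown, and that the size cap has the value used here.

```python
import io
import struct

MAX_MESSAGE = 4 * 1024 * 1024       # hypothetical cap on one packet
LEN_WORD = 4                        # size of the length prefix
# Assumed order: program, version, procedure, type, serial, status.
HEADER = struct.Struct(">IIiiIi")

def read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes (a real socket would need a retry loop)."""
    data = stream.read(n)
    if len(data) != n:
        raise ValueError("short read")
    return data

def read_packet(stream, max_message: int = MAX_MESSAGE):
    """Return (header fields, undecoded payload) for one packet."""
    (length,) = struct.unpack(">I", read_exact(stream, LEN_WORD))
    # Validate the length before reading any variable-length data.
    if length < LEN_WORD + HEADER.size or length > max_message:
        raise ValueError(f"bad packet length {length}")
    body = read_exact(stream, length - LEN_WORD)
    header = HEADER.unpack_from(body)
    # The payload stays opaque here; its semantics are for the
    # execution API to check, not the RPC layer.
    return header, body[HEADER.size:]
```

A caller would then validate each of the six header fields against what it expects before dispatching the payload.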
+
+<dt><code>virNetSocketPtr</code> (virnetsocket.h)</dt>
+<dd>The virNetSocket APIs provide a higher level wrapper around
+ the raw BSD sockets and getaddrinfo APIs. They allow for creation
+ of both server and client sockets. Data transports supported are
+ TCP, UNIX, SSH tunnel or external command tunnel. Internally the
+ TCP socket impl uses the getaddrinfo info APIs to ensure correct
+ protocol independant behaviour, thus supporting both IPv4 and IPv6.
s/protocol independant/protocol-independent/ (add hyphen and fix spelling)
+ The socket APIs can be associated with a virNetSASLSessionPtr or
+ virNetTLSSessionPtr object to allow seemless encryption/decryption
s/seemless/seamless/
+
+<dt><code>virNetServerMDNSPtr</code> (virnetservermdns.h)</dt>
+<dd>The virNetServerMDNS APIs are used to advertize a server
Don't know if you want US vs. UK spelling (advertize vs. advertise) in
the documentation...
+ across the local network, enabling clients to automatically
+ detect the existance of remote services. This is done by
s/existance/existence/
+ interfacing with the Avahi mDNS advertisement service.
...but at least be consistent in which one you pick.
+<h3><a name="apiclientdispatch">Client RPC dispatch</a></h3>
+
+<p>
+ The client RPC code must allow for multiple overlapping RPC method
+ calls to be invoked, transmission& receipt of data for mutliple
s/mutliple/multiple/
Also, I prefer s/&/and/ when writing documentation prose; the
resulting & looks like we're taking too many shortcuts.
+ streams and receipt of asynchronous events. Understandably this
+ involves coordination of multiple threads.
+</p>
+
+<p>
+ The core requirement in the client dispatch code is that only
+ one thread is allowed to be performing I/O on the socket at
+ any time. This thread is said to be "holding the buck". When
+ any other thread comes along and needs todo I/O it must place
s/todo/to do/
+<p>
+ The main libvirt event loop thread is responsible for performing all
+ socket I/O. It will read incoming packets from clients and willl
+ transmit outgoing packets to clients. It will handle the I/O to/from
+ streams associated with client API calls. When doing client I/O it
+ will also take pass the data through any applicable encryption layer
s/take //
+<p>
+ The server has a pool of worker threads, which wait for method call
+ packets to be queued. One of them will grab the new method call off
+ the queue for processing. The first step is to decode the payload of
+ the packet to extract the method call arguments. The worker does not
+ attempt todo any semantic validation of the arguments, except to make
s/todo/to do/
+ sure the size of any variable length fields is below defined limits.
+</p>
+
+<p>
+ The worker now invokes the libvirt API call that corresponds to the
+ procedure number in the packet header. The worker is thus kept busy
+ until the API call completes. The implemementation of the API call
s/implemementation/implementation/
+ is responsible for doing semantic validation of parameters and any
+ MAC security checks on the objects affected.
+</p>
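For readers who want a picture of the worker flow in the two paragraphs above, here is a small Python sketch, not libvirt code. The queue layout, the argument-count limit and all names are hypothetical; the worker checks only a size limit and leaves semantic checks to the handler it invokes.

```python
import queue
import threading

MAX_ARGS = 16  # hypothetical limit on the number of arguments

def worker(jobs, handlers, replies):
    """Take queued calls, check size limits only, invoke the handler."""
    while True:
        job = jobs.get()
        if job is None:          # sentinel: no more work for this thread
            return
        procedure, args = job
        if len(args) > MAX_ARGS:
            replies.put((procedure, "error", "too many arguments"))
            continue
        try:
            # Semantic validation is the handler's job, not the worker's.
            replies.put((procedure, "ok", handlers[procedure](*args)))
        except Exception as exc:
            replies.put((procedure, "error", str(exc)))

def run_pool(calls, handlers, n_workers=2):
    """Run the calls through a pool; return replies sorted by procedure."""
    jobs, replies = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(jobs, handlers, replies))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for call in calls:
        jobs.put(call)
    for _ in threads:
        jobs.put(None)           # one sentinel per worker
    for t in threads:
        t.join()
    out = []
    while not replies.empty():
        out.append(replies.get())
    return sorted(out, key=lambda reply: reply[0])
```

Each worker stays busy for the duration of its handler call, which matches the description of a worker being occupied until the API call completes.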
+
+<p>
+ Once the API call has completed, the worker thread will take the
+ return value and output parameters, or error object and encode
+ them into a reply packet. Again it does not attempt todo any
s/todo/to do/
--
Eric Blake eblake(a)redhat.com +1-801-349-2682
Libvirt virtualization library
http://libvirt.org