On Fri, Mar 20, 2009 at 11:45:06AM +0100, Daniel Veillard wrote:
> Okay I have tried to think again about this, from the code fragment
> before and discussions on IRC: while performance is tolerable, there
> are a lot of costs related to the 64KB chunking imposed by the XML-RPC.

This isn't XML-RPC. This is our own binary protocol using XDR encoding,
which has very little metadata overhead - just a single 24 byte header
per 64kb of data. The payload is the raw binary encoding.
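
To put a number on that overhead, here is a rough sketch. It assumes
"64kb" means 64 KiB of payload per message and that the 24 byte header
is sent in addition to the payload; the function name is made up for
the illustration.

```python
HEADER_BYTES = 24          # per-message header size quoted above
CHUNK_BYTES = 64 * 1024    # assumption: 64 KiB of payload per message


def framing_overhead(payload_bytes, chunk=CHUNK_BYTES, header=HEADER_BYTES):
    """Return (messages, header_bytes, overhead_fraction) for one transfer."""
    messages = -(-payload_bytes // chunk)   # ceiling division
    header_total = messages * header
    return messages, header_total, header_total / payload_bytes


# Example: 1 GiB of guest memory
msgs, hdr, frac = framing_overhead(1024 ** 3)
print(f"{msgs} messages, {hdr} header bytes, {frac:.4%} overhead")
```

Under those assumptions the header costs well under 0.04% of the bytes
on the wire.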
The biggest problem is that this is using our REMOTE_CALL message type.
Thus for every 64 KB sent, the client has to wait for an empty
REMOTE_REPLY message to be returned before it can send another 64 KB.
This synchronous round trip per chunk will really kill throughput,
since the link sits idle while the client waits for each reply.
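
A back-of-the-envelope model of that cost, as a sketch only: it ignores
the size of the reply, encryption time and processing time on both
ends, and the link speed and round-trip time in the example are made-up
figures rather than measurements.

```python
def stop_and_wait_throughput(chunk_bytes, rtt_s, link_bytes_per_s):
    """Upper bound on bytes/s when every chunk must be acknowledged
    before the next one may be sent."""
    transmit_s = chunk_bytes / link_bytes_per_s   # time to push one chunk out
    return chunk_bytes / (transmit_s + rtt_s)     # one chunk per send + wait


# Hypothetical figures: 64 KiB chunks, 1 ms round trip, 1 Gbit/s link
link = 125_000_000
rate = stop_and_wait_throughput(64 * 1024, 0.001, link)
print(f"{rate / link:.0%} of the link is usable")
```

With those example figures only about a third of the link is usable,
and the fraction drops further as the round-trip time grows.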

> It is probably acceptable for a class of users who really want
> encryption of their data but I would like to make sure we don't close
> the door for a possibly more performant implementation.
> Trying to reopen a bit the discussion we had before on opening a
> separate encrypted connection, this would have a number of potential
> improvements over the XML-RPC:
>  - no chunking, far less context-switching (it would be good to know
>    how much of the excess time spent in the secure migration is data
>    encoding, how much is overall system burden)

The chunking imposed by our RPC layer isn't really a problem IMHO,
because you have chunking in the encryption APIs too - GNUTLS and
SASL will both encrypt in blocks. There is probably some tuning we
can do to optimize the size of chunks we encrypt, but ultimately
we will always have chunking of the data no matter what protocol
we're sending over.
The context switching between threads is a bit of an annoyance.
We could adapt the libvirtd daemon, so that when it gets a migration
message, the worker thread takes over all responsibility for I/O on
that connection, instead of delegating back to the main thread. That
probably would not be all that hard - we'd just never re-enable the
'rx' queue for that 'struct qemud_client', so the main thread will
ignore that client forever more.
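
A toy model of that hand-over, to show the idea rather than the real
daemon code: the Client class, its fields and both helper functions are
invented for this sketch and do not mirror 'struct qemud_client'.

```python
from dataclasses import dataclass


@dataclass
class Client:
    name: str
    rx_enabled: bool = True   # main thread reads this client only while True
    owner: str = "main"       # which thread currently does the I/O


def clients_to_poll(clients):
    """Clients the main I/O thread should still watch for incoming data."""
    return [c for c in clients if c.rx_enabled]


def hand_over_to_worker(client, worker_name):
    """Give a worker exclusive ownership; rx is never re-enabled afterwards."""
    client.rx_enabled = False
    client.owner = worker_name


clients = [Client("mgmt"), Client("migration")]
hand_over_to_worker(clients[1], "worker-1")
print([c.name for c in clients_to_poll(clients)])
```

Once handed over, the migration client simply drops out of the set the
main thread polls, so no per-message thread switch is needed.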

>  - seems to me a more standard encrypted TCP/IP connection would be
>    more likely to be able to reuse crypto hardware if/when it becomes
>    available.

Use of crypto hardware is something that's a hidden impl detail in
the TLS/SASL libraries, so that shouldn't be any problem for us.
I think it is practical to use our existing RPC channel, but work
on eliminating the round-trip. Our protocol already allows for a
fire-and-forget message type with no round-trip reply. This is the
"REMOTE_MESSAGE" type. Currently that is only used for messages
sent from the server back to the client. A message of this type sent
from the client to the server is rejected.
One idea is to allow for a generic 'data stream' over the RPC
channel.
- Source libvirtd connects to dest libvirtd and sends
  a 'MIGRATION_INCOMING' message. This is a normal
  REMOTE_CALL + REMOTE_REPLY synchronous call. The intent
  of this is to let the source tell the destination that
  it wants to start a migration operation. Upon receiving
  this message, the remote libvirtd would decide whether it
  wants to allow this. If so, then it can switch the TCP
  channel to 'data stream' mode.
- Source libvirtd now sends a series of "MIGRATION_DATA"
  messages. These are of REMOTE_MESSAGE type, i.e. fire & forget
  with no reply. The source sends these as fast as it likes,
  until complete. They are all processed directly by the
  worker thread in libvirtd, and not the main I/O thread.
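
The two steps above can be sketched as a small state machine. This is
only an illustration of the proposal: the class, the method names and
the string reply values are invented here, and the sketch leaves out
how the end of the stream is signalled, since that is not settled yet.

```python
from enum import Enum


class MsgType(Enum):
    CALL = "REMOTE_CALL"        # sender waits for a REMOTE_REPLY
    REPLY = "REMOTE_REPLY"
    MESSAGE = "REMOTE_MESSAGE"  # fire & forget, no reply


class DestConnection:
    """Toy model of the destination side of one client connection."""

    def __init__(self, allow_migration=True):
        self.allow_migration = allow_migration
        self.stream_mode = False
        self.received = bytearray()

    def handle(self, msg_type, proc, payload=b""):
        """Process one message; return a reply tuple, or None for no reply."""
        if msg_type is MsgType.CALL and proc == "MIGRATION_INCOMING":
            if not self.allow_migration:
                return (MsgType.REPLY, "error")
            self.stream_mode = True          # switch to 'data stream' mode
            return (MsgType.REPLY, "ok")
        if msg_type is MsgType.MESSAGE and proc == "MIGRATION_DATA":
            if not self.stream_mode:
                raise ValueError("client message rejected outside stream mode")
            self.received += payload
            return None                      # nothing is sent back
        raise ValueError(f"unexpected {msg_type.value} / {proc}")


def migrate(conn, data, chunk=64 * 1024):
    """Send data over conn; return how many replies the source waited for."""
    reply = conn.handle(MsgType.CALL, "MIGRATION_INCOMING")
    if reply[1] != "ok":
        raise RuntimeError("destination refused the migration")
    for off in range(0, len(data), chunk):
        conn.handle(MsgType.MESSAGE, "MIGRATION_DATA", data[off:off + chunk])
    return 1  # only the initial call is a synchronous round trip


conn = DestConnection()
print(migrate(conn, bytes(256 * 1024)), len(conn.received))
```

However many chunks are sent, the source waits for exactly one reply,
the one that answers MIGRATION_INCOMING.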
This avoids any round-trip delays, avoids context switches between
threads, has only a few bytes of overhead per 64 KB of data sent,
and avoids the need to open more ports.
Or we could generalize this a little more and instead of calling the
message MIGRATION_DATA, just have a "STREAM_DATA" message type, as
a generic & efficient way of sending huge chunks of data over the
RPC service. You could imagine it being useful for some of our other
APIs, such as virDomain{Memory,Block}Peek.
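
As a rough sketch of what generic stream framing could look like: the
header layout below (six 32-bit big-endian words, 24 bytes in total),
the field meanings and the STREAM_DATA code are all hypothetical and
chosen only to match the header size mentioned earlier; they are not
the actual wire format.

```python
import struct

HEADER = struct.Struct(">6I")   # six 32-bit words = 24 bytes (illustrative)
MAX_PAYLOAD = 64 * 1024         # assumption: up to 64 KiB of payload per frame
STREAM_DATA = 2                 # hypothetical type code for this sketch


def encode_stream(data, stream_id):
    """Yield header+payload frames that cover data, in order."""
    for seq, off in enumerate(range(0, len(data), MAX_PAYLOAD)):
        payload = data[off:off + MAX_PAYLOAD]
        # fields: type, stream id, sequence number, length, two reserved words
        yield HEADER.pack(STREAM_DATA, stream_id, seq, len(payload), 0, 0) + payload


def decode_stream(frames):
    """Reassemble the original bytes, checking type and frame order."""
    out = bytearray()
    for expected_seq, frame in enumerate(frames):
        msg_type, _sid, seq, length, _r1, _r2 = HEADER.unpack_from(frame)
        if msg_type != STREAM_DATA or seq != expected_seq:
            raise ValueError("unexpected frame")
        out += frame[HEADER.size:HEADER.size + length]
    return bytes(out)


blob = bytes(range(256)) * 800              # 204800 bytes of sample data
frames = list(encode_stream(blob, stream_id=7))
print(len(frames), decode_stream(frames) == blob)
```

Nothing in the framing is specific to migration, which is the point of
making the message type generic.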
Daniel
--
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|