On Mon, Aug 08, 2011 at 08:29:51AM -0500, Anthony Liguori wrote:
> On 08/08/2011 03:42 AM, Shribman, Aidan wrote:
> >Subject: [PATCH v4] XBZRLE delta for live migration of large memory apps
> >From: Aidan Shribman<aidan.shribman(a)sap.com>
> >
> >By using XBZRLE (Xor Binary Zero Run-Length-Encoding) we can reduce VM downtime
> >and total live-migration time of VMs running memory write intensive workloads
> >typical of large enterprise applications such as SAP ERP Systems, and generally
> >speaking for any application with a sparse memory update pattern.
>
> [snip]
>
> One thing that strikes me about this algorithm is that it's very
> good for a particular type of workload--shockingly good really.
>
> I think workload aware migration compression is possible for a lot
> of different types of workloads. That makes me a bit wary of QEMU
> growing quite a lot of compression mechanisms.
>
> It makes me think that this logic may really belong at a higher
> level where more information is known about the workload. For
> instance, I can imagine XBZRLE living in something like libvirt.
>
> Today, parsing migration traffic is pretty horrible but I think
> we're pretty strongly committed to fixing that in 1.0. That makes
> me wonder if it would be nicer architecturally for a higher level
> tool to own something like this.
>
> Originally, when I added migration, I had the view that we would
> have transport plugins based on the exec: protocol. That hasn't
> really happened since libvirt really owns migration but I think
> having XBZRLE as a transport plugin for libvirt is something worth
> considering.

NB I've not been much of a fan of the exec: migration code, since it
has proved rather buggy in practice when we used it for 'save/restore
to/from file' support. It has been hard to diagnose when things go
wrong, and difficult for QEMU to report any useful error messages.
Even with the tcp: protocol, QEMU is seemingly unable to provide any
useful error reporting, even for something as simple as "unable to
connect to remote host". So with one exception, current libvirt now uses the
'fd:' protocol for everything, and the last exception will be removed
soon too.

> I'm curious what people think about this type of approach. CC'ing
> libvirt to get their input.

In "normal" migration though, even when using fd:, we don't make
any attempt to touch the data stream. We just pass a pre-connected
TCP socket into QEMU and let it write directly to it. This avoids
extra data copying via libvirt.
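
To illustrate the idea of handing over a pre-connected descriptor, here is a minimal Python sketch (needs Python 3.9+ for socket.send_fds). It is not libvirt or QEMU code: the function names are hypothetical, a UNIX socketpair stands in for the control channel, and a second socketpair stands in for the TCP connection to the destination host.

```python
import os
import socket


def hand_over_fd(control: socket.socket, fd: int) -> None:
    """Manager side (hypothetical name): send an open descriptor over a
    UNIX control socket using SCM_RIGHTS."""
    socket.send_fds(control, [b"migrate-fd"], [fd])


def receive_fd(control: socket.socket) -> int:
    """Hypervisor side (hypothetical name): receive that descriptor."""
    _msg, fds, _flags, _addr = socket.recv_fds(control, 64, 1)
    return fds[0]


def demo() -> bytes:
    # Control channel: stand-in for whatever channel carries the handoff.
    mgr_ctl, hv_ctl = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    # Data channel: stand-in for the pre-connected TCP socket.
    data_src, data_dst = socket.socketpair()
    try:
        hand_over_fd(mgr_ctl, data_src.fileno())
        fd = receive_fd(hv_ctl)
        # The receiver writes straight to the descriptor; the manager
        # never sees or copies the payload.
        os.write(fd, b"guest RAM page")
        os.close(fd)
        return data_dst.recv(64)
    finally:
        for s in (mgr_ctl, hv_ctl, data_src, data_dst):
            s.close()
```

The point of the sketch is only that the payload travels from the writer to the peer without passing through the process that set up the connection.
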
In our alternative "tunnelled" migration mode, libvirt does touch
the data stream, passing a pipe FD into QEMU, and copying the data
from the pipe into packets to be sent over libvirtd's existing
secure RPC stream, and then copying it back to QEMU on the destination.
The downside here is that we've added several extra data copies.
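
A rough Python sketch of where the extra copies in such a relay come from. This is an illustration only: the 4-byte length-prefixed framing, the chunk size, and the function names are assumptions of mine and are not libvirt's RPC format.

```python
import os
import struct

CHUNK = 4096  # arbitrary payload size per packet, chosen for this sketch


def pipe_to_packets(read_fd: int):
    """Source relay: read the pipe and wrap each chunk in a packet."""
    while True:
        data = os.read(read_fd, CHUNK)             # copy 1: pipe -> relay buffer
        if not data:
            break
        yield struct.pack("!I", len(data)) + data  # copy 2: buffer -> packet


def packets_to_pipe(packets, write_fd: int) -> None:
    """Destination relay: unwrap each packet and write it to the pipe."""
    for pkt in packets:
        (length,) = struct.unpack("!I", pkt[:4])
        view = memoryview(pkt[4:4 + length])       # copy 3: packet -> payload
        while view:                                # os.write may be partial
            written = os.write(write_fd, view)     # copy 4: payload -> pipe
            view = view[written:]


def demo(payload: bytes) -> bytes:
    # Everything runs in one thread, so the payload must fit in the
    # kernel pipe buffer (keep it well under 64 KiB).
    src_r, src_w = os.pipe()
    dst_r, dst_w = os.pipe()
    os.write(src_w, payload)
    os.close(src_w)
    packets_to_pipe(pipe_to_packets(src_r), dst_w)
    os.close(src_r)
    os.close(dst_w)
    out = b""
    while True:
        chunk = os.read(dst_r, CHUNK)
        if not chunk:
            break
        out += chunk
    os.close(dst_r)
    return out
```

Compared with writing directly to a socket, every byte here is handled several more times in user space, which is the cost being described.
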
In our "save/restore to file" code, we use 'fd:' and always have
to send the data via a filter program. For example, we have the
ability to compress/decompress data via gzip, bzip2, xz, and lzop,
for which we instead pass QEMU a pipe FD connected to the external
compression helper program. We also have another new option where we send data
via another I/O helper program that uses O_DIRECT, so save/restore
does not pollute the page cache.
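
The filter-helper arrangement can be sketched in Python as below. This is a minimal illustration under stated assumptions: the helper is a small inline Python gzip filter rather than the real external gzip/lzop program, save_via_filter is a hypothetical name, and the parent writes into the pipe itself where, in the real setup, the write end would be handed to QEMU via 'fd:'.

```python
import gzip
import os
import subprocess
import sys
import tempfile

# Stand-in for an external compression helper: stdin -> gzip -> stdout.
HELPER = """
import gzip, shutil, sys
with gzip.GzipFile(fileobj=sys.stdout.buffer, mode="wb") as out:
    shutil.copyfileobj(sys.stdin.buffer, out)
"""


def save_via_filter(state: bytes, path: str) -> None:
    """Write 'state' to 'path' through the helper process."""
    with open(path, "wb") as image:
        helper = subprocess.Popen(
            [sys.executable, "-c", HELPER],
            stdin=subprocess.PIPE,   # write end plays the role of the fd: target
            stdout=image,            # helper writes the compressed file
        )
        helper.stdin.write(state)
        helper.stdin.close()         # EOF tells the helper to finish
        if helper.wait() != 0:
            raise RuntimeError("filter helper failed")


def demo() -> bool:
    state = b"\0" * 65536 + b"dirty page"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "guest.save")
        save_via_filter(state, path)
        with gzip.open(path, "rb") as f:
            return f.read() == state
```

Swapping the compression scheme only means swapping the helper command, which is why adding another filter would fit the existing pattern.
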
With this kind of existing precedent, I won't strongly argue against
libvirt adding a filter to support this XBZRLE encoding scheme for
migration, or indeed save/restore too, if it proves better than
lzop which is our current optimal speed/compression winner.
My main concern with all these scenarios where libvirt touches the
actual data stream though is that we're introducing extra data copies
into the migration path which potentially waste CPU cycles.
If QEMU can directly XBZRLE encode data into the FD passed via 'fd:'
then we minimize data copies. Whether this is a big enough benefit
to offset the burden of having to maintain various compression code
options in QEMU I can't answer.
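
For readers on the libvirt side who have not looked at the patch, the general XOR plus zero-run-length idea can be sketched in Python as follows. This is a toy illustration based only on the name and the patch description quoted above: the (zero_run_length, changed_bytes) output format and the function names are my own simplification and not the encoding the patch uses.

```python
def xbzrle_encode(old: bytes, new: bytes) -> list:
    """XOR two equal-sized pages, then run-length encode the zero runs.

    Returns [(zero_run_length, xor_bytes), ...] in a toy format.
    """
    if len(old) != len(new):
        raise ValueError("pages must be the same size")
    delta = bytes(a ^ b for a, b in zip(old, new))
    out = []
    i, n = 0, len(delta)
    while i < n:
        start = i
        while i < n and delta[i] == 0:   # unchanged bytes XOR to zero
            i += 1
        zeros = i - start
        start = i
        while i < n and delta[i] != 0:   # changed bytes
            i += 1
        out.append((zeros, delta[start:i]))
    return out


def xbzrle_decode(old: bytes, encoded: list) -> bytes:
    """Rebuild the new page from the old page and the encoded delta."""
    page = bytearray(old)
    pos = 0
    for zeros, changed in encoded:
        pos += zeros                     # skip the unchanged run
        for k, x in enumerate(changed):
            page[pos + k] ^= x           # XOR restores the new value
        pos += len(changed)
    return bytes(page)
```

With a 4096-byte page in which only five bytes changed, the encoded form holds three short entries instead of the whole page, which is why the scheme suits sparse update patterns. Note that the receiver needs its copy of the old page to decode.
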
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|