On Wed, Jul 09, 2008 at 08:26:47PM +0100, Richard W.M. Jones wrote:
> The kernel images that I want to snoop in virt-mem are around 16 MB
> in size.  In the qemu / KVM case, these images have to travel over
> the remote connection.  Because of limits on the maximum message
> size, they currently have to travel in 64 KB chunks, and it turns
> out that this is slow.  Apparently the dominating factors are the
> large constant overhead of issuing each 'memsave' command in the
> qemu monitor, and the extra network round trips.
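For illustration, the transfer pattern being described is a simple chunked
loop over a peek-style call.  This sketch assumes an API along the lines of
virDomainMemoryPeek; the call, flag and constant names here are illustrative,
not necessarily what virt-mem actually uses:

  #include <stddef.h>
  #include <libvirt/libvirt.h>

  #define CHUNK (64 * 1024)          /* the per-message size being hit */

  /* Read 'size' bytes of guest memory starting at 'start', one 64 KB
   * request at a time.  Every iteration is a separate remote round
   * trip plus a separate 'memsave' in the monitor, which is where the
   * time goes. */
  static int
  read_guest_mem (virDomainPtr dom, unsigned long long start,
                  size_t size, char *buf)
  {
      size_t done = 0;

      while (done < size) {
          size_t n = size - done > CHUNK ? CHUNK : size - done;

          if (virDomainMemoryPeek (dom, start + done, n,
                                   buf + done, VIR_MEMORY_PHYSICAL) < 0)
              return -1;
          done += n;
      }
      return 0;
  }

At 64 KB per round trip, a 16 MB image needs 256 such commands, so any fixed
per-command overhead is multiplied 256 times.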
> The current remote message size is intentionally limited to 256 KB
> (fully serialized, including all XDR headers and overhead), so the
> most we could practically send in a single message at the moment is
> 128 KB if we stick to powers of two, or ~255 KB if we don't.
> The reason we limit it is to avoid denial of service attacks, where
> a rogue client or server sends excessively large messages and causes
> the peer to allocate lots of memory [eg. if we didn't have any limit,
> then you could send a message which was several GB in size and cause
> problems at the other end, because the message is slurped in before
> it is fully parsed].
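To make that concrete, the usual defence is a hard cap that is checked
before anything is allocated.  A minimal sketch, with made-up names and a
read_full() helper of the kind sketched further down in this mail (this is
not the actual remote driver code):

  #include <stdint.h>
  #include <stdlib.h>
  #include <arpa/inet.h>

  #define MSG_MAX (256 * 1024)   /* cf. the 256 KB limit above */

  /* Assumed helper: loops until exactly 'len' bytes have been read,
   * returning -1 on error or premature EOF (see sketch further down). */
  extern int read_full (int fd, char *buf, size_t len);

  /* Read a 4-byte length prefix, refuse anything over the cap, then
   * allocate and read exactly that many bytes.  Without the cap a
   * rogue peer could claim a multi-GB length and force the other end
   * to try to allocate it. */
  static char *
  read_bounded_message (int fd, uint32_t *lenp)
  {
      uint32_t netlen, len;
      char *buf;

      if (read_full (fd, (char *) &netlen, sizeof netlen) < 0)
          return NULL;
      len = ntohl (netlen);

      if (len > MSG_MAX)               /* reject before allocating */
          return NULL;

      if ((buf = malloc (len)) == NULL)
          return NULL;
      if (read_full (fd, buf, len) < 0) {
          free (buf);
          return NULL;
      }
      *lenp = len;
      return buf;
  }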
> There is a second problem with reading the kernel in small chunks,
> namely that this allows the virtual machine to make a lot of
> progress, so we don't get anything near an 'instantaneous' snapshot
> (getting the kernel in a single chunk doesn't necessarily guarantee
> this either, but it's better).
> As an experiment, I tried increasing the maximum message size to
> 32 MB, so that I could send the whole kernel in one go.
> Unfortunately just increasing the limit doesn't work, for two
> reasons, one prosaic and one very weird:
>
> (1) The current code likes to keep message buffers on the stack, and
> because Linux limits the stack to something artificially small, this
> fails.  Increasing the stack ulimit is a short-term fix for this
> while testing.  In the long term we could rewrite any code which
> does this to use heap buffers instead.
Yeah we should fix this. I've had a patch for refactoring the main
dispatch method pending for quite a while, which dramatically reduces
stack usage.
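Purely as an illustration of the prosaic fix (this is not the pending
patch, and the names are made up), the change is just moving the message
buffer off the stack:

  #include <stdlib.h>

  #define REMOTE_MESSAGE_MAX (32 * 1024 * 1024)   /* hypothetical 32 MB limit */

  /* Before: a buffer like
   *
   *     char buffer[REMOTE_MESSAGE_MAX];
   *
   * inside the dispatch function blows the default stack ulimit once
   * the limit grows to tens of MB.  After: allocate it on the heap for
   * the duration of the call. */
  static int
  dispatch_call (int fd)
  {
      char *buffer = malloc (REMOTE_MESSAGE_MAX);

      if (buffer == NULL)
          return -1;

      /* ... deserialize the request from 'fd' into 'buffer',
       *     handle it, serialize the reply ... */
      (void) fd;

      free (buffer);
      return 0;
  }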
> (2) There is some really odd problem with our use of recv(2) which
> causes messages > 64 KB to fail.  I have no idea what is really
> happening, but the sequence of events seems to be this:
>
>     server                               client
>
>     write(sock, buf, len) = len-k
>                                          recv(sock, buf, len) = len-k
>     write(sock, buf+len-k, k) = k
>                                          recv(sock, buf, k) = 0  [NOT k]
Bizarre.  The docs quite clearly say:

  These calls return the number of bytes received, or -1 if an error
  occurred.  The return value will be 0 when the peer has performed an
  orderly shutdown.

So it's clearly thinking there's a shutdown here.
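For reference, the receiving side is essentially the usual
loop-until-complete pattern; in a sketch like the following (illustrative,
not the actual remote driver code), the unexpected 0 in the trace above gets
treated as a premature EOF even though k bytes are still owed:

  #include <sys/types.h>
  #include <sys/socket.h>
  #include <errno.h>

  /* Read exactly 'len' bytes from 'fd', retrying over partial reads.
   * Returns 0 on success, -1 on error or premature EOF. */
  static int
  read_full (int fd, char *buf, size_t len)
  {
      while (len > 0) {
          ssize_t n = recv (fd, buf, len, 0);

          if (n < 0) {
              if (errno == EINTR)
                  continue;       /* interrupted, try again */
              return -1;          /* real error */
          }
          if (n == 0)
              return -1;          /* peer (apparently) shut down early */

          buf += n;
          len -= n;
      }
      return 0;
  }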
Were you doing this over the UNIX socket, or the TCP one?  If the latter,
then you might want to turn off all authentication and use the TCP socket,
to ensure none of the encryption routines are in use.
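For that kind of test, something roughly like this should do it (paths and
option names from memory, so adjust as needed):

  # /etc/libvirt/libvirtd.conf
  listen_tcp = 1
  auth_tcp = "none"

  # run libvirtd with --listen, then connect over plain TCP, eg.
  virsh -c qemu+tcp://localhost/system list

The qemu+tcp:// transport skips the TLS layer entirely, so none of the
encryption routines get involved.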
Daniel
--
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-        http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|