Hi everyone,
On Thu, Nov 06, 2014 at 09:18:04AM +0200, Cristian Klein wrote:
> I talked to the qemu post-copy guys (Andrea and Dave in CC). Starting
> post-copy immediately is a bad performance choice: the VM will start
> on the destination hypervisor before the read-only or kernel memory
> is there. This means that those pages need to be pulled on demand,
> hence a lot of overhead and interruptions in the VM's execution.
> Instead, it is better to first do one pass of pre-copy and only then
> trigger post-copy. In fact, I did an experiment with a video
> streaming VM, and starting post-copy after the first pass of pre-copy
> (instead of starting post-copy immediately) reduced downtime from
> 3.5 seconds to under 1 second.
>
> Given all the above, I propose the following post-copy API in libvirt:
>
>   virDomainMigrateXXX(..., VIR_MIGRATE_ENABLE_POSTCOPY)
>   virDomainMigrateStartPostCopy(...)  // from a different thread
>
> This is for those who just need the post-copy mechanism and want to
> implement a policy themselves.
>
>   virDomainMigrateXXX(..., VIR_MIGRATE_POSTCOPY_AFTER_PRECOPY)
>
> This is for those who want to use post-copy without caring about any
> low-level details, offering a good-enough policy for most cases.
>
> What do you think? Would you accept patches that implement this API?
I agree. Even better would be to also pass a parameter specifying how
many passes of pre-copy to run before engaging post-copy. Adding at
least the number-of-passes parameter shouldn't be a huge change, and
when in doubt you can just define it to 1.
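To make this concrete, here is a rough, untested sketch of how a
management application could drive the proposed calls.
VIR_MIGRATE_ENABLE_POSTCOPY and virDomainMigrateStartPostCopy() are
only the names proposed above (they don't exist in libvirt yet);
everything else is the current libvirt C API:

#include <libvirt/libvirt.h>
#include <pthread.h>
#include <unistd.h>

/* Runs in a separate thread: this is where the caller's own policy
 * lives, e.g. "after one pass of pre-copy" or a plain timeout. */
static void *postcopy_policy(void *arg)
{
    virDomainPtr dom = arg;

    sleep(5);                               /* placeholder policy */
    virDomainMigrateStartPostCopy(dom, 0);  /* proposed call */
    return NULL;
}

int migrate_with_postcopy(virDomainPtr dom, const char *dsturi)
{
    pthread_t tid;
    int ret;

    pthread_create(&tid, NULL, postcopy_policy, dom);

    /* The flag only enables the post-copy mechanism; the actual
     * switch-over is triggered from the other thread above. */
    ret = virDomainMigrateToURI(dom, dsturi,
                                VIR_MIGRATE_LIVE |
                                VIR_MIGRATE_ENABLE_POSTCOPY, /* proposed */
                                NULL, 0);
    pthread_join(tid, NULL);
    return ret;
}

The point of splitting enable/start is that the switch-over policy
(timeout, one pre-copy pass, dirty-rate threshold, ...) stays entirely
in the caller's hands.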
The other things needed are:
1) adding an event so that libvirt doesn't have to poll to know when
   the source node has been stopped (if post-copy was engaged,
   pre-copy may not have finished, but the problem remains the same
   as with pure pre-copy: we need to know efficiently when exactly
   the source node has been stopped, without adding an average 25msec
   of polling latency); see the first sketch below
2) preparing a second socket for qemu so we can serve the out-of-band
   requests of post-copy without incurring the artificial latency
   created by the socket send buffer being kept full by the
   background transfer (the hack of decreasing
   /proc/sys/net/ipv4/tcp_wmem helps tremendously, but it's unlikely
   to ever be as efficient as having two sockets, potentially both
   running over openssl etc.); see the second sketch below
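On point 1, the missing piece is on the qemu -> libvirt path (today
libvirt learns the migration status by polling qemu). I'd expect the
notification to then surface to applications through libvirt's
existing lifecycle event machinery; a minimal sketch of the consuming
side, assuming the event gets wired all the way through:

#include <libvirt/libvirt.h>

/* Fires as soon as libvirt learns the source vCPUs were stopped for
 * migration: no polling, no average 25msec of added latency. */
static int lifecycle_cb(virConnectPtr conn, virDomainPtr dom,
                        int event, int detail, void *opaque)
{
    if (event == VIR_DOMAIN_EVENT_SUSPENDED &&
        detail == VIR_DOMAIN_EVENT_SUSPENDED_MIGRATED) {
        /* source node has been stopped: react here */
    }
    return 0;
}

/* virEventRegisterDefaultImpl() must have been called before opening
 * the connection, and some thread has to keep running
 * virEventRunDefaultImpl() for the callback to be delivered. */
void watch_source_stop(virConnectPtr conn, virDomainPtr dom)
{
    virConnectDomainEventRegisterAny(conn, dom,
            VIR_DOMAIN_EVENT_ID_LIFECYCLE,
            VIR_DOMAIN_EVENT_CALLBACK(lifecycle_cb),
            NULL, NULL);
}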
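On point 2, the latency is easy to quantify: a post-copy page request
queued behind a full send buffer waits roughly buffer_size /
link_bandwidth before it even hits the wire, e.g. 4MiB of queued
background transfer on a 1Gbit/s link is ~33msec. The tcp_wmem knob
shrinks that window system-wide; the per-socket equivalent (still
only a stopgap, values illustrative) would be:

#include <sys/socket.h>

/* Per-socket analogue of lowering /proc/sys/net/ipv4/tcp_wmem: cap
 * the send buffer so an urgent out-of-band page request can't sit
 * behind megabytes of queued background transfer. A dedicated second
 * socket removes the head-of-line blocking entirely. */
static int cap_sndbuf(int fd)
{
    int sndbuf = 64 * 1024;     /* illustrative value */

    return setsockopt(fd, SOL_SOCKET, SO_SNDBUF,
                      &sndbuf, sizeof(sndbuf));
}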
Thanks,
Andrea