Hi everyone,
On Thu, Nov 06, 2014 at 09:18:04AM +0200, Cristian Klein wrote:
> I talked to the qemu post-copy guys (Andrea and Dave in CC). Starting
> post-copy immediately is a bad performance choice: the VM will start
> on the destination hypervisor before the read-only or kernel memory
> is there. This means that those pages need to be pulled on demand,
> hence a lot of overhead and interruptions in the VM's execution.
> Instead, it is better to first do one pass of pre-copy and only then
> trigger post-copy. In fact, I did an experiment with a video
> streaming VM, and starting post-copy after the first pass of pre-copy
> (instead of starting post-copy immediately) reduced downtime from
> 3.5 seconds to under 1 second.
>
> Given all the above, I propose the following post-copy API in libvirt:
>
>   virDomainMigrateXXX(..., VIR_MIGRATE_ENABLE_POSTCOPY)
>   virDomainMigrateStartPostCopy(...)  // from a different thread
>
> This is for those who just need the post-copy mechanism and want to
> implement a policy themselves.
>
>   virDomainMigrateXXX(..., VIR_MIGRATE_POSTCOPY_AFTER_PRECOPY)
>
> This is for those who want to use post-copy without caring about any
> low-level details, offering a good-enough policy for most cases.
>
> What do you think? Would you accept patches that implement this API?
I agree. Even better would be to also pass a parameter specifying how
many passes of pre-copy to run before engaging post-copy. Adding at
least the number-of-passes parameter shouldn't be a huge change, and
when in doubt you can just define it to 1.
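To make this concrete, here is a rough, untested sketch of how a
management application could drive the proposed calls.
VIR_MIGRATE_ENABLE_POSTCOPY and virDomainMigrateStartPostCopy() are
only the names proposed above (they don't exist in libvirt yet);
everything else is the current libvirt C API:

#include <libvirt/libvirt.h>
#include <pthread.h>
#include <unistd.h>

/* Runs in a separate thread: this is where the caller's own policy
 * lives, e.g. "after one pass of pre-copy" or a plain timeout. */
static void *postcopy_policy(void *arg)
{
    virDomainPtr dom = arg;

    sleep(5);                               /* placeholder policy */
    virDomainMigrateStartPostCopy(dom, 0);  /* proposed call */
    return NULL;
}

int migrate_with_postcopy(virDomainPtr dom, const char *dsturi)
{
    pthread_t tid;
    int ret;

    pthread_create(&tid, NULL, postcopy_policy, dom);

    /* The flag only enables the post-copy mechanism; the actual
     * switch-over is triggered from the other thread above. */
    ret = virDomainMigrateToURI(dom, dsturi,
                                VIR_MIGRATE_LIVE |
                                VIR_MIGRATE_ENABLE_POSTCOPY, /* proposed */
                                NULL, 0);
    pthread_join(tid, NULL);
    return ret;
}

The point of splitting enable/start is that the switch-over policy
(timeout, one pre-copy pass, dirty-rate threshold, ...) stays entirely
in the caller's hands.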
The other things needed are:
1) adding an event so that libvirt doesn't have to poll to know when
   the source node has been stopped (if post-copy was engaged,
   pre-copy may not have finished, but the problem remains the same
   as with pure pre-copy: we need to know efficiently when exactly
   the source node has been stopped, without adding an average 25msec
   of polling latency); see the first sketch below
2) preparing a second socket for qemu so we can serve the out-of-band
   requests of post-copy without incurring the artificial latency
   created by the socket send buffer being kept full by the
   background transfer (the hack of decreasing
   /proc/sys/net/ipv4/tcp_wmem helps tremendously, but it's unlikely
   to ever be as efficient as having two sockets, potentially both
   running over openssl etc.); see the second sketch below
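On point 1, the missing piece is on the qemu -> libvirt path (today
libvirt learns the migration status by polling qemu). I'd expect the
notification to then surface to applications through libvirt's
existing lifecycle event machinery; a minimal sketch of the consuming
side, assuming the event gets wired all the way through:

#include <libvirt/libvirt.h>

/* Fires as soon as libvirt learns the source vCPUs were stopped for
 * migration: no polling, no average 25msec of added latency. */
static int lifecycle_cb(virConnectPtr conn, virDomainPtr dom,
                        int event, int detail, void *opaque)
{
    if (event == VIR_DOMAIN_EVENT_SUSPENDED &&
        detail == VIR_DOMAIN_EVENT_SUSPENDED_MIGRATED) {
        /* source node has been stopped: react here */
    }
    return 0;
}

/* virEventRegisterDefaultImpl() must have been called before opening
 * the connection, and some thread has to keep running
 * virEventRunDefaultImpl() for the callback to be delivered. */
void watch_source_stop(virConnectPtr conn, virDomainPtr dom)
{
    virConnectDomainEventRegisterAny(conn, dom,
            VIR_DOMAIN_EVENT_ID_LIFECYCLE,
            VIR_DOMAIN_EVENT_CALLBACK(lifecycle_cb),
            NULL, NULL);
}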
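On point 2, the latency is easy to quantify: a post-copy page request
queued behind a full send buffer waits roughly buffer_size /
link_bandwidth before it even hits the wire, e.g. 4MiB of queued
background transfer on a 1Gbit/s link is ~33msec. The tcp_wmem knob
shrinks that window system-wide; the per-socket equivalent (still
only a stopgap, values illustrative) would be:

#include <sys/socket.h>

/* Per-socket analogue of lowering /proc/sys/net/ipv4/tcp_wmem: cap
 * the send buffer so an urgent out-of-band page request can't sit
 * behind megabytes of queued background transfer. A dedicated second
 * socket removes the head-of-line blocking entirely. */
static int cap_sndbuf(int fd)
{
    int sndbuf = 64 * 1024;     /* illustrative value */

    return setsockopt(fd, SOL_SOCKET, SO_SNDBUF,
                      &sndbuf, sizeof(sndbuf));
}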
Thanks,
Andrea