Re: [libvirt] [PATCH v2 1/8] Added public API to enable post-copy migration

6 Nov 2014

On 01 Oct 2014, at 12:07 , Jiri Denemark <jdenemar@redhat.com> wrote:
...
On Wed, Oct 01, 2014 at 10:45:33 +0200, Cristian KLEIN wrote:
...
On 2014-09-30 17:16, Daniel P. Berrange wrote:
...
On Tue, Sep 30, 2014 at 05:11:03PM +0200, Jiri Denemark wrote:
...
On Tue, Sep 30, 2014 at 16:39:22 +0200, Cristian Klein wrote:
...
Signed-off-by: Cristian Klein <cristian.klein@cs.umu.se>
---
 include/libvirt/libvirt.h.in | 1 +
 src/libvirt.c                | 7 +++++++
 2 files changed, 8 insertions(+)

diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in
index 5217ab3..82f3aeb 100644
--- a/include/libvirt/libvirt.h.in
+++ b/include/libvirt/libvirt.h.in
@@ -1225,6 +1225,7 @@ typedef enum {
     VIR_MIGRATE_ABORT_ON_ERROR    = (1 << 12), /* abort migration on I/O errors happened during migration */
     VIR_MIGRATE_AUTO_CONVERGE     = (1 << 13), /* force convergence */
     VIR_MIGRATE_RDMA_PIN_ALL      = (1 << 14), /* RDMA memory pinning */
+    VIR_MIGRATE_POSTCOPY          = (1 << 15), /* enable (but don't start) post-copy */
 } virDomainMigrateFlags;
I still think we should add an extra flag to start post-copy
immediately. To address your concern, I don't think such a flag would be
implementing a policy in libvirt. It's for apps that want to make sure
migration converges without having to spawn another thread to monitor
the progress or wait for a timeout. It's a bit like migrating a
paused domain vs. migrating a running domain and pausing it when it
doesn't seem to converge.
Your point about spawning another thread makes me wonder if we should
actually look at adding a 'VIR_MIGRATE_ASYNC' flag (that would require
P2P migration, of course). If this flag were set, virDomainMigrateXXX would
only block long enough to start the migration and then return.
Callers could use the job info API to monitor progress and success/failure.
Then we wouldn't have to keep adding flags like the one you suggest; apps could
simply call the appropriate API right away, with no extra threads needed.
This would make a lot of sense. The user would call:
"""
virDomainMigrateXXX(..., VIR_MIGRATE_POSTCOPY | VIR_MIGRATE_ASYNC)
virDomainMigrateStartPostCopy(...)
"""
Would this be seen as more cumbersome than having a dedicated 
VIR_MIGRATE_POSTCOPY_AUTOSTART?
The ASYNC flag Daniel suggested makes sense, so I guess you can just
ignore my request for a special flag. That said, I don't think the ASYNC
work needs to be done within this series; let's just focus on the
post-copy stuff.
Hi Jirka,

I talked to the QEMU post-copy guys (Andrea and Dave in CC). Starting post-copy immediately is a poor choice performance-wise: the VM would start on the destination hypervisor before its read-only or kernel memory is there, so those pages would have to be pulled on demand, causing a lot of overhead and interruptions in the VM’s execution.

Instead, it is better to first do one pass of pre-copy and only then trigger post-copy. In fact, in an experiment with a video-streaming VM, starting post-copy after the first pass of pre-copy (instead of starting it immediately) reduced downtime from 3.5 seconds to under 1 second.

Given all above, I propose the following post-copy API in libvirt:

virDomainMigrateXXX(..., VIR_MIGRATE_ENABLE_POSTCOPY)
virDomainMigrateStartPostCopy(...) // from a different thread

This is for those who just need the post-copy mechanism and want to implement a policy themselves.


virDomainMigrateXXX(..., VIR_MIGRATE_POSTCOPY_AFTER_PRECOPY)

This is for those who want to use post-copy without caring about low-level details; it offers a policy that is good enough for most cases.

What do you think? Would you accept patches that implement this API?

Cristian