[libvirt] [PATCH 0/6] Post-copy live migration support

Qemu currently implements pre-copy live migration. VM memory pages are first copied from the source hypervisor to the destination, potentially multiple times as pages get dirtied during transfer, then VCPU state is migrated. Unfortunately, if the VM dirties memory faster than the network bandwidth, then pre-copy cannot finish. `virsh` currently includes an option to suspend a VM after a timeout, so that migration may finish, but at the expense of downtime. A future version of qemu will implement post-copy live migration. The VCPU state is first migrated to the destination hypervisor, then memory pages are pulled from the source hypervisor. Post-copy has the potential to do migration with zero-downtime, despite the VM dirtying pages fast, with minimum performance impact. On the other hand, one post-copy is in progress, any network failure would render the VM unusable, as its memory is partitioned between the source and destination hypervisor. Therefore, post-copy should only be used when necessary. Post-copy migration in qemu will work as follows: (1) The `x-postcopy-ram` migration capability needs to be set. (2) Migration is started. (3) When the user decides so, post-copy migration is activated by sending the `migrate-start-postcopy` command. Qemu acknowledges by setting migration status to `postcopy-active`. Cristian Klein (6): Added qemu postcopy-active migration status Added public API to enable post-copy migration Added new public API virDomainMigrateStartPostCopy Added domainMigrateStartPostCopy in qemu driver Added domainMigrateStartPostCopy in remote driver virsh: added migrate --postcopy-after include/libvirt/libvirt.h.in | 3 ++ src/driver.h | 4 +++ src/libvirt.c | 44 +++++++++++++++++++++++++++ src/libvirt_public.syms | 5 +++ src/qemu/qemu_driver.c | 44 +++++++++++++++++++++++++++ src/qemu/qemu_migration.c | 44 +++++++++++++++++++++++++++ src/qemu/qemu_migration.h | 3 +- src/qemu/qemu_monitor.c | 21 ++++++++++++- src/qemu/qemu_monitor.h | 3 ++ src/qemu/qemu_monitor_json.c | 21 ++++++++++++- src/qemu/qemu_monitor_json.h | 1 + src/qemu/qemu_monitor_text.c | 14 ++++++++- src/qemu/qemu_monitor_text.h | 2 ++ src/remote/remote_driver.c | 1 + src/remote/remote_protocol.x | 12 +++++++- tools/virsh-domain.c | 72 ++++++++++++++++++++++++++++++++++++++++++-- tools/virsh.pod | 5 +++ 17 files changed, 292 insertions(+), 7 deletions(-) -- 1.9.1

Signed-off-by: Cristian Klein <cristian.klein@cs.umu.se> --- src/qemu/qemu_migration.c | 1 + src/qemu/qemu_monitor.c | 2 +- src/qemu/qemu_monitor.h | 1 + src/qemu/qemu_monitor_json.c | 3 ++- src/qemu/qemu_monitor_text.c | 3 ++- 5 files changed, 7 insertions(+), 3 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 44cb826..a5bd825 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1991,6 +1991,7 @@ qemuMigrationUpdateJobStatus(virQEMUDriverPtr driver, /* fall through */ case QEMU_MONITOR_MIGRATION_STATUS_SETUP: case QEMU_MONITOR_MIGRATION_STATUS_ACTIVE: + case QEMU_MONITOR_MIGRATION_STATUS_POSTCOPY_ACTIVE: ret = 0; break; diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index e2013be..14688bf 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -117,7 +117,7 @@ VIR_ONCE_GLOBAL_INIT(qemuMonitor) VIR_ENUM_IMPL(qemuMonitorMigrationStatus, QEMU_MONITOR_MIGRATION_STATUS_LAST, - "inactive", "active", "completed", "failed", "cancelled", "setup") + "inactive", "active", "completed", "failed", "cancelled", "setup", "postcopy-active") VIR_ENUM_IMPL(qemuMonitorMigrationCaps, QEMU_MONITOR_MIGRATION_CAPS_LAST, diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 9a72b59..587f779 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -429,6 +429,7 @@ enum { QEMU_MONITOR_MIGRATION_STATUS_ERROR, QEMU_MONITOR_MIGRATION_STATUS_CANCELLED, QEMU_MONITOR_MIGRATION_STATUS_SETUP, + QEMU_MONITOR_MIGRATION_STATUS_POSTCOPY_ACTIVE, QEMU_MONITOR_MIGRATION_STATUS_LAST }; diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index a3d7c2c..e98962b 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -2557,7 +2557,8 @@ qemuMonitorJSONGetMigrationStatusReply(virJSONValuePtr reply, status->setup_time_set = true; if (status->status == QEMU_MONITOR_MIGRATION_STATUS_ACTIVE || - status->status == QEMU_MONITOR_MIGRATION_STATUS_COMPLETED) { + status->status == QEMU_MONITOR_MIGRATION_STATUS_COMPLETED || + status->status == QEMU_MONITOR_MIGRATION_STATUS_POSTCOPY_ACTIVE) { virJSONValuePtr ram = virJSONValueObjectGet(ret, "ram"); if (!ram) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", diff --git a/src/qemu/qemu_monitor_text.c b/src/qemu/qemu_monitor_text.c index 46d2782..a3c8aa5 100644 --- a/src/qemu/qemu_monitor_text.c +++ b/src/qemu/qemu_monitor_text.c @@ -1458,7 +1458,8 @@ int qemuMonitorTextGetMigrationStatus(qemuMonitorPtr mon, goto cleanup; } - if (status->status == QEMU_MONITOR_MIGRATION_STATUS_ACTIVE) { + if (status->status == QEMU_MONITOR_MIGRATION_STATUS_ACTIVE || + status->status == QEMU_MONITOR_MIGRATION_STATUS_POSTCOPY_ACTIVE) { tmp = end + 1; if (!(tmp = strstr(tmp, MIGRATION_TRANSFER_PREFIX))) -- 1.9.1

On Tue, Sep 23, 2014 at 16:09:56 +0200, Cristian Klein wrote:
Signed-off-by: Cristian Klein <cristian.klein@cs.umu.se> --- src/qemu/qemu_migration.c | 1 + src/qemu/qemu_monitor.c | 2 +- src/qemu/qemu_monitor.h | 1 + src/qemu/qemu_monitor_json.c | 3 ++- src/qemu/qemu_monitor_text.c | 3 ++- 5 files changed, 7 insertions(+), 3 deletions(-)
...
diff --git a/src/qemu/qemu_monitor_text.c b/src/qemu/qemu_monitor_text.c index 46d2782..a3c8aa5 100644 --- a/src/qemu/qemu_monitor_text.c +++ b/src/qemu/qemu_monitor_text.c @@ -1458,7 +1458,8 @@ int qemuMonitorTextGetMigrationStatus(qemuMonitorPtr mon, goto cleanup; }
- if (status->status == QEMU_MONITOR_MIGRATION_STATUS_ACTIVE) { + if (status->status == QEMU_MONITOR_MIGRATION_STATUS_ACTIVE || + status->status == QEMU_MONITOR_MIGRATION_STATUS_POSTCOPY_ACTIVE) { tmp = end + 1;
I don't think we need to change text monitor code since we won't be talking with HMP to any version of QEMU that supports post-copy migration. Otherwise the patch is just fine and we just need to wait for post-copy to land in QEMU. Jirka

Added a new `libvirt` migration flag `VIR_MIGRATE_POSTCOPY` and the necessary code to pass this flag to qemu as migration capability. Signed-off-by: Cristian Klein <cristian.klein@cs.umu.se> --- include/libvirt/libvirt.h.in | 1 + src/libvirt.c | 7 +++++++ src/qemu/qemu_migration.c | 43 +++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_migration.h | 3 ++- 4 files changed, 53 insertions(+), 1 deletion(-) diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index 6371b7b..bdc33c6 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -1225,6 +1225,7 @@ typedef enum { VIR_MIGRATE_ABORT_ON_ERROR = (1 << 12), /* abort migration on I/O errors happened during migration */ VIR_MIGRATE_AUTO_CONVERGE = (1 << 13), /* force convergence */ VIR_MIGRATE_RDMA_PIN_ALL = (1 << 14), /* RDMA memory pinning */ + VIR_MIGRATE_POSTCOPY = (1 << 15), /* enable (but don't start) post-copy */ } virDomainMigrateFlags; diff --git a/src/libvirt.c b/src/libvirt.c index 1a285ca..33aeafa 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -5265,6 +5265,7 @@ virDomainMigrateDirect(virDomainPtr domain, * automatically when supported). * VIR_MIGRATE_UNSAFE Force migration even if it is considered unsafe. * VIR_MIGRATE_OFFLINE Migrate offline + * VIR_MIGRATE_POSTCOPY Enable (but don't start) post-copy * * VIR_MIGRATE_TUNNELLED requires that VIR_MIGRATE_PEER2PEER be set. * Applications using the VIR_MIGRATE_PEER2PEER flag will probably @@ -5291,6 +5292,12 @@ virDomainMigrateDirect(virDomainPtr domain, * can use either VIR_MIGRATE_NON_SHARED_DISK or * VIR_MIGRATE_NON_SHARED_INC as they are mutually exclusive. * + * If you want to enable post-copy migration you must set the + * VIR_MIGRATE_POSTCOPY flag. Once migration is active, you may + * start post-copy by calling virDomainMigrateStartPostCopy. + * When to start post-copy is entirely left to the user, libvirt + * only implements the necessary mechanism. + * * In either case it is typically only necessary to specify a * URI if the destination host has multiple interfaces and a * specific interface is required to transmit migration data. diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index a5bd825..bd1f2d6 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1799,6 +1799,45 @@ qemuMigrationSetOffline(virQEMUDriverPtr driver, static int +qemuMigrationSetPostCopy(virQEMUDriverPtr driver, + virDomainObjPtr vm, + qemuDomainAsyncJob job) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + int ret; + + if (qemuDomainObjEnterMonitorAsync(driver, vm, job) < 0) + return -1; + + ret = qemuMonitorGetMigrationCapability( + priv->mon, + QEMU_MONITOR_MIGRATION_CAPS_POSTCOPY); + + if (ret < 0) { + goto cleanup; + } else if (ret == 0) { + if (job == QEMU_ASYNC_JOB_MIGRATION_IN) { + virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", + _("Post-copy migration is not supported by " + "target QEMU binary")); + } else { + virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", + _("Post-copy migration is not supported by " + "source QEMU binary")); + } + ret = -1; + goto cleanup; + } + + ret = qemuMonitorSetMigrationCapability( + priv->mon, + QEMU_MONITOR_MIGRATION_CAPS_POSTCOPY); + + cleanup: + qemuDomainObjExitMonitor(driver, vm); + return ret; +} +static int qemuMigrationSetCompression(virQEMUDriverPtr driver, virDomainObjPtr vm, qemuDomainAsyncJob job) @@ -3580,6 +3619,10 @@ qemuMigrationRun(virQEMUDriverPtr driver, if (flags & VIR_MIGRATE_RDMA_PIN_ALL && qemuMigrationSetPinAll(driver, vm, QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) + + if (flags & VIR_MIGRATE_POSTCOPY && + qemuMigrationSetPostCopy(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) goto cleanup; if (qemuDomainObjEnterMonitorAsync(driver, vm, diff --git a/src/qemu/qemu_migration.h b/src/qemu/qemu_migration.h index e7a90c3..349c9c4 100644 --- a/src/qemu/qemu_migration.h +++ b/src/qemu/qemu_migration.h @@ -41,7 +41,8 @@ VIR_MIGRATE_COMPRESSED | \ VIR_MIGRATE_ABORT_ON_ERROR | \ VIR_MIGRATE_AUTO_CONVERGE | \ - VIR_MIGRATE_RDMA_PIN_ALL) + VIR_MIGRATE_RDMA_PIN_ALL | \ + VIR_MIGRATE_POSTCOPY) /* All supported migration parameters and their types. */ # define QEMU_MIGRATION_PARAMETERS \ -- 1.9.1

On Tue, Sep 23, 2014 at 16:09:57 +0200, Cristian Klein wrote:
Added a new `libvirt` migration flag `VIR_MIGRATE_POSTCOPY` and the necessary code to pass this flag to qemu as migration capability.
Signed-off-by: Cristian Klein <cristian.klein@cs.umu.se> --- include/libvirt/libvirt.h.in | 1 + src/libvirt.c | 7 +++++++ src/qemu/qemu_migration.c | 43 +++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_migration.h | 3 ++- 4 files changed, 53 insertions(+), 1 deletion(-)
diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index 6371b7b..bdc33c6 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -1225,6 +1225,7 @@ typedef enum { VIR_MIGRATE_ABORT_ON_ERROR = (1 << 12), /* abort migration on I/O errors happened during migration */ VIR_MIGRATE_AUTO_CONVERGE = (1 << 13), /* force convergence */ VIR_MIGRATE_RDMA_PIN_ALL = (1 << 14), /* RDMA memory pinning */ + VIR_MIGRATE_POSTCOPY = (1 << 15), /* enable (but don't start) post-copy */ } virDomainMigrateFlags;
diff --git a/src/libvirt.c b/src/libvirt.c index 1a285ca..33aeafa 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -5265,6 +5265,7 @@ virDomainMigrateDirect(virDomainPtr domain, * automatically when supported). * VIR_MIGRATE_UNSAFE Force migration even if it is considered unsafe. * VIR_MIGRATE_OFFLINE Migrate offline + * VIR_MIGRATE_POSTCOPY Enable (but don't start) post-copy * * VIR_MIGRATE_TUNNELLED requires that VIR_MIGRATE_PEER2PEER be set. * Applications using the VIR_MIGRATE_PEER2PEER flag will probably @@ -5291,6 +5292,12 @@ virDomainMigrateDirect(virDomainPtr domain, * can use either VIR_MIGRATE_NON_SHARED_DISK or * VIR_MIGRATE_NON_SHARED_INC as they are mutually exclusive. * + * If you want to enable post-copy migration you must set the + * VIR_MIGRATE_POSTCOPY flag. Once migration is active, you may + * start post-copy by calling virDomainMigrateStartPostCopy. + * When to start post-copy is entirely left to the user, libvirt + * only implements the necessary mechanism. + * * In either case it is typically only necessary to specify a * URI if the destination host has multiple interfaces and a * specific interface is required to transmit migration data. diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index a5bd825..bd1f2d6 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1799,6 +1799,45 @@ qemuMigrationSetOffline(virQEMUDriverPtr driver,
static int +qemuMigrationSetPostCopy(virQEMUDriverPtr driver, + virDomainObjPtr vm, + qemuDomainAsyncJob job)
s/ // in the two lines above to fix indentation.
+{ + qemuDomainObjPrivatePtr priv = vm->privateData; + int ret; + + if (qemuDomainObjEnterMonitorAsync(driver, vm, job) < 0) + return -1; + + ret = qemuMonitorGetMigrationCapability( + priv->mon, + QEMU_MONITOR_MIGRATION_CAPS_POSTCOPY);
QEMU_MONITOR_MIGRATION_CAPS_POSTCOPY is not defined anywhere in this patchset. It seems you forgot to update qemu_monitor.[ch].
+ + if (ret < 0) { + goto cleanup; + } else if (ret == 0) { + if (job == QEMU_ASYNC_JOB_MIGRATION_IN) { + virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", + _("Post-copy migration is not supported by " + "target QEMU binary")); + } else { + virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", + _("Post-copy migration is not supported by " + "source QEMU binary")); + } + ret = -1; + goto cleanup; + } + + ret = qemuMonitorSetMigrationCapability( + priv->mon, + QEMU_MONITOR_MIGRATION_CAPS_POSTCOPY); + + cleanup: + qemuDomainObjExitMonitor(driver, vm); + return ret; +} +static int qemuMigrationSetCompression(virQEMUDriverPtr driver, virDomainObjPtr vm, qemuDomainAsyncJob job) @@ -3580,6 +3619,10 @@ qemuMigrationRun(virQEMUDriverPtr driver, if (flags & VIR_MIGRATE_RDMA_PIN_ALL && qemuMigrationSetPinAll(driver, vm, QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) + + if (flags & VIR_MIGRATE_POSTCOPY && + qemuMigrationSetPostCopy(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) goto cleanup;
if (qemuDomainObjEnterMonitorAsync(driver, vm,
I'd expect similar thing would need to be done in the Prepare phase on destination... However, if destination does not need to set the capability, we at least need to check if destination QEMU supports it and report failure from Prepare if it doesn't. And the QEMU_ASYNC_JOB_MIGRATION_IN branch in qemuMigrationSetPostCopy would be unreachable in that case.
diff --git a/src/qemu/qemu_migration.h b/src/qemu/qemu_migration.h index e7a90c3..349c9c4 100644 --- a/src/qemu/qemu_migration.h +++ b/src/qemu/qemu_migration.h @@ -41,7 +41,8 @@ VIR_MIGRATE_COMPRESSED | \ VIR_MIGRATE_ABORT_ON_ERROR | \ VIR_MIGRATE_AUTO_CONVERGE | \ - VIR_MIGRATE_RDMA_PIN_ALL) + VIR_MIGRATE_RDMA_PIN_ALL | \ + VIR_MIGRATE_POSTCOPY)
/* All supported migration parameters and their types. */ # define QEMU_MIGRATION_PARAMETERS \
Jirka

On 2014-09-24 14:32, Jiri Denemark wrote:
On Tue, Sep 23, 2014 at 16:09:57 +0200, Cristian Klein wrote:
Added a new `libvirt` migration flag `VIR_MIGRATE_POSTCOPY` and the necessary code to pass this flag to qemu as migration capability.
Signed-off-by: Cristian Klein <cristian.klein@cs.umu.se> --- include/libvirt/libvirt.h.in | 1 + src/libvirt.c | 7 +++++++ src/qemu/qemu_migration.c | 43 +++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_migration.h | 3 ++- 4 files changed, 53 insertions(+), 1 deletion(-)
diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index 6371b7b..bdc33c6 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -1225,6 +1225,7 @@ typedef enum { VIR_MIGRATE_ABORT_ON_ERROR = (1 << 12), /* abort migration on I/O errors happened during migration */ VIR_MIGRATE_AUTO_CONVERGE = (1 << 13), /* force convergence */ VIR_MIGRATE_RDMA_PIN_ALL = (1 << 14), /* RDMA memory pinning */ + VIR_MIGRATE_POSTCOPY = (1 << 15), /* enable (but don't start) post-copy */ } virDomainMigrateFlags;
diff --git a/src/libvirt.c b/src/libvirt.c index 1a285ca..33aeafa 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -5265,6 +5265,7 @@ virDomainMigrateDirect(virDomainPtr domain, * automatically when supported). * VIR_MIGRATE_UNSAFE Force migration even if it is considered unsafe. * VIR_MIGRATE_OFFLINE Migrate offline + * VIR_MIGRATE_POSTCOPY Enable (but don't start) post-copy * * VIR_MIGRATE_TUNNELLED requires that VIR_MIGRATE_PEER2PEER be set. * Applications using the VIR_MIGRATE_PEER2PEER flag will probably @@ -5291,6 +5292,12 @@ virDomainMigrateDirect(virDomainPtr domain, * can use either VIR_MIGRATE_NON_SHARED_DISK or * VIR_MIGRATE_NON_SHARED_INC as they are mutually exclusive. * + * If you want to enable post-copy migration you must set the + * VIR_MIGRATE_POSTCOPY flag. Once migration is active, you may + * start post-copy by calling virDomainMigrateStartPostCopy. + * When to start post-copy is entirely left to the user, libvirt + * only implements the necessary mechanism. + * * In either case it is typically only necessary to specify a * URI if the destination host has multiple interfaces and a * specific interface is required to transmit migration data. diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index a5bd825..bd1f2d6 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1799,6 +1799,45 @@ qemuMigrationSetOffline(virQEMUDriverPtr driver,
static int +qemuMigrationSetPostCopy(virQEMUDriverPtr driver, + virDomainObjPtr vm, + qemuDomainAsyncJob job)
s/ // in the two lines above to fix indentation.
Will fix in next version of the patch. Thanks for pointing out "make syntax-check".
+{ + qemuDomainObjPrivatePtr priv = vm->privateData; + int ret; + + if (qemuDomainObjEnterMonitorAsync(driver, vm, job) < 0) + return -1; + + ret = qemuMonitorGetMigrationCapability( + priv->mon, + QEMU_MONITOR_MIGRATION_CAPS_POSTCOPY);
QEMU_MONITOR_MIGRATION_CAPS_POSTCOPY is not defined anywhere in this patchset. It seems you forgot to update qemu_monitor.[ch].
Sorry about that, I divided the patches incorrectly. Will be fixed in PATCH v2.
+ + if (ret < 0) { + goto cleanup; + } else if (ret == 0) { + if (job == QEMU_ASYNC_JOB_MIGRATION_IN) { + virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", + _("Post-copy migration is not supported by " + "target QEMU binary")); + } else { + virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", + _("Post-copy migration is not supported by " + "source QEMU binary")); + } + ret = -1; + goto cleanup; + } + + ret = qemuMonitorSetMigrationCapability( + priv->mon, + QEMU_MONITOR_MIGRATION_CAPS_POSTCOPY); + + cleanup: + qemuDomainObjExitMonitor(driver, vm); + return ret; +} +static int qemuMigrationSetCompression(virQEMUDriverPtr driver, virDomainObjPtr vm, qemuDomainAsyncJob job) @@ -3580,6 +3619,10 @@ qemuMigrationRun(virQEMUDriverPtr driver, if (flags & VIR_MIGRATE_RDMA_PIN_ALL && qemuMigrationSetPinAll(driver, vm, QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) + + if (flags & VIR_MIGRATE_POSTCOPY && + qemuMigrationSetPostCopy(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) goto cleanup;
if (qemuDomainObjEnterMonitorAsync(driver, vm,
I'd expect similar thing would need to be done in the Prepare phase on destination... However, if destination does not need to set the capability, we at least need to check if destination QEMU supports it and report failure from Prepare if it doesn't. And the QEMU_ASYNC_JOB_MIGRATION_IN branch in qemuMigrationSetPostCopy would be unreachable in that case.
It's up to the source qemu (through the migration protocol) to activate post-copy on the destination qemu. So there are no other steps to be done on the source. I have mixed feelings about having libvirt check the necessary capability on the destination. Personally, I would prefer qemu to return an error like "post-copy unavailable" when the "migrate" command is issued, as there might be more factors that influence the availability of post-copy. For example, it's not sufficient for the destination qemu to support post-copy, but the destination kernel also needs userfault support. Dave, I just tried migration from the post-copy qemu to master qemu and it fails unexpectedly. The "migrate" command return no error, but a "failed" migration status is returned later. Weirdly, this also happens when I initiate non-post-copy migration. Do you happen to know what is the qemu policy regarding migration? What would be the best way to determine if a certain type of migration is feasible, while at the same time returning a meaningful error to the user?
diff --git a/src/qemu/qemu_migration.h b/src/qemu/qemu_migration.h index e7a90c3..349c9c4 100644 --- a/src/qemu/qemu_migration.h +++ b/src/qemu/qemu_migration.h @@ -41,7 +41,8 @@ VIR_MIGRATE_COMPRESSED | \ VIR_MIGRATE_ABORT_ON_ERROR | \ VIR_MIGRATE_AUTO_CONVERGE | \ - VIR_MIGRATE_RDMA_PIN_ALL) + VIR_MIGRATE_RDMA_PIN_ALL | \ + VIR_MIGRATE_POSTCOPY)
/* All supported migration parameters and their types. */ # define QEMU_MIGRATION_PARAMETERS \
-- Cristian Klein, PhD Post-doc @ Umeå Universitet http://www8.cs.umu.se/~cklein

On Thu, Sep 25, 2014 at 11:42:16 +0200, Cristian KLEIN wrote:
On 2014-09-24 14:32, Jiri Denemark wrote:
On Tue, Sep 23, 2014 at 16:09:57 +0200, Cristian Klein wrote:
Added a new `libvirt` migration flag `VIR_MIGRATE_POSTCOPY` and the necessary code to pass this flag to qemu as migration capability.
Signed-off-by: Cristian Klein <cristian.klein@cs.umu.se> --- include/libvirt/libvirt.h.in | 1 + src/libvirt.c | 7 +++++++ src/qemu/qemu_migration.c | 43 +++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_migration.h | 3 ++- 4 files changed, 53 insertions(+), 1 deletion(-)
diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index 6371b7b..bdc33c6 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -1225,6 +1225,7 @@ typedef enum { VIR_MIGRATE_ABORT_ON_ERROR = (1 << 12), /* abort migration on I/O errors happened during migration */ VIR_MIGRATE_AUTO_CONVERGE = (1 << 13), /* force convergence */ VIR_MIGRATE_RDMA_PIN_ALL = (1 << 14), /* RDMA memory pinning */ + VIR_MIGRATE_POSTCOPY = (1 << 15), /* enable (but don't start) post-copy */ } virDomainMigrateFlags;
In my first review, I forgot to mention that I think another flag requesting post-copy to be started right away (instead of waiting for virDomainMigrateStartPostCopy to be called) could be useful too. ...
@@ -3580,6 +3619,10 @@ qemuMigrationRun(virQEMUDriverPtr driver, if (flags & VIR_MIGRATE_RDMA_PIN_ALL && qemuMigrationSetPinAll(driver, vm, QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) + + if (flags & VIR_MIGRATE_POSTCOPY && + qemuMigrationSetPostCopy(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) goto cleanup;
if (qemuDomainObjEnterMonitorAsync(driver, vm,
I'd expect similar thing would need to be done in the Prepare phase on destination... However, if destination does not need to set the capability, we at least need to check if destination QEMU supports it and report failure from Prepare if it doesn't. And the QEMU_ASYNC_JOB_MIGRATION_IN branch in qemuMigrationSetPostCopy would be unreachable in that case.
It's up to the source qemu (through the migration protocol) to activate post-copy on the destination qemu. So there are no other steps to be done on the source.
I have mixed feelings about having libvirt check the necessary capability on the destination. Personally, I would prefer qemu to return an error like "post-copy unavailable" when the "migrate" command is issued, as there might be more factors that influence the availability of post-copy. For example, it's not sufficient for the destination qemu to support post-copy, but the destination kernel also needs userfault support.
Yeah, it won't catch all cases, but will at least fail early when someone tries to enable post-copy migration while migrating to a host with older QEMU that does not support it. Moreover, if QEMU is smart enough to check if post-copy migration can actually be used when the migration capability is set, we could catch even cases when userfault is not supported on the destination. Jirka

On 2014-09-25 14:21, Jiri Denemark wrote:
On Thu, Sep 25, 2014 at 11:42:16 +0200, Cristian KLEIN wrote:
On 2014-09-24 14:32, Jiri Denemark wrote:
On Tue, Sep 23, 2014 at 16:09:57 +0200, Cristian Klein wrote:
Added a new `libvirt` migration flag `VIR_MIGRATE_POSTCOPY` and the necessary code to pass this flag to qemu as migration capability.
Signed-off-by: Cristian Klein <cristian.klein@cs.umu.se> --- include/libvirt/libvirt.h.in | 1 + src/libvirt.c | 7 +++++++ src/qemu/qemu_migration.c | 43 +++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_migration.h | 3 ++- 4 files changed, 53 insertions(+), 1 deletion(-)
diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index 6371b7b..bdc33c6 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -1225,6 +1225,7 @@ typedef enum { VIR_MIGRATE_ABORT_ON_ERROR = (1 << 12), /* abort migration on I/O errors happened during migration */ VIR_MIGRATE_AUTO_CONVERGE = (1 << 13), /* force convergence */ VIR_MIGRATE_RDMA_PIN_ALL = (1 << 14), /* RDMA memory pinning */ + VIR_MIGRATE_POSTCOPY = (1 << 15), /* enable (but don't start) post-copy */ } virDomainMigrateFlags;
In my first review, I forgot to mention that I think another flag requesting post-copy to be started right away (instead of waiting for virDomainMigrateStartPostCopy to be called) could be useful too.
I'm a bit reluctant to include such a flag. Since qemu does not offer such a mechanism, it would feel like we are implementing a policy in libvirt.
...
@@ -3580,6 +3619,10 @@ qemuMigrationRun(virQEMUDriverPtr driver, if (flags & VIR_MIGRATE_RDMA_PIN_ALL && qemuMigrationSetPinAll(driver, vm, QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) + + if (flags & VIR_MIGRATE_POSTCOPY && + qemuMigrationSetPostCopy(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) goto cleanup;
if (qemuDomainObjEnterMonitorAsync(driver, vm,
I'd expect similar thing would need to be done in the Prepare phase on destination... However, if destination does not need to set the capability, we at least need to check if destination QEMU supports it and report failure from Prepare if it doesn't. And the QEMU_ASYNC_JOB_MIGRATION_IN branch in qemuMigrationSetPostCopy would be unreachable in that case.
It's up to the source qemu (through the migration protocol) to activate post-copy on the destination qemu. So there are no other steps to be done on the source.
I have mixed feelings about having libvirt check the necessary capability on the destination. Personally, I would prefer qemu to return an error like "post-copy unavailable" when the "migrate" command is issued, as there might be more factors that influence the availability of post-copy. For example, it's not sufficient for the destination qemu to support post-copy, but the destination kernel also needs userfault support.
Yeah, it won't catch all cases, but will at least fail early when someone tries to enable post-copy migration while migrating to a host with older QEMU that does not support it.
Moreover, if QEMU is smart enough to check if post-copy migration can actually be used when the migration capability is set, we could catch even cases when userfault is not supported on the destination.
Okey, I'll add code for that too. -- Cristian Klein, PhD Post-doc @ Umeå Universitet http://www8.cs.umu.se/~cklein

The user first start migration using the `VIR_MIGRATE_POSTCOPY` flag, then calls `virDomainMigrateStartPostCopy` asynchronously to switch from pre-copy to post-copy. Signed-off-by: Cristian Klein <cristian.klein@cs.umu.se> --- include/libvirt/libvirt.h.in | 2 ++ src/driver.h | 4 ++++ src/libvirt.c | 37 +++++++++++++++++++++++++++++++++++++ src/libvirt_public.syms | 5 +++++ 4 files changed, 48 insertions(+) diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index bdc33c6..eabedfa 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -1346,6 +1346,8 @@ int virDomainMigrateToURI3(virDomainPtr domain, unsigned int nparams, unsigned int flags); +int virDomainMigrateStartPostCopy (virDomainPtr domain); + int virDomainMigrateSetMaxDowntime (virDomainPtr domain, unsigned long long downtime, unsigned int flags); diff --git a/src/driver.h b/src/driver.h index bb748c4..6866ccd 100644 --- a/src/driver.h +++ b/src/driver.h @@ -1212,6 +1212,9 @@ typedef int virDomainStatsRecordPtr **retStats, unsigned int flags); +typedef int +(*virDrvDomainMigrateStartPostCopy)(virDomainPtr domain); + typedef struct _virDriver virDriver; typedef virDriver *virDriverPtr; @@ -1435,6 +1438,7 @@ struct _virDriver { virDrvNodeGetFreePages nodeGetFreePages; virDrvConnectGetDomainCapabilities connectGetDomainCapabilities; virDrvConnectGetAllDomainStats connectGetAllDomainStats; + virDrvDomainMigrateStartPostCopy domainMigrateStartPostCopy; }; diff --git a/src/libvirt.c b/src/libvirt.c index 33aeafa..e685da2 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -17803,6 +17803,43 @@ virDomainMigrateSetCompressionCache(virDomainPtr domain, /** + * virDomainMigrateStartPostCopy: + * @domain: a domain object + * + * Starts post-copy migration. This function has to be called while + * migration (initially pre-copy) is in progress. The migration operation + * must be called with the VIR_MIGRATE_POSTCOPY flag. + * + * Returns 0 in case of success, -1 otherwise. + */ +int +virDomainMigrateStartPostCopy(virDomainPtr domain) +{ + virConnectPtr conn; + + VIR_DOMAIN_DEBUG(domain); + + virResetLastError(); + + virCheckDomainReturn(domain, -1); + conn = domain->conn; + + virCheckReadOnlyGoto(conn->flags, error); + + if (conn->driver->domainMigrateStartPostCopy) { + if (conn->driver->domainMigrateStartPostCopy(domain) < 0) + goto error; + return 0; + } + + virReportUnsupportedError(); + error: + virDispatchError(conn); + return -1; +} + + +/** * virDomainMigrateSetMaxSpeed: * @domain: a domain object * @bandwidth: migration bandwidth limit in MiB/s diff --git a/src/libvirt_public.syms b/src/libvirt_public.syms index e1f013f..ea17a07 100644 --- a/src/libvirt_public.syms +++ b/src/libvirt_public.syms @@ -679,4 +679,9 @@ LIBVIRT_1.2.8 { virDomainStatsRecordListFree; } LIBVIRT_1.2.7; +LIBVIRT_1.2.9 { + global: + virDomainMigrateStartPostCopy; +} LIBVIRT_1.2.8; + # .... define new API here using predicted next version number .... -- 1.9.1

On Tue, Sep 23, 2014 at 16:09:58 +0200, Cristian Klein wrote:
The user first start migration using the `VIR_MIGRATE_POSTCOPY` flag, then calls `virDomainMigrateStartPostCopy` asynchronously to switch from pre-copy to post-copy.
Signed-off-by: Cristian Klein <cristian.klein@cs.umu.se> --- include/libvirt/libvirt.h.in | 2 ++ src/driver.h | 4 ++++ src/libvirt.c | 37 +++++++++++++++++++++++++++++++++++++ src/libvirt_public.syms | 5 +++++ 4 files changed, 48 insertions(+)
diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index bdc33c6..eabedfa 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -1346,6 +1346,8 @@ int virDomainMigrateToURI3(virDomainPtr domain, unsigned int nparams, unsigned int flags);
+int virDomainMigrateStartPostCopy (virDomainPtr domain); + int virDomainMigrateSetMaxDowntime (virDomainPtr domain, unsigned long long downtime, unsigned int flags); diff --git a/src/driver.h b/src/driver.h index bb748c4..6866ccd 100644 --- a/src/driver.h +++ b/src/driver.h @@ -1212,6 +1212,9 @@ typedef int virDomainStatsRecordPtr **retStats, unsigned int flags);
+typedef int +(*virDrvDomainMigrateStartPostCopy)(virDomainPtr domain); + typedef struct _virDriver virDriver; typedef virDriver *virDriverPtr;
@@ -1435,6 +1438,7 @@ struct _virDriver { virDrvNodeGetFreePages nodeGetFreePages; virDrvConnectGetDomainCapabilities connectGetDomainCapabilities; virDrvConnectGetAllDomainStats connectGetAllDomainStats; + virDrvDomainMigrateStartPostCopy domainMigrateStartPostCopy; };
diff --git a/src/libvirt.c b/src/libvirt.c index 33aeafa..e685da2 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -17803,6 +17803,43 @@ virDomainMigrateSetCompressionCache(virDomainPtr domain,
/** + * virDomainMigrateStartPostCopy: + * @domain: a domain object + * + * Starts post-copy migration. This function has to be called while + * migration (initially pre-copy) is in progress. The migration operation + * must be called with the VIR_MIGRATE_POSTCOPY flag. + * + * Returns 0 in case of success, -1 otherwise. + */ +int +virDomainMigrateStartPostCopy(virDomainPtr domain) +{ + virConnectPtr conn; + + VIR_DOMAIN_DEBUG(domain); + + virResetLastError(); + + virCheckDomainReturn(domain, -1); + conn = domain->conn; + + virCheckReadOnlyGoto(conn->flags, error); + + if (conn->driver->domainMigrateStartPostCopy) { + if (conn->driver->domainMigrateStartPostCopy(domain) < 0) + goto error; + return 0; + } + + virReportUnsupportedError(); + error: + virDispatchError(conn); + return -1; +} + + +/** * virDomainMigrateSetMaxSpeed: * @domain: a domain object * @bandwidth: migration bandwidth limit in MiB/s diff --git a/src/libvirt_public.syms b/src/libvirt_public.syms index e1f013f..ea17a07 100644 --- a/src/libvirt_public.syms +++ b/src/libvirt_public.syms @@ -679,4 +679,9 @@ LIBVIRT_1.2.8 { virDomainStatsRecordListFree; } LIBVIRT_1.2.7;
+LIBVIRT_1.2.9 { + global: + virDomainMigrateStartPostCopy; +} LIBVIRT_1.2.8; +
You will need to change this section since post-copy won't make it in 1.2.9 (which freezes tomorrow)... Looks good otherwise. Jirka

On Wed, Sep 24, 2014 at 14:45:02 +0200, Jiri Denemark wrote:
On Tue, Sep 23, 2014 at 16:09:58 +0200, Cristian Klein wrote:
The user first start migration using the `VIR_MIGRATE_POSTCOPY` flag, then calls `virDomainMigrateStartPostCopy` asynchronously to switch from pre-copy to post-copy.
Signed-off-by: Cristian Klein <cristian.klein@cs.umu.se> --- include/libvirt/libvirt.h.in | 2 ++ src/driver.h | 4 ++++ src/libvirt.c | 37 +++++++++++++++++++++++++++++++++++++ src/libvirt_public.syms | 5 +++++ 4 files changed, 48 insertions(+)
diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index bdc33c6..eabedfa 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -1346,6 +1346,8 @@ int virDomainMigrateToURI3(virDomainPtr domain, unsigned int nparams, unsigned int flags);
+int virDomainMigrateStartPostCopy (virDomainPtr domain); + int virDomainMigrateSetMaxDowntime (virDomainPtr domain, unsigned long long downtime, unsigned int flags); diff --git a/src/driver.h b/src/driver.h index bb748c4..6866ccd 100644 --- a/src/driver.h +++ b/src/driver.h @@ -1212,6 +1212,9 @@ typedef int virDomainStatsRecordPtr **retStats, unsigned int flags);
+typedef int +(*virDrvDomainMigrateStartPostCopy)(virDomainPtr domain); + typedef struct _virDriver virDriver; typedef virDriver *virDriverPtr;
@@ -1435,6 +1438,7 @@ struct _virDriver { virDrvNodeGetFreePages nodeGetFreePages; virDrvConnectGetDomainCapabilities connectGetDomainCapabilities; virDrvConnectGetAllDomainStats connectGetAllDomainStats; + virDrvDomainMigrateStartPostCopy domainMigrateStartPostCopy; };
diff --git a/src/libvirt.c b/src/libvirt.c index 33aeafa..e685da2 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -17803,6 +17803,43 @@ virDomainMigrateSetCompressionCache(virDomainPtr domain,
/** + * virDomainMigrateStartPostCopy: + * @domain: a domain object + * + * Starts post-copy migration. This function has to be called while + * migration (initially pre-copy) is in progress. The migration operation + * must be called with the VIR_MIGRATE_POSTCOPY flag. + * + * Returns 0 in case of success, -1 otherwise. + */ +int +virDomainMigrateStartPostCopy(virDomainPtr domain)
Oops, I forgot to mention we should add unsigned int flags parameter for this API just in case we need it in the future. Jirka

Signed-off-by: Cristian Klein <cristian.klein@cs.umu.se> --- src/qemu/qemu_driver.c | 44 ++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_monitor.c | 19 +++++++++++++++++++ src/qemu/qemu_monitor.h | 2 ++ src/qemu/qemu_monitor_json.c | 18 ++++++++++++++++++ src/qemu/qemu_monitor_json.h | 1 + src/qemu/qemu_monitor_text.c | 11 +++++++++++ src/qemu/qemu_monitor_text.h | 2 ++ 7 files changed, 97 insertions(+) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index e73d4f9..02c9a3b 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11861,6 +11861,49 @@ qemuDomainGetJobStats(virDomainPtr dom, } +static int qemuMigrateStartPostCopy(virDomainPtr dom) +{ + virQEMUDriverPtr driver = dom->conn->privateData; + virDomainObjPtr vm; + int ret = -1; + qemuDomainObjPrivatePtr priv; + + if (!(vm = qemuDomObjFromDomain(dom))) + goto cleanup; + + if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MIGRATION_OP) < 0) + goto cleanup; + + if (!virDomainObjIsActive(vm)) { + virReportError(VIR_ERR_OPERATION_INVALID, + "%s", _("domain is not running")); + goto endjob; + } + + priv = vm->privateData; + + if (priv->job.asyncJob != QEMU_ASYNC_JOB_MIGRATION_OUT) { + virReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("post-copy can only be started " + "while migration is in progress")); + goto cleanup; + } + + VIR_DEBUG("Starting post-copy"); + qemuDomainObjEnterMonitor(driver, vm); + ret = qemuMonitorMigrateStartPostCopy(priv->mon); + qemuDomainObjExitMonitor(driver, vm); + + endjob: + if (!qemuDomainObjEndJob(driver, vm)) + vm = NULL; + + cleanup: + if (vm) + virObjectUnlock(vm); + return ret; +} + static int qemuDomainAbortJob(virDomainPtr dom) { virQEMUDriverPtr driver = dom->conn->privateData; @@ -18259,6 +18302,7 @@ static virDriver qemuDriver = { .nodeGetFreePages = qemuNodeGetFreePages, /* 1.2.6 */ .connectGetDomainCapabilities = qemuConnectGetDomainCapabilities, /* 1.2.7 */ .connectGetAllDomainStats = qemuConnectGetAllDomainStats, /* 1.2.8 */ + .domainMigrateStartPostCopy = qemuMigrateStartPostCopy, /* 1.2.9 */ }; diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 14688bf..0b230cc 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -2384,6 +2384,25 @@ int qemuMonitorMigrateToUnix(qemuMonitorPtr mon, return ret; } +int qemuMonitorMigrateStartPostCopy(qemuMonitorPtr mon) +{ + int ret; + VIR_DEBUG("mon=%p", mon); + + if (!mon) { + virReportError(VIR_ERR_INVALID_ARG, "%s", + _("monitor must not be NULL")); + return -1; + } + + if (mon->json) + ret = qemuMonitorJSONMigrateStartPostCopy(mon); + else + ret = qemuMonitorTextMigrateStartPostCopy(mon); + return ret; +} + + int qemuMonitorMigrateCancel(qemuMonitorPtr mon) { int ret; diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 587f779..97d980d 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -539,6 +539,8 @@ int qemuMonitorMigrateToUnix(qemuMonitorPtr mon, unsigned int flags, const char *unixfile); +int qemuMonitorMigrateStartPostCopy(qemuMonitorPtr mon); + int qemuMonitorMigrateCancel(qemuMonitorPtr mon); int qemuMonitorGetDumpGuestMemoryCapability(qemuMonitorPtr mon, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index e98962b..14e7f84 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -2798,6 +2798,24 @@ int qemuMonitorJSONMigrate(qemuMonitorPtr mon, return ret; } +int qemuMonitorJSONMigrateStartPostCopy(qemuMonitorPtr mon) +{ + int ret; + virJSONValuePtr cmd = qemuMonitorJSONMakeCommand("migrate-start-postcopy", NULL); + virJSONValuePtr reply = NULL; + if (!cmd) + return -1; + + ret = qemuMonitorJSONCommand(mon, cmd, &reply); + + if (ret == 0) + ret = qemuMonitorJSONCheckError(cmd, reply); + + virJSONValueFree(cmd); + virJSONValueFree(reply); + return ret; +} + int qemuMonitorJSONMigrateCancel(qemuMonitorPtr mon) { int ret; diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index d366179..54beb7f 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -151,6 +151,7 @@ int qemuMonitorJSONGetSpiceMigrationStatus(qemuMonitorPtr mon, bool *spice_migrated); +int qemuMonitorJSONMigrateStartPostCopy(qemuMonitorPtr mon); int qemuMonitorJSONMigrateCancel(qemuMonitorPtr mon); int qemuMonitorJSONGetDumpGuestMemoryCapability(qemuMonitorPtr mon, diff --git a/src/qemu/qemu_monitor_text.c b/src/qemu/qemu_monitor_text.c index a3c8aa5..1354651 100644 --- a/src/qemu/qemu_monitor_text.c +++ b/src/qemu/qemu_monitor_text.c @@ -1613,6 +1613,17 @@ int qemuMonitorTextMigrate(qemuMonitorPtr mon, return ret; } +int qemuMonitorTextMigrateStartPostCopy(qemuMonitorPtr mon) +{ + char *info = NULL; + int ret; + + ret = qemuMonitorHMPCommand(mon, "migrate-start-postcopy", &info); + + VIR_FREE(info); + return ret; +} + int qemuMonitorTextMigrateCancel(qemuMonitorPtr mon) { char *info = NULL; diff --git a/src/qemu/qemu_monitor_text.h b/src/qemu/qemu_monitor_text.h index 49d4b88..ae982d5 100644 --- a/src/qemu/qemu_monitor_text.h +++ b/src/qemu/qemu_monitor_text.h @@ -123,6 +123,8 @@ int qemuMonitorTextMigrate(qemuMonitorPtr mon, unsigned int flags, const char *uri); +int qemuMonitorTextMigrateStartPostCopy(qemuMonitorPtr mon); + int qemuMonitorTextMigrateCancel(qemuMonitorPtr mon); int qemuMonitorTextGraphicsRelocate(qemuMonitorPtr mon, -- 1.9.1

On Tue, Sep 23, 2014 at 16:09:59 +0200, Cristian Klein wrote:
Signed-off-by: Cristian Klein <cristian.klein@cs.umu.se> --- src/qemu/qemu_driver.c | 44 ++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_monitor.c | 19 +++++++++++++++++++ src/qemu/qemu_monitor.h | 2 ++ src/qemu/qemu_monitor_json.c | 18 ++++++++++++++++++ src/qemu/qemu_monitor_json.h | 1 + src/qemu/qemu_monitor_text.c | 11 +++++++++++ src/qemu/qemu_monitor_text.h | 2 ++
No need to touch qemu_monitor_text.[ch] since we will only use QMP with any QEMU that supports post-copy migration.
7 files changed, 97 insertions(+)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index e73d4f9..02c9a3b 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11861,6 +11861,49 @@ qemuDomainGetJobStats(virDomainPtr dom, }
+static int qemuMigrateStartPostCopy(virDomainPtr dom) +{ + virQEMUDriverPtr driver = dom->conn->privateData; + virDomainObjPtr vm; + int ret = -1; + qemuDomainObjPrivatePtr priv; + + if (!(vm = qemuDomObjFromDomain(dom))) + goto cleanup; + + if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MIGRATION_OP) < 0) + goto cleanup; + + if (!virDomainObjIsActive(vm)) { + virReportError(VIR_ERR_OPERATION_INVALID, + "%s", _("domain is not running")); + goto endjob; + } + + priv = vm->privateData; + + if (priv->job.asyncJob != QEMU_ASYNC_JOB_MIGRATION_OUT) { + virReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("post-copy can only be started " + "while migration is in progress"));
We forbid the use of tabs for indentation. Please, run make syntax-check (in addition to make check) before sending patches to catch this type of issues.
+ goto cleanup; + } + + VIR_DEBUG("Starting post-copy"); + qemuDomainObjEnterMonitor(driver, vm); + ret = qemuMonitorMigrateStartPostCopy(priv->mon); + qemuDomainObjExitMonitor(driver, vm); + + endjob: + if (!qemuDomainObjEndJob(driver, vm)) + vm = NULL; + + cleanup: + if (vm) + virObjectUnlock(vm); + return ret; +} + static int qemuDomainAbortJob(virDomainPtr dom) { virQEMUDriverPtr driver = dom->conn->privateData; @@ -18259,6 +18302,7 @@ static virDriver qemuDriver = { .nodeGetFreePages = qemuNodeGetFreePages, /* 1.2.6 */ .connectGetDomainCapabilities = qemuConnectGetDomainCapabilities, /* 1.2.7 */ .connectGetAllDomainStats = qemuConnectGetAllDomainStats, /* 1.2.8 */ + .domainMigrateStartPostCopy = qemuMigrateStartPostCopy, /* 1.2.9 */
This will need to be updated.
};
diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 14688bf..0b230cc 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -2384,6 +2384,25 @@ int qemuMonitorMigrateToUnix(qemuMonitorPtr mon, return ret; }
+int qemuMonitorMigrateStartPostCopy(qemuMonitorPtr mon) +{ + int ret; + VIR_DEBUG("mon=%p", mon); + + if (!mon) { + virReportError(VIR_ERR_INVALID_ARG, "%s", + _("monitor must not be NULL")); + return -1; + } + + if (mon->json) + ret = qemuMonitorJSONMigrateStartPostCopy(mon); + else + ret = qemuMonitorTextMigrateStartPostCopy(mon);
Just return an error if !mon->json as other recent functions in qemu_monitor.c do.
+ return ret; +} + + int qemuMonitorMigrateCancel(qemuMonitorPtr mon) { int ret; diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 587f779..97d980d 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -539,6 +539,8 @@ int qemuMonitorMigrateToUnix(qemuMonitorPtr mon, unsigned int flags, const char *unixfile);
+int qemuMonitorMigrateStartPostCopy(qemuMonitorPtr mon); + int qemuMonitorMigrateCancel(qemuMonitorPtr mon);
int qemuMonitorGetDumpGuestMemoryCapability(qemuMonitorPtr mon, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index e98962b..14e7f84 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -2798,6 +2798,24 @@ int qemuMonitorJSONMigrate(qemuMonitorPtr mon, return ret; }
+int qemuMonitorJSONMigrateStartPostCopy(qemuMonitorPtr mon) +{ + int ret; + virJSONValuePtr cmd = qemuMonitorJSONMakeCommand("migrate-start-postcopy", NULL); + virJSONValuePtr reply = NULL; + if (!cmd) + return -1;
I think the following would be better to avoid the long line: virJSONValuePtr cmd; cmd = qemuMonitorJSONMakeCommand("migrate-start-postcopy", NULL); if (!cmd) ...
+ + ret = qemuMonitorJSONCommand(mon, cmd, &reply); + + if (ret == 0) + ret = qemuMonitorJSONCheckError(cmd, reply); + + virJSONValueFree(cmd); + virJSONValueFree(reply); + return ret; +} + int qemuMonitorJSONMigrateCancel(qemuMonitorPtr mon) { int ret; diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index d366179..54beb7f 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -151,6 +151,7 @@ int qemuMonitorJSONGetSpiceMigrationStatus(qemuMonitorPtr mon, bool *spice_migrated);
+int qemuMonitorJSONMigrateStartPostCopy(qemuMonitorPtr mon); int qemuMonitorJSONMigrateCancel(qemuMonitorPtr mon);
int qemuMonitorJSONGetDumpGuestMemoryCapability(qemuMonitorPtr mon, ...
This mostly looks good in isolation but I think this is not going to work. When post-copy is started, QEMU on the destination host will be resumed (I'm not sure if that happens automatically or we have to do it), which basically means we need to jump out of the Perform state and call Finish and once it returns, we should keep waiting for the post-copy migration to finish in Confirm state and kill the domain at the end. It's certainly possible the steps we need to do are a bit different since I'm not familiar with all the details about post-copy migration, but I believe we need to do something. And just running a single QEMU command is not enough to start post-copy in libvirt. Jirka

On Wed, Sep 24, 2014 at 15:06:57 +0200, Jiri Denemark wrote:
On Tue, Sep 23, 2014 at 16:09:59 +0200, Cristian Klein wrote:
Signed-off-by: Cristian Klein <cristian.klein@cs.umu.se> --- src/qemu/qemu_driver.c | 44 ++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_monitor.c | 19 +++++++++++++++++++ src/qemu/qemu_monitor.h | 2 ++ src/qemu/qemu_monitor_json.c | 18 ++++++++++++++++++ src/qemu/qemu_monitor_json.h | 1 + src/qemu/qemu_monitor_text.c | 11 +++++++++++ src/qemu/qemu_monitor_text.h | 2 ++
No need to touch qemu_monitor_text.[ch] since we will only use QMP with any QEMU that supports post-copy migration.
7 files changed, 97 insertions(+)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index e73d4f9..02c9a3b 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11861,6 +11861,49 @@ qemuDomainGetJobStats(virDomainPtr dom, }
+static int qemuMigrateStartPostCopy(virDomainPtr dom) +{ + virQEMUDriverPtr driver = dom->conn->privateData; + virDomainObjPtr vm; + int ret = -1; + qemuDomainObjPrivatePtr priv; + + if (!(vm = qemuDomObjFromDomain(dom))) + goto cleanup; + + if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MIGRATION_OP) < 0) + goto cleanup; + + if (!virDomainObjIsActive(vm)) { + virReportError(VIR_ERR_OPERATION_INVALID, + "%s", _("domain is not running")); + goto endjob; + } + + priv = vm->privateData; + + if (priv->job.asyncJob != QEMU_ASYNC_JOB_MIGRATION_OUT) { + virReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("post-copy can only be started " + "while migration is in progress"));
We forbid the use of tabs for indentation. Please, run make syntax-check (in addition to make check) before sending patches to catch this type of issues.
+ goto cleanup; + } + + VIR_DEBUG("Starting post-copy"); + qemuDomainObjEnterMonitor(driver, vm); + ret = qemuMonitorMigrateStartPostCopy(priv->mon); + qemuDomainObjExitMonitor(driver, vm); + + endjob: + if (!qemuDomainObjEndJob(driver, vm)) + vm = NULL; + + cleanup: + if (vm) + virObjectUnlock(vm); + return ret; +} + static int qemuDomainAbortJob(virDomainPtr dom) { virQEMUDriverPtr driver = dom->conn->privateData; @@ -18259,6 +18302,7 @@ static virDriver qemuDriver = { .nodeGetFreePages = qemuNodeGetFreePages, /* 1.2.6 */ .connectGetDomainCapabilities = qemuConnectGetDomainCapabilities, /* 1.2.7 */ .connectGetAllDomainStats = qemuConnectGetAllDomainStats, /* 1.2.8 */ + .domainMigrateStartPostCopy = qemuMigrateStartPostCopy, /* 1.2.9 */
This will need to be updated.
and renamed as qemuDomainMigrateStartPostCopy. Jirka

On 2014-09-24 15:06, Jiri Denemark wrote:
On Tue, Sep 23, 2014 at 16:09:59 +0200, Cristian Klein wrote:
Signed-off-by: Cristian Klein <cristian.klein@cs.umu.se> --- src/qemu/qemu_driver.c | 44 ++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_monitor.c | 19 +++++++++++++++++++ src/qemu/qemu_monitor.h | 2 ++ src/qemu/qemu_monitor_json.c | 18 ++++++++++++++++++ src/qemu/qemu_monitor_json.h | 1 + src/qemu/qemu_monitor_text.c | 11 +++++++++++ src/qemu/qemu_monitor_text.h | 2 ++
No need to touch qemu_monitor_text.[ch] since we will only use QMP with any QEMU that supports post-copy migration.
7 files changed, 97 insertions(+)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index e73d4f9..02c9a3b 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11861,6 +11861,49 @@ qemuDomainGetJobStats(virDomainPtr dom, }
+static int qemuMigrateStartPostCopy(virDomainPtr dom) +{ + virQEMUDriverPtr driver = dom->conn->privateData; + virDomainObjPtr vm; + int ret = -1; + qemuDomainObjPrivatePtr priv; + + if (!(vm = qemuDomObjFromDomain(dom))) + goto cleanup; + + if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MIGRATION_OP) < 0) + goto cleanup; + + if (!virDomainObjIsActive(vm)) { + virReportError(VIR_ERR_OPERATION_INVALID, + "%s", _("domain is not running")); + goto endjob; + } + + priv = vm->privateData; + + if (priv->job.asyncJob != QEMU_ASYNC_JOB_MIGRATION_OUT) { + virReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("post-copy can only be started " + "while migration is in progress"));
We forbid the use of tabs for indentation. Please, run make syntax-check (in addition to make check) before sending patches to catch this type of issues.
+ goto cleanup; + } + + VIR_DEBUG("Starting post-copy"); + qemuDomainObjEnterMonitor(driver, vm); + ret = qemuMonitorMigrateStartPostCopy(priv->mon); + qemuDomainObjExitMonitor(driver, vm); + + endjob: + if (!qemuDomainObjEndJob(driver, vm)) + vm = NULL; + + cleanup: + if (vm) + virObjectUnlock(vm); + return ret; +} + static int qemuDomainAbortJob(virDomainPtr dom) { virQEMUDriverPtr driver = dom->conn->privateData; @@ -18259,6 +18302,7 @@ static virDriver qemuDriver = { .nodeGetFreePages = qemuNodeGetFreePages, /* 1.2.6 */ .connectGetDomainCapabilities = qemuConnectGetDomainCapabilities, /* 1.2.7 */ .connectGetAllDomainStats = qemuConnectGetAllDomainStats, /* 1.2.8 */ + .domainMigrateStartPostCopy = qemuMigrateStartPostCopy, /* 1.2.9 */
This will need to be updated.
};
diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 14688bf..0b230cc 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -2384,6 +2384,25 @@ int qemuMonitorMigrateToUnix(qemuMonitorPtr mon, return ret; }
+int qemuMonitorMigrateStartPostCopy(qemuMonitorPtr mon) +{ + int ret; + VIR_DEBUG("mon=%p", mon); + + if (!mon) { + virReportError(VIR_ERR_INVALID_ARG, "%s", + _("monitor must not be NULL")); + return -1; + } + + if (mon->json) + ret = qemuMonitorJSONMigrateStartPostCopy(mon); + else + ret = qemuMonitorTextMigrateStartPostCopy(mon);
Just return an error if !mon->json as other recent functions in qemu_monitor.c do.
+ return ret; +} + + int qemuMonitorMigrateCancel(qemuMonitorPtr mon) { int ret; diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 587f779..97d980d 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -539,6 +539,8 @@ int qemuMonitorMigrateToUnix(qemuMonitorPtr mon, unsigned int flags, const char *unixfile);
+int qemuMonitorMigrateStartPostCopy(qemuMonitorPtr mon); + int qemuMonitorMigrateCancel(qemuMonitorPtr mon);
int qemuMonitorGetDumpGuestMemoryCapability(qemuMonitorPtr mon, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index e98962b..14e7f84 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -2798,6 +2798,24 @@ int qemuMonitorJSONMigrate(qemuMonitorPtr mon, return ret; }
+int qemuMonitorJSONMigrateStartPostCopy(qemuMonitorPtr mon) +{ + int ret; + virJSONValuePtr cmd = qemuMonitorJSONMakeCommand("migrate-start-postcopy", NULL); + virJSONValuePtr reply = NULL; + if (!cmd) + return -1;
I think the following would be better to avoid the long line:
virJSONValuePtr cmd;
cmd = qemuMonitorJSONMakeCommand("migrate-start-postcopy", NULL); if (!cmd) ...
+ + ret = qemuMonitorJSONCommand(mon, cmd, &reply); + + if (ret == 0) + ret = qemuMonitorJSONCheckError(cmd, reply); + + virJSONValueFree(cmd); + virJSONValueFree(reply); + return ret; +} + int qemuMonitorJSONMigrateCancel(qemuMonitorPtr mon) { int ret; diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index d366179..54beb7f 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -151,6 +151,7 @@ int qemuMonitorJSONGetSpiceMigrationStatus(qemuMonitorPtr mon, bool *spice_migrated);
+int qemuMonitorJSONMigrateStartPostCopy(qemuMonitorPtr mon); int qemuMonitorJSONMigrateCancel(qemuMonitorPtr mon);
int qemuMonitorJSONGetDumpGuestMemoryCapability(qemuMonitorPtr mon, ...
Already fixed for v2.
This mostly looks good in isolation but I think this is not going to work. When post-copy is started, QEMU on the destination host will be resumed (I'm not sure if that happens automatically or we have to do it), which basically means we need to jump out of the Perform state and call Finish and once it returns, we should keep waiting for the post-copy migration to finish in Confirm state and kill the domain at the end. It's certainly possible the steps we need to do are a bit different since I'm not familiar with all the details about post-copy migration, but I believe we need to do something. And just running a single QEMU command is not enough to start post-copy in libvirt.
I'm not sure to follow. I tested the patch and it worked well: A VM that was "unmigratable" with pre-copy was successfully migrated through post-copy. Through the migration protocol, once we start post-copy on the source qemu, the following will happen: - source qemu suspends VM and transfer CPU state; - destination qemu resumes the VM. Could you tell me why you think it's necessary to jump out of Perform state? What is libvirt doing when calling Finish that the destination VM requires to function properly? -- Cristian Klein, PhD Post-doc @ Umeå Universitet http://www8.cs.umu.se/~cklein

On Thu, Sep 25, 2014 at 12:00:41 +0200, Cristian KLEIN wrote:
On 2014-09-24 15:06, Jiri Denemark wrote:
This mostly looks good in isolation but I think this is not going to work. When post-copy is started, QEMU on the destination host will be resumed (I'm not sure if that happens automatically or we have to do it), which basically means we need to jump out of the Perform state and call Finish and once it returns, we should keep waiting for the post-copy migration to finish in Confirm state and kill the domain at the end. It's certainly possible the steps we need to do are a bit different since I'm not familiar with all the details about post-copy migration, but I believe we need to do something. And just running a single QEMU command is not enough to start post-copy in libvirt.
I'm not sure to follow. I tested the patch and it worked well: A VM that was "unmigratable" with pre-copy was successfully migrated through post-copy. Through the migration protocol, once we start post-copy on the source qemu, the following will happen:
- source qemu suspends VM and transfer CPU state; - destination qemu resumes the VM.
Hmm, that's a bit unfortunate. I think we will need a way to tell QEMU not to resume the CPU automatically. The process should flow as follows: - libvirt sends migrate-start-postcopy command to QEMU - QEMU suspends the VM and transfers CPU state - QEMU tells us we can resume the destination - libvirt tells the destination QEMU to resume the VM - libvirt waits until migration is done - libvirt kills the source QEMU Perhaps, we could tell the destination QEMU to resume the VM while the source is transferring CPU state if that's allowed by QEMU to minimize downtime.
Could you tell me why you think it's necessary to jump out of Perform state? What is libvirt doing when calling Finish that the destination VM requires to function properly?
The problem is Finish does more than just resuming the VM on the destination. Before resuming the VM, libvirt needs to transfer locks on resources from the source to the destination, it needs to enable networking for the destination QEMU, etc. Without all this, the VM won't be able to really work on the destination. Not to mention that if something fails while the VM is already resumed on the destination, the code in Perform phase would just abort the migration and resume the VM on the source, which is wrong. We need to kill both ends since non of them has the complete state to be able to continue running the VM. BTW, it's going to work in simple cases, when there's no lock daemon in use, only basic Linux bridge support is used, etc., which is why it works just fine for you. But we need to count with all the non-simple cases too. Jirka

On Thu, Sep 25, 2014 at 02:12:24PM +0200, Jiri Denemark wrote:
On Thu, Sep 25, 2014 at 12:00:41 +0200, Cristian KLEIN wrote:
On 2014-09-24 15:06, Jiri Denemark wrote:
This mostly looks good in isolation but I think this is not going to work. When post-copy is started, QEMU on the destination host will be resumed (I'm not sure if that happens automatically or we have to do it), which basically means we need to jump out of the Perform state and call Finish and once it returns, we should keep waiting for the post-copy migration to finish in Confirm state and kill the domain at the end. It's certainly possible the steps we need to do are a bit different since I'm not familiar with all the details about post-copy migration, but I believe we need to do something. And just running a single QEMU command is not enough to start post-copy in libvirt.
I'm not sure to follow. I tested the patch and it worked well: A VM that was "unmigratable" with pre-copy was successfully migrated through post-copy. Through the migration protocol, once we start post-copy on the source qemu, the following will happen:
- source qemu suspends VM and transfer CPU state; - destination qemu resumes the VM.
Hmm, that's a bit unfortunate. I think we will need a way to tell QEMU not to resume the CPU automatically. The process should flow as follows:
- libvirt sends migrate-start-postcopy command to QEMU - QEMU suspends the VM and transfers CPU state - QEMU tells us we can resume the destination - libvirt tells the destination QEMU to resume the VM - libvirt waits until migration is done - libvirt kills the source QEMU
Perhaps, we could tell the destination QEMU to resume the VM while the source is transferring CPU state if that's allowed by QEMU to minimize downtime.
Could you tell me why you think it's necessary to jump out of Perform state? What is libvirt doing when calling Finish that the destination VM requires to function properly?
The problem is Finish does more than just resuming the VM on the destination. Before resuming the VM, libvirt needs to transfer locks on resources from the source to the destination, it needs to enable networking for the destination QEMU, etc. Without all this, the VM won't be able to really work on the destination. Not to mention that if something fails while the VM is already resumed on the destination, the code in Perform phase would just abort the migration and resume the VM on the source, which is wrong. We need to kill both ends since non of them has the complete state to be able to continue running the VM.
BTW, it's going to work in simple cases, when there's no lock daemon in use, only basic Linux bridge support is used, etc., which is why it works just fine for you. But we need to count with all the non-simple cases too.
Yes, having this work correctly with virtlockd and sanlock is really mandatory for including the code. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 2014-09-25 14:20, Daniel P. Berrange wrote:
On Thu, Sep 25, 2014 at 02:12:24PM +0200, Jiri Denemark wrote:
On Thu, Sep 25, 2014 at 12:00:41 +0200, Cristian KLEIN wrote:
On 2014-09-24 15:06, Jiri Denemark wrote:
This mostly looks good in isolation but I think this is not going to work. When post-copy is started, QEMU on the destination host will be resumed (I'm not sure if that happens automatically or we have to do it), which basically means we need to jump out of the Perform state and call Finish and once it returns, we should keep waiting for the post-copy migration to finish in Confirm state and kill the domain at the end. It's certainly possible the steps we need to do are a bit different since I'm not familiar with all the details about post-copy migration, but I believe we need to do something. And just running a single QEMU command is not enough to start post-copy in libvirt.
I'm not sure to follow. I tested the patch and it worked well: A VM that was "unmigratable" with pre-copy was successfully migrated through post-copy. Through the migration protocol, once we start post-copy on the source qemu, the following will happen:
- source qemu suspends VM and transfer CPU state; - destination qemu resumes the VM.
Hmm, that's a bit unfortunate. I think we will need a way to tell QEMU not to resume the CPU automatically. The process should flow as follows:
- libvirt sends migrate-start-postcopy command to QEMU - QEMU suspends the VM and transfers CPU state - QEMU tells us we can resume the destination - libvirt tells the destination QEMU to resume the VM - libvirt waits until migration is done - libvirt kills the source QEMU
Perhaps, we could tell the destination QEMU to resume the VM while the source is transferring CPU state if that's allowed by QEMU to minimize downtime.
Could you tell me why you think it's necessary to jump out of Perform state? What is libvirt doing when calling Finish that the destination VM requires to function properly?
The problem is Finish does more than just resuming the VM on the destination. Before resuming the VM, libvirt needs to transfer locks on resources from the source to the destination, it needs to enable networking for the destination QEMU, etc. Without all this, the VM won't be able to really work on the destination. Not to mention that if something fails while the VM is already resumed on the destination, the code in Perform phase would just abort the migration and resume the VM on the source, which is wrong. We need to kill both ends since non of them has the complete state to be able to continue running the VM.
BTW, it's going to work in simple cases, when there's no lock daemon in use, only basic Linux bridge support is used, etc., which is why it works just fine for you. But we need to count with all the non-simple cases too.
Yes, having this work correctly with virtlockd and sanlock is really mandatory for including the code.
Thanks for pointing this out. (I had a feeling I was missing something.) I'll study the libvirt code and see how this could be nicely integrated. -- Cristian Klein, PhD Post-doc @ Umeå Universitet http://www8.cs.umu.se/~cklein

* Jiri Denemark (jdenemar@redhat.com) wrote:
On Thu, Sep 25, 2014 at 12:00:41 +0200, Cristian KLEIN wrote:
On 2014-09-24 15:06, Jiri Denemark wrote:
This mostly looks good in isolation but I think this is not going to work. When post-copy is started, QEMU on the destination host will be resumed (I'm not sure if that happens automatically or we have to do it), which basically means we need to jump out of the Perform state and call Finish and once it returns, we should keep waiting for the post-copy migration to finish in Confirm state and kill the domain at the end. It's certainly possible the steps we need to do are a bit different since I'm not familiar with all the details about post-copy migration, but I believe we need to do something. And just running a single QEMU command is not enough to start post-copy in libvirt.
I'm not sure to follow. I tested the patch and it worked well: A VM that was "unmigratable" with pre-copy was successfully migrated through post-copy. Through the migration protocol, once we start post-copy on the source qemu, the following will happen:
- source qemu suspends VM and transfer CPU state; - destination qemu resumes the VM.
Hmm, that's a bit unfortunate. I think we will need a way to tell QEMU not to resume the CPU automatically. The process should flow as follows:
- libvirt sends migrate-start-postcopy command to QEMU - QEMU suspends the VM and transfers CPU state - QEMU tells us we can resume the destination - libvirt tells the destination QEMU to resume the VM - libvirt waits until migration is done - libvirt kills the source QEMU
The destination QEMU should behave the same way as precopy does; i.e. if you run the qemu with -S it should pause rather than start the CPU. If it doesn't it's a bug I can fight (I did test it a while ago, and I think I'm using approximately the same code as precopy to do it). The only difference is with postcopy that point happens way before the migration has finished. Dave
Perhaps, we could tell the destination QEMU to resume the VM while the source is transferring CPU state if that's allowed by QEMU to minimize downtime.
Could you tell me why you think it's necessary to jump out of Perform state? What is libvirt doing when calling Finish that the destination VM requires to function properly?
The problem is Finish does more than just resuming the VM on the destination. Before resuming the VM, libvirt needs to transfer locks on resources from the source to the destination, it needs to enable networking for the destination QEMU, etc. Without all this, the VM won't be able to really work on the destination. Not to mention that if something fails while the VM is already resumed on the destination, the code in Perform phase would just abort the migration and resume the VM on the source, which is wrong. We need to kill both ends since non of them has the complete state to be able to continue running the VM.
BTW, it's going to work in simple cases, when there's no lock daemon in use, only basic Linux bridge support is used, etc., which is why it works just fine for you. But we need to count with all the non-simple cases too.
Jirka -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Signed-off-by: Cristian Klein <cristian.klein@cs.umu.se> --- src/remote/remote_driver.c | 1 + src/remote/remote_protocol.x | 12 +++++++++++- 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c index 75a3a7b..9a6a974 100644 --- a/src/remote/remote_driver.c +++ b/src/remote/remote_driver.c @@ -8152,6 +8152,7 @@ static virDriver remote_driver = { .nodeGetFreePages = remoteNodeGetFreePages, /* 1.2.6 */ .connectGetDomainCapabilities = remoteConnectGetDomainCapabilities, /* 1.2.7 */ .connectGetAllDomainStats = remoteConnectGetAllDomainStats, /* 1.2.8 */ + .domainMigrateStartPostCopy = remoteDomainMigrateStartPostCopy, /* 1.2.9 */ }; static virNetworkDriver network_driver = { diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x index a4ca0c3..c443636 100644 --- a/src/remote/remote_protocol.x +++ b/src/remote/remote_protocol.x @@ -3090,6 +3090,10 @@ struct remote_connect_get_all_domain_stats_args { struct remote_connect_get_all_domain_stats_ret { remote_domain_stats_record retStats<REMOTE_DOMAIN_LIST_MAX>; }; + +struct remote_domain_migrate_start_post_copy_args { + remote_nonnull_domain dom; +}; /*----- Protocol. -----*/ /* Define the program number, protocol version and procedure numbers here. */ @@ -5472,5 +5476,11 @@ enum remote_procedure { * @generate: both * @acl: domain:block_write */ - REMOTE_PROC_DOMAIN_BLOCK_COPY = 345 + REMOTE_PROC_DOMAIN_BLOCK_COPY = 345, + + /** + * @generate: both + * @acl: domain:migrate + */ + REMOTE_PROC_DOMAIN_MIGRATE_START_POST_COPY = 346 }; -- 1.9.1

On Tue, Sep 23, 2014 at 16:10:00 +0200, Cristian Klein wrote:
Signed-off-by: Cristian Klein <cristian.klein@cs.umu.se> --- src/remote/remote_driver.c | 1 + src/remote/remote_protocol.x | 12 +++++++++++- 2 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c index 75a3a7b..9a6a974 100644 --- a/src/remote/remote_driver.c +++ b/src/remote/remote_driver.c @@ -8152,6 +8152,7 @@ static virDriver remote_driver = { .nodeGetFreePages = remoteNodeGetFreePages, /* 1.2.6 */ .connectGetDomainCapabilities = remoteConnectGetDomainCapabilities, /* 1.2.7 */ .connectGetAllDomainStats = remoteConnectGetAllDomainStats, /* 1.2.8 */ + .domainMigrateStartPostCopy = remoteDomainMigrateStartPostCopy, /* 1.2.9 */ };
static virNetworkDriver network_driver = { diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x index a4ca0c3..c443636 100644 --- a/src/remote/remote_protocol.x +++ b/src/remote/remote_protocol.x @@ -3090,6 +3090,10 @@ struct remote_connect_get_all_domain_stats_args { struct remote_connect_get_all_domain_stats_ret { remote_domain_stats_record retStats<REMOTE_DOMAIN_LIST_MAX>; }; + +struct remote_domain_migrate_start_post_copy_args { + remote_nonnull_domain dom; +}; /*----- Protocol. -----*/
/* Define the program number, protocol version and procedure numbers here. */ @@ -5472,5 +5476,11 @@ enum remote_procedure { * @generate: both * @acl: domain:block_write */ - REMOTE_PROC_DOMAIN_BLOCK_COPY = 345 + REMOTE_PROC_DOMAIN_BLOCK_COPY = 345, + + /** + * @generate: both + * @acl: domain:migrate + */ + REMOTE_PROC_DOMAIN_MIGRATE_START_POST_COPY = 346 };
This is pretty straightforward and you could have just squashed this patch into "Added new public API virDomainMigrateStartPostCopy". Jirka

Added a new parameter to the `virsh migrate` command to switch to post-copy migration after a given timeout. Signed-off-by: Cristian Klein <cristian.klein@cs.umu.se> --- tools/virsh-domain.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++-- tools/virsh.pod | 5 ++++ 2 files changed, 75 insertions(+), 2 deletions(-) diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index a6ced5f..1bba710 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -9250,6 +9250,10 @@ static const vshCmdOptDef opts_migrate[] = { .type = VSH_OT_INT, .help = N_("force guest to suspend if live migration exceeds timeout (in seconds)") }, + {.name = "postcopy-after", + .type = VSH_OT_INT, + .help = N_("switch to post-copy migration if live migration exceeds timeout (in seconds)") + }, {.name = "xml", .type = VSH_OT_STRING, .help = N_("filename containing updated XML for the target") @@ -9331,6 +9335,8 @@ doMigrate(void *opaque) VIR_FREE(xml); } + if (vshCommandOptBool(cmd, "postcopy-after")) /* actually an int */ + flags |= VIR_MIGRATE_POSTCOPY; if (vshCommandOptBool(cmd, "live")) flags |= VIR_MIGRATE_LIVE; if (vshCommandOptBool(cmd, "p2p")) @@ -9422,6 +9428,50 @@ vshMigrationTimeout(vshControl *ctl, virDomainSuspend(dom); } +static void +vshMigrationPostCopyAfter(vshControl *ctl, + virDomainPtr dom, + void *opaque ATTRIBUTE_UNUSED) +{ + vshDebug(ctl, VSH_ERR_DEBUG, "starting post-copy\n"); + int rv = virDomainMigrateStartPostCopy(dom); + if (rv < 0) { + vshError(ctl, "%s", _("start post-copy command failed")); + } else { + vshDebug(ctl, VSH_ERR_INFO, "switched to post-copy\n"); + } +} + +/* Parse the --postcopy-after parameter in seconds, but store its + * value in milliseconds. Return -1 on error, 0 if + * no timeout was requested, and 1 if timeout was set. + * Copy-paste-adapted from vshCommandOptTimeoutToMs. + */ +static int +vshCommandOptPostCopyAfterToMs(vshControl *ctl, const vshCmd *cmd, int *postCopyAfter) +{ + int rv = vshCommandOptInt(cmd, "postcopy-after", postCopyAfter); + + if (rv < 0 || (rv > 0 && *postCopyAfter < 0)) { + vshError(ctl, "%s", _("invalid postcopy-after parameter")); + return -1; + } + if (rv > 0) { + /* Ensure that we can multiply by 1000 without overflowing. */ + if (*postCopyAfter > INT_MAX / 1000) { + vshError(ctl, "%s", _("post-copy after parameter is too big")); + return -1; + } + *postCopyAfter *= 1000; + /* 0 is a special value inside virsh, which means no timeout, so + * use 1ms instead for "start post-copy immediately" + */ + if (*postCopyAfter == 0) + *postCopyAfter = 1; + } + return rv; +} + static bool cmdMigrate(vshControl *ctl, const vshCmd *cmd) { @@ -9431,6 +9481,7 @@ cmdMigrate(vshControl *ctl, const vshCmd *cmd) bool verbose = false; bool functionReturn = false; int timeout = 0; + int postCopyAfter = 0; bool live_flag = false; vshCtrlData data = { .dconn = NULL }; @@ -9450,6 +9501,18 @@ cmdMigrate(vshControl *ctl, const vshCmd *cmd) goto cleanup; } + if (vshCommandOptPostCopyAfterToMs(ctl, cmd, &postCopyAfter) < 0) { + goto cleanup; + } else if (postCopyAfter > 0 && !live_flag) { + vshError(ctl, "%s", + _("migrate: Unexpected postcopy-after for offline migration")); + goto cleanup; + } else if (postCopyAfter > 0 && timeout > 0) { + vshError(ctl, "%s", + _("migrate: --postcopy-after is incompatible with --timeout")); + goto cleanup; + } + if (pipe(p) < 0) goto cleanup; @@ -9479,8 +9542,13 @@ cmdMigrate(vshControl *ctl, const vshCmd *cmd) doMigrate, &data) < 0) goto cleanup; - functionReturn = vshWatchJob(ctl, dom, verbose, p[0], timeout, - vshMigrationTimeout, NULL, _("Migration")); + if (postCopyAfter != 0) { + functionReturn = vshWatchJob(ctl, dom, verbose, p[0], postCopyAfter, + vshMigrationPostCopyAfter, NULL, _("Migration")); + } else { + functionReturn = vshWatchJob(ctl, dom, verbose, p[0], timeout, + vshMigrationTimeout, NULL, _("Migration")); + } virThreadJoin(&workerThread); diff --git a/tools/virsh.pod b/tools/virsh.pod index 9919f92..3947720 100644 --- a/tools/virsh.pod +++ b/tools/virsh.pod @@ -1392,6 +1392,7 @@ to the I<uri> namespace is displayed instead of being modified. [I<--compressed>] [I<--abort-on-error>] [I<--auto-converge>] I<domain> I<desturi> [I<migrateuri>] [I<graphicsuri>] [I<listen-address>] [I<dname>] [I<--timeout> B<seconds>] [I<--xml> B<file>] +[I<--postcopy-after> B<seconds>] Migrate domain to another host. Add I<--live> for live migration; <--p2p> for peer-2-peer migration; I<--direct> for direct migration; or I<--tunnelled> @@ -1439,6 +1440,10 @@ I<--timeout> B<seconds> forces guest to suspend when live migration exceeds that many seconds, and then the migration will complete offline. It can only be used with I<--live>. +I<--postcopy-after> switches to post-copy migration when pre-copy migration +exceeds that many seconds. Zero means start post-copy as soon as possible. +It can only be used with I<--live>. + Running migration can be canceled by interrupting virsh (usually using C<Ctrl-C>) or by B<domjobabort> command sent from another virsh instance. -- 1.9.1

On Tue, Sep 23, 2014 at 16:10:01 +0200, Cristian Klein wrote:
Added a new parameter to the `virsh migrate` command to switch to post-copy migration after a given timeout.
Signed-off-by: Cristian Klein <cristian.klein@cs.umu.se> --- tools/virsh-domain.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++-- tools/virsh.pod | 5 ++++ 2 files changed, 75 insertions(+), 2 deletions(-)
diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index a6ced5f..1bba710 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -9250,6 +9250,10 @@ static const vshCmdOptDef opts_migrate[] = { .type = VSH_OT_INT, .help = N_("force guest to suspend if live migration exceeds timeout (in seconds)") }, + {.name = "postcopy-after", + .type = VSH_OT_INT, + .help = N_("switch to post-copy migration if live migration exceeds timeout (in seconds)") + },
I think we want both postcopy and postcopy-after options. And we should also add a dedicated migrate-startpostcopy command.
{.name = "xml", .type = VSH_OT_STRING, .help = N_("filename containing updated XML for the target") @@ -9331,6 +9335,8 @@ doMigrate(void *opaque) VIR_FREE(xml); }
+ if (vshCommandOptBool(cmd, "postcopy-after")) /* actually an int */ + flags |= VIR_MIGRATE_POSTCOPY; if (vshCommandOptBool(cmd, "live")) flags |= VIR_MIGRATE_LIVE; if (vshCommandOptBool(cmd, "p2p")) @@ -9422,6 +9428,50 @@ vshMigrationTimeout(vshControl *ctl, virDomainSuspend(dom); }
+static void +vshMigrationPostCopyAfter(vshControl *ctl, + virDomainPtr dom, + void *opaque ATTRIBUTE_UNUSED) +{ + vshDebug(ctl, VSH_ERR_DEBUG, "starting post-copy\n"); + int rv = virDomainMigrateStartPostCopy(dom); + if (rv < 0) { + vshError(ctl, "%s", _("start post-copy command failed")); + } else { + vshDebug(ctl, VSH_ERR_INFO, "switched to post-copy\n"); + } +} + +/* Parse the --postcopy-after parameter in seconds, but store its + * value in milliseconds. Return -1 on error, 0 if + * no timeout was requested, and 1 if timeout was set. + * Copy-paste-adapted from vshCommandOptTimeoutToMs. + */ +static int +vshCommandOptPostCopyAfterToMs(vshControl *ctl, const vshCmd *cmd, int *postCopyAfter) +{ + int rv = vshCommandOptInt(cmd, "postcopy-after", postCopyAfter); + + if (rv < 0 || (rv > 0 && *postCopyAfter < 0)) { + vshError(ctl, "%s", _("invalid postcopy-after parameter")); + return -1; + } + if (rv > 0) { + /* Ensure that we can multiply by 1000 without overflowing. */ + if (*postCopyAfter > INT_MAX / 1000) { + vshError(ctl, "%s", _("post-copy after parameter is too big")); + return -1; + } + *postCopyAfter *= 1000; + /* 0 is a special value inside virsh, which means no timeout, so + * use 1ms instead for "start post-copy immediately" + */ + if (*postCopyAfter == 0) + *postCopyAfter = 1; + } + return rv; +} + static bool cmdMigrate(vshControl *ctl, const vshCmd *cmd) { @@ -9431,6 +9481,7 @@ cmdMigrate(vshControl *ctl, const vshCmd *cmd) bool verbose = false; bool functionReturn = false; int timeout = 0; + int postCopyAfter = 0; bool live_flag = false; vshCtrlData data = { .dconn = NULL };
@@ -9450,6 +9501,18 @@ cmdMigrate(vshControl *ctl, const vshCmd *cmd) goto cleanup; }
+ if (vshCommandOptPostCopyAfterToMs(ctl, cmd, &postCopyAfter) < 0) { + goto cleanup; + } else if (postCopyAfter > 0 && !live_flag) { + vshError(ctl, "%s", + _("migrate: Unexpected postcopy-after for offline migration")); + goto cleanup; + } else if (postCopyAfter > 0 && timeout > 0) { + vshError(ctl, "%s", + _("migrate: --postcopy-after is incompatible with --timeout")); + goto cleanup; + } +
Is post-copy only allowed for live migrations? I know it doesn't make much sense otherwise but I'm just checking... Anyway, this check belongs to qemu driver rather than virsh so that any API user gets the error message. Jirka

On 2014-09-24 17:19, Jiri Denemark wrote:
On Tue, Sep 23, 2014 at 16:10:01 +0200, Cristian Klein wrote:
Added a new parameter to the `virsh migrate` command to switch to post-copy migration after a given timeout.
Signed-off-by: Cristian Klein <cristian.klein@cs.umu.se> --- tools/virsh-domain.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++-- tools/virsh.pod | 5 ++++ 2 files changed, 75 insertions(+), 2 deletions(-)
diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index a6ced5f..1bba710 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -9250,6 +9250,10 @@ static const vshCmdOptDef opts_migrate[] = { .type = VSH_OT_INT, .help = N_("force guest to suspend if live migration exceeds timeout (in seconds)") }, + {.name = "postcopy-after", + .type = VSH_OT_INT, + .help = N_("switch to post-copy migration if live migration exceeds timeout (in seconds)") + },
I think we want both postcopy and postcopy-after options. And we should also add a dedicated migrate-startpostcopy command.
Just to make sure I understood correctly, the "postcopy" option would enable post-copy without starting it, then the user would need to type "migrate-startpostcopy" to trigger post-copy. If so, then I'm a bit reluctant to such a usage, as the user would need to type "migrate-startpostcopy" from a second instance of virsh, the first one being busy displaying migration progress. Do you think users would find favour/need such a usage?
{.name = "xml", .type = VSH_OT_STRING, .help = N_("filename containing updated XML for the target") @@ -9331,6 +9335,8 @@ doMigrate(void *opaque) VIR_FREE(xml); }
+ if (vshCommandOptBool(cmd, "postcopy-after")) /* actually an int */ + flags |= VIR_MIGRATE_POSTCOPY; if (vshCommandOptBool(cmd, "live")) flags |= VIR_MIGRATE_LIVE; if (vshCommandOptBool(cmd, "p2p")) @@ -9422,6 +9428,50 @@ vshMigrationTimeout(vshControl *ctl, virDomainSuspend(dom); }
+static void +vshMigrationPostCopyAfter(vshControl *ctl, + virDomainPtr dom, + void *opaque ATTRIBUTE_UNUSED) +{ + vshDebug(ctl, VSH_ERR_DEBUG, "starting post-copy\n"); + int rv = virDomainMigrateStartPostCopy(dom); + if (rv < 0) { + vshError(ctl, "%s", _("start post-copy command failed")); + } else { + vshDebug(ctl, VSH_ERR_INFO, "switched to post-copy\n"); + } +} + +/* Parse the --postcopy-after parameter in seconds, but store its + * value in milliseconds. Return -1 on error, 0 if + * no timeout was requested, and 1 if timeout was set. + * Copy-paste-adapted from vshCommandOptTimeoutToMs. + */ +static int +vshCommandOptPostCopyAfterToMs(vshControl *ctl, const vshCmd *cmd, int *postCopyAfter) +{ + int rv = vshCommandOptInt(cmd, "postcopy-after", postCopyAfter); + + if (rv < 0 || (rv > 0 && *postCopyAfter < 0)) { + vshError(ctl, "%s", _("invalid postcopy-after parameter")); + return -1; + } + if (rv > 0) { + /* Ensure that we can multiply by 1000 without overflowing. */ + if (*postCopyAfter > INT_MAX / 1000) { + vshError(ctl, "%s", _("post-copy after parameter is too big")); + return -1; + } + *postCopyAfter *= 1000; + /* 0 is a special value inside virsh, which means no timeout, so + * use 1ms instead for "start post-copy immediately" + */ + if (*postCopyAfter == 0) + *postCopyAfter = 1; + } + return rv; +} + static bool cmdMigrate(vshControl *ctl, const vshCmd *cmd) { @@ -9431,6 +9481,7 @@ cmdMigrate(vshControl *ctl, const vshCmd *cmd) bool verbose = false; bool functionReturn = false; int timeout = 0; + int postCopyAfter = 0; bool live_flag = false; vshCtrlData data = { .dconn = NULL };
@@ -9450,6 +9501,18 @@ cmdMigrate(vshControl *ctl, const vshCmd *cmd) goto cleanup; }
+ if (vshCommandOptPostCopyAfterToMs(ctl, cmd, &postCopyAfter) < 0) { + goto cleanup; + } else if (postCopyAfter > 0 && !live_flag) { + vshError(ctl, "%s", + _("migrate: Unexpected postcopy-after for offline migration")); + goto cleanup; + } else if (postCopyAfter > 0 && timeout > 0) { + vshError(ctl, "%s", + _("migrate: --postcopy-after is incompatible with --timeout")); + goto cleanup; + } +
Is post-copy only allowed for live migrations? I know it doesn't make much sense otherwise but I'm just checking... Anyway, this check belongs to qemu driver rather than virsh so that any API user gets the error message.
Yes, post-copy only makes sense for live migrations. I'll add a check in the qemu driver too, but suggest keeping the above code for coherence with the "--timeout" option. -- Cristian Klein, PhD Post-doc @ Umeå Universitet http://www8.cs.umu.se/~cklein

On 09/23/2014 08:09 AM, Cristian Klein wrote:
Qemu currently implements pre-copy live migration. VM memory pages are first copied from the source hypervisor to the destination, potentially multiple times as pages get dirtied during transfer, then VCPU state is migrated. Unfortunately, if the VM dirties memory faster than the network bandwidth, then pre-copy cannot finish. `virsh` currently includes an option to suspend a VM after a timeout, so that migration may finish, but at the expense of downtime.
A future version of qemu will implement post-copy live migration. The VCPU state is first migrated to the destination hypervisor, then memory pages are pulled from the source hypervisor. Post-copy has the potential to do migration with zero-downtime, despite the VM dirtying pages fast, with minimum performance impact. On the other hand, one post-copy is in progress, any network failure would render the VM unusable, as its memory is partitioned between the source and destination hypervisor. Therefore, post-copy should only be used when necessary.
Thanks for tackling the proof of concept patches.
Post-copy migration in qemu will work as follows: (1) The `x-postcopy-ram` migration capability needs to be set.
This right here is a problem. Qemu has explicitly documented that anything beginning with x- may be subject to change or going away in a later release, so libvirt policy has been to delay any patches targetting an x- interface until upstream qemu renames it to drop the x- to signify that it is now stable. So we can review your patches, but can't apply them yet. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

* Eric Blake (eblake@redhat.com) wrote:
On 09/23/2014 08:09 AM, Cristian Klein wrote:
Qemu currently implements pre-copy live migration. VM memory pages are first copied from the source hypervisor to the destination, potentially multiple times as pages get dirtied during transfer, then VCPU state is migrated. Unfortunately, if the VM dirties memory faster than the network bandwidth, then pre-copy cannot finish. `virsh` currently includes an option to suspend a VM after a timeout, so that migration may finish, but at the expense of downtime.
A future version of qemu will implement post-copy live migration. The VCPU state is first migrated to the destination hypervisor, then memory pages are pulled from the source hypervisor. Post-copy has the potential to do migration with zero-downtime, despite the VM dirtying pages fast, with minimum performance impact. On the other hand, one post-copy is in progress, any network failure would render the VM unusable, as its memory is partitioned between the source and destination hypervisor. Therefore, post-copy should only be used when necessary.
Thanks for tackling the proof of concept patches.
Post-copy migration in qemu will work as follows: (1) The `x-postcopy-ram` migration capability needs to be set.
This right here is a problem. Qemu has explicitly documented that anything beginning with x- may be subject to change or going away in a later release, so libvirt policy has been to delay any patches targetting an x- interface until upstream qemu renames it to drop the x- to signify that it is now stable. So we can review your patches, but can't apply them yet.
It is a shame that libvirt doesn't have a similar mechanism which could be used in conjunction with qemu. My reason for currently leaving the x- in place is that while I don't expect the QML side to change, it seemed fair to put a new feature into QEMU without locking it down for the first cut. This seems to be normal practice in QEMU but invariably means that libvirt support is delayed. Dave
-- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
-- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

On 09/24/2014 02:09 PM, Dr. David Alan Gilbert wrote:
This right here is a problem. Qemu has explicitly documented that anything beginning with x- may be subject to change or going away in a later release, so libvirt policy has been to delay any patches targetting an x- interface until upstream qemu renames it to drop the x- to signify that it is now stable. So we can review your patches, but can't apply them yet.
It is a shame that libvirt doesn't have a similar mechanism which could be used in conjunction with qemu. My reason for currently leaving the x- in place is that while I don't expect the QML side to change, it seemed fair to put a new feature into QEMU without locking it down for the first cut. This seems to be normal practice in QEMU but invariably means that libvirt support is delayed.
Libvirt DOES have the ability to issue raw monitor commands through libvirt-qemu.so, and that might be sufficient for coding up the required hacks. Or, if the libvirt proof of concept patches here do the job, we can feed that back to the qemu community as proof that the feature works enough to lose the x- prefix right away, and commit ourselves to the interface. The benefit of a proof-of-concept patch done in parallel is that the moment the upstream qemu feature is no longer experimental, libvirt can apply the patches right away, and downstream distros can backport those packages in situations where .so stability is not violated. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

* Eric Blake (eblake@redhat.com) wrote:
On 09/24/2014 02:09 PM, Dr. David Alan Gilbert wrote:
This right here is a problem. Qemu has explicitly documented that anything beginning with x- may be subject to change or going away in a later release, so libvirt policy has been to delay any patches targetting an x- interface until upstream qemu renames it to drop the x- to signify that it is now stable. So we can review your patches, but can't apply them yet.
It is a shame that libvirt doesn't have a similar mechanism which could be used in conjunction with qemu. My reason for currently leaving the x- in place is that while I don't expect the QML side to change, it seemed fair to put a new feature into QEMU without locking it down for the first cut. This seems to be normal practice in QEMU but invariably means that libvirt support is delayed.
Libvirt DOES have the ability to issue raw monitor commands through libvirt-qemu.so, and that might be sufficient for coding up the required hacks.
Not really.
Or, if the libvirt proof of concept patches here do the job, we can feed that back to the qemu community as proof that the feature works enough to lose the x- prefix right away, and commit ourselves to the interface. The benefit of a proof-of-concept patch done in parallel is that the moment the upstream qemu feature is no longer experimental, libvirt can apply the patches right away, and downstream distros can backport those packages in situations where .so stability is not violated.
My difficulty here is that 'x-' is used to convey multiple things in qemu: 1) That the interface is unstable 2) That the feature is unstable and might disappear/change 3) That the feature is new and might change internally While we're reasonably happy with (1) and (2) I did want to be able to indicate (3) so that the onwire migration format of the postcopy wasn't locked in yet; however given that the x- is primarily intended to indicate interface stability, and we seem happy with that, I think I'll try dropping the x- and see what qemu-devel says when I try it. Dave -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
participants (6)
-
Cristian KLEIN
-
Cristian Klein
-
Daniel P. Berrange
-
Dr. David Alan Gilbert
-
Eric Blake
-
Jiri Denemark