[libvirt] [PATCH 00/11] Post-Copy Live Migration Support

Qemu currently implements pre-copy live migration: VM memory pages are first copied from the source hypervisor to the destination, potentially multiple times as pages get dirtied during the transfer, and only then is VCPU state migrated. Unfortunately, if the VM dirties memory faster than the available network bandwidth, pre-copy can never finish. `virsh` currently includes an option to suspend a VM after a timeout, so that migration may finish, but at the expense of downtime.

A future version of qemu will implement post-copy live migration: VCPU state is migrated to the destination hypervisor first, and memory pages are then pulled from the source hypervisor on demand. Post-copy can therefore achieve zero-downtime migration with minimal performance impact, even when the VM dirties pages quickly. On the other hand, while post-copy is in progress, any network failure renders the VM unusable, because its memory is partitioned between the source and destination hypervisors. Post-copy should therefore only be used when necessary.

Post-copy migration in qemu will work as follows:

(1) The `x-postcopy-ram` migration capability is set.
(2) Migration is started.
(3) When the user decides so, post-copy is activated by sending the `migrate-start-postcopy` command.
(4) Qemu acknowledges by setting the migration status to `postcopy-active`.

This patch series implements two ways to access the post-copy functionality: low-level and high-level. The low-level API essentially requires the libvirt user to go through the above steps manually: start migration with the `VIR_MIGRATE_ENABLE_POSTCOPY` flag, then, while migration is in progress, call `virDomainMigrateStartPostCopy` from a separate thread. The choice of when migration should switch from pre-copy to post-copy is left entirely to the user. The high-level API implements a policy that automatically triggers post-copy after one pass of pre-copy, which experiments have shown to minimize downtime.
Using it is also simpler: the user only has to start migration with the `VIR_MIGRATE_POSTCOPY_AFTER_PRECOPY` flag.

TODO:
- Wait for the qemu API to become stable, i.e., drop `x-`
- Wait for qemu to offer notification for migration state change

v4:
- Renamed low-level API flag to `VIR_MIGRATE_ENABLE_POSTCOPY`
- Added high-level API flag `VIR_MIGRATE_POSTCOPY_AFTER_PRECOPY`
- Do not introduce a new job type; use migration status instead
- Added both low- and high-level interfaces to virsh
- Tested with OpenStack Icehouse

Cristian Klein (11):
  Added public API for post-copy migration
  qemu: added low-level post-copy migration functions
  qemu: implemented VIR_MIGRATE_ENABLE_POSTCOPY
  qemu: implemented post-copy migration logic
  qemu: implement virDomainMigrateStartPostCopy
  virsh: added --enable-postcopy and migrate-start-postcopy
  virsh: added --postcopy-after to migrate command
  qemu: retrieve dirty sync count
  qemu: implemented VIR_MIGRATE_POSTCOPY_AFTER_PRECOPY
  virsh: added --postcopy-after-precopy to migrate
  Revert "Do not allow changing the UUID of a nwfilter"

 include/libvirt/libvirt-domain.h |   5 ++
 src/conf/nwfilter_conf.c         |  11 ---
 src/driver-hypervisor.h          |   5 ++
 src/libvirt-domain.c             |  92 +++++++++++++++++++++
 src/libvirt_public.syms          |   1 +
 src/qemu/qemu_driver.c           |  60 ++++++++++++++
 src/qemu/qemu_migration.c        | 169 +++++++++++++++++++++++++++++++++++++--
 src/qemu/qemu_migration.h        |   4 +-
 src/qemu/qemu_monitor.c          |  24 +++++-
 src/qemu/qemu_monitor.h          |   5 ++
 src/qemu/qemu_monitor_json.c     |  27 ++++++-
 src/qemu/qemu_monitor_json.h     |   1 +
 src/qemu/qemu_monitor_text.c     |   1 +
 src/remote/remote_driver.c       |   1 +
 src/remote/remote_protocol.x     |  12 ++-
 src/remote_protocol-structs      |   5 ++
 tests/qemumonitorjsontest.c      |   1 +
 tools/virsh-domain.c             | 116 ++++++++++++++++++++++++++-
 tools/virsh.pod                  |  21 +++++
 19 files changed, 536 insertions(+), 25 deletions(-)

--
1.9.1

Signed-off-by: Cristian Klein <cristiklein@gmail.com> --- include/libvirt/libvirt-domain.h | 5 +++ src/driver-hypervisor.h | 5 +++ src/libvirt-domain.c | 92 ++++++++++++++++++++++++++++++++++++++++ src/libvirt_public.syms | 1 + src/remote/remote_driver.c | 1 + src/remote/remote_protocol.x | 12 +++++- src/remote_protocol-structs | 5 +++ 7 files changed, 120 insertions(+), 1 deletion(-) diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h index ae2c49c..090d97a 100644 --- a/include/libvirt/libvirt-domain.h +++ b/include/libvirt/libvirt-domain.h @@ -634,6 +634,8 @@ typedef enum { VIR_MIGRATE_ABORT_ON_ERROR = (1 << 12), /* abort migration on I/O errors happened during migration */ VIR_MIGRATE_AUTO_CONVERGE = (1 << 13), /* force convergence */ VIR_MIGRATE_RDMA_PIN_ALL = (1 << 14), /* RDMA memory pinning */ + VIR_MIGRATE_ENABLE_POSTCOPY = (1 << 15), /* enable (but do not start) post-copy */ + VIR_MIGRATE_POSTCOPY_AFTER_PRECOPY = (1 << 16), /* start post-copy after the first pass of pre-copy */ } virDomainMigrateFlags; @@ -773,6 +775,9 @@ int virDomainMigrateGetMaxSpeed(virDomainPtr domain, unsigned long *bandwidth, unsigned int flags); +int virDomainMigrateStartPostCopy (virDomainPtr domain, + unsigned int flags); + char * virConnectGetDomainCapabilities(virConnectPtr conn, const char *emulatorbin, const char *arch, diff --git a/src/driver-hypervisor.h b/src/driver-hypervisor.h index 9f26b13..a642dea 100644 --- a/src/driver-hypervisor.h +++ b/src/driver-hypervisor.h @@ -613,6 +613,10 @@ typedef int const char *dom_xml); typedef int +(*virDrvDomainMigrateStartPostCopy)(virDomainPtr domain, + unsigned int flags); + +typedef int (*virDrvConnectIsEncrypted)(virConnectPtr conn); typedef int @@ -1396,6 +1400,7 @@ struct _virHypervisorDriver { virDrvConnectGetAllDomainStats connectGetAllDomainStats; virDrvNodeAllocPages nodeAllocPages; virDrvDomainGetFSInfo domainGetFSInfo; + virDrvDomainMigrateStartPostCopy domainMigrateStartPostCopy; }; diff --git 
a/src/libvirt-domain.c b/src/libvirt-domain.c index 2b0defc..f29af21 100644 --- a/src/libvirt-domain.c +++ b/src/libvirt-domain.c @@ -3500,6 +3500,9 @@ virDomainMigrateDirect(virDomainPtr domain, * automatically when supported). * VIR_MIGRATE_UNSAFE Force migration even if it is considered unsafe. * VIR_MIGRATE_OFFLINE Migrate offline + * VIR_MIGRATE_ENABLE_POSTCOPY Enable (but do not start) post-copy + * VIR_MIGRATE_POSTCOPY_AFTER_PRECOPY Start post-copy after the first pass + * of pre-copy * * VIR_MIGRATE_TUNNELLED requires that VIR_MIGRATE_PEER2PEER be set. * Applications using the VIR_MIGRATE_PEER2PEER flag will probably @@ -3536,6 +3539,17 @@ virDomainMigrateDirect(virDomainPtr domain, * not support this feature and will return an error if bandwidth * is not 0. * + * If you want to do post-copy migration, you have two choices: + * either use the low-level mechanism provided by libvirt, or its + * default policy. To use the low-level mechanism, you must first enable + * post-copy migration using the VIR_MIGRATE_ENABLE_POSTCOPY flag. Once + * migration is active, from a separate thread, you may start post-copy + * by calling virDomainMigrateStartPostCopy. + * + * The default post-copy policy implemented in libvirt is to start + * post-copy after the first pass of pre-copy. To enable this behaviour + * start migration with the VIR_MIGRATE_POSTCOPY_AFTER_PRECOPY flag. + * * To see which features are supported by the current hypervisor, * see virConnectGetCapabilities, /capabilities/host/migration_features. * @@ -3716,6 +3730,8 @@ virDomainMigrate(virDomainPtr domain, * automatically when supported). * VIR_MIGRATE_UNSAFE Force migration even if it is considered unsafe. * VIR_MIGRATE_OFFLINE Migrate offline + * VIR_MIGRATE_ENABLE_POSTCOPY Enable (but do not start) post-copy + * VIR_MIGRATE_POSTCOPY_AFTER_PRECOPY Start post-copy after the first pass of pre-copy * * VIR_MIGRATE_TUNNELLED requires that VIR_MIGRATE_PEER2PEER be set. 
* Applications using the VIR_MIGRATE_PEER2PEER flag will probably @@ -3752,6 +3768,17 @@ virDomainMigrate(virDomainPtr domain, * not support this feature and will return an error if bandwidth * is not 0. * + * If you want to do post-copy migration, you have two choices: + * either use the low-level mechanism provided by libvirt, or its + * default policy. To use the low-level mechanism, you must first enable + * post-copy migration using the VIR_MIGRATE_ENABLE_POSTCOPY flag. Once + * migration is active, from a separate thread, you may start post-copy + * by calling virDomainMigrateStartPostCopy. + * + * The default post-copy policy implemented in libvirt is to start + * post-copy after the first pass of pre-copy. To enable this behaviour + * start migration with the VIR_MIGRATE_POSTCOPY_AFTER_PRECOPY flag. + * * To see which features are supported by the current hypervisor, * see virConnectGetCapabilities, /capabilities/host/migration_features. * @@ -4147,6 +4174,8 @@ virDomainMigrate3(virDomainPtr domain, * automatically when supported). * VIR_MIGRATE_UNSAFE Force migration even if it is considered unsafe. * VIR_MIGRATE_OFFLINE Migrate offline + * VIR_MIGRATE_ENABLE_POSTCOPY Enable (but do not start) post-copy + * VIR_MIGRATE_POSTCOPY_AFTER_PRECOPY Start post-copy after the first pass of pre-copy * * The operation of this API hinges on the VIR_MIGRATE_PEER2PEER flag. * If the VIR_MIGRATE_PEER2PEER flag is NOT set, the duri parameter @@ -4179,6 +4208,17 @@ virDomainMigrate3(virDomainPtr domain, * not support this feature and will return an error if bandwidth * is not 0. * + * If you want to do post-copy migration, you have two choices: + * either use the low-level mechanism provided by libvirt, or its + * default policy. To use the low-level mechanism, you must first enable + * post-copy migration using the VIR_MIGRATE_ENABLE_POSTCOPY flag. Once + * migration is active, from a separate thread, you may start post-copy + * by calling virDomainMigrateStartPostCopy. 
+ * + * The default post-copy policy implemented in libvirt is to start + * post-copy after the first pass of pre-copy. To enable this behaviour + * start migration with the VIR_MIGRATE_POSTCOPY_AFTER_PRECOPY flag. + * * To see which features are supported by the current hypervisor, * see virConnectGetCapabilities, /capabilities/host/migration_features. * @@ -4292,6 +4332,8 @@ virDomainMigrateToURI(virDomainPtr domain, * automatically when supported). * VIR_MIGRATE_UNSAFE Force migration even if it is considered unsafe. * VIR_MIGRATE_OFFLINE Migrate offline + * VIR_MIGRATE_ENABLE_POSTCOPY Enable (but do not start) post-copy + * VIR_MIGRATE_POSTCOPY_AFTER_PRECOPY Start post-copy after the first pass of pre-copy * * The operation of this API hinges on the VIR_MIGRATE_PEER2PEER flag. * @@ -4334,6 +4376,17 @@ virDomainMigrateToURI(virDomainPtr domain, * not support this feature and will return an error if bandwidth * is not 0. * + * If you want to do post-copy migration, you have two choices: + * either use the low-level mechanism provided by libvirt, or its + * default policy. To use the low-level mechanism, you must first enable + * post-copy migration using the VIR_MIGRATE_ENABLE_POSTCOPY flag. Once + * migration is active, from a separate thread, you may start post-copy + * by calling virDomainMigrateStartPostCopy. + * + * The default post-copy policy implemented in libvirt is to start + * post-copy after the first pass of pre-copy. To enable this behaviour + * start migration with the VIR_MIGRATE_POSTCOPY_AFTER_PRECOPY flag. + * * To see which features are supported by the current hypervisor, * see virConnectGetCapabilities, /capabilities/host/migration_features. * @@ -8932,6 +8985,45 @@ virDomainMigrateGetMaxSpeed(virDomainPtr domain, /** + * virDomainMigrateStartPostCopy: + * @domain: a domain object + * @flags: extra flags; not used yet, so callers should always pass 0 + * + * Starts post-copy migration. 
This function has to be called while + * migration (initially pre-copy) is in progress. The migration operation + * must be called with the VIR_MIGRATE_ENABLE_POSTCOPY flag. + * + * Returns 0 in case of success, -1 otherwise. + */ +int +virDomainMigrateStartPostCopy(virDomainPtr domain, + unsigned int flags) +{ + virConnectPtr conn; + + VIR_DOMAIN_DEBUG(domain); + + virResetLastError(); + + virCheckDomainReturn(domain, -1); + conn = domain->conn; + + virCheckReadOnlyGoto(conn->flags, error); + + if (conn->driver->domainMigrateStartPostCopy) { + if (conn->driver->domainMigrateStartPostCopy(domain, flags) < 0) + goto error; + return 0; + } + + virReportUnsupportedError(); + error: + virDispatchError(conn); + return -1; +} + + +/** * virConnectDomainEventRegisterAny: * @conn: pointer to the connection * @dom: pointer to the domain diff --git a/src/libvirt_public.syms b/src/libvirt_public.syms index e4c2df1..790f2f3 100644 --- a/src/libvirt_public.syms +++ b/src/libvirt_public.syms @@ -688,6 +688,7 @@ LIBVIRT_1.2.11 { global: virDomainFSInfoFree; virDomainGetFSInfo; + virDomainMigrateStartPostCopy; } LIBVIRT_1.2.9; # .... define new API here using predicted next version number .... 
diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c index 22f0c88..001010f 100644 --- a/src/remote/remote_driver.c +++ b/src/remote/remote_driver.c @@ -8295,6 +8295,7 @@ static virHypervisorDriver hypervisor_driver = { .connectGetAllDomainStats = remoteConnectGetAllDomainStats, /* 1.2.8 */ .nodeAllocPages = remoteNodeAllocPages, /* 1.2.9 */ .domainGetFSInfo = remoteDomainGetFSInfo, /* 1.2.11 */ + .domainMigrateStartPostCopy = remoteDomainMigrateStartPostCopy, /* 1.2.11 */ }; static virNetworkDriver network_driver = { diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x index cbd3ec7..4b13a8e 100644 --- a/src/remote/remote_protocol.x +++ b/src/remote/remote_protocol.x @@ -3143,6 +3143,10 @@ struct remote_domain_get_fsinfo_ret { unsigned int ret; }; +struct remote_domain_migrate_start_post_copy_args { + remote_nonnull_domain dom; + unsigned int flags; +}; /*----- Protocol. -----*/ /* Define the program number, protocol version and procedure numbers here. */ @@ -5550,5 +5554,11 @@ enum remote_procedure { * @generate: none * @acl: domain:fs_freeze */ - REMOTE_PROC_DOMAIN_GET_FSINFO = 349 + REMOTE_PROC_DOMAIN_GET_FSINFO = 349, + + /** + * @generate: both + * @acl: domain:migrate + */ + REMOTE_PROC_DOMAIN_MIGRATE_START_POST_COPY = 350 }; diff --git a/src/remote_protocol-structs b/src/remote_protocol-structs index 2907fd5..c23d7c7 100644 --- a/src/remote_protocol-structs +++ b/src/remote_protocol-structs @@ -2605,6 +2605,10 @@ struct remote_domain_get_fsinfo_ret { } info; u_int ret; }; +struct remote_domain_migrate_start_post_copy_args { + remote_nonnull_domain dom; + u_int flags; +}; enum remote_procedure { REMOTE_PROC_CONNECT_OPEN = 1, REMOTE_PROC_CONNECT_CLOSE = 2, @@ -2955,4 +2959,5 @@ enum remote_procedure { REMOTE_PROC_NODE_ALLOC_PAGES = 347, REMOTE_PROC_DOMAIN_EVENT_CALLBACK_AGENT_LIFECYCLE = 348, REMOTE_PROC_DOMAIN_GET_FSINFO = 349, + REMOTE_PROC_DOMAIN_MIGRATE_START_POST_COPY = 350, }; -- 1.9.1

Signed-off-by: Cristian Klein <cristiklein@gmail.com> --- src/qemu/qemu_monitor.c | 24 ++++++++++++++++++++++-- src/qemu/qemu_monitor.h | 4 ++++ src/qemu/qemu_monitor_json.c | 23 ++++++++++++++++++++++- src/qemu/qemu_monitor_json.h | 1 + 4 files changed, 49 insertions(+), 3 deletions(-) diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index c9c84f9..6ecc35c 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -117,11 +117,11 @@ VIR_ONCE_GLOBAL_INIT(qemuMonitor) VIR_ENUM_IMPL(qemuMonitorMigrationStatus, QEMU_MONITOR_MIGRATION_STATUS_LAST, - "inactive", "active", "completed", "failed", "cancelled", "setup") + "inactive", "active", "completed", "failed", "cancelled", "setup", "postcopy-active") VIR_ENUM_IMPL(qemuMonitorMigrationCaps, QEMU_MONITOR_MIGRATION_CAPS_LAST, - "xbzrle", "auto-converge", "rdma-pin-all") + "xbzrle", "auto-converge", "rdma-pin-all", "x-postcopy-ram") VIR_ENUM_IMPL(qemuMonitorVMStatus, QEMU_MONITOR_VM_STATUS_LAST, @@ -2422,6 +2422,26 @@ int qemuMonitorMigrateToUnix(qemuMonitorPtr mon, return ret; } +int qemuMonitorMigrateStartPostCopy(qemuMonitorPtr mon) +{ + VIR_DEBUG("mon=%p", mon); + + if (!mon) { + virReportError(VIR_ERR_INVALID_ARG, "%s", + _("monitor must not be NULL")); + return -1; + } + + if (!mon->json) { + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", + _("JSON monitor is required")); + return -1; + } + + return qemuMonitorJSONMigrateStartPostCopy(mon); +} + + int qemuMonitorMigrateCancel(qemuMonitorPtr mon) { int ret; diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 21533a4..17bf879 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -452,6 +452,7 @@ enum { QEMU_MONITOR_MIGRATION_STATUS_ERROR, QEMU_MONITOR_MIGRATION_STATUS_CANCELLED, QEMU_MONITOR_MIGRATION_STATUS_SETUP, + QEMU_MONITOR_MIGRATION_STATUS_POSTCOPY_ACTIVE, QEMU_MONITOR_MIGRATION_STATUS_LAST }; @@ -505,6 +506,7 @@ typedef enum { QEMU_MONITOR_MIGRATION_CAPS_XBZRLE, 
QEMU_MONITOR_MIGRATION_CAPS_AUTO_CONVERGE, QEMU_MONITOR_MIGRATION_CAPS_RDMA_PIN_ALL, + QEMU_MONITOR_MIGRATION_CAPS_POSTCOPY, QEMU_MONITOR_MIGRATION_CAPS_LAST } qemuMonitorMigrationCaps; @@ -561,6 +563,8 @@ int qemuMonitorMigrateToUnix(qemuMonitorPtr mon, unsigned int flags, const char *unixfile); +int qemuMonitorMigrateStartPostCopy(qemuMonitorPtr mon); + int qemuMonitorMigrateCancel(qemuMonitorPtr mon); int qemuMonitorGetDumpGuestMemoryCapability(qemuMonitorPtr mon, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 6e251b3..c83f738 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -2557,7 +2557,8 @@ qemuMonitorJSONGetMigrationStatusReply(virJSONValuePtr reply, status->setup_time_set = true; if (status->status == QEMU_MONITOR_MIGRATION_STATUS_ACTIVE || - status->status == QEMU_MONITOR_MIGRATION_STATUS_COMPLETED) { + status->status == QEMU_MONITOR_MIGRATION_STATUS_COMPLETED || + status->status == QEMU_MONITOR_MIGRATION_STATUS_POSTCOPY_ACTIVE) { virJSONValuePtr ram = virJSONValueObjectGet(ret, "ram"); if (!ram) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", @@ -2797,6 +2798,26 @@ int qemuMonitorJSONMigrate(qemuMonitorPtr mon, return ret; } +int qemuMonitorJSONMigrateStartPostCopy(qemuMonitorPtr mon) +{ + int ret; + virJSONValuePtr cmd; + cmd = qemuMonitorJSONMakeCommand("migrate-start-postcopy", NULL); + + virJSONValuePtr reply = NULL; + if (!cmd) + return -1; + + ret = qemuMonitorJSONCommand(mon, cmd, &reply); + + if (ret == 0) + ret = qemuMonitorJSONCheckError(cmd, reply); + + virJSONValueFree(cmd); + virJSONValueFree(reply); + return ret; +} + int qemuMonitorJSONMigrateCancel(qemuMonitorPtr mon) { int ret; diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index ae20fb1..71558c6 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -152,6 +152,7 @@ int qemuMonitorJSONGetSpiceMigrationStatus(qemuMonitorPtr mon, bool *spice_migrated); +int 
qemuMonitorJSONMigrateStartPostCopy(qemuMonitorPtr mon); int qemuMonitorJSONMigrateCancel(qemuMonitorPtr mon); int qemuMonitorJSONGetDumpGuestMemoryCapability(qemuMonitorPtr mon, -- 1.9.1

Signed-off-by: Cristian Klein <cristiklein@gmail.com> --- src/qemu/qemu_migration.c | 82 +++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_migration.h | 3 +- 2 files changed, 84 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index a1b1458..ede938b 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1803,6 +1803,67 @@ qemuMigrationSetOffline(virQEMUDriverPtr driver, static int +qemuMigrationTestPostCopy(virQEMUDriverPtr driver, + virDomainObjPtr vm, + qemuDomainAsyncJob job) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + int ret; + + if (qemuDomainObjEnterMonitorAsync(driver, vm, job) < 0) + return -1; + + ret = qemuMonitorGetMigrationCapability( + priv->mon, + QEMU_MONITOR_MIGRATION_CAPS_POSTCOPY); + + qemuDomainObjExitMonitor(driver, vm); + return ret; +} + + +static int +qemuMigrationSetPostCopy(virQEMUDriverPtr driver, + virDomainObjPtr vm, + qemuDomainAsyncJob job) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + int ret; + + if (job != QEMU_ASYNC_JOB_MIGRATION_OUT) { + virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", + _("Set post-copy only makes sense for outgoing migration")); + } + + if (qemuDomainObjEnterMonitorAsync(driver, vm, job) < 0) + return -1; + + ret = qemuMonitorGetMigrationCapability( + priv->mon, + QEMU_MONITOR_MIGRATION_CAPS_POSTCOPY); + + if (ret < 0) { + goto cleanup; + } else if (ret == 0) { + virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", + _("Post-copy migration is not supported by " + "source QEMU binary")); + ret = -1; + goto cleanup; + } + + ret = qemuMonitorSetMigrationCapability( + priv->mon, + QEMU_MONITOR_MIGRATION_CAPS_POSTCOPY, + true); + + cleanup: + qemuDomainObjExitMonitor(driver, vm); + return ret; +} + + +static int qemuMigrationSetCompression(virQEMUDriverPtr driver, virDomainObjPtr vm, bool state, @@ -2752,6 +2813,15 @@ qemuMigrationPrepareAny(virQEMUDriverPtr driver, dataFD[1] = -1; /* 'st' owns the FD now & will close 
it */ } + if (flags & VIR_MIGRATE_ENABLE_POSTCOPY && + qemuMigrationTestPostCopy(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_IN) < 0) { + virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", + _("Post-copy migration is not supported by " + "target QEMU binary")); + goto stop; + } + if (qemuMigrationSetCompression(driver, vm, flags & VIR_MIGRATE_COMPRESSED, QEMU_ASYNC_JOB_MIGRATION_IN) < 0) @@ -3602,6 +3672,18 @@ qemuMigrationRun(virQEMUDriverPtr driver, QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) goto cleanup; + if (flags & VIR_MIGRATE_ENABLE_POSTCOPY) { + if (!(flags & VIR_MIGRATE_LIVE)) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Enabling post-copy only makes sense with " + "live migration")); + goto cleanup; + } + if (qemuMigrationSetPostCopy(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) + goto cleanup; + } + if (qemuDomainObjEnterMonitorAsync(driver, vm, QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) goto cleanup; diff --git a/src/qemu/qemu_migration.h b/src/qemu/qemu_migration.h index e7a90c3..5d60238 100644 --- a/src/qemu/qemu_migration.h +++ b/src/qemu/qemu_migration.h @@ -41,7 +41,8 @@ VIR_MIGRATE_COMPRESSED | \ VIR_MIGRATE_ABORT_ON_ERROR | \ VIR_MIGRATE_AUTO_CONVERGE | \ - VIR_MIGRATE_RDMA_PIN_ALL) + VIR_MIGRATE_RDMA_PIN_ALL | \ + VIR_MIGRATE_ENABLE_POSTCOPY) /* All supported migration parameters and their types. */ # define QEMU_MIGRATION_PARAMETERS \ -- 1.9.1

Perform phase stops once migration switched to post-copy. Confirm phase waits for post-copy to finish before killing the VM. Signed-off-by: Cristian Klein <cristiklein@gmail.com> --- src/qemu/qemu_driver.c | 8 ++++++++ src/qemu/qemu_migration.c | 46 +++++++++++++++++++++++++++++++++++++++------- 2 files changed, 47 insertions(+), 7 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 07da3e3..06803b4 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11348,6 +11348,14 @@ qemuDomainMigratePrepare2(virConnectPtr dconn, virCheckFlags(QEMU_MIGRATION_FLAGS, -1); + if (flags & VIR_MIGRATE_ENABLE_POSTCOPY) { + /* post-copy migration does not work with Sequence v2 */ + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Post-copy migration requested but not " + "supported by v2 protocol")); + goto cleanup; + } + if (flags & VIR_MIGRATE_TUNNELLED) { /* this is a logical error; we never should have gotten here with * VIR_MIGRATE_TUNNELLED set diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index ede938b..137ddfa 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -2074,6 +2074,11 @@ qemuMigrationUpdateJobStatus(virQEMUDriverPtr driver, ret = 0; break; + case QEMU_MONITOR_MIGRATION_STATUS_POSTCOPY_ACTIVE: + jobInfo->type = VIR_DOMAIN_JOB_BOUNDED; + ret = 0; + break; + case QEMU_MONITOR_MIGRATION_STATUS_INACTIVE: jobInfo->type = VIR_DOMAIN_JOB_NONE; virReportError(VIR_ERR_OPERATION_FAILED, @@ -2106,7 +2111,8 @@ qemuMigrationWaitForCompletion(virQEMUDriverPtr driver, virDomainObjPtr vm, qemuDomainAsyncJob asyncJob, virConnectPtr dconn, - bool abort_on_error) + bool abort_on_error, + bool exit_on_postcopy_active) { qemuDomainObjPrivatePtr priv = vm->privateData; qemuDomainJobInfoPtr jobInfo = priv->job.current; @@ -2129,7 +2135,9 @@ qemuMigrationWaitForCompletion(virQEMUDriverPtr driver, jobInfo->type = VIR_DOMAIN_JOB_UNBOUNDED; - while (jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED) { + while 
(jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED || + (!exit_on_postcopy_active && + jobInfo->type == VIR_DOMAIN_JOB_BOUNDED)) { /* Poll every 50ms for progress & to allow cancellation */ struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000ull }; @@ -2158,7 +2166,8 @@ qemuMigrationWaitForCompletion(virQEMUDriverPtr driver, virObjectLock(vm); } - if (jobInfo->type == VIR_DOMAIN_JOB_COMPLETED) { + if (jobInfo->type == VIR_DOMAIN_JOB_COMPLETED || + jobInfo->type == VIR_DOMAIN_JOB_BOUNDED) { qemuDomainJobInfoUpdateDowntime(jobInfo); VIR_FREE(priv->job.completed); if (VIR_ALLOC(priv->job.completed) == 0) @@ -3190,6 +3199,18 @@ qemuMigrationConfirmPhase(virQEMUDriverPtr driver, virCheckFlags(QEMU_MIGRATION_FLAGS, -1); + /* Wait for post-copy to complete */ + if (flags & VIR_MIGRATE_ENABLE_POSTCOPY) { + bool abort_on_error = !!(flags & VIR_MIGRATE_ABORT_ON_ERROR); + bool exit_on_postcopy_active = false; + rv = qemuMigrationWaitForCompletion(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_OUT, + conn, abort_on_error, + exit_on_postcopy_active); + if (rv < 0) + goto cleanup; + } + qemuMigrationJobSetPhase(driver, vm, retcode == 0 ? 
QEMU_MIGRATION_PHASE_CONFIRM3 @@ -3786,9 +3807,14 @@ qemuMigrationRun(virQEMUDriverPtr driver, !(iothread = qemuMigrationStartTunnel(spec->fwd.stream, fd))) goto cancel; - rc = qemuMigrationWaitForCompletion(driver, vm, - QEMU_ASYNC_JOB_MIGRATION_OUT, - dconn, abort_on_error); + { + bool exit_on_postcopy_active = true; + rc = qemuMigrationWaitForCompletion(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_OUT, + dconn, abort_on_error, + exit_on_postcopy_active); + } + if (rc == -2) goto cancel; else if (rc == -1) @@ -5251,7 +5277,13 @@ qemuMigrationToFile(virQEMUDriverPtr driver, virDomainObjPtr vm, if (rc < 0) goto cleanup; - rc = qemuMigrationWaitForCompletion(driver, vm, asyncJob, NULL, false); + { + bool abort_on_error = false; + bool exit_on_postcopy_active = true; + rc = qemuMigrationWaitForCompletion(driver, vm, asyncJob, NULL, + abort_on_error, + exit_on_postcopy_active); + } if (rc < 0) { if (rc == -2) { -- 1.9.1

Signed-off-by: Cristian Klein <cristiklein@gmail.com>
---
 src/qemu/qemu_driver.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 06803b4..fc7de23 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -12356,6 +12356,55 @@ qemuDomainGetJobStats(virDomainPtr dom,
 }
 
 
+static int qemuDomainMigrateStartPostCopy(virDomainPtr dom,
+                                          unsigned int flags)
+{
+    virQEMUDriverPtr driver = dom->conn->privateData;
+    virDomainObjPtr vm;
+    int ret = -1;
+    qemuDomainObjPrivatePtr priv;
+
+    virCheckFlags(0, -1);
+
+    if (!(vm = qemuDomObjFromDomain(dom)))
+        goto cleanup;
+
+    if (virDomainMigrateStartPostCopyEnsureACL(dom->conn, vm->def) < 0)
+        goto cleanup;
+
+    if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MIGRATION_OP) < 0)
+        goto cleanup;
+
+    if (!virDomainObjIsActive(vm)) {
+        virReportError(VIR_ERR_OPERATION_INVALID,
+                       "%s", _("domain is not running"));
+        goto endjob;
+    }
+
+    priv = vm->privateData;
+
+    if (priv->job.asyncJob != QEMU_ASYNC_JOB_MIGRATION_OUT) {
+        virReportError(VIR_ERR_OPERATION_INVALID, "%s",
+                       _("post-copy can only be started "
+                         "while migration is in progress"));
+        goto endjob;
+    }
+
+    VIR_DEBUG("Starting post-copy");
+    qemuDomainObjEnterMonitor(driver, vm);
+    ret = qemuMonitorMigrateStartPostCopy(priv->mon);
+    qemuDomainObjExitMonitor(driver, vm);
+
+ endjob:
+    if (!qemuDomainObjEndJob(driver, vm))
+        vm = NULL;
+
+ cleanup:
+    if (vm)
+        virObjectUnlock(vm);
+    return ret;
+}
+
 static int qemuDomainAbortJob(virDomainPtr dom)
 {
     virQEMUDriverPtr driver = dom->conn->privateData;
@@ -19088,6 +19137,7 @@ static virHypervisorDriver qemuDriver = {
     .connectGetAllDomainStats = qemuConnectGetAllDomainStats, /* 1.2.8 */
     .nodeAllocPages = qemuNodeAllocPages, /* 1.2.9 */
     .domainGetFSInfo = qemuDomainGetFSInfo, /* 1.2.11 */
+    .domainMigrateStartPostCopy = qemuDomainMigrateStartPostCopy, /* 1.2.11 */
 };
 
 
--
1.9.1

Signed-off-by: Cristian Klein <cristiklein@gmail.com>
---
 tools/virsh-domain.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 tools/virsh.pod      | 11 +++++++++++
 2 files changed, 61 insertions(+)

diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
index 750411b..1753f6e 100644
--- a/tools/virsh-domain.c
+++ b/tools/virsh-domain.c
@@ -9435,6 +9435,10 @@ static const vshCmdOptDef opts_migrate[] = {
      .type = VSH_OT_INT,
      .help = N_("force guest to suspend if live migration exceeds timeout (in seconds)")
     },
+    {.name = "enable-postcopy",
+     .type = VSH_OT_BOOL,
+     .help = N_("enable (but do not start) post-copy migration; to start post-copy use migrate-start-postcopy")
+    },
     {.name = "xml",
      .type = VSH_OT_STRING,
      .help = N_("filename containing updated XML for the target")
@@ -9516,6 +9520,8 @@ doMigrate(void *opaque)
         VIR_FREE(xml);
     }

+    if (vshCommandOptBool(cmd, "enable-postcopy"))
+        flags |= VIR_MIGRATE_ENABLE_POSTCOPY;
     if (vshCommandOptBool(cmd, "live"))
         flags |= VIR_MIGRATE_LIVE;
     if (vshCommandOptBool(cmd, "p2p"))
@@ -12289,6 +12295,44 @@ cmdDomFSInfo(vshControl *ctl, const vshCmd *cmd)
     return ret >= 0;
 }

+/*
+ * "migrate-start-postcopy" command
+ */
+static const vshCmdInfo info_migratestartpostcopy[] = {
+    {.name = "help",
+     .data = N_("Switch running migration from pre-copy to post-copy")
+    },
+    {.name = "desc",
+     .data = N_("Switch running migration from pre-copy to post-copy")
+    },
+    {.name = NULL}
+};
+
+static const vshCmdOptDef opts_migratestartpostcopy[] = {
+    {.name = "domain",
+     .type = VSH_OT_DATA,
+     .flags = VSH_OFLAG_REQ,
+     .help = N_("domain name, id or uuid")
+    },
+    {.name = NULL}
+};
+
+static bool
+cmdMigrateStartPostCopy(vshControl *ctl, const vshCmd *cmd)
+{
+    virDomainPtr dom;
+    bool ret = true;
+
+    if (!(dom = vshCommandOptDomain(ctl, cmd, NULL)))
+        return false;
+
+    if (virDomainMigrateStartPostCopy(dom, 0) < 0)
+        ret = false;
+
+    virDomainFree(dom);
+    return ret;
+}
+
 const vshCmdDef domManagementCmds[] = {
     {.name = "attach-device",
      .handler = cmdAttachDevice,
@@ -12808,5 +12852,11 @@ const vshCmdDef domManagementCmds[] = {
      .info = info_vncdisplay,
      .flags = 0
     },
+    {.name = "migrate-start-postcopy",
+     .handler = cmdMigrateStartPostCopy,
+     .opts = opts_migratestartpostcopy,
+     .info = info_migratestartpostcopy,
+     .flags = 0
+    },
     {.name = NULL}
 };
diff --git a/tools/virsh.pod b/tools/virsh.pod
index 7cde3fd..18c6a23 100644
--- a/tools/virsh.pod
+++ b/tools/virsh.pod
@@ -1428,6 +1428,7 @@ to the I<uri> namespace is displayed instead of being modified.
 [I<--compressed>] [I<--abort-on-error>] [I<--auto-converge>]
 I<domain> I<desturi> [I<migrateuri>] [I<graphicsuri>] [I<listen-address>]
 [I<dname>] [I<--timeout> B<seconds>] [I<--xml> B<file>]
+[I<--enable-postcopy>]

 Migrate domain to another host. Add I<--live> for live migration; <--p2p>
 for peer-2-peer migration; I<--direct> for direct migration; or I<--tunnelled>
@@ -1475,6 +1476,11 @@
 I<--timeout> B<seconds> forces guest to suspend when live migration exceeds
 that many seconds, and then the migration will complete offline. It can only
 be used with I<--live>.

+I<--enable-postcopy> enables post-copy logic in migration, but does not
+actually start post-copy, i.e., migration is started in pre-copy mode.
+Once migration started, the user may switch to post-copy using the
+B<migrate-start-postcopy> command sent from another virsh instance.
+
 Running migration can be canceled by interrupting virsh (usually using
 C<Ctrl-C>) or by B<domjobabort> command sent from another virsh instance.
@@ -1552,6 +1558,11 @@ addresses are accepted as well as hostnames (the resolving is done on
 destination). Some hypervisors do not support this feature and will return an
 error if this parameter is used.

+=item B<migrate-start-postcopy> I<domain>
+
+Switch the current migration from pre-copy to post-copy. A migration needs
+to be in progress, that has been started with I<--enable-postcopy>.
+
 =item B<migrate-setmaxdowntime> I<domain> I<downtime>

 Set maximum tolerable downtime for a domain which is being live-migrated to
--
1.9.1

Signed-off-by: Cristian Klein <cristiklein@gmail.com>
---
 tools/virsh-domain.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 tools/virsh.pod      |  5 +++++
 2 files changed, 63 insertions(+), 2 deletions(-)

diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
index 1753f6e..52e91d9 100644
--- a/tools/virsh-domain.c
+++ b/tools/virsh-domain.c
@@ -9439,6 +9439,10 @@ static const vshCmdOptDef opts_migrate[] = {
      .type = VSH_OT_BOOL,
      .help = N_("enable (but do not start) post-copy migration; to start post-copy use migrate-start-postcopy")
     },
+    {.name = "postcopy-after",
+     .type = VSH_OT_INT,
+     .help = N_("switch to post-copy migration if live migration exceeds timeout (in seconds)")
+    },
     {.name = "xml",
      .type = VSH_OT_STRING,
      .help = N_("filename containing updated XML for the target")
@@ -9522,6 +9526,8 @@ doMigrate(void *opaque)
     if (vshCommandOptBool(cmd, "enable-postcopy"))
         flags |= VIR_MIGRATE_ENABLE_POSTCOPY;
+    if (vshCommandOptBool(cmd, "postcopy-after")) /* actually an int */
+        flags |= VIR_MIGRATE_ENABLE_POSTCOPY;
     if (vshCommandOptBool(cmd, "live"))
         flags |= VIR_MIGRATE_LIVE;
     if (vshCommandOptBool(cmd, "p2p"))
@@ -9612,6 +9618,20 @@ vshMigrationTimeout(vshControl *ctl,
     virDomainSuspend(dom);
 }

+static void
+vshMigrationPostCopyAfter(vshControl *ctl,
+                          virDomainPtr dom,
+                          void *opaque ATTRIBUTE_UNUSED)
+{
+    vshDebug(ctl, VSH_ERR_DEBUG, "starting post-copy\n");
+    int rv = virDomainMigrateStartPostCopy(dom, 0);
+    if (rv < 0) {
+        vshError(ctl, "%s", _("start post-copy command failed"));
+    } else {
+        vshDebug(ctl, VSH_ERR_INFO, "switched to post-copy\n");
+    }
+}
+
 static bool
 cmdMigrate(vshControl *ctl, const vshCmd *cmd)
 {
@@ -9621,8 +9641,10 @@ cmdMigrate(vshControl *ctl, const vshCmd *cmd)
     bool verbose = false;
     bool functionReturn = false;
     int timeout = 0;
+    int postCopyAfter = 0;
     bool live_flag = false;
     vshCtrlData data = { .dconn = NULL };
+    int rv;

     if (!(dom = vshCommandOptDomain(ctl, cmd, NULL)))
         return false;
@@ -9640,6 +9662,35 @@ cmdMigrate(vshControl *ctl, const vshCmd *cmd)
         goto cleanup;
     }

+    rv = vshCommandOptInt(cmd, "postcopy-after", &postCopyAfter);
+    if (rv < 0 || (rv > 0 && postCopyAfter < 0)) {
+        vshError(ctl, "%s", _("invalid postcopy-after parameter"));
+        goto cleanup;
+    }
+    if (rv > 0) {
+        /* Ensure that we can multiply by 1000 without overflowing. */
+        if (postCopyAfter > INT_MAX / 1000) {
+            vshError(ctl, "%s", _("post-copy after parameter is too large"));
+            goto cleanup;
+        }
+        postCopyAfter *= 1000;
+        /* 0 is a special value inside virsh, which means no timeout, so
+         * use 1ms instead for "start post-copy immediately"
+         */
+        if (postCopyAfter == 0)
+            postCopyAfter = 1;
+    }
+
+    if (postCopyAfter > 0 && !live_flag) {
+        vshError(ctl, "%s",
+                 _("migrate: Unexpected postcopy-after for offline migration"));
+        goto cleanup;
+    } else if (postCopyAfter > 0 && timeout > 0) {
+        vshError(ctl, "%s",
+                 _("migrate: --postcopy-after is incompatible with --timeout"));
+        goto cleanup;
+    }
+
     if (pipe(p) < 0)
         goto cleanup;
@@ -9669,8 +9720,13 @@ cmdMigrate(vshControl *ctl, const vshCmd *cmd)
                         doMigrate, &data) < 0)
         goto cleanup;

-    functionReturn = vshWatchJob(ctl, dom, verbose, p[0], timeout,
-                                 vshMigrationTimeout, NULL, _("Migration"));
+    if (postCopyAfter != 0) {
+        functionReturn = vshWatchJob(ctl, dom, verbose, p[0], postCopyAfter,
+                                     vshMigrationPostCopyAfter, NULL, _("Migration"));
+    } else {
+        functionReturn = vshWatchJob(ctl, dom, verbose, p[0], timeout,
+                                     vshMigrationTimeout, NULL, _("Migration"));
+    }

     virThreadJoin(&workerThread);

diff --git a/tools/virsh.pod b/tools/virsh.pod
index 18c6a23..a4c5d34 100644
--- a/tools/virsh.pod
+++ b/tools/virsh.pod
@@ -1429,6 +1429,7 @@ to the I<uri> namespace is displayed instead of being modified.
 I<domain> I<desturi> [I<migrateuri>] [I<graphicsuri>] [I<listen-address>]
 [I<dname>] [I<--timeout> B<seconds>] [I<--xml> B<file>]
 [I<--enable-postcopy>]
+[I<--postcopy-after> B<seconds>]

 Migrate domain to another host. Add I<--live> for live migration; <--p2p>
 for peer-2-peer migration; I<--direct> for direct migration; or I<--tunnelled>
@@ -1481,6 +1482,10 @@ actually start post-copy, i.e., migration is started in pre-copy mode.
 Once migration started, the user may switch to post-copy using the
 B<migrate-start-postcopy> command sent from another virsh instance.

+I<--postcopy-after> switches to post-copy migration when pre-copy migration
+exceeds that many seconds. Zero means start post-copy as soon as possible.
+It can only be used with I<--live>.
+
 Running migration can be canceled by interrupting virsh (usually using
 C<Ctrl-C>) or by B<domjobabort> command sent from another virsh instance.
--
1.9.1

Signed-off-by: Cristian Klein <cristiklein@gmail.com>
---
 src/qemu/qemu_monitor.h      | 1 +
 src/qemu/qemu_monitor_json.c | 4 ++++
 src/qemu/qemu_monitor_text.c | 1 +
 tests/qemumonitorjsontest.c  | 1 +
 4 files changed, 7 insertions(+)

diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h
index 17bf879..4dd0bef 100644
--- a/src/qemu/qemu_monitor.h
+++ b/src/qemu/qemu_monitor.h
@@ -483,6 +483,7 @@ struct _qemuMonitorMigrationStatus {
     unsigned long long ram_duplicate;
     unsigned long long ram_normal;
     unsigned long long ram_normal_bytes;
+    unsigned long long ram_dirty_sync_count; /* how many times pre-copy restarted so far */

     unsigned long long disk_transferred;
     unsigned long long disk_remaining;
diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c
index c83f738..0c4b7ad 100644
--- a/src/qemu/qemu_monitor_json.c
+++ b/src/qemu/qemu_monitor_json.c
@@ -2566,6 +2566,10 @@ qemuMonitorJSONGetMigrationStatusReply(virJSONValuePtr reply,
             return -1;
         }

+        if (virJSONValueObjectGetNumberUlong(ram, "dirty-sync-count",
+                                             &status->ram_dirty_sync_count) < 0) {
+            status->ram_dirty_sync_count = -1; /* silently ignored */
+        }
         if (virJSONValueObjectGetNumberUlong(ram, "transferred",
                                              &status->ram_transferred) < 0) {
             virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
diff --git a/src/qemu/qemu_monitor_text.c b/src/qemu/qemu_monitor_text.c
index 70aeaca..c4c0075 100644
--- a/src/qemu/qemu_monitor_text.c
+++ b/src/qemu/qemu_monitor_text.c
@@ -1435,6 +1435,7 @@ int qemuMonitorTextGetMigrationStatus(qemuMonitorPtr mon,
     int ret = -1;

     memset(status, 0, sizeof(*status));
+    status->ram_dirty_sync_count = -1; /* not implemented for text monitor */

     if (qemuMonitorHMPCommand(mon, "info migrate", &reply) < 0)
         return -1;
diff --git a/tests/qemumonitorjsontest.c b/tests/qemumonitorjsontest.c
index 5bfcd20..343f010 100644
--- a/tests/qemumonitorjsontest.c
+++ b/tests/qemumonitorjsontest.c
@@ -1688,6 +1688,7 @@ testQemuMonitorJSONqemuMonitorJSONGetMigrationStatus(const void *data)
     expectedStatus.ram_total = 1611038720;
     expectedStatus.ram_remaining = 1605013504;
     expectedStatus.ram_transferred = 3625548;
+    expectedStatus.ram_dirty_sync_count = -1;

     if (qemuMonitorTestAddItem(test, "query-migrate",
                                "{"
--
1.9.1

Signed-off-by: Cristian Klein <cristiklein@gmail.com>
---
 src/qemu/qemu_driver.c    |  2 ++
 src/qemu/qemu_migration.c | 49 +++++++++++++++++++++++++++++++++++++++++++----
 src/qemu/qemu_migration.h |  3 ++-
 3 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index fc7de23..53347ea 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -11348,6 +11348,8 @@ qemuDomainMigratePrepare2(virConnectPtr dconn,

     virCheckFlags(QEMU_MIGRATION_FLAGS, -1);

+    if (flags & VIR_MIGRATE_POSTCOPY_AFTER_PRECOPY)
+        flags |= VIR_MIGRATE_ENABLE_POSTCOPY;
     if (flags & VIR_MIGRATE_ENABLE_POSTCOPY) {
         /* post-copy migration does not work with Sequence v2 */
         virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index 137ddfa..e5412c5 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -2112,7 +2112,8 @@ qemuMigrationWaitForCompletion(virQEMUDriverPtr driver,
                                qemuDomainAsyncJob asyncJob,
                                virConnectPtr dconn,
                                bool abort_on_error,
-                               bool exit_on_postcopy_active)
+                               bool exit_on_postcopy_active,
+                               bool trigger_postcopy)
 {
     qemuDomainObjPrivatePtr priv = vm->privateData;
     qemuDomainJobInfoPtr jobInfo = priv->job.current;
@@ -2144,6 +2145,34 @@ qemuMigrationWaitForCompletion(virQEMUDriverPtr driver,
         if (qemuMigrationUpdateJobStatus(driver, vm, job, asyncJob) == -1)
             break;

+        /* automatically switch to post-copy if the user requested so */
+        if (trigger_postcopy &&
+            jobInfo->status.status == QEMU_MONITOR_MIGRATION_STATUS_ACTIVE &&
+            jobInfo->status.ram_dirty_sync_count > 0) {
+            int rv;
+
+            /* Clear variable to prevent sending this command to qemu twice */
+            trigger_postcopy = false;
+
+            if (qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob) < 0) {
+                /* Migration might have finished before we got to
+                 * trigger post-copy */
+                break;
+            }
+            rv = qemuMonitorMigrateStartPostCopy(priv->mon);
+            qemuDomainObjExitMonitor(driver, vm);
+
+            if (rv < 0) {
+                virReportError(VIR_ERR_OPERATION_FAILED,
+                               _("%s: %s"), job,
+                               _("Switching to post-copy failed"));
+                if (abort_on_error)
+                    break;
+            } else {
+                VIR_DEBUG("Switched to post-copy");
+            }
+        }
+
         /* cancel migration if disk I/O error is emitted while migrating */
         if (abort_on_error &&
             virDomainObjGetState(vm, &pauseReason) == VIR_DOMAIN_PAUSED &&
@@ -2822,6 +2851,8 @@ qemuMigrationPrepareAny(virQEMUDriverPtr driver,
         dataFD[1] = -1; /* 'st' owns the FD now & will close it */
     }

+    if (flags & VIR_MIGRATE_POSTCOPY_AFTER_PRECOPY)
+        flags |= VIR_MIGRATE_ENABLE_POSTCOPY;
     if (flags & VIR_MIGRATE_ENABLE_POSTCOPY &&
         qemuMigrationTestPostCopy(driver, vm,
                                   QEMU_ASYNC_JOB_MIGRATION_IN) < 0) {
@@ -3200,13 +3231,17 @@ qemuMigrationConfirmPhase(virQEMUDriverPtr driver,
     virCheckFlags(QEMU_MIGRATION_FLAGS, -1);

     /* Wait for post-copy to complete */
+    if (flags & VIR_MIGRATE_POSTCOPY_AFTER_PRECOPY)
+        flags |= VIR_MIGRATE_ENABLE_POSTCOPY;
     if (flags & VIR_MIGRATE_ENABLE_POSTCOPY) {
         bool abort_on_error = !!(flags & VIR_MIGRATE_ABORT_ON_ERROR);
         bool exit_on_postcopy_active = false;
+        bool trigger_postcopy = false;
         rv = qemuMigrationWaitForCompletion(driver, vm,
                                             QEMU_ASYNC_JOB_MIGRATION_OUT,
                                             conn, abort_on_error,
-                                            exit_on_postcopy_active);
+                                            exit_on_postcopy_active,
+                                            trigger_postcopy);
         if (rv < 0)
             goto cleanup;
     }
@@ -3693,6 +3728,8 @@ qemuMigrationRun(virQEMUDriverPtr driver,
                            QEMU_ASYNC_JOB_MIGRATION_OUT) < 0)
         goto cleanup;

+    if (flags & VIR_MIGRATE_POSTCOPY_AFTER_PRECOPY)
+        flags |= VIR_MIGRATE_ENABLE_POSTCOPY;
     if (flags & VIR_MIGRATE_ENABLE_POSTCOPY) {
         if (!(flags & VIR_MIGRATE_LIVE)) {
             virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
@@ -3809,10 +3846,12 @@ qemuMigrationRun(virQEMUDriverPtr driver,
     {
         bool exit_on_postcopy_active = true;
+        bool trigger_postcopy = (flags & VIR_MIGRATE_POSTCOPY_AFTER_PRECOPY);
         rc = qemuMigrationWaitForCompletion(driver, vm,
                                             QEMU_ASYNC_JOB_MIGRATION_OUT,
                                             dconn, abort_on_error,
-                                            exit_on_postcopy_active);
+                                            exit_on_postcopy_active,
+                                            trigger_postcopy);
     }

     if (rc == -2)
@@ -5280,9 +5319,11 @@ qemuMigrationToFile(virQEMUDriverPtr driver, virDomainObjPtr vm,
     {
         bool abort_on_error = false;
         bool exit_on_postcopy_active = true;
+        bool trigger_postcopy = false;
         rc = qemuMigrationWaitForCompletion(driver, vm, asyncJob, NULL,
                                             abort_on_error,
-                                            exit_on_postcopy_active);
+                                            exit_on_postcopy_active,
+                                            trigger_postcopy);
     }

     if (rc < 0) {
diff --git a/src/qemu/qemu_migration.h b/src/qemu/qemu_migration.h
index 5d60238..8cec9b8 100644
--- a/src/qemu/qemu_migration.h
+++ b/src/qemu/qemu_migration.h
@@ -42,7 +42,8 @@
                                  VIR_MIGRATE_ABORT_ON_ERROR |           \
                                  VIR_MIGRATE_AUTO_CONVERGE |            \
                                  VIR_MIGRATE_RDMA_PIN_ALL |             \
-                                 VIR_MIGRATE_ENABLE_POSTCOPY)
+                                 VIR_MIGRATE_ENABLE_POSTCOPY |          \
+                                 VIR_MIGRATE_POSTCOPY_AFTER_PRECOPY)

 /* All supported migration parameters and their types. */
 # define QEMU_MIGRATION_PARAMETERS                                      \
--
1.9.1

Signed-off-by: Cristian Klein <cristiklein@gmail.com>
---
 tools/virsh-domain.c | 6 ++++++
 tools/virsh.pod      | 5 +++++
 2 files changed, 11 insertions(+)

diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
index 52e91d9..c453dc4 100644
--- a/tools/virsh-domain.c
+++ b/tools/virsh-domain.c
@@ -9443,6 +9443,10 @@ static const vshCmdOptDef opts_migrate[] = {
      .type = VSH_OT_INT,
      .help = N_("switch to post-copy migration if live migration exceeds timeout (in seconds)")
     },
+    {.name = "postcopy-after-precopy",
+     .type = VSH_OT_BOOL,
+     .help = N_("switch to post-copy migration after one pass of pre-copy")
+    },
     {.name = "xml",
      .type = VSH_OT_STRING,
      .help = N_("filename containing updated XML for the target")
@@ -9528,6 +9532,8 @@ doMigrate(void *opaque)
         flags |= VIR_MIGRATE_ENABLE_POSTCOPY;
     if (vshCommandOptBool(cmd, "postcopy-after")) /* actually an int */
         flags |= VIR_MIGRATE_ENABLE_POSTCOPY;
+    if (vshCommandOptBool(cmd, "postcopy-after-precopy"))
+        flags |= VIR_MIGRATE_POSTCOPY_AFTER_PRECOPY;
     if (vshCommandOptBool(cmd, "live"))
         flags |= VIR_MIGRATE_LIVE;
     if (vshCommandOptBool(cmd, "p2p"))
diff --git a/tools/virsh.pod b/tools/virsh.pod
index a4c5d34..22c4447 100644
--- a/tools/virsh.pod
+++ b/tools/virsh.pod
@@ -1430,6 +1430,7 @@
 I<domain> I<desturi> [I<migrateuri>] [I<graphicsuri>] [I<listen-address>]
 [I<dname>] [I<--timeout> B<seconds>] [I<--xml> B<file>]
 [I<--enable-postcopy>] [I<--postcopy-after> B<seconds>]
+[I<--postcopy-after-precopy>]

 Migrate domain to another host. Add I<--live> for live migration; <--p2p>
 for peer-2-peer migration; I<--direct> for direct migration; or I<--tunnelled>
@@ -1486,6 +1487,10 @@ I<--postcopy-after> switches to post-copy migration when pre-copy migration
 exceeds that many seconds. Zero means start post-copy as soon as possible.
 It can only be used with I<--live>.

+I<--postcopy-after-precopy> switches to post-copy migration after the
+first pass of pre-copy. For most VMs, this is the most efficient way to
+do migration while minimizing downtime.
+
 Running migration can be canceled by interrupting virsh (usually using
 C<Ctrl-C>) or by B<domjobabort> command sent from another virsh instance.
--
1.9.1

This reverts commit 46a811db0731cedaea0153fc223faa6096cee5b5.

It causes random problems in OpenStack, which displays the following error:

"""
error : virNWFilterObjAssignDef:3075 : operation failed: filter
'nova-no-nd-reflection' already exists with uuid
ef783c9f-ae1c-4242-8cd5-9cef5ec4fa7a
"""

Special thanks to Vojtech Cima for investigating.

Signed-off-by: Cristian Klein <cristiklein@gmail.com>
---
 src/conf/nwfilter_conf.c | 11 -----------
 1 file changed, 11 deletions(-)

diff --git a/src/conf/nwfilter_conf.c b/src/conf/nwfilter_conf.c
index 074d745..7abeff8 100644
--- a/src/conf/nwfilter_conf.c
+++ b/src/conf/nwfilter_conf.c
@@ -3065,17 +3065,6 @@ virNWFilterObjAssignDef(virNWFilterObjListPtr nwfilters,
             return NULL;
         }
         virNWFilterObjUnlock(nwfilter);
-    } else {
-        nwfilter = virNWFilterObjFindByName(nwfilters, def->name);
-        if (nwfilter) {
-            char uuidstr[VIR_UUID_STRING_BUFLEN];
-            virUUIDFormat(nwfilter->def->uuid, uuidstr);
-            virReportError(VIR_ERR_OPERATION_FAILED,
-                           _("filter '%s' already exists with uuid %s"),
-                           def->name, uuidstr);
-            virNWFilterObjUnlock(nwfilter);
-            return NULL;
-        }
     }

     if (virNWFilterDefLoopDetect(nwfilters, def) < 0) {
--
1.9.1

On Mon, Dec 01, 2014 at 17:00:03 +0100, Cristian Klein wrote:
This reverts commit 46a811db0731cedaea0153fc223faa6096cee5b5. It causes random problems in OpenStack, which displays the following error:
""" error : virNWFilterObjAssignDef:3075 : operation failed: filter 'nova-no-nd-reflection' already exists with uuid ef783c9f-ae1c-4242-8cd5-9cef5ec4fa7a """
Special thanks to Vojtech Cima for investigating.
Signed-off-by: Cristian Klein <cristiklein@gmail.com>
---
 src/conf/nwfilter_conf.c | 11 -----------
 1 file changed, 11 deletions(-)
diff --git a/src/conf/nwfilter_conf.c b/src/conf/nwfilter_conf.c
index 074d745..7abeff8 100644
--- a/src/conf/nwfilter_conf.c
+++ b/src/conf/nwfilter_conf.c
@@ -3065,17 +3065,6 @@ virNWFilterObjAssignDef(virNWFilterObjListPtr nwfilters,
             return NULL;
         }
         virNWFilterObjUnlock(nwfilter);
-    } else {
-        nwfilter = virNWFilterObjFindByName(nwfilters, def->name);
-        if (nwfilter) {
-            char uuidstr[VIR_UUID_STRING_BUFLEN];
-            virUUIDFormat(nwfilter->def->uuid, uuidstr);
-            virReportError(VIR_ERR_OPERATION_FAILED,
-                           _("filter '%s' already exists with uuid %s"),
-                           def->name, uuidstr);
-            virNWFilterObjUnlock(nwfilter);
-            return NULL;
-        }
     }
if (virNWFilterDefLoopDetect(nwfilters, def) < 0) {
NACK to this one. Not only is this completely unrelated to post-copy migration, it is also wrong. If openstack is trying to redefine an existing nwfilter but uses a different UUID (or no UUID at all), it's openstack which needs to be fixed.

Jirka

On 2014-12-02 17:03, Jiri Denemark wrote:
On Mon, Dec 01, 2014 at 17:00:03 +0100, Cristian Klein wrote:
This reverts commit 46a811db0731cedaea0153fc223faa6096cee5b5. It causes random problems in OpenStack, which displays the following error:
""" error : virNWFilterObjAssignDef:3075 : operation failed: filter 'nova-no-nd-reflection' already exists with uuid ef783c9f-ae1c-4242-8cd5-9cef5ec4fa7a """
Special thanks to Vojtech Cima for investigating.
Signed-off-by: Cristian Klein <cristiklein@gmail.com>
---
 src/conf/nwfilter_conf.c | 11 -----------
 1 file changed, 11 deletions(-)
diff --git a/src/conf/nwfilter_conf.c b/src/conf/nwfilter_conf.c
index 074d745..7abeff8 100644
--- a/src/conf/nwfilter_conf.c
+++ b/src/conf/nwfilter_conf.c
@@ -3065,17 +3065,6 @@ virNWFilterObjAssignDef(virNWFilterObjListPtr nwfilters,
             return NULL;
         }
         virNWFilterObjUnlock(nwfilter);
-    } else {
-        nwfilter = virNWFilterObjFindByName(nwfilters, def->name);
-        if (nwfilter) {
-            char uuidstr[VIR_UUID_STRING_BUFLEN];
-            virUUIDFormat(nwfilter->def->uuid, uuidstr);
-            virReportError(VIR_ERR_OPERATION_FAILED,
-                           _("filter '%s' already exists with uuid %s"),
-                           def->name, uuidstr);
-            virNWFilterObjUnlock(nwfilter);
-            return NULL;
-        }
     }
if (virNWFilterDefLoopDetect(nwfilters, def) < 0) {
NACK to this one. Not only this is completely unrelated to post-copy migration but it is also wrong. If openstack is trying to redefine an existing nwfilter but uses different UUID (or no UUID at all), it's openstack which needs to be fixed.
I just observed that later versions of OpenStack fixed this. Sorry for the noise. For future reference:

https://review.openstack.org/#/c/122721/
https://review.openstack.org/#/c/122721/1/nova/virt/libvirt/firewall.py

Cristian

Hello Cristian, On 02.12.2014 17:23, Cristian KLEIN wrote:
On 2014-12-02 17:03, Jiri Denemark wrote:
On Mon, Dec 01, 2014 at 17:00:03 +0100, Cristian Klein wrote:
This reverts commit 46a811db0731cedaea0153fc223faa6096cee5b5. It causes random problems in OpenStack, which displays the following error:
""" error : virNWFilterObjAssignDef:3075 : operation failed: filter 'nova-no-nd-reflection' already exists with uuid ef783c9f-ae1c-4242-8cd5-9cef5ec4fa7a """
Special thanks to Vojtech Cima for investigating.
Signed-off-by: Cristian Klein <cristiklein@gmail.com>
---
 src/conf/nwfilter_conf.c | 11 -----------
 1 file changed, 11 deletions(-)
diff --git a/src/conf/nwfilter_conf.c b/src/conf/nwfilter_conf.c
index 074d745..7abeff8 100644
--- a/src/conf/nwfilter_conf.c
+++ b/src/conf/nwfilter_conf.c
@@ -3065,17 +3065,6 @@ virNWFilterObjAssignDef(virNWFilterObjListPtr nwfilters,
             return NULL;
         }
         virNWFilterObjUnlock(nwfilter);
-    } else {
-        nwfilter = virNWFilterObjFindByName(nwfilters, def->name);
-        if (nwfilter) {
-            char uuidstr[VIR_UUID_STRING_BUFLEN];
-            virUUIDFormat(nwfilter->def->uuid, uuidstr);
-            virReportError(VIR_ERR_OPERATION_FAILED,
-                           _("filter '%s' already exists with uuid %s"),
-                           def->name, uuidstr);
-            virNWFilterObjUnlock(nwfilter);
-            return NULL;
-        }
     }
if (virNWFilterDefLoopDetect(nwfilters, def) < 0) {
NACK to this one. Not only this is completely unrelated to post-copy migration but it is also wrong. If openstack is trying to redefine an existing nwfilter but uses different UUID (or no UUID at all), it's openstack which needs to be fixed.
I just observed that later versions of OpenStack fixed this. Sorry for the noise. For future references:
https://review.openstack.org/#/c/122721/ https://review.openstack.org/#/c/122721/1/nova/virt/libvirt/firewall.py
Thank you, didn't know about this one. Sorry for complications.
Cristian

Regards,
Vojtech

On 12/01/2014 10:59 AM, Cristian Klein wrote:
Qemu currently implements pre-copy live migration. VM memory pages are first copied from the source hypervisor to the destination, potentially multiple times as pages get dirtied during transfer, then VCPU state is migrated. Unfortunately, if the VM dirties memory faster than the network bandwidth, then pre-copy cannot finish. `virsh` currently includes an option to suspend a VM after a timeout, so that migration may finish, but at the expense of downtime.
A future version of qemu will implement post-copy live migration. The VCPU state is first migrated to the destination hypervisor, then memory pages are pulled from the source hypervisor. Post-copy has the potential to do migration with zero-downtime, despite the VM dirtying pages fast, with minimum performance impact. On the other hand, while post-copy is in progress, any network failure would render the VM unusable, as its memory is partitioned between the source and destination hypervisor. Therefore, post-copy should only be used when necessary.
Post-copy migration in qemu will work as follows: (1) The `x-postcopy-ram` migration capability needs to be set. (2) Migration is started. (3) When the user decides so, post-copy migration is activated by sending the `migrate-start-postcopy` command. (4) Qemu acknowledges by setting migration status to `postcopy-active`.
(there are probably inaccuracies and misstatements in the following, but the topic does need consideration, and this seemed like a good place to bring it up while it's fresh in my mind...)

I happened to be thinking about post-copy migration vs. guest networking over the weekend, and realized a potential problem related to starting the destination domain so quickly after it is created - if the guest is connected to the network via a host bridge that has STP enabled and a non-zero forwarding delay, the guest's network traffic could be interrupted until the delay timer has counted down. This points out a couple of things:

1) the "migrate-start-postcopy" needs to be either sent, or acknowledged (I'm not sure which coincides more closely with the stopping of the source domain and starting of the destination domain) after the destination domain's tap devices have existed and been connected to the bridge long enough to be able to forward traffic.

2) libvirt needs to have a more formal separation of the following tasks:

* allocate resources for a network device (i.e. networkAllocateActualDevice())
* create a network device (create and ifup the tap device, which would start timers counting down; in the case of macvtap, the device should be created, but not ifup'ed)
* activate a network device (for a tap device send a gratuitous arp request, update the bridge's FDB for the guest's MAC address. For macvtap, ifup the device)

It should also have the reverse of all these operations:

* deactivate (remove fdb entries for tap, ifdown for macvtap)
* destroy (delete the tap/macvtap device)
* free (networkReleaseActualDevice())

Additionally, for completeness we need "notify" which is done for each guest interface any time libvirtd is restarted (this already exists in networkNotifyActualDevice()); this just recreates libvirtd's tables of which host interfaces are in use by guests.
Currently, libvirt does create and activate simultaneously (and also qemu does a gratuitous ARP request at some point, although I haven't checked if it happens when qemu starts or when the guest CPUs are started), and deactivate, destroy, and free all happen at pretty much the same time as well. The former leads to problems like this one reported by dgilbert:

https://bugzilla.redhat.com/show_bug.cgi?id=1081461

This is just one of several possible variations of "some parts of the network have incorrect information about where MAC X is currently located"; when you mix in post-copy migration, and manual handling of the bridge FDB (https://www.redhat.com/archives/libvir-list/2014-December/msg00173.html), there are many opportunities for failure!

Back to my list of operations - to make migration work smoothly, allocate and create should be done prior to starting the qemu process, but activate shouldn't be done until just before the CPUs are turned on (and ideally, *that* shouldn't happen until the connection to the device is ready to forward traffic). Likewise, deactivate should be called as soon as the CPUs are paused, while destroy/free should be done after qemu is terminated. This way, the guest's MAC will only be in one bridge's FDB at any given time, and it will be the FDB of the bridge attached to the currently running instance.

Does anybody else have any thoughts/ideas on this subject? Cleaning up the hypervisor drivers' use of network devices has been on my mind for a long time, and it may be time to finally take action.

On 2014-12-04 10:40, Laine Stump wrote:
On 12/01/2014 10:59 AM, Cristian Klein wrote:
Qemu currently implements pre-copy live migration. VM memory pages are first copied from the source hypervisor to the destination, potentially multiple times as pages get dirtied during transfer, then VCPU state is migrated. Unfortunately, if the VM dirties memory faster than the network bandwidth, then pre-copy cannot finish. `virsh` currently includes an option to suspend a VM after a timeout, so that migration may finish, but at the expense of downtime.
A future version of qemu will implement post-copy live migration. The VCPU state is first migrated to the destination hypervisor, then memory pages are pulled from the source hypervisor. Post-copy has the potential to do migration with zero-downtime, despite the VM dirtying pages fast, with minimum performance impact. On the other hand, while post-copy is in progress, any network failure would render the VM unusable, as its memory is partitioned between the source and destination hypervisor. Therefore, post-copy should only be used when necessary.
Post-copy migration in qemu will work as follows: (1) The `x-postcopy-ram` migration capability needs to be set. (2) Migration is started. (3) When the user decides so, post-copy migration is activated by sending the `migrate-start-postcopy` command. (4) Qemu acknowledges by setting migration status to `postcopy-active`.
(there are probably inaccuracies and misstatements in the following, but the topic does need consideration, and this seemed like a good place to bring it up while it's fresh in my mind...)
I happened to be thinking about post-copy migration vs. guest networking over the weekend, and realized a potential problem related to starting the destination domain so quickly after it is created - if the guest is connected to the network via a host bridge that has STP enabled and a non-zero forwarding delay, the guest's network traffic could be interrupted until the delay timer has counted down. This points out a couple of things:
1) the "migrate-start-postcopy" needs to be either sent, or acknowledged (I'm not sure which coincides more closely with the stopping of the source domain and starting of the destination domain) after the destination domain's tap devices have existed and been connected to the bridge long enough to be able to forward traffic.
2) libvirt needs to have a more formal separation of the following tasks:
* allocate resources for a network device (i.e. networkAllocateActualDevice()) * create a network device (create and ifup the tap device, which would start timers counting down; in the case of macvtap, the device should be created, but not ifup'ed) * activate a network device (for a tap device send a gratuitous arp request, update the bridge's FDB for the guest's MAC address. For macvtap, ifup the device)
It should also have the reverse of all these operations:
* deactivate (remove fdb entries for tap, ifdown for macvtap) * destroy (delete the tap/macvtap device) * free (networkReleaseActualDevice())
Additionally, for completeness we need "notify" which is done for each guest interface any time libvirtd is restarted (this already exists in networkNotifyActualDevice()); this just recreates libvirtd's tables of which host interfaces are in use by guests.
Currently, libvirt does create and activate simultaneously (and also qemu does a gratuitous ARP request at some point, although I haven't checked if it happens when qemu starts or when the guest CPUs are started), and deactivate, destroy, and free all happen at pretty much the same time as well. The former leads to problems like this one reported by dgilbert:
https://bugzilla.redhat.com/show_bug.cgi?id=1081461
This is just one of several possible variations of "some parts of the network have incorrect information about where MAC X is currently located"; when you mix in post-copy migration, and manual handling of the bridge FDB (https://www.redhat.com/archives/libvir-list/2014-December/msg00173.html), there are many opportunities for failure!
Back to my list of operations - to make migration work smoothly, allocate and create should be done prior to starting the qemu process, but activate shouldn't be done until just before the CPUs are turned on (and ideally, *that* shouldn't happen until the connection to the device is ready to forward traffic). Likewise, deactivate should be called as soon as the CPUs are paused, while destroy/free should be done after qemu is terminated. This way, the guest's MAC will only be in one bridge's FDB at any given time, and it will be the FDB of the bridge attached to the currently running instance.
Does anybody else have any thoughts/ideas on this subject? Cleaning up the hypervisor drivers' use of network devices has been on my mind for a long time, and it may be time to finally take action.
Hi Laine,

I am not sufficiently familiar with libvirt's internals to contribute too much to the discussion, but I'll try to share my experience with how networking behaves with post-copy migration.

First of all, I would strongly recommend disabling STP on bridges that are involved in post-copy migration. STP adds too much downtime, which goes pretty much against the benefits of post-copy live migration.

Second, I observed that qemu announces itself when the CPUs are resumed on the destination. Hence, at least from outside, it seems like the FDB are updated correctly.

Cristian.

On 12/04/2014 05:09 AM, Cristian KLEIN wrote:
On 2014-12-04 10:40, Laine Stump wrote:
Currently, libvirt does create and activate simultaneously (and also qemu does a gratuitous ARP request at some point, although I haven't checked if it happens when qemu starts or when the guest CPUs are started), and deactivate, destroy, and free all happen at pretty much the same time as well. The former leads to problems like this one reported by dgilbert:
https://bugzilla.redhat.com/show_bug.cgi?id=1081461
This is just one of several possible variations of "some parts of the network have incorrect information about where MAC X is currently located"; when you mix in post-copy migration, and manual handling of the bridge FDB (https://www.redhat.com/archives/libvir-list/2014-December/msg00173.html),
there are many opportunities for failure!
(BTW, sorry for interjecting this into your migration patches, after this I'll create a new thread when/if I have more to say)

Another problem that I've discovered due to the haphazardness of netdev initialization - the network "plugged" hook is called before the tap device has been created, so the XML given to the hook will, if anything, contain an out-of-date tap device name (and the tap device won't exist to be manipulated by the hook anyway).
First of all, I would strongly recommend disabling STP on bridges that are involved in post-copy migration. STP adds too much downtime, which goes pretty much against the benefits of post-copy live migration.
... as long as STP isn't required to avoid forwarding loops, and as long as the admin is schooled enough to know they should disable it. But we should either explicitly forbid it (by logging an error when it's encountered, or at least document that it doesn't work) or do whatever we can to make it operate properly in those cases.
Second, I observed that qemu announces itself when the CPUs are resumed on the destination.
Good to know.
Hence, at least from outside, it seems like the FDB are updated correctly.
For a host bridge using the kernel's builtin flood/learning to populate the fdb, that likely is the case. With the current state of the code, things aren't so rosy for macvtap, nor for my new libvirt-managed fdb updates. I need to work on that. (Mainly I posted to this thread to see what other problems people may have encountered in this area, to make sure all of them get handled, and to see if anyone thought my suggested changes were crack-based.)