[libvirt] [PATCH 0/3] qemu: libvirt RDMA live migration support

From: "Michael R. Hines" <mrhines@us.ibm.com> QEMU has in tree now planned for 1.6 support for RDMA-based live migration. Changes to libvirt: 1. QEMU has a new 'setup' phase in their state machine. 2. Expose the 'x-rdma' migration protocol URI. 3. Expose the 'x-rdma-pin-all' capability for pre-registration of memory. Michael R. Hines (3): qemu: handle new 'setup' migration state qemu: RDMA migration support using 'x-rdma' URI qemu: memory pre-pinning support for RDMA migration include/libvirt/libvirt.h.in | 3 + src/qemu/qemu_capabilities.c | 7 +++ src/qemu/qemu_capabilities.h | 4 ++ src/qemu/qemu_command.c | 8 +++ src/qemu/qemu_migration.c | 131 ++++++++++++++++++++++++++++++++++++------ src/qemu/qemu_migration.h | 3 +- src/qemu/qemu_monitor.c | 7 ++- src/qemu/qemu_monitor.h | 13 +++++ src/qemu/qemu_monitor_json.c | 18 ++++++ tools/virsh-domain.c | 7 +++ 10 files changed, 178 insertions(+), 23 deletions(-) -- 1.7.10.4

From: "Michael R. Hines" <mrhines@us.ibm.com> Previously, QEMU's 'setup' state was no a formal state in their state machine, but it is now. This state is used by RDMA to optionally perform memory pinning. This state is now exposed over the monitor and also measured in the migration info status. This patch consumes both the new setup state as well as the timestamp of the total time spent in that state as reported by QEMU. RDMA migrations perform an optional 'pin-all' operation du Signed-off-by: Michael R. Hines <mrhines@us.ibm.com> --- include/libvirt/libvirt.h.in | 2 ++ src/qemu/qemu_migration.c | 6 ++++++ src/qemu/qemu_monitor.c | 2 +- src/qemu/qemu_monitor.h | 11 +++++++++++ src/qemu/qemu_monitor_json.c | 18 ++++++++++++++++++ 5 files changed, 38 insertions(+), 1 deletion(-) diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index c0eb25b..31fb37e 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -4048,6 +4048,8 @@ struct _virDomainJobInfo { /* Time is measured in mill-seconds */ unsigned long long timeElapsed; /* Always set */ unsigned long long timeRemaining; /* Only for VIR_DOMAIN_JOB_BOUNDED */ + unsigned long long setupTime; /* length of the SETUP phase */ + double mbps; /* Migration throughput in Mbps */ /* Data is measured in bytes unless otherwise specified * and is measuring the job as a whole diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 3dedfe8..19001b9 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1667,6 +1667,12 @@ qemuMigrationUpdateJobStatus(virQEMUDriverPtr driver, priv->job.status.ram_transferred + priv->job.status.disk_transferred; + priv->job.info.mbps = priv->job.status.mbps; + priv->job.info.setupTime = priv->job.status.setup_time; + + ret = 0; + break; + case QEMU_MONITOR_MIGRATION_STATUS_SETUP: ret = 0; break; diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 0b73411..86aed75 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -108,7 +108,7 @@ VIR_ONCE_GLOBAL_INIT(qemuMonitor) VIR_ENUM_IMPL(qemuMonitorMigrationStatus, QEMU_MONITOR_MIGRATION_STATUS_LAST, - "inactive", "active", "completed", "failed", "cancelled") + "inactive", "setup", "active", "completed", "failed", "cancelled") VIR_ENUM_IMPL(qemuMonitorMigrationCaps, QEMU_MONITOR_MIGRATION_CAPS_LAST, diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 4a55501..82e6ae2 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -348,6 +348,7 @@ int qemuMonitorSetMigrationCacheSize(qemuMonitorPtr mon, enum { QEMU_MONITOR_MIGRATION_STATUS_INACTIVE, + QEMU_MONITOR_MIGRATION_STATUS_SETUP, QEMU_MONITOR_MIGRATION_STATUS_ACTIVE, QEMU_MONITOR_MIGRATION_STATUS_COMPLETED, QEMU_MONITOR_MIGRATION_STATUS_ERROR, @@ -366,6 +367,16 @@ struct _qemuMonitorMigrationStatus { /* total or expected depending on status */ bool downtime_set; unsigned long long downtime; + /* + * Duration of the QEMU 'setup' state. + * for RDMA, this may be on the order of several seconds + * if pinning support is requested before the migration begins. + */ + unsigned long long setup_time; + /* + * Migration throughput in Mbps. + */ + double mbps; unsigned long long ram_transferred; unsigned long long ram_remaining; diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 12f7e69..20ca05e 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -2426,6 +2426,14 @@ qemuMonitorJSONGetMigrationStatusReply(virJSONValuePtr reply, virJSONValueObjectGetNumberUlong(ram, "normal-bytes", &status->ram_normal_bytes); + if (virJSONValueObjectGetNumberDouble(ram, "mbps", + &status->mbps) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("migration was active, but RAM 'mbps' " + "data was missing")); + return -1; + } + virJSONValuePtr disk = virJSONValueObjectGet(ret, "disk"); if (disk) { rc = virJSONValueObjectGetNumberUlong(disk, "transferred", @@ -2504,6 +2512,16 @@ qemuMonitorJSONGetMigrationStatusReply(virJSONValuePtr reply, return -1; } } + + rc = virJSONValueObjectGetNumberUlong(ret, "setup-time", + &status->setup_time); + + if (rc < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("migration was active, but no setup-time was set")); + return -1; + } + } return 0; -- 1.7.10.4

On Fri, Jul 26, 2013 at 13:47:43 -0400, mrhines@linux.vnet.ibm.com wrote:
From: "Michael R. Hines" <mrhines@us.ibm.com>
Previously, QEMU's 'setup' state was no a formal state in their state machine, but it is now. This state is used by RDMA to optionally perform memory pinning. This state is now exposed over the monitor and also measured in the migration info status.
This patch consumes both the new setup state as well as the timestamp of the total time spent in that state as reported by QEMU.
RDMA migrations perform an optional 'pin-all' operation du
Signed-off-by: Michael R. Hines <mrhines@us.ibm.com> --- include/libvirt/libvirt.h.in | 2 ++ src/qemu/qemu_migration.c | 6 ++++++ src/qemu/qemu_monitor.c | 2 +- src/qemu/qemu_monitor.h | 11 +++++++++++ src/qemu/qemu_monitor_json.c | 18 ++++++++++++++++++ 5 files changed, 38 insertions(+), 1 deletion(-)
diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index c0eb25b..31fb37e 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -4048,6 +4048,8 @@ struct _virDomainJobInfo { /* Time is measured in mill-seconds */ unsigned long long timeElapsed; /* Always set */ unsigned long long timeRemaining; /* Only for VIR_DOMAIN_JOB_BOUNDED */ + unsigned long long setupTime; /* length of the SETUP phase */ + double mbps; /* Migration throughput in Mbps */
/* Data is measured in bytes unless otherwise specified * and is measuring the job as a whole
NACK You can't change content of existing C structures that are part or public API. Running make syntax-check will warn you when you try to do that. Look at virDomainGetJobStats (commits v1.0.2-239-g4dd00f4 to v1.0.2-244-g4121a77) which can return extended migration statistics. Jirka

On 07/26/2013 02:17 PM, Jiri Denemark wrote:
On Fri, Jul 26, 2013 at 13:47:43 -0400, mrhines@linux.vnet.ibm.com wrote:
From: "Michael R. Hines" <mrhines@us.ibm.com>
Previously, QEMU's 'setup' state was no a formal state in their state machine, but it is now. This state is used by RDMA to optionally perform memory pinning. This state is now exposed over the monitor and also measured in the migration info status.
This patch consumes both the new setup state as well as the timestamp of the total time spent in that state as reported by QEMU.
RDMA migrations perform an optional 'pin-all' operation du
Signed-off-by: Michael R. Hines <mrhines@us.ibm.com> --- include/libvirt/libvirt.h.in | 2 ++ src/qemu/qemu_migration.c | 6 ++++++ src/qemu/qemu_monitor.c | 2 +- src/qemu/qemu_monitor.h | 11 +++++++++++ src/qemu/qemu_monitor_json.c | 18 ++++++++++++++++++ 5 files changed, 38 insertions(+), 1 deletion(-)
diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index c0eb25b..31fb37e 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -4048,6 +4048,8 @@ struct _virDomainJobInfo { /* Time is measured in mill-seconds */ unsigned long long timeElapsed; /* Always set */ unsigned long long timeRemaining; /* Only for VIR_DOMAIN_JOB_BOUNDED */ + unsigned long long setupTime; /* length of the SETUP phase */ + double mbps; /* Migration throughput in Mbps */
/* Data is measured in bytes unless otherwise specified * and is measuring the job as a whole NACK
You can't change content of existing C structures that are part or public API. Running make syntax-check will warn you when you try to do that. Look at virDomainGetJobStats (commits v1.0.2-239-g4dd00f4 to v1.0.2-244-g4121a77) which can return extended migration statistics.
Jirka
Acknowledged - Will re-work it. - Michael

On 07/26/2013 11:47 AM, mrhines@linux.vnet.ibm.com wrote:
From: "Michael R. Hines" <mrhines@us.ibm.com>
Previously, QEMU's 'setup' state was no a formal state in their state machine, but it is now. This state is used by RDMA to optionally perform memory pinning. This state is now exposed over the monitor and also measured in the migration info status.
This patch consumes both the new setup state as well as the timestamp of the total time spent in that state as reported by QEMU.
RDMA migrations perform an optional 'pin-all' operation du
Signed-off-by: Michael R. Hines <mrhines@us.ibm.com> --- include/libvirt/libvirt.h.in | 2 ++ src/qemu/qemu_migration.c | 6 ++++++ src/qemu/qemu_monitor.c | 2 +- src/qemu/qemu_monitor.h | 11 +++++++++++ src/qemu/qemu_monitor_json.c | 18 ++++++++++++++++++ 5 files changed, 38 insertions(+), 1 deletion(-)
diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index c0eb25b..31fb37e 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -4048,6 +4048,8 @@ struct _virDomainJobInfo { /* Time is measured in mill-seconds */ unsigned long long timeElapsed; /* Always set */ unsigned long long timeRemaining; /* Only for VIR_DOMAIN_JOB_BOUNDED */ + unsigned long long setupTime; /* length of the SETUP phase */ + double mbps; /* Migration throughput in Mbps */
This series adds a new feature, and we're already in freeze for 1.1.1, so it would have to be post-freeze. NACK; this is a change to ABI. We cannot change the existing struct size without invalidating apps compiled against an earlier version of the library (ie. virDomainGetJobInfo is cast in stone). Rather, we should be adding new VIR_DOMAIN_JOB_* macros for use in the virTypedParameters of virDomainGetJobStats(). It does point out a bug that can be fixed during freeze: we should fix our documentation so that virDomainGetJobInfo() refers to the better virDomainGetJobStats(). Somewhat related: I mentioned in another thread that neither virDomainGetJob{Info,Stats} nor virDomainGetBlockJobInfo play nicely with the idea of being able to have multiple jobs running in parallel, which is necessary if we want to be able to support fleecing of point-in-time snapshots from multiple clients. Ultimately, that means I think we need to enhance the job mechanism so that new jobs return a handle, then we add new API that can query and abort jobs based on the handle instead of being limited to only the current job; I guess for back-compat, it would also help to have an API that sets the current job out of a set of jobs so that the old APIs are still usable. If we do an upgrade of the job API, we would make sure that the new query method sticks to virTypedParameters in the same was as virDomainGetJobStats(). -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 07/26/2013 02:32 PM, Eric Blake wrote:
On 07/26/2013 11:47 AM, mrhines@linux.vnet.ibm.com wrote:
From: "Michael R. Hines" <mrhines@us.ibm.com>
Previously, QEMU's 'setup' state was no a formal state in their state machine, but it is now. This state is used by RDMA to optionally perform memory pinning. This state is now exposed over the monitor and also measured in the migration info status.
This patch consumes both the new setup state as well as the timestamp of the total time spent in that state as reported by QEMU.
RDMA migrations perform an optional 'pin-all' operation du
Signed-off-by: Michael R. Hines <mrhines@us.ibm.com> --- include/libvirt/libvirt.h.in | 2 ++ src/qemu/qemu_migration.c | 6 ++++++ src/qemu/qemu_monitor.c | 2 +- src/qemu/qemu_monitor.h | 11 +++++++++++ src/qemu/qemu_monitor_json.c | 18 ++++++++++++++++++ 5 files changed, 38 insertions(+), 1 deletion(-)
diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index c0eb25b..31fb37e 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -4048,6 +4048,8 @@ struct _virDomainJobInfo { /* Time is measured in mill-seconds */ unsigned long long timeElapsed; /* Always set */ unsigned long long timeRemaining; /* Only for VIR_DOMAIN_JOB_BOUNDED */ + unsigned long long setupTime; /* length of the SETUP phase */ + double mbps; /* Migration throughput in Mbps */ This series adds a new feature, and we're already in freeze for 1.1.1, so it would have to be post-freeze.
No problem - we're not in a hurry to get libvirt RDMA support merged - just wanted to make sure we get these reviews through so all the code plays nicely.
NACK; this is a change to ABI. We cannot change the existing struct size without invalidating apps compiled against an earlier version of the library (ie. virDomainGetJobInfo is cast in stone). Rather, we should be adding new VIR_DOMAIN_JOB_* macros for use in the virTypedParameters of virDomainGetJobStats().
Acknowledged.
It does point out a bug that can be fixed during freeze: we should fix our documentation so that virDomainGetJobInfo() refers to the better virDomainGetJobStats().
Somewhat related: I mentioned in another thread that neither virDomainGetJob{Info,Stats} nor virDomainGetBlockJobInfo play nicely with the idea of being able to have multiple jobs running in parallel, which is necessary if we want to be able to support fleecing of point-in-time snapshots from multiple clients. Ultimately, that means I think we need to enhance the job mechanism so that new jobs return a handle, then we add new API that can query and abort jobs based on the handle instead of being limited to only the current job; I guess for back-compat, it would also help to have an API that sets the current job out of a set of jobs so that the old APIs are still usable. If we do an upgrade of the job API, we would make sure that the new query method sticks to virTypedParameters in the same was as virDomainGetJobStats().

From: "Michael R. Hines" <mrhines@us.ibm.com> QEMU has in tree now for version 1.6 support for RDMA Live migration. Full documenation of the feature: http://wiki.qemu.org/Features/RDMALiveMigration This patch includes mainly making all the locations in libvirt where the 'tcp' string was hard-coded to be more flexible to use more than one protocol. While the RDMA protocol has been extensively tested (from multiple companies as well as virt-test), the protocol 'x-rdma' will later be renamed to 'rdma' after the community has allowed the feature more cooking. Example usage: virsh migrate --live --migrateuri x-rdma:hostname domain qemu+tcp://hostname/system Signed-off-by: Michael R. Hines <mrhines@us.ibm.com> --- src/qemu/qemu_capabilities.c | 7 ++++ src/qemu/qemu_capabilities.h | 4 +++ src/qemu/qemu_command.c | 8 +++++ src/qemu/qemu_migration.c | 75 +++++++++++++++++++++++++++++++----------- src/qemu/qemu_monitor.c | 3 +- src/qemu/qemu_monitor.h | 1 + 6 files changed, 78 insertions(+), 20 deletions(-) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 5dc3c9e..94d17c6 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -234,6 +234,8 @@ VIR_ENUM_IMPL(virQEMUCaps, QEMU_CAPS_LAST, "vnc-share-policy", /* 150 */ "device-del-event", + + "x-rdma", /* 152 */ ); struct _virQEMUCaps { @@ -1101,6 +1103,7 @@ virQEMUCapsComputeCmdFlags(const char *help, * -incoming unix (qemu >= 0.12.0) * -incoming fd (qemu >= 0.12.0) * -incoming stdio (all earlier kvm) + * -incoming x-rdma (qemu >= 1.6.0) * * NB, there was a pre-kvm-79 'tcp' support, but it * was broken, because it blocked the monitor console @@ -2437,6 +2440,10 @@ virQEMUCapsInitArchQMPBasic(virQEMUCapsPtr qemuCaps, char *archstr = NULL; int ret = -1; + if (qemuCaps->version >= MIN_X_RDMA_VERSION) { + virQEMUCapsSet(qemuCaps, QEMU_CAPS_MIGRATE_QEMU_X_RDMA); + } + if (!(archstr = qemuMonitorGetTargetArch(mon))) return -1; diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index f5f685d..5069552 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -191,9 +191,13 @@ enum virQEMUCapsFlags { QEMU_CAPS_VNC_SHARE_POLICY = 150, /* set display sharing policy */ QEMU_CAPS_DEVICE_DEL_EVENT = 151, /* DEVICE_DELETED event */ + QEMU_CAPS_MIGRATE_QEMU_X_RDMA = 152, /* have qemu x-rdma migration */ + QEMU_CAPS_LAST, /* this must always be the last item */ }; +#define MIN_X_RDMA_VERSION 1006000 + typedef struct _virQEMUCaps virQEMUCaps; typedef virQEMUCaps *virQEMUCapsPtr; diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index aa3a2fd..a26acd7 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -8657,6 +8657,14 @@ qemuBuildCommandLine(virConnectPtr conn, goto error; } virCommandAddArg(cmd, migrateFrom); + } else if (STRPREFIX(migrateFrom, "x-rdma")) { + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_MIGRATE_QEMU_X_RDMA)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + "%s", _("RDMA migration is not supported with " + "this QEMU binary")); + goto error; + } + virCommandAddArg(cmd, migrateFrom); } else if (STREQ(migrateFrom, "stdio")) { if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_MIGRATE_QEMU_FD)) { virCommandAddArgFormat(cmd, "fd:%d", migrateFd); diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 19001b9..de20d23 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -2169,7 +2169,8 @@ qemuMigrationPrepareAny(virQEMUDriverPtr driver, virDomainDefPtr *def, virStreamPtr st, unsigned int port, - unsigned long flags) + unsigned long flags, + const char *protocol) { virDomainObjPtr vm = NULL; virDomainEventPtr event = NULL; @@ -2280,7 +2281,8 @@ qemuMigrationPrepareAny(virQEMUDriverPtr driver, * and there is at least one IPv6 address configured */ if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_IPV6_MIGRATION) && - getaddrinfo("::", NULL, &hints, &info) == 0) { + getaddrinfo("::", NULL, &hints, &info) == 0 && + !strstr(protocol, "rdma")) { freeaddrinfo(info); listenAddr = "[::]"; } else { @@ -2291,7 +2293,8 @@ qemuMigrationPrepareAny(virQEMUDriverPtr driver, /* QEMU will be started with -incoming [::]:port * or -incoming 0.0.0.0:port */ - if (virAsprintf(&migrateFrom, "tcp:%s:%d", listenAddr, port) < 0) + if (virAsprintf(&migrateFrom, "%s:%s:%d", protocol, + listenAddr, port) < 0) goto cleanup; } @@ -2482,7 +2485,7 @@ qemuMigrationPrepareTunnel(virQEMUDriverPtr driver, ret = qemuMigrationPrepareAny(driver, dconn, cookiein, cookieinlen, cookieout, cookieoutlen, def, - st, 0, flags); + st, 0, flags, "tcp"); return ret; } @@ -2502,6 +2505,8 @@ qemuMigrationPrepareDirect(virQEMUDriverPtr driver, static int port = 0; int this_port; char *hostname = NULL; + const char *protocol = NULL; + char *well_formed_protocol = NULL; const char *p; char *uri_str = NULL; int ret = -1; @@ -2550,20 +2555,29 @@ qemuMigrationPrepareDirect(virQEMUDriverPtr driver, if (virAsprintf(uri_out, "tcp:%s:%d", hostname, this_port) < 0) goto cleanup; } else { - /* Check the URI starts with "tcp:". We will escape the + /* Check the URI starts with a valid prefix. We will escape the * URI when passing it to the qemu monitor, so bad * characters in hostname part don't matter. */ - if (!(p = STRSKIP(uri_in, "tcp:"))) { - virReportError(VIR_ERR_INVALID_ARG, "%s", - _("only tcp URIs are supported for KVM/QEMU" - " migrations")); + + protocol = strtok(strdup(uri_in), ":"); + if (protocol) { + if (virAsprintf(&well_formed_protocol, "%s://", protocol) < 0) + goto cleanup; + } + + /* Make sure it's a valid protocol */ + if (!(p = STRSKIP(uri_in, "tcp:")) && + !(p = STRSKIP(uri_in, "x-rdma:"))) { + virReportError(VIR_ERR_INVALID_ARG, _("URI %s (%s) not supported" + " for KVM/QEMU migrations"), protocol, uri_in); goto cleanup; } - /* Convert uri_in to well-formed URI with // after tcp: */ - if (!(STRPREFIX(uri_in, "tcp://"))) { - if (virAsprintf(&uri_str, "tcp://%s", p) < 0) + + /* Convert uri_in to well-formed URI with // after colon */ + if (!(STRPREFIX(uri_in, well_formed_protocol))) { + if (virAsprintf(&uri_str, "%s://%s", protocol, p) < 0) goto cleanup; } @@ -2602,10 +2616,20 @@ qemuMigrationPrepareDirect(virQEMUDriverPtr driver, ret = qemuMigrationPrepareAny(driver, dconn, cookiein, cookieinlen, cookieout, cookieoutlen, def, - NULL, this_port, flags); + NULL, this_port, flags, + protocol ? protocol : "tcp"); cleanup: virURIFree(uri); VIR_FREE(hostname); + + if (protocol) { + VIR_FREE(protocol); + } + + if (well_formed_protocol) { + VIR_FREE(well_formed_protocol); + } + if (ret != 0) VIR_FREE(*uri_out); return ret; @@ -2800,6 +2824,7 @@ struct _qemuMigrationSpec { enum qemuMigrationDestinationType destType; union { struct { + const char *proto; const char *name; int port; } host; @@ -3161,6 +3186,7 @@ qemuMigrationRun(virQEMUDriverPtr driver, switch (spec->destType) { case MIGRATION_DEST_HOST: ret = qemuMonitorMigrateToHost(priv->mon, migrate_flags, + spec->dest.host.proto, spec->dest.host.name, spec->dest.host.port); break; @@ -3291,7 +3317,7 @@ cancel: goto cleanup; } -/* Perform migration using QEMU's native TCP migrate support, +/* Perform migration using QEMU's native migrate support, * not encrypted obviously */ static int doNativeMigrate(virQEMUDriverPtr driver, @@ -3309,6 +3335,8 @@ static int doNativeMigrate(virQEMUDriverPtr driver, qemuDomainObjPrivatePtr priv = vm->privateData; virURIPtr uribits = NULL; int ret = -1; + char *tmp = NULL; + bool rdma = false; qemuMigrationSpec spec; VIR_DEBUG("driver=%p, vm=%p, uri=%s, cookiein=%s, cookieinlen=%d, " @@ -3318,20 +3346,29 @@ static int doNativeMigrate(virQEMUDriverPtr driver, cookieout, cookieoutlen, flags, resource, NULLSTR(graphicsuri)); + /* HACK: source host generates bogus URIs, so fix them up */ if (STRPREFIX(uri, "tcp:") && !STRPREFIX(uri, "tcp://")) { - char *tmp; - /* HACK: source host generates bogus URIs, so fix them up */ if (virAsprintf(&tmp, "tcp://%s", uri + strlen("tcp:")) < 0) return -1; - uribits = virURIParse(tmp); - VIR_FREE(tmp); + spec.dest.host.proto = "tcp"; + } else if (STRPREFIX(uri, "x-rdma:") && !STRPREFIX(uri, "x-rdma://")) { + if (virAsprintf(&tmp, "x-rdma://%s", uri + strlen("x-rdma:")) < 0) + return -1; + rdma = true; + spec.dest.host.proto = "x-rdma"; } else { uribits = virURIParse(uri); } + + if (tmp) { + uribits = virURIParse(tmp); + VIR_FREE(tmp); + } + if (!uribits) return -1; - if (virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_MIGRATE_QEMU_FD)) + if (virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_MIGRATE_QEMU_FD) && !rdma) spec.destType = MIGRATION_DEST_CONNECT_HOST; else spec.destType = MIGRATION_DEST_HOST; diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 86aed75..ce95174 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -2098,6 +2098,7 @@ int qemuMonitorMigrateToFd(qemuMonitorPtr mon, int qemuMonitorMigrateToHost(qemuMonitorPtr mon, unsigned int flags, + const char *proto, const char *hostname, int port) { @@ -2113,7 +2114,7 @@ int qemuMonitorMigrateToHost(qemuMonitorPtr mon, } - if (virAsprintf(&uri, "tcp:%s:%d", hostname, port) < 0) + if (virAsprintf(&uri, "%s:%s:%d", proto, hostname, port) < 0) return -1; if (mon->json) diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 82e6ae2..d722e12 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -429,6 +429,7 @@ int qemuMonitorMigrateToFd(qemuMonitorPtr mon, int qemuMonitorMigrateToHost(qemuMonitorPtr mon, unsigned int flags, + const char *proto, const char *hostname, int port); -- 1.7.10.4

On Fri, Jul 26, 2013 at 13:47:44 -0400, mrhines@linux.vnet.ibm.com wrote:
From: "Michael R. Hines" <mrhines@us.ibm.com>
QEMU has in tree now for version 1.6 support for RDMA Live migration. Full documenation of the feature: http://wiki.qemu.org/Features/RDMALiveMigration
This patch includes mainly making all the locations in libvirt where the 'tcp' string was hard-coded to be more flexible to use more than one protocol.
While the RDMA protocol has been extensively tested (from multiple companies as well as virt-test), the protocol 'x-rdma' will later be renamed to 'rdma' after the community has allowed the feature more cooking.
This does not prevent us from calling the protocol "rdma" right away and possibly translating it to "x-rdma". However, I don't think we actually want to commit patches for rdma migration before QEMU changes the name to "rdma".
Example usage:
virsh migrate --live --migrateuri x-rdma:hostname domain qemu+tcp://hostname/system
Signed-off-by: Michael R. Hines <mrhines@us.ibm.com> --- src/qemu/qemu_capabilities.c | 7 ++++ src/qemu/qemu_capabilities.h | 4 +++ src/qemu/qemu_command.c | 8 +++++ src/qemu/qemu_migration.c | 75 +++++++++++++++++++++++++++++++----------- src/qemu/qemu_monitor.c | 3 +- src/qemu/qemu_monitor.h | 1 + 6 files changed, 78 insertions(+), 20 deletions(-)
diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 5dc3c9e..94d17c6 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -234,6 +234,8 @@ VIR_ENUM_IMPL(virQEMUCaps, QEMU_CAPS_LAST,
"vnc-share-policy", /* 150 */ "device-del-event", + + "x-rdma", /* 152 */ );
struct _virQEMUCaps { @@ -1101,6 +1103,7 @@ virQEMUCapsComputeCmdFlags(const char *help, * -incoming unix (qemu >= 0.12.0) * -incoming fd (qemu >= 0.12.0) * -incoming stdio (all earlier kvm) + * -incoming x-rdma (qemu >= 1.6.0) * * NB, there was a pre-kvm-79 'tcp' support, but it * was broken, because it blocked the monitor console @@ -2437,6 +2440,10 @@ virQEMUCapsInitArchQMPBasic(virQEMUCapsPtr qemuCaps, char *archstr = NULL; int ret = -1;
+ if (qemuCaps->version >= MIN_X_RDMA_VERSION) { + virQEMUCapsSet(qemuCaps, QEMU_CAPS_MIGRATE_QEMU_X_RDMA); + } + if (!(archstr = qemuMonitorGetTargetArch(mon))) return -1;
This is wrong. First, you're adding this into a totally wrong place and second, we need a better detection which is not based on qemu version.
diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index f5f685d..5069552 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -191,9 +191,13 @@ enum virQEMUCapsFlags { QEMU_CAPS_VNC_SHARE_POLICY = 150, /* set display sharing policy */ QEMU_CAPS_DEVICE_DEL_EVENT = 151, /* DEVICE_DELETED event */
+ QEMU_CAPS_MIGRATE_QEMU_X_RDMA = 152, /* have qemu x-rdma migration */ + QEMU_CAPS_LAST, /* this must always be the last item */ };
+#define MIN_X_RDMA_VERSION 1006000 + typedef struct _virQEMUCaps virQEMUCaps; typedef virQEMUCaps *virQEMUCapsPtr;
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index aa3a2fd..a26acd7 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -8657,6 +8657,14 @@ qemuBuildCommandLine(virConnectPtr conn, goto error; } virCommandAddArg(cmd, migrateFrom); + } else if (STRPREFIX(migrateFrom, "x-rdma")) { + if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_MIGRATE_QEMU_X_RDMA)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + "%s", _("RDMA migration is not supported with " + "this QEMU binary")); + goto error; + } + virCommandAddArg(cmd, migrateFrom); } else if (STREQ(migrateFrom, "stdio")) { if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_MIGRATE_QEMU_FD)) { virCommandAddArgFormat(cmd, "fd:%d", migrateFd); diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 19001b9..de20d23 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -2169,7 +2169,8 @@ qemuMigrationPrepareAny(virQEMUDriverPtr driver, virDomainDefPtr *def, virStreamPtr st, unsigned int port, - unsigned long flags) + unsigned long flags, + const char *protocol) { virDomainObjPtr vm = NULL; virDomainEventPtr event = NULL; @@ -2280,7 +2281,8 @@ qemuMigrationPrepareAny(virQEMUDriverPtr driver, * and there is at least one IPv6 address configured */ if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_IPV6_MIGRATION) && - getaddrinfo("::", NULL, &hints, &info) == 0) { + getaddrinfo("::", NULL, &hints, &info) == 0 && + !strstr(protocol, "rdma")) { freeaddrinfo(info); listenAddr = "[::]"; } else {
Is there any reason why RDMA migration does not work over IPv6?
@@ -2291,7 +2293,8 @@ qemuMigrationPrepareAny(virQEMUDriverPtr driver, /* QEMU will be started with -incoming [::]:port * or -incoming 0.0.0.0:port */ - if (virAsprintf(&migrateFrom, "tcp:%s:%d", listenAddr, port) < 0) + if (virAsprintf(&migrateFrom, "%s:%s:%d", protocol, + listenAddr, port) < 0) goto cleanup; }
@@ -2482,7 +2485,7 @@ qemuMigrationPrepareTunnel(virQEMUDriverPtr driver,
ret = qemuMigrationPrepareAny(driver, dconn, cookiein, cookieinlen, cookieout, cookieoutlen, def, - st, 0, flags); + st, 0, flags, "tcp"); return ret; }
@@ -2502,6 +2505,8 @@ qemuMigrationPrepareDirect(virQEMUDriverPtr driver, static int port = 0; int this_port; char *hostname = NULL; + const char *protocol = NULL; + char *well_formed_protocol = NULL; const char *p; char *uri_str = NULL; int ret = -1; @@ -2550,20 +2555,29 @@ qemuMigrationPrepareDirect(virQEMUDriverPtr driver, if (virAsprintf(uri_out, "tcp:%s:%d", hostname, this_port) < 0) goto cleanup; } else { - /* Check the URI starts with "tcp:". We will escape the + /* Check the URI starts with a valid prefix. We will escape the * URI when passing it to the qemu monitor, so bad * characters in hostname part don't matter. */ - if (!(p = STRSKIP(uri_in, "tcp:"))) { - virReportError(VIR_ERR_INVALID_ARG, "%s", - _("only tcp URIs are supported for KVM/QEMU" - " migrations")); + + protocol = strtok(strdup(uri_in), ":"); + if (protocol) { + if (virAsprintf(&well_formed_protocol, "%s://", protocol) < 0) + goto cleanup; + } + + /* Make sure it's a valid protocol */ + if (!(p = STRSKIP(uri_in, "tcp:")) && + !(p = STRSKIP(uri_in, "x-rdma:"))) { + virReportError(VIR_ERR_INVALID_ARG, _("URI %s (%s) not supported" + " for KVM/QEMU migrations"), protocol, uri_in); goto cleanup; }
- /* Convert uri_in to well-formed URI with // after tcp: */ - if (!(STRPREFIX(uri_in, "tcp://"))) { - if (virAsprintf(&uri_str, "tcp://%s", p) < 0) + + /* Convert uri_in to well-formed URI with // after colon */ + if (!(STRPREFIX(uri_in, well_formed_protocol))) { + if (virAsprintf(&uri_str, "%s://%s", protocol, p) < 0) goto cleanup; }
Just because we did the mistake with tcp protocol we don't have to repeat it with rdma. Just change the rdma protocol to always be well-formed, i.e., rdma://host
@@ -2602,10 +2616,20 @@ qemuMigrationPrepareDirect(virQEMUDriverPtr driver,
ret = qemuMigrationPrepareAny(driver, dconn, cookiein, cookieinlen, cookieout, cookieoutlen, def, - NULL, this_port, flags); + NULL, this_port, flags, + protocol ? protocol : "tcp"); cleanup: virURIFree(uri); VIR_FREE(hostname); + + if (protocol) { + VIR_FREE(protocol); + } + + if (well_formed_protocol) { + VIR_FREE(well_formed_protocol); + } +
You're not supposed to check if a variable you're about to free is non-NULL. Running make syntax-check would have warned you.
if (ret != 0) VIR_FREE(*uri_out); return ret; @@ -2800,6 +2824,7 @@ struct _qemuMigrationSpec { enum qemuMigrationDestinationType destType; union { struct { + const char *proto; const char *name; int port; } host; @@ -3161,6 +3186,7 @@ qemuMigrationRun(virQEMUDriverPtr driver, switch (spec->destType) { case MIGRATION_DEST_HOST: ret = qemuMonitorMigrateToHost(priv->mon, migrate_flags, + spec->dest.host.proto, spec->dest.host.name, spec->dest.host.port); break; @@ -3291,7 +3317,7 @@ cancel: goto cleanup; }
-/* Perform migration using QEMU's native TCP migrate support, +/* Perform migration using QEMU's native migrate support, * not encrypted obviously */ static int doNativeMigrate(virQEMUDriverPtr driver, @@ -3309,6 +3335,8 @@ static int doNativeMigrate(virQEMUDriverPtr driver, qemuDomainObjPrivatePtr priv = vm->privateData; virURIPtr uribits = NULL; int ret = -1; + char *tmp = NULL; + bool rdma = false; qemuMigrationSpec spec;
VIR_DEBUG("driver=%p, vm=%p, uri=%s, cookiein=%s, cookieinlen=%d, " @@ -3318,20 +3346,29 @@ static int doNativeMigrate(virQEMUDriverPtr driver, cookieout, cookieoutlen, flags, resource, NULLSTR(graphicsuri));
+ /* HACK: source host generates bogus URIs, so fix them up */ if (STRPREFIX(uri, "tcp:") && !STRPREFIX(uri, "tcp://")) { - char *tmp; - /* HACK: source host generates bogus URIs, so fix them up */ if (virAsprintf(&tmp, "tcp://%s", uri + strlen("tcp:")) < 0) return -1; - uribits = virURIParse(tmp); - VIR_FREE(tmp); + spec.dest.host.proto = "tcp"; + } else if (STRPREFIX(uri, "x-rdma:") && !STRPREFIX(uri, "x-rdma://")) { + if (virAsprintf(&tmp, "x-rdma://%s", uri + strlen("x-rdma:")) < 0) + return -1; + rdma = true; + spec.dest.host.proto = "x-rdma"; } else { uribits = virURIParse(uri); } + + if (tmp) { + uribits = virURIParse(tmp); + VIR_FREE(tmp); + } + if (!uribits) return -1;
- if (virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_MIGRATE_QEMU_FD)) + if (virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_MIGRATE_QEMU_FD) && !rdma) spec.destType = MIGRATION_DEST_CONNECT_HOST; else spec.destType = MIGRATION_DEST_HOST; diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 86aed75..ce95174 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -2098,6 +2098,7 @@ int qemuMonitorMigrateToFd(qemuMonitorPtr mon,
int qemuMonitorMigrateToHost(qemuMonitorPtr mon, unsigned int flags, + const char *proto, const char *hostname, int port) { @@ -2113,7 +2114,7 @@ int qemuMonitorMigrateToHost(qemuMonitorPtr mon, }
- if (virAsprintf(&uri, "tcp:%s:%d", hostname, port) < 0) + if (virAsprintf(&uri, "%s:%s:%d", proto, hostname, port) < 0) return -1;
if (mon->json) diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 82e6ae2..d722e12 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -429,6 +429,7 @@ int qemuMonitorMigrateToFd(qemuMonitorPtr mon,
int qemuMonitorMigrateToHost(qemuMonitorPtr mon, unsigned int flags, + const char *proto, const char *hostname, int port);
Jirka

On 07/26/2013 02:27 PM, Jiri Denemark wrote:
From: "Michael R. Hines" <mrhines@us.ibm.com>
QEMU has in tree now for version 1.6 support for RDMA Live migration. Full documenation of the feature: http://wiki.qemu.org/Features/RDMALiveMigration
This patch includes mainly making all the locations in libvirt where the 'tcp' string was hard-coded to be more flexible to use more than one protocol.
While the RDMA protocol has been extensively tested (from multiple companies as well as virt-test), the protocol 'x-rdma' will later be renamed to 'rdma' after the community has allowed the feature more cooking. This does not prevent us from calling the protocol "rdma" right away and
On Fri, Jul 26, 2013 at 13:47:44 -0400, mrhines@linux.vnet.ibm.com wrote: possibly translating it to "x-rdma". However, I don't think we actually want to commit patches for rdma migration before QEMU changes the name to "rdma".
Acknowledged.
diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 5dc3c9e..94d17c6 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -234,6 +234,8 @@ VIR_ENUM_IMPL(virQEMUCaps, QEMU_CAPS_LAST,
"vnc-share-policy", /* 150 */ "device-del-event", + + "x-rdma", /* 152 */ );
struct _virQEMUCaps { @@ -1101,6 +1103,7 @@ virQEMUCapsComputeCmdFlags(const char *help, * -incoming unix (qemu >= 0.12.0) * -incoming fd (qemu >= 0.12.0) * -incoming stdio (all earlier kvm) + * -incoming x-rdma (qemu >= 1.6.0) * * NB, there was a pre-kvm-79 'tcp' support, but it * was broken, because it blocked the monitor console @@ -2437,6 +2440,10 @@ virQEMUCapsInitArchQMPBasic(virQEMUCapsPtr qemuCaps, char *archstr = NULL; int ret = -1;
+ if (qemuCaps->version >= MIN_X_RDMA_VERSION) { + virQEMUCapsSet(qemuCaps, QEMU_CAPS_MIGRATE_QEMU_X_RDMA); + } + if (!(archstr = qemuMonitorGetTargetArch(mon))) return -1;
This is wrong. First, you're adding this into a totally wrong place and second, we need a better detection which is not based on qemu version.
How would we detect without using the QEMU version? This feature doesn't have any new command-line arguments (except for -incoming ....)
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 19001b9..de20d23 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -2169,7 +2169,8 @@ qemuMigrationPrepareAny(virQEMUDriverPtr driver, virDomainDefPtr *def, virStreamPtr st, unsigned int port, - unsigned long flags) + unsigned long flags, + const char *protocol) { virDomainObjPtr vm = NULL; virDomainEventPtr event = NULL; @@ -2280,7 +2281,8 @@ qemuMigrationPrepareAny(virQEMUDriverPtr driver, * and there is at least one IPv6 address configured */ if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_IPV6_MIGRATION) && - getaddrinfo("::", NULL, &hints, &info) == 0) { + getaddrinfo("::", NULL, &hints, &info) == 0 && + !strstr(protocol, "rdma")) { freeaddrinfo(info); listenAddr = "[::]"; } else { Is there any reason why RDMA migration does not work over IPv6?
Laziness on my part - It was never implemented because QEMU's parsing of the "[::]" brackets is hard-coded for TCP/IP. I'll submit a patch to break this out on qemu-devel.
@@ -2291,7 +2293,8 @@ qemuMigrationPrepareAny(virQEMUDriverPtr driver, /* QEMU will be started with -incoming [::]:port * or -incoming 0.0.0.0:port */ - if (virAsprintf(&migrateFrom, "tcp:%s:%d", listenAddr, port) < 0) + if (virAsprintf(&migrateFrom, "%s:%s:%d", protocol, + listenAddr, port) < 0) goto cleanup; }
@@ -2482,7 +2485,7 @@ qemuMigrationPrepareTunnel(virQEMUDriverPtr driver,
ret = qemuMigrationPrepareAny(driver, dconn, cookiein, cookieinlen, cookieout, cookieoutlen, def, - st, 0, flags); + st, 0, flags, "tcp"); return ret; }
@@ -2502,6 +2505,8 @@ qemuMigrationPrepareDirect(virQEMUDriverPtr driver, static int port = 0; int this_port; char *hostname = NULL; + const char *protocol = NULL; + char *well_formed_protocol = NULL; const char *p; char *uri_str = NULL; int ret = -1; @@ -2550,20 +2555,29 @@ qemuMigrationPrepareDirect(virQEMUDriverPtr driver, if (virAsprintf(uri_out, "tcp:%s:%d", hostname, this_port) < 0) goto cleanup; } else { - /* Check the URI starts with "tcp:". We will escape the + /* Check the URI starts with a valid prefix. We will escape the * URI when passing it to the qemu monitor, so bad * characters in hostname part don't matter. */ - if (!(p = STRSKIP(uri_in, "tcp:"))) { - virReportError(VIR_ERR_INVALID_ARG, "%s", - _("only tcp URIs are supported for KVM/QEMU" - " migrations")); + + protocol = strtok(strdup(uri_in), ":"); + if (protocol) { + if (virAsprintf(&well_formed_protocol, "%s://", protocol) < 0) + goto cleanup; + } + + /* Make sure it's a valid protocol */ + if (!(p = STRSKIP(uri_in, "tcp:")) && + !(p = STRSKIP(uri_in, "x-rdma:"))) { + virReportError(VIR_ERR_INVALID_ARG, _("URI %s (%s) not supported" + " for KVM/QEMU migrations"), protocol, uri_in); goto cleanup; }
- /* Convert uri_in to well-formed URI with // after tcp: */ - if (!(STRPREFIX(uri_in, "tcp://"))) { - if (virAsprintf(&uri_str, "tcp://%s", p) < 0) + + /* Convert uri_in to well-formed URI with // after colon */ + if (!(STRPREFIX(uri_in, well_formed_protocol))) { + if (virAsprintf(&uri_str, "%s://%s", protocol, p) < 0) goto cleanup; } Just because we did the mistake with tcp protocol we don't have to repeat it with rdma. Just change the rdma protocol to always be well-formed, i.e., rdma://host
Having a conditional check only for 'rdma' is what I was trying to avoid. Wouldn't it be better to have both options available? Having both choices is kind of nice - and it's unlikely that users will stop breaking the habit for a while.
@@ -2602,10 +2616,20 @@ qemuMigrationPrepareDirect(virQEMUDriverPtr driver,
ret = qemuMigrationPrepareAny(driver, dconn, cookiein, cookieinlen, cookieout, cookieoutlen, def, - NULL, this_port, flags); + NULL, this_port, flags, + protocol ? protocol : "tcp"); cleanup: virURIFree(uri); VIR_FREE(hostname); + + if (protocol) { + VIR_FREE(protocol); + } + + if (well_formed_protocol) { + VIR_FREE(well_formed_protocol); + } +
You're not supposed to check if a variable you're about to free is non-NULL. Running make syntax-check would have warned you.
Understood =)

On 07/26/2013 12:48 PM, Michael R. Hines wrote:
int ret = -1; + if (qemuCaps->version >= MIN_X_RDMA_VERSION) { + virQEMUCapsSet(qemuCaps, QEMU_CAPS_MIGRATE_QEMU_X_RDMA); + } + if (!(archstr = qemuMonitorGetTargetArch(mon))) return -1;
This is wrong. First, you're adding this into a totally wrong place and second, we need a better detection which is not based on qemu version.
How would we detect without using the QEMU version?
Call 'query-migrate-capabilities' as part of the one-time capability probing in qemu_capabilities.c; if [x-]rdma-pin-all is listed, then RDMA is supported and you set the qemuCaps bit. There's also the matter of handshaking between source and dest - we can't enable it on the source unless the dest also supports rdma; there, the migration cookie should be handy to pass destination support of the qemuCaps bit back to the source, in time for the source to react. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 07/26/2013 02:56 PM, Eric Blake wrote:
On 07/26/2013 12:48 PM, Michael R. Hines wrote:
int ret = -1; + if (qemuCaps->version >= MIN_X_RDMA_VERSION) { + virQEMUCapsSet(qemuCaps, QEMU_CAPS_MIGRATE_QEMU_X_RDMA); + } + if (!(archstr = qemuMonitorGetTargetArch(mon))) return -1;
This is wrong. First, you're adding this into a totally wrong place and second, we need a better detection which is not based on qemu version. How would we detect without using the QEMU version? Call 'query-migrate-capabilities' as part of the one-time capability probing in qemu_capabilities.c; if [x-]rdma-pin-all is listed, then RDMA is supported and you set the qemuCaps bit. There's also the matter of handshaking between source and dest - we can't enable it on the source unless the dest also supports rdma; there, the migration cookie should be handy to pass destination support of the qemuCaps bit back to the source, in time for the source to react.
Acknowledged.

From: "Michael R. Hines" <mrhines@us.ibm.com> RDMA Live migration requires registering memory with the hardware, and thus QEMU offers a new 'capability' which supports the ability to pre-register / mlock() the guest memory in advance for higher RDMA performance before the migration begins. This patch exposes this capability with the following example usage: virsh migrate --live --x-rdma-pin-all --migrateuri x-rdma:hostname domain qemu+tcp://hostname/system This capability is disabled by default, and thus ommiting it will cause QEMU to register the memory with the hardware in an on-demand basis. Signed-off-by: Michael R. Hines <mrhines@us.ibm.com> --- include/libvirt/libvirt.h.in | 1 + src/qemu/qemu_migration.c | 50 ++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_migration.h | 3 ++- src/qemu/qemu_monitor.c | 2 +- src/qemu/qemu_monitor.h | 1 + tools/virsh-domain.c | 7 ++++++ 6 files changed, 62 insertions(+), 2 deletions(-) diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index 31fb37e..d21cb74 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -1193,6 +1193,7 @@ typedef enum { VIR_MIGRATE_OFFLINE = (1 << 10), /* offline migrate */ VIR_MIGRATE_COMPRESSED = (1 << 11), /* compress data during migration */ VIR_MIGRATE_ABORT_ON_ERROR = (1 << 12), /* abort migration on I/O errors happened during migration */ + VIR_MIGRATE_X_RDMA_PIN_ALL = (1 << 13), /* RDMA memory pinning */ } virDomainMigrateFlags; diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index de20d23..f5d9b16 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1574,6 +1574,46 @@ cleanup: } static int +qemuMigrationSetPinAll(virQEMUDriverPtr driver, + virDomainObjPtr vm, + enum qemuDomainAsyncJob job) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + int ret; + + if (qemuDomainObjEnterMonitorAsync(driver, vm, job) < 0) + return -1; + + ret = qemuMonitorGetMigrationCapability( + priv->mon, + QEMU_MONITOR_MIGRATION_CAPS_X_RDMA_PIN_ALL); + + if (ret < 0) { + goto cleanup; + } else if (ret == 0) { + if (job == QEMU_ASYNC_JOB_MIGRATION_IN) { + virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", + _("rdma pinning migration is not supported by " + "target QEMU binary")); + } else { + virReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, "%s", + _("rdma pinning migration is not supported by " + "source QEMU binary")); + } + ret = -1; + goto cleanup; + } + + ret = qemuMonitorSetMigrationCapability( + priv->mon, + QEMU_MONITOR_MIGRATION_CAPS_X_RDMA_PIN_ALL); + +cleanup: + qemuDomainObjExitMonitor(driver, vm); + return ret; +} + +static int qemuMigrationWaitForSpice(virQEMUDriverPtr driver, virDomainObjPtr vm) { @@ -2363,6 +2403,11 @@ qemuMigrationPrepareAny(virQEMUDriverPtr driver, QEMU_ASYNC_JOB_MIGRATION_IN) < 0) goto stop; + if (flags & VIR_MIGRATE_X_RDMA_PIN_ALL && + qemuMigrationSetPinAll(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_IN) < 0) + goto stop; + if (mig->lockState) { VIR_DEBUG("Received lockstate %s", mig->lockState); VIR_FREE(priv->lockState); @@ -3156,6 +3201,11 @@ qemuMigrationRun(virQEMUDriverPtr driver, QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) goto cleanup; + if (flags & VIR_MIGRATE_X_RDMA_PIN_ALL && + qemuMigrationSetPinAll(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) + goto cleanup; + if (qemuDomainObjEnterMonitorAsync(driver, vm, QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) goto cleanup; diff --git a/src/qemu/qemu_migration.h b/src/qemu/qemu_migration.h index 0f6c5f7..d08979f 100644 --- a/src/qemu/qemu_migration.h +++ b/src/qemu/qemu_migration.h @@ -39,7 +39,8 @@ VIR_MIGRATE_UNSAFE | \ VIR_MIGRATE_OFFLINE | \ VIR_MIGRATE_COMPRESSED | \ - VIR_MIGRATE_ABORT_ON_ERROR) + VIR_MIGRATE_ABORT_ON_ERROR | \ + VIR_MIGRATE_X_RDMA_PIN_ALL) /* All supported migration parameters and their types. */ # define QEMU_MIGRATION_PARAMETERS \ diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index ce95174..34d6840 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -112,7 +112,7 @@ VIR_ENUM_IMPL(qemuMonitorMigrationStatus, VIR_ENUM_IMPL(qemuMonitorMigrationCaps, QEMU_MONITOR_MIGRATION_CAPS_LAST, - "xbzrle") + "xbzrle", "x-rdma-pin-all") VIR_ENUM_IMPL(qemuMonitorVMStatus, QEMU_MONITOR_VM_STATUS_LAST, diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index d722e12..83864f2 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -405,6 +405,7 @@ int qemuMonitorGetSpiceMigrationStatus(qemuMonitorPtr mon, typedef enum { QEMU_MONITOR_MIGRATION_CAPS_XBZRLE, + QEMU_MONITOR_MIGRATION_CAPS_X_RDMA_PIN_ALL, QEMU_MONITOR_MIGRATION_CAPS_LAST } qemuMonitorMigrationCaps; diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index 8cafce4..ee8b423 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -8380,6 +8380,10 @@ static const vshCmdOptDef opts_migrate[] = { .type = VSH_OT_BOOL, .help = N_("compress repeated pages during live migration") }, + {.name = "x-rdma-pin-all", + .type = VSH_OT_BOOL, + .help = N_("support memory pinning during rdma live migration") + }, {.name = "abort-on-error", .type = VSH_OT_BOOL, .help = N_("abort on soft errors during migration") @@ -8513,6 +8517,9 @@ doMigrate(void *opaque) if (vshCommandOptBool(cmd, "compressed")) flags |= VIR_MIGRATE_COMPRESSED; + if (vshCommandOptBool(cmd, "x-rdma-pin-all")) + flags |= VIR_MIGRATE_X_RDMA_PIN_ALL; + if (vshCommandOptBool(cmd, "offline")) { flags |= VIR_MIGRATE_OFFLINE; } -- 1.7.10.4

On 07/26/2013 11:47 AM, mrhines@linux.vnet.ibm.com wrote:
From: "Michael R. Hines" <mrhines@us.ibm.com>
QEMU has in tree now planned for 1.6 support for RDMA-based live migration.
Changes to libvirt: 1. QEMU has a new 'setup' phase in their state machine. 2. Expose the 'x-rdma' migration protocol URI. 3. Expose the 'x-rdma-pin-all' capability for pre-registration of memory.
The x- prefix means that the migration is still experimental; do we want to be codifying the use of experimental API into libvirt, or is it time for a patch to qemu to remove the x- prefix? Back in the 1.5 timeframe, when RDMA was first proposed, the x- prefix made sense, but now that we are closer to qemu 1.6, and you are trying to get libvirt to drive it, that's a declaration of stability. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 07/26/2013 02:16 PM, Eric Blake wrote:
From: "Michael R. Hines" <mrhines@us.ibm.com>
QEMU has in tree now planned for 1.6 support for RDMA-based live migration.
Changes to libvirt: 1. QEMU has a new 'setup' phase in their state machine. 2. Expose the 'x-rdma' migration protocol URI. 3. Expose the 'x-rdma-pin-all' capability for pre-registration of memory. The x- prefix means that the migration is still experimental; do we want to be codifying the use of experimental API into libvirt, or is it time for a patch to qemu to remove the x- prefix? Back in the 1.5 timeframe, when RDMA was first proposed, the x- prefix made sense, but now that we are closer to qemu 1.6, and you are trying to get libvirt to drive it,
On 07/26/2013 11:47 AM, mrhines@linux.vnet.ibm.com wrote: that's a declaration of stability.
That's a good question. I'll consult with the community for guidance. It has been very extensively tested, though - by several parties, but I'll find out and be in touch.

On Fri, Jul 26, 2013 at 12:16:08PM -0600, Eric Blake wrote:
On 07/26/2013 11:47 AM, mrhines@linux.vnet.ibm.com wrote:
From: "Michael R. Hines" <mrhines@us.ibm.com>
QEMU has in tree now planned for 1.6 support for RDMA-based live migration.
Changes to libvirt: 1. QEMU has a new 'setup' phase in their state machine. 2. Expose the 'x-rdma' migration protocol URI. 3. Expose the 'x-rdma-pin-all' capability for pre-registration of memory.
The x- prefix means that the migration is still experimental; do we want to be codifying the use of experimental API into libvirt, or is it time for a patch to qemu to remove the x- prefix? Back in the 1.5 timeframe, when RDMA was first proposed, the x- prefix made sense, but now that we are closer to qemu 1.6, and you are trying to get libvirt to drive it, that's a declaration of stability.
If it is an experimental API & naming, we definitely do not want to expose that in the libvirt public API. We need to expose a stable API for libvirt's client apps. If we can't do that until QEMU itself is stable, then we'll have to wait. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 07/29/2013 06:18 AM, Daniel P. Berrange wrote:
On Fri, Jul 26, 2013 at 12:16:08PM -0600, Eric Blake wrote:
From: "Michael R. Hines" <mrhines@us.ibm.com>
QEMU has in tree now planned for 1.6 support for RDMA-based live migration.
Changes to libvirt: 1. QEMU has a new 'setup' phase in their state machine. 2. Expose the 'x-rdma' migration protocol URI. 3. Expose the 'x-rdma-pin-all' capability for pre-registration of memory. The x- prefix means that the migration is still experimental; do we want to be codifying the use of experimental API into libvirt, or is it time for a patch to qemu to remove the x- prefix? Back in the 1.5 timeframe, when RDMA was first proposed, the x- prefix made sense, but now that we are closer to qemu 1.6, and you are trying to get libvirt to drive it,
On 07/26/2013 11:47 AM, mrhines@linux.vnet.ibm.com wrote: that's a declaration of stability. If it is an experimental API & naming, we definitely do not want to expose that in the libvirt public API. We need to expose a stable API for libvirt's client apps. If we can't do that until QEMU itself is stable, then we'll have to wait.
Daniel
Acknowledged. Will work it out with the QEMU community (they are in freeze too, so will wait till next month). - Michael
participants (5)
-
Daniel P. Berrange
-
Eric Blake
-
Jiri Denemark
-
Michael R. Hines
-
mrhines@linux.vnet.ibm.com