[libvirt] [PATCH v1 00/11] Rework storage migration

This patch set re-implements storage migration for new enough qemu. Currently, you can migrate a domain to a host without the need for shared storage. This is done by setting the 'blk' or 'inc' attribute (representing the VIR_MIGRATE_NON_SHARED_DISK and VIR_MIGRATE_NON_SHARED_INC flags respectively) of the 'migrate' monitor command. However, the qemu implementation is buggy, and applications are advised to switch to the new implementation, which, moreover, offers some nice features, like migrating only explicitly specified disks.

The new functionality is controlled via the 'nbd-server-*' and 'drive-mirror' commands. The flow is meant to look like this:

1) User invokes libvirt's migrate functionality.
2) libvirt checks that no block jobs are active on the source.
3) libvirt starts the destination QEMU and sets up the NBD server using the nbd-server-start and nbd-server-add commands.
4) libvirt starts drive-mirror with a destination pointing to the remote NBD server, for example nbd:host:port:exportname=diskname (where diskname is the -drive id specified on the destination).
5) Once all mirroring jobs reach steady state, libvirt invokes the migrate command.
6) Once migration has completed, libvirt invokes the nbd-server-stop command on the destination QEMU.

If we simply skip step 2 and there is an active block job, qemu will fail in step 4. No big deal.

Since we try NOT to break migration and to keep things compatible, this feature is enabled iff both sides support it. Since there is an obvious need for some data transfer between src and dst, I've put it into the qemuMigrationCookie:

1) src -> dst: (QEMU_MIGRATION_PHASE_BEGIN3 -> QEMU_MIGRATION_PHASE_PREPARE)

  <nbd>
    <disk src='/var/lib/libvirt/images/f17.img' size='17179869184'/>
  </nbd>

Hey destination, I know how to use this cool new feature. Moreover, these are the paths I'll send you; each file is X bytes big. This is one of the prerequisites: the file on dst must exist and be at least as large as the corresponding file on src.
2) dst -> src: (QEMU_MIGRATION_PHASE_PREPARE -> QEMU_MIGRATION_PHASE_PERFORM3)

  <nbd port='X'/>

Okay, I (the destination) support this feature as well. I've created all the files as you (src) told me to, and you can start rolling data. I am listening on port X.

3) src -> dst: (QEMU_MIGRATION_PHASE_PERFORM3 -> QEMU_MIGRATION_PHASE_FINISH3)

  <nbd port='-1'/>

Migration has completed; destination, you may shut the NBD server down.

If either src or dst doesn't support NBD, it is not used and the whole process falls back to the old implementation.

Michal Privoznik (11):
  qemu: Introduce NBD_SERVER capability
  Introduce NBD migration cookie
  qemu: Introduce nbd-server-start command
  qemu: Introduce nbd-server-add command
  qemu: Introduce nbd-server-stop command
  qemu_migration: Introduce qemuMigrationStartNBDServer
  qemu_migration: Move port allocation to a separate func
  qemu_migration: Implement qemuMigrationStartNBDServer()
  qemu_migration: Implement qemuMigrationDriveMirror
  qemu_migration: Check size prerequisites
  qemu_migration: Stop NBD server at Finish phase

 src/qemu/qemu_capabilities.c |    3 +
 src/qemu/qemu_capabilities.h |    1 +
 src/qemu/qemu_driver.c       |    8 +-
 src/qemu/qemu_migration.c    |  609 +++++++++++++++++++++++++++++++++++++++---
 src/qemu/qemu_migration.h    |    6 +-
 src/qemu/qemu_monitor.c      |   62 +++++
 src/qemu/qemu_monitor.h      |    6 +
 src/qemu/qemu_monitor_json.c |   93 +++++++
 src/qemu/qemu_monitor_json.h |    6 +
 9 files changed, 756 insertions(+), 38 deletions(-)

-- 
1.7.8.6

This just keeps track of whether qemu knows the nbd-server-* commands, so we know whether we can use them during migration or not.
---
 src/qemu/qemu_capabilities.c |    3 +++
 src/qemu/qemu_capabilities.h |    1 +
 2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c
index 6ce2638..e331de9 100644
--- a/src/qemu/qemu_capabilities.c
+++ b/src/qemu/qemu_capabilities.c
@@ -193,6 +193,7 @@ VIR_ENUM_IMPL(qemuCaps, QEMU_CAPS_LAST,
               "drive-mirror", /* 115 */
               "usb-redir.bootindex",
               "usb-host.bootindex",
+              "nbd-server-start",
     );

 struct _qemuCaps {
@@ -1909,6 +1910,8 @@ qemuCapsProbeQMPCommands(qemuCapsPtr caps,
             qemuCapsSet(caps, QEMU_CAPS_VNC);
         else if (STREQ(name, "drive-mirror"))
             qemuCapsSet(caps, QEMU_CAPS_DRIVE_MIRROR);
+        else if (STREQ(name, "nbd-server-start"))
+            qemuCapsSet(caps, QEMU_CAPS_NBD_SERVER);
         VIR_FREE(name);
     }
     VIR_FREE(commands);

diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h
index 751b3ec..962dfdf 100644
--- a/src/qemu/qemu_capabilities.h
+++ b/src/qemu/qemu_capabilities.h
@@ -155,6 +155,7 @@ enum qemuCapsFlags {
     QEMU_CAPS_DRIVE_MIRROR = 115, /* drive-mirror monitor command */
     QEMU_CAPS_USB_REDIR_BOOTINDEX = 116, /* usb-redir.bootindex */
     QEMU_CAPS_USB_HOST_BOOTINDEX = 117, /* usb-host.bootindex */
+    QEMU_CAPS_NBD_SERVER = 118, /* nbd-server-start QMP command */

     QEMU_CAPS_LAST, /* this must always be the last item */
 };
-- 
1.7.8.6

----- Original Message -----
This just keeps track whether qemu knows nbd-server-* commands so we can use it during migration or not.
---
 src/qemu/qemu_capabilities.c |    3 +++
 src/qemu/qemu_capabilities.h |    1 +
 2 files changed, 4 insertions(+), 0 deletions(-)
ACK.

This migration cookie is meant for two purposes. The first is to be sent in the Begin phase from source to destination, to let it know we support the new implementation of VIR_MIGRATE_NON_SHARED_{DISK,INC} so the destination can start an NBD server. The second purpose is that the destination can let us know on which port the NBD server is running.
---
 src/qemu/qemu_migration.c |  108 ++++++++++++++++++++++++++++++++++++++++-----
 1 files changed, 97 insertions(+), 11 deletions(-)

diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index d52ec59..cd59eda 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -72,6 +72,7 @@ enum qemuMigrationCookieFlags {
     QEMU_MIGRATION_COOKIE_FLAG_LOCKSTATE,
     QEMU_MIGRATION_COOKIE_FLAG_PERSISTENT,
     QEMU_MIGRATION_COOKIE_FLAG_NETWORK,
+    QEMU_MIGRATION_COOKIE_FLAG_NBD,

     QEMU_MIGRATION_COOKIE_FLAG_LAST
 };
@@ -79,13 +80,14 @@ enum qemuMigrationCookieFlags {
 VIR_ENUM_DECL(qemuMigrationCookieFlag);
 VIR_ENUM_IMPL(qemuMigrationCookieFlag,
               QEMU_MIGRATION_COOKIE_FLAG_LAST,
-              "graphics", "lockstate", "persistent", "network");
+              "graphics", "lockstate", "persistent", "network", "nbd");

 enum qemuMigrationCookieFeatures {
     QEMU_MIGRATION_COOKIE_GRAPHICS  = (1 << QEMU_MIGRATION_COOKIE_FLAG_GRAPHICS),
     QEMU_MIGRATION_COOKIE_LOCKSTATE = (1 << QEMU_MIGRATION_COOKIE_FLAG_LOCKSTATE),
     QEMU_MIGRATION_COOKIE_PERSISTENT = (1 << QEMU_MIGRATION_COOKIE_FLAG_PERSISTENT),
     QEMU_MIGRATION_COOKIE_NETWORK = (1 << QEMU_MIGRATION_COOKIE_FLAG_NETWORK),
+    QEMU_MIGRATION_COOKIE_NBD = (1 << QEMU_MIGRATION_COOKIE_FLAG_NBD),
 };

 typedef struct _qemuMigrationCookieGraphics qemuMigrationCookieGraphics;
@@ -119,6 +121,17 @@ struct _qemuMigrationCookieNetwork {
     qemuMigrationCookieNetDataPtr net;
 };

+typedef struct _qemuMigrationCookieNBD qemuMigrationCookieNBD;
+typedef qemuMigrationCookieNBD *qemuMigrationCookieNBDPtr;
+struct _qemuMigrationCookieNBD {
+    int port; /* on which port does NBD server listen for incoming data.
+                 Zero has a special meaning - it is there just to let
+                 the destination know we (the source) do support NBD.
+                 Negative one is meant to be sent when transitioning from
+                 the Perform to the Finish phase, to let the destination
+                 know it's safe to stop the NBD server. */
+};
+
 typedef struct _qemuMigrationCookie qemuMigrationCookie;
 typedef qemuMigrationCookie *qemuMigrationCookiePtr;
 struct _qemuMigrationCookie {
@@ -147,6 +160,9 @@ struct _qemuMigrationCookie {

     /* If (flags & QEMU_MIGRATION_COOKIE_NETWORK) */
     qemuMigrationCookieNetworkPtr network;
+
+    /* If (flags & QEMU_MIGRATION_COOKIE_NBD) */
+    qemuMigrationCookieNBDPtr nbd;
 };

 static void qemuMigrationCookieGraphicsFree(qemuMigrationCookieGraphicsPtr grap)
@@ -192,6 +208,7 @@ static void qemuMigrationCookieFree(qemuMigrationCookiePtr mig)
     VIR_FREE(mig->name);
     VIR_FREE(mig->lockState);
     VIR_FREE(mig->lockDriver);
+    VIR_FREE(mig->nbd);
     VIR_FREE(mig);
 }

@@ -492,6 +509,24 @@ qemuMigrationCookieAddNetwork(qemuMigrationCookiePtr mig,
 }

+static int
+qemuMigrationCookieAddNBD(qemuMigrationCookiePtr mig,
+                          int nbdPort)
+{
+    /* It is not a bug if there already is NBD data */
+    if (!mig->nbd &&
+        VIR_ALLOC(mig->nbd) < 0) {
+        virReportOOMError();
+        return -1;
+    }
+
+    mig->nbd->port = nbdPort;
+    mig->flags |= QEMU_MIGRATION_COOKIE_NBD;
+
+    return 0;
+}
+
+
 static void qemuMigrationCookieGraphicsXMLFormat(virBufferPtr buf,
                                                  qemuMigrationCookieGraphicsPtr grap)
 {
@@ -594,6 +629,13 @@ qemuMigrationCookieXMLFormat(struct qemud_driver *driver,
     if ((mig->flags & QEMU_MIGRATION_COOKIE_NETWORK) && mig->network)
         qemuMigrationCookieNetworkXMLFormat(buf, mig->network);

+    if ((mig->flags & QEMU_MIGRATION_COOKIE_NBD) && mig->nbd) {
+        virBufferAddLit(buf, "  <nbd");
+        if (mig->nbd->port)
+            virBufferAsprintf(buf, " port='%d'", mig->nbd->port);
+        virBufferAddLit(buf, "/>\n");
+    }
+
     virBufferAddLit(buf, "</qemu-migration>\n");
     return 0;
 }
@@ -821,6 +863,12 @@ qemuMigrationCookieXMLParse(qemuMigrationCookiePtr mig,
             goto error;
         }

+        /* nbd is optional */
+        if (val == QEMU_MIGRATION_COOKIE_FLAG_NBD) {
+            VIR_FREE(str);
+            continue;
+        }
+
         if ((flags & (1 << val)) == 0) {
             virReportError(VIR_ERR_INTERNAL_ERROR,
                            _("Unsupported migration cookie feature %s"),
@@ -873,6 +921,25 @@ qemuMigrationCookieXMLParse(qemuMigrationCookiePtr mig,
         (!(mig->network = qemuMigrationCookieNetworkXMLParse(ctxt))))
         goto error;

+    if (flags & QEMU_MIGRATION_COOKIE_NBD &&
+        virXPathBoolean("count(./nbd) > 0", ctxt)) {
+        char *port;
+
+        if (VIR_ALLOC(mig->nbd) < 0) {
+            virReportOOMError();
+            goto error;
+        }
+
+        port = virXPathString("string(./nbd/@port)", ctxt);
+        if (port &&
+            virStrToLong_i(port, NULL, 10, &mig->nbd->port) < 0) {
+            virReportError(VIR_ERR_INTERNAL_ERROR,
+                           _("Malformed nbd port '%s'"),
+                           port);
+            goto error;
+        }
+    }
+
     return 0;

 error:
@@ -911,6 +978,7 @@ static int
 qemuMigrationBakeCookie(qemuMigrationCookiePtr mig,
                         struct qemud_driver *driver,
                         virDomainObjPtr dom,
+                        int nbdPort,
                         char **cookieout,
                         int *cookieoutlen,
                         unsigned int flags)
@@ -937,6 +1005,10 @@ qemuMigrationBakeCookie(qemuMigrationCookiePtr mig,
         return -1;
     }

+    if (flags & QEMU_MIGRATION_COOKIE_NBD &&
+        qemuMigrationCookieAddNBD(mig, nbdPort) < 0)
+        return -1;
+
     if (!(*cookieout = qemuMigrationCookieXMLFormatStr(driver, mig)))
         return -1;

@@ -1415,6 +1487,7 @@ char *qemuMigrationBegin(struct qemud_driver *driver,
     qemuMigrationCookiePtr mig = NULL;
     virDomainDefPtr def = NULL;
     qemuDomainObjPrivatePtr priv = vm->privateData;
+    unsigned int cookie_flags = QEMU_MIGRATION_COOKIE_LOCKSTATE;

     VIR_DEBUG("driver=%p, vm=%p, xmlin=%s, dname=%s,"
               " cookieout=%p, cookieoutlen=%p, flags=%lx",
@@ -1434,12 +1507,15 @@ char *qemuMigrationBegin(struct qemud_driver *driver,
     if (!(flags & VIR_MIGRATE_UNSAFE) && !qemuMigrationIsSafe(vm->def))
         goto cleanup;

+    if (qemuCapsGet(priv->caps, QEMU_CAPS_NBD_SERVER))
+        cookie_flags |= QEMU_MIGRATION_COOKIE_NBD;
+
     if (!(mig = qemuMigrationEatCookie(driver, vm, NULL, 0, 0)))
         goto cleanup;

-    if (qemuMigrationBakeCookie(mig, driver, vm,
+    if (qemuMigrationBakeCookie(mig, driver, vm, 0,
                                 cookieout, cookieoutlen,
-                                QEMU_MIGRATION_COOKIE_LOCKSTATE) < 0)
+                                cookie_flags) < 0)
         goto cleanup;

     if (xmlin) {
@@ -1512,6 +1588,8 @@ qemuMigrationPrepareAny(struct qemud_driver *driver,
     bool tunnel = !!st;
     char *origname = NULL;
     char *xmlout = NULL;
+    int nbdPort = 0;
+    unsigned int cookie_flags = QEMU_MIGRATION_COOKIE_GRAPHICS;

     if (virTimeMillisNow(&now) < 0)
         return -1;
@@ -1589,7 +1667,8 @@ qemuMigrationPrepareAny(struct qemud_driver *driver,
     origname = NULL;

     if (!(mig = qemuMigrationEatCookie(driver, vm, cookiein, cookieinlen,
-                                       QEMU_MIGRATION_COOKIE_LOCKSTATE)))
+                                       QEMU_MIGRATION_COOKIE_LOCKSTATE |
+                                       QEMU_MIGRATION_COOKIE_NBD)))
         goto cleanup;

     if (qemuMigrationJobStart(driver, vm, QEMU_ASYNC_JOB_MIGRATION_IN) < 0)
@@ -1640,8 +1719,12 @@ qemuMigrationPrepareAny(struct qemud_driver *driver,
         VIR_DEBUG("Received no lockstate");
     }

-    if (qemuMigrationBakeCookie(mig, driver, vm, cookieout, cookieoutlen,
-                                QEMU_MIGRATION_COOKIE_GRAPHICS) < 0) {
+    /* dummy place holder for real work */
+    nbdPort = 0;
+    cookie_flags |= QEMU_MIGRATION_COOKIE_NBD;
+
+    if (qemuMigrationBakeCookie(mig, driver, vm, nbdPort,
+                                cookieout, cookieoutlen, cookie_flags) < 0) {
         /* We could tear down the whole guest here, but
          * cookie data is (so far) non-critical, so that
          * seems a little harsh. We'll just warn for now.
@@ -2150,7 +2233,8 @@ qemuMigrationRun(struct qemud_driver *driver,
     }

     if (!(mig = qemuMigrationEatCookie(driver, vm, cookiein, cookieinlen,
-                                       QEMU_MIGRATION_COOKIE_GRAPHICS)))
+                                       QEMU_MIGRATION_COOKIE_GRAPHICS |
+                                       QEMU_MIGRATION_COOKIE_NBD)))
         goto cleanup;

     if (qemuDomainMigrateGraphicsRelocate(driver, vm, mig) < 0)
@@ -2296,9 +2380,10 @@ cleanup:
     }

     if (ret == 0 &&
-        qemuMigrationBakeCookie(mig, driver, vm, cookieout, cookieoutlen,
+        qemuMigrationBakeCookie(mig, driver, vm, -1, cookieout, cookieoutlen,
                                 QEMU_MIGRATION_COOKIE_PERSISTENT |
-                                QEMU_MIGRATION_COOKIE_NETWORK) < 0) {
+                                QEMU_MIGRATION_COOKIE_NETWORK |
+                                QEMU_MIGRATION_COOKIE_NBD) < 0) {
         VIR_WARN("Unable to encode migration cookie");
     }

@@ -3233,7 +3318,7 @@ qemuMigrationFinish(struct qemud_driver *driver,

     qemuDomainCleanupRemove(vm, qemuMigrationPrepareCleanup);

-    cookie_flags = QEMU_MIGRATION_COOKIE_NETWORK;
+    cookie_flags = QEMU_MIGRATION_COOKIE_NETWORK | QEMU_MIGRATION_COOKIE_NBD;
     if (flags & VIR_MIGRATE_PERSIST_DEST)
         cookie_flags |= QEMU_MIGRATION_COOKIE_PERSISTENT;

@@ -3377,7 +3462,8 @@ qemuMigrationFinish(struct qemud_driver *driver,
                                         VIR_DOMAIN_EVENT_STOPPED_FAILED);
     }

-    if (qemuMigrationBakeCookie(mig, driver, vm, cookieout, cookieoutlen, 0) < 0)
+    if (qemuMigrationBakeCookie(mig, driver, vm, 0,
+                                cookieout, cookieoutlen, 0) < 0)
         VIR_WARN("Unable to encode migration cookie");

 endjob:
-- 
1.7.8.6

----- Original Message -----
This migration cookie is meant for two purposes. The first is to be sent in begin phase from source to destination to let it know we support new implementation of VIR_MIGRATE_NON_SHARED_{DISK,INC} so destination can start NBD server. Then, the second purpose is, destination can let us know, on which port is NBD server running.
---
 src/qemu/qemu_migration.c |  108 ++++++++++++++++++++++++++++++++++++++++-----
 1 files changed, 97 insertions(+), 11 deletions(-)
Before looking at the code, I have a question: Should we be adding a new VIR_DRV_FEATURE_... bit, setting that driver feature bit in qemu_driver.c:qemuSupportsFeature(), and only sending the cookie on the source side if the destination side knows how to handle it? My concern (and it might be an invalid concern) is that if we send the nbd cookie but the destination is too old to handle it, then will the destination reject the cookie?

On the other hand, this is more than just a handshake determined by the compile-time versions of libvirtd; after all, not only does libvirtd on both sides of the migration need to understand the cookie, but qemu on the destination side also has to support the NBD server, and qemu on the source side has to support drive-mirror. If any one of those pieces is not present, then we can still migrate storage by using the older flags of the 'migrate' command itself, rather than failing outright. It would help if you documented the fallback cases, something like:

old libvirt src -> any destination: no cookie sent, old-style storage migration
new libvirt src, but src qemu too old: same as old libvirt src
new libvirt/qemu src -> old libvirt destination: src->dst cookie sent? may depend on VIR_DRV_FEATURE, but no dst->src cookie reply, so use old-style migration
new libvirt src -> new libvirt, old qemu destination: VIR_DRV_FEATURE would be set (libvirt understands it), but because qemu does not support it, the dst must not return the cookie reply, so the source can fall back to old-style migration
new libvirt src -> new libvirt/qemu dest: cookie exchange succeeds, use new-style migration

Now, on to the code:
@@ -79,13 +80,14 @@ enum qemuMigrationCookieFlags {
 VIR_ENUM_DECL(qemuMigrationCookieFlag);
 VIR_ENUM_IMPL(qemuMigrationCookieFlag,
               QEMU_MIGRATION_COOKIE_FLAG_LAST,
-              "graphics", "lockstate", "persistent", "network");
+              "graphics", "lockstate", "persistent", "network", "nbd");
[About the only good thing of this webmail interface botching long lines is that it is easy to spot lines that are worth wrapping]
+static int
+qemuMigrationCookieAddNBD(qemuMigrationCookiePtr mig,
+                          int nbdPort)
+{
+    /* It is not a bug if there already is a NBD data */
+    if (!mig->nbd &&
+        VIR_ALLOC(mig->nbd) < 0) {
+        virReportOOMError();
+        return -1;
+    }
+
+    mig->nbd->port = nbdPort;
If there is already NBD data (I'm assuming that it gets created once per disk that needs migration?), does this override of the former mig->nbd->port cause us to forget an important piece of information? Overall, the cookie manipulation seems reasonable, but I'm still worried whether new libvirt talking to older libvirt will cause confusion if the <nbd> element appears in the cookie, and also whether new libvirt gracefully handles the absence of a cookie from older libvirt.

On 27.11.2012 23:18, Eric Blake wrote:
----- Original Message -----
This migration cookie is meant for two purposes. The first is to be sent in begin phase from source to destination to let it know we support new implementation of VIR_MIGRATE_NON_SHARED_{DISK,INC} so destination can start NBD server. Then, the second purpose is, destination can let us know, on which port is NBD server running.
---
 src/qemu/qemu_migration.c |  108 ++++++++++++++++++++++++++++++++++++++++-----
 1 files changed, 97 insertions(+), 11 deletions(-)
Before looking at the code, I have a question:
Should we be adding a new VIR_DRV_FEATURE_... bit, and setting that driver feature bit in qemu_driver.c:qemuSupportsFeature(), and only send the cookie on the source side if the destination side knows how to handle it?
My concern (and it might be an invalid concern) is that if we send the nbd cookie but the destination is too old to handle it, then will the destination reject the cookie?
The destination would not parse that part (without any error, of course) and would not include any NBD data in its reply either. So in the Perform phase, when the source gets a migration cookie from dst without any NBD data, it is obvious dst is running an older libvirt without NBD support, and hence the source will fall back to the previous implementation. However, we can't switch to the VIR_DRV_FEATURE approach you suggest, because use of the new implementation is not just a matter of the qemu driver itself, but depends on the particular qemu instance that is to be migrated (it's pointless to ask the cached capabilities, as a running qemu can have different ones). It would be nice if we could do qemuSupportsFeature(), but I am afraid we can't.
On the other hand, this is more than just a handshake determined by the compile-time versions of libvirtd; after all, not only does libvirtd on both sides of the migration need to understand the cookie, but also qemu on the destination side has to support the NBD server, and qemu on the source side has to support drive-mirror. If any one of those pieces are not present, then we can still migrate storage, by using the older flags of the 'migrate' command itself, rather than outright failure.
It would help if you document the fallback cases, something like:
old libvirt src -> any destination: no cookie sent, old-style storage migration
new libvirt src, but src qemu too old: same as old libvirt src
new libvirt/qemu src -> old libvirt destination: src->dst cookie sent? may depend on VIR_DRV_FEATURE but no dst->src cookie reply, so use old-style migration
new libvirt src -> new libvirt old qemu destination: VIR_DRV_FEATURE would be set (libvirt understands it), but because qemu does not support it, the dst must not return the cookie reply, so the source can fall back to old-style migration
new libvirt src -> new libvirt/qemu dest: cookie exchange succeeds, use new style migration
Okay, I'll add it into comment somewhere in code.
Now, on to the code:
@@ -79,13 +80,14 @@ enum qemuMigrationCookieFlags {
 VIR_ENUM_DECL(qemuMigrationCookieFlag);
 VIR_ENUM_IMPL(qemuMigrationCookieFlag,
               QEMU_MIGRATION_COOKIE_FLAG_LAST,
-              "graphics", "lockstate", "persistent", "network");
+              "graphics", "lockstate", "persistent", "network", "nbd");
[About the only good thing of this webmail interface botching long lines is that it is easy to spot lines that are worth wrapping]
+static int
+qemuMigrationCookieAddNBD(qemuMigrationCookiePtr mig,
+                          int nbdPort)
+{
+    /* It is not a bug if there already is a NBD data */
+    if (!mig->nbd &&
+        VIR_ALLOC(mig->nbd) < 0) {
+        virReportOOMError();
+        return -1;
+    }
+
+    mig->nbd->port = nbdPort;
If there is already NBD data (I'm assuming that it gets created once per disk that needs migration?), does this override of the former mig->nbd->port cause us to forget an important piece of information?
Not really. It will overwrite the info about the port the NBD server is listening on. However, by the time we need that value, the proper one is passed in; in all other places 0 is passed, which is what we want, since in those phases the port info is not relevant at all. So I think it is okay as is.
Overall, the cookie manipulation seems reasonable, but I'm still worried whether new libvirt talking to older libvirt will cause confusion if the <nbd> element appears in the cookie, and also whether new libvirt gracefully handles the absence of a cookie from older libvirt.
It does play nice. I've tested it.

Michal

This will be used with the new migration scheme. This patch creates basically just monitor stub functions. Wiring them into something useful is done in later patches.
---
 src/qemu/qemu_monitor.c      |   22 ++++++++++++++++++
 src/qemu/qemu_monitor.h      |    3 ++
 src/qemu/qemu_monitor_json.c |   49 ++++++++++++++++++++++++++++++++++++++++++
 src/qemu/qemu_monitor_json.h |    3 ++
 4 files changed, 77 insertions(+), 0 deletions(-)

diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
index cbde2b1..228a993 100644
--- a/src/qemu/qemu_monitor.c
+++ b/src/qemu/qemu_monitor.c
@@ -3322,3 +3322,25 @@ char *qemuMonitorGetTargetArch(qemuMonitorPtr mon)

     return qemuMonitorJSONGetTargetArch(mon);
 }
+
+int qemuMonitorNBDServerStart(qemuMonitorPtr mon,
+                              const char *host,
+                              unsigned int port)
+{
+    VIR_DEBUG("mon=%p host=%s port=%u",
+              mon, host, port);
+
+    if (!mon) {
+        virReportError(VIR_ERR_INVALID_ARG, "%s",
+                       _("monitor must not be NULL"));
+        return -1;
+    }
+
+    if (!mon->json) {
+        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                       _("JSON monitor is required"));
+        return -1;
+    }
+
+    return qemuMonitorJSONNBDServerStart(mon, host, port);
+}

diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h
index dbfab88..eb069ed 100644
--- a/src/qemu/qemu_monitor.h
+++ b/src/qemu/qemu_monitor.h
@@ -631,6 +631,9 @@ int qemuMonitorGetObjectProps(qemuMonitorPtr mon,
                               char ***props);
 char *qemuMonitorGetTargetArch(qemuMonitorPtr mon);

+int qemuMonitorNBDServerStart(qemuMonitorPtr mon,
+                              const char *host,
+                              unsigned int port);

 /**
  * When running two dd process and using <> redirection, we need a
  * shell that will not truncate files.  These two strings serve that

diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c
index 8a31466..2bf03e8 100644
--- a/src/qemu/qemu_monitor_json.c
+++ b/src/qemu/qemu_monitor_json.c
@@ -4540,3 +4540,52 @@ cleanup:
     virJSONValueFree(reply);
     return ret;
 }
+
+int
+qemuMonitorJSONNBDServerStart(qemuMonitorPtr mon,
+                              const char *host,
+                              unsigned int port)
+{
+    int ret = -1;
+    virJSONValuePtr cmd = NULL;
+    virJSONValuePtr reply = NULL;
+    virJSONValuePtr data = NULL;
+    virJSONValuePtr addr = NULL;
+    char *port_str = NULL;
+
+    if (!(data = virJSONValueNewObject()) ||
+        !(addr = virJSONValueNewObject()) ||
+        virAsprintf(&port_str, "%d", port) < 0) {
+        virReportOOMError();
+        goto cleanup;
+    }
+
+    if (virJSONValueObjectAppendString(data, "host", host) < 0 ||
+        virJSONValueObjectAppendString(data, "port", port_str) < 0 ||
+        virJSONValueObjectAppendString(addr, "type", "inet") < 0 ||
+        virJSONValueObjectAppend(addr, "data", data) < 0) {
+        virReportOOMError();
+        goto cleanup;
+    }
+
+    if (!(cmd = qemuMonitorJSONMakeCommand("nbd-server-start",
+                                           "a:addr", addr,
+                                           NULL)))
+        goto cleanup;
+
+    if (qemuMonitorJSONCommand(mon, cmd, &reply) < 0)
+        goto cleanup;
+
+    if (qemuMonitorJSONCheckError(cmd, reply) < 0)
+        goto cleanup;
+
+    ret = 0;
+
+cleanup:
+    VIR_FREE(port_str);
+    virJSONValueFree(reply);
+    virJSONValueFree(cmd);
+    virJSONValueFree(addr);
+    virJSONValueFree(data);
+    return ret;
+}

diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h
index c62ae24..506e6f7 100644
--- a/src/qemu/qemu_monitor_json.h
+++ b/src/qemu/qemu_monitor_json.h
@@ -336,4 +336,7 @@ int qemuMonitorJSONGetObjectProps(qemuMonitorPtr mon,
     ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3);
 char *qemuMonitorJSONGetTargetArch(qemuMonitorPtr mon);

+int qemuMonitorJSONNBDServerStart(qemuMonitorPtr mon,
+                                  const char *host,
+                                  unsigned int port);
 #endif /* QEMU_MONITOR_JSON_H */
-- 
1.7.8.6

----- Original Message -----
This will be used with new migration scheme. This patch creates basically just monitor stub functions. Wiring them into something useful is done in later patches.
Reasonable way to break up the review.
---
 src/qemu/qemu_monitor.c      |   22 ++++++++++++++++++
 src/qemu/qemu_monitor.h      |    3 ++
 src/qemu/qemu_monitor_json.c |   49 ++++++++++++++++++++++++++++++++++++++++++
 src/qemu/qemu_monitor_json.h |    3 ++
 4 files changed, 77 insertions(+), 0 deletions(-)
+    if (!(data = virJSONValueNewObject()) ||
+        !(addr = virJSONValueNewObject()) ||
+        virAsprintf(&port_str, "%d", port) < 0) {
+        virReportOOMError();
+        goto cleanup;
+    }
+
+    if (virJSONValueObjectAppendString(data, "host", host) < 0 ||
+        virJSONValueObjectAppendString(data, "port", port_str) < 0
Is 'port' really a string rather than a JSON integer? (goes and checks... yep - you really did match the JSON here to the documentation in qemu.git:qapi-schema.json)
+        virJSONValueObjectAppendString(addr, "type", "inet") < 0 ||
+        virJSONValueObjectAppend(addr, "data", data) < 0) {
Hmm, you aren't supplying anything for the optional 'ipv4' and 'ipv6' portions of the address; do we always want the defaults of always trying both families, or are we going to need to make this configurable? But I guess we can add that later if we find we need it. ACK.

On 28.11.2012 00:26, Eric Blake wrote:
----- Original Message -----
This will be used with new migration scheme. This patch creates basically just monitor stub functions. Wiring them into something useful is done in later patches.
Reasonable way to break up the review.
---
 src/qemu/qemu_monitor.c      |   22 ++++++++++++++++++
 src/qemu/qemu_monitor.h      |    3 ++
 src/qemu/qemu_monitor_json.c |   49 ++++++++++++++++++++++++++++++++++++++++++
 src/qemu/qemu_monitor_json.h |    3 ++
 4 files changed, 77 insertions(+), 0 deletions(-)
+    if (!(data = virJSONValueNewObject()) ||
+        !(addr = virJSONValueNewObject()) ||
+        virAsprintf(&port_str, "%d", port) < 0) {
+        virReportOOMError();
+        goto cleanup;
+    }
+
+    if (virJSONValueObjectAppendString(data, "host", host) < 0 ||
+        virJSONValueObjectAppendString(data, "port", port_str) < 0
Is 'port' really a string rather than a JSON integer? (goes and checks... yep - you really did match the JSON here to the documentation in qemu.git:qapi-schema.json)
I was surprised as well. Maybe a comment just before would be nice.
+        virJSONValueObjectAppendString(addr, "type", "inet") < 0 ||
+        virJSONValueObjectAppend(addr, "data", data) < 0) {
Hmm, you aren't supplying anything for the optional 'ipv4' and 'ipv6' portions of the address; do we always want the defaults of always trying both families, or are we going to need to make this configurable? But I guess we can add that later if we find we need it.
Yeah. This is just internal code, so there's no need to make it as general as possible for now; we can change it whenever somebody needs it. Whereas if this were a public API, I would be all for making it as general as possible.
ACK.

This will be used with the new migration scheme. This patch creates basically just monitor stub functions. Wiring them into something useful is done in later patches.
---
 src/qemu/qemu_monitor.c      |   21 +++++++++++++++++++
 src/qemu/qemu_monitor.h      |    2 ++
 src/qemu/qemu_monitor_json.c |   23 +++++++++++++++++++++
 src/qemu/qemu_monitor_json.h |    2 ++
 4 files changed, 48 insertions(+), 0 deletions(-)

diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
index 228a993..b84b961 100644
--- a/src/qemu/qemu_monitor.c
+++ b/src/qemu/qemu_monitor.c
@@ -3344,3 +3344,24 @@ int qemuMonitorNBDServerStart(qemuMonitorPtr mon,

     return qemuMonitorJSONNBDServerStart(mon, host, port);
 }
+
+int qemuMonitorNBDServerAdd(qemuMonitorPtr mon,
+                            const char *deviceID)
+{
+    VIR_DEBUG("mon=%p deviceID=%s",
+              mon, deviceID);
+
+    if (!mon) {
+        virReportError(VIR_ERR_INVALID_ARG, "%s",
+                       _("monitor must not be NULL"));
+        return -1;
+    }
+
+    if (!mon->json) {
+        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                       _("JSON monitor is required"));
+        return -1;
+    }
+
+    return qemuMonitorJSONNBDServerAdd(mon, deviceID);
+}

diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h
index eb069ed..09b7866 100644
--- a/src/qemu/qemu_monitor.h
+++ b/src/qemu/qemu_monitor.h
@@ -634,6 +634,8 @@ char *qemuMonitorGetTargetArch(qemuMonitorPtr mon);
 int qemuMonitorNBDServerStart(qemuMonitorPtr mon,
                               const char *host,
                               unsigned int port);
+int qemuMonitorNBDServerAdd(qemuMonitorPtr mon,
+                            const char *deviceID);

 /**
  * When running two dd process and using <> redirection, we need a
  * shell that will not truncate files.  These two strings serve that

diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c
index 2bf03e8..c740837 100644
--- a/src/qemu/qemu_monitor_json.c
+++ b/src/qemu/qemu_monitor_json.c
@@ -4589,3 +4589,26 @@ cleanup:
     virJSONValueFree(data);
     return ret;
 }
+
+int
+qemuMonitorJSONNBDServerAdd(qemuMonitorPtr mon,
+                            const char *deviceID)
+{
+    int ret = -1;
+    virJSONValuePtr cmd;
+    virJSONValuePtr reply = NULL;
+
+    if (!(cmd = qemuMonitorJSONMakeCommand("nbd-server-add",
+                                           "s:device", deviceID,
+                                           NULL)))
+        return ret;
+
+    ret = qemuMonitorJSONCommand(mon, cmd, &reply);
+
+    if (ret == 0)
+        ret = qemuMonitorJSONCheckError(cmd, reply);
+
+    virJSONValueFree(cmd);
+    virJSONValueFree(reply);
+    return ret;
+}

diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h
index 506e6f7..1dc2fbe 100644
--- a/src/qemu/qemu_monitor_json.h
+++ b/src/qemu/qemu_monitor_json.h
@@ -339,4 +339,6 @@ char *qemuMonitorJSONGetTargetArch(qemuMonitorPtr mon);
 int qemuMonitorJSONNBDServerStart(qemuMonitorPtr mon,
                                   const char *host,
                                   unsigned int port);
+int qemuMonitorJSONNBDServerAdd(qemuMonitorPtr mon,
+                                const char *deviceID);
 #endif /* QEMU_MONITOR_JSON_H */
-- 
1.7.8.6

This will be used with new migration scheme. This patch creates basically just monitor stub functions. Wiring them into something useful is done in later patches.
---
 src/qemu/qemu_monitor.c      |   21 +++++++++++++++++++
 src/qemu/qemu_monitor.h      |    2 ++
 src/qemu/qemu_monitor_json.c |   23 +++++++++++++++++++++
 src/qemu/qemu_monitor_json.h |    2 ++
 4 files changed, 48 insertions(+), 0 deletions(-)
+
+int
+qemuMonitorJSONNBDServerAdd(qemuMonitorPtr mon,
+                            const char *deviceID)
+{
+    int ret = -1;
+    virJSONValuePtr cmd;
+    virJSONValuePtr reply = NULL;
+
+    if (!(cmd = qemuMonitorJSONMakeCommand("nbd-server-add",
+                                           "s:device", deviceID,
+                                           NULL)))
According to qemu.git, 'writable' is optional, but if omitted, it defaults to false. But doesn't storage migration require 'writable' to be true in order for drive-mirror on the source to be able to actually write into the destination? [You may have dealt with this later in the series, but even if so, I still think this stub should be doing something with "b:writable".]

On 28.11.2012 00:41, Eric Blake wrote:
This will be used with new migration scheme. This patch creates basically just monitor stub functions. Wiring them into something useful is done in later patches.
---
 src/qemu/qemu_monitor.c      |   21 +++++++++++++++++++
 src/qemu/qemu_monitor.h      |    2 ++
 src/qemu/qemu_monitor_json.c |   23 +++++++++++++++++++++
 src/qemu/qemu_monitor_json.h |    2 ++
 4 files changed, 48 insertions(+), 0 deletions(-)
+ +int +qemuMonitorJSONNBDServerAdd(qemuMonitorPtr mon, + const char *deviceID) +{ + int ret = -1; + virJSONValuePtr cmd; + virJSONValuePtr reply = NULL; + + if (!(cmd = qemuMonitorJSONMakeCommand("nbd-server-add", + "s:device", deviceID, + NULL)))
According to qemu.git, 'writable' is optional, but if omitted, it defaults to false. But doesn't storage migration require 'writable' to be true in order for drive-mirror on the source to be able to actually write into the destination?
[You may have dealt with this later in the series, but even if so, I still think this stub should be doing something with "b:writable".]
Ah, there's a bug in qemu: qapi-schema.json says the default is false, however blockdev-nbd.c actually sets the default to true. That would explain why it worked even without explicitly setting this argument. Anyway, I'll update my patch. Meanwhile, I've posted a qemu patch: http://lists.nongnu.org/archive/html/qemu-devel/2012-11/msg03133.html

Michal
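For reference, once the stub grows a "b:writable" argument, the command on the wire would look something like this (the device name is illustrative; the authoritative schema is qemu's qapi-schema.json):

```json
{ "execute": "nbd-server-add",
  "arguments": { "device": "drive-virtio-disk0",
                 "writable": true } }
```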

This will be used after all migration work is done to stop NBD server running on destination. It doesn't take any arguments, just issues a command. --- src/qemu/qemu_monitor.c | 19 +++++++++++++++++++ src/qemu/qemu_monitor.h | 1 + src/qemu/qemu_monitor_json.c | 21 +++++++++++++++++++++ src/qemu/qemu_monitor_json.h | 1 + 4 files changed, 42 insertions(+), 0 deletions(-) diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index b84b961..be70e0a 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -3365,3 +3365,22 @@ int qemuMonitorNBDServerAdd(qemuMonitorPtr mon, return qemuMonitorJSONNBDServerAdd(mon, deviceID); } + +int qemuMonitorNBDServerStop(qemuMonitorPtr mon) +{ + VIR_DEBUG("mon=%p", mon); + + if (!mon) { + virReportError(VIR_ERR_INVALID_ARG, "%s", + _("monitor must not be NULL")); + return -1; + } + + if (!mon->json) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("JSON monitor is required")); + return -1; + } + + return qemuMonitorJSONNBDServerStop(mon); +} diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 09b7866..2754519 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -636,6 +636,7 @@ int qemuMonitorNBDServerStart(qemuMonitorPtr mon, unsigned int port); int qemuMonitorNBDServerAdd(qemuMonitorPtr mon, const char *deviceID); +int qemuMonitorNBDServerStop(qemuMonitorPtr); /** * When running two dd process and using <> redirection, we need a * shell that will not truncate files. 
These two strings serve that purpose. diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index c740837..76805f0 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -4612,3 +4612,24 @@ qemuMonitorJSONNBDServerAdd(qemuMonitorPtr mon, virJSONValueFree(reply); return ret; } + +int +qemuMonitorJSONNBDServerStop(qemuMonitorPtr mon) +{ + int ret = -1; + virJSONValuePtr cmd; + virJSONValuePtr reply = NULL; + + if (!(cmd = qemuMonitorJSONMakeCommand("nbd-server-stop", + NULL))) + return ret; + + ret = qemuMonitorJSONCommand(mon, cmd, &reply); + + if (ret == 0) + ret = qemuMonitorJSONCheckError(cmd, reply); + + virJSONValueFree(cmd); + virJSONValueFree(reply); + return ret; +} diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index 1dc2fbe..abc465c 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -341,4 +341,5 @@ int qemuMonitorJSONNBDServerStart(qemuMonitorPtr mon, unsigned int port); int qemuMonitorJSONNBDServerAdd(qemuMonitorPtr mon, const char *deviceID); +int qemuMonitorJSONNBDServerStop(qemuMonitorPtr mon); #endif /* QEMU_MONITOR_JSON_H */ -- 1.7.8.6

This will be used after all migration work is done to stop NBD server running on destination. It doesn't take any arguments, just issues a command. --- src/qemu/qemu_monitor.c | 19 +++++++++++++++++++ src/qemu/qemu_monitor.h | 1 + src/qemu/qemu_monitor_json.c | 21 +++++++++++++++++++++ src/qemu/qemu_monitor_json.h | 1 + 4 files changed, 42 insertions(+), 0 deletions(-)
+++ b/src/qemu/qemu_monitor.c @@ -3365,3 +3365,22 @@ int qemuMonitorNBDServerAdd(qemuMonitorPtr mon,
return qemuMonitorJSONNBDServerAdd(mon, deviceID); } + +int qemuMonitorNBDServerStop(qemuMonitorPtr mon) +{ + VIR_DEBUG("mon=%p", mon); + + if (!mon) { + virReportError(VIR_ERR_INVALID_ARG, "%s", + _("monitor must not be NULL")); + return -1; + } + + if (!mon->json) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("JSON monitor is required")); + return -1; + }
No change to this patch, but I'm getting rather tired of this copy and paste pattern - every new monitor command has to touch two .h files. I wonder if we should instead switch to more of a callback pattern, where we only have to touch one .h file (the public entry point, and the signature of a callback function), then have qemu_monitor.c do: if (mon->driver.func) mon->driver.func(args) else report error about unsupported then qemu_monitor_json.c would register a table of static callback functions, rather than having to also declare every single function almost verbatim like qemu_monitor.h. But enough of that side-track thought about a potential future reorganization of the code. ACK to this patch; fine as-is.
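The callback-table idea sketched above could look roughly like this. This is purely illustrative: the type and function names, and the void* monitor handle, are made up; real code would use qemuMonitorPtr and report errors via virReportError.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dispatch table: one slot per monitor command.
 * qemu_monitor_json.c would fill this in once, instead of every
 * new command needing a near-verbatim declaration in two headers. */
typedef struct {
    int (*nbdServerStop)(void *mon);
    /* ... one pointer per monitor command ... */
} qemuMonitorDriverTable;

static int jsonNBDServerStop(void *mon)
{
    (void) mon;
    return 0;   /* real code would issue the QMP command here */
}

static const qemuMonitorDriverTable jsonDriver = {
    .nbdServerStop = jsonNBDServerStop,
};

/* Single public entry point in qemu_monitor.c:
 * dispatch through the table, or fail as unsupported. */
static int nbdServerStopViaTable(const qemuMonitorDriverTable *drv, void *mon)
{
    if (!drv || !drv->nbdServerStop)
        return -1;  /* would report VIR_ERR_CONFIG_UNSUPPORTED */
    return drv->nbdServerStop(mon);
}
```

An unregistered slot then turns into a clean "unsupported" error rather than a missing symbol, and only the table definition has to be touched when a command is added.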

On 28.11.2012 00:46, Eric Blake wrote:
This will be used after all migration work is done to stop NBD server running on destination. It doesn't take any arguments, just issues a command. --- src/qemu/qemu_monitor.c | 19 +++++++++++++++++++ src/qemu/qemu_monitor.h | 1 + src/qemu/qemu_monitor_json.c | 21 +++++++++++++++++++++ src/qemu/qemu_monitor_json.h | 1 + 4 files changed, 42 insertions(+), 0 deletions(-)
+++ b/src/qemu/qemu_monitor.c @@ -3365,3 +3365,22 @@ int qemuMonitorNBDServerAdd(qemuMonitorPtr mon,
return qemuMonitorJSONNBDServerAdd(mon, deviceID); } + +int qemuMonitorNBDServerStop(qemuMonitorPtr mon) +{ + VIR_DEBUG("mon=%p", mon); + + if (!mon) { + virReportError(VIR_ERR_INVALID_ARG, "%s", + _("monitor must not be NULL")); + return -1; + } + + if (!mon->json) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("JSON monitor is required")); + return -1; + }
No change to this patch, but I'm getting rather tired of this copy and paste pattern - every new monitor command has to touch two .h files. I wonder if we should instead switch to more of a callback pattern, where we only have to touch one .h file (the public entry point, and the signature of a callback function), then have qemu_monitor.c do:
if (mon->driver.func) mon->driver.func(args) else report error about unsupported
then qemu_monitor_json.c would register a table of static callback functions, rather than having to also declare every single function almost verbatim like qemu_monitor.h.
But enough of that side-track thought about a potential future reorganization of the code.
ACK to this patch; fine as-is.
Yeah, this patch just follows the structure we already have. I agree that your suggestion is nicer, but it's not really connected to the feature I am introducing. It should be saved for a separate patch set.

Michal

This is a stub internal API just for now. Its purpose in life is to start NBD server and feed it with all domain disks. When adding a disk to NBD server, it is addressed via its alias (id= param on qemu command line). --- src/qemu/qemu_driver.c | 8 +++--- src/qemu/qemu_migration.c | 59 +++++++++++++++++++++++++++++++++++--------- src/qemu/qemu_migration.h | 6 +++- 3 files changed, 55 insertions(+), 18 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index d4cafcc..493fbb9 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -9625,7 +9625,7 @@ qemudDomainMigratePrepareTunnel(virConnectPtr dconn, ret = qemuMigrationPrepareTunnel(driver, dconn, NULL, 0, NULL, NULL, /* No cookies in v2 */ - st, dname, dom_xml); + st, dname, dom_xml, flags); cleanup: qemuDriverUnlock(driver); @@ -9685,7 +9685,7 @@ qemudDomainMigratePrepare2(virConnectPtr dconn, ret = qemuMigrationPrepareDirect(driver, dconn, NULL, 0, NULL, NULL, /* No cookies */ uri_in, uri_out, - dname, dom_xml); + dname, dom_xml, flags); cleanup: qemuDriverUnlock(driver); @@ -9922,7 +9922,7 @@ qemuDomainMigratePrepare3(virConnectPtr dconn, cookiein, cookieinlen, cookieout, cookieoutlen, uri_in, uri_out, - dname, dom_xml); + dname, dom_xml, flags); cleanup: qemuDriverUnlock(driver); @@ -9967,7 +9967,7 @@ qemuDomainMigratePrepareTunnel3(virConnectPtr dconn, ret = qemuMigrationPrepareTunnel(driver, dconn, cookiein, cookieinlen, cookieout, cookieoutlen, - st, dname, dom_xml); + st, dname, dom_xml, flags); qemuDriverUnlock(driver); cleanup: diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index cd59eda..7e86c33 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1074,6 +1074,29 @@ error: return NULL; } +/** + * qemuMigrationStartNBDServer: + * @driver: qemu driver + * @vm: domain + * @nbdPort: which port is NBD server listening to + * + * Starts NBD server. 
This is a newer method to copy + * storage during migration than using 'blk' and 'inc' + * arguments in 'migrate' monitor command. + * Error is reported here. + * + * Returns 0 on success, -1 otherwise. + */ +static int +qemuMigrationStartNBDServer(struct qemud_driver *driver ATTRIBUTE_UNUSED, + virDomainObjPtr vm ATTRIBUTE_UNUSED, + int *nbdPort ATTRIBUTE_UNUSED) +{ + /* do nothing for now */ + return 0; +} + + /* Validate whether the domain is safe to migrate. If vm is NULL, * then this is being run in the v2 Prepare stage on the destination * (where we only have the target xml); if vm is provided, then this @@ -1575,7 +1598,8 @@ qemuMigrationPrepareAny(struct qemud_driver *driver, const char *dname, const char *dom_xml, const char *migrateFrom, - virStreamPtr st) + virStreamPtr st, + unsigned long flags) { virDomainDefPtr def = NULL; virDomainObjPtr vm = NULL; @@ -1719,9 +1743,17 @@ qemuMigrationPrepareAny(struct qemud_driver *driver, VIR_DEBUG("Received no lockstate"); } - /* dummy place holder for real work */ - nbdPort = 0; - cookie_flags |= QEMU_MIGRATION_COOKIE_NBD; + if ((flags & VIR_MIGRATE_NON_SHARED_INC || + flags & VIR_MIGRATE_NON_SHARED_DISK) && + mig->nbd && qemuCapsGet(priv->caps, QEMU_CAPS_NBD_SERVER)) { + /* both source and destination qemus support nbd-server-* + * commands and user requested disk copy. 
Use the new ones */ + if (qemuMigrationStartNBDServer(driver, vm, &nbdPort) < 0) { + /* error already reported */ + goto endjob; + } + cookie_flags |= QEMU_MIGRATION_COOKIE_NBD; + } if (qemuMigrationBakeCookie(mig, driver, vm, nbdPort, cookieout, cookieoutlen, cookie_flags) < 0) { @@ -1791,21 +1823,23 @@ qemuMigrationPrepareTunnel(struct qemud_driver *driver, int *cookieoutlen, virStreamPtr st, const char *dname, - const char *dom_xml) + const char *dom_xml, + unsigned long flags) { int ret; VIR_DEBUG("driver=%p, dconn=%p, cookiein=%s, cookieinlen=%d, " - "cookieout=%p, cookieoutlen=%p, st=%p, dname=%s, dom_xml=%s", + "cookieout=%p, cookieoutlen=%p, st=%p, dname=%s, dom_xml=%s " + "flags=%lx", driver, dconn, NULLSTR(cookiein), cookieinlen, - cookieout, cookieoutlen, st, NULLSTR(dname), dom_xml); + cookieout, cookieoutlen, st, NULLSTR(dname), dom_xml, flags); /* QEMU will be started with -incoming stdio (which qemu_command might * convert to exec:cat or fd:n) */ ret = qemuMigrationPrepareAny(driver, dconn, cookiein, cookieinlen, cookieout, cookieoutlen, dname, dom_xml, - "stdio", st); + "stdio", st, flags); return ret; } @@ -1820,7 +1854,8 @@ qemuMigrationPrepareDirect(struct qemud_driver *driver, const char *uri_in, char **uri_out, const char *dname, - const char *dom_xml) + const char *dom_xml, + unsigned long flags) { static int port = 0; int this_port; @@ -1831,10 +1866,10 @@ qemuMigrationPrepareDirect(struct qemud_driver *driver, VIR_DEBUG("driver=%p, dconn=%p, cookiein=%s, cookieinlen=%d, " "cookieout=%p, cookieoutlen=%p, uri_in=%s, uri_out=%p, " - "dname=%s, dom_xml=%s", + "dname=%s, dom_xml=%s flags=%lx", driver, dconn, NULLSTR(cookiein), cookieinlen, cookieout, cookieoutlen, NULLSTR(uri_in), uri_out, - NULLSTR(dname), dom_xml); + NULLSTR(dname), dom_xml, flags); /* The URI passed in may be NULL or a string "tcp://somehostname:port". 
* @@ -1916,7 +1951,7 @@ qemuMigrationPrepareDirect(struct qemud_driver *driver, ret = qemuMigrationPrepareAny(driver, dconn, cookiein, cookieinlen, cookieout, cookieoutlen, dname, dom_xml, - migrateFrom, NULL); + migrateFrom, NULL, flags); cleanup: VIR_FREE(hostname); if (ret != 0) diff --git a/src/qemu/qemu_migration.h b/src/qemu/qemu_migration.h index 7a2269a..8e411ef 100644 --- a/src/qemu/qemu_migration.h +++ b/src/qemu/qemu_migration.h @@ -97,7 +97,8 @@ int qemuMigrationPrepareTunnel(struct qemud_driver *driver, int *cookieoutlen, virStreamPtr st, const char *dname, - const char *dom_xml); + const char *dom_xml, + unsigned long flags); int qemuMigrationPrepareDirect(struct qemud_driver *driver, virConnectPtr dconn, @@ -108,7 +109,8 @@ int qemuMigrationPrepareDirect(struct qemud_driver *driver, const char *uri_in, char **uri_out, const char *dname, - const char *dom_xml); + const char *dom_xml, + unsigned long flags); int qemuMigrationPerform(struct qemud_driver *driver, virConnectPtr conn, -- 1.7.8.6

On Tue, 2012-11-27 at 19:50 +0100, Michal Privoznik wrote:
This is a stub internal API just for now. Its purpose in life is to start NBD server and feed it with all domain disks. When adding a disk to NBD server, it is addressed via its alias (id= param on qemu command line). --- src/qemu/qemu_driver.c | 8 +++--- src/qemu/qemu_migration.c | 59 +++++++++++++++++++++++++++++++++++--------- src/qemu/qemu_migration.h | 6 +++- 3 files changed, 55 insertions(+), 18 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index d4cafcc..493fbb9 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -9625,7 +9625,7 @@ qemudDomainMigratePrepareTunnel(virConnectPtr dconn,
ret = qemuMigrationPrepareTunnel(driver, dconn, NULL, 0, NULL, NULL, /* No cookies in v2 */ - st, dname, dom_xml); + st, dname, dom_xml, flags);
cleanup: qemuDriverUnlock(driver); @@ -9685,7 +9685,7 @@ qemudDomainMigratePrepare2(virConnectPtr dconn, ret = qemuMigrationPrepareDirect(driver, dconn, NULL, 0, NULL, NULL, /* No cookies */ uri_in, uri_out, - dname, dom_xml); + dname, dom_xml, flags);
cleanup: qemuDriverUnlock(driver); @@ -9922,7 +9922,7 @@ qemuDomainMigratePrepare3(virConnectPtr dconn, cookiein, cookieinlen, cookieout, cookieoutlen, uri_in, uri_out, - dname, dom_xml); + dname, dom_xml, flags);
cleanup: qemuDriverUnlock(driver); @@ -9967,7 +9967,7 @@ qemuDomainMigratePrepareTunnel3(virConnectPtr dconn, ret = qemuMigrationPrepareTunnel(driver, dconn, cookiein, cookieinlen, cookieout, cookieoutlen, - st, dname, dom_xml); + st, dname, dom_xml, flags); qemuDriverUnlock(driver);
cleanup: diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index cd59eda..7e86c33 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1074,6 +1074,29 @@ error: return NULL; }
+/** + * qemuMigrationStartNBDServer: + * @driver: qemu driver + * @vm: domain + * @nbdPort: which port is NBD server listening to + * + * Starts NBD server. This is a newer method to copy + * storage during migration than using 'blk' and 'inc' + * arguments in 'migrate' monitor command. + * Error is reported here. + * + * Returns 0 on success, -1 otherwise. + */ +static int +qemuMigrationStartNBDServer(struct qemud_driver *driver ATTRIBUTE_UNUSED, + virDomainObjPtr vm ATTRIBUTE_UNUSED, + int *nbdPort ATTRIBUTE_UNUSED) +{ + /* do nothing for now */ + return 0; +} + + /* Validate whether the domain is safe to migrate. If vm is NULL, * then this is being run in the v2 Prepare stage on the destination * (where we only have the target xml); if vm is provided, then this @@ -1575,7 +1598,8 @@ qemuMigrationPrepareAny(struct qemud_driver *driver, const char *dname, const char *dom_xml, const char *migrateFrom, - virStreamPtr st) + virStreamPtr st, + unsigned long flags) { virDomainDefPtr def = NULL; virDomainObjPtr vm = NULL; @@ -1719,9 +1743,17 @@ qemuMigrationPrepareAny(struct qemud_driver *driver, VIR_DEBUG("Received no lockstate"); }
- /* dummy place holder for real work */ - nbdPort = 0; - cookie_flags |= QEMU_MIGRATION_COOKIE_NBD; + if ((flags & VIR_MIGRATE_NON_SHARED_INC || + flags & VIR_MIGRATE_NON_SHARED_DISK) && + mig->nbd && qemuCapsGet(priv->caps, QEMU_CAPS_NBD_SERVER)) { + /* both source and destination qemus support nbd-server-* + * commands and user requested disk copy. Use the new ones */ + if (qemuMigrationStartNBDServer(driver, vm, &nbdPort) < 0) {
So nbdPort is generated by qemuMigrationNextPort() (08/11), not taken from the cookie element 'nbd/port' (02/11); as a result, it seems the previous cookie baking is rather needless.
+ /* error already reported */ + goto endjob; + } + cookie_flags |= QEMU_MIGRATION_COOKIE_NBD; + }
if (qemuMigrationBakeCookie(mig, driver, vm, nbdPort, cookieout, cookieoutlen, cookie_flags) < 0) { @@ -1791,21 +1823,23 @@ qemuMigrationPrepareTunnel(struct qemud_driver *driver, int *cookieoutlen, virStreamPtr st, const char *dname, - const char *dom_xml) + const char *dom_xml, + unsigned long flags) { int ret;
VIR_DEBUG("driver=%p, dconn=%p, cookiein=%s, cookieinlen=%d, " - "cookieout=%p, cookieoutlen=%p, st=%p, dname=%s, dom_xml=%s", + "cookieout=%p, cookieoutlen=%p, st=%p, dname=%s, dom_xml=%s " + "flags=%lx", driver, dconn, NULLSTR(cookiein), cookieinlen, - cookieout, cookieoutlen, st, NULLSTR(dname), dom_xml); + cookieout, cookieoutlen, st, NULLSTR(dname), dom_xml, flags);
/* QEMU will be started with -incoming stdio (which qemu_command might * convert to exec:cat or fd:n) */ ret = qemuMigrationPrepareAny(driver, dconn, cookiein, cookieinlen, cookieout, cookieoutlen, dname, dom_xml, - "stdio", st); + "stdio", st, flags); return ret; }
@@ -1820,7 +1854,8 @@ qemuMigrationPrepareDirect(struct qemud_driver *driver, const char *uri_in, char **uri_out, const char *dname, - const char *dom_xml) + const char *dom_xml, + unsigned long flags) { static int port = 0; int this_port; @@ -1831,10 +1866,10 @@ qemuMigrationPrepareDirect(struct qemud_driver *driver,
VIR_DEBUG("driver=%p, dconn=%p, cookiein=%s, cookieinlen=%d, " "cookieout=%p, cookieoutlen=%p, uri_in=%s, uri_out=%p, " - "dname=%s, dom_xml=%s", + "dname=%s, dom_xml=%s flags=%lx", driver, dconn, NULLSTR(cookiein), cookieinlen, cookieout, cookieoutlen, NULLSTR(uri_in), uri_out, - NULLSTR(dname), dom_xml); + NULLSTR(dname), dom_xml, flags);
/* The URI passed in may be NULL or a string "tcp://somehostname:port". * @@ -1916,7 +1951,7 @@ qemuMigrationPrepareDirect(struct qemud_driver *driver,
ret = qemuMigrationPrepareAny(driver, dconn, cookiein, cookieinlen, cookieout, cookieoutlen, dname, dom_xml, - migrateFrom, NULL); + migrateFrom, NULL, flags); cleanup: VIR_FREE(hostname); if (ret != 0) diff --git a/src/qemu/qemu_migration.h b/src/qemu/qemu_migration.h index 7a2269a..8e411ef 100644 --- a/src/qemu/qemu_migration.h +++ b/src/qemu/qemu_migration.h @@ -97,7 +97,8 @@ int qemuMigrationPrepareTunnel(struct qemud_driver *driver, int *cookieoutlen, virStreamPtr st, const char *dname, - const char *dom_xml); + const char *dom_xml, + unsigned long flags);
int qemuMigrationPrepareDirect(struct qemud_driver *driver, virConnectPtr dconn, @@ -108,7 +109,8 @@ int qemuMigrationPrepareDirect(struct qemud_driver *driver, const char *uri_in, char **uri_out, const char *dname, - const char *dom_xml); + const char *dom_xml, + unsigned long flags);
int qemuMigrationPerform(struct qemud_driver *driver, virConnectPtr conn,
-- regards! li guang linux kernel team at FNST, china thinking with brain but heart living with heart but brain

On 28.11.2012 04:03, li guang wrote:
On Tue, 2012-11-27 at 19:50 +0100, Michal Privoznik wrote:
This is a stub internal API just for now. Its purpose in life is to start NBD server and feed it with all domain disks. When adding a disk to NBD server, it is addressed via its alias (id= param on qemu command line). --- src/qemu/qemu_driver.c | 8 +++--- src/qemu/qemu_migration.c | 59 +++++++++++++++++++++++++++++++++++--------- src/qemu/qemu_migration.h | 6 +++- 3 files changed, 55 insertions(+), 18 deletions(-)
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index cd59eda..7e86c33 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1074,6 +1074,29 @@ error: return NULL; }
+/** + * qemuMigrationStartNBDServer: + * @driver: qemu driver + * @vm: domain + * @nbdPort: which port is NBD server listening to + * + * Starts NBD server. This is a newer method to copy + * storage during migration than using 'blk' and 'inc' + * arguments in 'migrate' monitor command. + * Error is reported here. + * + * Returns 0 on success, -1 otherwise. + */ +static int +qemuMigrationStartNBDServer(struct qemud_driver *driver ATTRIBUTE_UNUSED, + virDomainObjPtr vm ATTRIBUTE_UNUSED, + int *nbdPort ATTRIBUTE_UNUSED) +{ + /* do nothing for now */ + return 0; +} + + /* Validate whether the domain is safe to migrate. If vm is NULL, * then this is being run in the v2 Prepare stage on the destination * (where we only have the target xml); if vm is provided, then this @@ -1575,7 +1598,8 @@ qemuMigrationPrepareAny(struct qemud_driver *driver, const char *dname, const char *dom_xml, const char *migrateFrom, - virStreamPtr st) + virStreamPtr st, + unsigned long flags) { virDomainDefPtr def = NULL; virDomainObjPtr vm = NULL; @@ -1719,9 +1743,17 @@ qemuMigrationPrepareAny(struct qemud_driver *driver, VIR_DEBUG("Received no lockstate"); }
- /* dummy place holder for real work */ - nbdPort = 0; - cookie_flags |= QEMU_MIGRATION_COOKIE_NBD; + if ((flags & VIR_MIGRATE_NON_SHARED_INC || + flags & VIR_MIGRATE_NON_SHARED_DISK) && + mig->nbd && qemuCapsGet(priv->caps, QEMU_CAPS_NBD_SERVER)) { + /* both source and destination qemus support nbd-server-* + * commands and user requested disk copy. Use the new ones */ + if (qemuMigrationStartNBDServer(driver, vm, &nbdPort) < 0) {
so, nbdPort is generated by qemuMigrationNextPort() (08/11) not by cookie element 'nbd/port' (02/11), as a result, seems the previous cookie baking is rather needless.
Yes, that's correct; nbdPort is generated by that function. However, in this patch I wanted to point out that it is qemuMigrationStartNBDServer which decides which port the NBD server will listen on. The flow is exactly the opposite: qemuMigrationStartNBDServer starts the server on a port it chooses, and that port is returned to the caller, which stores it in the cookie and sends it back to the source. The source needs to know which port the data should be sent to, but it has no information about which ports are free on the destination. Long story short, this patch is mainly here to show the flow: an NBDServerStart function is called (under certain circumstances), it allocates a port which is then injected into the migration cookie, and the cookie is then sent from dst to src. How the function works inside is not important for this patch.

Michal
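In the Prepare-to-Perform direction the cookie then only needs to carry that port back. Assuming the 'nbd' element introduced in 02/11 exposes it as an attribute (the exact schema is defined there; this is just a sketch with a made-up port value), it would look like:

```xml
<nbd port='49153'/>
```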

There's a code snippet which allocates a port for incoming migration. We will need this for allocating a port for incoming disks as well. Hence, move this snippet into a separate function called qemuMigrationNextPort(). --- src/qemu/qemu_migration.c | 19 +++++++++++++------ 1 files changed, 13 insertions(+), 6 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 7e86c33..7a2d0a2 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1074,6 +1074,17 @@ error: return NULL; } +static int +qemuMigrationNextPort(void) { + static int port = 0; + int ret = QEMUD_MIGRATION_FIRST_PORT + port++; + + if (port == QEMUD_MIGRATION_NUM_PORTS) + port = 0; + + return ret; +} + /** * qemuMigrationStartNBDServer: * @driver: qemu driver @@ -1857,7 +1868,6 @@ qemuMigrationPrepareDirect(struct qemud_driver *driver, const char *dom_xml, unsigned long flags) { - static int port = 0; int this_port; char *hostname = NULL; char migrateFrom [64]; @@ -1882,8 +1892,7 @@ qemuMigrationPrepareDirect(struct qemud_driver *driver, * to be a correct hostname which refers to the target machine). */ if (uri_in == NULL) { - this_port = QEMUD_MIGRATION_FIRST_PORT + port++; - if (port == QEMUD_MIGRATION_NUM_PORTS) port = 0; + this_port = qemuMigrationNextPort(); /* Get hostname */ if ((hostname = virGetHostname(NULL)) == NULL) @@ -1922,9 +1931,7 @@ qemuMigrationPrepareDirect(struct qemud_driver *driver, p = strrchr(uri_in, ':'); if (p == strchr(uri_in, ':')) { /* Generate a port */ - this_port = QEMUD_MIGRATION_FIRST_PORT + port++; - if (port == QEMUD_MIGRATION_NUM_PORTS) - port = 0; + this_port = qemuMigrationNextPort(); /* Caller frees */ if (virAsprintf(uri_out, "%s:%d", uri_in, this_port) < 0) { -- 1.7.8.6
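For clarity, the factored-out allocator behaves like this standalone sketch. The constants mirror libvirt's historical defaults (a 64-port range starting at 49152), but treat the exact values as illustrative:

```c
#include <assert.h>

/* Illustrative values; libvirt defines these in its qemu driver. */
#define QEMUD_MIGRATION_FIRST_PORT 49152
#define QEMUD_MIGRATION_NUM_PORTS  64

/* Round-robin cursor into the migration port range. */
static int port = 0;

static int qemuMigrationNextPort(void)
{
    int ret = QEMUD_MIGRATION_FIRST_PORT + port++;

    if (port == QEMUD_MIGRATION_NUM_PORTS)
        port = 0;

    return ret;
}
```

Each call hands out the next port in the range and wraps back to the first one after QEMUD_MIGRATION_NUM_PORTS allocations, so incoming migration and incoming disks can both draw from the same pool.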

We need to start NBD server and feed it with all non-<shared/> disks. However, after qemuDomainObjEnterMonitorAsync the domain object is unlocked so we cannot touch its disk definitions. Therefore, we must prepare the list of disk IDs prior entering monitor. --- src/qemu/qemu_migration.c | 65 +++++++++++++++++++++++++++++++++++++++++--- 1 files changed, 60 insertions(+), 5 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 7a2d0a2..3f35a8c 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -33,6 +33,7 @@ #include "qemu_domain.h" #include "qemu_process.h" #include "qemu_capabilities.h" +#include "qemu_command.h" #include "qemu_cgroup.h" #include "domain_audit.h" @@ -1099,12 +1100,66 @@ qemuMigrationNextPort(void) { * Returns 0 on success, -1 otherwise. */ static int -qemuMigrationStartNBDServer(struct qemud_driver *driver ATTRIBUTE_UNUSED, - virDomainObjPtr vm ATTRIBUTE_UNUSED, - int *nbdPort ATTRIBUTE_UNUSED) +qemuMigrationStartNBDServer(struct qemud_driver *driver, + virDomainObjPtr vm, + int *nbdPort) { - /* do nothing for now */ - return 0; + int ret = -1; + qemuDomainObjPrivatePtr priv = vm->privateData; + int port = qemuMigrationNextPort(); + const char *listen = "0.0.0.0"; + char **disks = NULL; + size_t i, ndisks = 0; + + for (i = 0; i < vm->def->ndisks; i++) { + virDomainDiskDefPtr disk = vm->def->disks[i]; + + /* skip shared disks */ + if (disk->shared) + continue; + + if (VIR_REALLOC_N(disks, ndisks + 1) < 0) { + virReportOOMError(); + goto cleanup; + } + + if (virAsprintf(&disks[ndisks++], "%s%s", + QEMU_DRIVE_HOST_PREFIX, disk->info.alias) < 0) { + virReportOOMError(); + goto cleanup; + } + } + + if (!ndisks) { + /* Hooray! 
Nothing to care about */ + ret = 0; + goto cleanup; + } + + if (qemuDomainObjEnterMonitorAsync(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_IN) < 0) + goto cleanup; + + if (qemuMonitorNBDServerStart(priv->mon, listen, port) < 0) + goto endjob; + + for (i = 0; i < ndisks; i++) { + if (qemuMonitorNBDServerAdd(priv->mon, disks[i]) < 0) { + VIR_WARN("Unable to add '%s' to NDB server", disks[i]); + goto endjob; + } + } + + *nbdPort = port; + ret = 0; + +endjob: + qemuDomainObjExitMonitorWithDriver(driver, vm); +cleanup: + for (i = 0; i < ndisks; i++) + VIR_FREE(disks[i]); + VIR_FREE(disks); + return ret; } -- 1.7.8.6
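In QMP terms, the loop above amounts to roughly this exchange on the destination monitor (host, port, and device name are illustrative; note that qemu's InetSocketAddress takes the port as a string):

```json
{ "execute": "nbd-server-start",
  "arguments": { "addr": { "type": "inet",
                           "data": { "host": "0.0.0.0",
                                     "port": "49153" } } } }
{ "execute": "nbd-server-add",
  "arguments": { "device": "drive-virtio-disk0" } }
```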

This function does the source part of NBD magic. It invokes drive-mirror on each non shared disk and wait till the mirroring process completes. When it does we can proceed with migration. Currently, an active waiting is done: every 50ms libvirt asks qemu if block-job is finished or not. However, once the job finishes, qemu doesn't report its progress so we can only assume if the job finished successfully or not. The better solution would be to listen to the event which is sent as soon as the job finishes. The event does contain the result of job. --- The polling interval was just copied as-is. Maybe we can relax 50ms as disks are expected to migrate much longer than just bare RAM + qemu internal state. src/qemu/qemu_migration.c | 190 +++++++++++++++++++++++++++++++++++++++++++-- 1 files changed, 184 insertions(+), 6 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 3f35a8c..2c66d8c 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1162,6 +1162,177 @@ cleanup: return ret; } +/** + * qemuMigrationDiskMirror: + * @driver: qemu driver + * @vm: domain + * @mig: migration cookie + * @migrate_flags: migrate monitor command flags + * + * Run drive-mirror to feed NBD server running on dst and + * wait till the process completes. On success, update + * @migrate_flags so we don't tell 'migrate' command to + * do the very same operation. + * + * Returns 0 on success (@migrate_flags updated), + * -1 otherwise. 
+ */ +static int +qemuMigrationDriveMirror(struct qemud_driver *driver, + virDomainObjPtr vm, + qemuMigrationCookiePtr mig, + const char *host, + unsigned long speed, + unsigned int *migrate_flags) +{ + int ret = -1; + int mon_ret; + qemuDomainObjPrivatePtr priv = vm->privateData; + size_t ndisks = 0, i; + char **disks = NULL; + unsigned int mirror_flags = VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT; + char *nbd_dest = NULL; + + if (*migrate_flags & QEMU_MONITOR_MIGRATE_NON_SHARED_DISK) { + /* dummy */ + } else if (*migrate_flags & QEMU_MONITOR_MIGRATE_NON_SHARED_INC) { + mirror_flags |= VIR_DOMAIN_BLOCK_REBASE_SHALLOW; + } else { + /* Nothing to be done here. Claim success */ + return 0; + } + + for (i = 0; i < vm->def->ndisks; i++) { + virDomainDiskDefPtr disk = vm->def->disks[i]; + + /* skip shared disks */ + if (disk->shared) + continue; + + if (VIR_REALLOC_N(disks, ndisks + 1) < 0) { + virReportOOMError(); + goto cleanup; + } + + if (virAsprintf(&disks[ndisks++], "%s%s", + QEMU_DRIVE_HOST_PREFIX, disk->info.alias) < 0) { + virReportOOMError(); + goto cleanup; + } + } + + if (!ndisks) { + /* Hooray! Nothing to care about */ + ret = 0; + goto cleanup; + } + + if (!mig->nbd) { + /* Destination doesn't support NBD server. + * Fall back to previous implementation. + * XXX Or should we report an error here? 
 */
+        VIR_DEBUG("Destination doesn't support NBD server "
+                  "Falling back to previous implementation.");
+        ret = 0;
+        goto cleanup;
+    }
+
+    for (i = 0; i < ndisks; i++) {
+        virDomainBlockJobInfo info;
+        VIR_FREE(nbd_dest);
+        if (virAsprintf(&nbd_dest, "nbd:%s:%u:exportname=%s",
+                        host, mig->nbd->port, disks[i]) < 0) {
+            virReportOOMError();
+            goto error;
+        }
+
+        if (qemuDomainObjEnterMonitorAsync(driver, vm,
+                                           QEMU_ASYNC_JOB_MIGRATION_OUT) < 0)
+            goto error;
+        mon_ret = qemuMonitorDriveMirror(priv->mon, disks[i], nbd_dest,
+                                         NULL, speed, mirror_flags);
+        qemuDomainObjExitMonitorWithDriver(driver, vm);
+
+        if (mon_ret < 0)
+            goto error;
+
+        /* wait for completion */
+        while (true) {
+            /* Poll every 50ms for progress & to allow cancellation */
+            struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000ull };
+            if (qemuDomainObjEnterMonitorAsync(driver, vm,
+                                               QEMU_ASYNC_JOB_MIGRATION_OUT) < 0)
+                goto error;
+            if (priv->job.asyncAbort) {
+                /* explicitly do this *after* we entered the monitor,
+                 * as this is a critical section so we are guaranteed
+                 * priv->job.asyncAbort will not change */
+                qemuDomainObjExitMonitorWithDriver(driver, vm);
+                virReportError(VIR_ERR_OPERATION_ABORTED, _("%s: %s"),
+                               qemuDomainAsyncJobTypeToString(priv->job.asyncJob),
+                               _("canceled by client"));
+                goto cleanup;
+            }
+            mon_ret = qemuMonitorBlockJob(priv->mon, disks[i], NULL, 0,
+                                          &info, BLOCK_JOB_INFO, true);
+            qemuDomainObjExitMonitorWithDriver(driver, vm);
+
+            if (mon_ret < 0) {
+                /* qemu doesn't report finished jobs */
+                VIR_WARN("Unable to query drive-mirror job status. "
+                         "Stop polling on '%s' cur:%llu end:%llu",
+                         disks[i], info.cur, info.end);
+                break;
+            }
+
+            if (info.cur == info.end) {
+                VIR_DEBUG("Drive mirroring of '%s' completed", disks[i]);
+                break;
+            }
+
+            /* XXX Frankly speaking, we should listen to the events,
+             * instead of doing this. But this works for now and we
+             * are doing something similar in migration itself anyway */
+
+            virDomainObjUnlock(vm);
+            qemuDriverUnlock(driver);
+
+            nanosleep(&ts, NULL);
+
+            qemuDriverLock(driver);
+            virDomainObjLock(vm);
+
+        }
+    }
+
+    /* okay, copied. modify migrate_flags */
+    *migrate_flags &= ~(QEMU_MONITOR_MIGRATE_NON_SHARED_DISK |
+                        QEMU_MONITOR_MIGRATE_NON_SHARED_INC);
+    ret = 0;
+
+cleanup:
+    for (i = 0; i < ndisks; i++)
+        VIR_FREE(disks[i]);
+    VIR_FREE(disks);
+    VIR_FREE(nbd_dest);
+    return ret;
+
+error:
+    /* cancel any outstanding jobs */
+    if (qemuDomainObjEnterMonitorAsync(driver, vm,
+                                       QEMU_ASYNC_JOB_MIGRATION_OUT) == 0) {
+        while (i) {
+            if (qemuMonitorBlockJob(priv->mon, disks[i], NULL, 0,
+                                    NULL, BLOCK_JOB_ABORT, true) < 0)
+                VIR_WARN("Unable to cancel block-job on '%s'", disks[i]);
+            i--;
+        }
+        qemuDomainObjExitMonitorWithDriver(driver, vm);
+    } else {
+        VIR_WARN("Unable to enter monitor. No block job cancelled");
+    }
+    goto cleanup;
+}

 /* Validate whether the domain is safe to migrate. If vm is NULL,
  * then this is being run in the v2 Prepare stage on the destination
@@ -2320,6 +2491,12 @@ qemuMigrationRun(struct qemud_driver *driver,
                  cookieout, cookieoutlen, flags, resource,
                  spec, spec->destType, spec->fwdType);
 
+    if (flags & VIR_MIGRATE_NON_SHARED_DISK)
+        migrate_flags |= QEMU_MONITOR_MIGRATE_NON_SHARED_DISK;
+
+    if (flags & VIR_MIGRATE_NON_SHARED_INC)
+        migrate_flags |= QEMU_MONITOR_MIGRATE_NON_SHARED_INC;
+
     if (virLockManagerPluginUsesState(driver->lockManager) &&
         !cookieout) {
         virReportError(VIR_ERR_INTERNAL_ERROR,
@@ -2337,6 +2514,13 @@ qemuMigrationRun(struct qemud_driver *driver,
     if (qemuDomainMigrateGraphicsRelocate(driver, vm, mig) < 0)
         VIR_WARN("unable to provide data for graphics client relocation");
 
+    /* this will update migrate_flags on success */
+    if (qemuMigrationDriveMirror(driver, vm, mig, spec->dest.host.name,
+                                 migrate_speed, &migrate_flags) < 0) {
+        /* error reported by helper func */
+        goto cleanup;
+    }
+
     /* Before EnterMonitor, since qemuMigrationSetOffline already does that */
     if (!(flags & VIR_MIGRATE_LIVE) &&
         virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) {
@@ -2364,12 +2548,6 @@ qemuMigrationRun(struct qemud_driver *driver,
         goto cleanup;
     }
 
-    if (flags & VIR_MIGRATE_NON_SHARED_DISK)
-        migrate_flags |= QEMU_MONITOR_MIGRATE_NON_SHARED_DISK;
-
-    if (flags & VIR_MIGRATE_NON_SHARED_INC)
-        migrate_flags |= QEMU_MONITOR_MIGRATE_NON_SHARED_INC;
-
     /* connect to the destination qemu if needed */
     if (spec->destType == MIGRATION_DEST_CONNECT_HOST &&
         qemuMigrationConnect(driver, vm, spec) < 0) {
-- 
1.7.8.6

With the new NBD storage migration approach there are several requirements that need to be met for successful use of the feature. One of them is that the file representing a disk on the destination must be at least as big as on the source. Hence, we must transfer a list of pairs [disk source, size] and check on the destination that this requirement is met and/or take actions to meet it.
---
 src/qemu/qemu_migration.c | 162 ++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 159 insertions(+), 3 deletions(-)

diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index 2c66d8c..f8e52fb 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -27,6 +27,9 @@
 #include <gnutls/x509.h>
 #include <fcntl.h>
 #include <poll.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/stat.h>
 
 #include "qemu_migration.h"
 #include "qemu_monitor.h"
@@ -131,6 +134,15 @@ struct _qemuMigrationCookieNBD {
        Negative one is meant to be sent when translating
        from perform to finish phase to let destination
        know it's safe to stop NBD server. */
+
+    /* The list of pairs [disk-alias,disk-size] (in Bytes).
+     * This is needed because the same disk size is one of
+     * prerequisites for NBD storage migration. */
+    size_t ndisks;
+    struct {
+        char *src;   /* disk alias */
+        size_t bytes;
+    } *disk;
 };
 
 typedef struct _qemuMigrationCookie qemuMigrationCookie;
@@ -193,6 +205,21 @@ qemuMigrationCookieNetworkFree(qemuMigrationCookieNetworkPtr network)
 }
 
 
+static void
+qemuMigrationCookieNBDFree(qemuMigrationCookieNBDPtr nbd)
+{
+    size_t i;
+
+    if (!nbd)
+        return;
+
+    for (i = 0; i < nbd->ndisks; i++)
+        VIR_FREE(nbd->disk[i].src);
+    VIR_FREE(nbd->disk);
+    VIR_FREE(nbd);
+}
+
+
 static void qemuMigrationCookieFree(qemuMigrationCookiePtr mig)
 {
     if (!mig)
@@ -204,12 +231,13 @@ static void qemuMigrationCookieFree(qemuMigrationCookiePtr mig)
     if (mig->flags & QEMU_MIGRATION_COOKIE_NETWORK)
         qemuMigrationCookieNetworkFree(mig->network);
 
+    qemuMigrationCookieNBDFree(mig->nbd);
+
     VIR_FREE(mig->localHostname);
     VIR_FREE(mig->remoteHostname);
     VIR_FREE(mig->name);
     VIR_FREE(mig->lockState);
     VIR_FREE(mig->lockDriver);
-    VIR_FREE(mig->nbd);
     VIR_FREE(mig);
 }
 
@@ -512,8 +540,12 @@ qemuMigrationCookieAddNetwork(qemuMigrationCookiePtr mig,
 
 static int
 qemuMigrationCookieAddNBD(qemuMigrationCookiePtr mig,
+                          virDomainObjPtr vm,
                           int nbdPort)
 {
+    qemuDomainObjPrivatePtr priv = vm->privateData;
+    size_t i;
+
     /* It is not a bug if there already is a NBD data */
     if (!mig->nbd &&
         VIR_ALLOC(mig->nbd) < 0) {
@@ -521,6 +553,38 @@ qemuMigrationCookieAddNBD(qemuMigrationCookiePtr mig,
         return -1;
     }
 
+    /* in Begin phase add info about disks */
+    if (priv->job.phase == QEMU_MIGRATION_PHASE_BEGIN3 &&
+        vm->def->ndisks) {
+        if (VIR_ALLOC_N(mig->nbd->disk, vm->def->ndisks) < 0) {
+            virReportOOMError();
+            return -1;
+        }
+
+        for (i = 0; i < vm->def->ndisks; i++) {
+            virDomainDiskDefPtr disk = vm->def->disks[i];
+            struct stat sb;
+
+            if (!disk->src)
+                continue;
+
+            if (virAsprintf(&mig->nbd->disk[mig->nbd->ndisks].src,
+                            "%s", disk->src) < 0) {
+                virReportOOMError();
+                return -1;
+            }
+
+            if (stat(disk->src, &sb) < 0) {
+                virReportSystemError(errno,
+                                     _("Unable to stat '%s'"),
+                                     disk->src);
+                return -1;
+            }
+
+            mig->nbd->disk[mig->nbd->ndisks++].bytes = sb.st_size;
+        }
+    }
+
     mig->nbd->port = nbdPort;
     mig->flags |= QEMU_MIGRATION_COOKIE_NBD;
 
@@ -634,7 +698,16 @@ qemuMigrationCookieXMLFormat(struct qemud_driver *driver,
         virBufferAddLit(buf, "  <nbd");
         if (mig->nbd->port)
             virBufferAsprintf(buf, " port='%d'", mig->nbd->port);
-        virBufferAddLit(buf, "/>\n");
+        if (mig->nbd->ndisks) {
+            virBufferAddLit(buf, ">\n");
+            for (i = 0; i < mig->nbd->ndisks; i++)
+                virBufferAsprintf(buf, "    <disk src='%s' size='%zu'/>\n",
+                                  mig->nbd->disk[i].src,
+                                  mig->nbd->disk[i].bytes);
+            virBufferAddLit(buf, "  </nbd>\n");
+        } else {
+            virBufferAddLit(buf, "/>\n");
+        }
     }
 
     virBufferAddLit(buf, "</qemu-migration>\n");
@@ -939,6 +1012,34 @@ qemuMigrationCookieXMLParse(qemuMigrationCookiePtr mig,
                            port);
             goto error;
         }
+
+        if ((n = virXPathNodeSet("./nbd/disk", ctxt, &nodes)) > 0) {
+            xmlNodePtr oldNode = ctxt->node;
+            if (VIR_ALLOC_N(mig->nbd->disk, n) < 0) {
+                virReportOOMError();
+                goto error;
+            }
+            mig->nbd->ndisks = n;
+
+            for (i = 0; i < n; i++) {
+                ctxt->node = nodes[i];
+
+                mig->nbd->disk[i].src = virXPathString("string(./@src)", ctxt);
+
+                tmp = virXPathString("string(./@size)", ctxt);
+                if (virStrToLong_ull(tmp, NULL, 10, (unsigned long long *)
+                                     &mig->nbd->disk[i].bytes) < 0) {
+                    virReportError(VIR_ERR_INTERNAL_ERROR,
+                                   _("Malformed size attribute '%s'"),
+                                   tmp);
+                    VIR_FREE(tmp);
+                    goto error;
+                }
+                VIR_FREE(tmp);
+            }
+            VIR_FREE(nodes);
+            ctxt->node = oldNode;
+        }
     }
 
     return 0;
@@ -1007,7 +1108,7 @@ qemuMigrationBakeCookie(qemuMigrationCookiePtr mig,
     }
 
     if (flags & QEMU_MIGRATION_COOKIE_NBD &&
-        qemuMigrationCookieAddNBD(mig, nbdPort) < 0)
+        qemuMigrationCookieAddNBD(mig, dom, nbdPort) < 0)
         return -1;
 
     if (!(*cookieout = qemuMigrationCookieXMLFormatStr(driver, mig)))
@@ -1334,6 +1435,57 @@ error:
     goto cleanup;
 }
 
+static int
+qemuMigrationPreCreateStorage(qemuMigrationCookiePtr mig)
+{
+    int ret = -1;
+    size_t i;
+    struct stat sb;
+    int fd = -1;
+
+    if (!mig->nbd || !mig->nbd->ndisks) {
+        /* nothing to do here */
+        return 0;
+    }
+
+    for (i = 0; i < mig->nbd->ndisks; i++) {
+        const char *src = mig->nbd->disk[i].src;
+        size_t bytes = mig->nbd->disk[i].bytes;
+        VIR_DEBUG("Checking '%s' for its size (requested %zuB)", src, bytes);
+
+        if ((fd = virFileOpenAs(src, O_RDWR | O_CREAT, 0660,
+                                -1, -1, VIR_FILE_OPEN_NOFORK)) < 0) {
+            virReportSystemError(errno, _("Unable to create '%s'"), src);
+            goto cleanup;
+        }
+
+        if (fstat(fd, &sb) < 0) {
+            virReportSystemError(errno, _("Unable to stat '%s'"), src);
+            goto cleanup;
+        }
+
+        VIR_DEBUG("File '%s' is %zuB big", src, sb.st_size);
+        if (sb.st_size < bytes &&
+            ftruncate(fd, bytes) < 0) {
+            virReportSystemError(errno, _("Unable to ftruncate '%s'"), src);
+            goto cleanup;
+        }
+
+        VIR_FORCE_CLOSE(fd);
+    }
+
+    ret = 0;
+cleanup:
+    VIR_FORCE_CLOSE(fd);
+    /* free from migration data to prevent
+     * infinite sending from src to dst and back */
+    for (i = 0; i < mig->nbd->ndisks; i++)
+        VIR_FREE(mig->nbd->disk[i].src);
+    VIR_FREE(mig->nbd->disk);
+    mig->nbd->ndisks = 0;
+    return ret;
+}
+
 /* Validate whether the domain is safe to migrate. If vm is NULL,
  * then this is being run in the v2 Prepare stage on the destination
  * (where we only have the target xml); if vm is provided, then this
@@ -1932,6 +2084,10 @@ qemuMigrationPrepareAny(struct qemud_driver *driver,
                                        QEMU_MIGRATION_COOKIE_NBD)))
         goto cleanup;
 
+    /* pre-create all storage */
+    if (qemuMigrationPreCreateStorage(mig) < 0)
+        goto cleanup;
+
     if (qemuMigrationJobStart(driver, vm, QEMU_ASYNC_JOB_MIGRATION_IN) < 0)
        goto cleanup;
     qemuMigrationJobSetPhase(driver, vm, QEMU_MIGRATION_PHASE_PREPARE);
-- 
1.7.8.6

At the end of migration, it is important to stop the NBD server and thus release all allocated resources.
---
 src/qemu/qemu_migration.c | 28 ++++++++++++++++++++++++++++
 1 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index f8e52fb..14be00b 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -1486,6 +1486,32 @@ cleanup:
     return ret;
 }
 
+static void
+qemuMigrationStopNBDServer(struct qemud_driver *driver,
+                           virDomainObjPtr vm,
+                           qemuMigrationCookiePtr mig)
+{
+    qemuDomainObjPrivatePtr priv = vm->privateData;
+
+    if (!mig->nbd)
+        return;
+
+    if (mig->nbd->port != -1)
+        VIR_WARN("This is strange. NBD port was not -1 "
+                 "when shutting NBD server down");
+
+    if (qemuDomainObjEnterMonitorAsync(driver, vm,
+                                       QEMU_ASYNC_JOB_MIGRATION_IN) < 0) {
+        VIR_WARN("Unable to enter monitor");
+        return;
+    }
+
+    if (qemuMonitorNBDServerStop(priv->mon) < 0)
+        VIR_WARN("Unable to stop NBD server");
+
+    qemuDomainObjExitMonitorWithDriver(driver, vm);
+}
+
 /* Validate whether the domain is safe to migrate. If vm is NULL,
  * then this is being run in the v2 Prepare stage on the destination
  * (where we only have the target xml); if vm is provided, then this
@@ -3781,6 +3807,8 @@ qemuMigrationFinish(struct qemud_driver *driver,
     if (qemuDomainMigrateOPDRelocate(driver, vm, mig) < 0)
         VIR_WARN("unable to provide network data for relocation");
 
+    qemuMigrationStopNBDServer(driver, vm, mig);
+
     if (flags & VIR_MIGRATE_PERSIST_DEST) {
         virDomainDefPtr vmdef;
         if (vm->persistent)
-- 
1.7.8.6

This patch set re-implements migration with storage for new enough qemu.
How does this series interact with Li Guang's efforts to add offline migration? In particular,
1) src -> dest: (QEMU_MIGRATION_PHASE_BEGIN3 -> QEMU_MIGRATION_PHASE_PREPARE) <nbd> <disk src='/var/lib/libvirt/images/f17.img' size='17179869184'/> </nbd>
Both sets of patches need to pass size information across in the cookies; so is tying it to <nbd> appropriate, or should we be rethinking this XML to be shared between both patches?

On Tue, 2012-11-27 at 16:42 -0500, Eric Blake wrote:
This patch set re-implements migration with storage for new enough qemu.
How does this series interact with Li Guang's efforts to add offline migration? In particular,
1) src -> dest: (QEMU_MIGRATION_PHASE_BEGIN3 -> QEMU_MIGRATION_PHASE_PREPARE) <nbd> <disk src='/var/lib/libvirt/images/f17.img' size='17179869184'/> </nbd>
Both sets of patches need to pass size information across in the cookies; so is tying it to <nbd> appropriate, or should we be rethinking this XML to be shared between both patches?
Actually, I think 'src' is unnecessary: since 'def->disks[i]->src' is shared between src & dst, and sizes and 'src' are paired one by one, we will not make a mistake matching them.
-- 
regards!
li guang
linux kernel team at FNST, china
thinking with brain but heart
living with heart but brain

On Wed, Nov 28, 2012 at 10:31:54 +0800, li guang wrote:
On Tue, 2012-11-27 at 16:42 -0500, Eric Blake wrote:
This patch set re-implements migration with storage for new enough qemu.
How does this series interact with Li Guang's efforts to add offline migration? In particular,
1) src -> dest: (QEMU_MIGRATION_PHASE_BEGIN3 -> QEMU_MIGRATION_PHASE_PREPARE) <nbd> <disk src='/var/lib/libvirt/images/f17.img' size='17179869184'/> </nbd>
Both sets of patches need to pass size information across in the cookies; so is tying it to <nbd> appropriate, or should we be rethinking this XML to be shared between both patches?
actually, I think 'src' is unnecessary, for 'def->disks[i]->src'
'src' is not only unnecessary, it's actually totally useless, since we can't rely on it. In case a domain is migrated to a host with a pre-migration hook installed, the hook is allowed to modify parts of the XML that are not visible to the domain. And disk source is one of them. (The same applies to specifying the XML to be used on the destination directly to the virDomainMigrate* APIs.) In other words, the disk source on the destination does not have to match the path on the source host. Jirka

On 28.11.2012 14:00, Jiri Denemark wrote:
On Wed, Nov 28, 2012 at 10:31:54 +0800, li guang wrote:
On Tue, 2012-11-27 at 16:42 -0500, Eric Blake wrote:
This patch set re-implements migration with storage for new enough qemu.
How does this series interact with Li Guang's efforts to add offline migration? In particular,
1) src -> dest: (QEMU_MIGRATION_PHASE_BEGIN3 -> QEMU_MIGRATION_PHASE_PREPARE) <nbd> <disk src='/var/lib/libvirt/images/f17.img' size='17179869184'/> </nbd>
Both sets of patches need to pass size information across in the cookies; so is tying it to <nbd> appropriate, or should we be rethinking this XML to be shared between both patches?
actually, I think 'src' is unnecessary, for 'def->disks[i]->src'
'src' is not only unnecessary it's actually totally useless since we can't rely on it. In case a domain is migrated to a host with pre-migration hook installed, the hook is allowed to modify parts of the XML that are not visible to the domain. And disk source is one of them. (The same applies for specifying the XML to be used on destination directly to virDomainMigrate* APIs.) In other words, disk source on destination does not have to match the path on source host.
Jirka
Oh, right. Then we can't rely on disk ordering either (well, for now we can because our code is dumb enough; but if we change it ...). I should have used a device-id instead as it is the only thing that will not change. However, if it ever does - we are gonna need a mapping anyway, so we can map these device-ids then. So the migration cookie should look like this:

<nbd>
  <disk id="drive-virtio-disk0" size="123456"/>
  ...
</nbd>

Although, I am not sure whether to include the QEMU_DRIVE_HOST_PREFIX (="drive-") or not. Michal

On 27.11.2012 22:42, Eric Blake wrote:
This patch set re-implements migration with storage for new enough qemu.
How does this series interact with Li Guang's efforts to add offline migration? In particular,
I guess you mean [1] more precisely, right? Although even the original offline migration patch [2] does something similar to what I am doing in 2/11: propagating 'unsigned long flags' deeper into the stack of functions to make some decisions. But that's not what you are asking about later, so ignore this comment :)
1) src -> dest: (QEMU_MIGRATION_PHASE_BEGIN3 -> QEMU_MIGRATION_PHASE_PREPARE) <nbd> <disk src='/var/lib/libvirt/images/f17.img' size='17179869184'/> </nbd>
Both sets of patches need to pass size information across in the cookies; so is tying it to <nbd> appropriate, or should we be rethinking this XML to be shared between both patches?
I've commented on his patch [1] yesterday. We need the same piece of functionality. Actually, my migration cookie is just a superset of his (because I do need the port attribute, as pointed out in the comment to 6/11). However, I don't mind renaming it. But from a quick look at his patch - we seem to implement the same thing, more or less. His patch allows us to pre-create images for older qemu which doesn't support nbd-server-* yet. But who is using such ancient qemu? :)

1: https://www.redhat.com/archives/libvir-list/2012-November/msg01022.html
2: https://www.redhat.com/archives/libvir-list/2012-November/msg00886.html

On Tue, Nov 27, 2012 at 07:49:54PM +0100, Michal Privoznik wrote:
This patch set re-implements migration with storage for new enough qemu. Currently, you can migrate a domain to a host without the need for shared storage. This is done by setting the 'blk' or 'inc' attribute (representing the VIR_MIGRATE_NON_SHARED_DISK and VIR_MIGRATE_NON_SHARED_INC flags respectively) of the 'migrate' monitor command. However, the qemu implementation is buggy and applications are advised to switch to the new implementation which, moreover, offers some nice features, like migrating only explicitly specified disks.
The new functionality is controlled via 'nbd-server-*' and 'drive-mirror' commands. The flow is meant to look like this:
1) User invokes libvirt's migrate functionality.
2) libvirt checks that no block jobs are active on the source.
3) libvirt starts the destination QEMU and sets up the NBD server using the nbd-server-start and nbd-server-add commands.
4) libvirt starts drive-mirror with a destination pointing to the remote NBD server, for example nbd:host:port:exportname=diskname (where diskname is the -drive id specified on the destination).
This is where I have a problem with automatically using the new impl. If we consider that we need to allow multiple concurrent migrations, we are introducing a requirement to open an arbitrary number of ports in the firewall. This data stream is also cleartext without any encryption; even using an encrypted qcow2 disk won't help, since we're transferring the logical guest side blocks, not the physical host side blocks. What you have here is fine in the direct migration case, since we already require arbitrary open ports and a non-encrypted data stream. If the user has requested TUNNELLED migration, we need follow-up work done. We need the NBD server to be able to accept a pre-opened file descriptor rather than listening on a TCP host/port, and then for libvirtd to tunnel the data stream somehow. Alternatively, just use the old impl in the TUNNELLED case.
5) once all mirroring jobs reach steady state, libvirt invokes the migrate command.
6) once migration completed, libvirt invokes the nbd-server-stop command on the destination QEMU.
If we just skip the 2nd step and there is an active block-job, qemu will fail in step 4. No big deal. Since we try to NOT break migration and keep things compatible, this feature is enabled iff both sides support it. Since there's obvious need for some data transfer between src and dst, I've put it into qemuCookieMigration:
1) src -> dest: (QEMU_MIGRATION_PHASE_BEGIN3 -> QEMU_MIGRATION_PHASE_PREPARE)

   <nbd>
     <disk src='/var/lib/libvirt/images/f17.img' size='17179869184'/>
   </nbd>
Hey destination, I know how to use this cool new feature. Moreover, these are the paths I'll send you. Each file is X bytes big. It's one of the prerequisites: the file on dst exists and is at least as big as on the source.
2) dst -> src: (QEMU_MIGRATION_PHASE_PREPARE -> QEMU_MIGRATION_PHASE_PERFORM3) <nbd port='X'/>
Okay, I (destination) support this feature as well. I've created all files as you (src) told me to and you can start rolling data. I am listening on port X.
3) src -> dst: (QEMU_MIGRATION_PHASE_PERFORM3 -> QEMU_MIGRATION_PHASE_FINISH3) <nbd port='-1'/>
Migration completed, destination, you may shut the NBD server down.
If either src or dst doesn't support NBD, it is not used and the whole process falls back to the old implementation.
Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 28.11.2012 11:59, Daniel P. Berrange wrote:
On Tue, Nov 27, 2012 at 07:49:54PM +0100, Michal Privoznik wrote:
This patch set re-implements migration with storage for new enough qemu. Currently, you can migrate a domain to a host without the need for shared storage. This is done by setting the 'blk' or 'inc' attribute (representing the VIR_MIGRATE_NON_SHARED_DISK and VIR_MIGRATE_NON_SHARED_INC flags respectively) of the 'migrate' monitor command. However, the qemu implementation is buggy and applications are advised to switch to the new implementation which, moreover, offers some nice features, like migrating only explicitly specified disks.
The new functionality is controlled via 'nbd-server-*' and 'drive-mirror' commands. The flow is meant to look like this:
1) User invokes libvirt's migrate functionality.
2) libvirt checks that no block jobs are active on the source.
3) libvirt starts the destination QEMU and sets up the NBD server using the nbd-server-start and nbd-server-add commands.
4) libvirt starts drive-mirror with a destination pointing to the remote NBD server, for example nbd:host:port:exportname=diskname (where diskname is the -drive id specified on the destination).
This is where I have a problem with automatically using the new impl. If we consider that we need to allow multiple concurrent migrations, we are introducing a requirement to open an arbitrary number of ports in the firewall. This data stream is also cleartext without any encryption; even using an encrypted qcow2 disk won't help, since we're transferring the logical guest side blocks, not the physical host side blocks.
What you have here is fine in the direct migration case, since we already require arbitrary open ports and non-encrypted data stream.
If the user has requested TUNNELLED migration, we need follow-up work done. We need the NBD server to be able to accept a pre-opened file descriptor rather than listening on a TCP host/port, and then for libvirtd to tunnel the data stream somehow. Alternatively just use the old impl in the TUNNELLED case.
Right. Given how big this patch set is just for direct migration, I think we can save the TUNNELLED case for a separate patch set. Having said that, this needs to be squashed into the 2/11 patch:

-    if (qemuCapsGet(priv->caps, QEMU_CAPS_NBD_SERVER))
-        cookie_flags |= QEMU_MIGRATION_COOKIE_NBD;
+    if (qemuCapsGet(priv->caps, QEMU_CAPS_NBD_SERVER)) {
+        /* TODO support NBD for TUNNELLED migration */
+        if (flags & VIR_MIGRATE_TUNNELLED)
+            VIR_DEBUG("NBD in tunnelled migration is currently not supported");
+        else
+            cookie_flags |= QEMU_MIGRATION_COOKIE_NBD;
+    }

Michal

On 28/11/2012 11:59, Daniel P. Berrange wrote:
If the user has requested TUNNELLED migration, we need follow-up work done. We need the NBD server to be able to accept a pre-opened file descriptor rather than listening on a TCP host/port,
This is already supported. However, the pre-opened fd would be for a listening socket. The main problem is that even though a single port is used on the destination, it is used for multiple connections. Migration would require an arbitrary number of streams, and I'm afraid supporting this would basically entail rewriting all the tunnelling code. Paolo
and then for libvirtd to tunnel the data stream somehow. Alternatively just use the old impl in the TUNNELLED case.

On Wed, Nov 28, 2012 at 03:43:16PM +0100, Paolo Bonzini wrote:
Il 28/11/2012 11:59, Daniel P. Berrange ha scritto:
If the user has requested TUNNELLED migration, we need follow-up work done. We need the NBD server to be able to accept a pre-opened file descriptor rather than listening on a TCP host/port,
This is already supported. However, the pre-opened fd would be for a listening socket.
The main problem is that even though a single port is used on the destination, it is used for multiple connections. Migration would require an arbitrary number of streams, and I'm afraid supporting this would basically entail rewriting all the tunnelling code.
Hmm, yes, that is rather a complex problem. I've long thought that QEMU migration should have a mode where it passes all its data through TLS natively. So you could do secure migration, without needing to use libvirtd tunnelling. The same is really true of the NBD code. I know the "official" NBD impl does not do encryption, but there's no strong reason why QEMU can't layer in TLS below the NBD protocol if it is an explicit opt-in at both client+server requested by libvirt. That would at least solve the security issue, without requiring tunnelling. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 28/11/2012 15:46, Daniel P. Berrange wrote:
On Wed, Nov 28, 2012 at 03:43:16PM +0100, Paolo Bonzini wrote:
On 28/11/2012 11:59, Daniel P. Berrange wrote:
If the user has requested TUNNELLED migration, we need follow-up work done. We need the NBD server to be able to accept a pre-opened file descriptor rather than listening on a TCP host/port,
This is already supported. However, the pre-opened fd would be for a listening socket.
The main problem is that even though a single port is used on the destination, it is used for multiple connections. Migration would require an arbitrary number of streams, and I'm afraid supporting this would basically entail rewriting all the tunnelling code.
Hmm, yes, that is rather a complex problem.
I've long thought that QEMU migration should have a mode where it passes all its data through TLS natively. So you could do secure migration, without needing to use libvirtd tunnelling.
I agree. Hopefully, the various rewrites/refactorings of the upstream migration code will make this easier.
The same is really true of the NBD code. I know the "official" NBD impl does not do encryption, but there's no strong reason why QEMU can't layer in TLS below the NBD protocol if it is an explicit opt-in at both client+server requested by libvirt. That would at least solve the security issue, without requiring tunnelling.
Yes, that's a good idea. Paolo
participants (6): Daniel P. Berrange, Eric Blake, Jiri Denemark, li guang, Michal Privoznik, Paolo Bonzini