[libvirt] [PATCH v3 RESEND 00/12] Rework storage migration

This patch set re-implements migration with storage for enough new qemu. Currently, you can migrate a domain to a host without need for shared storage. This is done by setting 'blk' or 'inc' attribute (representing VIR_MIGRATE_NON_SHARED_DISK and VIR_MIGRATE_NON_SHARED_INC flags respectively) of 'migrate' monitor command. However, the qemu implementation is buggy and applications are advised to switch to new impementation which, moreover, offers some nice features, like migrating only explicitly specified disks. The new functionality is controlled via 'nbd-server-*' and 'drive-mirror' commands. The flow is meant to look like this: 1) User invokes libvirt's migrate functionality. 2) libvirt checks that no block jobs are active on the source. 3) libvirt starts the destination QEMU and sets up the NBD server using the nbd-server-start and nbd-server-add commands. 4) libvirt starts drive-mirror with a destination pointing to the remote NBD server, for example nbd:host:port:exportname=diskname (where diskname is the -drive id specified on the destination). 5) once all mirroring jobs reach steady state, libvirt invokes the migrate command. 6) once migration completed, libvirt invokes the nbd-server-stop command on the destination QEMU. If we just skip the 2nd step and there is an active block-job, qemu will fail in step 4. No big deal. Since we try to NOT break migration and keep things compatible, this feature is enabled iff both sides support it. Since there's obvious need for some data transfer between src and dst, I've put it into qemuCookieMigration: 1) src -> dest: (QEMU_MIGRATION_PHASE_BEGIN3 -> QEMU_MIGRATION_PHASE_PREPARE) <nbd> <disk target='vda' size='15032385536'/> <disk target='vdb' size='11534336'/> <disk target='vdc' size='13631488'/> </nbd> The source is telling the destination it supports the NBD feature. Moreover, which disks are to be transferred and what are their sizes. That's because disks needs to be fully allocated (even qcow) for successful transfer. 2) dst -> src: (QEMU_MIGRATION_PHASE_PREPARE -> QEMU_MIGRATION_PHASE_PERFORM3) <nbd port='5901'/> The destination is confirming it does support NBD as well. All disks were pre-created and NBD server is listening on given port (5901 in this case). If either src or dst doesn't support NBD, it is not used and whole process falls back to old implementation. diff to v1: -Eric's and Daniel's suggestions worked in. To point out the bigger ones: don't do NBD style when TUNNELLED requested, added 'b:writable' to 'nbd-server-add' -drop '/qemu-migration/nbd/disk/@src' attribute from migration cookie. As pointed out by Jirka, disk->src can be changed during migration (e.g. by migration hook or by passed xml). So I've tried (as suggested on the list) passing disk alias. However, since qemu hasn't been started on destination yet, the aliases hasn't been generated yet. So we have to rely on ordering completely. diff to v2: -rebase to reflect changes made by offline migration patch -send initial nbd cookie only if needed diff to v2.1: -nbd cookie reworked -don't rely on disk ordering in the cookie, but use disk target for that -adapt to virPortAllocator -unlink pre-created storage on migration fail -other of Jirka's suggestions worked in "diff" to v3: -just rebase & adapt to new qemu code after dropping QDL (Qemu Driver Lock) Michal Privoznik (12): qemu: Introduce NBD_SERVER capability Introduce NBD migration cookie qemu: Introduce nbd-server-start command qemu: Introduce nbd-server-add command qemu: Introduce nbd-server-stop command qemu_migration: Introduce qemuMigrationStartNBDServer() qemu_migration: Introduce qemuMigrationDriveMirror qemu_domain: Introduce qemuDomainGetDiskBlockInfo qemu_migration: Check size prerequisites qemu_migration: Stop NBD server at Finish phase qemu_migration: Cancel running jobs on failed migration qemu_migration: Unlink pre-created storage on error src/qemu/qemu_capabilities.c | 4 +- src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_domain.c | 157 ++++++++- src/qemu/qemu_domain.h | 7 + src/qemu/qemu_driver.c | 124 +------ src/qemu/qemu_migration.c | 767 ++++++++++++++++++++++++++++++++++++++++++- src/qemu/qemu_monitor.c | 63 ++++ src/qemu/qemu_monitor.h | 7 + src/qemu/qemu_monitor_json.c | 102 ++++++ src/qemu/qemu_monitor_json.h | 7 + src/qemu/qemu_process.c | 13 + 11 files changed, 1116 insertions(+), 136 deletions(-) -- 1.8.0.2

This just keeps track whether qemu knows nbd-server-* commands so we can use it during migration or not. --- src/qemu/qemu_capabilities.c | 4 +++- src/qemu/qemu_capabilities.h | 1 + 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 51fc9dc..54e2f9b 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -205,7 +205,7 @@ VIR_ENUM_IMPL(virQEMUCaps, QEMU_CAPS_LAST, "usb-serial", /* 125 */ "usb-net", "add-fd", - + "nbd-server-start", ); struct _virQEMUCaps { @@ -1940,6 +1940,8 @@ virQEMUCapsProbeQMPCommands(virQEMUCapsPtr qemuCaps, virQEMUCapsSet(qemuCaps, QEMU_CAPS_DISK_SNAPSHOT); else if (STREQ(name, "add-fd")) virQEMUCapsSet(qemuCaps, QEMU_CAPS_ADD_FD); + else if (STREQ(name, "nbd-server-start")) + virQEMUCapsSet(qemuCaps, QEMU_CAPS_NBD_SERVER); VIR_FREE(name); } VIR_FREE(commands); diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index e69d558..7c55dc8 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -166,6 +166,7 @@ enum virQEMUCapsFlags { QEMU_CAPS_DEVICE_USB_SERIAL = 125, /* -device usb-serial */ QEMU_CAPS_DEVICE_USB_NET = 126, /* -device usb-net */ QEMU_CAPS_ADD_FD = 127, /* -add-fd */ + QEMU_CAPS_NBD_SERVER = 128, /* nbd-server-start QMP command */ QEMU_CAPS_LAST, /* this must always be the last item */ }; -- 1.8.0.2

On Mon, Feb 18, 2013 at 15:38:34 +0100, Michal Privoznik wrote:
This just keeps track whether qemu knows nbd-server-* commands so we can use it during migration or not. --- src/qemu/qemu_capabilities.c | 4 +++- src/qemu/qemu_capabilities.h | 1 + 2 files changed, 4 insertions(+), 1 deletion(-)
ACK Jirka

This migration cookie is meant for two purposes. The first is to be sent in begin phase from source to destination to let it know we support new implementation of VIR_MIGRATE_NON_SHARED_{DISK,INC} so destination can start NBD server. Then, the second purpose is, destination can let us know, on which port the NBD server is running. --- src/qemu/qemu_domain.h | 1 + src/qemu/qemu_migration.c | 139 +++++++++++++++++++++++++++++++++++++++++----- 2 files changed, 125 insertions(+), 15 deletions(-) diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 905b099..e4df668 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -154,6 +154,7 @@ struct _qemuDomainObjPrivate { unsigned long migMaxBandwidth; char *origname; + int nbdPort; /* Port used for migration with NBD */ virChrdevsPtr devs; diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 36e55d2..82c3f97 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -74,6 +74,7 @@ enum qemuMigrationCookieFlags { QEMU_MIGRATION_COOKIE_FLAG_LOCKSTATE, QEMU_MIGRATION_COOKIE_FLAG_PERSISTENT, QEMU_MIGRATION_COOKIE_FLAG_NETWORK, + QEMU_MIGRATION_COOKIE_FLAG_NBD, QEMU_MIGRATION_COOKIE_FLAG_LAST }; @@ -81,13 +82,18 @@ enum qemuMigrationCookieFlags { VIR_ENUM_DECL(qemuMigrationCookieFlag); VIR_ENUM_IMPL(qemuMigrationCookieFlag, QEMU_MIGRATION_COOKIE_FLAG_LAST, - "graphics", "lockstate", "persistent", "network"); + "graphics", + "lockstate", + "persistent", + "network", + "nbd"); enum qemuMigrationCookieFeatures { QEMU_MIGRATION_COOKIE_GRAPHICS = (1 << QEMU_MIGRATION_COOKIE_FLAG_GRAPHICS), QEMU_MIGRATION_COOKIE_LOCKSTATE = (1 << QEMU_MIGRATION_COOKIE_FLAG_LOCKSTATE), QEMU_MIGRATION_COOKIE_PERSISTENT = (1 << QEMU_MIGRATION_COOKIE_FLAG_PERSISTENT), QEMU_MIGRATION_COOKIE_NETWORK = (1 << QEMU_MIGRATION_COOKIE_FLAG_NETWORK), + QEMU_MIGRATION_COOKIE_NBD = (1 << QEMU_MIGRATION_COOKIE_FLAG_NBD), }; typedef struct _qemuMigrationCookieGraphics qemuMigrationCookieGraphics; @@ -121,6 +127,14 @@ struct _qemuMigrationCookieNetwork { qemuMigrationCookieNetDataPtr net; }; +typedef struct _qemuMigrationCookieNBD qemuMigrationCookieNBD; +typedef qemuMigrationCookieNBD *qemuMigrationCookieNBDPtr; +struct _qemuMigrationCookieNBD { + int port; /* on which port does NBD server listen for incoming data. + Zero value has special meaning - it is there just to let + destination know we (the source) do support NBD. */ +}; + typedef struct _qemuMigrationCookie qemuMigrationCookie; typedef qemuMigrationCookie *qemuMigrationCookiePtr; struct _qemuMigrationCookie { @@ -149,6 +163,9 @@ struct _qemuMigrationCookie { /* If (flags & QEMU_MIGRATION_COOKIE_NETWORK) */ qemuMigrationCookieNetworkPtr network; + + /* If (flags & QEMU_MIGRATION_COOKIE_NBD) */ + qemuMigrationCookieNBDPtr nbd; }; static void qemuMigrationCookieGraphicsFree(qemuMigrationCookieGraphicsPtr grap) @@ -194,6 +211,7 @@ static void qemuMigrationCookieFree(qemuMigrationCookiePtr mig) VIR_FREE(mig->name); VIR_FREE(mig->lockState); VIR_FREE(mig->lockDriver); + VIR_FREE(mig->nbd); VIR_FREE(mig); } @@ -504,6 +522,27 @@ qemuMigrationCookieAddNetwork(qemuMigrationCookiePtr mig, } +static int +qemuMigrationCookieAddNBD(qemuMigrationCookiePtr mig, + virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virDomainObjPtr vm) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + + /* It is not a bug if there already is a NBD data */ + if (!mig->nbd && + VIR_ALLOC(mig->nbd) < 0) { + virReportOOMError(); + return -1; + } + + mig->nbd->port = priv->nbdPort; + mig->flags |= QEMU_MIGRATION_COOKIE_NBD; + + return 0; +} + + static void qemuMigrationCookieGraphicsXMLFormat(virBufferPtr buf, qemuMigrationCookieGraphicsPtr grap) { @@ -606,6 +645,13 @@ qemuMigrationCookieXMLFormat(virQEMUDriverPtr driver, if ((mig->flags & QEMU_MIGRATION_COOKIE_NETWORK) && mig->network) qemuMigrationCookieNetworkXMLFormat(buf, mig->network); + if ((mig->flags & QEMU_MIGRATION_COOKIE_NBD) && mig->nbd) { + virBufferAddLit(buf, " <nbd"); + if (mig->nbd->port) + virBufferAsprintf(buf, " port='%d'", mig->nbd->port); + virBufferAddLit(buf, "/>\n"); + } + virBufferAddLit(buf, "</qemu-migration>\n"); return 0; } @@ -837,6 +883,12 @@ qemuMigrationCookieXMLParse(qemuMigrationCookiePtr mig, goto error; } + /* nbd is optional */ + if (val == QEMU_MIGRATION_COOKIE_FLAG_NBD) { + VIR_FREE(str); + continue; + } + if ((flags & (1 << val)) == 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Unsupported migration cookie feature %s"), @@ -889,6 +941,27 @@ qemuMigrationCookieXMLParse(qemuMigrationCookiePtr mig, (!(mig->network = qemuMigrationCookieNetworkXMLParse(ctxt)))) goto error; + if (flags & QEMU_MIGRATION_COOKIE_NBD && + virXPathBoolean("boolean(./nbd)", ctxt)) { + char *port; + + port = virXPathString("string(./nbd/@port)", ctxt); + if (port) { + if (VIR_ALLOC(mig->nbd) < 0) { + virReportOOMError(); + goto error; + } + if (virStrToLong_i(port, NULL, 10, &mig->nbd->port) < 0) { + VIR_FREE(port); + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Malformed nbd port '%s'"), + port); + goto error; + } + VIR_FREE(port); + } + } + virObjectUnref(caps); return 0; @@ -955,6 +1028,10 @@ qemuMigrationBakeCookie(qemuMigrationCookiePtr mig, return -1; } + if ((flags & QEMU_MIGRATION_COOKIE_NBD) && + qemuMigrationCookieAddNBD(mig, driver, dom) < 0) + return -1; + if (!(*cookieout = qemuMigrationCookieXMLFormatStr(driver, mig))) return -1; @@ -1439,6 +1516,7 @@ char *qemuMigrationBegin(virQEMUDriverPtr driver, virDomainDefPtr def = NULL; qemuDomainObjPrivatePtr priv = vm->privateData; virCapsPtr caps = NULL; + unsigned int cookieFlags = QEMU_MIGRATION_COOKIE_LOCKSTATE; VIR_DEBUG("driver=%p, vm=%p, xmlin=%s, dname=%s," " cookieout=%p, cookieoutlen=%p, flags=%lx", @@ -1461,12 +1539,23 @@ char *qemuMigrationBegin(virQEMUDriverPtr driver, if (!(flags & VIR_MIGRATE_UNSAFE) && !qemuMigrationIsSafe(vm->def)) goto cleanup; + if (flags & (VIR_MIGRATE_NON_SHARED_DISK | VIR_MIGRATE_NON_SHARED_INC) && + virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_NBD_SERVER)) { + /* TODO support NBD for TUNNELLED migration */ + if (flags & VIR_MIGRATE_TUNNELLED) + VIR_DEBUG("NBD in tunnelled migration is currently not supported"); + else { + cookieFlags |= QEMU_MIGRATION_COOKIE_NBD; + priv->nbdPort = 0; + } + } + if (!(mig = qemuMigrationEatCookie(driver, vm, NULL, 0, 0))) goto cleanup; if (qemuMigrationBakeCookie(mig, driver, vm, cookieout, cookieoutlen, - QEMU_MIGRATION_COOKIE_LOCKSTATE) < 0) + cookieFlags) < 0) goto cleanup; if (flags & VIR_MIGRATE_OFFLINE) { @@ -1666,7 +1755,8 @@ qemuMigrationPrepareAny(virQEMUDriverPtr driver, origname = NULL; if (!(mig = qemuMigrationEatCookie(driver, vm, cookiein, cookieinlen, - QEMU_MIGRATION_COOKIE_LOCKSTATE))) + QEMU_MIGRATION_COOKIE_LOCKSTATE | + QEMU_MIGRATION_COOKIE_NBD))) goto cleanup; if (qemuMigrationJobStart(driver, vm, QEMU_ASYNC_JOB_MIGRATION_IN) < 0) @@ -1726,8 +1816,19 @@ done: else cookieFlags = QEMU_MIGRATION_COOKIE_GRAPHICS; - if (qemuMigrationBakeCookie(mig, driver, vm, cookieout, cookieoutlen, - cookieFlags) < 0) { + if (flags & (VIR_MIGRATE_NON_SHARED_DISK | VIR_MIGRATE_NON_SHARED_INC) && + virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_NBD_SERVER)) { + /* TODO support NBD for TUNNELLED migration */ + if (flags & VIR_MIGRATE_TUNNELLED) + VIR_DEBUG("NBD in tunnelled migration is currently not supported"); + else { + cookieFlags |= QEMU_MIGRATION_COOKIE_NBD; + priv->nbdPort = 0; + } + } + + if (qemuMigrationBakeCookie(mig, driver, vm, cookieout, + cookieoutlen, cookieFlags) < 0) { /* We could tear down the whole guest here, but * cookie data is (so far) non-critical, so that * seems a little harsh. We'll just warn for now. @@ -2224,6 +2325,7 @@ qemuMigrationRun(virQEMUDriverPtr driver, int fd = -1; unsigned long migrate_speed = resource ? resource : priv->migMaxBandwidth; virErrorPtr orig_err = NULL; + unsigned int cookieFlags = 0; VIR_DEBUG("driver=%p, vm=%p, cookiein=%s, cookieinlen=%d, " "cookieout=%p, cookieoutlen=%p, flags=%lx, resource=%lu, " @@ -2232,6 +2334,16 @@ qemuMigrationRun(virQEMUDriverPtr driver, cookieout, cookieoutlen, flags, resource, spec, spec->destType, spec->fwdType); + if (flags & VIR_MIGRATE_NON_SHARED_DISK) { + migrate_flags |= QEMU_MONITOR_MIGRATE_NON_SHARED_DISK; + cookieFlags |= QEMU_MIGRATION_COOKIE_NBD; + } + + if (flags & VIR_MIGRATE_NON_SHARED_INC) { + migrate_flags |= QEMU_MONITOR_MIGRATE_NON_SHARED_INC; + cookieFlags |= QEMU_MIGRATION_COOKIE_NBD; + } + if (virLockManagerPluginUsesState(driver->lockManager) && !cookieout) { virReportError(VIR_ERR_INTERNAL_ERROR, @@ -2241,8 +2353,9 @@ qemuMigrationRun(virQEMUDriverPtr driver, return -1; } - if (!(mig = qemuMigrationEatCookie(driver, vm, cookiein, cookieinlen, - QEMU_MIGRATION_COOKIE_GRAPHICS))) + mig = qemuMigrationEatCookie(driver, vm, cookiein, cookieinlen, + cookieFlags | QEMU_MIGRATION_COOKIE_GRAPHICS); + if (!mig) goto cleanup; if (qemuDomainMigrateGraphicsRelocate(driver, vm, mig) < 0) @@ -2275,11 +2388,6 @@ qemuMigrationRun(virQEMUDriverPtr driver, goto cleanup; } - if (flags & VIR_MIGRATE_NON_SHARED_DISK) - migrate_flags |= QEMU_MONITOR_MIGRATE_NON_SHARED_DISK; - - if (flags & VIR_MIGRATE_NON_SHARED_INC) - migrate_flags |= QEMU_MONITOR_MIGRATE_NON_SHARED_INC; /* connect to the destination qemu if needed */ if (spec->destType == MIGRATION_DEST_CONNECT_HOST && @@ -2387,10 +2495,11 @@ cleanup: VIR_FORCE_CLOSE(fd); } + cookieFlags |= (QEMU_MIGRATION_COOKIE_PERSISTENT | + QEMU_MIGRATION_COOKIE_NETWORK); if (ret == 0 && - qemuMigrationBakeCookie(mig, driver, vm, cookieout, cookieoutlen, - QEMU_MIGRATION_COOKIE_PERSISTENT | - QEMU_MIGRATION_COOKIE_NETWORK) < 0) { + qemuMigrationBakeCookie(mig, driver, vm, cookieout, + cookieoutlen, cookieFlags) < 0) { VIR_WARN("Unable to encode migration cookie"); } -- 1.8.0.2

On Mon, Feb 18, 2013 at 15:38:35 +0100, Michal Privoznik wrote:
This migration cookie is meant for two purposes. The first is to be sent in begin phase from source to destination to let it know we support new implementation of VIR_MIGRATE_NON_SHARED_{DISK,INC} so destination can start NBD server. Then, the second purpose is, destination can let us know, on which port the NBD server is running. ... diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 36e55d2..82c3f97 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c ... @@ -121,6 +127,14 @@ struct _qemuMigrationCookieNetwork { qemuMigrationCookieNetDataPtr net; };
+typedef struct _qemuMigrationCookieNBD qemuMigrationCookieNBD; +typedef qemuMigrationCookieNBD *qemuMigrationCookieNBDPtr; +struct _qemuMigrationCookieNBD { + int port; /* on which port does NBD server listen for incoming data. + Zero value has special meaning - it is there just to let + destination know we (the source) do support NBD. */
I think you can drop this note about 0 being special. The support for NBD can detected by an empty <nbd> element in the cookie (i.e., QEMU_MIGRATION_COOKIE_NBD is set in mig->flags and mig->ndb is not NULL). And it seems the parsing code does what I suggest and does not need port='0' attribute in <nbd/> once 9/12 is applied.
+}; + typedef struct _qemuMigrationCookie qemuMigrationCookie; typedef qemuMigrationCookie *qemuMigrationCookiePtr; struct _qemuMigrationCookie { ... @@ -837,6 +883,12 @@ qemuMigrationCookieXMLParse(qemuMigrationCookiePtr mig, goto error; }
+ /* nbd is optional */ + if (val == QEMU_MIGRATION_COOKIE_FLAG_NBD) { + VIR_FREE(str); + continue; + } +
This does not make sense at all, just remove this hunk. Features marked as mandatory are always mandatory and NBD is no exception.
if ((flags & (1 << val)) == 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Unsupported migration cookie feature %s"), @@ -889,6 +941,27 @@ qemuMigrationCookieXMLParse(qemuMigrationCookiePtr mig, (!(mig->network = qemuMigrationCookieNetworkXMLParse(ctxt)))) goto error;
+ if (flags & QEMU_MIGRATION_COOKIE_NBD && + virXPathBoolean("boolean(./nbd)", ctxt)) { + char *port; + + port = virXPathString("string(./nbd/@port)", ctxt); + if (port) { + if (VIR_ALLOC(mig->nbd) < 0) { + virReportOOMError(); + goto error; + } + if (virStrToLong_i(port, NULL, 10, &mig->nbd->port) < 0) { + VIR_FREE(port); + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Malformed nbd port '%s'"), + port); + goto error; + } + VIR_FREE(port); + }
We even have virXPathInt that does the conversion for you, btw. But anyway, sender would send <nbd/> if port == 0 and the receiver would just ignore it since there is no port attribute present. Something's strange here. And you actually fix that in 9/12.
+ } + virObjectUnref(caps); return 0;
...
@@ -1666,7 +1755,8 @@ qemuMigrationPrepareAny(virQEMUDriverPtr driver, origname = NULL;
if (!(mig = qemuMigrationEatCookie(driver, vm, cookiein, cookieinlen, - QEMU_MIGRATION_COOKIE_LOCKSTATE))) + QEMU_MIGRATION_COOKIE_LOCKSTATE | + QEMU_MIGRATION_COOKIE_NBD))) goto cleanup;
if (qemuMigrationJobStart(driver, vm, QEMU_ASYNC_JOB_MIGRATION_IN) < 0) @@ -1726,8 +1816,19 @@ done: else cookieFlags = QEMU_MIGRATION_COOKIE_GRAPHICS;
- if (qemuMigrationBakeCookie(mig, driver, vm, cookieout, cookieoutlen, - cookieFlags) < 0) { + if (flags & (VIR_MIGRATE_NON_SHARED_DISK | VIR_MIGRATE_NON_SHARED_INC) && + virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_NBD_SERVER)) {
I think we should only do this if the source actually asked for NBD, that is mig->nbd is not NULL.
+ /* TODO support NBD for TUNNELLED migration */ + if (flags & VIR_MIGRATE_TUNNELLED) + VIR_DEBUG("NBD in tunnelled migration is currently not supported"); + else { + cookieFlags |= QEMU_MIGRATION_COOKIE_NBD; + priv->nbdPort = 0; + } + } + + if (qemuMigrationBakeCookie(mig, driver, vm, cookieout, + cookieoutlen, cookieFlags) < 0) { /* We could tear down the whole guest here, but * cookie data is (so far) non-critical, so that * seems a little harsh. We'll just warn for now. ... @@ -2241,8 +2353,9 @@ qemuMigrationRun(virQEMUDriverPtr driver, return -1; }
- if (!(mig = qemuMigrationEatCookie(driver, vm, cookiein, cookieinlen, - QEMU_MIGRATION_COOKIE_GRAPHICS))) + mig = qemuMigrationEatCookie(driver, vm, cookiein, cookieinlen, + cookieFlags | QEMU_MIGRATION_COOKIE_GRAPHICS); + if (!mig) goto cleanup;
I think we should emit a log message when we support NBD but it was not present in migration cookie. And perhaps clear the flag from cookieFlags. Now I see you did that in 7/12.
if (qemuDomainMigrateGraphicsRelocate(driver, vm, mig) < 0)
... Jirka

This will be used with new migration scheme. This patch creates basically just monitor stub functions. Wiring them into something useful is done in later patches. --- src/qemu/qemu_monitor.c | 22 +++++++++++++++++ src/qemu/qemu_monitor.h | 3 +++ src/qemu/qemu_monitor_json.c | 56 ++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_monitor_json.h | 3 +++ 4 files changed, 84 insertions(+) diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 7af571d..c7aa6d0 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -3383,3 +3383,25 @@ char *qemuMonitorGetTargetArch(qemuMonitorPtr mon) return qemuMonitorJSONGetTargetArch(mon); } + +int qemuMonitorNBDServerStart(qemuMonitorPtr mon, + const char *host, + unsigned int port) +{ + VIR_DEBUG("mon=%p host=%s port=%u", + mon, host, port); + + if (!mon) { + virReportError(VIR_ERR_INVALID_ARG, "%s", + _("monitor must not be NULL")); + return -1; + } + + if (!mon->json) { + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", + _("JSON monitor is required")); + return -1; + } + + return qemuMonitorJSONNBDServerStart(mon, host, port); +} diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index ac77158..230f920 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -632,6 +632,9 @@ int qemuMonitorGetObjectProps(qemuMonitorPtr mon, char ***props); char *qemuMonitorGetTargetArch(qemuMonitorPtr mon); +int qemuMonitorNBDServerStart(qemuMonitorPtr mon, + const char *host, + unsigned int port); /** * When running two dd process and using <> redirection, we need a * shell that will not truncate files. These two strings serve that diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index a86d90c..e03688d 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -4359,3 +4359,59 @@ cleanup: virJSONValueFree(reply); return ret; } + +int +qemuMonitorJSONNBDServerStart(qemuMonitorPtr mon, + const char *host, + unsigned int port) +{ + int ret = -1; + virJSONValuePtr cmd = NULL; + virJSONValuePtr reply = NULL; + virJSONValuePtr data = NULL; + virJSONValuePtr addr = NULL; + char *port_str = NULL; + + if (!(data = virJSONValueNewObject()) || + !(addr = virJSONValueNewObject()) || + (virAsprintf(&port_str, "%u", port) < 0)) { + virReportOOMError(); + goto cleanup; + } + + /* port is really expected as a string here by qemu */ + if (virJSONValueObjectAppendString(data, "host", host) < 0 || + virJSONValueObjectAppendString(data, "port", port_str) < 0 || + virJSONValueObjectAppendString(addr, "type", "inet") < 0 || + virJSONValueObjectAppend(addr, "data", data) < 0) { + virReportOOMError(); + goto cleanup; + } + + /* From now on, @data is part of @addr */ + data = NULL; + + if (!(cmd = qemuMonitorJSONMakeCommand("nbd-server-start", + "a:addr", addr, + NULL))) + goto cleanup; + + /* From now on, @addr is part of @cmd */ + addr = NULL; + + if (qemuMonitorJSONCommand(mon, cmd, &reply) < 0) + goto cleanup; + + if (qemuMonitorJSONCheckError(cmd, reply) < 0) + goto cleanup; + + ret = 0; + +cleanup: + VIR_FREE(port_str); + virJSONValueFree(reply); + virJSONValueFree(cmd); + virJSONValueFree(addr); + virJSONValueFree(data); + return ret; +} diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index 925d937..a6ab68b 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -327,4 +327,7 @@ int qemuMonitorJSONGetObjectProps(qemuMonitorPtr mon, ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); char *qemuMonitorJSONGetTargetArch(qemuMonitorPtr mon); +int qemuMonitorJSONNBDServerStart(qemuMonitorPtr mon, + const char *host, + unsigned int port); #endif /* QEMU_MONITOR_JSON_H */ -- 1.8.0.2

On Mon, Feb 18, 2013 at 15:38:36 +0100, Michal Privoznik wrote:
This will be used with new migration scheme. This patch creates basically just monitor stub functions. Wiring them into something useful is done in later patches. --- src/qemu/qemu_monitor.c | 22 +++++++++++++++++ src/qemu/qemu_monitor.h | 3 +++ src/qemu/qemu_monitor_json.c | 56 ++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_monitor_json.h | 3 +++ 4 files changed, 84 insertions(+)
ACK Jirka

This will be used with new migration scheme. This patch creates basically just monitor stub functions. Wiring them into something useful is done in later patches. --- src/qemu/qemu_monitor.c | 22 ++++++++++++++++++++++ src/qemu/qemu_monitor.h | 3 +++ src/qemu/qemu_monitor_json.c | 25 +++++++++++++++++++++++++ src/qemu/qemu_monitor_json.h | 3 +++ 4 files changed, 53 insertions(+) diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index c7aa6d0..5abd9d0 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -3405,3 +3405,25 @@ int qemuMonitorNBDServerStart(qemuMonitorPtr mon, return qemuMonitorJSONNBDServerStart(mon, host, port); } + +int qemuMonitorNBDServerAdd(qemuMonitorPtr mon, + const char *deviceID, + bool writable) +{ + VIR_DEBUG("mon=%p deviceID=%s", + mon, deviceID); + + if (!mon) { + virReportError(VIR_ERR_INVALID_ARG, "%s", + _("monitor must not be NULL")); + return -1; + } + + if (!mon->json) { + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", + _("JSON monitor is required")); + return -1; + } + + return qemuMonitorJSONNBDServerAdd(mon, deviceID, writable); +} diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 230f920..d54d16d 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -635,6 +635,9 @@ char *qemuMonitorGetTargetArch(qemuMonitorPtr mon); int qemuMonitorNBDServerStart(qemuMonitorPtr mon, const char *host, unsigned int port); +int qemuMonitorNBDServerAdd(qemuMonitorPtr mon, + const char *deviceID, + bool writable); /** * When running two dd process and using <> redirection, we need a * shell that will not truncate files. These two strings serve that diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index e03688d..f14b597 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -4415,3 +4415,28 @@ cleanup: virJSONValueFree(data); return ret; } + +int +qemuMonitorJSONNBDServerAdd(qemuMonitorPtr mon, + const char *deviceID, + bool writable) +{ + int ret = -1; + virJSONValuePtr cmd; + virJSONValuePtr reply = NULL; + + if (!(cmd = qemuMonitorJSONMakeCommand("nbd-server-add", + "s:device", deviceID, + "b:writable", writable, + NULL))) + return ret; + + ret = qemuMonitorJSONCommand(mon, cmd, &reply); + + if (ret == 0) + ret = qemuMonitorJSONCheckError(cmd, reply); + + virJSONValueFree(cmd); + virJSONValueFree(reply); + return ret; +} diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index a6ab68b..c58aa7d 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -330,4 +330,7 @@ char *qemuMonitorJSONGetTargetArch(qemuMonitorPtr mon); int qemuMonitorJSONNBDServerStart(qemuMonitorPtr mon, const char *host, unsigned int port); +int qemuMonitorJSONNBDServerAdd(qemuMonitorPtr mon, + const char *deviceID, + bool writable); #endif /* QEMU_MONITOR_JSON_H */ -- 1.8.0.2

On Mon, Feb 18, 2013 at 15:38:37 +0100, Michal Privoznik wrote:
This will be used with new migration scheme. This patch creates basically just monitor stub functions. Wiring them into something useful is done in later patches. --- src/qemu/qemu_monitor.c | 22 ++++++++++++++++++++++ src/qemu/qemu_monitor.h | 3 +++ src/qemu/qemu_monitor_json.c | 25 +++++++++++++++++++++++++ src/qemu/qemu_monitor_json.h | 3 +++ 4 files changed, 53 insertions(+)
ACK Jirka

This will be used after all migration work is done to stop NBD server running on destination. It doesn't take any arguments, just issues a command. --- src/qemu/qemu_monitor.c | 19 +++++++++++++++++++ src/qemu/qemu_monitor.h | 1 + src/qemu/qemu_monitor_json.c | 21 +++++++++++++++++++++ src/qemu/qemu_monitor_json.h | 1 + 4 files changed, 42 insertions(+) diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 5abd9d0..1166d8d 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -3427,3 +3427,22 @@ int qemuMonitorNBDServerAdd(qemuMonitorPtr mon, return qemuMonitorJSONNBDServerAdd(mon, deviceID, writable); } + +int qemuMonitorNBDServerStop(qemuMonitorPtr mon) +{ + VIR_DEBUG("mon=%p", mon); + + if (!mon) { + virReportError(VIR_ERR_INVALID_ARG, "%s", + _("monitor must not be NULL")); + return -1; + } + + if (!mon->json) { + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", + _("JSON monitor is required")); + return -1; + } + + return qemuMonitorJSONNBDServerStop(mon); +} diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index d54d16d..10145e5 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -638,6 +638,7 @@ int qemuMonitorNBDServerStart(qemuMonitorPtr mon, int qemuMonitorNBDServerAdd(qemuMonitorPtr mon, const char *deviceID, bool writable); +int qemuMonitorNBDServerStop(qemuMonitorPtr); /** * When running two dd process and using <> redirection, we need a * shell that will not truncate files. These two strings serve that diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index f14b597..856dd6a 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -4440,3 +4440,24 @@ qemuMonitorJSONNBDServerAdd(qemuMonitorPtr mon, virJSONValueFree(reply); return ret; } + +int +qemuMonitorJSONNBDServerStop(qemuMonitorPtr mon) +{ + int ret = -1; + virJSONValuePtr cmd; + virJSONValuePtr reply = NULL; + + if (!(cmd = qemuMonitorJSONMakeCommand("nbd-server-stop", + NULL))) + return ret; + + ret = qemuMonitorJSONCommand(mon, cmd, &reply); + + if (ret == 0) + ret = qemuMonitorJSONCheckError(cmd, reply); + + virJSONValueFree(cmd); + virJSONValueFree(reply); + return ret; +} diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index c58aa7d..fd04a92 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -333,4 +333,5 @@ int qemuMonitorJSONNBDServerStart(qemuMonitorPtr mon, int qemuMonitorJSONNBDServerAdd(qemuMonitorPtr mon, const char *deviceID, bool writable); +int qemuMonitorJSONNBDServerStop(qemuMonitorPtr mon); #endif /* QEMU_MONITOR_JSON_H */ -- 1.8.0.2

On Mon, Feb 18, 2013 at 15:38:38 +0100, Michal Privoznik wrote:
This will be used after all migration work is done to stop NBD server running on destination. It doesn't take any arguments, just issues a command. --- src/qemu/qemu_monitor.c | 19 +++++++++++++++++++ src/qemu/qemu_monitor.h | 1 + src/qemu/qemu_monitor_json.c | 21 +++++++++++++++++++++ src/qemu/qemu_monitor_json.h | 1 + 4 files changed, 42 insertions(+)
ACK Jirka

We need to start NBD server and feed it with all non-<shared/>, RW and source-full disks. Moreover, with new virPortAllocator we must ensure the borrowed port for NBD server will be returned if either migration completes or qemu process is teared down. --- src/qemu/qemu_migration.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++- src/qemu/qemu_process.c | 5 +++ 2 files changed, 88 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 82c3f97..876e81b 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -35,6 +35,7 @@ #include "qemu_domain.h" #include "qemu_process.h" #include "qemu_capabilities.h" +#include "qemu_command.h" #include "qemu_cgroup.h" #include "domain_audit.h" @@ -1097,6 +1098,80 @@ error: return NULL; } +/** + * qemuMigrationStartNBDServer: + * @driver: qemu driver + * @vm: domain + * + * Starts NBD server. This is a newer method to copy + * storage during migration than using 'blk' and 'inc' + * arguments in 'migrate' monitor command. + * Error is reported here. + * + * Returns 0 on success, -1 otherwise. + */ +static int +qemuMigrationStartNBDServer(virQEMUDriverPtr driver, + virDomainObjPtr vm) +{ + int ret = -1; + qemuDomainObjPrivatePtr priv = vm->privateData; + unsigned short port = 0; + const char *listen = "0.0.0.0"; + char *diskAlias = NULL; + size_t i; + + for (i = 0; i < vm->def->ndisks; i++) { + virDomainDiskDefPtr disk = vm->def->disks[i]; + + /* skip shared, RO and source-less disks */ + if (disk->shared || disk->readonly || !disk->src) + continue; + + VIR_FREE(diskAlias); + if (virAsprintf(&diskAlias, "%s%s", + QEMU_DRIVE_HOST_PREFIX, disk->info.alias) < 0) { + virReportOOMError(); + goto cleanup; + } + + if (qemuDomainObjEnterMonitorAsync(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_IN) < 0) + goto cleanup; + + if (!port) { + if (virPortAllocatorAcquire(driver->remotePorts, &port) < 0) + goto cleanup; + + if (qemuMonitorNBDServerStart(priv->mon, listen, port) < 0) { + qemuDomainObjExitMonitor(driver, vm); + virReportError(VIR_ERR_OPERATION_FAILED, "%s", + _("Unable to start nbd server")); + goto cleanup; + } + } + + if (qemuMonitorNBDServerAdd(priv->mon, diskAlias, true) < 0) { + qemuDomainObjExitMonitor(driver, vm); + + virReportError(VIR_ERR_OPERATION_FAILED, + _("Unable to add '%s' to NDB server"), diskAlias); + goto cleanup; + } + qemuDomainObjExitMonitor(driver, vm); + } + + priv->nbdPort = port; + ret = 0; + +cleanup: + VIR_FREE(diskAlias); + if (ret < 0) + virPortAllocatorRelease(driver->remotePorts, port); + return ret; +} + + /* Validate whether the domain is safe to migrate. If vm is NULL, * then this is being run in the v2 Prepare stage on the destination * (where we only have the target xml); if vm is provided, then this @@ -1822,8 +1897,11 @@ done: if (flags & VIR_MIGRATE_TUNNELLED) VIR_DEBUG("NBD in tunnelled migration is currently not supported"); else { + if (qemuMigrationStartNBDServer(driver, vm) < 0) { + /* error already reported */ + goto endjob; + } cookieFlags |= QEMU_MIGRATION_COOKIE_NBD; - priv->nbdPort = 0; } } @@ -1870,6 +1948,10 @@ cleanup: virObjectUnlock(vm); else qemuDomainRemoveInactive(driver, vm); + if (ret < 0 && priv->nbdPort) { + virPortAllocatorRelease(driver->remotePorts, priv->nbdPort); + priv->nbdPort = 0; + } } if (event) qemuDomainEventQueue(driver, event); diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 4251c34..e6874e2 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -4237,6 +4237,11 @@ void qemuProcessStop(virQEMUDriverPtr driver, } } + if (priv->nbdPort) { + virPortAllocatorRelease(driver->remotePorts, priv->nbdPort); + priv->nbdPort = 0; + } + if (priv->agent) { qemuAgentClose(priv->agent); priv->agent = NULL; -- 1.8.0.2

On Mon, Feb 18, 2013 at 15:38:39 +0100, Michal Privoznik wrote:
We need to start NBD server and feed it with all non-<shared/>, RW and source-full disks. Moreover, with new virPortAllocator we must ensure the borrowed port for NBD server will be returned if either migration completes or qemu process is teared down. --- src/qemu/qemu_migration.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++- src/qemu/qemu_process.c | 5 +++ 2 files changed, 88 insertions(+), 1 deletion(-)
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 82c3f97..876e81b 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c ... @@ -1097,6 +1098,80 @@ error: return NULL; }
+/** + * qemuMigrationStartNBDServer: + * @driver: qemu driver + * @vm: domain + * + * Starts NBD server. This is a newer method to copy + * storage during migration than using 'blk' and 'inc' + * arguments in 'migrate' monitor command. + * Error is reported here. + * + * Returns 0 on success, -1 otherwise. + */ +static int +qemuMigrationStartNBDServer(virQEMUDriverPtr driver, + virDomainObjPtr vm) +{ + int ret = -1; + qemuDomainObjPrivatePtr priv = vm->privateData; + unsigned short port = 0; + const char *listen = "0.0.0.0";
This doesn't compile: error: declaration of 'listen' shadows a global declaration
+ char *diskAlias = NULL; + size_t i; + + for (i = 0; i < vm->def->ndisks; i++) { + virDomainDiskDefPtr disk = vm->def->disks[i]; + + /* skip shared, RO and source-less disks */ + if (disk->shared || disk->readonly || !disk->src) + continue; + + VIR_FREE(diskAlias); + if (virAsprintf(&diskAlias, "%s%s", + QEMU_DRIVE_HOST_PREFIX, disk->info.alias) < 0) { + virReportOOMError(); + goto cleanup; + } + + if (qemuDomainObjEnterMonitorAsync(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_IN) < 0) + goto cleanup; + + if (!port) { + if (virPortAllocatorAcquire(driver->remotePorts, &port) < 0)
qemuDomainObjExitMonitor() is missing here.
+ goto cleanup; + + if (qemuMonitorNBDServerStart(priv->mon, listen, port) < 0) { + qemuDomainObjExitMonitor(driver, vm); + virReportError(VIR_ERR_OPERATION_FAILED, "%s", + _("Unable to start nbd server"));
Do not overwrite useful errors from lower layers.
+ goto cleanup; + } + } + + if (qemuMonitorNBDServerAdd(priv->mon, diskAlias, true) < 0) { + qemuDomainObjExitMonitor(driver, vm); + + virReportError(VIR_ERR_OPERATION_FAILED, + _("Unable to add '%s' to NDB server"), diskAlias);
Do not overwrite useful errors from lower layers.
+ goto cleanup; + goto cleanup; + } + qemuDomainObjExitMonitor(driver, vm); + } + + priv->nbdPort = port; + ret = 0; + +cleanup: + VIR_FREE(diskAlias); + if (ret < 0) + virPortAllocatorRelease(driver->remotePorts, port); + return ret; +} + + /* Validate whether the domain is safe to migrate. If vm is NULL, * then this is being run in the v2 Prepare stage on the destination * (where we only have the target xml); if vm is provided, then this @@ -1822,8 +1897,11 @@ done: if (flags & VIR_MIGRATE_TUNNELLED) VIR_DEBUG("NBD in tunnelled migration is currently not supported"); else { + if (qemuMigrationStartNBDServer(driver, vm) < 0) { + /* error already reported */ + goto endjob; + } cookieFlags |= QEMU_MIGRATION_COOKIE_NBD; - priv->nbdPort = 0; } }
As already mentioned in 2/12, we must not do this when source did not ask for NBD. ... Jirka

This function does the source part of NBD magic. It invokes drive-mirror on each non shared and RW disk with a source and wait till the mirroring process completes. When it does we can proceed with migration. Currently, an active waiting is done: every 500ms libvirt asks qemu if block-job is finished or not. However, once the job finishes, qemu doesn't report its progress so we can only assume if the job finished successfully or not. The better solution would be to listen to the event which is sent as soon as the job finishes. The event does contain the result of job. --- src/qemu/qemu_migration.c | 188 +++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 187 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 876e81b..e2c7804 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1171,6 +1171,186 @@ cleanup: return ret; } +/** + * qemuMigrationDiskMirror: + * @driver: qemu driver + * @vm: domain + * @mig: migration cookie + * @host: where are we migrating to + * @speed: how much should the copying be limited + * @migrate_flags: migrate monitor command flags + * + * Run drive-mirror to feed NBD server running on dst and + * wait till the process completes. On success, update + * @migrate_flags so we don't tell 'migrate' command to + * do the very same operation. + * + * Returns 0 on success (@migrate_flags updated), + * -1 otherwise. + */ +static int +qemuMigrationDriveMirror(virQEMUDriverPtr driver, + virDomainObjPtr vm, + qemuMigrationCookiePtr mig, + const char *host, + unsigned long speed, + unsigned int *migrate_flags) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + int ret = -1; + int mon_ret; + int port; + ssize_t i; + char *diskAlias = NULL; + char *nbd_dest = NULL; + unsigned int mirror_flags = VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT; + virErrorPtr err = NULL; + + if (!(*migrate_flags & (QEMU_MONITOR_MIGRATE_NON_SHARED_DISK | + QEMU_MONITOR_MIGRATE_NON_SHARED_INC))) + return 0; + + if (!mig->nbd) { + /* Destination doesn't support NBD server. + * Fall back to previous implementation. + * XXX Or should we report an error here? */ + VIR_DEBUG("Destination doesn't support NBD server " + "Falling back to previous implementation."); + ret = 0; + goto cleanup; + } + + /* steal NBD port and thus prevent it's propagation back to destination */ + port = mig->nbd->port; + mig->nbd->port = 0; + + if (*migrate_flags & QEMU_MONITOR_MIGRATE_NON_SHARED_INC) + mirror_flags |= VIR_DOMAIN_BLOCK_REBASE_SHALLOW; + + for (i = 0; i < vm->def->ndisks; i++) { + virDomainDiskDefPtr disk = vm->def->disks[i]; + virDomainBlockJobInfo info; + + /* skip shared, RO and source-less disks */ + if (disk->shared || disk->readonly || !disk->src) + continue; + + VIR_FREE(diskAlias); + VIR_FREE(nbd_dest); + if ((virAsprintf(&diskAlias, "%s%s", + QEMU_DRIVE_HOST_PREFIX, disk->info.alias) < 0) || + (virAsprintf(&nbd_dest, "nbd:%s:%u:exportname=%s", + host, port, diskAlias) < 0)) { + virReportOOMError(); + goto error; + } + + if (qemuDomainObjEnterMonitorAsync(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) + goto error; + mon_ret = qemuMonitorDriveMirror(priv->mon, diskAlias, nbd_dest, + NULL, speed, mirror_flags); + qemuDomainObjExitMonitor(driver, vm); + + if (mon_ret < 0) + goto error; + + /* wait for completion */ + while (true) { + /* Poll every 500ms for progress & to allow cancellation */ + struct timespec ts = { .tv_sec = 0, .tv_nsec = 500 * 1000 * 1000ull }; + + memset(&info, 0, sizeof(info)); + + if (qemuDomainObjEnterMonitorAsync(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) + goto error; + if (priv->job.asyncAbort) { + /* explicitly do this *after* we entered the monitor, + * as this is a critical section so we are guaranteed + * priv->job.asyncAbort will not change */ + qemuDomainObjExitMonitor(driver, vm); + virReportError(VIR_ERR_OPERATION_ABORTED, _("%s: %s"), + qemuDomainAsyncJobTypeToString(priv->job.asyncJob), + _("canceled by client")); + goto error; + } + mon_ret = qemuMonitorBlockJob(priv->mon, diskAlias, NULL, 0, + &info, BLOCK_JOB_INFO, true); + qemuDomainObjExitMonitor(driver, vm); + + if (mon_ret < 0) { + /* If there's not any job running on a block device, + * qemu won't report anything, so it is not fatal + * if we fail to query status as job may have finished + * while we were sleeping. */ + virReportError(VIR_ERR_OPERATION_FAILED, + _("Unable to query drive-mirror job status. " + "Stop polling on '%s' cur:%llu end:%llu"), + diskAlias, info.cur, info.end); + goto error; + } + + if (info.cur == info.end) { + VIR_DEBUG("Drive mirroring of '%s' completed", diskAlias); + break; + } + + /* XXX Frankly speaking, we should listen to the events, + * instead of doing this. But this works for now and we + * are doing something similar in migration itself anyway */ + + virObjectUnlock(vm); + + nanosleep(&ts, NULL); + + virObjectLock(vm); + } + } + + /* Okay, copied. Modify migrate_flags */ + *migrate_flags &= ~(QEMU_MONITOR_MIGRATE_NON_SHARED_DISK | + QEMU_MONITOR_MIGRATE_NON_SHARED_INC); + ret = 0; + +cleanup: + VIR_FREE(diskAlias); + VIR_FREE(nbd_dest); + return ret; + +error: + /* don't overwrite any errors */ + err = virSaveLastError(); + /* cancel any outstanding jobs */ + for (; i >= 0; i--) { + virDomainDiskDefPtr disk = vm->def->disks[i]; + + /* skip shared, RO disks */ + if (disk->shared || disk->readonly || !disk->src) + continue; + + VIR_FREE(diskAlias); + if (virAsprintf(&diskAlias, "%s%s", + QEMU_DRIVE_HOST_PREFIX, disk->info.alias) < 0) { + virReportOOMError(); + goto error; + } + if (qemuDomainObjEnterMonitorAsync(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_OUT) == 0) { + if (qemuMonitorBlockJob(priv->mon, diskAlias, NULL, 0, + NULL, BLOCK_JOB_ABORT, true) < 0) { + VIR_WARN("Unable to cancel block-job on '%s'", diskAlias); + } + qemuDomainObjExitMonitor(driver, vm); + } else { + VIR_WARN("Unable to enter monitor. No block job cancelled"); + } + } + if (err) + virSetError(err); + virFreeError(err); + goto cleanup; +} /* Validate whether the domain is safe to migrate. If vm is NULL, * then this is being run in the v2 Prepare stage on the destination @@ -2443,6 +2623,13 @@ qemuMigrationRun(virQEMUDriverPtr driver, if (qemuDomainMigrateGraphicsRelocate(driver, vm, mig) < 0) VIR_WARN("unable to provide data for graphics client relocation"); + /* this will update migrate_flags on success */ + if (qemuMigrationDriveMirror(driver, vm, mig, spec->dest.host.name, + migrate_speed, &migrate_flags) < 0) { + /* error reported by helper func */ + goto cleanup; + } + /* Before EnterMonitor, since qemuMigrationSetOffline already does that */ if (!(flags & VIR_MIGRATE_LIVE) && virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) { @@ -2470,7 +2657,6 @@ qemuMigrationRun(virQEMUDriverPtr driver, goto cleanup; } - /* connect to the destination qemu if needed */ if (spec->destType == MIGRATION_DEST_CONNECT_HOST && qemuMigrationConnect(driver, vm, spec) < 0) { -- 1.8.0.2

On Mon, Feb 18, 2013 at 15:38:40 +0100, Michal Privoznik wrote:
This function does the source part of NBD magic. It invokes drive-mirror on each non shared and RW disk with a source and wait till the mirroring process completes. When it does we can proceed with migration.
Currently, an active waiting is done: every 500ms libvirt asks qemu if block-job is finished or not. However, once the job finishes, qemu doesn't report its progress so we can only assume if the job finished successfully or not. The better solution would be to listen to the event which is sent as soon as the job finishes. The event does contain the result of job. --- src/qemu/qemu_migration.c | 188 +++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 187 insertions(+), 1 deletion(-)
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 876e81b..e2c7804 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1171,6 +1171,186 @@ cleanup: return ret; }
+/** + * qemuMigrationDiskMirror:
Name mismatch.
+ * @driver: qemu driver + * @vm: domain + * @mig: migration cookie + * @host: where are we migrating to + * @speed: how much should the copying be limited + * @migrate_flags: migrate monitor command flags + * + * Run drive-mirror to feed NBD server running on dst and + * wait till the process completes. On success, update + * @migrate_flags so we don't tell 'migrate' command to + * do the very same operation.
I think the comment should mention that the process does not actually completes until the end of migration, it just switches into another phase where writes go simultaneously to both source and destination. And this switch is what we are waiting for before proceeding with the next disk. It can apparently confuse people looking at the code that shuts down all drive-mirror jobs.
+ * + * Returns 0 on success (@migrate_flags updated), + * -1 otherwise. + */ +static int +qemuMigrationDriveMirror(virQEMUDriverPtr driver, + virDomainObjPtr vm, + qemuMigrationCookiePtr mig, + const char *host, + unsigned long speed, + unsigned int *migrate_flags) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + int ret = -1; + int mon_ret; + int port; + ssize_t i; + char *diskAlias = NULL; + char *nbd_dest = NULL; + unsigned int mirror_flags = VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT; + virErrorPtr err = NULL; + + if (!(*migrate_flags & (QEMU_MONITOR_MIGRATE_NON_SHARED_DISK | + QEMU_MONITOR_MIGRATE_NON_SHARED_INC))) + return 0; + + if (!mig->nbd) { + /* Destination doesn't support NBD server. + * Fall back to previous implementation. + * XXX Or should we report an error here? */
No, we don't want to report an error here since the feature was enabled automatically. We would only want to fail the process if we had a flag that apps/users may use to explicitly request NBD.
+ VIR_DEBUG("Destination doesn't support NBD server " + "Falling back to previous implementation.");
OK, this addresses one of my comments to 2/12.
+ ret = 0; + goto cleanup;
You may just return 0 as you do few lines above but it's not a big deal.
+ } + + /* steal NBD port and thus prevent it's propagation back to destination */ + port = mig->nbd->port; + mig->nbd->port = 0; + + if (*migrate_flags & QEMU_MONITOR_MIGRATE_NON_SHARED_INC) + mirror_flags |= VIR_DOMAIN_BLOCK_REBASE_SHALLOW; + + for (i = 0; i < vm->def->ndisks; i++) { + virDomainDiskDefPtr disk = vm->def->disks[i]; + virDomainBlockJobInfo info; + + /* skip shared, RO and source-less disks */ + if (disk->shared || disk->readonly || !disk->src) + continue; + + VIR_FREE(diskAlias); + VIR_FREE(nbd_dest); + if ((virAsprintf(&diskAlias, "%s%s", + QEMU_DRIVE_HOST_PREFIX, disk->info.alias) < 0) || + (virAsprintf(&nbd_dest, "nbd:%s:%u:exportname=%s", + host, port, diskAlias) < 0)) { + virReportOOMError(); + goto error; + } + + if (qemuDomainObjEnterMonitorAsync(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) + goto error; + mon_ret = qemuMonitorDriveMirror(priv->mon, diskAlias, nbd_dest, + NULL, speed, mirror_flags); + qemuDomainObjExitMonitor(driver, vm); + + if (mon_ret < 0) + goto error; + + /* wait for completion */ + while (true) { + /* Poll every 500ms for progress & to allow cancellation */ + struct timespec ts = { .tv_sec = 0, .tv_nsec = 500 * 1000 * 1000ull }; + + memset(&info, 0, sizeof(info)); + + if (qemuDomainObjEnterMonitorAsync(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) + goto error; + if (priv->job.asyncAbort) { + /* explicitly do this *after* we entered the monitor, + * as this is a critical section so we are guaranteed + * priv->job.asyncAbort will not change */ + qemuDomainObjExitMonitor(driver, vm); + virReportError(VIR_ERR_OPERATION_ABORTED, _("%s: %s"), + qemuDomainAsyncJobTypeToString(priv->job.asyncJob), + _("canceled by client")); + goto error; + } + mon_ret = qemuMonitorBlockJob(priv->mon, diskAlias, NULL, 0, + &info, BLOCK_JOB_INFO, true); + qemuDomainObjExitMonitor(driver, vm); + + if (mon_ret < 0) { + /* If there's not any job running on a block device, + * qemu won't report anything, so it is not fatal + * if we fail to query status as job may have finished + * while we were sleeping. */ + virReportError(VIR_ERR_OPERATION_FAILED, + _("Unable to query drive-mirror job status. " + "Stop polling on '%s' cur:%llu end:%llu"), + diskAlias, info.cur, info.end);
Again, don't overwrite errors from lower layers. Anyway, I don't understand the code at all. You're telling failure is not fatal but still report an error and abort the process. Not to mention that if some failure was not fatal some of them definitely is. Thus you either want to remove the comment altogether or make it more clear and distinguish fatal and non-fatal errors.
+ goto error; + } + + if (info.cur == info.end) { + VIR_DEBUG("Drive mirroring of '%s' completed", diskAlias); + break; + } + + /* XXX Frankly speaking, we should listen to the events, + * instead of doing this. But this works for now and we + * are doing something similar in migration itself anyway */ + + virObjectUnlock(vm); + + nanosleep(&ts, NULL); + + virObjectLock(vm); + } + } ...
Jirka

This is just digging out important implementation from qemu driver's qemuDomainGetDiskBlockInfo() API as this functionality is going to be required in the next patch. --- src/qemu/qemu_domain.c | 127 +++++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 4 ++ src/qemu/qemu_driver.c | 124 +++-------------------------------------------- 3 files changed, 138 insertions(+), 117 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 482f64a..8df2739 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -40,6 +40,9 @@ #include <sys/time.h> #include <fcntl.h> +#include <sys/types.h> +#include <sys/stat.h> +#include <unistd.h> #include <libxml/xpathInternals.h> @@ -1959,3 +1962,127 @@ cleanup: virObjectUnref(cfg); return ret; } + +int +qemuDomainGetDiskBlockInfo(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDiskDefPtr disk, + virDomainBlockInfoPtr info) +{ + + int ret = -1; + virStorageFileMetadata *meta = NULL; + virQEMUDriverConfigPtr cfg = NULL; + int format; + struct stat sb; + int fd = -1; + off_t end; + const char *path; + + if (!disk->src) { + virReportError(VIR_ERR_INVALID_ARG, + _("disk %s does not currently have a source assigned"), + disk->dst); + goto cleanup; + } + path = disk->src; + cfg = virQEMUDriverGetConfig(driver); + + /* The path is correct, now try to open it and get its size. */ + if ((fd = open(path, O_RDONLY)) < 0) { + virReportSystemError(errno, _("failed to open path '%s'"), path); + goto cleanup; + } + + /* Probe for magic formats */ + if (disk->format) { + format = disk->format; + } else { + if (cfg->allowDiskFormatProbing) { + if ((format = virStorageFileProbeFormat(path, + cfg->user, + cfg->group)) < 0) + goto cleanup; + } else { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("no disk format for %s and probing is disabled"), + path); + goto cleanup; + } + } + + if (!(meta = virStorageFileGetMetadataFromFD(path, fd, format))) + goto cleanup; + + /* Get info for normal formats */ + if (fstat(fd, &sb) < 0) { + virReportSystemError(errno, _("cannot stat file '%s'"), path); + goto cleanup; + } + + if (S_ISREG(sb.st_mode)) { +#ifndef WIN32 + info->physical = (unsigned long long)sb.st_blocks * + (unsigned long long)DEV_BSIZE; +#else + info->physical = sb.st_size; +#endif + /* Regular files may be sparse, so logical size (capacity) is not same + * as actual physical above + */ + info->capacity = sb.st_size; + } else { + /* NB. Because we configure with AC_SYS_LARGEFILE, off_t should + * be 64 bits on all platforms. + */ + end = lseek(fd, 0, SEEK_END); + if (end == (off_t)-1) { + virReportSystemError(errno, _("failed to seek to end of %s"), path); + goto cleanup; + } + info->physical = end; + info->capacity = end; + } + + /* If the file we probed has a capacity set, then override + * what we calculated from file/block extents */ + if (meta->capacity) + info->capacity = meta->capacity; + + /* Set default value .. */ + info->allocation = info->physical; + + /* ..but if guest is running & not using raw + disk format and on a block device, then query + highest allocated extent from QEMU */ + if (disk->type == VIR_DOMAIN_DISK_TYPE_BLOCK && + format != VIR_STORAGE_FILE_RAW && + S_ISBLK(sb.st_mode) && + virDomainObjIsActive(vm)) { + qemuDomainObjPrivatePtr priv = vm->privateData; + + if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_QUERY) < 0) + goto cleanup; + + if (virDomainObjIsActive(vm)) { + qemuDomainObjEnterMonitor(driver, vm); + ret = qemuMonitorGetBlockExtent(priv->mon, + disk->info.alias, + &info->allocation); + qemuDomainObjExitMonitor(driver, vm); + } else { + ret = 0; + } + + if (qemuDomainObjEndJob(driver, vm) == 0) + vm = NULL; + } else { + ret = 0; + } + +cleanup: + VIR_FORCE_CLOSE(fd); + virStorageFileFreeMetadata(meta); + virObjectUnref(cfg); + return ret; +} diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index e4df668..4e20a49 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -334,5 +334,9 @@ void qemuDomainCleanupRemove(virDomainObjPtr vm, qemuDomainCleanupCallback cb); void qemuDomainCleanupRun(virQEMUDriverPtr driver, virDomainObjPtr vm); +int qemuDomainGetDiskBlockInfo(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDiskDefPtr disk, + virDomainBlockInfoPtr info); #endif /* __QEMU_DOMAIN_H__ */ diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 23499ef..bff7885 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -9145,29 +9145,23 @@ cleanup: } -static int qemuDomainGetBlockInfo(virDomainPtr dom, - const char *path, - virDomainBlockInfoPtr info, - unsigned int flags) { +static int +qemuDomainGetBlockInfo(virDomainPtr dom, + const char *path, + virDomainBlockInfoPtr info, + unsigned int flags) +{ virQEMUDriverPtr driver = dom->conn->privateData; virDomainObjPtr vm; int ret = -1; - int fd = -1; - off_t end; - virStorageFileMetadata *meta = NULL; virDomainDiskDefPtr disk = NULL; - struct stat sb; int i; - int format; - virQEMUDriverConfigPtr cfg = NULL; virCheckFlags(0, -1); if (!(vm = qemuDomObjFromDomain(dom))) goto cleanup; - cfg = virQEMUDriverGetConfig(driver); - if (!path || path[0] == '\0') { virReportError(VIR_ERR_INVALID_ARG, "%s", _("NULL or empty path")); @@ -9181,116 +9175,12 @@ static int qemuDomainGetBlockInfo(virDomainPtr dom, goto cleanup; } disk = vm->def->disks[i]; - if (!disk->src) { - virReportError(VIR_ERR_INVALID_ARG, - _("disk %s does not currently have a source assigned"), - path); - goto cleanup; - } - path = disk->src; - - /* The path is correct, now try to open it and get its size. */ - fd = open(path, O_RDONLY); - if (fd == -1) { - virReportSystemError(errno, - _("failed to open path '%s'"), path); - goto cleanup; - } - - /* Probe for magic formats */ - if (disk->format) { - format = disk->format; - } else { - if (cfg->allowDiskFormatProbing) { - if ((format = virStorageFileProbeFormat(disk->src, - cfg->user, - cfg->group)) < 0) - goto cleanup; - } else { - virReportError(VIR_ERR_INTERNAL_ERROR, - _("no disk format for %s and probing is disabled"), - disk->src); - goto cleanup; - } - } - - if (!(meta = virStorageFileGetMetadataFromFD(path, fd, format))) - goto cleanup; - - /* Get info for normal formats */ - if (fstat(fd, &sb) < 0) { - virReportSystemError(errno, - _("cannot stat file '%s'"), path); - goto cleanup; - } - if (S_ISREG(sb.st_mode)) { -#ifndef WIN32 - info->physical = (unsigned long long)sb.st_blocks * - (unsigned long long)DEV_BSIZE; -#else - info->physical = sb.st_size; -#endif - /* Regular files may be sparse, so logical size (capacity) is not same - * as actual physical above - */ - info->capacity = sb.st_size; - } else { - /* NB. Because we configure with AC_SYS_LARGEFILE, off_t should - * be 64 bits on all platforms. - */ - end = lseek(fd, 0, SEEK_END); - if (end == (off_t)-1) { - virReportSystemError(errno, - _("failed to seek to end of %s"), path); - goto cleanup; - } - info->physical = end; - info->capacity = end; - } - - /* If the file we probed has a capacity set, then override - * what we calculated from file/block extents */ - if (meta->capacity) - info->capacity = meta->capacity; - - /* Set default value .. */ - info->allocation = info->physical; - - /* ..but if guest is running & not using raw - disk format and on a block device, then query - highest allocated extent from QEMU */ - if (disk->type == VIR_DOMAIN_DISK_TYPE_BLOCK && - format != VIR_STORAGE_FILE_RAW && - S_ISBLK(sb.st_mode) && - virDomainObjIsActive(vm)) { - qemuDomainObjPrivatePtr priv = vm->privateData; - - if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_QUERY) < 0) - goto cleanup; - - if (virDomainObjIsActive(vm)) { - qemuDomainObjEnterMonitor(driver, vm); - ret = qemuMonitorGetBlockExtent(priv->mon, - disk->info.alias, - &info->allocation); - qemuDomainObjExitMonitor(driver, vm); - } else { - ret = 0; - } - - if (qemuDomainObjEndJob(driver, vm) == 0) - vm = NULL; - } else { - ret = 0; - } + ret = qemuDomainGetDiskBlockInfo(driver, vm, disk, info); cleanup: - virStorageFileFreeMetadata(meta); - VIR_FORCE_CLOSE(fd); if (vm) virObjectUnlock(vm); - virObjectUnref(cfg); return ret; } -- 1.8.0.2

On Mon, Feb 18, 2013 at 15:38:41 +0100, Michal Privoznik wrote:
This is just digging out important implementation from qemu driver's qemuDomainGetDiskBlockInfo() API as this functionality is going to be required in the next patch. --- src/qemu/qemu_domain.c | 127 +++++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 4 ++ src/qemu/qemu_driver.c | 124 +++-------------------------------------------- 3 files changed, 138 insertions(+), 117 deletions(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 482f64a..8df2739 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -40,6 +40,9 @@
#include <sys/time.h> #include <fcntl.h> +#include <sys/types.h> +#include <sys/stat.h> +#include <unistd.h>
#include <libxml/xpathInternals.h>
@@ -1959,3 +1962,127 @@ cleanup: virObjectUnref(cfg); return ret; } + +int +qemuDomainGetDiskBlockInfo(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDiskDefPtr disk, + virDomainBlockInfoPtr info) +{ + + int ret = -1; + virStorageFileMetadata *meta = NULL; + virQEMUDriverConfigPtr cfg = NULL; + int format; + struct stat sb; + int fd = -1; + off_t end; + const char *path; + + if (!disk->src) { + virReportError(VIR_ERR_INVALID_ARG, + _("disk %s does not currently have a source assigned"), + disk->dst); + goto cleanup; + } + path = disk->src; + cfg = virQEMUDriverGetConfig(driver); + + /* The path is correct, now try to open it and get its size. */ + if ((fd = open(path, O_RDONLY)) < 0) {
This should really check for -1 only, which is what the original code did.
+ virReportSystemError(errno, _("failed to open path '%s'"), path); + goto cleanup; + } + + /* Probe for magic formats */ + if (disk->format) { + format = disk->format; + } else { + if (cfg->allowDiskFormatProbing) { + if ((format = virStorageFileProbeFormat(path, + cfg->user, + cfg->group)) < 0) + goto cleanup; + } else { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("no disk format for %s and probing is disabled"), + path); + goto cleanup; + } + } + + if (!(meta = virStorageFileGetMetadataFromFD(path, fd, format))) + goto cleanup; + + /* Get info for normal formats */ + if (fstat(fd, &sb) < 0) { + virReportSystemError(errno, _("cannot stat file '%s'"), path); + goto cleanup; + } + + if (S_ISREG(sb.st_mode)) { +#ifndef WIN32 + info->physical = (unsigned long long)sb.st_blocks * + (unsigned long long)DEV_BSIZE; +#else + info->physical = sb.st_size; +#endif + /* Regular files may be sparse, so logical size (capacity) is not same + * as actual physical above + */ + info->capacity = sb.st_size; + } else { + /* NB. Because we configure with AC_SYS_LARGEFILE, off_t should + * be 64 bits on all platforms. + */ + end = lseek(fd, 0, SEEK_END); + if (end == (off_t)-1) { + virReportSystemError(errno, _("failed to seek to end of %s"), path); + goto cleanup; + } + info->physical = end; + info->capacity = end; + } + + /* If the file we probed has a capacity set, then override + * what we calculated from file/block extents */ + if (meta->capacity) + info->capacity = meta->capacity; + + /* Set default value .. */ + info->allocation = info->physical; + + /* ..but if guest is running & not using raw + disk format and on a block device, then query + highest allocated extent from QEMU */ + if (disk->type == VIR_DOMAIN_DISK_TYPE_BLOCK && + format != VIR_STORAGE_FILE_RAW && + S_ISBLK(sb.st_mode) && + virDomainObjIsActive(vm)) { + qemuDomainObjPrivatePtr priv = vm->privateData; + + if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_QUERY) < 0) + goto cleanup; + + if (virDomainObjIsActive(vm)) { + qemuDomainObjEnterMonitor(driver, vm); + ret = qemuMonitorGetBlockExtent(priv->mon, + disk->info.alias, + &info->allocation); + qemuDomainObjExitMonitor(driver, vm); + } else { + ret = 0; + } + + if (qemuDomainObjEndJob(driver, vm) == 0) + vm = NULL;
As already found by John, this would not work. You either need to move setting up the job outside of this function or pass a reference to vm inside.
+ } else { + ret = 0; + } + +cleanup: + VIR_FORCE_CLOSE(fd); + virStorageFileFreeMetadata(meta); + virObjectUnref(cfg); + return ret; +} ...
Jirka

With new NBD storage migration approach there are several requirements that need to be meet for successful use of the feature. One of them is - the file representing a disk, needs to have at least same size as on the source. Hence, we must transfer a list of pairs [disk target, size] and check on destination that this requirement is met and/or take actions to meet it. --- src/qemu/qemu_migration.c | 302 +++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 288 insertions(+), 14 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index e2c7804..9b38a0c 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -29,6 +29,9 @@ #endif #include <fcntl.h> #include <poll.h> +#include <unistd.h> +#include <sys/types.h> +#include <sys/stat.h> #include "qemu_migration.h" #include "qemu_monitor.h" @@ -134,6 +137,15 @@ struct _qemuMigrationCookieNBD { int port; /* on which port does NBD server listen for incoming data. Zero value has special meaning - it is there just to let destination know we (the source) do support NBD. */ + + /* The list of pairs [disk, size] (in Bytes). + * This is needed because the same disk size is one of + * prerequisites for NBD storage migration. */ + size_t ndisks; + struct { + char *target; + size_t bytes; + } *disk; }; typedef struct _qemuMigrationCookie qemuMigrationCookie; @@ -196,6 +208,22 @@ qemuMigrationCookieNetworkFree(qemuMigrationCookieNetworkPtr network) } +static void +qemuMigrationCookieNBDFree(qemuMigrationCookieNBDPtr nbd) +{ + size_t i; + + if (!nbd) + return; + + for (i = 0; i < nbd->ndisks; i++) + VIR_FREE(nbd->disk[i].target); + + VIR_FREE(nbd->disk); + VIR_FREE(nbd); +} + + static void qemuMigrationCookieFree(qemuMigrationCookiePtr mig) { if (!mig) @@ -207,12 +235,14 @@ static void qemuMigrationCookieFree(qemuMigrationCookiePtr mig) if (mig->flags & QEMU_MIGRATION_COOKIE_NETWORK) qemuMigrationCookieNetworkFree(mig->network); + if (mig->flags & QEMU_MIGRATION_COOKIE_NBD) + qemuMigrationCookieNBDFree(mig->nbd); + VIR_FREE(mig->localHostname); VIR_FREE(mig->remoteHostname); VIR_FREE(mig->name); VIR_FREE(mig->lockState); VIR_FREE(mig->lockDriver); - VIR_FREE(mig->nbd); VIR_FREE(mig); } @@ -525,22 +555,71 @@ qemuMigrationCookieAddNetwork(qemuMigrationCookiePtr mig, static int qemuMigrationCookieAddNBD(qemuMigrationCookiePtr mig, - virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virQEMUDriverPtr driver, virDomainObjPtr vm) { qemuDomainObjPrivatePtr priv = vm->privateData; + int ret = -1; + size_t i; /* It is not a bug if there already is a NBD data */ if (!mig->nbd && VIR_ALLOC(mig->nbd) < 0) { virReportOOMError(); - return -1; + return ret; + } + + /* in Begin phase add info about disks */ + if (priv->job.phase == QEMU_MIGRATION_PHASE_BEGIN3 && + vm->def->ndisks) { + if (VIR_ALLOC_N(mig->nbd->disk, vm->def->ndisks) < 0) { + virReportOOMError(); + goto cleanup; + } + + for (i = 0; i < vm->def->ndisks; i++) { + virDomainDiskDefPtr disk = vm->def->disks[i]; + virDomainBlockInfo info; + int format = disk->format; + + /* Add only non-shared RW disks with source */ + if (!disk->src || disk->shared || disk->readonly) + continue; + + memset(&info, 0, sizeof(info)); + + if (qemuDomainGetDiskBlockInfo(driver, vm, disk, &info) < 0) + goto cleanup; + + if (format != VIR_STORAGE_FILE_RAW && + format != VIR_STORAGE_FILE_QCOW && + format != VIR_STORAGE_FILE_QCOW2) { + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, + _("Cannot pre-create storage of %s format"), + virStorageFileFormatTypeToString(format)); + goto cleanup; + } + + if (!(mig->nbd->disk[mig->nbd->ndisks].target = strdup(disk->dst))) { + virReportOOMError(); + goto cleanup; + } + + mig->nbd->disk[mig->nbd->ndisks].bytes = info.capacity; + mig->nbd->ndisks++; + } } mig->nbd->port = priv->nbdPort; mig->flags |= QEMU_MIGRATION_COOKIE_NBD; - return 0; + ret = 0; + +cleanup: + if (ret < 0) + qemuMigrationCookieNBDFree(mig->nbd); + + return ret; } @@ -650,7 +729,16 @@ qemuMigrationCookieXMLFormat(virQEMUDriverPtr driver, virBufferAddLit(buf, " <nbd"); if (mig->nbd->port) virBufferAsprintf(buf, " port='%d'", mig->nbd->port); - virBufferAddLit(buf, "/>\n"); + if (mig->nbd->ndisks) { + virBufferAddLit(buf, ">\n"); + for (i = 0; i < mig->nbd->ndisks; i++) + virBufferAsprintf(buf, " <disk target='%s' size='%zu'/>\n", + mig->nbd->disk[i].target, + mig->nbd->disk[i].bytes); + virBufferAddLit(buf, " </nbd>\n"); + } else { + virBufferAddLit(buf, "/>\n"); + } } virBufferAddLit(buf, "</qemu-migration>\n"); @@ -946,20 +1034,56 @@ qemuMigrationCookieXMLParse(qemuMigrationCookiePtr mig, virXPathBoolean("boolean(./nbd)", ctxt)) { char *port; + if (VIR_ALLOC(mig->nbd) < 0) { + virReportOOMError(); + goto error; + } + port = virXPathString("string(./nbd/@port)", ctxt); - if (port) { - if (VIR_ALLOC(mig->nbd) < 0) { + if (port && virStrToLong_i(port, NULL, 10, &mig->nbd->port) < 0) { + VIR_FREE(port); + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Malformed nbd port '%s'"), + port); + goto error; + } + VIR_FREE(port); + + if ((n = virXPathNodeSet("./nbd/disk", ctxt, &nodes)) > 0) { + xmlNodePtr oldNode = ctxt->node; + if (VIR_ALLOC_N(mig->nbd->disk, n) < 0) { virReportOOMError(); goto error; } - if (virStrToLong_i(port, NULL, 10, &mig->nbd->port) < 0) { - VIR_FREE(port); - virReportError(VIR_ERR_INTERNAL_ERROR, - _("Malformed nbd port '%s'"), - port); - goto error; + mig->nbd->ndisks = n; + + for (i = 0; i < n; i++) { + ctxt->node = nodes[i]; + + tmp = virXPathString("string(./@target)", ctxt); + if (!tmp) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Malformed target attribute")); + goto error; + } + mig->nbd->disk[i].target = tmp; + /* deliberately don't free @tmp here, as the + * cookie has the reference now and it is + * responsible for freeing it later */ + + tmp = virXPathString("string(./@size)", ctxt); + if (virStrToLong_ull(tmp, NULL, 10, (unsigned long long *) + &mig->nbd->disk[i].bytes) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Malformed size attribute '%s'"), + tmp); + VIR_FREE(tmp); + goto error; + } + VIR_FREE(tmp); } - VIR_FREE(port); + VIR_FREE(nodes); + ctxt->node = oldNode; } } @@ -1352,6 +1476,152 @@ error: goto cleanup; } +static int +qemuMigrationPreCreateStorageRaw(virDomainDiskDefPtr disk, + size_t bytes) +{ + int ret = -1; + struct stat sb; + int fd = -1; + + if ((fd = virFileOpenAs(disk->src, O_RDWR | O_CREAT, 0660, + -1, -1, VIR_FILE_OPEN_NOFORK)) < 0) { + virReportSystemError(errno, _("Unable to create '%s'"), disk->src); + goto cleanup; + } + + if (fstat(fd, &sb) < 0) { + virReportSystemError(errno, _("Unable to stat '%s'"), disk->src); + goto unlink; + } + + VIR_DEBUG("File '%s' is %zuB big. Required %zuB", disk->src, sb.st_size, bytes); + if (sb.st_size < bytes) { + if (ftruncate(fd, bytes) < 0) { + virReportSystemError(errno, _("Unable to ftruncate '%s'"), disk->src); + goto unlink; + } + + if (fsync(fd) < 0) { + ret = -errno; + virReportSystemError(errno, _("cannot sync data to file '%s'"), + disk->src); + goto unlink; + } + } + + ret = 0; + +cleanup: + VIR_FORCE_CLOSE(fd); + return ret; + +unlink: + if (unlink(disk->src) < 0) + VIR_WARN("Unable to unlink '%s': %d", disk->src, errno); + goto cleanup; +} + +static int +qemuMigrationPreCreateStorageQCOW(virQEMUDriverPtr driver, + virDomainDiskDefPtr disk, + int format, + size_t bytes) +{ + int ret = -1; + const char *imgBinary = qemuFindQemuImgBinary(driver); + virCommandPtr cmd = NULL; + size_t size_arg; + + if (!imgBinary) + return ret; + + /* Size in KB */ + size_arg = VIR_DIV_UP(bytes, 1024); + + cmd = virCommandNewArgList(imgBinary, "create", "-f", + virStorageFileFormatTypeToString(format), + "-o", "preallocation=metadata", + disk->src, NULL); + + virCommandAddArgFormat(cmd, "%zuK", size_arg); + + if (virCommandRun(cmd, NULL) < 0) + goto unlink; + + ret = 0; + +cleanup: + return ret; + +unlink: + if (unlink(disk->src) < 0) + VIR_WARN("Unable to unlink '%s': %d", disk->src, errno); + goto cleanup; +} + +static int +qemuMigrationPreCreateStorage(virQEMUDriverPtr driver, + virDomainObjPtr vm, + qemuMigrationCookiePtr mig) +{ + int ret = -1; + size_t i = 0; + + if (!mig->nbd || !mig->nbd->ndisks) { + /* nothing to do here */ + return 0; + } + + for (i = 0; i < mig->nbd->ndisks; i++) { + virDomainDiskDefPtr disk; + int format; + size_t bytes; + int index; + + index = virDomainDiskIndexByName(vm->def, mig->nbd->disk[i].target, false); + if (index < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("No such disk '%s"), + mig->nbd->disk[i].target); + goto cleanup; + } + + disk = vm->def->disks[i]; + format = disk->format; + bytes = mig->nbd->disk[i].bytes; + + VIR_DEBUG("Checking '%s' of %s format for its size (requested %zuB)", + disk->src, virStorageFileFormatTypeToString(format), bytes); + switch (format) { + case VIR_STORAGE_FILE_RAW: + if (qemuMigrationPreCreateStorageRaw(disk, bytes) < 0) + goto cleanup; + break; + case VIR_STORAGE_FILE_QCOW: + case VIR_STORAGE_FILE_QCOW2: + if (qemuMigrationPreCreateStorageQCOW(driver, disk, + format, bytes) < 0) + goto cleanup; + break; + default: + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, + _("Cannot pre-create disks of %s format"), + virStorageFileFormatTypeToString(format)); + goto cleanup; + } + + } + + ret = 0; +cleanup: + /* free from migration data to prevent + * infinite sending from src to dst and back */ + VIR_FREE(mig->nbd->disk); + mig->nbd->ndisks = 0; + return ret; +} + /* Validate whether the domain is safe to migrate. If vm is NULL, * then this is being run in the v2 Prepare stage on the destination * (where we only have the target xml); if vm is provided, then this @@ -2014,6 +2284,10 @@ qemuMigrationPrepareAny(virQEMUDriverPtr driver, QEMU_MIGRATION_COOKIE_NBD))) goto cleanup; + /* pre-create all storage */ + if (qemuMigrationPreCreateStorage(driver, vm, mig) < 0) + goto cleanup; + if (qemuMigrationJobStart(driver, vm, QEMU_ASYNC_JOB_MIGRATION_IN) < 0) goto cleanup; qemuMigrationJobSetPhase(driver, vm, QEMU_MIGRATION_PHASE_PREPARE); -- 1.8.0.2

On Mon, Feb 18, 2013 at 15:38:42 +0100, Michal Privoznik wrote:
With new NBD storage migration approach there are several requirements that need to be meet for successful use of the
s/meet/met/
feature. One of them is - the file representing a disk, needs to have at least same size as on the source. Hence, we must transfer a list of pairs [disk target, size] and check on destination that this requirement is met and/or take actions to meet it. --- src/qemu/qemu_migration.c | 302 +++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 288 insertions(+), 14 deletions(-)
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index e2c7804..9b38a0c 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c ... @@ -525,22 +555,71 @@ qemuMigrationCookieAddNetwork(qemuMigrationCookiePtr mig,
static int qemuMigrationCookieAddNBD(qemuMigrationCookiePtr mig, - virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virQEMUDriverPtr driver, virDomainObjPtr vm) { qemuDomainObjPrivatePtr priv = vm->privateData; + int ret = -1; + size_t i;
/* It is not a bug if there already is a NBD data */ if (!mig->nbd && VIR_ALLOC(mig->nbd) < 0) { virReportOOMError(); - return -1; + return ret; + } + + /* in Begin phase add info about disks */ + if (priv->job.phase == QEMU_MIGRATION_PHASE_BEGIN3 && + vm->def->ndisks) { + if (VIR_ALLOC_N(mig->nbd->disk, vm->def->ndisks) < 0) { + virReportOOMError(); + goto cleanup; + } + + for (i = 0; i < vm->def->ndisks; i++) { + virDomainDiskDefPtr disk = vm->def->disks[i]; + virDomainBlockInfo info; + int format = disk->format; + + /* Add only non-shared RW disks with source */ + if (!disk->src || disk->shared || disk->readonly) + continue; + + memset(&info, 0, sizeof(info)); + + if (qemuDomainGetDiskBlockInfo(driver, vm, disk, &info) < 0) + goto cleanup;
Hmm, I see solving the need to enter a job for getting disk block info won't be exactly easy.
+ + if (format != VIR_STORAGE_FILE_RAW && + format != VIR_STORAGE_FILE_QCOW && + format != VIR_STORAGE_FILE_QCOW2) { + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, + _("Cannot pre-create storage of %s format"), + virStorageFileFormatTypeToString(format)); + goto cleanup; + } + + if (!(mig->nbd->disk[mig->nbd->ndisks].target = strdup(disk->dst))) { + virReportOOMError(); + goto cleanup; + } + + mig->nbd->disk[mig->nbd->ndisks].bytes = info.capacity; + mig->nbd->ndisks++; + } }
mig->nbd->port = priv->nbdPort; mig->flags |= QEMU_MIGRATION_COOKIE_NBD;
- return 0; + ret = 0; + +cleanup: + if (ret < 0) + qemuMigrationCookieNBDFree(mig->nbd); + + return ret; }
...
@@ -946,20 +1034,56 @@ qemuMigrationCookieXMLParse(qemuMigrationCookiePtr mig, virXPathBoolean("boolean(./nbd)", ctxt)) { char *port;
+ if (VIR_ALLOC(mig->nbd) < 0) { + virReportOOMError(); + goto error; + } + port = virXPathString("string(./nbd/@port)", ctxt); - if (port) { - if (VIR_ALLOC(mig->nbd) < 0) { + if (port && virStrToLong_i(port, NULL, 10, &mig->nbd->port) < 0) { + VIR_FREE(port); + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Malformed nbd port '%s'"), + port);
Oops, I believe you wanted to swap these two lines :-)
+ goto error; + } + VIR_FREE(port); + + if ((n = virXPathNodeSet("./nbd/disk", ctxt, &nodes)) > 0) { + xmlNodePtr oldNode = ctxt->node; + if (VIR_ALLOC_N(mig->nbd->disk, n) < 0) { virReportOOMError(); goto error; } - if (virStrToLong_i(port, NULL, 10, &mig->nbd->port) < 0) { - VIR_FREE(port); - virReportError(VIR_ERR_INTERNAL_ERROR, - _("Malformed nbd port '%s'"), - port); - goto error; + mig->nbd->ndisks = n; + + for (i = 0; i < n; i++) { + ctxt->node = nodes[i];
When accessing just attributes, it's usually easier to just call virXMLPropString(nodes[i], "name") instead of using XPath machinery.
+ + tmp = virXPathString("string(./@target)", ctxt); + if (!tmp) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Malformed target attribute"));
I think this should rather be "Malformed NBD disk target attribute".
+ goto error; + } + mig->nbd->disk[i].target = tmp; + /* deliberately don't free @tmp here, as the + * cookie has the reference now and it is + * responsible for freeing it later */ + + tmp = virXPathString("string(./@size)", ctxt); + if (virStrToLong_ull(tmp, NULL, 10, (unsigned long long *) + &mig->nbd->disk[i].bytes) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Malformed size attribute '%s'"),
You should mention NBD disk here as well.
+ tmp); + VIR_FREE(tmp); + goto error; + } + VIR_FREE(tmp); } - VIR_FREE(port); + VIR_FREE(nodes); + ctxt->node = oldNode; } }
@@ -1352,6 +1476,152 @@ error: goto cleanup; }
+static int +qemuMigrationPreCreateStorageRaw(virDomainDiskDefPtr disk, + size_t bytes) +{ + int ret = -1; + struct stat sb; + int fd = -1; + + if ((fd = virFileOpenAs(disk->src, O_RDWR | O_CREAT, 0660, + -1, -1, VIR_FILE_OPEN_NOFORK)) < 0) { + virReportSystemError(errno, _("Unable to create '%s'"), disk->src); + goto cleanup; + } + + if (fstat(fd, &sb) < 0) { + virReportSystemError(errno, _("Unable to stat '%s'"), disk->src); + goto unlink; + } + + VIR_DEBUG("File '%s' is %zuB big. Required %zuB", disk->src, sb.st_size, bytes); + if (sb.st_size < bytes) { + if (ftruncate(fd, bytes) < 0) { + virReportSystemError(errno, _("Unable to ftruncate '%s'"), disk->src); + goto unlink; + }
Shouldn't we rather call posix_fallocate or something similar?
+ + if (fsync(fd) < 0) { + ret = -errno;
No need to pass errno to the caller.
+ virReportSystemError(errno, _("cannot sync data to file '%s'"), + disk->src); + goto unlink; + } + } + + ret = 0; + +cleanup: + VIR_FORCE_CLOSE(fd);
You should really call VIR_CLOSE(fd) here and check for errors.
+ return ret; + +unlink: + if (unlink(disk->src) < 0) + VIR_WARN("Unable to unlink '%s': %d", disk->src, errno); + goto cleanup;
What if the file was already there but it was just too small and making it larger failed. I don't think you want to unlink it in this case.
+} + +static int +qemuMigrationPreCreateStorageQCOW(virQEMUDriverPtr driver, + virDomainDiskDefPtr disk, + int format, + size_t bytes) +{ + int ret = -1; + const char *imgBinary = qemuFindQemuImgBinary(driver); + virCommandPtr cmd = NULL; + size_t size_arg; + + if (!imgBinary) + return ret; + + /* Size in KB */ + size_arg = VIR_DIV_UP(bytes, 1024); + + cmd = virCommandNewArgList(imgBinary, "create", "-f", + virStorageFileFormatTypeToString(format), + "-o", "preallocation=metadata", + disk->src, NULL);
Wrong indentation.
+ + virCommandAddArgFormat(cmd, "%zuK", size_arg);
Any particular reason why we need to round up the size when we can just use bytes? virCommandAddArgFormat(cmd, "%zu", bytes);
+ + if (virCommandRun(cmd, NULL) < 0) + goto unlink;
What happens if the QCOW is already there?
+ + ret = 0; + +cleanup: + return ret; + +unlink: + if (unlink(disk->src) < 0) + VIR_WARN("Unable to unlink '%s': %d", disk->src, errno); + goto cleanup; +}
Anyway, I feel like all this storage creation code should be placed in a storage driver somewhere.
+ +static int +qemuMigrationPreCreateStorage(virQEMUDriverPtr driver, + virDomainObjPtr vm, + qemuMigrationCookiePtr mig) +{ + int ret = -1; + size_t i = 0; + + if (!mig->nbd || !mig->nbd->ndisks) { + /* nothing to do here */ + return 0; + } + + for (i = 0; i < mig->nbd->ndisks; i++) { + virDomainDiskDefPtr disk; + int format; + size_t bytes; + int index;
This fails to compile: error: declaration of 'index' shadows a global declaration
+ + index = virDomainDiskIndexByName(vm->def, mig->nbd->disk[i].target, false); + if (index < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("No such disk '%s"), + mig->nbd->disk[i].target); + goto cleanup; + } + + disk = vm->def->disks[i];
As John already said, I think you want vm->def->disks[index] here.
+ format = disk->format; + bytes = mig->nbd->disk[i].bytes; + + VIR_DEBUG("Checking '%s' of %s format for its size (requested %zuB)", + disk->src, virStorageFileFormatTypeToString(format), bytes); + switch (format) { + case VIR_STORAGE_FILE_RAW: + if (qemuMigrationPreCreateStorageRaw(disk, bytes) < 0) + goto cleanup; + break; + case VIR_STORAGE_FILE_QCOW: + case VIR_STORAGE_FILE_QCOW2: + if (qemuMigrationPreCreateStorageQCOW(driver, disk, + format, bytes) < 0) + goto cleanup; + break; + default: + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, + _("Cannot pre-create disks of %s format"), + virStorageFileFormatTypeToString(format)); + goto cleanup; + } + + } + + ret = 0; +cleanup: + /* free from migration data to prevent + * infinite sending from src to dst and back */ + VIR_FREE(mig->nbd->disk);
This leaks all disk targets.
+ mig->nbd->ndisks = 0; + return ret; +} + /* Validate whether the domain is safe to migrate. If vm is NULL, * then this is being run in the v2 Prepare stage on the destination * (where we only have the target xml); if vm is provided, then this @@ -2014,6 +2284,10 @@ qemuMigrationPrepareAny(virQEMUDriverPtr driver, QEMU_MIGRATION_COOKIE_NBD))) goto cleanup;
+ /* pre-create all storage */ + if (qemuMigrationPreCreateStorage(driver, vm, mig) < 0) + goto cleanup; +
What happens if the disks we are pre-creating exist and are used by another domain on the destination host? :-P Normally, migration would fail if locking manager is in use but with your code, we'd just happily overwrite all disks, right? :-) I'm afraid we should deal with lock manager somehow to avoid doing that :-/
if (qemuMigrationJobStart(driver, vm, QEMU_ASYNC_JOB_MIGRATION_IN) < 0) goto cleanup; qemuMigrationJobSetPhase(driver, vm, QEMU_MIGRATION_PHASE_PREPARE);
Jirka

On Wed, Feb 20, 2013 at 14:47:11 +0100, Jiri Denemark wrote: ...
+static int +qemuMigrationPreCreateStorage(virQEMUDriverPtr driver, + virDomainObjPtr vm, + qemuMigrationCookiePtr mig) +{ + int ret = -1; + size_t i = 0; + + if (!mig->nbd || !mig->nbd->ndisks) { + /* nothing to do here */ + return 0; + } + + for (i = 0; i < mig->nbd->ndisks; i++) { + virDomainDiskDefPtr disk; + int format; + size_t bytes; + int index;
This fails to compile:
error: declaration of 'index' shadows a global declaration
+ + index = virDomainDiskIndexByName(vm->def, mig->nbd->disk[i].target, false); + if (index < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("No such disk '%s"), + mig->nbd->disk[i].target); + goto cleanup; + } + + disk = vm->def->disks[i];
As John already said, I think you want vm->def->disks[index] here.
+ format = disk->format; + bytes = mig->nbd->disk[i].bytes;
Oh, and I forgot to mention that we should only try to pre-create disks with type='file'. Jirka

At the end of migration, it is important to stop NBD server and thus release all allocated resources. --- src/qemu/qemu_migration.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 9b38a0c..ba602e4 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1622,6 +1622,29 @@ cleanup: return ret; } +static void +qemuMigrationStopNBDServer(virQEMUDriverPtr driver, + virDomainObjPtr vm, + qemuMigrationCookiePtr mig) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + + if (!mig->nbd) + return; + + if (qemuDomainObjEnterMonitorAsync(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_IN) < 0) + return; + + if (qemuMonitorNBDServerStop(priv->mon) < 0) + VIR_WARN("Unable to stop NBD server"); + + qemuDomainObjExitMonitor(driver, vm); + + virPortAllocatorRelease(driver->remotePorts, priv->nbdPort); + priv->nbdPort = 0; +} + /* Validate whether the domain is safe to migrate. If vm is NULL, * then this is being run in the v2 Prepare stage on the destination * (where we only have the target xml); if vm is provided, then this @@ -4040,6 +4063,8 @@ qemuMigrationFinish(virQEMUDriverPtr driver, VIR_WARN("unable to provide network data for relocation"); } + qemuMigrationStopNBDServer(driver, vm, mig); + if (flags & VIR_MIGRATE_PERSIST_DEST) { virDomainDefPtr vmdef; if (vm->persistent) -- 1.8.0.2

On Mon, Feb 18, 2013 at 15:38:43 +0100, Michal Privoznik wrote:
At the end of migration, it is important to stop NBD server and thus release all allocated resources. --- src/qemu/qemu_migration.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+)
ACK Jirka

If a migration fails, we need to stop all block jobs running so qemu doesn't try to send data to destination over and over again. --- src/qemu/qemu_migration.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 46 insertions(+) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index ba602e4..95b6102 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1645,6 +1645,46 @@ qemuMigrationStopNBDServer(virQEMUDriverPtr driver, priv->nbdPort = 0; } +static void +qemuMigrationCancelDriveMirror(qemuMigrationCookiePtr mig, + virQEMUDriverPtr driver, + virDomainObjPtr vm) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + size_t i; + char *diskAlias = NULL; + + VIR_DEBUG("mig=%p nbdPort=%d", mig->nbd, priv->nbdPort); + + for (i = 0; i < vm->def->ndisks; i++) { + virDomainDiskDefPtr disk = vm->def->disks[i]; + + /* skip shared, RO and source-less disks */ + if (disk->shared || disk->readonly || !disk->src) + continue; + + VIR_FREE(diskAlias); + if (virAsprintf(&diskAlias, "%s%s", + QEMU_DRIVE_HOST_PREFIX, disk->info.alias) < 0) { + virReportOOMError(); + goto cleanup; + } + + if (qemuDomainObjEnterMonitorAsync(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) + goto cleanup; + + if (qemuMonitorBlockJob(priv->mon, diskAlias, NULL, 0, + NULL, BLOCK_JOB_ABORT, true) < 0) + VIR_WARN("Unable to stop block job on %s", diskAlias); + qemuDomainObjExitMonitor(driver, vm); + } + +cleanup: + VIR_FREE(diskAlias); + return; +} + /* Validate whether the domain is safe to migrate. If vm is NULL, * then this is being run in the v2 Prepare stage on the destination * (where we only have the target xml); if vm is provided, then this @@ -3054,6 +3094,9 @@ cleanup: if (ret < 0 && !orig_err) orig_err = virSaveLastError(); + /* cancel any outstanding NBD jobs */ + qemuMigrationCancelDriveMirror(mig, driver, vm); + if (spec->fwdType != MIGRATION_FWD_DIRECT) { if (iothread && qemuMigrationStopTunnel(iothread, ret < 0) < 0) ret = -1; @@ -4261,6 +4304,9 @@ int qemuMigrationConfirm(virQEMUDriverPtr driver, VIR_DOMAIN_EVENT_STOPPED_MIGRATED); } else { + /* cancel any outstanding NBD jobs */ + qemuMigrationCancelDriveMirror(mig, driver, vm); + /* run 'cont' on the destination, which allows migration on qemu * >= 0.10.6 to work properly. This isn't strictly necessary on * older qemu's, but it also doesn't hurt anything there -- 1.8.0.2

On Mon, Feb 18, 2013 at 15:38:44 +0100, Michal Privoznik wrote:
If a migration fails, we need to stop all block jobs running so qemu doesn't try to send data to destination over and over again. --- src/qemu/qemu_migration.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 46 insertions(+)
ACK Jirka

If migratioin fails because of whatever reason and we've pre-created any disks, we should remove them instead of letting them lying around. Moreover, we need to save the disks sources into domain status file in case libvirtd gets restarted. --- src/qemu/qemu_domain.c | 30 ++++++++++++++++++++++++++++-- src/qemu/qemu_domain.h | 2 ++ src/qemu/qemu_migration.c | 13 +++++++++++++ src/qemu/qemu_process.c | 8 ++++++++ 4 files changed, 51 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 8df2739..66e0d82 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -231,10 +231,15 @@ error: static void qemuDomainObjPrivateFree(void *data) { + size_t i; qemuDomainObjPrivatePtr priv = data; virObjectUnref(priv->qemuCaps); + for (i = 0; i < priv->nnbdDisk; i++) + VIR_FREE(priv->nbdDisk[i]); + VIR_FREE(priv->nbdDisk); + qemuDomainPCIAddressSetFree(priv->pciaddrs); virDomainChrSourceDefFree(priv->monConfig); qemuDomainObjFreeJob(priv); @@ -263,6 +268,7 @@ static int qemuDomainObjPrivateXMLFormat(virBufferPtr buf, void *data) qemuDomainObjPrivatePtr priv = data; const char *monitorpath; enum qemuDomainJob job; + size_t i; /* priv->monitor_chr is set only for qemu */ if (priv->monConfig) { @@ -285,7 +291,6 @@ static int qemuDomainObjPrivateXMLFormat(virBufferPtr buf, void *data) if (priv->nvcpupids) { - int i; virBufferAddLit(buf, " <vcpus>\n"); for (i = 0 ; i < priv->nvcpupids ; i++) { virBufferAsprintf(buf, " <vcpu pid='%d'/>\n", priv->vcpupids[i]); @@ -294,7 +299,6 @@ static int qemuDomainObjPrivateXMLFormat(virBufferPtr buf, void *data) } if (priv->qemuCaps) { - int i; virBufferAddLit(buf, " <qemuCaps>\n"); for (i = 0 ; i < QEMU_CAPS_LAST ; i++) { if (virQEMUCapsGet(priv->qemuCaps, i)) { @@ -328,6 +332,13 @@ static int qemuDomainObjPrivateXMLFormat(virBufferPtr buf, void *data) if (priv->fakeReboot) virBufferAsprintf(buf, " <fakereboot/>\n"); + if (priv->nnbdDisk) { + virBufferAddLit(buf, " <nbdDisk>\n"); + for (i = 0; i < priv->nnbdDisk; i++) + virBufferAsprintf(buf, " <disk src='%s'/>\n", priv->nbdDisk[i]); + virBufferAddLit(buf, " </nbdDisk>\n"); + } + return 0; } @@ -473,6 +484,21 @@ static int qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, void *data) priv->fakeReboot = virXPathBoolean("boolean(./fakereboot)", ctxt) == 1; + n = virXPathNodeSet("./nbdDisk/disk", ctxt, &nodes); + if (n < 0) + goto error; + if (n) { + if (VIR_REALLOC_N(priv->nbdDisk, n) < 0) { + virReportOOMError(); + goto error; + } + priv->nnbdDisk = n; + + for (i = 0; i < n; i++) + if (!(priv->nbdDisk[i] = virXMLPropString(nodes[i], "src"))) + goto error; + } + return 0; error: diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 4e20a49..83a2c1f 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -155,6 +155,8 @@ struct _qemuDomainObjPrivate { unsigned long migMaxBandwidth; char *origname; int nbdPort; /* Port used for migration with NBD */ + size_t nnbdDisk; + char **nbdDisk; /* src of disks we want to transfer via NBD */ virChrdevsPtr devs; diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 95b6102..fb4b7ac 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1565,6 +1565,7 @@ qemuMigrationPreCreateStorage(virQEMUDriverPtr driver, virDomainObjPtr vm, qemuMigrationCookiePtr mig) { + qemuDomainObjPrivatePtr priv = vm->privateData; int ret = -1; size_t i = 0; @@ -1611,6 +1612,11 @@ qemuMigrationPreCreateStorage(virQEMUDriverPtr driver, goto cleanup; } + if ((VIR_REALLOC_N(priv->nbdDisk, priv->nnbdDisk + 1) < 0) || + !(priv->nbdDisk[priv->nnbdDisk++] = strdup(disk->src))) { + virReportOOMError(); + goto cleanup; + } } ret = 0; @@ -4055,6 +4061,7 @@ qemuMigrationFinish(virQEMUDriverPtr driver, qemuDomainObjPrivatePtr priv = vm->privateData; virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); virCapsPtr caps = NULL; + size_t i; VIR_DEBUG("driver=%p, dconn=%p, vm=%p, cookiein=%s, cookieinlen=%d, " "cookieout=%p, cookieoutlen=%p, flags=%lx, retcode=%d", @@ -4208,6 +4215,12 @@ qemuMigrationFinish(virQEMUDriverPtr driver, VIR_DOMAIN_EVENT_SUSPENDED, VIR_DOMAIN_EVENT_SUSPENDED_PAUSED); } + + /* Free disks transferred via NBD so they don't get deleted */ + for (i = 0; i < priv->nnbdDisk; i++) + VIR_FREE(priv->nbdDisk[i]); + VIR_FREE(priv->nbdDisk); + priv->nnbdDisk = 0; } if (virDomainObjIsActive(vm) && diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index e6874e2..9fa7004 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -4242,6 +4242,14 @@ void qemuProcessStop(virQEMUDriverPtr driver, priv->nbdPort = 0; } + for (i = 0; i < priv->nnbdDisk; i++) { + VIR_DEBUG("Unlinking %s due to unsuccessful NBD transfer", + priv->nbdDisk[i]); + if (unlink(priv->nbdDisk[i]) < 0) + virReportSystemError(errno, _("Unable to unlink %s"), + priv->nbdDisk[i]); + } + if (priv->agent) { qemuAgentClose(priv->agent); priv->agent = NULL; -- 1.8.0.2

On Mon, Feb 18, 2013 at 15:38:45 +0100, Michal Privoznik wrote:
If migratioin fails because of whatever reason and we've pre-created any disks, we should remove them instead of letting them lying around. Moreover, we need to save the disks sources into domain status file in case libvirtd gets restarted. --- src/qemu/qemu_domain.c | 30 ++++++++++++++++++++++++++++-- src/qemu/qemu_domain.h | 2 ++ src/qemu/qemu_migration.c | 13 +++++++++++++ src/qemu/qemu_process.c | 8 ++++++++ 4 files changed, 51 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 8df2739..66e0d82 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -231,10 +231,15 @@ error:
static void qemuDomainObjPrivateFree(void *data) { + size_t i;
int or unsigned would also work :-)
qemuDomainObjPrivatePtr priv = data;
virObjectUnref(priv->qemuCaps);
+ for (i = 0; i < priv->nnbdDisk; i++) + VIR_FREE(priv->nbdDisk[i]); + VIR_FREE(priv->nbdDisk); + qemuDomainPCIAddressSetFree(priv->pciaddrs); virDomainChrSourceDefFree(priv->monConfig); qemuDomainObjFreeJob(priv); @@ -263,6 +268,7 @@ static int qemuDomainObjPrivateXMLFormat(virBufferPtr buf, void *data) qemuDomainObjPrivatePtr priv = data; const char *monitorpath; enum qemuDomainJob job; + size_t i;
int or unsigned would also work :-)
/* priv->monitor_chr is set only for qemu */ if (priv->monConfig) {
...
@@ -473,6 +484,21 @@ static int qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, void *data)
priv->fakeReboot = virXPathBoolean("boolean(./fakereboot)", ctxt) == 1;
+ n = virXPathNodeSet("./nbdDisk/disk", ctxt, &nodes); + if (n < 0) + goto error; + if (n) { + if (VIR_REALLOC_N(priv->nbdDisk, n) < 0) { + virReportOOMError(); + goto error; + } + priv->nnbdDisk = n; + + for (i = 0; i < n; i++) + if (!(priv->nbdDisk[i] = virXMLPropString(nodes[i], "src"))) + goto error;
Ouch, for some reason I though virXMLPropString does not strdup the attribute value. But looking at xmlGetProp (which virXMLPropString is just a wrapper for) the caller is responsible for freeing the memory. Thus your code is correct and we have a lot of memory leaks in our XML parsers :-(
+ } + return 0;
error:
ACK Jirka
participants (2)
-
Jiri Denemark
-
Michal Privoznik