[libvirt] [PATCH v3 00/12] Rework storage migration

This patch set re-implements migration with storage for enough new qemu. Currently, you can migrate a domain to a host without need for shared storage. This is done by setting 'blk' or 'inc' attribute (representing VIR_MIGRATE_NON_SHARED_DISK and VIR_MIGRATE_NON_SHARED_INC flags respectively) of 'migrate' monitor command. However, the qemu implementation is buggy and applications are advised to switch to new impementation which, moreover, offers some nice features, like migrating only explicitly specified disks. The new functionality is controlled via 'nbd-server-*' and 'drive-mirror' commands. The flow is meant to look like this: 1) User invokes libvirt's migrate functionality. 2) libvirt checks that no block jobs are active on the source. 3) libvirt starts the destination QEMU and sets up the NBD server using the nbd-server-start and nbd-server-add commands. 4) libvirt starts drive-mirror with a destination pointing to the remote NBD server, for example nbd:host:port:exportname=diskname (where diskname is the -drive id specified on the destination). 5) once all mirroring jobs reach steady state, libvirt invokes the migrate command. 6) once migration completed, libvirt invokes the nbd-server-stop command on the destination QEMU. If we just skip the 2nd step and there is an active block-job, qemu will fail in step 4. No big deal. Since we try to NOT break migration and keep things compatible, this feature is enabled iff both sides support it. Since there's obvious need for some data transfer between src and dst, I've put it into qemuCookieMigration: 1) src -> dest: (QEMU_MIGRATION_PHASE_BEGIN3 -> QEMU_MIGRATION_PHASE_PREPARE) <nbd> <disk target='vda' size='15032385536'/> <disk target='vdb' size='11534336'/> <disk target='vdc' size='13631488'/> </nbd> The source is telling the destination it supports the NBD feature. Moreover, which disks are to be transferred and what are their sizes. That's because disks needs to be fully allocated (even qcow) for successful transfer. 2) dst -> src: (QEMU_MIGRATION_PHASE_PREPARE -> QEMU_MIGRATION_PHASE_PERFORM3) <nbd port='5901'/> The destination is confirming it does support NBD as well. All disks were pre-created and NBD server is listening on given port (5901 in this case). If either src or dst doesn't support NBD, it is not used and whole process falls back to old implementation. diff to v1: -Eric's and Daniel's suggestions worked in. To point out the bigger ones: don't do NBD style when TUNNELLED requested, added 'b:writable' to 'nbd-server-add' -drop '/qemu-migration/nbd/disk/@src' attribute from migration cookie. As pointed out by Jirka, disk->src can be changed during migration (e.g. by migration hook or by passed xml). So I've tried (as suggested on the list) passing disk alias. However, since qemu hasn't been started on destination yet, the aliases hasn't been generated yet. So we have to rely on ordering completely. diff to v2: -rebase to reflect changes made by offline migration patch -send initial nbd cookie only if needed diff to v2.1: -nbd cookie reworked -don't rely on disk ordering in the cookie, but use disk target for that -adapt to virPortAllocator -unlink pre-created storage on migration fail -other of Jirka's suggestions worked in Michal Privoznik (12): qemu: Introduce NBD_SERVER capability Introduce NBD migration cookie qemu: Introduce nbd-server-start command qemu: Introduce nbd-server-add command qemu: Introduce nbd-server-stop command qemu_migration: Introduce qemuMigrationStartNBDServer() qemu_migration: Introduce qemuMigrationDriveMirror qemu_domain: Introduce qemuDomainGetDiskBlockInfo qemu_migration: Check size prerequisites qemu_migration: Stop NBD server at Finish phase qemu_migration: Cancel running jobs on failed migration qemu_migration: Unlink pre-created storage on error src/qemu/qemu_capabilities.c | 4 +- src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_domain.c | 157 ++++++++- src/qemu/qemu_domain.h | 7 + src/qemu/qemu_driver.c | 124 +------ src/qemu/qemu_migration.c | 769 ++++++++++++++++++++++++++++++++++++++++++- src/qemu/qemu_monitor.c | 63 ++++ src/qemu/qemu_monitor.h | 7 + src/qemu/qemu_monitor_json.c | 102 ++++++ src/qemu/qemu_monitor_json.h | 7 + src/qemu/qemu_process.c | 13 + 11 files changed, 1118 insertions(+), 136 deletions(-) -- 1.8.0.2

This just keeps track whether qemu knows nbd-server-* commands so we can use it during migration or not. --- src/qemu/qemu_capabilities.c | 4 +++- src/qemu/qemu_capabilities.h | 1 + 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index e390cb1..ca831fa 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -205,7 +205,7 @@ VIR_ENUM_IMPL(qemuCaps, QEMU_CAPS_LAST, "usb-serial", /* 125 */ "usb-net", "add-fd", - + "nbd-server-start", ); struct _qemuCaps { @@ -1965,6 +1965,8 @@ qemuCapsProbeQMPCommands(qemuCapsPtr caps, qemuCapsSet(caps, QEMU_CAPS_DISK_SNAPSHOT); else if (STREQ(name, "add-fd")) qemuCapsSet(caps, QEMU_CAPS_ADD_FD); + else if (STREQ(name, "nbd-server-start")) + qemuCapsSet(caps, QEMU_CAPS_NBD_SERVER); VIR_FREE(name); } VIR_FREE(commands); diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index cb0dad0..7e720f4 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -166,6 +166,7 @@ enum qemuCapsFlags { QEMU_CAPS_DEVICE_USB_SERIAL = 125, /* -device usb-serial */ QEMU_CAPS_DEVICE_USB_NET = 126, /* -device usb-net */ QEMU_CAPS_ADD_FD = 127, /* -add-fd */ + QEMU_CAPS_NBD_SERVER = 128, /* nbd-server-start QMP command */ QEMU_CAPS_LAST, /* this must always be the last item */ }; -- 1.8.0.2

This migration cookie is meant for two purposes. The first is to be sent in begin phase from source to destination to let it know we support new implementation of VIR_MIGRATE_NON_SHARED_{DISK,INC} so destination can start NBD server. Then, the second purpose is, destination can let us know, on which port the NBD server is running. --- src/qemu/qemu_domain.h | 1 + src/qemu/qemu_migration.c | 139 +++++++++++++++++++++++++++++++++++++++++----- 2 files changed, 125 insertions(+), 15 deletions(-) diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index 68cf295..fb58877 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -154,6 +154,7 @@ struct _qemuDomainObjPrivate { unsigned long migMaxBandwidth; char *origname; + int nbdPort; /* Port used for migration with NBD */ virChrdevsPtr devs; diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index a75fb4e..c71021d 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -74,6 +74,7 @@ enum qemuMigrationCookieFlags { QEMU_MIGRATION_COOKIE_FLAG_LOCKSTATE, QEMU_MIGRATION_COOKIE_FLAG_PERSISTENT, QEMU_MIGRATION_COOKIE_FLAG_NETWORK, + QEMU_MIGRATION_COOKIE_FLAG_NBD, QEMU_MIGRATION_COOKIE_FLAG_LAST }; @@ -81,13 +82,18 @@ enum qemuMigrationCookieFlags { VIR_ENUM_DECL(qemuMigrationCookieFlag); VIR_ENUM_IMPL(qemuMigrationCookieFlag, QEMU_MIGRATION_COOKIE_FLAG_LAST, - "graphics", "lockstate", "persistent", "network"); + "graphics", + "lockstate", + "persistent", + "network", + "nbd"); enum qemuMigrationCookieFeatures { QEMU_MIGRATION_COOKIE_GRAPHICS = (1 << QEMU_MIGRATION_COOKIE_FLAG_GRAPHICS), QEMU_MIGRATION_COOKIE_LOCKSTATE = (1 << QEMU_MIGRATION_COOKIE_FLAG_LOCKSTATE), QEMU_MIGRATION_COOKIE_PERSISTENT = (1 << QEMU_MIGRATION_COOKIE_FLAG_PERSISTENT), QEMU_MIGRATION_COOKIE_NETWORK = (1 << QEMU_MIGRATION_COOKIE_FLAG_NETWORK), + QEMU_MIGRATION_COOKIE_NBD = (1 << QEMU_MIGRATION_COOKIE_FLAG_NBD), }; typedef struct _qemuMigrationCookieGraphics qemuMigrationCookieGraphics; @@ -121,6 +127,14 @@ struct _qemuMigrationCookieNetwork { qemuMigrationCookieNetDataPtr net; }; +typedef struct _qemuMigrationCookieNBD qemuMigrationCookieNBD; +typedef qemuMigrationCookieNBD *qemuMigrationCookieNBDPtr; +struct _qemuMigrationCookieNBD { + int port; /* on which port does NBD server listen for incoming data. + Zero value has special meaning - it is there just to let + destination know we (the source) do support NBD. */ +}; + typedef struct _qemuMigrationCookie qemuMigrationCookie; typedef qemuMigrationCookie *qemuMigrationCookiePtr; struct _qemuMigrationCookie { @@ -149,6 +163,9 @@ struct _qemuMigrationCookie { /* If (flags & QEMU_MIGRATION_COOKIE_NETWORK) */ qemuMigrationCookieNetworkPtr network; + + /* If (flags & QEMU_MIGRATION_COOKIE_NBD) */ + qemuMigrationCookieNBDPtr nbd; }; static void qemuMigrationCookieGraphicsFree(qemuMigrationCookieGraphicsPtr grap) @@ -194,6 +211,7 @@ static void qemuMigrationCookieFree(qemuMigrationCookiePtr mig) VIR_FREE(mig->name); VIR_FREE(mig->lockState); VIR_FREE(mig->lockDriver); + VIR_FREE(mig->nbd); VIR_FREE(mig); } @@ -504,6 +522,27 @@ qemuMigrationCookieAddNetwork(qemuMigrationCookiePtr mig, } +static int +qemuMigrationCookieAddNBD(qemuMigrationCookiePtr mig, + virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virDomainObjPtr vm) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + + /* It is not a bug if there already is a NBD data */ + if (!mig->nbd && + VIR_ALLOC(mig->nbd) < 0) { + virReportOOMError(); + return -1; + } + + mig->nbd->port = priv->nbdPort; + mig->flags |= QEMU_MIGRATION_COOKIE_NBD; + + return 0; +} + + static void qemuMigrationCookieGraphicsXMLFormat(virBufferPtr buf, qemuMigrationCookieGraphicsPtr grap) { @@ -606,6 +645,13 @@ qemuMigrationCookieXMLFormat(virQEMUDriverPtr driver, if ((mig->flags & QEMU_MIGRATION_COOKIE_NETWORK) && mig->network) qemuMigrationCookieNetworkXMLFormat(buf, mig->network); + if ((mig->flags & QEMU_MIGRATION_COOKIE_NBD) && mig->nbd) { + virBufferAddLit(buf, " <nbd"); + if (mig->nbd->port) + virBufferAsprintf(buf, " port='%d'", mig->nbd->port); + virBufferAddLit(buf, "/>\n"); + } + virBufferAddLit(buf, "</qemu-migration>\n"); return 0; } @@ -833,6 +879,12 @@ qemuMigrationCookieXMLParse(qemuMigrationCookiePtr mig, goto error; } + /* nbd is optional */ + if (val == QEMU_MIGRATION_COOKIE_FLAG_NBD) { + VIR_FREE(str); + continue; + } + if ((flags & (1 << val)) == 0) { virReportError(VIR_ERR_INTERNAL_ERROR, _("Unsupported migration cookie feature %s"), @@ -885,6 +937,27 @@ qemuMigrationCookieXMLParse(qemuMigrationCookiePtr mig, (!(mig->network = qemuMigrationCookieNetworkXMLParse(ctxt)))) goto error; + if (flags & QEMU_MIGRATION_COOKIE_NBD && + virXPathBoolean("boolean(./nbd)", ctxt)) { + char *port; + + port = virXPathString("string(./nbd/@port)", ctxt); + if (port) { + if (VIR_ALLOC(mig->nbd) < 0) { + virReportOOMError(); + goto error; + } + if (virStrToLong_i(port, NULL, 10, &mig->nbd->port) < 0) { + VIR_FREE(port); + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Malformed nbd port '%s'"), + port); + goto error; + } + VIR_FREE(port); + } + } + return 0; error: @@ -949,6 +1022,10 @@ qemuMigrationBakeCookie(qemuMigrationCookiePtr mig, return -1; } + if ((flags & QEMU_MIGRATION_COOKIE_NBD) && + qemuMigrationCookieAddNBD(mig, driver, dom) < 0) + return -1; + if (!(*cookieout = qemuMigrationCookieXMLFormatStr(driver, mig))) return -1; @@ -1434,6 +1511,7 @@ char *qemuMigrationBegin(virQEMUDriverPtr driver, qemuMigrationCookiePtr mig = NULL; virDomainDefPtr def = NULL; qemuDomainObjPrivatePtr priv = vm->privateData; + unsigned int cookieFlags = QEMU_MIGRATION_COOKIE_LOCKSTATE; VIR_DEBUG("driver=%p, vm=%p, xmlin=%s, dname=%s," " cookieout=%p, cookieoutlen=%p, flags=%lx", @@ -1453,12 +1531,23 @@ char *qemuMigrationBegin(virQEMUDriverPtr driver, if (!(flags & VIR_MIGRATE_UNSAFE) && !qemuMigrationIsSafe(vm->def)) goto cleanup; + if (flags & (VIR_MIGRATE_NON_SHARED_DISK | VIR_MIGRATE_NON_SHARED_INC) && + qemuCapsGet(priv->caps, QEMU_CAPS_NBD_SERVER)) { + /* TODO support NBD for TUNNELLED migration */ + if (flags & VIR_MIGRATE_TUNNELLED) + VIR_DEBUG("NBD in tunnelled migration is currently not supported"); + else { + cookieFlags |= QEMU_MIGRATION_COOKIE_NBD; + priv->nbdPort = 0; + } + } + if (!(mig = qemuMigrationEatCookie(driver, vm, NULL, 0, 0))) goto cleanup; if (qemuMigrationBakeCookie(mig, driver, vm, cookieout, cookieoutlen, - QEMU_MIGRATION_COOKIE_LOCKSTATE) < 0) + cookieFlags) < 0) goto cleanup; if (flags & VIR_MIGRATE_OFFLINE) { @@ -1653,7 +1742,8 @@ qemuMigrationPrepareAny(virQEMUDriverPtr driver, origname = NULL; if (!(mig = qemuMigrationEatCookie(driver, vm, cookiein, cookieinlen, - QEMU_MIGRATION_COOKIE_LOCKSTATE))) + QEMU_MIGRATION_COOKIE_LOCKSTATE | + QEMU_MIGRATION_COOKIE_NBD))) goto cleanup; if (qemuMigrationJobStart(driver, vm, QEMU_ASYNC_JOB_MIGRATION_IN) < 0) @@ -1713,8 +1803,19 @@ done: else cookieFlags = QEMU_MIGRATION_COOKIE_GRAPHICS; - if (qemuMigrationBakeCookie(mig, driver, vm, cookieout, cookieoutlen, - cookieFlags) < 0) { + if (flags & (VIR_MIGRATE_NON_SHARED_DISK | VIR_MIGRATE_NON_SHARED_INC) && + qemuCapsGet(priv->caps, QEMU_CAPS_NBD_SERVER)) { + /* TODO support NBD for TUNNELLED migration */ + if (flags & VIR_MIGRATE_TUNNELLED) + VIR_DEBUG("NBD in tunnelled migration is currently not supported"); + else { + cookieFlags |= QEMU_MIGRATION_COOKIE_NBD; + priv->nbdPort = 0; + } + } + + if (qemuMigrationBakeCookie(mig, driver, vm, cookieout, + cookieoutlen, cookieFlags) < 0) { /* We could tear down the whole guest here, but * cookie data is (so far) non-critical, so that * seems a little harsh. We'll just warn for now. @@ -2210,6 +2311,7 @@ qemuMigrationRun(virQEMUDriverPtr driver, int fd = -1; unsigned long migrate_speed = resource ? resource : priv->migMaxBandwidth; virErrorPtr orig_err = NULL; + unsigned int cookieFlags = 0; VIR_DEBUG("driver=%p, vm=%p, cookiein=%s, cookieinlen=%d, " "cookieout=%p, cookieoutlen=%p, flags=%lx, resource=%lu, " @@ -2218,6 +2320,16 @@ qemuMigrationRun(virQEMUDriverPtr driver, cookieout, cookieoutlen, flags, resource, spec, spec->destType, spec->fwdType); + if (flags & VIR_MIGRATE_NON_SHARED_DISK) { + migrate_flags |= QEMU_MONITOR_MIGRATE_NON_SHARED_DISK; + cookieFlags |= QEMU_MIGRATION_COOKIE_NBD; + } + + if (flags & VIR_MIGRATE_NON_SHARED_INC) { + migrate_flags |= QEMU_MONITOR_MIGRATE_NON_SHARED_INC; + cookieFlags |= QEMU_MIGRATION_COOKIE_NBD; + } + if (virLockManagerPluginUsesState(driver->lockManager) && !cookieout) { virReportError(VIR_ERR_INTERNAL_ERROR, @@ -2227,8 +2339,9 @@ qemuMigrationRun(virQEMUDriverPtr driver, return -1; } - if (!(mig = qemuMigrationEatCookie(driver, vm, cookiein, cookieinlen, - QEMU_MIGRATION_COOKIE_GRAPHICS))) + mig = qemuMigrationEatCookie(driver, vm, cookiein, cookieinlen, + cookieFlags | QEMU_MIGRATION_COOKIE_GRAPHICS); + if (!mig) goto cleanup; if (qemuDomainMigrateGraphicsRelocate(driver, vm, mig) < 0) @@ -2261,11 +2374,6 @@ qemuMigrationRun(virQEMUDriverPtr driver, goto cleanup; } - if (flags & VIR_MIGRATE_NON_SHARED_DISK) - migrate_flags |= QEMU_MONITOR_MIGRATE_NON_SHARED_DISK; - - if (flags & VIR_MIGRATE_NON_SHARED_INC) - migrate_flags |= QEMU_MONITOR_MIGRATE_NON_SHARED_INC; /* connect to the destination qemu if needed */ if (spec->destType == MIGRATION_DEST_CONNECT_HOST && @@ -2373,10 +2481,11 @@ cleanup: VIR_FORCE_CLOSE(fd); } + cookieFlags |= (QEMU_MIGRATION_COOKIE_PERSISTENT | + QEMU_MIGRATION_COOKIE_NETWORK); if (ret == 0 && - qemuMigrationBakeCookie(mig, driver, vm, cookieout, cookieoutlen, - QEMU_MIGRATION_COOKIE_PERSISTENT | - QEMU_MIGRATION_COOKIE_NETWORK) < 0) { + qemuMigrationBakeCookie(mig, driver, vm, cookieout, + cookieoutlen, cookieFlags) < 0) { VIR_WARN("Unable to encode migration cookie"); } -- 1.8.0.2

This will be used with new migration scheme. This patch creates basically just monitor stub functions. Wiring them into something useful is done in later patches. --- src/qemu/qemu_monitor.c | 22 +++++++++++++++++ src/qemu/qemu_monitor.h | 3 +++ src/qemu/qemu_monitor_json.c | 56 ++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_monitor_json.h | 3 +++ 4 files changed, 84 insertions(+) diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 3f1aed8..7508ea3 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -3384,3 +3384,25 @@ char *qemuMonitorGetTargetArch(qemuMonitorPtr mon) return qemuMonitorJSONGetTargetArch(mon); } + +int qemuMonitorNBDServerStart(qemuMonitorPtr mon, + const char *host, + unsigned int port) +{ + VIR_DEBUG("mon=%p host=%s port=%u", + mon, host, port); + + if (!mon) { + virReportError(VIR_ERR_INVALID_ARG, "%s", + _("monitor must not be NULL")); + return -1; + } + + if (!mon->json) { + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", + _("JSON monitor is required")); + return -1; + } + + return qemuMonitorJSONNBDServerStart(mon, host, port); +} diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index ac77158..230f920 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -632,6 +632,9 @@ int qemuMonitorGetObjectProps(qemuMonitorPtr mon, char ***props); char *qemuMonitorGetTargetArch(qemuMonitorPtr mon); +int qemuMonitorNBDServerStart(qemuMonitorPtr mon, + const char *host, + unsigned int port); /** * When running two dd process and using <> redirection, we need a * shell that will not truncate files. These two strings serve that diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index a86d90c..e03688d 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -4359,3 +4359,59 @@ cleanup: virJSONValueFree(reply); return ret; } + +int +qemuMonitorJSONNBDServerStart(qemuMonitorPtr mon, + const char *host, + unsigned int port) +{ + int ret = -1; + virJSONValuePtr cmd = NULL; + virJSONValuePtr reply = NULL; + virJSONValuePtr data = NULL; + virJSONValuePtr addr = NULL; + char *port_str = NULL; + + if (!(data = virJSONValueNewObject()) || + !(addr = virJSONValueNewObject()) || + (virAsprintf(&port_str, "%u", port) < 0)) { + virReportOOMError(); + goto cleanup; + } + + /* port is really expected as a string here by qemu */ + if (virJSONValueObjectAppendString(data, "host", host) < 0 || + virJSONValueObjectAppendString(data, "port", port_str) < 0 || + virJSONValueObjectAppendString(addr, "type", "inet") < 0 || + virJSONValueObjectAppend(addr, "data", data) < 0) { + virReportOOMError(); + goto cleanup; + } + + /* From now on, @data is part of @addr */ + data = NULL; + + if (!(cmd = qemuMonitorJSONMakeCommand("nbd-server-start", + "a:addr", addr, + NULL))) + goto cleanup; + + /* From now on, @addr is part of @cmd */ + addr = NULL; + + if (qemuMonitorJSONCommand(mon, cmd, &reply) < 0) + goto cleanup; + + if (qemuMonitorJSONCheckError(cmd, reply) < 0) + goto cleanup; + + ret = 0; + +cleanup: + VIR_FREE(port_str); + virJSONValueFree(reply); + virJSONValueFree(cmd); + virJSONValueFree(addr); + virJSONValueFree(data); + return ret; +} diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index 925d937..a6ab68b 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -327,4 +327,7 @@ int qemuMonitorJSONGetObjectProps(qemuMonitorPtr mon, ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); char *qemuMonitorJSONGetTargetArch(qemuMonitorPtr mon); +int qemuMonitorJSONNBDServerStart(qemuMonitorPtr mon, + const char *host, + unsigned int port); #endif /* QEMU_MONITOR_JSON_H */ -- 1.8.0.2

This will be used with new migration scheme. This patch creates basically just monitor stub functions. Wiring them into something useful is done in later patches. --- src/qemu/qemu_monitor.c | 22 ++++++++++++++++++++++ src/qemu/qemu_monitor.h | 3 +++ src/qemu/qemu_monitor_json.c | 25 +++++++++++++++++++++++++ src/qemu/qemu_monitor_json.h | 3 +++ 4 files changed, 53 insertions(+) diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 7508ea3..c266341 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -3406,3 +3406,25 @@ int qemuMonitorNBDServerStart(qemuMonitorPtr mon, return qemuMonitorJSONNBDServerStart(mon, host, port); } + +int qemuMonitorNBDServerAdd(qemuMonitorPtr mon, + const char *deviceID, + bool writable) +{ + VIR_DEBUG("mon=%p deviceID=%s", + mon, deviceID); + + if (!mon) { + virReportError(VIR_ERR_INVALID_ARG, "%s", + _("monitor must not be NULL")); + return -1; + } + + if (!mon->json) { + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", + _("JSON monitor is required")); + return -1; + } + + return qemuMonitorJSONNBDServerAdd(mon, deviceID, writable); +} diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 230f920..d54d16d 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -635,6 +635,9 @@ char *qemuMonitorGetTargetArch(qemuMonitorPtr mon); int qemuMonitorNBDServerStart(qemuMonitorPtr mon, const char *host, unsigned int port); +int qemuMonitorNBDServerAdd(qemuMonitorPtr mon, + const char *deviceID, + bool writable); /** * When running two dd process and using <> redirection, we need a * shell that will not truncate files. These two strings serve that diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index e03688d..f14b597 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -4415,3 +4415,28 @@ cleanup: virJSONValueFree(data); return ret; } + +int +qemuMonitorJSONNBDServerAdd(qemuMonitorPtr mon, + const char *deviceID, + bool writable) +{ + int ret = -1; + virJSONValuePtr cmd; + virJSONValuePtr reply = NULL; + + if (!(cmd = qemuMonitorJSONMakeCommand("nbd-server-add", + "s:device", deviceID, + "b:writable", writable, + NULL))) + return ret; + + ret = qemuMonitorJSONCommand(mon, cmd, &reply); + + if (ret == 0) + ret = qemuMonitorJSONCheckError(cmd, reply); + + virJSONValueFree(cmd); + virJSONValueFree(reply); + return ret; +} diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index a6ab68b..c58aa7d 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -330,4 +330,7 @@ char *qemuMonitorJSONGetTargetArch(qemuMonitorPtr mon); int qemuMonitorJSONNBDServerStart(qemuMonitorPtr mon, const char *host, unsigned int port); +int qemuMonitorJSONNBDServerAdd(qemuMonitorPtr mon, + const char *deviceID, + bool writable); #endif /* QEMU_MONITOR_JSON_H */ -- 1.8.0.2

This will be used after all migration work is done to stop NBD server running on destination. It doesn't take any arguments, just issues a command. --- src/qemu/qemu_monitor.c | 19 +++++++++++++++++++ src/qemu/qemu_monitor.h | 1 + src/qemu/qemu_monitor_json.c | 21 +++++++++++++++++++++ src/qemu/qemu_monitor_json.h | 1 + 4 files changed, 42 insertions(+) diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index c266341..14efe7b 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -3428,3 +3428,22 @@ int qemuMonitorNBDServerAdd(qemuMonitorPtr mon, return qemuMonitorJSONNBDServerAdd(mon, deviceID, writable); } + +int qemuMonitorNBDServerStop(qemuMonitorPtr mon) +{ + VIR_DEBUG("mon=%p", mon); + + if (!mon) { + virReportError(VIR_ERR_INVALID_ARG, "%s", + _("monitor must not be NULL")); + return -1; + } + + if (!mon->json) { + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s", + _("JSON monitor is required")); + return -1; + } + + return qemuMonitorJSONNBDServerStop(mon); +} diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index d54d16d..10145e5 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -638,6 +638,7 @@ int qemuMonitorNBDServerStart(qemuMonitorPtr mon, int qemuMonitorNBDServerAdd(qemuMonitorPtr mon, const char *deviceID, bool writable); +int qemuMonitorNBDServerStop(qemuMonitorPtr); /** * When running two dd process and using <> redirection, we need a * shell that will not truncate files. These two strings serve that diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index f14b597..856dd6a 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -4440,3 +4440,24 @@ qemuMonitorJSONNBDServerAdd(qemuMonitorPtr mon, virJSONValueFree(reply); return ret; } + +int +qemuMonitorJSONNBDServerStop(qemuMonitorPtr mon) +{ + int ret = -1; + virJSONValuePtr cmd; + virJSONValuePtr reply = NULL; + + if (!(cmd = qemuMonitorJSONMakeCommand("nbd-server-stop", + NULL))) + return ret; + + ret = qemuMonitorJSONCommand(mon, cmd, &reply); + + if (ret == 0) + ret = qemuMonitorJSONCheckError(cmd, reply); + + virJSONValueFree(cmd); + virJSONValueFree(reply); + return ret; +} diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index c58aa7d..fd04a92 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -333,4 +333,5 @@ int qemuMonitorJSONNBDServerStart(qemuMonitorPtr mon, int qemuMonitorJSONNBDServerAdd(qemuMonitorPtr mon, const char *deviceID, bool writable); +int qemuMonitorJSONNBDServerStop(qemuMonitorPtr mon); #endif /* QEMU_MONITOR_JSON_H */ -- 1.8.0.2

We need to start NBD server and feed it with all non-<shared/>, RW and source-full disks. Moreover, with new virPortAllocator we must ensure the borrowed port for NBD server will be returned if either migration completes or qemu process is teared down. --- src/qemu/qemu_migration.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++- src/qemu/qemu_process.c | 5 +++ 2 files changed, 88 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index c71021d..16840b2 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -35,6 +35,7 @@ #include "qemu_domain.h" #include "qemu_process.h" #include "qemu_capabilities.h" +#include "qemu_command.h" #include "qemu_cgroup.h" #include "domain_audit.h" @@ -1091,6 +1092,80 @@ error: return NULL; } +/** + * qemuMigrationStartNBDServer: + * @driver: qemu driver + * @vm: domain + * + * Starts NBD server. This is a newer method to copy + * storage during migration than using 'blk' and 'inc' + * arguments in 'migrate' monitor command. + * Error is reported here. + * + * Returns 0 on success, -1 otherwise. + */ +static int +qemuMigrationStartNBDServer(virQEMUDriverPtr driver, + virDomainObjPtr vm) +{ + int ret = -1; + qemuDomainObjPrivatePtr priv = vm->privateData; + unsigned short port = 0; + const char *listen = "0.0.0.0"; + char *diskAlias = NULL; + size_t i; + + for (i = 0; i < vm->def->ndisks; i++) { + virDomainDiskDefPtr disk = vm->def->disks[i]; + + /* skip shared, RO and source-less disks */ + if (disk->shared || disk->readonly || !disk->src) + continue; + + VIR_FREE(diskAlias); + if (virAsprintf(&diskAlias, "%s%s", + QEMU_DRIVE_HOST_PREFIX, disk->info.alias) < 0) { + virReportOOMError(); + goto cleanup; + } + + if (qemuDomainObjEnterMonitorAsync(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_IN) < 0) + goto cleanup; + + if (!port) { + if (virPortAllocatorAcquire(driver->remotePorts, &port) < 0) + goto cleanup; + + if (qemuMonitorNBDServerStart(priv->mon, listen, port) < 0) { + qemuDomainObjExitMonitorWithDriver(driver, vm); + virReportError(VIR_ERR_OPERATION_FAILED, "%s", + _("Unable to start nbd server")); + goto cleanup; + } + } + + if (qemuMonitorNBDServerAdd(priv->mon, diskAlias, true) < 0) { + qemuDomainObjExitMonitorWithDriver(driver, vm); + + virReportError(VIR_ERR_OPERATION_FAILED, + _("Unable to add '%s' to NDB server"), diskAlias); + goto cleanup; + } + qemuDomainObjExitMonitorWithDriver(driver, vm); + } + + priv->nbdPort = port; + ret = 0; + +cleanup: + VIR_FREE(diskAlias); + if (ret < 0) + virPortAllocatorRelease(driver->remotePorts, port); + return ret; +} + + /* Validate whether the domain is safe to migrate. If vm is NULL, * then this is being run in the v2 Prepare stage on the destination * (where we only have the target xml); if vm is provided, then this @@ -1809,8 +1884,11 @@ done: if (flags & VIR_MIGRATE_TUNNELLED) VIR_DEBUG("NBD in tunnelled migration is currently not supported"); else { + if (qemuMigrationStartNBDServer(driver, vm) < 0) { + /* error already reported */ + goto endjob; + } cookieFlags |= QEMU_MIGRATION_COOKIE_NBD; - priv->nbdPort = 0; } } @@ -1857,6 +1935,10 @@ cleanup: virObjectUnlock(vm); else qemuDomainRemoveInactive(driver, vm); + if (ret < 0 && priv->nbdPort) { + virPortAllocatorRelease(driver->remotePorts, priv->nbdPort); + priv->nbdPort = 0; + } } if (event) qemuDomainEventQueue(driver, event); diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 30f923a..2c57e0e 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -4240,6 +4240,11 @@ void qemuProcessStop(virQEMUDriverPtr driver, } } + if (priv->nbdPort) { + virPortAllocatorRelease(driver->remotePorts, priv->nbdPort); + priv->nbdPort = 0; + } + if (priv->agent) { qemuAgentClose(priv->agent); priv->agent = NULL; -- 1.8.0.2

On 02/06/2013 08:52 AM, Michal Privoznik wrote:
We need to start NBD server and feed it with all non-<shared/>, RW and source-full disks. Moreover, with new virPortAllocator we must ensure the borrowed port for NBD server will be returned if either migration completes or qemu process is teared down. --- src/qemu/qemu_migration.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++- src/qemu/qemu_process.c | 5 +++ 2 files changed, 88 insertions(+), 1 deletion(-)
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index c71021d..16840b2 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -35,6 +35,7 @@ #include "qemu_domain.h" #include "qemu_process.h" #include "qemu_capabilities.h" +#include "qemu_command.h" #include "qemu_cgroup.h"
#include "domain_audit.h" @@ -1091,6 +1092,80 @@ error: return NULL; }
+/** + * qemuMigrationStartNBDServer: + * @driver: qemu driver + * @vm: domain + * + * Starts NBD server. This is a newer method to copy + * storage during migration than using 'blk' and 'inc' + * arguments in 'migrate' monitor command. + * Error is reported here. + * + * Returns 0 on success, -1 otherwise. + */ +static int +qemuMigrationStartNBDServer(virQEMUDriverPtr driver, + virDomainObjPtr vm) +{ + int ret = -1; + qemuDomainObjPrivatePtr priv = vm->privateData; + unsigned short port = 0; + const char *listen = "0.0.0.0"; + char *diskAlias = NULL; + size_t i; + + for (i = 0; i < vm->def->ndisks; i++) { + virDomainDiskDefPtr disk = vm->def->disks[i]; + + /* skip shared, RO and source-less disks */ + if (disk->shared || disk->readonly || !disk->src) + continue; + + VIR_FREE(diskAlias); + if (virAsprintf(&diskAlias, "%s%s", + QEMU_DRIVE_HOST_PREFIX, disk->info.alias) < 0) { + virReportOOMError(); + goto cleanup; + } + + if (qemuDomainObjEnterMonitorAsync(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_IN) < 0) + goto cleanup; + + if (!port) { + if (virPortAllocatorAcquire(driver->remotePorts, &port) < 0) + goto cleanup; + + if (qemuMonitorNBDServerStart(priv->mon, listen, port) < 0) { + qemuDomainObjExitMonitorWithDriver(driver, vm); + virReportError(VIR_ERR_OPERATION_FAILED, "%s", + _("Unable to start nbd server")); + goto cleanup; + } + } + + if (qemuMonitorNBDServerAdd(priv->mon, diskAlias, true) < 0) { + qemuDomainObjExitMonitorWithDriver(driver, vm); + + virReportError(VIR_ERR_OPERATION_FAILED, + _("Unable to add '%s' to NDB server"), diskAlias); + goto cleanup; + } + qemuDomainObjExitMonitorWithDriver(driver, vm); + } + + priv->nbdPort = port; + ret = 0; + +cleanup: + VIR_FREE(diskAlias); + if (ret < 0) + virPortAllocatorRelease(driver->remotePorts, port);
I believe you can get here with driver->remotePorts not yet allocated which will cause issues in the Release routine. Perhaps follow what you did below and use "&& port" to indicate that you have a port value.
+ return ret; +} + + /* Validate whether the domain is safe to migrate. If vm is NULL, * then this is being run in the v2 Prepare stage on the destination * (where we only have the target xml); if vm is provided, then this @@ -1809,8 +1884,11 @@ done: if (flags & VIR_MIGRATE_TUNNELLED) VIR_DEBUG("NBD in tunnelled migration is currently not supported"); else { + if (qemuMigrationStartNBDServer(driver, vm) < 0) { + /* error already reported */ + goto endjob; + } cookieFlags |= QEMU_MIGRATION_COOKIE_NBD; - priv->nbdPort = 0; } }
@@ -1857,6 +1935,10 @@ cleanup: virObjectUnlock(vm); else qemuDomainRemoveInactive(driver, vm); + if (ret < 0 && priv->nbdPort) { + virPortAllocatorRelease(driver->remotePorts, priv->nbdPort); + priv->nbdPort = 0; + } } if (event) qemuDomainEventQueue(driver, event); diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 30f923a..2c57e0e 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -4240,6 +4240,11 @@ void qemuProcessStop(virQEMUDriverPtr driver, } }
+ if (priv->nbdPort) { + virPortAllocatorRelease(driver->remotePorts, priv->nbdPort); + priv->nbdPort = 0; + } + if (priv->agent) { qemuAgentClose(priv->agent); priv->agent = NULL;

This function does the source part of NBD magic. It invokes drive-mirror on each non shared and RW disk with a source and wait till the mirroring process completes. When it does we can proceed with migration. Currently, an active waiting is done: every 500ms libvirt asks qemu if block-job is finished or not. However, once the job finishes, qemu doesn't report its progress so we can only assume if the job finished successfully or not. The better solution would be to listen to the event which is sent as soon as the job finishes. The event does contain the result of job. --- src/qemu/qemu_migration.c | 190 +++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 189 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 16840b2..defad6b 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1165,6 +1165,188 @@ cleanup: return ret; } +/** + * qemuMigrationDiskMirror: + * @driver: qemu driver + * @vm: domain + * @mig: migration cookie + * @host: where are we migrating to + * @speed: how much should the copying be limited + * @migrate_flags: migrate monitor command flags + * + * Run drive-mirror to feed NBD server running on dst and + * wait till the process completes. On success, update + * @migrate_flags so we don't tell 'migrate' command to + * do the very same operation. + * + * Returns 0 on success (@migrate_flags updated), + * -1 otherwise. + */ +static int +qemuMigrationDriveMirror(virQEMUDriverPtr driver, + virDomainObjPtr vm, + qemuMigrationCookiePtr mig, + const char *host, + unsigned long speed, + unsigned int *migrate_flags) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + int ret = -1; + int mon_ret; + int port; + ssize_t i; + char *diskAlias = NULL; + char *nbd_dest = NULL; + unsigned int mirror_flags = VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT; + virErrorPtr err = NULL; + + if (!(*migrate_flags & (QEMU_MONITOR_MIGRATE_NON_SHARED_DISK | + QEMU_MONITOR_MIGRATE_NON_SHARED_INC))) + return 0; + + if (!mig->nbd) { + /* Destination doesn't support NBD server. + * Fall back to previous implementation. + * XXX Or should we report an error here? */ + VIR_DEBUG("Destination doesn't support NBD server " + "Falling back to previous implementation."); + ret = 0; + goto cleanup; + } + + /* steal NBD port and thus prevent it's propagation back to destination */ + port = mig->nbd->port; + mig->nbd->port = 0; + + if (*migrate_flags & QEMU_MONITOR_MIGRATE_NON_SHARED_INC) + mirror_flags |= VIR_DOMAIN_BLOCK_REBASE_SHALLOW; + + for (i = 0; i < vm->def->ndisks; i++) { + virDomainDiskDefPtr disk = vm->def->disks[i]; + virDomainBlockJobInfo info; + + /* skip shared, RO and source-less disks */ + if (disk->shared || disk->readonly || !disk->src) + continue; + + VIR_FREE(diskAlias); + VIR_FREE(nbd_dest); + if ((virAsprintf(&diskAlias, "%s%s", + QEMU_DRIVE_HOST_PREFIX, disk->info.alias) < 0) || + (virAsprintf(&nbd_dest, "nbd:%s:%u:exportname=%s", + host, port, diskAlias) < 0)) { + virReportOOMError(); + goto error; + } + + if (qemuDomainObjEnterMonitorAsync(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) + goto error; + mon_ret = qemuMonitorDriveMirror(priv->mon, diskAlias, nbd_dest, + NULL, speed, mirror_flags); + qemuDomainObjExitMonitorWithDriver(driver, vm); + + if (mon_ret < 0) + goto error; + + /* wait for completion */ + while (true) { + /* Poll every 500ms for progress & to allow cancellation */ + struct timespec ts = { .tv_sec = 0, .tv_nsec = 500 * 1000 * 1000ull }; + + memset(&info, 0, sizeof(info)); + + if (qemuDomainObjEnterMonitorAsync(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) + goto error; + if (priv->job.asyncAbort) { + /* explicitly do this *after* we entered the monitor, + * as this is a critical section so we are guaranteed + * priv->job.asyncAbort will not change */ + qemuDomainObjExitMonitorWithDriver(driver, vm); + virReportError(VIR_ERR_OPERATION_ABORTED, _("%s: %s"), + qemuDomainAsyncJobTypeToString(priv->job.asyncJob), + _("canceled by client")); + goto error; + } + mon_ret = qemuMonitorBlockJob(priv->mon, diskAlias, NULL, 0, + &info, BLOCK_JOB_INFO, true); + qemuDomainObjExitMonitorWithDriver(driver, vm); + + if (mon_ret < 0) { + /* If there's not any job running on a block device, + * qemu won't report anything, so it is not fatal + * if we fail to query status as job may have finished + * while we were sleeping. */ + virReportError(VIR_ERR_OPERATION_FAILED, + _("Unable to query drive-mirror job status. " + "Stop polling on '%s' cur:%llu end:%llu"), + diskAlias, info.cur, info.end); + goto error; + } + + if (info.cur == info.end) { + VIR_DEBUG("Drive mirroring of '%s' completed", diskAlias); + break; + } + + /* XXX Frankly speaking, we should listen to the events, + * instead of doing this. But this works for now and we + * are doing something similar in migration itself anyway */ + + virObjectUnlock(vm); + qemuDriverUnlock(driver); + + nanosleep(&ts, NULL); + + qemuDriverLock(driver); + virObjectLock(vm); + } + } + + /* Okay, copied. Modify migrate_flags */ + *migrate_flags &= ~(QEMU_MONITOR_MIGRATE_NON_SHARED_DISK | + QEMU_MONITOR_MIGRATE_NON_SHARED_INC); + ret = 0; + +cleanup: + VIR_FREE(diskAlias); + VIR_FREE(nbd_dest); + return ret; + +error: + /* don't overwrite any errors */ + err = virSaveLastError(); + /* cancel any outstanding jobs */ + for (; i >= 0; i--) { + virDomainDiskDefPtr disk = vm->def->disks[i]; + + /* skip shared, RO disks */ + if (disk->shared || disk->readonly || !disk->src) + continue; + + VIR_FREE(diskAlias); + if (virAsprintf(&diskAlias, "%s%s", + QEMU_DRIVE_HOST_PREFIX, disk->info.alias) < 0) { + virReportOOMError(); + goto error; + } + if (qemuDomainObjEnterMonitorAsync(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_OUT) == 0) { + if (qemuMonitorBlockJob(priv->mon, diskAlias, NULL, 0, + NULL, BLOCK_JOB_ABORT, true) < 0) { + VIR_WARN("Unable to cancel block-job on '%s'", diskAlias); + } + qemuDomainObjExitMonitorWithDriver(driver, vm); + } else { + VIR_WARN("Unable to enter monitor. No block job cancelled"); + } + } + if (err) + virSetError(err); + virFreeError(err); + goto cleanup; +} /* Validate whether the domain is safe to migrate. If vm is NULL, * then this is being run in the v2 Prepare stage on the destination @@ -2429,6 +2611,13 @@ qemuMigrationRun(virQEMUDriverPtr driver, if (qemuDomainMigrateGraphicsRelocate(driver, vm, mig) < 0) VIR_WARN("unable to provide data for graphics client relocation"); + /* this will update migrate_flags on success */ + if (qemuMigrationDriveMirror(driver, vm, mig, spec->dest.host.name, + migrate_speed, &migrate_flags) < 0) { + /* error reported by helper func */ + goto cleanup; + } + /* Before EnterMonitor, since qemuMigrationSetOffline already does that */ if (!(flags & VIR_MIGRATE_LIVE) && virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) { @@ -2456,7 +2645,6 @@ qemuMigrationRun(virQEMUDriverPtr driver, goto cleanup; } - /* connect to the destination qemu if needed */ if (spec->destType == MIGRATION_DEST_CONNECT_HOST && qemuMigrationConnect(driver, vm, spec) < 0) { -- 1.8.0.2

On 02/06/2013 08:52 AM, Michal Privoznik wrote:
This function does the source part of NBD magic. It invokes drive-mirror on each non shared and RW disk with a source and wait till the mirroring process completes. When it does we can proceed with migration.
Currently, an active waiting is done: every 500ms libvirt asks qemu if block-job is finished or not. However, once the job finishes, qemu doesn't report its progress so we can only assume if the job finished successfully or not. The better solution would be to listen to the event which is sent as soon as the job finishes. The event does contain the result of job. --- src/qemu/qemu_migration.c | 190 +++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 189 insertions(+), 1 deletion(-)
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 16840b2..defad6b 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1165,6 +1165,188 @@ cleanup: return ret; }
+/** + * qemuMigrationDiskMirror: + * @driver: qemu driver + * @vm: domain + * @mig: migration cookie + * @host: where are we migrating to + * @speed: how much should the copying be limited + * @migrate_flags: migrate monitor command flags + * + * Run drive-mirror to feed NBD server running on dst and + * wait till the process completes. On success, update + * @migrate_flags so we don't tell 'migrate' command to + * do the very same operation. + * + * Returns 0 on success (@migrate_flags updated), + * -1 otherwise. + */ +static int +qemuMigrationDriveMirror(virQEMUDriverPtr driver, + virDomainObjPtr vm, + qemuMigrationCookiePtr mig, + const char *host, + unsigned long speed, + unsigned int *migrate_flags) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + int ret = -1; + int mon_ret; + int port; + ssize_t i;
Isn't this really just a loop counter?
+ char *diskAlias = NULL; + char *nbd_dest = NULL; + unsigned int mirror_flags = VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT; + virErrorPtr err = NULL; + + if (!(*migrate_flags & (QEMU_MONITOR_MIGRATE_NON_SHARED_DISK | + QEMU_MONITOR_MIGRATE_NON_SHARED_INC))) + return 0; + + if (!mig->nbd) { + /* Destination doesn't support NBD server. + * Fall back to previous implementation. + * XXX Or should we report an error here? */ + VIR_DEBUG("Destination doesn't support NBD server " + "Falling back to previous implementation."); + ret = 0; + goto cleanup; + } + + /* steal NBD port and thus prevent it's propagation back to destination */ + port = mig->nbd->port; + mig->nbd->port = 0; + + if (*migrate_flags & QEMU_MONITOR_MIGRATE_NON_SHARED_INC) + mirror_flags |= VIR_DOMAIN_BLOCK_REBASE_SHALLOW; + + for (i = 0; i < vm->def->ndisks; i++) { + virDomainDiskDefPtr disk = vm->def->disks[i]; + virDomainBlockJobInfo info; + + /* skip shared, RO and source-less disks */ + if (disk->shared || disk->readonly || !disk->src) + continue; + + VIR_FREE(diskAlias); + VIR_FREE(nbd_dest); + if ((virAsprintf(&diskAlias, "%s%s", + QEMU_DRIVE_HOST_PREFIX, disk->info.alias) < 0) || + (virAsprintf(&nbd_dest, "nbd:%s:%u:exportname=%s", + host, port, diskAlias) < 0)) { + virReportOOMError(); + goto error; + } + + if (qemuDomainObjEnterMonitorAsync(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) + goto error; + mon_ret = qemuMonitorDriveMirror(priv->mon, diskAlias, nbd_dest, + NULL, speed, mirror_flags); + qemuDomainObjExitMonitorWithDriver(driver, vm); + + if (mon_ret < 0) + goto error; + + /* wait for completion */ + while (true) { + /* Poll every 500ms for progress & to allow cancellation */ + struct timespec ts = { .tv_sec = 0, .tv_nsec = 500 * 1000 * 1000ull }; + + memset(&info, 0, sizeof(info)); + + if (qemuDomainObjEnterMonitorAsync(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) + goto error; + if (priv->job.asyncAbort) { + /* explicitly do this *after* we entered the monitor, + * as this is a critical section so we are guaranteed + * priv->job.asyncAbort will not change */ + qemuDomainObjExitMonitorWithDriver(driver, vm); + virReportError(VIR_ERR_OPERATION_ABORTED, _("%s: %s"), + qemuDomainAsyncJobTypeToString(priv->job.asyncJob), + _("canceled by client")); + goto error; + } + mon_ret = qemuMonitorBlockJob(priv->mon, diskAlias, NULL, 0, + &info, BLOCK_JOB_INFO, true); + qemuDomainObjExitMonitorWithDriver(driver, vm); + + if (mon_ret < 0) { + /* If there's not any job running on a block device, + * qemu won't report anything, so it is not fatal + * if we fail to query status as job may have finished + * while we were sleeping. */ + virReportError(VIR_ERR_OPERATION_FAILED, + _("Unable to query drive-mirror job status. " + "Stop polling on '%s' cur:%llu end:%llu"), + diskAlias, info.cur, info.end); + goto error; + } + + if (info.cur == info.end) { + VIR_DEBUG("Drive mirroring of '%s' completed", diskAlias); + break; + } + + /* XXX Frankly speaking, we should listen to the events, + * instead of doing this. But this works for now and we + * are doing something similar in migration itself anyway */ + + virObjectUnlock(vm); + qemuDriverUnlock(driver); + + nanosleep(&ts, NULL); + + qemuDriverLock(driver); + virObjectLock(vm); + } + } + + /* Okay, copied. Modify migrate_flags */ + *migrate_flags &= ~(QEMU_MONITOR_MIGRATE_NON_SHARED_DISK | + QEMU_MONITOR_MIGRATE_NON_SHARED_INC); + ret = 0; + +cleanup: + VIR_FREE(diskAlias); + VIR_FREE(nbd_dest); + return ret; + +error: + /* don't overwrite any errors */ + err = virSaveLastError(); + /* cancel any outstanding jobs */ + for (; i >= 0; i--) {
You could enter this loop without actually having started a job for the current "i" value. Also, it seems to me from reading the code that the processing is start a job, wait for completion... start a job, wait for completion. Thus is going backwards here necessary? IOW - wouldn't you only need to cancel the "current" job? There's a number of reasons to get to error a couple which could be because of some other failure, such as priv->job.asyncAbort and "(mon_ret < 0)" when there's job a job running.
+ virDomainDiskDefPtr disk = vm->def->disks[i]; + + /* skip shared, RO disks */ + if (disk->shared || disk->readonly || !disk->src) + continue; + + VIR_FREE(diskAlias); + if (virAsprintf(&diskAlias, "%s%s", + QEMU_DRIVE_HOST_PREFIX, disk->info.alias) < 0) { + virReportOOMError(); + goto error;
This could be one nasty loop
+ } + if (qemuDomainObjEnterMonitorAsync(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_OUT) == 0) { + if (qemuMonitorBlockJob(priv->mon, diskAlias, NULL, 0, + NULL, BLOCK_JOB_ABORT, true) < 0) { + VIR_WARN("Unable to cancel block-job on '%s'", diskAlias); + } + qemuDomainObjExitMonitorWithDriver(driver, vm); + } else { + VIR_WARN("Unable to enter monitor. No block job cancelled"); + } + } + if (err) + virSetError(err); + virFreeError(err); + goto cleanup; +}
/* Validate whether the domain is safe to migrate. If vm is NULL, * then this is being run in the v2 Prepare stage on the destination @@ -2429,6 +2611,13 @@ qemuMigrationRun(virQEMUDriverPtr driver, if (qemuDomainMigrateGraphicsRelocate(driver, vm, mig) < 0) VIR_WARN("unable to provide data for graphics client relocation");
+ /* this will update migrate_flags on success */ + if (qemuMigrationDriveMirror(driver, vm, mig, spec->dest.host.name, + migrate_speed, &migrate_flags) < 0) { + /* error reported by helper func */ + goto cleanup; + } + /* Before EnterMonitor, since qemuMigrationSetOffline already does that */ if (!(flags & VIR_MIGRATE_LIVE) && virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) { @@ -2456,7 +2645,6 @@ qemuMigrationRun(virQEMUDriverPtr driver, goto cleanup; }
- /* connect to the destination qemu if needed */ if (spec->destType == MIGRATION_DEST_CONNECT_HOST && qemuMigrationConnect(driver, vm, spec) < 0) {

This is just digging out important implementation from qemu driver's qemuDomainGetDiskBlockInfo() API as this functionality is going to be required in the next patch. --- src/qemu/qemu_domain.c | 127 +++++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 4 ++ src/qemu/qemu_driver.c | 124 +++-------------------------------------------- 3 files changed, 138 insertions(+), 117 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 5bf0ab0..d7d7163 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -40,6 +40,9 @@ #include <sys/time.h> #include <fcntl.h> +#include <sys/types.h> +#include <sys/stat.h> +#include <unistd.h> #include <libxml/xpathInternals.h> @@ -2073,3 +2076,127 @@ cleanup: virObjectUnref(cfg); return ret; } + +int +qemuDomainGetDiskBlockInfo(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDiskDefPtr disk, + virDomainBlockInfoPtr info) +{ + + int ret = -1; + virStorageFileMetadata *meta = NULL; + virQEMUDriverConfigPtr cfg = NULL; + int format; + struct stat sb; + int fd = -1; + off_t end; + const char *path; + + if (!disk->src) { + virReportError(VIR_ERR_INVALID_ARG, + _("disk %s does not currently have a source assigned"), + disk->dst); + goto cleanup; + } + path = disk->src; + cfg = virQEMUDriverGetConfig(driver); + + /* The path is correct, now try to open it and get its size. */ + if ((fd = open(path, O_RDONLY)) < 0) { + virReportSystemError(errno, _("failed to open path '%s'"), path); + goto cleanup; + } + + /* Probe for magic formats */ + if (disk->format) { + format = disk->format; + } else { + if (cfg->allowDiskFormatProbing) { + if ((format = virStorageFileProbeFormat(path, + cfg->user, + cfg->group)) < 0) + goto cleanup; + } else { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("no disk format for %s and probing is disabled"), + path); + goto cleanup; + } + } + + if (!(meta = virStorageFileGetMetadataFromFD(path, fd, format))) + goto cleanup; + + /* Get info for normal formats */ + if (fstat(fd, &sb) < 0) { + virReportSystemError(errno, _("cannot stat file '%s'"), path); + goto cleanup; + } + + if (S_ISREG(sb.st_mode)) { +#ifndef WIN32 + info->physical = (unsigned long long)sb.st_blocks * + (unsigned long long)DEV_BSIZE; +#else + info->physical = sb.st_size; +#endif + /* Regular files may be sparse, so logical size (capacity) is not same + * as actual physical above + */ + info->capacity = sb.st_size; + } else { + /* NB. Because we configure with AC_SYS_LARGEFILE, off_t should + * be 64 bits on all platforms. + */ + end = lseek(fd, 0, SEEK_END); + if (end == (off_t)-1) { + virReportSystemError(errno, _("failed to seek to end of %s"), path); + goto cleanup; + } + info->physical = end; + info->capacity = end; + } + + /* If the file we probed has a capacity set, then override + * what we calculated from file/block extents */ + if (meta->capacity) + info->capacity = meta->capacity; + + /* Set default value .. */ + info->allocation = info->physical; + + /* ..but if guest is running & not using raw + disk format and on a block device, then query + highest allocated extent from QEMU */ + if (disk->type == VIR_DOMAIN_DISK_TYPE_BLOCK && + format != VIR_STORAGE_FILE_RAW && + S_ISBLK(sb.st_mode) && + virDomainObjIsActive(vm)) { + qemuDomainObjPrivatePtr priv = vm->privateData; + + if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_QUERY) < 0) + goto cleanup; + + if (virDomainObjIsActive(vm)) { + qemuDomainObjEnterMonitor(driver, vm); + ret = qemuMonitorGetBlockExtent(priv->mon, + disk->info.alias, + &info->allocation); + qemuDomainObjExitMonitor(driver, vm); + } else { + ret = 0; + } + + if (qemuDomainObjEndJob(driver, vm) == 0) + vm = NULL; + } else { + ret = 0; + } + +cleanup: + VIR_FORCE_CLOSE(fd); + virStorageFileFreeMetadata(meta); + virObjectUnref(cfg); + return ret; +} diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index fb58877..e8555d3 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -359,5 +359,9 @@ void qemuDomainCleanupRemove(virDomainObjPtr vm, qemuDomainCleanupCallback cb); void qemuDomainCleanupRun(virQEMUDriverPtr driver, virDomainObjPtr vm); +int qemuDomainGetDiskBlockInfo(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDiskDefPtr disk, + virDomainBlockInfoPtr info); #endif /* __QEMU_DOMAIN_H__ */ diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 979a027..c3d0dc3 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -9212,29 +9212,23 @@ cleanup: } -static int qemuDomainGetBlockInfo(virDomainPtr dom, - const char *path, - virDomainBlockInfoPtr info, - unsigned int flags) { +static int +qemuDomainGetBlockInfo(virDomainPtr dom, + const char *path, + virDomainBlockInfoPtr info, + unsigned int flags) +{ virQEMUDriverPtr driver = dom->conn->privateData; virDomainObjPtr vm; int ret = -1; - int fd = -1; - off_t end; - virStorageFileMetadata *meta = NULL; virDomainDiskDefPtr disk = NULL; - struct stat sb; int i; - int format; - virQEMUDriverConfigPtr cfg = NULL; virCheckFlags(0, -1); if (!(vm = qemuDomObjFromDomain(dom))) goto cleanup; - cfg = virQEMUDriverGetConfig(driver); - if (!path || path[0] == '\0') { virReportError(VIR_ERR_INVALID_ARG, "%s", _("NULL or empty path")); @@ -9248,116 +9242,12 @@ static int qemuDomainGetBlockInfo(virDomainPtr dom, goto cleanup; } disk = vm->def->disks[i]; - if (!disk->src) { - virReportError(VIR_ERR_INVALID_ARG, - _("disk %s does not currently have a source assigned"), - path); - goto cleanup; - } - path = disk->src; - - /* The path is correct, now try to open it and get its size. */ - fd = open(path, O_RDONLY); - if (fd == -1) { - virReportSystemError(errno, - _("failed to open path '%s'"), path); - goto cleanup; - } - - /* Probe for magic formats */ - if (disk->format) { - format = disk->format; - } else { - if (cfg->allowDiskFormatProbing) { - if ((format = virStorageFileProbeFormat(disk->src, - cfg->user, - cfg->group)) < 0) - goto cleanup; - } else { - virReportError(VIR_ERR_INTERNAL_ERROR, - _("no disk format for %s and probing is disabled"), - disk->src); - goto cleanup; - } - } - - if (!(meta = virStorageFileGetMetadataFromFD(path, fd, format))) - goto cleanup; - - /* Get info for normal formats */ - if (fstat(fd, &sb) < 0) { - virReportSystemError(errno, - _("cannot stat file '%s'"), path); - goto cleanup; - } - if (S_ISREG(sb.st_mode)) { -#ifndef WIN32 - info->physical = (unsigned long long)sb.st_blocks * - (unsigned long long)DEV_BSIZE; -#else - info->physical = sb.st_size; -#endif - /* Regular files may be sparse, so logical size (capacity) is not same - * as actual physical above - */ - info->capacity = sb.st_size; - } else { - /* NB. Because we configure with AC_SYS_LARGEFILE, off_t should - * be 64 bits on all platforms. - */ - end = lseek(fd, 0, SEEK_END); - if (end == (off_t)-1) { - virReportSystemError(errno, - _("failed to seek to end of %s"), path); - goto cleanup; - } - info->physical = end; - info->capacity = end; - } - - /* If the file we probed has a capacity set, then override - * what we calculated from file/block extents */ - if (meta->capacity) - info->capacity = meta->capacity; - - /* Set default value .. */ - info->allocation = info->physical; - - /* ..but if guest is running & not using raw - disk format and on a block device, then query - highest allocated extent from QEMU */ - if (disk->type == VIR_DOMAIN_DISK_TYPE_BLOCK && - format != VIR_STORAGE_FILE_RAW && - S_ISBLK(sb.st_mode) && - virDomainObjIsActive(vm)) { - qemuDomainObjPrivatePtr priv = vm->privateData; - - if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_QUERY) < 0) - goto cleanup; - - if (virDomainObjIsActive(vm)) { - qemuDomainObjEnterMonitor(driver, vm); - ret = qemuMonitorGetBlockExtent(priv->mon, - disk->info.alias, - &info->allocation); - qemuDomainObjExitMonitor(driver, vm); - } else { - ret = 0; - } - - if (qemuDomainObjEndJob(driver, vm) == 0) - vm = NULL; - } else { - ret = 0; - } + ret = qemuDomainGetDiskBlockInfo(driver, vm, disk, info); cleanup: - virStorageFileFreeMetadata(meta); - VIR_FORCE_CLOSE(fd); if (vm) virObjectUnlock(vm); - virObjectUnref(cfg); return ret; } -- 1.8.0.2

On 02/06/2013 08:52 AM, Michal Privoznik wrote:
This is just digging out important implementation from qemu driver's qemuDomainGetDiskBlockInfo() API as this functionality is going to be required in the next patch. --- src/qemu/qemu_domain.c | 127 +++++++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_domain.h | 4 ++ src/qemu/qemu_driver.c | 124 +++-------------------------------------------- 3 files changed, 138 insertions(+), 117 deletions(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index 5bf0ab0..d7d7163 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -40,6 +40,9 @@
#include <sys/time.h> #include <fcntl.h> +#include <sys/types.h> +#include <sys/stat.h> +#include <unistd.h>
#include <libxml/xpathInternals.h>
@@ -2073,3 +2076,127 @@ cleanup: virObjectUnref(cfg); return ret; } + +int +qemuDomainGetDiskBlockInfo(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDiskDefPtr disk, + virDomainBlockInfoPtr info) +{ + + int ret = -1; + virStorageFileMetadata *meta = NULL; + virQEMUDriverConfigPtr cfg = NULL; + int format; + struct stat sb; + int fd = -1; + off_t end; + const char *path; + + if (!disk->src) { + virReportError(VIR_ERR_INVALID_ARG, + _("disk %s does not currently have a source assigned"), + disk->dst); + goto cleanup; + } + path = disk->src; + cfg = virQEMUDriverGetConfig(driver); + + /* The path is correct, now try to open it and get its size. */ + if ((fd = open(path, O_RDONLY)) < 0) { + virReportSystemError(errno, _("failed to open path '%s'"), path); + goto cleanup; + } + + /* Probe for magic formats */ + if (disk->format) { + format = disk->format; + } else { + if (cfg->allowDiskFormatProbing) { + if ((format = virStorageFileProbeFormat(path, + cfg->user, + cfg->group)) < 0) + goto cleanup; + } else { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("no disk format for %s and probing is disabled"), + path); + goto cleanup; + } + } + + if (!(meta = virStorageFileGetMetadataFromFD(path, fd, format))) + goto cleanup; + + /* Get info for normal formats */ + if (fstat(fd, &sb) < 0) { + virReportSystemError(errno, _("cannot stat file '%s'"), path); + goto cleanup; + } + + if (S_ISREG(sb.st_mode)) { +#ifndef WIN32 + info->physical = (unsigned long long)sb.st_blocks * + (unsigned long long)DEV_BSIZE; +#else + info->physical = sb.st_size; +#endif + /* Regular files may be sparse, so logical size (capacity) is not same + * as actual physical above + */ + info->capacity = sb.st_size; + } else { + /* NB. Because we configure with AC_SYS_LARGEFILE, off_t should + * be 64 bits on all platforms. + */ + end = lseek(fd, 0, SEEK_END); + if (end == (off_t)-1) { + virReportSystemError(errno, _("failed to seek to end of %s"), path); + goto cleanup; + } + info->physical = end; + info->capacity = end; + } + + /* If the file we probed has a capacity set, then override + * what we calculated from file/block extents */ + if (meta->capacity) + info->capacity = meta->capacity; + + /* Set default value .. */ + info->allocation = info->physical; + + /* ..but if guest is running & not using raw + disk format and on a block device, then query + highest allocated extent from QEMU */ + if (disk->type == VIR_DOMAIN_DISK_TYPE_BLOCK && + format != VIR_STORAGE_FILE_RAW && + S_ISBLK(sb.st_mode) && + virDomainObjIsActive(vm)) { + qemuDomainObjPrivatePtr priv = vm->privateData; + + if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_QUERY) < 0) + goto cleanup; + + if (virDomainObjIsActive(vm)) { + qemuDomainObjEnterMonitor(driver, vm); + ret = qemuMonitorGetBlockExtent(priv->mon, + disk->info.alias, + &info->allocation); + qemuDomainObjExitMonitor(driver, vm); + } else { + ret = 0; + } + + if (qemuDomainObjEndJob(driver, vm) == 0) + vm = NULL;
I don't believe this will have the same effect as the prior code. The 'vm' is not passed by reference, thus setting it to NULL here just does so for this routine. When you return to the caller vm is what it was at call time. When I saw this here without seeing you had lifted code below I was wondering why this was done since it's essentially a NOP.
+ } else { + ret = 0; + } + +cleanup: + VIR_FORCE_CLOSE(fd); + virStorageFileFreeMetadata(meta); + virObjectUnref(cfg); + return ret; +} diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index fb58877..e8555d3 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -359,5 +359,9 @@ void qemuDomainCleanupRemove(virDomainObjPtr vm, qemuDomainCleanupCallback cb); void qemuDomainCleanupRun(virQEMUDriverPtr driver, virDomainObjPtr vm); +int qemuDomainGetDiskBlockInfo(virQEMUDriverPtr driver, + virDomainObjPtr vm, + virDomainDiskDefPtr disk, + virDomainBlockInfoPtr info);
#endif /* __QEMU_DOMAIN_H__ */ diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 979a027..c3d0dc3 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -9212,29 +9212,23 @@ cleanup: }
-static int qemuDomainGetBlockInfo(virDomainPtr dom, - const char *path, - virDomainBlockInfoPtr info, - unsigned int flags) { +static int +qemuDomainGetBlockInfo(virDomainPtr dom, + const char *path, + virDomainBlockInfoPtr info, + unsigned int flags) +{ virQEMUDriverPtr driver = dom->conn->privateData; virDomainObjPtr vm; int ret = -1; - int fd = -1; - off_t end; - virStorageFileMetadata *meta = NULL; virDomainDiskDefPtr disk = NULL; - struct stat sb; int i; - int format; - virQEMUDriverConfigPtr cfg = NULL;
virCheckFlags(0, -1);
if (!(vm = qemuDomObjFromDomain(dom))) goto cleanup;
- cfg = virQEMUDriverGetConfig(driver); - if (!path || path[0] == '\0') { virReportError(VIR_ERR_INVALID_ARG, "%s", _("NULL or empty path")); @@ -9248,116 +9242,12 @@ static int qemuDomainGetBlockInfo(virDomainPtr dom, goto cleanup; } disk = vm->def->disks[i]; - if (!disk->src) { - virReportError(VIR_ERR_INVALID_ARG, - _("disk %s does not currently have a source assigned"), - path); - goto cleanup; - } - path = disk->src; - - /* The path is correct, now try to open it and get its size. */ - fd = open(path, O_RDONLY); - if (fd == -1) { - virReportSystemError(errno, - _("failed to open path '%s'"), path); - goto cleanup; - } - - /* Probe for magic formats */ - if (disk->format) { - format = disk->format; - } else { - if (cfg->allowDiskFormatProbing) { - if ((format = virStorageFileProbeFormat(disk->src, - cfg->user, - cfg->group)) < 0) - goto cleanup; - } else { - virReportError(VIR_ERR_INTERNAL_ERROR, - _("no disk format for %s and probing is disabled"), - disk->src); - goto cleanup; - } - } - - if (!(meta = virStorageFileGetMetadataFromFD(path, fd, format))) - goto cleanup; - - /* Get info for normal formats */ - if (fstat(fd, &sb) < 0) { - virReportSystemError(errno, - _("cannot stat file '%s'"), path); - goto cleanup; - }
- if (S_ISREG(sb.st_mode)) { -#ifndef WIN32 - info->physical = (unsigned long long)sb.st_blocks * - (unsigned long long)DEV_BSIZE; -#else - info->physical = sb.st_size; -#endif - /* Regular files may be sparse, so logical size (capacity) is not same - * as actual physical above - */ - info->capacity = sb.st_size; - } else { - /* NB. Because we configure with AC_SYS_LARGEFILE, off_t should - * be 64 bits on all platforms. - */ - end = lseek(fd, 0, SEEK_END); - if (end == (off_t)-1) { - virReportSystemError(errno, - _("failed to seek to end of %s"), path); - goto cleanup; - } - info->physical = end; - info->capacity = end; - } - - /* If the file we probed has a capacity set, then override - * what we calculated from file/block extents */ - if (meta->capacity) - info->capacity = meta->capacity; - - /* Set default value .. */ - info->allocation = info->physical; - - /* ..but if guest is running & not using raw - disk format and on a block device, then query - highest allocated extent from QEMU */ - if (disk->type == VIR_DOMAIN_DISK_TYPE_BLOCK && - format != VIR_STORAGE_FILE_RAW && - S_ISBLK(sb.st_mode) && - virDomainObjIsActive(vm)) { - qemuDomainObjPrivatePtr priv = vm->privateData; - - if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_QUERY) < 0) - goto cleanup; - - if (virDomainObjIsActive(vm)) { - qemuDomainObjEnterMonitor(driver, vm); - ret = qemuMonitorGetBlockExtent(priv->mon, - disk->info.alias, - &info->allocation); - qemuDomainObjExitMonitor(driver, vm); - } else { - ret = 0; - } - - if (qemuDomainObjEndJob(driver, vm) == 0) - vm = NULL; - } else { - ret = 0; - } + ret = qemuDomainGetDiskBlockInfo(driver, vm, disk, info);
cleanup: - virStorageFileFreeMetadata(meta); - VIR_FORCE_CLOSE(fd); if (vm) virObjectUnlock(vm);
See note above in the qemuDomainGetDiskBlockInfo() routine
- virObjectUnref(cfg); return ret; }

With new NBD storage migration approach there are several requirements that need to be meet for successful use of the feature. One of them is - the file representing a disk, needs to have at least same size as on the source. Hence, we must transfer a list of pairs [disk target, size] and check on destination that this requirement is met and/or take actions to meet it. --- src/qemu/qemu_migration.c | 302 +++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 288 insertions(+), 14 deletions(-) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index defad6b..d10c5b5 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -29,6 +29,9 @@ #endif #include <fcntl.h> #include <poll.h> +#include <unistd.h> +#include <sys/types.h> +#include <sys/stat.h> #include "qemu_migration.h" #include "qemu_monitor.h" @@ -134,6 +137,15 @@ struct _qemuMigrationCookieNBD { int port; /* on which port does NBD server listen for incoming data. Zero value has special meaning - it is there just to let destination know we (the source) do support NBD. */ + + /* The list of pairs [disk, size] (in Bytes). + * This is needed because the same disk size is one of + * prerequisites for NBD storage migration. */ + size_t ndisks; + struct { + char *target; + size_t bytes; + } *disk; }; typedef struct _qemuMigrationCookie qemuMigrationCookie; @@ -196,6 +208,22 @@ qemuMigrationCookieNetworkFree(qemuMigrationCookieNetworkPtr network) } +static void +qemuMigrationCookieNBDFree(qemuMigrationCookieNBDPtr nbd) +{ + size_t i; + + if (!nbd) + return; + + for (i = 0; i < nbd->ndisks; i++) + VIR_FREE(nbd->disk[i].target); + + VIR_FREE(nbd->disk); + VIR_FREE(nbd); +} + + static void qemuMigrationCookieFree(qemuMigrationCookiePtr mig) { if (!mig) @@ -207,12 +235,14 @@ static void qemuMigrationCookieFree(qemuMigrationCookiePtr mig) if (mig->flags & QEMU_MIGRATION_COOKIE_NETWORK) qemuMigrationCookieNetworkFree(mig->network); + if (mig->flags & QEMU_MIGRATION_COOKIE_NBD) + qemuMigrationCookieNBDFree(mig->nbd); + VIR_FREE(mig->localHostname); VIR_FREE(mig->remoteHostname); VIR_FREE(mig->name); VIR_FREE(mig->lockState); VIR_FREE(mig->lockDriver); - VIR_FREE(mig->nbd); VIR_FREE(mig); } @@ -525,22 +555,71 @@ qemuMigrationCookieAddNetwork(qemuMigrationCookiePtr mig, static int qemuMigrationCookieAddNBD(qemuMigrationCookiePtr mig, - virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virQEMUDriverPtr driver, virDomainObjPtr vm) { qemuDomainObjPrivatePtr priv = vm->privateData; + int ret = -1; + size_t i; /* It is not a bug if there already is a NBD data */ if (!mig->nbd && VIR_ALLOC(mig->nbd) < 0) { virReportOOMError(); - return -1; + return ret; + } + + /* in Begin phase add info about disks */ + if (priv->job.phase == QEMU_MIGRATION_PHASE_BEGIN3 && + vm->def->ndisks) { + if (VIR_ALLOC_N(mig->nbd->disk, vm->def->ndisks) < 0) { + virReportOOMError(); + goto cleanup; + } + + for (i = 0; i < vm->def->ndisks; i++) { + virDomainDiskDefPtr disk = vm->def->disks[i]; + virDomainBlockInfo info; + int format = disk->format; + + /* Add only non-shared RW disks with source */ + if (!disk->src || disk->shared || disk->readonly) + continue; + + memset(&info, 0, sizeof(info)); + + if (qemuDomainGetDiskBlockInfo(driver, vm, disk, &info) < 0) + goto cleanup; + + if (format != VIR_STORAGE_FILE_RAW && + format != VIR_STORAGE_FILE_QCOW && + format != VIR_STORAGE_FILE_QCOW2) { + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, + _("Cannot pre-create storage of %s format"), + virStorageFileFormatTypeToString(format)); + goto cleanup; + } + + if (!(mig->nbd->disk[mig->nbd->ndisks].target = strdup(disk->dst))) { + virReportOOMError(); + goto cleanup; + } + + mig->nbd->disk[mig->nbd->ndisks].bytes = info.capacity; + mig->nbd->ndisks++; + } } mig->nbd->port = priv->nbdPort; mig->flags |= QEMU_MIGRATION_COOKIE_NBD; - return 0; + ret = 0; + +cleanup: + if (ret < 0) + qemuMigrationCookieNBDFree(mig->nbd); + + return ret; } @@ -650,7 +729,16 @@ qemuMigrationCookieXMLFormat(virQEMUDriverPtr driver, virBufferAddLit(buf, " <nbd"); if (mig->nbd->port) virBufferAsprintf(buf, " port='%d'", mig->nbd->port); - virBufferAddLit(buf, "/>\n"); + if (mig->nbd->ndisks) { + virBufferAddLit(buf, ">\n"); + for (i = 0; i < mig->nbd->ndisks; i++) + virBufferAsprintf(buf, " <disk target='%s' size='%zu'/>\n", + mig->nbd->disk[i].target, + mig->nbd->disk[i].bytes); + virBufferAddLit(buf, " </nbd>\n"); + } else { + virBufferAddLit(buf, "/>\n"); + } } virBufferAddLit(buf, "</qemu-migration>\n"); @@ -942,20 +1030,56 @@ qemuMigrationCookieXMLParse(qemuMigrationCookiePtr mig, virXPathBoolean("boolean(./nbd)", ctxt)) { char *port; + if (VIR_ALLOC(mig->nbd) < 0) { + virReportOOMError(); + goto error; + } + port = virXPathString("string(./nbd/@port)", ctxt); - if (port) { - if (VIR_ALLOC(mig->nbd) < 0) { + if (port && virStrToLong_i(port, NULL, 10, &mig->nbd->port) < 0) { + VIR_FREE(port); + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Malformed nbd port '%s'"), + port); + goto error; + } + VIR_FREE(port); + + if ((n = virXPathNodeSet("./nbd/disk", ctxt, &nodes)) > 0) { + xmlNodePtr oldNode = ctxt->node; + if (VIR_ALLOC_N(mig->nbd->disk, n) < 0) { virReportOOMError(); goto error; } - if (virStrToLong_i(port, NULL, 10, &mig->nbd->port) < 0) { - VIR_FREE(port); - virReportError(VIR_ERR_INTERNAL_ERROR, - _("Malformed nbd port '%s'"), - port); - goto error; + mig->nbd->ndisks = n; + + for (i = 0; i < n; i++) { + ctxt->node = nodes[i]; + + tmp = virXPathString("string(./@target)", ctxt); + if (!tmp) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Malformed target attribute")); + goto error; + } + mig->nbd->disk[i].target = tmp; + /* deliberately don't free @tmp here, as the + * cookie has the reference now and it is + * responsible for freeing it later */ + + tmp = virXPathString("string(./@size)", ctxt); + if (virStrToLong_ull(tmp, NULL, 10, (unsigned long long *) + &mig->nbd->disk[i].bytes) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Malformed size attribute '%s'"), + tmp); + VIR_FREE(tmp); + goto error; + } + VIR_FREE(tmp); } - VIR_FREE(port); + VIR_FREE(nodes); + ctxt->node = oldNode; } } @@ -1348,6 +1472,152 @@ error: goto cleanup; } +static int +qemuMigrationPreCreateStorageRaw(virDomainDiskDefPtr disk, + size_t bytes) +{ + int ret = -1; + struct stat sb; + int fd = -1; + + if ((fd = virFileOpenAs(disk->src, O_RDWR | O_CREAT, 0660, + -1, -1, VIR_FILE_OPEN_NOFORK)) < 0) { + virReportSystemError(errno, _("Unable to create '%s'"), disk->src); + goto cleanup; + } + + if (fstat(fd, &sb) < 0) { + virReportSystemError(errno, _("Unable to stat '%s'"), disk->src); + goto unlink; + } + + VIR_DEBUG("File '%s' is %zuB big. Required %zuB", disk->src, sb.st_size, bytes); + if (sb.st_size < bytes) { + if (ftruncate(fd, bytes) < 0) { + virReportSystemError(errno, _("Unable to ftruncate '%s'"), disk->src); + goto unlink; + } + + if (fsync(fd) < 0) { + ret = -errno; + virReportSystemError(errno, _("cannot sync data to file '%s'"), + disk->src); + goto unlink; + } + } + + ret = 0; + +cleanup: + VIR_FORCE_CLOSE(fd); + return ret; + +unlink: + if (unlink(disk->src) < 0) + VIR_WARN("Unable to unlink '%s': %d", disk->src, errno); + goto cleanup; +} + +static int +qemuMigrationPreCreateStorageQCOW(virQEMUDriverPtr driver, + virDomainDiskDefPtr disk, + int format, + size_t bytes) +{ + int ret = -1; + const char *imgBinary = qemuFindQemuImgBinary(driver); + virCommandPtr cmd = NULL; + size_t size_arg; + + if (!imgBinary) + return ret; + + /* Size in KB */ + size_arg = VIR_DIV_UP(bytes, 1024); + + cmd = virCommandNewArgList(imgBinary, "create", "-f", + virStorageFileFormatTypeToString(format), + "-o", "preallocation=metadata", + disk->src, NULL); + + virCommandAddArgFormat(cmd, "%zuK", size_arg); + + if (virCommandRun(cmd, NULL) < 0) + goto unlink; + + ret = 0; + +cleanup: + return ret; + +unlink: + if (unlink(disk->src) < 0) + VIR_WARN("Unable to unlink '%s': %d", disk->src, errno); + goto cleanup; +} + +static int +qemuMigrationPreCreateStorage(virQEMUDriverPtr driver, + virDomainObjPtr vm, + qemuMigrationCookiePtr mig) +{ + int ret = -1; + size_t i = 0; + + if (!mig->nbd || !mig->nbd->ndisks) { + /* nothing to do here */ + return 0; + } + + for (i = 0; i < mig->nbd->ndisks; i++) { + virDomainDiskDefPtr disk; + int format; + size_t bytes; + int index; + + index = virDomainDiskIndexByName(vm->def, mig->nbd->disk[i].target, false); + if (index < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("No such disk '%s"), + mig->nbd->disk[i].target); + goto cleanup; + } + + disk = vm->def->disks[i]; + format = disk->format; + bytes = mig->nbd->disk[i].bytes; + + VIR_DEBUG("Checking '%s' of %s format for its size (requested %zuB)", + disk->src, virStorageFileFormatTypeToString(format), bytes); + switch (format) { + case VIR_STORAGE_FILE_RAW: + if (qemuMigrationPreCreateStorageRaw(disk, bytes) < 0) + goto cleanup; + break; + case VIR_STORAGE_FILE_QCOW: + case VIR_STORAGE_FILE_QCOW2: + if (qemuMigrationPreCreateStorageQCOW(driver, disk, + format, bytes) < 0) + goto cleanup; + break; + default: + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, + _("Cannot pre-create disks of %s format"), + virStorageFileFormatTypeToString(format)); + goto cleanup; + } + + } + + ret = 0; +cleanup: + /* free from migration data to prevent + * infinite sending from src to dst and back */ + VIR_FREE(mig->nbd->disk); + mig->nbd->ndisks = 0; + return ret; +} + /* Validate whether the domain is safe to migrate. If vm is NULL, * then this is being run in the v2 Prepare stage on the destination * (where we only have the target xml); if vm is provided, then this @@ -2003,6 +2273,10 @@ qemuMigrationPrepareAny(virQEMUDriverPtr driver, QEMU_MIGRATION_COOKIE_NBD))) goto cleanup; + /* pre-create all storage */ + if (qemuMigrationPreCreateStorage(driver, vm, mig) < 0) + goto cleanup; + if (qemuMigrationJobStart(driver, vm, QEMU_ASYNC_JOB_MIGRATION_IN) < 0) goto cleanup; qemuMigrationJobSetPhase(driver, vm, QEMU_MIGRATION_PHASE_PREPARE); -- 1.8.0.2

On 02/06/2013 08:52 AM, Michal Privoznik wrote:
With new NBD storage migration approach there are several requirements that need to be meet for successful use of the feature. One of them is - the file representing a disk, needs to have at least same size as on the source. Hence, we must transfer a list of pairs [disk target, size] and check on destination that this requirement is met and/or take actions to meet it. --- src/qemu/qemu_migration.c | 302 +++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 288 insertions(+), 14 deletions(-)
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index defad6b..d10c5b5 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -29,6 +29,9 @@ #endif #include <fcntl.h> #include <poll.h> +#include <unistd.h> +#include <sys/types.h> +#include <sys/stat.h>
#include "qemu_migration.h" #include "qemu_monitor.h" @@ -134,6 +137,15 @@ struct _qemuMigrationCookieNBD { int port; /* on which port does NBD server listen for incoming data. Zero value has special meaning - it is there just to let destination know we (the source) do support NBD. */ + + /* The list of pairs [disk, size] (in Bytes). + * This is needed because the same disk size is one of + * prerequisites for NBD storage migration. */ + size_t ndisks; + struct { + char *target; + size_t bytes; + } *disk; };
typedef struct _qemuMigrationCookie qemuMigrationCookie; @@ -196,6 +208,22 @@ qemuMigrationCookieNetworkFree(qemuMigrationCookieNetworkPtr network) }
+static void +qemuMigrationCookieNBDFree(qemuMigrationCookieNBDPtr nbd) +{ + size_t i; + + if (!nbd) + return; + + for (i = 0; i < nbd->ndisks; i++) + VIR_FREE(nbd->disk[i].target); + + VIR_FREE(nbd->disk); + VIR_FREE(nbd); +} + + static void qemuMigrationCookieFree(qemuMigrationCookiePtr mig) { if (!mig) @@ -207,12 +235,14 @@ static void qemuMigrationCookieFree(qemuMigrationCookiePtr mig) if (mig->flags & QEMU_MIGRATION_COOKIE_NETWORK) qemuMigrationCookieNetworkFree(mig->network);
+ if (mig->flags & QEMU_MIGRATION_COOKIE_NBD) + qemuMigrationCookieNBDFree(mig->nbd); + VIR_FREE(mig->localHostname); VIR_FREE(mig->remoteHostname); VIR_FREE(mig->name); VIR_FREE(mig->lockState); VIR_FREE(mig->lockDriver); - VIR_FREE(mig->nbd); VIR_FREE(mig); }
@@ -525,22 +555,71 @@ qemuMigrationCookieAddNetwork(qemuMigrationCookiePtr mig,
static int qemuMigrationCookieAddNBD(qemuMigrationCookiePtr mig, - virQEMUDriverPtr driver ATTRIBUTE_UNUSED, + virQEMUDriverPtr driver, virDomainObjPtr vm) { qemuDomainObjPrivatePtr priv = vm->privateData; + int ret = -1; + size_t i;
/* It is not a bug if there already is a NBD data */ if (!mig->nbd && VIR_ALLOC(mig->nbd) < 0) { virReportOOMError(); - return -1; + return ret; + } + + /* in Begin phase add info about disks */ + if (priv->job.phase == QEMU_MIGRATION_PHASE_BEGIN3 && + vm->def->ndisks) { + if (VIR_ALLOC_N(mig->nbd->disk, vm->def->ndisks) < 0) { + virReportOOMError(); + goto cleanup; + }
Could be a lot of wasted space potentially, right? I've seen other code make use of VIR_REALLOC_N
+ + for (i = 0; i < vm->def->ndisks; i++) { + virDomainDiskDefPtr disk = vm->def->disks[i]; + virDomainBlockInfo info; + int format = disk->format; + + /* Add only non-shared RW disks with source */ + if (!disk->src || disk->shared || disk->readonly) + continue; + + memset(&info, 0, sizeof(info)); + + if (qemuDomainGetDiskBlockInfo(driver, vm, disk, &info) < 0) + goto cleanup; + + if (format != VIR_STORAGE_FILE_RAW && + format != VIR_STORAGE_FILE_QCOW && + format != VIR_STORAGE_FILE_QCOW2) { + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, + _("Cannot pre-create storage of %s format"), + virStorageFileFormatTypeToString(format)); + goto cleanup; + } + + if (!(mig->nbd->disk[mig->nbd->ndisks].target = strdup(disk->dst))) { + virReportOOMError(); + goto cleanup; + } + + mig->nbd->disk[mig->nbd->ndisks].bytes = info.capacity; + mig->nbd->ndisks++; + } }
mig->nbd->port = priv->nbdPort; mig->flags |= QEMU_MIGRATION_COOKIE_NBD;
- return 0; + ret = 0; + +cleanup: + if (ret < 0) + qemuMigrationCookieNBDFree(mig->nbd); + + return ret; }
@@ -650,7 +729,16 @@ qemuMigrationCookieXMLFormat(virQEMUDriverPtr driver, virBufferAddLit(buf, " <nbd"); if (mig->nbd->port) virBufferAsprintf(buf, " port='%d'", mig->nbd->port); - virBufferAddLit(buf, "/>\n"); + if (mig->nbd->ndisks) { + virBufferAddLit(buf, ">\n"); + for (i = 0; i < mig->nbd->ndisks; i++) + virBufferAsprintf(buf, " <disk target='%s' size='%zu'/>\n", + mig->nbd->disk[i].target, + mig->nbd->disk[i].bytes); + virBufferAddLit(buf, " </nbd>\n"); + } else { + virBufferAddLit(buf, "/>\n"); + } }
virBufferAddLit(buf, "</qemu-migration>\n"); @@ -942,20 +1030,56 @@ qemuMigrationCookieXMLParse(qemuMigrationCookiePtr mig, virXPathBoolean("boolean(./nbd)", ctxt)) { char *port;
+ if (VIR_ALLOC(mig->nbd) < 0) { + virReportOOMError(); + goto error; + } + port = virXPathString("string(./nbd/@port)", ctxt); - if (port) { - if (VIR_ALLOC(mig->nbd) < 0) { + if (port && virStrToLong_i(port, NULL, 10, &mig->nbd->port) < 0) { + VIR_FREE(port); + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Malformed nbd port '%s'"), + port); + goto error; + } + VIR_FREE(port); + + if ((n = virXPathNodeSet("./nbd/disk", ctxt, &nodes)) > 0) { + xmlNodePtr oldNode = ctxt->node; + if (VIR_ALLOC_N(mig->nbd->disk, n) < 0) { virReportOOMError(); goto error; } - if (virStrToLong_i(port, NULL, 10, &mig->nbd->port) < 0) { - VIR_FREE(port); - virReportError(VIR_ERR_INTERNAL_ERROR, - _("Malformed nbd port '%s'"), - port); - goto error; + mig->nbd->ndisks = n; + + for (i = 0; i < n; i++) { + ctxt->node = nodes[i]; + + tmp = virXPathString("string(./@target)", ctxt); + if (!tmp) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Malformed target attribute")); + goto error; + } + mig->nbd->disk[i].target = tmp; + /* deliberately don't free @tmp here, as the + * cookie has the reference now and it is + * responsible for freeing it later */ + + tmp = virXPathString("string(./@size)", ctxt); + if (virStrToLong_ull(tmp, NULL, 10, (unsigned long long *) + &mig->nbd->disk[i].bytes) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Malformed size attribute '%s'"), + tmp); + VIR_FREE(tmp); + goto error; + } + VIR_FREE(tmp); } - VIR_FREE(port); + VIR_FREE(nodes); + ctxt->node = oldNode; } }
@@ -1348,6 +1472,152 @@ error: goto cleanup; }
+static int +qemuMigrationPreCreateStorageRaw(virDomainDiskDefPtr disk, + size_t bytes) +{ + int ret = -1; + struct stat sb; + int fd = -1; + + if ((fd = virFileOpenAs(disk->src, O_RDWR | O_CREAT, 0660, + -1, -1, VIR_FILE_OPEN_NOFORK)) < 0) { + virReportSystemError(errno, _("Unable to create '%s'"), disk->src); + goto cleanup; + } + + if (fstat(fd, &sb) < 0) { + virReportSystemError(errno, _("Unable to stat '%s'"), disk->src); + goto unlink; + } + + VIR_DEBUG("File '%s' is %zuB big. Required %zuB", disk->src, sb.st_size, bytes); + if (sb.st_size < bytes) { + if (ftruncate(fd, bytes) < 0) { + virReportSystemError(errno, _("Unable to ftruncate '%s'"), disk->src); + goto unlink; + } + + if (fsync(fd) < 0) { + ret = -errno; + virReportSystemError(errno, _("cannot sync data to file '%s'"), + disk->src); + goto unlink; + } + } + + ret = 0; + +cleanup: + VIR_FORCE_CLOSE(fd); + return ret; + +unlink: + if (unlink(disk->src) < 0) + VIR_WARN("Unable to unlink '%s': %d", disk->src, errno); + goto cleanup; +} + +static int +qemuMigrationPreCreateStorageQCOW(virQEMUDriverPtr driver, + virDomainDiskDefPtr disk, + int format, + size_t bytes) +{ + int ret = -1; + const char *imgBinary = qemuFindQemuImgBinary(driver); + virCommandPtr cmd = NULL; + size_t size_arg; + + if (!imgBinary) + return ret; + + /* Size in KB */ + size_arg = VIR_DIV_UP(bytes, 1024); + + cmd = virCommandNewArgList(imgBinary, "create", "-f", + virStorageFileFormatTypeToString(format), + "-o", "preallocation=metadata", + disk->src, NULL); + + virCommandAddArgFormat(cmd, "%zuK", size_arg); + + if (virCommandRun(cmd, NULL) < 0) + goto unlink; + + ret = 0; + +cleanup:
Doesn't there need to be a virCommandFree(cmd) here?
+ return ret; + +unlink: + if (unlink(disk->src) < 0) + VIR_WARN("Unable to unlink '%s': %d", disk->src, errno); + goto cleanup; +} + +static int +qemuMigrationPreCreateStorage(virQEMUDriverPtr driver, + virDomainObjPtr vm, + qemuMigrationCookiePtr mig) +{ + int ret = -1; + size_t i = 0; + + if (!mig->nbd || !mig->nbd->ndisks) { + /* nothing to do here */ + return 0; + } + + for (i = 0; i < mig->nbd->ndisks; i++) { + virDomainDiskDefPtr disk; + int format; + size_t bytes; + int index; + + index = virDomainDiskIndexByName(vm->def, mig->nbd->disk[i].target, false); + if (index < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("No such disk '%s"), + mig->nbd->disk[i].target); + goto cleanup; + } + + disk = vm->def->disks[i];
Huh? Is 'i' supposed to be 'index'?
+ format = disk->format; + bytes = mig->nbd->disk[i].bytes; + + VIR_DEBUG("Checking '%s' of %s format for its size (requested %zuB)", + disk->src, virStorageFileFormatTypeToString(format), bytes); + switch (format) { + case VIR_STORAGE_FILE_RAW: + if (qemuMigrationPreCreateStorageRaw(disk, bytes) < 0) + goto cleanup; + break; + case VIR_STORAGE_FILE_QCOW: + case VIR_STORAGE_FILE_QCOW2: + if (qemuMigrationPreCreateStorageQCOW(driver, disk, + format, bytes) < 0) + goto cleanup; + break; + default: + virReportError(VIR_ERR_OPERATION_UNSUPPORTED, + _("Cannot pre-create disks of %s format"), + virStorageFileFormatTypeToString(format)); + goto cleanup; + } + + } + + ret = 0; +cleanup: + /* free from migration data to prevent + * infinite sending from src to dst and back */ + VIR_FREE(mig->nbd->disk); + mig->nbd->ndisks = 0; + return ret; +} + /* Validate whether the domain is safe to migrate. If vm is NULL, * then this is being run in the v2 Prepare stage on the destination * (where we only have the target xml); if vm is provided, then this @@ -2003,6 +2273,10 @@ qemuMigrationPrepareAny(virQEMUDriverPtr driver, QEMU_MIGRATION_COOKIE_NBD))) goto cleanup;
+ /* pre-create all storage */ + if (qemuMigrationPreCreateStorage(driver, vm, mig) < 0) + goto cleanup; + if (qemuMigrationJobStart(driver, vm, QEMU_ASYNC_JOB_MIGRATION_IN) < 0) goto cleanup; qemuMigrationJobSetPhase(driver, vm, QEMU_MIGRATION_PHASE_PREPARE);

At the end of migration, it is important to stop NBD server and thus release all allocated resources. --- src/qemu/qemu_migration.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index d10c5b5..7d08022 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1618,6 +1618,29 @@ cleanup: return ret; } +static void +qemuMigrationStopNBDServer(virQEMUDriverPtr driver, + virDomainObjPtr vm, + qemuMigrationCookiePtr mig) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + + if (!mig->nbd) + return; + + if (qemuDomainObjEnterMonitorAsync(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_IN) < 0) + return; + + if (qemuMonitorNBDServerStop(priv->mon) < 0) + VIR_WARN("Unable to stop NBD server"); + + qemuDomainObjExitMonitorWithDriver(driver, vm); + + virPortAllocatorRelease(driver->remotePorts, priv->nbdPort); + priv->nbdPort = 0; +} + /* Validate whether the domain is safe to migrate. If vm is NULL, * then this is being run in the v2 Prepare stage on the destination * (where we only have the target xml); if vm is provided, then this @@ -4024,6 +4047,8 @@ qemuMigrationFinish(virQEMUDriverPtr driver, VIR_WARN("unable to provide network data for relocation"); } + qemuMigrationStopNBDServer(driver, vm, mig); + if (flags & VIR_MIGRATE_PERSIST_DEST) { virDomainDefPtr vmdef; if (vm->persistent) -- 1.8.0.2

If a migration fails, we need to stop all block jobs running so qemu doesn't try to send data to destination over and over again. --- src/qemu/qemu_migration.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 46 insertions(+) diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index 7d08022..f3edd51 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1641,6 +1641,46 @@ qemuMigrationStopNBDServer(virQEMUDriverPtr driver, priv->nbdPort = 0; } +static void +qemuMigrationCancelDriveMirror(qemuMigrationCookiePtr mig, + virQEMUDriverPtr driver, + virDomainObjPtr vm) +{ + qemuDomainObjPrivatePtr priv = vm->privateData; + size_t i; + char *diskAlias = NULL; + + VIR_DEBUG("mig=%p nbdPort=%d", mig->nbd, priv->nbdPort); + + for (i = 0; i < vm->def->ndisks; i++) { + virDomainDiskDefPtr disk = vm->def->disks[i]; + + /* skip shared, RO and source-less disks */ + if (disk->shared || disk->readonly || !disk->src) + continue; + + VIR_FREE(diskAlias); + if (virAsprintf(&diskAlias, "%s%s", + QEMU_DRIVE_HOST_PREFIX, disk->info.alias) < 0) { + virReportOOMError(); + goto cleanup; + } + + if (qemuDomainObjEnterMonitorAsync(driver, vm, + QEMU_ASYNC_JOB_MIGRATION_OUT) < 0) + goto cleanup; + + if (qemuMonitorBlockJob(priv->mon, diskAlias, NULL, 0, + NULL, BLOCK_JOB_ABORT, true) < 0) + VIR_WARN("Unable to stop block job on %s", diskAlias); + qemuDomainObjExitMonitorWithDriver(driver, vm); + } + +cleanup: + VIR_FREE(diskAlias); + return; +} + /* Validate whether the domain is safe to migrate. If vm is NULL, * then this is being run in the v2 Prepare stage on the destination * (where we only have the target xml); if vm is provided, then this @@ -3042,6 +3082,9 @@ cleanup: if (ret < 0 && !orig_err) orig_err = virSaveLastError(); + /* cancel any outstanding NBD jobs */ + qemuMigrationCancelDriveMirror(mig, driver, vm); + if (spec->fwdType != MIGRATION_FWD_DIRECT) { if (iothread && qemuMigrationStopTunnel(iothread, ret < 0) < 0) ret = -1; @@ -4240,6 +4283,9 @@ int qemuMigrationConfirm(virQEMUDriverPtr driver, VIR_DOMAIN_EVENT_STOPPED_MIGRATED); } else { + /* cancel any outstanding NBD jobs */ + qemuMigrationCancelDriveMirror(mig, driver, vm); + /* run 'cont' on the destination, which allows migration on qemu * >= 0.10.6 to work properly. This isn't strictly necessary on * older qemu's, but it also doesn't hurt anything there -- 1.8.0.2

If migratioin fails because of whatever reason and we've pre-created any disks, we should remove them instead of letting them lying around. Moreover, we need to save the disks sources into domain status file in case libvirtd gets restarted. --- src/qemu/qemu_domain.c | 30 ++++++++++++++++++++++++++++-- src/qemu/qemu_domain.h | 2 ++ src/qemu/qemu_migration.c | 13 +++++++++++++ src/qemu/qemu_process.c | 8 ++++++++ 4 files changed, 51 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index d7d7163..6090623 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -232,10 +232,15 @@ error: static void qemuDomainObjPrivateFree(void *data) { + size_t i; qemuDomainObjPrivatePtr priv = data; virObjectUnref(priv->caps); + for (i = 0; i < priv->nnbdDisk; i++) + VIR_FREE(priv->nbdDisk[i]); + VIR_FREE(priv->nbdDisk); + qemuDomainPCIAddressSetFree(priv->pciaddrs); virDomainChrSourceDefFree(priv->monConfig); qemuDomainObjFreeJob(priv); @@ -264,6 +269,7 @@ static int qemuDomainObjPrivateXMLFormat(virBufferPtr buf, void *data) qemuDomainObjPrivatePtr priv = data; const char *monitorpath; enum qemuDomainJob job; + size_t i; /* priv->monitor_chr is set only for qemu */ if (priv->monConfig) { @@ -286,7 +292,6 @@ static int qemuDomainObjPrivateXMLFormat(virBufferPtr buf, void *data) if (priv->nvcpupids) { - int i; virBufferAddLit(buf, " <vcpus>\n"); for (i = 0 ; i < priv->nvcpupids ; i++) { virBufferAsprintf(buf, " <vcpu pid='%d'/>\n", priv->vcpupids[i]); @@ -295,7 +300,6 @@ static int qemuDomainObjPrivateXMLFormat(virBufferPtr buf, void *data) } if (priv->caps) { - int i; virBufferAddLit(buf, " <qemuCaps>\n"); for (i = 0 ; i < QEMU_CAPS_LAST ; i++) { if (qemuCapsGet(priv->caps, i)) { @@ -329,6 +333,13 @@ static int qemuDomainObjPrivateXMLFormat(virBufferPtr buf, void *data) if (priv->fakeReboot) virBufferAsprintf(buf, " <fakereboot/>\n"); + if (priv->nnbdDisk) { + virBufferAddLit(buf, " <nbdDisk>\n"); + for (i = 0; i < priv->nnbdDisk; i++) + virBufferAsprintf(buf, " <disk src='%s'/>\n", priv->nbdDisk[i]); + virBufferAddLit(buf, " </nbdDisk>\n"); + } + return 0; } @@ -474,6 +485,21 @@ static int qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, void *data) priv->fakeReboot = virXPathBoolean("boolean(./fakereboot)", ctxt) == 1; + n = virXPathNodeSet("./nbdDisk/disk", ctxt, &nodes); + if (n < 0) + goto error; + if (n) { + if (VIR_REALLOC_N(priv->nbdDisk, n) < 0) { + virReportOOMError(); + goto error; + } + priv->nnbdDisk = n; + + for (i = 0; i < n; i++) + if (!(priv->nbdDisk[i] = virXMLPropString(nodes[i], "src"))) + goto error; + } + return 0; error: diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index e8555d3..3548a51 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -155,6 +155,8 @@ struct _qemuDomainObjPrivate { unsigned long migMaxBandwidth; char *origname; int nbdPort; /* Port used for migration with NBD */ + size_t nnbdDisk; + char **nbdDisk; /* src of disks we want to transfer via NBD */ virChrdevsPtr devs; diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index f3edd51..1379b23 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1561,6 +1561,7 @@ qemuMigrationPreCreateStorage(virQEMUDriverPtr driver, virDomainObjPtr vm, qemuMigrationCookiePtr mig) { + qemuDomainObjPrivatePtr priv = vm->privateData; int ret = -1; size_t i = 0; @@ -1607,6 +1608,11 @@ qemuMigrationPreCreateStorage(virQEMUDriverPtr driver, goto cleanup; } + if ((VIR_REALLOC_N(priv->nbdDisk, priv->nnbdDisk + 1) < 0) || + !(priv->nbdDisk[priv->nnbdDisk++] = strdup(disk->src))) { + virReportOOMError(); + goto cleanup; + } } ret = 0; @@ -4042,6 +4048,7 @@ qemuMigrationFinish(virQEMUDriverPtr driver, int cookie_flags = 0; qemuDomainObjPrivatePtr priv = vm->privateData; virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + size_t i; VIR_DEBUG("driver=%p, dconn=%p, vm=%p, cookiein=%s, cookieinlen=%d, " "cookieout=%p, cookieoutlen=%p, flags=%lx, retcode=%d", @@ -4192,6 +4199,12 @@ qemuMigrationFinish(virQEMUDriverPtr driver, VIR_DOMAIN_EVENT_SUSPENDED, VIR_DOMAIN_EVENT_SUSPENDED_PAUSED); } + + /* Free disks transferred via NBD so they don't get deleted */ + for (i = 0; i < priv->nnbdDisk; i++) + VIR_FREE(priv->nbdDisk[i]); + VIR_FREE(priv->nbdDisk); + priv->nnbdDisk = 0; } if (virDomainObjIsActive(vm) && diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 2c57e0e..73420e3 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -4245,6 +4245,14 @@ void qemuProcessStop(virQEMUDriverPtr driver, priv->nbdPort = 0; } + for (i = 0; i < priv->nnbdDisk; i++) { + VIR_DEBUG("Unlinking %s due to unsuccessful NBD transfer", + priv->nbdDisk[i]); + if (unlink(priv->nbdDisk[i]) < 0) + virReportSystemError(errno, _("Unable to unlink %s"), + priv->nbdDisk[i]); + } + if (priv->agent) { qemuAgentClose(priv->agent); priv->agent = NULL; -- 1.8.0.2

On 02/06/2013 08:52 AM, Michal Privoznik wrote:
If migratioin fails because of whatever reason and we've pre-created any disks, we should remove them instead of letting them lying around. Moreover, we need to save the disks sources into domain status file in case libvirtd gets restarted. --- src/qemu/qemu_domain.c | 30 ++++++++++++++++++++++++++++-- src/qemu/qemu_domain.h | 2 ++ src/qemu/qemu_migration.c | 13 +++++++++++++ src/qemu/qemu_process.c | 8 ++++++++ 4 files changed, 51 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c index d7d7163..6090623 100644 --- a/src/qemu/qemu_domain.c +++ b/src/qemu/qemu_domain.c @@ -232,10 +232,15 @@ error:
static void qemuDomainObjPrivateFree(void *data) { + size_t i; qemuDomainObjPrivatePtr priv = data;
virObjectUnref(priv->caps);
+ for (i = 0; i < priv->nnbdDisk; i++) + VIR_FREE(priv->nbdDisk[i]); + VIR_FREE(priv->nbdDisk); + qemuDomainPCIAddressSetFree(priv->pciaddrs); virDomainChrSourceDefFree(priv->monConfig); qemuDomainObjFreeJob(priv); @@ -264,6 +269,7 @@ static int qemuDomainObjPrivateXMLFormat(virBufferPtr buf, void *data) qemuDomainObjPrivatePtr priv = data; const char *monitorpath; enum qemuDomainJob job; + size_t i;
/* priv->monitor_chr is set only for qemu */ if (priv->monConfig) { @@ -286,7 +292,6 @@ static int qemuDomainObjPrivateXMLFormat(virBufferPtr buf, void *data)
if (priv->nvcpupids) { - int i; virBufferAddLit(buf, " <vcpus>\n"); for (i = 0 ; i < priv->nvcpupids ; i++) { virBufferAsprintf(buf, " <vcpu pid='%d'/>\n", priv->vcpupids[i]); @@ -295,7 +300,6 @@ static int qemuDomainObjPrivateXMLFormat(virBufferPtr buf, void *data) }
if (priv->caps) { - int i; virBufferAddLit(buf, " <qemuCaps>\n"); for (i = 0 ; i < QEMU_CAPS_LAST ; i++) { if (qemuCapsGet(priv->caps, i)) { @@ -329,6 +333,13 @@ static int qemuDomainObjPrivateXMLFormat(virBufferPtr buf, void *data) if (priv->fakeReboot) virBufferAsprintf(buf, " <fakereboot/>\n");
+ if (priv->nnbdDisk) { + virBufferAddLit(buf, " <nbdDisk>\n"); + for (i = 0; i < priv->nnbdDisk; i++) + virBufferAsprintf(buf, " <disk src='%s'/>\n", priv->nbdDisk[i]); + virBufferAddLit(buf, " </nbdDisk>\n"); + } + return 0; }
@@ -474,6 +485,21 @@ static int qemuDomainObjPrivateXMLParse(xmlXPathContextPtr ctxt, void *data)
priv->fakeReboot = virXPathBoolean("boolean(./fakereboot)", ctxt) == 1;
+ n = virXPathNodeSet("./nbdDisk/disk", ctxt, &nodes); + if (n < 0) + goto error; + if (n) { + if (VIR_REALLOC_N(priv->nbdDisk, n) < 0) { + virReportOOMError(); + goto error; + } + priv->nnbdDisk = n; + + for (i = 0; i < n; i++) + if (!(priv->nbdDisk[i] = virXMLPropString(nodes[i], "src"))) + goto error;
If we fail in here, then nbdDisk[] entries from i -> n will be empty. Will this cause issues elsewhere (such as qemuProcessStop() below) because other code assumes each nnbdDisk entry is populated with something? Also I think the for() needs {} since we have multiple lines following.
+ } + return 0;
error: diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h index e8555d3..3548a51 100644 --- a/src/qemu/qemu_domain.h +++ b/src/qemu/qemu_domain.h @@ -155,6 +155,8 @@ struct _qemuDomainObjPrivate { unsigned long migMaxBandwidth; char *origname; int nbdPort; /* Port used for migration with NBD */ + size_t nnbdDisk; + char **nbdDisk; /* src of disks we want to transfer via NBD */
virChrdevsPtr devs;
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c index f3edd51..1379b23 100644 --- a/src/qemu/qemu_migration.c +++ b/src/qemu/qemu_migration.c @@ -1561,6 +1561,7 @@ qemuMigrationPreCreateStorage(virQEMUDriverPtr driver, virDomainObjPtr vm, qemuMigrationCookiePtr mig) { + qemuDomainObjPrivatePtr priv = vm->privateData; int ret = -1; size_t i = 0;
@@ -1607,6 +1608,11 @@ qemuMigrationPreCreateStorage(virQEMUDriverPtr driver, goto cleanup; }
+ if ((VIR_REALLOC_N(priv->nbdDisk, priv->nnbdDisk + 1) < 0) || + !(priv->nbdDisk[priv->nnbdDisk++] = strdup(disk->src))) { + virReportOOMError(); + goto cleanup; + } }
ret = 0; @@ -4042,6 +4048,7 @@ qemuMigrationFinish(virQEMUDriverPtr driver, int cookie_flags = 0; qemuDomainObjPrivatePtr priv = vm->privateData; virQEMUDriverConfigPtr cfg = virQEMUDriverGetConfig(driver); + size_t i;
VIR_DEBUG("driver=%p, dconn=%p, vm=%p, cookiein=%s, cookieinlen=%d, " "cookieout=%p, cookieoutlen=%p, flags=%lx, retcode=%d", @@ -4192,6 +4199,12 @@ qemuMigrationFinish(virQEMUDriverPtr driver, VIR_DOMAIN_EVENT_SUSPENDED, VIR_DOMAIN_EVENT_SUSPENDED_PAUSED); } + + /* Free disks transferred via NBD so they don't get deleted */ + for (i = 0; i < priv->nnbdDisk; i++) + VIR_FREE(priv->nbdDisk[i]); + VIR_FREE(priv->nbdDisk); + priv->nnbdDisk = 0; }
if (virDomainObjIsActive(vm) && diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 2c57e0e..73420e3 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -4245,6 +4245,14 @@ void qemuProcessStop(virQEMUDriverPtr driver, priv->nbdPort = 0; }
+ for (i = 0; i < priv->nnbdDisk; i++) { + VIR_DEBUG("Unlinking %s due to unsuccessful NBD transfer", + priv->nbdDisk[i]); + if (unlink(priv->nbdDisk[i]) < 0) + virReportSystemError(errno, _("Unable to unlink %s"), + priv->nbdDisk[i]); + } + if (priv->agent) { qemuAgentClose(priv->agent); priv->agent = NULL;

On 02/06/2013 08:52 AM, Michal Privoznik wrote:
Michal Privoznik (12): qemu: Introduce NBD_SERVER capability Introduce NBD migration cookie qemu: Introduce nbd-server-start command qemu: Introduce nbd-server-add command qemu: Introduce nbd-server-stop command qemu_migration: Introduce qemuMigrationStartNBDServer() qemu_migration: Introduce qemuMigrationDriveMirror qemu_domain: Introduce qemuDomainGetDiskBlockInfo qemu_migration: Check size prerequisites qemu_migration: Stop NBD server at Finish phase qemu_migration: Cancel running jobs on failed migration qemu_migration: Unlink pre-created storage on error
src/qemu/qemu_capabilities.c | 4 +- src/qemu/qemu_capabilities.h | 1 + src/qemu/qemu_domain.c | 157 ++++++++- src/qemu/qemu_domain.h | 7 + src/qemu/qemu_driver.c | 124 +------ src/qemu/qemu_migration.c | 769 ++++++++++++++++++++++++++++++++++++++++++- src/qemu/qemu_monitor.c | 63 ++++ src/qemu/qemu_monitor.h | 7 + src/qemu/qemu_monitor_json.c | 102 ++++++ src/qemu/qemu_monitor_json.h | 7 + src/qemu/qemu_process.c | 13 + 11 files changed, 1118 insertions(+), 136 deletions(-)
Seeing as this has been sitting around awhile I looked through the changes and made comments on changes 6, 7, 8, 9, and 12. The others I didn't have comments on. Consider me a soft ACK - hopefully one of the original v1/v2 reviewers can also chime in! John
participants (2)
-
John Ferlan
-
Michal Privoznik