[libvirt] [PATCHv5 00/23] live block migration

This series shows how to do live block migration using both streaming mirror (as proposed for upstream qemu 1.1) and snapshot+mirror (as proposed for RHEL 6.3, although not as actively tested, since streaming mirror seems so much nicer). v4 (blockjob): https://www.redhat.com/archives/libvir-list/2012-April/msg00330.html v1 (snapshot): https://www.redhat.com/archives/libvir-list/2012-March/msg00578.html I've hopefully resolved the review comments Jiri gave against v4, and have tested this against Paolo's proposed builds of qemu for RHEL 6.3. Patches 1-14 are ready to apply if qemu commits to the 'drive-mirror' interface for qemu 1.1. Patches 15-17 need a bit more thought to decide if we also want to support partial streaming support (similar to the optional 'base' argument to 'block-stream') in 'drive-mirror' in qemu 1.2; it might be wiser to defer these until we know for sure what happens with qemu. Patches 18-23 are only valid for the RHEL 6.3 build, although patches 19-23 would be useful if qemu 1.2 enhances 'drive-mirror' to support snapshot+mirror. I've tried to give more details in the various commits up to 14, which are the most important to review for purposes of backporting; other changes in later patches include fixing a use-after-free bug in the snapshot+mirror code, and altering patch 18 to match the difference in what RHEL 6.3 will probably be providing. Available here as well: git fetch git://repo.or.cz/libvirt/ericb.git snapshot http://repo.or.cz/w/libvirt/ericb.git/shortlog/refs/heads/snapshot I think 1-3 can be applied without controversy, 4-7 can be applied _if_ we think that the libvirt API is reasonable no matter what qemu throws at us (and I'm pretty positive that it is, as I have already been through several rounds of design changes on the qemu side where the libvirt API still held up without problem), and 8 onwards should be delayed until we have a firm commitment on the actual qemu 1.1 implementation. Eric Blake (23): virsh: minor syntactic cleanups qemu: use consistent error when qemu binary is too old blockjob: add virsh blockpull --wait blockjob: add new API flags blockjob: add 'blockcopy' to virsh blockjob: enhance xml to track mirrors across libvirtd restart blockjob: react to active block copy blockjob: add qemu capabilities related to block jobs blockjob: return appropriate event and info blockjob: support pivot operation on cancel blockjob: implement block copy for qemu blockjob: relax block job behavior when setting speed up front blockjob: allow for existing files blockjob: allow mirroring under SELinux blockjob: add virDomainBlockCopy blockjob: enhance virsh 'blockcopy' blockjob: wire up qemu and RPC for block copy blockjob: accommodate RHEL backport names snapshot: allow for creation of mirrored snapshots snapshot: add new snapshot delete flags snapshot: make it possible to check for mirrored snapshot snapshot: implement new snapshot delete flags in qemu snapshot: enable mirrored snapshots on transient vm docs/apibuild.py | 1 + docs/formatdomain.html.in | 13 + docs/formatsnapshot.html.in | 31 + docs/schemas/domaincommon.rng | 24 +- docs/schemas/domainsnapshot.rng | 8 + include/libvirt/libvirt.h.in | 50 ++- include/libvirt/virterror.h | 1 + src/conf/domain_conf.c | 125 ++++- src/conf/domain_conf.h | 8 + src/driver.h | 5 + src/libvirt.c | 268 ++++++++- src/libvirt_private.syms | 2 + src/libvirt_public.syms | 5 + src/qemu/qemu_capabilities.c | 2 + src/qemu/qemu_capabilities.h | 2 + src/qemu/qemu_command.c | 2 +- src/qemu/qemu_driver.c | 624 +++++++++++++++++++- src/qemu/qemu_hotplug.c | 9 +- src/qemu/qemu_monitor.c | 56 ++- src/qemu/qemu_monitor.h | 12 + src/qemu/qemu_monitor_json.c | 94 +++ src/qemu/qemu_monitor_json.h | 19 +- src/remote/remote_driver.c | 1 + src/remote/remote_protocol.x | 12 +- src/remote_protocol-structs | 9 + src/rpc/gendispatch.pl | 1 + src/util/virterror.c | 6 + .../disk_snapshot_mirror.xml | 13 + .../disk_snapshot_mirror.xml | 49 ++ tests/domainsnapshotxml2xmltest.c | 4 +- .../qemuxml2argvdata/qemuxml2argv-disk-mirror.xml | 42 ++ .../qemuxml2xmlout-disk-mirror.xml | 40 ++ tests/qemuxml2xmltest.c | 42 +- tools/virsh.c | 558 ++++++++++++++---- tools/virsh.pod | 84 +++- 35 files changed, 2012 insertions(+), 210 deletions(-) create mode 100644 tests/domainsnapshotxml2xmlin/disk_snapshot_mirror.xml create mode 100644 tests/domainsnapshotxml2xmlout/disk_snapshot_mirror.xml create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml create mode 100644 tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-mirror.xml -- 1.7.7.6

No semantic change. * tools/virsh.c: Fix some spacing issues, {} usage, long lines, and redundant (). --- v5: new patch, factored out from comments on other patches tools/virsh.c | 197 +++++++++++++++++++++++++++++---------------------------- 1 files changed, 100 insertions(+), 97 deletions(-) diff --git a/tools/virsh.c b/tools/virsh.c index 8ef57e0..5400487 100644 --- a/tools/virsh.c +++ b/tools/virsh.c @@ -402,9 +402,9 @@ static virTypedParameterPtr vshFindTypedParamByName(const char *name, static char *vshGetTypedParamValue(vshControl *ctl, virTypedParameterPtr item) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); -static char *editWriteToTempFile (vshControl *ctl, const char *doc); -static int editFile (vshControl *ctl, const char *filename); -static char *editReadBackFile (vshControl *ctl, const char *filename); +static char *editWriteToTempFile(vshControl *ctl, const char *doc); +static int editFile(vshControl *ctl, const char *filename); +static char *editReadBackFile(vshControl *ctl, const char *filename); /* Typedefs, function prototypes for job progress reporting. * There are used by some long lingering commands like @@ -604,7 +604,7 @@ static int disconnected = 0; /* we may have been disconnected */ */ static void vshCatchDisconnect(int sig, siginfo_t *siginfo, void *context ATTRIBUTE_UNUSED) { - if ((sig == SIGPIPE) || + if (sig == SIGPIPE || (SA_SIGINFO && siginfo->si_signo == SIGPIPE)) disconnected++; } @@ -1044,8 +1044,7 @@ cmdList(vshControl *ctl, const vshCmd *cmd ATTRIBUTE_UNUSED) goto cleanup; } - if ((persistent && !optPersistent) || - (!persistent && !optTransient)) { + if (!(persistent ? optPersistent : optTransient)) { virDomainFree(dom); dom = NULL; continue; @@ -3269,7 +3268,7 @@ cmdSave(vshControl *ctl, const vshCmd *cmd) if (vshCommandOptString(cmd, "file", &to) <= 0) goto cleanup; - if (vshCommandOptBool (cmd, "verbose")) + if (vshCommandOptBool(cmd, "verbose")) verbose = true; if (pipe(p) < 0) @@ -3576,7 +3575,7 @@ cmdManagedSave(vshControl *ctl, const vshCmd *cmd) if (!(dom = vshCommandOptDomain(ctl, cmd, &name))) return false; - if (vshCommandOptBool (cmd, "verbose")) + if (vshCommandOptBool(cmd, "verbose")) verbose = true; if (pipe(p) < 0) @@ -4013,9 +4012,9 @@ doDump(void *opaque) if (!(dom = vshCommandOptDomain(ctl, cmd, &name))) goto out; - if (vshCommandOptBool (cmd, "live")) + if (vshCommandOptBool(cmd, "live")) flags |= VIR_DUMP_LIVE; - if (vshCommandOptBool (cmd, "crash")) + if (vshCommandOptBool(cmd, "crash")) flags |= VIR_DUMP_CRASH; if (vshCommandOptBool(cmd, "bypass-cache")) flags |= VIR_DUMP_BYPASS_CACHE; @@ -4054,7 +4053,7 @@ cmdDump(vshControl *ctl, const vshCmd *cmd) if (vshCommandOptString(cmd, "file", &to) <= 0) return false; - if (vshCommandOptBool (cmd, "verbose")) + if (vshCommandOptBool(cmd, "verbose")) verbose = true; if (pipe(p) < 0) @@ -5394,7 +5393,7 @@ cmdVcpuPin(vshControl *ctl, const vshCmd *cmd) } virSkipSpaces(&cur); - if ((*cur == ',') || (*cur == 0)) { + if (*cur == ',' || *cur == 0) { if (unuse) { VIR_UNUSE_CPU(cpumap, cpu); } else { @@ -5995,9 +5994,9 @@ cmdSetmaxmem(vshControl *ctl, const vshCmd *cmd) */ static const vshCmdInfo info_blkiotune[] = { {"help", N_("Get or set blkio parameters")}, - {"desc", N_("Get or set the current blkio parameters for a guest" \ - " domain.\n" \ - " To get the blkio parameters use following command: \n\n" \ + {"desc", N_("Get or set the current blkio parameters for a guest" + " domain.\n" + " To get the blkio parameters use following command: \n\n" " virsh # blkiotune <domain>")}, {NULL, NULL} }; @@ -6143,9 +6142,9 @@ cmdBlkiotune(vshControl * ctl, const vshCmd * cmd) */ static const vshCmdInfo info_memtune[] = { {"help", N_("Get or set memory parameters")}, - {"desc", N_("Get or set the current memory parameters for a guest" \ - " domain.\n" \ - " To get the memory parameters use following command: \n\n" \ + {"desc", N_("Get or set the current memory parameters for a guest" + " domain.\n" + " To get the memory parameters use following command: \n\n" " virsh # memtune <domain>")}, {NULL, NULL} }; @@ -6335,9 +6334,9 @@ cmdMemtune(vshControl *ctl, const vshCmd *cmd) */ static const vshCmdInfo info_numatune[] = { {"help", N_("Get or set numa parameters")}, - {"desc", N_("Get or set the current numa parameters for a guest" \ - " domain.\n" \ - " To get the numa parameters use following command: \n\n" \ + {"desc", N_("Get or set the current numa parameters for a guest" + " domain.\n" + " To get the numa parameters use following command: \n\n" " virsh # numatune <domain>")}, {NULL, NULL} @@ -7126,28 +7125,28 @@ doMigrate (void *opaque) goto out; } - if (vshCommandOptBool (cmd, "live")) + if (vshCommandOptBool(cmd, "live")) flags |= VIR_MIGRATE_LIVE; - if (vshCommandOptBool (cmd, "p2p")) + if (vshCommandOptBool(cmd, "p2p")) flags |= VIR_MIGRATE_PEER2PEER; - if (vshCommandOptBool (cmd, "tunnelled")) + if (vshCommandOptBool(cmd, "tunnelled")) flags |= VIR_MIGRATE_TUNNELLED; - if (vshCommandOptBool (cmd, "persistent")) + if (vshCommandOptBool(cmd, "persistent")) flags |= VIR_MIGRATE_PERSIST_DEST; - if (vshCommandOptBool (cmd, "undefinesource")) + if (vshCommandOptBool(cmd, "undefinesource")) flags |= VIR_MIGRATE_UNDEFINE_SOURCE; - if (vshCommandOptBool (cmd, "suspend")) + if (vshCommandOptBool(cmd, "suspend")) flags |= VIR_MIGRATE_PAUSED; - if (vshCommandOptBool (cmd, "copy-storage-all")) + if (vshCommandOptBool(cmd, "copy-storage-all")) flags |= VIR_MIGRATE_NON_SHARED_DISK; - if (vshCommandOptBool (cmd, "copy-storage-inc")) + if (vshCommandOptBool(cmd, "copy-storage-inc")) flags |= VIR_MIGRATE_NON_SHARED_INC; - if (vshCommandOptBool (cmd, "change-protection")) + if (vshCommandOptBool(cmd, "change-protection")) flags |= VIR_MIGRATE_CHANGE_PROTECTION; if (vshCommandOptBool(cmd, "unsafe")) @@ -7160,7 +7159,7 @@ doMigrate (void *opaque) } if ((flags & VIR_MIGRATE_PEER2PEER) || - vshCommandOptBool (cmd, "direct")) { + vshCommandOptBool(cmd, "direct")) { /* For peer2peer migration or direct migration we only expect one URI * a libvirt URI, or a hypervisor specific URI. */ @@ -7274,11 +7273,11 @@ repoll: if (pollfd.revents & POLLIN && saferead(pipe_fd, &retchar, sizeof(retchar)) > 0 && retchar == '0') { - if (verbose) { - /* print [100 %] */ - print_job_progress(label, 0, 1); - } - break; + if (verbose) { + /* print [100 %] */ + print_job_progress(label, 0, 1); + } + break; } goto cleanup; } @@ -7288,15 +7287,17 @@ repoll: if (intCaught) { virDomainAbortJob(dom); intCaught = 0; - } else + } else { goto repoll; + } } goto cleanup; } GETTIMEOFDAY(&curr); - if ( timeout && ((int)(curr.tv_sec - start.tv_sec) * 1000 + \ - (int)(curr.tv_usec - start.tv_usec) / 1000) > timeout * 1000 ) { + if (timeout && (((int)(curr.tv_sec - start.tv_sec) * 1000 + + (int)(curr.tv_usec - start.tv_usec) / 1000) > + timeout * 1000)) { /* suspend the domain when migration timeouts. */ vshDebug(ctl, VSH_ERR_DEBUG, "%s timeout", label); if (timeout_func) @@ -7322,7 +7323,7 @@ cleanup: } static bool -cmdMigrate (vshControl *ctl, const vshCmd *cmd) +cmdMigrate(vshControl *ctl, const vshCmd *cmd) { virDomainPtr dom = NULL; int p[2] = {-1, -1}; @@ -7336,14 +7337,15 @@ cmdMigrate (vshControl *ctl, const vshCmd *cmd) if (!(dom = vshCommandOptDomain(ctl, cmd, NULL))) return false; - if (vshCommandOptBool (cmd, "verbose")) + if (vshCommandOptBool(cmd, "verbose")) verbose = true; - if (vshCommandOptBool (cmd, "live")) + if (vshCommandOptBool(cmd, "live")) live_flag = true; if (vshCommandOptInt(cmd, "timeout", &timeout) > 0) { if (! live_flag) { - vshError(ctl, "%s", _("migrate: Unexpected timeout for offline migration")); + vshError(ctl, "%s", + _("migrate: Unexpected timeout for offline migration")); goto cleanup; } @@ -7896,7 +7898,7 @@ cmdBlkdeviotune(vshControl *ctl, const vshCmd *cmd) if (nparams == 0) { - if ((virDomainGetBlockIoTune(dom, NULL, NULL, &nparams, flags)) != 0) { + if (virDomainGetBlockIoTune(dom, NULL, NULL, &nparams, flags) != 0) { vshError(ctl, "%s", _("Unable to get number of block I/O throttle parameters")); goto cleanup; @@ -7909,7 +7911,7 @@ cmdBlkdeviotune(vshControl *ctl, const vshCmd *cmd) params = vshCalloc(ctl, nparams, sizeof(*params)); - if ((virDomainGetBlockIoTune(dom, disk, params, &nparams, flags)) != 0) { + if (virDomainGetBlockIoTune(dom, disk, params, &nparams, flags) != 0) { vshError(ctl, "%s", _("Unable to get block I/O throttle parameters")); goto cleanup; @@ -7928,7 +7930,7 @@ cmdBlkdeviotune(vshControl *ctl, const vshCmd *cmd) params = vshCalloc(ctl, nparams, sizeof(*params)); i = 0; - if ((i < nparams) && (vshCommandOptBool(cmd, "total-bytes-sec"))) { + if (i < nparams && vshCommandOptBool(cmd, "total-bytes-sec")) { temp = ¶ms[i]; temp->type = VIR_TYPED_PARAM_ULLONG; strncpy(temp->field, VIR_DOMAIN_BLOCK_IOTUNE_TOTAL_BYTES_SEC, @@ -7937,7 +7939,7 @@ cmdBlkdeviotune(vshControl *ctl, const vshCmd *cmd) i++; } - if ((i < nparams) && (vshCommandOptBool(cmd, "read-bytes-sec"))) { + if (i < nparams && vshCommandOptBool(cmd, "read-bytes-sec")) { temp = ¶ms[i]; temp->type = VIR_TYPED_PARAM_ULLONG; strncpy(temp->field, VIR_DOMAIN_BLOCK_IOTUNE_READ_BYTES_SEC, @@ -7946,7 +7948,7 @@ cmdBlkdeviotune(vshControl *ctl, const vshCmd *cmd) i++; } - if ((i < nparams) && (vshCommandOptBool(cmd, "write-bytes-sec"))) { + if (i < nparams && vshCommandOptBool(cmd, "write-bytes-sec")) { temp = ¶ms[i]; temp->type = VIR_TYPED_PARAM_ULLONG; strncpy(temp->field, VIR_DOMAIN_BLOCK_IOTUNE_WRITE_BYTES_SEC, @@ -7955,7 +7957,7 @@ cmdBlkdeviotune(vshControl *ctl, const vshCmd *cmd) i++; } - if ((i < nparams) && (vshCommandOptBool(cmd, "total-iops-sec"))) { + if (i < nparams && vshCommandOptBool(cmd, "total-iops-sec")) { temp = ¶ms[i]; temp->type = VIR_TYPED_PARAM_ULLONG; strncpy(temp->field, VIR_DOMAIN_BLOCK_IOTUNE_TOTAL_IOPS_SEC, @@ -7964,7 +7966,7 @@ cmdBlkdeviotune(vshControl *ctl, const vshCmd *cmd) i++; } - if ((i < nparams) && (vshCommandOptBool(cmd, "read-iops-sec"))) { + if (i < nparams && vshCommandOptBool(cmd, "read-iops-sec")) { temp = ¶ms[i]; temp->type = VIR_TYPED_PARAM_ULLONG; strncpy(temp->field, VIR_DOMAIN_BLOCK_IOTUNE_READ_IOPS_SEC, @@ -7973,7 +7975,7 @@ cmdBlkdeviotune(vshControl *ctl, const vshCmd *cmd) i++; } - if ((i < nparams) && (vshCommandOptBool(cmd, "write-iops-sec"))) { + if (i < nparams && vshCommandOptBool(cmd, "write-iops-sec")) { temp = ¶ms[i]; temp->type = VIR_TYPED_PARAM_ULLONG; strncpy(temp->field, VIR_DOMAIN_BLOCK_IOTUNE_WRITE_IOPS_SEC, @@ -7981,7 +7983,7 @@ cmdBlkdeviotune(vshControl *ctl, const vshCmd *cmd) temp->value.ul = write_iops_sec; } - if ((virDomainSetBlockIoTune(dom, disk, params, nparams, flags)) < 0) { + if (virDomainSetBlockIoTune(dom, disk, params, nparams, flags) < 0) { vshError(ctl, "%s", _("Unable to change block I/O throttle")); goto cleanup; @@ -8157,7 +8159,7 @@ cmdNetworkDumpXML(vshControl *ctl, const vshCmd *cmd) if (!(network = vshCommandOptNetwork(ctl, cmd, NULL))) return false; - inactive = vshCommandOptBool (cmd, "inactive"); + inactive = vshCommandOptBool(cmd, "inactive"); if (inactive) flags |= VIR_NETWORK_XML_INACTIVE; @@ -8306,7 +8308,7 @@ cmdInterfaceEdit (vshControl *ctl, const vshCmd *cmd) /* Everything checks out, so redefine the interface. */ virInterfaceFree (iface); - iface = virInterfaceDefineXML (ctl->conn, doc_edited, 0); + iface = virInterfaceDefineXML(ctl->conn, doc_edited, 0); if (!iface) goto cleanup; @@ -9742,7 +9744,7 @@ cmdNWFilterEdit (vshControl *ctl, const vshCmd *cmd) /* Everything checks out, so redefine the interface. */ virNWFilterFree (nwfilter); - nwfilter = virNWFilterDefineXML (ctl->conn, doc_edited); + nwfilter = virNWFilterDefineXML(ctl->conn, doc_edited); if (!nwfilter) goto cleanup; @@ -10196,11 +10198,11 @@ cmdPoolBuild(vshControl *ctl, const vshCmd *cmd) if (!(pool = vshCommandOptPool(ctl, cmd, "pool", &name))) return false; - if (vshCommandOptBool (cmd, "no-overwrite")) { + if (vshCommandOptBool(cmd, "no-overwrite")) { flags |= VIR_STORAGE_POOL_BUILD_NO_OVERWRITE; } - if (vshCommandOptBool (cmd, "overwrite")) { + if (vshCommandOptBool(cmd, "overwrite")) { flags |= VIR_STORAGE_POOL_BUILD_OVERWRITE; } @@ -10451,8 +10453,8 @@ cmdPoolList(vshControl *ctl, const vshCmd *cmd ATTRIBUTE_UNUSED) /* Retrieve a list of active storage pool names */ if (active) { - if ((virConnectListStoragePools(ctl->conn, - poolNames, numActivePools)) < 0) { + if (virConnectListStoragePools(ctl->conn, + poolNames, numActivePools) < 0) { vshError(ctl, "%s", _("Failed to list active pools")); VIR_FREE(poolInfoTexts); VIR_FREE(poolNames); @@ -10462,9 +10464,9 @@ cmdPoolList(vshControl *ctl, const vshCmd *cmd ATTRIBUTE_UNUSED) /* Add the inactive storage pools to the end of the name list */ if (inactive) { - if ((virConnectListDefinedStoragePools(ctl->conn, - &poolNames[numActivePools], - numInactivePools)) < 0) { + if (virConnectListDefinedStoragePools(ctl->conn, + &poolNames[numActivePools], + numInactivePools) < 0) { vshError(ctl, "%s", _("Failed to list inactive pools")); VIR_FREE(poolInfoTexts); VIR_FREE(poolNames); @@ -11139,8 +11141,8 @@ cmdVolCreateAs(vshControl *ctl, const vshCmd *cmd) if (vshVolSize(capacityStr, &capacity) < 0) vshError(ctl, _("Malformed size %s"), capacityStr); - if ((vshCommandOptString(cmd, "allocation", &allocationStr) > 0) && - (vshVolSize(allocationStr, &allocation) < 0)) + if (vshCommandOptString(cmd, "allocation", &allocationStr) > 0 && + vshVolSize(allocationStr, &allocation) < 0) vshError(ctl, _("Malformed size %s"), allocationStr); if (vshCommandOptString(cmd, "format", &format) < 0 || @@ -11464,8 +11466,8 @@ makeCloneXML(const char *origxml, const char *newname) goto cleanup; obj = xmlXPathEval(BAD_CAST "/volume/name", ctxt); - if ((obj == NULL) || (obj->nodesetval == NULL) || - (obj->nodesetval->nodeTab == NULL)) + if (obj == NULL || obj->nodesetval == NULL || + obj->nodesetval->nodeTab == NULL) goto cleanup; xmlNodeSetContent(obj->nodesetval->nodeTab[0], (const xmlChar *)newname); @@ -13101,7 +13103,7 @@ static const vshCmdOptDef opts_node_device_dumpxml[] = { }; static bool -cmdNodeDeviceDumpXML (vshControl *ctl, const vshCmd *cmd) +cmdNodeDeviceDumpXML(vshControl *ctl, const vshCmd *cmd) { const char *name = NULL; virNodeDevicePtr device; @@ -13382,8 +13384,8 @@ cmdVNCDisplay(vshControl *ctl, const vshCmd *cmd) goto cleanup; obj = xmlXPathEval(BAD_CAST "string(/domain/devices/graphics[@type='vnc']/@port)", ctxt); - if ((obj == NULL) || (obj->type != XPATH_STRING) || - (obj->stringval == NULL) || (obj->stringval[0] == 0)) { + if (obj == NULL || obj->type != XPATH_STRING || + obj->stringval == NULL || obj->stringval[0] == 0) { goto cleanup; } if (virStrToLong_i((const char *)obj->stringval, NULL, 10, &port) || port < 0) @@ -13391,8 +13393,8 @@ cmdVNCDisplay(vshControl *ctl, const vshCmd *cmd) xmlXPathFreeObject(obj); obj = xmlXPathEval(BAD_CAST "string(/domain/devices/graphics[@type='vnc']/@listen)", ctxt); - if ((obj == NULL) || (obj->type != XPATH_STRING) || - (obj->stringval == NULL) || (obj->stringval[0] == 0) || + if (obj == NULL || obj->type != XPATH_STRING || + obj->stringval == NULL || obj->stringval[0] == 0 || STREQ((const char*)obj->stringval, "0.0.0.0")) { vshPrint(ctl, ":%d\n", port-5900); } else { @@ -13450,8 +13452,8 @@ cmdTTYConsole(vshControl *ctl, const vshCmd *cmd) goto cleanup; obj = xmlXPathEval(BAD_CAST "string(/domain/devices/console/@tty)", ctxt); - if ((obj == NULL) || (obj->type != XPATH_STRING) || - (obj->stringval == NULL) || (obj->stringval[0] == 0)) { + if (obj == NULL || obj->type != XPATH_STRING || + obj->stringval == NULL || obj->stringval[0] == 0) { goto cleanup; } vshPrint(ctl, "%s\n", (const char *)obj->stringval); @@ -14175,13 +14177,13 @@ cmdDetachInterface(vshControl *ctl, const vshCmd *cmd) snprintf(buf, sizeof(buf), "/domain/devices/interface[@type='%s']", type); obj = xmlXPathEval(BAD_CAST buf, ctxt); - if ((obj == NULL) || (obj->type != XPATH_NODESET) || - (obj->nodesetval == NULL) || (obj->nodesetval->nodeNr == 0)) { + if (obj == NULL || obj->type != XPATH_NODESET || + obj->nodesetval == NULL || obj->nodesetval->nodeNr == 0) { vshError(ctl, _("No found interface whose type is %s"), type); goto cleanup; } - if ((!mac) && (obj->nodesetval->nodeNr > 1)) { + if (!mac && obj->nodesetval->nodeNr > 1) { vshError(ctl, _("Domain has %d interfaces. Please specify which one " "to detach using --mac"), obj->nodesetval->nodeNr); goto cleanup; @@ -14498,7 +14500,7 @@ cmdAttachDisk(vshControl *ctl, const vshCmd *cmd) (isFile) ? "file" : "block"); if (type) virBufferAsprintf(&buf, " device='%s'", type); - if (vshCommandOptBool (cmd, "rawio")) + if (vshCommandOptBool(cmd, "rawio")) virBufferAddLit(&buf, " rawio='yes'"); virBufferAddLit(&buf, ">\n"); @@ -14637,10 +14639,10 @@ vshFindDisk(const char *doc, } obj = xmlXPathEval(BAD_CAST "/domain/devices/disk", ctxt); - if ((obj == NULL) || - (obj->type != XPATH_NODESET) || - (obj->nodesetval == NULL) || - (obj->nodesetval->nodeNr == 0)) { + if (obj == NULL || + obj->type != XPATH_NODESET || + obj->nodesetval == NULL || + obj->nodesetval->nodeNr == 0) { vshError(NULL, "%s", _("Failed to get disk information")); goto cleanup; } @@ -15219,7 +15221,7 @@ cleanup: xmlFreeDoc(xml); xmlBufferFree(xml_buf); VIR_FREE(result); - if ((list != NULL) && (count > 0)) { + if (list != NULL && count > 0) { for (i = 0; i < count; i++) VIR_FREE(list[i]); } @@ -15336,7 +15338,7 @@ editReadBackFile (vshControl *ctl, const char *filename) { char *ret; - if (virFileReadAll (filename, VIRSH_MAX_XML_FILE, &ret) == -1) { + if (virFileReadAll(filename, VIRSH_MAX_XML_FILE, &ret) == -1) { vshError(ctl, _("%s: failed to read temporary file: %s"), filename, strerror(errno)); @@ -15565,7 +15567,7 @@ cmdEdit (vshControl *ctl, const vshCmd *cmd) /* Everything checks out, so redefine the domain. */ virDomainFree (dom); - dom = virDomainDefineXML (ctl->conn, doc_edited); + dom = virDomainDefineXML(ctl->conn, doc_edited); if (!dom) goto cleanup; @@ -16454,8 +16456,8 @@ cmdSnapshotList(vshControl *ctl, const vshCmd *cmd) else vshPrintExtra(ctl, " %-20s %-25s %s", _("Name"), _("Creation Time"), _("State")); - vshPrintExtra(ctl, "\n\ -------------------------------------------------------------\n"); + vshPrintExtra(ctl, "\n" +"------------------------------------------------------------\n"); } if (!numsnaps) { @@ -17874,7 +17876,7 @@ vshCommandOptInt(const vshCmd *cmd, const char *name, int *value) } num = strtol(arg->data, &end_p, 10); - if ((arg->data != end_p) && (*end_p == 0)) { + if (arg->data != end_p && *end_p == 0) { *value = num; return 1; } @@ -17909,7 +17911,7 @@ vshCommandOptUInt(const vshCmd *cmd, const char *name, unsigned int *value) } num = strtoul(arg->data, &end_p, 10); - if ((arg->data != end_p) && (*end_p == 0)) { + if (arg->data != end_p && *end_p == 0) { *value = num; return 1; } @@ -17944,7 +17946,7 @@ vshCommandOptUL(const vshCmd *cmd, const char *name, unsigned long *value) } num = strtoul(arg->data, &end_p, 10); - if ((arg->data != end_p) && (*end_p == 0)) { + if (arg->data != end_p && *end_p == 0) { *value = num; return 1; } @@ -18013,7 +18015,7 @@ vshCommandOptLongLong(const vshCmd *cmd, const char *name, } num = strtoll(arg->data, &end_p, 10); - if ((arg->data != end_p) && (*end_p == 0)) { + if (arg->data != end_p && *end_p == 0) { *value = num; return 1; } @@ -18048,7 +18050,7 @@ vshCommandOptULongLong(const vshCmd *cmd, const char *name, } num = strtoull(arg->data, &end_p, 10); - if ((arg->data != end_p) && (*end_p == 0)) { + if (arg->data != end_p && *end_p == 0) { *value = num; return 1; } @@ -18220,7 +18222,7 @@ vshCommandOptNetworkBy(vshControl *ctl, const vshCmd *cmd, *name = n; /* try it by UUID */ - if ((flag & VSH_BYUUID) && (strlen(n) == VIR_UUID_STRING_BUFLEN-1)) { + if ((flag & VSH_BYUUID) && strlen(n) == VIR_UUID_STRING_BUFLEN-1) { vshDebug(ctl, VSH_ERR_DEBUG, "%s: <%s> trying as network UUID\n", cmd->def->name, optname); network = virNetworkLookupByUUIDString(ctl->conn, n); @@ -18259,7 +18261,7 @@ vshCommandOptNWFilterBy(vshControl *ctl, const vshCmd *cmd, *name = n; /* try it by UUID */ - if ((flag & VSH_BYUUID) && (strlen(n) == VIR_UUID_STRING_BUFLEN-1)) { + if ((flag & VSH_BYUUID) && strlen(n) == VIR_UUID_STRING_BUFLEN-1) { vshDebug(ctl, VSH_ERR_DEBUG, "%s: <%s> trying as nwfilter UUID\n", cmd->def->name, optname); nwfilter = virNWFilterLookupByUUIDString(ctl->conn, n); @@ -18300,13 +18302,13 @@ vshCommandOptInterfaceBy(vshControl *ctl, const vshCmd *cmd, *name = n; /* try it by NAME */ - if ((flag & VSH_BYNAME)) { + if (flag & VSH_BYNAME) { vshDebug(ctl, VSH_ERR_DEBUG, "%s: <%s> trying as interface NAME\n", cmd->def->name, optname); iface = virInterfaceLookupByName(ctl->conn, n); } /* try it by MAC */ - if ((iface == NULL) && (flag & VSH_BYMAC)) { + if (iface == NULL && (flag & VSH_BYMAC)) { vshDebug(ctl, VSH_ERR_DEBUG, "%s: <%s> trying as interface MAC\n", cmd->def->name, optname); iface = virInterfaceLookupByMACString(ctl->conn, n); @@ -18335,7 +18337,7 @@ vshCommandOptPoolBy(vshControl *ctl, const vshCmd *cmd, const char *optname, *name = n; /* try it by UUID */ - if ((flag & VSH_BYUUID) && (strlen(n) == VIR_UUID_STRING_BUFLEN-1)) { + if ((flag & VSH_BYUUID) && strlen(n) == VIR_UUID_STRING_BUFLEN-1) { vshDebug(ctl, VSH_ERR_DEBUG, "%s: <%s> trying as pool UUID\n", cmd->def->name, optname); pool = virStoragePoolLookupByUUIDString(ctl->conn, n); @@ -19629,8 +19631,9 @@ vshReadlineDeinit (vshControl *ctl) char ebuf[1024]; vshError(ctl, _("Failed to create '%s': %s"), ctl->historydir, virStrerror(errno, ebuf, sizeof(ebuf))); - } else + } else { write_history(ctl->historyfile); + } } VIR_FREE(ctl->historydir); -- 1.7.7.6

On Mon, Apr 16, 2012 at 23:05:52 -0600, Eric Blake wrote:
No semantic change.
* tools/virsh.c: Fix some spacing issues, {} usage, long lines, and redundant (). ---
v5: new patch, factored out from comments on other patches
tools/virsh.c | 197 +++++++++++++++++++++++++++++---------------------------- 1 files changed, 100 insertions(+), 97 deletions(-)
ACK. Jirka

Most of our errors complaining about an inability to support a particular action due to qemu limitations used CONFIG_UNSUPPORTED, but we had a few outliers. Reported by Jiri Denemark. * src/qemu/qemu_command.c (qemuBuildDriveDevStr): Prefer CONFIG_UNSUPPORTED. * src/qemu/qemu_driver.c (qemuDomainReboot) (qemuDomainBlockJobImpl): Likewise. * src/qemu/qemu_hotplug.c (qemuDomainAttachPciControllerDevice): Likewise. * src/qemu/qemu_monitor.c (qemuMonitorTransaction) (qemuMonitorBlockJob, qemuMonitorSystemWakeup): Likewise. --- v5: new patch, factored out from comments on other patches src/qemu/qemu_command.c | 2 +- src/qemu/qemu_driver.c | 13 ++++++++----- src/qemu/qemu_hotplug.c | 2 +- src/qemu/qemu_monitor.c | 6 +++--- 4 files changed, 13 insertions(+), 10 deletions(-) diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c index c82f5bc..b66ce18 100644 --- a/src/qemu/qemu_command.c +++ b/src/qemu/qemu_command.c @@ -1527,7 +1527,7 @@ qemuBuildDeviceAddressStr(virBufferPtr buf, } } else { if (info->addr.pci.function != 0) { - qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s", + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("Only PCI device addresses with function=0 " "are supported with this QEMU binary")); return -1; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 436ef37..c3555ca 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -1633,12 +1633,15 @@ cleanup: return ret; } -static int qemuDomainShutdown(virDomainPtr dom) { +static int qemuDomainShutdown(virDomainPtr dom) +{ return qemuDomainShutdownFlags(dom, 0); } -static int qemuDomainReboot(virDomainPtr dom, unsigned int flags) { +static int +qemuDomainReboot(virDomainPtr dom, unsigned int flags) +{ struct qemud_driver *driver = dom->conn->privateData; virDomainObjPtr vm; int ret = -1; @@ -1682,7 +1685,7 @@ static int qemuDomainReboot(virDomainPtr dom, unsigned int flags) { #if HAVE_YAJL if (qemuCapsGet(priv->qemuCaps, QEMU_CAPS_MONITOR_JSON)) { if (!qemuCapsGet(priv->qemuCaps, QEMU_CAPS_NO_SHUTDOWN)) { - qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("Reboot is not supported with this QEMU binary")); goto cleanup; } @@ -11643,11 +11646,11 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, if (qemuCapsGet(priv->qemuCaps, QEMU_CAPS_BLOCKJOB_ASYNC)) { async = true; } else if (!qemuCapsGet(priv->qemuCaps, QEMU_CAPS_BLOCKJOB_SYNC)) { - qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("block jobs not supported with this QEMU binary")); goto cleanup; } else if (base) { - qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("partial block pull not supported with this " "QEMU binary")); goto cleanup; diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 857b980..7cf7b90 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -335,7 +335,7 @@ int qemuDomainAttachPciControllerDevice(struct qemud_driver *driver, if (controller->type == VIR_DOMAIN_CONTROLLER_TYPE_USB && controller->model == -1 && !qemuCapsGet(priv->qemuCaps, QEMU_CAPS_PIIX3_USB_UHCI)) { - qemuReportError(VIR_ERR_OPERATION_FAILED, + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, _("USB controller hotplug unsupported in this QEMU binary")); goto cleanup; } diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 36f3832..2f66c46 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -2696,7 +2696,7 @@ qemuMonitorTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) if (mon->json) ret = qemuMonitorJSONTransaction(mon, actions); else - qemuReportError(VIR_ERR_INVALID_ARG, "%s", + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("transaction requires JSON monitor")); return ret; } @@ -2786,7 +2786,7 @@ int qemuMonitorBlockJob(qemuMonitorPtr mon, ret = qemuMonitorJSONBlockJob(mon, device, base, bandwidth, info, mode, async); else - qemuReportError(VIR_ERR_INVALID_ARG, "%s", + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("block jobs require JSON monitor")); return ret; } @@ -2918,7 +2918,7 @@ int qemuMonitorSystemWakeup(qemuMonitorPtr mon) } if (!mon->json) { - qemuReportError(VIR_ERR_NO_SUPPORT, "%s", + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("JSON monitor is required")); return -1; } -- 1.7.7.6

On Mon, Apr 16, 2012 at 23:05:53 -0600, Eric Blake wrote:
Most of our errors complaining about an inability to support a particular action due to qemu limitations used CONFIG_UNSUPPORTED, but we had a few outliers. Reported by Jiri Denemark.
* src/qemu/qemu_command.c (qemuBuildDriveDevStr): Prefer CONFIG_UNSUPPORTED. * src/qemu/qemu_driver.c (qemuDomainReboot) (qemuDomainBlockJobImpl): Likewise. * src/qemu/qemu_hotplug.c (qemuDomainAttachPciControllerDevice): Likewise. * src/qemu/qemu_monitor.c (qemuMonitorTransaction) (qemuMonitorBlockJob, qemuMonitorSystemWakeup): Likewise. ---
v5: new patch, factored out from comments on other patches
src/qemu/qemu_command.c | 2 +- src/qemu/qemu_driver.c | 13 ++++++++----- src/qemu/qemu_hotplug.c | 2 +- src/qemu/qemu_monitor.c | 6 +++--- 4 files changed, 13 insertions(+), 10 deletions(-)
Also independent, ACK. Jirka

I'm tired of shell-scripting to wait for completion of a block pull, when virsh can be taught to do the same. I couldn't quite reuse vshWatchJob, as this is not a case of a long-running command where a second thread must be used to probe job status (at least, not unless I make virsh start doing blocking waits for an event to fire), but it served as inspiration for my simpler single-threaded loop. There is up to a half-second delay between sending SIGINT and the job being aborted, but I didn't think it worth the complexity of a second thread and use of poll() just to minimize that delay. * tools/virsh.c (cmdBlockPull): Add new options to wait for completion. (blockJobImpl): Add argument. (cmdBlockJob): Adjust caller. * tools/virsh.pod (blockjob): Document new mode. --- was independent patch previously v5: address review comments, add an --async flag tools/virsh.c | 120 ++++++++++++++++++++++++++++++++++++++++++++++++++++-- tools/virsh.pod | 14 +++++- 2 files changed, 127 insertions(+), 7 deletions(-) diff --git a/tools/virsh.c b/tools/virsh.c index 5400487..95ed7bc 100644 --- a/tools/virsh.c +++ b/tools/virsh.c @@ -7521,7 +7521,8 @@ typedef enum { static int blockJobImpl(vshControl *ctl, const vshCmd *cmd, - virDomainBlockJobInfoPtr info, int mode) + virDomainBlockJobInfoPtr info, int mode, + virDomainPtr *pdom) { virDomainPtr dom = NULL; const char *name, *path; @@ -7562,7 +7563,9 @@ blockJobImpl(vshControl *ctl, const vshCmd *cmd, } cleanup: - if (dom) + if (pdom && ret == 0) + *pdom = dom; + else if (dom) virDomainFree(dom); return ret; } @@ -7582,15 +7585,122 @@ static const vshCmdOptDef opts_block_pull[] = { {"bandwidth", VSH_OT_DATA, VSH_OFLAG_NONE, N_("Bandwidth limit in MB/s")}, {"base", VSH_OT_DATA, VSH_OFLAG_NONE, N_("path of backing file in chain for a partial pull")}, + {"wait", VSH_OT_BOOL, 0, N_("wait for job to finish")}, + {"verbose", VSH_OT_BOOL, 0, N_("with --wait, display the progress")}, + {"timeout", VSH_OT_INT, VSH_OFLAG_NONE, + N_("with --wait, abort if pull exceeds timeout (in seconds)")}, + {"async", VSH_OT_BOOL, 0, + N_("with --wait, don't wait for cancel to finish")}, {NULL, 0, 0, NULL} }; static bool cmdBlockPull(vshControl *ctl, const vshCmd *cmd) { - if (blockJobImpl(ctl, cmd, NULL, VSH_CMD_BLOCK_JOB_PULL) != 0) + virDomainPtr dom = NULL; + bool ret = false; + bool blocking = vshCommandOptBool(cmd, "wait"); + bool verbose = vshCommandOptBool(cmd, "verbose"); + int timeout = 0; + struct sigaction sig_action; + struct sigaction old_sig_action; + sigset_t sigmask; + struct timeval start; + struct timeval curr; + const char *path = NULL; + bool quit = false; + int abort_flags = 0; + + if (blocking) { + if (vshCommandOptInt(cmd, "timeout", &timeout) > 0) { + if (timeout < 1) { + vshError(ctl, "%s", _("invalid timeout")); + return false; + } + + /* Ensure that we can multiply by 1000 without overflowing. */ + if (timeout > INT_MAX / 1000) { + vshError(ctl, "%s", _("timeout is too big")); + return false; + } + } + if (vshCommandOptString(cmd, "path", &path) < 0) + return false; + if (vshCommandOptBool(cmd, "async")) + abort_flags |= VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC; + + sigemptyset(&sigmask); + sigaddset(&sigmask, SIGINT); + + intCaught = 0; + sig_action.sa_sigaction = vshCatchInt; + sigemptyset(&sig_action.sa_mask); + sigaction(SIGINT, &sig_action, &old_sig_action); + + GETTIMEOFDAY(&start); + } else if (verbose || vshCommandOptBool(cmd, "timeout") || + vshCommandOptBool(cmd, "async")) { + vshError(ctl, "%s", _("blocking control options require --wait")); return false; - return true; + } + + if (blockJobImpl(ctl, cmd, NULL, VSH_CMD_BLOCK_JOB_PULL, &dom) < 0) + goto cleanup; + + if (!blocking) { + vshPrint(ctl, "%s", _("Block Pull started")); + ret = true; + goto cleanup; + } + + while (blocking) { + virDomainBlockJobInfo info; + int result = virDomainGetBlockJobInfo(dom, path, &info, 0); + + if (result < 0) { + vshError(ctl, _("failed to query job for disk %s"), path); + goto cleanup; + } + if (result == 0) + break; + + if (verbose) + print_job_progress(_("Block Pull"), info.end - info.cur, info.end); + + GETTIMEOFDAY(&curr); + if (intCaught || (timeout && + (((int)(curr.tv_sec - start.tv_sec) * 1000 + + (int)(curr.tv_usec - start.tv_usec) / 1000) > + timeout * 1000))) { + vshDebug(ctl, VSH_ERR_DEBUG, + intCaught ? "interrupted" : "timeout"); + intCaught = 0; + timeout = 0; + quit = true; + if (virDomainBlockJobAbort(dom, path, abort_flags) < 0) { + vshError(ctl, _("failed to abort job for disk %s"), path); + goto cleanup; + } + if (abort_flags & VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC) + break; + } else { + usleep(500 * 1000); + } + } + + if (verbose && !quit) { + /* printf [100 %] */ + print_job_progress(_("Block Pull"), 0, 1); + } + vshPrint(ctl, "\n%s", quit ? _("Pull aborted") : _("Pull complete")); + + ret = true; +cleanup: + if (dom) + virDomainFree(dom); + if (blocking) + sigaction(SIGINT, &old_sig_action, NULL); + return ret; } /* @@ -7641,7 +7751,7 @@ cmdBlockJob(vshControl *ctl, const vshCmd *cmd) else mode = VSH_CMD_BLOCK_JOB_INFO; - ret = blockJobImpl(ctl, cmd, &info, mode); + ret = blockJobImpl(ctl, cmd, &info, mode, NULL); if (ret < 0) return false; diff --git a/tools/virsh.pod b/tools/virsh.pod index 5f3d9b1..140d8e8 100644 --- a/tools/virsh.pod +++ b/tools/virsh.pod @@ -640,6 +640,7 @@ address of virtual interface (such as I<detach-interface> or I<domif-setlink>) will accept the MAC address printed by this command. =item B<blockpull> I<domain> I<path> [I<bandwidth>] [I<base>] +[I<--wait> [I<--verbose>] [I<--timeout> B<seconds>] [I<--async]] Populate a disk from its backing image chain. By default, this command flattens the entire chain; but if I<base> is specified, containing the @@ -647,8 +648,17 @@ name of one of the backing files in the chain, then that file becomes the new backing file and only the intermediate portion of the chain is pulled. Once all requested data from the backing image chain has been pulled, the disk no longer depends on that portion of the backing chain. -It pulls data for the entire disk in the background, the process of the -operation can be checked with B<blockjob>. + +By default, this command returns as soon as possible, and data for +the entire disk is pulled in the background; the progress of the +operation can be checked with B<blockjob>. However, if I<--wait> is +specified, then this command will block until the operation completes, +or cancel the operation if the optional I<timeout> in seconds elapses +or SIGINT is sent (usually with C<Ctrl-C>). Using I<--verbose> along +with I<--wait> will produce periodic status updates. If job cancellation +is triggered, I<--async> will return control to the user as fast as +possible, otherwise the command may continue to block a little while +longer until the job is done cleaning up. I<path> specifies fully-qualified path of the disk; it corresponds to a unique target name (<target dev='name'/>) or source file (<source -- 1.7.7.6

On Mon, Apr 16, 2012 at 23:05:54 -0600, Eric Blake wrote:
I'm tired of shell-scripting to wait for completion of a block pull, when virsh can be taught to do the same. I couldn't quite reuse vshWatchJob, as this is not a case of a long-running command where a second thread must be used to probe job status (at least, not unless I make virsh start doing blocking waits for an event to fire), but it served as inspiration for my simpler single-threaded loop. There is up to a half-second delay between sending SIGINT and the job being aborted, but I didn't think it worth the complexity of a second thread and use of poll() just to minimize that delay.
* tools/virsh.c (cmdBlockPull): Add new options to wait for completion. (blockJobImpl): Add argument. (cmdBlockJob): Adjust caller. * tools/virsh.pod (blockjob): Document new mode. ---
was independent patch previously v5: address review comments, add an --async flag
tools/virsh.c | 120 ++++++++++++++++++++++++++++++++++++++++++++++++++++-- tools/virsh.pod | 14 +++++- 2 files changed, 127 insertions(+), 7 deletions(-)
Looks good and .5-second delay is a big issue. And if it is, we can improve the implementation in the future. ACK, this can be pushed independently as well. Jirka

On 04/17/2012 07:15 AM, Jiri Denemark wrote:
On Mon, Apr 16, 2012 at 23:05:54 -0600, Eric Blake wrote:
I'm tired of shell-scripting to wait for completion of a block pull, when virsh can be taught to do the same. I couldn't quite reuse vshWatchJob, as this is not a case of a long-running command where a second thread must be used to probe job status (at least, not unless I make virsh start doing blocking waits for an event to fire), but it served as inspiration for my simpler single-threaded loop. There is up to a half-second delay between sending SIGINT and the job being aborted, but I didn't think it worth the complexity of a second thread and use of poll() just to minimize that delay.
* tools/virsh.c (cmdBlockPull): Add new options to wait for completion. (blockJobImpl): Add argument. (cmdBlockJob): Adjust caller. * tools/virsh.pod (blockjob): Document new mode. ---
was independent patch previously v5: address review comments, add an --async flag
tools/virsh.c | 120 ++++++++++++++++++++++++++++++++++++++++++++++++++++-- tools/virsh.pod | 14 +++++- 2 files changed, 127 insertions(+), 7 deletions(-)
Looks good and .5-second delay is a big issue. And if it is, we can improve the implementation in the future. ACK, this can be pushed independently as well.
I've pushed 1-3 now. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Tue, Apr 17, 2012 at 11:13:53 -0600, Eric Blake wrote:
On 04/17/2012 07:15 AM, Jiri Denemark wrote:
On Mon, Apr 16, 2012 at 23:05:54 -0600, Eric Blake wrote:
I'm tired of shell-scripting to wait for completion of a block pull, when virsh can be taught to do the same. I couldn't quite reuse vshWatchJob, as this is not a case of a long-running command where a second thread must be used to probe job status (at least, not unless I make virsh start doing blocking waits for an event to fire), but it served as inspiration for my simpler single-threaded loop. There is up to a half-second delay between sending SIGINT and the job being aborted, but I didn't think it worth the complexity of a second thread and use of poll() just to minimize that delay.
* tools/virsh.c (cmdBlockPull): Add new options to wait for completion. (blockJobImpl): Add argument. (cmdBlockJob): Adjust caller. * tools/virsh.pod (blockjob): Document new mode. ---
was independent patch previously v5: address review comments, add an --async flag
tools/virsh.c | 120 ++++++++++++++++++++++++++++++++++++++++++++++++++++-- tools/virsh.pod | 14 +++++- 2 files changed, 127 insertions(+), 7 deletions(-)
Looks good and .5-second delay is a big issue. And if it is, we can improve
Heh, the above should have been "I don't think .5-second delay is a big issue" but I guess you figured that :-) Jirka

This patch introduces a new block job, useful for live storage migration using pre-copy streaming. Justification for including this under virDomainBlockRebase rather than adding a new command includes: 1) there are now two possible block jobs in qemu, with virDomainBlockRebase starting either type of command, and virDomainBlockJobInfo and virDomainBlockJobAbort working to end either type; 2) reusing this command allows distros to backport this feature to the libvirt 0.9.10 API without a .so bump. Note that a future patch may add a more powerful interface named virDomainBlockJobCopy, dedicated to just the block copy job, in order to expose even more options (such as setting an arbitrary format type for the destination without having to probe it from a pre-existing destination file); adding a new command for targetting just block copy would be similar to how we already have virDomainBlockPull for targetting just the block pull job. Using a live VM with the backing chain: base <- snap1 <- snap2 as the starting point, we have: - virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY) creates /path/to/copy with the same format as snap2, with no backing file, so entire chain is copied and flattened - virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) creates /path/to/copy as a raw file, so entire chain is copied and flattened - virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_SHALLOW) creates /path/to/copy with the same format as snap2, but with snap1 as a backing file, so only snap2 is copied. - virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT) reuse existing /path/to/copy (must have empty contents, and format is probed[*] from the metadata), and copy the full chain - virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT| VIR_DOMAIN_BLOCK_REBASE_SHALLOW) reuse existing /path/to/copy (contents must be identical to snap1, and format is probed[*] from the metadata), and copy only the contents of snap2 - virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT| VIR_DOMAIN_BLOCK_REBASE_SHALLOW|VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) reuse existing /path/to/copy (must be raw volume with contents identical to snap1), and copy only the contents of snap2 Less useful combinations: - virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_SHALLOW| VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) fail if source is not raw, otherwise create /path/to/copy as raw and the single file is copied (no chain involved) - virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT| VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) makes little sense: the destination must be raw but have no contents, meaning that it is an empty file, so there is nothing to reuse The other three flags are rejected without VIR_DOMAIN_BLOCK_COPY. [*] Note that probing an existing file for its format can be a security risk _if_ there is a possibility that the existing file is 'raw', in which case the guest can manipulate the file to appear like some other format. But, by virtue of the VIR_DOMAIN_BLOCK_REBASE_COPY_RAW flag, it is possible to avoid probing of raw files, at which point, probing of any remaining file type is no longer a security risk. It would be nice if we could issue an event when pivoting from phase 1 to phase 2, but qemu hasn't implemented that, and we would have to poll in order to synthesize it ourselves. Meanwhile, qemu will give us a distinct job info and completion event when we either cancel or pivot to end the job. Pivoting is accomplished via the new: virDomainBlockJobAbort(dom, disk, VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT) Management applications can pre-create the copy with a relative backing file name, and use the VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT flag to have qemu reuse the metadata; if the management application also copies the backing files to a new location, this can be used to perform live storage migration of an entire backing chain. * include/libvirt/libvirt.h.in (VIR_DOMAIN_BLOCK_JOB_TYPE_COPY): New block job type. (virDomainBlockJobAbortFlags, virDomainBlockRebaseFlags): New enums. * src/libvirt.c (virDomainBlockRebase): Document the new flags, and implement general restrictions on flag combinations. (virDomainBlockJobAbort): Document the new flag. (virDomainSaveFlags, virDomainSnapshotCreateXML) (virDomainRevertToSnapshot, virDomainDetachDeviceFlags): Document restrictions. * include/libvirt/virterror.h (VIR_ERR_BLOCK_COPY_ACTIVE): New error. * src/util/virterror.c (virErrorMsg): Define it. --- was 5/18 in v4 v5: enhance commit message, fix typos in docs include/libvirt/libvirt.h.in | 24 +++++++++- include/libvirt/virterror.h | 1 + src/libvirt.c | 100 ++++++++++++++++++++++++++++++++++++++---- src/util/virterror.c | 6 +++ 4 files changed, 120 insertions(+), 11 deletions(-) diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index 97ad99d..ac5df95 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -1934,12 +1934,15 @@ int virDomainUpdateDeviceFlags(virDomainPtr domain, /** * virDomainBlockJobType: * - * VIR_DOMAIN_BLOCK_JOB_TYPE_PULL: Block Pull (virDomainBlockPull or - * virDomainBlockRebase) + * VIR_DOMAIN_BLOCK_JOB_TYPE_PULL: Block Pull (virDomainBlockPull, or + * virDomainBlockRebase without flags), job ends on completion + * VIR_DOMAIN_BLOCK_JOB_TYPE_COPY: Block Copy (virDomainBlockRebase with + * flags), job exists as long as mirroring is active */ typedef enum { VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN = 0, VIR_DOMAIN_BLOCK_JOB_TYPE_PULL = 1, + VIR_DOMAIN_BLOCK_JOB_TYPE_COPY = 2, #ifdef VIR_ENUM_SENTINELS VIR_DOMAIN_BLOCK_JOB_TYPE_LAST @@ -1950,9 +1953,11 @@ typedef enum { * virDomainBlockJobAbortFlags: * * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC: Request only, do not wait for completion + * VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT: Pivot to mirror when ending a copy job */ typedef enum { VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC = 1 << 0, + VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT = 1 << 1, } virDomainBlockJobAbortFlags; /* An iterator for monitoring block job operations */ @@ -1983,6 +1988,21 @@ int virDomainBlockJobSetSpeed(virDomainPtr dom, const char *disk, int virDomainBlockPull(virDomainPtr dom, const char *disk, unsigned long bandwidth, unsigned int flags); + +/** + * virDomainBlockRebaseFlags: + * + * Flags available for virDomainBlockRebase(). + */ +typedef enum { + VIR_DOMAIN_BLOCK_REBASE_SHALLOW = 1 << 0, /* Limit copy to top of source + backing chain */ + VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT = 1 << 1, /* Reuse existing external + file for a copy */ + VIR_DOMAIN_BLOCK_REBASE_COPY_RAW = 1 << 2, /* Make destination file raw */ + VIR_DOMAIN_BLOCK_REBASE_COPY = 1 << 3, /* Start a copy job */ +} virDomainBlockRebaseFlags; + int virDomainBlockRebase(virDomainPtr dom, const char *disk, const char *base, unsigned long bandwidth, unsigned int flags); diff --git a/include/libvirt/virterror.h b/include/libvirt/virterror.h index e04d29e..070fdb5 100644 --- a/include/libvirt/virterror.h +++ b/include/libvirt/virterror.h @@ -249,6 +249,7 @@ typedef enum { VIR_ERR_NO_DOMAIN_METADATA = 80, /* The metadata is not present */ VIR_ERR_MIGRATE_UNSAFE = 81, /* Migration is not safe */ VIR_ERR_OVERFLOW = 82, /* integer overflow */ + VIR_ERR_BLOCK_COPY_ACTIVE = 83, /* action prevented by block copy job */ } virErrorNumber; /** diff --git a/src/libvirt.c b/src/libvirt.c index 93ec817..af42d3b 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -2696,6 +2696,10 @@ error: * A save file can be inspected or modified slightly with * virDomainSaveImageGetXMLDesc() and virDomainSaveImageDefineXML(). * + * Some hypervisors may prevent this operation if there is a current + * block copy operation; in that case, use virDomainBlockJobAbort() + * to stop the block copy first. + * * Returns 0 in case of success and -1 in case of failure. */ int @@ -7891,6 +7895,11 @@ error: * virDomainUndefine(). A previous definition for this domain would be * overriden if it already exists. * + * Some hypervisors may prevent this operation if there is a current + * block copy operation on a transient domain with the same id as the + * domain being defined; in that case, use virDomainBlockJobAbort() to + * stop the block copy first. + * * Returns NULL in case of error, a pointer to the domain otherwise */ virDomainPtr @@ -9424,6 +9433,10 @@ error: * return failure if LIVE is specified but it only supports removing the * persisted device allocation. * + * Some hypervisors may prevent this operation if there is a current + * block copy operation on the device being detached; in that case, + * use virDomainBlockJobAbort() to stop the block copy first. + * * Returns 0 in case of success, -1 in case of failure. */ int @@ -17124,6 +17137,10 @@ virDomainSnapshotGetConnect(virDomainSnapshotPtr snapshot) * that it is still possible to fail after disks have changed, but only * in the much rarer cases of running out of memory or disk space). * + * Some hypervisors may prevent this operation if there is a current + * block copy operation; in that case, use virDomainBlockJobAbort() + * to stop the block copy first. + * * Returns an (opaque) virDomainSnapshotPtr on success, NULL on failure. */ virDomainSnapshotPtr @@ -17913,7 +17930,8 @@ error: * can be found by calling virDomainGetXMLDesc() and inspecting * elements within //domain/devices/disk. * - * By default, this function performs a synchronous operation and the caller + * If the current block job for @disk is VIR_DOMAIN_BLOCK_JOB_TYPE_PULL, then + * by default, this function performs a synchronous operation and the caller * may assume that the operation has completed when 0 is returned. However, * BlockJob operations may take a long time to cancel, and during this time * further domain interactions may be unresponsive. To avoid this problem, @@ -17922,7 +17940,18 @@ error: * been canceled, a BlockJob event will be emitted, with status * VIR_DOMAIN_BLOCK_JOB_CANCELED (even if the ABORT_ASYNC flag was not * used); it is also possible to poll virDomainBlockJobInfo() to see if - * the job cancellation is still pending. + * the job cancellation is still pending. This type of job can be restarted + * to pick up from where it left off. + * + * If the current block job for @disk is VIR_DOMAIN_BLOCK_JOB_TYPE_COPY, then + * the default is to abort the mirroring and revert to the source disk; + * adding @flags of VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT causes this call to + * fail with VIR_ERR_BLOCK_COPY_ACTIVE if the copy is not fully populated, + * otherwise it will swap the disk over to the copy to end the mirroring. An + * event will be issued when the job is ended, and it is possible to use + * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC to control whether this command waits + * for the completion of the job. Restarting this job requires starting + * over from the beginning of the first phase. * * Returns -1 in case of failure, 0 when successful. */ @@ -18172,19 +18201,57 @@ error: * @disk: path to the block device, or device shorthand * @base: path to backing file to keep, or NULL for no backing file * @bandwidth: (optional) specify copy bandwidth limit in Mbps - * @flags: extra flags; not used yet, so callers should always pass 0 + * @flags: bitwise-OR of virDomainBlockRebaseFlags * * Populate a disk image with data from its backing image chain, and - * setting the backing image to @base. @base must be the absolute + * setting the backing image to @base, or alternatively copy an entire + * backing chain to a new file @base. + * + * When @flags is 0, this starts a pull, where @base must be the absolute * path of one of the backing images further up the chain, or NULL to * convert the disk image so that it has no backing image. Once all * data from its backing image chain has been pulled, the disk no * longer depends on those intermediate backing images. This function * pulls data for the entire device in the background. Progress of - * the operation can be checked with virDomainGetBlockJobInfo() and - * the operation can be aborted with virDomainBlockJobAbort(). When - * finished, an asynchronous event is raised to indicate the final - * status. + * the operation can be checked with virDomainGetBlockJobInfo() with a + * job type of VIR_DOMAIN_BLOCK_JOB_TYPE_PULL, and the operation can be + * aborted with virDomainBlockJobAbort(). When finished, an asynchronous + * event is raised to indicate the final status, and the job no longer + * exists. If the job is aborted, a new one can be started later to + * resume from the same point. + * + * When @flags includes VIR_DOMAIN_BLOCK_REBASE_COPY, this starts a copy, + * where @base must be the name of a new file to copy the chain to. By + * default, the copy will pull the entire source chain into the destination + * file, but if @flags also contains VIR_DOMAIN_BLOCK_REBASE_SHALLOW, then + * only the top of the source chain will be copied (the source and + * destination have a common backing file). By default, @base will be + * created with the same file format as the source, but this can be altered + * by adding VIR_DOMAIN_BLOCK_REBASE_COPY_RAW to force the copy to be raw + * (does not make sense with the shallow flag unless the source is also raw), + * or by using VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT to reuse an existing file + * with initial contents identical to the backing file of the source (this + * allows a management app to pre-create files with relative backing file + * names, rather than the default of absolute backing file names; as a + * security precaution, you should generally only use reuse_ext with the + * shallow flag and a non-raw destination file). + * + * A copy job has two parts; in the first phase, the @bandwidth parameter + * affects how fast the source is pulled into the destination, and the job + * can only be canceled by reverting to the source file; progress in this + * phase can be tracked via the virDomainBlockJobInfo() command, with a + * job type of VIR_DOMAIN_BLOCK_JOB_TYPE_COPY. The job transitions to the + * second phase when the job info states cur == end, and remains alive to + * mirror all further changes to both source and destination. The user + * must call virDomainBlockJobAbort() to end the mirroring while choosing + * whether to revert to source or pivot to the destination. An event is + * issued when the job ends, and in the future, an event may be added when + * the job transitions from pulling to mirroring. If the job is aborted, + * a new job will have to start over from the beginning of the first phase. + * + * Some hypervisors will restrict certain actions, such as virDomainSave() + * or virDomainDetachDevice(), while a copy job is active; they may + * also restrict a copy job to transient domains. * * The @disk parameter is either an unambiguous source name of the * block device (the <source file='...'/> sub-element, such as @@ -18198,7 +18265,8 @@ error: * suitable default. Some hypervisors do not support this feature and will * return an error if bandwidth is not 0. * - * When @base is NULL, this is identical to virDomainBlockPull(). + * When @base is NULL and @flags is 0, this is identical to + * virDomainBlockPull(). * * Returns 0 if the operation has started, -1 on failure. */ @@ -18231,6 +18299,20 @@ int virDomainBlockRebase(virDomainPtr dom, const char *disk, goto error; } + if (flags & VIR_DOMAIN_BLOCK_REBASE_COPY) { + if (!base) { + virLibDomainError(VIR_ERR_INVALID_ARG, + _("base is required when starting a copy")); + goto error; + } + } else if (flags & (VIR_DOMAIN_BLOCK_REBASE_SHALLOW | + VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT | + VIR_DOMAIN_BLOCK_REBASE_COPY_RAW)) { + virLibDomainError(VIR_ERR_INVALID_ARG, + _("use of flags requires a copy job")); + goto error; + } + if (conn->driver->domainBlockRebase) { int ret; ret = conn->driver->domainBlockRebase(dom, disk, base, bandwidth, diff --git a/src/util/virterror.c b/src/util/virterror.c index ff9a36f..845081e 100644 --- a/src/util/virterror.c +++ b/src/util/virterror.c @@ -1250,6 +1250,12 @@ virErrorMsg(virErrorNumber error, const char *info) else errmsg = _("numerical overflow: %s"); break; + case VIR_ERR_BLOCK_COPY_ACTIVE: + if (!info) + errmsg = _("block copy still active"); + else + errmsg = _("block copy still active: %s"); + break; } return errmsg; } -- 1.7.7.6

On Mon, Apr 16, 2012 at 23:05:55 -0600, Eric Blake wrote:
This patch introduces a new block job, useful for live storage migration using pre-copy streaming. Justification for including this under virDomainBlockRebase rather than adding a new command includes: 1) there are now two possible block jobs in qemu, with virDomainBlockRebase starting either type of command, and virDomainBlockJobInfo and virDomainBlockJobAbort working to end either type; 2) reusing this command allows distros to backport this feature to the libvirt 0.9.10 API without a .so bump.
Note that a future patch may add a more powerful interface named virDomainBlockJobCopy, dedicated to just the block copy job, in order to expose even more options (such as setting an arbitrary format type for the destination without having to probe it from a pre-existing destination file); adding a new command for targetting just block copy would be similar to how we already have virDomainBlockPull for targetting just the block pull job.
Using a live VM with the backing chain: base <- snap1 <- snap2 as the starting point, we have:
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY) creates /path/to/copy with the same format as snap2, with no backing file, so entire chain is copied and flattened
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) creates /path/to/copy as a raw file, so entire chain is copied and flattened
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_SHALLOW) creates /path/to/copy with the same format as snap2, but with snap1 as a backing file, so only snap2 is copied.
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT) reuse existing /path/to/copy (must have empty contents, and format is probed[*] from the metadata), and copy the full chain
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT| VIR_DOMAIN_BLOCK_REBASE_SHALLOW) reuse existing /path/to/copy (contents must be identical to snap1, and format is probed[*] from the metadata), and copy only the contents of snap2
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT| VIR_DOMAIN_BLOCK_REBASE_SHALLOW|VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) reuse existing /path/to/copy (must be raw volume with contents identical to snap1), and copy only the contents of snap2
Less useful combinations:
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_SHALLOW| VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) fail if source is not raw, otherwise create /path/to/copy as raw and the single file is copied (no chain involved)
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT| VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) makes little sense: the destination must be raw but have no contents, meaning that it is an empty file, so there is nothing to reuse
The other three flags are rejected without VIR_DOMAIN_BLOCK_COPY.
[*] Note that probing an existing file for its format can be a security risk _if_ there is a possibility that the existing file is 'raw', in which case the guest can manipulate the file to appear like some other format. But, by virtue of the VIR_DOMAIN_BLOCK_REBASE_COPY_RAW flag, it is possible to avoid probing of raw files, at which point, probing of any remaining file type is no longer a security risk.
It would be nice if we could issue an event when pivoting from phase 1 to phase 2, but qemu hasn't implemented that, and we would have to poll in order to synthesize it ourselves. Meanwhile, qemu will give us a distinct job info and completion event when we either cancel or pivot to end the job. Pivoting is accomplished via the new:
virDomainBlockJobAbort(dom, disk, VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)
Management applications can pre-create the copy with a relative backing file name, and use the VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT flag to have qemu reuse the metadata; if the management application also copies the backing files to a new location, this can be used to perform live storage migration of an entire backing chain.
* include/libvirt/libvirt.h.in (VIR_DOMAIN_BLOCK_JOB_TYPE_COPY): New block job type. (virDomainBlockJobAbortFlags, virDomainBlockRebaseFlags): New enums. * src/libvirt.c (virDomainBlockRebase): Document the new flags, and implement general restrictions on flag combinations. (virDomainBlockJobAbort): Document the new flag. (virDomainSaveFlags, virDomainSnapshotCreateXML) (virDomainRevertToSnapshot, virDomainDetachDeviceFlags): Document restrictions. * include/libvirt/virterror.h (VIR_ERR_BLOCK_COPY_ACTIVE): New error. * src/util/virterror.c (virErrorMsg): Define it. ---
was 5/18 in v4 v5: enhance commit message, fix typos in docs
include/libvirt/libvirt.h.in | 24 +++++++++- include/libvirt/virterror.h | 1 + src/libvirt.c | 100 ++++++++++++++++++++++++++++++++++++++---- src/util/virterror.c | 6 +++ 4 files changed, 120 insertions(+), 11 deletions(-)
OK, you addressed my comments and I didn't spot any additional issues here. However, the question is whether we feel comfortable enough with pushing this with the risk that qemu implementation is not upstream yet. I think the API is sane but the history of pushing something that qemu ended up implementing in a different way worries me a bit. Jirka

Il 18/04/2012 16:18, Jiri Denemark ha scritto:
However, the question is whether we feel comfortable enough with pushing this with the risk that qemu implementation is not upstream yet. I think the API is sane but the history of pushing something that qemu ended up implementing in a different way worries me a bit.
FWIW, the worries on the QEMU side aren't really with the API but with the implementation. We are not sure of the drive-reopen API, but the mirroring abstractions (apart from transactions, i.e. patches 19-23) should be fine. Paolo

On 04/18/2012 08:42 AM, Paolo Bonzini wrote:
Il 18/04/2012 16:18, Jiri Denemark ha scritto:
However, the question is whether we feel comfortable enough with pushing this with the risk that qemu implementation is not upstream yet. I think the API is sane but the history of pushing something that qemu ended up implementing in a different way worries me a bit.
FWIW, the worries on the QEMU side aren't really with the API but with the implementation. We are not sure of the drive-reopen API, but the mirroring abstractions (apart from transactions, i.e. patches 19-23) should be fine.
Given the various comments, I've gone ahead and pushed patches 4-6 (the API changes); I'll wait for a bit longer for any further word on the qemu side before pushing any qemu implementation patches, but at this point, I think we're now committed to supporting as much of live storage copy as qemu will let us support at the time of the libvirt 0.9.12 release. I'll probably repost a v6 later today, with only minor tweaks for rebasing the qemu implementation to latest; at this point, any changes I make to the patch series will be limited to a possible rewrite of 12/23 (if upstream will let us call block-job-set-speed prior to block-stream or drive-mirror), a possible split of 8/23 (if we are sure that upstream will take drive-mirror but not drive-reopen in time for qemu 1.1), and adding a new patch 14.5/23 to pause and resume the domain around any block-reopen call (to ensure that we know when the guest has pivoted, even if libvirtd is restarted at an inopportune time, based on whether the guest is still paused). -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

Rather than further overloading 'blockpull', I decided to create a new virsh command to expose the new flags of virDomainBlockRebase. Blocking until the command completes naturally is pointless, since the block copy job is intended to run indefinitely. Instead, I made the command support three --wait modes: by default, it runs until mirroring is started; with --pivot, it pivots as soon as mirroring is started; and with --finish, it aborts (for a clean copy) as soon as mirroring is started. * tools/virsh.c (VSH_CMD_BLOCK_JOB_COPY): New mode. (blockJobImpl): Support new flags. (cmdBlockCopy): New command. (cmdBlockJob): Support new job info, new abort flag. * tools/virsh.pod (blockcopy, blockjob): Document the new command and flags. --- was 6/18 in v4 v5: add --wait flag, consistent capitalization tools/virsh.c | 212 +++++++++++++++++++++++++++++++++++++++++++++++++++---- tools/virsh.pod | 49 ++++++++++++- 2 files changed, 243 insertions(+), 18 deletions(-) diff --git a/tools/virsh.c b/tools/virsh.c index 95ed7bc..0f79022 100644 --- a/tools/virsh.c +++ b/tools/virsh.c @@ -7517,7 +7517,8 @@ typedef enum { VSH_CMD_BLOCK_JOB_INFO = 1, VSH_CMD_BLOCK_JOB_SPEED = 2, VSH_CMD_BLOCK_JOB_PULL = 3, -} VSH_CMD_BLOCK_JOB_MODE; + VSH_CMD_BLOCK_JOB_COPY = 4, +} vshCmdBlockJobMode; static int blockJobImpl(vshControl *ctl, const vshCmd *cmd, @@ -7528,6 +7529,7 @@ blockJobImpl(vshControl *ctl, const vshCmd *cmd, const char *name, *path; unsigned long bandwidth = 0; int ret = -1; + const char *base = NULL; unsigned int flags = 0; if (!vshConnectionUsability(ctl, ctl->conn)) @@ -7544,22 +7546,39 @@ blockJobImpl(vshControl *ctl, const vshCmd *cmd, goto cleanup; } - if (mode == VSH_CMD_BLOCK_JOB_ABORT) { + switch ((vshCmdBlockJobMode) mode) { + case VSH_CMD_BLOCK_JOB_ABORT: if (vshCommandOptBool(cmd, "async")) flags |= VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC; + if (vshCommandOptBool(cmd, "pivot")) + flags |= VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT; ret = virDomainBlockJobAbort(dom, path, flags); - } else if (mode == VSH_CMD_BLOCK_JOB_INFO) { + break; + case VSH_CMD_BLOCK_JOB_INFO: ret = virDomainGetBlockJobInfo(dom, path, info, 0); - } else if (mode == VSH_CMD_BLOCK_JOB_SPEED) { + break; + case VSH_CMD_BLOCK_JOB_SPEED: ret = virDomainBlockJobSetSpeed(dom, path, bandwidth, 0); - } else if (mode == VSH_CMD_BLOCK_JOB_PULL) { - const char *base = NULL; + break; + case VSH_CMD_BLOCK_JOB_PULL: if (vshCommandOptString(cmd, "base", &base) < 0) goto cleanup; if (base) ret = virDomainBlockRebase(dom, path, base, bandwidth, 0); else ret = virDomainBlockPull(dom, path, bandwidth, 0); + break; + case VSH_CMD_BLOCK_JOB_COPY: + flags |= VIR_DOMAIN_BLOCK_REBASE_COPY; + if (vshCommandOptBool(cmd, "shallow")) + flags |= VIR_DOMAIN_BLOCK_REBASE_SHALLOW; + if (vshCommandOptBool(cmd, "reuse-external")) + flags |= VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT; + if (vshCommandOptBool(cmd, "raw")) + flags |= VIR_DOMAIN_BLOCK_REBASE_COPY_RAW; + if (vshCommandOptString(cmd, "dest", &base) < 0) + goto cleanup; + ret = virDomainBlockRebase(dom, path, base, bandwidth, flags); } cleanup: @@ -7571,6 +7590,158 @@ cleanup: } /* + * "blockcopy" command + */ +static const vshCmdInfo info_block_copy[] = { + {"help", N_("Start a block copy operation.")}, + {"desc", N_("Populate a disk from its backing image.")}, + {NULL, NULL} +}; + +static const vshCmdOptDef opts_block_copy[] = { + {"domain", VSH_OT_DATA, VSH_OFLAG_REQ, N_("domain name, id or uuid")}, + {"path", VSH_OT_DATA, VSH_OFLAG_REQ, N_("fully-qualified path of disk")}, + {"dest", VSH_OT_DATA, VSH_OFLAG_REQ, N_("path of the copy to create")}, + {"bandwidth", VSH_OT_DATA, VSH_OFLAG_NONE, N_("bandwidth limit in MB/s")}, + {"shallow", VSH_OT_BOOL, 0, N_("make the copy share a backing chain")}, + {"reuse-external", VSH_OT_BOOL, 0, N_("reuse existing destination")}, + {"raw", VSH_OT_BOOL, 0, N_("use raw destination file")}, + {"wait", VSH_OT_BOOL, 0, N_("wait for job to reach mirroring phase")}, + {"verbose", VSH_OT_BOOL, 0, N_("with --wait, display the progress")}, + {"timeout", VSH_OT_INT, VSH_OFLAG_NONE, + N_("with --wait, abort if copy exceeds timeout (in seconds)")}, + {"pivot", VSH_OT_BOOL, 0, N_("with --wait, pivot when mirroring starts")}, + {"finish", VSH_OT_BOOL, 0, N_("with --wait, quit when mirroring starts")}, + {"async", VSH_OT_BOOL, 0, + N_("with --wait, don't wait for cancel to finish")}, + {NULL, 0, 0, NULL} +}; + +static bool +cmdBlockCopy(vshControl *ctl, const vshCmd *cmd) +{ + virDomainPtr dom = NULL; + bool ret = false; + bool blocking = vshCommandOptBool(cmd, "wait"); + bool verbose = vshCommandOptBool(cmd, "verbose"); + bool pivot = vshCommandOptBool(cmd, "pivot"); + bool finish = vshCommandOptBool(cmd, "finish"); + int timeout = 0; + struct sigaction sig_action; + struct sigaction old_sig_action; + sigset_t sigmask; + struct timeval start; + struct timeval curr; + const char *path = NULL; + bool quit = false; + int abort_flags = 0; + + if (blocking) { + if (pivot && finish) { + vshError(ctl, "%s", _("cannot mix --pivot and --finish")); + return false; + } + if (vshCommandOptInt(cmd, "timeout", &timeout) > 0) { + if (timeout < 1) { + vshError(ctl, "%s", _("migrate: Invalid timeout")); + return false; + } + + /* Ensure that we can multiply by 1000 without overflowing. */ + if (timeout > INT_MAX / 1000) { + vshError(ctl, "%s", _("migrate: Timeout is too big")); + return false; + } + } + if (vshCommandOptString(cmd, "path", &path) < 0) + return false; + if (vshCommandOptBool(cmd, "async")) + abort_flags |= VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC; + + sigemptyset(&sigmask); + sigaddset(&sigmask, SIGINT); + + intCaught = 0; + sig_action.sa_sigaction = vshCatchInt; + sigemptyset(&sig_action.sa_mask); + sigaction(SIGINT, &sig_action, &old_sig_action); + + GETTIMEOFDAY(&start); + } else if (verbose || vshCommandOptBool(cmd, "timeout") || + vshCommandOptBool(cmd, "async") || pivot || finish) { + vshError(ctl, "%s", _("blocking control options require --wait")); + return false; + } + + if (blockJobImpl(ctl, cmd, NULL, VSH_CMD_BLOCK_JOB_COPY, &dom) < 0) + goto cleanup; + + if (!blocking) { + vshPrint(ctl, "%s", _("Block Copy started")); + ret = true; + goto cleanup; + } + + while (blocking) { + virDomainBlockJobInfo info; + int result = virDomainGetBlockJobInfo(dom, path, &info, 0); + + if (result <= 0) { + vshError(ctl, _("failed to query job for disk %s"), path); + goto cleanup; + } + if (verbose) + print_job_progress(_("Block Copy"), info.end - info.cur, info.end); + if (info.cur == info.end) + break; + + GETTIMEOFDAY(&curr); + if (intCaught || (timeout && + (((int)(curr.tv_sec - start.tv_sec) * 1000 + + (int)(curr.tv_usec - start.tv_usec) / 1000) > + timeout * 1000))) { + vshDebug(ctl, VSH_ERR_DEBUG, + intCaught ? "interrupted" : "timeout"); + intCaught = 0; + timeout = 0; + quit = true; + if (virDomainBlockJobAbort(dom, path, abort_flags) < 0) { + vshError(ctl, _("failed to abort job for disk %s"), path); + goto cleanup; + } + if (abort_flags & VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC) + break; + } else { + usleep(500 * 1000); + } + } + + if (pivot) { + abort_flags |= VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT; + if (virDomainBlockJobAbort(dom, path, abort_flags) < 0) { + vshError(ctl, _("failed to pivot job for disk %s"), path); + goto cleanup; + } + } else if (finish && virDomainBlockJobAbort(dom, path, abort_flags) < 0) { + vshError(ctl, _("failed to finish job for disk %s"), path); + goto cleanup; + } + vshPrint(ctl, "\n%s", + quit ? _("Copy aborted") : + pivot ? _("Successfully pivoted") : + finish ? _("Successfully copied") : + _("Now in mirroring phase")); + + ret = true; +cleanup: + if (dom) + virDomainFree(dom); + if (blocking) + sigaction(SIGINT, &old_sig_action, NULL); + return ret; +} + +/* * "blockpull" command */ static const vshCmdInfo info_block_pull[] = { @@ -7581,8 +7752,8 @@ static const vshCmdInfo info_block_pull[] = { static const vshCmdOptDef opts_block_pull[] = { {"domain", VSH_OT_DATA, VSH_OFLAG_REQ, N_("domain name, id or uuid")}, - {"path", VSH_OT_DATA, VSH_OFLAG_REQ, N_("Fully-qualified path of disk")}, - {"bandwidth", VSH_OT_DATA, VSH_OFLAG_NONE, N_("Bandwidth limit in MB/s")}, + {"path", VSH_OT_DATA, VSH_OFLAG_REQ, N_("fully-qualified path of disk")}, + {"bandwidth", VSH_OT_DATA, VSH_OFLAG_NONE, N_("bandwidth limit in MB/s")}, {"base", VSH_OT_DATA, VSH_OFLAG_NONE, N_("path of backing file in chain for a partial pull")}, {"wait", VSH_OT_BOOL, 0, N_("wait for job to finish")}, @@ -7714,15 +7885,17 @@ static const vshCmdInfo info_block_job[] = { static const vshCmdOptDef opts_block_job[] = { {"domain", VSH_OT_DATA, VSH_OFLAG_REQ, N_("domain name, id or uuid")}, - {"path", VSH_OT_DATA, VSH_OFLAG_REQ, N_("Fully-qualified path of disk")}, + {"path", VSH_OT_DATA, VSH_OFLAG_REQ, N_("fully-qualified path of disk")}, {"abort", VSH_OT_BOOL, VSH_OFLAG_NONE, - N_("Abort the active job on the specified disk")}, + N_("abort the active job on the specified disk")}, {"async", VSH_OT_BOOL, VSH_OFLAG_NONE, N_("don't wait for --abort to complete")}, + {"pivot", VSH_OT_BOOL, VSH_OFLAG_NONE, + N_("conclude and pivot a copy job")}, {"info", VSH_OT_BOOL, VSH_OFLAG_NONE, - N_("Get active job information for the specified disk")}, + N_("get active job information for the specified disk")}, {"bandwidth", VSH_OT_DATA, VSH_OFLAG_NONE, - N_("Set the Bandwidth limit in MB/s")}, + N_("set the Bandwidth limit in MB/s")}, {NULL, 0, 0, NULL} }; @@ -7734,7 +7907,8 @@ cmdBlockJob(vshControl *ctl, const vshCmd *cmd) const char *type; int ret; bool abortMode = (vshCommandOptBool(cmd, "abort") || - vshCommandOptBool(cmd, "async")); + vshCommandOptBool(cmd, "async") || + vshCommandOptBool(cmd, "pivot")); bool infoMode = vshCommandOptBool(cmd, "info"); bool bandwidth = vshCommandOptBool(cmd, "bandwidth"); @@ -7758,10 +7932,17 @@ cmdBlockJob(vshControl *ctl, const vshCmd *cmd) if (ret == 0 || mode != VSH_CMD_BLOCK_JOB_INFO) return true; - if (info.type == VIR_DOMAIN_BLOCK_JOB_TYPE_PULL) + switch (info.type) { + case VIR_DOMAIN_BLOCK_JOB_TYPE_PULL: type = _("Block Pull"); - else + break; + case VIR_DOMAIN_BLOCK_JOB_TYPE_COPY: + type = _("Block Copy"); + break; + default: type = _("Unknown job"); + break; + } print_job_progress(type, info.end - info.cur, info.end); if (info.bandwidth != 0) @@ -17240,6 +17421,7 @@ static const vshCmdDef domManagementCmds[] = { {"autostart", cmdAutostart, opts_autostart, info_autostart, 0}, {"blkdeviotune", cmdBlkdeviotune, opts_blkdeviotune, info_blkdeviotune, 0}, {"blkiotune", cmdBlkiotune, opts_blkiotune, info_blkiotune, 0}, + {"blockcopy", cmdBlockCopy, opts_block_copy, info_block_copy, 0}, {"blockjob", cmdBlockJob, opts_block_job, info_block_job, 0}, {"blockpull", cmdBlockPull, opts_block_pull, info_block_pull, 0}, {"blockresize", cmdBlockResize, opts_block_resize, info_block_resize, 0}, diff --git a/tools/virsh.pod b/tools/virsh.pod index 140d8e8..23324b2 100644 --- a/tools/virsh.pod +++ b/tools/virsh.pod @@ -639,6 +639,47 @@ currently in use by a running domain. Other contexts that require a MAC address of virtual interface (such as I<detach-interface> or I<domif-setlink>) will accept the MAC address printed by this command. +=item B<blockcopy> I<domain> I<path> I<dest> [I<bandwidth>] [I<--shallow>] +[I<--reuse-external>] [I<--raw>] [I<--wait> [I<--verbose] +[{I<--pivot> | I<--finish>}] [I<--timeout> B<seconds>] [I<--async>]] + +Copy a disk backing image chain to I<dest>. By default, this command +flattens the entire chain; but if I<--shallow> is specified, the copy +shares the backing chain. + +If I<--reuse-external> is specified, then I<dest> must exist and have +contents identical to the resulting backing file (that is, it must +start with contents matching the backing file I<disk> if I<--shallow> +is used, otherwise it must start empty); this option is typically used +to set up a relative backing file name in the destination. + +The format of the destination is determined by the first match in the +following list: if I<--raw> is specified, it will be raw; if +I<--reuse-external> is specified, the existing destination is probed +for a format; and in all other cases, the destination format will +match the source format. + +By default, the copy job runs in the background, and consists of two +phases. Initially, the job must copy all data from the source, and +during this phase, the job can only be canceled to revert back to the +source disk, with no guarantees about the destination. After this phase +completes, both the source and the destination remain mirrored until a +call to B<blockjob> with the I<--abort> and I<--pivot> flags pivots over +to the copy, or a call without I<--pivot> leaves the destination as a +faithful copy of that point in time. However, if I<--wait> is specified, +then this command will block until the mirroring phase begins, or cancel +the operation if the optional I<timeout> in seconds elapses or SIGINT is +sent (usually with C<Ctrl-C>). Using I<--verbose> along with I<--wait> +will produce periodic status updates. Using I<--pivot> or I<--finish> +along with I<--wait> will additionally end the job cleanly rather than +leaving things in the mirroring phase. If job cancellation is triggered, +I<--async> will return control to the user as fast as possible, otherwise +the command may continue to block a little while longer until the job +is done cleaning up. + +I<path> specifies fully-qualified path of the disk. +I<bandwidth> specifies copying bandwidth limit in Mbps. + =item B<blockpull> I<domain> I<path> [I<bandwidth>] [I<base>] [I<--wait> [I<--verbose>] [I<--timeout> B<seconds>] [I<--async]] @@ -700,12 +741,12 @@ Both I<--live> and I<--current> flags may be given, but I<--current> is exclusive. If no flag is specified, behavior is different depending on hypervisor. -=item B<blockjob> I<domain> I<path> { [I<--abort>] [I<--async>] | +=item B<blockjob> I<domain> I<path> { [I<--abort>] [I<--async>] [I<--pivot>] | [I<--info>] | [I<bandwidth>] } Manage active block operations. There are three modes: I<--info>, I<bandwidth>, and I<--abort>; I<--info> is default except that I<--async> -implies I<--abort>. +or I<--pivot> implies I<--abort>. I<path> specifies fully-qualified path of the disk; it corresponds to a unique target name (<target dev='name'/>) or source file (<source @@ -714,7 +755,9 @@ also B<domblklist> for listing these names). If I<--abort> is specified, the active job on the specified disk will be aborted. If I<--async> is also specified, this command will return -immediately, rather than waiting for the cancelation to complete. +immediately, rather than waiting for the cancelation to complete. If +I<--pivot> is specified, this requests that an active copy job +be pivoted over to the new copy. If I<--info> is specified, the active job information on the specified disk will be printed. I<bandwidth> can be used to set bandwidth limit for the active job. -- 1.7.7.6

On Mon, Apr 16, 2012 at 23:05:56 -0600, Eric Blake wrote:
Rather than further overloading 'blockpull', I decided to create a new virsh command to expose the new flags of virDomainBlockRebase.
Blocking until the command completes naturally is pointless, since the block copy job is intended to run indefinitely. Instead, I made the command support three --wait modes: by default, it runs until mirroring is started; with --pivot, it pivots as soon as mirroring is started; and with --finish, it aborts (for a clean copy) as soon as mirroring is started.
* tools/virsh.c (VSH_CMD_BLOCK_JOB_COPY): New mode. (blockJobImpl): Support new flags. (cmdBlockCopy): New command. (cmdBlockJob): Support new job info, new abort flag. * tools/virsh.pod (blockcopy, blockjob): Document the new command and flags. ---
was 6/18 in v4 v5: add --wait flag, consistent capitalization
tools/virsh.c | 212 +++++++++++++++++++++++++++++++++++++++++++++++++++---- tools/virsh.pod | 49 ++++++++++++- 2 files changed, 243 insertions(+), 18 deletions(-)
OK, looks good but my comment from patch 4/23 applies here as well. Jirka

In order to track a block copy job across libvirtd restarts, we need to save internal XML that tracks the name of the file holding the mirror. Displaying this name in dumpxml might also be useful to the user, even if we don't yet have a way to (re-) start a domain with mirroring enabled up front. This is done with a new <mirror> sub-element to <disk>, as in: <disk type='file' device='disk'> <driver name='qemu' type='raw'/> <source file='/var/lib/libvirt/images/original.img'/> <mirror file='/var/lib/libvirt/images/copy.img' format='qcow2' ready='yes'/> ... </disk> For now, the element is output-only, in live domains; it is ignored when defining a domain or hot-plugging a disk (since those contexts use VIR_DOMAIN_XML_INACTIVE in parsing). The 'ready' attribute appears when libvirt knows that the job has changed from the initial pulling phase over to the mirroring phase, although absence of the attribute is not a sure indicator of the current phase. If we come up with a way to make qemu start with mirroring enabled, we can relax the xml restriction, and allow <mirror> (but not attribute 'ready') on input. Testing active-only XML meant tweaking the testsuite slightly, but it was worth it. * docs/schemas/domaincommon.rng (diskspec): Add diskMirror. * docs/formatdomain.html.in (elementsDisks): Document it. * src/conf/domain_conf.h (_virDomainDiskDef): New members. * src/conf/domain_conf.c (virDomainDiskDefFree): Clean them. (virDomainDiskDefParseXML): Parse them, but only internally. (virDomainDiskDefFormat): Output them. * tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml: New test file. * tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-mirror.xml: Likewise. * tests/qemuxml2xmltest.c (testInfo): Alter members. (testCompareXMLToXMLHelper): Allow more test control. (mymain): Run new test. --- was 7/18 in v4 v5: allow but ignore <mirror> on inactive, and add tests docs/formatdomain.html.in | 13 ++++ docs/schemas/domaincommon.rng | 24 +++++++- src/conf/domain_conf.c | 62 +++++++++++++++++--- src/conf/domain_conf.h | 4 + .../qemuxml2argvdata/qemuxml2argv-disk-mirror.xml | 42 +++++++++++++ .../qemuxml2xmlout-disk-mirror.xml | 40 +++++++++++++ tests/qemuxml2xmltest.c | 42 ++++++++------ 7 files changed, 198 insertions(+), 29 deletions(-) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml create mode 100644 tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-mirror.xml diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index f8ec6ff..a99281d 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -1296,6 +1296,19 @@ </table> <span class="since">Since 0.9.7</span> </dd> + <dt><code>mirror</code></dt> + <dd> + This element is present if the hypervisor has started a block + copy operation (via the <code>virDomainBlockCopy</code> API), + where the mirror location in attribute <code>file</code> will + eventually have the same contents as the source, and with the + file format in attribute <code>format</code> (which might + differ from the format of the source). If + attribute <code>ready</code> is present, then it is known the + disk is ready to pivot; otherwise, the disk is probably still + copying. For now, this element only valid in output; it is + rejected on input. <span class="since">Since 0.9.12</span> + </dd> <dt><code>target</code></dt> <dd>The <code>target</code> element controls the bus / device under which the disk is exposed to the guest diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 0cc04af..44f0e8c 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -772,6 +772,9 @@ <ref name="driver"/> </optional> <optional> + <ref name='diskMirror'/> + </optional> + <optional> <ref name="diskAuth"/> </optional> <ref name="target"/> @@ -1013,9 +1016,7 @@ </element> </define> <!-- - Disk may use a special driver for access. Currently this is - only defined for Xen for tap/aio and file, but will certainly be - extended in the future, and libvirt doesn't look for specific values. + Disk may use a special driver for access. --> <define name="driver"> <element name="driver"> @@ -3024,6 +3025,23 @@ <empty/> </element> </define> + <define name='diskMirror'> + <element name='mirror'> + <attribute name='file'> + <ref name='absFilePath'/> + </attribute> + <optional> + <attribute name='format'> + <ref name="genericName"/> + </attribute> + </optional> + <optional> + <attribute name='ready'> + <value>yes</value> + </attribute> + </optional> + </element> + </define> <define name="diskAuth"> <element name="auth"> <attribute name="username"> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index a9c5cbc..fe5c92c 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -933,6 +933,8 @@ void virDomainDiskDefFree(virDomainDiskDefPtr def) VIR_FREE(def->dst); VIR_FREE(def->driverName); VIR_FREE(def->driverType); + VIR_FREE(def->mirror); + VIR_FREE(def->mirrorFormat); VIR_FREE(def->auth.username); if (def->auth.secretType == VIR_DOMAIN_DISK_SECRET_TYPE_USAGE) VIR_FREE(def->auth.secret.usage); @@ -3318,6 +3320,9 @@ virDomainDiskDefParseXML(virCapsPtr caps, char *ioeventfd = NULL; char *event_idx = NULL; char *copy_on_read = NULL; + char *mirror = NULL; + char *mirrorFormat = NULL; + bool mirroring = false; char *devaddr = NULL; virStorageEncryptionPtr encryption = NULL; char *serial = NULL; @@ -3353,8 +3358,8 @@ virDomainDiskDefParseXML(virCapsPtr caps, cur = node->children; while (cur != NULL) { if (cur->type == XML_ELEMENT_NODE) { - if ((source == NULL && hosts == NULL) && - (xmlStrEqual(cur->name, BAD_CAST "source"))) { + if (source == NULL && hosts == NULL && + xmlStrEqual(cur->name, BAD_CAST "source")) { sourceNode = cur; @@ -3431,8 +3436,8 @@ virDomainDiskDefParseXML(virCapsPtr caps, those broken apps */ if (source && STREQ(source, "")) VIR_FREE(source); - } else if ((target == NULL) && - (xmlStrEqual(cur->name, BAD_CAST "target"))) { + } else if (target == NULL && + xmlStrEqual(cur->name, BAD_CAST "target")) { target = virXMLPropString(cur, "dev"); bus = virXMLPropString(cur, "bus"); tray = virXMLPropString(cur, "tray"); @@ -3442,8 +3447,8 @@ virDomainDiskDefParseXML(virCapsPtr caps, if (target && STRPREFIX(target, "ioemu:")) memmove(target, target+6, strlen(target)-5); - } else if ((driverName == NULL) && - (xmlStrEqual(cur->name, BAD_CAST "driver"))) { + } else if (driverName == NULL && + xmlStrEqual(cur->name, BAD_CAST "driver")) { driverName = virXMLPropString(cur, "name"); driverType = virXMLPropString(cur, "type"); cachetag = virXMLPropString(cur, "cache"); @@ -3453,6 +3458,22 @@ virDomainDiskDefParseXML(virCapsPtr caps, ioeventfd = virXMLPropString(cur, "ioeventfd"); event_idx = virXMLPropString(cur, "event_idx"); copy_on_read = virXMLPropString(cur, "copy_on_read"); + } else if (mirror == NULL && + xmlStrEqual(cur->name, BAD_CAST "mirror") && + !(flags & VIR_DOMAIN_XML_INACTIVE)) { + char *ready; + mirror = virXMLPropString(cur, "file"); + if (!mirror) { + virDomainReportError(VIR_ERR_XML_ERROR, "%s", + _("mirror requires file name")); + goto error; + } + mirrorFormat = virXMLPropString(cur, "format"); + ready = virXMLPropString(cur, "ready"); + if (ready) { + mirroring = true; + VIR_FREE(ready); + } } else if (xmlStrEqual(cur->name, BAD_CAST "auth")) { authUsername = virXMLPropString(cur, "username"); if (authUsername == NULL) { @@ -3577,8 +3598,8 @@ virDomainDiskDefParseXML(virCapsPtr caps, cur); if (encryption == NULL) goto error; - } else if ((serial == NULL) && - (xmlStrEqual(cur->name, BAD_CAST "serial"))) { + } else if (serial == NULL && + xmlStrEqual(cur->name, BAD_CAST "serial")) { serial = (char *)xmlNodeGetContent(cur); } else if (xmlStrEqual(cur->name, BAD_CAST "boot")) { /* boot is parsed as part of virDomainDeviceInfoParseXML */ @@ -3867,6 +3888,11 @@ virDomainDiskDefParseXML(virCapsPtr caps, driverName = NULL; def->driverType = driverType; driverType = NULL; + def->mirror = mirror; + mirror = NULL; + def->mirrorFormat = mirrorFormat; + mirrorFormat = NULL; + def->mirroring = mirroring; def->encryption = encryption; encryption = NULL; def->serial = serial; @@ -3882,6 +3908,12 @@ virDomainDiskDefParseXML(virCapsPtr caps, !(def->driverName = strdup(caps->defaultDiskDriverName))) goto no_memory; + + if (def->mirror && !def->mirrorFormat && + caps->defaultDiskDriverType && + !(def->mirrorFormat = strdup(caps->defaultDiskDriverType))) + goto no_memory; + if (def->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE && virDomainDiskDefAssignAddress(caps, def) < 0) goto error; @@ -3907,6 +3939,8 @@ cleanup: VIR_FREE(authUsage); VIR_FREE(driverType); VIR_FREE(driverName); + VIR_FREE(mirror); + VIR_FREE(mirrorFormat); VIR_FREE(cachetag); VIR_FREE(error_policy); VIR_FREE(rerror_policy); @@ -10829,6 +10863,18 @@ virDomainDiskDefFormat(virBufferPtr buf, } } + /* For now, mirroring is currently output-only: we only output it + * for live domains, therefore we ignore it on input except for + * the internal parse on libvirtd restart. */ + if (def->mirror && !(flags & VIR_DOMAIN_XML_INACTIVE)) { + virBufferEscapeString(buf, " <mirror file='%s'", def->mirror); + if (def->mirrorFormat) + virBufferAsprintf(buf, " format='%s'", def->mirrorFormat); + if (def->mirroring) + virBufferAddLit(buf, " ready='yes'"); + virBufferAddLit(buf, "/>\n"); + } + virBufferAsprintf(buf, " <target dev='%s' bus='%s'", def->dst, bus); if ((def->device == VIR_DOMAIN_DISK_DEVICE_FLOPPY || diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 0eed60e..5aa8fc1 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -563,6 +563,10 @@ struct _virDomainDiskDef { char *driverName; char *driverType; + char *mirror; + char *mirrorFormat; + bool mirroring; + virDomainBlockIoTuneInfo blkdeviotune; char *serial; diff --git a/tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml b/tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml new file mode 100644 index 0000000..0c95724 --- /dev/null +++ b/tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml @@ -0,0 +1,42 @@ +<domain type='qemu' id='1'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <vcpu>1</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu</emulator> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest1'/> + <mirror file='/dev/HostVG/QEMUGuest1Copy' ready='yes'/> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <disk type='block' device='cdrom'> + <source dev='/dev/HostVG/QEMUGuest2'/> + <target dev='hdc' bus='ide'/> + <readonly/> + <address type='drive' controller='0' bus='1' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <source file='/tmp/data.img'/> + <mirror file='/tmp/copy.img' format='qcow2'/> + <target dev='vda' bus='virtio'/> + </disk> + <disk type='file' device='disk'> + <source file='/tmp/logs.img'/> + <target dev='vdb' bus='virtio'/> + </disk> + <controller type='usb' index='0'/> + <controller type='ide' index='0'/> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-mirror.xml b/tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-mirror.xml new file mode 100644 index 0000000..00111be --- /dev/null +++ b/tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-mirror.xml @@ -0,0 +1,40 @@ +<domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219136</memory> + <currentMemory unit='KiB'>219136</currentMemory> + <vcpu>1</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu</emulator> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <disk type='block' device='cdrom'> + <source dev='/dev/HostVG/QEMUGuest2'/> + <target dev='hdc' bus='ide'/> + <readonly/> + <address type='drive' controller='0' bus='1' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <source file='/tmp/data.img'/> + <target dev='vda' bus='virtio'/> + </disk> + <disk type='file' device='disk'> + <source file='/tmp/logs.img'/> + <target dev='vdb' bus='virtio'/> + </disk> + <controller type='usb' index='0'/> + <controller type='ide' index='0'/> + <memballoon model='virtio'/> + </devices> +</domain> diff --git a/tests/qemuxml2xmltest.c b/tests/qemuxml2xmltest.c index 42e4d9d..9bca066 100644 --- a/tests/qemuxml2xmltest.c +++ b/tests/qemuxml2xmltest.c @@ -54,10 +54,16 @@ testCompareXMLToXMLFiles(const char *inxml, const char *outxml, bool live) return ret; } +enum { + WHEN_INACTIVE = 1, + WHEN_ACTIVE = 2, + WHEN_EITHER = 3, +}; + struct testInfo { const char *name; - int different; - bool inactive_only; + bool different; + int when; }; static int @@ -74,17 +80,15 @@ testCompareXMLToXMLHelper(const void *data) abs_srcdir, info->name) < 0) goto cleanup; - if (info->different) { - ret = testCompareXMLToXMLFiles(xml_in, xml_out, false); - } else { - ret = testCompareXMLToXMLFiles(xml_in, xml_in, false); + if (info->when & WHEN_INACTIVE) { + ret = testCompareXMLToXMLFiles(xml_in, + info->different ? xml_out : xml_in, + false); } - if (!info->inactive_only) { - if (info->different) { - ret = testCompareXMLToXMLFiles(xml_in, xml_out, true); - } else { - ret = testCompareXMLToXMLFiles(xml_in, xml_in, true); - } + if (info->when & WHEN_ACTIVE) { + ret = testCompareXMLToXMLFiles(xml_in, + info->different ? xml_out : xml_in, + true); } cleanup: @@ -102,19 +106,19 @@ mymain(void) if ((driver.caps = testQemuCapsInit()) == NULL) return EXIT_FAILURE; -# define DO_TEST_FULL(name, is_different, inactive) \ +# define DO_TEST_FULL(name, is_different, when) \ do { \ - const struct testInfo info = {name, is_different, inactive}; \ + const struct testInfo info = {name, is_different, when}; \ if (virtTestRun("QEMU XML-2-XML " name, \ 1, testCompareXMLToXMLHelper, &info) < 0) \ ret = -1; \ } while (0) # define DO_TEST(name) \ - DO_TEST_FULL(name, 0, false) + DO_TEST_FULL(name, false, WHEN_EITHER) # define DO_TEST_DIFFERENT(name) \ - DO_TEST_FULL(name, 1, false) + DO_TEST_FULL(name, true, WHEN_EITHER) /* Unset or set all envvars here that are copied in qemudBuildCommandLine * using ADD_ENV_COPY, otherwise these tests may fail due to unexpected @@ -151,6 +155,8 @@ mymain(void) DO_TEST("disk-scsi-device"); DO_TEST("disk-scsi-vscsi"); DO_TEST("disk-scsi-virtio-scsi"); + DO_TEST_FULL("disk-mirror", false, WHEN_ACTIVE); + DO_TEST_FULL("disk-mirror", true, WHEN_INACTIVE); DO_TEST("graphics-listen-network"); DO_TEST("graphics-vnc"); DO_TEST("graphics-vnc-sasl"); @@ -208,8 +214,8 @@ mymain(void) DO_TEST("usb-redir"); DO_TEST("blkdeviotune"); - DO_TEST_FULL("seclabel-dynamic-baselabel", false, true); - DO_TEST_FULL("seclabel-dynamic-override", false, true); + DO_TEST_FULL("seclabel-dynamic-baselabel", false, WHEN_INACTIVE); + DO_TEST_FULL("seclabel-dynamic-override", false, WHEN_INACTIVE); DO_TEST("seclabel-static"); DO_TEST("seclabel-none"); -- 1.7.7.6

On Mon, Apr 16, 2012 at 23:05:57 -0600, Eric Blake wrote:
In order to track a block copy job across libvirtd restarts, we need to save internal XML that tracks the name of the file holding the mirror. Displaying this name in dumpxml might also be useful to the user, even if we don't yet have a way to (re-) start a domain with mirroring enabled up front. This is done with a new <mirror> sub-element to <disk>, as in:
<disk type='file' device='disk'> <driver name='qemu' type='raw'/> <source file='/var/lib/libvirt/images/original.img'/> <mirror file='/var/lib/libvirt/images/copy.img' format='qcow2' ready='yes'/> ... </disk>
For now, the element is output-only, in live domains; it is ignored when defining a domain or hot-plugging a disk (since those contexts use VIR_DOMAIN_XML_INACTIVE in parsing). The 'ready' attribute appears when libvirt knows that the job has changed from the initial pulling phase over to the mirroring phase, although absence of the attribute is not a sure indicator of the current phase. If we come up with a way to make qemu start with mirroring enabled, we can relax the xml restriction, and allow <mirror> (but not attribute 'ready') on input. Testing active-only XML meant tweaking the testsuite slightly, but it was worth it.
* docs/schemas/domaincommon.rng (diskspec): Add diskMirror. * docs/formatdomain.html.in (elementsDisks): Document it. * src/conf/domain_conf.h (_virDomainDiskDef): New members. * src/conf/domain_conf.c (virDomainDiskDefFree): Clean them. (virDomainDiskDefParseXML): Parse them, but only internally. (virDomainDiskDefFormat): Output them. * tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml: New test file. * tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-mirror.xml: Likewise. * tests/qemuxml2xmltest.c (testInfo): Alter members. (testCompareXMLToXMLHelper): Allow more test control. (mymain): Run new test. ---
was 7/18 in v4 v5: allow but ignore <mirror> on inactive, and add tests
docs/formatdomain.html.in | 13 ++++ docs/schemas/domaincommon.rng | 24 +++++++- src/conf/domain_conf.c | 62 +++++++++++++++++--- src/conf/domain_conf.h | 4 + .../qemuxml2argvdata/qemuxml2argv-disk-mirror.xml | 42 +++++++++++++ .../qemuxml2xmlout-disk-mirror.xml | 40 +++++++++++++ tests/qemuxml2xmltest.c | 42 ++++++++------ 7 files changed, 198 insertions(+), 29 deletions(-) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml create mode 100644 tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-mirror.xml
OK, looks good as well with the comment from 4/23 applied. BTW, not removing unrelated () would make the patch a bit smaller for review ;-) Jirka

On 04/18/2012 08:23 AM, Jiri Denemark wrote:
On Mon, Apr 16, 2012 at 23:05:57 -0600, Eric Blake wrote:
In order to track a block copy job across libvirtd restarts, we need to save internal XML that tracks the name of the file holding the mirror. Displaying this name in dumpxml might also be useful to the user, even if we don't yet have a way to (re-) start a domain with mirroring enabled up front. This is done with a new <mirror> sub-element to <disk>, as in:
v5: allow but ignore <mirror> on inactive, and add tests
docs/formatdomain.html.in | 13 ++++ docs/schemas/domaincommon.rng | 24 +++++++- src/conf/domain_conf.c | 62 +++++++++++++++++--- src/conf/domain_conf.h | 4 + .../qemuxml2argvdata/qemuxml2argv-disk-mirror.xml | 42 +++++++++++++ .../qemuxml2xmlout-disk-mirror.xml | 40 +++++++++++++ tests/qemuxml2xmltest.c | 42 ++++++++------ 7 files changed, 198 insertions(+), 29 deletions(-) create mode 100644 tests/qemuxml2argvdata/qemuxml2argv-disk-mirror.xml create mode 100644 tests/qemuxml2xmloutdata/qemuxml2xmlout-disk-mirror.xml
OK, looks good as well with the comment from 4/23 applied. BTW, not removing unrelated () would make the patch a bit smaller for review ;-)
I'll work on splitting the () trimming into a separate patch, which can be applied now, similar to what I already did for virsh in patch 1/23. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

For now, disk migration via block copy job is not implemented. But when we do implement it, we have to deal with the fact that qemu does not provide an easy way to re-start a qemu process with mirroring still intact (it _might_ be possible by using qemu -S then an initial 'drive-mirror' with disk reuse before starting the domain, but that gets hairy). Even something like 'virDomainSave' becomes hairy, if you realize the implications that 'virDomainRestore' would be stuck with recreating the same mirror layout. But if we step back and look at the bigger picture, we realize that the initial client of live storage migration via disk mirroring is oVirt, which always uses transient domains, and that if a transient domain is destroyed while a mirror exists, oVirt can easily restart the storage migration by creating a new domain that visits just the source storage, with no loss in data. We can make life a lot easier by being cowards, and forbidding certain operations on a domain. This patch guarantees that we never get in a state where we would have to restart a domain with a mirroring block copy, by preventing saves, snapshots, hot unplug of a disk in use, and conversion to a persistent domain (thankfully, it is still relatively easy to 'virsh undefine' a running domain to temporarily make it transient, run tests on 'virsh blockcopy', then 'virsh define' to restore the persistence). The change to qemudDomainDefine looks a bit odd for undoing an assignment, rather than probing up front to avoid the assignment, but this is because of how virDomainAssignDef combines both a lookup and assignment into a single function call. * src/conf/domain_conf.h (virDomainHasDiskMirror): New prototype. * src/conf/domain_conf.c (virDomainHasDiskMirror): New function. * src/libvirt_private.syms (domain_conf.h): Export it. * src/qemu/qemu_driver.c (qemuDomainSaveInternal) (qemuDomainSnapshotCreateXML, qemuDomainRevertToSnapshot) (qemuDomainBlockJobImpl, qemudDomainDefine): Prevent dangerous actions while block copy is already in action. * src/qemu/qemu_hotplug.c (qemuDomainDetachDiskDevice): Likewise. --- was 8/18 in v4 v5: improve commit message src/conf/domain_conf.c | 12 ++++++++++++ src/conf/domain_conf.h | 1 + src/libvirt_private.syms | 1 + src/qemu/qemu_driver.c | 29 +++++++++++++++++++++++++++++ src/qemu/qemu_hotplug.c | 7 +++++++ 5 files changed, 50 insertions(+), 0 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index fe5c92c..1b3f34a 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -7185,6 +7185,18 @@ virDomainDiskRemoveByName(virDomainDefPtr def, const char *name) return virDomainDiskRemove(def, i); } +/* Return true if VM has at least one disk involved in a current block + * copy job (that is, with a <mirror> element in the disk xml). */ +bool +virDomainHasDiskMirror(virDomainObjPtr vm) +{ + int i; + for (i = 0; i < vm->def->ndisks; i++) + if (vm->def->disks[i]->mirror) + return true; + return false; +} + int virDomainNetInsert(virDomainDefPtr def, virDomainNetDefPtr net) { if (VIR_REALLOC_N(def->nets, def->nnets + 1) < 0) diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 5aa8fc1..9d74f44 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -1954,6 +1954,7 @@ virDomainDiskDefPtr virDomainDiskRemove(virDomainDefPtr def, size_t i); virDomainDiskDefPtr virDomainDiskRemoveByName(virDomainDefPtr def, const char *name); +bool virDomainHasDiskMirror(virDomainObjPtr vm); int virDomainNetIndexByMac(virDomainDefPtr def, const unsigned char *mac); int virDomainNetInsert(virDomainDefPtr def, virDomainNetDefPtr net); diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 8d29e75..4ed2f0c 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -354,6 +354,7 @@ virDomainGraphicsSpiceZlibCompressionTypeFromString; virDomainGraphicsSpiceZlibCompressionTypeToString; virDomainGraphicsTypeFromString; virDomainGraphicsTypeToString; +virDomainHasDiskMirror; virDomainHostdevDefAlloc; virDomainHostdevDefClear; virDomainHostdevDefFree; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index c3555ca..582eafa 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -2572,6 +2572,11 @@ qemuDomainSaveInternal(struct qemud_driver *driver, virDomainPtr dom, "%s", _("domain is marked for auto destroy")); goto cleanup; } + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, "%s", + _("domain has active block copy job")); + goto cleanup; + } memset(&header, 0, sizeof(header)); memcpy(header.magic, QEMUD_SAVE_PARTIAL, sizeof(header.magic)); @@ -4964,6 +4969,12 @@ static virDomainPtr qemudDomainDefine(virConnectPtr conn, const char *xml) { goto cleanup; } def = NULL; + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, "%s", + _("domain has active block copy job")); + virDomainObjAssignDef(vm, NULL, false); + goto cleanup; + } vm->persistent = 1; if (virDomainSaveConfig(driver->configDir, @@ -10279,6 +10290,12 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain, "%s", _("domain is marked for auto destroy")); goto cleanup; } + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, "%s", + _("domain has active block copy job")); + goto cleanup; + } + if (!vm->persistent && (flags & VIR_DOMAIN_SNAPSHOT_CREATE_HALT)) { qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", _("cannot halt after transient domain snapshot")); @@ -10886,6 +10903,11 @@ static int qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, _("no domain with matching uuid '%s'"), uuidstr); goto cleanup; } + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, "%s", + _("domain has active block copy job")); + goto cleanup; + } snap = virDomainSnapshotFindByName(&vm->snapshots, snapshot->name); if (!snap) { @@ -11661,6 +11683,13 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, goto cleanup; disk = vm->def->disks[idx]; + if (mode == BLOCK_JOB_PULL && disk->mirror) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, + _("disk '%s' already in active block copy job"), + disk->dst); + goto cleanup; + } + if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_MODIFY) < 0) goto cleanup; diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 7cf7b90..4c5b863 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -1721,6 +1721,13 @@ int qemuDomainDetachDiskDevice(struct qemud_driver *driver, detach = vm->def->disks[i]; + if (detach->mirror) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, + _("disk '%s' is in an active block copy job"), + detach->dst); + goto cleanup; + } + if (qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_DEVICES)) { if (virCgroupForDomain(driver->cgroup, vm->def->name, &cgroup, 0) != 0) { qemuReportError(VIR_ERR_INTERNAL_ERROR, -- 1.7.7.6

On Mon, Apr 16, 2012 at 23:05:58 -0600, Eric Blake wrote:
For now, disk migration via block copy job is not implemented. But when we do implement it, we have to deal with the fact that qemu does not provide an easy way to re-start a qemu process with mirroring still intact (it _might_ be possible by using qemu -S then an initial 'drive-mirror' with disk reuse before starting the domain, but that gets hairy). Even something like 'virDomainSave' becomes hairy, if you realize the implications that 'virDomainRestore' would be stuck with recreating the same mirror layout.
But if we step back and look at the bigger picture, we realize that the initial client of live storage migration via disk mirroring is oVirt, which always uses transient domains, and that if a transient domain is destroyed while a mirror exists, oVirt can easily restart the storage migration by creating a new domain that visits just the source storage, with no loss in data.
We can make life a lot easier by being cowards, and forbidding certain operations on a domain. This patch guarantees that we never get in a state where we would have to restart a domain with a mirroring block copy, by preventing saves, snapshots, hot unplug of a disk in use, and conversion to a persistent domain (thankfully, it is still relatively easy to 'virsh undefine' a running domain to temporarily make it transient, run tests on 'virsh blockcopy', then 'virsh define' to restore the persistence).
The change to qemudDomainDefine looks a bit odd for undoing an assignment, rather than probing up front to avoid the assignment, but this is because of how virDomainAssignDef combines both a lookup and assignment into a single function call.
* src/conf/domain_conf.h (virDomainHasDiskMirror): New prototype. * src/conf/domain_conf.c (virDomainHasDiskMirror): New function. * src/libvirt_private.syms (domain_conf.h): Export it. * src/qemu/qemu_driver.c (qemuDomainSaveInternal) (qemuDomainSnapshotCreateXML, qemuDomainRevertToSnapshot) (qemuDomainBlockJobImpl, qemudDomainDefine): Prevent dangerous actions while block copy is already in action. * src/qemu/qemu_hotplug.c (qemuDomainDetachDiskDevice): Likewise. ---
was 8/18 in v4 v5: improve commit message
src/conf/domain_conf.c | 12 ++++++++++++ src/conf/domain_conf.h | 1 + src/libvirt_private.syms | 1 + src/qemu/qemu_driver.c | 29 +++++++++++++++++++++++++++++ src/qemu/qemu_hotplug.c | 7 +++++++ 5 files changed, 50 insertions(+), 0 deletions(-)
OK, nice commit message improvement. Jirka

The new block copy storage migration sequence requires both the 'drive-mirror' and 'drive-reopen' monitor commands, which have been proposed[1] for qemu 1.1. Someday (probably for qemu 1.2), these commands may also be added to the 'transaction' monitor command for even more power, but we don't need to worry about that now. [1]https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg01630.html * src/qemu/qemu_capabilities.h (QEMU_CAPS_DRIVE_MIRROR) (QEMU_CAPS_DRIVE_REOPEN): New bits. * src/qemu/qemu_capabilities.c (qemuCaps): Name them. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONCheckCommands): Set them. (qemuMonitorJSONDiskSnapshot): Fix formatting issues. (qemuMonitorJSONDriveMirror, qemuMonitorDriveReopen): New functions. * src/qemu/qemu_monitor_json.h (qemuMonitorJSONDriveMirror) (qemuMonitorDriveReopen): Declare them. * src/qemu/qemu_monitor.c (qemuMonitorDriveMirror) (qemuMonitorDriveReopen): New passthroughs. * src/qemu/qemu_monitor.h (qemuMonitorDriveMirror) (qemuMonitorDriveReopen): Declare them. --- merge 1/18 and 9/18 from v4 v5: adjust to latest upstream qemu proposal, which requires 'full' argument and no longer permits interaction with 'transaction' src/qemu/qemu_capabilities.c | 2 + src/qemu/qemu_capabilities.h | 2 + src/qemu/qemu_monitor.c | 49 ++++++++++++++++++++++++++++++++ src/qemu/qemu_monitor.h | 11 +++++++ src/qemu/qemu_monitor_json.c | 63 ++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_monitor_json.h | 18 ++++++++++- 6 files changed, 143 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index fd4ca4d..0e5dd68 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -159,6 +159,8 @@ VIR_ENUM_IMPL(qemuCaps, QEMU_CAPS_LAST, "block-job-sync", /* 90 */ "block-job-async", + "drive-mirror", + "drive-reopen", ); struct qemu_feature_flags { diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index 6550d59..e6fa519 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -126,6 +126,8 @@ enum qemuCapsFlags { QEMU_CAPS_TRANSACTION = 89, /* transaction monitor command */ QEMU_CAPS_BLOCKJOB_SYNC = 90, /* RHEL 6.2 block_job_cancel */ QEMU_CAPS_BLOCKJOB_ASYNC = 91, /* qemu 1.1 block-job-cancel */ + QEMU_CAPS_DRIVE_MIRROR = 92, /* drive-mirror monitor command */ + QEMU_CAPS_DRIVE_REOPEN = 93, /* drive-reopen monitor command */ QEMU_CAPS_LAST, /* this must always be the last item */ }; diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 2f66c46..34c7926 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -2685,6 +2685,31 @@ qemuMonitorDiskSnapshot(qemuMonitorPtr mon, virJSONValuePtr actions, return ret; } +/* Add the drive-mirror action to a transaction. */ +int +qemuMonitorDriveMirror(qemuMonitorPtr mon, + const char *device, const char *file, + const char *format, unsigned int flags) +{ + int ret = -1; + + VIR_DEBUG("mon=%p, device=%s, file=%s, format=%s, flags=%x", + mon, device, file, format, flags); + + if (!mon) { + qemuReportError(VIR_ERR_INVALID_ARG, "%s", + _("monitor must not be NULL")); + return -1; + } + + if (mon->json) + ret = qemuMonitorJSONDriveMirror(mon, device, file, format, flags); + else + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("drive-mirror requires JSON monitor")); + return ret; +} + /* Use the transaction QMP command to run atomic snapshot commands. */ int qemuMonitorTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) @@ -2701,6 +2726,30 @@ qemuMonitorTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) return ret; } +/* Use the drive-reopen monitor command. */ +int +qemuMonitorDriveReopen(qemuMonitorPtr mon, const char *device, + const char *file, const char *format) +{ + int ret = -1; + + VIR_DEBUG("mon=%p, device=%s, file=%s, format=%s", + mon, device, file, format); + + if (!mon) { + qemuReportError(VIR_ERR_INVALID_ARG, "%s", + _("monitor must not be NULL")); + return -1; + } + + if (mon->json) + ret = qemuMonitorJSONDriveReopen(mon, device, file, format); + else + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("drive-reopen requires JSON monitor")); + return ret; +} + int qemuMonitorArbitraryCommand(qemuMonitorPtr mon, const char *cmd, char **reply, diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index f3cdcdd..1b4b130 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -510,6 +510,17 @@ int qemuMonitorDiskSnapshot(qemuMonitorPtr mon, bool reuse); int qemuMonitorTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); +int qemuMonitorDriveMirror(qemuMonitorPtr mon, + const char *device, + const char *file, + const char *format, + unsigned int flags) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); +int qemuMonitorDriveReopen(qemuMonitorPtr mon, + const char *device, + const char *file, + const char *format) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); int qemuMonitorArbitraryCommand(qemuMonitorPtr mon, const char *cmd, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index eb58f13..e0ea505 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -987,6 +987,10 @@ qemuMonitorJSONCheckCommands(qemuMonitorPtr mon, qemuCapsSet(qemuCaps, QEMU_CAPS_BLOCKJOB_SYNC); else if (STREQ(name, "block-job-cancel")) qemuCapsSet(qemuCaps, QEMU_CAPS_BLOCKJOB_ASYNC); + else if (STREQ(name, "drive-mirror")) + qemuCapsSet(qemuCaps, QEMU_CAPS_DRIVE_MIRROR); + else if (STREQ(name, "drive-reopen")) + qemuCapsSet(qemuCaps, QEMU_CAPS_DRIVE_REOPEN); } ret = 0; @@ -3199,6 +3203,38 @@ cleanup: } int +qemuMonitorJSONDriveMirror(qemuMonitorPtr mon, + const char *device, const char *file, + const char *format, unsigned int flags) +{ + int ret = -1; + virJSONValuePtr cmd; + virJSONValuePtr reply = NULL; + bool shallow = (flags & VIR_DOMAIN_BLOCK_REBASE_SHALLOW) != 0; + bool reuse = (flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT) != 0; + + cmd = qemuMonitorJSONMakeCommand("drive-mirror", + "s:device", device, + "s:target", file, + "b:full", !shallow, + "s:mode", + reuse ? "existing" : "absolute-paths", + format ? "s:format" : NULL, format, + NULL); + if (!cmd) + return -1; + + if ((ret = qemuMonitorJSONCommand(mon, cmd, &reply)) < 0) + goto cleanup; + ret = qemuMonitorJSONCheckError(cmd, reply); + +cleanup: + virJSONValueFree(cmd); + virJSONValueFree(reply); + return ret; +} + +int qemuMonitorJSONTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) { int ret = -1; @@ -3226,6 +3262,33 @@ cleanup: return ret; } +int +qemuMonitorJSONDriveReopen(qemuMonitorPtr mon, const char *device, + const char *file, const char *format) +{ + int ret; + virJSONValuePtr cmd; + virJSONValuePtr reply = NULL; + + cmd = qemuMonitorJSONMakeCommand("drive-reopen", + "s:device", device, + "s:new-image-file", file, + format ? "s:format" : NULL, format, + NULL); + if (!cmd) + return -1; + + if ((ret = qemuMonitorJSONCommand(mon, cmd, &reply)) < 0) + goto cleanup; + + ret = qemuMonitorJSONCheckError(cmd, reply); + +cleanup: + virJSONValueFree(cmd); + virJSONValueFree(reply); + return ret; +} + int qemuMonitorJSONArbitraryCommand(qemuMonitorPtr mon, const char *cmd_str, char **reply_str, diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index aacbb5f..d93c37d 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -230,8 +230,22 @@ int qemuMonitorJSONDiskSnapshot(qemuMonitorPtr mon, const char *device, const char *file, const char *format, - bool reuse); -int qemuMonitorJSONTransaction(qemuMonitorPtr mon, virJSONValuePtr actions); + bool reuse) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(3) + ATTRIBUTE_NONNULL(4) ATTRIBUTE_NONNULL(5); +int qemuMonitorJSONTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); +int qemuMonitorJSONDriveMirror(qemuMonitorPtr mon, + const char *device, + const char *file, + const char *format, + unsigned int flags) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); +int qemuMonitorJSONDriveReopen(qemuMonitorPtr mon, + const char *device, + const char *file, + const char *format) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); int qemuMonitorJSONArbitraryCommand(qemuMonitorPtr mon, const char *cmd_str, -- 1.7.7.6

On 04/17/2012 01:05 AM, Eric Blake wrote:
The new block copy storage migration sequence requires both the 'drive-mirror' and 'drive-reopen' monitor commands, which have been proposed[1] for qemu 1.1. Someday (probably for qemu 1.2), these commands may also be added to the 'transaction' monitor command for even more power, but we don't need to worry about that now.
[1]https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg01630.html
* src/qemu/qemu_capabilities.h (QEMU_CAPS_DRIVE_MIRROR) (QEMU_CAPS_DRIVE_REOPEN): New bits. * src/qemu/qemu_capabilities.c (qemuCaps): Name them. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONCheckCommands): Set them. (qemuMonitorJSONDiskSnapshot): Fix formatting issues. (qemuMonitorJSONDriveMirror, qemuMonitorDriveReopen): New functions. * src/qemu/qemu_monitor_json.h (qemuMonitorJSONDriveMirror) (qemuMonitorDriveReopen): Declare them. * src/qemu/qemu_monitor.c (qemuMonitorDriveMirror) (qemuMonitorDriveReopen): New passthroughs. * src/qemu/qemu_monitor.h (qemuMonitorDriveMirror) (qemuMonitorDriveReopen): Declare them. ---
merge 1/18 and 9/18 from v4 v5: adjust to latest upstream qemu proposal, which requires 'full' argument and no longer permits interaction with 'transaction'
src/qemu/qemu_capabilities.c | 2 + src/qemu/qemu_capabilities.h | 2 + src/qemu/qemu_monitor.c | 49 ++++++++++++++++++++++++++++++++ src/qemu/qemu_monitor.h | 11 +++++++ src/qemu/qemu_monitor_json.c | 63 ++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_monitor_json.h | 18 ++++++++++- 6 files changed, 143 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index fd4ca4d..0e5dd68 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -159,6 +159,8 @@ VIR_ENUM_IMPL(qemuCaps, QEMU_CAPS_LAST,
"block-job-sync", /* 90 */ "block-job-async", + "drive-mirror", + "drive-reopen", );
struct qemu_feature_flags { diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index 6550d59..e6fa519 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -126,6 +126,8 @@ enum qemuCapsFlags { QEMU_CAPS_TRANSACTION = 89, /* transaction monitor command */ QEMU_CAPS_BLOCKJOB_SYNC = 90, /* RHEL 6.2 block_job_cancel */ QEMU_CAPS_BLOCKJOB_ASYNC = 91, /* qemu 1.1 block-job-cancel */ + QEMU_CAPS_DRIVE_MIRROR = 92, /* drive-mirror monitor command */ + QEMU_CAPS_DRIVE_REOPEN = 93, /* drive-reopen monitor command */
QEMU_CAPS_LAST, /* this must always be the last item */ }; diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 2f66c46..34c7926 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -2685,6 +2685,31 @@ qemuMonitorDiskSnapshot(qemuMonitorPtr mon, virJSONValuePtr actions, return ret; }
+/* Add the drive-mirror action to a transaction. */
Is this comment now incorrect? (You say the new qemu proposal doesn't allow interaction with "transaction")
+int +qemuMonitorDriveMirror(qemuMonitorPtr mon, + const char *device, const char *file, + const char *format, unsigned int flags) +{ + int ret = -1; + + VIR_DEBUG("mon=%p, device=%s, file=%s, format=%s, flags=%x", + mon, device, file, format, flags);
Should we be concerned about NULL string pointers for args that aren't qualified as ATTRIBUTE_NONNULL?
+ + if (!mon) {
I had thought that declaring an arg ATTRIBUTE_NONNULL meant that you didn't need to check for NULL in the code, so when I saw this, I was confused and decided to investigate. I found the following gcc bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17308 conveniently with a comment posted by on "Eric Blake" (any relation? :-) So does this mean that all of our ATTRIBUTE_NONNULL declarations are pointless?
+ qemuReportError(VIR_ERR_INVALID_ARG, "%s", + _("monitor must not be NULL")); + return -1; + } + + if (mon->json) + ret = qemuMonitorJSONDriveMirror(mon, device, file, format, flags); + else + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("drive-mirror requires JSON monitor")); + return ret; +} + /* Use the transaction QMP command to run atomic snapshot commands. */ int qemuMonitorTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) @@ -2701,6 +2726,30 @@ qemuMonitorTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) return ret; }
+/* Use the drive-reopen monitor command. */ +int +qemuMonitorDriveReopen(qemuMonitorPtr mon, const char *device, + const char *file, const char *format) +{ + int ret = -1; + + VIR_DEBUG("mon=%p, device=%s, file=%s, format=%s", + mon, device, file, format); + + if (!mon) { + qemuReportError(VIR_ERR_INVALID_ARG, "%s", + _("monitor must not be NULL")); + return -1; + } + + if (mon->json) + ret = qemuMonitorJSONDriveReopen(mon, device, file, format); + else + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("drive-reopen requires JSON monitor")); + return ret; +} + int qemuMonitorArbitraryCommand(qemuMonitorPtr mon, const char *cmd, char **reply, diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index f3cdcdd..1b4b130 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -510,6 +510,17 @@ int qemuMonitorDiskSnapshot(qemuMonitorPtr mon, bool reuse); int qemuMonitorTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); +int qemuMonitorDriveMirror(qemuMonitorPtr mon, + const char *device, + const char *file, + const char *format, + unsigned int flags) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); +int qemuMonitorDriveReopen(qemuMonitorPtr mon, + const char *device, + const char *file, + const char *format) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3);
int qemuMonitorArbitraryCommand(qemuMonitorPtr mon, const char *cmd, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index eb58f13..e0ea505 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -987,6 +987,10 @@ qemuMonitorJSONCheckCommands(qemuMonitorPtr mon, qemuCapsSet(qemuCaps, QEMU_CAPS_BLOCKJOB_SYNC); else if (STREQ(name, "block-job-cancel")) qemuCapsSet(qemuCaps, QEMU_CAPS_BLOCKJOB_ASYNC); + else if (STREQ(name, "drive-mirror")) + qemuCapsSet(qemuCaps, QEMU_CAPS_DRIVE_MIRROR); + else if (STREQ(name, "drive-reopen")) + qemuCapsSet(qemuCaps, QEMU_CAPS_DRIVE_REOPEN); }
ret = 0; @@ -3199,6 +3203,38 @@ cleanup: }
int +qemuMonitorJSONDriveMirror(qemuMonitorPtr mon, + const char *device, const char *file, + const char *format, unsigned int flags) +{ + int ret = -1; + virJSONValuePtr cmd; + virJSONValuePtr reply = NULL; + bool shallow = (flags & VIR_DOMAIN_BLOCK_REBASE_SHALLOW) != 0; + bool reuse = (flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT) != 0; + + cmd = qemuMonitorJSONMakeCommand("drive-mirror", + "s:device", device, + "s:target", file, + "b:full", !shallow, + "s:mode", + reuse ? "existing" : "absolute-paths", + format ? "s:format" : NULL, format, + NULL); + if (!cmd) + return -1; + + if ((ret = qemuMonitorJSONCommand(mon, cmd, &reply)) < 0) + goto cleanup; + ret = qemuMonitorJSONCheckError(cmd, reply); + +cleanup: + virJSONValueFree(cmd); + virJSONValueFree(reply); + return ret; +} + +int qemuMonitorJSONTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) { int ret = -1; @@ -3226,6 +3262,33 @@ cleanup: return ret; }
+int +qemuMonitorJSONDriveReopen(qemuMonitorPtr mon, const char *device, + const char *file, const char *format) +{ + int ret; + virJSONValuePtr cmd; + virJSONValuePtr reply = NULL; + + cmd = qemuMonitorJSONMakeCommand("drive-reopen", + "s:device", device, + "s:new-image-file", file, + format ? "s:format" : NULL, format, + NULL); + if (!cmd) + return -1; + + if ((ret = qemuMonitorJSONCommand(mon, cmd, &reply)) < 0) + goto cleanup; + + ret = qemuMonitorJSONCheckError(cmd, reply); + +cleanup: + virJSONValueFree(cmd); + virJSONValueFree(reply); + return ret; +} + int qemuMonitorJSONArbitraryCommand(qemuMonitorPtr mon, const char *cmd_str, char **reply_str, diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index aacbb5f..d93c37d 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -230,8 +230,22 @@ int qemuMonitorJSONDiskSnapshot(qemuMonitorPtr mon, const char *device, const char *file, const char *format, - bool reuse); -int qemuMonitorJSONTransaction(qemuMonitorPtr mon, virJSONValuePtr actions); + bool reuse) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(3) + ATTRIBUTE_NONNULL(4) ATTRIBUTE_NONNULL(5); +int qemuMonitorJSONTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); +int qemuMonitorJSONDriveMirror(qemuMonitorPtr mon, + const char *device, + const char *file, + const char *format, + unsigned int flags) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); +int qemuMonitorJSONDriveReopen(qemuMonitorPtr mon, + const char *device, + const char *file, + const char *format) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3);
int qemuMonitorJSONArbitraryCommand(qemuMonitorPtr mon, const char *cmd_str,
I'll avoid using the "A word" because I don't know the status of qemu upstream, and don't want to confuse patchchecker. It all looks okay to me though, although the ATTRIBUTE_NONNULL stuff was a bit of a revelation (probably I'm the only one who didn't already know about it...).

On 04/18/2012 02:18 PM, Laine Stump wrote:
On 04/17/2012 01:05 AM, Eric Blake wrote:
The new block copy storage migration sequence requires both the 'drive-mirror' and 'drive-reopen' monitor commands, which have been proposed[1] for qemu 1.1. Someday (probably for qemu 1.2), these commands may also be added to the 'transaction' monitor command for even more power, but we don't need to worry about that now.
[1]https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg01630.html
+++ b/src/qemu/qemu_monitor.c @@ -2685,6 +2685,31 @@ qemuMonitorDiskSnapshot(qemuMonitorPtr mon, virJSONValuePtr actions, return ret; }
+/* Add the drive-mirror action to a transaction. */
Is this comment now incorrect? (You say the new qemu proposal doesn't allow interaction with "transaction")
Indeed. I'll fix that.
+int +qemuMonitorDriveMirror(qemuMonitorPtr mon, + const char *device, const char *file, + const char *format, unsigned int flags) +{ + int ret = -1; + + VIR_DEBUG("mon=%p, device=%s, file=%s, format=%s, flags=%x", + mon, device, file, format, flags);
Should we be concerned about NULL string pointers for args that aren't qualified as ATTRIBUTE_NONNULL?
Which is just format, but yes, I need NULLSTR(format). glibc doesn't care, but other platforms might.
+ + if (!mon) {
I had thought that declaring an arg ATTRIBUTE_NONNULL meant that you didn't need to check for NULL in the code, so when I saw this, I was confused and decided to investigate. I found the following gcc bug:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17308
conveniently with a comment posted by on "Eric Blake" (any relation? :-)
A blast from my past :)
So does this mean that all of our ATTRIBUTE_NONNULL declarations are pointless?
Not entirely pointless, since Coverity, clang, and friends are smarter than gcc, and will actually use ATTRIBUTE_NONNULL to point out bugs that gcc currently misses. But yes, that gcc bug is quite annoying (and one of the sad reasons that llvm is gaining popularity, even though it's license is not as Free [as in freedom] as gcc's). And it gives us all the more reason to keep running Coverity reports.
I'll avoid using the "A word" because I don't know the status of qemu upstream, and don't want to confuse patchchecker. It all looks okay to me though, although the ATTRIBUTE_NONNULL stuff was a bit of a revelation (probably I'm the only one who didn't already know about it...).
On IRC today, Paolo told me that 'drive-mirror' is looking robust enough to make it into qemu 1.1, but 'drive-reopen' is floundering and might be delayed. If that ends up being the case, it would mean that libvirt could _still_ provide copy semantics (virDomainBlockRebase(,VIR_DOMAIN_BLOCK_REBASE_COPY) followed by virDomainBlockJobAbort(,0) after reaching the mirroring phase), but would lose out on the ability to do migration (virDomainBlockRebase(,VIR_DOMAIN_BLOCK_REBASE_COPY) followed by virDomainBlockJobAbort(,VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)). As such, it may mean that I should consider splitting this patch into two pieces, to be applied independently according to when things are applied in upstream qemu (and to backport both to RHEL, where both commands exist as a __com.redhat_ variant). I _still_ think that patches 4-6 are in solid shape - we know what API changes to support, once we can map them to a qemu implementation (and better yet, we can map them to other hypervisors), bringing us back to my claim that 7 and beyond are still waiting on qemu stability. It's not the nicest to commit to an API without an implementation, but we've done it before, and I'm quite confident that no matter what qemu finally settles on, at least our API decision is robust enough to support that qemu implementation. Paolo, any corrections? -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

Il 18/04/2012 23:40, Eric Blake ha scritto:
It's not the nicest to commit to an API without an implementation, but we've done it before, and I'm quite confident that no matter what qemu finally settles on, at least our API decision is robust enough to support that qemu implementation.
Paolo, any corrections?
I think so. Paolo

Handle the new type of block copy event and info. Of course, this patch does nothing until a later patch actually allows the creation/abort of a block copy job. And we'd really love to have an event without having to poll for the transition between pull and mirroring, but that will have to wait for qemu. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONHandleBlockJobImpl) (qemuMonitorJSONGetBlockJobInfoOne): Translate new job type. * src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Snoop a successful info query to save effort on a pivot request. --- was 10/18 in v4 no real change src/qemu/qemu_driver.c | 6 ++++++ src/qemu/qemu_monitor_json.c | 4 ++++ 2 files changed, 10 insertions(+), 0 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 582eafa..e1584c6 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11712,6 +11712,12 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, if (ret < 0) goto endjob; + /* Snoop block copy operations, so future cancel operations can + * avoid checking if pivot is safe. */ + if (mode == BLOCK_JOB_INFO && ret == 1 && disk->mirror && + info->cur == info->end && info->type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) + disk->mirroring = true; + /* With synchronous block cancel, we must synthesize an event, and * we silently ignore the ABORT_ASYNC flag. With asynchronous * block cancel, the event will come from qemu, but without the diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index e0ea505..6bce701 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -797,6 +797,8 @@ qemuMonitorJSONHandleBlockJobImpl(qemuMonitorPtr mon, if (STREQ(type_str, "stream")) type = VIR_DOMAIN_BLOCK_JOB_TYPE_PULL; + else if (STREQ(type_str, "mirror")) + type = VIR_DOMAIN_BLOCK_JOB_TYPE_COPY; switch ((virConnectDomainEventBlockJobStatus) event) { case VIR_DOMAIN_BLOCK_JOB_COMPLETED: @@ -3415,6 +3417,8 @@ static int qemuMonitorJSONGetBlockJobInfoOne(virJSONValuePtr entry, } if (STREQ(type, "stream")) info->type = VIR_DOMAIN_BLOCK_JOB_TYPE_PULL; + else if (STREQ(type, "mirror")) + info->type = VIR_DOMAIN_BLOCK_JOB_TYPE_COPY; else info->type = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN; -- 1.7.7.6

On Mon, Apr 16, 2012 at 23:06:00 -0600, Eric Blake wrote:
Handle the new type of block copy event and info. Of course, this patch does nothing until a later patch actually allows the creation/abort of a block copy job. And we'd really love to have an event without having to poll for the transition between pull and mirroring, but that will have to wait for qemu.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONHandleBlockJobImpl) (qemuMonitorJSONGetBlockJobInfoOne): Translate new job type. * src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Snoop a successful info query to save effort on a pivot request. ---
was 10/18 in v4 no real change
src/qemu/qemu_driver.c | 6 ++++++ src/qemu/qemu_monitor_json.c | 4 ++++ 2 files changed, 10 insertions(+), 0 deletions(-)
OK Jirka

This is the bare minimum to end a copy job (of course, until a later patch adds the ability to start a copy job, this patch doesn't do much in isolation; I've just split the patches to ease the review). This patch intentionally avoids SELinux, lock manager, and audit actions. Also, if libvirtd restarts at the exact moment that a 'drive-reopen' is in flight, the only proper way to detect the outcome of that 'drive-reopen' would be to first pass in a witness fd with 'getfd', then at libvirtd restart, probe whether that file is still empty. This patch is enough to test the common case of success when used correctly, while saving the subtleties of proper cleanup for worst-case errors for later. When a mirror job is started, cancelling the job safely reverts back to the source disk, regardless of whether the destination is in phase 1 (streaming, in which case the destination is worthless) or phase 2 (mirroring, in which case the destination is synced up to the source at the time of the cancel). Our existing code does just fine in either phase, other than some bookkeeping cleanup. Pivoting the job requires the use of the new 'drive-reopen' command. Here, failure of the command is potentially catastrophic to the domain, since the initial qemu implementation rips out the old disk before attempting to open the new one; qemu will attempt a recovery path of retrying the reopen on the original source, but if that also fails, the domain is hosed, with nothing libvirt can do about it. If qemu 1.2 ever adds 'drive-reopen' inside 'transaction', then the problem will no longer exist (a transaction promises not to close the old file until after the new file is proven to work), at which point we would add a VIR_DOMAIN_REBASE_COPY_ATOMIC that fails up front if we detect an older qemu with the risky drive-reopen. Interesting side note: while snapshot-create --disk-only creates a copy of the disk at a point in time by moving the domain on to a new file (the copy is the file now in the just-extended backing chain), blockjob --abort of a copy job creates a copy of the disk while keeping the domain on the original file. There may be potential improvements to the snapshot code to exploit block copy over multiple disks all at one point in time. And, if 'block-job-cancel' were made part of 'transaction', you could copy multiple disks at the same point in time without pausing the domain. This also implies we may want to add a --quiesce flag to the pivot operation, so that when breaking a mirror, the side of the mirror that we are abandoning is at least in a stable state with regards to guest I/O. * src/qemu/qemu_driver.c (qemuDomainBlockJobAbort): Accept new flag. (qemuDomainBlockPivot): New helper function. (qemuDomainBlockJobImpl): Implement it. --- was 11/18 in v4 v5: no real change, improve commit message src/qemu/qemu_driver.c | 106 +++++++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 105 insertions(+), 1 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index e1584c6..8b6c984 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11640,6 +11640,86 @@ cleanup: return ret; } +/* Called while holding the VM job lock, to implement a block job + * abort with pivot; this updates the VM definition as appropriate, on + * either success or failure (although there are some forms of + * catastrophic failure that will leave the VM unusable). */ +static int +qemuDomainBlockPivot(struct qemud_driver *driver, virDomainObjPtr vm, + const char *device, virDomainDiskDefPtr disk) +{ + int ret = -1; + qemuDomainObjPrivatePtr priv = vm->privateData; + virDomainBlockJobInfo info; + + /* Probe the status, if needed. */ + if (!disk->mirroring) { + qemuDomainObjEnterMonitorWithDriver(driver, vm); + ret = qemuMonitorBlockJob(priv->mon, device, NULL, 0, &info, + BLOCK_JOB_INFO, true); + qemuDomainObjExitMonitorWithDriver(driver, vm); + if (ret < 0) + goto cleanup; + if (!virDomainObjIsActive(vm)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain is not running")); + goto cleanup; + } + if (ret == 1 && info.cur == info.end && + info.type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) + disk->mirroring = true; + } + + if (!disk->mirroring) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, + _("disk '%s' not ready for pivot yet"), + disk->dst); + goto cleanup; + } + + /* Attempt the pivot. */ + qemuDomainObjEnterMonitorWithDriver(driver, vm); + ret = qemuMonitorDriveReopen(priv->mon, device, disk->mirror, + disk->mirrorFormat); + qemuDomainObjExitMonitorWithDriver(driver, vm); + + /* XXX Until qemu adds support for 'drive-reopen' inside + * 'transaction', we have the remote risk of a catastrophic + * failure, where the drive-reopen fails but can't recover by + * reopening the source. Not much we can do about it. */ + + if (ret == 0) { + /* XXX We want to revoke security labels and disk lease, as + * well as audit that revocation, before dropping the original + * source. But it gets tricky if both source and mirror share + * common backing files (we want to only revoke the non-shared + * portion of the chain, and is made more difficult by the + * fact that we aren't tracking the full chain ourselves; so + * for now, we leak the access to the original. */ + VIR_FREE(disk->src); + VIR_FREE(disk->driverType); + disk->src = disk->mirror; + disk->driverType = disk->mirrorFormat; + disk->mirror = NULL; + disk->mirrorFormat = NULL; + disk->mirroring = false; + } else { + /* On failure, qemu abandons the mirror, and attempts to + * revert back to the source disk. Hopefully it was able to + * reopen things. */ + /* XXX should we be parsing the exact qemu error, or calling + * 'query-block', to see what state we really got left in + * before killing the mirroring job? And just as on the + * success case, there's security labeling to worry about. */ + VIR_FREE(disk->mirror); + VIR_FREE(disk->mirrorFormat); + disk->mirroring = false; + } + +cleanup: + return ret; +} + static int qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, unsigned long bandwidth, virDomainBlockJobInfoPtr info, @@ -11689,6 +11769,14 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, disk->dst); goto cleanup; } + if (mode == BLOCK_JOB_ABORT && + (flags & VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT) && + !(async && disk->mirror)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, + _("pivot of disk '%s' requires an active copy job"), + disk->dst); + goto cleanup; + } if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_MODIFY) < 0) goto cleanup; @@ -11699,6 +11787,12 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, goto endjob; } + if (disk->mirror && mode == BLOCK_JOB_ABORT && + (flags & VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)) { + ret = qemuDomainBlockPivot(driver, vm, device, disk); + goto endjob; + } + qemuDomainObjEnterMonitorWithDriver(driver, vm); /* XXX - libvirt should really be tracking the backing file chain * itself, and validating that base is on the chain, rather than @@ -11718,6 +11812,15 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, info->cur == info->end && info->type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) disk->mirroring = true; + /* A successful block job cancelation stops any mirroring. */ + if (mode == BLOCK_JOB_ABORT && disk->mirror) { + /* XXX We should also revoke security labels and disk lease on + * the mirror, and audit that fact, before dropping things. */ + VIR_FREE(disk->mirror); + VIR_FREE(disk->mirrorFormat); + disk->mirroring = false; + } + /* With synchronous block cancel, we must synthesize an event, and * we silently ignore the ABORT_ASYNC flag. With asynchronous * block cancel, the event will come from qemu, but without the @@ -11782,7 +11885,8 @@ cleanup: static int qemuDomainBlockJobAbort(virDomainPtr dom, const char *path, unsigned int flags) { - virCheckFlags(VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC, -1); + virCheckFlags(VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC | + VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT, -1); return qemuDomainBlockJobImpl(dom, path, NULL, 0, NULL, BLOCK_JOB_ABORT, flags); } -- 1.7.7.6

On Mon, Apr 16, 2012 at 23:06:01 -0600, Eric Blake wrote:
This is the bare minimum to end a copy job (of course, until a later patch adds the ability to start a copy job, this patch doesn't do much in isolation; I've just split the patches to ease the review).
This patch intentionally avoids SELinux, lock manager, and audit actions. Also, if libvirtd restarts at the exact moment that a 'drive-reopen' is in flight, the only proper way to detect the outcome of that 'drive-reopen' would be to first pass in a witness fd with 'getfd', then at libvirtd restart, probe whether that file is still empty. This patch is enough to test the common case of success when used correctly, while saving the subtleties of proper cleanup for worst-case errors for later.
When a mirror job is started, cancelling the job safely reverts back to the source disk, regardless of whether the destination is in phase 1 (streaming, in which case the destination is worthless) or phase 2 (mirroring, in which case the destination is synced up to the source at the time of the cancel). Our existing code does just fine in either phase, other than some bookkeeping cleanup.
Pivoting the job requires the use of the new 'drive-reopen' command. Here, failure of the command is potentially catastrophic to the domain, since the initial qemu implementation rips out the old disk before attempting to open the new one; qemu will attempt a recovery path of retrying the reopen on the original source, but if that also fails, the domain is hosed, with nothing libvirt can do about it. If qemu 1.2 ever adds 'drive-reopen' inside 'transaction', then the problem will no longer exist (a transaction promises not to close the old file until after the new file is proven to work), at which point we would add a VIR_DOMAIN_REBASE_COPY_ATOMIC that fails up front if we detect an older qemu with the risky drive-reopen.
Interesting side note: while snapshot-create --disk-only creates a copy of the disk at a point in time by moving the domain on to a new file (the copy is the file now in the just-extended backing chain), blockjob --abort of a copy job creates a copy of the disk while keeping the domain on the original file. There may be potential improvements to the snapshot code to exploit block copy over multiple disks all at one point in time. And, if 'block-job-cancel' were made part of 'transaction', you could copy multiple disks at the same point in time without pausing the domain. This also implies we may want to add a --quiesce flag to the pivot operation, so that when breaking a mirror, the side of the mirror that we are abandoning is at least in a stable state with regards to guest I/O.
* src/qemu/qemu_driver.c (qemuDomainBlockJobAbort): Accept new flag. (qemuDomainBlockPivot): New helper function. (qemuDomainBlockJobImpl): Implement it. ---
was 11/18 in v4 v5: no real change, improve commit message
src/qemu/qemu_driver.c | 106 +++++++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 105 insertions(+), 1 deletions(-)
OK Jirka

Minimal patch to wire up all the pieces in the previous patches to actually enable a block copy job. By minimal, I mean that qemu creates the file (that is, no REUSE_EXT flag support yet), SELinux must be disabled, a lock manager is not informed, and the audit logs aren't updated. But those will be added as improvements in future patches. * src/qemu/qemu_driver.c (qemuDomainBlockCopy): New function. (qemuDomainBlockRebase): Call it when appropriate. --- was 12/18 in v4 v5: address review comments, add comment in code src/qemu/qemu_driver.c | 122 +++++++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 120 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 8b6c984..5c3cea8 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11910,10 +11910,126 @@ qemuDomainBlockJobSetSpeed(virDomainPtr dom, const char *path, } static int +qemuDomainBlockCopy(virDomainPtr dom, const char *path, + const char *dest, const char *format, + unsigned long bandwidth, unsigned int flags) +{ + struct qemud_driver *driver = dom->conn->privateData; + virDomainObjPtr vm = NULL; + qemuDomainObjPrivatePtr priv; + char uuidstr[VIR_UUID_STRING_BUFLEN]; + char *device = NULL; + virDomainDiskDefPtr disk; + int ret = -1; + int idx; + + /* Preliminaries: find the disk we are editing, sanity checks */ + virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW, -1); + + qemuDriverLock(driver); + virUUIDFormat(dom->uuid, uuidstr); + vm = virDomainFindByUUID(&driver->domains, dom->uuid); + if (!vm) { + qemuReportError(VIR_ERR_NO_DOMAIN, + _("no domain with matching uuid '%s'"), uuidstr); + goto cleanup; + } + + device = qemuDiskPathToAlias(vm, path, &idx); + if (!device) { + goto cleanup; + } + disk = vm->def->disks[idx]; + if (disk->mirror) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, + _("disk '%s' already in active block copy job"), + disk->dst); + goto cleanup; + } + + priv = vm->privateData; + if (!(qemuCapsGet(priv->qemuCaps, QEMU_CAPS_DRIVE_MIRROR) && + qemuCapsGet(priv->qemuCaps, QEMU_CAPS_DRIVE_REOPEN) && + qemuCapsGet(priv->qemuCaps, QEMU_CAPS_BLOCKJOB_ASYNC))) { + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("block copy is not supported with this QEMU binary")); + goto cleanup; + } + if (vm->persistent) { + /* XXX if qemu ever lets us start a new domain with mirroring + * already active, we can relax this; but for now, the risk of + * 'managedsave' due to libvirt-guests means we can't risk + * this on persistent domains. */ + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain is not transient")); + goto cleanup; + } + + if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_MODIFY) < 0) + goto cleanup; + + if (!virDomainObjIsActive(vm)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain is not running")); + goto endjob; + } + + /* Prepare the destination file. */ + /* XXX We also need to add security labeling, lock manager lease, + * and auditing of those events, as well as to support reuse of + * existing images, including probing the existing format of an + * existing image. */ + if (!format) + format = disk->driverType; + if ((format && !(disk->mirrorFormat = strdup(format))) || + !(disk->mirror = strdup(dest))) { + virReportOOMError(); + goto endjob; + } + + /* Actually start the mirroring */ + qemuDomainObjEnterMonitorWithDriver(driver, vm); + ret = qemuMonitorDriveMirror(priv->mon, device, dest, format, flags); + if (ret == 0 && bandwidth != 0) + ret = qemuMonitorBlockJob(priv->mon, device, NULL, bandwidth, NULL, + BLOCK_JOB_SPEED_INTERNAL, true); + qemuDomainObjExitMonitorWithDriver(driver, vm); + +endjob: + if (ret < 0) { + VIR_FREE(disk->mirror); + VIR_FREE(disk->mirrorFormat); + } + if (qemuDomainObjEndJob(driver, vm) == 0) { + vm = NULL; + goto cleanup; + } + +cleanup: + VIR_FREE(device); + if (vm) + virDomainObjUnlock(vm); + qemuDriverUnlock(driver); + return ret; +} + +static int qemuDomainBlockRebase(virDomainPtr dom, const char *path, const char *base, unsigned long bandwidth, unsigned int flags) { - virCheckFlags(0, -1); + virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW | + VIR_DOMAIN_BLOCK_REBASE_COPY | + VIR_DOMAIN_BLOCK_REBASE_COPY_RAW, -1); + + if (flags & VIR_DOMAIN_BLOCK_REBASE_COPY) { + const char *format = NULL; + if (flags & VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) + format = "raw"; + flags &= ~(VIR_DOMAIN_BLOCK_REBASE_COPY | + VIR_DOMAIN_BLOCK_REBASE_COPY_RAW); + return qemuDomainBlockCopy(dom, path, base, format, bandwidth, flags); + } + return qemuDomainBlockJobImpl(dom, path, base, bandwidth, NULL, BLOCK_JOB_PULL, flags); } @@ -11922,7 +12038,9 @@ static int qemuDomainBlockPull(virDomainPtr dom, const char *path, unsigned long bandwidth, unsigned int flags) { - return qemuDomainBlockRebase(dom, path, NULL, bandwidth, flags); + virCheckFlags(0, -1); + return qemuDomainBlockJobImpl(dom, path, NULL, bandwidth, NULL, + BLOCK_JOB_PULL, flags); } static int -- 1.7.7.6

On Mon, Apr 16, 2012 at 23:06:02 -0600, Eric Blake wrote:
Minimal patch to wire up all the pieces in the previous patches to actually enable a block copy job. By minimal, I mean that qemu creates the file (that is, no REUSE_EXT flag support yet), SELinux must be disabled, a lock manager is not informed, and the audit logs aren't updated. But those will be added as improvements in future patches.
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): New function. (qemuDomainBlockRebase): Call it when appropriate. ---
was 12/18 in v4 v5: address review comments, add comment in code
OK Jirka

In qemu, it is possible to call 'migrate_set_speed' prior to 'migrate', and therefore ensure a constant throttling through the entire migration. However, this is not possible with 'block-job-set-speed', which fails if a job is not already active. This means that you can't detect a device that doesn't support throttling until after you have already started a block job and let an unknown amount of unthrottled data through. Aborting the job upon failure to set the speed seems a bit harsh, since it would have been nicer to prevent the job from starting in the first place, rather than letting an unknown amount of data be processed before detecting the speed failure. So I propose relaxing the documentation, and explicitly mentioning that setting the speed is a best-effort attempt that might be ignored. On the other hand, I've also requested that qemu consider adding an optional parameter to allow setting the speed at the creation of a block job. * src/libvirt.c (virDomainBlockPull, virDomainBlockRebase): Update the documentation. * src/qemu/qemu_driver.c (qemuDomainBlockJobImpl) (qemuDomainBlockCopy): Log but don't fail when speed change fails. --- v5: new patch; see also this qemu request: https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg02185.html src/libvirt.c | 12 ++++++++++-- src/qemu/qemu_driver.c | 18 ++++++++++++------ 2 files changed, 22 insertions(+), 8 deletions(-) diff --git a/src/libvirt.c b/src/libvirt.c index af42d3b..17c7576 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -18145,7 +18145,11 @@ error: * The maximum bandwidth (in Mbps) that will be used to do the copy can be * specified with the bandwidth parameter. If set to 0, libvirt will choose a * suitable default. Some hypervisors do not support this feature and will - * return an error if bandwidth is not 0. + * return an error if bandwidth is not 0; others will make a best effort + * attempt, but silently ignore failure to set the speed. The actual speed + * can be determined with virDomainGetBlockJobInfo(), and setting speed via + * virDomainBlockJobSetSpeed() allows more control, at the expense of a race + * condition where the job may end before the speed can be set or queried. * * This is shorthand for virDomainBlockRebase() with a NULL base. * @@ -18263,7 +18267,11 @@ error: * The maximum bandwidth (in Mbps) that will be used to do the copy can be * specified with the bandwidth parameter. If set to 0, libvirt will choose a * suitable default. Some hypervisors do not support this feature and will - * return an error if bandwidth is not 0. + * return an error if bandwidth is not 0; others will make a best effort + * attempt, but silently ignore failure to set the speed. The actual speed + * can be determined with virDomainGetBlockJobInfo(), and setting speed via + * virDomainBlockJobSetSpeed() allows more control, at the expense of a race + * condition where the job may end before the speed can be set or queried. * * When @base is NULL and @flags is 0, this is identical to * virDomainBlockPull(). diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 5c3cea8..aec5c4c 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11799,9 +11799,12 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, * relying on qemu to do this. */ ret = qemuMonitorBlockJob(priv->mon, device, base, bandwidth, info, mode, async); - if (ret == 0 && mode == BLOCK_JOB_PULL && bandwidth) - ret = qemuMonitorBlockJob(priv->mon, device, NULL, bandwidth, NULL, - BLOCK_JOB_SPEED_INTERNAL, async); + if (ret == 0 && mode == BLOCK_JOB_PULL && bandwidth) { + /* Setting speed now is best-effort. */ + if (qemuMonitorBlockJob(priv->mon, device, NULL, bandwidth, NULL, + BLOCK_JOB_SPEED_INTERNAL, async) < 0) + VIR_WARN("failed to set bandwidth for disk %s", disk->dst); + } qemuDomainObjExitMonitorWithDriver(driver, vm); if (ret < 0) goto endjob; @@ -11990,9 +11993,12 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, /* Actually start the mirroring */ qemuDomainObjEnterMonitorWithDriver(driver, vm); ret = qemuMonitorDriveMirror(priv->mon, device, dest, format, flags); - if (ret == 0 && bandwidth != 0) - ret = qemuMonitorBlockJob(priv->mon, device, NULL, bandwidth, NULL, - BLOCK_JOB_SPEED_INTERNAL, true); + if (ret == 0 && bandwidth != 0) { + /* Setting speed now is best-effort. */ + if (qemuMonitorBlockJob(priv->mon, device, NULL, bandwidth, NULL, + BLOCK_JOB_SPEED_INTERNAL, true) < 0) + VIR_WARN("failed to set bandwidth for disk %s", disk->dst); + } qemuDomainObjExitMonitorWithDriver(driver, vm); endjob: -- 1.7.7.6

On Mon, Apr 16, 2012 at 23:06:03 -0600, Eric Blake wrote:
In qemu, it is possible to call 'migrate_set_speed' prior to 'migrate', and therefore ensure a constant throttling through the entire migration. However, this is not possible with 'block-job-set-speed', which fails if a job is not already active. This means that you can't detect a device that doesn't support throttling until after you have already started a block job and let an unknown amount of unthrottled data through. Aborting the job upon failure to set the speed seems a bit harsh, since it would have been nicer to prevent the job from starting in the first place, rather than letting an unknown amount of data be processed before detecting the speed failure. So I propose relaxing the documentation, and explicitly mentioning that setting the speed is a best-effort attempt that might be ignored. On the other hand, I've also requested that qemu consider adding an optional parameter to allow setting the speed at the creation of a block job.
* src/libvirt.c (virDomainBlockPull, virDomainBlockRebase): Update the documentation. * src/qemu/qemu_driver.c (qemuDomainBlockJobImpl) (qemuDomainBlockCopy): Log but don't fail when speed change fails. ---
v5: new patch; see also this qemu request: https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg02185.html
src/libvirt.c | 12 ++++++++++-- src/qemu/qemu_driver.c | 18 ++++++++++++------ 2 files changed, 22 insertions(+), 8 deletions(-)
OK Jirka

On 04/19/2012 03:00 PM, Jiri Denemark wrote:
On Mon, Apr 16, 2012 at 23:06:03 -0600, Eric Blake wrote:
In qemu, it is possible to call 'migrate_set_speed' prior to 'migrate', and therefore ensure a constant throttling through the entire migration. However, this is not possible with 'block-job-set-speed', which fails if a job is not already active. This means that you can't detect a device that doesn't support throttling until after you have already started a block job and let an unknown amount of unthrottled data through. Aborting the job upon failure to set the speed seems a bit harsh, since it would have been nicer to prevent the job from starting in the first place, rather than letting an unknown amount of data be processed before detecting the speed failure. So I propose relaxing the documentation, and explicitly mentioning that setting the speed is a best-effort attempt that might be ignored. On the other hand, I've also requested that qemu consider adding an optional parameter to allow setting the speed at the creation of a block job.
* src/libvirt.c (virDomainBlockPull, virDomainBlockRebase): Update the documentation. * src/qemu/qemu_driver.c (qemuDomainBlockJobImpl) (qemuDomainBlockCopy): Log but don't fail when speed change fails. ---
v5: new patch; see also this qemu request: https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg02185.html
src/libvirt.c | 12 ++++++++++-- src/qemu/qemu_driver.c | 18 ++++++++++++------ 2 files changed, 22 insertions(+), 8 deletions(-)
OK
I'm hoping to actually drop this patch, and replace it with an alternative that calls set-block-job-speed _before_ block-stream or drive-mirror; but that depends on someone actually writing the qemu patch outlined in the thread here: https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg02273.html At any rate, this serves as a good argument for why the qemu side of this patch series isn't quite ready to push to upstream libvirt yet. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

This copies some of the checks from snapshots regarding testing when a file already exists. In the process, I noticed snapshots had hard-to-read logic, as well as a missing sanity check: REUSE_EXT should require the destination to already be present. * src/qemu/qemu_driver.c (qemuDomainBlockRebase): Allow REUSE_EXT flag. (qemuDomainBlockCopy): Wire up flag, and add some sanity checks. (qemuDomainSnapshotDiskPrepare): Require destination on REUSE_EXT. --- was 13/18 in v4 v5: improve logic readability src/qemu/qemu_driver.c | 50 ++++++++++++++++++++++++++++++++++++++++++----- 1 files changed, 44 insertions(+), 6 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index aec5c4c..3e8418b 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -9829,8 +9829,13 @@ qemuDomainSnapshotDiskPrepare(virDomainObjPtr vm, virDomainSnapshotDefPtr def, _("unable to stat for disk %s: %s"), disk->name, disk->file); goto cleanup; + } else if (allow_reuse) { + virReportSystemError(errno, + _("missing existing file for disk %s: %s"), + disk->name, disk->file); + goto cleanup; } - } else if (!(S_ISBLK(st.st_mode) || !st.st_size || allow_reuse)) { + } else if (!S_ISBLK(st.st_mode) && st.st_size && !allow_reuse) { qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, _("external snapshot file for disk %s already " "exists and is not a block device: %s"), @@ -11925,9 +11930,11 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, virDomainDiskDefPtr disk; int ret = -1; int idx; + struct stat st; /* Preliminaries: find the disk we are editing, sanity checks */ - virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW, -1); + virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW | + VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT, -1); qemuDriverLock(driver); virUUIDFormat(dom->uuid, uuidstr); @@ -11977,12 +11984,42 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, goto endjob; } + /* XXX this is pessimistic; we could use 'query-block' or even + * keep track of the backing chain ourselves, rather than assuming + * that all non-raw source files have a current backing image */ + if ((flags & VIR_DOMAIN_BLOCK_REBASE_SHALLOW) && + STREQ_NULLABLE(format, "raw") && + STRNEQ_NULLABLE(disk->driverType, "raw")) { + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("raw shallow copy of non-raw disk '%s' not possible"), + disk->dst); + goto endjob; + } + /* Prepare the destination file. */ + if (stat(dest, &st) < 0) { + if (errno != ENOENT) { + virReportSystemError(errno, _("unable to stat for disk %s: %s"), + disk->dst, dest); + goto endjob; + } else if (flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT) { + virReportSystemError(errno, + _("missing destination file for disk %s: %s"), + disk->dst, dest); + goto endjob; + } + } else if (!S_ISBLK(st.st_mode) && st.st_size && + !(flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)) { + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("external destination file for disk %s already " + "exists and is not a block device: %s"), + disk->dst, dest); + goto endjob; + } + /* XXX We also need to add security labeling, lock manager lease, - * and auditing of those events, as well as to support reuse of - * existing images, including probing the existing format of an - * existing image. */ - if (!format) + * and auditing of those events. */ + if (!format && !(flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)) format = disk->driverType; if ((format && !(disk->mirrorFormat = strdup(format))) || !(disk->mirror = strdup(dest))) { @@ -12024,6 +12061,7 @@ qemuDomainBlockRebase(virDomainPtr dom, const char *path, const char *base, unsigned long bandwidth, unsigned int flags) { virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW | + VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT | VIR_DOMAIN_BLOCK_REBASE_COPY | VIR_DOMAIN_BLOCK_REBASE_COPY_RAW, -1); -- 1.7.7.6

On Mon, Apr 16, 2012 at 23:06:04 -0600, Eric Blake wrote:
This copies some of the checks from snapshots regarding testing when a file already exists. In the process, I noticed snapshots had hard-to-read logic, as well as a missing sanity check: REUSE_EXT should require the destination to already be present.
* src/qemu/qemu_driver.c (qemuDomainBlockRebase): Allow REUSE_EXT flag. (qemuDomainBlockCopy): Wire up flag, and add some sanity checks. (qemuDomainSnapshotDiskPrepare): Require destination on REUSE_EXT. ---
was 13/18 in v4 v5: improve logic readability
src/qemu/qemu_driver.c | 50 ++++++++++++++++++++++++++++++++++++++++++----- 1 files changed, 44 insertions(+), 6 deletions(-)
OK Jirka

This copies heavily from qemuDomainSnapshotCreateSingleDiskActive(), in order to set the SELinux label, obtain locking manager lease, and audit the fact that we hand a new file over to qemu. Alas, releasing the lease and label on failure or at the end of the mirroring is a trickier prospect (we would have to know the backing chain of both source and destination, and be sure not to revoke rights to any part of the chain that is shared), so for now, virDomainBlockJobAbort still leaves things locked and labeled. * src/qemu/qemu_driver.c (qemuDomainBlockCopy): Set up labeling. --- was 14/18 in v4 v5: remove label on failure of 'drive-mirror' src/qemu/qemu_driver.c | 69 +++++++++++++++++++++++++++++++++++++++++------ 1 files changed, 60 insertions(+), 9 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 3e8418b..f197627 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11931,6 +11931,11 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, int ret = -1; int idx; struct stat st; + bool need_unlink = false; + char *mirror = NULL; + char *mirrorFormat = NULL; + char *origsrc = NULL; + char *origdriver = NULL; /* Preliminaries: find the disk we are editing, sanity checks */ virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW | @@ -12017,19 +12022,41 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, goto endjob; } - /* XXX We also need to add security labeling, lock manager lease, - * and auditing of those events. */ - if (!format && !(flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)) - format = disk->driverType; - if ((format && !(disk->mirrorFormat = strdup(format))) || - !(disk->mirror = strdup(dest))) { + if (!(flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)) { + int fd = qemuOpenFile(driver, dest, O_WRONLY | O_TRUNC | O_CREAT, + &need_unlink, NULL); + if (fd < 0) + goto endjob; + VIR_FORCE_CLOSE(fd); + if (!format) + format = disk->driverType; + } + if ((format && !(mirrorFormat = strdup(format))) || + !(mirror = strdup(dest))) { virReportOOMError(); goto endjob; } + /* Manipulate disk in place, in a way that can be reverted on + * failure, in order to set up labeling and locking. */ + origsrc = disk->src; + disk->src = (char *) dest; + origdriver = disk->driverType; + disk->driverType = (char *) "raw"; /* Don't want to probe backing files */ + + if (virDomainLockDiskAttach(driver->lockManager, vm, disk) < 0) + goto endjob; + if (virSecurityManagerSetImageLabel(driver->securityManager, vm->def, + disk) < 0) { + if (virDomainLockDiskDetach(driver->lockManager, vm, disk) < 0) + VIR_WARN("Unable to release lock on %s", dest); + goto endjob; + } + /* Actually start the mirroring */ qemuDomainObjEnterMonitorWithDriver(driver, vm); ret = qemuMonitorDriveMirror(priv->mon, device, dest, format, flags); + virDomainAuditDisk(vm, NULL, dest, "mirror", ret >= 0); if (ret == 0 && bandwidth != 0) { /* Setting speed now is best-effort. */ if (qemuMonitorBlockJob(priv->mon, device, NULL, bandwidth, NULL, @@ -12037,12 +12064,36 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, VIR_WARN("failed to set bandwidth for disk %s", disk->dst); } qemuDomainObjExitMonitorWithDriver(driver, vm); + if (ret < 0) { + if (virSecurityManagerRestoreImageLabel(driver->securityManager, + vm->def, disk) < 0) + VIR_WARN("Unable to restore security label on %s", dest); + if (virDomainLockDiskDetach(driver->lockManager, vm, disk) < 0) + VIR_WARN("Unable to release lock on %s", dest); + goto endjob; + } + + disk->src = origsrc; + origsrc = NULL; + disk->driverType = origdriver; + origdriver = NULL; + + /* Update vm in place to match changes. */ + need_unlink = false; + disk->mirror = mirror; + disk->mirrorFormat = mirrorFormat; + mirror = NULL; + mirrorFormat = NULL; endjob: - if (ret < 0) { - VIR_FREE(disk->mirror); - VIR_FREE(disk->mirrorFormat); + if (origsrc) { + disk->src = origsrc; + disk->driverType = origdriver; } + if (need_unlink && unlink(dest)) + VIR_WARN("unable to unlink just-created %s", dest); + VIR_FREE(mirror); + VIR_FREE(mirrorFormat); if (qemuDomainObjEndJob(driver, vm) == 0) { vm = NULL; goto cleanup; -- 1.7.7.6

On Mon, Apr 16, 2012 at 23:06:05 -0600, Eric Blake wrote:
This copies heavily from qemuDomainSnapshotCreateSingleDiskActive(), in order to set the SELinux label, obtain locking manager lease, and audit the fact that we hand a new file over to qemu. Alas, releasing the lease and label on failure or at the end of the mirroring is a trickier prospect (we would have to know the backing chain of both source and destination, and be sure not to revoke rights to any part of the chain that is shared), so for now, virDomainBlockJobAbort still leaves things locked and labeled.
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): Set up labeling. ---
was 14/18 in v4 v5: remove label on failure of 'drive-mirror'
src/qemu/qemu_driver.c | 69 +++++++++++++++++++++++++++++++++++++++++------ 1 files changed, 60 insertions(+), 9 deletions(-)
OK Jirka

This new API provides additional flexibility over what can be crammed on top of virDomainBlockRebase (namely, the ability to specify an arbitrary destination format, for things like copying qcow2 into qed without having to pre-create the destination), at the expense that it cannot be backported without bumping the .so version. We rely on {REBASE,COPY}{_SHALLOW,_REUSE_EXT} having the same values, for easier sharing of code. * include/libvirt/libvirt.h.in (virDomainBlockCopy): New API. * src/libvirt.c (virDomainBlockCopy): Implement it. * src/libvirt_public.syms (LIBVIRT_0.9.12): Export it. * src/driver.h (virDrvDomainBlockCopy): New driver callback. * docs/apibuild.py (CParser.parseSignature): Add exception. --- docs/apibuild.py | 1 + include/libvirt/libvirt.h.in | 22 +++++++- src/driver.h | 5 ++ src/libvirt.c | 119 +++++++++++++++++++++++++++++++++++++++++- src/libvirt_public.syms | 5 ++ 5 files changed, 149 insertions(+), 3 deletions(-) diff --git a/docs/apibuild.py b/docs/apibuild.py index 1ac0281..bf06f3b 100755 --- a/docs/apibuild.py +++ b/docs/apibuild.py @@ -1650,6 +1650,7 @@ class CParser: "virDomainBlockJobSetSpeed" : (False, ("bandwidth")), "virDomainBlockPull" : (False, ("bandwidth")), "virDomainBlockRebase" : (False, ("bandwidth")), + "virDomainBlockCopy" : (False, ("bandwidth")), "virDomainMigrateGetMaxSpeed" : (False, ("bandwidth")) } def checkLongLegacyFunction(self, name, return_type, signature): diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index ac5df95..aaee218 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -1937,7 +1937,7 @@ int virDomainUpdateDeviceFlags(virDomainPtr domain, * VIR_DOMAIN_BLOCK_JOB_TYPE_PULL: Block Pull (virDomainBlockPull, or * virDomainBlockRebase without flags), job ends on completion * VIR_DOMAIN_BLOCK_JOB_TYPE_COPY: Block Copy (virDomainBlockRebase with - * flags), job exists as long as mirroring is active + * flags, or virDomainBlockCopy), job exists as long as mirroring is active */ typedef enum { VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN = 0, @@ -1995,6 +1995,8 @@ int virDomainBlockPull(virDomainPtr dom, const char *disk, * Flags available for virDomainBlockRebase(). */ typedef enum { + /* Note: the first two bits intentionally match values from + * virDomainBlockCopyFlags. */ VIR_DOMAIN_BLOCK_REBASE_SHALLOW = 1 << 0, /* Limit copy to top of source backing chain */ VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT = 1 << 1, /* Reuse existing external @@ -2007,6 +2009,24 @@ int virDomainBlockRebase(virDomainPtr dom, const char *disk, const char *base, unsigned long bandwidth, unsigned int flags); +/** + * virDomainBlockCopyFlags: + * + * Flags available for virDomainBlockCopy(). + */ +typedef enum { + /* Note: the first two bits intentionally match values from + * virDomainBlockRebaseFlags. */ + VIR_DOMAIN_BLOCK_COPY_SHALLOW = 1 << 0, /* Limit copy to top of source + backing chain */ + VIR_DOMAIN_BLOCK_COPY_REUSE_EXT = 1 << 1, /* Reuse existing external + file for a copy */ +} virDomainBlockCopyFlags; + +int virDomainBlockCopy(virDomainPtr dom, const char *disk, const char *dest, + const char *format, unsigned long bandwidth, + unsigned int flags); + /* Block I/O throttling support */ diff --git a/src/driver.h b/src/driver.h index 03d249b..e10ba71 100644 --- a/src/driver.h +++ b/src/driver.h @@ -788,6 +788,10 @@ typedef int (*virDrvDomainBlockRebase)(virDomainPtr dom, const char *path, const char *base, unsigned long bandwidth, unsigned int flags); +typedef int + (*virDrvDomainBlockCopy)(virDomainPtr dom, const char *path, + const char *dest, const char *format, + unsigned long bandwidth, unsigned int flags); typedef int (*virDrvSetKeepAlive)(virConnectPtr conn, @@ -1005,6 +1009,7 @@ struct _virDriver { virDrvDomainBlockJobSetSpeed domainBlockJobSetSpeed; virDrvDomainBlockPull domainBlockPull; virDrvDomainBlockRebase domainBlockRebase; + virDrvDomainBlockCopy domainBlockCopy; virDrvSetKeepAlive setKeepAlive; virDrvConnectIsAlive isAlive; virDrvNodeSuspendForDuration nodeSuspendForDuration; diff --git a/src/libvirt.c b/src/libvirt.c index 17c7576..ad50815 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -18274,7 +18274,11 @@ error: * condition where the job may end before the speed can be set or queried. * * When @base is NULL and @flags is 0, this is identical to - * virDomainBlockPull(). + * virDomainBlockPull(). Conversely, when @flags includes + * VIR_DOMAIN_BLOCK_REBASE_COPY, this is shorthand for + * virDomainBlockCopy(dom, disk, base, + * flags & VIR_DOMAIN_BLOCK_COPY_RAW ? "raw" : NULL, bandwidth, + * flags & (VIR_DOMAIN_BLOCK_REBASE_SHALLOW|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)). * * Returns 0 if the operation has started, -1 on failure. */ @@ -18284,7 +18288,7 @@ int virDomainBlockRebase(virDomainPtr dom, const char *disk, { virConnectPtr conn; - VIR_DOMAIN_DEBUG(dom, "disk=%s, base=%s bandwidth=%lu, flags=%x", + VIR_DOMAIN_DEBUG(dom, "disk=%s, base=%s, bandwidth=%lu, flags=%x", disk, NULLSTR(base), bandwidth, flags); virResetLastError(); @@ -18339,6 +18343,117 @@ error: /** + * virDomainBlockCopy: + * @dom: pointer to domain object + * @disk: path to the block device, or device shorthand + * @dest: path to the copy destination + * @format: format of the destination + * @bandwidth: (optional) specify copy bandwidth limit in Mbps + * @flags: bitwise-OR of virDomainBlockCopyFlags + * + * Start a block copy job, where @dest is the name of a new file to copy + * the chain to. By default, the copy will pull the entire source chain + * into the destination file, but if @flags also contains + * VIR_DOMAIN_BLOCK_COPY_SHALLOW, then only the top of the source chain + * will be copied (the source and destination have a common backing file). + * By default, @dest will be created with the same file format as the + * source, but this can be altered by passing a non-NULL @format argument, + * or by using VIR_DOMAIN_BLOCK_COPY_REUSE_EXT to reuse an existing file + * with initial contents identical to the backing file of the source (this + * allows a management app to pre-create files with relative backing file + * names, rather than the default of absolute backing file names; it is + * generally used with the shallow flag, since otherwise the destination + * file must start with empty contents). + * + * A copy job has two parts; in the first phase, the @bandwidth parameter + * affects how fast the source is pulled into the destination, and the job + * can only be canceled by reverting to the source file; progress in this + * phase can be tracked via the virDomainBlockJobInfo() command, with a + * job type of VIR_DOMAIN_BLOCK_JOB_TYPE_COPY. The job transitions to the + * second phase when the job info states cur == end, and remains alive to + * mirror all further changes to both source and destination. The user + * must call virDomainBlockJobAbort() to end the mirroring while choosing + * whether to revert to source or pivot to the destination. An event is + * issued when the job ends, and in the future, an event may be added when + * the job transitions from pulling to mirroring. + * + * Some hypervisors will restrict certain actions, such as virDomainSave() + * or virDomainDetachDevice(), while a copy job is active; they may + * also restrict a copy job to transient domains. + * + * The @disk parameter is either an unambiguous source name of the + * block device (the <source file='...'/> sub-element, such as + * "/path/to/image"), or the device target shorthand (the + * <target dev='...'/> sub-element, such as "xvda"). Valid names + * can be found by calling virDomainGetXMLDesc() and inspecting + * elements within //domain/devices/disk. + * + * The maximum bandwidth (in Mbps) that will be used to do the copy can be + * specified with the bandwidth parameter. If set to 0, libvirt will choose a + * suitable default. Some hypervisors do not support this feature and will + * return an error if bandwidth is not 0. + * + * When @format is NULL, this is equivalent to calling + * virDomainBlockRebase() with the VIR_DOMAIN_BLOCK_REBASE_COPY flag added + * to @flags. Additionally, if @format is "raw", this is equivalent to + * calling virDomainBlockRebase() with the VIR_DOMAIN_BLOCK_REBASE_COPY + * and VIR_DOMAIN_BLOCK_REBASE_COPY_RAW flags added to @flags. + * + * Returns 0 if the operation has started, -1 on failure. + */ +int virDomainBlockCopy(virDomainPtr dom, const char *disk, + const char *dest, const char *format, + unsigned long bandwidth, unsigned int flags) +{ + virConnectPtr conn; + + VIR_DOMAIN_DEBUG(dom, + "disk=%s, dest=%s, format=%s, bandwidth=%lu, flags=%x", + disk, dest, NULLSTR(format), bandwidth, flags); + + virResetLastError(); + + if (!VIR_IS_CONNECTED_DOMAIN (dom)) { + virLibDomainError(VIR_ERR_INVALID_DOMAIN, __FUNCTION__); + virDispatchError(NULL); + return -1; + } + conn = dom->conn; + + if (dom->conn->flags & VIR_CONNECT_RO) { + virLibDomainError(VIR_ERR_OPERATION_DENIED, __FUNCTION__); + goto error; + } + + if (!disk) { + virLibDomainError(VIR_ERR_INVALID_ARG, + _("disk is NULL")); + goto error; + } + if (!dest) { + virLibDomainError(VIR_ERR_INVALID_ARG, + _("dest is NULL")); + goto error; + } + + if (conn->driver->domainBlockCopy) { + int ret; + ret = conn->driver->domainBlockCopy(dom, disk, dest, format, bandwidth, + flags); + if (ret < 0) + goto error; + return ret; + } + + virLibDomainError(VIR_ERR_NO_SUPPORT, __FUNCTION__); + +error: + virDispatchError(dom->conn); + return -1; +} + + +/** * virDomainOpenGraphics: * @dom: pointer to domain object * @idx: index of graphics config to open diff --git a/src/libvirt_public.syms b/src/libvirt_public.syms index 46c13fb..d152ab9 100644 --- a/src/libvirt_public.syms +++ b/src/libvirt_public.syms @@ -534,4 +534,9 @@ LIBVIRT_0.9.11 { virDomainPMWakeup; } LIBVIRT_0.9.10; +LIBVIRT_0.9.12 { + global: + virDomainBlockCopy; +} LIBVIRT_0.9.11; + # .... define new API here using predicted next version number .... -- 1.7.7.6

Expose the full abilities of virDomainBlockCopy. * tools/virsh.c (blockJobImpl): Add --format option for block copy. * tools/virsh.pod (blockcopy): Document this. --- tools/virsh.c | 26 ++++++++++++++++++++------ tools/virsh.pod | 10 +++++----- 2 files changed, 25 insertions(+), 11 deletions(-) diff --git a/tools/virsh.c b/tools/virsh.c index 0f79022..e521933 100644 --- a/tools/virsh.c +++ b/tools/virsh.c @@ -7530,6 +7530,7 @@ blockJobImpl(vshControl *ctl, const vshCmd *cmd, unsigned long bandwidth = 0; int ret = -1; const char *base = NULL; + const char *format = NULL; unsigned int flags = 0; if (!vshConnectionUsability(ctl, ctl->conn)) @@ -7569,16 +7570,28 @@ blockJobImpl(vshControl *ctl, const vshCmd *cmd, ret = virDomainBlockPull(dom, path, bandwidth, 0); break; case VSH_CMD_BLOCK_JOB_COPY: - flags |= VIR_DOMAIN_BLOCK_REBASE_COPY; if (vshCommandOptBool(cmd, "shallow")) - flags |= VIR_DOMAIN_BLOCK_REBASE_SHALLOW; + flags |= VIR_DOMAIN_BLOCK_COPY_SHALLOW; if (vshCommandOptBool(cmd, "reuse-external")) - flags |= VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT; - if (vshCommandOptBool(cmd, "raw")) - flags |= VIR_DOMAIN_BLOCK_REBASE_COPY_RAW; + flags |= VIR_DOMAIN_BLOCK_COPY_REUSE_EXT; if (vshCommandOptString(cmd, "dest", &base) < 0) goto cleanup; - ret = virDomainBlockRebase(dom, path, base, bandwidth, flags); + if (vshCommandOptString(cmd, "format", &format) < 0) + goto cleanup; + if (!format) { + if (vshCommandOptBool(cmd, "raw")) + flags |= VIR_DOMAIN_BLOCK_REBASE_COPY_RAW; + flags |= VIR_DOMAIN_BLOCK_REBASE_COPY; + ret = virDomainBlockRebase(dom, path, base, bandwidth, flags); + } else { + if (vshCommandOptBool(cmd, "raw")) { + vshError(ctl, "%s", + _("--raw and --format are mutually exclusive")); + goto cleanup; + } + ret = virDomainBlockCopy(dom, path, base, format, + bandwidth, flags); + } } cleanup: @@ -7603,6 +7616,7 @@ static const vshCmdOptDef opts_block_copy[] = { {"path", VSH_OT_DATA, VSH_OFLAG_REQ, N_("fully-qualified path of disk")}, {"dest", VSH_OT_DATA, VSH_OFLAG_REQ, N_("path of the copy to create")}, {"bandwidth", VSH_OT_DATA, VSH_OFLAG_NONE, N_("bandwidth limit in MB/s")}, + {"format", VSH_OT_DATA, VSH_OFLAG_NONE, N_("file format of dest")}, {"shallow", VSH_OT_BOOL, 0, N_("make the copy share a backing chain")}, {"reuse-external", VSH_OT_BOOL, 0, N_("reuse existing destination")}, {"raw", VSH_OT_BOOL, 0, N_("use raw destination file")}, diff --git a/tools/virsh.pod b/tools/virsh.pod index 23324b2..0e7617b 100644 --- a/tools/virsh.pod +++ b/tools/virsh.pod @@ -640,7 +640,7 @@ address of virtual interface (such as I<detach-interface> or I<domif-setlink>) will accept the MAC address printed by this command. =item B<blockcopy> I<domain> I<path> I<dest> [I<bandwidth>] [I<--shallow>] -[I<--reuse-external>] [I<--raw>] [I<--wait> [I<--verbose] +[I<--reuse-external>] { [I<--raw>] | I<format> } [I<--wait> [I<--verbose] [{I<--pivot> | I<--finish>}] [I<--timeout> B<seconds>] [I<--async>]] Copy a disk backing image chain to I<dest>. By default, this command @@ -654,10 +654,10 @@ is used, otherwise it must start empty); this option is typically used to set up a relative backing file name in the destination. The format of the destination is determined by the first match in the -following list: if I<--raw> is specified, it will be raw; if -I<--reuse-external> is specified, the existing destination is probed -for a format; and in all other cases, the destination format will -match the source format. +following list: if I<--raw> is specified, it will be raw; if I<format> +is specified, it will have that format; if I<--reuse-external> is +specified, the existing destination is probed for a format; and in +all other cases, the destination format will match the source format. By default, the copy job runs in the background, and consists of two phases. Initially, the job must copy all data from the source, and -- 1.7.7.6

Almost trivial; the trick was dealing with the fact that we're stuck with 'unsigned long bandwidth' due to earlier design decisions. Also, prefer the correct flag names (although we intentionally chose the _SHALLOW and _REUSE_EXT values to be equal, to allow for loose handling in our code). * src/remote/remote_protocol.x (remote_domain_block_copy_args): New struct. * src/remote/remote_driver.c (remote_driver): Use it. * src/rpc/gendispatch.pl (name_to_ProcName): Cater to legacy bandwidth type. * src/remote_protocol-structs: Regenerate. * src/qemu/qemu_driver.c (qemuDriver): Wire up driver callback. (qemuDomainBlockCopy): Use preferred flag names. --- src/qemu/qemu_driver.c | 13 +++++++------ src/remote/remote_driver.c | 1 + src/remote/remote_protocol.x | 12 +++++++++++- src/remote_protocol-structs | 9 +++++++++ src/rpc/gendispatch.pl | 1 + 5 files changed, 29 insertions(+), 7 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index f197627..a27f6c2 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11938,8 +11938,8 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, char *origdriver = NULL; /* Preliminaries: find the disk we are editing, sanity checks */ - virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW | - VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT, -1); + virCheckFlags(VIR_DOMAIN_BLOCK_COPY_SHALLOW | + VIR_DOMAIN_BLOCK_COPY_REUSE_EXT, -1); qemuDriverLock(driver); virUUIDFormat(dom->uuid, uuidstr); @@ -11992,7 +11992,7 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, /* XXX this is pessimistic; we could use 'query-block' or even * keep track of the backing chain ourselves, rather than assuming * that all non-raw source files have a current backing image */ - if ((flags & VIR_DOMAIN_BLOCK_REBASE_SHALLOW) && + if ((flags & VIR_DOMAIN_BLOCK_COPY_SHALLOW) && STREQ_NULLABLE(format, "raw") && STRNEQ_NULLABLE(disk->driverType, "raw")) { qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, @@ -12007,14 +12007,14 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, virReportSystemError(errno, _("unable to stat for disk %s: %s"), disk->dst, dest); goto endjob; - } else if (flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT) { + } else if (flags & VIR_DOMAIN_BLOCK_COPY_REUSE_EXT) { virReportSystemError(errno, _("missing destination file for disk %s: %s"), disk->dst, dest); goto endjob; } } else if (!S_ISBLK(st.st_mode) && st.st_size && - !(flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)) { + !(flags & VIR_DOMAIN_BLOCK_COPY_REUSE_EXT)) { qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, _("external destination file for disk %s already " "exists and is not a block device: %s"), @@ -12022,7 +12022,7 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, goto endjob; } - if (!(flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)) { + if (!(flags & VIR_DOMAIN_BLOCK_COPY_REUSE_EXT)) { int fd = qemuOpenFile(driver, dest, O_WRONLY | O_TRUNC | O_CREAT, &need_unlink, NULL); if (fd < 0) @@ -13232,6 +13232,7 @@ static virDriver qemuDriver = { .domainBlockJobSetSpeed = qemuDomainBlockJobSetSpeed, /* 0.9.4 */ .domainBlockPull = qemuDomainBlockPull, /* 0.9.4 */ .domainBlockRebase = qemuDomainBlockRebase, /* 0.9.10 */ + .domainBlockCopy = qemuDomainBlockCopy, /* 0.9.12 */ .isAlive = qemuIsAlive, /* 0.9.8 */ .nodeSuspendForDuration = nodeSuspendForDuration, /* 0.9.8 */ .domainSetBlockIoTune = qemuDomainSetBlockIoTune, /* 0.9.8 */ diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c index af46384..b431779 100644 --- a/src/remote/remote_driver.c +++ b/src/remote/remote_driver.c @@ -5095,6 +5095,7 @@ static virDriver remote_driver = { .domainBlockJobSetSpeed = remoteDomainBlockJobSetSpeed, /* 0.9.4 */ .domainBlockPull = remoteDomainBlockPull, /* 0.9.4 */ .domainBlockRebase = remoteDomainBlockRebase, /* 0.9.10 */ + .domainBlockCopy = remoteDomainBlockCopy, /* 0.9.12 */ .setKeepAlive = remoteSetKeepAlive, /* 0.9.8 */ .isAlive = remoteIsAlive, /* 0.9.8 */ .nodeSuspendForDuration = remoteNodeSuspendForDuration, /* 0.9.8 */ diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x index 2d57247..65b4a1d 100644 --- a/src/remote/remote_protocol.x +++ b/src/remote/remote_protocol.x @@ -1188,6 +1188,14 @@ struct remote_domain_block_rebase_args { unsigned hyper bandwidth; unsigned int flags; }; +struct remote_domain_block_copy_args { + remote_nonnull_domain dom; + remote_nonnull_string path; + remote_nonnull_string dest; + remote_string format; + unsigned hyper bandwidth; + unsigned int flags; +}; struct remote_domain_set_block_io_tune_args { remote_nonnull_domain dom; @@ -2782,7 +2790,9 @@ enum remote_procedure { REMOTE_PROC_DOMAIN_PM_WAKEUP = 267, /* autogen autogen */ REMOTE_PROC_DOMAIN_EVENT_TRAY_CHANGE = 268, /* autogen autogen */ REMOTE_PROC_DOMAIN_EVENT_PMWAKEUP = 269, /* autogen autogen */ - REMOTE_PROC_DOMAIN_EVENT_PMSUSPEND = 270 /* autogen autogen */ + REMOTE_PROC_DOMAIN_EVENT_PMSUSPEND = 270, /* autogen autogen */ + + REMOTE_PROC_DOMAIN_BLOCK_COPY = 271 /* autogen autogen */ /* * Notice how the entries are grouped in sets of 10 ? diff --git a/src/remote_protocol-structs b/src/remote_protocol-structs index 9b2414f..dd2aa72 100644 --- a/src/remote_protocol-structs +++ b/src/remote_protocol-structs @@ -845,6 +845,14 @@ struct remote_domain_block_rebase_args { uint64_t bandwidth; u_int flags; }; +struct remote_domain_block_copy_args { + remote_nonnull_domain dom; + remote_nonnull_string path; + remote_nonnull_string dest; + remote_string format; + uint64_t bandwidth; + u_int flags; +}; struct remote_domain_set_block_io_tune_args { remote_nonnull_domain dom; remote_nonnull_string disk; @@ -2192,4 +2200,5 @@ enum remote_procedure { REMOTE_PROC_DOMAIN_EVENT_TRAY_CHANGE = 268, REMOTE_PROC_DOMAIN_EVENT_PMWAKEUP = 269, REMOTE_PROC_DOMAIN_EVENT_PMSUSPEND = 270, + REMOTE_PROC_DOMAIN_BLOCK_COPY = 271, }; diff --git a/src/rpc/gendispatch.pl b/src/rpc/gendispatch.pl index f161ee0..e5e28e0 100755 --- a/src/rpc/gendispatch.pl +++ b/src/rpc/gendispatch.pl @@ -232,6 +232,7 @@ my $long_legacy = { NodeGetInfo => { ret => { memory => 1 } }, DomainBlockPull => { arg => { bandwidth => 1 } }, DomainBlockRebase => { arg => { bandwidth => 1 } }, + DomainBlockCopy => { arg => { bandwidth => 1 } }, DomainBlockJobSetSpeed => { arg => { bandwidth => 1 } }, DomainMigrateGetMaxSpeed => { ret => { bandwidth => 1 } }, }; -- 1.7.7.6

RHEL-only drive-mirror and drive-reopen are still under upstream qemu discussion; as a result, RHEL decided to backport things under a downstream name. Accommodate these differences. There are a few differences between the upstream proposal[1] and the RHEL build. The RHEL build uses the __com.redhat_ prefix to make the differences obvious. Upstream has a mandatory 'full' argument for 'drive-mirror', and 'drive-mirror' cannot be used as part of a transaction (that is, upstream only supports streaming block copy). Downstream treates 'full' as optional (defaulting to false), and can combine 'blockdev-snapshot-sync' and 'drive-mirror' in a single 'transaction' in order to support snapshot+mirror in addition to streaming block copy. [1]https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg01630.html I don't think it's worth trying to support both spellings at once: if you build upstream libvirt on RHEL, you lose out on the ability to control the RHEL-specific block copy naming, but then you are also capable of building upstream qemu to get the upstream spelling of the same feature (that is, if you use RHEL, you should either stick to the distro patches across the entire virt stack, or you are assumed to be capable of building the entire virt stack yourself). * src/qemu/qemu_monitor_json.c (qemuMonitorJSONCheckCommands): Check for alternate spelling. (qemuMonitorJSONDriveReopen): Use that spelling. (qemuMonitorJSONDriveMirror): Likewise, and add an argument for snapshot+mirror. * src/qemu/qemu_monitor_json.h (qemuMonitorJSONDriveMirror): Update. * src/qemu/qemu_monitor.c (qemuMonitorDriveMirror): Likewise. * src/qemu/qemu_monitor.h (qemuMonitorDriveMirror): Likewise. * src/qemu/qemu_driver.c (qemuDomainBlockCopy): Likewise. --- was 15/18 in v4 v5: a lot more invasive, now that qemu 1.1 has a firmer proposal that differs from RHEL's desire to still provide snapshot+mirror mode in contrast to streaming mirror mode. src/qemu/qemu_driver.c | 2 +- src/qemu/qemu_monitor.c | 9 +++--- src/qemu/qemu_monitor.h | 3 +- src/qemu/qemu_monitor_json.c | 57 +++++++++++++++++++++++++++++++----------- src/qemu/qemu_monitor_json.h | 3 +- 5 files changed, 52 insertions(+), 22 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index a27f6c2..dd49cb9 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -12055,7 +12055,7 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, /* Actually start the mirroring */ qemuDomainObjEnterMonitorWithDriver(driver, vm); - ret = qemuMonitorDriveMirror(priv->mon, device, dest, format, flags); + ret = qemuMonitorDriveMirror(priv->mon, NULL, device, dest, format, flags); virDomainAuditDisk(vm, NULL, dest, "mirror", ret >= 0); if (ret == 0 && bandwidth != 0) { /* Setting speed now is best-effort. */ diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 34c7926..6a050e1 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -2687,14 +2687,14 @@ qemuMonitorDiskSnapshot(qemuMonitorPtr mon, virJSONValuePtr actions, /* Add the drive-mirror action to a transaction. */ int -qemuMonitorDriveMirror(qemuMonitorPtr mon, +qemuMonitorDriveMirror(qemuMonitorPtr mon, virJSONValuePtr actions, const char *device, const char *file, const char *format, unsigned int flags) { int ret = -1; - VIR_DEBUG("mon=%p, device=%s, file=%s, format=%s, flags=%x", - mon, device, file, format, flags); + VIR_DEBUG("mon=%p, actions=%p, device=%s, file=%s, format=%s, flags=%x", + mon, actions, device, file, format, flags); if (!mon) { qemuReportError(VIR_ERR_INVALID_ARG, "%s", @@ -2703,7 +2703,8 @@ qemuMonitorDriveMirror(qemuMonitorPtr mon, } if (mon->json) - ret = qemuMonitorJSONDriveMirror(mon, device, file, format, flags); + ret = qemuMonitorJSONDriveMirror(mon, actions, device, file, format, + flags); else qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("drive-mirror requires JSON monitor")); diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 1b4b130..c8a834b 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -511,11 +511,12 @@ int qemuMonitorDiskSnapshot(qemuMonitorPtr mon, int qemuMonitorTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); int qemuMonitorDriveMirror(qemuMonitorPtr mon, + virJSONValuePtr actions, const char *device, const char *file, const char *format, unsigned int flags) - ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(3) ATTRIBUTE_NONNULL(4); int qemuMonitorDriveReopen(qemuMonitorPtr mon, const char *device, const char *file, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 6bce701..8ff1501 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -989,9 +989,9 @@ qemuMonitorJSONCheckCommands(qemuMonitorPtr mon, qemuCapsSet(qemuCaps, QEMU_CAPS_BLOCKJOB_SYNC); else if (STREQ(name, "block-job-cancel")) qemuCapsSet(qemuCaps, QEMU_CAPS_BLOCKJOB_ASYNC); - else if (STREQ(name, "drive-mirror")) + else if (STREQ(name, "__com.redhat_drive-mirror")) qemuCapsSet(qemuCaps, QEMU_CAPS_DRIVE_MIRROR); - else if (STREQ(name, "drive-reopen")) + else if (STREQ(name, "__com.redhat_drive-reopen")) qemuCapsSet(qemuCaps, QEMU_CAPS_DRIVE_REOPEN); } @@ -3205,7 +3205,7 @@ cleanup: } int -qemuMonitorJSONDriveMirror(qemuMonitorPtr mon, +qemuMonitorJSONDriveMirror(qemuMonitorPtr mon, virJSONValuePtr actions, const char *device, const char *file, const char *format, unsigned int flags) { @@ -3215,20 +3215,47 @@ qemuMonitorJSONDriveMirror(qemuMonitorPtr mon, bool shallow = (flags & VIR_DOMAIN_BLOCK_REBASE_SHALLOW) != 0; bool reuse = (flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT) != 0; - cmd = qemuMonitorJSONMakeCommand("drive-mirror", - "s:device", device, - "s:target", file, - "b:full", !shallow, - "s:mode", - reuse ? "existing" : "absolute-paths", - format ? "s:format" : NULL, format, - NULL); + /* In the RHEL build, '__com.redhat_drive-mirror' is also a part + * of 'transaction', but when used there, the 'full' argument must + * not be given; as a result, the standalone command made 'full' + * optional, defaulting to false. */ + if (actions && !shallow) + return -1; + if (shallow) { + cmd = qemuMonitorJSONMakeCommandRaw(actions != NULL, + "__com.redhat_drive-mirror", + "s:device", device, + "s:target", file, + "s:mode", + reuse ? "existing" : "absolute-paths", + format ? "s:format" : NULL, format, + NULL); + } else { + cmd = qemuMonitorJSONMakeCommand("__com.redhat_drive-mirror", + "s:device", device, + "s:target", file, + "b:full", true, + "s:mode", + reuse ? "existing" : "absolute-paths", + format ? "s:format" : NULL, format, + NULL); + } + if (!cmd) return -1; - if ((ret = qemuMonitorJSONCommand(mon, cmd, &reply)) < 0) - goto cleanup; - ret = qemuMonitorJSONCheckError(cmd, reply); + if (actions) { + if (virJSONValueArrayAppend(actions, cmd) < 0) { + virReportOOMError(); + } else { + cmd = NULL; + ret = 0; + } + } else { + if ((ret = qemuMonitorJSONCommand(mon, cmd, &reply)) < 0) + goto cleanup; + ret = qemuMonitorJSONCheckError(cmd, reply); + } cleanup: virJSONValueFree(cmd); @@ -3272,7 +3299,7 @@ qemuMonitorJSONDriveReopen(qemuMonitorPtr mon, const char *device, virJSONValuePtr cmd; virJSONValuePtr reply = NULL; - cmd = qemuMonitorJSONMakeCommand("drive-reopen", + cmd = qemuMonitorJSONMakeCommand("__com.redhat_drive-reopen", "s:device", device, "s:new-image-file", file, format ? "s:format" : NULL, format, diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index d93c37d..6306e99 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -236,11 +236,12 @@ int qemuMonitorJSONDiskSnapshot(qemuMonitorPtr mon, int qemuMonitorJSONTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); int qemuMonitorJSONDriveMirror(qemuMonitorPtr mon, + virJSONValuePtr actions, const char *device, const char *file, const char *format, unsigned int flags) - ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(3) ATTRIBUTE_NONNULL(4); int qemuMonitorJSONDriveReopen(qemuMonitorPtr mon, const char *device, const char *file, -- 1.7.7.6

This extends <domainsnapshot> XML to add a new element under each <disk> of a disk snapshot: <disk name='vda'> <source file='/path/to/live'/> <mirror file='/path/to/mirror'/> </disk> For now, if a <mirror> is requested, the snapshot must be external, and assumes the same driver format (qcow2 or qed) as the <source>. qemu allows more flexibility in mirror creation, which could possibly be added by further XML enhancements, but more likely would be better served by adding a new API to specifically target mirroring, as well as XML changes to represent the entire backing chain in <domain> rather than commandeering <domainsnapshot>. Meanwhile, the changes in this patch are sufficient for the oVirt use case of using a mirror for live storage migration. The new XML is parsed but rejected until later patches update the hypervisor drivers to act on mirror requests. The testsuite additions show that we can round-trip the XML. * docs/schemas/domainsnapshot.rng (disksnapshot): Add optional mirror. * docs/formatsnapshot.html.in: Document it. * src/conf/domain_conf.h (VIR_DOMAIN_SNAPSHOT_PARSE_MIRROR): New flag. (_virDomainSnapshotDiskDef): Add new member. * src/conf/domain_conf.c (virDomainSnapshotDiskDefParseXML): Add parameter, to parse or reject mirrors. (virDomainSnapshotDefParseString): Honor new flag. (virDomainSnapshotDefFormat): Output any mirrors. (virDomainSnapshotDiskDefClear): Clean up. * tests/domainsnapshotxml2xmltest.c (mymain): Add a test. (testCompareXMLToXMLFiles): Allow for mirrors. * tests/domainsnapshotxml2xmlout/disk_snapshot_mirror.xml: New file. * tests/domainsnapshotxml2xmlin/disk_snapshot_mirror.xml: Likewise. --- docs/formatsnapshot.html.in | 31 ++++++++++++ docs/schemas/domainsnapshot.rng | 8 +++ src/conf/domain_conf.c | 38 ++++++++++++++- src/conf/domain_conf.h | 2 + .../disk_snapshot_mirror.xml | 13 +++++ .../disk_snapshot_mirror.xml | 49 ++++++++++++++++++++ tests/domainsnapshotxml2xmltest.c | 4 +- 7 files changed, 141 insertions(+), 4 deletions(-) create mode 100644 tests/domainsnapshotxml2xmlin/disk_snapshot_mirror.xml create mode 100644 tests/domainsnapshotxml2xmlout/disk_snapshot_mirror.xml diff --git a/docs/formatsnapshot.html.in b/docs/formatsnapshot.html.in index ec5ebf3..cd39f1b 100644 --- a/docs/formatsnapshot.html.in +++ b/docs/formatsnapshot.html.in @@ -90,6 +90,30 @@ another snapshot. </p> <p> + External disk snapshots have another use of allowing live + storage migration across storage domains, while still + guaranteeing that at all points in time along the storage + migration, there exists a single storage domain with a + consistent view of the guest's data. This is done by adding the + concept of snapshot mirroring, where a snapshot request + specifies not only the new file name on the original storage + domain, but a secondary mirror file name on the target storage + domain. While the snapshot is active, reads are still tied to + the first storage domain, but all writes of changes from the + backing file are recorded in both storage domains; and the + management application can synchronize the backing files across + storage domains in parallel with the guest execution. Once the + second storage domain has all data in place, the management + application then deletes the snapshot, which stops the mirroring + and lets the hypervisor swap over to the second storage domain, + so that the first storage domain is no longer in active use. + Depending on the hypervisor, the use of a mirrored snapshots may + be restricted to transient domains, and the existence of a + mirrored snapshot can prevent migration, domain saves, disk hot + plug actions, and further snapshot manipulation until the + mirrored snapshot is deleted in order to stop the mirroring. + </p> + <p> The top-level <code>domainsnapshot</code> element may contain the following elements: </p> @@ -156,6 +180,13 @@ snapshots, the original file name becomes the read-only snapshot, and the new file name contains the read-write delta of all disk changes since the snapshot. + <span class="since">Since 0.9.11</span>, an external + snapshot may also have an optional + element <code>mirror</code>, which names a second file + alongside <code>source</code> that will mirror all changes + since the snapshot and using the same file format. A + mirror must have the <code>file</code> attribute listing + the file name to use. </dd> </dl> </dd> diff --git a/docs/schemas/domainsnapshot.rng b/docs/schemas/domainsnapshot.rng index 0ef0631..86193a7 100644 --- a/docs/schemas/domainsnapshot.rng +++ b/docs/schemas/domainsnapshot.rng @@ -121,6 +121,14 @@ <empty/> </element> </optional> + <optional> + <element name='mirror'> + <attribute name='file'> + <ref name='absFilePath'/> + </attribute> + <empty/> + </element> + </optional> </interleave> </group> </choice> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 1b3f34a..fd9fd2c 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -13393,6 +13393,7 @@ virDomainSnapshotDiskDefClear(virDomainSnapshotDiskDefPtr disk) VIR_FREE(disk->name); VIR_FREE(disk->file); VIR_FREE(disk->driverType); + VIR_FREE(disk->mirror); } void virDomainSnapshotDefFree(virDomainSnapshotDefPtr def) @@ -13414,11 +13415,13 @@ void virDomainSnapshotDefFree(virDomainSnapshotDefPtr def) static int virDomainSnapshotDiskDefParseXML(xmlNodePtr node, - virDomainSnapshotDiskDefPtr def) + virDomainSnapshotDiskDefPtr def, + bool allow_mirror) { int ret = -1; char *snapshot = NULL; xmlNodePtr cur; + xmlNodePtr mirror = NULL; def->name = virXMLPropString(node, "name"); if (!def->name) { @@ -13447,14 +13450,38 @@ virDomainSnapshotDiskDefParseXML(xmlNodePtr node, } else if (!def->driverType && xmlStrEqual(cur->name, BAD_CAST "driver")) { def->driverType = virXMLPropString(cur, "type"); + } else if (!mirror && xmlStrEqual(cur->name, BAD_CAST "mirror")) { + mirror = cur; } } cur = cur->next; } - if (!def->snapshot && (def->file || def->driverType)) + if (!def->snapshot && (def->file || def->driverType || mirror)) def->snapshot = VIR_DOMAIN_DISK_SNAPSHOT_EXTERNAL; + if (mirror) { + if (!allow_mirror) { + virDomainReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, + _("unable to handle mirroring for '%s'"), + def->name); + goto cleanup; + } + def->mirror = virXMLPropString(mirror, "file"); + if (!def->mirror) { + virDomainReportError(VIR_ERR_XML_ERROR, + _("mirror for '%s' requires file attribute"), + def->name); + goto cleanup; + } + if (def->snapshot != VIR_DOMAIN_DISK_SNAPSHOT_EXTERNAL) { + virDomainReportError(VIR_ERR_XML_ERROR, + _("mirror for '%s' requires external snapshot"), + def->name); + goto cleanup; + } + } + ret = 0; cleanup: VIR_FREE(snapshot); @@ -13484,6 +13511,7 @@ virDomainSnapshotDefParseString(const char *xmlStr, int active; char *tmp; int keepBlanksDefault = xmlKeepBlanksDefault(0); + bool allow_mirror = (flags & VIR_DOMAIN_SNAPSHOT_PARSE_MIRROR) != 0; xml = virXMLParseCtxt(NULL, xmlStr, _("(domain_snapshot)"), &ctxt); if (!xml) { @@ -13587,7 +13615,8 @@ virDomainSnapshotDefParseString(const char *xmlStr, goto cleanup; } for (i = 0; i < def->ndisks; i++) { - if (virDomainSnapshotDiskDefParseXML(nodes[i], &def->disks[i]) < 0) + if (virDomainSnapshotDiskDefParseXML(nodes[i], &def->disks[i], + allow_mirror) < 0) goto cleanup; } VIR_FREE(nodes); @@ -13852,6 +13881,9 @@ char *virDomainSnapshotDefFormat(const char *domain_uuid, if (disk->file) virBufferEscapeString(&buf, " <source file='%s'/>\n", disk->file); + if (disk->mirror) + virBufferEscapeString(&buf, " <mirror file='%s'/>\n", + disk->mirror); virBufferAddLit(&buf, " </disk>\n"); } else { virBufferAddLit(&buf, "/>\n"); diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 9d74f44..a25cd1a 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -1667,6 +1667,7 @@ struct _virDomainSnapshotDiskDef { int snapshot; /* enum virDomainDiskSnapshot */ char *file; /* new source file when snapshot is external */ char *driverType; /* file format type of new file */ + char *mirror; /* external snapshot mirror, of same file format as file */ }; /* Stores the complete snapshot metadata */ @@ -1715,6 +1716,7 @@ typedef enum { VIR_DOMAIN_SNAPSHOT_PARSE_REDEFINE = 1 << 0, VIR_DOMAIN_SNAPSHOT_PARSE_DISKS = 1 << 1, VIR_DOMAIN_SNAPSHOT_PARSE_INTERNAL = 1 << 2, + VIR_DOMAIN_SNAPSHOT_PARSE_MIRROR = 1 << 3, } virDomainSnapshotParseFlags; virDomainSnapshotDefPtr virDomainSnapshotDefParseString(const char *xmlStr, diff --git a/tests/domainsnapshotxml2xmlin/disk_snapshot_mirror.xml b/tests/domainsnapshotxml2xmlin/disk_snapshot_mirror.xml new file mode 100644 index 0000000..4085260 --- /dev/null +++ b/tests/domainsnapshotxml2xmlin/disk_snapshot_mirror.xml @@ -0,0 +1,13 @@ +<domainsnapshot> + <name>my snap name</name> + <description>!@#$%^</description> + <disks> + <disk name='hda' snapshot='external'> + <source file='/path/to/new'/> + <mirror file='/path/to/other'/> + </disk> + <disk name='hdb'> + <mirror file='/path/to/other2'/> + </disk> + </disks> +</domainsnapshot> diff --git a/tests/domainsnapshotxml2xmlout/disk_snapshot_mirror.xml b/tests/domainsnapshotxml2xmlout/disk_snapshot_mirror.xml new file mode 100644 index 0000000..c99fd5b --- /dev/null +++ b/tests/domainsnapshotxml2xmlout/disk_snapshot_mirror.xml @@ -0,0 +1,49 @@ +<domainsnapshot> + <name>my snap name</name> + <description>!@#$%^</description> + <state>disk-snapshot</state> + <creationTime>1272917631</creationTime> + <disks> + <disk name='hda' snapshot='external'> + <driver type='qcow2'/> + <source file='/path/to/new'/> + <mirror file='/path/to/other'/> + </disk> + <disk name='hdb' snapshot='external'> + <driver type='qcow2'/> + <source file='/path/to/generated'/> + <mirror file='/path/to/other2'/> + </disk> + </disks> + <domain type='qemu'> + <name>QEMUGuest1</name> + <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid> + <memory unit='KiB'>219100</memory> + <currentMemory unit='KiB'>219100</currentMemory> + <vcpu placement='static' cpuset='1-4,8-20,525'>1</vcpu> + <os> + <type arch='i686' machine='pc'>hvm</type> + <boot dev='hd'/> + </os> + <clock offset='utc'/> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> + <emulator>/usr/bin/qemu</emulator> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest1'/> + <target dev='hda' bus='ide'/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <disk type='block' device='disk'> + <source dev='/dev/HostVG/QEMUGuest2'/> + <target dev='hdb' bus='ide'/> + <address type='drive' controller='0' bus='1' target='0' unit='0'/> + </disk> + <controller type='usb' index='0'/> + <controller type='ide' index='0'/> + <memballoon model='virtio'/> + </devices> + </domain> +</domainsnapshot> diff --git a/tests/domainsnapshotxml2xmltest.c b/tests/domainsnapshotxml2xmltest.c index e363c99..07ff553 100644 --- a/tests/domainsnapshotxml2xmltest.c +++ b/tests/domainsnapshotxml2xmltest.c @@ -26,7 +26,8 @@ testCompareXMLToXMLFiles(const char *inxml, const char *uuid, int internal) int ret = -1; virDomainSnapshotDefPtr def = NULL; unsigned int flags = (VIR_DOMAIN_SNAPSHOT_PARSE_REDEFINE | - VIR_DOMAIN_SNAPSHOT_PARSE_DISKS); + VIR_DOMAIN_SNAPSHOT_PARSE_DISKS | + VIR_DOMAIN_SNAPSHOT_PARSE_MIRROR); if (virtTestLoadFile(inxml, &inXmlData) < 0) goto fail; @@ -105,6 +106,7 @@ mymain(void) DO_TEST("all_parameters", "9d37b878-a7cc-9f9a-b78f-49b3abad25a8", 1); DO_TEST("disk_snapshot", "c7a5fdbd-edaf-9455-926a-d65c16db1809", 1); + DO_TEST("disk_snapshot_mirror", "c7a5fdbd-edaf-9455-926a-d65c16db1809", 0); DO_TEST("full_domain", "c7a5fdbd-edaf-9455-926a-d65c16db1809", 1); DO_TEST("noparent_nodescription_noactive", NULL, 0); DO_TEST("noparent_nodescription", NULL, 1); -- 1.7.7.6

This completes the public API for using mirrored snapshots as a means of performing live storage migration. Of course, it will take additional patches to actually provide the implementation. The idea here is that oVirt can start with a domain with 'vda' open on /path1/to/old.qcow2 with a base file of /path1/to/common, then do the following steps for a live migration to /path2 (here using virsh commands, although the underlying API would be available to oVirt through other language bindings as well). First, pre-create two files; it is important that the mirror file have a relative backing file name (if we let qemu create the entire snapshot, the backing file name would be absolute to /path1, and pivoting to the mirror would still be using the original source): $ qemu-img create -f qcow2 -o backing_file=old.qcow2 \ -o backing_fmt=qcow2 /path1/to/old.migrate $ qemu-img create -f qcow2 -o backing_file=old.qcow2 \ -o backing_fmt=qcow2 /path2/to/new.qcow2 Next, create a mirrored snapshot, while telling qemu to respect the pre-existing qcow2 header in both files: $ virsh snapshot-create-as $dom migrate --reuse-external \ --diskspec vda,file=/path1/to/old.migrate,mirror=/path2/to/new.qcow2 which means 'vda' is now open read-write on /path1/to/old.migrate and mirrored (write only) on /path2/to/new.qcow2, where qemu is using /path1/to/old.qcow2 as the logial backing for both files, but where the relative name is still intact in /path2/to/new.qcow2. Next, the backing files can be copied between locations: $ cp /path1/to/common /path1/to/old.qcow2 /path2/to When that is complete, the mirrored snapshot is no longer necessary, and requesting a pivot while deleting things will force qemu to reread the header of /path2/to/new.qcow2, notice a relative backing file name, and open /path2/to/old.qcow2 as the logical backing file: $ virsh snapshot-delete $dom migrate --mirror-pivot Now /path1 is no longer in use, but the backing chain on /path2 is longer than original. This can be cleaned up with: $ virsh blockpull $dom vda --base /path2/to/common to go back to having /path2/to/new.qcow2 directly backed by /path2/to/common. That smells like a hack, right? Well, there is a proposal for a better, more powerful API named virDomainBlockCopy: https://www.redhat.com/archives/libvir-list/2012-March/msg00585.html which would more appropriately expose the various knobs of the new qemu commands; but being a new API, it lacks the ability to be backported without causing a .so bump. So this is the compromise. * include/libvirt/libvirt.h.in (VIR_DOMAIN_SNAPSHOT_DELETE_MIRROR_ABORT) (VIR_DOMAIN_SNAPSHOT_DELETE_MIRROR_PIVOT): New flags. * src/libvirt.c (virDomainSnapshotDelete): Document them. (virDomainSnapshotCreateXML, virDomainRevertToSnapshot) (virDomainSaveFlags): Mention effects of mirrored snapshots. * tools/virsh.c (vshParseSnapshotDiskspec): Add <mirror> support. (cmdSnapshotDelete): Add --mirror-abort, --mirror-pivot flags. * tools/virsh.pod (snapshot-delete, snapshot-create-as): Document new options. --- include/libvirt/libvirt.h.in | 6 ++++++ src/libvirt.c | 37 +++++++++++++++++++++++++++++++++++-- tools/virsh.c | 17 ++++++++++++++--- tools/virsh.pod | 21 ++++++++++++++------- 4 files changed, 69 insertions(+), 12 deletions(-) diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index aaee218..7a4d416 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -3430,6 +3430,12 @@ typedef enum { VIR_DOMAIN_SNAPSHOT_DELETE_CHILDREN = (1 << 0), /* Also delete children */ VIR_DOMAIN_SNAPSHOT_DELETE_METADATA_ONLY = (1 << 1), /* Delete just metadata */ VIR_DOMAIN_SNAPSHOT_DELETE_CHILDREN_ONLY = (1 << 2), /* Delete just children */ + VIR_DOMAIN_SNAPSHOT_DELETE_MIRROR_ABORT = (1 << 3), /* Stop a mirrored + snapshot, reopening + to the source. */ + VIR_DOMAIN_SNAPSHOT_DELETE_MIRROR_PIVOT = (1 << 4), /* Stop a mirrored + snapshot, reopening + to the mirror. */ } virDomainSnapshotDeleteFlags; int virDomainSnapshotDelete(virDomainSnapshotPtr snapshot, diff --git a/src/libvirt.c b/src/libvirt.c index ad50815..cd7469a 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -2693,6 +2693,9 @@ error: * @flags will override what state gets saved into the file. These * two flags are mutually exclusive. * + * Some hypervisors may prevent this call if there is a current + * mirrored snapshot. + * * A save file can be inspected or modified slightly with * virDomainSaveImageGetXMLDesc() and virDomainSaveImageDefineXML(). * @@ -17089,12 +17092,14 @@ virDomainSnapshotGetConnect(virDomainSnapshotPtr snapshot) * not exist, the hypervisor may validate that reverting to the * snapshot appears to be possible (for example, disk images have * snapshot contents by the requested name). Not all hypervisors - * support these flags. + * support these flags; this might also be forbidden if there is a + * current mirrored snapshot. * * If @flags includes VIR_DOMAIN_SNAPSHOT_CREATE_NO_METADATA, then the * domain's disk images are modified according to @xmlDesc, but then * the just-created snapshot has its metadata deleted. This flag is - * incompatible with VIR_DOMAIN_SNAPSHOT_CREATE_REDEFINE. + * incompatible with VIR_DOMAIN_SNAPSHOT_CREATE_REDEFINE, and may + * also prevent any @xmlDesc that tries to create a mirrored snapshot. * * If @flags includes VIR_DOMAIN_SNAPSHOT_CREATE_HALT, then the domain * will be inactive after the snapshot completes, regardless of whether @@ -17704,6 +17709,9 @@ error: * inactive snapshots with a @flags request to start the domain after * the revert. * + * Some hypervisors may prevent reverting to snapshots if there is an + * active mirrored snapshot. + * * Returns 0 if the creation is successful, -1 on error. */ int @@ -17770,6 +17778,25 @@ error: * libvirt metadata to track snapshots, then this flag is silently * ignored. * + * If the domain has a current mirrored snapshot (that is, if the XML + * given to virDomainSnapshotCreateXML() included <mirror> elements for + * a disk), then the caller must specify how to end the mirroring before + * the snapshot can be deleted. In this situation, @snap must be the + * current mirrored snapshot, and @flags must contain either + * VIR_DOMAIN_SNAPSHOT_DELETE_MIRROR_ABORT to abandon the mirror while + * keeping the primary external file, or contain + * VIR_DOMAIN_SNAPSHOT_DELETE_MIRROR_PIVOT to reopen the storage device + * using just the mirror, and abandon the primary external file. Note + * that while virDomainSnapshotCreateXML can create mirrors atomically, + * the deletion process might not be atomic; in this case, the operation + * will fail, but only after updating the snapshot object to record + * which mirrors were successfully reopened. If atomicity is important + * when using mirrored snapshots to perform live storage migration, it + * is recommended to migrate only one disk per snapshot. After using + * a mirrored snapshot to perform a storage migration, the caller may + * also want to use virDomainBlockRebase() to reduce the length of the + * backing chain that was lengthened by the snapshot process. + * * Returns 0 if the selected snapshot(s) were successfully deleted, * -1 on error. */ @@ -17803,6 +17830,12 @@ virDomainSnapshotDelete(virDomainSnapshotPtr snapshot, "mutually exclusive")); goto error; } + if ((flags & VIR_DOMAIN_SNAPSHOT_DELETE_MIRROR_ABORT) && + (flags & VIR_DOMAIN_SNAPSHOT_DELETE_MIRROR_PIVOT)) { + virLibDomainError(VIR_ERR_INVALID_ARG, + _("delete mirror flags are mutually exclusive")); + goto error; + } if (conn->driver->domainSnapshotDelete) { int ret = conn->driver->domainSnapshotDelete(snapshot, flags); diff --git a/tools/virsh.c b/tools/virsh.c index e521933..d0cdba5 100644 --- a/tools/virsh.c +++ b/tools/virsh.c @@ -16110,6 +16110,7 @@ vshParseSnapshotDiskspec(vshControl *ctl, virBufferPtr buf, const char *str) char *snapshot = NULL; char *driver = NULL; char *file = NULL; + char *mirror = NULL; char *spec = vshStrdup(ctl, str); char *tmp = spec; size_t len = strlen(str); @@ -16133,6 +16134,8 @@ vshParseSnapshotDiskspec(vshControl *ctl, virBufferPtr buf, const char *str) driver = tmp + strlen("driver="); else if (!file && STRPREFIX(tmp, "file=")) file = tmp + strlen("file="); + else if (!mirror && STRPREFIX(tmp, "mirror=")) + mirror = tmp + strlen("mirror="); else goto cleanup; } @@ -16140,12 +16143,12 @@ vshParseSnapshotDiskspec(vshControl *ctl, virBufferPtr buf, const char *str) virBufferEscapeString(buf, " <disk name='%s'", name); if (snapshot) virBufferAsprintf(buf, " snapshot='%s'", snapshot); - if (driver || file) { + if (driver || file || mirror) { virBufferAddLit(buf, ">\n"); if (driver) virBufferAsprintf(buf, " <driver type='%s'/>\n", driver); - if (file) - virBufferEscapeString(buf, " <source file='%s'/>\n", file); + virBufferEscapeString(buf, " <source file='%s'/>\n", file); + virBufferEscapeString(buf, " <mirror file='%s'/>\n", mirror); virBufferAddLit(buf, " </disk>\n"); } else { virBufferAddLit(buf, "/>\n"); @@ -17182,6 +17185,10 @@ static const vshCmdOptDef opts_snapshot_delete[] = { {"children-only", VSH_OT_BOOL, 0, N_("delete children but not snapshot")}, {"metadata", VSH_OT_BOOL, 0, N_("delete only libvirt metadata, leaving snapshot contents behind")}, + {"mirror-abort", VSH_OT_BOOL, 0, + N_("abort a mirror while removing snapshot")}, + {"mirror-pivot", VSH_OT_BOOL, 0, + N_("pivot to the mirror before removing snapshot")}, {NULL, 0, 0, NULL} }; @@ -17211,6 +17218,10 @@ cmdSnapshotDelete(vshControl *ctl, const vshCmd *cmd) flags |= VIR_DOMAIN_SNAPSHOT_DELETE_CHILDREN_ONLY; if (vshCommandOptBool(cmd, "metadata")) flags |= VIR_DOMAIN_SNAPSHOT_DELETE_METADATA_ONLY; + if (vshCommandOptBool(cmd, "mirror-abort")) + flags |= VIR_DOMAIN_SNAPSHOT_DELETE_MIRROR_ABORT; + if (vshCommandOptBool(cmd, "mirror-pivot")) + flags |= VIR_DOMAIN_SNAPSHOT_DELETE_MIRROR_PIVOT; /* XXX If we wanted, we could emulate DELETE_CHILDREN_ONLY even on * older servers that reject the flag, by manually computing the diff --git a/tools/virsh.pod b/tools/virsh.pod index 0e7617b..cb121a5 100644 --- a/tools/virsh.pod +++ b/tools/virsh.pod @@ -2430,12 +2430,12 @@ is specified, the snapshot will not include vm state. The I<--disk-only> flag is used to request a disk-only snapshot. When this flag is in use, the command can also take additional I<diskspec> arguments to add <disk> elements to the xml. Each <diskspec> is in the -form B<disk[,snapshot=type][,driver=type][,file=name]>. To include a -literal comma in B<disk> or in B<file=name>, escape it with a second -comma. A literal I<--diskspec> must preceed each B<diskspec> unless -all three of I<domain>, I<name>, and I<description> are also present. -For example, a diskspec of "vda,snapshot=external,file=/path/to,,new" -results in the following XML: +form B<disk[,snapshot=type][,driver=type][,file=name][,mirror=name]>. +To include a literal comma in B<disk>, B<file=name>, or B<mirror=name>, +escape it with a second comma. A literal I<--diskspec> must preceed +each B<diskspec> unless all three of I<domain>, I<name>, and +I<description> are also present. For example, a diskspec of +"vda,snapshot=external,file=/path/to,,new" results in the following XML: <disk name='vda' snapshot='external'> <source file='/path/to,new'/> </disk> @@ -2578,7 +2578,7 @@ with an inactive snapshot that is combined with the I<--start> or I<--pause> flag. =item B<snapshot-delete> I<domain> {I<snapshot> | I<--current>} [I<--metadata>] -[{I<--children> | I<--children-only>}] +[{I<--children> | I<--children-only>}] [{I<--mirror-abort> | I<--mirror-pivot>}] Delete the snapshot for the domain named I<snapshot>, or the current snapshot with I<--current>. If this snapshot @@ -2593,6 +2593,13 @@ maintained by libvirt, while leaving the snapshot contents intact for access by external tools; otherwise deleting a snapshot also removes the data contents from that point in time. +If the snapshot is mirrored (that is, the <domainsnapshot> XML used to +create the snapshot included an external <mirror> designation for use +in live migration of storage), then this command will fail unless the +current snapshot is being deleted, and either the I<--mirror-abort> or +I<--mirror-pivot> flag is present to control which half of the mirror +is kept active. + =back =head1 NWFILTER COMMMANDS -- 1.7.7.6

For now, disk migration via mirroring is not implemented. But when we do implement it, we have to deal with the fact that qemu does not provide an easy way to re-start a qemu process with mirroring still intact (it _might_ be possible by using qemu -S then an initial 'drive-mirror' with disk reuse before starting the domain, but that gets hairy). Even something like 'virDomainSave' becomes hairy, if you realize the implications that 'virDomainRestore' would be stuck with recreating the same mirror layout. But if we step back and look at the bigger picture, we realize that the initial client of live storage migration via disk mirroring is oVirt, which always uses transient domains, and that if a transient domain is destroyed while a mirror exists, oVirt can easily restart the storage migration by creating a new domain that visits just the source storage, with no loss in data. We can make life a lot easier by being cowards, and forbidding certain operations on a domain. This patch guarantees that there will be at most one snapshot with disk mirroring, that it will always be the current snapshot, and that the user cannot redefine that snapshot nor hot-plug or hot-unplug any domain disks - for now, the only way to delete such a snapshot is by destroying the transient domain it is attached to. A future patch will add the ability to also delete such snapshots under a condition of a flag which says how to end the mirroring with the 'drive-reopen' monitor command; and once that is in place, then we can finally implement creation of the mirroring via 'drive-mirror'. With just this patch applied, the only way you can ever get virDomainHasDiskMirror to return true is by manually playing with the libvirt internal directory and then restarting libvirtd. * src/conf/domain_conf.h (virDomainSnapshotHasDiskMirror) (virDomainHasDiskMirror): New prototypes. * src/conf/domain_conf.c (virDomainSnapshotHasDiskMirror) (virDomainHasDiskMirror): Implement them. * src/libvirt_private.syms (domain_conf.h): Export them. * src/qemu/qemu_driver.c (qemuDomainSaveInternal) (qemuDomainSnapshotCreateXML, qemuDomainRevertToSnapshot) (qemuDomainSnapshotDelete, qemuDomainAttachDeviceDiskLive) (qemuDomainDetachDeviceDiskLive): Use it. (qemuDomainSnapshotLoad): Allow libvirtd restarts with active disk mirroring, but sanity check for only a current image with mirrors. --- src/conf/domain_conf.c | 15 ++++++++++++++- src/conf/domain_conf.h | 1 + src/libvirt_private.syms | 1 + src/qemu/qemu_driver.c | 42 +++++++++++++++++++++++++++++++++++++++++- 4 files changed, 57 insertions(+), 2 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index fd9fd2c..30cea8d 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -7194,7 +7194,9 @@ virDomainHasDiskMirror(virDomainObjPtr vm) for (i = 0; i < vm->def->ndisks; i++) if (vm->def->disks[i]->mirror) return true; - return false; + if (!vm->current_snapshot) + return false; + return virDomainSnapshotHasDiskMirror(vm->current_snapshot->def); } int virDomainNetInsert(virDomainDefPtr def, virDomainNetDefPtr net) @@ -13833,6 +13835,17 @@ cleanup: return ret; } +/* Determine if the given snapshot contains a disk mirror. */ +bool +virDomainSnapshotHasDiskMirror(virDomainSnapshotDefPtr snapshot) +{ + int i; + for (i = 0; i < snapshot->ndisks; i++) + if (snapshot->disks[i].mirror) + return true; + return false; +} + char *virDomainSnapshotDefFormat(const char *domain_uuid, virDomainSnapshotDefPtr def, unsigned int flags, diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index a25cd1a..e56d7c9 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -1731,6 +1731,7 @@ char *virDomainSnapshotDefFormat(const char *domain_uuid, int virDomainSnapshotAlignDisks(virDomainSnapshotDefPtr snapshot, int default_snapshot, bool require_match); +bool virDomainSnapshotHasDiskMirror(virDomainSnapshotDefPtr snapshot); virDomainSnapshotObjPtr virDomainSnapshotAssignDef(virDomainSnapshotObjListPtr snapshots, const virDomainSnapshotDefPtr def); diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 4ed2f0c..8ecd0eb 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -447,6 +447,7 @@ virDomainSnapshotDropParent; virDomainSnapshotFindByName; virDomainSnapshotForEachChild; virDomainSnapshotForEachDescendant; +virDomainSnapshotHasDiskMirror; virDomainSnapshotObjListGetNames; virDomainSnapshotObjListGetNamesFrom; virDomainSnapshotObjListNum; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index dd49cb9..5b1ed87 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -320,7 +320,8 @@ static void qemuDomainSnapshotLoad(void *payload, char ebuf[1024]; unsigned int flags = (VIR_DOMAIN_SNAPSHOT_PARSE_REDEFINE | VIR_DOMAIN_SNAPSHOT_PARSE_DISKS | - VIR_DOMAIN_SNAPSHOT_PARSE_INTERNAL); + VIR_DOMAIN_SNAPSHOT_PARSE_INTERNAL | + VIR_DOMAIN_SNAPSHOT_PARSE_MIRROR); virDomainObjLock(vm); if (virAsprintf(&snapDir, "%s/%s", baseDir, vm->def->name) < 0) { @@ -373,6 +374,16 @@ static void qemuDomainSnapshotLoad(void *payload, VIR_FREE(xmlStr); continue; } + if (!def->current && virDomainSnapshotHasDiskMirror(def)) { + /* Someone must have hand-modified the directory; ignore them. */ + VIR_ERROR(_("Disk mirroring unexpected since snapshot file '%s' " + "does not claim to be the current snapshot"), + fullpath); + virDomainSnapshotDefFree(def); + VIR_FREE(fullpath); + VIR_FREE(xmlStr); + continue; + } snap = virDomainSnapshotAssignDef(&vm->snapshots, def); if (snap == NULL) { @@ -5117,6 +5128,12 @@ qemuDomainAttachDeviceDiskLive(virConnectPtr conn, virCgroupPtr cgroup = NULL; int ret = -1; + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain has active disk mirrors")); + goto end; + } + if (disk->driverName != NULL && !STREQ(disk->driverName, "qemu")) { qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, _("unsupported driver name '%s' for disk '%s'"), @@ -5267,6 +5284,12 @@ qemuDomainDetachDeviceDiskLive(struct qemud_driver *driver, virDomainDiskDefPtr disk = dev->data.disk; int ret = -1; + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain has active disk mirrors")); + return -1; + } + switch (disk->device) { case VIR_DOMAIN_DISK_DEVICE_DISK: case VIR_DOMAIN_DISK_DEVICE_LUN: @@ -10914,6 +10937,12 @@ static int qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, goto cleanup; } + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain has active disk mirrors")); + goto cleanup; + } + snap = virDomainSnapshotFindByName(&vm->snapshots, snapshot->name); if (!snap) { qemuReportError(VIR_ERR_NO_DOMAIN_SNAPSHOT, @@ -11288,6 +11317,17 @@ static int qemuDomainSnapshotDelete(virDomainSnapshotPtr snapshot, goto cleanup; } + if (virDomainHasDiskMirror(vm)) { + if (snap != vm->current_snapshot) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain has active disk mirrors")); + goto cleanup; + } + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("deletion of active disk mirrors unimplemented")); + goto cleanup; + } + if (!(flags & VIR_DOMAIN_SNAPSHOT_DELETE_METADATA_ONLY)) { if (!(flags & VIR_DOMAIN_SNAPSHOT_DELETE_CHILDREN_ONLY) && snap->def->state == VIR_DOMAIN_DISK_SNAPSHOT) -- 1.7.7.6

Delete a mirrored snapshot by picking which of the two files in the mirror to reopen. This is not atomic, so we update the snapshot in place as we iterate through each successful disk. Since we limited mirrored snapshots to transient domains, there is no persistent configuration to update. This mode implies deleting the snapshot metadata but preserving the new file. * src/qemu/qemu_driver.c (qemuDomainSnapshotDelete): Honor new flags. (qemuDomainSnapshotDeleteMirror): New helper function. --- src/qemu/qemu_driver.c | 85 +++++++++++++++++++++++++++++++++++++++++++++-- 1 files changed, 81 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 5b1ed87..4ce31a3 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11283,6 +11283,66 @@ qemuDomainSnapshotReparentChildren(void *payload, rep->driver->snapshotDir); } +/* Must be called while holding a job. */ +static int +qemuDomainSnapshotDeleteMirror(struct qemud_driver *driver, + virDomainObjPtr vm, + virDomainSnapshotObjPtr snap, + bool pivot) +{ + int i; + int ret = 0; + qemuDomainObjPrivatePtr priv = vm->privateData; + char *device = NULL; + const char *dest; + char *file = NULL; + + if (!virDomainObjIsActive(vm)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, + "%s", _("domain is not running")); + return -1; + } + + qemuDomainObjEnterMonitorWithDriver(driver, vm); + + for (i = 0; i < snap->def->ndisks; i++) { + if (!snap->def->disks[i].mirror) + continue; + VIR_FREE(device); + VIR_FREE(file); + dest = pivot ? snap->def->disks[i].mirror : snap->def->disks[i].file; + if (virAsprintf(&device, "drive-%s", + vm->def->disks[i]->info.alias) < 0 || + (file = strdup(dest)) == NULL) { + virReportOOMError(); + break; + } + if (qemuMonitorDriveReopen(priv->mon, device, dest, + snap->def->disks[i].driverType) < 0) + break; + /* TODO: Release lock and reset label. */ + + VIR_FREE(vm->def->disks[i]->src); + vm->def->disks[i]->src = file; + file = NULL; + if (pivot) + snap->def->disks[i].file = snap->def->disks[i].mirror; + snap->def->disks[i].mirror = NULL; + } + VIR_FREE(device); + VIR_FREE(file); + if (i < snap->def->ndisks) + ret = -1; + + qemuDomainObjExitMonitorWithDriver(driver, vm); + + if (ret < 0 && + qemuDomainSnapshotWriteMetadata(vm, snap, driver->snapshotDir) < 0) + VIR_WARN("failed writing snapshot '%s' after partial mirror deletion", + snap->def->name); + return ret; +} + static int qemuDomainSnapshotDelete(virDomainSnapshotPtr snapshot, unsigned int flags) { @@ -11293,12 +11353,16 @@ static int qemuDomainSnapshotDelete(virDomainSnapshotPtr snapshot, char uuidstr[VIR_UUID_STRING_BUFLEN]; struct qemu_snap_remove rem; struct snap_reparent rep; - bool metadata_only = !!(flags & VIR_DOMAIN_SNAPSHOT_DELETE_METADATA_ONLY); int external = 0; + bool metadata_only = !!(flags & VIR_DOMAIN_SNAPSHOT_DELETE_METADATA_ONLY); + bool mirror_abort = !!(flags & VIR_DOMAIN_SNAPSHOT_DELETE_MIRROR_ABORT); + bool mirror_pivot = !!(flags & VIR_DOMAIN_SNAPSHOT_DELETE_MIRROR_PIVOT); virCheckFlags(VIR_DOMAIN_SNAPSHOT_DELETE_CHILDREN | VIR_DOMAIN_SNAPSHOT_DELETE_METADATA_ONLY | - VIR_DOMAIN_SNAPSHOT_DELETE_CHILDREN_ONLY, -1); + VIR_DOMAIN_SNAPSHOT_DELETE_CHILDREN_ONLY | + VIR_DOMAIN_SNAPSHOT_DELETE_MIRROR_ABORT | + VIR_DOMAIN_SNAPSHOT_DELETE_MIRROR_PIVOT, -1); qemuDriverLock(driver); virUUIDFormat(snapshot->domain->uuid, uuidstr); @@ -11323,12 +11387,21 @@ static int qemuDomainSnapshotDelete(virDomainSnapshotPtr snapshot, _("domain has active disk mirrors")); goto cleanup; } + if (!(mirror_abort || mirror_pivot)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("must specify whether to abort or pivot " + "mirrors before deleting snapshot")); + goto cleanup; + } + metadata_only = true; + flags = 0; + } else if (mirror_abort || mirror_pivot) { qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", - _("deletion of active disk mirrors unimplemented")); + _("snapshot has no disk mirrors")); goto cleanup; } - if (!(flags & VIR_DOMAIN_SNAPSHOT_DELETE_METADATA_ONLY)) { + if (!metadata_only) { if (!(flags & VIR_DOMAIN_SNAPSHOT_DELETE_CHILDREN_ONLY) && snap->def->state == VIR_DOMAIN_DISK_SNAPSHOT) external++; @@ -11347,6 +11420,10 @@ static int qemuDomainSnapshotDelete(virDomainSnapshotPtr snapshot, if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_MODIFY) < 0) goto cleanup; + if ((mirror_abort || mirror_pivot) && + qemuDomainSnapshotDeleteMirror(driver, vm, snap, mirror_pivot) < 0) + goto endjob; + if (flags & (VIR_DOMAIN_SNAPSHOT_DELETE_CHILDREN | VIR_DOMAIN_SNAPSHOT_DELETE_CHILDREN_ONLY)) { rem.driver = driver; -- 1.7.7.6

The hardest part of this patch is figuring out how to provide proper security labeling and lock manager setup for the mirror, as well as rolling it all back on error. * src/qemu/qemu_driver.c (qemuDomainSnapshotCreateXML): Decide when mirrors are allowed. (qemuDomainSnapshotDiskPrepare): Prepare for mirrors. (qemuDomainSnapshotCreateSingleDiskActive): Turn on mirrors. (qemuDomainSnapshotUndoSingleDiskActive): Undo mirrors on failure. (qemuDomainSnapshotCreateDiskActive): Update caller. --- src/qemu/qemu_driver.c | 119 ++++++++++++++++++++++++++++++++++++++++++++++-- 1 files changed, 114 insertions(+), 5 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 4ce31a3..a74e606 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -9865,6 +9865,25 @@ qemuDomainSnapshotDiskPrepare(virDomainObjPtr vm, virDomainSnapshotDefPtr def, disk->name, disk->file); goto cleanup; } + if (disk->mirror) { + if (stat(disk->mirror, &st) < 0) { + if (errno != ENOENT) { + virReportSystemError(errno, + _("unable to stat mirror for " + "disk %s: %s"), + disk->name, disk->mirror); + goto cleanup; + } + } else if (!(S_ISBLK(st.st_mode) || !st.st_size || + allow_reuse)) { + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("external mirror file for disk %s " + "already exists and is not a block " + "device: %s"), + disk->name, disk->mirror); + goto cleanup; + } + } found = true; external++; break; @@ -9925,6 +9944,7 @@ qemuDomainSnapshotCreateSingleDiskActive(struct qemud_driver *driver, char *origsrc = NULL; char *origdriver = NULL; bool need_unlink = false; + bool need_unlink_mirror = false; if (snap->snapshot != VIR_DOMAIN_DISK_SNAPSHOT_EXTERNAL) { qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s", @@ -9952,13 +9972,22 @@ qemuDomainSnapshotCreateSingleDiskActive(struct qemud_driver *driver, if (fd < 0) goto cleanup; VIR_FORCE_CLOSE(fd); + + if (snap->mirror) { + fd = qemuOpenFile(driver, snap->mirror, + O_WRONLY | O_TRUNC | O_CREAT, + &need_unlink_mirror, NULL); + if (fd < 0) + goto cleanup; + VIR_FORCE_CLOSE(fd); + } } origsrc = disk->src; - disk->src = source; origdriver = disk->driverType; disk->driverType = (char *) "raw"; /* Don't want to probe backing files */ + disk->src = source; if (virDomainLockDiskAttach(driver->lockManager, vm, disk) < 0) goto cleanup; if (virSecurityManagerSetImageLabel(driver->securityManager, vm->def, @@ -9968,9 +9997,21 @@ qemuDomainSnapshotCreateSingleDiskActive(struct qemud_driver *driver, goto cleanup; } + if (snap->mirror) { + disk->src = snap->mirror; + if (virDomainLockDiskAttach(driver->lockManager, vm, disk) < 0) + goto cleanup2; + if (virSecurityManagerSetImageLabel(driver->securityManager, vm->def, + disk) < 0) { + if (virDomainLockDiskDetach(driver->lockManager, vm, disk) < 0) + VIR_WARN("Unable to release lock on %s", source); + goto cleanup2; + } + } + disk->src = origsrc; - origsrc = NULL; disk->driverType = origdriver; + origsrc = NULL; origdriver = NULL; /* create the actual snapshot */ @@ -9979,9 +10020,22 @@ qemuDomainSnapshotCreateSingleDiskActive(struct qemud_driver *driver, virDomainAuditDisk(vm, disk->src, source, "snapshot", ret >= 0); if (ret < 0) goto cleanup; + if (snap->mirror) { + ret = qemuMonitorDriveMirror(priv->mon, actions, device, snap->mirror, + driverType, + (VIR_DOMAIN_BLOCK_COPY_SHALLOW | + (reuse ? + VIR_DOMAIN_BLOCK_COPY_REUSE_EXT : 0))); + virDomainAuditDisk(vm, NULL, snap->mirror, "mirror", ret >= 0); + if (ret < 0) { + ret = -2; + goto cleanup; + } + } /* Update vm in place to match changes. */ need_unlink = false; + need_unlink_mirror = false; VIR_FREE(disk->src); disk->src = source; source = NULL; @@ -10008,12 +10062,25 @@ cleanup: } if (need_unlink && unlink(source)) VIR_WARN("unable to unlink just-created %s", source); + if (need_unlink_mirror && unlink(snap->mirror)) + VIR_WARN("unable to unlink just-created %s", snap->mirror); VIR_FREE(device); VIR_FREE(source); VIR_FREE(driverType); VIR_FREE(persistSource); VIR_FREE(persistDriverType); return ret; + +cleanup2: + /* We failed to relabel a mirror, but successfully labeled the + * destination; so we need to release the destination. */ + disk->src = source; + if (virDomainLockDiskDetach(driver->lockManager, vm, disk) < 0) + VIR_WARN("Unable to release lock on %s", source); + if (virSecurityManagerRestoreImageLabel(driver->securityManager, + vm->def, disk) < 0) + VIR_WARN("Unable to restore security label on %s", source); + goto cleanup; } /* The domain is expected to hold monitor lock. This is the @@ -10025,7 +10092,7 @@ qemuDomainSnapshotUndoSingleDiskActive(struct qemud_driver *driver, virDomainDiskDefPtr origdisk, virDomainDiskDefPtr disk, virDomainDiskDefPtr persistDisk, - bool need_unlink) + bool need_unlink, char *mirror) { char *source = NULL; char *driverType = NULL; @@ -10043,6 +10110,23 @@ qemuDomainSnapshotUndoSingleDiskActive(struct qemud_driver *driver, goto cleanup; } + if (mirror) { + char *origsrc = disk->src; + char *origdriver = disk->driverType; + disk->src = mirror; + disk->driverType = (char *) "raw"; + if (virSecurityManagerRestoreImageLabel(driver->securityManager, + vm->def, disk) < 0) + VIR_WARN("Unable to restore security label on %s", mirror); + if (virDomainLockDiskDetach(driver->lockManager, vm, disk) < 0) + VIR_WARN("Unable to release lock on %s", mirror); + if (need_unlink && stat(mirror, &st) == 0 && + st.st_size == 0 && S_ISREG(st.st_mode) && unlink(mirror) < 0) + VIR_WARN("Unable to remove just-created %s", mirror); + disk->src = origsrc; + disk->driverType = origdriver; + } + if (virSecurityManagerRestoreImageLabel(driver->securityManager, vm->def, disk) < 0) VIR_WARN("Unable to restore security label on %s", disk->src); @@ -10182,13 +10266,21 @@ qemuDomainSnapshotCreateDiskActive(virConnectPtr conn, ret = qemuMonitorTransaction(priv->mon, actions); virJSONValueFree(actions); if (ret < 0) { - /* Transaction failed; undo the changes to vm. */ + /* Transaction failed; undo the changes to vm. Creating a + * mirror takes two parts; a return of -2 tells us that we + * failed half-way through. */ + if (ret == -2) + i++; bool need_unlink = !(flags & VIR_DOMAIN_SNAPSHOT_CREATE_REUSE_EXT); while (--i >= 0) { virDomainDiskDefPtr persistDisk = NULL; + char *mirror = NULL; if (snap->def->disks[i].snapshot == VIR_DOMAIN_DISK_SNAPSHOT_NO) continue; + if (ret != -2) + mirror = snap->def->disks[i].mirror; + ret = -1; if (vm->newDef) { int indx = virDomainDiskIndexByName(vm->newDef, vm->def->disks[i]->dst, @@ -10201,7 +10293,7 @@ qemuDomainSnapshotCreateDiskActive(virConnectPtr conn, snap->def->dom->disks[i], vm->def->disks[i], persistDisk, - need_unlink); + need_unlink, mirror); } } } @@ -10269,6 +10361,7 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain, unsigned int flags) { struct qemud_driver *driver = domain->conn->privateData; + qemuDomainObjPrivatePtr priv; virDomainObjPtr vm = NULL; char *xml = NULL; virDomainSnapshotObjPtr snap = NULL; @@ -10312,6 +10405,7 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain, _("no domain with matching uuid '%s'"), uuidstr); goto cleanup; } + priv = vm->privateData; if (qemuProcessAutoDestroyActive(driver, vm)) { qemuReportError(VIR_ERR_OPERATION_INVALID, @@ -10330,6 +10424,21 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain, goto cleanup; } + /* Right now, we only permit disk mirroring when qemu is new + * enough, and only when creating a new snapshot on a transient + * domain; that way, we don't have to deal with complications like + * restoring mirroring when stopping and rebooting a domain, or + * altering the set of mirrors when redefining a snapshot. We + * require the snapshot to exist as long as mirroring is + * active. */ + if (qemuCapsGet(priv->qemuCaps, QEMU_CAPS_DRIVE_MIRROR) && + qemuCapsGet(priv->qemuCaps, QEMU_CAPS_DRIVE_REOPEN) && + !vm->persistent && + (flags & (VIR_DOMAIN_SNAPSHOT_CREATE_HALT | + VIR_DOMAIN_SNAPSHOT_CREATE_NO_METADATA | + VIR_DOMAIN_SNAPSHOT_CREATE_REDEFINE)) == 0) + parse_flags |= VIR_DOMAIN_SNAPSHOT_PARSE_MIRROR; + if (!(def = virDomainSnapshotDefParseString(xmlDesc, driver->caps, QEMU_EXPECTED_VIRT_TYPES, parse_flags))) -- 1.7.7.6
participants (4)
-
Eric Blake
-
Jiri Denemark
-
Laine Stump
-
Paolo Bonzini