[libvirt] [PATCHv6 0/8] live block migration

v5: https://www.redhat.com/archives/libvir-list/2012-April/msg00753.html Differences in v6: - rebased on top of accepted patches v5:1-4/23 and latest tree - corresponds to patches v5:7-14/23 - patch 5/8 is new - patch v5:12/23 is dropped; qemu is giving us something better, but I still need to finish writing that patch - patch v5:11/23 comments were incorporated, with better cleanup on error - tweak series to deal with potential for qemu 1.1 to support copy but not pivot - limit series to just the bare minimum for now; patches from v5:15/23 are still on my tree, but not worth submitting until we know more about what qemu will provide I'm posting this more for reference for my efforts to backport this to RHEL 6 where I have tested against a candidate qemu build, and don't this is ready for upstream until we have confirmation that actual patches have gone into at least the qemu block queue for 1.1. Eric Blake (8): blockjob: react to active block copy blockjob: add qemu capabilities related to block jobs blockjob: return appropriate event and info blockjob: support pivot operation on cancel blockjob: make drive-reopen safer blockjob: implement block copy for qemu blockjob: allow for existing files blockjob: allow mirroring under SELinux src/conf/domain_conf.c | 12 ++ src/conf/domain_conf.h | 1 + src/libvirt_private.syms | 1 + src/qemu/qemu_capabilities.c | 3 + src/qemu/qemu_capabilities.h | 2 + src/qemu/qemu_driver.c | 396 +++++++++++++++++++++++++++++++++++++++++- src/qemu/qemu_hotplug.c | 7 + src/qemu/qemu_monitor.c | 37 ++++ src/qemu/qemu_monitor.h | 11 ++ src/qemu/qemu_monitor_json.c | 67 +++++++ src/qemu/qemu_monitor_json.h | 18 ++- 11 files changed, 549 insertions(+), 6 deletions(-) -- 1.7.7.6

For now, disk migration via block copy job is not implemented. But when we do implement it, we have to deal with the fact that qemu does not provide an easy way to re-start a qemu process with mirroring still intact (it _might_ be possible by using qemu -S then an initial 'drive-mirror' with disk reuse before starting the domain, but that gets hairy). Even something like 'virDomainSave' becomes hairy, if you realize the implications that 'virDomainRestore' would be stuck with recreating the same mirror layout. But if we step back and look at the bigger picture, we realize that the initial client of live storage migration via disk mirroring is oVirt, which always uses transient domains, and that if a transient domain is destroyed while a mirror exists, oVirt can easily restart the storage migration by creating a new domain that visits just the source storage, with no loss in data. We can make life a lot easier by being cowards, and forbidding certain operations on a domain. This patch guarantees that we never get in a state where we would have to restart a domain with a mirroring block copy, by preventing saves, snapshots, hot unplug of a disk in use, and conversion to a persistent domain (thankfully, it is still relatively easy to 'virsh undefine' a running domain to temporarily make it transient, run tests on 'virsh blockcopy', then 'virsh define' to restore the persistence). The change to qemudDomainDefine looks a bit odd for undoing an assignment, rather than probing up front to avoid the assignment, but this is because of how virDomainAssignDef combines both a lookup and assignment into a single function call. * src/conf/domain_conf.h (virDomainHasDiskMirror): New prototype. * src/conf/domain_conf.c (virDomainHasDiskMirror): New function. * src/libvirt_private.syms (domain_conf.h): Export it. * src/qemu/qemu_driver.c (qemuDomainSaveInternal) (qemuDomainSnapshotCreateXML, qemuDomainRevertToSnapshot) (qemuDomainBlockJobImpl, qemudDomainDefine): Prevent dangerous actions while block copy is already in action. * src/qemu/qemu_hotplug.c (qemuDomainDetachDiskDevice): Likewise. --- v6: no real change from v5 src/conf/domain_conf.c | 12 ++++++++++++ src/conf/domain_conf.h | 1 + src/libvirt_private.syms | 1 + src/qemu/qemu_driver.c | 29 +++++++++++++++++++++++++++++ src/qemu/qemu_hotplug.c | 7 +++++++ 5 files changed, 50 insertions(+), 0 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 184ff23..c134362 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -7187,6 +7187,18 @@ virDomainDiskRemoveByName(virDomainDefPtr def, const char *name) return virDomainDiskRemove(def, i); } +/* Return true if VM has at least one disk involved in a current block + * copy job (that is, with a <mirror> element in the disk xml). */ +bool +virDomainHasDiskMirror(virDomainObjPtr vm) +{ + int i; + for (i = 0; i < vm->def->ndisks; i++) + if (vm->def->disks[i]->mirror) + return true; + return false; +} + int virDomainNetInsert(virDomainDefPtr def, virDomainNetDefPtr net) { if (VIR_REALLOC_N(def->nets, def->nnets + 1) < 0) diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 5aa8fc1..9d74f44 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -1954,6 +1954,7 @@ virDomainDiskDefPtr virDomainDiskRemove(virDomainDefPtr def, size_t i); virDomainDiskDefPtr virDomainDiskRemoveByName(virDomainDefPtr def, const char *name); +bool virDomainHasDiskMirror(virDomainObjPtr vm); int virDomainNetIndexByMac(virDomainDefPtr def, const unsigned char *mac); int virDomainNetInsert(virDomainDefPtr def, virDomainNetDefPtr net); diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index a4bd916..9d5a471 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -354,6 +354,7 @@ virDomainGraphicsSpiceZlibCompressionTypeFromString; virDomainGraphicsSpiceZlibCompressionTypeToString; virDomainGraphicsTypeFromString; virDomainGraphicsTypeToString; +virDomainHasDiskMirror; virDomainHostdevDefAlloc; virDomainHostdevDefClear; virDomainHostdevDefFree; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index c3555ca..582eafa 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -2572,6 +2572,11 @@ qemuDomainSaveInternal(struct qemud_driver *driver, virDomainPtr dom, "%s", _("domain is marked for auto destroy")); goto cleanup; } + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, "%s", + _("domain has active block copy job")); + goto cleanup; + } memset(&header, 0, sizeof(header)); memcpy(header.magic, QEMUD_SAVE_PARTIAL, sizeof(header.magic)); @@ -4964,6 +4969,12 @@ static virDomainPtr qemudDomainDefine(virConnectPtr conn, const char *xml) { goto cleanup; } def = NULL; + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, "%s", + _("domain has active block copy job")); + virDomainObjAssignDef(vm, NULL, false); + goto cleanup; + } vm->persistent = 1; if (virDomainSaveConfig(driver->configDir, @@ -10279,6 +10290,12 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain, "%s", _("domain is marked for auto destroy")); goto cleanup; } + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, "%s", + _("domain has active block copy job")); + goto cleanup; + } + if (!vm->persistent && (flags & VIR_DOMAIN_SNAPSHOT_CREATE_HALT)) { qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", _("cannot halt after transient domain snapshot")); @@ -10886,6 +10903,11 @@ static int qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, _("no domain with matching uuid '%s'"), uuidstr); goto cleanup; } + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, "%s", + _("domain has active block copy job")); + goto cleanup; + } snap = virDomainSnapshotFindByName(&vm->snapshots, snapshot->name); if (!snap) { @@ -11661,6 +11683,13 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, goto cleanup; disk = vm->def->disks[idx]; + if (mode == BLOCK_JOB_PULL && disk->mirror) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, + _("disk '%s' already in active block copy job"), + disk->dst); + goto cleanup; + } + if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_MODIFY) < 0) goto cleanup; diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 7cf7b90..4c5b863 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -1721,6 +1721,13 @@ int qemuDomainDetachDiskDevice(struct qemud_driver *driver, detach = vm->def->disks[i]; + if (detach->mirror) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, + _("disk '%s' is in an active block copy job"), + detach->dst); + goto cleanup; + } + if (qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_DEVICES)) { if (virCgroupForDomain(driver->cgroup, vm->def->name, &cgroup, 0) != 0) { qemuReportError(VIR_ERR_INTERNAL_ERROR, -- 1.7.7.6

The new block copy storage migration sequence requires new monitor commands: adding just 'drive-mirror' allows live copy, and also adding 'drive-reopen' allows live migration. Both commands have been proposed[1] for qemu 1.1, although as of the writing of this commit message, it appears that drive-mirror might make it, but drive-reopen will probably be delayed to qemu 1.2. There is consensus that the libvirt API virDomainBlockRebase as already committed for 0.9.12 is flexible enough to expose whatever qemu will eventually let us do, even if the initial libvirt 0.9.12 driver for qemu can't expose the full power of the API; also, at least RHEL 6.3 will probably be backporting an early version of drive-reopen under the name __com.redhat_drive-reopen, where the difference in naming will let us cater to any minor differences in the monitor command interface while still preserving the libvirt API at whatever point upstream qemu commits to the monitor commands. This patch assumes the bare minimum needed for live migration, but uses separate feature bits in case qemu 1.1 ends up giving us only live copy. [1]https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg01630.html * src/qemu/qemu_capabilities.h (QEMU_CAPS_DRIVE_MIRROR) (QEMU_CAPS_DRIVE_REOPEN): New bits. * src/qemu/qemu_capabilities.c (qemuCaps): Name them. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONCheckCommands): Set them. (qemuMonitorJSONDiskSnapshot): Fix formatting issues. (qemuMonitorJSONDriveMirror, qemuMonitorDriveReopen): New functions. * src/qemu/qemu_monitor_json.h (qemuMonitorJSONDriveMirror) (qemuMonitorDriveReopen): Declare them. * src/qemu/qemu_monitor.c (qemuMonitorDriveMirror) (qemuMonitorDriveReopen): New passthroughs. * src/qemu/qemu_monitor.h (qemuMonitorDriveMirror) (qemuMonitorDriveReopen): Declare them. --- v6: update commit message to point to newer qemu threads, fix outdated comment in code src/qemu/qemu_capabilities.c | 3 ++ src/qemu/qemu_capabilities.h | 2 + src/qemu/qemu_monitor.c | 37 ++++++++++++++++++++++++ src/qemu/qemu_monitor.h | 11 +++++++ src/qemu/qemu_monitor_json.c | 63 ++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_monitor_json.h | 18 ++++++++++- 6 files changed, 132 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 3d1fb43..c5cb1af 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -161,6 +161,9 @@ VIR_ENUM_IMPL(qemuCaps, QEMU_CAPS_LAST, "block-job-async", "scsi-cd", "ide-cd", + "drive-mirror", + + "drive-reopen", /* 95 */ ); struct qemu_feature_flags { diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index 7279cdb..5472a44 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -128,6 +128,8 @@ enum qemuCapsFlags { QEMU_CAPS_BLOCKJOB_ASYNC = 91, /* qemu 1.1 block-job-cancel */ QEMU_CAPS_SCSI_CD = 92, /* -device scsi-cd */ QEMU_CAPS_IDE_CD = 93, /* -device ide-cd */ + QEMU_CAPS_DRIVE_MIRROR = 94, /* drive-mirror monitor command */ + QEMU_CAPS_DRIVE_REOPEN = 95, /* drive-reopen monitor command */ QEMU_CAPS_LAST, /* this must always be the last item */ }; diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 2f66c46..25704e7 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -2685,6 +2685,25 @@ qemuMonitorDiskSnapshot(qemuMonitorPtr mon, virJSONValuePtr actions, return ret; } +/* Start a drive-mirror block job. */ +int +qemuMonitorDriveMirror(qemuMonitorPtr mon, + const char *device, const char *file, + const char *format, unsigned int flags) +{ + int ret = -1; + + VIR_DEBUG("mon=%p, device=%s, file=%s, format=%s, flags=%x", + mon, device, file, NULLSTR(format), flags); + + if (mon->json) + ret = qemuMonitorJSONDriveMirror(mon, device, file, format, flags); + else + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("drive-mirror requires JSON monitor")); + return ret; +} + /* Use the transaction QMP command to run atomic snapshot commands. */ int qemuMonitorTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) @@ -2701,6 +2720,24 @@ qemuMonitorTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) return ret; } +/* Use the drive-reopen monitor command. */ +int +qemuMonitorDriveReopen(qemuMonitorPtr mon, const char *device, + const char *file, const char *format) +{ + int ret = -1; + + VIR_DEBUG("mon=%p, device=%s, file=%s, format=%s", + mon, device, file, NULLSTR(format)); + + if (mon->json) + ret = qemuMonitorJSONDriveReopen(mon, device, file, format); + else + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("drive-reopen requires JSON monitor")); + return ret; +} + int qemuMonitorArbitraryCommand(qemuMonitorPtr mon, const char *cmd, char **reply, diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index f3cdcdd..1b4b130 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -510,6 +510,17 @@ int qemuMonitorDiskSnapshot(qemuMonitorPtr mon, bool reuse); int qemuMonitorTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); +int qemuMonitorDriveMirror(qemuMonitorPtr mon, + const char *device, + const char *file, + const char *format, + unsigned int flags) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); +int qemuMonitorDriveReopen(qemuMonitorPtr mon, + const char *device, + const char *file, + const char *format) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); int qemuMonitorArbitraryCommand(qemuMonitorPtr mon, const char *cmd, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index eb58f13..e0ea505 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -987,6 +987,10 @@ qemuMonitorJSONCheckCommands(qemuMonitorPtr mon, qemuCapsSet(qemuCaps, QEMU_CAPS_BLOCKJOB_SYNC); else if (STREQ(name, "block-job-cancel")) qemuCapsSet(qemuCaps, QEMU_CAPS_BLOCKJOB_ASYNC); + else if (STREQ(name, "drive-mirror")) + qemuCapsSet(qemuCaps, QEMU_CAPS_DRIVE_MIRROR); + else if (STREQ(name, "drive-reopen")) + qemuCapsSet(qemuCaps, QEMU_CAPS_DRIVE_REOPEN); } ret = 0; @@ -3199,6 +3203,38 @@ cleanup: } int +qemuMonitorJSONDriveMirror(qemuMonitorPtr mon, + const char *device, const char *file, + const char *format, unsigned int flags) +{ + int ret = -1; + virJSONValuePtr cmd; + virJSONValuePtr reply = NULL; + bool shallow = (flags & VIR_DOMAIN_BLOCK_REBASE_SHALLOW) != 0; + bool reuse = (flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT) != 0; + + cmd = qemuMonitorJSONMakeCommand("drive-mirror", + "s:device", device, + "s:target", file, + "b:full", !shallow, + "s:mode", + reuse ? "existing" : "absolute-paths", + format ? "s:format" : NULL, format, + NULL); + if (!cmd) + return -1; + + if ((ret = qemuMonitorJSONCommand(mon, cmd, &reply)) < 0) + goto cleanup; + ret = qemuMonitorJSONCheckError(cmd, reply); + +cleanup: + virJSONValueFree(cmd); + virJSONValueFree(reply); + return ret; +} + +int qemuMonitorJSONTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) { int ret = -1; @@ -3226,6 +3262,33 @@ cleanup: return ret; } +int +qemuMonitorJSONDriveReopen(qemuMonitorPtr mon, const char *device, + const char *file, const char *format) +{ + int ret; + virJSONValuePtr cmd; + virJSONValuePtr reply = NULL; + + cmd = qemuMonitorJSONMakeCommand("drive-reopen", + "s:device", device, + "s:new-image-file", file, + format ? "s:format" : NULL, format, + NULL); + if (!cmd) + return -1; + + if ((ret = qemuMonitorJSONCommand(mon, cmd, &reply)) < 0) + goto cleanup; + + ret = qemuMonitorJSONCheckError(cmd, reply); + +cleanup: + virJSONValueFree(cmd); + virJSONValueFree(reply); + return ret; +} + int qemuMonitorJSONArbitraryCommand(qemuMonitorPtr mon, const char *cmd_str, char **reply_str, diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index aacbb5f..d93c37d 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -230,8 +230,22 @@ int qemuMonitorJSONDiskSnapshot(qemuMonitorPtr mon, const char *device, const char *file, const char *format, - bool reuse); -int qemuMonitorJSONTransaction(qemuMonitorPtr mon, virJSONValuePtr actions); + bool reuse) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(3) + ATTRIBUTE_NONNULL(4) ATTRIBUTE_NONNULL(5); +int qemuMonitorJSONTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); +int qemuMonitorJSONDriveMirror(qemuMonitorPtr mon, + const char *device, + const char *file, + const char *format, + unsigned int flags) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); +int qemuMonitorJSONDriveReopen(qemuMonitorPtr mon, + const char *device, + const char *file, + const char *format) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); int qemuMonitorJSONArbitraryCommand(qemuMonitorPtr mon, const char *cmd_str, -- 1.7.7.6

Handle the new type of block copy event and info. Of course, this patch does nothing until a later patch actually allows the creation/abort of a block copy job. And we'd really love to have an event without having to poll for the transition between pull and mirroring, but that will have to wait for qemu. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONHandleBlockJobImpl) (qemuMonitorJSONGetBlockJobInfoOne): Translate new job type. * src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Snoop a successful info query to save effort on a pivot request. --- v6: no real change from v5 src/qemu/qemu_driver.c | 6 ++++++ src/qemu/qemu_monitor_json.c | 4 ++++ 2 files changed, 10 insertions(+), 0 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 582eafa..e1584c6 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11712,6 +11712,12 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, if (ret < 0) goto endjob; + /* Snoop block copy operations, so future cancel operations can + * avoid checking if pivot is safe. */ + if (mode == BLOCK_JOB_INFO && ret == 1 && disk->mirror && + info->cur == info->end && info->type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) + disk->mirroring = true; + /* With synchronous block cancel, we must synthesize an event, and * we silently ignore the ABORT_ASYNC flag. With asynchronous * block cancel, the event will come from qemu, but without the diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index e0ea505..6bce701 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -797,6 +797,8 @@ qemuMonitorJSONHandleBlockJobImpl(qemuMonitorPtr mon, if (STREQ(type_str, "stream")) type = VIR_DOMAIN_BLOCK_JOB_TYPE_PULL; + else if (STREQ(type_str, "mirror")) + type = VIR_DOMAIN_BLOCK_JOB_TYPE_COPY; switch ((virConnectDomainEventBlockJobStatus) event) { case VIR_DOMAIN_BLOCK_JOB_COMPLETED: @@ -3415,6 +3417,8 @@ static int qemuMonitorJSONGetBlockJobInfoOne(virJSONValuePtr entry, } if (STREQ(type, "stream")) info->type = VIR_DOMAIN_BLOCK_JOB_TYPE_PULL; + else if (STREQ(type, "mirror")) + info->type = VIR_DOMAIN_BLOCK_JOB_TYPE_COPY; else info->type = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN; -- 1.7.7.6

This is the bare minimum to end a copy job (of course, until a later patch adds the ability to start a copy job, this patch doesn't do much in isolation; I've just split the patches to ease the review). This patch intentionally avoids SELinux, lock manager, and audit actions. Also, if libvirtd restarts at the exact moment that a 'drive-reopen' is in flight, the proposed proper way to detect the outcome of that 'drive-reopen' would be to first pass in a witness fd with 'getfd', then at libvirtd restart, probe whether that file is still empty. This patch is enough to test the common case of success when used correctly, while saving the subtleties of proper cleanup for worst-case errors for later. When a mirror job is started, cancelling the job safely reverts back to the source disk, regardless of whether the destination is in phase 1 (streaming, in which case the destination is worthless) or phase 2 (mirroring, in which case the destination is synced up to the source at the time of the cancel). Our existing code does just fine in either phase, other than some bookkeeping cleanup; this implements live block copy, even if qemu 1.1 lacks 'drive-reopen'. Pivoting the job requires the use of the new 'drive-reopen' command. Here, failure of the command is potentially catastrophic to the domain, since the initial qemu implementation rips out the old disk before attempting to open the new one; qemu will attempt a recovery path of retrying the reopen on the original source, but if that also fails, the domain is hosed, with nothing libvirt can do about it. Ideas for future enhancements via new flags: If qemu 1.2 ever adds 'drive-reopen' inside 'transaction', then the problem will no longer exist (a transaction promises not to close the old file until after the new file is proven to work), at which point we would add a VIR_DOMAIN_REBASE_COPY_ATOMIC that fails up front if we detect an older qemu with the risky drive-reopen. We may also want to add a flag that fails up front if there is no reopen support at all, rather than waiting until the entire copy is done only to find out that pivot always fails. Interesting side note: while snapshot-create --disk-only creates a copy of the disk at a point in time by moving the domain on to a new file (the copy is the file now in the just-extended backing chain), blockjob --abort of a copy job creates a copy of the disk while keeping the domain on the original file. There may be potential improvements to the snapshot code to exploit block copy over multiple disks all at one point in time. And, if 'block-job-cancel' were made part of 'transaction', you could copy multiple disks at the same point in time without pausing the domain. This also implies we may want to add a --quiesce flag to virDomainBlockJobAbort, so that when breaking a mirror (whether by cancel or pivot), the side of the mirror that we are abandoning is at least in a stable state with regards to guest I/O. * src/qemu/qemu_driver.c (qemuDomainBlockJobAbort): Accept new flag. (qemuDomainBlockPivot): New helper function. (qemuDomainBlockJobImpl): Implement it. --- v6: add probe for drive-reopen at pivot time, so that it is possible to support copy but not pivot with just drive-mirror src/qemu/qemu_driver.c | 112 +++++++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 111 insertions(+), 1 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index e1584c6..e562844 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11640,6 +11640,86 @@ cleanup: return ret; } +/* Called while holding the VM job lock, to implement a block job + * abort with pivot; this updates the VM definition as appropriate, on + * either success or failure (although there are some forms of + * catastrophic failure that will leave the VM unusable). */ +static int +qemuDomainBlockPivot(struct qemud_driver *driver, virDomainObjPtr vm, + const char *device, virDomainDiskDefPtr disk) +{ + int ret = -1; + qemuDomainObjPrivatePtr priv = vm->privateData; + virDomainBlockJobInfo info; + + /* Probe the status, if needed. */ + if (!disk->mirroring) { + qemuDomainObjEnterMonitorWithDriver(driver, vm); + ret = qemuMonitorBlockJob(priv->mon, device, NULL, 0, &info, + BLOCK_JOB_INFO, true); + qemuDomainObjExitMonitorWithDriver(driver, vm); + if (ret < 0) + goto cleanup; + if (!virDomainObjIsActive(vm)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain is not running")); + goto cleanup; + } + if (ret == 1 && info.cur == info.end && + info.type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) + disk->mirroring = true; + } + + if (!disk->mirroring) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, + _("disk '%s' not ready for pivot yet"), + disk->dst); + goto cleanup; + } + + /* Attempt the pivot. */ + qemuDomainObjEnterMonitorWithDriver(driver, vm); + ret = qemuMonitorDriveReopen(priv->mon, device, disk->mirror, + disk->mirrorFormat); + qemuDomainObjExitMonitorWithDriver(driver, vm); + + /* XXX Until qemu adds support for 'drive-reopen' inside + * 'transaction', we have the remote risk of a catastrophic + * failure, where the drive-reopen fails but can't recover by + * reopening the source. Not much we can do about it. */ + + if (ret == 0) { + /* XXX We want to revoke security labels and disk lease, as + * well as audit that revocation, before dropping the original + * source. But it gets tricky if both source and mirror share + * common backing files (we want to only revoke the non-shared + * portion of the chain, and is made more difficult by the + * fact that we aren't tracking the full chain ourselves; so + * for now, we leak the access to the original. */ + VIR_FREE(disk->src); + VIR_FREE(disk->driverType); + disk->src = disk->mirror; + disk->driverType = disk->mirrorFormat; + disk->mirror = NULL; + disk->mirrorFormat = NULL; + disk->mirroring = false; + } else { + /* On failure, qemu abandons the mirror, and attempts to + * revert back to the source disk. Hopefully it was able to + * reopen things. */ + /* XXX should we be parsing the exact qemu error, or calling + * 'query-block', to see what state we really got left in + * before killing the mirroring job? And just as on the + * success case, there's security labeling to worry about. */ + VIR_FREE(disk->mirror); + VIR_FREE(disk->mirrorFormat); + disk->mirroring = false; + } + +cleanup: + return ret; +} + static int qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, unsigned long bandwidth, virDomainBlockJobInfoPtr info, @@ -11689,6 +11769,20 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, disk->dst); goto cleanup; } + if (mode == BLOCK_JOB_ABORT && + (flags & VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)) { + if (!(async && disk->mirror)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, + _("pivot of disk '%s' requires an active copy job"), + disk->dst); + goto cleanup; + } + if (!qemuCapsGet(priv->qemuCaps, QEMU_CAPS_DRIVE_REOPEN)) { + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("pivot not supported with this QEMU binary")); + goto cleanup; + } + } if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_MODIFY) < 0) goto cleanup; @@ -11699,6 +11793,12 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, goto endjob; } + if (disk->mirror && mode == BLOCK_JOB_ABORT && + (flags & VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)) { + ret = qemuDomainBlockPivot(driver, vm, device, disk); + goto endjob; + } + qemuDomainObjEnterMonitorWithDriver(driver, vm); /* XXX - libvirt should really be tracking the backing file chain * itself, and validating that base is on the chain, rather than @@ -11718,6 +11818,15 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, info->cur == info->end && info->type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) disk->mirroring = true; + /* A successful block job cancelation stops any mirroring. */ + if (mode == BLOCK_JOB_ABORT && disk->mirror) { + /* XXX We should also revoke security labels and disk lease on + * the mirror, and audit that fact, before dropping things. */ + VIR_FREE(disk->mirror); + VIR_FREE(disk->mirrorFormat); + disk->mirroring = false; + } + /* With synchronous block cancel, we must synthesize an event, and * we silently ignore the ABORT_ASYNC flag. With asynchronous * block cancel, the event will come from qemu, but without the @@ -11782,7 +11891,8 @@ cleanup: static int qemuDomainBlockJobAbort(virDomainPtr dom, const char *path, unsigned int flags) { - virCheckFlags(VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC, -1); + virCheckFlags(VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC | + VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT, -1); return qemuDomainBlockJobImpl(dom, path, NULL, 0, NULL, BLOCK_JOB_ABORT, flags); } -- 1.7.7.6

Since libvirt drops locks between issuing a monitor command and getting a response, it is possible for libvirtd to be restarted before getting a response on a drive-reopen command; worse, it is also possible for the guest to shut itself down during the window while libvirtd is down, ending the qemu process. A management app needs to know if the pivot happened (and the destination file contains guest contents not in the source) or failed (and the source file contains guest contents not in the destination), but since the job is finished, 'query-block-jobs' no longer tracks the status of the job, and if the qemu process itself has disappeared, even 'query-block' cannot be checked to ask qemu its current state. An upstream qemu proposal[1] suggested adding an optional witness fd, passed in by 'getfd', where qemu writes one byte to that file at the point of a successful pivot, and where restarting libvirtd would check the contents of the witness file. However, that requires quite a bit of libvirt work[2], especially since libvirt does not currently have a way to expose such a witness fd to the management application; and it is not guaranteed that upstream qemu will even accept the witness fd solution. [1] https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg01638.html [2] https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg02293.html Meanwhile, we can go with a simpler solution; it is not as elegant, but is something that can be implemented right away without design complications, and can be enhanced later if future qemu gives us more power. If we surround 'drive-reopen' with a pause/resume pair, then we can guarantee that the guest cannot modify either source or destination files in the window of libvirtd uncertainty, and the management app is guaranteed that either libvirt knows the outcome and reported it correctly; or that on libvirtd restart, the guest will still be paused and that the qemu process cannot have disappeared due to guest shutdown; and use that as a clue that the management app must implement recovery protocol, with both source and destination files still being in sync and with 'query-block' still being an option as part of that recovery. My testing of early implementations of 'drive-reopen' show that the pause window will typically be only a fraction of a second. * src/qemu/qemu_driver.c (qemuDomainBlockPivot): Pause around drive-reopen. (qemuDomainBlockJobImpl): Update caller. --- v6: new patch src/qemu/qemu_driver.c | 38 ++++++++++++++++++++++++++++++++++++-- 1 files changed, 36 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index e562844..adbc27d 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11645,12 +11645,14 @@ cleanup: * either success or failure (although there are some forms of * catastrophic failure that will leave the VM unusable). */ static int -qemuDomainBlockPivot(struct qemud_driver *driver, virDomainObjPtr vm, +qemuDomainBlockPivot(virConnectPtr conn, + struct qemud_driver *driver, virDomainObjPtr vm, const char *device, virDomainDiskDefPtr disk) { int ret = -1; qemuDomainObjPrivatePtr priv = vm->privateData; virDomainBlockJobInfo info; + bool resume = false; /* Probe the status, if needed. */ if (!disk->mirroring) { @@ -11677,6 +11679,30 @@ qemuDomainBlockPivot(struct qemud_driver *driver, virDomainObjPtr vm, goto cleanup; } + /* Until 'drive-reopen' is part of a 'transaction', we want to + * make sure that management apps can tell whether the command + * succeeded, even if libvirtd is restarted at the wrong time. To + * accomplish that, we pause the guest before drive-reopen, and + * resume it only when we know the outcome; if libvirtd restarts, + * then management will see the guest still paused, and know that + * no guest I/O has caused the source and mirror to diverge. + * + * XXX once 'drive-reopen' is part of 'transaction', then we don't + * need to do this; we will want to add an atomic flag that says + * whether to fail or fall back to this method for older qemu. */ + if (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) { + if (qemuProcessStopCPUs(driver, vm, VIR_DOMAIN_PAUSED_SAVE, + QEMU_ASYNC_JOB_NONE) < 0) + goto cleanup; + + resume = true; + if (!virDomainObjIsActive(vm)) { + qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("guest unexpectedly quit")); + goto cleanup; + } + } + /* Attempt the pivot. */ qemuDomainObjEnterMonitorWithDriver(driver, vm); ret = qemuMonitorDriveReopen(priv->mon, device, disk->mirror, @@ -11717,6 +11743,14 @@ qemuDomainBlockPivot(struct qemud_driver *driver, virDomainObjPtr vm, } cleanup: + if (resume && virDomainObjIsActive(vm) && + qemuProcessStartCPUs(driver, vm, conn, + VIR_DOMAIN_RUNNING_UNPAUSED, + QEMU_ASYNC_JOB_NONE) < 0 && + virGetLastError() == NULL) { + qemuReportError(VIR_ERR_OPERATION_FAILED, "%s", + _("resuming after drive-reopen failed")); + } return ret; } @@ -11795,7 +11829,7 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, if (disk->mirror && mode == BLOCK_JOB_ABORT && (flags & VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)) { - ret = qemuDomainBlockPivot(driver, vm, device, disk); + ret = qemuDomainBlockPivot(dom->conn, driver, vm, device, disk); goto endjob; } -- 1.7.7.6

Minimal patch to wire up all the pieces in the previous patches to actually enable a block copy job. By minimal, I mean that qemu creates the file (that is, no REUSE_EXT flag support yet), SELinux must be disabled, a lock manager is not informed, and the audit logs aren't updated. But those will be added as improvements in future patches. * src/qemu/qemu_driver.c (qemuDomainBlockCopy): New function. (qemuDomainBlockRebase): Call it when appropriate. --- v6: no real changes from v5 src/qemu/qemu_driver.c | 124 +++++++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 122 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index adbc27d..cebbf46 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11950,10 +11950,128 @@ qemuDomainBlockJobSetSpeed(virDomainPtr dom, const char *path, } static int +qemuDomainBlockCopy(virDomainPtr dom, const char *path, + const char *dest, const char *format, + unsigned long bandwidth, unsigned int flags) +{ + struct qemud_driver *driver = dom->conn->privateData; + virDomainObjPtr vm = NULL; + qemuDomainObjPrivatePtr priv; + char uuidstr[VIR_UUID_STRING_BUFLEN]; + char *device = NULL; + virDomainDiskDefPtr disk; + int ret = -1; + int idx; + + /* Preliminaries: find the disk we are editing, sanity checks */ + virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW, -1); + + qemuDriverLock(driver); + virUUIDFormat(dom->uuid, uuidstr); + vm = virDomainFindByUUID(&driver->domains, dom->uuid); + if (!vm) { + qemuReportError(VIR_ERR_NO_DOMAIN, + _("no domain with matching uuid '%s'"), uuidstr); + goto cleanup; + } + + device = qemuDiskPathToAlias(vm, path, &idx); + if (!device) { + goto cleanup; + } + disk = vm->def->disks[idx]; + if (disk->mirror) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, + _("disk '%s' already in active block copy job"), + disk->dst); + goto cleanup; + } + + priv = vm->privateData; + /* XXX Should we add a flag that lets the user fail up front if + * pivot (QEMU_CAPS_DRIVE_REOPEN) is not supported, rather than + * making them wait until the mirroring phase to find out? */ + if (!(qemuCapsGet(priv->qemuCaps, QEMU_CAPS_DRIVE_MIRROR) && + qemuCapsGet(priv->qemuCaps, QEMU_CAPS_BLOCKJOB_ASYNC))) { + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("block copy is not supported with this QEMU binary")); + goto cleanup; + } + if (vm->persistent) { + /* XXX if qemu ever lets us start a new domain with mirroring + * already active, we can relax this; but for now, the risk of + * 'managedsave' due to libvirt-guests means we can't risk + * this on persistent domains. */ + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain is not transient")); + goto cleanup; + } + + if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_MODIFY) < 0) + goto cleanup; + + if (!virDomainObjIsActive(vm)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain is not running")); + goto endjob; + } + + /* Prepare the destination file. */ + /* XXX We also need to add security labeling, lock manager lease, + * and auditing of those events, as well as to support reuse of + * existing images, including probing the existing format of an + * existing image. */ + if (!format) + format = disk->driverType; + if ((format && !(disk->mirrorFormat = strdup(format))) || + !(disk->mirror = strdup(dest))) { + virReportOOMError(); + goto endjob; + } + + /* Actually start the mirroring */ + qemuDomainObjEnterMonitorWithDriver(driver, vm); + ret = qemuMonitorDriveMirror(priv->mon, device, dest, format, flags); + if (ret == 0 && bandwidth != 0) + ret = qemuMonitorBlockJob(priv->mon, device, NULL, bandwidth, NULL, + BLOCK_JOB_SPEED_INTERNAL, true); + qemuDomainObjExitMonitorWithDriver(driver, vm); + +endjob: + if (ret < 0) { + VIR_FREE(disk->mirror); + VIR_FREE(disk->mirrorFormat); + } + if (qemuDomainObjEndJob(driver, vm) == 0) { + vm = NULL; + goto cleanup; + } + +cleanup: + VIR_FREE(device); + if (vm) + virDomainObjUnlock(vm); + qemuDriverUnlock(driver); + return ret; +} + +static int qemuDomainBlockRebase(virDomainPtr dom, const char *path, const char *base, unsigned long bandwidth, unsigned int flags) { - virCheckFlags(0, -1); + virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW | + VIR_DOMAIN_BLOCK_REBASE_COPY | + VIR_DOMAIN_BLOCK_REBASE_COPY_RAW, -1); + + if (flags & VIR_DOMAIN_BLOCK_REBASE_COPY) { + const char *format = NULL; + if (flags & VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) + format = "raw"; + flags &= ~(VIR_DOMAIN_BLOCK_REBASE_COPY | + VIR_DOMAIN_BLOCK_REBASE_COPY_RAW); + return qemuDomainBlockCopy(dom, path, base, format, bandwidth, flags); + } + return qemuDomainBlockJobImpl(dom, path, base, bandwidth, NULL, BLOCK_JOB_PULL, flags); } @@ -11962,7 +12080,9 @@ static int qemuDomainBlockPull(virDomainPtr dom, const char *path, unsigned long bandwidth, unsigned int flags) { - return qemuDomainBlockRebase(dom, path, NULL, bandwidth, flags); + virCheckFlags(0, -1); + return qemuDomainBlockJobImpl(dom, path, NULL, bandwidth, NULL, + BLOCK_JOB_PULL, flags); } static int -- 1.7.7.6

On 04/23/2012 08:49 PM, Eric Blake wrote:
Minimal patch to wire up all the pieces in the previous patches to actually enable a block copy job. By minimal, I mean that qemu creates the file (that is, no REUSE_EXT flag support yet), SELinux must be disabled, a lock manager is not informed, and the audit logs aren't updated. But those will be added as improvements in future patches.
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): New function. (qemuDomainBlockRebase): Call it when appropriate. ---
v6: no real changes from v5
src/qemu/qemu_driver.c | 124 +++++++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 122 insertions(+), 2 deletions(-)
Squash this in to deal with the recent change to block-stream taking a speed argument, since that fix deletes BLOCK_JOB_SPEED_INTERNAL. diff --git i/src/qemu/qemu_driver.c w/src/qemu/qemu_driver.c index 4b5c8ad..045c8da 100644 --- i/src/qemu/qemu_driver.c +++ w/src/qemu/qemu_driver.c @@ -12005,6 +12005,12 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, _("block copy is not supported with this QEMU binary")); goto cleanup; } + if (bandwidth) { + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("setting initial speed of block copy is not " + "supported with this QEMU binary")); + goto cleanup; + } if (vm->persistent) { /* XXX if qemu ever lets us start a new domain with mirroring * already active, we can relax this; but for now, the risk of @@ -12040,9 +12046,6 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, /* Actually start the mirroring */ qemuDomainObjEnterMonitorWithDriver(driver, vm); ret = qemuMonitorDriveMirror(priv->mon, device, dest, format, flags); - if (ret == 0 && bandwidth != 0) - ret = qemuMonitorBlockJob(priv->mon, device, NULL, bandwidth, NULL, - BLOCK_JOB_SPEED_INTERNAL, true); qemuDomainObjExitMonitorWithDriver(driver, vm); endjob: -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

This copies some of the checks from snapshots regarding testing when a file already exists. In the process, I noticed snapshots had hard-to-read logic, as well as a missing sanity check: REUSE_EXT should require the destination to already be present. * src/qemu/qemu_driver.c (qemuDomainBlockRebase): Allow REUSE_EXT flag. (qemuDomainBlockCopy): Wire up flag, and add some sanity checks. (qemuDomainSnapshotDiskPrepare): Require destination on REUSE_EXT. --- v6: no real change from v5 src/qemu/qemu_driver.c | 50 ++++++++++++++++++++++++++++++++++++++++++----- 1 files changed, 44 insertions(+), 6 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index cebbf46..e3d3280 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -9829,8 +9829,13 @@ qemuDomainSnapshotDiskPrepare(virDomainObjPtr vm, virDomainSnapshotDefPtr def, _("unable to stat for disk %s: %s"), disk->name, disk->file); goto cleanup; + } else if (allow_reuse) { + virReportSystemError(errno, + _("missing existing file for disk %s: %s"), + disk->name, disk->file); + goto cleanup; } - } else if (!(S_ISBLK(st.st_mode) || !st.st_size || allow_reuse)) { + } else if (!S_ISBLK(st.st_mode) && st.st_size && !allow_reuse) { qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, _("external snapshot file for disk %s already " "exists and is not a block device: %s"), @@ -11962,9 +11967,11 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, virDomainDiskDefPtr disk; int ret = -1; int idx; + struct stat st; /* Preliminaries: find the disk we are editing, sanity checks */ - virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW, -1); + virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW | + VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT, -1); qemuDriverLock(driver); virUUIDFormat(dom->uuid, uuidstr); @@ -12016,12 +12023,42 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, goto endjob; } + /* XXX this is pessimistic; we could use 'query-block' or even + * keep track of the backing chain ourselves, rather than assuming + * that all non-raw source files have a current backing image */ + if ((flags & VIR_DOMAIN_BLOCK_REBASE_SHALLOW) && + STREQ_NULLABLE(format, "raw") && + STRNEQ_NULLABLE(disk->driverType, "raw")) { + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("raw shallow copy of non-raw disk '%s' not possible"), + disk->dst); + goto endjob; + } + /* Prepare the destination file. */ + if (stat(dest, &st) < 0) { + if (errno != ENOENT) { + virReportSystemError(errno, _("unable to stat for disk %s: %s"), + disk->dst, dest); + goto endjob; + } else if (flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT) { + virReportSystemError(errno, + _("missing destination file for disk %s: %s"), + disk->dst, dest); + goto endjob; + } + } else if (!S_ISBLK(st.st_mode) && st.st_size && + !(flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)) { + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("external destination file for disk %s already " + "exists and is not a block device: %s"), + disk->dst, dest); + goto endjob; + } + /* XXX We also need to add security labeling, lock manager lease, - * and auditing of those events, as well as to support reuse of - * existing images, including probing the existing format of an - * existing image. */ - if (!format) + * and auditing of those events. */ + if (!format && !(flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)) format = disk->driverType; if ((format && !(disk->mirrorFormat = strdup(format))) || !(disk->mirror = strdup(dest))) { @@ -12060,6 +12097,7 @@ qemuDomainBlockRebase(virDomainPtr dom, const char *path, const char *base, unsigned long bandwidth, unsigned int flags) { virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW | + VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT | VIR_DOMAIN_BLOCK_REBASE_COPY | VIR_DOMAIN_BLOCK_REBASE_COPY_RAW, -1); -- 1.7.7.6

This copies heavily from qemuDomainSnapshotCreateSingleDiskActive(), in order to set the SELinux label, obtain locking manager lease, and audit the fact that we hand a new file over to qemu. Alas, releasing the lease and label at the end of the mirroring is a trickier prospect (we would have to know the backing chain of both source and destination, and be sure not to revoke rights to any part of the chain that is shared), so for now, virDomainBlockJobAbort still leaves things locked and labeled. * src/qemu/qemu_driver.c (qemuDomainBlockCopy): Set up labeling. --- v6: no real change from v8 src/qemu/qemu_driver.c | 69 +++++++++++++++++++++++++++++++++++++++++------ 1 files changed, 60 insertions(+), 9 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index e3d3280..e78a73e 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11968,6 +11968,11 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, int ret = -1; int idx; struct stat st; + bool need_unlink = false; + char *mirror = NULL; + char *mirrorFormat = NULL; + char *origsrc = NULL; + char *origdriver = NULL; /* Preliminaries: find the disk we are editing, sanity checks */ virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW | @@ -12056,29 +12061,75 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, goto endjob; } - /* XXX We also need to add security labeling, lock manager lease, - * and auditing of those events. */ - if (!format && !(flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)) - format = disk->driverType; - if ((format && !(disk->mirrorFormat = strdup(format))) || - !(disk->mirror = strdup(dest))) { + if (!(flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)) { + int fd = qemuOpenFile(driver, dest, O_WRONLY | O_TRUNC | O_CREAT, + &need_unlink, NULL); + if (fd < 0) + goto endjob; + VIR_FORCE_CLOSE(fd); + if (!format) + format = disk->driverType; + } + if ((format && !(mirrorFormat = strdup(format))) || + !(mirror = strdup(dest))) { virReportOOMError(); goto endjob; } + /* Manipulate disk in place, in a way that can be reverted on + * failure, in order to set up labeling and locking. */ + origsrc = disk->src; + disk->src = (char *) dest; + origdriver = disk->driverType; + disk->driverType = (char *) "raw"; /* Don't want to probe backing files */ + + if (virDomainLockDiskAttach(driver->lockManager, vm, disk) < 0) + goto endjob; + if (virSecurityManagerSetImageLabel(driver->securityManager, vm->def, + disk) < 0) { + if (virDomainLockDiskDetach(driver->lockManager, vm, disk) < 0) + VIR_WARN("Unable to release lock on %s", dest); + goto endjob; + } + /* Actually start the mirroring */ qemuDomainObjEnterMonitorWithDriver(driver, vm); ret = qemuMonitorDriveMirror(priv->mon, device, dest, format, flags); + virDomainAuditDisk(vm, NULL, dest, "mirror", ret >= 0); if (ret == 0 && bandwidth != 0) ret = qemuMonitorBlockJob(priv->mon, device, NULL, bandwidth, NULL, BLOCK_JOB_SPEED_INTERNAL, true); qemuDomainObjExitMonitorWithDriver(driver, vm); + if (ret < 0) { + if (virSecurityManagerRestoreImageLabel(driver->securityManager, + vm->def, disk) < 0) + VIR_WARN("Unable to restore security label on %s", dest); + if (virDomainLockDiskDetach(driver->lockManager, vm, disk) < 0) + VIR_WARN("Unable to release lock on %s", dest); + goto endjob; + } + + disk->src = origsrc; + origsrc = NULL; + disk->driverType = origdriver; + origdriver = NULL; + + /* Update vm in place to match changes. */ + need_unlink = false; + disk->mirror = mirror; + disk->mirrorFormat = mirrorFormat; + mirror = NULL; + mirrorFormat = NULL; endjob: - if (ret < 0) { - VIR_FREE(disk->mirror); - VIR_FREE(disk->mirrorFormat); + if (origsrc) { + disk->src = origsrc; + disk->driverType = origdriver; } + if (need_unlink && unlink(dest)) + VIR_WARN("unable to unlink just-created %s", dest); + VIR_FREE(mirror); + VIR_FREE(mirrorFormat); if (qemuDomainObjEndJob(driver, vm) == 0) { vm = NULL; goto cleanup; -- 1.7.7.6

On 04/23/2012 08:49 PM, Eric Blake wrote:
This copies heavily from qemuDomainSnapshotCreateSingleDiskActive(), in order to set the SELinux label, obtain locking manager lease, and audit the fact that we hand a new file over to qemu. Alas, releasing the lease and label at the end of the mirroring is a trickier prospect (we would have to know the backing chain of both source and destination, and be sure not to revoke rights to any part of the chain that is shared), so for now, virDomainBlockJobAbort still leaves things locked and labeled.
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): Set up labeling. ---
Given today's fix for snapshot, and that this code heavily copied from snapshot, I will be squashing this in: diff --git i/src/qemu/qemu_driver.c w/src/qemu/qemu_driver.c index a2f88fd..da4ad7e 100644 --- i/src/qemu/qemu_driver.c +++ w/src/qemu/qemu_driver.c @@ -12031,6 +12031,7 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, char *mirrorFormat = NULL; char *origsrc = NULL; char *origdriver = NULL; + virCgroupPtr cgroup = NULL; /* Preliminaries: find the disk we are editing, sanity checks */ virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW | @@ -12049,6 +12050,13 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, _("domain is not running")); goto cleanup; } + if (qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_DEVICES) && + virCgroupForDomain(driver->cgroup, vm->def->name, &cgroup, 0)) { + qemuReportError(VIR_ERR_INTERNAL_ERROR, + _("Unable to find cgroup for %s"), + vm->def->name); + goto cleanup; + } device = qemuDiskPathToAlias(vm, path, &idx); if (!device) { @@ -12154,8 +12162,15 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, if (virDomainLockDiskAttach(driver->lockManager, vm, disk) < 0) goto endjob; + if (cgroup && qemuSetupDiskCgroup(driver, vm, cgroup, disk) < 0) { + if (virDomainLockDiskDetach(driver->lockManager, vm, disk) < 0) + VIR_WARN("Unable to release lock on %s", dest); + goto cleanup; + } if (virSecurityManagerSetImageLabel(driver->securityManager, vm->def, disk) < 0) { + if (cgroup && qemuTeardownDiskCgroup(driver, vm, cgroup, disk) < 0) + VIR_WARN("Failed to teardown cgroup for disk path %s", dest); if (virDomainLockDiskDetach(driver->lockManager, vm, disk) < 0) VIR_WARN("Unable to release lock on %s", dest); goto endjob; @@ -12167,6 +12182,8 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, virDomainAuditDisk(vm, NULL, dest, "mirror", ret >= 0); qemuDomainObjExitMonitorWithDriver(driver, vm); if (ret < 0) { + if (cgroup && qemuTeardownDiskCgroup(driver, vm, cgroup, disk) < 0) + VIR_WARN("Failed to teardown cgroup for disk path %s", dest); if (virSecurityManagerRestoreImageLabel(driver->securityManager, vm->def, disk) < 0) VIR_WARN("Unable to restore security label on %s", dest); @@ -12202,6 +12219,8 @@ endjob: } cleanup: + if (cgroup) + virCgroupFree(&cgroup); VIR_FREE(device); if (vm) virDomainObjUnlock(vm); -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

Similar to the recent race fix for 'block-stream', it is possible to set the speed of a block copy job up front thanks to an optional 'speed' parameter to 'drive-mirror'. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONDriveMirror): Set speed at job start. * src/qemu/qemu_monitor_json.h (qemuMonitorJSONDriveMirror): Add parameter. * src/qemu/qemu_monitor.h (qemuMonitorDriveMirror): Likewise. * src/qemu/qemu_driver.c (qemuDomainBlockCopy): Adjust caller. * src/qemu/qemu_monitor.c (qemuMonitorDriveMirror): Likewise. --- v6: new patch. I haven't actually seen a qemu patch implementing this yet, but it is a logical extension of the recent block-stream patch. src/qemu/qemu_driver.c | 3 ++- src/qemu/qemu_monitor.c | 22 +++++++++++++++++----- src/qemu/qemu_monitor.h | 1 + src/qemu/qemu_monitor_json.c | 5 ++++- src/qemu/qemu_monitor_json.h | 1 + 5 files changed, 25 insertions(+), 7 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 3a9e2d0..994393a 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -12108,7 +12108,8 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, /* Actually start the mirroring */ qemuDomainObjEnterMonitorWithDriver(driver, vm); - ret = qemuMonitorDriveMirror(priv->mon, device, dest, format, flags); + ret = qemuMonitorDriveMirror(priv->mon, device, dest, format, bandwidth, + flags); virDomainAuditDisk(vm, NULL, dest, "mirror", ret >= 0); qemuDomainObjExitMonitorWithDriver(driver, vm); if (ret < 0) { diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index fc84146..0c2ab48 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -2685,19 +2685,31 @@ qemuMonitorDiskSnapshot(qemuMonitorPtr mon, virJSONValuePtr actions, return ret; } -/* Start a drive-mirror block job. */ +/* Start a drive-mirror block job. bandwidth is in MB/sec. */ int qemuMonitorDriveMirror(qemuMonitorPtr mon, const char *device, const char *file, - const char *format, unsigned int flags) + const char *format, unsigned long bandwidth, + unsigned int flags) { int ret = -1; + unsigned long long speed; + + VIR_DEBUG("mon=%p, device=%s, file=%s, format=%s, bandwidth=%luM, flags=%x", + mon, device, file, NULLSTR(format), bandwidth, flags); - VIR_DEBUG("mon=%p, device=%s, file=%s, format=%s, flags=%x", - mon, device, file, NULLSTR(format), flags); + /* Convert bandwidth MiB to bytes */ + if (bandwidth > ULLONG_MAX / 1024 / 1024) { + qemuReportError(VIR_ERR_OVERFLOW, + _("bandwidth must be less than %llu"), + ULLONG_MAX / 1024 / 1024); + return -1; + } + speed = bandwidth * 1024ULL * 1024ULL; if (mon->json) - ret = qemuMonitorJSONDriveMirror(mon, device, file, format, flags); + ret = qemuMonitorJSONDriveMirror(mon, device, file, format, speed, + flags); else qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", _("drive-mirror requires JSON monitor")); diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 9ff8578..9dd0302 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -514,6 +514,7 @@ int qemuMonitorDriveMirror(qemuMonitorPtr mon, const char *device, const char *file, const char *format, + unsigned long bandwidth, unsigned int flags) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); int qemuMonitorDriveReopen(qemuMonitorPtr mon, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 7b459ff..51b19cd 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -3204,10 +3204,12 @@ cleanup: return ret; } +/* speed is in bytes/sec */ int qemuMonitorJSONDriveMirror(qemuMonitorPtr mon, const char *device, const char *file, - const char *format, unsigned int flags) + const char *format, unsigned long long speed, + unsigned int flags) { int ret = -1; virJSONValuePtr cmd; @@ -3219,6 +3221,7 @@ qemuMonitorJSONDriveMirror(qemuMonitorPtr mon, "s:device", device, "s:target", file, "b:full", !shallow, + "U:speed", speed, "s:mode", reuse ? "existing" : "absolute-paths", format ? "s:format" : NULL, format, diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index 71bd2cf..5371384 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -239,6 +239,7 @@ int qemuMonitorJSONDriveMirror(qemuMonitorPtr mon, const char *device, const char *file, const char *format, + unsigned long long speed, unsigned int flags) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); int qemuMonitorJSONDriveReopen(qemuMonitorPtr mon, -- 1.7.7.6

On 04/26/2012 04:15 PM, Eric Blake wrote:
Similar to the recent race fix for 'block-stream', it is possible to set the speed of a block copy job up front thanks to an optional 'speed' parameter to 'drive-mirror'.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONDriveMirror): Set speed at job start. * src/qemu/qemu_monitor_json.h (qemuMonitorJSONDriveMirror): Add parameter. * src/qemu/qemu_monitor.h (qemuMonitorDriveMirror): Likewise. * src/qemu/qemu_driver.c (qemuDomainBlockCopy): Adjust caller. * src/qemu/qemu_monitor.c (qemuMonitorDriveMirror): Likewise. ---
v6: new patch. I haven't actually seen a qemu patch implementing this yet, but it is a logical extension of the recent block-stream patch.
Hm, not sure what this means compared to what you write on the top...
- VIR_DEBUG("mon=%p, device=%s, file=%s, format=%s, flags=%x", - mon, device, file, NULLSTR(format), flags); + /* Convert bandwidth MiB to bytes */ + if (bandwidth> ULLONG_MAX / 1024 / 1024) {
ULLONG_MAX / (1024 * 1024) -- but yours is also fine... ACK. Stefan

On 04/26/2012 02:38 PM, Stefan Berger wrote:
On 04/26/2012 04:15 PM, Eric Blake wrote:
Similar to the recent race fix for 'block-stream', it is possible to set the speed of a block copy job up front thanks to an optional 'speed' parameter to 'drive-mirror'.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONDriveMirror): Set speed at job start. * src/qemu/qemu_monitor_json.h (qemuMonitorJSONDriveMirror): Add parameter. * src/qemu/qemu_monitor.h (qemuMonitorDriveMirror): Likewise. * src/qemu/qemu_driver.c (qemuDomainBlockCopy): Adjust caller. * src/qemu/qemu_monitor.c (qemuMonitorDriveMirror): Likewise. ---
v6: new patch. I haven't actually seen a qemu patch implementing this yet, but it is a logical extension of the recent block-stream patch.
Hm, not sure what this means compared to what you write on the top...
It means: Paolo proposed 'drive-mirror' to upstream qemu prior to Stefan Hajnoczi's patch to 'block-stream'. qemu 1.1 has included Stefan's patch, but Paolo's is still in limbo, and needs to be rebased if it will even make qemu 1.1. But it is a logical extension to assume that when Paolo rebases his drive-mirror patches, that he will also add an optional 'speed' parameter; so as to avoid the same data race as was just fixed in 'block-stream'. And until I can point to an actual qemu commit with 'drive-mirror' (at all, not just 'drive-mirror' with speed), this patch is stalled just like the other 8 patches in this series. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

Il 26/04/2012 22:55, Eric Blake ha scritto:
Paolo proposed 'drive-mirror' to upstream qemu prior to Stefan Hajnoczi's patch to 'block-stream'. qemu 1.1 has included Stefan's patch, but Paolo's is still in limbo, and needs to be rebased if it will even make qemu 1.1. But it is a logical extension to assume that when Paolo rebases his drive-mirror patches, that he will also add an optional 'speed' parameter; so as to avoid the same data race as was just fixed in 'block-stream'.
That's correct. Paolo

Similar to 8e532d3, but for block copy. * src/qemu/qemu_driver.c (qemuDomainBlockCopy): Avoid misleading error. --- v6: new patch; although it might be worth squashing this into 6/8 when upstream qemu ever adds 'drive-mirror' src/qemu/qemu_driver.c | 5 +++++ 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index debb6d6..e3492f8 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -12001,6 +12001,11 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, _("no domain with matching uuid '%s'"), uuidstr); goto cleanup; } + if (!virDomainObjIsActive(vm)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain is not running")); + goto cleanup; + } device = qemuDiskPathToAlias(vm, path, &idx); if (!device) { -- 1.7.7.6
participants (3)
-
Eric Blake
-
Paolo Bonzini
-
Stefan Berger