[libvirt] [PATCHv7 0/9] blockjob: live block migration

v6 of these patches is here: https://www.redhat.com/archives/libvir-list/2012-April/msg01240.html The 'drive-mirror' command not only missed qemu 1.1 (back when I first posted these patches), but also missed qemu 1.2. Paolo is still working on re-posting his patches to the qemu lists (see patch 2/9 for a link to his previous posting, which has had two months of commentary). I'm still not anxious to have these commands committed to upstream libvirt until 'drive-mirror' is at least in qemu.git rather than just Paolo's tree, but thought it would be worth rebasing the series to track the things that have changed since April. In v7, I've adjusted the series to cater to Paolo's proposed QMP semantics (most of the review on his last patch to the qemu list was on the internals, and not the interface itself), as well as to allow working out-of-the-box with the RHEL 6.3 implementation of block copy. Unfortunately, since Paolo's patches are still in flux, I suspect this will need a v8 that has seen further testing with both upstream qemu.git and RHEL 6.3 git. Eric Blake (9): blockjob: react to active block copy blockjob: add qemu capabilities related to block jobs blockjob: return appropriate event and info blockjob: support pivot operation on cancel blockjob: make drive-reopen safer blockjob: implement block copy for qemu blockjob: allow for existing files blockjob: allow mirroring under SELinux and cgroup blockjob: relabel entire existing chain include/libvirt/libvirt.h.in | 1 + src/conf/domain_conf.c | 12 ++ src/conf/domain_conf.h | 1 + src/libvirt.c | 7 +- src/libvirt_private.syms | 1 + src/qemu/qemu_capabilities.c | 2 + src/qemu/qemu_capabilities.h | 2 + src/qemu/qemu_driver.c | 413 ++++++++++++++++++++++++++++++++++++++++++- src/qemu/qemu_hotplug.c | 7 + src/qemu/qemu_monitor.c | 52 ++++++ src/qemu/qemu_monitor.h | 14 ++ src/qemu/qemu_monitor_json.c | 101 ++++++++++- src/qemu/qemu_monitor_json.h | 21 ++- src/qemu/qemu_process.c | 2 + 14 files changed, 625 insertions(+), 11 deletions(-) -- 1.7.11.4

For now, disk migration via block copy job is not implemented in libvirt. But when we do implement it, we have to deal with the fact that qemu does not yet provide an easy way to re-start a qemu process with mirroring still intact. Paolo has proposed an idea for a persistent dirty bitmap that might make this possible, but until that design is complete, it's hard to say what changes libvirt would need. Even something like 'virDomainSave' becomes hairy, if you realize the implications that 'virDomainRestore' would be stuck with recreating the same mirror layout. But if we step back and look at the bigger picture, we realize that the initial client of live storage migration via disk mirroring is oVirt, which always uses transient domains, and that if a transient domain is destroyed while a mirror exists, oVirt can easily restart the storage migration by creating a new domain that visits just the source storage, with no loss in data. We can make life a lot easier by being cowards for now, forbidding certain operations on a domain. This patch guarantees that we never get in a state where we would have to restart a domain with a mirroring block copy, by preventing saves, snapshots, hot unplug of a disk in use, and conversion to a persistent domain (thankfully, it is still relatively easy to 'virsh undefine' a running domain to temporarily make it transient, run tests on 'virsh blockcopy', then 'virsh define' to restore the persistence). Later, if the qemu design is enhanced, we can relax our code. The change to qemudDomainDefine looks a bit odd for undoing an assignment, rather than probing up front to avoid the assignment, but this is because of how virDomainAssignDef combines both a lookup and assignment into a single function call. * src/conf/domain_conf.h (virDomainHasDiskMirror): New prototype. * src/conf/domain_conf.c (virDomainHasDiskMirror): New function. * src/libvirt_private.syms (domain_conf.h): Export it. * src/qemu/qemu_driver.c (qemuDomainSaveInternal) (qemuDomainSnapshotCreateXML, qemuDomainRevertToSnapshot) (qemuDomainBlockJobImpl, qemudDomainDefine): Prevent dangerous actions while block copy is already in action. * src/qemu/qemu_hotplug.c (qemuDomainDetachDiskDevice): Likewise. --- src/conf/domain_conf.c | 12 ++++++++++++ src/conf/domain_conf.h | 1 + src/libvirt_private.syms | 1 + src/qemu/qemu_driver.c | 29 +++++++++++++++++++++++++++++ src/qemu/qemu_hotplug.c | 7 +++++++ 5 files changed, 50 insertions(+) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 75756c4..b3b8a9f 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -7830,6 +7830,18 @@ virDomainDiskRemoveByName(virDomainDefPtr def, const char *name) return virDomainDiskRemove(def, i); } +/* Return true if VM has at least one disk involved in a current block + * copy job (that is, with a <mirror> element in the disk xml). */ +bool +virDomainHasDiskMirror(virDomainObjPtr vm) +{ + int i; + for (i = 0; i < vm->def->ndisks; i++) + if (vm->def->disks[i]->mirror) + return true; + return false; +} + int virDomainNetInsert(virDomainDefPtr def, virDomainNetDefPtr net) { if (VIR_REALLOC_N(def->nets, def->nnets + 1) < 0) diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index b6fb0db..5dbd8a1 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -1963,6 +1963,7 @@ virDomainDiskDefPtr virDomainDiskRemove(virDomainDefPtr def, size_t i); virDomainDiskDefPtr virDomainDiskRemoveByName(virDomainDefPtr def, const char *name); +bool virDomainHasDiskMirror(virDomainObjPtr vm); int virDomainNetIndexByMac(virDomainDefPtr def, const virMacAddrPtr mac); int virDomainNetInsert(virDomainDefPtr def, virDomainNetDefPtr net); diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 064e2a6..c786549 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -387,6 +387,7 @@ virDomainGraphicsSpiceZlibCompressionTypeFromString; virDomainGraphicsSpiceZlibCompressionTypeToString; virDomainGraphicsTypeFromString; virDomainGraphicsTypeToString; +virDomainHasDiskMirror; virDomainHostdevDefAlloc; virDomainHostdevDefClear; virDomainHostdevDefFree; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 23a7a75..df77df1 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -2697,6 +2697,11 @@ qemuDomainSaveInternal(struct qemud_driver *driver, virDomainPtr dom, "%s", _("domain is marked for auto destroy")); goto cleanup; } + if (virDomainHasDiskMirror(vm)) { + virReportError(VIR_ERR_BLOCK_COPY_ACTIVE, "%s", + _("domain has active block copy job")); + goto cleanup; + } memset(&header, 0, sizeof(header)); memcpy(header.magic, QEMUD_SAVE_PARTIAL, sizeof(header.magic)); @@ -5562,6 +5567,12 @@ static virDomainPtr qemudDomainDefine(virConnectPtr conn, const char *xml) { } } def = NULL; + if (virDomainHasDiskMirror(vm)) { + virReportError(VIR_ERR_BLOCK_COPY_ACTIVE, "%s", + _("domain has active block copy job")); + virDomainObjAssignDef(vm, NULL, false); + goto cleanup; + } vm->persistent = 1; if (virDomainSaveConfig(driver->configDir, @@ -11115,6 +11126,12 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain, "%s", _("domain is marked for auto destroy")); goto cleanup; } + if (virDomainHasDiskMirror(vm)) { + virReportError(VIR_ERR_BLOCK_COPY_ACTIVE, "%s", + _("domain has active block copy job")); + goto cleanup; + } + if (!vm->persistent && (flags & VIR_DOMAIN_SNAPSHOT_CREATE_HALT)) { virReportError(VIR_ERR_OPERATION_INVALID, "%s", _("cannot halt after transient domain snapshot")); @@ -11857,6 +11874,11 @@ static int qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, _("no domain with matching uuid '%s'"), uuidstr); goto cleanup; } + if (virDomainHasDiskMirror(vm)) { + virReportError(VIR_ERR_BLOCK_COPY_ACTIVE, "%s", + _("domain has active block copy job")); + goto cleanup; + } snap = virDomainSnapshotFindByName(vm->snapshots, snapshot->name); if (!snap) { @@ -12639,6 +12661,13 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, goto cleanup; disk = vm->def->disks[idx]; + if (mode == BLOCK_JOB_PULL && disk->mirror) { + virReportError(VIR_ERR_BLOCK_COPY_ACTIVE, + _("disk '%s' already in active block copy job"), + disk->dst); + goto cleanup; + } + if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_MODIFY) < 0) goto cleanup; diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 7dc88ba..6dde2ef 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -1776,6 +1776,13 @@ int qemuDomainDetachDiskDevice(struct qemud_driver *driver, detach = vm->def->disks[i]; + if (detach->mirror) { + virReportError(VIR_ERR_BLOCK_COPY_ACTIVE, + _("disk '%s' is in an active block copy job"), + detach->dst); + goto cleanup; + } + if (qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_DEVICES)) { if (virCgroupForDomain(driver->cgroup, vm->def->name, &cgroup, 0) != 0) { virReportError(VIR_ERR_INTERNAL_ERROR, -- 1.7.11.4

Upstream qemu 1.3 is adding two new monitor commands, 'drive-mirror' and 'block-job-complete'[1], which can drive live block copy and storage migration. Additionally, RHEL 6.3 had backported an earlier version of most of the same functionality, but under the names '__com.redhat_drive-mirror' and '__com.redhat_drive-reopen' and with slightly different JSON arguments, which we can support without too much extra effort. The libvirt API virDomainBlockRebase as already committed for 0.9.12 is flexible enough to expose the basics of block copy, but some additional features in the 'drive-mirror' qemu command, such as setting granularity or using a persistent bitmap, may later require a new libvirt API virDomainBlockCopy. The use of two capability bits allows the bulk of libvirt code to handle both RHEL 6.3 and upstream qemu 1.3 without caring about the difference in monitor commands used; one bit that says whether the commands are present, and another saying whether the RHEL 6.3 limitations are in effect. The choice of XML names for the capability bits is essential for interoperability with the RHEL 6.3 version of libvirt. [TODO - link to Paolo's v2 post in September, once he posts it] [1]https://lists.gnu.org/archive/html/qemu-devel/2012-07/msg03082.html * src/qemu/qemu_capabilities.h (QEMU_CAPS_DRIVE_MIRROR) (QEMU_CAPS_DRIVE_REOPEN): New bits. * src/qemu/qemu_capabilities.c (qemuCaps): Name them. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONCheckCommands): Set them. (qemuMonitorJSONDriveMirror, qemuMonitorDrivePivot): New functions. * src/qemu/qemu_monitor_json.h (qemuMonitorJSONDriveMirror) (qemuMonitorDrivePivot): Declare them. * src/qemu/qemu_monitor.c (qemuMonitorDriveMirror) (qemuMonitorDrivePivot): New passthroughs. * src/qemu/qemu_monitor.h (qemuMonitorDriveMirror) (qemuMonitorDrivePivot): Declare them. --- src/qemu/qemu_capabilities.c | 2 ++ src/qemu/qemu_capabilities.h | 2 ++ src/qemu/qemu_monitor.c | 52 ++++++++++++++++++++++++++++ src/qemu/qemu_monitor.h | 14 ++++++++ src/qemu/qemu_monitor_json.c | 82 ++++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_monitor_json.h | 21 ++++++++++-- 6 files changed, 171 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 43645bf..fcf1e47 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -177,6 +177,8 @@ VIR_ENUM_IMPL(qemuCaps, QEMU_CAPS_LAST, "disable-s4", /* 105 */ "usb-redir.filter", + "drive-mirror", + "drive-reopen", ); struct _qemuCaps { diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index c1519ed..f04451f 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -142,6 +142,8 @@ enum qemuCapsFlags { QEMU_CAPS_DISABLE_S3 = 104, /* S3 BIOS Advertisement on/off */ QEMU_CAPS_DISABLE_S4 = 105, /* S4 BIOS Advertisement on/off */ QEMU_CAPS_USB_REDIR_FILTER = 106, /* usb-redir.filter */ + QEMU_CAPS_DRIVE_MIRROR = 107, /* drive-mirror monitor command */ + QEMU_CAPS_DRIVE_REOPEN = 108, /* drive-reopen instead of block-job-complete */ QEMU_CAPS_LAST, /* this must always be the last item */ }; diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index f8d717f..b7730fd 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -2749,6 +2749,39 @@ qemuMonitorDiskSnapshot(qemuMonitorPtr mon, virJSONValuePtr actions, return ret; } +/* Start a drive-mirror block job. bandwidth is in MB/sec. */ +int +qemuMonitorDriveMirror(qemuMonitorPtr mon, + const char *device, const char *file, + const char *format, unsigned long bandwidth, + bool reopen, unsigned int flags) +{ + int ret = -1; + unsigned long long speed; + + VIR_DEBUG("mon=%p, device=%s, file=%s, format=%s, bandwidth=%ld, " + "reopen=%d, flags=%x", + mon, device, file, NULLSTR(format), bandwidth, reopen, flags); + + /* Convert bandwidth MiB to bytes */ + speed = bandwidth; + if (speed > ULLONG_MAX / 1024 / 1024) { + virReportError(VIR_ERR_OVERFLOW, + _("bandwidth must be less than %llu"), + ULLONG_MAX / 1024 / 1024); + return -1; + } + speed = bandwidth * 1024ULL * 1024ULL; + + if (mon->json) + ret = qemuMonitorJSONDriveMirror(mon, device, file, format, speed, + reopen, flags); + else + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("drive-mirror requires JSON monitor")); + return ret; +} + /* Use the transaction QMP command to run atomic snapshot commands. */ int qemuMonitorTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) @@ -2765,6 +2798,25 @@ qemuMonitorTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) return ret; } +/* Use the block-job-complete or drive-reopen monitor command to pivot + * a block copy job. */ +int +qemuMonitorDrivePivot(qemuMonitorPtr mon, const char *device, + const char *file, const char *format, bool reopen) +{ + int ret = -1; + + VIR_DEBUG("mon=%p, device=%s, file=%s, format=%s, reopen=%d", + mon, device, file, NULLSTR(format), reopen); + + if (mon->json) + ret = qemuMonitorJSONDrivePivot(mon, device, file, format, reopen); + else + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("drive pivot requires JSON monitor")); + return ret; +} + int qemuMonitorArbitraryCommand(qemuMonitorPtr mon, const char *cmd, char **reply, diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index d44e93c..e37dac8 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -530,6 +530,20 @@ int qemuMonitorDiskSnapshot(qemuMonitorPtr mon, bool reuse); int qemuMonitorTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); +int qemuMonitorDriveMirror(qemuMonitorPtr mon, + const char *device, + const char *file, + const char *format, + unsigned long bandwidth, + bool reopen, + unsigned int flags) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); +int qemuMonitorDrivePivot(qemuMonitorPtr mon, + const char *device, + const char *file, + const char *format, + bool reopen) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); int qemuMonitorArbitraryCommand(qemuMonitorPtr mon, const char *cmd, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index ed18a64..a223c08 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -1012,6 +1012,10 @@ qemuMonitorJSONCheckCommands(qemuMonitorPtr mon, qemuCapsSet(caps, QEMU_CAPS_BLOCKJOB_ASYNC); else if (STREQ(name, "dump-guest-memory")) qemuCapsSet(caps, QEMU_CAPS_DUMP_GUEST_MEMORY); + else if (strstr(name, "drive-mirror")) + qemuCapsSet(caps, QEMU_CAPS_DRIVE_MIRROR); + else if (STREQ(name, "__com.redhat_drive-reopen")) + qemuCapsSet(caps, QEMU_CAPS_DRIVE_REOPEN); } ret = 0; @@ -3304,6 +3308,52 @@ cleanup: return ret; } +/* speed is in bytes/sec */ +int +qemuMonitorJSONDriveMirror(qemuMonitorPtr mon, + const char *device, const char *file, + const char *format, unsigned long long speed, + bool reopen, unsigned int flags) +{ + int ret = -1; + virJSONValuePtr cmd; + virJSONValuePtr reply = NULL; + bool shallow = (flags & VIR_DOMAIN_BLOCK_REBASE_SHALLOW) != 0; + bool reuse = (flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT) != 0; + + if (reopen) + cmd = qemuMonitorJSONMakeCommand("__com.redhat_drive-mirror", + "s:device", device, + "s:target", file, + "U:speed", speed, + "b:full", !shallow, + "s:mode", + reuse ? "existing" : "absolute-paths", + format ? "s:format" : NULL, format, + NULL); + else + cmd = qemuMonitorJSONMakeCommand("drive-mirror", + "s:device", device, + "s:target", file, + "U:speed", speed, + "s:sync", shallow ? "top" : "full", + "s:mode", + reuse ? "existing" : "absolute-paths", + format ? "s:format" : NULL, format, + NULL); + if (!cmd) + return -1; + + if ((ret = qemuMonitorJSONCommand(mon, cmd, &reply)) < 0) + goto cleanup; + ret = qemuMonitorJSONCheckError(cmd, reply); + +cleanup: + virJSONValueFree(cmd); + virJSONValueFree(reply); + return ret; +} + int qemuMonitorJSONTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) { @@ -3332,6 +3382,38 @@ cleanup: return ret; } +int +qemuMonitorJSONDrivePivot(qemuMonitorPtr mon, const char *device, + const char *file, const char *format, bool reopen) +{ + int ret; + virJSONValuePtr cmd; + virJSONValuePtr reply = NULL; + + if (reopen) + cmd = qemuMonitorJSONMakeCommand("__com.redhat_drive-reopen", + "s:device", device, + "s:new-image-file", file, + format ? "s:format" : NULL, format, + NULL); + else + cmd = qemuMonitorJSONMakeCommand("block-job-complete", + "s:device", device, + NULL); + if (!cmd) + return -1; + + if ((ret = qemuMonitorJSONCommand(mon, cmd, &reply)) < 0) + goto cleanup; + + ret = qemuMonitorJSONCheckError(cmd, reply); + +cleanup: + virJSONValueFree(cmd); + virJSONValueFree(reply); + return ret; +} + int qemuMonitorJSONArbitraryCommand(qemuMonitorPtr mon, const char *cmd_str, char **reply_str, diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index d092b88..bdcf819 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -237,8 +237,25 @@ int qemuMonitorJSONDiskSnapshot(qemuMonitorPtr mon, const char *device, const char *file, const char *format, - bool reuse); -int qemuMonitorJSONTransaction(qemuMonitorPtr mon, virJSONValuePtr actions); + bool reuse) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(3) + ATTRIBUTE_NONNULL(4) ATTRIBUTE_NONNULL(5); +int qemuMonitorJSONTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); +int qemuMonitorJSONDriveMirror(qemuMonitorPtr mon, + const char *device, + const char *file, + const char *format, + unsigned long long speed, + bool reopen, + unsigned int flags) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); +int qemuMonitorJSONDrivePivot(qemuMonitorPtr mon, + const char *device, + const char *file, + const char *format, + bool reopen) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); int qemuMonitorJSONArbitraryCommand(qemuMonitorPtr mon, const char *cmd_str, -- 1.7.11.4

Handle the new type of block copy event and info. Of course, this patch does nothing until a later patch actually allows the creation/abort of a block copy job. The new event is available in qemu 1.3, but not in RHEL 6.3; rather than doing polling ourselves to synthesize the event in RHEL 6.3, we just document that the event might not occur. * include/libvirt/libvirt.h.in (VIR_DOMAIN_BLOCK_JOB_READY): New block job status. * src/libvirt.c (virDomainBlockRebase): Document the event. * src/qemu/qemu_monitor_json.c (eventHandlers): New event. (qemuMonitorJSONHandleBlockJobReady): New function. (qemuMonitorJSONGetBlockJobInfoOne): Translate new job type. (qemuMonitorJSONHandleBlockJobImpl): Handle new event and job type. * src/qemu/qemu_process.c (qemuProcessHandleBlockJob): Recognize the event to minimize snooping. * src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Snoop a successful info query to save effort on a pivot request. --- include/libvirt/libvirt.h.in | 1 + src/libvirt.c | 7 ++++--- src/qemu/qemu_driver.c | 6 ++++++ src/qemu/qemu_monitor_json.c | 19 +++++++++++++++++-- src/qemu/qemu_process.c | 2 ++ 5 files changed, 30 insertions(+), 5 deletions(-) diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index ca04f6c..ab7f3fe 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -3884,6 +3884,7 @@ typedef enum { VIR_DOMAIN_BLOCK_JOB_COMPLETED = 0, VIR_DOMAIN_BLOCK_JOB_FAILED = 1, VIR_DOMAIN_BLOCK_JOB_CANCELED = 2, + VIR_DOMAIN_BLOCK_JOB_READY = 3, #ifdef VIR_ENUM_SENTINELS VIR_DOMAIN_BLOCK_JOB_LAST diff --git a/src/libvirt.c b/src/libvirt.c index 6e25baf..05724c3 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -18828,9 +18828,10 @@ error: * mirror all further changes to both source and destination. The user * must call virDomainBlockJobAbort() to end the mirroring while choosing * whether to revert to source or pivot to the destination. An event is - * issued when the job ends, and in the future, an event may be added when - * the job transitions from pulling to mirroring. If the job is aborted, - * a new job will have to start over from the beginning of the first phase. + * issued when the job ends, and depending on the hypervisor, an event may + * also be issued when the job transitions from pulling to mirroring. If + * the job is aborted, a new job will have to start over from the beginning + * of the first phase. * * Some hypervisors will restrict certain actions, such as virDomainSave() * or virDomainDetachDevice(), while a copy job is active; they may diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index df77df1..e8f42f2 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -12687,6 +12687,12 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, if (ret < 0) goto endjob; + /* Snoop block copy operations, so future cancel operations can + * avoid checking if pivot is safe. */ + if (mode == BLOCK_JOB_INFO && ret == 1 && disk->mirror && + info->cur == info->end && info->type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) + disk->mirroring = true; + /* With synchronous block cancel, we must synthesize an event, and * we silently ignore the ABORT_ASYNC flag. With asynchronous * block cancel, the event will come from qemu, but without the diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index a223c08..b23be1e 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -69,6 +69,7 @@ static void qemuMonitorJSONHandlePMWakeup(qemuMonitorPtr mon, virJSONValuePtr da static void qemuMonitorJSONHandlePMSuspend(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleBlockJobCompleted(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleBlockJobCanceled(qemuMonitorPtr mon, virJSONValuePtr data); +static void qemuMonitorJSONHandleBlockJobReady(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleBalloonChange(qemuMonitorPtr mon, virJSONValuePtr data); typedef struct { @@ -81,6 +82,7 @@ static qemuEventHandler eventHandlers[] = { { "BLOCK_IO_ERROR", qemuMonitorJSONHandleIOError, }, { "BLOCK_JOB_CANCELLED", qemuMonitorJSONHandleBlockJobCanceled, }, { "BLOCK_JOB_COMPLETED", qemuMonitorJSONHandleBlockJobCompleted, }, + { "BLOCK_JOB_READY", qemuMonitorJSONHandleBlockJobReady, }, { "DEVICE_TRAY_MOVED", qemuMonitorJSONHandleTrayChange, }, { "POWERDOWN", qemuMonitorJSONHandlePowerdown, }, { "RESET", qemuMonitorJSONHandleReset, }, @@ -803,6 +805,8 @@ qemuMonitorJSONHandleBlockJobImpl(qemuMonitorPtr mon, if (STREQ(type_str, "stream")) type = VIR_DOMAIN_BLOCK_JOB_TYPE_PULL; + else if (STREQ(type_str, "mirror")) + type = VIR_DOMAIN_BLOCK_JOB_TYPE_COPY; switch ((virConnectDomainEventBlockJobStatus) event) { case VIR_DOMAIN_BLOCK_JOB_COMPLETED: @@ -811,11 +815,12 @@ qemuMonitorJSONHandleBlockJobImpl(qemuMonitorPtr mon, event = VIR_DOMAIN_BLOCK_JOB_FAILED; break; case VIR_DOMAIN_BLOCK_JOB_CANCELED: + case VIR_DOMAIN_BLOCK_JOB_READY: break; case VIR_DOMAIN_BLOCK_JOB_FAILED: case VIR_DOMAIN_BLOCK_JOB_LAST: - VIR_DEBUG("should not get here"); - break; + VIR_DEBUG("should not get here"); + break; } out: @@ -879,6 +884,14 @@ qemuMonitorJSONHandleBlockJobCanceled(qemuMonitorPtr mon, } static void +qemuMonitorJSONHandleBlockJobReady(qemuMonitorPtr mon, + virJSONValuePtr data) +{ + qemuMonitorJSONHandleBlockJobImpl(mon, data, + VIR_DOMAIN_BLOCK_JOB_READY); +} + +static void qemuMonitorJSONHandleBalloonChange(qemuMonitorPtr mon, virJSONValuePtr data) { @@ -3530,6 +3543,8 @@ static int qemuMonitorJSONGetBlockJobInfoOne(virJSONValuePtr entry, } if (STREQ(type, "stream")) info->type = VIR_DOMAIN_BLOCK_JOB_TYPE_PULL; + else if (STREQ(type, "mirror")) + info->type = VIR_DOMAIN_BLOCK_JOB_TYPE_COPY; else info->type = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN; diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 5ac1d2b..7eac7db 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -913,6 +913,8 @@ qemuProcessHandleBlockJob(qemuMonitorPtr mon ATTRIBUTE_UNUSED, if (disk) { path = disk->src; event = virDomainEventBlockJobNewFromObj(vm, path, type, status); + if (disk->mirror && status == VIR_DOMAIN_BLOCK_JOB_READY) + disk->mirroring = true; } virDomainObjUnlock(vm); -- 1.7.11.4

This is the bare minimum to end a copy job (of course, until a later patch adds the ability to start a copy job, this patch doesn't do much in isolation; I've just split the patches to ease the review). This patch intentionally avoids SELinux, lock manager, and audit actions. Also, if libvirtd restarts at the exact moment that a 'block-job-complete' is in flight, the proposed proper way to detect the outcome of that would be with some additional query commands for qemu 1.3 (RHEL 6.3 is out of luck). This patch is enough to test the common case of success when used correctly, while saving the subtleties of proper cleanup for worst-case errors for later. When a mirror job is started, cancelling the job safely reverts back to the source disk, regardless of whether the destination is in phase 1 (streaming, in which case the destination is worthless) or phase 2 (mirroring, in which case the destination is synced up to the source at the time of the cancel). Our existing code does just fine in either phase, other than some bookkeeping cleanup; this implements live block copy. Pivoting the job requires the qemu 1.3 'block-job-complete' (safe) or the RHEL 6.3 '__com.redhat_drive-reopen' command (where failure of the command is potentially catastrophic to the domain, since it rips out the old disk before attempting to open the new one). Ideas for future enhancements via new flags: Since qemu 1.3 is safer than RHEL 6.3, it may be worth adding a VIR_DOMAIN_REBASE_COPY_ATOMIC flag that fails up front if we detect an older qemu with the risky pivot operation. Interesting side note: while snapshot-create --disk-only creates a copy of the disk at a point in time by moving the domain on to a new file (the copy is the file now in the just-extended backing chain), blockjob --abort of a copy job creates a copy of the disk while keeping the domain on the original file. There may be potential improvements to the snapshot code to exploit block copy over multiple disks all at one point in time. And, if 'block-job-cancel' were made part of 'transaction', you could copy multiple disks at the same point in time without pausing the domain. This also implies we may want to add a --quiesce flag to virDomainBlockJobAbort, so that when breaking a mirror (whether by cancel or pivot), the side of the mirror that we are abandoning is at least in a stable state with regards to guest I/O. * src/qemu/qemu_driver.c (qemuDomainBlockJobAbort): Accept new flag. (qemuDomainBlockPivot): New helper function. (qemuDomainBlockJobImpl): Implement it. --- src/qemu/qemu_driver.c | 107 ++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 106 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index e8f42f2..df581b4 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -12607,6 +12607,87 @@ cleanup: return ret; } +/* Called while holding the VM job lock, to implement a block job + * abort with pivot; this updates the VM definition as appropriate, on + * either success or failure (although there are some forms of + * catastrophic failure that will leave the VM unusable). */ +static int +qemuDomainBlockPivot(struct qemud_driver *driver, virDomainObjPtr vm, + const char *device, virDomainDiskDefPtr disk) +{ + int ret = -1; + qemuDomainObjPrivatePtr priv = vm->privateData; + virDomainBlockJobInfo info; + bool reopen = qemuCapsGet(priv->caps, QEMU_CAPS_DRIVE_REOPEN); + + /* Probe the status, if needed. */ + if (!disk->mirroring) { + qemuDomainObjEnterMonitorWithDriver(driver, vm); + ret = qemuMonitorBlockJob(priv->mon, device, NULL, 0, &info, + BLOCK_JOB_INFO, true); + qemuDomainObjExitMonitorWithDriver(driver, vm); + if (ret < 0) + goto cleanup; + if (!virDomainObjIsActive(vm)) { + virReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain is not running")); + goto cleanup; + } + if (ret == 1 && info.cur == info.end && + info.type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) + disk->mirroring = true; + } + + if (!disk->mirroring) { + virReportError(VIR_ERR_BLOCK_COPY_ACTIVE, + _("disk '%s' not ready for pivot yet"), + disk->dst); + goto cleanup; + } + + /* Attempt the pivot. */ + qemuDomainObjEnterMonitorWithDriver(driver, vm); + ret = qemuMonitorDrivePivot(priv->mon, device, disk->mirror, + disk->mirrorFormat, reopen); + qemuDomainObjExitMonitorWithDriver(driver, vm); + + /* Note that RHEL 6.3 'drive-reopen' has the remote risk of a + * catastrophic failure, where the it fails but can't recover by + * reopening the source. Not much we can do about it. qemu 1.3 + * 'block-job-complete' is safer by design. */ + + if (ret == 0) { + /* XXX We want to revoke security labels and disk lease, as + * well as audit that revocation, before dropping the original + * source. But it gets tricky if both source and mirror share + * common backing files (we want to only revoke the non-shared + * portion of the chain, and is made more difficult by the + * fact that we aren't tracking the full chain ourselves; so + * for now, we leak the access to the original. */ + VIR_FREE(disk->src); + VIR_FREE(disk->driverType); + disk->src = disk->mirror; + disk->driverType = disk->mirrorFormat; + disk->mirror = NULL; + disk->mirrorFormat = NULL; + disk->mirroring = false; + } else { + /* On failure, qemu abandons the mirror, and attempts to + * revert back to the source disk. Hopefully it was able to + * reopen things. */ + /* XXX should we be parsing the exact qemu error, or calling + * 'query-block', to see what state we really got left in + * before killing the mirroring job? And just as on the + * success case, there's security labeling to worry about. */ + VIR_FREE(disk->mirror); + VIR_FREE(disk->mirrorFormat); + disk->mirroring = false; + } + +cleanup: + return ret; +} + static int qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, unsigned long bandwidth, virDomainBlockJobInfoPtr info, @@ -12667,6 +12748,14 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, disk->dst); goto cleanup; } + if (mode == BLOCK_JOB_ABORT && + (flags & VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT) && + !(async && disk->mirror)) { + virReportError(VIR_ERR_OPERATION_INVALID, + _("pivot of disk '%s' requires an active copy job"), + disk->dst); + goto cleanup; + } if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_MODIFY) < 0) goto cleanup; @@ -12677,6 +12766,12 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, goto endjob; } + if (disk->mirror && mode == BLOCK_JOB_ABORT && + (flags & VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)) { + ret = qemuDomainBlockPivot(driver, vm, device, disk); + goto endjob; + } + qemuDomainObjEnterMonitorWithDriver(driver, vm); /* XXX - libvirt should really be tracking the backing file chain * itself, and validating that base is on the chain, rather than @@ -12693,6 +12788,15 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, info->cur == info->end && info->type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) disk->mirroring = true; + /* A successful block job cancelation stops any mirroring. */ + if (mode == BLOCK_JOB_ABORT && disk->mirror) { + /* XXX We should also revoke security labels and disk lease on + * the mirror, and audit that fact, before dropping things. */ + VIR_FREE(disk->mirror); + VIR_FREE(disk->mirrorFormat); + disk->mirroring = false; + } + /* With synchronous block cancel, we must synthesize an event, and * we silently ignore the ABORT_ASYNC flag. With asynchronous * block cancel, the event will come from qemu, but without the @@ -12757,7 +12861,8 @@ cleanup: static int qemuDomainBlockJobAbort(virDomainPtr dom, const char *path, unsigned int flags) { - virCheckFlags(VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC, -1); + virCheckFlags(VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC | + VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT, -1); return qemuDomainBlockJobImpl(dom, path, NULL, 0, NULL, BLOCK_JOB_ABORT, flags); } -- 1.7.11.4

Since libvirt drops locks between issuing a monitor command and getting a response, it is possible for libvirtd to be restarted before getting a response on a drive-reopen command; worse, it is also possible for the guest to shut itself down during the window while libvirtd is down, ending the qemu process. A management app needs to know if the pivot happened (and the destination file contains guest contents not in the source) or failed (and the source file contains guest contents not in the destination), but since the job is finished, 'query-block-jobs' no longer tracks the status of the job, and if the qemu process itself has disappeared, even 'query-block' cannot be checked to ask qemu its current state. This is only a problem for the RHEL 6.3 drive-reopen command; which partly explains why upstream qemu 1.3 abandoned that command and went with block-job-complete instead. If we surround 'drive-reopen' with a pause/resume pair, then we can guarantee that the guest cannot modify either source or destination files in the window of libvirtd uncertainty, and the management app is guaranteed that either libvirt knows the outcome and reported it correctly; or that on libvirtd restart, the guest will still be paused and that the qemu process cannot have disappeared due to guest shutdown; and use that as a clue that the management app must implement recovery protocol, with both source and destination files still being in sync and with 'query-block' still being an option as part of that recovery. My testing of early implementations of 'drive-reopen' show that the pause window will typically be only a fraction of a second. * src/qemu/qemu_driver.c (qemuDomainBlockPivot): Pause around drive-reopen. (qemuDomainBlockJobImpl): Update caller. --- src/qemu/qemu_driver.c | 35 +++++++++++++++++++++++++++++++++-- 1 file changed, 33 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index df581b4..4838871 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -12612,13 +12612,15 @@ cleanup: * either success or failure (although there are some forms of * catastrophic failure that will leave the VM unusable). */ static int -qemuDomainBlockPivot(struct qemud_driver *driver, virDomainObjPtr vm, +qemuDomainBlockPivot(virConnectPtr conn, + struct qemud_driver *driver, virDomainObjPtr vm, const char *device, virDomainDiskDefPtr disk) { int ret = -1; qemuDomainObjPrivatePtr priv = vm->privateData; virDomainBlockJobInfo info; bool reopen = qemuCapsGet(priv->caps, QEMU_CAPS_DRIVE_REOPEN); + bool resume = false; /* Probe the status, if needed. */ if (!disk->mirroring) { @@ -12645,6 +12647,27 @@ qemuDomainBlockPivot(struct qemud_driver *driver, virDomainObjPtr vm, goto cleanup; } + /* If we are using the older 'drive-reopen', we want to make sure + * that management apps can tell whether the command succeeded, + * even if libvirtd is restarted at the wrong time. To accomplish + * that, we pause the guest before drive-reopen, and resume it + * only when we know the outcome; if libvirtd restarts, then + * management will see the guest still paused, and know that no + * guest I/O has caused the source and mirror to diverge. With + * the newer 'block-job-complete', things are already safe. */ + if (reopen && virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) { + if (qemuProcessStopCPUs(driver, vm, VIR_DOMAIN_PAUSED_SAVE, + QEMU_ASYNC_JOB_NONE) < 0) + goto cleanup; + + resume = true; + if (!virDomainObjIsActive(vm)) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("guest unexpectedly quit")); + goto cleanup; + } + } + /* Attempt the pivot. */ qemuDomainObjEnterMonitorWithDriver(driver, vm); ret = qemuMonitorDrivePivot(priv->mon, device, disk->mirror, @@ -12685,6 +12708,14 @@ qemuDomainBlockPivot(struct qemud_driver *driver, virDomainObjPtr vm, } cleanup: + if (resume && virDomainObjIsActive(vm) && + qemuProcessStartCPUs(driver, vm, conn, + VIR_DOMAIN_RUNNING_UNPAUSED, + QEMU_ASYNC_JOB_NONE) < 0 && + virGetLastError() == NULL) { + virReportError(VIR_ERR_OPERATION_FAILED, "%s", + _("resuming after drive-reopen failed")); + } return ret; } @@ -12768,7 +12799,7 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, if (disk->mirror && mode == BLOCK_JOB_ABORT && (flags & VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)) { - ret = qemuDomainBlockPivot(driver, vm, device, disk); + ret = qemuDomainBlockPivot(dom->conn, driver, vm, device, disk); goto endjob; } -- 1.7.11.4

Il 16/09/2012 01:53, Eric Blake ha scritto:
This is only a problem for the RHEL 6.3 drive-reopen command; which partly explains why upstream qemu 1.3 abandoned that command and went with block-job-complete instead.
Not really true, in fact it's worse with block-job-complete because it's asynchronous and can take longer. However, for QEMU 1.3 we'll provide a persistent dirty bitmap that will let you catch this when restarting libvirt. Paolo

Minimal patch to wire up all the pieces in the previous patches to actually enable a block copy job. By minimal, I mean that qemu creates the file (that is, no REUSE_EXT flag support yet), SELinux must be disabled, a lock manager is not informed, the audit logs aren't updated, and bandwidth is not supported. But those will be added as improvements in future patches. * src/qemu/qemu_driver.c (qemuDomainBlockCopy): New function. (qemuDomainBlockRebase): Call it when appropriate. --- src/qemu/qemu_driver.c | 126 ++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 124 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 4838871..da34f14 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -12917,10 +12917,130 @@ qemuDomainBlockJobSetSpeed(virDomainPtr dom, const char *path, } static int +qemuDomainBlockCopy(virDomainPtr dom, const char *path, + const char *dest, const char *format, + unsigned long bandwidth, unsigned int flags) +{ + struct qemud_driver *driver = dom->conn->privateData; + virDomainObjPtr vm = NULL; + qemuDomainObjPrivatePtr priv; + char uuidstr[VIR_UUID_STRING_BUFLEN]; + char *device = NULL; + virDomainDiskDefPtr disk; + int ret = -1; + int idx; + bool reopen; + + /* Preliminaries: find the disk we are editing, sanity checks */ + virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW, -1); + + qemuDriverLock(driver); + virUUIDFormat(dom->uuid, uuidstr); + vm = virDomainFindByUUID(&driver->domains, dom->uuid); + if (!vm) { + virReportError(VIR_ERR_NO_DOMAIN, + _("no domain with matching uuid '%s'"), uuidstr); + goto cleanup; + } + if (!virDomainObjIsActive(vm)) { + virReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain is not running")); + goto cleanup; + } + + device = qemuDiskPathToAlias(vm, path, &idx); + if (!device) { + goto cleanup; + } + disk = vm->def->disks[idx]; + if (disk->mirror) { + virReportError(VIR_ERR_BLOCK_COPY_ACTIVE, + _("disk '%s' already in active block copy job"), + disk->dst); + goto cleanup; + } + + priv = vm->privateData; + reopen = qemuCapsGet(priv->caps, QEMU_CAPS_DRIVE_REOPEN); + if (!(qemuCapsGet(priv->caps, QEMU_CAPS_DRIVE_MIRROR) && + qemuCapsGet(priv->caps, QEMU_CAPS_BLOCKJOB_ASYNC))) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s", + _("block copy is not supported with this QEMU binary")); + goto cleanup; + } + if (vm->persistent) { + /* XXX if qemu ever lets us start a new domain with mirroring + * already active, we can relax this; but for now, the risk of + * 'managedsave' due to libvirt-guests means we can't risk + * this on persistent domains. */ + virReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain is not transient")); + goto cleanup; + } + + if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_MODIFY) < 0) + goto cleanup; + + if (!virDomainObjIsActive(vm)) { + virReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain is not running")); + goto endjob; + } + + /* Prepare the destination file. */ + /* XXX We also need to add security labeling, lock manager lease, + * and auditing of those events, as well as to support reuse of + * existing images, including probing the existing format of an + * existing image. */ + if (!format) + format = disk->driverType; + if ((format && !(disk->mirrorFormat = strdup(format))) || + !(disk->mirror = strdup(dest))) { + virReportOOMError(); + goto endjob; + } + + /* Actually start the mirroring */ + qemuDomainObjEnterMonitorWithDriver(driver, vm); + ret = qemuMonitorDriveMirror(priv->mon, device, dest, format, bandwidth, + reopen, flags); + qemuDomainObjExitMonitorWithDriver(driver, vm); + +endjob: + if (ret < 0) { + VIR_FREE(disk->mirror); + VIR_FREE(disk->mirrorFormat); + } + if (qemuDomainObjEndJob(driver, vm) == 0) { + vm = NULL; + goto cleanup; + } + +cleanup: + VIR_FREE(device); + if (vm) + virDomainObjUnlock(vm); + qemuDriverUnlock(driver); + return ret; +} + +static int qemuDomainBlockRebase(virDomainPtr dom, const char *path, const char *base, unsigned long bandwidth, unsigned int flags) { - virCheckFlags(0, -1); + virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW | + VIR_DOMAIN_BLOCK_REBASE_COPY | + VIR_DOMAIN_BLOCK_REBASE_COPY_RAW, -1); + + if (flags & VIR_DOMAIN_BLOCK_REBASE_COPY) { + const char *format = NULL; + if (flags & VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) + format = "raw"; + flags &= ~(VIR_DOMAIN_BLOCK_REBASE_COPY | + VIR_DOMAIN_BLOCK_REBASE_COPY_RAW); + return qemuDomainBlockCopy(dom, path, base, format, bandwidth, flags); + } + return qemuDomainBlockJobImpl(dom, path, base, bandwidth, NULL, BLOCK_JOB_PULL, flags); } @@ -12929,7 +13049,9 @@ static int qemuDomainBlockPull(virDomainPtr dom, const char *path, unsigned long bandwidth, unsigned int flags) { - return qemuDomainBlockRebase(dom, path, NULL, bandwidth, flags); + virCheckFlags(0, -1); + return qemuDomainBlockJobImpl(dom, path, NULL, bandwidth, NULL, + BLOCK_JOB_PULL, flags); } static int -- 1.7.11.4

This copies some of the checks from snapshots regarding testing when a file already exists. In the process, I noticed snapshots had hard-to-read logic, as well as a missing sanity check: REUSE_EXT should require the destination to already be present. * src/qemu/qemu_driver.c (qemuDomainBlockRebase): Allow REUSE_EXT flag. (qemuDomainBlockCopy): Wire up flag, and add some sanity checks. (qemuDomainSnapshotDiskPrepare): Require destination on REUSE_EXT. --- src/qemu/qemu_driver.c | 50 ++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 44 insertions(+), 6 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index da34f14..0afb1c8 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -10641,8 +10641,13 @@ qemuDomainSnapshotDiskPrepare(virDomainObjPtr vm, virDomainSnapshotDefPtr def, _("unable to stat for disk %s: %s"), disk->name, disk->file); goto cleanup; + } else if (allow_reuse) { + virReportSystemError(errno, + _("missing existing file for disk %s: %s"), + disk->name, disk->file); + goto cleanup; } - } else if (!(S_ISBLK(st.st_mode) || !st.st_size || allow_reuse)) { + } else if (!S_ISBLK(st.st_mode) && st.st_size && !allow_reuse) { virReportError(VIR_ERR_CONFIG_UNSUPPORTED, _("external snapshot file for disk %s already " "exists and is not a block device: %s"), @@ -12930,9 +12935,11 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, int ret = -1; int idx; bool reopen; + struct stat st; /* Preliminaries: find the disk we are editing, sanity checks */ - virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW, -1); + virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW | + VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT, -1); qemuDriverLock(driver); virUUIDFormat(dom->uuid, uuidstr); @@ -12987,12 +12994,42 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, goto endjob; } + /* XXX this is pessimistic; we could use 'query-block' or even + * keep track of the backing chain ourselves, rather than assuming + * that all non-raw source files have a current backing image */ + if ((flags & VIR_DOMAIN_BLOCK_REBASE_SHALLOW) && + STREQ_NULLABLE(format, "raw") && + STRNEQ_NULLABLE(disk->driverType, "raw")) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("raw shallow copy of non-raw disk '%s' not possible"), + disk->dst); + goto endjob; + } + /* Prepare the destination file. */ + if (stat(dest, &st) < 0) { + if (errno != ENOENT) { + virReportSystemError(errno, _("unable to stat for disk %s: %s"), + disk->dst, dest); + goto endjob; + } else if (flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT) { + virReportSystemError(errno, + _("missing destination file for disk %s: %s"), + disk->dst, dest); + goto endjob; + } + } else if (!S_ISBLK(st.st_mode) && st.st_size && + !(flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)) { + virReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("external destination file for disk %s already " + "exists and is not a block device: %s"), + disk->dst, dest); + goto endjob; + } + /* XXX We also need to add security labeling, lock manager lease, - * and auditing of those events, as well as to support reuse of - * existing images, including probing the existing format of an - * existing image. */ - if (!format) + * and auditing of those events. */ + if (!format && !(flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)) format = disk->driverType; if ((format && !(disk->mirrorFormat = strdup(format))) || !(disk->mirror = strdup(dest))) { @@ -13029,6 +13066,7 @@ qemuDomainBlockRebase(virDomainPtr dom, const char *path, const char *base, unsigned long bandwidth, unsigned int flags) { virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW | + VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT | VIR_DOMAIN_BLOCK_REBASE_COPY | VIR_DOMAIN_BLOCK_REBASE_COPY_RAW, -1); -- 1.7.11.4

This copies heavily from qemuDomainSnapshotCreateSingleDiskActive(), in order to set the SELinux label, obtain locking manager lease, permit a block device through cgroups, and audit the fact that we hand a new file over to qemu. Alas, releasing the lease and label at the end of the mirroring is a trickier prospect (we would have to know the backing chain of both source and destination, and be sure not to revoke rights to any part of the chain that is shared), so for now, virDomainBlockJobAbort still leaves things locked by the manager, open to cgroups, and labeled by SELinux. * src/qemu/qemu_driver.c (qemuDomainBlockCopy): Set up labeling. --- src/qemu/qemu_driver.c | 88 ++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 79 insertions(+), 9 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 0afb1c8..b9014c3 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -12936,6 +12936,12 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, int idx; bool reopen; struct stat st; + bool need_unlink = false; + char *mirror = NULL; + char *mirrorFormat = NULL; + char *origsrc = NULL; + char *origdriver = NULL; + virCgroupPtr cgroup = NULL; /* Preliminaries: find the disk we are editing, sanity checks */ virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW | @@ -12954,6 +12960,13 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, _("domain is not running")); goto cleanup; } + if (qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_DEVICES) && + virCgroupForDomain(driver->cgroup, vm->def->name, &cgroup, 0)) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Unable to find cgroup for %s"), + vm->def->name); + goto cleanup; + } device = qemuDiskPathToAlias(vm, path, &idx); if (!device) { @@ -13027,33 +13040,90 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, goto endjob; } - /* XXX We also need to add security labeling, lock manager lease, - * and auditing of those events. */ - if (!format && !(flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)) - format = disk->driverType; - if ((format && !(disk->mirrorFormat = strdup(format))) || - !(disk->mirror = strdup(dest))) { + if (!(flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)) { + int fd = qemuOpenFile(driver, dest, O_WRONLY | O_TRUNC | O_CREAT, + &need_unlink, NULL); + if (fd < 0) + goto endjob; + VIR_FORCE_CLOSE(fd); + if (!format) + format = disk->driverType; + } + if ((format && !(mirrorFormat = strdup(format))) || + !(mirror = strdup(dest))) { virReportOOMError(); goto endjob; } + /* Manipulate disk in place, in a way that can be reverted on + * failure, in order to set up labeling and locking. */ + origsrc = disk->src; + disk->src = (char *) dest; + origdriver = disk->driverType; + disk->driverType = (char *) "raw"; /* Don't want to probe backing files */ + + if (virDomainLockDiskAttach(driver->lockManager, vm, disk) < 0) + goto endjob; + if (cgroup && qemuSetupDiskCgroup(driver, vm, cgroup, disk) < 0) { + if (virDomainLockDiskDetach(driver->lockManager, vm, disk) < 0) + VIR_WARN("Unable to release lock on %s", dest); + goto cleanup; + } + if (virSecurityManagerSetImageLabel(driver->securityManager, vm->def, + disk) < 0) { + if (cgroup && qemuTeardownDiskCgroup(driver, vm, cgroup, disk) < 0) + VIR_WARN("Failed to teardown cgroup for disk path %s", dest); + if (virDomainLockDiskDetach(driver->lockManager, vm, disk) < 0) + VIR_WARN("Unable to release lock on %s", dest); + goto endjob; + } + /* Actually start the mirroring */ qemuDomainObjEnterMonitorWithDriver(driver, vm); ret = qemuMonitorDriveMirror(priv->mon, device, dest, format, bandwidth, reopen, flags); + virDomainAuditDisk(vm, NULL, dest, "mirror", ret >= 0); qemuDomainObjExitMonitorWithDriver(driver, vm); + if (ret < 0) { + if (cgroup && qemuTeardownDiskCgroup(driver, vm, cgroup, disk) < 0) + VIR_WARN("Failed to teardown cgroup for disk path %s", dest); + if (virSecurityManagerRestoreImageLabel(driver->securityManager, + vm->def, disk) < 0) + VIR_WARN("Unable to restore security label on %s", dest); + if (virDomainLockDiskDetach(driver->lockManager, vm, disk) < 0) + VIR_WARN("Unable to release lock on %s", dest); + goto endjob; + } + + disk->src = origsrc; + origsrc = NULL; + disk->driverType = origdriver; + origdriver = NULL; + + /* Update vm in place to match changes. */ + need_unlink = false; + disk->mirror = mirror; + disk->mirrorFormat = mirrorFormat; + mirror = NULL; + mirrorFormat = NULL; endjob: - if (ret < 0) { - VIR_FREE(disk->mirror); - VIR_FREE(disk->mirrorFormat); + if (origsrc) { + disk->src = origsrc; + disk->driverType = origdriver; } + if (need_unlink && unlink(dest)) + VIR_WARN("unable to unlink just-created %s", dest); + VIR_FREE(mirror); + VIR_FREE(mirrorFormat); if (qemuDomainObjEndJob(driver, vm) == 0) { vm = NULL; goto cleanup; } cleanup: + if (cgroup) + virCgroupFree(&cgroup); VIR_FREE(device); if (vm) virDomainObjUnlock(vm); -- 1.7.11.4

When using block copy to pivot over to a new chain, and the user passes --reuse-ext, it could be that the user has already transferred the entire backing chain, and that the external file being reused has a backing chain that needs SELinux labeling. https://bugzilla.redhat.com/show_bug.cgi?id=856247 * src/qemu/qemu_driver.c (qemuDomainBlockCopy): Don't bypass labeling when reusing external file. --- This patch is new compared to the v6 posting. I'm actually planning on squashing this into patch 8/9 before pushing, but posting for separate review makes it easier to track how this is being backported to RHEL 6.3. src/qemu/qemu_driver.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index b9014c3..246969e 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -13060,7 +13060,11 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, origsrc = disk->src; disk->src = (char *) dest; origdriver = disk->driverType; - disk->driverType = (char *) "raw"; /* Don't want to probe backing files */ + if (!(flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)) { + /* Don't want to probe backing files unless the chain is + * coming from the external use */ + disk->driverType = (char *) "raw"; + } if (virDomainLockDiskAttach(driver->lockManager, vm, disk) < 0) goto endjob; -- 1.7.11.4

When using block copy to pivot over to a new chain, the backing files for the new chain might still need labeling (particularly if the user passes --reuse-ext with a relative backing file name). Relabeling a file that is already labeled won't hurt, so this just labels the entire chain at the point of the pivot. https://bugzilla.redhat.com/show_bug.cgi?id=856247 * src/qemu/qemu_driver.c (qemuDomainBlockPivot): Relabel chain before asking qemu to pivot. --- Diff from v7: do the recursive relabel at pivot time rather than initial copy time, since we intentionally permit the copy operation to start prior to copying the backing chain, to allow for some parallelism in the overall storage migration operation. I still plan on squashing this into patch 8/9; but am posting this separately for ease of backporting to RHEL 6.3. src/qemu/qemu_driver.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 32cd9ad..4049fba 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -12615,6 +12615,7 @@ qemuDomainBlockPivot(virConnectPtr conn, virDomainBlockJobInfo info; bool reopen = qemuCapsGet(priv->caps, QEMU_CAPS_DRIVE_REOPEN); bool resume = false; + virCgroupPtr cgroup = NULL; /* Probe the status, if needed. */ if (!disk->mirroring) { @@ -12662,6 +12663,23 @@ qemuDomainBlockPivot(virConnectPtr conn, } } + /* We previously labeled only the top-level image; but if the + * image includes a relative backing file, the pivot may result in + * qemu needing to open the entire backing chain, so we need to + * label the entire chain. This action is safe even if the + * backing chain has already been labeled. */ + if (qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_DEVICES) && + virCgroupForDomain(driver->cgroup, vm->def->name, &cgroup, 0)) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Unable to find cgroup for %s"), + vm->def->name); + goto cleanup; + } + if ((cgroup && qemuSetupDiskCgroup(driver, vm, cgroup, disk) < 0) || + virSecurityManagerSetImageLabel(driver->securityManager, vm->def, + disk) < 0) + goto cleanup; + /* Attempt the pivot. */ qemuDomainObjEnterMonitorWithDriver(driver, vm); ret = qemuMonitorDrivePivot(priv->mon, device, disk->mirror, @@ -12702,6 +12720,8 @@ qemuDomainBlockPivot(virConnectPtr conn, } cleanup: + if (cgroup) + virCgroupFree(&cgroup); if (resume && virDomainObjIsActive(vm) && qemuProcessStartCPUs(driver, vm, conn, VIR_DOMAIN_RUNNING_UNPAUSED, -- 1.7.11.4

When using block copy to pivot over to a new chain, the backing files for the new chain might still need labeling (particularly if the user passes --reuse-ext with a relative backing file name). Relabeling a file that is already labeled won't hurt, so this just labels the entire chain at the point of the pivot. https://bugzilla.redhat.com/show_bug.cgi?id=856247 * src/qemu/qemu_driver.c (qemuDomainBlockPivot): Relabel chain before asking qemu to pivot. --- Diff from v7.5: relabel the new disk chain, not the old chain src/qemu/qemu_driver.c | 37 +++++++++++++++++++++++++++++++++---- 1 file changed, 33 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 8201c9d..828ad6e 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -12596,6 +12596,9 @@ qemuDomainBlockPivot(virConnectPtr conn, virDomainBlockJobInfo info; bool reopen = qemuCapsGet(priv->caps, QEMU_CAPS_DRIVE_REOPEN); bool resume = false; + virCgroupPtr cgroup = NULL; + char *oldsrc = NULL; + char *olddriver = NULL; /* Probe the status, if needed. */ if (!disk->mirroring) { @@ -12643,6 +12646,30 @@ qemuDomainBlockPivot(virConnectPtr conn, } } + /* We previously labeled only the top-level image; but if the + * image includes a relative backing file, the pivot may result in + * qemu needing to open the entire backing chain, so we need to + * label the entire chain. This action is safe even if the + * backing chain has already been labeled. */ + if (qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_DEVICES) && + virCgroupForDomain(driver->cgroup, vm->def->name, &cgroup, 0)) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Unable to find cgroup for %s"), + vm->def->name); + goto cleanup; + } + oldsrc = disk->src; + olddriver = disk->driverType; + disk->src = disk->mirror; + disk->driverType = disk->mirrorFormat; + if ((cgroup && qemuSetupDiskCgroup(driver, vm, cgroup, disk) < 0) || + virSecurityManagerSetImageLabel(driver->securityManager, vm->def, + disk) < 0) { + disk->src = oldsrc; + disk->driverType = olddriver; + goto cleanup; + } + /* Attempt the pivot. */ qemuDomainObjEnterMonitorWithDriver(driver, vm); ret = qemuMonitorDrivePivot(priv->mon, device, disk->mirror, @@ -12662,10 +12689,8 @@ qemuDomainBlockPivot(virConnectPtr conn, * portion of the chain, and is made more difficult by the * fact that we aren't tracking the full chain ourselves; so * for now, we leak the access to the original. */ - VIR_FREE(disk->src); - VIR_FREE(disk->driverType); - disk->src = disk->mirror; - disk->driverType = disk->mirrorFormat; + VIR_FREE(oldsrc); + VIR_FREE(olddriver); disk->mirror = NULL; disk->mirrorFormat = NULL; disk->mirroring = false; @@ -12677,12 +12702,16 @@ qemuDomainBlockPivot(virConnectPtr conn, * 'query-block', to see what state we really got left in * before killing the mirroring job? And just as on the * success case, there's security labeling to worry about. */ + disk->src = oldsrc; + disk->driverType = olddriver; VIR_FREE(disk->mirror); VIR_FREE(disk->mirrorFormat); disk->mirroring = false; } cleanup: + if (cgroup) + virCgroupFree(&cgroup); if (resume && virDomainObjIsActive(vm) && qemuProcessStartCPUs(driver, vm, conn, VIR_DOMAIN_RUNNING_UNPAUSED, -- 1.7.11.4

When using block copy to pivot over to a new chain, the backing files for the new chain might still need labeling (particularly if the user passes --reuse-ext with a relative backing file name). Relabeling a file that is already labeled won't hurt, so this just labels the entire chain at the point of the pivot. But to do the relabel, we must probe the file type of an external file (the API already promised to do this, and the probe is safe; we are just moving the probe from qemu into libvirt). https://bugzilla.redhat.com/show_bug.cgi?id=856247 * src/qemu/qemu_driver.c (qemuDomainBlockPivot): Relabel chain before asking qemu to pivot. (qemuDomainBlockCopy): When reusing external file, probe its type. --- Changes since v7.6: hoist the safe format probing out of qemu into libvirt when reusing an external file in shallow mode; this version finally passed testing from VDSM folks src/qemu/qemu_driver.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 48 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 6db72ee..acc3807 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -12491,6 +12491,9 @@ qemuDomainBlockPivot(virConnectPtr conn, virDomainBlockJobInfo info; bool reopen = qemuCapsGet(priv->caps, QEMU_CAPS_DRIVE_REOPEN); bool resume = false; + virCgroupPtr cgroup = NULL; + char *oldsrc = NULL; + char *olddriver = NULL; /* Probe the status, if needed. */ if (!disk->mirroring) { @@ -12538,6 +12541,33 @@ qemuDomainBlockPivot(virConnectPtr conn, } } + /* We previously labeled only the top-level image; but if the + * image includes a relative backing file, the pivot may result in + * qemu needing to open the entire backing chain, so we need to + * label the entire chain. This action is safe even if the + * backing chain has already been labeled; but only necessary when + * we know for sure that there is a backing chain. */ + if (disk->mirrorFormat && + qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_DEVICES) && + virCgroupForDomain(driver->cgroup, vm->def->name, &cgroup, 0)) { + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Unable to find cgroup for %s"), + vm->def->name); + goto cleanup; + } + oldsrc = disk->src; + olddriver = disk->driverType; + disk->src = disk->mirror; + disk->driverType = disk->mirrorFormat; + if (disk->mirrorFormat && + ((cgroup && qemuSetupDiskCgroup(driver, vm, cgroup, disk) < 0) || + virSecurityManagerSetImageLabel(driver->securityManager, vm->def, + disk) < 0)) { + disk->src = oldsrc; + disk->driverType = olddriver; + goto cleanup; + } + /* Attempt the pivot. */ qemuDomainObjEnterMonitorWithDriver(driver, vm); ret = qemuMonitorDrivePivot(priv->mon, device, disk->mirror, @@ -12557,10 +12587,8 @@ qemuDomainBlockPivot(virConnectPtr conn, * portion of the chain, and is made more difficult by the * fact that we aren't tracking the full chain ourselves; so * for now, we leak the access to the original. */ - VIR_FREE(disk->src); - VIR_FREE(disk->driverType); - disk->src = disk->mirror; - disk->driverType = disk->mirrorFormat; + VIR_FREE(oldsrc); + VIR_FREE(olddriver); disk->mirror = NULL; disk->mirrorFormat = NULL; disk->mirroring = false; @@ -12572,12 +12600,16 @@ qemuDomainBlockPivot(virConnectPtr conn, * 'query-block', to see what state we really got left in * before killing the mirroring job? And just as on the * success case, there's security labeling to worry about. */ + disk->src = oldsrc; + disk->driverType = olddriver; VIR_FREE(disk->mirror); VIR_FREE(disk->mirrorFormat); disk->mirroring = false; } cleanup: + if (cgroup) + virCgroupFree(&cgroup); if (resume && virDomainObjIsActive(vm) && qemuProcessStartCPUs(driver, vm, conn, VIR_DOMAIN_RUNNING_UNPAUSED, @@ -12913,6 +12945,18 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, VIR_FORCE_CLOSE(fd); if (!format) format = disk->driverType; + } else if (!format && + !(flags & VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) && + (flags & VIR_DOMAIN_BLOCK_REBASE_SHALLOW)) { + /* When reusing an existing shallow file, we need to know the + * file type in order to relabel the chain later in + * qemuDomainBlockPivot. Normally, probing is unsafe because + * raw files can be misidentified as non-raw, but in this + * particular case, the user said the file is not raw, and + * having a backing file implies non-raw. */ + int type = virStorageFileProbeFormat(dest); + if (type >= 0) + format = virStorageFileFormatTypeToString(type); } if ((format && !(mirrorFormat = strdup(format))) || !(mirror = strdup(dest))) { -- 1.7.11.4

SIGINT hasn't been blocked, which could lead to losing it somewhere in virDomainGetBlockJobInfo and not aborting the job. --- tools/virsh-domain.c | 24 ++++++++++++++++++------ 1 files changed, 18 insertions(+), 6 deletions(-) diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index 1df0872..01550c7 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -1308,7 +1308,7 @@ cmdBlockCommit(vshControl *ctl, const vshCmd *cmd) int timeout = 0; struct sigaction sig_action; struct sigaction old_sig_action; - sigset_t sigmask; + sigset_t sigmask, oldsigmask; struct timeval start; struct timeval curr; const char *path = NULL; @@ -1361,7 +1361,11 @@ cmdBlockCommit(vshControl *ctl, const vshCmd *cmd) while (blocking) { virDomainBlockJobInfo info; - int result = virDomainGetBlockJobInfo(dom, path, &info, 0); + int result; + + pthread_sigmask(SIG_BLOCK, &sigmask, &oldsigmask); + result = virDomainGetBlockJobInfo(dom, path, &info, 0); + pthread_sigmask(SIG_SETMASK, &oldsigmask, NULL); if (result < 0) { vshError(ctl, _("failed to query job for disk %s"), path); @@ -1450,7 +1454,7 @@ cmdBlockCopy(vshControl *ctl, const vshCmd *cmd) int timeout = 0; struct sigaction sig_action; struct sigaction old_sig_action; - sigset_t sigmask; + sigset_t sigmask, oldsigmask; struct timeval start; struct timeval curr; const char *path = NULL; @@ -1507,7 +1511,11 @@ cmdBlockCopy(vshControl *ctl, const vshCmd *cmd) while (blocking) { virDomainBlockJobInfo info; - int result = virDomainGetBlockJobInfo(dom, path, &info, 0); + int result; + + pthread_sigmask(SIG_BLOCK, &sigmask, &oldsigmask); + result = virDomainGetBlockJobInfo(dom, path, &info, 0); + pthread_sigmask(SIG_SETMASK, &oldsigmask, NULL); if (result <= 0) { vshError(ctl, _("failed to query job for disk %s"), path); @@ -1674,7 +1682,7 @@ cmdBlockPull(vshControl *ctl, const vshCmd *cmd) int timeout = 0; struct sigaction sig_action; struct sigaction old_sig_action; - sigset_t sigmask; + sigset_t sigmask,oldsigmask; struct timeval start; struct timeval curr; const char *path = NULL; @@ -1727,7 +1735,11 @@ cmdBlockPull(vshControl *ctl, const vshCmd *cmd) while (blocking) { virDomainBlockJobInfo info; - int result = virDomainGetBlockJobInfo(dom, path, &info, 0); + int result; + + pthread_sigmask(SIG_BLOCK, &sigmask, &oldsigmask); + result = virDomainGetBlockJobInfo(dom, path, &info, 0); + pthread_sigmask(SIG_SETMASK, &oldsigmask, NULL); if (result < 0) { vshError(ctl, _("failed to query job for disk %s"), path); -- 1.7.8.6

On 10/11/2012 09:12 AM, Ján Tomko wrote:
SIGINT hasn't been blocked, which could lead to losing it somewhere in virDomainGetBlockJobInfo and not aborting the job.
Indeed, looking at vshWatchJob, we do the same around other libvirt APIs when waiting for Ctrl-C; the problem is that if SIGINT arrives in the middle of a libvirt API, then it can cause poll() to fail with EINTR, and libvirt doesn't handle that well. Delaying the SIGINT until after we are out of libvirt API, and thus sure that an embedded poll() won't be interrupted, prevents some odd failures, including failures which leave the block job running instead of halting it. [I still wonder if libvirt shouldn't be so sensitive to EINTR failures on poll(), and/or whether the virsh loop should be a bit smarter about handling libvirt failure when it also knows that SIGINT fired, but that's a question for another day] [Also for another day - blockcopy, blockpull, and blockcommit share an awfully similar Ctrl-C polling loop. It might be nice to refactor this code to avoid the duplication, and/or convert things over to waiting for libvirt events instead of polling]
--- tools/virsh-domain.c | 24 ++++++++++++++++++------ 1 files changed, 18 insertions(+), 6 deletions(-)
ACK and pushed, with one nit:
@@ -1674,7 +1682,7 @@ cmdBlockPull(vshControl *ctl, const vshCmd *cmd) int timeout = 0; struct sigaction sig_action; struct sigaction old_sig_action; - sigset_t sigmask; + sigset_t sigmask,oldsigmask;
Missing a space here. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

--- tools/virsh-domain.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index 01550c7..1a9097f 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -1468,13 +1468,13 @@ cmdBlockCopy(vshControl *ctl, const vshCmd *cmd) } if (vshCommandOptInt(cmd, "timeout", &timeout) > 0) { if (timeout < 1) { - vshError(ctl, "%s", _("migrate: Invalid timeout")); + vshError(ctl, "%s", _("invalid timeout")); return false; } /* Ensure that we can multiply by 1000 without overflowing. */ if (timeout > INT_MAX / 1000) { - vshError(ctl, "%s", _("migrate: Timeout is too big")); + vshError(ctl, "%s", _("timeout is too big")); return false; } timeout *= 1000; -- 1.7.8.6

On 10/11/2012 09:14 AM, Ján Tomko wrote:
--- tools/virsh-domain.c | 4 ++-- 1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c index 01550c7..1a9097f 100644 --- a/tools/virsh-domain.c +++ b/tools/virsh-domain.c @@ -1468,13 +1468,13 @@ cmdBlockCopy(vshControl *ctl, const vshCmd *cmd) } if (vshCommandOptInt(cmd, "timeout", &timeout) > 0) { if (timeout < 1) { - vshError(ctl, "%s", _("migrate: Invalid timeout")); + vshError(ctl, "%s", _("invalid timeout")); return false; }
/* Ensure that we can multiply by 1000 without overflowing. */ if (timeout > INT_MAX / 1000) { - vshError(ctl, "%s", _("migrate: Timeout is too big")); + vshError(ctl, "%s", _("timeout is too big")); return false;
Copy-and-paste - evidence that our code could be factored better for more reuse. Oh well. ACK and pushed. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 09/15/2012 05:53 PM, Eric Blake wrote:
v6 of these patches is here: https://www.redhat.com/archives/libvir-list/2012-April/msg01240.html
The 'drive-mirror' command not only missed qemu 1.1 (back when I first posted these patches), but also missed qemu 1.2. Paolo is still working on re-posting his patches to the qemu lists (see patch 2/9 for a link to his previous posting, which has had two months of commentary). I'm still not anxious to have these commands committed to upstream libvirt until 'drive-mirror' is at least in qemu.git rather than just Paolo's tree, but thought it would be worth rebasing the series to track the things that have changed since April.
In v7, I've adjusted the series to cater to Paolo's proposed QMP semantics (most of the review on his last patch to the qemu list was on the internals, and not the interface itself), as well as to allow working out-of-the-box with the RHEL 6.3 implementation of block copy.
Unfortunately, since Paolo's patches are still in flux, I suspect this will need a v8 that has seen further testing with both upstream qemu.git and RHEL 6.3 git.
Paolo's patches are further along, but in the meantime, Jeff's patches for block-commit went in. So I am now working on rebasing this on top of my block-commit code, not to mention Dan's cleanups to monitor handling. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
participants (3)
-
Eric Blake
-
Ján Tomko
-
Paolo Bonzini