[libvirt] [PATCHv4 00/18] live block migration via virDomainBlockCopy

v3 was here: https://www.redhat.com/archives/libvir-list/2012-April/msg00288.html changes from v3: - split v3 patch 8 into 1/18 and 9/18 - incorporate Laine's review comments on v3 patches 1 and 2 (here 2 and 3) - fix more bugs found during testing - upgrade to Paolo's latest build (which dropped 'mode':'no-backing-file' and replaced it with 'full':'true') - added a new patch 14/18 that lets blockcopy work with SELinux enforcing (it still leaks files rather than clean up context on job abort or pivot, but that's a harder problem to solve) repo.or.cz is acting up on me at the moment, so I won't have a repo to point to until I can investigate why tomorrow. I'll send an interdiff in reply to this message. Adam Litke (2): blockjob: add API for async virDomainBlockJobAbort blockjob: wire up qemu async virDomainBlockJobAbort Eric Blake (16): blockjob: add qemu capabilities related to block jobs blockjob: allow for fast-finishing job blockjob: add new API flags blockjob: add 'blockcopy' to virsh blockjob: enhance xml to track mirrors across libvirtd restart blockjob: react to active block copy blockjob: expose qemu commands for mirrored storage migration blockjob: return appropriate event and info blockjob: support pivot operation on cancel blockjob: implement block copy for qemu blockjob: allow for existing files blockjob: allow mirroring under SELinux blockjob: accommodate RHEL backport names blockjob: add virDomainBlockCopy blockjob: enhance virsh 'blockcopy' blockjob: wire up qemu and RPC for block copy docs/apibuild.py | 1 + docs/formatdomain.html.in | 11 + docs/schemas/domaincommon.rng | 19 ++- include/libvirt/libvirt.h.in | 50 +++++- include/libvirt/virterror.h | 1 + src/conf/domain_conf.c | 54 ++++++ src/conf/domain_conf.h | 6 + src/driver.h | 5 + src/libvirt.c | 223 +++++++++++++++++++++- src/libvirt_private.syms | 1 + src/libvirt_public.syms | 5 + src/qemu/qemu_capabilities.c | 3 + src/qemu/qemu_capabilities.h | 2 + src/qemu/qemu_driver.c | 417 +++++++++++++++++++++++++++++++++++++++-- src/qemu/qemu_hotplug.c | 7 + src/qemu/qemu_monitor.c | 50 +++++ src/qemu/qemu_monitor.h | 18 ++- src/qemu/qemu_monitor_json.c | 167 ++++++++++++++--- src/qemu/qemu_monitor_json.h | 19 ++- src/remote/remote_driver.c | 1 + src/remote/remote_protocol.x | 12 +- src/remote_protocol-structs | 9 + src/rpc/gendispatch.pl | 1 + src/util/virterror.c | 6 + tools/virsh.c | 141 +++++++++++--- tools/virsh.pod | 65 ++++++- 26 files changed, 1195 insertions(+), 99 deletions(-) -- 1.7.7.6

The new block copy storage migration sequence requires both the 'drive-mirror' action in 'transaction' (present if the 'drive-mirror' standalone monitor command also exists) and the 'drive-reopen' monitor command (it would be nice if that were also part of a 'transaction', but the initial qemu implementation has it standalone only). Furthermore, qemu 1.1 will be adding 'block_job_cancel' as an asynchronous command, but an earlier implementation (that supported only the qed file format) existed which was synchronous only. If 'drive-mirror' is added in time, then we can safely use 'drive-mirror' as the witness of whether 'block_job_cancel' is synchronous or asynchronous. As of this[1] qemu email, both commands have been proposed but not yet incorporated into the tree; in fact, the implementation I tested with has changed to match this[2] email that suggested a mandatory 'full':'bool' argument rather than 'mode':'no-backing-file'. So there is a risk that qemu 1.1 will have yet another subtly different implementation. [1]https://lists.gnu.org/archive/html/qemu-devel/2012-03/msg01524.html [2]https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg00886.html * src/qemu/qemu_capabilities.h (QEMU_CAPS_DRIVE_MIRROR) (QEMU_CAPS_DRIVE_REOPEN): New bits. * src/qemu/qemu_capabilities.c (qemuCaps): Name them. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONCheckCommands): Set them. (qemuMonitorJSONDiskSnapshot): Fix formatting issues. --- src/qemu/qemu_capabilities.c | 3 +++ src/qemu/qemu_capabilities.h | 2 ++ src/qemu/qemu_monitor_json.c | 15 ++++++++------- 3 files changed, 13 insertions(+), 7 deletions(-) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 0e09d6d..1938ae4 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -156,6 +156,9 @@ VIR_ENUM_IMPL(qemuCaps, QEMU_CAPS_LAST, "scsi-disk.channel", "scsi-block", "transaction", + + "drive-mirror", /* 90 */ + "drive-reopen", ); struct qemu_feature_flags { diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index 78cdbe0..405bf2a 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -124,6 +124,8 @@ enum qemuCapsFlags { QEMU_CAPS_SCSI_DISK_CHANNEL = 87, /* Is scsi-disk.channel available? */ QEMU_CAPS_SCSI_BLOCK = 88, /* -device scsi-block */ QEMU_CAPS_TRANSACTION = 89, /* transaction monitor command */ + QEMU_CAPS_DRIVE_MIRROR = 90, /* drive-mirror monitor command */ + QEMU_CAPS_DRIVE_REOPEN = 91, /* drive-reopen monitor command */ QEMU_CAPS_LAST, /* this must always be the last item */ }; diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index d09d779..c8ed087 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -939,12 +939,14 @@ qemuMonitorJSONCheckCommands(qemuMonitorPtr mon, if (STREQ(name, "human-monitor-command")) *json_hmp = 1; - - if (STREQ(name, "system_wakeup")) + else if (STREQ(name, "system_wakeup")) qemuCapsSet(qemuCaps, QEMU_CAPS_WAKEUP); - - if (STREQ(name, "transaction")) + else if (STREQ(name, "transaction")) qemuCapsSet(qemuCaps, QEMU_CAPS_TRANSACTION); + else if (STREQ(name, "drive-mirror")) + qemuCapsSet(qemuCaps, QEMU_CAPS_DRIVE_MIRROR); + else if (STREQ(name, "drive-reopen")) + qemuCapsSet(qemuCaps, QEMU_CAPS_DRIVE_REOPEN); } ret = 0; @@ -3114,7 +3116,7 @@ qemuMonitorJSONDiskSnapshot(qemuMonitorPtr mon, virJSONValuePtr actions, const char *device, const char *file, const char *format, bool reuse) { - int ret; + int ret = -1; virJSONValuePtr cmd; virJSONValuePtr reply = NULL; @@ -3132,14 +3134,13 @@ qemuMonitorJSONDiskSnapshot(qemuMonitorPtr mon, virJSONValuePtr actions, if (actions) { if (virJSONValueArrayAppend(actions, cmd) < 0) { virReportOOMError(); - ret = -1; } else { ret = 0; cmd = NULL; } } else { if ((ret = qemuMonitorJSONCommand(mon, cmd, &reply)) < 0) - goto cleanup; + goto cleanup; if (qemuMonitorJSONHasError(reply, "CommandNotFound") && qemuMonitorCheckHMP(mon, "snapshot_blkdev")) { -- 1.7.7.6

On 04/09/2012 11:52 PM, Eric Blake wrote:
The new block copy storage migration sequence requires both the 'drive-mirror' action in 'transaction' (present if the 'drive-mirror' standalone monitor command also exists) and the 'drive-reopen' monitor command (it would be nice if that were also part of a 'transaction', but the initial qemu implementation has it standalone only).
Furthermore, qemu 1.1 will be adding 'block_job_cancel' as an asynchronous command, but an earlier implementation (that supported only the qed file format) existed which was synchronous only. If 'drive-mirror' is added in time, then we can safely use 'drive-mirror' as the witness of whether 'block_job_cancel' is synchronous or asynchronous.
As of this[1] qemu email, both commands have been proposed but not yet incorporated into the tree; in fact, the implementation I tested with has changed to match this[2] email that suggested a mandatory 'full':'bool' argument rather than 'mode':'no-backing-file'. So there is a risk that qemu 1.1 will have yet another subtly different implementation. [1]https://lists.gnu.org/archive/html/qemu-devel/2012-03/msg01524.html [2]https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg00886.html
None of this gives me warm fuzzies :-/
* src/qemu/qemu_capabilities.h (QEMU_CAPS_DRIVE_MIRROR) (QEMU_CAPS_DRIVE_REOPEN): New bits. * src/qemu/qemu_capabilities.c (qemuCaps): Name them. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONCheckCommands): Set them. (qemuMonitorJSONDiskSnapshot): Fix formatting issues. --- src/qemu/qemu_capabilities.c | 3 +++ src/qemu/qemu_capabilities.h | 2 ++ src/qemu/qemu_monitor_json.c | 15 ++++++++------- 3 files changed, 13 insertions(+), 7 deletions(-)
diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 0e09d6d..1938ae4 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -156,6 +156,9 @@ VIR_ENUM_IMPL(qemuCaps, QEMU_CAPS_LAST, "scsi-disk.channel", "scsi-block", "transaction", + + "drive-mirror", /* 90 */ + "drive-reopen", );
struct qemu_feature_flags { diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index 78cdbe0..405bf2a 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -124,6 +124,8 @@ enum qemuCapsFlags { QEMU_CAPS_SCSI_DISK_CHANNEL = 87, /* Is scsi-disk.channel available? */ QEMU_CAPS_SCSI_BLOCK = 88, /* -device scsi-block */ QEMU_CAPS_TRANSACTION = 89, /* transaction monitor command */ + QEMU_CAPS_DRIVE_MIRROR = 90, /* drive-mirror monitor command */ + QEMU_CAPS_DRIVE_REOPEN = 91, /* drive-reopen monitor command */
QEMU_CAPS_LAST, /* this must always be the last item */ }; diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index d09d779..c8ed087 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -939,12 +939,14 @@ qemuMonitorJSONCheckCommands(qemuMonitorPtr mon,
if (STREQ(name, "human-monitor-command")) *json_hmp = 1; - - if (STREQ(name, "system_wakeup")) + else if (STREQ(name, "system_wakeup")) qemuCapsSet(qemuCaps, QEMU_CAPS_WAKEUP); - - if (STREQ(name, "transaction")) + else if (STREQ(name, "transaction")) qemuCapsSet(qemuCaps, QEMU_CAPS_TRANSACTION); + else if (STREQ(name, "drive-mirror")) + qemuCapsSet(qemuCaps, QEMU_CAPS_DRIVE_MIRROR); + else if (STREQ(name, "drive-reopen")) + qemuCapsSet(qemuCaps, QEMU_CAPS_DRIVE_REOPEN);
Good optimization (turning the if's into else if's). I notice qemu is inconsistent wrt "-" vs. "_" in command names (yeah, I'm in a snarky mood ;-))
}
ret = 0; @@ -3114,7 +3116,7 @@ qemuMonitorJSONDiskSnapshot(qemuMonitorPtr mon, virJSONValuePtr actions, const char *device, const char *file, const char *format, bool reuse) { - int ret; + int ret = -1; virJSONValuePtr cmd; virJSONValuePtr reply = NULL;
@@ -3132,14 +3134,13 @@ qemuMonitorJSONDiskSnapshot(qemuMonitorPtr mon, virJSONValuePtr actions, if (actions) { if (virJSONValueArrayAppend(actions, cmd) < 0) { virReportOOMError(); - ret = -1; } else { ret = 0; cmd = NULL; } } else { if ((ret = qemuMonitorJSONCommand(mon, cmd, &reply)) < 0) - goto cleanup; + goto cleanup;
if (qemuMonitorJSONHasError(reply, "CommandNotFound") && qemuMonitorCheckHMP(mon, "snapshot_blkdev")) {
I guess you just snuck this in here on general principle, right? Since there's no functional change, and it's rather minor, I don't have a problem including it. ACK.

On Tue, Apr 10, 2012 at 01:17:44PM -0400, Laine Stump wrote:
On 04/09/2012 11:52 PM, Eric Blake wrote:
The new block copy storage migration sequence requires both the 'drive-mirror' action in 'transaction' (present if the 'drive-mirror' standalone monitor command also exists) and the 'drive-reopen' monitor command (it would be nice if that were also part of a 'transaction', but the initial qemu implementation has it standalone only).
Furthermore, qemu 1.1 will be adding 'block_job_cancel' as an asynchronous command, but an earlier implementation (that supported only the qed file format) existed which was synchronous only. If 'drive-mirror' is added in time, then we can safely use 'drive-mirror' as the witness of whether 'block_job_cancel' is synchronous or asynchronous.
As of this[1] qemu email, both commands have been proposed but not yet incorporated into the tree; in fact, the implementation I tested with has changed to match this[2] email that suggested a mandatory 'full':'bool' argument rather than 'mode':'no-backing-file'. So there is a risk that qemu 1.1 will have yet another subtly different implementation. [1]https://lists.gnu.org/archive/html/qemu-devel/2012-03/msg01524.html [2]https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg00886.html
None of this gives me warm fuzzies :-/
Based on previous experience, my suggestion is that we do *not* merge any of this new libvirt API work, until the QEMU stuff has actually been accepted into their git master. The previous block streaming stuff is a nice history lesson in what happens when we merge stuff in libvirt before QEMU has agreed their final design :-( Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|

On 04/10/2012 11:24 AM, Daniel P. Berrange wrote:
As of this[1] qemu email, both commands have been proposed but not yet incorporated into the tree; in fact, the implementation I tested with has changed to match this[2] email that suggested a mandatory 'full':'bool' argument rather than 'mode':'no-backing-file'. So there is a risk that qemu 1.1 will have yet another subtly different implementation. [1]https://lists.gnu.org/archive/html/qemu-devel/2012-03/msg01524.html [2]https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg00886.html
None of this gives me warm fuzzies :-/
Based on previous experience, my suggestion is that we do *not* merge any of this new libvirt API work, until the QEMU stuff has actually been accepted into their git master. The previous block streaming stuff is a nice history lesson in what happens when we merge stuff in libvirt before QEMU has agreed their final design :-(
I'll push hard on the qemu folks to at least commit to the qapi for qemu 1.1, even if the implementation is not available until after that point, but I can live with the decision to not push the parts depending on the new monitor commands. This brings up some corollary questions: patch 3/18 depends on the new monitor command only insofar as it uses the existence of the command as a way to sanely tell if 'block_job_cancel' is asynchronous or synchronous. I may be able to rework that into something that doesn't depend on 'drive-mirror's existence, by instead tracking whether an event has been issued; is it worth me spending time on this effort, or should I wait for the qemu reaction to committing to 'drive-mirror' first? patch 2/18 and 4/18 are independent fixes, which could be pushed now without waiting for 1/18. Should I pursue this course of action? It is only in 1/18, as well as in 5/18 and later, where we are treading on the dangerous ground of committing to a libvirt API without a reference implementation in qemu; and even then, we can choose whether to commit to the libvirt API without immediate qemu support. All that said, I'd still appreciate a review of the full series, to catch any other potential mistakes while the code is still fresh on my mind, even if we delay pushing it to libvirt.git until qemu has made more progress. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

From: Adam Litke <agl@us.ibm.com> Qemu has changed the semantics of the "block_job_cancel" API. The original qed implementation (pretty much only backported to RHEL 6.2 qemu) was synchronous (ie. upon command completion, the operation was guaranteed to be completely stopped). With the new semantics going into qemu 1.1 for qcow2, a "block_job_cancel" merely requests that the operation be cancelled and an event is triggered once the cancellation request has been honored. To adopt the new semantics while preserving compatibility the following updates are made to the virDomainBlockJob API: A new block job event type VIR_DOMAIN_BLOCK_JOB_CANCELED is recognized by libvirt. Regardless of the flags used with virDomainBlockJobAbort, this event will be raised whenever it is received from qemu. This event indicates that a block job has been successfully cancelled. For now, libvirt does not try to synthesize this event if using an older qemu that did not generate it. A new extension flag VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC is added to the virDomainBlockJobAbort API. When enabled, this function will operate asynchronously (ie, it can return before the job has actually been canceled). When the API is used in this mode, it is the responsibility of the caller to wait for a VIR_DOMAIN_BLOCK_JOB_CANCELED event or poll via the virDomainGetBlockJobInfo API to check the cancellation status; this flag is an error if it is not known if the hypervisor supports asynchronous cancel. This patch also exposes the new flag through virsh, as well as wiring up the new event, but leaves the new flag for a later patch. Signed-off-by: Adam Litke <agl@us.ibm.com> Cc: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Eric Blake <eblake@redhat.com> --- include/libvirt/libvirt.h.in | 10 ++++++++ src/libvirt.c | 14 ++++++++++- src/qemu/qemu_monitor_json.c | 44 ++++++++++++++++++++++++++++++----- tools/virsh.c | 51 ++++++++++++++++++++++++++--------------- tools/virsh.pod | 35 ++++++++++++++++++++-------- 5 files changed, 117 insertions(+), 37 deletions(-) diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index 499dcd4..97ad99d 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -1946,6 +1946,15 @@ typedef enum { #endif } virDomainBlockJobType; +/** + * virDomainBlockJobAbortFlags: + * + * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC: Request only, do not wait for completion + */ +typedef enum { + VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC = 1 << 0, +} virDomainBlockJobAbortFlags; + /* An iterator for monitoring block job operations */ typedef unsigned long long virDomainBlockJobCursor; @@ -3617,6 +3626,7 @@ typedef void (*virConnectDomainEventGraphicsCallback)(virConnectPtr conn, typedef enum { VIR_DOMAIN_BLOCK_JOB_COMPLETED = 0, VIR_DOMAIN_BLOCK_JOB_FAILED = 1, + VIR_DOMAIN_BLOCK_JOB_CANCELED = 2, #ifdef VIR_ENUM_SENTINELS VIR_DOMAIN_BLOCK_JOB_LAST diff --git a/src/libvirt.c b/src/libvirt.c index 16d1fd5..d44335a 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -17902,7 +17902,7 @@ error: * virDomainBlockJobAbort: * @dom: pointer to domain object * @disk: path to the block device, or device shorthand - * @flags: extra flags; not used yet, so callers should always pass 0 + * @flags: bitwise-OR of virDomainBlockJobAbortFlags * * Cancel the active block job on the given disk. * @@ -17913,6 +17913,18 @@ error: * can be found by calling virDomainGetXMLDesc() and inspecting * elements within //domain/devices/disk. * + * By default, this function performs a synchronous operation and the caller + * may assume that the operation has completed when 0 is returned. However, + * BlockJob operations may take a long time to complete, and during this time + * further domain interactions may be unresponsive. To avoid this problem, + * pass VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC in the @flags argument to enable + * asynchronous behavior; this flag fails if asynchronous support is not + * possible. If asynchronous jobs are supported, then when the job has + * been canceled, a BlockJob event will be emitted, with status + * VIR_DOMAIN_BLOCK_JOB_CANCELED (even if the ABORT_ASYNC flag was not + * used); but since events can be missed, it is also possible to use + * virDomainBlockJobInfo() to poll if the job is still running. + * * Returns -1 in case of failure, 0 when successful. */ int virDomainBlockJobAbort(virDomainPtr dom, const char *disk, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index c8ed087..7c893d4 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -58,13 +58,14 @@ static void qemuMonitorJSONHandleIOError(qemuMonitorPtr mon, virJSONValuePtr dat static void qemuMonitorJSONHandleVNCConnect(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleVNCInitialize(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleVNCDisconnect(qemuMonitorPtr mon, virJSONValuePtr data); -static void qemuMonitorJSONHandleBlockJob(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleSPICEConnect(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleSPICEInitialize(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleSPICEDisconnect(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleTrayChange(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandlePMWakeup(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandlePMSuspend(qemuMonitorPtr mon, virJSONValuePtr data); +static void qemuMonitorJSONHandleBlockJobCompleted(qemuMonitorPtr mon, virJSONValuePtr data); +static void qemuMonitorJSONHandleBlockJobCanceled(qemuMonitorPtr mon, virJSONValuePtr data); static struct { const char *type; @@ -80,13 +81,14 @@ static struct { { "VNC_CONNECTED", qemuMonitorJSONHandleVNCConnect, }, { "VNC_INITIALIZED", qemuMonitorJSONHandleVNCInitialize, }, { "VNC_DISCONNECTED", qemuMonitorJSONHandleVNCDisconnect, }, - { "BLOCK_JOB_COMPLETED", qemuMonitorJSONHandleBlockJob, }, { "SPICE_CONNECTED", qemuMonitorJSONHandleSPICEConnect, }, { "SPICE_INITIALIZED", qemuMonitorJSONHandleSPICEInitialize, }, { "SPICE_DISCONNECTED", qemuMonitorJSONHandleSPICEDisconnect, }, { "DEVICE_TRAY_MOVED", qemuMonitorJSONHandleTrayChange, }, { "WAKEUP", qemuMonitorJSONHandlePMWakeup, }, { "SUSPEND", qemuMonitorJSONHandlePMSuspend, }, + { "BLOCK_JOB_COMPLETED", qemuMonitorJSONHandleBlockJobCompleted, }, + { "BLOCK_JOB_CANCELLED", qemuMonitorJSONHandleBlockJobCanceled, }, }; @@ -754,13 +756,15 @@ static void qemuMonitorJSONHandleSPICEDisconnect(qemuMonitorPtr mon, virJSONValu qemuMonitorJSONHandleGraphics(mon, data, VIR_DOMAIN_EVENT_GRAPHICS_DISCONNECT); } -static void qemuMonitorJSONHandleBlockJob(qemuMonitorPtr mon, virJSONValuePtr data) +static void +qemuMonitorJSONHandleBlockJobImpl(qemuMonitorPtr mon, + virJSONValuePtr data, + int event) { const char *device; const char *type_str; int type = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN; unsigned long long offset, len; - int status = VIR_DOMAIN_BLOCK_JOB_FAILED; if ((device = virJSONValueObjectGetString(data, "device")) == NULL) { VIR_WARN("missing device in block job event"); @@ -785,11 +789,21 @@ static void qemuMonitorJSONHandleBlockJob(qemuMonitorPtr mon, virJSONValuePtr da if (STREQ(type_str, "stream")) type = VIR_DOMAIN_BLOCK_JOB_TYPE_PULL; - if (offset != 0 && offset == len) - status = VIR_DOMAIN_BLOCK_JOB_COMPLETED; + switch (event) { + case VIR_DOMAIN_BLOCK_JOB_COMPLETED: + /* Make sure the whole device has been processed */ + if (offset != len) + event = VIR_DOMAIN_BLOCK_JOB_FAILED; + break; + case VIR_DOMAIN_BLOCK_JOB_CANCELED: + break; + case VIR_DOMAIN_BLOCK_JOB_FAILED: + VIR_DEBUG("should not get here"); + break; + } out: - qemuMonitorEmitBlockJob(mon, device, type, status); + qemuMonitorEmitBlockJob(mon, device, type, event); } static void @@ -832,6 +846,22 @@ qemuMonitorJSONHandlePMSuspend(qemuMonitorPtr mon, qemuMonitorEmitPMSuspend(mon); } +static void +qemuMonitorJSONHandleBlockJobCompleted(qemuMonitorPtr mon, + virJSONValuePtr data) +{ + qemuMonitorJSONHandleBlockJobImpl(mon, data, + VIR_DOMAIN_BLOCK_JOB_COMPLETED); +} + +static void +qemuMonitorJSONHandleBlockJobCanceled(qemuMonitorPtr mon, + virJSONValuePtr data) +{ + qemuMonitorJSONHandleBlockJobImpl(mon, data, + VIR_DOMAIN_BLOCK_JOB_CANCELED); +} + int qemuMonitorJSONHumanCommandWithFd(qemuMonitorPtr mon, const char *cmd_str, diff --git a/tools/virsh.c b/tools/virsh.c index cfdd040..084d533 100644 --- a/tools/virsh.c +++ b/tools/virsh.c @@ -7525,6 +7525,7 @@ blockJobImpl(vshControl *ctl, const vshCmd *cmd, const char *name, *path; unsigned long bandwidth = 0; int ret = -1; + unsigned int flags = 0; if (!vshConnectionUsability(ctl, ctl->conn)) goto cleanup; @@ -7541,7 +7542,9 @@ blockJobImpl(vshControl *ctl, const vshCmd *cmd, } if (mode == VSH_CMD_BLOCK_JOB_ABORT) { - ret = virDomainBlockJobAbort(dom, path, 0); + if (vshCommandOptBool(cmd, "async")) + flags |= VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC; + ret = virDomainBlockJobAbort(dom, path, flags); } else if (mode == VSH_CMD_BLOCK_JOB_INFO) { ret = virDomainGetBlockJobInfo(dom, path, info, 0); } else if (mode == VSH_CMD_BLOCK_JOB_SPEED) { @@ -7589,20 +7592,25 @@ cmdBlockPull(vshControl *ctl, const vshCmd *cmd) } /* - * "blockjobinfo" command + * "blockjob" command */ static const vshCmdInfo info_block_job[] = { - {"help", N_("Manage active block operations.")}, - {"desc", N_("Manage active block operations.")}, + {"help", N_("Manage active block operations")}, + {"desc", N_("Query, adjust speed, or cancel active block operations.")}, {NULL, NULL} }; static const vshCmdOptDef opts_block_job[] = { {"domain", VSH_OT_DATA, VSH_OFLAG_REQ, N_("domain name, id or uuid")}, {"path", VSH_OT_DATA, VSH_OFLAG_REQ, N_("Fully-qualified path of disk")}, - {"abort", VSH_OT_BOOL, VSH_OFLAG_NONE, N_("Abort the active job on the specified disk")}, - {"info", VSH_OT_BOOL, VSH_OFLAG_NONE, N_("Get active job information for the specified disk")}, - {"bandwidth", VSH_OT_DATA, VSH_OFLAG_NONE, N_("Set the Bandwidth limit in MB/s")}, + {"abort", VSH_OT_BOOL, VSH_OFLAG_NONE, + N_("Abort the active job on the specified disk")}, + {"async", VSH_OT_BOOL, VSH_OFLAG_NONE, + N_("don't wait for --abort to complete")}, + {"info", VSH_OT_BOOL, VSH_OFLAG_NONE, + N_("Get active job information for the specified disk")}, + {"bandwidth", VSH_OT_DATA, VSH_OFLAG_NONE, + N_("Set the Bandwidth limit in MB/s")}, {NULL, 0, 0, NULL} }; @@ -7613,19 +7621,24 @@ cmdBlockJob(vshControl *ctl, const vshCmd *cmd) virDomainBlockJobInfo info; const char *type; int ret; + bool abortMode = (vshCommandOptBool(cmd, "abort") || + vshCommandOptBool(cmd, "async")); + bool infoMode = vshCommandOptBool(cmd, "info"); + bool bandwidth = vshCommandOptBool(cmd, "bandwidth"); - if (vshCommandOptBool (cmd, "abort")) { - mode = VSH_CMD_BLOCK_JOB_ABORT; - } else if (vshCommandOptBool (cmd, "info")) { - mode = VSH_CMD_BLOCK_JOB_INFO; - } else if (vshCommandOptBool (cmd, "bandwidth")) { - mode = VSH_CMD_BLOCK_JOB_SPEED; - } else { + if (abortMode + infoMode + bandwidth > 1) { vshError(ctl, "%s", - _("One of --abort, --info, or --bandwidth is required")); + _("conflict between --abort, --info, and --bandwidth modes")); return false; } + if (abortMode) + mode = VSH_CMD_BLOCK_JOB_ABORT; + else if (bandwidth) + mode = VSH_CMD_BLOCK_JOB_SPEED; + else + mode = VSH_CMD_BLOCK_JOB_INFO; + ret = blockJobImpl(ctl, cmd, &info, mode); if (ret < 0) return false; @@ -7634,13 +7647,13 @@ cmdBlockJob(vshControl *ctl, const vshCmd *cmd) return true; if (info.type == VIR_DOMAIN_BLOCK_JOB_TYPE_PULL) - type = "Block Pull"; + type = _("Block Pull"); else - type = "Unknown job"; + type = _("Unknown job"); print_job_progress(type, info.end - info.cur, info.end); if (info.bandwidth != 0) - vshPrint(ctl, " Bandwidth limit: %lu MB/s\n", info.bandwidth); + vshPrint(ctl, _(" Bandwidth limit: %lu MB/s\n"), info.bandwidth); return true; } @@ -17115,8 +17128,8 @@ static const vshCmdDef domManagementCmds[] = { {"autostart", cmdAutostart, opts_autostart, info_autostart, 0}, {"blkdeviotune", cmdBlkdeviotune, opts_blkdeviotune, info_blkdeviotune, 0}, {"blkiotune", cmdBlkiotune, opts_blkiotune, info_blkiotune, 0}, - {"blockpull", cmdBlockPull, opts_block_pull, info_block_pull, 0}, {"blockjob", cmdBlockJob, opts_block_job, info_block_job, 0}, + {"blockpull", cmdBlockPull, opts_block_pull, info_block_pull, 0}, {"blockresize", cmdBlockResize, opts_block_resize, info_block_resize, 0}, {"change-media", cmdChangeMedia, opts_change_media, info_change_media, 0}, #ifndef WIN32 diff --git a/tools/virsh.pod b/tools/virsh.pod index a60e667..d6ba035 100644 --- a/tools/virsh.pod +++ b/tools/virsh.pod @@ -649,7 +649,10 @@ pulled, the disk no longer depends on that portion of the backing chain. It pulls data for the entire disk in the background, the process of the operation can be checked with B<blockjob>. -I<path> specifies fully-qualified path of the disk. +I<path> specifies fully-qualified path of the disk; it corresponds +to a unique target name (<target dev='name'/>) or source file (<source +file='name'/>) for one of the disk devices attached to I<domain> (see +also B<domblklist> for listing these names). I<bandwidth> specifies copying bandwidth limit in Mbps. =item B<blkdeviotune> I<domain> I<device> @@ -686,13 +689,21 @@ Both I<--live> and I<--current> flags may be given, but I<--current> is exclusive. If no flag is specified, behavior is different depending on hypervisor. -=item B<blockjob> I<domain> I<path> [I<--abort>] [I<--info>] [I<bandwidth>] +=item B<blockjob> I<domain> I<path> { [I<--abort>] [I<--async>] | +[I<--info>] | [I<bandwidth>] } -Manage active block operations. +Manage active block operations. There are three modes: I<--info>, +I<bandwidth>, and I<--abort>; I<--info> is default except that I<--async> +implies I<--abort>. + +I<path> specifies fully-qualified path of the disk; it corresponds +to a unique target name (<target dev='name'/>) or source file (<source +file='name'/>) for one of the disk devices attached to I<domain> (see +also B<domblklist> for listing these names). -I<path> specifies fully-qualified path of the disk. If I<--abort> is specified, the active job on the specified disk will -be aborted. +be aborted. If I<--async> is also specified, this command will return +immediately, rather than waiting for the cancelation to complete. If I<--info> is specified, the active job information on the specified disk will be printed. I<bandwidth> can be used to set bandwidth limit for the active job. @@ -700,11 +711,15 @@ I<bandwidth> can be used to set bandwidth limit for the active job. =item B<blockresize> I<domain> I<path> I<size> Resize a block device of domain while the domain is running, I<path> -specifies the absolute path of the block device, I<size> is a scaled -integer (see B<NOTES> above) which defaults to KiB (blocks of 1024 bytes) -if there is no suffix. You must use a suffix of "B" to get bytes (note -that for historical reasons, this differs from B<vol-resize> which -defaults to bytes without a suffix). +specifies the absolute path of the block device; it corresponds +to a unique target name (<target dev='name'/>) or source file (<source +file='name'/>) for one of the disk devices attached to I<domain> (see +also B<domblklist> for listing these names). + +I<size> is a scaled integer (see B<NOTES> above) which defaults to KiB +(blocks of 1024 bytes) if there is no suffix. You must use a suffix of +"B" to get bytes (note that for historical reasons, this differs from +B<vol-resize> which defaults to bytes without a suffix). =item B<dominfo> I<domain-id> -- 1.7.7.6

On 04/09/2012 11:52 PM, Eric Blake wrote:
From: Adam Litke <agl@us.ibm.com>
Qemu has changed the semantics of the "block_job_cancel" API. The original qed implementation (pretty much only backported to RHEL 6.2 qemu) was synchronous (ie. upon command completion, the operation was guaranteed to be completely stopped). With the new semantics going into qemu 1.1 for qcow2, a "block_job_cancel" merely requests that the operation be cancelled and an event is triggered once the cancellation request has been honored.
To adopt the new semantics while preserving compatibility the following updates are made to the virDomainBlockJob API:
A new block job event type VIR_DOMAIN_BLOCK_JOB_CANCELED is recognized by libvirt. Regardless of the flags used with virDomainBlockJobAbort, this event will be raised whenever it is received from qemu. This event indicates that a block job has been successfully cancelled. For now, libvirt does not try to synthesize this event if using an older qemu that did not generate it.
Okay, as I understand it, the only qemu binary that has a synchronous block_job_cancel command is the version that is part of RHEL6.2. So any compatibility code to allow for a synchronous block_job_cancel command in qemu would only ever make a difference for someone who built from upstream libvirt sources (or post-RHEL6.2 source rpm) to run on RHEL6.2 (and *didn't* build a post-RHEL6.2 qemu). Is that correct?
A new extension flag VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC is added to the virDomainBlockJobAbort API. When enabled, this function will operate asynchronously (ie, it can return before the job has actually been canceled). When the API is used in this mode, it is the responsibility of the caller to wait for a VIR_DOMAIN_BLOCK_JOB_CANCELED event or poll via the virDomainGetBlockJobInfo API to check the cancellation status; this flag is an error if it is not known if the hypervisor supports asynchronous cancel.
This patch also exposes the new flag through virsh, as well as wiring up the new event, but leaves the new flag for a later patch.
Did you leave out a word there? It exposes the new flag through virsh, ..., but leaves [blank] the new flag for a later patch? "implementing"? (It looks that way - the front end is there, but the backend hasn't been changed, and the backend has been rearranged a bit, but it would still error out if someone specified the flag, and doesn't implement the loop for the case when they hadn't specified the flag). Since we're stuck with the assumption of "synchronous unless otherwise specified", and we definitely want to allow for asynchronous, I'm okay with this, *as long as upstream qemu has also committed (in git, not just in email) to the block job cancel command being asynchronous*. I *think* I understood you correctly that this is the case.
Signed-off-by: Adam Litke <agl@us.ibm.com> Cc: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Eric Blake <eblake@redhat.com> --- include/libvirt/libvirt.h.in | 10 ++++++++ src/libvirt.c | 14 ++++++++++- src/qemu/qemu_monitor_json.c | 44 ++++++++++++++++++++++++++++++----- tools/virsh.c | 51 ++++++++++++++++++++++++++--------------- tools/virsh.pod | 35 ++++++++++++++++++++-------- 5 files changed, 117 insertions(+), 37 deletions(-)
diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index 499dcd4..97ad99d 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -1946,6 +1946,15 @@ typedef enum { #endif } virDomainBlockJobType;
+/** + * virDomainBlockJobAbortFlags: + * + * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC: Request only, do not wait for completion + */ +typedef enum { + VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC = 1 << 0, +} virDomainBlockJobAbortFlags; + /* An iterator for monitoring block job operations */ typedef unsigned long long virDomainBlockJobCursor;
@@ -3617,6 +3626,7 @@ typedef void (*virConnectDomainEventGraphicsCallback)(virConnectPtr conn, typedef enum { VIR_DOMAIN_BLOCK_JOB_COMPLETED = 0, VIR_DOMAIN_BLOCK_JOB_FAILED = 1, + VIR_DOMAIN_BLOCK_JOB_CANCELED = 2,
#ifdef VIR_ENUM_SENTINELS VIR_DOMAIN_BLOCK_JOB_LAST diff --git a/src/libvirt.c b/src/libvirt.c index 16d1fd5..d44335a 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -17902,7 +17902,7 @@ error: * virDomainBlockJobAbort: * @dom: pointer to domain object * @disk: path to the block device, or device shorthand - * @flags: extra flags; not used yet, so callers should always pass 0 + * @flags: bitwise-OR of virDomainBlockJobAbortFlags * * Cancel the active block job on the given disk. * @@ -17913,6 +17913,18 @@ error: * can be found by calling virDomainGetXMLDesc() and inspecting * elements within //domain/devices/disk. * + * By default, this function performs a synchronous operation and the caller + * may assume that the operation has completed when 0 is returned. However, + * BlockJob operations may take a long time to complete, and during this time + * further domain interactions may be unresponsive. To avoid this problem, + * pass VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC in the @flags argument to enable + * asynchronous behavior; this flag fails if asynchronous support is not + * possible. If asynchronous jobs are supported, then when the job has + * been canceled, a BlockJob event will be emitted, with status + * VIR_DOMAIN_BLOCK_JOB_CANCELED (even if the ABORT_ASYNC flag was not + * used); but since events can be missed, it is also possible to use + * virDomainBlockJobInfo() to poll if the job is still running. + * * Returns -1 in case of failure, 0 when successful. */ int virDomainBlockJobAbort(virDomainPtr dom, const char *disk, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index c8ed087..7c893d4 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -58,13 +58,14 @@ static void qemuMonitorJSONHandleIOError(qemuMonitorPtr mon, virJSONValuePtr dat static void qemuMonitorJSONHandleVNCConnect(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleVNCInitialize(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleVNCDisconnect(qemuMonitorPtr mon, virJSONValuePtr data); -static void qemuMonitorJSONHandleBlockJob(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleSPICEConnect(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleSPICEInitialize(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleSPICEDisconnect(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleTrayChange(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandlePMWakeup(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandlePMSuspend(qemuMonitorPtr mon, virJSONValuePtr data); +static void qemuMonitorJSONHandleBlockJobCompleted(qemuMonitorPtr mon, virJSONValuePtr data); +static void qemuMonitorJSONHandleBlockJobCanceled(qemuMonitorPtr mon, virJSONValuePtr data);
static struct { const char *type; @@ -80,13 +81,14 @@ static struct { { "VNC_CONNECTED", qemuMonitorJSONHandleVNCConnect, }, { "VNC_INITIALIZED", qemuMonitorJSONHandleVNCInitialize, }, { "VNC_DISCONNECTED", qemuMonitorJSONHandleVNCDisconnect, }, - { "BLOCK_JOB_COMPLETED", qemuMonitorJSONHandleBlockJob, }, { "SPICE_CONNECTED", qemuMonitorJSONHandleSPICEConnect, }, { "SPICE_INITIALIZED", qemuMonitorJSONHandleSPICEInitialize, }, { "SPICE_DISCONNECTED", qemuMonitorJSONHandleSPICEDisconnect, }, { "DEVICE_TRAY_MOVED", qemuMonitorJSONHandleTrayChange, }, { "WAKEUP", qemuMonitorJSONHandlePMWakeup, }, { "SUSPEND", qemuMonitorJSONHandlePMSuspend, }, + { "BLOCK_JOB_COMPLETED", qemuMonitorJSONHandleBlockJobCompleted, }, + { "BLOCK_JOB_CANCELLED", qemuMonitorJSONHandleBlockJobCanceled, }, };
What? This isn't in alphabetical order? :-)
@@ -754,13 +756,15 @@ static void qemuMonitorJSONHandleSPICEDisconnect(qemuMonitorPtr mon, virJSONValu qemuMonitorJSONHandleGraphics(mon, data, VIR_DOMAIN_EVENT_GRAPHICS_DISCONNECT); }
-static void qemuMonitorJSONHandleBlockJob(qemuMonitorPtr mon, virJSONValuePtr data) +static void +qemuMonitorJSONHandleBlockJobImpl(qemuMonitorPtr mon, + virJSONValuePtr data, + int event) { const char *device; const char *type_str; int type = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN; unsigned long long offset, len; - int status = VIR_DOMAIN_BLOCK_JOB_FAILED;
if ((device = virJSONValueObjectGetString(data, "device")) == NULL) { VIR_WARN("missing device in block job event"); @@ -785,11 +789,21 @@ static void qemuMonitorJSONHandleBlockJob(qemuMonitorPtr mon, virJSONValuePtr da if (STREQ(type_str, "stream")) type = VIR_DOMAIN_BLOCK_JOB_TYPE_PULL;
- if (offset != 0 && offset == len) - status = VIR_DOMAIN_BLOCK_JOB_COMPLETED; + switch (event) { + case VIR_DOMAIN_BLOCK_JOB_COMPLETED: + /* Make sure the whole device has been processed */ + if (offset != len) + event = VIR_DOMAIN_BLOCK_JOB_FAILED;
Previously we also tested for offset != 0. Is that not actually necessary?
+ break; + case VIR_DOMAIN_BLOCK_JOB_CANCELED: + break; + case VIR_DOMAIN_BLOCK_JOB_FAILED: + VIR_DEBUG("should not get here"); + break; + }
out: - qemuMonitorEmitBlockJob(mon, device, type, status); + qemuMonitorEmitBlockJob(mon, device, type, event); }
static void @@ -832,6 +846,22 @@ qemuMonitorJSONHandlePMSuspend(qemuMonitorPtr mon, qemuMonitorEmitPMSuspend(mon); }
+static void +qemuMonitorJSONHandleBlockJobCompleted(qemuMonitorPtr mon, + virJSONValuePtr data) +{ + qemuMonitorJSONHandleBlockJobImpl(mon, data, + VIR_DOMAIN_BLOCK_JOB_COMPLETED); +} + +static void +qemuMonitorJSONHandleBlockJobCanceled(qemuMonitorPtr mon, + virJSONValuePtr data) +{ + qemuMonitorJSONHandleBlockJobImpl(mon, data, + VIR_DOMAIN_BLOCK_JOB_CANCELED); +} + int qemuMonitorJSONHumanCommandWithFd(qemuMonitorPtr mon, const char *cmd_str, diff --git a/tools/virsh.c b/tools/virsh.c index cfdd040..084d533 100644 --- a/tools/virsh.c +++ b/tools/virsh.c @@ -7525,6 +7525,7 @@ blockJobImpl(vshControl *ctl, const vshCmd *cmd, const char *name, *path; unsigned long bandwidth = 0; int ret = -1; + unsigned int flags = 0;
if (!vshConnectionUsability(ctl, ctl->conn)) goto cleanup; @@ -7541,7 +7542,9 @@ blockJobImpl(vshControl *ctl, const vshCmd *cmd, }
if (mode == VSH_CMD_BLOCK_JOB_ABORT) { - ret = virDomainBlockJobAbort(dom, path, 0); + if (vshCommandOptBool(cmd, "async")) + flags |= VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC; + ret = virDomainBlockJobAbort(dom, path, flags); } else if (mode == VSH_CMD_BLOCK_JOB_INFO) { ret = virDomainGetBlockJobInfo(dom, path, info, 0); } else if (mode == VSH_CMD_BLOCK_JOB_SPEED) { @@ -7589,20 +7592,25 @@ cmdBlockPull(vshControl *ctl, const vshCmd *cmd) }
/* - * "blockjobinfo" command + * "blockjob" command */ static const vshCmdInfo info_block_job[] = { - {"help", N_("Manage active block operations.")}, - {"desc", N_("Manage active block operations.")}, + {"help", N_("Manage active block operations")}, + {"desc", N_("Query, adjust speed, or cancel active block operations.")}, {NULL, NULL} };
static const vshCmdOptDef opts_block_job[] = { {"domain", VSH_OT_DATA, VSH_OFLAG_REQ, N_("domain name, id or uuid")}, {"path", VSH_OT_DATA, VSH_OFLAG_REQ, N_("Fully-qualified path of disk")}, - {"abort", VSH_OT_BOOL, VSH_OFLAG_NONE, N_("Abort the active job on the specified disk")}, - {"info", VSH_OT_BOOL, VSH_OFLAG_NONE, N_("Get active job information for the specified disk")}, - {"bandwidth", VSH_OT_DATA, VSH_OFLAG_NONE, N_("Set the Bandwidth limit in MB/s")}, + {"abort", VSH_OT_BOOL, VSH_OFLAG_NONE, + N_("Abort the active job on the specified disk")}, + {"async", VSH_OT_BOOL, VSH_OFLAG_NONE, + N_("don't wait for --abort to complete")}, + {"info", VSH_OT_BOOL, VSH_OFLAG_NONE, + N_("Get active job information for the specified disk")}, + {"bandwidth", VSH_OT_DATA, VSH_OFLAG_NONE, + N_("Set the Bandwidth limit in MB/s")}, {NULL, 0, 0, NULL} };
@@ -7613,19 +7621,24 @@ cmdBlockJob(vshControl *ctl, const vshCmd *cmd) virDomainBlockJobInfo info; const char *type; int ret; + bool abortMode = (vshCommandOptBool(cmd, "abort") || + vshCommandOptBool(cmd, "async")); + bool infoMode = vshCommandOptBool(cmd, "info"); + bool bandwidth = vshCommandOptBool(cmd, "bandwidth");
- if (vshCommandOptBool (cmd, "abort")) { - mode = VSH_CMD_BLOCK_JOB_ABORT; - } else if (vshCommandOptBool (cmd, "info")) { - mode = VSH_CMD_BLOCK_JOB_INFO; - } else if (vshCommandOptBool (cmd, "bandwidth")) { - mode = VSH_CMD_BLOCK_JOB_SPEED; - } else { + if (abortMode + infoMode + bandwidth > 1) { vshError(ctl, "%s", - _("One of --abort, --info, or --bandwidth is required")); + _("conflict between --abort, --info, and --bandwidth modes"));
"modes", or "options"? I guess "options" implies they can be combined, while "modes" implies they are mutually exclusive...
return false; }
+ if (abortMode) + mode = VSH_CMD_BLOCK_JOB_ABORT; + else if (bandwidth) + mode = VSH_CMD_BLOCK_JOB_SPEED; + else + mode = VSH_CMD_BLOCK_JOB_INFO; + ret = blockJobImpl(ctl, cmd, &info, mode); if (ret < 0) return false; @@ -7634,13 +7647,13 @@ cmdBlockJob(vshControl *ctl, const vshCmd *cmd) return true;
if (info.type == VIR_DOMAIN_BLOCK_JOB_TYPE_PULL) - type = "Block Pull"; + type = _("Block Pull"); else - type = "Unknown job"; + type = _("Unknown job");
print_job_progress(type, info.end - info.cur, info.end); if (info.bandwidth != 0) - vshPrint(ctl, " Bandwidth limit: %lu MB/s\n", info.bandwidth); + vshPrint(ctl, _(" Bandwidth limit: %lu MB/s\n"), info.bandwidth); return true; }
@@ -17115,8 +17128,8 @@ static const vshCmdDef domManagementCmds[] = { {"autostart", cmdAutostart, opts_autostart, info_autostart, 0}, {"blkdeviotune", cmdBlkdeviotune, opts_blkdeviotune, info_blkdeviotune, 0}, {"blkiotune", cmdBlkiotune, opts_blkiotune, info_blkiotune, 0}, - {"blockpull", cmdBlockPull, opts_block_pull, info_block_pull, 0}, {"blockjob", cmdBlockJob, opts_block_job, info_block_job, 0}, + {"blockpull", cmdBlockPull, opts_block_pull, info_block_pull, 0}, {"blockresize", cmdBlockResize, opts_block_resize, info_block_resize, 0}, {"change-media", cmdChangeMedia, opts_change_media, info_change_media, 0}, #ifndef WIN32 diff --git a/tools/virsh.pod b/tools/virsh.pod index a60e667..d6ba035 100644 --- a/tools/virsh.pod +++ b/tools/virsh.pod @@ -649,7 +649,10 @@ pulled, the disk no longer depends on that portion of the backing chain. It pulls data for the entire disk in the background, the process of the operation can be checked with B<blockjob>.
-I<path> specifies fully-qualified path of the disk. +I<path> specifies fully-qualified path of the disk; it corresponds +to a unique target name (<target dev='name'/>) or source file (<source +file='name'/>) for one of the disk devices attached to I<domain> (see +also B<domblklist> for listing these names). I<bandwidth> specifies copying bandwidth limit in Mbps.
=item B<blkdeviotune> I<domain> I<device> @@ -686,13 +689,21 @@ Both I<--live> and I<--current> flags may be given, but I<--current> is exclusive. If no flag is specified, behavior is different depending on hypervisor.
-=item B<blockjob> I<domain> I<path> [I<--abort>] [I<--info>] [I<bandwidth>] +=item B<blockjob> I<domain> I<path> { [I<--abort>] [I<--async>] | +[I<--info>] | [I<bandwidth>] }
-Manage active block operations. +Manage active block operations. There are three modes: I<--info>, +I<bandwidth>, and I<--abort>; I<--info> is default except that I<--async> +implies I<--abort>. + +I<path> specifies fully-qualified path of the disk; it corresponds +to a unique target name (<target dev='name'/>) or source file (<source +file='name'/>) for one of the disk devices attached to I<domain> (see +also B<domblklist> for listing these names).
-I<path> specifies fully-qualified path of the disk. If I<--abort> is specified, the active job on the specified disk will -be aborted. +be aborted. If I<--async> is also specified, this command will return +immediately, rather than waiting for the cancelation to complete. If I<--info> is specified, the active job information on the specified disk will be printed. I<bandwidth> can be used to set bandwidth limit for the active job. @@ -700,11 +711,15 @@ I<bandwidth> can be used to set bandwidth limit for the active job. =item B<blockresize> I<domain> I<path> I<size>
Resize a block device of domain while the domain is running, I<path> -specifies the absolute path of the block device, I<size> is a scaled -integer (see B<NOTES> above) which defaults to KiB (blocks of 1024 bytes) -if there is no suffix. You must use a suffix of "B" to get bytes (note -that for historical reasons, this differs from B<vol-resize> which -defaults to bytes without a suffix). +specifies the absolute path of the block device; it corresponds +to a unique target name (<target dev='name'/>) or source file (<source +file='name'/>) for one of the disk devices attached to I<domain> (see +also B<domblklist> for listing these names). + +I<size> is a scaled integer (see B<NOTES> above) which defaults to KiB +(blocks of 1024 bytes) if there is no suffix. You must use a suffix of +"B" to get bytes (note that for historical reasons, this differs from +B<vol-resize> which defaults to bytes without a suffix).
=item B<dominfo> I<domain-id>
Really no surprises here. Functionally ACK, but as with the other patches in this series, I'll leave the philosophical/strategic ACK to others :-)

On 04/10/2012 03:01 PM, Laine Stump wrote:
On 04/09/2012 11:52 PM, Eric Blake wrote:
From: Adam Litke <agl@us.ibm.com>
Qemu has changed the semantics of the "block_job_cancel" API. The original qed implementation (pretty much only backported to RHEL 6.2 qemu) was synchronous (ie. upon command completion, the operation was guaranteed to be completely stopped). With the new semantics going into qemu 1.1 for qcow2, a "block_job_cancel" merely requests that the operation be cancelled and an event is triggered once the cancellation request has been honored.
To adopt the new semantics while preserving compatibility the following updates are made to the virDomainBlockJob API:
A new block job event type VIR_DOMAIN_BLOCK_JOB_CANCELED is recognized by libvirt. Regardless of the flags used with virDomainBlockJobAbort, this event will be raised whenever it is received from qemu. This event indicates that a block job has been successfully cancelled. For now, libvirt does not try to synthesize this event if using an older qemu that did not generate it.
Okay, as I understand it, the only qemu binary that has a synchronous block_job_cancel command is the version that is part of RHEL6.2. So any compatibility code to allow for a synchronous block_job_cancel command in qemu would only ever make a difference for someone who built from upstream libvirt sources (or post-RHEL6.2 source rpm) to run on RHEL6.2 (and *didn't* build a post-RHEL6.2 qemu). Is that correct?
Correct, that's how I understand it. I'm cc'ing Adam and Stefan in case I missed something, though.
A new extension flag VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC is added to the virDomainBlockJobAbort API. When enabled, this function will operate asynchronously (ie, it can return before the job has actually been canceled). When the API is used in this mode, it is the responsibility of the caller to wait for a VIR_DOMAIN_BLOCK_JOB_CANCELED event or poll via the virDomainGetBlockJobInfo API to check the cancellation status; this flag is an error if it is not known if the hypervisor supports asynchronous cancel.
This patch also exposes the new flag through virsh, as well as wiring up the new event, but leaves the new flag for a later patch.
Did you leave out a word there? It exposes the new flag through virsh, ..., but leaves [blank] the new flag for a later patch? "implementing"?
Yes, adding 'implementing' helps it read better. Consider it done.
(It looks that way - the front end is there, but the backend hasn't been changed, and the backend has been rearranged a bit, but it would still error out if someone specified the flag, and doesn't implement the loop for the case when they hadn't specified the flag).
Since we're stuck with the assumption of "synchronous unless otherwise specified", and we definitely want to allow for asynchronous, I'm okay with this, *as long as upstream qemu has also committed (in git, not just in email) to the block job cancel command being asynchronous*. I *think* I understood you correctly that this is the case.
Yes, I'll update the commit message to point to the actual qemu commit id: commit 370521a1d6f5537ea7271c119f3fbb7b0fa57063 Author: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Date: Wed Jan 18 14:40:48 2012 +0000 qmp: add block_job_cancel command Add block_job_cancel, which stops an active block streaming operation. When the operation has been cancelled the new BLOCK_JOB_CANCELLED event is emitted. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Acked-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
@@ -80,13 +81,14 @@ static struct { { "VNC_CONNECTED", qemuMonitorJSONHandleVNCConnect, }, { "VNC_INITIALIZED", qemuMonitorJSONHandleVNCInitialize, }, { "VNC_DISCONNECTED", qemuMonitorJSONHandleVNCDisconnect, }, - { "BLOCK_JOB_COMPLETED", qemuMonitorJSONHandleBlockJob, }, { "SPICE_CONNECTED", qemuMonitorJSONHandleSPICEConnect, }, { "SPICE_INITIALIZED", qemuMonitorJSONHandleSPICEInitialize, }, { "SPICE_DISCONNECTED", qemuMonitorJSONHandleSPICEDisconnect, }, { "DEVICE_TRAY_MOVED", qemuMonitorJSONHandleTrayChange, }, { "WAKEUP", qemuMonitorJSONHandlePMWakeup, }, { "SUSPEND", qemuMonitorJSONHandlePMSuspend, }, + { "BLOCK_JOB_COMPLETED", qemuMonitorJSONHandleBlockJobCompleted, }, + { "BLOCK_JOB_CANCELLED", qemuMonitorJSONHandleBlockJobCanceled, }, };
What? This isn't in alphabetical order? :-)
You caught me off-guard! :) This hunk is basically unchanged from Adam (other than the fact that I swapped to U.S. spelling for the callback method name, even though qemu picked the U.K spelling for the event name). If I sort the event names, it will be as a separate patch. (In fact, as the number of qemu events grows, sorting the list will allow us to do binary lookups instead of linear lookups, if we were worried about scaling effects).
@@ -785,11 +789,21 @@ static void qemuMonitorJSONHandleBlockJob(qemuMonitorPtr mon, virJSONValuePtr da if (STREQ(type_str, "stream")) type = VIR_DOMAIN_BLOCK_JOB_TYPE_PULL;
- if (offset != 0 && offset == len) - status = VIR_DOMAIN_BLOCK_JOB_COMPLETED; + switch (event) { + case VIR_DOMAIN_BLOCK_JOB_COMPLETED: + /* Make sure the whole device has been processed */ + if (offset != len) + event = VIR_DOMAIN_BLOCK_JOB_FAILED;
Previously we also tested for offset != 0. Is that not actually necessary?
This was Adam's change, so hopefully I'm justifying it correctly: Previously, we assumed failure, then checked for offset != 0 && offset == len before declaring success. But unless len == 0, offset == len _implies_ offset != 0, rendering the first check redundant. Now we assume success, and if offset != len declare failure. Yes, this allows a difference in behavior if len == 0; but on the other hand, I think the only way a block job can have len==0 is if there is nothing for the job to do in the first place, and if that is the case, you might as well declare it a success instead of a failure.
@@ -7613,19 +7621,24 @@ cmdBlockJob(vshControl *ctl, const vshCmd *cmd) virDomainBlockJobInfo info; const char *type; int ret; + bool abortMode = (vshCommandOptBool(cmd, "abort") || + vshCommandOptBool(cmd, "async")); + bool infoMode = vshCommandOptBool(cmd, "info"); + bool bandwidth = vshCommandOptBool(cmd, "bandwidth");
- if (vshCommandOptBool (cmd, "abort")) { - mode = VSH_CMD_BLOCK_JOB_ABORT; - } else if (vshCommandOptBool (cmd, "info")) { - mode = VSH_CMD_BLOCK_JOB_INFO; - } else if (vshCommandOptBool (cmd, "bandwidth")) { - mode = VSH_CMD_BLOCK_JOB_SPEED; - } else { + if (abortMode + infoMode + bandwidth > 1) { vshError(ctl, "%s", - _("One of --abort, --info, or --bandwidth is required")); + _("conflict between --abort, --info, and --bandwidth modes"));
"modes", or "options"? I guess "options" implies they can be combined, while "modes" implies they are mutually exclusive...
and they are indeed mutually exclusive.
Really no surprises here. Functionally ACK, but as with the other patches in this series, I'll leave the philosophical/strategic ACK to others :-)
And based on what we decide with qemu making it possible to detect async block cancel differently from RHEL 6.2 (see this email[1]), it would be reasonable to push this patch even if we defer the rest of the series. [1] https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg01111.html -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 04/10/2012 03:30 PM, Eric Blake wrote:
Okay, as I understand it, the only qemu binary that has a synchronous block_job_cancel command is the version that is part of RHEL6.2. So any compatibility code to allow for a synchronous block_job_cancel command in qemu would only ever make a difference for someone who built from upstream libvirt sources (or post-RHEL6.2 source rpm) to run on RHEL6.2 (and *didn't* build a post-RHEL6.2 qemu). Is that correct?
Correct, that's how I understand it. I'm cc'ing Adam and Stefan in case I missed something, though.
And based on what we decide with qemu making it possible to detect async block cancel differently from RHEL 6.2 (see this email[1]), it would be reasonable to push this patch even if we defer the rest of the series.
[1] https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg01111.html
Qemu just made it possible to tell the two apart. Luiz has accepted this patch[1] into the qmp queue, which means: RHEL 6.2 -> block_job_cancel, synchronous qemu 1.1 -> block-job-cancel, asynch [1] https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg01225.html I'm assuming (hoping, really) that RHEL 6.3 will likewise go with the qemu 1.1 naming, since RHEL 6.3 is committed to the async interface. But that means I need to respin this patch to deal with the two namings for the two behaviors. I'll split the async block-job-cancel handling into a separate series that can be applied right away (as upstream qemu is now committed to it), in contrast with the rest of this series dealing with 'drive-mirror' that is still in a state of upstream flux. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

From: Adam Litke <agl@us.ibm.com> Without the VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag, libvirt will internally poll using qemu's "query-block-jobs" API and will not return until the operation has been completed. API users are advised that this operation is unbounded and further interaction with the domain during this period may block. Future patches may refactor things to allow other queries in parallel with this polling. For older qemu (namely RHEL 6.2, which backported an earlier version of block_job_cancel for the qed file format but was always synchronous), we don't synthesize the completion event, so instead we refuse to honor the async flag. Signed-off-by: Adam Litke <agl@us.ibm.com> Cc: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Eric Blake <eblake@redhat.com> --- src/qemu/qemu_driver.c | 67 ++++++++++++++++++++++++++++++++++++++++++----- 1 files changed, 59 insertions(+), 8 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index d9e35be..077c161 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11599,7 +11599,7 @@ cleanup: static int qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, unsigned long bandwidth, virDomainBlockJobInfoPtr info, - int mode) + int mode, unsigned int flags) { struct qemud_driver *driver = dom->conn->privateData; virDomainObjPtr vm = NULL; @@ -11621,6 +11621,18 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, if (!device) { goto cleanup; } + priv = vm->privateData; + + /* Asynchronous job cancellation was introduced at the same time + * as drive-mirror jobs, for upstream qemu 1.1. */ + if (mode == QEMU_JOB_ABORT && + (flags & VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC) && + !qemuCapsGet(priv->qemuCaps, QEMU_CAPS_DRIVE_MIRROR)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("Asynchronous cancel is not supported with " + "this QEMU binary")); + goto cleanup; + } if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_MODIFY) < 0) goto cleanup; @@ -11632,7 +11644,6 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, } qemuDomainObjEnterMonitorWithDriver(driver, vm); - priv = vm->privateData; /* XXX - add a qemu capability check, since only qemu 1.1 or newer * supports the base argument. * XXX - libvirt should really be tracking the backing file chain @@ -11641,6 +11652,44 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, ret = qemuMonitorBlockJob(priv->mon, device, base, bandwidth, info, mode); qemuDomainObjExitMonitorWithDriver(driver, vm); + /* Qemu provides asynchronous block job cancellation, but without + * the VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag libvirt guarantees a + * synchronous operation. Provide this behavior by waiting here, + * so we don't get confused by newly scheduled block jobs. + */ + if (ret == 0 && mode == BLOCK_JOB_ABORT && + !(flags & VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC)) { + while (1) { + /* Poll every 50ms */ + static struct timespec ts = { .tv_sec = 0, + .tv_nsec = 50 * 1000 * 1000ull }; + virDomainBlockJobInfo dummy; + + qemuDomainObjEnterMonitorWithDriver(driver, vm); + ret = qemuMonitorBlockJob(priv->mon, device, NULL, 0, &dummy, + BLOCK_JOB_INFO); + qemuDomainObjExitMonitorWithDriver(driver, vm); + + if (ret <= 0) + break; + + virDomainObjUnlock(vm); + qemuDriverUnlock(driver); + + nanosleep(&ts, NULL); + + qemuDriverLock(driver); + virDomainObjLock(vm); + + if (!virDomainObjIsActive(vm)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain is not running")); + ret = -1; + break; + } + } + } + endjob: if (qemuDomainObjEndJob(driver, vm) == 0) { vm = NULL; @@ -11658,8 +11707,9 @@ cleanup: static int qemuDomainBlockJobAbort(virDomainPtr dom, const char *path, unsigned int flags) { - virCheckFlags(0, -1); - return qemuDomainBlockJobImpl(dom, path, NULL, 0, NULL, BLOCK_JOB_ABORT); + virCheckFlags(VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC, -1); + return qemuDomainBlockJobImpl(dom, path, NULL, 0, NULL, BLOCK_JOB_ABORT, + flags); } static int @@ -11667,7 +11717,8 @@ qemuDomainGetBlockJobInfo(virDomainPtr dom, const char *path, virDomainBlockJobInfoPtr info, unsigned int flags) { virCheckFlags(0, -1); - return qemuDomainBlockJobImpl(dom, path, NULL, 0, info, BLOCK_JOB_INFO); + return qemuDomainBlockJobImpl(dom, path, NULL, 0, info, BLOCK_JOB_INFO, + flags); } static int @@ -11676,7 +11727,7 @@ qemuDomainBlockJobSetSpeed(virDomainPtr dom, const char *path, { virCheckFlags(0, -1); return qemuDomainBlockJobImpl(dom, path, NULL, bandwidth, NULL, - BLOCK_JOB_SPEED); + BLOCK_JOB_SPEED, flags); } static int @@ -11687,10 +11738,10 @@ qemuDomainBlockRebase(virDomainPtr dom, const char *path, const char *base, virCheckFlags(0, -1); ret = qemuDomainBlockJobImpl(dom, path, base, bandwidth, NULL, - BLOCK_JOB_PULL); + BLOCK_JOB_PULL, flags); if (ret == 0 && bandwidth != 0) ret = qemuDomainBlockJobImpl(dom, path, NULL, bandwidth, NULL, - BLOCK_JOB_SPEED); + BLOCK_JOB_SPEED, flags); return ret; } -- 1.7.7.6

In my testing, I was able to provoke an odd block pull failure: $ virsh blockpull dom vda --bandwidth 10000 error: Requested operation is not valid: No active operation on device: drive-virtio-disk0 merely by using gdb to artifically wait to do the block job set speed until after the pull had already finished. But in reality, that should be a success, since the pull finished before we had a chance to set speed. Furthermore, using a double job lock is annoying; we can alter the speed as part of the same job that started the block pull for less likelihood of hitting the speed failure in the first place. In fixing this, I discovered commit 10ec36e2 was wrong for marking the info parameter of qemuMonitorBlockJob as non-null. * src/qemu/qemu_monitor.h (qemuMonitorBlockJob): Fix prototype. (BLOCK_JOB_CMD): Add new mode. * src/qemu/qemu_driver.c (qemuDomainBlockRebase): Move secondary job call... (qemuDomainBlockJobImpl): ...here, for fewer locks. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONBlockJob): Change return value on new internal mode. --- src/qemu/qemu_driver.c | 18 +++++++++--------- src/qemu/qemu_monitor.h | 5 +++-- src/qemu/qemu_monitor_json.c | 32 ++++++++++++++++++++------------ 3 files changed, 32 insertions(+), 23 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 077c161..425d340 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11650,14 +11650,20 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, * itself, and validating that base is on the chain, rather than * relying on qemu to do this. */ ret = qemuMonitorBlockJob(priv->mon, device, base, bandwidth, info, mode); + if (ret == 0 && mode == BLOCK_JOB_PULL && bandwidth) + ret = qemuMonitorBlockJob(priv->mon, device, NULL, bandwidth, NULL, + BLOCK_JOB_SPEED_INTERNAL); qemuDomainObjExitMonitorWithDriver(driver, vm); + if (ret < 0) + goto endjob; + /* Qemu provides asynchronous block job cancellation, but without * the VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag libvirt guarantees a * synchronous operation. Provide this behavior by waiting here, * so we don't get confused by newly scheduled block jobs. */ - if (ret == 0 && mode == BLOCK_JOB_ABORT && + if (mode == BLOCK_JOB_ABORT && !(flags & VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC)) { while (1) { /* Poll every 50ms */ @@ -11734,15 +11740,9 @@ static int qemuDomainBlockRebase(virDomainPtr dom, const char *path, const char *base, unsigned long bandwidth, unsigned int flags) { - int ret; - virCheckFlags(0, -1); - ret = qemuDomainBlockJobImpl(dom, path, base, bandwidth, NULL, - BLOCK_JOB_PULL, flags); - if (ret == 0 && bandwidth != 0) - ret = qemuDomainBlockJobImpl(dom, path, NULL, bandwidth, NULL, - BLOCK_JOB_SPEED, flags); - return ret; + return qemuDomainBlockJobImpl(dom, path, base, bandwidth, NULL, + BLOCK_JOB_PULL, flags); } static int diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index b480966..2e6ac79 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -530,7 +530,8 @@ typedef enum { BLOCK_JOB_ABORT = 0, BLOCK_JOB_INFO = 1, BLOCK_JOB_SPEED = 2, - BLOCK_JOB_PULL = 3, + BLOCK_JOB_SPEED_INTERNAL = 3, + BLOCK_JOB_PULL = 4, } BLOCK_JOB_CMD; int qemuMonitorBlockJob(qemuMonitorPtr mon, @@ -539,7 +540,7 @@ int qemuMonitorBlockJob(qemuMonitorPtr mon, unsigned long bandwidth, virDomainBlockJobInfoPtr info, int mode) - ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(5); + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); int qemuMonitorOpenGraphics(qemuMonitorPtr mon, const char *protocol, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 7c893d4..aea3765 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -3438,7 +3438,7 @@ qemuMonitorJSONBlockJob(qemuMonitorPtr mon, } else if (mode == BLOCK_JOB_INFO) { cmd_name = "query-block-jobs"; cmd = qemuMonitorJSONMakeCommand(cmd_name, NULL); - } else if (mode == BLOCK_JOB_SPEED) { + } else if (mode == BLOCK_JOB_SPEED || mode == BLOCK_JOB_SPEED_INTERNAL) { cmd_name = "block_job_set_speed"; cmd = qemuMonitorJSONMakeCommand(cmd_name, "s:device", device, "U:value", @@ -3460,22 +3460,30 @@ qemuMonitorJSONBlockJob(qemuMonitorPtr mon, ret = qemuMonitorJSONCommand(mon, cmd, &reply); if (ret == 0 && virJSONValueObjectHasKey(reply, "error")) { - if (qemuMonitorJSONHasError(reply, "DeviceNotActive")) - qemuReportError(VIR_ERR_OPERATION_INVALID, - _("No active operation on device: %s"), device); - else if (qemuMonitorJSONHasError(reply, "DeviceInUse")) + ret = -1; + if (qemuMonitorJSONHasError(reply, "DeviceNotActive")) { + /* If a job completes before we get a chance to set the + * speed, we don't want to fail the original command. */ + if (mode == BLOCK_JOB_SPEED_INTERNAL) + ret = 0; + else + qemuReportError(VIR_ERR_OPERATION_INVALID, + _("No active operation on device: %s"), + device); + } else if (qemuMonitorJSONHasError(reply, "DeviceInUse")){ qemuReportError(VIR_ERR_OPERATION_FAILED, - _("Device %s in use"), device); - else if (qemuMonitorJSONHasError(reply, "NotSupported")) + _("Device %s in use"), device); + } else if (qemuMonitorJSONHasError(reply, "NotSupported")) { qemuReportError(VIR_ERR_OPERATION_INVALID, - _("Operation is not supported for device: %s"), device); - else if (qemuMonitorJSONHasError(reply, "CommandNotFound")) + _("Operation is not supported for device: %s"), + device); + } else if (qemuMonitorJSONHasError(reply, "CommandNotFound")) { qemuReportError(VIR_ERR_OPERATION_INVALID, - _("Command '%s' is not found"), cmd_name); - else + _("Command '%s' is not found"), cmd_name); + } else { qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("Unexpected error")); - ret = -1; + } } if (ret == 0 && mode == BLOCK_JOB_INFO) -- 1.7.7.6

This patch introduces a new block job, useful for live storage migration using pre-copy streaming. Using a live VM with the backing chain: base <- snap1 <- snap2 as the starting point, we have: - virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY) creates /path/to/copy with the same format as snap2, with no backing file, so entire chain is copied and flattened - virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) creates /path/to/copy as a raw file, so entire chain is copied and flattened - virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_SHALLOW) creates /path/to/copy with the same format as snap2, but with snap1 as a backing file, so only snap2 is copied. - virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT) reuse existing /path/to/copy (must have empty contents, and format is probed from the metadata), and copy the full chain - virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT| VIR_DOMAIN_BLOCK_REBASE_SHALLOW) reuse existing /path/to/copy (contents must be identical to snap1, and format is probed from the metadata), and copy only the contents of snap2 - virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT| VIR_DOMAIN_BLOCK_REBASE_SHALLOW|VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) reuse existing /path/to/copy (must be raw volume with contents identical to snap1), and copy only the contents of snap2 Less useful combinations: - virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_SHALLOW| VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) fail if source is not raw, otherwise create /path/to/copy as raw and the single file is copied (no chain involved) - virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT| VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) makes little sense: the destination must be raw but have no contents, meaning that it is an empty file, so there is nothing to reuse The other three flags are rejected without VIR_DOMAIN_BLOCK_COPY. It would be nice if we could issue an event when pivoting from phase 1 to phase 2, but qemu hasn't implemented that, and we would have to poll in order to synthesize it ourselves. Meanwhile, qemu will give us a distinct job info and completion event when we either cancel or pivot to end the job. Pivoting is accomplished via the new: virDomainBlockJobAbort(dom, disk, VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT) Management applications can pre-create the copy with a relative backing file name, and use the VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT flag to have qemu reuse the metadata; if the management application also copies the backing files to a new location, this can be used to perform live storage migration of an entire backing chain. * include/libvirt/libvirt.h.in (VIR_DOMAIN_BLOCK_JOB_TYPE_COPY): New block job type. (virDomainBlockJobAbortFlags, virDomainBlockRebaseFlags): New enums. * src/libvirt.c (virDomainBlockRebase): Document the new flags, and implement general restrictions on flag combinations. (virDomainBlockJobAbort): Document the new flag. (virDomainSaveFlags, virDomainSnapshotCreateXML) (virDomainRevertToSnapshot, virDomainDetachDeviceFlags): Document restrictions. * include/libvirt/virterror.h (VIR_ERR_BLOCK_COPY_ACTIVE): New error. * src/util/virterror.c (virErrorMsg): Define it. --- include/libvirt/libvirt.h.in | 24 ++++++++++- include/libvirt/virterror.h | 1 + src/libvirt.c | 94 ++++++++++++++++++++++++++++++++++++++---- src/util/virterror.c | 6 +++ 4 files changed, 115 insertions(+), 10 deletions(-) diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index 97ad99d..ac5df95 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -1934,12 +1934,15 @@ int virDomainUpdateDeviceFlags(virDomainPtr domain, /** * virDomainBlockJobType: * - * VIR_DOMAIN_BLOCK_JOB_TYPE_PULL: Block Pull (virDomainBlockPull or - * virDomainBlockRebase) + * VIR_DOMAIN_BLOCK_JOB_TYPE_PULL: Block Pull (virDomainBlockPull, or + * virDomainBlockRebase without flags), job ends on completion + * VIR_DOMAIN_BLOCK_JOB_TYPE_COPY: Block Copy (virDomainBlockRebase with + * flags), job exists as long as mirroring is active */ typedef enum { VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN = 0, VIR_DOMAIN_BLOCK_JOB_TYPE_PULL = 1, + VIR_DOMAIN_BLOCK_JOB_TYPE_COPY = 2, #ifdef VIR_ENUM_SENTINELS VIR_DOMAIN_BLOCK_JOB_TYPE_LAST @@ -1950,9 +1953,11 @@ typedef enum { * virDomainBlockJobAbortFlags: * * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC: Request only, do not wait for completion + * VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT: Pivot to mirror when ending a copy job */ typedef enum { VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC = 1 << 0, + VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT = 1 << 1, } virDomainBlockJobAbortFlags; /* An iterator for monitoring block job operations */ @@ -1983,6 +1988,21 @@ int virDomainBlockJobSetSpeed(virDomainPtr dom, const char *disk, int virDomainBlockPull(virDomainPtr dom, const char *disk, unsigned long bandwidth, unsigned int flags); + +/** + * virDomainBlockRebaseFlags: + * + * Flags available for virDomainBlockRebase(). + */ +typedef enum { + VIR_DOMAIN_BLOCK_REBASE_SHALLOW = 1 << 0, /* Limit copy to top of source + backing chain */ + VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT = 1 << 1, /* Reuse existing external + file for a copy */ + VIR_DOMAIN_BLOCK_REBASE_COPY_RAW = 1 << 2, /* Make destination file raw */ + VIR_DOMAIN_BLOCK_REBASE_COPY = 1 << 3, /* Start a copy job */ +} virDomainBlockRebaseFlags; + int virDomainBlockRebase(virDomainPtr dom, const char *disk, const char *base, unsigned long bandwidth, unsigned int flags); diff --git a/include/libvirt/virterror.h b/include/libvirt/virterror.h index e04d29e..070fdb5 100644 --- a/include/libvirt/virterror.h +++ b/include/libvirt/virterror.h @@ -249,6 +249,7 @@ typedef enum { VIR_ERR_NO_DOMAIN_METADATA = 80, /* The metadata is not present */ VIR_ERR_MIGRATE_UNSAFE = 81, /* Migration is not safe */ VIR_ERR_OVERFLOW = 82, /* integer overflow */ + VIR_ERR_BLOCK_COPY_ACTIVE = 83, /* action prevented by block copy job */ } virErrorNumber; /** diff --git a/src/libvirt.c b/src/libvirt.c index d44335a..753a2e0 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -2696,6 +2696,10 @@ error: * A save file can be inspected or modified slightly with * virDomainSaveImageGetXMLDesc() and virDomainSaveImageDefineXML(). * + * Some hypervisors may prevent this operation if there is a current + * block copy operation; in that case, use virDomainBlockJobAbort() + * to stop the block copy first. + * * Returns 0 in case of success and -1 in case of failure. */ int @@ -7891,6 +7895,11 @@ error: * virDomainUndefine(). A previous definition for this domain would be * overriden if it already exists. * + * Some hypervisors may prevent this operation if there is a current + * block copy operation on a transient domain with the same id as the + * domain being defined; in that case, use virDomainBlockJobAbort() to + * stop the block copy first. + * * Returns NULL in case of error, a pointer to the domain otherwise */ virDomainPtr @@ -9424,6 +9433,10 @@ error: * return failure if LIVE is specified but it only supports removing the * persisted device allocation. * + * Some hypervisors may prevent this operation if there is a current + * block copy operation on the device being detached; in that case, + * use virDomainBlockJobAbort() to stop the block copy first. + * * Returns 0 in case of success, -1 in case of failure. */ int @@ -17124,6 +17137,10 @@ virDomainSnapshotGetConnect(virDomainSnapshotPtr snapshot) * that it is still possible to fail after disks have changed, but only * in the much rarer cases of running out of memory or disk space). * + * Some hypervisors may prevent this operation if there is a current + * block copy operation; in that case, use virDomainBlockJobAbort() + * to stop the block copy first. + * * Returns an (opaque) virDomainSnapshotPtr on success, NULL on failure. */ virDomainSnapshotPtr @@ -17913,7 +17930,8 @@ error: * can be found by calling virDomainGetXMLDesc() and inspecting * elements within //domain/devices/disk. * - * By default, this function performs a synchronous operation and the caller + * If the current block job for @disk is VIR_DOMAIN_BLOCK_JOB_TYPE_PULL, then + * by default, this function performs a synchronous operation and the caller * may assume that the operation has completed when 0 is returned. However, * BlockJob operations may take a long time to complete, and during this time * further domain interactions may be unresponsive. To avoid this problem, @@ -17925,6 +17943,15 @@ error: * used); but since events can be missed, it is also possible to use * virDomainBlockJobInfo() to poll if the job is still running. * + * If the current block job for @disk is VIR_DOMAIN_BLOCK_JOB_TYPE_COPY, then + * the default is to abort the mirroring and revert to the source disk; + * adding @flags of VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT causes this call to + * fail with VIR_ERR_BLOCK_COPY_ACTIVE if the copy is not fully populated, + * otherwise it will swap the disk over to the copy to end the mirroring. An + * event will be issued when the job is ended, and it is possible to use + * VIR_DOMAIN_BLOCK_JOB_ABORT_SYNC to control whether this command waits + * for the completion of the job. + * * Returns -1 in case of failure, 0 when successful. */ int virDomainBlockJobAbort(virDomainPtr dom, const char *disk, @@ -18173,19 +18200,55 @@ error: * @disk: path to the block device, or device shorthand * @base: path to backing file to keep, or NULL for no backing file * @bandwidth: (optional) specify copy bandwidth limit in Mbps - * @flags: extra flags; not used yet, so callers should always pass 0 + * @flags: bitwise-OR of virDomainBlockRebaseFlags * * Populate a disk image with data from its backing image chain, and - * setting the backing image to @base. @base must be the absolute + * setting the backing image to @base, or alternatively copy an entire + * backing chain to a new file @base. + * + * When @flags is 0, this starts a pull, where @base must be the absolute * path of one of the backing images further up the chain, or NULL to * convert the disk image so that it has no backing image. Once all * data from its backing image chain has been pulled, the disk no * longer depends on those intermediate backing images. This function * pulls data for the entire device in the background. Progress of - * the operation can be checked with virDomainGetBlockJobInfo() and - * the operation can be aborted with virDomainBlockJobAbort(). When - * finished, an asynchronous event is raised to indicate the final - * status. + * the operation can be checked with virDomainGetBlockJobInfo() with a + * job type of VIR_DOMAIN_BLOCK_JOB_TYPE_PULL, and the operation can be + * aborted with virDomainBlockJobAbort(). When finished, an asynchronous + * event is raised to indicate the final status, and the job no longer + * exists. + * + * When @flags includes VIR_DOMAIN_BLOCK_REBASE_COPY, this starts a copy, + * where @base must be the name of a new file to copy the chain to. By + * default, the copy will pull the entire source chain into the destination + * file, but if @flags also contains VIR_DOMAIN_BLOCK_REBASE_SHALLOW, then + * only the top of the source chain will be copied (the source and + * destination have a common backing file). By default, @base will be + * created with the same file format as the source, but this can be altered + * by adding VIR_DOMAIN_BLOCK_REBASE_COPY_RAW to force the copy to be raw + * (does not make sense with the shallow flag unless the source is also raw), + * or by using VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT to reuse an existing file + * with initial contents identical to the backing file of the source (this + * allows a management app to pre-create files with relative backing file + * names, rather than the default of absolute backing file names; it is + * generally used with the shallow flag, since otherwise the destination + * file must start with empty contents). + * + * A copy job has two parts; in the first phase, the @bandwidth parameter + * affects how fast the source is pulled into the destination, and the job + * can only be canceled by reverting to the source file; progress in this + * phase can be tracked via the virDomainBlockJobInfo() command, with a + * job type of VIR_DOMAIN_BLOCK_JOB_TYPE_COPY. The job transitions to the + * second phase when the job info states cur == end, and remains alive to + * mirror all further changes to both source and destination. The user + * must call virDomainBlockJobAbort() to end the mirroring while choosing + * whether to revert to source or pivot to the destination. An event is + * issued when the job ends, and in the future, an event may be added when + * the job transitions from pulling to mirroring. + * + * Some hypervisors will restrict certain actions, such as virDomainSave() + * or virDomainDetachDevice(), while a copy job is active; they may + * also restrict a copy job to transient domains. * * The @disk parameter is either an unambiguous source name of the * block device (the <source file='...'/> sub-element, such as @@ -18199,7 +18262,8 @@ error: * suitable default. Some hypervisors do not support this feature and will * return an error if bandwidth is not 0. * - * When @base is NULL, this is identical to virDomainBlockPull(). + * When @base is NULL and @flags is 0, this is identical to + * virDomainBlockPull(). * * Returns 0 if the operation has started, -1 on failure. */ @@ -18232,6 +18296,20 @@ int virDomainBlockRebase(virDomainPtr dom, const char *disk, goto error; } + if (flags & VIR_DOMAIN_BLOCK_REBASE_COPY) { + if (!base) { + virLibDomainError(VIR_ERR_INVALID_ARG, + _("base is required when starting a copy")); + goto error; + } + } else if (flags & (VIR_DOMAIN_BLOCK_REBASE_SHALLOW | + VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT | + VIR_DOMAIN_BLOCK_REBASE_COPY_RAW)) { + virLibDomainError(VIR_ERR_INVALID_ARG, + _("use of flags requires a copy job")); + goto error; + } + if (conn->driver->domainBlockRebase) { int ret; ret = conn->driver->domainBlockRebase(dom, disk, base, bandwidth, diff --git a/src/util/virterror.c b/src/util/virterror.c index ff9a36f..845081e 100644 --- a/src/util/virterror.c +++ b/src/util/virterror.c @@ -1250,6 +1250,12 @@ virErrorMsg(virErrorNumber error, const char *info) else errmsg = _("numerical overflow: %s"); break; + case VIR_ERR_BLOCK_COPY_ACTIVE: + if (!info) + errmsg = _("block copy still active"); + else + errmsg = _("block copy still active: %s"); + break; } return errmsg; } -- 1.7.7.6

On Mon, Apr 09, 2012 at 21:52:14 -0600, Eric Blake wrote:
This patch introduces a new block job, useful for live storage migration using pre-copy streaming.
Using a live VM with the backing chain: base <- snap1 <- snap2 as the starting point, we have:
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY) creates /path/to/copy with the same format as snap2, with no backing file, so entire chain is copied and flattened
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) creates /path/to/copy as a raw file, so entire chain is copied and flattened
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_SHALLOW) creates /path/to/copy with the same format as snap2, but with snap1 as a backing file, so only snap2 is copied.
In other words, this doesn't do any rebase at all, it just copies snap2 into copy and the result is the following chain: base <- snap1 <- copy Did I get it right?
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT) reuse existing /path/to/copy (must have empty contents, and format is probed from the metadata), and copy the full chain
Hmm, in the past we tried to avoid format probing for security reasons. Shouldn't we avoid introducing new code that needs format probing? However, I'm not sure that's doable at all in this case.
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT| VIR_DOMAIN_BLOCK_REBASE_SHALLOW) reuse existing /path/to/copy (contents must be identical to snap1, and format is probed from the metadata), and copy only the contents of snap2
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT| VIR_DOMAIN_BLOCK_REBASE_SHALLOW|VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) reuse existing /path/to/copy (must be raw volume with contents identical to snap1), and copy only the contents of snap2
Less useful combinations:
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_SHALLOW| VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) fail if source is not raw, otherwise create /path/to/copy as raw and the single file is copied (no chain involved)
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT| VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) makes little sense: the destination must be raw but have no contents, meaning that it is an empty file, so there is nothing to reuse
The other three flags are rejected without VIR_DOMAIN_BLOCK_COPY.
It would be nice if we could issue an event when pivoting from phase 1 to phase 2, but qemu hasn't implemented that, and we would have to poll in order to synthesize it ourselves. Meanwhile, qemu will give us a distinct job info and completion event when we either cancel or pivot to end the job. Pivoting is accomplished via the new:
virDomainBlockJobAbort(dom, disk, VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)
Management applications can pre-create the copy with a relative backing file name, and use the VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT flag to have qemu reuse the metadata; if the management application also copies the backing files to a new location, this can be used to perform live storage migration of an entire backing chain.
This all sounds quite complicated but you explained it well in the documentation block for virDomainBlockRebase. When reading the above, I thought an example showing all steps needed to copy a disk image would be useful but reading virDomainBlockRebase documentation clarified it enough that I don't think the example is required :-) This all looks good and I think sticking this functionality into virDomainBlockRebase makes sense. Since the qemu interface is not set in stone yet, I won't formally ack any of these block copy patches :-) Jirka

On 04/13/2012 03:38 AM, Jiri Denemark wrote:
On Mon, Apr 09, 2012 at 21:52:14 -0600, Eric Blake wrote:
This patch introduces a new block job, useful for live storage migration using pre-copy streaming.
Using a live VM with the backing chain: base <- snap1 <- snap2 as the starting point, we have:
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY) creates /path/to/copy with the same format as snap2, with no backing file, so entire chain is copied and flattened
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) creates /path/to/copy as a raw file, so entire chain is copied and flattened
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_SHALLOW) creates /path/to/copy with the same format as snap2, but with snap1 as a backing file, so only snap2 is copied.
In other words, this doesn't do any rebase at all, it just copies snap2 into copy and the result is the following chain: base <- snap1 <- copy
Did I get it right?
Yes, you got the correct end result. I'll restate things in a slightly different manner, to explain even more why I like the naming: virDomainBlockRebase(,0) sets up a one-shot job that will eventually rebase the current image to have a different backing file; when the job completes (automatically), the rebase is complete (that is, the qemu process now has a different backing chain in memory). It is a property of 'block_stream' that the job can be aborted and restarted at will (if you abort at 20% pulled, then the next pull will start at 20%). virDomainBlockRebase(,_REBASE_COPY) sets up a long-running job that never completes without user intervention. When the user completes the job by use of virDomainBlockJobAbort(,_ABORT_PIVOT), then the rebase is complete (that is, the qemu process now has a different backing chain in memory). The job can be aborted at will, but if you abort before the streaming phase is complete, you have to start from scratch on the next attempt (that is, if you abort 'drive-mirror' at 20%, you have to restart it at 0%). More interesting is the case where you abort after streaming is complete and you are now in the mirroring phase - in that case, the copy is guaranteed to be consistent with the state of the source at the time you reverted to the source. But since both operations (with or without the flag) set up a block job that starts a rebase, and it is the completion of the block job (whether automatically or by user intervention) that finally commits qemu to using the alternate backing chain in memory, both operations fit under the generic name of causing a BlockRebase.
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT) reuse existing /path/to/copy (must have empty contents, and format is probed from the metadata), and copy the full chain
Hmm, in the past we tried to avoid format probing for security reasons. Shouldn't we avoid introducing new code that needs format probing? However, I'm not sure that's doable at all in this case.
I definitely thought about that. The security hold can be summarized as: If you give the guest a raw image, but do not explicitly mark it as raw, then the guest can trivially (as in, without having to crack qemu) modify the header of the image in such a manner as to make a future probe detect something other than raw. For all other image types, there is no way for a guest to fake the wrong image type (at least, not without also cracking qemu, in which case you've got worse problems on your hand), since the guest cannot corrupt the metadata that a probe is reading. Therefore, there are two ways to defeat this hole, and using either method works in isolation to avoid the hole: Mitigation #1: Never probe a raw image; any other type of image can be safely probed once you have eliminated raw images. Mitigation #2: Be explicit about the image format for all images. You are asking for mitigation 2 (and see patch 16/18 where I give that to you in a new API virDomainBlockCopy). But I have already given you mitigation 1: if you specify VIR_DOMAIN_BLOCK_REBASE_COPY_RAW, then you are avoiding the probe by declaring the image to be raw, and for all other image types, the probe is no longer a security hole.
Management applications can pre-create the copy with a relative backing file name, and use the VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT flag to have qemu reuse the metadata; if the management application also copies the backing files to a new location, this can be used to perform live storage migration of an entire backing chain.
This all sounds quite complicated but you explained it well in the documentation block for virDomainBlockRebase. When reading the above, I thought an example showing all steps needed to copy a disk image would be useful but reading virDomainBlockRebase documentation clarified it enough that I don't think the example is required :-)
I still ought to add another patch somewhere in the html documentation that gives a good summary of it all, especially since it is a multi-API-call sequence. But that can be a later patch.
This all looks good and I think sticking this functionality into virDomainBlockRebase makes sense. Since the qemu interface is not set in stone yet, I won't formally ack any of these block copy patches :-)
Fair enough. I need to rebase and post v5 anyways, so I'll apply tweaks from your v4 review at that time. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Mon, Apr 09, 2012 at 21:52:14 -0600, Eric Blake wrote:
This patch introduces a new block job, useful for live storage migration using pre-copy streaming.
Using a live VM with the backing chain: base <- snap1 <- snap2 as the starting point, we have:
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY) creates /path/to/copy with the same format as snap2, with no backing file, so entire chain is copied and flattened
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) creates /path/to/copy as a raw file, so entire chain is copied and flattened
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_SHALLOW) creates /path/to/copy with the same format as snap2, but with snap1 as a backing file, so only snap2 is copied.
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT) reuse existing /path/to/copy (must have empty contents, and format is probed from the metadata), and copy the full chain
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT| VIR_DOMAIN_BLOCK_REBASE_SHALLOW) reuse existing /path/to/copy (contents must be identical to snap1, and format is probed from the metadata), and copy only the contents of snap2
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT| VIR_DOMAIN_BLOCK_REBASE_SHALLOW|VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) reuse existing /path/to/copy (must be raw volume with contents identical to snap1), and copy only the contents of snap2
Less useful combinations:
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_SHALLOW| VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) fail if source is not raw, otherwise create /path/to/copy as raw and the single file is copied (no chain involved)
- virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT| VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) makes little sense: the destination must be raw but have no contents, meaning that it is an empty file, so there is nothing to reuse
The other three flags are rejected without VIR_DOMAIN_BLOCK_COPY.
It would be nice if we could issue an event when pivoting from phase 1 to phase 2, but qemu hasn't implemented that, and we would have to poll in order to synthesize it ourselves. Meanwhile, qemu will give us a distinct job info and completion event when we either cancel or pivot to end the job. Pivoting is accomplished via the new:
virDomainBlockJobAbort(dom, disk, VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)
Management applications can pre-create the copy with a relative backing file name, and use the VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT flag to have qemu reuse the metadata; if the management application also copies the backing files to a new location, this can be used to perform live storage migration of an entire backing chain.
* include/libvirt/libvirt.h.in (VIR_DOMAIN_BLOCK_JOB_TYPE_COPY): New block job type. (virDomainBlockJobAbortFlags, virDomainBlockRebaseFlags): New enums. * src/libvirt.c (virDomainBlockRebase): Document the new flags, and implement general restrictions on flag combinations. (virDomainBlockJobAbort): Document the new flag. (virDomainSaveFlags, virDomainSnapshotCreateXML) (virDomainRevertToSnapshot, virDomainDetachDeviceFlags): Document restrictions. * include/libvirt/virterror.h (VIR_ERR_BLOCK_COPY_ACTIVE): New error. * src/util/virterror.c (virErrorMsg): Define it. --- include/libvirt/libvirt.h.in | 24 ++++++++++- include/libvirt/virterror.h | 1 + src/libvirt.c | 94 ++++++++++++++++++++++++++++++++++++++---- src/util/virterror.c | 6 +++ 4 files changed, 115 insertions(+), 10 deletions(-)
diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index 97ad99d..ac5df95 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -1934,12 +1934,15 @@ int virDomainUpdateDeviceFlags(virDomainPtr domain, /** * virDomainBlockJobType: * - * VIR_DOMAIN_BLOCK_JOB_TYPE_PULL: Block Pull (virDomainBlockPull or - * virDomainBlockRebase) + * VIR_DOMAIN_BLOCK_JOB_TYPE_PULL: Block Pull (virDomainBlockPull, or + * virDomainBlockRebase without flags), job ends on completion + * VIR_DOMAIN_BLOCK_JOB_TYPE_COPY: Block Copy (virDomainBlockRebase with + * flags), job exists as long as mirroring is active */ typedef enum { VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN = 0, VIR_DOMAIN_BLOCK_JOB_TYPE_PULL = 1, + VIR_DOMAIN_BLOCK_JOB_TYPE_COPY = 2,
#ifdef VIR_ENUM_SENTINELS VIR_DOMAIN_BLOCK_JOB_TYPE_LAST @@ -1950,9 +1953,11 @@ typedef enum { * virDomainBlockJobAbortFlags: * * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC: Request only, do not wait for completion + * VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT: Pivot to mirror when ending a copy job */ typedef enum { VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC = 1 << 0, + VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT = 1 << 1, } virDomainBlockJobAbortFlags;
/* An iterator for monitoring block job operations */ @@ -1983,6 +1988,21 @@ int virDomainBlockJobSetSpeed(virDomainPtr dom, const char *disk,
int virDomainBlockPull(virDomainPtr dom, const char *disk, unsigned long bandwidth, unsigned int flags); + +/** + * virDomainBlockRebaseFlags: + * + * Flags available for virDomainBlockRebase(). + */ +typedef enum { + VIR_DOMAIN_BLOCK_REBASE_SHALLOW = 1 << 0, /* Limit copy to top of source + backing chain */ + VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT = 1 << 1, /* Reuse existing external + file for a copy */ + VIR_DOMAIN_BLOCK_REBASE_COPY_RAW = 1 << 2, /* Make destination file raw */ + VIR_DOMAIN_BLOCK_REBASE_COPY = 1 << 3, /* Start a copy job */ +} virDomainBlockRebaseFlags; + int virDomainBlockRebase(virDomainPtr dom, const char *disk, const char *base, unsigned long bandwidth, unsigned int flags); diff --git a/include/libvirt/virterror.h b/include/libvirt/virterror.h index e04d29e..070fdb5 100644 --- a/include/libvirt/virterror.h +++ b/include/libvirt/virterror.h @@ -249,6 +249,7 @@ typedef enum { VIR_ERR_NO_DOMAIN_METADATA = 80, /* The metadata is not present */ VIR_ERR_MIGRATE_UNSAFE = 81, /* Migration is not safe */ VIR_ERR_OVERFLOW = 82, /* integer overflow */ + VIR_ERR_BLOCK_COPY_ACTIVE = 83, /* action prevented by block copy job */ } virErrorNumber;
/** diff --git a/src/libvirt.c b/src/libvirt.c index d44335a..753a2e0 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -2696,6 +2696,10 @@ error: * A save file can be inspected or modified slightly with * virDomainSaveImageGetXMLDesc() and virDomainSaveImageDefineXML(). * + * Some hypervisors may prevent this operation if there is a current + * block copy operation; in that case, use virDomainBlockJobAbort() + * to stop the block copy first. + * * Returns 0 in case of success and -1 in case of failure. */ int @@ -7891,6 +7895,11 @@ error: * virDomainUndefine(). A previous definition for this domain would be * overriden if it already exists. * + * Some hypervisors may prevent this operation if there is a current + * block copy operation on a transient domain with the same id as the + * domain being defined; in that case, use virDomainBlockJobAbort() to + * stop the block copy first. + * * Returns NULL in case of error, a pointer to the domain otherwise */ virDomainPtr @@ -9424,6 +9433,10 @@ error: * return failure if LIVE is specified but it only supports removing the * persisted device allocation. * + * Some hypervisors may prevent this operation if there is a current + * block copy operation on the device being detached; in that case, + * use virDomainBlockJobAbort() to stop the block copy first. + * * Returns 0 in case of success, -1 in case of failure. */ int @@ -17124,6 +17137,10 @@ virDomainSnapshotGetConnect(virDomainSnapshotPtr snapshot) * that it is still possible to fail after disks have changed, but only * in the much rarer cases of running out of memory or disk space). * + * Some hypervisors may prevent this operation if there is a current + * block copy operation; in that case, use virDomainBlockJobAbort() + * to stop the block copy first. + * * Returns an (opaque) virDomainSnapshotPtr on success, NULL on failure. */ virDomainSnapshotPtr @@ -17913,7 +17930,8 @@ error: * can be found by calling virDomainGetXMLDesc() and inspecting * elements within //domain/devices/disk. * - * By default, this function performs a synchronous operation and the caller + * If the current block job for @disk is VIR_DOMAIN_BLOCK_JOB_TYPE_PULL, then + * by default, this function performs a synchronous operation and the caller * may assume that the operation has completed when 0 is returned. However, * BlockJob operations may take a long time to complete, and during this time * further domain interactions may be unresponsive. To avoid this problem, @@ -17925,6 +17943,15 @@ error: * used); but since events can be missed, it is also possible to use * virDomainBlockJobInfo() to poll if the job is still running. * + * If the current block job for @disk is VIR_DOMAIN_BLOCK_JOB_TYPE_COPY, then + * the default is to abort the mirroring and revert to the source disk; + * adding @flags of VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT causes this call to + * fail with VIR_ERR_BLOCK_COPY_ACTIVE if the copy is not fully populated, + * otherwise it will swap the disk over to the copy to end the mirroring. An + * event will be issued when the job is ended, and it is possible to use + * VIR_DOMAIN_BLOCK_JOB_ABORT_SYNC to control whether this command waits + * for the completion of the job. + *
Oops, s/ABORT_SYNC/ABORT_ASYNC/ I notice that when reading the virsh patch where you have --async while I remembered this ABORT_SYNC flag from here, which was of course wrong since the enum contains ASYNC. Jirka

Rather than further overloading 'blockpull', I decided to create a new virsh command to expose the new flags of virDomainBlockRebase. Someday, I'd also like to make blockpull and blockcopy have a synchronous mode, which blocks until the event happens or Ctrl-C is pressed, as well as a --verbose flag to print status updates before the job finishes - but not today. * tools/virsh.c (VSH_CMD_BLOCK_JOB_COPY): New mode. (blockJobImpl): Support new flags. (cmdBlockCopy): New command. (cmdBlockJob): Support new job info, new abort flag. * tools/virsh.pod (blockcopy, blockjob): Document the new command and flags. --- tools/virsh.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++------- tools/virsh.pod | 36 +++++++++++++++++++++++-- 2 files changed, 101 insertions(+), 13 deletions(-) diff --git a/tools/virsh.c b/tools/virsh.c index 084d533..25403f5 100644 --- a/tools/virsh.c +++ b/tools/virsh.c @@ -7515,16 +7515,18 @@ typedef enum { VSH_CMD_BLOCK_JOB_INFO = 1, VSH_CMD_BLOCK_JOB_SPEED = 2, VSH_CMD_BLOCK_JOB_PULL = 3, -} VSH_CMD_BLOCK_JOB_MODE; + VSH_CMD_BLOCK_JOB_COPY = 4, +} vshCmdBlockJobMode; static int blockJobImpl(vshControl *ctl, const vshCmd *cmd, - virDomainBlockJobInfoPtr info, int mode) + virDomainBlockJobInfoPtr info, int mode) { virDomainPtr dom = NULL; const char *name, *path; unsigned long bandwidth = 0; int ret = -1; + const char *base = NULL; unsigned int flags = 0; if (!vshConnectionUsability(ctl, ctl->conn)) @@ -7541,22 +7543,39 @@ blockJobImpl(vshControl *ctl, const vshCmd *cmd, goto cleanup; } - if (mode == VSH_CMD_BLOCK_JOB_ABORT) { + switch ((vshCmdBlockJobMode) mode) { + case VSH_CMD_BLOCK_JOB_ABORT: if (vshCommandOptBool(cmd, "async")) flags |= VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC; + if (vshCommandOptBool(cmd, "pivot")) + flags |= VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT; ret = virDomainBlockJobAbort(dom, path, flags); - } else if (mode == VSH_CMD_BLOCK_JOB_INFO) { + break; + case VSH_CMD_BLOCK_JOB_INFO: ret = virDomainGetBlockJobInfo(dom, path, info, 0); - } else if (mode == VSH_CMD_BLOCK_JOB_SPEED) { + break; + case VSH_CMD_BLOCK_JOB_SPEED: ret = virDomainBlockJobSetSpeed(dom, path, bandwidth, 0); - } else if (mode == VSH_CMD_BLOCK_JOB_PULL) { - const char *base = NULL; + break; + case VSH_CMD_BLOCK_JOB_PULL: if (vshCommandOptString(cmd, "base", &base) < 0) goto cleanup; if (base) ret = virDomainBlockRebase(dom, path, base, bandwidth, 0); else ret = virDomainBlockPull(dom, path, bandwidth, 0); + break; + case VSH_CMD_BLOCK_JOB_COPY: + flags |= VIR_DOMAIN_BLOCK_REBASE_COPY; + if (vshCommandOptBool(cmd, "shallow")) + flags |= VIR_DOMAIN_BLOCK_REBASE_SHALLOW; + if (vshCommandOptBool(cmd, "reuse-external")) + flags |= VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT; + if (vshCommandOptBool(cmd, "raw")) + flags |= VIR_DOMAIN_BLOCK_REBASE_COPY_RAW; + if (vshCommandOptString(cmd, "dest", &base) < 0) + goto cleanup; + ret = virDomainBlockRebase(dom, path, base, bandwidth, flags); } cleanup: @@ -7566,6 +7585,34 @@ cleanup: } /* + * "blockcopy" command + */ +static const vshCmdInfo info_block_copy[] = { + {"help", N_("Start a block copy operation.")}, + {"desc", N_("Populate a disk from its backing image.")}, + {NULL, NULL} +}; + +static const vshCmdOptDef opts_block_copy[] = { + {"domain", VSH_OT_DATA, VSH_OFLAG_REQ, N_("domain name, id or uuid")}, + {"path", VSH_OT_DATA, VSH_OFLAG_REQ, N_("Fully-qualified path of disk")}, + {"dest", VSH_OT_DATA, VSH_OFLAG_REQ, N_("path of the copy to create")}, + {"bandwidth", VSH_OT_DATA, VSH_OFLAG_NONE, N_("Bandwidth limit in MB/s")}, + {"shallow", VSH_OT_BOOL, 0, N_("make the copy share a backing chain")}, + {"reuse-external", VSH_OT_BOOL, 0, N_("reuse existing destination")}, + {"raw", VSH_OT_BOOL, 0, N_("use raw destination file")}, + {NULL, 0, 0, NULL} +}; + +static bool +cmdBlockCopy(vshControl *ctl, const vshCmd *cmd) +{ + if (blockJobImpl(ctl, cmd, NULL, VSH_CMD_BLOCK_JOB_COPY) != 0) + return false; + return true; +} + +/* * "blockpull" command */ static const vshCmdInfo info_block_pull[] = { @@ -7607,6 +7654,8 @@ static const vshCmdOptDef opts_block_job[] = { N_("Abort the active job on the specified disk")}, {"async", VSH_OT_BOOL, VSH_OFLAG_NONE, N_("don't wait for --abort to complete")}, + {"pivot", VSH_OT_BOOL, VSH_OFLAG_NONE, + N_("conclude and pivot a copy job")}, {"info", VSH_OT_BOOL, VSH_OFLAG_NONE, N_("Get active job information for the specified disk")}, {"bandwidth", VSH_OT_DATA, VSH_OFLAG_NONE, @@ -7622,7 +7671,8 @@ cmdBlockJob(vshControl *ctl, const vshCmd *cmd) const char *type; int ret; bool abortMode = (vshCommandOptBool(cmd, "abort") || - vshCommandOptBool(cmd, "async")); + vshCommandOptBool(cmd, "async") || + vshCommandOptBool(cmd, "pivot")); bool infoMode = vshCommandOptBool(cmd, "info"); bool bandwidth = vshCommandOptBool(cmd, "bandwidth"); @@ -7646,10 +7696,17 @@ cmdBlockJob(vshControl *ctl, const vshCmd *cmd) if (ret == 0 || mode != VSH_CMD_BLOCK_JOB_INFO) return true; - if (info.type == VIR_DOMAIN_BLOCK_JOB_TYPE_PULL) + switch (info.type) { + case VIR_DOMAIN_BLOCK_JOB_TYPE_PULL: type = _("Block Pull"); - else + break; + case VIR_DOMAIN_BLOCK_JOB_TYPE_COPY: + type = _("Block Copy"); + break; + default: type = _("Unknown job"); + break; + } print_job_progress(type, info.end - info.cur, info.end); if (info.bandwidth != 0) @@ -17128,6 +17185,7 @@ static const vshCmdDef domManagementCmds[] = { {"autostart", cmdAutostart, opts_autostart, info_autostart, 0}, {"blkdeviotune", cmdBlkdeviotune, opts_blkdeviotune, info_blkdeviotune, 0}, {"blkiotune", cmdBlkiotune, opts_blkiotune, info_blkiotune, 0}, + {"blockcopy", cmdBlockCopy, opts_block_copy, info_block_copy, 0}, {"blockjob", cmdBlockJob, opts_block_job, info_block_job, 0}, {"blockpull", cmdBlockPull, opts_block_pull, info_block_pull, 0}, {"blockresize", cmdBlockResize, opts_block_resize, info_block_resize, 0}, diff --git a/tools/virsh.pod b/tools/virsh.pod index d6ba035..ee84ce5 100644 --- a/tools/virsh.pod +++ b/tools/virsh.pod @@ -638,6 +638,34 @@ currently in use by a running domain. Other contexts that require a MAC address of virtual interface (such as I<detach-interface> or I<domif-setlink>) will accept the MAC address printed by this command. +=item B<blockcopy> I<domain> I<path> I<dest> [I<bandwidth>] [I<--shallow>] +[I<--reuse-external>] [I<--raw>] + +Copy a disk backing image chain to I<dest>. By default, this command +flattens the entire chain; but if I<--shallow> is specified, the copy +shares the backing chain. + +If I<--reuse-external> is specified, then I<dest> must exist and have +contents identical to the resulting backing file (that is, it must +start with contents matching the backing file I<disk> if I<--shallow> +is used, otherwise it must start empty); this option is typically used +to set up a relative backing file name in the destination. + +The format of the destination is determined by the first match in the +following list: if I<--raw> is specified, it will be raw; if +I<--reuse-external> is specified, the existing destination is probed +for a format; and in all other cases, the destination format will +match the source format. + +The copy runs in the background; initially, the job must copy all data +from the source, and during this phase, the job can only be canceled to +revert back to the source disk. After this phase completes, both the +source and the destination remain mirrored until a call to B<blockjob> +with the I<--abort> and I<--pivot> flags pivots over to the copy. + +I<path> specifies fully-qualified path of the disk. +I<bandwidth> specifies copying bandwidth limit in Mbps. + =item B<blockpull> I<domain> I<path> [I<bandwidth>] [I<base>] Populate a disk from its backing image chain. By default, this command @@ -689,12 +717,12 @@ Both I<--live> and I<--current> flags may be given, but I<--current> is exclusive. If no flag is specified, behavior is different depending on hypervisor. -=item B<blockjob> I<domain> I<path> { [I<--abort>] [I<--async>] | +=item B<blockjob> I<domain> I<path> { [I<--abort>] [I<--async>] [I<--pivot>] | [I<--info>] | [I<bandwidth>] } Manage active block operations. There are three modes: I<--info>, I<bandwidth>, and I<--abort>; I<--info> is default except that I<--async> -implies I<--abort>. +or I<--pivot> implies I<--abort>. I<path> specifies fully-qualified path of the disk; it corresponds to a unique target name (<target dev='name'/>) or source file (<source @@ -703,7 +731,9 @@ also B<domblklist> for listing these names). If I<--abort> is specified, the active job on the specified disk will be aborted. If I<--async> is also specified, this command will return -immediately, rather than waiting for the cancelation to complete. +immediately, rather than waiting for the cancelation to complete. If +I<--pivot> is specified, this requests that an active copy job +be pivoted over to the new copy. If I<--info> is specified, the active job information on the specified disk will be printed. I<bandwidth> can be used to set bandwidth limit for the active job. -- 1.7.7.6

On Mon, Apr 09, 2012 at 21:52:15 -0600, Eric Blake wrote:
Rather than further overloading 'blockpull', I decided to create a new virsh command to expose the new flags of virDomainBlockRebase.
Someday, I'd also like to make blockpull and blockcopy have a synchronous mode, which blocks until the event happens or Ctrl-C is pressed, as well as a --verbose flag to print status updates before the job finishes - but not today.
* tools/virsh.c (VSH_CMD_BLOCK_JOB_COPY): New mode. (blockJobImpl): Support new flags. (cmdBlockCopy): New command. (cmdBlockJob): Support new job info, new abort flag. * tools/virsh.pod (blockcopy, blockjob): Document the new command and flags. --- tools/virsh.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++------- tools/virsh.pod | 36 +++++++++++++++++++++++-- 2 files changed, 101 insertions(+), 13 deletions(-)
diff --git a/tools/virsh.c b/tools/virsh.c index 084d533..25403f5 100644 --- a/tools/virsh.c +++ b/tools/virsh.c @@ -7515,16 +7515,18 @@ typedef enum { VSH_CMD_BLOCK_JOB_INFO = 1, VSH_CMD_BLOCK_JOB_SPEED = 2, VSH_CMD_BLOCK_JOB_PULL = 3, -} VSH_CMD_BLOCK_JOB_MODE; + VSH_CMD_BLOCK_JOB_COPY = 4, +} vshCmdBlockJobMode;
static int blockJobImpl(vshControl *ctl, const vshCmd *cmd, - virDomainBlockJobInfoPtr info, int mode) + virDomainBlockJobInfoPtr info, int mode) { virDomainPtr dom = NULL; const char *name, *path; unsigned long bandwidth = 0; int ret = -1; + const char *base = NULL; unsigned int flags = 0;
if (!vshConnectionUsability(ctl, ctl->conn)) @@ -7541,22 +7543,39 @@ blockJobImpl(vshControl *ctl, const vshCmd *cmd, goto cleanup; }
- if (mode == VSH_CMD_BLOCK_JOB_ABORT) { + switch ((vshCmdBlockJobMode) mode) { + case VSH_CMD_BLOCK_JOB_ABORT: if (vshCommandOptBool(cmd, "async")) flags |= VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC; + if (vshCommandOptBool(cmd, "pivot")) + flags |= VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT; ret = virDomainBlockJobAbort(dom, path, flags); - } else if (mode == VSH_CMD_BLOCK_JOB_INFO) { + break; + case VSH_CMD_BLOCK_JOB_INFO: ret = virDomainGetBlockJobInfo(dom, path, info, 0); - } else if (mode == VSH_CMD_BLOCK_JOB_SPEED) { + break; + case VSH_CMD_BLOCK_JOB_SPEED: ret = virDomainBlockJobSetSpeed(dom, path, bandwidth, 0); - } else if (mode == VSH_CMD_BLOCK_JOB_PULL) { - const char *base = NULL; + break; + case VSH_CMD_BLOCK_JOB_PULL: if (vshCommandOptString(cmd, "base", &base) < 0) goto cleanup; if (base) ret = virDomainBlockRebase(dom, path, base, bandwidth, 0); else ret = virDomainBlockPull(dom, path, bandwidth, 0); + break; + case VSH_CMD_BLOCK_JOB_COPY: + flags |= VIR_DOMAIN_BLOCK_REBASE_COPY; + if (vshCommandOptBool(cmd, "shallow")) + flags |= VIR_DOMAIN_BLOCK_REBASE_SHALLOW; + if (vshCommandOptBool(cmd, "reuse-external")) + flags |= VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT; + if (vshCommandOptBool(cmd, "raw")) + flags |= VIR_DOMAIN_BLOCK_REBASE_COPY_RAW; + if (vshCommandOptString(cmd, "dest", &base) < 0) + goto cleanup; + ret = virDomainBlockRebase(dom, path, base, bandwidth, flags); }
cleanup: @@ -7566,6 +7585,34 @@ cleanup: }
/* + * "blockcopy" command + */ +static const vshCmdInfo info_block_copy[] = { + {"help", N_("Start a block copy operation.")}, + {"desc", N_("Populate a disk from its backing image.")}, + {NULL, NULL} +}; + +static const vshCmdOptDef opts_block_copy[] = { + {"domain", VSH_OT_DATA, VSH_OFLAG_REQ, N_("domain name, id or uuid")}, + {"path", VSH_OT_DATA, VSH_OFLAG_REQ, N_("Fully-qualified path of disk")}, + {"dest", VSH_OT_DATA, VSH_OFLAG_REQ, N_("path of the copy to create")}, + {"bandwidth", VSH_OT_DATA, VSH_OFLAG_NONE, N_("Bandwidth limit in MB/s")}, + {"shallow", VSH_OT_BOOL, 0, N_("make the copy share a backing chain")}, + {"reuse-external", VSH_OT_BOOL, 0, N_("reuse existing destination")}, + {"raw", VSH_OT_BOOL, 0, N_("use raw destination file")}, + {NULL, 0, 0, NULL} +};
You are pretty inconsistent in upper/lower-case letter at the beginning of each description string.
+ +static bool +cmdBlockCopy(vshControl *ctl, const vshCmd *cmd) +{ + if (blockJobImpl(ctl, cmd, NULL, VSH_CMD_BLOCK_JOB_COPY) != 0) + return false; + return true; +} + +/* * "blockpull" command */ static const vshCmdInfo info_block_pull[] = { @@ -7607,6 +7654,8 @@ static const vshCmdOptDef opts_block_job[] = { N_("Abort the active job on the specified disk")}, {"async", VSH_OT_BOOL, VSH_OFLAG_NONE, N_("don't wait for --abort to complete")}, + {"pivot", VSH_OT_BOOL, VSH_OFLAG_NONE, + N_("conclude and pivot a copy job")}, {"info", VSH_OT_BOOL, VSH_OFLAG_NONE, N_("Get active job information for the specified disk")}, {"bandwidth", VSH_OT_DATA, VSH_OFLAG_NONE,
This has the same kind of inconsistency but you are not making it any worse :-) The rest looks good. Jirka

On 04/13/2012 03:55 AM, Jiri Denemark wrote:
On Mon, Apr 09, 2012 at 21:52:15 -0600, Eric Blake wrote:
Rather than further overloading 'blockpull', I decided to create a new virsh command to expose the new flags of virDomainBlockRebase.
Someday, I'd also like to make blockpull and blockcopy have a synchronous mode, which blocks until the event happens or Ctrl-C is pressed, as well as a --verbose flag to print status updates before the job finishes - but not today.
Actually, I ended up doing this for v5 :)
+ +static const vshCmdOptDef opts_block_copy[] = { + {"domain", VSH_OT_DATA, VSH_OFLAG_REQ, N_("domain name, id or uuid")}, + {"path", VSH_OT_DATA, VSH_OFLAG_REQ, N_("Fully-qualified path of disk")}, + {"dest", VSH_OT_DATA, VSH_OFLAG_REQ, N_("path of the copy to create")}, + {"bandwidth", VSH_OT_DATA, VSH_OFLAG_NONE, N_("Bandwidth limit in MB/s")}, + {"shallow", VSH_OT_BOOL, 0, N_("make the copy share a backing chain")}, + {"reuse-external", VSH_OT_BOOL, 0, N_("reuse existing destination")}, + {"raw", VSH_OT_BOOL, 0, N_("use raw destination file")}, + {NULL, 0, 0, NULL} +};
You are pretty inconsistent in upper/lower-case letter at the beginning of each description string.
Copy-and-paste strikes again. I think we've been favoring lower-case elsewhere, so I'll make it consistent on the respin. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

QUESTION: should we parse and ignore <mirror> on input, rather than rejecting it? By rejecting it, I can't add a unit test, since the unit test framework currently doesn't expose a way to trigger internal parsing. In order to track a block copy job across libvirtd restarts, we need to save internal XML that tracks the name of the file holding the mirror. Displaying this name in dumpxml might also be useful to the user, even if we don't yet have a way to (re-) start a domain with mirroring enabled up front. This is done with a new <mirror> sub-element to <disk>, as in: <disk type='file' device='disk'> <driver name='qemu' type='raw'/> <source file='/var/lib/libvirt/images/original.img'/> <mirror file='/var/lib/libvirt/images/copy.img' format='qcow2'/> ... </disk> * docs/schemas/domaincommon.rng (diskspec): Add diskMirror. * docs/formatdomain.html.in (elementsDisks): Document it. * src/conf/domain_conf.h (_virDomainDiskDef): New members. * src/conf/domain_conf.c (virDomainDiskDefFree): Clean them. (virDomainDiskDefParseXML): Parse them, but only internally. (virDomainDiskDefFormat): Output them. --- docs/formatdomain.html.in | 11 ++++++++++ docs/schemas/domaincommon.rng | 19 +++++++++++++++-- src/conf/domain_conf.c | 42 +++++++++++++++++++++++++++++++++++++++++ src/conf/domain_conf.h | 5 ++++ 4 files changed, 74 insertions(+), 3 deletions(-) diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index a382d30..534c44b 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -1296,6 +1296,17 @@ </table> <span class="since">Since 0.9.7</span> </dd> + <dt><code>mirror</code></dt> + <dd> + This element is present if the hypervisor has started a block + copy operation (via the <code>virDomainBlockCopy</code> API), + where the mirror location in attribute <code>file</code> will + eventually have the same contents as the source, and with the + file format in attribute <code>format</code> (which might + differ from the format of the source). For now, this element + only valid in output; it is rejected on + input. <span class="since">Since 0.9.12</span> + </dd> <dt><code>target</code></dt> <dd>The <code>target</code> element controls the bus / device under which the disk is exposed to the guest diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 0cc04af..66c91a2 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -772,6 +772,9 @@ <ref name="driver"/> </optional> <optional> + <ref name='diskMirror'/> + </optional> + <optional> <ref name="diskAuth"/> </optional> <ref name="target"/> @@ -1013,9 +1016,7 @@ </element> </define> <!-- - Disk may use a special driver for access. Currently this is - only defined for Xen for tap/aio and file, but will certainly be - extended in the future, and libvirt doesn't look for specific values. + Disk may use a special driver for access. --> <define name="driver"> <element name="driver"> @@ -3024,6 +3025,18 @@ <empty/> </element> </define> + <define name='diskMirror'> + <element name='mirror'> + <attribute name='file'> + <ref name='absFilePath'/> + </attribute> + <optional> + <attribute name='format'> + <ref name="genericName"/> + </attribute> + </optional> + </element> + </define> <define name="diskAuth"> <element name="auth"> <attribute name="username"> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index c6b97e1..8899653 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -933,6 +933,8 @@ void virDomainDiskDefFree(virDomainDiskDefPtr def) VIR_FREE(def->dst); VIR_FREE(def->driverName); VIR_FREE(def->driverType); + VIR_FREE(def->mirror); + VIR_FREE(def->mirrorFormat); VIR_FREE(def->auth.username); if (def->auth.secretType == VIR_DOMAIN_DISK_SECRET_TYPE_USAGE) VIR_FREE(def->auth.secret.usage); @@ -3318,6 +3320,8 @@ virDomainDiskDefParseXML(virCapsPtr caps, char *ioeventfd = NULL; char *event_idx = NULL; char *copy_on_read = NULL; + char *mirror = NULL; + char *mirrorFormat = NULL; char *devaddr = NULL; virStorageEncryptionPtr encryption = NULL; char *serial = NULL; @@ -3453,6 +3457,22 @@ virDomainDiskDefParseXML(virCapsPtr caps, ioeventfd = virXMLPropString(cur, "ioeventfd"); event_idx = virXMLPropString(cur, "event_idx"); copy_on_read = virXMLPropString(cur, "copy_on_read"); + } else if ((mirror == NULL) && + (xmlStrEqual(cur->name, BAD_CAST "mirror"))) { + if (flags & VIR_DOMAIN_XML_INTERNAL_STATUS) { + mirror = virXMLPropString(cur, "file"); + if (!mirror) { + virDomainReportError(VIR_ERR_XML_ERROR, "%s", + _("mirror requires file name")); + goto error; + } + mirrorFormat = virXMLPropString(cur, "format"); + } else { + virDomainReportError(VIR_ERR_XML_ERROR, "%s", + _("Cannot handle disk mirror on " + "input yet")); + goto error; + } } else if (xmlStrEqual(cur->name, BAD_CAST "auth")) { authUsername = virXMLPropString(cur, "username"); if (authUsername == NULL) { @@ -3867,6 +3887,10 @@ virDomainDiskDefParseXML(virCapsPtr caps, driverName = NULL; def->driverType = driverType; driverType = NULL; + def->mirror = mirror; + mirror = NULL; + def->mirrorFormat = mirrorFormat; + mirrorFormat = NULL; def->encryption = encryption; encryption = NULL; def->serial = serial; @@ -3882,6 +3906,12 @@ virDomainDiskDefParseXML(virCapsPtr caps, !(def->driverName = strdup(caps->defaultDiskDriverName))) goto no_memory; + + if (def->mirror && !def->mirrorFormat && + caps->defaultDiskDriverType && + !(def->mirrorFormat = strdup(caps->defaultDiskDriverType))) + goto no_memory; + if (def->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE && virDomainDiskDefAssignAddress(caps, def) < 0) goto error; @@ -3907,6 +3937,8 @@ cleanup: VIR_FREE(authUsage); VIR_FREE(driverType); VIR_FREE(driverName); + VIR_FREE(mirror); + VIR_FREE(mirrorFormat); VIR_FREE(cachetag); VIR_FREE(error_policy); VIR_FREE(rerror_policy); @@ -10829,6 +10861,16 @@ virDomainDiskDefFormat(virBufferPtr buf, } } + /* For now, mirroring is currently output-only: we always output + * it, but refuse to parse it on input except for internal parse + * on libvirtd restart. */ + if (def->mirror) { + virBufferEscapeString(buf, " <mirror file='%s'", def->mirror); + if (def->mirrorFormat) + virBufferAsprintf(buf, " format='%s'", def->mirrorFormat); + virBufferAddLit(buf, "/>\n"); + } + virBufferAsprintf(buf, " <target dev='%s' bus='%s'", def->dst, bus); if ((def->device == VIR_DOMAIN_DISK_DEVICE_FLOPPY || diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 0eed60e..abc953d 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -563,6 +563,10 @@ struct _virDomainDiskDef { char *driverName; char *driverType; + char *mirror; + char *mirrorFormat; + bool mirroring; + virDomainBlockIoTuneInfo blkdeviotune; char *serial; @@ -2125,6 +2129,7 @@ VIR_ENUM_DECL(virDomainDiskIo) VIR_ENUM_DECL(virDomainDiskSecretType) VIR_ENUM_DECL(virDomainDiskSnapshot) VIR_ENUM_DECL(virDomainDiskTray) +VIR_ENUM_DECL(virDomainDiskMirrorStage) VIR_ENUM_DECL(virDomainIoEventFd) VIR_ENUM_DECL(virDomainVirtioEventIdx) VIR_ENUM_DECL(virDomainDiskCopyOnRead) -- 1.7.7.6

On Mon, Apr 09, 2012 at 21:52:16 -0600, Eric Blake wrote:
QUESTION: should we parse and ignore <mirror> on input, rather than rejecting it? By rejecting it, I can't add a unit test, since the unit test framework currently doesn't expose a way to trigger internal parsing.
In order to track a block copy job across libvirtd restarts, we need to save internal XML that tracks the name of the file holding the mirror. Displaying this name in dumpxml might also be useful to the user, even if we don't yet have a way to (re-) start a domain with mirroring enabled up front. This is done with a new <mirror> sub-element to <disk>, as in:
<disk type='file' device='disk'> <driver name='qemu' type='raw'/> <source file='/var/lib/libvirt/images/original.img'/> <mirror file='/var/lib/libvirt/images/copy.img' format='qcow2'/> ... </disk>
However, this would mean, the XML from virDomainGetXMLDesc() cannot be directly used for defining new domain. I think there are two possible ways to go. Either output <mirror> the way you did it and ignore it on input (similarly to what we do with aliases or SELinux labels) or put the mirror element out of domain XML and put it into state XML only.
* docs/schemas/domaincommon.rng (diskspec): Add diskMirror. * docs/formatdomain.html.in (elementsDisks): Document it. * src/conf/domain_conf.h (_virDomainDiskDef): New members. * src/conf/domain_conf.c (virDomainDiskDefFree): Clean them. (virDomainDiskDefParseXML): Parse them, but only internally. (virDomainDiskDefFormat): Output them. --- docs/formatdomain.html.in | 11 ++++++++++ docs/schemas/domaincommon.rng | 19 +++++++++++++++-- src/conf/domain_conf.c | 42 +++++++++++++++++++++++++++++++++++++++++ src/conf/domain_conf.h | 5 ++++ 4 files changed, 74 insertions(+), 3 deletions(-)
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index a382d30..534c44b 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -1296,6 +1296,17 @@ </table> <span class="since">Since 0.9.7</span> </dd> + <dt><code>mirror</code></dt> + <dd> + This element is present if the hypervisor has started a block + copy operation (via the <code>virDomainBlockCopy</code> API), + where the mirror location in attribute <code>file</code> will + eventually have the same contents as the source, and with the + file format in attribute <code>format</code> (which might + differ from the format of the source). For now, this element + only valid in output; it is rejected on + input. <span class="since">Since 0.9.12</span> + </dd> <dt><code>target</code></dt> <dd>The <code>target</code> element controls the bus / device under which the disk is exposed to the guest diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 0cc04af..66c91a2 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -772,6 +772,9 @@ <ref name="driver"/> </optional> <optional> + <ref name='diskMirror'/> + </optional> + <optional> <ref name="diskAuth"/> </optional> <ref name="target"/> @@ -1013,9 +1016,7 @@ </element> </define> <!-- - Disk may use a special driver for access. Currently this is - only defined for Xen for tap/aio and file, but will certainly be - extended in the future, and libvirt doesn't look for specific values. + Disk may use a special driver for access. -->
Unrelated but it's not worth a separate patch.
<define name="driver"> <element name="driver"> @@ -3024,6 +3025,18 @@ <empty/> </element> </define> + <define name='diskMirror'> + <element name='mirror'> + <attribute name='file'> + <ref name='absFilePath'/> + </attribute> + <optional> + <attribute name='format'> + <ref name="genericName"/> + </attribute> + </optional> + </element> + </define> <define name="diskAuth"> <element name="auth"> <attribute name="username"> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index c6b97e1..8899653 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -933,6 +933,8 @@ void virDomainDiskDefFree(virDomainDiskDefPtr def) VIR_FREE(def->dst); VIR_FREE(def->driverName); VIR_FREE(def->driverType); + VIR_FREE(def->mirror); + VIR_FREE(def->mirrorFormat); VIR_FREE(def->auth.username); if (def->auth.secretType == VIR_DOMAIN_DISK_SECRET_TYPE_USAGE) VIR_FREE(def->auth.secret.usage); @@ -3318,6 +3320,8 @@ virDomainDiskDefParseXML(virCapsPtr caps, char *ioeventfd = NULL; char *event_idx = NULL; char *copy_on_read = NULL; + char *mirror = NULL; + char *mirrorFormat = NULL; char *devaddr = NULL; virStorageEncryptionPtr encryption = NULL; char *serial = NULL; @@ -3453,6 +3457,22 @@ virDomainDiskDefParseXML(virCapsPtr caps, ioeventfd = virXMLPropString(cur, "ioeventfd"); event_idx = virXMLPropString(cur, "event_idx"); copy_on_read = virXMLPropString(cur, "copy_on_read"); + } else if ((mirror == NULL) && + (xmlStrEqual(cur->name, BAD_CAST "mirror"))) {
You could have also removed those extra () when copy&pasting the code :-)
+ if (flags & VIR_DOMAIN_XML_INTERNAL_STATUS) { + mirror = virXMLPropString(cur, "file"); + if (!mirror) { + virDomainReportError(VIR_ERR_XML_ERROR, "%s", + _("mirror requires file name")); + goto error; + } + mirrorFormat = virXMLPropString(cur, "format"); + } else { + virDomainReportError(VIR_ERR_XML_ERROR, "%s", + _("Cannot handle disk mirror on " + "input yet")); + goto error; + } } else if (xmlStrEqual(cur->name, BAD_CAST "auth")) { authUsername = virXMLPropString(cur, "username"); if (authUsername == NULL) { @@ -3867,6 +3887,10 @@ virDomainDiskDefParseXML(virCapsPtr caps, driverName = NULL; def->driverType = driverType; driverType = NULL; + def->mirror = mirror; + mirror = NULL; + def->mirrorFormat = mirrorFormat; + mirrorFormat = NULL; def->encryption = encryption; encryption = NULL; def->serial = serial; @@ -3882,6 +3906,12 @@ virDomainDiskDefParseXML(virCapsPtr caps, !(def->driverName = strdup(caps->defaultDiskDriverName))) goto no_memory;
+ + if (def->mirror && !def->mirrorFormat && + caps->defaultDiskDriverType && + !(def->mirrorFormat = strdup(caps->defaultDiskDriverType))) + goto no_memory; + if (def->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE && virDomainDiskDefAssignAddress(caps, def) < 0) goto error; @@ -3907,6 +3937,8 @@ cleanup: VIR_FREE(authUsage); VIR_FREE(driverType); VIR_FREE(driverName); + VIR_FREE(mirror); + VIR_FREE(mirrorFormat); VIR_FREE(cachetag); VIR_FREE(error_policy); VIR_FREE(rerror_policy); @@ -10829,6 +10861,16 @@ virDomainDiskDefFormat(virBufferPtr buf, } }
+ /* For now, mirroring is currently output-only: we always output + * it, but refuse to parse it on input except for internal parse + * on libvirtd restart. */ + if (def->mirror) { + virBufferEscapeString(buf, " <mirror file='%s'", def->mirror); + if (def->mirrorFormat) + virBufferAsprintf(buf, " format='%s'", def->mirrorFormat); + virBufferAddLit(buf, "/>\n"); + } + virBufferAsprintf(buf, " <target dev='%s' bus='%s'", def->dst, bus); if ((def->device == VIR_DOMAIN_DISK_DEVICE_FLOPPY || diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 0eed60e..abc953d 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -563,6 +563,10 @@ struct _virDomainDiskDef { char *driverName; char *driverType;
+ char *mirror; + char *mirrorFormat; + bool mirroring; + virDomainBlockIoTuneInfo blkdeviotune;
char *serial; @@ -2125,6 +2129,7 @@ VIR_ENUM_DECL(virDomainDiskIo) VIR_ENUM_DECL(virDomainDiskSecretType) VIR_ENUM_DECL(virDomainDiskSnapshot) VIR_ENUM_DECL(virDomainDiskTray) +VIR_ENUM_DECL(virDomainDiskMirrorStage) VIR_ENUM_DECL(virDomainIoEventFd) VIR_ENUM_DECL(virDomainVirtioEventIdx) VIR_ENUM_DECL(virDomainDiskCopyOnRead)
This last hunk looks weired. Should it go into another patch perhaps? Jirka

On 04/13/2012 05:23 AM, Jiri Denemark wrote:
On Mon, Apr 09, 2012 at 21:52:16 -0600, Eric Blake wrote:
QUESTION: should we parse and ignore <mirror> on input, rather than rejecting it? By rejecting it, I can't add a unit test, since the unit test framework currently doesn't expose a way to trigger internal parsing.
In order to track a block copy job across libvirtd restarts, we need to save internal XML that tracks the name of the file holding the mirror. Displaying this name in dumpxml might also be useful to the user, even if we don't yet have a way to (re-) start a domain with mirroring enabled up front. This is done with a new <mirror> sub-element to <disk>, as in:
<disk type='file' device='disk'> <driver name='qemu' type='raw'/> <source file='/var/lib/libvirt/images/original.img'/> <mirror file='/var/lib/libvirt/images/copy.img' format='qcow2'/> ... </disk>
However, this would mean, the XML from virDomainGetXMLDesc() cannot be directly used for defining new domain. I think there are two possible ways to go. Either output <mirror> the way you did it and ignore it on input (similarly to what we do with aliases or SELinux labels) or put the mirror element out of domain XML and put it into state XML only.
I like seeing the output (it makes it obvious that a mirror is still running, and what --pivot will switch to). I'll redo this patch to ignore rather than reject <mirror> on input for inactive domains, at which point I can then add an xml2xml round-trip testcase (and remove the QUESTION paragraph from my commit message).
@@ -3453,6 +3457,22 @@ virDomainDiskDefParseXML(virCapsPtr caps, ioeventfd = virXMLPropString(cur, "ioeventfd"); event_idx = virXMLPropString(cur, "event_idx"); copy_on_read = virXMLPropString(cur, "copy_on_read"); + } else if ((mirror == NULL) && + (xmlStrEqual(cur->name, BAD_CAST "mirror"))) {
You could have also removed those extra () when copy&pasting the code :-)
Copy-and-paste strikes again :)
+ char *mirror; + char *mirrorFormat; + bool mirroring; + virDomainBlockIoTuneInfo blkdeviotune;
char *serial; @@ -2125,6 +2129,7 @@ VIR_ENUM_DECL(virDomainDiskIo) VIR_ENUM_DECL(virDomainDiskSecretType) VIR_ENUM_DECL(virDomainDiskSnapshot) VIR_ENUM_DECL(virDomainDiskTray) +VIR_ENUM_DECL(virDomainDiskMirrorStage) VIR_ENUM_DECL(virDomainIoEventFd) VIR_ENUM_DECL(virDomainVirtioEventIdx) VIR_ENUM_DECL(virDomainDiskCopyOnRead)
This last hunk looks weired. Should it go into another patch perhaps?
Good catch. Junk left over from an earlier revision (back when qemu didn't keep the job once in the mirroring stage, so I had to track the job manually via additional xml state). I'll delete it for v5. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

For now, disk migration via block copy job is not implemented. But when we do implement it, we have to deal with the fact that qemu does not provide an easy way to re-start a qemu process with mirroring still intact (it _might_ be possible by using qemu -S then an initial 'drive-mirror' with disk reuse before starting the domain, but that gets hairy). Even something like 'virDomainSave' becomes hairy, if you realize the implications that 'virDomainRestore' would be stuck with recreating the same mirror layout. But if we step back and look at the bigger picture, we realize that the initial client of live storage migration via disk mirroring is oVirt, which always uses transient domains, and that if a transient domain is destroyed while a mirror exists, oVirt can easily restart the storage migration by creating a new domain that visits just the source storage, with no loss in data. We can make life a lot easier by being cowards, and forbidding certain operations on a domain. This patch guarantees that we never get in a state where we would have to restart a domain with a mirroring block copy, by preventing saves, snapshots, hot unplug of a disk in use, and conversion to a persistent domain. * src/conf/domain_conf.h (virDomainHasDiskMirror): New prototype. * src/conf/domain_conf.c (virDomainHasDiskMirror): New function. * src/libvirt_private.syms (domain_conf.h): Export it. * src/qemu/qemu_driver.c (qemuDomainSaveInternal) (qemuDomainSnapshotCreateXML, qemuDomainRevertToSnapshot) (qemuDomainBlockJobImpl, qemudDomainDefine): Prevent dangerous actions while block copy is already in action. * src/qemu/qemu_hotplug.c (qemuDomainDetachDiskDevice): Likewise. --- src/conf/domain_conf.c | 12 ++++++++++++ src/conf/domain_conf.h | 1 + src/libvirt_private.syms | 1 + src/qemu/qemu_driver.c | 31 ++++++++++++++++++++++++++++++- src/qemu/qemu_hotplug.c | 7 +++++++ 5 files changed, 51 insertions(+), 1 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 8899653..3aa6861 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -7183,6 +7183,18 @@ virDomainDiskRemoveByName(virDomainDefPtr def, const char *name) return virDomainDiskRemove(def, i); } +/* Return true if VM has at least one disk involved in a current block + * copy job (that is, with a <mirror> element in the disk xml). */ +bool +virDomainHasDiskMirror(virDomainObjPtr vm) +{ + int i; + for (i = 0; i < vm->def->ndisks; i++) + if (vm->def->disks[i]->mirror) + return true; + return false; +} + int virDomainNetInsert(virDomainDefPtr def, virDomainNetDefPtr net) { if (VIR_REALLOC_N(def->nets, def->nnets + 1) < 0) diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index abc953d..77c501c 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -1954,6 +1954,7 @@ virDomainDiskDefPtr virDomainDiskRemove(virDomainDefPtr def, size_t i); virDomainDiskDefPtr virDomainDiskRemoveByName(virDomainDefPtr def, const char *name); +bool virDomainHasDiskMirror(virDomainObjPtr vm); int virDomainNetIndexByMac(virDomainDefPtr def, const unsigned char *mac); int virDomainNetInsert(virDomainDefPtr def, virDomainNetDefPtr net); diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index a90f8a0..570940d 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -354,6 +354,7 @@ virDomainGraphicsSpiceZlibCompressionTypeFromString; virDomainGraphicsSpiceZlibCompressionTypeToString; virDomainGraphicsTypeFromString; virDomainGraphicsTypeToString; +virDomainHasDiskMirror; virDomainHostdevDefAlloc; virDomainHostdevDefClear; virDomainHostdevDefFree; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 425d340..b0937eb 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -2558,6 +2558,11 @@ qemuDomainSaveInternal(struct qemud_driver *driver, virDomainPtr dom, "%s", _("domain is marked for auto destroy")); goto cleanup; } + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, "%s", + _("domain has active block copy job")); + goto cleanup; + } memset(&header, 0, sizeof(header)); memcpy(header.magic, QEMUD_SAVE_PARTIAL, sizeof(header.magic)); @@ -4947,6 +4952,12 @@ static virDomainPtr qemudDomainDefine(virConnectPtr conn, const char *xml) { goto cleanup; } def = NULL; + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, "%s", + _("domain has active block copy job")); + virDomainObjAssignDef(vm, NULL, false); + goto cleanup; + } vm->persistent = 1; if (virDomainSaveConfig(driver->configDir, @@ -10262,6 +10273,12 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain, "%s", _("domain is marked for auto destroy")); goto cleanup; } + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, "%s", + _("domain has active block copy job")); + goto cleanup; + } + if (!vm->persistent && (flags & VIR_DOMAIN_SNAPSHOT_CREATE_HALT)) { qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", _("cannot halt after transient domain snapshot")); @@ -10869,6 +10886,11 @@ static int qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, _("no domain with matching uuid '%s'"), uuidstr); goto cleanup; } + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, "%s", + _("domain has active block copy job")); + goto cleanup; + } snap = virDomainSnapshotFindByName(&vm->snapshots, snapshot->name); if (!snap) { @@ -11607,6 +11629,7 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, char uuidstr[VIR_UUID_STRING_BUFLEN]; char *device = NULL; int ret = -1; + int idx; qemuDriverLock(driver); virUUIDFormat(dom->uuid, uuidstr); @@ -11617,7 +11640,7 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, goto cleanup; } - device = qemuDiskPathToAlias(vm, path, NULL); + device = qemuDiskPathToAlias(vm, path, &idx); if (!device) { goto cleanup; } @@ -11633,6 +11656,12 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, "this QEMU binary")); goto cleanup; } + if (mode == BLOCK_JOB_PULL && vm->def->disks[idx]->mirror) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, + _("disk '%s' already in active block copy job"), + vm->def->disks[idx]->dst); + goto cleanup; + } if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_MODIFY) < 0) goto cleanup; diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 857b980..98fa8f8 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -1721,6 +1721,13 @@ int qemuDomainDetachDiskDevice(struct qemud_driver *driver, detach = vm->def->disks[i]; + if (detach->mirror) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, + _("disk '%s' is in an active block copy job"), + detach->dst); + goto cleanup; + } + if (qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_DEVICES)) { if (virCgroupForDomain(driver->cgroup, vm->def->name, &cgroup, 0) != 0) { qemuReportError(VIR_ERR_INTERNAL_ERROR, -- 1.7.7.6

On Mon, Apr 09, 2012 at 21:52:17 -0600, Eric Blake wrote:
For now, disk migration via block copy job is not implemented. But when we do implement it, we have to deal with the fact that qemu does not provide an easy way to re-start a qemu process with mirroring still intact (it _might_ be possible by using qemu -S then an initial 'drive-mirror' with disk reuse before starting the domain, but that gets hairy). Even something like 'virDomainSave' becomes hairy, if you realize the implications that 'virDomainRestore' would be stuck with recreating the same mirror layout.
But if we step back and look at the bigger picture, we realize that the initial client of live storage migration via disk mirroring is oVirt, which always uses transient domains, and that if a transient domain is destroyed while a mirror exists, oVirt can easily restart the storage migration by creating a new domain that visits just the source storage, with no loss in data.
We can make life a lot easier by being cowards, and forbidding certain operations on a domain. This patch guarantees that we never get in a state where we would have to restart a domain with a mirroring block copy, by preventing saves, snapshots, hot unplug of a disk in use, and conversion to a persistent domain.
* src/conf/domain_conf.h (virDomainHasDiskMirror): New prototype. * src/conf/domain_conf.c (virDomainHasDiskMirror): New function. * src/libvirt_private.syms (domain_conf.h): Export it. * src/qemu/qemu_driver.c (qemuDomainSaveInternal) (qemuDomainSnapshotCreateXML, qemuDomainRevertToSnapshot) (qemuDomainBlockJobImpl, qemudDomainDefine): Prevent dangerous actions while block copy is already in action. * src/qemu/qemu_hotplug.c (qemuDomainDetachDiskDevice): Likewise. --- src/conf/domain_conf.c | 12 ++++++++++++ src/conf/domain_conf.h | 1 + src/libvirt_private.syms | 1 + src/qemu/qemu_driver.c | 31 ++++++++++++++++++++++++++++++- src/qemu/qemu_hotplug.c | 7 +++++++ 5 files changed, 51 insertions(+), 1 deletions(-)
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 8899653..3aa6861 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -7183,6 +7183,18 @@ virDomainDiskRemoveByName(virDomainDefPtr def, const char *name) return virDomainDiskRemove(def, i); }
+/* Return true if VM has at least one disk involved in a current block + * copy job (that is, with a <mirror> element in the disk xml). */ +bool +virDomainHasDiskMirror(virDomainObjPtr vm) +{ + int i; + for (i = 0; i < vm->def->ndisks; i++) + if (vm->def->disks[i]->mirror) + return true; + return false; +} + int virDomainNetInsert(virDomainDefPtr def, virDomainNetDefPtr net) { if (VIR_REALLOC_N(def->nets, def->nnets + 1) < 0) diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index abc953d..77c501c 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -1954,6 +1954,7 @@ virDomainDiskDefPtr virDomainDiskRemove(virDomainDefPtr def, size_t i); virDomainDiskDefPtr virDomainDiskRemoveByName(virDomainDefPtr def, const char *name); +bool virDomainHasDiskMirror(virDomainObjPtr vm);
int virDomainNetIndexByMac(virDomainDefPtr def, const unsigned char *mac); int virDomainNetInsert(virDomainDefPtr def, virDomainNetDefPtr net); diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index a90f8a0..570940d 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -354,6 +354,7 @@ virDomainGraphicsSpiceZlibCompressionTypeFromString; virDomainGraphicsSpiceZlibCompressionTypeToString; virDomainGraphicsTypeFromString; virDomainGraphicsTypeToString; +virDomainHasDiskMirror; virDomainHostdevDefAlloc; virDomainHostdevDefClear; virDomainHostdevDefFree; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 425d340..b0937eb 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -2558,6 +2558,11 @@ qemuDomainSaveInternal(struct qemud_driver *driver, virDomainPtr dom, "%s", _("domain is marked for auto destroy")); goto cleanup; } + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, "%s", + _("domain has active block copy job")); + goto cleanup; + }
memset(&header, 0, sizeof(header)); memcpy(header.magic, QEMUD_SAVE_PARTIAL, sizeof(header.magic)); @@ -4947,6 +4952,12 @@ static virDomainPtr qemudDomainDefine(virConnectPtr conn, const char *xml) { goto cleanup; } def = NULL; + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, "%s", + _("domain has active block copy job")); + virDomainObjAssignDef(vm, NULL, false); + goto cleanup; + } vm->persistent = 1;
I think it would be better to do this check a bit earlier in the process to avoid the need to undo virDomainObjAssignDef().
if (virDomainSaveConfig(driver->configDir, @@ -10262,6 +10273,12 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain, "%s", _("domain is marked for auto destroy")); goto cleanup; } + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, "%s", + _("domain has active block copy job")); + goto cleanup; + } + if (!vm->persistent && (flags & VIR_DOMAIN_SNAPSHOT_CREATE_HALT)) { qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", _("cannot halt after transient domain snapshot")); @@ -10869,6 +10886,11 @@ static int qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, _("no domain with matching uuid '%s'"), uuidstr); goto cleanup; } + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, "%s", + _("domain has active block copy job")); + goto cleanup; + }
snap = virDomainSnapshotFindByName(&vm->snapshots, snapshot->name); if (!snap) { @@ -11607,6 +11629,7 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, char uuidstr[VIR_UUID_STRING_BUFLEN]; char *device = NULL; int ret = -1; + int idx;
qemuDriverLock(driver); virUUIDFormat(dom->uuid, uuidstr); @@ -11617,7 +11640,7 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, goto cleanup; }
- device = qemuDiskPathToAlias(vm, path, NULL); + device = qemuDiskPathToAlias(vm, path, &idx); if (!device) { goto cleanup; } @@ -11633,6 +11656,12 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, "this QEMU binary")); goto cleanup; } + if (mode == BLOCK_JOB_PULL && vm->def->disks[idx]->mirror) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, + _("disk '%s' already in active block copy job"), + vm->def->disks[idx]->dst); + goto cleanup; + }
if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_MODIFY) < 0) goto cleanup; diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 857b980..98fa8f8 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -1721,6 +1721,13 @@ int qemuDomainDetachDiskDevice(struct qemud_driver *driver,
detach = vm->def->disks[i];
+ if (detach->mirror) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, + _("disk '%s' is in an active block copy job"), + detach->dst); + goto cleanup; + } + if (qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_DEVICES)) { if (virCgroupForDomain(driver->cgroup, vm->def->name, &cgroup, 0) != 0) { qemuReportError(VIR_ERR_INTERNAL_ERROR,
OK Jirka

On 04/13/2012 05:35 AM, Jiri Denemark wrote:
On Mon, Apr 09, 2012 at 21:52:17 -0600, Eric Blake wrote:
For now, disk migration via block copy job is not implemented. But when we do implement it, we have to deal with the fact that qemu does not provide an easy way to re-start a qemu process with mirroring still intact (it _might_ be possible by using qemu -S then an initial 'drive-mirror' with disk reuse before starting the domain, but that gets hairy). Even something like 'virDomainSave' becomes hairy, if you realize the implications that 'virDomainRestore' would be stuck with recreating the same mirror layout.
@@ -4947,6 +4952,12 @@ static virDomainPtr qemudDomainDefine(virConnectPtr conn, const char *xml) { goto cleanup; } def = NULL; + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, "%s", + _("domain has active block copy job")); + virDomainObjAssignDef(vm, NULL, false); + goto cleanup; + } vm->persistent = 1;
I think it would be better to do this check a bit earlier in the process to avoid the need to undo virDomainObjAssignDef().
I see where you are coming from in the limited context shown in this patch (and it even matches my initial thoughts when first trying to write the patch), but look at the bigger picture: if (!(vm = virDomainAssignDef(driver->caps, &driver->domains, def, false))) { goto cleanup; } def = NULL; + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, "%s", + _("domain has active block copy job")); + virDomainObjAssignDef(vm, NULL, false); + goto cleanup; + } vm->persistent = 1; That is, we don't know what vm is, to check if it has a disk mirror, until after we have already associated a potential persistent definition to vm, since virDomainAssignDef has the effect of doing both a lookup and assignment in one function. The only way to avoid assigning the persistent definition would be to do a lookup to see if there is already an existing vm that matches the name/uuid in the def, but that still implies that you have to have def available, and it also implies a change to domain_conf.c to add a new entry point that does a lookup without the assignment, just so we can probe for active mirrors, then follow it by another lookup with assignment. Therefore, I decided that it was easier to keep the existing lookup with assignment and undo the assignment than to add new entry points and redundant lookup work. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Fri, Apr 13, 2012 at 09:28:08 -0600, Eric Blake wrote:
On 04/13/2012 05:35 AM, Jiri Denemark wrote:
On Mon, Apr 09, 2012 at 21:52:17 -0600, Eric Blake wrote:
For now, disk migration via block copy job is not implemented. But when we do implement it, we have to deal with the fact that qemu does not provide an easy way to re-start a qemu process with mirroring still intact (it _might_ be possible by using qemu -S then an initial 'drive-mirror' with disk reuse before starting the domain, but that gets hairy). Even something like 'virDomainSave' becomes hairy, if you realize the implications that 'virDomainRestore' would be stuck with recreating the same mirror layout.
@@ -4947,6 +4952,12 @@ static virDomainPtr qemudDomainDefine(virConnectPtr conn, const char *xml) { goto cleanup; } def = NULL; + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, "%s", + _("domain has active block copy job")); + virDomainObjAssignDef(vm, NULL, false); + goto cleanup; + } vm->persistent = 1;
I think it would be better to do this check a bit earlier in the process to avoid the need to undo virDomainObjAssignDef().
I see where you are coming from in the limited context shown in this patch (and it even matches my initial thoughts when first trying to write the patch), but look at the bigger picture:
if (!(vm = virDomainAssignDef(driver->caps, &driver->domains, def, false))) { goto cleanup; } def = NULL; + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, "%s", + _("domain has active block copy job")); + virDomainObjAssignDef(vm, NULL, false); + goto cleanup; + } vm->persistent = 1;
That is, we don't know what vm is, to check if it has a disk mirror, until after we have already associated a potential persistent definition
Oh, right. Although I was looking at this change in the context of complete qemudDomainDefine(), I somehow missed the fact that virDomainAssignDef() does the lookup. OK, then. Jirka

As mentioned several commits ago when the capability bits were added, the new block copy storage migration sequence requires both the 'drive-mirror' and 'drive-reopen' monitor commands. As of this[1] qemu email, both commands have been proposed but not yet incorporated into the tree; in fact, the implementation I tested with has changed to match this[2] email that suggested a mandatory 'full':'bool' argument rather than 'mode':'no-backing-file'. So there is a risk that qemu 1.1 will have yet another subtly different implementation. [1]https://lists.gnu.org/archive/html/qemu-devel/2012-03/msg01524.html [2]https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg00886.html * src/qemu/qemu_monitor_json.c (qemuMonitorJSONDriveMirror) (qemuMonitorDriveReopen): New functions. * src/qemu/qemu_monitor_json.h (qemuMonitorJSONDriveMirror) (qemuMonitorDriveReopen): Declare them. * src/qemu/qemu_monitor.c (qemuMonitorDriveMirror) (qemuMonitorDriveReopen): New passthroughs. * src/qemu/qemu_monitor.h (qemuMonitorDriveMirror) (qemuMonitorDriveReopen): Declare them. --- src/qemu/qemu_monitor.c | 50 ++++++++++++++++++++++++++++++ src/qemu/qemu_monitor.h | 13 ++++++++ src/qemu/qemu_monitor_json.c | 70 ++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_monitor_json.h | 19 ++++++++++- 4 files changed, 150 insertions(+), 2 deletions(-) diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index e1a8d4c..969e453 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -2685,6 +2685,32 @@ qemuMonitorDiskSnapshot(qemuMonitorPtr mon, virJSONValuePtr actions, return ret; } +/* Add the drive-mirror action to a transaction. */ +int +qemuMonitorDriveMirror(qemuMonitorPtr mon, virJSONValuePtr actions, + const char *device, const char *file, + const char *format, unsigned int flags) +{ + int ret = -1; + + VIR_DEBUG("mon=%p, actions=%p, device=%s, file=%s, format=%s, flags=%x", + mon, actions, device, file, format, flags); + + if (!mon) { + qemuReportError(VIR_ERR_INVALID_ARG, "%s", + _("monitor must not be NULL")); + return -1; + } + + if (mon->json) + ret = qemuMonitorJSONDriveMirror(mon, actions, device, file, format, + flags); + else + qemuReportError(VIR_ERR_INVALID_ARG, "%s", + _("drive-mirror requires JSON monitor")); + return ret; +} + /* Use the transaction QMP command to run atomic snapshot commands. */ int qemuMonitorTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) @@ -2701,6 +2727,30 @@ qemuMonitorTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) return ret; } +/* Use the drive-reopen monitor command. */ +int +qemuMonitorDriveReopen(qemuMonitorPtr mon, const char *device, + const char *file, const char *format) +{ + int ret = -1; + + VIR_DEBUG("mon=%p, device=%s, file=%s, format=%s", + mon, device, file, format); + + if (!mon) { + qemuReportError(VIR_ERR_INVALID_ARG, "%s", + _("monitor must not be NULL")); + return -1; + } + + if (mon->json) + ret = qemuMonitorJSONDriveReopen(mon, device, file, format); + else + qemuReportError(VIR_ERR_INVALID_ARG, "%s", + _("drive-reopen requires JSON monitor")); + return ret; +} + int qemuMonitorArbitraryCommand(qemuMonitorPtr mon, const char *cmd, char **reply, diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 2e6ac79..0534b4a 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -508,8 +508,21 @@ int qemuMonitorDiskSnapshot(qemuMonitorPtr mon, const char *file, const char *format, bool reuse); + +int qemuMonitorDriveMirror(qemuMonitorPtr mon, + virJSONValuePtr actions, + const char *device, + const char *file, + const char *format, + unsigned int flags) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(3) ATTRIBUTE_NONNULL(4); int qemuMonitorTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); +int qemuMonitorDriveReopen(qemuMonitorPtr mon, + const char *device, + const char *file, + const char *format) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); int qemuMonitorArbitraryCommand(qemuMonitorPtr mon, const char *cmd, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index aea3765..89f0eb8 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -3189,6 +3189,49 @@ cleanup: } int +qemuMonitorJSONDriveMirror(qemuMonitorPtr mon ATTRIBUTE_UNUSED, + virJSONValuePtr actions, + const char *device, const char *file, + const char *format, unsigned int flags) +{ + int ret = -1; + virJSONValuePtr cmd; + virJSONValuePtr reply = NULL; + bool shallow = (flags & VIR_DOMAIN_BLOCK_REBASE_SHALLOW) != 0; + bool reuse = (flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT) != 0; + + cmd = qemuMonitorJSONMakeCommandRaw(actions != NULL, + "drive-mirror", + "s:device", device, + "s:target", file, + "b:full", !shallow, + "s:mode", + reuse ? "existing" : "absolute-paths", + format ? "s:format" : NULL, format, + NULL); + if (!cmd) + return -1; + + if (actions) { + if (virJSONValueArrayAppend(actions, cmd) < 0) { + virReportOOMError(); + } else { + cmd = NULL; + ret = 0; + } + } else { + if ((ret = qemuMonitorJSONCommand(mon, cmd, &reply)) < 0) + goto cleanup; + ret = qemuMonitorJSONCheckError(cmd, reply); + } + +cleanup: + virJSONValueFree(cmd); + virJSONValueFree(reply); + return ret; +} + +int qemuMonitorJSONTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) { int ret = -1; @@ -3216,6 +3259,33 @@ cleanup: return ret; } +int +qemuMonitorJSONDriveReopen(qemuMonitorPtr mon, const char *device, + const char *file, const char *format) +{ + int ret; + virJSONValuePtr cmd; + virJSONValuePtr reply = NULL; + + cmd = qemuMonitorJSONMakeCommand("drive-reopen", + "s:device", device, + "s:new-image-file", file, + format ? "s:format" : NULL, format, + NULL); + if (!cmd) + return -1; + + if ((ret = qemuMonitorJSONCommand(mon, cmd, &reply)) < 0) + goto cleanup; + + ret = qemuMonitorJSONCheckError(cmd, reply); + +cleanup: + virJSONValueFree(cmd); + virJSONValueFree(reply); + return ret; +} + int qemuMonitorJSONArbitraryCommand(qemuMonitorPtr mon, const char *cmd_str, char **reply_str, diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index a0f67aa..9cfae8d 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -230,8 +230,23 @@ int qemuMonitorJSONDiskSnapshot(qemuMonitorPtr mon, const char *device, const char *file, const char *format, - bool reuse); -int qemuMonitorJSONTransaction(qemuMonitorPtr mon, virJSONValuePtr actions); + bool reuse) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(3) + ATTRIBUTE_NONNULL(4) ATTRIBUTE_NONNULL(5); +int qemuMonitorJSONDriveMirror(qemuMonitorPtr mon, + virJSONValuePtr actions, + const char *device, + const char *file, + const char *format, + unsigned int flags) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(3) ATTRIBUTE_NONNULL(4); +int qemuMonitorJSONTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); +int qemuMonitorJSONDriveReopen(qemuMonitorPtr mon, + const char *device, + const char *file, + const char *format) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); int qemuMonitorJSONArbitraryCommand(qemuMonitorPtr mon, const char *cmd_str, -- 1.7.7.6

On Mon, Apr 09, 2012 at 21:52:18 -0600, Eric Blake wrote:
As mentioned several commits ago when the capability bits were added, the new block copy storage migration sequence requires both the 'drive-mirror' and 'drive-reopen' monitor commands.
As of this[1] qemu email, both commands have been proposed but not yet incorporated into the tree; in fact, the implementation I tested with has changed to match this[2] email that suggested a mandatory 'full':'bool' argument rather than 'mode':'no-backing-file'. So there is a risk that qemu 1.1 will have yet another subtly different implementation. [1]https://lists.gnu.org/archive/html/qemu-devel/2012-03/msg01524.html [2]https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg00886.html
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONDriveMirror) (qemuMonitorDriveReopen): New functions. * src/qemu/qemu_monitor_json.h (qemuMonitorJSONDriveMirror) (qemuMonitorDriveReopen): Declare them. * src/qemu/qemu_monitor.c (qemuMonitorDriveMirror) (qemuMonitorDriveReopen): New passthroughs. * src/qemu/qemu_monitor.h (qemuMonitorDriveMirror) (qemuMonitorDriveReopen): Declare them. --- src/qemu/qemu_monitor.c | 50 ++++++++++++++++++++++++++++++ src/qemu/qemu_monitor.h | 13 ++++++++ src/qemu/qemu_monitor_json.c | 70 ++++++++++++++++++++++++++++++++++++++++++ src/qemu/qemu_monitor_json.h | 19 ++++++++++- 4 files changed, 150 insertions(+), 2 deletions(-)
diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index e1a8d4c..969e453 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -2685,6 +2685,32 @@ qemuMonitorDiskSnapshot(qemuMonitorPtr mon, virJSONValuePtr actions, return ret; }
+/* Add the drive-mirror action to a transaction. */ +int +qemuMonitorDriveMirror(qemuMonitorPtr mon, virJSONValuePtr actions, + const char *device, const char *file, + const char *format, unsigned int flags) +{ + int ret = -1; + + VIR_DEBUG("mon=%p, actions=%p, device=%s, file=%s, format=%s, flags=%x", + mon, actions, device, file, format, flags); + + if (!mon) { + qemuReportError(VIR_ERR_INVALID_ARG, "%s", + _("monitor must not be NULL")); + return -1; + } + + if (mon->json) + ret = qemuMonitorJSONDriveMirror(mon, actions, device, file, format, + flags); + else + qemuReportError(VIR_ERR_INVALID_ARG, "%s", + _("drive-mirror requires JSON monitor"));
Why is this VIR_ERR_INVALID_ARG instead of VIR_ERR_CONFIG_UNSUPPORTED?
+ return ret; +} + /* Use the transaction QMP command to run atomic snapshot commands. */ int qemuMonitorTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) @@ -2701,6 +2727,30 @@ qemuMonitorTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) return ret; }
+/* Use the drive-reopen monitor command. */ +int +qemuMonitorDriveReopen(qemuMonitorPtr mon, const char *device, + const char *file, const char *format) +{ + int ret = -1; + + VIR_DEBUG("mon=%p, device=%s, file=%s, format=%s", + mon, device, file, format); + + if (!mon) { + qemuReportError(VIR_ERR_INVALID_ARG, "%s", + _("monitor must not be NULL")); + return -1; + } + + if (mon->json) + ret = qemuMonitorJSONDriveReopen(mon, device, file, format); + else + qemuReportError(VIR_ERR_INVALID_ARG, "%s", + _("drive-reopen requires JSON monitor"));
Likewise.
+ return ret; +} + int qemuMonitorArbitraryCommand(qemuMonitorPtr mon, const char *cmd, char **reply, diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 2e6ac79..0534b4a 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -508,8 +508,21 @@ int qemuMonitorDiskSnapshot(qemuMonitorPtr mon, const char *file, const char *format, bool reuse); + +int qemuMonitorDriveMirror(qemuMonitorPtr mon, + virJSONValuePtr actions, + const char *device, + const char *file, + const char *format, + unsigned int flags) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(3) ATTRIBUTE_NONNULL(4); int qemuMonitorTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); +int qemuMonitorDriveReopen(qemuMonitorPtr mon, + const char *device, + const char *file, + const char *format) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3);
int qemuMonitorArbitraryCommand(qemuMonitorPtr mon, const char *cmd, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index aea3765..89f0eb8 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -3189,6 +3189,49 @@ cleanup: }
int +qemuMonitorJSONDriveMirror(qemuMonitorPtr mon ATTRIBUTE_UNUSED, + virJSONValuePtr actions, + const char *device, const char *file, + const char *format, unsigned int flags) +{ + int ret = -1; + virJSONValuePtr cmd; + virJSONValuePtr reply = NULL; + bool shallow = (flags & VIR_DOMAIN_BLOCK_REBASE_SHALLOW) != 0; + bool reuse = (flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT) != 0; + + cmd = qemuMonitorJSONMakeCommandRaw(actions != NULL, + "drive-mirror", + "s:device", device, + "s:target", file, + "b:full", !shallow, + "s:mode", + reuse ? "existing" : "absolute-paths", + format ? "s:format" : NULL, format, + NULL); + if (!cmd) + return -1; + + if (actions) { + if (virJSONValueArrayAppend(actions, cmd) < 0) { + virReportOOMError(); + } else { + cmd = NULL; + ret = 0; + } + } else { + if ((ret = qemuMonitorJSONCommand(mon, cmd, &reply)) < 0) + goto cleanup; + ret = qemuMonitorJSONCheckError(cmd, reply); + } + +cleanup: + virJSONValueFree(cmd); + virJSONValueFree(reply); + return ret; +} + +int qemuMonitorJSONTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) { int ret = -1; @@ -3216,6 +3259,33 @@ cleanup: return ret; }
+int +qemuMonitorJSONDriveReopen(qemuMonitorPtr mon, const char *device, + const char *file, const char *format) +{ + int ret; + virJSONValuePtr cmd; + virJSONValuePtr reply = NULL; + + cmd = qemuMonitorJSONMakeCommand("drive-reopen", + "s:device", device, + "s:new-image-file", file, + format ? "s:format" : NULL, format, + NULL); + if (!cmd) + return -1; + + if ((ret = qemuMonitorJSONCommand(mon, cmd, &reply)) < 0) + goto cleanup; + + ret = qemuMonitorJSONCheckError(cmd, reply); + +cleanup: + virJSONValueFree(cmd); + virJSONValueFree(reply); + return ret; +} + int qemuMonitorJSONArbitraryCommand(qemuMonitorPtr mon, const char *cmd_str, char **reply_str, diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index a0f67aa..9cfae8d 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -230,8 +230,23 @@ int qemuMonitorJSONDiskSnapshot(qemuMonitorPtr mon, const char *device, const char *file, const char *format, - bool reuse); -int qemuMonitorJSONTransaction(qemuMonitorPtr mon, virJSONValuePtr actions); + bool reuse) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(3) + ATTRIBUTE_NONNULL(4) ATTRIBUTE_NONNULL(5); +int qemuMonitorJSONDriveMirror(qemuMonitorPtr mon, + virJSONValuePtr actions, + const char *device, + const char *file, + const char *format, + unsigned int flags) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(3) ATTRIBUTE_NONNULL(4); +int qemuMonitorJSONTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); +int qemuMonitorJSONDriveReopen(qemuMonitorPtr mon, + const char *device, + const char *file, + const char *format) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3);
int qemuMonitorJSONArbitraryCommand(qemuMonitorPtr mon, const char *cmd_str,
OK Jirka

On 04/13/2012 06:31 AM, Jiri Denemark wrote:
On Mon, Apr 09, 2012 at 21:52:18 -0600, Eric Blake wrote:
As mentioned several commits ago when the capability bits were added, the new block copy storage migration sequence requires both the 'drive-mirror' and 'drive-reopen' monitor commands.
+ + if (mon->json) + ret = qemuMonitorJSONDriveMirror(mon, actions, device, file, format, + flags); + else + qemuReportError(VIR_ERR_INVALID_ARG, "%s", + _("drive-mirror requires JSON monitor"));
Why is this VIR_ERR_INVALID_ARG instead of VIR_ERR_CONFIG_UNSUPPORTED?
Probably copy-and-paste, but worth fixing along the same lines as the other message about "qemu binary" being too old. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

Handle the new type of block copy event and info. Of course, this patch does nothing until a later patch actually allows the creation/abort of a block copy job. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONHandleBlockJobImpl) (qemuMonitorJSONGetBlockJobInfoOne): Translate new job type. * src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Snoop a successful info query to save effort on a pivot request. --- src/qemu/qemu_driver.c | 6 ++++++ src/qemu/qemu_monitor_json.c | 4 ++++ 2 files changed, 10 insertions(+), 0 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index b0937eb..931e095 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11687,6 +11687,12 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, if (ret < 0) goto endjob; + /* Snoop block copy operations, so future cancel operations can + * avoid checking if pivot is safe. */ + if (mode == BLOCK_JOB_INFO && ret == 1 && vm->def->disks[idx]->mirror && + info->cur == info->end && info->type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) + vm->def->disks[idx]->mirroring = true; + /* Qemu provides asynchronous block job cancellation, but without * the VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag libvirt guarantees a * synchronous operation. Provide this behavior by waiting here, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 89f0eb8..8475dab 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -788,6 +788,8 @@ qemuMonitorJSONHandleBlockJobImpl(qemuMonitorPtr mon, if (STREQ(type_str, "stream")) type = VIR_DOMAIN_BLOCK_JOB_TYPE_PULL; + else if (STREQ(type_str, "mirror")) + type = VIR_DOMAIN_BLOCK_JOB_TYPE_COPY; switch (event) { case VIR_DOMAIN_BLOCK_JOB_COMPLETED: @@ -3412,6 +3414,8 @@ static int qemuMonitorJSONGetBlockJobInfoOne(virJSONValuePtr entry, } if (STREQ(type, "stream")) info->type = VIR_DOMAIN_BLOCK_JOB_TYPE_PULL; + else if (STREQ(type, "mirror")) + info->type = VIR_DOMAIN_BLOCK_JOB_TYPE_COPY; else info->type = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN; -- 1.7.7.6

On Mon, Apr 09, 2012 at 21:52:19 -0600, Eric Blake wrote:
Handle the new type of block copy event and info. Of course, this patch does nothing until a later patch actually allows the creation/abort of a block copy job.
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONHandleBlockJobImpl) (qemuMonitorJSONGetBlockJobInfoOne): Translate new job type. * src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Snoop a successful info query to save effort on a pivot request. --- src/qemu/qemu_driver.c | 6 ++++++ src/qemu/qemu_monitor_json.c | 4 ++++ 2 files changed, 10 insertions(+), 0 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index b0937eb..931e095 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11687,6 +11687,12 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, if (ret < 0) goto endjob;
+ /* Snoop block copy operations, so future cancel operations can + * avoid checking if pivot is safe. */ + if (mode == BLOCK_JOB_INFO && ret == 1 && vm->def->disks[idx]->mirror && + info->cur == info->end && info->type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) + vm->def->disks[idx]->mirroring = true; + /* Qemu provides asynchronous block job cancellation, but without * the VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag libvirt guarantees a * synchronous operation. Provide this behavior by waiting here, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 89f0eb8..8475dab 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -788,6 +788,8 @@ qemuMonitorJSONHandleBlockJobImpl(qemuMonitorPtr mon,
if (STREQ(type_str, "stream")) type = VIR_DOMAIN_BLOCK_JOB_TYPE_PULL; + else if (STREQ(type_str, "mirror")) + type = VIR_DOMAIN_BLOCK_JOB_TYPE_COPY;
switch (event) { case VIR_DOMAIN_BLOCK_JOB_COMPLETED: @@ -3412,6 +3414,8 @@ static int qemuMonitorJSONGetBlockJobInfoOne(virJSONValuePtr entry, } if (STREQ(type, "stream")) info->type = VIR_DOMAIN_BLOCK_JOB_TYPE_PULL; + else if (STREQ(type, "mirror")) + info->type = VIR_DOMAIN_BLOCK_JOB_TYPE_COPY; else info->type = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN;
OK Jirka

This is the bare minimum to end a copy job (of course, until a later patch adds the ability to start a copy job, this patch doesn't do much in isolation; I've just split the patches to ease the review). This patch intentionally avoids SELinux, lock manager, and audit actions, saving that for a later patch that affects the overall lifecycle of a disk copy. In particular, I'm still fuzzy on the exact qemu error semantics, and whether I need to make more of an effort after a 'drive-reopen' fails. When a mirror job is started, cancelling the job safely reverts back to the source disk, regardless of whether the destination is in phase 1 (streaming, in which case the destination is worthless) or phase 2 (mirroring, in which case the destination is sync'd up to the source at the time of the cancel). Our existing code does just fine in either phase, other than some bookkeeping cleanup. Pivoting the job requires the use of the new 'drive-reopen' command. Here, failure of the command is potentially catastrophic to the domain, since it rips out the old disk before attempting to open the new one; if our recovery path of retrying the reopen on the original source disk also fails, the domain is hosed. If only qemu could get 'drive-reopen' inside 'transaction'... Interesting side note: while snapshot-create --disk-only creates a copy of the disk at a point in time by moving the domain on to a new file (the copy is the file now in the just-extended backing chain), blockjob --abort of a copy job creates a copy of the disk while keeping the domain on the original file. There may be potential improvements to the snapshot code to exploit block copy over multiple disks all at one point in time. And, if 'block_job_cancel' were made part of 'transaction', you could copy multiple disks at the same point in time without pausing the domain. This also implies we may want to add a --quiesce flag to the pivot operation, so that when breaking a mirror, the side of the mirror that we are abandoning is at least in a stable state with regards to guest I/O. * src/qemu/qemu_driver.c (qemuDomainBlockJobAbort): Accept new flag. (qemuDomainBlockPivot): New helper function. (qemuDomainBlockJobImpl): Implement it. --- src/qemu/qemu_driver.c | 115 +++++++++++++++++++++++++++++++++++++++++++++-- 1 files changed, 110 insertions(+), 5 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 931e095..2207184 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11618,6 +11618,86 @@ cleanup: return ret; } +/* Called while holding the VM job lock, to implement a block job + * abort with pivot; this updates the VM definition as appropriate, on + * either success or failure (although there are some forms of + * catastrophic failure that will leave the VM unusable). */ +static int +qemuDomainBlockPivot(struct qemud_driver *driver, virDomainObjPtr vm, + const char *device, virDomainDiskDefPtr disk) +{ + int ret = -1; + qemuDomainObjPrivatePtr priv = vm->privateData; + virDomainBlockJobInfo info; + + /* Probe the status, if needed. */ + if (!disk->mirroring) { + qemuDomainObjEnterMonitorWithDriver(driver, vm); + ret = qemuMonitorBlockJob(priv->mon, device, NULL, 0, &info, + BLOCK_JOB_INFO); + qemuDomainObjExitMonitorWithDriver(driver, vm); + if (ret < 0) + goto cleanup; + if (!virDomainObjIsActive(vm)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain is not running")); + goto cleanup; + } + if (ret == 1 && info.cur == info.end && + info.type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) + disk->mirroring = true; + } + + if (!disk->mirroring) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, + _("disk '%s' not ready for pivot yet"), + disk->dst); + goto cleanup; + } + + /* Attempt the pivot. */ + qemuDomainObjEnterMonitorWithDriver(driver, vm); + ret = qemuMonitorDriveReopen(priv->mon, device, disk->mirror, + disk->mirrorFormat); + qemuDomainObjExitMonitorWithDriver(driver, vm); + + /* XXX Until qemu adds support for 'drive-reopen' inside + * 'transaction', we have the remote risk of a catastrophic + * failure, where the drive-reopen fails but can't recover by + * reopening the source. Not much we can do about it. */ + + if (ret == 0) { + /* XXX We want to revoke security labels and disk lease, as + * well as audit that revocation, before dropping the original + * source. But it gets tricky if both source and mirror share + * common backing files (we want to only revoke the non-shared + * portion of the chain, and is made more difficult by the + * fact that we aren't tracking the full chain ourselves; so + * for now, we leak the access to the original. */ + VIR_FREE(disk->src); + VIR_FREE(disk->driverType); + disk->src = disk->mirror; + disk->driverType = disk->mirrorFormat; + disk->mirror = NULL; + disk->mirrorFormat = NULL; + disk->mirroring = false; + } else { + /* On failure, qemu abandons the mirror, and attempts to + * revert back to the source disk. Hopefully it was able to + * reopen things. */ + /* XXX should we be parsing the exact qemu error, or calling + * 'query-block', to see what state we really got left in + * before killing the mirroring job? And just as on the + * success case, there's security labeling to worry about. */ + VIR_FREE(disk->mirror); + VIR_FREE(disk->mirrorFormat); + disk->mirroring = false; + } + +cleanup: + return ret; +} + static int qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, unsigned long bandwidth, virDomainBlockJobInfoPtr info, @@ -11630,6 +11710,7 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, char *device = NULL; int ret = -1; int idx; + virDomainDiskDefPtr disk = NULL; qemuDriverLock(driver); virUUIDFormat(dom->uuid, uuidstr); @@ -11644,6 +11725,7 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, if (!device) { goto cleanup; } + disk = vm->def->disks[idx]; priv = vm->privateData; /* Asynchronous job cancellation was introduced at the same time @@ -11656,10 +11738,17 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, "this QEMU binary")); goto cleanup; } - if (mode == BLOCK_JOB_PULL && vm->def->disks[idx]->mirror) { + if (disk->mirror && mode == BLOCK_JOB_PULL) { qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, _("disk '%s' already in active block copy job"), - vm->def->disks[idx]->dst); + disk->dst); + goto cleanup; + } + if (!disk->mirror && mode == BLOCK_JOB_ABORT && + (flags & VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, + _("pivot of disk '%s' requires an active copy job"), + disk->dst); goto cleanup; } @@ -11672,6 +11761,12 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, goto endjob; } + if (disk->mirror && mode == BLOCK_JOB_ABORT && + (flags & VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)) { + ret = qemuDomainBlockPivot(driver, vm, device, disk); + goto endjob; + } + qemuDomainObjEnterMonitorWithDriver(driver, vm); /* XXX - add a qemu capability check, since only qemu 1.1 or newer * supports the base argument. @@ -11689,9 +11784,18 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, /* Snoop block copy operations, so future cancel operations can * avoid checking if pivot is safe. */ - if (mode == BLOCK_JOB_INFO && ret == 1 && vm->def->disks[idx]->mirror && + if (mode == BLOCK_JOB_INFO && ret == 1 && disk->mirror && info->cur == info->end && info->type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) - vm->def->disks[idx]->mirroring = true; + disk->mirroring = true; + + /* A successful block job cancelation stops any mirroring. */ + if (mode == BLOCK_JOB_ABORT && disk->mirror) { + /* XXX We should also revoke security labels and disk lease on + * the mirror, and audit that fact, before dropping things. */ + VIR_FREE(disk->mirror); + VIR_FREE(disk->mirrorFormat); + disk->mirroring = false; + } /* Qemu provides asynchronous block job cancellation, but without * the VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag libvirt guarantees a @@ -11748,7 +11852,8 @@ cleanup: static int qemuDomainBlockJobAbort(virDomainPtr dom, const char *path, unsigned int flags) { - virCheckFlags(VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC, -1); + virCheckFlags(VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC | + VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT, -1); return qemuDomainBlockJobImpl(dom, path, NULL, 0, NULL, BLOCK_JOB_ABORT, flags); } -- 1.7.7.6

On Mon, Apr 09, 2012 at 21:52:20 -0600, Eric Blake wrote:
This is the bare minimum to end a copy job (of course, until a later patch adds the ability to start a copy job, this patch doesn't do much in isolation; I've just split the patches to ease the review).
Yeah, thanks for doing that.
This patch intentionally avoids SELinux, lock manager, and audit actions, saving that for a later patch that affects the overall lifecycle of a disk copy. In particular, I'm still fuzzy on the exact qemu error semantics, and whether I need to make more of an effort after a 'drive-reopen' fails.
When a mirror job is started, cancelling the job safely reverts back to the source disk, regardless of whether the destination is in phase 1 (streaming, in which case the destination is worthless) or phase 2 (mirroring, in which case the destination is sync'd up to the source at the time of the cancel). Our existing code does just fine in either phase, other than some bookkeeping cleanup.
Pivoting the job requires the use of the new 'drive-reopen' command. Here, failure of the command is potentially catastrophic to the domain, since it rips out the old disk before attempting to open the new one; if our recovery path of retrying the reopen on the original source disk also fails, the domain is hosed. If only qemu could get 'drive-reopen' inside 'transaction'...
Interesting side note: while snapshot-create --disk-only creates a copy of the disk at a point in time by moving the domain on to a new file (the copy is the file now in the just-extended backing chain), blockjob --abort of a copy job creates a copy of the disk while keeping the domain on the original file. There may be potential improvements to the snapshot code to exploit block copy over multiple disks all at one point in time. And, if 'block_job_cancel' were made part of 'transaction', you could copy multiple disks at the same point in time without pausing the domain. This also implies we may want to add a --quiesce flag to the pivot operation, so that when breaking a mirror, the side of the mirror that we are abandoning is at least in a stable state with regards to guest I/O.
* src/qemu/qemu_driver.c (qemuDomainBlockJobAbort): Accept new flag. (qemuDomainBlockPivot): New helper function. (qemuDomainBlockJobImpl): Implement it. --- src/qemu/qemu_driver.c | 115 +++++++++++++++++++++++++++++++++++++++++++++-- 1 files changed, 110 insertions(+), 5 deletions(-)
OK Jirka

Minimal patch to wire up all the pieces in the previous patches to actually enable a block copy job. By minimal, I mean that qemu creates the file (that is, no REUSE_EXT flag support yet), SELinux must be disabled, a lock manager is not informed, and the audit logs aren't updated. But those will be added as improvements in future patches. * src/qemu/qemu_driver.c (qemuDomainBlockCopy): New function. (qemuDomainBlockRebase): Call it when appropriate. --- src/qemu/qemu_driver.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 113 insertions(+), 1 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 2207184..6d5a5da 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11877,10 +11877,121 @@ qemuDomainBlockJobSetSpeed(virDomainPtr dom, const char *path, } static int +qemuDomainBlockCopy(virDomainPtr dom, const char *path, + const char *dest, const char *format, + unsigned long bandwidth, unsigned int flags) +{ + struct qemud_driver *driver = dom->conn->privateData; + virDomainObjPtr vm = NULL; + qemuDomainObjPrivatePtr priv; + char uuidstr[VIR_UUID_STRING_BUFLEN]; + char *device = NULL; + virDomainDiskDefPtr disk; + int ret = -1; + int idx; + + /* Preliminaries: find the disk we are editing, sanity checks */ + virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW, -1); + + qemuDriverLock(driver); + virUUIDFormat(dom->uuid, uuidstr); + vm = virDomainFindByUUID(&driver->domains, dom->uuid); + if (!vm) { + qemuReportError(VIR_ERR_NO_DOMAIN, + _("no domain with matching uuid '%s'"), uuidstr); + goto cleanup; + } + + device = qemuDiskPathToAlias(vm, path, &idx); + if (!device) { + goto cleanup; + } + disk = vm->def->disks[idx]; + if (disk->mirror) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, + _("disk '%s' already in active block copy job"), + disk->dst); + goto cleanup; + } + + priv = vm->privateData; + if (!(qemuCapsGet(priv->qemuCaps, QEMU_CAPS_DRIVE_MIRROR) && + qemuCapsGet(priv->qemuCaps, QEMU_CAPS_DRIVE_REOPEN))) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("block copy is not supported with this QEMU binary")); + goto cleanup; + } + if (vm->persistent) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain is not transient")); + goto cleanup; + } + + if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_MODIFY) < 0) + goto cleanup; + + if (!virDomainObjIsActive(vm)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain is not running")); + goto endjob; + } + + /* Prepare the destination file. */ + /* XXX We also need to add security labeling, lock manager lease, + * and auditing of those events, as well as to support reuse of + * existing images, including probing the existing format of an + * existing image. */ + if (!format) + format = disk->driverType; + if ((format && !(disk->mirrorFormat = strdup(format))) || + !(disk->mirror = strdup(dest))) { + virReportOOMError(); + goto endjob; + } + + /* Actually start the mirroring */ + qemuDomainObjEnterMonitorWithDriver(driver, vm); + ret = qemuMonitorDriveMirror(priv->mon, NULL, device, dest, format, flags); + if (ret == 0 && bandwidth != 0) + ret = qemuMonitorBlockJob(priv->mon, device, NULL, bandwidth, NULL, + BLOCK_JOB_SPEED_INTERNAL); + qemuDomainObjExitMonitorWithDriver(driver, vm); + +endjob: + if (ret < 0) { + VIR_FREE(disk->mirror); + VIR_FREE(disk->mirrorFormat); + } + if (qemuDomainObjEndJob(driver, vm) == 0) { + vm = NULL; + goto cleanup; + } + +cleanup: + VIR_FREE(device); + if (vm) + virDomainObjUnlock(vm); + qemuDriverUnlock(driver); + return ret; +} + +static int qemuDomainBlockRebase(virDomainPtr dom, const char *path, const char *base, unsigned long bandwidth, unsigned int flags) { - virCheckFlags(0, -1); + virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW | + VIR_DOMAIN_BLOCK_REBASE_COPY | + VIR_DOMAIN_BLOCK_REBASE_COPY_RAW, -1); + + if (flags & VIR_DOMAIN_BLOCK_REBASE_COPY) { + const char *format = NULL; + if (flags & VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) + format = "raw"; + flags &= ~(VIR_DOMAIN_BLOCK_REBASE_COPY | + VIR_DOMAIN_BLOCK_REBASE_COPY_RAW); + return qemuDomainBlockCopy(dom, path, base, format, bandwidth, flags); + } + return qemuDomainBlockJobImpl(dom, path, base, bandwidth, NULL, BLOCK_JOB_PULL, flags); } @@ -11889,6 +12000,7 @@ static int qemuDomainBlockPull(virDomainPtr dom, const char *path, unsigned long bandwidth, unsigned int flags) { + virCheckFlags(0, -1); return qemuDomainBlockRebase(dom, path, NULL, bandwidth, flags); } -- 1.7.7.6

On Mon, Apr 09, 2012 at 21:52:21 -0600, Eric Blake wrote:
Minimal patch to wire up all the pieces in the previous patches to actually enable a block copy job. By minimal, I mean that qemu creates the file (that is, no REUSE_EXT flag support yet), SELinux must be disabled, a lock manager is not informed, and the audit logs aren't updated. But those will be added as improvements in future patches.
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): New function. (qemuDomainBlockRebase): Call it when appropriate. --- src/qemu/qemu_driver.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 113 insertions(+), 1 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 2207184..6d5a5da 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11877,10 +11877,121 @@ qemuDomainBlockJobSetSpeed(virDomainPtr dom, const char *path, }
static int +qemuDomainBlockCopy(virDomainPtr dom, const char *path, + const char *dest, const char *format, + unsigned long bandwidth, unsigned int flags) +{ + struct qemud_driver *driver = dom->conn->privateData; + virDomainObjPtr vm = NULL; + qemuDomainObjPrivatePtr priv; + char uuidstr[VIR_UUID_STRING_BUFLEN]; + char *device = NULL; + virDomainDiskDefPtr disk; + int ret = -1; + int idx; + + /* Preliminaries: find the disk we are editing, sanity checks */ + virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW, -1); + + qemuDriverLock(driver); + virUUIDFormat(dom->uuid, uuidstr); + vm = virDomainFindByUUID(&driver->domains, dom->uuid); + if (!vm) { + qemuReportError(VIR_ERR_NO_DOMAIN, + _("no domain with matching uuid '%s'"), uuidstr); + goto cleanup; + } + + device = qemuDiskPathToAlias(vm, path, &idx); + if (!device) { + goto cleanup; + } + disk = vm->def->disks[idx]; + if (disk->mirror) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, + _("disk '%s' already in active block copy job"), + disk->dst); + goto cleanup; + } + + priv = vm->privateData; + if (!(qemuCapsGet(priv->qemuCaps, QEMU_CAPS_DRIVE_MIRROR) && + qemuCapsGet(priv->qemuCaps, QEMU_CAPS_DRIVE_REOPEN))) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s",
We usually use VIR_ERR_CONFIG_UNSUPPORTED in such cases.
+ _("block copy is not supported with this QEMU binary")); + goto cleanup; + } + if (vm->persistent) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain is not transient")); + goto cleanup; + }
I guess I wasn't paying enough attention somewhere but why do we forbid block copy for persistent domains? I understand why we want to forbid certain operations when block copy is active but I don't see a reason for penalizing persistent domains.
+ + if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_MODIFY) < 0) + goto cleanup; + + if (!virDomainObjIsActive(vm)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain is not running")); + goto endjob; + } + + /* Prepare the destination file. */ + /* XXX We also need to add security labeling, lock manager lease, + * and auditing of those events, as well as to support reuse of + * existing images, including probing the existing format of an + * existing image. */ + if (!format) + format = disk->driverType; + if ((format && !(disk->mirrorFormat = strdup(format))) || + !(disk->mirror = strdup(dest))) { + virReportOOMError(); + goto endjob; + } + + /* Actually start the mirroring */ + qemuDomainObjEnterMonitorWithDriver(driver, vm); + ret = qemuMonitorDriveMirror(priv->mon, NULL, device, dest, format, flags); + if (ret == 0 && bandwidth != 0) + ret = qemuMonitorBlockJob(priv->mon, device, NULL, bandwidth, NULL, + BLOCK_JOB_SPEED_INTERNAL); + qemuDomainObjExitMonitorWithDriver(driver, vm);
Can BLOCK_JOB_SPEED_INTERNAL fail while block copy remains running? If so, we should try to abort the job in case we fail to set the speed, otherwise we will report an error and forget about the job while qemu will keep copying data.
+ +endjob: + if (ret < 0) { + VIR_FREE(disk->mirror); + VIR_FREE(disk->mirrorFormat); + } + if (qemuDomainObjEndJob(driver, vm) == 0) { + vm = NULL; + goto cleanup; + } + +cleanup: + VIR_FREE(device); + if (vm) + virDomainObjUnlock(vm); + qemuDriverUnlock(driver); + return ret; +} + +static int qemuDomainBlockRebase(virDomainPtr dom, const char *path, const char *base, unsigned long bandwidth, unsigned int flags) { - virCheckFlags(0, -1); + virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW | + VIR_DOMAIN_BLOCK_REBASE_COPY | + VIR_DOMAIN_BLOCK_REBASE_COPY_RAW, -1); + + if (flags & VIR_DOMAIN_BLOCK_REBASE_COPY) { + const char *format = NULL; + if (flags & VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) + format = "raw"; + flags &= ~(VIR_DOMAIN_BLOCK_REBASE_COPY | + VIR_DOMAIN_BLOCK_REBASE_COPY_RAW); + return qemuDomainBlockCopy(dom, path, base, format, bandwidth, flags); + } + return qemuDomainBlockJobImpl(dom, path, base, bandwidth, NULL, BLOCK_JOB_PULL, flags); } @@ -11889,6 +12000,7 @@ static int qemuDomainBlockPull(virDomainPtr dom, const char *path, unsigned long bandwidth, unsigned int flags) { + virCheckFlags(0, -1); return qemuDomainBlockRebase(dom, path, NULL, bandwidth, flags); }
Hmm, personally, I would make qemuDomainBlockPull call qemuDomainBlockJobImpl directly instead of going through qemuDomainBlockRebase while adding the flags check... Jirka

On 04/13/2012 08:46 AM, Jiri Denemark wrote:
On Mon, Apr 09, 2012 at 21:52:21 -0600, Eric Blake wrote:
Minimal patch to wire up all the pieces in the previous patches to actually enable a block copy job. By minimal, I mean that qemu creates the file (that is, no REUSE_EXT flag support yet), SELinux must be disabled, a lock manager is not informed, and the audit logs aren't updated. But those will be added as improvements in future patches.
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): New function. (qemuDomainBlockRebase): Call it when appropriate. --- src/qemu/qemu_driver.c | 114 +++++++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 113 insertions(+), 1 deletions(-)
+ + priv = vm->privateData; + if (!(qemuCapsGet(priv->qemuCaps, QEMU_CAPS_DRIVE_MIRROR) && + qemuCapsGet(priv->qemuCaps, QEMU_CAPS_DRIVE_REOPEN))) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s",
We usually use VIR_ERR_CONFIG_UNSUPPORTED in such cases.
Sure, easy to fix. Actually, looking at 'git grep -B3 -i "qemu binary" src/qemu', I counted an INTERNAL_ERROR, an OPERATION_FAILED, and a couple of OPERATION_INVALID that should all be cleaned up, to match the fact that the majority of uses did indeed favor CONFIG_UNSUPPORTED. I'll split the cleanup into a separate patch.
+ _("block copy is not supported with this QEMU binary")); + goto cleanup; + } + if (vm->persistent) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain is not transient")); + goto cleanup; + }
I guess I wasn't paying enough attention somewhere but why do we forbid block copy for persistent domains? I understand why we want to forbid certain operations when block copy is active but I don't see a reason for penalizing persistent domains.
It was in patch 8/18 where I added the restrictions, and hopefully documented in that commit message why limiting things to transient is a good first step: 1. the first client of live migration is oVirt, which uses transient domains 2. qemu does not (yet) provide a way to resume a mirror when restarting a domain, so anything that would require restoring a domain from saved state is broken: incoming migration, virsh restore, and virsh start after a managed save. But users would be upset if they saved a domain, only to find out that they cannot then restore it, so I squelched things one step earlier in the process, by preventing any save of a domain so that we never have a broken save image in the first place. My worry now comes from the fact that managedsave is on the list of forbidden operations. If a domain is transient, managedsave is already forbidden (it is assumed that you either don't care about the domain if the host dies, or that you are running a higher-level app like oVirt that knows how to rebuild the guest on a different host). But if a guest is persistent, and you use the libvirt-guests init script, then you have a right to assume that rebooting your host will resume your guests in the same state that they were prior to the host going down - because libvirt-guests uses managedsave. If we allow a mirror job on a persistent domain, we violate this assumption (libvirt-guests will fail to save the guest). Therefore, I forbid to start a mirror job on a persistent domain, just as I forbid to 'virsh define' a transient domain into a persistent domain if a mirror job is active. If, at a later date, qemu comes up with a way to resume mirroring when restarting a domain, we can relax these restrictions.
+ + /* Actually start the mirroring */ + qemuDomainObjEnterMonitorWithDriver(driver, vm); + ret = qemuMonitorDriveMirror(priv->mon, NULL, device, dest, format, flags); + if (ret == 0 && bandwidth != 0) + ret = qemuMonitorBlockJob(priv->mon, device, NULL, bandwidth, NULL, + BLOCK_JOB_SPEED_INTERNAL); + qemuDomainObjExitMonitorWithDriver(driver, vm);
Can BLOCK_JOB_SPEED_INTERNAL fail while block copy remains running? If so, we should try to abort the job in case we fail to set the speed, otherwise we will report an error and forget about the job while qemu will keep copying data.
Good catch, and virDomainBlockPull() suffers from the same ramifications. I think both code paths need the same fix.
@@ -11889,6 +12000,7 @@ static int qemuDomainBlockPull(virDomainPtr dom, const char *path, unsigned long bandwidth, unsigned int flags) { + virCheckFlags(0, -1); return qemuDomainBlockRebase(dom, path, NULL, bandwidth, flags); }
Hmm, personally, I would make qemuDomainBlockPull call qemuDomainBlockJobImpl directly instead of going through qemuDomainBlockRebase while adding the flags check...
Easy enough to change. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Fri, Apr 13, 2012 at 09:15:18 -0600, Eric Blake wrote:
On 04/13/2012 08:46 AM, Jiri Denemark wrote:
I guess I wasn't paying enough attention somewhere but why do we forbid block copy for persistent domains? I understand why we want to forbid certain operations when block copy is active but I don't see a reason for penalizing persistent domains.
It was in patch 8/18 where I added the restrictions, and hopefully documented in that commit message why limiting things to transient is a good first step:
1. the first client of live migration is oVirt, which uses transient domains
2. qemu does not (yet) provide a way to resume a mirror when restarting a domain, so anything that would require restoring a domain from saved state is broken: incoming migration, virsh restore, and virsh start after a managed save. But users would be upset if they saved a domain, only to find out that they cannot then restore it, so I squelched things one step earlier in the process, by preventing any save of a domain so that we never have a broken save image in the first place.
My worry now comes from the fact that managedsave is on the list of forbidden operations. If a domain is transient, managedsave is already forbidden (it is assumed that you either don't care about the domain if the host dies, or that you are running a higher-level app like oVirt that knows how to rebuild the guest on a different host). But if a guest is persistent, and you use the libvirt-guests init script, then
Yeah, this was the part I didn't see. Failing manually issued commands is fine but failing managedsave called from libvirt-guests during host shutdown/reboot is not nice. I agree with forbidding block copy for persistent domains then. Jirka

On 04/13/2012 09:15 AM, Eric Blake wrote:
On 04/13/2012 08:46 AM, Jiri Denemark wrote:
On Mon, Apr 09, 2012 at 21:52:21 -0600, Eric Blake wrote:
Minimal patch to wire up all the pieces in the previous patches to actually enable a block copy job. By minimal, I mean that qemu creates the file (that is, no REUSE_EXT flag support yet), SELinux must be disabled, a lock manager is not informed, and the audit logs aren't updated. But those will be added as improvements in future patches.
+ qemuDomainObjEnterMonitorWithDriver(driver, vm); + ret = qemuMonitorDriveMirror(priv->mon, NULL, device, dest, format, flags); + if (ret == 0 && bandwidth != 0) + ret = qemuMonitorBlockJob(priv->mon, device, NULL, bandwidth, NULL, + BLOCK_JOB_SPEED_INTERNAL); + qemuDomainObjExitMonitorWithDriver(driver, vm);
Can BLOCK_JOB_SPEED_INTERNAL fail while block copy remains running? If so, we should try to abort the job in case we fail to set the speed, otherwise we will report an error and forget about the job while qemu will keep copying data.
Good catch, and virDomainBlockPull() suffers from the same ramifications. I think both code paths need the same fix.
Actually, thinking about this a bit more: qemu documents that block-job-set-speed will fail if: DeviceNotActive: streaming is not active (but we already ignore this for BLOCK_JOB_SPEED_INTERNAL, thanks to commit a9d3495) NotSupported: throttling not supported It doesn't document any other errors; there are probably errors such as DeviceNotFound if an invalid device is specified, but we are less likely to hit those, since we would have hit that same error with the initial drive-mirror. Meanwhile, libvirt.c currently documents: * The maximum bandwidth (in Mbps) that will be used to do the copy can be * specified with the bandwidth parameter. If set to 0, libvirt will choose a * suitable default. Some hypervisors do not support this feature and will * return an error if bandwidth is not 0. But trying to cancel the job is risky - we've already modified the destination file; it would be nicer if we could detect the throttling limitation before even starting the job, but qemu doesn't let us do that. I will see about requesting the ability to set bandwidth as part of starting a job, rather than our current limitation of a guaranteed race, as a qemu patch. In the meantime, I think I will patch this instead by documentation: state that non-zero bandwidth in virDomainBlockRebase() may be silently ignored; if error checking is needed, the user should use a separate call to virDomainBlockJobSetSpeed(); and to see if a request was honored, use virDomainGetBlockJobInfo() (with the known race that the job might already be complete). -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Mon, Apr 16, 2012 at 17:28:10 -0600, Eric Blake wrote:
On 04/13/2012 09:15 AM, Eric Blake wrote:
On 04/13/2012 08:46 AM, Jiri Denemark wrote:
On Mon, Apr 09, 2012 at 21:52:21 -0600, Eric Blake wrote:
Minimal patch to wire up all the pieces in the previous patches to actually enable a block copy job. By minimal, I mean that qemu creates the file (that is, no REUSE_EXT flag support yet), SELinux must be disabled, a lock manager is not informed, and the audit logs aren't updated. But those will be added as improvements in future patches.
+ qemuDomainObjEnterMonitorWithDriver(driver, vm); + ret = qemuMonitorDriveMirror(priv->mon, NULL, device, dest, format, flags); + if (ret == 0 && bandwidth != 0) + ret = qemuMonitorBlockJob(priv->mon, device, NULL, bandwidth, NULL, + BLOCK_JOB_SPEED_INTERNAL); + qemuDomainObjExitMonitorWithDriver(driver, vm);
Can BLOCK_JOB_SPEED_INTERNAL fail while block copy remains running? If so, we should try to abort the job in case we fail to set the speed, otherwise we will report an error and forget about the job while qemu will keep copying data.
Good catch, and virDomainBlockPull() suffers from the same ramifications. I think both code paths need the same fix.
In the meantime, I think I will patch this instead by documentation: state that non-zero bandwidth in virDomainBlockRebase() may be silently ignored; if error checking is needed, the user should use a separate call to virDomainBlockJobSetSpeed(); and to see if a request was honored, use virDomainGetBlockJobInfo() (with the known race that the job might already be complete).
Makes sense, although we don't want to store the result of BLOCK_JOB_SPEED_INTERNAL to be stored in ret, then. Jirka

On 04/17/2012 02:32 AM, Jiri Denemark wrote:
In the meantime, I think I will patch this instead by documentation: state that non-zero bandwidth in virDomainBlockRebase() may be silently ignored; if error checking is needed, the user should use a separate call to virDomainBlockJobSetSpeed(); and to see if a request was honored, use virDomainGetBlockJobInfo() (with the known race that the job might already be complete).
Makes sense, although we don't want to store the result of BLOCK_JOB_SPEED_INTERNAL to be stored in ret, then.
Actually, qemu may save us - they are agreeing that this is a problem, and will probably have a patch before libvirt 0.9.12 that allows block-job-set-speed to be called prior to block-stream or drive-mirror; at which point the correct fix is to then try to set the speed _prior_ to starting the job (but only when we know we are talking to qemu 1.1, not RHEL 6.2). It would leave RHEL 6.2 with the race, but oh well. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

This copies some of the checks from snapshots regarding testing when a file already exists. In the process, I noticed a missing sanity check in snapshots: REUSE_EXT should require the destination to already be present. * src/qemu/qemu_driver.c (qemuDomainBlockRebase): Allow REUSE_EXT flag. (qemuDomainBlockCopy): Wire up flag, and add some sanity checks. (qemuDomainSnapshotDiskPrepare): Require destination on REUSE_EXT. --- src/qemu/qemu_driver.c | 48 +++++++++++++++++++++++++++++++++++++++++++----- 1 files changed, 43 insertions(+), 5 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 6d5a5da..661ccb4 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -9812,6 +9812,11 @@ qemuDomainSnapshotDiskPrepare(virDomainObjPtr vm, virDomainSnapshotDefPtr def, _("unable to stat for disk %s: %s"), disk->name, disk->file); goto cleanup; + } else if (allow_reuse) { + virReportSystemError(errno, + _("missing existing file for disk %s: %s"), + disk->name, disk->file); + goto cleanup; } } else if (!(S_ISBLK(st.st_mode) || !st.st_size || allow_reuse)) { qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, @@ -11889,9 +11894,11 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, virDomainDiskDefPtr disk; int ret = -1; int idx; + struct stat st; /* Preliminaries: find the disk we are editing, sanity checks */ - virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW, -1); + virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW | + VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT, -1); qemuDriverLock(driver); virUUIDFormat(dom->uuid, uuidstr); @@ -11936,12 +11943,42 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, goto endjob; } + /* XXX this is pessimistic; we could use 'query-block' or even + * keep track of the backing chain ourselves, rather than assuming + * that all non-raw source files have a current backing image */ + if ((flags & VIR_DOMAIN_BLOCK_REBASE_SHALLOW) && + STREQ_NULLABLE(format, "raw") && + STRNEQ_NULLABLE(disk->driverType, "raw")) { + qemuReportError(VIR_ERR_INVALID_ARG, + _("raw shallow copy of non-raw disk '%s' not possible"), + disk->dst); + goto endjob; + } + /* Prepare the destination file. */ + if (stat(dest, &st) < 0) { + if (errno != ENOENT) { + virReportSystemError(errno, _("unable to stat for disk %s: %s"), + disk->dst, dest); + goto endjob; + } else if (flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT) { + virReportSystemError(errno, + _("missing destination file for disk %s: %s"), + disk->dst, dest); + goto endjob; + } + } else if (!(S_ISBLK(st.st_mode) || !st.st_size || + (flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT))) { + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("external destination file for disk %s already " + "exists and is not a block device: %s"), + disk->dst, dest); + goto endjob; + } + /* XXX We also need to add security labeling, lock manager lease, - * and auditing of those events, as well as to support reuse of - * existing images, including probing the existing format of an - * existing image. */ - if (!format) + * and auditing of those events. */ + if (!format && !(flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)) format = disk->driverType; if ((format && !(disk->mirrorFormat = strdup(format))) || !(disk->mirror = strdup(dest))) { @@ -11980,6 +12017,7 @@ qemuDomainBlockRebase(virDomainPtr dom, const char *path, const char *base, unsigned long bandwidth, unsigned int flags) { virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW | + VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT | VIR_DOMAIN_BLOCK_REBASE_COPY | VIR_DOMAIN_BLOCK_REBASE_COPY_RAW, -1); -- 1.7.7.6

On Mon, Apr 09, 2012 at 21:52:22 -0600, Eric Blake wrote:
This copies some of the checks from snapshots regarding testing when a file already exists. In the process, I noticed a missing sanity check in snapshots: REUSE_EXT should require the destination to already be present.
* src/qemu/qemu_driver.c (qemuDomainBlockRebase): Allow REUSE_EXT flag. (qemuDomainBlockCopy): Wire up flag, and add some sanity checks. (qemuDomainSnapshotDiskPrepare): Require destination on REUSE_EXT. --- src/qemu/qemu_driver.c | 48 +++++++++++++++++++++++++++++++++++++++++++----- 1 files changed, 43 insertions(+), 5 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 6d5a5da..661ccb4 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -9812,6 +9812,11 @@ qemuDomainSnapshotDiskPrepare(virDomainObjPtr vm, virDomainSnapshotDefPtr def, _("unable to stat for disk %s: %s"), disk->name, disk->file); goto cleanup; + } else if (allow_reuse) { + virReportSystemError(errno, + _("missing existing file for disk %s: %s"), + disk->name, disk->file); + goto cleanup; } } else if (!(S_ISBLK(st.st_mode) || !st.st_size || allow_reuse)) { qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, @@ -11889,9 +11894,11 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, virDomainDiskDefPtr disk; int ret = -1; int idx; + struct stat st;
/* Preliminaries: find the disk we are editing, sanity checks */ - virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW, -1); + virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW | + VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT, -1);
qemuDriverLock(driver); virUUIDFormat(dom->uuid, uuidstr); @@ -11936,12 +11943,42 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, goto endjob; }
+ /* XXX this is pessimistic; we could use 'query-block' or even + * keep track of the backing chain ourselves, rather than assuming + * that all non-raw source files have a current backing image */ + if ((flags & VIR_DOMAIN_BLOCK_REBASE_SHALLOW) && + STREQ_NULLABLE(format, "raw") && + STRNEQ_NULLABLE(disk->driverType, "raw")) { + qemuReportError(VIR_ERR_INVALID_ARG, + _("raw shallow copy of non-raw disk '%s' not possible"), + disk->dst); + goto endjob; + } + /* Prepare the destination file. */ + if (stat(dest, &st) < 0) { + if (errno != ENOENT) { + virReportSystemError(errno, _("unable to stat for disk %s: %s"), + disk->dst, dest); + goto endjob; + } else if (flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT) { + virReportSystemError(errno, + _("missing destination file for disk %s: %s"), + disk->dst, dest); + goto endjob; + } + } else if (!(S_ISBLK(st.st_mode) || !st.st_size || + (flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT))) {
Eh, this is quite hard to parse, I'd move the negation inside; if (!S_ISBLK(st.st_mode) && st.st_size && !(flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT))
+ qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("external destination file for disk %s already " + "exists and is not a block device: %s"), + disk->dst, dest); + goto endjob; + } + /* XXX We also need to add security labeling, lock manager lease, - * and auditing of those events, as well as to support reuse of - * existing images, including probing the existing format of an - * existing image. */ - if (!format) + * and auditing of those events. */ + if (!format && !(flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)) format = disk->driverType; if ((format && !(disk->mirrorFormat = strdup(format))) || !(disk->mirror = strdup(dest))) { @@ -11980,6 +12017,7 @@ qemuDomainBlockRebase(virDomainPtr dom, const char *path, const char *base, unsigned long bandwidth, unsigned int flags) { virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW | + VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT | VIR_DOMAIN_BLOCK_REBASE_COPY | VIR_DOMAIN_BLOCK_REBASE_COPY_RAW, -1);
OK Jirka

This copies heavily from qemuDomainSnapshotCreateSingleDiskActive(), in order to set the SELinux label, obtain locking manager lease, and audit the fact that we hand a new file over to qemu. Alas, releasing the lease and label on failure or at the end of the mirroring is a trickier prospect. * src/qemu/qemu_driver.c (qemuDomainBlockCopy): Set up labeling. --- src/qemu/qemu_driver.c | 63 +++++++++++++++++++++++++++++++++++++++++------- 1 files changed, 54 insertions(+), 9 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 661ccb4..41f545f 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11895,6 +11895,11 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, int ret = -1; int idx; struct stat st; + bool need_unlink = false; + char *mirror = NULL; + char *mirrorFormat = NULL; + char *origsrc = NULL; + char *origdriver = NULL; /* Preliminaries: find the disk we are editing, sanity checks */ virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW | @@ -11976,29 +11981,69 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, goto endjob; } - /* XXX We also need to add security labeling, lock manager lease, - * and auditing of those events. */ - if (!format && !(flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)) - format = disk->driverType; - if ((format && !(disk->mirrorFormat = strdup(format))) || - !(disk->mirror = strdup(dest))) { + if (!(flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)) { + int fd = qemuOpenFile(driver, dest, O_WRONLY | O_TRUNC | O_CREAT, + &need_unlink, NULL); + if (fd < 0) + goto endjob; + VIR_FORCE_CLOSE(fd); + if (!format) + format = disk->driverType; + } + if ((format && !(mirrorFormat = strdup(format))) || + !(mirror = strdup(dest))) { virReportOOMError(); goto endjob; } + /* Manipulate disk in place, in a way that can be reverted on + * failure, in order to set up labeling and locking. */ + origsrc = disk->src; + disk->src = (char *) dest; + origdriver = disk->driverType; + disk->driverType = (char *) "raw"; /* Don't want to probe backing files */ + + if (virDomainLockDiskAttach(driver->lockManager, vm, disk) < 0) + goto endjob; + if (virSecurityManagerSetImageLabel(driver->securityManager, vm->def, + disk) < 0) { + if (virDomainLockDiskDetach(driver->lockManager, vm, disk) < 0) + VIR_WARN("Unable to release lock on %s", dest); + goto endjob; + } + + disk->src = origsrc; + origsrc = NULL; + disk->driverType = origdriver; + origdriver = NULL; + /* Actually start the mirroring */ qemuDomainObjEnterMonitorWithDriver(driver, vm); ret = qemuMonitorDriveMirror(priv->mon, NULL, device, dest, format, flags); + virDomainAuditDisk(vm, NULL, dest, "mirror", ret >= 0); if (ret == 0 && bandwidth != 0) ret = qemuMonitorBlockJob(priv->mon, device, NULL, bandwidth, NULL, BLOCK_JOB_SPEED_INTERNAL); qemuDomainObjExitMonitorWithDriver(driver, vm); + if (ret < 0) + goto endjob; + + /* Update vm in place to match changes. */ + need_unlink = false; + disk->mirror = mirror; + disk->mirrorFormat = mirrorFormat; + mirror = NULL; + mirrorFormat = NULL; endjob: - if (ret < 0) { - VIR_FREE(disk->mirror); - VIR_FREE(disk->mirrorFormat); + if (origsrc) { + disk->src = origsrc; + disk->driverType = origdriver; } + if (need_unlink && unlink(dest)) + VIR_WARN("unable to unlink just-created %s", dest); + VIR_FREE(mirror); + VIR_FREE(mirrorFormat); if (qemuDomainObjEndJob(driver, vm) == 0) { vm = NULL; goto cleanup; -- 1.7.7.6

On Mon, Apr 09, 2012 at 21:52:23 -0600, Eric Blake wrote:
This copies heavily from qemuDomainSnapshotCreateSingleDiskActive(), in order to set the SELinux label, obtain locking manager lease, and audit the fact that we hand a new file over to qemu. Alas, releasing the lease and label on failure or at the end of the mirroring is a trickier prospect.
* src/qemu/qemu_driver.c (qemuDomainBlockCopy): Set up labeling. --- src/qemu/qemu_driver.c | 63 +++++++++++++++++++++++++++++++++++++++++------- 1 files changed, 54 insertions(+), 9 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 661ccb4..41f545f 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11895,6 +11895,11 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, int ret = -1; int idx; struct stat st; + bool need_unlink = false; + char *mirror = NULL; + char *mirrorFormat = NULL; + char *origsrc = NULL; + char *origdriver = NULL;
/* Preliminaries: find the disk we are editing, sanity checks */ virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW | @@ -11976,29 +11981,69 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, goto endjob; }
- /* XXX We also need to add security labeling, lock manager lease, - * and auditing of those events. */ - if (!format && !(flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)) - format = disk->driverType; - if ((format && !(disk->mirrorFormat = strdup(format))) || - !(disk->mirror = strdup(dest))) { + if (!(flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)) { + int fd = qemuOpenFile(driver, dest, O_WRONLY | O_TRUNC | O_CREAT, + &need_unlink, NULL); + if (fd < 0) + goto endjob; + VIR_FORCE_CLOSE(fd); + if (!format) + format = disk->driverType; + } + if ((format && !(mirrorFormat = strdup(format))) || + !(mirror = strdup(dest))) { virReportOOMError(); goto endjob; }
+ /* Manipulate disk in place, in a way that can be reverted on + * failure, in order to set up labeling and locking. */ + origsrc = disk->src; + disk->src = (char *) dest; + origdriver = disk->driverType; + disk->driverType = (char *) "raw"; /* Don't want to probe backing files */ + + if (virDomainLockDiskAttach(driver->lockManager, vm, disk) < 0) + goto endjob; + if (virSecurityManagerSetImageLabel(driver->securityManager, vm->def, + disk) < 0) { + if (virDomainLockDiskDetach(driver->lockManager, vm, disk) < 0) + VIR_WARN("Unable to release lock on %s", dest); + goto endjob; + }
Is there a reason for calling virDomainLockDiskDetach only when SetImageLabel fails and not calling it if mirroring itself fails?
+ + disk->src = origsrc; + origsrc = NULL; + disk->driverType = origdriver; + origdriver = NULL; + /* Actually start the mirroring */ qemuDomainObjEnterMonitorWithDriver(driver, vm); ret = qemuMonitorDriveMirror(priv->mon, NULL, device, dest, format, flags); + virDomainAuditDisk(vm, NULL, dest, "mirror", ret >= 0); if (ret == 0 && bandwidth != 0) ret = qemuMonitorBlockJob(priv->mon, device, NULL, bandwidth, NULL, BLOCK_JOB_SPEED_INTERNAL); qemuDomainObjExitMonitorWithDriver(driver, vm); + if (ret < 0) + goto endjob; + + /* Update vm in place to match changes. */ + need_unlink = false; + disk->mirror = mirror; + disk->mirrorFormat = mirrorFormat; + mirror = NULL; + mirrorFormat = NULL;
endjob: - if (ret < 0) { - VIR_FREE(disk->mirror); - VIR_FREE(disk->mirrorFormat); + if (origsrc) { + disk->src = origsrc; + disk->driverType = origdriver; } + if (need_unlink && unlink(dest)) + VIR_WARN("unable to unlink just-created %s", dest); + VIR_FREE(mirror); + VIR_FREE(mirrorFormat); if (qemuDomainObjEndJob(driver, vm) == 0) { vm = NULL; goto cleanup;
Jirka

On 04/13/2012 02:45 PM, Jiri Denemark wrote:
On Mon, Apr 09, 2012 at 21:52:23 -0600, Eric Blake wrote:
This copies heavily from qemuDomainSnapshotCreateSingleDiskActive(), in order to set the SELinux label, obtain locking manager lease, and audit the fact that we hand a new file over to qemu. Alas, releasing the lease and label on failure or at the end of the mirroring is a trickier prospect.
+ + if (virDomainLockDiskAttach(driver->lockManager, vm, disk) < 0) + goto endjob; + if (virSecurityManagerSetImageLabel(driver->securityManager, vm->def, + disk) < 0) { + if (virDomainLockDiskDetach(driver->lockManager, vm, disk) < 0) + VIR_WARN("Unable to release lock on %s", dest); + goto endjob; + }
Is there a reason for calling virDomainLockDiskDetach only when SetImageLabel fails and not calling it if mirroring itself fails?
Simplicity of code - it's easier to code up permissions granting (and leaking that) then it is to figure out which permissions need to be reversed. I'll attempt to do a better job in v5, at least when mirroring fails. But revoking permissions after 'drive-reopen' is a bear - you can't do it immediately after the 'drive-reopen' command, but have to wait until after the BLOCK_JOB_COMPLETED event before you know that qemu no longer needs the file that you are about to revoke rights on. And if libvirtd restarts in between the 'drive-reopen' command and the eventual event from qemu, then we have to store state in the XML to remind ourselves to do 'query-block' and 'query-block-job' when reconnecting to the domain to figure out what state qemu is still in. In fact, Paolo just made a proposal to upstream qemu today that would add another aspect into the cleanup arena: https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg01638.html the idea there is that libvirt would be responsible for creating a local file (under /var/lib/libvirt/qemu, alongside the domain XML), and qemu writes one byte to the file when the drive-reopen actually alters state, so that part of the libvirtd restart recovery process is to check whether the file is empty (the reopen didn't happen) or has contents (the reopen succeeded, start the cleanup work for any resources no longer appropriate to qemu). I'll have to figure out how much of that I can interact with in my v5 series. In short, I figured it was better to have code that could at least demonstrate the API, even if it leaks, and work on plugging the leaks as I have more time, than it was to wait for perfect code but miss out on review and integration time. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Fri, Apr 13, 2012 at 14:59:32 -0600, Eric Blake wrote:
On 04/13/2012 02:45 PM, Jiri Denemark wrote:
On Mon, Apr 09, 2012 at 21:52:23 -0600, Eric Blake wrote:
This copies heavily from qemuDomainSnapshotCreateSingleDiskActive(), in order to set the SELinux label, obtain locking manager lease, and audit the fact that we hand a new file over to qemu. Alas, releasing the lease and label on failure or at the end of the mirroring is a trickier prospect.
+ + if (virDomainLockDiskAttach(driver->lockManager, vm, disk) < 0) + goto endjob; + if (virSecurityManagerSetImageLabel(driver->securityManager, vm->def, + disk) < 0) { + if (virDomainLockDiskDetach(driver->lockManager, vm, disk) < 0) + VIR_WARN("Unable to release lock on %s", dest); + goto endjob; + }
Is there a reason for calling virDomainLockDiskDetach only when SetImageLabel fails and not calling it if mirroring itself fails?
Simplicity of code - it's easier to code up permissions granting (and leaking that) then it is to figure out which permissions need to be reversed. I'll attempt to do a better job in v5, at least when mirroring fails. But revoking permissions after 'drive-reopen' is a bear - you can't do it immediately after the 'drive-reopen' command, but
But there's no drive-reopen involved here. We only issue drive-mirror command that starts the process. And if it fails, i.e., we don't even start copying data, we don't call virDomainLockDiskDetach to undo virDomainLockDiskAttach. It's possible we don't need to call that because the file is unlinked in the end, which is sufficient to remove the lock but I'm not sure and that's why I'm asking :-) Jirka

On 04/13/2012 03:15 PM, Jiri Denemark wrote:
Is there a reason for calling virDomainLockDiskDetach only when SetImageLabel fails and not calling it if mirroring itself fails?
Simplicity of code - it's easier to code up permissions granting (and leaking that) then it is to figure out which permissions need to be reversed. I'll attempt to do a better job in v5, at least when mirroring fails. But revoking permissions after 'drive-reopen' is a bear - you can't do it immediately after the 'drive-reopen' command, but
But there's no drive-reopen involved here. We only issue drive-mirror command that starts the process. And if it fails, i.e., we don't even start copying data, we don't call virDomainLockDiskDetach to undo virDomainLockDiskAttach. It's possible we don't need to call that because the file is unlinked in the end, which is sufficient to remove the lock but I'm not sure and that's why I'm asking :-)
I'm not sure whether virDomainSnapshotCreateXML shares the same bug, or whether when I converted things over to the 'transaction' command if I got it fixed. At any rate, you're right that I should at least clean up this instance, since we know the failure path of drive-mirror is a lot simpler than the success path of drive-reopen. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

RHEL-only drive-mirror and drive-reopen are still under upstream qemu discussion; as a result, RHEL decided to backport things under a downstream name. Accommodate this alternate spelling. I don't think it's worth trying to support both spellings at once: if you build upstream libvirt on RHEL, you lose out on the feature, but then you are also capable of building upstream qemu for RHEL to reinstate the feature (that is, if you use RHEL, you should either stick to the distro patches, or you are assumed to be capable of building the entire virt stack yourself). * src/qemu/qemu_monitor_json.c (qemuMonitorJSONCheckCommands): Check for alternate spelling. (int qemuMonitorJSONDriveMirror, qemuMonitorJSONDriveReopen): Use that spelling. --- src/qemu/qemu_monitor_json.c | 8 ++++---- 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 8475dab..27bed97 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -975,9 +975,9 @@ qemuMonitorJSONCheckCommands(qemuMonitorPtr mon, qemuCapsSet(qemuCaps, QEMU_CAPS_WAKEUP); else if (STREQ(name, "transaction")) qemuCapsSet(qemuCaps, QEMU_CAPS_TRANSACTION); - else if (STREQ(name, "drive-mirror")) + else if (STREQ(name, "__com.redhat_drive-mirror")) qemuCapsSet(qemuCaps, QEMU_CAPS_DRIVE_MIRROR); - else if (STREQ(name, "drive-reopen")) + else if (STREQ(name, "__com.redhat_drive-reopen")) qemuCapsSet(qemuCaps, QEMU_CAPS_DRIVE_REOPEN); } @@ -3203,7 +3203,7 @@ qemuMonitorJSONDriveMirror(qemuMonitorPtr mon ATTRIBUTE_UNUSED, bool reuse = (flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT) != 0; cmd = qemuMonitorJSONMakeCommandRaw(actions != NULL, - "drive-mirror", + "__com.redhat_drive-mirror", "s:device", device, "s:target", file, "b:full", !shallow, @@ -3269,7 +3269,7 @@ qemuMonitorJSONDriveReopen(qemuMonitorPtr mon, const char *device, virJSONValuePtr cmd; virJSONValuePtr reply = NULL; - cmd = qemuMonitorJSONMakeCommand("drive-reopen", + cmd = qemuMonitorJSONMakeCommand("__com.redhat_drive-reopen", "s:device", device, "s:new-image-file", file, format ? "s:format" : NULL, format, -- 1.7.7.6

On 04/09/2012 09:52 PM, Eric Blake wrote:
RHEL-only
drive-mirror and drive-reopen are still under upstream qemu discussion; as a result, RHEL decided to backport things under a downstream name. Accommodate this alternate spelling. I don't think it's worth trying to support both spellings at once: if you build upstream libvirt on RHEL, you lose out on the feature, but then you are also capable of building upstream qemu for RHEL to reinstate the feature (that is, if you use RHEL, you should either stick to the distro patches, or you are assumed to be capable of building the entire virt stack yourself).
* src/qemu/qemu_monitor_json.c (qemuMonitorJSONCheckCommands): Check for alternate spelling. (int qemuMonitorJSONDriveMirror, qemuMonitorJSONDriveReopen): Use that spelling. --- src/qemu/qemu_monitor_json.c | 8 ++++---- 1 files changed, 4 insertions(+), 4 deletions(-)
Just a heads-up for those following this thread: Paolo reposted a proposal for the 'drive-mirror' job that might make it into qemu 1.1: https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg01630.html In particular, the upstream proposal does _not_ allow for 'drive-mirror' as part of transaction; that would have to wait for qemu 1.2. As a result, my earlier proposal for supporting snapshot+mirror is not possible with the upstream qemu 1.1 proposal: https://www.redhat.com/archives/libvir-list/2012-March/msg01033.html However, the backport of __com.redhat_drive-mirror into RHEL 6.3 was based on the earlier versions that were proposing the blkmirror driver instead of a block job implementation, and since snapshot+mirror was first developed against a potential RHEL build, Paolo has made an effort to keep things working there, so I am still maintaining that patch series and will post a rebase of the series on top of my blockjob patches. Of course, the snapshot+mirror won't be committed upstream without upstream qemu support, but at least it will provide a comparison between the two live storage migration proposals, as well as something that could still be backported to RHEL. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

This new API provides additional flexibility over what can be crammed on top of virDomainBlockRebase (namely, the ability to specify an arbitrary destination format, for things like copying qcow2 into qed without having to pre-create the destination), at the expense that it cannot be backported without bumping the .so version. * include/libvirt/libvirt.h.in (virDomainBlockCopy): New API. * src/libvirt.c (virDomainBlockCopy): Implement it. * src/libvirt_public.syms (LIBVIRT_0.9.12): Export it. * src/driver.h (virDrvDomainBlockCopy): New driver callback. * docs/apibuild.py (CParser.parseSignature): Add exception. --- docs/apibuild.py | 1 + include/libvirt/libvirt.h.in | 18 ++++++- src/driver.h | 5 ++ src/libvirt.c | 119 +++++++++++++++++++++++++++++++++++++++++- src/libvirt_public.syms | 5 ++ 5 files changed, 145 insertions(+), 3 deletions(-) diff --git a/docs/apibuild.py b/docs/apibuild.py index 1ac0281..bf06f3b 100755 --- a/docs/apibuild.py +++ b/docs/apibuild.py @@ -1650,6 +1650,7 @@ class CParser: "virDomainBlockJobSetSpeed" : (False, ("bandwidth")), "virDomainBlockPull" : (False, ("bandwidth")), "virDomainBlockRebase" : (False, ("bandwidth")), + "virDomainBlockCopy" : (False, ("bandwidth")), "virDomainMigrateGetMaxSpeed" : (False, ("bandwidth")) } def checkLongLegacyFunction(self, name, return_type, signature): diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index ac5df95..7cd5bbe 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -1937,7 +1937,7 @@ int virDomainUpdateDeviceFlags(virDomainPtr domain, * VIR_DOMAIN_BLOCK_JOB_TYPE_PULL: Block Pull (virDomainBlockPull, or * virDomainBlockRebase without flags), job ends on completion * VIR_DOMAIN_BLOCK_JOB_TYPE_COPY: Block Copy (virDomainBlockRebase with - * flags), job exists as long as mirroring is active + * flags, or virDomainBlockCopy), job exists as long as mirroring is active */ typedef enum { VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN = 0, @@ -2007,6 +2007,22 @@ int virDomainBlockRebase(virDomainPtr dom, const char *disk, const char *base, unsigned long bandwidth, unsigned int flags); +/** + * virDomainBlockCopyFlags: + * + * Flags available for virDomainBlockCopy(). + */ +typedef enum { + VIR_DOMAIN_BLOCK_COPY_SHALLOW = 1 << 0, /* Limit copy to top of source + backing chain */ + VIR_DOMAIN_BLOCK_COPY_REUSE_EXT = 1 << 1, /* Reuse existing external + file for a copy */ +} virDomainBlockCopyFlags; + +int virDomainBlockCopy(virDomainPtr dom, const char *disk, const char *dest, + const char *format, unsigned long bandwidth, + unsigned int flags); + /* Block I/O throttling support */ diff --git a/src/driver.h b/src/driver.h index 03d249b..e10ba71 100644 --- a/src/driver.h +++ b/src/driver.h @@ -788,6 +788,10 @@ typedef int (*virDrvDomainBlockRebase)(virDomainPtr dom, const char *path, const char *base, unsigned long bandwidth, unsigned int flags); +typedef int + (*virDrvDomainBlockCopy)(virDomainPtr dom, const char *path, + const char *dest, const char *format, + unsigned long bandwidth, unsigned int flags); typedef int (*virDrvSetKeepAlive)(virConnectPtr conn, @@ -1005,6 +1009,7 @@ struct _virDriver { virDrvDomainBlockJobSetSpeed domainBlockJobSetSpeed; virDrvDomainBlockPull domainBlockPull; virDrvDomainBlockRebase domainBlockRebase; + virDrvDomainBlockCopy domainBlockCopy; virDrvSetKeepAlive setKeepAlive; virDrvConnectIsAlive isAlive; virDrvNodeSuspendForDuration nodeSuspendForDuration; diff --git a/src/libvirt.c b/src/libvirt.c index 753a2e0..bdde654 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -18263,7 +18263,11 @@ error: * return an error if bandwidth is not 0. * * When @base is NULL and @flags is 0, this is identical to - * virDomainBlockPull(). + * virDomainBlockPull(). Conversely, when @flags includes + * VIR_DOMAIN_BLOCK_REBASE_COPY, this is shorthand for + * virDomainBlockCopy(dom, disk, base, + * flags & VIR_DOMAIN_BLOCK_COPY_RAW ? "raw" : NULL, bandwidth, + * flags & (VIR_DOMAIN_BLOCK_REBASE_SHALLOW|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)). * * Returns 0 if the operation has started, -1 on failure. */ @@ -18273,7 +18277,7 @@ int virDomainBlockRebase(virDomainPtr dom, const char *disk, { virConnectPtr conn; - VIR_DOMAIN_DEBUG(dom, "disk=%s, base=%s bandwidth=%lu, flags=%x", + VIR_DOMAIN_DEBUG(dom, "disk=%s, base=%s, bandwidth=%lu, flags=%x", disk, NULLSTR(base), bandwidth, flags); virResetLastError(); @@ -18328,6 +18332,117 @@ error: /** + * virDomainBlockCopy: + * @dom: pointer to domain object + * @disk: path to the block device, or device shorthand + * @dest: path to the copy destination + * @format: format of the destination + * @bandwidth: (optional) specify copy bandwidth limit in Mbps + * @flags: bitwise-OR of virDomainBlockCopyFlags + * + * Start a block copy job, where @dest is the name of a new file to copy + * the chain to. By default, the copy will pull the entire source chain + * into the destination file, but if @flags also contains + * VIR_DOMAIN_BLOCK_COPY_SHALLOW, then only the top of the source chain + * will be copied (the source and destination have a common backing file). + * By default, @dest will be created with the same file format as the + * source, but this can be altered by passing a non-NULL @format argument, + * or by using VIR_DOMAIN_BLOCK_COPY_REUSE_EXT to reuse an existing file + * with initial contents identical to the backing file of the source (this + * allows a management app to pre-create files with relative backing file + * names, rather than the default of absolute backing file names; it is + * generally used with the shallow flag, since otherwise the destination + * file must start with empty contents). + * + * A copy job has two parts; in the first phase, the @bandwidth parameter + * affects how fast the source is pulled into the destination, and the job + * can only be canceled by reverting to the source file; progress in this + * phase can be tracked via the virDomainBlockJobInfo() command, with a + * job type of VIR_DOMAIN_BLOCK_JOB_TYPE_COPY. The job transitions to the + * second phase when the job info states cur == end, and remains alive to + * mirror all further changes to both source and destination. The user + * must call virDomainBlockJobAbort() to end the mirroring while choosing + * whether to revert to source or pivot to the destination. An event is + * issued when the job ends, and in the future, an event may be added when + * the job transitions from pulling to mirroring. + * + * Some hypervisors will restrict certain actions, such as virDomainSave() + * or virDomainDetachDevice(), while a copy job is active; they may + * also restrict a copy job to transient domains. + * + * The @disk parameter is either an unambiguous source name of the + * block device (the <source file='...'/> sub-element, such as + * "/path/to/image"), or the device target shorthand (the + * <target dev='...'/> sub-element, such as "xvda"). Valid names + * can be found by calling virDomainGetXMLDesc() and inspecting + * elements within //domain/devices/disk. + * + * The maximum bandwidth (in Mbps) that will be used to do the copy can be + * specified with the bandwidth parameter. If set to 0, libvirt will choose a + * suitable default. Some hypervisors do not support this feature and will + * return an error if bandwidth is not 0. + * + * When @format is NULL, this is equivalent to calling + * virDomainBlockRebase() with the VIR_DOMAIN_BLOCK_REBASE_COPY flag added + * to @flags. Additionally, if @format is "raw", this is equivalent to + * calling virDomainBlockRebase() with the VIR_DOMAIN_BLOCK_REBASE_COPY + * and VIR_DOMAIN_BLOCK_REBASE_COPY_RAW flags added to @flags. + * + * Returns 0 if the operation has started, -1 on failure. + */ +int virDomainBlockCopy(virDomainPtr dom, const char *disk, + const char *dest, const char *format, + unsigned long bandwidth, unsigned int flags) +{ + virConnectPtr conn; + + VIR_DOMAIN_DEBUG(dom, + "disk=%s, dest=%s, format=%s, bandwidth=%lu, flags=%x", + disk, dest, NULLSTR(format), bandwidth, flags); + + virResetLastError(); + + if (!VIR_IS_CONNECTED_DOMAIN (dom)) { + virLibDomainError(VIR_ERR_INVALID_DOMAIN, __FUNCTION__); + virDispatchError(NULL); + return -1; + } + conn = dom->conn; + + if (dom->conn->flags & VIR_CONNECT_RO) { + virLibDomainError(VIR_ERR_OPERATION_DENIED, __FUNCTION__); + goto error; + } + + if (!disk) { + virLibDomainError(VIR_ERR_INVALID_ARG, + _("disk is NULL")); + goto error; + } + if (!dest) { + virLibDomainError(VIR_ERR_INVALID_ARG, + _("dest is NULL")); + goto error; + } + + if (conn->driver->domainBlockCopy) { + int ret; + ret = conn->driver->domainBlockCopy(dom, disk, dest, format, bandwidth, + flags); + if (ret < 0) + goto error; + return ret; + } + + virLibDomainError(VIR_ERR_NO_SUPPORT, __FUNCTION__); + +error: + virDispatchError(dom->conn); + return -1; +} + + +/** * virDomainOpenGraphics: * @dom: pointer to domain object * @idx: index of graphics config to open diff --git a/src/libvirt_public.syms b/src/libvirt_public.syms index 46c13fb..d152ab9 100644 --- a/src/libvirt_public.syms +++ b/src/libvirt_public.syms @@ -534,4 +534,9 @@ LIBVIRT_0.9.11 { virDomainPMWakeup; } LIBVIRT_0.9.10; +LIBVIRT_0.9.12 { + global: + virDomainBlockCopy; +} LIBVIRT_0.9.11; + # .... define new API here using predicted next version number .... -- 1.7.7.6

On Mon, Apr 09, 2012 at 21:52:25 -0600, Eric Blake wrote:
This new API provides additional flexibility over what can be crammed on top of virDomainBlockRebase (namely, the ability to specify an arbitrary destination format, for things like copying qcow2 into qed without having to pre-create the destination), at the expense that it cannot be backported without bumping the .so version.
* include/libvirt/libvirt.h.in (virDomainBlockCopy): New API. * src/libvirt.c (virDomainBlockCopy): Implement it. * src/libvirt_public.syms (LIBVIRT_0.9.12): Export it. * src/driver.h (virDrvDomainBlockCopy): New driver callback. * docs/apibuild.py (CParser.parseSignature): Add exception. --- docs/apibuild.py | 1 + include/libvirt/libvirt.h.in | 18 ++++++- src/driver.h | 5 ++ src/libvirt.c | 119 +++++++++++++++++++++++++++++++++++++++++- src/libvirt_public.syms | 5 ++ 5 files changed, 145 insertions(+), 3 deletions(-)
diff --git a/docs/apibuild.py b/docs/apibuild.py index 1ac0281..bf06f3b 100755 --- a/docs/apibuild.py +++ b/docs/apibuild.py @@ -1650,6 +1650,7 @@ class CParser: "virDomainBlockJobSetSpeed" : (False, ("bandwidth")), "virDomainBlockPull" : (False, ("bandwidth")), "virDomainBlockRebase" : (False, ("bandwidth")), + "virDomainBlockCopy" : (False, ("bandwidth")), "virDomainMigrateGetMaxSpeed" : (False, ("bandwidth")) }
def checkLongLegacyFunction(self, name, return_type, signature): diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index ac5df95..7cd5bbe 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -1937,7 +1937,7 @@ int virDomainUpdateDeviceFlags(virDomainPtr domain, * VIR_DOMAIN_BLOCK_JOB_TYPE_PULL: Block Pull (virDomainBlockPull, or * virDomainBlockRebase without flags), job ends on completion * VIR_DOMAIN_BLOCK_JOB_TYPE_COPY: Block Copy (virDomainBlockRebase with - * flags), job exists as long as mirroring is active + * flags, or virDomainBlockCopy), job exists as long as mirroring is active */ typedef enum { VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN = 0, @@ -2007,6 +2007,22 @@ int virDomainBlockRebase(virDomainPtr dom, const char *disk, const char *base, unsigned long bandwidth, unsigned int flags);
+/** + * virDomainBlockCopyFlags: + * + * Flags available for virDomainBlockCopy(). + */ +typedef enum { + VIR_DOMAIN_BLOCK_COPY_SHALLOW = 1 << 0, /* Limit copy to top of source + backing chain */ + VIR_DOMAIN_BLOCK_COPY_REUSE_EXT = 1 << 1, /* Reuse existing external + file for a copy */ +} virDomainBlockCopyFlags;
Hmm, we have several flags enums that end up being passed to a single internal API and one has to be extra careful when adding new flags. Should we make a note about this to the affected enums?
+ +int virDomainBlockCopy(virDomainPtr dom, const char *disk, const char *dest, + const char *format, unsigned long bandwidth, + unsigned int flags); +
/* Block I/O throttling support */
diff --git a/src/driver.h b/src/driver.h index 03d249b..e10ba71 100644 --- a/src/driver.h +++ b/src/driver.h @@ -788,6 +788,10 @@ typedef int (*virDrvDomainBlockRebase)(virDomainPtr dom, const char *path, const char *base, unsigned long bandwidth, unsigned int flags); +typedef int + (*virDrvDomainBlockCopy)(virDomainPtr dom, const char *path, + const char *dest, const char *format, + unsigned long bandwidth, unsigned int flags);
typedef int (*virDrvSetKeepAlive)(virConnectPtr conn, @@ -1005,6 +1009,7 @@ struct _virDriver { virDrvDomainBlockJobSetSpeed domainBlockJobSetSpeed; virDrvDomainBlockPull domainBlockPull; virDrvDomainBlockRebase domainBlockRebase; + virDrvDomainBlockCopy domainBlockCopy; virDrvSetKeepAlive setKeepAlive; virDrvConnectIsAlive isAlive; virDrvNodeSuspendForDuration nodeSuspendForDuration; diff --git a/src/libvirt.c b/src/libvirt.c index 753a2e0..bdde654 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -18263,7 +18263,11 @@ error: * return an error if bandwidth is not 0. * * When @base is NULL and @flags is 0, this is identical to - * virDomainBlockPull(). + * virDomainBlockPull(). Conversely, when @flags includes + * VIR_DOMAIN_BLOCK_REBASE_COPY, this is shorthand for + * virDomainBlockCopy(dom, disk, base, + * flags & VIR_DOMAIN_BLOCK_COPY_RAW ? "raw" : NULL, bandwidth, + * flags & (VIR_DOMAIN_BLOCK_REBASE_SHALLOW|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)). * * Returns 0 if the operation has started, -1 on failure. */ @@ -18273,7 +18277,7 @@ int virDomainBlockRebase(virDomainPtr dom, const char *disk, { virConnectPtr conn;
- VIR_DOMAIN_DEBUG(dom, "disk=%s, base=%s bandwidth=%lu, flags=%x", + VIR_DOMAIN_DEBUG(dom, "disk=%s, base=%s, bandwidth=%lu, flags=%x", disk, NULLSTR(base), bandwidth, flags);
virResetLastError(); @@ -18328,6 +18332,117 @@ error:
/** + * virDomainBlockCopy: + * @dom: pointer to domain object + * @disk: path to the block device, or device shorthand + * @dest: path to the copy destination + * @format: format of the destination + * @bandwidth: (optional) specify copy bandwidth limit in Mbps + * @flags: bitwise-OR of virDomainBlockCopyFlags
OK, so this new API may be used to avoid format guessing involved in virDomainBlockRebase. Shouldn't we introduce an enhanced version of virDomainBlockRebase with format parameter instead of introducing an API with a different name that does almost the same as virDomainBlockRebase? Jirka

On 04/13/2012 03:25 PM, Jiri Denemark wrote:
On Mon, Apr 09, 2012 at 21:52:25 -0600, Eric Blake wrote:
This new API provides additional flexibility over what can be crammed on top of virDomainBlockRebase (namely, the ability to specify an arbitrary destination format, for things like copying qcow2 into qed without having to pre-create the destination), at the expense that it cannot be backported without bumping the .so version.
+typedef enum { + VIR_DOMAIN_BLOCK_COPY_SHALLOW = 1 << 0, /* Limit copy to top of source + backing chain */ + VIR_DOMAIN_BLOCK_COPY_REUSE_EXT = 1 << 1, /* Reuse existing external + file for a copy */ +} virDomainBlockCopyFlags;
Hmm, we have several flags enums that end up being passed to a single internal API and one has to be extra careful when adding new flags. Should we make a note about this to the affected enums?
Yes, a note would be helpful (it's only the two least-significant bits that we are playing fast and loose with at the moment, but if we ever expand to a third common bit, I see where it could get confusing).
/** + * virDomainBlockCopy: + * @dom: pointer to domain object + * @disk: path to the block device, or device shorthand + * @dest: path to the copy destination + * @format: format of the destination + * @bandwidth: (optional) specify copy bandwidth limit in Mbps + * @flags: bitwise-OR of virDomainBlockCopyFlags
OK, so this new API may be used to avoid format guessing involved in virDomainBlockRebase. Shouldn't we introduce an enhanced version of virDomainBlockRebase with format parameter instead of introducing an API with a different name that does almost the same as virDomainBlockRebase?
And what would you name it? I'm saying that virDomainBlockCopy _is_ an enhanced virDomainBlockRebase, and the name BlockCopy was the name I picked, as it looks nicer than virDomainBlockRebase2(). I'm also debating whether to add a 'const char *base' argument, to allow a copy of a partial chain. That is, starting from: base <- snap1 <- snap2 virDomainBlockRebase can only give you: copy (flat image, no backing file, by omitting _SHALLOW) base <- snap1 <- copy (use _SHALLOW, topmost backing file is shared) and if you want to get to: base <- copy you currently have to call virDomainBlockRebase _twice_, with the following API sequence: virDomainBlockRebase(dom, disk, "copy", bandwidth, VIR_DOMAIN_BLOCK_REBASE_COPY | VIR_DOMAIN_BLOCK_REBASE_SHALLOW); virDomainBlockJobAbort(dom, disk, VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT); virDomainBlockRebase(dom, disk, "base", bandwidth, 0); But you could theoretically do the same result in one less step, with an optional base argument, by calling: virDomainBlockCopy(dom, disk, "copy", "base", bandwidth, 0); virDomainBlockJobAbort(dom, disk, VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT); non-NULL base combined with the _SHALLOW flag would either be outright rejected, or else be useful to validate that the 'base' you are passing in really is the backing file of the source that you are copying from (or even be a way to signify whether you want an absolute or relative backing file name jammed into the destination file). But Paolo hasn't yet decided whether to commit to that optional argument in qemu yet (I originally had it in place for v1 of this series, back when Paolo was trying to add a blkmirror device instead of a block job for mirroring, but ripped it out along my way). I guess I should wait for more feedback on the qemu front before committing to the final form of this proposed libvirt API. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Fri, Apr 13, 2012 at 15:42:20 -0600, Eric Blake wrote:
On 04/13/2012 03:25 PM, Jiri Denemark wrote:
On Mon, Apr 09, 2012 at 21:52:25 -0600, Eric Blake wrote:
This new API provides additional flexibility over what can be crammed on top of virDomainBlockRebase (namely, the ability to specify an arbitrary destination format, for things like copying qcow2 into qed without having to pre-create the destination), at the expense that it cannot be backported without bumping the .so version.
+typedef enum { + VIR_DOMAIN_BLOCK_COPY_SHALLOW = 1 << 0, /* Limit copy to top of source + backing chain */ + VIR_DOMAIN_BLOCK_COPY_REUSE_EXT = 1 << 1, /* Reuse existing external + file for a copy */ +} virDomainBlockCopyFlags;
Hmm, we have several flags enums that end up being passed to a single internal API and one has to be extra careful when adding new flags. Should we make a note about this to the affected enums?
Yes, a note would be helpful (it's only the two least-significant bits that we are playing fast and loose with at the moment, but if we ever expand to a third common bit, I see where it could get confusing).
/** + * virDomainBlockCopy: + * @dom: pointer to domain object + * @disk: path to the block device, or device shorthand + * @dest: path to the copy destination + * @format: format of the destination + * @bandwidth: (optional) specify copy bandwidth limit in Mbps + * @flags: bitwise-OR of virDomainBlockCopyFlags
OK, so this new API may be used to avoid format guessing involved in virDomainBlockRebase. Shouldn't we introduce an enhanced version of virDomainBlockRebase with format parameter instead of introducing an API with a different name that does almost the same as virDomainBlockRebase?
And what would you name it? I'm saying that virDomainBlockCopy _is_ an enhanced virDomainBlockRebase, and the name BlockCopy was the name I picked, as it looks nicer than virDomainBlockRebase2().
I don't know, I was probably expecting something like virDomainBlockRebaseExt :-P I'm just missing a clear link between virDomainBlockRebase and virDomainBlockCopy. I guess a note to virDomainBlockRebase documentation mentioning virDomainBlockCopy as an enhanced version would work too.
I guess I should wait for more feedback on the qemu front before committing to the final form of this proposed libvirt API.
Yeah, I think that's wise. Jirka

On 04/16/2012 08:29 AM, Jiri Denemark wrote:
/** + * virDomainBlockCopy: + * @dom: pointer to domain object + * @disk: path to the block device, or device shorthand + * @dest: path to the copy destination + * @format: format of the destination + * @bandwidth: (optional) specify copy bandwidth limit in Mbps + * @flags: bitwise-OR of virDomainBlockCopyFlags
OK, so this new API may be used to avoid format guessing involved in virDomainBlockRebase. Shouldn't we introduce an enhanced version of virDomainBlockRebase with format parameter instead of introducing an API with a different name that does almost the same as virDomainBlockRebase?
And what would you name it? I'm saying that virDomainBlockCopy _is_ an enhanced virDomainBlockRebase, and the name BlockCopy was the name I picked, as it looks nicer than virDomainBlockRebase2().
I don't know, I was probably expecting something like virDomainBlockRebaseExt :-P I'm just missing a clear link between virDomainBlockRebase and virDomainBlockCopy. I guess a note to virDomainBlockRebase documentation mentioning virDomainBlockCopy as an enhanced version would work too.
But I already did that :)
+++ b/src/libvirt.c @@ -18263,7 +18263,11 @@ error: * return an error if bandwidth is not 0. * * When @base is NULL and @flags is 0, this is identical to - * virDomainBlockPull(). + * virDomainBlockPull(). Conversely, when @flags includes + * VIR_DOMAIN_BLOCK_REBASE_COPY, this is shorthand for + * virDomainBlockCopy(dom, disk, base, + * flags & VIR_DOMAIN_BLOCK_COPY_RAW ? "raw" : NULL, bandwidth, + * flags & (VIR_DOMAIN_BLOCK_REBASE_SHALLOW|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)).
BlockRebase linking to BlockPull...
/** + * virDomainBlockCopy: ... + * + * When @format is NULL, this is equivalent to calling + * virDomainBlockRebase() with the VIR_DOMAIN_BLOCK_REBASE_COPY flag added + * to @flags. Additionally, if @format is "raw", this is equivalent to + * calling virDomainBlockRebase() with the VIR_DOMAIN_BLOCK_REBASE_COPY + * and VIR_DOMAIN_BLOCK_REBASE_COPY_RAW flags added to @flags.
and BlockCopy linking to BlockRebase.
I guess I should wait for more feedback on the qemu front before committing to the final form of this proposed libvirt API.
Yeah, I think that's wise.
Still true, but I can at least post the patches for reference in v5, since I have been testing them. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On Mon, Apr 16, 2012 at 21:21:54 -0600, Eric Blake wrote:
On 04/16/2012 08:29 AM, Jiri Denemark wrote:
/** + * virDomainBlockCopy: + * @dom: pointer to domain object + * @disk: path to the block device, or device shorthand + * @dest: path to the copy destination + * @format: format of the destination + * @bandwidth: (optional) specify copy bandwidth limit in Mbps + * @flags: bitwise-OR of virDomainBlockCopyFlags
OK, so this new API may be used to avoid format guessing involved in virDomainBlockRebase. Shouldn't we introduce an enhanced version of virDomainBlockRebase with format parameter instead of introducing an API with a different name that does almost the same as virDomainBlockRebase?
And what would you name it? I'm saying that virDomainBlockCopy _is_ an enhanced virDomainBlockRebase, and the name BlockCopy was the name I picked, as it looks nicer than virDomainBlockRebase2().
I don't know, I was probably expecting something like virDomainBlockRebaseExt :-P I'm just missing a clear link between virDomainBlockRebase and virDomainBlockCopy. I guess a note to virDomainBlockRebase documentation mentioning virDomainBlockCopy as an enhanced version would work too.
But I already did that :)
Heh, indeed you did. Sorry for the noise ;) Jirka

Expose the full abilities of virDomainBlockCopy. * tools/virsh.c (blockJobImpl): Add --format option for block copy. * tools/virsh.pod (blockcopy): Document this. --- tools/virsh.c | 26 ++++++++++++++++++++------ tools/virsh.pod | 10 +++++----- 2 files changed, 25 insertions(+), 11 deletions(-) diff --git a/tools/virsh.c b/tools/virsh.c index 25403f5..8669a8a 100644 --- a/tools/virsh.c +++ b/tools/virsh.c @@ -7527,6 +7527,7 @@ blockJobImpl(vshControl *ctl, const vshCmd *cmd, unsigned long bandwidth = 0; int ret = -1; const char *base = NULL; + const char *format = NULL; unsigned int flags = 0; if (!vshConnectionUsability(ctl, ctl->conn)) @@ -7566,16 +7567,28 @@ blockJobImpl(vshControl *ctl, const vshCmd *cmd, ret = virDomainBlockPull(dom, path, bandwidth, 0); break; case VSH_CMD_BLOCK_JOB_COPY: - flags |= VIR_DOMAIN_BLOCK_REBASE_COPY; if (vshCommandOptBool(cmd, "shallow")) - flags |= VIR_DOMAIN_BLOCK_REBASE_SHALLOW; + flags |= VIR_DOMAIN_BLOCK_COPY_SHALLOW; if (vshCommandOptBool(cmd, "reuse-external")) - flags |= VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT; - if (vshCommandOptBool(cmd, "raw")) - flags |= VIR_DOMAIN_BLOCK_REBASE_COPY_RAW; + flags |= VIR_DOMAIN_BLOCK_COPY_REUSE_EXT; if (vshCommandOptString(cmd, "dest", &base) < 0) goto cleanup; - ret = virDomainBlockRebase(dom, path, base, bandwidth, flags); + if (vshCommandOptString(cmd, "format", &format) < 0) + goto cleanup; + if (!format) { + if (vshCommandOptBool(cmd, "raw")) + flags |= VIR_DOMAIN_BLOCK_REBASE_COPY_RAW; + flags |= VIR_DOMAIN_BLOCK_REBASE_COPY; + ret = virDomainBlockRebase(dom, path, base, bandwidth, flags); + } else { + if (vshCommandOptBool(cmd, "raw")) { + vshError(ctl, "%s", + _("--raw and --format are mutually exclusive")); + goto cleanup; + } + ret = virDomainBlockCopy(dom, path, base, format, + bandwidth, flags); + } } cleanup: @@ -7598,6 +7611,7 @@ static const vshCmdOptDef opts_block_copy[] = { {"path", VSH_OT_DATA, VSH_OFLAG_REQ, N_("Fully-qualified path of disk")}, {"dest", VSH_OT_DATA, VSH_OFLAG_REQ, N_("path of the copy to create")}, {"bandwidth", VSH_OT_DATA, VSH_OFLAG_NONE, N_("Bandwidth limit in MB/s")}, + {"format", VSH_OT_DATA, VSH_OFLAG_NONE, N_("file format of dest")}, {"shallow", VSH_OT_BOOL, 0, N_("make the copy share a backing chain")}, {"reuse-external", VSH_OT_BOOL, 0, N_("reuse existing destination")}, {"raw", VSH_OT_BOOL, 0, N_("use raw destination file")}, diff --git a/tools/virsh.pod b/tools/virsh.pod index ee84ce5..65e2429 100644 --- a/tools/virsh.pod +++ b/tools/virsh.pod @@ -639,7 +639,7 @@ address of virtual interface (such as I<detach-interface> or I<domif-setlink>) will accept the MAC address printed by this command. =item B<blockcopy> I<domain> I<path> I<dest> [I<bandwidth>] [I<--shallow>] -[I<--reuse-external>] [I<--raw>] +[I<--reuse-external>] { [I<--raw>] | I<format> } Copy a disk backing image chain to I<dest>. By default, this command flattens the entire chain; but if I<--shallow> is specified, the copy @@ -652,10 +652,10 @@ is used, otherwise it must start empty); this option is typically used to set up a relative backing file name in the destination. The format of the destination is determined by the first match in the -following list: if I<--raw> is specified, it will be raw; if -I<--reuse-external> is specified, the existing destination is probed -for a format; and in all other cases, the destination format will -match the source format. +following list: if I<--raw> is specified, it will be raw; if I<format> +is specified, it will have that format; if I<--reuse-external> is +specified, the existing destination is probed for a format; and in +all other cases, the destination format will match the source format. The copy runs in the background; initially, the job must copy all data from the source, and during this phase, the job can only be canceled to -- 1.7.7.6

On Mon, Apr 09, 2012 at 21:52:26 -0600, Eric Blake wrote:
Expose the full abilities of virDomainBlockCopy.
* tools/virsh.c (blockJobImpl): Add --format option for block copy. * tools/virsh.pod (blockcopy): Document this. --- tools/virsh.c | 26 ++++++++++++++++++++------ tools/virsh.pod | 10 +++++----- 2 files changed, 25 insertions(+), 11 deletions(-)
diff --git a/tools/virsh.c b/tools/virsh.c index 25403f5..8669a8a 100644 --- a/tools/virsh.c +++ b/tools/virsh.c @@ -7527,6 +7527,7 @@ blockJobImpl(vshControl *ctl, const vshCmd *cmd, unsigned long bandwidth = 0; int ret = -1; const char *base = NULL; + const char *format = NULL; unsigned int flags = 0;
if (!vshConnectionUsability(ctl, ctl->conn)) @@ -7566,16 +7567,28 @@ blockJobImpl(vshControl *ctl, const vshCmd *cmd, ret = virDomainBlockPull(dom, path, bandwidth, 0); break; case VSH_CMD_BLOCK_JOB_COPY: - flags |= VIR_DOMAIN_BLOCK_REBASE_COPY; if (vshCommandOptBool(cmd, "shallow")) - flags |= VIR_DOMAIN_BLOCK_REBASE_SHALLOW; + flags |= VIR_DOMAIN_BLOCK_COPY_SHALLOW; if (vshCommandOptBool(cmd, "reuse-external")) - flags |= VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT; - if (vshCommandOptBool(cmd, "raw")) - flags |= VIR_DOMAIN_BLOCK_REBASE_COPY_RAW; + flags |= VIR_DOMAIN_BLOCK_COPY_REUSE_EXT; if (vshCommandOptString(cmd, "dest", &base) < 0) goto cleanup; - ret = virDomainBlockRebase(dom, path, base, bandwidth, flags); + if (vshCommandOptString(cmd, "format", &format) < 0) + goto cleanup; + if (!format) { + if (vshCommandOptBool(cmd, "raw")) + flags |= VIR_DOMAIN_BLOCK_REBASE_COPY_RAW; + flags |= VIR_DOMAIN_BLOCK_REBASE_COPY; + ret = virDomainBlockRebase(dom, path, base, bandwidth, flags); + } else { + if (vshCommandOptBool(cmd, "raw")) { + vshError(ctl, "%s", + _("--raw and --format are mutually exclusive")); + goto cleanup; + } + ret = virDomainBlockCopy(dom, path, base, format, + bandwidth, flags); + } }
cleanup: @@ -7598,6 +7611,7 @@ static const vshCmdOptDef opts_block_copy[] = { {"path", VSH_OT_DATA, VSH_OFLAG_REQ, N_("Fully-qualified path of disk")}, {"dest", VSH_OT_DATA, VSH_OFLAG_REQ, N_("path of the copy to create")}, {"bandwidth", VSH_OT_DATA, VSH_OFLAG_NONE, N_("Bandwidth limit in MB/s")}, + {"format", VSH_OT_DATA, VSH_OFLAG_NONE, N_("file format of dest")}, {"shallow", VSH_OT_BOOL, 0, N_("make the copy share a backing chain")}, {"reuse-external", VSH_OT_BOOL, 0, N_("reuse existing destination")}, {"raw", VSH_OT_BOOL, 0, N_("use raw destination file")}, diff --git a/tools/virsh.pod b/tools/virsh.pod index ee84ce5..65e2429 100644 --- a/tools/virsh.pod +++ b/tools/virsh.pod @@ -639,7 +639,7 @@ address of virtual interface (such as I<detach-interface> or I<domif-setlink>) will accept the MAC address printed by this command.
=item B<blockcopy> I<domain> I<path> I<dest> [I<bandwidth>] [I<--shallow>] -[I<--reuse-external>] [I<--raw>] +[I<--reuse-external>] { [I<--raw>] | I<format> }
Copy a disk backing image chain to I<dest>. By default, this command flattens the entire chain; but if I<--shallow> is specified, the copy @@ -652,10 +652,10 @@ is used, otherwise it must start empty); this option is typically used to set up a relative backing file name in the destination.
The format of the destination is determined by the first match in the -following list: if I<--raw> is specified, it will be raw; if -I<--reuse-external> is specified, the existing destination is probed -for a format; and in all other cases, the destination format will -match the source format. +following list: if I<--raw> is specified, it will be raw; if I<format> +is specified, it will have that format; if I<--reuse-external> is +specified, the existing destination is probed for a format; and in +all other cases, the destination format will match the source format.
The copy runs in the background; initially, the job must copy all data from the source, and during this phase, the job can only be canceled to
OK but this patch may be affected in case we decide to rename virDomainBlockCopy. Jirka

Almost trivial; the trick was dealing with the fact that we're stuck with 'unsigned long bandwidth' due to earlier design decisions. Also, prefer the correct flag names (although we intentionally chose the _SHALLOW and _REUSE_EXT values to be equal, to allow for loose handling in our code). * src/remote/remote_protocol.x (remote_domain_block_copy_args): New struct. * src/remote/remote_driver.c (remote_driver): Use it. * src/rpc/gendispatch.pl (name_to_ProcName): Cater to legacy bandwidth type. * src/remote_protocol-structs: Regenerate. * src/qemu/qemu_driver.c (qemuDriver): Wire up driver callback. (qemuDomainBlockCopy): Use preferred flag names. --- src/qemu/qemu_driver.c | 13 +++++++------ src/remote/remote_driver.c | 1 + src/remote/remote_protocol.x | 12 +++++++++++- src/remote_protocol-structs | 9 +++++++++ src/rpc/gendispatch.pl | 1 + 5 files changed, 29 insertions(+), 7 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 41f545f..04dfece 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11902,8 +11902,8 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, char *origdriver = NULL; /* Preliminaries: find the disk we are editing, sanity checks */ - virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW | - VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT, -1); + virCheckFlags(VIR_DOMAIN_BLOCK_COPY_SHALLOW | + VIR_DOMAIN_BLOCK_COPY_REUSE_EXT, -1); qemuDriverLock(driver); virUUIDFormat(dom->uuid, uuidstr); @@ -11951,7 +11951,7 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, /* XXX this is pessimistic; we could use 'query-block' or even * keep track of the backing chain ourselves, rather than assuming * that all non-raw source files have a current backing image */ - if ((flags & VIR_DOMAIN_BLOCK_REBASE_SHALLOW) && + if ((flags & VIR_DOMAIN_BLOCK_COPY_SHALLOW) && STREQ_NULLABLE(format, "raw") && STRNEQ_NULLABLE(disk->driverType, "raw")) { qemuReportError(VIR_ERR_INVALID_ARG, @@ -11966,14 +11966,14 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, virReportSystemError(errno, _("unable to stat for disk %s: %s"), disk->dst, dest); goto endjob; - } else if (flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT) { + } else if (flags & VIR_DOMAIN_BLOCK_COPY_REUSE_EXT) { virReportSystemError(errno, _("missing destination file for disk %s: %s"), disk->dst, dest); goto endjob; } } else if (!(S_ISBLK(st.st_mode) || !st.st_size || - (flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT))) { + (flags & VIR_DOMAIN_BLOCK_COPY_REUSE_EXT))) { qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, _("external destination file for disk %s already " "exists and is not a block device: %s"), @@ -11981,7 +11981,7 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, goto endjob; } - if (!(flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)) { + if (!(flags & VIR_DOMAIN_BLOCK_COPY_REUSE_EXT)) { int fd = qemuOpenFile(driver, dest, O_WRONLY | O_TRUNC | O_CREAT, &need_unlink, NULL); if (fd < 0) @@ -13181,6 +13181,7 @@ static virDriver qemuDriver = { .domainBlockJobSetSpeed = qemuDomainBlockJobSetSpeed, /* 0.9.4 */ .domainBlockPull = qemuDomainBlockPull, /* 0.9.4 */ .domainBlockRebase = qemuDomainBlockRebase, /* 0.9.10 */ + .domainBlockCopy = qemuDomainBlockCopy, /* 0.9.12 */ .isAlive = qemuIsAlive, /* 0.9.8 */ .nodeSuspendForDuration = nodeSuspendForDuration, /* 0.9.8 */ .domainSetBlockIoTune = qemuDomainSetBlockIoTune, /* 0.9.8 */ diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c index af46384..b431779 100644 --- a/src/remote/remote_driver.c +++ b/src/remote/remote_driver.c @@ -5095,6 +5095,7 @@ static virDriver remote_driver = { .domainBlockJobSetSpeed = remoteDomainBlockJobSetSpeed, /* 0.9.4 */ .domainBlockPull = remoteDomainBlockPull, /* 0.9.4 */ .domainBlockRebase = remoteDomainBlockRebase, /* 0.9.10 */ + .domainBlockCopy = remoteDomainBlockCopy, /* 0.9.12 */ .setKeepAlive = remoteSetKeepAlive, /* 0.9.8 */ .isAlive = remoteIsAlive, /* 0.9.8 */ .nodeSuspendForDuration = remoteNodeSuspendForDuration, /* 0.9.8 */ diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x index 2d57247..65b4a1d 100644 --- a/src/remote/remote_protocol.x +++ b/src/remote/remote_protocol.x @@ -1188,6 +1188,14 @@ struct remote_domain_block_rebase_args { unsigned hyper bandwidth; unsigned int flags; }; +struct remote_domain_block_copy_args { + remote_nonnull_domain dom; + remote_nonnull_string path; + remote_nonnull_string dest; + remote_string format; + unsigned hyper bandwidth; + unsigned int flags; +}; struct remote_domain_set_block_io_tune_args { remote_nonnull_domain dom; @@ -2782,7 +2790,9 @@ enum remote_procedure { REMOTE_PROC_DOMAIN_PM_WAKEUP = 267, /* autogen autogen */ REMOTE_PROC_DOMAIN_EVENT_TRAY_CHANGE = 268, /* autogen autogen */ REMOTE_PROC_DOMAIN_EVENT_PMWAKEUP = 269, /* autogen autogen */ - REMOTE_PROC_DOMAIN_EVENT_PMSUSPEND = 270 /* autogen autogen */ + REMOTE_PROC_DOMAIN_EVENT_PMSUSPEND = 270, /* autogen autogen */ + + REMOTE_PROC_DOMAIN_BLOCK_COPY = 271 /* autogen autogen */ /* * Notice how the entries are grouped in sets of 10 ? diff --git a/src/remote_protocol-structs b/src/remote_protocol-structs index 9b2414f..dd2aa72 100644 --- a/src/remote_protocol-structs +++ b/src/remote_protocol-structs @@ -845,6 +845,14 @@ struct remote_domain_block_rebase_args { uint64_t bandwidth; u_int flags; }; +struct remote_domain_block_copy_args { + remote_nonnull_domain dom; + remote_nonnull_string path; + remote_nonnull_string dest; + remote_string format; + uint64_t bandwidth; + u_int flags; +}; struct remote_domain_set_block_io_tune_args { remote_nonnull_domain dom; remote_nonnull_string disk; @@ -2192,4 +2200,5 @@ enum remote_procedure { REMOTE_PROC_DOMAIN_EVENT_TRAY_CHANGE = 268, REMOTE_PROC_DOMAIN_EVENT_PMWAKEUP = 269, REMOTE_PROC_DOMAIN_EVENT_PMSUSPEND = 270, + REMOTE_PROC_DOMAIN_BLOCK_COPY = 271, }; diff --git a/src/rpc/gendispatch.pl b/src/rpc/gendispatch.pl index f161ee0..e5e28e0 100755 --- a/src/rpc/gendispatch.pl +++ b/src/rpc/gendispatch.pl @@ -232,6 +232,7 @@ my $long_legacy = { NodeGetInfo => { ret => { memory => 1 } }, DomainBlockPull => { arg => { bandwidth => 1 } }, DomainBlockRebase => { arg => { bandwidth => 1 } }, + DomainBlockCopy => { arg => { bandwidth => 1 } }, DomainBlockJobSetSpeed => { arg => { bandwidth => 1 } }, DomainMigrateGetMaxSpeed => { ret => { bandwidth => 1 } }, }; -- 1.7.7.6

On Mon, Apr 09, 2012 at 21:52:27 -0600, Eric Blake wrote:
Almost trivial; the trick was dealing with the fact that we're stuck with 'unsigned long bandwidth' due to earlier design decisions. Also, prefer the correct flag names (although we intentionally chose the _SHALLOW and _REUSE_EXT values to be equal, to allow for loose handling in our code).
* src/remote/remote_protocol.x (remote_domain_block_copy_args): New struct. * src/remote/remote_driver.c (remote_driver): Use it. * src/rpc/gendispatch.pl (name_to_ProcName): Cater to legacy bandwidth type. * src/remote_protocol-structs: Regenerate. * src/qemu/qemu_driver.c (qemuDriver): Wire up driver callback. (qemuDomainBlockCopy): Use preferred flag names. --- src/qemu/qemu_driver.c | 13 +++++++------ src/remote/remote_driver.c | 1 + src/remote/remote_protocol.x | 12 +++++++++++- src/remote_protocol-structs | 9 +++++++++ src/rpc/gendispatch.pl | 1 + 5 files changed, 29 insertions(+), 7 deletions(-)
Looks OK. Jirka

On 04/09/2012 09:52 PM, Eric Blake wrote:
v3 was here: https://www.redhat.com/archives/libvir-list/2012-April/msg00288.html
changes from v3: - split v3 patch 8 into 1/18 and 9/18 - incorporate Laine's review comments on v3 patches 1 and 2 (here 2 and 3) - fix more bugs found during testing - upgrade to Paolo's latest build (which dropped 'mode':'no-backing-file' and replaced it with 'full':'true') - added a new patch 14/18 that lets blockcopy work with SELinux enforcing (it still leaks files rather than clean up context on job abort or pivot, but that's a harder problem to solve)
repo.or.cz is acting up on me at the moment, so I won't have a repo to point to until I can investigate why tomorrow. I'll send an interdiff in reply to this message.
Interdiff from v3 is attached src/conf/domain_conf.c | 2 +- src/libvirt.c | 9 +++- src/qemu/qemu_driver.c | 102 +++++++++++++++++++++++++++++------------- src/qemu/qemu_monitor.c | 8 ++-- src/qemu/qemu_monitor.h | 10 +---- src/qemu/qemu_monitor_json.c | 15 +++--- src/qemu/qemu_monitor_json.h | 2 +- tools/virsh.c | 3 +- tools/virsh.pod | 33 +++++++++----- 9 files changed, 114 insertions(+), 70 deletions(-) -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
participants (4)
-
Daniel P. Berrange
-
Eric Blake
-
Jiri Denemark
-
Laine Stump