[libvirt] [PATCHv3 00/16] live block migration via virDomainBlockCopy

v1 was here: https://www.redhat.com/archives/libvir-list/2012-April/msg00068.html v2 was here: https://www.redhat.com/archives/libvir-list/2012-April/msg00222.html changes from v2: added patch 12/16, addressed some review comments and made minor changes from more of my own testing CAVEAT: Paolo and I had an IRC conversation, where we decided that 'drive-mirror' might do better to take a mandatory 'full':'bool' (or an even more powerful optional '*base':'str') argument for determining how much to stream (full vs. shallow, or even the backing file of the copy as matching the base argument of 'block_stream') rather than overloading the 'mode':'no-backing-file' as the only way to get a full pull. If that change goes in qemu, then we will need a v4, to tweak how the 'drive-mirror' monitor command is called, and take advantage of the additional flexibility it offers. changes from v1: Paolo has updated the qemu side of things, and built a scratch image for RHEL that I was able to test with for the new semantics. I was actually able to successfully run two different block copy jobs, one canceled and one pivoted, with SELinux disabled and this patch series. Patch 13/16 is for reference only when working with Paolo's build; it will not go upstream. Patches 14-16 are optional; the extra flexibility might be nice, but I haven't yet played with those three enough to know if Paolo's build behaves like I was expecting. Adam Litke (2): blockjob: add API for async virDomainBlockJobAbort blockjob: wire up qemu async virDomainBlockJobAbort Eric Blake (14): blockjob: allow for fast-finishing job blockjob: add new API flags blockjob: add 'blockcopy' to virsh blockjob: enhance xml to track mirrors across libvirtd restart blockjob: react to active block copy blockjob: expose qemu commands for mirrored storage migration blockjob: return appropriate event and info blockjob: support pivot operation on cancel blockjob: implement block copy for qemu blockjob: allow for existing files blockjob: accommodate RHEL backport names blockjob: add virDomainBlockCopy blockjob: enhance virsh 'blockcopy' blockjob: wire up qemu and RPC for block copy docs/apibuild.py | 1 + docs/formatdomain.html.in | 11 ++ docs/schemas/domaincommon.rng | 19 ++- include/libvirt/libvirt.h.in | 50 +++++- include/libvirt/virterror.h | 1 + src/conf/domain_conf.c | 54 ++++++ src/conf/domain_conf.h | 6 + src/driver.h | 5 + src/libvirt.c | 220 +++++++++++++++++++++++- src/libvirt_private.syms | 1 + src/libvirt_public.syms | 5 + src/qemu/qemu_capabilities.c | 3 + src/qemu/qemu_capabilities.h | 2 + src/qemu/qemu_driver.c | 379 +++++++++++++++++++++++++++++++++++++++-- src/qemu/qemu_hotplug.c | 7 + src/qemu/qemu_monitor.c | 50 ++++++ src/qemu/qemu_monitor.h | 26 +++- src/qemu/qemu_monitor_json.c | 166 +++++++++++++++--- src/qemu/qemu_monitor_json.h | 19 ++- src/remote/remote_driver.c | 1 + src/remote/remote_protocol.x | 12 ++- src/remote_protocol-structs | 9 + src/rpc/gendispatch.pl | 1 + src/util/virterror.c | 6 + tools/virsh.c | 140 ++++++++++++--- tools/virsh.pod | 40 ++++- 26 files changed, 1143 insertions(+), 91 deletions(-) -- 1.7.7.6

From: Adam Litke <agl@us.ibm.com> Qemu has changed the semantics of the "block_job_cancel" API. The original qed implementation (pretty much only backported to RHEL 6.2 qemu) was synchronous (ie. upon command completion, the operation was guaranteed to be completely stopped). With the new semantics going into qemu 1.1 for qcow2, a "block_job_cancel" merely requests that the operation be cancelled and an event is triggered once the cancellation request has been honored. To adopt the new semantics while preserving compatibility the following updates are made to the virDomainBlockJob API: A new block job event type VIR_DOMAIN_BLOCK_JOB_CANCELLED is recognized by libvirt. Regardless of the flags used with virDomainBlockJobAbort, this event will be raised whenever it is received from qemu. This event indicates that a block job has been successfully cancelled. For now, libvirt does not try to synthesize this event if using an older qemu that did not generate it. A new extension flag VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC is added to the virDomainBlockJobAbort API. When enabled, this function will operate asynchronously (ie, it can return before the job has actually been cancelled). When the API is used in this mode, it is the responsibility of the caller to wait for a VIR_DOMAIN_BLOCK_JOB_CANCELLED event or poll via the virDomainGetBlockJobInfo API to check the cancellation status; this flag is an error if it is not known if the hypervisor supports asynchronous cancel. This patch also exposes the new flag through virsh. Signed-off-by: Adam Litke <agl@us.ibm.com> Cc: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Eric Blake <eblake@redhat.com> --- include/libvirt/libvirt.h.in | 10 +++++++++ src/libvirt.c | 10 ++++++++- src/qemu/qemu_monitor_json.c | 42 +++++++++++++++++++++++++++++++------ tools/virsh.c | 46 ++++++++++++++++++++++++++--------------- tools/virsh.pod | 9 +++++-- 5 files changed, 89 insertions(+), 28 deletions(-) diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index 499dcd4..97ad99d 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -1946,6 +1946,15 @@ typedef enum { #endif } virDomainBlockJobType; +/** + * virDomainBlockJobAbortFlags: + * + * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC: Request only, do not wait for completion + */ +typedef enum { + VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC = 1 << 0, +} virDomainBlockJobAbortFlags; + /* An iterator for monitoring block job operations */ typedef unsigned long long virDomainBlockJobCursor; @@ -3617,6 +3626,7 @@ typedef void (*virConnectDomainEventGraphicsCallback)(virConnectPtr conn, typedef enum { VIR_DOMAIN_BLOCK_JOB_COMPLETED = 0, VIR_DOMAIN_BLOCK_JOB_FAILED = 1, + VIR_DOMAIN_BLOCK_JOB_CANCELED = 2, #ifdef VIR_ENUM_SENTINELS VIR_DOMAIN_BLOCK_JOB_LAST diff --git a/src/libvirt.c b/src/libvirt.c index 16d1fd5..af22232 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -17902,7 +17902,7 @@ error: * virDomainBlockJobAbort: * @dom: pointer to domain object * @disk: path to the block device, or device shorthand - * @flags: extra flags; not used yet, so callers should always pass 0 + * @flags: bitwise-OR of virDomainBlockJobAbortFlags * * Cancel the active block job on the given disk. * @@ -17913,6 +17913,14 @@ error: * can be found by calling virDomainGetXMLDesc() and inspecting * elements within //domain/devices/disk. * + * By default, this function performs a synchronous operation and the caller + * may assume that the operation has completed when 0 is returned. However, + * BlockJob operations may take a long time to complete, and during this time + * further domain interactions may be unresponsive. To avoid this problem, + * pass VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC in the @flags argument to enable + * asynchronous behavior. Either way, when the job has been cancelled, a + * BlockJob event will be emitted, with status VIR_DOMAIN_BLOCK_JOB_CANCELLED. + * * Returns -1 in case of failure, 0 when successful. */ int virDomainBlockJobAbort(virDomainPtr dom, const char *disk, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index d09d779..c9f0f0c 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -58,13 +58,14 @@ static void qemuMonitorJSONHandleIOError(qemuMonitorPtr mon, virJSONValuePtr dat static void qemuMonitorJSONHandleVNCConnect(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleVNCInitialize(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleVNCDisconnect(qemuMonitorPtr mon, virJSONValuePtr data); -static void qemuMonitorJSONHandleBlockJob(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleSPICEConnect(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleSPICEInitialize(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleSPICEDisconnect(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleTrayChange(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandlePMWakeup(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandlePMSuspend(qemuMonitorPtr mon, virJSONValuePtr data); +static void qemuMonitorJSONHandleBlockJobCompleted(qemuMonitorPtr mon, virJSONValuePtr data); +static void qemuMonitorJSONHandleBlockJobCanceled(qemuMonitorPtr mon, virJSONValuePtr data); static struct { const char *type; @@ -80,13 +81,14 @@ static struct { { "VNC_CONNECTED", qemuMonitorJSONHandleVNCConnect, }, { "VNC_INITIALIZED", qemuMonitorJSONHandleVNCInitialize, }, { "VNC_DISCONNECTED", qemuMonitorJSONHandleVNCDisconnect, }, - { "BLOCK_JOB_COMPLETED", qemuMonitorJSONHandleBlockJob, }, { "SPICE_CONNECTED", qemuMonitorJSONHandleSPICEConnect, }, { "SPICE_INITIALIZED", qemuMonitorJSONHandleSPICEInitialize, }, { "SPICE_DISCONNECTED", qemuMonitorJSONHandleSPICEDisconnect, }, { "DEVICE_TRAY_MOVED", qemuMonitorJSONHandleTrayChange, }, { "WAKEUP", qemuMonitorJSONHandlePMWakeup, }, { "SUSPEND", qemuMonitorJSONHandlePMSuspend, }, + { "BLOCK_JOB_COMPLETED", qemuMonitorJSONHandleBlockJobCompleted, }, + { "BLOCK_JOB_CANCELLED", qemuMonitorJSONHandleBlockJobCanceled, }, }; @@ -754,13 +756,15 @@ static void qemuMonitorJSONHandleSPICEDisconnect(qemuMonitorPtr mon, virJSONValu qemuMonitorJSONHandleGraphics(mon, data, VIR_DOMAIN_EVENT_GRAPHICS_DISCONNECT); } -static void qemuMonitorJSONHandleBlockJob(qemuMonitorPtr mon, virJSONValuePtr data) +static void +qemuMonitorJSONHandleBlockJobImpl(qemuMonitorPtr mon, + virJSONValuePtr data, + int event) { const char *device; const char *type_str; int type = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN; unsigned long long offset, len; - int status = VIR_DOMAIN_BLOCK_JOB_FAILED; if ((device = virJSONValueObjectGetString(data, "device")) == NULL) { VIR_WARN("missing device in block job event"); @@ -785,11 +789,19 @@ static void qemuMonitorJSONHandleBlockJob(qemuMonitorPtr mon, virJSONValuePtr da if (STREQ(type_str, "stream")) type = VIR_DOMAIN_BLOCK_JOB_TYPE_PULL; - if (offset != 0 && offset == len) - status = VIR_DOMAIN_BLOCK_JOB_COMPLETED; + switch (event) { + case VIR_DOMAIN_BLOCK_JOB_COMPLETED: + /* Make sure the whole device has been processed */ + if (offset != len) + event = VIR_DOMAIN_BLOCK_JOB_FAILED; + break; + case VIR_DOMAIN_BLOCK_JOB_FAILED: + case VIR_DOMAIN_BLOCK_JOB_CANCELED: + break; + } out: - qemuMonitorEmitBlockJob(mon, device, type, status); + qemuMonitorEmitBlockJob(mon, device, type, event); } static void @@ -832,6 +844,22 @@ qemuMonitorJSONHandlePMSuspend(qemuMonitorPtr mon, qemuMonitorEmitPMSuspend(mon); } +static void +qemuMonitorJSONHandleBlockJobCompleted(qemuMonitorPtr mon, + virJSONValuePtr data) +{ + qemuMonitorJSONHandleBlockJobImpl(mon, data, + VIR_DOMAIN_BLOCK_JOB_COMPLETED); +} + +static void +qemuMonitorJSONHandleBlockJobCanceled(qemuMonitorPtr mon, + virJSONValuePtr data) +{ + qemuMonitorJSONHandleBlockJobImpl(mon, data, + VIR_DOMAIN_BLOCK_JOB_CANCELED); +} + int qemuMonitorJSONHumanCommandWithFd(qemuMonitorPtr mon, const char *cmd_str, diff --git a/tools/virsh.c b/tools/virsh.c index cfdd040..7180b83 100644 --- a/tools/virsh.c +++ b/tools/virsh.c @@ -7525,6 +7525,7 @@ blockJobImpl(vshControl *ctl, const vshCmd *cmd, const char *name, *path; unsigned long bandwidth = 0; int ret = -1; + unsigned int flags = 0; if (!vshConnectionUsability(ctl, ctl->conn)) goto cleanup; @@ -7541,7 +7542,9 @@ blockJobImpl(vshControl *ctl, const vshCmd *cmd, } if (mode == VSH_CMD_BLOCK_JOB_ABORT) { - ret = virDomainBlockJobAbort(dom, path, 0); + if (vshCommandOptBool(cmd, "async")) + flags |= VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC; + ret = virDomainBlockJobAbort(dom, path, flags); } else if (mode == VSH_CMD_BLOCK_JOB_INFO) { ret = virDomainGetBlockJobInfo(dom, path, info, 0); } else if (mode == VSH_CMD_BLOCK_JOB_SPEED) { @@ -7589,20 +7592,25 @@ cmdBlockPull(vshControl *ctl, const vshCmd *cmd) } /* - * "blockjobinfo" command + * "blockjob" command */ static const vshCmdInfo info_block_job[] = { - {"help", N_("Manage active block operations.")}, - {"desc", N_("Manage active block operations.")}, + {"help", N_("Manage active block operations")}, + {"desc", N_("Query, adjust speed, or cancel active block operations.")}, {NULL, NULL} }; static const vshCmdOptDef opts_block_job[] = { {"domain", VSH_OT_DATA, VSH_OFLAG_REQ, N_("domain name, id or uuid")}, {"path", VSH_OT_DATA, VSH_OFLAG_REQ, N_("Fully-qualified path of disk")}, - {"abort", VSH_OT_BOOL, VSH_OFLAG_NONE, N_("Abort the active job on the specified disk")}, - {"info", VSH_OT_BOOL, VSH_OFLAG_NONE, N_("Get active job information for the specified disk")}, - {"bandwidth", VSH_OT_DATA, VSH_OFLAG_NONE, N_("Set the Bandwidth limit in MB/s")}, + {"abort", VSH_OT_BOOL, VSH_OFLAG_NONE, + N_("Abort the active job on the specified disk")}, + {"async", VSH_OT_BOOL, VSH_OFLAG_NONE, + N_("don't wait for --abort to complete")}, + {"info", VSH_OT_BOOL, VSH_OFLAG_NONE, + N_("Get active job information for the specified disk")}, + {"bandwidth", VSH_OT_DATA, VSH_OFLAG_NONE, + N_("Set the Bandwidth limit in MB/s")}, {NULL, 0, 0, NULL} }; @@ -7613,19 +7621,23 @@ cmdBlockJob(vshControl *ctl, const vshCmd *cmd) virDomainBlockJobInfo info; const char *type; int ret; + bool abortMode = vshCommandOptBool(cmd, "abort"); + bool infoMode = vshCommandOptBool(cmd, "info"); + bool bandwidth = vshCommandOptBool(cmd, "bandwidth"); - if (vshCommandOptBool (cmd, "abort")) { - mode = VSH_CMD_BLOCK_JOB_ABORT; - } else if (vshCommandOptBool (cmd, "info")) { - mode = VSH_CMD_BLOCK_JOB_INFO; - } else if (vshCommandOptBool (cmd, "bandwidth")) { - mode = VSH_CMD_BLOCK_JOB_SPEED; - } else { + if (abortMode + infoMode + bandwidth > 1) { vshError(ctl, "%s", _("One of --abort, --info, or --bandwidth is required")); return false; } + if (abortMode) + mode = VSH_CMD_BLOCK_JOB_ABORT; + else if (bandwidth) + mode = VSH_CMD_BLOCK_JOB_SPEED; + else + mode = VSH_CMD_BLOCK_JOB_INFO; + ret = blockJobImpl(ctl, cmd, &info, mode); if (ret < 0) return false; @@ -7634,13 +7646,13 @@ cmdBlockJob(vshControl *ctl, const vshCmd *cmd) return true; if (info.type == VIR_DOMAIN_BLOCK_JOB_TYPE_PULL) - type = "Block Pull"; + type = _("Block Pull"); else - type = "Unknown job"; + type = _("Unknown job"); print_job_progress(type, info.end - info.cur, info.end); if (info.bandwidth != 0) - vshPrint(ctl, " Bandwidth limit: %lu MB/s\n", info.bandwidth); + vshPrint(ctl, _(" Bandwidth limit: %lu MB/s\n"), info.bandwidth); return true; } diff --git a/tools/virsh.pod b/tools/virsh.pod index a60e667..aac72d1 100644 --- a/tools/virsh.pod +++ b/tools/virsh.pod @@ -686,13 +686,16 @@ Both I<--live> and I<--current> flags may be given, but I<--current> is exclusive. If no flag is specified, behavior is different depending on hypervisor. -=item B<blockjob> I<domain> I<path> [I<--abort>] [I<--info>] [I<bandwidth>] +=item B<blockjob> I<domain> I<path> { I<--abort> [I<--async>] | +[I<--info>] | I<bandwidth> } -Manage active block operations. +Manage active block operations. If no mode is chosen, I<--info> is assumed. I<path> specifies fully-qualified path of the disk. + If I<--abort> is specified, the active job on the specified disk will -be aborted. +be aborted. If I<--async> is also specified, this command will return +immediately, rather than waiting for the cancelation to complete. If I<--info> is specified, the active job information on the specified disk will be printed. I<bandwidth> can be used to set bandwidth limit for the active job. -- 1.7.7.6

On 04/06/2012 02:29 PM, Eric Blake wrote:
From: Adam Litke <agl@us.ibm.com>
Qemu has changed the semantics of the "block_job_cancel" API. The original qed implementation (pretty much only backported to RHEL 6.2 qemu) was synchronous (ie. upon command completion, the operation was guaranteed to be completely stopped). With the new semantics going into qemu 1.1 for qcow2, a "block_job_cancel" merely requests that the operation be cancelled and an event is triggered once the cancellation request has been honored.
To adopt the new semantics while preserving compatibility the following updates are made to the virDomainBlockJob API:
A new block job event type VIR_DOMAIN_BLOCK_JOB_CANCELLED is recognized by libvirt. Regardless of the flags used with virDomainBlockJobAbort, this event will be raised whenever it is received from qemu. This event indicates that a block job has been successfully cancelled. For now, libvirt does not try to synthesize this event if using an older qemu that did not generate it.
A new extension flag VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC is added to the virDomainBlockJobAbort API. When enabled, this function will operate asynchronously (ie, it can return before the job has actually been cancelled). When the API is used in this mode, it is the responsibility of the caller to wait for a VIR_DOMAIN_BLOCK_JOB_CANCELLED event or poll via the virDomainGetBlockJobInfo API to check the cancellation status; this flag is an error if it is not known if the hypervisor supports asynchronous cancel.
This patch also exposes the new flag through virsh.
Signed-off-by: Adam Litke <agl@us.ibm.com> Cc: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Eric Blake <eblake@redhat.com> --- include/libvirt/libvirt.h.in | 10 +++++++++ src/libvirt.c | 10 ++++++++- src/qemu/qemu_monitor_json.c | 42 +++++++++++++++++++++++++++++++------ tools/virsh.c | 46 ++++++++++++++++++++++++++--------------- tools/virsh.pod | 9 +++++-- 5 files changed, 89 insertions(+), 28 deletions(-)
diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index 499dcd4..97ad99d 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -1946,6 +1946,15 @@ typedef enum { #endif } virDomainBlockJobType;
+/** + * virDomainBlockJobAbortFlags: + * + * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC: Request only, do not wait for completion + */ +typedef enum { + VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC = 1 << 0, +} virDomainBlockJobAbortFlags; + /* An iterator for monitoring block job operations */ typedef unsigned long long virDomainBlockJobCursor;
@@ -3617,6 +3626,7 @@ typedef void (*virConnectDomainEventGraphicsCallback)(virConnectPtr conn, typedef enum { VIR_DOMAIN_BLOCK_JOB_COMPLETED = 0, VIR_DOMAIN_BLOCK_JOB_FAILED = 1, + VIR_DOMAIN_BLOCK_JOB_CANCELED = 2,
You know, this is one case where I think the Queen's English has it right - something just seems *wrong* about spelling CANCELED with one "L"...
#ifdef VIR_ENUM_SENTINELS VIR_DOMAIN_BLOCK_JOB_LAST diff --git a/src/libvirt.c b/src/libvirt.c index 16d1fd5..af22232 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -17902,7 +17902,7 @@ error: * virDomainBlockJobAbort: * @dom: pointer to domain object * @disk: path to the block device, or device shorthand - * @flags: extra flags; not used yet, so callers should always pass 0 + * @flags: bitwise-OR of virDomainBlockJobAbortFlags * * Cancel the active block job on the given disk. * @@ -17913,6 +17913,14 @@ error: * can be found by calling virDomainGetXMLDesc() and inspecting * elements within //domain/devices/disk. * + * By default, this function performs a synchronous operation and the caller + * may assume that the operation has completed when 0 is returned. However, + * BlockJob operations may take a long time to complete, and during this time + * further domain interactions may be unresponsive. To avoid this problem, + * pass VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC in the @flags argument to enable + * asynchronous behavior. Either way, when the job has been cancelled, a + * BlockJob event will be emitted, with status VIR_DOMAIN_BLOCK_JOB_CANCELLED. + * * Returns -1 in case of failure, 0 when successful. */ int virDomainBlockJobAbort(virDomainPtr dom, const char *disk, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index d09d779..c9f0f0c 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -58,13 +58,14 @@ static void qemuMonitorJSONHandleIOError(qemuMonitorPtr mon, virJSONValuePtr dat static void qemuMonitorJSONHandleVNCConnect(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleVNCInitialize(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleVNCDisconnect(qemuMonitorPtr mon, virJSONValuePtr data); -static void qemuMonitorJSONHandleBlockJob(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleSPICEConnect(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleSPICEInitialize(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleSPICEDisconnect(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandleTrayChange(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandlePMWakeup(qemuMonitorPtr mon, virJSONValuePtr data); static void qemuMonitorJSONHandlePMSuspend(qemuMonitorPtr mon, virJSONValuePtr data); +static void qemuMonitorJSONHandleBlockJobCompleted(qemuMonitorPtr mon, virJSONValuePtr data); +static void qemuMonitorJSONHandleBlockJobCanceled(qemuMonitorPtr mon, virJSONValuePtr data);
static struct { const char *type; @@ -80,13 +81,14 @@ static struct { { "VNC_CONNECTED", qemuMonitorJSONHandleVNCConnect, }, { "VNC_INITIALIZED", qemuMonitorJSONHandleVNCInitialize, }, { "VNC_DISCONNECTED", qemuMonitorJSONHandleVNCDisconnect, }, - { "BLOCK_JOB_COMPLETED", qemuMonitorJSONHandleBlockJob, }, { "SPICE_CONNECTED", qemuMonitorJSONHandleSPICEConnect, }, { "SPICE_INITIALIZED", qemuMonitorJSONHandleSPICEInitialize, }, { "SPICE_DISCONNECTED", qemuMonitorJSONHandleSPICEDisconnect, }, { "DEVICE_TRAY_MOVED", qemuMonitorJSONHandleTrayChange, }, { "WAKEUP", qemuMonitorJSONHandlePMWakeup, }, { "SUSPEND", qemuMonitorJSONHandlePMSuspend, }, + { "BLOCK_JOB_COMPLETED", qemuMonitorJSONHandleBlockJobCompleted, }, + { "BLOCK_JOB_CANCELLED", qemuMonitorJSONHandleBlockJobCanceled, }, };
@@ -754,13 +756,15 @@ static void qemuMonitorJSONHandleSPICEDisconnect(qemuMonitorPtr mon, virJSONValu qemuMonitorJSONHandleGraphics(mon, data, VIR_DOMAIN_EVENT_GRAPHICS_DISCONNECT); }
-static void qemuMonitorJSONHandleBlockJob(qemuMonitorPtr mon, virJSONValuePtr data) +static void +qemuMonitorJSONHandleBlockJobImpl(qemuMonitorPtr mon, + virJSONValuePtr data, + int event) { const char *device; const char *type_str; int type = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN; unsigned long long offset, len; - int status = VIR_DOMAIN_BLOCK_JOB_FAILED;
if ((device = virJSONValueObjectGetString(data, "device")) == NULL) { VIR_WARN("missing device in block job event"); @@ -785,11 +789,19 @@ static void qemuMonitorJSONHandleBlockJob(qemuMonitorPtr mon, virJSONValuePtr da if (STREQ(type_str, "stream")) type = VIR_DOMAIN_BLOCK_JOB_TYPE_PULL;
- if (offset != 0 && offset == len) - status = VIR_DOMAIN_BLOCK_JOB_COMPLETED; + switch (event) { + case VIR_DOMAIN_BLOCK_JOB_COMPLETED: + /* Make sure the whole device has been processed */ + if (offset != len) + event = VIR_DOMAIN_BLOCK_JOB_FAILED; + break;
Okay, so if you're talking to an "old" qemu, it will return a COMPLETED event with offset < length to indicate failure, which is handled here...
+ case VIR_DOMAIN_BLOCK_JOB_FAILED:
and if it's a new qemu, it will return FAILED explicitly, which is handled here. What happens when it's an old libvirt and new qemu? (not that there's anything libvirt could do about it, I'm just curious)
+ case VIR_DOMAIN_BLOCK_JOB_CANCELED: + break; + }
out: - qemuMonitorEmitBlockJob(mon, device, type, status); + qemuMonitorEmitBlockJob(mon, device, type, event); }
static void @@ -832,6 +844,22 @@ qemuMonitorJSONHandlePMSuspend(qemuMonitorPtr mon, qemuMonitorEmitPMSuspend(mon); }
+static void +qemuMonitorJSONHandleBlockJobCompleted(qemuMonitorPtr mon, + virJSONValuePtr data) +{ + qemuMonitorJSONHandleBlockJobImpl(mon, data, + VIR_DOMAIN_BLOCK_JOB_COMPLETED); +} + +static void +qemuMonitorJSONHandleBlockJobCanceled(qemuMonitorPtr mon, + virJSONValuePtr data) +{ + qemuMonitorJSONHandleBlockJobImpl(mon, data, + VIR_DOMAIN_BLOCK_JOB_CANCELED); +} + int qemuMonitorJSONHumanCommandWithFd(qemuMonitorPtr mon, const char *cmd_str, diff --git a/tools/virsh.c b/tools/virsh.c index cfdd040..7180b83 100644 --- a/tools/virsh.c +++ b/tools/virsh.c @@ -7525,6 +7525,7 @@ blockJobImpl(vshControl *ctl, const vshCmd *cmd, const char *name, *path; unsigned long bandwidth = 0; int ret = -1; + unsigned int flags = 0;
if (!vshConnectionUsability(ctl, ctl->conn)) goto cleanup; @@ -7541,7 +7542,9 @@ blockJobImpl(vshControl *ctl, const vshCmd *cmd, }
if (mode == VSH_CMD_BLOCK_JOB_ABORT) { - ret = virDomainBlockJobAbort(dom, path, 0); + if (vshCommandOptBool(cmd, "async")) + flags |= VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC; + ret = virDomainBlockJobAbort(dom, path, flags); } else if (mode == VSH_CMD_BLOCK_JOB_INFO) { ret = virDomainGetBlockJobInfo(dom, path, info, 0); } else if (mode == VSH_CMD_BLOCK_JOB_SPEED) { @@ -7589,20 +7592,25 @@ cmdBlockPull(vshControl *ctl, const vshCmd *cmd) }
/* - * "blockjobinfo" command + * "blockjob" command */ static const vshCmdInfo info_block_job[] = { - {"help", N_("Manage active block operations.")}, - {"desc", N_("Manage active block operations.")}, + {"help", N_("Manage active block operations")}, + {"desc", N_("Query, adjust speed, or cancel active block operations.")}, {NULL, NULL} };
static const vshCmdOptDef opts_block_job[] = { {"domain", VSH_OT_DATA, VSH_OFLAG_REQ, N_("domain name, id or uuid")}, {"path", VSH_OT_DATA, VSH_OFLAG_REQ, N_("Fully-qualified path of disk")}, - {"abort", VSH_OT_BOOL, VSH_OFLAG_NONE, N_("Abort the active job on the specified disk")}, - {"info", VSH_OT_BOOL, VSH_OFLAG_NONE, N_("Get active job information for the specified disk")}, - {"bandwidth", VSH_OT_DATA, VSH_OFLAG_NONE, N_("Set the Bandwidth limit in MB/s")}, + {"abort", VSH_OT_BOOL, VSH_OFLAG_NONE, + N_("Abort the active job on the specified disk")}, + {"async", VSH_OT_BOOL, VSH_OFLAG_NONE, + N_("don't wait for --abort to complete")}, + {"info", VSH_OT_BOOL, VSH_OFLAG_NONE, + N_("Get active job information for the specified disk")}, + {"bandwidth", VSH_OT_DATA, VSH_OFLAG_NONE, + N_("Set the Bandwidth limit in MB/s")}, {NULL, 0, 0, NULL} };
@@ -7613,19 +7621,23 @@ cmdBlockJob(vshControl *ctl, const vshCmd *cmd) virDomainBlockJobInfo info; const char *type; int ret; + bool abortMode = vshCommandOptBool(cmd, "abort"); + bool infoMode = vshCommandOptBool(cmd, "info"); + bool bandwidth = vshCommandOptBool(cmd, "bandwidth");
- if (vshCommandOptBool (cmd, "abort")) { - mode = VSH_CMD_BLOCK_JOB_ABORT; - } else if (vshCommandOptBool (cmd, "info")) { - mode = VSH_CMD_BLOCK_JOB_INFO; - } else if (vshCommandOptBool (cmd, "bandwidth")) { - mode = VSH_CMD_BLOCK_JOB_SPEED; - } else { + if (abortMode + infoMode + bandwidth > 1) { vshError(ctl, "%s", _("One of --abort, --info, or --bandwidth is required"));
You're changing the test here to require 0 or 1 of the options, rather than exactly one, effectively making --info the default. That makes sense, but I think the error message should be a bit different, maybe "no more than one of --abort, --info, or --bandwidth is allowed (default is --info)".
return false; }
+ if (abortMode) + mode = VSH_CMD_BLOCK_JOB_ABORT; + else if (bandwidth) + mode = VSH_CMD_BLOCK_JOB_SPEED; + else + mode = VSH_CMD_BLOCK_JOB_INFO; + ret = blockJobImpl(ctl, cmd, &info, mode); if (ret < 0) return false; @@ -7634,13 +7646,13 @@ cmdBlockJob(vshControl *ctl, const vshCmd *cmd) return true;
if (info.type == VIR_DOMAIN_BLOCK_JOB_TYPE_PULL) - type = "Block Pull"; + type = _("Block Pull"); else - type = "Unknown job"; + type = _("Unknown job");
print_job_progress(type, info.end - info.cur, info.end); if (info.bandwidth != 0) - vshPrint(ctl, " Bandwidth limit: %lu MB/s\n", info.bandwidth); + vshPrint(ctl, _(" Bandwidth limit: %lu MB/s\n"), info.bandwidth); return true; }
diff --git a/tools/virsh.pod b/tools/virsh.pod index a60e667..aac72d1 100644 --- a/tools/virsh.pod +++ b/tools/virsh.pod @@ -686,13 +686,16 @@ Both I<--live> and I<--current> flags may be given, but I<--current> is exclusive. If no flag is specified, behavior is different depending on hypervisor.
-=item B<blockjob> I<domain> I<path> [I<--abort>] [I<--info>] [I<bandwidth>] +=item B<blockjob> I<domain> I<path> { I<--abort> [I<--async>] | +[I<--info>] | I<bandwidth> }
A bit awkward to parse, but it does eventually make sense :-)
-Manage active block operations. +Manage active block operations. If no mode is chosen, I<--info> is assumed.
I<path> specifies fully-qualified path of the disk. + If I<--abort> is specified, the active job on the specified disk will -be aborted. +be aborted. If I<--async> is also specified, this command will return +immediately, rather than waiting for the cancelation to complete. If I<--info> is specified, the active job information on the specified disk will be printed. I<bandwidth> can be used to set bandwidth limit for the active job.
ACK with the change to the error message.

On 04/09/2012 02:45 PM, Laine Stump wrote:
On 04/06/2012 02:29 PM, Eric Blake wrote:
From: Adam Litke <agl@us.ibm.com>
Qemu has changed the semantics of the "block_job_cancel" API. The original qed implementation (pretty much only backported to RHEL 6.2 qemu) was synchronous (ie. upon command completion, the operation was guaranteed to be completely stopped). With the new semantics going into qemu 1.1 for qcow2, a "block_job_cancel" merely requests that the operation be cancelled and an event is triggered once the cancellation request has been honored.
@@ -3617,6 +3626,7 @@ typedef void (*virConnectDomainEventGraphicsCallback)(virConnectPtr conn, typedef enum { VIR_DOMAIN_BLOCK_JOB_COMPLETED = 0, VIR_DOMAIN_BLOCK_JOB_FAILED = 1, + VIR_DOMAIN_BLOCK_JOB_CANCELED = 2,
You know, this is one case where I think the Queen's English has it right - something just seems *wrong* about spelling CANCELED with one "L"...
I think the consensus was to add aliases so that either spelling would work, but that will have to be another patch for another day.
* + * By default, this function performs a synchronous operation and the caller + * may assume that the operation has completed when 0 is returned. However, + * BlockJob operations may take a long time to complete, and during this time + * further domain interactions may be unresponsive. To avoid this problem, + * pass VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC in the @flags argument to enable + * asynchronous behavior. Either way, when the job has been cancelled, a + * BlockJob event will be emitted, with status VIR_DOMAIN_BLOCK_JOB_CANCELLED.
But it would help if I can spell it consistently within a single patch :) Also, I'm going to update this to mention that the ASYNC flag is not recognized if qemu is too old, and that the event is not guaranteed (along with the polling loop that the user can rely on if they don't want to trust the event).
@@ -785,11 +789,19 @@ static void qemuMonitorJSONHandleBlockJob(qemuMonitorPtr mon, virJSONValuePtr da if (STREQ(type_str, "stream")) type = VIR_DOMAIN_BLOCK_JOB_TYPE_PULL;
- if (offset != 0 && offset == len) - status = VIR_DOMAIN_BLOCK_JOB_COMPLETED; + switch (event) { + case VIR_DOMAIN_BLOCK_JOB_COMPLETED: + /* Make sure the whole device has been processed */ + if (offset != len) + event = VIR_DOMAIN_BLOCK_JOB_FAILED; + break;
Okay, so if you're talking to an "old" qemu, it will return a COMPLETED event with offset < length to indicate failure, which is handled here...
Actually, the old qemu would never return an event on cancellation (except in the inherent race where you request a cancel, but the job completes normally before qemu can act on your request). Rather, both the old qemu and new qemu return the 'BLOCK_JOB_COMPLETE' event on completion, and you must then look at offset/len to decide if it was a successful completion or a failure. It is only the new qemu that adds the new 'BLOCK_JOB_CANCELLED' event (this time, we have the qemu spelling, and it is UK spelling).
+ case VIR_DOMAIN_BLOCK_JOB_FAILED:
and if it's a new qemu, it will return FAILED explicitly, which is handled here. What happens when it's an old libvirt and new qemu? (not that there's anything libvirt could do about it, I'm just curious)
Actually, new qemu only emits two types of events (COMPLETE and CANCELLED), and this helper function converts those two types into three user-visible types (COMPLETED, FAILED, CANCELED). I think what I should do is mark the unreachable case statement accordingly.
+ case VIR_DOMAIN_BLOCK_JOB_CANCELED: + break; + }
out: - qemuMonitorEmitBlockJob(mon, device, type, status); + qemuMonitorEmitBlockJob(mon, device, type, event); }
static void @@ -832,6 +844,22 @@ qemuMonitorJSONHandlePMSuspend(qemuMonitorPtr mon, qemuMonitorEmitPMSuspend(mon); }
+static void +qemuMonitorJSONHandleBlockJobCompleted(qemuMonitorPtr mon, + virJSONValuePtr data) +{ + qemuMonitorJSONHandleBlockJobImpl(mon, data, + VIR_DOMAIN_BLOCK_JOB_COMPLETED); +} + +static void +qemuMonitorJSONHandleBlockJobCanceled(qemuMonitorPtr mon, + virJSONValuePtr data) +{ + qemuMonitorJSONHandleBlockJobImpl(mon, data, + VIR_DOMAIN_BLOCK_JOB_CANCELED); +}
Here's our two event handlers that both call into the helper function that translates the 2 qemu events into 3 libvirt events.
@@ -7613,19 +7621,23 @@ cmdBlockJob(vshControl *ctl, const vshCmd *cmd) virDomainBlockJobInfo info; const char *type; int ret; + bool abortMode = vshCommandOptBool(cmd, "abort"); + bool infoMode = vshCommandOptBool(cmd, "info"); + bool bandwidth = vshCommandOptBool(cmd, "bandwidth");
- if (vshCommandOptBool (cmd, "abort")) { - mode = VSH_CMD_BLOCK_JOB_ABORT; - } else if (vshCommandOptBool (cmd, "info")) { - mode = VSH_CMD_BLOCK_JOB_INFO; - } else if (vshCommandOptBool (cmd, "bandwidth")) { - mode = VSH_CMD_BLOCK_JOB_SPEED; - } else { + if (abortMode + infoMode + bandwidth > 1) { vshError(ctl, "%s", _("One of --abort, --info, or --bandwidth is required"));
You're changing the test here to require 0 or 1 of the options, rather than exactly one, effectively making --info the default. That makes sense, but I think the error message should be a bit different, maybe "no more than one of --abort, --info, or --bandwidth is allowed (default is --info)".
In fact, in a later patch, I changed it even more, to make --async (or --pivot) in isolation imply --abort. The important part to remember here is that this error message only occurs if more than one mode is requested at once.
-=item B<blockjob> I<domain> I<path> [I<--abort>] [I<--info>] [I<bandwidth>] +=item B<blockjob> I<domain> I<path> { I<--abort> [I<--async>] | +[I<--info>] | I<bandwidth> }
A bit awkward to parse, but it does eventually make sense :-)
I think I can rebase part of that refactoring earlier, so in v4, I think I will just list things as: bool abortMode = (vshCommandOptBool(cmd, "abort") || vshCommandOptBool(cmd, "async")); if (abortMode + infoMode + bandwidth > 1) { vshError(ctl, "%s", _("conflict between --abort, --info, and --bandwidth modes")); B<blockjob> I<domain> I<path> { [I<--abort>] [I<--async] | [I<--info>] | [I<bandwidth>] } Manage active block operations. There are three modes: I<--info>, I<bandwidth>, and I<--abort>; I<--info> is default except that I<--async> implies I<--abort>.
ACK with the change to the error message.
I've got enough tweaks that it will be worth looking at again in v4. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

From: Adam Litke <agl@us.ibm.com> Without the VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag, libvirt will internally poll using qemu's "query-block-jobs" API and will not return until the operation has been completed. API users are advised that this operation is unbounded and further interaction with the domain during this period may block. Future patches may refactor things to allow other queries in parallel with this polling. Unfortunately, there's no good way to tell if qemu will emit the new event, so this implementation always polls to deal with older qemu. Signed-off-by: Adam Litke <agl@us.ibm.com> Cc: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Eric Blake <eblake@redhat.com> --- src/qemu/qemu_driver.c | 55 +++++++++++++++++++++++++++++++++++++++++------ 1 files changed, 48 insertions(+), 7 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index d9e35be..5f0eb05 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11599,7 +11599,7 @@ cleanup: static int qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, unsigned long bandwidth, virDomainBlockJobInfoPtr info, - int mode) + int mode, unsigned int flags) { struct qemud_driver *driver = dom->conn->privateData; virDomainObjPtr vm = NULL; @@ -11641,6 +11641,45 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, ret = qemuMonitorBlockJob(priv->mon, device, base, bandwidth, info, mode); qemuDomainObjExitMonitorWithDriver(driver, vm); + /* Qemu provides asynchronous block job cancellation, but without + * the VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag libvirt guarantees a + * synchronous operation. Provide this behavior by waiting here, + * so we don't get confused by newly scheduled block jobs. + */ + if (ret == 0 && mode == BLOCK_JOB_ABORT && + !(flags & VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC)) { + ret = 1; + while (1) { + /* Poll every 50ms */ + struct timespec ts = { .tv_sec = 0, + .tv_nsec = 50 * 1000 * 1000ull }; + virDomainBlockJobInfo dummy; + + qemuDomainObjEnterMonitorWithDriver(driver, vm); + ret = qemuMonitorBlockJob(priv->mon, device, NULL, 0, &dummy, + BLOCK_JOB_INFO); + qemuDomainObjExitMonitorWithDriver(driver, vm); + + if (ret <= 0) + break; + + virDomainObjUnlock(vm); + qemuDriverUnlock(driver); + + nanosleep(&ts, NULL); + + qemuDriverLock(driver); + virDomainObjLock(vm); + + if (!virDomainObjIsActive(vm)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain is not running")); + ret = -1; + break; + } + } + } + endjob: if (qemuDomainObjEndJob(driver, vm) == 0) { vm = NULL; @@ -11658,8 +11697,9 @@ cleanup: static int qemuDomainBlockJobAbort(virDomainPtr dom, const char *path, unsigned int flags) { - virCheckFlags(0, -1); - return qemuDomainBlockJobImpl(dom, path, NULL, 0, NULL, BLOCK_JOB_ABORT); + virCheckFlags(VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC, -1); + return qemuDomainBlockJobImpl(dom, path, NULL, 0, NULL, BLOCK_JOB_ABORT, + flags); } static int @@ -11667,7 +11707,8 @@ qemuDomainGetBlockJobInfo(virDomainPtr dom, const char *path, virDomainBlockJobInfoPtr info, unsigned int flags) { virCheckFlags(0, -1); - return qemuDomainBlockJobImpl(dom, path, NULL, 0, info, BLOCK_JOB_INFO); + return qemuDomainBlockJobImpl(dom, path, NULL, 0, info, BLOCK_JOB_INFO, + flags); } static int @@ -11676,7 +11717,7 @@ qemuDomainBlockJobSetSpeed(virDomainPtr dom, const char *path, { virCheckFlags(0, -1); return qemuDomainBlockJobImpl(dom, path, NULL, bandwidth, NULL, - BLOCK_JOB_SPEED); + BLOCK_JOB_SPEED, flags); } static int @@ -11687,10 +11728,10 @@ qemuDomainBlockRebase(virDomainPtr dom, const char *path, const char *base, virCheckFlags(0, -1); ret = qemuDomainBlockJobImpl(dom, path, base, bandwidth, NULL, - BLOCK_JOB_PULL); + BLOCK_JOB_PULL, flags); if (ret == 0 && bandwidth != 0) ret = qemuDomainBlockJobImpl(dom, path, NULL, bandwidth, NULL, - BLOCK_JOB_SPEED); + BLOCK_JOB_SPEED, flags); return ret; } -- 1.7.7.6

On 04/06/2012 02:29 PM, Eric Blake wrote:
From: Adam Litke <agl@us.ibm.com>
Without the VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag, libvirt will internally poll using qemu's "query-block-jobs" API and will not return until the operation has been completed. API users are advised that this operation is unbounded and further interaction with the domain during this period may block. Future patches may refactor things to allow other queries in parallel with this polling. Unfortunately, there's no good way to tell if qemu will emit the new event, so this implementation always polls to deal with older qemu.
Signed-off-by: Adam Litke <agl@us.ibm.com> Cc: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Eric Blake <eblake@redhat.com> --- src/qemu/qemu_driver.c | 55 +++++++++++++++++++++++++++++++++++++++++------ 1 files changed, 48 insertions(+), 7 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index d9e35be..5f0eb05 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11599,7 +11599,7 @@ cleanup: static int qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, unsigned long bandwidth, virDomainBlockJobInfoPtr info, - int mode) + int mode, unsigned int flags) { struct qemud_driver *driver = dom->conn->privateData; virDomainObjPtr vm = NULL; @@ -11641,6 +11641,45 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, ret = qemuMonitorBlockJob(priv->mon, device, base, bandwidth, info, mode); qemuDomainObjExitMonitorWithDriver(driver, vm);
Do I understand correctly that qemu's abort was always asynchronous, and prior to this, an abort call to libvirt would erroneously return immediately? (I'm still trying to understand the code...)
+ /* Qemu provides asynchronous block job cancellation, but without + * the VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag libvirt guarantees a + * synchronous operation. Provide this behavior by waiting here, + * so we don't get confused by newly scheduled block jobs. + */ + if (ret == 0 && mode == BLOCK_JOB_ABORT && + !(flags & VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC)) { + ret = 1;
Why is ret set to 1? It will be assigned the return value of qemuMonitorBlockJob before it's ever used for anything, so this seems unnecessary.
+ while (1) { + /* Poll every 50ms */ + struct timespec ts = { .tv_sec = 0, + .tv_nsec = 50 * 1000 * 1000ull };
This timespec could just as well be static const, couldn't it? No big deal one way or the other, though.
+ virDomainBlockJobInfo dummy; + + qemuDomainObjEnterMonitorWithDriver(driver, vm); + ret = qemuMonitorBlockJob(priv->mon, device, NULL, 0, &dummy, + BLOCK_JOB_INFO); + qemuDomainObjExitMonitorWithDriver(driver, vm); + + if (ret <= 0) + break; + + virDomainObjUnlock(vm); + qemuDriverUnlock(driver); + + nanosleep(&ts, NULL); + + qemuDriverLock(driver); + virDomainObjLock(vm); + + if (!virDomainObjIsActive(vm)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain is not running")); + ret = -1; + break; + } + } + } + endjob: if (qemuDomainObjEndJob(driver, vm) == 0) { vm = NULL; @@ -11658,8 +11697,9 @@ cleanup: static int qemuDomainBlockJobAbort(virDomainPtr dom, const char *path, unsigned int flags) { - virCheckFlags(0, -1); - return qemuDomainBlockJobImpl(dom, path, NULL, 0, NULL, BLOCK_JOB_ABORT); + virCheckFlags(VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC, -1); + return qemuDomainBlockJobImpl(dom, path, NULL, 0, NULL, BLOCK_JOB_ABORT, + flags); }
static int @@ -11667,7 +11707,8 @@ qemuDomainGetBlockJobInfo(virDomainPtr dom, const char *path, virDomainBlockJobInfoPtr info, unsigned int flags) { virCheckFlags(0, -1); - return qemuDomainBlockJobImpl(dom, path, NULL, 0, info, BLOCK_JOB_INFO); + return qemuDomainBlockJobImpl(dom, path, NULL, 0, info, BLOCK_JOB_INFO, + flags); }
static int @@ -11676,7 +11717,7 @@ qemuDomainBlockJobSetSpeed(virDomainPtr dom, const char *path, { virCheckFlags(0, -1); return qemuDomainBlockJobImpl(dom, path, NULL, bandwidth, NULL, - BLOCK_JOB_SPEED); + BLOCK_JOB_SPEED, flags); }
static int @@ -11687,10 +11728,10 @@ qemuDomainBlockRebase(virDomainPtr dom, const char *path, const char *base,
virCheckFlags(0, -1); ret = qemuDomainBlockJobImpl(dom, path, base, bandwidth, NULL, - BLOCK_JOB_PULL); + BLOCK_JOB_PULL, flags); if (ret == 0 && bandwidth != 0) ret = qemuDomainBlockJobImpl(dom, path, NULL, bandwidth, NULL, - BLOCK_JOB_SPEED); + BLOCK_JOB_SPEED, flags); return ret; }
ACK once the items above are addressed (either by explaining, changing code, or telling me why I'm wrong)

On 04/09/2012 03:26 PM, Laine Stump wrote:
On 04/06/2012 02:29 PM, Eric Blake wrote:
From: Adam Litke <agl@us.ibm.com>
Without the VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag, libvirt will internally poll using qemu's "query-block-jobs" API and will not return until the operation has been completed. API users are advised that this operation is unbounded and further interaction with the domain during this period may block. Future patches may refactor things to allow other queries in parallel with this polling. Unfortunately, there's no good way to tell if qemu will emit the new event, so this implementation always polls to deal with older qemu.
Signed-off-by: Adam Litke <agl@us.ibm.com> Cc: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Eric Blake <eblake@redhat.com> --- src/qemu/qemu_driver.c | 55 +++++++++++++++++++++++++++++++++++++++++------ 1 files changed, 48 insertions(+), 7 deletions(-)
@@ -11641,6 +11641,45 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, ret = qemuMonitorBlockJob(priv->mon, device, base, bandwidth, info, mode); qemuDomainObjExitMonitorWithDriver(driver, vm);
Do I understand correctly that qemu's abort was always asynchronous, and prior to this, an abort call to libvirt would erroneously return immediately? (I'm still trying to understand the code...)
Worse. qemu 1.1 will be adding the block_stream and block_job_cancel monitor commands for the first time upstream, where block_job_cancel is asynchronous. But RHEL 6.2 backported an early version of block_stream that worked on QED only, and where block_job_cancel was synchronous. The early synchronous version never emits an event (if we want to guarantee an event, we will have to synthesize one in libvirt). The asynchronous code always emits an event, but due to the RHEL 6.2 early backport (or even an ill-timed libvirtd restart), you cannot rely on the event. Therefore, libvirt has to poll. On the old qemu, the poll will be a single iteration (because the event has always completed). On the new qemu, the poll might be a single iteration (there's an inherent race where qemu can finish things and send the event before libvirt gets a chance to poll), or it might be several iterations. So even if we only see one iteration, we cannot assume the old qemu. I never got a good answer from Adam how long the poll might last, but a good estimate is that the block job takes time to cancel in proportion to the number of sectors that must still be flushed to disk, so a larger disk is more likely to have a longer poll. Since libvirt already exposed virDomainBlockJobCancel as synchronous by default, we must maintain that default. I suppose I could work on an additional patch that either allows you to mix the ASYNC flag with RHEL 6.2 qemu (by libvirt synthesizing the event) or by rejecting the ASYNC flag if we don't know for sure that we have newer qemu; but my rationale for not doing so in this patch is that (1) _only_ RHEL 6.2 backported the preview version of block_job_cancel (shame on them for not naming it __com.redhat_block_job_cancel) - all other qemu versions either lack the command altogether or have upstream qemu 1.1's semantics of an async command, and (2) I couldn't come up with a good way to probe whether we are talking to new or old qemu during capability parsing. So such a patch would only be relevant for RHEL 6.2, and might not be worth posting to upstream libvirt. Actually, on thinking about it, I guess we _might_ have a way to tell old qemu apart from new qemu: old qemu has block_job_cancel but not drive-mirror. If drive-mirror gets approved in time for qemu 1.1, then we can use that as the witness to whether block_job_cancel is asynchronous.
+ /* Qemu provides asynchronous block job cancellation, but without + * the VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag libvirt guarantees a + * synchronous operation. Provide this behavior by waiting here, + * so we don't get confused by newly scheduled block jobs. + */ + if (ret == 0 && mode == BLOCK_JOB_ABORT && + !(flags & VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC)) { + ret = 1;
Why is ret set to 1? It will be assigned the return value of qemuMonitorBlockJob before it's ever used for anything, so this seems unnecessary.
Oh, I see. I refactored the code (my original attempt swapped the sleep and the qemuMonitorBlockJob, so priming the variable was necessary to avoid an early exit). I'll remove the dead assignment.
+ while (1) { + /* Poll every 50ms */ + struct timespec ts = { .tv_sec = 0, + .tv_nsec = 50 * 1000 * 1000ull };
This timespec could just as well be static const, couldn't it? No big deal one way or the other, though.
Sure, since it isn't going to change. Here, I was just copying the polling loop we had elsewhere in qemu_migration.c.
ACK once the items above are addressed (either by explaining, changing code, or telling me why I'm wrong)
I'll add in the capability check for v4 (since I already have to respin the series for other reasons). -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

In my testing, I was able to provoke an odd block pull failure: $ virsh blockpull dom vda --bandwidth 10000 error: Requested operation is not valid: No active operation on device: drive-virtio-disk0 merely by using gdb to artifically wait to do the block job set speed until after the pull had already finished. But in reality, that should be a success, since the pull finished before we had a chance to set speed. Furthermore, using a double job lock is annoying; we can alter the speed as part of the same job that started the block pull for less likelihood of hitting the speed failure in the first place. In fixing this, I discovered commit 10ec36e2 was wrong for marking the info parameter of qemuMonitorBlockJob as non-null. * src/qemu/qemu_monitor.h (qemuMonitorBlockJob): Fix prototype. (BLOCK_JOB_CMD): Add new mode. * src/qemu/qemu_driver.c (qemuDomainBlockRebase): Move secondary job call... (qemuDomainBlockJobImpl): ...here, for fewer locks. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONBlockJob): Change return value on new internal mode. --- src/qemu/qemu_driver.c | 18 +++++++++--------- src/qemu/qemu_monitor.h | 5 +++-- src/qemu/qemu_monitor_json.c | 32 ++++++++++++++++++++------------ 3 files changed, 32 insertions(+), 23 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 5f0eb05..f3bd043 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11639,14 +11639,20 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, * itself, and validating that base is on the chain, rather than * relying on qemu to do this. */ ret = qemuMonitorBlockJob(priv->mon, device, base, bandwidth, info, mode); + if (ret == 0 && mode == BLOCK_JOB_PULL && bandwidth) + ret = qemuMonitorBlockJob(priv->mon, device, NULL, bandwidth, NULL, + BLOCK_JOB_SPEED_INTERNAL); qemuDomainObjExitMonitorWithDriver(driver, vm); + if (ret < 0) + goto endjob; + /* Qemu provides asynchronous block job cancellation, but without * the VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag libvirt guarantees a * synchronous operation. Provide this behavior by waiting here, * so we don't get confused by newly scheduled block jobs. */ - if (ret == 0 && mode == BLOCK_JOB_ABORT && + if (mode == BLOCK_JOB_ABORT && !(flags & VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC)) { ret = 1; while (1) { @@ -11724,15 +11730,9 @@ static int qemuDomainBlockRebase(virDomainPtr dom, const char *path, const char *base, unsigned long bandwidth, unsigned int flags) { - int ret; - virCheckFlags(0, -1); - ret = qemuDomainBlockJobImpl(dom, path, base, bandwidth, NULL, - BLOCK_JOB_PULL, flags); - if (ret == 0 && bandwidth != 0) - ret = qemuDomainBlockJobImpl(dom, path, NULL, bandwidth, NULL, - BLOCK_JOB_SPEED, flags); - return ret; + return qemuDomainBlockJobImpl(dom, path, base, bandwidth, NULL, + BLOCK_JOB_PULL, flags); } static int diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index b480966..2e6ac79 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -530,7 +530,8 @@ typedef enum { BLOCK_JOB_ABORT = 0, BLOCK_JOB_INFO = 1, BLOCK_JOB_SPEED = 2, - BLOCK_JOB_PULL = 3, + BLOCK_JOB_SPEED_INTERNAL = 3, + BLOCK_JOB_PULL = 4, } BLOCK_JOB_CMD; int qemuMonitorBlockJob(qemuMonitorPtr mon, @@ -539,7 +540,7 @@ int qemuMonitorBlockJob(qemuMonitorPtr mon, unsigned long bandwidth, virDomainBlockJobInfoPtr info, int mode) - ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(5); + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); int qemuMonitorOpenGraphics(qemuMonitorPtr mon, const char *protocol, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index c9f0f0c..13c35c8 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -3435,7 +3435,7 @@ qemuMonitorJSONBlockJob(qemuMonitorPtr mon, } else if (mode == BLOCK_JOB_INFO) { cmd_name = "query-block-jobs"; cmd = qemuMonitorJSONMakeCommand(cmd_name, NULL); - } else if (mode == BLOCK_JOB_SPEED) { + } else if (mode == BLOCK_JOB_SPEED || mode == BLOCK_JOB_SPEED_INTERNAL) { cmd_name = "block_job_set_speed"; cmd = qemuMonitorJSONMakeCommand(cmd_name, "s:device", device, "U:value", @@ -3457,22 +3457,30 @@ qemuMonitorJSONBlockJob(qemuMonitorPtr mon, ret = qemuMonitorJSONCommand(mon, cmd, &reply); if (ret == 0 && virJSONValueObjectHasKey(reply, "error")) { - if (qemuMonitorJSONHasError(reply, "DeviceNotActive")) - qemuReportError(VIR_ERR_OPERATION_INVALID, - _("No active operation on device: %s"), device); - else if (qemuMonitorJSONHasError(reply, "DeviceInUse")) + ret = -1; + if (qemuMonitorJSONHasError(reply, "DeviceNotActive")) { + /* If a job completes before we get a chance to set the + * speed, we don't want to fail the original command. */ + if (mode == BLOCK_JOB_SPEED_INTERNAL) + ret = 0; + else + qemuReportError(VIR_ERR_OPERATION_INVALID, + _("No active operation on device: %s"), + device); + } else if (qemuMonitorJSONHasError(reply, "DeviceInUse")){ qemuReportError(VIR_ERR_OPERATION_FAILED, - _("Device %s in use"), device); - else if (qemuMonitorJSONHasError(reply, "NotSupported")) + _("Device %s in use"), device); + } else if (qemuMonitorJSONHasError(reply, "NotSupported")) { qemuReportError(VIR_ERR_OPERATION_INVALID, - _("Operation is not supported for device: %s"), device); - else if (qemuMonitorJSONHasError(reply, "CommandNotFound")) + _("Operation is not supported for device: %s"), + device); + } else if (qemuMonitorJSONHasError(reply, "CommandNotFound")) { qemuReportError(VIR_ERR_OPERATION_INVALID, - _("Command '%s' is not found"), cmd_name); - else + _("Command '%s' is not found"), cmd_name); + } else { qemuReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("Unexpected error")); - ret = -1; + } } if (ret == 0 && mode == BLOCK_JOB_INFO) -- 1.7.7.6

This patch introduces a new block job, useful for live storage migration using pre-copy streaming. Using a live VM with the backing chain: base <- snap1 <- snap2 as the starting point, we have: - virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY) creates /path/to/copy with the same format as snap2, with no backing file, so entire chain is copied and flattened - virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) creates /path/to/copy as a raw file, so entire chain is copied and flattened - virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_SHALLOW) creates /path/to/copy with the same format as snap2, but with snap1 as a backing file, so only snap2 is copied. - virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT) reuse existing /path/to/copy (must have empty contents, and format is probed from the metadata), and copy the full chain - virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT| VIR_DOMAIN_BLOCK_REBASE_SHALLOW) reuse existing /path/to/copy (contents must be identical to snap1, and format is probed from the metadata), and copy only the contents of snap2 - virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT| VIR_DOMAIN_BLOCK_REBASE_SHALLOW|VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) reuse existing /path/to/copy (must be raw volume with contents identical to snap1), and copy only the contents of snap2 Less useful combinations: - virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_SHALLOW| VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) fail if source is not raw, otherwise create /path/to/copy as raw and the single file is copied (no chain involved) - virDomainBlockRebase(dom, disk, "/path/to/copy", 0, VIR_DOMAIN_BLOCK_REBASE_COPY|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT| VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) makes little sense: the destination must be raw but have no contents, meaning that it is an empty file, so there is nothing to reuse The other three flags are rejected without VIR_DOMAIN_BLOCK_COPY. It would be nice if we could issue an event when pivoting from phase 1 to phase 2, but qemu hasn't implemented that, so we would have to poll in order to synthesize it ourselves. Meanwhile, qemu will give us a distinct job info and completion event when we either cancel or pivot to end the job. Pivoting is accomplished via the new: virDomainBlockJobAbort(dom, disk, VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT) Management applications can pre-create the copy with a relative backing file name, and use the VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT flag to have qemu reuse the metadata; if the management application also copies the backing files to a new location, this can be used to perform live storage migration of an entire backing chain. * include/libvirt/libvirt.h.in (VIR_DOMAIN_BLOCK_JOB_TYPE_COPY): New block job type. (virDomainBlockJobAbortFlags, virDomainBlockRebaseFlags): New enums. * src/libvirt.c (virDomainBlockRebase): Document the new flags, and implement general restrictions on flag combinations. (virDomainBlockJobAbort): Document the new flag. (virDomainSaveFlags, virDomainSnapshotCreateXML) (virDomainRevertToSnapshot, virDomainDetachDeviceFlags): Document restrictions. * include/libvirt/virterror.h (VIR_ERR_BLOCK_COPY_ACTIVE): New error. * src/util/virterror.c (virErrorMsg): Define it. --- include/libvirt/libvirt.h.in | 24 ++++++++++- include/libvirt/virterror.h | 1 + src/libvirt.c | 95 ++++++++++++++++++++++++++++++++++++++---- src/util/virterror.c | 6 +++ 4 files changed, 116 insertions(+), 10 deletions(-) diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index 97ad99d..ac5df95 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -1934,12 +1934,15 @@ int virDomainUpdateDeviceFlags(virDomainPtr domain, /** * virDomainBlockJobType: * - * VIR_DOMAIN_BLOCK_JOB_TYPE_PULL: Block Pull (virDomainBlockPull or - * virDomainBlockRebase) + * VIR_DOMAIN_BLOCK_JOB_TYPE_PULL: Block Pull (virDomainBlockPull, or + * virDomainBlockRebase without flags), job ends on completion + * VIR_DOMAIN_BLOCK_JOB_TYPE_COPY: Block Copy (virDomainBlockRebase with + * flags), job exists as long as mirroring is active */ typedef enum { VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN = 0, VIR_DOMAIN_BLOCK_JOB_TYPE_PULL = 1, + VIR_DOMAIN_BLOCK_JOB_TYPE_COPY = 2, #ifdef VIR_ENUM_SENTINELS VIR_DOMAIN_BLOCK_JOB_TYPE_LAST @@ -1950,9 +1953,11 @@ typedef enum { * virDomainBlockJobAbortFlags: * * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC: Request only, do not wait for completion + * VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT: Pivot to mirror when ending a copy job */ typedef enum { VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC = 1 << 0, + VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT = 1 << 1, } virDomainBlockJobAbortFlags; /* An iterator for monitoring block job operations */ @@ -1983,6 +1988,21 @@ int virDomainBlockJobSetSpeed(virDomainPtr dom, const char *disk, int virDomainBlockPull(virDomainPtr dom, const char *disk, unsigned long bandwidth, unsigned int flags); + +/** + * virDomainBlockRebaseFlags: + * + * Flags available for virDomainBlockRebase(). + */ +typedef enum { + VIR_DOMAIN_BLOCK_REBASE_SHALLOW = 1 << 0, /* Limit copy to top of source + backing chain */ + VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT = 1 << 1, /* Reuse existing external + file for a copy */ + VIR_DOMAIN_BLOCK_REBASE_COPY_RAW = 1 << 2, /* Make destination file raw */ + VIR_DOMAIN_BLOCK_REBASE_COPY = 1 << 3, /* Start a copy job */ +} virDomainBlockRebaseFlags; + int virDomainBlockRebase(virDomainPtr dom, const char *disk, const char *base, unsigned long bandwidth, unsigned int flags); diff --git a/include/libvirt/virterror.h b/include/libvirt/virterror.h index e04d29e..070fdb5 100644 --- a/include/libvirt/virterror.h +++ b/include/libvirt/virterror.h @@ -249,6 +249,7 @@ typedef enum { VIR_ERR_NO_DOMAIN_METADATA = 80, /* The metadata is not present */ VIR_ERR_MIGRATE_UNSAFE = 81, /* Migration is not safe */ VIR_ERR_OVERFLOW = 82, /* integer overflow */ + VIR_ERR_BLOCK_COPY_ACTIVE = 83, /* action prevented by block copy job */ } virErrorNumber; /** diff --git a/src/libvirt.c b/src/libvirt.c index af22232..d4b648c 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -2696,6 +2696,10 @@ error: * A save file can be inspected or modified slightly with * virDomainSaveImageGetXMLDesc() and virDomainSaveImageDefineXML(). * + * Some hypervisors may prevent this operation if there is a current + * block copy operation; in that case, use virDomainBlockJobAbort() + * to stop the block copy first. + * * Returns 0 in case of success and -1 in case of failure. */ int @@ -7891,6 +7895,11 @@ error: * virDomainUndefine(). A previous definition for this domain would be * overriden if it already exists. * + * Some hypervisors may prevent this operation if there is a current + * block copy operation on a transient domain with the same id as the + * domain being defined; in that case, use virDomainBlockJobAbort() to + * stop the block copy first. + * * Returns NULL in case of error, a pointer to the domain otherwise */ virDomainPtr @@ -9424,6 +9433,10 @@ error: * return failure if LIVE is specified but it only supports removing the * persisted device allocation. * + * Some hypervisors may prevent this operation if there is a current + * block copy operation on the device being detached; in that case, + * use virDomainBlockJobAbort() to stop the block copy first. + * * Returns 0 in case of success, -1 in case of failure. */ int @@ -17124,6 +17137,10 @@ virDomainSnapshotGetConnect(virDomainSnapshotPtr snapshot) * that it is still possible to fail after disks have changed, but only * in the much rarer cases of running out of memory or disk space). * + * Some hypervisors may prevent this operation if there is a current + * block copy operation; in that case, use virDomainBlockJobAbort() + * to stop the block copy first. + * * Returns an (opaque) virDomainSnapshotPtr on success, NULL on failure. */ virDomainSnapshotPtr @@ -17913,13 +17930,24 @@ error: * can be found by calling virDomainGetXMLDesc() and inspecting * elements within //domain/devices/disk. * - * By default, this function performs a synchronous operation and the caller + * If the current block job for @disk is VIR_DOMAIN_BLOCK_JOB_TYPE_PULL, then + * by default, this function performs a synchronous operation and the caller * may assume that the operation has completed when 0 is returned. However, * BlockJob operations may take a long time to complete, and during this time * further domain interactions may be unresponsive. To avoid this problem, * pass VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC in the @flags argument to enable * asynchronous behavior. Either way, when the job has been cancelled, a * BlockJob event will be emitted, with status VIR_DOMAIN_BLOCK_JOB_CANCELLED. + * In this usage, @flags must not contain VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT. + * + * If the current block job for @disk is VIR_DOMAIN_BLOCK_JOB_TYPE_COPY, then + * the default is to abort the mirroring and revert to the source disk; + * adding @flags of VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT causes this call to + * fail with VIR_ERR_BLOCK_COPY_ACTIVE if the copy is not fully populated, + * otherwise it will swap the disk over to the copy to end the mirroring. An + * event will be issued when the job is ended, and it is possible to use + * VIR_DOMAIN_BLOCK_JOB_ABORT_SYNC to control whether this command waits + * for the completion of the job. * * Returns -1 in case of failure, 0 when successful. */ @@ -18169,19 +18197,55 @@ error: * @disk: path to the block device, or device shorthand * @base: path to backing file to keep, or NULL for no backing file * @bandwidth: (optional) specify copy bandwidth limit in Mbps - * @flags: extra flags; not used yet, so callers should always pass 0 + * @flags: bitwise-OR of virDomainBlockRebaseFlags * * Populate a disk image with data from its backing image chain, and - * setting the backing image to @base. @base must be the absolute + * setting the backing image to @base, or alternatively copy an entire + * backing chain to a new file @base. + * + * When @flags is 0, this starts a pull, where @base must be the absolute * path of one of the backing images further up the chain, or NULL to * convert the disk image so that it has no backing image. Once all * data from its backing image chain has been pulled, the disk no * longer depends on those intermediate backing images. This function * pulls data for the entire device in the background. Progress of - * the operation can be checked with virDomainGetBlockJobInfo() and - * the operation can be aborted with virDomainBlockJobAbort(). When - * finished, an asynchronous event is raised to indicate the final - * status. + * the operation can be checked with virDomainGetBlockJobInfo() with a + * job type of VIR_DOMAIN_BLOCK_JOB_TYPE_PULL, and the operation can be + * aborted with virDomainBlockJobAbort(). When finished, an asynchronous + * event is raised to indicate the final status, and the job no longer + * exists. + * + * When @flags includes VIR_DOMAIN_BLOCK_REBASE_COPY, this starts a copy, + * where @base must be the name of a new file to copy the chain to. By + * default, the copy will pull the entire source chain into the destination + * file, but if @flags also contains VIR_DOMAIN_BLOCK_REBASE_SHALLOW, then + * only the top of the source chain will be copied (the source and + * destination have a common backing file). By default, @base will be + * created with the same file format as the source, but this can be altered + * by adding VIR_DOMAIN_BLOCK_REBASE_COPY_RAW to force the copy to be raw + * (does not make sense with the shallow flag unless the source is also raw), + * or by using VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT to reuse an existing file + * with initial contents identical to the backing file of the source (this + * allows a management app to pre-create files with relative backing file + * names, rather than the default of absolute backing file names; it is + * generally used with the shallow flag, since otherwise the destination + * file must start with empty contents). + * + * A copy job has two parts; in the first phase, the @bandwidth parameter + * affects how fast the source is pulled into the destination, and the job + * can only be canceled by reverting to the source file; progress in this + * phase can be tracked via the virDomainBlockJobInfo() command, with a + * job type of VIR_DOMAIN_BLOCK_JOB_TYPE_COPY. The job transitions to the + * second phase when the job info states cur == end, and remains alive to + * mirror all further changes to both source and destination. The user + * must call virDomainBlockJobAbort() to end the mirroring while choosing + * whether to revert to source or pivot to the destination. An event is + * issued when the job ends, and in the future, an event may be added when + * the job transitions from pulling to mirroring. + * + * Some hypervisors will restrict certain actions, such as virDomainSave() + * or virDomainDetachDevice(), while a copy job is active; they may + * also restrict a copy job to transient domains. * * The @disk parameter is either an unambiguous source name of the * block device (the <source file='...'/> sub-element, such as @@ -18195,7 +18259,8 @@ error: * suitable default. Some hypervisors do not support this feature and will * return an error if bandwidth is not 0. * - * When @base is NULL, this is identical to virDomainBlockPull(). + * When @base is NULL and @flags is 0, this is identical to + * virDomainBlockPull(). * * Returns 0 if the operation has started, -1 on failure. */ @@ -18228,6 +18293,20 @@ int virDomainBlockRebase(virDomainPtr dom, const char *disk, goto error; } + if (flags & VIR_DOMAIN_BLOCK_REBASE_COPY) { + if (!base) { + virLibDomainError(VIR_ERR_INVALID_ARG, + _("base is required when starting a copy")); + goto error; + } + } else if (flags & (VIR_DOMAIN_BLOCK_REBASE_SHALLOW | + VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT | + VIR_DOMAIN_BLOCK_REBASE_COPY_RAW)) { + virLibDomainError(VIR_ERR_INVALID_ARG, + _("use of flags requires a copy job")); + goto error; + } + if (conn->driver->domainBlockRebase) { int ret; ret = conn->driver->domainBlockRebase(dom, disk, base, bandwidth, diff --git a/src/util/virterror.c b/src/util/virterror.c index ff9a36f..845081e 100644 --- a/src/util/virterror.c +++ b/src/util/virterror.c @@ -1250,6 +1250,12 @@ virErrorMsg(virErrorNumber error, const char *info) else errmsg = _("numerical overflow: %s"); break; + case VIR_ERR_BLOCK_COPY_ACTIVE: + if (!info) + errmsg = _("block copy still active"); + else + errmsg = _("block copy still active: %s"); + break; } return errmsg; } -- 1.7.7.6

Rather than further overloading 'blockpull', I decided to create a new virsh command to expose the new flags of virDomainBlockRebase. Someday, I'd also like to make blockpull and blockcopy have a synchronous mode, which blocks until the event happens or Ctrl-C is pressed, as well as a --verbose flag to print status updates before the job finishes - but not today. * tools/virsh.c (VSH_CMD_BLOCK_JOB_COPY): New mode. (blockJobImpl): Support new flags. (cmdBlockCopy): New command. (cmdBlockJob): Support new job info, new abort flag. * tools/virsh.pod (blockcopy, blockjob): Document the new command and flags. --- tools/virsh.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++------- tools/virsh.pod | 37 +++++++++++++++++++++++-- 2 files changed, 104 insertions(+), 14 deletions(-) diff --git a/tools/virsh.c b/tools/virsh.c index 7180b83..99f3d22 100644 --- a/tools/virsh.c +++ b/tools/virsh.c @@ -7515,16 +7515,18 @@ typedef enum { VSH_CMD_BLOCK_JOB_INFO = 1, VSH_CMD_BLOCK_JOB_SPEED = 2, VSH_CMD_BLOCK_JOB_PULL = 3, -} VSH_CMD_BLOCK_JOB_MODE; + VSH_CMD_BLOCK_JOB_COPY = 4, +} vshCmdBlockJobMode; static int blockJobImpl(vshControl *ctl, const vshCmd *cmd, - virDomainBlockJobInfoPtr info, int mode) + virDomainBlockJobInfoPtr info, int mode) { virDomainPtr dom = NULL; const char *name, *path; unsigned long bandwidth = 0; int ret = -1; + const char *base = NULL; unsigned int flags = 0; if (!vshConnectionUsability(ctl, ctl->conn)) @@ -7541,22 +7543,39 @@ blockJobImpl(vshControl *ctl, const vshCmd *cmd, goto cleanup; } - if (mode == VSH_CMD_BLOCK_JOB_ABORT) { + switch ((vshCmdBlockJobMode) mode) { + case VSH_CMD_BLOCK_JOB_ABORT: if (vshCommandOptBool(cmd, "async")) flags |= VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC; + if (vshCommandOptBool(cmd, "pivot")) + flags |= VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT; ret = virDomainBlockJobAbort(dom, path, flags); - } else if (mode == VSH_CMD_BLOCK_JOB_INFO) { + break; + case VSH_CMD_BLOCK_JOB_INFO: ret = virDomainGetBlockJobInfo(dom, path, info, 0); - } else if (mode == VSH_CMD_BLOCK_JOB_SPEED) { + break; + case VSH_CMD_BLOCK_JOB_SPEED: ret = virDomainBlockJobSetSpeed(dom, path, bandwidth, 0); - } else if (mode == VSH_CMD_BLOCK_JOB_PULL) { - const char *base = NULL; + break; + case VSH_CMD_BLOCK_JOB_PULL: if (vshCommandOptString(cmd, "base", &base) < 0) goto cleanup; if (base) ret = virDomainBlockRebase(dom, path, base, bandwidth, 0); else ret = virDomainBlockPull(dom, path, bandwidth, 0); + break; + case VSH_CMD_BLOCK_JOB_COPY: + flags |= VIR_DOMAIN_BLOCK_REBASE_COPY; + if (vshCommandOptBool(cmd, "shallow")) + flags |= VIR_DOMAIN_BLOCK_REBASE_SHALLOW; + if (vshCommandOptBool(cmd, "reuse-external")) + flags |= VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT; + if (vshCommandOptBool(cmd, "raw")) + flags |= VIR_DOMAIN_BLOCK_REBASE_COPY_RAW; + if (vshCommandOptString(cmd, "dest", &base) < 0) + goto cleanup; + ret = virDomainBlockRebase(dom, path, base, bandwidth, flags); } cleanup: @@ -7566,6 +7585,34 @@ cleanup: } /* + * "blockcopy" command + */ +static const vshCmdInfo info_block_copy[] = { + {"help", N_("Start a block copy operation.")}, + {"desc", N_("Populate a disk from its backing image.")}, + {NULL, NULL} +}; + +static const vshCmdOptDef opts_block_copy[] = { + {"domain", VSH_OT_DATA, VSH_OFLAG_REQ, N_("domain name, id or uuid")}, + {"path", VSH_OT_DATA, VSH_OFLAG_REQ, N_("Fully-qualified path of disk")}, + {"dest", VSH_OT_DATA, VSH_OFLAG_REQ, N_("path of the copy to create")}, + {"bandwidth", VSH_OT_DATA, VSH_OFLAG_NONE, N_("Bandwidth limit in MB/s")}, + {"shallow", VSH_OT_BOOL, 0, N_("make the copy share a backing chain")}, + {"reuse-external", VSH_OT_BOOL, 0, N_("reuse existing destination")}, + {"raw", VSH_OT_BOOL, 0, N_("use raw destination file")}, + {NULL, 0, 0, NULL} +}; + +static bool +cmdBlockCopy(vshControl *ctl, const vshCmd *cmd) +{ + if (blockJobImpl(ctl, cmd, NULL, VSH_CMD_BLOCK_JOB_COPY) != 0) + return false; + return true; +} + +/* * "blockpull" command */ static const vshCmdInfo info_block_pull[] = { @@ -7607,6 +7654,8 @@ static const vshCmdOptDef opts_block_job[] = { N_("Abort the active job on the specified disk")}, {"async", VSH_OT_BOOL, VSH_OFLAG_NONE, N_("don't wait for --abort to complete")}, + {"pivot", VSH_OT_BOOL, VSH_OFLAG_NONE, + N_("conclude and pivot a copy job")}, {"info", VSH_OT_BOOL, VSH_OFLAG_NONE, N_("Get active job information for the specified disk")}, {"bandwidth", VSH_OT_DATA, VSH_OFLAG_NONE, @@ -7621,7 +7670,9 @@ cmdBlockJob(vshControl *ctl, const vshCmd *cmd) virDomainBlockJobInfo info; const char *type; int ret; - bool abortMode = vshCommandOptBool(cmd, "abort"); + bool abortMode = (vshCommandOptBool(cmd, "abort") || + vshCommandOptBool(cmd, "async") || + vshCommandOptBool(cmd, "pivot")); bool infoMode = vshCommandOptBool(cmd, "info"); bool bandwidth = vshCommandOptBool(cmd, "bandwidth"); @@ -7645,10 +7696,17 @@ cmdBlockJob(vshControl *ctl, const vshCmd *cmd) if (ret == 0 || mode != VSH_CMD_BLOCK_JOB_INFO) return true; - if (info.type == VIR_DOMAIN_BLOCK_JOB_TYPE_PULL) + switch (info.type) { + case VIR_DOMAIN_BLOCK_JOB_TYPE_PULL: type = _("Block Pull"); - else + break; + case VIR_DOMAIN_BLOCK_JOB_TYPE_COPY: + type = _("Block Copy"); + break; + default: type = _("Unknown job"); + break; + } print_job_progress(type, info.end - info.cur, info.end); if (info.bandwidth != 0) @@ -17127,8 +17185,9 @@ static const vshCmdDef domManagementCmds[] = { {"autostart", cmdAutostart, opts_autostart, info_autostart, 0}, {"blkdeviotune", cmdBlkdeviotune, opts_blkdeviotune, info_blkdeviotune, 0}, {"blkiotune", cmdBlkiotune, opts_blkiotune, info_blkiotune, 0}, - {"blockpull", cmdBlockPull, opts_block_pull, info_block_pull, 0}, + {"blockcopy", cmdBlockCopy, opts_block_copy, info_block_copy, 0}, {"blockjob", cmdBlockJob, opts_block_job, info_block_job, 0}, + {"blockpull", cmdBlockPull, opts_block_pull, info_block_pull, 0}, {"blockresize", cmdBlockResize, opts_block_resize, info_block_resize, 0}, {"change-media", cmdChangeMedia, opts_change_media, info_change_media, 0}, #ifndef WIN32 diff --git a/tools/virsh.pod b/tools/virsh.pod index aac72d1..50a495d 100644 --- a/tools/virsh.pod +++ b/tools/virsh.pod @@ -638,6 +638,34 @@ currently in use by a running domain. Other contexts that require a MAC address of virtual interface (such as I<detach-interface> or I<domif-setlink>) will accept the MAC address printed by this command. +=item B<blockcopy> I<domain> I<path> I<dest> [I<bandwidth>] [I<--shallow>] +[I<--reuse-external>] [I<--raw>] + +Copy a disk backing image chain to I<dest>. By default, this command +flattens the entire chain; but if I<--shallow> is specified, the copy +shares the backing chain. + +If I<--reuse-external> is specified, then I<dest> must exist and have +contents identical to the resulting backing file (that is, it must +start with contents matching the backing file I<disk> if I<--shallow> +is used, otherwise it must start empty); this option is typically used +to set up a relative backing file name in the destination. + +The format of the destination is determined by the first match in the +following list: if I<--raw> is specified, it will be raw; if +I<--reuse-external> is specified, the existing destination is probed +for a format; and in all other cases, the destination format will +match the source format. + +The copy runs in the background; initially, the job must copy all data +from the source, and during this phase, the job can only be canceled to +revert back to the source disk. After this phase completes, both the +source and the destination remain mirrored until a call to B<blockjob> +with the I<--abort> and I<--pivot> flags pivots over to the copy. + +I<path> specifies fully-qualified path of the disk. +I<bandwidth> specifies copying bandwidth limit in Mbps. + =item B<blockpull> I<domain> I<path> [I<bandwidth>] [I<base>] Populate a disk from its backing image chain. By default, this command @@ -686,16 +714,19 @@ Both I<--live> and I<--current> flags may be given, but I<--current> is exclusive. If no flag is specified, behavior is different depending on hypervisor. -=item B<blockjob> I<domain> I<path> { I<--abort> [I<--async>] | +=item B<blockjob> I<domain> I<path> { [I<--abort>] [I<--async>] [I<--pivot] | [I<--info>] | I<bandwidth> } -Manage active block operations. If no mode is chosen, I<--info> is assumed. +Manage active block operations. If no mode is chosen, I<--info> is assumed, +except that I<--async> and I<--pivot> imply I<--abort>. I<path> specifies fully-qualified path of the disk. If I<--abort> is specified, the active job on the specified disk will be aborted. If I<--async> is also specified, this command will return -immediately, rather than waiting for the cancelation to complete. +immediately, rather than waiting for the cancelation to complete. If +I<--pivot> is specified, this requests that an active copy job +be pivoted over to the new copy. If I<--info> is specified, the active job information on the specified disk will be printed. I<bandwidth> can be used to set bandwidth limit for the active job. -- 1.7.7.6

QUESTION: should we parse and ignore <mirror> on input, rather than rejecting it? By rejecting it, I can't add a unit test, since the unit test framework currently doesn't expose a way to trigger internal parsing. In order to track a block copy job across libvirtd restarts, we need to save internal XML that tracks the name of the file holding the mirror. Displaying this name in dumpxml might also be useful to the user, even if we don't yet have a way to (re-) start a domain with mirroring enabled up front. This is done with a new <mirror> sub-element to <disk>, as in: <disk type='file' device='disk'> <driver name='qemu' type='raw'/> <source file='/var/lib/libvirt/images/original.img'/> <mirror file='/var/lib/libvirt/images/copy.img' format='qcow2'/> ... </disk> * docs/schemas/domaincommon.rng (diskspec): Add diskMirror. * docs/formatdomain.html.in (elementsDisks): Document it. * src/conf/domain_conf.h (_virDomainDiskDef): New members. * src/conf/domain_conf.c (virDomainDiskDefFree): Clean them. (virDomainDiskDefParseXML): Parse them, but only internally. (virDomainDiskDefFormat): Output them. --- docs/formatdomain.html.in | 11 ++++++++++ docs/schemas/domaincommon.rng | 19 +++++++++++++++-- src/conf/domain_conf.c | 42 +++++++++++++++++++++++++++++++++++++++++ src/conf/domain_conf.h | 5 ++++ 4 files changed, 74 insertions(+), 3 deletions(-) diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in index a382d30..534c44b 100644 --- a/docs/formatdomain.html.in +++ b/docs/formatdomain.html.in @@ -1296,6 +1296,17 @@ </table> <span class="since">Since 0.9.7</span> </dd> + <dt><code>mirror</code></dt> + <dd> + This element is present if the hypervisor has started a block + copy operation (via the <code>virDomainBlockCopy</code> API), + where the mirror location in attribute <code>file</code> will + eventually have the same contents as the source, and with the + file format in attribute <code>format</code> (which might + differ from the format of the source). For now, this element + only valid in output; it is rejected on + input. <span class="since">Since 0.9.12</span> + </dd> <dt><code>target</code></dt> <dd>The <code>target</code> element controls the bus / device under which the disk is exposed to the guest diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng index 0cc04af..66c91a2 100644 --- a/docs/schemas/domaincommon.rng +++ b/docs/schemas/domaincommon.rng @@ -772,6 +772,9 @@ <ref name="driver"/> </optional> <optional> + <ref name='diskMirror'/> + </optional> + <optional> <ref name="diskAuth"/> </optional> <ref name="target"/> @@ -1013,9 +1016,7 @@ </element> </define> <!-- - Disk may use a special driver for access. Currently this is - only defined for Xen for tap/aio and file, but will certainly be - extended in the future, and libvirt doesn't look for specific values. + Disk may use a special driver for access. --> <define name="driver"> <element name="driver"> @@ -3024,6 +3025,18 @@ <empty/> </element> </define> + <define name='diskMirror'> + <element name='mirror'> + <attribute name='file'> + <ref name='absFilePath'/> + </attribute> + <optional> + <attribute name='format'> + <ref name="genericName"/> + </attribute> + </optional> + </element> + </define> <define name="diskAuth"> <element name="auth"> <attribute name="username"> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index c6b97e1..38c461a 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -933,6 +933,8 @@ void virDomainDiskDefFree(virDomainDiskDefPtr def) VIR_FREE(def->dst); VIR_FREE(def->driverName); VIR_FREE(def->driverType); + VIR_FREE(def->mirror); + VIR_FREE(def->mirrorFormat); VIR_FREE(def->auth.username); if (def->auth.secretType == VIR_DOMAIN_DISK_SECRET_TYPE_USAGE) VIR_FREE(def->auth.secret.usage); @@ -3318,6 +3320,8 @@ virDomainDiskDefParseXML(virCapsPtr caps, char *ioeventfd = NULL; char *event_idx = NULL; char *copy_on_read = NULL; + char *mirror = NULL; + char *mirrorFormat = NULL; char *devaddr = NULL; virStorageEncryptionPtr encryption = NULL; char *serial = NULL; @@ -3453,6 +3457,22 @@ virDomainDiskDefParseXML(virCapsPtr caps, ioeventfd = virXMLPropString(cur, "ioeventfd"); event_idx = virXMLPropString(cur, "event_idx"); copy_on_read = virXMLPropString(cur, "copy_on_read"); + } else if ((mirror == NULL) && + (xmlStrEqual(cur->name, BAD_CAST "mirror"))) { + if (flags & VIR_DOMAIN_XML_INTERNAL_STATUS) { + mirror = virXMLPropString(cur, "file"); + if (!mirror) { + virDomainReportError(VIR_ERR_XML_ERROR, "%s", + _("mirror requires file name")); + goto error; + } + mirrorFormat = virXMLPropString(cur, "format"); + } else { + virDomainReportError(VIR_ERR_XML_ERROR, "%s", + _("Cannot handle disk mirror on " + "input yet")); + goto error; + } } else if (xmlStrEqual(cur->name, BAD_CAST "auth")) { authUsername = virXMLPropString(cur, "username"); if (authUsername == NULL) { @@ -3867,6 +3887,10 @@ virDomainDiskDefParseXML(virCapsPtr caps, driverName = NULL; def->driverType = driverType; driverType = NULL; + def->mirror = mirror; + mirror = NULL; + def->mirrorFormat = mirrorFormat; + mirrorFormat = NULL; def->encryption = encryption; encryption = NULL; def->serial = serial; @@ -3882,6 +3906,12 @@ virDomainDiskDefParseXML(virCapsPtr caps, !(def->driverName = strdup(caps->defaultDiskDriverName))) goto no_memory; + + if (def->mirror && !def->mirrorFormat && + caps->defaultDiskDriverType && + !(def->mirrorFormat = strdup(caps->defaultDiskDriverType))) + goto no_memory; + if (def->info.type == VIR_DOMAIN_DEVICE_ADDRESS_TYPE_NONE && virDomainDiskDefAssignAddress(caps, def) < 0) goto error; @@ -3907,6 +3937,8 @@ cleanup: VIR_FREE(authUsage); VIR_FREE(driverType); VIR_FREE(driverName); + VIR_FREE(mirror); + VIR_FREE(mirrorFormat); VIR_FREE(cachetag); VIR_FREE(error_policy); VIR_FREE(rerror_policy); @@ -10829,6 +10861,16 @@ virDomainDiskDefFormat(virBufferPtr buf, } } + /* For now, mirroring is currently output-only: we always output + * it, but refuse to parse it on input except for internal parse + * on libvirtd restart. */ + if (def->mirror) { + virBufferEscapeString(buf, " <mirror file='%s'", def->mirror); + if (def->mirrorFormat) + virBufferAsprintf(buf, " format='%s'", def->mirrorFormat); + virBufferAddLit(buf, ">\n"); + } + virBufferAsprintf(buf, " <target dev='%s' bus='%s'", def->dst, bus); if ((def->device == VIR_DOMAIN_DISK_DEVICE_FLOPPY || diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index 0eed60e..abc953d 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -563,6 +563,10 @@ struct _virDomainDiskDef { char *driverName; char *driverType; + char *mirror; + char *mirrorFormat; + bool mirroring; + virDomainBlockIoTuneInfo blkdeviotune; char *serial; @@ -2125,6 +2129,7 @@ VIR_ENUM_DECL(virDomainDiskIo) VIR_ENUM_DECL(virDomainDiskSecretType) VIR_ENUM_DECL(virDomainDiskSnapshot) VIR_ENUM_DECL(virDomainDiskTray) +VIR_ENUM_DECL(virDomainDiskMirrorStage) VIR_ENUM_DECL(virDomainIoEventFd) VIR_ENUM_DECL(virDomainVirtioEventIdx) VIR_ENUM_DECL(virDomainDiskCopyOnRead) -- 1.7.7.6

For now, disk migration via block copy job is not implemented. But when we do implement it, we have to deal with the fact that qemu does not provide an easy way to re-start a qemu process with mirroring still intact (it _might_ be possible by using qemu -S then an initial 'drive-mirror' with disk reuse before starting the domain, but that gets hairy). Even something like 'virDomainSave' becomes hairy, if you realize the implications that 'virDomainRestore' would be stuck with recreating the same mirror layout. But if we step back and look at the bigger picture, we realize that the initial client of live storage migration via disk mirroring is oVirt, which always uses transient domains, and that if a transient domain is destroyed while a mirror exists, oVirt can easily restart the storage migration by creating a new domain that visits just the source storage, with no loss in data. We can make life a lot easier by being cowards, and forbidding certain operations on a domain. This patch guarantees that we never get in a state where we would have to restart a domain with a mirroring block copy, by preventing saves, snapshots, hot unplug of a disk in use, and conversion to a persistent domain. * src/conf/domain_conf.h (virDomainHasDiskMirror): New prototype. * src/conf/domain_conf.c (virDomainHasDiskMirror): New function. * src/libvirt_private.syms (domain_conf.h): Export it. * src/qemu/qemu_driver.c (qemuDomainSaveInternal) (qemuDomainSnapshotCreateXML, qemuDomainRevertToSnapshot) (qemuDomainBlockJobImpl, qemudDomainDefine): Prevent dangerous actions while block copy is already in action. * src/qemu/qemu_hotplug.c (qemuDomainDetachDiskDevice): Likewise. --- src/conf/domain_conf.c | 12 ++++++++++++ src/conf/domain_conf.h | 1 + src/libvirt_private.syms | 1 + src/qemu/qemu_driver.c | 31 ++++++++++++++++++++++++++++++- src/qemu/qemu_hotplug.c | 7 +++++++ 5 files changed, 51 insertions(+), 1 deletions(-) diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c index 38c461a..e67be42 100644 --- a/src/conf/domain_conf.c +++ b/src/conf/domain_conf.c @@ -7183,6 +7183,18 @@ virDomainDiskRemoveByName(virDomainDefPtr def, const char *name) return virDomainDiskRemove(def, i); } +/* Return true if VM has at least one disk involved in a current block + * copy job (that is, with a <mirror> element in the disk xml). */ +bool +virDomainHasDiskMirror(virDomainObjPtr vm) +{ + int i; + for (i = 0; i < vm->def->ndisks; i++) + if (vm->def->disks[i]->mirror) + return true; + return false; +} + int virDomainNetInsert(virDomainDefPtr def, virDomainNetDefPtr net) { if (VIR_REALLOC_N(def->nets, def->nnets + 1) < 0) diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h index abc953d..77c501c 100644 --- a/src/conf/domain_conf.h +++ b/src/conf/domain_conf.h @@ -1954,6 +1954,7 @@ virDomainDiskDefPtr virDomainDiskRemove(virDomainDefPtr def, size_t i); virDomainDiskDefPtr virDomainDiskRemoveByName(virDomainDefPtr def, const char *name); +bool virDomainHasDiskMirror(virDomainObjPtr vm); int virDomainNetIndexByMac(virDomainDefPtr def, const unsigned char *mac); int virDomainNetInsert(virDomainDefPtr def, virDomainNetDefPtr net); diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index a90f8a0..570940d 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -354,6 +354,7 @@ virDomainGraphicsSpiceZlibCompressionTypeFromString; virDomainGraphicsSpiceZlibCompressionTypeToString; virDomainGraphicsTypeFromString; virDomainGraphicsTypeToString; +virDomainHasDiskMirror; virDomainHostdevDefAlloc; virDomainHostdevDefClear; virDomainHostdevDefFree; diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index f3bd043..4421506 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -2558,6 +2558,11 @@ qemuDomainSaveInternal(struct qemud_driver *driver, virDomainPtr dom, "%s", _("domain is marked for auto destroy")); goto cleanup; } + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, "%s", + _("domain has active block copy job")); + goto cleanup; + } memset(&header, 0, sizeof(header)); memcpy(header.magic, QEMUD_SAVE_PARTIAL, sizeof(header.magic)); @@ -4947,6 +4952,12 @@ static virDomainPtr qemudDomainDefine(virConnectPtr conn, const char *xml) { goto cleanup; } def = NULL; + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, "%s", + _("domain has active block copy job")); + virDomainObjAssignDef(vm, NULL, false); + goto cleanup; + } vm->persistent = 1; if (virDomainSaveConfig(driver->configDir, @@ -10262,6 +10273,12 @@ qemuDomainSnapshotCreateXML(virDomainPtr domain, "%s", _("domain is marked for auto destroy")); goto cleanup; } + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, "%s", + _("domain has active block copy job")); + goto cleanup; + } + if (!vm->persistent && (flags & VIR_DOMAIN_SNAPSHOT_CREATE_HALT)) { qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", _("cannot halt after transient domain snapshot")); @@ -10869,6 +10886,11 @@ static int qemuDomainRevertToSnapshot(virDomainSnapshotPtr snapshot, _("no domain with matching uuid '%s'"), uuidstr); goto cleanup; } + if (virDomainHasDiskMirror(vm)) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, "%s", + _("domain has active block copy job")); + goto cleanup; + } snap = virDomainSnapshotFindByName(&vm->snapshots, snapshot->name); if (!snap) { @@ -11607,6 +11629,7 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, char uuidstr[VIR_UUID_STRING_BUFLEN]; char *device = NULL; int ret = -1; + int idx; qemuDriverLock(driver); virUUIDFormat(dom->uuid, uuidstr); @@ -11617,10 +11640,16 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, goto cleanup; } - device = qemuDiskPathToAlias(vm, path, NULL); + device = qemuDiskPathToAlias(vm, path, &idx); if (!device) { goto cleanup; } + if (mode == BLOCK_JOB_PULL && vm->def->disks[idx]->mirror) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, + _("disk '%s' already in active block copy job"), + vm->def->disks[idx]->dst); + goto cleanup; + } if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_MODIFY) < 0) goto cleanup; diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c index 857b980..98fa8f8 100644 --- a/src/qemu/qemu_hotplug.c +++ b/src/qemu/qemu_hotplug.c @@ -1721,6 +1721,13 @@ int qemuDomainDetachDiskDevice(struct qemud_driver *driver, detach = vm->def->disks[i]; + if (detach->mirror) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, + _("disk '%s' is in an active block copy job"), + detach->dst); + goto cleanup; + } + if (qemuCgroupControllerActive(driver, VIR_CGROUP_CONTROLLER_DEVICES)) { if (virCgroupForDomain(driver->cgroup, vm->def->name, &cgroup, 0) != 0) { qemuReportError(VIR_ERR_INTERNAL_ERROR, -- 1.7.7.6

The new block copy storage migration sequence requires both the 'drive-mirror' action in 'transaction' (present if the 'drive-mirror' standalone monitor command also exists) and the 'drive-reopen' monitor command (it would be nice if that were also part of a 'transaction', but the initial qemu implementation has it standalone only). As of this[1] qemu email, both commands have been proposed but not yet incorporated into the tree, so there is a risk that qemu 1.1 will not have these commands, or will have something subtly different. [1]https://lists.gnu.org/archive/html/qemu-devel/2012-03/msg01524.html * src/qemu/qemu_capabilities.h (QEMU_CAPS_DRIVE_MIRROR) (QEMU_CAPS_DRIVE_REOPEN): New bits. * src/qemu/qemu_capabilities.c (qemuCaps): Name them. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONCheckCommands): Set them. (qemuMonitorJSONDriveMirror, qemuMonitorDriveReopen): New functions. * src/qemu/qemu_monitor_json.h (qemuMonitorJSONDriveMirror) (qemuMonitorDriveReopen): Declare them. * src/qemu/qemu_monitor.c (qemuMonitorDriveMirror) (qemuMonitorDriveReopen): New passthroughs. * src/qemu/qemu_monitor.h (qemuMonitorDriveMirror) (qemuMonitorDriveReopen): Declare them. --- src/qemu/qemu_capabilities.c | 3 + src/qemu/qemu_capabilities.h | 2 + src/qemu/qemu_monitor.c | 50 ++++++++++++++++++++++++ src/qemu/qemu_monitor.h | 21 ++++++++++ src/qemu/qemu_monitor_json.c | 86 ++++++++++++++++++++++++++++++++++++++--- src/qemu/qemu_monitor_json.h | 19 ++++++++- 6 files changed, 172 insertions(+), 9 deletions(-) diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c index 0e09d6d..1938ae4 100644 --- a/src/qemu/qemu_capabilities.c +++ b/src/qemu/qemu_capabilities.c @@ -156,6 +156,9 @@ VIR_ENUM_IMPL(qemuCaps, QEMU_CAPS_LAST, "scsi-disk.channel", "scsi-block", "transaction", + + "drive-mirror", /* 90 */ + "drive-reopen", ); struct qemu_feature_flags { diff --git a/src/qemu/qemu_capabilities.h b/src/qemu/qemu_capabilities.h index 78cdbe0..405bf2a 100644 --- a/src/qemu/qemu_capabilities.h +++ b/src/qemu/qemu_capabilities.h @@ -124,6 +124,8 @@ enum qemuCapsFlags { QEMU_CAPS_SCSI_DISK_CHANNEL = 87, /* Is scsi-disk.channel available? */ QEMU_CAPS_SCSI_BLOCK = 88, /* -device scsi-block */ QEMU_CAPS_TRANSACTION = 89, /* transaction monitor command */ + QEMU_CAPS_DRIVE_MIRROR = 90, /* drive-mirror monitor command */ + QEMU_CAPS_DRIVE_REOPEN = 91, /* drive-reopen monitor command */ QEMU_CAPS_LAST, /* this must always be the last item */ }; diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index e1a8d4c..75694b7 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -2685,6 +2685,32 @@ qemuMonitorDiskSnapshot(qemuMonitorPtr mon, virJSONValuePtr actions, return ret; } +/* Add the drive-mirror action to a transaction. */ +int +qemuMonitorDriveMirror(qemuMonitorPtr mon, virJSONValuePtr actions, + const char *device, const char *file, + const char *format, int mode) +{ + int ret = -1; + + VIR_DEBUG("mon=%p, actions=%p, device=%s, file=%s, format=%s, mode=%o", + mon, actions, device, file, format, mode); + + if (!mon) { + qemuReportError(VIR_ERR_INVALID_ARG, "%s", + _("monitor must not be NULL")); + return -1; + } + + if (mon->json) + ret = qemuMonitorJSONDriveMirror(mon, actions, device, file, format, + mode); + else + qemuReportError(VIR_ERR_INVALID_ARG, "%s", + _("drive-mirror requires JSON monitor")); + return ret; +} + /* Use the transaction QMP command to run atomic snapshot commands. */ int qemuMonitorTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) @@ -2701,6 +2727,30 @@ qemuMonitorTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) return ret; } +/* Use the drive-reopen monitor command. */ +int +qemuMonitorDriveReopen(qemuMonitorPtr mon, const char *device, + const char *file, const char *format) +{ + int ret = -1; + + VIR_DEBUG("mon=%p, device=%s, file=%s, format=%s", + mon, device, file, format); + + if (!mon) { + qemuReportError(VIR_ERR_INVALID_ARG, "%s", + _("monitor must not be NULL")); + return -1; + } + + if (mon->json) + ret = qemuMonitorJSONDriveReopen(mon, device, file, format); + else + qemuReportError(VIR_ERR_INVALID_ARG, "%s", + _("drive-reopen requires JSON monitor")); + return ret; +} + int qemuMonitorArbitraryCommand(qemuMonitorPtr mon, const char *cmd, char **reply, diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 2e6ac79..a432b6f 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -508,8 +508,29 @@ int qemuMonitorDiskSnapshot(qemuMonitorPtr mon, const char *file, const char *format, bool reuse); + +typedef enum { + QEMU_MONITOR_DRIVE_MIRROR_ABSOLUTE, + QEMU_MONITOR_DRIVE_MIRROR_EXISTING, + QEMU_MONITOR_DRIVE_MIRROR_NO_BACKING, + + QEMU_MONITOR_DRIVE_MIRROR_LAST +} qemuMonitorDriveMirrorMode; + +int qemuMonitorDriveMirror(qemuMonitorPtr mon, + virJSONValuePtr actions, + const char *device, + const char *file, + const char *format, + int mode) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(3) ATTRIBUTE_NONNULL(4); int qemuMonitorTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); +int qemuMonitorDriveReopen(qemuMonitorPtr mon, + const char *device, + const char *file, + const char *format) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); int qemuMonitorArbitraryCommand(qemuMonitorPtr mon, const char *cmd, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 13c35c8..ffa6c0c 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -967,12 +967,14 @@ qemuMonitorJSONCheckCommands(qemuMonitorPtr mon, if (STREQ(name, "human-monitor-command")) *json_hmp = 1; - - if (STREQ(name, "system_wakeup")) + else if (STREQ(name, "system_wakeup")) qemuCapsSet(qemuCaps, QEMU_CAPS_WAKEUP); - - if (STREQ(name, "transaction")) + else if (STREQ(name, "transaction")) qemuCapsSet(qemuCaps, QEMU_CAPS_TRANSACTION); + else if (STREQ(name, "drive-mirror")) + qemuCapsSet(qemuCaps, QEMU_CAPS_DRIVE_MIRROR); + else if (STREQ(name, "drive-reopen")) + qemuCapsSet(qemuCaps, QEMU_CAPS_DRIVE_REOPEN); } ret = 0; @@ -3142,7 +3144,7 @@ qemuMonitorJSONDiskSnapshot(qemuMonitorPtr mon, virJSONValuePtr actions, const char *device, const char *file, const char *format, bool reuse) { - int ret; + int ret = -1; virJSONValuePtr cmd; virJSONValuePtr reply = NULL; @@ -3160,14 +3162,13 @@ qemuMonitorJSONDiskSnapshot(qemuMonitorPtr mon, virJSONValuePtr actions, if (actions) { if (virJSONValueArrayAppend(actions, cmd) < 0) { virReportOOMError(); - ret = -1; } else { ret = 0; cmd = NULL; } } else { if ((ret = qemuMonitorJSONCommand(mon, cmd, &reply)) < 0) - goto cleanup; + goto cleanup; if (qemuMonitorJSONHasError(reply, "CommandNotFound") && qemuMonitorCheckHMP(mon, "snapshot_blkdev")) { @@ -3185,6 +3186,50 @@ cleanup: return ret; } +VIR_ENUM_DECL(qemuMonitorDriveMirror) +VIR_ENUM_IMPL(qemuMonitorDriveMirror, QEMU_MONITOR_DRIVE_MIRROR_LAST, + "absolute-paths", "exsisting", "no-backing-file"); + +int +qemuMonitorJSONDriveMirror(qemuMonitorPtr mon ATTRIBUTE_UNUSED, + virJSONValuePtr actions, + const char *device, const char *file, + const char *format, int mode) +{ + int ret = -1; + virJSONValuePtr cmd; + virJSONValuePtr reply = NULL; + + cmd = qemuMonitorJSONMakeCommandRaw(actions != NULL, + "drive-mirror", + "s:device", device, + "s:target", file, + "s:mode", + qemuMonitorDriveMirrorTypeToString(mode), + format ? "s:format" : NULL, format, + NULL); + if (!cmd) + return -1; + + if (actions) { + if (virJSONValueArrayAppend(actions, cmd) < 0) { + virReportOOMError(); + } else { + cmd = NULL; + ret = 0; + } + } else { + if ((ret = qemuMonitorJSONCommand(mon, cmd, &reply)) < 0) + goto cleanup; + ret = qemuMonitorJSONCheckError(cmd, reply); + } + +cleanup: + virJSONValueFree(cmd); + virJSONValueFree(reply); + return ret; +} + int qemuMonitorJSONTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) { @@ -3213,6 +3258,33 @@ cleanup: return ret; } +int +qemuMonitorJSONDriveReopen(qemuMonitorPtr mon, const char *device, + const char *file, const char *format) +{ + int ret; + virJSONValuePtr cmd; + virJSONValuePtr reply = NULL; + + cmd = qemuMonitorJSONMakeCommand("drive-reopen", + "s:device", device, + "s:new-image-file", file, + format ? "s:format" : NULL, format, + NULL); + if (!cmd) + return -1; + + if ((ret = qemuMonitorJSONCommand(mon, cmd, &reply)) < 0) + goto cleanup; + + ret = qemuMonitorJSONCheckError(cmd, reply); + +cleanup: + virJSONValueFree(cmd); + virJSONValueFree(reply); + return ret; +} + int qemuMonitorJSONArbitraryCommand(qemuMonitorPtr mon, const char *cmd_str, char **reply_str, diff --git a/src/qemu/qemu_monitor_json.h b/src/qemu/qemu_monitor_json.h index a0f67aa..136c0f3 100644 --- a/src/qemu/qemu_monitor_json.h +++ b/src/qemu/qemu_monitor_json.h @@ -230,8 +230,23 @@ int qemuMonitorJSONDiskSnapshot(qemuMonitorPtr mon, const char *device, const char *file, const char *format, - bool reuse); -int qemuMonitorJSONTransaction(qemuMonitorPtr mon, virJSONValuePtr actions); + bool reuse) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(3) + ATTRIBUTE_NONNULL(4) ATTRIBUTE_NONNULL(5); +int qemuMonitorJSONDriveMirror(qemuMonitorPtr mon, + virJSONValuePtr actions, + const char *device, + const char *file, + const char *format, + int mode) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(3) ATTRIBUTE_NONNULL(4); +int qemuMonitorJSONTransaction(qemuMonitorPtr mon, virJSONValuePtr actions) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); +int qemuMonitorJSONDriveReopen(qemuMonitorPtr mon, + const char *device, + const char *file, + const char *format) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); int qemuMonitorJSONArbitraryCommand(qemuMonitorPtr mon, const char *cmd_str, -- 1.7.7.6

Handle the new type of block copy event and info. Of course, this patch does nothing until a later patch actually allows the creation/abort of a block copy job. * src/qemu/qemu_monitor_json.c (qemuMonitorJSONHandleBlockJobImpl) (qemuMonitorJSONGetBlockJobInfoOne): Translate new job type. * src/qemu/qemu_driver.c (qemuDomainBlockJobImpl): Snoop a successful info query to save effort on a pivot request. --- src/qemu/qemu_driver.c | 6 ++++++ src/qemu/qemu_monitor_json.c | 4 ++++ 2 files changed, 10 insertions(+), 0 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 4421506..5dc263c 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11676,6 +11676,12 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, if (ret < 0) goto endjob; + /* Snoop block copy operations, so future cancel operations can + * avoid checking if pivot is safe. */ + if (mode == BLOCK_JOB_INFO && ret == 1 && vm->def->disks[idx]->mirror && + info->cur == info->end && info->type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) + vm->def->disks[idx]->mirroring = true; + /* Qemu provides asynchronous block job cancellation, but without * the VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag libvirt guarantees a * synchronous operation. Provide this behavior by waiting here, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index ffa6c0c..bb080e5 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -788,6 +788,8 @@ qemuMonitorJSONHandleBlockJobImpl(qemuMonitorPtr mon, if (STREQ(type_str, "stream")) type = VIR_DOMAIN_BLOCK_JOB_TYPE_PULL; + else if (STREQ(type_str, "mirror")) + type = VIR_DOMAIN_BLOCK_JOB_TYPE_COPY; switch (event) { case VIR_DOMAIN_BLOCK_JOB_COMPLETED: @@ -3411,6 +3413,8 @@ static int qemuMonitorJSONGetBlockJobInfoOne(virJSONValuePtr entry, } if (STREQ(type, "stream")) info->type = VIR_DOMAIN_BLOCK_JOB_TYPE_PULL; + else if (STREQ(type, "mirror")) + info->type = VIR_DOMAIN_BLOCK_JOB_TYPE_COPY; else info->type = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN; -- 1.7.7.6

This is the bare minimum to end a copy job (of course, until a later patch adds the ability to start a copy job, this patch doesn't do much in isolation; I've just split the patches to ease the review). This patch intentionally avoids SELinux, lock manager, and audit actions, saving that for a later patch that affects the overall lifecycle of a disk copy. In particular, I'm still fuzzy on the exact qemu error semantics, and whether I need to make more of an effort after a 'drive-reopen' fails. When a mirror job is started, cancelling the job safely reverts back to the source disk, regardless of whether the destination is in phase 1 (streaming, in which case the destination is worthless) or phase 2 (mirroring, in which case the destination is sync'd up to the source at the time of the cancel). Our existing code does just fine in either phase, other than some bookkeeping cleanup. Pivoting the job requires the use of the new 'drive-reopen' command. Here, failure of the command is potentially catastrophic to the domain, since it rips out the old disk before attempting to open the new one; if our recovery path of retrying the reopen on the original source disk also fails, the domain is hosed. If only qemu could get 'drive-reopen' inside 'transaction'... Interesting side note: while snapshot-create --disk-only creates a copy of the disk at a point in time by moving the domain on to a new file (the copy is the file now in the just-extended backing chain), blockjob --abort of a copy job creates a copy of the disk while keeping the domain on the original file. There may be potential improvements to the snapshot code to exploit block copy over multiple disks all at one point in time. And, if 'block_job_cancel' were made part of 'transaction', you could copy multiple disks at the same point in time without pausing the domain. This also implies we may want to add a --quiesce flag to the pivot operation, so that when breaking a mirror, the side of the mirror that we are abandoning is at least in a stable state with regards to guest I/O. * src/qemu/qemu_driver.c (qemuDomainBlockJobAbort): Accept new flag. (qemuDomainBlockPivot): New helper function. (qemuDomainBlockJobImpl): Implement it. --- src/qemu/qemu_driver.c | 118 +++++++++++++++++++++++++++++++++++++++++++++--- 1 files changed, 112 insertions(+), 6 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 5dc263c..dd844a1 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11618,6 +11618,86 @@ cleanup: return ret; } +/* Called while holding the VM job lock, to implement a block job + * abort with pivot; this updates the VM definition as appropriate, on + * either success or failure (although there are some forms of + * catastrophic failure that will leave the VM unusable). */ +static int +qemuDomainBlockPivot(struct qemud_driver *driver, virDomainObjPtr vm, + const char *device, virDomainDiskDefPtr disk) +{ + int ret = -1; + qemuDomainObjPrivatePtr priv = vm->privateData; + virDomainBlockJobInfo info; + + /* Probe the status, if needed. */ + if (!disk->mirroring) { + qemuDomainObjEnterMonitorWithDriver(driver, vm); + ret = qemuMonitorBlockJob(priv->mon, device, NULL, 0, &info, + BLOCK_JOB_INFO); + qemuDomainObjExitMonitorWithDriver(driver, vm); + if (ret < 0) + goto cleanup; + if (!virDomainObjIsActive(vm)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain is not running")); + goto cleanup; + } + if (ret == 1 && info.cur == info.end && + info.type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) + disk->mirroring = true; + } + + if (!disk->mirroring) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, + _("disk '%s' not ready for pivot yet"), + disk->dst); + goto cleanup; + } + + /* Attempt the pivot. */ + qemuDomainObjEnterMonitorWithDriver(driver, vm); + ret = qemuMonitorDriveReopen(priv->mon, device, disk->mirror, + disk->mirrorFormat); + qemuDomainObjExitMonitorWithDriver(driver, vm); + + /* XXX Until qemu adds support for 'drive-reopen' inside + * 'transaction', we have the remote risk of a catastrophic + * failure, where the drive-reopen fails but can't recover by + * reopening the source. Not much we can do about it. */ + + if (ret == 0) { + /* XXX We want to revoke security labels and disk lease, as + * well as audit that revocation, before dropping the original + * source. But it gets tricky if both source and mirror share + * common backing files (we want to only revoke the non-shared + * portion of the chain, and is made more difficult by the + * fact that we aren't tracking the full chain ourselves; so + * for now, we leak the access to the original. */ + VIR_FREE(disk->src); + VIR_FREE(disk->driverType); + disk->src = disk->mirror; + disk->driverType = disk->mirrorFormat; + disk->mirror = NULL; + disk->mirrorFormat = NULL; + disk->mirroring = false; + } else { + /* On failure, qemu abandons the mirror, and attempts to + * revert back to the source disk. Hopefully it was able to + * reopen things. */ + /* XXX should we be parsing the exact qemu error, or calling + * 'query-block', to see what state we really got left in + * before killing the mirroring job? And just as on the + * success case, there's security labeling to worry about. */ + VIR_FREE(disk->mirror); + VIR_FREE(disk->mirrorFormat); + disk->mirroring = false; + } + +cleanup: + return ret; +} + static int qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, unsigned long bandwidth, virDomainBlockJobInfoPtr info, @@ -11630,6 +11710,7 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, char *device = NULL; int ret = -1; int idx; + virDomainDiskDefPtr disk = NULL; qemuDriverLock(driver); virUUIDFormat(dom->uuid, uuidstr); @@ -11644,10 +11725,20 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, if (!device) { goto cleanup; } - if (mode == BLOCK_JOB_PULL && vm->def->disks[idx]->mirror) { + disk = vm->def->disks[idx]; + priv = vm->privateData; + + if (disk->mirror && mode == BLOCK_JOB_PULL) { qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, _("disk '%s' already in active block copy job"), - vm->def->disks[idx]->dst); + disk->dst); + goto cleanup; + } + if (!disk->mirror && mode == BLOCK_JOB_ABORT && + (flags & VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, + _("pivot of disk '%s' requires an active copy job"), + disk->dst); goto cleanup; } @@ -11660,8 +11751,13 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, goto endjob; } + if (disk->mirror && mode == BLOCK_JOB_ABORT && + (flags & VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)) { + ret = qemuDomainBlockPivot(driver, vm, device, disk); + goto endjob; + } + qemuDomainObjEnterMonitorWithDriver(driver, vm); - priv = vm->privateData; /* XXX - add a qemu capability check, since only qemu 1.1 or newer * supports the base argument. * XXX - libvirt should really be tracking the backing file chain @@ -11678,9 +11774,18 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, const char *base, /* Snoop block copy operations, so future cancel operations can * avoid checking if pivot is safe. */ - if (mode == BLOCK_JOB_INFO && ret == 1 && vm->def->disks[idx]->mirror && + if (mode == BLOCK_JOB_INFO && ret == 1 && disk->mirror && info->cur == info->end && info->type == VIR_DOMAIN_BLOCK_JOB_TYPE_COPY) - vm->def->disks[idx]->mirroring = true; + disk->mirroring = true; + + /* A successful block job cancelation stops any mirroring. */ + if (mode == BLOCK_JOB_ABORT && disk->mirror) { + /* XXX We should also revoke security labels and disk lease on + * the mirror, and audit that fact, before dropping things. */ + VIR_FREE(disk->mirror); + VIR_FREE(disk->mirrorFormat); + disk->mirroring = false; + } /* Qemu provides asynchronous block job cancellation, but without * the VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag libvirt guarantees a @@ -11738,7 +11843,8 @@ cleanup: static int qemuDomainBlockJobAbort(virDomainPtr dom, const char *path, unsigned int flags) { - virCheckFlags(VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC, -1); + virCheckFlags(VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC | + VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT, -1); return qemuDomainBlockJobImpl(dom, path, NULL, 0, NULL, BLOCK_JOB_ABORT, flags); } -- 1.7.7.6

Minimal patch to wire up all the pieces in the previous patches to actually enable a block copy job. By minimal, I mean that qemu creates the file (that is, no REUSE_EXT flag support yet), SELinux must be disabled, a lock manager is not informed, and the audit logs aren't updated. But those will be added as improvements in future patches. * src/qemu/qemu_driver.c (qemuDomainBlockCopy): New function. (qemuDomainBlockRebase): Call it when appropriate. --- src/qemu/qemu_driver.c | 119 +++++++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 118 insertions(+), 1 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index dd844a1..1b1c921 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11868,10 +11868,126 @@ qemuDomainBlockJobSetSpeed(virDomainPtr dom, const char *path, } static int +qemuDomainBlockCopy(virDomainPtr dom, const char *path, + const char *dest, const char *format, + unsigned long bandwidth, unsigned int flags) +{ + struct qemud_driver *driver = dom->conn->privateData; + virDomainObjPtr vm = NULL; + qemuDomainObjPrivatePtr priv; + char uuidstr[VIR_UUID_STRING_BUFLEN]; + char *device = NULL; + virDomainDiskDefPtr disk; + int ret = -1; + int idx; + int mode; + + /* Preliminaries: find the disk we are editing, sanity checks */ + virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW, -1); + + qemuDriverLock(driver); + virUUIDFormat(dom->uuid, uuidstr); + vm = virDomainFindByUUID(&driver->domains, dom->uuid); + if (!vm) { + qemuReportError(VIR_ERR_NO_DOMAIN, + _("no domain with matching uuid '%s'"), uuidstr); + goto cleanup; + } + + device = qemuDiskPathToAlias(vm, path, &idx); + if (!device) { + goto cleanup; + } + disk = vm->def->disks[idx]; + if (disk->mirror) { + qemuReportError(VIR_ERR_BLOCK_COPY_ACTIVE, + _("disk '%s' already in active block copy job"), + disk->dst); + goto cleanup; + } + + priv = vm->privateData; + if (!(qemuCapsGet(priv->qemuCaps, QEMU_CAPS_DRIVE_MIRROR) && + qemuCapsGet(priv->qemuCaps, QEMU_CAPS_DRIVE_REOPEN))) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("block copy is not supported with this QEMU binary")); + goto cleanup; + } + if (vm->persistent) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain is not transient")); + goto cleanup; + } + + if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_MODIFY) < 0) + goto cleanup; + + if (!virDomainObjIsActive(vm)) { + qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", + _("domain is not running")); + goto endjob; + } + + /* Prepare the destination file. */ + /* XXX We also need to add security labeling, lock manager lease, + * and auditing of those events, as well as to support reuse of + * existing images, including probing the existing format of an + * existing image. */ + if (!format) + format = disk->driverType; + if ((format && !(disk->mirrorFormat = strdup(format))) || + !(disk->mirror = strdup(dest))) { + virReportOOMError(); + goto endjob; + } + if (flags & VIR_DOMAIN_BLOCK_REBASE_SHALLOW) + mode = QEMU_MONITOR_DRIVE_MIRROR_ABSOLUTE; + else + mode = QEMU_MONITOR_DRIVE_MIRROR_NO_BACKING; + + /* Actually start the mirroring */ + qemuDomainObjEnterMonitorWithDriver(driver, vm); + ret = qemuMonitorDriveMirror(priv->mon, NULL, device, dest, format, mode); + if (ret == 0 && bandwidth != 0) + ret = qemuMonitorBlockJob(priv->mon, device, NULL, bandwidth, NULL, + BLOCK_JOB_SPEED_INTERNAL); + qemuDomainObjExitMonitorWithDriver(driver, vm); + +endjob: + if (ret < 0) { + VIR_FREE(disk->mirror); + VIR_FREE(disk->mirrorFormat); + } + if (qemuDomainObjEndJob(driver, vm) == 0) { + vm = NULL; + goto cleanup; + } + +cleanup: + VIR_FREE(device); + if (vm) + virDomainObjUnlock(vm); + qemuDriverUnlock(driver); + return ret; +} + +static int qemuDomainBlockRebase(virDomainPtr dom, const char *path, const char *base, unsigned long bandwidth, unsigned int flags) { - virCheckFlags(0, -1); + virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW | + VIR_DOMAIN_BLOCK_REBASE_COPY | + VIR_DOMAIN_BLOCK_REBASE_COPY_RAW, -1); + + if (flags & VIR_DOMAIN_BLOCK_REBASE_COPY) { + const char *format = NULL; + if (flags & VIR_DOMAIN_BLOCK_REBASE_COPY_RAW) + format = "raw"; + flags &= ~(VIR_DOMAIN_BLOCK_REBASE_COPY | + VIR_DOMAIN_BLOCK_REBASE_COPY_RAW); + return qemuDomainBlockCopy(dom, path, base, format, bandwidth, flags); + } + return qemuDomainBlockJobImpl(dom, path, base, bandwidth, NULL, BLOCK_JOB_PULL, flags); } @@ -11880,6 +11996,7 @@ static int qemuDomainBlockPull(virDomainPtr dom, const char *path, unsigned long bandwidth, unsigned int flags) { + virCheckFlags(0, -1); return qemuDomainBlockRebase(dom, path, NULL, bandwidth, flags); } -- 1.7.7.6

This copies some of the checks from snapshots regarding testing when a file already exists. In the process, I noticed a missing sanity check in snapshots: REUSE_EXT should require the destination to already be present. * src/qemu/qemu_driver.c (qemuDomainBlockRebase): Allow REUSE_EXT flag. (qemuDomainBlockCopy): Wire up flag, and add some sanity checks. (qemuDomainSnapshotDiskPrepare): Require destination on REUSE_EXT. --- src/qemu/qemu_driver.c | 61 +++++++++++++++++++++++++++++++++++++++++++---- 1 files changed, 55 insertions(+), 6 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 1b1c921..0e9ea41 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -9812,6 +9812,11 @@ qemuDomainSnapshotDiskPrepare(virDomainObjPtr vm, virDomainSnapshotDefPtr def, _("unable to stat for disk %s: %s"), disk->name, disk->file); goto cleanup; + } else if (allow_reuse) { + virReportSystemError(errno, + _("missing existing file for disk %s: %s"), + disk->name, disk->file); + goto cleanup; } } else if (!(S_ISBLK(st.st_mode) || !st.st_size || allow_reuse)) { qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, @@ -11881,9 +11886,11 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, int ret = -1; int idx; int mode; + struct stat st; /* Preliminaries: find the disk we are editing, sanity checks */ - virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW, -1); + virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW | + VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT, -1); qemuDriverLock(driver); virUUIDFormat(dom->uuid, uuidstr); @@ -11928,19 +11935,60 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, goto endjob; } + if ((flags & VIR_DOMAIN_BLOCK_REBASE_SHALLOW) && + STREQ_NULLABLE(format, "raw") && + STRNEQ_NULLABLE(disk->driverType, "raw")) { + qemuReportError(VIR_ERR_INVALID_ARG, + _("raw shallow copy of non-raw disk '%s' not possible"), + disk->dst); + goto endjob; + } + /* XXX this is pessimistic; we could use 'query-block' or even + * keep track of the backing chain ourselves, rather than assuming + * that all non-raw source files have a current backing image */ + if ((flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT) && + !(flags & VIR_DOMAIN_BLOCK_REBASE_SHALLOW) && + STRNEQ_NULLABLE(disk->driverType, "raw")) { + qemuReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, + _("non-shallow copy of non-raw disk '%s' to existing " + "destination not supported"), + disk->dst); + goto endjob; + } + /* Prepare the destination file. */ + if (stat(dest, &st) < 0) { + if (errno != ENOENT) { + virReportSystemError(errno, _("unable to stat for disk %s: %s"), + disk->dst, dest); + goto endjob; + } else if (flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT) { + virReportSystemError(errno, + _("missing destination file for disk %s: %s"), + disk->dst, dest); + goto endjob; + } + } else if (!(S_ISBLK(st.st_mode) || !st.st_size || + (flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT))) { + qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, + _("external destination file for disk %s already " + "exists and is not a block device: %s"), + disk->dst, dest); + goto endjob; + } + /* XXX We also need to add security labeling, lock manager lease, - * and auditing of those events, as well as to support reuse of - * existing images, including probing the existing format of an - * existing image. */ - if (!format) + * and auditing of those events. */ + if (!format && !(flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)) format = disk->driverType; if ((format && !(disk->mirrorFormat = strdup(format))) || !(disk->mirror = strdup(dest))) { virReportOOMError(); goto endjob; } - if (flags & VIR_DOMAIN_BLOCK_REBASE_SHALLOW) + if (flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT) + mode = QEMU_MONITOR_DRIVE_MIRROR_EXISTING; + else if (flags & VIR_DOMAIN_BLOCK_REBASE_SHALLOW) mode = QEMU_MONITOR_DRIVE_MIRROR_ABSOLUTE; else mode = QEMU_MONITOR_DRIVE_MIRROR_NO_BACKING; @@ -11976,6 +12024,7 @@ qemuDomainBlockRebase(virDomainPtr dom, const char *path, const char *base, unsigned long bandwidth, unsigned int flags) { virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW | + VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT | VIR_DOMAIN_BLOCK_REBASE_COPY | VIR_DOMAIN_BLOCK_REBASE_COPY_RAW, -1); -- 1.7.7.6

RHEL-only drive-mirror and drive-reopen are still under upstream qemu discussion; as a result, RHEL decided to backport things under a downstream name. Accommodate this alternate spelling. I don't think it's worth trying to support both spellings at once: if you build upstream libvirt on RHEL, you lose out on the feature, but then you are also capable of building upstream qemu for RHEL to reinstate the feature (that is, if you use RHEL, you should either stick to the distro patches, or you are assumed to be capable of building the entire virt stack yourself). * src/qemu/qemu_monitor_json.c (qemuMonitorJSONCheckCommands): Check for alternate spelling. (int qemuMonitorJSONDriveMirror, qemuMonitorJSONDriveReopen): Use that spelling. --- src/qemu/qemu_monitor_json.c | 8 ++++---- 1 files changed, 4 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index bb080e5..2718fef 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -973,9 +973,9 @@ qemuMonitorJSONCheckCommands(qemuMonitorPtr mon, qemuCapsSet(qemuCaps, QEMU_CAPS_WAKEUP); else if (STREQ(name, "transaction")) qemuCapsSet(qemuCaps, QEMU_CAPS_TRANSACTION); - else if (STREQ(name, "drive-mirror")) + else if (STREQ(name, "__com.redhat_drive-mirror")) qemuCapsSet(qemuCaps, QEMU_CAPS_DRIVE_MIRROR); - else if (STREQ(name, "drive-reopen")) + else if (STREQ(name, "__com.redhat_drive-reopen")) qemuCapsSet(qemuCaps, QEMU_CAPS_DRIVE_REOPEN); } @@ -3203,7 +3203,7 @@ qemuMonitorJSONDriveMirror(qemuMonitorPtr mon ATTRIBUTE_UNUSED, virJSONValuePtr reply = NULL; cmd = qemuMonitorJSONMakeCommandRaw(actions != NULL, - "drive-mirror", + "__com.redhat_drive-mirror", "s:device", device, "s:target", file, "s:mode", @@ -3268,7 +3268,7 @@ qemuMonitorJSONDriveReopen(qemuMonitorPtr mon, const char *device, virJSONValuePtr cmd; virJSONValuePtr reply = NULL; - cmd = qemuMonitorJSONMakeCommand("drive-reopen", + cmd = qemuMonitorJSONMakeCommand("__com.redhat_drive-reopen", "s:device", device, "s:new-image-file", file, format ? "s:format" : NULL, format, -- 1.7.7.6

This new API provides additional flexibility over what can be crammed on top of virDomainBlockRebase (namely, the ability to specify an arbitrary destination format, for things like copying qcow2 into qed without having to pre-create the destination), at the expense that it cannot be backported without bumping the .so version. * include/libvirt/libvirt.h.in (virDomainBlockCopy): New API. * src/libvirt.c (virDomainBlockCopy): Implement it. * src/libvirt_public.syms (LIBVIRT_0.9.12): Export it. * src/driver.h (virDrvDomainBlockCopy): New driver callback. * docs/apibuild.py (CParser.parseSignature): Add exception. --- docs/apibuild.py | 1 + include/libvirt/libvirt.h.in | 18 ++++++- src/driver.h | 5 ++ src/libvirt.c | 119 +++++++++++++++++++++++++++++++++++++++++- src/libvirt_public.syms | 5 ++ 5 files changed, 145 insertions(+), 3 deletions(-) diff --git a/docs/apibuild.py b/docs/apibuild.py index 1ac0281..bf06f3b 100755 --- a/docs/apibuild.py +++ b/docs/apibuild.py @@ -1650,6 +1650,7 @@ class CParser: "virDomainBlockJobSetSpeed" : (False, ("bandwidth")), "virDomainBlockPull" : (False, ("bandwidth")), "virDomainBlockRebase" : (False, ("bandwidth")), + "virDomainBlockCopy" : (False, ("bandwidth")), "virDomainMigrateGetMaxSpeed" : (False, ("bandwidth")) } def checkLongLegacyFunction(self, name, return_type, signature): diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index ac5df95..7cd5bbe 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -1937,7 +1937,7 @@ int virDomainUpdateDeviceFlags(virDomainPtr domain, * VIR_DOMAIN_BLOCK_JOB_TYPE_PULL: Block Pull (virDomainBlockPull, or * virDomainBlockRebase without flags), job ends on completion * VIR_DOMAIN_BLOCK_JOB_TYPE_COPY: Block Copy (virDomainBlockRebase with - * flags), job exists as long as mirroring is active + * flags, or virDomainBlockCopy), job exists as long as mirroring is active */ typedef enum { VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN = 0, @@ -2007,6 +2007,22 @@ int virDomainBlockRebase(virDomainPtr dom, const char *disk, const char *base, unsigned long bandwidth, unsigned int flags); +/** + * virDomainBlockCopyFlags: + * + * Flags available for virDomainBlockCopy(). + */ +typedef enum { + VIR_DOMAIN_BLOCK_COPY_SHALLOW = 1 << 0, /* Limit copy to top of source + backing chain */ + VIR_DOMAIN_BLOCK_COPY_REUSE_EXT = 1 << 1, /* Reuse existing external + file for a copy */ +} virDomainBlockCopyFlags; + +int virDomainBlockCopy(virDomainPtr dom, const char *disk, const char *dest, + const char *format, unsigned long bandwidth, + unsigned int flags); + /* Block I/O throttling support */ diff --git a/src/driver.h b/src/driver.h index 03d249b..e10ba71 100644 --- a/src/driver.h +++ b/src/driver.h @@ -788,6 +788,10 @@ typedef int (*virDrvDomainBlockRebase)(virDomainPtr dom, const char *path, const char *base, unsigned long bandwidth, unsigned int flags); +typedef int + (*virDrvDomainBlockCopy)(virDomainPtr dom, const char *path, + const char *dest, const char *format, + unsigned long bandwidth, unsigned int flags); typedef int (*virDrvSetKeepAlive)(virConnectPtr conn, @@ -1005,6 +1009,7 @@ struct _virDriver { virDrvDomainBlockJobSetSpeed domainBlockJobSetSpeed; virDrvDomainBlockPull domainBlockPull; virDrvDomainBlockRebase domainBlockRebase; + virDrvDomainBlockCopy domainBlockCopy; virDrvSetKeepAlive setKeepAlive; virDrvConnectIsAlive isAlive; virDrvNodeSuspendForDuration nodeSuspendForDuration; diff --git a/src/libvirt.c b/src/libvirt.c index d4b648c..7a8f7c0 100644 --- a/src/libvirt.c +++ b/src/libvirt.c @@ -18260,7 +18260,11 @@ error: * return an error if bandwidth is not 0. * * When @base is NULL and @flags is 0, this is identical to - * virDomainBlockPull(). + * virDomainBlockPull(). Conversely, when @flags includes + * VIR_DOMAIN_BLOCK_REBASE_COPY, this is shorthand for + * virDomainBlockCopy(dom, disk, base, + * flags & VIR_DOMAIN_BLOCK_COPY_RAW ? "raw" : NULL, bandwidth, + * flags & (VIR_DOMAIN_BLOCK_REBASE_SHALLOW|VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)). * * Returns 0 if the operation has started, -1 on failure. */ @@ -18270,7 +18274,7 @@ int virDomainBlockRebase(virDomainPtr dom, const char *disk, { virConnectPtr conn; - VIR_DOMAIN_DEBUG(dom, "disk=%s, base=%s bandwidth=%lu, flags=%x", + VIR_DOMAIN_DEBUG(dom, "disk=%s, base=%s, bandwidth=%lu, flags=%x", disk, NULLSTR(base), bandwidth, flags); virResetLastError(); @@ -18325,6 +18329,117 @@ error: /** + * virDomainBlockCopy: + * @dom: pointer to domain object + * @disk: path to the block device, or device shorthand + * @dest: path to the copy destination + * @format: format of the destination + * @bandwidth: (optional) specify copy bandwidth limit in Mbps + * @flags: bitwise-OR of virDomainBlockCopyFlags + * + * Start a block copy job, where @dest is the name of a new file to copy + * the chain to. By default, the copy will pull the entire source chain + * into the destination file, but if @flags also contains + * VIR_DOMAIN_BLOCK_COPY_SHALLOW, then only the top of the source chain + * will be copied (the source and destination have a common backing file). + * By default, @dest will be created with the same file format as the + * source, but this can be altered by passing a non-NULL @format argument, + * or by using VIR_DOMAIN_BLOCK_COPY_REUSE_EXT to reuse an existing file + * with initial contents identical to the backing file of the source (this + * allows a management app to pre-create files with relative backing file + * names, rather than the default of absolute backing file names; it is + * generally used with the shallow flag, since otherwise the destination + * file must start with empty contents). + * + * A copy job has two parts; in the first phase, the @bandwidth parameter + * affects how fast the source is pulled into the destination, and the job + * can only be canceled by reverting to the source file; progress in this + * phase can be tracked via the virDomainBlockJobInfo() command, with a + * job type of VIR_DOMAIN_BLOCK_JOB_TYPE_COPY. The job transitions to the + * second phase when the job info states cur == end, and remains alive to + * mirror all further changes to both source and destination. The user + * must call virDomainBlockJobAbort() to end the mirroring while choosing + * whether to revert to source or pivot to the destination. An event is + * issued when the job ends, and in the future, an event may be added when + * the job transitions from pulling to mirroring. + * + * Some hypervisors will restrict certain actions, such as virDomainSave() + * or virDomainDetachDevice(), while a copy job is active; they may + * also restrict a copy job to transient domains. + * + * The @disk parameter is either an unambiguous source name of the + * block device (the <source file='...'/> sub-element, such as + * "/path/to/image"), or the device target shorthand (the + * <target dev='...'/> sub-element, such as "xvda"). Valid names + * can be found by calling virDomainGetXMLDesc() and inspecting + * elements within //domain/devices/disk. + * + * The maximum bandwidth (in Mbps) that will be used to do the copy can be + * specified with the bandwidth parameter. If set to 0, libvirt will choose a + * suitable default. Some hypervisors do not support this feature and will + * return an error if bandwidth is not 0. + * + * When @format is NULL, this is equivalent to calling + * virDomainBlockRebase() with the VIR_DOMAIN_BLOCK_REBASE_COPY flag added + * to @flags. Additionally, if @format is "raw", this is equivalent to + * calling virDomainBlockRebase() with the VIR_DOMAIN_BLOCK_REBASE_COPY + * and VIR_DOMAIN_BLOCK_REBASE_COPY_RAW flags added to @flags. + * + * Returns 0 if the operation has started, -1 on failure. + */ +int virDomainBlockCopy(virDomainPtr dom, const char *disk, + const char *dest, const char *format, + unsigned long bandwidth, unsigned int flags) +{ + virConnectPtr conn; + + VIR_DOMAIN_DEBUG(dom, + "disk=%s, dest=%s, format=%s, bandwidth=%lu, flags=%x", + disk, dest, NULLSTR(format), bandwidth, flags); + + virResetLastError(); + + if (!VIR_IS_CONNECTED_DOMAIN (dom)) { + virLibDomainError(VIR_ERR_INVALID_DOMAIN, __FUNCTION__); + virDispatchError(NULL); + return -1; + } + conn = dom->conn; + + if (dom->conn->flags & VIR_CONNECT_RO) { + virLibDomainError(VIR_ERR_OPERATION_DENIED, __FUNCTION__); + goto error; + } + + if (!disk) { + virLibDomainError(VIR_ERR_INVALID_ARG, + _("disk is NULL")); + goto error; + } + if (!dest) { + virLibDomainError(VIR_ERR_INVALID_ARG, + _("dest is NULL")); + goto error; + } + + if (conn->driver->domainBlockCopy) { + int ret; + ret = conn->driver->domainBlockCopy(dom, disk, dest, format, bandwidth, + flags); + if (ret < 0) + goto error; + return ret; + } + + virLibDomainError(VIR_ERR_NO_SUPPORT, __FUNCTION__); + +error: + virDispatchError(dom->conn); + return -1; +} + + +/** * virDomainOpenGraphics: * @dom: pointer to domain object * @idx: index of graphics config to open diff --git a/src/libvirt_public.syms b/src/libvirt_public.syms index 46c13fb..d152ab9 100644 --- a/src/libvirt_public.syms +++ b/src/libvirt_public.syms @@ -534,4 +534,9 @@ LIBVIRT_0.9.11 { virDomainPMWakeup; } LIBVIRT_0.9.10; +LIBVIRT_0.9.12 { + global: + virDomainBlockCopy; +} LIBVIRT_0.9.11; + # .... define new API here using predicted next version number .... -- 1.7.7.6

Expose the full abilities of virDomainBlockCopy. * tools/virsh.c (blockJobImpl): Add --format option for block copy. * tools/virsh.pod (blockcopy): Document this. --- tools/virsh.c | 25 ++++++++++++++++++++----- tools/virsh.pod | 10 +++++----- 2 files changed, 25 insertions(+), 10 deletions(-) diff --git a/tools/virsh.c b/tools/virsh.c index 99f3d22..e855951 100644 --- a/tools/virsh.c +++ b/tools/virsh.c @@ -7527,6 +7527,7 @@ blockJobImpl(vshControl *ctl, const vshCmd *cmd, unsigned long bandwidth = 0; int ret = -1; const char *base = NULL; + const char *format = NULL; unsigned int flags = 0; if (!vshConnectionUsability(ctl, ctl->conn)) @@ -7568,14 +7569,27 @@ blockJobImpl(vshControl *ctl, const vshCmd *cmd, case VSH_CMD_BLOCK_JOB_COPY: flags |= VIR_DOMAIN_BLOCK_REBASE_COPY; if (vshCommandOptBool(cmd, "shallow")) - flags |= VIR_DOMAIN_BLOCK_REBASE_SHALLOW; + flags |= VIR_DOMAIN_BLOCK_COPY_SHALLOW; if (vshCommandOptBool(cmd, "reuse-external")) - flags |= VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT; - if (vshCommandOptBool(cmd, "raw")) - flags |= VIR_DOMAIN_BLOCK_REBASE_COPY_RAW; + flags |= VIR_DOMAIN_BLOCK_COPY_REUSE_EXT; if (vshCommandOptString(cmd, "dest", &base) < 0) goto cleanup; - ret = virDomainBlockRebase(dom, path, base, bandwidth, flags); + if (vshCommandOptString(cmd, "format", &format) < 0) + goto cleanup; + if (!format) { + if (vshCommandOptBool(cmd, "raw")) + flags |= VIR_DOMAIN_BLOCK_REBASE_COPY_RAW; + flags |= VIR_DOMAIN_BLOCK_REBASE_COPY; + ret = virDomainBlockRebase(dom, path, base, bandwidth, flags); + } else { + if (vshCommandOptBool(cmd, "raw")) { + vshError(ctl, "%s", + _("--raw and --format are mutually exclusive")); + goto cleanup; + } + ret = virDomainBlockCopy(dom, path, base, format, + bandwidth, flags); + } } cleanup: @@ -7598,6 +7612,7 @@ static const vshCmdOptDef opts_block_copy[] = { {"path", VSH_OT_DATA, VSH_OFLAG_REQ, N_("Fully-qualified path of disk")}, {"dest", VSH_OT_DATA, VSH_OFLAG_REQ, N_("path of the copy to create")}, {"bandwidth", VSH_OT_DATA, VSH_OFLAG_NONE, N_("Bandwidth limit in MB/s")}, + {"format", VSH_OT_DATA, VSH_OFLAG_NONE, N_("file format of dest")}, {"shallow", VSH_OT_BOOL, 0, N_("make the copy share a backing chain")}, {"reuse-external", VSH_OT_BOOL, 0, N_("reuse existing destination")}, {"raw", VSH_OT_BOOL, 0, N_("use raw destination file")}, diff --git a/tools/virsh.pod b/tools/virsh.pod index 50a495d..1e5ce12 100644 --- a/tools/virsh.pod +++ b/tools/virsh.pod @@ -639,7 +639,7 @@ address of virtual interface (such as I<detach-interface> or I<domif-setlink>) will accept the MAC address printed by this command. =item B<blockcopy> I<domain> I<path> I<dest> [I<bandwidth>] [I<--shallow>] -[I<--reuse-external>] [I<--raw>] +[I<--reuse-external>] { [I<--raw>] | I<format> } Copy a disk backing image chain to I<dest>. By default, this command flattens the entire chain; but if I<--shallow> is specified, the copy @@ -652,10 +652,10 @@ is used, otherwise it must start empty); this option is typically used to set up a relative backing file name in the destination. The format of the destination is determined by the first match in the -following list: if I<--raw> is specified, it will be raw; if -I<--reuse-external> is specified, the existing destination is probed -for a format; and in all other cases, the destination format will -match the source format. +following list: if I<--raw> is specified, it will be raw; if I<format> +is specified, it will have that format; if I<--reuse-external> is +specified, the existing destination is probed for a format; and in +all other cases, the destination format will match the source format. The copy runs in the background; initially, the job must copy all data from the source, and during this phase, the job can only be canceled to -- 1.7.7.6

Almost trivial; the trick was dealing with the fact that we're stuck with 'unsigned long bandwidth' due to earlier design decisions. Also, prefer the correct flag names (although we intentionally chose the _SHALLOW and _REUSE_EXT values to be equal, to allow for loose handling in our code). * src/remote/remote_protocol.x (remote_domain_block_copy_args): New struct. * src/remote/remote_driver.c (remote_driver): Use it. * src/rpc/gendispatch.pl (name_to_ProcName): Cater to legacy bandwidth type. * src/remote_protocol-structs: Regenerate. * src/qemu/qemu_driver.c (qemuDriver): Wire up driver callback. (qemuDomainBlockCopy): Use preferred flag names. --- src/qemu/qemu_driver.c | 21 +++++++++++---------- src/remote/remote_driver.c | 1 + src/remote/remote_protocol.x | 12 +++++++++++- src/remote_protocol-structs | 9 +++++++++ src/rpc/gendispatch.pl | 1 + 5 files changed, 33 insertions(+), 11 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 0e9ea41..7ba41ee 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -11889,8 +11889,8 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, struct stat st; /* Preliminaries: find the disk we are editing, sanity checks */ - virCheckFlags(VIR_DOMAIN_BLOCK_REBASE_SHALLOW | - VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT, -1); + virCheckFlags(VIR_DOMAIN_BLOCK_COPY_SHALLOW | + VIR_DOMAIN_BLOCK_COPY_REUSE_EXT, -1); qemuDriverLock(driver); virUUIDFormat(dom->uuid, uuidstr); @@ -11935,7 +11935,7 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, goto endjob; } - if ((flags & VIR_DOMAIN_BLOCK_REBASE_SHALLOW) && + if ((flags & VIR_DOMAIN_BLOCK_COPY_SHALLOW) && STREQ_NULLABLE(format, "raw") && STRNEQ_NULLABLE(disk->driverType, "raw")) { qemuReportError(VIR_ERR_INVALID_ARG, @@ -11946,8 +11946,8 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, /* XXX this is pessimistic; we could use 'query-block' or even * keep track of the backing chain ourselves, rather than assuming * that all non-raw source files have a current backing image */ - if ((flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT) && - !(flags & VIR_DOMAIN_BLOCK_REBASE_SHALLOW) && + if ((flags & VIR_DOMAIN_BLOCK_COPY_REUSE_EXT) && + !(flags & VIR_DOMAIN_BLOCK_COPY_SHALLOW) && STRNEQ_NULLABLE(disk->driverType, "raw")) { qemuReportError(VIR_ERR_ARGUMENT_UNSUPPORTED, _("non-shallow copy of non-raw disk '%s' to existing " @@ -11962,14 +11962,14 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, virReportSystemError(errno, _("unable to stat for disk %s: %s"), disk->dst, dest); goto endjob; - } else if (flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT) { + } else if (flags & VIR_DOMAIN_BLOCK_COPY_REUSE_EXT) { virReportSystemError(errno, _("missing destination file for disk %s: %s"), disk->dst, dest); goto endjob; } } else if (!(S_ISBLK(st.st_mode) || !st.st_size || - (flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT))) { + (flags & VIR_DOMAIN_BLOCK_COPY_REUSE_EXT))) { qemuReportError(VIR_ERR_CONFIG_UNSUPPORTED, _("external destination file for disk %s already " "exists and is not a block device: %s"), @@ -11979,16 +11979,16 @@ qemuDomainBlockCopy(virDomainPtr dom, const char *path, /* XXX We also need to add security labeling, lock manager lease, * and auditing of those events. */ - if (!format && !(flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)) + if (!format && !(flags & VIR_DOMAIN_BLOCK_COPY_REUSE_EXT)) format = disk->driverType; if ((format && !(disk->mirrorFormat = strdup(format))) || !(disk->mirror = strdup(dest))) { virReportOOMError(); goto endjob; } - if (flags & VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT) + if (flags & VIR_DOMAIN_BLOCK_COPY_REUSE_EXT) mode = QEMU_MONITOR_DRIVE_MIRROR_EXISTING; - else if (flags & VIR_DOMAIN_BLOCK_REBASE_SHALLOW) + else if (flags & VIR_DOMAIN_BLOCK_COPY_SHALLOW) mode = QEMU_MONITOR_DRIVE_MIRROR_ABSOLUTE; else mode = QEMU_MONITOR_DRIVE_MIRROR_NO_BACKING; @@ -13143,6 +13143,7 @@ static virDriver qemuDriver = { .domainBlockJobSetSpeed = qemuDomainBlockJobSetSpeed, /* 0.9.4 */ .domainBlockPull = qemuDomainBlockPull, /* 0.9.4 */ .domainBlockRebase = qemuDomainBlockRebase, /* 0.9.10 */ + .domainBlockCopy = qemuDomainBlockCopy, /* 0.9.12 */ .isAlive = qemuIsAlive, /* 0.9.8 */ .nodeSuspendForDuration = nodeSuspendForDuration, /* 0.9.8 */ .domainSetBlockIoTune = qemuDomainSetBlockIoTune, /* 0.9.8 */ diff --git a/src/remote/remote_driver.c b/src/remote/remote_driver.c index af46384..b431779 100644 --- a/src/remote/remote_driver.c +++ b/src/remote/remote_driver.c @@ -5095,6 +5095,7 @@ static virDriver remote_driver = { .domainBlockJobSetSpeed = remoteDomainBlockJobSetSpeed, /* 0.9.4 */ .domainBlockPull = remoteDomainBlockPull, /* 0.9.4 */ .domainBlockRebase = remoteDomainBlockRebase, /* 0.9.10 */ + .domainBlockCopy = remoteDomainBlockCopy, /* 0.9.12 */ .setKeepAlive = remoteSetKeepAlive, /* 0.9.8 */ .isAlive = remoteIsAlive, /* 0.9.8 */ .nodeSuspendForDuration = remoteNodeSuspendForDuration, /* 0.9.8 */ diff --git a/src/remote/remote_protocol.x b/src/remote/remote_protocol.x index 2d57247..65b4a1d 100644 --- a/src/remote/remote_protocol.x +++ b/src/remote/remote_protocol.x @@ -1188,6 +1188,14 @@ struct remote_domain_block_rebase_args { unsigned hyper bandwidth; unsigned int flags; }; +struct remote_domain_block_copy_args { + remote_nonnull_domain dom; + remote_nonnull_string path; + remote_nonnull_string dest; + remote_string format; + unsigned hyper bandwidth; + unsigned int flags; +}; struct remote_domain_set_block_io_tune_args { remote_nonnull_domain dom; @@ -2782,7 +2790,9 @@ enum remote_procedure { REMOTE_PROC_DOMAIN_PM_WAKEUP = 267, /* autogen autogen */ REMOTE_PROC_DOMAIN_EVENT_TRAY_CHANGE = 268, /* autogen autogen */ REMOTE_PROC_DOMAIN_EVENT_PMWAKEUP = 269, /* autogen autogen */ - REMOTE_PROC_DOMAIN_EVENT_PMSUSPEND = 270 /* autogen autogen */ + REMOTE_PROC_DOMAIN_EVENT_PMSUSPEND = 270, /* autogen autogen */ + + REMOTE_PROC_DOMAIN_BLOCK_COPY = 271 /* autogen autogen */ /* * Notice how the entries are grouped in sets of 10 ? diff --git a/src/remote_protocol-structs b/src/remote_protocol-structs index 9b2414f..dd2aa72 100644 --- a/src/remote_protocol-structs +++ b/src/remote_protocol-structs @@ -845,6 +845,14 @@ struct remote_domain_block_rebase_args { uint64_t bandwidth; u_int flags; }; +struct remote_domain_block_copy_args { + remote_nonnull_domain dom; + remote_nonnull_string path; + remote_nonnull_string dest; + remote_string format; + uint64_t bandwidth; + u_int flags; +}; struct remote_domain_set_block_io_tune_args { remote_nonnull_domain dom; remote_nonnull_string disk; @@ -2192,4 +2200,5 @@ enum remote_procedure { REMOTE_PROC_DOMAIN_EVENT_TRAY_CHANGE = 268, REMOTE_PROC_DOMAIN_EVENT_PMWAKEUP = 269, REMOTE_PROC_DOMAIN_EVENT_PMSUSPEND = 270, + REMOTE_PROC_DOMAIN_BLOCK_COPY = 271, }; diff --git a/src/rpc/gendispatch.pl b/src/rpc/gendispatch.pl index f161ee0..e5e28e0 100755 --- a/src/rpc/gendispatch.pl +++ b/src/rpc/gendispatch.pl @@ -232,6 +232,7 @@ my $long_legacy = { NodeGetInfo => { ret => { memory => 1 } }, DomainBlockPull => { arg => { bandwidth => 1 } }, DomainBlockRebase => { arg => { bandwidth => 1 } }, + DomainBlockCopy => { arg => { bandwidth => 1 } }, DomainBlockJobSetSpeed => { arg => { bandwidth => 1 } }, DomainMigrateGetMaxSpeed => { ret => { bandwidth => 1 } }, }; -- 1.7.7.6
participants (2)
-
Eric Blake
-
Laine Stump