[libvirt] [PATCH] BlockJob: Support sync/async virDomainBlockJobAbort

Qemu has changed the semantics of the "block_job_cancel" API. Originally, the
operation was synchronous (ie. upon command completion, the operation was
guaranteed to be completely stopped). With the new semantics, a
"block_job_cancel" merely requests that the operation be cancelled and an
event is triggered once the cancellation request has been honored.

To adopt the new semantics while preserving compatibility I propose the
following updates to the virDomainBlockJob API:

A new block job event type VIR_DOMAIN_BLOCK_JOB_CANCELLED will be recognized
by libvirt. Regardless of the flags used with virDomainBlockJobAbort, this
event will be raised whenever it is received from qemu. This event indicates
that a block job has been successfully cancelled.

A new extension flag VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC will be added to the
virDomainBlockJobAbort API. When enabled, this function will operate
asynchronously (ie, it can return before the job has actually been cancelled).
When the API is used in this mode, it is the responsibility of the caller to
wait for a VIR_DOMAIN_BLOCK_JOB_CANCELLED event or poll via the
virDomainGetBlockJobInfo API to check the cancellation status.

Without the VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag, libvirt will internally
poll using qemu's "query-block-jobs" API and will not return until the
operation has been completed. API users are advised that this operation is
unbounded and further interaction with the domain during this period may
block.

This patch implements the new event type, the API flag, and the polling. The
main outstanding issue is whether we should bound the amount of time we will
wait for cancellation and return an error.

Comments on this proposal?

Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Eric Blake <eblake@redhat.com>
---
 include/libvirt/libvirt.h.in |   10 ++++++++++
 src/libvirt.c                |    9 ++++++++-
 src/qemu/qemu_driver.c       |   24 +++++++++++++++++-------
 src/qemu/qemu_monitor.c      |   24 ++++++++++++++++++++++++
 src/qemu/qemu_monitor.h      |    2 ++
 src/qemu/qemu_monitor_json.c |   36 +++++++++++++++++++++++++++++-------
 6 files changed, 90 insertions(+), 15 deletions(-)

diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in
index e436f3c..fc7f028 100644
--- a/include/libvirt/libvirt.h.in
+++ b/include/libvirt/libvirt.h.in
@@ -1776,6 +1776,15 @@ typedef enum {
     VIR_DOMAIN_BLOCK_JOB_TYPE_PULL = 1,
 } virDomainBlockJobType;
 
+/**
+ * virDomainBlockJobAbortFlags:
+ *
+ * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC: Request only, do not wait for completion
+ */
+typedef enum {
+    VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC = 1,
+} virDomainBlockJobAbortFlags;
+
 /* An iterator for monitoring block job operations */
 typedef unsigned long long virDomainBlockJobCursor;
 
@@ -3293,6 +3302,7 @@ typedef void (*virConnectDomainEventGraphicsCallback)(virConnectPtr conn,
 typedef enum {
     VIR_DOMAIN_BLOCK_JOB_COMPLETED = 0,
     VIR_DOMAIN_BLOCK_JOB_FAILED = 1,
+    VIR_DOMAIN_BLOCK_JOB_CANCELLED = 2,
 } virConnectDomainEventBlockJobStatus;
 
 /**
diff --git a/src/libvirt.c b/src/libvirt.c
index a540424..0e886f5 100644
--- a/src/libvirt.c
+++ b/src/libvirt.c
@@ -17299,7 +17299,7 @@ error:
  * virDomainBlockJobAbort:
  * @dom: pointer to domain object
  * @disk: path to the block device, or device shorthand
- * @flags: extra flags; not used yet, so callers should always pass 0
+ * @flags: bitwise or of virDomainBlockJobAbortFlags
  *
  * Cancel the active block job on the given disk.
  *
@@ -17310,6 +17310,13 @@ error:
  * can be found by calling virDomainGetXMLDesc() and inspecting
  * elements within //domain/devices/disk.
  *
+ * By default, this function performs a synchronous operation and the caller
+ * may assume that the operation has completed when 0 is returned. However,
+ * BlockJob operations may take a long time to complete, and during this time
+ * further domain interactions may be unresponsive. To avoid this problem, pass
+ * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC in the flags argument to enable asynchronous
+ * behavior. When the job has been cancelled, a BlockJob event will be emitted.
+ *
  * Returns -1 in case of failure, 0 when successful.
  */
 int virDomainBlockJobAbort(virDomainPtr dom, const char *disk,
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 712f1fc..d971a5f 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -11379,7 +11379,7 @@ cleanup:
 static int
 qemuDomainBlockJobImpl(virDomainPtr dom, const char *path,
                        unsigned long bandwidth, virDomainBlockJobInfoPtr info,
-                       int mode)
+                       int flags, int mode)
 {
     struct qemud_driver *driver = dom->conn->privateData;
     virDomainObjPtr vm = NULL;
@@ -11420,6 +11420,15 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path,
     qemuDomainObjEnterMonitorWithDriver(driver, vm);
     priv = vm->privateData;
     ret = qemuMonitorBlockJob(priv->mon, device, bandwidth, info, mode);
+    /*
+     * Qemu provides asynchronous block job cancellation but without the
+     * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag libvirt guarantees a synchronous
+     * operation. Provide this behavior by waiting here (with the monitor
+     * locked) so we don't get confused by newly scheduled block jobs.
+     */
+    if (ret == 0 && mode == BLOCK_JOB_ABORT &&
+        !(flags & VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC))
+        ret = qemuMonitorBlockJobCancelWait(priv->mon, device);
     qemuDomainObjExitMonitorWithDriver(driver, vm);
 
 endjob:
@@ -11439,8 +11448,7 @@ cleanup:
 static int
 qemuDomainBlockJobAbort(virDomainPtr dom, const char *path, unsigned int flags)
 {
-    virCheckFlags(0, -1);
-    return qemuDomainBlockJobImpl(dom, path, 0, NULL, BLOCK_JOB_ABORT);
+    return qemuDomainBlockJobImpl(dom, path, 0, NULL, flags, BLOCK_JOB_ABORT);
 }
 
 static int
@@ -11448,7 +11456,7 @@ qemuDomainGetBlockJobInfo(virDomainPtr dom, const char *path,
                           virDomainBlockJobInfoPtr info, unsigned int flags)
 {
     virCheckFlags(0, -1);
-    return qemuDomainBlockJobImpl(dom, path, 0, info, BLOCK_JOB_INFO);
+    return qemuDomainBlockJobImpl(dom, path, 0, info, flags, BLOCK_JOB_INFO);
 }
 
 static int
@@ -11456,7 +11464,8 @@ qemuDomainBlockJobSetSpeed(virDomainPtr dom, const char *path,
                            unsigned long bandwidth, unsigned int flags)
 {
     virCheckFlags(0, -1);
-    return qemuDomainBlockJobImpl(dom, path, bandwidth, NULL, BLOCK_JOB_SPEED);
+    return qemuDomainBlockJobImpl(dom, path, bandwidth, NULL, flags,
+                                  BLOCK_JOB_SPEED);
 }
 
 static int
@@ -11466,9 +11475,10 @@ qemuDomainBlockPull(virDomainPtr dom, const char *path, unsigned long bandwidth,
     int ret;
 
     virCheckFlags(0, -1);
-    ret = qemuDomainBlockJobImpl(dom, path, bandwidth, NULL, BLOCK_JOB_PULL);
+    ret = qemuDomainBlockJobImpl(dom, path, bandwidth, NULL, flags,
+                                 BLOCK_JOB_PULL);
     if (ret == 0 && bandwidth != 0)
-        ret = qemuDomainBlockJobImpl(dom, path, bandwidth, NULL,
+        ret = qemuDomainBlockJobImpl(dom, path, bandwidth, NULL, flags,
                                      BLOCK_JOB_SPEED);
     return ret;
 }
diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
index ad7e2a5..5c22486 100644
--- a/src/qemu/qemu_monitor.c
+++ b/src/qemu/qemu_monitor.c
@@ -2585,6 +2585,30 @@ int qemuMonitorScreendump(qemuMonitorPtr mon,
     return ret;
 }
 
+/* Poll the monitor to wait for the block job on a given disk to end.
+ * We don't need to worry about another block job starting since we have the
+ * driver locked. */
+int
+qemuMonitorBlockJobCancelWait(qemuMonitorPtr mon, const char *device)
+{
+    VIR_DEBUG("mon=%p, device=%p", mon, device);
+    /* XXX: Should we provide some sort of escape hatch for this wait? */
+    while (1) {
+        /* Poll every 100ms */
+        int ret;
+        struct timespec ts = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000ull };
+        virDomainBlockJobInfo info;
+
+        ret = qemuMonitorBlockJob(mon, device, 0, &info, BLOCK_JOB_INFO);
+        if (ret < 0)
+            return -1;
+        else if (ret == 0)
+            return 0;
+        else
+            nanosleep(&ts, NULL);
+    }
+}
+
 int qemuMonitorBlockJob(qemuMonitorPtr mon,
                         const char *device,
                         unsigned long bandwidth,
diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h
index 15acf8b..afc081e 100644
--- a/src/qemu/qemu_monitor.h
+++ b/src/qemu/qemu_monitor.h
@@ -510,6 +510,8 @@ typedef enum {
     BLOCK_JOB_PULL = 3,
 } BLOCK_JOB_CMD;
 
+int qemuMonitorBlockJobCancelWait(qemuMonitorPtr mon, const char *device);
+
 int qemuMonitorBlockJob(qemuMonitorPtr mon,
                         const char *device,
                         unsigned long bandwidth,
diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c
index 7eb2a92..2ca8eeb 100644
--- a/src/qemu/qemu_monitor_json.c
+++ b/src/qemu/qemu_monitor_json.c
@@ -57,7 +57,8 @@ static void qemuMonitorJSONHandleIOError(qemuMonitorPtr mon, virJSONValuePtr dat
 static void qemuMonitorJSONHandleVNCConnect(qemuMonitorPtr mon, virJSONValuePtr data);
 static void qemuMonitorJSONHandleVNCInitialize(qemuMonitorPtr mon, virJSONValuePtr data);
 static void qemuMonitorJSONHandleVNCDisconnect(qemuMonitorPtr mon, virJSONValuePtr data);
-static void qemuMonitorJSONHandleBlockJob(qemuMonitorPtr mon, virJSONValuePtr data);
+static void qemuMonitorJSONHandleBlockJobCompleted(qemuMonitorPtr mon, virJSONValuePtr data);
+static void qemuMonitorJSONHandleBlockJobCancelled(qemuMonitorPtr mon, virJSONValuePtr data);
 
 struct {
     const char *type;
@@ -73,7 +74,8 @@ struct {
     { "VNC_CONNECTED", qemuMonitorJSONHandleVNCConnect, },
     { "VNC_INITIALIZED", qemuMonitorJSONHandleVNCInitialize, },
     { "VNC_DISCONNECTED", qemuMonitorJSONHandleVNCDisconnect, },
-    { "BLOCK_JOB_COMPLETED", qemuMonitorJSONHandleBlockJob, },
+    { "BLOCK_JOB_COMPLETED", qemuMonitorJSONHandleBlockJobCompleted, },
+    { "BLOCK_JOB_CANCELLED", qemuMonitorJSONHandleBlockJobCancelled, },
 };
 
 
@@ -685,13 +687,14 @@ static void qemuMonitorJSONHandleVNCDisconnect(qemuMonitorPtr mon, virJSONValueP
     qemuMonitorJSONHandleVNC(mon, data, VIR_DOMAIN_EVENT_GRAPHICS_DISCONNECT);
 }
 
-static void qemuMonitorJSONHandleBlockJob(qemuMonitorPtr mon, virJSONValuePtr data)
+static void qemuMonitorJSONHandleBlockJobImpl(qemuMonitorPtr mon,
+                                              virJSONValuePtr data,
+                                              int event)
 {
     const char *device;
     const char *type_str;
     int type = VIR_DOMAIN_BLOCK_JOB_TYPE_UNKNOWN;
     unsigned long long offset, len;
-    int status = VIR_DOMAIN_BLOCK_JOB_FAILED;
 
     if ((device = virJSONValueObjectGetString(data, "device")) == NULL) {
         VIR_WARN("missing device in block job event");
@@ -716,13 +719,32 @@ static void qemuMonitorJSONHandleBlockJob(qemuMonitorPtr mon, virJSONValuePtr da
     if (STREQ(type_str, "stream"))
         type = VIR_DOMAIN_BLOCK_JOB_TYPE_PULL;
 
-    if (offset != 0 && offset == len)
-        status = VIR_DOMAIN_BLOCK_JOB_COMPLETED;
+    switch (event) {
+    case VIR_DOMAIN_BLOCK_JOB_COMPLETED:
+        /* Make sure the whole device has been processed */
+        if (offset != len)
+            event = VIR_DOMAIN_BLOCK_JOB_FAILED;
+        break;
+    case VIR_DOMAIN_BLOCK_JOB_FAILED:
+    case VIR_DOMAIN_BLOCK_JOB_CANCELLED:
+        break;
+    }
 
 out:
-    qemuMonitorEmitBlockJob(mon, device, type, status);
+    qemuMonitorEmitBlockJob(mon, device, type, event);
+}
+
+static void qemuMonitorJSONHandleBlockJobCompleted(qemuMonitorPtr mon,
+                                                   virJSONValuePtr data)
+{
+    qemuMonitorJSONHandleBlockJobImpl(mon, data, VIR_DOMAIN_BLOCK_JOB_COMPLETED);
 }
 
+static void qemuMonitorJSONHandleBlockJobCancelled(qemuMonitorPtr mon,
+                                                   virJSONValuePtr data)
+{
+    qemuMonitorJSONHandleBlockJobImpl(mon, data, VIR_DOMAIN_BLOCK_JOB_CANCELLED);
+}
 
 int qemuMonitorJSONHumanCommandWithFd(qemuMonitorPtr mon,
--
1.7.5.rc1
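[As a usage illustration, not part of the patch: a caller of the proposed interface could look roughly like the sketch below. The ASYNC flag and the CANCELLED event only exist once this patch is applied; the 100ms poll interval is arbitrary, and an event-driven caller would instead register for VIR_DOMAIN_EVENT_ID_BLOCK_JOB and treat VIR_DOMAIN_BLOCK_JOB_CANCELLED as the completion signal.]

    #include <unistd.h>
    #include <libvirt/libvirt.h>

    /* Request an asynchronous cancel, then poll until the job is gone. */
    static int
    cancel_block_job(virDomainPtr dom, const char *disk)
    {
        virDomainBlockJobInfo info;
        int r;

        if (virDomainBlockJobAbort(dom, disk,
                                   VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC) < 0)
            return -1;

        /* virDomainGetBlockJobInfo() returns 1 while a job is active,
         * 0 once it has gone away, and -1 on error. */
        while ((r = virDomainGetBlockJobInfo(dom, disk, &info, 0)) == 1)
            usleep(100 * 1000);              /* 100ms, arbitrary */

        return r < 0 ? -1 : 0;
    }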

On Fri, Jan 13, 2012 at 02:51:23PM -0600, Adam Litke wrote:
Qemu has changed the semantics of the "block_job_cancel" API. Originally, the operation was synchronous (ie. upon command completion, the operation was guaranteed to be completely stopped). With the new semantics, a "block_job_cancel" merely requests that the operation be cancelled and an event is triggered once the cancellation request has been honored.
To adopt the new semantics while preserving compatibility I propose the following updates to the virDomainBlockJob API:
A new block job event type VIR_DOMAIN_BLOCK_JOB_CANCELLED will be recognized by libvirt. Regardless of the flags used with virDomainBlockJobAbort, this event will be raised whenever it is received from qemu. This event indicates that a block job has been successfully cancelled.
A new extension flag VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC will be added to the virDomainBlockJobAbort API. When enabled, this function will operate asynchronously (ie, it can return before the job has actually been cancelled). When the API is used in this mode, it is the responsibility of the caller to wait for a VIR_DOMAIN_BLOCK_JOB_CANCELLED event or poll via the virDomainGetBlockJobInfo API to check the cancellation status.
Without the VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag, libvirt will internally poll using qemu's "query-block-jobs" API and will not return until the operation has been completed. API users are advised that this operation is unbounded and further interaction with the domain during this period may block.
This patch implements the new event type, the API flag, and the polling. The main outstanding issue is whether we should bound the amount of time we will wait for cancellation and return an error.
Comments on this proposal?
Hi Adam,

Thanks for the patch. That approach seems reasonable. Re: bounding the amount of time we wait for cancellation, if the time isn't bounded, what happens if the cancellation never happens? IMO the most important thing is to make sure that the caller has a way to return to a normally functioning state even if the virDomainBlockJobAbort call is unable to cancel the job.

I'll defer to the other folks on the code, since I am at this point likely to do more harm than good. :)

Dave

On Fri, Jan 13, 2012 at 04:48:30PM -0500, Dave Allan wrote:
On Fri, Jan 13, 2012 at 02:51:23PM -0600, Adam Litke wrote:
[...]
Hi Adam,
Thanks for the patch. That approach seems reasonable. Re: bounding the amount of time we wait for cancellation, if the time isn't bounded, what happens if the cancellation never happens? IMO the most important thing is to make sure that the caller has a way to return to a normally functioning state even if the virDomainBlockJobAbort call is unable to cancel the job.
Yes, agreed. But this brings up the familiar problem with timeouts: selecting the right amount of time to wait. I don't have a good answer here, but it's not really all that bad if we bail out too early. In that case we can just return VIR_ERR_OPERATION_TIMEOUT to the caller and they can retry the operation again. Unfortunately, these semantics were not present in the initial API. Is adding a new error condition (timeout) to an existing API allowed?
I'll defer to the other folks on the code, since I am at this point likely to do more harm than good. :)
Dave
--
Adam Litke <agl@us.ibm.com>
IBM Linux Technology Center

On Fri, Jan 13, 2012 at 04:30:56PM -0600, Adam Litke wrote:
On Fri, Jan 13, 2012 at 04:48:30PM -0500, Dave Allan wrote:
On Fri, Jan 13, 2012 at 02:51:23PM -0600, Adam Litke wrote:
[...]
Hi Adam,
Thanks for the patch. That approach seems reasonable. Re: bounding the amount of time we wait for cancellation, if the time isn't bounded, what happens if the cancellation never happens? IMO the most important thing is to make sure that the caller has a way to return to a normally functioning state even if the virDomainBlockJobAbort call is unable to cancel the job.
Yes, agreed. But this brings up the familiar problem with timeouts: selecting the right amount of time to wait. I don't have a good answer here, but it's not really all that bad if we bail out too early. In that case we can just return VIR_ERR_OPERATION_TIMEOUT to the caller and they can retry the operation again. Unfortunately, these semantics were not present in the initial API. Is adding a new error condition (timeout) to an existing API allowed?
Yes, a new error return value is fine for existing API.
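[If such a timeout were added, a caller could recover along these lines. This is only a sketch of the idea under discussion; the patch as posted waits indefinitely and does not return VIR_ERR_OPERATION_TIMEOUT.]

    #include <libvirt/libvirt.h>
    #include <libvirt/virterror.h>

    /* Retry the abort whenever it times out; give up on any other error. */
    static int
    abort_with_retry(virDomainPtr dom, const char *disk)
    {
        for (;;) {
            if (virDomainBlockJobAbort(dom, disk, 0) == 0)
                return 0;                           /* job cancelled */

            virErrorPtr err = virGetLastError();
            if (!err || err->code != VIR_ERR_OPERATION_TIMEOUT)
                return -1;                          /* real failure */
            /* Timed out: the job may still be winding down; retry. */
        }
    }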
I'll defer to the other folks on the code, since I am at this point likely to do more harm than good. :)
Dave

----- Original Message -----
From: "Adam Litke" <agl@us.ibm.com> To: "Dave Allan" <dallan@redhat.com> Cc: libvir-list@redhat.com Sent: Friday, January 13, 2012 9:51:23 PM Subject: [libvirt] [PATCH] BlockJob: Support sync/async virDomainBlockJobAbort
Qemu has changed the semantics of the "block_job_cancel" API. Originally, the operation was synchronous (ie. upon command completion, the operation was guaranteed to be completely stopped). With the new semantics, a "block_job_cancel" merely requests that the operation be cancelled and an event is triggered once the cancellation request has been honored.
To adopt the new semantics while preserving compatibility I propose the following updates to the virDomainBlockJob API:
A new block job event type VIR_DOMAIN_BLOCK_JOB_CANCELLED will be recognized by libvirt. Regardless of the flags used with virDomainBlockJobAbort, this event will be raised whenever it is received from qemu. This event indicates that a block job has been successfully cancelled.
[...]
Without the VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag, libvirt will internally poll using qemu's "query-block-jobs" API and will not return until the operation has been completed.
Why do you raise the event VIR_DOMAIN_BLOCK_JOB_CANCELLED also in the case where libvirt takes care of it internally? Is there a specific use case for this?
API users are advised that this operation is unbounded and further interaction with the domain during this period may block.
Do you have an example of what operation may block? Is this the common behavior for all the other async tasks that libvirt is managing internally?

--
Federico

On Wed, Jan 18, 2012 at 12:59:29PM -0500, Federico Simoncelli wrote:
[...]
Without the VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag, libvirt will internally poll using qemu's "query-block-jobs" API and will not return until the operation has been completed.
Why do you raise the event VIR_DOMAIN_BLOCK_JOB_CANCELLED also in the case where libvirt takes care of it internally? Is there a specific use case for this?
We just raise the events we receive from qemu and qemu will always generate these. IMO, there is no harm in always raising the events. If a user doesn't care about them, then they don't have to register for notifications.
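[For completeness, registering for those notifications looks roughly like the sketch below. It assumes the existing VIR_DOMAIN_EVENT_ID_BLOCK_JOB machinery and a running event loop, e.g. virEventRegisterDefaultImpl() plus virEventRunDefaultImpl() in the application's main loop.]

    #include <stdio.h>
    #include <libvirt/libvirt.h>

    /* Invoked for every block job event on the domain, including the new
     * VIR_DOMAIN_BLOCK_JOB_CANCELLED status proposed by this patch. */
    static void
    block_job_cb(virConnectPtr conn, virDomainPtr dom, const char *disk,
                 int type, int status, void *opaque)
    {
        printf("block job on %s: type=%d status=%d\n", disk, type, status);
    }

    static int
    watch_block_jobs(virConnectPtr conn, virDomainPtr dom)
    {
        return virConnectDomainEventRegisterAny(conn, dom,
                                                VIR_DOMAIN_EVENT_ID_BLOCK_JOB,
                                                VIR_DOMAIN_EVENT_CALLBACK(block_job_cb),
                                                NULL, NULL);
    }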
API users are advised that this operation is unbounded and further interaction with the domain during this period may block.
Do you have an example of what operation may block? Is this the common behavior for all the other async tasks that libvirt is managing internally?
Any operation that needs to issue a monitor command would block. This is a large list. That being said, the existing implementation already has this potential issue with the ASYNC flag being the workaround.

I cannot answer definitively as to whether all libvirt async tasks can block. I suspect the answer depends on whether the operation relies on a monitor command that can block in qemu. As far as I know the only other command like this (coincidentally it is being improved in qemu by Luiz) is virDomainMemoryStats().

--
Adam Litke <agl@us.ibm.com>
IBM Linux Technology Center

On 01/19/2012 02:00 PM, Adam Litke wrote:
Without the VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag, libvirt will internally poll using qemu's "query-block-jobs" API and will not return until the operation has been completed.
Why do you raise the event VIR_DOMAIN_BLOCK_JOB_CANCELLED also in the case where libvirt takes care of it internally? Is there a specific use case for this?
We just raise the events we receive from qemu and qemu will always generate these. IMO, there is no harm in always raising the events. If a user doesn't care about them, then they don't have to register for notifications.
I guess I reviewed under the assumption that you'd only raise the libvirt event if the user is expecting it; but I guess I can see your point of always raising the libvirt event when the qemu event happens. This works fine for the new semantics of the qemu command.

The problem is when the user has the old qemu that only had the blocking semantics, but requests ASYNC operation; at that point they still expect the libvirt event, which means that libvirt must raise the event by itself. But now if the user upgrades to new qemu, and we hit the race where libvirt can't tell whether the command completed because it was the old blocking version or whether it was async but completed before we queried, then converting the qemu event to a libvirt event _and_ having libvirt raise the event itself would result in double events to the user.

Hence, I still think that being smarter and only converting the qemu event to a libvirt event when there is a pending async cancel, and discarding it otherwise, is the way to go.
API users are advised that this operation is unbounded and further interaction with the domain during this period may block.
Do you have an example of what operation may block? Is this the common behavior for all the other async tasks that libvirt is managing internally?
Any operation that needs to issue a monitor command would block. This is a large list. That being said, the existing implementation already has this potential issue with the ASYNC flag being the workaround.
I cannot answer definitively as to whether all libvirt async tasks can block. I suspect the answer depends on whether the operation relies on a monitor command that can block in qemu. As far as I know the only other command like this (coincidentally it is being improved in qemu by Luiz) is virDomainMemoryStats().
Jiri Denemark worked on some code to classify qemu monitor commands into categories, so that when an async monitor command is in effect, other safe commands can be done in the meantime between times that the async job regains control to issue another monitor command to poll its status. This would involve grabbing an async job around the time where we wait for job cancellation to complete, and dropping the lock each time we sleep until the next poll. Waiting for migration is an example of setting up an async job.

But I'm okay if we save that for future improvements (and get Jirka's help), rather than trying to complicate this initial implementation with the additional semantics of using an async monitor job.

--
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On 01/13/2012 01:51 PM, Adam Litke wrote:
Qemu has changed the semantics of the "block_job_cancel" API. Originally, the operation was synchronous (ie. upon command completion, the operation was guaranteed to be completely stopped). With the new semantics, a "block_job_cancel" merely requests that the operation be cancelled and an event is triggered once the cancellation request has been honored.
How can we tell the two implementations of qemu apart? That is, how is libvirt supposed to know whether it is calling the old (blocking) variant, or the new async variant, in order to know whether it should wait for an event?

On the other hand, I think we can make things work:

call cancel, then call job status
if the job is complete, then we are done, whether or not we have old or new command semantics (and if we have new command semantics, be prepared to ignore an event from qemu)
if the job is not complete, then we have the new semantics, and we know to expect the event

coupled with:

if the user passed 0 for the flag (they want blocking), then make the command wait for completion (either by job-status query or by getting the event), and no event is passed to the user
if the user passed ASYNC for the flag, then they want an event; if we detect the async variant of the command, then we hook up our qemu event handler to turn around and issue a user event. Conversely, if job-query says the command is complete, then we generate our own event without waiting for a qemu event.

That means that we _don't_ blindly pass the qemu event on to the user; rather, on reception of a qemu event, we determine if we have an incomplete cancel in flight, and only then do we pass it on to the user; otherwise, we assume that the qemu event came after we already saw the job-info query saying things were complete, and therefore, we had already generated the user's event and can ignore the qemu event.

It would help things if the new qemu command were spelled with a different name than the old blocking version.
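[A rough sketch of that decision flow; the helper names below are purely illustrative stand-ins for the real monitor calls and do not exist in libvirt.]

    #include <stdbool.h>

    /* Illustrative helpers only: */
    int  issue_block_job_cancel(const char *device);   /* send block_job_cancel    */
    int  block_job_still_active(const char *device);   /* 1 active, 0 gone, <0 err */
    void emit_cancelled_event(const char *device);     /* raise libvirt event      */

    static int
    abort_block_job(const char *device, bool async_requested)
    {
        int active;

        if (issue_block_job_cancel(device) < 0)
            return -1;

        active = block_job_still_active(device);
        if (active < 0)
            return -1;

        if (active == 0) {
            /* Old blocking qemu, or new qemu that finished before we queried:
             * generate the user's event ourselves and ignore any later qemu
             * BLOCK_JOB_CANCELLED event for this device. */
            if (async_requested)
                emit_cancelled_event(device);
            return 0;
        }

        /* Job still active: we must have the new async semantics. */
        if (async_requested)
            return 0;    /* the qemu event will be forwarded to the user later */

        /* Blocking mode: poll (sleeping between iterations) until it is gone. */
        while (block_job_still_active(device) > 0)
            ;
        return 0;
    }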
To adopt the new semantics while preserving compatibility I propose the following updates to the virDomainBlockJob API:
A new block job event type VIR_DOMAIN_BLOCK_JOB_CANCELLED will be recognized by libvirt. Regardless of the flags used with virDomainBlockJobAbort, this event will be raised whenever it is received from qemu. This event indicates that a block job has been successfully cancelled.
I'm not sure it should be raised unconditionally on receipt of a qemu event, but only in the case of a virDomainBlockJobAbort(ASYNC) where we detect that cancel is in flight when we return control to the user.
A new extension flag VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC will be added to the virDomainBlockJobAbort API. When enabled, this function will operate asynchronously (ie, it can return before the job has actually been cancelled). When the API is used in this mode, it is the responsibility of the caller to wait for a VIR_DOMAIN_BLOCK_JOB_CANCELLED event or poll via the virDomainGetBlockJobInfo API to check the cancellation status.
Agreed.
Without the VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag, libvirt will internally poll using qemu's "query-block-jobs" API and will not return until the operation has been completed. API users are advised that this operation is unbounded and further interaction with the domain during this period may block.
Agreed.
This patch implements the new event type, the API flag, and the polling. The main outstanding issue is whether we should bound the amount of time we will wait for cancellation and return an error.
I'm almost wondering if we should break this patch into a series of smaller patches, but I'll go ahead and review it intact.
Comments on this proposal?
Signed-off-by: Adam Litke <agl@us.ibm.com> Cc: Stefan Hajnoczi <stefanha@gmail.com> Cc: Eric Blake <eblake@redhat.com> --- include/libvirt/libvirt.h.in | 10 ++++++++++ src/libvirt.c | 9 ++++++++- src/qemu/qemu_driver.c | 24 +++++++++++++++++------- src/qemu/qemu_monitor.c | 24 ++++++++++++++++++++++++ src/qemu/qemu_monitor.h | 2 ++ src/qemu/qemu_monitor_json.c | 36 +++++++++++++++++++++++++++++------- 6 files changed, 90 insertions(+), 15 deletions(-)
Your patch adds trailing whitespace :( 'make syntax-check' would diagnose this, and other issues.
diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index e436f3c..fc7f028 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -1776,6 +1776,15 @@ typedef enum { VIR_DOMAIN_BLOCK_JOB_TYPE_PULL = 1, } virDomainBlockJobType;
+/**
+ * virDomainBlockJobAbortFlags:
+ *
+ * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC: Request only, do not wait for completion
+ */
+typedef enum {
+    VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC = 1,
I'd write this as (1 << 0), to make it easier to add new flags in the future as (1 << 1) and so forth.
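In other words, the hunk would read roughly as follows (a sketch of the suggested spelling only, not the committed code):

    /**
     * virDomainBlockJobAbortFlags:
     *
     * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC: Request only, do not wait for completion
     */
    typedef enum {
        VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC = 1 << 0,
    } virDomainBlockJobAbortFlags;

Future flags can then be added as (1 << 1), (1 << 2), and so on without renumbering.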
+++ b/src/libvirt.c @@ -17299,7 +17299,7 @@ error: * virDomainBlockJobAbort: * @dom: pointer to domain object * @disk: path to the block device, or device shorthand - * @flags: extra flags; not used yet, so callers should always pass 0 + * @flags: bitwise or of virDomainBlockJobAbortFlags
Nit: for consistency, s/bitwise or/bitwise-OR/
* * Cancel the active block job on the given disk. * @@ -17310,6 +17310,13 @@ error: * can be found by calling virDomainGetXMLDesc() and inspecting * elements within //domain/devices/disk. * + * By default, this function performs a synchronous operation and the caller + * may assume that the operation has completed when 0 is returned. However, + * BlockJob operations may take a long time to complete, and during this time + * further domain interactions may be unresponsive. To avoid this problem, pass + * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC in the flags argument to enable asynchronous + * behavior. When the job has been cancelled, a BlockJob event will be emitted.
Interesting mix of US (behavior) and UK (cancelled) spelling in one line. We tend to stick to US spelling (canceled) in the public docs, although this is not a hard and fast rule.

I'm not sure whether we should guarantee that the BlockJob event will only be emitted if ASYNC was passed in; or whether a BlockJob event is unconditionally emitted even if the user passed flags of 0 and therefore the cancellation completed. I guess my thoughts at the head of this email would lean toward the former interpretation (require ASYNC before the event will reach the user).
+++ b/src/qemu/qemu_driver.c @@ -11379,7 +11379,7 @@ cleanup: static int qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, unsigned long bandwidth, virDomainBlockJobInfoPtr info, - int mode) + int flags, int mode)
Use 'unsigned int' for the flags ('make syntax-check' caught this), and I'd make flags the last argument.
{ struct qemud_driver *driver = dom->conn->privateData; virDomainObjPtr vm = NULL; @@ -11420,6 +11420,15 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, qemuDomainObjEnterMonitorWithDriver(driver, vm); priv = vm->privateData; ret = qemuMonitorBlockJob(priv->mon, device, bandwidth, info, mode); + /* + * Qemu provides asynchronous block job cancellation but without the + * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag libvirt guarantees a synchronous + * operation. Provide this behavior by waiting here (with the monitor + * locked) so we don't get confused by newly scheduled block jobs. + */ + if (ret == 0 && mode == BLOCK_JOB_ABORT && + !(flags & VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC))
Indentation - I'd make the ! of the second row flush with 'ret' of the first row:

    if (ret ...
        !(flags ...

Semantics-wise, I don't think this quite works with the older semantics of block_job_cancel. That is, I think you have to have logic more like:

    if (ret == 0 && mode == BLOCK_JOB_ABORT) {
        query status
        if done {
            if ASYNC
                generate libvirt event
        } else {
            if !ASYNC
                wait for completion
            else
                set state to convert qemu event to libvirt event
        }
    }
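Translated into C, that suggested flow might look like the sketch below. The qemuBlockJobEmitEvent() helper and the blockJobCancelEventPending marker are invented here purely for illustration; the query return convention (ret < 0 error, 0 no job, > 0 job still active) follows the patch's qemuMonitorBlockJob().

    if (ret == 0 && mode == BLOCK_JOB_ABORT) {
        virDomainBlockJobInfo dummy;
        int status = qemuMonitorBlockJob(priv->mon, device, 0, &dummy,
                                         BLOCK_JOB_INFO);
        if (status < 0) {
            ret = -1;
        } else if (status == 0) {
            /* Job already gone: either the old synchronous semantics or a
             * very fast cancel.  Synthesize the event if one was requested. */
            if (flags & VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC)
                qemuBlockJobEmitEvent(driver, vm, path,
                                      VIR_DOMAIN_BLOCK_JOB_CANCELLED);
        } else {
            /* Cancel still in flight: the new asynchronous semantics. */
            if (flags & VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC)
                priv->blockJobCancelEventPending = true;  /* forward qemu's event */
            else
                ret = qemuMonitorBlockJobCancelWait(priv->mon, device);
        }
    }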
@@ -11439,8 +11448,7 @@ cleanup: static int qemuDomainBlockJobAbort(virDomainPtr dom, const char *path, unsigned int flags) { - virCheckFlags(0, -1);
Don't drop this line. Rather, alter it to be:

    virCheckFlags(VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC, -1);

to state that we are explicitly allowing only that one flag value through (previously, we were allowing no flags at all; dropping the line would allow all 32 bits through).
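Applied to the abort entry point - and with flags typed unsigned and moved to the last position, per the earlier comments - a v2 might end up looking roughly like this (a sketch, not the committed code):

    static int
    qemuDomainBlockJobAbort(virDomainPtr dom, const char *path,
                            unsigned int flags)
    {
        virCheckFlags(VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC, -1);
        return qemuDomainBlockJobImpl(dom, path, 0, NULL, BLOCK_JOB_ABORT,
                                      flags);
    }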
+++ b/src/qemu/qemu_monitor.c @@ -2585,6 +2585,30 @@ int qemuMonitorScreendump(qemuMonitorPtr mon, return ret; }
+/* Poll the monitor to wait for the block job on a given disk to end. + * We don't need to worry about another block job starting since we have the + * driver locked. */
Can qemu support multiple block jobs in parallel? If so, are we going to regret our decision to serialize things in libvirt?
+int
+qemuMonitorBlockJobCancelWait(qemuMonitorPtr mon, const char *device)
+{
+    VIR_DEBUG("mon=%p, device=%p", mon, device);
+    /* XXX: Should we provide some sort of escape hatch for this wait? */
I'm okay with waiting indefinitely, for lack of any better suggestions on how we would let the user pick a timeout value.
+    while (1) {
+        /* Poll every 100ms */
+        int ret;
+        struct timespec ts = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000ull };
+        virDomainBlockJobInfo info;
usleep() instead of nanosleep() might be an easier interface to use for the length of time you are sleeping. libvirt.git only has one other instance of nanosleep() at the moment. (That reminds me: someday, I'd really like an LGPLv2+ gnulib interface that lets you call sane_sleep(double), without having all the shenanigans of nanosleep, and with more range than usleep - but that's a story for another day).
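For comparison, a usleep()-based variant of the patch's wait loop might look like this (sketch only; it assumes <unistd.h> is available for usleep()):

    int
    qemuMonitorBlockJobCancelWait(qemuMonitorPtr mon, const char *device)
    {
        VIR_DEBUG("mon=%p, device=%p", mon, device);
        while (1) {
            virDomainBlockJobInfo info;
            int ret = qemuMonitorBlockJob(mon, device, 0, &info,
                                          BLOCK_JOB_INFO);
            if (ret < 0)
                return -1;       /* monitor error */
            if (ret == 0)
                return 0;        /* no job left: cancellation has finished */
            usleep(100 * 1000);  /* poll every 100ms */
        }
    }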
+
+        ret = qemuMonitorBlockJob(mon, device, 0, &info, BLOCK_JOB_INFO);
+        if (ret < 0)
+            return -1;
+        else if (ret == 0)
+            return 0;
+        else
+            nanosleep(&ts, NULL);
Ah, so we are polling solely on the job info command, and not waiting for the qemu event. I guess we could make things slightly smarter by also paying attention to whether we expect a qemu event, in which case we avoid the polling, but that would make this loop a bit more complex, and I'm not sure if it is worth the tradeoff.
+static void qemuMonitorJSONHandleBlockJobCancelled(qemuMonitorPtr mon,
+                                                   virJSONValuePtr data)
+{
+    qemuMonitorJSONHandleBlockJobImpl(mon, data, VIR_DOMAIN_BLOCK_JOB_CANCELLED);
This is where I'd make things conditional to only emit a cancelled event if the user is expecting one because they passed the ASYNC flag.

I think there are still a few design issues to address in a v2 patch, but I'd love to see this go in before 0.9.10. I wish I had more time to actually contribute fixes, but here are my quickie fixes for just the syntax problems caught by 'make syntax-check', to prove that I at least compile-tested and minimally played with this patch. If anyone else has comments on the approach or my suggestions for tweaking the design slightly - speak up now!

diff --git i/src/libvirt.c w/src/libvirt.c
index 3821dd8..1507338 100644
--- i/src/libvirt.c
+++ w/src/libvirt.c
@@ -17379,7 +17379,7 @@ error:
  * further domain interactions may be unresponsive. To avoid this problem, pass
  * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC in the flags argument to enable asynchronous
  * behavior. When the job has been cancelled, a BlockJob event will be emitted.
- *
+ *
  * Returns -1 in case of failure, 0 when successful.
  */
 int virDomainBlockJobAbort(virDomainPtr dom, const char *disk,
diff --git i/src/qemu/qemu_driver.c w/src/qemu/qemu_driver.c
index cc20c4f..3d1766b 100644
--- i/src/qemu/qemu_driver.c
+++ w/src/qemu/qemu_driver.c
@@ -11239,7 +11239,7 @@ cleanup:
 static int
 qemuDomainBlockJobImpl(virDomainPtr dom, const char *path,
                        unsigned long bandwidth, virDomainBlockJobInfoPtr info,
-                       int flags, int mode)
+                       unsigned int flags, int mode)
 {
     struct qemud_driver *driver = dom->conn->privateData;
     virDomainObjPtr vm = NULL;
@@ -11281,7 +11281,7 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path,
     priv = vm->privateData;
     ret = qemuMonitorBlockJob(priv->mon, device, bandwidth, info, mode);
     /*
-     * Qemu provides asynchronous block job cancellation but without the
+     * Qemu provides asynchronous block job cancellation but without the
      * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag libvirt guarantees a synchronous
      * operation. Provide this behavior by waiting here (with the monitor
      * locked) so we don't get confused by newly scheduled block jobs.
diff --git i/src/qemu/qemu_monitor.c w/src/qemu/qemu_monitor.c
index 5c22486..9095eb6 100644
--- i/src/qemu/qemu_monitor.c
+++ w/src/qemu/qemu_monitor.c
@@ -1,7 +1,7 @@
 /*
  * qemu_monitor.c: interaction with QEMU monitor console
  *
- * Copyright (C) 2006-2011 Red Hat, Inc.
+ * Copyright (C) 2006-2012 Red Hat, Inc.
  * Copyright (C) 2006 Daniel P. Berrange
  *
  * This library is free software; you can redistribute it and/or
@@ -2605,7 +2605,7 @@ qemuMonitorBlockJobCancelWait(qemuMonitorPtr mon, const char *device)
         else if (ret == 0)
             return 0;
         else
-            nanosleep(&ts, NULL);
+            nanosleep(&ts, NULL);
     }
 }

--
Eric Blake   eblake@redhat.com   +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On Thu, Jan 19, 2012 at 05:39:43PM -0700, Eric Blake wrote:
On 01/13/2012 01:51 PM, Adam Litke wrote:
Qemu has changed the semantics of the "block_job_cancel" API. Originally, the operation was synchronous (ie. upon command completion, the operation was guaranteed to be completely stopped). With the new semantics, a "block_job_cancel" merely requests that the operation be cancelled and an event is triggered once the cancellation request has been honored.
How can we tell the two implementations of qemu apart? That is, how is libvirt supposed to know whether it is calling the old (blocking) variant, or the new async variant, in order to know whether it should wait for an event? On the other hand, I think we can make things work:
call cancel, then call job status if the job is complete, then we are done, whether or not we have old or new command semantics (and if we have new command semantics, be prepared to ignore an event from qemu)
I don't really like ignoring domain events based on one specific API invocation -- see comment farther below.
if the job is not complete, then we have the new semantics, and we know to expect the event
Pretty clever, but there is a very small race window where the cancel completed and a new job has started immediately. In that case, we could assume we have new command semantics when we actually have an old qemu.
coupled with:
if the user passed 0 for the flag (they want blocking), then make the command wait for completion (either by job-status query or by getting the event), and no event is passed to the user
if the user passed ASYNC for the flag, then they want an event; if we detect the async variant of the command, then we hook up our qemu event handler to turn around and issue a user event. Conversely, if job-query says the command is complete, then we generate our own event without waiting for a qemu event.
That means that we _don't_ blindly pass the qemu event on to the user; rather, on reception of a qemu event, we determine if we have an incomplete cancel in flight, and only then do we pass it on to the user; otherwise, we assume that the qemu event came after we already saw the job-info query saying things were complete, and therefore, we had already generated the user's event and can ignore the qemu event.
It would help things if the new qemu command were spelled with a different name than the old blocking version.
To adopt the new semantics while preserving compatibility I propose the following updates to the virDomainBlockJob API:
A new block job event type VIR_DOMAIN_BLOCK_JOB_CANCELLED will be recognized by libvirt. Regardless of the flags used with virDomainBlockJobAbort, this event will be raised whenever it is received from qemu. This event indicates that a block job has been successfully cancelled.
I'm not sure it should be raised unconditionally on receipt of a qemu event, but only in the case of a virDomainBlockJobAbort(ASYNC) where we detect that cancel is in flight when we return control to the user.
What happens if an impatient user calls cancel several times on multiple devices, sometimes using ASYNC and sometimes not? Clearly then, we should only raise events for the devices for which the user passed the ASYNC flag. We are going to need a flag per disk that specifies whether block_job_cancel events are user-deliverable. This is all getting pretty complicated. Can we take a step back and look at how we can simplify this?

I just thought of another reason why I would prefer not to mask out events based on the initial API call. Imagine a libvirtd where multiple users/applications are connected. User A is a disk manager that monitors BlockJob operations (for logging and VM placement decision making). User B is an administrator. If we mask the cancel event depending on User B's invocation of the API, then User A is not guaranteed to see a consistent view of what is happening on the system (ie. it may not receive some events it was expecting).

I am not sure if this is practical (or desirable), but here is an idea:
 - We always deliver any events that may arise.
 - For the case where an event is requested but qemu is too old to issue one: create a new timer with a callback that calls query-block-jobs to check for active jobs. If no active jobs are found, issue the event; otherwise, reset the timer. (We would need to handle the case where a new job is created before the timer fires. This isn't that difficult.)
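A rough sketch of that timer idea, assuming the driver can use the event-loop helpers virEventAddTimeout()/virEventRemoveTimeout() from this context; qemuDomainHasActiveBlockJob() and qemuBlockJobEmitEvent() are hypothetical helpers, and locking/reference counting of the domain object are glossed over:

    struct blockJobEventWaiter {
        struct qemud_driver *driver;
        virDomainObjPtr vm;     /* would need proper ref counting */
        char *path;             /* disk whose cancel we are tracking */
    };

    static void
    blockJobEventTimer(int timer, void *opaque)
    {
        struct blockJobEventWaiter *w = opaque;

        /* Issue query-block-jobs (via a hypothetical helper); if the job is
         * still running, let the timer fire again on the next interval. */
        if (qemuDomainHasActiveBlockJob(w->driver, w->vm, w->path))
            return;

        qemuBlockJobEmitEvent(w->driver, w->vm, w->path,
                              VIR_DOMAIN_BLOCK_JOB_CANCELLED);
        virEventRemoveTimeout(timer);
    }

    /* Armed from the abort path only when a cancel was requested but qemu is
     * too old to emit the event itself; re-fires every 500ms until the job
     * disappears. */
    virEventAddTimeout(500, blockJobEventTimer, w, NULL);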
A new extension flag VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC will be added to the virDomainBlockJobAbort API. When enabled, this function will operate asynchronously (ie, it can return before the job has actually been cancelled). When the API is used in this mode, it is the responsibility of the caller to wait for a VIR_DOMAIN_BLOCK_JOB_CANCELLED event or poll via the virDomainGetBlockJobInfo API to check the cancellation status.
Agreed.
Without the VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag, libvirt will internally poll using qemu's "query-block-jobs" API and will not return until the operation has been completed. API users are advised that this operation is unbounded and further interaction with the domain during this period may block.
Agreed.
This patch implements the new event type, the API flag, and the polling. The main outstanding issue is whether we should bound the amount of time we will wait for cancellation and return an error.
I'm almost wondering if we should break this patch into a series of smaller patches, but I'll go ahead and review it intact.
Comments on this proposal?
Signed-off-by: Adam Litke <agl@us.ibm.com> Cc: Stefan Hajnoczi <stefanha@gmail.com> Cc: Eric Blake <eblake@redhat.com> --- include/libvirt/libvirt.h.in | 10 ++++++++++ src/libvirt.c | 9 ++++++++- src/qemu/qemu_driver.c | 24 +++++++++++++++++------- src/qemu/qemu_monitor.c | 24 ++++++++++++++++++++++++ src/qemu/qemu_monitor.h | 2 ++ src/qemu/qemu_monitor_json.c | 36 +++++++++++++++++++++++++++++------- 6 files changed, 90 insertions(+), 15 deletions(-)
Your patch adds trailing whitespace :( 'make syntax-check' would diagnose this, and other issues.
diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in index e436f3c..fc7f028 100644 --- a/include/libvirt/libvirt.h.in +++ b/include/libvirt/libvirt.h.in @@ -1776,6 +1776,15 @@ typedef enum { VIR_DOMAIN_BLOCK_JOB_TYPE_PULL = 1, } virDomainBlockJobType;
+/**
+ * virDomainBlockJobAbortFlags:
+ *
+ * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC: Request only, do not wait for completion
+ */
+typedef enum {
+    VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC = 1,
I'd write this as (1 << 0), to make it easier to add new flags in the future as (1 << 1) and so forth.
+++ b/src/libvirt.c @@ -17299,7 +17299,7 @@ error: * virDomainBlockJobAbort: * @dom: pointer to domain object * @disk: path to the block device, or device shorthand - * @flags: extra flags; not used yet, so callers should always pass 0 + * @flags: bitwise or of virDomainBlockJobAbortFlags
Nit: for consistency, s/bitwise or/bitwise-OR/
* * Cancel the active block job on the given disk. * @@ -17310,6 +17310,13 @@ error: * can be found by calling virDomainGetXMLDesc() and inspecting * elements within //domain/devices/disk. * + * By default, this function performs a synchronous operation and the caller + * may assume that the operation has completed when 0 is returned. However, + * BlockJob operations may take a long time to complete, and during this time + * further domain interactions may be unresponsive. To avoid this problem, pass + * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC in the flags argument to enable asynchronous + * behavior. When the job has been cancelled, a BlockJob event will be emitted.
Interesting mix of US (behavior) and UK (cancelled) spelling in one line. We tend to stick to US spelling (canceled) in the public docs, although this is not a hard and fast rule.
I'm not sure whether we should guarantee that the BlockJob event will only be emitted if ASYNC was passed in; or whether a BlockJob event is unconditionally emitted even if the user passed flags of 0 and therefore the cancellation completed. I guess my thoughts at the head of this email would lean toward the former interpretation (require ASYNC before the event will reach the user).
+++ b/src/qemu/qemu_driver.c @@ -11379,7 +11379,7 @@ cleanup: static int qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, unsigned long bandwidth, virDomainBlockJobInfoPtr info, - int mode) + int flags, int mode)
Use 'unsigned int' for the flags ('make syntax-check' caught this), and I'd make flags the last argument.
{ struct qemud_driver *driver = dom->conn->privateData; virDomainObjPtr vm = NULL; @@ -11420,6 +11420,15 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path, qemuDomainObjEnterMonitorWithDriver(driver, vm); priv = vm->privateData; ret = qemuMonitorBlockJob(priv->mon, device, bandwidth, info, mode); + /* + * Qemu provides asynchronous block job cancellation but without the + * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag libvirt guarantees a synchronous + * operation. Provide this behavior by waiting here (with the monitor + * locked) so we don't get confused by newly scheduled block jobs. + */ + if (ret == 0 && mode == BLOCK_JOB_ABORT && + !(flags & VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC))
Indentation - I'd make the ! of the second row flush with 'ret' of the first row:
if (ret ... !(flags ...
Semantics-wise, I don't think this quite works with the older semantics of block_job_cancel. That is, I think you have to have logic more like:
if (ret == 0 && mode == BLOCK_JOB_ABORT) {
    query status
    if done {
        if ASYNC
            generate libvirt event
    } else {
        if !ASYNC
            wait for completion
        else
            set state to convert qemu event to libvirt event
    }
}
@@ -11439,8 +11448,7 @@ cleanup: static int qemuDomainBlockJobAbort(virDomainPtr dom, const char *path, unsigned int flags) { - virCheckFlags(0, -1);
Don't drop this line. Rather, alter it to be:
virCheckFlags(VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC, -1);
to state that we are explicitly allowing only that one flag value through (previously, we were allowing no flags at all; dropping the line would allow all 32 bits through).
+++ b/src/qemu/qemu_monitor.c @@ -2585,6 +2585,30 @@ int qemuMonitorScreendump(qemuMonitorPtr mon, return ret; }
+/* Poll the monitor to wait for the block job on a given disk to end. + * We don't need to worry about another block job starting since we have the + * driver locked. */
Can qemu support multiple block jobs in parallel? If so, are we going to regret our decision to serialize things in libvirt?
There can be one per block device.
+int
+qemuMonitorBlockJobCancelWait(qemuMonitorPtr mon, const char *device)
+{
+    VIR_DEBUG("mon=%p, device=%p", mon, device);
+    /* XXX: Should we provide some sort of escape hatch for this wait? */
I'm okay with waiting indefinitely, for lack of any better suggestions on how we would let the user pick a timeout value.
+    while (1) {
+        /* Poll every 100ms */
+        int ret;
+        struct timespec ts = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000ull };
+        virDomainBlockJobInfo info;
usleep() instead of nanosleep() might be an easier interface to use for the length of time you are sleeping. libvirt.git only has one other instance of nanosleep() at the moment. (That reminds me: someday, I'd really like an LGPLv2+ gnulib interface that lets you call sane_sleep(double), without having all the shenanigans of nanosleep, and with more range than usleep - but that's a story for another day).
+
+        ret = qemuMonitorBlockJob(mon, device, 0, &info, BLOCK_JOB_INFO);
+        if (ret < 0)
+            return -1;
+        else if (ret == 0)
+            return 0;
+        else
+            nanosleep(&ts, NULL);
Ah, so we are polling solely on the job info command, and not waiting for the qemu event. I guess we could make things slightly smarter by also paying attention to whether we expect a qemu event, in which case we avoid the polling, but that would make this loop a bit more complex, and I'm not sure if it is worth the tradeoff.
In earlier discussions, we decided not to rely on the event for this part because we would then need to know whether qemu even supports it. I suppose we would just need to poll the first time, and that would tell us whether events are supported.
+static void qemuMonitorJSONHandleBlockJobCancelled(qemuMonitorPtr mon,
+                                                   virJSONValuePtr data)
+{
+    qemuMonitorJSONHandleBlockJobImpl(mon, data, VIR_DOMAIN_BLOCK_JOB_CANCELLED);
This is where I'd make things conditional to only emit a cancelled event if the user is expecting one because they passed the ASYNC flag.
I think there are still a few design issues to address in a v2 patch, but I'd love to see this go in before 0.9.10. I wish I had more time to actually contribute fixes, but here are my quickie fixes for just the syntax problems caught by 'make syntax-check', to prove that I at least compile-tested and minimally played with this patch. If anyone else has comments on the approach or my suggestions for tweaking the design slightly - speak up now!
Thanks for your careful review!
diff --git i/src/libvirt.c w/src/libvirt.c
index 3821dd8..1507338 100644
--- i/src/libvirt.c
+++ w/src/libvirt.c
@@ -17379,7 +17379,7 @@ error:
 * further domain interactions may be unresponsive. To avoid this problem, pass
 * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC in the flags argument to enable asynchronous
 * behavior. When the job has been cancelled, a BlockJob event will be emitted.
- *
+ *
 * Returns -1 in case of failure, 0 when successful.
 */
 int virDomainBlockJobAbort(virDomainPtr dom, const char *disk,
diff --git i/src/qemu/qemu_driver.c w/src/qemu/qemu_driver.c
index cc20c4f..3d1766b 100644
--- i/src/qemu/qemu_driver.c
+++ w/src/qemu/qemu_driver.c
@@ -11239,7 +11239,7 @@ cleanup:
 static int
 qemuDomainBlockJobImpl(virDomainPtr dom, const char *path,
                        unsigned long bandwidth, virDomainBlockJobInfoPtr info,
-                       int flags, int mode)
+                       unsigned int flags, int mode)
 {
     struct qemud_driver *driver = dom->conn->privateData;
     virDomainObjPtr vm = NULL;
@@ -11281,7 +11281,7 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path,
     priv = vm->privateData;
     ret = qemuMonitorBlockJob(priv->mon, device, bandwidth, info, mode);
     /*
-     * Qemu provides asynchronous block job cancellation but without the
+     * Qemu provides asynchronous block job cancellation but without the
      * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag libvirt guarantees a synchronous
      * operation. Provide this behavior by waiting here (with the monitor
      * locked) so we don't get confused by newly scheduled block jobs.
diff --git i/src/qemu/qemu_monitor.c w/src/qemu/qemu_monitor.c
index 5c22486..9095eb6 100644
--- i/src/qemu/qemu_monitor.c
+++ w/src/qemu/qemu_monitor.c
@@ -1,7 +1,7 @@
 /*
 * qemu_monitor.c: interaction with QEMU monitor console
 *
- * Copyright (C) 2006-2011 Red Hat, Inc.
+ * Copyright (C) 2006-2012 Red Hat, Inc.
 * Copyright (C) 2006 Daniel P. Berrange
 *
 * This library is free software; you can redistribute it and/or
@@ -2605,7 +2605,7 @@ qemuMonitorBlockJobCancelWait(qemuMonitorPtr mon, const char *device)
         else if (ret == 0)
             return 0;
         else
-            nanosleep(&ts, NULL);
+            nanosleep(&ts, NULL);
     }
 }
--
Eric Blake   eblake@redhat.com   +1-919-301-3266
Libvirt virtualization library http://libvirt.org
--
Adam Litke <agl@us.ibm.com>
IBM Linux Technology Center

On Fri, Jan 13, 2012 at 8:51 PM, Adam Litke <agl@us.ibm.com> wrote:
Qemu has changed the semantics of the "block_job_cancel" API. Originally, the operation was synchronous (ie. upon command completion, the operation was guaranteed to be completely stopped). With the new semantics, a "block_job_cancel" merely requests that the operation be cancelled and an event is triggered once the cancellation request has been honored.
To adopt the new semantics while preserving compatibility I propose the following updates to the virDomainBlockJob API:
A new block job event type VIR_DOMAIN_BLOCK_JOB_CANCELLED will be recognized by libvirt. Regardless of the flags used with virDomainBlockJobAbort, this event will be raised whenever it is received from qemu. This event indicates that a block job has been successfully cancelled.
A new extension flag VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC will be added to the virDomainBlockJobAbort API. When enabled, this function will operate asynchronously (ie, it can return before the job has actually been cancelled). When the API is used in this mode, it is the responsibility of the caller to wait for a VIR_DOMAIN_BLOCK_JOB_CANCELLED event or poll via the virDomainGetBlockJobInfo API to check the cancellation status.
Without the VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag, libvirt will internally poll using qemu's "query-block-jobs" API and will not return until the operation has been completed. API users are advised that this operation is unbounded and further interaction with the domain during this period may block.
This patch implements the new event type, the API flag, and the polling. The main outstanding issue is whether we should bound the amount of time we will wait for cancellation and return an error.
Comments on this proposal?
Hi,

What's the latest thinking on this issue? I'm happy to help come up with an acceptable patch. I found Adam's initial approach plus suggested cleanups fine. Would that be accepted?

Stefan
participants (5): Adam Litke, Dave Allan, Eric Blake, Federico Simoncelli, Stefan Hajnoczi