On 01/13/2012 01:51 PM, Adam Litke wrote:
Qemu has changed the semantics of the "block_job_cancel"
API. Originally, the
operation was synchronous (ie. upon command completion, the operation was
guaranteed to be completely stopped). With the new semantics, a
"block_job_cancel" merely requests that the operation be cancelled and an
event
is triggered once the cancellation request has been honored.
How can we tell the two implementations of qemu apart? That is, how is
libvirt supposed to know whether it is calling the old (blocking)
variant, or the new async variant, in order to know whether it should
wait for an event? On the other hand, I think we can make things work:
call cancel, then call job status
if the job is complete, then we are done, whether or not we have old or
new command semantics (and if we have new command semantics, be prepared
to ignore an event from qemu)
if the job is not complete, the we have the new semantics, and we know
to expect the event
coupled with:
if the user passed 0 for the flag (they want blocking), then make the
command wait for completion (either by job-status query or by getting
the event), and no event is passed to the user
if the user passed ASYNC for the flag, then they want an event; if we
detect the async variant of the command, then we hook up our qemu event
handler to turn around and issue a user event. Conversely, if job-query
says the command is complete, then we generate our own event without
waiting for a qemu event.
That means that we _don't_ blindly pass the qemu event on to the user;
rather, on reception of a qemu event, we determine if we have an
incomplete cancel in flight, and only then do we pass it on to the user;
otherwise, we assume that the qemu event came after we already saw the
job-info query saying things were complete, and therefore, we had
already generated the user's event and can ignore the qemu event.
It would help things if the new qemu command were spelled with a
different name than the old blocking version.
To adopt the new semantics while preserving compatibility I propose the
following updates to the virDomainBlockJob API:
A new block job event type VIR_DOMAIN_BLOCK_JOB_CANCELLED will be recognized by
libvirt. Regardless of the flags used with virDomainBlockJobAbort, this event
will be raised whenever it is received from qemu. This event indicates that a
block job has been successfully cancelled.
I'm not sure it should be raised unconditionally on receipt of a qemu
event, but only in the case of a virDomainBlockJobAbort(ASYNC) where we
detect that cancel is in flight when we return control to the user.
A new extension flag VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC will be added to the
virDomainBlockJobAbort API. When enabled, this function will operate
asynchronously (ie, it can return before the job has actually been cancelled).
When the API is used in this mode, it is the responsibility of the caller to
wait for a VIR_DOMAIN_BLOCK_JOB_CANCELLED event or poll via the
virDomainGetBlockJobInfo API to check the cancellation status.
Agreed.
Without the VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag, libvirt will internally poll
using qemu's "query-block-jobs" API and will not return until the operation
has
been completed. API users are advised that this operation is unbounded and
further interaction with the domain during this period may block.
Agreed.
This patch implements the new event type, the API flag, and the polling. The
main outstanding issue is whether we should bound the amount of time we will
wait for cancellation and return an error.
I'm almost wondering if we should break this patch into a series of
smaller patches, but I'll go ahead and review it intact.
Comments on this proposal?
Signed-off-by: Adam Litke <agl(a)us.ibm.com>
Cc: Stefan Hajnoczi <stefanha(a)gmail.com>
Cc: Eric Blake <eblake(a)redhat.com>
---
include/libvirt/libvirt.h.in | 10 ++++++++++
src/libvirt.c | 9 ++++++++-
src/qemu/qemu_driver.c | 24 +++++++++++++++++-------
src/qemu/qemu_monitor.c | 24 ++++++++++++++++++++++++
src/qemu/qemu_monitor.h | 2 ++
src/qemu/qemu_monitor_json.c | 36 +++++++++++++++++++++++++++++-------
6 files changed, 90 insertions(+), 15 deletions(-)
Your patch adds trailing whitespace :( 'make syntax-check' would
diagnose this, and other issues.
diff --git a/include/libvirt/libvirt.h.in b/include/libvirt/libvirt.h.in
index e436f3c..fc7f028 100644
--- a/include/libvirt/libvirt.h.in
+++ b/include/libvirt/libvirt.h.in
@@ -1776,6 +1776,15 @@ typedef enum {
VIR_DOMAIN_BLOCK_JOB_TYPE_PULL = 1,
} virDomainBlockJobType;
+/**
+ * virDomainBlockJobAbortFlags:
+ *
+ * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC: Request only, do not wait for completion
+ */
+typedef enum {
+ VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC = 1,
I'd write this as (1 << 0), to make it easier to add new flags in the
future as (1 << 1) and so forth.
+++ b/src/libvirt.c
@@ -17299,7 +17299,7 @@ error:
* virDomainBlockJobAbort:
* @dom: pointer to domain object
* @disk: path to the block device, or device shorthand
- * @flags: extra flags; not used yet, so callers should always pass 0
+ * @flags: bitwise or of virDomainBlockJobAbortFlags
Nit: for consistency, s/bitwise or/bitwise-OR/
*
* Cancel the active block job on the given disk.
*
@@ -17310,6 +17310,13 @@ error:
* can be found by calling virDomainGetXMLDesc() and inspecting
* elements within //domain/devices/disk.
*
+ * By default, this function performs a synchronous operation and the caller
+ * may assume that the operation has completed when 0 is returned. However,
+ * BlockJob operations may take a long time to complete, and during this time
+ * further domain interactions may be unresponsive. To avoid this problem, pass
+ * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC in the flags argument to enable asynchronous
+ * behavior. When the job has been cancelled, a BlockJob event will be emitted.
Interesting mix of US (behavior) and UK (cancelled) spelling in one
line. We tend to stick to US spelling (canceled) in the public docs,
although this is not a hard and fast rule.
I'm not sure whether we should guarantee that the BlockJob event will
only be emitted if ASYNC was passed in; or whether a BlockJob event is
unconditionally emitted even if the user passed flags of 0 and therefore
the cancellation completed. I guess my thoughts at the head of this
email would lean toward the former interpretation (require ASYNC before
the event will reach the user).
+++ b/src/qemu/qemu_driver.c
@@ -11379,7 +11379,7 @@ cleanup:
static int
qemuDomainBlockJobImpl(virDomainPtr dom, const char *path,
unsigned long bandwidth, virDomainBlockJobInfoPtr info,
- int mode)
+ int flags, int mode)
Use 'unsigned int' for the flags ('make syntax-check' caught this), and
I'd make flags the last argument.
{
struct qemud_driver *driver = dom->conn->privateData;
virDomainObjPtr vm = NULL;
@@ -11420,6 +11420,15 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const char *path,
qemuDomainObjEnterMonitorWithDriver(driver, vm);
priv = vm->privateData;
ret = qemuMonitorBlockJob(priv->mon, device, bandwidth, info, mode);
+ /*
+ * Qemu provides asynchronous block job cancellation but without the
+ * VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag libvirt guarantees a synchronous
+ * operation. Provide this behavior by waiting here (with the monitor
+ * locked) so we don't get confused by newly scheduled block jobs.
+ */
+ if (ret == 0 && mode == BLOCK_JOB_ABORT &&
+ !(flags & VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC))
Indentation - I'd make the ! of the second row flush with 'ret' of the
first row:
if (ret ...
!(flags ...
Semantic wise, I don't think this quite works with the older semantics
of block_job_cancel. That is, I think that you have to have logic more
like:
if (ret == 0 && mode == BLOCK_JOB_ABORT) {
query status
if done {
if ASYNC
generate libvirt event
} else {
if !ASYNC
wait for completion
else
set state to convert qemu event to libvirt event
}
}
@@ -11439,8 +11448,7 @@ cleanup:
static int
qemuDomainBlockJobAbort(virDomainPtr dom, const char *path, unsigned int flags)
{
- virCheckFlags(0, -1);
Don't drop this line. Rather, alter it to be:
virCheckFlags(VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC, -1);
to state that we are explicitly allowing only that one flag value
through (previously, we were allowing no flags at all; dropping the line
would allow all 32 bits through).
+++ b/src/qemu/qemu_monitor.c
@@ -2585,6 +2585,30 @@ int qemuMonitorScreendump(qemuMonitorPtr mon,
return ret;
}
+/* Poll the monitor to wait for the block job on a given disk to end.
+ * We don't need to worry about another block job starting since we have the
+ * driver locked. */
Can qemu support multiple block jobs in parallel? If so, are we going
to regret our decision to serialize things in libvirt?
+int
+qemuMonitorBlockJobCancelWait(qemuMonitorPtr mon, const char *device)
+{
+ VIR_DEBUG("mon=%p, device=%p", mon, device);
+ /* XXX: Should we provide some sort of escape hatch for this wait? */
I'm okay with waiting indefinitely, for lack of any better suggestions
on how we would let the user pick a timeout value.
+ while (1) {
+ /* Poll every 100ms */
+ int ret;
+ struct timespec ts = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000ull };
+ virDomainBlockJobInfo info;
usleep() instead of nanosleep() might be an easier interface to use for
the length of time you are sleeping. libvirt.git only has one other
instance of nanosleep() at the moment. (That reminds me: someday, I'd
really like an LGPLv2+ gnulib interface that lets you call
sane_sleep(double), without having all the shenanigans of nanosleep,
and with more range than usleep - but that's a story for another day).
+
+ ret = qemuMonitorBlockJob(mon, device, 0, &info, BLOCK_JOB_INFO);
+ if (ret < 0)
+ return -1;
+ else if (ret == 0)
+ return 0;
+ else
+ nanosleep(&ts, NULL);
Ah, so we are polling solely on the job info command, and not waiting
for the qemu event. I guess we could make things slightly smarter by
also paying attention to whether we expect a qemu event, in which case
we avoid the polling, but that would make this loop a bit more complex,
and I'm not sure if it is worth the tradeoff.
+static void qemuMonitorJSONHandleBlockJobCancelled(qemuMonitorPtr
mon,
+ virJSONValuePtr data)
+{
+ qemuMonitorJSONHandleBlockJobImpl(mon, data, VIR_DOMAIN_BLOCK_JOB_CANCELLED);
This is where I'd make things conditional to only emit a cancelled event
if the user is expecting one because they passed the ASYNC flag.
I think there are still a few design issues to address in a v2 patch,
but I'd love to see this go in before 0.9.10. I wish I had more time to
actually contribute fixes; but here's my quickie fixes for just the
syntax problems caught by 'make syntax-check', to prove that I at least
compile-tested and minimally played with this patch. If anyone else has
comments on the approach or my suggestions for tweaking the design
slightly - speak up now!
diff --git i/src/libvirt.c w/src/libvirt.c
index 3821dd8..1507338 100644
--- i/src/libvirt.c
+++ w/src/libvirt.c
@@ -17379,7 +17379,7 @@ error:
* further domain interactions may be unresponsive. To avoid this
problem, pass
* VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC in the flags argument to enable
asynchronous
* behavior. When the job has been cancelled, a BlockJob event will be
emitted.
- *
+ *
* Returns -1 in case of failure, 0 when successful.
*/
int virDomainBlockJobAbort(virDomainPtr dom, const char *disk,
diff --git i/src/qemu/qemu_driver.c w/src/qemu/qemu_driver.c
index cc20c4f..3d1766b 100644
--- i/src/qemu/qemu_driver.c
+++ w/src/qemu/qemu_driver.c
@@ -11239,7 +11239,7 @@ cleanup:
static int
qemuDomainBlockJobImpl(virDomainPtr dom, const char *path,
unsigned long bandwidth,
virDomainBlockJobInfoPtr info,
- int flags, int mode)
+ unsigned int flags, int mode)
{
struct qemud_driver *driver = dom->conn->privateData;
virDomainObjPtr vm = NULL;
@@ -11281,7 +11281,7 @@ qemuDomainBlockJobImpl(virDomainPtr dom, const
char *path,
priv = vm->privateData;
ret = qemuMonitorBlockJob(priv->mon, device, bandwidth, info, mode);
/*
- * Qemu provides asynchronous block job cancellation but without the
+ * Qemu provides asynchronous block job cancellation but without the
* VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag libvirt guarantees a
synchronous
* operation. Provide this behavior by waiting here (with the monitor
* locked) so we don't get confused by newly scheduled block jobs.
diff --git i/src/qemu/qemu_monitor.c w/src/qemu/qemu_monitor.c
index 5c22486..9095eb6 100644
--- i/src/qemu/qemu_monitor.c
+++ w/src/qemu/qemu_monitor.c
@@ -1,7 +1,7 @@
/*
* qemu_monitor.c: interaction with QEMU monitor console
*
- * Copyright (C) 2006-2011 Red Hat, Inc.
+ * Copyright (C) 2006-2012 Red Hat, Inc.
* Copyright (C) 2006 Daniel P. Berrange
*
* This library is free software; you can redistribute it and/or
@@ -2605,7 +2605,7 @@ qemuMonitorBlockJobCancelWait(qemuMonitorPtr mon,
const char *device)
else if (ret == 0)
return 0;
else
- nanosleep(&ts, NULL);
+ nanosleep(&ts, NULL);
}
}
--
Eric Blake eblake(a)redhat.com +1-919-301-3266
Libvirt virtualization library
http://libvirt.org