On 04/10/2012 03:01 PM, Laine Stump wrote:
On 04/09/2012 11:52 PM, Eric Blake wrote:
> From: Adam Litke <agl(a)us.ibm.com>
>
> Qemu has changed the semantics of the "block_job_cancel" API. The original
> qed implementation (pretty much only backported to RHEL 6.2 qemu) was
> synchronous (i.e. upon command completion, the operation was guaranteed to
> be completely stopped). With the new semantics going into qemu 1.1 for
> qcow2, a "block_job_cancel" merely requests that the operation be cancelled
> and an event is triggered once the cancellation request has been honored.
>
> To adopt the new semantics while preserving compatibility the following
> updates are made to the virDomainBlockJob API:
>
> A new block job event type VIR_DOMAIN_BLOCK_JOB_CANCELED is recognized by
> libvirt. Regardless of the flags used with virDomainBlockJobAbort, this
> event will be raised whenever it is received from qemu. This event
> indicates that a block job has been successfully cancelled. For now,
> libvirt does not try to synthesize this event if using an older qemu that
> did not generate it.
Okay, as I understand it, the only qemu binary that has a synchronous
block_job_cancel command is the version that is part of RHEL6.2. So any
compatibility code to allow for a synchronous block_job_cancel command
in qemu would only ever make a difference for someone who built from
upstream libvirt sources (or post-RHEL6.2 source rpm) to run on RHEL6.2
(and *didn't* build a post-RHEL6.2 qemu). Is that correct?
Correct, that's how I understand it. I'm cc'ing Adam and Stefan in case
I missed something, though.
>
> A new extension flag VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC is added to the
> virDomainBlockJobAbort API. When enabled, this function will operate
> asynchronously (ie, it can return before the job has actually been canceled).
> When the API is used in this mode, it is the responsibility of the caller to
> wait for a VIR_DOMAIN_BLOCK_JOB_CANCELED event or poll via the
> virDomainGetBlockJobInfo API to check the cancellation status; using this
> flag is an error if it is not known whether the hypervisor supports
> asynchronous cancel.
>
> This patch also exposes the new flag through virsh, as well as wiring
> up the new event, but leaves the new flag for a later patch.
Did you leave out a word there? It exposes the new flag through virsh,
..., but leaves [blank] the new flag for a later patch? "implementing"?
Yes, adding 'implementing' helps it read better. Consider it done.
(It looks that way - the front end is there, and the backend has been
rearranged a bit, but it would still error out if someone specified the
flag, and it doesn't yet implement the wait loop for the case where the
flag was not specified.)
Since we're stuck with the assumption of "synchronous unless otherwise
specified", and we definitely want to allow for asynchronous, I'm okay
with this, *as long as upstream qemu has also committed (in git, not
just in email) to the block job cancel command being asynchronous*. I
*think* I understood you correctly that this is the case.
Yes, I'll update the commit message to point to the actual qemu commit id:
commit 370521a1d6f5537ea7271c119f3fbb7b0fa57063
Author: Stefan Hajnoczi <stefanha(a)linux.vnet.ibm.com>
Date:   Wed Jan 18 14:40:48 2012 +0000

    qmp: add block_job_cancel command

    Add block_job_cancel, which stops an active block streaming operation.
    When the operation has been cancelled the new BLOCK_JOB_CANCELLED event
    is emitted.

    Signed-off-by: Stefan Hajnoczi <stefanha(a)linux.vnet.ibm.com>
    Acked-by: Luiz Capitulino <lcapitulino(a)redhat.com>
    Signed-off-by: Kevin Wolf <kwolf(a)redhat.com>
> @@ -80,13 +81,14 @@ static struct {
> { "VNC_CONNECTED", qemuMonitorJSONHandleVNCConnect, },
> { "VNC_INITIALIZED", qemuMonitorJSONHandleVNCInitialize, },
> { "VNC_DISCONNECTED", qemuMonitorJSONHandleVNCDisconnect, },
> - { "BLOCK_JOB_COMPLETED", qemuMonitorJSONHandleBlockJob, },
> { "SPICE_CONNECTED", qemuMonitorJSONHandleSPICEConnect, },
> { "SPICE_INITIALIZED", qemuMonitorJSONHandleSPICEInitialize, },
> { "SPICE_DISCONNECTED", qemuMonitorJSONHandleSPICEDisconnect, },
> { "DEVICE_TRAY_MOVED", qemuMonitorJSONHandleTrayChange, },
> { "WAKEUP", qemuMonitorJSONHandlePMWakeup, },
> { "SUSPEND", qemuMonitorJSONHandlePMSuspend, },
> + { "BLOCK_JOB_COMPLETED", qemuMonitorJSONHandleBlockJobCompleted, },
> + { "BLOCK_JOB_CANCELLED", qemuMonitorJSONHandleBlockJobCanceled, },
> };
What? This isn't in alphabetical order? :-)
You caught me off-guard! :) This hunk is basically unchanged from Adam
(other than the fact that I swapped to U.S. spelling for the callback
method name, even though qemu picked the U.K. spelling for the event
name). If I sort the event names, it will be as a separate patch. (In
fact, as the number of qemu events grows, sorting the list will allow us
to do binary lookups instead of linear lookups, if we were worried about
scaling effects).
> @@ -785,11 +789,21 @@ static void qemuMonitorJSONHandleBlockJob(qemuMonitorPtr mon, virJSONValuePtr da
> if (STREQ(type_str, "stream"))
> type = VIR_DOMAIN_BLOCK_JOB_TYPE_PULL;
>
> - if (offset != 0 && offset == len)
> - status = VIR_DOMAIN_BLOCK_JOB_COMPLETED;
> + switch (event) {
> + case VIR_DOMAIN_BLOCK_JOB_COMPLETED:
> + /* Make sure the whole device has been processed */
> + if (offset != len)
> + event = VIR_DOMAIN_BLOCK_JOB_FAILED;
Previously we also tested for offset != 0. Is that not actually necessary?
This was Adam's change, so hopefully I'm justifying it correctly:
Previously, we assumed failure, then checked for offset != 0 && offset
== len before declaring success. But unless len == 0, offset == len
_implies_ offset != 0, rendering the first check redundant.
Now we assume success, and if offset != len declare failure. Yes, this
allows a difference in behavior if len == 0; but on the other hand, I
think the only way a block job can have len==0 is if there is nothing
for the job to do in the first place, and if that is the case, you might
as well declare it a success instead of a failure.
> @@ -7613,19 +7621,24 @@ cmdBlockJob(vshControl *ctl, const vshCmd *cmd)
> virDomainBlockJobInfo info;
> const char *type;
> int ret;
> + bool abortMode = (vshCommandOptBool(cmd, "abort") ||
> + vshCommandOptBool(cmd, "async"));
> + bool infoMode = vshCommandOptBool(cmd, "info");
> + bool bandwidth = vshCommandOptBool(cmd, "bandwidth");
>
> - if (vshCommandOptBool (cmd, "abort")) {
> - mode = VSH_CMD_BLOCK_JOB_ABORT;
> - } else if (vshCommandOptBool (cmd, "info")) {
> - mode = VSH_CMD_BLOCK_JOB_INFO;
> - } else if (vshCommandOptBool (cmd, "bandwidth")) {
> - mode = VSH_CMD_BLOCK_JOB_SPEED;
> - } else {
> + if (abortMode + infoMode + bandwidth > 1) {
>         vshError(ctl, "%s",
> -                _("One of --abort, --info, or --bandwidth is required"));
> +                _("conflict between --abort, --info, and --bandwidth modes"));
"modes", or "options"? I guess "options" implies they can
be combined,
while "modes" implies they are mutually exclusive...
and they are indeed mutually exclusive.
Really no surprises here. Functionally ACK, but as with the other
patches in this series, I'll leave the philosophical/strategic ACK to
others :-)
And based on what we decide with qemu making it possible to detect async
block cancel differently from RHEL 6.2 (see this email[1]), it would be
reasonable to push this patch even if we defer the rest of the series.
[1] https://lists.gnu.org/archive/html/qemu-devel/2012-04/msg01111.html
--
Eric Blake eblake(a)redhat.com +1-919-301-3266
Libvirt virtualization library
http://libvirt.org