On 04/09/2012 03:26 PM, Laine Stump wrote:
On 04/06/2012 02:29 PM, Eric Blake wrote:
> From: Adam Litke <agl(a)us.ibm.com>
>
> Without the VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag, libvirt will internally poll
> using qemu's "query-block-jobs" API and will not return until the
operation has
> been completed. API users are advised that this operation is unbounded and
> further interaction with the domain during this period may block. Future
> patches may refactor things to allow other queries in parallel with this
> polling. Unfortunately, there's no good way to tell if qemu will emit the
> new event, so this implementation always polls to deal with older qemu.
>
> Signed-off-by: Adam Litke <agl(a)us.ibm.com>
> Cc: Stefan Hajnoczi <stefanha(a)gmail.com>
> Signed-off-by: Eric Blake <eblake(a)redhat.com>
> ---
> src/qemu/qemu_driver.c | 55 +++++++++++++++++++++++++++++++++++++++++------
> 1 files changed, 48 insertions(+), 7 deletions(-)
>
> @@ -11641,6 +11641,45 @@ qemuDomainBlockJobImpl(virDomainPtr dom,
const char *path, const char *base,
> ret = qemuMonitorBlockJob(priv->mon, device, base, bandwidth, info, mode);
> qemuDomainObjExitMonitorWithDriver(driver, vm);
Do I understand correctly that qemu's abort was always asynchronous, and
prior to this, an abort call to libvirt would erroneously return
immediately? (I'm still trying to understand the code...)
Worse. qemu 1.1 will be adding the block_stream and block_job_cancel
monitor commands for the first time upstream, where block_job_cancel is
asynchronous. But RHEL 6.2 backported an early version of block_stream
that worked on QED only, and where block_job_cancel was synchronous.
The early synchronous version never emits an event (if we want to
guarantee an event, we will have to synthesize one in libvirt). The
asynchronous code always emits an event, but due to the RHEL 6.2 early
backport (or even an ill-timed libvirtd restart), you cannot rely on the
event. Therefore, libvirt has to poll.
On the old qemu, the poll will be a single iteration (because the event
has always completed). On the new qemu, the poll might be a single
iteration (there's an inherent race where qemu can finish things and
send the event before libvirt gets a chance to poll), or it might be
several iterations. So even if we only see one iteration, we cannot
assume the old qemu. I never got a good answer from Adam how long the
poll might last, but a good estimate is that the block job takes time to
cancel in proportion to the number of sectors that must still be flushed
to disk, so a larger disk is more likely to have a longer poll.
Since libvirt already exposed virDomainBlockJobCancel as synchronous by
default, we must maintain that default. I suppose I could work on an
additional patch that either allows you to mix the ASYNC flag with RHEL
6.2 qemu (by libvirt synthesizing the event) or by rejecting the ASYNC
flag if we don't know for sure that we have newer qemu; but my rationale
for not doing so in this patch is that (1) _only_ RHEL 6.2 backported
the preview version of block_job_cancel (shame on them for not naming it
__com.redhat_block_job_cancel) - all other qemu versions either lack the
command altogether or have upstream qemu 1.1's semantics of an async
command, and (2) I couldn't come up with a good way to probe whether we
are talking to new or old qemu during capability parsing. So such a
patch would only be relevant for RHEL 6.2, and might not be worth
posting to upstream libvirt.
Actually, on thinking about it, I guess we _might_ have a way to tell
old qemu apart from new qemu: old qemu has block_job_cancel but not
drive-mirror. If drive-mirror gets approved in time for qemu 1.1, then
we can use that as the witness to whether block_job_cancel is asynchronous.
>
> + /* Qemu provides asynchronous block job cancellation, but without
> + * the VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC flag libvirt guarantees a
> + * synchronous operation. Provide this behavior by waiting here,
> + * so we don't get confused by newly scheduled block jobs.
> + */
> + if (ret == 0 && mode == BLOCK_JOB_ABORT &&
> + !(flags & VIR_DOMAIN_BLOCK_JOB_ABORT_ASYNC)) {
> + ret = 1;
Why is ret set to 1? It will be assigned the return value of
qemuMonitorBlockJob before it's ever used for anything, so this seems
unnecessary.
Oh, I see. I refactored the code (my original attempt swapped the sleep
and the qemuMonitorBlockJob, so priming the variable was necessary to
avoid an early exit). I'll remove the dead assignment.
> + while (1) {
> + /* Poll every 50ms */
> + struct timespec ts = { .tv_sec = 0,
> + .tv_nsec = 50 * 1000 * 1000ull };
This timespec could just as well be static const, couldn't it? No big
deal one way or the other, though.
Sure, since it isn't going to change. Here, I was just copying the
polling loop we had elsewhere in qemu_migration.c.
ACK once the items above are addressed (either by explaining, changing
code, or telling me why I'm wrong)
I'll add in the capability check for v4 (since I already have to respin
the series for other reasons).
--
Eric Blake eblake(a)redhat.com +1-919-301-3266
Libvirt virtualization library
http://libvirt.org