On 04/13/2012 01:29 AM, Michal Privoznik wrote:
On 12.04.2012 21:59, Eric Blake wrote:
> I'm tired of shell-scripting to wait for completion of a block pull,
> when virsh can be taught to do the same. I couldn't quite reuse
> vshWatchJob, as this is not a case of a long-running command where
> a second thread must be used to probe job status (at least, not unless
> I make virsh start doing blocking waits for an event to fire), but it
> served as inspiration for my simpler single-threaded loop. There is
> up to a half-second delay between sending SIGINT and the job being
> aborted, but I didn't think it worth the complexity of a second thread
> and use of poll() just to minimize that delay.
>
> @@ -7295,8 +7295,9 @@ repoll:
> }
>
> GETTIMEOFDAY(&curr);
> - if ( timeout && ((int)(curr.tv_sec - start.tv_sec) * 1000 + \
> - (int)(curr.tv_usec - start.tv_usec) / 1000) > timeout *
1000 ) {
> + if (timeout && (((int)(curr.tv_sec - start.tv_sec) * 1000 +
> + (int)(curr.tv_usec - start.tv_usec) / 1000) >
> + timeout * 1000)) {
> /* suspend the domain when migration timeouts. */
> vshDebug(ctl, VSH_ERR_DEBUG, "%s timeout", label);
> if (timeout_func)
These two chunks are rather cosmetic but I'm okay with having them in
this patch not a separate one.
I noticed them because I was copying and pasting from them. Depending
on whether I spot other cleanups for my v2, I may split the cleanups
into a separate patch.
> + while (blocking) {
> + virDomainBlockJobInfo info;
> + int result = virDomainGetBlockJobInfo(dom, path, &info, 0);
> +
> + if (result < 0) {
> + vshError(ctl, _("failed to query job for disk %s"), path);
> + goto cleanup;
> + }
> + if (result == 0)
> + break;
> +
> + if (verbose)
> + print_job_progress(_("Block Pull"), info.end - info.cur,
info.end);
> +
> + GETTIMEOFDAY(&curr);
> + if (intCaught || (timeout &&
> + (((int)(curr.tv_sec - start.tv_sec) * 1000 +
> + (int)(curr.tv_usec - start.tv_usec) / 1000) >
> + timeout * 1000))) {
> + vshDebug(ctl, VSH_ERR_DEBUG,
> + intCaught ? "interrupted" : "timeout");
> + intCaught = 0;
> + timeout = 0;
> + quit = true;
> + if (virDomainBlockJobAbort(dom, path, 0) < 0) {
> + vshError(ctl, _("failed to abort job for disk %s"),
path);
> + goto cleanup;
> + }
Don't we need blocking = false; here? Otherwise we may spin another
rounds and issue GetBlockJobInfo API over and over again. But on the
other hand, I can imagine this is desired behavior. If BlockJobAbort
succeeds and returns 0 (we are enforcing synchronous operation here) it
should mean that block job has really been aborted. Therefore we may
issue GetBlockJobInfo to ascertain ourselves.
Hmm, interesting question, especially in light of the new async cancel
flag. I'm not sure how long a cancellation can take (it has to flush
all pending I/O, which may take a while), but you are correct that
passing in 0 for the abort flags means to wait until the cancel is done;
we may be waiting up to half a second to issue the abort from the time
you pressed Ctrl-C, but worse, we are now waiting for the entire
duration of the BlockJobAbort call, which might possibly be in the range
of seconds, and that can feel awfully slow to a user trying to hit
Ctrl-C to regain control of their terminal as fast as possible.
I think I need to add an --async flag that controls whether we return
control to the user as fast as possible after Ctrl-C (and fails if we
are talking to a too-old libvirtd), or omit the flag so that we
guarantee to wait until the job is cancelled, even if it takes a while
after the Ctrl-C. I'll have to respin this patch, and since I want to
do the same for my new 'virsh blockcopy' patch, I'll fold it into my
block storage migration series.
--
Eric Blake eblake(a)redhat.com +1-919-301-3266
Libvirt virtualization library
http://libvirt.org