On Mon, 25 Jan 2016, John Ferlan wrote:
Since this has been sitting for a bit, I'll poke the bear, but
also CC
Peter since he reviewed V1 and had some specific concerns...
Thanks for picking up this review.
The concern before was how the code would handle the user passing a
different, equivalent path when querying the block job. I'm pretty sure we
came to the conclusion that that wouldn't ever have worked: the block job
wouldn't be able to be identified if the path was different.
On 01/06/2016 10:03 PM, Michael Chapman wrote:
> After a block job hits 100%, we only need to apply a timeout waiting for
> a block job event if exactly one of the BLOCK_JOB or BLOCK_JOB_2
> callbacks were able to be registered.
>
> If neither callback could be registered, there's clearly no need for a
> timeout.
>
> If both callbacks were registered, then we're guaranteed to eventually
> get one of the events. The path being used by virsh must be exactly the
> source path or target device in the domain's disk definition, and these
> are the respective strings sent back in these two events.
>
> Signed-off-by: Michael Chapman <mike(a)very.puzzling.org>
> ---
>
> v1 discussion at:
>
http://www.redhat.com/archives/libvir-list/2016-January/msg00031.html
>
> Changes since v1:
> - Fixed bugs in cb_id/cb_id2 conditionals
> - Consistently used break to exit loop, dropped cleanup label
> - Clarified the logic and behaviour in comments
> - Improved commit message
>
> tools/virsh-domain.c | 60 ++++++++++++++++++++++++++++------------------------
> 1 file changed, 32 insertions(+), 28 deletions(-)
>
> diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
> index edbbc34..853416c 100644
> --- a/tools/virsh-domain.c
> +++ b/tools/virsh-domain.c
> @@ -1875,14 +1875,17 @@ virshBlockJobWaitFree(virshBlockJobWaitDataPtr data)
> * virshBlockJobWait:
> * @data: private data initialized by virshBlockJobWaitInit
> *
> - * Waits for the block job to complete. This function prefers to get an event
> - * from libvirt but still has fallback means if the device name can't be
matched
> + * Waits for the block job to complete. This function prefers to wait for a
> + * matching VIR_DOMAIN_EVENT_ID_BLOCK_JOB or VIR_DOMAIN_EVENT_ID_BLOCK_JOB_2
> + * event from libvirt, however it has a fallback mode should either of these
nit: It's usually "...; however, ..."
> + * events not be available.
> *
> - * This function returns values from the virConnectDomainEventBlockJobStatus enum
> - * or -1 in case of a internal error. Fallback states if a block job vanishes
> - * without triggering the event is VIR_DOMAIN_BLOCK_JOB_COMPLETED. For two phase
> - * jobs after the retry count for waiting for the event expires is
> - * VIR_DOMAIN_BLOCK_JOB_READY.
> + * This function returns values from the virConnectDomainEventBlockJobStatus
> + * enum or -1 in case of a internal error.
nit: "...of an internal..." (usage of an instead of a)
> + *
> + * If the fallback mode is activated the returned event is
> + * VIR_DOMAIN_BLOCK_JOB_COMPLETED if the block job vanishes, or
nit: "...block job vanishes or..." (no need for a comma)
> + * VIR_DOMAIN_BLOCK_JOB_READY if the block job reaches 100%.
> */
> static int
> virshBlockJobWait(virshBlockJobWaitDataPtr data)
> @@ -1932,28 +1935,32 @@ virshBlockJobWait(virshBlockJobWaitDataPtr data)
>
> if (result < 0) {
> vshError(data->ctl, _("failed to query job for disk %s"),
data->dev);
> - goto cleanup;
> + break;
Usage of 'goto' is a libvirt coding convention, even though perhaps not
a convention of all programming styles. I think use of goto here should
be kept (here and other places that follow in this function).
If you feel strongly about usage of break vs. goto, then this should be
a two patch series - the first to convert "goto" usage into "break"
usage and the second to resolve the issue of continuing with the timeout
logic even if both callbacks could not be registered.
I am inclined to change the gotos to breaks. My rationale is that the code
really has two different success cases, ret = READY and ret = COMPLETED,
but having only one of these handled by a normal break out of the loop
seems odd. With my changes, both success cases are handled in the same
way.
Perhaps just the error cases should jump to cleanup? That could mean the
label could be moved after the "print 100% completed" block (and the ret
tests could be dropped there).
> }
>
> - /* if we've got an event for the device we are waiting for we can end
> - * the waiting loop */
> + /* If either callback could be registered and we've got an event, we
can
> + * can end the waiting loop */
> if ((data->cb_id >= 0 || data->cb_id2 >= 0) &&
data->status != -1) {
> ret = data->status;
> - goto cleanup;
> + break;
> }
>
> - /* since virsh can't guarantee that the path provided by the user will
> - * later be matched in the event we will need to keep the fallback
> - * approach and claim success if the block job finishes or vanishes. */
> - if (result == 0)
> - break;
> + /* Fallback behaviour is only needed if one or both callbacks could not
> + * be registered */
> + if (data->cb_id < 0 || data->cb_id2 < 0) {
> + /* If the block job vanishes, synthesize a COMPLETED event */
> + if (result == 0) {
> + ret = VIR_DOMAIN_BLOCK_JOB_COMPLETED;
> + break;
> + }
>
> - /* for two-phase jobs we will try to wait in the synchronized phase
> - * for event arrival since 100% completion doesn't necessarily mean
that
> - * the block job has finished and can be terminated with success */
> - if (info.end == info.cur && --retries == 0) {
> - ret = VIR_DOMAIN_BLOCK_JOB_READY;
> - goto cleanup;
> + /* If the block job hits 100%, wait a little while for a possible
> + * event from libvirt, else synthesize our own READY event */
"If the block job hits 100%, wait a little while for a possible event
from libvirt unless both callbacks could not be registered in order to
synthesize our own READY event"
> + if (info.end == info.cur &&
> + ((data->cb_id < 0 && data->cb_id2 < 0) ||
--retries == 0)) {
> + ret = VIR_DOMAIN_BLOCK_JOB_READY;
> + break;
> + }
> }
>
> if (data->verbose)
Existing... Not part of this change per se; however, if we're printing
and "info.end == info.cur", then we'll already printed 100%
'retries'
times if one of the callbacks does exist...
Easy enough to change it so that it only calls virshPrintJobProgress if
progress has actually advanced. And yes, I'll do this in a separate patch
this time. :-)
John
> @@ -1962,26 +1969,23 @@ virshBlockJobWait(virshBlockJobWaitDataPtr data)
>
> if (data->timeout && virTimeMillisNow(&curr) < 0) {
> vshSaveLibvirtError();
> - goto cleanup;
> + break;
> }
>
> if (intCaught || (data->timeout && (curr - start >
data->timeout))) {
> if (virDomainBlockJobAbort(data->dom, data->dev, abort_flags) <
0) {
> vshError(data->ctl, _("failed to abort job for disk
'%s'"),
> data->dev);
> - goto cleanup;
> + break;
> }
>
> ret = VIR_DOMAIN_BLOCK_JOB_CANCELED;
> - goto cleanup;
> + break;
> }
>
> usleep(500 * 1000);
> }
>
> - ret = VIR_DOMAIN_BLOCK_JOB_COMPLETED;
> -
> - cleanup:
> /* print 100% completed */
> if (data->verbose &&
> (ret == VIR_DOMAIN_BLOCK_JOB_COMPLETED ||
>
- Michael