Re: [libvirt] [PATCH v2] virsh: improve waiting for block job readiness

25 Jan 2016

Since this has been sitting for a bit, I'll poke the bear, but also CC
Peter since he reviewed V1 and had some specific concerns...


On 01/06/2016 10:03 PM, Michael Chapman wrote:
...
After a block job hits 100%, we only need to apply a timeout waiting for
a block job event if exactly one of the BLOCK_JOB or BLOCK_JOB_2
callbacks were able to be registered.
If neither callback could be registered, there's clearly no need for a
timeout.
If both callbacks were registered, then we're guaranteed to eventually
get one of the events. The path being used by virsh must be exactly the
source path or target device in the domain's disk definition, and these
are the respective strings sent back in these two events.
Signed-off-by: Michael Chapman <mike@very.puzzling.org>
---
v1 discussion at:
http://www.redhat.com/archives/libvir-list/2016-January/msg00031.html
Changes since v1:
- Fixed bugs in cb_id/cb_id2 conditionals
- Consistently used break to exit loop, dropped cleanup label
- Clarified the logic and behaviour in comments
- Improved commit message
tools/virsh-domain.c | 60 ++++++++++++++++++++++++++++------------------------
 1 file changed, 32 insertions(+), 28 deletions(-)

diff --git a/tools/virsh-domain.c b/tools/virsh-domain.c
index edbbc34..853416c 100644
--- a/tools/virsh-domain.c
+++ b/tools/virsh-domain.c
@@ -1875,14 +1875,17 @@ virshBlockJobWaitFree(virshBlockJobWaitDataPtr data)
  * virshBlockJobWait:
  * @data: private data initialized by virshBlockJobWaitInit
  *
- * Waits for the block job to complete. This function prefers to get an event
- * from libvirt but still has fallback means if the device name can't be matched
+ * Waits for the block job to complete. This function prefers to wait for a
+ * matching VIR_DOMAIN_EVENT_ID_BLOCK_JOB or VIR_DOMAIN_EVENT_ID_BLOCK_JOB_2
+ * event from libvirt, however it has a fallback mode should either of these
nit: It's usually "...; however, ..."
...
+ * events not be available.
  *
- * This function returns values from the virConnectDomainEventBlockJobStatus enum
- * or -1 in case of a internal error. Fallback states if a block job vanishes
- * without triggering the event is VIR_DOMAIN_BLOCK_JOB_COMPLETED. For two phase
- * jobs after the retry count for waiting for the event expires is
- * VIR_DOMAIN_BLOCK_JOB_READY.
+ * This function returns values from the virConnectDomainEventBlockJobStatus
+ * enum or -1 in case of a internal error.
nit: "...of an internal..."  (usage of an instead of a)
...
+ *
+ * If the fallback mode is activated the returned event is
+ * VIR_DOMAIN_BLOCK_JOB_COMPLETED if the block job vanishes, or
nit: "...block job vanishes or..."  (no need for a comma)
...
+ * VIR_DOMAIN_BLOCK_JOB_READY if the block job reaches 100%.
  */
 static int
 virshBlockJobWait(virshBlockJobWaitDataPtr data)
@@ -1932,28 +1935,32 @@ virshBlockJobWait(virshBlockJobWaitDataPtr data)
if (result < 0) {
             vshError(data->ctl, _("failed to query job for disk %s"), data->dev);
-            goto cleanup;
+            break;
Usage of 'goto' is a libvirt coding convention, even though perhaps not
a convention of all programming styles. I think use of goto here should
be kept (here and other places that follow in this function).

If you feel strongly about usage of break vs. goto, then this should be
a two patch series - the first to convert "goto" usage into "break"
usage and the second to resolve the issue of continuing with the timeout
logic even if both callbacks could not be registered.
...
}
-        /* if we've got an event for the device we are waiting for we can end
-         * the waiting loop */
+        /* If either callback could be registered and we've got an event, we can
+         * can end the waiting loop */
         if ((data->cb_id >= 0 || data->cb_id2 >= 0) && data->status != -1) {
             ret = data->status;
-            goto cleanup;
+            break;
         }
-        /* since virsh can't guarantee that the path provided by the user will
-         * later be matched in the event we will need to keep the fallback
-         * approach and claim success if the block job finishes or vanishes. */
-        if (result == 0)
-            break;
+        /* Fallback behaviour is only needed if one or both callbacks could not
+         * be registered */
+        if (data->cb_id < 0 || data->cb_id2 < 0) {
+            /* If the block job vanishes, synthesize a COMPLETED event */
+            if (result == 0) {
+                ret = VIR_DOMAIN_BLOCK_JOB_COMPLETED;
+                break;
+            }
-        /* for two-phase jobs we will try to wait in the synchronized phase
-         * for event arrival since 100% completion doesn't necessarily mean that
-         * the block job has finished and can be terminated with success */
-        if (info.end == info.cur && --retries == 0) {
-            ret = VIR_DOMAIN_BLOCK_JOB_READY;
-            goto cleanup;
+            /* If the block job hits 100%, wait a little while for a possible
+             * event from libvirt, else synthesize our own READY event */
"If the block job hits 100%, wait a little while for a possible event
from libvirt unless both callbacks could not be registered in order to
synthesize our own READY event"
...
+            if (info.end == info.cur &&
+                ((data->cb_id < 0 && data->cb_id2 < 0) || --retries == 0)) {
+                ret = VIR_DOMAIN_BLOCK_JOB_READY;
+                break;
+            }
         }
if (data->verbose)
Existing... Not part of this change per se; however, if we're printing
and "info.end == info.cur", then we'll already printed 100% 'retries'
times if one of the callbacks does exist...


John
...
@@ -1962,26 +1969,23 @@ virshBlockJobWait(virshBlockJobWaitDataPtr data)
if (data->timeout && virTimeMillisNow(&curr) < 0) {
             vshSaveLibvirtError();
-            goto cleanup;
+            break;
         }
if (intCaught || (data->timeout && (curr - start > data->timeout))) {
             if (virDomainBlockJobAbort(data->dom, data->dev, abort_flags) < 0) {
                 vshError(data->ctl, _("failed to abort job for disk '%s'"),
                          data->dev);
-                goto cleanup;
+                break;
             }
ret = VIR_DOMAIN_BLOCK_JOB_CANCELED;
-            goto cleanup;
+            break;
         }
usleep(500 * 1000);
     }
-    ret = VIR_DOMAIN_BLOCK_JOB_COMPLETED;
-
- cleanup:
     /* print 100% completed */
     if (data->verbose &&
         (ret == VIR_DOMAIN_BLOCK_JOB_COMPLETED ||


    

Re: [libvirt] [PATCH v2] virsh: improve waiting for block job readiness

John Ferlan