I was thinking since a migration won't finish in 50 ms, we are going to do at least one sleep anyway. So, I just moved the sleep around. 

I agree that the ideal approach is to wait for migration completion event notification from Qemu, rather than polling every 50 ms. 

-Xing

On Fri, Apr 10, 2015 at 2:02 AM, Jiri Denemark <jdenemar@redhat.com> wrote:
On Fri, Apr 10, 2015 at 09:49:43 +0200, Michal Privoznik wrote:
> On 10.04.2015 00:32, Xing Lin wrote:
> > Hi,
> >
> > It seems that I found a bug in the qemuMigrationWaitForCompletion(). It
> > will do a sleep of 50 ms, even when qemuMigrationUpdateJobStatus() detects
> > that a migration has completed.
> >
> >
> > Here is the original implementation of the while loop.
> >
> >
> >    while (jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED) {
> >         /* Poll every 50ms for progress & to allow cancellation */
> >         struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000ull
> > };
> >
> >         if (qemuMigrationUpdateJobStatus(driver, vm, job, asyncJob) == -1)
> >             break;
> >
> >         /* cancel migration if disk I/O error is emitted while migrating */
> >         if (abort_on_error &&
> >             virDomainObjGetState(vm, &pauseReason) == VIR_DOMAIN_PAUSED &&
> >             pauseReason == VIR_DOMAIN_PAUSED_IOERROR) {
> >             virReportError(VIR_ERR_OPERATION_FAILED,
> >                            _("%s: %s"), job, _("failed due to I/O error"));
> >             break;
> >         }
> >
> >         if (dconn && virConnectIsAlive(dconn) <= 0) {
> >             virReportError(VIR_ERR_OPERATION_FAILED, "%s",
> >                            _("Lost connection to destination host"));
> >             break;
> >         }
> >
> >         virObjectUnlock(vm);
> >
> >         nanosleep(&ts, NULL);
> >
> >         virObjectLock(vm);
> >     }
> >
> >
> > Even when qemuMigrationUpdateJobStatus() detects a migration has completed,
> > the nanosleep() will be called again. When the while condition is checked,
> > then the program breaks from the while loop. This introduces an unnecessary
> > sleep of 50 ms which can be avoided by doing a sleep first. The code should
> > likely be structured as following.
> >
> >    while (jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED) {
> >         /* Poll every 50ms for progress & to allow cancellation */
> >         struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000ull
> > };
> >
> >         // do sleep here.
> >         virObjectUnlock(vm);
> >
> >         nanosleep(&ts, NULL);
> >
> >         virObjectLock(vm);
>
> Yeah, I guess it's very unlikely that migration will complete within
> first 50ms after issuing the command on the monitor (in which case we
> would again wait pointlessly). The other approach would be to leave the
> sleep() where it is, but enclose it in if (jobInfo->type ==
> VIR_DOMAIN_JOB_UNBOUNDED) {}. I have no preference over these two
> versions though. So ACK to the patch of yours. However, I'll postpone
> merging the patch for a while and let others express their feelings.

Or change to loop to while (1) and add a break before the sleep if
jobInfo->type is not VIR_DOMAIN_JOB_UNBOUNDED.

Personally, I don't mind either way. The ultimate solution is to avoid
any sleeps by waiting for an event. However, Juan did not implement the
even in QEMU yet. Once he does that, I will make patches for libvirt
that will use that instead of polling every 50ms.

Jirka