I was thinking since a migration won't finish in 50 ms, we are going to do
at least one sleep anyway. So, I just moved the sleep around.
I agree that the ideal approach is to wait for migration completion event
notification from Qemu, rather than polling every 50 ms.
-Xing
On Fri, Apr 10, 2015 at 2:02 AM, Jiri Denemark <jdenemar(a)redhat.com> wrote:
On Fri, Apr 10, 2015 at 09:49:43 +0200, Michal Privoznik wrote:
> On 10.04.2015 00:32, Xing Lin wrote:
> > Hi,
> >
> > It seems that I found a bug in the qemuMigrationWaitForCompletion(). It
> > will do a sleep of 50 ms, even when qemuMigrationUpdateJobStatus()
detects
> > that a migration has completed.
> >
> >
> > Here is the original implementation of the while loop.
> >
> >
> > while (jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED) {
> > /* Poll every 50ms for progress & to allow cancellation */
> > struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 *
1000ull
> > };
> >
> > if (qemuMigrationUpdateJobStatus(driver, vm, job, asyncJob) ==
-1)
> > break;
> >
> > /* cancel migration if disk I/O error is emitted while
migrating */
> > if (abort_on_error &&
> > virDomainObjGetState(vm, &pauseReason) ==
VIR_DOMAIN_PAUSED &&
> > pauseReason == VIR_DOMAIN_PAUSED_IOERROR) {
> > virReportError(VIR_ERR_OPERATION_FAILED,
> > _("%s: %s"), job, _("failed due to
I/O
error"));
> > break;
> > }
> >
> > if (dconn && virConnectIsAlive(dconn) <= 0) {
> > virReportError(VIR_ERR_OPERATION_FAILED, "%s",
> > _("Lost connection to destination
host"));
> > break;
> > }
> >
> > virObjectUnlock(vm);
> >
> > nanosleep(&ts, NULL);
> >
> > virObjectLock(vm);
> > }
> >
> >
> > Even when qemuMigrationUpdateJobStatus() detects a migration has
completed,
> > the nanosleep() will be called again. When the while condition is
checked,
> > then the program breaks from the while loop. This introduces an
unnecessary
> > sleep of 50 ms which can be avoided by doing a sleep first. The code
should
> > likely be structured as following.
> >
> > while (jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED) {
> > /* Poll every 50ms for progress & to allow cancellation */
> > struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 *
1000ull
> > };
> >
> > // do sleep here.
> > virObjectUnlock(vm);
> >
> > nanosleep(&ts, NULL);
> >
> > virObjectLock(vm);
>
> Yeah, I guess it's very unlikely that migration will complete within
> first 50ms after issuing the command on the monitor (in which case we
> would again wait pointlessly). The other approach would be to leave the
> sleep() where it is, but enclose it in if (jobInfo->type ==
> VIR_DOMAIN_JOB_UNBOUNDED) {}. I have no preference over these two
> versions though. So ACK to the patch of yours. However, I'll postpone
> merging the patch for a while and let others express their feelings.
Or change to loop to while (1) and add a break before the sleep if
jobInfo->type is not VIR_DOMAIN_JOB_UNBOUNDED.
Personally, I don't mind either way. The ultimate solution is to avoid
any sleeps by waiting for an event. However, Juan did not implement the
even in QEMU yet. Once he does that, I will make patches for libvirt
that will use that instead of polling every 50ms.
Jirka