On Thu, Sep 01, 2016 at 02:11:34PM -0500, Eric Blake wrote:
On 09/01/2016 08:57 AM, Kashyap Chamarthy wrote:
> So, I'm trying to understand how libvirt reports the "cur" and
"end"
> values. I've read the virDomainBlockJobInfo() struct, it wasn't crystal
> clear. It states:
>
> /*
> * The following fields provide an indication of block job progress. @cur
> * indicates the current position and will be between 0 and @end. @end is
> * the final cursor position for this operation and represents completion.
> * To approximate progress, divide @cur by @end.
> */
Libvirt is (more or less) reporting numbers from qemu. What's more,
@end need not be the same between calls; qemu is free to change the end
value as it comes up with more work to do (or sees that less work
remains than initially estimated), all that REALLY matters is that the
ratio between the two numbers converges, and is < 1 while busy, and == 1
when complete. Except that there are cases in qemu where a block job
really has 0 work to do.
Prior to qemu exposing the "ready":true/false flag, libvirt had to guess
whether equal numbers really meant done, or if it merely meant nearly
done. But now that we have "ready", I think the sanest course of action
for libvirt is to fudge the numbers from qemu. After all, we've already
documented (both in libvirt and in qemu) that @end is not fixed, so much
as a moving target (it just doesn't move very much on operations where
we have a good grasp on how much work remains from the start, like a
deep copy; but is more prone to move on operations like live commit that
are influenced by how active the guest is at writing data that we are
attempting to commit at the same time).
[...]
> [kashyap]
>
> So, if the job hasn't started yet, what should libvirt report?
>
> [mprivozn]
>
> That's the question. We can't change the [virDomainBlockJobInfo]
> struct (otherwise we won't be ABI compatible), so we can't really
> add a bolean there 'bool job_started'.
>
> Or we can introduce new "job type" which wouldn't really be a job
> type, but we will fill status.job with it to say explicitly job
> hasn't started yet
>
>
> So, any other thoughts here, on how to proceed here?
My preference would be:
If qemu doesn't report anything (because the job is not started yet),
then libvirt should report cur=0, end=1 (the job still has 100% to go).
If qemu reports 0/0 and "done":false, then libvirt should report cur=0,
end=1 (that is, we fudge the end to be larger, because the job is not
done yet).
If qemu reports 0/0 and "done":true (because the job was really a
no-op), then libvirt should report cur=1, end=1 (the job is 100% complete).
If qemu reports 0/0 and lacks "done" (older qemu), then libvirt just has
to guess. I'm not sure which guess is most appropriate; maybe libvirt
itself will have to set up a timer and report 0/1 the first time, and
only report 1/1 after a minimum time has elapsed, to make sure qemu has
had a chance to do something about the job. Or maybe we don't worry
about it, and just have libvirt report 0/0 because we really don't know
any better.
If qemu reports a/b, where a < b and b > 0, use those numbers as is. We
don't even have to check "done".
If qemu reports a/a, where a > 0, then also check "done". If
"done":false is present, report a-1/a (the job is not quite done); if
"done" is absent or "done":true is present, report a/a (the job is
done).
[...]
Excellent, thanks for the detailed response, Eric. That clarifies a
lot.
I've filed this bug, and captured the discussion from this thread.
https://bugzilla.redhat.com/show_bug.cgi?id=1372613 -- Improve live
block device job status reporting via virDomainBlockJobInfo()
* * *
Michal has pointed to these two fixes related to the above bug, in his
branch:
https://github.com/zippy2/libvirt/commit/47dcb46 --
qemuDomainGetBlockJobInfo: Move info translation into separate func
https://github.com/zippy2/libvirt/commit/25557dd --
virDomainGetBlockJobInfo: Fix corner case when qemu reports no info
I don't have a proper reproducer to test the corner case. But I just
tested the regular "deep copy" test case, with the above fixes
$ git describe
v2.1.0-231-g25557dd
$ sudo ./run tools/virsh start cvm1
$ sudo ./run tools/virsh dumpxml \
--inactive cvm1 > /var/tmp/cvm1.xml
$ sudo ./run tools/virsh undefine cvm1
Domain cvm1 has been undefined
$ sudo ./run tools/virsh blockcopy cvm1 \
vda /export/copy-cirrvm1.qcow2 --wait --verbose
Block Copy: [100 %]
Now in mirroring phase
$ sudo ./run tools/virsh blockjob cvm1 vda --raw
type=Block Copy
bandwidth=0
cur=41126400
end=41126400
$ sudo ./run tools/virsh blockjob cvm1 vda --abort
--
/kashyap