
On Thu, Oct 06, 2016 at 09:25:26AM -0500, Eric Blake wrote:
On 10/06/2016 06:34 AM, Peter Krempa wrote:
[...]
We expose the state of the copy job in the XML and forward the READY event from qemu to the users.
I was not aware of that when I was chatting on IRC yesterday; that's useful to know, because virDomainGetBlockJobInfo() is NOT exposing that information at the moment.
That is what this RFC was asking to consider -- whether an API [I think it has to be a new one] should report it.
The documentation suggests that block jobs should listen to the events and act accordingly only after receiving the event.
Yes, but the documentation ALSO states that waiting for cur==end is SUPPOSED to work. And it doesn't.
Yes.
libvirt finds cur==end AND sends a pivot request, all in the window before QEMU would have sent the "ready": true field [emitted as part of the
This is not true. Libvirt checks that the mirror is actually ready. It's done by the commit you've mentioned above.
In other words, Nova sees cur==end, and requests the pivot, but libvirt is rejecting Nova's request because 'ready' is not true yet; and Nova then gives up rather than trying again.
Indeed ^ (I made this correction in my other response.)
QMP `query-block-jobs` command's response, indicating that the job has actually completed], however the pivot request fails because it requires "ready": true.
The problem is that you are polling the block job info which correctly reports that all data was properly copied and you are inferring the block job state from that data.
But the problem here is that qemu is NOT accurately reporting data - it is reporting cur==end with the promise that they are only equal if the job is stable, WHEN THE JOB IS NOT STABLE.
That's precisely the source of the confusion for Nova here.
I'm against deliberately reporting false data in the block info structure.
We are NOT falsifying any information, any more than we are falsifying information by changing cur/end to 0/1 when ready:false and qemu reported 0/0. (see commit 988218ca).
Indeed, it seems inconsistent to allow it in one case (like the above commit ID) to adjust (& _not_ falsify, as you accurately point out) libvirt reporting, but not the other case (cur==end, "ready": false case when cur != 0).
The application should register handlers for the block job events and act only if it receives such event. Additionally you may want to check that the state is correct in the XML. The current block job information structure can't be extended unfortunately.
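
To make the "check the state in the XML" suggestion concrete, here is a minimal sketch in Python. It assumes the domain XML carries a <mirror> sub-element under the disk with a ready='yes' attribute once the copy job has reached the ready phase. The helper name mirror_ready, the sample XML, and the file paths are made up for illustration; in a real client the XML string would come from something like dom.XMLDesc(0).

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down domain XML for illustration only.
SAMPLE_XML = """
<domain type='kvm'>
  <devices>
    <disk type='file' device='disk'>
      <source file='/var/lib/images/base.qcow2'/>
      <mirror type='file' file='/var/lib/images/copy.qcow2' job='copy' ready='yes'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='file' device='disk'>
      <source file='/var/lib/images/data.qcow2'/>
      <mirror type='file' file='/var/lib/images/data-copy.qcow2' job='copy'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""


def mirror_ready(domain_xml: str, target_dev: str) -> bool:
    """Return True only if the disk with the given target dev has a
    <mirror> element whose 'ready' attribute is 'yes'."""
    root = ET.fromstring(domain_xml)
    for disk in root.findall("./devices/disk"):
        target = disk.find("target")
        if target is None or target.get("dev") != target_dev:
            continue
        mirror = disk.find("mirror")
        # No <mirror> element, or one without ready='yes', means the
        # job is absent or not yet safe to pivot.
        return mirror is not None and mirror.get("ready") == "yes"
    return False
```

With the sample above, mirror_ready(SAMPLE_XML, "vda") is true while mirror_ready(SAMPLE_XML, "vdb") is false, so a caller would request the pivot only for vda. This is a cross-check, not a replacement for the block job events.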
Yes, changing Nova to use event handlers is a good idea. But I'm ALSO in favor of fixing libvirt to work around the qemu bug, by intentionally munging the output to state cur<end (even if qemu reported cur==end) if qemu reports ready:false.
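
To illustrate the adjustment being argued for, here is a rough sketch in Python rather than libvirt's actual C code. The function name adjust_progress is hypothetical, and reporting cur as end - 1 is just one possible way to guarantee cur<end; the 0/0 to 0/1 case mirrors the behaviour described for commit 988218ca earlier in this thread.

```python
from typing import Tuple


def adjust_progress(cur: int, end: int, ready: bool) -> Tuple[int, int]:
    """Return the (cur, end) pair to report to the caller.

    While the job is not ready, never report cur == end, so that a
    client polling for cur == end cannot pivot too early.
    """
    if ready:
        # The job is stable: report what qemu said.
        return cur, end
    if cur == 0 and end == 0:
        # Existing case discussed above: 0/0 is reported as 0/1.
        return 0, 1
    if cur == end:
        # Proposed case (assumption: subtracting one from cur is an
        # arbitrary choice, anything that keeps cur < end would do).
        return cur - 1, end
    return cur, end
```

For example, adjust_progress(100, 100, False) yields (99, 100), whereas adjust_progress(100, 100, True) leaves the values at (100, 100).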
Given the above, I've re-opened the bug here: https://bugzilla.redhat.com/show_bug.cgi?id=1382165#c3

--
/kashyap