Background
----------
For QEMU block device jobs, the "ready" boolean field (part of QMP
`query-block-jobs`) was introduced in commit ef6dbf1 (available in QEMU
v2.2.0 or above):
http://git.qemu.org/?p=qemu.git;a=commitdiff;h=ef6dbf1e4 --
blockjob: Add "ready" field
"When a block job signals readiness, this is currently reported only
through QMP. If qemu wants to use block jobs for internal tasks,
there needs to be another way to correctly detect when a block job
may be completed.
For this reason, introduce a bool "ready" which is set when the
block job may be completed."
And, libvirt was fixed to use the above field in this commit (available
in libvirt v1.2.18 or above):
http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=eae5924 -- qemu:
Update state of block job to READY only if it actually is ready
RFC
---
Currently libvirt block APIs (& consequently higher-level applications
like Nova which use these APIs) rely on polling for job completion via
virDomainGetBlockJobInfo(), which uses QMP `query-block-jobs`, and
wait for QEMU to report "offset" == "len", which translates to libvirt
"cur" == "end". Based on this, libvirt can take an action (whether to
gracefully abort, or pivot to the copy in case of a COPY job).
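To make the polling pattern concrete, here is a minimal sketch in
Python. It is not libvirt or Nova code: `get_info` is a hypothetical
callable standing in for a virDomainGetBlockJobInfo() wrapper, assumed
to return a mapping with "cur" and "end" keys, or an empty mapping when
no job is active. The `interval` and `max_polls` parameters are also
assumptions of this sketch.

```python
import time
from typing import Callable, Mapping


def wait_until_cur_equals_end(
    get_info: Callable[[], Mapping[str, int]],
    interval: float = 0.5,
    max_polls: int = 100,
) -> bool:
    """Poll until the job reports cur == end.

    Returns True once cur == end is observed, False if the job
    disappears or max_polls is exhausted. Note that cur == end is
    the only signal used here; the "ready" field is never consulted.
    """
    for _ in range(max_polls):
        info = get_info()
        if not info:
            # Assumption: an empty mapping means "no active job".
            return False
        if info["cur"] == info["end"]:
            return True
        time.sleep(interval)
    return False
```

A caller would typically issue the abort or pivot request as soon as
this returns True, which is the step that depends on cur == end being a
trustworthy signal.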
Since QEMU reports the "ready": true field (followed by a
BLOCK_JOB_READY QMP event), it would be helpful if libvirt exposed this
via an API, so upper layers could use that instead of polling.
Problem scenario
----------------
When virDomainBlockRebase() is invoked to start a copy job, aborting
that copy operation with virDomainBlockJobAbort() + flag
VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT can hit a race condition (due to the
way virDomainGetBlockJobInfo() reports the job status) in which the
pivot operation fails.
Race condition window
~~~~~~~~~~~~~~~~~~~~~
libvirt finds cur==end and sends a pivot request, all in the window
before QEMU has reported "ready": true [emitted as part of the QMP
`query-block-jobs` command's response, indicating that the job is
actually ready to be completed]. The pivot request then fails, because
it requires "ready": true.
So Eric Blake suggests:
QEMU 2.0 or 1.x probably had a synchronous setup where you could
never observe cur==end on a non-ready job. But I don't remember
enough history to point to when QEMU switched jobs to be a bit more
asynchronous. Maybe there was no qemu regression - maybe it was
BECAUSE of other block-job additions in 2.2 that offset==len was no
longer reliable. I don't know that for sure.
But what it DOES sound like is that IF qemu reports "ready": false,
offset==len is not reliable, and libvirt should be taught to fudge
that.
And hopefully, QEMU too old to report "ready" at all is reliable
with regards to offset==len, because that's all we have to go by.
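One possible reading of the "fudge" suggested above, sketched in Python
rather than as actual libvirt code: when QEMU reports "ready": false
together with offset == len, report cur as one less than end, so that
callers polling for cur == end do not attempt a pivot too early. The
function name and the choice of "end - 1" are assumptions made for
illustration; the input is a single job entry shaped like a
`query-block-jobs` result, using only the "offset", "len" and "ready"
fields named in this mail.

```python
from typing import Mapping, Tuple


def translate_job_progress(job: Mapping[str, object]) -> Tuple[int, int]:
    """Map QEMU's offset/len to libvirt-style (cur, end).

    If QEMU explicitly says the job is not ready, never report
    cur == end. If "ready" is absent (a QEMU too old to report it),
    offset == len is trusted as-is, because nothing else is available.
    """
    cur = int(job["offset"])
    end = int(job["len"])
    ready = job.get("ready")  # None when the field is not reported

    if ready is False and cur == end and end > 0:
        # Hypothetical adjustment: hold progress just short of done.
        cur = end - 1
    return cur, end
```

With this adjustment, a job reporting offset == len == 100 and
"ready": false would be seen by callers as cur=99, end=100, and the
same job with "ready": true (or with no "ready" field) as cur=100,
end=100.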
For now, I filed this upstream libvirt bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1382165 --
virDomainGetBlockJobInfo: Adjust job reporting based on QEMU stats &
the "ready" field of `query-block-jobs`
However, exposing the "ready" boolean from QMP `query-block-jobs` might
be worth considering.
--
/kashyap