Some management applications still poll for block job completion. Once the
job is gone, they infer its success or failure by inspecting the domain XML.
With legacy block job processing this does not always work.

The issue lies in how libvirt processes events. If no other thread is waiting
for the block job event, event processing is offloaded to the worker thread.
If virDomainGetBlockJobInfo is called at that point, it returns 0 because in
the legacy scheme the block job is already dismissed, but the backing chain is
not yet updated because the event has yet to be processed in the worker
thread. The management application then checks the backing chain right after
the API call returns and detects an error.

This happens quite often under load, presumably because there is only one
worker thread for all domains.

Event delivery in qemu is synchronous and the block job completed event is
sent in the job finalize step, so if the block job is absent the event has
already been delivered and we just need to process it.

Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy(a)virtuozzo.com>
---
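For reviewers, the race described above can be sketched with a small
stand-alone model. This is only an illustration under stated assumptions:
every name below (model_job, model_qemu_finish, model_process_event,
model_get_block_job_info) is hypothetical and does not exist in libvirt, and
the worker thread is reduced to a flag that nobody has consumed yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of one legacy block job. */
struct model_job {
    bool in_qemu;        /* job is still reported by the (modelled) monitor */
    bool event_pending;  /* completion event delivered but not processed yet */
    bool chain_updated;  /* backing chain in the domain XML is up to date */
};

/* Models qemu finishing the job: the completion event is emitted in the
 * finalize step, so it is already pending when the job disappears. */
void model_qemu_finish(struct model_job *job)
{
    assert(job != NULL);
    job->event_pending = true;
    job->in_qemu = false;
}

/* Models event processing, normally done later by the worker thread. */
void model_process_event(struct model_job *job)
{
    assert(job != NULL);
    if (job->event_pending) {
        job->event_pending = false;
        job->chain_updated = true;
    }
}

/* Models the API call: 1 if the job is still active, 0 if it is gone.
 * With sync_update set, a pending event is processed before returning,
 * which is the behaviour this patch aims for in the legacy scheme. */
int model_get_block_job_info(struct model_job *job, bool sync_update)
{
    assert(job != NULL);
    if (job->in_qemu)
        return 1;
    if (sync_update)
        model_process_event(job);
    return 0;
}
```

Without sync_update a poller sees 0 while chain_updated is still false, which
is the spurious failure; with it the chain is consistent by the time the call
returns.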
src/qemu/qemu_driver.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 05917eb..25f66df 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -14740,8 +14740,15 @@ qemuDomainGetBlockJobInfo(virDomainPtr dom,
     ret = qemuMonitorGetBlockJobInfo(qemuDomainGetMonitor(vm), job->name,
                                      &rawInfo);
     if (qemuDomainObjExitMonitor(driver, vm) < 0)
         ret = -1;
-    if (ret <= 0)
+    if (ret < 0)
+        goto endjob;
+    if (ret == 0) {
+        qemuDomainObjPrivatePtr priv = vm->privateData;
+
+        if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_BLOCKDEV))
+            qemuBlockJobUpdate(vm, job, QEMU_ASYNC_JOB_NONE);
         goto endjob;
+    }
     if (qemuBlockJobInfoTranslate(&rawInfo, info, disk,
                                   flags & VIR_DOMAIN_BLOCK_JOB_INFO_BANDWIDTH_BYTES) < 0) {
--
1.8.3.1