[PATCH 0/7] qemu: Add workaround for apps polling virDomainGetBlockJobInfo with pre-blockdev qemu

This is a workaround for the issue that Nikolay tries to fix in: https://www.redhat.com/archives/libvir-list/2020-November/msg00669.html This one is a far more focused hack for apps which don't bother to update to events for block job completion checking. Peter Krempa (7): virDomainGetBlockJobInfo: Discourage polling for block job completion detection virDomainGetBlockJobInfo: Reword docs for fallback values qemuMonitorBlockJobInfo: Store 'ready' and 'ready_present' separately qemuBlockJobInfoTranslate: Use explicit comparison against 0 qemuDomainGetBlockJobInfo: Use qemuMonitorGetAllBlockJobInfo qemu: monitor: Remove unused qemuMonitorGetBlockJobInfo qemuDomainGetBlockJobInfo: Work stats for unfinished pre-blockdev blockjob src/libvirt-domain.c | 23 +++++++++++++++------ src/qemu/qemu_driver.c | 39 +++++++++++++++++++++++------------- src/qemu/qemu_monitor.c | 29 --------------------------- src/qemu/qemu_monitor.h | 7 ++----- src/qemu/qemu_monitor_json.c | 8 ++------ src/qemu/qemu_process.c | 4 ++-- 6 files changed, 48 insertions(+), 62 deletions(-) -- 2.28.0

Add a note saying that polling virDomainGetBlockJobInfo is not a good idea. Use events instead. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/libvirt-domain.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c index f5cd43ecea..5d3747fd39 100644 --- a/src/libvirt-domain.c +++ b/src/libvirt-domain.c @@ -9947,6 +9947,11 @@ virDomainBlockJobAbort(virDomainPtr dom, const char *disk, * and was no-op. In this case libvirt reports cur = 1 and end = 1. * Since 2.3.0. * + * Applications looking for a reliable and low-overhead way to determine whether + * a block job already finished or reached synchronised phase should register a + * handler for the VIR_DOMAIN_EVENT_ID_BLOCK_JOB_2 event instead of polling this + * API. + * * Note that the progress reported for blockjobs corresponding to a pull-mode * backup don't report progress of the backup but rather usage of temporary * space required for the backup. -- 2.28.0

Explicitly state that if 'end == 1' the data doesn't represent actual progress in most cases. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/libvirt-domain.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c index 5d3747fd39..5edc73ecad 100644 --- a/src/libvirt-domain.c +++ b/src/libvirt-domain.c @@ -9940,12 +9940,18 @@ virDomainBlockJobAbort(virDomainPtr dom, const char *disk, * can be found by calling virDomainGetXMLDesc() and inspecting * elements within //domain/devices/disk. * - * As a corner case underlying hypervisor may report cur == 0 and - * end == 0 when the block job hasn't been started yet. In this - * case libvirt reports cur = 0 and end = 1. However, hypervisor - * may return cur == 0 and end == 0 if the block job has finished - * and was no-op. In this case libvirt reports cur = 1 and end = 1. - * Since 2.3.0. + * In cases when libvirt can't determine actual progress of the block job from + * the underlying hypervisor due to corner cases such as the job wasn't yet + * fully initialized, or finalized and thus the progress can't be queried, + * libvirt reports 'cur = 0, end = 1'. + * + * For jobs requiring finalizing via qemuDomainBlockJobAbort() with + * VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT flag which reached synchronised phase, but + * were empty, or the progress can't be determined libvirt returns + * 'cur = 1, end = 1'. + * + * Users thus should not consider any data where 'end = 1' as absolute progress + * value. * * Applications looking for a reliable and low-overhead way to determine whether * a block job already finished or reached synchronised phase should register a -- 2.28.0

Don't make the logic confusing by representing the 3 options using an integer with negative values. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_driver.c | 9 +++++---- src/qemu/qemu_monitor.h | 3 ++- src/qemu/qemu_monitor_json.c | 8 ++------ src/qemu/qemu_process.c | 4 ++-- 4 files changed, 11 insertions(+), 13 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 4fd70ed300..800f98e474 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -14650,17 +14650,18 @@ qemuBlockJobInfoTranslate(qemuMonitorBlockJobInfoPtr rawInfo, * and end are zero, in which case qemu hasn't started the * job yet. */ if (!info->cur && !info->end) { - if (rawInfo->ready > 0) { - info->cur = info->end = 1; - } else if (!rawInfo->ready) { + if (rawInfo->ready_present) { info->end = 1; + if (rawInfo->ready) + info->cur = 1; } } /* If qemu reports that it's not ready yet don't make the job go to * cur == end as some apps wrote code polling this instead of waiting for * the ready event */ - if (rawInfo->ready == 0 && + if (rawInfo->ready_present && + !rawInfo->ready && info->cur == info->end && info->cur > 0) info->cur -= 1; diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 49be2d5412..a2f28f9492 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -1117,7 +1117,8 @@ struct _qemuMonitorBlockJobInfo { unsigned long long bandwidth; /* in bytes/s */ virDomainBlockJobCursor cur; virDomainBlockJobCursor end; - int ready; /* -1 if unknown, 0 if not ready, 1 if ready */ + bool ready_present; + bool ready; }; GHashTable *qemuMonitorGetAllBlockJobInfo(qemuMonitorPtr mon, diff --git a/src/qemu/qemu_monitor_json.c b/src/qemu/qemu_monitor_json.c index 4db00e284a..94e482cc97 100644 --- a/src/qemu/qemu_monitor_json.c +++ b/src/qemu/qemu_monitor_json.c @@ -5049,7 +5049,6 @@ qemuMonitorJSONParseBlockJobInfo(GHashTable *blockJobs, qemuMonitorBlockJobInfoPtr info = NULL; const char *device; const char *type; - bool ready; if (!(device = virJSONValueObjectGetString(entry, "device"))) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", @@ -5067,9 +5066,6 @@ qemuMonitorJSONParseBlockJobInfo(GHashTable *blockJobs, return -1; } - /* assume we don't know the state */ - info->ready = -1; - if (!(type = virJSONValueObjectGetString(entry, "type"))) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("entry was missing 'type'")); @@ -5104,8 +5100,8 @@ qemuMonitorJSONParseBlockJobInfo(GHashTable *blockJobs, return -1; } - if (virJSONValueObjectGetBoolean(entry, "ready", &ready) == 0) - info->ready = ready; + if (virJSONValueObjectGetBoolean(entry, "ready", &info->ready) == 0) + info->ready_present = true; return 0; } diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 9d83825190..cbd29d867b 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -8176,8 +8176,8 @@ qemuProcessRefreshLegacyBlockjob(void *payload, return -1; if (disk->mirror) { - if (info->ready == 1 || - (info->ready == -1 && info->end == info->cur)) { + if ((!info->ready_present && info->end == info->cur) || + info->ready) { disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_READY; job->state = VIR_DOMAIN_BLOCK_JOB_READY; } -- 2.28.0

On 12/4/20 4:07 PM, Peter Krempa wrote:
Don't make the logic confusing by representing the 3 options using an integer with negative values.
Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_driver.c | 9 +++++---- src/qemu/qemu_monitor.h | 3 ++- src/qemu/qemu_monitor_json.c | 8 ++------ src/qemu/qemu_process.c | 4 ++-- 4 files changed, 11 insertions(+), 13 deletions(-)
diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index 49be2d5412..a2f28f9492 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -1117,7 +1117,8 @@ struct _qemuMonitorBlockJobInfo { unsigned long long bandwidth; /* in bytes/s */ virDomainBlockJobCursor cur; virDomainBlockJobCursor end; - int ready; /* -1 if unknown, 0 if not ready, 1 if ready */ + bool ready_present; + bool ready;
At your discretion this can be virTristateBool. Michal

Using ! on integers is misleading. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_driver.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 800f98e474..b82f7e249a 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -14649,7 +14649,7 @@ qemuBlockJobInfoTranslate(qemuMonitorBlockJobInfoPtr rawInfo, * applications think job is completed. Except when both cur * and end are zero, in which case qemu hasn't started the * job yet. */ - if (!info->cur && !info->end) { + if (info->cur == 0 && info->end == 0) { if (rawInfo->ready_present) { info->end = 1; if (rawInfo->ready) -- 2.28.0

Replace qemuMonitorGetBlockJobInfo by qemuMonitorGetAllBlockJobInfo and hash table lookup. This basically open-codes qemuMonitorGetBlockJobInfo, but it will be removed in next patch. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_driver.c | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index b82f7e249a..e2fde6c76f 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -14695,8 +14695,9 @@ qemuDomainGetBlockJobInfo(virDomainPtr dom, virDomainObjPtr vm; virDomainDiskDefPtr disk; int ret = -1; - qemuMonitorBlockJobInfo rawInfo; + qemuMonitorBlockJobInfoPtr rawInfo; g_autoptr(qemuBlockJobData) job = NULL; + g_autoptr(GHashTable) blockjobstats = NULL; virCheckFlags(VIR_DOMAIN_BLOCK_JOB_INFO_BANDWIDTH_BYTES, -1); @@ -14722,18 +14723,21 @@ qemuDomainGetBlockJobInfo(virDomainPtr dom, } qemuDomainObjEnterMonitor(driver, vm); - ret = qemuMonitorGetBlockJobInfo(qemuDomainGetMonitor(vm), job->name, &rawInfo); - if (qemuDomainObjExitMonitor(driver, vm) < 0) - ret = -1; - if (ret <= 0) + blockjobstats = qemuMonitorGetAllBlockJobInfo(qemuDomainGetMonitor(vm), true); + if (qemuDomainObjExitMonitor(driver, vm) < 0 || !blockjobstats) goto endjob; - if (qemuBlockJobInfoTranslate(&rawInfo, info, disk, - flags & VIR_DOMAIN_BLOCK_JOB_INFO_BANDWIDTH_BYTES) < 0) { - ret = -1; + if (!(rawInfo = g_hash_table_lookup(blockjobstats, job->name))) { + ret = 0; goto endjob; } + if (qemuBlockJobInfoTranslate(rawInfo, info, disk, + flags & VIR_DOMAIN_BLOCK_JOB_INFO_BANDWIDTH_BYTES) < 0) + goto endjob; + + ret = 1; + endjob: qemuDomainObjEndJob(driver, vm); -- 2.28.0

Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_monitor.c | 29 ----------------------------- src/qemu/qemu_monitor.h | 4 ---- 2 files changed, 33 deletions(-) diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c index 551b65e778..40f2997cb6 100644 --- a/src/qemu/qemu_monitor.c +++ b/src/qemu/qemu_monitor.c @@ -3393,35 +3393,6 @@ qemuMonitorGetAllBlockJobInfo(qemuMonitorPtr mon, } -/** - * qemuMonitorGetBlockJobInfo: - * Parse Block Job information, and populate info for the named device. - * Return 1 if info available, 0 if device has no block job, and -1 on error. - */ -int -qemuMonitorGetBlockJobInfo(qemuMonitorPtr mon, - const char *alias, - qemuMonitorBlockJobInfoPtr info) -{ - GHashTable *all; - qemuMonitorBlockJobInfoPtr data; - int ret = 0; - - VIR_DEBUG("alias=%s, info=%p", alias, info); - - if (!(all = qemuMonitorGetAllBlockJobInfo(mon, true))) - return -1; - - if ((data = virHashLookup(all, alias))) { - *info = *data; - ret = 1; - } - - virHashFree(all); - return ret; -} - - int qemuMonitorJobDismiss(qemuMonitorPtr mon, const char *jobname) diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h index a2f28f9492..3a09b995ce 100644 --- a/src/qemu/qemu_monitor.h +++ b/src/qemu/qemu_monitor.h @@ -1123,10 +1123,6 @@ struct _qemuMonitorBlockJobInfo { GHashTable *qemuMonitorGetAllBlockJobInfo(qemuMonitorPtr mon, bool rawjobname); -int qemuMonitorGetBlockJobInfo(qemuMonitorPtr mon, - const char *device, - qemuMonitorBlockJobInfoPtr info) - ATTRIBUTE_NONNULL(2) ATTRIBUTE_NONNULL(3); int qemuMonitorJobDismiss(qemuMonitorPtr mon, const char *jobname) -- 2.28.0

If the job has finished, but we didn't yet process the completion fake that it's still incomplete so that apps which decided to poll qemuDomainGetBlockJobInfo rather than use events can be sure that the XML update was completed. Signed-off-by: Peter Krempa <pkrempa@redhat.com> --- src/qemu/qemu_driver.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index e2fde6c76f..3051d2c7ca 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -14642,6 +14642,15 @@ qemuBlockJobInfoTranslate(qemuMonitorBlockJobInfoPtr rawInfo, virDomainDiskDefPtr disk, bool reportBytes) { + /* If the job data is no longer present this means that the job already + * disappeared in qemu (pre-blockdev) but libvirt didn't process the + * finishing yet. Fake a incomplete job. */ + if (!rawInfo) { + info->cur = 0; + info->end = 1; + return 0; + } + info->cur = rawInfo->cur; info->end = rawInfo->end; @@ -14727,10 +14736,7 @@ qemuDomainGetBlockJobInfo(virDomainPtr dom, if (qemuDomainObjExitMonitor(driver, vm) < 0 || !blockjobstats) goto endjob; - if (!(rawInfo = g_hash_table_lookup(blockjobstats, job->name))) { - ret = 0; - goto endjob; - } + rawInfo = g_hash_table_lookup(blockjobstats, job->name); if (qemuBlockJobInfoTranslate(rawInfo, info, disk, flags & VIR_DOMAIN_BLOCK_JOB_INFO_BANDWIDTH_BYTES) < 0) -- 2.28.0

On 12/4/20 4:07 PM, Peter Krempa wrote:
This is a workaround for the issue that Nikolay tries to fix in:
https://www.redhat.com/archives/libvir-list/2020-November/msg00669.html
This one is a far more focused hack for apps which don't bother to update to events for block job completion checking.
Peter Krempa (7): virDomainGetBlockJobInfo: Discourage polling for block job completion detection virDomainGetBlockJobInfo: Reword docs for fallback values qemuMonitorBlockJobInfo: Store 'ready' and 'ready_present' separately qemuBlockJobInfoTranslate: Use explicit comparison against 0 qemuDomainGetBlockJobInfo: Use qemuMonitorGetAllBlockJobInfo qemu: monitor: Remove unused qemuMonitorGetBlockJobInfo qemuDomainGetBlockJobInfo: Work stats for unfinished pre-blockdev blockjob
src/libvirt-domain.c | 23 +++++++++++++++------ src/qemu/qemu_driver.c | 39 +++++++++++++++++++++++------------- src/qemu/qemu_monitor.c | 29 --------------------------- src/qemu/qemu_monitor.h | 7 ++----- src/qemu/qemu_monitor_json.c | 8 ++------ src/qemu/qemu_process.c | 4 ++-- 6 files changed, 48 insertions(+), 62 deletions(-)
Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Michal

On Fri, Dec 04, 2020 at 18:26:46 +0100, Michal Privoznik wrote:
On 12/4/20 4:07 PM, Peter Krempa wrote:
This is a workaround for the issue that Nikolay tries to fix in:
https://www.redhat.com/archives/libvir-list/2020-November/msg00669.html
This one is a far more focused hack for apps which don't bother to update to events for block job completion checking.
Peter Krempa (7): virDomainGetBlockJobInfo: Discourage polling for block job completion detection virDomainGetBlockJobInfo: Reword docs for fallback values qemuMonitorBlockJobInfo: Store 'ready' and 'ready_present' separately qemuBlockJobInfoTranslate: Use explicit comparison against 0 qemuDomainGetBlockJobInfo: Use qemuMonitorGetAllBlockJobInfo qemu: monitor: Remove unused qemuMonitorGetBlockJobInfo qemuDomainGetBlockJobInfo: Work stats for unfinished pre-blockdev blockjob
src/libvirt-domain.c | 23 +++++++++++++++------ src/qemu/qemu_driver.c | 39 +++++++++++++++++++++++------------- src/qemu/qemu_monitor.c | 29 --------------------------- src/qemu/qemu_monitor.h | 7 ++----- src/qemu/qemu_monitor_json.c | 8 ++------ src/qemu/qemu_process.c | 4 ++-- 6 files changed, 48 insertions(+), 62 deletions(-)
Reviewed-by: Michal Privoznik <mprivozn@redhat.com>
You've accidentally pushed this series :) ... and didn't apply your R-bs :D
participants (2)
-
Michal Privoznik
-
Peter Krempa