[libvirt] [PATCH 0/2] Report block jobs more wisely

There's been a discussion recently (even here on the list [1]) that Nova is sometimes unable to fetch block job stats properly. Basically, we may report job.cur == job.end == 0 in some cases tricking Nova into thinking job is done when in fact it hasn't even started yet. Michal Privoznik (2): qemuDomainGetBlockJobInfo: Move info translation into separate func virDomainGetBlockJobInfo: Fix corner case when qemu reports no info src/libvirt-domain.c | 7 ++++++ src/qemu/qemu_driver.c | 58 ++++++++++++++++++++++++++++++++++++-------------- 2 files changed, 49 insertions(+), 16 deletions(-) -- 2.8.4

Even though we merely just pass to users whatever qemu provided on the monitor, we still do some translation. For instance we turn bytes into mebibytes, or fix job type if needed. However, in the future there is more fixing to be done so this code deserves its own function. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/qemu/qemu_driver.c | 46 ++++++++++++++++++++++++++++++---------------- 1 file changed, 30 insertions(+), 16 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 807e06d..4535eb8 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -16330,6 +16330,34 @@ qemuDomainBlockJobAbort(virDomainPtr dom, static int +qemuBlockJobInfoTranslate(qemuMonitorBlockJobInfoPtr rawInfo, + virDomainBlockJobInfoPtr info, + virDomainDiskDefPtr disk, + bool reportBytes) +{ + info->cur = rawInfo->cur; + info->end = rawInfo->end; + + info->type = rawInfo->type; + if (info->type == VIR_DOMAIN_BLOCK_JOB_TYPE_COMMIT && + disk->mirrorJob == VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT) + info->type = disk->mirrorJob; + + if (rawInfo->bandwidth && !reportBytes) + rawInfo->bandwidth = VIR_DIV_UP(rawInfo->bandwidth, 1024 * 1024); + info->bandwidth = rawInfo->bandwidth; + if (info->bandwidth != rawInfo->bandwidth) { + virReportError(VIR_ERR_OVERFLOW, + _("bandwidth %llu cannot be represented in result"), + rawInfo->bandwidth); + return -1; + } + + return 0; +} + + +static int qemuDomainGetBlockJobInfo(virDomainPtr dom, const char *path, virDomainBlockJobInfoPtr info, @@ -16376,22 +16404,8 @@ qemuDomainGetBlockJobInfo(virDomainPtr dom, if (ret <= 0) goto endjob; - info->cur = rawInfo.cur; - info->end = rawInfo.end; - - info->type = rawInfo.type; - if (info->type == VIR_DOMAIN_BLOCK_JOB_TYPE_COMMIT && - disk->mirrorJob == VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT) - info->type = disk->mirrorJob; - - if (rawInfo.bandwidth && - !(flags & VIR_DOMAIN_BLOCK_JOB_INFO_BANDWIDTH_BYTES)) - rawInfo.bandwidth = VIR_DIV_UP(rawInfo.bandwidth, 1024 * 1024); - info->bandwidth = rawInfo.bandwidth; - if (info->bandwidth != rawInfo.bandwidth) { - virReportError(VIR_ERR_OVERFLOW, - _("bandwidth %llu cannot be represented in result"), - rawInfo.bandwidth); + if (qemuBlockJobInfoTranslate(&rawInfo, info, disk, + flags & VIR_DOMAIN_BLOCK_JOB_INFO_BANDWIDTH_BYTES) < 0) { ret = -1; goto endjob; } -- 2.8.4

https://bugzilla.redhat.com/show_bug.cgi?id=1372613 Apparently, some management applications use the following code pattern when waiting for a block job to finish: while (1) { virDomainGetBlockJobInfo(dom, disk, info, flags); if (info.cur == info.end) break; sleep(1); } Problem with this approach is in its corner cases. In case of QEMU, libvirt merely pass what has been reported on the monitor. However, if the block job hasn't started yet, qemu reports cur == end == 0 which tricks mgmt apps into thinking job is complete. The solution is to mangle cur/end values as described here [1]. 1: https://www.redhat.com/archives/libvir-list/2016-September/msg00017.html Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt-domain.c | 7 +++++++ src/qemu/qemu_driver.c | 12 ++++++++++++ 2 files changed, 19 insertions(+) diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c index 46f0318..fa82e49 100644 --- a/src/libvirt-domain.c +++ b/src/libvirt-domain.c @@ -9896,6 +9896,13 @@ virDomainBlockJobAbort(virDomainPtr dom, const char *disk, * can be found by calling virDomainGetXMLDesc() and inspecting * elements within //domain/devices/disk. * + * As a corner case underlying hypervisor may report cur == 0 and + * end == 0 when the block job hasn't been started yet. In this + * case libvirt reports cur = 0 and end = 1. However, hypervisor + * may return cur == 0 and end == 0 if the block job has finished + * and was no-op. In this case libvirt reports cur = 1 and end = 1. + * Since 2.3.0. + * * Returns -1 in case of failure, 0 when nothing found, 1 when info was found. */ int diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 4535eb8..df0d916 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -16338,6 +16338,18 @@ qemuBlockJobInfoTranslate(qemuMonitorBlockJobInfoPtr rawInfo, info->cur = rawInfo->cur; info->end = rawInfo->end; + /* Fix job completeness reporting. If cur == end mgmt + * applications think job is completed. Except when both cur + * and end are zero, in which case qemu hasn't started the + * job yet. */ + if (!info->cur && !info->end) { + if (rawInfo->ready > 0) { + info->cur = info->end = 1; + } else if (rawInfo->ready < 0) { + info->end = 1; + } + } + info->type = rawInfo->type; if (info->type == VIR_DOMAIN_BLOCK_JOB_TYPE_COMMIT && disk->mirrorJob == VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT) -- 2.8.4

On 09/05/2016 07:48 AM, Michal Privoznik wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1372613
Apparently, some management applications use the following code pattern when waiting for a block job to finish:
while (1) { virDomainGetBlockJobInfo(dom, disk, info, flags);
if (info.cur == info.end) break;
sleep(1); }
Problem with this approach is in its corner cases. In case of QEMU, libvirt merely pass what has been reported on the monitor. However, if the block job hasn't started yet, qemu reports cur == end == 0 which tricks mgmt apps into thinking job is complete.
The solution is to mangle cur/end values as described here [1].
1: https://www.redhat.com/archives/libvir-list/2016-September/msg00017.html
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt-domain.c | 7 +++++++ src/qemu/qemu_driver.c | 12 ++++++++++++ 2 files changed, 19 insertions(+)
diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c index 46f0318..fa82e49 100644 --- a/src/libvirt-domain.c +++ b/src/libvirt-domain.c @@ -9896,6 +9896,13 @@ virDomainBlockJobAbort(virDomainPtr dom, const char *disk, * can be found by calling virDomainGetXMLDesc() and inspecting * elements within //domain/devices/disk. * + * As a corner case underlying hypervisor may report cur == 0 and + * end == 0 when the block job hasn't been started yet. In this + * case libvirt reports cur = 0 and end = 1. However, hypervisor + * may return cur == 0 and end == 0 if the block job has finished + * and was no-op. In this case libvirt reports cur = 1 and end = 1. + * Since 2.3.0. + * * Returns -1 in case of failure, 0 when nothing found, 1 when info was found. */ int diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 4535eb8..df0d916 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -16338,6 +16338,18 @@ qemuBlockJobInfoTranslate(qemuMonitorBlockJobInfoPtr rawInfo, info->cur = rawInfo->cur; info->end = rawInfo->end;
+ /* Fix job completeness reporting. If cur == end mgmt + * applications think job is completed. Except when both cur + * and end are zero, in which case qemu hasn't started the + * job yet. */ + if (!info->cur && !info->end) { + if (rawInfo->ready > 0) { + info->cur = info->end = 1; + } else if (rawInfo->ready < 0) { + info->end = 1; + }
Can info->ready == 0 ? w/ info->cur = info->end = 0 If so, then we're in the same mess or some other weird condition. Seems like "ready" will be set in qemu during block_job_event_ready, so that would say to me that as long as the structure is allocated, ready will be false and conceivably info->cur = info->end = 0. Wouldn't that mean the < 0 should be <= 0 ACK (both patches) in principle, just need a clarification of what's being seen... John
+ } + info->type = rawInfo->type; if (info->type == VIR_DOMAIN_BLOCK_JOB_TYPE_COMMIT && disk->mirrorJob == VIR_DOMAIN_BLOCK_JOB_TYPE_ACTIVE_COMMIT)

On 09.09.2016 23:30, John Ferlan wrote:
On 09/05/2016 07:48 AM, Michal Privoznik wrote:
https://bugzilla.redhat.com/show_bug.cgi?id=1372613
Apparently, some management applications use the following code pattern when waiting for a block job to finish:
while (1) { virDomainGetBlockJobInfo(dom, disk, info, flags);
if (info.cur == info.end) break;
sleep(1); }
Problem with this approach is in its corner cases. In case of QEMU, libvirt merely pass what has been reported on the monitor. However, if the block job hasn't started yet, qemu reports cur == end == 0 which tricks mgmt apps into thinking job is complete.
The solution is to mangle cur/end values as described here [1].
1: https://www.redhat.com/archives/libvir-list/2016-September/msg00017.html
Signed-off-by: Michal Privoznik <mprivozn@redhat.com> --- src/libvirt-domain.c | 7 +++++++ src/qemu/qemu_driver.c | 12 ++++++++++++ 2 files changed, 19 insertions(+)
diff --git a/src/libvirt-domain.c b/src/libvirt-domain.c index 46f0318..fa82e49 100644 --- a/src/libvirt-domain.c +++ b/src/libvirt-domain.c @@ -9896,6 +9896,13 @@ virDomainBlockJobAbort(virDomainPtr dom, const char *disk, * can be found by calling virDomainGetXMLDesc() and inspecting * elements within //domain/devices/disk. * + * As a corner case underlying hypervisor may report cur == 0 and + * end == 0 when the block job hasn't been started yet. In this + * case libvirt reports cur = 0 and end = 1. However, hypervisor + * may return cur == 0 and end == 0 if the block job has finished + * and was no-op. In this case libvirt reports cur = 1 and end = 1. + * Since 2.3.0. + * * Returns -1 in case of failure, 0 when nothing found, 1 when info was found. */ int diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 4535eb8..df0d916 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -16338,6 +16338,18 @@ qemuBlockJobInfoTranslate(qemuMonitorBlockJobInfoPtr rawInfo, info->cur = rawInfo->cur; info->end = rawInfo->end;
+ /* Fix job completeness reporting. If cur == end mgmt + * applications think job is completed. Except when both cur + * and end are zero, in which case qemu hasn't started the + * job yet. */ + if (!info->cur && !info->end) { + if (rawInfo->ready > 0) { + info->cur = info->end = 1; + } else if (rawInfo->ready < 0) { + info->end = 1; + }
Can info->ready == 0 ? w/ info->cur = info->end = 0
If so, then we're in the same mess or some other weird condition.
Seems like "ready" will be set in qemu during block_job_event_ready, so that would say to me that as long as the structure is allocated, ready will be false and conceivably info->cur = info->end = 0.
Wouldn't that mean the < 0 should be <= 0
Okay, better safe than sorry; this looks reasonable to me. I mean, it looks unlikely to me that qemu will already allocate its internal job structure without filling @cur and @end, but it is possible, so I can do the change. Do you want me to send v2? Michal

[...]
Can info->ready == 0 ? w/ info->cur = info->end = 0
If so, then we're in the same mess or some other weird condition.
Seems like "ready" will be set in qemu during block_job_event_ready, so that would say to me that as long as the structure is allocated, ready will be false and conceivably info->cur = info->end = 0.
Wouldn't that mean the < 0 should be <= 0
Okay, better safe than sorry; this looks reasonable to me. I mean, it looks unlikely to me that qemu will already allocate its internal job structure without filling @cur and @end, but it is possible, so I can do the change.
Do you want me to send v2?
Not necessary... John

On 09/09/2016 04:30 PM, John Ferlan wrote:
+ /* Fix job completeness reporting. If cur == end mgmt + * applications think job is completed. Except when both cur + * and end are zero, in which case qemu hasn't started the + * job yet. */ + if (!info->cur && !info->end) {
We get here if qemu reports 0/0 (or if qemu reports nothing, and we end up with 0/0 because we 0-initialized the object)...
+ if (rawInfo->ready > 0) { + info->cur = info->end = 1;
if qemu reported done (on a no-op job), then we fudge to 1/1 and the caller knows we are done...
+ } else if (rawInfo->ready < 0) { + info->end = 1;
if qemu didn't tell us it was ready, then we fudge to 0/1. I thought the original email thread was that if rawInfo->ready == 0 (qemu explicitly told us it is NOT done) that we want to fudge to 0/1, and then the real question is that if qemu tells us nothing at all about rawInfo->ready, then fudging MIGHT treat a no-op job as never ending, so it was better to leave it at 0/0 (an application getting 0/0 when talking to new-enough libvirt then knows it is talking to older qemu). In other words, I think this condition is slightly better as rawInfo->ready == 0, and leave the rawInfo->ready < 0 case as 0/0. Or am I misremembering the results of the earlier thread?
+ }
Can info->ready == 0 ? w/ info->cur = info->end = 0
If so, then we're in the same mess or some other weird condition.
Seems like "ready" will be set in qemu during block_job_event_ready, so that would say to me that as long as the structure is allocated, ready will be false and conceivably info->cur = info->end = 0.
Wouldn't that mean the < 0 should be <= 0
<= 0 would make both explicitly not done and no answer from qemu mean the same thing - we fudge to 0/1 to tell the caller that it is not done. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 12.09.2016 21:19, Eric Blake wrote:
On 09/09/2016 04:30 PM, John Ferlan wrote:
+ /* Fix job completeness reporting. If cur == end mgmt + * applications think job is completed. Except when both cur + * and end are zero, in which case qemu hasn't started the + * job yet. */ + if (!info->cur && !info->end) {
We get here if qemu reports 0/0 (or if qemu reports nothing, and we end up with 0/0 because we 0-initialized the object)...
+ if (rawInfo->ready > 0) { + info->cur = info->end = 1;
if qemu reported done (on a no-op job), then we fudge to 1/1 and the caller knows we are done...
+ } else if (rawInfo->ready < 0) { + info->end = 1;
if qemu didn't tell us it was ready, then we fudge to 0/1.
I thought the original email thread was that if rawInfo->ready == 0 (qemu explicitly told us it is NOT done) that we want to fudge to 0/1, and then the real question is that if qemu tells us nothing at all about rawInfo->ready, then fudging MIGHT treat a no-op job as never ending, so it was better to leave it at 0/0 (an application getting 0/0 when talking to new-enough libvirt then knows it is talking to older qemu). In other words, I think this condition is slightly better as rawInfo->ready == 0, and leave the rawInfo->ready < 0 case as 0/0.
Or am I misremembering the results of the earlier thread?
So, just to make it crystal clear, is this what you're saying? ready | initial C/R |fudged C/R ------+-------------+---------- < 0 | 0/0 | 0/0 = 0 | 0/0 | 0/1
0 | 0/0 | 1/1
Michal

On 13.09.2016 11:52, Michal Privoznik wrote:
On 12.09.2016 21:19, Eric Blake wrote:
On 09/09/2016 04:30 PM, John Ferlan wrote:
+ /* Fix job completeness reporting. If cur == end mgmt + * applications think job is completed. Except when both cur + * and end are zero, in which case qemu hasn't started the + * job yet. */ + if (!info->cur && !info->end) {
We get here if qemu reports 0/0 (or if qemu reports nothing, and we end up with 0/0 because we 0-initialized the object)...
+ if (rawInfo->ready > 0) { + info->cur = info->end = 1;
if qemu reported done (on a no-op job), then we fudge to 1/1 and the caller knows we are done...
+ } else if (rawInfo->ready < 0) { + info->end = 1;
if qemu didn't tell us it was ready, then we fudge to 0/1.
I thought the original email thread was that if rawInfo->ready == 0 (qemu explicitly told us it is NOT done) that we want to fudge to 0/1, and then the real question is that if qemu tells us nothing at all about rawInfo->ready, then fudging MIGHT treat a no-op job as never ending, so it was better to leave it at 0/0 (an application getting 0/0 when talking to new-enough libvirt then knows it is talking to older qemu). In other words, I think this condition is slightly better as rawInfo->ready == 0, and leave the rawInfo->ready < 0 case as 0/0.
Or am I misremembering the results of the earlier thread?
So, just to make it crystal clear, is this what you're saying?
ready | initial C/R |fudged C/R
Oh, This should have been C/E instead of C/R. Current/End. 'e' and 'r' keys are just too close to each other :-)
------+-------------+---------- < 0 | 0/0 | 0/0 = 0 | 0/0 | 0/1
0 | 0/0 | 1/1
Michal

On 09/13/2016 06:02 AM, Michal Privoznik wrote:
So, just to make it crystal clear, is this what you're saying?
ready | initial C/R |fudged C/R
Oh, This should have been C/E instead of C/R. Current/End. 'e' and 'r' keys are just too close to each other :-)
------+-------------+---------- < 0 | 0/0 | 0/0 = 0 | 0/0 | 0/1
0 | 0/0 | 1/1
Yes, I think that's the best. Then we have documented to apps higher in the stack that if they see 0/0, it is because qemu is too old to give us a definitive answer; in all other cases, we have fudged data so that higher layers don't have to worry about 0/0. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org

On 13.09.2016 19:03, Eric Blake wrote:
On 09/13/2016 06:02 AM, Michal Privoznik wrote:
So, just to make it crystal clear, is this what you're saying?
ready | initial C/R |fudged C/R
Oh, This should have been C/E instead of C/R. Current/End. 'e' and 'r' keys are just too close to each other :-)
------+-------------+---------- < 0 | 0/0 | 0/0 = 0 | 0/0 | 0/1
0 | 0/0 | 1/1
Yes, I think that's the best. Then we have documented to apps higher in the stack that if they see 0/0, it is because qemu is too old to give us a definitive answer; in all other cases, we have fudged data so that higher layers don't have to worry about 0/0.
Thank you guys, I've pushed the patch then (with the behaviour as described in the table). Michal

On Mon, Sep 05, 2016 at 01:48:21PM +0200, Michal Privoznik wrote:
There's been a discussion recently (even here on the list [1]) that Nova is sometimes unable to fetch block job stats properly. Basically, we may report job.cur == job.end == 0 in some cases tricking Nova into thinking job is done when in fact it hasn't even started yet.
Michal Privoznik (2): qemuDomainGetBlockJobInfo: Move info translation into separate func virDomainGetBlockJobInfo: Fix corner case when qemu reports no info
src/libvirt-domain.c | 7 ++++++ src/qemu/qemu_driver.c | 58 ++++++++++++++++++++++++++++++++++++-------------- 2 files changed, 49 insertions(+), 16 deletions(-)
I've tested the obvious case and reported it in this thread: http://www.redhat.com/archives/libvir-list/2016-September/msg00042.html FWIW: Tested-by: Kashyap Chamarthy <kchamart@redhat.com> -- /kashyap
participants (4)
-
Eric Blake
-
John Ferlan
-
Kashyap Chamarthy
-
Michal Privoznik