[libvirt] [PATCH] cpustat: fix regression when cpus are offline

It turns out that the cpuacct results properly account for offline cpus, and always returns results for every possible cpu, not just the online ones. So there is no need to check the map of online cpus in the first place, merely only a need to know the maximum possible cpu. Meanwhile, virNodeGetCPUBitmap had a subtle change from returning the maximum id to instead returning the width of the bitmap (one larger than the maximum id), which made this code encounter some off-by-one logic leading to bad error messages when a cpu was offline: $ virsh cpu-stats dom error: Failed to virDomainGetCPUStats() error: An error occurred, but the cause is unknown * src/qemu/qemu_driver.c (qemuDomainGetPercpuStats): Drop pointless check for cpumap changes, and use correct number of cpus. --- Fixes the regression noticed here: https://www.redhat.com/archives/libvir-list/2012-October/msg01508.html src/qemu/qemu_driver.c | 26 +++----------------------- 1 file changed, 3 insertions(+), 23 deletions(-) diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 18be7d9..f817319 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -13597,7 +13597,6 @@ qemuDomainGetPercpuStats(virDomainPtr domain, unsigned int ncpus) { virBitmapPtr map = NULL; - virBitmapPtr map2 = NULL; int rv = -1; int i, id, max_id; char *pos; @@ -13609,7 +13608,6 @@ qemuDomainGetPercpuStats(virDomainPtr domain, virTypedParameterPtr ent; int param_idx; unsigned long long cpu_time; - bool result; /* return the number of supported params */ if (nparams == 0 && ncpus != 0) @@ -13621,7 +13619,7 @@ qemuDomainGetPercpuStats(virDomainPtr domain, return rv; if (ncpus == 0) { /* returns max cpu ID */ - rv = max_id + 1; + rv = max_id; goto cleanup; } @@ -13648,11 +13646,7 @@ qemuDomainGetPercpuStats(virDomainPtr domain, id = start_cpu + ncpus - 1; for (i = 0; i <= id; i++) { - if (virBitmapGetBit(map, i, &result) < 0) - goto cleanup; - if (!result) { - cpu_time = 0; - } else if (virStrToLong_ull(pos, &pos, 10, &cpu_time) < 0) { + if (virStrToLong_ull(pos, &pos, 10, &cpu_time) < 0) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _("cpuacct parse error")); goto cleanup; @@ -13680,22 +13674,9 @@ qemuDomainGetPercpuStats(virDomainPtr domain, if (getSumVcpuPercpuStats(group, priv->nvcpupids, sum_cpu_time, n) < 0) goto cleanup; - /* Check that the mapping of online cpus didn't change mid-parse. */ - map2 = nodeGetCPUBitmap(domain->conn, &max_id); - if (!map2 || !virBitmapEqual(map, map2)) { - virReportError(VIR_ERR_OPERATION_INVALID, "%s", - _("the set of online cpus changed while reading")); - goto cleanup; - } - sum_cpu_pos = sum_cpu_time; for (i = 0; i <= id; i++) { - if (virBitmapGetBit(map, i, &result) < 0) - goto cleanup; - if (!result) - cpu_time = 0; - else - cpu_time = *(sum_cpu_pos++); + cpu_time = *(sum_cpu_pos++); if (i < start_cpu) continue; if (virTypedParameterAssign(¶ms[(i - start_cpu) * nparams + @@ -13711,7 +13692,6 @@ cleanup: VIR_FREE(sum_cpu_time); VIR_FREE(buf); virBitmapFree(map); - virBitmapFree(map2); return rv; } -- 1.7.11.7

On 10/25/12 22:57, Eric Blake wrote:
It turns out that the cpuacct results properly account for offline cpus, and always returns results for every possible cpu, not just the online ones. So there is no need to check the map of online cpus in the first place, merely only a need to know the maximum possible cpu. Meanwhile, virNodeGetCPUBitmap had a subtle change from returning the maximum id to instead returning the width of the bitmap (one larger than the maximum id), which made this code encounter some off-by-one logic leading to bad error messages when a cpu was offline:
$ virsh cpu-stats dom error: Failed to virDomainGetCPUStats()
error: An error occurred, but the cause is unknown
* src/qemu/qemu_driver.c (qemuDomainGetPercpuStats): Drop pointless check for cpumap changes, and use correct number of cpus. ---
Fixes the regression noticed here: https://www.redhat.com/archives/libvir-list/2012-October/msg01508.html
src/qemu/qemu_driver.c | 26 +++----------------------- 1 file changed, 3 insertions(+), 23 deletions(-)
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c index 18be7d9..f817319 100644 --- a/src/qemu/qemu_driver.c +++ b/src/qemu/qemu_driver.c @@ -13597,7 +13597,6 @@ qemuDomainGetPercpuStats(virDomainPtr domain, unsigned int ncpus) { virBitmapPtr map = NULL; - virBitmapPtr map2 = NULL;
When you apply this patch "map" is also not needed (never actualy used). Is there a way to replace "map = nodeGetCPUBitmap(domain->conn, &max_id);" to get max_id? That's the only thing that code is used to.
int rv = -1; int i, id, max_id; char *pos;
Otherwise looks good. (ACK if there isn't any other option to get max_id than to call nodeGetCPUBitmap) Peter

On 10/26/2012 07:59 AM, Peter Krempa wrote:
On 10/25/12 22:57, Eric Blake wrote:
It turns out that the cpuacct results properly account for offline cpus, and always returns results for every possible cpu, not just the online ones. So there is no need to check the map of online cpus in the first place, merely only a need to know the maximum possible cpu. Meanwhile, virNodeGetCPUBitmap had a subtle change from returning the maximum id to instead returning the width of the bitmap (one larger than the maximum id), which made this code encounter some off-by-one logic leading to bad error messages when a cpu was offline:
virBitmapPtr map = NULL;
- virBitmapPtr map2 = NULL;
When you apply this patch "map" is also not needed (never actualy used).
Is there a way to replace "map = nodeGetCPUBitmap(domain->conn, &max_id);" to get max_id? That's the only thing that code is used to.
int rv = -1; int i, id, max_id; char *pos;
Otherwise looks good. (ACK if there isn't any other option to get max_id than to call nodeGetCPUBitmap)
Hmm, good point. I'll spin a v2 that adds a new helper function in nodeinfo.c for getting at just the number of host cpus. -- Eric Blake eblake@redhat.com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
participants (2)
-
Eric Blake
-
Peter Krempa