On Thu, Apr 24, 2025 at 02:07:17PM +0100, Daniel P. Berrangé wrote:
On Thu, Apr 24, 2025 at 01:22:46PM +0200, Felix Huettner via Devel wrote:
> In case of a host that has a large number of cpus offline the count of
> host cpus and the last bit set in the virHostCPUGetOnlineBitmap might
> diverge significantly. This can e.g. be the case when disabeling smt via
> /sys/devices/system/cpu/smt/control.
>
> On the host this looks like:
> ```
> $ cat /sys/devices/system/cpu/present
> 0-255
> $ cat /sys/devices/system/cpu/online
> 0-127
> ```
>
> However in this case virBitmapToData previously only allocated 16 bytes
> for the output bitmap. This is becase the last set bit is on the 15th
> byte.
>
> Users of virHostCPUGetMap however rely on the "cpumap" containing enough
> space for all existing cpus (so they would expect 32 bytes in this case).
> E.g. cmdNodeCpuMap relies on this for its output. It will then actually
> read 32 bytes from the start of the "cpumap" address where in this case
> the last 16 of these bytes are uninitialized.
>
> This manifests itself in flapping outputs of "virsh nodecpumap --pretty"
like:
> ```
> $ virsh nodecpumap --pretty
> CPUs present: 256
> CPUs online: 128
> CPU map: 0-127,192,194,202
>
> $ virsh nodecpumap --pretty
> CPUs present: 256
> CPUs online: 128
> CPU map: 0-127,192,194,197
>
> $ virsh nodecpumap --pretty
> CPUs present: 256
> CPUs online: 128
> CPU map: 0-127,192,194,196-197
> ```
>
> This in turn potentially causes users of this data to report wrong cpu
> counts.
>
> Note that this only seems to happen with at least 256 phyiscal cpus
> where at least 128 are offline.
>
> We fix this by preallocating the expected bitmap size.
>
> Signed-off-by: Felix Huettner <felix.huettner(a)stackit.cloud>
> ---
> src/util/virhostcpu.c | 14 +++++++++-----
> 1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/src/util/virhostcpu.c b/src/util/virhostcpu.c
> index 5dbcc8987c..626faa88cf 100644
> --- a/src/util/virhostcpu.c
> +++ b/src/util/virhostcpu.c
> @@ -1091,22 +1091,26 @@ virHostCPUGetMap(unsigned char **cpumap,
> {
> g_autoptr(virBitmap) cpus = NULL;
> int ret = -1;
> - int dummy;
>
> virCheckFlags(0, -1);
>
> + ret = virHostCPUGetCount();
This sets 'ret' to a positive value....
> +
> if (!cpumap && !online)
> - return virHostCPUGetCount();
> + return ret;
>
> if (!(cpus = virHostCPUGetOnlineBitmap()))
> goto cleanup;
...in the failure scenario we now jump to 'cleanup'
with 'ret' still positive.
Better to use a different pattern with separate variables
int ret = -1;
int ncpus = virHostCPUGetCount();
...do stuff...
ret = ncpus;
cleanup:
if (ret < 0)
...
return ret;
Hi Daniel,
thanks for the review.
I will post a fixed v2.
I decided to get rid of the "goto cleanup" completely and just return -1
instead. "cleanup" seems to only have existed to deallocate cpumap, but
by the time we jump there cpumap can not possibly be allocated.
So i hope this makes it more easy to read.
Thanks a lot,
Felix
>
> >
> > - if (cpumap)
> > - virBitmapToData(cpus, cpumap, &dummy);
> > + if (cpumap) {
> > + int len = (ret + CHAR_BIT) / CHAR_BIT;
> > + *cpumap = g_new0(unsigned char, len);
> > + virBitmapToDataBuf(cpus, *cpumap, len);
> > + }
> > +
> > if (online)
> > *online = virBitmapCountBits(cpus);
> >
> > - ret = virHostCPUGetCount();
> >
> > cleanup:
> > if (ret < 0 && cpumap)
> >
> > base-commit: c5a73f75bc6cae4f466d0a6344d5b3277ac9c2f4
> > --
> > 2.43.0
> >
>
> With regards,
> Daniel
> --
> |:
https://berrange.com/
> |:
https://libvirt.org/
> |:
https://entangle-photo.org/
>