[libvirt] vcpupin reports bogus vcpu affinities

For pinned vcpus, vcpupin will report inaccurate affinity values on machines with high core counts (256 cores in my case). The problem is produced as follows:

$ virsh vcpupin myguest 0 4
$ virsh vcpupin myguest 0
 VCPU   CPU Affinity
---------------------------
 0      4,192,194,196-197

Running taskset on the qemu threads shows the correct affinity, so this seems to be a reporting problem. Strangely, the value "192" is significant. If I pin a cpu greater than 192, the problem no longer appears.

I believe the cause of the problem in my case is this code in src/conf/domain_conf.c:virDomainDefGetVcpuPinInfoHelper:

    ...
    if (vcpu && vcpu->cpumask)
        bitmap = vcpu->cpumask;
    ...

vcpu->cpumask is "shortened" in that it is only long enough to contain the last set bit in the mask. However, when we copy the mask to the buffer that is returned, we use the masklen passed to the function, which is the "full" masklen with a bit for each cpu. So it seems virBitmapToDataBuf copies some extra data past the end of the bitmask. I still don't know why the "192" value is always set, or why I typically see similar bogus bits set.

What is the function meant to assume in this case? Is it sane to assume that the bitmask is the full length of the buffer here, and that it's the responsibility of the setter of vcpu->cpumask to provide a bitmap of the length we're expecting? Or should we assume that we may receive a shortened bitmask here and expand the bitmask before copying to the buffer?

-John
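
A minimal sketch of the second option raised above (treating the stored mask as possibly shorter than the output buffer). This is not libvirt code: struct sketch_bitmap and every sketch_* function are hypothetical names made up for illustration, and the layout (an array of unsigned long words, copied out least significant byte first) is an assumption. The point is only that the copy zero-fills the whole buffer first and never reads past the words the short bitmap actually owns.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a bitmap that may cover fewer bits than
 * the host has CPUs. Not a libvirt type. */
struct sketch_bitmap {
    size_t nbits;              /* number of valid bits */
    const unsigned long *map;  /* storage: enough words to hold nbits */
};

#define SKETCH_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Number of unsigned long words needed to hold nbits bits. */
static size_t sketch_words_for_bits(size_t nbits)
{
    return (nbits + SKETCH_BITS_PER_WORD - 1) / SKETCH_BITS_PER_WORD;
}

/* Copy the bitmap into a caller-supplied buffer of len bytes, least
 * significant byte of each word first. Buffer bytes beyond the bitmap's
 * own storage are left as zero; the bitmap is never read out of bounds. */
static void sketch_bitmap_to_buf(const struct sketch_bitmap *bm,
                                 unsigned char *bytes, size_t len)
{
    size_t avail, i;

    assert(bm != NULL && bytes != NULL);

    /* Clear the full output buffer so no stale data can show up as bits. */
    memset(bytes, 0, len);

    /* Clamp the copy length to what the (possibly short) bitmap holds. */
    avail = sketch_words_for_bits(bm->nbits) * sizeof(unsigned long);
    if (len > avail)
        len = avail;

    for (i = 0; i < len; i++) {
        unsigned long word = bm->map[i / sizeof(unsigned long)];
        bytes[i] = (unsigned char)(word >> ((i % sizeof(unsigned long)) * CHAR_BIT));
    }
}

/* Return 1 if the given bit is set in the byte buffer, else 0. */
static int sketch_buf_bit_is_set(const unsigned char *bytes, size_t bit)
{
    return (bytes[bit / CHAR_BIT] >> (bit % CHAR_BIT)) & 1;
}
```

With a one-word bitmap that has only bit 4 set and a 32-byte (256-bit) output buffer, only bit 4 should come back set, whatever the buffer held beforehand. The alternative design, requiring every setter of the mask to allocate it at full host width, would make the clamp unnecessary but pushes the invariant onto every caller.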

On 3/26/19 4:06 PM, Allen, John wrote:
> For pinned vcpus, vcpupin will report inaccurate affinity values on machines with high core counts (256 cores in my case). The problem is produced as follows:
>
> $ virsh vcpupin myguest 0 4
> $ virsh vcpupin myguest 0
>  VCPU   CPU Affinity
> ---------------------------
>  0      4,192,194,196-197
>
> Running taskset on the qemu threads shows the correct affinity, so this seems to be a reporting problem. Strangely, the value "192" is significant. If I pin a cpu greater than 192, the problem no longer appears.
>
> I believe the cause of the problem in my case is this code in src/conf/domain_conf.c:virDomainDefGetVcpuPinInfoHelper:
>
>     ...
>     if (vcpu && vcpu->cpumask)
>         bitmap = vcpu->cpumask;
>     ...
>
> vcpu->cpumask is "shortened" in that it is only long enough to contain the last set bit in the mask. However, when we copy the mask to the buffer that is returned, we use the masklen passed to the function, which is the "full" masklen with a bit for each cpu. So it seems virBitmapToDataBuf copies some extra data past the end of the bitmask. I still don't know why the "192" value is always set, or why I typically see similar bogus bits set.
>
> What is the function meant to assume in this case? Is it sane to assume that the bitmask is the full length of the buffer here, and that it's the responsibility of the setter of vcpu->cpumask to provide a bitmap of the length we're expecting? Or should we assume that we may receive a shortened bitmask here and expand the bitmask before copying to the buffer?

I didn't dig into the code much but I can try and help figure it out. Can you also provide:

* libvirt version
* host distro
* output of 'virsh nodeinfo'
* output of 'virsh nodecpumap'

Thanks,
Cole

On Wed, Apr 03, 2019 at 01:48:33PM -0400, Cole Robinson wrote:
> On 3/26/19 4:06 PM, Allen, John wrote:
> > For pinned vcpus, vcpupin will report inaccurate affinity values on machines with high core counts (256 cores in my case). The problem is produced as follows:
> >
> > $ virsh vcpupin myguest 0 4
> > $ virsh vcpupin myguest 0
> >  VCPU   CPU Affinity
> > ---------------------------
> >  0      4,192,194,196-197
> >
> > Running taskset on the qemu threads shows the correct affinity, so this seems to be a reporting problem. Strangely, the value "192" is significant. If I pin a cpu greater than 192, the problem no longer appears.
> >
> > I believe the cause of the problem in my case is this code in src/conf/domain_conf.c:virDomainDefGetVcpuPinInfoHelper:
> >
> >     ...
> >     if (vcpu && vcpu->cpumask)
> >         bitmap = vcpu->cpumask;
> >     ...
> >
> > vcpu->cpumask is "shortened" in that it is only long enough to contain the last set bit in the mask. However, when we copy the mask to the buffer that is returned, we use the masklen passed to the function, which is the "full" masklen with a bit for each cpu. So it seems virBitmapToDataBuf copies some extra data past the end of the bitmask. I still don't know why the "192" value is always set, or why I typically see similar bogus bits set.
> >
> > What is the function meant to assume in this case? Is it sane to assume that the bitmask is the full length of the buffer here, and that it's the responsibility of the setter of vcpu->cpumask to provide a bitmap of the length we're expecting? Or should we assume that we may receive a shortened bitmask here and expand the bitmask before copying to the buffer?

Hi Cole,

Sorry for the delayed response. I am just getting back from vacation. I have provided the information below. However, I believe I have an understanding of the problem and I will be submitting a patch later today.

> I didn't dig into the code much but I can try and help figure it out. Can you also provide:
>
> * libvirt version

libvirt 5.3.0

> * host distro

Ubuntu 18.04

> * output of 'virsh nodeinfo'

CPU model:           x86_64
CPU(s):              256
CPU frequency:       1739 MHz
CPU socket(s):       1
Core(s) per socket:  64
Thread(s) per core:  2
NUMA cell(s):        2
Memory size:         131833228 KiB

> * output of 'virsh nodecpumap'

CPUs present:   256
CPUs online:    256
CPU map:        yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy

> Thanks,
> Cole