On Fri, 2016-01-29 at 01:32 -0500, Shivaprasad G Bhat wrote:
The nodeinfo output was fixed earlier to reflect the actual cpus available in
KVM mode on PPC64. The earlier fixes covered the aspect of not making a host
look overcommitted when it's not. The current fixes are aimed at helping the
users make better decisions on the kind of guest cpu topology that can be
supported on the given subcores_per_core setting of KVM host and also hint the
way to pin the guest vcpus efficiently.
I am planning to add some test cases once the approach is accepted.
With respect to Patch 2:
The second patch adds a new element to the cpus tag and I need your input on
whether that is okay, and whether there is a better way. I am not sure if the
existing clients have RNG checks that might fail with the approach, or if the
checks are not enforced on the elements but only on the tags.
With my approach, if the RNG checks pass, the new element "capacity", even if
ignored by many clients, would have no impact except on PPC64.
To the extent I looked at the code, the siblings changes don't affect existing
libvirt functionality. Please do let me know otherwise.
So, I've been going through this old thread trying to figure out
a way to improve the status quo. I'd like to collect as much
feedback as possible, especially from people who have worked in
this area of libvirt before or have written tools based on it.
As hinted above, this series is really trying to address two
different issues, and I think it's helpful to reason about them
separately.
** Guest threads limit **
My dual-core laptop will happily run a guest configured with
<cpu>
<topology sockets='1' cores='1' threads='128'/>
</cpu>
but POWER guests are limited to 8/subcores_per_core threads.
We need to report this information to the user somehow, and
I can't see an existing place where it would fit nicely. We
definitely don't want to overload the meaning of an existing
element/attribute with this. It should also only appear in
the (dom)capabilities XML of ppc64 hosts.
I don't think this is too problematic or controversial; we
just need to pick a nice place to display this information.
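To make the limit concrete, here is a minimal Python sketch of
the arithmetic. The figure 8 is taken from the sentence above;
max_guest_threads is a hypothetical helper, not a libvirt API,
and the check that subcores_per_core divides 8 evenly is my own
assumption.

```python
# Sketch only: 8 is the per-core thread count quoted in this thread;
# max_guest_threads is a hypothetical helper, not part of libvirt.
THREADS_PER_CORE = 8


def max_guest_threads(subcores_per_core: int) -> int:
    """Return the largest threads= value a guest core can use."""
    # Assumption: only values that split the core evenly make sense.
    if subcores_per_core <= 0 or THREADS_PER_CORE % subcores_per_core:
        raise ValueError("subcores_per_core must divide 8 evenly")
    return THREADS_PER_CORE // subcores_per_core


if __name__ == "__main__":
    for setting in (1, 2, 4):
        print(setting, max_guest_threads(setting))
```

Whatever element ends up carrying this information would need
to expose the result of this division, since the guest cannot
derive it from the existing capabilities XML.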
** Efficient guest topology **
To achieve optimal performance, you want to match guest
threads with host threads.
On x86, you can choose suitable host threads by looking at
the capabilities XML: the presence of elements like
<cpu id='2' socket_id='0' core_id='1'
siblings='2-3'/>
<cpu id='3' socket_id='0' core_id='1'
siblings='2-3'/>
means you should configure your guest to use
<vcpu placement='static' cpuset='2-3'>2</vcpu>
<cpu>
<topology sockets='1' cores='1' threads='2'/>
</cpu>
Notice how siblings can be found either by looking at the
attribute with the same name, or by matching them using the
value of the core_id attribute. Also notice how you are
supposed to pin as many vCPUs as the number of elements in
the cpuset - one guest thread per host thread.
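The core_id matching described above can be sketched in Python
as follows. This is only an illustration: the <cpus> wrapper is
a trimmed stand-in for the real capabilities XML, and
group_by_core and guest_snippet are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed stand-in for the <cpu> elements of the capabilities XML.
CAPS_FRAGMENT = """
<cpus>
  <cpu id='2' socket_id='0' core_id='1' siblings='2-3'/>
  <cpu id='3' socket_id='0' core_id='1' siblings='2-3'/>
</cpus>
"""


def group_by_core(xml_text):
    """Map (socket_id, core_id) to the sorted list of host CPU ids."""
    groups = defaultdict(list)
    for cpu in ET.fromstring(xml_text.strip()).iter("cpu"):
        key = (cpu.get("socket_id"), cpu.get("core_id"))
        groups[key].append(int(cpu.get("id")))
    return {key: sorted(ids) for key, ids in groups.items()}


def guest_snippet(host_cpus):
    """Build guest XML with one guest thread per host thread."""
    cpuset = ",".join(str(cpu) for cpu in host_cpus)
    count = len(host_cpus)
    return (
        f"<vcpu placement='static' cpuset='{cpuset}'>{count}</vcpu>\n"
        "<cpu>\n"
        f"  <topology sockets='1' cores='1' threads='{count}'/>\n"
        "</cpu>"
    )


if __name__ == "__main__":
    for host_cpus in group_by_core(CAPS_FRAGMENT).values():
        print(guest_snippet(host_cpus))
```

The sketch writes the cpuset as a comma-separated list ('2,3')
rather than a range ('2-3'); both name the same host CPUs.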
On POWER, this gets much trickier: only the *primary* thread
of each (sub)core appears to be online in the host, but all
threads can actually have a vCPU running on them. So
<cpu id='0' socket_id='0' core_id='32'
siblings='0,4'/>
<cpu id='4' socket_id='0' core_id='32'
siblings='0,4'/>
which is what you'd get with subcores_per_core=2, is very
confusing.
The optimal guest topology in this case would be
<vcpu placement='static' cpuset='4'>4</vcpu>
<cpu>
<topology sockets='1' cores='1' threads='4'/>
</cpu>
but neither of the approaches mentioned above works to figure
out the correct value for the cpuset attribute.
In this case, a possible solution would be to alter the values
of the core_id and siblings attributes such that both would be
the same as the id attribute, which would naturally make both
approaches described above work.
Additionally, a new attribute would be introduced to serve as
a multiplier for the "one guest thread per host thread" rule
mentioned earlier: the resulting XML would look like
<cpu id='0' socket_id='0' core_id='0' siblings='0'
capacity='4'/>
<cpu id='4' socket_id='0' core_id='4' siblings='4'
capacity='4'/>
which contains all the information needed to build the right
guest topology. The capacity attribute would have value 1 on
all architectures except for ppc64.
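As a rough illustration of how a tool might consume this, here
is a Python sketch. Keep in mind that capacity is only the
attribute proposed in this thread, not something libvirt
reports today; parse_cpu_list and placement_for are
hypothetical helpers, and the <cpus> wrapper is a trimmed
stand-in for the real capabilities XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical output using the capacity attribute proposed above.
PROPOSED_FRAGMENT = """
<cpus>
  <cpu id='0' socket_id='0' core_id='0' siblings='0' capacity='4'/>
  <cpu id='4' socket_id='0' core_id='4' siblings='4' capacity='4'/>
</cpus>
"""


def parse_cpu_list(text):
    """Expand a list such as '0,4' or '2-3' into integer CPU ids."""
    ids = []
    for part in text.split(","):
        if "-" in part:
            low, high = part.split("-")
            ids.extend(range(int(low), int(high) + 1))
        else:
            ids.append(int(part))
    return ids


def placement_for(xml_text, host_cpu_id):
    """Return (cpuset, vcpu_count) for the core containing host_cpu_id."""
    for cpu in ET.fromstring(xml_text.strip()).iter("cpu"):
        if int(cpu.get("id")) == host_cpu_id:
            siblings = cpu.get("siblings")
            # A missing capacity is treated as 1 (an assumption).
            capacity = int(cpu.get("capacity", "1"))
            # One guest thread per host thread, times the multiplier.
            return siblings, len(parse_cpu_list(siblings)) * capacity
    raise LookupError(f"host CPU {host_cpu_id} not found")


if __name__ == "__main__":
    print(placement_for(PROPOSED_FRAGMENT, 4))
```

For host CPU 4 this yields cpuset '4' with 4 vCPUs, matching
the optimal topology shown earlier, while an x86-style entry
with siblings='2-3' and no capacity still yields 2 vCPUs.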
We could arguably use the capacity attribute to cover the
use case described in the first part as well, by declaring that
any value other than 1 means there's a limit to the number of
threads a guest core can have. I think doing so has the
potential to produce much grief in the future, so I'd rather
keep them separate - even if it means inventing a new element.
It's also been proposed to add a physical_core_id attribute,
which would contain the real core id and allow tools to figure
out which subcores belong to the same core - it would be the
same as core_id for all other architectures and for ppc64
when subcores_per_core=1. It's not clear whether having this
attribute would be useful or just confusing.
This is all I have for now. Please let me know what you think
about it.
--
Andrea Bolognani
Software Engineer - Virtualization Team