On Thu, May 05, 2016 at 08:48:05PM +0200, Andrea Bolognani wrote:
On Fri, 2016-01-29 at 01:32 -0500, Shivaprasad G Bhat wrote:
> The nodeinfo output was fixed earlier to reflect the actual cpus available in
> KVM mode on PPC64. The earlier fixes covered the aspect of not making a host
> look overcommitted when it's not. The current fixes are aimed at helping the
> users make better decisions on the kind of guest cpu topology that can be
> supported on the given subcores_per_core setting of KVM host and also hint the
> way to pin the guest vcpus efficiently.
>
> I am planning to add some test cases once the approach is accepted.
>
> With respect to Patch 2:
> The second patch adds a new element to the cpus tag and I need your inputs on
> if that is okay. Also if there is a better way. I am not sure if the existing
> clients have RNG checks that might fail with the approach. Or if the checks
> are not enforced on the elements but only on the tags.
>
> With my approach if the rng checks pass, the new element "capacity" even if
> ignored by many clients would have no impact except for PPC64.
>
> To the extent I looked at code, the siblings changes don't affect existing
> libvirt functionality. Please do let me know otherwise.
So, I've been going through this old thread trying to figure out
a way to improve the status quo. I'd like to collect as much
feedback as possible, especially from people who have worked in
this area of libvirt before or have written tools based on it.
As hinted above, this series is really trying to address two
different issues, and I think it's helpful to reason about them
separately.
** Guest threads limit **
My dual-core laptop will happily run a guest configured with
<cpu>
<topology sockets='1' cores='1' threads='128'/>
</cpu>
but POWER guests are limited to 8/subcores_per_core threads.
We need to report this information to the user somehow, and
I can't see an existing place where it would fit nicely. We
definitely don't want to overload the meaning of an existing
element/attribute with this. It should also only appear in
the (dom)capabilities XML of ppc64 hosts.
I don't think this is too problematic or controversial, we
just need to pick a nice place to display this information.
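
As a rough illustration of the limit described above, the arithmetic
can be sketched in Python. The helper name is hypothetical (it is not
a libvirt API), and the constant 8 is simply the figure quoted above.

```python
# Hypothetical helper, not part of libvirt: computes the guest thread
# limit described above (8 / subcores_per_core).
THREADS_PER_CORE = 8  # figure quoted above for POWER hosts


def max_guest_threads(subcores_per_core: int) -> int:
    """Return the largest threads= value a guest could use per core."""
    if subcores_per_core <= 0:
        raise ValueError("subcores_per_core must be positive")
    return THREADS_PER_CORE // subcores_per_core


if __name__ == "__main__":
    for n in (1, 2, 4):
        print("subcores_per_core=%d -> max guest threads=%d"
              % (n, max_guest_threads(n)))
```

Whatever element ends up carrying this value, a client would only need
to compare it against the threads attribute of the guest <topology>.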
** Efficient guest topology **
To achieve optimal performance, you want to match guest
threads with host threads.
On x86, you can choose suitable host threads by looking at
the capabilities XML: the presence of elements like
<cpu id='2' socket_id='0' core_id='1'
siblings='2-3'/>
<cpu id='3' socket_id='0' core_id='1'
siblings='2-3'/>
means you should configure your guest to use
<vcpu placement='static' cpuset='2-3'>2</vcpu>
<cpu>
<topology sockets='1' cores='1' threads='2'/>
</cpu>
Notice how siblings can be found either looking at the
attribute with the same name, or by matching them using the
value of the core_id attribute. Also notice how you are
supposed to pin as many vCPUs as the number of elements in
the cpuset - one guest thread per host thread.
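
The x86 lookup described above could be sketched roughly as follows.
This is only an illustration: the function names are made up, the
sample fragment just wraps the two <cpu> elements quoted above, and
the cpuset is emitted as a comma-separated list ('2,3') rather than
the range form ('2-3').

```python
import xml.etree.ElementTree as ET

# Sample fragment wrapping the x86 <cpu> elements quoted above.
CAPS = """
<cpus num='2'>
  <cpu id='2' socket_id='0' core_id='1' siblings='2-3'/>
  <cpu id='3' socket_id='0' core_id='1' siblings='2-3'/>
</cpus>
"""


def threads_by_core(xml_text):
    """Group host CPU ids by (socket_id, core_id)."""
    groups = {}
    for cpu in ET.fromstring(xml_text).iter("cpu"):
        key = (cpu.get("socket_id"), cpu.get("core_id"))
        groups.setdefault(key, []).append(int(cpu.get("id")))
    return groups


def guest_config(cpu_ids):
    """Build the <vcpu>/<topology> pair: one guest thread per host thread."""
    cpuset = ",".join(str(i) for i in sorted(cpu_ids))
    n = len(cpu_ids)
    return ("<vcpu placement='static' cpuset='%s'>%d</vcpu>\n"
            "<cpu>\n"
            "  <topology sockets='1' cores='1' threads='%d'/>\n"
            "</cpu>" % (cpuset, n, n))


if __name__ == "__main__":
    for core, ids in threads_by_core(CAPS).items():
        print(guest_config(ids))
```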
On POWER, this gets much trickier: only the *primary* thread
of each (sub)core appears to be online in the host, but all
threads can actually have a vCPU running on them. So
<cpu id='0' socket_id='0' core_id='32'
siblings='0,4'/>
<cpu id='4' socket_id='0' core_id='32'
siblings='0,4'/>
which is what you'd get with subcores_per_core=2, is very
confusing.
The optimal guest topology in this case would be
<vcpu placement='static' cpuset='4'>4</vcpu>
<cpu>
<topology sockets='1' cores='1' threads='4'/>
</cpu>
but neither of the approaches mentioned above works to figure out the
correct value for the cpuset attribute.
In this case, a possible solution would be to alter the values
of the core_id and siblings attributes such that both would be
the same as the id attribute, which would naturally make both
approaches described above work.
Additionally, a new attribute would be introduced to serve as
a multiplier for the "one guest thread per host thread" rule
mentioned earlier: the resulting XML would look like
<cpu id='0' socket_id='0' core_id='0' siblings='0'
capacity='4'/>
<cpu id='4' socket_id='0' core_id='4' siblings='4'
capacity='4'/>
which contains all the information needed to build the right
guest topology. The capacity attribute would have value 1 on
all architectures except for ppc64.
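
To make the multiplier rule concrete, here is a minimal sketch of how
a client might consume the proposed XML. Note that 'capacity' exists
only in this proposal, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# 'capacity' is only the attribute proposed above; it is not part of
# released libvirt capabilities XML.
CAPS = """
<cpus num='2'>
  <cpu id='0' socket_id='0' core_id='0' siblings='0' capacity='4'/>
  <cpu id='4' socket_id='0' core_id='4' siblings='4' capacity='4'/>
</cpus>
"""


def parse_cpuset(text):
    """Expand a cpuset string such as '0,4' or '2-3' into a list of ids."""
    ids = []
    for part in text.split(","):
        lo, _, hi = part.partition("-")
        ids.extend(range(int(lo), int(hi or lo) + 1))
    return ids


def guest_for_cpu(xml_text, cpu_id):
    """Return cpuset and vCPU count for the core containing cpu_id."""
    for cpu in ET.fromstring(xml_text).iter("cpu"):
        if int(cpu.get("id")) == cpu_id:
            siblings = cpu.get("siblings")
            capacity = int(cpu.get("capacity", "1"))  # 1 unless ppc64
            vcpus = len(parse_cpuset(siblings)) * capacity
            return {"cpuset": siblings, "vcpus": vcpus, "threads": vcpus}
    raise KeyError(cpu_id)


if __name__ == "__main__":
    print(guest_for_cpu(CAPS, 4))
```

For host CPU 4 this yields cpuset '4' with 4 vCPUs and threads='4',
matching the optimal topology shown earlier.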
I don't really like the fact that with this design, we effectively
have a bunch of invisible <cpu> elements whose existence is only
implied by the capacity='4' attribute.
I also don't like tailoring output of capabilities XML for one
specific use case.
IOW, I think we should explicitly represent all the CPUs in the
node capabilities, even if they are offline in the host. We could
introduce a new attribute to indicate the status of CPUs. So
instead of
<cpu id='0' socket_id='0' core_id='0' siblings='0'
capacity='4'/>
<cpu id='4' socket_id='0' core_id='4' siblings='4'
capacity='4'/>
I'd like to see
<cpu id='0' socket_id='0' core_id='0' siblings='0-3'
state="online"/>
<cpu id='1' socket_id='0' core_id='0' siblings='0-3'
state="offline"/>
<cpu id='2' socket_id='0' core_id='0' siblings='0-3'
state="offline"/>
<cpu id='3' socket_id='0' core_id='0' siblings='0-3'
state="offline"/>
<cpu id='4' socket_id='0' core_id='4' siblings='4-7'
state="online"/>
<cpu id='5' socket_id='0' core_id='4' siblings='4-7'
state="offline"/>
<cpu id='6' socket_id='0' core_id='4' siblings='4-7'
state="offline"/>
<cpu id='7' socket_id='0' core_id='4' siblings='4-7'
state="offline"/>
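
A client could then recover everything it needs by grouping on
core_id. The following is just a sketch of that idea: 'state' is the
attribute suggested above (it does not exist yet), the sample assumes
each thread is listed with its own id (0-3 and 4-7), and the function
name is made up.

```python
import xml.etree.ElementTree as ET

# 'state' is only the attribute suggested above; the distinct ids 0-7
# are an assumption made for this sketch.
CAPS = """
<cpus num='8'>
  <cpu id='0' socket_id='0' core_id='0' siblings='0-3' state='online'/>
  <cpu id='1' socket_id='0' core_id='0' siblings='0-3' state='offline'/>
  <cpu id='2' socket_id='0' core_id='0' siblings='0-3' state='offline'/>
  <cpu id='3' socket_id='0' core_id='0' siblings='0-3' state='offline'/>
  <cpu id='4' socket_id='0' core_id='4' siblings='4-7' state='online'/>
  <cpu id='5' socket_id='0' core_id='4' siblings='4-7' state='offline'/>
  <cpu id='6' socket_id='0' core_id='4' siblings='4-7' state='offline'/>
  <cpu id='7' socket_id='0' core_id='4' siblings='4-7' state='offline'/>
</cpus>
"""


def threads_per_core(xml_text):
    """Map core_id -> {'online': [ids], 'offline': [ids]}."""
    cores = {}
    for cpu in ET.fromstring(xml_text).iter("cpu"):
        entry = cores.setdefault(cpu.get("core_id"),
                                 {"online": [], "offline": []})
        # Treat a missing state attribute as online.
        entry[cpu.get("state", "online")].append(int(cpu.get("id")))
    return cores


if __name__ == "__main__":
    for core_id, t in threads_per_core(CAPS).items():
        total = len(t["online"]) + len(t["offline"])
        print("core %s: pin to %s, up to %d guest threads"
              % (core_id, t["online"], total))
```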
The domain capabilities XML, meanwhile, is where you'd express any usage
constraints for cores/threads required by QEMU.
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|