On Tue, 2016-05-31 at 16:08 +1000, David Gibson wrote:
> QEMU fails with errors like
>
> qemu-kvm: Cannot support more than 8 threads on PPC with KVM
> qemu-kvm: Cannot support more than 1 threads on PPC with TCG
>
> depending on the guest type.
Note that in a sense the two errors come about for different reasons.
On Power, to a much greater degree than x86, threads on the same core
have observably different behaviour from threads on different cores.
Because of that, there's no reasonable way for KVM to present more
guest threads-per-core than there are host threads-per-core.
The limit of 1 thread on TCG is simply because no-one's ever bothered
to implement SMT emulation in qemu.
That just means in the future we might have to expose something
other than a hardcoded '1' as the guest thread limit for TCG guests;
the interface would remain valid AFAICT.
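
To make the two limits concrete, here is a minimal sketch of the rule as
described above. The function name and the "kvm" / "tcg" strings are made
up for illustration; this is not QEMU or libvirt API, and the TCG value
simply mirrors today's hardcoded '1'.

```python
def max_guest_threads_per_core(accel: str, host_threads_per_core: int) -> int:
    """Illustrative only: upper bound for guest threads-per-core on PPC."""
    if host_threads_per_core < 1:
        raise ValueError("host_threads_per_core must be at least 1")
    if accel == "kvm":
        # KVM cannot present more guest threads per core than the host has.
        return host_threads_per_core
    if accel == "tcg":
        # No SMT emulation in TCG at the moment, hence a fixed limit of 1.
        return 1
    raise ValueError(f"unknown accelerator: {accel!r}")
```

If TCG ever grows SMT emulation, only the second branch would need to
change; callers asking for "the limit" would be unaffected.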
> physical_core_id would be 32 for all of the above - it would
> just be the very value of core_id the kernel reads from the
> hardware and reports through sysfs.
>
> The tricky bit is that, when subcores are in use, core_id and
> physical_core_id would not match. They will always match on
> architectures that lack the concept of subcores, though.
Yeah, I'm still not terribly convinced that we should even be
presenting physical core info instead of *just* logical core info. If
you care that much about physical core topology, you probably
shouldn't be running your system in subcore mode.
Me neither. We could leave it out initially, and add it later
if it turns out to be useful, I guess.
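
For reference, the core_id values discussed here are the ones the kernel
exposes under sysfs. A rough sketch of collecting them follows; the
function name and the dict it returns are my own choices, and it only
reads core_id, since physical_core_id is merely a proposal at this point.

```python
import re
from pathlib import Path


def read_core_ids(sysfs_root="/sys/devices/system/cpu"):
    """Map logical CPU number -> core_id as reported through sysfs."""
    core_ids = {}
    for cpu_dir in Path(sysfs_root).glob("cpu[0-9]*"):
        match = re.fullmatch(r"cpu(\d+)", cpu_dir.name)
        core_id_file = cpu_dir / "topology" / "core_id"
        # Skip entries without a readable topology/core_id file
        # (this may include offline threads).
        if match is None or not core_id_file.is_file():
            continue
        core_ids[int(match.group(1))] = int(core_id_file.read_text().strip())
    return core_ids
```

The sysfs_root parameter is only there so the function can be pointed at
a copy of the tree when experimenting away from the actual host.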
> > > The optimal guest topology in this case would be
> > >
> > > <vcpu placement='static' cpuset='4'>4</vcpu>
> > > <cpu>
> > > <topology sockets='1' cores='1' threads='4'/>
> > > </cpu>
> >
> > So when we pin to logical CPU #4, ppc KVM is smart enough to see that it's a
> > subcore thread, and will then make use of the offline threads in the same subcore?
> > Or does libvirt do anything fancy to facilitate this case?
>
> My understanding is that libvirt shouldn't have to do anything
> to pass the hint to kvm, but David will have the authoritative
> answer here.
Um.. I'm not totally certain. It will be one of two things:
a) you just bind the guest thread to the representative host thread
b) you bind the guest thread to a cpumask with all of the host
threads on the relevant (sub)core - including the offline host
threads
I'll try to figure out which one it is.
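
The difference between (a) and (b) can be sketched as two ways of
building the CPU set to bind to. This is purely illustrative: the
function name is invented, and it assumes the host threads of a
(sub)core are numbered contiguously, starting at a multiple of the
threads-per-(sub)core count.

```python
def binding_cpus(representative_cpu, threads_per_subcore, include_siblings):
    """Return the set of host CPUs a guest thread would be bound to."""
    if threads_per_subcore < 1:
        raise ValueError("threads_per_subcore must be at least 1")
    if not include_siblings:
        # Option (a): only the representative host thread.
        return {representative_cpu}
    # Option (b): every host thread on the same (sub)core, offline or not.
    first = representative_cpu - representative_cpu % threads_per_subcore
    return set(range(first, first + threads_per_subcore))
```

With 4 threads per subcore and host CPU 4 as the representative thread,
(a) gives just CPU 4 while (b) gives CPUs 4 through 7.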
I played with this a bit: I created a guest with
<vcpu placement='static' cpuset='0,8'>8</vcpu>
<cpu>
<topology sockets='1' cores='2' threads='4'/>
</cpu>
and then, inside the guest, I used cgroups to pin a bunch
of busy loops to specific vCPUs.
As long as all the load (8+ busy loops) was distributed
only across vCPUs 0-3, one of the host threads remained idle.
As soon as the first of the jobs was moved to vCPUs 4-7, the
other host thread immediately jumped to 100%.
This seems to indicate that QEMU / KVM are actually smart
enough to schedule guest threads on the corresponding host
threads. I think :)
On the other hand, when I changed the guest to distribute the
8 vCPUs among 2 sockets with 4 cores each instead, the second
host thread would start running as soon as I started the
second busy loop.
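
One way to read these results is to look at which guest core each vCPU
lands on. The sketch below assumes vCPUs are enumerated linearly, with
threads varying fastest, then cores, then sockets; that matches what I
observed but I have not checked it against the QEMU code, and the
function name is made up.

```python
def vcpu_position(vcpu, cores, threads):
    """Return (socket, core, thread) for a vCPU index, assuming linear layout."""
    if vcpu < 0 or cores < 1 or threads < 1:
        raise ValueError("vcpu must be >= 0, cores and threads >= 1")
    socket, rest = divmod(vcpu, cores * threads)
    core, thread = divmod(rest, threads)
    return socket, core, thread
```

With cores='2' threads='4', vCPUs 0-3 all map to guest core 0 and vCPUs
4-7 to guest core 1, which lines up with the second host thread only
waking up once a job moves to vCPUs 4-7. With 2 sockets of 4 cores each
(and therefore a single thread per core), vCPU 1 is already on a
different guest core than vCPU 0.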
> We won't know whether the proposal is actually sensible until
> David weighs in, but I'm adding Martin back in the loop so
> he can maybe give us the oVirt angle in the meantime.
TBH, I'm not really sure what you want from me. Most of the questions
seem to be libvirt design decisions which are independent of the layers
below.
I mostly need you to sanity check my proposals and point out
any incorrect / dubious claims, just like you did above :)
The design of features like this one can have pretty
significant consequences for the interactions between the
various layers, and when the choices are not straightforward
I think it's better to gather as much feedback as possible
from across the stack before moving forward with an
implementation.
--
Andrea Bolognani
Software Engineer - Virtualization Team