On Fri, 10 Jun 2016 17:52:47 +0200
Andrea Bolognani <abologna(a)redhat.com> wrote:
On Tue, 2016-05-31 at 16:08 +1000, David Gibson wrote:
> > QEMU fails with errors like
> >
> > qemu-kvm: Cannot support more than 8 threads on PPC with KVM
> > qemu-kvm: Cannot support more than 1 threads on PPC with TCG
> >
> > depending on the guest type.
>
> Note that in a sense the two errors come about for different reasons.
>
> On Power, to a much greater degree than x86, threads on the same core
> have observably different behaviour from threads on different cores.
> Because of that, there's no reasonable way for KVM to present more
> guest threads-per-core than there are host threads-per-core.
>
> The limit of 1 thread on TCG is simply because no-one's ever bothered
> to implement SMT emulation in qemu.
That just means in the future we might have to expose something
other than a hardcoded '1' as the guest thread limit for TCG guests;
the interface would remain valid AFAICT.
Right, that's kind of my point.
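The constraint behind the two error messages quoted above can be captured in a small, purely illustrative sketch (the function names are mine, not QEMU's or libvirt's): a KVM guest on Power is limited to the host's threads-per-core, while TCG is currently limited to a single thread.

```python
# Illustrative only: models the limits described above, not real
# QEMU/libvirt API. On Power, KVM cannot present more guest
# threads-per-core than the host has; TCG has no SMT emulation.
def max_guest_threads(accel: str, host_threads_per_core: int) -> int:
    """Maximum guest threads-per-core for a given accelerator."""
    if accel == "kvm":
        return host_threads_per_core
    if accel == "tcg":
        return 1  # no SMT emulation implemented in QEMU's TCG
    raise ValueError(f"unknown accelerator: {accel}")

def topology_is_valid(accel: str, guest_threads: int,
                      host_threads_per_core: int) -> bool:
    """True if the requested guest threads-per-core can be honoured."""
    return guest_threads <= max_guest_threads(accel, host_threads_per_core)
```

So a `threads='8'` guest topology is fine on an SMT8 host under KVM, but fails on an SMT4 host or under TCG, matching the errors at the top of the thread.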
> > physical_core_id would be 32 for all of the above - it would
> > just be the very value of core_id the kernel reads from the
> > hardware and reports through sysfs.
> >
> > The tricky bit is that, when subcores are in use, core_id and
> > physical_core_id would not match. They will always match on
> > architectures that lack the concept of subcores, though.
>
> Yeah, I'm still not terribly convinced that we should even be
> presenting physical core info instead of *just* logical core info. If
> you care that much about physical core topology, you probably
> shouldn't be running your system in subcore mode.
Me neither. We could leave it out initially, and add it later
if it turns out to be useful, I guess.
I think that's a good idea.
> > > > The optimal guest topology in this case would be
> > > >
> > > >   <vcpu placement='static' cpuset='4'>4</vcpu>
> > > >   <cpu>
> > > >     <topology sockets='1' cores='1' threads='4'/>
> > > >   </cpu>
> > >
> > > So when we pin to logical CPU #4, ppc KVM is smart enough to see
> > > that it's a subcore thread, and will then make use of the offline
> > > threads in the same subcore?
> > > Or does libvirt do anything fancy to facilitate this case?
> >
> > My understanding is that libvirt shouldn't have to do anything
> > to pass the hint to kvm, but David will have the authoritative
> > answer here.
>
> Um.. I'm not totally certain. It will be one of two things:
> a) you just bind the guest thread to the representative host thread
> b) you bind the guest thread to a cpumask with all of the host
> threads on the relevant (sub)core - including the offline host
> threads
>
> I'll try to figure out which one it is.
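Option (b) above can be sketched roughly as follows. This is an assumption-laden illustration, not real libvirt code: it assumes sibling threads are numbered contiguously (CPUs 0-7 on the first core, 8-15 on the next, and so on); real code would instead read /sys/devices/system/cpu/cpuN/topology/thread_siblings_list.

```python
# Hypothetical sketch of option (b): expand a representative host
# thread into a cpumask covering every thread of its (sub)core,
# including the offline ones. Assumes contiguous sibling numbering,
# which is a simplification.
def subcore_cpumask(host_cpu: int, threads_per_subcore: int) -> set:
    """All host CPU ids sharing a (sub)core with host_cpu."""
    first = host_cpu - (host_cpu % threads_per_subcore)
    return set(range(first, first + threads_per_subcore))
```

With the earlier example of `cpuset='4'` on a subcore-of-4 host, the guest thread would be bound to the mask {4, 5, 6, 7} rather than to CPU 4 alone.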
I played with this a bit: I created a guest with
<vcpu placement='static' cpuset='0,8'>8</vcpu>
<cpu>
<topology sockets='1' cores='2' threads='4'/>
</cpu>
and then, inside the guest, I used cgroups to pin a bunch
of busy loops to specific vCPUs.
As long as all the load (8+ busy loops) was distributed
only across vCPUs 0-3, one of the host threads remained idle.
As soon as the first of the jobs was moved to vCPUs 4-7, the
other host thread immediately jumped to 100%.
This seems to indicate that QEMU / KVM are actually smart
enough to schedule guest threads on the corresponding host
threads. I think :)
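For anyone wanting to reproduce something like this experiment, here is a rough stand-in that uses sched_setaffinity to pin busy loops instead of cgroups (a deliberate substitution for brevity; the observable effect on host thread utilisation should be the same). Run it inside the guest.

```python
import multiprocessing
import os
import time

# Rough stand-in for the cgroup experiment described above: pin CPU-bound
# busy loops to specific vCPUs using sched_setaffinity instead of cgroups.
def busy_loop(cpu: int, seconds: float = 1.0) -> None:
    os.sched_setaffinity(0, {cpu})  # pin the calling process to one vCPU
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass  # burn CPU

def run_pinned(cpus, seconds: float = 1.0) -> None:
    """Start one busy loop per vCPU in `cpus` and wait for them."""
    procs = [multiprocessing.Process(target=busy_loop, args=(c, seconds))
             for c in cpus]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

# e.g. run_pinned([0, 1, 2, 3]) keeps the load on guest core 0, so one
# host thread should stay idle; run_pinned([0, 4]) spreads the load
# across both guest cores, lighting up the second host thread.
```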
Uh.. yes. Guest threads on the same guest core will always be
scheduled together on a physical (sub)core. In fact, it *has* to be
done this way because recent processors contain the msgsnd / msgrcv
instructions which directly send interrupts from one thread to
another. Those instructions are not HV privileged, so they can be
invoked directly by the guest and their behaviour can't be virtualized.
This is one of the ways in which threads on the same core are
guest-observably different from threads on different cores mentioned
above.
On the other hand, when I changed the guest to distribute the
8 vCPUs among 2 sockets with 4 cores each instead, the second
host thread would start running as soon as I started the
second busy loop.
Right, and likewise a single physical (sub)core can never simultaneously run
threads from multiple guests (or guest and host). msgsnd above, as
well as some other things, would allow one guest to interfere with
another, breaking isolation.
This is the reason why having multiple threads active in the host
while also running guests would be prohibitively difficult.
> > We won't know whether the proposal is actually sensible until
> > David weighs in, but I'm adding Martin back in the loop so he
> > can maybe give us the oVirt angle in the meantime.
>
> TBH, I'm not really sure what you want from me. Most of the questions
> seem to be libvirt design decisions which are independent of the layers
> below.
I mostly need you to sanity check my proposals and point out
any incorrect / dubious claims, just like you did above :)
The design of features like this one can have pretty
significant consequences for the interactions between the
various layers, and when the choices are not straightforward
I think it's better to gather as much feedback as possible
from across the stack before moving forward with an
implementation.
--
Andrea Bolognani
Software Engineer - Virtualization Team
--
David Gibson <dgibson(a)redhat.com>
Senior Software Engineer, Virtualization, Red Hat