On Wed, Jul 12, 2017 at 05:09:37PM +0530, Shivaprasad G Bhat wrote:
On 07/12/2017 04:25 PM, Andrea Bolognani wrote:
> [libvir-list added to the loop]
>
> On Tue, 2017-07-04 at 10:47 +0200, Greg Kurz wrote:
> > On Tue, 4 Jul 2017 17:29:01 +1000 David Gibson
<david(a)gibson.dropbear.id.au> wrote:
> > > On Mon, Jul 03, 2017 at 06:48:25PM +0200, Greg Kurz wrote:
> > > > The sPAPR machine always create a default PHB during initialization,
even
> > > > if -nodefaults was passed on the command line. This forces the user
to
> > > > rely on -global if she wants to set properties of the default PHB,
such
> > > > as numa_node.
> > > > This patch introduces a new machine create-default-phb property to
control
> > > > whether the default PHB must be created or not. It defaults to on in
order
> > > > to preserve old setups (which is also the motivation to not alter
the
> > > > current behavior of -nodefaults).
> > > > If create-default-phb is set to off, the default PHB isn't
created, nor
> > > > any other device usually created with it. It is mandatory to provide
> > > > a PHB on the command line to be able to use PCI devices (otherwise
QEMU
> > > > won't start). For example, the following creates a PHB with the
same
> > > > mappings as the default PHB and also sets the NUMA affinity:
> > > > -machine type=pseries,create-default-phb=off \
> > > > -numa node,nodeid=0 -device
spapr-pci-host-bridge,index=0,numa_node=0
> > > So, I agree that the distinction between default devices that are
> > > disabled with -nodefaults and default devices that aren't is a big
> > > mess in qemu configuration. But on the other hand this only addresses
> > > one tiny aspect of that, and in the meantime means we will silently
> > > ignore some other configuration options in some conditions.
> > > So, what's the immediate benefit / use case for this?
Setting numa_node for emulated devices is the benefit for now. On x86, I
figured there is
no way to set the numa_node for the root controller and the emulated devices
sitting there
all have numa_node set to -1. Only the devices on the pxb can have a
sensible value specified.
Given that we have the equivalent restriction on x86, I'm not seeing
removing it on Power as a priority.
Does it mean, the emulated devices/drivers don't care about the
numa_node
they are on?
Probably not. If nothing else, I expect the slowdown of bad NUMA
affinity is probably not significant compared to the general slowness
of an emulated device. Since the device is also in software, the
standard NUMA stuff on the host may already migrate the relevant
things to match the (host) NUMA node where the guest code is running,
so it may be strictly irrelevant in that sense as well.
Would it be fine on PPC to disallow setting the NUMA node for the
default
PHB because that is where
all the emulated devices sit ?
> > With the current code base, the only way to set properties of the default
> > PHB, is to pass -global spapr-pci-host-bridge.prop=value for each property.
> > The immediate benefit of this patch is to unify the way libvirt passes
> > PHB description to the command line:
> > ie, do:
> > -machine type=pseries,create-default-phb=off \
> > -device spapr-pci-host-bridge,prop1=a,prop2=b,prop3=c \
> > -device spapr-pci-host-bridge,prop1=d,prop2=e,prop3=f
> > instead of:
> > -machine type=pseries \
> > -global spapr-pci-host-bridge.prop1=a \
> > -global spapr-pci-host-bridge.prop2=b \
> > -global spapr-pci-host-bridge.prop3=c \
> > -device spapr-pci-host-bridge,prop1=d,prop2=e,prop3=f
> So, I'm thinking about this mostly in terms of NUMA nodes
> because that's the use case I'm aware of.
>
> The problem with using -global is not that it requires using
> a different syntax to set properties for the default PHB,
> but rather that such properties are then inherited by all
> other PHBs unless explicitly overridden. Not creating the
> default PHB at all would solve the issue.
>
> On the other hand, libvirt would then need to either
>
> 1) only allow setting NUMA nodes for PHBs if QEMU supports
> the new option, leaving QEMU < 2.10 users behind; or
>
> 2) implement handling for both the new and old behavior.
>
> I'm not sure we could get away with 1), and going for 2)
> means more work both for QEMU and libvirt developers for
> very little actual gain, so I'd be inclined to scrap this
> and just build the libvirt glue on top of the existing
> interface.
>
> That is, of course, unless
>
> 1) having a random selection of PHBs not assigned to any
> NUMA node is a sensible use case. This is something
> we just can't do reliably with the current interface:
> we can decide to set the NUMA node only for say, PHBs
> 1 and 3 leaving 0 and 2 alone, but once we set it for
> the default PHB we *have* to set it for all remaining
> ones as well. libvirt will by default assign emulated
> devices to the default PHB, so I would rather expect
> users to leave that one alone and set a NUMA node for
> all other PHBs; or
>
> 2) there are other properties outside of numa_node we
> might want to deal with; or
>
> 3) it turns out it's okay to require a recent QEMU :)
>
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson