On 07/12/2017 04:25 PM, Andrea Bolognani wrote:
[libvir-list added to the loop]
On Tue, 2017-07-04 at 10:47 +0200, Greg Kurz wrote:
> On Tue, 4 Jul 2017 17:29:01 +1000 David Gibson <david(a)gibson.dropbear.id.au>
wrote:
>> On Mon, Jul 03, 2017 at 06:48:25PM +0200, Greg Kurz wrote:
>>>
>>> The sPAPR machine always create a default PHB during initialization, even
>>> if -nodefaults was passed on the command line. This forces the user to
>>> rely on -global if she wants to set properties of the default PHB, such
>>> as numa_node.
>>>
>>> This patch introduces a new machine create-default-phb property to control
>>> whether the default PHB must be created or not. It defaults to on in order
>>> to preserve old setups (which is also the motivation to not alter the
>>> current behavior of -nodefaults).
>>>
>>> If create-default-phb is set to off, the default PHB isn't created, nor
>>> any other device usually created with it. It is mandatory to provide
>>> a PHB on the command line to be able to use PCI devices (otherwise QEMU
>>> won't start). For example, the following creates a PHB with the same
>>> mappings as the default PHB and also sets the NUMA affinity:
>>>
>>> -machine type=pseries,create-default-phb=off \
>>> -numa node,nodeid=0 -device spapr-pci-host-bridge,index=0,numa_node=0
>>
>> So, I agree that the distinction between default devices that are
>> disabled with -nodefaults and default devices that aren't is a big
>> mess in qemu configuration. But on the other hand this only addresses
>> one tiny aspect of that, and in the meantime means we will silently
>> ignore some other configuration options in some conditions.
>>
>> So, what's the immediate benefit / use case for this?
Setting numa_node for emulated devices is the benefit for now. On x86, I
figured there is
no way to set the numa_node for the root controller and the emulated
devices sitting there
all have numa_node set to -1. Only the devices on the pxb can have a
sensible value specified.
Does it mean, the emulated devices/drivers don't care about the
numa_node they are on?
Would it be fine on PPC to disallow setting the NUMA node for the
default PHB because that is where
all the emulated devices sit ?
>
> With the current code base, the only way to set properties of the default
> PHB, is to pass -global spapr-pci-host-bridge.prop=value for each property.
> The immediate benefit of this patch is to unify the way libvirt passes
> PHB description to the command line:
>
> ie, do:
>
> -machine type=pseries,create-default-phb=off \
> -device spapr-pci-host-bridge,prop1=a,prop2=b,prop3=c \
> -device spapr-pci-host-bridge,prop1=d,prop2=e,prop3=f
>
> instead of:
>
> -machine type=pseries \
> -global spapr-pci-host-bridge.prop1=a \
> -global spapr-pci-host-bridge.prop2=b \
> -global spapr-pci-host-bridge.prop3=c \
> -device spapr-pci-host-bridge,prop1=d,prop2=e,prop3=f
So, I'm thinking about this mostly in terms of NUMA nodes
because that's the use case I'm aware of.
The problem with using -global is not that it requires using
a different syntax to set properties for the default PHB,
but rather that such properties are then inherited by all
other PHBs unless explicitly overridden. Not creating the
default PHB at all would solve the issue.
On the other hand, libvirt would then need to either
1) only allow setting NUMA nodes for PHBs if QEMU supports
the new option, leaving QEMU < 2.10 users behind; or
2) implement handling for both the new and old behavior.
I'm not sure we could get away with 1), and going for 2)
means more work both for QEMU and libvirt developers for
very little actual gain, so I'd be inclined to scrap this
and just build the libvirt glue on top of the existing
interface.
That is, of course, unless
1) having a random selection of PHBs not assigned to any
NUMA node is a sensible use case. This is something
we just can't do reliably with the current interface:
we can decide to set the NUMA node only for say, PHBs
1 and 3 leaving 0 and 2 alone, but once we set it for
the default PHB we *have* to set it for all remaining
ones as well. libvirt will by default assign emulated
devices to the default PHB, so I would rather expect
users to leave that one alone and set a NUMA node for
all other PHBs; or
2) there are other properties outside of numa_node we
might want to deal with; or
3) it turns out it's okay to require a recent QEMU :)
--
Andrea Bolognani / Red Hat / Virtualization