On Tue, 21 Jan 2014 11:10:30 +0100
Andreas Färber <afaerber(a)suse.de> wrote:
On 21.01.2014 10:51, Chen Fan wrote:
> On Tue, 2014-01-21 at 10:31 +0100, Igor Mammedov wrote:
>> On Tue, 21 Jan 2014 15:12:45 +0800
>> Chen Fan <chen.fan.fnst(a)cn.fujitsu.com> wrote:
>>> On Mon, 2014-01-20 at 13:29 +0100, Igor Mammedov wrote:
>>>> On Fri, 17 Jan 2014 17:13:55 -0200
>>>> Eduardo Habkost <ehabkost(a)redhat.com> wrote:
>>>>> On Wed, Jan 15, 2014 at 03:37:04PM +0100, Igor Mammedov wrote:
>>>>>> I recall there were objections to it since APIC ID contains topology
>>>>>> information and it's not trivial for user to get it right.
>>>>>> The last idea that was discussed to fix it was not expose APIC ID to
>>>>>> user but rather introduce QOM hierarchy like:
>>>>>> /machine/node/N/socket/X/core/Y/thread/Z
>>>>>> and use it in user interface as a means to specify an arbitrary CPU
>>>>>> and let QEMU calculate APIC ID based on this path.
>>>>>>
>>>>>> But nobody took on implementing it yet.
>>>>>
>>>>> We're taking so long to get a decent interface implemented, that part of
>>>>> me is considering exposing the APIC ID directly like suggested before,
>>>>> and requiring libvirt to calculate topology-aware APIC IDs[1] to
>>>>> properly implement CPU hotplug (and possibly for other tasks).
>>>> If you are speaking about
>>>> 'qemu will core dump with "-smp 254, sockets=2, cores=3, threads=2"'
>>>> http://patchwork.ozlabs.org/patch/301272/
>>>> bug then it's limitation of ACPI implementation,
>>>> I'm going to refactor it to use full APIC ids instead of using bitmap,
>>>> so that we won't ever run into issue regardless of cpu supported CPU count.
>>>>
>>>>>
>>>>> Another part of me is hoping that the libvirt developers ask us to
>>>>> please not do that, so I can use it as argument against exposing the
>>>>> APIC IDs directly the next time we discuss this. :)
>>>>
>>>> why not try your /machine/node/N/socket/X/core/Y/thread/Z idea first.
>>>> It will benefit not only cpu hotplug but also '-numa' and topology
>>>> description in general.
>>>>
>>> have there been any plan/model of the idea? Need to add a new option to
>>> qemu command?
>> I suppose we can start with internal default implementation first.
>>
>> one way could be
>> 1. let machine prebuild empty QOM tree /machine/node/N/socket/X/core/Y/thread/Z
>> 2. add node, socket, core, thread properties to CPU and link CPU into respective
>> link created by #1
>>
> Thanks, I hope I can take some time to make some patches to implement
> it.
Please give us a few hours to reply. :)
/machine/node seems too broad a term to me.
You can't prebuild the full tree, you can only prepare the nodes.
core[Y]/thread[Z] was previously discussed as syntax.
The important part to decide on will be what is going to be child<> and
what link<>. Has anyone played with the Intel Quark platform for
instance? (Galileo board or upcoming Edison card) On a regular
mainboard, we would have socket[X] as a link<x86_64-cpu>, which might
point to a child<cpu> /machine/memory-node[W]/cpu[X]. But if we do so we
can't reassign it to another memory node - acceptable? With Quark (or
Qseven modules etc.) there would be a container object rather than the
/machine itself that has a child<i386-cpu> instead of a link<i386-cpu>.
I guess the memory nodes could still be on the /machine though.
The other point of discussion between Anthony and me was whether core[Y]
should be a link<> or child<>, same for thread. I believe a child<> is
better as it enforces that unrealizing the CPU will unrealize all its
cores and all its threads in the future.
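
To make the child<> versus link<> distinction concrete, here is a toy sketch in Python. It is not QEMU's QOM API: the Obj class and its method names are made up for illustration, and the only assumption it encodes is the one argued above, namely that a child<> is owned by its parent while a link<> is just a reference.

```python
class Obj:
    """Toy stand-in for a QOM object (hypothetical, not QEMU's API)."""

    def __init__(self, name):
        self.name = name
        self.realized = True
        self.children = {}  # child<> properties: owned by this object
        self.links = {}     # link<> properties: references without ownership

    def add_child(self, prop, obj):
        self.children[prop] = obj
        return obj

    def add_link(self, prop, obj):
        self.links[prop] = obj

    def unrealize(self):
        # Ownership propagates teardown: children go down with the parent.
        for child in self.children.values():
            child.unrealize()
        # Linked objects are deliberately left untouched.
        self.realized = False


# A CPU owning its cores and threads as child<> properties.
cpu = Obj("cpu[0]")
core = cpu.add_child("core[0]", Obj("core[0]"))
thread = core.add_child("thread[0]", Obj("thread[0]"))

# A socket that merely points at the CPU through a link<>.
socket = Obj("socket[0]")
socket.add_link("cpu", cpu)

socket.unrealize()
cpu_survives_socket = cpu.realized  # True: a link<> does not propagate

cpu.unrealize()  # takes core[0] and thread[0] down with it
```

With child<> for cores and threads, tearing down the CPU cannot leave a dangling core or thread behind, which is the enforcement mentioned above.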
In terms of parent/child relationships, I guess we are not going to come up
with a uniform design, since boards can differ very much in that respect.
I was rather thinking in terms of providing a stable/uniform CLI/QMP NUMA
interface using the QOM tree.
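
As a rough illustration of the idea quoted earlier (let the user name a CPU by path and let QEMU derive the APIC ID), here is a minimal Python sketch. The assumptions are mine: each topology level gets a bit field just wide enough to index it, laid out as socket | core | thread, and the node component does not enter the APIC ID at all. The function names are hypothetical and this is not QEMU code.

```python
import re

PATH_RE = re.compile(
    r"/machine/node/(\d+)/socket/(\d+)/core/(\d+)/thread/(\d+)"
)


def field_width(count):
    """Bits needed to index `count` items (0 when count is 1)."""
    return (count - 1).bit_length()


def apic_id_from_path(path, cores, threads):
    """Derive an APIC ID from a topology path (assumed layout, see above)."""
    m = PATH_RE.fullmatch(path)
    if m is None:
        raise ValueError(f"not a CPU path: {path}")
    _node, socket, core, thread = (int(g) for g in m.groups())
    if core >= cores or thread >= threads:
        raise ValueError(f"index out of range in: {path}")
    thread_bits = field_width(threads)
    core_bits = field_width(cores)
    return (socket << (core_bits + thread_bits)) | (core << thread_bits) | thread


# cores=3 needs 2 bits and threads=2 needs 1 bit, so socket 1 starts at 8.
first = apic_id_from_path("/machine/node/0/socket/0/core/0/thread/0", 3, 2)
last = apic_id_from_path("/machine/node/0/socket/1/core/2/thread/1", 3, 2)
```

Under these assumptions the 12 CPUs of "sockets=2, cores=3, threads=2" end up with IDs that are not contiguous (the last one is 13, not 11), which is why it is not trivial for a user to get APIC IDs right by hand.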
At startup we potentially have CPU topology information and a set of NUMA
nodes, so we could pre-build containers up to the point where CPU threads
are attached, pre-create empty link<CPU> properties there, and fill them
later with the actual CPU threads.
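
A minimal Python sketch of that pre-build step, just to show the shape of it. The paths follow the /machine/node/N/socket/X/core/Y/thread/Z form discussed in this thread; the helper names and the per-node socket count are assumptions for illustration, and a plain dict stands in for the QOM containers and links.

```python
from itertools import product


def prebuild_slots(nodes, sockets_per_node, cores, threads):
    """Create one empty slot (None) per possible CPU thread path."""
    slots = {}
    for n, s, c, t in product(
        range(nodes), range(sockets_per_node), range(cores), range(threads)
    ):
        slots[f"/machine/node/{n}/socket/{s}/core/{c}/thread/{t}"] = None
    return slots


def plug_cpu(slots, path, cpu):
    """Fill a pre-created slot; reject unknown or occupied paths."""
    if path not in slots:
        raise KeyError(f"no such slot: {path}")
    if slots[path] is not None:
        raise ValueError(f"slot already occupied: {path}")
    slots[path] = cpu


slots = prebuild_slots(nodes=1, sockets_per_node=2, cores=3, threads=2)
plug_cpu(slots, "/machine/node/0/socket/1/core/2/thread/1", "cpu-object")
free = [path for path, cpu in slots.items() if cpu is None]
```

The list of still-empty slots is what a management tool could query to learn which CPUs may be hot-added, without knowing anything about APIC IDs.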
More issues may pop up when thinking about it longer than a few minutes.
But yes, we need to start investigating this, and so far I had other
priorities like getting the CPUState mess I created cleaned up.
Regards,
Andreas