On 07/24/14 17:03, Peter Krempa wrote:
On 07/24/14 16:40, Daniel P. Berrange wrote:
> On Thu, Jul 24, 2014 at 04:30:43PM +0200, Peter Krempa wrote:
>> On 07/24/14 16:21, Daniel P. Berrange wrote:
>>> On Thu, Jul 24, 2014 at 02:20:22PM +0200, Peter Krempa wrote:
>
>>> So from that POV, I'd say that when we initially configure the
>>> NUMA / huge page information for a guest at boot time, we should
>>> be doing that wrt the 'maxMemory' size, instead of the current
>>> 'memory' size. ie the actual NUMA topology is all set up upfront
>>> even though the DIMMs are not present for some of this topology.
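(If I read that right: for a hypothetical guest with 8 GiB of <memory>
but 16 GiB of maxMemory split over two nodes, the existing <numa> cells
would be sized against the 16 GiB maximum, something like

  <cpu>
    <numa>
      <cell cpus='0-3' memory='8388608'/>  <!-- 8 GiB in KiB, half of maxMemory -->
      <cell cpus='4-7' memory='8388608'/>
    </numa>
  </cpu>

even though only half of that would be backed by present DIMMs at boot;
maxMemory itself isn't an existing element yet.)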
>>>
>>>> "address" determines the address in the guest's memory
space where the
>>>> memory will be mapped. This is optional and not recommended being set by
>>>> the user (except for special cases).
>>>>
>>>> For expansion the model="pflash" device may be added.
>>>>
>>>> For migration the target VM needs to be started with the hotplugged
>>>> modules already specified on the command line, which is in line with
>>>> how we treat devices currently.
>>>>
>>>> My suggestion above contrasts with the approach Michal and Martin took
>>>> when adding the numa and hugepage backing capabilities as they describe
>>>> a node while this describes the memory device beneath it. I think those
>>>> two approaches can co-exist whilst being mutually exclusive. Simply put,
>>>> when using memory hotplug, the memory will need to be specified using the
>>>> memory modules. Non-hotplug guests could use the approach defined
>>>> originally.
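
For illustration, a per-module description along these lines is what I
have in mind (element and attribute names below are just placeholders,
nothing here is a settled schema):

  <devices>
    <memory model='dimm'>          <!-- 'pflash' could be added later -->
      <size unit='MiB'>512</size>  <!-- size of this module -->
      <node>1</node>               <!-- guest NUMA node it plugs into -->
      <!-- optional <address>, normally left for libvirt to assign -->
    </memory>
  </devices>
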
>>>
>>> I don't think it is viable to have two different approaches for configuring
>>> NUMA / huge page information. Apps should not have to change the way they
>>> configure NUMA/hugepages when they decide they want to take advantage of
>>> DIMM hotplug.
>>
>> Well, the two approaches are orthogonal in the information they store.
>> The existing approach stores the memory topology from the point of view
>> of the numa node whereas the <device> based approach from the point of
>> the memory module.
>
> Sure, they are clearly designed from different POV, but I'm saying that
> from an application POV it is very unpleasant to have 2 different ways
> to configure the same concept in the XML. So I really don't want us to
> go down that route unless there is absolutely no other option to achieve
> an acceptable level of functionality. If that really were the case, then
> I would strongly consider reverting everything related to NUMA that we
> have just done during this dev cycle and not releasing it as is.
>
>> The difference is that the existing approach currently wouldn't allow
>> splitting a numa node into more memory devices to allow
>> plugging/unplugging them.
>
> There's no reason why we have to assume 1 memory slot per guest or
> per node when booting the guest. If the user wants the ability to
> unplug, they could set their XML config so the guest has arbitrary
> slot granularity. eg if I have a guest
>
> - memory == 8 GB
> - max-memory == 16 GB
> - NUMA nodes == 4
>
> Then we could allow them to specify 32 memory slots each 512 MB
> in size. This would allow them to plug/unplug memory from NUMA
> nodes in 512 MB granularity.
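
(To spell that example out: 32 slots x 512 MB = 16 GB of max-memory;
the initial 8 GB would populate 16 of those slots, 4 per NUMA node,
leaving 16 slots free for later hotplug. Purely as a sketch, with a
hypothetical maxMemory/slots element alongside the existing <memory>
element, that could be written as

  <maxMemory slots='32' unit='GiB'>16</maxMemory>  <!-- hypothetical -->
  <memory unit='GiB'>8</memory>                    <!-- existing element -->

with the individual 512 MB modules then expressed per slot.)
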
In real hardware you can still plug in modules of different sizes
(eg 1GiB + 2GiB) ...
Well, while this makes it pretty close to real hardware, the emulated
hardware doesn't have a problem with plugging "dimms" of weird
(non-power-of-2) sizes, and we are losing flexibility due to that.
Hmm, now that the rest of the Hugepage stuff has been pushed and the
release is rather soon, what approach should I take? I'd rather avoid
crippling the interface for memory hotplug and having to add separate
APIs and other stuff, and mostly I'd like to avoid having to re-do it
after consumers of libvirt deem it to be inflexible.
Peter