-----Original Message-----
From: Martin Kletzander <mkletzan(a)redhat.com>
Sent: Thursday, March 25, 2021 10:28 PM
To: Daniel P. Berrangé <berrange(a)redhat.com>
Cc: Zhong, Luyao <luyao.zhong(a)intel.com>; libvir-list(a)redhat.com
Subject: Re: [libvirt][PATCH v4 0/3] introduce 'restrictive' mode in numatune
On Thu, Mar 25, 2021 at 02:14:47PM +0000, Daniel P. Berrangé wrote:
>On Thu, Mar 25, 2021 at 03:10:56PM +0100, Martin Kletzander wrote:
>> On Thu, Mar 25, 2021 at 09:11:02AM +0000, Zhong, Luyao wrote:
>> >
>> >
>> > > -----Original Message-----
>> > > From: Martin Kletzander <mkletzan(a)redhat.com>
>> > > Sent: Thursday, March 25, 2021 4:46 AM
>> > > To: Daniel P. Berrangé <berrange(a)redhat.com>
>> > > Cc: Zhong, Luyao <luyao.zhong(a)intel.com>;
libvir-list(a)redhat.com
>> > > Subject: Re: [libvirt][PATCH v4 0/3] introduce 'restrictive'
mode
>> > > in numatune
>> > >
>> > > On Tue, Mar 23, 2021 at 09:48:02AM +0000, Daniel P. Berrangé wrote:
>> > > >On Tue, Mar 23, 2021 at 10:59:02AM +0800, Luyao Zhong wrote:
>> > > >> Before this patch set, numatune only has three memory modes:
>> > > >> static, interleave and prefered. These memory policies are
>> > > >> ultimately set by mbind() system call.
>> > > >>
>> > > >> Memory policy could be 'hard coded' into the kernel,
but none
>> > > >> of above policies fit our requirment under this case.
mbind()
>> > > >> support default memory policy, but it requires a NULL
>> > > >> nodemask. So obviously setting allowed memory nodes is
cgroups'
mission under this case.
>> > > >> So we introduce a new option for mode in numatune named
'restrictive'.
>> > > >>
>> > > >> <numatune>
>> > > >> <memory mode="restrictive"
nodeset="1-4,^3"/>
>> > > >> <memnode cellid="0"
mode="restrictive" nodeset="1"/>
>> > > >> <memnode cellid="2"
mode="restrictive" nodeset="2"/>
>> > > >> </numatune>
>> > > >
>> > > >'restrictive' is rather a wierd name and doesn't
really tell me
>> > > >what the memory policy is going to be. As far as I can tell from
>> > > >the patches, it seems this causes us to not set any memory
>> > > >alllocation policy at all. IOW, we're using some undefined
host default
policy.
>> > > >
>> > > >Given this I think we should be calling it either "none"
or "default"
>> > > >
>> > >
>> > > I was against "default" because having such option possible,
but
>> > > the actual default being different sounds stupid. Similarly
>> > > "none" sounds like no restrictions are applied or that it is
the
>> > > same as if nothing was specified. It is funny to imagine the
>> > > situation when I am explaining to someone how to achieve this
solution:
>> > >
>> > > "The default is 'strict', you need to explicitly set
it to 'default'."
>> > >
>> > > or
>> > >
>> > > "What setting did you use?"
>> > > "None"
>> > > "As in no mode or in mode='none'?"
>> > >
>> > > As I said before, please come up with any name, but not these
>> > > that are IMHO actually more confusing.
>> > >
>> >
>> > Hi Daniel and Martin, thanks for your reply, just as Martin said
>> > current default mode is "strict", so "default" was
deprecated at
>> > the beginning when I proposed this change. And actually we have
>> > cgroups restricting the memory resource so could we call this a
>> > "none" mode? I still don't have a better name. ☹
>> >
>>
>> Me neither as figuring out the names when our names do not precisely
>> map to anything else (since we are using multiple solutions to get as
>> close to the desired result as possible) is difficult because there
>> is no similar pre-existing setting. And using anything like
"cgroups-only"
>> would limit us in the future, probably.
>
>What I'm still really missing in this series is a clear statement of
>what the problem with the current modes is, and what this new mode
>provides to solve it. The documentation for the new XML attribute is
>not clear on this and neither are the commit messages. There's a
>pointer to an enourmous mailing list thread, but reading through
>50 messages is a not a viable way to learn the answer.
>
>I'm not even certain that we should be introducing a new mode value at
>all, as opposed to a separate attribute.
>
Yes, Luyao, could you summarize the reason for the new mode? I think that the
difference in behaviour between using cgroups and memory binding as opposed
to just using cgroups should be enough for others to be able to figure out when
to use this mode and when not.
Sure.
Let me give a concrete use case first. There is a new feature in kernel but not merged
yet,
we call it memory tiering. (
https://lwn.net/Articles/802544/). If memory tiering is
enabled on host,
DRAM is top tier memory, and PMEM(persistent memory) is second tier memory, PMEM is shown
as
numa node without cpu. Pages can be migrated between DRAM node and PMEM node based on
DRAM
pressure and how cold/hot they are. *this memory policy* is implemented in kernel. So we
need
a default mode here, but from libvirt's perspective, the "defaut" mode is
"strict", it's not MPOL_DEFAULT
(
https://man7.org/linux/man-pages/man2/mbind.2.html) defined in kernel. Besides, to make
memory tiering
works well, cgroups setting is necessary, since it restricts that the pages can only be
migrated between the
DRAM and PMEM nodes that we specified (NUMA affinity support).
Except for upper use case, we might have some scenarios that only requires cgroups
restriction.
That's why "restrictive" mode is proposed.
In a word, if a user requires default mode(MPOL_DEFAULT) and require cgroups to restrict
memory allocation,
"restrictive" mode will be useful.
BR,
Luyao