Before this patch set, numatune only has three memory modes:
static, interleave and prefered. These memory policies are
ultimately set by mbind() system call.
Memory policy could be 'hard coded' into the kernel, but none of
above policies fit our requirment under this case. mbind() support
default memory policy, but it requires a NULL nodemask. So obviously
setting allowed memory nodes is cgroups' mission under this case.
So we introduce a new option for mode in numatune named 'restrictive'.
<numatune>
<memory mode="restrictive" nodeset="1-4,^3"/>
<memnode cellid="0" mode="restrictive"
nodeset="1"/>
<memnode cellid="2" mode="restrictive"
nodeset="2"/>
</numatune>
The config above means we only use cgroups to restrict the allowed
memory nodes and not setting any specific memory policies explicitly.
For this new "restrictive" mode, there is a concrete use case about a
new feature in kernel but not merged yet, we call it memory tiering.
(
https://lwn.net/Articles/802544/).
If memory tiering is enabled on host, DRAM is top tier memory, and
PMEM(persistent memory) is second tier memory, PMEM is shown as numa node
without cpu. Pages can be migrated between DRAM node and PMEM node based on
DRAM pressure and how cold/hot they are. *this memory policy* is implemented
in kernel. So we need a default mode here, but from libvirt's perspective,
the "defaut" mode is "strict", it's not MPOL_DEFAULT
(
https://man7.org/linux/man-pages/man2/mbind.2.html) defined in kernel.
And to make memory tiering works well, cgroups setting is necessary, since
it restricts that the pages can only be migrated between the DRAM and PMEM
nodes that we specified (NUMA affinity support).
Just using cgroups with multiple nodes in the nodeset makes kernel decide
on which node (out of those in the restricted set) to allocate on, but specifying
"strict" basically allocates it sequentially (on the first one until it is
full,
then on the next one and so on).
In a word, if a user requires default mode(MPOL_DEFAULT), that means they want
kernel decide the memory allocation and also want the cgroups to restrict memory
nodes, "restrictive" mode will be useful.
BR,
Luyao
Luyao Zhong (3):
docs: add docs for 'restrictive' option for mode in numatune
schema: add 'restrictive' config option for mode in numatune
qemu: add parser and formatter for 'restrictive' mode in numatune
docs/formatdomain.rst | 7 +++-
docs/schemas/domaincommon.rng | 2 +
include/libvirt/libvirt-domain.h | 1 +
src/conf/numa_conf.c | 9 ++++
src/qemu/qemu_command.c | 6 ++-
src/qemu/qemu_process.c | 27 ++++++++++++
src/util/virnuma.c | 3 ++
.../numatune-memnode-invalid-mode.err | 1 +
.../numatune-memnode-invalid-mode.xml | 33 +++++++++++++++
...emnode-restrictive-mode.x86_64-latest.args | 38 +++++++++++++++++
.../numatune-memnode-restrictive-mode.xml | 33 +++++++++++++++
tests/qemuxml2argvtest.c | 2 +
...memnode-restrictive-mode.x86_64-latest.xml | 41 +++++++++++++++++++
tests/qemuxml2xmltest.c | 1 +
14 files changed, 201 insertions(+), 3 deletions(-)
create mode 100644 tests/qemuxml2argvdata/numatune-memnode-invalid-mode.err
create mode 100644 tests/qemuxml2argvdata/numatune-memnode-invalid-mode.xml
create mode 100644
tests/qemuxml2argvdata/numatune-memnode-restrictive-mode.x86_64-latest.args
create mode 100644 tests/qemuxml2argvdata/numatune-memnode-restrictive-mode.xml
create mode 100644
tests/qemuxml2xmloutdata/numatune-memnode-restrictive-mode.x86_64-latest.xml
--
2.25.4