[PATCH 0/2] Small fixes for restrictive numatune mode

See individual commit messages for explanations (duh!) Martin Kletzander (2): qemu, ch: Move threads to cgroup dir before changing parameters docs: Clarify restrictive numatune mode docs/formatdomain.rst | 11 ++++++++++- src/ch/ch_process.c | 12 ++++++------ src/qemu/qemu_process.c | 10 +++++----- 3 files changed, 21 insertions(+), 12 deletions(-) -- 2.40.0

With cgroupv2 this has better effect on the resource allocation. An excerpt from Documentation/admin-guide/cgroup-v2.rst explains is this way: Migrating a process across cgroups is a relatively expensive operation and stateful resources such as memory are not moved together with the process. This is an explicit design decision as there often exist inherent trade-offs between migration and various hot paths in terms of synchronization cost. [...] Setting a non-empty value to "cpuset.mems" causes memory of tasks within the cgroup to be migrated to the designated nodes if they are currently using memory outside of the designated nodes. Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- src/ch/ch_process.c | 12 ++++++------ src/qemu/qemu_process.c | 10 +++++----- 2 files changed, 11 insertions(+), 11 deletions(-) diff --git a/src/ch/ch_process.c b/src/ch/ch_process.c index c145a7849647..44c5b0611ef5 100644 --- a/src/ch/ch_process.c +++ b/src/ch/ch_process.c @@ -262,6 +262,12 @@ virCHProcessSetupPid(virDomainObj *vm, if (virCgroupNewThread(priv->cgroup, nameval, id, true, &cgroup) < 0) goto cleanup; + /* Move the thread to the sub dir before changing the settings so that + * all take effect even with cgroupv2. */ + VIR_INFO("Adding pid %d to cgroup", pid); + if (virCgroupAddThread(cgroup, pid) < 0) + goto cleanup; + if (virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_CPUSET)) { if (use_cpumask && virDomainCgroupSetupCpusetCpus(cgroup, use_cpumask) < 0) @@ -274,12 +280,6 @@ virCHProcessSetupPid(virDomainObj *vm, if (virDomainCgroupSetupVcpuBW(cgroup, period, quota) < 0) goto cleanup; - - /* Move the thread to the sub dir */ - VIR_INFO("Adding pid %d to cgroup", pid); - if (virCgroupAddThread(cgroup, pid) < 0) - goto cleanup; - } if (!affinity_cpumask) diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index b9e9a7d320c4..8baa88287567 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -2630,6 +2630,11 @@ qemuProcessSetupPid(virDomainObj *vm, if (virCgroupNewThread(priv->cgroup, nameval, id, true, &cgroup) < 0) goto cleanup; + /* Move the thread to the sub dir before changing the settings so that + * all take effect even with cgroupv2. */ + if (virCgroupAddThread(cgroup, pid) < 0) + goto cleanup; + if (virCgroupHasController(priv->cgroup, VIR_CGROUP_CONTROLLER_CPUSET)) { if (use_cpumask && virDomainCgroupSetupCpusetCpus(cgroup, use_cpumask) < 0) @@ -2642,11 +2647,6 @@ qemuProcessSetupPid(virDomainObj *vm, if (virDomainCgroupSetupVcpuBW(cgroup, period, quota) < 0) goto cleanup; - - /* Move the thread to the sub dir */ - if (virCgroupAddThread(cgroup, pid) < 0) - goto cleanup; - } if (!affinity_cpumask) -- 2.40.0

Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2185184 Signed-off-by: Martin Kletzander <mkletzan@redhat.com> --- docs/formatdomain.rst | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/docs/formatdomain.rst b/docs/formatdomain.rst index c31c2fe35c76..cb3fe3dc1c33 100644 --- a/docs/formatdomain.rst +++ b/docs/formatdomain.rst @@ -1207,7 +1207,10 @@ NUMA Node Tuning 'restrictive', defaults to 'strict'. The value 'restrictive' specifies using system default policy and only cgroups is used to restrict the memory nodes, and it requires setting mode to 'restrictive' in ``memnode`` - elements. Attribute ``nodeset`` specifies the NUMA nodes, using the same + elements (see quirk below). This exists solely for the purpose of being able + to request movement of such memory for a running domain using ``virsh + numatune`` or ``virDomainSetNumaParameters` and is not guaranteed to happen. + Attribute ``nodeset`` specifies the NUMA nodes, using the same syntax as attribute ``cpuset`` of element ``vcpu``. Attribute ``placement`` ( :since:`since 0.9.12` ) can be used to indicate the memory placement mode for domain process, its value can be either "static" or "auto", defaults to @@ -1227,6 +1230,12 @@ NUMA Node Tuning addresses guest NUMA node for which the settings are applied. Attributes ``mode`` and ``nodeset`` have the same meaning and syntax as in ``memory`` element. This setting is not compatible with automatic placement. + Note that for ``memnode`` this will only guide the memory access for the vCPU + threads or similar mechanism and is very hypervisor-specific. This does not + guarantee the placement of the node's memory allocation. For proper + restriction other means should be used (e.g. different mode, preallocated + hugepages). + :since:`QEMU Since 1.2.7` -- 2.40.0

On 4/20/23 10:49, Martin Kletzander wrote:
See individual commit messages for explanations (duh!)
Martin Kletzander (2): qemu, ch: Move threads to cgroup dir before changing parameters docs: Clarify restrictive numatune mode
docs/formatdomain.rst | 11 ++++++++++- src/ch/ch_process.c | 12 ++++++------ src/qemu/qemu_process.c | 10 +++++----- 3 files changed, 21 insertions(+), 12 deletions(-)
Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Michal

On Thu, Apr 20, 2023 at 10:49:17AM +0200, Martin Kletzander wrote:
See individual commit messages for explanations (duh!)
Martin Kletzander (2): qemu, ch: Move threads to cgroup dir before changing parameters docs: Clarify restrictive numatune mode
Reviewed-by: Pavel Hrdina <phrdina@redhat.com>
participants (3)
-
Martin Kletzander
-
Michal Prívozník
-
Pavel Hrdina