On 10/24/22 13:54, Pavel Hrdina wrote:
On Sun, Oct 23, 2022 at 02:08:28PM +0200, Eric van Blokland wrote:
> systemd in hybrid mode uses v1 hierarchies for controllers and v2 for
> process tracking.
>
> The LXC code uses virCgroupAddMachineProcess() to move processes into
> appropriate cgroup by manipulating cgroupfs directly. (Note, despite
> libvirt also supports talking to systemd directly via
> org.freedesktop.machine1 API.)
>
> If this path is taken, libvirt/lxc must convince systemd that processes
> really belong to new cgroup, i.e. also the tracking v2 hierarchy must
> undergo migration too.
>
> The current check would evaluate v2 backend as unavailable with hybrid
> mode (because there are no available controllers). Simplify the
> condition and consider the mounted cgroup2 as sufficient to touch v2
> hierarchy.
>
> This consequently creates an issue with binding the V2 mount. In hybrid
> mode the V2 filesystem may be mounted upon the V1 filesystem. By reversing
> the order in which backends are mounted in virCgroupBindMount this problem
> is circumvented.
>
> Fixes: #182
> Signed-off-by: Eric van Blokland <mail(a)ericvanblokland.nl>
> ---
> src/util/vircgroup.c | 8 +++++---
> src/util/vircgroupv2.c | 12 ------------
> 2 files changed, 5 insertions(+), 15 deletions(-)
>
> diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c
> index a6a409af3d..48fbcf625a 100644
> --- a/src/util/vircgroup.c
> +++ b/src/util/vircgroup.c
> @@ -2924,9 +2924,11 @@ virCgroupBindMount(virCgroup *group, const char *oldroot,
> size_t i;
> virCgroup *parent = virCgroupGetNested(group);
>
> - for (i = 0; i < VIR_CGROUP_BACKEND_TYPE_LAST; i++) {
> - if (parent->backends[i] &&
> - parent->backends[i]->bindMount(parent, oldroot, mountopts) < 0)
{
> + /* In hybrid environments, V2 may be mounted over V1.
> + * Mount the backends in reverse order. */
I don't understand what you mean by mounted over?
> + for (i = 1; i <= VIR_CGROUP_BACKEND_TYPE_LAST; i++) {
> + if (parent->backends[VIR_CGROUP_BACKEND_TYPE_LAST - i] &&
> + parent->backends[VIR_CGROUP_BACKEND_TYPE_LAST -
i]->bindMount(parent, oldroot, mountopts) < 0) {
> return -1;
> }
> }
> diff --git a/src/util/vircgroupv2.c b/src/util/vircgroupv2.c
> index 4c110940cf..0e0c61d466 100644
> --- a/src/util/vircgroupv2.c
> +++ b/src/util/vircgroupv2.c
> @@ -75,22 +75,10 @@ virCgroupV2Available(void)
> if (STRNEQ(entry.mnt_type, "cgroup2"))
> continue;
>
> - /* Systemd uses cgroup v2 for process tracking but no controller is
> - * available. We should consider this configuration as cgroup v2 is
> - * not available. */
> - contFile = g_strdup_printf("%s/cgroup.controllers",
entry.mnt_dir);
> -
> - if (virFileReadAll(contFile, 1024 * 1024, &contStr) < 0)
> - goto cleanup;
> -
> - if (STREQ(contStr, ""))
> - continue;
> -
I don't like this at all and IMO this is incorrect fix of the issue you
are trying to address. In hybrid mode with systemd the cgroup v2
controller is not a real controller. It's something systemd uses for
process tracking and some other features. It is owned by systemd and we
should not touch it directly at all. We need to use proper systemd APIs
to make any changes to that directory or if needed ask systemd to create
cgroup with Delegate=yes which in this case is probably also not the
correct approach.
Maybe the proper way is to have systemd developers explain to us how the
hybrid mode should be handled. Because we came up with more solutions
but neither was good (enough). Either the cgroup hierarchy works for
qemu driver, or lxc driver or machined but never for all three at the
same time.
Alternatively, we may just refuse to use cgroups in hybrid mode which
even systemd maintainer admitted was a step sideways [1].
Michal
1:
https://github.com/systemd/systemd/issues/10107#issuecomment-424028793