[libvirt PATCH 0/9] fix cgroups on systemd hosts

When running on host with systemd there is an ownership issue of the root VM cgroup. When it is created for us by systemd using machined the owner of the root VM cgroup is systemd and we should not touch any of the files as systemd can and will modify any values configured there. Basically we had the issue since introduction of machined support in libvirt 1.1.1 back in 2013. With systemd implementing more cgroup APIs the `systemctl daemon-reload` would change more values configured by libvirt. The solution to the issue is to use systemd DBus APIs to configure cgroups but unfortunately they don't cover everything that libvirt needs. For that reason we will use systemd DBus APIs only for values that affect sibling cgroups where the resources are distributed proportionally, such as blkio.weight or cpu.shares. For the remaining resources we will keep the current code where we work with the files directly but we move everything into a child cgroup of the VM root cgroup where we are free to do whatever we like including thread configuration. Pavel Hrdina (9): virsystemd: export virSystemdHasMachined virsystemd: introduce virSystemdGetMachineByPID virsystemd: introduce virSystemdGetMachineUnitByPID vircgroup: use DBus call to systemd for some APIs vircgroupv1: refactor virCgroupV1DetectPlacement vircgroupv2: move task into cgroup before enabling controllers vircgroup: introduce virCgroupV1Exists and virCgroupV2Exists vircgroup: introduce nested cgroup to properly work with systemd tests: add cgroup nested tests docs/cgroups.html.in | 29 +- src/libvirt_private.syms | 2 + src/util/vircgroup.c | 300 ++++++++++++++---- src/util/vircgroupbackend.h | 5 + src/util/vircgrouppriv.h | 10 + src/util/vircgroupv1.c | 122 +++++-- src/util/vircgroupv2.c | 82 ++++- src/util/virsystemd.c | 105 +++++- src/util/virsystemd.h | 4 + tests/vircgroupdata/systemd-legacy.cgroups | 12 + tests/vircgroupdata/systemd-legacy.mounts | 11 + .../vircgroupdata/systemd-legacy.self.cgroup | 11 + tests/vircgroupdata/systemd-unified.cgroups | 13 + tests/vircgroupdata/systemd-unified.mounts | 1 + .../vircgroupdata/systemd-unified.self.cgroup | 1 + tests/vircgrouptest.c | 72 +++++ tests/virsystemdtest.c | 39 ++- 17 files changed, 687 insertions(+), 132 deletions(-) create mode 100644 tests/vircgroupdata/systemd-legacy.cgroups create mode 100644 tests/vircgroupdata/systemd-legacy.mounts create mode 100644 tests/vircgroupdata/systemd-legacy.self.cgroup create mode 100644 tests/vircgroupdata/systemd-unified.cgroups create mode 100644 tests/vircgroupdata/systemd-unified.mounts create mode 100644 tests/vircgroupdata/systemd-unified.self.cgroup -- 2.29.2

Signed-off-by: Pavel Hrdina <phrdina@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virsystemd.c | 2 +- src/util/virsystemd.h | 2 ++ 3 files changed, 4 insertions(+), 1 deletion(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 0636b0d8c9..b2d2282ce0 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -3304,6 +3304,7 @@ virSystemdGetActivation; virSystemdGetMachineNameByPID; virSystemdHasLogind; virSystemdHasLogindResetCachedValue; +virSystemdHasMachined; virSystemdHasMachinedResetCachedValue; virSystemdMakeScopeName; virSystemdMakeSliceName; diff --git a/src/util/virsystemd.c b/src/util/virsystemd.c index 3f689365e4..58b5c650b5 100644 --- a/src/util/virsystemd.c +++ b/src/util/virsystemd.c @@ -147,7 +147,7 @@ void virSystemdHasLogindResetCachedValue(void) * -1 = error * 0 = machine1 is available */ -static int +int virSystemdHasMachined(void) { int ret; diff --git a/src/util/virsystemd.h b/src/util/virsystemd.h index dfea75948b..9ce16b7de1 100644 --- a/src/util/virsystemd.h +++ b/src/util/virsystemd.h @@ -57,6 +57,8 @@ int virSystemdTerminateMachine(const char *name); void virSystemdNotifyStartup(void); +int virSystemdHasMachined(void); + int virSystemdHasLogind(void); int virSystemdCanSuspend(bool *result); -- 2.29.2

Signed-off-by: Pavel Hrdina <phrdina@redhat.com> --- src/util/virsystemd.c | 52 ++++++++++++++++++++++++++++++------------- 1 file changed, 36 insertions(+), 16 deletions(-) diff --git a/src/util/virsystemd.c b/src/util/virsystemd.c index 58b5c650b5..d2fcf0bf0c 100644 --- a/src/util/virsystemd.c +++ b/src/util/virsystemd.c @@ -194,21 +194,21 @@ virSystemdHasLogind(void) } -char * -virSystemdGetMachineNameByPID(pid_t pid) +/** + * virSystemdGetMachineByPID: + * @conn: dbus connection + * @pid: pid of running VM + * + * Returns dbus object path to VM registered with machined. + * On error returns NULL. + */ +static char * +virSystemdGetMachineByPID(GDBusConnection *conn, + pid_t pid) { - GDBusConnection *conn; g_autoptr(GVariant) message = NULL; g_autoptr(GVariant) reply = NULL; - g_autoptr(GVariant) gvar = NULL; - g_autofree char *object = NULL; - char *name = NULL; - - if (virSystemdHasMachined() < 0) - return NULL; - - if (!(conn = virGDBusGetSystemBus())) - return NULL; + char *object = NULL; message = g_variant_new("(u)", pid); @@ -225,13 +225,33 @@ virSystemdGetMachineNameByPID(pid_t pid) g_variant_get(reply, "(o)", &object); - g_variant_unref(reply); - reply = NULL; - VIR_DEBUG("Domain with pid %lld has object path '%s'", (long long) pid, object); - g_variant_unref(message); + return object; +} + + +char * +virSystemdGetMachineNameByPID(pid_t pid) +{ + GDBusConnection *conn; + g_autoptr(GVariant) message = NULL; + g_autoptr(GVariant) reply = NULL; + g_autoptr(GVariant) gvar = NULL; + g_autofree char *object = NULL; + char *name = NULL; + + if (virSystemdHasMachined() < 0) + return NULL; + + if (!(conn = virGDBusGetSystemBus())) + return NULL; + + object = virSystemdGetMachineByPID(conn, pid); + if (!object) + return NULL; + message = g_variant_new("(ss)", "org.freedesktop.machine1.Machine", "Name"); -- 2.29.2

Signed-off-by: Pavel Hrdina <phrdina@redhat.com> --- src/libvirt_private.syms | 1 + src/util/virsystemd.c | 51 ++++++++++++++++++++++++++++++++++++++++ src/util/virsystemd.h | 2 ++ tests/virsystemdtest.c | 39 ++++++++++++++++++++++++++---- 4 files changed, 88 insertions(+), 5 deletions(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index b2d2282ce0..837986845f 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -3302,6 +3302,7 @@ virSystemdCanSuspend; virSystemdCreateMachine; virSystemdGetActivation; virSystemdGetMachineNameByPID; +virSystemdGetMachineUnitByPID; virSystemdHasLogind; virSystemdHasLogindResetCachedValue; virSystemdHasMachined; diff --git a/src/util/virsystemd.c b/src/util/virsystemd.c index d2fcf0bf0c..c109324e96 100644 --- a/src/util/virsystemd.c +++ b/src/util/virsystemd.c @@ -276,6 +276,57 @@ virSystemdGetMachineNameByPID(pid_t pid) } +/** + * virSystemdGetMachineUnitByPID: + * @pid: pid of running VM + * + * Returns systemd Unit name of a running VM registered with machined. + * On error returns NULL. + */ +char * +virSystemdGetMachineUnitByPID(pid_t pid) +{ + GDBusConnection *conn; + g_autoptr(GVariant) message = NULL; + g_autoptr(GVariant) reply = NULL; + g_autoptr(GVariant) gvar = NULL; + g_autofree char *object = NULL; + char *unit = NULL; + + if (virSystemdHasMachined() < 0) + return NULL; + + if (!(conn = virGDBusGetSystemBus())) + return NULL; + + object = virSystemdGetMachineByPID(conn, pid); + if (!object) + return NULL; + + message = g_variant_new("(ss)", + "org.freedesktop.machine1.Machine", "Unit"); + + if (virGDBusCallMethod(conn, + &reply, + G_VARIANT_TYPE("(v)"), + NULL, + "org.freedesktop.machine1", + object, + "org.freedesktop.DBus.Properties", + "Get", + message) < 0) + return NULL; + + g_variant_get(reply, "(v)", &gvar); + g_variant_get(gvar, "s", &unit); + + VIR_DEBUG("Domain with pid %lld has unit name '%s'", + (long long) pid, unit); + + return unit; +} + + /** * virSystemdCreateMachine: * @name: driver unique name of the machine diff --git a/src/util/virsystemd.h b/src/util/virsystemd.h index 9ce16b7de1..cd329c49f9 100644 --- a/src/util/virsystemd.h +++ b/src/util/virsystemd.h @@ -69,6 +69,8 @@ int virSystemdCanHybridSleep(bool *result); char *virSystemdGetMachineNameByPID(pid_t pid); +char *virSystemdGetMachineUnitByPID(pid_t pid); + int virSystemdGetActivation(virSystemdActivationMap *map, size_t nmap, virSystemdActivationPtr *act); diff --git a/tests/virsystemdtest.c b/tests/virsystemdtest.c index bd0ca51140..b48cdb950c 100644 --- a/tests/virsystemdtest.c +++ b/tests/virsystemdtest.c @@ -53,11 +53,10 @@ VIR_MOCK_WRAP_RET_ARGS(g_dbus_connection_call_sync, GError **, error) { GVariant *reply = NULL; + g_autoptr(GVariant) params = parameters; - if (parameters) { - g_variant_ref_sink(parameters); - g_variant_unref(parameters); - } + if (params) + g_variant_ref_sink(params); VIR_MOCK_REAL_INIT(g_dbus_connection_call_sync); @@ -71,7 +70,19 @@ VIR_MOCK_WRAP_RET_ARGS(g_dbus_connection_call_sync, reply = g_variant_new("(o)", "/org/freedesktop/machine1/machine/qemu_2ddemo"); } else if (STREQ(method_name, "Get")) { - reply = g_variant_new("(v)", g_variant_new_string("qemu-demo")); + const char *prop; + g_variant_get(params, "(@s&s)", NULL, &prop); + + if (STREQ(prop, "Name")) { + reply = g_variant_new("(v)", g_variant_new_string("qemu-demo")); + } else if (STREQ(prop, "Unit")) { + reply = g_variant_new("(v)", + g_variant_new_string("machine-qemu-demo.scope")); + } else { + *error = g_dbus_error_new_for_dbus_error( + "org.freedesktop.systemd.badthing", + "Unknown machine property"); + } } else { reply = g_variant_new("()"); } @@ -330,6 +341,23 @@ testGetMachineName(const void *opaque G_GNUC_UNUSED) } +static int +testGetMachineUnit(const void *opaque G_GNUC_UNUSED) +{ + g_autofree char *tmp = virSystemdGetMachineUnitByPID(1234); + + if (!tmp) { + fprintf(stderr, "%s", "Failed to create get machine unit\n"); + return -1; + } + + if (STREQ(tmp, "machine-qemu-demo.scope")) + return 0; + + return -1; +} + + struct testNameData { const char *name; const char *expected; @@ -656,6 +684,7 @@ mymain(void) DO_TEST("Test create bad systemd ", testCreateBadSystemd); DO_TEST("Test create with network ", testCreateNetwork); DO_TEST("Test getting machine name ", testGetMachineName); + DO_TEST("Test getting machine unit ", testGetMachineUnit); # define TEST_SCOPE(_name, unitname, _legacy) \ do { \ -- 2.29.2

When running on host with systemd we register VMs with machined. In this case systemd creates the root VM cgroup for us. This has some implications where one of them is that systemd owns all files inside the root VM cgroup and we should not touch them. If we change any value in file that systemd knows about it will be changed to what systemd thinks it should be when executing `systemctl daemon-reload`. These are the APIs that we need to call using systemd because they set limits that are proportional to sibling cgroups. Signed-off-by: Pavel Hrdina <phrdina@redhat.com> --- src/util/vircgroup.c | 41 ++++++++++++++++++++++++++++++++++ src/util/vircgrouppriv.h | 6 +++++ src/util/vircgroupv1.c | 48 ++++++++++++++++++++++++++++------------ src/util/vircgroupv2.c | 48 ++++++++++++++++++++++++++++------------ 4 files changed, 115 insertions(+), 28 deletions(-) diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c index b5c74e94de..7a0e14eb73 100644 --- a/src/util/vircgroup.c +++ b/src/util/vircgroup.c @@ -41,6 +41,7 @@ #include "virerror.h" #include "virlog.h" #include "virfile.h" +#include "virgdbus.h" #include "virhash.h" #include "virstring.h" #include "virsystemd.h" @@ -484,6 +485,37 @@ virCgroupGetBlockDevString(const char *path) } +int +virCgroupSetValueDBus(const char *unitName, + const char *key, + GVariant *value) +{ + GDBusConnection *conn; + g_autoptr(GVariant) message = NULL; + GVariantBuilder builder; + GVariant *props = NULL; + + g_variant_builder_init(&builder, G_VARIANT_TYPE("a(sv)")); + g_variant_builder_add(&builder, "(sv)", key, value); + props = g_variant_builder_end(&builder); + + message = g_variant_new("(sb@a(sv))", unitName, true, props); + + if (!(conn = virGDBusGetSystemBus())) + return -1; + + return virGDBusCallMethod(conn, + NULL, + NULL, + NULL, + "org.freedesktop.systemd1", + "/org/freedesktop/systemd1", + "org.freedesktop.systemd1.Manager", + "SetUnitProperties", + message); +} + + int virCgroupSetValueRaw(const char *path, const char *value) @@ -1129,6 +1161,10 @@ virCgroupNewDetectMachine(const char *name, } } + newGroup->unitName = virSystemdGetMachineUnitByPID(pid); + if (virSystemdHasMachined() == 0 && !newGroup->unitName) + return -1; + *group = g_steal_pointer(&newGroup); return 0; } @@ -1229,6 +1265,10 @@ virCgroupNewMachineSystemd(const char *name, if (virCgroupEnableMissingControllers(path, controllers, &newGroup) < 0) return -1; + newGroup->unitName = virSystemdGetMachineUnitByPID(pidleader); + if (!newGroup->unitName) + return -1; + if (virCgroupAddProcess(newGroup, pidleader) < 0) { virErrorPtr saved; @@ -3588,6 +3628,7 @@ virCgroupFree(virCgroupPtr group) g_free(group->unified.mountPoint); g_free(group->unified.placement); + g_free(group->unitName); g_free(group); } diff --git a/src/util/vircgrouppriv.h b/src/util/vircgrouppriv.h index 85ba5393e0..f88d00814d 100644 --- a/src/util/vircgrouppriv.h +++ b/src/util/vircgrouppriv.h @@ -64,8 +64,14 @@ struct _virCgroup { virCgroupV1Controller legacy[VIR_CGROUP_CONTROLLER_LAST]; virCgroupV2Controller unified; + + char *unitName; }; +int virCgroupSetValueDBus(const char *unitName, + const char *key, + GVariant *value); + int virCgroupSetValueRaw(const char *path, const char *value); diff --git a/src/util/vircgroupv1.c b/src/util/vircgroupv1.c index 2b4d625c35..edb2c23bbc 100644 --- a/src/util/vircgroupv1.c +++ b/src/util/vircgroupv1.c @@ -946,7 +946,6 @@ virCgroupV1SetBlkioWeight(virCgroupPtr group, unsigned int weight) { g_autofree char *path = NULL; - g_autofree char *value = NULL; if (virCgroupV1PathOfController(group, VIR_CGROUP_CONTROLLER_BLKIO, "blkio.bfq.weight", &path) < 0) { @@ -968,9 +967,15 @@ virCgroupV1SetBlkioWeight(virCgroupPtr group, return -1; } - value = g_strdup_printf("%u", weight); + if (group->unitName) { + GVariant *value = g_variant_new("t", weight); - return virCgroupSetValueRaw(path, value); + return virCgroupSetValueDBus(group->unitName, "BlockIOWeight", value); + } else { + g_autofree char *value = g_strdup_printf("%u", weight); + + return virCgroupSetValueRaw(path, value); + } } @@ -1203,15 +1208,8 @@ virCgroupV1SetBlkioDeviceWeight(virCgroupPtr group, const char *devPath, unsigned int weight) { - g_autofree char *str = NULL; - g_autofree char *blkstr = NULL; g_autofree char *path = NULL; - if (!(blkstr = virCgroupGetBlockDevString(devPath))) - return -1; - - str = g_strdup_printf("%s%d", blkstr, weight); - if (virCgroupV1PathOfController(group, VIR_CGROUP_CONTROLLER_BLKIO, "blkio.weight_device", &path) < 0) { return -1; @@ -1223,7 +1221,23 @@ virCgroupV1SetBlkioDeviceWeight(virCgroupPtr group, return -1; } - return virCgroupSetValueRaw(path, str); + if (group->unitName) { + GVariant *value = NULL; + + value = g_variant_new_parsed("[(%s, uint64 %u)]", path, weight); + + return virCgroupSetValueDBus(group->unitName, "BlockIODeviceWeight", value); + } else { + g_autofree char *str = NULL; + g_autofree char *blkstr = NULL; + + if (!(blkstr = virCgroupGetBlockDevString(devPath))) + return -1; + + str = g_strdup_printf("%s%d", blkstr, weight); + + return virCgroupSetValueRaw(path, str); + } } @@ -1862,9 +1876,15 @@ static int virCgroupV1SetCpuShares(virCgroupPtr group, unsigned long long shares) { - return virCgroupSetValueU64(group, - VIR_CGROUP_CONTROLLER_CPU, - "cpu.shares", shares); + if (group->unitName) { + GVariant *value = g_variant_new("t", shares); + + return virCgroupSetValueDBus(group->unitName, "CPUShares", value); + } else { + return virCgroupSetValueU64(group, + VIR_CGROUP_CONTROLLER_CPU, + "cpu.shares", shares); + } } diff --git a/src/util/vircgroupv2.c b/src/util/vircgroupv2.c index 4a239f067a..3f3e5ac83d 100644 --- a/src/util/vircgroupv2.c +++ b/src/util/vircgroupv2.c @@ -609,7 +609,6 @@ virCgroupV2SetBlkioWeight(virCgroupPtr group, unsigned int weight) { g_autofree char *path = NULL; - g_autofree char *value = NULL; const char *format = "%u"; if (virCgroupV2PathOfController(group, VIR_CGROUP_CONTROLLER_BLKIO, @@ -633,9 +632,15 @@ virCgroupV2SetBlkioWeight(virCgroupPtr group, return -1; } - value = g_strdup_printf(format, weight); + if (group->unitName) { + GVariant *value = g_variant_new("t", weight); - return virCgroupSetValueRaw(path, value); + return virCgroupSetValueDBus(group->unitName, "IOWeight", value); + } else { + g_autofree char *value = g_strdup_printf(format, weight); + + return virCgroupSetValueRaw(path, value); + } } @@ -820,13 +825,6 @@ virCgroupV2SetBlkioDeviceWeight(virCgroupPtr group, unsigned int weight) { g_autofree char *path = NULL; - g_autofree char *str = NULL; - g_autofree char *blkstr = NULL; - - if (!(blkstr = virCgroupGetBlockDevString(devPath))) - return -1; - - str = g_strdup_printf("%s%d", blkstr, weight); if (virCgroupV2PathOfController(group, VIR_CGROUP_CONTROLLER_BLKIO, "io.weight", &path) < 0) { @@ -839,7 +837,23 @@ virCgroupV2SetBlkioDeviceWeight(virCgroupPtr group, return -1; } - return virCgroupSetValueRaw(path, str); + if (group->unitName) { + GVariant *value = NULL; + + value = g_variant_new_parsed("[(%s, uint64 %u)]", path, weight); + + return virCgroupSetValueDBus(group->unitName, "IODeviceWeight", value); + } else { + g_autofree char *str = NULL; + g_autofree char *blkstr = NULL; + + if (!(blkstr = virCgroupGetBlockDevString(devPath))) + return -1; + + str = g_strdup_printf("%s%d", blkstr, weight); + + return virCgroupSetValueRaw(path, str); + } } @@ -1458,9 +1472,15 @@ static int virCgroupV2SetCpuShares(virCgroupPtr group, unsigned long long shares) { - return virCgroupSetValueU64(group, - VIR_CGROUP_CONTROLLER_CPU, - "cpu.weight", shares); + if (group->unitName) { + GVariant *value = g_variant_new("t", shares); + + return virCgroupSetValueDBus(group->unitName, "CPUWeight", value); + } else { + return virCgroupSetValueU64(group, + VIR_CGROUP_CONTROLLER_CPU, + "cpu.weight", shares); + } } -- 2.29.2

Remove one level of indentation by splitting the condition. Signed-off-by: Pavel Hrdina <phrdina@redhat.com> --- src/util/vircgroupv1.c | 37 +++++++++++++++++++++---------------- 1 file changed, 21 insertions(+), 16 deletions(-) diff --git a/src/util/vircgroupv1.c b/src/util/vircgroupv1.c index edb2c23bbc..f4dd20fd73 100644 --- a/src/util/vircgroupv1.c +++ b/src/util/vircgroupv1.c @@ -339,23 +339,28 @@ virCgroupV1DetectPlacement(virCgroupPtr group, for (i = 0; i < VIR_CGROUP_CONTROLLER_LAST; i++) { const char *typestr = virCgroupV1ControllerTypeToString(i); - if (virCgroupV1MountOptsMatchController(controllers, typestr) && - group->legacy[i].mountPoint != NULL && - group->legacy[i].placement == NULL) { - /* - * selfpath == "/" + path="" -> "/" - * selfpath == "/libvirt.service" + path == "" -> "/libvirt.service" - * selfpath == "/libvirt.service" + path == "foo" -> "/libvirt.service/foo" - */ - if (i == VIR_CGROUP_CONTROLLER_SYSTEMD) { - group->legacy[i].placement = g_strdup(selfpath); - } else { - bool delim = STREQ(selfpath, "/") || STREQ(path, ""); + if (!virCgroupV1MountOptsMatchController(controllers, typestr)) + continue; - group->legacy[i].placement = g_strdup_printf("%s%s%s", selfpath, - delim ? "" : "/", - path); - } + if (!group->legacy[i].mountPoint) + continue; + + if (group->legacy[i].placement) + continue; + + /* + * selfpath == "/" + path="" -> "/" + * selfpath == "/libvirt.service" + path == "" -> "/libvirt.service" + * selfpath == "/libvirt.service" + path == "foo" -> "/libvirt.service/foo" + */ + if (i == VIR_CGROUP_CONTROLLER_SYSTEMD) { + group->legacy[i].placement = g_strdup(selfpath); + } else { + bool delim = STREQ(selfpath, "/") || STREQ(path, ""); + + group->legacy[i].placement = g_strdup_printf("%s%s%s", selfpath, + delim ? "" : "/", + path); } } -- 2.29.2

When we create a new child cgroup and the parent cgroup has any process attached to it enabling controllers for the child cgroup fails with error. We need to move the process into the child cgroup first before enabling any controllers. Signed-off-by: Pavel Hrdina <phrdina@redhat.com> --- src/util/vircgroup.c | 11 ++++++----- src/util/vircgroupbackend.h | 1 + src/util/vircgroupv1.c | 1 + src/util/vircgroupv2.c | 13 +++++++++++++ 4 files changed, 21 insertions(+), 5 deletions(-) diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c index 7a0e14eb73..44a3064876 100644 --- a/src/util/vircgroup.c +++ b/src/util/vircgroup.c @@ -694,13 +694,14 @@ static int virCgroupMakeGroup(virCgroupPtr parent, virCgroupPtr group, bool create, + pid_t pid, unsigned int flags) { size_t i; for (i = 0; i < VIR_CGROUP_BACKEND_TYPE_LAST; i++) { if (group->backends[i] && - group->backends[i]->makeGroup(parent, group, create, flags) < 0) { + group->backends[i]->makeGroup(parent, group, create, pid, flags) < 0) { virCgroupRemove(group); return -1; } @@ -970,7 +971,7 @@ virCgroupNewPartition(const char *path, return -1; if (parent) { - if (virCgroupMakeGroup(parent, newGroup, create, VIR_CGROUP_NONE) < 0) + if (virCgroupMakeGroup(parent, newGroup, create, -1, VIR_CGROUP_NONE) < 0) return -1; } @@ -1033,7 +1034,7 @@ virCgroupNewDomainPartition(virCgroupPtr partition, * a group for driver, is to avoid overhead to track * cumulative usage that we don't need. */ - if (virCgroupMakeGroup(partition, newGroup, true, + if (virCgroupMakeGroup(partition, newGroup, true, -1, VIR_CGROUP_MEM_HIERACHY) < 0) { return -1; } @@ -1090,7 +1091,7 @@ virCgroupNewThread(virCgroupPtr domain, if (virCgroupNewFromParent(domain, name, controllers, &newGroup) < 0) return -1; - if (virCgroupMakeGroup(domain, newGroup, create, VIR_CGROUP_THREAD) < 0) + if (virCgroupMakeGroup(domain, newGroup, create, -1, VIR_CGROUP_THREAD) < 0) return -1; *group = g_steal_pointer(&newGroup); @@ -1192,7 +1193,7 @@ virCgroupEnableMissingControllers(char *path, &tmp) < 0) return -1; - if (virCgroupMakeGroup(parent, tmp, true, VIR_CGROUP_SYSTEMD) < 0) + if (virCgroupMakeGroup(parent, tmp, true, -1, VIR_CGROUP_SYSTEMD) < 0) return -1; virCgroupFree(parent); diff --git a/src/util/vircgroupbackend.h b/src/util/vircgroupbackend.h index 4ca2e38af2..497dde5625 100644 --- a/src/util/vircgroupbackend.h +++ b/src/util/vircgroupbackend.h @@ -122,6 +122,7 @@ typedef int (*virCgroupMakeGroupCB)(virCgroupPtr parent, virCgroupPtr group, bool create, + pid_t pid, unsigned int flags); typedef int diff --git a/src/util/vircgroupv1.c b/src/util/vircgroupv1.c index f4dd20fd73..51ada12f6e 100644 --- a/src/util/vircgroupv1.c +++ b/src/util/vircgroupv1.c @@ -622,6 +622,7 @@ static int virCgroupV1MakeGroup(virCgroupPtr parent, virCgroupPtr group, bool create, + pid_t pid G_GNUC_UNUSED, unsigned int flags) { size_t i; diff --git a/src/util/vircgroupv2.c b/src/util/vircgroupv2.c index 3f3e5ac83d..d590f83187 100644 --- a/src/util/vircgroupv2.c +++ b/src/util/vircgroupv2.c @@ -406,10 +406,17 @@ virCgroupV2EnableController(virCgroupPtr group, } +static int +virCgroupV2AddTask(virCgroupPtr group, + pid_t pid, + unsigned int flags); + + static int virCgroupV2MakeGroup(virCgroupPtr parent, virCgroupPtr group, bool create, + pid_t pid, unsigned int flags) { g_autofree char *path = NULL; @@ -455,6 +462,12 @@ virCgroupV2MakeGroup(virCgroupPtr parent, } } else { size_t i; + + if (pid > 0) { + if (virCgroupV2AddTask(group, pid, VIR_CGROUP_TASK_PROCESS) < 0) + return -1; + } + for (i = 0; i < VIR_CGROUP_CONTROLLER_LAST; i++) { int rc; -- 2.29.2

This will check if the cgroup actually exists on the system. Signed-off-by: Pavel Hrdina <phrdina@redhat.com> --- src/util/vircgroupbackend.h | 4 ++++ src/util/vircgroupv1.c | 27 +++++++++++++++++++++++++++ src/util/vircgroupv2.c | 15 +++++++++++++++ 3 files changed, 46 insertions(+) diff --git a/src/util/vircgroupbackend.h b/src/util/vircgroupbackend.h index 497dde5625..a338ffb618 100644 --- a/src/util/vircgroupbackend.h +++ b/src/util/vircgroupbackend.h @@ -118,6 +118,9 @@ typedef int const char *key, char **path); +typedef bool +(*virCgroupExistsCB)(virCgroupPtr group); + typedef int (*virCgroupMakeGroupCB)(virCgroupPtr parent, virCgroupPtr group, @@ -382,6 +385,7 @@ struct _virCgroupBackend { virCgroupGetAnyControllerCB getAnyController; virCgroupPathOfControllerCB pathOfController; virCgroupMakeGroupCB makeGroup; + virCgroupExistsCB exists; virCgroupRemoveCB remove; virCgroupAddTaskCB addTask; virCgroupHasEmptyTasksCB hasEmptyTasks; diff --git a/src/util/vircgroupv1.c b/src/util/vircgroupv1.c index 51ada12f6e..8a9a63e0b7 100644 --- a/src/util/vircgroupv1.c +++ b/src/util/vircgroupv1.c @@ -690,6 +690,32 @@ virCgroupV1MakeGroup(virCgroupPtr parent, } +static bool +virCgroupV1Exists(virCgroupPtr group) +{ + size_t i; + + for (i = 0; i < VIR_CGROUP_CONTROLLER_LAST; i++) { + g_autofree char *path = NULL; + + if (i == VIR_CGROUP_CONTROLLER_SYSTEMD) + continue; + + if (!group->legacy[i].mountPoint) + continue; + + if (virCgroupV1PathOfController(group, i, "", &path) < 0) + return false; + + if (!virFileExists(path)) { + return false; + } + } + + return true; +} + + static int virCgroupV1Remove(virCgroupPtr group) { @@ -2154,6 +2180,7 @@ virCgroupBackend virCgroupV1Backend = { .getAnyController = virCgroupV1GetAnyController, .pathOfController = virCgroupV1PathOfController, .makeGroup = virCgroupV1MakeGroup, + .exists = virCgroupV1Exists, .remove = virCgroupV1Remove, .addTask = virCgroupV1AddTask, .hasEmptyTasks = virCgroupV1HasEmptyTasks, diff --git a/src/util/vircgroupv2.c b/src/util/vircgroupv2.c index d590f83187..c96f405dfa 100644 --- a/src/util/vircgroupv2.c +++ b/src/util/vircgroupv2.c @@ -499,6 +499,20 @@ virCgroupV2MakeGroup(virCgroupPtr parent, } +static bool +virCgroupV2Exists(virCgroupPtr group) +{ + g_autofree char *path = NULL; + int controller; + + controller = virCgroupV2GetAnyController(group); + if (virCgroupV2PathOfController(group, controller, "", &path) < 0) + return false; + + return virFileExists(path); +} + + static int virCgroupV2Remove(virCgroupPtr group) { @@ -1895,6 +1909,7 @@ virCgroupBackend virCgroupV2Backend = { .getAnyController = virCgroupV2GetAnyController, .pathOfController = virCgroupV2PathOfController, .makeGroup = virCgroupV2MakeGroup, + .exists = virCgroupV2Exists, .remove = virCgroupV2Remove, .addTask = virCgroupV2AddTask, .hasEmptyTasks = virCgroupV2HasEmptyTasks, -- 2.29.2

When running on host with systemd we register VMs with machined. In this case systemd creates the root VM cgroup for us. This has some implications where one of them is that systemd owns all files inside the root VM cgroup and we should not touch them. We already use DBus calls for some of the APIs but for the remaining ones we will continue accessing the files directly. Systemd doesn't support threaded cgroups so we need to do this. The reason why we don't use DBus for most of the APIs is that we already have a code that works with files and we would have to check if systemd supports each API. This change introduces new topology on systemd hosts: $ROOT | +- machine.slice | +- machine-qemu\x2d1\x2dvm1.scope | +- libvirt | +- emulator +- vcpu0 +- vcpu0 compared to the previous topology: $ROOT | +- machine.slice | +- machine-qemu\x2d1\x2dvm1.scope | +- emulator +- vcpu0 +- vcpu0 Signed-off-by: Pavel Hrdina <phrdina@redhat.com> --- docs/cgroups.html.in | 29 +++-- src/util/vircgroup.c | 250 ++++++++++++++++++++++++++++++--------- src/util/vircgrouppriv.h | 4 + src/util/vircgroupv1.c | 15 ++- src/util/vircgroupv2.c | 6 + 5 files changed, 239 insertions(+), 65 deletions(-) diff --git a/docs/cgroups.html.in b/docs/cgroups.html.in index 78dede1bba..412a9360ff 100644 --- a/docs/cgroups.html.in +++ b/docs/cgroups.html.in @@ -117,21 +117,27 @@ $ROOT | +- machine-qemu\x2d1\x2dvm1.scope | | - | +- emulator - | +- vcpu0 - | +- vcpu1 + | +- libvirt + | | + | +- emulator + | +- vcpu0 + | +- vcpu1 | +- machine-qemu\x2d2\x2dvm2.scope | | - | +- emulator - | +- vcpu0 - | +- vcpu1 + | +- libvirt + | | + | +- emulator + | +- vcpu0 + | +- vcpu1 | +- machine-qemu\x2d3\x2dvm3.scope | | - | +- emulator - | +- vcpu0 - | +- vcpu1 + | +- libvirt + | | + | +- emulator + | +- vcpu0 + | +- vcpu1 | +- machine-engineering.slice | | @@ -148,6 +154,11 @@ $ROOT +- machine-lxc\x2d33333\x2dcontainer3.scope </pre> + <p> + Prior libvirt 7.1.0 the topology doesn't have extra + <code>libvirt</code> directory. + </p> + <h3><a id="currentLayoutGeneric">Non-systemd cgroups layout</a></h3> <p> diff --git a/src/util/vircgroup.c b/src/util/vircgroup.c index 44a3064876..402224813b 100644 --- a/src/util/vircgroup.c +++ b/src/util/vircgroup.c @@ -711,6 +711,22 @@ virCgroupMakeGroup(virCgroupPtr parent, } +static bool +virCgroupExists(virCgroupPtr group) +{ + size_t i; + + for (i = 0; i < VIR_CGROUP_BACKEND_TYPE_LAST; i++) { + if (group->backends[i] && + !group->backends[i]->exists(group)) { + return false; + } + } + + return true; +} + + /** * virCgroupNew: * @path: path for the new group @@ -764,10 +780,11 @@ virCgroupAddTaskInternal(virCgroupPtr group, unsigned int flags) { size_t i; + virCgroupPtr parent = virCgroupGetNested(group); for (i = 0; i < VIR_CGROUP_BACKEND_TYPE_LAST; i++) { - if (group->backends[i] && - group->backends[i]->addTask(group, pid, flags) < 0) { + if (parent->backends[i] && + parent->backends[i]->addTask(parent, pid, flags) < 0) { return -1; } } @@ -980,6 +997,28 @@ virCgroupNewPartition(const char *path, } +static int +virCgroupNewNested(virCgroupPtr parent, + int controllers, + bool create, + pid_t pid, + virCgroupPtr *nested) +{ + g_autoptr(virCgroup) new = NULL; + + if (virCgroupNewFromParent(parent, "libvirt", controllers, &new) < 0) + return -1; + + if (create) { + if (virCgroupMakeGroup(parent, new, create, pid, VIR_CGROUP_NONE) < 0) + return -1; + } + + *nested = g_steal_pointer(&new); + return 0; +} + + /** * virCgroupNewSelf: * @@ -1064,6 +1103,7 @@ virCgroupNewThread(virCgroupPtr domain, { g_autofree char *name = NULL; g_autoptr(virCgroup) newGroup = NULL; + virCgroupPtr parent = NULL; int controllers; *group = NULL; @@ -1088,10 +1128,12 @@ virCgroupNewThread(virCgroupPtr domain, (1 << VIR_CGROUP_CONTROLLER_CPUACCT) | (1 << VIR_CGROUP_CONTROLLER_CPUSET)); - if (virCgroupNewFromParent(domain, name, controllers, &newGroup) < 0) + parent = virCgroupGetNested(domain); + + if (virCgroupNewFromParent(parent, name, controllers, &newGroup) < 0) return -1; - if (virCgroupMakeGroup(domain, newGroup, create, -1, VIR_CGROUP_THREAD) < 0) + if (virCgroupMakeGroup(parent, newGroup, create, -1, VIR_CGROUP_THREAD) < 0) return -1; *group = g_steal_pointer(&newGroup); @@ -1142,6 +1184,7 @@ virCgroupNewDetectMachine(const char *name, { size_t i; g_autoptr(virCgroup) newGroup = NULL; + g_autoptr(virCgroup) nested = NULL; *group = NULL; @@ -1166,6 +1209,12 @@ virCgroupNewDetectMachine(const char *name, if (virSystemdHasMachined() == 0 && !newGroup->unitName) return -1; + if (virCgroupNewNested(newGroup, controllers, false, -1, &nested) < 0) + return -1; + + if (virCgroupExists(nested)) + newGroup->nested = g_steal_pointer(&nested); + *group = g_steal_pointer(&newGroup); return 0; } @@ -1225,6 +1274,7 @@ virCgroupNewMachineSystemd(const char *name, int rv; g_autoptr(virCgroup) init = NULL; g_autoptr(virCgroup) newGroup = NULL; + g_autoptr(virCgroup) nested = NULL; g_autofree char *path = NULL; size_t i; @@ -1270,6 +1320,11 @@ virCgroupNewMachineSystemd(const char *name, if (!newGroup->unitName) return -1; + if (virCgroupNewNested(newGroup, controllers, true, pidleader, &nested) < 0) + return -1; + + newGroup->nested = g_steal_pointer(&nested); + if (virCgroupAddProcess(newGroup, pidleader) < 0) { virErrorPtr saved; @@ -1457,7 +1512,9 @@ virCgroupGetBlkioIoServiced(virCgroupPtr group, long long *requests_read, long long *requests_write) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_BLKIO, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_BLKIO, getBlkioIoServiced, -1, bytes_read, bytes_write, requests_read, requests_write); @@ -1484,7 +1541,9 @@ virCgroupGetBlkioIoDeviceServiced(virCgroupPtr group, long long *requests_read, long long *requests_write) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_BLKIO, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_BLKIO, getBlkioIoDeviceServiced, -1, path, bytes_read, bytes_write, requests_read, requests_write); @@ -1535,7 +1594,9 @@ virCgroupSetBlkioDeviceReadIops(virCgroupPtr group, const char *path, unsigned int riops) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_BLKIO, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_BLKIO, setBlkioDeviceReadIops, -1, path, riops); } @@ -1553,7 +1614,9 @@ virCgroupSetBlkioDeviceWriteIops(virCgroupPtr group, const char *path, unsigned int wiops) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_BLKIO, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_BLKIO, setBlkioDeviceWriteIops, -1, path, wiops); } @@ -1571,7 +1634,9 @@ virCgroupSetBlkioDeviceReadBps(virCgroupPtr group, const char *path, unsigned long long rbps) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_BLKIO, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_BLKIO, setBlkioDeviceReadBps, -1, path, rbps); } @@ -1588,7 +1653,9 @@ virCgroupSetBlkioDeviceWriteBps(virCgroupPtr group, const char *path, unsigned long long wbps) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_BLKIO, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_BLKIO, setBlkioDeviceWriteBps, -1, path, wbps); } @@ -1624,7 +1691,9 @@ virCgroupGetBlkioDeviceReadIops(virCgroupPtr group, const char *path, unsigned int *riops) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_BLKIO, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_BLKIO, getBlkioDeviceReadIops, -1, path, riops); } @@ -1641,7 +1710,9 @@ virCgroupGetBlkioDeviceWriteIops(virCgroupPtr group, const char *path, unsigned int *wiops) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_BLKIO, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_BLKIO, getBlkioDeviceWriteIops, -1, path, wiops); } @@ -1658,7 +1729,9 @@ virCgroupGetBlkioDeviceReadBps(virCgroupPtr group, const char *path, unsigned long long *rbps) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_BLKIO, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_BLKIO, getBlkioDeviceReadBps, -1, path, rbps); } @@ -1675,7 +1748,9 @@ virCgroupGetBlkioDeviceWriteBps(virCgroupPtr group, const char *path, unsigned long long *wbps) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_BLKIO, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_BLKIO, getBlkioDeviceWriteBps, -1, path, wbps); } @@ -1708,7 +1783,9 @@ virCgroupGetBlkioDeviceWeight(virCgroupPtr group, int virCgroupSetMemory(virCgroupPtr group, unsigned long long kb) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_MEMORY, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_MEMORY, setMemory, -1, kb); } @@ -1735,7 +1812,9 @@ virCgroupGetMemoryStat(virCgroupPtr group, unsigned long long *inactiveFile, unsigned long long *unevictable) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_MEMORY, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_MEMORY, getMemoryStat, -1, cache, activeAnon, inactiveAnon, activeFile, inactiveFile, @@ -1754,7 +1833,9 @@ virCgroupGetMemoryStat(virCgroupPtr group, int virCgroupGetMemoryUsage(virCgroupPtr group, unsigned long *kb) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_MEMORY, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_MEMORY, getMemoryUsage, -1, kb); } @@ -1770,7 +1851,9 @@ virCgroupGetMemoryUsage(virCgroupPtr group, unsigned long *kb) int virCgroupSetMemoryHardLimit(virCgroupPtr group, unsigned long long kb) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_MEMORY, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_MEMORY, setMemoryHardLimit, -1, kb); } @@ -1786,7 +1869,9 @@ virCgroupSetMemoryHardLimit(virCgroupPtr group, unsigned long long kb) int virCgroupGetMemoryHardLimit(virCgroupPtr group, unsigned long long *kb) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_MEMORY, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_MEMORY, getMemoryHardLimit, -1, kb); } @@ -1802,7 +1887,9 @@ virCgroupGetMemoryHardLimit(virCgroupPtr group, unsigned long long *kb) int virCgroupSetMemorySoftLimit(virCgroupPtr group, unsigned long long kb) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_MEMORY, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_MEMORY, setMemorySoftLimit, -1, kb); } @@ -1818,7 +1905,9 @@ virCgroupSetMemorySoftLimit(virCgroupPtr group, unsigned long long kb) int virCgroupGetMemorySoftLimit(virCgroupPtr group, unsigned long long *kb) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_MEMORY, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_MEMORY, getMemorySoftLimit, -1, kb); } @@ -1834,7 +1923,9 @@ virCgroupGetMemorySoftLimit(virCgroupPtr group, unsigned long long *kb) int virCgroupSetMemSwapHardLimit(virCgroupPtr group, unsigned long long kb) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_MEMORY, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_MEMORY, setMemSwapHardLimit, -1, kb); } @@ -1850,7 +1941,9 @@ virCgroupSetMemSwapHardLimit(virCgroupPtr group, unsigned long long kb) int virCgroupGetMemSwapHardLimit(virCgroupPtr group, unsigned long long *kb) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_MEMORY, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_MEMORY, getMemSwapHardLimit, -1, kb); } @@ -1866,7 +1959,9 @@ virCgroupGetMemSwapHardLimit(virCgroupPtr group, unsigned long long *kb) int virCgroupGetMemSwapUsage(virCgroupPtr group, unsigned long long *kb) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_MEMORY, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_MEMORY, getMemSwapUsage, -1, kb); } @@ -1882,7 +1977,9 @@ virCgroupGetMemSwapUsage(virCgroupPtr group, unsigned long long *kb) int virCgroupSetCpusetMems(virCgroupPtr group, const char *mems) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_CPUSET, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_CPUSET, setCpusetMems, -1, mems); } @@ -1898,7 +1995,9 @@ virCgroupSetCpusetMems(virCgroupPtr group, const char *mems) int virCgroupGetCpusetMems(virCgroupPtr group, char **mems) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_CPUSET, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_CPUSET, getCpusetMems, -1, mems); } @@ -1914,7 +2013,9 @@ virCgroupGetCpusetMems(virCgroupPtr group, char **mems) int virCgroupSetCpusetMemoryMigrate(virCgroupPtr group, bool migrate) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_CPUSET, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_CPUSET, setCpusetMemoryMigrate, -1, migrate); } @@ -1930,7 +2031,9 @@ virCgroupSetCpusetMemoryMigrate(virCgroupPtr group, bool migrate) int virCgroupGetCpusetMemoryMigrate(virCgroupPtr group, bool *migrate) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_CPUSET, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_CPUSET, getCpusetMemoryMigrate, -1, migrate); } @@ -1946,7 +2049,9 @@ virCgroupGetCpusetMemoryMigrate(virCgroupPtr group, bool *migrate) int virCgroupSetCpusetCpus(virCgroupPtr group, const char *cpus) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_CPUSET, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_CPUSET, setCpusetCpus, -1, cpus); } @@ -1962,7 +2067,9 @@ virCgroupSetCpusetCpus(virCgroupPtr group, const char *cpus) int virCgroupGetCpusetCpus(virCgroupPtr group, char **cpus) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_CPUSET, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_CPUSET, getCpusetCpus, -1, cpus); } @@ -1977,7 +2084,9 @@ virCgroupGetCpusetCpus(virCgroupPtr group, char **cpus) int virCgroupDenyAllDevices(virCgroupPtr group) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_DEVICES, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_DEVICES, denyAllDevices, -1); } @@ -1998,7 +2107,9 @@ virCgroupDenyAllDevices(virCgroupPtr group) int virCgroupAllowAllDevices(virCgroupPtr group, int perms) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_DEVICES, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_DEVICES, allowAllDevices, -1, perms); } @@ -2018,7 +2129,9 @@ int virCgroupAllowDevice(virCgroupPtr group, char type, int major, int minor, int perms) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_DEVICES, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_DEVICES, allowDevice, -1, type, major, minor, perms); } @@ -2044,6 +2157,7 @@ virCgroupAllowDevicePath(virCgroupPtr group, bool ignoreEacces) { struct stat sb; + virCgroupPtr parent = virCgroupGetNested(group); if (stat(path, &sb) < 0) { if (errno == EACCES && ignoreEacces) @@ -2058,7 +2172,7 @@ virCgroupAllowDevicePath(virCgroupPtr group, if (!S_ISCHR(sb.st_mode) && !S_ISBLK(sb.st_mode)) return 1; - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_DEVICES, + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_DEVICES, allowDevice, -1, S_ISCHR(sb.st_mode) ? 'c' : 'b', major(sb.st_rdev), @@ -2082,7 +2196,9 @@ int virCgroupDenyDevice(virCgroupPtr group, char type, int major, int minor, int perms) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_DEVICES, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_DEVICES, denyDevice, -1, type, major, minor, perms); } @@ -2108,6 +2224,7 @@ virCgroupDenyDevicePath(virCgroupPtr group, bool ignoreEacces) { struct stat sb; + virCgroupPtr parent = virCgroupGetNested(group); if (stat(path, &sb) < 0) { if (errno == EACCES && ignoreEacces) @@ -2122,7 +2239,7 @@ virCgroupDenyDevicePath(virCgroupPtr group, if (!S_ISCHR(sb.st_mode) && !S_ISBLK(sb.st_mode)) return 1; - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_DEVICES, + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_DEVICES, denyDevice, -1, S_ISCHR(sb.st_mode) ? 'c' : 'b', major(sb.st_rdev), @@ -2383,7 +2500,9 @@ virCgroupGetCpuShares(virCgroupPtr group, unsigned long long *shares) int virCgroupSetCpuCfsPeriod(virCgroupPtr group, unsigned long long cfs_period) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_CPU, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_CPU, setCpuCfsPeriod, -1, cfs_period); } @@ -2399,7 +2518,9 @@ virCgroupSetCpuCfsPeriod(virCgroupPtr group, unsigned long long cfs_period) int virCgroupGetCpuCfsPeriod(virCgroupPtr group, unsigned long long *cfs_period) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_CPU, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_CPU, getCpuCfsPeriod, -1, cfs_period); } @@ -2416,7 +2537,9 @@ virCgroupGetCpuCfsPeriod(virCgroupPtr group, unsigned long long *cfs_period) int virCgroupSetCpuCfsQuota(virCgroupPtr group, long long cfs_quota) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_CPU, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_CPU, setCpuCfsQuota, -1, cfs_quota); } @@ -2424,7 +2547,9 @@ virCgroupSetCpuCfsQuota(virCgroupPtr group, long long cfs_quota) int virCgroupGetCpuacctPercpuUsage(virCgroupPtr group, char **usage) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_CPUACCT, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_CPUACCT, getCpuacctPercpuUsage, -1, usage); } @@ -2727,7 +2852,9 @@ virCgroupKillPainfully(virCgroupPtr group) int virCgroupGetCpuCfsQuota(virCgroupPtr group, long long *cfs_quota) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_CPU, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_CPU, getCpuCfsQuota, -1, cfs_quota); } @@ -2735,7 +2862,9 @@ virCgroupGetCpuCfsQuota(virCgroupPtr group, long long *cfs_quota) int virCgroupGetCpuacctUsage(virCgroupPtr group, unsigned long long *usage) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_CPUACCT, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_CPUACCT, getCpuacctUsage, -1, usage); } @@ -2744,7 +2873,9 @@ int virCgroupGetCpuacctStat(virCgroupPtr group, unsigned long long *user, unsigned long long *sys) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_CPUACCT, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_CPUACCT, getCpuacctStat, -1, user, sys); } @@ -2752,7 +2883,9 @@ virCgroupGetCpuacctStat(virCgroupPtr group, unsigned long long *user, int virCgroupSetFreezerState(virCgroupPtr group, const char *state) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_FREEZER, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_FREEZER, setFreezerState, -1, state); } @@ -2760,7 +2893,9 @@ virCgroupSetFreezerState(virCgroupPtr group, const char *state) int virCgroupGetFreezerState(virCgroupPtr group, char **state) { - VIR_CGROUP_BACKEND_CALL(group, VIR_CGROUP_CONTROLLER_FREEZER, + virCgroupPtr parent = virCgroupGetNested(group); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_FREEZER, getFreezerState, -1, state); } @@ -2770,10 +2905,11 @@ virCgroupBindMount(virCgroupPtr group, const char *oldroot, const char *mountopts) { size_t i; + virCgroupPtr parent = virCgroupGetNested(group); for (i = 0; i < VIR_CGROUP_BACKEND_TYPE_LAST; i++) { - if (group->backends[i] && - group->backends[i]->bindMount(group, oldroot, mountopts) < 0) { + if (parent->backends[i] && + parent->backends[i]->bindMount(parent, oldroot, mountopts) < 0) { return -1; } } @@ -2788,10 +2924,11 @@ int virCgroupSetOwner(virCgroupPtr cgroup, int controllers) { size_t i; + virCgroupPtr parent = virCgroupGetNested(cgroup); for (i = 0; i < VIR_CGROUP_BACKEND_TYPE_LAST; i++) { - if (cgroup->backends[i] && - cgroup->backends[i]->setOwner(cgroup, uid, gid, controllers) < 0) { + if (parent->backends[i] && + parent->backends[i]->setOwner(parent, uid, gid, controllers) < 0) { return -1; } } @@ -2810,7 +2947,9 @@ int virCgroupSetOwner(virCgroupPtr cgroup, bool virCgroupSupportsCpuBW(virCgroupPtr cgroup) { - VIR_CGROUP_BACKEND_CALL(cgroup, VIR_CGROUP_CONTROLLER_CPU, + virCgroupPtr parent = virCgroupGetNested(cgroup); + + VIR_CGROUP_BACKEND_CALL(parent, VIR_CGROUP_CONTROLLER_CPU, supportsCpuBW, false); } @@ -2818,10 +2957,11 @@ int virCgroupHasEmptyTasks(virCgroupPtr cgroup, int controller) { size_t i; + virCgroupPtr parent = virCgroupGetNested(cgroup); for (i = 0; i < VIR_CGROUP_BACKEND_TYPE_LAST; i++) { - if (cgroup->backends[i]) { - int rc = cgroup->backends[i]->hasEmptyTasks(cgroup, controller); + if (parent->backends[i]) { + int rc = parent->backends[i]->hasEmptyTasks(parent, controller); if (rc <= 0) return rc; } @@ -3630,6 +3770,7 @@ virCgroupFree(virCgroupPtr group) g_free(group->unified.mountPoint); g_free(group->unified.placement); g_free(group->unitName); + g_free(group->nested); g_free(group); } @@ -3641,9 +3782,12 @@ virCgroupDelThread(virCgroupPtr cgroup, int idx) { g_autoptr(virCgroup) new_cgroup = NULL; + virCgroupPtr parent = NULL; if (cgroup) { - if (virCgroupNewThread(cgroup, nameval, idx, false, &new_cgroup) < 0) + parent = virCgroupGetNested(cgroup); + + if (virCgroupNewThread(parent, nameval, idx, false, &new_cgroup) < 0) return -1; /* Remove the offlined cgroup */ diff --git a/src/util/vircgrouppriv.h b/src/util/vircgrouppriv.h index f88d00814d..7c63780800 100644 --- a/src/util/vircgrouppriv.h +++ b/src/util/vircgrouppriv.h @@ -66,8 +66,12 @@ struct _virCgroup { virCgroupV2Controller unified; char *unitName; + virCgroupPtr nested; }; +#define virCgroupGetNested(cgroup) \ + (cgroup->nested ? cgroup->nested : cgroup) + int virCgroupSetValueDBus(const char *unitName, const char *key, GVariant *value); diff --git a/src/util/vircgroupv1.c b/src/util/vircgroupv1.c index 8a9a63e0b7..79015ddaa4 100644 --- a/src/util/vircgroupv1.c +++ b/src/util/vircgroupv1.c @@ -338,6 +338,8 @@ virCgroupV1DetectPlacement(virCgroupPtr group, for (i = 0; i < VIR_CGROUP_CONTROLLER_LAST; i++) { const char *typestr = virCgroupV1ControllerTypeToString(i); + g_autofree char* placement = NULL; + char *tmp = NULL; if (!virCgroupV1MountOptsMatchController(controllers, typestr)) continue; @@ -348,17 +350,24 @@ virCgroupV1DetectPlacement(virCgroupPtr group, if (group->legacy[i].placement) continue; + /* On systemd we create a nested cgroup for some cgroup tasks + * but the placement should point to the root cgroup. */ + placement = g_strdup(selfpath); + tmp = g_strrstr(placement, "/libvirt"); + if (tmp) + *tmp = '\0'; + /* * selfpath == "/" + path="" -> "/" * selfpath == "/libvirt.service" + path == "" -> "/libvirt.service" * selfpath == "/libvirt.service" + path == "foo" -> "/libvirt.service/foo" */ if (i == VIR_CGROUP_CONTROLLER_SYSTEMD) { - group->legacy[i].placement = g_strdup(selfpath); + group->legacy[i].placement = g_strdup(placement); } else { - bool delim = STREQ(selfpath, "/") || STREQ(path, ""); + bool delim = STREQ(placement, "/") || STREQ(path, ""); - group->legacy[i].placement = g_strdup_printf("%s%s%s", selfpath, + group->legacy[i].placement = g_strdup_printf("%s%s%s", placement, delim ? "" : "/", path); } diff --git a/src/util/vircgroupv2.c b/src/util/vircgroupv2.c index c96f405dfa..0fb5188f45 100644 --- a/src/util/vircgroupv2.c +++ b/src/util/vircgroupv2.c @@ -208,6 +208,12 @@ virCgroupV2DetectPlacement(virCgroupPtr group, if (tmp) *tmp = '\0'; + /* On systemd we create a nested cgroup for some cgroup tasks + * but the placement should point to the root cgroup. */ + tmp = g_strrstr(placement, "/libvirt"); + if (tmp) + *tmp = '\0'; + /* * selfpath == "/" + path="" -> "/" * selfpath == "/libvirt.service" + path == "" -> "/libvirt.service" -- 2.29.2

Signed-off-by: Pavel Hrdina <phrdina@redhat.com> --- tests/vircgroupdata/systemd-legacy.cgroups | 12 ++++ tests/vircgroupdata/systemd-legacy.mounts | 11 +++ .../vircgroupdata/systemd-legacy.self.cgroup | 11 +++ tests/vircgroupdata/systemd-unified.cgroups | 13 ++++ tests/vircgroupdata/systemd-unified.mounts | 1 + .../vircgroupdata/systemd-unified.self.cgroup | 1 + tests/vircgrouptest.c | 72 +++++++++++++++++++ 7 files changed, 121 insertions(+) create mode 100644 tests/vircgroupdata/systemd-legacy.cgroups create mode 100644 tests/vircgroupdata/systemd-legacy.mounts create mode 100644 tests/vircgroupdata/systemd-legacy.self.cgroup create mode 100644 tests/vircgroupdata/systemd-unified.cgroups create mode 100644 tests/vircgroupdata/systemd-unified.mounts create mode 100644 tests/vircgroupdata/systemd-unified.self.cgroup diff --git a/tests/vircgroupdata/systemd-legacy.cgroups b/tests/vircgroupdata/systemd-legacy.cgroups new file mode 100644 index 0000000000..444354e3c8 --- /dev/null +++ b/tests/vircgroupdata/systemd-legacy.cgroups @@ -0,0 +1,12 @@ +#subsys_name hierarchy num_cgroups enabled +blkio 1 1 1 +cpu 2 1 1 +cpuacct 3 1 1 +cpuset 4 1 1 +devices 5 1 1 +freezer 6 1 1 +hugetlb 7 1 1 +memory 8 1 1 +net_cls 9 1 1 +perf_event 10 1 1 +pids 11 1 1 diff --git a/tests/vircgroupdata/systemd-legacy.mounts b/tests/vircgroupdata/systemd-legacy.mounts new file mode 100644 index 0000000000..23462e9e68 --- /dev/null +++ b/tests/vircgroupdata/systemd-legacy.mounts @@ -0,0 +1,11 @@ +cgroup /not/really/sys/fs/cgroup/blkio cgroup rw,seclabel,nosuid,nodev,noexec,relatime,blkio 0 0 +cgroup /not/really/sys/fs/cgroup/cpu cgroup rw,seclabel,nosuid,nodev,noexec,relatime,cpu 0 0 +cgroup /not/really/sys/fs/cgroup/cpuacct cgroup rw,seclabel,nosuid,nodev,noexec,relatime,cpuacct 0 0 +cgroup /not/really/sys/fs/cgroup/cpuset cgroup rw,seclabel,nosuid,nodev,noexec,relatime,cpuset 0 0 +cgroup /not/really/sys/fs/cgroup/devices cgroup rw,seclabel,nosuid,nodev,noexec,relatime,devices 0 0 +cgroup /not/really/sys/fs/cgroup/freezer cgroup rw,seclabel,nosuid,nodev,noexec,relatime,freezer 0 0 +cgroup /not/really/sys/fs/cgroup/hugetlb cgroup rw,seclabel,nosuid,nodev,noexec,relatime,hugetlb 0 0 +cgroup /not/really/sys/fs/cgroup/memory cgroup rw,seclabel,nosuid,nodev,noexec,relatime,memory 0 0 +cgroup /not/really/sys/fs/cgroup/net_cls cgroup rw,seclabel,nosuid,nodev,noexec,relatime,net_cls 0 0 +cgroup /not/really/sys/fs/cgroup/perf_event cgroup rw,seclabel,nosuid,nodev,noexec,relatime,perf_event 0 0 +cgroup /not/really/sys/fs/cgroup/pids cgroup rw,seclabel,nosuid,nodev,noexec,relatime,pids 0 0 diff --git a/tests/vircgroupdata/systemd-legacy.self.cgroup b/tests/vircgroupdata/systemd-legacy.self.cgroup new file mode 100644 index 0000000000..5c133a3c08 --- /dev/null +++ b/tests/vircgroupdata/systemd-legacy.self.cgroup @@ -0,0 +1,11 @@ +1:blkio:/libvirt +2:cpu:/libvirt/emulator +3:cpuacct:/libvirt/emulator +4:cpuset:/libvirt/emulator +5:devices:/libvirt +6:freezer:/libvirt +7:hugetlb:/ +8:memory:/libvirt +9:net_cls:/libvirt +10:perf_event:/libvirt +11:pids:/ diff --git a/tests/vircgroupdata/systemd-unified.cgroups b/tests/vircgroupdata/systemd-unified.cgroups new file mode 100644 index 0000000000..e0d8a3561c --- /dev/null +++ b/tests/vircgroupdata/systemd-unified.cgroups @@ -0,0 +1,13 @@ +#subsys_name hierarchy num_cgroups enabled +cpuset 0 1 1 +cpu 0 1 1 +cpuacct 0 1 1 +blkio 0 1 1 +memory 0 1 1 +devices 0 1 1 +freezer 0 1 1 +net_cls 0 1 1 +perf_event 0 1 1 +net_prio 0 1 1 +hugetlb 0 1 1 +pids 0 1 1 diff --git a/tests/vircgroupdata/systemd-unified.mounts b/tests/vircgroupdata/systemd-unified.mounts new file mode 100644 index 0000000000..8225f37f45 --- /dev/null +++ b/tests/vircgroupdata/systemd-unified.mounts @@ -0,0 +1 @@ +cgroup2 /not/really/sys/fs/cgroup cgroup2 rw,seclabel,nosuid,nodev,noexec,relatime,nsdelegate 0 0 diff --git a/tests/vircgroupdata/systemd-unified.self.cgroup b/tests/vircgroupdata/systemd-unified.self.cgroup new file mode 100644 index 0000000000..6007ce7e18 --- /dev/null +++ b/tests/vircgroupdata/systemd-unified.self.cgroup @@ -0,0 +1 @@ +0::/libvirt/emulator diff --git a/tests/vircgrouptest.c b/tests/vircgrouptest.c index 1baa71e61c..85c5a6ebd4 100644 --- a/tests/vircgrouptest.c +++ b/tests/vircgrouptest.c @@ -576,6 +576,64 @@ static int testCgroupNewForSelfHybrid(const void *args G_GNUC_UNUSED) } +static int testCgroupNewForSelfSystemdLegacy(const void *args G_GNUC_UNUSED) +{ + g_autoptr(virCgroup) cgroup = NULL; + const char *empty[VIR_CGROUP_CONTROLLER_LAST] = { 0 }; + const char *mounts[VIR_CGROUP_CONTROLLER_LAST] = { + [VIR_CGROUP_CONTROLLER_BLKIO] = "/not/really/sys/fs/cgroup/blkio", + [VIR_CGROUP_CONTROLLER_CPU] = "/not/really/sys/fs/cgroup/cpu", + [VIR_CGROUP_CONTROLLER_CPUACCT] = "/not/really/sys/fs/cgroup/cpuacct", + [VIR_CGROUP_CONTROLLER_CPUSET] = "/not/really/sys/fs/cgroup/cpuset", + [VIR_CGROUP_CONTROLLER_DEVICES] = "/not/really/sys/fs/cgroup/devices", + [VIR_CGROUP_CONTROLLER_FREEZER] = "/not/really/sys/fs/cgroup/freezer", + [VIR_CGROUP_CONTROLLER_MEMORY] = "/not/really/sys/fs/cgroup/memory", + [VIR_CGROUP_CONTROLLER_NET_CLS] = "/not/really/sys/fs/cgroup/net_cls", + [VIR_CGROUP_CONTROLLER_PERF_EVENT] = "/not/really/sys/fs/cgroup/perf_event", + }; + const char *placement[VIR_CGROUP_CONTROLLER_LAST] = { + [VIR_CGROUP_CONTROLLER_BLKIO] = "", + [VIR_CGROUP_CONTROLLER_CPU] = "", + [VIR_CGROUP_CONTROLLER_CPUACCT] = "", + [VIR_CGROUP_CONTROLLER_CPUSET] = "", + [VIR_CGROUP_CONTROLLER_DEVICES] = "", + [VIR_CGROUP_CONTROLLER_FREEZER] = "", + [VIR_CGROUP_CONTROLLER_MEMORY] = "", + [VIR_CGROUP_CONTROLLER_NET_CLS] = "", + [VIR_CGROUP_CONTROLLER_PERF_EVENT] = "", + }; + + if (virCgroupNewSelf(&cgroup) < 0) { + fprintf(stderr, "Cannot create cgroup for self\n"); + return -1; + } + + return validateCgroup(cgroup, mounts, empty, placement, NULL, NULL, 0); +} + + +static int testCgroupNewForSelfSystemdUnified(const void *args G_GNUC_UNUSED) +{ + g_autoptr(virCgroup) cgroup = NULL; + const char *empty[VIR_CGROUP_CONTROLLER_LAST] = { 0 }; + unsigned int controllers = + (1 << VIR_CGROUP_CONTROLLER_CPU) | + (1 << VIR_CGROUP_CONTROLLER_CPUACCT) | + (1 << VIR_CGROUP_CONTROLLER_MEMORY) | + (1 << VIR_CGROUP_CONTROLLER_DEVICES) | + (1 << VIR_CGROUP_CONTROLLER_BLKIO); + + if (virCgroupNewSelf(&cgroup) < 0) { + fprintf(stderr, "Cannot create cgroup for self\n"); + return -1; + } + + return validateCgroup(cgroup, empty, empty, empty, + "/not/really/sys/fs/cgroup", "", + controllers); +} + + static int testCgroupAvailable(const void *args) { bool got = virCgroupAvailable(); @@ -1046,6 +1104,20 @@ mymain(void) ret = -1; cleanupFakeFS(fakerootdir); + fakerootdir = initFakeFS("legacy", "systemd-legacy"); + if (virTestRun("New cgroup for self (systemd-legacy)", + testCgroupNewForSelfSystemdLegacy, NULL) < 0) { + ret = -1; + } + cleanupFakeFS(fakerootdir); + + fakerootdir = initFakeFS("unified", "systemd-unified"); + if (virTestRun("New cgroup for self (systemd-unified)", + testCgroupNewForSelfSystemdUnified, NULL) < 0) { + ret = -1; + } + cleanupFakeFS(fakerootdir); + return ret == 0 ? EXIT_SUCCESS : EXIT_FAILURE; } -- 2.29.2

On 2/9/21 2:26 PM, Pavel Hrdina wrote:
When running on host with systemd there is an ownership issue of the root VM cgroup. When it is created for us by systemd using machined the owner of the root VM cgroup is systemd and we should not touch any of the files as systemd can and will modify any values configured there.
Basically we had the issue since introduction of machined support in libvirt 1.1.1 back in 2013. With systemd implementing more cgroup APIs the `systemctl daemon-reload` would change more values configured by libvirt.
The solution to the issue is to use systemd DBus APIs to configure cgroups but unfortunately they don't cover everything that libvirt needs.
For that reason we will use systemd DBus APIs only for values that affect sibling cgroups where the resources are distributed proportionally, such as blkio.weight or cpu.shares. For the remaining resources we will keep the current code where we work with the files directly but we move everything into a child cgroup of the VM root cgroup where we are free to do whatever we like including thread configuration.
Yeah, if only there was a way to tell machined to not touch CGroups we've created until they offer full set of features.
Pavel Hrdina (9): virsystemd: export virSystemdHasMachined virsystemd: introduce virSystemdGetMachineByPID virsystemd: introduce virSystemdGetMachineUnitByPID vircgroup: use DBus call to systemd for some APIs vircgroupv1: refactor virCgroupV1DetectPlacement vircgroupv2: move task into cgroup before enabling controllers vircgroup: introduce virCgroupV1Exists and virCgroupV2Exists vircgroup: introduce nested cgroup to properly work with systemd tests: add cgroup nested tests
docs/cgroups.html.in | 29 +- src/libvirt_private.syms | 2 + src/util/vircgroup.c | 300 ++++++++++++++---- src/util/vircgroupbackend.h | 5 + src/util/vircgrouppriv.h | 10 + src/util/vircgroupv1.c | 122 +++++-- src/util/vircgroupv2.c | 82 ++++- src/util/virsystemd.c | 105 +++++- src/util/virsystemd.h | 4 + tests/vircgroupdata/systemd-legacy.cgroups | 12 + tests/vircgroupdata/systemd-legacy.mounts | 11 + .../vircgroupdata/systemd-legacy.self.cgroup | 11 + tests/vircgroupdata/systemd-unified.cgroups | 13 + tests/vircgroupdata/systemd-unified.mounts | 1 + .../vircgroupdata/systemd-unified.self.cgroup | 1 + tests/vircgrouptest.c | 72 +++++ tests/virsystemdtest.c | 39 ++- 17 files changed, 687 insertions(+), 132 deletions(-) create mode 100644 tests/vircgroupdata/systemd-legacy.cgroups create mode 100644 tests/vircgroupdata/systemd-legacy.mounts create mode 100644 tests/vircgroupdata/systemd-legacy.self.cgroup create mode 100644 tests/vircgroupdata/systemd-unified.cgroups create mode 100644 tests/vircgroupdata/systemd-unified.mounts create mode 100644 tests/vircgroupdata/systemd-unified.self.cgroup
Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Michal
participants (2)
-
Michal Privoznik
-
Pavel Hrdina