
On Fri, 21.10.16 11:19, Daniel P. Berrange (berrange@redhat.com) wrote:
On Thu, Oct 20, 2016 at 02:59:45PM -0400, Tejun Heo wrote:
(Reposting with libvir-list cc'd. Sorry about the delay; I was traveling and then on vacation.)
Hello, Daniel. How have you been?
We (Facebook) are deploying cgroup v2 and internally use libvirt to manage virtual machines, so I'm trying to add cgroup v2 support to libvirt.
Because cgroup v2's resource configurations differ from v1's to varying degrees depending on the specific resource type, v2 unfortunately introduces new configurations (some completely new, others only differing in range or format). This means that adding cgroup v2 support to libvirt requires adding new config options and possibly implementing some form of translation between overlapping configs.
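As an illustration of the kind of translation meant here, a minimal sketch follows. It assumes one plausible scheme, not something libvirt or systemd is known to implement: scale v1 `cpu.shares` (range 2..262144, default 1024) so that the v1 default lands on the default of the v2 `cpu.weight` interface (range 1..10000, default 100), clamping at both ends, and map v1's "-1 means unlimited" memory limit onto v2's literal `max`. The helper names are hypothetical.

```python
from typing import Optional

# Interface ranges assumed by this sketch:
#   v1 cpu.shares: 2..262144, default 1024
#   v2 cpu.weight: 1..10000,  default 100
CPU_SHARES_DEFAULT = 1024
CPU_WEIGHT_DEFAULT = 100
CPU_WEIGHT_MIN = 1
CPU_WEIGHT_MAX = 10000


def cpu_shares_to_weight(shares: int) -> int:
    """Hypothetical v1 -> v2 mapping: scale so 1024 shares becomes weight 100,
    then clamp into the v2 range (floor division, so small values collapse to 1)."""
    weight = shares * CPU_WEIGHT_DEFAULT // CPU_SHARES_DEFAULT
    return max(CPU_WEIGHT_MIN, min(CPU_WEIGHT_MAX, weight))


def memory_limit_to_v2(limit_bytes: Optional[int]) -> str:
    """v1 memory.limit_in_bytes accepts -1 for "no limit"; v2 memory.max
    expects the literal string "max" instead. None is also treated as unlimited."""
    if limit_bytes is None or limit_bytes < 0:
        return "max"
    return str(limit_bytes)


if __name__ == "__main__":
    print(cpu_shares_to_weight(1024))   # 100
    print(memory_limit_to_v2(-1))       # max
```

Note that a mapping like this is lossy at the edges: every shares value above 102400 becomes weight 10000, and very small values all become 1, which is one reason a translation layer needs care.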
The upcoming systemd release includes everything needed for v1/v2 compatibility, so users setting resource configs through systemd don't have to worry about whether v1 or v2 is in use. I'm wondering whether it would make sense for libvirt to use D-Bus calls to systemd to set resource configs when systemd is in use, so that it can piggyback on systemd's v1/v2 compatibility.
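A minimal sketch of what going through systemd could look like, assuming a systemd that understands unified-hierarchy properties such as `CPUWeight=` and `MemoryMax=`: instead of writing cgroupfs files, build a `systemctl set-property` invocation, which systemctl turns into a `SetUnitProperties` call on the `org.freedesktop.systemd1.Manager` D-Bus interface. The unit name `machine-qemu-guest.scope` and the helper are hypothetical; a real implementation would more likely make the D-Bus call directly.

```python
import shlex
from typing import Dict, List


def set_property_argv(unit: str, props: Dict[str, str], runtime: bool = True) -> List[str]:
    """Build (but do not run) a `systemctl set-property` command line.

    systemctl forwards these assignments to systemd over D-Bus
    (org.freedesktop.systemd1.Manager.SetUnitProperties), and systemd
    applies them to whichever cgroup hierarchy the host uses.
    """
    argv = ["systemctl", "set-property"]
    if runtime:
        argv.append("--runtime")  # do not persist the change across reboots
    argv.append(unit)
    argv.extend(f"{key}={value}" for key, value in props.items())
    return argv


if __name__ == "__main__":
    # Hypothetical unit name and values, for illustration only.
    argv = set_property_argv("machine-qemu-guest.scope",
                             {"CPUWeight": "200", "MemoryMax": "2G"})
    print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run(argv, check=True)`. The appeal of this route is that the caller never needs to know whether `cpu.weight` or `cpu.shares` ends up being written.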
The big question I have around cgroup v2 is the state of support for all the controllers that libvirt uses (cpu, cpuacct, cpuset, memory, devices, freezer, blkio). IIUC, not all of these have been ported to the cgroup v2 setup, and the cpu port in particular was rejected by the Linux maintainers. Libvirt has a general policy that we won't support features that only exist in out-of-tree patches (this applies to the kernel and any other software we build against or use).
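One way to answer the "which controllers are available" question on a given host is to compare libvirt's list against the root `cgroup.controllers` file, which on a v2 mount lists the available controllers on a single space-separated line. The sketch below is hypothetical: it only encodes the rename of `blkio` to its v2 name `io` and otherwise assumes the names match.

```python
from typing import Iterable, Set

# v1 controller names whose v2 counterpart is spelled differently.
# Only the blkio -> io rename is encoded; other names are assumed unchanged.
V1_TO_V2_NAME = {"blkio": "io"}


def missing_controllers(controllers_text: str, required_v1: Iterable[str]) -> Set[str]:
    """Return the v1 controller names that have no matching entry in the
    contents of a v2 cgroup.controllers file."""
    available = set(controllers_text.split())
    return {name for name in required_v1
            if V1_TO_V2_NAME.get(name, name) not in available}


if __name__ == "__main__":
    libvirt_needs = ["cpu", "cpuacct", "cpuset", "memory", "devices", "freezer", "blkio"]
    # Example file contents; on a real host read /sys/fs/cgroup/cgroup.controllers.
    print(sorted(missing_controllers("cpuset cpu io memory pids", libvirt_needs)))
```

A controller reported missing (for instance `cpuacct`, which has no separate v2 controller) then needs either a v2 substitute or a fallback path.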
IIRC from earlier discussions, the model for dealing with processes in cgroup v2 was quite different. In libvirt we rely on the ability to assign different threads within a process to different cgroups, because we need to control CPU scheduler parameters on different threads in QEMU, e.g. we have vCPU threads, I/O threads and general emulator threads, each of which gets different policies.
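For context, per-thread placement on v1 looks roughly like the following sketch, assuming a Linux host and a v1 controller directory the caller is allowed to write: enumerate a process's threads via `/proc/<pid>/task`, then move an individual thread by writing its TID to that cgroup's `tasks` file (writing to `cgroup.procs` would instead move the whole process). The helper names are illustrative, not libvirt's.

```python
import os
from typing import List


def thread_ids(pid: int, proc_root: str = "/proc") -> List[int]:
    """List the TIDs of every thread in process `pid` (Linux-specific)."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    return sorted(int(name) for name in os.listdir(task_dir))


def move_thread_v1(cgroup_dir: str, tid: int) -> None:
    """Move a single thread into a cgroup v1 directory.

    In v1, `tasks` takes thread IDs and moves only that thread, whereas
    `cgroup.procs` moves every thread of the process at once.
    """
    with open(os.path.join(cgroup_dir, "tasks"), "w") as f:
        f.write(f"{tid}\n")


if __name__ == "__main__":
    # On Linux the main thread's TID equals the PID.
    print(os.getpid() in thread_ids(os.getpid()))
```

With this model, each vCPU thread's TID could be written into its own child cgroup (for example a hypothetical `vcpu0` directory under the VM's cpu cgroup), which is exactly the pattern that needs rethinking under v2.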
When I spoke with Lennart about cgroup v2, way back in Jan, he indicated that while systemd can technically work on a system where some controllers are mounted as v1 while others are mounted as v2, this would not be an officially supported solution. Thus systemd in Fedora was not likely to switch to v2 until all required controllers could use v2. I'm not sure if this still corresponds to Lennart's current views, so CC'ing him to confirm/deny.
So, the "hybrid" mode is probably nothing RHEL or similar would want to support. However, I think it might be a good step for Fedora at least. But yes, supporting this mode means additional porting effort for the various daemons that access cgroupfs...
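A small sketch of how a daemon might tell which of these setups it is running on, assuming `/proc/self/mounts`-style input: a host with only `cgroup2` mounts is treated as unified, one with only `cgroup` (v1) mounts as legacy, and one with both as the hybrid case discussed here. The classification labels are this sketch's own, and mount points are deliberately ignored.

```python
def cgroup_mode(mounts_text: str) -> str:
    """Classify a host from /proc/self/mounts-style text.

    Fields per line: device, mount point, fs type, options, dump, pass.
    Only the fs type matters here: "cgroup" is v1, "cgroup2" is v2.
    """
    has_v1 = has_v2 = False
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        fstype = fields[2]
        if fstype == "cgroup":
            has_v1 = True
        elif fstype == "cgroup2":
            has_v2 = True
    if has_v1 and has_v2:
        return "hybrid"
    if has_v2:
        return "unified"
    if has_v1:
        return "legacy"
    return "none"


if __name__ == "__main__":
    with open("/proc/self/mounts") as f:
        print(cgroup_mode(f.read()))
```

A daemon supporting hybrid mode would then have to look up, per controller, which hierarchy it lives on, which is where the extra porting effort comes from.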
I recall that the systemd policy for v2 was intended to be that no app should write to cgroup sysfs except for systemd, unless there was a sub-tree created with Delegate=yes set on the scope. So this clearly means that when using v2 we'll have to use the systemd D-Bus APIs for managing cgroups v2 on such hosts.
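A minimal sketch of obtaining such a delegated subtree from the command line, assuming a systemd whose transient scope units accept `Delegate=`: `systemd-run --scope` starts the command in a new transient scope (systemd-run does this through the `StartTransientUnit` D-Bus method), and `--property=Delegate=yes` hands that scope's subtree to the process. The unit name, slice and QEMU command line are hypothetical.

```python
import shlex
from typing import List, Sequence


def delegated_scope_argv(unit: str, command: Sequence[str],
                         slice_name: str = "machine.slice") -> List[str]:
    """Build (but do not run) a systemd-run command that starts `command`
    in a new transient scope whose cgroup subtree is delegated to it."""
    return [
        "systemd-run",
        "--scope",                   # run in a transient scope, not a service
        f"--unit={unit}",
        f"--slice={slice_name}",
        "--property=Delegate=yes",   # the scope may manage its own subtree
        "--",
        *command,
    ]


if __name__ == "__main__":
    # Hypothetical unit name and command line, for illustration only.
    argv = delegated_scope_argv("qemu-guest.scope",
                                ["qemu-system-x86_64", "-name", "guest"])
    print(shlex.join(argv))
```

Inside the delegated scope, the process could then create its own child cgroups (for example per-vCPU groups) without going through systemd, while everything outside the scope stays under systemd's control.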
Yes, this is our policy: the cgroup tree is the private property of systemd (at least regarding write access), except when you have a service or scope unit where Delegate=yes is set, in which case you can manage your own subtree of that freely.

Lennart

-- 
Lennart Poettering, Red Hat