On Tue, Apr 07, 2020 at 12:48:26PM +0200, Marc-André Lureau wrote:
Hi
On Tue, Apr 7, 2020 at 10:55 AM Pavel Hrdina <phrdina(a)redhat.com> wrote:
>
> On Mon, Apr 06, 2020 at 11:26:57PM +0200, marcandre.lureau(a)redhat.com wrote:
> > From: Marc-André Lureau <marcandre.lureau(a)redhat.com>
> >
> > Hi,
> >
> > This is a small series that allows basic QEMU VM CGroup support with
> > the help of machined --user:
> >
> > https://github.com/systemd/systemd/pull/15312
> >
> > The first few patches are fixes to register dbus and slirp-helper
> > correctly with the VM cgroup.
> >
> > A few changes are made to the machined support: adding session
> > support, and registering the VM to get a systemd scope cgroup under
> > the user machine.slice.
>
> Hi,
>
> Before we start with anything, I would like to know: what is the
> motivation behind having CGroup support for session VMs?
My initial motivation was to have a way to group VM processes and kill
them all together, because I tend to have a lot of them lying around
after a while.
Given that systemd --user is very capable and based on
https://www.freedesktop.org/wiki/Software/systemd/writing-vm-managers/,
I thought that was probably the way to go.
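A minimal Python sketch of the "kill them all together" idea, assuming cgroups v2 and a VM scope directory whose path you already know (the path in the usage comment is hypothetical). It reads every `cgroup.procs` file under the scope, including child cgroups, and signals each PID. The names `read_cgroup_pids` and `kill_cgroup` are illustrative only, not libvirt or systemd API; stopping the scope unit through systemd would probably be the more robust route in practice.

```python
import os
import signal
from pathlib import Path
from typing import Callable, List


def read_cgroup_pids(cgroup_dir: str) -> List[int]:
    """Collect PIDs from cgroup.procs in cgroup_dir and all of its child cgroups."""
    pids: List[int] = []
    for root, _dirs, files in os.walk(cgroup_dir):
        if "cgroup.procs" in files:
            text = Path(root, "cgroup.procs").read_text()
            # One PID per line; ignore blank lines.
            pids.extend(int(line) for line in text.splitlines() if line.strip())
    return pids


def kill_cgroup(cgroup_dir: str, sig: int = signal.SIGTERM,
                kill: Callable[[int, int], None] = os.kill) -> List[int]:
    """Send `sig` to every process found under cgroup_dir; return the PIDs tried.

    Racy by nature: a process forked after the read is missed.
    """
    pids = read_cgroup_pids(cgroup_dir)
    for pid in pids:
        try:
            kill(pid, sig)
        except ProcessLookupError:
            # The process exited between reading cgroup.procs and signalling it.
            pass
    return pids


# Hypothetical usage (assumed path, adjust to the real scope):
# kill_cgroup("/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/"
#             "machine.slice/machine-qemu\\x2delmarco\\x2d1\\x2dfedora.scope")
```

The `kill` parameter exists only so the sketch can be exercised without signalling real processes.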
>
> From the systemd pull request it looks like you would like to have
> session VMs under the /sys/fs/cgroup/machine.slice which is completely
> wrong as we should not mix system and session VMs under the same slice.
No, it is under user.slice, e.g. with this series:
CGroup: /user.slice/user-1000.slice/user@1000.service
├─machine.slice
│ └─machine-qemu\x2delmarco\x2d1\x2dfedora.scope
│   ├─24714 /usr/bin/swtpm socket --daemon --ctrl type=unixio,path=/run/user/1000/libvirt/qemu/run/swtpm/1-fedora-swtpm.sock,mode=0600 --tpmstate dir=/home/elmarco/.config/libvirt/qemu/swtpm/053f84e7>
│   ├─24716 /usr/bin/dbus-daemon --config-file=/run/user/1000/libvirt/qemu/run/dbus/1-fedora-dbus.conf
│   ├─24719 /home/elmarco/src/libslirp-rs/target/debug/libslirp-helper --fd=27 --dbus-id=slirp-52:54:00:9c:bb:6c --dbus-address=unix:path=/run/user/1000/libvirt/qemu/run/dbus/1-fedora-dbus.sock --exi>
│   ├─24722 /usr/bin/qemu-system-x86_64 -name guest=fedora,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/home/elmarco/.config/libvirt/qemu/lib/domain-1-fedora/master-key.aes -obje>
│   └─emulator
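The scope name in this tree is escaped by systemd (`\x2d` stands for `-`). A small Python sketch that recovers the machine name, assuming the `machine-<escaped name>.scope` pattern seen in the tree. The helpers are hypothetical and only decode `\xNN` escapes; systemd's full unescaping also maps `-` back to `/`, which is deliberately left out here.

```python
import re

# Matches one systemd "\xNN" escape sequence.
_HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")


def unescape_unit_part(escaped: str) -> str:
    """Decode \\xNN escapes in one part of a systemd unit name.

    Simplified sketch: systemd's full unescaping also maps "-" back to "/",
    which is intentionally not done here.
    """
    out = bytearray()
    i = 0
    while i < len(escaped):
        m = _HEX_ESCAPE.match(escaped, i)
        if m:
            # Escapes encode raw bytes, so collect bytes and decode UTF-8 at the end.
            out.append(int(m.group(1), 16))
            i = m.end()
        else:
            out.extend(escaped[i].encode("utf-8"))
            i += 1
    return out.decode("utf-8")


def machine_name_from_scope(unit: str) -> str:
    """Extract the machine name from a "machine-<escaped name>.scope" unit."""
    prefix, suffix = "machine-", ".scope"
    if not (unit.startswith(prefix) and unit.endswith(suffix)):
        raise ValueError(f"not a machine scope unit: {unit!r}")
    return unescape_unit_part(unit[len(prefix):-len(suffix)])


print(machine_name_from_scope(r"machine-qemu\x2delmarco\x2d1\x2dfedora.scope"))
# prints: qemu-elmarco-1-fedora
```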
OK, that sounds good; I did not realize it works like that if the
machine.slice file is placed under the user directory.
> In addition, it would not work because you would use the session
> D-Bus, which would start machined under the user running the session VM,
> and that user will not have permissions to do anything with the system
> machine.slice. If a regular user wants to do anything with cgroups,
> delegation has to be used, and obviously we cannot delegate the system
> machine.slice; it would have to live in a different location, and since
> the QEMU process is running under the specific user it would have to
> live within /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/
> where by default only the memory and pids controllers are available.
> Delegation would have to be set in order to get other controllers as
> well, and all of this would work only if cgroups v2 are used.
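A minimal Python sketch for checking the two preconditions discussed here: whether the unified (v2) hierarchy is mounted at /sys/fs/cgroup, and which controllers a given cgroup can use according to its `cgroup.controllers` file. The helper names and the UID 1000 path in the usage comment are assumptions for illustration.

```python
from pathlib import Path
from typing import Iterable, Set

CGROUP_ROOT = Path("/sys/fs/cgroup")


def is_cgroup_v2(root: Path = CGROUP_ROOT) -> bool:
    """True if the unified (v2) hierarchy appears to be mounted at root.

    On a pure v2 system the root cgroup has a cgroup.controllers file;
    v1 and hybrid layouts do not have one at /sys/fs/cgroup itself.
    """
    return (root / "cgroup.controllers").is_file()


def available_controllers(cgroup_dir: Path) -> Set[str]:
    """Controllers available to a cgroup, read from its cgroup.controllers."""
    # The file is a single space-separated line, e.g. "memory pids".
    return set((cgroup_dir / "cgroup.controllers").read_text().split())


def missing_controllers(cgroup_dir: Path, wanted: Iterable[str]) -> Set[str]:
    """Controllers in `wanted` that are not available to cgroup_dir."""
    return set(wanted) - available_controllers(cgroup_dir)


# Hypothetical usage for UID 1000:
# user_svc = CGROUP_ROOT / "user.slice/user-1000.slice/user@1000.service"
# if is_cgroup_v2():
#     print(missing_controllers(user_svc, ["cpu", "io", "memory", "pids"]))
```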
I thought delegation was required too, but I can't see any "Delegate="
in my user machine cgroup tree (checked with systemctl --user show; note
that /machine.slice doesn't have Delegate set either).
But you can see that basic process management works fine with the
proposed systemd series.
systemd does partial delegation for the user cgroup, so the user is able
to use the memory and pids controllers by default. If the user needs
other controllers as well, the administrator has to set Delegate=yes,
for example using 'systemctl edit user@1000.service'.
We would have to document this as a prerequisite to be able to use other
controllers as well.
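For documentation purposes, a minimal Python sketch of the drop-in that the administrator step amounts to, assuming the usual /etc/systemd/system drop-in directory. The file name `delegate.conf` and the helper `write_delegate_dropin` are hypothetical; `systemctl edit` would create its own override file instead. `Delegate=` accepts `yes` or a space-separated list of controller names.

```python
from pathlib import Path

# Contents of a systemd drop-in enabling delegation for user@<uid>.service.
DROPIN_TEMPLATE = "[Service]\nDelegate={controllers}\n"


def write_delegate_dropin(uid: int, controllers: str = "yes",
                          etc_dir: Path = Path("/etc/systemd/system")) -> Path:
    """Write <etc_dir>/user@<uid>.service.d/delegate.conf and return its path.

    Writing to /etc needs root. Afterwards, typically run `systemctl
    daemon-reload` and start a fresh user session so the change applies.
    """
    dropin_dir = etc_dir / f"user@{uid}.service.d"
    dropin_dir.mkdir(parents=True, exist_ok=True)
    path = dropin_dir / "delegate.conf"  # hypothetical file name
    path.write_text(DROPIN_TEMPLATE.format(controllers=controllers))
    return path
```

Passing an explicit list such as `"cpu io memory pids"` instead of `"yes"` delegates only those controllers.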
Yes, this is certainly cgroups v2 only.
OK, I wanted to be sure that this targets cgroups v2 only.
In general the idea sounds good, and it would allow users to prevent VMs
from consuming all the resources assigned to them, and also to get some
VM statistics gathered from cgroups.
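As an illustration of such statistics, a minimal Python sketch that reads two standard cgroup v2 interface files from a VM scope directory: `cpu.stat` (present even without the cpu controller) and `memory.current` (present only when the memory controller is enabled for that cgroup). The helper names and the chosen output keys are hypothetical.

```python
from pathlib import Path
from typing import Dict


def parse_flat_keyed(text: str) -> Dict[str, int]:
    """Parse a cgroup v2 flat-keyed file such as cpu.stat ("key value" per line)."""
    stats: Dict[str, int] = {}
    for line in text.splitlines():
        if line.strip():
            key, value = line.split()
            stats[key] = int(value)
    return stats


def vm_stats(scope_dir: Path) -> Dict[str, int]:
    """Collect a few basic statistics for one VM scope cgroup."""
    # cpu.stat always reports usage_usec, user_usec and system_usec.
    cpu = parse_flat_keyed((scope_dir / "cpu.stat").read_text())
    stats = {"cpu_usage_usec": cpu["usage_usec"]}
    # memory.current only exists if the memory controller is enabled here.
    mem = scope_dir / "memory.current"
    if mem.is_file():
        stats["memory_bytes"] = int(mem.read_text())
    return stats
```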
Most of the patches in this series look like unrelated fixes to our
current code, so I would suggest posting them separately from the
session cgroup support.
Pavel