On Fri, Mar 20, 2020 at 05:42:57PM +0100, Pavel Hrdina wrote:
On Fri, Mar 20, 2020 at 03:59:41PM +0000, Daniel P. Berrangé wrote:
> On Fri, Mar 20, 2020 at 04:48:58PM +0100, Pavel Hrdina wrote:
> > On Fri, Mar 20, 2020 at 03:39:58PM +0000, Daniel P. Berrangé wrote:
> > > On Fri, Mar 20, 2020 at 04:30:07PM +0100, Pavel Hrdina wrote:
> > > > On Fri, Mar 20, 2020 at 01:40:10PM +0000, Daniel P. Berrangé wrote:
> > > > > This simple series allows apps to choose how cgroups are managed
by
> > > > > libvirt, between cgroupfs and machined, or disabled entirely.
> > > >
> > > > I'm not so sure about this series. The situation with cgroups
and
> > > > systemd is a bit more complex then the current code handles.
> > > >
> > > > There is an existing issue where we are violating the delegation
rules
> > > > described by cgroups and systemd.
> > > >
> > > > Currently the "cgroupfs" approach is used only on
non-systemd hosts and
> > > > we should keep it that way. Libvirt is not allowed to mangle with
> > > > cgroups owned by systemd so IMHO the warning in configuration file
is
> > > > not enough because without delegation cgroups will not work properly
in
> > > > libvirt.
> > >
> > > That isn't the case currently AFAICT from current code.
> > >
> > > Before this series, the virCgroupNewMachine method will first try
> > > systemd and then fallback to directly cgroupfs.
> > >
> > > This fallback can happen on a systemd host, when machined is not
> > > installed, as machined is an optional component.
> >
> > Right, I should have checked the code.
> >
> > > If we need to mandate use of systemd on systemd hosts, then our
> > > existing code is broken and needs fixing.
> >
> > I think we should do that because without delegation we should not touch
> > anything in cgroups.
> >
> > > I'm happy todo such a fix, and then adjust this series to take
> > > account of it. Essentially we'd allow apps to specify
'cgroupfs'
> > > or 'machined' but we'd enforce they only make safe choices.
> > >
> > > ie on a systemd host we'd only allow 'none' or
'machined'
> > >
> > > On a non-systemd host, or on a systemd host where we've
> > > been delegated a subtree, we'd only allow 'none' or
'cgroupfs'
> >
> > This sounds reasonable, I was thinking about the check if we have
> > delegated subtree but decided not to mention it. The only place where
> > we can check the delegation is using systemd API, probably over D-Bus.
> >
> > It's not reflected anywhere in the cgroup files.
>
> The key scenario for cgroupfs would be if libvirtd was run inside a
> container on a host that otherwise uses systemd. IIUC, the container
> should get given a delegated subtree and we need to be able to use
> cgroupfs backend here. Currently this works (accidentally) because
> we'll try to talk to machined and fail, so fallback to direct usage.
That should be OK and we can support this even outside of containers if
we can verify that the cgroup subtree is delegated and it's safe to use
it for VMs.
The wrong usage of delegation by libvirt is the source of the reported
BUG so we should be restrictive.
Do you have a pointer to the bugs you've seen in this area. In particular
I'm wondering if there's a need for different behaviour with v1 vs v2
cgroups. Systemd has always wanted proper delegation behaviour from apps,
but IIUC, this was mostly a nice-to-have under v1, compared to a must-have
under v2.
My big concern is that we put in a restriction and accidentally break
someone's deployment scenario under v1.
Regards,
Daniel
--
|:
https://berrange.com -o-
https://www.flickr.com/photos/dberrange :|
|:
https://libvirt.org -o-
https://fstop138.berrange.com :|
|:
https://entangle-photo.org -o-
https://www.instagram.com/dberrange :|