On Wed, Apr 07, 2021 at 11:28:48PM -0400, Eric Farman wrote:
On 4/7/21 9:07 AM, Pavel Hrdina wrote:
> On Fri, Mar 26, 2021 at 05:25:03PM +0100, Eric Farman wrote:
> > The introduction of nested cgroups used a little macro
> > virCgroupGetNested() to retrieve the nested cgroup
> > pointer, if one exists. But this macro isn't used when
> > removing cgroups, resulting in some messages:
> >
> > Mar 25 20:55:17 fedora33 libvirtd[955]: unable to open
'/sys/fs/cgroup/machine.slice/machine-qemu\x2d1\x2dguest.scope/': No such file or
directory
> > Mar 25 20:55:17 fedora33 libvirtd[955]: Failed to remove cgroup for guest
> >
> > That directory exists while the guest is running, as it
> > was created by systemd/machined, so the code probably meant
> > to open the libvirt/ subdirectory from that point.
> >
> > Similarly, there happen to be BPF-related file descriptors
> > that don't get cleaned up in this process too, because they
> > are anchored off the nested cgroup location:
> >
> > [test@fedora33 ~]# ls /proc/$(pgrep libvirtd)/fd/* | wc -l
> > 35
> > [test@fedora33 ~]# virsh create guest.xml
> > Domain 'guest' created from guest.xml
> >
> > [test@fedora33 ~]# ls /proc/$(pgrep libvirtd)/fd/* | wc -l
> > 42
> > [test@fedora33 ~]# virsh shutdown guest
> > Domain 'guest' is being shutdown
> >
> > [test@fedora33 ~]# ls /proc/$(pgrep libvirtd)/fd/* | wc -l
> > 37
> > [test@fedora33 ~]# virsh create guest.xml
> > Domain 'guest' created from guest.xml
> >
> > [test@fedora33 ~]# ls /proc/$(pgrep libvirtd)/fd/* | wc -l
> > 44
> > [test@fedora33 ~]# virsh shutdown guest
> > Domain 'guest' is being shutdown
> >
> > [test@fedora33 ~]# ls /proc/$(pgrep libvirtd)/fd/* | wc -l
> > 39
> >
> > Let's fix this by using the same macro when removing cgroups,
> > so that it picks up the right structure and can remove the
> > associated resources properly.
> >
> > Fixes: 184245f53b94 ("vircgroup: introduce nested cgroup to properly work
with systemd")
> > Signed-off-by: Eric Farman <farman(a)linux.ibm.com>
> > ---
> > src/util/vircgroup.c | 5 +++--
> > 1 file changed, 3 insertions(+), 2 deletions(-)
>
> I don't thing this patch is correct. With systemd we would get the same
> error without the nested cgroup as well. It's because we terminate the
> qemu process which makes systemd remove the VM root cgroup as well.
I don't experience any problems reverting the blamed patch. The qemu cgroup
code doesn't make any distinction about systemd or not; it just calls the
virCgroupRemove() to clean up the resources that were set up here during
init:
qemuInitCgroup()
virCgroupNewMachine()
virCgroupNewMachineSystemd()
virCgroupNewNested()
The group pointer that's stashed in qemu's struct is that of the
machine-qemu...scope group, rather than the nested group, but nothing in the
cleanup path touches group->nested. My initial patch is certainly flawed (as
you explain below), so maybe something like this is better?
@@ -2615,6 +2615,9 @@ virCgroupRemove(virCgroupPtr group)
{
size_t i;
+ if (group->nested)
+ virCgroupRemove(group->nested);
+
for (i = 0; i < VIR_CGROUP_BACKEND_TYPE_LAST; i++) {
if (group->backends[i]) {
int rc = group->backends[i]->remove(group);
Not great, since it cleans up the nested group but then still attempts to
clean up the machine-qemu...scope group that was setup by systemd. This
group wasn't setup by virCgroupV2MakeGroup(), so calling virCgroupV2Remove()
seems wrong too. Not sure how to address that.
I'm not sure how this will help. As I've already pointed out calling
virCgroupRemove(group) results in calling one or both functions:
virCgroupV1Remove()
virCgroupV2Remove()
Where both of these functions will call virCgroupRemoveRecursively()
which will recursively remove all subdirectories including the nested
one.
So the extra
if (group->nested)
virCgroupRemove(group->nested);
will not help with anything.
Looking at the code (I did not test it) it looks like the error message
is produced by following this path:
qemuProcessStop()
qemuRemoveCgroup()
virCgroupTerminateMachine() this will make systemd to remove the cgroup
virCgroupRemove()
virCgroupV2Remove()
virCgroupV2DevicesRemoveProg()
virCgroupV2DevicesDetectProg()
open()
Unfortunately we cannot simply ignore ENOENT in the
virCgroupV2DevicesDetectProg() because it is used in different places
where such error should be reported.
What we can do is to introduce new parameter `bool quiet` into
virCgroupV2DevicesDetectProg() that would silence the ENOENT error if
set to true and we could use it from virCgroupV2DevicesRemoveProg()
or something similar.
Pavel
>
> This happens only on cgroup controllers managed by systemd. For example
> if you switch to cgroups v1 where each controller is in separate
> directory not all controllers supported by libvirt are also supported by
> systemd. In this case libvirt creates all the cgroups by itself and is
> responsible to cleanup as well. With this patch we would not remove the
> VM root cgroups in these controllers. This would affect following
> controllers:
>
> cpuset
> freezer
> net_cls
> net_prio
> perf_event
>
> You can verify what happens with cgroups v1 by adding
> systemd.unified_cgroup_hierarchy=0 to your kernel cmdline.
Neat, thanks for that tip.
Thanks,
Eric
>
> Pavel
>