On Sun, Dec 25, 2016 at 12:21:18AM +0100, Guido Günther wrote:
On Sat, Dec 24, 2016 at 05:14:44PM +0100, Guido Günther wrote:
> Hi Cedric,
> On Wed, Dec 21, 2016 at 02:36:39PM +0100, Cedric Bosdonnat wrote:
> > Hey Christian,
> >
> > On Tue, 2016-12-20 at 12:29 +0100, Christian Ehrhardt wrote:
> > > Hi,
> > > I found an issue in libvirt related to libvirt-lxc, but fail to find the
> > > root cause.
> > >
> > > The TL;DR is: libvirt-lxc guests get killed on libvirt restart due to
> > > "internal error: No valid cgroup for machine"
> > >
> > > I was able to reproduce it with libvirt 1.3.1, 2.4 and 2.5 as packaged
> > > in Ubuntu and Debian.
> > > I wanted to ask for two things:
> > > - wider coverage where this does reproduce
> >
> > I couldn't reproduce here with openSUSE Tumbleweed and libvirt 2.5
> > packages.
>
> I had a short look and it seems like this sequence is killing all running
> libvirt-lxc guests reliably:
>
> # no lxc guest running yet
> export LIBVIRT_DEFAULT_URI=lxc:///
> DOMAIN=sl
> systemctl daemon-reload
>
> # start lxc guest
> virsh start ${DOMAIN}
> sleep 1 # give vm some time to start
> systemctl restart libvirtd
Using ftrace I can see that systemd moves the process into the wrong
cgroup on start:
systemd-1 [000] .... 652.333068: cgroup_attach_task: dst_root=3 dst_id=80
dst_level=2 dst_path=/system.slice/libvirtd.service pid=4073 comm=libvirt_lxc
systemd-1 [000] .... 652.333117: cgroup_attach_task: dst_root=3 dst_id=80
dst_level=2 dst_path=/system.slice/libvirtd.service pid=4073 comm=libvirt_lxc
systemd-1 [000] .... 652.333160: cgroup_attach_task: dst_root=6 dst_id=80
dst_level=2 dst_path=/system.slice/libvirtd.service pid=4073 comm=libvirt_lxc
systemd-1 [000] .... 652.333203: cgroup_attach_task: dst_root=4 dst_id=107
dst_level=2 dst_path=/system.slice/libvirtd.service pid=4073 comm=libvirt_lxc
systemd-1 [000] .... 652.333245: cgroup_attach_task: dst_root=8 dst_id=80
dst_level=2 dst_path=/system.slice/libvirtd.service pid=4073 comm=libvirt_lxc
systemd-1 [000] .... 652.333286: cgroup_attach_task: dst_root=7 dst_id=84
dst_level=2 dst_path=/system.slice/libvirtd.service pid=4073 comm=libvirt_lxc
I've attached the script to reproduce this and would be happy about
ideas of the root cause.
Ok, so when libvirt starts an LXC guest, it creates a machine slice with
systemd to hold the container processes. The machine slice has the container
PID 1 as its leader, but libvirt also adds the libvirt_lxc controller and
any qemu-nbd processes to the cgroups associated with this machine
slice... except it only does this for the resource cgroups it's using and
does *not* do this for the systemd cgroup.
So if you query libvirtd.service status, it'll show libvirt_lxc being
associated with that service, instead of the machine slice:
# systemctl status libvirtd.service
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2017-01-05 10:38:02 GMT; 10s ago
     Docs: man:libvirtd(8)
           http://libvirt.org
 Main PID: 6723 (libvirtd)
    Tasks: 20 (limit: 4915)
   CGroup: /system.slice/libvirtd.service
           ├─1547 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
           ├─1548 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/libexec/libvirt_leaseshelper
           ├─6723 /usr/sbin/libvirtd --listen
           └─6888 /usr/libexec/libvirt_lxc --name sl --console 25 --security=selinux --handshake 28

# systemctl status machine-lxc\\x2d6888\\x2dsl.scope
● machine-lxc\x2d6888\x2dsl.scope - Container lxc-6888-sl
   Loaded: loaded (/run/systemd/transient/machine-lxc\x2d6888\x2dsl.scope; transient; vendor preset: disabled)
Transient: yes
   Active: active (running) since Thu 2017-01-05 10:38:04 GMT; 13s ago
    Tasks: 1 (limit: 16384)
   Memory: 812.0K
      CPU: 25ms
   CGroup: /machine.slice/machine-lxc\x2d6888\x2dsl.scope
           └─6889 /bin/bash
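
The same split can be checked per process from /proc/<pid>/cgroup, without
going through systemctl. What follows is only a rough sketch, assuming the
cgroup v1 layout seen in this thread, where each line of that file reads
"hierarchy-ID:controller-list:path". The helper names cgroup_path and
cgroup_mismatch are made up for illustration and are not part of libvirt or
systemd.

```shell
#!/bin/sh
# cgroup_path FILE CONTROLLER
# Print the cgroup path recorded for CONTROLLER in a /proc/<pid>/cgroup
# style file (cgroup v1: "hierarchy-ID:controller-list:path").
cgroup_path() {
    awk -F: -v want="$2" '
        {
            # The controller field may be a comma list, e.g. "cpu,cpuacct".
            n = split($2, ctrls, ",")
            for (i = 1; i <= n; i++) {
                if (ctrls[i] == want) {
                    # Rejoin fields 3..NF in case the path contains ":".
                    path = $3
                    for (j = 4; j <= NF; j++) path = path ":" $j
                    print path
                    exit
                }
            }
        }' "$1"
}

# cgroup_mismatch FILE CONTROLLER
# Exit 0 when both the systemd hierarchy and CONTROLLER are listed in FILE
# and their paths differ; exit non-zero otherwise.
cgroup_mismatch() {
    sd=$(cgroup_path "$1" "name=systemd")
    rc=$(cgroup_path "$1" "$2")
    [ -n "$sd" ] && [ -n "$rc" ] && [ "$sd" != "$rc" ]
}

# Hypothetical usage against the libvirt_lxc PID from the status output:
#   cgroup_mismatch /proc/6888/cgroup memory && echo "split across cgroups"
```

If the explanation above is right, the check should succeed for libvirt_lxc
(PID 6888 here) on any resource controller libvirt uses, and fail for the
container's own PID 1 (6889). The memory controller in the usage line is just
an example; which controllers are mounted varies by host.
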
Now, when you do a restart of libvirtd.service, systemd will ensure that all
the processes associated with that service are in the right cgroups, moving
them if needed. systemd only refreshes its view of cgroup placement when
you do a daemon-reload. Hence it only notices that libvirt moved libvirt_lxc
after doing a daemon-reload. Anyway, systemd moves libvirt_lxc back into
the cgroups associated with libvirtd.service.
I think to fix this, we will need to ensure that we move libvirt_lxc into
the machine slice for the systemd cgroup controller too.
Regards,
Daniel
--
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|