On Wed, Sep 04, 2024 at 23:47:12 +0300, Dmitrii Abramov wrote:
Hello, Libvirt community.
We have a strange issue with libvirtd.
Hi,
please note that user-related questions are better suited for the
libvirt-users list.
We’ve been running libvirtd in Docker for several years. This year we switched to a new
generation of processors, AMD 7663, and started using a version of libvirtd that is new
to us, 8.0.0. Before this we used libvirt 6.0.
Right now we have the following situation:
if we restart the container with libvirt, or libvirt crashes and the Docker engine
restarts it, the new libvirt process sends SIGTERM to all running QEMU processes.
This is a highlight from strace of a QEMU process:
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=2045513, si_uid=0} ---
This is bash output showing that PID 2045513 is the new libvirt process
(note the uptime of the container and the PID):
milan15 : ~ [0] # ps afxj | grep 2045513
2044884 2045700 2045699 2044884 pts/3 2045699 S+ 0 0:00 \_ grep
--color=auto 2045513
2045492 2045513 2045513 2045513 ? -1 Ssl 0 0:04 \_ libvirtd -l
milan15 : ~ [0] #
milan15 : ~ [0] # docker ps | grep "libvirt$"
5b9e7d81a2f3 registry.beget.ru/vps/docker-libvirt/ubuntu/jammy:20240124-1
"libvirtd -l" 6 days ago Up 2 minutes libvirt
milan15 : ~ [0] # docker top libvirt
UID PID PPID C STIME
TTY TIME CMD
root 2045513 2045492 2 21:20
? 00:00:04 libvirtd -l
milan15 : ~ [0] #
We found that in the logs libvirtd says it is unable to access /sys/fs/cgroup.
{"log":"2024-09-04 17:40:02.831+0000: 2041803: error :
virCgroupV2ParseControllersFile:282 : Unable to read from
'/sys/fs/cgroup/../../machine/qemu-1394-mameluk-59ad0e58-732e-4468-9d3a-9be2cbac4931.libvirt-qemu/cgroup.controllers':
No such file or
directory\n","stream":"stderr","time":"2024-09-04T17:40:02.831308736Z"}
{"log":"2024-09-04 17:40:02.831+0000: 2041804: error : virFileReadAll:1447
: Failed to open file
'/sys/fs/cgroup/../../machine/qemu-1393-pavelaw3-215f218a-48d4-4b22-b15d-90ee0665f643.libvirt-qemu/cgroup.controllers':
No such file or
directory\n","stream":"stderr","time":"2024-09-04T17:40:02.83143703Z"}
{"log":"2024-09-04 17:40:02.831+0000: 2041804: error :
virCgroupV2ParseControllersFile:282 : Unable to read from
'/sys/fs/cgroup/../../machine/qemu-1393-pavelaw3-215f218a-48d4-4b22-b15d-90ee0665f643.libvirt-qemu/cgroup.controllers':
No such file or
directory\n","stream":"stderr","time":"2024-09-04T17:40:02.831453382Z"}
The function in question which fails here is called (through a few
callback pointers so it's opaque) from virCgroupNewDetectMachine(),
which is called from virDomainCgroupConnectCgroup which finally gets
called from the function attempting to reconnect to an existing qemu
instance qemuProcessReconnect().
The failure to reconnect results in the VM being terminated via
qemuProcessStop().
We ran several tests (added a sleep and listed the /sys/fs/cgroup directory before
launching the new process); it seems that every process in the container is able to
access /sys/fs/cgroup.
It seems that the path /sys/fs/cgroup/../../machine/ isn’t correct.
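Indeed, if that relative path is resolved literally, the `../..` components climb out of the cgroup2 mount at /sys/fs/cgroup altogether. A quick sketch (the VM scope name below is a placeholder, not one of the real ones from the log):

```shell
#!/bin/sh
# Resolve the kind of path seen in the error log without requiring it to
# exist; readlink -m canonicalizes even when components are missing.
path='/sys/fs/cgroup/../../machine/qemu-1-example.libvirt-qemu/cgroup.controllers'
readlink -m "$path"
# -> /sys/machine/qemu-1-example.libvirt-qemu/cgroup.controllers
```

That lands under /sys/machine/..., a directory that does not exist, which matches the ENOENT in the log.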
This path is constructed from paths which originate from the detection of
cgroup mounts (virCgroupDetectMounts), which uses /proc/mounts.
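For reference, the mount-table side of that detection can be checked by hand. A sketch that parses a captured /proc/mounts-style line the same way (the sample line is illustrative; virCgroupDetectMounts does the real parsing in C — on a live system feed awk the actual /proc/mounts):

```shell
#!/bin/sh
# Extract the cgroup2 mount point from a /proc/mounts-style line
# (fields: device, mount point, fstype, options, dump, pass).
sample='cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime 0 0'
printf '%s\n' "$sample" | awk '$3 == "cgroup2" { print $2 }'
# -> /sys/fs/cgroup
```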
But it's hard to tell because you didn't post a full debug log here or
in the upstream issue you've reported.
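For completeness, debug logging can be turned on via libvirt's documented logging settings; a minimal example for /etc/libvirt/libvirtd.conf (the filter set here is just a suggestion):

```
log_filters="1:qemu 1:cgroup 3:object 3:event"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"
```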
We also tried libvirt 10.0 and many other versions. The result is the
same.
We have the same scheme on other generations of processors, and everything works
perfectly.
This is part of the docker-compose file where we run libvirt:
version: "3.0"
services:
  libvirt:
    image: mameluk_libvirtd
    build:
      context: .
    privileged: true
    volumes:
      - /etc/docker/libvirt/etc/libvirt:/etc/libvirt:rw
      - /lib/modules:/lib/modules:ro
      - /dev:/dev
      - /sys:/sys
      - /run:/run:rw
      - /var/lib/libvirt:/var/lib/libvirt:rslave
      - /var/log/libvirt:/var/log/libvirt
      - /home/docker/svc-libvirt/images:/home/svc-libvirt/images
      - /etc/docker/libvirt/etc/lvm:/etc/lvm
      - /home/docker/svc-libvirt/cidata:/home/svc-libvirt/cidata
    ipc: host
    network_mode: host
    environment:
      - TZ=Europe/Moscow
    pid: host
    restart: on-failure
    entrypoint: ["/bin/start.sh", ""]
    # depends_on:
    #   - virtlogd
    container_name: mameluk_libvirtd
System: Ubuntu 22.04
Kernel: Linux milan15 6.5.0-35-generic #35~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue May 7
09:00:52 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Can anyone help us solve this problem?
Please note that upstream libvirt doesn't really provide support for
containerized deployments, as nobody really tests them upstream. That said,
there are users who do use libvirt this way, but it has many intricacies
that can easily be messed up.