Libvirtd sends SIGTERM to old qemu processes after restart

Hello, Libvirt community. We have one strange issue with libvirtd. We've been using libvirtd in Docker for several years. This year we switched to a new generation of processors, AMD 7663, and started to use a version of libvirtd that is new for us, 8.0.0. Before this we used libvirt 6.0.

Right now we have the following situation: if we restart the container with libvirt, or libvirt crashes and the Docker engine restarts it, the new libvirt process sends SIGTERM to all running QEMU processes. This is a highlight from strace of a qemu process:

--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=2045513, si_uid=0} ---

This is output from bash where you can see that PID 2045513 is the new libvirt process (look at the uptime of the container and the PID):

milan15 : ~ [0] # ps afxj | grep 2045513
2044884 2045700 2045699 2044884 pts/3  2045699 S+    0  0:00  \_ grep --color=auto 2045513
2045492 2045513 2045513 2045513 ?           -1 Ssl   0  0:04  \_ libvirtd -l
milan15 : ~ [0] # docker ps | grep "libvirt$"
5b9e7d81a2f3  registry.beget.ru/vps/docker-libvirt/ubuntu/jammy:20240124-1  "libvirtd -l"  6 days ago  Up 2 minutes  libvirt
milan15 : ~ [0] # docker top libvirt
UID   PID      PPID     C  STIME  TTY  TIME      CMD
root  2045513  2045492  2  21:20  ?    00:00:04  libvirtd -l

We found that in the logs libvirtd says it is unable to access /sys/fs/cgroup:

{"log":"2024-09-04 17:40:02.831+0000: 2041803: error : virCgroupV2ParseControllersFile:282 : Unable to read from '/sys/fs/cgroup/../../machine/qemu-1394-mameluk-59ad0e58-732e-4468-9d3a-9be2cbac4931.libvirt-qemu/cgroup.controllers': No such file or directory\n","stream":"stderr","time":"2024-09-04T17:40:02.831308736Z"}
{"log":"2024-09-04 17:40:02.831+0000: 2041804: error : virFileReadAll:1447 : Failed to open file '/sys/fs/cgroup/../../machine/qemu-1393-pavelaw3-215f218a-48d4-4b22-b15d-90ee0665f643.libvirt-qemu/cgroup.controllers': No such file or directory\n","stream":"stderr","time":"2024-09-04T17:40:02.83143703Z"}
{"log":"2024-09-04 17:40:02.831+0000: 2041804: error : virCgroupV2ParseControllersFile:282 : Unable to read from '/sys/fs/cgroup/../../machine/qemu-1393-pavelaw3-215f218a-48d4-4b22-b15d-90ee0665f643.libvirt-qemu/cgroup.controllers': No such file or directory\n","stream":"stderr","time":"2024-09-04T17:40:02.831453382Z"}

We made several tests (added a sleep and listed the /sys/fs/cgroup directory before launching the new process); it seems that every process in the container is able to access /sys/fs/cgroup. It seems that the path /sys/fs/cgroup/../../machine/ isn't correct.

We also tried libvirt 10.0 and many other versions. The result is the same. We have the same scheme on other generations of processors, and there everything works fine.

This is an example of our docker-compose file where we run libvirt (just a part):

version: "3.0"
services:
  libvirt:
    image: mameluk_libvirtd
    build:
      context: .
    privileged: true
    volumes:
      - /etc/docker/libvirt/etc/libvirt:/etc/libvirt:rw
      - /lib/modules:/lib/modules:ro
      - /dev:/dev
      - /sys:/sys
      - /run:/run:rw
      - /var/lib/libvirt:/var/lib/libvirt:rslave
      - /var/log/libvirt:/var/log/libvirt
      - /home/docker/svc-libvirt/images:/home/svc-libvirt/images
      - /etc/docker/libvirt/etc/lvm:/etc/lvm
      - /home/docker/svc-libvirt/cidata:/home/svc-libvirt/cidata
    ipc: host
    network_mode: host
    environment:
      - TZ=Europe/Moscow
    pid: host
    restart: on-failure
    entrypoint: ["/bin/start.sh", ""]
    # depends_on:
    #   - virtlogd
    container_name: mameluk_libvirtd

System: Ubuntu 22.04
Kernel: Linux milan15 6.5.0-35-generic #35~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue May 7 09:00:52 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

Can anyone help us to solve this problem?

--
Best Regards,
Dmitrii Abramov
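For reference, a signal trace like the one quoted above can be captured by attaching strace to a running QEMU process; <qemu_pid> below is a placeholder for the PID of the qemu process being watched:

# log only signal-related events of an already-running process
strace -f -e trace=signal -p <qemu_pid>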

On Wed, Sep 04, 2024 at 23:47:12 +0300, Dmitrii Abramov wrote:
Hello, Libvirt community. We have one strange issue with libvirtd.
Hi, please note that user-related questions are better suited for the libvirt-users list.
We've been using libvirtd in Docker for several years. [...] If we restart the container with libvirt, or libvirt crashes and the Docker engine restarts it, the new libvirt process sends SIGTERM to all running QEMU processes. [...] We found that in the logs libvirtd says it is unable to access /sys/fs/cgroup:

{"log":"2024-09-04 17:40:02.831+0000: 2041803: error : virCgroupV2ParseControllersFile:282 : Unable to read from '/sys/fs/cgroup/../../machine/qemu-1394-mameluk-59ad0e58-732e-4468-9d3a-9be2cbac4931.libvirt-qemu/cgroup.controllers': No such file or directory\n","stream":"stderr","time":"2024-09-04T17:40:02.831308736Z"}

[two similar errors for other domains trimmed]
The function in question which fails here is called (through a few callback pointers, so it's opaque) from virCgroupNewDetectMachine(), which is called from virDomainCgroupConnectCgroup(), which finally gets called from qemuProcessReconnect(), the function attempting to reconnect to an existing qemu instance. The failure to reconnect results in the VM being terminated via qemuProcessStop().
We made several tests (added a sleep and listed the /sys/fs/cgroup directory before launching the new process); it seems that every process in the container is able to access /sys/fs/cgroup. It seems that the path /sys/fs/cgroup/../../machine/ isn't correct.
This path is constructed from paths which originate from detection of cgroup mounts (virCgroupDetectMounts), which uses /proc/mounts. But it's hard to tell, because you didn't post a full debug log here or in the upstream issue you've reported.
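For reference, such a debug log can be produced by raising libvirtd's log level before reproducing the restart. A minimal sketch, assuming the host-side /etc/libvirt bind mount from the compose file above; the filter set follows the one suggested by the libvirt debug-logs documentation and may need adjusting:

# append logging settings to the host-side copy of libvirtd.conf
cat >> /etc/docker/libvirt/etc/libvirt/libvirtd.conf <<'EOF'
log_filters="3:remote 4:event 3:util.json 3:rpc 1:*"
log_outputs="1:file:/var/log/libvirt/libvirtd.log"
EOF

# restart the container to reproduce the reconnect and collect the log
docker restart libvirt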
We also tried libvirt 10.0 and many other versions. The result is the same. We have the same scheme on other generations of processors, and there everything works fine.

[docker-compose fragment and system info trimmed; see the original post above]
Please note that upstream libvirt doesn't really provide support for containerized deployments, as nobody really tests them upstream. That said, there are users who do use libvirt this way, but it has many intricacies that can be easily messed up.

On 9/5/24 10:10, Peter Krempa wrote:
On Wed, Sep 04, 2024 at 23:47:12 +0300, Dmitrii Abramov wrote:
Hello, Libvirt community. We have one strange issue with libvirtd.
We've been using libvirtd in Docker for several years. This year we switched to a new generation of processors, AMD 7663, and started to use a version of libvirtd that is new for us, 8.0.0. Before this we used libvirt 6.0.
System: Ubuntu 22.04, kernel: Linux milan15 6.5.0-35-generic #35~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue May 7 09:00:52 UTC 2 x86_64 x86_64 x86_64 GNU/Linux. Can anyone help us to solve this problem?
Please note that upstream libvirt doesn't really provide support for containerized deployments, as nobody really tests them upstream. That said, there are users who do use libvirt this way, but it has many intricacies that can be easily messed up.
I'll go one step further and ask you to test with a more recent version of libvirt. Since libvirt-8.0.0 there were numerous fixes to the cgroup handling code and it's likely this bug has been fixed.

Michal
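A quick way to confirm which libvirt version is actually running inside the container (assuming the container name libvirt used earlier; virsh may need a different connection URI depending on the setup):

# daemon binary version inside the container
docker exec -it libvirt libvirtd --version

# library and daemon versions as seen by a client
docker exec -it libvirt virsh -c qemu:///system version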

Hello, Michal. We tested the same scheme with libvirtd versions > 10.0 and had the same problems. Given Peter's explanation of how the path is chosen, the problem seems to be with cgroup v2.

--
Best Regards,
Dmitrii Abramov
Thursday, 5 September 2024, 12:42 +03:00 from Michal Prívozník <mprivozn@redhat.com>:

[nested quotes trimmed]
I'll go one step further and ask you to test with a more recent version of libvirt. Since libvirt-8.0.0 there were numerous fixes to the cgroup handling code and it's likely this bug has been fixed.
Michal

We can close the discussion. The problem is two active cgroup v2 mounts:

root@milan15:/# cat /proc/mounts | grep cgro
cgroup /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime 0 0
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime 0 0

The problem seems to be in Docker's behaviour: when I bind-mount the /sys directory, it creates the additional mount

cgroup /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime 0 0

Or maybe the problem is in the OS distribution. I'm not able to umount cgroup. And if I unmount cgroup2 (the working variant), then I'm not able to get information about the cgroup controllers:

root@milan15:/# umount cgroup
umount: cgroup: umount failed: Invalid argument.
root@milan15:/# umount cgroup2
root@milan15:/# cat /sys/fs/cgroup/cgroup.controllers
cat: /sys/fs/cgroup/cgroup.controllers: No such file or directory

If I understand the code of virCgroupDetectMounts right, it takes the first matching line from /proc/mounts, i.e. it picks the non-working mount (just from a quick look at the code).

I solved the problem by deleting the /sys bind mount and adding the option cgroup=host. In this case there is only one cgroup mount and every libvirtd process can freely access the cgroup fs.

Thank you for your help!

--
Best Regards,
Dmitrii Abramov
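For completeness, a sketch of what the compose service could look like with that fix applied, based on the description above; the cgroup attribute requires a Compose version that supports it, and the remaining settings are taken from the original file:

services:
  libvirt:
    image: mameluk_libvirtd
    privileged: true
    cgroup: host        # join the host's cgroup namespace instead of a private one
    volumes:
      # the /sys:/sys bind mount is removed -- it was creating the
      # second cgroup2 mount at /sys/fs/cgroup inside the container
      - /etc/docker/libvirt/etc/libvirt:/etc/libvirt:rw
      - /lib/modules:/lib/modules:ro
      - /dev:/dev
      - /run:/run:rw
      - /var/lib/libvirt:/var/lib/libvirt:rslave
      - /var/log/libvirt:/var/log/libvirt
    ipc: host
    network_mode: host
    pid: host
    restart: on-failure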
Thursday, 5 September 2024, 14:15 +03:00 from Dmitrii Abramov <a@mameluk.ru>:

Hello, Michal. We tested the same scheme with libvirtd versions > 10.0 and had the same problems. Given Peter's explanation of how the path is chosen, the problem seems to be with cgroup v2.

[rest of quoted thread trimmed]

Hello, Peter. Thank you for your answer.
Hi, please note that user-related questions are better suited for the libvirt-users list.

Sorry, next time I will use the correct list.

The function in question which fails here is called (through a few callback pointers, so it's opaque) from virCgroupNewDetectMachine(), which is called from virDomainCgroupConnectCgroup(), which finally gets called from qemuProcessReconnect(), the function attempting to reconnect to an existing qemu instance. The failure to reconnect results in the VM being terminated via qemuProcessStop().

Thank you, that makes sense. On the new kernel we're using cgroup v2. On the old installation we use libvirtd 7.6 + various kernels, but there we have cgroup v1. It seems the problem is exactly in cgroup v2.

This path is constructed from paths which originate from detection of cgroup mounts (virCgroupDetectMounts), which uses /proc/mounts. But it's hard to tell, because you didn't post a full debug log here or in the upstream issue you've reported.

/proc/mounts inside the container looks like:

milan15 : ~ [0] # docker exec -it libvirt cat /proc/mounts | grep cgroup
cgroup /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime 0 0
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime 0 0

On the host system, though:

milan15 : ~ [0] # cat /proc/mounts | grep cgroup
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime 0 0

Could this be a potential problem, that inside the container cgroup is mounted twice? What other technical information can I provide?

Please note that upstream libvirt doesn't really provide support for containerized deployments, as nobody really tests them upstream. That said, there are users who do use libvirt this way, but it has many intricacies that can be easily messed up.

I understand this. But we have had a working variant with libvirtd 7.6 and cgroup v1 for more than 3 years.

--
Best Regards,
Dmitrii Abramov
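A quick way to see which hierarchy a process inside the container actually resolves at /sys/fs/cgroup (illustrative commands; libvirt is the container name used above):

# prints "cgroup2fs" when the unified v2 hierarchy is visible
docker exec -it libvirt stat -fc %T /sys/fs/cgroup

# list the cgroup mounts as seen by a process in the container
docker exec -it libvirt grep cgroup /proc/self/mounts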
Thursday, 5 September 2024, 11:10 +03:00 from Peter Krempa <pkrempa@redhat.com>:

[full quoted message trimmed]