Libvirt slow after a couple of months uptime

Hello,

I have some issues with libvirtd getting slow over time. After a fresh reboot (or systemctl restart libvirtd), virsh list / virt-install is fast, as expected, but after a couple of months of uptime they both take significantly longer. virsh list takes around 3 seconds (up from 0.04s after a fresh reboot) and virt-install takes over a minute (up from around a second).

Running strace on virsh list, it seems to get stuck in a loop on this:

poll([{fd=5<socket:[173169773]>, events=POLLOUT}, {fd=6<anon_inode:[eventfd]>, events=POLLIN}], 2, -1) = 2 ([{fd=5, revents=POLLOUT}, {fd=6, revents=POLLIN}])

While restarting libvirtd fixes it, a restart takes around 1 minute while ebtables rules etc. are recreated, and it does interrupt the service.

What could cause this? How would I troubleshoot this?

I'm running Ubuntu 22.04 / libvirt 8.0.0 with 70 active VMs on a 16/32 core machine with 256GB of RAM. CPU usage is below 50% at all times, memory usage is below 50%, and swap usage is 0%.

Thanks,
André

On Fri, Sep 16, 2022 at 19:41:28 +0200, André Malm wrote:
> Hello,
> I have some issues with libvirtd getting slow over time.
> After a fresh reboot (or systemctl restart libvirtd) virsh list / virt-install is fast, as expected, but after a couple of months uptime they both take a significantly longer time.
> virsh list takes around 3 seconds (from 0.04s on a fresh reboot) and virt-install takes over a minute (from around a second).
> Running strace on virsh list it seems to get stuck in a loop on this:
> poll([{fd=5<socket:[173169773]>, events=POLLOUT}, {fd=6<anon_inode:[eventfd]>, events=POLLIN}], 2, -1) = 2 ([{fd=5, revents=POLLOUT}, {fd=6, revents=POLLIN}])

Unfortunately this bit doesn't help much. virsh is simply a client which does RPC over a unix socket to the libvirtd/virtqemud daemon, depending on your host configuration. This means that what you straced was simply an event loop waiting for communication with the server. In fact there's a whole thread dedicated to polling and dispatching the calls, so it's expected that it's always stuck in a poll().

> While restarting libvirtd fixes it

So it looks like the problem isn't in virsh at all. In that case stracing virsh won't help, as it's a completely different process from the daemon.

> a restart takes around 1 minute where ebtables rules etc are recreated and it does interrupt the service. What could cause this? How would I troubleshoot this?

The best way to at least get an idea of where the problem might be is to collect debug logs of the libvirt daemon (libvirtd or virtqemud, depending on how your host is configured). To enable debug logs you can use the following guide, which also explains how to figure out which daemon is in use and outlines how to enable logging without restarting the daemon. Make sure to read the appropriate chapters:

https://www.libvirt.org/kbase/debuglogs.html

The log contains timestamps, so we'll be able to see what bogs down the runtime once it's in the 'slow' period.
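Since the debug log is only useful once the daemon is in its slow period, it can help to notice when that period starts. The following is a minimal sketch, not part of the advice above, that times a single client command. The helper names and the 1 second threshold are assumptions chosen for illustration, based on the 0.04s and 3s figures reported in this thread.

```python
import subprocess
import time

# Assumption: anything slower than this many seconds counts as "slow".
# The thread reports about 0.04s when healthy and about 3s when degraded.
SLOW_THRESHOLD = 1.0


def time_command(argv, timeout=120.0):
    """Run argv once and return its wall-clock duration in seconds."""
    start = time.monotonic()
    # Output is discarded; only the elapsed time matters here.
    subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
        check=False,
    )
    return time.monotonic() - start


def is_slow(duration, threshold=SLOW_THRESHOLD):
    """Return True when a measured duration exceeds the threshold."""
    return duration > threshold
```

Calling `time_command(["virsh", "list"])` periodically (for example from a cron job) and checking the result with `is_slow()` would show roughly when the slowdown begins, which is the moment to turn on debug logging.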

Hello,

Thanks for the reply.

I've now run into a slow server again, and the debug log prints around 500 of these lines a second:

2022-11-11 06:55:52.948+0000: 1470: debug : virEventGLibHandleDispatch:113 : Dispatch handler data=0x7f336402ac00 watch=7 fd=24 events=1 opaque=(nil)
2022-11-11 06:55:52.948+0000: 1470: info : virEventGLibHandleDispatch:116 : EVENT_GLIB_DISPATCH_HANDLE: watch=7 events=1 cb=0x7f33a824d610 opaque=(nil)
2022-11-11 06:55:52.948+0000: 1470: debug : virEventRunDefaultImpl:341 : running default event implementation

What might be causing this?

On 2022-09-19 at 13:22, Peter Krempa wrote:
> The best way to at least get an idea where the problem might be would be to collect debug logs of the libvirt daemon (libvirtd/virtqemud based on how your host is configured).
> To enable debug logs you can use the following guide, which also explains how to figure out which daemon is in use and also outlines how to set it without restarting the daemon. Make sure to read the appropriate chapters:
> https://www.libvirt.org/kbase/debuglogs.html
> The log contains timestamps so we'll be able to see what bogs down the runtime once it's in the 'slow' period.
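To put a number on "around 500 lines a second", the debug log can be tallied per second and per function. This is a rough sketch that assumes every line follows the layout of the three lines quoted in this message (timestamp, thread id, level, function:line, message). The regular expression and function name are illustrative and not taken from libvirt.

```python
import re
from collections import Counter

# Assumed layout, inferred from the log lines quoted in this thread:
# "<date> <time>.<ms><tz>: <tid>: <level> : <function>:<line> : <message>"
LOG_LINE = re.compile(
    r"^(?P<second>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+[+-]\d{4}: "
    r"(?P<tid>\d+): (?P<level>\w+) : (?P<func>\w+):(?P<line>\d+) : "
)


def messages_per_second(lines):
    """Count log lines per (second, function); unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[(match["second"], match["func"])] += 1
    return counts


SAMPLE = [
    "2022-11-11 06:55:52.948+0000: 1470: debug : virEventGLibHandleDispatch:113 : Dispatch handler data=0x7f336402ac00 watch=7 fd=24 events=1 opaque=(nil)",
    "2022-11-11 06:55:52.948+0000: 1470: info : virEventGLibHandleDispatch:116 : EVENT_GLIB_DISPATCH_HANDLE: watch=7 events=1 cb=0x7f33a824d610 opaque=(nil)",
    "2022-11-11 06:55:52.948+0000: 1470: debug : virEventRunDefaultImpl:341 : running default event implementation",
]

for (second, func), count in messages_per_second(SAMPLE).most_common():
    print(second, func, count)
```

Run over the real log file (for example `messages_per_second(open(path))`, where `path` is wherever the daemon log is written), the largest counts point at the function that dominates during the slow period.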

I am running qemu/kvm on a Manjaro laptop. There is a thing with a package named ceph-libs apparently being removed from the Manjaro repos. This requires building it from the AUR, which fails. The only reason for ceph-libs seems to be that qemu-block-rbd depends on it. However, nothing else seems to depend on qemu-block-rbd. Can anybody tell me whether in your eyes qemu-block-rbd may be safe to remove and thereby resolve this problem?

Maybe good to add: I am not aware of using ceph in any way.

On 12.11.22 08:41, vrms wrote:
> I am running qemu/kvm on a Manjaro laptop. There is a thing with a package named ceph-libs apparently being removed from the Manjaro repos. This requires building it from the AUR, which fails. The only reason for ceph-libs seems to be that qemu-block-rbd depends on it. However, nothing else seems to depend on qemu-block-rbd.
> Can anybody tell me whether in your eyes qemu-block-rbd may be safe to remove and thereby resolve this problem?
-- Gunnar Wagner | Jahnstr. 5, 19386 Lübz | mob +49.176.7080.9090

On Sat, Nov 12, 2022 at 08:41:44 +0100, vrms wrote:

Firstly, please don't start a conversation by replying to an existing one, as it gets threaded improperly.

> I am running qemu/kvm on a Manjaro laptop. There is a thing with a package named ceph-libs apparently being removed from the Manjaro repos. This requires building it from the AUR, which fails. The only reason for ceph-libs seems to be that qemu-block-rbd depends on it. However, nothing else seems to depend on qemu-block-rbd.
> Can anybody tell me whether in your eyes qemu-block-rbd may be safe to remove and thereby resolve this problem?

'qemu-block-rbd' is a qemu backend for accessing ceph/RBD disks. qemu is modular, so it will work without the backend if you don't need to access RBD storage. You can therefore remove the package without any problem.
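For anyone unsure whether a guest actually uses RBD storage, one way to check is to look at the domain XML for network disks whose source protocol is rbd. The sketch below assumes the usual libvirt domain XML layout (`<disk type='network'>` with `<source protocol='rbd' name='...'>` under `<devices>`). The function name and the sample XML are made up for illustration, and the check does not cover storage pools or backing chains.

```python
import xml.etree.ElementTree as ET


def rbd_disks(domain_xml):
    """Return the RBD image names referenced by disks in a domain XML string."""
    root = ET.fromstring(domain_xml)
    names = []
    for disk in root.findall("./devices/disk"):
        source = disk.find("source")
        if (
            disk.get("type") == "network"
            and source is not None
            and source.get("protocol") == "rbd"
        ):
            names.append(source.get("name"))
    return names


# Hypothetical guest with one local file disk and one RBD disk.
SAMPLE_XML = """
<domain type='kvm'>
  <name>example</name>
  <devices>
    <disk type='file' device='disk'>
      <source file='/var/lib/libvirt/images/example.qcow2'/>
    </disk>
    <disk type='network' device='disk'>
      <source protocol='rbd' name='pool/image'/>
    </disk>
  </devices>
</domain>
"""

print(rbd_disks(SAMPLE_XML))
```

Feeding it the output of `virsh dumpxml` for each guest and getting an empty list every time suggests that no guest disk depends on the RBD backend.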

Further investigating this, it indeed seems like libvirtd is stuck in a busy loop in the event loop. It repeatedly reads from and writes to fd 10:

read(10, "\2\0\0\0\0\0\0\0", 16) = 8
write(10, "\1\0\0\0\0\0\0\0", 8) = 8

fd 10 being:

10u a_inode 0,14 0 12812 [eventfd]

Any ideas on how to find the root cause of this?

-------- Forwarded Message --------
Subject: Re: Libvirt slow after a couple of months uptime
Date: Fri, 11 Nov 2022 08:02:19 +0100
From: André Malm <admin@sheepa.org>
To: Peter Krempa <pkrempa@redhat.com>
Cc: libvirt-users@redhat.com

Hello,

Thanks for the reply.

I've now run into a slow server again, and the debug log prints around 500 of these lines a second:

2022-11-11 06:55:52.948+0000: 1470: debug : virEventGLibHandleDispatch:113 : Dispatch handler data=0x7f336402ac00 watch=7 fd=24 events=1 opaque=(nil)
2022-11-11 06:55:52.948+0000: 1470: info : virEventGLibHandleDispatch:116 : EVENT_GLIB_DISPATCH_HANDLE: watch=7 events=1 cb=0x7f33a824d610 opaque=(nil)
2022-11-11 06:55:52.948+0000: 1470: debug : virEventRunDefaultImpl:341 : running default event implementation

What might be causing this?
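For reading the strace output above: an eventfd holds a 64-bit counter, a read() returns the current counter value as 8 bytes in host byte order and resets it to zero, and a write() adds its 8-byte value to the counter. The sketch below only decodes the byte strings shown in the trace. It assumes a little-endian host (such as x86-64) and strace's default octal escapes, and the helper names are illustrative.

```python
import re

# strace prints non-printable bytes as octal escapes such as \2 or \177.
_OCTAL_ESCAPE = re.compile(r"\\([0-3][0-7]{2}|[0-7]{1,2})")


def decode_strace_bytes(text):
    """Convert a strace-style string with octal escapes into bytes."""
    out = bytearray()
    pos = 0
    for match in _OCTAL_ESCAPE.finditer(text):
        # Copy any literal characters that precede this escape.
        out += text[pos:match.start()].encode("latin-1")
        out.append(int(match.group(1), 8))
        pos = match.end()
    out += text[pos:].encode("latin-1")
    return bytes(out)


def eventfd_value(data):
    """Interpret 8 bytes as an eventfd counter (assumes a little-endian host)."""
    if len(data) != 8:
        raise ValueError("an eventfd transfer is always 8 bytes")
    return int.from_bytes(data, "little")


read_value = eventfd_value(decode_strace_bytes(r"\2\0\0\0\0\0\0\0"))
write_value = eventfd_value(decode_strace_bytes(r"\1\0\0\0\0\0\0\0"))
print(read_value, write_value)
```

Under these assumptions, each read in the trace drains a counter value of 2 and each write adds 1 again. This decodes the traffic but does not by itself identify which part of the daemon keeps signalling the eventfd.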
participants (4)
- André Malm
- gunnar.wagner
- Peter Krempa
- vrms