virtlogd spinning on 100% CPU with the latest libvirt

Build libvirt from git (ccf7567329f).

Using the libvirt ‘run’ script, run something like libguestfs-test-tool.
I think basically any command which runs a guest will do.  NB These
commands are all run as NON-root:

  killall libvirtd lt-libvirtd virtlogd lt-virtlogd
  ./build/run libguestfs-test-tool

Now there will be a lt-virtlogd process using 100% of CPU:

      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  2572972 rjones    20   0   47880  16256  14516 R 100.0   0.1   0:19.27 lt-virt+

$ ls /proc/2572972/fd -l
total 0
lrwx------. 1 rjones rjones 64 Feb 17 17:45 0 -> /dev/null
lrwx------. 1 rjones rjones 64 Feb 17 17:45 1 -> /dev/null
lr-x------. 1 rjones rjones 64 Feb 17 17:45 11 -> /var/lib/sss/mc/passwd
lr-x------. 1 rjones rjones 64 Feb 17 17:45 12 -> /var/lib/sss/mc/group
lrwx------. 1 rjones rjones 64 Feb 17 17:45 13 -> 'socket:[48299994]'
lr-x------. 1 rjones rjones 64 Feb 17 17:45 14 -> 'pipe:[48299995]'
l-wx------. 1 rjones rjones 64 Feb 17 17:45 16 -> /home/rjones/.cache/libvirt/qemu/log/guestfs-xllxycje1blj4nmd.log
l-wx------. 1 rjones rjones 64 Feb 17 17:45 17 -> /run/systemd/inhibit/1620.ref
lrwx------. 1 rjones rjones 64 Feb 17 17:45 2 -> /dev/null
lrwx------. 1 rjones rjones 64 Feb 17 17:45 3 -> 'socket:[48299981]'
l-wx------. 1 rjones rjones 64 Feb 17 17:45 4 -> /run/user/1000/libvirt/virtlogd.pid
lrwx------. 1 rjones rjones 64 Feb 17 17:45 5 -> 'socket:[48299984]'
lrwx------. 1 rjones rjones 64 Feb 17 17:45 6 -> 'socket:[48299986]'
lr-x------. 1 rjones rjones 64 Feb 17 17:45 7 -> 'pipe:[48299988]'
l-wx------. 1 rjones rjones 64 Feb 17 17:45 8 -> 'pipe:[48299988]'
lrwx------. 1 rjones rjones 64 Feb 17 17:45 9 -> 'anon_inode:[eventfd]'

$ ls -ltr /home/rjones/.cache/libvirt/qemu/log/guestfs-xllxycje1blj4nmd.log
-rw-------. 1 rjones rjones 4003 Feb 17 17:44 /home/rjones/.cache/libvirt/qemu/log/guestfs-xllxycje1blj4nmd.log

Only one thread running with this stack trace:

Thread 1 (Thread 0x7fa51f219b40 (LWP 2572972)):
#0  virObjectGetLockableObj (anyobj=0x55fb2896c200) at ../../src/util/virobject.c:393
#1  virObjectLock (anyobj=0x55fb2896c200) at ../../src/util/virobject.c:427
#2  0x00007fa520fda48f in virNetServerHasClients (srv=0x55fb2896c200) at ../../src/rpc/virnetserver.c:966
#3  0x00007fa520fd7b69 in daemonServerHasClients (payload=<optimized out>, key=<optimized out>, opaque=0x7ffc8adb5a47) at ../../src/rpc/virnetdaemon.c:916
#4  0x00007fa520ea7140 in virHashForEach (data=<optimized out>, iter=<optimized out>, table=<optimized out>) at ../../src/util/virhash.c:639
#5  virHashForEach (table=0x55fb289571a0, iter=iter@entry=0x7fa520fd7b60 <daemonServerHasClients>, data=data@entry=0x7ffc8adb5a47) at ../../src/util/virhash.c:627
#6  0x00007fa520fd89ee in virNetDaemonHasClients (dmn=<optimized out>) at ../../src/rpc/virnetdaemon.c:927
#7  0x00007fa520fd8aa5 in virNetDaemonRun (dmn=0x55fb28957110) at ../../src/rpc/virnetdaemon.c:842
#8  0x000055fb27b5c8e9 in main (argc=<optimized out>, argv=0x7ffc8adb6188) at ../../src/logging/log_daemon.c:1153

pstack shows it's fairly busy and the stack trace is not very consistent, eg:

#0  0x00007fa52032aaed in g_free () at /lib64/libglib-2.0.so.0
#1  0x00007fa520e73d1b in virFree (ptrptr=ptrptr@entry=0x55fb2896c1a8) at ../../src/util/viralloc.c:348
#2  0x00007fa520e982d2 in virResetError (err=0x55fb2896c1a0) at ../../src/util/virerror.c:472
#3  virResetError (err=0x55fb2896c1a0) at ../../src/util/virerror.c:468
#4  0x00007fa520e9963c in virEventRunDefaultImpl () at ../../src/util/virevent.c:341
#5  0x00007fa520fd8abd in virNetDaemonRun (dmn=0x55fb28957110) at ../../src/rpc/virnetdaemon.c:858
#6  0x000055fb27b5c8e9 in main (argc=<optimized out>, argv=0x7ffc8adb6188) at ../../src/logging/log_daemon.c:1153

#0  0x00007fa520184a37 in poll () at /lib64/libc.so.6
#1  0x00007fa520324e3e in g_main_context_iterate.isra () at /lib64/libglib-2.0.so.0
#2  0x00007fa520324f73 in g_main_context_iteration () at /lib64/libglib-2.0.so.0
#3  0x00007fa520e9a4f0 in virEventGLibRunOnce () at ../../src/util/vireventglib.c:496
#4  0x00007fa520e99645 in virEventRunDefaultImpl () at ../../src/util/virevent.c:343
#5  0x00007fa520fd8abd in virNetDaemonRun (dmn=0x55fb28957110) at ../../src/rpc/virnetdaemon.c:858
#6  0x000055fb27b5c8e9 in main (argc=<optimized out>, argv=0x7ffc8adb6188) at ../../src/logging/log_daemon.c:1153

#0  0x00007fa520322d9d in g_source_ref () at /lib64/libglib-2.0.so.0
#1  0x00007fa520322e71 in g_source_iter_next () at /lib64/libglib-2.0.so.0
#2  0x00007fa52032479f in g_main_context_check () at /lib64/libglib-2.0.so.0
#3  0x00007fa520324de2 in g_main_context_iterate.isra () at /lib64/libglib-2.0.so.0
#4  0x00007fa520324f73 in g_main_context_iteration () at /lib64/libglib-2.0.so.0
#5  0x00007fa520e9a4f0 in virEventGLibRunOnce () at ../../src/util/vireventglib.c:496
#6  0x00007fa520e99645 in virEventRunDefaultImpl () at ../../src/util/virevent.c:343
#7  0x00007fa520fd8abd in virNetDaemonRun (dmn=0x55fb28957110) at ../../src/rpc/virnetdaemon.c:858
#8  0x000055fb27b5c8e9 in main (argc=<optimized out>, argv=0x7ffc8adb6188) at ../../src/logging/log_daemon.c:1153

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat  http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html

[Dropped Peter from CC. Please don't CC individual developers unless
they've explicitly requested that you do so.]

On Mon, 2020-02-17 at 17:50 +0000, Richard W.M. Jones wrote:
> Build libvirt from git (ccf7567329f).
>
> Using the libvirt ‘run’ script, run something like libguestfs-test-tool.
> I think basically any command which runs a guest will do.  NB These
> commands are all run as NON-root:
>
>   killall libvirtd lt-libvirtd virtlogd lt-virtlogd
>   ./build/run libguestfs-test-tool
>
> Now there will be a lt-virtlogd process using 100% of CPU:
>
>       PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>   2572972 rjones    20   0   47880  16256  14516 R 100.0   0.1   0:19.27 lt-virt+
It's actually worse than that: not only does virtlogd use an unwarranted
amount of CPU, but it also keeps the log file for the domain busy, thus
preventing the same domain from being started again:

  $ virsh start alpine
  Domain alpine started

  $ virsh destroy alpine
  Domain alpine destroyed

  $ virsh start alpine
  error: Failed to start domain alpine
  error: can't connect to virtlogd: Cannot open log file: '/var/log/libvirt/qemu/alpine.log': Device or resource busy

  $ sudo lsof | grep alpine.log
  virtlogd  146845  root  16w  REG  253,0  35103  17195654 /var/log/libvirt/qemu/alpine.log
  $

Restarting virtlogd makes things operational again:

  $ sudo systemctl restart virtlogd
  $ virsh start alpine
  Domain alpine started
  $

My guess is that virtlogd doesn't realize the QEMU process is gone, and
sits there spinning forever waiting for some output that will never
arrive.

-- 
Andrea Bolognani / Red Hat / Virtualization

On Tue, Feb 18, 2020 at 08:34:53PM +0100, Andrea Bolognani wrote:
> [Dropped Peter from CC. Please don't CC individual developers unless
> they've explicitly requested that you do so.]
>
> On Mon, 2020-02-17 at 17:50 +0000, Richard W.M. Jones wrote:
> > Build libvirt from git (ccf7567329f).
> >
> > Using the libvirt ‘run’ script, run something like libguestfs-test-tool.
> > I think basically any command which runs a guest will do.  NB These
> > commands are all run as NON-root:
> >
> >   killall libvirtd lt-libvirtd virtlogd lt-virtlogd
> >   ./build/run libguestfs-test-tool
> >
> > Now there will be a lt-virtlogd process using 100% of CPU:
> >
> >       PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
> >   2572972 rjones    20   0   47880  16256  14516 R 100.0   0.1   0:19.27 lt-virt+
>
> It's actually worse than that: not only does virtlogd use an unwarranted
> amount of CPU, but it also keeps the log file for the domain busy, thus
> preventing the same domain from being started again:
>
>   $ virsh start alpine
>   Domain alpine started
>
>   $ virsh destroy alpine
>   Domain alpine destroyed
>
>   $ virsh start alpine
>   error: Failed to start domain alpine
>   error: can't connect to virtlogd: Cannot open log file: '/var/log/libvirt/qemu/alpine.log': Device or resource busy
>
>   $ sudo lsof | grep alpine.log
>   virtlogd  146845  root  16w  REG  253,0  35103  17195654 /var/log/libvirt/qemu/alpine.log
>   $
>
> Restarting virtlogd makes things operational again:
>
>   $ sudo systemctl restart virtlogd
>   $ virsh start alpine
>   Domain alpine started
>   $
>
> My guess is that virtlogd doesn't realize the QEMU process is gone, and
> sits there spinning forever waiting for some output that will never
> arrive.
Yeah, since the switch to the GLib event loop, virLogHandlerDomainLogFileEvent,
which was supposed to clean that up, is no longer called on hangup.

My naive fix:
https://www.redhat.com/archives/libvir-list/2020-February/msg00651.html

Jano
participants (3):
- Andrea Bolognani
- Ján Tomko
- Richard W.M. Jones