[libvirt-users] Could not destroy domain, current job is remoteDispatchConnectGetAllDomainStats

Hello libvirt-users list,
We've been hitting the same bug since version 3.4.0 (3.3.0 works fine). We have a process that stays permanently connected to libvirtd via its socket; it collects stats, listens for events, and controls the VPSes.
When we shut down a number of VPSes, we often trigger the bug. One of the VPSes gets stuck in the 'in shutdown' state, no matching 'qemu' process is left running, and the following errors appear in the log:
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.005+0000: 20438: warning : qemuGetProcessInfo:1460 : cannot parse process status data
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000: 20441: error : virFileReadAll:1420 : Failed to open file '/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu\x2d36\x2dDOMAIN1.scope/cpuacct.usage': No such file or directory
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000: 20441: error : virCgroupGetValueStr:844 : Unable to read from '/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu\x2d36\x2dDOMAIN1.scope/cpuacct.usage': No such file or directory
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000: 20441: error : virCgroupGetDomainTotalCpuStats:3319 : unable to get cpu account: Operation not permitted
Jan 17 13:54:23 server1 libvirtd[20437]: 2018-01-17 13:54:23.805+0000: 20522: warning : qemuDomainObjBeginJobInternal:4862 : Cannot start job (destroy, none) for domain DOMAIN1; current job is (query, none) owned by (20440 remoteDispatchConnectGetAllDomainStats, 0 <null>) for (30s, 0s)
Jan 17 13:54:23 server1 libvirtd[20437]: 2018-01-17 13:54:23.805+0000: 20522: error : qemuDomainObjBeginJobInternal:4874 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchConnectGetAllDomainStats)
I think only the last line matters. The bug is highly reproducible; we can trigger it even by running several 'virsh shutdown' commands in a shell, one after another.
When we stop the process connected to the socket, everything returns to normal and the bug is gone.
The system is Gentoo Linux. We have tried all recent versions of libvirt (3.4.0, 3.7.0, 3.8.0, 3.9.0, 3.10.0, and 4.0.0-rc2, today's version from git) and they all have this bug; 3.3.0 works fine.
Thanks in advance for any help. We can send additional info if needed.
~Serhii
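For reference, the reproduction described above can be sketched in shell roughly as follows (a sketch, not our actual monitoring process; the domain names are placeholders, and 'virsh domstats' stands in here for the virConnectGetAllDomainStats calls the monitoring process makes):

    # Keep an all-domain stats poller running, the way the monitoring
    # process does over the libvirtd socket.
    while true; do virsh domstats >/dev/null 2>&1; done &
    POLLER=$!

    # Shut down several guests one after another; one of them eventually
    # sticks in the 'in shutdown' state.
    for dom in DOMAIN1 DOMAIN2 DOMAIN3; do
        virsh shutdown "$dom"
    done

    kill "$POLLER"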

On 01/17/2018 03:45 PM, Serhii Kharchenko wrote:
Hello libvirt-users list,
We've been hitting the same bug since version 3.4.0 (3.3.0 works fine). We have a process that stays permanently connected to libvirtd via its socket; it collects stats, listens for events, and controls the VPSes.
When we shut down a number of VPSes, we often trigger the bug. One of the VPSes gets stuck in the 'in shutdown' state, no matching 'qemu' process is left running, and the following errors appear in the log:
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.005+0000: 20438: warning : qemuGetProcessInfo:1460 : cannot parse process status data
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000: 20441: error : virFileReadAll:1420 : Failed to open file '/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu\x2d36\x2dDOMAIN1.scope/cpuacct.usage': No such file or directory
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000: 20441: error : virCgroupGetValueStr:844 : Unable to read from '/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu\x2d36\x2dDOMAIN1.scope/cpuacct.usage': No such file or directory
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000: 20441: error : virCgroupGetDomainTotalCpuStats:3319 : unable to get cpu account: Operation not permitted
Jan 17 13:54:23 server1 libvirtd[20437]: 2018-01-17 13:54:23.805+0000: 20522: warning : qemuDomainObjBeginJobInternal:4862 : Cannot start job (destroy, none) for domain DOMAIN1; current job is (query, none) owned by (20440 remoteDispatchConnectGetAllDomainStats, 0 <null>) for (30s, 0s)
Jan 17 13:54:23 server1 libvirtd[20437]: 2018-01-17 13:54:23.805+0000: 20522: error : qemuDomainObjBeginJobInternal:4874 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchConnectGetAllDomainStats)
Sounds like qemuConnectGetAllDomainStats() forgot to unset the job on a domain. Can you please attach gdb to libvirtd and run 't a a bt' and share the output?
Michal
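One way to capture that non-interactively (a sketch; assumes gdb and pidof are available, and that libvirt debug symbols are installed so frames don't all show as '??'):

    # 't a a bt' is gdb shorthand for 'thread apply all backtrace';
    # this dumps backtraces of every libvirtd thread to a file.
    gdb -p "$(pidof libvirtd)" -batch -ex 'thread apply all backtrace' > libvirtd-bt.txt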

2018-01-17 17:32 GMT+02:00 Michal Privoznik <mprivozn@redhat.com>:
Sounds like qemuConnectGetAllDomainStats() forgot to unset the job on a domain. Can you please attach gdb to libvirtd and run 't a a bt' and share the output?
Michal
This is the output of gdb 't a a bt', taken ~5 seconds after the error message mentioning remoteDispatchConnectGetAllDomainStats() appeared in the log:
Thread 18 (Thread 0x7fd395ffb700 (LWP 3803)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc5b3b63 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6
Thread 17 (Thread 0x7fd3967fc700 (LWP 2858)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3a6589a20 in ?? () from /usr/lib64/libvirt/connection-driver/libvirt_driver_nodedev.so
#3  0x00007fd3bc5b2de2 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6
Thread 16 (Thread 0x7fd3977fe700 (LWP 2854)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc5b3b63 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6
Thread 15 (Thread 0x7fd397fff700 (LWP 2852)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc5b3b63 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6
Thread 14 (Thread 0x7fd3a4bbd700 (LWP 2851)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc5b3b63 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6
Thread 13 (Thread 0x7fd3a53be700 (LWP 2850)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc5b3b63 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6
Thread 12 (Thread 0x7fd3a5bbf700 (LWP 2849)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc5b3b63 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6
Thread 11 (Thread 0x7fd3ae7fc700 (LWP 2848)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc5b3b24 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6
Thread 10 (Thread 0x7fd3aeffd700 (LWP 2847)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc5b3b24 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6
Thread 9 (Thread 0x7fd3af7fe700 (LWP 2846)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc5b3b24 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6
Thread 8 (Thread 0x7fd3affff700 (LWP 2845)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc5b3b24 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7fd3b4fef700 (LWP 2844)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc5b3b24 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7fd3a7fff700 (LWP 2843)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc5b3b63 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7fd3b57f0700 (LWP 2842)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc5b3b63 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7fd3b5ff1700 (LWP 2841)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc5b3b63 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7fd3b67f2700 (LWP 2840)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc5b3b63 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007fd3bb94c76f in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7fd3b6ff3700 (LWP 2839)):
#0  0x00007fd3bbc1f746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007fd3bc5b3016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007fd3a5c7776b in qemuMonitorSend () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#3  0x00007fd3a5c8b355 in ?? () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#4  0x00007fd3a5c8d85f in qemuMonitorJSONGetMemoryStats () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#5  0x00007fd3a5c9efc3 in ?? () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#6  0x00007fd3a5c9f18a in ?? () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#7  0x00007fd3a5c9f77f in ?? () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#8  0x00007fd3bc672abe in virDomainListGetStats () from /usr/lib64/libvirt.so.0
#9  0x000055f1c9fbb3b5 in ?? ()
#10 0x00007fd3bc6cd308 in virNetServerProgramDispatch () from /usr/lib64/libvirt.so.0
#11 0x000055f1c9feaef8 in ?? ()
#12 0x00007fd3bc5b3a2b in ?? () from /usr/lib64/libvirt.so.0
#13 0x00007fd3bc5b2db8 in ?? () from /usr/lib64/libvirt.so.0
#14 0x00007fd3bbc186b3 in start_thread () from /lib64/libpthread.so.0
#15 0x00007fd3bb94c76f in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7fd3bd295800 (LWP 2838)):
#0  0x00007fd3bb9409cc in poll () from /lib64/libc.so.6
#1  0x00007fd3bc55cc84 in virEventPollRunOnce () from /usr/lib64/libvirt.so.0
#2  0x00007fd3bc55b971 in virEventRunDefaultImpl () from /usr/lib64/libvirt.so.0
#3  0x00007fd3bc6c864d in virNetDaemonRun () from /usr/lib64/libvirt.so.0
#4  0x000055f1c9fb1700 in ?? ()
#5  0x00007fd3bb869fe1 in __libc_start_main () from /lib64/libc.so.6
#6  0x000055f1c9fb1ffa in ?? ()

2018-01-17 18:38 GMT+02:00 Serhii Kharchenko <harrykas@gmail.com>:
2018-01-17 17:32 GMT+02:00 Michal Privoznik <mprivozn@redhat.com>:
Sounds like qemuConnectGetAllDomainStats() forgot to unset the job on a domain. Can you please attach gdb to libvirtd and run 't a a bt' and share the output?
Michal
This is the output of gdb 't a a bt', taken ~5 seconds after the error message mentioning remoteDispatchConnectGetAllDomainStats() appeared in the log.
And this is the output of gdb 't a a bt' taken after a VPS got stuck in the 'in shutdown' state but before the error message was logged; the message appeared right after I ran the gdb command.
Thread 18 (Thread 0x7ff36ffff700 (LWP 29836)):
#0  0x00007ff3c0fe4ca6 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ff3c1978082 in virCondWaitUntil () from /usr/lib64/libvirt.so.0
#2  0x00007ff378017a28 in ?? () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#3  0x00007ff37801b9ab in qemuDomainObjBeginJob () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#4  0x00007ff378047306 in qemuProcessBeginStopJob () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#5  0x00007ff3780b1689 in ?? () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#6  0x00007ff3c1978a2b in ?? () from /usr/lib64/libvirt.so.0
#7  0x00007ff3c1977db8 in ?? () from /usr/lib64/libvirt.so.0
#8  0x00007ff3c0fdd6b3 in start_thread () from /lib64/libpthread.so.0
#9  0x00007ff3c0d1176f in clone () from /lib64/libc.so.6
Thread 17 (Thread 0x7ff374fa1700 (LWP 28463)):
#0  0x00007ff3c0fe4746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ff3c1978016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007ff378971a20 in ?? () from /usr/lib64/libvirt/connection-driver/libvirt_driver_nodedev.so
#3  0x00007ff3c1977de2 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007ff3c0fdd6b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007ff3c0d1176f in clone () from /lib64/libc.so.6
Thread 16 (Thread 0x7ff375fa3700 (LWP 28459)):
#0  0x00007ff3c0fe4746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ff3c1978016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007ff3c1978b63 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007ff3c1977db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007ff3c0fdd6b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007ff3c0d1176f in clone () from /lib64/libc.so.6
Thread 15 (Thread 0x7ff3767a4700 (LWP 28458)):
#0  0x00007ff3c0fe4746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ff3c1978016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007ff3c1978b63 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007ff3c1977db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007ff3c0fdd6b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007ff3c0d1176f in clone () from /lib64/libc.so.6
Thread 14 (Thread 0x7ff376fa5700 (LWP 28457)):
#0  0x00007ff3c0fe4746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ff3c1978016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007ff3c1978b63 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007ff3c1977db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007ff3c0fdd6b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007ff3c0d1176f in clone () from /lib64/libc.so.6
Thread 13 (Thread 0x7ff3777a6700 (LWP 28456)):
#0  0x00007ff3c0fe4746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ff3c1978016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007ff3c1978b63 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007ff3c1977db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007ff3c0fdd6b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007ff3c0d1176f in clone () from /lib64/libc.so.6
Thread 12 (Thread 0x7ff377fa7700 (LWP 28455)):
#0  0x00007ff3c0fe4746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ff3c1978016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007ff3c1978b63 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007ff3c1977db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007ff3c0fdd6b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007ff3c0d1176f in clone () from /lib64/libc.so.6
Thread 11 (Thread 0x7ff3b37fe700 (LWP 28454)):
#0  0x00007ff3c0fe4746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ff3c1978016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007ff3c1978b24 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007ff3c1977db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007ff3c0fdd6b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007ff3c0d1176f in clone () from /lib64/libc.so.6
Thread 10 (Thread 0x7ff3b3fff700 (LWP 28453)):
#0  0x00007ff3c0fe4746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ff3c1978016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007ff3c1978b24 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007ff3c1977db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007ff3c0fdd6b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007ff3c0d1176f in clone () from /lib64/libc.so.6
Thread 9 (Thread 0x7ff3b8bb1700 (LWP 28452)):
#0  0x00007ff3c0fe4746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ff3c1978016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007ff3c1978b24 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007ff3c1977db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007ff3c0fdd6b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007ff3c0d1176f in clone () from /lib64/libc.so.6
Thread 8 (Thread 0x7ff3b93b2700 (LWP 28451)):
#0  0x00007ff3c0fe4746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ff3c1978016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007ff3c1978b24 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007ff3c1977db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007ff3c0fdd6b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007ff3c0d1176f in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7ff3b9bb3700 (LWP 28450)):
#0  0x00007ff3c0fe4746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ff3c1978016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007ff3c1978b24 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007ff3c1977db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007ff3c0fdd6b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007ff3c0d1176f in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7ff3ba3b4700 (LWP 28449)):
#0  0x00007ff3c0fe4746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ff3c1978016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007ff3c1978b63 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007ff3c1977db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007ff3c0fdd6b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007ff3c0d1176f in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7ff3babb5700 (LWP 28448)):
#0  0x00007ff3c0fe4746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ff3c1978016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007ff3c1978b63 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007ff3c1977db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007ff3c0fdd6b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007ff3c0d1176f in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7ff3bb3b6700 (LWP 28447)):
#0  0x00007ff3c0fe4746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ff3c1978016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007ff3c1978b63 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007ff3c1977db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007ff3c0fdd6b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007ff3c0d1176f in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7ff3bbbb7700 (LWP 28446)):
#0  0x00007ff3c0fe4746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ff3c1978016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007ff37805f76b in qemuMonitorSend () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#3  0x00007ff378073355 in ?? () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#4  0x00007ff378075237 in qemuMonitorJSONQueryCPUs () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#5  0x00007ff378061dd2 in qemuMonitorGetCpuHalted () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#6  0x00007ff378023c9e in qemuDomainRefreshVcpuHalted () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#7  0x00007ff3780a3329 in ?? () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#8  0x00007ff37808777f in ?? () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#9  0x00007ff3c1a37abe in virDomainListGetStats () from /usr/lib64/libvirt.so.0
#10 0x0000560fd00243b5 in ?? ()
#11 0x00007ff3c1a92308 in virNetServerProgramDispatch () from /usr/lib64/libvirt.so.0
#12 0x0000560fd0053ef8 in ?? ()
#13 0x00007ff3c1978a2b in ?? () from /usr/lib64/libvirt.so.0
#14 0x00007ff3c1977db8 in ?? () from /usr/lib64/libvirt.so.0
#15 0x00007ff3c0fdd6b3 in start_thread () from /lib64/libpthread.so.0
#16 0x00007ff3c0d1176f in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7ff3bc3b8700 (LWP 28445)):
#0  0x00007ff3c0fe4746 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007ff3c1978016 in virCondWait () from /usr/lib64/libvirt.so.0
#2  0x00007ff3c1978b63 in ?? () from /usr/lib64/libvirt.so.0
#3  0x00007ff3c1977db8 in ?? () from /usr/lib64/libvirt.so.0
#4  0x00007ff3c0fdd6b3 in start_thread () from /lib64/libpthread.so.0
#5  0x00007ff3c0d1176f in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7ff3c265a800 (LWP 28444)):
#0  0x00007ff3c0d059cc in poll () from /lib64/libc.so.6
#1  0x00007ff3c1921c84 in virEventPollRunOnce () from /usr/lib64/libvirt.so.0
#2  0x00007ff3c1920971 in virEventRunDefaultImpl () from /usr/lib64/libvirt.so.0
#3  0x00007ff3c1a8d64d in virNetDaemonRun () from /usr/lib64/libvirt.so.0
#4  0x0000560fd001a700 in ?? ()
#5  0x00007ff3c0c2efe1 in __libc_start_main () from /lib64/libc.so.6
#6  0x0000560fd001affa in ?? ()

On Wed, Jan 17, 2018 at 04:45:38PM +0200, Serhii Kharchenko wrote:
Hello libvirt-users list,
We've been hitting the same bug since version 3.4.0 (3.3.0 works fine). We have a process that stays permanently connected to libvirtd via its socket; it collects stats, listens for events, and controls the VPSes.
When we shut down a number of VPSes, we often trigger the bug. One of the VPSes gets stuck in the 'in shutdown' state, no matching 'qemu' process is left running, and the following errors appear in the log:
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.005+0000: 20438: warning : qemuGetProcessInfo:1460 : cannot parse process status data
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000: 20441: error : virFileReadAll:1420 : Failed to open file '/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu\x2d36\x2dDOMAIN1.scope/cpuacct.usage': No such file or directory
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000: 20441: error : virCgroupGetValueStr:844 : Unable to read from '/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu\x2d36\x2dDOMAIN1.scope/cpuacct.usage': No such file or directory
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000: 20441: error : virCgroupGetDomainTotalCpuStats:3319 : unable to get cpu account: Operation not permitted
Jan 17 13:54:23 server1 libvirtd[20437]: 2018-01-17 13:54:23.805+0000: 20522: warning : qemuDomainObjBeginJobInternal:4862 : Cannot start job (destroy, none) for domain DOMAIN1; current job is (query, none) owned by (20440 remoteDispatchConnectGetAllDomainStats, 0 <null>) for (30s, 0s)
Jan 17 13:54:23 server1 libvirtd[20437]: 2018-01-17 13:54:23.805+0000: 20522: error : qemuDomainObjBeginJobInternal:4874 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchConnectGetAllDomainStats)
I think only the last line matters. The bug is highly reproducible; we can trigger it even by running several 'virsh shutdown' commands in a shell, one after another.
When we stop the process connected to the socket, everything returns to normal and the bug is gone.
The system is Gentoo Linux. We have tried all recent versions of libvirt (3.4.0, 3.7.0, 3.8.0, 3.9.0, 3.10.0, and 4.0.0-rc2, today's version from git) and they all have this bug; 3.3.0 works fine.
I don't see anything obviously stats-related in the diff between 3.3.0 and 3.4.0. We added reporting of the shutdown reason, but that's just parsing one more JSON reply we previously ignored. Can you try running 'git bisect' to pinpoint the exact commit that caused this issue? A sketch of such a run is below.
Jan
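A bisect run along these lines, in a libvirt git checkout, might look roughly as follows; the ./reproduce.sh script is hypothetical and would need to exit non-zero when the hang occurs:

    git bisect start
    git bisect bad v3.4.0         # first release known to misbehave
    git bisect good v3.3.0        # last release known to work
    git bisect run ./reproduce.sh
    git bisect reset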
Thanks for any help in advance. We can send any additional info if needed.
~Serhii

On 01/18/2018 08:25 AM, Ján Tomko wrote:
On Wed, Jan 17, 2018 at 04:45:38PM +0200, Serhii Kharchenko wrote:
Hello libvirt-users list,
We've been hitting the same bug since version 3.4.0 (3.3.0 works fine). We have a process that stays permanently connected to libvirtd via its socket; it collects stats, listens for events, and controls the VPSes.
When we shut down a number of VPSes, we often trigger the bug. One of the VPSes gets stuck in the 'in shutdown' state, no matching 'qemu' process is left running, and the following errors appear in the log:
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.005+0000: 20438: warning : qemuGetProcessInfo:1460 : cannot parse process status data
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000: 20441: error : virFileReadAll:1420 : Failed to open file '/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu\x2d36\x2dDOMAIN1.scope/cpuacct.usage': No such file or directory
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000: 20441: error : virCgroupGetValueStr:844 : Unable to read from '/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu\x2d36\x2dDOMAIN1.scope/cpuacct.usage': No such file or directory
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000: 20441: error : virCgroupGetDomainTotalCpuStats:3319 : unable to get cpu account: Operation not permitted
Jan 17 13:54:23 server1 libvirtd[20437]: 2018-01-17 13:54:23.805+0000: 20522: warning : qemuDomainObjBeginJobInternal:4862 : Cannot start job (destroy, none) for domain DOMAIN1; current job is (query, none) owned by (20440 remoteDispatchConnectGetAllDomainStats, 0 <null>) for (30s, 0s)
Jan 17 13:54:23 server1 libvirtd[20437]: 2018-01-17 13:54:23.805+0000: 20522: error : qemuDomainObjBeginJobInternal:4874 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchConnectGetAllDomainStats)
I think only the last line matters. The bug is highly reproducible; we can trigger it even by running several 'virsh shutdown' commands in a shell, one after another.
When we stop the process connected to the socket, everything returns to normal and the bug is gone.
The system is Gentoo Linux. We have tried all recent versions of libvirt (3.4.0, 3.7.0, 3.8.0, 3.9.0, 3.10.0, and 4.0.0-rc2, today's version from git) and they all have this bug; 3.3.0 works fine.
I don't see anything obviously stats-related in the diff between 3.3.0 and 3.4.0. We added reporting of the shutdown reason, but that's just parsing one more JSON reply we previously ignored.
Can you try running 'git bisect' to pinpoint the exact commit that caused this issue?
I am able to reproduce this issue. I ran bisect and found that the commit which broke it is aeda1b8c56dc58b0a413acc61bbea938b40499e1.
https://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=aeda1b8c56dc58b0a413ac...
But it's very unlikely that this commit itself causes the error; if anything, it is just exposing whatever error we already have there. I mean, if I revert the commit on top of current HEAD, I can no longer reproduce the issue.
Michal
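Re-testing with that commit reverted on top of the development tree, as described, could be sketched as:

    # Hypothetical re-test: revert the bisected commit on current HEAD,
    # then rebuild libvirtd and re-run the reproduction.
    git revert aeda1b8c56dc58b0a413acc61bbea938b40499e1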

2018-01-18 17:49 GMT+02:00 Michal Privoznik <mprivozn@redhat.com>:
On 01/18/2018 08:25 AM, Ján Tomko wrote:
On Wed, Jan 17, 2018 at 04:45:38PM +0200, Serhii Kharchenko wrote:
Hello libvirt-users list,
We've been hitting the same bug since version 3.4.0 (3.3.0 works fine). We have a process that stays permanently connected to libvirtd via its socket; it collects stats, listens for events, and controls the VPSes.
When we shut down a number of VPSes, we often trigger the bug. One of the VPSes gets stuck in the 'in shutdown' state, no matching 'qemu' process is left running, and the following errors appear in the log:
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.005+0000: 20438: warning : qemuGetProcessInfo:1460 : cannot parse process status data
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000: 20441: error : virFileReadAll:1420 : Failed to open file '/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu\x2d36\x2dDOMAIN1.scope/cpuacct.usage': No such file or directory
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000: 20441: error : virCgroupGetValueStr:844 : Unable to read from '/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu\x2d36\x2dDOMAIN1.scope/cpuacct.usage': No such file or directory
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000: 20441: error : virCgroupGetDomainTotalCpuStats:3319 : unable to get cpu account: Operation not permitted
Jan 17 13:54:23 server1 libvirtd[20437]: 2018-01-17 13:54:23.805+0000: 20522: warning : qemuDomainObjBeginJobInternal:4862 : Cannot start job (destroy, none) for domain DOMAIN1; current job is (query, none) owned by (20440 remoteDispatchConnectGetAllDomainStats, 0 <null>) for (30s, 0s)
Jan 17 13:54:23 server1 libvirtd[20437]: 2018-01-17 13:54:23.805+0000: 20522: error : qemuDomainObjBeginJobInternal:4874 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchConnectGetAllDomainStats)
I think only the last line matters. The bug is highly reproducible; we can trigger it even by running several 'virsh shutdown' commands in a shell, one after another.
When we stop the process connected to the socket, everything returns to normal and the bug is gone.
The system is Gentoo Linux. We have tried all recent versions of libvirt (3.4.0, 3.7.0, 3.8.0, 3.9.0, 3.10.0, and 4.0.0-rc2, today's version from git) and they all have this bug; 3.3.0 works fine.
I don't see anything obviously stats-related in the diff between 3.3.0 and 3.4.0. We added reporting of the shutdown reason, but that's just parsing one more JSON reply we previously ignored.
Can you try running 'git bisect' to pinpoint the exact commit that caused this issue?
I am able to reproduce this issue. I ran bisect and found that the commit which broke it is aeda1b8c56dc58b0a413acc61bbea938b40499e1.
https://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=aeda1b8c56dc58b0a413acc61bbea938b40499e1;hp=ec337aee9b20091d6f9f60b78f210d55f812500b
But it's very unlikely that this commit itself causes the error; if anything, it is just exposing whatever error we already have there. I mean, if I revert the commit on top of current HEAD, I can no longer reproduce the issue.
Michal
Michal, Ján,
I got the same result:
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[aeda1b8c56dc58b0a413acc61bbea938b40499e1] qemu: monitor: do not report error on shutdown
And yes, when I revert it on HEAD, the problem is gone.

2018-01-18 18:49 GMT+03:00 Michal Privoznik <mprivozn@redhat.com>:
On 01/18/2018 08:25 AM, Ján Tomko wrote:
On Wed, Jan 17, 2018 at 04:45:38PM +0200, Serhii Kharchenko wrote:
Hello libvirt-users list,
We've been hitting the same bug since version 3.4.0 (3.3.0 works fine). We have a process that stays permanently connected to libvirtd via its socket; it collects stats, listens for events, and controls the VPSes.
When we shut down a number of VPSes, we often trigger the bug. One of the VPSes gets stuck in the 'in shutdown' state, no matching 'qemu' process is left running, and the following errors appear in the log:
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.005+0000: 20438: warning : qemuGetProcessInfo:1460 : cannot parse process status data
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000: 20441: error : virFileReadAll:1420 : Failed to open file '/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu\x2d36\x2dDOMAIN1.scope/cpuacct.usage': No such file or directory
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000: 20441: error : virCgroupGetValueStr:844 : Unable to read from '/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu\x2d36\x2dDOMAIN1.scope/cpuacct.usage': No such file or directory
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000: 20441: error : virCgroupGetDomainTotalCpuStats:3319 : unable to get cpu account: Operation not permitted
Jan 17 13:54:23 server1 libvirtd[20437]: 2018-01-17 13:54:23.805+0000: 20522: warning : qemuDomainObjBeginJobInternal:4862 : Cannot start job (destroy, none) for domain DOMAIN1; current job is (query, none) owned by (20440 remoteDispatchConnectGetAllDomainStats, 0 <null>) for (30s, 0s)
Jan 17 13:54:23 server1 libvirtd[20437]: 2018-01-17 13:54:23.805+0000: 20522: error : qemuDomainObjBeginJobInternal:4874 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchConnectGetAllDomainStats)
I think only the last line matters. The bug is highly reproducible; we can trigger it even by running several 'virsh shutdown' commands in a shell, one after another.
When we stop the process connected to the socket, everything returns to normal and the bug is gone.
The system is Gentoo Linux. We have tried all recent versions of libvirt (3.4.0, 3.7.0, 3.8.0, 3.9.0, 3.10.0, and 4.0.0-rc2, today's version from git) and they all have this bug; 3.3.0 works fine.
I don't see anything obviously stats-related in the diff between 3.3.0 and 3.4.0. We added reporting of the shutdown reason, but that's just parsing one more JSON reply we previously ignored.
Can you try running 'git bisect' to pinpoint the exact commit that caused this issue?
I am able to reproduce this issue. I ran bisect and found that the commit which broke it is aeda1b8c56dc58b0a413acc61bbea938b40499e1.
https://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=aeda1b8c56dc58b0a413ac...
But it's very unlikely that this commit itself causes the error; if anything, it is just exposing whatever error we already have there. I mean, if I revert the commit on top of current HEAD, I can no longer reproduce the issue.
Yes, some time ago I already reported this on the list, and someone from the Virtuozzo/OpenVZ team helped me revert exactly this commit.
--
Vasiliy Tolstov,
e-mail: v.tolstov@selfip.ru

On 01/18/2018 04:49 PM, Michal Privoznik wrote:
On 01/18/2018 08:25 AM, Ján Tomko wrote:
On Wed, Jan 17, 2018 at 04:45:38PM +0200, Serhii Kharchenko wrote:
Hello libvirt-users list,
We've been hitting the same bug since version 3.4.0 (3.3.0 works fine). We have a process that stays permanently connected to libvirtd via its socket; it collects stats, listens for events, and controls the VPSes.
When we shut down a number of VPSes, we often trigger the bug. One of the VPSes gets stuck in the 'in shutdown' state, no matching 'qemu' process is left running, and the following errors appear in the log:
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.005+0000: 20438: warning : qemuGetProcessInfo:1460 : cannot parse process status data
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000: 20441: error : virFileReadAll:1420 : Failed to open file '/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu\x2d36\x2dDOMAIN1.scope/cpuacct.usage': No such file or directory
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000: 20441: error : virCgroupGetValueStr:844 : Unable to read from '/sys/fs/cgroup/cpu,cpuacct/machine.slice/machine-qemu\x2d36\x2dDOMAIN1.scope/cpuacct.usage': No such file or directory
Jan 17 13:54:20 server1 libvirtd[20437]: 2018-01-17 13:54:20.006+0000: 20441: error : virCgroupGetDomainTotalCpuStats:3319 : unable to get cpu account: Operation not permitted
Jan 17 13:54:23 server1 libvirtd[20437]: 2018-01-17 13:54:23.805+0000: 20522: warning : qemuDomainObjBeginJobInternal:4862 : Cannot start job (destroy, none) for domain DOMAIN1; current job is (query, none) owned by (20440 remoteDispatchConnectGetAllDomainStats, 0 <null>) for (30s, 0s)
Jan 17 13:54:23 server1 libvirtd[20437]: 2018-01-17 13:54:23.805+0000: 20522: error : qemuDomainObjBeginJobInternal:4874 : Timed out during operation: cannot acquire state change lock (held by remoteDispatchConnectGetAllDomainStats)
I think only the last line matters. The bug is highly reproducible; we can trigger it even by running several 'virsh shutdown' commands in a shell, one after another.
When we stop the process connected to the socket, everything returns to normal and the bug is gone.
The system is Gentoo Linux. We have tried all recent versions of libvirt (3.4.0, 3.7.0, 3.8.0, 3.9.0, 3.10.0, and 4.0.0-rc2, today's version from git) and they all have this bug; 3.3.0 works fine.
I don't see anything obviously stats-related in the diff between 3.3.0 and 3.4.0. We added reporting of the shutdown reason, but that's just parsing one more JSON reply we previously ignored.
Can you try running 'git bisect' to pinpoint the exact commit that caused this issue?
I am able to reproduce this issue. I ran bisect and found that the commit which broke it is aeda1b8c56dc58b0a413acc61bbea938b40499e1.
https://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=aeda1b8c56dc58b0a413ac...
But it's very unlikely that this commit itself causes the error; if anything, it is just exposing whatever error we already have there. I mean, if I revert the commit on top of current HEAD, I can no longer reproduce the issue.
Hi,
It looks like we hit the same issue in oVirt; see for example https://bugzilla.redhat.com/show_bug.cgi?id=1532277
Is there a fix planned, and/or a BZ entry I can track?
Thanks,
--
Francesco Romani
Senior SW Eng., Virtualization R&D
Red Hat
IRC: fromani github: @fromanirh
participants (5)
- Francesco Romani
- Ján Tomko
- Michal Privoznik
- Serhii Kharchenko
- Vasiliy Tolstov