On Wed, Mar 30, 2022 at 01:11:50 +0000, Tobias Hofmann (tohofman) wrote:
Hello all,
I have a system with one VM running. After some time the VM needs to be temporarily
stopped and started again. This start operation fails and from that point on any virsh
command is hanging and does not execute.
This issue is reproducible and I have already figured out that restarting libvirtd
resolves this issue. However, I’m now trying to understand why it’s getting stuck in the
first place.
I’m trying not to go into too much detail, because I think that would confuse more than
it would help. In general, I’m wondering what approach you should follow to debug why
libvirt gets stuck.
Online I’ve read that you should run this command:
`# gdb -batch -p $(pidof libvirtd) -ex 't a a bt f'`.
I’ve run that command and attached the output to this mail.
However, I have to admit that I have no idea what to do with it.
So from the backtrace:
Thread 18 (Thread 0x7f58bcf3e700 (LWP 19010)):
#0 0x00007f58c9e3a9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x00007f58ccc79e26 in virCondWait () from /lib64/libvirt.so.0
#2 0x00007f58a046f70b in qemuMonitorSend () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#3 0x00007f58a04811d0 in qemuMonitorJSONCommandWithFd () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#4 0x00007f58a0482a01 in qemuMonitorJSONSetCapabilities () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#5 0x00007f58a044f567 in qemuConnectMonitor () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#6 0x00007f58a04506f8 in qemuProcessWaitForMonitor () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#7 0x00007f58a0456a52 in qemuProcessLaunch () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#8 0x00007f58a045a4b2 in qemuProcessStart () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#9 0x00007f58a04bd5c6 in qemuDomainObjStart.constprop.50 () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#10 0x00007f58a04bdd26 in qemuDomainCreateWithFlags () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#11 0x00007f58cce1524c in virDomainCreate () from /lib64/libvirt.so.0
#12 0x000055e7499a3da3 in remoteDispatchDomainCreateHelper ()
libvirtd is not actually entirely stuck; it is waiting for qemu to
start communicating with us while starting the VM.
Why qemu got stuck is not clear from this backtrace.
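
With many threads in the dump, it can help to filter for the ones sitting in a
given function. A minimal sketch, assuming the gdb output is piped in or read
from a saved file; `threads_in` is a hypothetical helper name, not part of gdb
or libvirt:

```shell
# Hypothetical helper: read a gdb 'thread apply all bt' dump on stdin and
# print the header line of every thread whose stack contains the function
# named in $1. Each matching thread is printed once.
threads_in() {
    awk -v fn="$1" '
        # A new thread section starts: remember its header line.
        /^Thread [0-9]+ / { header = $0; printed = 0; next }
        # A frame naming the function: report the thread it belongs to.
        header != "" && !printed && index($0, " " fn " (") {
            print header
            printed = 1
        }
    '
}
```

For example: `gdb -batch -p $(pidof libvirtd) -ex 't a a bt' | threads_in qemuMonitorSend`
would list the threads waiting for a reply on the qemu monitor.
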
To unstick libvirt you should not need to restart it; it should be
enough to run 'virsh destroy $VM', which kills off the stuck qemu.
You can use the same approach you used for collecting the backtrace
of libvirtd to check what qemu is doing, as long as the stuck qemu
process is still around.
Additionally, you can have a look in /var/log/libvirt/qemu/$VM.log
to see whether qemu logged something.
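
Put together, the steps could look roughly like the sketch below. It collects
the qemu backtrace and the log tail first and only then destroys the domain,
since a killed process can no longer be inspected. The function name and its
optional directory arguments are hypothetical, and the pid-file location
/var/run/libvirt/qemu/$VM.pid is an assumption about the default layout that
may not hold on every installation:

```shell
# Sketch only. Usage: inspect_stuck_qemu VM [RUN_DIR] [LOG_DIR]
inspect_stuck_qemu() {
    vm="$1"
    run_dir="${2:-/var/run/libvirt/qemu}"   # assumed default pid-file directory
    log_dir="${3:-/var/log/libvirt/qemu}"   # log directory named in the mail
    pidfile="${run_dir}/${vm}.pid"

    if [ ! -r "$pidfile" ]; then
        echo "no readable pid file for ${vm} at ${pidfile}" >&2
        return 1
    fi
    pid=$(cat "$pidfile")

    # Same gdb invocation as used for libvirtd, pointed at qemu instead.
    gdb -batch -p "$pid" -ex 't a a bt f'

    # Last lines qemu logged before it stopped responding.
    tail -n 50 "${log_dir}/${vm}.log"

    # Only now kill the stuck qemu so that libvirtd can carry on.
    virsh destroy "$vm"
}
```

Redirecting the output to a file, for example `inspect_stuck_qemu myvm > qemu-stuck.txt 2>&1`
with `myvm` as a placeholder domain name, keeps the backtrace around for later.
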
System related info:
* OS: CentOS 7.8.2003
* libvirt version: 4.5.0-33