
On Tue, Jul 20, 2021 at 15:52:58 +0800, Jiatong Shen wrote:
Hello community,
I am seeing the following log in production:
2021-07-20 07:43:49.417+0000: 3918294: error : qemuDomainObjBeginJobInternal:4945 : Timed out during operation: cannot acquire state change lock (held by qemuProcessReconnect)
2021-07-20 07:44:19.424+0000: 3918296: warning : qemuDomainObjBeginJobInternal:4933 : Cannot start job (modify, none) for domain instance-0000074e; current job is (modify, none) owned by (3919429 qemuProcessReconnect, 0 <null>) for (2183193s, 0s)
2021-07-20 07:44:19.424+0000: 3918296: error : qemuDomainObjBeginJobInternal:4945 : Timed out during operation: cannot acquire state change lock (held by qemuProcessReconnect)
2021-07-20 07:44:49.428+0000: 3918296: warning : qemuDomainObjBeginJobInternal:4933 : Cannot start job (query, none) for domain instance-0000074e; current job is (modify, none) owned by (3919429 qemuProcessReconnect, 0 <null>) for (2183223s, 0s)
2021-07-20 07:44:49.428+0000: 3918296: error : qemuDomainObjBeginJobInternal:4945 : Timed out during operation: cannot acquire state change lock (held by qemuProcessReconnect)
2021-07-20 07:45:19.429+0000: 3918298: warning : qemuDomainObjBeginJobInternal:4933 :
I am confused about what qemuProcessReconnect is and why it acquires a domain state lock.
qemuProcessReconnect is an operation, executed in a separate thread, which re-establishes the connection to a qemu process after you restart libvirtd. Errors like the ones in your log usually mean that the reconnection got stuck for some reason. Unfortunately your log doesn't show why, and unless you've got a debug log from before that happened it won't be possible to tell. Theoretically, seeing which function the thread doing the reconnect is stuck in could perhaps show why. Usually the only fix for this is to destroy the VM which is stuck.
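To find the thread worth inspecting, the "Cannot start job" warnings quoted above can be picked apart with a few lines of Python. This is only a rough sketch: the regular expression is written against the message as it appears in this log, and the field names (`thread`, `api`, `held`) are my own labels for what look like the owning thread ID, the function holding the job, and the number of seconds it has been held. Other libvirt versions may format the message differently.

```python
import re

# Matches the "Cannot start job" warning as it appears in the quoted log.
# The group names are illustrative labels, not libvirt terminology.
WARNING_RE = re.compile(
    r"Cannot start job \((?P<wanted>\w+), \w+\) for domain (?P<domain>\S+); "
    r"current job is \((?P<current>\w+), \w+\) "
    r"owned by \((?P<thread>\d+) (?P<api>\S+), \d+ \S+\) "
    r"for \((?P<held>\d+)s, \d+s\)"
)


def parse_job_warning(line):
    """Return the job owner details from one warning line, or None."""
    match = WARNING_RE.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["thread"] = int(info["thread"])
    info["held"] = int(info["held"])
    return info


line = (
    "2021-07-20 07:44:19.424+0000: 3918296: warning : "
    "qemuDomainObjBeginJobInternal:4933 : Cannot start job (modify, none) "
    "for domain instance-0000074e; current job is (modify, none) owned by "
    "(3919429 qemuProcessReconnect, 0 <null>) for (2183193s, 0s)"
)

info = parse_job_warning(line)
print(info["domain"], info["thread"], info["api"])
# Convert the held time from seconds to days.
print(round(info["held"] / 86400, 1), "days")
```

For the line above this reports thread 3919429 in qemuProcessReconnect, holding the job for roughly 25 days, which presumably dates back to the last libvirtd restart. That thread ID is the one to look for if you manage to get a backtrace of the daemon.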