investigate locks on a domain

Hello community, I am seeing following log in production, 2021-07-20 07:43:49.417+0000: 3918294: error : qemuDomainObjBeginJobInternal:4945 : Timed out during operation: cannot acquire state change lock (held by qemuProcessReconnect) 2021-07-20 07:44:19.424+0000: 3918296: warning : qemuDomainObjBeginJobInternal:4933 : Cannot start job (modify, none) for domain instance-0000074e; current job is (modify, none) owned by (3919429 qemuProcessReconnect, 0 <null>) for (2183193s, 0s) 2021-07-20 07:44:19.424+0000: 3918296: error : qemuDomainObjBeginJobInternal:4945 : Timed out during operation: cannot acquire state change lock (held by qemuProcessReconnect) 2021-07-20 07:44:49.428+0000: 3918296: warning : qemuDomainObjBeginJobInternal:4933 : Cannot start job (query, none) for domain instance-0000074e; current job is (modify, none) owned by (3919429 qemuProcessReconnect, 0 <null>) for (2183223s, 0s) 2021-07-20 07:44:49.428+0000: 3918296: error : qemuDomainObjBeginJobInternal:4945 : Timed out during operation: cannot acquire state change lock (held by qemuProcessReconnect) 2021-07-20 07:45:19.429+0000: 3918298: warning : qemuDomainObjBeginJobInternal:4933 : I am confused about what is qemuProcessReconnect and why it acquires a domain state lock.. -- Best Regards, Jiatong Shen

On Tue, Jul 20, 2021 at 15:52:58 +0800, Jiatong Shen wrote:
Hello community,
I am seeing following log in production,
2021-07-20 07:43:49.417+0000: 3918294: error : qemuDomainObjBeginJobInternal:4945 : Timed out during operation: cannot acquire state change lock (held by qemuProcessReconnect) 2021-07-20 07:44:19.424+0000: 3918296: warning : qemuDomainObjBeginJobInternal:4933 : Cannot start job (modify, none) for domain instance-0000074e; current job is (modify, none) owned by (3919429 qemuProcessReconnect, 0 <null>) for (2183193s, 0s) 2021-07-20 07:44:19.424+0000: 3918296: error : qemuDomainObjBeginJobInternal:4945 : Timed out during operation: cannot acquire state change lock (held by qemuProcessReconnect) 2021-07-20 07:44:49.428+0000: 3918296: warning : qemuDomainObjBeginJobInternal:4933 : Cannot start job (query, none) for domain instance-0000074e; current job is (modify, none) owned by (3919429 qemuProcessReconnect, 0 <null>) for (2183223s, 0s) 2021-07-20 07:44:49.428+0000: 3918296: error : qemuDomainObjBeginJobInternal:4945 : Timed out during operation: cannot acquire state change lock (held by qemuProcessReconnect) 2021-07-20 07:45:19.429+0000: 3918298: warning : qemuDomainObjBeginJobInternal:4933 :
I am confused about what is qemuProcessReconnect and why it acquires a domain state lock..
qemuProcessReconnect is an operation that is executed in a separate thread which re-establishes connection to a qemu process once you restart libvirtd. This usually means that the reconnection process got stuck for some reason. Unfortunately your log doesn't show why and unless you've got a debug log prior to that happening it won't be possible. Theoretically seeing which function the thread doing the reconnect is stuck at could perhaps show why. Usually the only fix for this is to destroy the VM which is stuck.

On 7/20/21 11:17 AM, Peter Krempa wrote:
On Tue, Jul 20, 2021 at 15:52:58 +0800, Jiatong Shen wrote:
Hello community,
I am seeing following log in production,
2021-07-20 07:43:49.417+0000: 3918294: error : qemuDomainObjBeginJobInternal:4945 : Timed out during operation: cannot acquire state change lock (held by qemuProcessReconnect) 2021-07-20 07:44:19.424+0000: 3918296: warning : qemuDomainObjBeginJobInternal:4933 : Cannot start job (modify, none) for domain instance-0000074e; current job is (modify, none) owned by (3919429 qemuProcessReconnect, 0 <null>) for (2183193s, 0s) 2021-07-20 07:44:19.424+0000: 3918296: error : qemuDomainObjBeginJobInternal:4945 : Timed out during operation: cannot acquire state change lock (held by qemuProcessReconnect) 2021-07-20 07:44:49.428+0000: 3918296: warning : qemuDomainObjBeginJobInternal:4933 : Cannot start job (query, none) for domain instance-0000074e; current job is (modify, none) owned by (3919429 qemuProcessReconnect, 0 <null>) for (2183223s, 0s) 2021-07-20 07:44:49.428+0000: 3918296: error : qemuDomainObjBeginJobInternal:4945 : Timed out during operation: cannot acquire state change lock (held by qemuProcessReconnect) 2021-07-20 07:45:19.429+0000: 3918298: warning : qemuDomainObjBeginJobInternal:4933 :
I am confused about what is qemuProcessReconnect and why it acquires a domain state lock..
qemuProcessReconnect is an operation that is executed in a separate thread which re-establishes connection to a qemu process once you restart libvirtd.
This usually means that the reconnection process got stuck for some reason. Unfortunately your log doesn't show why and unless you've got a debug log prior to that happening it won't be possible.
Theoretically seeing which function the thread doing the reconnect is stuck at could perhaps show why.
This should be pretty easy to see though. All that's needed is attaching gdb an running backtrace. Ideally over all threads as you don't know which one is stuck trying to reconnect. Something like the following: gdb -q -p $(pgrep libvirtd) -ex 't a a bt' -ex 'q' Just make sure you have debuginfo installed for libvirt otherwise the stacktrace won't be useful. Michal
participants (3)
-
Jiatong Shen
-
Michal Prívozník
-
Peter Krempa