Issue report: Snapshot-create-as block while the vm's process is killed out.

Hello, I have meet one issues while using snapshot-create-as with external file. Centos 8, Libivirt 7.0.0 Qemu: 5.2.0 While making one snapshot with memory and disk extaernal file, and killing the process of the vm qemu same time mostly, the snapshot-create-as command blocked. So it will failed to start the vm again for "cannot acquire state change lock (held by monitor=remoteDispatchDomainSnapshotCreateXML)". The log is as bellow: 2021-08-12 10:27:38.373+0000: 4344: error : qemuDomainAgentAvailable:8125 : Guest agent is not responding: QEMU guest agent is not connected 2021-08-12 10:27:38.573+0000: 4340: error : qemuDomainAgentAvailable:8125 : Guest agent is not responding: QEMU guest agent is not connected 2021-08-12 10:27:38.757+0000: 128211: error : qemuDomainAgentAvailable:8125 : Guest agent is not responding: QEMU guest agent is not connected 2021-08-12 10:27:39.397+0000: 4341: error : qemuDomainAgentAvailable:8125 : Guest agent is not responding: QEMU guest agent is not connected 2021-08-12 10:27:39.887+0000: 4309: error : virNetSocketReadWire:1817 : End of file while reading data: Input/output error 2021-08-12 10:27:53.735+0000: 4340: error : qemuDomainAgentAvailable:8125 : Guest agent is not responding: QEMU guest agent is not connected 2021-08-12 10:27:54.822+0000: 4344: error : qemuDomainAgentAvailable:8125 : Guest agent is not responding: QEMU guest agent is not connected 2021-08-12 10:27:55.230+0000: 4340: error : qemuDomainAgentAvailable:8125 : Guest agent is not responding: QEMU guest agent is not connected 2021-08-12 10:46:40.994+0000: 4344: warning : qemuDomainObjTaint:6187 : Domain id=95 name='vm_6_135' uuid=b2812e01-331a-4e65-912b-73be5ace2271 is tainted: high-privileges 2021-08-12 10:48:05.785+0000: 4309: error : virNetSocketReadWire:1817 : End of file while reading data: Input/output error 2021-08-12 10:49:50.440+0000: 1643666: error : qemuMonitorIO:583 : internal error: End of file from qemu monitor 2021-08-12 10:50:13.809+0000: 4341: warning : qemuDomainObjTaint:6187 : Domain id=96 name='vm_6_135' uuid=b2812e01-331a-4e65-912b-73be5ace2271 is tainted: high-privileges 2021-08-12 10:51:21.587+0000: 1646757: error : qemuMonitorIO:583 : internal error: End of file from qemu monitor 2021-08-12 10:52:28.421+0000: 4342: warning : qemuDomainObjBeginJobInternal:953 : Cannot start job (none, none, start) for domain vm_6_135; current job is (none, none, snapshot) owned by (0 <null>, 0 <null>, 4340 remoteDispatchDomainSnapshotCreateXML (flags=0x280)) for (0s, 0s, 67s) 2021-08-12 10:52:28.421+0000: 4342: error : qemuDomainObjBeginJobInternal:975 : Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainSnapshotCreateXML) 2021-08-12 10:53:10.522+0000: 4342: warning : qemuDomainObjBeginJobInternal:953 : Cannot start job (none, none, start) for domain vm_6_135; current job is (none, none, snapshot) owned by (0 <null>, 0 <null>, 4340 remoteDispatchDomainSnapshotCreateXML (flags=0x280)) for (0s, 0s, 109s) 2021-08-12 10:53:10.522+0000: 4342: error : qemuDomainObjBeginJobInternal:975 : Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainSnapshotCreateXML) 2021-08-12 10:53:17.821+0000: 128211: warning : qemuDomainObjBeginJobInternal:953 : Cannot start job (none, none, start) for domain vm_6_135; current job is (none, none, snapshot) owned by (0 <null>, 0 <null>, 4340 remoteDispatchDomainSnapshotCreateXML (flags=0x280)) for (0s, 0s, 117s) 2021-08-12 10:53:17.821+0000: 128211: error : qemuDomainObjBeginJobInternal:975 : Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainSnapshotCreateXML) ... ... 2021-08-13 05:49:41.240+0000: 4342: warning : qemuDomainObjBeginJobInternal:953 : Cannot start job (none, none, start) for domain vm_6_135; current job is (none, none, snapshot) owned by (0 <null>, 0 <null>, 4340 remoteDispatchDomainSnapshotCreateXML (flags=0x280)) for (0s, 0s, 68300s) Gdb attach the proces of the libvirt, found there are two suspicious threads, stack as below: gdb) bt #0 0x00007f8f91a492fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f8f9574ed7a in virCondWait () from /lib64/libvirt.so.0 #2 0x00007f8f95784ef5 in virDomainObjWait () from /lib64/libvirt.so.0 #3 0x00007f8f422eb26d in qemuMigrationSrcWaitForCompletion () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so #4 0x00007f8f422ee5d3 in qemuMigrationSrcToFile () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so #5 0x00007f8f42331caa in qemuSaveImageCreate () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so #6 0x00007f8f42336ac0 in qemuSnapshotCreateXML () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so #7 0x00007f8f422b6ac1 in qemuDomainSnapshotCreateXML () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so #8 0x00007f8f9591f604 in virDomainSnapshotCreateXML () from /lib64/libvirt.so.0 #9 0x0000562390841b81 in remoteDispatchDomainSnapshotCreateXMLHelper () #10 0x00007f8f9580e807 in virNetServerProgramDispatch () from /lib64/libvirt.so.0 #11 0x00007f8f95813b26 in virNetServerHandleJob () from /lib64/libvirt.so.0 #12 0x00007f8f9574f85f in virThreadPoolWorker () from /lib64/libvirt.so.0 #13 0x00007f8f9574eecb in virThreadHelper () from /lib64/libvirt.so.0 #14 0x00007f8f91a4314a in start_thread () from /lib64/libpthread.so.0 #15 0x00007f8f941f1f23 in clone () from /lib64/libc.so.6 (gdb) bt #0 0x00007f8f91a492fc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f8f9574ed7a in virCondWait () from /lib64/libvirt.so.0 #2 0x00007f8f95784ef5 in virDomainObjWait () from /lib64/libvirt.so.0 #3 0x00007f8f42257ac8 in qemuBlockStorageSourceCreateGeneric () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so #4 0x00007f8f4225b88d in qemuBlockStorageSourceCreate () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so #5 0x00007f8f423346ee in qemuSnapshotDiskPrepareOne () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so #6 0x00007f8f423353f8 in qemuSnapshotCreateXML () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so #7 0x00007f8f422b6ac1 in qemuDomainSnapshotCreateXML () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so #8 0x00007f8f9591f604 in virDomainSnapshotCreateXML () from /lib64/libvirt.so.0 #9 0x0000562390841b81 in remoteDispatchDomainSnapshotCreateXMLHelper () #10 0x00007f8f9580e807 in virNetServerProgramDispatch () from /lib64/libvirt.so.0 #11 0x00007f8f95813b26 in virNetServerHandleJob () from /lib64/libvirt.so.0 #12 0x00007f8f9574f85f in virThreadPoolWorker () from /lib64/libvirt.so.0 #13 0x00007f8f9574eecb in virThreadHelper () from /lib64/libvirt.so.0 #14 0x00007f8f91a4314a in start_thread () from /lib64/libpthread.so.0 #15 0x00007f8f941f1f23 in clone () from /lib64/libc.so.6 (gdb) The call work flow one is: qemuSnapshotCreateActiveExternal -> qemuSaveImageCreate -> qemuMigrationSrcToFile -> qemuMigrationSrcWaitForCompletion -> virDomainObjWait The others is similar : qemuSnapshotCreateXML -> ... It may be resume after the restart libvirtd.service. Is it correct that the stat change lock should be unlocked while the vm is terminal or shutdown and how to fix it? Thanks! 13520781732@189.cn

On Mon, Aug 16, 2021 at 11:04:12 +0800, 13520781732@189.cn wrote:
Hello,
I have meet one issues while using snapshot-create-as with external file.
Centos 8, Libivirt 7.0.0 Qemu: 5.2.0
While making one snapshot with memory and disk extaernal file, and killing the process of the vm qemu same time mostly, the snapshot-create-as command blocked. So it will failed to start the vm again for "cannot acquire state change lock (held by monitor=remoteDispatchDomainSnapshotCreateXML)".
[...]
The call work flow one is: qemuSnapshotCreateActiveExternal -> qemuSaveImageCreate -> qemuMigrationSrcToFile -> qemuMigrationSrcWaitForCompletion -> virDomainObjWait
The others is similar : qemuSnapshotCreateXML -> ...
It may be resume after the restart libvirtd.service.
Is it correct that the stat change lock should be unlocked while the vm is terminal or shutdown and how to fix it?
The function handling the shutdown of the monitor is supposed to wake up all waiting threads, which should prevent such a problem. In this case the log attached is not enough to see what's really happening, so please attach a full debug log and ideally report the issue in our gitlab issue tracker: https://gitlab.com/libvirt/libvirt/-/issues/new Guidance on how to setup logging for debugging can be found at: https://libvirt.org/kbase/debuglogs.html#less-verbose-logging-for-qemu-vms

Hello, Peter, I submitted one issue as the link: https://gitlab.com/libvirt/libvirt/-/issues/201 Create memory snapshot while kill -9 the pid of the vm, snapshot-create-as command failed and the vm can not started again for blocked . (#201) · Issues · libvirt / libvirt · GitLab With libvirtd debug log level. The issue I reproduced with work flows. Thank You! 13520781732@189.cn From: Peter Krempa Date: 2021-08-16 17:17 To: 13520781732@189.cn CC: libvir-list Subject: Re: Issue report: Snapshot-create-as block while the vm's process is killed out. On Mon, Aug 16, 2021 at 11:04:12 +0800, 13520781732@189.cn wrote:
Hello,
I have meet one issues while using snapshot-create-as with external file.
Centos 8, Libivirt 7.0.0 Qemu: 5.2.0
While making one snapshot with memory and disk extaernal file, and killing the process of the vm qemu same time mostly, the snapshot-create-as command blocked. So it will failed to start the vm again for "cannot acquire state change lock (held by monitor=remoteDispatchDomainSnapshotCreateXML)".
[...]
The call work flow one is: qemuSnapshotCreateActiveExternal -> qemuSaveImageCreate -> qemuMigrationSrcToFile -> qemuMigrationSrcWaitForCompletion -> virDomainObjWait
The others is similar : qemuSnapshotCreateXML -> ...
It may be resume after the restart libvirtd.service.
Is it correct that the stat change lock should be unlocked while the vm is terminal or shutdown and how to fix it?
The function handling the shutdown of the monitor is supposed to wake up all waiting threads, which should prevent such a problem. In this case the log attached is not enough to see what's really happening, so please attach a full debug log and ideally report the issue in our gitlab issue tracker: https://gitlab.com/libvirt/libvirt/-/issues/new Guidance on how to setup logging for debugging can be found at: https://libvirt.org/kbase/debuglogs.html#less-verbose-logging-for-qemu-vms
participants (2)
-
13520781732@189.cn
-
Peter Krempa