[libvirt] Attempting to pivot disk with incomplete mirror results in hang

I have what appears to be a bug when pivoting a disk during a block copy that is not yet 100% finished: the pivot command hangs. I have verified this problem on libvirt 1.2.10.
Here's what I'm seeing (transient guest):
1.) Start the block copy:
[root@host ~]# virsh blockcopy f20-SPICE vda /dev/sdc --format=raw --blockdev
Block Copy started
[root@host ~]#
2.) Query the status to confirm that the job has started:
[root@host ~]# virsh blockjob --info f20-SPICE vda
Block Copy: [ 1 %]
[root@host ~]#
3.) Attempt the pivot before the mirror reaches 100% (this makes the command hang):
[root@test-parent-kvm ~]# virsh blockjob f20-SPICE vda --pivot
^^^ command is now hanging
-------------------------------------------------------------
While it's hanging, I see this in /var/log/libvirt/libvirtd.log:
19:29:33.376+0000: 1845: error : qemuDomainBlockPivot:15367 : block copy still active: disk 'vda' not ready for pivot yet
19:30:31.000+0000: 1842: warning : qemuDomainObjBeginJobInternal:1376 : Cannot start job (query, none) for domain f20-SPICE; current job is (modify, none) owned by (1845, 0)
19:30:31.000+0000: 1842: error : qemuDomainObjBeginJobInternal:1381 : Timed out during operation: cannot acquire state change lock
This then makes other commands that query this domain hang as well, such as:
virsh blockjob --info f20-SPICE vda
with this being written to the log:
19:30:31.000+0000: 1842: error : qemuDomainObjBeginJobInternal:1381 : Timed out during operation: cannot acquire state change lock
19:34:45.568+0000: 1841: error : virNetSocketReadWire:1571 : End of file while reading data: Input/output error
Is this expected behavior? It would be nice if, in step #3, the command returned an error instead of hanging, saying that it cannot pivot because the mirror is not yet at 100% (or similar). I am happy to test later versions or patches as needed.
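On an affected version, one way to avoid sending the pivot in step #3 too early is to poll the job from the client and pivot only once the copy reports 100%. The following is a minimal sketch, not libvirt's own fix. It assumes a libvirt-python style domain object whose blockJobInfo(disk, 0) returns a dict with 'cur' and 'end' (empty when no job is active) and whose blockJobAbort(disk, flags) ends the job. The names pivot_when_ready and MirrorNotReady are hypothetical, and treating cur == end as "ready for pivot" is a heuristic.

```python
import time

# Value of VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT (1 << 1) in libvirt's
# virDomainBlockJobAbortFlags, hard-coded so the sketch runs without the bindings.
VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT = 2


class MirrorNotReady(Exception):
    """Hypothetical error raised instead of sending a premature pivot."""


def pivot_when_ready(dom, disk, timeout=60.0, interval=1.0, sleep=time.sleep):
    """Pivot `disk` only after the copy job reports cur == end.

    `dom` is assumed to behave like a libvirt-python virDomain:
    blockJobInfo(disk, 0) returns {'cur': ..., 'end': ...} (empty if no job)
    and blockJobAbort(disk, flags) ends the job.
    """
    waited = 0.0
    while True:
        info = dom.blockJobInfo(disk, 0)
        if not info:
            raise MirrorNotReady(f"no active block job on {disk}")
        # Heuristic: treat a fully copied job (cur == end) as ready to pivot.
        if info["end"] > 0 and info["cur"] == info["end"]:
            return dom.blockJobAbort(disk, VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)
        if waited >= timeout:
            pct = 100 * info["cur"] // info["end"] if info["end"] else 0
            raise MirrorNotReady(f"mirror for {disk} at {pct}%, not ready for pivot")
        sleep(interval)  # wait before polling the job again
        waited += interval
```

With a real connection this would be called as pivot_when_ready(conn.lookupByName("f20-SPICE"), "vda"). The timeout turns an indefinite wait into an error that the caller can report, which is roughly the behavior asked for in step #3.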

On 01/13/2015 02:43 PM, Scott Sullivan wrote:
I have what appears to be a bug when pivoting a disk during a block copy that is not yet 100% finished, resulting in the pivot command hanging. I have verified this problem on libvirt 1.2.10.
I forgot to mention: I am using QEMU 2.2.0 in this case. The libvirt version (as mentioned earlier) is 1.2.10.

On 01/13/2015 12:43 PM, Scott Sullivan wrote:
I have what appears to be a bug when pivoting a disk during a block copy that is not yet 100% finished, resulting in the pivot command hanging. I have verified this problem on libvirt 1.2.10.
Known issue: https://bugzilla.redhat.com/show_bug.cgi?id=1134294
3.) Attempt the pivot before mirror reaches 100% (this makes cmd hang)
[root@test-parent-kvm ~]# virsh blockjob f20-SPICE vda --pivot
^^^ cmd is now hanging
The attempt to pivot should have failed immediately. I'm working on a patch.
Is this expected behavior? It would be nice if on step #3 instead of hanging the command returned in an error state saying cannot pivot because the mirror is not yet 100% or similar. I am happy to test later versions or patches as needed.
I'll remember to cc you on the patch I post, hopefully later today :)
--
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

On 01/13/2015 03:05 PM, Eric Blake wrote:
On 01/13/2015 12:43 PM, Scott Sullivan wrote:
Is this expected behavior? It would be nice if on step #3 instead of hanging the command returned in an error state saying cannot pivot because the mirror is not yet 100% or similar. I am happy to test later versions or patches as needed.
I'll remember to cc you on the patch I post, hopefully later today :)
Excellent! I will be sure to test it. If you post one later today, I will likely test it tomorrow morning. Thanks for your hard work.

On 01/13/2015 12:43 PM, Scott Sullivan wrote:
I have what appears to be a bug when pivoting a disk during a block copy that is not yet 100% finished, resulting in the pivot command hanging. I have verified this problem on libvirt 1.2.10.
I couldn't reproduce with the latest libvirt. After more research, I think the problem was fixed for 1.2.11 with this commit:
commit fe3691f66348d55e88c9811fd79ff9314e053977
Author: Erik Skultety <eskultet@redhat.com>
Date: Wed Dec 3 13:56:47 2014 +0100
qemu: Fix virsh freeze when blockcopy storage file is removed
If someone removes blockcopy storage file when still in mirroring phase and then requesting blockjob abort using pivot, virsh cmd freezes. This is not an issue with older qemu versions which did not support asynchronous jobs (which we prefer by default). As we have reached the mirroring phase successfully, polling monitor for blockjob info always returns 1 and the loop never ends. This fix introduces a check for qemuDomainBlockPivot return code, possibly skipping the asynchronous waiting completely, if an error occurred and asynchronous waiting was the preferred method.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1139567
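The failure mode the commit message describes can be modeled in a few lines. This is a toy simulation in Python, not libvirt's C code: FakeMonitor, abort_with_pivot and the HUNG sentinel are hypothetical names. The monitor reports 1 while the job is still active (as the commit message says the polling did), and max_polls exists only so the pre-fix path terminates in a demo instead of spinning forever like the real loop. Judging by the log above, the real thread kept holding the domain's modify job while it spun, which would explain why the later query timed out on the state change lock.

```python
HUNG = "hung"  # sentinel standing in for "virsh never returns"


class FakeMonitor:
    """Toy stand-in for the QEMU monitor during a block copy."""

    def __init__(self, mirror_ready):
        self.mirror_ready = mirror_ready
        self.job_active = True

    def pivot(self):
        if not self.mirror_ready:
            return -1  # like "block copy still active: ... not ready for pivot yet"
        self.job_active = False  # pivot accepted, the job then finishes
        return 0

    def block_job_info(self):
        return 1 if self.job_active else 0  # 1 == job still running


def abort_with_pivot(monitor, check_pivot_rc, max_polls=1000):
    """Pivot, then wait for the job to go away (the asynchronous path).

    check_pivot_rc=False models the behavior before the fix;
    check_pivot_rc=True models the added return-code check.
    """
    rc = monitor.pivot()
    if check_pivot_rc and rc < 0:
        return rc  # fixed path: report the error, skip the waiting loop
    polls = 0
    while monitor.block_job_info() == 1:
        polls += 1
        if polls >= max_polls:
            return HUNG  # the real loop had no such bound and never ended
    return rc
```

Here abort_with_pivot(FakeMonitor(mirror_ready=False), check_pivot_rc=False) returns HUNG, while the same call with check_pivot_rc=True returns -1 at once; a ready mirror pivots successfully (returns 0) either way.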

On 01/13/2015 11:48 PM, Eric Blake wrote:
On 01/13/2015 12:43 PM, Scott Sullivan wrote:
I have what appears to be a bug when pivoting a disk during a block copy that is not yet 100% finished, resulting in the pivot command hanging. I have verified this problem on libvirt 1.2.10.
I couldn't reproduce with the latest libvirt. After more research, I think the problem was fixed for 1.2.11 with this commit:
commit fe3691f66348d55e88c9811fd79ff9314e053977 Author: Erik Skultety <eskultet@redhat.com> Date: Wed Dec 3 13:56:47 2014 +0100
qemu: Fix virsh freeze when blockcopy storage file is removed
Thanks Eric. I can confirm this is corrected in 1.2.11 now.