On Thursday, 14 October 2021 09:28:12 CEST Michal Prívozník wrote:
On 10/13/21 2:54 PM, Vojtech Juranek wrote:
> Hi,
> I'm trying to find the root cause of BZ #1898049 [1]. When setting up a
> Windows HA cluster on Windows Server VMs running on top of oVirt, the Windows
> cluster validator runs a couple of tests and fails during the test "Validate
> SCSI-3 Persistent Reservation", and one of the VMs of
> the cluster is paused with an IO error. The disk definition is as follows:
> <disk type='block' device='lun' sgio='unfiltered' snapshot='no'>
>   <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
>   <source dev='/dev/mapper/3600a09803830447a4f244c4657616f6f' index='1'>
>     <seclabel model='dac' relabel='no'/>
>     <reservations managed='yes'>
>       <source type='unix' path='/var/lib/libvirt/qemu/domain-1-Windows-2016-2/pr-helper0.sock' mode='client'/>
>     </reservations>
>   </source>
>   <backingStore/>
>   <target dev='sdb' bus='scsi'/>
>   <shareable/>
>   <alias name='ua-26b4975e-e1d4-4e27-b2c6-2ea0894a571b'/>
>   <address type='drive' controller='0' bus='0' target='0' unit='1'/>
> </disk>
>
> and the libvirt error I get is below [2].
>
> When I try to create a reservation from the Windows VM manually, I get the
> following error (but I'm not sure I'm doing the whole process correctly):
>
> .\sg_persist --out --register --param-sark=123abc e:
> QEMU QEMU HARDDISK 2.5+
> Peripheral device type: disk
>
> PR out (Register): command not supported
> sg_persist failed: Illegal request, Invalid opcode
>
>
> Do you have any ideas what could be wrong or how to determine
> the root cause of this issue?
>
> Thanks in advance.
> Vojta
>
>
> [1] https://bugzilla.redhat.com/1898049
> [2] libvirt debug log:
>
> 2021-10-12 11:43:25.148+0000: 2006427: debug : qemuMonitorEmitIOError:1243 : mon=0x7fb02006a020
> 2021-10-12 11:43:25.148+0000: 2006427: info : virObjectRef:402 : OBJECT_REF: obj=0x7fb02006a020
> 2021-10-12 11:43:25.148+0000: 2006427: info : virObjectRef:402 : OBJECT_REF: obj=0x7fafd0130020
> 2021-10-12 11:43:25.148+0000: 2000208: info : virObjectRef:402 : OBJECT_REF: obj=0x7fafd010d340
> 2021-10-12 11:43:25.148+0000: 2006427: info : virObjectNew:258 : OBJECT_NEW: obj=0x7fb020082590 classname=virDomainEventIOError
> 2021-10-12 11:43:25.148+0000: 2000208: info : vir_object_finalize:321 : OBJECT_DISPOSE: obj=0x7fb020082500
> 2021-10-12 11:43:25.148+0000: 2006427: info : virObjectNew:258 : OBJECT_NEW: obj=0x7fb020082620 classname=virDomainEventIOError
> 2021-10-12 11:43:25.148+0000: 2000208: info : virObjectUnref:380 : OBJECT_UNREF: obj=0x7fb020082500
> 2021-10-12 11:43:25.148+0000: 2006427: debug : qemuProcessHandleIOError:907 : Transitioned guest Windows-2016-2 to paused state due to IO error
> 2021-10-12 11:43:25.148+0000: 2000208: info : virObjectUnref:380 : OBJECT_UNREF: obj=0x7fafd010d340
> 2021-10-12 11:43:25.148+0000: 2006427: info : virObjectNew:258 : OBJECT_NEW: obj=0x7fafbc1fb8c0 classname=virDomainEventLifecycle
> 2021-10-12 11:43:25.148+0000: 2006427: debug : virDomainLockProcessPause:204 : plugin=0x7fafd01272a0 dom=0x7fb01400f5e0 state=0x7fb02401d768
> 2021-10-12 11:43:25.148+0000: 2006427: debug : virDomainLockManagerNew:134 : plugin=0x7fafd01272a0 dom=0x7fb01400f5e0 withResources=1
> 2021-10-12 11:43:25.148+0000: 2006427: debug : virLockManagerPluginGetDriver:276 : plugin=0x7fafd01272a0
> 2021-10-12 11:43:25.148+0000: 2006427: debug : virLockManagerNew:300 : driver=0x7fafd444a000 type=0 nparams=5 params=0x7fafd77de640 flags=0x0
> 2021-10-12 11:43:25.148+0000: 2006427: debug : virLockManagerLogParams:97 : key=uuid type=uuid value=70eee88c-ba2c-4c6c-bd51-c2b663db27f8
> 2021-10-12 11:43:25.148+0000: 2006427: debug : virLockManagerLogParams:90 : key=name type=string value=Windows-2016-2
> 2021-10-12 11:43:25.148+0000: 2006427: debug : virLockManagerLogParams:78 : key=id type=uint value=1
> 2021-10-12 11:43:25.148+0000: 2006427: debug : virLockManagerLogParams:78 : key=pid type=uint value=2006418
> 2021-10-12 11:43:25.148+0000: 2006427: debug : virLockManagerLogParams:93 : key=uri type=cstring value=(null)
> 2021-10-12 11:43:25.148+0000: 2006427: debug : virDomainLockManagerNew:146 : Adding leases
> 2021-10-12 11:43:25.148+0000: 2006427: debug : virDomainLockManagerNew:151 : Adding disks
> 2021-10-12 11:43:25.149+0000: 2006427: debug : virDomainLockManagerAddImage:90 : Add disk /rhev/data-center/mnt/blockSD/7c4f09b6-9e87-436f-bda9-22d1f0b50955/images/f5d6e074-dfe9-462d-8cfd-3e14b0eb5aea/766e36b2-84a6-43e7-a48b-a5f47e669860
> 2021-10-12 11:43:25.149+0000: 2006427: debug : virLockManagerAddResource:326 : lock=0x7fafbc19e250 type=0 name=/rhev/data-center/mnt/blockSD/7c4f09b6-9e87-436f-bda9-22d1f0b50955/images/f5d6e074-dfe9-462d-8cfd-3e14b0eb5aea/766e36b2-84a6-43e7-a48b-a5f47e669860 nparams=0 params=(nil) flags=0x0
> 2021-10-12 11:43:25.149+0000: 2006427: debug : virDomainLockManagerAddImage:90 : Add disk /dev/mapper/3600a09803830447a4f244c4657616f6f
> 2021-10-12 11:43:25.149+0000: 2006427: debug : virLockManagerAddResource:326 : lock=0x7fafbc19e250 type=0 name=/dev/mapper/3600a09803830447a4f244c4657616f6f nparams=0 params=(nil) flags=0x2
> 2021-10-12 11:43:25.149+0000: 2006427: debug : virLockManagerRelease:359 : lock=0x7fafbc19e250 state=0x7fb02401d768 flags=0x0
> 2021-10-12 11:43:25.149+0000: 2006427: debug : virLockManagerFree:381 : lock=0x7fafbc19e250
> 2021-10-12 11:43:25.149+0000: 2006427: debug : qemuProcessHandleIOError:920 : Preserving lock state '<null>'
> 2021-10-12 11:43:25.150+0000: 2006427: info : virObjectUnref:380 : OBJECT_UNREF: obj=0x7fafd0130020
> 2021-10-12 11:43:25.150+0000: 2006427: info : virObjectUnref:380 : OBJECT_UNREF: obj=0x7fb02006a020
> 2021-10-12 11:43:25.150+0000: 2000208: info : virObjectRef:402 : OBJECT_REF: obj=0x7fafd010d340
> 2021-10-12 11:43:25.150+0000: 2000208: info : vir_object_finalize:321 : OBJECT_DISPOSE: obj=0x7fb020082590
> 2021-10-12 11:43:25.150+0000: 2006427: info : virObjectRef:402 : OBJECT_REF: obj=0x7fb02006a020
> 2021-10-12 11:43:25.150+0000: 2000208: info : virObjectUnref:380 : OBJECT_UNREF: obj=0x7fb020082590
> 2021-10-12 11:43:25.150+0000: 2006427: info : virObjectUnref:380 : OBJECT_UNREF: obj=0x7fb02006a020
> 2021-10-12 11:43:25.150+0000: 2000208: info : virObjectNew:258 : OBJECT_NEW: obj=0x564a89a4cc60 classname=virDomain
> 2021-10-12 11:43:25.150+0000: 2006427: info : virObjectUnref:380 : OBJECT_UNREF: obj=0x7fb02006a020
> 2021-10-12 11:43:25.150+0000: 2000208: info : virObjectRef:402 : OBJECT_REF: obj=0x7fafd0018ca0
> 2021-10-12 11:43:25.150+0000: 2000208: info : virObjectRef:402 : OBJECT_REF: obj=0x564a89978df0
> 2021-10-12 11:43:25.150+0000: 2000208: debug : virAccessManagerCheckDomain:238 : manager=0x564a89978df0(name=stack) driver=QEMU domain=0x7ffd2c677010 perm=0
> 2021-10-12 11:43:25.150+0000: 2000208: debug : virAccessManagerCheckDomain:238 : manager=0x564a89978e50(name=none) driver=QEMU domain=0x7ffd2c677010 perm=0
> 2021-10-12 11:43:25.150+0000: 2000208: info : virObjectUnref:380 : OBJECT_UNREF: obj=0x564a89978df0
> 2021-10-12 11:43:25.150+0000: 2000208: debug : remoteRelayDomainEventIOErrorReason:529 : Relaying domain io error Windows-2016-2 1 /dev/mapper/3600a09803830447a4f244c4657616f6f ua-26b4975e-e1d4-4e27-b2c6-2ea0894a571b 1 , callback 3
Unfortunately, you cut off the log a bit early/late; there should be a
message from qemu's monitor mentioning BLOCK_IO_ERROR. Can you check
your logs, please?
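
One way to search a large libvirtd debug log for that event is a short script. The following is only a sketch: it assumes the event is logged by qemuMonitorEmitEvent in the form "... debug : qemuMonitorEmitEvent:<line> : mon=<ptr> event=<NAME>", possibly wrapped across two lines, and the helper name find_monitor_events is made up for this example.

```python
import re

# Assumed log format (may differ between libvirt versions):
#   <timestamp>: <tid>: debug : qemuMonitorEmitEvent:<line> : mon=<ptr> event=<NAME>
# \s+ also matches a newline, so entries wrapped by a mail client still match.
EVENT_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+\+\d{4}): \d+: debug : "
    r"qemuMonitorEmitEvent:\d+ :\s+mon=\S+\s+event=(\w+)"
)


def find_monitor_events(log_text, event="BLOCK_IO_ERROR"):
    """Return the timestamps of all monitor events with the given name."""
    return [m.group(1) for m in EVENT_RE.finditer(log_text) if m.group(2) == event]


if __name__ == "__main__":
    import sys

    # Usage: python find_events.py /path/to/libvirtd.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for timestamp in find_monitor_events(f.read()):
            print(timestamp)
```

The timestamps it prints can then be used to pick the part of the log surrounding the error.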
Yes, it's a few lines earlier:
2021-10-12 11:43:22.297+0000: 2000213: info : virObjectUnref:380 : OBJECT_UNREF: obj=0x564a89a1d010
2021-10-12 11:43:22.297+0000: 2000213: info : virObjectUnref:380 : OBJECT_UNREF: obj=0x564a89a46360
2021-10-12 11:43:22.297+0000: 2000208: debug : virNetMessageFree:85 : msg=0x564a89a3d510 nfds=0 cb=(nil)
2021-10-12 11:43:22.297+0000: 2000208: debug : virNetServerClientCalculateHandleMode:154 : tls=(nil) hs=-1, rx=0x564a89a48620 tx=(nil)
2021-10-12 11:43:22.297+0000: 2000208: debug : virNetServerClientCalculateHandleMode:185 : mode=01
2021-10-12 11:43:25.148+0000: 2006427: info : virObjectRef:402 : OBJECT_REF: obj=0x7fb02006a020
2021-10-12 11:43:25.148+0000: 2006427: debug : qemuMonitorEmitEvent:1166 : mon=0x7fb02006a020 event=BLOCK_IO_ERROR
2021-10-12 11:43:25.148+0000: 2006427: info : virObjectRef:402 : OBJECT_REF: obj=0x7fb02006a020
2021-10-12 11:43:25.148+0000: 2006427: debug : qemuProcessHandleEvent:574 : vm=0x7fb01400f5e0
2021-10-12 11:43:25.148+0000: 2006427: info : virObjectNew:258 : OBJECT_NEW: obj=0x7fb020082500 classname=virDomainQemuMonitorEvent
2021-10-12 11:43:25.148+0000: 2006427: info : virObjectUnref:380 : OBJECT_UNREF: obj=0x7fb02006a020
2021-10-12 11:43:25.148+0000: 2006427: debug : qemuMonitorEmitIOError:1243 : mon=0x7fb02006a020
However, it seems the issue was caused by using error_policy='stop' in
<driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
I need to double-check, but switching to error_policy='report' seems to fix the
issue, and the HA validator succeeds. With 'stop' the VM is paused on an I/O
error, while with 'report' the error is passed on to the guest (probably one of
the tests exercises concurrent reservations and validates that one of the
reservations fails?).
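
For illustration, the change amounts to editing one attribute of the <driver> element. Below is a minimal Python sketch, assuming the disk XML is available as a string. The helper name set_error_policy is made up for this example, the sample XML is a shortened version of the disk definition quoted above, and how the edited XML is then applied (through oVirt or otherwise) is not covered here.

```python
import xml.etree.ElementTree as ET


def set_error_policy(disk_xml, policy="report"):
    """Return the disk XML with error_policy on its <driver> element set to policy."""
    disk = ET.fromstring(disk_xml)
    driver = disk.find("driver")
    if driver is None:
        raise ValueError("disk XML has no <driver> element")
    driver.set("error_policy", policy)
    return ET.tostring(disk, encoding="unicode")


# Shortened sample based on the disk definition from this thread.
example = """<disk type='block' device='lun' sgio='unfiltered' snapshot='no'>
  <driver name='qemu' type='raw' cache='none' error_policy='stop' io='native'/>
  <target dev='sdb' bus='scsi'/>
</disk>"""

if __name__ == "__main__":
    print(set_error_policy(example))
```

All other attributes of the disk are left untouched, so the only difference in the output is the error_policy value.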
Thanks
Vojta