Hello everyone,
I’m looking into an issue where guests freeze (process state Dl) during a block disk mirror from one storage to another (NFS). During the mirror, the guest's network stack can freeze for up to 10 seconds.
Looking at the storage and I/O, I see good throughput and low latency (<3 ms), and I am having trouble tracking down the source of the issue, as neither storage nor networking shows any problems. Interestingly, when I run the same test with virtio-blk, I do not see the process freezes at anything like the frequency or duration I see with virtio-scsi, which seems to indicate a client-side rather than a storage-side problem.
I looked at the syscalls and nothing stood out:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 28.51   20.672654        8339      2479           ioctl
 27.81   20.162714        3379      5967        31 futex
 22.02   15.964498         785     20335           poll
 15.22   11.038403         150     73561           io_submit
  4.17    3.023285          41     73540           lseek
  1.20    0.868003           5    158591           write
  0.63    0.459030          11     42871           ppoll
  0.22    0.159263           8     19314           recvmsg
  0.16    0.115520           5     22526           read
  0.04    0.029149       29149         1           restart_syscall
  0.01    0.009252          28       330           sendmsg
  0.00    0.001221        1221         1           munmap
  0.00    0.000458          22        21           fcntl
  0.00    0.000286          95         3           openat
  0.00    0.000166           5        32           rt_sigprocmask
  0.00    0.000103          10        10           fdatasync
  0.00    0.000099          25         4           clone
  0.00    0.000081           7        12           mmap
  0.00    0.000077          19         4           close
  0.00    0.000076           6        12           mprotect
  0.00    0.000056          14         4           madvise
  0.00    0.000025           6         4           set_robust_list
  0.00    0.000023           6         4           prctl
------ ----------- ----------- --------- --------- ----------------
100.00   72.504442                419626        31 total
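Since the summary is sorted by total time, a quick way to rank it by average time per call instead might look like the sketch below. It assumes the summary was saved as plain `strace -c`-style text; `parse_summary`, `rank_by_latency`, and the `min_calls` cut-off are hypothetical names of mine, and the sample only repeats a few rows from the table above.

```python
import re

# One data row of an `strace -c`-style summary:
# % time, seconds, usecs/call, calls, optional errors, syscall name.
ROW = re.compile(
    r"^\s*(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+)\s+(\d+)\s+(?:(\d+)\s+)?([a-z_]\w*)\s*$"
)

# A few rows copied from the summary in this post (not the full table).
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 28.51   20.672654        8339      2479           ioctl
 27.81   20.162714        3379      5967        31 futex
 22.02   15.964498         785     20335           poll
  0.04    0.029149       29149         1           restart_syscall
------ ----------- ----------- --------- --------- ----------------
100.00   72.504442                419626        31 total
"""


def parse_summary(text):
    """Return one dict per syscall row; header, rulers and the total row are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        pct, secs, usecs, calls, errors, name = m.groups()
        if name == "total":
            # The total row has no usecs/call column, so its fields would be misread.
            continue
        rows.append({
            "syscall": name,
            "pct_time": float(pct),
            "seconds": float(secs),
            "usecs_per_call": int(usecs),
            "calls": int(calls),
            "errors": int(errors) if errors else 0,
        })
    return rows


def rank_by_latency(rows, min_calls=100):
    """Sort by average usecs/call, ignoring syscalls seen fewer than min_calls times."""
    frequent = [r for r in rows if r["calls"] >= min_calls]
    return sorted(frequent, key=lambda r: r["usecs_per_call"], reverse=True)


if __name__ == "__main__":
    for row in rank_by_latency(parse_summary(SAMPLE)):
        print(f'{row["syscall"]:<16} {row["usecs_per_call"]:>8} usecs/call '
              f'over {row["calls"]} calls')
```

The `min_calls` filter is only there so that one-off calls such as `restart_syscall` do not dominate the ranking; among the frequently called syscalls in my numbers, `ioctl` has the highest average at 8339 usecs/call.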
Does anyone have an idea how to better debug this issue?
Thanks
Bjoern