Hello everyone,

 

I’m looking at an issue where guests freeze (process state Dl) during a block disk mirror from one storage to another (NFS), and the network stack of the guest can freeze for up to 10 seconds.

Looking at the storage and I/O, I see good throughput and low latency (<3 ms), and I am having trouble tracking down the source of the issue, as neither storage nor networking shows any problems. Interestingly, when I run the same test with virtio-blk, I do not see process freezes at anywhere near the frequency or duration that I see with virtio-scsi, which seems to indicate a client-side rather than a storage-side problem.

 

I looked at the syscalls and nothing stood out:

 

 

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 28.51   20.672654        8339      2479           ioctl
 27.81   20.162714        3379      5967        31 futex
 22.02   15.964498         785     20335           poll
 15.22   11.038403         150     73561           io_submit
  4.17    3.023285          41     73540           lseek
  1.20    0.868003           5    158591           write
  0.63    0.459030          11     42871           ppoll
  0.22    0.159263           8     19314           recvmsg
  0.16    0.115520           5     22526           read
  0.04    0.029149       29149         1           restart_syscall
  0.01    0.009252          28       330           sendmsg
  0.00    0.001221        1221         1           munmap
  0.00    0.000458          22        21           fcntl
  0.00    0.000286          95         3           openat
  0.00    0.000166           5        32           rt_sigprocmask
  0.00    0.000103          10        10           fdatasync
  0.00    0.000099          25         4           clone
  0.00    0.000081           7        12           mmap
  0.00    0.000077          19         4           close
  0.00    0.000076           6        12           mprotect
  0.00    0.000056          14         4           madvise
  0.00    0.000025           6         4           set_robust_list
  0.00    0.000023           6         4           prctl
------ ----------- ----------- --------- --------- ----------------
100.00   72.504442                419626        31 total
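
In case it helps anyone compare runs (for example virtio-scsi against virtio-blk), here is a minimal Python sketch that parses a summary like the one above and ranks syscalls by average time per call rather than by total time. The function names (`parse_strace_summary`, `slowest`) and the `min_calls` cutoff are my own choices for illustration, and the embedded sample is only an excerpt of the rows above. It summarises averages only, so it cannot show individual multi-second stalls.

```python
# Excerpt of the `strace -c` summary from this post (selected rows only).
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 28.51   20.672654        8339      2479           ioctl
 27.81   20.162714        3379      5967        31 futex
 22.02   15.964498         785     20335           poll
 15.22   11.038403         150     73561           io_submit
  0.04    0.029149       29149         1           restart_syscall
------ ----------- ----------- --------- --------- ----------------
100.00   72.504442                419626        31 total
"""


def parse_strace_summary(text):
    """Parse `strace -c` summary text into a list of dicts, one per syscall."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 5 fields (no errors) or 6 fields (with errors).
        if len(fields) not in (5, 6) or fields[-1] == "total":
            continue
        try:
            seconds = float(fields[1])
            usecs_per_call = int(fields[2])
            calls = int(fields[3])
            errors = int(fields[4]) if len(fields) == 6 else 0
        except ValueError:
            # Separator lines of dashes (or anything non-numeric) end up here.
            continue
        rows.append({
            "syscall": fields[-1],
            "seconds": seconds,
            "usecs_per_call": usecs_per_call,
            "calls": calls,
            "errors": errors,
        })
    return rows


def slowest(rows, min_calls=100):
    """Return rows with at least `min_calls` calls, slowest average first."""
    frequent = [r for r in rows if r["calls"] >= min_calls]
    return sorted(frequent, key=lambda r: r["usecs_per_call"], reverse=True)


if __name__ == "__main__":
    for r in slowest(parse_strace_summary(SAMPLE)):
        print(f"{r['syscall']:<16} {r['usecs_per_call']:>8} us/call "
              f"{r['calls']:>8} calls")
```

The `min_calls` filter is there so that one-off calls such as `restart_syscall` do not dominate the ranking; set it to 1 to see everything.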

 

 

Does anyone have an idea how to better debug this issue?

 

Thanks

Bjoern