On Mon, Feb 21, 2011 at 09:30:32AM +0100, Dominik Klein wrote:
On 02/21/2011 09:19 AM, Dominik Klein wrote:
>>> - Is it possible to capture 10-15 second blktrace on your underlying
>>> physical device. That should give me some idea what's happening.
>>
>> Will do, read on.
>
> Just realized I missed this one ... Had better done it right away.
>
> So here goes.
>
> Setup as in first email. 8 Machines, 2 important, 6 not important ones
> with a throttle of ~10M. group_isolation=1. Each vm dd'ing zeroes.
>
> blktrace -d /dev/sdb -w 30
> === sdb ===
> CPU 0: 4769 events, 224 KiB data
> CPU 1: 28079 events, 1317 KiB data
> CPU 2: 1179 events, 56 KiB data
> CPU 3: 5529 events, 260 KiB data
> CPU 4: 295 events, 14 KiB data
> CPU 5: 649 events, 31 KiB data
> CPU 6: 185 events, 9 KiB data
> CPU 7: 180 events, 9 KiB data
> CPU 8: 17 events, 1 KiB data
> CPU 9: 12 events, 1 KiB data
> CPU 10: 6 events, 1 KiB data
> CPU 11: 55 events, 3 KiB data
> CPU 12: 28005 events, 1313 KiB data
> CPU 13: 1542 events, 73 KiB data
> CPU 14: 4814 events, 226 KiB data
> CPU 15: 389 events, 19 KiB data
> CPU 16: 1545 events, 73 KiB data
> CPU 17: 119 events, 6 KiB data
> CPU 18: 3019 events, 142 KiB data
> CPU 19: 62 events, 3 KiB data
> CPU 20: 800 events, 38 KiB data
> CPU 21: 17 events, 1 KiB data
> CPU 22: 243 events, 12 KiB data
> CPU 23: 1 events, 1 KiB data
> Total: 81511 events (dropped 0), 3822 KiB data
>
> A very constant 296 blocked processes in vmstat during this run. But...
> apparently no data is written at all (see the "bo" column).
Hm, this sounds bad. If you have put a limit of ~10MB/s, then no
"bo" at all is bad. That would explain why your box stops responding
and you need to do a power reset.
- I am assuming that you have not put any throttling limits on the root
group (see the example commands after this list for how to check). Is your
system root also on /dev/sdb, or on a separate disk altogether?
- This sounds like a bug in the throttling logic. To narrow it down, can you
switch the end device to the "deadline" scheduler (example below)? If it
still happens, the problem is more or less in the throttling layer.
- We can also try to remove the dm layers: just create partitions on
/dev/sdb, export them as virtio disks to the virtual machines, and see if
it still happens with the dm layer out of the picture.
- In one of the mails you mentioned that with 1 virtual machine, throttling
READs and WRITEs works for you. So it looks like 1 virtual machine does not
hang, but once you launch 8 virtual machines it hangs. Can we try increasing
the number of virtual machines gradually and confirm that it only happens
once a certain number of virtual machines is launched?
- Can you also paste the rules you have put on the important and
non-important groups (sample commands below)? I suspect that one of the
rules has gone horribly wrong, in the sense that the limit is very low and
effectively no virtual machine is making any progress.
- How long does it take to reach this locked state where bo=0?
- You can also try piping the blktrace output through blkparse to standard
output and capture some of it by copy-pasting the last messages (example
below).
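
Some example commands for the points above. All of these are only sketches;
the paths and names are assumptions on my side, so adjust them to your setup.

To check that the root group carries no limits (assuming the blkio
controller is mounted at /cgroup/blkio):

  # both files should be empty if no limits are set on the root group
  cat /cgroup/blkio/blkio.throttle.read_bps_device
  cat /cgroup/blkio/blkio.throttle.write_bps_device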
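
For the "deadline" test (assuming the end device is /dev/sdb):

  # the currently active scheduler is shown in square brackets
  cat /sys/block/sdb/queue/scheduler
  echo deadline > /sys/block/sdb/queue/scheduler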
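
For the rules, I mean the contents of the blkio.throttle.* files in each
group. As a rough sketch (the group name "notimportant" and the 8:16
major:minor numbers for /dev/sdb are just guesses from my side):

  # ~10MB/s write limit in bytes/sec; check the device numbers with: ls -l /dev/sdb
  echo "8:16 10485760" > /cgroup/blkio/notimportant/blkio.throttle.write_bps_device
  # dump the rules currently configured in each child group
  grep . /cgroup/blkio/*/blkio.throttle.*_bps_device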
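
For the blktrace capture, something like this streams the parsed trace to
standard output (again assuming /dev/sdb):

  blktrace -d /dev/sdb -o - | blkparse -i -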
In the meantime, I will try to launch more machines and see if I can
reproduce the issue.
Thanks
Vivek
>
> vmstat 2
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r   b swpd      free  buff  cache si so bi   bo    in    cs us sy id wa
>  0 296    0 125254224 21432 142016  0  0 16  633   181   331  0  0 93  7
>  0 296    0 125253728 21432 142016  0  0  0    0 17115 33794  0  0 25 75
>  0 296    0 125254112 21432 142016  0  0  0    0 17084 33721  0  0 25 74
>  1 296    0 125254352 21440 142012  0  0  0   18 17047 33736  0  0 25 75
>  0 296    0 125304224 21440 131060  0  0  0    0 17630 33989  0  1 23 76
>  1 296    0 125306496 21440 130260  0  0  0    0 16810 33401  0  0 20 80
>  4 296    0 125307208 21440 129856  0  0  0    0 17169 33744  0  0 26 74
>  0 296    0 125307496 21448 129508  0  0  0   14 17105 33650  0  0 36 64
>  0 296    0 125307712 21452 129672  0  0  2 1340 17117 33674  0  0 22 78
>  1 296    0 125307752 21452 129520  0  0  0    0 16875 33438  0  0 29 70
>  1 296    0 125307776 21452 129520  0  0  0    0 16959 33560  0  0 21 79
>  1 296    0 125307792 21460 129520  0  0  0   12 16700 33239  0  0 15 85
>  1 296    0 125307808 21460 129520  0  0  0    0 16750 33274  0  0 25 74
>  1 296    0 125307808 21460 129520  0  0  0    0 17020 33601  0  0 26 74
>  1 296    0 125308272 21460 129520  0  0  0    0 17080 33616  0  0 20 80
>  1 296    0 125308408 21460 129520  0  0  0    0 16428 32972  0  0 42 58
>  1 296    0 125308016 21460 129524  0  0  0    0 17021 33624  0  0 22 77
> While we're on that ... It is impossible for me now to recover from this
> state without pulling the power plug.
>
> On the VMs' consoles I see messages like
> INFO: task (kjournald|flush-254|dd|rs:main|...) blocked for more than
> 120 seconds.
If VMs are completely blocked and not making any progress, it is expected.
> While the ssh sessions through which the dd was started seem intact
> (pressing enter gives a new line), it is impossible to cancel the dd
> command. Logging in on the VMs' consoles is also impossible.
>
> Opening a new ssh session to the host also does not work. Killing the
> qemu-kvm processes from a session opened earlier leaves zombie processes.
>
> Moving the VMs back to the root cgroup makes no difference either.
> Regards
> Dominik
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list