Hi
To be as verbose and clear as possible, I will write out every command I use.
This sets everything up for the test:
# setup start
importantvms="kernel1 kernel2"
notimportantvms="kernel3 kernel4 kernel5 kernel6 kernel7 kernel8"
for vm in $importantvms $notimportantvms; do
virsh domstate $vm|grep -q running || virsh start $vm;
done
echo 1 > /sys/block/sdb/queue/iosched/group_isolation
mount -t cgroup -o blkio none /mnt
cd /mnt
mkdir important
mkdir notimportant
for vm in $notimportantvms; do
cd /proc/$(pgrep -f "qemu-kvm.*$vm")/task
for task in *; do
/bin/echo $task > /mnt/notimportant/tasks
done
done
for vm in $importantvms; do
cd /proc/$(pgrep -f "qemu-kvm.*$vm")/task
for task in *; do
/bin/echo $task > /mnt/important/tasks
done
done
#ls -lH /dev/vdisks/kernel[3-8]
#brw-rw---- 1 root root 254, 25 Feb 22 13:42 kernel3
#brw-rw---- 1 root root 254, 26 Feb 22 13:42 kernel4
#brw-rw---- 1 root root 254, 27 Feb 22 13:42 kernel5
#brw-rw---- 1 root root 254, 28 Feb 22 13:42 kernel6
#brw-rw---- 1 root root 254, 29 Feb 22 13:43 kernel7
#brw-rw---- 1 root root 254, 30 Feb 22 13:43 kernel8
for i in $(seq 25 30); do
/bin/echo 254:$i 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
done
# setup complete
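(For reference: whether all qemu-kvm threads really landed in the two
cgroups can be verified with a quick check like this, one tid per line
in the tasks files:
wc -l /mnt/important/tasks /mnt/notimportant/tasks
)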
Hm, this sounds bad. If you have put a limit of ~10MB/s, then no
"bo" is bad. That would explain why your box is not responding
and you need to do a power reset.
- I am assuming that you have not put any throttling limits on the root group.
Is your system root also on /dev/sdb, or on a separate disk altogether?
No throttling on root. Correct.
The system root is on sda; the VMs are on sdb.
- This sounds like a bug in the throttling logic. To narrow it down, can
you start running "deadline" on the end device? If it still happens, it
is more or less in the throttling layer.
cat /sys/block/sdb/queue/scheduler
noop deadline [cfq]
echo deadline > /sys/block/sdb/queue/scheduler
cat /sys/block/sdb/queue/scheduler
noop [deadline] cfq
This changes things:
vmstat before test (nothing going on):
 r   b  swpd       free   buff  cache  si  so  bi      bo     in     cs  us  sy   id  wa
 1   0     0  130231696  17968  69424   0   0   0       0  16573  32834   0   0  100   0

vmstat during test while all 8 vms are writing (2 unthrottled, 6 at 10M):
 r   b  swpd       free   buff  cache  si  so  bi      bo     in     cs  us  sy   id  wa
 2   9     0  126257984  17968  69424   0   0   8  164462  20114  36751   5   4   52  39

vmstat during test when only throttled vms are still dd'ing:
 r   b  swpd       free   buff  cache  si  so  bi      bo     in     cs  us  sy   id  wa
 2  21     0  124518928  17976  69424   0   0   0   63876  17410  33670   1   1   59  39
Load is around 12 during the test, and the results are as expected:
throttled VMs write at roughly 10 MB/s each, unthrottled ones get about
80 MB/s each, which adds up to approximately the 200 MB/s maximum
capacity of the device for direct I/O.
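(For reference, the load inside each guest is just a plain direct-I/O
dd onto its virtio disk; roughly the following, where the guest device
name and the size are from memory and may differ:
dd if=/dev/zero of=/dev/vdb bs=1M count=5000 oflag=direct
)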
- We can also try to remove the dm layers: just create partitions on
/dev/sdb, export them as virtio disks to the virtual machines, and see
if it still happens with the dm layer out of the picture.
Did that. It still happens when the scheduler is cfq, and it still does
not happen when the scheduler is deadline. So it does not seem to be a
dm issue.
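(In case it matters, exporting a partition as a virtio disk boils down
to a drive option along these lines on the qemu-kvm command line; the
partition path here is made up:
qemu-kvm ... -drive file=/dev/sdb1,if=virtio,cache=none
)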
- In one of the mails you mentioned that with 1 virtual machine,
throttling READs and WRITEs is working for you. So it looks like 1
virtual machine does not hang, but once you launch 8 virtual machines,
it hangs. Can we try increasing the number of virtual machines gradually
and confirm that it happens only once a certain number of virtual
machines is launched?
Back on cfq here.
1 throttled vm: works, load ~4
2 throttled vms: works, load ~6
3 throttled vms: works, load ~9
4 throttled vms: works, load ~12
6 throttled vms: works, load ~20
The number of blocked threads increases with the number of VMs dd'ing.
At the beginning of each test, the number of blocked threads goes really
high (4 VMs: 160, 6 VMs: 220), but then drops significantly and stays low.
So it seems that when only throttled VMs are running, the problem does
not occur.
1 throttled + 1 unthrottled vm: works, load ~5
2 throttled + 1 unthrottled vm: boom
Constantly 144 blocked threads, bo=0, load climbing to 144. The system
needs a power reset.
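(The blocked-thread numbers above are the "b" column from vmstat; they
can be cross-checked by listing D-state tasks, e.g. with something like:
ps -eo state,pid,wchan:30,comm | awk '$1 == "D"'
which also shows what each blocked task is sleeping on.)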
So, thinking about what I did in the initial setup, I re-tested without the
for vm in $importantvms; do
cd /proc/$(pgrep -f "qemu-kvm.*$vm")/task
for task in *; do
/bin/echo $task > /mnt/important/tasks
done
done
since I don't do anything with that "important" cgroup (yet) anyway.
It did not make a difference, though.
- Can you also paste the rules you have put on the important and
non-important groups? Somehow I suspect that one of the rules has gone
horribly bad, in the sense that it is very low and effectively no
virtual machine is making any progress.
See the setup commands at the beginning of this email.
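For convenience, the active rules can also be read back directly from
the cgroup files; the readback format should be "major:minor
bytes_per_second":
cat /mnt/notimportant/blkio.throttle.write_bps_device
cat /mnt/important/blkio.throttle.write_bps_device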
- How long does it take to reach this locked state where bo=0?
It goes there "instantly", right after the dd commands start.
- You can also try to redirect the blktrace output to blkparse, send
that to standard output, and capture some output by copy-pasting the
last messages.
I hope this is what you meant:
blktrace -d /dev/sdb -o - | blkparse -i -
The output is in the attachment.
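(If the attachment is inconvenient, the last part can also be captured
inline, e.g. by stopping blktrace after a while and keeping only the
tail:
blktrace -d /dev/sdb -o - | blkparse -i - | tail -n 200 > blkparse-tail.txt
)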
Regards
Dominik