
On Mon, Feb 21, 2011 at 03:36:14PM +0800, Gui Jianfeng wrote:
Dominik,
Would you try "oflag=direct" when you do the tests in the guests? And make sure /sys/block/xxx/queue/iosched/group_isolation is set to 1.
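A minimal sketch of that suggestion; the disk name sdX below is a placeholder for the backing physical disk, not something taken from this thread:

  # inside each guest: repeat the write test with the page cache bypassed
  dd if=/dev/zero of=testfile bs=1M count=1500 oflag=direct

  # on the host: enable group isolation for CFQ on the backing disk
  echo 1 > /sys/block/sdX/queue/iosched/group_isolation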
oflag=direct in the guest might be good for testing and understanding the problem, but in practice we will not have control over what a user runs inside the guest. The only control we will have is to use cache=none for the guest and then control any traffic coming out of the guest. Thanks, Vivek
I guess with such a setting, your tests should go well.
Thanks, Gui
Vivek Goyal wrote:
On Fri, Feb 18, 2011 at 03:42:45PM +0100, Dominik Klein wrote:
Hi Vivek
I don't know whether you follow the libvirt list; I assume you don't. So I thought I'd forward you an e-mail involving the blkio controller and a terrible situation arising from using it (maybe in a wrong way).
I'd truly appreciate it if you read it and commented on it. Maybe I did something wrong, but maybe I also found a bug in some way.
Hi Dominik,
Thanks for forwarding me this mail. Yes, I am not on libvir-list. I have just now subscribed.
Few questions inline.
-------- Original Message -------- Subject: Re: [libvirt] [PATCH 0/6 v3] Add blkio cgroup support Date: Fri, 18 Feb 2011 14:42:51 +0100 From: Dominik Klein <dk@in-telegence.net> To: libvir-list@redhat.com
Hi
back with some testing results.
How about starting the guest with the option "cache=none" to bypass the pagecache? This should help, I think. I will read up on where to set that and give it a try. Thanks for the hint. So here's what I did and found out:
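A rough sketch of where that is set, assuming the VMs are managed through libvirt (kernel1 is one of the VM names used below):

  virsh edit kernel1
  # then make sure each disk's <driver> element carries cache='none', e.g.:
  #   <driver name='qemu' type='raw' cache='none'/>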
The host system has two 12-core CPUs and 128 GB of RAM.
I have 8 test VMs named kernel1 to kernel8. Each VM has 4 VCPUs, 2 GB of RAM and one disk, which is an LV on the host. Cache mode is "none":
So you have only one root SATA disk and set up a linear logical volume on that? If not, can you give more info about the storage configuration?
- I am assuming you are using CFQ on your underlying physical disk.
- What kernel version are you testing with?
- Cache=none mode is good; it should make all the IO O_DIRECT on the host and it should show up as SYNC IO on CFQ without losing io context info. The only problem is the intermediate dm layer and whether it is changing the io context somehow. I am not sure at this point.
- Is it possible to capture a 10-15 second blktrace on your underlying physical device? That should give me some idea of what's happening. (A possible invocation is sketched right after this list.)
- Can you also try setting /sys/block/<disk>/queue/iosched/group_isolation=1 on your underlying physical device where CFQ is running and see if it makes any difference.
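A possible way to capture such a trace, assuming blktrace/blkparse are installed and sdX stands for the backing physical disk:

  # confirm which scheduler the backing disk uses (cfq expected)
  cat /sys/block/sdX/queue/scheduler

  # capture ~15 seconds of block-layer events while the dd tests run, then decode them
  blktrace -d /dev/sdX -w 15 -o kvmtrace
  blkparse -i kvmtrace -o kvmtrace.txt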
for vm in kernel1 kernel2 kernel3 kernel4 kernel5 kernel6 kernel7 kernel8; do
  virsh dumpxml $vm | grep cache
done

<driver name='qemu' type='raw' cache='none'/>
<driver name='qemu' type='raw' cache='none'/>
<driver name='qemu' type='raw' cache='none'/>
<driver name='qemu' type='raw' cache='none'/>
<driver name='qemu' type='raw' cache='none'/>
<driver name='qemu' type='raw' cache='none'/>
<driver name='qemu' type='raw' cache='none'/>
<driver name='qemu' type='raw' cache='none'/>
My goal is to give more I/O time to kernel1 and kernel2 than to the rest of the VMs.
mount -t cgroup -o blkio none /mnt
cd /mnt
mkdir important
mkdir notimportant
echo 1000 > important/blkio.weight
echo 100 > notimportant/blkio.weight

for vm in kernel3 kernel4 kernel5 kernel6 kernel7 kernel8; do
  cd /proc/$(pgrep -f "qemu-kvm.*$vm")/task
  for task in *; do
    /bin/echo $task > /mnt/notimportant/tasks
  done
done
for vm in kernel1 kernel2; do
  cd /proc/$(pgrep -f "qemu-kvm.*$vm")/task
  for task in *; do
    /bin/echo $task > /mnt/important/tasks
  done
done
Then I used cssh to connect to all 8 VMs and execute dd if=/dev/zero of=testfile bs=1M count=1500 in all VMs simultaneously.
Results are:
kernel1: 47.5593 s, 33.1 MB/s
kernel2: 60.1464 s, 26.2 MB/s
kernel3: 74.204 s, 21.2 MB/s
kernel4: 77.0759 s, 20.4 MB/s
kernel5: 65.6309 s, 24.0 MB/s
kernel6: 81.1402 s, 19.4 MB/s
kernel7: 70.3881 s, 22.3 MB/s
kernel8: 77.4475 s, 20.3 MB/s
Results vary a little bit from run to run, but the difference is nothing like as pronounced as weights of 1000 vs. 100 would suggest.
So I went and tried to throttle the I/O of kernel3-8 to 10 MB/s instead of weighting it. First I rebooted everything so that no old cgroup configuration was left in place, and then set up everything except the 100 and 1000 weight configuration.
quote from blkio.txt:
------------
- blkio.throttle.write_bps_device
  - Specifies upper limit on WRITE rate to the device. IO rate is specified in bytes per second. Rules are per device. Following is the format.

  echo "<major>:<minor> <rate_bytes_per_second>" > /cgrp/blkio.throttle.write_bps_device
-------------
for vm in kernel1 kernel2 kernel3 kernel4 kernel5 kernel6 kernel7 kernel8; do
  ls -lH /dev/vdisks/$vm
done

brw-rw---- 1 root root 254, 23 Feb 18 13:45 /dev/vdisks/kernel1
brw-rw---- 1 root root 254, 24 Feb 18 13:45 /dev/vdisks/kernel2
brw-rw---- 1 root root 254, 25 Feb 18 13:45 /dev/vdisks/kernel3
brw-rw---- 1 root root 254, 26 Feb 18 13:45 /dev/vdisks/kernel4
brw-rw---- 1 root root 254, 27 Feb 18 13:45 /dev/vdisks/kernel5
brw-rw---- 1 root root 254, 28 Feb 18 13:45 /dev/vdisks/kernel6
brw-rw---- 1 root root 254, 29 Feb 18 13:45 /dev/vdisks/kernel7
brw-rw---- 1 root root 254, 30 Feb 18 13:45 /dev/vdisks/kernel8
/bin/echo 254:25 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
/bin/echo 254:26 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
/bin/echo 254:27 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
/bin/echo 254:28 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
/bin/echo 254:29 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
/bin/echo 254:30 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
Then I ran the previous test again. This resulted in an ever-increasing load on the host system (the last value I checked was ~300). (This is perfectly reproducible.)
uptime
Fri Feb 18 14:42:17 2011
 14:42:17 up 12 min,  9 users,  load average: 286.51, 142.22, 56.71
Have you run top or something to figure out why the load average is shooting up? I suspect that because of the throttling limit, IO threads have been blocked and qemu is forking more IO threads. Can you just run top/ps and figure out what's happening?
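One way to check this (just a suggestion, reusing the pgrep pattern from the commands above):

  # overall picture of what is running/blocked
  top -b -n 1 | head -20

  # number of threads per qemu-kvm process
  for vm in kernel1 kernel2 kernel3 kernel4 kernel5 kernel6 kernel7 kernel8; do
    echo -n "$vm: "
    ls /proc/$(pgrep -f "qemu-kvm.*$vm")/task | wc -l
  done

  # processes stuck in uninterruptible sleep (D state)
  ps -eo state,pid,cmd | awk '$1 == "D"'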
Again, is it some kind of linear volume group from which you have carved out logical volumes for each virtual machine?
For throttling, can we do a simple test first? That is, run a single virtual machine, put a throttling limit on its logical volume, and try to do READs. Once READs work, let's test WRITES and check why the system load goes up.
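A rough sequence for that test, reusing the device numbers and cgroup mount from the commands above; the cgroup name "throttletest" is made up for the example:

  # host: fresh cgroup holding only kernel1 (254:23 per the ls output above), READ limit ~10 MB/s
  mkdir /mnt/throttletest
  echo "254:23 10000000" > /mnt/throttletest/blkio.throttle.read_bps_device
  for task in $(ls /proc/$(pgrep -f "qemu-kvm.*kernel1")/task); do
    /bin/echo $task > /mnt/throttletest/tasks
  done

  # inside the guest: direct-IO read of the file written earlier
  dd if=testfile of=/dev/null bs=1M iflag=direct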
Thanks Vivek
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
-- Regards Gui Jianfeng