On Fri, Feb 18, 2011 at 03:42:45PM +0100, Dominik Klein wrote:
Hi Vivek
I don't know whether you follow the libvirt list, I assume you don't. So
I thought I'd forward you an E-Mail involving the blkio controller and a
terrible situation arising from using it (maybe in a wrong way).
I'd truly appreciate it if you read it and commented on it. Maybe I did
something wrong, but maybe I also found a bug of some kind.
Hi Dominik,
Thanks for forwarding me this mail. You're right, I was not on libvir-list;
I have just now subscribed.
A few questions inline.
-------- Original Message --------
Subject: Re: [libvirt] [PATCH 0/6 v3] Add blkio cgroup support
Date: Fri, 18 Feb 2011 14:42:51 +0100
From: Dominik Klein <dk(a)in-telegence.net>
To: libvir-list(a)redhat.com
Hi
back with some testing results.
>> How about starting the guest with the option "cache=none" to bypass
>> the page cache? This should help, I think.
>
> I will read up on where to set that and give it a try. Thanks for the hint.
So here's what I did and found out:
The host system has 2 12 core CPUs and 128 GB of Ram.
I have 8 test VMs named kernel1 to kernel8. Each VM has 4 VCPUs, 2 GB of
RAM and one disk, which is an LV on the host. Cache mode is "none":
So you have only one root SATA disk and set up a linear logical volume on
that? If not, can you give more info about the storage configuration?
- I am assuming you are using CFQ on your underlying physical disk.
- What kernel version are you testing with?
- Cache=none mode is good; it should make all the I/O O_DIRECT on the host,
  which should show up as SYNC I/O in CFQ without losing I/O context info.
  The only concern is the intermediate dm layer and whether it is changing
  the I/O context somehow. I am not sure at this point.
- Is it possible to capture a 10-15 second blktrace on your underlying
  physical device? That should give me some idea of what's happening.
- Can you also try setting /sys/block/<disk>/queue/iosched/group_isolation=1
on your underlying physical device where CFQ is running and see if it makes
any difference.
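The two requests above could be carried out like this (a sketch only:
/dev/sda stands in for your actual underlying physical device, and
blktrace/blkparse must be installed and run as root):

```shell
# blktrace reads events via debugfs, so make sure it is mounted
mount -t debugfs none /sys/kernel/debug 2>/dev/null

# Enable per-cgroup isolation of buffered/async IO in CFQ
echo 1 > /sys/block/sda/queue/iosched/group_isolation

# Capture ~15 seconds of block-layer events on the physical disk,
# then post-process the per-CPU binary traces into a readable log
blktrace -d /dev/sda -w 15 -o trace
blkparse -i trace -o trace.txt
```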
for vm in kernel1 kernel2 kernel3 kernel4 kernel5 kernel6 kernel7
kernel8; do virsh dumpxml $vm|grep cache; done
<driver name='qemu' type='raw' cache='none'/>
<driver name='qemu' type='raw' cache='none'/>
<driver name='qemu' type='raw' cache='none'/>
<driver name='qemu' type='raw' cache='none'/>
<driver name='qemu' type='raw' cache='none'/>
<driver name='qemu' type='raw' cache='none'/>
<driver name='qemu' type='raw' cache='none'/>
<driver name='qemu' type='raw' cache='none'/>
My goal is to give more I/O time to kernel1 and kernel2 than to the rest
of the VMs.
mount -t cgroup -o blkio none /mnt
cd /mnt
mkdir important
mkdir notimportant
echo 1000 > important/blkio.weight
echo 100 > notimportant/blkio.weight
for vm in kernel3 kernel4 kernel5 kernel6 kernel7 kernel8; do
cd /proc/$(pgrep -f "qemu-kvm.*$vm")/task
for task in *; do
/bin/echo $task > /mnt/notimportant/tasks
done
done
for vm in kernel1 kernel2; do
cd /proc/$(pgrep -f "qemu-kvm.*$vm")/task
for task in *; do
/bin/echo $task > /mnt/important/tasks
done
done
Then I used cssh to connect to all 8 VMs and execute
dd if=/dev/zero of=testfile bs=1M count=1500
in all VMs simultaneously.
Results are:
kernel1: 47.5593 s, 33.1 MB/s
kernel2: 60.1464 s, 26.2 MB/s
kernel3: 74.204 s, 21.2 MB/s
kernel4: 77.0759 s, 20.4 MB/s
kernel5: 65.6309 s, 24.0 MB/s
kernel6: 81.1402 s, 19.4 MB/s
kernel7: 70.3881 s, 22.3 MB/s
kernel8: 77.4475 s, 20.3 MB/s
Results vary a little from run to run, but the spread is nowhere near as
dramatic as weights of 1000 vs. 100 would suggest.
So I went and tried to throttle I/O of kernel3-8 to 10 MB/s instead of
weighting I/O. First I rebooted everything so that no old cgroup
configuration was left in place, then set up everything except the 100
and 1000 weight configuration.
quote from blkio.txt:
------------
- blkio.throttle.write_bps_device
- Specifies upper limit on WRITE rate to the device. IO rate is
specified in bytes per second. Rules are per device. Following is
the format.
echo "<major>:<minor> <rate_bytes_per_second>" >
/cgrp/blkio.throttle.write_bps_device
-------------
for vm in kernel1 kernel2 kernel3 kernel4 kernel5 kernel6 kernel7
kernel8; do ls -lH /dev/vdisks/$vm; done
brw-rw---- 1 root root 254, 23 Feb 18 13:45 /dev/vdisks/kernel1
brw-rw---- 1 root root 254, 24 Feb 18 13:45 /dev/vdisks/kernel2
brw-rw---- 1 root root 254, 25 Feb 18 13:45 /dev/vdisks/kernel3
brw-rw---- 1 root root 254, 26 Feb 18 13:45 /dev/vdisks/kernel4
brw-rw---- 1 root root 254, 27 Feb 18 13:45 /dev/vdisks/kernel5
brw-rw---- 1 root root 254, 28 Feb 18 13:45 /dev/vdisks/kernel6
brw-rw---- 1 root root 254, 29 Feb 18 13:45 /dev/vdisks/kernel7
brw-rw---- 1 root root 254, 30 Feb 18 13:45 /dev/vdisks/kernel8
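Instead of typing each major:minor pair by hand, the throttle rules could
be derived from the device listing (a sketch; the awk field positions
match the ls -lH output shown above, and the 10000000 limit is the same
10 MB/s used below):

```shell
# For each unimportant VM, build "<major>:<minor> <bytes/sec>" from the
# block-device listing and write it as a write-throttle rule.
for vm in kernel3 kernel4 kernel5 kernel6 kernel7 kernel8; do
    # Field 5 is the major number with a trailing comma, field 6 the minor
    rule=$(ls -lH /dev/vdisks/$vm \
        | awk '{ gsub(",", "", $5); print $5 ":" $6 " 10000000" }')
    /bin/echo "$rule" > /mnt/notimportant/blkio.throttle.write_bps_device
done
```

Each rule must be written as a separate echo; the cgroup file accepts one
"<major>:<minor> <rate>" pair per write.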
/bin/echo 254:25 10000000 >
/mnt/notimportant/blkio.throttle.write_bps_device
/bin/echo 254:26 10000000 >
/mnt/notimportant/blkio.throttle.write_bps_device
/bin/echo 254:27 10000000 >
/mnt/notimportant/blkio.throttle.write_bps_device
/bin/echo 254:28 10000000 >
/mnt/notimportant/blkio.throttle.write_bps_device
/bin/echo 254:29 10000000 >
/mnt/notimportant/blkio.throttle.write_bps_device
/bin/echo 254:30 10000000 >
/mnt/notimportant/blkio.throttle.write_bps_device
Then I ran the previous test again. This resulted in an ever-increasing
load on the host system (the last value I checked was ~300). This is
perfectly reproducible.
uptime
Fri Feb 18 14:42:17 2011
14:42:17 up 12 min, 9 users, load average: 286.51, 142.22, 56.71
Have you run top or something to figure out why the load average is
shooting up? I suspect that because of the throttling limit, I/O threads
are blocked and qemu keeps forking more I/O threads. Can you run top/ps
and figure out what's happening?
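One quick way to check the forking-I/O-threads theory is to count the
threads of each throttled qemu-kvm process over time (a sketch; VM names
and the pgrep pattern are taken from the commands above):

```shell
# Print the thread count of each throttled qemu-kvm process; run this
# repeatedly (e.g. under watch) to see whether the counts keep growing.
for vm in kernel3 kernel4 kernel5 kernel6 kernel7 kernel8; do
    pid=$(pgrep -f "qemu-kvm.*$vm")
    # Each entry in /proc/<pid>/task is one thread of the process
    [ -n "$pid" ] && echo "$vm: $(ls /proc/$pid/task | wc -l) threads"
done
```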
Again, is it some kind of linear volume group from which you have carved
out logical volumes for each virtual machine?
For throttling, can we do a simple test first? That is, run a single
virtual machine, put a throttling limit on its logical volume, and try to
do READs. Once READs work, let's test WRITEs and check why the system
load goes up.
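A minimal version of that read-only test could look like this (a sketch;
the 254:25 device number and the 10 MB/s limit are examples taken from the
setup above, and /dev/vda is assumed to be the guest's virtio disk):

```shell
# Host: throttle READs on the single test VM's logical volume
/bin/echo "254:25 10000000" > /mnt/notimportant/blkio.throttle.read_bps_device

# Guest: read directly from the virtual disk, bypassing the page cache,
# and check that the reported rate stays near the 10 MB/s limit
dd if=/dev/vda of=/dev/null bs=1M count=500 iflag=direct
```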
Thanks
Vivek