Re: [libvirt] blkio cgroup

Monday, 21 February 2011

On Mon, Feb 21, 2011 at 03:36:14PM +0800, Gui Jianfeng wrote:
...
 Dominik,

 Would you try "oflag=direct" when you run the tests in the guests? And make sure
 /sys/block/xxx/queue/iosched/group_isolation is set to 1.
oflag=direct in the guest might be good for testing and understanding the
problem, but in practice we will not have control over what a user is
running inside the guest. The only control we will have is to use cache=none
for the guest and then control any traffic coming out of the guest.

Thanks
Vivek

...

 I guess with such settings, your tests should go well.

 Thanks,
 Gui

 Vivek Goyal wrote:
 > On Fri, Feb 18, 2011 at 03:42:45PM +0100, Dominik Klein wrote:
 >> Hi Vivek
 >>
 >> I don't know whether you follow the libvirt list, I assume you don't. So
 >> I thought I'd forward you an E-Mail involving the blkio controller and a
 >> terrible situation arising from using it (maybe in a wrong way).
 >>
 >> I'd truly appreciate it if you read it and commented on it. Maybe I did
 >> something wrong, but maybe also I found a bug in some way.
 > 
 > Hi Dominik, 
 > 
 > Thanks for forwarding me this mail. Yes, I am not on libvir-list. I have
 > just now subscribed.
 > 
 > A few questions inline.
 > 
 >> -------- Original Message --------
 >> Subject: Re: [libvirt] [PATCH 0/6 v3] Add blkio cgroup support
 >> Date: Fri, 18 Feb 2011 14:42:51 +0100
 >> From: Dominik Klein <dk(a)in-telegence.net>
 >> To: libvir-list(a)redhat.com
 >>
 >> Hi
 >>
 >> back with some testing results.
 >>
 >>>> How about starting the guest with the option "cache=none" to bypass the
 >>>> page cache? This should help, I think.
 >>> I will read up on where to set that and give it a try. Thanks for the hint.
 >> So here's what I did and found out:
 >>
 >> The host system has two 12-core CPUs and 128 GB of RAM.
 >>
 >> I have 8 test VMs named kernel1 to kernel8. Each VM has 4 VCPUs, 2 GB of
 >> RAM and one disk, which is an LV on the host. Cache mode is "none":
 > 
 > So you have only one root SATA disk and set up a linear logical volume on
 > that? If not, can you give more info about the storage configuration?
 > 
 > - I am assuming you are using CFQ on your underlying physical disk.
 > 
 > - What kernel version are you testing with?
 > 
 > - Cache=none mode is good; it should make all the IO O_DIRECT on the host,
 >   and the IO should show up as SYNC IO on CFQ without losing io context info.
 >   The only problem is the intermediate dm layer and whether it is changing
 >   the io context somehow. I am not sure at this point in time.
 > 
 > - Is it possible to capture a 10-15 second blktrace on your underlying
 >   physical device? That should give me some idea of what's happening.
 > 
 > - Can you also try setting /sys/block/<disk>/queue/iosched/group_isolation=1
 >   on your underlying physical device where CFQ is running and see if it makes
 >   any difference.
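Editor's sketch of the checks requested above, not part of the original mail. The function name active_scheduler and the device name sda are assumptions for illustration, and the group_isolation tunable is only present on kernels whose CFQ exposes it.

```shell
#!/bin/sh
# Print the active I/O scheduler for a block device's sysfs directory.
# The kernel marks the active scheduler with square brackets, e.g.
# "noop deadline [cfq]".
# $1: sysfs directory of the device, e.g. /sys/block/sda (hypothetical name)
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1/queue/scheduler"
}

# Usage (as root, on the physical disk underneath the logical volumes):
#   active_scheduler /sys/block/sda
#   echo 1 > /sys/block/sda/queue/iosched/group_isolation
```

The sysfs directory is passed as an argument so the function can be tried against a scratch directory before touching a real device.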
 > 
 >> for vm in kernel1 kernel2 kernel3 kernel4 kernel5 kernel6 kernel7
 >> kernel8; do virsh dumpxml $vm|grep cache; done
 >>       <driver name='qemu' type='raw' cache='none'/>
 >>       <driver name='qemu' type='raw' cache='none'/>
 >>       <driver name='qemu' type='raw' cache='none'/>
 >>       <driver name='qemu' type='raw' cache='none'/>
 >>       <driver name='qemu' type='raw' cache='none'/>
 >>       <driver name='qemu' type='raw' cache='none'/>
 >>       <driver name='qemu' type='raw' cache='none'/>
 >>       <driver name='qemu' type='raw' cache='none'/>
 >>
 >> My goal is to give more I/O time to kernel1 and kernel2 than to the rest
 >> of the VMs.
 >>
 >> mount -t cgroup -o blkio none /mnt
 >> cd /mnt
 >> mkdir important
 >> mkdir notimportant
 >>
 >> echo 1000 > important/blkio.weight
 >> echo 100 > notimportant/blkio.weight
 >> for vm in kernel3 kernel4 kernel5 kernel6 kernel7 kernel8; do
 >> cd /proc/$(pgrep -f "qemu-kvm.*$vm")/task
 >> for task in *; do
 >> /bin/echo $task > /mnt/notimportant/tasks
 >> done
 >> done
 >>
 >> for vm in kernel1 kernel2; do
 >> cd /proc/$(pgrep -f "qemu-kvm.*$vm")/task
 >> for task in *; do
 >> /bin/echo $task > /mnt/important/tasks
 >> done
 >> done
 >>
 >> Then I used cssh to connect to all 8 VMs and execute
 >> dd if=/dev/zero of=testfile bs=1M count=1500
 >> in all VMs simultaneously.
 >>
 >> Results are:
 >> kernel1: 47.5593 s, 33.1 MB/s
 >> kernel2: 60.1464 s, 26.2 MB/s
 >> kernel3: 74.204 s, 21.2 MB/s
 >> kernel4: 77.0759 s, 20.4 MB/s
 >> kernel5: 65.6309 s, 24.0 MB/s
 >> kernel6: 81.1402 s, 19.4 MB/s
 >> kernel7: 70.3881 s, 22.3 MB/s
 >> kernel8: 77.4475 s, 20.3 MB/s
 >>
 >> Results vary a little bit from run to run, but it is nothing
 >> spectacular, as weights of 1000 vs. 100 would suggest.
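Editor's sketch, not part of the original mail: a rough comparison of the share implied by the weights with the share actually measured. The function name share_report is an assumption. The comparison is only indicative, since blkio weights divide disk time rather than bytes and the per-VM rates are averages over runs of different length.

```shell
#!/bin/sh
# Compare the share implied by the cgroup weights with the measured share.
# Arguments: weight of group A, weight of group B,
#            summed throughput of group A, summed throughput of group B.
share_report() {
    awk -v wa="$1" -v wb="$2" -v ta="$3" -v tb="$4" 'BEGIN {
        printf "weight share: %.1f%%\n", 100 * wa / (wa + wb)
        printf "measured share: %.1f%%\n", 100 * ta / (ta + tb)
    }'
}

# kernel1 + kernel2  = 33.1 + 26.2 = 59.3 MB/s
# kernel3 .. kernel8 = 21.2 + 20.4 + 24.0 + 19.4 + 22.3 + 20.3 = 127.6 MB/s
share_report 1000 100 59.3 127.6
# prints:
#   weight share: 90.9%
#   measured share: 31.7%
```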
 >>
 >> So I went and tried to throttle I/O of kernel3-8 to 10MB/s instead of
 >> weighting I/O. First I rebooted everything so that no old cgroup
 >> configuration was left in place and then set up everything except the
 >> 100 and 1000 weight configuration.
 >>
 >> quote from blkio.txt:
 >> ------------
 >> - blkio.throttle.write_bps_device
 >>         - Specifies upper limit on WRITE rate to the device. IO rate is
 >>           specified in bytes per second. Rules are per device. Following is
 >>           the format.
 >>
 >>   echo "<major>:<minor>  <rate_bytes_per_second>" > /cgrp/blkio.write_bps_device
 >> -------------
 >>
 >> for vm in kernel1 kernel2 kernel3 kernel4 kernel5 kernel6 kernel7
 >> kernel8; do ls -lH /dev/vdisks/$vm; done
 >> brw-rw---- 1 root root 254, 23 Feb 18 13:45 /dev/vdisks/kernel1
 >> brw-rw---- 1 root root 254, 24 Feb 18 13:45 /dev/vdisks/kernel2
 >> brw-rw---- 1 root root 254, 25 Feb 18 13:45 /dev/vdisks/kernel3
 >> brw-rw---- 1 root root 254, 26 Feb 18 13:45 /dev/vdisks/kernel4
 >> brw-rw---- 1 root root 254, 27 Feb 18 13:45 /dev/vdisks/kernel5
 >> brw-rw---- 1 root root 254, 28 Feb 18 13:45 /dev/vdisks/kernel6
 >> brw-rw---- 1 root root 254, 29 Feb 18 13:45 /dev/vdisks/kernel7
 >> brw-rw---- 1 root root 254, 30 Feb 18 13:45 /dev/vdisks/kernel8
 >>
 >> /bin/echo 254:25 10000000 >
 >> /mnt/notimportant/blkio.throttle.write_bps_device
 >> /bin/echo 254:26 10000000 >
 >> /mnt/notimportant/blkio.throttle.write_bps_device
 >> /bin/echo 254:27 10000000 >
 >> /mnt/notimportant/blkio.throttle.write_bps_device
 >> /bin/echo 254:28 10000000 >
 >> /mnt/notimportant/blkio.throttle.write_bps_device
 >> /bin/echo 254:29 10000000 >
 >> /mnt/notimportant/blkio.throttle.write_bps_device
 >> /bin/echo 254:30 10000000 >
 >> /mnt/notimportant/blkio.throttle.write_bps_device
 >> /bin/echo 254:30 10000000 >
 >> /mnt/notimportant/blkio.throttle.write_bps_device
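Editor's sketch, not part of the original mail: the repeated echo commands above can be generated in a loop. The function name throttle_rules is an assumption. Note that 10000000 bytes per second is 10 MB/s in decimal units, or roughly 9.5 MiB/s.

```shell
#!/bin/sh
# Print one "<major>:<minor> <bytes_per_second>" rule per device.
# $1: major number, $2: first minor, $3: last minor, $4: rate in bytes/s
throttle_rules() {
    minor=$2
    while [ "$minor" -le "$3" ]; do
        echo "$1:$minor $4"
        minor=$((minor + 1))
    done
}

# Usage (as root; one write per rule, as in the commands above):
#   throttle_rules 254 25 30 10000000 | while read -r rule; do
#       echo "$rule" > /mnt/notimportant/blkio.throttle.write_bps_device
#   done
```

Generating the rules from the minor range avoids the copy-and-paste slip of writing the same device twice.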
 >>
 >> Then I ran the previous test again. This resulted in an ever increasing
 >> load (last I checked was ~ 300) on the host system. (This is perfectly
 >> reproducible).
 >>
 >> uptime
 >> Fri Feb 18 14:42:17 2011
 >> 14:42:17 up 12 min,  9 users,  load average: 286.51, 142.22, 56.71
 > 
 > Have you run top or something to figure out why the load average is
 > shooting up? I suspect that because of the throttling limit, IO threads
 > have been blocked and qemu is forking more IO threads. Can you just run
 > top/ps and figure out what's happening?
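Editor's sketch, not part of the original mail: one way to follow up on the top/ps suggestion is to count threads per state. On Linux the load average includes threads in uninterruptible sleep (state D), so a large D count would be consistent with blocked IO threads. The function name count_states is an assumption.

```shell
#!/bin/sh
# Count threads per scheduler state. Input: one state string per line, as
# printed by "ps -eLo state=". Only the first character is used
# (R = runnable, D = uninterruptible sleep, S = interruptible sleep, ...).
count_states() {
    cut -c1 | sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage on the host:
#   ps -eLo state= | count_states
# Thread count of a single qemu process (PID is a placeholder):
#   ls /proc/PID/task | wc -l
```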
 > 
 > Again, is it some kind of linear volume group from which you have carved
 > out logical volumes for each virtual machine?
 > 
 > For throttling, can we do a simple test first? That is, run a single
 > virtual machine, put some throttling limit on its logical volume and
 > try to do READs. Once READs work, let's test WRITEs and check why
 > the system load goes up.
 > 
 > Thanks
 > Vivek
 > 
 > --
 > libvir-list mailing list
 > libvir-list(a)redhat.com
 > https://www.redhat.com/mailman/listinfo/libvir-list
 > 

 -- 
 Regards
 Gui Jianfeng 
