
On Mon, Feb 21, 2011 at 03:36:14PM +0800, Gui Jianfeng wrote:
Dominik,
Would you try "oflag=direct" when you do the tests in the guests? And make sure /sys/block/xxx/queue/iosched/group_isolation is set to 1.
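A minimal sketch of that suggestion; the disk name sdX below is a placeholder for the backing physical disk, not something taken from this thread:

  # inside each guest: repeat the write test with the page cache bypassed
  dd if=/dev/zero of=testfile bs=1M count=1500 oflag=direct

  # on the host: enable group isolation for CFQ on the backing disk
  echo 1 > /sys/block/sdX/queue/iosched/group_isolation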
oflag=direct in the guest might be good for testing and understanding the problem, but in practice we will not have control over what a user runs inside the guest. The only control we will have is to use cache=none for the guest and then control any traffic coming out of the guest. Thanks, Vivek
I guess with such a setting, your tests should go well.
Thanks, Gui
Vivek Goyal wrote:
On Fri, Feb 18, 2011 at 03:42:45PM +0100, Dominik Klein wrote:
Hi Vivek
I don't know whether you follow the libvirt list; I assume you don't. So I thought I'd forward you an e-mail involving the blkio controller and a terrible situation arising from using it (maybe in a wrong way).
I'd truly appreciate it if you read it and commented on it. Maybe I did something wrong, but maybe I also found a bug in some way.
Hi Dominik,
Thanks for forwarding me this mail. Yes, I am not on libvir-list. I have just now subscribed.
Few questions inline.
-------- Original Message -------- Subject: Re: [libvirt] [PATCH 0/6 v3] Add blkio cgroup support Date: Fri, 18 Feb 2011 14:42:51 +0100 From: Dominik Klein <dk@in-telegence.net> To: libvir-list@redhat.com
Hi
back with some testing results.
How about starting the guest with the option "cache=none" to bypass the pagecache? This should help, I think. I will read up on where to set that and give it a try. Thanks for the hint. So here's what I did and found out:
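A rough sketch of where that is set, assuming the VMs are managed through libvirt (kernel1 is one of the VM names used below):

  virsh edit kernel1
  # then make sure each disk's <driver> element carries cache='none', e.g.:
  #   <driver name='qemu' type='raw' cache='none'/>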
The host system has two 12-core CPUs and 128 GB of RAM.
I have 8 test VMs named kernel1 to kernel8. Each VM has 4 VCPUs, 2 GB of RAM and one disk, which is an LV on the host. Cache mode is "none":
So you have only one root SATA disk and set up a linear logical volume on that? If not, can you give more info about the storage configuration?
- I am assuming you are using CFQ on your underlying physical disk.
- What kernel version are you testing with?
- Cache=none mode is good; it should make all the IO O_DIRECT on the host and it should show up as SYNC IO on CFQ without losing io context info. The only problem is the intermediate dm layer and whether it is changing the io context somehow. I am not sure at this point.
- Is it possible to capture a 10-15 second blktrace on your underlying physical device? That should give me some idea of what's happening. (A possible invocation is sketched right after this list.)
- Can you also try setting /sys/block/<disk>/queue/iosched/group_isolation=1 on your underlying physical device where CFQ is running and see if it makes any difference.
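A possible way to capture such a trace, assuming blktrace/blkparse are installed and sdX stands for the backing physical disk:

  # confirm which scheduler the backing disk uses (cfq expected)
  cat /sys/block/sdX/queue/scheduler

  # capture ~15 seconds of block-layer events while the dd tests run, then decode them
  blktrace -d /dev/sdX -w 15 -o kvmtrace
  blkparse -i kvmtrace -o kvmtrace.txt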
for vm in kernel1 kernel2 kernel3 kernel4 kernel5 kernel6 kernel7 kernel8; do
  virsh dumpxml $vm | grep cache
done

<driver name='qemu' type='raw' cache='none'/>
<driver name='qemu' type='raw' cache='none'/>
<driver name='qemu' type='raw' cache='none'/>
<driver name='qemu' type='raw' cache='none'/>
<driver name='qemu' type='raw' cache='none'/>
<driver name='qemu' type='raw' cache='none'/>
<driver name='qemu' type='raw' cache='none'/>
<driver name='qemu' type='raw' cache='none'/>
My goal is to give more I/O time to kernel1 and kernel2 than to the rest of the VMs.
mount -t cgroup -o blkio none /mnt
cd /mnt
mkdir important
mkdir notimportant
echo 1000 > important/blkio.weight
echo 100 > notimportant/blkio.weight

for vm in kernel3 kernel4 kernel5 kernel6 kernel7 kernel8; do
  cd /proc/$(pgrep -f "qemu-kvm.*$vm")/task
  for task in *; do
    /bin/echo $task > /mnt/notimportant/tasks
  done
done
for vm in kernel1 kernel2; do
  cd /proc/$(pgrep -f "qemu-kvm.*$vm")/task
  for task in *; do
    /bin/echo $task > /mnt/important/tasks
  done
done
Then I used cssh to connect to all 8 VMs and execute dd if=/dev/zero of=testfile bs=1M count=1500 in all VMs simultaneously.
Results are:
kernel1: 47.5593 s, 33.1 MB/s
kernel2: 60.1464 s, 26.2 MB/s
kernel3: 74.204 s, 21.2 MB/s
kernel4: 77.0759 s, 20.4 MB/s
kernel5: 65.6309 s, 24.0 MB/s
kernel6: 81.1402 s, 19.4 MB/s
kernel7: 70.3881 s, 22.3 MB/s
kernel8: 77.4475 s, 20.3 MB/s
Results vary a little bit from run to run, but the difference is nothing like as pronounced as weights of 1000 vs. 100 would suggest.
So I went and tried to throttle the I/O of kernel3-8 to 10 MB/s instead of weighting it. First I rebooted everything so that no old cgroup configuration was left in place, and then set up everything except the 100 and 1000 weight configuration.
quote from blkio.txt:
------------
- blkio.throttle.write_bps_device
  - Specifies upper limit on WRITE rate to the device. IO rate is specified in bytes per second. Rules are per device. Following is the format.

  echo "<major>:<minor> <rate_bytes_per_second>" > /cgrp/blkio.throttle.write_bps_device
-------------
for vm in kernel1 kernel2 kernel3 kernel4 kernel5 kernel6 kernel7 kernel8; do
  ls -lH /dev/vdisks/$vm
done

brw-rw---- 1 root root 254, 23 Feb 18 13:45 /dev/vdisks/kernel1
brw-rw---- 1 root root 254, 24 Feb 18 13:45 /dev/vdisks/kernel2
brw-rw---- 1 root root 254, 25 Feb 18 13:45 /dev/vdisks/kernel3
brw-rw---- 1 root root 254, 26 Feb 18 13:45 /dev/vdisks/kernel4
brw-rw---- 1 root root 254, 27 Feb 18 13:45 /dev/vdisks/kernel5
brw-rw---- 1 root root 254, 28 Feb 18 13:45 /dev/vdisks/kernel6
brw-rw---- 1 root root 254, 29 Feb 18 13:45 /dev/vdisks/kernel7
brw-rw---- 1 root root 254, 30 Feb 18 13:45 /dev/vdisks/kernel8
/bin/echo 254:25 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
/bin/echo 254:26 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
/bin/echo 254:27 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
/bin/echo 254:28 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
/bin/echo 254:29 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
/bin/echo 254:30 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
Then I ran the previous test again. This resulted in an ever-increasing load on the host system (the last value I checked was ~300). (This is perfectly reproducible.)
uptime
Fri Feb 18 14:42:17 2011
 14:42:17 up 12 min,  9 users,  load average: 286.51, 142.22, 56.71
Have you run top or something to figure out why the load average is shooting up? I suspect that because of the throttling limit, IO threads have been blocked and qemu is forking more IO threads. Can you just run top/ps and figure out what's happening?
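One way to check this (just a suggestion, reusing the pgrep pattern from the commands above):

  # overall picture of what is running/blocked
  top -b -n 1 | head -20

  # number of threads per qemu-kvm process
  for vm in kernel1 kernel2 kernel3 kernel4 kernel5 kernel6 kernel7 kernel8; do
    echo -n "$vm: "
    ls /proc/$(pgrep -f "qemu-kvm.*$vm")/task | wc -l
  done

  # processes stuck in uninterruptible sleep (D state)
  ps -eo state,pid,cmd | awk '$1 == "D"'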
Again, is it some kind of linear volume group from which you have carved out logical volumes for each virtual machine?
For throttling, can we do a simple test first? That is, run a single virtual machine, put a throttling limit on its logical volume, and try to do READs. Once READs work, let's test WRITES and check why the system load goes up.
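A rough sequence for that test, reusing the device numbers and cgroup mount from the commands above; the cgroup name "throttletest" is made up for the example:

  # host: fresh cgroup holding only kernel1 (254:23 per the ls output above), READ limit ~10 MB/s
  mkdir /mnt/throttletest
  echo "254:23 10000000" > /mnt/throttletest/blkio.throttle.read_bps_device
  for task in $(ls /proc/$(pgrep -f "qemu-kvm.*kernel1")/task); do
    /bin/echo $task > /mnt/throttletest/tasks
  done

  # inside the guest: direct-IO read of the file written earlier
  dd if=testfile of=/dev/null bs=1M iflag=direct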
Thanks Vivek
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
-- Regards Gui Jianfeng