I’m running libvirt 0.10.2 and qemu-kvm-1.2.0, both compiled from source, on CentOS 6.  I’ve got a working blkio cgroup hierarchy, and I’m attaching guests to it with the following guest XML configs:

 

VM1 (foreground):

  <cputune>
    <shares>2048</shares>
  </cputune>
  <blkiotune>
    <weight>1000</weight>
  </blkiotune>

VM2 (background):

  <cputune>
    <shares>2</shares>
  </cputune>
  <blkiotune>
    <weight>100</weight>
  </blkiotune>

I’ve tested write throughput on the host using cgexec and dd, demonstrating that libvirt has correctly set up the cgroups:

 

cgexec -g blkio:libvirt/qemu/foreground time dd if=/dev/zero of=trash1.img oflag=direct bs=1M count=4096 &
cgexec -g blkio:libvirt/qemu/background time dd if=/dev/zero of=trash2.img oflag=direct bs=1M count=4096 &
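You can also read the weights libvirt wrote directly from the cgroup filesystem. This is a minimal sketch. It assumes the blkio hierarchy is mounted at /cgroup/blkio (the cgconfig default on CentOS 6; adjust if yours differs) and that the group paths match the ones used with cgexec. `read_blkio_weight` is a hypothetical helper, not part of libvirt or libcgroup.

```python
import os

# Assumption: blkio hierarchy mounted here (cgconfig default on CentOS 6).
BLKIO_ROOT = "/cgroup/blkio"


def read_blkio_weight(group, root=BLKIO_ROOT):
    """Return the integer blkio.weight of a cgroup such as 'libvirt/qemu/foreground'."""
    path = os.path.join(root, group, "blkio.weight")
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    for name in ("foreground", "background"):
        group = "libvirt/qemu/" + name
        try:
            print(group, read_blkio_weight(group))
        except OSError as exc:
            # Mount point or group name differs from the assumption above.
            print(group, "unreadable:", exc)
```

If libvirt applied the <blkiotune> settings, this should report 1000 for foreground and 100 for background.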

 

Snapshot from iotop, showing roughly an 8:1 ratio (80.81 / 10.71 ≈ 7.5; the weights call for 10:1, but this is close enough):

 

Total DISK READ: 0.00 B/s | Total DISK WRITE: 91.52 M/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
9602 be/4 root        0.00 B/s   10.71 M/s  0.00 % 98.54 % dd if=/dev/zero of=trash2.img oflag=direct bs=1M count=4096
9601 be/4 root        0.00 B/s   80.81 M/s  0.00 % 97.76 % dd if=/dev/zero of=trash1.img oflag=direct bs=1M count=4096

 

Further, checking the task list inside each cgroup shows the guest’s main PID, plus those of the virtio kernel threads.  It’s hard to tell if all the virtio kernel threads are listed, but all the ones I’ve hunted down appear to be there.
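The thread check can be made less manual by diffing the qemu process's thread list against the cgroup's tasks file. This is a minimal sketch. It assumes the same /cgroup/blkio mount point, and the helper names are hypothetical. It only covers threads of the qemu process itself (those under /proc/<pid>/task). vhost-net kernel threads are separate tasks and would have to be looked up by name.

```python
import os

BLKIO_ROOT = "/cgroup/blkio"  # assumed mount point of the blkio hierarchy


def cgroup_tids(group, root=BLKIO_ROOT):
    """Set of thread IDs listed in a cgroup's 'tasks' file."""
    with open(os.path.join(root, group, "tasks")) as f:
        return {int(line) for line in f if line.strip()}


def process_tids(pid, proc="/proc"):
    """Set of thread IDs belonging to a process, taken from /proc/<pid>/task."""
    return {int(t) for t in os.listdir(os.path.join(proc, str(pid), "task"))}


def missing_threads(pid, group, root=BLKIO_ROOT, proc="/proc"):
    """Threads of `pid` that are NOT attached to `group` (ideally an empty set)."""
    return process_tids(pid, proc) - cgroup_tids(group, root)
```

For the background guest in the ps output (PID 5110), `missing_threads(5110, "libvirt/qemu/background")` should return an empty set. `len(process_tids(5110))` also gives a quick count of how many threads qemu has spawned.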

 

However, when I run the same dd commands inside the guests, I get roughly equal throughput, nowhere near the ~8:1 split I get on the host. Both runs were started within 1 s of each other, and the background run was stopped with Ctrl-C right after the foreground run finished:

 

[ben@foreground ~]$ dd if=/dev/zero of=trash1.img oflag=direct bs=1M count=4096
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 104.645 s, 41.0 MB/s

 

[ben@background ~]$ dd if=/dev/zero of=trash2.img oflag=direct bs=1M count=4096
^C4052+0 records in
4052+0 records out
4248829952 bytes (4.2 GB) copied, 106.318 s, 40.0 MB/s
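The comparison is easier to read if each result is expressed as the foreground writer's share of the combined throughput. This is a rough sketch that treats the blkio weights as if they mapped straight onto bandwidth. That is only an approximation, because CFQ divides disk time between groups, not bytes.

```python
def share(fg, bg):
    """Fraction of the combined throughput obtained by the foreground writer."""
    return fg / (fg + bg)


weights = share(1000, 100)   # what blkio.weight 1000 vs 100 asks for
host = share(80.81, 10.71)   # iotop figures (MB/s) from the host-side cgexec test
guest = share(41.0, 40.0)    # dd figures (MB/s) from inside the two guests

print(f"weights: {weights:.1%}  host: {host:.1%}  guest: {guest:.1%}")
```

This prints `weights: 90.9%  host: 88.3%  guest: 50.6%`. The host-side split is within a few points of the target, while the guests split the disk almost evenly.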

 

I thought this problem might be caused by host-side buffering, based on this statement in the RHEL 6 Resource Management Guide (https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/ch-Subsystems_and_Tunable_Parameters.html): “Currently, the Block I/O subsystem does not work for buffered write operations. It is primarily targeted at direct I/O, although it works for buffered read operations.” However, I have explicitly disabled host caching in my guest configs:

 

  <devices>
    <emulator>/usr/bin/qemu-kvm</emulator>
    <disk type="file" device="disk">
      <driver name="qemu" type="raw" cache="none"/>
      <source file="/path/to/disk.img"/>
      <target dev="vda" bus="virtio"/>
      <alias name="virtio-disk0"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x04" function="0x0"/>
    </disk>

 

Here is the qemu-kvm command line from ps, showing that cache=none is being passed through from the guest XML config:

 

root      5110 20.8  4.3 4491352 349312 ?      Sl   11:58   0:38 /usr/bin/qemu-kvm -name background -S -M pc-1.2 -enable-kvm -m 2048 -smp 2,sockets=2,cores=1,threads=1 -uuid ea632741-c7be-36ab-bd69-da3cbe505b38 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/background.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/path/to/disk.img,if=none,id=drive-virtio-disk0,format=raw,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=22 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:11:22:33:44:55,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:1 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

 

For fun, I tried a few other cache options, including writethrough and directsync, to try to force a bypass of the host buffer cache. However, the number of virtio kernel threads appeared to explode (especially with directsync) and throughput dropped sharply, to ~50% of “none” for writethrough and ~5% for directsync.

 

With cache=none, I still see the host’s buffer cache grow when I generate write loads inside the VMs.  Further, if I use non-direct I/O inside the VMs and inflate the balloon (forcing the guest’s buffer cache to flush), I don’t see a corresponding drop in background throughput.  Is it possible that the cache="none" directive is not being respected?
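One way to test whether cache="none" is honored is to check whether qemu's open descriptor on the image carries O_DIRECT. The kernel reports open flags in octal on the `flags:` line of /proc/<pid>/fdinfo/<fd>. This is a minimal sketch, and the helper names are hypothetical. It needs to run as root to inspect the qemu process, and `os.O_DIRECT` is only defined on Linux.

```python
import os


def fd_has_o_direct(fdinfo_text):
    """True if the 'flags:' line of a /proc/<pid>/fdinfo/<fd> dump includes O_DIRECT."""
    for line in fdinfo_text.splitlines():
        if line.startswith("flags:"):
            flags = int(line.split()[1], 8)  # the kernel prints the flags in octal
            return bool(flags & os.O_DIRECT)
    raise ValueError("no 'flags:' line in fdinfo text")


def image_fds(pid, image_path):
    """Yield (fd, has_o_direct) for every descriptor of `pid` open on `image_path`."""
    target = os.path.realpath(image_path)  # /proc/<pid>/fd links are fully resolved
    fd_dir = f"/proc/{pid}/fd"
    for fd in os.listdir(fd_dir):
        try:
            if os.readlink(os.path.join(fd_dir, fd)) != target:
                continue
            with open(f"/proc/{pid}/fdinfo/{fd}") as f:
                info = f.read()
        except OSError:
            continue  # descriptor was closed while we were looking
        yield int(fd), fd_has_o_direct(info)
```

For example, `list(image_fds(5110, "/path/to/disk.img"))`. If cache=none is in effect, every entry should report True.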

 

Since cgroups work for host-side processes, I think my blkio subsystem is set up correctly (CFQ scheduler, group_isolation=1, etc.).  Could I have compiled qemu without some needed direct I/O support?  Has anyone seen this before?
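You can confirm the scheduler settings from sysfs, and the check should target the device that actually holds /path/to/disk.img. This is a minimal sketch with hypothetical helper names. It assumes the usual /sys/block/<dev>/queue layout, where the active scheduler is shown in brackets and CFQ's tunables live under iosched/.

```python
import os


def active_scheduler(text):
    """Return the bracketed (active) entry of a /sys/block/<dev>/queue/scheduler line."""
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            return name[1:-1]
    return None


def cfq_settings(dev, sysfs="/sys/block"):
    """Report the active scheduler for `dev` and, if present, CFQ's group_isolation."""
    queue = os.path.join(sysfs, dev, "queue")
    with open(os.path.join(queue, "scheduler")) as f:
        result = {"scheduler": active_scheduler(f.read())}
    iso = os.path.join(queue, "iosched", "group_isolation")
    if os.path.exists(iso):  # only present when CFQ is the active scheduler
        with open(iso) as f:
            result["group_isolation"] = int(f.read().strip())
    return result
```

For a device named sdb (a placeholder), `cfq_settings("sdb")` should return `{"scheduler": "cfq", "group_isolation": 1}` if it matches the setup described above.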

 

Ben Clay

rbclay@ncsu.edu