Hi,
I have a VM that performs poorly.
For example, top needs several seconds to refresh its output on the console. The same goes for netstat.
The guest hosts a MySQL DB with a web frontend, and its response is slow too.
I'm looking for the culprit.
From top in the guest I get these hints:
There is enough free memory and the system is not swapping.
The system has 8 GB RAM and two CPUs.
Cpu0 is struggling with a lot of software interrupts (si), between 50% and 80%.
Cpu1 is often waiting for I/O (wa), between 0% and 20%.
No application is consuming much CPU time.
Here is an example:
top - 11:19:18 up 18:19, 11 users, load average: 1.44, 0.94, 0.66
Tasks: 95 total, 1 running, 94 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0%us, 0.0%sy, 0.0%ni, 20.0%id, 0.0%wa, 0.0%hi, 80.0%si, 0.0%st
Cpu1 : 1.9%us, 13.8%sy, 0.0%ni, 73.8%id, 10.5%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 7995216k total, 6385176k used, 1610040k free, 177772k buffers
Swap: 2104472k total, 0k used, 2104472k free, 5940884k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6470 root 16 0 12844 1464 804 S 12 0.0 2:17.13 screen
6022 root 15 0 41032 3052 2340 S 3 0.0 1:10.99 sshd
8322 root 0 -20 10460 4976 2268 S 3 0.1 19:20.38 atop
10806 root 16 0 5540 1216 880 R 0 0.0 0:00.51 top
126 root 15 0 0 0 0 S 0 0.0 0:23.33 pdflush
3531 postgres 15 0 68616 1600 792 S 0 0.0 0:41.24 postmaster
The host on which the guest runs has 96 GB RAM and 8 cores.
It does not seem to be doing much:
top - 11:21:19 up 15 days, 15:53, 14 users, load average: 1.40, 1.39, 1.40
Tasks: 221 total, 2 running, 219 sleeping, 0 stopped, 0 zombie
Cpu0 : 15.9%us, 2.7%sy, 0.0%ni, 81.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 5.0%us, 3.0%sy, 0.0%ni, 92.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 2.0%us, 0.3%sy, 0.0%ni, 97.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 0.3%us, 1.0%sy, 0.0%ni, 98.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 1.3%us, 0.3%sy, 0.0%ni, 98.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 0.3%us, 0.0%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 96738M total, 13466M used, 83272M free, 3M buffers
Swap: 2046M total, 0M used, 2046M free, 3887M cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21765 root 20 0 105m 15m 4244 S 5 0.0 0:00.15 crm
3180 root 20 0 8572m 8.0g 8392 S 3 8.4 62:25.73 qemu-kvm
8529 hacluste 10 -10 90820 14m 9400 S 0 0.0 29:52.48 cib
21329 root 20 0 9040 1364 940 R 0 0.0 0:00.16 top
28439 root 20 0 0 0 0 S 0 0.0 0:04.51 kworker/4:2
1 root 20 0 10560 828 692 S 0 0.0 0:07.67 init
2 root 20 0 0 0 0 S 0 0.0 0:00.28 kthreadd
3 root 20 0 0 0 0 S 0 0.0 3:03.23 ksoftirqd/0
6 root RT 0 0 0 0 S 0 0.0 0:05.02 migration/0
7 root RT 0 0 0 0 S 0 0.0 0:02.82 watchdog/0
8 root RT 0 0 0 0 S 0 0.0 0:05.18 migration/1
I think the host is not the problem.
The VM resides on a SAN that is attached via FC. The whole system is a two-node cluster.
The VM resides in a raw partition without a filesystem, which I have read should be good for
performance.
It is slow on the other node too. Inside the VM I have logical volumes
(it was a physical system that I migrated to a VM). The partitions are formatted with reiserfs
(the system is already some years old; at that time reiserfs was popular ...).
I use iostat on the guest:
This is a typical snapshot:
Device:  rrqm/s  wrqm/s   r/s   w/s  rkB/s  wkB/s  avgrq-sz  avgqu-sz  await  svctm  %util
vda        0.00    3.05  0.00  2.05   0.00  20.40     19.90      0.09  44.59  31.22   6.40
dm-0       0.00    0.00  0.00  4.55   0.00  18.20      8.00      0.24  52.31   7.74   3.52
dm-1       0.00    0.00  0.00  0.00   0.00   0.00      0.00      0.00   0.00   0.00   0.00
dm-2       0.00    0.00  0.00  0.10   0.00   0.40      8.00      0.01  92.00  56.00   0.56
dm-3       0.00    0.00  0.00  0.00   0.00   0.00      0.00      0.00   0.00   0.00   0.00
dm-4       0.00    0.00  0.00  0.00   0.00   0.00      0.00      0.00   0.00   0.00   0.00
dm-5       0.00    0.00  0.00  0.35   0.00   1.40      8.00      0.03  90.29  65.71   2.30
dm-6       0.00    0.00  0.00  0.00   0.00   0.00      0.00      0.00   0.00   0.00   0.00
vda has several partitions: one for /, one for swap, and two physical volumes for LVM.
According to "man iostat", the columns await and svctm seem to be the important ones. The man
page says:
await
The average time (in milliseconds) for I/O requests issued to the device to be served.
This includes the time spent by the requests in queue and the time spent servicing them.
svctm
The average service time (in milliseconds) for I/O requests that were issued to the
device.
It seems the system is waiting a long time for I/O, although the amount of data transferred is
small.
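To make the relationship between these columns concrete, here is a small Python sketch that redoes the arithmetic on the snapshot above. It is only an illustration: the function names are made up, and it assumes that svctm is derived from utilisation divided by request rate (as I understand sysstat's iostat to do), so that %util is roughly (r/s + w/s) * svctm / 10. The queue wait is approximated as await minus svctm, following the man page text.

```python
# Figures copied from the guest iostat snapshot (writes only; r/s was 0.00).
# Each entry: device -> (w/s, await in ms, svctm in ms, %util as reported)
SNAPSHOT = {
    "vda":  (2.05, 44.59, 31.22, 6.40),
    "dm-0": (4.55, 52.31, 7.74, 3.52),
    "dm-2": (0.10, 92.00, 56.00, 0.56),
    "dm-5": (0.35, 90.29, 65.71, 2.30),
}


def queue_wait_ms(await_ms, svctm_ms):
    """Approximate time a request sits in the queue before it is serviced."""
    return await_ms - svctm_ms


def util_percent(requests_per_s, svctm_ms):
    """Utilisation implied by request rate and service time.

    requests_per_s * svctm_ms is the busy time in ms per 1000 ms of wall
    clock, so dividing by 10 turns it into a percentage.
    """
    return requests_per_s * svctm_ms / 10.0


def summarize(snapshot):
    """Return {device: (queue wait in ms, implied %util)}, rounded to 2 digits."""
    rows = {}
    for dev, (iops, await_ms, svctm_ms, _reported_util) in snapshot.items():
        rows[dev] = (
            round(queue_wait_ms(await_ms, svctm_ms), 2),
            round(util_percent(iops, svctm_ms), 2),
        )
    return rows


if __name__ == "__main__":
    for dev, (wait, util) in summarize(SNAPSHOT).items():
        print(f"{dev}: queue wait ~{wait} ms, implied util ~{util} %")
```

For vda this gives about 13.4 ms of queueing on top of the 31.2 ms service time, and the implied utilisation matches the reported 6.40%. So under these assumptions the devices are mostly idle, but each individual request is slow.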
I have some suspicions:
- the LVM setup in the guest
- some hardware
- the cache mode for the disk is "none"; otherwise I can't do a live migration.
What do you think? How can I find out where the high si comes from?
Network and disk are virtio devices (which should be fast):
vm58820-4:~ # lsmod|grep -i virt
virtio_balloon 22788 0
virtio_net 30464 0
virtio_pci 27264 0
virtio_ring 21376 1 virtio_pci
virtio_blk 25224 5
virtio 22916 4 virtio_balloon,virtio_net,virtio_pci,virtio_blk
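One way to narrow down where the si load comes from might be to compare two copies of /proc/interrupts taken a few seconds apart and see which interrupt source (for example one of the virtio devices) grows fastest on Cpu0. Softirqs are usually raised from hardware interrupt handlers, so this is only an indirect hint, not a direct measurement. The sketch below is Python 3 and works on plain text, so the two snapshots can be copied off the guest and analysed elsewhere if the SLES 10 guest has no suitable Python. All function names are made up for illustration. Newer kernels also have /proc/softirqs, which the old guest kernel may not provide.

```python
import time


def parse_interrupts(text):
    """Parse /proc/interrupts text into {source name: [count per CPU]}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpu = len(lines[0].split())          # header line: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        # Leading numeric fields are the per-CPU counters; some lines
        # (e.g. ERR) carry fewer counters than there are CPUs.
        while fields and len(counts) < ncpu and fields[0].isdigit():
            counts.append(int(fields.pop(0)))
        name = label.strip()
        if fields:                        # remaining fields: controller, device
            name += " " + " ".join(fields)
        table[name] = counts
    return table


def busiest_sources(before, after, top=5):
    """Return the `top` sources with the largest increase between snapshots.

    Each result is (total delta, source name, [delta per CPU]).
    """
    old, new = parse_interrupts(before), parse_interrupts(after)
    deltas = []
    for name, counts in new.items():
        prev = old.get(name, [0] * len(counts))
        per_cpu = [n - p for n, p in zip(counts, prev)]
        deltas.append((sum(per_cpu), name, per_cpu))
    deltas.sort(reverse=True)
    return deltas[:top]


def sample_live(interval=5.0, path="/proc/interrupts"):
    """Take two snapshots `interval` seconds apart on the local machine."""
    with open(path) as fh:
        before = fh.read()
    time.sleep(interval)
    with open(path) as fh:
        after = fh.read()
    return busiest_sources(before, after)
```

Calling sample_live() inside the guest while it is slow, and looking at the per-CPU deltas of the top entries, should show whether disk, network, or timer interrupts dominate on Cpu0.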
That's the config of the guest:
<domain type='kvm'>
  <name>mausdb_vm</name>
  <uuid>f08c2f32-fe35-137a-0e9d-fa7485d57974</uuid>
  <memory unit='KiB'>8198144</memory>
  <currentMemory unit='KiB'>8197376</currentMemory>
  <vcpu placement='static'>2</vcpu>
  <os>
    <type arch='x86_64' machine='pc-i440fx-1.4'>hvm</type>
    <boot dev='cdrom'/>
    <bootmenu enable='yes'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-kvm</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source dev='/dev/vg_cluster_01/lv_cluster_01'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <controller type='usb' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:37:92:01'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' listen='127.0.0.1'>
      <listen type='address' address='127.0.0.1'/>
    </graphics>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='none'/>
</domain>
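To check the disk-related settings of this and other guests without reading the whole XML each time, something like the following Python 3 sketch could be used on the host. It only relies on the standard library. The function name disk_settings is made up, and it assumes the XML comes from something like "virsh dumpxml mausdb_vm". An attribute that is missing from the XML (here the io attribute of the driver element) is returned as None, which I take to mean that the hypervisor default applies.

```python
import xml.etree.ElementTree as ET


def disk_settings(domain_xml):
    """Return one dict per <disk> with the settings relevant to I/O tuning."""
    root = ET.fromstring(domain_xml)
    disks = []
    for disk in root.findall("./devices/disk"):
        driver = disk.find("driver")
        target = disk.find("target")
        source = disk.find("source")

        def attr(elem, name):
            # Missing element or attribute -> None (setting not given in the XML)
            return elem.get(name) if elem is not None else None

        disks.append({
            "target": attr(target, "dev"),
            "bus": attr(target, "bus"),
            "format": attr(driver, "type"),
            "cache": attr(driver, "cache"),
            "io": attr(driver, "io"),
            # block devices use dev=, image files use file=
            "source": attr(source, "dev") or attr(source, "file"),
        })
    return disks


if __name__ == "__main__":
    import sys
    # Example: virsh dumpxml mausdb_vm | python3 this_script.py
    for d in disk_settings(sys.stdin.read()):
        print(d)
```

For the config above this should report vda on the virtio bus, raw format, cache mode none, no explicit io setting, and /dev/vg_cluster_01/lv_cluster_01 as the source.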
Host OS is SLES 11 SP4, guest OS is SLES 10 SP4. Both are 64-bit.
Thanks for any hint.
Bernd
--
Bernd Lentes
Systemadministration
institute of developmental genetics
Gebäude 35.34 - Raum 208
HelmholtzZentrum München
bernd.lentes(a)helmholtz-muenchen.de
phone: +49 (0)89 3187 1241
fax: +49 (0)89 3187 2294
Only when you commit to something can you be wrong
Scott Adams