[libvirt-users] Performance tuning questions for mail server

Hi, I have a fedora15 x86_64 host with one fedora15 guest running amavis+spamassassin+postfix, and performance is horrible. The host is a quad-core E3-1240 with 16GB of RAM and three 1TB Seagate ST31000524NS disks, and all partitions are ext4. I've allocated 4 processors and 8GB of RAM to this guest.

I was really hoping someone could help me identify areas in which performance can be improved, at both the guest and the host. Load on the server is regularly above 20, yet the processors are generally idle and the host is still responsive. Should the performance tuning concentrate on the guest or the host?

I've read through the information at http://libvirt.org/formatdomain.html#elementsMemoryTuning but I don't know how the settings apply to my configuration or which ones apply to my hardware. I've included my libvirt xml config below. It was built using virt-manager on fedora15. There appear to be quite a few other options available that virt-manager doesn't expose which I would like to be able to use.

I've done a little kernel tuning on the host, although it doesn't appear to have made much difference. Is there a set of kernel parameter values that would be advisable for a mail server?
Here is what I currently have:

# sysctl -p
net.ipv4.ip_forward = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0
net.ipv4.ip_forward = 1
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.conf.all.log_martians = 0
net.ipv4.conf.default.log_martians = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
vm.vfs_cache_pressure = 35
vm.nr_hugepages = 512
net.ipv4.tcp_max_syn_backlog = 2048
fs.aio-max-nr = 1048576
vm.dirty_background_ratio = 3
vm.dirty_ratio = 40

After making changes, do you have any recommendations on which tools to use to monitor those changes and see how they perform? I have noatime set in fstab in the guest for the /var partition, where much of the spamassassin activity occurs. I've included below my libvirt xml config for the guest, and hoped someone could make some recommendations on how to apply the cputune and memtune parameters to my system.
<domain type='kvm'>
  <name>mail02</name>
  <uuid>ec4f3cf5-2f27-fb3e-72f6-3fa3176b13b6</uuid>
  <memory>8388608</memory>
  <currentMemory>8388608</currentMemory>
  <vcpu>8</vcpu>
  <os>
    <type arch='x86_64' machine='pc-0.14'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw'/>
      <source file='/var/lib/libvirt/images/mail02.img'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <disk type='block' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <address type='drive' controller='0' bus='1' unit='0'/>
    </disk>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:67:2c:4c'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes'/>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
  </devices>
</domain>

Thanks,
Alex
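Since the original question is how the cputune and memtune parameters map onto a configuration like this, here is a minimal sketch of where those elements sit in the domain XML. All values below (the vCPU count, pinning layout, shares, and limits) are illustrative placeholders to adapt, not recommendations; the limits are in KiB:

```xml
<domain type='kvm'>
  ...
  <vcpu>4</vcpu>
  <cputune>
    <!-- Example: pin each vCPU to one host core on a quad-core host. -->
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='1'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='3'/>
    <!-- Relative CPU weight against other guests. -->
    <shares>2048</shares>
  </cputune>
  <memtune>
    <!-- hard_limit should stay above guest RAM plus QEMU overhead. -->
    <hard_limit>9437184</hard_limit>
    <soft_limit>8388608</soft_limit>
  </memtune>
  ...
</domain>
```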

On Wed, Oct 05, 2011 at 02:28:30PM -0400, Alex wrote:
Load on the server is regularly above 20, yet the processors generally are idle and the host is still responsive.
That's completely normal for an email server running spamassassin, in my experience, and has nothing to do with libvirt. IME, the issue is DNS lookups, which spamassassin and the RBLs it uses make *heavy* use of. Run a caching DNS on localhost with fairly aggressive caching (i.e. go ahead and have it ignore TTLs and keep everything for at least 5 minutes) and I expect that you'll find that problem being rapidly resolved. -Robin
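One way to get the TTL-floor behavior described above is with Unbound, which exposes it directly via cache-min-ttl (verify option names against your version's documentation; BIND's equivalents are more limited). A minimal sketch of an unbound.conf fragment:

```
server:
    interface: 127.0.0.1
    access-control: 127.0.0.0/8 allow
    # Floor cached TTLs at 5 minutes, per the suggestion above.
    # Serving records past their real TTL bends the DNS contract,
    # so keep the floor modest.
    cache-min-ttl: 300
```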

Hi,
Load on the server is regularly above 20, yet the processors generally are idle and the host is still responsive.
That's completely normal for an email server running spamassassin, in my experience, and has nothing to do with libvirt. IME, the issue is DNS lookups, which spamassassin and the RBLs it uses make *heavy* use of. Run a caching DNS on localhost with fairly aggressive caching (i.e. go ahead and have it ignore TTLs and keep everything for at least 5 minutes) and I expect that you'll find that problem being rapidly resolved.
Thanks for your help, but as much as I'd like it to be a simple DNS issue, I really don't think that's the case. I have a caching nameserver running, and while there sure is a lot of DNS traffic, it's got to be more than that. This mail server does manage a lot of mail per day, but not enough to even consume the 8GB I've allocated, and the "mailq" command typically takes a few seconds to respond, even when there's only a few messages in the queue. iotop may report as much as 2M/s on the host, with an average of about 400-600K/s. Does that seem like a lot? I can write like 80MB/s at least using dd to test. My configuration does not use any of the optional CPU or memory parameters from libvirt, so I'm pretty sure there is something more that can be done to improve performance, but I just don't know how. Thanks for any ideas. Alex

On 10/6/11, Alex <mysqlstudent@gmail.com> wrote:
This mail server does manage a lot of mail per day, but not enough to even consume the 8GB I've allocated, and the "mailq" command typically takes a few seconds to respond, even when there's only a few messages in the queue.
iotop may report as much as 2M/s on the host, with an average of about 400-600K/s. Does that seem like a lot? I can write like 80MB/s at least using dd to test.
I had similar problems previously. The crux in my case was the number of IOPS possible. A thousand 2KB file writes per second is still only 2MB/s, but requires far more IO overhead than a single 2MB file. Recommendations that I was given which stopped the system from locking up included:
1. use ionice on the mail processes
2. change the IO scheduler/elevator from the default CFQ
3. switch from file-based virtual drives to raw disk devices
4. mount filesystems with noatime
Also, are you using RAID 5?
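The throughput-versus-IOPS point can be made concrete with a little arithmetic. A quick sketch in Python (the per-spindle IOPS figure is an assumed ballpark for a 7200 rpm SATA disk, not a measurement):

```python
# Two workloads with identical throughput but very different IOPS
# demands: many small mail-queue files vs. one sequential write.
KB = 1024
MB = 1024 * KB

small_file = 2 * KB      # typical queued message size
throughput = 2 * MB      # what iotop reports, bytes/s

# Random workload: each small file costs at least one write IOP
# (often more, once metadata and journal updates are counted).
random_iops = throughput / small_file
print(f"{random_iops:.0f} write IOPS to sustain 2 MB/s in 2 KB files")

# The same 2 MB written sequentially is a handful of large requests,
# well within a single disk's ability. A 7200 rpm spindle manages
# very roughly 100 random IOPS.
disk_random_iops = 100   # assumed per-spindle figure
spindles_needed = random_iops / disk_random_iops
print(f"~{spindles_needed:.0f}x a single disk's random write capacity")
```

So a modest-looking 2MB/s in iotop can already be an order of magnitude beyond what one spindle can do randomly, which is why the dd sequential figure of 80MB/s is not a useful comparison.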

Hi,
This mail server does manage a lot of mail per day, but not enough to even consume the 8GB I've allocated, and the "mailq" command typically takes a few seconds to respond, even when there's only a few messages in the queue.
iotop may report as much as 2M/s on the host, with an average of about 400-600K/s. Does that seem like a lot? I can write like 80MB/s at least using dd to test.
I had similar problems previously. The crux in my case was the number of IOPS possible. A thousand 2KB file writes per second is still only 2MB/s, but requires far more IO overhead than a single 2MB file.
Recommendations that I was given which stopped the system from locking up included:
1. use ionice on the mail processes
2. change the IO scheduler/elevator from the default CFQ
3. switch from file-based virtual drives to raw disk devices
4. mount filesystems with noatime
Also, are you using RAID 5?
The only thing I haven't done from the above is use ionice on the mail processes. I'm using RAID5 across three 1TB SATA3 disks, I'm using the deadline scheduler, the /var partition is mounted noatime, and the disk is mounted raw:

<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/var/lib/libvirt/images/mail02.img'/>
  <target dev='vda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>

How do I configure BIND to ignore TTLs and keep everything for five minutes, as suggested?

Thanks,
Alex

On 10/7/11, Alex <mysqlstudent@gmail.com> wrote:
The only thing I haven't done from above is to use ionice on mail processes. I'm using RAID5 across three 1TB SATA3 disks,
RAID 5 is another major bottleneck there. The commonly cited write penalty for RAID 5 is 4 IOPS per write, while RAID 1/10 is 2. Given the typical load on an email server, with more writes than reads, this becomes rather punishing. So you might see a major improvement simply by adding another 1TB drive and going to RAID 10. Personally, because of that, as well as the headache of recovering from a RAID 5 disaster, I've stopped using RAID 5, especially since drives are so cheap now.
I'm using deadline scheduler, the /var partition is mounted noatime, and the disk is mounted raw:
<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/var/lib/libvirt/images/mail02.img'/>
  <target dev='vda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>

I meant raw disk devices rather than files, e.g.

<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source dev='/dev/server02_vg1_2tb/lv_vm03_swap'/>
  <target dev='vdb' bus='virtio'/>
</disk>

This eliminates one layer of filesystem overhead.
How do I configure BIND to ignore TTLs and keep everything for five minutes, as suggested?
I'm not too sure about this, sorry.

Hi,
The only thing I haven't done from above is to use ionice on mail processes. I'm using RAID5 across three 1TB SATA3 disks,
RAID 5 is another major bottleneck there. The commonly cited write penalty for RAID 5 is 4 IOPS per write, while RAID 1/10 is 2. Given the typical load on an email server, with more writes than reads, this becomes rather punishing.
So you might see a major improvement simply by adding another 1TB drive and going to RAID 10.
I thought RAID10 still involved RAID1 on all disks, so really the only improvement would be the lack of the parity write, correct? The wikipedia entry seems to indicate it's not all that much faster: http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10 Am I just expecting more from kvm/qemu than is realistic at this point? Are there no high-volume mail servers that use libvirt?
<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/var/lib/libvirt/images/mail02.img'/>
  <target dev='vda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>

I meant raw disk devices rather than files, e.g.

<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source dev='/dev/server02_vg1_2tb/lv_vm03_swap'/>
  <target dev='vdb' bus='virtio'/>
</disk>

This eliminates one layer of filesystem overhead.
Can this type change be done without modifying the image, or must some conversion be done prior to making this change? I also found this cool doc on monitoring and improving performance: http://www.ufsdump.org/papers/io-tuning.pdf Thanks again, Alex

On Sun, Oct 09, 2011 at 03:26:52PM -0400, Alex wrote:
<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/var/lib/libvirt/images/mail02.img'/>
  <target dev='vda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>

I meant raw disk devices rather than files, e.g.

<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source dev='/dev/server02_vg1_2tb/lv_vm03_swap'/>
  <target dev='vdb' bus='virtio'/>
</disk>

This eliminates one layer of filesystem overhead.
Can this type change be done without modifying the image, or must some conversion be done prior to making this change?
You can do that at any time; it just requires a reboot. You really want caching off; see http://publib.boulder.ibm.com/infocenter/lnxinfo/v3r0m0/index.jsp?topic=%2Fl... -Robin
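One possible shape for that migration, sketched as host-side commands run with the guest shut down. The volume group name (vg0), LV size, and domain name are placeholders; check the image's real size with qemu-img info before creating the LV:

```shell
virsh shutdown mail02

# Create an LV at least as large as the image.
lvcreate -L 40G -n lv_mail02 vg0

# The image is already raw, so a straight copy is enough.
# (A qcow2 image would need qemu-img convert -O raw instead.)
dd if=/var/lib/libvirt/images/mail02.img of=/dev/vg0/lv_mail02 bs=4M conv=fsync

# Point the domain XML at the block device (virsh edit mail02),
# then start the guest again.
virsh start mail02
```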

On 10/10/11, Alex <mysqlstudent@gmail.com> wrote:
I thought RAID10 still involved RAID1 on all disks, so really the only improvement would be the lack of the parity write, correct? The wikipedia entry seems to indicate it's not all that much faster:
Parity write and parity calculations. For a RAID 1 or RAID 10 setup, you just write the same data to two disks. But for RAID 5, you need to read, verify parity, calculate the new parity, then write. In terms of raw sequential bandwidth the two don't differ that much; it's the random IOPS that are the problem. You can check out this post; it's meant for an Exchange 2007 server, but the calculations there clearly demonstrate the difference between RAID 1/10 and RAID 5 (it assumes a 50/50 read/write mix): http://www.mmcug.org/blogs/Lists/Posts/Post.aspx?ID=16 There are also plenty of other tests and articles comparing RAID 10 and RAID 5.
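The penalty figures quoted above can be turned into a rough comparison. A sketch under assumed numbers (the per-disk IOPS and the 50/50 mix are illustrative, in the same style as the linked post's calculations):

```python
# Effective front-end IOPS for an array, using the commonly cited
# write penalties: RAID 5 = 4 backend ops per write, RAID 1/10 = 2.
def effective_iops(disks, per_disk_iops, write_penalty, write_frac):
    raw = disks * per_disk_iops
    # Each front-end write consumes write_penalty backend ops;
    # each read consumes one.
    return raw / ((1 - write_frac) + write_frac * write_penalty)

per_disk = 100   # assumed 7200 rpm SATA figure
mix = 0.5        # 50/50 read/write

raid5  = effective_iops(3, per_disk, 4, mix)   # the 3-disk array here
raid10 = effective_iops(4, per_disk, 2, mix)   # RAID 10 needs a 4th disk

print(f"RAID 5 (3 disks):  {raid5:.0f} IOPS")
print(f"RAID 10 (4 disks): {raid10:.0f} IOPS")
```

Even granting the extra spindle, the write penalty alone roughly doubles the usable random IOPS under this mix, which is the improvement being argued for.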

Hi,
I'm using deadline scheduler, the /var partition is mounted noatime, and the disk is mounted raw:
<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/var/lib/libvirt/images/mail02.img'/>
  <target dev='vda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>

I meant raw disk devices rather than files, e.g.

<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source dev='/dev/server02_vg1_2tb/lv_vm03_swap'/>
  <target dev='vdb' bus='virtio'/>
</disk>
I made this change by editing the xml, restarting libvirtd, then using virsh to define the xml file, and received this message:

virsh # define /etc/libvirt/qemu/bwimail02.xml
error: Failed to define domain from /etc/libvirt/qemu/bwimail02.xml
error: missing source information for device vda

Have I done something wrong, or am I missing something? Thanks for the other notes. I'm still trying to digest it all, but continuing to make progress.

Thanks,
Alex

On Mon, Oct 10, 2011 at 05:52:28PM -0400, Alex wrote:
Hi,
I'm using deadline scheduler, the /var partition is mounted noatime, and the disk is mounted raw:
<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/var/lib/libvirt/images/mail02.img'/>
  <target dev='vda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>

I meant raw disk devices rather than files, e.g.

<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source dev='/dev/server02_vg1_2tb/lv_vm03_swap'/>
  <target dev='vdb' bus='virtio'/>
</disk>
I made this change by editing the xml, restarting libvirtd, then using virsh to define the xml file and received this message:
virsh # define /etc/libvirt/qemu/bwimail02.xml
error: Failed to define domain from /etc/libvirt/qemu/bwimail02.xml
error: missing source information for device vda
Have I done something wrong, or am I missing something?
You'll need to show us the xml file. -Robin

On 10/10/2011 03:52 PM, Alex wrote:
I made this change by editing the xml, restarting libvirtd, then using virsh to define the xml file and received this message:
virsh # define /etc/libvirt/qemu/bwimail02.xml
error: Failed to define domain from /etc/libvirt/qemu/bwimail02.xml
error: missing source information for device vda
Have I done something wrong, or am I missing something?
Yes, you goofed by directly editing /etc/libvirt. By doing that, you are going behind libvirt's back - if your edits happen to work, then a libvirtd restart will use them, but if you introduce a typo or other problem, then it is your fault that libvirt can't get things to work. If you had instead gone through the libvirt API (such as by using 'virsh edit bwimail02'), then libvirt would do some sanity checking up front and refuse to install your changes unless they were safe. That said, your typo:
<source dev='/dev/server02_vg1_2tb/lv_vm03_swap'/>
is that you used <source dev=/> instead of <source file=/>. Even if the source is a raw block device on the host, you still call it out using file= in the xml.

--
Eric Blake   eblake@redhat.com   +1-801-349-2682
Libvirt virtualization library http://libvirt.org

The 10/10/11, Eric Blake wrote:
On 10/10/2011 03:52 PM, Alex wrote:
I made this change by editing the xml, restarting libvirtd, then using virsh to define the xml file and received this message:
<...>
Have I done something wrong, or am I missing something?
Yes, you goofed by directly editing /etc/libvirt. By doing that, you are going behind libvirt's back - if your edits happen to work, then a libvirtd restart will use them, but if you introduce a typo or other problem, then it is your fault that libvirt can't get things to work. If you had instead gone through the libvirt API (such as by using 'virsh edit bwimail02'), then libvirt would do some sanity checking up front and refuse to install your changes unless they were safe.
Which is counterintuitive, IMHO. Admins are used to editing configuration files in /etc because most software is designed this way. Files not designed to be edited could live in /var instead. At the very least, the current files in /etc could have a header with comments warning against manual editing and tips on how to change the configuration. ,-p -- Nicolas Sebrecht

On 10/11/2011 02:04 AM, Nicolas Sebrecht wrote:
Yes, you goofed by directly editing /etc/libvirt. By doing that, you are going behind libvirt's back - if your edits happen to work, then a libvirtd restart will use them, but if you introduce a typo or other problem, then it is your fault that libvirt can't get things to work. If you had instead gone through the libvirt API (such as by using 'virsh edit bwimail02'), then libvirt would do some sanity checking up front and refuse to install your changes unless they were safe.
Which is counterintuitive, IMHO. Admins are used to editing configuration files in /etc because most software is designed this way.
Files not designed to be edited could live in /var instead. At the very least, the current files in /etc could have a header with comments warning against manual editing and tips on how to change the configuration. ,-p
Which is exactly why newer libvirt sticks this header on such files (here, from /etc/libvirt/qemu/domainname.xml):

<!--
WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE
OVERWRITTEN AND LOST. Changes to this xml configuration should be made using:
  virsh edit domainname
or other application using the libvirt API.
-->

--
Eric Blake   eblake@redhat.com   +1-801-349-2682
Libvirt virtualization library http://libvirt.org

On 10/11/11, Alex <mysqlstudent@gmail.com> wrote:
I made this change by editing the xml, restarting libvirtd, then using virsh to define the xml file and received this message:
virsh # define /etc/libvirt/qemu/bwimail02.xml
error: Failed to define domain from /etc/libvirt/qemu/bwimail02.xml
error: missing source information for device vda
Have I done something wrong, or am I missing something?
We'll need to see your current XML to tell. But if you simply copied and pasted my code, then it's definitely not going to work. For one thing, you would need to have created an LVM volume named exactly the same way I did. And if you simply changed the parameters while keeping your existing .img file as the source, then it also won't work, for obvious reasons.

Hi,
I made this change by editing the xml, restarting libvirtd, then using virsh to define the xml file and received this message:
virsh # define /etc/libvirt/qemu/bwimail02.xml
error: Failed to define domain from /etc/libvirt/qemu/bwimail02.xml
error: missing source information for device vda
Have I done something wrong, or am I missing something?
We'll need to see your current XML to tell. But if you simply copied and pasted my code, then it's definitely not going to work. For one thing, you would need to have created an LVM volume named exactly the same way I did.
And if you simply changed the parameters while keeping your existing .img file as the source, then it also won't work, for obvious reasons.
Although I'm not very experienced with libvirt, I'd like to think I have a reasonable grasp of the technology. Perhaps I'm showing just how new at this I am, but what would be the proper way, using virsh, to change the disk type from a file to a raw device? I've included my xml below, and would really appreciate any guidance you may be able to offer. This is the pre-modified version.

One of the things I was working on was trying to get the guest processor spec to mirror that of the host, but the guest showed quite a few deficiencies compared to the host, including less cache (4096kb vs 8192kb) and quite a few missing CPU flags. How does that impact performance? This guest config was mostly generated by virt-manager on fedora15.

<domain type='kvm'>
  <name>mail02</name>
  <uuid>ec4f3cf5-2f27-fb3e-72f6-3fa3176b13b6</uuid>
  <memory>4194304</memory>
  <currentMemory>4194304</currentMemory>
  <vcpu>8</vcpu>
  <os>
    <type arch='x86_64' machine='pc-0.14'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/bin/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/var/lib/libvirt/images/mail02.img'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
    <disk type='block' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <address type='drive' controller='0' bus='1' unit='0'/>
    </disk>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:67:2c:4c'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes'/>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
  </devices>
</domain>

Thanks,
Alex

Hi,
iotop may report as much as 2M/s on the host, with an average of about 400-600K/s. Does that seem like a lot? I can write like 80MB/s at least using dd to test.
I had similar problems previously. The crux in my case is the number of IOPS possible. 100K of 2KB file writes is still 2MB/s but requires a lot more IO overheads than 1 single 2MB file.
Can I also ask how you measured performance and the effect of any changes you made on the system? iotop seems very general. Perhaps sar? Any ideas for graphing its output? Thanks again, Alex

On 10/7/11, Alex <mysqlstudent@gmail.com> wrote:
Can I also ask how you measured performance and the effect any changes you made may have had on the system?
iotop seems very general. Perhaps sar? Ideas for graphing its output?
Well, in that situation people were screaming at me, so I didn't really measure things empirically. It was just throwing one possible fix after another while watching the system load and iowait % go down until things were smooth again. Another key thing is having sufficient memory for the VM, but I don't think that's your problem since you have 8GB on it. Then again, it really depends on how much mail you are handling. But if you have the time for measurement, and can afford the time/impact, you can try iozone, which produces detailed stats and includes graphing functionality IIRC. The other tool I use is iostat. I'm also thinking of writing a script that more closely mimics the kind of load on my typical server, i.e. many small email files written, plus some reads and random updates to a single large file (e.g. log file, user db).
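For the monitoring side, a sketch of the iostat/sar workflow mentioned above (iostat, sar, and sadf all come from the sysstat package; flags may vary slightly by version):

```shell
# Per-device extended stats every 5 seconds: watch await, %util and
# w/s (writes per second) rather than raw MB/s.
iostat -dx 5

# CPU breakdown: high %iowait with otherwise idle CPUs matches the
# symptoms described in this thread.
sar -u 5

# Replay today's recorded device activity, and export it in a
# parseable form for graphing.
sar -d
sadf -d /var/log/sa/sa$(date +%d) -- -d > disk.csv
```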
participants (5): Alex, Emmanuel Noobadmin, Eric Blake, Nicolas Sebrecht, Robin Lee Powell