I have a Dell R910 running RHEL 6.2 with libvirt 0.8. This machine hosts 12 virtual
guests. Every 3 to 5 months the server crashes for no apparent reason. The logs show no
kernel panics or other issues that would explain the crash. The sar logs show a very high
context switch count (approx. 170,000) and a high runq-sz (approx. 10 to 18). The CPUs
were mostly idle, memory usage was low, there was no swapping, and disk I/O was very low
as well. This also
occurs on a number of other RHEL 6.2 servers I have that use KVM/libvirt/QEMU for
virtualization. I am curious whether anyone else has reported incidents like this. After
the crash the servers all come back up, but as you can imagine, it is troubling to see this
kind of behavior, especially with these machines hosting production guests.
Any help or suggestions on what to look for would be appreciated.
Regards