Hi,
I'm having a problem with libvirtd crashing after a couple of hours
whenever a specific domain-monitoring program is running.
I have pasted the following below:
1. libvirt version
2. qemu-kvm version
3. OS version
4. Kernel version
5. libvirtd status after the crash
6. libvirtd.log (info-level output around the crash, in UTC; the log is
too long to post in full, so only the beginning and end are included)
7. custom.log (what the domain-monitoring program was doing around the
time of the crash, in JST)
8. Background on the monitoring program
9. Other related server settings
If anyone could look through these and offer some insight into what is
causing libvirtd to crash, it would be greatly appreciated.
1.) libvirt version:
# rpm -q libvirt
libvirt-0.9.10-21.el6.x86_64
2.) qemu-kvm version:
qemu-kvm-0.12.1.2-3.295.el6.10.x86_64
3.) OS version:
# cat /etc/redhat-release
CentOS release 6.3 (Final)
4.) Kernel version:
# uname -r
2.6.32-279.22.1.el6.x86_64
5.) libvirtd status after the crash:
# service libvirtd status
libvirtd dead but pid file exists
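For reference, "dead but pid file exists" is what the `status` helper used by the init script reports when the PID file is still present but no process with that PID exists. A minimal Python 3 sketch of the same check, assuming the PID file is /var/run/libvirtd.pid (an assumption; use whatever path your init script writes). `pidfile_status` is a hypothetical helper name:

```python
import os

# Assumption: the PID file path used by the CentOS 6 libvirtd init script.
DEFAULT_PIDFILE = "/var/run/libvirtd.pid"


def pidfile_status(path=DEFAULT_PIDFILE):
    """Return 'running', 'dead but pid file exists', 'invalid pid file' or 'no pid file'."""
    try:
        with open(path) as f:
            text = f.read().strip()
    except FileNotFoundError:
        return "no pid file"
    try:
        pid = int(text)
    except ValueError:
        return "invalid pid file"
    if pid <= 0:
        # 0 and negative values would address process groups in kill(2)
        return "invalid pid file"
    try:
        os.kill(pid, 0)  # signal 0: only checks that the process exists
    except ProcessLookupError:
        return "dead but pid file exists"
    except PermissionError:
        return "running"  # exists, but owned by another user
    return "running"
```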
6.) libvirtd.log
2014-02-06 10:25:05.173+0000: 1187: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58626,uid:0
2014-02-06 10:25:05.237+0000: 1184: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58636,uid:0
2014-02-06 10:25:05.271+0000: 1185: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58646,uid:0
2014-02-06 10:25:05.301+0000: 1184: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58648,uid:0
2014-02-06 10:25:05.400+0000: 1184: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58650,uid:0
Caught Segmentation violation dumping internal log buffer:
====== start of log =====
^@05.412+00001182: debug : virEventPollMakePollFDs:383 : Prepare n=19 w=21, f=32 e=25 d=0
2014-02-06 10:25:05.412+00001182: debug : virEventPollMakePollFDs:383 : Prepare n=20 w=22, f=34 e=25 d=0
2014-02-06 10:25:05.412+00001182: debug : virEventPollMakePollFDs:383 : Prepare n=21 w=23, f=33 e=25 d=0
2014-02-06 10:25:05.412+00001182: debug : virEventPollMakePollFDs:383 : Prepare n=22 w=24, f=36 e=25 d=0
2014-02-06 10:25:05.412+00001182: debug : virEventPollMakePollFDs:383 : Prepare n=23 w=25, f=38 e=25 d=0
2014-02-06 10:25:05.412+00001182: debug : virEventPollMakePollFDs:383 : Prepare n=24 w=26, f=39 e=25 d=0
2014-02-06 10:25:05.412+00001182: debug : virEventPollMakePollFDs:383 : Prepare n=25 w=27, f=41 e=25 d=0
2014-02-06 10:25:05.412+00001182: debug : virEventPollMakePollFDs:383 : Prepare n=26 w=28, f=40 e=25 d=0
(cut out due to length)
2014-02-06 10:25:05.423+00001182: debug : virEventPollDispatchHandles:488 : EVENT_POLL_DISPATCH_HANDLE: watch=2791 events=2
2014-02-06 10:25:05.423+00001182: debug : virNetMessageFree:75 : msg=0x2326a20 nfds=0 cb=(nil)
2014-02-06 10:25:05.423+00001182: debug : virNetServerClientCalculateHandleMode:137 : tls=(nil) hs=-1, rx=0x2266390 tx=(nil)
2014-02-06 10:25:05.423+00001182: debug : virNetServerClientCalculateHandleMode:167 : mode=1
2014-02-06 10:25:05.423+00001182: debug : virEventPollUpdateHandle:151 : EVENT_POLL_UPDATE_HANDLE: watch=2791 events=1
2014-02-06 10:25:05.423+00001182: debug : virEventPollInterruptLocked:702 : Skip interrupt, 1 -1675536288
2014-02-06 10:25:05.423+00001182: debug : virEventPollDispatchHandles:474 : i=33 w=2793
2014-02-06 10:25:05.423+00001182: debug : virEventPollCleanupTimeouts:506 : Cleanup 12
2014-02-06 10:25:05.423+00001182: debug : virEventPollCleanupHandles:554 : Cleanup 34
2014-02-06 10:25:05.423+00001182: debug : virNetServerClientClose:632 : client=0x22e6860 refs=3
2014-02-06 10:25:05.423+00001182: debug : virKeepAliveStop:382 : RPC_KEEPALIVE_STOP: ka=0x225bf20 client=0x22e6860
2014-02-06 10:25:05.423+00001182: debug : virEventPollRemoveTimeout:293 : EVENT_POLL_REMOVE_TIMEOUT: timer=8290
2014-02-06 10:25:05.423+00001182: debug : virEventPollInterruptLocked:702 : Skip interrupt, 0 -1675536288
2014-02-06 10:25:05.423+00001182: debug : virEventPollRemoveTimeout:293 : EVENT_POLL_REMOVE_TIMEOUT: timer=8289
2014-02-06 10:25:05.423+00001182: debug : virEventPollInterruptLocked:702 : Skip interrupt, 0 -1675536288
2014-02-06 10:25:05.423+00001182: debug : virKeepAliveFree:304 : RPC_KEEPALIVE_FREE: ka=0x225bf20 client=0x22e6860 refs=3
2014-02-06 10:25:05.423+00001182: debug : daemonRemoveAllClientStreams:493 : stream=(nil)
2014-02-06 10:25:05.423+00001182: debug : virEventPollRemoveHandle:180 : EVENT_POLL_REMOVE_HANDLE: watch=2791
2014-02-06 10:25:05.423+00001182: debug : virEventPollRemoveHandle:193 : mark delete 32 50
2014-02-06 10:25:05.423+00001182: debug : virEventPollInterruptLocked:702 : Skip interrupt, 0 -1675536288
2014-02-06 10:25:05.423+00001182: debug : virNetMessageFree:75 : msg=0x2266390 nfds=0 cb=(nil)
2014-02-06 10:25:05.423+00001182: debug : virNetSocketFree:722 : RPC_SOCKET_FREE: sock=0x22e66a0 refs=2
2014-02-06 10:25:05.423+00001182: debug : virNetServerClientFree:591 : RPC_SERVER_CLIENT_FREE: client=0x22e6860 refs=3
2014-02-06 10:25:05.423+00001182: debug : virEventRunDefaultImpl:244 : running default event implementation
2014-02-06 10:25:05.423+00001182: debug : virEventPollCleanupTimeouts:506 : Cleanup 12
2014-02-06 10:25:05.423+00001182: debug : virEventPollCleanupTimeouts:519 : EVENT_POLL_PURGE_TIMEOUT: timer=8289
2014-02-06 10:25:05.423+00001182: debug : virKeepAliveFree:304 : RPC_KEEPALIVE_FREE: ka=0x225bf20 client=0x22e6860 refs=2
2014-02-06 10:25:05.423+00001182: debug : virEventPollCleanupTimeouts:519 : EVENT_POLL_PURGE_TIMEOUT: timer=8290
2014-02-06 10:25:05.423+00001182: debug : virKeepAliveFree:304 : RPC_KEEPALIVE_FREE: ka=0x225bf20 client=0x22e6860 refs=1
2014-02-06 10:25:05.423+00001182: debug : virNetServerClientFree:591 : RPC_SERVER_CLIENT_FREE: client=0x22e6860 refs=2
2014-02-06 10:25:05.423+00001182: debug : virEventPollCleanupHandles:554 : Cleanup 34
2014-02-06 10:25:05.423+00001182: debug : virEventPollCleanupHandles:567 : EVENT_POLL_PURGE_HANDLE: watch=2791
2014-02-06 10:25:05.423+00001182: debug : virNetServerClientFree:591 : RPC_SERVER_CLIENT_FREE: client=0x22e6860 refs=1
2014-02-06 10:25:05.423+00001182: debug : virConnectClose:1462 : conn=0x7f1b380c4630
2014-02-06 10:25:05.423+00001182: debug : virUnrefConnect:145 : unref connection 0x7f1b380c4630 1
2014-02-06 10:25:05.423+00001182: debug : virReleaseConnect:94 : release connection 0x7f1b380c4630
====== end of log =====
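The end of the dump shows client=0x22e6860 being closed, with each RPC_*_FREE line printing a refs value. To line those values up per object across the full libvirtd.log, a small Python 3 sketch like the following could be used. The regex is an assumption derived only from the lines shown above, and `refs_by_object` is a hypothetical helper; it just collects the printed numbers and does not interpret them:

```python
import re
from collections import OrderedDict

# Matches *_FREE probe lines such as
#   "RPC_KEEPALIVE_FREE: ka=0x225bf20 client=0x22e6860 refs=3"
# The \s+ after the colon also tolerates a line break inserted by mail wrapping.
FREE_RE = re.compile(
    r"(RPC_[A-Z_]+_FREE):\s+(\w+)=(0x[0-9a-fA-F]+)[^\n]*?\brefs=(\d+)"
)


def refs_by_object(log_text):
    """Map (probe, object kind, address) -> list of refs values, in log order."""
    seen = OrderedDict()
    for m in FREE_RE.finditer(log_text):
        probe, kind, addr, refs = m.groups()
        seen.setdefault((probe, kind, addr), []).append(int(refs))
    return seen
```

Usage would be along the lines of `refs_by_object(open(path).read())`, where `path` is wherever log_outputs writes the daemon log (commonly /var/log/libvirt/libvirtd.log, but check your configuration).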
7.) custom.log
Feb 6 19:25:05 jp7-rk90000 [authpriv.notice] sudo: zabbix : TTY=unknown ; PWD=/etc/zabbix/sender_scripts/compute ; USER=root ; COMMAND=/usr/bin/virsh domifstat i-8-114-VM Interface
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM unable to dlopen(/lib64/security/pam_fprintd.so): /lib64/security/pam_fprintd.so: cannot open shared object file: No such file or directory
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM adding faulty module: /lib64/security/pam_fprintd.so
Feb 6 19:25:05 jp7-rk90000 [authpriv.notice] sudo: zabbix : TTY=unknown ; PWD=/etc/zabbix/sender_scripts/compute ; USER=root ; COMMAND=/usr/bin/virsh domifstat i-8-114-VM vnet4
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM unable to dlopen(/lib64/security/pam_fprintd.so): /lib64/security/pam_fprintd.so: cannot open shared object file: No such file or directory
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM adding faulty module: /lib64/security/pam_fprintd.so
Feb 6 19:25:05 jp7-rk90000 [authpriv.notice] sudo: zabbix : TTY=unknown ; PWD=/etc/zabbix/sender_scripts/compute ; USER=root ; COMMAND=/usr/bin/virsh list --all
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM unable to dlopen(/lib64/security/pam_fprintd.so): /lib64/security/pam_fprintd.so: cannot open shared object file: No such file or directory
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM adding faulty module: /lib64/security/pam_fprintd.so
Feb 6 19:25:05 jp7-rk90000 [authpriv.notice] sudo: zabbix : TTY=unknown ; PWD=/etc/zabbix/sender_scripts/compute ; USER=root ; COMMAND=/usr/bin/virsh domiflist i-8-114-VM
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM unable to dlopen(/lib64/security/pam_fprintd.so): /lib64/security/pam_fprintd.so: cannot open shared object file: No such file or directory
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM adding faulty module: /lib64/security/pam_fprintd.so
Feb 6 19:25:05 jp7-rk90000 [authpriv.notice] sudo: zabbix : TTY=unknown ; PWD=/etc/zabbix/sender_scripts/compute ; USER=root ; COMMAND=/usr/bin/virsh domifstat i-8-114-VM Interface
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM unable to dlopen(/lib64/security/pam_fprintd.so): /lib64/security/pam_fprintd.so: cannot open shared object file: No such file or directory
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM adding faulty module: /lib64/security/pam_fprintd.so
Feb 6 19:25:05 jp7-rk90000 [authpriv.notice] sudo: zabbix : TTY=unknown ; PWD=/etc/zabbix/sender_scripts/compute ; USER=root ; COMMAND=/usr/bin/virsh domifstat i-8-114-VM vnet4
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM unable to dlopen(/lib64/security/pam_fprintd.so): /lib64/security/pam_fprintd.so: cannot open shared object file: No such file or directory
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM adding faulty module: /lib64/security/pam_fprintd.so
Feb 6 19:25:05 jp7-rk90000 [authpriv.notice] sudo: zabbix : TTY=unknown ; PWD=/etc/zabbix/sender_scripts/compute ; USER=root ; COMMAND=/usr/bin/virsh list --all
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM unable to dlopen(/lib64/security/pam_fprintd.so): /lib64/security/pam_fprintd.so: cannot open shared object file: No such file or directory
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM adding faulty module: /lib64/security/pam_fprintd.so
Feb 6 19:25:05 jp7-rk90000 [authpriv.notice] sudo: zabbix : TTY=unknown ; PWD=/etc/zabbix/sender_scripts/compute ; USER=root ; COMMAND=/usr/bin/virsh list --all
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM unable to dlopen(/lib64/security/pam_fprintd.so): /lib64/security/pam_fprintd.so: cannot open shared object file: No such file or directory
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM adding faulty module: /lib64/security/pam_fprintd.so
Feb 6 19:25:05 jp7-rk90000 [authpriv.notice] sudo: zabbix : TTY=unknown ; PWD=/etc/zabbix/sender_scripts/compute ; USER=root ; COMMAND=/usr/bin/virsh list --all
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM unable to dlopen(/lib64/security/pam_fprintd.so): /lib64/security/pam_fprintd.so: cannot open shared object file: No such file or directory
Feb 6 19:25:05 jp7-rk90000 [authpriv.err] sudo: PAM adding faulty module: /lib64/security/pam_fprintd.so
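Because libvirtd.log is in UTC and custom.log is in JST (UTC+9, with no daylight saving time), 10:25:05 in the libvirtd log and 19:25:05 in custom.log refer to the same second. A small Python 3 sketch for putting both logs on one clock; the function names are illustrative, and the year has to be passed in because syslog timestamps do not include it:

```python
from datetime import datetime, timedelta, timezone

# Japan does not observe DST, so a fixed +09:00 offset is exact.
JST = timezone(timedelta(hours=9), "JST")


def libvirtd_ts_to_jst(ts):
    """Convert a libvirtd log timestamp like '2014-02-06 10:25:05.173+0000' to JST."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f%z").astimezone(JST)


def syslog_ts_to_jst(ts, year):
    """Interpret a syslog timestamp like 'Feb 6 19:25:05' as JST in the given year."""
    return datetime.strptime("%d %s" % (year, ts), "%Y %b %d %H:%M:%S").replace(tzinfo=JST)
```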
8.) Background on this program:
The program consists of three main scripts run via cron. Two run every
5 minutes, and one runs every minute.
The two scripts that run every 5 minutes rely heavily on the virsh
command. However, they are written to keep the number of simultaneous
connections to libvirtd small: at no point are there more than 6
connections on libvirt-sock.
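One way such a cap can be enforced inside a single script is a fixed-size worker pool. A minimal Python 3 sketch, assuming each virsh process opens exactly one connection to libvirtd (so limiting concurrent virsh processes limits concurrent connections). `run_virsh` and `run_bounded` are hypothetical helpers, not part of the actual program:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

MAX_CONNECTIONS = 6  # the cap described above


def run_virsh(args):
    """Run one virsh command and return its stdout; raises CalledProcessError on failure."""
    return subprocess.run(["virsh"] + list(args), check=True,
                          stdout=subprocess.PIPE, universal_newlines=True).stdout


def run_bounded(commands, runner=run_virsh, limit=MAX_CONNECTIONS):
    """Run every command through `runner` with at most `limit` in flight at once.

    Results come back in the same order as `commands`."""
    with ThreadPoolExecutor(max_workers=limit) as pool:
        return list(pool.map(runner, commands))
```

In this sketch the limit applies within one process only; scripts started by separate cron entries would each get their own pool.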
Because it is a domain-monitoring program, it executes only the following
virsh commands:
virsh list --all
virsh dominfo
virsh domblklist
virsh domblkstat
virsh domiflist
virsh domifstat
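In custom.log, `virsh domifstat i-8-114-VM Interface` is run alongside `virsh domifstat i-8-114-VM vnet4`. "Interface" is the title of the first column in the header row that `virsh domiflist` prints, so the script may be passing that header word to domifstat as if it were a device name. Whether or not this is related to the crash, a header-aware parser avoids it. A minimal Python 3 sketch; `parse_domiflist` is a hypothetical name and the header handling assumes the English column titles:

```python
def parse_domiflist(output):
    """Return the interface names (first column) from `virsh domiflist <dom>` output.

    Skips the header row (first column titled "Interface"), the dashed
    separator line and blank lines, so no header text is used as a device name."""
    names = []
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue  # blank line
        if fields[0] == "Interface":
            continue  # header row
        if set(line.strip()) == {"-"}:
            continue  # "-----" separator under the header
        names.append(fields[0])
    return names
```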
9.) Other related server settings:
9-1) User resource limits
[root@ ~]# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 773493
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 32768
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
9-2) libvirtd.conf
The following settings have been changed from their defaults:
#max_clients = 20
max_clients = 250
#max_workers = 20
max_workers = 250
#max_requests = 20
max_requests = 250
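To double-check which values are actually active (only the uncommented lines take effect), the `key = value` settings can be pulled out with a short Python 3 sketch. It is a hypothetical helper that handles only simple one-line scalar settings like these, not list values or `#` characters inside quoted strings:

```python
def parse_libvirtd_conf(text):
    """Return {key: value} for the active (uncommented) `key = value` lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments, including "#max_clients = 20"
        if not line or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        # integers become int; quoted strings lose their surrounding quotes
        settings[key] = int(value) if value.isdigit() else value.strip('"')
    return settings
```

For example, `parse_libvirtd_conf(open("/etc/libvirt/libvirtd.conf").read())` on the file above should report 250 for all three keys.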
Regards,
Minami