On 07.02.2014 09:47, Minami Katsumata wrote:
Hi,
I'm having problems with libvirt crashing after a couple hours when a specific domain monitoring program is running.
I have pasted below the following:
1. libvirt version
2. qemu-kvm version
3. OS version
4. Kernel version
5. libvirt status post-crash
6. libvirtd.log (info level dump around crash; too long to post everything so just the beginning and end. UTC)
7. custom.log (on what this domain monitoring program was doing around the time of the crash. JST)
8. FYI on the program being executed
9. other related server settings
Please, if anyone can look through these and give some insight as to what is causing libvirt to crash, that would be greatly appreciated.
1.) libvirt version:
# rpm -q libvirt
libvirt-0.9.10-21.el6.x86_64

This is a rather ancient libvirt. Can you please update and see whether the issue has been fixed?
2.) qemu-kvm version: qemu-kvm-0.12.1.2-3.295.el6.10.x86_64
3.) OS version:
# cat /etc/redhat-release
CentOS release 6.3 (Final)

Ah, this explains the libvirt version. AFAIK CentOS 6.5 has been released.
4.) Kernel version:
# uname -r
2.6.32-279.22.1.el6.x86_64
5.) libvirt status after crash:
# service libvirtd status
libvirtd dead but pid file exists
6.) libvirtd.log
2014-02-06 10:25:05.173+0000: 1187: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58626,uid:0
2014-02-06 10:25:05.237+0000: 1184: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58636,uid:0
2014-02-06 10:25:05.271+0000: 1185: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58646,uid:0
2014-02-06 10:25:05.301+0000: 1184: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58648,uid:0
2014-02-06 10:25:05.400+0000: 1184: info : remoteDispatchAuthList:2091 : Bypass polkit auth for privileged client pid:58650,uid:0
Caught Segmentation violation dumping internal log buffer:
====== start of log =====
^@05.412+00001182: debug : virEventPollMakePollFDs:383 : Prepare n=19 w=21, f=32 e=25 d=0
2014-02-06 10:25:05.412+00001182: debug : virEventPollMakePollFDs:383 : Prepare n=20 w=22, f=34 e=25 d=0
2014-02-06 10:25:05.412+00001182: debug : virEventPollMakePollFDs:383 : Prepare n=21 w=23, f=33 e=25 d=0
2014-02-06 10:25:05.412+00001182: debug : virEventPollMakePollFDs:383 : Prepare n=22 w=24, f=36 e=25 d=0
2014-02-06 10:25:05.412+00001182: debug : virEventPollMakePollFDs:383 : Prepare n=23 w=25, f=38 e=25 d=0
2014-02-06 10:25:05.412+00001182: debug : virEventPollMakePollFDs:383 : Prepare n=24 w=26, f=39 e=25 d=0
2014-02-06 10:25:05.412+00001182: debug : virEventPollMakePollFDs:383 : Prepare n=25 w=27, f=41 e=25 d=0
2014-02-06 10:25:05.412+00001182: debug : virEventPollMakePollFDs:383 : Prepare n=26 w=28, f=40 e=25 d=0
(cut out due to length)
2014-02-06 10:25:05.423+00001182: debug : virEventPollDispatchHandles:488 : EVENT_POLL_DISPATCH_HANDLE: watch=2791 events=2
2014-02-06 10:25:05.423+00001182: debug : virNetMessageFree:75 : msg=0x2326a20 nfds=0 cb=(nil)
2014-02-06 10:25:05.423+00001182: debug : virNetServerClientCalculateHandleMode:137 : tls=(nil) hs=-1, rx=0x2266390 tx=(nil)
2014-02-06 10:25:05.423+00001182: debug : virNetServerClientCalculateHandleMode:167 : mode=1
2014-02-06 10:25:05.423+00001182: debug : virEventPollUpdateHandle:151 : EVENT_POLL_UPDATE_HANDLE: watch=2791 events=1
2014-02-06 10:25:05.423+00001182: debug : virEventPollInterruptLocked:702 : Skip interrupt, 1 -1675536288
2014-02-06 10:25:05.423+00001182: debug : virEventPollDispatchHandles:474 : i=33 w=2793
2014-02-06 10:25:05.423+00001182: debug : virEventPollCleanupTimeouts:506 : Cleanup 12
2014-02-06 10:25:05.423+00001182: debug : virEventPollCleanupHandles:554 : Cleanup 34
2014-02-06 10:25:05.423+00001182: debug : virNetServerClientClose:632 : client=0x22e6860 refs=3
2014-02-06 10:25:05.423+00001182: debug : virKeepAliveStop:382 : RPC_KEEPALIVE_STOP: ka=0x225bf20 client=0x22e6860
2014-02-06 10:25:05.423+00001182: debug : virEventPollRemoveTimeout:293 : EVENT_POLL_REMOVE_TIMEOUT: timer=8290
2014-02-06 10:25:05.423+00001182: debug : virEventPollInterruptLocked:702 : Skip interrupt, 0 -1675536288
2014-02-06 10:25:05.423+00001182: debug : virEventPollRemoveTimeout:293 : EVENT_POLL_REMOVE_TIMEOUT: timer=8289
2014-02-06 10:25:05.423+00001182: debug : virEventPollInterruptLocked:702 : Skip interrupt, 0 -1675536288
2014-02-06 10:25:05.423+00001182: debug : virKeepAliveFree:304 : RPC_KEEPALIVE_FREE: ka=0x225bf20 client=0x22e6860 refs=3
2014-02-06 10:25:05.423+00001182: debug : daemonRemoveAllClientStreams:493 : stream=(nil)
2014-02-06 10:25:05.423+00001182: debug : virEventPollRemoveHandle:180 : EVENT_POLL_REMOVE_HANDLE: watch=2791
2014-02-06 10:25:05.423+00001182: debug : virEventPollRemoveHandle:193 : mark delete 32 50
2014-02-06 10:25:05.423+00001182: debug : virEventPollInterruptLocked:702 : Skip interrupt, 0 -1675536288
2014-02-06 10:25:05.423+00001182: debug : virNetMessageFree:75 : msg=0x2266390 nfds=0 cb=(nil)
2014-02-06 10:25:05.423+00001182: debug : virNetSocketFree:722 : RPC_SOCKET_FREE: sock=0x22e66a0 refs=2
2014-02-06 10:25:05.423+00001182: debug : virNetServerClientFree:591 : RPC_SERVER_CLIENT_FREE: client=0x22e6860 refs=3
2014-02-06 10:25:05.423+00001182: debug : virEventRunDefaultImpl:244 : running default event implementation
2014-02-06 10:25:05.423+00001182: debug : virEventPollCleanupTimeouts:506 : Cleanup 12
2014-02-06 10:25:05.423+00001182: debug : virEventPollCleanupTimeouts:519 : EVENT_POLL_PURGE_TIMEOUT: timer=8289
2014-02-06 10:25:05.423+00001182: debug : virKeepAliveFree:304 : RPC_KEEPALIVE_FREE: ka=0x225bf20 client=0x22e6860 refs=2
2014-02-06 10:25:05.423+00001182: debug : virEventPollCleanupTimeouts:519 : EVENT_POLL_PURGE_TIMEOUT: timer=8290
2014-02-06 10:25:05.423+00001182: debug : virKeepAliveFree:304 : RPC_KEEPALIVE_FREE: ka=0x225bf20 client=0x22e6860 refs=1
2014-02-06 10:25:05.423+00001182: debug : virNetServerClientFree:591 : RPC_SERVER_CLIENT_FREE: client=0x22e6860 refs=2
2014-02-06 10:25:05.423+00001182: debug : virEventPollCleanupHandles:554 : Cleanup 34
2014-02-06 10:25:05.423+00001182: debug : virEventPollCleanupHandles:567 : EVENT_POLL_PURGE_HANDLE: watch=2791
2014-02-06 10:25:05.423+00001182: debug : virNetServerClientFree:591 : RPC_SERVER_CLIENT_FREE: client=0x22e6860 refs=1
2014-02-06 10:25:05.423+00001182: debug : virConnectClose:1462 : conn=0x7f1b380c4630
2014-02-06 10:25:05.423+00001182: debug : virUnrefConnect:145 : unref connection 0x7f1b380c4630 1
2014-02-06 10:25:05.423+00001182: debug : virReleaseConnect:94 : release connection 0x7f1b380c4630
====== end of log =====
These logs are pretty much useless (not your fault). On one hand, they may help us see what libvirt was doing just before the crash. On the other hand:
a) they are completely missing the TID,
b) they end just before the SIGSEGV occurs (so, for example, if the segmentation fault happens in one thread, the logs may just as well be showing a completely unrelated thread).
Therefore I think attaching gdb to libvirtd and then reproducing the crash would yield more data.

Michal
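
For reference, the gdb approach suggested above could look roughly like the sketch below. It is only an illustration, not a tested recipe for this host: it assumes gdb and the libvirt debuginfo packages are installed and that it is run as root. The function name, the GDB variable and the output path in the usage comment are made up for this example.

```shell
#!/bin/sh
# Sketch only: attach gdb to a running libvirtd, let it run until it
# stops on a signal, then print a backtrace of every thread.
# Assumptions (not from this thread): gdb and the libvirt debuginfo
# packages are installed, and the script runs as root.

GDB=${GDB:-gdb}   # hypothetical override, e.g. GDB=echo for a dry run

capture_backtrace() {
    pid=$1
    # -p PID    attach to the already running daemon
    # -batch    run the -ex commands in order, then exit
    # continue  resumes libvirtd; gdb regains control when the process
    #           stops on a signal such as SIGSEGV
    # thread apply all bt full
    #           backtrace with local variables for every thread
    "$GDB" -p "$pid" -batch \
        -ex 'continue' \
        -ex 'thread apply all bt full'
}

# Usage (start this, then reproduce the crash while it is waiting):
#   capture_backtrace "$(pidof libvirtd)" > /tmp/libvirtd-bt.txt 2>&1
```

Because every thread's stack is dumped, the output shows which thread actually received the signal, which is what the log buffer cannot tell us. Note that gdb stops on other signals by default as well, so an early stop on something other than SIGSEGV would mean re-running the capture.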