[libvirt-users] Libvirtd slowness

Hi, Can anyone tell me if it's normal to take over 2 seconds to connect to the libvirt daemon? xen1 ~ # time virsh -c 'xen://xen5/' pwd real 0m2.102s xen1 ~ # time virsh -c 'xen://xen1/' pwd real 0m2.217s Strace shows it sitting for 2.08s on polling the remote socket: poll([{fd=6, events=POLLIN}, {fd=7, events=POLLIN}], 2, -1) = 1 ([{fd=6, revents=POLLIN}]) <2.082839> Not quite sure why though? Experiencing this on both 0.9.7 and 0.9.0. This happens both with virsh and the perl libvirt bindings. If i just spawn virsh interactively I only incur that 2s hit on intial spawn, all commands are very quick and responsive. Anyone have any suggestions? - Nathan -- Nathan March<nathan@gt.net> Gossamer Threads Inc. http://www.gossamer-threads.com/ Tel: (604) 687-5804 Fax: (604) 687-5806

On 12/12/2011 4:35 PM, Nathan March wrote:
On 12/1/2011 3:18 PM, Nathan March wrote:
Can anyone tell me if it's normal to take over 2 seconds to connect to the libvirt daemon?
Can anyone comment on this? Even just a quick test to see if it happens on your systems would be appreciated.
I've rebuilt it with debugging and found the following: 16:47:29.640: 5972: debug : remoteIOEventLoop:10599 : Giving up the buck 66 0x7ff564a59010 (nil) 16:47:29.640: 5972: debug : remoteIO:10773 : All done with our call 66 (nil) 0x7ff564a59010 16:47:29.641: 5972: debug : remoteIO:10673 : Do proc=1 serial=1 length=48 wait=(nil) 16:47:29.641: 5972: debug : remoteIO:10745 : We have the buck 1 0x10c6130 0x10c6130 16:47:31.670: 5972: debug : remoteIODecodeMessageLength:10066 : Got length, now need 56 total (52 more) 16:47:31.670: 5972: debug : remoteIOEventLoop:10599 : Giving up the buck 1 0x10c6130 (nil) 16:47:31.670: 5972: debug : remoteIO:10773 : All done with our call 1 (nil) 0x10c6130 16:47:31.670: 5972: debug : doRemoteOpen:922 : Adding Handler for remote events 16:47:31.670: 5972: debug : virEventPollAddHandle:110 : Add handle fd=7 events=1 cb=0x7ff The 2s gap is pretty visible, not sure why yet. - Nathan

On 13.12.2011 01:48, Nathan March wrote:
On 12/12/2011 4:35 PM, Nathan March wrote:
On 12/1/2011 3:18 PM, Nathan March wrote:
Can anyone tell me if it's normal to take over 2 seconds to connect to the libvirt daemon?
Can anyone comment on this? Even just a quick test to see if it happens on your systems would be appreciated.
I've rebuilt it with debugging and found the following:
16:47:29.640: 5972: debug : remoteIOEventLoop:10599 : Giving up the buck 66 0x7ff564a59010 (nil) 16:47:29.640: 5972: debug : remoteIO:10773 : All done with our call 66 (nil) 0x7ff564a59010 16:47:29.641: 5972: debug : remoteIO:10673 : Do proc=1 serial=1 length=48 wait=(nil) 16:47:29.641: 5972: debug : remoteIO:10745 : We have the buck 1 0x10c6130 0x10c6130 16:47:31.670: 5972: debug : remoteIODecodeMessageLength:10066 : Got length, now need 56 total (52 more) 16:47:31.670: 5972: debug : remoteIOEventLoop:10599 : Giving up the buck 1 0x10c6130 (nil) 16:47:31.670: 5972: debug : remoteIO:10773 : All done with our call 1 (nil) 0x10c6130 16:47:31.670: 5972: debug : doRemoteOpen:922 : Adding Handler for remote events 16:47:31.670: 5972: debug : virEventPollAddHandle:110 : Add handle fd=7 events=1 cb=0x7ff
The 2s gap is pretty visible, not sure why yet.
Yeah, the gap is obvious. However, this is not telling us why. Can you please provide the daemon logs? As far as I can tell, we see client here, waiting for response for 2 secs. Therefore we need to inspect the daemon for what take it so long. Michal

On 12/13/2011 12:51 AM, Michal Privoznik wrote:
On 13.12.2011 01:48, Nathan March wrote:
I've rebuilt it with debugging and found the following:
16:47:29.640: 5972: debug : remoteIOEventLoop:10599 : Giving up the buck 66 0x7ff564a59010 (nil) 16:47:29.640: 5972: debug : remoteIO:10773 : All done with our call 66 (nil) 0x7ff564a59010 16:47:29.641: 5972: debug : remoteIO:10673 : Do proc=1 serial=1 length=48 wait=(nil) 16:47:29.641: 5972: debug : remoteIO:10745 : We have the buck 1 0x10c6130 0x10c6130 16:47:31.670: 5972: debug : remoteIODecodeMessageLength:10066 : Got length, now need 56 total (52 more) 16:47:31.670: 5972: debug : remoteIOEventLoop:10599 : Giving up the buck 1 0x10c6130 (nil) 16:47:31.670: 5972: debug : remoteIO:10773 : All done with our call 1 (nil) 0x10c6130 16:47:31.670: 5972: debug : doRemoteOpen:922 : Adding Handler for remote events 16:47:31.670: 5972: debug : virEventPollAddHandle:110 : Add handle fd=7 events=1 cb=0x7ff
The 2s gap is pretty visible, not sure why yet.
Yeah, the gap is obvious. However, this is not telling us why. Can you please provide the daemon logs? As far as I can tell, we see client here, waiting for response for 2 secs. Therefore we need to inspect the daemon for what take it so long.
Yep, here it is: 11:09:25.687: 7898: debug : virEventPollDispatchHandles:466 : i=5 w=6 11:09:25.687: 7898: debug : virEventPollDispatchHandles:466 : i=6 w=7 11:09:25.687: 7898: debug : virEventPollDispatchHandles:466 : i=7 w=8 11:09:25.687: 7898: debug : virEventPollDispatchHandles:479 : Dispatch n=7 f=17 w=8 e=1 0x21e19a0 11:09:25.688: 7903: debug : virEventPollUpdateHandle:143 : Update handle w=7 e=3 11:09:25.688: 7903: debug : virEventPollInterruptLocked:693 : Interrupting 11:09:25.688: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:25.789: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:25.889: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:25.990: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:26.091: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:26.191: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:26.292: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:26.393: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:26.494: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:26.595: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:26.696: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:26.796: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:26.897: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:27.025: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:27.126: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:27.227: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:27.328: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:27.429: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:27.530: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:27.631: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:27.731: 7898: debug : virEventPollDispatchHandles:466 : i=8 w=9 11:09:27.732: 7898: debug : virEventPollCleanupTimeouts:497 : Cleanup 1 11:09:27.732: 7898: debug : virEventPollCleanupHandles:544 : Cleanup 9 11:09:27.732: 7898: debug : virEventRunDefaultImpl:188 : 11:09:27.732: 7898: debug : virEventPollCleanupTimeouts:497 : Cleanup 1 11:09:27.732: 7898: debug : virEventPollCleanupHandles:544 : Cleanup 9 This is on a freshly booted machine with no VM's running on it, so not sure what domains it's expecting to find? - Nathan

On 12/13/2011 11:11 AM, Nathan March wrote:
This is on a freshly booted machine with no VM's running on it, so not sure what domains it's expecting to find?
- Nathan
Little more info, here's an strace of the libvirtd during that time: [pid 7898] 11:21:48.450160 write(8, "11:21:48.449: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again\n", 100) = 100 [pid 7898] 11:21:48.450468 nanosleep({0, 100000000}, NULL) = 0 [pid 7898] 11:21:48.550803 rt_sigaction(SIGPIPE, {SIG_IGN, [], SA_RESTORER, 0x7fc1cfa42400}, {SIG_IGN, [], SA_RESTORER|SA_SIGINFO, 0x7fc1cfa42400}, 8) = 0 [pid 7898] 11:21:48.551095 write(16, "\1\0\0\0\0\0\0\0\0\0\0\0\16\0\0\0", 16) = 16 [pid 7898] 11:21:48.551351 write(16, "/local/domain\0", 14 <unfinished ...> [pid 8577] 11:21:48.551597 <... read resumed> "\1\0\0\0\0\0\0\0\0\0\0\0\2\0\0\0", 16) = 16 [pid 8577] 11:21:48.551729 read(16, "0\0", 2) = 2 [pid 8577] 11:21:48.551937 read(16, <unfinished ...> [pid 7898] 11:21:48.552081 <... write resumed> ) = 14 [pid 7898] 11:21:48.552193 rt_sigaction(SIGPIPE, {SIG_IGN, [], SA_RESTORER|SA_SIGINFO, 0x7fc1cfa42400}, NULL, 8) = 0 [pid 7898] 11:21:48.552435 mlock(0x7fc1cdb697d0, 80) = 0 [pid 7898] 11:21:48.552662 mlock(0x7fc1cdb696e0, 136) = 0 [pid 7898] 11:21:48.552886 ioctl(15, SNDCTL_DSP_RESET, 0x7fc1cdb69560) = 0 [pid 7898] 11:21:48.553158 munlock(0x7fc1cdb696e0, 136) = 0 [pid 7898] 11:21:48.553330 munlock(0x7fc1cdb697d0, 80) = 0 [pid 7898] 11:21:48.553524 rt_sigaction(SIGPIPE, {SIG_IGN, [], SA_RESTORER, 0x7fc1cfa42400}, {SIG_IGN, [], SA_RESTORER|SA_SIGINFO, 0x7fc1cfa42400}, 8) = 0 [pid 7898] 11:21:48.553843 write(16, "\1\0\0\0\0\0\0\0\0\0\0\0\16\0\0\0", 16) = 16 [pid 7898] 11:21:48.554124 write(16, "/local/domain\0", 14 <unfinished ...> [pid 8577] 11:21:48.554411 <... read resumed> "\1\0\0\0\0\0\0\0\0\0\0\0\2\0\0\0", 16) = 16 [pid 8577] 11:21:48.554584 read(16, "0\0", 2) = 2 [pid 8577] 11:21:48.554745 read(16, <unfinished ...> [pid 7898] 11:21:48.554911 <... write resumed> ) = 14 [pid 7898] 11:21:48.555020 rt_sigaction(SIGPIPE, {SIG_IGN, [], SA_RESTORER|SA_SIGINFO, 0x7fc1cfa42400}, NULL, 8) = 0 [pid 7898] 11:21:48.555289 mlock(0x7fc1cdb697c0, 80) = 0 [pid 7898] 11:21:48.555461 mlock(0x7fc1cdb696d0, 136) = 0 [pid 7898] 11:21:48.555684 ioctl(15, SNDCTL_DSP_RESET, 0x7fc1cdb69550) = 0 [pid 7898] 11:21:48.555884 munlock(0x7fc1cdb696d0, 136) = 0 [pid 7898] 11:21:48.556088 munlock(0x7fc1cdb697c0, 80) = 0 [pid 7898] 11:21:48.556293 rt_sigaction(SIGPIPE, {SIG_IGN, [], SA_RESTORER, 0x7fc1cfa42400}, {SIG_IGN, [], SA_RESTORER|SA_SIGINFO, 0x7fc1cfa42400}, 8) = 0 [pid 7898] 11:21:48.556548 write(16, "\2\0\0\0\0\0\0\0\0\0\0\0\25\0\0\0", 16) = 16 [pid 7898] 11:21:48.556825 write(16, "/local/domain/0/name\0", 21 <unfinished ...> [pid 8577] 11:21:48.557075 <... read resumed> "\2\0\0\0\0\0\0\0\0\0\0\0\10\0\0\0", 16) = 16 [pid 8577] 11:21:48.557206 read(16, "Domain-0", 8) = 8 [pid 8577] 11:21:48.557373 read(16, <unfinished ...> [pid 7898] 11:21:48.557538 <... write resumed> ) = 21 [pid 7898] 11:21:48.557602 rt_sigaction(SIGPIPE, {SIG_IGN, [], SA_RESTORER|SA_SIGINFO, 0x7fc1cfa42400}, NULL, 8) = 0 [pid 7898] 11:21:48.557872 rt_sigaction(SIGPIPE, {SIG_IGN, [], SA_RESTORER, 0x7fc1cfa42400}, {SIG_IGN, [], SA_RESTORER|SA_SIGINFO, 0x7fc1cfa42400}, 8) = 0 [pid 7898] 11:21:48.558166 write(16, "\2\0\0\0\0\0\0\0\0\0\0\0\23\0\0\0", 16) = 16 [pid 7898] 11:21:48.558420 write(16, "/local/domain/0/vm\0", 19 <unfinished ...> [pid 8577] 11:21:48.558662 <... read resumed> "\2\0\0\0\0\0\0\0\0\0\0\0*\0\0\0", 16) = 16 [pid 8577] 11:21:48.558887 read(16, "/vm/00000000-0000-0000-0000-000000000000-7", 42) = 42 [pid 8577] 11:21:48.559150 read(16, <unfinished ...> [pid 7898] 11:21:48.559331 <... write resumed> ) = 19 [pid 7898] 11:21:48.559464 rt_sigaction(SIGPIPE, {SIG_IGN, [], SA_RESTORER|SA_SIGINFO, 0x7fc1cfa42400}, NULL, 8) = 0 [pid 7898] 11:21:48.559706 gettimeofday({1323804108, 559781}, NULL) = 0 [pid 7898] 11:21:48.559912 gettid() = 7898 [pid 7898] 11:21:48.560116 write(8, "11:21:48.559: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again\n", 100) = 100 Yet xenstore data for the local dom0 looks fine? xen3 ~ # xenstore-ls /local/domain/0 vm = "/vm/00000000-0000-0000-0000-000000000000-7" device = "" control = "" platform-feature-multiprocessor-suspend = "1" error = "" memory = "" target = "1048576" guest = "" hvmpv = "" data = "" description = "" console = "" limit = "1048576" type = "xenconsoled" domid = "0" cpu = "" 0 = "" availability = "online" name = "Domain-0" xen3 ~ # xenstore-read /local/domain/0/name Domain-0 xen3 ~ # xenstore-read /local/domain/0/vm /vm/00000000-0000-0000-0000-000000000000-7 Bug? - Nathan

On 12/13/2011 11:24 AM, Nathan March wrote:
On 12/13/2011 11:11 AM, Nathan March wrote:
This is on a freshly booted machine with no VM's running on it, so not sure what domains it's expecting to find?
- Nathan
Little more info, here's an strace of the libvirtd during that time:
Anyone have anything they can suggest for this? - Nathan

On 12/13/2011 11:11 AM, Nathan March wrote:
On 12/13/2011 12:51 AM, Michal Privoznik wrote:
On 13.12.2011 01:48, Nathan March wrote:
I've rebuilt it with debugging and found the following:
16:47:29.640: 5972: debug : remoteIOEventLoop:10599 : Giving up the buck 66 0x7ff564a59010 (nil) 16:47:29.640: 5972: debug : remoteIO:10773 : All done with our call 66 (nil) 0x7ff564a59010 16:47:29.641: 5972: debug : remoteIO:10673 : Do proc=1 serial=1 length=48 wait=(nil) 16:47:29.641: 5972: debug : remoteIO:10745 : We have the buck 1 0x10c6130 0x10c6130 16:47:31.670: 5972: debug : remoteIODecodeMessageLength:10066 : Got length, now need 56 total (52 more) 16:47:31.670: 5972: debug : remoteIOEventLoop:10599 : Giving up the buck 1 0x10c6130 (nil) 16:47:31.670: 5972: debug : remoteIO:10773 : All done with our call 1 (nil) 0x10c6130 16:47:31.670: 5972: debug : doRemoteOpen:922 : Adding Handler for remote events 16:47:31.670: 5972: debug : virEventPollAddHandle:110 : Add handle fd=7 events=1 cb=0x7ff
The 2s gap is pretty visible, not sure why yet.
Yeah, the gap is obvious. However, this is not telling us why. Can you please provide the daemon logs? As far as I can tell, we see client here, waiting for response for 2 secs. Therefore we need to inspect the daemon for what take it so long.
Yep, here it is:
11:09:25.687: 7898: debug : virEventPollDispatchHandles:466 : i=5 w=6 11:09:25.687: 7898: debug : virEventPollDispatchHandles:466 : i=6 w=7 11:09:25.687: 7898: debug : virEventPollDispatchHandles:466 : i=7 w=8 11:09:25.687: 7898: debug : virEventPollDispatchHandles:479 : Dispatch n=7 f=17 w=8 e=1 0x21e19a0 11:09:25.688: 7903: debug : virEventPollUpdateHandle:143 : Update handle w=7 e=3 11:09:25.688: 7903: debug : virEventPollInterruptLocked:693 : Interrupting 11:09:25.688: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:25.789: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:25.889: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:25.990: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:26.091: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:26.191: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:26.292: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:26.393: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:26.494: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:26.595: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:26.696: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:26.796: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:26.897: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:27.025: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:27.126: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:27.227: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:27.328: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:27.429: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:27.530: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:27.631: 7898: debug : xenStoreDomainIntroduced:1354 : Some domains were missing, trying again 11:09:27.731: 7898: debug : virEventPollDispatchHandles:466 : i=8 w=9 11:09:27.732: 7898: debug : virEventPollCleanupTimeouts:497 : Cleanup 1 11:09:27.732: 7898: debug : virEventPollCleanupHandles:544 : Cleanup 9 11:09:27.732: 7898: debug : virEventRunDefaultImpl:188 : 11:09:27.732: 7898: debug : virEventPollCleanupTimeouts:497 : Cleanup 1 11:09:27.732: 7898: debug : virEventPollCleanupHandles:544 : Cleanup 9
This is on a freshly booted machine with no VM's running on it, so not sure what domains it's expecting to find?
Anyone able to help out here? - Nathan
participants (2)
-
Michal Privoznik
-
Nathan March