On 2014年06月18日 20:24, Laine Stump wrote:
On 06/17/2014 06:16 AM, vaughan wrote:
> Hi experts,
>
> Release: OL7
> Kernel: 3.10.0-121.el7.x86_64
>
> Noticed below error on OL7 server, while loading Intel 10gigabit nic
> driver module , ixgbe in syslog
> --------------------------------------------------------------------
> journal: nl_recv returned with error: No buffer space available
> -------------------------------------------------------------------
>
> Complete syslog content for ixgbe module load :
>
> un 16 20:46:10 ca-ostest432 kernel: ixgbe: Intel(R) 10 Gigabit PCI
> Express Network Driver - version 3.15.1-k
> Jun 16 20:46:10 ca-ostest432 kernel: ixgbe: Copyright (c) 1999-2013
> Intel Corporation.
> Jun 16 20:46:10 ca-ostest432 kvm: 1 guest now active
> Jun 16 20:46:10 ca-ostest432 kvm: 0 guests now active
> Jun 16 20:46:10 ca-ostest432 kernel: ixgbe 0000:13:00.0: Multiqueue
> Enabled: Rx Queue count = 16, Tx Queue count = 16
> Jun 16 20:46:10 ca-ostest432 kernel: ixgbe 0000:13:00.0: (PCI
> Express:5.0GT/s:Width x8) 00:1b:21:c8:24:74
> Jun 16 20:46:10 ca-ostest432 kernel: ixgbe 0000:13:00.0: MAC: 2, PHY: 9,
> SFP+: 3, PBA No: E70856-007
> Jun 16 20:46:10 ca-ostest432 kernel: ixgbe 0000:13:00.0: PCI Express
> bandwidth of 32GT/s available
> Jun 16 20:46:10 ca-ostest432 kernel: ixgbe 0000:13:00.0: (Speed:5.0GT/s,
> Width: x8, Encoding Loss:20%)
> Jun 16 20:46:10 ca-ostest432 journal: nl_recv returned with error: No
> buffer space available
A very similar problem (probably the same, but you're showing the kernel
error message rather than the error logged by libvirt) was reported and
Can we
distinguish the two problem just from libvirtd.log? I turned it
on, and it just output the following messages, like common errors. If
you would like to see the whole log, I can attach it.
2014-06-18 14:16:40.901+0000: 31891: info :
virNetlinkEventServiceStart:520 : starting netlink event service with
protocol 15
2014-06-18 14:16:40.901+0000: 31891: debug :
virEventPollInterruptLocked:710 : Skip interrupt, 0 0
2014-06-18 14:16:40.901+0000: 31891: debug : virEventPollAddHandle:136 :
EVENT_POLL_ADD_HANDLE: watch=6 fd=14 events=1 cb=0x7fdff9397680
opaque=0x7fdffb8ff960 ff=(nil)
2014-06-18 14:16:40.901+0000: 31891: debug :
virNetlinkEventServiceStart:576 : netlink event listener on fd: 14 running
2014-06-18 14:16:40.901+0000: 31891: debug : virNetServerRun:1093 :
srv=0x7fdffb8f0d70 quit=0
.........
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391
: Prepare n=0 w=1, f=6 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391
: Prepare n=1 w=2, f=8 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391
: Prepare n=2 w=3, f=11 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391
: Prepare n=3 w=4, f=12 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391
: Prepare n=4 w=5, f=13 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391
: Prepare n=5 w=6, f=14 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391
: Prepare n=6 w=7, f=15 e=0 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391
: Prepare n=7 w=8, f=15 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391
: Prepare n=8 w=9, f=17 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug :
virEventPollCalculateTimeout:332 : Calculate expiry of 0 timers
2014-06-18 14:16:48.993+0000: 31891: debug :
virEventPollCalculateTimeout:361 : Timeout at 0 due in -1 ms
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollRunOnce:627 :
EVENT_POLL_RUN: nhandles=8 timeout=-1
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollRunOnce:638 :
Poll got 1 event(s)
2014-06-18 14:16:48.993+0000: 31891: debug :
virEventPollDispatchTimeouts:423 : Dispatch 0
2014-06-18 14:16:48.993+0000: 31891: debug :
virEventPollDispatchHandles:468 : Dispatch 8
2014-06-18 14:16:48.993+0000: 31891: debug :
virEventPollDispatchHandles:482 : i=0 w=1
2014-06-18 14:16:48.993+0000: 31891: debug :
virEventPollDispatchHandles:482 : i=1 w=2
2014-06-18 14:16:48.993+0000: 31891: debug :
virEventPollDispatchHandles:482 : i=2 w=3
2014-06-18 14:16:48.993+0000: 31891: debug :
virEventPollDispatchHandles:482 : i=3 w=4
2014-06-18 14:16:48.993+0000: 31891: debug :
virEventPollDispatchHandles:482 : i=4 w=5
2014-06-18 14:16:48.993+0000: 31891: debug :
virEventPollDispatchHandles:482 : i=5 w=6
2014-06-18 14:16:48.993+0000: 31891: debug :
virEventPollDispatchHandles:496 : EVENT_POLL_DISPATCH_HANDLE: watch=6
events=1
2014-06-18 14:16:48.993+0000: 31891: error : virNetlinkEventCallback:344
: nl_recv return ok length: 186: Success
2014-06-18 14:16:48.993+0000: 31891: debug : virNetlinkEventCallback:349
: dispatching to max 0 clients, called from event watch 6
2014-06-18 14:16:48.993+0000: 31891: debug : virNetlinkEventCallback:362
: event not handled.
2014-06-18 14:16:48.993+0000: 31891: debug :
virEventPollDispatchHandles:482 : i=7 w=8
2014-06-18 14:16:48.993+0000: 31891: debug :
virEventPollDispatchHandles:482 : i=8 w=9
2014-06-18 14:16:48.993+0000: 31891: debug :
virEventPollCleanupTimeouts:514 : Cleanup 0
2014-06-18 14:16:48.993+0000: 31891: debug :
virEventPollCleanupTimeouts:550 : Found 0 out of 0 timeout slots used,
releasing 0
2014-06-18 14:16:48.993+0000: 31891: debug :
virEventPollCleanupHandles:562 : Cleanup 9
2014-06-18 14:16:48.993+0000: 31891: debug : virEventRunDefaultImpl:270
: running default event implementation
2014-06-18 14:16:48.993+0000: 31891: debug :
virEventPollCleanupTimeouts:514 : Cleanup 0
2014-06-18 14:16:48.993+0000: 31891: debug :
virEventPollCleanupTimeouts:550 : Found 0 out of 0 timeout slots used,
releasing 0
2014-06-18 14:16:48.993+0000: 31891: debug :
virEventPollCleanupHandles:562 : Cleanup 9
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391
: Prepare n=0 w=1, f=6 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391
: Prepare n=1 w=2, f=8 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391
: Prepare n=2 w=3, f=11 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391
: Prepare n=3 w=4, f=12 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391
: Prepare n=4 w=5, f=13 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391
: Prepare n=5 w=6, f=14 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391
: Prepare n=6 w=7, f=15 e=0 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391
: Prepare n=7 w=8, f=15 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391
: Prepare n=8 w=9, f=17 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug :
virEventPollCalculateTimeout:332 : Calculate expiry of 0 timers
2014-06-18 14:16:48.993+0000: 31891: debug :
virEventPollCalculateTimeout:361 : Timeout at 0 due in -1 ms
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollRunOnce:627 :
EVENT_POLL_RUN: nhandles=8 timeout=-1
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollRunOnce:638 :
Poll got 1 event(s)
2014-06-18 14:16:48.993+0000: 31891: debug :
virEventPollDispatchTimeouts:423 : Dispatch 0
2014-06-18 14:16:48.993+0000: 31891: debug :
virEventPollDispatchHandles:468 : Dispatch 8
2014-06-18 14:16:48.993+0000: 31891: debug :
virEventPollDispatchHandles:482 : i=0 w=1
2014-06-18 14:16:48.993+0000: 31891: debug :
virEventPollDispatchHandles:482 : i=1 w=2
2014-06-18 14:16:48.993+0000: 31891: debug :
virEventPollDispatchHandles:482 : i=2 w=3
2014-06-18 14:16:48.993+0000: 31891: debug :
virEventPollDispatchHandles:482 : i=3 w=4
2014-06-18 14:16:48.993+0000: 31891: debug :
virEventPollDispatchHandles:482 : i=4 w=5
2014-06-18 14:16:48.993+0000: 31891: debug :
virEventPollDispatchHandles:482 : i=5 w=6
2014-06-18 14:16:48.993+0000: 31891: debug :
virEventPollDispatchHandles:496 : EVENT_POLL_DISPATCH_HANDLE: watch=6
events=5
2014-06-18 14:16:48.993+0000: 31891: error : virNetlinkEventCallback:340
: nl_recv returned with error, -5: No buffer space available
addressed in libnl3 quite awhile back. libnl3 originally set the
default
buffer size to 4096, which wasn't enough for SRIOV cards with lots of
VFs. So they increased it to 4 * 4096, which should be plenty for
anybody. That libnl3 patch is present in RHEL7.0 (currently at 3.2.21.6).
Can you verify the version of libnl3 you are running, and that it
contains this code
if (page_size == 0)
page_size = getpagesize() * 4;
I'm using
libnl3-3.2.21-6.0.1.el7.x86_64 which has the patch you mentioned,
* <E4><B8><89> 2<E6><9C><88> 26 2014 Thomas Graf
<tgraf(a)redhat.com> -
3.2.21-5
- nl-Increase-receive-buffer-size-to-4-pages.patch (rh #1040626)
in the function lib/nl.c:nl_recv() (previously it was just
"page_size =
getpagesize();"). If you don't have that patch in your libnl3 package,
please backport the upstream commit that makes that change. If you do
have that patch in your libnl3, perhaps you have gotten a different
ixgbe driver from somewhere (we did test against ixgbe with the maximum
number of VFs, so there would have to be something different in your
driver). It would be good to figure out the source of the problem before
applying any fix anywhere - much better to understand the cause, and
right now I don't think we do; what is creating the need for such a
large buffer in your case, but not for others who use the same driver
with the same number of VFs?).
Thank you to point this out. I admit that my patch
is derived from the
symptom, not the cause.
I'll dig in this direction.
filename:
/lib/modules/3.10.0-121.el7.x86_64/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko
version: 3.15.1-k
license: GPL
description: Intel(R) 10 Gigabit PCI Express Network Driver
author: Intel Corporation, <linux.nics(a)intel.com>
srcversion: FC629607FA8BAB4CDE3BCF0
.....
depends: mdio,ptp,dca
intree: Y
vermagic: 3.10.0-121.el7.x86_64 SMP mod_unload modversions
signer: Red Hat Enterprise Linux kernel signing key
sig_key: E3:7B:3F:4E:75:78:02:B9:12:00:CF:F3:C7:81:1A:51:2E:94:83:EF
sig_hashalgo: sha256
parm: max_vfs:Maximum number of virtual functions to allocate per
physical function - default is zero and maximum value is 63 (uint)
parm: allow_unsupported_sfp:Allow unsupported and untested SFP+ modules
on 82599-based adapters (uint)
parm: debug:Debug level (0=none,...,16=all) (int)
Up to now our position has been that this problem should be fixed in
libnl, so we have preferred to not patch libvirt for it, but instead get
libnl fixed. If we do decide to patch libvirt, I think it would be
better to turn on message peeking for nl_recv
(nl_socket_enable_msg_peek()), as that would solve the problem totally
and permanently (the upstream maintainer of libnl is reluctant to turn
that on by default due to potential performance problems in other users
of libnl)
Affirmative.