[libvirt] NL_RECV RETURNED WITH ERROR: NO BUFFER SPACE AVAILABLE ERROR FOR IXGBE LOAD

Hi experts,

Release: OL7
Kernel: 3.10.0-121.el7.x86_64

Noticed below error on OL7 server, while loading Intel 10gigabit nic driver module, ixgbe in syslog
--------------------------------------------------------------------
journal: nl_recv returned with error: No buffer space available
-------------------------------------------------------------------

Complete syslog content for ixgbe module load:

Jun 16 20:46:10 ca-ostest432 kernel: ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver - version 3.15.1-k
Jun 16 20:46:10 ca-ostest432 kernel: ixgbe: Copyright (c) 1999-2013 Intel Corporation.
Jun 16 20:46:10 ca-ostest432 kvm: 1 guest now active
Jun 16 20:46:10 ca-ostest432 kvm: 0 guests now active
Jun 16 20:46:10 ca-ostest432 kernel: ixgbe 0000:13:00.0: Multiqueue Enabled: Rx Queue count = 16, Tx Queue count = 16
Jun 16 20:46:10 ca-ostest432 kernel: ixgbe 0000:13:00.0: (PCI Express:5.0GT/s:Width x8) 00:1b:21:c8:24:74
Jun 16 20:46:10 ca-ostest432 kernel: ixgbe 0000:13:00.0: MAC: 2, PHY: 9, SFP+: 3, PBA No: E70856-007
Jun 16 20:46:10 ca-ostest432 kernel: ixgbe 0000:13:00.0: PCI Express bandwidth of 32GT/s available
Jun 16 20:46:10 ca-ostest432 kernel: ixgbe 0000:13:00.0: (Speed:5.0GT/s, Width: x8, Encoding Loss:20%)
Jun 16 20:46:10 ca-ostest432 journal: nl_recv returned with error: No buffer space available
Jun 16 20:46:11 ca-ostest432 journal: nl_recv returned with error: No buffer space available
Jun 16 20:46:11 ca-ostest432 systemd-udevd[4939]: renamed network interface eth0 to ens2f0
Jun 16 20:46:11 ca-ostest432 kernel: ixgbe 0000:13:00.0: Intel(R) 10 Gigabit Network Connection
Jun 16 20:46:11 ca-ostest432 kernel: ixgbe 0000:13:00.1: Multiqueue Enabled: Rx Queue count = 16, Tx Queue count = 16
Jun 16 20:46:11 ca-ostest432 kernel: ixgbe 0000:13:00.1: (PCI Express:5.0GT/s:Width x8) 00:1b:21:c8:24:75
Jun 16 20:46:11 ca-ostest432 kernel: ixgbe 0000:13:00.1: MAC: 2, PHY: 9, SFP+: 4, PBA No: E70856-007
Jun 16 20:46:11 ca-ostest432 kernel: ixgbe 0000:13:00.1: PCI Express bandwidth of 32GT/s available
Jun 16 20:46:11 ca-ostest432 kernel: ixgbe 0000:13:00.1: (Speed:5.0GT/s, Width: x8, Encoding Loss:20%)
Jun 16 20:46:11 ca-ostest432 systemd-udevd[4984]: renamed network interface eth0 to ens2f1
Jun 16 20:46:11 ca-ostest432 kernel: ixgbe 0000:13:00.1: Intel(R) 10 Gigabit Network Connection
Jun 16 20:46:11 ca-ostest432 NetworkManager[1316]: <info> (ens2f0): carrier is OFF (but ignored)
Jun 16 20:46:11 ca-ostest432 NetworkManager[1316]: <info> (ens2f0): new Ethernet device (driver: 'ixgbe' ifindex: 14)
Jun 16 20:46:11 ca-ostest432 NetworkManager[1316]: <info> (ens2f0): exported as /org/freedesktop/NetworkManager/Devices/13
Jun 16 20:46:11 ca-ostest432 NetworkManager[1316]: <info> (ens2f0): device state change: unmanaged -> unavailable (reason 'managed') [10 20 2]
Jun 16 20:46:11 ca-ostest432 journal: nl_recv returned with error: No buffer space available
Jun 16 20:46:11 ca-ostest432 kernel: ixgbe 0000:13:00.0: registered PHC device on ens2f0
Jun 16 20:46:12 ca-ostest432 kernel: IPv6: ADDRCONF(NETDEV_UP): ens2f0: link is not ready
Jun 16 20:46:12 ca-ostest432 NetworkManager[1316]: <info> (ens2f0): preparing device
Jun 16 20:46:12 ca-ostest432 NetworkManager[1316]: <info> (ens2f1): carrier is OFF (but ignored)
Jun 16 20:46:12 ca-ostest432 NetworkManager[1316]: <info> (ens2f1): new Ethernet device (driver: 'ixgbe' ifindex: 15)
Jun 16 20:46:12 ca-ostest432 NetworkManager[1316]: <info> (ens2f1): exported as /org/freedesktop/NetworkManager/Devices/14
Jun 16 20:46:12 ca-ostest432 NetworkManager[1316]: <info> (ens2f1): device state change: unmanaged -> unavailable (reason 'managed') [10 20 2]
Jun 16 20:46:12 ca-ostest432 journal: nl_recv returned with error: No buffer space available
Jun 16 20:46:12 ca-ostest432 NetworkManager[1316]: <info> (ens2f1): preparing device
Jun 16 20:46:12 ca-ostest432 NetworkManager[1316]: <info> (ens2f0): device state change: unavailable -> disconnected (reason 'none') [20 30 0]
Jun 16 20:46:12 ca-ostest432 NetworkManager[1316]: <info> (ens2f1): device state change: unavailable -> disconnected (reason 'none') [20 30 0]
Jun 16 20:46:12 ca-ostest432 kernel: ixgbe 0000:13:00.0 ens2f0: detected SFP+: 3
Jun 16 20:46:12 ca-ostest432 kernel: ixgbe 0000:13:00.1: registered PHC device on ens2f1
Jun 16 20:46:12 ca-ostest432 kernel: IPv6: ADDRCONF(NETDEV_UP): ens2f1: link is not ready
Jun 16 20:46:12 ca-ostest432 kernel: ixgbe 0000:13:00.1 ens2f1: detected SFP+: 4
Jun 16 20:46:12 ca-ostest432 kernel: ixgbe 0000:13:00.1 ens2f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Jun 16 20:46:12 ca-ostest432 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ens2f1: link becomes ready
Jun 16 20:46:12 ca-ostest432 NetworkManager[1316]: <info> (ens2f1): link connected
Jun 16 20:46:12 ca-ostest432 kernel: ixgbe 0000:13:00.0 ens2f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Jun 16 20:46:12 ca-ostest432 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ens2f0: link becomes ready
Jun 16 20:46:13 ca-ostest432 NetworkManager[1316]: <info> (ens2f0): link connected
---------------------------------------------------------------------------------------------

Issue is not seen for other nic drivers when loaded, i.e. verified with be2net, e1000e, igb drivers, so it looks like specific to ixgbe driver module.

This problem can be addressed by increasing the size of the socket receive buffer via nl_socket_set_buffer_size(). However, since reliable transmissions from kernel to user are impossible in any case, I don't know if we need to ensure enough buffer and how much it will be enough. In my test, 32768*4 bytes of receive buffer size is ok. Below is my patch.

From 4505e0001563bcef814dd2c6dc02b2d83e521867 Mon Sep 17 00:00:00 2001
From: rpm-build <rpm-build>
Date: Mon, 16 Jun 2014 20:05:08 +0800
Subject: [PATCH] set nl socket buffer size to 32768*4

Signed-off-by: Vaughan Cao <vaughan.cao@oracle.com>
---
 src/util/virnetlink.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/src/util/virnetlink.c b/src/util/virnetlink.c
index 5849ed7..cf44ad2 100644
--- a/src/util/virnetlink.c
+++ b/src/util/virnetlink.c
@@ -561,6 +561,15 @@ virNetlinkEventServiceStart(unsigned int protocol, unsigned int groups)
         goto error_server;
     }
 
+    if (nl_socket_set_buffer_size(srv->netlinknh, (32768*4), 0)) {
+        virReportSystemError(errno, "%s",
+                _("cannot set netlink socket buffer size to 32768*4"));
+        goto error_server;
+    } else {
+        virReportSystemError(0, "%s",
+                _("set netlink socket buffer size to 32768*4"));
+    }
+
     if ((srv->eventwatch = virEventAddHandle(fd,
                                              VIR_EVENT_HANDLE_READABLE,
                                              virNetlinkEventCallback,
--
1.9.0

--
Regards,
Vaughan
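For readers who want to see the knob the patch turns in isolation: as far as I can tell, nl_socket_set_buffer_size() ends up adjusting the kernel-side socket buffers through setsockopt(). The following is only a minimal sketch of that idea on a plain file descriptor. It is not libvirt or libnl code, and the helper name set_rcvbuf() is hypothetical.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper, not part of libvirt or libnl: ask the kernel for a
 * receive buffer of `bytes` on socket `fd` and return the size the kernel
 * reports afterwards, or -1 on failure (errno is set by the system call). */
static int set_rcvbuf(int fd, int bytes)
{
    int granted = 0;
    socklen_t len = sizeof(granted);

    assert(bytes > 0);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &len) < 0)
        return -1;
    return granted;
}
```

On Linux the value read back usually differs from the request: the kernel doubles it for bookkeeping overhead and caps it at net.core.rmem_max, so a caller asking for 32768*4 bytes should check what it actually got rather than assume.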

On 06/16/2014 09:16 PM, vaughan wrote:
> Hi experts,

[ALL CAPS SUBJECT LINE MAKES IT SOUND LIKE YOU ARE SHOUTING]

> Release: OL7
> Kernel: 3.10.0-121.el7.x86_64
>
> Noticed below error on OL7 server, while loading Intel 10gigabit nic driver module , ixgbe in syslog
> --------------------------------------------------------------------
> journal: nl_recv returned with error: No buffer space available
> -------------------------------------------------------------------
>
> This problem can be addressed by increasing the size of the socket receive buffer via nl_socket_set_buffer_size(). However, since reliable transmissions from kernel to user are impossible in any case, I don't know if we need to ensure enough buffer and how much it will be enough. In my test, 32768*4 bytes of receive buffer size is ok. Below is my patch.

Thanks for the submission. Can you use 'git send-email' next time? That format is easier for maintainers to apply than manually extracting your inlined patch from the body of a larger message.

> From 4505e0001563bcef814dd2c6dc02b2d83e521867 Mon Sep 17 00:00:00 2001
> From: rpm-build <rpm-build>
> Date: Mon, 16 Jun 2014 20:05:08 +0800
> Subject: [PATCH] set nl socket buffer size to 32768*4
>
> Signed-off-by: Vaughan Cao <vaughan.cao@oracle.com>

Your commit message is rather sparse. You should include some of the information from your larger email as part of your commit message (in particular, that you determined that the ixgbe driver module fails with the default netlink buffer size, and that requesting a larger libnl buffer resulted in clearing up the log messages that you were seeing).

> ---
>  src/util/virnetlink.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/src/util/virnetlink.c b/src/util/virnetlink.c
> index 5849ed7..cf44ad2 100644
> --- a/src/util/virnetlink.c
> +++ b/src/util/virnetlink.c
> @@ -561,6 +561,15 @@ virNetlinkEventServiceStart(unsigned int protocol, unsigned int groups)
>          goto error_server;
>      }
>
> +    if (nl_socket_set_buffer_size(srv->netlinknh, (32768*4), 0)) {

Spaces on both sides of binary operators.

> +        virReportSystemError(errno, "%s",
> +                _("cannot set netlink socket buffer size to 32768*4"));

Unusual indentation. Also, are we guaranteed that 'errno' is a sane value at this point? http://libnl2.sourcearchive.com/documentation/2.0-1/group__socket_gaec9b1f3b... merely states that nl_socket_set_buffer_size() returns a negative error code on failure, not that it sets errno.

> +        goto error_server;
> +    } else {
> +        virReportSystemError(0, "%s",
> +                _("set netlink socket buffer size to 32768*4"));

virReportSystemError() is the wrong function to use here; in particular, reporting 0 as the error value will make for a nonsensical message:

set netlink socket buffer size to 32768*4: Success

virReportError() would be sufficient. Also, do you have to repeat the 32768*4 value in your two error messages, or can you just state "cannot resize socket buffer" and be done with it (so that the value itself can be changed without having to retranslate messages down the road)?

> +    }
> +
>      if ((srv->eventwatch = virEventAddHandle(fd,
>                                               VIR_EVENT_HANDLE_READABLE,
>                                               virNetlinkEventCallback,

I'd feel better if Laine actually approves the patch, but I've pointed out enough issues that it is worth you posting a v2.

--
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
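The two reporting issues raised in this review can be reproduced outside libvirt. The sketch below is an illustration under stated assumptions, not libvirt code: format_system_error() is a hypothetical stand-in for a reporter that appends strerror() text, fake_resize() is a hypothetical library call that signals failure only through a negative return value, and try_resize() shows a caller that reports the returned code instead of errno.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a "system error" reporter: it appends the
 * strerror() text for `err` to `msg`. Passing err == 0 therefore tacks a
 * "no error" string onto what was meant as an informational message. */
static int format_system_error(char *buf, size_t buflen,
                               const char *msg, int err)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "%s: %s", msg, strerror(err));
}

/* Hypothetical library call in the style described in the review: failure
 * is signalled by a negative return value and errno is left untouched. */
static int fake_resize(int want_fail)
{
    return want_fail ? -5 : 0;
}

/* Report the code the call returned rather than whatever errno holds.
 * The message does not repeat the requested size, so the size can change
 * later without touching the translated string. */
static int try_resize(char *buf, size_t buflen, int want_fail)
{
    int rc = fake_resize(want_fail);

    assert(buf != NULL && buflen > 0);
    if (rc < 0) {
        snprintf(buf, buflen,
                 "cannot resize socket buffer (library error %d)", -rc);
        return -1;
    }
    buf[0] = '\0';   /* success: nothing to report */
    return 0;
}
```

With glibc, format_system_error(buf, sizeof(buf), "set netlink socket buffer size to 32768*4", 0) should yield the same "...: Success" shape quoted in the review; other C libraries word the zero case differently.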

On 06/17/2014 06:16 AM, vaughan wrote:
> Hi experts,
>
> Release: OL7
> Kernel: 3.10.0-121.el7.x86_64
>
> Noticed below error on OL7 server, while loading Intel 10gigabit nic driver module , ixgbe in syslog
> --------------------------------------------------------------------
> journal: nl_recv returned with error: No buffer space available
> -------------------------------------------------------------------
>
> Complete syslog content for ixgbe module load :
>
> Jun 16 20:46:10 ca-ostest432 kernel: ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver - version 3.15.1-k
> Jun 16 20:46:10 ca-ostest432 kernel: ixgbe: Copyright (c) 1999-2013 Intel Corporation.
> Jun 16 20:46:10 ca-ostest432 kvm: 1 guest now active
> Jun 16 20:46:10 ca-ostest432 kvm: 0 guests now active
> Jun 16 20:46:10 ca-ostest432 kernel: ixgbe 0000:13:00.0: Multiqueue Enabled: Rx Queue count = 16, Tx Queue count = 16
> Jun 16 20:46:10 ca-ostest432 kernel: ixgbe 0000:13:00.0: (PCI Express:5.0GT/s:Width x8) 00:1b:21:c8:24:74
> Jun 16 20:46:10 ca-ostest432 kernel: ixgbe 0000:13:00.0: MAC: 2, PHY: 9, SFP+: 3, PBA No: E70856-007
> Jun 16 20:46:10 ca-ostest432 kernel: ixgbe 0000:13:00.0: PCI Express bandwidth of 32GT/s available
> Jun 16 20:46:10 ca-ostest432 kernel: ixgbe 0000:13:00.0: (Speed:5.0GT/s, Width: x8, Encoding Loss:20%)
> Jun 16 20:46:10 ca-ostest432 journal: nl_recv returned with error: No buffer space available

A very similar problem (probably the same, but you're showing the kernel error message rather than the error logged by libvirt) was reported and addressed in libnl3 quite a while back. libnl3 originally set the default buffer size to 4096, which wasn't enough for SRIOV cards with lots of VFs. So they increased it to 4 * 4096, which should be plenty for anybody. That libnl3 patch is present in RHEL7.0 (currently at 3.2.21.6).

Can you verify the version of libnl3 you are running, and that it contains this code

    if (page_size == 0)
        page_size = getpagesize() * 4;

in the function lib/nl.c:nl_recv() (previously it was just "page_size = getpagesize();"). If you don't have that patch in your libnl3 package, please backport the upstream commit that makes that change. If you do have that patch in your libnl3, perhaps you have gotten a different ixgbe driver from somewhere (we did test against ixgbe with the maximum number of VFs, so there would have to be something different in your driver).

It would be good to figure out the source of the problem before applying any fix anywhere. It is much better to understand the cause, and right now I don't think we do: what is creating the need for such a large buffer in your case, but not for others who use the same driver with the same number of VFs?

Up to now our position has been that this problem should be fixed in libnl, so we have preferred to not patch libvirt for it, but instead get libnl fixed. If we do decide to patch libvirt, I think it would be better to turn on message peeking for nl_recv (nl_socket_enable_msg_peek()), as that would solve the problem totally and permanently (the upstream maintainer of libnl is reluctant to turn that on by default due to potential performance problems in other users of libnl).
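To make the message-peeking suggestion concrete: the idea, as I understand it, is to ask the kernel how large the pending message is before choosing a receive buffer, instead of guessing a fixed size such as four pages. The sketch below shows that technique with plain recv() and MSG_PEEK | MSG_TRUNC on a Linux datagram-style socket. It is not libnl's implementation, and recv_whole_datagram() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: read one pending datagram in full, whatever its size.
 * Returns a malloc()ed buffer (caller frees) and stores the length in *len,
 * or returns NULL on error. Relies on Linux MSG_TRUNC semantics, where
 * recv() reports the real datagram length even if the buffer is smaller. */
static void *recv_whole_datagram(int fd, size_t *len)
{
    char probe;
    ssize_t need, got;
    void *buf;

    assert(len != NULL);

    /* Peek: the datagram stays queued; the return value is its full size. */
    need = recv(fd, &probe, sizeof(probe), MSG_PEEK | MSG_TRUNC);
    if (need < 0)
        return NULL;

    buf = malloc(need > 0 ? (size_t)need : 1);
    if (!buf)
        return NULL;

    /* Real read: the buffer is now known to be large enough. */
    got = recv(fd, buf, (size_t)need, 0);
    if (got < 0) {
        free(buf);
        return NULL;
    }
    *len = (size_t)got;
    return buf;
}
```

The cost is one extra system call per message, which is presumably the kind of overhead behind the performance concern mentioned above. Note that peeking only sizes the user-space buffer for a single message; it does not enlarge the kernel's receive queue for the socket.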

On 2014-06-18 20:24, Laine Stump wrote:
> On 06/17/2014 06:16 AM, vaughan wrote:
>> Hi experts,
>>
>> Release: OL7
>> Kernel: 3.10.0-121.el7.x86_64
>>
>> Noticed below error on OL7 server, while loading Intel 10gigabit nic
>> driver module , ixgbe in syslog
>> --------------------------------------------------------------------
>> journal: nl_recv returned with error: No buffer space available
>> -------------------------------------------------------------------
>>
>> Complete syslog content for ixgbe module load :
>>
>> Jun 16 20:46:10 ca-ostest432 kernel: ixgbe: Intel(R) 10 Gigabit PCI
>> Express Network Driver - version 3.15.1-k
>> Jun 16 20:46:10 ca-ostest432 kernel: ixgbe: Copyright (c) 1999-2013
>> Intel Corporation.
>> Jun 16 20:46:10 ca-ostest432 kvm: 1 guest now active
>> Jun 16 20:46:10 ca-ostest432 kvm: 0 guests now active
>> Jun 16 20:46:10 ca-ostest432 kernel: ixgbe 0000:13:00.0: Multiqueue
>> Enabled: Rx Queue count = 16, Tx Queue count = 16
>> Jun 16 20:46:10 ca-ostest432 kernel: ixgbe 0000:13:00.0: (PCI
>> Express:5.0GT/s:Width x8) 00:1b:21:c8:24:74
>> Jun 16 20:46:10 ca-ostest432 kernel: ixgbe 0000:13:00.0: MAC: 2, PHY: 9,
>> SFP+: 3, PBA No: E70856-007
>> Jun 16 20:46:10 ca-ostest432 kernel: ixgbe 0000:13:00.0: PCI Express
>> bandwidth of 32GT/s available
>> Jun 16 20:46:10 ca-ostest432 kernel: ixgbe 0000:13:00.0: (Speed:5.0GT/s,
>> Width: x8, Encoding Loss:20%)
>> Jun 16 20:46:10 ca-ostest432 journal: nl_recv returned with error: No
>> buffer space available
> A very similar problem (probably the same, but you're showing the kernel
> error message rather than the error logged by libvirt) was reported and

Can we distinguish the two problems just from libvirtd.log? I turned it on, and it just output the following messages, like common errors. If you would like to see the whole log, I can attach it.

2014-06-18 14:16:40.901+0000: 31891: info : virNetlinkEventServiceStart:520 : starting netlink event service with protocol 15
2014-06-18 14:16:40.901+0000: 31891: debug : virEventPollInterruptLocked:710 : Skip interrupt, 0 0
2014-06-18 14:16:40.901+0000: 31891: debug : virEventPollAddHandle:136 : EVENT_POLL_ADD_HANDLE: watch=6 fd=14 events=1 cb=0x7fdff9397680 opaque=0x7fdffb8ff960 ff=(nil)
2014-06-18 14:16:40.901+0000: 31891: debug : virNetlinkEventServiceStart:576 : netlink event listener on fd: 14 running
2014-06-18 14:16:40.901+0000: 31891: debug : virNetServerRun:1093 : srv=0x7fdffb8f0d70 quit=0
.........
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 : Prepare n=0 w=1, f=6 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 : Prepare n=1 w=2, f=8 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 : Prepare n=2 w=3, f=11 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 : Prepare n=3 w=4, f=12 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 : Prepare n=4 w=5, f=13 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 : Prepare n=5 w=6, f=14 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 : Prepare n=6 w=7, f=15 e=0 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 : Prepare n=7 w=8, f=15 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 : Prepare n=8 w=9, f=17 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollCalculateTimeout:332 : Calculate expiry of 0 timers
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollCalculateTimeout:361 : Timeout at 0 due in -1 ms
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollRunOnce:627 : EVENT_POLL_RUN: nhandles=8 timeout=-1
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollRunOnce:638 : Poll got 1 event(s)
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollDispatchTimeouts:423 : Dispatch 0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollDispatchHandles:468 : Dispatch 8
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollDispatchHandles:482 : i=0 w=1
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollDispatchHandles:482 : i=1 w=2
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollDispatchHandles:482 : i=2 w=3
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollDispatchHandles:482 : i=3 w=4
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollDispatchHandles:482 : i=4 w=5
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollDispatchHandles:482 : i=5 w=6
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollDispatchHandles:496 : EVENT_POLL_DISPATCH_HANDLE: watch=6 events=1
2014-06-18 14:16:48.993+0000: 31891: error : virNetlinkEventCallback:344 : nl_recv return ok length: 186: Success
2014-06-18 14:16:48.993+0000: 31891: debug : virNetlinkEventCallback:349 : dispatching to max 0 clients, called from event watch 6
2014-06-18 14:16:48.993+0000: 31891: debug : virNetlinkEventCallback:362 : event not handled.
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollDispatchHandles:482 : i=7 w=8
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollDispatchHandles:482 : i=8 w=9
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollCleanupTimeouts:514 : Cleanup 0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollCleanupTimeouts:550 : Found 0 out of 0 timeout slots used, releasing 0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollCleanupHandles:562 : Cleanup 9
2014-06-18 14:16:48.993+0000: 31891: debug : virEventRunDefaultImpl:270 : running default event implementation
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollCleanupTimeouts:514 : Cleanup 0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollCleanupTimeouts:550 : Found 0 out of 0 timeout slots used, releasing 0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollCleanupHandles:562 : Cleanup 9
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 : Prepare n=0 w=1, f=6 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 : Prepare n=1 w=2, f=8 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 : Prepare n=2 w=3, f=11 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 : Prepare n=3 w=4, f=12 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 : Prepare n=4 w=5, f=13 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 : Prepare n=5 w=6, f=14 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 : Prepare n=6 w=7, f=15 e=0 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 : Prepare n=7 w=8, f=15 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollMakePollFDs:391 : Prepare n=8 w=9, f=17 e=1 d=0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollCalculateTimeout:332 : Calculate expiry of 0 timers
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollCalculateTimeout:361 : Timeout at 0 due in -1 ms
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollRunOnce:627 : EVENT_POLL_RUN: nhandles=8 timeout=-1
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollRunOnce:638 : Poll got 1 event(s)
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollDispatchTimeouts:423 : Dispatch 0
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollDispatchHandles:468 : Dispatch 8
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollDispatchHandles:482 : i=0 w=1
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollDispatchHandles:482 : i=1 w=2
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollDispatchHandles:482 : i=2 w=3
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollDispatchHandles:482 : i=3 w=4
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollDispatchHandles:482 : i=4 w=5
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollDispatchHandles:482 : i=5 w=6
2014-06-18 14:16:48.993+0000: 31891: debug : virEventPollDispatchHandles:496 : EVENT_POLL_DISPATCH_HANDLE: watch=6 events=5
2014-06-18 14:16:48.993+0000: 31891: error : virNetlinkEventCallback:340 : nl_recv returned with error, -5: No buffer space available

> addressed in libnl3 quite awhile back. libnl3 originally set the default
> buffer size to 4096, which wasn't enough for SRIOV cards with lots of
> VFs. So they increased it to 4 * 4096, which should be plenty for
> anybody. That libnl3 patch is present in RHEL7.0 (currently at 3.2.21.6).
>
> Can you verify the version of libnl3 you are running, and that it
> contains this code
>
>     if (page_size == 0)
>         page_size = getpagesize() * 4;

I'm using libnl3-3.2.21-6.0.1.el7.x86_64 which has the patch you mentioned,

* Wed Feb 26 2014 Thomas Graf <tgraf@redhat.com> - 3.2.21-5
- nl-Increase-receive-buffer-size-to-4-pages.patch (rh #1040626)

> in the function lib/nl.c:nl_recv() (previously it was just "page_size =
> getpagesize();"). If you don't have that patch in your libnl3 package,
> please backport the upstream commit that makes that change. If you do
> have that patch in your libnl3, perhaps you have gotten a different
> ixgbe driver from somewhere (we did test against ixgbe with the maximum
> number of VFs, so there would have to be something different in your
> driver). It would be good to figure out the source of the problem before
> applying any fix anywhere - much better to understand the cause, and
> right now I don't think we do; what is creating the need for such a
> large buffer in your case, but not for others who use the same driver
> with the same number of VFs?).

Thank you for pointing this out. I admit that my patch is derived from the symptom, not the cause. I'll dig in this direction.

filename: /lib/modules/3.10.0-121.el7.x86_64/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko
version: 3.15.1-k
license: GPL
description: Intel(R) 10 Gigabit PCI Express Network Driver
author: Intel Corporation, <linux.nics@intel.com>
srcversion: FC629607FA8BAB4CDE3BCF0
.....
depends: mdio,ptp,dca
intree: Y
vermagic: 3.10.0-121.el7.x86_64 SMP mod_unload modversions
signer: Red Hat Enterprise Linux kernel signing key
sig_key: E3:7B:3F:4E:75:78:02:B9:12:00:CF:F3:C7:81:1A:51:2E:94:83:EF
sig_hashalgo: sha256
parm: max_vfs:Maximum number of virtual functions to allocate per physical function - default is zero and maximum value is 63 (uint)
parm: allow_unsupported_sfp:Allow unsupported and untested SFP+ modules on 82599-based adapters (uint)
parm: debug:Debug level (0=none,...,16=all) (int)

>
>
> Up to now our position has been that this problem should be fixed in
> libnl, so we have preferred to not patch libvirt for it, but instead get
> libnl fixed. If we do decide to patch libvirt, I think it would be
> better to turn on message peeking for nl_recv
> (nl_socket_enable_msg_peek()), as that would solve the problem totally
> and permanently (the upstream maintainer of libnl is reluctant to turn
> that on by default due to potential performance problems in other users
> of libnl)
>

Affirmative.
participants (4)
- Eric Blake
- Laine Stump
- vaughan
- Vaughan Cao