On 12/08/2015 10:58 AM, Jan Gutter wrote:
Hi,
I've run into a rather interesting problem recently where a weird
interaction between libnl3 and libvirt caused some difficult-to-debug
issues. From libvirt's side, the issue was that a netlink response was
much larger than the pagesize and truncated by libnl3. When
virNetDevLinkDump() calls virNetlinkCommand(), nl_recv() is supposed
to return a rather large structure with information for all the
Virtual Functions. When called on a system where the number of PCIe
Virtual Functions is more than 30 for a given Physical Function, the
netlink response is larger than 4k, so the message is
truncated. Unfortunately, libnl3 truncates this silently, meaning that
a cryptic error pops up much later in virNetDevParseVfConfig(),
"missing IFLA_VF_INFO in netlink response".
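To illustrate the silent truncation described above, here is a minimal sketch that does not use libnl3 at all. It assumes Linux and substitutes an AF_UNIX datagram socketpair for the netlink socket; recv_checked() and demo_truncation() are hypothetical names made up for this example. It shows that the kernel does report the truncation through MSG_TRUNC in msg_flags, so a receiver can notice it if it looks.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Receive one datagram into buf and report whether the kernel had to
 * cut it short. Returns the number of bytes stored, or -1 on error. */
ssize_t recv_checked(int fd, void *buf, size_t len, int *truncated)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    ssize_t n = recvmsg(fd, &msg, 0);
    if (n < 0)
        return -1;

    /* MSG_TRUNC is set when the datagram did not fit into buf. */
    *truncated = (msg.msg_flags & MSG_TRUNC) != 0;
    return n;
}

/* Hypothetical demo helper: send payload_len bytes over a socketpair
 * (a stand-in for the netlink socket) and receive them into a buffer of
 * buf_len bytes. Returns 1 if truncated, 0 if not, -1 on error. */
int demo_truncation(size_t payload_len, size_t buf_len)
{
    int sv[2];
    int truncated = 0;
    int ret = -1;
    char *payload = calloc(1, payload_len);
    char *buf = malloc(buf_len);

    if (!payload || !buf)
        goto out;
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        goto out;

    if (send(sv[0], payload, payload_len, 0) == (ssize_t)payload_len &&
        recv_checked(sv[1], buf, buf_len, &truncated) >= 0)
        ret = truncated;

    close(sv[0]);
    close(sv[1]);
out:
    free(payload);
    free(buf);
    return ret;
}
```

Calling demo_truncation(8192, 4096) mimics an 8k response arriving in a page-sized buffer; the excess bytes are discarded by the kernel and only the flag records that anything was lost.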
Aside from the error propagation (which might be fixable in libnl3),
there still remains the need to enable libvirt to function in cases
like this. This can be done in two ways, both in virNetlinkCommand().
What version of libvirt and libnl are you using? I thought that we had
solved this problem, either in libvirt or in libnl, at least a couple of
years ago. Did something not get pushed somewhere? (Now I need to go
spelunking in bugzilla again :-/)
1. Message peeking can be enabled. In theory this slows down every
netlink receive by doing a two-stage query: query the message size,
then allocate the receive buffer and receive the message. This is a
reliability/performance tradeoff, I guess.
This is as simple as adding:
    nl_socket_enable_msg_peek(nlhandle);
2. The receive buffer size can also be made larger:
    nl_socket_set_msg_buf_size(nlhandle, ARBITRARY_BUFFER_SIZE);
This does not incur a performance penalty, but until libnl3 can
propagate the truncation error, this merely postpones the error for
future generations...
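The two-stage query from option 1 can be sketched roughly as follows. This is an illustration of the general peek-then-receive technique on Linux, not libnl3's actual code: it again assumes an AF_UNIX datagram socketpair in place of the netlink socket, and peek_len(), recv_exact() and demo_peek() are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Stage 1: ask for the size of the next datagram without consuming it.
 * On Linux, MSG_TRUNC makes recv() return the real datagram length even
 * when the buffer is smaller, and MSG_PEEK leaves the datagram queued. */
ssize_t peek_len(int fd)
{
    char probe;
    return recv(fd, &probe, sizeof(probe), MSG_PEEK | MSG_TRUNC);
}

/* Stage 2: allocate a buffer of that size and receive the datagram.
 * Returns a malloc'd buffer (caller frees) or NULL on error. */
void *recv_exact(int fd, size_t *out_len)
{
    ssize_t need = peek_len(fd);
    if (need < 0)
        return NULL;

    void *buf = malloc(need > 0 ? (size_t)need : 1);
    if (!buf)
        return NULL;

    ssize_t got = recv(fd, buf, (size_t)need, 0);
    if (got != need) {
        free(buf);
        return NULL;
    }
    *out_len = (size_t)got;
    return buf;
}

/* Hypothetical demo helper: send payload_len bytes and read them back
 * with the two-stage receive. Returns the number of bytes received
 * intact, or -1 on error. */
long demo_peek(size_t payload_len)
{
    int sv[2];
    long ret = -1;
    size_t got = 0;
    char *payload = malloc(payload_len);

    if (!payload)
        return -1;
    memset(payload, 0x5a, payload_len);

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0) {
        free(payload);
        return -1;
    }

    if (send(sv[0], payload, payload_len, 0) == (ssize_t)payload_len) {
        char *msg = recv_exact(sv[1], &got);
        if (msg && got == payload_len && memcmp(msg, payload, got) == 0)
            ret = (long)got;
        free(msg);
    }

    close(sv[0]);
    close(sv[1]);
    free(payload);
    return ret;
}
```

The cost is one extra system call per message, which is the performance side of the tradeoff in option 1. In exchange the receive buffer always fits the message, so no arbitrary size has to be chosen as in option 2.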
Jan
--
libvir-list mailing list
libvir-list(a)redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list