Re: [libvirt] [PATCH - 2 alternatives] util: fix libvirtd startup failure due to netlink error

2 May 2012


On 05/01/2012 03:16 PM, Eric Blake wrote:
...
On 05/01/2012 01:05 PM, Laine Stump wrote:
...
The two following patches fix the same problem (described in
https://bugzilla.redhat.com/show_bug.cgi?id=816465) in two alternate
ways - one by retrying the failing operation after a delay, the other by
using knowledge of how libnl works internally to artificially "reserve"
a particular address so libnl doesn't attempt to bind to it.
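For illustration, the first alternative (retry after a delay) could be sketched roughly as below. This is not the code from either patch: retry_on_addrinuse(), op_fn and fake_bind() are hypothetical names invented for this example, and fake_bind() merely stands in for a bind that fails with EADDRINUSE until the conflicting holder goes away.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical operation type: returns 0 on success, -1 with errno set. */
typedef int (*op_fn)(void *opaque);

/* Call op up to max_tries times, sleeping delay_ms between attempts.
 * Only EADDRINUSE is treated as retryable; any other error is
 * returned to the caller immediately. */
static int retry_on_addrinuse(op_fn op, void *opaque,
                              int max_tries, long delay_ms)
{
    struct timespec ts;
    int i;

    assert(max_tries > 0 && delay_ms >= 0);
    ts.tv_sec = delay_ms / 1000;
    ts.tv_nsec = (delay_ms % 1000) * 1000000L;

    for (i = 0; i < max_tries; i++) {
        if (op(opaque) == 0)
            return 0;
        if (errno != EADDRINUSE)
            return -1;
        if (i + 1 < max_tries)
            nanosleep(&ts, NULL);   /* give the other holder time to go away */
    }
    errno = EADDRINUSE;             /* every attempt hit the same conflict */
    return -1;
}

/* Stand-in for the failing operation: reports EADDRINUSE for the
 * first *busy_for calls, then succeeds. */
static int fake_bind(void *opaque)
{
    int *busy_for = opaque;

    if (*busy_for > 0) {
        (*busy_for)--;
        errno = EADDRINUSE;
        return -1;
    }
    return 0;
}
```

A caller would wrap the real connect step in an op_fn and pass it to retry_on_addrinuse(). The obvious weakness is that this only helps if the conflicting address is released within the retry window.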
You might think that libnl is the right place to fix this bug.
Unfortunately, that isn't possible, because it would involve changing
libnl's API - currently libnl decides what address to use for future
binds of a netlink socket at the time nl_handle_alloc() is called, but
doesn't actually attempt to bind it. Later, during nl_connect(), it
calls bind() with the address it chose earlier. It would be nice
if we could just retry the bind with a different address when the first
attempt failed, but libnl allows client applications to retrieve the
bind address *before nl_connect() is called*, so an application may have
already gotten the address prior to calling nl_connect(), and changing
it would render the application's information incorrect.
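To make the alloc/connect split concrete, here is a small self-contained model of it. It does not call libnl at all: every model_* name is hypothetical, the "kernel" is just an array, and the numbering scheme (the first handle gets the pid, later handles get the pid plus a multiple of 2^22) is an assumption made for the example, not a statement about libnl's exact algorithm.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_MAX_BOUND 8

/* Addresses currently bound in the simulated "kernel". */
static uint32_t model_bound[MODEL_MAX_BOUND];
static int model_nbound;

/* How many handles the simulated library has allocated so far. */
static uint32_t model_next_index;

struct model_handle {
    uint32_t local_port;   /* decided at alloc time, bound at connect time */
    int connected;
};

/* Simulated bind(): fails with EADDRINUSE if the address is taken. */
static int model_kernel_bind(uint32_t port)
{
    int i;

    for (i = 0; i < model_nbound; i++) {
        if (model_bound[i] == port) {
            errno = EADDRINUSE;
            return -1;
        }
    }
    if (model_nbound == MODEL_MAX_BOUND) {
        errno = ENOBUFS;
        return -1;
    }
    model_bound[model_nbound++] = port;
    return 0;
}

/* Alloc step: the address is chosen here, but nothing is bound yet. */
static void model_handle_alloc(struct model_handle *h, uint32_t pid)
{
    h->local_port = pid + (model_next_index++ << 22);
    h->connected = 0;
}

/* The caller may read the address before connecting, which is why
 * the library cannot silently pick a different one later. */
static uint32_t model_get_local_port(const struct model_handle *h)
{
    return h->local_port;
}

/* Connect step: bind to the address chosen at alloc time. */
static int model_connect(struct model_handle *h)
{
    if (model_kernel_bind(h->local_port) < 0)
        return -1;
    h->connected = 1;
    return 0;
}
```

In this model, if some other socket already holds the pid-valued address, the first handle allocated will always fail in model_connect(). Allocating a placeholder handle first and never connecting it uses up that address, so the next handle gets a different one. That is the idea behind the "reserve" alternative.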
Does libnl-3 have the same issue?  It would be interesting if Serge
Hallyn's patches to support libnl-3 ended up allowing us to move to a
version of the library that doesn't have the same fundamental flaw.
Have we complained about this flaw to the libnl upstream folks?
Yes, libnl-3 has the same flaw in the API/code, but no, I haven't
contacted them about it yet - I've been too busy gathering information.
That's on my list of things to do, though (actually I'm thinking it
would be good to have the libnl maintainer take a look at Serge's
patches - he had earlier agreed to help out with making libvirt
libnl-3-compliant when we got around to it).