On 05/01/2012 03:16 PM, Eric Blake wrote:
> On 05/01/2012 01:05 PM, Laine Stump wrote:
>> The following two patches fix the same problem (described in
>> https://bugzilla.redhat.com/show_bug.cgi?id=816465) in two alternate
>> ways - one by retrying the failing operation after a delay, the other by
>> using knowledge of how libnl works internally to artificially "reserve"
>> a particular address so libnl doesn't attempt to bind to it.
>>
>> You might think that libnl is the right place to fix this bug.
>> Unfortunately, that isn't possible, because it would involve changing
>> libnl's API - currently libnl decides what address to use for future
>> binds of a netlink socket at the time nl_handle_alloc() is called, but
>> doesn't actually attempt to bind it. Later, during nl_connect(), it
>> calls bind() with the address it had decided on earlier. It would be nice
>> if we could just retry the bind with a different address when the first
>> attempt failed, but libnl allows client applications to retrieve the
>> bind address *before nl_connect() is called*, so an application may have
>> already gotten the address prior to calling nl_connect(), and changing
>> it would render the application's information incorrect.
> Does libnl-3 have the same issue? It would be interesting if Serge
> Hallyn's patches to support libnl-3 ended up allowing us to move to a
> version of the library that doesn't have the same fundamental flaw.
> Have we complained about this flaw to the libnl upstream folks?
Yes, libnl-3 has the same flaw in the API/code, but no, I haven't
contacted them about it yet - I've been too busy gathering information.
That's on my list of things to do, though (actually, I'm thinking it
would be good to have the libnl maintainer take a look at Serge's
patches - he had earlier agreed to help out with making libvirt
libnl-3-compliant when we got around to it).
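To make the API constraint described above concrete, here is a toy model in Python. It is a sketch under stated assumptions, not libnl's code: `ToyKernel`, `ToyHandle`, `connect_with_new_addr`, the example address 1234, and the assumption that the conflicting binding is gone by the time of a retry are all hypothetical. It shows why re-picking the address inside connect would silently invalidate a value the caller already read, while retrying with the same address keeps that value valid.

```python
import errno


class ToyKernel:
    """Hypothetical stand-in for the kernel's table of bound netlink addresses."""

    def __init__(self):
        self.bound = set()

    def bind(self, addr):
        # Like bind() on a netlink socket: fail if the address is already taken.
        if addr in self.bound:
            return -errno.EADDRINUSE
        self.bound.add(addr)
        return 0

    def unbind(self, addr):
        self.bound.discard(addr)


class ToyHandle:
    """Models the alloc/connect split: the address is chosen at allocation
    time but only bound when connect() is called."""

    def __init__(self, kernel, chosen_addr):
        self.kernel = kernel
        self.local_addr = chosen_addr  # decided at "alloc" time
        self.connected = False

    def get_local_addr(self):
        # Callers may read (and cache) this before connect() is called.
        return self.local_addr

    def connect(self):
        rc = self.kernel.bind(self.local_addr)
        self.connected = rc == 0
        return rc

    def connect_with_new_addr(self, new_addr):
        # Hypothetical library-side "fix": pick another address on failure.
        # This invalidates any address a caller cached earlier.
        self.local_addr = new_addr
        return self.connect()


def demo_repick():
    kernel = ToyKernel()
    addr = 1234                      # arbitrary example address
    kernel.bind(addr)                # some other socket already holds it
    h = ToyHandle(kernel, addr)
    cached = h.get_local_addr()      # application reads it early
    first = h.connect()              # fails: address in use
    second = h.connect_with_new_addr(addr + 1)  # arbitrary other address
    stale = cached != h.get_local_addr()
    return first, second, stale


def demo_retry():
    kernel = ToyKernel()
    addr = 1234
    kernel.bind(addr)
    h = ToyHandle(kernel, addr)
    cached = h.get_local_addr()
    first = h.connect()
    # Assumption: the conflict is transient and is released during the
    # delay (no real sleep is modeled here).
    kernel.unbind(addr)
    second = h.connect()             # same address, so the cached value stays valid
    return first, second, cached == h.get_local_addr()


if __name__ == "__main__":
    print("re-pick:", demo_repick())
    print("retry:  ", demo_retry())
```

In this model, the address is fixed once the handle is allocated, which is why a retry that keeps the same address is safe while a library-side re-pick is not.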