
On 05/01/2012 03:16 PM, Eric Blake wrote:
On 05/01/2012 01:05 PM, Laine Stump wrote:
The two following patches fix the same problem (described in https://bugzilla.redhat.com/show_bug.cgi?id=816465) in two alternate ways - one by retrying the failing operation after a delay, the other by using knowledge of how libnl works internally to artificially "reserve" a particular address so libnl doesn't attempt to bind to it.
You might think that libnl is the right place to fix this bug. Unfortunately, that isn't possible, because it would involve changing libnl's API - currently libnl decides what address to use for future binds of a netlink socket at the time nl_handle_alloc() is called, but doesn't actually attempt to bind it. Later, during nl_connect(), it calls bind() with the address it had decided on earlier. It would be nice if we could just retry the bind with a different address when the first attempt failed, but libnl allows client applications to retrieve the bind address *before nl_connect() is called*, so an application may have already gotten the address prior to calling nl_connect(), and changing it would render the application's information incorrect.
Does libnl-3 have the same issue? It would be interesting if Serge Hallyn's patches to support libnl-3 ended up allowing us to move to a version of the library that doesn't have the same fundamental flaw. Have we complained about this flaw to the libnl upstream folks?
Yes, libnl-3 has the same flaw in the API/code, but no, I haven't contacted them about it yet - I've been too busy gathering information. That's on my list of things to do, though (actually I'm thinking it would be good to have the libnl maintainer take a look at Serge's patches - he had earlier agreed to help out with making libvirt libnl-3-compliant when we got around to it).
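For anyone who wants to see the underlying collision without libnl in the picture, here is a minimal sketch using raw netlink sockets on Linux. It is not the code from either patch and not libnl's implementation; the helper names bind_netlink_port() and bind_netlink_first_free() are hypothetical, made up for illustration. The first helper binds a socket to an explicit netlink port id, so a second bind to the same id fails with EADDRINUSE. The second helper shows the "retry the bind with a different address" idea that libnl itself can't adopt, because a caller may already have retrieved the originally chosen address.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>

/* Hypothetical helper: open a NETLINK_ROUTE socket and bind it to an
 * explicit port id. Returns the fd on success, or -errno on failure. */
int bind_netlink_port(uint32_t port)
{
    struct sockaddr_nl addr;
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

    if (fd < 0)
        return -errno;

    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_pid = port;   /* the "address" chosen ahead of time */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        int saved = errno;
        close(fd);
        return -saved;
    }
    return fd;
}

/* Hypothetical helper: try port ids first, first+1, ... until one binds.
 * Only EADDRINUSE triggers a retry; any other error is returned as-is.
 * On success the port id actually used is stored in *chosen. */
int bind_netlink_first_free(uint32_t first, unsigned int tries,
                            uint32_t *chosen)
{
    unsigned int i;

    for (i = 0; i < tries; i++) {
        int fd = bind_netlink_port(first + i);

        if (fd >= 0) {
            *chosen = first + i;
            return fd;
        }
        if (fd != -EADDRINUSE)
            return fd;
    }
    return -EADDRINUSE;
}
```

Calling bind_netlink_port() twice with the same port id (for example the value of getpid()) should make the second call return -EADDRINUSE while the first socket is still open, whereas bind_netlink_first_free() moves on to the next id. The catch described above is exactly that the id stored in *chosen can differ from the one the caller expected.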