On 05/01/2012 01:05 PM, Laine Stump wrote:
> The two following patches fix the same problem (described in
> https://bugzilla.redhat.com/show_bug.cgi?id=816465) in two alternate
> ways - one by retrying the failing operation after a delay, the other by
> using knowledge of how libnl works internally to artificially "reserve"
> a particular address so libnl doesn't attempt to bind to it.
>
> You might think that libnl is the right place to fix this bug.
> Unfortunately, that isn't possible, because it would involve changing
> libnl's API - currently libnl decides what address to use for future
> binds of a netlink socket at the time nl_handle_alloc() is called, but
> doesn't actually attempt to bind it. Later, during nl_connect(), it
> calls bind() with the address it had earlier decided. It would be nice
> if we could just retry the bind with a different address when the first
> attempt failed, but libnl allows client applications to retrieve the
> bind address *before nl_connect() is called*, so an application may have
> already gotten the address prior to calling nl_connect(), and changing
> it would render the application's information incorrect.

Does libnl-3 have the same issue? It would be interesting if Serge
Hallyn's patches to support libnl-3 ended up allowing us to move to a
version of the library that doesn't have the same fundamental flaw.
Have we complained about this flaw to the libnl upstream folks?

> So the best we can do (for now at least) is work around the problem, and
> these are two possible workarounds.

I'm still debating which workaround is more palatable, but agree that we
have to do something.
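
For anyone following along, here is a minimal sketch of the collision at
the raw socket level, bypassing libnl entirely. It assumes Linux and
NETLINK_ROUTE; the helper names bind_netlink(), bound_pid() and
demo_collision() are made up for illustration and are not libnl or
libvirt API. It shows a second socket failing to bind to a netlink port
id that is already in use, then succeeding once it asks the kernel to
pick an id (nl_pid = 0). That second step is the kind of retry libnl
cannot do by itself without invalidating an address it may already have
handed out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>

/* Hypothetical helper: bind a netlink socket to a caller-chosen port id.
 * Returns 0 on success, or the errno value from bind() on failure. */
static int bind_netlink(int fd, uint32_t pid)
{
    struct sockaddr_nl addr;

    assert(fd >= 0); /* precondition: caller passes a valid descriptor */

    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_pid = pid; /* 0 asks the kernel to pick an unused id */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        return errno;
    return 0;
}

/* Hypothetical helper: return the port id the socket is bound to,
 * or 0 if getsockname() fails. */
static uint32_t bound_pid(int fd)
{
    struct sockaddr_nl addr;
    socklen_t len = sizeof(addr);

    memset(&addr, 0, sizeof(addr));
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return 0;
    return addr.nl_pid;
}

/* Returns 0 if binding to an in-use id failed and the fallback to a
 * kernel-assigned id then succeeded; -1 otherwise. */
static int demo_collision(void)
{
    int rc = -1;
    int err;
    uint32_t taken;
    int a = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    int b = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

    if (a < 0 || b < 0)
        goto cleanup;

    /* First socket: let the kernel choose an id, then read it back. */
    if (bind_netlink(a, 0) != 0)
        goto cleanup;
    taken = bound_pid(a);

    /* Second socket: insist on the same id, as a library would if it
     * had decided on the id earlier without checking it was free.
     * On Linux the failure is normally EADDRINUSE. */
    err = bind_netlink(b, taken);
    printf("bind to in-use id %u: %s\n", (unsigned)taken,
           err ? strerror(err) : "unexpectedly succeeded");
    if (err == 0)
        goto cleanup;

    /* Fallback: retry, this time letting the kernel pick the id. */
    if (bind_netlink(b, 0) != 0)
        goto cleanup;
    printf("fallback id: %u\n", (unsigned)bound_pid(b));
    rc = (bound_pid(b) != taken) ? 0 : -1;

cleanup:
    if (a >= 0)
        close(a);
    if (b >= 0)
        close(b);
    return rc;
}
```

Calling demo_collision() from a small main() should return 0 on a Linux
host where unprivileged NETLINK_ROUTE sockets are allowed. Note that the
fallback only works here because nothing has looked at the address yet,
which is exactly the guarantee libnl's API doesn't give us.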
--
Eric Blake eblake(a)redhat.com +1-919-301-3266
Libvirt virtualization library
http://libvirt.org