On 05/03/2012 04:09 AM, Daniel P. Berrange wrote:
On Wed, May 02, 2012 at 04:35:48PM -0400, Laine Stump wrote:
[lots of explanation snipped]
> So, the conclusions to draw from this analysis are:
>
> 1) my "alternative 1" patch was only coincidentally succeeding, and
> would be about as useful as everyone removing their shoes at airport
> security checkpoints.
>
> 2) If libvirtd has multiple threads started up before any netlink
> sockets have been bound to "pid of libvirtd", there is a possibility
> that the first call to nl_connect will fail (due to another thread being
> in getaddrinfo/__check_pf()). This is just as true for the macvtap and
> netcf uses of libnl as for the virNetlinkEventService use.
>
> 3) Once the first call to nl_connect is successfully completed (and/or
> if an extra (and otherwise unused) nlhandle is created with
> nl_handle_alloc() before creating any nlhandles that are subsequently
> nl_connect()ed), the likelyhood of a subsequent nl_connect() failure is
> effectively 0, since the address space used by libnl is all positive 32
> bit numbers, and the address space used by the auto-bind address in the
> kernel is (almost) all negative 32 bit numbers.
>
> 4) libnl should, at the very least, be modified to not use exactly
> nl_pid = pid, since there is a very high likelihood that particular
> address will already be taken by a library function that is calling bind
> directly, rather than through libnl. Really, its API shouldn't allow
> applications to retrieve the bind address used until after nl_connect()
> has already completed successfully; unfortunately, that would require an
> incompatible change in the API.
>
> Now that I completely understand the problem, I actually think that
> neither of these patches is quite correct; the first because it is
> simply bogus, and the second because it only solves the problem to
> virNetlinkEventService - it still leaves open the possibility that
> macvtap or netcf usage of libnl could result in a failure (although
> *only* if one of those uses happened to be called prior to
> virNetlinkEventService).
>
> To be 100% safe, I think what we need to do is put an extra call to
> nl_handle_alloc() very early in main, prior to calling
> virNetServerNew(), which is when all the other worker threads are
> created. I'll put together such a patch and send it to the list later
> tonight.
Wow, thanks for figuring this out. It is all far worse than I imagined :-(
Actually, investigating this was the most fun I've had in awhile :-)
It could have been worse - it could have turned out that the address
spaces continued to collide even after reserving "pid". Still, it's not
a pretty situation.
Clearly libnl is broken here, but I guess it is dead upstream in
favour
of libnl3. I wonder if that shares the same problem.
As far as I can tell, the libnl3 code is unchanged in this respect -
nl_handle_alloc() is for some reason replaced with nl_socket_alloc(),
but the address (stored in nl_pid) is still generated at that time, and
in the same manner, is available at any time via
nl_socket_get_local_port(), and is used by nl_connect() to call bind(),
with no attempts to adjust it if EADDRINUSE is encountered.
By the way, a workaround in libnl (as suggested in my conclusion (4)
would also be very simple - that could be done by patching libnl's
generate_local_port to skip over "pid + (0 << 22)", like this:
--- a/lib/socket.c
+++ b/lib/socket.c
@@ -58,7 +58,7 @@ static uint32_t generate_local_port(void)
if (used_ports_map[i] == 0xFFFFFFFF)
continue;
- for (n = 0; n < 32; n++) {
+ for (n = (i ? 0 : 1); n < 32; n++) {
if (1UL & (used_ports_map[i] >> n))
continue;
(this is based on upstream git, but generate_local_port() is identical
in libnl-1.1). Note that I call this a "workaround" rather than a
"fix"
- in my opinion, a true fix isn't possible with the existing libnl API -
as I've said before, not only should libnl not be unilaterally deciding
the bind address, it certainly shouldn't be allowing the application to
retrieve the bind address before a successful bind has taken place, as
this eliminates the possibility of changing that address in case of
EADDRINUSE.
The downside to the above patch (compared to patching libvirt) is that
there may be some other application that is (knowingly or unknowingly)
depending on the bind address being based on "pid" rather than "pid + (1
<< 22)", and that application would mysteriously begin to fail. So as a
localized patch that is certain to fix libvirt's libnl problem with no
chance of making life worse for anyone else, I still prefer a patch to
libvirt.
Agree that creating a netlink handle in libvirtd main() sounds like
a
way to workaround it.
I just sent a new patch upstream that does this:
https://www.redhat.com/archives/libvir-list/2012-May/msg00235.html
As noted there, I don't want to push it until I get experimental
verification that it doesn't damage lldpad<-->libvirtd communication.