On 05/01/2012 01:10 PM, Laine Stump wrote:
This patch is one alternative to solve the problem detailed in:
https://bugzilla.redhat.com/show_bug.cgi?id=816465
Some other unidentified library in use by libvirtd (in another thread)
is apparently temporarily binding to a NETLINK_ROUTE raw socket with
an address of "pid of libvirtd" during startup. This is the same
address used by libnl for the first netlink socket it binds, and the
netlink socket allocated for virNetlinkEventServiceStart() happens to
be that first socket; the result is that nl_connect() fails about
15-20% of the time (but apparently only if there is a guest running at
the time libvirtd starts).
Testing has shown that in the case that nl_connect fails the first
time, retrying it after a 500msec sleep leads to success 100% of the
time, so this patch doubles that delay (which also has a 100% success
rate).
+++ b/src/util/virnetlink.c
@@ -355,9 +355,18 @@ virNetlinkEventServiceStart(void)
}
if (nl_connect(srv->netlinknh, NETLINK_ROUTE) < 0) {
- virReportSystemError(errno,
- "%s", _("cannot connect to netlink socket"));
- goto error_server;
+ /* the address that libnl wants to use for this connect ("pid
+ * of libvirtd") is sometimes temporarily in use by some other
+ * unidentified code. Retrying after a 500msec sleep has
+ * achieved 100% success rates, so we sleep for 1000msec and
+ * retry.
+ */
+ usleep(1000000);
Sleeping for 1 entire second is user-visible; if we go with this
approach, I'd rather see it be as a retry loop that probes something
like once every 200ms for 5 tries (or something similar), for better
response time.
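
To make the suggestion concrete, here is a rough sketch of such a bounded
retry loop. It is untested against libvirt itself; retry_connect() and
fake_connect() are hypothetical names used only for illustration, and
fake_connect() merely stands in for the nl_connect() call so the sketch
compiles without libnl. The 200ms/5 tries figures are the ones floated
above, not measured values.

```c
#define _DEFAULT_SOURCE   /* expose usleep() under strict -std= modes */
#include <assert.h>
#include <unistd.h>

/* Hypothetical callback type: returns < 0 on failure, like nl_connect(). */
typedef int (*connect_fn)(void *opaque);

/*
 * Call fn up to max_tries times, sleeping delay_us microseconds between
 * failed attempts (but not after the last one).  Returns the number of
 * attempts used on success, or -1 if every attempt failed.
 */
static int
retry_connect(connect_fn fn, void *opaque, int max_tries,
              unsigned int delay_us)
{
    int i;

    assert(fn != NULL);

    for (i = 0; i < max_tries; i++) {
        if (fn(opaque) >= 0)
            return i + 1;
        if (i + 1 < max_tries)
            usleep(delay_us);
    }
    return -1;
}

/*
 * Stand-in for the real connect call (assumption, for illustration only):
 * fails while the counter pointed to by opaque is positive, decrementing
 * it each time, then succeeds.
 */
static int
fake_connect(void *opaque)
{
    int *failures_left = opaque;

    if (*failures_left > 0) {
        (*failures_left)--;
        return -1;
    }
    return 0;
}
```

In virNetlinkEventServiceStart() the callback would wrap
nl_connect(srv->netlinknh, NETLINK_ROUTE), called as something like
retry_connect(cb, srv, 5, 200 * 1000), with the existing
virReportSystemError()/goto error_server path taken only when it
returns -1. In the common case where the first attempt succeeds there
is no sleep at all, and the worst case is 800ms rather than a fixed
full second.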
--
Eric Blake eblake(a)redhat.com +1-919-301-3266
Libvirt virtualization library
http://libvirt.org