[libvirt] [PATCHv3 0/2] Added waiting for DAD to finish for bridge address.

This is a fix for commit db488c79173b240459c7754f38c3c6af9b432970 dnsmasq main process which is relied on when waiting for DAD to complete exits without actually waiting for DAD. This is dnsmasq daemon's task. It seems to be a race that DAD finished before dnsmasq main process exited. The above commit needs the execution to block until DAD finishes for bridge IPv6 address because then it closes dummy tap device. Thus we need to ensure this ourselves. So we periodically poll the kernel using netlink and check whether there are any IPv6 addresses assigned to bridge which have 'tentative' state. After DAD is finished, execution continues. I guess that is what dnsmasq was assumed to do. We use netlink to dump information about existing IPv6 addresses. Netlink's response is a multi-part message. Unfortunately, the current implementation of virNetlink treats such messages as faulty and throws an error. So the patch 2/2 adds multi-part nelink response support. Update v2: fixed syntax. Update v3: moved to virnetdev. Maxim Perevedentsev (2): network: added waiting for DAD to finish for bridge address. Add support for multi-part netlink messages. src/libvirt_private.syms | 1 + src/network/bridge_driver.c | 35 +++++++++- src/util/virnetdev.c | 160 ++++++++++++++++++++++++++++++++++++++++++++ src/util/virnetdev.h | 2 + src/util/virnetlink.c | 4 +- 5 files changed, 200 insertions(+), 2 deletions(-) -- Sincerely, Maxim Perevedentsev

This is a fix for commit db488c79173b240459c7754f38c3c6af9b432970 dnsmasq main process exits without waiting for DAD, this is dnsmasq daemon's task. So we periodically poll the kernel using netlink and check whether there are any IPv6 addresses assigned to bridge which have 'tentative' state. After DAD is finished, execution continues. I guess that is what dnsmasq was assumed to do. --- Difference to v2: Moved to virnetdev. src/libvirt_private.syms | 1 + src/network/bridge_driver.c | 35 +++++++++- src/util/virnetdev.c | 160 ++++++++++++++++++++++++++++++++++++++++++++ src/util/virnetdev.h | 2 + 4 files changed, 197 insertions(+), 1 deletion(-) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 0517c24..fa9e1c1 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1776,6 +1776,7 @@ virNetDevSetRcvMulti; virNetDevSetupControl; virNetDevSysfsFile; virNetDevValidateConfig; +virNetDevWaitDadFinish; # util/virnetdevbandwidth.h diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index 3d6721b..2172a3d 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -2026,6 +2026,33 @@ networkAddRouteToBridge(virNetworkObjPtr network, } static int +networkWaitDadFinish(virNetworkObjPtr network) +{ + virNetworkIpDefPtr ipdef; + virSocketAddrPtr *addrs = NULL; + size_t i; + int ret; + for (i = 0; + (ipdef = virNetworkDefGetIpByIndex(network->def, AF_INET6, i)); + i++) {} + + if (i == 0) + return 0; + if (VIR_ALLOC_N(addrs, i)) + return -1; + + for (i = 0; + (ipdef = virNetworkDefGetIpByIndex(network->def, AF_INET6, i)); + i++) { + addrs[i] = &ipdef->address; + } + + ret = virNetDevWaitDadFinish(addrs, i); + VIR_FREE(addrs); + return ret; +} + +static int networkStartNetworkVirtual(virNetworkDriverStatePtr driver, virNetworkObjPtr network) { @@ -2159,7 +2186,13 @@ networkStartNetworkVirtual(virNetworkDriverStatePtr driver, if (v6present && networkStartRadvd(driver, network) < 0) goto err4; - /* DAD has happened (dnsmasq waits for it), dnsmasq is now bound to the + /* dnsmasq main process does not wait for DAD to complete, + * so we need to wait for it ourselves. + */ + if (v6present && networkWaitDadFinish(network) < 0) + goto err4; + + /* DAD has happened, dnsmasq is now bound to the * bridge's IPv6 address, so we can now set the dummy tun down. */ if (tapfd >= 0) { diff --git a/src/util/virnetdev.c b/src/util/virnetdev.c index 1e20789..c81342a 100644 --- a/src/util/virnetdev.c +++ b/src/util/virnetdev.c @@ -96,6 +96,7 @@ VIR_LOG_INIT("util.netdev"); # define FEATURE_BIT_IS_SET(blocks, index, field) \ (FEATURE_WORD(blocks, index, field) & FEATURE_FIELD_FLAG(index)) #endif +# define IP_BUF_SIZE 4096 typedef enum { VIR_MCAST_TYPE_INDEX_TOKEN, @@ -1219,6 +1220,103 @@ int virNetDevClearIPAddress(const char *ifname, return ret; } +/* return whether there is a known address with 'tentative' flag set */ +static int +virNetDevParseDadStatus(struct nlmsghdr *nlh, int len, + virSocketAddrPtr *addrs, size_t count) +{ + struct ifaddrmsg *ifaddrmsg_ptr; + unsigned int ifaddrmsg_len; + struct rtattr *rtattr_ptr; + size_t i; + struct in6_addr *addr; + for (; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) { + if (NLMSG_PAYLOAD(nlh, 0) < sizeof(struct ifaddrmsg)) { + /* Message without payload is the last one. */ + break; + } + + ifaddrmsg_ptr = (struct ifaddrmsg *)NLMSG_DATA(nlh); + if (!(ifaddrmsg_ptr->ifa_flags & IFA_F_TENTATIVE)) { + /* Not tentative: we are not interested in this entry. */ + continue; + } + + ifaddrmsg_len = IFA_PAYLOAD(nlh); + rtattr_ptr = (struct rtattr *) IFA_RTA(ifaddrmsg_ptr); + for (; RTA_OK(rtattr_ptr, ifaddrmsg_len); + rtattr_ptr = RTA_NEXT(rtattr_ptr, ifaddrmsg_len)) { + if (RTA_PAYLOAD(rtattr_ptr) != sizeof(struct in6_addr)) { + /* No address: ignore. */ + continue; + } + + /* We check only known addresses. */ + for (i = 0; i < count; i++) { + addr = &addrs[i]->data.inet6.sin6_addr; + if (!memcmp(addr, RTA_DATA(rtattr_ptr), + sizeof(struct in6_addr))) { + /* We found matching tentative address. */ + return 1; + } + } + } + } + return 0; +} + +/* return after DAD finishes for all known IPv6 addresses or an error */ +int +virNetDevWaitDadFinish(virSocketAddrPtr *addrs, size_t count) +{ + struct nl_msg *nlmsg = NULL; + struct ifaddrmsg ifa; + struct nlmsghdr *resp = NULL; + unsigned int recvbuflen; + int ret = -1, dad = 1; + + if (!(nlmsg = nlmsg_alloc_simple(RTM_GETADDR, + NLM_F_REQUEST | NLM_F_DUMP))) { + virReportOOMError(); + return -1; + } + + memset(&ifa, 0, sizeof(ifa)); + /* DAD is for IPv6 adresses only. */ + ifa.ifa_family = AF_INET6; + if (nlmsg_append(nlmsg, &ifa, sizeof(ifa), NLMSG_ALIGNTO) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("allocated netlink buffer is too small")); + goto cleanup; + } + + /* Periodically query netlink until DAD finishes on all known addresses. */ + while (dad) { + if (virNetlinkCommand(nlmsg, &resp, &recvbuflen, 0, 0, + NETLINK_ROUTE, 0) < 0) + goto cleanup; + + if (virNetlinkGetErrorCode(resp, recvbuflen) < 0) { + virReportError(VIR_ERR_SYSTEM_ERROR, "%s", + _("error reading DAD state information")); + goto cleanup; + } + + /* Parse response. */ + dad = virNetDevParseDadStatus(resp, recvbuflen, addrs, count); + if (dad) + usleep(1000 * 10); + + VIR_FREE(resp); + } + ret = 0; + + cleanup: + VIR_FREE(resp); + nlmsg_free(nlmsg); + return ret; +} + #else /* defined(__linux__) && defined(HAVE_LIBNL) */ int virNetDevSetIPAddress(const char *ifname, @@ -1338,6 +1436,68 @@ int virNetDevClearIPAddress(const char *ifname, return ret; } +/* return whether there is a known address with 'tentative' flag set */ +static int +virNetDevParseDadStatus(char *outbuf, + virSocketAddrPtr *addrs, size_t count) +{ + virSocketAddr sockaddr; + size_t i, j; + int ret = 0; + char **addr_strings = virStringSplit(outbuf, "\n", 0); + for (j = 0; addr_strings[j] != NULL; ++j) { + if (virSocketParseAddr(strings[j], &sockaddr, AF_INET6) < 0) + continue; + for (i = 0; i < count; i++) { + if (virSocketAddrEqual(addrs[i], &sockaddr)) { + ret = 1; + goto cleanup; + } + } + } + + cleanup: + virStringFreeList(addr_strings); + return ret; +} + +/* return after DAD finishes for all known IPv6 addresses or an error */ +int +virNetDevWaitDadFinish(virSocketAddrPtr *addrs, size_t count) +{ + int ret = -1, dad = 1; + char *outbuf = NULL; + virCommandPtr cmd = NULL; + +# ifdef IP_PATH + while (dad) { + cmd = virCommandNew(POSIX_SHELL); + virCommandAddArgList(cmd, "-c", + IP_PATH "-6 addr show tentative | \ + awk '/inet6/{split(\\$2, a, \\\"/\\\"); \ + print a[1]}'", NULL); + virCommandSetOutputBuffer(cmd, &outbuf); + if (virCommandRun(cmd, NULL)) + goto cleanup; + virCommandFree(cmd); + + dad = virNetDevParseDadStatus(outbuf, addrs, count); + if (dad) + usleep(1000 * 10); + + VIR_FREE(outbuf); + } + ret = 0; +# else + virReportSystemError(ENOSYS, "%s", _("Unable to check IPv6 DAD")); +# endif + + cleanup: + virCommandFree(cmd); + VIR_FREE(outbuf); + return ret; +} + #endif /* defined(__linux__) && defined(HAVE_LIBNL) */ /** diff --git a/src/util/virnetdev.h b/src/util/virnetdev.h index fff881c..a09b3b2 100644 --- a/src/util/virnetdev.h +++ b/src/util/virnetdev.h @@ -105,6 +105,8 @@ int virNetDevClearIPAddress(const char *ifname, ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_RETURN_CHECK; int virNetDevGetIPAddress(const char *ifname, virSocketAddrPtr addr) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_RETURN_CHECK; +int virNetDevWaitDadFinish(virSocketAddrPtr *addrs, size_t count) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2); int virNetDevSetMAC(const char *ifname, -- Sincerely, Maxim Perevedentsev

On 08/10/2015 01:08 PM, Maxim Perevedentsev wrote:
This is a fix for commit db488c79173b240459c7754f38c3c6af9b432970 dnsmasq main process exits without waiting for DAD, this is dnsmasq daemon's task. So we periodically poll the kernel using netlink and check whether there are any IPv6 addresses assigned to bridge which have 'tentative' state. After DAD is finished, execution continues. I guess that is what dnsmasq was assumed to do.
Since the comments in our code imply that dnsmasq should be waiting for DAD to complete prior to daemonizing, before pushing a fix like this I'd like to find out from the dnsmasq folks if we are erroneously relying on nonexistent dnsmasq behavior, or if maybe there is a bug in some version of dnsmasq. Simon (or other dnsmasq people) - when dnsmasq is run with "enable-ra", does it make sure it completes DAD prior to daemonizing? Or does libvirt need to do this extra polling to assure that DAD has completed? (or maybe there's some other config parameter we need to add?)
--- Difference to v2: Moved to virnetdev.
src/libvirt_private.syms | 1 + src/network/bridge_driver.c | 35 +++++++++- src/util/virnetdev.c | 160 ++++++++++++++++++++++++++++++++++++++++++++ src/util/virnetdev.h | 2 + 4 files changed, 197 insertions(+), 1 deletion(-)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 0517c24..fa9e1c1 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1776,6 +1776,7 @@ virNetDevSetRcvMulti; virNetDevSetupControl; virNetDevSysfsFile; virNetDevValidateConfig; +virNetDevWaitDadFinish;
# util/virnetdevbandwidth.h diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index 3d6721b..2172a3d 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -2026,6 +2026,33 @@ networkAddRouteToBridge(virNetworkObjPtr network, }
static int +networkWaitDadFinish(virNetworkObjPtr network) +{ + virNetworkIpDefPtr ipdef; + virSocketAddrPtr *addrs = NULL; + size_t i; + int ret; + for (i = 0; + (ipdef = virNetworkDefGetIpByIndex(network->def, AF_INET6, i)); + i++) {} + + if (i == 0) + return 0; + if (VIR_ALLOC_N(addrs, i)) + return -1; + + for (i = 0; + (ipdef = virNetworkDefGetIpByIndex(network->def, AF_INET6, i)); + i++) { + addrs[i] = &ipdef->address; + } + + ret = virNetDevWaitDadFinish(addrs, i); + VIR_FREE(addrs); + return ret; +} + +static int networkStartNetworkVirtual(virNetworkDriverStatePtr driver, virNetworkObjPtr network) { @@ -2159,7 +2186,13 @@ networkStartNetworkVirtual(virNetworkDriverStatePtr driver, if (v6present && networkStartRadvd(driver, network) < 0) goto err4;
- /* DAD has happened (dnsmasq waits for it), dnsmasq is now bound to the + /* dnsmasq main process does not wait for DAD to complete, + * so we need to wait for it ourselves. + */ + if (v6present && networkWaitDadFinish(network) < 0) + goto err4; + + /* DAD has happened, dnsmasq is now bound to the * bridge's IPv6 address, so we can now set the dummy tun down. */ if (tapfd >= 0) { diff --git a/src/util/virnetdev.c b/src/util/virnetdev.c index 1e20789..c81342a 100644 --- a/src/util/virnetdev.c +++ b/src/util/virnetdev.c @@ -96,6 +96,7 @@ VIR_LOG_INIT("util.netdev"); # define FEATURE_BIT_IS_SET(blocks, index, field) \ (FEATURE_WORD(blocks, index, field) & FEATURE_FIELD_FLAG(index)) #endif +# define IP_BUF_SIZE 4096
typedef enum { VIR_MCAST_TYPE_INDEX_TOKEN, @@ -1219,6 +1220,103 @@ int virNetDevClearIPAddress(const char *ifname, return ret; }
+/* return whether there is a known address with 'tentative' flag set */ +static int +virNetDevParseDadStatus(struct nlmsghdr *nlh, int len, + virSocketAddrPtr *addrs, size_t count) +{ + struct ifaddrmsg *ifaddrmsg_ptr; + unsigned int ifaddrmsg_len; + struct rtattr *rtattr_ptr; + size_t i; + struct in6_addr *addr; + for (; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) { + if (NLMSG_PAYLOAD(nlh, 0) < sizeof(struct ifaddrmsg)) { + /* Message without payload is the last one. */ + break; + } + + ifaddrmsg_ptr = (struct ifaddrmsg *)NLMSG_DATA(nlh); + if (!(ifaddrmsg_ptr->ifa_flags & IFA_F_TENTATIVE)) { + /* Not tentative: we are not interested in this entry. */ + continue; + } + + ifaddrmsg_len = IFA_PAYLOAD(nlh); + rtattr_ptr = (struct rtattr *) IFA_RTA(ifaddrmsg_ptr); + for (; RTA_OK(rtattr_ptr, ifaddrmsg_len); + rtattr_ptr = RTA_NEXT(rtattr_ptr, ifaddrmsg_len)) { + if (RTA_PAYLOAD(rtattr_ptr) != sizeof(struct in6_addr)) { + /* No address: ignore. */ + continue; + } + + /* We check only known addresses. */ + for (i = 0; i < count; i++) { + addr = &addrs[i]->data.inet6.sin6_addr; + if (!memcmp(addr, RTA_DATA(rtattr_ptr), + sizeof(struct in6_addr))) { + /* We found matching tentative address. */ + return 1; + } + } + } + } + return 0; +} + +/* return after DAD finishes for all known IPv6 addresses or an error */ +int +virNetDevWaitDadFinish(virSocketAddrPtr *addrs, size_t count) +{ + struct nl_msg *nlmsg = NULL; + struct ifaddrmsg ifa; + struct nlmsghdr *resp = NULL; + unsigned int recvbuflen; + int ret = -1, dad = 1; + + if (!(nlmsg = nlmsg_alloc_simple(RTM_GETADDR, + NLM_F_REQUEST | NLM_F_DUMP))) { + virReportOOMError(); + return -1; + } + + memset(&ifa, 0, sizeof(ifa)); + /* DAD is for IPv6 adresses only. */ + ifa.ifa_family = AF_INET6; + if (nlmsg_append(nlmsg, &ifa, sizeof(ifa), NLMSG_ALIGNTO) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("allocated netlink buffer is too small")); + goto cleanup; + } + + /* Periodically query netlink until DAD finishes on all known addresses. */ + while (dad) { + if (virNetlinkCommand(nlmsg, &resp, &recvbuflen, 0, 0, + NETLINK_ROUTE, 0) < 0) + goto cleanup; + + if (virNetlinkGetErrorCode(resp, recvbuflen) < 0) { + virReportError(VIR_ERR_SYSTEM_ERROR, "%s", + _("error reading DAD state information")); + goto cleanup; + } + + /* Parse response. */ + dad = virNetDevParseDadStatus(resp, recvbuflen, addrs, count); + if (dad) + usleep(1000 * 10); + + VIR_FREE(resp); + } + ret = 0; + + cleanup: + VIR_FREE(resp); + nlmsg_free(nlmsg); + return ret; +} + #else /* defined(__linux__) && defined(HAVE_LIBNL) */
int virNetDevSetIPAddress(const char *ifname, @@ -1338,6 +1436,68 @@ int virNetDevClearIPAddress(const char *ifname, return ret; }
+/* return whether there is a known address with 'tentative' flag set */ +static int +virNetDevParseDadStatus(char *outbuf, + virSocketAddrPtr *addrs, size_t count) +{ + virSocketAddr sockaddr; + size_t i, j; + int ret = 0; + char **addr_strings = virStringSplit(outbuf, "\n", 0); + for (j = 0; addr_strings[j] != NULL; ++j) { + if (virSocketParseAddr(strings[j], &sockaddr, AF_INET6) < 0) + continue; + for (i = 0; i < count; i++) { + if (virSocketAddrEqual(addrs[i], &sockaddr)) { + ret = 1; + goto cleanup; + } + } + } + + cleanup: + virStringFreeList(addr_strings); + return ret; +} + +/* return after DAD finishes for all known IPv6 addresses or an error */ +int +virNetDevWaitDadFinish(virSocketAddrPtr *addrs, size_t count) +{ + int ret = -1, dad = 1; + char *outbuf = NULL; + virCommandPtr cmd = NULL; + +# ifdef IP_PATH + while (dad) { + cmd = virCommandNew(POSIX_SHELL); + virCommandAddArgList(cmd, "-c", + IP_PATH "-6 addr show tentative | \ + awk '/inet6/{split(\\$2, a, \\\"/\\\"); \ + print a[1]}'", NULL); + virCommandSetOutputBuffer(cmd, &outbuf); + if (virCommandRun(cmd, NULL)) + goto cleanup; + virCommandFree(cmd); + + dad = virNetDevParseDadStatus(outbuf, addrs, count); + if (dad) + usleep(1000 * 10); + + VIR_FREE(outbuf); + } + ret = 0; +# else + virReportSystemError(ENOSYS, "%s", _("Unable to check IPv6 DAD")); +# endif + + cleanup: + virCommandFree(cmd); + VIR_FREE(outbuf); + return ret; +} + #endif /* defined(__linux__) && defined(HAVE_LIBNL) */
/** diff --git a/src/util/virnetdev.h b/src/util/virnetdev.h index fff881c..a09b3b2 100644 --- a/src/util/virnetdev.h +++ b/src/util/virnetdev.h @@ -105,6 +105,8 @@ int virNetDevClearIPAddress(const char *ifname, ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_RETURN_CHECK; int virNetDevGetIPAddress(const char *ifname, virSocketAddrPtr addr) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_RETURN_CHECK; +int virNetDevWaitDadFinish(virSocketAddrPtr *addrs, size_t count) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2);
int virNetDevSetMAC(const char *ifname, -- Sincerely, Maxim Perevedentsev
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 On 10/08/15 22:29, Laine Stump wrote:
On 08/10/2015 01:08 PM, Maxim Perevedentsev wrote:
This is a fix for commit db488c79173b240459c7754f38c3c6af9b432970 dnsmasq main process exits without waiting for DAD, this is dnsmasq daemon's task. So we periodically poll the kernel using netlink and check whether there are any IPv6 addresses assigned to bridge which have 'tentative' state. After DAD is finished, execution continues. I guess that is what dnsmasq was assumed to do.
Since the comments in our code imply that dnsmasq should be waiting for DAD to complete prior to daemonizing, before pushing a fix like this I'd like to find out from the dnsmasq folks if we are erroneously relying on nonexistent dnsmasq behavior, or if maybe there is a bug in some version of dnsmasq.
Simon (or other dnsmasq people) - when dnsmasq is run with "enable-ra", does it make sure it completes DAD prior to daemonizing? Or does libvirt need to do this extra polling to assure that DAD has completed? (or maybe there's some other config parameter we need to add?)
Dnsmasq doesn't wait for DAD to complete before returning. Internally, it know is DAD is still happening on an interface, as it needs to delay calling bind() until after it's complete. It would, therefore be relatively simple to add this behaviour, but it's not a complete solution, since new interfaces can appear _after_ dnsmasq has completed start-up. Having libvirt do its own checks whenever it creates an interface might therefore be a cleaner way of doing things, but I don't have an objection to changing dnsmasq behaviour if there's a good reason why that's not sensible. Cheers, Simon.
--- Difference to v2: Moved to virnetdev.
src/libvirt_private.syms | 1 + src/network/bridge_driver.c | 35 +++++++++- src/util/virnetdev.c | 160 ++++++++++++++++++++++++++++++++++++++++++++ src/util/virnetdev.h | 2 + 4 files changed, 197 insertions(+), 1 deletion(-)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 0517c24..fa9e1c1 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1776,6 +1776,7 @@ virNetDevSetRcvMulti; virNetDevSetupControl; virNetDevSysfsFile; virNetDevValidateConfig; +virNetDevWaitDadFinish;
# util/virnetdevbandwidth.h diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index 3d6721b..2172a3d 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -2026,6 +2026,33 @@ networkAddRouteToBridge(virNetworkObjPtr network, }
static int +networkWaitDadFinish(virNetworkObjPtr network) +{ + virNetworkIpDefPtr ipdef; + virSocketAddrPtr *addrs = NULL; + size_t i; + int ret; + for (i = 0; + (ipdef = virNetworkDefGetIpByIndex(network->def, AF_INET6, i)); + i++) {} + + if (i == 0) + return 0; + if (VIR_ALLOC_N(addrs, i)) + return -1; + + for (i = 0; + (ipdef = virNetworkDefGetIpByIndex(network->def, AF_INET6, i)); + i++) { + addrs[i] = &ipdef->address; + } + + ret = virNetDevWaitDadFinish(addrs, i); + VIR_FREE(addrs); + return ret; +} + +static int networkStartNetworkVirtual(virNetworkDriverStatePtr driver, virNetworkObjPtr network) { @@ -2159,7 +2186,13 @@ networkStartNetworkVirtual(virNetworkDriverStatePtr driver, if (v6present && networkStartRadvd(driver, network) < 0) goto err4;
- /* DAD has happened (dnsmasq waits for it), dnsmasq is now bound to the + /* dnsmasq main process does not wait for DAD to complete, + * so we need to wait for it ourselves. + */ + if (v6present && networkWaitDadFinish(network) < 0) + goto err4; + + /* DAD has happened, dnsmasq is now bound to the * bridge's IPv6 address, so we can now set the dummy tun down. */ if (tapfd >= 0) { diff --git a/src/util/virnetdev.c b/src/util/virnetdev.c index 1e20789..c81342a 100644 --- a/src/util/virnetdev.c +++ b/src/util/virnetdev.c @@ -96,6 +96,7 @@ VIR_LOG_INIT("util.netdev"); # define FEATURE_BIT_IS_SET(blocks, index, field) \ (FEATURE_WORD(blocks, index, field) & FEATURE_FIELD_FLAG(index)) #endif +# define IP_BUF_SIZE 4096
typedef enum { VIR_MCAST_TYPE_INDEX_TOKEN, @@ -1219,6 +1220,103 @@ int virNetDevClearIPAddress(const char *ifname, return ret; }
+/* return whether there is a known address with 'tentative' flag set */ +static int +virNetDevParseDadStatus(struct nlmsghdr *nlh, int len, + virSocketAddrPtr *addrs, size_t count) +{ + struct ifaddrmsg *ifaddrmsg_ptr; + unsigned int ifaddrmsg_len; + struct rtattr *rtattr_ptr; + size_t i; + struct in6_addr *addr; + for (; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) { + if (NLMSG_PAYLOAD(nlh, 0) < sizeof(struct ifaddrmsg)) { + /* Message without payload is the last one. */ + break; + } + + ifaddrmsg_ptr = (struct ifaddrmsg *)NLMSG_DATA(nlh); + if (!(ifaddrmsg_ptr->ifa_flags & IFA_F_TENTATIVE)) { + /* Not tentative: we are not interested in this entry. */ + continue; + } + + ifaddrmsg_len = IFA_PAYLOAD(nlh); + rtattr_ptr = (struct rtattr *) IFA_RTA(ifaddrmsg_ptr); + for (; RTA_OK(rtattr_ptr, ifaddrmsg_len); + rtattr_ptr = RTA_NEXT(rtattr_ptr, ifaddrmsg_len)) { + if (RTA_PAYLOAD(rtattr_ptr) != sizeof(struct in6_addr)) { + /* No address: ignore. */ + continue; + } + + /* We check only known addresses. */ + for (i = 0; i < count; i++) { + addr = &addrs[i]->data.inet6.sin6_addr; + if (!memcmp(addr, RTA_DATA(rtattr_ptr), + sizeof(struct in6_addr))) { + /* We found matching tentative address. */ + return 1; + } + } + } + } + return 0; +} + +/* return after DAD finishes for all known IPv6 addresses or an error */ +int +virNetDevWaitDadFinish(virSocketAddrPtr *addrs, size_t count) +{ + struct nl_msg *nlmsg = NULL; + struct ifaddrmsg ifa; + struct nlmsghdr *resp = NULL; + unsigned int recvbuflen; + int ret = -1, dad = 1; + + if (!(nlmsg = nlmsg_alloc_simple(RTM_GETADDR, + NLM_F_REQUEST | NLM_F_DUMP))) { + virReportOOMError(); + return -1; + } + + memset(&ifa, 0, sizeof(ifa)); + /* DAD is for IPv6 adresses only. */ + ifa.ifa_family = AF_INET6; + if (nlmsg_append(nlmsg, &ifa, sizeof(ifa), NLMSG_ALIGNTO) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("allocated netlink buffer is too small")); + goto cleanup; + } + + /* Periodically query netlink until DAD finishes on all known addresses. */ + while (dad) { + if (virNetlinkCommand(nlmsg, &resp, &recvbuflen, 0, 0, + NETLINK_ROUTE, 0) < 0) + goto cleanup; + + if (virNetlinkGetErrorCode(resp, recvbuflen) < 0) { + virReportError(VIR_ERR_SYSTEM_ERROR, "%s", + _("error reading DAD state information")); + goto cleanup; + } + + /* Parse response. */ + dad = virNetDevParseDadStatus(resp, recvbuflen, addrs, count); + if (dad) + usleep(1000 * 10); + + VIR_FREE(resp); + } + ret = 0; + + cleanup: + VIR_FREE(resp); + nlmsg_free(nlmsg); + return ret; +} + #else /* defined(__linux__) && defined(HAVE_LIBNL) */
int virNetDevSetIPAddress(const char *ifname, @@ -1338,6 +1436,68 @@ int virNetDevClearIPAddress(const char *ifname, return ret; }
+/* return whether there is a known address with 'tentative' flag set */ +static int +virNetDevParseDadStatus(char *outbuf, + virSocketAddrPtr *addrs, size_t count) +{ + virSocketAddr sockaddr; + size_t i, j; + int ret = 0; + char **addr_strings = virStringSplit(outbuf, "\n", 0); + for (j = 0; addr_strings[j] != NULL; ++j) { + if (virSocketParseAddr(strings[j], &sockaddr, AF_INET6) < 0) + continue; + for (i = 0; i < count; i++) { + if (virSocketAddrEqual(addrs[i], &sockaddr)) { + ret = 1; + goto cleanup; + } + } + } + + cleanup: + virStringFreeList(addr_strings); + return ret; +} + +/* return after DAD finishes for all known IPv6 addresses or an error */ +int +virNetDevWaitDadFinish(virSocketAddrPtr *addrs, size_t count) +{ + int ret = -1, dad = 1; + char *outbuf = NULL; + virCommandPtr cmd = NULL; + +# ifdef IP_PATH + while (dad) { + cmd = virCommandNew(POSIX_SHELL); + virCommandAddArgList(cmd, "-c", + IP_PATH "-6 addr show tentative | \ + awk '/inet6/{split(\\$2, a, \\\"/\\\"); \ + print a[1]}'", NULL); + virCommandSetOutputBuffer(cmd, &outbuf); + if (virCommandRun(cmd, NULL)) + goto cleanup; + virCommandFree(cmd); + + dad = virNetDevParseDadStatus(outbuf, addrs, count); + if (dad) + usleep(1000 * 10); + + VIR_FREE(outbuf); + } + ret = 0; +# else + virReportSystemError(ENOSYS, "%s", _("Unable to check IPv6 DAD")); +# endif + + cleanup: + virCommandFree(cmd); + VIR_FREE(outbuf); + return ret; +} + #endif /* defined(__linux__) && defined(HAVE_LIBNL) */
/** diff --git a/src/util/virnetdev.h b/src/util/virnetdev.h index fff881c..a09b3b2 100644 --- a/src/util/virnetdev.h +++ b/src/util/virnetdev.h @@ -105,6 +105,8 @@ int virNetDevClearIPAddress(const char *ifname, ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_RETURN_CHECK; int virNetDevGetIPAddress(const char *ifname, virSocketAddrPtr addr) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_RETURN_CHECK; +int virNetDevWaitDadFinish(virSocketAddrPtr *addrs, size_t count) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2);
int virNetDevSetMAC(const char *ifname, -- Sincerely, Maxim Perevedentsev
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAEBCAAGBQJVya7kAAoJEBXN2mrhkTWi8IYP/RNoeivBxzgpFxgsAVptURxE zQYzYYBQDZefdvaNX/Zg//OjUchek8bChaMEuk2dIKpA5HWHej4d9+fI3RQjkc2P 5FVv4ng1o8CfdM8MZpalFYkQXPZXIKfzrOpxfcCh9IigOlJWdCJLPn9VnAUAOAul ylxrqo2hedbnYisPcN3Wb5LRguN/RwpNddyLO8PSpP5gt96Z+ykY6F3PLFMOmXQz OFLDwfU6+ppx1vFBlI3wSlN6BG/i5y7m63TzOSUIwOEE9mw4cwoqEKoU8DpgsNEE rjpvNs63wED8kcqVHqxMVt0VHSeADPSlVB2E7CcjqqRWksbHJIInkVJ8adM/ibPK /CbbfHQpaGAH0H3ke+J5P70KA6Sfo8VlDcvo2iAOT6ENuPmrUi7zwdzxuwAXEXtP J/oCILD+FRY00mD2rkZyjdlvfX8zsfiuL6nhOTGmF+OrDDr+qR534NyDJBdvhCfJ Hc75ERfbhKa7yYlt0+pSN5wZtShbKAnjHDKxOKloAu8csakDE9B753PwUbfDQIam vN3chMENeogNiZqsctntA7csOU8IssqU5u2XWMrGhcV4onnbleiNG/Ue/v3CzEzx LlNSqXONETqeU+Z3qIU+rhy42DZL89rdavYR4a5T/asge1fiWjYVdyUg8atkBExt 2h6k2LPUVZNjVgD6p4f/ =edDp -----END PGP SIGNATURE-----

On 08/11/2015 11:14 AM, Simon Kelley wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256
On 10/08/15 22:29, Laine Stump wrote:
This is a fix for commit db488c79173b240459c7754f38c3c6af9b432970 dnsmasq main process exits without waiting for DAD, this is dnsmasq daemon's task. So we periodically poll the kernel using netlink and check whether there are any IPv6 addresses assigned to bridge which have 'tentative' state. After DAD is finished, execution continues. I guess that is what dnsmasq was assumed to do. Since the comments in our code imply that dnsmasq should be waiting for DAD to complete prior to daemonizing, before pushing a fix like
On 08/10/2015 01:08 PM, Maxim Perevedentsev wrote: this I'd like to find out from the dnsmasq folks if we are erroneously relying on nonexistent dnsmasq behavior, or if maybe there is a bug in some version of dnsmasq.
Simon (or other dnsmasq people) - when dnsmasq is run with "enable-ra", does it make sure it completes DAD prior to daemonizing? Or does libvirt need to do this extra polling to assure that DAD has completed? (or maybe there's some other config parameter we need to add?)
Dnsmasq doesn't wait for DAD to complete before returning. Internally, it know is DAD is still happening on an interface, as it needs to delay calling bind() until after it's complete. It would, therefore be relatively simple to add this behaviour, but it's not a complete solution, since new interfaces can appear _after_ dnsmasq has completed start-up.
Having libvirt do its own checks whenever it creates an interface might therefore be a cleaner way of doing things, but I don't have an objection to changing dnsmasq behaviour if there's a good reason why that's not sensible. I think this behavior will be inconsistent with the behavior present in dnsmasq for long, so we can create conflicts with software relying on current dnsmasq logic. I think if libvirt needs dad to finish then it should wait itself instead of modifying third-party software.
Cheers,
Simon.
--- Difference to v2: Moved to virnetdev.
src/libvirt_private.syms | 1 + src/network/bridge_driver.c | 35 +++++++++- src/util/virnetdev.c | 160 ++++++++++++++++++++++++++++++++++++++++++++ src/util/virnetdev.h | 2 + 4 files changed, 197 insertions(+), 1 deletion(-)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 0517c24..fa9e1c1 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1776,6 +1776,7 @@ virNetDevSetRcvMulti; virNetDevSetupControl; virNetDevSysfsFile; virNetDevValidateConfig; +virNetDevWaitDadFinish;
# util/virnetdevbandwidth.h diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index 3d6721b..2172a3d 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -2026,6 +2026,33 @@ networkAddRouteToBridge(virNetworkObjPtr network, }
static int +networkWaitDadFinish(virNetworkObjPtr network) +{ + virNetworkIpDefPtr ipdef; + virSocketAddrPtr *addrs = NULL; + size_t i; + int ret; + for (i = 0; + (ipdef = virNetworkDefGetIpByIndex(network->def, AF_INET6, i)); + i++) {} + + if (i == 0) + return 0; + if (VIR_ALLOC_N(addrs, i)) + return -1; + + for (i = 0; + (ipdef = virNetworkDefGetIpByIndex(network->def, AF_INET6, i)); + i++) { + addrs[i] = &ipdef->address; + } + + ret = virNetDevWaitDadFinish(addrs, i); + VIR_FREE(addrs); + return ret; +} + +static int networkStartNetworkVirtual(virNetworkDriverStatePtr driver, virNetworkObjPtr network) { @@ -2159,7 +2186,13 @@ networkStartNetworkVirtual(virNetworkDriverStatePtr driver, if (v6present && networkStartRadvd(driver, network) < 0) goto err4;
- /* DAD has happened (dnsmasq waits for it), dnsmasq is now bound to the + /* dnsmasq main process does not wait for DAD to complete, + * so we need to wait for it ourselves. + */ + if (v6present && networkWaitDadFinish(network) < 0) + goto err4; + + /* DAD has happened, dnsmasq is now bound to the * bridge's IPv6 address, so we can now set the dummy tun down. */ if (tapfd >= 0) { diff --git a/src/util/virnetdev.c b/src/util/virnetdev.c index 1e20789..c81342a 100644 --- a/src/util/virnetdev.c +++ b/src/util/virnetdev.c @@ -96,6 +96,7 @@ VIR_LOG_INIT("util.netdev"); # define FEATURE_BIT_IS_SET(blocks, index, field) \ (FEATURE_WORD(blocks, index, field) & FEATURE_FIELD_FLAG(index)) #endif +# define IP_BUF_SIZE 4096
typedef enum { VIR_MCAST_TYPE_INDEX_TOKEN, @@ -1219,6 +1220,103 @@ int virNetDevClearIPAddress(const char *ifname, return ret; }
+/* return whether there is a known address with 'tentative' flag set */ +static int +virNetDevParseDadStatus(struct nlmsghdr *nlh, int len, + virSocketAddrPtr *addrs, size_t count) +{ + struct ifaddrmsg *ifaddrmsg_ptr; + unsigned int ifaddrmsg_len; + struct rtattr *rtattr_ptr; + size_t i; + struct in6_addr *addr; + for (; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) { + if (NLMSG_PAYLOAD(nlh, 0) < sizeof(struct ifaddrmsg)) { + /* Message without payload is the last one. */ + break; + } + + ifaddrmsg_ptr = (struct ifaddrmsg *)NLMSG_DATA(nlh); + if (!(ifaddrmsg_ptr->ifa_flags & IFA_F_TENTATIVE)) { + /* Not tentative: we are not interested in this entry. */ + continue; + } + + ifaddrmsg_len = IFA_PAYLOAD(nlh); + rtattr_ptr = (struct rtattr *) IFA_RTA(ifaddrmsg_ptr); + for (; RTA_OK(rtattr_ptr, ifaddrmsg_len); + rtattr_ptr = RTA_NEXT(rtattr_ptr, ifaddrmsg_len)) { + if (RTA_PAYLOAD(rtattr_ptr) != sizeof(struct in6_addr)) { + /* No address: ignore. */ + continue; + } + + /* We check only known addresses. */ + for (i = 0; i < count; i++) { + addr = &addrs[i]->data.inet6.sin6_addr; + if (!memcmp(addr, RTA_DATA(rtattr_ptr), + sizeof(struct in6_addr))) { + /* We found matching tentative address. */ + return 1; + } + } + } + } + return 0; +} + +/* return after DAD finishes for all known IPv6 addresses or an error */ +int +virNetDevWaitDadFinish(virSocketAddrPtr *addrs, size_t count) +{ + struct nl_msg *nlmsg = NULL; + struct ifaddrmsg ifa; + struct nlmsghdr *resp = NULL; + unsigned int recvbuflen; + int ret = -1, dad = 1; + + if (!(nlmsg = nlmsg_alloc_simple(RTM_GETADDR, + NLM_F_REQUEST | NLM_F_DUMP))) { + virReportOOMError(); + return -1; + } + + memset(&ifa, 0, sizeof(ifa)); + /* DAD is for IPv6 adresses only. */ + ifa.ifa_family = AF_INET6; + if (nlmsg_append(nlmsg, &ifa, sizeof(ifa), NLMSG_ALIGNTO) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("allocated netlink buffer is too small")); + goto cleanup; + } + + /* Periodically query netlink until DAD finishes on all known addresses. */ + while (dad) { + if (virNetlinkCommand(nlmsg, &resp, &recvbuflen, 0, 0, + NETLINK_ROUTE, 0) < 0) + goto cleanup; + + if (virNetlinkGetErrorCode(resp, recvbuflen) < 0) { + virReportError(VIR_ERR_SYSTEM_ERROR, "%s", + _("error reading DAD state information")); + goto cleanup; + } + + /* Parse response. */ + dad = virNetDevParseDadStatus(resp, recvbuflen, addrs, count); + if (dad) + usleep(1000 * 10); + + VIR_FREE(resp); + } + ret = 0; + + cleanup: + VIR_FREE(resp); + nlmsg_free(nlmsg); + return ret; +} + #else /* defined(__linux__) && defined(HAVE_LIBNL) */
int virNetDevSetIPAddress(const char *ifname, @@ -1338,6 +1436,68 @@ int virNetDevClearIPAddress(const char *ifname, return ret; }
+/* return whether there is a known address with 'tentative' flag set */ +static int +virNetDevParseDadStatus(char *outbuf, + virSocketAddrPtr *addrs, size_t count) +{ + virSocketAddr sockaddr; + size_t i, j; + int ret = 0; + char **addr_strings = virStringSplit(outbuf, "\n", 0); + for (j = 0; addr_strings[j] != NULL; ++j) { + if (virSocketParseAddr(strings[j], &sockaddr, AF_INET6) < 0) + continue; + for (i = 0; i < count; i++) { + if (virSocketAddrEqual(addrs[i], &sockaddr)) { + ret = 1; + goto cleanup; + } + } + } + + cleanup: + virStringFreeList(addr_strings); + return ret; +} + +/* return after DAD finishes for all known IPv6 addresses or an error */ +int +virNetDevWaitDadFinish(virSocketAddrPtr *addrs, size_t count) +{ + int ret = -1, dad = 1; + char *outbuf = NULL; + virCommandPtr cmd = NULL; + +# ifdef IP_PATH + while (dad) { + cmd = virCommandNew(POSIX_SHELL); + virCommandAddArgList(cmd, "-c", + IP_PATH "-6 addr show tentative | \ + awk '/inet6/{split(\\$2, a, \\\"/\\\"); \ + print a[1]}'", NULL); + virCommandSetOutputBuffer(cmd, &outbuf); + if (virCommandRun(cmd, NULL)) + goto cleanup; + virCommandFree(cmd); + + dad = virNetDevParseDadStatus(outbuf, addrs, count); + if (dad) + usleep(1000 * 10); + + VIR_FREE(outbuf); + } + ret = 0; +# else + virReportSystemError(ENOSYS, "%s", _("Unable to check IPv6 DAD")); +# endif + + cleanup: + virCommandFree(cmd); + VIR_FREE(outbuf); + return ret; +} + #endif /* defined(__linux__) && defined(HAVE_LIBNL) */
/** diff --git a/src/util/virnetdev.h b/src/util/virnetdev.h index fff881c..a09b3b2 100644 --- a/src/util/virnetdev.h +++ b/src/util/virnetdev.h @@ -105,6 +105,8 @@ int virNetDevClearIPAddress(const char *ifname, ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_RETURN_CHECK; int virNetDevGetIPAddress(const char *ifname, virSocketAddrPtr addr) ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) ATTRIBUTE_RETURN_CHECK; +int virNetDevWaitDadFinish(virSocketAddrPtr *addrs, size_t count) + ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2);
int virNetDevSetMAC(const char *ifname, -- Sincerely, Maxim Perevedentsev
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
-----BEGIN PGP SIGNATURE----- Version: GnuPG v1
iQIcBAEBCAAGBQJVya7kAAoJEBXN2mrhkTWi8IYP/RNoeivBxzgpFxgsAVptURxE zQYzYYBQDZefdvaNX/Zg//OjUchek8bChaMEuk2dIKpA5HWHej4d9+fI3RQjkc2P 5FVv4ng1o8CfdM8MZpalFYkQXPZXIKfzrOpxfcCh9IigOlJWdCJLPn9VnAUAOAul ylxrqo2hedbnYisPcN3Wb5LRguN/RwpNddyLO8PSpP5gt96Z+ykY6F3PLFMOmXQz OFLDwfU6+ppx1vFBlI3wSlN6BG/i5y7m63TzOSUIwOEE9mw4cwoqEKoU8DpgsNEE rjpvNs63wED8kcqVHqxMVt0VHSeADPSlVB2E7CcjqqRWksbHJIInkVJ8adM/ibPK /CbbfHQpaGAH0H3ke+J5P70KA6Sfo8VlDcvo2iAOT6ENuPmrUi7zwdzxuwAXEXtP J/oCILD+FRY00mD2rkZyjdlvfX8zsfiuL6nhOTGmF+OrDDr+qR534NyDJBdvhCfJ Hc75ERfbhKa7yYlt0+pSN5wZtShbKAnjHDKxOKloAu8csakDE9B753PwUbfDQIam vN3chMENeogNiZqsctntA7csOU8IssqU5u2XWMrGhcV4onnbleiNG/Ue/v3CzEzx LlNSqXONETqeU+Z3qIU+rhy42DZL89rdavYR4a5T/asge1fiWjYVdyUg8atkBExt 2h6k2LPUVZNjVgD6p4f/ =edDp -----END PGP SIGNATURE-----
-- Your sincerely, Maxim Perevedentsev

On 08/11/2015 04:14 AM, Simon Kelley wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256
On 10/08/15 22:29, Laine Stump wrote:
This is a fix for commit db488c79173b240459c7754f38c3c6af9b432970 dnsmasq main process exits without waiting for DAD, this is dnsmasq daemon's task. So we periodically poll the kernel using netlink and check whether there are any IPv6 addresses assigned to bridge which have 'tentative' state. After DAD is finished, execution continues. I guess that is what dnsmasq was assumed to do. Since the comments in our code imply that dnsmasq should be waiting for DAD to complete prior to daemonizing, before pushing a fix like
On 08/10/2015 01:08 PM, Maxim Perevedentsev wrote: this I'd like to find out from the dnsmasq folks if we are erroneously relying on nonexistent dnsmasq behavior, or if maybe there is a bug in some version of dnsmasq.
Simon (or other dnsmasq people) - when dnsmasq is run with "enable-ra", does it make sure it completes DAD prior to daemonizing? Or does libvirt need to do this extra polling to assure that DAD has completed? (or maybe there's some other config parameter we need to add?)
Dnsmasq doesn't wait for DAD to complete before returning. Internally, it know is DAD is still happening on an interface, as it needs to delay calling bind() until after it's complete. It would, therefore be relatively simple to add this behaviour, but it's not a complete solution, since new interfaces can appear _after_ dnsmasq has completed start-up.
Having libvirt do its own checks whenever it creates an interface might therefore be a cleaner way of doing things, but I don't have an objection to changing dnsmasq behaviour if there's a good reason why that's not sensible.
No need for dnsmasq to change its behavior. I just wanted to make sure that it was the comment in libvirt that was wrong, and not a regression in dnsmasq. Based on your answer, it appears that this was a misunderstanding by the original author of the libvirt code, so it is something that libvirt needs to fix. Thanks for the quick response!

Such messages do not have NLMSG_ERROR or NLMSG_DONE type but they are valid responses. We test 'multi-partness' by looking for NLM_F_MULTI flag. --- src/util/virnetlink.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/util/virnetlink.c b/src/util/virnetlink.c index 0052ef9..f02bb59 100644 --- a/src/util/virnetlink.c +++ b/src/util/virnetlink.c @@ -386,7 +386,9 @@ virNetlinkGetErrorCode(struct nlmsghdr *resp, unsigned int recvbuflen) break; default: - goto malformed_resp; + /* We allow multipart messages. */ + if (!(resp->nlmsg_flags & NLM_F_MULTI)) + goto malformed_resp; } return result; -- Sincerely, Maxim Perevedentsev

On 08/10/2015 01:08 PM, Maxim Perevedentsev wrote:
Such messages do not have NLMSG_ERROR or NLMSG_DONE type but they are valid responses. We test 'multi-partness' by looking for NLM_F_MULTI flag. --- src/util/virnetlink.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/src/util/virnetlink.c b/src/util/virnetlink.c index 0052ef9..f02bb59 100644 --- a/src/util/virnetlink.c +++ b/src/util/virnetlink.c @@ -386,7 +386,9 @@ virNetlinkGetErrorCode(struct nlmsghdr *resp, unsigned int recvbuflen) break;
default: - goto malformed_resp; + /* We allow multipart messages. */ + if (!(resp->nlmsg_flags & NLM_F_MULTI)) + goto malformed_resp;
1) It's interesting that they don't seem to define what type the message will be in these cases (it's not NLMSG_DONE or NLMSG_ERROR, and the only other standard types are NLMSG_NOOP and NLMSG_OVERRUN.) So what *is* the type in the case of a multipart message. 2) Doesn't the presence of the NLM_F_MULTI flag indicate that we need to look for another packet, rather than just returning? It's been a long time since I looked at the details of the netlink message handling, but won't this "constipate" the socket?

On 08/11/2015 12:15 AM, Laine Stump wrote: > On 08/10/2015 01:08 PM, Maxim Perevedentsev wrote: >> Such messages do not have NLMSG_ERROR or NLMSG_DONE type >> but they are valid responses. We test 'multi-partness' >> by looking for NLM_F_MULTI flag. >> --- >> src/util/virnetlink.c | 4 +++- >> 1 file changed, 3 insertions(+), 1 deletion(-) >> >> diff --git a/src/util/virnetlink.c b/src/util/virnetlink.c >> index 0052ef9..f02bb59 100644 >> --- a/src/util/virnetlink.c >> +++ b/src/util/virnetlink.c >> @@ -386,7 +386,9 @@ virNetlinkGetErrorCode(struct nlmsghdr *resp, unsigned int recvbuflen) >> break; >> >> default: >> - goto malformed_resp; >> + /* We allow multipart messages. */ >> + if (!(resp->nlmsg_flags & NLM_F_MULTI)) >> + goto malformed_resp; > 1) It's interesting that they don't seem to define what type the message > will be in these cases (it's not NLMSG_DONE or NLMSG_ERROR, and the only > other standard types are NLMSG_NOOP and NLMSG_OVERRUN.) So what *is* the > type in the case of a multipart message. I tried it myself (centos7) and the type was 20 (I don't remember, hex or decimal) which is unspecified so we should not rely on particular type. > 2) Doesn't the presence of the NLM_F_MULTI flag indicate that we need to > look for another packet, rather than just returning? It's been a long > time since I looked at the details of the netlink message handling, but > won't this "constipate" the socket? > At the time we get there all parts of multipart message are received. This is handled internally (http://www.infradead.org/~tgr/libnl/doc/core.html). We might parse all parts but I think this is cumbersome and redundant work since it is unlikely for message to be broken in submessages. -- Your sincerely, Maxim Perevedentsev

Hello guys! Just a humble reminder of pending request :-) Any suggestions about patches maybe? On 08/10/2015 08:08 PM, Maxim Perevedentsev wrote:
This is a fix for commit db488c79173b240459c7754f38c3c6af9b432970 dnsmasq main process which is relied on when waiting for DAD to complete exits without actually waiting for DAD. This is dnsmasq daemon's task.
It seems to be a race that DAD finished before dnsmasq main process exited. The above commit needs the execution to block until DAD finishes for bridge IPv6 address because then it closes dummy tap device. Thus we need to ensure this ourselves.
So we periodically poll the kernel using netlink and check whether there are any IPv6 addresses assigned to bridge which have 'tentative' state. After DAD is finished, execution continues. I guess that is what dnsmasq was assumed to do.
We use netlink to dump information about existing IPv6 addresses. Netlink's response is a multi-part message. Unfortunately, the current implementation of virNetlink treats such messages as faulty and throws an error. So the patch 2/2 adds multi-part nelink response support.
Update v2: fixed syntax. Update v3: moved to virnetdev.
Maxim Perevedentsev (2): network: added waiting for DAD to finish for bridge address. Add support for multi-part netlink messages.
src/libvirt_private.syms | 1 + src/network/bridge_driver.c | 35 +++++++++- src/util/virnetdev.c | 160 ++++++++++++++++++++++++++++++++++++++++++++ src/util/virnetdev.h | 2 + src/util/virnetlink.c | 4 +- 5 files changed, 200 insertions(+), 2 deletions(-)
-- Sincerely, Maxim Perevedentsev
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
-- Your sincerely, Maxim Perevedentsev

On 08/17/2015 10:48 AM, Maxim Perevedentsev wrote:
Hello guys!
Just a humble reminder of pending request :-) Any suggestions about patches maybe?
Sorry for the delay. Pretty much everybody (including me) is at KVM Forum this week. I'll try to make some comments on the patches as soon as I can.

Thank you! Waiting in hope you have not forgotten. :) On 08/17/2015 10:11 PM, Laine Stump wrote:
On 08/17/2015 10:48 AM, Maxim Perevedentsev wrote:
Hello guys!
Just a humble reminder of pending request :-) Any suggestions about patches maybe?
Sorry for the delay. Pretty much everybody (including me) is at KVM Forum this week. I'll try to make some comments on the patches as soon as I can.
-- Your sincerely, Maxim Perevedentsev

Hello everyone! I'd like to remind I'm still waiting for comments on these patches. On 08/26/2015 01:28 PM, Maxim Perevedentsev wrote:
Thank you! Waiting in hope you have not forgotten. :)
On 08/17/2015 10:11 PM, Laine Stump wrote:
On 08/17/2015 10:48 AM, Maxim Perevedentsev wrote:
Hello guys!
Just a humble reminder of pending request :-) Any suggestions about patches maybe?
Sorry for the delay. Pretty much everybody (including me) is at KVM Forum this week. I'll try to make some comments on the patches as soon as I can.
-- Your sincerely, Maxim Perevedentsev
participants (3)
-
Laine Stump
-
Maxim Perevedentsev
-
Simon Kelley