[libvirt] [PATCH 0/2] Added waiting for DAD to finish for bridge address.

This is a fix for commit db488c79173b240459c7754f38c3c6af9b432970 dnsmasq main process which is relied on when waiting for DAD to complete exits without actually waiting for DAD. This is dnsmasq daemon's task. It seems to be a race that DAD finished before dnsmasq main process exited. The above commit needs the execution to block until DAD finishes for bridge IPv6 address because then it closes dummy tap device. Thus we need to ensure this ourselves. So we periodically poll the kernel using netlink and check whether there are any IPv6 addresses assigned to bridge which have 'tentative' state. After DAD is finished, execution continues. I guess that is what dnsmasq was assumed to do. We use netlink to dump information about existing IPv6 addresses. Netlink's response is a multi-part message. Unfortunately, the current implementation of virNetlink treats such messages as faulty and throws an error. So the patch 2/2 adds multi-part nelink response support. Maxim Perevedentsev (2): network: added waiting for DAD to finish for bridge address. netlink: add support for multi-part netlink messages. src/network/bridge_driver.c | 109 +++++++++++++++++++++++++++++++++++++++++++- src/util/virnetlink.c | 4 +- 2 files changed, 111 insertions(+), 2 deletions(-) -- Sincerely, Maxim Perevedentsev

This is a fix for commit db488c79173b240459c7754f38c3c6af9b432970 dnsmasq main process exits without waiting for DAD, this is dnsmasq daemon's task. So we periodically poll the kernel using netlink and check whether there are any IPv6 addresses assigned to bridge which have 'tentative' state. After DAD is finished, execution continues. I guess that is what dnsmasq was assumed to do. --- src/network/bridge_driver.c | 109 +++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 108 insertions(+), 1 deletion(-) diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index 3d6721b..6a7a505 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -2025,6 +2025,107 @@ networkAddRouteToBridge(virNetworkObjPtr network, return 0; } +/* return whether DAD is in progress */ +static int +networkParseDadStatus(struct nlmsghdr *nlh, int len, + virNetworkObjPtr network) +{ + struct ifaddrmsg *ifaddrmsg_ptr; + unsigned int ifaddrmsg_len; + struct rtattr *rtattr_ptr; + size_t i; + virNetworkIpDefPtr ipdef; + unsigned char prefix; + struct in6_addr *addr; + for (; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) { + if (NLMSG_PAYLOAD(nlh, 0) < sizeof(struct ifaddrmsg)) + // Message without payload is the last one. + break; + + ifaddrmsg_ptr = (struct ifaddrmsg *)NLMSG_DATA(nlh); + if (!(ifaddrmsg_ptr->ifa_flags & IFA_F_TENTATIVE)) + // Not tentative: we are not interested in this entry. + continue; + + ifaddrmsg_len = IFA_PAYLOAD(nlh); + rtattr_ptr = (struct rtattr *) IFA_RTA(ifaddrmsg_ptr); + for (; RTA_OK(rtattr_ptr, ifaddrmsg_len); + rtattr_ptr = RTA_NEXT(rtattr_ptr, ifaddrmsg_len)) { + if (RTA_PAYLOAD(rtattr_ptr) != sizeof(struct in6_addr)) + // No address: ignore. + continue; + + // We check only known addresses. + for (i = 0; + (ipdef = virNetworkDefGetIpByIndex(network->def, AF_INET6, i)); + i++) { + + prefix = virNetworkIpDefPrefix(ipdef); + addr = &ipdef->address.data.inet6.sin6_addr; + + if (prefix == ifaddrmsg_ptr->ifa_prefixlen && + !memcmp(addr, RTA_DATA(rtattr_ptr), + sizeof(struct in6_addr))) + // We found matching tentative address. + return 1; + } + } + } + return 0; +} + +/* return after DAD finishes for all known IPv6 addresses or an error */ +static int +networkWaitDadFinish(virNetworkObjPtr network) +{ + struct nl_msg *nlmsg = NULL; + struct ifaddrmsg ifa; + struct nlmsghdr *resp = NULL; + unsigned int recvbuflen; + int ret = -1, dad = 1; + + if (!(nlmsg = nlmsg_alloc_simple(RTM_GETADDR, + NLM_F_REQUEST | NLM_F_DUMP))) { + virReportOOMError(); + return -1; + } + + memset(&ifa, 0, sizeof(ifa)); + // DAD is for IPv6 adresses only. + ifa.ifa_family = AF_INET6; + if (nlmsg_append(nlmsg, &ifa, sizeof(ifa), NLMSG_ALIGNTO) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("allocated netlink buffer is too small")); + goto cleanup; + } + + // Periodically query netlink until DAD finishes on all known addresses. + while (dad) { + if (virNetlinkCommand(nlmsg, &resp, &recvbuflen, 0, 0, + NETLINK_ROUTE, 0) < 0) + goto cleanup; + + if (virNetlinkGetErrorCode(resp, recvbuflen) < 0) { + virReportError(VIR_ERR_SYSTEM_ERROR, "%s", + _("error reading DAD state information")); + goto cleanup; + } + + //Parse response. + dad = networkParseDadStatus(resp, recvbuflen, network); + if (dad) + usleep(1000 * 10); + + VIR_FREE(resp); + } + ret = 0; + + cleanup: + VIR_FREE(resp); + nlmsg_free(nlmsg); + return ret; +} + static int networkStartNetworkVirtual(virNetworkDriverStatePtr driver, virNetworkObjPtr network) @@ -2159,7 +2260,13 @@ networkStartNetworkVirtual(virNetworkDriverStatePtr driver, if (v6present && networkStartRadvd(driver, network) < 0) goto err4; - /* DAD has happened (dnsmasq waits for it), dnsmasq is now bound to the + /* dnsmasq main process does not wait for DAD to complete, + * so we need to wait for it ourselves. + */ + if (v6present && networkWaitDadFinish(network) < 0) + goto err4; + + /* DAD has happened, dnsmasq is now bound to the * bridge's IPv6 address, so we can now set the dummy tun down. */ if (tapfd >= 0) { -- Sincerely, Maxim Perevedentsev

On 07/31/2015 07:35 PM, Maxim Perevedentsev wrote:
This is a fix for commit db488c79173b240459c7754f38c3c6af9b432970 dnsmasq main process exits without waiting for DAD, this is dnsmasq daemon's task. So we periodically poll the kernel using netlink and check whether there are any IPv6 addresses assigned to bridge which have 'tentative' state. After DAD is finished, execution continues. I guess that is what dnsmasq was assumed to do.
There are some comments on style, but unfortunately I'm not very confident in this part of libvirt, so it would be great if someone else reviewed these patches.
--- src/network/bridge_driver.c | 109 +++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 108 insertions(+), 1 deletion(-)
diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index 3d6721b..6a7a505 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -2025,6 +2025,107 @@ networkAddRouteToBridge(virNetworkObjPtr network, return 0; }
+/* return whether DAD is in progress */ Could you, please, specify, which value mean 'DAD is in progress', it's unclear from the comment? +static int +networkParseDadStatus(struct nlmsghdr *nlh, int len, + virNetworkObjPtr network) +{ + struct ifaddrmsg *ifaddrmsg_ptr; + unsigned int ifaddrmsg_len; + struct rtattr *rtattr_ptr; + size_t i; + virNetworkIpDefPtr ipdef; + unsigned char prefix; + struct in6_addr *addr; + for (; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) { + if (NLMSG_PAYLOAD(nlh, 0) < sizeof(struct ifaddrmsg)) + // Message without payload is the last one.
Please, change comments to C89-style, /* */. Also add braces, libvirt's coding style requires putting braces around multi-line body even if it's a single line of code with a comment.
+ break; + + ifaddrmsg_ptr = (struct ifaddrmsg *)NLMSG_DATA(nlh); + if (!(ifaddrmsg_ptr->ifa_flags & IFA_F_TENTATIVE)) + // Not tentative: we are not interested in this entry. + continue;
The same, braces and comment.
+ + ifaddrmsg_len = IFA_PAYLOAD(nlh); + rtattr_ptr = (struct rtattr *) IFA_RTA(ifaddrmsg_ptr); + for (; RTA_OK(rtattr_ptr, ifaddrmsg_len); + rtattr_ptr = RTA_NEXT(rtattr_ptr, ifaddrmsg_len)) { + if (RTA_PAYLOAD(rtattr_ptr) != sizeof(struct in6_addr)) + // No address: ignore. + continue; + Braces and comment, also please, fix all such things in code below.
+ // We check only known addresses. + for (i = 0; + (ipdef = virNetworkDefGetIpByIndex(network->def, AF_INET6, i)); + i++) { + + prefix = virNetworkIpDefPrefix(ipdef); + addr = &ipdef->address.data.inet6.sin6_addr; + + if (prefix == ifaddrmsg_ptr->ifa_prefixlen && + !memcmp(addr, RTA_DATA(rtattr_ptr), + sizeof(struct in6_addr))) + // We found matching tentative address. + return 1; + } + } + } + return 0; +} + +/* return after DAD finishes for all known IPv6 addresses or an error */ +static int +networkWaitDadFinish(virNetworkObjPtr network)
I'd put this function to src/util/virnetlink.c
+{ + struct nl_msg *nlmsg = NULL; + struct ifaddrmsg ifa; + struct nlmsghdr *resp = NULL; + unsigned int recvbuflen; + int ret = -1, dad = 1; + + if (!(nlmsg = nlmsg_alloc_simple(RTM_GETADDR, + NLM_F_REQUEST | NLM_F_DUMP))) {
+ virReportOOMError(); + return -1; + } + + memset(&ifa, 0, sizeof(ifa)); + // DAD is for IPv6 adresses only. + ifa.ifa_family = AF_INET6; + if (nlmsg_append(nlmsg, &ifa, sizeof(ifa), NLMSG_ALIGNTO) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("allocated netlink buffer is too small"));
+ goto cleanup; + } + + // Periodically query netlink until DAD finishes on all known addresses. + while (dad) { + if (virNetlinkCommand(nlmsg, &resp, &recvbuflen, 0, 0, + NETLINK_ROUTE, 0) < 0) + goto cleanup; + + if (virNetlinkGetErrorCode(resp, recvbuflen) < 0) { + virReportError(VIR_ERR_SYSTEM_ERROR, "%s", + _("error reading DAD state information"));
style: if (!(nlmsg = nlmsg_alloc_simple(RTM_GETADDR, NLM_F_REQUEST | NLM_F_DUMP))) { please, align second line to the brace the same
+ goto cleanup; + } + + //Parse response. + dad = networkParseDadStatus(resp, recvbuflen, network); + if (dad) + usleep(1000 * 10); What if an interface will remain in transient state forever?
+ + VIR_FREE(resp); + } + ret = 0; + + cleanup: + VIR_FREE(resp); + nlmsg_free(nlmsg); + return ret; +} + static int networkStartNetworkVirtual(virNetworkDriverStatePtr driver, virNetworkObjPtr network) @@ -2159,7 +2260,13 @@ networkStartNetworkVirtual(virNetworkDriverStatePtr driver, if (v6present && networkStartRadvd(driver, network) < 0) goto err4;
- /* DAD has happened (dnsmasq waits for it), dnsmasq is now bound to the + /* dnsmasq main process does not wait for DAD to complete, + * so we need to wait for it ourselves. + */ + if (v6present && networkWaitDadFinish(network) < 0) + goto err4; + + /* DAD has happened, dnsmasq is now bound to the * bridge's IPv6 address, so we can now set the dummy tun down. */ if (tapfd >= 0) { -- Sincerely, Maxim Perevedentsev
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On 08/04/2015 06:18 PM, Dmitry Guryanov wrote:
On 07/31/2015 07:35 PM, Maxim Perevedentsev wrote:
+/* return after DAD finishes for all known IPv6 addresses or an error */ +static int +networkWaitDadFinish(virNetworkObjPtr network)
I'd put this function to src/util/virnetlink.c Then we should create list of sockaddr_in6 to pass addresses and netmasks to virnetlink.c since it should not include virNetwork* structures. This will be double work I think.
+ //Parse response. + dad = networkParseDadStatus(resp, recvbuflen, network); + if (dad) + usleep(1000 * 10); What if an interface will remain in transient state forever? I think this is the responsibility of kernel to update DAD state to 'success' or 'failed' to make interface leave transient state. In addition, this is what was expected from dnsmasq in above commit.
-- Your sincerely, Maxim Perevedentsev

Such messages do not have NLMSG_ERROR or NLMSG_DONE type but they are valid responses. We test 'multi-partness' by looking for NLM_F_MULTI flag. --- src/util/virnetlink.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/util/virnetlink.c b/src/util/virnetlink.c index 0052ef9..105a604 100644 --- a/src/util/virnetlink.c +++ b/src/util/virnetlink.c @@ -386,7 +386,9 @@ virNetlinkGetErrorCode(struct nlmsghdr *resp, unsigned int recvbuflen) break; default: - goto malformed_resp; + // We allow multipart messages. + if (!(resp->nlmsg_flags & NLM_F_MULTI)) + goto malformed_resp; } return result; -- Sincerely, Maxim Perevedentsev

On 07/31/2015 07:35 PM, Maxim Perevedentsev wrote:
Such messages do not have NLMSG_ERROR or NLMSG_DONE type but they are valid responses. We test 'multi-partness' by looking for NLM_F_MULTI flag.
This patch looks OK to me except for comment style.
--- src/util/virnetlink.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/src/util/virnetlink.c b/src/util/virnetlink.c index 0052ef9..105a604 100644 --- a/src/util/virnetlink.c +++ b/src/util/virnetlink.c @@ -386,7 +386,9 @@ virNetlinkGetErrorCode(struct nlmsghdr *resp, unsigned int recvbuflen) break;
default: - goto malformed_resp; + // We allow multipart messages.
+ if (!(resp->nlmsg_flags & NLM_F_MULTI)) + goto malformed_resp; }
return result; -- Sincerely, Maxim Perevedentsev
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
participants (2)
-
Dmitry Guryanov
-
Maxim Perevedentsev