[libvirt] [PATCH 0/4] Prevent loosing IPv6 routes due to forwarding

Hi all, When enabling IPv6 forwarding on hosts getting Router Advertised routes, the host looses the RA routes. To prevent this, check if the host has such routes. In case there are some, check that the accept_ra is set to 2 before bringing the network up. Cédric Bosdonnat (4): util: extract the request sending code from virNetlinkCommand() util: add virNetlinkDumpCommand() bridge_driver.c: more uses of SYSCTL_PATH network: check accept_ra before enabling ipv6 forwarding src/libvirt_private.syms | 1 + src/network/bridge_driver.c | 187 +++++++++++++++++++++++++++++++++++++++++--- src/util/virnetlink.c | 145 +++++++++++++++++++++++++--------- src/util/virnetlink.h | 9 +++ 4 files changed, 298 insertions(+), 44 deletions(-) -- 2.11.0

Allow to reuse as much as possible from virNetlinkCommand(). This comment prepares for the introduction of virNetlindDumpCommand() only differing by how it handles the responses. --- src/util/virnetlink.c | 90 +++++++++++++++++++++++++++++++-------------------- 1 file changed, 55 insertions(+), 35 deletions(-) diff --git a/src/util/virnetlink.c b/src/util/virnetlink.c index a5d10fa8e..5fb49251c 100644 --- a/src/util/virnetlink.c +++ b/src/util/virnetlink.c @@ -209,61 +209,38 @@ virNetlinkCreateSocket(int protocol) goto cleanup; } - -/** - * virNetlinkCommand: - * @nlmsg: pointer to netlink message - * @respbuf: pointer to pointer where response buffer will be allocated - * @respbuflen: pointer to integer holding the size of the response buffer - * on return of the function. - * @src_pid: the pid of the process to send a message - * @dst_pid: the pid of the process to talk to, i.e., pid = 0 for kernel - * @protocol: netlink protocol - * @groups: the group identifier - * - * Send the given message to the netlink layer and receive response. - * Returns 0 on success, -1 on error. In case of error, no response - * buffer will be returned. - */ -int virNetlinkCommand(struct nl_msg *nl_msg, - struct nlmsghdr **resp, unsigned int *respbuflen, - uint32_t src_pid, uint32_t dst_pid, - unsigned int protocol, unsigned int groups) +static virNetlinkHandle * +virNetlinkDoCommand(struct nl_msg *nl_msg, uint32_t src_pid, + struct sockaddr_nl nladdr, + unsigned int protocol, unsigned int groups) { - int ret = -1; - struct sockaddr_nl nladdr = { - .nl_family = AF_NETLINK, - .nl_pid = dst_pid, - .nl_groups = 0, - }; ssize_t nbytes; - struct pollfd fds[1]; int fd; int n; - struct nlmsghdr *nlmsg = nlmsg_hdr(nl_msg); virNetlinkHandle *nlhandle = NULL; - int len = 0; + struct pollfd fds[1]; + struct nlmsghdr *nlmsg = nlmsg_hdr(nl_msg); if (protocol >= MAX_LINKS) { virReportSystemError(EINVAL, _("invalid protocol argument: %d"), protocol); - goto cleanup; + goto error; } if (!(nlhandle = virNetlinkCreateSocket(protocol))) - goto cleanup; + goto error; fd = nl_socket_get_fd(nlhandle); if (fd < 0) { virReportSystemError(errno, "%s", _("cannot get netlink socket fd")); - goto cleanup; + goto error; } if (groups && nl_socket_add_membership(nlhandle, groups) < 0) { virReportSystemError(errno, "%s", _("cannot add netlink membership")); - goto cleanup; + goto error; } nlmsg_set_dst(nl_msg, &nladdr); @@ -274,10 +251,11 @@ int virNetlinkCommand(struct nl_msg *nl_msg, if (nbytes < 0) { virReportSystemError(errno, "%s", _("cannot send to netlink socket")); - goto cleanup; + goto error; } memset(fds, 0, sizeof(fds)); + fds[0].fd = fd; fds[0].events = POLLIN; @@ -289,9 +267,51 @@ int virNetlinkCommand(struct nl_msg *nl_msg, if (n == 0) virReportSystemError(ETIMEDOUT, "%s", _("no valid netlink response was received")); - goto cleanup; } + return nlhandle; + + error: + virNetlinkFree(nlhandle); + return NULL; +} + +/** + * virNetlinkCommand: + * @nlmsg: pointer to netlink message + * @respbuf: pointer to pointer where response buffer will be allocated + * @respbuflen: pointer to integer holding the size of the response buffer + * on return of the function. + * @src_pid: the pid of the process to send a message + * @dst_pid: the pid of the process to talk to, i.e., pid = 0 for kernel + * @protocol: netlink protocol + * @groups: the group identifier + * + * Send the given message to the netlink layer and receive response. + * Returns 0 on success, -1 on error. In case of error, no response + * buffer will be returned. + */ +int virNetlinkCommand(struct nl_msg *nl_msg, + struct nlmsghdr **resp, unsigned int *respbuflen, + uint32_t src_pid, uint32_t dst_pid, + unsigned int protocol, unsigned int groups) +{ + int ret = -1; + struct sockaddr_nl nladdr = { + .nl_family = AF_NETLINK, + .nl_pid = dst_pid, + .nl_groups = 0, + }; + struct pollfd fds[1]; + virNetlinkHandle *nlhandle = NULL; + int len = 0; + + memset(fds, 0, sizeof(fds)); + + if (!(nlhandle = virNetlinkDoCommand(nl_msg, src_pid, nladdr, + protocol, groups))) + goto cleanup; + len = nl_recv(nlhandle, &nladdr, (unsigned char **)resp, NULL); if (len == 0) { virReportError(VIR_ERR_INTERNAL_ERROR, "%s", -- 2.11.0

On 03/03/2017 10:00 AM, Cédric Bosdonnat wrote:
Allow to reuse as much as possible from virNetlinkCommand(). This comment prepares for the introduction of virNetlindDumpCommand() only differing by how it handles the responses. --- src/util/virnetlink.c | 90 +++++++++++++++++++++++++++++++-------------------- 1 file changed, 55 insertions(+), 35 deletions(-)
diff --git a/src/util/virnetlink.c b/src/util/virnetlink.c index a5d10fa8e..5fb49251c 100644 --- a/src/util/virnetlink.c +++ b/src/util/virnetlink.c @@ -209,61 +209,38 @@ virNetlinkCreateSocket(int protocol) goto cleanup; }
- -/** - * virNetlinkCommand: - * @nlmsg: pointer to netlink message - * @respbuf: pointer to pointer where response buffer will be allocated - * @respbuflen: pointer to integer holding the size of the response buffer - * on return of the function. - * @src_pid: the pid of the process to send a message - * @dst_pid: the pid of the process to talk to, i.e., pid = 0 for kernel - * @protocol: netlink protocol - * @groups: the group identifier - * - * Send the given message to the netlink layer and receive response. - * Returns 0 on success, -1 on error. In case of error, no response - * buffer will be returned. - */ -int virNetlinkCommand(struct nl_msg *nl_msg, - struct nlmsghdr **resp, unsigned int *respbuflen, - uint32_t src_pid, uint32_t dst_pid, - unsigned int protocol, unsigned int groups) +static virNetlinkHandle * +virNetlinkDoCommand(struct nl_msg *nl_msg, uint32_t src_pid,
Instead of virNetlinkDoCommand(), how about virNetlinkSendRequest() (or virNetlinkSendMessage())? Aside from that, it looks good. Possibly it could be tweaked a bit to make it usable in virNetlinkDumpLink() as well. Similarly another related cleanup is that all the callers to virNetlinkCommand() end up doing some version of virNetlinkGetErrorCode() - they could be merged. But all that (aside from renaming the function) is independent of this patch - ACK to this with a name change.

virNetlinkCommand() processes only one response message, while some netlink commands like routes dumping need to process several ones. Add virNetlinkDumpCommand() as a virNetlinkCommand() sister. --- src/libvirt_private.syms | 1 + src/util/virnetlink.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++ src/util/virnetlink.h | 9 ++++++++ 3 files changed, 65 insertions(+) diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index bce0487ab..71143851c 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2117,6 +2117,7 @@ virNetDevVPortProfileOpTypeToString; # util/virnetlink.h virNetlinkCommand; virNetlinkDelLink; +virNetlinkDumpCommand; virNetlinkDumpLink; virNetlinkEventAddClient; virNetlinkEventRemoveClient; diff --git a/src/util/virnetlink.c b/src/util/virnetlink.c index 5fb49251c..4747ba5a4 100644 --- a/src/util/virnetlink.c +++ b/src/util/virnetlink.c @@ -335,6 +335,49 @@ int virNetlinkCommand(struct nl_msg *nl_msg, return ret; } +int +virNetlinkDumpCommand(struct nl_msg *nl_msg, + virNetlinkDumpCallback callback, + uint32_t src_pid, uint32_t dst_pid, + unsigned int protocol, unsigned int groups, + void *opaque) +{ + int ret = -1; + bool end = false; + int len = 0; + struct nlmsghdr *resp = NULL; + struct nlmsghdr *msg = NULL; + + struct sockaddr_nl nladdr = { + .nl_family = AF_NETLINK, + .nl_pid = dst_pid, + .nl_groups = 0, + }; + virNetlinkHandle *nlhandle = NULL; + + if (!(nlhandle = virNetlinkDoCommand(nl_msg, src_pid, nladdr, + protocol, groups))) + goto cleanup; + + while (!end) { + len = nl_recv(nlhandle, &nladdr, (unsigned char **)&resp, NULL); + + for (msg = resp; NLMSG_OK(msg, len); msg = NLMSG_NEXT(msg, len)) { + if (msg->nlmsg_type == NLMSG_DONE) + end = true; + + if (callback(msg, opaque) < 0) + goto cleanup; + } + } + + ret = 0; + + cleanup: + virNetlinkFree(nlhandle); + return ret; +} + /** * virNetlinkDumpLink: @@ -1062,6 +1105,18 @@ int virNetlinkCommand(struct nl_msg *nl_msg ATTRIBUTE_UNUSED, return -1; } +int +virNetlinkDumpCommand(struct nl_msg *nl_msg ATTRIBUTE_UNUSED, + virNetlinkDumpCallback callback ATTRIBUTE_UNUSED, + uint32_t src_pid ATTRIBUTE_UNUSED, + uint32_t dst_pid ATTRIBUTE_UNUSED, + unsigned int protocol ATTRIBUTE_UNUSED, + unsigned int groups ATTRIBUTE_UNUSED, + void *opaque ATTRIBUTE_UNUSED) +{ + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", _(unsupported)); + return -1; +} int virNetlinkDumpLink(const char *ifname ATTRIBUTE_UNUSED, diff --git a/src/util/virnetlink.h b/src/util/virnetlink.h index 11e817c82..088b01343 100644 --- a/src/util/virnetlink.h +++ b/src/util/virnetlink.h @@ -52,6 +52,15 @@ int virNetlinkCommand(struct nl_msg *nl_msg, uint32_t src_pid, uint32_t dst_pid, unsigned int protocol, unsigned int groups); +typedef int (*virNetlinkDumpCallback)(const struct nlmsghdr *resp, + void *data); + +int virNetlinkDumpCommand(struct nl_msg *nl_msg, + virNetlinkDumpCallback callback, + uint32_t src_pid, uint32_t dst_pid, + unsigned int protocol, unsigned int groups, + void *opaque); + typedef int (*virNetlinkDelLinkFallback)(const char *ifname); int virNetlinkDelLink(const char *ifname, virNetlinkDelLinkFallback fallback); -- 2.11.0

On 03/03/2017 10:00 AM, Cédric Bosdonnat wrote:
virNetlinkCommand() processes only one response message, while some netlink commands like routes dumping need to process several ones. Add virNetlinkDumpCommand() as a virNetlinkCommand() sister. --- src/libvirt_private.syms | 1 + src/util/virnetlink.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++ src/util/virnetlink.h | 9 ++++++++ 3 files changed, 65 insertions(+)
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index bce0487ab..71143851c 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -2117,6 +2117,7 @@ virNetDevVPortProfileOpTypeToString; # util/virnetlink.h virNetlinkCommand; virNetlinkDelLink; +virNetlinkDumpCommand; virNetlinkDumpLink; virNetlinkEventAddClient; virNetlinkEventRemoveClient; diff --git a/src/util/virnetlink.c b/src/util/virnetlink.c index 5fb49251c..4747ba5a4 100644 --- a/src/util/virnetlink.c +++ b/src/util/virnetlink.c @@ -335,6 +335,49 @@ int virNetlinkCommand(struct nl_msg *nl_msg, return ret; }
+int +virNetlinkDumpCommand(struct nl_msg *nl_msg, + virNetlinkDumpCallback callback, + uint32_t src_pid, uint32_t dst_pid, + unsigned int protocol, unsigned int groups, + void *opaque) +{ + int ret = -1; + bool end = false; + int len = 0; + struct nlmsghdr *resp = NULL; + struct nlmsghdr *msg = NULL; + + struct sockaddr_nl nladdr = { + .nl_family = AF_NETLINK, + .nl_pid = dst_pid, + .nl_groups = 0, + }; + virNetlinkHandle *nlhandle = NULL; + + if (!(nlhandle = virNetlinkDoCommand(nl_msg, src_pid, nladdr, + protocol, groups))) + goto cleanup; + + while (!end) { + len = nl_recv(nlhandle, &nladdr, (unsigned char **)&resp, NULL); + + for (msg = resp; NLMSG_OK(msg, len); msg = NLMSG_NEXT(msg, len)) { + if (msg->nlmsg_type == NLMSG_DONE) + end = true; + + if (callback(msg, opaque) < 0) + goto cleanup;
This is relying on the callback function to throw an error in order to get us out of the loop (assuming we never get an NLMSG_DONE). I don't see the callback that's written in patch 4 doing anything along the lines of virNetlinkGetErrorCode() though. Maybe you should call that function from here prior to the callback - that would eliminate the need to separately do that level of error checking in all the future consumers of this new function. Otherwise I think it looks okay (I can't claim to have great knowledge of multipart netlink messages though, so...)

Replace a few occurences of /proc/sys by the corresponding macro defined a few lines after: SYSCTL_PATH --- src/network/bridge_driver.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index c5ec2823d..3f6561055 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -85,6 +85,8 @@ */ #define VIR_NETWORK_DHCP_LEASE_FILE_SIZE_MAX (32 * 1024 * 1024) +#define SYSCTL_PATH "/proc/sys" + VIR_LOG_INIT("network.bridge_driver"); static virNetworkDriverStatePtr network_driver; @@ -2080,15 +2082,14 @@ networkEnableIPForwarding(bool enableIPv4, bool enableIPv6) &enabled, sizeof(enabled)); #else if (enableIPv4) - ret = virFileWriteStr("/proc/sys/net/ipv4/ip_forward", "1\n", 0); + ret = virFileWriteStr(SYSCTL_PATH "/net/ipv4/ip_forward", "1\n", 0); if (enableIPv6 && ret == 0) - ret = virFileWriteStr("/proc/sys/net/ipv6/conf/all/forwarding", "1\n", 0); + ret = virFileWriteStr(SYSCTL_PATH "/net/ipv6/conf/all/forwarding", "1\n", 0); + #endif return ret; } -#define SYSCTL_PATH "/proc/sys" - static int networkSetIPv6Sysctls(virNetworkObjPtr network) { -- 2.11.0

On 03/03/2017 10:00 AM, Cédric Bosdonnat wrote:
Replace a few occurences of /proc/sys by the corresponding macro defined a few lines after: SYSCTL_PATH --- src/network/bridge_driver.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index c5ec2823d..3f6561055 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -85,6 +85,8 @@ */ #define VIR_NETWORK_DHCP_LEASE_FILE_SIZE_MAX (32 * 1024 * 1024)
+#define SYSCTL_PATH "/proc/sys" + VIR_LOG_INIT("network.bridge_driver");
static virNetworkDriverStatePtr network_driver; @@ -2080,15 +2082,14 @@ networkEnableIPForwarding(bool enableIPv4, bool enableIPv6) &enabled, sizeof(enabled)); #else if (enableIPv4) - ret = virFileWriteStr("/proc/sys/net/ipv4/ip_forward", "1\n", 0); + ret = virFileWriteStr(SYSCTL_PATH "/net/ipv4/ip_forward", "1\n", 0); if (enableIPv6 && ret == 0) - ret = virFileWriteStr("/proc/sys/net/ipv6/conf/all/forwarding", "1\n", 0); + ret = virFileWriteStr(SYSCTL_PATH "/net/ipv6/conf/all/forwarding", "1\n", 0); + #endif return ret; }
-#define SYSCTL_PATH "/proc/sys" - static int networkSetIPv6Sysctls(virNetworkObjPtr network) {
Sure. ACK.

When enabling IPv6 on all interfaces, we may get the host Router Advertisement routes discarded. To avoid this, the user needs to set accept_ra to 2 for the interfaces with such routes. See https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt on this topic. To avoid user mistakenly loosing routes on their hosts, check accept_ra values before enabling IPv6 forwarding. If a RA route is detected, but neither the corresponding device nor global accept_ra is set to 2, the network will fail to start. --- src/network/bridge_driver.c | 178 ++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 173 insertions(+), 5 deletions(-) diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index 3f6561055..1ac837f7f 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -61,6 +61,7 @@ #include "virlog.h" #include "virdnsmasq.h" #include "configmake.h" +#include "virnetlink.h" #include "virnetdev.h" #include "virnetdevip.h" #include "virnetdevbridge.h" @@ -2067,6 +2068,168 @@ networkReloadFirewallRules(virNetworkDriverStatePtr driver) NULL); } +static int +networkGetAcceptRA(const char *ifname) +{ + char *path = NULL; + char *buf = NULL; + char *suffix; + int accept_ra = -1; + + if (virAsprintf(&path, SYSCTL_PATH "/net/ipv6/conf/%s/accept_ra", + ifname ? ifname : "all") < 0) + goto cleanup; + + if ((virFileReadAll(path, 512, &buf) < 0) || + (virStrToLong_i(buf, &suffix, 10, &accept_ra) < 0)) + goto cleanup; + + cleanup: + VIR_FREE(path); + VIR_FREE(buf); + + return accept_ra; +} + +#if defined(__linux__) && defined(HAVE_LIBNL) +struct networkIPv6CheckData { + bool hasRARoutes; + + /* Devices with conflicting accept_ra */ + char **devices; + size_t ndevices; +}; + +static int +networkCheckIPv6ForwardingCallback(const struct nlmsghdr *resp, + void *opaque) +{ + struct rtmsg *rtmsg = NLMSG_DATA(resp); + int accept_ra = -1; + struct rtattr *rta; + char *ifname = NULL; + char name[IFNAMSIZ]; + struct networkIPv6CheckData *data = opaque; + int ret = 0; + int len = RTM_PAYLOAD(resp); + int oif = -1; + + /* Ignore messages other than route ones */ + if (resp->nlmsg_type != RTM_NEWROUTE) + return ret; + + memset(&name, 0, sizeof(name)); + + /* Extract a few attributes */ + for (rta = RTM_RTA(rtmsg); RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) { + switch (rta->rta_type) { + case RTA_OIF: + oif = *(int *)RTA_DATA(rta); + if (!if_indextoname(oif, name) || VIR_STRDUP(ifname, name) < 0) + VIR_DEBUG("Failed to convert OIF to a device name"); + break; + } + } + + /* No need to do anything else for non RA routes */ + if (rtmsg->rtm_protocol != RTPROT_RA) + goto cleanup; + + data->hasRARoutes = true; + + /* Check the accept_ra value for the interface */ + accept_ra = networkGetAcceptRA(ifname); + VIR_DEBUG("Checking route for device %s, accept_ra: %d", ifname, accept_ra); + + if (accept_ra != 2 && VIR_APPEND_ELEMENT(data->devices, data->ndevices, ifname) < 0) + ret = -1; + + cleanup: + VIR_FREE(ifname); + return ret; +} + +static bool +networkCheckIPv6Forwarding(void) +{ + struct nl_msg *nlmsg = NULL; + bool valid = false; + struct rtgenmsg genmsg; + size_t i; + struct networkIPv6CheckData data = { + .hasRARoutes = false, + .devices = NULL, + .ndevices = 0 + }; + + + /* Prepare the request message */ + if (!(nlmsg = nlmsg_alloc_simple(RTM_GETROUTE, + NLM_F_REQUEST | NLM_F_DUMP))) { + virReportOOMError(); + goto cleanup; + } + + memset(&genmsg, 0, sizeof(genmsg)); + genmsg.rtgen_family = AF_INET6; + + if (nlmsg_append(nlmsg, &genmsg, sizeof(genmsg), NLMSG_ALIGNTO) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("allocated netlink buffer is too small")); + goto cleanup; + } + + /* Send the request and loop over the responses */ + if (virNetlinkDumpCommand(nlmsg, networkCheckIPv6ForwardingCallback, 0, 0, + NETLINK_ROUTE, 0, &data) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to loop over routes")); + goto cleanup; + } + + valid = !data.hasRARoutes || data.ndevices == 0; + + /* Check the global accept_ra if at least one isn't set on a + per-device basis */ + if (!valid && data.hasRARoutes) { + int accept_ra = networkGetAcceptRA(NULL); + valid = accept_ra == 2; + VIR_DEBUG("Checked global accept_ra: %d", accept_ra); + } + + if (!valid) { + virBuffer buf = VIR_BUFFER_INITIALIZER; + for (i = 0; i < data.ndevices; i++) { + virBufferAdd(&buf, data.devices[i], -1); + if (i < data.ndevices - 1) + virBufferAddLit(&buf, ", "); + } + + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Check the host setup: enabling IPv6 forwarding with " + "RA routes without accept_ra set to 2 is likely to cause " + "routes loss. Interfaces to look at: %s"), + virBufferCurrentContent(&buf)); + virBufferFreeAndReset(&buf); + } + + cleanup: + nlmsg_free(nlmsg); + for (i = 0; i < data.ndevices; i++) + VIR_FREE(data.devices[i]); + return valid; +} + +#else /* defined(__linux__) && defined(HAVE_LIBNL) */ + +static bool +networkCheckIPv6Forwarding(void) +{ + VIR_WARN("built without libnl: unable to check if IPv6 forwarding can be safely enabled"); + return true; +} +#endif /* defined(__linux__) && defined(HAVE_LIBNL) */ + /* Enable IP Forwarding. Return 0 for success, -1 for failure. */ static int networkEnableIPForwarding(bool enableIPv4, bool enableIPv6) @@ -2377,11 +2540,16 @@ networkStartNetworkVirtual(virNetworkDriverStatePtr driver, } /* If forward.type != NONE, turn on global IP forwarding */ - if (network->def->forward.type != VIR_NETWORK_FORWARD_NONE && - networkEnableIPForwarding(v4present, v6present) < 0) { - virReportSystemError(errno, "%s", - _("failed to enable IP forwarding")); - goto err3; + if (network->def->forward.type != VIR_NETWORK_FORWARD_NONE) { + if (!networkCheckIPv6Forwarding()) + goto err3; /* Precise error message already provided */ + + + if (networkEnableIPForwarding(v4present, v6present) < 0) { + virReportSystemError(errno, "%s", + _("failed to enable IP forwarding")); + goto err3; + } } -- 2.11.0

On 03/03/2017 10:00 AM, Cédric Bosdonnat wrote:
When enabling IPv6 on all interfaces, we may get the host Router Advertisement routes discarded. To avoid this, the user needs to set accept_ra to 2 for the interfaces with such routes.
See https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt on this topic.
To avoid user mistakenly loosing routes on their hosts, check
s/loosing/losing/
accept_ra values before enabling IPv6 forwarding. If a RA route is detected, but neither the corresponding device nor global accept_ra is set to 2, the network will fail to start.
Is there any safe way to deal with it automatically? (apologies if I missed the end of any discussion - I recall your initial inquiry on the subject, but nothing after that)
--- src/network/bridge_driver.c | 178 ++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 173 insertions(+), 5 deletions(-)
diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index 3f6561055..1ac837f7f 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -61,6 +61,7 @@ #include "virlog.h" #include "virdnsmasq.h" #include "configmake.h" +#include "virnetlink.h" #include "virnetdev.h" #include "virnetdevip.h" #include "virnetdevbridge.h" @@ -2067,6 +2068,168 @@ networkReloadFirewallRules(virNetworkDriverStatePtr driver) NULL); }
+static int +networkGetAcceptRA(const char *ifname) +{ + char *path = NULL; + char *buf = NULL; + char *suffix; + int accept_ra = -1; + + if (virAsprintf(&path, SYSCTL_PATH "/net/ipv6/conf/%s/accept_ra", + ifname ? ifname : "all") < 0) + goto cleanup; + + if ((virFileReadAll(path, 512, &buf) < 0) || + (virStrToLong_i(buf, &suffix, 10, &accept_ra) < 0)) + goto cleanup; + + cleanup: + VIR_FREE(path); + VIR_FREE(buf); + + return accept_ra; +}
Although there is some sysfs-specific stuff in the network directory already, I think I'd prefer to have this in util/virnetdevip.c.
+ +#if defined(__linux__) && defined(HAVE_LIBNL) +struct networkIPv6CheckData { + bool hasRARoutes; + + /* Devices with conflicting accept_ra */ + char **devices; + size_t ndevices; +}; + +static int +networkCheckIPv6ForwardingCallback(const struct nlmsghdr *resp, + void *opaque) +{ + struct rtmsg *rtmsg = NLMSG_DATA(resp); + int accept_ra = -1; + struct rtattr *rta; + char *ifname = NULL; + char name[IFNAMSIZ]; + struct networkIPv6CheckData *data = opaque; + int ret = 0;
I understand why you defaulted ret = 0 instead of our normal -1. It does force me to actually think about each goto cleanup though...
+ int len = RTM_PAYLOAD(resp); + int oif = -1; + + /* Ignore messages other than route ones */ + if (resp->nlmsg_type != RTM_NEWROUTE) + return ret; + + memset(&name, 0, sizeof(name)); + + /* Extract a few attributes */ + for (rta = RTM_RTA(rtmsg); RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) { + switch (rta->rta_type) { + case RTA_OIF: + oif = *(int *)RTA_DATA(rta); + if (!if_indextoname(oif, name) || VIR_STRDUP(ifname, name) < 0) + VIR_DEBUG("Failed to convert OIF to a device name");
Do you really want to stop looking, but also still return success? Maybe it should fail? Or maybe it's okay to succeed, but the VIR_DEBUG() should be a VIR_WARN. (I'm not sure, that's why I'm asking :-) (The one situation where I've seen an ifindex from the kernel fail to resolve into an interface name was when something "went wrong" in the kernel and macvtap interfaces showed themselves as attached to an ifindex that was no longer visible as a named network device. We never did figure out why (it only happened on a host that was part of a financial hosting service, so our customer wasn't allowed to give us access, and also couldn't install debug builds, so our troubleshooting capabilities were *very* limited), but the one certain thing is that the networking that involved these "nameless" interfaces didn't work (although other interfaces on the system did). Also, if VIR_STRDUP() fails you will have logged an error but still returned success. Maybe you should check that separately and return error (assuming Finally, maybe we should wrap if_indextoname() in a virNetDevGetName() (...FromIndex() ?) both for symmetry with virNetDevGetIndex() and for possible future use by other functions.
+ break; + } + } + + /* No need to do anything else for non RA routes */ + if (rtmsg->rtm_protocol != RTPROT_RA) + goto cleanup; + + data->hasRARoutes = true; + + /* Check the accept_ra value for the interface */ + accept_ra = networkGetAcceptRA(ifname); + VIR_DEBUG("Checking route for device %s, accept_ra: %d", ifname, accept_ra); + + if (accept_ra != 2 && VIR_APPEND_ELEMENT(data->devices, data->ndevices, ifname) < 0) + ret = -1;
Okay, so this callback is gathering a list of all interfaces that have a route but accept_ra != 2.
+ + cleanup: + VIR_FREE(ifname); + return ret; +} + +static bool +networkCheckIPv6Forwarding(void) +{ + struct nl_msg *nlmsg = NULL; + bool valid = false; + struct rtgenmsg genmsg; + size_t i; + struct networkIPv6CheckData data = { + .hasRARoutes = false, + .devices = NULL, + .ndevices = 0 + }; + + + /* Prepare the request message */ + if (!(nlmsg = nlmsg_alloc_simple(RTM_GETROUTE, + NLM_F_REQUEST | NLM_F_DUMP))) { + virReportOOMError(); + goto cleanup; + } + + memset(&genmsg, 0, sizeof(genmsg)); + genmsg.rtgen_family = AF_INET6; + + if (nlmsg_append(nlmsg, &genmsg, sizeof(genmsg), NLMSG_ALIGNTO) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("allocated netlink buffer is too small")); + goto cleanup; + } + + /* Send the request and loop over the responses */ + if (virNetlinkDumpCommand(nlmsg, networkCheckIPv6ForwardingCallback, 0, 0, + NETLINK_ROUTE, 0, &data) < 0) { + virReportError(VIR_ERR_INTERNAL_ERROR, "%s", + _("Failed to loop over routes")); + goto cleanup; + } + + valid = !data.hasRARoutes || data.ndevices == 0; + + /* Check the global accept_ra if at least one isn't set on a + per-device basis */ + if (!valid && data.hasRARoutes) { + int accept_ra = networkGetAcceptRA(NULL); + valid = accept_ra == 2; + VIR_DEBUG("Checked global accept_ra: %d", accept_ra); + } + + if (!valid) { + virBuffer buf = VIR_BUFFER_INITIALIZER; + for (i = 0; i < data.ndevices; i++) { + virBufferAdd(&buf, data.devices[i], -1); + if (i < data.ndevices - 1) + virBufferAddLit(&buf, ", "); + } + + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Check the host setup: enabling IPv6 forwarding with " + "RA routes without accept_ra set to 2 is likely to cause " + "routes loss. Interfaces to look at: %s"), + virBufferCurrentContent(&buf)); + virBufferFreeAndReset(&buf); + } + + cleanup: + nlmsg_free(nlmsg); + for (i = 0; i < data.ndevices; i++) + VIR_FREE(data.devices[i]); + return valid; +} + +#else /* defined(__linux__) && defined(HAVE_LIBNL) */ + +static bool +networkCheckIPv6Forwarding(void) +{ + VIR_WARN("built without libnl: unable to check if IPv6 forwarding can be safely enabled"); + return true; +} +#endif /* defined(__linux__) && defined(HAVE_LIBNL) */ + /* Enable IP Forwarding. Return 0 for success, -1 for failure. */ static int networkEnableIPForwarding(bool enableIPv4, bool enableIPv6) @@ -2377,11 +2540,16 @@ networkStartNetworkVirtual(virNetworkDriverStatePtr driver, }
/* If forward.type != NONE, turn on global IP forwarding */ - if (network->def->forward.type != VIR_NETWORK_FORWARD_NONE && - networkEnableIPForwarding(v4present, v6present) < 0) { - virReportSystemError(errno, "%s", - _("failed to enable IP forwarding")); - goto err3; + if (network->def->forward.type != VIR_NETWORK_FORWARD_NONE) { + if (!networkCheckIPv6Forwarding()) + goto err3; /* Precise error message already provided */ + + + if (networkEnableIPForwarding(v4present, v6present) < 0) { + virReportSystemError(errno, "%s", + _("failed to enable IP forwarding")); + goto err3; + } }
I generally understand the logic, which looks fine. As I said above, I think it would be better to have most of it in virnetdevip.c (yes, I think the existing stuff that plays with ip_forward might also be better moved into utility functions in virnetdevip.c, but that's not your problem :-)

Hi guys, Has that patch series fallen into the mailing list abysses? -- Cedric On Fri, 2017-03-03 at 16:00 +0100, Cédric Bosdonnat wrote:
Hi all,
When enabling IPv6 forwarding on hosts getting Router Advertised routes, the host looses the RA routes. To prevent this, check if the host has such routes. In case there are some, check that the accept_ra is set to 2 before bringing the network up.
Cédric Bosdonnat (4): util: extract the request sending code from virNetlinkCommand() util: add virNetlinkDumpCommand() bridge_driver.c: more uses of SYSCTL_PATH network: check accept_ra before enabling ipv6 forwarding
src/libvirt_private.syms | 1 + src/network/bridge_driver.c | 187 +++++++++++++++++++++++++++++++++++++++++--- src/util/virnetlink.c | 145 +++++++++++++++++++++++++--------- src/util/virnetlink.h | 9 +++ 4 files changed, 298 insertions(+), 44 deletions(-)
participants (3)
-
Cedric Bosdonnat
-
Cédric Bosdonnat
-
Laine Stump