[libvirt] [PATCH 0/2] don't masquerade local broadcast/multicast packets

Masquerading local broadcast breaks DHCP replies for some clients. There has been a report about broken local multicast too. (See references in the patches.) Testing: since I have no idea how to test upstream libvirt on a RHEL-6.4.z virt host and guarantee nothing will be tangled up, I ported the series to libvirt-0.10.2-18.el6_4.5 and tested that. (The upstream series does build and passes the checks in HACKING, except I didn't try valgrind.) Laszlo Ersek (2): util/viriptables: add/remove rules that short-circuit masquerading bridge driver: don't masquerade local subnet broadcast/multicast packets src/util/viriptables.h | 10 +++++ src/network/bridge_driver.c | 76 +++++++++++++++++++++++++++++++++-- src/util/viriptables.c | 93 +++++++++++++++++++++++++++++++++++++++++++ src/libvirt_private.syms | 2 + 4 files changed, 177 insertions(+), 4 deletions(-)

The functions - iptablesAddForwardDontMasquerade(), - iptablesRemoveForwardDontMasquerade handle exceptions in the masquerading implemented in the POSTROUTING chain of the "nat" table. Such exceptions should be added as chronologically latest, logically top-most rules. The bridge driver will call these functions beginning with the next patch: some special destination IP addresses always refer to the local subnetwork, even though they don't match any practical subnetwork's netmask. Packets from virbrN targeting such IP addresses are never routed outwards, but the current rules treat them as non-virbrN-destined packets and masquerade them. This causes problems for some receivers on virbrN. Signed-off-by: Laszlo Ersek <lersek@redhat.com> --- src/util/viriptables.h | 10 +++++ src/util/viriptables.c | 93 ++++++++++++++++++++++++++++++++++++++++++++++ src/libvirt_private.syms | 2 + 3 files changed, 105 insertions(+), 0 deletions(-) diff --git a/src/util/viriptables.h b/src/util/viriptables.h index b7ce59b..200dbbc 100644 --- a/src/util/viriptables.h +++ b/src/util/viriptables.h @@ -117,6 +117,16 @@ int iptablesRemoveForwardMasquerade (iptablesContext *ctx, virSocketAddrRangePtr addr, virPortRangePtr port, const char *protocol); +int iptablesAddForwardDontMasquerade(iptablesContext *ctx, + virSocketAddr *netaddr, + unsigned int prefix, + const char *physdev, + const char *destaddr); +int iptablesRemoveForwardDontMasquerade(iptablesContext *ctx, + virSocketAddr *netaddr, + unsigned int prefix, + const char *physdev, + const char *destaddr); int iptablesAddOutputFixUdpChecksum (iptablesContext *ctx, const char *iface, int port); diff --git a/src/util/viriptables.c b/src/util/viriptables.c index 16fbe9c..c6a9f6b 100644 --- a/src/util/viriptables.c +++ b/src/util/viriptables.c @@ -961,6 +961,99 @@ iptablesRemoveForwardMasquerade(iptablesContext *ctx, } +/* Don't masquerade traffic coming from the network associated with the bridge + * if said traffic targets @destaddr. + */ +static int +iptablesForwardDontMasquerade(iptablesContext *ctx, + virSocketAddr *netaddr, + unsigned int prefix, + const char *physdev, + const char *destaddr, + int action) +{ + int ret = -1; + char *networkstr = NULL; + virCommandPtr cmd = NULL; + + if (!(networkstr = iptablesFormatNetwork(netaddr, prefix))) + return -1; + + if (!VIR_SOCKET_ADDR_IS_FAMILY(netaddr, AF_INET)) { + /* Higher level code *should* guaranteee it's impossible to get here. */ + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Attempted to NAT '%s'. NAT is only supported for IPv4."), + networkstr); + goto cleanup; + } + + cmd = iptablesCommandNew(ctx->nat_postrouting, AF_INET, action); + + if (physdev && physdev[0]) + virCommandAddArgList(cmd, "--out-interface", physdev, NULL); + + virCommandAddArgList(cmd, "--source", networkstr, + "--destination", destaddr, "--jump", "RETURN", NULL); + ret = virCommandRun(cmd, NULL); +cleanup: + virCommandFree(cmd); + VIR_FREE(networkstr); + return ret; +} + +/** + * iptablesAddForwardDontMasquerade: + * @ctx: pointer to the IP table context + * @netaddr: the source network name + * @prefix: prefix (# of 1 bits) of netmask to apply to @netaddr + * @physdev: the physical output device or NULL + * @destaddr: the destination network not to masquerade for + * + * Add rules to the IP table context to avoid masquerading from + * @netaddr/@prefix to @destaddr on @physdev. @destaddr must be in a format + * directly consumable by iptables, it must not depend on user input or + * configuration. + * + * Returns 0 in case of success or an error code otherwise. + */ +int +iptablesAddForwardDontMasquerade(iptablesContext *ctx, + virSocketAddr *netaddr, + unsigned int prefix, + const char *physdev, + const char *destaddr) +{ + return iptablesForwardDontMasquerade(ctx, netaddr, prefix, physdev, + destaddr, ADD); +} + +/** + * iptablesRemoveForwardDontMasquerade: + * @ctx: pointer to the IP table context + * @netaddr: the source network name + * @prefix: prefix (# of 1 bits) of netmask to apply to @netaddr + * @physdev: the physical output device or NULL + * @destaddr: the destination network not to masquerade for + * + * Remove rules from the IP table context that prevent masquerading from + * @netaddr/@prefix to @destaddr on @physdev. @destaddr must be in a format + * directly consumable by iptables, it must not depend on user input or + * configuration. + * + * Returns 0 in case of success or an error code otherwise. + */ +int +iptablesRemoveForwardDontMasquerade(iptablesContext *ctx, + virSocketAddr *netaddr, + unsigned int prefix, + const char *physdev, + const char *destaddr) +{ + return iptablesForwardDontMasquerade(ctx, netaddr, prefix, physdev, + destaddr, REMOVE); +} + + static int iptablesOutputFixUdpChecksum(iptablesContext *ctx, const char *iface, diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 9d5f74b..27cc49a 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1364,6 +1364,7 @@ iptablesAddForwardAllowCross; iptablesAddForwardAllowIn; iptablesAddForwardAllowOut; iptablesAddForwardAllowRelatedIn; +iptablesAddForwardDontMasquerade; iptablesAddForwardMasquerade; iptablesAddForwardRejectIn; iptablesAddForwardRejectOut; @@ -1376,6 +1377,7 @@ iptablesRemoveForwardAllowCross; iptablesRemoveForwardAllowIn; iptablesRemoveForwardAllowOut; iptablesRemoveForwardAllowRelatedIn; +iptablesRemoveForwardDontMasquerade; iptablesRemoveForwardMasquerade; iptablesRemoveForwardRejectIn; iptablesRemoveForwardRejectOut; -- 1.7.1

Packets sent by guests on virbrN, *or* by dnsmasq on the same, to - 255.255.255.255/32 (netmask-independent local network broadcast address), or to - 224.0.0.0/24 (local subnetwork multicast range) are never forwarded, hence it is not necessary to masquerade them. In fact we must not masquerade them: translating their source addresses or source ports (where applicable) may confuse receivers on virbrN. One example is the DHCP client in OVMF (= UEFI firmware for virtual machines): http://thread.gmane.org/gmane.comp.bios.tianocore.devel/1506/focus=2640 It expects DHCP replies to arrive from remote source port 67. Even though dnsmasq conforms to that, the destination address (255.255.255.255) and the source address (eg. 192.168.122.1) in the reply allow the UDP masquerading rule to match, which rewrites the source port to or above 1024. This prevents the DHCP client in OVMF from accepting the packet. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=709418 (I read attachment 526549 in that BZ by Brian J. Murrell <brian@interlinx.bc.ca>, and Laine Stump's review thereof, before starting this patch.) Signed-off-by: Laszlo Ersek <lersek@redhat.com> --- src/network/bridge_driver.c | 76 ++++++++++++++++++++++++++++++++++++++++-- 1 files changed, 72 insertions(+), 4 deletions(-) diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index d5886fe..2b3b250 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -1542,6 +1542,9 @@ networkRefreshDaemons(struct network_driver *driver) } } +static const char networkLocalMulticast[] = "224.0.0.0/24"; +static const char networkLocalBroadcast[] = "255.255.255.255/32"; + static int networkAddMasqueradingIptablesRules(struct network_driver *driver, virNetworkObjPtr network, @@ -1586,11 +1589,20 @@ networkAddMasqueradingIptablesRules(struct network_driver *driver, /* * Enable masquerading. * - * We need to end up with 3 rules in the table in this order + * We need to end up with 5 rules in the table in this order + * + * 1. do not masquerade packets targeting 224.0.0.0/24 + * 2. do not masquerade packets targeting 255.255.255.255/32 + * 3. masquerade protocol=tcp with sport mapping restriction + * 4. masquerade protocol=udp with sport mapping restriction + * 5. generic, masquerade any protocol + * + * 224.0.0.0/24 is the local network multicast range. Packets are not + * forwarded outside. * - * 1. protocol=tcp with sport mapping restriction - * 2. protocol=udp with sport mapping restriction - * 3. generic any protocol + * 255.255.255.255/32 is the broadcast address of any local network. Again, + * such packets are never forwarded, but strict DHCP clients don't accept + * DHCP replies with changed source ports. * * The sport mappings are required, because default IPtables * MASQUERADE maintain port numbers unchanged where possible. @@ -1660,8 +1672,54 @@ networkAddMasqueradingIptablesRules(struct network_driver *driver, goto masqerr5; } + /* exempt local network broadcast address as destination */ + if (iptablesAddForwardDontMasquerade(driver->iptables, + &ipdef->address, + prefix, + forwardIf, + networkLocalBroadcast) < 0) { + if (forwardIf) + virReportError(VIR_ERR_SYSTEM_ERROR, + _("failed to add iptables rule to prevent local broadcast masquerading on %s"), + forwardIf); + else + virReportError(VIR_ERR_SYSTEM_ERROR, "%s", + _("failed to add iptables rule to prevent local broadcast masquerading")); + goto masqerr6; + } + + /* exempt local multicast range as destination */ + if (iptablesAddForwardDontMasquerade(driver->iptables, + &ipdef->address, + prefix, + forwardIf, + networkLocalMulticast) < 0) { + if (forwardIf) + virReportError(VIR_ERR_SYSTEM_ERROR, + _("failed to add iptables rule to prevent local multicast masquerading on %s"), + forwardIf); + else + virReportError(VIR_ERR_SYSTEM_ERROR, "%s", + _("failed to add iptables rule to prevent local multicast masquerading")); + goto masqerr7; + } + return 0; + masqerr7: + iptablesRemoveForwardDontMasquerade(driver->iptables, + &ipdef->address, + prefix, + forwardIf, + networkLocalBroadcast); + masqerr6: + iptablesRemoveForwardMasquerade(driver->iptables, + &ipdef->address, + prefix, + forwardIf, + &network->def->forward.addr, + &network->def->forward.port, + "tcp"); masqerr5: iptablesRemoveForwardMasquerade(driver->iptables, &ipdef->address, @@ -1703,6 +1761,16 @@ networkRemoveMasqueradingIptablesRules(struct network_driver *driver, const char *forwardIf = virNetworkDefForwardIf(network->def, 0); if (prefix >= 0) { + iptablesRemoveForwardDontMasquerade(driver->iptables, + &ipdef->address, + prefix, + forwardIf, + networkLocalMulticast); + iptablesRemoveForwardDontMasquerade(driver->iptables, + &ipdef->address, + prefix, + forwardIf, + networkLocalBroadcast); iptablesRemoveForwardMasquerade(driver->iptables, &ipdef->address, prefix, -- 1.7.1

I'm not sure if everyone that needs to see this discussion are subscribed to RH bugzilla bug #709418 so I will repeat my comments from comment #12 on that bug here: On 13-05-27 10:34 PM, Laszlo Ersek wrote:
Packets sent by guests on virbrN, *or* by dnsmasq on the same, to - 255.255.255.255/32 (netmask-independent local network broadcast address), or to - 224.0.0.0/24 (local subnetwork multicast range)
Multicast is much wider than just 224.0.0.0/24. Per my patch in attachment 526549 [details], it's in fact 224.0.0.0/4. All multicast needs to be exempt from NAT. Quoting from Wikipedia article http://en.wikipedia.org/wiki/Multicast_address: IPv4 multicast addresses are defined by the leading address bits of 1110, originating from the classful network design of the early Internet when this group of addresses was designated as Class D. The Classless Inter-Domain Routing (CIDR) prefix of this group is 224.0.0.0/4. Indeed, Wikipedia is not authoritative, but it does provide a good starting point to finding authoritative information if you want. I suppose the authoritative doc is at http://www.iana.org/assignments/multicast-addresses/multicast-addresses.xml or somesuch. It does not explicitly say 224.0.0.0/4 but it certainly implies it in discussing the entire space from 224.0.0.0 through 239.255.255.255.
diff --git a/src/network/bridge_driver.c b/src/network/bridge_driver.c index d5886fe..2b3b250 100644 --- a/src/network/bridge_driver.c +++ b/src/network/bridge_driver.c @@ -1542,6 +1542,9 @@ networkRefreshDaemons(struct network_driver *driver) } }
+static const char networkLocalMulticast[] = "224.0.0.0/24";
This should be /8, not /24. There are many multicast protocols that operate outside of that 224.0.0.0/24 network. In fact they all must. 224.0.0.0/24 is reserved and assignment must be made from IANA. Let me give you a scenario where your 224.0.0.0/24 fails to correct the problem: Imagine you have a cluster of VMs all with a network address of 192.168.122.[2-10]/24 and 192.168.122.1 is your VM "server". Now imagine those VMs are using a multicast address of 226.1.2.3 (just for argument's sake). Each node will be listening on 226.1.2.3 for other "members" of the group to build a list of members. But once the packet from (say) 192.168.122.2->226.1.2.3 goes through the "VM host"'s NAT it will look to the other group members to be from host 192.168.122.1, so they all add 192.168.122.1 to their member list. Then 192.168.122.3 sends it's packet to 226.1.2.3, it goes through NAT also and all of the group members notice they already have 192.168.122.1 in their group and just ignore it. That goes on for all of the [2-10] nodes and none of them get added to the group list. That's because NAT broke the protocol. Further, to your arguments in comment #13 in BZ #709418 about needing to map ports > 1023 for security: The source port mapping goal has some specific parameters to it. It's trying to avoid a VM from masquerading as it's host and leveraging it's host's permission in the network. But that's only a problem because the VM is masquerading as the host. When you exempt multicast from masquerading you no longer have the problem of the VM being mistaken for the host and therefore you don't need the "quasi-security" of not allowing the VM to use ports < 1024. Put another way, mapping the source port numbers is only a requirement because the VM is borrowing the identity of the host. Once you eliminate the identity borrowing you can eliminate the need for the safety guards that that lending is putting in place. Cheers, b.
participants (2)
-
Brian J. Murrell
-
Laszlo Ersek