[libvirt] [PATCH v2 0/2] don't masquerade local broadcast/multicast packets

Masquerading local broadcast breaks DHCP replies for some clients. There has been a report about broken local multicast too. (See references in the patches.) Testing: build tested the upstream series. Tested the RHEL-6.4.z and RHEL-7.0 backports with OVMF netboot on virbr0. Changes between v1 (at http://www.redhat.com/archives/libvir-list/2013-May/msg01872.html ) and v2: - forward-ported to current upstream master (commit 49a5262d). This includes conflict resolution for: commit 477a619e1b37694e3c59c0d6c84ede6d2e28b878 Author: Roman Bogorodskiy <bogorodskiy@gmail.com> Date: Fri Jun 28 00:52:30 2013 -0400 Drop iptablesContext in both patches #1 and #2, and for commit 4ac708f250867f65091a20b153c204862d389cb9 Author: Roman Bogorodskiy <bogorodskiy@gmail.com> Date: Wed Jul 24 16:22:54 2013 +0400 bridge driver: extract platform specifics in patch #2. Laszlo Ersek (2): util/viriptables: add/remove rules that short-circuit masquerading bridge driver: don't masquerade local subnet broadcast/multicast packets src/util/viriptables.h | 8 ++++ src/network/bridge_driver_linux.c | 70 +++++++++++++++++++++++++++++-- src/util/viriptables.c | 88 +++++++++++++++++++++++++++++++++++++++ src/libvirt_private.syms | 2 + 4 files changed, 164 insertions(+), 4 deletions(-) -- 1.8.3.1

The functions - iptablesAddForwardDontMasquerade(), - iptablesRemoveForwardDontMasquerade handle exceptions in the masquerading implemented in the POSTROUTING chain of the "nat" table. Such exceptions should be added as chronologically latest, logically top-most rules. The bridge driver will call these functions beginning with the next patch: some special destination IP addresses always refer to the local subnetwork, even though they don't match any practical subnetwork's netmask. Packets from virbrN targeting such IP addresses are never routed outwards, but the current rules treat them as non-virbrN-destined packets and masquerade them. This causes problems for some receivers on virbrN. Signed-off-by: Laszlo Ersek <lersek@redhat.com> --- src/util/viriptables.h | 8 +++++ src/util/viriptables.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++ src/libvirt_private.syms | 2 ++ 3 files changed, 98 insertions(+) diff --git a/src/util/viriptables.h b/src/util/viriptables.h index 447f4a8..aee1842 100644 --- a/src/util/viriptables.h +++ b/src/util/viriptables.h @@ -94,6 +94,14 @@ int iptablesRemoveForwardMasquerade (virSocketAddr *netaddr, virSocketAddrRangePtr addr, virPortRangePtr port, const char *protocol); +int iptablesAddForwardDontMasquerade(virSocketAddr *netaddr, + unsigned int prefix, + const char *physdev, + const char *destaddr); +int iptablesRemoveForwardDontMasquerade(virSocketAddr *netaddr, + unsigned int prefix, + const char *physdev, + const char *destaddr); int iptablesAddOutputFixUdpChecksum (const char *iface, int port); int iptablesRemoveOutputFixUdpChecksum (const char *iface, diff --git a/src/util/viriptables.c b/src/util/viriptables.c index 0f57d66..349734c 100644 --- a/src/util/viriptables.c +++ b/src/util/viriptables.c @@ -832,6 +832,94 @@ iptablesRemoveForwardMasquerade(virSocketAddr *netaddr, } +/* Don't masquerade traffic coming from the network associated with the bridge + * if said traffic targets @destaddr. + */ +static int +iptablesForwardDontMasquerade(virSocketAddr *netaddr, + unsigned int prefix, + const char *physdev, + const char *destaddr, + int action) +{ + int ret = -1; + char *networkstr = NULL; + virCommandPtr cmd = NULL; + + if (!(networkstr = iptablesFormatNetwork(netaddr, prefix))) + return -1; + + if (!VIR_SOCKET_ADDR_IS_FAMILY(netaddr, AF_INET)) { + /* Higher level code *should* guaranteee it's impossible to get here. */ + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Attempted to NAT '%s'. NAT is only supported for IPv4."), + networkstr); + goto cleanup; + } + + cmd = iptablesCommandNew("nat", "POSTROUTING", AF_INET, action); + + if (physdev && physdev[0]) + virCommandAddArgList(cmd, "--out-interface", physdev, NULL); + + virCommandAddArgList(cmd, "--source", networkstr, + "--destination", destaddr, "--jump", "RETURN", NULL); + ret = virCommandRun(cmd, NULL); +cleanup: + virCommandFree(cmd); + VIR_FREE(networkstr); + return ret; +} + +/** + * iptablesAddForwardDontMasquerade: + * @netaddr: the source network name + * @prefix: prefix (# of 1 bits) of netmask to apply to @netaddr + * @physdev: the physical output device or NULL + * @destaddr: the destination network not to masquerade for + * + * Add rules to the IP table context to avoid masquerading from + * @netaddr/@prefix to @destaddr on @physdev. @destaddr must be in a format + * directly consumable by iptables, it must not depend on user input or + * configuration. + * + * Returns 0 in case of success or an error code otherwise. + */ +int +iptablesAddForwardDontMasquerade(virSocketAddr *netaddr, + unsigned int prefix, + const char *physdev, + const char *destaddr) +{ + return iptablesForwardDontMasquerade(netaddr, prefix, physdev, destaddr, + ADD); +} + +/** + * iptablesRemoveForwardDontMasquerade: + * @netaddr: the source network name + * @prefix: prefix (# of 1 bits) of netmask to apply to @netaddr + * @physdev: the physical output device or NULL + * @destaddr: the destination network not to masquerade for + * + * Remove rules from the IP table context that prevent masquerading from + * @netaddr/@prefix to @destaddr on @physdev. @destaddr must be in a format + * directly consumable by iptables, it must not depend on user input or + * configuration. + * + * Returns 0 in case of success or an error code otherwise. + */ +int +iptablesRemoveForwardDontMasquerade(virSocketAddr *netaddr, + unsigned int prefix, + const char *physdev, + const char *destaddr) +{ + return iptablesForwardDontMasquerade(netaddr, prefix, physdev, destaddr, + REMOVE); +} + + static int iptablesOutputFixUdpChecksum(const char *iface, int port, diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 50e2f48..49505c9 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1450,6 +1450,7 @@ iptablesAddForwardAllowCross; iptablesAddForwardAllowIn; iptablesAddForwardAllowOut; iptablesAddForwardAllowRelatedIn; +iptablesAddForwardDontMasquerade; iptablesAddForwardMasquerade; iptablesAddForwardRejectIn; iptablesAddForwardRejectOut; @@ -1460,6 +1461,7 @@ iptablesRemoveForwardAllowCross; iptablesRemoveForwardAllowIn; iptablesRemoveForwardAllowOut; iptablesRemoveForwardAllowRelatedIn; +iptablesRemoveForwardDontMasquerade; iptablesRemoveForwardMasquerade; iptablesRemoveForwardRejectIn; iptablesRemoveForwardRejectOut; -- 1.8.3.1

On 09/23/2013 10:05 AM, Laszlo Ersek wrote:
The functions - iptablesAddForwardDontMasquerade(), - iptablesRemoveForwardDontMasquerade handle exceptions in the masquerading implemented in the POSTROUTING chain of the "nat" table. Such exceptions should be added as chronologically latest, logically top-most rules.
The bridge driver will call these functions beginning with the next patch: some special destination IP addresses always refer to the local subnetwork, even though they don't match any practical subnetwork's netmask. Packets from virbrN targeting such IP addresses are never routed outwards, but the current rules treat them as non-virbrN-destined packets and masquerade them. This causes problems for some receivers on virbrN.
Signed-off-by: Laszlo Ersek <lersek@redhat.com> --- src/util/viriptables.h | 8 +++++ src/util/viriptables.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++ src/libvirt_private.syms | 2 ++ 3 files changed, 98 insertions(+)
diff --git a/src/util/viriptables.h b/src/util/viriptables.h index 447f4a8..aee1842 100644 --- a/src/util/viriptables.h +++ b/src/util/viriptables.h @@ -94,6 +94,14 @@ int iptablesRemoveForwardMasquerade (virSocketAddr *netaddr, virSocketAddrRangePtr addr, virPortRangePtr port, const char *protocol); +int iptablesAddForwardDontMasquerade(virSocketAddr *netaddr, + unsigned int prefix, + const char *physdev, + const char *destaddr); +int iptablesRemoveForwardDontMasquerade(virSocketAddr *netaddr, + unsigned int prefix, + const char *physdev, + const char *destaddr); int iptablesAddOutputFixUdpChecksum (const char *iface, int port); int iptablesRemoveOutputFixUdpChecksum (const char *iface, diff --git a/src/util/viriptables.c b/src/util/viriptables.c index 0f57d66..349734c 100644 --- a/src/util/viriptables.c +++ b/src/util/viriptables.c @@ -832,6 +832,94 @@ iptablesRemoveForwardMasquerade(virSocketAddr *netaddr, }
+/* Don't masquerade traffic coming from the network associated with the bridge + * if said traffic targets @destaddr. + */ +static int +iptablesForwardDontMasquerade(virSocketAddr *netaddr, + unsigned int prefix, + const char *physdev, + const char *destaddr, + int action)
This function was based on the existing function iptablesForwardMasquerade(), replacing addr/port/protocol with the single destaddr. We may want to specify protocol or port in the future, but for now we don't need it, and we can always add those args in when we need to, so I'm okay with this. The name of the function is a bit troublesome to me though, since it's actually being used to setup rules for packets that *aren't* being forwarded (and the rules aren't going into the FORWARD table). How about naming it "iptablesDontMasquerade"? Some other name?
+{ + int ret = -1; + char *networkstr = NULL; + virCommandPtr cmd = NULL; + + if (!(networkstr = iptablesFormatNetwork(netaddr, prefix))) + return -1; + + if (!VIR_SOCKET_ADDR_IS_FAMILY(netaddr, AF_INET)) { + /* Higher level code *should* guaranteee it's impossible to get here. */ + virReportError(VIR_ERR_INTERNAL_ERROR, + _("Attempted to NAT '%s'. NAT is only supported for IPv4."), + networkstr); + goto cleanup; + } + + cmd = iptablesCommandNew("nat", "POSTROUTING", AF_INET, action); + + if (physdev && physdev[0]) + virCommandAddArgList(cmd, "--out-interface", physdev, NULL); + + virCommandAddArgList(cmd, "--source", networkstr, + "--destination", destaddr, "--jump", "RETURN", NULL); + ret = virCommandRun(cmd, NULL); +cleanup: + virCommandFree(cmd); + VIR_FREE(networkstr); + return ret; +} + +/** + * iptablesAddForwardDontMasquerade: + * @netaddr: the source network name + * @prefix: prefix (# of 1 bits) of netmask to apply to @netaddr + * @physdev: the physical output device or NULL + * @destaddr: the destination network not to masquerade for + * + * Add rules to the IP table context to avoid masquerading from + * @netaddr/@prefix to @destaddr on @physdev. @destaddr must be in a format + * directly consumable by iptables, it must not depend on user input or + * configuration. + * + * Returns 0 in case of success or an error code otherwise. + */ +int +iptablesAddForwardDontMasquerade(virSocketAddr *netaddr, + unsigned int prefix, + const char *physdev, + const char *destaddr) +{ + return iptablesForwardDontMasquerade(netaddr, prefix, physdev, destaddr, + ADD); +} + +/** + * iptablesRemoveForwardDontMasquerade: + * @netaddr: the source network name + * @prefix: prefix (# of 1 bits) of netmask to apply to @netaddr + * @physdev: the physical output device or NULL + * @destaddr: the destination network not to masquerade for + * + * Remove rules from the IP table context that prevent masquerading from + * @netaddr/@prefix to @destaddr on @physdev. @destaddr must be in a format + * directly consumable by iptables, it must not depend on user input or + * configuration. + * + * Returns 0 in case of success or an error code otherwise. + */ +int +iptablesRemoveForwardDontMasquerade(virSocketAddr *netaddr, + unsigned int prefix, + const char *physdev, + const char *destaddr) +{ + return iptablesForwardDontMasquerade(netaddr, prefix, physdev, destaddr, + REMOVE); +} + + static int iptablesOutputFixUdpChecksum(const char *iface, int port, diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms index 50e2f48..49505c9 100644 --- a/src/libvirt_private.syms +++ b/src/libvirt_private.syms @@ -1450,6 +1450,7 @@ iptablesAddForwardAllowCross; iptablesAddForwardAllowIn; iptablesAddForwardAllowOut; iptablesAddForwardAllowRelatedIn; +iptablesAddForwardDontMasquerade; iptablesAddForwardMasquerade; iptablesAddForwardRejectIn; iptablesAddForwardRejectOut; @@ -1460,6 +1461,7 @@ iptablesRemoveForwardAllowCross; iptablesRemoveForwardAllowIn; iptablesRemoveForwardAllowOut; iptablesRemoveForwardAllowRelatedIn; +iptablesRemoveForwardDontMasquerade; iptablesRemoveForwardMasquerade; iptablesRemoveForwardRejectIn; iptablesRemoveForwardRejectOut;

On 09/23/13 16:31, Laine Stump wrote:
On 09/23/2013 10:05 AM, Laszlo Ersek wrote:
+/* Don't masquerade traffic coming from the network associated with the bridge + * if said traffic targets @destaddr. + */ +static int +iptablesForwardDontMasquerade(virSocketAddr *netaddr, + unsigned int prefix, + const char *physdev, + const char *destaddr, + int action)
The name of the function is a bit troublesome to me though, since it's actually being used to setup rules for packets that *aren't* being forwarded (and the rules aren't going into the FORWARD table). How about naming it "iptablesDontMasquerade"? Some other name?
Will follow your suggestion in v3. Thank you! Laszlo

Packets sent by guests on virbrN, *or* by dnsmasq on the same, to - 255.255.255.255/32 (netmask-independent local network broadcast address), or to - 224.0.0.0/24 (local subnetwork multicast range) are never forwarded, hence it is not necessary to masquerade them. In fact we must not masquerade them: translating their source addresses or source ports (where applicable) may confuse receivers on virbrN. One example is the DHCP client in OVMF (= UEFI firmware for virtual machines): http://thread.gmane.org/gmane.comp.bios.tianocore.devel/1506/focus=2640 It expects DHCP replies to arrive from remote source port 67. Even though dnsmasq conforms to that, the destination address (255.255.255.255) and the source address (eg. 192.168.122.1) in the reply allow the UDP masquerading rule to match, which rewrites the source port to or above 1024. This prevents the DHCP client in OVMF from accepting the packet. Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=709418 Signed-off-by: Laszlo Ersek <lersek@redhat.com> --- src/network/bridge_driver_linux.c | 70 ++++++++++++++++++++++++++++++++++++--- 1 file changed, 66 insertions(+), 4 deletions(-) diff --git a/src/network/bridge_driver_linux.c b/src/network/bridge_driver_linux.c index 80430a8..95add0e 100644 --- a/src/network/bridge_driver_linux.c +++ b/src/network/bridge_driver_linux.c @@ -127,6 +127,9 @@ out: return ret; } +static const char networkLocalMulticast[] = "224.0.0.0/24"; +static const char networkLocalBroadcast[] = "255.255.255.255/32"; + int networkAddMasqueradingFirewallRules(virNetworkObjPtr network, virNetworkIpDefPtr ipdef) { @@ -167,11 +170,20 @@ int networkAddMasqueradingFirewallRules(virNetworkObjPtr network, /* * Enable masquerading. * - * We need to end up with 3 rules in the table in this order + * We need to end up with 5 rules in the table in this order * - * 1. protocol=tcp with sport mapping restriction - * 2. protocol=udp with sport mapping restriction - * 3. generic any protocol + * 1. do not masquerade packets targeting 224.0.0.0/24 + * 2. do not masquerade packets targeting 255.255.255.255/32 + * 3. masquerade protocol=tcp with sport mapping restriction + * 4. masquerade protocol=udp with sport mapping restriction + * 5. generic, masquerade any protocol + * + * 224.0.0.0/24 is the local network multicast range. Packets are not + * forwarded outside. + * + * 255.255.255.255/32 is the broadcast address of any local network. Again, + * such packets are never forwarded, but strict DHCP clients don't accept + * DHCP replies with changed source ports. * * The sport mappings are required, because default IPtables * MASQUERADE maintain port numbers unchanged where possible. @@ -238,8 +250,50 @@ int networkAddMasqueradingFirewallRules(virNetworkObjPtr network, goto masqerr5; } + /* exempt local network broadcast address as destination */ + if (iptablesAddForwardDontMasquerade(&ipdef->address, + prefix, + forwardIf, + networkLocalBroadcast) < 0) { + if (forwardIf) + virReportError(VIR_ERR_SYSTEM_ERROR, + _("failed to add iptables rule to prevent local broadcast masquerading on %s"), + forwardIf); + else + virReportError(VIR_ERR_SYSTEM_ERROR, "%s", + _("failed to add iptables rule to prevent local broadcast masquerading")); + goto masqerr6; + } + + /* exempt local multicast range as destination */ + if (iptablesAddForwardDontMasquerade(&ipdef->address, + prefix, + forwardIf, + networkLocalMulticast) < 0) { + if (forwardIf) + virReportError(VIR_ERR_SYSTEM_ERROR, + _("failed to add iptables rule to prevent local multicast masquerading on %s"), + forwardIf); + else + virReportError(VIR_ERR_SYSTEM_ERROR, "%s", + _("failed to add iptables rule to prevent local multicast masquerading")); + goto masqerr7; + } + return 0; + masqerr7: + iptablesRemoveForwardDontMasquerade(&ipdef->address, + prefix, + forwardIf, + networkLocalBroadcast); + masqerr6: + iptablesRemoveForwardMasquerade(&ipdef->address, + prefix, + forwardIf, + &network->def->forward.addr, + &network->def->forward.port, + "tcp"); masqerr5: iptablesRemoveForwardMasquerade(&ipdef->address, prefix, @@ -275,6 +329,14 @@ void networkRemoveMasqueradingFirewallRules(virNetworkObjPtr network, const char *forwardIf = virNetworkDefForwardIf(network->def, 0); if (prefix >= 0) { + iptablesRemoveForwardDontMasquerade(&ipdef->address, + prefix, + forwardIf, + networkLocalMulticast); + iptablesRemoveForwardDontMasquerade(&ipdef->address, + prefix, + forwardIf, + networkLocalBroadcast); iptablesRemoveForwardMasquerade(&ipdef->address, prefix, forwardIf, -- 1.8.3.1

On 13-09-23 10:05 AM, Laszlo Ersek wrote:
Packets sent by guests on virbrN, *or* by dnsmasq on the same, to - 255.255.255.255/32 (netmask-independent local network broadcast address), or to - 224.0.0.0/24 (local subnetwork multicast range)
All multicast, not just the local subnet multicast needs to be exempt from masquerading. I would tend to guess that anyone trying to do "global Internet" multicast behind NAT is signing up for a world of trouble anyway. It's not like you could put a multicast source behind a NAT so it could really only apply to sinks. Since sinks operate by "subscribing" to a multicast source by way of their multicast router (which should be local and not through a NAT) I would think a NAT device that supports multicast to it's NAT clients should support it as if it were a multicast router and not so much a NAT and so there never really should be NATting of traffic from a sink. So AFACT there never really is a use-case for actually NATting multicast traffic, so just don't NAT the whole range, 224.0.0.0/4.
are never forwarded, hence it is not necessary to masquerade them.
In fact we must not masquerade them: translating their source addresses or source ports (where applicable) may confuse receivers on virbrN.
One example is the DHCP client in OVMF (= UEFI firmware for virtual machines):
An example use-case if you want for multicast is creating a corosync cluster. When that works, you have this patch right. :-)
diff --git a/src/network/bridge_driver_linux.c b/src/network/bridge_driver_linux.c index 80430a8..95add0e 100644 --- a/src/network/bridge_driver_linux.c +++ b/src/network/bridge_driver_linux.c @@ -127,6 +127,9 @@ out: return ret; }
+static const char networkLocalMulticast[] = "224.0.0.0/24";
NACK. 224.0.0.0/4 is the entire multicast space. 224.0.0.0/24 is just one tiny reserved "control block" of addresses for routing protocols, etc. There's a lot more to multicast than just routing protocols. Cheers, b.

On 09/23/13 16:40, Brian J. Murrell wrote:
On 13-09-23 10:05 AM, Laszlo Ersek wrote:
Packets sent by guests on virbrN, *or* by dnsmasq on the same, to - 255.255.255.255/32 (netmask-independent local network broadcast address), or to - 224.0.0.0/24 (local subnetwork multicast range)
All multicast, not just the local subnet multicast needs to be exempt from masquerading.
I would tend to guess that anyone trying to do "global Internet" multicast behind NAT is signing up for a world of trouble anyway. It's not like you could put a multicast source behind a NAT so it could really only apply to sinks.
Since sinks operate by "subscribing" to a multicast source by way of their multicast router (which should be local and not through a NAT) I would think a NAT device that supports multicast to it's NAT clients should support it as if it were a multicast router and not so much a NAT and so there never really should be NATting of traffic from a sink.
So AFACT there never really is a use-case for actually NATting multicast traffic, so just don't NAT the whole range, 224.0.0.0/4.
are never forwarded, hence it is not necessary to masquerade them.
In fact we must not masquerade them: translating their source addresses or source ports (where applicable) may confuse receivers on virbrN.
One example is the DHCP client in OVMF (= UEFI firmware for virtual machines):
An example use-case if you want for multicast is creating a corosync cluster. When that works, you have this patch right. :-)
diff --git a/src/network/bridge_driver_linux.c b/src/network/bridge_driver_linux.c index 80430a8..95add0e 100644 --- a/src/network/bridge_driver_linux.c +++ b/src/network/bridge_driver_linux.c @@ -127,6 +127,9 @@ out: return ret; }
+static const char networkLocalMulticast[] = "224.0.0.0/24";
NACK. 224.0.0.0/4 is the entire multicast space. 224.0.0.0/24 is just one tiny reserved "control block" of addresses for routing protocols, etc. There's a lot more to multicast than just routing protocols.
I won't do that in this series (see also Laine's review). In https://bugzilla.redhat.com/show_bug.cgi?id=709418#c13 I suggested to go with "224.0.0.0/24" now, and to let someone with good knowledge of multicast implement "224.0.0.0/4". You could write that patch too, and this email of yours would certainly be a good commit message basis. If you disagree with this approach (that is: if you think that "224.0.0.0/24" here is not gradual improvement but a step in the wrong direction), then - I'll drop "networkLocalMulticast" plus what depends on it completely, - I'll drop the BZ reference from the commit message, - I'll retitle the BZ so that it clearly concerns multicast only, again, like it did when you submitted it, - in short my series will focus only on local broadcast. The broadcast and the multicast use cases are distinct. I only tried to merge them because they appeared to affect the same code -- what I perceived to be a good first step for multicast seemed to be implementable with the same effort as local broadcast. However, if you think we can't (or shouldn't) build on this "first multicast step" present in the patch, then I'll drop multicast altogether. Sound good? Laszlo

On 13-09-23 02:27 PM, Laszlo Ersek wrote:
If you disagree with this approach (that is: if you think that "224.0.0.0/24" here is not gradual improvement but a step in the wrong direction),
Of course I'm not saying that. I think that's pretty clear. The only point we disagree on is the size of the network range, not the implementation of the feature so by definition of course your patch is a good initial improvement and should continue on that path. If somebody really needs to come along afterward as a separate effort and expand the range (or at least be able to leverage your work to do so in their own private builds) then that can happen. Cheers, b.

On 09/24/13 18:10, Brian J. Murrell wrote:
On 13-09-23 02:27 PM, Laszlo Ersek wrote:
If you disagree with this approach (that is: if you think that "224.0.0.0/24" here is not gradual improvement but a step in the wrong direction),
Of course I'm not saying that. I think that's pretty clear. The only point we disagree on is the size of the network range, not the implementation of the feature so by definition of course your patch is a good initial improvement and should continue on that path.
If somebody really needs to come along afterward as a separate effort and expand the range (or at least be able to leverage your work to do so in their own private builds) then that can happen.
Thanks, and that's really what I consider necessary. We agree that the change is not big or hard. It's just that - I can't convincingly argue the change in the commit message, - security is in the picture (and I can't argue it isn't), - hence I *really* don't want my S-o-b on the change. I'm not opposing the change at all, I just don't want my name on it, because I *can't prove* that it's secure. For the restrictive prefix I have at least public references. Thanks, Laszlo

On 09/23/2013 10:05 AM, Laszlo Ersek wrote:
Packets sent by guests on virbrN, *or* by dnsmasq on the same, to - 255.255.255.255/32 (netmask-independent local network broadcast address), or to - 224.0.0.0/24 (local subnetwork multicast range) are never forwarded, hence it is not necessary to masquerade them.
In fact we must not masquerade them: translating their source addresses or source ports (where applicable) may confuse receivers on virbrN.
One example is the DHCP client in OVMF (= UEFI firmware for virtual machines):
http://thread.gmane.org/gmane.comp.bios.tianocore.devel/1506/focus=2640
It expects DHCP replies to arrive from remote source port 67. Even though dnsmasq conforms to that, the destination address (255.255.255.255) and the source address (eg. 192.168.122.1) in the reply allow the UDP masquerading rule to match, which rewrites the source port to or above 1024. This prevents the DHCP client in OVMF from accepting the packet.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=709418
Signed-off-by: Laszlo Ersek <lersek@redhat.com> --- src/network/bridge_driver_linux.c | 70 ++++++++++++++++++++++++++++++++++++--- 1 file changed, 66 insertions(+), 4 deletions(-)
diff --git a/src/network/bridge_driver_linux.c b/src/network/bridge_driver_linux.c index 80430a8..95add0e 100644 --- a/src/network/bridge_driver_linux.c +++ b/src/network/bridge_driver_linux.c @@ -127,6 +127,9 @@ out: return ret; }
+static const char networkLocalMulticast[] = "224.0.0.0/24"; +static const char networkLocalBroadcast[] = "255.255.255.255/32";
1) After briefly looking at the references in the BZ, I agree with you that it's safest to not yet bypass NAT on the entire multicast range. Let's make sure we fully understand it, then submit a separate patch for the larger range. 2) Along with 255.255.255.255/32, I think this patch can/should also add a "networkDirectedLocalBroadcast" (which will obviously need to be a local variable and recomputed each time). This can be computed by ORing the ip address of the network with ~netmask, then appending a 32 prefix. So for example, the directed broadcast for 192.168.122.1/24 would be 192.168.122.255/32. Beyond that it looks good, so ACK with those changes (unless someone else has anything to say).
+ int networkAddMasqueradingFirewallRules(virNetworkObjPtr network, virNetworkIpDefPtr ipdef) { @@ -167,11 +170,20 @@ int networkAddMasqueradingFirewallRules(virNetworkObjPtr network, /* * Enable masquerading. * - * We need to end up with 3 rules in the table in this order + * We need to end up with 5 rules in the table in this order * - * 1. protocol=tcp with sport mapping restriction - * 2. protocol=udp with sport mapping restriction - * 3. generic any protocol + * 1. do not masquerade packets targeting 224.0.0.0/24 + * 2. do not masquerade packets targeting 255.255.255.255/32 + * 3. masquerade protocol=tcp with sport mapping restriction + * 4. masquerade protocol=udp with sport mapping restriction + * 5. generic, masquerade any protocol + * + * 224.0.0.0/24 is the local network multicast range. Packets are not + * forwarded outside. + * + * 255.255.255.255/32 is the broadcast address of any local network. Again, + * such packets are never forwarded, but strict DHCP clients don't accept + * DHCP replies with changed source ports. * * The sport mappings are required, because default IPtables * MASQUERADE maintain port numbers unchanged where possible. @@ -238,8 +250,50 @@ int networkAddMasqueradingFirewallRules(virNetworkObjPtr network, goto masqerr5; }
+ /* exempt local network broadcast address as destination */ + if (iptablesAddForwardDontMasquerade(&ipdef->address, + prefix, + forwardIf, + networkLocalBroadcast) < 0) { + if (forwardIf) + virReportError(VIR_ERR_SYSTEM_ERROR, + _("failed to add iptables rule to prevent local broadcast masquerading on %s"), + forwardIf); + else + virReportError(VIR_ERR_SYSTEM_ERROR, "%s", + _("failed to add iptables rule to prevent local broadcast masquerading")); + goto masqerr6; + } + + /* exempt local multicast range as destination */ + if (iptablesAddForwardDontMasquerade(&ipdef->address, + prefix, + forwardIf, + networkLocalMulticast) < 0) { + if (forwardIf) + virReportError(VIR_ERR_SYSTEM_ERROR, + _("failed to add iptables rule to prevent local multicast masquerading on %s"), + forwardIf); + else + virReportError(VIR_ERR_SYSTEM_ERROR, "%s", + _("failed to add iptables rule to prevent local multicast masquerading")); + goto masqerr7; + } + return 0;
+ masqerr7: + iptablesRemoveForwardDontMasquerade(&ipdef->address, + prefix, + forwardIf, + networkLocalBroadcast); + masqerr6: + iptablesRemoveForwardMasquerade(&ipdef->address, + prefix, + forwardIf, + &network->def->forward.addr, + &network->def->forward.port, + "tcp"); masqerr5: iptablesRemoveForwardMasquerade(&ipdef->address, prefix, @@ -275,6 +329,14 @@ void networkRemoveMasqueradingFirewallRules(virNetworkObjPtr network, const char *forwardIf = virNetworkDefForwardIf(network->def, 0);
if (prefix >= 0) { + iptablesRemoveForwardDontMasquerade(&ipdef->address, + prefix, + forwardIf, + networkLocalMulticast); + iptablesRemoveForwardDontMasquerade(&ipdef->address, + prefix, + forwardIf, + networkLocalBroadcast); iptablesRemoveForwardMasquerade(&ipdef->address, prefix, forwardIf,

On 09/23/13 16:46, Laine Stump wrote:
2) Along with 255.255.255.255/32, I think this patch can/should also add a "networkDirectedLocalBroadcast" (which will obviously need to be a local variable and recomputed each time). This can be computed by ORing the ip address of the network with ~netmask, then appending a 32 prefix. So for example, the directed broadcast for 192.168.122.1/24 would be 192.168.122.255/32.
I have just finished implementing and testing this. And now I realize that such a rule is not necessary at all :) Because, 192.168.122.255/32 actually *falls into* 192.168.122.0/24. Hence, the masquerading rules, which are restricted to !192.168.122.0/24 destination addresses, *ignore* 192.168.122.255/32 anyway. 255.255.255.255/32 is tricky because it never falls into the bridge's subnet numerically (consequently, it always matches the exclusive constraint), and yet it must not be masqueraded. I'm posting the v3 series anyway. It shouldn't be hard to trim it down for v4... Thanks, Laszlo

On 09/23/2013 08:01 PM, Laszlo Ersek wrote:
On 09/23/13 16:46, Laine Stump wrote:
2) Along with 255.255.255.255/32, I think this patch can/should also add a "networkDirectedLocalBroadcast" (which will obviously need to be a local variable and recomputed each time). This can be computed by ORing the ip address of the network with ~netmask, then appending a 32 prefix. So for example, the directed broadcast for 192.168.122.1/24 would be 192.168.122.255/32. I have just finished implementing and testing this. And now I realize that such a rule is not necessary at all :)
Yes, I'm embarrassed to say you are correct. Mixed up memories combined with reading through the BZ too quickly led me to the false recollection that even traffic that remained on the local subnet was being port-mapped. So you can just eliminate that part of the patch.
Because, 192.168.122.255/32 actually *falls into* 192.168.122.0/24. Hence, the masquerading rules, which are restricted to
!192.168.122.0/24
destination addresses, *ignore* 192.168.122.255/32 anyway.
255.255.255.255/32 is tricky because it never falls into the bridge's subnet numerically (consequently, it always matches the exclusive constraint), and yet it must not be masqueraded.
I'm posting the v3 series anyway. It shouldn't be hard to trim it down for v4...
Thanks, Laszlo .
participants (3)
-
Brian J. Murrell
-
Laine Stump
-
Laszlo Ersek