
Warning, this is a long & complicated email with lots of horrible details :-)

I've long been a little confused by the way iptables & bridging interact, so I set out to do some experiments. I added a -j LOG rule to every single chain in both the filter & nat tables, and then tried various traffic patterns to see which chains were traversed & in which order.

There are 2 types of config I considered: virtual networking, and shared physical device. For both of these I tried with net.bridge.bridge-nf-call-iptables on & off. This gave 4 scenarios to test. For the test I simply did 'ping -c 1 <ip addr>', which gives a simple roundtrip with a single packet in each direction. The results were as follows....


Scenario 1: Virtual network
===========================

  net.bridge.bridge-nf-call-iptables = 0

  Host:   eth0 -> Internet
          virbr0 -> MASQUERADE to eth0
  Guest:  vif1.0 -> virbr0

Traffic: Guest -> Google
------------------------

 Out:
  NAT-PREROUTING   IN=virbr0 OUT= SRC=192.168.122.47 DST=64.233.167.99
  FORWARD          IN=virbr0 OUT=eth0 SRC=192.168.122.47 DST=64.233.167.99
  NAT-POSTROUTING  IN= OUT=eth0 SRC=192.168.122.47 DST=64.233.167.99
 Back:
  FORWARD          IN=eth0 OUT=virbr0 SRC=64.233.167.99 DST=192.168.122.47

Traffic: Guest -> Host
----------------------

 Out:
  NAT-PREROUTING   IN=virbr0 OUT= SRC=192.168.122.47 DST=192.168.122.1
  INPUT            IN=virbr0 OUT= SRC=192.168.122.47 DST=192.168.122.1
 Back:
  OUTPUT           IN= OUT=virbr0 SRC=192.168.122.1 DST=192.168.122.47

Traffic: Host -> Guest
----------------------

 Out:
  NAT-OUTPUT       IN= OUT=virbr0 SRC=192.168.122.1 DST=192.168.122.47
  OUTPUT           IN= OUT=virbr0 SRC=192.168.122.1 DST=192.168.122.47
  NAT-POSTROUTING  IN= OUT=virbr0 SRC=192.168.122.1 DST=192.168.122.47
 Back:
  INPUT            IN=virbr0 OUT= SRC=192.168.122.47 DST=192.168.122.1


Scenario 2: Virtual network
===========================

  net.bridge.bridge-nf-call-iptables = 1

  Host:   eth0 -> Internet
          virbr0 -> MASQUERADE to eth0
  Guest:  vif1.0 -> virbr0

Traffic: Guest -> Google
------------------------

 Out:
  NAT-PREROUTING   IN=virbr0 OUT= PHYSIN=vif1.0 SRC=192.168.122.47 DST=64.233.167.99
  FORWARD          IN=virbr0 OUT=eth0 PHYSIN=vif1.0 SRC=192.168.122.47 DST=64.233.167.99
  NAT-POSTROUTING  IN= OUT=eth0 PHYSIN=vif1.0 SRC=192.168.122.47 DST=64.233.167.99
 Back:
  FORWARD          IN=eth0 OUT=virbr0 SRC=64.233.167.99 DST=192.168.122.47

Traffic: Guest -> Host
----------------------

 Out:
  NAT-PREROUTING   IN=virbr0 OUT= PHYSIN=vif1.0 SRC=192.168.122.47 DST=192.168.122.1
  INPUT            IN=virbr0 OUT= PHYSIN=vif1.0 SRC=192.168.122.47 DST=192.168.122.1
 Back:
  OUTPUT           IN= OUT=virbr0 SRC=192.168.122.1 DST=192.168.122.47

Traffic: Host -> Guest
----------------------

 Out:
  NAT-OUTPUT       IN= OUT=virbr0 SRC=192.168.122.1 DST=192.168.122.47
  OUTPUT           IN= OUT=virbr0 SRC=192.168.122.1 DST=192.168.122.47
  NAT-POSTROUTING  IN= OUT=virbr0 SRC=192.168.122.1 DST=192.168.122.47
 Back:
  INPUT            IN=virbr0 OUT= PHYSIN=vif1.0 SRC=192.168.122.47 DST=192.168.122.1


Scenario 3: Shared physical device
==================================

  net.bridge.bridge-nf-call-iptables = 0

  Host:   peth1 -> Internet
          xenbr0 -> peth1
  Guest:  vif2.0 -> xenbr0

Traffic: Guest -> Google
------------------------

  Nada

Traffic: Guest -> Host
----------------------

 Out:
  NAT-PREROUTING   IN=eth1 OUT= SRC=192.168.254.120 DST=192.168.254.132
  INPUT            IN=eth1 OUT= SRC=192.168.254.120 DST=192.168.254.132
 Back:
  OUTPUT           IN= OUT=eth1 SRC=192.168.254.132 DST=192.168.254.120

Traffic: Host -> Guest
----------------------

 Out:
  NAT-OUTPUT       IN= OUT=eth1 SRC=192.168.254.132 DST=192.168.254.120
  OUTPUT           IN= OUT=eth1 SRC=192.168.254.132 DST=192.168.254.120
  NAT-POSTROUTING  IN= OUT=eth1 SRC=192.168.254.132 DST=192.168.254.120
 Back:
  INPUT            IN=eth1 OUT= SRC=192.168.254.120 DST=192.168.254.132


Scenario 4: Shared physical device
==================================

  net.bridge.bridge-nf-call-iptables = 1

  Host:   peth1 -> Internet
          xenbr0 -> peth1
  Guest:  vif2.0 -> xenbr0

Traffic: Guest -> Google
------------------------

 Out:
  NAT-PREROUTING   IN=xenbr1 OUT= PHYSIN=vif2.0 SRC=192.168.254.120 DST=64.233.167.99
  FORWARD          IN=xenbr1 OUT=xenbr1 PHYSIN=vif2.0 PHYSOUT=peth1 SRC=192.168.254.120 DST=64.233.167.99
  NAT-POSTROUTING  IN= OUT=xenbr1 PHYSIN=vif2.0 PHYSOUT=peth1 SRC=192.168.254.120 DST=64.233.167.99
 Back:
  FORWARD          IN=xenbr1 OUT=xenbr1 PHYSIN=peth1 PHYSOUT=vif2.0 SRC=64.233.167.99 DST=192.168.254.120

Traffic: Guest -> Host
----------------------

 Out:
  NAT-PREROUTING   IN=xenbr1 OUT= PHYSIN=vif2.0 SRC=192.168.254.120 DST=192.168.254.132
  FORWARD          IN=xenbr1 OUT=xenbr1 PHYSIN=vif2.0 PHYSOUT=vif0.1 SRC=192.168.254.120 DST=192.168.254.132
  NAT-POSTROUTING  IN= OUT=xenbr1 PHYSIN=vif2.0 PHYSOUT=vif0.1 SRC=192.168.254.120 DST=192.168.254.132
  INPUT            IN=eth1 OUT= SRC=192.168.254.120 DST=192.168.254.132
 Back:
  OUTPUT           IN= OUT=eth1 SRC=192.168.254.132 DST=192.168.254.120
  FORWARD          IN=xenbr1 OUT=xenbr1 PHYSIN=vif0.1 PHYSOUT=vif2.0 SRC=192.168.254.132 DST=192.168.254.120

Traffic: Host -> Guest
----------------------

 Out:
  NAT-OUTPUT       IN= OUT=eth1 SRC=192.168.254.132 DST=192.168.254.120
  OUTPUT           IN= OUT=eth1 SRC=192.168.254.132 DST=192.168.254.120
  NAT-POSTROUTING  IN= OUT=eth1 SRC=192.168.254.132 DST=192.168.254.120
  FORWARD          IN=xenbr1 OUT=xenbr1 PHYSIN=vif0.1 PHYSOUT=vif2.0 SRC=192.168.254.132 DST=192.168.254.120
 Back:
  FORWARD          IN=xenbr1 OUT=xenbr1 PHYSIN=vif2.0 PHYSOUT=vif0.1 SRC=192.168.254.120 DST=192.168.254.132
  INPUT            IN=eth1 OUT= SRC=192.168.254.120 DST=192.168.254.132


Now in this email I'm really only concerned with the first 2 virtual network scenarios. The shared physical device stuff can be ignored henceforth, basically because it 'just works(tm)'.

For virtual networks there are basically 3 types of networking config we need to represent in terms of iptables rules, and these need to work for scenarios 1 & 2 - ie regardless of the magic sysctl knob. Here is what we currently implement......

Type 1: Isolated virtual network
--------------------------------

 - We don't add anything here

Type 2: Forwarding to a specific NIC only
-----------------------------------------

Chain POSTROUTING (policy ACCEPT 345 packets, 32627 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 MASQUERADE all  --  *      eth1    0.0.0.0/0            0.0.0.0/0           PHYSDEV match ! --physdev-is-bridged

Chain FORWARD (policy ACCEPT 29 packets, 2244 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     all  --  eth1   vnet0   0.0.0.0/0            0.0.0.0/0           state RELATED,ESTABLISHED
    0     0 ACCEPT     all  --  vnet0  eth1    0.0.0.0/0            0.0.0.0/0
    0     0 ACCEPT     all  --  *      eth1    0.0.0.0/0            0.0.0.0/0           PHYSDEV match --physdev-in vnet0

Chain INPUT (policy ACCEPT 80483 packets, 382M bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     udp  --  vnet0  *       0.0.0.0/0            0.0.0.0/0           udp dpt:53
    0     0 ACCEPT     tcp  --  vnet0  *       0.0.0.0/0            0.0.0.0/0           tcp dpt:53
    0     0 ACCEPT     udp  --  vnet0  *       0.0.0.0/0            0.0.0.0/0           udp dpt:67
    0     0 ACCEPT     tcp  --  vnet0  *       0.0.0.0/0            0.0.0.0/0           tcp dpt:67

Type 3: Forwarding to any active NIC
------------------------------------

Chain POSTROUTING (policy ACCEPT 360 packets, 33843 bytes)
 pkts bytes target     prot opt in     out     source               destination
    2   476 MASQUERADE all  --  *      *       0.0.0.0/0            0.0.0.0/0           PHYSDEV match ! --physdev-is-bridged

Chain FORWARD (policy ACCEPT 29 packets, 2244 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     all  --  *      virbr0  0.0.0.0/0            0.0.0.0/0           state RELATED,ESTABLISHED
    0     0 ACCEPT     all  --  virbr0 *       0.0.0.0/0            0.0.0.0/0
    0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           PHYSDEV match --physdev-in virbr0

Chain INPUT (policy ACCEPT 80884 packets, 382M bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     udp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0           udp dpt:53
    0     0 ACCEPT     tcp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:53
    0     0 ACCEPT     udp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0           udp dpt:67
    0     0 ACCEPT     tcp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:67


So how do these shape up, given the traversal scenarios & the overall desire to be as restrictive as possible with traffic?

Problem:  The INPUT rules are missing altogether for the isolated virtual network, so potentially DHCP/DNS will be blocked.
Solution: Add them - simple bug.

Problem:  The POSTROUTING rule is too generic, so it matches pretty much any kind of traffic, from any virtual network, or even from VPN devices set up by VPNC.
Solution: Only masquerade traffic whose source address is within the netmask associated with the virtual network in question.

Problem:  The FORWARD rule is too generic, forwarding traffic to/from the virtual network regardless of whether the dest/src IP address is within the netmask associated with the virtual network. Assuming the previous problem is fixed so that we only masquerade valid IP addresses from the virtual network, this rule would then allow guests to spoof their IP and have it forwarded off-host.
Solution: Only forward packets whose IP address is within the netmask associated with the virtual network.

Problem:  The policy of the FORWARD chain is ACCEPT, and/or later user defined rules may inadvertently match on traffic from the virtual network, again allowing through spoofed traffic, or traffic from what should be an isolated virtual network.
Solution: There needs to be a catch-all REJECT rule associated with every bridge device, in both directions.

Problem:  There is an extra physdev match per bridge device, and per guest device. This is basically unnecessary since the previous rule sets will already have allowed through the traffic. The physdev matches also only work if net.bridge.bridge-nf-call-iptables = 1.
Solution: Simply remove the per-device matches.

Problem:  The POSTROUTING rule has a physdev match applied, which only works if net.bridge.bridge-nf-call-iptables = 1.
Solution: Remove the physdev match & masquerade based on the network address associated with the virtual network.

If we apply all the solutions outlined here, we'll end up with a set of rules which look like this..........

Type 1: Isolated virtual network
--------------------------------

Chain POSTROUTING (policy ACCEPT 273 packets, 26341 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain FORWARD (policy ACCEPT 29 packets, 2244 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 REJECT     all  --  *      vnet2   0.0.0.0/0            0.0.0.0/0           reject-with icmp-port-unreachable
    0     0 REJECT     all  --  vnet2  *       0.0.0.0/0            0.0.0.0/0           reject-with icmp-port-unreachable

Chain INPUT (policy ACCEPT 76724 packets, 366M bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     udp  --  vnet2  *       0.0.0.0/0            0.0.0.0/0           udp dpt:53
    0     0 ACCEPT     tcp  --  vnet2  *       0.0.0.0/0            0.0.0.0/0           tcp dpt:53
    0     0 ACCEPT     udp  --  vnet2  *       0.0.0.0/0            0.0.0.0/0           udp dpt:67
    0     0 ACCEPT     tcp  --  vnet2  *       0.0.0.0/0            0.0.0.0/0           tcp dpt:67

Type 2: Forwarding to a specific NIC only
-----------------------------------------

Chain POSTROUTING (policy ACCEPT 273 packets, 26341 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 MASQUERADE all  --  *      eth1    192.168.200.0/24     0.0.0.0/0

Chain FORWARD (policy ACCEPT 29 packets, 2244 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     all  --  eth1   vnet3   0.0.0.0/0            192.168.200.0/24    state RELATED,ESTABLISHED
    0     0 ACCEPT     all  --  vnet3  eth1    192.168.200.0/24     0.0.0.0/0
    0     0 REJECT     all  --  *      vnet3   0.0.0.0/0            0.0.0.0/0           reject-with icmp-port-unreachable
    0     0 REJECT     all  --  vnet3  *       0.0.0.0/0            0.0.0.0/0           reject-with icmp-port-unreachable

Chain INPUT (policy ACCEPT 76724 packets, 366M bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     udp  --  vnet3  *       0.0.0.0/0            0.0.0.0/0           udp dpt:53
    0     0 ACCEPT     tcp  --  vnet3  *       0.0.0.0/0            0.0.0.0/0           tcp dpt:53
    0     0 ACCEPT     udp  --  vnet3  *       0.0.0.0/0            0.0.0.0/0           udp dpt:67
    0     0 ACCEPT     tcp  --  vnet3  *       0.0.0.0/0            0.0.0.0/0           tcp dpt:67

Type 3: Forwarding to any active NIC
------------------------------------

Chain POSTROUTING (policy ACCEPT 273 packets, 26341 bytes)
 pkts bytes target     prot opt in     out     source               destination
   16  1292 MASQUERADE all  --  *      *       192.168.122.0/24     0.0.0.0/0

Chain FORWARD (policy ACCEPT 29 packets, 2244 bytes)
 pkts bytes target     prot opt in     out     source               destination
   44 20200 ACCEPT     all  --  *      virbr0  0.0.0.0/0            192.168.122.0/24    state RELATED,ESTABLISHED
   56  3676 ACCEPT     all  --  virbr0 *       192.168.122.0/24     0.0.0.0/0
    0     0 REJECT     all  --  *      virbr0  0.0.0.0/0            0.0.0.0/0           reject-with icmp-port-unreachable
    0     0 REJECT     all  --  virbr0 *       0.0.0.0/0            0.0.0.0/0           reject-with icmp-port-unreachable

Chain INPUT (policy ACCEPT 76724 packets, 366M bytes)
 pkts bytes target     prot opt in     out     source               destination
   28  1728 ACCEPT     udp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0           udp dpt:53
    0     0 ACCEPT     tcp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:53
    5  1640 ACCEPT     udp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0           udp dpt:67
    0     0 ACCEPT     tcp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:67


So in summary:

 - Every single network type has the 4 INPUT rules for DHCP/DNS
 - Every single network type has catch-all REJECT rules in the FORWARD chain for both directions of traffic
 - A network forwarding to any device has ACCEPT rules which allow through traffic associated with the virtual network IP range to/from any device
 - A network forwarding to a specific device has ACCEPT rules which allow through traffic associated with the virtual network IP range to/from that specific device
 - A network forwarding to any device has a MASQUERADE rule to translate traffic whose source address matches the virtual network & which is destined for any device
 - A network forwarding to a specific device has a MASQUERADE rule to translate traffic whose source address matches the virtual network & which is destined for that specific device
 - There are no physdev matches needed.

Hopefully at least one person has read this far through the email and still understands what is going on.... I'm attaching a patch which implements all this.

BTW, there is also a bug in the vif-bridge script for Xen which adds a per-guest VIF physdev match rule. This needs to be removed too.

Regards,
Dan.
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-           Perl modules: http://search.cpan.org/~danberr/              -=|
|=-               Projects: http://freshmeat.net/~danielpb/               -=|
|=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=|
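
P.S. For anyone who finds the rule listings hard to map back to actual commands, here is a rough sketch of how the three proposed rule sets could be generated. This is only an illustration and is not the attached patch: the function name virtual_network_rules and its parameters are made up for this example, and it only builds iptables argument lists without running anything. It assumes rules are appended with -A, so the ACCEPT rules have to be emitted before the catch-all REJECT rules.

```python
from typing import List, Optional


def virtual_network_rules(bridge: str,
                          network: Optional[str] = None,
                          forward_dev: Optional[str] = None,
                          forwarding: bool = False) -> List[List[str]]:
    """Build iptables argument lists for one virtual network (hypothetical helper).

    bridge      -- bridge device of the virtual network, e.g. "virbr0"
    network     -- address range of the virtual network, e.g. "192.168.122.0/24"
    forward_dev -- only forward via this NIC; None means any device
    forwarding  -- False for an isolated virtual network
    """
    rules: List[List[str]] = []

    # Every network type: let guests reach DNS (53) and DHCP (67) on the host.
    for port in ("53", "67"):
        for proto in ("udp", "tcp"):
            rules.append(["-A", "INPUT", "-i", bridge, "-p", proto,
                          "--dport", port, "-j", "ACCEPT"])

    if forwarding:
        if network is None:
            raise ValueError("a forwarding network needs its address range")
        # Optional restriction to a single physical device.
        dev_in = ["-i", forward_dev] if forward_dev else []
        dev_out = ["-o", forward_dev] if forward_dev else []

        # Replies to the guests: destination must be inside the network.
        rules.append(["-A", "FORWARD"] + dev_in +
                     ["-o", bridge, "-d", network,
                      "-m", "state", "--state", "RELATED,ESTABLISHED",
                      "-j", "ACCEPT"])
        # Traffic from the guests: source must be inside the network.
        rules.append(["-A", "FORWARD", "-i", bridge] + dev_out +
                     ["-s", network, "-j", "ACCEPT"])
        # Masquerade by source address, with no physdev match.
        rules.append(["-t", "nat", "-A", "POSTROUTING", "-s", network] +
                     dev_out + ["-j", "MASQUERADE"])

    # Every network type: catch-all REJECT in both directions, added last.
    rules.append(["-A", "FORWARD", "-o", bridge, "-j", "REJECT"])
    rules.append(["-A", "FORWARD", "-i", bridge, "-j", "REJECT"])
    return rules


if __name__ == "__main__":
    # Print the commands for a network forwarding to any active NIC.
    for rule in virtual_network_rules("virbr0", "192.168.122.0/24",
                                      forwarding=True):
        print("iptables " + " ".join(rule))
```

Calling it with just a bridge name gives the isolated case (INPUT rules plus the two REJECT rules), and passing forward_dev="eth1" gives the specific-NIC case.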