Warning, this is a long & complicated email with lots of horrible details :-)
I've long been a little confused by the way iptables & bridging interact,
so set out to do some experiments. I added a -j LOG rule to every single chain
in both the filter & nat tables, and then tried various traffic patterns, to
see which chains were traversed & in which order. There are 2 types of config
I considered - virtual networking, and shared physical device. For both these
I tried with net.bridge.bridge-nf-call-iptables on & off. This gave 4 scenarios
to test with. For the test I simply did 'ping -c 1 <ip addr>', which gives a
simple roundtrip with a single packet in each direction. The results were
as follows....
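For anyone wanting to repeat the experiment, the logging setup can be sketched roughly as below. This is only an illustration, not the exact commands used for the traces: the helper name log_rule_commands is made up, the prefixes are chosen to match the labels in the traces that follow, and the chain list assumes the nat table has PREROUTING, POSTROUTING and OUTPUT. It only prints the commands; actually running them needs root.

```python
# Sketch only: builds (does not run) iptables commands that put a LOG rule
# at the top of each chain. The chain list and prefixes are assumptions
# chosen to match the labels used in the traces in this email.
CHAINS = {
    "filter": ["INPUT", "FORWARD", "OUTPUT"],
    "nat": ["PREROUTING", "POSTROUTING", "OUTPUT"],
}


def log_rule_commands(chains=CHAINS):
    """Return one 'iptables -I ... -j LOG' command string per chain."""
    commands = []
    for table, names in chains.items():
        for chain in names:
            # Prefix nat chains so NAT-OUTPUT and OUTPUT can be told apart.
            prefix = chain if table == "filter" else f"NAT-{chain}"
            commands.append(
                f'iptables -t {table} -I {chain} 1 -j LOG --log-prefix "{prefix} "'
            )
    return commands


if __name__ == "__main__":
    for command in log_rule_commands():
        print(command)
```

Inserting at position 1 means the LOG rule is hit before any existing rule in the chain can accept or drop the packet.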
Scenario 1: Virtual network
===========================
net.bridge.bridge-nf-call-iptables = 0
Host: eth0 -> Internet
virbr0 -> MASQUERADE to eth0
Guest: vif1.0 -> virbr0
Traffic: Guest -> Google
------------------------
Out:
NAT-PREROUTING IN=virbr0 OUT= SRC=192.168.122.47 DST=64.233.167.99
FORWARD IN=virbr0 OUT=eth0 SRC=192.168.122.47 DST=64.233.167.99
NAT-POSTROUTING IN= OUT=eth0 SRC=192.168.122.47 DST=64.233.167.99
Back:
FORWARD IN=eth0 OUT=virbr0 SRC=64.233.167.99 DST=192.168.122.47
Traffic: Guest -> Host
----------------------
Out:
NAT-PREROUTING IN=virbr0 OUT= SRC=192.168.122.47 DST=192.168.122.1
INPUT IN=virbr0 OUT= SRC=192.168.122.47 DST=192.168.122.1
Back:
OUTPUT IN= OUT=virbr0 SRC=192.168.122.1 DST=192.168.122.47
Traffic: Host -> Guest
----------------------
Out:
NAT-OUTPUT IN= OUT=virbr0 SRC=192.168.122.1 DST=192.168.122.47
OUTPUT IN= OUT=virbr0 SRC=192.168.122.1 DST=192.168.122.47
NAT-POSTROUTING IN= OUT=virbr0 SRC=192.168.122.1 DST=192.168.122.47
Back:
INPUT IN=virbr0 OUT= SRC=192.168.122.47 DST=192.168.122.1
Scenario 2: Virtual network
===========================
net.bridge.bridge-nf-call-iptables = 1
Host: eth0 -> Internet
virbr0 -> MASQUERADE to eth0
Guest: vif1.0 -> virbr0
Traffic: Guest -> Google
------------------------
Out:
NAT-PREROUTING IN=virbr0 OUT= PHYSIN=vif1.0 SRC=192.168.122.47 DST=64.233.167.99
FORWARD IN=virbr0 OUT=eth0 PHYSIN=vif1.0 SRC=192.168.122.47 DST=64.233.167.99
NAT-POSTROUTING IN= OUT=eth0 PHYSIN=vif1.0 SRC=192.168.122.47 DST=64.233.167.99
Back:
FORWARD IN=eth0 OUT=virbr0 SRC=64.233.167.99 DST=192.168.122.47
Traffic: Guest -> Host
----------------------
Out:
NAT-PREROUTING IN=virbr0 OUT= PHYSIN=vif1.0 SRC=192.168.122.47 DST=192.168.122.1
INPUT IN=virbr0 OUT= PHYSIN=vif1.0 SRC=192.168.122.47 DST=192.168.122.1
Back:
OUTPUT IN= OUT=virbr0 SRC=192.168.122.1 DST=192.168.122.47
Traffic: Host -> Guest
----------------------
Out:
NAT-OUTPUT IN= OUT=virbr0 SRC=192.168.122.1 DST=192.168.122.47
OUTPUT IN= OUT=virbr0 SRC=192.168.122.1 DST=192.168.122.47
NAT-POSTROUTING IN= OUT=virbr0 SRC=192.168.122.1 DST=192.168.122.47
Back:
INPUT IN=virbr0 OUT= PHYSIN=vif1.0 SRC=192.168.122.47 DST=192.168.122.1
Scenario 3: Shared physical device
==================================
net.bridge.bridge-nf-call-iptables = 0
Host: peth1 -> Internet
xenbr0 -> peth1
Guest: vif2.0 -> xenbr0
Traffic: Guest -> Google
------------------------
Nada
Traffic: Guest -> Host
----------------------
Out:
NAT-PREROUTING IN=eth1 OUT= SRC=192.168.254.120 DST=192.168.254.132
INPUT IN=eth1 OUT= SRC=192.168.254.120 DST=192.168.254.132
Back:
OUTPUT IN= OUT=eth1 SRC=192.168.254.132 DST=192.168.254.120
Traffic: Host -> Guest
----------------------
Out:
NAT-OUTPUT IN= OUT=eth1 SRC=192.168.254.132 DST=192.168.254.120
OUTPUT IN= OUT=eth1 SRC=192.168.254.132 DST=192.168.254.120
NAT-POSTROUTING IN= OUT=eth1 SRC=192.168.254.132 DST=192.168.254.120
Back:
INPUT IN=eth1 OUT= SRC=192.168.254.120 DST=192.168.254.132
Scenario 4: Shared physical device
==================================
net.bridge.bridge-nf-call-iptables = 1
Host: peth1 -> Internet
xenbr0 -> peth1
Guest: vif2.0 -> xenbr0
Traffic: Guest -> Google
------------------------
Out:
NAT-PREROUTING IN=xenbr1 OUT= PHYSIN=vif2.0 SRC=192.168.254.120 DST=64.233.167.99
FORWARD IN=xenbr1 OUT=xenbr1 PHYSIN=vif2.0 PHYSOUT=peth1 SRC=192.168.254.120 DST=64.233.167.99
NAT-POSTROUTING IN= OUT=xenbr1 PHYSIN=vif2.0 PHYSOUT=peth1 SRC=192.168.254.120 DST=64.233.167.99
Back:
FORWARD IN=xenbr1 OUT=xenbr1 PHYSIN=peth1 PHYSOUT=vif2.0 SRC=64.233.167.99 DST=192.168.254.120
Traffic: Guest -> Host
----------------------
Out:
NAT-PREROUTING IN=xenbr1 OUT= PHYSIN=vif2.0 SRC=192.168.254.120 DST=192.168.254.132
FORWARD IN=xenbr1 OUT=xenbr1 PHYSIN=vif2.0 PHYSOUT=vif0.1 SRC=192.168.254.120 DST=192.168.254.132
NAT-POSTROUTING IN= OUT=xenbr1 PHYSIN=vif2.0 PHYSOUT=vif0.1 SRC=192.168.254.120 DST=192.168.254.132
INPUT IN=eth1 OUT= SRC=192.168.254.120 DST=192.168.254.132
Back:
OUTPUT IN= OUT=eth1 SRC=192.168.254.132 DST=192.168.254.120
FORWARD IN=xenbr1 OUT=xenbr1 PHYSIN=vif0.1 PHYSOUT=vif2.0 SRC=192.168.254.132 DST=192.168.254.120
Traffic: Host -> Guest
----------------------
Out:
NAT-OUTPUT IN= OUT=eth1 SRC=192.168.254.132 DST=192.168.254.120
OUTPUT IN= OUT=eth1 SRC=192.168.254.132 DST=192.168.254.120
NAT-POSTROUTING IN= OUT=eth1 SRC=192.168.254.132 DST=192.168.254.120
FORWARD IN=xenbr1 OUT=xenbr1 PHYSIN=vif0.1 PHYSOUT=vif2.0 SRC=192.168.254.132 DST=192.168.254.120
Back:
FORWARD IN=xenbr1 OUT=xenbr1 PHYSIN=vif2.0 PHYSOUT=vif0.1 SRC=192.168.254.120 DST=192.168.254.132
INPUT IN=eth1 OUT= SRC=192.168.254.120 DST=192.168.254.132
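Comparing these traces by eye gets tedious, so here is a small sketch of how they could be picked apart mechanically. It assumes the abbreviated format used in this email (a chain label followed by KEY=VALUE fields, one entry per line), not raw kernel log output, and the function names are purely illustrative. The second helper captures the one visible difference between the sysctl being off and on: whether any PHYSIN/PHYSOUT field shows up.

```python
# Sketch only: parses the abbreviated trace entries shown in this email
# (chain label followed by KEY=VALUE fields), not raw kernel log lines.
def parse_trace_line(line):
    """Split 'CHAIN KEY=VAL ...' into (chain, {KEY: VAL})."""
    chain, *fields = line.split()
    parsed = {}
    for field in fields:
        key, _, value = field.partition("=")
        parsed[key] = value  # a bare 'OUT=' yields an empty string
    return chain, parsed


def saw_bridge_info(lines):
    """True if any entry carries PHYSIN or PHYSOUT."""
    return any(
        key in parse_trace_line(line)[1]
        for line in lines
        for key in ("PHYSIN", "PHYSOUT")
    )


if __name__ == "__main__":
    trace = [
        "NAT-PREROUTING IN=virbr0 OUT= PHYSIN=vif1.0 SRC=192.168.122.47 DST=64.233.167.99",
        "FORWARD IN=virbr0 OUT=eth0 PHYSIN=vif1.0 SRC=192.168.122.47 DST=64.233.167.99",
    ]
    print([parse_trace_line(entry)[0] for entry in trace])
    print(saw_bridge_info(trace))
```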
Now in this email I'm really only concerned with the first 2 virtual network
scenarios. The shared physical device stuff can be ignored henceforth, basically
because it 'just works(tm)'.
For virtual networks there are basically 3 types of networking config we need to
represent in terms of iptables rules, and these need to work for scenarios 1 & 2,
ie regardless of the magic sysctl knob (net.bridge.bridge-nf-call-iptables).
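Since the rules must not depend on it, it can be handy to check which way the knob is set on a given host when testing. A minimal sketch, assuming the usual procfs path for net.bridge.bridge-nf-call-iptables; the function name is made up for illustration, and the file is simply absent when the kernel's bridge netfilter support is not available.

```python
from pathlib import Path

# Usual procfs location of the knob; the file only exists when the kernel's
# bridge netfilter support is available.
KNOB = Path("/proc/sys/net/bridge/bridge-nf-call-iptables")


def bridge_nf_call_iptables(path=KNOB):
    """Return True if the knob reads 1, False for any other value,
    or None if the file does not exist."""
    try:
        return path.read_text().strip() == "1"
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    state = bridge_nf_call_iptables()
    print({True: "on", False: "off", None: "not available"}[state])
```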
Here is what we currently implement......
Type 1: Isolated virtual network
--------------------------------
- We don't add anything here
Type 2: Forwarding to a specific NIC only
-----------------------------------------
Chain POSTROUTING (policy ACCEPT 345 packets, 32627 bytes)
 pkts bytes target     prot opt in     out    source               destination
    0     0 MASQUERADE all  --  *      eth1   0.0.0.0/0            0.0.0.0/0            PHYSDEV match ! --physdev-is-bridged
Chain FORWARD (policy ACCEPT 29 packets, 2244 bytes)
 pkts bytes target     prot opt in     out    source               destination
    0     0 ACCEPT     all  --  eth1   vnet0  0.0.0.0/0            0.0.0.0/0            state RELATED,ESTABLISHED
    0     0 ACCEPT     all  --  vnet0  eth1   0.0.0.0/0            0.0.0.0/0
    0     0 ACCEPT     all  --  *      eth1   0.0.0.0/0            0.0.0.0/0            PHYSDEV match --physdev-in vnet0
Chain INPUT (policy ACCEPT 80483 packets, 382M bytes)
 pkts bytes target     prot opt in     out    source               destination
    0     0 ACCEPT     udp  --  vnet0  *      0.0.0.0/0            0.0.0.0/0            udp dpt:53
    0     0 ACCEPT     tcp  --  vnet0  *      0.0.0.0/0            0.0.0.0/0            tcp dpt:53
    0     0 ACCEPT     udp  --  vnet0  *      0.0.0.0/0            0.0.0.0/0            udp dpt:67
    0     0 ACCEPT     tcp  --  vnet0  *      0.0.0.0/0            0.0.0.0/0            tcp dpt:67
Type 3: Forwarding to any active NIC
------------------------------------
Chain POSTROUTING (policy ACCEPT 360 packets, 33843 bytes)
 pkts bytes target     prot opt in     out    source               destination
    2   476 MASQUERADE all  --  *      *      0.0.0.0/0            0.0.0.0/0            PHYSDEV match ! --physdev-is-bridged
Chain FORWARD (policy ACCEPT 29 packets, 2244 bytes)
 pkts bytes target     prot opt in     out    source               destination
    0     0 ACCEPT     all  --  *      virbr0 0.0.0.0/0            0.0.0.0/0            state RELATED,ESTABLISHED
    0     0 ACCEPT     all  --  virbr0 *      0.0.0.0/0            0.0.0.0/0
    0     0 ACCEPT     all  --  *      *      0.0.0.0/0            0.0.0.0/0            PHYSDEV match --physdev-in virbr0
Chain INPUT (policy ACCEPT 80884 packets, 382M bytes)
 pkts bytes target     prot opt in     out    source               destination
    0     0 ACCEPT     udp  --  virbr0 *      0.0.0.0/0            0.0.0.0/0            udp dpt:53
    0     0 ACCEPT     tcp  --  virbr0 *      0.0.0.0/0            0.0.0.0/0            tcp dpt:53
    0     0 ACCEPT     udp  --  virbr0 *      0.0.0.0/0            0.0.0.0/0            udp dpt:67
    0     0 ACCEPT     tcp  --  virbr0 *      0.0.0.0/0            0.0.0.0/0            tcp dpt:67
So how do these shape up, given the traversal scenarios & the overall desire to be
as restrictive as possible with traffic?
Problem: The INPUT rules are missing altogether for the isolated virtual network
so potentially DHCP/DNS will be blocked
Solution: Add them - simple bug.
Problem: The POSTROUTING rule is too generic so it matches pretty much any kind
of traffic, from any virtual network, or even from VPN devices set up
by VPNC.
Solution: Only masquerade traffic whose source address is within the netmask
associated with the virtual network in question
Problem: The FORWARD rule is too generic, forwarding traffic to/from the
virtual network regardless of whether the dest/src IP address
is within the netmask associated with the virtual network. Assuming
the POSTROUTING rule is fixed to only masquerade valid IP addresses
from the virtual network, this rule would still allow guests to
spoof their IP and have it forwarded off-host.
Solution: Only forward packets whose IP address is within the netmask
associated with the virtual network
Problem: The policy of the FORWARD chain is ACCEPT, and/or later user defined
rules may inadvertently match on traffic from the virtual network,
again allowing through spoofed traffic, or traffic from what should
be an isolated virtual network
Solution: There needs to be a catch-all REJECT rule associated with every
bridge device, in both directions
Problem: There is an extra physdev match per bridge device, and per guest
device. This is basically unnecessary since the previous rule
sets will already have allowed through the traffic. The physdev
matches also only work if net.bridge.bridge-nf-call-iptables = 1
Solution: Simply remove the per-device matches
Problem: The POSTROUTING rule has a physdev match applied, which only works
if net.bridge.bridge-nf-call-iptables = 1.
Solution: Remove physdev match & masquerade based on network address associated
with the virtual network
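To make the address restriction in the solutions above concrete, here is a tiny sketch of the check that a source match such as '-s 192.168.122.0/24' performs. iptables does this itself in the kernel; the Python below is only an illustration of which packets would and would not match, and the function name is invented for the example.

```python
import ipaddress


def source_allowed(src, network="192.168.122.0/24"):
    """True if src lies inside the virtual network's address range."""
    return ipaddress.ip_address(src) in ipaddress.ip_network(network)


if __name__ == "__main__":
    print(source_allowed("192.168.122.47"))   # the guest from scenarios 1 & 2
    print(source_allowed("192.168.254.120"))  # an address from a different network
```

A guest that spoofs a source address outside its network's range fails this check, so it would fall through to whatever rule comes next rather than being forwarded or masqueraded.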
If we apply all the solutions outlined here, we'll end up with a set of rules which look
like this..........
Type 1: Isolated virtual network
--------------------------------
Chain POSTROUTING (policy ACCEPT 273 packets, 26341 bytes)
 pkts bytes target     prot opt in     out    source               destination
Chain FORWARD (policy ACCEPT 29 packets, 2244 bytes)
 pkts bytes target     prot opt in     out    source               destination
    0     0 REJECT     all  --  *      vnet2  0.0.0.0/0            0.0.0.0/0            reject-with icmp-port-unreachable
    0     0 REJECT     all  --  vnet2  *      0.0.0.0/0            0.0.0.0/0            reject-with icmp-port-unreachable
Chain INPUT (policy ACCEPT 76724 packets, 366M bytes)
 pkts bytes target     prot opt in     out    source               destination
    0     0 ACCEPT     udp  --  vnet2  *      0.0.0.0/0            0.0.0.0/0            udp dpt:53
    0     0 ACCEPT     tcp  --  vnet2  *      0.0.0.0/0            0.0.0.0/0            tcp dpt:53
    0     0 ACCEPT     udp  --  vnet2  *      0.0.0.0/0            0.0.0.0/0            udp dpt:67
    0     0 ACCEPT     tcp  --  vnet2  *      0.0.0.0/0            0.0.0.0/0            tcp dpt:67
Type 2: Forwarding to a specific NIC only
-----------------------------------------
Chain POSTROUTING (policy ACCEPT 273 packets, 26341 bytes)
 pkts bytes target     prot opt in     out    source               destination
    0     0 MASQUERADE all  --  *      eth1   192.168.200.0/24     0.0.0.0/0
Chain FORWARD (policy ACCEPT 29 packets, 2244 bytes)
 pkts bytes target     prot opt in     out    source               destination
    0     0 ACCEPT     all  --  eth1   vnet3  0.0.0.0/0            192.168.200.0/24     state RELATED,ESTABLISHED
    0     0 ACCEPT     all  --  vnet3  eth1   192.168.200.0/24     0.0.0.0/0
    0     0 REJECT     all  --  *      vnet3  0.0.0.0/0            0.0.0.0/0            reject-with icmp-port-unreachable
    0     0 REJECT     all  --  vnet3  *      0.0.0.0/0            0.0.0.0/0            reject-with icmp-port-unreachable
Chain INPUT (policy ACCEPT 76724 packets, 366M bytes)
 pkts bytes target     prot opt in     out    source               destination
    0     0 ACCEPT     udp  --  vnet3  *      0.0.0.0/0            0.0.0.0/0            udp dpt:53
    0     0 ACCEPT     tcp  --  vnet3  *      0.0.0.0/0            0.0.0.0/0            tcp dpt:53
    0     0 ACCEPT     udp  --  vnet3  *      0.0.0.0/0            0.0.0.0/0            udp dpt:67
    0     0 ACCEPT     tcp  --  vnet3  *      0.0.0.0/0            0.0.0.0/0            tcp dpt:67
Type 3: Forwarding to any active NIC
------------------------------------
Chain POSTROUTING (policy ACCEPT 273 packets, 26341 bytes)
 pkts bytes target     prot opt in     out    source               destination
   16  1292 MASQUERADE all  --  *      *      192.168.122.0/24     0.0.0.0/0
Chain FORWARD (policy ACCEPT 29 packets, 2244 bytes)
 pkts bytes target     prot opt in     out    source               destination
   44 20200 ACCEPT     all  --  *      virbr0 0.0.0.0/0            192.168.122.0/24     state RELATED,ESTABLISHED
   56  3676 ACCEPT     all  --  virbr0 *      192.168.122.0/24     0.0.0.0/0
    0     0 REJECT     all  --  *      virbr0 0.0.0.0/0            0.0.0.0/0            reject-with icmp-port-unreachable
    0     0 REJECT     all  --  virbr0 *      0.0.0.0/0            0.0.0.0/0            reject-with icmp-port-unreachable
Chain INPUT (policy ACCEPT 76724 packets, 366M bytes)
 pkts bytes target     prot opt in     out    source               destination
   28  1728 ACCEPT     udp  --  virbr0 *      0.0.0.0/0            0.0.0.0/0            udp dpt:53
    0     0 ACCEPT     tcp  --  virbr0 *      0.0.0.0/0            0.0.0.0/0            tcp dpt:53
    5  1640 ACCEPT     udp  --  virbr0 *      0.0.0.0/0            0.0.0.0/0            udp dpt:67
    0     0 ACCEPT     tcp  --  virbr0 *      0.0.0.0/0            0.0.0.0/0            tcp dpt:67
So in summary:
- Every single network type has the 4 INPUT rules for DHCP/DNS
- Every single network type has catch all REJECT rules in FORWARD chain for
both directions of traffic
- A network forwarding to any device, has ACCEPT rules which allow through
traffic associated with the virtual network IP range to/from any device
- A network forwarding to a specific device, has ACCEPT rules which allow
through traffic associated with the virtual network IP range to/from that
specific device.
- A network forwarding to any device, has a MASQUERADE rule to translate any
source address which matches the virtual network & is destined for any dev
- A network forwarding to a specific device, has a MASQUERADE rule to translate
any source address which matches the virtual network & is destined for that
specific device
- There are no physdev matches needed.
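The summary above can be sketched as a small rule generator. To be clear, this is not the attached patch and not libvirt's actual code: the function and parameter names are invented for illustration, it only builds command strings, and it uses -A (append) purely so that the output order mirrors the listings above. On a host with existing rules the insertion position would need more thought.

```python
# Sketch only: builds iptables command strings for one virtual network.
# Names are illustrative; this is not the attached patch.
def virtual_network_rules(bridge, network=None, forward=False, out_dev=None):
    """Return commands for an isolated network (forward=False), or a
    forwarding one limited to out_dev, or to any device if out_dev is None."""
    if forward and network is None:
        raise ValueError("a forwarding network needs its address range")
    rules = []
    # DNS (53) and DHCP (67) from guests to the host: every network type.
    for port in (53, 67):
        for proto in ("udp", "tcp"):
            rules.append(
                f"iptables -A INPUT -i {bridge} -p {proto} --dport {port} -j ACCEPT"
            )
    if forward:
        # Pin rules to one NIC when out_dev is given, else match any device.
        to_dev = f" -o {out_dev}" if out_dev else ""
        from_dev = f" -i {out_dev}" if out_dev else ""
        rules.append(
            f"iptables -A FORWARD{from_dev} -o {bridge} -d {network}"
            " -m state --state RELATED,ESTABLISHED -j ACCEPT"
        )
        rules.append(f"iptables -A FORWARD -i {bridge}{to_dev} -s {network} -j ACCEPT")
        rules.append(
            f"iptables -t nat -A POSTROUTING{to_dev} -s {network} -j MASQUERADE"
        )
    # Catch-all REJECT in both directions, after any ACCEPT rules.
    rules.append(f"iptables -A FORWARD -o {bridge} -j REJECT")
    rules.append(f"iptables -A FORWARD -i {bridge} -j REJECT")
    return rules


if __name__ == "__main__":
    for rule in virtual_network_rules("vnet3", "192.168.200.0/24", True, "eth1"):
        print(rule)
```

Calling it with just a bridge name gives the isolated case (INPUT rules plus the two REJECTs), and leaving out_dev unset gives the forward-to-any-NIC case.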
Hopefully at least one person has read this far through the email and still
understands what is going on....
I'm attaching a patch which implements all this.
BTW, there is also a bug in the vif-bridge script for Xen which adds a
per-guest VIF physdev match rule. This needs to be removed too.
Regards,
Dan.
--
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|