[Libvir] Virtual network iptables rules

5 Apr 2007

      Warning, this is a long & complicated email with lots of horrible details :-)

I've long been a little confused by the way iptables & bridging interact,
so I set out to do some experiments. I added a -j LOG rule to every single chain
in both the filter & nat tables, and then tried various traffic patterns, to
see which chains were traversed & in which order. There are 2 types of config
I considered - virtual networking, and shared physical device. For both of these
I tried with net.bridge.bridge-nf-call-iptables on & off. This gave 4 scenarios
to test with. For the test I simply did 'ping -c 1 <ip addr>', which gives a
simple roundtrip with a single packet in each direction. The results were
as follows....
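
As a rough sketch of the logging setup described above (not the exact commands
used for these tests), the following Python snippet prints one iptables LOG rule
per built-in chain. The helper name log_rule_commands and the prefix format are
assumptions, chosen to match the chain labels used in the traces; the nat table
is assumed to have only the PREROUTING, OUTPUT and POSTROUTING built-in chains.

```python
# Built-in chains per table (an assumed set; adjust for your kernel).
BUILTIN_CHAINS = {
    "filter": ["INPUT", "FORWARD", "OUTPUT"],
    "nat": ["PREROUTING", "OUTPUT", "POSTROUTING"],
}


def log_rule_commands():
    """Return iptables command lines that insert a LOG rule at the top of each chain."""
    commands = []
    for table, chains in BUILTIN_CHAINS.items():
        for chain in chains:
            # Prefix nat chains with "NAT-" so the two OUTPUT chains can be told apart.
            prefix = chain if table == "filter" else "NAT-" + chain
            commands.append(
                f'iptables -t {table} -I {chain} 1 -j LOG --log-prefix "{prefix} "'
            )
    return commands


if __name__ == "__main__":
    # Only prints the commands; running them needs root and is left to the reader.
    for command in log_rule_commands():
        print(command)
```

The rules are inserted at position 1 so that the packet is logged before any
existing rule in the chain gets a chance to accept or drop it.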

Scenario 1: Virtual network
===========================

  net.bridge.bridge-nf-call-iptables = 0

  Host:  eth0 -> Internet
         virbr0 -> MASQUERADE to eth0

  Guest: vif1.0 -> virbr0

Traffic: Guest -> Google
------------------------

Out:

NAT-PREROUTING  IN=virbr0 OUT=       SRC=192.168.122.47  DST=64.233.167.99
FORWARD         IN=virbr0 OUT=eth0   SRC=192.168.122.47  DST=64.233.167.99
NAT-POSTROUTING IN=       OUT=eth0   SRC=192.168.122.47  DST=64.233.167.99

Back:

FORWARD         IN=eth0   OUT=virbr0 SRC=64.233.167.99   DST=192.168.122.47

Traffic: Guest -> Host
----------------------

Out:

NAT-PREROUTING  IN=virbr0 OUT=       SRC=192.168.122.47  DST=192.168.122.1
INPUT           IN=virbr0 OUT=       SRC=192.168.122.47  DST=192.168.122.1

Back:

OUTPUT          IN=       OUT=virbr0 SRC=192.168.122.1   DST=192.168.122.47

Traffic: Host -> Guest
----------------------

Out:

NAT-OUTPUT      IN=       OUT=virbr0 SRC=192.168.122.1   DST=192.168.122.47
OUTPUT          IN=       OUT=virbr0 SRC=192.168.122.1   DST=192.168.122.47
NAT-POSTROUTING IN=       OUT=virbr0 SRC=192.168.122.1   DST=192.168.122.47

Back:

INPUT           IN=virbr0 OUT=       SRC=192.168.122.47  DST=192.168.122.1

Scenario 2: Virtual network
===========================

  net.bridge.bridge-nf-call-iptables = 1

  Host:  eth0 -> Internet
         virbr0 -> MASQUERADE to eth0

  Guest: vif1.0 -> virbr0

Traffic: Guest -> Google
------------------------

Out:

NAT-PREROUTING  IN=virbr0 OUT=       PHYSIN=vif1.0 SRC=192.168.122.47 DST=64.233.167.99
FORWARD         IN=virbr0 OUT=eth0   PHYSIN=vif1.0 SRC=192.168.122.47 DST=64.233.167.99
NAT-POSTROUTING IN=       OUT=eth0   PHYSIN=vif1.0 SRC=192.168.122.47 DST=64.233.167.99

Back:

FORWARD         IN=eth0   OUT=virbr0               SRC=64.233.167.99  DST=192.168.122.47

Traffic: Guest -> Host
----------------------

Out:

NAT-PREROUTING  IN=virbr0 OUT=       PHYSIN=vif1.0 SRC=192.168.122.47 DST=192.168.122.1
INPUT           IN=virbr0 OUT=       PHYSIN=vif1.0 SRC=192.168.122.47 DST=192.168.122.1

Back:

OUTPUT          IN=       OUT=virbr0               SRC=192.168.122.1  DST=192.168.122.47

Traffic: Host -> Guest
----------------------

Out:

NAT-OUTPUT      IN=       OUT=virbr0               SRC=192.168.122.1  DST=192.168.122.47
OUTPUT          IN=       OUT=virbr0               SRC=192.168.122.1  DST=192.168.122.47
NAT-POSTROUTING IN=       OUT=virbr0               SRC=192.168.122.1  DST=192.168.122.47

Back:

INPUT           IN=virbr0 OUT=       PHYSIN=vif1.0 SRC=192.168.122.47 DST=192.168.122.1

Scenario 3: Shared physical device
==================================

  net.bridge.bridge-nf-call-iptables = 0

  Host:  peth1 -> Internet
         xenbr0 -> peth1

  Guest: vif2.0 -> xenbr0

Traffic: Guest -> Google
------------------------

Nada - nothing was logged

Traffic: Guest -> Host
----------------------

Out:

NAT-PREROUTING  IN=eth1 OUT=     SRC=192.168.254.120 DST=192.168.254.132
INPUT           IN=eth1 OUT=     SRC=192.168.254.120 DST=192.168.254.132

Back:

OUTPUT          IN=     OUT=eth1 SRC=192.168.254.132 DST=192.168.254.120

Traffic: Host -> Guest
----------------------

Out:

NAT-OUTPUT      IN=     OUT=eth1 SRC=192.168.254.132 DST=192.168.254.120
OUTPUT          IN=     OUT=eth1 SRC=192.168.254.132 DST=192.168.254.120
NAT-POSTROUTING IN=     OUT=eth1 SRC=192.168.254.132 DST=192.168.254.120

Back:

INPUT           IN=eth1 OUT=     SRC=192.168.254.120 DST=192.168.254.132

Scenario 4: Shared physical device
==================================

  net.bridge.bridge-nf-call-iptables = 1

  Host:  peth1 -> Internet
         xenbr0 -> peth1

  Guest: vif2.0 -> xenbr0

Traffic: Guest -> Google
------------------------

Out:

NAT-PREROUTING  IN=xenbr1 OUT=       PHYSIN=vif2.0                SRC=192.168.254.120 DST=64.233.167.99
FORWARD         IN=xenbr1 OUT=xenbr1 PHYSIN=vif2.0 PHYSOUT=peth1  SRC=192.168.254.120 DST=64.233.167.99
NAT-POSTROUTING IN=       OUT=xenbr1 PHYSIN=vif2.0 PHYSOUT=peth1  SRC=192.168.254.120 DST=64.233.167.99

Back:

FORWARD         IN=xenbr1 OUT=xenbr1 PHYSIN=peth1  PHYSOUT=vif2.0 SRC=64.233.167.99   DST=192.168.254.120

Traffic: Guest -> Host
----------------------

Out:

NAT-PREROUTING  IN=xenbr1 OUT=       PHYSIN=vif2.0                SRC=192.168.254.120 DST=192.168.254.132
FORWARD         IN=xenbr1 OUT=xenbr1 PHYSIN=vif2.0 PHYSOUT=vif0.1 SRC=192.168.254.120 DST=192.168.254.132
NAT-POSTROUTING IN=       OUT=xenbr1 PHYSIN=vif2.0 PHYSOUT=vif0.1 SRC=192.168.254.120 DST=192.168.254.132
INPUT           IN=eth1   OUT=                                    SRC=192.168.254.120 DST=192.168.254.132

Back:

OUTPUT          IN=       OUT=eth1                                SRC=192.168.254.132 DST=192.168.254.120
FORWARD         IN=xenbr1 OUT=xenbr1 PHYSIN=vif0.1 PHYSOUT=vif2.0 SRC=192.168.254.132 DST=192.168.254.120

Traffic: Host -> Guest
----------------------

Out:

NAT-OUTPUT      IN=       OUT=eth1                                SRC=192.168.254.132 DST=192.168.254.120
OUTPUT          IN=       OUT=eth1                                SRC=192.168.254.132 DST=192.168.254.120
NAT-POSTROUTING IN=       OUT=eth1                                SRC=192.168.254.132 DST=192.168.254.120
FORWARD         IN=xenbr1 OUT=xenbr1 PHYSIN=vif0.1 PHYSOUT=vif2.0 SRC=192.168.254.132 DST=192.168.254.120

Back:

FORWARD         IN=xenbr1 OUT=xenbr1 PHYSIN=vif2.0 PHYSOUT=vif0.1 SRC=192.168.254.120 DST=192.168.254.132
INPUT           IN=eth1   OUT=                                    SRC=192.168.254.120 DST=192.168.254.132
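
To compare traces like the ones above mechanically, a small parser is enough.
This is only a sketch: it assumes each line has the "CHAIN KEY=value ..." layout
used in this email (real kernel LOG lines carry a timestamp and more fields), and
the function names parse_trace_line and saw_bridge_ports are made up for
illustration.

```python
import re

# KEY=value pairs; the value may be empty, as in "OUT=".
FIELD_RE = re.compile(r"(\w+)=(\S*)")


def parse_trace_line(line):
    """Split a trace line into its chain label and a dict of KEY=value fields."""
    chain, _, rest = line.strip().partition(" ")
    return chain, dict(FIELD_RE.findall(rest))


def saw_bridge_ports(lines):
    """Return True if any line carries a PHYSIN or PHYSOUT field."""
    return any(
        "PHYSIN" in fields or "PHYSOUT" in fields
        for _, fields in map(parse_trace_line, lines)
    )


if __name__ == "__main__":
    sample = "FORWARD         IN=virbr0 OUT=eth0   PHYSIN=vif1.0 SRC=192.168.122.47 DST=64.233.167.99"
    print(parse_trace_line(sample))
```

In the traces above, PHYSIN/PHYSOUT only show up in the two scenarios with
net.bridge.bridge-nf-call-iptables = 1, so saw_bridge_ports gives a quick way
to tell the two settings apart from a captured log.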

Now in this email I'm really only concerned with the first 2 virtual network scenarios.
The shared physical device stuff can be ignored henceforth, basically because it 'just
works(tm)'.

For virtual networks there are basically 3 types of networking config we need to represent
in terms of iptables rules, and these need to work for scenarios 1 & 2 - ie regardless of
the magic sysctl knob.
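
For anyone wanting to check which of the two scenarios a host is in, here is a
minimal sketch that reads the knob from /proc. The function name
bridge_nf_call_iptables is hypothetical, and the None return for a missing file
(eg bridge support not loaded) is my own convention.

```python
from pathlib import Path

SYSCTL_PATH = "/proc/sys/net/bridge/bridge-nf-call-iptables"


def bridge_nf_call_iptables(path=SYSCTL_PATH):
    """Return True/False for the sysctl value, or None if the file is absent."""
    try:
        return Path(path).read_text().strip() == "1"
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    print(bridge_nf_call_iptables())
```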

Here is what we currently implement......

Type 1: Isolated virtual network
--------------------------------

  - We don't add anything here

Type 2: Forwarding to a specific NIC only
-----------------------------------------

Chain POSTROUTING (policy ACCEPT 345 packets, 32627 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 MASQUERADE  all  --  *      eth1    0.0.0.0/0            0.0.0.0/0           PHYSDEV match ! --physdev-is-bridged

Chain FORWARD (policy ACCEPT 29 packets, 2244 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     all  --  eth1   vnet0   0.0.0.0/0            0.0.0.0/0           state RELATED,ESTABLISHED
    0     0 ACCEPT     all  --  vnet0  eth1    0.0.0.0/0            0.0.0.0/0
    0     0 ACCEPT     all  --  *      eth1    0.0.0.0/0            0.0.0.0/0           PHYSDEV match --physdev-in vnet0

Chain INPUT (policy ACCEPT 80483 packets, 382M bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     udp  --  vnet0  *       0.0.0.0/0            0.0.0.0/0           udp dpt:53
    0     0 ACCEPT     tcp  --  vnet0  *       0.0.0.0/0            0.0.0.0/0           tcp dpt:53
    0     0 ACCEPT     udp  --  vnet0  *       0.0.0.0/0            0.0.0.0/0           udp dpt:67
    0     0 ACCEPT     tcp  --  vnet0  *       0.0.0.0/0            0.0.0.0/0           tcp dpt:67

Type 3: Forwarding to any active NIC
------------------------------------

Chain POSTROUTING (policy ACCEPT 360 packets, 33843 bytes)
 pkts bytes target     prot opt in     out     source               destination
    2   476 MASQUERADE  all  --  *      *       0.0.0.0/0            0.0.0.0/0           PHYSDEV match ! --physdev-is-bridged

Chain FORWARD (policy ACCEPT 29 packets, 2244 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     all  --  *      virbr0  0.0.0.0/0            0.0.0.0/0           state RELATED,ESTABLISHED
    0     0 ACCEPT     all  --  virbr0 *       0.0.0.0/0            0.0.0.0/0
    0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0           PHYSDEV match --physdev-in virbr0

Chain INPUT (policy ACCEPT 80884 packets, 382M bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     udp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0           udp dpt:53
    0     0 ACCEPT     tcp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:53
    0     0 ACCEPT     udp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0           udp dpt:67
    0     0 ACCEPT     tcp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:67

So how do these shape up, given the traversal scenarios & the overall desire to be
as restrictive as possible with traffic?

  Problem: The INPUT rules are missing altogether for the isolated virtual network
           so potentially DHCP/DNS will be blocked
 Solution: Add them - simple bug.

  Problem: The POSTROUTING rule is too generic so it matches pretty much any kind
           of traffic, from any virtual network, or even from VPN devices setup
           by VPNC.
 Solution: Only masquerade traffic whose source address is within the netmask
           associated with the virtual network in question

  Problem: The FORWARD rule is too generic, forwarding traffic to/from the
           virtual network regardless of whether the dest/src IP address
           is within the netmask associated with the virtual network. Assuming
           the POSTROUTING rule is fixed to only masquerade valid IP addresses
           from the virtual network, this rule would then allow guests to
           spoof their IP and have it forwarded off-host.
 Solution: Only forward packets whose IP address is within the netmask
           associated with the virtual network

  Problem: The policy of the FORWARD rule is ACCEPT, and/or later user defined
           rules may inadvertently match on traffic from the virtual network,
           again allowing through spoof traffic, or traffic from what should
           be an isolated virtual network
 Solution: There needs to be a catch-all REJECT rule associated with every
           bridge device, in both directions

  Problem: There is an extra physdev match per bridge device, and per guest
           device. This is basically unnecessary since the previous rule
           sets will already have allowed through the traffic. The physdev
           matches also only work if net.bridge.bridge-nf-call-iptables = 1
 Solution: Simply remove the per-device matches

  Problem: The POSTROUTING rule has a physdev match applied, which only works
           if net.bridge.bridge-nf-call-iptables = 1.
 Solution: Remove physdev match & masquerade based on network address associated
           with the virtual network
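
The "within the netmask" test that several of these solutions rely on is the
same check iptables does for a -s or -d match on a network address. A minimal
illustration in Python, using the standard ipaddress module; the function name
in_virtual_network is made up for this example.

```python
import ipaddress


def in_virtual_network(address, network):
    """Return True if address lies inside the virtual network's address range."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(network)


if __name__ == "__main__":
    # A guest using its assigned address vs. one using a spoofed address.
    print(in_virtual_network("192.168.122.47", "192.168.122.0/24"))
    print(in_virtual_network("10.0.0.5", "192.168.122.0/24"))
```

A packet from a guest with a spoofed source address fails this check, so it
matches neither the masquerade rule nor the forwarding ACCEPT rule.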

If we apply all the solutions outlined here, we'll end up with a set of rules which look
like this..........

Type 1: Isolated virtual network
--------------------------------

Chain POSTROUTING (policy ACCEPT 273 packets, 26341 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain FORWARD (policy ACCEPT 29 packets, 2244 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 REJECT     all  --  *      vnet2   0.0.0.0/0            0.0.0.0/0           reject-with icmp-port-unreachable
    0     0 REJECT     all  --  vnet2  *       0.0.0.0/0            0.0.0.0/0           reject-with icmp-port-unreachable

Chain INPUT (policy ACCEPT 76724 packets, 366M bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     udp  --  vnet2  *       0.0.0.0/0            0.0.0.0/0           udp dpt:53
    0     0 ACCEPT     tcp  --  vnet2  *       0.0.0.0/0            0.0.0.0/0           tcp dpt:53
    0     0 ACCEPT     udp  --  vnet2  *       0.0.0.0/0            0.0.0.0/0           udp dpt:67
    0     0 ACCEPT     tcp  --  vnet2  *       0.0.0.0/0            0.0.0.0/0           tcp dpt:67

Type 2: Forwarding to a specific NIC only
-----------------------------------------

Chain POSTROUTING (policy ACCEPT 273 packets, 26341 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 MASQUERADE  all  --  *      eth1    192.168.200.0/24     0.0.0.0/0

Chain FORWARD (policy ACCEPT 29 packets, 2244 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     all  --  eth1   vnet3   0.0.0.0/0            192.168.200.0/24    state RELATED,ESTABLISHED
    0     0 ACCEPT     all  --  vnet3  eth1    192.168.200.0/24     0.0.0.0/0
    0     0 REJECT     all  --  *      vnet3   0.0.0.0/0            0.0.0.0/0           reject-with icmp-port-unreachable
    0     0 REJECT     all  --  vnet3  *       0.0.0.0/0            0.0.0.0/0           reject-with icmp-port-unreachable

Chain INPUT (policy ACCEPT 76724 packets, 366M bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     udp  --  vnet3  *       0.0.0.0/0            0.0.0.0/0           udp dpt:53
    0     0 ACCEPT     tcp  --  vnet3  *       0.0.0.0/0            0.0.0.0/0           tcp dpt:53
    0     0 ACCEPT     udp  --  vnet3  *       0.0.0.0/0            0.0.0.0/0           udp dpt:67
    0     0 ACCEPT     tcp  --  vnet3  *       0.0.0.0/0            0.0.0.0/0           tcp dpt:67

Type 3: Forwarding to any active NIC
------------------------------------

Chain POSTROUTING (policy ACCEPT 273 packets, 26341 bytes)
 pkts bytes target     prot opt in     out     source               destination
   16  1292 MASQUERADE  all  --  *      *       192.168.122.0/24     0.0.0.0/0

Chain FORWARD (policy ACCEPT 29 packets, 2244 bytes)
 pkts bytes target     prot opt in     out     source               destination
   44 20200 ACCEPT     all  --  *      virbr0  0.0.0.0/0            192.168.122.0/24    state RELATED,ESTABLISHED
   56  3676 ACCEPT     all  --  virbr0 *       192.168.122.0/24     0.0.0.0/0
    0     0 REJECT     all  --  *      virbr0  0.0.0.0/0            0.0.0.0/0           reject-with icmp-port-unreachable
    0     0 REJECT     all  --  virbr0 *       0.0.0.0/0            0.0.0.0/0           reject-with icmp-port-unreachable

Chain INPUT (policy ACCEPT 76724 packets, 366M bytes)
 pkts bytes target     prot opt in     out     source               destination
   28  1728 ACCEPT     udp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0           udp dpt:53
    0     0 ACCEPT     tcp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:53
    5  1640 ACCEPT     udp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0           udp dpt:67
    0     0 ACCEPT     tcp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0           tcp dpt:67

So in summary:

  - Every single network type has the 4 INPUT rules for DHCP/DNS
  - Every single network type has catch all REJECT rules in FORWARD chain for
    both directions of traffic
  - A network forwarding to any device, has ACCEPT rules which allow through
    traffic associated with the virtual network IP range to/from any device
  - A network forwarding to a specific device, has ACCEPT rules which allow
    through traffic associated with the virtual network IP range to/from that 
    specific device.
  - A network forwarding to any device, has MASQUERADE rule to translate
    source address which matches the virtual network & destined for any device
  - A network forwarding to a specific device, has MASQUERADE rule to translate
    source address which matches the virtual network & destined for that
    specific device
  - There are no physdev matches needed.
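
The summary above can be sketched as a small rule generator. To be clear, this
is not the attached patch, just an illustration of the rule set it aims for:
the function virtual_network_rules and its parameters are hypothetical, and it
only builds iptables argument strings rather than applying them.

```python
def virtual_network_rules(bridge, network, forward=False, forward_dev=None):
    """Build iptables argument strings for one virtual network.

    forward=False gives an isolated network; forward=True with forward_dev=None
    forwards to any device; forward=True with forward_dev set forwards to that
    device only.
    """
    rules = []
    # DNS (53) and DHCP (67) from guests to the host, for every network type.
    for port in (53, 67):
        for proto in ("udp", "tcp"):
            rules.append(f"-A INPUT -i {bridge} -p {proto} --dport {port} -j ACCEPT")
    if forward:
        in_dev = f" -i {forward_dev}" if forward_dev else ""
        out_dev = f" -o {forward_dev}" if forward_dev else ""
        # Replies to the guests, limited to the network's address range.
        rules.append(
            f"-A FORWARD{in_dev} -o {bridge} -d {network} "
            "-m state --state RELATED,ESTABLISHED -j ACCEPT"
        )
        # Outbound traffic from the guests, again limited by source address.
        rules.append(f"-A FORWARD -i {bridge}{out_dev} -s {network} -j ACCEPT")
        rules.append(f"-t nat -A POSTROUTING{out_dev} -s {network} -j MASQUERADE")
    # Catch-all rejects in both directions, after any ACCEPT rules.
    rules.append(f"-A FORWARD -o {bridge} -j REJECT")
    rules.append(f"-A FORWARD -i {bridge} -j REJECT")
    return rules


if __name__ == "__main__":
    for rule in virtual_network_rules("virbr0", "192.168.122.0/24", forward=True):
        print("iptables " + rule)
```

The ordering matters: the ACCEPT rules have to come before the catch-all REJECT
rules for the same bridge, otherwise nothing gets forwarded at all.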

Hopefully at least one person has read this far through the email and still
understands what is going on....

I'm attaching a patch which implements all this.

BTW, there is also a bug in the vif-bridge script for Xen, which adds a
per-guest VIF physdev match rule. This needs to be removed too.

Regards,
Dan.
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-           Perl modules: http://search.cpan.org/~danberr/              -=|
|=-               Projects: http://freshmeat.net/~danielpb/               -=|
|=-  GnuPG: 7D3B9505   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=|

Daniel P. Berrange
