On Mon, Jan 15, 2007 at 08:06:18PM +0000, Mark McLoughlin wrote:
Hi,
Dan and I have been discussing how to "fix networking", not just Xen's
networking but also getting something sane wrt. QEMU/KVM etc.
Comments very welcome on the writeup below. The libvirt stuff is
towards the end, but I think all of it is probably useful to this list.
Since we've disappeared down a rat-hole with the other part of the thread,
here's an attempt to get back on-topic :-)
1. A privileged user creates two (Xen) guests, each with a Virtual
Network Interface. Without any special networking configuration,
these two guests are connected to a default Virtual Network
which contains a combined Virtual Bridge/Router/Firewall.
+-----------+                   D           +-----------+
|   Guest   |           N   D   H           |   Guest   |
|     A     |           A   N   C           |     B     |
|   +---+   |           T   S   P           |   +---+   |
|   |NIC|   |           ^   ^   ^           |   |NIC|   |
+---+-+-+---+           +---+---+           +---+-+-+---+
      ^                     |                     ^
      |   +--------+    +---+---+    +--------+   |
      +-->+ vif1.0 +----+ vnbr0 +----+ vif2.0 +<--+
          +--------+    +-------+    +--------+
Notes:
* "vnbr0" is a bridge device with its own IP address on
the same subnet as the guests.
* IP forwarding is enabled in Dom0. Masquerading and DNAT
are implemented using iptables.
* We run a DHCP server and a DNS proxy in Dom0 (e.g.
dnsmasq)
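As a rough illustration of the notes above, the Dom0 setup for case (1) might boil down to a handful of commands. This is only a sketch: the 10.0.0.0/24 subnet and DHCP range are borrowed from the XML example later in this mail, the helper name default_network_commands is made up, and DNAT rules are left out. It builds the command lines rather than running them.

```python
import ipaddress


def default_network_commands(bridge="vnbr0",
                             gateway="10.0.0.1/24",
                             dhcp_start="10.0.0.128",
                             dhcp_end="10.0.0.254"):
    """Return command lines (argv lists) for a NATed virtual network.

    Nothing is executed here; a caller with root privileges in Dom0 could
    hand each list to subprocess.run(). All addresses are assumptions.
    """
    subnet = str(ipaddress.ip_interface(gateway).network)  # "10.0.0.0/24"
    return [
        # Bridge device with its own IP address on the guests' subnet.
        ["brctl", "addbr", bridge],
        ["ip", "addr", "add", gateway, "dev", bridge],
        ["ip", "link", "set", bridge, "up"],
        # IP forwarding in Dom0.
        ["sysctl", "-w", "net.ipv4.ip_forward=1"],
        # Masquerade traffic that leaves the virtual network.
        ["iptables", "-t", "nat", "-A", "POSTROUTING",
         "-s", subnet, "!", "-d", subnet, "-j", "MASQUERADE"],
        # DHCP server and DNS proxy bound to the bridge.
        ["dnsmasq", "--interface=" + bridge, "--bind-interfaces",
         "--dhcp-range=%s,%s" % (dhcp_start, dhcp_end)],
    ]
```

Keeping this as data makes it easy to log or dry-run the plan before anything touches the host.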
2. A privileged user does exactly the same thing as (1), but with
QEMU guests.
                                D
                        N   D   H
                        A   N   C
                        T   S   P
                        ^   ^   ^
                        +---+---+
                            |
                        +---+---+
+-----------+           | vnbr0 |           +-----------+
|   Guest   |           +---+---+           |   Guest   |
|     A     |               |               |     B     |
|   +---+   |           +---+---+           |   +---+   |
|   |NIC|   |           | vtap0 |           |   |NIC|   |
+---+-+-+---+           +---+---+           +---+-+-+---+
      ^       +-------+     |     +-------+       ^
      |       |       | +---+---+ |       |       |
      +------>+ VLAN0 +-+  VDE  +-+ VLAN0 +<------+
              |       | +-------+ |       |
              +-------+           +-------+
Notes:
* VDE is a userspace ethernet bridge implemented using
vde_switch
* "vtap0" is a TAP device created by vde_switch
* Everything else is the same as (1)
* This could be done without vde_switch by having Guest A
create vtap0 and have Guest B connect directly to Guest
A's VLAN. However, if Guest A is shut down, Guest B's
network would go down.
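For comparison, the vde_switch-free variant in the last note could look roughly like this with QEMU's legacy -net options. The port number 1234, the image paths and the helper name qemu_argv are placeholders. Guest A owns the TAP device and listens on a socket; Guest B connects to it, which is exactly why B loses its network when A goes away.

```python
def qemu_argv(image, role, port=1234, tap="vtap0"):
    """Build a QEMU command line for the 'Guest A hosts the VLAN' setup.

    role="host"   -> Guest A: creates the TAP device, listens on a socket
    role="client" -> Guest B: connects to Guest A's socket
    Uses QEMU's legacy -net syntax; port and tap name are assumptions.
    """
    argv = ["qemu", "-hda", image, "-net", "nic,vlan=0"]
    if role == "host":
        argv += ["-net", "tap,vlan=0,ifname=%s,script=no" % tap,
                 "-net", "socket,vlan=0,listen=:%d" % port]
    elif role == "client":
        argv += ["-net", "socket,vlan=0,connect=127.0.0.1:%d" % port]
    else:
        raise ValueError("role must be 'host' or 'client'")
    return argv
```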
Since the user is privileged, another way to do without VDE is to mirror
the Xen case almost exactly, creating one tap device per guest, instead
of Xen's netback vif devices:
+-----------+                   D           +-----------+
|   Guest   |           N   D   H           |   Guest   |
|     A     |           A   N   C           |     B     |
|   +---+   |           T   S   P           |   +---+   |
|   |NIC|   |           ^   ^   ^           |   |NIC|   |
+---+-+-+---+           +---+---+           +---+-+-+---+
      ^                     |                     ^
      |   +--------+    +---+---+    +--------+   |
      +-->+ vtap0  +----+ vnbr0 +----+ vtap1  +<--+
          +--------+    +-------+    +--------+
3. An unprivileged user does exactly the same thing as (2).
+-----------+                               +-----------+
|   Guest   |          +----+----+          |   Guest   |
|     A     |          |userspace|          |     B     |
|   +---+   |          | network |          |   +---+   |
|   |NIC|   |          |  stack  |          |   |NIC|   |
+---+-+-+---+          +----+----+          +---+-+-+---+
      ^       +-------+     |     +-------+       ^
      |       |       | +---+---+ |       |       |
      +------>+ VLAN0 +-+  VDE  +-+ VLAN0 +<------+
              |       | +-------+ |       |
              +-------+           +-------+
Notes:
* Similar to (2) except there can be no TAP device or
bridge
* The userspace network stack is implemented using
slirpvde to provide a DHCP server and DNS proxy to the
network, but also effectively a SNAT and DNAT router.
* slirpvde implements ethernet, ip, tcp, udp, icmp, dhcp,
tftp (etc.) in userspace. Completely crazy, but since
the kernel apparently has no secure way to allow
unprivileged users to leverage the kernel's network
stack for this, then it must be done in userspace.
Is it practical to just have some kind of privileged proxy that would
merely create & configure the tap devices on behalf of the unprivileged
guests ? If we just created tap devices for any unprivileged guest, but
kept them disconnected from any real network device, would that still be
a big hole ?
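To make the question a bit more concrete, here is a sketch of what such a proxy might do for each request. It is purely illustrative: the vtapN naming rule and the function name tap_request_command are invented, and it only builds the "ip tuntap" command rather than running it. The point is that the proxy creates a TAP device owned by the requesting user and deliberately does not attach it to any bridge.

```python
import re

# Hypothetical policy: unprivileged users may only ask for vtapN devices.
_TAP_NAME = re.compile(r"vtap[0-9]{1,4}")


def tap_request_command(uid, name):
    """Validate a request and return the command the proxy would run as root.

    The device is created owned by `uid` and is left unattached to any
    bridge, so it stays disconnected from the real network devices.
    """
    if not isinstance(uid, int) or uid <= 0:
        raise ValueError("refusing to create a device for uid %r" % (uid,))
    if not _TAP_NAME.fullmatch(name):
        raise ValueError("invalid TAP device name %r" % (name,))
    return ["ip", "tuntap", "add", "dev", name, "mode", "tap",
            "user", str(uid)]
```

Whether an unattached TAP device is still a hole is the open question; the sketch only shows that the privileged part can be kept very small.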
Or can we leverage QEMU's builtin SLIRP or other non-TAP networking modes
to construct something reasonable in userspace, without using VDE ?
4. Same as (2), except the user also creates two Xen guests.
+-----------+                   D           +-----------+
|   Guest   |           N   D   H           |   Guest   |
|     A     |           A   N   C           |     B     |
|   +---+   |           T   S   P           |   +---+   |
|   |NIC|   |           ^   ^   ^           |   |NIC|   |
+---+-+-+---+           +---+---+           +---+-+-+---+
      ^                     |                     ^
      |   +--------+    +---+---+    +--------+   |
      +-->+ vif1.0 +----+ vnbr0 +----+ vif2.0 +<--+
          +--------+    +---+---+    +--------+
                            |
                        +---+---+
                        | vtap0 |
                        +---+---+
                            |
            +-------+    +--+--+   +-------+
      +---->+ VLAN0 +----+ VDE +---+ VLAN0 +<-----+
      |     +-------+    +-----+   +-------+      |
      V                                           V
+---+-+-+---+                               +---+-+-+---+
|   |NIC|   |                               |   |NIC|   |
|   +---+   |                               |   +---+   |
|   Guest   |                               |   Guest   |
|     C     |                               |     D     |
+-----------+                               +-----------+
Notes:
* In this case we could do away with VDE and have each
QEMU guest use its own TAP device.
Yep, that would make sense if the guests were privileged - best to stay
close to kernel networking devices if at all possible.
5. Same as (3) except Guests A and C are connected to a Shared
Physical Interface.
+-----------+                 |          D       +-----------+
|   Guest   |          ^      |  N   D   H       |   Guest   |
|     A     |          |      |  A   N   C       |     B     |
|   +---+   |      +---+---+  |  T   S   P       |   +---+   |
|   |NIC|   |      | eth0  |  |  ^   ^   ^       |   |NIC|   |
+---+-+-+---+      +---+---+  |  +---+---+       +---+-+-+---+
      ^                |      |      |                 ^
      | +--------+ +---+---+  |  +---+---+ +--------+  |
      +>+ vif1.0 +-+ ebr0  +  |  + vnbr0 +-+ vif2.0 +<-+
        +--------+ +---+---+  |  +---+---+ +--------+
                       |      |      |
                   +---+---+  |  +---+---+
                   | vtap1 |  |  | vtap0 |
                   +---+---+  |  +---+---+
                       |      |      |
         +-------+  +--+--+   |   +--+--+  +-------+
      +->+ VLAN0 +--+ VDE +   |   + VDE +--+ VLAN0 +<--+
      |  +-------+  +-----+   |   +-----+  +-------+   |
      V                       |                        V
+---+-+-+---+                 |                  +---+-+-+---+
|   |NIC|   |                 |                  |   |NIC|   |
|   +---+   |                 |                  |   +---+   |
|   Guest   |                 |                  |   Guest   |
|     C     |                 |                  |     D     |
+-----------+                 |                  +-----------+
Notes:
* The idea here is that when the admin configures eth0 to
be shareable, eth0 is configured as an addressless NIC
enslaved to a bridge which has the MAC address and IP
address that eth0 should have
* Again, VDE is redundant here.
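The first note can be read as a short sequence of steps. A sketch follows, with a made-up example address and MAC (in practice both would be read from eth0 before it is touched) and an invented helper name; again it only builds command lines.

```python
def share_interface_commands(nic="eth0", bridge="ebr0",
                             address="192.168.1.10/24",
                             mac="00:11:22:33:44:55"):
    """Command lines that turn `nic` into an addressless port of `bridge`.

    `address` and `mac` stand for the values the NIC had before; the
    defaults are placeholders, not taken from any real host.
    """
    return [
        ["brctl", "addbr", bridge],
        # The bridge takes over the MAC address the NIC should have...
        ["ip", "link", "set", "dev", bridge, "address", mac],
        # ...the NIC loses its IP address and is enslaved to the bridge...
        ["ip", "addr", "flush", "dev", nic],
        ["brctl", "addif", bridge, nic],
        # ...and the bridge gets the IP address instead.
        ["ip", "addr", "add", address, "dev", bridge],
        ["ip", "link", "set", "dev", bridge, "up"],
    ]
```

A real implementation would also have to move routes (notably the default route) over to the bridge, since flushing the NIC's address drops them.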
This diagram just scares me, but I guess it's merely showing two isolated
networks with a different set of guests on each. It'd probably be much less
scary if it weren't ASCII art.
6. Same as (2) except the QEMU guests are on a Virtual Network on
another physical machine which is, in turn, connected to the
Virtual Network on the first physical machine
+-----------+                   D           +-----------+
|   Guest   |           N   D   H           |   Guest   |
|     A     |           A   N   C           |     B     |
|   +---+   |           T   S   P           |   +---+   |
|   |NIC|   |           ^   ^   ^           |   |NIC|   |
+---+-+-+---+           +---+---+           +---+-+-+---+
      ^                     |                     ^
      |   +--------+    +---+---+    +--------+   |
      +-->+ vif1.0 +----+ vnbr0 +----+ vif2.0 +<--+
          +--------+    +---+---+    +--------+
                            |
                        +---+---+
                        | vtap0 |
                        +---+---+
                            |
                         +--+--+
                         | VDE |
                         +--+--+
                            |
First Physical Machine      V
-------------------------------------------------------------
Second Physical Machine     ^
                            |
            +-------+    +--+--+   +-------+
      +---->+ VLAN0 +----+ VDE +---+ VLAN0 +<-----+
      |     +-------+    +-----+   +-------+      |
      V                                           V
+---+-+-+---+                               +---+-+-+---+
|   |NIC|   |                               |   |NIC|   |
|   +---+   |                               |   +---+   |
|   Guest   |                               |   Guest   |
|     C     |                               |     D     |
+-----------+                               +-----------+
Notes:
* What's going on here is that the two VDEs are connected
over the network, either via a plain socket or perhaps
encapsulated in another protocol like SSH or TLS
This is the case where I always thought VDE did get interesting - being
able to create pure userspace virtual networks across machines, without
any root privileges. Gives joe-user a nice lot of power.
Virtual Networks will be implemented in libvirt. First, there will be an
XML description of Virtual Networks e.g.:
<network id="0">
  <name>Foo</name>
  <uuid>596a5d2171f48fb2e068e2386a5c413e</uuid>
  <listen address="172.31.0.5" port="1234" />
  <connections>
    <connection address="172.31.0.6" port="4321" />
  </connections>
  <dhcp enabled="true">
    <ip address="10.0.0.1"
        netmask="255.255.255.0"
        start="10.0.0.128"
        end="10.0.0.254" />
  </dhcp>
  <forwarding enabled="true">
    <incoming default="deny">
      <allow port="123" domain="foobar" destport="321" />
    </incoming>
    <outgoing default="allow">
      <deny port="25" />
    </outgoing>
  </forwarding>
</network>
Got to also think how we connect guest domains to the virtual network.
Currently we just have something really simple like
<interface type="bridge">
<source bridge='xenbr0'/>
<mac address='00:11:22:33:44:55'/>
</interface>
I guess we'd probably want to refer to the UUID of the network to map
it into the guest.
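One possible shape for that, purely as a strawman: a new interface type whose source names the network by UUID. The type="network" value and the network attribute are invented here for illustration; nothing like this exists in the current format.

```python
import xml.etree.ElementTree as ET


def interface_xml(network_uuid, mac):
    """Build a guest <interface> element that points at a virtual network.

    The type="network" / <source network=...> spelling is hypothetical.
    """
    iface = ET.Element("interface", {"type": "network"})
    ET.SubElement(iface, "source", {"network": network_uuid})
    ET.SubElement(iface, "mac", {"address": mac})
    return ET.tostring(iface, encoding="unicode")


print(interface_xml("596a5d2171f48fb2e068e2386a5c413e", "00:11:22:33:44:55"))
```

Referring to the UUID rather than a bridge name would keep the guest config independent of whatever device names the host ends up using.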
Oh, do we need to define a 'network 0' to be the physical network of the host
machine ? What if there are multiple host NICs - any conventions we
need to let us distinguish ? Maybe it's best to just refer to the host
network by using IP addresses - so we can deal better with the case where
a machine switches from eth0 -> eth1 (wired to wireless) but keeps the
same IP address, or some such.
* The XML format isn't thought out at all, but briefly:
* The <listen> and <connections> elements describe
networks connected across physical machine boundaries.
* The <dhcp> element describes the configuration of the
DHCP server on the network.
* The <forwarding> element describes how incoming and
outgoing connections are forwarded.
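To check that the elements described above are enough to drive an implementation, here is a small sketch that reads a trimmed copy of the example network document and pulls out the DHCP range and forwarding rules. The function name and the shape of the returned dictionary are made up; the element and attribute names are the ones from the example.

```python
import xml.etree.ElementTree as ET

NETWORK_XML = """
<network id="0">
  <name>Foo</name>
  <dhcp enabled="true">
    <ip address="10.0.0.1" netmask="255.255.255.0"
        start="10.0.0.128" end="10.0.0.254" />
  </dhcp>
  <forwarding enabled="true">
    <incoming default="deny">
      <allow port="123" domain="foobar" destport="321" />
    </incoming>
    <outgoing default="allow">
      <deny port="25" />
    </outgoing>
  </forwarding>
</network>
"""


def summarize_network(xml_text):
    """Extract the DHCP range and forwarding rules from a network document."""
    net = ET.fromstring(xml_text)
    ip = net.find("dhcp/ip")
    summary = {
        "name": net.findtext("name"),
        "dhcp_range": (ip.get("start"), ip.get("end")),
        "rules": [],
    }
    for direction in ("incoming", "outgoing"):
        policy = net.find("forwarding/" + direction)
        # Each child element is an exception to the default policy.
        for rule in policy:
            summary["rules"].append(
                (direction, rule.tag, int(rule.get("port"))))
        # The default policy goes last, like a catch-all firewall rule.
        summary["rules"].append((direction, policy.get("default"), None))
    return summary
```

The ordered rule list maps fairly directly onto a chain of iptables rules ending in a default target, which is one argument for keeping the per-direction default attribute.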
Dan.
--
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|