Hi,
Dan and I have been discussing how to "fix networking", not just Xen's
networking but also getting something sane wrt. QEMU/KVM etc.
Comments very welcome on the writeup below. The libvirt stuff is
towards the end, but I think all of it is probably useful to this list.
Cheers,
Mark.
Virtual Networking
The ability to manage virtual machines is something which is receiving a
lot of focus right now. Xen, KVM, QEMU and others provide the
infrastructure required to run a virtual machine, and each can provide
guests with a virtual network interface. This proposal addresses the
problem of how guests are networked together.
We aim:
* To make virtual networking "just work".
Guests should be able to communicate with each other, their host
and the Internet without any fuss or configuration. This should
be the case even with laptops and offline machines.
* To allow greater flexibility in how guests are networked.
It should be possible to isolate groups of guests in different
networks, allow guests on different physical machines to
communicate, firewall guests' networks from physical networks or
allow guests to appear just like physical machines on physical
networks.
* To make networking virtual machines analogous to networking
physical machines.
* To support inter-networking between virtualisation technologies.
User Visible Concepts
=====================
It's important to consider the manner in which we expose the
functionality of virtual networking. What concepts will we be exposing
through the UI? Are those concepts well defined and consistent? Are
those concepts more complex than necessary? Or are they too simple to
support the functionality we want?
Real world, or "physical", concepts[1]:
* Network - a number of interconnected machines.
* Network Interface - hardware which enables a machine to connect
to a network.
* Bridge - hardware which enables the interconnection of
machines to form a network. Bridges can also be connected to
other bridges to form a larger network.
* Router - hardware which connects two or more distinct networks,
allowing machines on different networks to communicate with one
another. Sometimes a router and a bridge are available as a
combined piece of hardware - the bridge forms a network and the
router connects that network to another distinct network.
* Firewall - software on a router which can be used to control how
machines on an "external" network (e.g. the Internet) can
communicate with machines on an "internal" network. For a given
type of connection, you can choose to disallow connections of
that type or forward them to a specific internal machine. A
firewall can also be used to control how internal machines can
communicate with external machines.
With virtual networking, we will be exposing the following "virtual"
concepts:
* Virtual Network - a number of interconnected virtual machines.
* Virtual Network Interface - a network interface in a virtual
machine.
* Virtual Bridge - allows the interconnection of virtual machines
to form a virtual network. A virtual bridge may be configured to
also act as a virtual router and firewall. A virtual bridge may
also be connected to another virtual bridge (perhaps on another
physical machine) to create a larger virtual network.
(Note, unprivileged users may create any of the above)
Finally, where the physical world meets the virtual world:
* Shared Physical Interface - if a physical interface is
configured to be "shared", then any number of virtual interfaces
may be connected to it allowing virtual machines to be connected
to the same physical network which the physical interface is
connected to.
Only privileged users may configure a physical interface to be
shared and/or connect guests to it.
There are a few problems with all of the above:
1. The distinction between a bridge and a router requires a lot of
technical knowledge to fully understand. However, the model of
e.g. a LinkSys router is familiar to a lot of people - a box
which allows you to network your machines together and connect
that network to (and firewall off) the Internet.
2. This "shared physical interface" notion is very "makey upey". We
could perhaps talk about the idea in terms of connecting a
physical interface to a virtual bridge, but that exposes the
bridge vs. router distinction more than we'd like.
3. Guests are connected to a specific physical interface, whereas
perhaps users wish guests to be connected to "the network" -
i.e. if NetworkManager switched from wireless to wired while
remaining on the same subnet, perhaps we'd like to automatically
switch the bridge to the new network. In reality, though,
bridged networking is only really sane for machines on a fairly
static network connection.
[1] - Yes, these definitions aren't entirely accurate, but they describe
the kind of understanding a moderately technical user might have of the
concepts.
Example Networks
================
Below are some example networks users may configure and an explanation
of how that network would be implemented in practice.
1. A privileged user creates two (Xen) guests, each with a Virtual
Network Interface. Without any special networking configuration,
these two guests are connected to a default Virtual Network
which contains a combined Virtual Bridge/Router/Firewall.
+-----------+ D +-----------+
| Guest | N D H | Guest |
| A | A N C | B |
| +---+ | T S P | +---+ |
| |NIC| | ^ ^ ^ | |NIC| |
+---+-+-+---+ +---+---+ +---+-+-+---+
^ | ^
| +--------+ +---+---+ +--------+ |
+-->+ vif1.0 +----+ vnbr0 +----+ vif2.0 +<--+
+--------+ +-------+ +--------+
Notes:
* "vnbr0" is a bridge device with its own IP address on
the same subnet as the guests.
* IP forwarding is enabled in Dom0. Masquerading and DNAT
are implemented using iptables.
* We run a DHCP server and a DNS proxy in Dom0 (e.g.
dnsmasq)
2. A privileged user does exactly the same thing as (1), but with
QEMU guests.
D
N D H
A N C
T S P
^ ^ ^
+---+---+
|
+---+---+
+-----------+ | vnbr0 | +-----------+
| Guest | +---+---+ | Guest |
| A | | | B |
| +---+ | +---+---+ | +---+ |
| |NIC| | | vtap0 | | |NIC| |
+---+-+-+---+ +---+---+ +---+-+-+---+
^ +-------+ | +-------+ ^
| | | +---+---+ | | |
+------>+ VLAN0 +-+ VDE +-+ VLAN0 +<------+
| | +-------+ | |
+-------+ +-------+
Notes:
* VDE is a userspace ethernet bridge implemented using
vde_switch
* "vtap0" is a TAP device created by vde_switch
* Everything else is the same as (1)
* This could be done without vde_switch by having Guest A
create vtap0 and have Guest B connect directly to Guest
A's VLAN. However, if Guest A is shut down, Guest B's
network would go down.
3. An unprivileged user does exactly the same thing as (2).
+-----------+ +-----------+
| Guest | +----+----+ | Guest |
| A | |userspace| | B |
| +---+ | | network | | +---+ |
| |NIC| | | stack | | |NIC| |
+---+-+-+---+ +----+----+ +---+-+-+---+
^ +-------+ | +-------+ ^
| | | +---+---+ | | |
+------>+ VLAN0 +-+ VDE +-+ VLAN0 +<------+
| | +-------+ | |
+-------+ +-------+
Notes:
* Similar to (2) except there can be no TAP device or
bridge
* The userspace network stack is implemented using
slirpvde to provide a DHCP server and DNS proxy to the
network, but also effectively a SNAT and DNAT router.
* slirpvde implements ethernet, ip, tcp, udp, icmp, dhcp,
tftp (etc.) in userspace. Completely crazy, but since
the kernel apparently has no secure way to allow
unprivileged users to leverage the kernel's network
stack for this, then it must be done in userspace.
4. Same as (2), except the user also creates two Xen guests.
+-----------+ D +-----------+
| Guest | N D H | Guest |
| A | A N C | B |
| +---+ | T S P | +---+ |
| |NIC| | ^ ^ ^ | |NIC| |
+---+-+-+---+ +---+---+ +---+-+-+---+
^ | ^
| +--------+ +---+---+ +--------+ |
+-->+ vif1.0 +----+ vnbr0 +----+ vif2.0 +<--+
+--------+ +---+---+ +--------+
|
+---+---+
| vtap0 |
+---+---+
|
+-------+ +--+--+ +-------+
+---->+ VLAN0 +----+ VDE +---+ VLAN0 +<-----+
| +-------+ +-----+ +-------+ |
V V
+---+-+-+---+ +---+-+-+---+
| |NIC| | | |NIC| |
| +---+ | | +---+ |
| Guest | | Guest |
| C | | D |
+-----------+ +-----------+
Notes:
* In this case we could do away with VDE and have each
QEMU guest use its own TAP device.
5. Same as (4) except Guests A and C are connected to a Shared
Physical Interface.
+-----------+ | D +-----------+
| Guest | ^ | N D H | Guest |
| A | | | A N C | B |
| +---+ | +---+---+ | T S P | +---+ |
| |NIC| | | eth0 | | ^ ^ ^ | |NIC| |
+---+-+-+---+ +---+---+ | +---+---+ +---+-+-+---+
^ | | | ^
| +--------+ +---+---+ | +---+---+ +--------+ |
+>+ vif1.0 +-+ ebr0 + | + vnbr0 +-+ vif2.0 +<-+
+--------+ +---+---+ | +---+---+ +--------+
| | |
+---+---+ | +---+---+
| vtap1 | | | vtap0 |
+---+---+ | +---+---+
| | |
+-------+ +--+--+ | +--+--+ +-------+
+->+ VLAN0 +--+ VDE + | + VDE +--+ VLAN0 +<-+
| +-------+ +-----+ | +-----+ +-------+ |
V | V
+---+-+-+---+ | +---+-+-+---+
| |NIC| | | | |NIC| |
| +---+ | | | +---+ |
| Guest | | | Guest |
| C | | | D |
+-----------+ | +-----------+
Notes:
* The idea here is that when the admin configures eth0 to
be shareable, eth0 is configured as an addressless NIC
enslaved to a bridge which has the MAC address and IP
address that eth0 should have
* Again, VDE is redundant here.
6. Same as (2) except the QEMU guests are on a Virtual Network on
another physical machine which is, in turn, connected to the
Virtual Network on the first physical machine.
+-----------+ D +-----------+
| Guest | N D H | Guest |
| A | A N C | B |
| +---+ | T S P | +---+ |
| |NIC| | ^ ^ ^ | |NIC| |
+---+-+-+---+ +---+---+ +---+-+-+---+
^ | ^
| +--------+ +---+---+ +--------+ |
+-->+ vif1.0 +----+ vnbr0 +----+ vif2.0 +<--+
+--------+ +---+---+ +--------+
|
+---+---+
| vtap0 |
+---+---+
|
+--+--+
| VDE |
+--+--+
|
First Physical Machine V
-------------------------------------------------------------
Second Physical Machine ^
|
+-------+ +--+--+ +-------+
+---->+ VLAN0 +----+ VDE +---+ VLAN0 +<-----+
| +-------+ +-----+ +-------+ |
V V
+---+-+-+---+ +---+-+-+---+
| |NIC| | | |NIC| |
| +---+ | | +---+ |
| Guest | | Guest |
| C | | D |
+-----------+ +-----------+
Notes:
* What's going on here is that the two VDEs are connected
over the network, either via a plain socket or perhaps
encapsulated in another protocol like SSH or TLS.
One interesting thing to note from all of those examples is that
although QEMU's networking options are very interesting, it doesn't
actually make sense for a network to be implemented inside a guest. The
network needs to be external to any guests, and so we use VDE to offer
similar networking options to the ones QEMU provides. All QEMU needs to
be able to do is to connect to VDE.
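As a rough illustration of the host-side plumbing described in the notes for example (1), here is a minimal Python sketch that only builds the command lines and does not run them. The function name, the use of iproute2's "ip" rather than brctl, and the 10.0.0.0/24 addressing are all assumptions made for the example, not part of the proposal.

```python
import ipaddress


def default_network_commands(bridge="vnbr0", gateway="10.0.0.1/24",
                             dhcp_start="10.0.0.128", dhcp_end="10.0.0.254"):
    """Return argv lists that would set up a default virtual network.

    Nothing is executed here; a privileged caller could hand each list
    to subprocess.run(). All names and addresses are illustrative.
    """
    iface = ipaddress.ip_interface(gateway)
    subnet = str(iface.network)  # e.g. "10.0.0.0/24"
    return [
        # Create the bridge and give it an address on the guests' subnet.
        ["ip", "link", "add", "name", bridge, "type", "bridge"],
        ["ip", "addr", "add", str(iface), "dev", bridge],
        ["ip", "link", "set", bridge, "up"],
        # Let the host forward packets between the guests and the outside.
        ["sysctl", "-w", "net.ipv4.ip_forward=1"],
        # Masquerade traffic that leaves the virtual network.
        ["iptables", "-t", "nat", "-A", "POSTROUTING",
         "-s", subnet, "!", "-d", subnet, "-j", "MASQUERADE"],
        # dnsmasq provides both the DHCP server and the DNS proxy.
        ["dnsmasq", "--interface=" + bridge, "--bind-interfaces",
         "--listen-address=" + str(iface.ip),
         "--dhcp-range=%s,%s" % (dhcp_start, dhcp_end)],
    ]


if __name__ == "__main__":
    for argv in default_network_commands():
        print(" ".join(argv))
```

Keeping the commands as data makes it easy to log or review exactly what a management daemon would do before it touches the host.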
User Interface
==============
This isn't meant as a UI specification, but just some notes on how this
stuff might be exposed in virt-manager.
* Networks List:
* Name
* Virtual/Physical
* Status
* Activity/traffic
* Virtual Network Configuration:
* Name
* List of connected guests
* Allow other Virtual Networks to connect to this
(defaults to no)
* Connect to other Virtual Network (defaults to none)
* DHCP enabled - DHCP configuration:
* IP range (optional)
* Router IP address (optional)
* Guest IP address/hostname assignment (optional)
* Forwarding enabled - firewall configuration:
* Incoming ports list and destination guest+port
for each (defaults to empty)
* Blocked outgoing ports list (defaults to empty)
* Virtual NICs list:
* Guest interface name
* Virtual Network/Shared Physical Interface
* Hostname (defaults to guest name)
* IP address (if assigned)
* MAC address (if assigned)
* Virtual NIC Configuration:
* Random MAC address, or user-supplied MAC address.
* Virtual Network or Shared Physical Interface to connect
to.
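For the "random MAC address, or user-supplied MAC address" choice above, a minimal Python sketch of both halves might look like this. The 52:54:00 prefix is the one conventionally used for QEMU guests; the function names and the decision to reject multicast addresses are assumptions for illustration.

```python
import random
import re

# Prefix conventionally used for QEMU guests (unicast, locally administered).
QEMU_PREFIX = (0x52, 0x54, 0x00)

_MAC_RE = re.compile(r"[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}")


def random_mac(prefix=QEMU_PREFIX, rng=random):
    """Return a MAC string with a fixed 3-byte prefix and 3 random bytes."""
    tail = [rng.randint(0x00, 0xFF) for _ in range(3)]
    return ":".join("%02x" % octet for octet in list(prefix) + tail)


def is_valid_mac(text):
    """Accept six colon-separated hex octets forming a unicast address."""
    if not _MAC_RE.fullmatch(text):
        return False
    # The lowest bit of the first octet marks a multicast address.
    return (int(text[:2], 16) & 0x01) == 0


if __name__ == "__main__":
    print(random_mac())
```

A UI could call random_mac() to pre-fill the field and is_valid_mac() to check whatever the user types over it.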
Implementation
==============
Parity with the current state of networking with Xen will be achieved
by:
* Implementing "shared physical interface" support in Fedora's
initscripts and network configuration tool. It boils down to
configuring the interface (e.g. eth0) something like:
ifcfg-peth0:
DEVICE=peth0
ONBOOT=yes
BRIDGE=eth0
HWADDR=00:30:48:30:73:19
ifcfg-eth0:
DEVICE=eth0
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=dhcp
* Fixing Xen so that netloop is no longer required. Upstream have
ideas about how to make Xen automatically copy any frames that
are destined for Dom0 so that the netback driver doesn't run out
of shared pages if Dom0 doesn't process the frames quickly
enough.
* Create new network/vif scripts for Xen which will connect guests
to a shared physical interface's bridge.
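To make the "shared physical interface" configuration a little more concrete, here is a small Python sketch of what a network configuration tool could write out. The function name is hypothetical, and the uppercase BRIDGE/TYPE keys follow the convention of Fedora's initscripts, which is an assumption about the final file format rather than something this proposal fixes.

```python
def render_shared_interface(dev="eth0", hwaddr="00:30:48:30:73:19"):
    """Return {filename: contents} for sharing a physical interface.

    The physical NIC is renamed to "p<dev>" and enslaved to a bridge
    that takes over the original device name and its DHCP configuration.
    """
    phys = "p" + dev  # e.g. peth0
    physical = "\n".join([
        "DEVICE=" + phys,
        "ONBOOT=yes",
        "BRIDGE=" + dev,      # enslave the NIC to the bridge
        "HWADDR=" + hwaddr,   # pin the file to the real hardware
    ]) + "\n"
    bridge = "\n".join([
        "DEVICE=" + dev,
        "TYPE=Bridge",
        "ONBOOT=yes",
        "BOOTPROTO=dhcp",     # the bridge, not the NIC, gets the address
    ]) + "\n"
    return {"ifcfg-" + phys: physical, "ifcfg-" + dev: bridge}


if __name__ == "__main__":
    for name, text in render_shared_interface().items():
        print(name + ":")
        print(text)
```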
Virtual Networks will be implemented in libvirt. First, there will be an
XML description of Virtual Networks e.g.:
<network id="0">
<name>Foo</name>
<uuid>596a5d2171f48fb2e068e2386a5c413e</uuid>
<listen address="172.31.0.5" port="1234" />
<connections>
<connection address="172.31.0.6" port="4321" />
</connections>
<dhcp enabled="true">
<ip address="10.0.0.1"
netmask="255.255.255.0"
start="10.0.0.128"
end="10.0.0.254" />
</dhcp>
<forwarding enabled="true">
<incoming default="deny">
<allow port="123" domain="foobar" destport="321" />
</incoming>
<outgoing default="allow">
<deny port="25" />
</outgoing>
</forwarding>
</network>
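To show how a description like this might be consumed, here is a minimal Python sketch that parses a well-formed copy of the example with the standard library. The parse_network() helper and the shape of the dictionary it returns are assumptions for illustration only; the XML format itself is still up for discussion.

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the example network description.
EXAMPLE = """\
<network id="0">
  <name>Foo</name>
  <uuid>596a5d2171f48fb2e068e2386a5c413e</uuid>
  <listen address="172.31.0.5" port="1234" />
  <connections>
    <connection address="172.31.0.6" port="4321" />
  </connections>
  <dhcp enabled="true">
    <ip address="10.0.0.1" netmask="255.255.255.0"
        start="10.0.0.128" end="10.0.0.254" />
  </dhcp>
  <forwarding enabled="true">
    <incoming default="deny">
      <allow port="123" domain="foobar" destport="321" />
    </incoming>
    <outgoing default="allow">
      <deny port="25" />
    </outgoing>
  </forwarding>
</network>
"""


def parse_network(xml_text):
    """Pull the interesting settings out of a network description."""
    root = ET.fromstring(xml_text)
    dhcp = root.find("dhcp")
    ip = dhcp.find("ip") if dhcp is not None else None
    return {
        "name": root.findtext("name"),
        "uuid": root.findtext("uuid"),
        "peers": [(c.get("address"), int(c.get("port")))
                  for c in root.findall("connections/connection")],
        "dhcp_enabled": dhcp is not None and dhcp.get("enabled") == "true",
        "dhcp_range": (ip.get("start"), ip.get("end")) if ip is not None else None,
        "incoming": [(int(a.get("port")), a.get("domain"), int(a.get("destport")))
                     for a in root.findall("forwarding/incoming/allow")],
        "outgoing_denied": [int(d.get("port"))
                            for d in root.findall("forwarding/outgoing/deny")],
    }


if __name__ == "__main__":
    print(parse_network(EXAMPLE))
```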
In a manner similar to libvirt's QEMU support, there will be a daemon to
manage Virtual Networks. The daemon will have access to a store of
network definitions. The daemon will be responsible for managing the
bridge devices, vde_switch, DHCP and DNS processes and the iptables rules
needed for SNAT/DNAT etc.
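As a sketch of the iptables side of that job, the following Python turns forwarding settings into rule command lines without running them. The function name, the choice of TCP as the protocol (the XML example doesn't specify one), the use of REJECT for blocked ports, and the guest address 10.0.0.130 are all assumptions for the example.

```python
def forwarding_rules(bridge, incoming, outgoing_denied, guest_addresses,
                     proto="tcp"):
    """Build iptables argv lists for a network's forwarding settings.

    incoming:        list of (port, guest name, destination port)
    outgoing_denied: list of ports guests may not connect out to
    guest_addresses: mapping of guest name -> IP address on the network
    """
    rules = []
    for port, domain, destport in incoming:
        guest_ip = guest_addresses[domain]
        # DNAT connections arriving from outside the virtual network.
        rules.append(["iptables", "-t", "nat", "-A", "PREROUTING",
                      "!", "-i", bridge, "-p", proto, "--dport", str(port),
                      "-j", "DNAT",
                      "--to-destination", "%s:%d" % (guest_ip, destport)])
    for port in outgoing_denied:
        # Refuse guest traffic to a blocked outgoing port.
        rules.append(["iptables", "-A", "FORWARD", "-i", bridge,
                      "-p", proto, "--dport", str(port), "-j", "REJECT"])
    return rules


if __name__ == "__main__":
    for argv in forwarding_rules("vnbr0", [(123, "foobar", 321)], [25],
                                 {"foobar": "10.0.0.130"}):
        print(" ".join(argv))
```

The daemon would also need to remove these rules again when a network is stopped, which this sketch leaves out.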
The virsh command line interface would look like:
$> virsh network-create foo.xml
$> virsh network-dumpxml Foo > foo.xml
$> virsh network-define foo.xml
$> virsh network-list
$> virsh network-start Foo
$> virsh network-stop Foo
$> virsh network-restart Foo
The libvirt API for virtual networks would be modelled on the API for
virtual machines:
/*
* Virtual Networks API
*/
/**
* virNetwork:
*
* a virNetwork is a private structure representing a virtual network.
*/
typedef struct _virNetwork virNetwork;
/**
* virNetworkPtr:
*
* a virNetworkPtr is a pointer to a virNetwork private structure, this is the
* type used to reference a virtual network in the API.
*/
typedef virNetwork *virNetworkPtr;
/**
* virNetworkCreateFlags:
*
* Flags OR'ed together to provide specific behaviour when creating a
* Network.
*/
typedef enum {
VIR_NETWORK_NONE = 0
} virNetworkCreateFlags;
/*
* List active networks
*/
int virConnectNumOfNetworks (virConnectPtr conn);
int virConnectListNetworks (virConnectPtr conn,
int *ids,
int maxids);
/*
* List inactive networks
*/
int virConnectNumOfDefinedNetworks (virConnectPtr conn);
int virConnectListDefinedNetworks (virConnectPtr conn,
const char **names,
int maxnames);
/*
* Lookup network by name, id or uuid
*/
virNetworkPtr virNetworkLookupByName (virConnectPtr conn,
const char *name);
virNetworkPtr virNetworkLookupByID (virConnectPtr conn,
int id);
virNetworkPtr virNetworkLookupByUUID (virConnectPtr conn,
const unsigned char *uuid);
virNetworkPtr virNetworkLookupByUUIDString (virConnectPtr conn,
const char *uuid);
/*
* Create active transient network
*/
virNetworkPtr virNetworkCreateXML (virConnectPtr conn,
const char *xmlDesc,
unsigned int flags);
/*
* Define inactive persistent network
*/
virNetworkPtr virNetworkDefineXML (virConnectPtr conn,
const char *xmlDesc);
/*
* Delete persistent network
*/
int virNetworkUndefine (virNetworkPtr network);
/*
* Activate persistent network
*/
int virNetworkCreate (virNetworkPtr network);
/*
* Network destroy/free
*/
int virNetworkDestroy (virNetworkPtr network);
int virNetworkFree (virNetworkPtr network);
/*
* Network information
*/
const char* virNetworkGetName (virNetworkPtr network);
unsigned int virNetworkGetID (virNetworkPtr network);
int virNetworkGetUUID (virNetworkPtr network,
unsigned char *uuid);
int virNetworkGetUUIDString (virNetworkPtr network,
char *buf);
char * virNetworkGetXMLDesc (virNetworkPtr network,
int flags);
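To make the transient vs. persistent semantics implied by these calls easier to discuss, here is a toy Python model of the bookkeeping. It is not libvirt code: the class and its methods are hypothetical, and the behaviour (a transient network vanishes once destroyed, a defined one goes back to the inactive list) is assumed by analogy with the domains API.

```python
class NetworkStore:
    """Toy model of active and defined virtual networks."""

    def __init__(self):
        self.defined = set()  # persistent definitions, running or not
        self.active = set()   # networks that are currently running

    def create_xml(self, name):   # cf. virNetworkCreateXML
        """Start a transient network with no stored definition."""
        self.active.add(name)

    def define_xml(self, name):   # cf. virNetworkDefineXML
        """Store a persistent definition without starting it."""
        self.defined.add(name)

    def create(self, name):       # cf. virNetworkCreate
        """Start a previously defined network."""
        if name not in self.defined:
            raise KeyError(name)
        self.active.add(name)

    def destroy(self, name):      # cf. virNetworkDestroy
        """Stop a network; a transient one is forgotten entirely."""
        self.active.discard(name)

    def undefine(self, name):     # cf. virNetworkUndefine
        """Drop a persistent definition."""
        self.defined.discard(name)

    def list_active(self):        # cf. virConnectListNetworks
        return sorted(self.active)

    def list_inactive(self):      # cf. virConnectListDefinedNetworks
        return sorted(self.defined - self.active)


if __name__ == "__main__":
    store = NetworkStore()
    store.define_xml("Foo")
    store.create("Foo")
    print(store.list_active(), store.list_inactive())
```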
Discussion points on the XML format and API:
* The XML format isn't thought out at all, but briefly:
* The <listen> and <connections> elements describe
networks connected across physical machine boundaries.
* The <dhcp> element describes the configuration of the
DHCP server on the network.
* The <forwarding> element describes how incoming and
outgoing connections are forwarded.
* Since virConnect is supposed to be a connection to a specific
hypervisor, does it make sense to create networks (which should
be hypervisor agnostic) through virConnect?
* Are we needlessly replicating any mistakes from the domains API
here? e.g. is the transient vs. persistent distinction useful
for networks?
* Is a UUID useful for networks? Yes, because it distinguishes
between networks of the same name on different hosts?
* Where is the connection between domains and networks in either
the API or the XML format? How is a domain associated with a
network? You put a bridge name in the <network> definition
and use that in the domain's <interface> definition? Or you put
the network name in the interface definition and have libvirt
look up the bridge name when creating the guest?
* Should it be possible to stop/start/restart a network? What for?
If something breaks the user restarts it to see if that will fix
it?
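On the question of how a domain's interface would find its network, the second option (name the network in the interface definition and let libvirt find the bridge) could be as simple as the following Python sketch. The function, the dictionary-based interface description and the "Foo" to "vnbr0" mapping are all hypothetical.

```python
def bridge_for_interface(interface, networks):
    """Return the bridge device a guest interface should attach to.

    interface: dict with either a "bridge" key (explicit bridge name)
               or a "network" key (virtual network name)
    networks:  mapping of virtual network name -> bridge device name
    """
    if "bridge" in interface:
        # An explicit bridge name is used as given.
        return interface["bridge"]
    name = interface["network"]
    if name not in networks:
        raise LookupError("no such virtual network: %s" % name)
    return networks[name]


if __name__ == "__main__":
    print(bridge_for_interface({"network": "Foo"}, {"Foo": "vnbr0"}))
```

With this approach the bridge name stays an implementation detail of the network, so it can change without editing every guest definition.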