
Hi,
	Dan and I have been discussing how to "fix networking" - not just Xen's networking, but also getting something sane wrt. QEMU/KVM etc. Comments very welcome on the writeup below. The libvirt stuff is towards the end, but I think all of it is probably useful to this list.

Cheers,
Mark.

Virtual Networking
==================

The ability to manage virtual machines is something which is receiving a lot of focus right now. Xen, KVM, QEMU and others provide the infrastructure required to run a virtual machine, and each can provide guests with a virtual network interface. This proposal addresses the problem of how guests are networked together. We aim:

  * To make virtual networking "just work". Guests should be able to communicate with each other, their host and the Internet without any fuss or configuration. This should be the case even with laptops and offline machines.

  * To allow greater flexibility in how guests are networked. It should be possible to isolate groups of guests in different networks, allow guests on different physical machines to communicate, firewall guests' networks off from physical networks, or allow guests to appear just like physical machines on physical networks.

  * To make networking virtual machines analogous to networking physical machines.

  * To support inter-networking between virtualisation technologies.

User Visible Concepts
=====================

It's important to consider the manner in which we expose the functionality of virtual networking. What concepts will we be exposing through the UI? Are those concepts well defined and consistent? Are they more complex than necessary? Or are they too simple to support the functionality we want?

Real world, or "physical", concepts[1]:

  * Network - a number of interconnected machines.

  * Network Interface - hardware which enables a machine to connect to a network.

  * Bridge - hardware which enables the interconnection of machines to form a network.
Bridges can also be connected to other bridges to form a larger network.

  * Router - hardware which connects two or more distinct networks, allowing machines on different networks to communicate with one another. Sometimes a router and a bridge are available as a combined piece of hardware - the bridge forms a network and the router connects that network to another distinct network.

  * Firewall - software on a router which can be used to control how machines on an "external" network (e.g. the Internet) can communicate with machines on an "internal" network. For a given type of connection, you can choose to disallow connections of that type or forward them to a specific internal machine. Can also be used to control how internal machines can communicate with external machines.

With virtual networking, we will be exposing the following "virtual" concepts:

  * Virtual Network - a number of interconnected virtual machines.

  * Virtual Network Interface - a network interface in a virtual machine.

  * Virtual Bridge - allows the interconnection of virtual machines to form a virtual network. A virtual bridge may be configured to also act as a virtual router and firewall. A virtual bridge may also be connected to another virtual bridge (perhaps on another physical machine) to create a larger virtual network.

(Note, unprivileged users may create any of the above)

Finally, where the physical world meets the virtual world:

  * Shared Physical Interface - if a physical interface is configured to be "shared", then any number of virtual interfaces may be connected to it, allowing virtual machines to be connected to the same physical network as the physical interface. Only privileged users may configure a physical interface to be shared and/or connect guests to it.

There are a few problems with all of the above:

  1. The distinction between a bridge and a router requires a lot of technical knowledge to fully understand. However, the model of e.g. a LinkSys router is familiar to a lot of people - a box which allows you to network your machines together and connect that network to (and firewall it off from) the Internet.

  2. This "shared physical interface" notion is very "makey upey". We could perhaps talk about the idea in terms of connecting a physical interface to a virtual bridge, but that exposes the bridge vs. router distinction more than we'd like.

  3. Guests are connected to a specific physical interface, whereas perhaps users wish guests to be connected to "the network" - i.e. if NetworkManager switched from wireless to wired while remaining on the same subnet, perhaps we'd like to automatically switch the bridge to the new network. In reality, though, bridged networking is only really sane for machines on a fairly static network connection.

[1] - Yes, these definitions aren't entirely accurate, but they describe the kind of understanding a moderately technical user might have of the concepts.

Example Networks
================

Below are some example networks users may configure and an explanation of how each network would be implemented in practice.

1. A privileged user creates two (Xen) guests, each with a Virtual Network Interface. Without any special networking configuration, these two guests are connected to a default Virtual Network which contains a combined Virtual Bridge/Router/Firewall.

   [ASCII diagram: Guests A and B each have a NIC connected, via vif1.0 and vif2.0, to the bridge "vnbr0" in Dom0, which provides NAT, DNS and DHCP to the virtual network]

   Notes:

     * "vnbr0" is a bridge device with its own IP address on the same subnet as the guests.

     * IP forwarding is enabled in Dom0. Masquerading and DNAT is implemented using iptables.

     * We run a DHCP server and a DNS proxy in Dom0 (e.g. dnsmasq)

2. A privileged user does exactly the same thing as (1), but with QEMU guests.
   [ASCII diagram: Guests A and B each have a QEMU VLAN connected to a VDE switch; the VDE switch connects to the bridge "vnbr0" via the TAP device "vtap0", and vnbr0 provides NAT, DNS and DHCP]

   Notes:

     * VDE is a userspace ethernet bridge implemented using vde_switch

     * "vtap0" is a TAP device created by vde_switch

     * Everything else is the same as (1)

     * This could be done without vde_switch by having Guest A create vtap0 and having Guest B connect directly to Guest A's VLAN. However, if Guest A is shut down, Guest B's network would go down.

3. An unprivileged user does exactly the same thing as (2).

   [ASCII diagram: Guests A and B each have a QEMU VLAN connected to a VDE switch; instead of a TAP device and bridge, a userspace network stack is connected to the VDE switch]

   Notes:

     * Similar to (2), except there can be no TAP device or bridge

     * The userspace network stack is implemented using slirpvde, which provides a DHCP server and DNS proxy to the network, but is also effectively a SNAT and DNAT router.

     * slirpvde implements ethernet, ip, tcp, udp, icmp, dhcp, tftp (etc.) in userspace. Completely crazy, but since the kernel apparently has no secure way to allow unprivileged users to leverage the kernel's network stack for this, it must be done in userspace.

4. Same as (2), except the user also creates two Xen guests.
   [ASCII diagram: Xen guests A and B are connected via vif1.0 and vif2.0 to the bridge "vnbr0"; QEMU guests C and D have VLANs connected to a VDE switch, which connects to vnbr0 via the TAP device "vtap0"]

   Notes:

     * In this case we could do away with VDE and have each QEMU guest use its own TAP device.

5. Same as (3) except Guests A and C are connected to a Shared Physical Interface.

   [ASCII diagram: Xen guest A (via vif1.0) and QEMU guest C (via VDE and the TAP device "vtap1") are connected to the bridge "ebr0", which contains the shared physical interface "eth0"; Xen guest B and QEMU guest D are connected to "vnbr0" with NAT, DNS and DHCP as in the earlier examples]

   Notes:

     * The idea here is that when the admin configures eth0 to be shareable, eth0 is configured as an addressless NIC enslaved to a bridge which has the MAC address and IP address that eth0 should have

     * Again, VDE is redundant here.

6. Same as (2), except the QEMU guests are on a Virtual Network on another physical machine which is, in turn, connected to the Virtual Network on the first physical machine.

   [ASCII diagram: on the first physical machine, Xen guests A and B are connected to "vnbr0", which also has a VDE switch attached via "vtap0"; that VDE switch is connected over the network to a second VDE switch on the second physical machine, to which QEMU guests C and D are connected via their VLANs]

   Notes:

     * What's going on here is that the two VDEs are connected over the network, either via a plain socket or perhaps encapsulated in another protocol like SSH or TLS

One interesting thing to note from all of these examples is that although QEMU's networking options are very interesting, it doesn't actually make sense for a network to be implemented inside a guest. The network needs to be external to any guests, and so we use VDE to offer networking options similar to the ones QEMU provides. All QEMU needs to be able to do is connect to VDE.

User Interface
==============

This isn't meant as a UI specification, just some notes on how this stuff might be exposed in virt-manager.
  * Networks List:
      * Name
      * Virtual/Physical
      * Status
      * Activity/traffic

  * Virtual Network Configuration:
      * Name
      * List of connected guests
      * Allow other Virtual Networks to connect to this (defaults to no)
      * Connect to other Virtual Network (defaults to none)
      * DHCP enabled - DHCP configuration:
          * IP range (optional)
          * Router IP address (optional)
          * Guest IP address/hostname assignment (optional)
      * Forwarding enabled - firewall configuration:
          * Incoming ports list and destination guest+port for each (defaults to empty)
          * Blocked outgoing ports list (defaults to empty)

  * Virtual NICs list:
      * Guest interface name
      * Virtual Network/Shared Physical Interface
      * Hostname (defaults to guest name)
      * IP address (if assigned)
      * MAC address (if assigned)

  * Virtual NIC Configuration:
      * Random MAC address, or user-supplied MAC address.
      * Virtual Network or Shared Physical Interface to connect to.

Implementation
==============

Parity with the current state of networking with Xen will be achieved by:

  * Implementing "shared physical interface" support in Fedora's initscripts and network configuration tool. It boils down to configuring the interface (e.g. eth0) something like:

      ifcfg-peth0:

        DEVICE=peth0
        ONBOOT=yes
        BRIDGE=eth0
        HWADDR=00:30:48:30:73:19

      ifcfg-eth0:

        DEVICE=eth0
        TYPE=Bridge
        ONBOOT=yes
        BOOTPROTO=dhcp

  * Fixing Xen so that netloop is no longer required. Upstream have ideas about how to make Xen automatically copy any frames destined for Dom0, so that the netback driver doesn't run out of shared pages if Dom0 doesn't process the frames quickly enough.

  * Creating new network/vif scripts for Xen which will connect guests to a shared physical interface's bridge.

Virtual Networks will be implemented in libvirt.
First, there will be an XML description of Virtual Networks, e.g.:

  <network id="0">
    <name>Foo</name>
    <uuid>596a5d2171f48fb2e068e2386a5c413e</uuid>
    <listen address="172.31.0.5" port="1234" />
    <connections>
      <connection address="172.31.0.6" port="4321" />
    </connections>
    <dhcp enabled="true">
      <ip address="10.0.0.1" netmask="255.255.255.0" start="10.0.0.128" end="10.0.0.254" />
    </dhcp>
    <forwarding enabled="true">
      <incoming default="deny">
        <allow port="123" domain="foobar" destport="321" />
      </incoming>
      <outgoing default="allow">
        <deny port="25" />
      </outgoing>
    </forwarding>
  </network>

In a manner similar to libvirt's QEMU support, there will be a daemon to manage Virtual Networks. The daemon will have access to a store of network definitions and will be responsible for managing the bridge devices, the vde_switch/dhcp/dns processes and the iptables rules needed for SNAT/DNAT etc.

The virsh command line interface would look like:

  $> virsh network-create foo.xml
  $> virsh network-dumpxml Foo > foo.xml
  $> virsh network-define foo.xml
  $> virsh network-list
  $> virsh network-start Foo
  $> virsh network-stop Foo
  $> virsh network-restart Foo

The libvirt API for virtual networks would be modelled on the API for virtual machines:

  /*
   * Virtual Networks API
   */

  /**
   * virNetwork:
   *
   * a virNetwork is a private structure representing a virtual network.
   */
  typedef struct _virNetwork virNetwork;

  /**
   * virNetworkPtr:
   *
   * a virNetworkPtr is a pointer to a virNetwork private structure; this is
   * the type used to reference a virtual network in the API.
   */
  typedef virNetwork *virNetworkPtr;

  /**
   * virNetworkCreateFlags:
   *
   * Flags OR'ed together to provide specific behaviour when creating a
   * Network.
   */
  typedef enum {
      VIR_NETWORK_NONE = 0
  } virNetworkCreateFlags;

  /*
   * List active networks
   */
  int            virConnectNumOfNetworks        (virConnectPtr conn);
  int            virConnectListNetworks         (virConnectPtr conn,
                                                 int *ids,
                                                 int maxids);

  /*
   * List inactive networks
   */
  int            virConnectNumOfDefinedNetworks (virConnectPtr conn);
  int            virConnectListDefinedNetworks  (virConnectPtr conn,
                                                 const char **names,
                                                 int maxnames);

  /*
   * Lookup network by name, id or uuid
   */
  virNetworkPtr  virNetworkLookupByName         (virConnectPtr conn,
                                                 const char *name);
  virNetworkPtr  virNetworkLookupByID           (virConnectPtr conn,
                                                 int id);
  virNetworkPtr  virNetworkLookupByUUID         (virConnectPtr conn,
                                                 const unsigned char *uuid);
  virNetworkPtr  virNetworkLookupByUUIDString   (virConnectPtr conn,
                                                 const char *uuid);

  /*
   * Create active transient network
   */
  virNetworkPtr  virNetworkCreateXML            (virConnectPtr conn,
                                                 const char *xmlDesc,
                                                 unsigned int flags);

  /*
   * Define inactive persistent network
   */
  virNetworkPtr  virNetworkDefineXML            (virConnectPtr conn,
                                                 const char *xmlDesc);

  /*
   * Delete persistent network
   */
  int            virNetworkUndefine             (virNetworkPtr network);

  /*
   * Activate persistent network
   */
  int            virNetworkCreate               (virNetworkPtr network);

  /*
   * Network destroy/free
   */
  int            virNetworkDestroy              (virNetworkPtr network);
  int            virNetworkFree                 (virNetworkPtr network);

  /*
   * Network information
   */
  const char *   virNetworkGetName              (virNetworkPtr network);
  unsigned int   virNetworkGetID                (virNetworkPtr network);
  int            virNetworkGetUUID              (virNetworkPtr network,
                                                 unsigned char *uuid);
  int            virNetworkGetUUIDString        (virNetworkPtr network,
                                                 char *buf);
  char *         virNetworkGetXMLDesc           (virNetworkPtr network,
                                                 int flags);

Discussion points on the XML format and API:

  * The XML format isn't thought out at all, but briefly:

      * The <listen> and <connections> elements describe networks connected across physical machine boundaries.

      * The <dhcp> element describes the configuration of the DHCP server on the network.
  * The <forwarding> element describes how incoming and outgoing connections are forwarded.

  * Since virConnect is supposed to be a connection to a specific hypervisor, does it make sense to create networks (which should be hypervisor agnostic) through virConnect?

  * Are we needlessly replicating any mistakes from the domains API here? e.g. is the transient vs. persistent distinction useful for networks?

  * Is a UUID useful for networks? Yes, because it distinguishes between networks of the same name on different hosts?

  * Where is the connection between domains and networks in either the API or the XML format? How is a domain associated with a network? Do you put a bridge name in the <network> definition and use that in the domain's <interface> definition? Or do you put the network name in the interface definition and have libvirt look up the bridge name when creating the guest?

  * Should it be possible to stop/start/restart a network? What for? If something breaks, the user restarts it to see if that will fix it?

Hi,
	One thing which is relevant to Dan's authentication stuff ...

On Mon, 2007-01-15 at 20:06 +0000, Mark McLoughlin wrote:
* Since virConnect is supposed to be a connection to a specific hypervisor, does it make sense to create networks (which should be hypervisor agnostic) through virConnect?
Personally, I think virConnect should be little more than a library context through which you access all hypervisors at once. In practical terms, the XML describing a domain is what chooses which hypervisor to connect to - e.g. all apps should pass NULL to virConnectOpen() and all drivers should handle NULL.

The one exception to that is for remote connections. In that case apps should pass a URI for a remote libvirt daemon which, in turn, would be equivalent to calling virConnectOpen(NULL) on the remote host.

So, remotely connecting directly to a hypervisor should be deprecated.

Cheers,
Mark.

On Mon, Jan 15, 2007 at 08:53:43PM +0000, Mark McLoughlin wrote:
Hi, One thing which is relevant to Dan's authentication stuff ...
On Mon, 2007-01-15 at 20:06 +0000, Mark McLoughlin wrote:
* Since virConnect is supposed to be a connection to a specific hypervisor, does it make sense to create networks (which should be hypervisor agnostic) through virConnect?
Personally, I think virConnect should be little more than a library context through which you access all hypervisors at once. In practical terms, the XML describing a domain is what chooses which hypervisor to connect to - e.g. all apps should pass NULL to virConnectOpen() and all drivers should handle NULL.
Having a single virConnectOpen which initializes all backends is not going to fly, because it'll create a huge namespace clash. eg, the names passed to virConnectLookupByName are only unique per-hypervisor connection - it's perfectly valid to have a Xen domain called 'foo' and a QEMU domain called 'foo' on the same machine. Similarly the integer IDs are scoped per hypervisor, and the UUIDs are unique only per-hypervisor, etc, etc. The entire API is modelled on the idea of one virConnectPtr object representing the context of a single hypervisor.

Dan.
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-           Perl modules: http://search.cpan.org/~danberr/              -=|
|=-               Projects: http://freshmeat.net/~danielpb/               -=|
|=-  GnuPG: 7D3B9505  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=|

On Mon, Jan 15, 2007 at 08:53:43PM +0000, Mark McLoughlin wrote:
Hi, One thing which is relevant to Dan's authentication stuff ...
On Mon, 2007-01-15 at 20:06 +0000, Mark McLoughlin wrote:
* Since virConnect is supposed to be a connection to a specific hypervisor, does it make sense to create networks (which should be hypervisor agnostic) through virConnect?
Personally, I think virConnect should be little more than a library context through which you access all hypervisors at once. In practical terms, the XML describing a domain is what chooses which hypervisor to connect to - e.g. all apps should pass NULL to virConnectOpen() and all drivers should handle NULL.
The one exception to that is for remote connections. In that case apps should pass a URI for a remote libvirt daemon which, in turn, would be equivalent to calling virConnectOpen(NULL) on the remote host.
So, remotely connecting directly to a hypervisor should be deprecated.
Having been kept awake last night thinking about the implications of this, I think your description above could actually work, with a fairly small modification. But first, some pretty pictures:

1. The simple (current) usage of libvirt to connect to a local hypervisor, showing two examples - first how the current Xen backend works, and second how my prototype QEMU backend works:

   http://people.redhat.com/berrange/libvirt/libvirt-arch-local.png

2. The way I was always anticipating remote use of libvirt to work. The app uses libvirt locally, which opens a connection to the remote machine using whatever remote management protocol is relevant for the hypervisor in question. eg, HTTP/XML-RPC for Xen, or the TLS-secured binary format for the prototype QEMU backend.

   http://people.redhat.com/berrange/libvirt/libvirt-arch-remote-1.png

3. The way I think you're suggesting - a libvirt server on every remote host which calls into the regular libvirt internal driver model to proxy remote calls. So even if the hypervisor in question provides a remote network management API, we will always use the local API and do *all* remote networking via the libvirt server:

   http://people.redhat.com/berrange/libvirt/libvirt-arch-remote-2.png

NB, the local case 1 is basically unchanged regardless of which of the two remote architectures we consider.

Option 3 has some interesting properties:

 - For QEMU & UML we essentially already have to write a 'libvirt server', since those two don't have any existing remote management service.

 - The same network transport & authentication system would be used across all hypervisor technologies we support, giving a consistent model.
 - Remote Xen access would be able to bypass XenD in the common case, just like we do for local Xen access.

On the flip-side:

 - We would be using a different remote management API for Xen compared to other apps which might talk Xen-API directly - if people had a mix of apps, some using libvirt & some native Xen-API, they'd have to manage remote access for two services.

So, going back to how this would work...

 - We'd supply URIs describing the hypervisor connection to open to the virConnectOpen() method as usual.

 - If the URI does not contain a hostname, then one (or more) of the regular libvirt drivers would be activated to open a local connection to the HV.

 - If the URI does contain a hostname, then the special 'remote' driver would be activated. This opens a connection to the remote libvirt server on that host, strips the hostname out of the URI, and sends this stripped URI to the libvirt server. This then opens the local hypervisor connection & does pass-through of all remote calls to this connection.

Dan.
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston.  +1 978 392 2496 -=|
|=-           Perl modules: http://search.cpan.org/~danberr/              -=|
|=-               Projects: http://freshmeat.net/~danielpb/               -=|
|=-  GnuPG: 7D3B9505  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505  -=|

Daniel P. Berrange wrote:
On Mon, Jan 15, 2007 at 08:53:43PM +0000, Mark McLoughlin wrote:
Hi, One thing which is relevant to Dan's authentication stuff ...
On Mon, 2007-01-15 at 20:06 +0000, Mark McLoughlin wrote:
* Since virConnect is supposed to be a connection to a specific hypervisor, does it make sense to create networks (which should be hypervisor agnostic) through virConnect?
Personally, I think virConnect should be little more than a library context through which you access all hypervisors at once. In practical terms, the XML describing a domain is what chooses which hypervisor to connect to - e.g. all apps should pass NULL to virConnectOpen() and all drivers should handle NULL.
The one exception to that is for remote connections. In that case apps should pass a URI for a remote libvirt daemon which, in turn, would be equivalent to calling virConnectOpen(NULL) on the remote host.
So, remotely connecting directly to a hypervisor should be deprecated.
Having been kept awake last night thinking about the implications of this, I think your description above could actually work, with a fairly small modification. But first, some pretty pictures:
1. The simple (current) usage of libvirt to connect to a local hypervisor, showing two examples - first how the current Xen backend works, and second how my prototype QEMU backend works:
http://people.redhat.com/berrange/libvirt/libvirt-arch-local.png
2. The way I was always anticipating remote use of libvirt to work. The app uses libvirt locally which opens a connection to the remote machine using whatever remote management protocol is relevant for the hypervisor in question. eg, HTTP/XML-RPC for Xen, or the TLS secured binary format for the prototype QEMU backend.
http://people.redhat.com/berrange/libvirt/libvirt-arch-remote-1.png
3. The way I think you're suggesting - a libvirt server on every remote host which calls into the regular libvirt internal driver model to proxy remote calls. So even if the hypervisor in question provides a remote network management API, we will always use the local API and do *all* remote networking via the libvirt server
http://people.redhat.com/berrange/libvirt/libvirt-arch-remote-2.png
NB, the local case 1, is basically unchanged regardless of which of the two remote architectures we consider.
Option 3 has some interesting properties:
- For QEMU & UML we essentially already have to write a 'libvirt server' since those two don't have any existing remote management service.
- The same network transport & authentication system would be used across all hypervisor technologies we support, giving a consistent model.
- Remote Xen access would be able to bypass XenD in the common case just like we do for the local Xen access
On the flip-side:
- We would be using a different remote management API for Xen compared to other apps which might talk Xen-API directly - if people had a mix of apps some using libvirt & some native Xen-API they'd have to manage remote access for two services.
So, going back to how this would work...
- We'd supply URIs describing the hypervisor connection to open to the virConnectOpen() method as usual
- If the URI does not contain a hostname, then one (or more) of the regular libvirt drivers would be activated to open a local connection to the HV.
- If the URI does contain a hostname, then the special 'remote' driver would be activated. This opens a connection to the remote libvirt server on that host, strips the hostname out of the URI, and sends this stripped URI to the libvirt server. This then opens the local hypervisor connection & does pass-through of all remote calls to this connection.
Dan.
This strikes me as *much* easier to manage, and the most consistent thus far with the idea that libvirt should remain as hypervisor-neutral as possible. --H

Daniel P. Berrange wrote:
http://people.redhat.com/berrange/libvirt/libvirt-arch-remote-2.png
Thought provoking. It makes me wonder - should there be (or is there) a generic way to remote C shared library calls? This sort of thing exists in other languages (eg. Java RMI). Rich. -- Red Hat UK Ltd. 64 Baker Street, London, W1U 7DF Mobile: +44 7866 314 421 (will change soon)

On Tue, Jan 16, 2007 at 04:26:38PM +0000, Richard W.M. Jones wrote:
Daniel P. Berrange wrote:
http://people.redhat.com/berrange/libvirt/libvirt-arch-remote-2.png
Thought provoking.
It makes me wonder - should there be (or is there) a generic way to remote C shared library calls? This sort of thing exists in other languages (eg. Java RMI).
That will be ad-hoc; most likely in this context it would be some kind of XML-RPC, since 1/ we already link to libxml2, 2/ we will need this at some point to use the new Xen API.

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard@redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/

On Tue, Jan 16, 2007 at 04:26:38PM +0000, Richard W.M. Jones wrote:
Daniel P. Berrange wrote:
http://people.redhat.com/berrange/libvirt/libvirt-arch-remote-2.png
Thought provoking.
It makes me wonder - should there be (or is there) a generic way to remote C shared library calls? This sort of thing exists in other languages (eg. Java RMI).
The trouble with C is that there's soo many to choose from :-) There's CORBA, XML-RPC, DBus, SunRPC to list but 4. The latter is interesting because you can auto-generate the C stubs for client & server...if only we can find a way to then layer it over SSL ? Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

Daniel P. Berrange wrote:
On Tue, Jan 16, 2007 at 04:26:38PM +0000, Richard W.M. Jones wrote:
Daniel P. Berrange wrote:
http://people.redhat.com/berrange/libvirt/libvirt-arch-remote-2.png
Thought provoking.
It makes me wonder - should there be (or is there) a generic way to remote C shared library calls? This sort of thing exists in other languages (eg. Java RMI).
The trouble with C is that there's soo many to choose from :-) There's CORBA, XML-RPC, DBus, SunRPC to list but 4. The latter is interesting because you can auto-generate the C stubs for client & server...if only we can find a way to then layer it over SSL ?
On the subject of SunRPC, the implementation in glibc in fact does allow you to replace the basic send/recv operations, so it would* be possible to write a back end which talked over (say) gnutls.

On the other hand, XML-RPC has mindshare, has a nice enough C library (xmlrpc-c: http://xmlrpc-c.sourceforge.net/) and supports SSL**.

Rich.

* Or so it seems from looking at the code, but I'd need to write a test client & server to be sure.

** Only checked this on the client side; haven't verified that the server side can support SSL yet.

-- 
Red Hat UK Ltd.
64 Baker Street, London, W1U 7DF
Mobile: +44 7866 314 421 (will change soon)


Hi Dan,
	So, what you describe is similar to what I was suggesting, but the difference from what I was suggesting means that it does nothing for the actual problem :-)

On Tue, 2007-01-16 at 15:57 +0000, Daniel P. Berrange wrote:
On Mon, Jan 15, 2007 at 08:53:43PM +0000, Mark McLoughlin wrote:
On Mon, 2007-01-15 at 20:06 +0000, Mark McLoughlin wrote:
* Since virConnect is supposed to be a connection to a specific hypervisor, does it make sense to create networks (which should be hypervisor agnostic) through virConnect?
Personally, I think virConnect should be little more than a library context through which you access all hypervisors at once. In practical terms, the XML describing a domain is what chooses which hypervisor to connect to - e.g. all apps should pass NULL to virConnectOpen() and all drivers should handle NULL.
The one exception to that is for remote connections. In that case apps should pass a URI for a remote libvirt daemon which, in turn, would be equivalent to calling virConnectOpen(NULL) on the remote host.
So, remotely connecting directly to a hypervisor should be deprecated.
Having been kept awake last night thinking about the implications of this, I think your description above could actually work, with a fairly small modification. But first, some pretty pictures:
1. The simple (current) case of using libvirt to connect to a local hypervisor, showing two examples - first how the current Xen backend works, and second how my prototype QEMU backend works:
http://people.redhat.com/berrange/libvirt/libvirt-arch-local.png
This is actually what I'd like to see change. Here's my train of thought:

- As a user running multiple types of guests, you want to just decide at creation time whether the guest should be e.g. Xen or QEMU. Apart from that, you don't really want to have to think about what type a guest is.

- That implies that users don't want to have different apps for each type of virt, nor different windows, nor different tabs, nor different lists of guests ... if the app doesn't aggregate the guests, then the user will mentally have to aggregate them.

- So, should each app do all the heavy lifting to aggregate virt types or should libvirt? I'd argue that while having a consistent API to access different virt types is useful, it's less useful if the app developer needs to access each hypervisor individually.

- You're rightly concerned about the namespace clash. It's a problem. I really do sympathise. However, should we just punt the problem to the app developers, or worse ... to the users?

- As an example, do you want a situation where someone creates a Xen guest named "Foo", a QEMU guest named "Foo" and when wanting to shutdown the QEMU guest does:

  $> virsh destroy Foo

  rather than:

  $> virsh --connect qemud:///system destroy Foo

  Oops :-)

- Namespace clash #1 is the guest name. I don't think libvirt should allow users to create multiple guests of the same name. It may be technically possible to do that, but if users aggregate the namespace anyway, then it will just cause them confusion if they do.

- Probably the only serious problem with that is that libvirt currently will manage Xen guests not created using libvirt. Does it make sense to do that? Will we allow the same with non-Xen?

- Namespace clash #2 is the ID. These IDs are assigned by libvirt (except for Xen) and should be opaque to the user, so we could split this namespace now. Start QEMU IDs at 1000? Or prefix the integer with "qemu:"?

- Namespace clash #3 is the UUID. This one's kind of funny - one would think we wouldn't need to worry about namespace clashes with "universally unique" IDs :-) We should definitely be trying to prevent the re-use of UUIDs.

- So ... virConnectOpen(NULL) should be the way of obtaining a context for managing all local guests. The argument to virConnectOpen() would only ever be used to specify a remote context.

- The choice between hypervisors is made once and only once, via the domain type in the XML format.

- Your "arch-local" diagram would have a single arrow going into libvirt and multiplexing out to all drivers.

- Or perhaps, libvirt would *always* talk to a daemon ... whether local or remote. That way you don't have the race condition where multiple apps can create a guest of the same name or uuid at once.

Cheers, Mark.
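A toy sketch of the aggregation idea above (all names here are invented for illustration; this is not the libvirt API): a single connection context multiplexes across per-hypervisor drivers and enforces one guest-name namespace across all of them:

```python
# Illustrative only: invented names, not the real libvirt API.

class Driver:
    """One per-hypervisor backend, e.g. Xen or QEMU."""
    def __init__(self, hv_type):
        self.hv_type = hv_type
        self.guests = {}                     # name -> guest record

    def create(self, name):
        self.guests[name] = {"name": name, "hv": self.hv_type}
        return self.guests[name]

class Connection:
    """A virConnectOpen(NULL)-style context that sees every driver at once."""
    def __init__(self, drivers):
        self.drivers = drivers

    def create_guest(self, name, hv_type):
        # Namespace clash #1: refuse duplicate names across *all* drivers,
        # not just within the driver being asked to create the guest.
        for drv in self.drivers:
            if name in drv.guests:
                raise ValueError("guest '%s' already exists on %s"
                                 % (name, drv.hv_type))
        drv = next(d for d in self.drivers if d.hv_type == hv_type)
        return drv.create(name)

    def lookup(self, name):
        # With one namespace, a bare name is unambiguous.
        for drv in self.drivers:
            if name in drv.guests:
                return drv.guests[name]
        return None
```

With this shape, a plain "virsh destroy Foo" needs no --connect URI, because "Foo" can only ever name one guest.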

On Tue, Jan 16, 2007 at 05:21:15PM +0000, Mark McLoughlin wrote:
On Tue, 2007-01-16 at 15:57 +0000, Daniel P. Berrange wrote:
On Mon, Jan 15, 2007 at 08:53:43PM +0000, Mark McLoughlin wrote:
On Mon, 2007-01-15 at 20:06 +0000, Mark McLoughlin wrote:
* Since virConnect is supposed to be a connection to a specific hypervisor, does it make sense to create networks (which should be hypervisor agnostic) through virConnect?
Personally, I think virConnect should be little more than a library context through which you access all hypervisors at once. In practical terms, the XML describing a domain is what chooses which hypervisor to connect to - e.g. all apps should pass NULL to virConnectOpen() and all drivers should handle NULL.
The one exception to that is for remote connections. In that case apps should pass a URI for a remote libvirt daemon which, in turn, would be equivalent to calling virConnectOpen(NULL) on the remote host.
So, remotely connecting directly to a hypervisor should be deprecated.
Having been kept awake last night thinking about the implications of this, I think your description above could actually work, with a fairly small modification. But first, some pretty pictures:
1. The simple (current) case of using libvirt to connect to a local hypervisor, showing two examples - first how the current Xen backend works, and second how my prototype QEMU backend works:
http://people.redhat.com/berrange/libvirt/libvirt-arch-local.png
This is actually what I'd like to see change.
Here's my train of thought:
- As a user running multiple types of guests, you want to just decide at creation time whether the guest should be e.g. Xen or QEMU. Apart from that, you don't really want to have to think about what type a guest is.
This reminds me of something I've not explicitly said elsewhere. While the libvirt API may support multiple different hypervisors, I'm rather expecting that the common case usage will be that any single host will only ever use one particular hypervisor. ie, a host will be providing Xen, or QEMU+KVM or VMWare or XXX - I think it's reasonable to expect that people won't run both Xen and QEMU+KVM on the same host. So, one does not necessarily have to expose the type of guest to the end user - one could say 'give me the hypervisor connection for this host' and it would auto-detect what hypervisor is available for that host. Of course some people might be perverse enough to run many HVs on a host, but I suspect that's rather the niche case. Probably the main case I'd see is that a host is running Xen as its primary HV. For some one-off task, the user may fire up an unprivileged QEMU session for a few hours.
- That implies that users don't want to have different apps for each type of virt, nor different windows, nor different tabs, nor different lists of guests ... if the app doesn't aggregate the guests, then the user will mentally have to aggregate them.
I think we'd probably end up grouping the guests based on the host they are running on. In the common case of only one HV per host, there would be no need for the user to worry about the different types of HV, unless they opted-in to accessing a non-default HV from the host.
- So, should each app do all the heavy lifting to aggregate virt types or should libvirt? I'd argue that while having a consistent API to access different virt types is useful, it's less useful if the app developer needs to access each hypervisor individually.
- You're rightly concerned about the namespace clash. It's a problem. I really do sympathise. However, should we just punt the problem to the app developers, or worse ... to the users?
Well, a combination of all? We have a hierarchy of namespaces currently, from narrowest scope to broadest:

- ID: unique to an active guest on a single HV
- Name: unique to a guest for its lifetime on a single host+HV
- UUID: unique to a guest for its lifetime, across a datacenter
- As an example, do you want a situation where someone creates a Xen guest named "Foo", a QEMU guest named "Foo" and when wanting to shutdown the QEMU guest does:
$> virsh destroy Foo
rather than:
$> virsh --connect qemud:///system destroy Foo
Oops :-)
Indeed - but not particularly different to the case of managing two remote hosts with virsh:

$ virsh --connect xen://web1/ destroy Foo

vs

$ virsh --connect xen://db1/ destroy Foo

And I don't think it's viable to enforce unique naming of guests across the entire data center. There's always a big risk when using command line tools like this - for graphical tools we can do better, because UI cues can help distinguish guests better - eg grouping by host.
- Namespace clash #1 is the guest name. I don't think libvirt should allow users to create multiple guests of the same name. It may be technically possible to do that, but if users aggregate the namespace anyway, then it will just cause them confusion if they do.
- Probably the only serious problem with that is that libvirt currently will manage Xen guests not created using libvirt. Does it make sense to do that? Will we allow the same with non-Xen?
In the ideal world I'd ignore anything not managed by libvirt, but in reality I don't think that's practical. We need to be able to interoperate as cleanly as possible with other tools, either provided by the HV itself (eg xm) or by other 3rd party vendors. While my prototype QEMU backend ignores VMs not created by libvirt, work going on upstream will make it practical to manage them too.
- Namespace clash #2 is the ID. These IDs are assigned by libvirt (except for Xen) and should be opaque to the user, so we could split this namespace now. Start QEMU IDs at 1000? Or prefix the integer with "qemu:"?
It has to be an integer because of existing ABI constraints. I'm not sure we really want to maintain an internal mapping of libvirt IDs to the actual HV's view of IDs. We already provide a globally unique ID - eg the UUID, so having a second one doesn't seem like much use.
- Namespace clash #3 is the UUID. This one's kind of funny - one would think we wouldn't need to worry about namespace clashes with "universally unique" IDs :-) We should definitely be trying to prevent the re-use of UUIDs.
Well, assuming we use a sane generation technique, statistically these are supposed to be globally unique. Of course users make mistakes, for example, when cloning VMs, so XenD will make a reasonable effort to check uniqueness. I'm of the opinion though that libvirt should avoid trying to implement policy itself, and rather delegate that policy to the underlying HV. eg If libvirt had a policy for VM names saying a-Z, 0-9, but XenD instead requires a-Z, 0-9, _, - then we can get into a crazy situation where the user is trying to manage an existing VM, but libvirt incorrectly rejects it.
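The uniqueness check being discussed can be sketched in miniature (purely illustrative; this is not XenD's actual code): generate a fresh UUID when none is supplied, and reject a definition that reuses one already in use on the host:

```python
import uuid

defined = {}   # uuid string -> guest name: the host's view of defined guests

def define_guest(name, guest_uuid=None):
    """Define a guest, generating a UUID if none is given, rejecting reuse."""
    if guest_uuid is None:
        # Version-4 UUIDs are random; collisions are statistically negligible.
        guest_uuid = str(uuid.uuid4())
    if guest_uuid in defined:
        raise ValueError("UUID %s already in use by '%s'"
                         % (guest_uuid, defined[guest_uuid]))
    defined[guest_uuid] = name
    return guest_uuid

# The cloning mistake: copying a guest's UUID along with its disk image.
# A clone must get a fresh UUID, not the original's.
original_uuid = define_guest("web1")
clone_uuid = define_guest("web1-clone")      # fresh uuid4, so no clash
```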
- Or perhaps, libvirt would *always* talk to a daemon ... whether local or remote. That way you don't have the race condition where multiple apps can create a guest of the same name or uuid at once.
Possibly :-) I think I'll draw another diagram... Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

On Tue, Jan 16, 2007 at 07:09:30PM +0000, Daniel P. Berrange wrote:
On Tue, Jan 16, 2007 at 05:21:15PM +0000, Mark McLoughlin wrote:
- Or perhaps, libvirt would *always* talk to a daemon ... whether local or remote. That way you don't have the race condition where multiple apps can create a guest of the same name or uuid at once.
Possibly :-) I think I'll draw another diagram...
One way is to move the entire driver model out of libvirt and into a daemon, so that libvirt itself is just a very thin layer which marshals API calls onto the wire. So whether local or remote the diagram looks the same: http://people.redhat.com/berrange/libvirt/libvirt-arch-local-1.png http://people.redhat.com/berrange/libvirt/libvirt-arch-remote-3.png Now you might say this will make the Xen stack inefficient, because there will be yet another daemon in the stack. This could certainly be true if the libvirt daemon only ever talked to XenD, but all our performance critical calls go straight to the HV. So when talking to a remote daemon I think libvirt -> libvirt daemon -> HV ought to be faster than libvirt -> XenD -> HV, simply by virtue of not involving python. It would also make it practical to run virt-manager as an unprivileged app even when managing the local Xen instance. So we could remove the need to su to root for the local instance. Regards, Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|
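A minimal sketch of the "thin library over a daemon" split (the wire format and function names here are invented, not libvirt's protocol): the library side marshals a call onto a socket, the daemon side dispatches it to a driver, and the same code path serves the local and remote diagrams alike:

```python
import json
import socket
import threading

def serve_one(sock, drivers):
    """Daemon side: receive one marshalled call, dispatch it, reply."""
    call = json.loads(sock.recv(4096).decode())
    result = drivers[call["method"]](*call["args"])
    sock.sendall(json.dumps({"result": result}).encode())

def remote_call(sock, method, *args):
    """Library side: marshal the API call onto the wire, wait for the reply."""
    sock.sendall(json.dumps({"method": method, "args": list(args)}).encode())
    return json.loads(sock.recv(4096).decode())["result"]

# A socketpair stands in for the unix domain socket to a local daemon.
client, server = socket.socketpair()
drivers = {"list_domains": lambda: ["Domain-0", "web1"]}

t = threading.Thread(target=serve_one, args=(server, drivers))
t.start()
domains = remote_call(client, "list_domains")
t.join()
```

The point of the shape is that nothing in remote_call knows or cares whether the socket leads to a local daemon or a remote one.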

Daniel P. Berrange wrote:
On Tue, Jan 16, 2007 at 07:09:30PM +0000, Daniel P. Berrange wrote:
On Tue, Jan 16, 2007 at 05:21:15PM +0000, Mark McLoughlin wrote:
- Or perhaps, libvirt would *always* talk to a daemon ... whether local or remote. That way you don't have the race condition where multiple apps can create a guest of the same name or uuid at once.
Possibly :-) I think I'll draw another diagram...
One way is to move the entire driver model out of libvirt and into a daemon, so that libvirt itself is just a very thin layer which marshals API calls onto the wire. So whether local or remote the diagram looks the same:
http://people.redhat.com/berrange/libvirt/libvirt-arch-local-1.png http://people.redhat.com/berrange/libvirt/libvirt-arch-remote-3.png
This adds an extra daemon in the simplest case (everything running on one machine), so it makes that case harder to manage than it needs to be. The extra daemon might be required to manage all VM instances or perhaps ensure serialisation of requests when there are multiple libvirts, but is that really a requirement? With upstream patches it should be possible for an independent libvirt to enumerate both Xen & QEMU instances. Rich. -- Red Hat UK Ltd. 64 Baker Street, London, W1U 7DF Mobile: +44 7866 314 421 (will change soon)

On Tue, Jan 16, 2007 at 07:35:37PM +0000, Daniel P. Berrange wrote:
On Tue, Jan 16, 2007 at 07:09:30PM +0000, Daniel P. Berrange wrote:
On Tue, Jan 16, 2007 at 05:21:15PM +0000, Mark McLoughlin wrote:
- Or perhaps, libvirt would *always* talk to a daemon ... whether local or remote. That way you don't have the race condition where multiple apps can create a guest of the same name or uuid at once.
Possibly :-) I think I'll draw another diagram...
One way is to move the entire driver model out of libvirt and into a daemon, so that libvirt itself is just a very thin layer which marshals API calls onto the wire. So whether local or remote the diagram looks the same:
http://people.redhat.com/berrange/libvirt/libvirt-arch-local-1.png http://people.redhat.com/berrange/libvirt/libvirt-arch-remote-3.png
Now you might say this will make the Xen stack inefficient, because there will be yet another daemon in the stack. This could certainly be true if the libvirt daemon only ever talked to XenD, but all our performance critical calls go straight to the HV. So when talking to a remote daemon I think libvirt -> libvirt daemon -> HV ought to be faster than libvirt -> XenD -> HV, simply by virtue of not involving python. It would also make it practical to run virt-manager as an unprivileged app even when managing the local Xen instance. So we could remove the need to su to root for the local instance.
Hum, honestly I would *really* prefer to avoid systematically going through an RPC. No, I don't like this idea, I prefer to keep the driver in libvirt, linked in the user's space. Things which were dirt cheap become way more expensive when they don't need to; this is a severe regression from a library user standpoint. Daniel -- Red Hat Virtualization group http://redhat.com/virtualization/ Daniel Veillard | virtualization library http://libvirt.org/ veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

On Wed, Jan 17, 2007 at 06:50:47AM -0500, Daniel Veillard wrote:
On Tue, Jan 16, 2007 at 07:35:37PM +0000, Daniel P. Berrange wrote:
On Tue, Jan 16, 2007 at 07:09:30PM +0000, Daniel P. Berrange wrote:
On Tue, Jan 16, 2007 at 05:21:15PM +0000, Mark McLoughlin wrote:
- Or perhaps, libvirt would *always* talk to a daemon ... whether local or remote. That way you don't have the race condition where multiple apps can create a guest of the same name or uuid at once.
Possibly :-) I think I'll draw another diagram...
One way is to move the entire driver model out of libvirt and into a daemon, so that libvirt itself is just a very thin layer which marshals API calls onto the wire. So whether local or remote the diagram looks the same:
http://people.redhat.com/berrange/libvirt/libvirt-arch-local-1.png http://people.redhat.com/berrange/libvirt/libvirt-arch-remote-3.png
Now you might say this will make the Xen stack inefficient, because there will be yet another daemon in the stack. This could certainly be true if the libvirt daemon only ever talked to XenD, but all our performance critical calls go straight to the HV. So when talking to a remote daemon I think libvirt -> libvirt daemon -> HV ought to be faster than libvirt -> XenD -> HV, simply by virtue of not involving python. It would also make it practical to run virt-manager as an unprivileged app even when managing
Are you talking about a multi-user OS? :-) It's practical for desktop workstations only.
the local Xen instance. So we could remove the need to su to root for the local instance.
Hum, honestly I would *really* prefer to avoid systematically going through an RPC. No, I don't like this idea, I prefer to keep the driver in libvirt, linked in the user's space. Things which were dirt cheap become way more expensive when they don't need to; this is a severe regression from a library user standpoint.
I'm not sure if the idea is completely wrong. I think a possible advantage is that libvirt will be a pretty simple library and almost all development (on the drivers) will happen in libvirtd. Karel -- Karel Zak <kzak@redhat.com>

On Wed, Jan 17, 2007 at 01:39:13PM +0100, Karel Zak wrote:
On Wed, Jan 17, 2007 at 06:50:47AM -0500, Daniel Veillard wrote:
Hum, honestly I would *really* prefer to avoid systematically going through an RPC. No, I don't like this idea, I prefer to keep the driver in libvirt, linked in the user's space. Things which were dirt cheap become way more expensive when they don't need to; this is a severe regression from a library user standpoint.
I'm not sure if the idea is completely wrong. I think a possible advantage is that libvirt will be a pretty simple library and almost all development (on the drivers) will happen in libvirtd.
An advantage maybe for the developer, but a definite regression for the user, and sorry, the user has priority IMHO. Daniel -- Red Hat Virtualization group http://redhat.com/virtualization/ Daniel Veillard | virtualization library http://libvirt.org/ veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

On Wed, 2007-01-17 at 06:50 -0500, Daniel Veillard wrote:
Things which were dirt cheap become way more expensive when they don't need to; this is a severe regression from a library user standpoint.
Just a small point on this ... Are you sure that's optimising for the right thing? What libvirt API is so performance sensitive that a roundtrip on a unix domain socket would be a problem? For example, even iterating the list of domain names is going to have a negligible cost compared with loading libvirt from disk :-) However, the number of roundtrips to a management daemon *will* be an issue where the daemon is remote. And we're going to have that whether or not we use a daemon in the local case. i.e. even *if* daemon roundtrips turn out to be an issue for local apps, we're going to have to fix that for the remote case anyway. Cheers, Mark.
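Mark's point is easy to check in miniature. This sketch (illustrative, not a libvirt benchmark) times a lock-step request/reply loop over a local socket pair; the per-roundtrip cost it reports is typically on the order of microseconds:

```python
import socket
import threading
import time

def echo_server(sock, n):
    """Bounce n small messages straight back, like a no-op RPC."""
    for _ in range(n):
        sock.sendall(sock.recv(64))

N = 1000
client, server = socket.socketpair()
t = threading.Thread(target=echo_server, args=(server, N))
t.start()

start = time.time()
for _ in range(N):
    client.sendall(b"list_domains")
    reply = client.recv(64)      # small message, arrives in one recv
elapsed = time.time() - start
t.join()

per_call = elapsed / N           # cost of one local socket roundtrip
```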

Mark McLoughlin wrote:
On Wed, 2007-01-17 at 06:50 -0500, Daniel Veillard wrote:
Things which were dirt cheap become way more expensive when they don't need to; this is a severe regression from a library user standpoint.
Just a small point on this ...
Are you sure that's optimising for the right thing? What libvirt API is so performance sensitive that a roundtrip on a unix domain socket would be a problem?
For example, even iterating the list of domain names is going to have a negligible cost compared with loading libvirt from disk :-)
However, the number of roundtrips to a management daemon *will* be an issue where the daemon is remote. And we're going to have that whether we use a daemon in the local case.
i.e. even *if* daemon roundtrips turn out to be an issue for local apps, we're going to have to fix that for the remote case anyway.
I am just speculating here, but it seems to me that the remote case is going to be the more common one for most users who also care about performance. So agreeing with Mark, that is where we really need to be focusing our attention... --Hugh

On Wed, Jan 17, 2007 at 04:37:56PM +0000, Mark McLoughlin wrote:
On Wed, 2007-01-17 at 06:50 -0500, Daniel Veillard wrote:
Things which were dirt cheap become way more expensive when they don't need to; this is a severe regression from a library user standpoint.
Just a small point on this ...
Are you sure that's optimising for the right thing? What libvirt API is so performance sensitive that a roundtrip on a unix domain socket would be a problem?
We have 3 cases currently:

- Local privileged user - we use hypercalls for performance critical ops
- Local unprivileged user readonly - we use libvirt_proxy over unix socket
- Local unprivileged user readwrite - talk insecurely to XenD

Now we know from bitter experience that talking to XenD is incredibly slow / high overhead. IIRC something like 50x slower. What I've never benchmarked is how well the libvirt_proxy performs. So currently the only time we can use the very fast pure hypercall path is when the app in question is running as root on the local machine. I'd really like to stop running virt-manager as root, to be honest. If we can get a local daemon providing full read-write operation, without the horrific overhead of XenD, then really the direct root+hypercall path is fairly uninteresting.
For example, even iterating the list of domain names is going to have a negligible cost compared with loading libvirt from disk :-)
Well, loading libvirt from disk is irrelevant because it'll be cached after the very first load.
However, the number of roundtrips to a management daemon *will* be an issue where the daemon is remote. And we're going to have that whether we use a daemon in the local case.
Yes, for remote management we will always have overhead. The key is whether using the same RPC daemon for local management will also have unacceptable overhead.
i.e. even *if* daemon roundtrips turn out to be an issue for local apps, we're going to have to fix that for the remote case anyway.
I'm going to run some tests/benchmarks to see what kind of performance difference there is between root+direct hypercalls, and talking via the libvirt_proxy, and via XenD. This should give us a good basis for deciding whether the root+direct hypercall case has a compelling enough performance advantage to worry about. Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

On Wed, Jan 17, 2007 at 04:53:38PM +0000, Daniel P. Berrange wrote:
On Wed, Jan 17, 2007 at 04:37:56PM +0000, Mark McLoughlin wrote:
On Wed, 2007-01-17 at 06:50 -0500, Daniel Veillard wrote:
Things which were dirt cheap become way more expensive when they don't need to; this is a severe regression from a library user standpoint.
Just a small point on this ...
Are you sure that's optimising for the right thing? What libvirt API is so performance sensitive that a roundtrip on a unix domain socket would be a problem?
We have 3 cases currently:
- Local privileged user - we use hypercalls for performance critical ops
- Local unprivileged user readonly - we use libvirt_proxy over unix socket
- Local unprivileged user readwrite - talk insecurely to XenD
Now we know from bitter experience that talking to XenD is incredibly slow / high overhead. IIRC something like 50x slower. What I've never benchmarked is how well the libvirt_proxy performs.
So currently the only time we can use the very fast pure hypercall path is when the app in question is running as root on local machine. I'd really like to stop running virt-manager as root to be honest. If we can get a local daemon providing full read-write operation, without the horrific overhead of XenD, then really the direct root+hypercall path is fairly uninteresting.
Okay, that's true, we lack data for the most critical paths, *but* I expect people will build load-acquisition daemons using libvirt, and if we make this way slower with no way to get back to the previous behaviour, they will be disappointed.
For example, even iterating the list of domain names is going to have a negligible cost compared with loading libvirt from disk :-)
Well, loading libvirt from disk is irrelevant because it'll be cached after the very first load.
However, the number of roundtrips to a management daemon *will* be an issue where the daemon is remote. And we're going to have that whether we use a daemon in the local case.
Yes, for remote management we will always have overhead. The key is whether using the same RPC daemon for local management will also have unacceptable overhead.
Agreed, that's a good point - the local daemon will have to keep the list, which it can do cheaply. But I'm not sure that what we have in the current driver API is really the right set of operations for the RPCs.
i.e. even *if* daemon roundtrips turn out to be an issue for local apps, we're going to have to fix that for the remote case anyway.
I'm going to run some tests/benchmarks to see what kind of performance difference there is between root+direct hypercalls, and talking via the libvirt_proxy, and via XenD. This should give us a good basis for deciding whether the root+direct hypercall case has a compelling enough performance advantage to worry about.
Timing data will definitely help, but I hope we will also get feedback from other people who have built on top of libvirt. Daniel -- Red Hat Virtualization group http://redhat.com/virtualization/ Daniel Veillard | virtualization library http://libvirt.org/ veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/ http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

On Tue, 2007-01-16 at 19:09 +0000, Daniel P. Berrange wrote:
This reminds me of something I've not explicitly said elsewhere. While the libvirt API may support multiple different hypervisors, I'm rather expecting that the common case usage will be that any single host will only ever use one particular hypervisor. ie, a host will be providing Xen, or QEMU+KVM or VMWare or XXX - I think it's reasonable to expect that people won't run both Xen and QEMU+KVM on the same host.
So, one does not necessarily have to expose the type of guest to the end user - one could say 'give me the hypervisor connection for this host' and it would auto-detect what hypervisor is available for that host.
e.g. a user might have the following options:

1) Run virt-manager, enter the root password, manage Xen guests
2) Run virt-manager, choose "run unprivileged", manage QEMU guests
3) Run virt-manager on the command line as root with --qemu

If that *is* the only way we'd expose QEMU support, then libvirt's API is just fine in this respect. I had assumed we'd have "run under Xen" and "run under QEMU" in the "create new guest" dialog. But that's a stupid question to ask someone ... one should not need to care what type of VM it is. So, hurrah! virt-manager will be sane in this respect and only the not-so-sane app authors will get pissed with libvirt about this :-) Cheers, Mark.

On Wed, Jan 17, 2007 at 05:20:17PM +0000, Mark McLoughlin wrote:
On Tue, 2007-01-16 at 19:09 +0000, Daniel P. Berrange wrote:
This reminds me of something I've not explicitly said elsewhere. While the libvirt API may support multiple different hypervisors, I'm rather expecting that the common case usage will be that any single host will only ever use one particular hypervisor. ie, a host will be providing Xen, or QEMU+KVM or VMWare or XXX - I think it's reasonable to expect that people won't run both Xen and QEMU+KVM on the same host.
So, one does not necessarily have to expose the type of guest to the end user - one could say 'give me the hypervisor connection for this host' and it would auto-detect what hypervisor is available for that host.
e.g. a user might have the following options:
1) Run virt-manager, enter the root password, manage Xen guests
BTW this option is evil and I intend to kill it off as soon as possible. We only have it currently because there is no secure way to manage Xen from an unprivileged context. Soon it will be possible to manage Xen as an unprivileged user whether local or remote using virt-manager.
2) Run virt-manager, choose "run unprivileged", manage QEMU guests
3) Run virt-manager on the command line as root with --qemu
If that *is* the only way we'd expose QEMU support, then libvirt's API is just fine in this respect.
Not entirely sure how I'll integrate it in virt-manager. For dev/testing purposes I just did a lame hack to the initial connect dialog to choose Xen vs QEMU, but I don't want that long term, because really users shouldn't have to make that kind of decision themselves.
I had assumed we'd have "run under Xen" and "run under QEMU" in the "create new guest" dialog. But that's a stupid question to ask someone ... one should not need to care what type of VM it is.
The only change to the 'create new guest' wizard thus far is the ability to select CPU architecture (since QEMU allows sparc, mips, x86, ppc etc).
So, hurrah! virt-manager will be sane in this respect and only the not-so-sane app authors will get pissed with libvirt about this :-)
I certainly hope it will be sane :-) Regards, Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

On Tue, 2007-01-16 at 19:09 +0000, Daniel P. Berrange wrote:
- Probably the only serious problem with that is that libvirt currently will manage Xen guests not created using libvirt. Does it make sense to do that? Will we allow the same with non-Xen?
In the ideal world I'd ignore anything not managed by libvirt, but in reality I don't think that's practical. We need to be able to interoperate as cleanly as possible with other tools, either provided by the HV itself (eg xm) or by other 3rd party vendors.
Personally, I'd just be a hard-ass about it :-)
While my prototype QEMU backend ignores VMs not created by libvirt, work going on upstream will make it practical to manage them too.
I like the way the QEMU backend works right now ...

I guess one way of looking at it is to ask whether libvirt is:

a) A common API for accessing various virtualisation management infrastructures[1]

or

b) A common virtualisation management infrastructure

The Xen support suggests (a), the QEMU support suggests (b). But it's pretty clear the consensus is that libvirt should avoid being (b) where it can.

(a), compared to (b), makes me rather queasy because you have to not only map from libvirt's model to each hypervisor's model, you also have to map each hypervisor's model back to libvirt's model. Which, I guess, suggests that libvirt's model needs to be a superset of all hypervisors' models, rather than a subset.

But fair enough, I just got excited ... libvirt is (a), not (b), and I have to deal with that :-)

Cheers, Mark.

[1] - By "virtualisation management infrastructure", I mean e.g. whatever knows what guests exist (active or not) and can start and stop them.

On Wed, Jan 17, 2007 at 05:35:37PM +0000, Mark McLoughlin wrote:
On Tue, 2007-01-16 at 19:09 +0000, Daniel P. Berrange wrote:
- Probably the only serious problem with that is that libvirt currently will manage Xen guests not created using libvirt. Does it make sense to do that? Will we allow the same with non-Xen?
In the ideal world I'd ignore anything not managed by libvirt, but in reality I don't think that's practical. We need to be able to interoperate as cleanly as possible with other tools, either provided by the HV itself (eg xm) or by other 3rd party vendors.
Personally, I'd just be a hard-ass about it :-)
While my prototype QEMU backend ignores VMs not created by libvirt, work going on upstream will make it practical to manage them too.
I like the way the QEMU backend works right now ...
I guess one way of looking at it is to ask whether libvirt is:
a) A common API for accessing various virtualisation management infrastructures[1]
or
b) A common virtualisation management infrastructure
The Xen support suggests (a), the QEMU support suggests (b). But it's pretty clear the consensus is that libvirt should avoid being (b) where it can.
(a), compared to (b), makes me rather queasy because you have to not only map from libvirt's model to each hypervisor's model, you also have to map each hypervisor's model back to libvirt's model. Which, I guess, suggests that libvirt's model needs to be a superset of all hypervisors' models, rather than a subset.
Turning libvirt into a superset of all HVs is a rather scary concept because it might entail the API getting absolutely enormous. Not only that, but you'd lose some of the isolation that libvirt gives you from the backend. ie if we added a bunch of APIs that only make sense for Xen, then it becomes much harder to get apps working with the non-Xen backends. So we have actually been trying to keep libvirt closer to a subset, rather than a superset.

We are applying some flexibility to that rule though - we're fine adding new APIs that only work on Xen for now - provided they're at least conceptually relevant for other hypervisors. As an example, we have a setMemory API to change a guest's memory on the fly, even though Xen is the only HV which currently lets you do that. It's reasonable to assume that others will support that in the future, possibly through memory hotplug.

When doing this we're also making sure we don't expose Xen specific formats - eg for the block device hotplug add / remove, we don't use the Xen 'block device ID number' to add / remove devices, we use the generic device description XML blob.

Dan.
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

Daniel P. Berrange wrote: [Tue Jan 16 2007, 10:57:03AM EST]
2. The way I was always anticipating remote use of libvirt to work. The app uses libvirt locally which opens a connection to the remote machine using whatever remote management protocol is relevant for the hypervisor in question. eg, HTTP/XML-RPC for Xen, or the TLS secured binary format for the prototype QEMU backend.
http://people.redhat.com/berrange/libvirt/libvirt-arch-remote-1.png
So this works to manage a remote host that might not have libvirt installed...
3. The way I think you're suggesting - a libvirt server on every remote host which calls into the regular libvirt internal driver model to proxy remote calls. So even if the hypervisor in question provides a remote network management API, we will always use the local API and do *all* remote networking via the libvirt server
http://people.redhat.com/berrange/libvirt/libvirt-arch-remote-2.png
...and this requires each managed host to have libvirt(d).

This is considered a reasonable requirement?

Aron

On Tue, Jan 16, 2007 at 04:19:37PM -0500, Aron Griffis wrote:
Daniel P. Berrange wrote: [Tue Jan 16 2007, 10:57:03AM EST]
2. The way I was always anticipating remote use of libvirt to work. The app uses libvirt locally which opens a connection to the remote machine using whatever remote management protocol is relevant for the hypervisor in question. eg, HTTP/XML-RPC for Xen, or the TLS secured binary format for the prototype QEMU backend.
http://people.redhat.com/berrange/libvirt/libvirt-arch-remote-1.png
So this works to manage a remote host that might not have libvirt installed...
Provided the host in question provides a secure remote management system. Until Xen-API is supported, even Xen doesn't have a useful remote management system since its current APIs have zero auth. Other non-Xen virt products don't have any remote management.
3. The way I think you're suggesting - a libvirt server on every remote host which calls into the regular libvirt internal driver model to proxy remote calls. So even if the hypervisor in question provides a remote network management API, we will always use the local API and do *all* remote networking via the libvirt server
http://people.redhat.com/berrange/libvirt/libvirt-arch-remote-2.png
...and this requires each managed host to have libvirt(d).
This is considered a reasonable requirement?
Personally it doesn't worry me too much - by all means though, I'm open to arguments against it too.....

The way I currently look at the problem, needing to deploy a small C based management daemon (merely linked to an SSL library for secure comms) isn't very onerous in comparison to the enormous pile of python code Xen already requires. For non-Xen backends we'll definitely need a daemon of some form, since QEMU / KVM / UML / etc don't have any management daemon at all. For administrators there's a certain benefit to only having to worry about opening up one daemon to the public network regardless of which virt system is in use.

But then maybe we actually need to support both remote management models? Would a requirement for a libvirtd be a problem for your use cases?

Regards,
Dan.
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

Daniel P. Berrange wrote: [Tue Jan 16 2007, 04:54:49PM EST]
On Tue, Jan 16, 2007 at 04:19:37PM -0500, Aron Griffis wrote:
Daniel P. Berrange wrote: [Tue Jan 16 2007, 10:57:03AM EST]
2. The way I was always anticipating remote use of libvirt to work. The app uses libvirt locally which opens a connection to the remote machine using whatever remote management protocol is relevant for the hypervisor in question. eg, HTTP/XML-RPC for Xen, or the TLS secured binary format for the prototype QEMU backend.
http://people.redhat.com/berrange/libvirt/libvirt-arch-remote-1.png
So this works to manage a remote host that might not have libvirt installed...
Provided the host in question provides a secure remote management system. Until Xen-API is supported, even Xen doesn't have a useful remote management system since its current APIs have zero auth. Other non-Xen virt products don't have any remote management.
*nod*
3. The way I think you're suggesting - a libvirt server on every remote host which calls into the regular libvirt internal driver model to proxy remote calls. So even if the hypervisor in question provides a remote network management API, we will always use the local API and do *all* remote networking via the libvirt server
http://people.redhat.com/berrange/libvirt/libvirt-arch-remote-2.png
...and this requires each managed host to have libvirt(d).
This is considered a reasonable requirement?
Personally it doesn't worry me too much - by all means though, I'm open to arguments against it too.....
The way I currently look at the problem, needing to deploy a small C based management daemon (merely linked to an SSL library for secure comms) isn't very onerous in comparison to the enormous pile of python code Xen already requires. For non-Xen backends we'll definitely need a daemon of some form, since QEMU / KVM / UML / etc don't have any management daemon at all. For administrators there's a certain benefit to only having to worry about opening up one daemon to the public network regardless of which virt system is in use.
What's the gap (if any) between libvirtd and xend capabilities? i.e. could libvirtd eventually allow dom0 to omit the python-based xen mgmt stack to shrink dom0 to a significantly thinner OS instance?
But then maybe we actually need to support both remote management models ?
IMHO unnecessary complexity.
Would a requirement for a libvirtd be a problem for your use cases ?
I don't think so. Mostly I was verifying my understanding, and that the requirement was considered. Thanks, Aron

On Tue, Jan 16, 2007 at 05:16:54PM -0500, Aron Griffis wrote:
Daniel P. Berrange wrote: [Tue Jan 16 2007, 04:54:49PM EST]
3. The way I think you're suggesting - a libvirt server on every remote host which calls into the regular libvirt internal driver model to proxy remote calls. So even if the hypervisor in question provides a remote network management API, we will always use the local API and do *all* remote networking via the libvirt server
http://people.redhat.com/berrange/libvirt/libvirt-arch-remote-2.png
...and this requires each managed host to have libvirt(d).
This is considered a reasonable requirement?
Personally it doesn't worry me too much - by all means though, I'm open to arguments against it too.....
The way I currently look at the problem, needing to deploy a small C based management daemon (merely linked to an SSL library for secure comms) isn't very onerous in comparison to the enormous pile of python code Xen already requires. For non-Xen backends we'll definitely need a daemon of some form, since QEMU / KVM / UML / etc don't have any management daemon at all. For administrators there's a certain benefit to only having to worry about opening up one daemon to the public network regardless of which virt system is in use.
What's the gap (if any) between libvirtd and xend capabilities? i.e. could libvirtd eventually allow dom0 to omit the python-based xen mgmt stack to shrink dom0 to a significantly thinner OS instance?
By far the most significant thing XenD does for us is the initial guest creation work. Constructing the page tables, populating xenstore, setting up the virtual device backends, etc. There's no reason this could not be replicated in a libvirtd - the real low level bits are isolated in libxc - but I think it'd be really quite a lot of work. Then there's a bunch of other bits like save/restore & migration to deal with. So possible, but not anywhere on the short-to-medium term development radar.

I agree though in principle it would be nice to slim down the dom0 management stack, preferably being able to eliminate the python runtime altogether.

Dan.
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

On Tue, Jan 16, 2007 at 11:24:52PM +0000, Daniel P. Berrange wrote:
On Tue, Jan 16, 2007 at 05:16:54PM -0500, Aron Griffis wrote:
Daniel P. Berrange wrote: [Tue Jan 16 2007, 04:54:49PM EST]
3. The way I think you're suggesting - a libvirt server on every remote host which calls into the regular libvirt internal driver model to proxy remote calls. So even if the hypervisor in question provides a remote network management API, we will always use the local API and do *all* remote networking via the libvirt server
http://people.redhat.com/berrange/libvirt/libvirt-arch-remote-2.png
...and this requires each managed host to have libvirt(d).
This is considered a reasonable requirement?
Personally it doesn't worry me too much - by all means though, I'm open to arguments against it too.....
The way I currently look at the problem, needing to deploy a small C based management daemon (merely linked to an SSL library for secure comms) isn't very onerous in comparison to the enormous pile of python code Xen already requires. For non-Xen backends we'll definitely need a daemon of some form, since QEMU / KVM / UML / etc don't have any management daemon at all. For administrators there's a certain benefit to only having to worry about opening up one daemon to the public network regardless of which virt system is in use.
What's the gap (if any) between libvirtd and xend capabilities? i.e. could libvirtd eventually allow dom0 to omit the python-based xen mgmt stack to shrink dom0 to a significantly thinner OS instance?
By far the most significant thing XenD does for us is the initial guest creation work. Constructing the page tables, populating xenstore, setting up the virtual device backends, etc. There's no reason this could not be replicated in a libvirtd - the real low level bits are isolated in libxc - but I think it'd be really quite a lot of work. Then there's a bunch of other bits like save/restore & migration to deal with. So possible, but not anywhere on the short-to-medium term development radar.

I agree though in principle it would be nice to slim down the dom0 management stack, preferably being able to eliminate the python runtime altogether.
The isolation is at the licence level too: libxc is GPL'ed, not LGPL'ed, and it seems trying to change the licence now would be very hard. By exporting an RPC API, the xend daemon allows access to the low level. Now if libvirt were always to use a daemon linking to libxc then we would be in a similar situation without needing xend for those. Not that I'm suggesting we push for it from a technical point of view, but this is another aspect of the relationship between the different pieces of code.

Daniel
-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard | virtualization library http://libvirt.org/
veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

Daniel P. Berrange wrote:
For non-Xen backends we'll definitely need a daemon of some form, since QEMU / KVM / UML / etc don't have any management daemon at all.
Is this true? Just examining assumptions ...

Rich.
-- 
Red Hat UK Ltd.
64 Baker Street, London, W1U 7DF
Mobile: +44 7866 314 421 (will change soon)

On Mon, Jan 15, 2007 at 08:06:18PM +0000, Mark McLoughlin wrote:
Hi, Dan and I have been discussing how to "fix networking", not just Xen's networking but also getting something sane wrt. QEMU/KVM etc.
Comments very welcome on the writeup below. The libvirt stuff is towards the end, but I think all of it is probably useful to this list.
Since we've disappeared down a rat-hole with the other part of the thread, here's an attempt to get back on-topic :-)
1. A privileged user creates two (Xen) guests, each with a Virtual Network Interface. Without any special networking configuration, these two guests are connected to a default Virtual Network which contains a combined Virtual Bridge/Router/Firewall.
+-----------+ D +-----------+ | Guest | N D H | Guest | | A | A N C | B | | +---+ | T S P | +---+ | | |NIC| | ^ ^ ^ | |NIC| | +---+-+-+---+ +---+---+ +---+-+-+---+ ^ | ^ | +--------+ +---+---+ +--------+ | +-->+ vif1.0 +----+ vnbr0 +----+ vif2.0 +<--+ +--------+ +-------+ +--------+
Notes:
* "vnbr0" is a bridge device with it's own IP address on the same subnet as the guests. * IP forwarding is enabled in Dom0. Masquerading and DNAT is implemented using iptables. * We run a DHCP server and a DNS proxy in Dom0 (e.g. dnsmasq) 2. A privileged user does exactly the same thing as (1), but with QEMU guests.
D N D H A N C T S P ^ ^ ^ +---+---+ | +---+---+ +-----------+ | vnbr0 | +-----------+ | Guest | +---+---+ | Guest | | A | | | B | | +---+ | +---+---+ | +---+ | | |NIC| | | vtap0 | | |NIC| | +---+-+-+---+ +---+---+ +---+-+-+---+ ^ +-------+ | +-------+ ^ | | | +---+---+ | | | +------>+ VLAN0 +-+ VDE +-+ VLAN0 +<------+ | | +-------+ | | +-------+ +-------+
Notes:
* VDE is a userspace ethernet bridge implemented using vde_switch
* "vtap0" is a TAP device created by vde_switch
* Everything else is the same as (1)
* This could be done without vde_switch by having Guest A create vtap0 and have Guest B connect directly to Guest A's VLAN. However, if Guest A is shut down, Guest B's network would go down.
Since the user is privileged, another way to do without VDE is to mirror the Xen case almost exactly, creating one tap device per guest, instead of Xen's netback vif devices: +-----------+ D +-----------+ | Guest | N D H | Guest | | A | A N C | B | | +---+ | T S P | +---+ | | |NIC| | ^ ^ ^ | |NIC| | +---+-+-+---+ +---+---+ +---+-+-+---+ ^ | ^ | +--------+ +---+---+ +--------+ | +-->+ vtap0 +----+ vnbr0 +----+ vtap1 +<--+ +--------+ +-------+ +--------+
3. An unprivileged user does exactly the same thing as (2).
+-----------+ +-----------+ | Guest | +----+----+ | Guest | | A | |userspace| | B | | +---+ | | network | | +---+ | | |NIC| | | stack | | |NIC| | +---+-+-+---+ +----+----+ +---+-+-+---+ ^ +-------+ | +-------+ ^ | | | +---+---+ | | | +------>+ VLAN0 +-+ VDE +-+ VLAN0 +<------+ | | +-------+ | | +-------+ +-------+
Notes:
* Similar to (2) except there can be no TAP device or bridge
* The userspace network stack is implemented using slirpvde to provide a DHCP server and DNS proxy to the network, but also effectively a SNAT and DNAT router.
* slirpvde implements ethernet, ip, tcp, udp, icmp, dhcp, tftp (etc.) in userspace. Completely crazy, but since the kernel apparently has no secure way to allow unprivileged users to leverage the kernel's network stack for this, it must be done in userspace.
Is it practical to just have some kind of privileged proxy that would merely create & configure the tap devices on behalf of the unprivileged guests? If we just create tap devices for any unprivileged guest, but kept them disconnected from any real network device, would that still be a big hole?

Or can we leverage QEMU's builtin SLIRP or other non-TAP networking modes to construct something reasonable in userspace, without using VDE?
4. Same as (2), except the user also creates two Xen guests.
+-----------+ D +-----------+ | Guest | N D H | Guest | | A | A N C | B | | +---+ | T S P | +---+ | | |NIC| | ^ ^ ^ | |NIC| | +---+-+-+---+ +---+---+ +---+-+-+---+ ^ | ^ | +--------+ +---+---+ +--------+ | +-->+ vif1.0 +----+ vnbr0 +----+ vif2.0 +<--+ +--------+ +---+---+ +--------+ | +---+---+ | vtap0 | +---+---+ | +-------+ +--+--+ +-------+ +---->+ VLAN0 +----+ VDE +---+ VLAN0 +<-----+ | +-------+ +-----+ +-------+ | V V +---+-+-+---+ +---+-+-+---+ | |NIC| | | |NIC| | | +---+ | | +---+ | | Guest | | Guest | | C | | D | +-----------+ +-----------+
Notes:
* In this case we could do away with VDE and have each QEMU guest use its own TAP device.
Yep, that would make sense if the guests were privileged - best to stay close to kernel networking devices if at all possible.
5. Same as (3) except Guests A and C are connected to a Shared Physical Interface.
+-----------+ | D +-----------+ | Guest | ^ | N D H | Guest | | A | | | A N C | B | | +---+ | +---+---+ | T S P | +---+ | | |NIC| | | eth0 | | ^ ^ ^ | |NIC| | +---+-+-+---+ +---+---+ | +---+---+ +---+-+-+---+ ^ | | | ^ | +--------+ +---+---+ | +---+---+ +--------+ | +>+ vif1.0 +-+ ebr0 + | + vnbr0 +-+ vif2.0 +<-+ +--------+ +---+---+ | +---+---+ +--------+ | | | +---+---+ | +---+---+ | vtap1 | | | vtap0 | +---+---+ | +---+---+ | | | +-------+ +--+--+ | +--+--+ +-------+ +->+ VLAN0 +--+ VDE + | + VDE +--+ VLAN0 +<-+ | +-------+ +-----+ | +-----+ +-------+ | V | V +---+-+-+---+ | +---+-+-+---+ | |NIC| | | | |NIC| | | +---+ | | | +---+ | | Guest | | | Guest | | C | | | D | +-----------+ | +-----------+
Notes:
* The idea here is that when the admin configures eth0 to be shareable, eth0 is configured as an addressless NIC enslaved to a bridge which has the MAC address and IP address that eth0 should have
* Again, VDE is redundant here.
This diagram just scares me, but I guess its merely showing two isolated networks with a different set of guests on each. Probably be much less scary if not ascii-art..
6. Same as (2) except the QEMU guests are on a Virtual Network on another physical machine which is, in turn, connected to the Virtual Network on the first physical machine
+-----------+ D +-----------+ | Guest | N D H | Guest | | A | A N C | B | | +---+ | T S P | +---+ | | |NIC| | ^ ^ ^ | |NIC| | +---+-+-+---+ +---+---+ +---+-+-+---+ ^ | ^ | +--------+ +---+---+ +--------+ | +-->+ vif1.0 +----+ vnbr0 +----+ vif2.0 +<--+ +--------+ +---+---+ +--------+ | +---+---+ | vtap0 | +---+---+ | +--+--+ | VDE | +--+--+ | First Physical Machine V ------------------------------------------------------------- Second Physical Machine ^ | +-------+ +--+--+ +-------+ +---->+ VLAN0 +----+ VDE +---+ VLAN0 +<-----+ | +-------+ +-----+ +-------+ | V V +---+-+-+---+ +---+-+-+---+ | |NIC| | | |NIC| | | +---+ | | +---+ | | Guest | | Guest | | C | | D | +-----------+ +-----------+
Notes:
* What's going on here is that the two VDEs are connected over the network, either via a plain socket or perhaps encapsulated in another protocol like SSH or TLS
This is the case where I always thought VDE did get interesting - being able to create pure userspace virtual networks across machines, without any root privileges. Gives joe-user a nice lot of power
Virtual Networks will be implemented in libvirt. First, there will be an XML description of Virtual Networks e.g.:
<network id="0"> <name>Foo</name> <uuid>596a5d2171f48fb2e068e2386a5c413e</uuid> <listen address="172.31.0.5" port="1234" /> <connections> <connection address="172.31.0.6" port="4321" /> </conections> <dhcp enabled="true"> <ip address="10.0.0.1" netmask="255.255.255.0" start="10.0.0.128" end="10.0.0.254" /> </dhcp> <forwarding enabled="true"> <incoming default="deny"> <allow port="123" domain="foobar" destport="321" /> </incoming> <outgoing default="allow"> <deny port="25" /> </outgoing> </forwarding> <network>
Got to also think how we connect guest domains to the virtual network. Currently we just have something really simple like

<interface type="bridge">
  <source bridge='xenbr0'/>
  <mac address='00:11:22:33:44:55'/>
</interface>

I guess we'd probably want to refer to the UUID of the network to map it into the guest.

Oh, do we want to define a 'network 0' to be the physical network of the host machine - what if there are multiple host NICs - any conventions we need to let us distinguish? Maybe it's best to just refer to the host network by using IP addresses - so we can deal better with the case where a machine switches from eth0 -> eth1 (wired to wireless) but keeps the same IP address, or some such.
* The XML format isn't thought out at all, but briefly:
  * The <listen> and <connections> elements describe networks connected across physical machine boundaries.
  * The <dhcp> element describes the configuration of the DHCP server on the network.
  * The <forwarding> element describes how incoming and outgoing connections are forwarded.
Dan.
-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

On Tue, 2007-01-16 at 22:28 +0000, Daniel P. Berrange wrote:
On Mon, Jan 15, 2007 at 08:06:18PM +0000, Mark McLoughlin wrote:
Since we've disappeared down a rat-hole with the other part of the thread, here's an attempt to get back on-topic :-)
Indeed :-)
Since the user is privileged, another way to do without VDE is to mirror the Xen case almost exactly, creating one tap device per guest, instead of Xen's netback vif devices:
Sure. There is the argument that always using VDE is nicer because it's consistent with the non-privileged and remotely connected network versions. As you say, though, this way is consistent with the Xen version.
3. An unprivileged user does exactly the same thing as (2).
+-----------+ +-----------+ | Guest | +----+----+ | Guest | | A | |userspace| | B | | +---+ | | network | | +---+ | | |NIC| | | stack | | |NIC| | +---+-+-+---+ +----+----+ +---+-+-+---+ ^ +-------+ | +-------+ ^ | | | +---+---+ | | | +------>+ VLAN0 +-+ VDE +-+ VLAN0 +<------+ | | +-------+ | | +-------+ +-------+
Notes:
* Similar to (2) except there can be no TAP device or bridge
* The userspace network stack is implemented using slirpvde to provide a DHCP server and DNS proxy to the network, but also effectively a SNAT and DNAT router.
* slirpvde implements ethernet, ip, tcp, udp, icmp, dhcp, tftp (etc.) in userspace. Completely crazy, but since the kernel apparently has no secure way to allow unprivileged users to leverage the kernel's network stack for this, it must be done in userspace.
Is it practical to just have some kind of privileged proxy that would merely create & configure the tap devices on behalf of the unprivileged guests? If we just create tap devices for any unprivileged guest, but kept them disconnected from any real network device, would that still be a big hole?
Okay, to avoid a userspace network stack, you need a way to securely allow guests running as unprivileged users to use the kernel's network stack. That implies:

 1) The packets/frames have to arrive on a network interface created by the user (e.g. a TAP or SLIP iface)
 2) It should not be possible to spoof as another host or adversely affect the host's connectivity, or any other machine on the same network as the host
 3) slirp prevents spoofing by effectively translating the source address of any packet which leaves the virtual network, just like a router using SNAT
 4) We can do the same thing by enabling IP forwarding and having all packets forwarded by the host go through SNAT
 5) The problem with that is what to do about packets not being forwarded by the host, but which are destined for the host itself? SNAT in PREROUTING might do it, but that's not allowed, it seems.
 6) We also have to worry about whether people could e.g. screw up the host's ARP cache
 7) We also have to worry about a DoS whereby someone creates lots of network interfaces

And note, this isn't just about worrying about nasty guests. You have to worry about what nasty users on the host could do with a setuid helper like this.

It's certainly got to be "possible" ... but I don't yet feel I know what all the bases are that need to be covered, never mind how we'd cover them.
Or can we leverage QEMU's builtin SLIRP or other non-TAP networking modes to construct something reasonable in userspace, without using VDE.
The general problem with any SLIRP derivative or similar is that it's another network stack implementation. That makes me nervous for security, performance, stability and portability reasons. And as I found out, the case in point is that SLIRP currently has buffer overflow vulnerabilities and isn't 64 bit clean.
Virtual Networks will be implemented in libvirt. First, there will be an XML description of Virtual Networks e.g.:
<network id="0"> <name>Foo</name> <uuid>596a5d2171f48fb2e068e2386a5c413e</uuid> <listen address="172.31.0.5" port="1234" /> <connections> <connection address="172.31.0.6" port="4321" /> </conections> <dhcp enabled="true"> <ip address="10.0.0.1" netmask="255.255.255.0" start="10.0.0.128" end="10.0.0.254" /> </dhcp> <forwarding enabled="true"> <incoming default="deny"> <allow port="123" domain="foobar" destport="321" /> </incoming> <outgoing default="allow"> <deny port="25" /> </outgoing> </forwarding> <network>
Got to also think how we connect guest domains to the virtual network.
Right, further on in the mail I said:

 * Where is the connection between domains and networks in either the API or the XML format? How is a domain associated with a network? You put a bridge name in the <network> definition and use that in the domain's <interface> definition? Or you put the network name in the interface definition and have libvirt look up the bridge name when creating the guest?
Currently we just have something really simple like
<interface type="bridge"> <source bridge='xenbr0'/> <mac address='00:11:22:33:44:55'/> </interface>
I guess we'd probably want to refer to the UUID of the network to map it into the guest.
Well, the UUID isn't much good if you can't map it. So, it would probably be the name and libvirt URI, right?
Oh, do we want to define a 'network 0' to be the physical network of the host machine - what if there are multiple host NICs - any conventions we need to let us distinguish? Maybe it's best to just refer to the host network by using IP addresses - so we can deal better with the case where a machine switches from eth0 -> eth1 (wired to wireless) but keeps the same IP address, or some such.
Well, I think there should be a default virtual network defined somehow. You shouldn't need to create one unless you want a second one.

But remember that under the model I'm suggesting, guests connect *either* to a virtual network or a physical network via a "shared physical interface". The shared physical interface just winds up being a bridge you enslave the guest's interface to, so the easiest answer for that is that we stick with the way it is right now for Xen and have QEMU create a TAP device and enslave that to the bridge in this mode.

Dunno, it does need more thought/discussion ... I find the current <interface> stuff quite strange now - e.g. "bridge" vs. "ethernet" types and the bridge name is in <source> ?

Cheers,
Mark.

On Wed, 2007-01-17 at 18:38 +0000, Mark McLoughlin wrote:
On Tue, 2007-01-16 at 22:28 +0000, Daniel P. Berrange wrote:
On Mon, Jan 15, 2007 at 08:06:18PM +0000, Mark McLoughlin wrote:
Virtual Networks will be implemented in libvirt. First, there will be an XML description of Virtual Networks e.g.:
<network id="0"> <name>Foo</name> <uuid>596a5d2171f48fb2e068e2386a5c413e</uuid> <listen address="172.31.0.5" port="1234" /> <connections> <connection address="172.31.0.6" port="4321" /> </conections> <dhcp enabled="true"> <ip address="10.0.0.1" netmask="255.255.255.0" start="10.0.0.128" end="10.0.0.254" /> </dhcp> <forwarding enabled="true"> <incoming default="deny"> <allow port="123" domain="foobar" destport="321" /> </incoming> <outgoing default="allow"> <deny port="25" /> </outgoing> </forwarding> <network>
Got to also think how we connect guest domains to the virtual network.
Right, further on in the mail I said:
* Where is the connection between domains and networks in either the API or the XML format? How is a domain associated with a network? You put a bridge name in the <network> definition and use that in the domain's <interface> definition? Or you put the network name in the interface definition and have libvirt look up the bridge name when creating the guest?
Currently we just have something really simple like
<interface type="bridge"> <source bridge='xenbr0'/> <mac address='00:11:22:33:44:55'/> </interface>
I guess we'd probably want to refer to the UUID of the network to map it into the guest.
Well, the UUID isn't much good if you can't map it. So, it would probably be the name and libvirt URI, right?
Related to the last patch, how about we just put the network name in the interface definition?

Attached are patches implementing this for both QEMU and Xen guests.

Cheers,
Mark.
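For illustration, a guest interface under that scheme might end up looking something like this - hypothetical: the network name, tapifname and MAC are made up, and the exact element names are whatever the patches settle on:

```xml
<interface type="network">
  <network name="default" tapifname="vtap0"/>
  <mac address="00:16:3e:00:00:01"/>
</interface>
```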

Hey,

So, the latest patches are at:

  http://www.gnome.org/~markmc/code/libvirt-networking/

I'm pretty happy with how things are at the moment. I've more or less cleared out my todo list on this[1], but I'm left with a big fat elephant sitting in the corner looking quite guilty ... iptables :-)

Basically, once you create a virtual network, you need the following iptables rules:

 - Allow bridging across the vnet's bridge - e.g. just allow all bridging:

     $> iptables -D FORWARD 1
     $> iptables -A FORWARD -m physdev ! --physdev-is-bridged -j REJECT --reject-with icmp-host-prohibited

 - Allow DHCP and DNS requests from guests:

     $> iptables -I INPUT -p tcp -m tcp --dport 53 -j ACCEPT
     $> iptables -I INPUT -p udp -m udp --dport 53 -j ACCEPT
     $> iptables -I INPUT -p udp -m udp --dport 67 -j ACCEPT

 - Enable forwarding and SNAT:

     $> echo 1 > /proc/sys/net/ipv4/ip_forward
     $> iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE

 - And any DNAT rules to e.g. re-direct port 8080 on the host to port 80 on a specific guest.

Figuring out what the rules should be and adding them isn't a problem ... the problem is how to interact with the underlying distribution's iptables infrastructure. e.g. in Fedora, if you just go ahead and add these rules, they'd be wiped out by "service iptables restart", or overwritten by the firewall config tool, or saved and applied at boot if you used "service iptables save".

Bit of a mess :/

Cheers,
Mark.

[1] - Well, networks for unprivileged users is another big todo item

pedant's quick glance: ;-) On Thu, Jan 25, 2007 at 03:52:03PM +0000, Mark McLoughlin wrote:
@@ -761,6 +813,61 @@ static int qemudParseXML(struct qemud_se }

+static char *
+qemudNetworkIfaceConnect(struct qemud_server *server,
+                         struct qemud_vm *vm,
+                         struct qemud_vm_net_def *net)
+{
.....
+
+    snprintf(tapfdstr, sizeof(tapfdstr), "tap,fd=%d,script=", tapfd);
+
+    return strdup(tapfdstr);
       ^^^^^^^^^^^^^^^^^^^ where is allocation check?

    if ((p = strdup(tapfdstr))) return p;

+
+ no_memory:
+    qemudReportError(server, VIR_ERR_NO_MEMORY, "tapfds");
+ error:
+    if (tapfd != -1)
+        close(tapfd);
+    return NULL;
+}

@@ -1653,6 +1774,18 @@ char *qemudGenerateXML(struct qemud_serv
                        net->mac[3], net->mac[4], net->mac[5]) < 0)
         goto no_memory;
+    if (net->type == QEMUD_NET_NETWORK) {
+        if (qemudBufferPrintf(&buf, " <network name='%s", net->dst.network.name) < 0)
+            goto no_memory;
+
+        if (net->dst.network.tapifname[0] != '\0' &&
+            qemudBufferPrintf(&buf, " tapifname='%s'", net->dst.network.tapifname) < 0)
+            goto no_memory;
+
+        if (qemudBufferPrintf(&buf, "/>\n") < 0)
+            goto no_memory;
+    }
+    if (qemudBufferPrintf(&buf, " </interface>\n") < 0)
       ^^^^^^

There is also BufferAdd() which is cheaper than Printf if you needn't any string formatting.

         goto no_memory;
Karel -- Karel Zak <kzak@redhat.com>

Karel Zak wrote:
+ return strdup(tapfdstr); ^^^^^^^^^^^^^^^^^^^ where is allocation check?
There's a strong argument that you shouldn't check for out-of-memory errors on small heap allocations. After all, in a typical C program there's a ratio of somewhere around 10 : 1 between stack-allocated objects and heap-allocated objects (malloc, strdup). Yet stack allocation is almost never checked for failure. So you're making your code considerably longer and harder to understand in order to catch failures in only 1 in 10 memory allocations. Moreover, on most Linux distributions the stack is limited to something uselessly small like 8 MB, which makes recursive algorithms fail when there's plenty of free memory around.

Rich.

-- Emerging Technologies, Red Hat http://et.redhat.com/~rjones/ 64 Baker Street, London, W1U 7DF Mobile: +44 7866 314 421 "[Negative numbers] darken the very whole doctrines of the equations and make dark of the things which are in their nature excessively obvious and simple" (Francis Maseres FRS, mathematician, 1759)

On Mon, Jan 29, 2007 at 09:19:29AM +0000, Richard W.M. Jones wrote:
Karel Zak wrote:
+ return strdup(tapfdstr); ^^^^^^^^^^^^^^^^^^^ where is allocation check?
There's a strong argument that you shouldn't check for out of memory errors on small heap allocations. After all, in a typical C program there's a ratio somewhere around 10 : 1 of stack objects allocated : objects allocated on the heap (malloc, strdup). Yet stack object allocation is almost never checked for failures. So you're making your code considerably longer and harder to understand in order to catch failures in only 1 in 10 memory allocations.
What's more important is consistency of coding style. We check strdup() results in the library. Karel -- Karel Zak <kzak@redhat.com>

Hi Karel, On Mon, 2007-01-29 at 09:27 +0100, Karel Zak wrote:
pedant's quick glance: ;-)
On Thu, Jan 25, 2007 at 03:52:03PM +0000, Mark McLoughlin wrote:
@@ -761,6 +813,61 @@ static int qemudParseXML(struct qemud_se }

+static char *
+qemudNetworkIfaceConnect(struct qemud_server *server,
+                         struct qemud_vm *vm,
+                         struct qemud_vm_net_def *net)
+{
.....
+
+    snprintf(tapfdstr, sizeof(tapfdstr), "tap,fd=%d,script=", tapfd);
+
+    return strdup(tapfdstr);
       ^^^^^^^^^^^^^^^^^^^ where is allocation check?

    if ((p = strdup(tapfdstr))) return p;
Very well spotted ... I've also moved the strdup() to before the tapfds realloc() so as to not leave that in a weird state if the strdup() fails.
+    if (net->type == QEMUD_NET_NETWORK) {
+        if (qemudBufferPrintf(&buf, " <network name='%s", net->dst.network.name) < 0)
+            goto no_memory;
+
+        if (net->dst.network.tapifname[0] != '\0' &&
+            qemudBufferPrintf(&buf, " tapifname='%s'", net->dst.network.tapifname) < 0)
+            goto no_memory;
+
+        if (qemudBufferPrintf(&buf, "/>\n") < 0)
+            goto no_memory;
+    }
+    if (qemudBufferPrintf(&buf, " </interface>\n") < 0)
       ^^^^^^
There is also BufferAdd() which is cheaper than Printf if you needn't any string formatting.
That's one for Dan ... notice the way it's not actually in my patch. I did actually think the same thing myself at the time ... :-) Thanks, Mark.

Hi,

I just wanted to share my progress on this. See here for a patch set which can be applied to current CVS using quilt:

  http://www.gnome.org/~markmc/code/libvirt-networking/

I've appended the series file with URLs to each of the patches. Comments very welcome.

Thanks, Mark.

#
# Dan's patches
#
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-daemon.pat...
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-driver.pat...
#
# Various fixes to Dan's patches
#
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-no-c99.pat...
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-no-kqemu.p...
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-transient....
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-error-over...
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-free-xpath...
#
# Some re-factoring for later
#
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemud-refactor-...
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-config-ref...
#
# Misc libvirt fixes cleanups
#
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-unused-driver-m...
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-rename-handle-t...
#
# Add the basic networking API and
# driver methods to support it
#
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-network-api.pat...
#
# Add network support to virError
#
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-network-error.p...
#
# Add net-* commands to virsh
#
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-network-virsh.p...
#
# Hook up to qemud
#
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-network-qemu-st...
#
# Implement config parsing etc.
#
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-network-config....
#
# Add support for creating a bridge
#
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-network-bridge....
#
# Add support for starting dnsmasq
#
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-network-dnsmasq...

On Mon, Jan 22, 2007 at 02:46:11PM +0000, Mark McLoughlin wrote:
# Dan's patches http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-daemon.pat... http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-driver.pat...
I've been very lame in not sending an update to these patches. I've updated them to support TLS, protocol versioning, and fixed-size types & network byte ordering on the wire. Shouldn't be too difficult to resolve though, since I think it'll only really impact your libvirt-network-qemu-stubs.patch file.
# # Various fixes to Dan's patches # http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-no-c99.pat...
Based on IRC discussions I think we want to avoid both -std=gnu99 & -std=c99 in the compiler flags, and just use appropriate feature macros like -D_XOPEN_SOURCE, -D_SVID_SOURCE=1 as necessary. In particular I'd like to avoid GNU-specific bits so we don't make life hard for the Solaris / BSD guys.
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-no-kqemu.p...
Hmm, yeah, I imagine the build patched that flag out because of its license issues. I guess I'll have to make a 'configure' check to see if -no-kqemu is available on a particular host or not.
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-transient....
That should not be necessary in my latest patches - I fixed up the transient domain cleanup stuff in a slightly different way.
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-error-over...
Looks good.
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-free-xpath...
Already fixed in latest code.
# Some re-factoring for later http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemud-refactor-...
Looks good, will merge that in my next QEMU patches.
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-config-ref...
Likewise, looks good.
# # Misc libvirt fixes cleanups # http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-unused-driver-m...
Yep, we've lived with that baggage for too long
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-rename-handle-t...
Seems reasonable. On the note of cleanup - there's a bucketload of code in xml.c and xend_internal.c which is never called by anything, which we should remove - it constantly confuses me when I work on these two files to see all this code which turns out to be unused.
# Add the basic networking API and # driver methods to support it http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-network-api.pat...
Looks sane in principle. Not reviewed the code in detail yet.
# Add network support to virError http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-network-error.p...
This one is troublesome because of the ABI issue. It'll cause issues with the virResetError, virCopyLastError, virConnCopyLastError functions, if the caller passes in an object they allocated themselves. The only way it would not be a problem is if we can ensure that virNetworkErr never gets set unless the caller has called one of the virNetworkXXX functions, because by calling those we can know for sure they've been compiled against a recent set of headers. It'd be a nasty hack though.
# Add net-* commands to virsh http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-network-virsh.p...
Looks sane in principle. Not reviewed the code in detail yet.
# # Hook up to qemud # http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-network-qemu-st...
Code looks sane, but will need a fixup to use fixed size types & network byte order.
# Implement config parsing etc. http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-network-config.... # Add support for creating a bridge http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-network-bridge.... # Add support for starting dnsmasq http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-network-dnsmasq...
Looks sane in principle. Not reviewed the code in detail yet. Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

On Mon, 2007-01-22 at 16:02 +0000, Daniel P. Berrange wrote:
On Mon, Jan 22, 2007 at 02:46:11PM +0000, Mark McLoughlin wrote:
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-error-over...
Looks good.
# Some re-factoring for later http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemud-refactor-...
Looks good, will merge that in my next QEMU patches.
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-config-ref...
Likewise, looks good.
So, you're merging these three?
# # Misc libvirt fixes cleanups # http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-unused-driver-m...
Yep, we've lived with that baggage for too long
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-rename-handle-t...
Seems reasonable.
I've committed both of these. You should probably merge in the qemud_internal.c parts into your patches. Cheers, Mark.

On Mon, Jan 22, 2007 at 04:27:52PM +0000, Mark McLoughlin wrote:
On Mon, 2007-01-22 at 16:02 +0000, Daniel P. Berrange wrote:
On Mon, Jan 22, 2007 at 02:46:11PM +0000, Mark McLoughlin wrote:
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-error-over...
Looks good.
# Some re-factoring for later http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemud-refactor-...
Looks good, will merge that in my next QEMU patches.
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-config-ref...
Likewise, looks good.
So, you're merging these three?
Yes, will do.
# # Misc libvirt fixes cleanups # http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-unused-driver-m...
Yep, we've lived with that baggage for too long
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-rename-handle-t...
Seems reasonable.
I've committed both of these. You should probably merge in the qemud_internal.c parts into your patches.
Ok, will do. Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

On Mon, 2007-01-22 at 16:02 +0000, Daniel P. Berrange wrote:
On Mon, Jan 22, 2007 at 02:46:11PM +0000, Mark McLoughlin wrote:
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-transient....
That should not be necessary in my latest patches - I fixed up the transient domain cleanup stuff in a slightly different way.
AFAICS, this is still needed ... i.e.

  $> virsh create foo.xml
  $> virsh destroy Foo
  $> virsh create foo.xml
  libvir: QEMUD error : domain Foo exists already
  error: Failed to create domain from foo.xml

Cheers, Mark.

On Tue, Jan 23, 2007 at 10:59:31AM +0000, Mark McLoughlin wrote:
On Mon, 2007-01-22 at 16:02 +0000, Daniel P. Berrange wrote:
On Mon, Jan 22, 2007 at 02:46:11PM +0000, Mark McLoughlin wrote:
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-transient....
That should not be necessary in my latest patches - I fixed up the transient domain cleanup stuff in a slightly different way.
AFAICS, this is still needed ... i.e.
  $> virsh create foo.xml
  $> virsh destroy Foo
  $> virsh create foo.xml
  libvir: QEMUD error : domain Foo exists already
  error: Failed to create domain from foo.xml
I think one of your other patches must be breaking this, because the plain QEMU patches I posted definitely work:

  virsh > create /home/berrange/q.xml
  Domain demo created from /home/berrange/q.xml
  virsh > list --all
   Id Name State
  ----------------------------------
    3 demo running
  virsh > destroy demo
  Domain demo destroyed
  virsh > list --all
   Id Name State
  ----------------------------------
  virsh > create /home/berrange/q.xml
  Domain demo created from /home/berrange/q.xml
  virsh > list --all
   Id Name State
  ----------------------------------
    4 demo running

The bit of code which cleans up transient domains is at the very end of the method:

  static int qemudDispatchPoll(struct qemud_server *server, struct pollfd *fds)

in qemud/qemud.c. It basically iterates over every guest in the inactive domains list, and any which do not have a config file listed (ie, vm->configFile[0] == NULL) are purged from the list. The backend impl for the 'create' command ensures this is the case by passing 0 as the last arg to qemudLoadConfigXML(), which tells it not to write a persistent config file to disk.

Regards, Dan.

-- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

On Mon, Jan 22, 2007 at 02:46:11PM +0000, Mark McLoughlin wrote:
# Dan's patches http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-daemon.pat... http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-driver.pat...
Now updated at:

  http://people.redhat.com/berrange/libvirt/libvirt-qemu-daemon-2.patch
  http://people.redhat.com/berrange/libvirt/libvirt-qemu-driver-2.patch

The major changes in these two patches since the previous time are:

- Client and server now use TLS on TCP sockets (UNIX sockets are plain)
- Client must have 4 files in current working dir:
  - ca-cert.pem - CA certificate
  - ca-crl.pem - CA revocation list
  - cert.pem - client's certificate
  - key.pem - client's secret key
  This should change in future once we decide on how to handle these.
- Server can enable TLS support via command line args:

    libvirt_qemud -l local --tls --tls-cert cert.pem --tls-key key.pem \
        --tls-ca-cert ca-cert.pem --tls-ca-crl ca-crl.pem

- The wire protocol uses fixed-size types & requires network byte order on the wire.
- Added a 'hello' message. When first connecting, the client sends the max version number it supports & whether it supports clear mode & TLS mode. The server rejects clients with an incompatible major version, or picks the maximum minor version supported by both client & server. If the server requires TLS it will reject a client not advertising support of TLS mode. Upon completion of the 'hello' request+reply, they do the TLS handshake. If successful, the server enables the rest of the protocol messages; otherwise it drops the client.

NB, there is a bucketload of printf() debugging in these patches since I was still experimenting with the TLS stuff.
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-no-c99.pat...
I simply removed -std=c99 and fixed up places I'd used C99 constructs, so should no longer be needed
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-no-kqemu.p...
Not merged yet
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-transient....
Now unnecessary
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-error-over... http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-free-xpath...
Merged these two.
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemud-refactor-... http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-config-ref...
Merged these two.
# Hook up to qemud http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-network-qemu-st...
When updating this you need two core changes:

- Change all 'int' to one of int32_t, uint32_t, int64_t, uint64_t
- Use 'qemud_wire_32' or 'qemud_wire_64' when reading or writing data to the qemud_packet members.

Regards, Dan.

-- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

On Mon, 2007-01-22 at 21:20 +0000, Daniel P. Berrange wrote:
On Mon, Jan 22, 2007 at 02:46:11PM +0000, Mark McLoughlin wrote:
# Dan's patches http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-daemon.pat... http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-driver.pat...
Now updated at:
http://people.redhat.com/berrange/libvirt/libvirt-qemu-daemon-2.patch http://people.redhat.com/berrange/libvirt/libvirt-qemu-driver-2.patch
Okay, I've updated my patch set to use them: http://www.gnome.org/~markmc/code/libvirt-networking/ One bug I found was that libvirt fails to connect if you don't have TLS certs, even if the server doesn't request TLS. See attached patch.
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-no-c99.pat...
I simply removed -std=c99 and fixed up places I'd used C99 constructs, so should no longer be needed
_XOPEN_SOURCE and _POSIX_C_SOURCE disable _SVID_SOURCE, so I still need to enable that for struct ifreq and friends. Thanks, Mark.

On Tue, Jan 23, 2007 at 11:08:36AM +0000, Mark McLoughlin wrote:
On Mon, 2007-01-22 at 21:20 +0000, Daniel P. Berrange wrote:
On Mon, Jan 22, 2007 at 02:46:11PM +0000, Mark McLoughlin wrote:
# Dan's patches http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-daemon.pat... http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-driver.pat...
Now updated at:
http://people.redhat.com/berrange/libvirt/libvirt-qemu-daemon-2.patch http://people.redhat.com/berrange/libvirt/libvirt-qemu-driver-2.patch
Okay, I've updated my patch set to use them:
http://www.gnome.org/~markmc/code/libvirt-networking/
One bug I found was that libvirt fails to connect if you don't have TLS certs, even if the server doesn't request TLS. See attached patch.
Ahhhh, yes, that makes sense - I'd missed it because I always had the certs lying in my working dir, even when testing without TLS.
http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-qemu-no-c99.pat...
I simply removed -std=c99 and fixed up places I'd used C99 constructs, so should no longer be needed
_XOPEN_SOURCE and _POSIX_C_SOURCE disable _SVID_SOURCE, so I still need to enable that for struct ifreq and friends.
Adding -D_SVID_SOURCE=1 to the compiler args is no problem - it's at least a fairly portable extension in comparison to turning on all GNU extensions. Regards, Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

On Mon, Jan 22, 2007 at 02:46:11PM +0000, Mark McLoughlin wrote:
# Add net-* commands to virsh # http://www.gnome.org/~markmc/code/libvirt-networking/libvirt-network-virsh.p...
(IMHO it's better to send a patch to the mailing list than a URL ... :-)

+ {"file", VSH_OT_DATA, VSH_OFLAG_REQ, gettext_noop("file containing an XML network description")},

Cannot we use something like _N() rather than gettext_noop() ?

+ char buffer[4096];

Don't use magic numbers. Use BUFSIZ.

+static vshCmdInfo info_network_list[] = {
+    {"syntax", "net-list"},

  {"syntax", "net-list [--inactive | --all]"}

+    names = vshMalloc(ctl, sizeof(char *) * maxname);
+
+    if ((maxname = virConnectListDefinedNetworks(ctl->conn, names, maxname)) < 0) {

I'm not sure if I read libvirt correctly, but where in virsh do we deallocate the name strings? It's not specific to your network patches. I see everywhere for virConnectList* that we deallocate "names", which is the array of pointers only. See virsh.c, cmdList():

  names = vshMalloc(ctl, sizeof(char *) * maxname);
  if ((maxname = virConnectListDefinedDomains(ctl->conn, names, maxname)) < 0) {
  ....
  if (names)
      free(names);
  return TRUE;

But in xend_internal.c, xenDaemonListDefinedDomains():

  names[ret++] = strdup(node->value);
                 ^^^^^^^ where is free() for this string?

It seems like nice leak(s). Right?

+    {"syntax", "start a network "},

  {"syntax", "net-start <name>"}

+ char uuid[37];

Magic number? :-)

  #define UUID_STRLEN 36
  char uuid[UUID_STRLEN+1];

Karel -- Karel Zak <kzak@redhat.com>

Hi Karel, Thanks for the review ... Note, a lot of the code in the networking patches is just copied and pasted from elsewhere in libvirt, so I'll fix up the original code first. On Tue, 2007-01-23 at 11:20 +0100, Karel Zak wrote:
+ {"file", VSH_OT_DATA, VSH_OFLAG_REQ, gettext_noop("file containing an XML network description")},
Cannot we use something like _N() rather than gettext_noop() ?
That's one for Dan ... I suspect it's just because gettext_noop() worked with xgettext, whereas _N() didn't. We'd need to pass --keyword=_N to xgettext. (Also, it's always been N_() anywhere I've seen it)
names[ret++] = strdup(node->value); ^^^^^^^ where is free() for this string?
It seems like nice leak(s). Right?
Yep, well spotted. I've appended the patch I committed. Thanks, Mark. Index: ChangeLog =================================================================== RCS file: /data/cvs/libvirt/ChangeLog,v retrieving revision 1.319 diff -u -p -r1.319 ChangeLog --- ChangeLog 22 Jan 2007 20:43:02 -0000 1.319 +++ ChangeLog 23 Jan 2007 12:28:16 -0000 @@ -0,0 +1,7 @@ +Mon Jan 23 12:28:42 IST 2007 Mark McLoughlin <markmc@redhat.com> + + Issues pointed out by Karel Zak <kzak@redhat.com> + + * src/virsh.c: fix up some syntax strings, use BUFSIZ + and free names returned from virConnectListDefinedDomains() + Index: src/virsh.c =================================================================== RCS file: /data/cvs/libvirt/src/virsh.c,v retrieving revision 1.42 diff -u -p -r1.42 virsh.c --- src/virsh.c 22 Jan 2007 20:43:02 -0000 1.42 +++ src/virsh.c 23 Jan 2007 12:28:16 -0000 @@ -309,7 +309,7 @@ cmdConnect(vshControl * ctl, vshCmd * cm * "list" command */ static vshCmdInfo info_list[] = { - {"syntax", "list"}, + {"syntax", "list [--inactive | --all]"}, {"help", gettext_noop("list domains")}, {"desc", gettext_noop("Returns list of domains.")}, {NULL, NULL} @@ -419,8 +419,10 @@ cmdList(vshControl * ctl, vshCmd * cmd A virDomainPtr dom = virDomainLookupByName(ctl->conn, names[i]); /* this kind of work with domains is not atomic operation */ - if (!dom) + if (!dom) { + free(names[i]); continue; + } ret = virDomainGetInfo(dom, &info); id = virDomainGetID(dom); @@ -439,6 +441,7 @@ cmdList(vshControl * ctl, vshCmd * cmd A } virDomainFree(dom); + free(names[i]); } if (ids) free(ids); @@ -546,7 +549,7 @@ cmdCreate(vshControl * ctl, vshCmd * cmd char *from; int found; int ret = TRUE; - char buffer[4096]; + char buffer[BUFSIZ]; int fd, l; if (!vshConnectionUsability(ctl, ctl->conn, TRUE)) @@ -601,7 +604,7 @@ cmdDefine(vshControl * ctl, vshCmd * cmd char *from; int found; int ret = TRUE; - char buffer[4096]; + char buffer[BUFSIZ]; int fd, l; if (!vshConnectionUsability(ctl, ctl->conn, TRUE)) @@ 
-677,7 +680,7 @@ cmdUndefine(vshControl * ctl, vshCmd * c * "start" command */ static vshCmdInfo info_start[] = { - {"syntax", "start a domain "}, + {"syntax", "start <domain>"}, {"help", gettext_noop("start a (previously defined) inactive domain")}, {"desc", gettext_noop("Start a domain.")}, {NULL, NULL}

On Tue, Jan 23, 2007 at 12:35:09PM +0000, Mark McLoughlin wrote:
Note, a lot of the code in the networking patches is just copied and pasted from elsewhere in libvirt, so I'll fix up the original code first.
We do have test cases for some of the virsh commands, so I'm going to check to see why these leaks were not picked up by valgrind - we have a special 'make valgrind' target in the tests directory which runs all the test suites under the valgrind leak checker.
On Tue, 2007-01-23 at 11:20 +0100, Karel Zak wrote:
+ {"file", VSH_OT_DATA, VSH_OFLAG_REQ, gettext_noop("file containing an XML network description")},
Cannot we use something like _N() rather than gettext_noop() ?
That's one for Dan ... I suspect it's just because gettext_noop() worked with xgettext, whereas _N() didn't. We'd need to pass --keyword=_N to xgettext.
Yes, that's exactly the reason - gettext_noop() is automatically understood by xgettext, and since there aren't really many places we use it I didn't feel the need to #define a shorter variant. Regards, Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

On Tue, 2007-01-23 at 11:20 +0100, Karel Zak wrote:
+ char uuid[37];
Magic number? :-)
#define UUID_STRLEN 36
char uuid[UUID_STRLEN+1];
Good point. Here's a proposed API addition to put the buffer lengths as macros in libvirt.h. Anyone got objections to that? Cheers, Mark. Index: libvirt/include/libvirt/libvirt.h.in =================================================================== --- libvirt.orig/include/libvirt/libvirt.h.in +++ libvirt/include/libvirt/libvirt.h.in @@ -187,6 +187,24 @@ struct _virNodeInfo { typedef virNodeInfo *virNodeInfoPtr; +/** + * VIR_UUID_STRING_BUFLEN: + * + * This macro provides the length of the buffer required + * for virDomainGetUUID() + */ + +#define VIR_UUID_BUFLEN (16) + +/** + * VIR_UUID_STRING_BUFLEN: + * + * This macro provides the length of the buffer required + * for virDomainGetUUIDString() + */ + +#define VIR_UUID_STRING_BUFLEN (36+1) + /* library versionning */ /** Index: libvirt/proxy/libvirt_proxy.c =================================================================== --- libvirt.orig/proxy/libvirt_proxy.c +++ libvirt/proxy/libvirt_proxy.c @@ -462,7 +462,7 @@ retry2: break; case VIR_PROXY_LOOKUP_ID: { char *name = NULL; - unsigned char uuid[16]; + unsigned char uuid[VIR_UUID_BUFLEN]; int len; if (req->len != sizeof(virProxyPacket)) @@ -476,9 +476,9 @@ retry2: len = 1000; name[1000] = 0; } - req->len += 16 + len + 1; - memcpy(&request.extra.str[0], uuid, 16); - strcpy(&request.extra.str[16], name); + req->len += VIR_UUID_BUFLEN + len + 1; + memcpy(&request.extra.str[0], uuid, VIR_UUID_BUFLEN); + strcpy(&request.extra.str[VIR_UUID_BUFLEN], name); } if (name) free(name); @@ -489,9 +489,9 @@ retry2: char **tmp; int ident, len; char *name = NULL; - unsigned char uuid[16]; + unsigned char uuid[VIR_UUID_BUFLEN]; - if (req->len != sizeof(virProxyPacket) + 16) + if (req->len != sizeof(virProxyPacket) + VIR_UUID_BUFLEN) goto comm_error; /* @@ -504,7 +504,7 @@ retry2: if (names != NULL) { while (*tmp != NULL) { ident = xenDaemonDomainLookupByName_ids(conn, *tmp, &uuid[0]); - if (!memcmp(uuid, &request.extra.str[0], 16)) { + if (!memcmp(uuid, &request.extra.str[0], 
   VIR_UUID_BUFLEN)) {
                name = *tmp;
                break;
            }
@@ -530,7 +530,7 @@ retry2:
         }
         case VIR_PROXY_LOOKUP_NAME: {
             int ident;
-            unsigned char uuid[16];
+            unsigned char uuid[VIR_UUID_BUFLEN];

             if (req->len > sizeof(virProxyPacket) + 1000)
                 goto comm_error;
@@ -542,8 +542,8 @@ retry2:
                 req->data.arg = -1;
                 req->len = sizeof(virProxyPacket);
             } else {
-                req->len = sizeof(virProxyPacket) + 16;
-                memcpy(&request.extra.str[0], uuid, 16);
+                req->len = sizeof(virProxyPacket) + VIR_UUID_BUFLEN;
+                memcpy(&request.extra.str[0], uuid, VIR_UUID_BUFLEN);
                 req->data.arg = ident;
             }
             break;
Index: libvirt/src/hash.c
===================================================================
--- libvirt.orig/src/hash.c
+++ libvirt/src/hash.c
@@ -759,7 +759,7 @@ virGetDomain(virConnectPtr conn, const c
     ret->conn = conn;
     ret->id = -1;
     if (uuid != NULL)
-        memcpy(&(ret->uuid[0]), uuid, 16);
+        memcpy(&(ret->uuid[0]), uuid, VIR_UUID_BUFLEN);

     if (virHashAddEntry(conn->domains, name, ret) < 0) {
         virHashError(conn, VIR_ERR_INTERNAL_ERROR,
Index: libvirt/src/internal.h
===================================================================
--- libvirt.orig/src/internal.h
+++ libvirt/src/internal.h
@@ -145,15 +145,15 @@ enum {
  * Internal structure associated to a domain
  */
 struct _virDomain {
-    unsigned int magic;     /* specific value to check */
-    int uses;               /* reference count */
-    virConnectPtr conn;     /* pointer back to the connection */
-    char *name;             /* the domain external name */
-    char *path;             /* the domain internal path */
-    int id;                 /* the domain ID */
-    int flags;              /* extra flags */
-    unsigned char uuid[16]; /* the domain unique identifier */
-    char *xml;              /* the XML description for defined domains */
+    unsigned int magic;     /* specific value to check */
+    int uses;               /* reference count */
+    virConnectPtr conn;     /* pointer back to the connection */
+    char *name;             /* the domain external name */
+    char *path;             /* the domain internal path */
+    int id;                 /* the domain ID */
+    int flags;              /* extra flags */
+    unsigned char uuid[VIR_UUID_BUFLEN]; /* the domain unique identifier */
+    char *xml;              /* the XML description for defined domains */
 };

 /*
Index: libvirt/src/libvirt.c
===================================================================
--- libvirt.orig/src/libvirt.c
+++ libvirt/src/libvirt.c
@@ -645,8 +645,8 @@ virDomainLookupByUUID(virConnectPtr conn
 virDomainPtr
 virDomainLookupByUUIDString(virConnectPtr conn, const char *uuidstr)
 {
-    int raw[16], i;
-    unsigned char uuid[16];
+    int raw[VIR_UUID_BUFLEN], i;
+    unsigned char uuid[VIR_UUID_BUFLEN];
     int ret;

     if (!VIR_IS_CONNECT(conn)) {
@@ -672,11 +672,11 @@ virDomainLookupByUUIDString(virConnectPt
                  raw + 8, raw + 9, raw + 10, raw + 11,
                  raw + 12, raw + 13, raw + 14, raw + 15);

-    if (ret!=16) {
+    if (ret!=VIR_UUID_BUFLEN) {
         virLibConnError(conn, VIR_ERR_INVALID_ARG, __FUNCTION__);
         return (NULL);
     }
-    for (i = 0; i < 16; i++)
+    for (i = 0; i < VIR_UUID_BUFLEN; i++)
         uuid[i] = raw[i] & 0xFF;

     return virDomainLookupByUUID(conn, &uuid[0]);
@@ -1205,7 +1205,7 @@ virDomainGetName(virDomainPtr domain)
 /**
  * virDomainGetUUID:
  * @domain: a domain object
- * @uuid: pointer to a 16 bytes array
+ * @uuid: pointer to a VIR_UUID_BUFLEN bytes array
  *
  * Get the UUID for a domain
  *
@@ -1224,7 +1224,7 @@ virDomainGetUUID(virDomainPtr domain, un
     }

     if (domain->id == 0) {
-        memset(uuid, 0, 16);
+        memset(uuid, 0, VIR_UUID_BUFLEN);
     } else {
         if ((domain->uuid[0] == 0) && (domain->uuid[1] == 0) &&
             (domain->uuid[2] == 0) && (domain->uuid[3] == 0) &&
@@ -1236,7 +1236,7 @@ virDomainGetUUID(virDomainPtr domain, un
             (domain->uuid[14] == 0) && (domain->uuid[15] == 0))
             xenDaemonDomainLookupByName_ids(domain->conn, domain->name,
                                             &domain->uuid[0]);
-        memcpy(uuid, &domain->uuid[0], 16);
+        memcpy(uuid, &domain->uuid[0], VIR_UUID_BUFLEN);
     }
     return (0);
 }
@@ -1244,7 +1244,7 @@ virDomainGetUUID(virDomainPtr domain, un
 /**
  * virDomainGetUUIDString:
  * @domain: a domain object
- * @buf: pointer to a 37 bytes array
+ * @buf: pointer to a VIR_UUID_STRING_BUFLEN bytes array
  *
  * Get the UUID for a domain as string. For more information about
  * UUID see RFC4122.
@@ -1254,7 +1254,7 @@ virDomainGetUUID(virDomainPtr domain, un
 int
 virDomainGetUUIDString(virDomainPtr domain, char *buf)
 {
-    unsigned char uuid[16];
+    unsigned char uuid[VIR_UUID_BUFLEN];

     if (!VIR_IS_DOMAIN(domain)) {
         virLibDomainError(domain, VIR_ERR_INVALID_DOMAIN, __FUNCTION__);
@@ -1268,7 +1268,7 @@ virDomainGetUUIDString(virDomainPtr doma
     if (virDomainGetUUID(domain, &uuid[0]))
         return (-1);

-    snprintf(buf, 37,
+    snprintf(buf, VIR_UUID_STRING_BUFLEN,
              "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-%02x%02x%02x%02x%02x%02x",
              uuid[0], uuid[1], uuid[2], uuid[3],
              uuid[4], uuid[5], uuid[6], uuid[7],
Index: libvirt/src/proxy_internal.c
===================================================================
--- libvirt.orig/src/proxy_internal.c
+++ libvirt/src/proxy_internal.c
@@ -761,7 +761,7 @@ xenProxyLookupByID(virConnectPtr conn, i
 {
     virProxyPacket req;
     virProxyFullPacket ans;
-    unsigned char uuid[16];
+    unsigned char uuid[VIR_UUID_BUFLEN];
     const char *name;
     int ret;
     virDomainPtr res;
@@ -786,8 +786,8 @@ xenProxyLookupByID(virConnectPtr conn, i
     if (ans.data.arg == -1) {
         return(NULL);
     }
-    memcpy(uuid, &ans.extra.str[0], 16);
-    name = &ans.extra.str[16];
+    memcpy(uuid, &ans.extra.str[0], VIR_UUID_BUFLEN);
+    name = &ans.extra.str[VIR_UUID_BUFLEN];

     res = virGetDomain(conn, name, uuid);
     if (res == NULL)
@@ -825,7 +825,7 @@ xenProxyLookupByUUID(virConnectPtr conn,
     }
     memset(&req, 0, sizeof(virProxyPacket));
     req.command = VIR_PROXY_LOOKUP_UUID;
-    req.len = sizeof(virProxyPacket) + 16;
+    req.len = sizeof(virProxyPacket) + VIR_UUID_BUFLEN;
     ret = xenProxyCommand(conn, (virProxyPacketPtr) &req, &req, 0);
     if (ret < 0) {
         xenProxyClose(conn);
Index: libvirt/src/test.c
===================================================================
--- libvirt.orig/src/test.c
+++ libvirt/src/test.c
@@ -139,7 +139,7 @@ typedef struct _testDom {
     int active;
     int id;
     char name[20];
-    unsigned char uuid[16];
+    unsigned char uuid[VIR_UUID_BUFLEN];
     virDomainKernel kernel;
     virDomainInfo info;
     unsigned int maxVCPUs;
@@ -247,7 +247,7 @@ static int testLoadDomain(virConnectPtr
     xmlXPathContextPtr ctxt = NULL;
     xmlXPathObjectPtr obj = NULL;
     char *name = NULL;
-    unsigned char rawuuid[16];
+    unsigned char rawuuid[VIR_UUID_BUFLEN];
     char *dst_uuid;
     testCon *con;
     struct timeval tv;
@@ -397,7 +397,7 @@ static int testLoadDomain(virConnectPtr
     if (memory > maxMem)
         memory = maxMem;

-    memmove(con->domains[handle].uuid, rawuuid, 16);
+    memmove(con->domains[handle].uuid, rawuuid, VIR_UUID_BUFLEN);
     con->domains[handle].info.maxMem = maxMem;
     con->domains[handle].info.memory = memory;
     con->domains[handle].info.state = domid < 0 ? VIR_DOMAIN_SHUTOFF : VIR_DOMAIN_RUNNING;
@@ -487,7 +487,7 @@ static int testOpenDefault(virConnectPtr
     node->connections[connid].domains[0].onCrash = VIR_DOMAIN_RESTART;
     node->connections[connid].domains[0].onPoweroff = VIR_DOMAIN_DESTROY;
     strcpy(node->connections[connid].domains[0].name, "test");
-    for (u = 0 ; u < 16 ; u++) {
+    for (u = 0 ; u < VIR_UUID_BUFLEN ; u++) {
         node->connections[connid].domains[0].uuid[u] = (u * 75)%255;
     }
     node->connections[connid].domains[0].info.maxMem = 8192 * 1024;
@@ -901,7 +901,7 @@ virDomainPtr testLookupDomainByUUID(virC
     int i, idx = -1;
     for (i = 0 ; i < MAX_DOMAINS ; i++) {
         if (con->domains[i].active &&
-            memcmp(uuid, con->domains[i].uuid, 16) == 0) {
+            memcmp(uuid, con->domains[i].uuid, VIR_UUID_BUFLEN) == 0) {
             idx = i;
             break;
         }
Index: libvirt/src/virsh.c
===================================================================
--- libvirt.orig/src/virsh.c
+++ libvirt/src/virsh.c
@@ -1030,7 +1030,7 @@ cmdDominfo(vshControl * ctl, vshCmd * cm
     virDomainPtr dom;
     int ret = TRUE;
     unsigned int id;
-    char *str, uuid[37];
+    char *str, uuid[VIR_UUID_STRING_BUFLEN];

     if (!vshConnectionUsability(ctl, ctl->conn, TRUE))
         return FALSE;
@@ -1535,7 +1535,7 @@ static int
 cmdDomuuid(vshControl * ctl, vshCmd * cmd)
 {
     virDomainPtr dom;
-    char uuid[37];
+    char uuid[VIR_UUID_STRING_BUFLEN];

     if (!vshConnectionUsability(ctl, ctl->conn, TRUE))
         return FALSE;
Index: libvirt/src/xend_internal.c
===================================================================
--- libvirt.orig/src/xend_internal.c
+++ libvirt/src/xend_internal.c
@@ -1100,7 +1100,7 @@ xenDaemonDomainLookupByName_ids(virConne
     int ret = -1;

     if (uuid != NULL)
-        memset(uuid, 0, 16);
+        memset(uuid, 0, VIR_UUID_BUFLEN);
     root = sexpr_get(xend, "/xend/domain/%s?detail=1", domname);
     if (root == NULL)
         goto error;
@@ -1152,7 +1152,7 @@ xenDaemonDomainLookupByID(virConnectPtr
     char *dst_uuid;
     struct sexpr *root;

-    memset(uuid, 0, 16);
+    memset(uuid, 0, VIR_UUID_BUFLEN);

     root = sexpr_get(xend, "/xend/domain/%d?detail=1", id);
     if (root == NULL)
@@ -1939,7 +1939,7 @@ sexpr_to_domain(virConnectPtr conn, stru
 {
     virDomainPtr ret = NULL;
     char *dst_uuid = NULL;
-    char uuid[16];
+    char uuid[VIR_UUID_BUFLEN];
     const char *name;
     const char *tmp;

@@ -2728,7 +2728,7 @@ error:
 static virDomainPtr
 xenDaemonLookupByID(virConnectPtr conn, int id) {
     char *name = NULL;
-    unsigned char uuid[16];
+    unsigned char uuid[VIR_UUID_BUFLEN];
     virDomainPtr ret;

     if (xenDaemonDomainLookupByID(conn, id, &name, uuid) < 0) {
@@ -2762,7 +2762,7 @@ xenDaemonLookupByID(virConnectPtr conn,
 int
 xenDaemonDomainSetVcpus(virDomainPtr domain, unsigned int vcpus)
 {
-    char buf[16];
+    char buf[VIR_UUID_BUFLEN];

     if ((domain == NULL) || (domain->conn == NULL) || (domain->name == NULL)
      || (vcpus < 1)) {
@@ -2793,7 +2793,7 @@ int
 xenDaemonDomainPinVcpu(virDomainPtr domain, unsigned int vcpu,
                        unsigned char *cpumap, int maplen)
 {
-    char buf[16], mapstr[sizeof(cpumap_t) * 64] = "[";
+    char buf[VIR_UUID_BUFLEN], mapstr[sizeof(cpumap_t) * 64] = "[";
     int i, j;

     if ((domain == NULL) || (domain->conn == NULL) || (domain->name == NULL)
@@ -2929,7 +2929,7 @@ xenDaemonLookupByUUID(virConnectPtr conn
     char *name = NULL;
     char **names;
     char **tmp;
-    unsigned char ident[16];
+    unsigned char ident[VIR_UUID_BUFLEN];
     int id = -1;

     names = xenDaemonListDomainsOld(conn);
@@ -2942,7 +2942,7 @@ xenDaemonLookupByUUID(virConnectPtr conn
     while (*tmp != NULL) {
         id = xenDaemonDomainLookupByName_ids(conn, *tmp, &ident[0]);
         if (id >= 0) {
-            if (!memcmp(uuid, ident, 16)) {
+            if (!memcmp(uuid, ident, VIR_UUID_BUFLEN)) {
                 name = strdup(*tmp);
                 break;
             }
Index: libvirt/src/xm_internal.c
===================================================================
--- libvirt.orig/src/xm_internal.c
+++ libvirt/src/xm_internal.c
@@ -209,7 +209,7 @@ static int xenXMConfigGetUUID(virConfPtr
    have one in its config */
 static void xenXMConfigGenerateUUID(unsigned char *uuid) {
     int i;
-    for (i = 0 ; i < 16 ; i++) {
+    for (i = 0 ; i < VIR_UUID_BUFLEN ; i++) {
         uuid[i] = (unsigned char)(1 + (int) (256.0 * (rand() / (RAND_MAX + 1.0))));
     }
 }
@@ -217,7 +217,7 @@ static void xenXMConfigGenerateUUID(unsi
 /* Ensure that a config object has a valid UUID in it,
    if it doesn't then (re-)generate one */
 static int xenXMConfigEnsureIdentity(virConfPtr conf, const char *filename) {
-    unsigned char uuid[16];
+    unsigned char uuid[VIR_UUID_BUFLEN];
     const char *name;

     /* Had better have a name...*/
@@ -242,7 +242,7 @@ static int xenXMConfigEnsureIdentity(vir
     /* If there is no uuid...*/
     if (xenXMConfigGetUUID(conf, "uuid", uuid) < 0) {
         virConfValuePtr value;
-        char uuidstr[37];
+        char uuidstr[VIR_UUID_STRING_BUFLEN];

         value = malloc(sizeof(virConfValue));
         if (!value) {
@@ -251,7 +251,7 @@ static int xenXMConfigEnsureIdentity(vir
         /* ... then generate one */
         xenXMConfigGenerateUUID(uuid);
-        snprintf(uuidstr, 37,
+        snprintf(uuidstr, VIR_UUID_STRING_BUFLEN,
                  "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-%02x%02x%02x%02x%02x%02x</uuid>\n",
                  uuid[0], uuid[1], uuid[2], uuid[3],
                  uuid[4], uuid[5], uuid[6], uuid[7],
@@ -565,7 +565,7 @@ char *xenXMDomainFormatXML(virConnectPtr
     virBufferPtr buf;
     char *xml;
     const char *name;
-    unsigned char uuid[16];
+    unsigned char uuid[VIR_UUID_BUFLEN];
     const char *str;
     int hvm = 0;
     long val;
@@ -1168,7 +1168,7 @@ virDomainPtr xenXMDomainLookupByName(vir
     const char *filename;
     xenXMConfCachePtr entry;
     virDomainPtr ret;
-    unsigned char uuid[16];
+    unsigned char uuid[VIR_UUID_BUFLEN];
     if (!VIR_IS_CONNECT(conn)) {
         xenXMError(conn, VIR_ERR_INVALID_CONN, __FUNCTION__);
         return (NULL);
@@ -1209,7 +1209,7 @@ virDomainPtr xenXMDomainLookupByName(vir
  * Hash table iterator to search for a domain based on UUID
  */
 static int xenXMDomainSearchForUUID(const void *payload, const char *name ATTRIBUTE_UNUSED, const void *data) {
-    unsigned char uuid[16];
+    unsigned char uuid[VIR_UUID_BUFLEN];
     const unsigned char *wantuuid = (const unsigned char *)data;
     const xenXMConfCachePtr entry = (const xenXMConfCachePtr)payload;

@@ -1217,7 +1217,7 @@ static int xenXMDomainSearchForUUID(cons
         return (0);
     }

-    if (!memcmp(uuid, wantuuid, 16))
+    if (!memcmp(uuid, wantuuid, VIR_UUID_BUFLEN))
         return (1);

     return (0);
@@ -1271,7 +1271,7 @@ int xenXMDomainCreate(virDomainPtr domai
     char *xml;
     char *sexpr;
     int ret;
-    unsigned char uuid[16];
+    unsigned char uuid[VIR_UUID_BUFLEN];

     if ((domain == NULL) || (domain->conn == NULL) || (domain->name == NULL)) {
         xenXMError((domain ? domain->conn : NULL), VIR_ERR_INVALID_ARG,
@@ -2046,7 +2046,7 @@ virConfPtr xenXMParseXMLToConfig(virConn
 virDomainPtr xenXMDomainDefineXML(virConnectPtr conn, const char *xml) {
     virDomainPtr ret;
     char filename[PATH_MAX];
-    unsigned char uuid[16];
+    unsigned char uuid[VIR_UUID_BUFLEN];
     virConfPtr conf = NULL;
     xenXMConfCachePtr entry = NULL;
     virConfValuePtr value;
Index: libvirt/src/xml.c
===================================================================
--- libvirt.orig/src/xml.c
+++ libvirt/src/xml.c
@@ -525,7 +525,7 @@ char *
 virDomainGetXMLDesc(virDomainPtr domain, int flags)
 {
     char *ret = NULL;
-    unsigned char uuid[16];
+    unsigned char uuid[VIR_UUID_BUFLEN];
     virBuffer buf;
     virDomainInfo info;

@@ -1533,7 +1533,7 @@ virDomainParseXMLDesc(const char *xmldes
 unsigned char *virParseUUID(char **ptr, const char *uuid)
 {
-    int rawuuid[16];
+    int rawuuid[VIR_UUID_BUFLEN];
     const char *cur;
     unsigned char *dst_uuid = NULL;
     int i;
@@ -1546,7 +1546,7 @@ unsigned char *virParseUUID(char **ptr,
      * pairs as long as there is 32 of them in the end.
      */
     cur = uuid;
-    for (i = 0;i < 16;) {
+    for (i = 0;i < VIR_UUID_BUFLEN;) {
         rawuuid[i] = 0;
         if (*cur == 0)
             goto error;
@@ -1581,7 +1581,7 @@ unsigned char *virParseUUID(char **ptr,
     dst_uuid = (unsigned char *) *ptr;
     *ptr += 16;

-    for (i = 0; i < 16; i++)
+    for (i = 0; i < VIR_UUID_BUFLEN; i++)
         dst_uuid[i] = rawuuid[i] & 0xFF;

 error:

On Tue, Jan 23, 2007 at 12:37:50PM +0000, Mark McLoughlin wrote:
On Tue, 2007-01-23 at 11:20 +0100, Karel Zak wrote:
+ char uuid[37];
Magic number? :-)
#define UUID_STRLEN 36
char uuid[UUID_STRLEN+1];
Good point. Here's a proposed API addition to put the buffer lengths as macros in libvirt.h.
Anyone got objections to that?
go for it, even if UUID size really should not change :-)

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard@redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

On Tue, Jan 23, 2007 at 12:37:50PM +0000, Mark McLoughlin wrote:
On Tue, 2007-01-23 at 11:20 +0100, Karel Zak wrote:
+ char uuid[37];
Magic number? :-)
#define UUID_STRLEN 36
char uuid[UUID_STRLEN+1];
Good point. Here's a proposed API addition to put the buffer lengths as macros in libvirt.h.
Anyone got objections to that?
No objections, but I think if we're going to do this, we should take it one step further and provide APIs for converting between raw & printable versions of UUIDs in both directions. Currently we're just duping this conversion code all over the place - with inconsistent use of '-' in the printable versions.

/*
 * uuidstr: the printable UUID string
 * uuid: pre-allocated buffer of length VIR_UUID_BUFLEN
 */
int virUUIDParseString(const char *uuidstr, unsigned char *uuid)

/*
 * uuid: the raw UUID value, exactly VIR_UUID_BUFLEN bytes long
 * uuidstr: pre-allocated buffer of length VIR_UUID_STRING_BUFLEN
 *          to be filled in with the printable UUID
 */
int virUUIDFormatString(const unsigned char *uuid, char *uuidstr)

Oh, and a thing to generate a random UUID is also needed by the xm_internal, qemu & test backends:

int virUUIDGenerate(unsigned char *uuid);

Probably we only need any of this stuff in the internal headers though, rather than public facing.
--- libvirt.orig/include/libvirt/libvirt.h.in
+++ libvirt/include/libvirt/libvirt.h.in
@@ -187,6 +187,24 @@ struct _virNodeInfo {
 typedef virNodeInfo *virNodeInfoPtr;

+/**
+ * VIR_UUID_BUFLEN:
+ *
+ * This macro provides the length of the buffer required
+ * for virDomainGetUUID()
+ */
+
+#define VIR_UUID_BUFLEN (16)
+
+/**
+ * VIR_UUID_STRING_BUFLEN:
+ *
+ * This macro provides the length of the buffer required
+ * for virDomainGetUUIDString()
+ */
+
+#define VIR_UUID_STRING_BUFLEN (36+1)
+
 /* library versionning */
Dan.

-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|
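[Editor's note: a minimal sketch of the two conversion helpers Dan proposes above. This is illustrative only, not libvirt code - the function names follow his proposed signatures, while the hexval() helper and the inlined buffer-length macros are invented here for self-containment.]

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define VIR_UUID_BUFLEN (16)
#define VIR_UUID_STRING_BUFLEN (36+1)

/* map a hex digit to its value, or -1 if it is not a hex digit */
static int hexval(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parse a printable UUID, tolerating '-' separators anywhere,
 * into a VIR_UUID_BUFLEN byte buffer. Returns 0 on success, -1 on error. */
static int virUUIDParseString(const char *uuidstr, unsigned char *uuid) {
    const char *cur = uuidstr;
    int i;
    for (i = 0; i < VIR_UUID_BUFLEN; i++) {
        int hi, lo;
        while (*cur == '-')
            cur++;
        if ((hi = hexval(cur[0])) < 0 || (lo = hexval(cur[1])) < 0)
            return -1;
        uuid[i] = (unsigned char)((hi << 4) | lo);
        cur += 2;
    }
    return 0;
}

/* Format VIR_UUID_BUFLEN raw bytes as the canonical printable form,
 * with consistent '-' placement. Returns 0 on success. */
static int virUUIDFormatString(const unsigned char *uuid, char *uuidstr) {
    snprintf(uuidstr, VIR_UUID_STRING_BUFLEN,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-%02x%02x%02x%02x%02x%02x",
             uuid[0], uuid[1], uuid[2], uuid[3], uuid[4], uuid[5],
             uuid[6], uuid[7], uuid[8], uuid[9], uuid[10], uuid[11],
             uuid[12], uuid[13], uuid[14], uuid[15]);
    return 0;
}
```

Round-tripping parse and format this way also fixes the inconsistent '-' handling Dan mentions, since only the formatter ever emits dashes.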

On Tue, Jan 23, 2007 at 12:59:49PM +0000, Daniel P. Berrange wrote:
On Tue, Jan 23, 2007 at 12:37:50PM +0000, Mark McLoughlin wrote:
On Tue, 2007-01-23 at 11:20 +0100, Karel Zak wrote:
+ char uuid[37];
Magic number? :-)
#define UUID_STRLEN 36
char uuid[UUID_STRLEN+1];
Good point. Here's a proposed API addition to put the buffer lengths as macros in libvirt.h.
Anyone got objections to that?
No objections, but I think if we're going to do this, we should take it one step further and provide APIs for converting between RAW & Printable versions of UUID in both directions. Currently we're just duping this conversion code all over the place - with inconsistent use of '-' in the printable versions.
/*
 * uuidstr: the printable UUID string
 * uuid: pre-allocated buffer of length VIR_UUID_BUFLEN
 */
int virUUIDParseString(const char *uuidstr, unsigned char *uuid)

/*
 * uuid: the raw UUID value, exactly VIR_UUID_BUFLEN bytes long
 * uuidstr: pre-allocated buffer of length VIR_UUID_STRING_BUFLEN
 *          to be filled in with the printable UUID
 */
int virUUIDFormatString(const unsigned char *uuid, char *uuidstr)
Oh and a thing to generate a random UUID too is needed by both the xm_internal and qemu & test backends
int virUUIDGenerate(unsigned char *uuid);
Probably we only need any of this stuff in the internal headers though, rather than public facing
Mark's patch extends the public API, that's fine, but I think the conversion routines should be kept internal and private. IMHO that's part of the many things we identified to go in /lib/ shared code.

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
veillard@redhat.com  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
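[Editor's note: the random-UUID generator mentioned in this exchange might look like the sketch below. Hypothetical code, not the eventual libvirt implementation - it uses rand() just as the existing xenXMConfigGenerateUUID() does, with the RFC 4122 version-4 bit twiddling added.]

```c
#include <assert.h>
#include <stdlib.h>

#define VIR_UUID_BUFLEN (16)

/* Fill uuid with VIR_UUID_BUFLEN random bytes and stamp in the
 * RFC 4122 version 4 (random) and variant bits.
 * Returns 0 on success, -1 on error. */
static int virUUIDGenerate(unsigned char *uuid) {
    int i;
    if (uuid == NULL)
        return -1;
    for (i = 0; i < VIR_UUID_BUFLEN; i++)
        uuid[i] = (unsigned char)(rand() & 0xFF);
    uuid[6] = (uuid[6] & 0x0F) | 0x40;  /* version 4 */
    uuid[8] = (uuid[8] & 0x3F) | 0x80;  /* RFC 4122 variant */
    return 0;
}
```

A real implementation would want a better entropy source than rand(), but the shape of the helper is the point here: one place that all the backends (xm_internal, qemu, test) call instead of each rolling their own.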

On Tue, 2007-01-23 at 12:59 +0000, Daniel P. Berrange wrote:
Probably we only need any of this stuff in the internal headers though, rather than public facing
The buffer lengths should be public, though. Callers of virDomainGetUUID() etc. need to pass in buffers of the correct length.

Cheers,
Mark.
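[Editor's note: a caller-side sketch of Mark's point - with the macros public, application code can size its buffers from libvirt.h instead of hard-coding 16 and 37. The getUUIDString() stub below is invented here and merely stands in for the real virDomainGetUUIDString().]

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* macro values as in the proposed libvirt.h addition */
#define VIR_UUID_BUFLEN (16)
#define VIR_UUID_STRING_BUFLEN (36+1)

/* Stub standing in for virDomainGetUUIDString(): formats 16 raw bytes
 * as the 36-character canonical form plus the trailing NUL. */
static int getUUIDString(const unsigned char *uuid, char *buf) {
    snprintf(buf, VIR_UUID_STRING_BUFLEN,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-%02x%02x%02x%02x%02x%02x",
             uuid[0], uuid[1], uuid[2], uuid[3], uuid[4], uuid[5],
             uuid[6], uuid[7], uuid[8], uuid[9], uuid[10], uuid[11],
             uuid[12], uuid[13], uuid[14], uuid[15]);
    return 0;
}
```

The caller then writes `char buf[VIR_UUID_STRING_BUFLEN];` and is guaranteed the buffer matches what the library will snprintf into it - which is exactly why the lengths cannot live only in internal headers.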

On Mon, 2007-01-15 at 20:06 +0000, Mark McLoughlin wrote:
* Since virConnect is supposed to be a connection to a specific hypervisor, does it make sense to create networks (which should be hypervisor agnostic) through virConnect?
Okay, here's a suggestion - a virConnectPtr is a connection to a specific hypervisor *and* the virtual network supervisor on that same physical machine or user session.

What I like about this is that if a domain's description mentions a virtual network, that name is scoped within the network supervisor associated with the hypervisor on which the guest is being created.

Attached is a kind of a hacky patch to do something like that - if e.g. you connect to xend, you also get a "network-only" connection to qemud for managing networks.

Cheers,
Mark.

On Thu, Jan 25, 2007 at 03:49:41PM +0000, Mark McLoughlin wrote:
On Mon, 2007-01-15 at 20:06 +0000, Mark McLoughlin wrote:
* Since virConnect is supposed to be a connection to a specific hypervisor, does it make sense to create networks (which should be hypervisor agnostic) through virConnect?
Okay, here's a suggestion - a virConnectPtr is a connection to a specific hypervisor *and* the virtual network supervisor on that same physical machine or user session.
What I like about this is that if a domain's description mentions a virtual network, that name is scoped within the network supervisor associated with the hypervisor on which the guest is being created.
Attached is a kind of a hacky patch to do something like that - if e.g. you connect to xend, you also get a "network-only" connection to qemud for managing networks.
That's an interesting idea - your description..

"a virConnectPtr is a connection to a specific hypervisor *and* the virtual network supervisor"

..makes me think of a slightly alternative impl. We currently have a single internal driver API 'virDriverPtr' which is just a list of function pointers for all the HV related calls. Rather than making that struct bigger to add in networking calls, how about we define two separate internal driver APIs

virHypervisorDriver
virNetworkDriver

It strikes me that most of the different hypervisor backends will simply want to re-use the same network driver backend, so why not properly de-couple them. The virConnectOpen function would thus first look up a hypervisor driver, and then look up a networking driver. This avoids the somewhat nasty issue of having to figure out how to activate multiple non-conflicting drivers to get the correct combo of HV & network stuff.

The only small complication would be ensuring that the HV driver and network driver didn't need to open 2 separate TCP connections to the same place, but I'm sure we'd be able to figure that out.

Regards,
Dan.

-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|

On Thu, 2007-01-25 at 16:02 +0000, Daniel P. Berrange wrote:
That's an interesting idea - your description..
"a virConnectPtr is a connection to a specific hypervisor *and* the virtual network supervisor"
..makes me think of a slightly alternative impl. We currently have a single internal driver API 'virDriverPtr' which is just a list of function pointers for all the HV related calls. Rather than making that struct bigger to add in networking calls, how about we define two separate internal driver APIs
virHypervisorDriver virNetworkDriver
It strikes me that most of the different hypervisor backends will simply want to re-use the same network driver backend, so why not properly de-couple them. The virConnectOpen function would thus first look up a hypervisor driver, and then look up a networking driver. This avoids the somewhat nasty issue of having to figure out how to activate multiple non-conflicting drivers to get the correct combo of HV & network stuff.
Yep, that'd be the best way to do it.
The only small complication would be ensuring that the HV driver and network driver didn't need to open 2 separate TCP connections to the same place, but I'm sure we'd be able to figure that out.
Yep.

(Also, if we go with this approach, we should probably also do s/qemud/libvirtd/ or something ... but we need to figure out what to do wrt. the "other libvirtd" first :-)

Cheers,
Mark.

On Thu, Jan 25, 2007 at 04:07:57PM +0000, Mark McLoughlin wrote:
On Thu, 2007-01-25 at 16:02 +0000, Daniel P. Berrange wrote:
That's an interesting idea - your description..
"a virConnectPtr is a connection to a specific hypervisor *and* the virtual network supervisor"
..makes me think of a slightly alternative impl. We currently have a single internal driver API 'virDriverPtr' which is just a list of function pointers for all the HV related calls. Rather than making that struct bigger to add in networking calls, how about we define two separate internal driver APIs
virHypervisorDriver virNetworkDriver
It strikes me that most of the different hypervisor backends will simply want to re-use the same network driver backend, so why not properly de-couple them. The virConnectOpen function would thus first look up a hypervisor driver, and then look up a networking driver. This avoids the somewhat nasty issue of having to figure out how to activate multiple non-conflicting drivers to get the correct combo of HV & network stuff.
Yep, that'd be the best way to do it.
The only small complication would be ensuring that the HV driver and network driver didn't need to open 2 separate TCP connections to the same place, but I'm sure we'd be able to figure that out.
Yep.
(Also, if we go with this approach, we should probably also do s/qemud/libvirtd/ or something ... but we need to figure out what to do wrt. the "other libvirtd" first :-)
Indeed - since Rich Jones is looking at a more general-purpose libvirtd, we can adapt the QEMU impl to play nicely with the generic daemon. I figure we separate out the QEMU bits from qemud and turn them into a regular libvirt driver, then set it up so that this driver is only ever used when invoked by libvirtd, and never directly by the client lib - that way we still ensure a single daemon managing QEMU per node, without coupling the QEMU impl to the daemon itself.

Dan.

-- 
|=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=|
|=- Perl modules: http://search.cpan.org/~danberr/ -=|
|=- Projects: http://freshmeat.net/~danielpb/ -=|
|=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|
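[Editor's note: the hypervisor/network driver split discussed in this thread could be sketched roughly as below. Purely illustrative - the struct fields, dummy backends, driver tables and the connectOpen() lookup are all invented for the example, not actual libvirt internals.]

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Two separate driver tables instead of one fat virDriverPtr */
typedef struct _virHypervisorDriver {
    const char *name;              /* e.g. "xen", "qemu" */
    int (*open)(const char *uri);  /* returns 0 if this driver handles the URI */
} virHypervisorDriver;

typedef struct _virNetworkDriver {
    const char *name;
    int (*open)(const char *uri);
} virNetworkDriver;

/* Dummy backends: each HV driver only claims its own URI scheme,
 * while the single network backend accepts every connection. */
static int xenOpen(const char *uri)  { return strncmp(uri, "xen", 3) == 0 ? 0 : -1; }
static int qemuOpen(const char *uri) { return strncmp(uri, "qemu", 4) == 0 ? 0 : -1; }
static int qemudNetOpen(const char *uri) { (void)uri; return 0; }

static const virHypervisorDriver hvDrivers[] = {
    { "xen",  xenOpen },
    { "qemu", qemuOpen },
};
static const virNetworkDriver netDrivers[] = {
    { "qemud", qemudNetOpen },
};

/* virConnectOpen-style lookup: pick an HV driver and a network driver
 * independently, so every HV backend reuses the same network backend. */
static int connectOpen(const char *uri,
                       const virHypervisorDriver **hv,
                       const virNetworkDriver **net) {
    size_t i;
    *hv = NULL;
    *net = NULL;
    for (i = 0; i < sizeof(hvDrivers) / sizeof(hvDrivers[0]); i++)
        if (hvDrivers[i].open(uri) == 0) { *hv = &hvDrivers[i]; break; }
    for (i = 0; i < sizeof(netDrivers) / sizeof(netDrivers[0]); i++)
        if (netDrivers[i].open(uri) == 0) { *net = &netDrivers[i]; break; }
    return (*hv != NULL && *net != NULL) ? 0 : -1;
}
```

The design point is that the two lookups are independent: connecting to xend or to qemu yields a different hypervisor driver but the same shared network driver, with no combinatorial driver-activation logic.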
participants (7)
- Aron Griffis
- Daniel P. Berrange
- Daniel Veillard
- Hugh Brock
- Karel Zak
- Mark McLoughlin
- Richard W.M. Jones