Abstraction of guest <--> host network connection in libvirt
============================================================
The <interface> element of a guest's domain config in libvirt has a
<source> element that describes what resources on a host will be used to
connect the guest's network interface to the rest of the world. This is
very flexible, allowing several different types of connection (virtual
network, host bridge, direct macvtap connection to physical interface,
qemu usermode, user-defined via an external script), but currently has
the problem that unnecessary details of the host config are embedded
into the guest's config; if the guest is migrated to a different host,
and that host has a different hardware or network config (or possibly
the same hardware, but that hardware is currently in use by a different
guest), the migration will fail.
I am proposing a change to libvirt's network XML that will allow us to
(optionally - old configs will remain valid) remove the host details
from the guest's domain XML (which can move around from host to host)
and place them in the network XML (which remains with the host); the
domain XML will then use existing config elements to associate each
guest interface with a "network".
The motivating use case for this change is the "direct" connection type
(which uses macvtap for vepa and vnlink connections directly between a
guest and a physical interface, rather than through a bridge), but it is
applicable for all types of connection. (Another hopeful side effect of
this change will be to make libvirt's network connection model easier to
realize on non-Linux hypervisors (eg, VMware ESX), so Mathias - please
chime in!)
Background
--------------------
libvirt currently has 3 major types of guest interface connection (there
are also "type='user'" and "type='ethernet'", but they probably wouldn't
be used in a multi-host environment, so I'm not considering them here):
1) type='network'
The guest's network interface is connected to a libvirt-created "virtual
network", which is in reality (in the case of KVM or Xen) a Linux bridge
device that isn't connected to any physical host interface - any
connection to the outside goes through the host's IP routing stack.
The network to use is indicated in the <source> element of the guest's
interface xml: <source network='mynetwork'/>. Because the name
'mynetwork' is controlled by libvirt, it's perfectly reasonable to
assume that the same network name could be available on another host
that is accepting a migrated guest.
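For reference, a complete interface element of this type can be as
simple as the following (the <model> sub-element is optional, shown here
only for illustration):

<interface type='network'>
  <source network='mynetwork'/>
  <model type='virtio'/>
</interface>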
2) type='bridge'
The guest's network interface is connected to a bridge device (eg "br0")
that has already been configured in the host's network config files (eg,
in /etc/sysconfig/network-scripts). This bridge is itself connected to
the outside via a physical host interface, eg "eth0", *NOT* through the
host's IP routing stack.
The bridge to use is indicated in the source with <source bridge='br0'/>.
Although the naming of the bridge is outside the scope of libvirt, it is
at least possible to set up all hosts to have the same bridge name (so
that a guest could be migrated from one host to another).
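The corresponding interface element, assuming a host bridge named "br0"
as above, would be:

<interface type='bridge'>
  <source bridge='br0'/>
  <model type='virtio'/>
</interface>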
3) type='direct'
The guest's network interface is connected directly to a physical
interface (eg "eth0") with macvtap, or sometimes to a virtual function
("VF") of a physical interface (which is also really just another
interface, from the software point of view).
The interface to use is indicated with <source dev='eth0'
mode='something'/>. In this case, the interface name is determined by
the host OS and cannot be arbitrarily changed. Also, a host will have
multiple interfaces / VFs available to guests, and in some modes may
allow only a single guest to connect to a given interface (implying that
the interface used by a guest when on one host will probably not be
available when migrating to another). So in order to have flexible
migration from one host to another, an abstraction to allow the guest
XML to use the same name on all hosts must be introduced.
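For completeness, this is what such a connection looks like with the
current syntax (mode='vepa' is just one possible value - the point is
that "eth0" is exactly the host-specific detail we want to remove):

<interface type='direct'>
  <source dev='eth0' mode='vepa'/>
</interface>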
Three possible methods for providing this abstraction come to mind:
Option 1
-----------
(Be forewarned that Options 1 & 2 are shown here mainly to illustrate my
thought process while arriving at my preferred Option - 3 :-)
In a manner similar to the way the vnet%d tap devices are created, name
the interface with an embedded variable (eg "eth%d") (plus attributes
for min and max %d) and let the underlying code in libvirt search
for/reserve an appropriate device.
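To make that concrete, the guest interface XML might look something like
this (the min/max attributes are hypothetical - nothing parses them
today):

<interface type='direct'>
  <source dev='eth%d' min='0' max='9' mode='vepa'/>
</interface>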
This is the simplest to code/configure, but does not allow a) more
complex names (eg, interface names as determined by biosdevname can be
of the form "pci%dp%d_%d"), b) multiple ranges, c) oversubscribing of
interfaces (it is possible, although sub-optimal, to connect multiple
guest interfaces to a single host interface with macvtap).
VERDICT: looks ugly, not flexible enough.
Option 2
-----------
create a new class of libvirt XML config to describe a pool of network
interfaces, and reference this pool in the guest interface element:
<interface type='interfacePool'>
  <source pool='red-network'/>
  ...
</interface>
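The pool itself would then need its own definition, created and managed
via that new API - perhaps something like this (purely hypothetical
syntax):

<interfacePool>
  <name>red-network</name>
  <interface name='eth2'/>
  <interface name='eth3'/>
  <interface name='eth4'/>
</interfacePool>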
The problem with this is that it requires a new API for
defining/undefining/etc management of "interface pools". Also, it
wouldn't allow (for example) one host to use a pool of macvtap
interfaces to connect guests, and another host to use a host bridge for
the same connection (obviously, such a non-uniform setup wouldn't be
desirable in a large host farm, but may be encountered in some smaller
setups).
VERDICT: creates more API clutter (ie extra work *and* confusion for
users). Is "flexible enough" for current motivation, but unnecessarily
limiting, eg doesn't help the model to be more easily adapted to VMware etc.
Option 3
-----------
Up to now we've only discussed the need for separating the host-specific
config (<source> element) in the case of type='direct' interfaces (well,
in reality I've gone back and edited this document so many times that
this is no longer true, but play along with me! :-). But it really is a problem
for all interface types - all of the information currently in the
guest's interface <source> element really is tied to the host, and
shouldn't be defined in detail in the guest XML; it should instead be
defined once for each host, and only referenced by some name in the
guest XML; that way, as a guest moves from host to host, it will
automatically adjust its connection to match the new environment.
As a more general solution, instead of having the special new
"interfacePool" object in the config, what if the XML for "network" was
expanded to mean "any type of guest network connection" (with a new
"type='xxx'" attribute at the toplevel to indicate which type), not just
"a private bridge optionally connected to the real world via routing/NAT"?
If this were the case, the guest interface XML could always be, eg:
<interface type='network'>
  <source network='red-network'/>
  ...
</interface>
and depending on the network config of the host the guest was migrated
to, this could be either a direct (macvtap) connection via an interface
allocated from a pool (the pool being defined in the definition of
'red-network'), a bridge (again, pointed to by the definition of
'red-network'), or a virtual network (using the current network
definition syntax). This way the same guest could be migrated not only
between macvtap-enabled hosts, but from there to a host using a bridge,
or maybe a host in a remote location that used a virtual network with a
secure tunnel to connect back to the rest of the red-network. (Part of
the migration process would of course check that the destination host
had a network of the proper name, and fail if it didn't; management
software at a level above libvirt would probably filter a list of
candidate migration destinations based on available networks, and only
attempt migration to one that had the matching network available).
Examples of 'red-network' for different types of connections (all of
these would work with the interface XML given above):
<!-- Existing usage - a libvirt virtual network -->
<network> <!-- (you could put "type='virtual'" here for symmetry) -->
  <name>red-network</name>
  <bridge name='virbr0'/>
  <forward mode='route'/>
  ...
</network>
<!-- The simplest - an existing host bridge -->
<network type='bridge'>
  <name>red-network</name>
  <bridge name='br0'/>
</network>
<network type='direct'>
  <name>red-network</name>
  <source mode='vepa'>
    <!-- define the pool of available interfaces here. Interfaces may -->
    <!-- have parameters associated with them, eg max number of -->
    <!-- simultaneous guests -->
  </source>
  <!-- add any other elements from the guest interface XML that are -->
  <!-- tied to the host here (virtualport ?) (of course if they're -->
  <!-- host specific, they should have been in <source> in the first -->
  <!-- place!!) -->
</network>
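As a strawman for that pool definition (all of the element and attribute
names here - <pool>, <interface>, maxConnect - are just placeholders,
see open questions 1 and 2 below):

<network type='direct'>
  <name>red-network</name>
  <source mode='vepa'>
    <pool>
      <interface name='eth2' maxConnect='1'/>
      <interface name='eth3' maxConnect='1'/>
      <interface name='eth4' maxConnect='8'/>
    </pool>
  </source>
</network>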
I know there may be some resistance to this expansion of the usage of
<network>, but I think it does fit in with the current usage properly,
and is preferable to adding an entire new class of API just to define a
pool of interfaces.
Open questions:
1) What should the <pool> element inside network/source look like?
Making each interface in the pool a separate element, with possible
attributes, would be the simplest to code, but would get tedious on a
system with, for example, an ethernet card with 64 VFs. On the other
hand, just parameterizing a string (eth%d) is inadequate, eg, when there
are multiple non-contiguous ranges.
2) Do we need a "max connections" for each interface in a pool of
macvtap interfaces? Or should we just overload them in a round-robin
fashion unless mode='passthru' (a new mode which would require only one
guest per interface)?
3) What about the parameters in the <virtualport> element that are
currently used by vepa/vnlink? Do those belong with the host, or with
the guest?
4) Are there other <network> types that we want? Perhaps the recent
proposal for IPSec / secure tunnels could be incorporated as a new
network type (or maybe it could just be the standard "virtual" type,
with a tunnel as the forward device).
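For example, if we took the "standard virtual type with a tunnel as the
forward device" route, the definition might need nothing beyond the
existing syntax ("tun0" here is a pre-existing tunnel device configured
outside of libvirt, purely for illustration):

<network>
  <name>red-network</name>
  <bridge name='virbr1'/>
  <forward mode='route' dev='tun0'/>
</network>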