On Tue, Jan 26, 2010 at 05:22:05PM -0500, Stefan Berger wrote:
"Daniel P. Berrange" <berrange(a)redhat.com> wrote on
01/26/2010 04:21:56
>
> libvir-list, gerhard.stenzel, Vivek Kashyap, arndb
>
> Please respond to "Daniel P. Berrange"
>
> On Mon, Jan 25, 2010 at 12:47:17PM -0500, Stefan Berger wrote:
> > Hello!
> >
> > The attached patch provides support for the Linux macvtap device for
> > Qemu by passing a file descriptor to the Qemu command line, similar to
> > how it is done with a regular tap device. I have modified the network
> > XML code to understand a definition such as the following one:
> >
> > <network>
> > <name>vepanet</name>
> > <uuid>4ebd5168-6321-4757-8397-f6e83484f402</uuid>
> > <extbridge mode='vepa' dev='eth0'/>
> > </network>
>
> I don't think this is the correct place to be adding this kind
> of configuration / functionality. The virNetworkPtr / <network>
> XML is describing a virtual network capability which is *not*
> directly connected to the LAN. It may be configured to route
> from the virtual network to the LAN, with optional NAT applied.
> So while the implementation may use a bridge device, this bridge
> is not connected to any physical device. Since VEPA is about
> directly connecting VMs to the LAN, this doesn't really fit here.
Yes, I have re-purposed the network XML to describe an external bridge.
This has the following advantages:
- you can migrate a VM between machines that have different types of
  connectivity, i.e., tap and macvtap
- pushing the eth0 into referenced XML makes it independent of the local
  configuration of the host, i.e., on the one host it may be eth0 and on
  the other eth1. eth0 in the above XML could be a physical adapter,
  an SR-IOV physical adapter, or a virtual function of an SR-IOV adapter.
I agree that those are both good advantages, but I'm still not liking
the idea of re-purposing the network XML model for this. Unfortunately
I don't yet have a clear alternative that satisfies those goals. I rather
regret that the current stuff uses the name 'network' since it is somewhat
misleading as to its purpose :-) The best idea I can come up with so far
is to imagine a new 'switch' object which would basically use the syntax
you are suggesting as an extension for the 'network' object, but without
all the existing bits to do with NAT/routing/DHCP. A 'switch' object might
be something that is also useful for the parallel work being done on
firewall filters in libvirt.
I don't think we necessarily need to consider this mutually exclusive wrt
the direct syntax I suggest for VMs. We could start with the direct syntax
in VMs since that's pretty quick & easy to implement, and then introduce
the idea of a 'switch' object later to give us an alternate host-independent
config.
> In the context of bridging a guest to a plain ethernet device, these
> fit together as follows
>
> 1. The virNodeDevPtr APIs are used to discover what physical network
> devices exist, 'eth0'
>
> 2. The virInterfacePtr APIs are used to create a bridge on the host
> br0, containing the physical device 'eth0'
Yes, I suppose this is all done via 'virsh iface-*' commands.
Yes, that's correct.
> So unless I'm missing something major in my reasoning here I think
> in the domain XML we end up with two possible configs for guest
> network interfaces
>
>
> 1. The current one using plain Linux software bridging, which
> we can't change in an incompatible way
>
> <interface type='bridge'>
> <source bridge='br0'/>
> <target dev='vnet0'/>
> </interface>
>
> Here, the source device is a bridge previously setup
> to have a physical device enslaved (regular or SR-IOV)
> The target device is the plain TAP device
plain TAP device -> no need for change here.
>
> 2. A new one using hardware bridging, which we can freely
> define for our new needs
>
> <interface type='direct'>
> <source dev='eth0' mode='vepa|pepa|bridge'/>
> <target dev='vnet0'/>
> </interface>
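
A minimal sketch of how a management tool might emit the two interface variants discussed above. This is illustrative only: the helper name build_interface is hypothetical, the 'direct' type and its mode values are the proposal from this thread rather than an established schema, and the element is written with a normal opening tag so that the XML is well-formed.

```python
import xml.etree.ElementTree as ET


def build_interface(kind, source, target, mode=None):
    """Return a domain <interface> element as a string.

    kind   -- 'bridge' (software bridge + TAP) or 'direct' (proposed, macvtap)
    source -- bridge name for 'bridge', physical device for 'direct'
    target -- name of the TAP/macvtap device created for the guest
    mode   -- only for 'direct': one of 'vepa', 'pepa', 'bridge'
    """
    iface = ET.Element("interface", {"type": kind})
    if kind == "bridge":
        ET.SubElement(iface, "source", {"bridge": source})
    elif kind == "direct":
        # Mode names are taken from the proposal in this thread.
        if mode not in ("vepa", "pepa", "bridge"):
            raise ValueError("unsupported mode: %r" % (mode,))
        ET.SubElement(iface, "source", {"dev": source, "mode": mode})
    else:
        raise ValueError("unsupported interface type: %r" % (kind,))
    ET.SubElement(iface, "target", {"dev": target})
    return ET.tostring(iface, encoding="unicode")


print(build_interface("bridge", "br0", "vnet0"))
print(build_interface("direct", "eth0", "vnet0", mode="vepa"))
```

Only the type attribute and the source element differ between the two cases, which is why the existing 'bridge' config can stay untouched.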
In contrast to the ACLs ( :-) ), where I would regard the ACLs as
VM-attached data that ideally would migrate along when the VM migrates
between hosts, in the case of this network attachment I'd not put
host-specific information in the domain XML as is the case here with the
'eth0'. Who knows, maybe it's going to be the SR-IOV virtual adapter eth10
on the destination side? With the redirection into the network XML (or
similar) one could define a network XML per VM, create that with
host-specific information on the destination, i.e., eth10, and then
migrate the VM previously linked to eth0 via macvtap so that it is then
connected via eth10. It's more work for upper layers, but if there is a need for
optimization for throughput, then maybe that's the only way that
optimizations can be done. Otherwise if all VMs in the data center are
created with above XML and eth0 then they will all need to stay on eth0 I
suppose.
In this context, how will the virtual functions of SR-IOV be administered
and given to VMs? I suppose their management would be left up to higher
layers?
As a general rule we leave policy decisions to the management apps and
merely provide them the mechanism to implement their desired policy.
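
As a rough illustration of that split, a management app could keep the host-specific device names itself and resolve them per host at migration time. The sketch below assumes a hypothetical in-memory table (HOST_NETWORKS) and hypothetical host names; it uses no libvirt API and only shows the indirection from a logical network name such as 'vepanet' to a local device.

```python
# Hypothetical per-host tables: logical network name -> local attachment.
HOST_NETWORKS = {
    "hostA": {"vepanet": {"dev": "eth0", "mode": "vepa"}},
    "hostB": {"vepanet": {"dev": "eth10", "mode": "vepa"}},
}


def resolve_attachment(host, network):
    """Look up the host-specific device and mode for a logical network."""
    try:
        return HOST_NETWORKS[host][network]
    except KeyError:
        raise LookupError(
            "network %r is not defined on host %r" % (network, host))


def plan_migration(network, src_host, dst_host):
    """Return the (source, destination) attachments for a migration."""
    return (resolve_attachment(src_host, network),
            resolve_attachment(dst_host, network))


src, dst = plan_migration("vepanet", "hostA", "hostB")
print("source uses", src["dev"], "- destination uses", dst["dev"])
```

The guest definition only ever names 'vepanet'; which physical or SR-IOV device backs it on each host is a policy decision kept outside the domain XML.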
>
> Here, source device is a physical device (regular or
> SR-IOV). The target device is a macvtap device.
>
> In both cases the TAP or macvtap device is created on the fly when the
> VM is booted & destroyed at shutdown (either by the kernel, or manually
> by libvirt for macvtap).
Yes, as long as libvirt is running when the VM goes down it can delete the
macvtap device. If not, I am trying to delete all macvtap devices at VM
startup using the MAC address of the VM (which the macvtap inherits) as
search/delete criterion.
That is more than sufficient - we already assume libvirtd is running at
time of guest shutdown. We don't officially support the scenario of a
guest shutting down while libvirtd is stopped - we just make a best effort
to cope.
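
The MAC-based search described above could look roughly like the following sketch. It is not the code from the patch: it assumes the Linux sysfs layout in which each device exposes its MAC in /sys/class/net/<dev>/address, the function name is hypothetical, and it only reports candidates rather than deleting anything. Matching on the MAC alone may also catch unrelated devices, so a real implementation would want an additional check that the device really is a macvtap.

```python
import os


def find_devices_by_mac(mac, sysfs_net="/sys/class/net"):
    """Return names of network devices whose MAC address equals `mac`."""
    wanted = mac.strip().lower()
    matches = []
    for name in sorted(os.listdir(sysfs_net)):
        addr_path = os.path.join(sysfs_net, name, "address")
        try:
            with open(addr_path) as f:
                addr = f.read().strip().lower()
        except OSError:
            # Entry is not a device directory, or the device vanished.
            continue
        if addr == wanted:
            matches.append(name)
    return matches
```

Each returned name would then be a candidate for removal before the guest's new macvtap device is created, for example with 'ip link delete <name>' run with sufficient privileges.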
> >
> > Index: libvirt/src/util/macvtap.c
> > ===================================================================
> > --- /dev/null
> > +++ libvirt/src/util/macvtap.c
> > @@ -0,0 +1,664 @@
> > +/*
> > + * Copyright (C) 2010 IBM Corporation
> > + *
> > + * This library is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > + * version 2.1 of the License, or (at your option) any later version.
> > + *
> > + * This library is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> > + * Lesser General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with this library; if not, write to the Free Software
> > + * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
> > + *
> > + * Authors:
> > + * Stefan Berger <stefanb(a)us.ibm.com>
> > + */
> > +
> > +#include <config.h>
> > +
> > +#if defined(WITH_MACVTAP)
>
> [snip].
>
> I've not had time to look at the details of this macvtap.c code yet,
> but I assume it's doing all you need :-) Is there any benefit to using
> the network libnl.so library, rather than the ioctl()'s directly ?
Haven't looked at that library and its API, but can do so if it's
documented. Would it be ok to keep the current implementation, though?
I don't mind either way. I'll leave the decision up to you since you
know more about this code than me :-) So if you prefer to use the
current code that's fine.
Daniel
--
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|