
On Tue, Jan 26, 2010 at 05:22:05PM -0500, Stefan Berger wrote:
"Daniel P. Berrange" <berrange@redhat.com> wrote on 01/26/2010 04:21:56
On Mon, Jan 25, 2010 at 12:47:17PM -0500, Stefan Berger wrote:
Hello!
The attached patch provides support for the Linux macvtap device for Qemu by passing a file descriptor to the Qemu command line, similar to how it is done with a regular tap device. I have modified the network XML code to understand a definition such as the following one:
<network>
  <name>vepanet</name>
  <uuid>4ebd5168-6321-4757-8397-f6e83484f402</uuid>
  <extbridge mode='vepa' dev='eth0'/>
</network>
I don't think this is the correct place to be adding this kind of configuration / functionality. The virNetworkPtr / <network> XML is describing a virtual network capability which is *not* directly connected to the LAN. It may be configured to route from the virtual network to the LAN, with optional NAT applied. So while the implementation may use a bridge device, this bridge is not connected to any physical device. Since VEPA is about directly connecting VMs to the LAN, this doesn't really fit here.
Yes, I have re-purposed the network XML to describe an external bridge.
This has the following advantages:
- you can migrate a VM between machines that have different types of connectivity, i.e., tap and macvtap
- pushing the eth0 into referenced XML makes it independent of the local configuration of the host, i.e., on one host it may be eth0 and on the other eth1. eth0 in the above XML could be a physical adapter, an SR-IOV physical adapter, or a virtual function of an SR-IOV adapter.
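To illustrate the second point, here is a minimal Python sketch (not libvirt code) of how a host-local network definition could hide the device name from the guest configuration. It assumes the <extbridge> element proposed in the XML above; the function name resolve_extbridge and the two per-host definitions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-host definitions of the same logical network "vepanet".
HOST_A_XML = """
<network>
  <name>vepanet</name>
  <extbridge mode='vepa' dev='eth0'/>
</network>
"""

HOST_B_XML = """
<network>
  <name>vepanet</name>
  <extbridge mode='vepa' dev='eth1'/>
</network>
"""


def resolve_extbridge(network_xml, wanted_name):
    """Return (dev, mode) for the named network, or None if it does not match."""
    root = ET.fromstring(network_xml)
    if root.findtext("name") != wanted_name:
        return None
    ext = root.find("extbridge")
    if ext is None:
        return None
    return ext.get("dev"), ext.get("mode")


# The guest only refers to "vepanet"; each host supplies its own device name.
print(resolve_extbridge(HOST_A_XML, "vepanet"))
print(resolve_extbridge(HOST_B_XML, "vepanet"))
```

Under this assumption the guest definition only carries the network name, so the same guest could land on eth0 on one host and eth1 on another.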
I agree that those are both good advantages, but I'm still not liking the idea of re-purposing the network XML model for this. Unfortunately I don't yet have a clear alternative that satisfies those goals. I rather regret that the current stuff uses the name 'network' since it is somewhat misleading as to its purpose :-) The best idea I can come up with so far is to imagine a new 'switch' object which would basically use the syntax you are suggesting as an extension for the 'network' object, but without all the existing bits to do with NAT/routing/DHCP. A 'switch' object might be something that is also useful for the parallel work being done on firewall filters in libvirt. I don't think we necessarily need to consider this mutually exclusive wrt the direct syntax I suggest for VMs. We could start with the direct syntax in VMs since that's pretty quick & easy to implement, and then introduce the idea of a 'switch' object later to give us an alternate host-independent config.
In the context of bridging a guest to a plain ethernet device, these fit together as follows:
1. The virNodeDevPtr APIs are used to discover what physical network devices exist, 'eth0'
2. The virInterfacePtr APIs are used to create a bridge on the host br0, containing the physical device 'eth0'
Yes, I suppose this is all done via 'virsh iface-*' commands.
Yes, that's correct.
So unless I'm missing something major in my reasoning here, I think in the domain XML we end up with two possible configs for guest network interfaces:
1. The current one using plain Linux software bridging, which we can't change in an incompatible way
<interface type='bridge'>
  <source bridge='br0'/>
  <target dev='vnet0'/>
</interface>
Here, the source device is a bridge previously set up to have a physical device enslaved (regular or SR-IOV). The target device is the plain TAP device.
plain TAP device -> no need for change here.
2. A new one using hardware bridging, which we can freely define for our new needs
<interface type='direct'>
  <source dev='eth0' mode='vepa|pepa|bridge'/>
  <target dev='vnet0'/>
</interface>
In contrast to the ACLs ( :-) ), where I would regard the ACLs as VM-attached data that ideally would migrate along when the VM migrates between hosts, in the case of this network attachment I'd not put host-specific information in the domain XML, as is the case here with the 'eth0'. Who knows, maybe it's going to be the SR-IOV virtual adapter eth10 on the destination side? With the redirection into the network XML (or similar) one could define a network XML per VM, create that with host-specific information on the destination, i.e., eth10, and then migrate the VM, previously linked to eth0 via macvtap, so that it is then connected via eth10. It's more work for upper layers, but if there is a need to optimize for throughput, then maybe that's the only way such optimizations can be done. Otherwise, if all VMs in the data center are created with the above XML and eth0, then they will all need to stay on eth0, I suppose.
In this context, how will the virtual functions of SR-IOV be administered and given to VMs? I suppose their management would be left up to higher layers?
As a general rule we leave policy decisions to the management apps and merely provide them with the mechanism to implement their desired policy.
Here, source device is a physical device (regular or SR-IOV). The target device is a macvtap device.
In both cases the TAP or macvtap device is created on the fly when the VM is booted & destroyed at shutdown (either by the kernel, or manually by libvirt for macvtap).
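The two configurations above can be summarized with a small Python sketch (not libvirt code) that inspects an <interface> element and reports which kind of host device would back it. The 'direct' type is only the syntax proposed in this thread, and the function name plan_backend and the returned dictionary layout are hypothetical.

```python
import xml.etree.ElementTree as ET

BRIDGE_XML = """
<interface type='bridge'>
  <source bridge='br0'/>
  <target dev='vnet0'/>
</interface>
"""

DIRECT_XML = """
<interface type='direct'>
  <source dev='eth0' mode='vepa'/>
  <target dev='vnet0'/>
</interface>
"""


def plan_backend(interface_xml):
    """Describe which host device would back the guest NIC."""
    iface = ET.fromstring(interface_xml)
    source = iface.find("source")
    if source is None:
        raise ValueError("missing <source> element")
    kind = iface.get("type")
    if kind == "bridge":
        # Software bridging: a plain TAP device is attached to the bridge.
        return {"device": "tap", "attach_to": source.get("bridge"), "mode": None}
    if kind == "direct":
        # Hardware bridging: a macvtap device sits on the physical device.
        return {"device": "macvtap", "attach_to": source.get("dev"),
                "mode": source.get("mode")}
    raise ValueError("unsupported interface type: %r" % kind)


print(plan_backend(BRIDGE_XML))
print(plan_backend(DIRECT_XML))
```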
Yes, as long as libvirt is running when the VM goes down it can delete the macvtap device. If not, I am trying to delete all stale macvtap devices at VM startup, using the MAC address of the VM (which the macvtap inherits) as the search/delete criterion.
That is more than sufficient - we already assume libvirtd is running at the time of guest shutdown. We don't officially support the scenario of a guest shutting down while libvirtd is stopped - we just make a best effort to cope.
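As a rough illustration of the startup cleanup discussed above, the selection step could look like the following Python sketch. It is not taken from the patch: the function name stale_macvtap_devices and the (name, kind, mac) tuple layout are hypothetical, and how the link list is obtained from the host is left out.

```python
def stale_macvtap_devices(links, vm_mac):
    """Pick macvtap links whose MAC matches the VM's MAC (case-insensitive).

    `links` is a hypothetical list of (name, kind, mac) tuples describing
    the host's network links.
    """
    wanted = vm_mac.lower()
    return [name for name, kind, mac in links
            if kind == "macvtap" and mac.lower() == wanted]


links = [
    ("eth0", "physical", "00:1a:4a:00:00:01"),
    ("macvtap0", "macvtap", "52:54:00:AB:CD:EF"),
    ("macvtap1", "macvtap", "52:54:00:11:22:33"),
    ("vnet0", "tap", "52:54:00:ab:cd:ef"),
]

# Only macvtap devices carrying the VM's MAC are candidates for deletion.
print(stale_macvtap_devices(links, "52:54:00:ab:cd:ef"))
```

Filtering on the device kind as well as the MAC keeps plain TAP devices, which may carry a similar address, out of the deletion list.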
Index: libvirt/src/util/macvtap.c
===================================================================
--- /dev/null
+++ libvirt/src/util/macvtap.c
@@ -0,0 +1,664 @@
+/*
+ * Copyright (C) 2010 IBM Corporation
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ *
+ * Authors:
+ *     Stefan Berger <stefanb@us.ibm.com>
+ */
+
+#include <config.h>
+
+#if defined(WITH_MACVTAP)
[snip].
I've not had time to look at the details of this macvtap.c code yet, but I assume it's doing all you need :-) Is there any benefit to using the libnl.so network library, rather than the ioctl()s directly?
Haven't looked at that library and its API, but can do so if it's documented. Would it be ok to keep the current implementation, though?
I don't mind either way. I'll leave the decision up to you since you know more about this code than me :-) So if you prefer to use the current code that's fine.
Daniel
--
|: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|