[Libvir] A whole tonne of networking fixes / enhancements


On Tue, Mar 13, 2007 at 04:28:16AM +0000, Daniel P. Berrange wrote:
I've been testing the networking support and found various bugs / missing features that I thought we really need to have in the release - some of them impact the XML so we have to get this right now.
- Build up the in-memory linked list of network devices & disk devices in the same order as they are listed in the XML. Currently they are built up in reverse order, which makes the XML round-trip non-idempotent, and also means that if you have multiple NICs, what you think is eth0 ends up being eth4 and what you think is eth4 is eth0. This patch fixes the ordering to match the XML.
- Set the 'vlan' attribute in the command line args to QEMU. This ensures that separate network devices are in fact separated. Previously, if you had multiple NICs, QEMU connected them all to the same VLAN, so any traffic on one NIC got sent to all NICs. Most definitely not what you want in any usual scenario, and it created a traffic storm from the resultant network loops!
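For illustration, the generated arguments might look roughly like the following sketch; with a distinct vlan number per NIC the devices no longer share traffic. (Flag spellings follow QEMU's old -net syntax; the MAC addresses and ifnames are arbitrary examples, not lifted from the patch.)

```shell
# Sketch: two NICs, each on its own QEMU "vlan" so their traffic stays
# separated (previously both would have landed on vlan 0).
nic0="-net nic,macaddr=00:16:3e:00:00:01,vlan=0 -net tap,ifname=vnet0,vlan=0"
nic1="-net nic,macaddr=00:16:3e:00:00:02,vlan=1 -net tap,ifname=vnet1,vlan=1"
echo "qemu $nic0 $nic1"
```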
- Added support for networking of type='bridge'. This gives parity with equivalent Xen networking, eg
<interface type='bridge'> <source dev='xenbr0'/> <target dev='vnet3'/> </interface>
Will create a tap device called vnet3 and bridge it into xenbr0
- Added support for networking of type='ethernet'. This gives parity with equivalent Xen networking, eg
<interface type='ethernet'> <script path='/etc/qemu-ifup'/> <target dev='vnet5'/> </interface>
Will create a tap device called vnet5 and run 'qemu-ifup' to set up its configuration. Think the various non-bridge Xen networking configs.
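For reference, a typical qemu-ifup script is only a couple of lines; this is a hypothetical sketch, not the script shipped anywhere. QEMU passes the tap device name as the first argument; the bridge name is an assumption, and the commands are echoed rather than executed so the sketch needs no root privileges.

```shell
# Hypothetical sketch of an /etc/qemu-ifup style script. A real script
# would run ifconfig/brctl directly instead of echoing them.
ifup_sketch() {
    tap="$1"
    bridge="${2:-br0}"   # assumed bridge name
    echo "ifconfig $tap 0.0.0.0 up"
    echo "brctl addif $bridge $tap"
}
ifup_sketch vnet5 xenbr0
```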
- Added support for 'client', 'server', 'mcast' networking types. These are QEMU specific types, allowing unprivileged (or privileged) users to create virtual networks without TAP devices.
eg two machines, one with
<interface type='server'> <source address="127.0.0.1" port="5555"/> </interface>
And the other with
<interface type='client'> <source address="127.0.0.1" port="5555"/> </interface>
Or both using multicast:
<interface type='mcast'> <source address="230.0.0.1" port="5558"/> </interface>
Both these options allow QEMU instances on different physical machines to talk to each other. The multicast protocol is also compatible with the UserModeLinux multicast protocol.
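A sketch of the corresponding QEMU invocations (option syntax per QEMU's -net socket support; treat the exact spellings as assumptions that may vary by version):

```shell
# Socket-based networking without tap devices: one guest listens, the
# other connects, or both join a multicast group.
server="-net nic -net socket,listen=127.0.0.1:5555"   # type='server'
client="-net nic -net socket,connect=127.0.0.1:5555"  # type='client'
mcast="-net nic -net socket,mcast=230.0.0.1:5558"     # type='mcast'
echo "host A: qemu $server"
echo "host B: qemu $client"
echo "either: qemu $mcast"
```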
- Fix the type='network' config to use the <target dev='vnet3'/> element instead of a custom tapifname='vnet3' attribute - this gives a consistent way to name tap devices that - most importantly - matches the Xen XML format for specifying vifname, eg
<interface type='network'> <source network='default'/> <target dev='vnet2'/> </interface>
Will create a tap device called vnet2 and connect it to the bridge device associated with the network 'default'.
- Removed references to 'vde' - we're not using this explicitly - at some time in the future, we'll perhaps use VDE for doing virtual networking for unprivileged users where bridge devices are not available
- Removed references to the 'tap' network type - this is basically handled by the 'ethernet' network type, to give XML compatibility with the same functionality in the Xen backend.
- The virtual network configuration currently always adds a whole bunch of iptables rules to the FORWARD/POSTROUTING chains which allow traffic from the virtual network to be masqueraded out through any active physical interface. This may be the correct thing to do for the default network, but we also need the ability to create totally isolated networks (no forwarding at all), or directed networks (eg forwarding only to an explicit physical device).
To deal with this scenario I introduced a new element in the network XML called '<forward>'. If this is not present, then no forwarding rules are added at all. If it is present but with no attributes, then a generic rule allowing forwarding to any interface is added. If it is present and the 'dev' attribute is specified, then forwarding is only allowed to that named interface. The default network XML thus now includes
<forward/>
So that by default we have a virtual network connected to all physical devices.
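Roughly, the rules added for a network with <forward/> behave like the following sketch (rule details and addresses are illustrative, not copied from the implementation; the commands are echoed so nothing is actually changed):

```shell
# Sketch of forwarding/masquerading rules for a virtual network.
# dev="" models <forward/>; dev="eth0" models <forward dev='eth0'/>.
bridge="virbr0"
net="192.168.122.0/24"
dev=""
rules() {
    if [ -n "$dev" ]; then
        # directed network: only forward to the named interface
        echo "iptables -A FORWARD -i $bridge -o $dev -j ACCEPT"
    else
        # generic rule: forward to any interface
        echo "iptables -A FORWARD -i $bridge -j ACCEPT"
    fi
    echo "iptables -t nat -A POSTROUTING -s $net -j MASQUERADE"
}
rules
```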
- MAC addresses were not being autogenerated inside libvirt_qemud. If you don't provide a MAC address, QEMU (insanely) uses a hardcoded default, so all NICs end up with an identical MAC. We now always autogenerate a MAC address if one is not explicitly listed in the XML.
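A MAC autogeneration sketch, assuming a fixed OUI prefix plus three random octets (the 00:16:3e prefix is an example here, not necessarily what libvirt_qemud emits):

```shell
# Generate a MAC with a fixed prefix and three random trailing octets.
gen_mac() {
    od -An -N3 -tx1 /dev/urandom |
        awk '{printf "00:16:3e:%s:%s:%s\n", $1, $2, $3}'
}
gen_mac
```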
One final thing to be aware of - the Fedora Core 6 Xen kernel currently has totally fubar TCP checksum offload. So if you try to bridge a Xen guest into the libvirt virtual networking, it'll fail to get a DHCP address from dnsmasq. Even if you fix that by turning off TX checksums in Dom0, you'll get checksum failures for the actual TCP data transmission too. The only solution is to either upgrade the Dom0 kernel to a RHEL-5 vintage, or to also turn off checksumming in the guest. A new FC6 Xen kernel is in the works which should hopefully fix this for real.
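The workaround amounts to something like the following (interface names are examples; the -K option is documented in the ethtool manual; commands are echoed here rather than run):

```shell
# Disable TX checksum offload as a workaround for the broken FC6 kernel.
dom0_fix="ethtool -K peth0 tx off"   # in Dom0
guest_fix="ethtool -K eth0 tx off"   # and, if still needed, in the guest
echo "$dom0_fix"
echo "$guest_fix"
```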
With a fixed kernel, I can easily set up virtual networks connecting Xen PV, Xen FV and QEMU instances together.
Anyway, the upshot of all this is that we can now trivially create really complicated & fun networking layouts across QEMU & Xen, using a mixture of bridging, NAT, isolated LANs, and tunnelled VLANs :-) We really ought to document it a little though.
Yup, but in general the libvirt documentation really needs some revamp; I guess that will be my main task after the upcoming release.
Since Mark is off on vacation for a while, I'd appreciate people taking a close look at this / actually giving it a try if you can.
It is possible to create a totally isolated network using
<network>
  <name>private</name>
  <uuid>d237ce44-8efa-452c-b8e6-1ae9cf53aeb1</uuid>
  <bridge name="virbr0"/>
  <ip address="192.168.122.1" netmask="255.255.255.0">
    <dhcp>
      <range start="192.168.122.2" end="192.168.122.254"/>
    </dhcp>
  </ip>
</network>
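Such a definition would typically be loaded via virsh; this sketch assumes the net-define/net-start subcommands and an example file path, and only echoes the virsh step since it needs a running libvirtd:

```shell
# Write an (abbreviated) isolated-network definition to a file, then
# show the virsh commands that would load and start it.
cat > /tmp/private.xml <<'EOF'
<network>
  <name>private</name>
  <bridge name="virbr0"/>
  <ip address="192.168.122.1" netmask="255.255.255.0"/>
</network>
EOF
echo "virsh net-define /tmp/private.xml && virsh net-start private"
```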
And a QEMU guest with 5 (yes, 5) network cards
<domain type='qemu'>
  <name>QEMUFirewall</name>
  <uuid>c7a5fdbd-edaf-9455-926a-d65c16db1809</uuid>
  <memory>219200</memory>
  <currentMemory>219200</currentMemory>
  <vcpu>1</vcpu>
  <os>
    <type arch='i686' machine='pc'>hvm</type>
    <boot dev='hd'/>
  </os>
  <devices>
    <emulator>/usr/bin/qemu</emulator>
    <disk type='block' device='disk'>
      <source dev='/dev/HostVG/QEMUGuest1'/>
      <target dev='hda'/>
    </disk>
    <interface type='network'>
      <source network='private'/>
      <target dev='vnet1'/>
    </interface>
    <interface type='bridge'>
      <source dev='xenbr0'/>
      <target dev='vnet2'/>
    </interface>
    <interface type='ethernet'>
      <script path='/etc/dan-test-ifup'/>
      <target dev='vnet3'/>
    </interface>
    <interface type='server'>
      <source address="127.0.0.1" port="5555"/>
    </interface>
    <interface type='mcast'>
      <source address="230.0.0.1" port="5558"/>
    </interface>
    <graphics type='vnc' port='-1'/>
  </devices>
</domain>
In this XML, only eth1 is connected to the host's public-facing network. The other NICs are all on various private networks. So this QEMU guest is in essence a router/firewall box. eg, you could connect various other guests to the 'private' virtual network, and the only way they could reach the outside world would be via this QEMU instance doing routing.
This all sounds good, I really appreciate the effort to unify the various syntaxes as much as possible. I reviewed the patch and really didn't find anything to point out. Unfortunately I don't yet have an up-to-date rawhide system to test on.

Daniel

--
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard | virtualization library http://libvirt.org/
veillard@redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

Hi Dan,

On Tue, 2007-03-13 at 04:28 +0000, Daniel P. Berrange wrote:
 static int iptablesPhysdevForward(iptablesContext *ctx,
                                   const char *iface,
+                                  const char *target,
                                   int action)
 {
-    return iptablesAddRemoveRule(ctx->forward_filter,
-                                 action,
-                                 "--match", "physdev",
-                                 "--physdev-in", iface,
-                                 "--jump", "ACCEPT",
-                                 NULL);
+    if (target && target[0]) {
+        return iptablesAddRemoveRule(ctx->forward_filter,
+                                     action,
+                                     "--match", "physdev",
+                                     "--physdev-in", iface,
+                                     "--out", target,
+                                     "--jump", "ACCEPT",
+                                     NULL);
+    } else {
+        return iptablesAddRemoveRule(ctx->forward_filter,
+                                     action,
+                                     "--match", "physdev",
+                                     "--physdev-in", iface,
+                                     "--jump", "ACCEPT",
+                                     NULL);
+    }
 }
This bit looks wrong to me. The rule is intended to allow frames from the given bridge port to be forwarded across the bridge. AFAIK --out would match against the outgoing bridge port in this case. Certainly the interface which we wish to allow IP forwarding to isn't relevant to this rule.

Cheers,
Mark.