On Wed, Nov 07, 2018 at 08:48:16AM +0000, Nikolay Shirokovskiy wrote:
> Hi, all!
> There is a performance issue with network filters and broadcast Ethernet
> traffic. If the L2 segment is large enough (several thousand VMs), there is
> a lot of broadcast ARP traffic (about 100 frames/s). As a result, on a host
> with several hundred VMs (say 300) we have a kernel thread eating 100% of a
> CPU just to check this traffic against firewall rules. The problem is that
> if there are rules in the ebtables POSTROUTING chain (clean-traffic is an
> example of such a filter), then every single broadcast frame is turned into
> 300 copies, one for every distinct bridge port, and each of these 300 copies
> is checked against 300 / 2 = 150 rules on average to find the chain for that
> port. As a result we have 100 * 300 * 300 / 2 = 4.5 * 10^6 rule checks per
> second. The kernel does not spread this workload across different CPUs, and
> in any case this is a waste of CPU!
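To make the arithmetic above concrete, here is a small Python cost model. It assumes, as the quoted estimate does, that POSTROUTING holds one dispatch rule per bridge port and is scanned linearly, and compares that with an idealised per-port dispatch costing one step per cloned frame. Both functions count only the dispatch cost, not the rules inside each port's own chain; the function names are made up for this sketch.

```python
def linear_dispatch_checks(frames_per_sec: float, ports: int) -> float:
    """Rule checks/s when each broadcast frame is cloned to every port and
    each clone walks a shared chain of per-port rules to find its own chain."""
    clones_per_sec = frames_per_sec * ports  # one copy per bridge port
    avg_checks_per_clone = ports / 2         # linear scan matches halfway on average
    return clones_per_sec * avg_checks_per_clone


def per_port_dispatch_checks(frames_per_sec: float, ports: int) -> float:
    """Idealised alternative (assumption): each clone reaches its port's
    chain in a single step, e.g. via a per-device hook or a hashed lookup."""
    return frames_per_sec * ports * 1


if __name__ == "__main__":
    fps, ports = 100, 300
    print(f"linear:   {linear_dispatch_checks(fps, ports):.2e} checks/s")
    print(f"per-port: {per_port_dispatch_checks(fps, ports):.2e} checks/s")
```

Running it prints 4.50e+06 checks/s for the linear case and 3.00e+04 for per-port dispatch, so under these assumptions the gap is a factor of ports / 2 (150 here) and keeps growing as VMs are added.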
Yes, this is a key limitation of the traditional ebtables/ip[6]tables commands.
There's no efficient way to associate rules with specific devices.
This is apparently solved with nftables if you set up your chains to match on
the 'netdev' family.
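As a rough illustration of per-device chains in the netdev family, the Python sketch below renders an nftables ruleset with one base chain per tap device. Everything specific here is an assumption for illustration: the table name nwfilter_demo, the vnetN_in chain names and the single MAC anti-spoofing rule are not what libvirt's nwfilter generates. Note that a netdev ingress hook only sees frames arriving on the device, i.e. traffic sent by the VM, so this sketch does not by itself cover the POSTROUTING (towards-VM) direction discussed above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    ifname: str  # tap device name, e.g. "vnet0" (hypothetical)
    mac: str     # MAC address expected from the guest


def render_netdev_ruleset(ports: list[Port], table: str = "nwfilter_demo") -> str:
    """One netdev base chain per device: a frame arriving on vnetN is only
    evaluated against vnetN's chain, never against other ports' rules."""
    lines = [f"table netdev {table} {{"]
    for p in ports:
        lines += [
            f"    chain {p.ifname}_in {{",
            f"        type filter hook ingress device {p.ifname} priority 0; policy accept;",
            # illustrative rule: drop frames whose source MAC is not the guest's
            f"        ether saddr != {p.mac} drop",
            "    }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_netdev_ruleset([
        Port("vnet0", "52:54:00:00:00:01"),
        Port("vnet1", "52:54:00:00:00:02"),
    ]))
```

The rendered text could be written to a file and loaded with `nft -f`; the point of the layout is that adding more VMs adds more chains, not more rules on any single frame's path.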
> The simple solution is to put rules that ACCEPT ARP traffic into POSTROUTING
> itself, before any port-specific chains. But this will affect non-VM ports
> and the host itself too. So instead, could we create a distinct network
> namespace for every VM and put its tap device there, create a bridge inside
> that namespace too so that we can apply ebtables rules there, and add the tap
> to that bridge? Finally, connect the bridge in the root namespace and the
> bridge in the VM namespace with a veth pair. As a result, in the situation
> described above each cloned frame will be checked only against the rules for
> that very VM. Regular TCP traffic will get the same benefit. On the other
> hand, we need a bridge and a veth pair for every VM, plus some CPU power to
> process this extra traffic path.
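The namespace-per-VM proposal can be spelled out as a sequence of iproute2 commands. The Python sketch below only builds the argv lists and does not execute anything; the ns-/vr-/vn- names, the per-namespace bridge br0 and the root bridge virbr0 are hypothetical choices, and whether QEMU/libvirt cope with a tap being moved into another namespace after creation is not checked here.

```python
from __future__ import annotations

IFNAMSIZ = 16  # Linux interface names: at most 15 chars plus the trailing NUL


def netns_plan(tap: str, root_bridge: str = "virbr0") -> list[list[str]]:
    """Return the ip(8) commands that would isolate one VM's tap in its own namespace."""
    ns = f"ns-{tap}"
    br = "br0"                              # bridge private to the VM namespace
    veth_root, veth_ns = f"vr-{tap}", f"vn-{tap}"
    for name in (tap, br, veth_root, veth_ns):
        if len(name) >= IFNAMSIZ:
            raise ValueError(f"interface name too long: {name!r}")

    def in_ns(*cmd: str) -> list[str]:
        # run a command inside the VM's namespace
        return ["ip", "netns", "exec", ns, *cmd]

    return [
        ["ip", "netns", "add", ns],
        ["ip", "link", "set", tap, "netns", ns],            # move the tap into the namespace
        in_ns("ip", "link", "add", br, "type", "bridge"),   # per-VM bridge for ebtables rules
        ["ip", "link", "add", veth_root, "type", "veth", "peer", "name", veth_ns],
        ["ip", "link", "set", veth_ns, "netns", ns],         # one veth end into the namespace
        ["ip", "link", "set", veth_root, "master", root_bridge, "up"],
        in_ns("ip", "link", "set", veth_ns, "master", br, "up"),
        in_ns("ip", "link", "set", tap, "master", br, "up"),
        in_ns("ip", "link", "set", br, "up"),
        # per-VM ebtables rules would then be loaded with in_ns("ebtables", ...)
    ]


if __name__ == "__main__":
    for cmd in netns_plan("vnet0"):
        print(" ".join(cmd))
```

Each entry could be run as root with `subprocess.run(cmd, check=True)`. Writing the plan out also makes the per-VM overhead of the proposal explicit: one extra namespace, one bridge and one veth pair per VM, all of which sit on the data path.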
Yeah, I don't really like the idea of introducing extra devices into
the I/O path for every NIC, as it will burn extra CPU and introduce
latency.
I don't really have a particular suggestion for fixing the perf problem
offhand, other than my note about nftables supposedly allowing us to fix
this problem. RHEL-8 & Fedora 30 will both be nftables-based, so it is
imminently available as a solution for libvirt, assuming it does in fact
let us solve the perf problem.
The hard thing is that we'll need some significant work in the nwfilter
driver to port it to native nft commands. Just using the legacy iptables
compat tools does use nft underneath, but not in a way that would let us
get the perf benefit IIUC.
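One way a native nft port might avoid the linear per-port scan, offered here as an assumption rather than a worked-out design, is a bridge-family postrouting chain that looks up the output port in a verdict map (vmap) and jumps straight to that port's chain. The Python sketch below renders such a ruleset; the table name, the vnetN_out chain names and the example rule are hypothetical.

```python
from __future__ import annotations


def render_bridge_vmap(port_rules: dict[str, list[str]], table: str = "nwfilter_demo") -> str:
    """Render a bridge-family table; port_rules maps a port name to its rules."""
    if not port_rules:
        raise ValueError("need at least one port")  # avoid rendering an empty anonymous vmap
    lines = [f"table bridge {table} {{"]
    # Declare per-port chains first so every jump target already exists before the vmap.
    for ifname, rules in port_rules.items():
        lines.append(f"    chain {ifname}_out {{")
        lines += [f"        {rule}" for rule in rules]
        lines.append("    }")
    elements = ", ".join(f'"{ifname}" : jump {ifname}_out' for ifname in port_rules)
    lines += [
        "    chain postrouting {",
        "        # base chain attached to the bridge postrouting hook",
        "        type filter hook postrouting priority 0; policy accept;",
        f"        oifname vmap {{ {elements} }}",
        "    }",
        "}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_bridge_vmap({
        "vnet0": ["ether type arp accept"],
        "vnet1": ["ether type arp accept"],
    }))
```

With this layout each cloned broadcast frame needs a single map lookup to reach its port's chain, instead of walking on average half of the per-port dispatch rules; ports missing from the map simply fall through to the chain's accept policy.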
Regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|