On 5/3/23 11:40 AM, Daniel P. Berrangé wrote:
On Sun, Apr 30, 2023 at 11:19:15PM -0400, Laine Stump wrote:
> When I first started on this (long, protracted, repeatedly interrupted
> for extended periods - many of these patches are > a year old) task, I
> considered doing an all-at-once complete replacement of iptables with
> nftables, since all the Linux distros we support have had nftables for
> several years, and I'm pretty sure nobody has it disabled (not even
> sure if it's possible to disable nftables while still enabling
> iptables, since they both use xtables in the kernel). But due to
> libvirt's use of "-t mangle -j CHECKSUM --checksum-fill" (see commit
> fd5b15ff all the way back in July 2010 for details) which has no
> equivalent in nftables rules (and we don't *want* it to!!), and the
> desire to be able to easily switch back to iptables in case of an
> unforeseen regression, we decided that both iptables and nftables need
> to be supported (for now), with the default (for now) remaining as
> iptables.
What happens if you use iptables -j CHECKSUM --checksum-fill, and
this is just the iptables compat shim that is secretly talking to
NFT in the kernel ? Is the checksum stuff just ignored entirely
then ?
If so, then the key decision factor for us, is whether any of the
supported distros still officially support use of the iptables KMOD
instead of nft KMOD.
> So, what I've ended up with is:
>
> 1) a network.conf file (didn't exist before) with a single setting
> "firewall_backend". If unset, the network driver tries to use iptables
> on the backend, and if that's missing, then tries to use nftables.
For most distros I feel like nftables should be the default
and not require a user opt in. That raises the question of
how do we deal with the upgrade path.
Historically when starting libvirt we'll blow away our existing
iptables rules and create them fresh. For an upgrade path we'll
need a variant which blows away iptables rules and instead creates
nftables rules. If that works, then we could default to nftables
without user config.
Since my patches do that, I would be totally fine with preferring
nftables over iptables when no config is present. It would definitely
give a better feeling of "modernizing" if we could do that.
(Actually, I was considering that when I tweak Eric Garver's patches for
native firewalld networks, I would prefer firewalld over nftables if
firewalld.service is running. In that case, the order of preference
(still overrideable, of course) would be firewalld->nftables->iptables,
which from a functionality standpoint makes the most sense.)
> We had spent some time in IRC discussing different ways of using new
> functionality available in nftables to make a more
> efficient/performant implemention of the desired filtering, and there
> are some really great possibilities that need to be explored, but in
> the end there were too many details up in the air, and I decided that
> it would be more "accomplishable" (coined a new word there!) to first
> replicate existing behavior with nftables, but do it inside a
> framework that makes it easy to modify the details in the future (in
> particular making it painless to switch back and forth between builds
> with differing filter models at runtime) - this way we'll be able to
> separate the infrastructure work from the details of the rules (which
> we can then more easily work on and experiment with). (This implies
> that the main objective right now is "get rid of iptables
> dependencies", not "make the filtering faster and more efficient").
I feel like filtering wrt virtual networks isn't our main
performance bottleneck. Most machines only have a couple
of virtual networks, so most host traffic only has to
match a few rules.
With nwfilter though, if we have 1000 VMs we'll have
1000 top level rules that need evaluating for every
packet. IOW, the performance optimizations allowed
by nftables are much more relevant to nwfilter.
That's a good point, and makes me feel better about not trying to
immediately optimize the rules created here. (although I do recall
someone once a long time ago who filed a bug about poor performance when
they had > 500 virtual networks :-P)
(OTOH, I've been reading through your response to 18/28, and what you're
suggesting there, in the name of simpler removal rather than runtime
performance, is changing what rules are generated in the initial
version. I'm still wrapping my head around that - I had confused "chain"
with "hook" in a strange way, and thought that the behavior was
different than it apparently is).
> * allows switching between iptables/nftables backends without
> rebooting or restarting networks/guests.
>
> Because the commands required to remove a network's filter rules are
> now saved in the network status XML, each time libvirtd (or
> virtnetworkd) is restarted, it will execute exactly the commands
> needed to remove the filter rules that had been added by the
> previous libvirtd/virtnetworkd (rather than just making a guess, as
> we've always done up until now), and then add new rules using the
> current backend+binary's set of rules (while also saving the info
> needed for future removal of these new rules back into the network's
> status XML).
Ok that's answering my question about upgrades then. It looks like we
should be able to default to nftables out of the box, if we assume
that absence of info in the network status XML implies iptables.
Yep, that's what the code does - if there is no <fwRemoval> element,
then we assume that the current network instance was setup with iptables
rules.
> * virFirewall does *not* provide a backend-agnostic interface [this is fine]
>
> * We need to maintain a backward-compatible API for virFirewall so
> that we don't have to touch nwfilter code. Trying to make its API
> backend-agnostic would require individually considering/changing
> every nwfilter use of virFirewall.
>
> * instead virFirewall objects are just a way to build a collection
> of commands to execute to build a firewall, then execute them
> while collecting info for and building a collection of commands
> that will tear down that firewall in the future.
>
> Do I want to "fix" this in the future by making virFirewall a higher
> level interface that accepts tokens describing the type of rule to
> add (rather than backend-specific arguments to a backend-specific
> command)? No. I think I like the way virFirewall works (as
> described in that previous bullet-point), instead I'm thinking that
> it is just slightly mis-named - I've lately been thinking of it as a
> "virNetFilterCmdList". Similarly, the virFirewallRules that it has a
> list of aren't really "rules", they are better described as
commands
> or actions, so maybe they should be renamed to virNetfilterCmd or
> virNetfilterAction. But that is just cosmetic, so I didn't want to
> get into it in these patches (especially in case someone disagrees,
> or has a better idea for naming).
The main thing I'd like to see is to maintain the clean split of
code between what is general purpose, and what is specific to the
virtual network driver and what is specific to the nwfilter
driver. AFAICT, this series mixes that up with circular dependancies
between viriptables.c, virfirewall.c and virnetfilter.c.
Yeah, after reading through your comments about that part, I agree. It
all started because viriptables.[ch] were originally put in util instead
of network, and I was too focused on maintaining the status quo where
possible. As for the circular dependency, that was one of the bits where
I felt a bit dirty writing the code, as mentioned earlier. I'm going to
try and eliminate that.
I'd like virfirwall.c to remain indepent of anything related to
the virtual network driver.
Definitely. And if viriptables is moved from util to network, then we
*must* eliminate this bit.