On 4/23/24 7:15 AM, Daniel P. Berrangé wrote:
> On Sun, Apr 21, 2024 at 10:53:32PM -0400, Laine Stump wrote:
> Support using nftables to setup the firewall for each virtual network,
> rather than iptables. The initial implementation of the nftables
> backend creates (almost) exactly the same ruleset as the iptables
> backend, determined by running the following commands on a host that
> has an active virtual network:
>
> iptables-save >iptables.txt
> iptables-restore-translate -f iptables.txt
>
> (and the similar ip6tables-save/ip6tables-restore-translate for an
> IPv6 network). Correctness of the new backend was checked by comparing
> the output of:
>
> nft list ruleset
>
> when the backend is set to iptables and when it is set to nftables.
>
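(For anyone who wants to repeat the check, the comparison boils down to
something like the following; the file names are just placeholders, and
the daemon needs a restart after switching the backend via the
firewall_backend setting described further down.)

  iptables-save > iptables.txt
  iptables-restore-translate -f iptables.txt > translated.nft
  # translated.nft was the reference used when writing the new backend
  nft list ruleset > ruleset-iptables.txt    # iptables backend active
  # switch firewall_backend in network.conf, restart virtnetworkd
  nft list ruleset > ruleset-nftables.txt    # nftables backend active
  diff -u ruleset-iptables.txt ruleset-nftables.txt
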
> This page was used as a guide:
>
>
> https://wiki.nftables.org/wiki-nftables/index.php/Moving_from_iptables_to...
>
> The only differences between the rules created by the nftables backend
> vs. the iptables backend (aside from a few inconsequential changes in
> display order of some chains/options) are:
>
> 1) When we add nftables rules, rather than adding them in the
> system-created "filter" and "nat" tables, we add them in a private
> table (ie only we should be using it) created by us called "libvirt"
> (the system-created "filter" and "nat" tables can't be used because
> adding any rules to those tables directly with nft will cause failure
> of any legacy application attempting to use iptables when it tries to
> list the iptables rules (e.g. "iptables -S")).
>
> (NB: in nftables only a single table is required for both nat and
> filter rules - the chains for each are differentiated by specifying
> different "hook" locations for the toplevel chain of each)
>
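(To make item (1) concrete: with the nftables backend, "nft list
ruleset" ends up with something roughly shaped like the sketch below.
The chain names and priorities here are made up for illustration - the
point is just that a single "libvirt" table per family holds both the
filter-style and nat-style base chains, distinguished only by their
hook declarations.)

  table ip libvirt {
    chain forward {
      type filter hook forward priority filter; policy accept;
      # per-network accept/reject rules hang off of this
    }
    chain postroute {
      type nat hook postrouting priority srcnat; policy accept;
      # per-network masquerade rules hang off of this
    }
  }
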
> 2) nftables doesn't support the "checksum mangle" rule (or any
> equivalent functionality) that we have historically added to our
> iptables rules, so the nftables rules we add have nothing related to
> checksum mangling.
>
> (NB: The result of (2) is that if you a) have a very old guest (RHEL5
> era or earlier) and b) that guest is using a virtio-net network
> device, and c) the virtio-net device is using vhost packet processing
> (the default) then DHCP on the guest will fail. You can work around
> this by adding <driver name='qemu'/> to the <interface> XML for the
> guest).
>
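(For reference, the rule that gets dropped is the one the iptables
backend has always added for the benefit of those old guests - roughly
the following, from memory, with virbr0 standing in for whichever
bridge the network uses; in the real ruleset it lives in a LIBVIRT_*
chain of the mangle table rather than directly in POSTROUTING.)

  iptables -t mangle -A POSTROUTING -o virbr0 -p udp --dport 68 \
      -j CHECKSUM --checksum-fill
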
> There are certainly much better nftables rulesets that could be used
> instead of those implemented here, and everything is in place to make
> future changes to the rules that are used simple and free of surprises
> (e.g. the rules that are added have corresponding "removal" commands
> added to the network status so that we will always remove exactly the
> rules that were previously added rather than trying to remove the
> rules that "this build of libvirt would have added" (which will be
> incorrect the first time we run a libvirt with a newly modified
> ruleset)). For this initial implementation though, I wanted the
> nftables rules to be as identical to the iptables rules as possible,
> just to make it easier to verify that everything is working.
>
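(In other words, the network status XML records literal deletion
commands alongside whatever was added. Purely as a hypothetical
illustration - the chain name and handle number are invented - a
recorded removal boils down to something like:

  nft delete rule ip libvirt forward handle 5

which is exactly what the handle-scraping code quoted further down is
there to support.)
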
> The backend can be manually chosen using the firewall_backend setting
> in /etc/libvirt/network.conf. libvirtd/virtnetworkd will read this
> setting when it starts; if there is no explicit setting, it will look
> for iptables commands on the host and, if they are found, will select
> the iptables backend (this is the default for the sake of 100%
> backward compatibility), but if iptables commands aren't found, then
> it will use the nftables backend.
>
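(i.e. to force the choice instead of relying on that auto-detection,
the setting in /etc/libvirt/network.conf would be something like the
line below - assuming the value strings simply match the backend
names.)

  # /etc/libvirt/network.conf
  firewall_backend = "nftables"
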
> (Since libvirt will automatically remove all its previous filter rules
> and re-add rules using the current backend setting for all active
> networks when it starts up, and the only noticeable change in behavior
> between the iptables and nftables backends is that noted in item (2)
> above, we could instead decide to make nftables the default backend
> rather than iptables - it all depends on how important it is to work
> properly on 15 year old guest OSes using DHCP with virtio-net
> interfaces).
>
> Signed-off-by: Laine Stump <laine(a)redhat.com>
> ---
> po/POTFILES | 1 +
> src/network/bridge_driver_conf.c | 3 +
> src/network/bridge_driver_linux.c | 18 +-
> src/network/meson.build | 1 +
> src/network/network.conf | 17 +-
> src/network/network_nftables.c | 940 ++++++++++++++++++++++++++++++
> src/network/network_nftables.h | 28 +
> src/util/virfirewall.c | 169 +++++-
> src/util/virfirewall.h | 2 +
> 9 files changed, 1174 insertions(+), 5 deletions(-)
> create mode 100644 src/network/network_nftables.c
> create mode 100644 src/network/network_nftables.h
>
> + if (needRollback) {
> + virFirewallCmd *rollback = virFirewallAddRollbackCmd(firewall, fwCmd->layer, NULL);
> + const char *handleStart = NULL;
> + size_t handleLen = 0;
> + g_autofree char *handleStr = NULL;
> + g_autofree char *rollbackStr = NULL;
> +
> + /* Search for "# handle n" in stdout of the nft add command -
> + * that is the handle of the table/rule/chain that will later
> + * need to be deleted.
> + */
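(For context: nft run with --echo and --handle prints back what it just
added with the assigned handle appended, which is the "# handle n" text
this code scans for - presumably that, or an equivalent option, is how
the command is invoked. The rule and handle number below are invented
for illustration.)

  $ nft --echo --handle add rule ip libvirt forward \
        ct state established,related accept
  add rule ip libvirt forward ct state established,related accept # handle 17
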
> What are the uniqueness guarantees of handle numbers?
Each table has a monotonically increasing counter (I'd assume at least
32 bits), so it shouldn't be a problem.
But WAIT!!! - While typing this reply I've discovered something new!
Until about 45 minutes ago, I had assumed there was a single systemwide
counter. But based on your question, I asked egarver who told me that
there is a counter for each table. And we create our own tables (called
"libvirt", for both ip and ip6). I just tried manually deleting the
libvirt table, and in that case the counter does start over from 0! :-O
At first thought this seems catastrophic, but there are some mitigating
factors:
1) libvirt is the only user of the libvirt tables, and libvirt never
deletes them (and nobody else should either). Anybody who deletes the
libvirt tables could have also just killed all the qemu processes on the
host (or corrupted the disk images or any number of other things)
2) If someone manually deletes the libvirt table (which is really simple
for root: "nft delete table ip libvirt") and virtqemud times out and
exits prior to the next call to the network driver, then:
a) networking for all guests will be broken until
virtqemud is restarted (by someone requesting
something from the network driver).
b) when virtqemud is restarted, it will delete/re-add all
the rules for all the networks. If the networks'
firewall rules are reloaded in the same order that
the networks were originally started, then the rule
deletions will all fail, the new rules will be added
and everything will once again be okay.
c) But if the order of the networks is different, then
the 2nd network to be reloaded might possibly delete
the rules of the 1st network (if nw2 had been the
first to be started originally), and so guest
networking would remain broken. This breakage would
be fixed by a 2nd restart of virtqemud (assuming the
networks were processed in the same order as before).
3) If, on the other hand, the libvirt table is manually deleted while
virtqemud is still running, and someone attempts to start a new network
before virtqemud exits, then:
a) all guest networking will still be broken
b) if nobody else has manually created a new libvirt
table + all the LIBVIRT_XXX chains in the meantime,
then attempting to start the network will lead to
an ugly error.
c) If someone *did* recreate *exactly* the same libvirt table
with the same chains, then the new network would start, and
work properly, but all the other libvirt networks would be
unusable until virtqemud exited and restarted (and the first
time it restarted, you might end up with the same issues
as (2c)).
So the only way this would happen is by someone with root access
nefariously (or ignorantly) manually deleting libvirt's table, and the
extent of the breakage would be that libvirt-managed guests would lose
their networking until virtqemud was restarted twice in succession.
I can't decide if this is a case of "Ooh! We'd better try to protect
against this!", or "Well, you deliberately broke it, so you get to pick
up the pieces!"
> Consider the scenario:
> 1. Libvirt adds a rule, and the kernel reports "handle 1729".
> 2. Some admin or other application deletes *our* rule.
> 3. We now stop our network and delete handle 1729.
> Step (3) is obviously going to be redundant, but what happens in
> practice? Will libvirt be *guaranteed* to get an error from
> trying to delete handle 1729? It would be bad if, between steps 2
> and 3, the kernel could assign handle 1729 to a completely
> different application adding rules - it would cause libvirt to
> delete someone else's rule by mistake.
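(For concreteness, the per-handle deletion being discussed is a command
along the lines of the following, with the chain name just
illustrative:)

  nft delete rule ip libvirt forward handle 1729
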
> Obviously handles can be reused across reboots too. The status XML
> should be on /run which should be a tmpfs that's purged.
That had been the extent of my thinking about the issue - a system
would have to add/delete "at least 4 billion" rules in order to get to
the point of reusing a handle (update - unless the libvirt table is
deleted/re-added, but libvirt will never do that, and nobody else should
touch the libvirt table), and a reboot would clear our memory of handles
issued by previous boots.
> This makes me wonder if, for safety, we should record the current
> system boot timestamp, and validate that it matches before trying to
> delete based on handle number?
I suppose so, if we want to be ironclad. But is this the only issue we would
have if someone's /run persisted across reboots? And I'm assuming this
would likely break a lot of other software that assumes /run is cleared
on reboot, wouldn't it?
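(If we did decide to go that far, one possible shape for it - purely a
sketch, nothing implemented - would be to store the kernel's boot id
next to the handles in the status XML and refuse to delete by handle
when it no longer matches:

  cat /proc/sys/kernel/random/boot_id
  # if the value recorded in the network status XML differs from the
  # current one, skip all of the delete-by-handle rollback commands
  # instead of running them

A boot timestamp taken from the "btime" line of /proc/stat would work
just as well; the only requirement is that it changes on every boot.)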
How deep do we want to crawl down this rabbit hole? (That's not
rhetorical, I'm truly wondering :-))