Many long years ago (April 2010), soon after "vhost" in-kernel packet
processing was added to the virtio-net driver, people running RHEL5
virtual machines with a virtio-net interface connected via a libvirt
virtual network noticed that when vhost packet processing was enabled,
their VMs could no longer get an IP address via DHCP - the guest was
ignoring the DHCP response packets sent by the host.
The (as danpb calls them) "gory details" of this are chronicled here:
https://lists.isc.org/pipermail/dhcp-hackers/2010-April/001835.html
but basically it was because the checksum of packets wasn't being
fully computed on the host side (because the host had checksum
offloading enabled and thought that it would be taken care of later,
e.g. with NIC hardware), while these packets going from a tap device
to a virtio-net NIC in a guest wouldn't get that service, and the
packets would arrive with a "bad checksum".
The "fix" for this ended up being that iptables added a new
"--checksum-fill" action, and libvirt added an iptables rule for each
virtual network to match DHCP response packets and perform
--checksum-fill.
In the meantime, the ISC DHCP package (which contains the dhclient
program that had been rejecting the bad checksum packets) made a
separate fix to their dhclient which caused it to accept packets
anyway even if they didn't have a proper checksum (NB: that's not a
full explanation, and possibly not accurate). The word at the time
from those "in the know" was that the bad checksum problem was really
specific to ISC's dhclient, and so once their fix was in use
everywhere dhclient was used, the problem would be a thing of the past
and the checksum fixup iptables rules would no longer be needed (but
would otherwise be harmless if it was still there).
Based on this information (and also due to the opinion that fixing the
problem by having iptables modify the packet checksum was the wrong
way to fix things), the nftables developers made the decision to not
implement an equivalent to --checksum-fill in nftables. As a result,
when I wrote the nftables firewall backend for libvirt virtual
networks, it didn't add in any rule to "fix" broken UDP checksums
(after all, that was fixed somewhere else 14 years ago, right???)
Cut to last week, when Rich Jones was doing routine testing using
Fedora 40 (the first Fedora release to use the nftables backend of
libvirt's network driver by default) and a FreeBSD guest - for "some
strange reason", the FreeBSD guest was unable to get an IP address
from DHCP!!
https://www.spinics.net/linux/fedora/libvirt-users/msg14356.html
A few quick tests proved that it was the same old "bad checksum"
problem from 2010 come back to haunt us.
After some discussion with Phil Sutter and Eric Garver (nftables
people), they suggested that, while nftables doesn't have an action
that will *compute* the checksum of a packet, it *does* have an action
that will set the checksum to 0, and that maybe we should try
that. Then Phil tried it himself by manually adding such a rule to a
running system, and verified that it did fix the issue at least for
FreeBSD guests.
So over the weekend I came up with a patch to add a checksum 0 rule to
the rules setup for each virtual network. This is that patch.
I have so far verified that this patch enables FreeBSD to receive the
DHCP response and get an IP address, and that it hasn't *broken* this
functionality for a random old Fedora image I had (Fedora 27!?!?! I
really need to update my test images!!). Before pushing it I would
like to verify that zeroing the checksum of DHCP response packets
doesn't break any other guest, so I would appreciate the help of
anyone who could build and install libvirt with this patch and let me
know of both successes and failures of any guest to acquire an IP
address with DHCP. Once I've received enough positive reports (and 0
negative reports!) then we can think about pushing this patch (and
also backporting it downstream to Fedora 40)
Signed-off-by: Laine Stump <laine(a)redhat.com>
---
src/network/network_nftables.c | 69 +++++++++++++++++++
.../forward-dev-linux.nftables | 16 +++++
.../isolated-linux.nftables | 16 +++++
.../nat-default-linux.nftables | 16 +++++
.../nat-ipv6-linux.nftables | 16 +++++
.../nat-ipv6-masquerade-linux.nftables | 16 +++++
.../nat-many-ips-linux.nftables | 16 +++++
.../nat-port-range-ipv6-linux.nftables | 16 +++++
.../nat-port-range-linux.nftables | 16 +++++
.../nat-tftp-linux.nftables | 16 +++++
.../route-default-linux.nftables | 16 +++++
11 files changed, 229 insertions(+)
diff --git a/src/network/network_nftables.c b/src/network/network_nftables.c
index f8b5ab665d..5523207269 100644
--- a/src/network/network_nftables.c
+++ b/src/network/network_nftables.c
@@ -51,6 +51,7 @@ VIR_LOG_INIT("network.nftables");
#define VIR_NFTABLES_FWD_OUT_CHAIN "guest_output"
#define VIR_NFTABLES_FWD_X_CHAIN "guest_cross"
#define VIR_NFTABLES_NAT_POSTROUTE_CHAIN "guest_nat"
+#define VIR_NFTABLES_MANGLE_POSTROUTE_CHAIN "postroute_mangle"
/* we must avoid using the standard "filter" table as used by
* iptables, as any subsequent attempts to use iptables commands will
@@ -106,6 +107,10 @@ nftablesGlobalChain nftablesChains[] = {
/* chains for NAT rules */
{NULL, VIR_NFTABLES_NAT_POSTROUTE_CHAIN, "{ type nat hook postrouting priority
100; policy accept; }"},
+
+ /* chain for "mangle" rules that modify packets (e.g. 0 out UDP checksums)
*/
+ {NULL, VIR_NFTABLES_MANGLE_POSTROUTE_CHAIN, "{ type filter hook postrouting
priority 0; policy accept; }"},
+
};
@@ -644,6 +649,44 @@ nftablesAddDontMasquerade(virFirewall *fw,
}
+/**
+ * nftablesAddOutputFixUdpChecksum:
+ *
+ * Add a rule to @fw that will 0 out the checksum of udp packets
+ * output from @iface with destination port @port.
+
+ * Zeroing the checksum of a UDP packet tells the receiving end "you
+ * don't need to validate the checksum", which is useful in cases
+ * where the host (sender) thinks that packet checksums will be
+ * computed elsewhere (and so leaves a partially computed checksum in
+ * the packet header) while the guest (receiver) thinks that the
+ * checksum has already been fully computed; in the meantime none of
+ * the code in between has actually finished computing the
+ * checksum.
+ *
+ * An example of this is DHCP response packets from host to
+ * guest. If the checksum of each of these packets isn't zeroed, then
+ * many guests (e.g. FreeBSD) will drop them with reason BAD CHECKSUM;
+ * if the packets arrive at those guests with a checksum of 0, they
+ * will happily accept the packet.
+ */
+static void
+nftablesAddOutputFixUdpChecksum(virFirewall *fw,
+ const char *iface,
+ int port)
+{
+ g_autofree char *portstr = g_strdup_printf("%d", port);
+
+ virFirewallAddCmd(fw, VIR_FIREWALL_LAYER_IPV4,
+ "insert", "rule", "ip",
+ VIR_NFTABLES_PRIVATE_TABLE,
+ VIR_NFTABLES_MANGLE_POSTROUTE_CHAIN,
+ "oif", iface, "udp", "dport",
portstr,
+ "counter", "udp", "checksum",
"set", "0",
+ NULL);
+}
+
+
static const char networkLocalMulticastIPv4[] = "224.0.0.0/24";
static const char networkLocalMulticastIPv6[] = "ff02::/16";
static const char networkLocalBroadcast[] = "255.255.255.255/32";
@@ -901,6 +944,30 @@ nftablesAddGeneralFirewallRules(virFirewall *fw,
}
+static void
+nftablesAddChecksumFirewallRules(virFirewall *fw,
+ virNetworkDef *def)
+{
+ size_t i;
+ virNetworkIPDef *ipv4def;
+
+ /* Look for the first IPv4 address that has dhcp or tftpboot
+ * defined. We support dhcp config on 1 IPv4 interface only.
+ */
+ for (i = 0; (ipv4def = virNetworkDefGetIPByIndex(def, AF_INET, i)); i++) {
+ if (ipv4def->nranges || ipv4def->nhosts)
+ break;
+ }
+
+ /* If we are doing local DHCP service on this network, add a rule
+ * that will fixup the checksum of DHCP response packets back to
+ * the guests.
+ */
+ if (ipv4def)
+ nftablesAddOutputFixUdpChecksum(fw, def->bridge, 68);
+}
+
+
static int
nftablesAddIPSpecificFirewallRules(virFirewall *fw,
virNetworkDef *def,
@@ -952,6 +1019,8 @@ nftablesAddFirewallRules(virNetworkDef *def, virFirewall
**fwRemoval)
return -1;
}
+ nftablesAddChecksumFirewallRules(fw, def);
+
if (virFirewallApply(fw) < 0)
return -1;
diff --git a/tests/networkxml2firewalldata/forward-dev-linux.nftables
b/tests/networkxml2firewalldata/forward-dev-linux.nftables
index 8badb74beb..9dea1a88a4 100644
--- a/tests/networkxml2firewalldata/forward-dev-linux.nftables
+++ b/tests/networkxml2firewalldata/forward-dev-linux.nftables
@@ -156,3 +156,19 @@ daddr \
224.0.0.0/24 \
counter \
return
+nft \
+-ae insert \
+rule \
+ip \
+libvirt_network \
+postroute_mangle \
+oif \
+virbr0 \
+udp \
+dport \
+68 \
+counter \
+udp \
+checksum \
+set \
+0
diff --git a/tests/networkxml2firewalldata/isolated-linux.nftables
b/tests/networkxml2firewalldata/isolated-linux.nftables
index d1b4dac178..67ee0a2bf5 100644
--- a/tests/networkxml2firewalldata/isolated-linux.nftables
+++ b/tests/networkxml2firewalldata/isolated-linux.nftables
@@ -62,3 +62,19 @@ oif \
virbr0 \
counter \
accept
+nft \
+-ae insert \
+rule \
+ip \
+libvirt_network \
+postroute_mangle \
+oif \
+virbr0 \
+udp \
+dport \
+68 \
+counter \
+udp \
+checksum \
+set \
+0
diff --git a/tests/networkxml2firewalldata/nat-default-linux.nftables
b/tests/networkxml2firewalldata/nat-default-linux.nftables
index 28508292f9..951a5a6d60 100644
--- a/tests/networkxml2firewalldata/nat-default-linux.nftables
+++ b/tests/networkxml2firewalldata/nat-default-linux.nftables
@@ -142,3 +142,19 @@ daddr \
224.0.0.0/24 \
counter \
return
+nft \
+-ae insert \
+rule \
+ip \
+libvirt_network \
+postroute_mangle \
+oif \
+virbr0 \
+udp \
+dport \
+68 \
+counter \
+udp \
+checksum \
+set \
+0
diff --git a/tests/networkxml2firewalldata/nat-ipv6-linux.nftables
b/tests/networkxml2firewalldata/nat-ipv6-linux.nftables
index d8a9ba706d..617ed8b753 100644
--- a/tests/networkxml2firewalldata/nat-ipv6-linux.nftables
+++ b/tests/networkxml2firewalldata/nat-ipv6-linux.nftables
@@ -200,3 +200,19 @@ oif \
virbr0 \
counter \
accept
+nft \
+-ae insert \
+rule \
+ip \
+libvirt_network \
+postroute_mangle \
+oif \
+virbr0 \
+udp \
+dport \
+68 \
+counter \
+udp \
+checksum \
+set \
+0
diff --git a/tests/networkxml2firewalldata/nat-ipv6-masquerade-linux.nftables
b/tests/networkxml2firewalldata/nat-ipv6-masquerade-linux.nftables
index a7f09cda59..a710d0e296 100644
--- a/tests/networkxml2firewalldata/nat-ipv6-masquerade-linux.nftables
+++ b/tests/networkxml2firewalldata/nat-ipv6-masquerade-linux.nftables
@@ -272,3 +272,19 @@ daddr \
ff02::/16 \
counter \
return
+nft \
+-ae insert \
+rule \
+ip \
+libvirt_network \
+postroute_mangle \
+oif \
+virbr0 \
+udp \
+dport \
+68 \
+counter \
+udp \
+checksum \
+set \
+0
diff --git a/tests/networkxml2firewalldata/nat-many-ips-linux.nftables
b/tests/networkxml2firewalldata/nat-many-ips-linux.nftables
index b826fe6134..0be5fb7e65 100644
--- a/tests/networkxml2firewalldata/nat-many-ips-linux.nftables
+++ b/tests/networkxml2firewalldata/nat-many-ips-linux.nftables
@@ -366,3 +366,19 @@ daddr \
224.0.0.0/24 \
counter \
return
+nft \
+-ae insert \
+rule \
+ip \
+libvirt_network \
+postroute_mangle \
+oif \
+virbr0 \
+udp \
+dport \
+68 \
+counter \
+udp \
+checksum \
+set \
+0
diff --git a/tests/networkxml2firewalldata/nat-port-range-ipv6-linux.nftables
b/tests/networkxml2firewalldata/nat-port-range-ipv6-linux.nftables
index ceaed6fa40..7574356855 100644
--- a/tests/networkxml2firewalldata/nat-port-range-ipv6-linux.nftables
+++ b/tests/networkxml2firewalldata/nat-port-range-ipv6-linux.nftables
@@ -384,3 +384,19 @@ daddr \
ff02::/16 \
counter \
return
+nft \
+-ae insert \
+rule \
+ip \
+libvirt_network \
+postroute_mangle \
+oif \
+virbr0 \
+udp \
+dport \
+68 \
+counter \
+udp \
+checksum \
+set \
+0
diff --git a/tests/networkxml2firewalldata/nat-port-range-linux.nftables
b/tests/networkxml2firewalldata/nat-port-range-linux.nftables
index 1dc37a26ec..127536e4db 100644
--- a/tests/networkxml2firewalldata/nat-port-range-linux.nftables
+++ b/tests/networkxml2firewalldata/nat-port-range-linux.nftables
@@ -312,3 +312,19 @@ oif \
virbr0 \
counter \
accept
+nft \
+-ae insert \
+rule \
+ip \
+libvirt_network \
+postroute_mangle \
+oif \
+virbr0 \
+udp \
+dport \
+68 \
+counter \
+udp \
+checksum \
+set \
+0
diff --git a/tests/networkxml2firewalldata/nat-tftp-linux.nftables
b/tests/networkxml2firewalldata/nat-tftp-linux.nftables
index 28508292f9..951a5a6d60 100644
--- a/tests/networkxml2firewalldata/nat-tftp-linux.nftables
+++ b/tests/networkxml2firewalldata/nat-tftp-linux.nftables
@@ -142,3 +142,19 @@ daddr \
224.0.0.0/24 \
counter \
return
+nft \
+-ae insert \
+rule \
+ip \
+libvirt_network \
+postroute_mangle \
+oif \
+virbr0 \
+udp \
+dport \
+68 \
+counter \
+udp \
+checksum \
+set \
+0
diff --git a/tests/networkxml2firewalldata/route-default-linux.nftables
b/tests/networkxml2firewalldata/route-default-linux.nftables
index 282c9542a5..be9c4f5439 100644
--- a/tests/networkxml2firewalldata/route-default-linux.nftables
+++ b/tests/networkxml2firewalldata/route-default-linux.nftables
@@ -56,3 +56,19 @@ oif \
virbr0 \
counter \
accept
+nft \
+-ae insert \
+rule \
+ip \
+libvirt_network \
+postroute_mangle \
+oif \
+virbr0 \
+udp \
+dport \
+68 \
+counter \
+udp \
+checksum \
+set \
+0
--
2.47.0