
On 10/14/24 12:06 PM, Daniel P. Berrangé wrote:
On Mon, Oct 14, 2024 at 04:55:37PM +0100, Daniel P. Berrangé wrote:
On Mon, Oct 14, 2024 at 04:37:42PM +0100, Richard W.M. Jones wrote:
On Mon, Oct 14, 2024 at 10:46:22AM -0400, Laine Stump wrote:
On 10/14/24 5:35 AM, Richard W.M. Jones wrote:
On Mon, Oct 14, 2024 at 09:52:13AM +0100, Daniel P. Berrangé wrote:
Urgh, I wonder if this is fallout from switching to NFT instead of iptables.
I can list the firewall rules if you tell me what I'm looking for ...
IIUC, the NFT kernel maintainers didn't implement checksum fixup rules, since they believed that all modern distros had long since fixed their bugs wrt mangled checksums.
That's the first thing that came to my mind too - maybe RHEL5 *isn't* the only guest OS that has this problem. (I certainly hope that isn't the case :-/)
There are two ways to test out this theory:
1) change the setting of "firewall_backend" in /etc/libvirt/network.conf to "iptables" and restart virtnetworkd
(if that does work, then switch back to nftables, restart virtnetworkd, and test again just to make sure the issue wasn't caused by some out-of-place rule)
I changed the setting between nftables and iptables a few times and I can confirm that your theory seems to be correct.
iptables =>
"5 bad udp checksums in 5 packets" message is NOT seen
FreeBSD gets an immediate DHCPOFFER and boots quickly with network
nftables =>
FreeBSD sends 5 DHCPDISCOVER messages
"5 bad udp checksums in 5 packets" reappears
FreeBSD does NOT see DHCPOFFER, although it does seem to remember the offer from the previous boot, so it does get a network connection in the end.
or
2) tell qemu to set up the virtio-net device to do its packet processing in userspace rather than in the kernel. You do this by adding
<driver name='qemu'/>
to the <interface> section.
This also works (with nftables).
If I understand the trace correctly, the bad checksum originates on the Linux host (the reply sent by dnsmasq).
I need to try it again to verify, but my recollection is that (when you're using virtio-net with default settings) the checksums of DHCP packets in one direction or the other *always* show up in tcpdump as having bad checksums, but they still end up getting to the other end with a proper checksum. Sometime in the distant past I *may have* had it explained to me why this happens, but I don't recall now. Anyway, I'm just saying this so that you know the validity of the UDP checksum shouldn't be used as an indicator of whether or not things are "working".
I have to say I also don't really understand what's happening here. Isn't the Linux host sending DHCPOFFER? Why doesn't it set the UDP checksum correctly and/or why would tcpdump report it wrongly if it is setting it?
Here are the original gory details
https://lists.isc.org/pipermail/dhcp-hackers/2010-April/001835.html
TL;DR: we have checksum offload running so the host doesn't fill in any checksums, but the DHCP client then tries to validate the non-existent checksum. Boom.
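To make the TL;DR above concrete, here is a minimal Python sketch of the failure mode. It assumes a simplified model of checksum offload in which the sender leaves only the folded pseudo-header sum in the UDP checksum field and expects later hardware or software to finish the job. The addresses, payload, and helper names are hypothetical and chosen only for illustration; this is not the code of any real DHCP client.

```python
import socket
import struct

def ones_complement_sum(data: bytes) -> int:
    """Folded 16-bit one's-complement sum of data (zero-padded to even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total

def pseudo_header(src: str, dst: str, udp_len: int) -> bytes:
    """IPv4 pseudo-header used for the UDP checksum (protocol 17 = UDP)."""
    return (socket.inet_aton(src) + socket.inet_aton(dst)
            + struct.pack("!BBH", 0, 17, udp_len))

def build_udp(src, dst, sport, dport, payload, offloaded=False) -> bytes:
    """Build a UDP datagram, either fully checksummed or left 'for offload'."""
    length = 8 + len(payload)
    pseudo = pseudo_header(src, dst, length)
    if offloaded:
        # Assumed offload model: only the pseudo-header sum is filled in.
        csum = ones_complement_sum(pseudo)
    else:
        header = struct.pack("!HHHH", sport, dport, length, 0)
        csum = (~ones_complement_sum(pseudo + header + payload)) & 0xFFFF
        if csum == 0:
            csum = 0xFFFF  # a computed zero is transmitted as all ones
    return struct.pack("!HHHH", sport, dport, length, csum) + payload

def strict_validate(src: str, dst: str, datagram: bytes) -> bool:
    """A receiver that always verifies the checksum field it is handed."""
    pseudo = pseudo_header(src, dst, len(datagram))
    return ones_complement_sum(pseudo + datagram) == 0xFFFF

# Hypothetical host and guest addresses on a libvirt-style network.
SRC, DST = "192.168.122.1", "192.168.122.50"
good = build_udp(SRC, DST, 67, 68, b"DHCPOFFER")
offloaded = build_udp(SRC, DST, 67, 68, b"DHCPOFFER", offloaded=True)

print("complete checksum validates: ", strict_validate(SRC, DST, good))
print("offloaded checksum validates:", strict_validate(SRC, DST, offloaded))
```

Under these assumptions the complete datagram validates, while the one that skipped the final checksum step does not. That is the situation a strictly validating client would be in if the checksum is never completed on the way to the guest.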
ISC DHCP fixed this in
https://github.com/isc-projects/dhcp/commit/7ff6ae5aa85754119319def3c7f225a4...
and if I'm interpreting this patch correctly, it is only fixed on Linux - most changes are in lpf.c, which is "Linux Packet Filter", and I'm assuming that codepath won't be used on *BSD.
If correct, then the idea that checksum fixup from iptables is obsolete is incorrect, and we need it added to nftables for parity.
Requiring users to turn off the vhost-net feature is horrible, not just for the user experience of having a broken VM out of the box, but also for performance, as checksum offloading is a good thing if you want fast networking.
Phil Sutter and Eric Garver suggested that we try 0'ing out the checksum of these packets, which is something that nftables *can* do. Phil tried it and it worked for him, so I tried it and it worked for me too. So this weekend I made a patch that will add a rule like this:

  nft -ae insert rule ip libvirt_network postroute_mangle \
      oif virbr0 udp dport 68 counter udp checksum set 0

along with adding a single chain like this to contain all those rules:

  nft add chain ip libvirt_network guest_mangle \
      '{ type filter hook postrouting priority 0; policy accept; }'

I've tested it with FreeBSD and Fedora guests and it works properly with both. I posted the patch to devel@lists.libvirt.org

  https://www.spinics.net/linux/fedora/libvir/msg249203.html

and am hoping that others can also test it to verify that it's not *breaking* dhcp for any other guests (I personally don't have much in the way of Windows guest images, or debian/ubuntu/suse/etc. I could spin some up but it would probably be faster (and less work for me!) if other people just tested with what they have).
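As a rough illustration of why zeroing the checksum can help: in UDP over IPv4, RFC 768 defines an all-zero checksum field as "the transmitter generated no checksum", so a receiver following that rule skips verification. The Python sketch below models this in userspace. The function names, addresses, and the bogus checksum value are hypothetical; this is not how nftables or any particular DHCP client implements it.

```python
import socket
import struct

def ones_complement_sum(data: bytes) -> int:
    """Folded 16-bit one's-complement sum of data (zero-padded to even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total

def udp_checksum_ok(src: str, dst: str, datagram: bytes) -> bool:
    """Receiver-side check per RFC 768: a zero field means 'no checksum sent'."""
    (field,) = struct.unpack("!H", datagram[6:8])
    if field == 0:
        return True  # nothing to verify
    pseudo = (socket.inet_aton(src) + socket.inet_aton(dst)
              + struct.pack("!BBH", 0, 17, len(datagram)))
    return ones_complement_sum(pseudo + datagram) == 0xFFFF

def zero_checksum(datagram: bytes) -> bytes:
    """Userspace model of the effect of a 'udp checksum set 0' rule."""
    return datagram[:6] + b"\x00\x00" + datagram[8:]

# Hypothetical addresses; 0x1234 is an arbitrary wrong checksum standing in
# for whatever an unfinished offload left in the field.
SRC, DST = "192.168.122.1", "192.168.122.50"
payload = b"DHCPOFFER"
bad = struct.pack("!HHHH", 67, 68, 8 + len(payload), 0x1234) + payload

print("as sent:       ", udp_checksum_ok(SRC, DST, bad))
print("after zeroing: ", udp_checksum_ok(SRC, DST, zero_checksum(bad)))
```

The trade-off is that a zeroed field gives the guest no UDP-level integrity check for those packets, which is presumably acceptable for traffic that only crosses a local virtual bridge.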