On 10/14/24 12:06 PM, Daniel P. Berrangé wrote:
On Mon, Oct 14, 2024 at 04:55:37PM +0100, Daniel P. Berrangé wrote:
> On Mon, Oct 14, 2024 at 04:37:42PM +0100, Richard W.M. Jones wrote:
>> On Mon, Oct 14, 2024 at 10:46:22AM -0400, Laine Stump wrote:
>>> On 10/14/24 5:35 AM, Richard W.M. Jones wrote:
>>>> On Mon, Oct 14, 2024 at 09:52:13AM +0100, Daniel P. Berrangé wrote:
>>>>>
>>>>> Urgh, I wonder if this is fallout from switching to NFT instead of
>>>>> iptables.
>>>>
>>>> I can list the firewall rules if you tell me what I'm looking for ...
>>>>
>>>>> IIUC, the NFT kernel maintainers didn't implement checksum fixup rules,
>>>>> since they believe that all modern distros would have long ago fixed
>>>>> their bugs wrt mangled checksums.
>>>
>>> That's the first thing that came to my mind too - maybe RHEL5
>>> *isn't* the only guest OS that has this problem. (I certainly hope
>>> that isn't the case :-/)
>>>
>>> There are two ways to test out this theory:
>>>
>>> 1) change the setting of "firewall_backend" in
>>> /etc/libvirt/network.conf to "iptables" and restart virtnetworkd
>>>
>>> (if that does work, then switch back to nftables, restart
>>> virtnetworkd, and test again just to make sure the issue wasn't
>>> caused by some out-of-place rule)
>>
>> I changed the setting between nftables and iptables a few times and I
>> can confirm that your theory seems to be correct.
>>
>> iptables =>
>>
>> "5 bad udp checksums in 5 packets" message is NOT seen
>>
>> FreeBSD gets an immediate DHCPOFFER and boots quickly with network
>>
>> nftables =>
>>
>> FreeBSD sends 5 DHCPDISCOVER messages
>>
>> "5 bad udp checksums in 5 packets" reappears
>>
>> FreeBSD does NOT see DHCPOFFER, although it does seem to remember
>> the offer from the previous boot, so it does get a network
>> connection in the end.
>>
>>> or
>>>
>>> 2) tell qemu to setup the virtio-net device to do its packet
>>> processing in userspace rather than the kernel. You do this by
>>> adding
>>>
>>> <driver name='qemu'/>
>>>
>>> to the <interface> section.
>>
>> This also works (with nftables).
>>
>>>> If I understand the trace correctly, the bad checksum originates on
>>>> the Linux host (the reply sent by dnsmasq).
>>>
>>> I need to try it again to verify, but my recollection is that (when
>>> you're using virtio-net with default settings) the checksums of DHCP
>>> packets in one direction or the other *always* show up in tcpdump as
>>> having bad checksums, but they still end up getting to the other end
>>> with a proper checksum. Sometime in the distant past I *may have*
>>> had it explained to me why this happens, but I don't recall now.
>>> Anyway, I'm just saying this so that you know the validity of the
>>> UDP checksum shouldn't be used as an indicator of whether or not
>>> things are "working".
>>
>> I have to say I also don't really understand what's happening here.
>> Isn't the Linux host sending DHCPOFFER? Why doesn't it set the UDP
>> checksum correctly and/or why would tcpdump report it wrongly if it is
>> setting it?
>
> Here are the original gory details
>
> https://lists.isc.org/pipermail/dhcp-hackers/2010-April/001835.html
>
> TL;DR: we have checksum offload running so the host doesn't fill
> in any checksums, but DHCP client then tries to validate the
> non-existent checksum. Boom.

ISC DHCP fixed this in

https://github.com/isc-projects/dhcp/commit/7ff6ae5aa85754119319def3c7f22...

and if I'm interpreting this patch correctly, it is only fixed on
Linux - most changes are in lpf.c, which is "Linux Packet Filter",
and I'm assuming that codepath won't be used on *BSD.

If correct, then the idea that checksum fixup from iptables is
obsolete is incorrect, and we need it added to nftables for parity.

Requiring users to turn off the vhost-net feature is horrible, not
just for the user experience of not having a broken VM out of
the box, but also for performance, as checksum offloading is a
good thing if you want fast networking.

Phil Sutter and Eric Garver suggested that we try 0'ing out the checksum
of these packets, which is something that nftables *can* do. Phil tried
it and it worked for him, so I tried it and it worked for me too. So
this weekend I made a patch that will add a rule like this:

    nft -ae insert rule ip libvirt_network postroute_mangle \
        oif virbr0 udp dport 68 counter udp checksum set 0

along with adding a single chain like this to contain all those rules:

    nft add chain ip libvirt_network postroute_mangle \
        '{ type filter hook postrouting priority 0; policy accept; }'

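
If anyone wants to try the same thing by hand on a test host without
applying the patch, something along these lines should be close. This is
only a sketch: the function name zero_dhcp_csum and the DRY_RUN and CHAIN
variables are made up for illustration, I'm assuming that both commands are
meant to use the same chain name, and this is not how the patch itself adds
the rule. By default it only prints the commands; running them for real
needs root and an existing libvirt_network table.

```shell
#!/bin/sh
# Sketch only: print (default) or run the nft commands that zero the UDP
# checksum of DHCP replies leaving a libvirt bridge.
# zero_dhcp_csum, DRY_RUN and CHAIN are illustrative names, not libvirt's.

CHAIN="${CHAIN:-postroute_mangle}"   # assumption: one chain for both commands
DRY_RUN="${DRY_RUN:-1}"              # 1 = only print, 0 = really run (root)

run() {
    if [ "$DRY_RUN" = 1 ]; then
        # show what would be executed, one command per line
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

zero_dhcp_csum() {
    bridge="${1:-virbr0}"

    # chain hooked at postrouting that holds the per-network rules
    run nft add chain ip libvirt_network "$CHAIN" \
        '{ type filter hook postrouting priority 0; policy accept; }'

    # DHCP replies to guests are UDP packets to port 68 going out the bridge
    run nft -ae insert rule ip libvirt_network "$CHAIN" \
        oif "$bridge" udp dport 68 counter udp checksum set 0
}

zero_dhcp_csum "$@"
```

Since the rule has a counter, "nft list chain ip libvirt_network
postroute_mangle" after booting a guest should show whether any DHCP replies
actually hit it.
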
I've tested it with FreeBSD and Fedora guests and it works properly with
both. I posted the patch to devel@lists.libvirt.org and am hoping that
others can also test it to verify that it's not *breaking* DHCP for any
other guests. (I personally don't have much in the way of Windows guest
images, or Debian/Ubuntu/SUSE/etc. I could spin some up, but it would
probably be faster (and less work for me!) if other people just tested
with what they have.)
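
For anyone testing, it may be handy to flip between the two backends the
same way Rich did earlier in the thread, so you can compare behavior with
and without the patch. Here is a rough helper for that. It is a sketch
only: set_firewall_backend is a made-up name, and I'm assuming the setting
appears in network.conf as a line of the form firewall_backend = "..."
(possibly commented out). It keeps a .bak copy of the file it edits.

```shell
#!/bin/sh
# Sketch only: set firewall_backend in a libvirt network.conf-style file.
# set_firewall_backend is an illustrative helper, not part of libvirt.

set_firewall_backend() {
    conf="$1"      # e.g. /etc/libvirt/network.conf
    backend="$2"   # "iptables" or "nftables"

    case "$backend" in
        iptables|nftables) ;;
        *) echo "unknown backend: $backend" >&2; return 1 ;;
    esac

    if grep -Eq '^[[:space:]]*#?[[:space:]]*firewall_backend[[:space:]]*=' "$conf"; then
        # rewrite the existing (possibly commented-out) setting in place
        sed -E -i.bak \
            "s|^[[:space:]]*#?[[:space:]]*firewall_backend[[:space:]]*=.*|firewall_backend = \"$backend\"|" \
            "$conf"
    else
        # no such line yet, so append one
        printf 'firewall_backend = "%s"\n' "$backend" >> "$conf"
    fi
}

# Example (as root), then restart virtnetworkd and reboot the guest:
#   set_firewall_backend /etc/libvirt/network.conf iptables
#   systemctl restart virtnetworkd    # assumes a systemd host
```

After each switch, boot the guest and check whether it gets a DHCPOFFER
right away or has to retry DHCPDISCOVER several times.
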