FreeBSD dhcp failing with UDP checksum errors

Richard W.M. Jones

12 Oct 2024

1:05 p.m.

I recently reinstalled Fedora (host) and I'm trying to import a previously working FreeBSD 13 guest. It boots fine, but fails to get an address from DHCP. In the FreeBSD boot output it prints:

  Starting dhclient.
  DHCPDISCOVER on vtnet0 to 255.255.255.255 port 67 interval 7
  DHCPDISCOVER on vtnet0 to 255.255.255.255 port 67 interval 9
  DHCPDISCOVER on vtnet0 to 255.255.255.255 port 67 interval 9
  DHCPDISCOVER on vtnet0 to 255.255.255.255 port 67 interval 10
  DHCPDISCOVER on vtnet0 to 255.255.255.255 port 67 interval 17
  5 bad udp checksums in 5 packets

Indeed, tcpdumping the network on the host side shows that checksums are wrong (note "bad udp cksum" in the reply message):

  0.0.0.0.bootpc > 255.255.255.255.bootps: [udp sum ok] BOOTP/DHCP, Request from 52:54:00:d4:07:ab (oui Unknown), length 300, xid 0xf9ee0d34, secs 53, Flags [none] (0x0000)
    Client-Ethernet-Address 52:54:00:d4:07:ab (oui Unknown)
    Vendor-rfc1048 Extensions
      Magic Cookie 0x63825363
      DHCP-Message (53), length 1: Discover
      Requested-IP (50), length 4: freebsd.home.annexia.org
      Client-ID (61), length 7: ether 52:54:00:d4:07:ab
      Hostname (12), length 7: "freebsd"
      Parameter-Request (55), length 10:
        Subnet-Mask (1), BR (28), Time-Zone (2), Classless-Static-Route (121)
        Default-Gateway (3), Domain-Name (15), Domain-Name-Server (6), Hostname (12)
        Unknown (119), MTU (26)
      END (255), length 0
      PAD (0), length 0, occurs 20
  13:07:37.304083 IP (tos 0xc0, ttl 64, id 20207, offset 0, flags [none], proto UDP (17), length 328)
  cash.bootps > 192.168.122.203.bootpc: [bad udp cksum 0x7763 -> 0x88a0!] BOOTP/DHCP, Reply, length 300, xid 0xf9ee0d34, secs 53, Flags [none] (0x0000)
    Your-IP 192.168.122.203
    Server-IP cash
    Client-Ethernet-Address 52:54:00:d4:07:ab (oui Unknown)
    Vendor-rfc1048 Extensions
      Magic Cookie 0x63825363
      DHCP-Message (53), length 1: Offer
      Server-ID (54), length 4: cash
      Lease-Time (51), length 4: 3600
      RN (58), length 4: 1800
      RB (59), length 4: 3150
      Subnet-Mask (1), length 4: 255.255.255.0
      BR (28), length 4: 192.168.122.255
      Default-Gateway (3), length 4: cash
      Domain-Name-Server (6), length 4: cash
      END (255), length 0
      PAD (0), length 0, occurs 8

I guess this is something to do with checksum offloading. I can only find ancient bugs related to this. How to fix?

The host is:

  libvirt-daemon-10.6.0-1.fc41.x86_64
  dnsmasq-2.90-3.fc41.x86_64
  Linux cash 6.11.0-0.rc5.20240830git20371ba12063.47.fc42.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Aug 30 15:36:28 UTC 2024 x86_64 GNU/Linux

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
nbdkit - Flexible, fast NBD server with plugins
https://gitlab.com/nbdkit/nbdkit

Daniel P. Berrangé

14 Oct

8:52 a.m.

On Sat, Oct 12, 2024 at 02:05:53PM +0100, Richard W.M. Jones wrote:

...

Urgh, I wonder if this is fallout from switching to NFT instead of iptables.

IIUC, the NFT kernel maintainers didn't implement support for checksum fixup rules, since they believe that all modern distros would have long ago fixed their bugs wrt mangled checksums.

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Richard W.M. Jones

9:35 a.m.

On Mon, Oct 14, 2024 at 09:52:13AM +0100, Daniel P. Berrangé wrote:

...

Urgh, I wonder if this is fallout from switching to NFT instead of iptables.

I can list the firewall rules if you tell me what I'm looking for ...

...

IIUC, the NFT kernel maintainers didn't implement support for checksum fixup rules, since they believe that all modern distros would have long ago fixed their bugs wrt mangled checksums.

If I understand the trace correctly, the bad checksum originates on the Linux host (the reply sent by dnsmasq).

Rich.

Laine Stump

2:46 p.m.

On 10/14/24 5:35 AM, Richard W.M. Jones wrote:

...

On Mon, Oct 14, 2024 at 09:52:13AM +0100, Daniel P. Berrangé wrote:

...
Urgh, I wonder if this is fallout from switching to NFT instead of iptables.

I can list the firewall rules if you tell me what I'm looking for ...

...
IIUC, the NFT kernel maintainers didn't implement support for checksum fixup rules, since they believe that all modern distros would have long ago fixed their bugs wrt mangled checksums.

That's the first thing that came to my mind too - maybe RHEL5 *isn't* the only guest OS that has this problem. (I certainly hope that isn't the case :-/)

There are two ways to test out this theory:

1) change the setting of "firewall_backend" in /etc/libvirt/network.conf to "iptables" and restart virtnetworkd

(if that does work, then switch back to nftables, restart virtnetworkd, and test again just to make sure the issue wasn't caused by some out-of-place rule)

or

2) tell qemu to set up the virtio-net device to do its packet processing in userspace rather than the kernel. You do this by adding

  <driver name='qemu'/>

to the <interface> section.

...

If I understand the trace correctly, the bad checksum originates on the Linux host (the reply sent by dnsmasq).

I need to try it again to verify, but my recollection is that (when you're using virtio-net with default settings) the checksums of DHCP packets in one direction or the other *always* show up in tcpdump as having bad checksums, but they still end up getting to the other end with a proper checksum. Sometime in the distant past I *may have* had it explained to me why this happens, but I don't recall now. Anyway, I'm just saying this so that you know the validity of the UDP checksum shouldn't be used as an indicator of whether or not things are "working".

Richard W.M. Jones

3:37 p.m.

On Mon, Oct 14, 2024 at 10:46:22AM -0400, Laine Stump wrote:

...

That's the first thing that came to my mind too - maybe RHEL5 *isn't* the only guest OS that has this problem. (I certainly hope that isn't the case :-/)

There are two ways to test out this theory:

1) change the setting of "firewall_backend" in /etc/libvirt/network.conf to "iptables" and restart virtnetworkd

(if that does work, then switch back to nftables, restart virtnetworkd, and test again just to make sure the issue wasn't caused by some out-of-place rule)

I changed the setting between nftables and iptables a few times and I can confirm that your theory seems to be correct.

iptables =>

  "5 bad udp checksums in 5 packets" message is NOT seen

  FreeBSD gets an immediate DHCPOFFER and boots quickly with network

nftables =>

  FreeBSD sends 5 DHCPDISCOVER messages

  "5 bad udp checksums in 5 packets" reappears

  FreeBSD does NOT see DHCPOFFER, although it does seem to remember the offer from the previous boot, so it does get a network connection in the end.

...

or

2) tell qemu to set up the virtio-net device to do its packet processing in userspace rather than the kernel. You do this by adding

<driver name='qemu'/>

to the <interface> section.

This also works (with nftables).

...

...
If I understand the trace correctly, the bad checksum originates on the Linux host (the reply sent by dnsmasq).

I need to try it again to verify, but my recollection is that (when you're using virtio-net with default settings) the checksums of DHCP packets in one direction or the other *always* show up in tcpdump as having bad checksums, but they still end up getting to the other end with a proper checksum. Sometime in the distant past I *may have* had it explained to me why this happens, but I don't recall now. Anyway, I'm just saying this so that you know the validity of the UDP checksum shouldn't be used as an indicator of whether or not things are "working".

I have to say I also don't really understand what's happening here. Isn't the Linux host sending DHCPOFFER? Why doesn't it set the UDP checksum correctly and/or why would tcpdump report it wrongly if it is setting it?

Rich.

Daniel P. Berrangé

3:55 p.m.

On Mon, Oct 14, 2024 at 04:37:42PM +0100, Richard W.M. Jones wrote:

...

I have to say I also don't really understand what's happening here. Isn't the Linux host sending DHCPOFFER? Why doesn't it set the UDP checksum correctly and/or why would tcpdump report it wrongly if it is setting it?

Here are the original gory details

https://lists.isc.org/pipermail/dhcp-hackers/2010-April/001835.html

TL;DR: we have checksum offload running so the host doesn't fill in any checksums, but DHCP client then tries to validate the non-existent checksum. Boom.

With regards,
Daniel
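As a rough illustration of the TL;DR above, the value tcpdump reported in the first message (0x7763) can be reproduced as the folded one's-complement sum of the IPv4 pseudo-header alone, which is consistent with the UDP checksum having been left for offload to complete. This is only a sketch under two assumptions that the thread does not state: that "cash" resolves to 192.168.122.1 (the usual libvirt default network address), and that the IP header carries no options (20 bytes). The helper names are made up for this example.

```python
import ipaddress
import struct


def fold16(total: int) -> int:
    """Fold a sum into 16 bits using end-around carry (one's-complement add)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header_sum(src: str, dst: str, udp_length: int, proto: int = 17) -> int:
    """Folded 16-bit sum of the IPv4 pseudo-header, NOT complemented."""
    pseudo = (
        ipaddress.IPv4Address(src).packed
        + ipaddress.IPv4Address(dst).packed
        + struct.pack("!BBH", 0, proto, udp_length)  # zero byte, protocol, UDP length
    )
    return fold16(sum(struct.unpack("!6H", pseudo)))


# From the trace: IP total length 328; a 20-byte IP header is ASSUMED (no options).
UDP_LENGTH = 328 - 20

# ASSUMPTION: "cash" is 192.168.122.1; the thread only shows the hostname.
partial = pseudo_header_sum("192.168.122.1", "192.168.122.203", UDP_LENGTH)
print(hex(partial))  # prints 0x7763, the value tcpdump flagged as bad
```

If those assumptions hold, the "bad" checksum is not random corruption: it is the partial sum that the sending side leaves in the field when it expects a later stage to finish the calculation.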

Daniel P. Berrangé

4:06 p.m.

On Mon, Oct 14, 2024 at 04:55:37PM +0100, Daniel P. Berrangé wrote:

...

Here are the original gory details

https://lists.isc.org/pipermail/dhcp-hackers/2010-April/001835.html

TL;DR: we have checksum offload running so the host doesn't fill in any checksums, but DHCP client then tries to validate the non-existent checksum. Boom.

ISC DHCP fixed this in

https://github.com/isc-projects/dhcp/commit/7ff6ae5aa85754119319def3c7f225a4...

and if I'm interpreting this patch correctly, it is only fixed on Linux - most changes are in lpf.c, which is "Linux Packet Filter", and I'm assuming that codepath won't be used on *BSD.

If correct, then the idea that checksum fixup from iptables is obsolete is incorrect, and we need it added to nftables for parity.

Requiring users to turn off the vhost-net feature is horrible, not just for the user experience of having a broken VM out of the box, but also for performance, as checksum offloading is a good thing if you want fast networking.

With regards,
Daniel

Laine Stump

21 Oct

4:29 a.m.

On 10/14/24 12:06 PM, Daniel P. Berrangé wrote:

...

ISC DHCP fixed this in

https://github.com/isc-projects/dhcp/commit/7ff6ae5aa85754119319def3c7f225a4...

and if I'm interpreting this patch correctly, it is only fixed on Linux - most changes are in lpf.c, which is "Linux Packet Filter", and I'm assuming that codepath won't be used on *BSD.

If correct, then the idea that checksum fixup from iptables is obsolete is incorrect, and we need it added to nftables for parity.

Requiring users to turn off the vhost-net feature is horrible, not just for the user experience of having a broken VM out of the box, but also for performance, as checksum offloading is a good thing if you want fast networking.

Phil Sutter and Eric Garver suggested that we try 0'ing out the checksum of these packets, which is something that nftables *can* do. Phil tried it and it worked for him, so I tried it and it worked for me too. So this weekend I made a patch that will add a rule like this:

  nft -ae insert rule ip libvirt_network postroute_mangle \
      oif virbr0 udp dport 68 counter udp checksum set 0

along with adding a single chain like this to contain all those rules:

  nft add chain ip libvirt_network guest_mangle \
      '{ type filter hook postrouting priority 0; policy accept; }'

I've tested it with FreeBSD and Fedora guests and it works properly with both. I posted the patch to devel@lists.libvirt.org

  https://www.spinics.net/linux/fedora/libvir/msg249203.html

and am hoping that others can also test it to verify that it's not *breaking* dhcp for any other guests (I personally don't have much in the way of Windows guest images, or debian/ubuntu/suse/etc. I could spin some up but it would probably be faster (and less work for me!) if other people just tested with what they have).
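One plausible reading of why zeroing the field helps: for UDP over IPv4, RFC 768 says an all-zero transmitted checksum means the sender generated no checksum, so a receiver that follows that rule skips validation instead of rejecting the packet. The sketch below is not dhclient's code; it is a minimal model of such a receiver, with hypothetical helper names, a made-up payload, and the same assumed server address (192.168.122.1) as an illustration only.

```python
import ipaddress
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of data, zero-padded to an even length."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src: str, dst: str, udp_length: int) -> bytes:
    """IPv4 pseudo-header for UDP (protocol 17)."""
    return (ipaddress.IPv4Address(src).packed
            + ipaddress.IPv4Address(dst).packed
            + struct.pack("!BBH", 0, 17, udp_length))


def build_datagram(src: str, dst: str, payload: bytes, mode: str) -> bytes:
    """Build a UDP datagram (ports 67 -> 68) with one of three checksum styles."""
    length = 8 + len(payload)
    body = struct.pack("!HHHH", 67, 68, length, 0) + payload
    if mode == "full":        # checksum fully computed by the sender
        field = ~ones_complement_sum(pseudo_header(src, dst, length) + body) & 0xFFFF
        field = field or 0xFFFF  # a computed zero is transmitted as all ones
    elif mode == "partial":   # only the pseudo-header sum, as left for offload
        field = ones_complement_sum(pseudo_header(src, dst, length))
    else:                     # "zero": checksum field cleared
        field = 0
    return struct.pack("!HHHH", 67, 68, length, field) + payload


def receiver_accepts(src: str, dst: str, datagram: bytes) -> bool:
    """Model of a receiver that applies the RFC 768 zero-checksum rule."""
    (field,) = struct.unpack("!H", datagram[6:8])
    if field == 0:
        return True  # sender generated no checksum: nothing to validate
    # Summing a valid datagram, checksum included, yields 0xFFFF.
    return ones_complement_sum(pseudo_header(src, dst, len(datagram)) + datagram) == 0xFFFF


SRC, DST = "192.168.122.1", "192.168.122.203"  # SRC is an assumption, see above
for mode in ("full", "partial", "zero"):
    print(mode, receiver_accepts(SRC, DST, build_datagram(SRC, DST, b"offer", mode)))
```

In this model the fully checksummed and the zero-checksum datagrams are accepted while the partially checksummed one is rejected, which matches the behaviour reported in the thread for iptables, the proposed nftables rule, and plain nftables respectively.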
