network config not working on newer libvirt

Greetings,
up until a year ago, I was running a server with Debian 10 (stable) on it with the latest versions of libvirt, qemu and kernel 4.19.x Debian 10 had to offer (both libvirt and qemu versions were really old).
the network config was simple, one of the vm acted as a router and provided the ip for both the host and the vm. I've recently switched distro and now I'm running latest stable libvirt, qemu and kernel 5.4.x
I've tried to reinstate the network config on the new distro but I cannot get ip via dhcp for the second vm. if I assign manual ip and gateway, I have access to the outside world.
here are the relevant dumps:
network on the router vm:
<interface type='network'>
  <mac address='52:54:00:53:1c:6b'/>
  <source network='default'/>
  <target dev='virtsw0-vm1'/>
  <model type='virtio'/>
  <address type='pci' domain='0x0000' bus='0x21' slot='0x01' function='0x0'/>
</interface>
the other vm
<interface type='network'>
  <mac address='52:54:00:5a:4c:8c'/>
  <source network='default'/>
  <target dev='virtsw0-vm2'/>
  <model type='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
and finally:
<network connections='2'>
  <name>default</name>
  <uuid>61bc1a72-bd02-408a-b88e-dec696742c20</uuid>
  <bridge name='virtsw0' stp='on' delay='0'/>
  <mac address='52:54:00:6b:1b:92'/>
</network>
as it is possible I'm missing a kernel config, here is the output of lsmod:
vfio_pci 49152 6
vfio_virqfd 16384 1 vfio_pci
vfio_iommu_type1 32768 2
vfio 28672 16 vfio_iommu_type1,vfio_pci
ip6table_nat 16384 1
iptable_nat 16384 1
ebtables 24576 0
bridge 143360 0
stp 16384 1 bridge
llc 16384 2 bridge,stp
cfg80211 647168 0
x86_pkg_temp_thermal 20480 0
kvm_intel 237568 8
vhost_net 24576 3
vhost 36864 1 vhost_net
tap 24576 1 vhost_net
kvm 663552 1 kvm_intel
tun 53248 8 vhost_net
r8152 73728 0
nct6775 57344 0
mei_me 32768 0
hwmon_vid 20480 1 nct6775
irqbypass 16384 22 vfio_pci,kvm
mii 16384 1 r8152
mei 77824 1 mei_me
coretemp 16384 0
efivarfs 16384 1
and the .config at https://dpaste.com/9ZUCBDE9R
any ideas how to fix it?
Thanks,
Dagg.

On 9/4/20 12:38 AM, daggs wrote:
Greetings,
up until a year ago, I was running a server with Debian 10 (stable) on it with the latest versions of libvirt, qemu and kernel 4.19.x Debian 10 had to offer (both libvirt and qemu versions were really old).
the network config was simple, one of the vm acted as a router and provided the ip for both the host and the vm. I've recently switched distro and now I'm running latest stable libvirt, qemu and kernel 5.4.x
You haven't said which distro, nor the exact libvirt version (probably won't matter in this case, but in general "libvirt x.y.z" is more useful than "latest stable libvirt"). If you have full connectivity once you've manually assigned IP addresses, then you don't have any routing problems, so that can be counted out. (Anyway, DHCP packets never go beyond the local network.) In that case, you've most likely either got a firewall problem on host or guest, or a problem with your dhcp server.
I've tried to reinstate the network config on the new distro but I cannot get ip via dhcp for the second vm. if I assign manual ip and gateway, I have access to the outside world.
From where? The host? or the guests?
here are the relevant dumps: network on the router vm: <interface type='network'> <mac address='52:54:00:53:1c:6b'/> <source network='default'/> <target dev='virtsw0-vm1'/> <model type='virtio'/> <address type='pci' domain='0x0000' bus='0x21' slot='0x01' function='0x0'/> </interface>
the other vm <interface type='network'> <mac address='52:54:00:5a:4c:8c'/> <source network='default'/> <target dev='virtsw0-vm2'/> <model type='virtio'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> </interface>
and finally: <network connections='2'> <name>default</name> <uuid>61bc1a72-bd02-408a-b88e-dec696742c20</uuid> <bridge name='virtsw0' stp='on' delay='0'/> <mac address='52:54:00:6b:1b:92'/> </network>
Your config is for a bridge that's created by libvirt, but with no iptables rules, no dnsmasq instance, and no IP address on the host. So any DHCP server config is outside libvirt's realm, as are any iptables or nftables rules, so in this case there is nothing to look at in the libvirt config for either of these issues.
I would start troubleshooting by making sure that the dhcp server is running, and that you can communicate between the machine with the DHCP server and the guest once a manual IP is assigned. Then use tcpdump or wireshark at different places on the path between those two to see how far the DHCP request is getting, whether a response is being sent by the server, and if so how far the response is getting back. I.e., on the host, run tcpdump on the client guest's tap device; if you see the DHCP request there, then run tcpdump on the bridge; if you see it there, run it on the tap device for the guest running the DHCP server; if you see it there, then run tcpdump inside that guest; then check the dhcp server logs to see if it's receiving requests.
While you're doing all of this, you can also be noticing whether or not a DHCP response is arriving at each step (and if you see the response, you can skip looking further ahead in the packet path, since you know by inference that the request made it all the way to the DHCP server). Once you find the point where the packet is blocked, you'll be better able to determine why.
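The capture order described above could be sketched as a small shell script. This is only an illustration under stated assumptions: the device names are taken from the XML at the top of this thread (virtsw0-vm2 for the guest without a lease, virtsw0 for the bridge, virtsw0-vm1 for the router guest), the helper name capture_plan is hypothetical, and the script only prints the tcpdump commands instead of running them, since capturing needs root on the host.

```shell
#!/bin/sh
# Dry run: print one tcpdump command per capture point, in the order a DHCP
# request travels from the client guest to the guest running the DHCP server.
# Device names are assumptions taken from the XML at the top of this thread.
CLIENT_TAP=virtsw0-vm2   # tap device of the guest that cannot get a lease
BRIDGE=virtsw0           # bridge created by libvirt
SERVER_TAP=virtsw0-vm1   # tap device of the guest running the DHCP server

# capture_plan is a hypothetical helper; it only prints commands.
capture_plan() {
    for dev in "$@"; do
        # -n: no name resolution, -e: show MAC addresses, -v: verbose decode
        printf "tcpdump -n -e -v -i %s 'port 67 or port 68 or arp'\n" "$dev"
    done
}

capture_plan "$CLIENT_TAP" "$BRIDGE" "$SERVER_TAP"
```

Running each printed command in turn (as root) while the client retries DHCP should show at which hop the request or the reply stops appearing.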

Greetings Laine,
You haven't said which distro, nor the exact libvirt version (probably won't matter in this case, but in general "libvirt x.y.z" is more useful than "latest stable libvirt").
you are correct, the previous os was debian 10 with libvirt 3, the new os is gentoo with libvirt 6.2.0
If you have full connectivity once you've manually assigned IP addresses, then you don't have any routing problems, so that can be counted out. (Anyway, DHCP packets never go beyond the local network).
In that case, you've most likely either got a firewall problem on host or guest, or a problem with your dhcp server.
iptables is installed on the host (required by libvirt because of the virt network features); from what I can see, it isn't running. the guest is libreelec, somehow I don't think it has iptables installed or configured. on the dhcp server (the other vm) I see this:
Sat Sep 5 00:33:25 2020 daemon.info dnsmasq-dhcp[2579]: DHCPDISCOVER(br-lan) 52:54:00:5a:4c:8c
Sat Sep 5 00:33:25 2020 daemon.info dnsmasq-dhcp[2579]: DHCPOFFER(br-lan) 10.0.0.40 52:54:00:5a:4c:8c
multiple times. it means that the server accepted the request and offered the correct ip, but the offer doesn't seem to get there.
From where? The host? or the guests?
I can ssh from one vm to another, without the manual ip, I cannot do it
I would start troubleshooting by making sure that the dhcp server is running, and that you can communicate between the machine with DHCP server and the guest once a manual IP is assigned. Then use tcpdump or wireshark at different places on the path between those two [...]
alright, I'll try that, thanks. Dagg.

Greetings Laine,
I've run tcpdump on the vm's tap device, here is what I see:
01:42:15.404754 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:5a:4c:8c (oui Unknown), length 548
01:42:15.405075 IP Broadcom.Home.bootps > 10.0.0.40.bootpc: BOOTP/DHCP, Reply, length 300
01:42:15.735893 STP 802.1d, Config, Flags [none], bridge-id 8000.52:54:00:6b:1b:92.8003, length 35
01:42:17.718941 STP 802.1d, Config, Flags [none], bridge-id 8000.52:54:00:6b:1b:92.8003, length 35
01:42:17.846918 IP6 fe80::fc54:ff:fe5a:4c8c > ff02::2: ICMP6, router solicitation, length 16
01:42:19.702944 STP 802.1d, Config, Flags [none], bridge-id 8000.52:54:00:6b:1b:92.8003, length 35
01:42:20.450441 ARP, Request who-has 10.0.0.40 tell Broadcom.Home, length 28
I think the issue is this:
01:42:20.450441 ARP, Request who-has 10.0.0.40 tell Broadcom.Home, length 28
I'm not sure if this is expected but looks like my dhcp server ignores it. any thoughts on the matter?
Dagg.

On 9/4/20 6:47 PM, daggs wrote:
Greetings Laine,
I've ran tcpdump on the vm's tap device, here is what I see:
When you say "the vm", you mean the one running libreelec, that is trying to get an IP address, correct?
01:42:15.404754 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from 52:54:00:5a:4c:8c (oui Unknown), length 548
01:42:15.405075 IP Broadcom.Home.bootps > 10.0.0.40.bootpc: BOOTP/DHCP, Reply, length 300
I guess Broadcom.home is the IP of the VM that's running the dhcp server? (I should have suggested using "tcpdump -n -e -v" :-/)
01:42:15.735893 STP 802.1d, Config, Flags [none], bridge-id 8000.52:54:00:6b:1b:92.8003, length 35
01:42:17.718941 STP 802.1d, Config, Flags [none], bridge-id 8000.52:54:00:6b:1b:92.8003, length 35
01:42:17.846918 IP6 fe80::fc54:ff:fe5a:4c8c > ff02::2: ICMP6, router solicitation, length 16
01:42:19.702944 STP 802.1d, Config, Flags [none], bridge-id 8000.52:54:00:6b:1b:92.8003, length 35
01:42:20.450441 ARP, Request who-has 10.0.0.40 tell Broadcom.Home, length 28
I think that issue is this: 01:42:20.450441 ARP, Request who-has 10.0.0.40 tell Broadcom.Home, length 28
I'm not sure if this is expected but looks like my dhcp server ignores it. any thoughts on the matter?
It looks strange, but is normal. What usually happens is this:
1) The guest sends a DHCP Discover Request, suggesting that it would like to use the address 10.0.0.40. (These details will be revealed once you add "-v" to your tcpdump commandline.)
2) The DHCP server says to itself "Hmm, this guy wants to use 10.0.0.40, which is okay with me, but first I should see if someone else is using it", so it sends out an ARP request for 10.0.0.40. Then just to be sure, it sends another.
(At this point, if the server is dnsmasq and it hasn't received an ARP reply, for some reason it sends an ICMP echo request to 10.0.0.40 (the requested/suggested IP) with the destination MAC address of the client that just sent the DHCP request. No idea why. It won't be answered though (unless the client actually still had a lease on that address and was just renewing; but the DHCP server would know it if that's what was happening, so...).)
3) If the server doesn't receive any response to the ARP request, then it will send a DHCP response to the requested IP + client MAC saying "Yes, you can use that IP address".
4) I'm not sure why (because it's been > 20 years since I last read the DHCP RFC), but in the case I just looked at on my host (which is using dnsmasq as the server, and dhclient as the client), the same request and response are sent/received at the same IP+MAC addresses a 2nd time.
5) At this point everybody agrees on the new IP address, the client sets its IP address, the server updates its leases table, and life carries on.
But to back up for a minute - it's completely normal for the DHCP server to send out an ARP request and get no response. I think things are going south sometime after that. Are you seeing a DHCP reply at all? If you don't see it on the libreelec (client) machine's tap device, check if you see it going *out* on the DHCP server's tap. If it's not there, then you'll need to debug inside the guest running the DHCP server.
Before this packet is received, the guest doesn't yet know that's its IP address, but it does know that's its MAC address, and it's waiting for a DHCP reply, so it takes the info from the reply, then sends another request, this time including all the options it received in the first reply.
4) Now

I am very inexperienced with KVM and have a question about networking. As per my basic understanding, a network in KVM apparently also acts as a DHCP server for all machines on that network. So, if that understanding is correct, how would you go about it if you wanted (for learning purposes) a particular VM on such a KVM (internal) network to act as the DHCP server for all machines on that network? By default it would conflict with the built-in DHCP service of the network itself, wouldn't it?

I looked into this a little more myself (a mixture of the "Virtual Machine Manager" GUI and the virsh CLI is my toolset), which provided me with some insights.
On 06.09.20 14:50, gunnar.wagner wrote:
/As per my basic understanding a network in KVM apparently also provides the work of an DHCP server for all machines on that network./
only if DHCP is enabled upon creation of such a network
/[...] how would you go about whether you wanted (for learning purposes) to use a particular VM on such a KVM (internal) network to act as the DHCP server for all machines on that network?/
by creating a network without DHCP enabled
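As a rough sketch of what "a network without DHCP enabled" can look like on the virsh side: a libvirt network definition with no <ip> element (and therefore no <dhcp> element) gets no libvirt-managed DHCP service, much like the "default" network posted at the top of this thread. The network name "nodhcp" and bridge name "virbr10" below are hypothetical, and the virsh commands are only printed, since they need a running libvirtd.

```shell
#!/bin/sh
# Sketch: define a libvirt network with no <ip>/<dhcp> element, so libvirt
# starts no DHCP service on it and a guest can act as the DHCP server.
# The network name "nodhcp" and bridge name "virbr10" are hypothetical.
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<network>
  <name>nodhcp</name>
  <bridge name='virbr10' stp='on' delay='0'/>
</network>
EOF

# Printed rather than executed, since they need a running libvirtd.
echo "virsh net-define $xml_file"
echo "virsh net-start nodhcp"
echo "virsh net-autostart nodhcp"
```

Guests attached to such a network then rely entirely on whatever DHCP server runs inside one of them.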

Greetings Laine,
When you say "the vm", you mean the one running libreelec, that is trying to get an IP address, correct?
yes, you are correct.
I guess Broadcom.home is the IP of the VM that's running the dhcp server? (I should have suggested using "tcpdump -n -e -v" :-/)
frankly, I have no idea who is Broadcom.home. here is the requested dump: https://dpaste.com/849DMX9ND
But to back up for a minute - it's completely normal for the DHCP server to send out an ARP request and get no response. I think things are going south sometime after that. Are you seeing a DHCP reply at all? If you don't see it on the libreelec (client) machine's tap device, check if you see it going *out* on the DHCP server's tap. If it's not there, then you'll need to debug inside the guest running the DHCP server.
should I add another nic with static ip and try to trace the pkts from there?

On 9/6/20 12:02 PM, daggs wrote:
Greetings Laine,
When you say "the vm", you mean the one running libreelec, that is trying to get an IP address, correct?
yes, you are correct.
I guess Broadcom.home is the IP of the VM that's running the dhcp server? (I should have suggested using "tcpdump -n -e -v" :-/)
frankly, I have no idea who is Broadcom.home.
It's just some name tcpdump used to replace the IP address of one of the machines, and since it's the source IP of a DHCP reply packet, it most likely is the IP of the DHCP server.
here is the requested dump: https://dpaste.com/849DMX9ND
What I see in that dump is that the DHCP client (MAC address 52:54:00:5a:4c:8c, hostname "streamer") repeatedly sends the exact same DHCP request (6 times), and the DHCP server responds to each of these requests (alternating between sending the response to the client's MAC with a destination IP already set, and to the broadcast MAC + IP addresses), interspersed with several ARP requests directed at the MAC address of the client asking who has the IP that the server just suggested. (So it's doing something different from what I described in my previous message: rather than using ARP to verify that an IP isn't already in use prior to assigning it, it's assuming it has full authority over IP addresses in the broadcast domain, assigning that IP to the client without checking for prior use, and then sending the ARP request to see if the client actually decided to use it.)
Eventually the client gives up (because it hasn't seen any valid DHCP responses) and gives itself an IP on the 169.254.0.0/16 network, then goes about the process of looking for other devices to connect to using that IP.
Was this dump taken on the host, on the tap device of the client (libreelec aka streamer)? If so, I can only see two options: 1) there is something in iptables or ebtables (or nftables, if you have that on the host) blocking the DHCP response packets from going out the tap interface, or 2) there is something in the guest itself blocking the traffic or preventing the packet from passing.
For (1) you'd need to run "ebtables -L; iptables -S; nft list ruleset" and look for something suspicious.
For (2) can you try changing both the libreelec and the DHCP server vm's ethernet device models from virtio to e1000 (or e1000e if they are q35 machinetypes)? If that works, then change one or the other back and see if it stops working.
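The three ruleset listings suggested for (1) can be wrapped so that a missing tool is reported instead of aborting the whole command line. This is a minimal sketch, not part of the original advice: the function name list_rulesets is hypothetical, and it assumes the listings are run as root on the host (without root the tools typically fail, which the script reports).

```shell
#!/bin/sh
# list_rulesets is a hypothetical helper: run each ruleset listing only if
# the tool is installed, and say so when it is missing or fails.
list_rulesets() {
    for cmd in "ebtables -L" "iptables -S" "nft list ruleset"; do
        tool=${cmd%% *}   # first word of the command line
        if command -v "$tool" >/dev/null 2>&1; then
            echo "== $cmd"
            # $cmd is intentionally unquoted so it splits into arguments
            $cmd 2>&1 || echo "($cmd failed; it usually needs root)"
        else
            echo "== $tool: not installed, skipping"
        fi
    done
}

list_rulesets
```

Each section starts with a "== " header, so the output of a tool that is present is easy to tell apart from one that was skipped.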
should I add another nic with static ip and try to trace the pkts from there?
You mean so you can ssh to the client/libreelec and run tcpdump there against the interface that's doing dhcp? Is tcpdump even available on libreelec? I know it's very limited, and has no simple facilities for adding new packages. If it has tcpdump though, then sure. The only problem is that you would probably not be able to get tcpdump running via that interface quickly enough to see the initial boot-time dhcp exchange; instead you'll probably need to go into the UI and bring the other interface down/up to trigger a new DHCP cycle.
(BTW, if everything works when the client has a static IP address, then that proves there is no problem related to ARP requests/responses - that much is required in order for even a static IP to work.)

Greetings Laine,
It's just some name tcpdump used to replace the IP address of one of the machines, and since it's the source IP of a DHCP reply packet, it most likely is the IP of the DHCP server.
ok, sounds reasonable
Was this dump taken on the host of the tap device of the client (libreelec aka streamer)?
here are the relevant adapters of the vm:
4: virtsw: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:6b:1b:92 brd ff:ff:ff:ff:ff:ff
5: virtsw-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virtsw state DOWN group default qlen 1000
    link/ether 52:54:00:6b:1b:92 brd ff:ff:ff:ff:ff:ff
6: nic_host: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 1000
    link/ether fe:54:00:a7:79:6b brd ff:ff:ff:ff:ff:ff
    inet 11.0.0.3/24 brd 11.0.0.255 scope global dynamic noprefixroute nic_host
       valid_lft 33053sec preferred_lft 27653sec
    inet6 fdab:9802:eb52::a59/128 scope global noprefixroute
       valid_lft forever preferred_lft forever
    inet6 fdab:9802:eb52:0:41d9:d311:10fd:e343/64 scope global mngtmpaddr noprefixroute
       valid_lft forever preferred_lft forever
    inet6 fe80::fc54:ff:fea7:796b/64 scope link
       valid_lft forever preferred_lft forever
7: virtsw-router: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master virtsw state UNKNOWN group default qlen 1000
    link/ether fe:54:00:53:1c:6b brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe53:1c6b/64 scope link
       valid_lft forever preferred_lft forever
8: virtsw-streamer: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master virtsw state UNKNOWN group default qlen 1000
    link/ether fe:54:00:5a:4c:8c brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc54:ff:fe5a:4c8c/64 scope link
       valid_lft forever preferred_lft forever
the dump was taken from the host tapping onto virtsw-streamer.
virtsw-streamer is configured as follows:
<interface type='network'>
  <mac address='52:54:00:5a:4c:8c'/>
  <source network='default' portid='77aae31e-5efa-4789-911c-c55b367cd695' bridge='virtsw'/>
  <target dev='virtsw-streamer'/>
  <model type='virtio'/>
  <alias name='net0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
If so, I can only see two options: 1) there is something in iptables or ebtables (or nftables, if you have that on the host) blocking the DHCP response packets from going out the tap interface, or 2) there is something in the guest itself blocking the traffic or preventing the packet from passing.
For (1) you'd need to run "ebtables -L; iptables -S; nft list ruleset" and look for something suspicious.
here is what I get:
utils_server /home/igor # ebtables -L; iptables -S; nft list ruleset
The kernel doesn't support the ebtables 'filter' table.
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-N LIBVIRT_FWI
-N LIBVIRT_FWO
-N LIBVIRT_FWX
-N LIBVIRT_INP
-N LIBVIRT_OUT
-A INPUT -j LIBVIRT_INP
-A FORWARD -j LIBVIRT_FWX
-A FORWARD -j LIBVIRT_FWI
-A FORWARD -j LIBVIRT_FWO
-A OUTPUT -j LIBVIRT_OUT
-A LIBVIRT_FWI -o virtsw -j REJECT --reject-with icmp-port-unreachable
-A LIBVIRT_FWO -i virtsw -j REJECT --reject-with icmp-port-unreachable
-A LIBVIRT_FWX -i virtsw -o virtsw -j ACCEPT
-A LIBVIRT_INP -i virtsw -p udp -m udp --dport 53 -j ACCEPT
-A LIBVIRT_INP -i virtsw -p tcp -m tcp --dport 53 -j ACCEPT
-A LIBVIRT_INP -i virtsw -p udp -m udp --dport 67 -j ACCEPT
-A LIBVIRT_INP -i virtsw -p tcp -m tcp --dport 67 -j ACCEPT
-A LIBVIRT_OUT -o virtsw -p udp -m udp --dport 53 -j ACCEPT
-A LIBVIRT_OUT -o virtsw -p tcp -m tcp --dport 53 -j ACCEPT
-A LIBVIRT_OUT -o virtsw -p udp -m udp --dport 68 -j ACCEPT
-A LIBVIRT_OUT -o virtsw -p tcp -m tcp --dport 68 -j ACCEPT
bash: nft: command not found
what about the rest of the ports?
For (2) can you try changing both the libreelec and the DHCP server vm's ethernet device models from virtio to e1000? (or e1000e if they are q35 machinetypes)? If that works, then change one or the other back and see if it stops working.
will try and report.
You mean so you can ssh to the client/libreelec and run tcpdump there against the interface that's doing dhcp? Is tcpdump even available on libreelec? I know it's very limited, and has no simple facilities for adding new packages. If it has tcpdump though, then sure. The only problem is that you would probably not be able to get tcpdump running via that interface quickly enough to see the initial boot-time dhcp exchange; instead you'll probably need to go into the UI and bring the other interface down/up to trigger a new DHCP cycle.
static tcpdump should do the trick imho.
(BTW, if everything works when the client has a static IP address, then that proves there is no problem related to ARP requests/responses - that much is required in order for even a static IP to work)
currently I use static ip and I can ssh to the streamer from all machines on the network.

For (2) can you try changing both the libreelec and the DHCP server vm's ethernet device models from virtio to e1000? (or e1000e if they are q35 machinetypes)? If that works, then change one or the other back and see if it stops working.
will try and report.
I changed the router's NIC type to e1000e and the streamer's NIC type to e1000; the error still persists.

Greetings Laine,
I've found the issue. I was able to install tcpdump within libreelec, and I saw that the DHCP packets are sent and received fine, so I thought it might be related to the NIC model type. I replaced the type from virtio to e1000 and it worked. libreelec uses kernel 5.1.16, so I assume there is a bug there when it comes to virtio. I'll use e1000 for now until the next version of libreelec comes out, hopefully with the issue fixed.
Thanks for all the help.
Dagg.
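For anyone applying the same workaround, the change amounts to one attribute on the guest's <interface> element. The sketch below is an illustration only: it rewrites a trimmed copy of the interface XML posted earlier in this thread with sed, and the helper name switch_model is hypothetical. On a real host the edit would normally be made through "virsh edit" on the domain, followed by restarting the guest.

```shell
#!/bin/sh
# Sketch: show the one-line change from virtio to e1000 on a trimmed copy of
# the <interface> definition posted earlier in this thread.
iface_xml="<interface type='network'>
  <mac address='52:54:00:5a:4c:8c'/>
  <source network='default'/>
  <target dev='virtsw-streamer'/>
  <model type='virtio'/>
</interface>"

# switch_model is a hypothetical helper: replace the NIC model in the XML
# read from stdin with the model given as the first argument.
switch_model() {
    sed "s|<model type='[^']*'/>|<model type='$1'/>|"
}

printf '%s\n' "$iface_xml" | switch_model e1000
```

Switching back to virtio later (for example after a libreelec update) is the same call with "virtio" as the argument.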
participants (3)
- daggs
- gunnar.wagner
- Laine Stump