On Sat, Jan 27, 2018 at 3:44 PM, Peter Crowther
<peter.crowther(a)melandra.com> wrote:
> You say you can ping but not ssh. If you install tcpdump on the VM, can
> you see the ping packets arriving and leaving? If not, I suspect an
> address collision - especially if ping continues to work with the VM
> shut down. If you can't ping, check the other end of your bridge. I'm
> more familiar with Open vSwitch, but I'm somewhat concerned that your
> bridge definition doesn't include a physical NIC as one of its
> connections.
OK, I have investigated a bit further by taking some tcpdump and
Wireshark traces, as you suggested, and here is what I found:
When an Ethernet frame is shorter than 60 bytes, it is padded with
0x00 bytes until it is 60 bytes long (64 bytes including the frame
check sequence). When these padded frames travel from the centos1 VM
through the Linux bridge br0 to the Windows host, the IP and TCP
headers in those frames wrongly count the 0x00 padding bytes as part
of the user data. The upper-layer protocol (SSH in my case) then tries
to interpret them, which is why PuTTY hangs. Those 0x00 padding bytes
belong to the layer-2 Ethernet frame and should not be counted as user
data by the higher-level protocols.
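To make the distinction concrete, here is a minimal Python sketch of my own (not taken from the traces). It builds a short IPv4/TCP frame, pads it to 60 bytes, and then recovers the TCP data using the IPv4 Total Length field, so the trailing 0x00 bytes are ignored. The helper names, MAC/IP addresses and ports are hypothetical placeholders, and the checksums are simply left at zero.

```python
import struct

ETH_HEADER_LEN = 14   # dst MAC + src MAC + EtherType
MIN_FRAME_LEN = 60    # minimum Ethernet frame length, excluding the 4-byte FCS


def build_frame(payload: bytes) -> bytes:
    """Build a hypothetical IPv4/TCP frame; checksums are left at zero."""
    eth = b"\xff" * 6 + b"\x02\x00\x00\x00\x00\x01" + struct.pack("!H", 0x0800)
    total_len = 20 + 20 + len(payload)  # IPv4 header + TCP header + data
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, total_len, 0, 0, 64, 6, 0,
                     bytes([192, 168, 1, 10]), bytes([192, 168, 1, 20]))
    tcp = struct.pack("!HHIIBBHHH", 50000, 22, 0, 0, 5 << 4, 0x18, 65535, 0, 0)
    return eth + ip + tcp + payload


def pad_frame(frame: bytes) -> bytes:
    """Pad a short frame with 0x00 bytes up to 60 bytes, as the sending side would."""
    return frame + b"\x00" * max(0, MIN_FRAME_LEN - len(frame))


def tcp_payload(frame: bytes) -> bytes:
    """Return the TCP user data, bounded by the IPv4 Total Length field."""
    ip = frame[ETH_HEADER_LEN:]
    ihl = (ip[0] & 0x0F) * 4                     # IPv4 header length in bytes
    total_len = struct.unpack("!H", ip[2:4])[0]  # IPv4 Total Length
    tcp = ip[ihl:total_len]                      # drop Ethernet padding after the datagram
    data_offset = (tcp[12] >> 4) * 4             # TCP header length in bytes
    return tcp[data_offset:]


frame = pad_frame(build_frame(b"hi"))
print(len(frame))          # 60
print(tcp_payload(frame))  # b'hi'
print(frame[54:])          # b'hi\x00\x00\x00\x00' (naive read includes padding)
```

The last line shows what goes wrong when everything after the TCP header is treated as data: the four padding bytes end up in front of the application, which is the symptom described above.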
I found some information about the padding bytes here:
https://wiki.wireshark.org/Ethernet#Allowed_Packet_Lengths
The traffic path in my environment looks like this:
[windows host]<---->[server1 host br0(eno1,vnet0)]<---->[eth0 centos1 VM]
All of the hosts above are in the same subnet, so there are no routers
in between. On server1, the Linux bridge br0 is in forwarding mode and
connects the physical interface eno1 to the tap interface vnet0. The
vnet0 tap interface is connected to eth0 on the centos1 VM.
(1) When I ssh from the Windows host to server1, there is no issue.
(2) When I ssh from the same Windows host to the centos1 VM, going
through the br0 bridge, I get the SSH hang I mentioned. I took several
tcpdump traces and compared the working ones with the non-working
ones, which is how I reached the conclusion above. At this stage,
everything points to the Linux bridge, since in the working scenario
(1) those 0x00 padding bytes are left alone and not counted as user
data by the IP and TCP protocols.
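When comparing the working and non-working captures, one quick check is whether the padding lies inside or outside the IPv4 datagram. The Python sketch below is a hypothetical heuristic of my own (the function name and messages are made up). It assumes each captured frame starts with a plain 14-byte Ethernet header (no VLAN tag), carries IPv4 directly after it, and excludes the FCS, as tcpdump normally shows it.

```python
import struct

ETH_HEADER_LEN = 14   # no VLAN tag assumed
MIN_FRAME_LEN = 60    # minimum frame length without the FCS


def describe_padding(frame: bytes) -> str:
    """Heuristic: does the trailing Ethernet padding sit outside the IPv4 datagram?"""
    total_len = struct.unpack("!H", frame[ETH_HEADER_LEN + 2:ETH_HEADER_LEN + 4])[0]
    outside = len(frame) - ETH_HEADER_LEN - total_len
    if outside > 0:
        return f"{outside} padding byte(s) outside the IP datagram (expected)"
    if outside < 0:
        return "frame shorter than IP Total Length (truncated capture?)"
    if len(frame) == MIN_FRAME_LEN and frame[-1] == 0:
        return "IP Total Length reaches the end of a 60-byte frame ending in 0x00 (padding counted as data?)"
    return "no Ethernet padding"


# Hypothetical 60-byte frames: same bytes, only the IPv4 Total Length differs.
good = bytearray(MIN_FRAME_LEN)
good[ETH_HEADER_LEN] = 0x45                                          # IPv4, 20-byte header
good[ETH_HEADER_LEN + 2:ETH_HEADER_LEN + 4] = struct.pack("!H", 42)  # 20 + 20 + 2 bytes
bad = bytearray(good)
bad[ETH_HEADER_LEN + 2:ETH_HEADER_LEN + 4] = struct.pack("!H", 46)   # padding included

print(describe_padding(bytes(good)))  # 4 padding byte(s) outside the IP datagram (expected)
print(describe_padding(bytes(bad)))
```

This is only a hint, not proof: a legitimate 60-byte frame whose real data happens to end in 0x00 would trigger the same message, so any suspicious frame still needs to be checked by hand in Wireshark.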