Hi,
Problem-
Software offloading of VM-generated packets (TSO enabled in the VMs) degrades severely as the number of VMs on a host increases.
As the number of VMs pushing traffic simultaneously on a compute node increases-
- The % of offloaded packets coming out of the VMs (TSO enabled) on the tap port / veth pair decreases significantly
- The size of offloaded packets coming out of the VMs (TSO enabled) on the tap port / veth pair decreases significantly
We are using an OpenStack setup. Throughput for the SNAT test (iperf client in the VM, server on an external network machine) is SIGNIFICANTLY less than for the DNAT test (server in the VM, client on an external network machine). For 50 VMs (25 VMs on each compute node in a 2-compute-node setup), SNAT throughput is 30% lower than DNAT throughput.
I was hoping to get community feedback on what controls the software offloading of VM packets and how we can improve it.
NOTE- This seems to be one of the bottlenecks in SNAT that affects throughput on the TX side of the compute node. Improving it would help SNAT test network performance.
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Description-
We have a testbed OpenStack deployment. We boot 1, 10 and 25 VMs on a single compute node and start iperf traffic (the VMs are the iperf clients).
We then simultaneously run tcpdump on the veth pair connecting each VM to the OVS bridges.
The tcpdump data shows that as the number of VMs on a host increases, the % of offloaded packets degrades severely.
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Host configuration- 12 cores ( 24 vCPU ), 40 GB RAM
[root rhel7-25 ~]# uname -a
Linux rhel7-25.in.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
VM MTU is set to 1450
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Analysis-
The following is the % of non-offloaded packets observed at the tap ports / veth pairs (connecting the VMs to the OVS bridge)
---------------------------------------------------
| VMs on 1 Compute Node | % Non-Offloaded packets |
|-----------------------|-------------------------|
| 1                     | 11.11%                  |
| 10                    | 71.78%                  |
| 25                    | 80.44%                  |
---------------------------------------------------
Thus we see a significant drop in offloaded packets when 10 and 25 VMs (TSO enabled) are sending iperf data simultaneously.
A non-offloaded packet means an Ethernet frame of size 1464 (the VM MTU of 1450 plus the 14-byte Ethernet header).
Thus the majority of packets coming out of the VMs (TSO enabled) are non-offloaded as we increase the number of VMs on a host.
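For anyone who wants to reproduce the percentages from their own captures, here is a minimal sketch of the classification described above. It assumes tcpdump was run with -e so that each line carries a "length N:" field, counts a frame of exactly 1464 bytes as non-offloaded and anything larger as offloaded, and ignores smaller frames (ACKs, handshake). The function name and the choice to ignore small frames are assumptions made for illustration, not necessarily how the numbers in the table were produced.

```python
import re

ETH_HEADER = 14                    # Ethernet header bytes (no VLAN tag assumed)
VM_MTU = 1450                      # VM MTU from the host configuration
FULL_FRAME = VM_MTU + ETH_HEADER   # 1464: one full-size, non-offloaded frame

# Matches the "length N:" field printed by tcpdump -e
LENGTH_RE = re.compile(r"length (\d+):")


def non_offloaded_percent(lines, full_frame=FULL_FRAME):
    """Return the % of full-size-or-larger frames that are not offloaded.

    Frames smaller than full_frame (ACKs, small writes) are ignored;
    this is an assumption of the sketch.
    """
    offloaded = 0
    non_offloaded = 0
    for line in lines:
        match = LENGTH_RE.search(line)
        if match is None:
            continue
        length = int(match.group(1))
        if length > full_frame:
            offloaded += 1
        elif length == full_frame:
            non_offloaded += 1
    total = offloaded + non_offloaded
    if total == 0:
        return 0.0
    return 100.0 * non_offloaded / total
```

It can be fed the already-filtered lines of a capture log, for example the output of the grep on "> 1.1.1.34.5001" used in the tcpdump details.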
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Tcpdump details-
Iperf Server IP- 1.1.1.34
For 1 VM, we see mostly offloaded packets and large offloaded frames-
[piyush rhel7-34 25]$ cat qvoed7aa38d-22.log | grep "> 1.1.1.34.5001" | head -n 30
14:36:26.331073 fa:16:3e:98:41:8b > fa:16:3e:ef:5f:16, IPv4, length 74: 10.20.7.3.50395 > 1.1.1.34.5001: tcp 0
14:36:26.331917 fa:16:3e:98:41:8b > fa:16:3e:ef:5f:16, IPv4, length 66: 10.20.7.3.50395 > 1.1.1.34.5001: tcp 0
14:36:26.331946 fa:16:3e:98:41:8b > fa:16:3e:ef:5f:16, IPv4, length 90: 10.20.7.3.50395 > 1.1.1.34.5001: tcp 24
14:36:26.331977 fa:16:3e:98:41:8b > fa:16:3e:ef:5f:16, IPv4, length 7056: 10.20.7.3.50395 > 1.1.1.34.5001: tcp 6990
14:36:26.332018 fa:16:3e:98:41:8b > fa:16:3e:ef:5f:16, IPv4, length 5658: 10.20.7.3.50395 > 1.1.1.34.5001: tcp 5592
14:36:26.332527 fa:16:3e:98:41:8b > fa:16:3e:ef:5f:16, IPv4, length 7056: 10.20.7.3.50395 > 1.1.1.34.5001: tcp 6990
14:36:26.332560 fa:16:3e:98:41:8b > fa:16:3e:ef:5f:16, IPv4, length 9852: 10.20.7.3.50395 > 1.1.1.34.5001: tcp 9786
14:36:26.333024 fa:16:3e:98:41:8b > fa:16:3e:ef:5f:16, IPv4, length 8454: 10.20.7.3.50395 > 1.1.1.34.5001: tcp 8388
14:36:26.333054 fa:16:3e:98:41:8b > fa:16:3e:ef:5f:16, IPv4, length 7056: 10.20.7.3.50395 > 1.1.1.34.5001: tcp 6990
14:36:26.333076 fa:16:3e:98:41:8b > fa:16:3e:ef:5f:16, IPv4, length 4260: 10.20.7.3.50395 > 1.1.1.34.5001: tcp 4194
14:36:26.333530 fa:16:3e:98:41:8b > fa:16:3e:ef:5f:16, IPv4, length 16842: 10.20.7.3.50395 > 1.1.1.34.5001: tcp 16776
14:36:26.333568 fa:16:3e:98:41:8b > fa:16:3e:ef:5f:16, IPv4, length 4260: 10.20.7.3.50395 > 1.1.1.34.5001: tcp 4194
14:36:26.333886 fa:16:3e:98:41:8b > fa:16:3e:ef:5f:16, IPv4, length 21036: 10.20.7.3.50395 > 1.1.1.34.5001: tcp 20970
14:36:26.333925 fa:16:3e:98:41:8b > fa:16:3e:ef:5f:16, IPv4, length 2862: 10.20.7.3.50395 > 1.1.1.34.5001: tcp 2796
14:36:26.334303 fa:16:3e:98:41:8b > fa:16:3e:ef:5f:16, IPv4, length 21036: 10.20.7.3.50395 > 1.1.1.34.5001: tcp 20970
14:36:26.334349 fa:16:3e:98:41:8b > fa:16:3e:ef:5f:16, IPv4, length 2862: 10.20.7.3.50395 > 1.1.1.34.5001: tcp 2796
14:36:26.334741 fa:16:3e:98:41:8b > fa:16:3e:ef:5f:16, IPv4, length 22434: 10.20.7.3.50395 > 1.1.1.34.5001: tcp 22368
14:36:26.335118 fa:16:3e:98:41:8b > fa:16:3e:ef:5f:16, IPv4, length 25230: 10.20.7.3.50395 > 1.1.1.34.5001: tcp 25164
14:36:26.335566 fa:16:3e:98:41:8b > fa:16:3e:ef:5f:16, IPv4, length 25230: 10.20.7.3.50395 > 1.1.1.34.5001: tcp 25164
14:36:26.336007 fa:16:3e:98:41:8b > fa:16:3e:ef:5f:16, IPv4, length 23832: 10.20.7.3.50395 > 1.1.1.34.5001: tcp 23766
For 10 VMs, we see fewer offloaded packets, and the offloaded packets are also smaller. Tcpdump for one of the 10 VMs (similar characterization for all 10 VMs)-
[piyush rhel7-34 25]$ cat qvo255d8cdd-90.log | grep "> 1.1.1.34.5001" | head -n 30
15:09:25.024790 fa:16:3e:b9:f8:ec > fa:16:3e:c1:de:cc, IPv4, length 74: 10.20.18.3.36798 > 1.1.1.34.5001: tcp 0
15:09:25.026834 fa:16:3e:b9:f8:ec > fa:16:3e:c1:de:cc, IPv4, length 66: 10.20.18.3.36798 > 1.1.1.34.5001: tcp 0
15:09:25.026870 fa:16:3e:b9:f8:ec > fa:16:3e:c1:de:cc, IPv4, length 90: 10.20.18.3.36798 > 1.1.1.34.5001: tcp 24
15:09:25.027186 fa:16:3e:b9:f8:ec > fa:16:3e:c1:de:cc, IPv4, length 7056: 10.20.18.3.36798 > 1.1.1.34.5001: tcp 6990
15:09:25.027213 fa:16:3e:b9:f8:ec > fa:16:3e:c1:de:cc, IPv4, length 5658: 10.20.18.3.36798 > 1.1.1.34.5001: tcp 5592
15:09:25.032500 fa:16:3e:b9:f8:ec > fa:16:3e:c1:de:cc, IPv4, length 5658: 10.20.18.3.36798 > 1.1.1.34.5001: tcp 5592
15:09:25.032539 fa:16:3e:b9:f8:ec > fa:16:3e:c1:de:cc, IPv4, length 1464: 10.20.18.3.36798 > 1.1.1.34.5001: tcp 1398
15:09:25.032567 fa:16:3e:b9:f8:ec > fa:16:3e:c1:de:cc, IPv4, length 7056: 10.20.18.3.36798 > 1.1.1.34.5001: tcp 6990
15:09:25.035122 fa:16:3e:b9:f8:ec > fa:16:3e:c1:de:cc, IPv4, length 7056: 10.20.18.3.36798 > 1.1.1.34.5001: tcp 6990
15:09:25.035631 fa:16:3e:b9:f8:ec > fa:16:3e:c1:de:cc, IPv4, length 7056: 10.20.18.3.36798 > 1.1.1.34.5001: tcp 6990
15:09:25.035661 fa:16:3e:b9:f8:ec > fa:16:3e:c1:de:cc, IPv4, length 7056: 10.20.18.3.36798 > 1.1.1.34.5001: tcp 6990
15:09:25.038508 fa:16:3e:b9:f8:ec > fa:16:3e:c1:de:cc, IPv4, length 7056: 10.20.18.3.36798 > 1.1.1.34.5001: tcp 6990
15:09:25.038904 fa:16:3e:b9:f8:ec > fa:16:3e:c1:de:cc, IPv4, length 7056: 10.20.18.3.36798 > 1.1.1.34.5001: tcp 6990
15:09:25.039300 fa:16:3e:b9:f8:ec > fa:16:3e:c1:de:cc, IPv4, length 7056: 10.20.18.3.36798 > 1.1.1.34.5001: tcp 6990
For 25 VMs, we see very few offloaded packets, and the offloaded packets are also smaller. Tcpdump for one of the 25 VMs (similar characterization for all 25 VMs)-
15:52:31.544316 fa:16:3e:3c:7d:78 > fa:16:3e:aa:af:d5, IPv4, length 1464: 10.20.10.3.45892 > 1.1.1.34.5001: tcp 1398
15:52:31.544340 fa:16:3e:3c:7d:78 > fa:16:3e:aa:af:d5, IPv4, length 1464: 10.20.10.3.45892 > 1.1.1.34.5001: tcp 1398
15:52:31.545034 fa:16:3e:3c:7d:78 > fa:16:3e:aa:af:d5, IPv4, length 1464: 10.20.10.3.45892 > 1.1.1.34.5001: tcp 1398
15:52:31.545066 fa:16:3e:3c:7d:78 > fa:16:3e:aa:af:d5, IPv4, length 5658: 10.20.10.3.45892 > 1.1.1.34.5001: tcp 5592
15:52:31.545474 fa:16:3e:3c:7d:78 > fa:16:3e:aa:af:d5, IPv4, length 1464: 10.20.10.3.45892 > 1.1.1.34.5001: tcp 1398
15:52:31.545501 fa:16:3e:3c:7d:78 > fa:16:3e:aa:af:d5, IPv4, length 2862: 10.20.10.3.45892 > 1.1.1.34.5001: tcp 2796
15:52:31.545539 fa:16:3e:3c:7d:78 > fa:16:3e:aa:af:d5, IPv4, length 2862: 10.20.10.3.45892 > 1.1.1.34.5001: tcp 2796
15:52:31.545572 fa:16:3e:3c:7d:78 > fa:16:3e:aa:af:d5, IPv4, length 7056: 10.20.10.3.45892 > 1.1.1.34.5001: tcp 6990
15:52:31.545736 fa:16:3e:3c:7d:78 > fa:16:3e:aa:af:d5, IPv4, length 1464: 10.20.10.3.45892 > 1.1.1.34.5001: tcp 1398
15:52:31.545807 fa:16:3e:3c:7d:78 > fa:16:3e:aa:af:d5, IPv4, length 1464: 10.20.10.3.45892 > 1.1.1.34.5001: tcp 1398
15:52:31.545813 fa:16:3e:3c:7d:78 > fa:16:3e:aa:af:d5, IPv4, length 1464: 10.20.10.3.45892 > 1.1.1.34.5001: tcp 1398
15:52:31.545934 fa:16:3e:3c:7d:78 > fa:16:3e:aa:af:d5, IPv4, length 1464: 10.20.10.3.45892 > 1.1.1.34.5001: tcp 1398
15:52:31.545956 fa:16:3e:3c:7d:78 > fa:16:3e:aa:af:d5, IPv4, length 1464: 10.20.10.3.45892 > 1.1.1.34.5001: tcp 1398
15:52:31.545974 fa:16:3e:3c:7d:78 > fa:16:3e:aa:af:d5, IPv4, length 1464: 10.20.10.3.45892 > 1.1.1.34.5001: tcp 1398
15:52:31.546012 fa:16:3e:3c:7d:78 > fa:16:3e:aa:af:d5, IPv4, length 1464: 10.20.10.3.45892 > 1.1.1.34.5001: tcp 1398
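To put a number on the shrinking size of offloaded frames, the sketch below converts the trailing "tcp N" payload field of each line into a count of MSS-sized segments. In the dumps above a full-size frame carries 1398 bytes of TCP payload (1464 minus 66 bytes of headers), and the large frames are whole multiples of that, e.g. 6990 = 5 x 1398 and 25164 = 18 x 1398. The helper names are hypothetical, and zero-payload frames are skipped as an assumption.

```python
import re

MSS = 1398  # TCP payload of one full-size frame in the captures above

# Matches the trailing "tcp N" payload size printed by tcpdump
PAYLOAD_RE = re.compile(r"tcp (\d+)\s*$")


def segments_per_frame(lines, mss=MSS):
    """Return, for each data-bearing frame, how many MSS-sized segments it holds."""
    counts = []
    for line in lines:
        match = PAYLOAD_RE.search(line)
        if match is None:
            continue
        payload = int(match.group(1))
        if payload == 0:
            continue  # pure ACK / handshake frame, carries no data
        counts.append(-(-payload // mss))  # ceiling division
    return counts


def mean(values):
    """Arithmetic mean, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0
```

Comparing mean(segments_per_frame(...)) across the 1, 10 and 25 VM captures gives a single figure per run for how many segments the guest manages to batch into each frame.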
Thanks and regards,
Piyush Raman
Mail: pirsriva@in.ibm.com