This repository has been archived by the owner on Oct 29, 2021. It is now read-only.

Load balancer example: TCP connection to application server is not working #90

Open
mardim91 opened this issue Apr 27, 2020 · 9 comments

Comments

@mardim91

The nc connection to the application server via the "nc 10.2.2.0 5001" command is not working.

Running tcpdump on the nsm0 interface of the application server, I observe that the source IP of the encapsulated packet is not 10.70.0.0 but some random IP. Something in the load balancer plugin is not working correctly for TCP connections. ICMP connections work fine, and there the source IP is 10.70.0.0.

Steps to reproduce:

  1. Deploy the load-balancer example.
  2. Log in to the application server pod and run "tcpdump -i nsm0".
  3. Log in to the load balancer pod and run "nc 10.2.2.0 5001".
  4. Check the source IPs of the encapsulated packets.
@nickolaev
Member

Could that be a VPP problem?

@edwarnicke
Member

@uablrek thoughts?

@uablrek
Contributor

uablrek commented May 21, 2020

Access works when initiated from outside the cluster, i.e. when the k8s-node forwards the traffic. When traffic is initiated from the k8s-node itself, it seems to fail. I can't see how Linux could mess this up, so IMHO the fault must be in NSM (VPP?).

@uablrek
Contributor

uablrek commented May 21, 2020

tcpdump inside an application-server pod

When traffic is initiated from the k8s-node, the source address is trashed as described:

$ tcpdump -lni nsm0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on nsm0, link-type EN10MB (Ethernet), capture size 262144 bytes
08:45:15.856313 IP 10.60.1.1 > 10.60.1.3: GREv0, length 64: IP 9.74.2.1.37217 > 10.2.2.2.5001: Flags [S], seq 3667427917, win 64240, options [mss 1460,sackOK,TS val 1802552007 ecr 0,nop,wscale 7], length 0
08:45:16.868429 IP 10.60.1.1 > 10.60.1.3: GREv0, length 64: IP 5.84.2.1.37217 > 10.2.2.2.5001: Flags [S], seq 3667427917, win 64240, options [mss 1460,sackOK,TS val 1802553021 ecr 0,nop,wscale 7], length 0

But when traffic is initiated from outside the cluster, it works:

$ tcpdump -lni nsm0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on nsm0, link-type EN10MB (Ethernet), capture size 262144 bytes
08:48:16.305146 IP 10.60.1.1 > 10.60.1.3: GREv0, length 64: IP 192.168.1.201.34401 > 10.2.2.2.5001: Flags [S], seq 1134753072, win 64240, options [mss 1460,sackOK,TS val 3857117458 ecr 0,nop,wscale 6], length 0
08:48:16.305339 IP 10.2.2.2.5001 > 192.168.1.201.34401: Flags [S.], seq 2880819978, ack 1134753073, win 65160, options [mss 1460,sackOK,TS val 3266929533 ecr 3857117458,nop,wscale 7], length 0
08:48:16.329451 IP 10.60.1.1 > 10.60.1.3: GREv0, length 56: IP 192.168.1.201.34401 > 10.2.2.2.5001: Flags [.], ack 1, win 1004, options [nop,nop,TS val 3857117477 ecr 3266929533], length 0
08:48:16.332074 IP 10.2.2.2.5001 > 192.168.1.201.34401: Flags [P.], seq 1:37, ack 1, win 510, options [nop,nop,TS val 3266929559 ecr 3857117477], length 36
08:48:16.332461 IP 10.2.2.2.5001 > 192.168.1.201.34401: Flags [F.], seq 37, ack 1, win 510, options [nop,nop,TS val 3266929560 ecr 3857117477], length 0
08:48:16.349096 IP 10.60.1.1 > 10.60.1.3: GREv0, length 56: IP 192.168.1.201.34401 > 10.2.2.2.5001: Flags [.], ack 37, win 1004, options [nop,nop,TS val 3857117501 ecr 3266929559], length 0
08:48:16.389033 IP 10.60.1.1 > 10.60.1.3: GREv0, length 56: IP 192.168.1.201.34401 > 10.2.2.2.5001: Flags [.], ack 38, win 1004, options [nop,nop,TS val 3857117545 ecr 3266929560], length 0
08:48:17.553320 IP 10.60.1.1 > 10.60.1.3: GREv0, length 56: IP 192.168.1.201.34401 > 10.2.2.2.5001: Flags [F.], seq 1, ack 38, win 1004, options [nop,nop,TS val 3857118706 ecr 3266929560], length 0
08:48:17.553450 IP 10.2.2.2.5001 > 192.168.1.201.34401: Flags [.], ack 2, win 51

@uablrek
Contributor

uablrek commented May 21, 2020

Note that it is the first 16 bits of the source address that are overwritten with garbage; the last 16 bits are intact.

@mardim91
Author

From some investigation I did a while ago, I was able to isolate the problem, and my conclusion is that it must be in the load balancer VPP plugin. Everything looks fine until the traffic reaches the tunnel that the load balancer creates inside VPP towards the application server; there the traffic gets mangled. So my best bet is that the bug is in the VPP load balancer plugin, in the way it sets up the tunnel.
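For orientation, here is where that corruption sits in the packets captured above. A minimal sketch in plain C (illustrative only, not code from the plugin; it assumes an outer IPv4 header without options and a GREv0 header with no optional fields, which matches the "GREv0, length" lines in the tcpdump output):

#include <stdio.h>

/* Byte layout of the GREv0-encapsulated packets seen on nsm0:
 * outer IPv4 (no options) + GREv0 (no key/checksum/sequence). */
#define OUTER_IP_LEN  20  /* outer IPv4 header, 10.60.1.1 -> 10.60.1.3 */
#define GRE_LEN        4  /* GREv0: 2 bytes flags/version + 2 bytes protocol */
#define INNER_SRC_OFF 12  /* source address offset within the inner IPv4 header */

int main(void) {
    /* The inner source address occupies bytes 36..39 of the packet
     * the tunnel emits; per the captures above, only bytes 36..37
     * (the first 16 bits) come back corrupted. */
    int off = OUTER_IP_LEN + GRE_LEN + INNER_SRC_OFF;
    printf("inner src at bytes %d..%d, garbage in bytes %d..%d\n",
           off, off + 3, off, off + 1);
    return 0;
}

Everything around those two bytes (outer headers, inner destination, ports) is intact in both captures, which is consistent with a stray 16-bit write during encapsulation rather than a general rewrite error.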

@uablrek
Contributor

uablrek commented May 21, 2020

BTW, this problem did not exist when the example was submitted.

@uablrek
Contributor

uablrek commented May 21, 2020

16 bits and very random; a misplaced CRC?
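That theory fits the IPv4 header layout: the 16-bit header checksum lives at byte offset 10, immediately ahead of the 4-byte source address at offset 12, so a checksum written two bytes too far lands exactly on the first 16 bits of the source. A minimal sketch in plain C (not VPP code; the intended address 10.70.2.1 and the checksum value are assumptions for illustration):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* IPv4 header byte offsets per RFC 791 (header without options). */
enum { CSUM_OFF = 10, SRC_OFF = 12, HDR_LEN = 20 };

int main(void) {
    uint8_t hdr[HDR_LEN] = {0};
    const uint8_t src[4] = {10, 70, 2, 1};  /* assumed intended source */
    memcpy(hdr + SRC_OFF, src, 4);

    uint16_t csum = 0x0949;  /* stand-in for a per-packet checksum value */

    /* Correct behaviour writes the checksum at CSUM_OFF (10).
     * A hypothetical off-by-two bug writes it at SRC_OFF (12),
     * i.e. over the first two bytes of the source address: */
    memcpy(hdr + SRC_OFF, &csum, sizeof csum);

    /* The first 16 bits now vary per packet (checksum garbage);
     * the last 16 bits (.2.1) survive, matching the captures. */
    printf("src after bad write: %u.%u.%u.%u\n",
           hdr[SRC_OFF], hdr[SRC_OFF + 1],
           hdr[SRC_OFF + 2], hdr[SRC_OFF + 3]);
    return 0;
}

Whether the VPP lb plugin actually performs such a write is unverified; the sketch only shows that a misplaced 16-bit checksum would reproduce the observed corruption pattern.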

@edwarnicke
Member

@uablrek Might be good to poke vpp-dev
