This repository has been archived by the owner on Jan 16, 2019. It is now read-only.

Remove tcp_nodelay and add tcp_quickack in write and read #4

Open
wants to merge 1 commit into master

Conversation

MaxiMaerz

According to http://man7.org/linux/man-pages/man7/tcp.7.html,
the TCP_QUICKACK flag is NOT permanent.
As discussed in https://stackoverflow.com/questions/7286592/set-tcp-quickack-and-tcp-nodelay,
TCP_NODELAY should not be used.
In our use case the robot did not jump with this fix and a low-latency
kernel.

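For context on what "not permanent" means in practice: the kernel may clear TCP_QUICKACK again after its own protocol processing, so the flag has to be re-armed around each receive rather than set once at connect time. A minimal sketch of that pattern (hypothetical helper, not the driver's actual code):

```cpp
#include <netinet/in.h>   // IPPROTO_TCP
#include <netinet/tcp.h>  // TCP_QUICKACK
#include <sys/socket.h>   // setsockopt
#include <unistd.h>       // read

// Re-enable quick ACKs before every read; per tcp(7) the kernel may
// have cleared the flag again after internal protocol processing.
ssize_t quickack_read(int fd, void* buf, size_t len) {
    int flag = 1;
    setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &flag, sizeof(flag));
    return read(fd, buf, len);
}
```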
@Zagitta
Owner

Zagitta commented Oct 18, 2017

Hello @MaxiMaerz, thank you for your PR.
Could you please provide some more context regarding the issue your PR fixes?
While it's not documented in this repository, I actually did quite extensive latency and jitter tests for my master's thesis to show the impact of using TCP_QUICKACK and TCP_NODELAY (although admittedly I didn't test them separately).
The flags are specifically set in URStream here because not every usage of TCPSocket requires them, specifically the connection to the secondary non-realtime port.

However, I do believe you've identified a real issue, because the flags aren't being set on the critical reverse URServer connection.
I have amended this on the "quickack" branch and refactored the flag setting slightly.
Unfortunately I no longer have access to the robot, so could you please check that this achieves the same result as your PR? If it does, I will merge it into master rather than your PR.

Regarding the TCP_NODELAY flag, it has exactly the desired behaviour here: the buffer supplied to each write contains a full packet, and we want to send it immediately since latency is of utmost importance.
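To illustrate that write pattern (a sketch under the assumption that each write() hands the kernel one complete control packet; the function name is hypothetical):

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Disable Nagle's algorithm so each full packet handed to write()
// goes on the wire immediately instead of being coalesced. Unlike
// TCP_QUICKACK, this flag only needs to be set once per socket.
void enable_nodelay(int fd) {
    int flag = 1;
    setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
}
```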

Once again, thank you for taking the time to report the issue and propose a fix.

Best regards, Simon

@MaxiMaerz
Author

Hello @Zagitta, sorry, the description is just the commit message.

I have to admit that I just added TCP_QUICKACK without much checking of whether it is needed in read or write, and I will check whether your solution achieves the same effect.

However, the TCP_NODELAY flag has to be unset: while it was active we had major latency problems resulting in random jitter movements. We tested on a UR10, UR5 and UR3. Please check out my second link:

Turning on TCP_NODELAY has similar effects, but can make throughput worse for small writes. If you write a loop which sends just a few bytes (worst case, one byte) to a socket with "write()", and the Nagle algorithm is disabled with TCP_NODELAY, each write becomes one IP packet. This increases traffic by a factor of 40, with IP and TCP headers for each payload. Tinygram prevention won't let you send a second packet if you have one in flight, unless you have enough data to fill the maximum sized packet. It accumulates bytes for one round trip time, then sends everything in the queue. That's almost always what you want. If you have TCP_NODELAY set, you need to be much more aware of buffering and flushing issues. None of this matters for bulk one-way transfers, which is most HTTP today. (I've never looked at the impact of this on the SSL handshake, where it might matter.) Short version: set TCP_QUICKACK. If you find a case where that makes things worse, let me know. John Nagle

Additionally, we found that switching to a low-latency kernel removes the last jitters, which occurred from time to time.

Thank you for your fast response and your great work on this driver,

Best, Maxi

@Zagitta
Owner

Zagitta commented Oct 19, 2017

Hello again,

Yes, I've read what you posted many times, but I was under the impression that each packet would be flushed before the next one because there would be 8 ms between them. However, this assumption is incorrect: the positions are actually interpolated at a 4x rate, meaning a new packet every 2 ms, in which case I can definitely see another packet still being in flight.

However, if you have time to test the effects of the two flags, I have added the small Python script I used to log the jitter in my thesis, called "test_jitter.py", to the root of the quickack branch.
Enabling/disabling the two flags simply requires changing the two bools passed to client_ in https://github.com/Zagitta/ur_modern_driver/blob/quickack/src/ur/server.cpp#L8
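For anyone reading along without the script handy, the measurement idea is roughly the following (a C++ sketch of the same approach, not the actual test_jitter.py; the robot IP is an example value and 30003 is the UR realtime port):

```cpp
#include <arpa/inet.h>
#include <chrono>
#include <cstdio>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main() {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(30003);                        // UR realtime port
    inet_pton(AF_INET, "192.168.1.10", &addr.sin_addr);  // robot IP (example)
    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0)
        return 1;

    // Log the delta between consecutive packet arrivals; any spread
    // around the nominal 8 ms period is jitter.
    char buf[4096];
    auto last = std::chrono::steady_clock::now();
    while (read(fd, buf, sizeof(buf)) > 0) {
        auto now = std::chrono::steady_clock::now();
        std::chrono::duration<double, std::milli> dt = now - last;
        std::printf("%.3f ms\n", dt.count());
        last = now;
    }
    close(fd);
    return 0;
}
```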

However, I believe you're right about TCP_NODELAY, and for now I have disabled it for TrajectoryFollower on the quickack branch.

Thank you for your kind words, I'm just happy to see my code being used in the wild.

@henningkayser

Hello @Zagitta, @MaxiMaerz,

I just tested the PR and the quickack branch on our UR5 with kernel versions 4.10.0-28-generic and 4.4.0-97-lowlatency.
Disabling the TCP_NODELAY flag actually increases jitter tremendously on our setup, with only a slight improvement when using the low-latency kernel.

With TCP_NODELAY and TCP_QUICKACK enabled (on every connection, as in the quickack branch/PR), jitter is almost non-existent. Again, the low-latency kernel has a small advantage in this configuration.

I'm not sure why our results are so different from those of @MaxiMaerz, but I assume it might be related to the system or perhaps the firmware version of the robot.
What do you think about adding an optional ROS param for TCP_NODELAY?
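If that option is added, the wiring could be as simple as reading a private node parameter at startup. A sketch only, with a hypothetical parameter name:

```cpp
#include <ros/ros.h>

int main(int argc, char** argv) {
    ros::init(argc, argv, "ur_driver");
    ros::NodeHandle nh("~");

    // Hypothetical parameter: let users toggle TCP_NODELAY per setup,
    // defaulting to the current behaviour (enabled).
    bool use_tcp_nodelay;
    nh.param("use_tcp_nodelay", use_tcp_nodelay, true);

    // ...pass use_tcp_nodelay down to the socket setup code...
    return 0;
}
```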

@Zagitta
Owner

Zagitta commented Oct 20, 2017

@henhenhen Thank you for taking the time to test both this PR and the quickack branch.

I'm by no means a network expert and am honestly in quite deep water here, but it seems very strange that the two of you would get such vastly different results.
I would agree with you that it's likely a difference in hardware setup. How are the PC and UR connected for both of you? Directly with a static IP, or via DHCP through a switch or router?

@henhenhen, did you record the jitter using the "test_jitter.py" script I provided? If so, are you able to share the data? I'm a bit curious and would like to see how the jitter compares to the tests I did for my thesis.

@MaxiMaerz
Author

Hey @henhenhen @Zagitta,

Just for reference, our setup:
We use a direct static connection (a switch with DHCP was causing jitter).
Our controller version is 3.3.3.292.
And we use the position-based interface (not the speed-based one with ros_control).

Can you expose the flags as rosparams? I will test as soon as possible using your tool.

I assume it depends on the complete configuration (interface and firmware on both sides).
In our case we had the most jitter using TCP_NODELAY only, less with TCP_NODELAY and TCP_QUICKACK, and even less with TCP_QUICKACK only. The low-latency kernel always improved the situation.

However, we once had a robot with a controller version < 3 which did not jitter with the original settings of the driver; unfortunately, I no longer have access to an old controller.

@Zagitta
Owner

Zagitta commented Oct 30, 2017

Hey, sorry, I've been rather busy with work and life recently and honestly forgot to get back to you.
I will try to push an update on Friday with some ROS params to toggle the flags for the different sockets, since there are quite a few of them that could potentially influence the jitter.

@potiuk

potiuk commented Dec 5, 2017

We've tested that branch and it seems that it improves the situation quite a bit. I think @Zagitta you can safely merge the changes :). I have videos showing the differences, and I will share them if I get approval from the company management.

We no longer see the fast and dangerous "catching up" at high speed which we experienced with the original ur_modern_driver version. However, even with a low-latency host kernel, we still have occasional slow-downs (never full stops) of the arm during a MoveIt-generated position-controlled move.

We are going to explore this a bit more and invest some engineering effort in improving position tracking. After looking at the code and architecture, we have some ideas on how the whole positional tracking can be improved. I will open a separate issue for that, since I don't want to hijack this issue for it.

UPDATE: I see that I cannot open an issue on this fork, so I will open it in the original ur_modern_driver.
