TCP timeout too short #171
This kind of issue (different server implementations using different beacon periods) was why Jeff made CA's watchdog timeout adaptive: it measures how often it normally receives a beacon and sets the timeout to double that period. Could you do something similar?
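For illustration, an adaptive watchdog of the kind described above might look like the following minimal Python sketch. It is not the CA implementation: the class name `AdaptiveWatchdog`, the exponential-moving-average smoothing and its `alpha` weight are assumptions. Only the rule "timeout = 2 × the observed beacon period" comes from the comment.

```python
class AdaptiveWatchdog:
    """Hypothetical sketch: timeout = factor * (smoothed beacon period)."""

    def __init__(self, initial_period: float, alpha: float = 0.25, factor: float = 2.0):
        self.period = initial_period  # guess used until beacon intervals are observed
        self.alpha = alpha            # EWMA smoothing weight (an assumption)
        self.factor = factor          # 2.0 == "double that period"
        self.last_beacon = None       # time of the most recent beacon, if any

    def on_beacon(self, now: float) -> None:
        """Record a beacon arrival and update the period estimate."""
        if self.last_beacon is not None:
            interval = now - self.last_beacon
            # Move the estimate a fraction alpha of the way toward the new sample.
            self.period += self.alpha * (interval - self.period)
        self.last_beacon = now

    @property
    def timeout(self) -> float:
        return self.factor * self.period

    def expired(self, now: float) -> bool:
        """True if no beacon has arrived within the adaptive timeout."""
        # Not armed until the first beacon has been seen.
        return self.last_beacon is not None and now - self.last_beacon > self.timeout


if __name__ == "__main__":
    # A server beaconing every 15 s, starting from a 30 s guess.
    w = AdaptiveWatchdog(initial_period=30.0)
    for i in range(100):
        w.on_beacon(15.0 * i)
    print(round(w.timeout, 3))
```

The smoothing keeps one late beacon from abruptly changing the timeout, but the objections raised in the reply (coupling UDP beacons to TCP connections, and less predictable timeouts) apply to any scheme of this shape.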
This is too clever for my taste. I dislike coupling UDP beacons and TCP connections in this manner, and I also dislike the unpredictability it introduces. IMO, timeouts on LANs should be bounded and short (in terms of human attention span) to ensure responsive applications.
Don't these conflict though?
vs.
I was trying to suggest how you might avoid setting any timeout value as long as 60 seconds, but I don't know exactly what effect it has. What happens if you set it to 35 or 40 seconds, say? With CA there's the chance of a lost beacon (UDP packet); is the same true here, or is this all TCP traffic?
Absolutely! IMO, 30 seconds is too long. If this were a "green field", I would start with a default timeout of <= 10 seconds.
I actually tested 40 seconds first. It worked, but I then decided to go with a more conservative 60 seconds (once bitten, twice shy). I'm happy to go with 40 seconds though. More strictly, this would mean
At the moment this is all TCP (on the C++ side at least).
On further reflection, I think I'll go with the shorter default of 40 seconds. I must take notice when @anjohnson suggests that I'm being overly conservative.
Since this is all TCP, there's no danger of a lost ack, so a timeout must mean either network congestion or a blocked peer, and both cases should be reported as potential problems.
Finally got around to catching up with this discussion.
That's exactly what the 'new' PVA Java client does:
If the Java client receives no TCP traffic (monitor updates) from the server for half of CONN_TMO (~15 s ± 3 s), it sends an 'ECHO' request.
The client assumes that if the server sees nothing from the client for CONN_TMO seconds, the server will close the connection. With softIocPVA from R7.0.4.1, I can confirm that the C++ PVA server does indeed close the connection on a running monitor after 30 seconds unless the (Java) client sends 'echo'. So both the C++ server and the (new) Java client use CONN_TMO (default 30 s) as the timeout, and the client prevents a timeout by sending 'echo' requests after CONN_TMO/2 (default 15 s), unless there is already other TCP traffic.
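The keepalive rule described above reduces to two comparisons, sketched here in Python. This is a hypothetical model, not code from the Java client or pvAccessCPP: the function names are invented, the ±3 s jitter is left out, and `last_traffic` stands for whatever the client counts as TCP traffic (the comment describes it as traffic received from the server).

```python
DEFAULT_CONN_TMO = 30.0  # seconds; the default quoted in the thread


def echo_due(now: float, last_traffic: float, conn_tmo: float = DEFAULT_CONN_TMO) -> bool:
    """Client side: an ECHO is due once conn_tmo/2 seconds pass without TCP traffic."""
    return now - last_traffic >= conn_tmo / 2


def server_closes(now: float, last_from_client: float, conn_tmo: float = DEFAULT_CONN_TMO) -> bool:
    """Server side: close once nothing has arrived from the client for conn_tmo seconds."""
    return now - last_from_client > conn_tmo


if __name__ == "__main__":
    # Idle connection, last traffic at t = 0 s, default CONN_TMO of 30 s.
    for t in (10.0, 15.0, 31.0):
        print(t, echo_due(t, 0.0), server_closes(t, 0.0))
    # prints: 10.0 False False / 15.0 True False / 31.0 True True
```

Echoing at half the timeout leaves a full echo interval of slack: an ECHO can be delayed by up to 15 s before the server's 30 s limit is reached.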
I'd say that was a bug/misunderstanding in the original PVA Java client. Setting the server's CONN_TMO to 40 should fix it in both cases. Do we keep the protocol description and basic mechanism, or should that now be updated? (For what it's worth, I just noticed that the new Java client actually reads EPICS_CA_CONN_TMO, not EPICS_PVA_CONN_TMO. I'll need to update that to allow separate CA and PVA settings.)
Had accidentally used the CA setting. To be compatible with the C++ implementation, should use 'EPICS_PVA_CONN_TMO' (epics-base/pvAccessCPP#171). Also lists key settings in README.
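A minimal Python sketch of reading the PVA-specific setting rather than the CA one. The helper name `pva_conn_tmo` is hypothetical, and its fallback rules (use 30 s when the variable is unset, non-numeric, or not positive) are assumptions; the thread does not show how the Java or C++ code validates the value.

```python
import os
from collections.abc import Mapping

DEFAULT_CONN_TMO = 30.0  # seconds


def pva_conn_tmo(environ: Mapping = os.environ) -> float:
    """Hypothetical helper: read EPICS_PVA_CONN_TMO (deliberately not EPICS_CA_CONN_TMO).

    Falls back to the default when the variable is unset, not a number, or not positive.
    """
    raw = environ.get("EPICS_PVA_CONN_TMO")
    if raw is None:
        return DEFAULT_CONN_TMO
    try:
        value = float(raw)
    except ValueError:
        return DEFAULT_CONN_TMO
    # "nan > 0" is False, so NaN also falls back to the default.
    return value if value > 0 else DEFAULT_CONN_TMO
```

Passing the environment as a parameter keeps the helper testable without touching the real process environment.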
The 30-second idle timeout introduced by #144 was based on a misunderstanding (by me) of the meaning of connectionTimeout in the Java code. pvAccessJava clients send an echo every 30 seconds, while pvAccessCPP (and now also PVXS) servers time out after 30 seconds. So there is a race between these two ~equal intervals.
The symptom is that otherwise idle connections sometimes time out after a multiple of 30 seconds. E.g. with client and server both on the same host (my laptop), this can take several minutes.
I guess the only reasonable course of action is to increase the timeout in pvAccessCPP from 30 seconds to 60, while leaving the echo interval at 15 seconds?
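To see why two ~equal 30 s intervals race while a 15 s echo against a 40 s or 60 s timeout does not, here is a deliberately crude Python model. Everything in it is an assumption for illustration: the function name `first_timeout`, the small per-ECHO arrival delay, and the rule that the server closes when the gap between arrivals strictly exceeds the timeout. It does not reflect how pvAccessCPP or PVXS actually schedule their idle checks.

```python
import random
from typing import Optional, Sequence


def first_timeout(echo_interval: float, conn_tmo: float, delays: Sequence[float]) -> Optional[float]:
    """Return when the server would close an idle connection, or None if it never does.

    Model (an assumption): the k-th ECHO nominally leaves at k * echo_interval and
    arrives delays[k-1] seconds late; the server closes once the gap since the
    previous arrival exceeds conn_tmo (a gap exactly equal to conn_tmo survives).
    """
    prev = 0.0  # last time the server heard from the client
    for k, delay in enumerate(delays, start=1):
        arrival = k * echo_interval + delay
        if arrival - prev > conn_tmo:
            return prev + conn_tmo  # the server gives up before this ECHO arrives
        prev = arrival
    return None


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical scheduling/network jitter of 0..5 ms per ECHO.
    delays = [rng.uniform(0.0, 0.005) for _ in range(1000)]
    for echo, tmo in [(30.0, 30.0), (15.0, 40.0), (15.0, 60.0)]:
        print(f"echo every {echo:g} s, timeout {tmo:g} s ->", first_timeout(echo, tmo, delays))
```

This model exaggerates how quickly the equal-interval case fails; the point is only that, with equal intervals, any ECHO that arrives slightly later than its predecessor loses the race, and the timeout lands on a multiple of the echo interval. With a 15 s echo, the arrival delay would have to grow by more than 25 s (40 s timeout) or 45 s (60 s timeout) between consecutive ECHOs.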
cf. #139 and epics-base/pvxs#13 (comment)