XHR Polling misbehaves under (very) poor network conditions #91

Open
exFalso opened this issue Feb 5, 2016 · 0 comments

A bit of context:
We started using a SockJS-based protocol in production a couple of weeks ago. After a couple of days of monitoring we noticed that, in rare cases, some clients behaved strangely: newly connected clients appeared to start in the middle of the application-layer protocol rather than with the regular handshake.
After some debugging we found the cause: the XHR polling client sometimes sends a poll request that is delayed by more than 5 seconds, and the server treats it as opening a new connection. If an /xhr_send arrives after that delayed poll, the server treats its payload as the first message of a newly connected client.
The server-side SockJS is our own implementation, so I checked against the reference sockjs-node implementation and was able to reproduce the same issue.

Steps to reproduce:
I'll use the echo service example from the README at https://github.com/sockjs/sockjs-node. The poor network conditions are simulated with a port-forwarding proxy.

  1. Start the server on port 9999.
  2. Start a port-forwarding proxy on localhost from port 9998 to port 9999.
  3. Open the session; the server replies with the open frame:
     $ curl -X POST localhost:9998/echo/000/000/xhr
  4. Start a poll (it blocks until data arrives, so run it in a separate terminal):
     $ curl -X POST localhost:9998/echo/000/000/xhr
  5. Send a message. The pending poll returns, and the server now waits 5 seconds for the next poll before closing the session:
     $ curl localhost:9998/echo/000/000/xhr_send --data '["Hello"]' -H content-type:text/plain
  6. Send SIGSTOP to the proxy. The port stays open, but any data sent to it is buffered by the OS.
  7. Start another poll; it does not reach the server and hangs:
     $ curl -X POST localhost:9998/echo/000/000/xhr
  8. Send another message; this command hangs as well:
     $ curl localhost:9998/echo/000/000/xhr_send --data '["World!"]' -H content-type:text/plain
  9. Wait more than 5 seconds so that the server closes the session.
  10. Send SIGCONT to the proxy. The buffered /xhr and /xhr_send requests are now delivered (hopefully in this order). The /xhr request opens a new connection with the same session_id (status 200), and the /xhr_send delivers its message to that new connection (status 204), most likely wreaking havoc in the application-layer protocol.
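To make the race easier to reason about, here is a toy Python model of the server-side session handling as observed in the steps above. It is a sketch under stated assumptions, not sockjs-node's code: ToyXhrServer, replay_steps and the timestamps are hypothetical, expiry is checked lazily on each request, and DISCONNECT_DELAY stands in for the 5-second window after which the server closes an idle session. Replaying steps 3 to 10 shows the late /xhr reopening the session and "World!" becoming the first message the new session sees.

```python
from dataclasses import dataclass, field

# Toy model, NOT sockjs-node's code: it only mimics the behaviour observed in
# the steps above. DISCONNECT_DELAY mirrors the 5 s window in steps 5 and 9.
DISCONNECT_DELAY = 5.0


@dataclass
class Session:
    last_seen: float                            # time of the latest /xhr poll
    inbox: list = field(default_factory=list)   # messages from /xhr_send


class ToyXhrServer:
    def __init__(self) -> None:
        self.sessions = {}   # session_id -> Session
        self.opened = 0      # how many times a session was (re)opened

    def _expire(self, now: float) -> None:
        # Drop sessions that have not polled within DISCONNECT_DELAY.
        stale = [sid for sid, s in self.sessions.items()
                 if now - s.last_seen > DISCONNECT_DELAY]
        for sid in stale:
            del self.sessions[sid]

    def xhr(self, sid: str, now: float):
        self._expire(now)
        if sid not in self.sessions:
            # /xhr doubles as "open": an unknown id silently starts a new session.
            self.sessions[sid] = Session(last_seen=now)
            self.opened += 1
            return "o"       # open frame
        self.sessions[sid].last_seen = now
        return "poll"        # stands in for a normal long-poll response

    def xhr_send(self, sid: str, msg: str, now: float) -> int:
        self._expire(now)
        if sid not in self.sessions:
            return 404
        self.sessions[sid].inbox.append(msg)
        return 204


def replay_steps():
    # Steps 3-10 with made-up timestamps in seconds.
    srv = ToyXhrServer()
    responses = [
        srv.xhr("000", now=0.0),                 # step 3: open frame
        srv.xhr("000", now=0.5),                 # step 4: poll
        srv.xhr_send("000", "Hello", now=1.0),   # step 5
        # Steps 6-9: proxy stopped, the next two requests wait in OS buffers.
        srv.xhr("000", now=7.0),                 # step 10: late poll reopens
        srv.xhr_send("000", "World!", now=7.0),  # ...and feeds the new session
    ]
    return responses, srv


responses, srv = replay_steps()
print(responses)                   # ['o', 'poll', 204, 'o', 204]
print(srv.opened)                  # 2
print(srv.sessions["000"].inbox)   # ['World!']
```

Because /xhr doubles as the open request, nothing in this model can distinguish a genuinely new client from a poll that was merely delayed past the timeout.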

Solution proposal 1: Move the "open connection" functionality to a new endpoint, e.g. /xhr_open. If a rogue /xhr request then arrives, the server can simply disregard it. I think this is The Right Way to solve this; however, it is obviously not backwards-compatible. Maybe keep the original behaviour around as deprecated for a few versions?
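A minimal sketch of proposal 1 using the same kind of toy model. The /xhr_open endpoint, the ToyServerWithOpen class, the 409 reply for an id that is already open, and the timestamps are all hypothetical; the point is only that once /xhr can no longer create sessions, the delayed poll and the delayed /xhr_send from step 10 are both rejected with 404 instead of reaching a fresh session.

```python
# Toy model of proposal 1 (hypothetical, not an existing SockJS API):
# only /xhr_open may create a session; /xhr and /xhr_send never do.
DISCONNECT_DELAY = 5.0


class ToyServerWithOpen:
    def __init__(self) -> None:
        self.last_seen = {}  # session_id -> time of the latest poll
        self.inbox = {}      # session_id -> messages received via /xhr_send

    def _expire(self, now: float) -> None:
        # Drop sessions that have not polled within DISCONNECT_DELAY.
        stale = [sid for sid, t in self.last_seen.items()
                 if now - t > DISCONNECT_DELAY]
        for sid in stale:
            del self.last_seen[sid]
            del self.inbox[sid]

    def xhr_open(self, sid: str, now: float):
        self._expire(now)
        if sid in self.last_seen:
            return 409       # hypothetical: id already in use
        self.last_seen[sid] = now
        self.inbox[sid] = []
        return "o"           # open frame

    def xhr(self, sid: str, now: float):
        self._expire(now)
        if sid not in self.last_seen:
            return 404       # rogue or late poll: disregarded
        self.last_seen[sid] = now
        return "poll"

    def xhr_send(self, sid: str, msg: str, now: float) -> int:
        self._expire(now)
        if sid not in self.last_seen:
            return 404
        self.inbox[sid].append(msg)
        return 204


srv = ToyServerWithOpen()
responses = [
    srv.xhr_open("000", now=0.0),
    srv.xhr("000", now=0.5),
    srv.xhr_send("000", "Hello", now=1.0),
    srv.xhr("000", now=7.0),                 # late poll after expiry
    srv.xhr_send("000", "World!", now=7.0),  # late send after expiry
]
print(responses)   # ['o', 'poll', 204, 404, 404]
```

A client that gets a 404 would then have to call /xhr_open again and go through the full handshake, which is what the application layer expects from a new connection.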
Solution proposal 2: Keep a set of recently closed session_ids on the server and reject new connections opened with any of them. This has the benefit of being backwards-compatible; however, it adds complexity to the server, and I reckon it also makes testing more cumbersome, as tests may rely on being able to open a new connection with the same session_id immediately.
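Proposal 2 can be sketched the same way. Everything here is hypothetical: the ToyServerWithTombstones class, the 60-second CLOSED_TTL grace period, and the close code and reason in CLOSE_FRAME (only the c[code,"reason"] shape follows the SockJS close frame format). /xhr still opens sessions for unknown ids, so existing clients keep working, but an id that closed less than CLOSED_TTL seconds ago is refused. Only timeout-based closes are modelled.

```python
# Toy model of proposal 2 (hypothetical names and values): remember recently
# closed session ids for CLOSED_TTL seconds and refuse to reopen them.
DISCONNECT_DELAY = 5.0
CLOSED_TTL = 60.0                           # hypothetical grace period
CLOSE_FRAME = 'c[3000,"Session expired"]'  # hypothetical code and reason


class ToyServerWithTombstones:
    def __init__(self) -> None:
        self.last_seen = {}  # open session_id -> time of the latest poll
        self.inbox = {}      # open session_id -> received messages
        self.closed_at = {}  # recently closed session_id -> close time

    def _expire(self, now: float) -> None:
        # Close idle sessions, remembering when each one timed out.
        for sid, t in list(self.last_seen.items()):
            if now - t > DISCONNECT_DELAY:
                del self.last_seen[sid]
                del self.inbox[sid]
                self.closed_at[sid] = t + DISCONNECT_DELAY
        # Forget tombstones older than CLOSED_TTL so the set stays bounded.
        for sid, t in list(self.closed_at.items()):
            if now - t > CLOSED_TTL:
                del self.closed_at[sid]

    def xhr(self, sid: str, now: float):
        self._expire(now)
        if sid in self.closed_at:
            return CLOSE_FRAME   # late poll for a dead session: refuse
        if sid not in self.last_seen:
            self.last_seen[sid] = now
            self.inbox[sid] = []
            return "o"           # unchanged: /xhr still opens new sessions
        self.last_seen[sid] = now
        return "poll"

    def xhr_send(self, sid: str, msg: str, now: float) -> int:
        self._expire(now)
        if sid not in self.last_seen:
            return 404
        self.inbox[sid].append(msg)
        return 204


srv = ToyServerWithTombstones()
responses = [
    srv.xhr("000", now=0.0),
    srv.xhr("000", now=0.5),
    srv.xhr_send("000", "Hello", now=1.0),
    srv.xhr("000", now=7.0),                 # late poll: refused
    srv.xhr_send("000", "World!", now=7.0),  # session gone: 404
    srv.xhr("000", now=70.0),                # tombstone expired: id reusable
]
print(responses)  # ['o', 'poll', 204, 'c[3000,"Session expired"]', 404, 'o']
```

Within the grace period the same session_id cannot be reopened, which is exactly the testing concern raised above: a test that reuses an id right after it closed would get the close frame instead of "o".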
