XHR Polling misbehaves under (very) poor network conditions #91

A bit of context:
We started using a SockJS-based protocol in production a couple of weeks ago. After a few days of monitoring we noticed some clients behaving oddly in rare cases: newly connected clients appeared to start in the middle of the application-layer protocol rather than with the regular handshake.
After some debugging we realized the cause: the XHR polling client sometimes sends a poll request that arrives more than 5 seconds late, after the server has already dropped the connection, so the server treats it as a new connection. If an /xhr_send is also in flight at that point, the server treats it as the first message of the newly connected client.
Our server-side SockJS is our own implementation, so I checked against the reference sockjs-node implementation and could reproduce the same issue there.
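
To make the failure mode concrete, here is a minimal in-memory model of the session handling, written in Python. It is only a sketch: `ToyServer`, its method names and the timestamps are invented for illustration, polls return immediately instead of hanging, and the 5 second timeout is counted from the arrival of the last poll. It is not how sockjs-node is actually structured.

```python
DISCONNECT_DELAY = 5.0  # seconds without a poll before the session is dropped


class ToyServer:
    """Hypothetical, simplified model of XHR polling session handling."""

    def __init__(self):
        # session_id -> {"last_poll": float, "received": list of messages}
        self.sessions = {}

    def _expire(self, now):
        # Drop every session whose last poll is older than the timeout.
        stale = [sid for sid, s in self.sessions.items()
                 if now - s["last_poll"] > DISCONNECT_DELAY]
        for sid in stale:
            del self.sessions[sid]

    def xhr(self, sid, now):
        self._expire(now)
        if sid not in self.sessions:
            # Unknown session_id: the poll doubles as "open connection".
            self.sessions[sid] = {"last_poll": now, "received": []}
            return "o"
        self.sessions[sid]["last_poll"] = now
        return "poll"

    def xhr_send(self, sid, message, now):
        self._expire(now)
        if sid not in self.sessions:
            return 404
        self.sessions[sid]["received"].append(message)
        return 204


server = ToyServer()
server.xhr("000", now=0.0)                 # open frame
server.xhr("000", now=1.0)                 # regular poll
server.xhr_send("000", "Hello", now=2.0)   # accepted by the original session

# The network stalls; the next poll and send only arrive at t = 10.
late_poll = server.xhr("000", now=10.0)               # "o": a new session
late_send = server.xhr_send("000", "World!", now=10.1)  # 204
first_message_of_new_session = server.sessions["000"]["received"][0]
```

In this model the delayed poll silently recreates session `000`, and `"World!"` ends up as the first message of a session that never saw `"Hello"`.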

Steps to reproduce:
I'll use the example echo service from the https://github.com/sockjs/sockjs-node README. The poor network condition is simulated with a port-forwarding proxy.
1. Start server on port 9999
2. Start port forward on localhost from 9998 to 9999
3. `$ curl -X POST localhost:9998/echo/000/000/xhr` (returns the open frame)
4. `$ curl -X POST localhost:9998/echo/000/000/xhr` (poll, stays open until a message arrives)
5. `$ curl localhost:9998/echo/000/000/xhr_send --data '["Hello"]' -H content-type:text/plain` (the pending poll returns; the server now waits 5 seconds for the next poll before closing)
6. Send SIGSTOP to the proxy. The port stays open, but any data sent is buffered by the OS.
7. `$ curl -X POST localhost:9998/echo/000/000/xhr` (poll, does not reach the server)
8. `$ curl localhost:9998/echo/000/000/xhr_send --data '["World!"]' -H content-type:text/plain` (hangs)
9. Wait 5 seconds so that the server drops the connection.
10. Send SIGCONT to the proxy. This causes the buffered /xhr and /xhr_send requests to be delivered (hopefully in this order). The /xhr request opens a new connection with the same session_id (return code 200) and the /xhr_send delivers the message (return code 204), most probably causing havoc in the application-layer protocol.
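
The steps above do not depend on a particular proxy. As one possibility, a small TCP port forwarder can be sketched in Python as below. The names `pipe` and `forward` are made up for this example, error handling is minimal, and sockets are only half-closed rather than cleaned up properly, so treat it as a test helper under those assumptions rather than a real proxy.

```python
import socket
import threading


def pipe(src, dst):
    """Copy bytes from src to dst until src reaches EOF, then half-close dst."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass  # the peer went away; nothing more to relay
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def forward(listener, target_port, host="127.0.0.1"):
    """Accept connections on listener forever and relay each to host:target_port."""
    while True:
        client, _ = listener.accept()
        try:
            upstream = socket.create_connection((host, target_port))
        except OSError:
            client.close()  # target not reachable; drop this client
            continue
        # One thread per direction so both sides can talk independently.
        threading.Thread(target=pipe, args=(client, upstream), daemon=True).start()
        threading.Thread(target=pipe, args=(upstream, client), daemon=True).start()


# To forward 9998 -> 9999 as in step 2, one would run:
# forward(socket.create_server(("127.0.0.1", 9998)), 9999)
```

Running the commented call in its own process gives a PID that SIGSTOP and SIGCONT can be sent to in steps 6 and 10.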

Solution proposal 1: Move the "open connection" functionality to a new endpoint, e.g. /xhr_open. If a rogue /xhr request arrives for a session that is not open, the server can simply disregard it. I think this is The Right Way to solve this, however it is obviously not backwards-compatible. Maybe keep the original behaviour around as deprecated for a few versions?
Solution proposal 2: Keep a set of recently closed session_ids on the server and reject new connections opened with such a session_id. This has the benefit of being backwards-compatible, however it adds complexity to the server, and I reckon it also makes testing more cumbersome, as tests may rely on being able to instantly open a new connection with the same session_id.
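
A rough Python model of proposal 1, to show the intended behaviour. Everything here is hypothetical: the class name, the `timeout` helper and the choice of 404 for a disregarded poll are assumptions for illustration, not part of the SockJS protocol.

```python
class OpenEndpointServer:
    """Toy model of proposal 1: only xhr_open may create a session."""

    def __init__(self):
        self.sessions = {}  # session_id -> list of received messages

    def xhr_open(self, sid):
        # The only place where a session is created.
        self.sessions[sid] = []
        return 200

    def xhr(self, sid):
        # A poll for an unknown (for example timed out) session is disregarded.
        return 200 if sid in self.sessions else 404

    def xhr_send(self, sid, message):
        if sid not in self.sessions:
            return 404
        self.sessions[sid].append(message)
        return 204

    def timeout(self, sid):
        # Stand-in for the server dropping the session after 5 seconds.
        self.sessions.pop(sid, None)


srv = OpenEndpointServer()
srv.xhr_open("000")
srv.xhr("000")
srv.xhr_send("000", "Hello")
srv.timeout("000")

rogue_poll = srv.xhr("000")                 # 404, no session is recreated
rogue_send = srv.xhr_send("000", "World!")  # 404, message is not delivered
```

With the open step separated out, the delayed poll and the delayed send both fail, so the client can detect the dead session and reconnect with a proper handshake.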
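
Proposal 2 could look roughly like the following Python sketch. The `RecentlyClosed` class, its method names and the 30 second default are assumptions made for this example; how long an id should stay blocked is an open question.

```python
class RecentlyClosed:
    """Remembers closed session_ids for ttl seconds (hypothetical helper)."""

    def __init__(self, ttl=30.0):
        self.ttl = ttl
        self._closed = {}  # session_id -> time at which it was closed

    def mark_closed(self, sid, now):
        self._closed[sid] = now

    def is_blocked(self, sid, now):
        self._purge(now)
        return sid in self._closed

    def _purge(self, now):
        # Forget ids that were closed at least ttl seconds ago.
        cutoff = now - self.ttl
        for sid in [s for s, t in self._closed.items() if t <= cutoff]:
            del self._closed[sid]


closed = RecentlyClosed(ttl=30.0)
closed.mark_closed("000", now=7.0)

blocked_soon_after = closed.is_blocked("000", now=10.0)  # True: reject the /xhr
blocked_much_later = closed.is_blocked("000", now=40.0)  # False: id is reusable

# With ttl=0 nothing is ever blocked, which keeps instant reconnects possible.
for_tests = RecentlyClosed(ttl=0.0)
for_tests.mark_closed("000", now=7.0)
blocked_in_tests = for_tests.is_blocked("000", now=7.0)  # False
```

Making the time-to-live configurable is one way to soften the testing concern: a test suite that reopens the same session_id immediately could run with `ttl=0`.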

