-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Propagate peer ID change to outgoing reliables #15680
Conversation
Otherwise a desync could ocurr since the server does strict checking. fixes luanti-org#15627
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the patch makes the server not ack the "connection start" packet the first time its sent. this reliably produces the faulty situation.
I applied the diff to master and started a local server. Then I connected with another instance (same build) to the server. ACK is indeed skipped exactly once (first joined player). The client can join. I only get the following --trace
log on client-side:
... (no earlier mention of "con(" or "Connection")
INFO[Main]: Client::ReceiveAll(): Packet processing budget exceeded.
INFO[Main]: Client::afterContentReceived() started
...
INFO[Main]: getShaderIdDirect(): Returning id=23 for name "nodes_shader"
VERBOSE[ConnectionSend]: con(18/50068)RE-SENDING timed-out RELIABLE to 127.0.0.1(t/o=0.5): count=1, channel=0, seqnum=65500
INFO[Main]: SourceShaderCache::getOrLoad(): No path found for "nodes_shader/opengl_geometry.glsl"
...
... (no earlier mention of "con(" or "Connection")
INFO[Main]: Client::ReceiveAll(): Packet processing budget exceeded.
INFO[Main]: Client::afterContentReceived() started
...
INFO[Main]: getShaderIdDirect(): Returning id=23 for name "nodes_shader"
VERBOSE[ConnectionSend]: con(18/50068)RE-SENDING timed-out RELIABLE to 127.0.0.1(t/o=0.5): count=1, channel=0, seqnum=65500
...
INFO[Main]: - Starting mesh update thread
TRACE[ConnectionSend]: con(18/50068)UDP processing reliable CONNCMD_SEND
TRACE[ConnectionSend]: con(18/50068) processing reliable command for peer id: 1 data size: 36
TRACE[ConnectionSend]: con(18/50068) channel: 1, peer quota:1024
TRACE[ConnectionSend]: reliables on wire: 0, waiting for ack for 0
TRACE[ConnectionSend]: incoming_reliables: 0, next reliable packet: 65500, next queued: 0
TRACE[ConnectionSend]: reliables queued : 1
TRACE[ConnectionSend]: queued commands : 0
TRACE[ConnectionSend]: con(18/50068) INFO: sending a queued reliable packet channel: 1, seqnum: 65503
TRACE[ConnectionReceive]: con(18/50068) [ CONTROLTYPE_ACK: channelnum=1, peer_id=1, seqnum=65503 ]
TRACE[ConnectionReceive]: con(18/50068) Queuing ACK command to peer_id: 1 channel: 0 seqnum: 238
TRACE[ConnectionReceive]: con(18/50068)RECURSIVE, TYPE_RELIABLE peer_id: 1, channel: 0, seqnum: 238
TRACE[ConnectionReceive]: con(18/50068)RETURNING TYPE_ORIGINAL to user
Is this what should be expected in master + diff?
For comparison, PR + diff:
INFO[Main]: Client::ReceiveAll(): Packet processing budget exceeded.
INFO[Main]: Client::afterContentReceived() started
...
INFO[Main]: getShaderIdDirect(): Returning id=24 for name "nodes_shader"
VERBOSE[ConnectionSend]: con(18/23091)RE-SENDING timed-out RELIABLE to 127.0.0.1(t/o=0.5): count=1, channel=0, seqnum=65500
TRACE[ConnectionReceive]: con(18/23091) [ CONTROLTYPE_ACK: channelnum=0, peer_id=1, seqnum=65500 ]
TRACE[ConnectionReceive]: con(18/23091) set resend timeout 1.024 (rtt=0.512) for peer id: 1
...
INFO[Main]: - Starting mesh update thread
TRACE[ConnectionSend]: con(18/23091)UDP processing reliable CONNCMD_SEND
TRACE[ConnectionSend]: con(18/23091) processing reliable command for peer id: 1 data size: 36
TRACE[ConnectionSend]: con(18/23091) channel: 1, peer quota:1024
TRACE[ConnectionSend]: reliables on wire: 0, waiting for ack for 0
TRACE[ConnectionSend]: incoming_reliables: 0, next reliable packet: 65500, next queued: 0
TRACE[ConnectionSend]: reliables queued : 1
TRACE[ConnectionSend]: queued commands : 0
TRACE[ConnectionSend]: con(18/23091) INFO: sending a queued reliable packet channel: 1, seqnum: 65503
TRACE[ConnectionReceive]: con(18/23091) [ CONTROLTYPE_ACK: channelnum=1, peer_id=1, seqnum=65503 ]
TRACE[ConnectionReceive]: con(18/23091) Queuing ACK command to peer_id: 1 channel: 0 seqnum: 238
TRACE[ConnectionReceive]: con(18/23091)RECURSIVE, TYPE_RELIABLE peer_id: 1, channel: 0, seqnum: 238
TRACE[ConnectionReceive]: con(18/23091)RETURNING TYPE_ORIGINAL to user
The changes make sense and do look correct. However, I am not sure how else this could be tested.
Not easy to tell from the logs. Your best bet bet is to use Wireshark to look at the packets. Edit: also to be precise, in the buggy state the client will disconnect with an errors 30s after joining. you can check if that happens. |
Unfortunately, I could not reproduce this error. However, the code does look good, thus I could give my approval regardless. |
I hope to never touch the networking code again.
To do
This PR is Ready for Review.
How to test
the patch makes the server not ack the "connection start" packet the first time its sent. this reliably produces the faulty situation.
with the PR applies you will see the second time the client sends the "connection start" packet it already has the new peer id assigned to by the server.
note: in case you want to test a more complicated setup: the fix (this PR) is client-side, the test repro patch is server-side