Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Propagate peer ID change to outgoing reliables #15680

Merged
merged 4 commits into from
Feb 1, 2025
Merged

Conversation

sfan5
Copy link
Collaborator

@sfan5 sfan5 commented Jan 14, 2025

I hope to never touch the networking code again.

To do

This PR is Ready for Review.

How to test

diff --git a/src/network/mtp/impl.cpp b/src/network/mtp/impl.cpp
--- a/src/network/mtp/impl.cpp
+++ b/src/network/mtp/impl.cpp
@@ -1661,6 +1661,14 @@ void Connection::sendAck(session_t peer_id, u8 channelnum, u16 seqnum)
                        " channel: " << (channelnum & 0xFF) <<
                        " seqnum: " << seqnum << std::endl);
 
+       static bool first = true;
+       if (peer_id != PEER_ID_SERVER && channelnum == 0 && seqnum == SEQNUM_INITIAL) {
+               if (first) {
+                       first = false;
+                       return;
+               }
+       }
+
        SharedBuffer<u8> ack(4);
        writeU8(&ack[0], PACKET_TYPE_CONTROL);
        writeU8(&ack[1], CONTROLTYPE_ACK);
  1. apply patch, reproduce issue on master
  2. switch to PR (keep patch)
  3. issue is gone

the patch makes the server not ack the "connection start" packet the first time its sent. this reliably produces the faulty situation.
with the PR applies you will see the second time the client sends the "connection start" packet it already has the new peer id assigned to by the server.

note: in case you want to test a more complicated setup: the fix (this PR) is client-side, the test repro patch is server-side

Otherwise a desync could ocurr since the server does strict checking.
fixes luanti-org#15627
@sfan5 sfan5 added @ Network Bugfix 🐛 PRs that fix a bug labels Jan 14, 2025
@sfan5 sfan5 added this to the 5.11.0 milestone Jan 14, 2025
@sfan5 sfan5 linked an issue Jan 14, 2025 that may be closed by this pull request
Copy link
Member

@SmallJoker SmallJoker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the patch makes the server not ack the "connection start" packet the first time its sent. this reliably produces the faulty situation.

I applied the diff to master and started a local server. Then I connected with another instance (same build) to the server. ACK is indeed skipped exactly once (first joined player). The client can join. I only get the following --trace log on client-side:

... (no earlier mention of "con(" or "Connection")
INFO[Main]: Client::ReceiveAll(): Packet processing budget exceeded.
INFO[Main]: Client::afterContentReceived() started
...
INFO[Main]: getShaderIdDirect(): Returning id=23 for name "nodes_shader"
VERBOSE[ConnectionSend]: con(18/50068)RE-SENDING timed-out RELIABLE to 127.0.0.1(t/o=0.5): count=1, channel=0, seqnum=65500
INFO[Main]: SourceShaderCache::getOrLoad(): No path found for "nodes_shader/opengl_geometry.glsl"
...
... (no earlier mention of "con(" or "Connection")
INFO[Main]: Client::ReceiveAll(): Packet processing budget exceeded.
INFO[Main]: Client::afterContentReceived() started
...
INFO[Main]: getShaderIdDirect(): Returning id=23 for name "nodes_shader"
VERBOSE[ConnectionSend]: con(18/50068)RE-SENDING timed-out RELIABLE to 127.0.0.1(t/o=0.5): count=1, channel=0, seqnum=65500
...
INFO[Main]: - Starting mesh update thread
TRACE[ConnectionSend]: con(18/50068)UDP processing reliable CONNCMD_SEND
TRACE[ConnectionSend]: con(18/50068) processing reliable command for peer id: 1 data size: 36
TRACE[ConnectionSend]: con(18/50068)	 channel: 1, peer quota:1024
TRACE[ConnectionSend]: 			reliables on wire: 0, waiting for ack for 0
TRACE[ConnectionSend]: 			incoming_reliables: 0, next reliable packet: 65500, next queued: 0
TRACE[ConnectionSend]: 			reliables queued : 1
TRACE[ConnectionSend]: 			queued commands  : 0
TRACE[ConnectionSend]: con(18/50068) INFO: sending a queued reliable packet  channel: 1, seqnum: 65503
TRACE[ConnectionReceive]: con(18/50068) [ CONTROLTYPE_ACK: channelnum=1, peer_id=1, seqnum=65503 ]
TRACE[ConnectionReceive]: con(18/50068) Queuing ACK command to peer_id: 1 channel: 0 seqnum: 238
TRACE[ConnectionReceive]: con(18/50068)RECURSIVE, TYPE_RELIABLE peer_id: 1, channel: 0, seqnum: 238
TRACE[ConnectionReceive]: con(18/50068)RETURNING TYPE_ORIGINAL to user

Is this what should be expected in master + diff?

For comparison, PR + diff:

INFO[Main]: Client::ReceiveAll(): Packet processing budget exceeded.
INFO[Main]: Client::afterContentReceived() started
...
INFO[Main]: getShaderIdDirect(): Returning id=24 for name "nodes_shader"
VERBOSE[ConnectionSend]: con(18/23091)RE-SENDING timed-out RELIABLE to 127.0.0.1(t/o=0.5): count=1, channel=0, seqnum=65500
TRACE[ConnectionReceive]: con(18/23091) [ CONTROLTYPE_ACK: channelnum=0, peer_id=1, seqnum=65500 ]
TRACE[ConnectionReceive]: con(18/23091) set resend timeout 1.024 (rtt=0.512) for peer id: 1
...
INFO[Main]: - Starting mesh update thread
TRACE[ConnectionSend]: con(18/23091)UDP processing reliable CONNCMD_SEND
TRACE[ConnectionSend]: con(18/23091) processing reliable command for peer id: 1 data size: 36
TRACE[ConnectionSend]: con(18/23091)	 channel: 1, peer quota:1024
TRACE[ConnectionSend]: 			reliables on wire: 0, waiting for ack for 0
TRACE[ConnectionSend]: 			incoming_reliables: 0, next reliable packet: 65500, next queued: 0
TRACE[ConnectionSend]: 			reliables queued : 1
TRACE[ConnectionSend]: 			queued commands  : 0
TRACE[ConnectionSend]: con(18/23091) INFO: sending a queued reliable packet  channel: 1, seqnum: 65503
TRACE[ConnectionReceive]: con(18/23091) [ CONTROLTYPE_ACK: channelnum=1, peer_id=1, seqnum=65503 ]
TRACE[ConnectionReceive]: con(18/23091) Queuing ACK command to peer_id: 1 channel: 0 seqnum: 238
TRACE[ConnectionReceive]: con(18/23091)RECURSIVE, TYPE_RELIABLE peer_id: 1, channel: 0, seqnum: 238
TRACE[ConnectionReceive]: con(18/23091)RETURNING TYPE_ORIGINAL to user

The changes make sense and do look correct. However, I am not sure how else this could be tested.

src/network/mtp/threads.cpp Outdated Show resolved Hide resolved
@sfan5
Copy link
Collaborator Author

sfan5 commented Jan 19, 2025

Is this what should be expected in master + diff?

Not easy to tell from the logs. Your best bet bet is to use Wireshark to look at the packets.

Edit: also to be precise, in the buggy state the client will disconnect with an errors 30s after joining. you can check if that happens.

src/network/mtp/threads.cpp Outdated Show resolved Hide resolved
src/network/mtp/threads.cpp Outdated Show resolved Hide resolved
src/network/mtp/threads.cpp Outdated Show resolved Hide resolved
@SmallJoker
Copy link
Member

Edit: also to be precise, in the buggy state the client will disconnect with an errors 30s after joining. you can check if that happens.

Unfortunately, I could not reproduce this error. However, the code does look good, thus I could give my approval regardless.

@sfan5 sfan5 merged commit c0422b1 into luanti-org:master Feb 1, 2025
16 checks passed
@sfan5 sfan5 deleted the blergh branch February 1, 2025 12:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Disconnected after 30 seconds on servers upgraded to 5.10 (or even 5.9)
2 participants