Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] fix applyGroupSequences #2735

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

gou4shi1
Copy link
Contributor

@gou4shi1 gou4shi1 commented May 11, 2023

The issue

The log of receiver:

/SRT:RcvQ:w106 D:SRT.gm: applyGroupSequences: @1071584760 gets seq from @1071584834 rcv %787865610 snd %787865661 as GROUPWISE snd-seq
/SRT:RcvQ:w106 D:SRT.gm: @1071584760: synchronizeWithGroup: DERIVED ISN: RCV=%787865661 -> %787865610 (shift by -51) SND=%787865661 -> %787865661 (shift by 0)
/SRT:RcvQ:w136 D:SRT.gm: applyGroupSequences: no socket found connected and transmitting, @1071584722 not changing sequences, storing snd-seq %787865661
/SRT:RcvQ:w136 D:SRT.gm: @1071584722: synchronizeWithGroup: DEFINED ISN: RCV=%787865661 SND=%787865661

The correct recv base seq was %787865610, and then a new member @1071584722 was comming while all other members were broken at the same time.
@1071584722 didn't inherit the correct recv base seq, but defined it's own %787865661

However, at the sender side, when the peer of @1071584722 was coming, the peer of @1071584760 was not yet broken coincidently, so @1071584722 succeeded to inherit the correct send base seq %787865610 from CUDTGroup::m_iLastSchedSeqNo.

As a result, the sender sent from %787865610 while the receiver received from %787865661, the first 51 packets triggered DROPREQ, so only the following packets succeeded to transmit.

The proposed solution 1 (this PR)

CUDTGroup::m_iLastSchedSeqNo is a good design, because members status can't be consistent all the time in both side, we should store group-wise send-base-seq and recv-base-seq. It's defined by the first connected socket, and is updated by the ACK timer (not by every packet, to reduce overhead).

The proposed solution 2

rm this check

if ((!m_config.bRendezvous) && (m_ConnRes.m_iISN != m_iISN)) // secuity check
    e = CUDTException(MJ_SETUP, MN_SECURITY, 0);

so that RESPONDER can just use the CUDTGROUP::m_iLastSchedSeqNo as m_iISN, instead of w_hs.m_iISN, then the whole applyGroupSequences() function can be eliminated.

@gou4shi1 gou4shi1 marked this pull request as draft May 11, 2023 08:30
@gou4shi1 gou4shi1 force-pushed the fix-group-sync branch 3 times, most recently from 774212f to 2b345fa Compare May 11, 2023 09:35

if (was_empty)
{
gp->syncWithSocket(s->core(), HSD_RESPONDER);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To fix the issue, it should keep previous group-wise sequences even if the group is empty.

{
if (m_bConnected) // You are the first one, no need to change.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To fix the issue, it should inherit previous group-wise sequences even if the group is not connected (empty).

@@ -263,7 +201,6 @@ CUDTGroup::CUDTGroup(SRT_GROUP_TYPE gtype)
, m_bSynSending(true)
, m_bTsbPd(true)
, m_bTLPktDrop(true)
, m_iTsbPdDelay_us(0)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was not used.

// number will collide with any ISN provided by a socket.
// Also since now every socket will derive this ISN.
m_iLastSchedSeqNo = generateISN();
resetInitialRxSequence();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To fix the issue, it should keep previous group-wise sequences even if the group is empty.

@gou4shi1 gou4shi1 marked this pull request as ready for review May 11, 2023 10:02
@maxsharabayko maxsharabayko added this to the v1.5.2 milestone May 11, 2023
@maxsharabayko maxsharabayko added Type: Bug Indicates an unexpected problem or unintended behavior [core] Area: Changes in SRT library core labels May 11, 2023
@codecov-commenter
Copy link

Codecov Report

Merging #2735 (4c68203) into master (3cefede) will decrease coverage by 0.74%.
The diff coverage is 90.47%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

@@            Coverage Diff             @@
##           master    #2735      +/-   ##
==========================================
- Coverage   67.51%   66.78%   -0.74%     
==========================================
  Files          99       99              
  Lines       20166    20148      -18     
==========================================
- Hits        13616    13456     -160     
- Misses       6550     6692     +142     
Impacted Files Coverage Δ
srtcore/group.cpp 38.11% <84.61%> (+0.34%) ⬆️
srtcore/core.cpp 60.83% <100.00%> (-1.40%) ⬇️
srtcore/group.h 60.68% <100.00%> (-0.54%) ⬇️

... and 11 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@ethouris
Copy link
Collaborator

I'm sorry, but I completely don't understand the fix.

The idea for applyGroupSequences was to derive the settings from the socket to the group, if it was the first socket, or to forcefully apply the settings that the group already has otherwise.

If the problem was that the first member was broken and it caused inconsistency between connection sides, maybe it could be done differently: the ISN in case of group could be created already by a group when connecting the first socket (there is a possibility to enforce ISN when connecting) so that every socket will come in with exactly the same ISN, no matter which one succeeds as the first connection and therefore the first group member.

@ethouris
Copy link
Collaborator

Just BTW - I found #2444 that mentions a kind of this problem in the description (Problem 2). Might you be able to check if this solves the same problem or is anyhow related?

@ethouris
Copy link
Collaborator

ethouris commented Sep 6, 2024

Just FYI - I added a test #3020 to try to test the described situation. If this test doesn't represent the problem you have described in this PR, please try to improve it.

If I understand the problem correctly, that is, it came from a problem that a new empty group could derive ISN from the first socket, which in case of failure is dismissed, then this behavior is changed: the ISN for the group is set initially to a generated value, which is declared in the constructor. I believe this comment here directly refers to this problem:

    // Set this data immediately during creation before
    // two or more sockets start arguing about it.
    m_iLastSchedSeqNo = CUDT::generateISN();

@maxsharabayko maxsharabayko modified the milestones: v1.6.0, v1.5.4 Sep 9, 2024
@maxsharabayko
Copy link
Collaborator

@ethouris

If I understand the problem correctly, that is, it came from a problem that a new empty group could derive ISN from the first socket, which in case of failure is dismissed, then this behavior is changed: the ISN for the group is set initially to a generated value, which is declared in the constructor. I believe this comment here directly refers to this problem:

// Set this data immediately during creation before
// two or more sockets start arguing about it.
m_iLastSchedSeqNo = CUDT::generateISN();

This initialization was done in #1438 (SRT v1.4.2). I suppose the reported problem is on the receiver side 🤔

@maxsharabayko maxsharabayko modified the milestones: v1.5.4, v1.6.0 Sep 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[core] Area: Changes in SRT library core Type: Bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants