[FR] Add reconnection counter for the group in main/backup mode #1522

Open
1 task
maxsharabayko opened this issue Sep 3, 2020 · 3 comments · May be fixed by #1552
Labels: [core], Priority: High, Type: Enhancement

Comments

@maxsharabayko
Collaborator

When an active link is switched to a backup link, there should be a way for the application to know those switches are happening.
For example, a switch counter for the group.

Suppose we have Link A (main link) and Link B (backup link).
The initial value of the switch counter for a group is 0.

If Link A becomes (is detected as) unstable, Link B is activated, and the switch counter is incremented by 1.

If, after some time, Link A becomes stable again and transmission continues over it while Link B goes back to idle, the switch counter is incremented again. This holds even if both links were used for transmission during the period of instability.
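
The counting rule described above can be sketched as a small model. This is only an illustration, not SRT code: the class name `SwitchCounter` and its two event methods are hypothetical, and the choice of which events trigger an increment is an assumption taken from the two cases in the description.

```python
from dataclasses import dataclass


@dataclass
class SwitchCounter:
    """Hypothetical model of the proposed per-group switch counter."""

    backup_active: bool = False
    count: int = 0  # the initial value for a group is 0

    def on_main_unstable(self) -> None:
        # Link A is detected as unstable, so Link B is activated.
        # Repeated reports while B is already active do not count again.
        if not self.backup_active:
            self.backup_active = True
            self.count += 1

    def on_backup_silenced(self) -> None:
        # Link A is stable again and Link B goes back to idle.
        if self.backup_active:
            self.backup_active = False
            self.count += 1


counter = SwitchCounter()
counter.on_main_unstable()    # A unstable, B activated
counter.on_backup_silenced()  # A stable again, B idle
print(counter.count)
```

In this model, a period in which both links transmit does not add anything by itself; only the activation of Link B and its return to idle change the counter.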

TODO

  • Define the API function to determine the switch count.
    • It should not be a group statistic.
    • It could be the srt_group_data(..) function, but an additional argument would be required (GROUP_STATUS).
    • It could be a read-only socket option SRTO_SWITCH_COUNT.
@maxsharabayko maxsharabayko added Type: Enhancement Indicates new feature requests [core] Area: Changes in SRT library core labels Sep 3, 2020
@maxsharabayko maxsharabayko added this to the v1.5.0 - Sprint 23 milestone Sep 3, 2020
@ethouris
Collaborator

Reconnection is a concept that doesn't exist in the group implementation, so this can't be implemented. The SRT library simply doesn't have the information it would need to calculate this statistic.

Here's the problem: SRT knows only that the application requested a connection. If this connection breaks, it's removed. If the application connects to the same address again, SRT has no "history information" about the connection that was used before. Even if you have used this connection in the past, for SRT it is always a "new connection".

The application is the only actor here that knows that a particular connection is "renewed", and also the only actor that undertakes the action of "reconnecting a broken link". The best place to keep this information is therefore the application's connection table, as this is the only place that can "identify the old and new link as the same".

@maxsharabayko
Collaborator Author

"Reconnection counter" is an incorrect term I used in the title. However, the description talks about the "switch counter".

@ethouris
Collaborator

ethouris commented Sep 15, 2020

Ok, so one more thing, just to make sure about the implementation.

If a link is considered unstable, the only reaction is the activation of the highest-priority idle link. Nothing more happens at that moment, and from this time on two links are in use. Also, nothing is done about this fact for some initial period of time, as the new link is "temporarily activated", and until this state is over it's not considered parallel.

Only after this period is over is the temporary activation state cleared, and from this moment the link can potentially be treated as parallel. USUALLY, though, if the unstable link was really broken, it will likely be removed before this check could be performed. It works more or less this way (let's say that every line represents a single call of the sending function; the number of these calls here is just far smaller than in reality):

running/stable | idle
running/stable | idle
running/UNSTABLE | running/tmpactive <-- GROUPSTABTIMEO exceeded, link unstable!
running/UNSTABLE | running/tmpactive (no parallel links atm)
running/UNSTABLE | running/tmpactive
running/UNSTABLE | running/tmpactive
BROKEN | running/tmpactive <-- the stability problem resolved to broken link
... | running/tmpactive
... | running/stable <-- Temporary activation period is over

The question is: which event exactly should cause the counter to be increased?
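
To make the candidate events concrete, the trace above can be transcribed into data and scanned for state transitions. This is a rough sketch under assumptions: the tuple encoding, the event names, and the "GONE" placeholder for the "..." rows are all made up for illustration and are not part of the SRT implementation.

```python
# One (main link state, backup link state) tuple per send call,
# transcribed from the trace above. "GONE" is a placeholder chosen
# here for the "..." rows, where the broken link no longer appears.
TRACE = [
    ("stable", "idle"),
    ("stable", "idle"),
    ("UNSTABLE", "tmpactive"),  # GROUPSTABTIMEO exceeded
    ("UNSTABLE", "tmpactive"),
    ("UNSTABLE", "tmpactive"),
    ("UNSTABLE", "tmpactive"),
    ("BROKEN", "tmpactive"),    # instability resolved to a broken link
    ("GONE", "tmpactive"),
    ("GONE", "stable"),         # temporary activation period is over
]


def candidate_events(trace):
    """Return (step index, event name) for each transition that could count."""
    events = []
    for step in range(1, len(trace)):
        prev_main, prev_backup = trace[step - 1]
        main, backup = trace[step]
        if prev_backup == "idle" and backup == "tmpactive":
            events.append((step, "backup activated"))
        if prev_main != "BROKEN" and main == "BROKEN":
            events.append((step, "main broken"))
        if prev_backup == "tmpactive" and backup == "stable":
            events.append((step, "temporary activation over"))
    return events


for step, name in candidate_events(TRACE):
    print(step, name)
```

Each of the three transitions happens exactly once in this trace, at a different send call, so a counter tied to any one of them would report the same total but change at a different moment.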

Note also that in the case of a "stability overreaction" the scenario is a little different:

running/stable | idle
running/stable | idle
running/UNSTABLE | running/tmpactive <-- GROUPSTABTIMEO exceeded, link unstable!
running/UNSTABLE | running/tmpactive (no parallel links atm)
running/stable | running/tmpactive <-- Stable again!
running/stable | running/tmpactive
running/stable | running/stable <-- Temporary activation period is over
running/stable | idle <-- Link silenced

I think this situation also deserves some stats to be collected, but if you only have a counter of activated links, these two situations would not be distinguishable.
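
A quick way to see the point about distinguishability is to count several kinds of transitions over both traces. The sketch below is an assumption-laden illustration: the statistic names `activations`, `silenced`, and `broken` are hypothetical, as is the "GONE" placeholder for a removed link.

```python
# (main link state, backup link state) per send call, from the two traces.
FAILOVER = [
    ("stable", "idle"), ("stable", "idle"),
    ("UNSTABLE", "tmpactive"), ("UNSTABLE", "tmpactive"),
    ("UNSTABLE", "tmpactive"), ("UNSTABLE", "tmpactive"),
    ("BROKEN", "tmpactive"),
    ("GONE", "tmpactive"),  # "GONE" is a placeholder for a removed link
    ("GONE", "stable"),
]
OVERREACTION = [
    ("stable", "idle"), ("stable", "idle"),
    ("UNSTABLE", "tmpactive"), ("UNSTABLE", "tmpactive"),
    ("stable", "tmpactive"), ("stable", "tmpactive"),
    ("stable", "stable"),
    ("stable", "idle"),     # backup link silenced
]


def count_stats(trace):
    """Count hypothetical per-group events over consecutive send calls."""
    stats = {"activations": 0, "silenced": 0, "broken": 0}
    for (prev_main, prev_backup), (main, backup) in zip(trace, trace[1:]):
        if prev_backup == "idle" and backup != "idle":
            stats["activations"] += 1
        if prev_backup != "idle" and backup == "idle":
            stats["silenced"] += 1
        if prev_main != "BROKEN" and main == "BROKEN":
            stats["broken"] += 1
    return stats


print(count_stats(FAILOVER))
print(count_stats(OVERREACTION))
```

Both traces yield one activation, so an activation counter alone cannot tell them apart. Under these assumed definitions, only the `silenced` and `broken` counts differ between the two scenarios.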

@ethouris ethouris linked a pull request Sep 16, 2020 that will close this issue
@mbakholdina mbakholdina modified the milestones: v1.5.0 - Sprint 23, v1.5.0 - Sprint 25 Sep 21, 2020
@mbakholdina mbakholdina modified the milestones: v1.5.0 - Sprint 25, v1.5.0 Oct 14, 2020
@maxsharabayko maxsharabayko modified the milestones: v1.5.0, v1.4.3 Oct 22, 2020
@mbakholdina mbakholdina modified the milestones: v1.4.3, v1.5.0 Oct 22, 2020
3 participants