-
Notifications
You must be signed in to change notification settings - Fork 867
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FR] Added connection counter stats for backup groups #1552
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CBytePerfMon
structure keeps increasing. 🤔
8 new stats entries for group usage only (4 interval-based and 4 accumulated).
We might need something like srt_groupstats(CBytePerfMon*, CGroupPerfMon*, bool clear, bool total)
And maybe a newer CBytePerfMon
, that does not mix interval-based and total. Requires some thinking effort.
|
||
uint32_t countBreak; // increased with every broken link in the group | ||
uint32_t countActivate; // increased with every activated link in the group | ||
uint32_t countEager; // increased with every case when unstable link became stable again |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Recovered or restored might be better. 🤔
uint32_t countEager; // increased with every case when unstable link became stable again | |
uint32_t countRecovered; // increased with every case when unstable link became stable again |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The motivation for this field is to count the cases when a link has been considered unstable, which has forced the group sender to activate the backup link, but then it turned out that the link actually wasn't broken and has returned to stability. This means that the activation was based on a false alert and resulted in only unnecessary overhead. A user who is reviewing these statistics, when seeing any nonzero value here, and moreover widely exceeding the value from countBroken
, it should be an information for him that he definitely has too small value of groupstabtimeo
option.
Hard to fit such an explanation in one comment line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some better names I could propose would be:
- countFalseAlert
- countMissed
- countExcess
uint32_t countBreak; // increased with every broken link in the group | ||
uint32_t countActivate; // increased with every activated link in the group | ||
uint32_t countEager; // increased with every case when unstable link became stable again | ||
uint32_t countSilence; // increased with every case of switching a link back to idle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe better? 🤔
uint32_t countSilence; // increased with every case of switching a link back to idle | |
uint32_t countDeactivated; // increased with every case of switching a link back to idle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks ok, if you prefer. Just note I can't simply commit this suggestion because the name must be changed everywhere, including documentation.
This should be updated after adding: