Updated SRT core statistics documentation #1199

Open
wants to merge 8 commits into
base: master

mbakholdina
Collaborator

  • Updated SRT core statistics documentation
  • Created a summary table with SRT statistics

Please do not merge, it requires final improvements.

@mbakholdina mbakholdina self-assigned this Mar 24, 2020
@mbakholdina mbakholdina added [docs] Area: Improvements or additions to documentation Type: Maintenance Work required to maintain or clean up the code labels Mar 24, 2020
@mbakholdina mbakholdina added this to the v1.5.0 milestone Mar 24, 2020
TODO:

- There is no `pktRcvRetransTotal` statistic.
- Which side?
Collaborator

Receiver. These are data packets that arrived with flag R=1.


Sending rate in Mbps. Sender side.

## mbpsRecvRate
TODO: How is it calculated?
Collaborator

Symbolically: `mbpsSendRate = 8 * pktSend / count_us(now - lastSampleTime)`, where `pktSend` is reset to 0 at `lastSampleTime`.
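
The symbolic formula above can be sketched as follows. This is only an illustration, not the library's code: the function and parameter names are hypothetical, and it assumes the counter in the numerator is a byte count accumulated since the last sample, so that bits per microsecond come out directly as Mbps.

```python
def send_rate_mbps(bytes_sent: int, interval_us: int) -> float:
    """Rate over one sampling interval.

    Assumption: bytes_sent counts payload bytes since the last sample
    and interval_us is the elapsed time in microseconds.
    One bit per microsecond equals one megabit per second.
    """
    if interval_us <= 0:
        # No time has elapsed, so no meaningful rate can be reported.
        return 0.0
    return 8.0 * bytes_sent / interval_us
```

For example, 1,250,000 bytes sent over one second (1,000,000 microseconds) gives 10 Mbps.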


The distance in sequence numbers between the two original (not retransmitted) packets,
that were received out of order. Receiver only.
TODO: How is it calculated? Why is it interval-based?
Collaborator

This is the maximum distance in sequence numbers seen so far between an out-of-order packet (a late packet with R=0) and the top receiver sequence. This value is decreased by 1, together with the reorder tolerance, whenever 10 packets in a row have been received in order.

### pktReorderTolerance

TODO:
- Why is it in the interval-based statistics?
Collaborator

Because the reorder tolerance is self-regulating. It starts at 0 and grows whenever `pktReorderDistance` grows, up to a maximum controlled by the `SRTO_LOSSMAXTTL` option.
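
The self-regulation described in the two comments above could be modelled roughly like this. It is a sketch under stated assumptions, not the actual implementation: the class and its members are hypothetical, the tolerance is assumed to simply follow the distance capped at the `SRTO_LOSSMAXTTL` value, and the run length of 10 in-order packets is taken from the comment on `pktReorderDistance`.

```python
class ReorderTracker:
    """Toy model of pktReorderDistance / pktReorderTolerance."""

    def __init__(self, loss_max_ttl: int, in_order_run: int = 10):
        self.loss_max_ttl = loss_max_ttl  # stands in for SRTO_LOSSMAXTTL
        self.in_order_run = in_order_run  # in-order packets needed to decay
        self.distance = 0                 # models pktReorderDistance
        self.tolerance = 0                # models pktReorderTolerance
        self._consecutive = 0

    def on_packet(self, behind_top: int, retransmitted: bool) -> None:
        """behind_top is 0 for an in-order packet, otherwise how many
        sequence numbers the packet is behind the top receiver sequence."""
        if behind_top > 0 and not retransmitted:
            # Original (R=0) packet arrived out of order.
            self._consecutive = 0
            if behind_top > self.distance:
                self.distance = behind_top
                # Assumption: tolerance follows distance up to the cap.
                self.tolerance = min(self.distance, self.loss_max_ttl)
        elif behind_top == 0:
            self._consecutive += 1
            if self._consecutive >= self.in_order_run:
                # A full in-order run lowers both values by 1.
                self._consecutive = 0
                self.distance = max(self.distance - 1, 0)
                self.tolerance = max(self.tolerance - 1, 0)
```

Late retransmitted packets (R=1) are deliberately ignored here, since the comment defines the distance only for original packets.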



## pktRcvAvgBelatedTime
TODO: What's this?
Collaborator

This may need some attention. The intention of this statistic was to calculate the average lateness of uselessly sent packets, that is, packets that arrived although they had already been received (or dropped), in order to get an idea of how badly the sender failed to prevent potentially belated packets from being sent.

In hindsight, I think that in this form the statistic may not help with much, as it tends to have a "negative accumulation" (packets that were retransmitted and accepted do not contribute to this value at all). On the other hand, it might indicate that something is wrong if the value tends to be quite high and exceeds the latency (that is, the packet was sent later than 2* after the latency passed).

## pktRcvBelated
### pktRcvBelated

TODO: Revise this, which side, measured over the interval
Collaborator

Receiver. This is the number of packets whose sequence number was behind the ACK sequence and which were therefore sent uselessly.

Note that several sockets sharing one outgoing port use the same sending queue.
They may have different pacing of the outgoing packets, but all the packets will
be placed in the same sending queue, which may affect the send timing.
- How is this calculated? The minimum time during which period?
Collaborator

@ethouris ethouris Mar 25, 2020

This isn't calculated; it is the control value actually in use, representing the time interval between two consecutively sent packets. That is, when the sending thread is about to send a packet and the time since the last send is less than this value, it sleeps until that time has elapsed.

Note that packets used for bandwidth probing and control packets are sent without respecting this interval. However, if I'm not mistaken, for bandwidth probing packets the time gained by sending early is recorded as spared time (which means a longer sleep before the next packet).

How this value is shaped depends on the congestion control module and the `SRTO_MAXBW` setting.
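
As a minimal sketch of the pacing rule in the first paragraph of this comment (hypothetical names; the spared-time bookkeeping for probing packets is left out because the comment itself is unsure about it):

```python
def wait_before_send_us(now_us: int, last_send_us: int, send_period_us: int) -> int:
    """How long the sending thread should sleep before the next packet.

    send_period_us models the statistic discussed here: the minimum
    interval between two consecutively sent data packets.
    """
    elapsed = now_us - last_send_us
    # Sleep only for the part of the period that has not passed yet.
    return max(send_period_us - elapsed, 0)
```

With a period of 1000 microseconds, a send attempted 500 microseconds after the previous one waits another 500 microseconds, while one attempted 1500 microseconds later goes out immediately.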

The maximum number of packets that can be "in flight". Sender only.
See also [pktFlightSize](#pktFlightSize).
- Rephrase this - The maximum number of packets that can be "in flight" state. - it does not reflect the idea
- revise the whole paragraph
Collaborator

"In flight" refers to packets that, at a certain point in time, have departed the sender but have not yet arrived at the receiver. Precise estimation of this value is impossible without a third, independent probe machine that can retrieve data from both machines at precisely the same time, but the value at least exists in theory. The only moment when it can be calculated close to reality is when the data sender receives an ACK: it is then the distance between the ACK and the top sender sequence, though this still includes the packets that were received while the ACK packet was travelling through the network.

This statistic is the "maximum" allowed flight size: the sender can only control the packets it hasn't sent yet, so if the measured flight size exceeds the maximum allowed flight size, it must stop sending, even though the free space in the receiver buffer hasn't dropped to 0.
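
The gating rule described here might look like the following sketch. The function name is hypothetical, and it assumes the effective limit is the smaller of the flow window and the congestion window, which matches the two inequalities quoted in the documentation text.

```python
def may_send(flight_size: int, flow_window: int, congestion_window: int) -> bool:
    """True if one more packet can be sent without the flight size
    exceeding either window."""
    max_flight = min(flow_window, congestion_window)
    return flight_size < max_flight
```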

`pktFlightSize <= pktFlowWindow` and `pktFlightSize <= pktCongestionWindow`
The number of packets in flight is calculated as the difference between the sequence numbers of the latest acknowledged packet (the latest reported by an ACK message) and the latest sent packet at the moment the statistic is read. Note that `pktFlightSize <= pktFlowWindow` and `pktFlightSize <= pktCongestionWindow`.

**NOTE:** ACKs are received by the SRT sender periodically at least every 10 milliseconds. This statistic is most accurate just after receiving an ACK packet and becomes a little exaggerated over time until the next ACK packet arrives. This is because with a new packet sent, while the ACK number stays the same for a moment, the value of `pktFlightSize` increases. But the exact number of packets arrived since the last ACK report is unknown. A new statistic might be added to only report the distance between the ACK sequence number and the sent packet sequence number at the moment when an ACK arrives. This statistic will not be updated until the next ACK packet arrives. The difference between the suggested statistic and `pktFlightSize` would then reveal the number of packets with an unknown state at that moment.
Collaborator

IMPORTANT: the "new statistic" mentioned here can now be implemented easily, because I had to introduce a "minimum flight window" tracer in order to implement the "window" balancing algorithm for balancing groups.
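
To make the difference between `pktFlightSize` and the suggested ACK-time statistic concrete, here is a small model. Everything in it is illustrative: the class is hypothetical, sequence numbers are assumed to be 31-bit and to wrap around, and the ACK number is assumed to be the first sequence number not yet acknowledged.

```python
SEQ_MOD = 1 << 31  # assumption: 31-bit packet sequence numbers


def seq_distance(from_seq: int, to_seq: int) -> int:
    """Forward distance from from_seq to to_seq, tolerating wraparound."""
    return (to_seq - from_seq) % SEQ_MOD


class FlightTracker:
    def __init__(self, initial_seq: int):
        self.last_acked = initial_seq    # first sequence not yet acknowledged
        self.next_to_send = initial_seq  # sequence of the next new packet
        self.flight_at_last_ack = 0      # models the suggested statistic

    def on_send(self) -> None:
        self.next_to_send = (self.next_to_send + 1) % SEQ_MOD

    def on_ack(self, ack_seq: int) -> None:
        self.last_acked = ack_seq
        # Snapshot taken only when an ACK arrives; frozen until the next one.
        self.flight_at_last_ack = seq_distance(ack_seq, self.next_to_send)

    def flight_size(self) -> int:
        """Models pktFlightSize: grows with every send between ACKs."""
        return seq_distance(self.last_acked, self.next_to_send)
```

The difference `flight_size() - flight_at_last_ack` is then the number of packets sent since the last ACK, whose state is unknown to the sender.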


This value is calculated by the SRT receiver based on the incoming ACKACK control packets (sent back by the SRT sender to acknowledge incoming ACKs).

TODO: peer to agent terminology, with the same journal
Collaborator

Instead of "peer to agent" etc., you could also say something like "from machine A to machine B and vice versa", though I'm not sure that reads better. The journal number is recorded when an ACK message is sent, so that when the corresponding ACKACK message arrives, it is known at what time the ACK it responds to was sent.
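
The journal mechanism described above can be sketched as follows. The class and method names are hypothetical, and any smoothing of the resulting samples is left out; the sketch only shows how an ACKACK is matched to the send time of its ACK.

```python
from typing import Dict, Optional


class AckJournal:
    """Remembers when each ACK was sent, keyed by its journal number."""

    def __init__(self) -> None:
        self._sent_at_us: Dict[int, int] = {}

    def on_ack_sent(self, journal_no: int, now_us: int) -> None:
        self._sent_at_us[journal_no] = now_us

    def on_ackack_received(self, journal_no: int, now_us: int) -> Optional[int]:
        """Return one RTT sample in microseconds, or None if the
        journal number is unknown (for example, already consumed)."""
        sent_at = self._sent_at_us.pop(journal_no, None)
        if sent_at is None:
            return None
        return now_us - sent_at
```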

@@ -314,266 +420,230 @@ either already acknowledged or dropped by TSBPD as too late to be delivered.

Collaborator

@ethouris ethouris Apr 1, 2020

There is a little confusion in this description. The socket keeps two different stateful sequence numbers that are taken into account when evaluating an incoming packet's sequence number:

  • the top reception sequence
  • the ACK sequence

Packets called "belated" are late not (or not only) relative to the top reception sequence, but relative to the ACK sequence. This means they arrive too late to be of any use. Packets that are "late" but not "belated" (behind the top reception sequence, but still ahead of ACK) can still fill holes caused by missing packets; these are usually retransmitted packets. Note that every belated packet means wasted link capacity.
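
The distinction between "late" and "belated" can be expressed as a small classifier. This is a sketch only: the function names and the returned labels are hypothetical, sequence numbers are assumed to be 31-bit with wraparound, and the ACK sequence is assumed to be the first sequence number not yet acknowledged.

```python
SEQ_MOD = 1 << 31   # assumption: 31-bit packet sequence numbers
HALF = SEQ_MOD >> 1


def seq_offset(a: int, b: int) -> int:
    """Signed distance from a to b, tolerating wraparound."""
    d = (b - a) % SEQ_MOD
    return d - SEQ_MOD if d >= HALF else d


def classify_arrival(seq: int, ack_seq: int, top_seq: int) -> str:
    if seq_offset(ack_seq, seq) < 0:
        # Behind the ACK sequence: too late to be of any use.
        return "belated"
    if seq_offset(top_seq, seq) <= 0:
        # At or behind the top reception sequence but not behind ACK:
        # may still fill a hole left by a missing packet.
        return "late"
    # Ahead of everything received so far.
    return "new"
```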


The available space in the SRT receiver buffer, in bytes. Receiver only.

TODO: SRT socket?
Collaborator

The UDP socket isn't covered by the stats in any way, but saying "SRT socket" might still be better for clarity.

@maxsharabayko maxsharabayko modified the milestones: v1.5.0, Parking Lot, Backlog May 4, 2022

3 participants