
Observation Algorithms

yanj-github edited this page Apr 26, 2024 · 6 revisions

Video Observation Algorithms

This section defines the observation algorithm for each video observation. The pyzbar library is used to read embedded QR codes. The recording shall be made at a frame rate of at least twice that of the displayed content so that most of the QR codes are captured.

  • Detected Mezzanine QR codes: [QRa, QRb, QRc .... QRn]
  • Sample duration in ms of recording: Dr = 1000ms / camera_frame_rate
  • Delay in QR creation on Test Runner (extracted from T.R. QR code): d
  • Maximum permitted startup delay (CTA-2003: 120ms): TSMax
  • A fixed adjustment equal to half of a recording (camera) frame: CAMERA_FRAME_ADJUSTMENT = 0.5
  • tolerance = 20ms

Every frame shall be rendered and the samples shall be rendered in increasing presentation time order

  1. Check that the first frame is rendered.

      QRa.mezzanine_frame_num == 1

     For random access to a frame:

      QRa.mezzanine_frame_num == random_access_frame_num

     For random access to a time:

      QRa.mezzanine_frame_num == rounddown(random_access_time * mezzanine_frame_rate)
    
  2. Check that the last frame is rendered. Half a frame duration is added before rounding down so that the correct frame number is calculated regardless of how cmaf_track_duration was rounded.

     QRn.mezzanine_frame_num == rounddown((cmaf_track_duration + half_frame_duration) * mezzanine_frame_rate)
    
  3. Check that the frames are rendered in increasing order.

     for QRb to QRn:
         QR[i-1].mezzanine_frame_num + 1 == QR[i].mezzanine_frame_num
    
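The three checks above can be sketched as runnable Python. This is a minimal illustration; the function and parameter names are invented here and are not part of the Observation Framework API:

```python
import math

def check_video_frames(frame_nums, mezzanine_frame_rate, cmaf_track_duration_s):
    """Illustrative sketch of the three video checks:
    first frame, last frame, and increasing frame order."""
    # Half a frame duration is added before rounding down (see check 2).
    half_frame_duration = 0.5 / mezzanine_frame_rate
    expected_last = math.floor(
        (cmaf_track_duration_s + half_frame_duration) * mezzanine_frame_rate
    )
    first_ok = frame_nums[0] == 1
    last_ok = frame_nums[-1] == expected_last
    # Every frame number must be exactly one greater than its predecessor.
    order_ok = all(
        frame_nums[i - 1] + 1 == frame_nums[i] for i in range(1, len(frame_nums))
    )
    return first_ok and last_ok and order_ok
```

For example, a 2-second track at 30 fps is expected to end on mezzanine frame 60, so a complete detection list `1..60` passes all three checks.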

Switching and Splicing test

  1. Check the first sample is rendered.
  2. Check the last sample is rendered.
  3. Based on the playout parameter, work out the switching and splicing points.
  4. Based on step 3, check the ending frame and starting frame at each switching and splicing point.
  5. Check that the samples are rendered in increasing order within the same switching and splicing block.

Playback with gaps test

e.g: Low-Latency: Playback over Gaps

  1. Check the first sample is rendered.
  2. Check the last sample is rendered.
  3. Based on the parameters, work out the start and end points of the gap.
  4. Based on step 3, check the ending frame and starting frame at the gap. A tolerance is applied before the start of the gap; however, no tolerance is allowed after the end of the gap.
  5. Check that the samples are rendered in increasing order from the start of playback until the gap, and from after the gap until the end of playback.

Truncated Playback and Restart

  1. Check the first sample is rendered.
  2. Check the last sample is rendered.
  3. Based on the playout and second_playout_switching_time parameters, work out the switching point.
  4. Based on step 3, check the ending frame and starting frame at the switching point.
  5. Check that the samples are rendered in increasing order within the same switching block.
  6. Based on step 3, check that the ending frame falls between second_playout_switching_time and the end of playout, and that frames are rendered only once.
  7. Based on the second_playout parameter, work out the switching point.
  8. Based on step 7, check the ending frame and starting frame at the switching point.
  9. Check that the samples are rendered in increasing order within the same switching block.

The playback duration of the playback matches expected duration

Some devices display frame 1 before and after the play event, so measurement starts from the second detected frame. Some devices also hold on to the last frame until the next test is loaded, so the actual playback duration is measured up to the first detection time of the last detected frame. The first frame duration and the last frame duration are then added to account for the whole duration. The expected track duration also takes account of frames missing at the start and end.

    expected_track_duration = cmaf_track_duration - start_missing_frame_duration - ending_missing_frame_duration
    actual_playback_duration = (QRn.first_appear_camera_frame_num - QRb.first_appear_camera_frame_num) * Dr + first_frame_duration + last_frame_duration
    actual_playback_duration == expected_track_duration +/- tolerance
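The duration check above can be sketched as follows. This is an illustrative sketch; the function and parameter names are invented here, and camera frame numbers refer to the first appearance of the second and last detected frames, as described above:

```python
def playback_duration_ok(first_cam_frame, last_cam_frame, camera_frame_rate,
                         first_frame_duration_ms, last_frame_duration_ms,
                         expected_track_duration_ms, tolerance_ms=20):
    """Illustrative sketch of the playback duration check: measure from the
    second detected frame's first appearance to the last detected frame's
    first appearance, then add the first and last frame durations back in."""
    dr_ms = 1000 / camera_frame_rate  # camera sample duration Dr
    actual = ((last_cam_frame - first_cam_frame) * dr_ms
              + first_frame_duration_ms + last_frame_duration_ms)
    return abs(actual - expected_track_duration_ms) <= tolerance_ms
```

With a 50 fps camera (Dr = 20 ms), 95 camera frames between the measurement points and 40 ms of frame duration added at each end, the actual duration is 1980 ms, which passes against a 2000 ms expectation with the default 20 ms tolerance.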

Larger tolerances should be applied for the tests that specify them.


NOTE Devices holding on to the last frame is a mirror image of the issue with the first frame. However, the Observation Framework cannot handle it in the same way as it does the first frame. The actual position in time of the finished() event cannot be calculated, because it is the last event: there is no following event from the Test Runner, so the QR code generation delay cannot be obtained for the finished() event.


Test with waiting in playback

e.g: Buffer Underrun and Recovery. The Test Runner signals "waiting" status when waiting occurs. The playback waiting duration is calculated from the first detection of the "waiting" status (waiting_start_time) until the status changes back to "playing" (playing_start_time). However, "waiting" status before playback starts should be ignored. The total waiting duration should take into account the testing limitation that status QR code detection might be delayed by up to 1000/camera_frame_rate. The expected_track_duration is then adjusted by adding the detected waiting duration.

    min_gap_duration += playing_start_time - waiting_start_time - 1000/camera_frame_rate
    max_gap_duration += playing_start_time - waiting_start_time + 1000/camera_frame_rate
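The adjustment above can be sketched as a small helper. This is a minimal sketch with invented names; it returns the lower and upper bounds of the detected waiting duration, widened by one camera frame of detection uncertainty:

```python
def detected_waiting_bounds(waiting_start_ms, playing_start_ms, camera_frame_rate):
    """Illustrative sketch: detected waiting duration with +/- one camera
    frame of status QR code detection uncertainty, used to widen the
    expected gap duration range."""
    detection_uncertainty_ms = 1000 / camera_frame_rate
    waiting_ms = playing_start_ms - waiting_start_ms
    return (waiting_ms - detection_uncertainty_ms,
            waiting_ms + detection_uncertainty_ms)
```

For a 50 fps camera, a waiting period detected from 1000 ms to 1500 ms yields bounds of (480, 520) ms to add to min_gap_duration and max_gap_duration respectively.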

Truncated Playback and Restart

Duration checks for each presentation should be observed separately. The start frame adjustment is required for the first presentation, while the ending frame adjustment is required for the second presentation. The playback duration of the first presentation should be more than second_playout_switching_time.

The start-up delay should be sufficiently low, i.e., TR [k, 1] – Ti < TSMax

start_up_delay is calculated from the camera frame on which the first frame first appears after play(). If the device displays frame 1 before play(), the start-up delay is measured to frame 2; when some frames are missing at the beginning, the start-up delay is measured to the first detected frame.

    start_up_delay = (QRa.first_appear_camera_frame_num_after_play * Dr) - ((play_event.camera_frame_num * Dr) - d)
    start_up_delay < TSMax
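The formula above can be sketched as follows. This is an illustrative sketch with invented function and parameter names; the play() time is corrected by the Test Runner's QR code generation delay d:

```python
def start_up_delay_ok(first_frame_cam_num, play_event_cam_num,
                      camera_frame_rate, qr_generation_delay_ms,
                      ts_max_ms=120):
    """Illustrative sketch of the start-up delay check (CTA-2003: TSMax = 120 ms)."""
    dr_ms = 1000 / camera_frame_rate  # camera sample duration Dr
    # When the play() QR code was actually generated, corrected by delay d.
    play_time_ms = play_event_cam_num * dr_ms - qr_generation_delay_ms
    delay_ms = first_frame_cam_num * dr_ms - play_time_ms
    return delay_ms < ts_max_ms
```

With a 50 fps camera and a 50 ms QR generation delay, a first frame appearing 3 camera frames after the play event corresponds to a 110 ms start-up delay, which passes; one camera frame later it would be 130 ms, which fails.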

The presented sample matches the one reported by the currentTime value within the tolerance of the sample duration

The currentTime values are detected from Test Runner status QR codes. The status report interval is much larger than the sample duration; therefore, measurements are made on every distinct status report instead of on each presented sample. It is not possible to detect the exact point at which playback starts in relation to the reported currentTime (e.g. where ct=0.0). The currentTime check is therefore ignored at the beginning, where ct=0.0, or correspondingly for random access tests, e.g. where ct=random_access_time. When no matching presented sample is detected for a currentTime report, that check is ignored.

    sample_tolerance_in_recording = ct_frame_tolerance * 1000/mezzanine_frame_rate/(1000/camera_frame_rate) = camera_frame_rate/mezzanine_frame_rate
    sample_tolerance = ct_frame_tolerance * 1000/mezzanine_frame_rate
    
    target_camera_frame_num_of_ct_event = ct_event.first_seen_camera_frame_num - (ct_event.d / Dr)
    first_possible_camera_frame_num_of_target = target_camera_frame_num_of_ct_event - CAMERA_FRAME_ADJUSTMENT - sample_tolerance_in_recording
    last_possible_camera_frame_num_of_target = target_camera_frame_num_of_ct_event + CAMERA_FRAME_ADJUSTMENT  + sample_tolerance_in_recording
    
    for first_possible_camera_frame_num_of_target to last_possible_camera_frame_num_of_target
            foreach mezzanine_qr_code on camera_frame that is within the range
                    if mezzanine_qr_code.media_time == (ct_event.current_time +/- (sample_tolerance + tolerance))
                            test is PASSED
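The pseudocode above can be sketched as runnable Python. This is an illustrative sketch with invented names; `detected_frames` maps a camera frame number to the media times (in ms) of the mezzanine QR codes read on that frame:

```python
def current_time_matches(ct_event_cam_num, ct_event_delay_ms, current_time_ms,
                         detected_frames, camera_frame_rate,
                         mezzanine_frame_rate,
                         ct_frame_tolerance=1, tolerance_ms=20):
    """Illustrative sketch of the currentTime check: look for a mezzanine
    QR code whose media time matches the reported currentTime, within a
    camera-frame window around the point where the status QR code was generated."""
    dr_ms = 1000 / camera_frame_rate
    sample_tolerance_ms = ct_frame_tolerance * 1000 / mezzanine_frame_rate
    sample_tolerance_frames = sample_tolerance_ms / dr_ms
    # Correct the status QR code's camera frame by the QR generation delay d.
    target = ct_event_cam_num - ct_event_delay_ms / dr_ms
    first = target - 0.5 - sample_tolerance_frames  # CAMERA_FRAME_ADJUSTMENT = 0.5
    last = target + 0.5 + sample_tolerance_frames
    for cam_frame, media_times in detected_frames.items():
        if first <= cam_frame <= last:
            for media_time_ms in media_times:
                if abs(media_time_ms - current_time_ms) <= sample_tolerance_ms + tolerance_ms:
                    return True
    return False
```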

For the splicing test the actual media time is calculated by adding previous period duration

    previous_period is calculated based on playout parameter from the TR.
    mezzanine_qr_code.media_time[i] = mezzanine_qr_code.media_time[i] + previous_period

For the Low-Latency (1): Initialization test

The current time checks for "playing", "play", and "current time = 0.0" are ignored. This is because "playing" will change to "waiting", and the first frame will not be rendered until the first CMAF fragment has been successfully appended.

Measure the time between the successful appending of the first CMAF chunk that exceeded min_buffer_duration and the first media sample being visible or audible. This value shall be compared against render_threshold.

This observation is similar to the start-up delay; however, it is measured from the appending of the first CMAF chunk. The Test Runner signals an "appended" event on the successful appending of the first CMAF chunk.

Audio Observation Algorithms

This section defines the observation algorithm for each audio observation. The audio recording is made jointly with video via a 3.5mm audio jack (or similar) from the device under test. To make audio observations, ffmpeg is used to extract the audio waveform and save it to a .wav file. The pyaudio library is used to read the audio wave data from the file. The audio data is then trimmed to remove leading and trailing audio, based on the detected positions of the starting and ending audio segments in the recording.

The audio mezzanine is cut into small segments of 20ms. Cross-correlation is used to compare the mezzanine with the recording and obtain an offset timing for each segment from the recording. To speed up the calculation, only the expected neighbourhood of each segment is checked: a 500ms sample is taken from the recording file at the expected position, instead of searching the whole recording file for matches.
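The cross-correlation step can be sketched with NumPy. This is a minimal sketch with invented names; `recording` here stands for the 500 ms neighbourhood taken from the recording, and the returned value is the offset of the best match within it:

```python
import numpy as np

def segment_offset_ms(recording, mezzanine_segment, sample_rate):
    """Illustrative sketch: locate a 20 ms mezzanine audio segment inside a
    short neighbourhood of the recording via cross-correlation and return
    the offset of the best match in milliseconds."""
    # Slide the segment across the neighbourhood; the peak marks the match.
    corr = np.correlate(recording, mezzanine_segment, mode="valid")
    best_index = int(np.argmax(corr))
    return best_index * 1000 / sample_rate
```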

  • Detected audio segments timings in recording: [ASa, ASb, ASc .... ASn] where ASa=0ms
  • Audio media time: [ASa.media_time = 0ms, ASb.media_time = 20ms, ASc.media_time = 40ms .... ASn.media_time = 20ms*n]
  • Audio segment length: audio_sample_length = 20ms
  • Maximum permitted startup delay (CTA-2003: 120ms): TSMax
  • A FIXED arbitrary value equivalent to calculate a half of the recording frame: CAMERA_FRAME_ADJUSTMENT = 0.5
  • tolerance = 20ms
  • Audio-Video Synchronization tolerance: av_sync_tolerance = +40ms, -120ms

Every sample shall be rendered and the samples shall be rendered in increasing presentation time order

For each audio segment, compare its detected timing in the recording with its media time. If the timings match, report a pass.

    for ASa to ASn:
            abs(AS[i] - AS[i].media_time) <= audio_sample_length

Sometimes samples are rendered in the right order but slightly delayed. If the timings do not match, check that the current segment is in line with its two adjacent segments.
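The per-segment check, including the fallback comparison against adjacent segments, can be sketched as follows (an illustrative sketch; names are invented, and the adjacency rule shown here is one plausible reading of "in line with two adjacent segments"):

```python
def audio_segments_ok(detected_times, media_times, segment_len_ms=20):
    """Illustrative sketch: each detected segment's recording timing should
    match its media time within one segment length; a slightly delayed
    segment still passes if it stays in line with its two neighbours."""
    results = []
    for i, (detected, media) in enumerate(zip(detected_times, media_times)):
        ok = abs(detected - media) <= segment_len_ms
        if not ok and 0 < i < len(detected_times) - 1:
            # Delayed, but consistent with both adjacent segments.
            ok = (abs(detected_times[i - 1] + segment_len_ms - detected) <= segment_len_ms
                  and abs(detected_times[i + 1] - segment_len_ms - detected) <= segment_len_ms)
        results.append(ok)
    return results
```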

Random Access Tests

The expected mezzanine audio signal is calculated from the random access point until the end.

Splicing tests

The expected mezzanine audio signal is calculated based on the playout parameter.

The playback duration of the playback matches expected duration

Calculate the detected audio duration. Measurement starts from the beginning of the first detected audio segment to the finishing time of the last detected audio segment. The duration check should consider durations missing at either end of playback.

    expected_track_duration = cmaf_track_duration - start_missing_duration - ending_missing_duration
    actual_playback_duration = ASn - ASa + audio_sample_length
    actual_playback_duration == expected_track_duration +/- tolerance

Random Access Tests

The expected duration is calculated from the random access point until the end.

Splicing tests

The expected duration is calculated from the playout parameter.

The start-up delay should be sufficiently low, i.e., TR [k, 1] – Ti < TSMax

start_up_delay is calculated from the first detected audio segment after play().

    start_up_delay = ASa - ((play_event.camera_frame_num * Dr) - d)
    start_up_delay < TSMax

NOTE The detected audio segment timings in the recording are checked in "Every sample shall be rendered, and the samples shall be rendered in increasing presentation time order", and any failing segments are removed. When some audio segments are missing at the beginning, "ASa" is the earliest detected audio segment.


Combined Audio and Video Observation Algorithms

This section defines observation algorithms for combined audio and video observation.

Every sample for every media type included in the CMAF Presentation duration shall be rendered and shall be rendered in order

Video and audio observations are separately made, and results are given separately.

The playback duration of the playback matches expected duration

Video and audio observations are separately made, and results are given separately.

The start-up delay should be sufficiently low, i.e., TR [k, 1] – Ti < TSMax

Video and audio observations are separately made, and results are given separately.

The presented sample matches the one reported by the currentTime value within the tolerance of the sample duration

This is not applicable to audio. Only video observation is made.

The presentation starts with the earliest video sample and the audio sample that corresponds to the same presentation time

Check that the earliest video sample presentation time matches the HTML starting presentation time, and that the earliest audio sample presentation time matches the earliest video sample presentation time.

Audio-Video Synchronization: The mediaTime of the presented audio sample matches the one reported by the video currentTime value within the tolerance of +40 ms / -120 ms

  1. Calculate the video offset. Calculate the mean detection time based on the first and last QR code detection times. The first frame might be rendered before play, so take its last detection time; the last frame might be rendered after playback stops, so take its first detection time. Calculate the offset between detection time and media time for each detected frame. Clean up the offsets to smooth out the video detection limitations.

  2. Calculate the audio offset. Calculate the offset between detection time and media time for each audio segment.

  3. Check A/V sync within the tolerance. Find matching audio and video samples based on the same media time; the difference should be less than an audio sample length. If it is longer, no match is found: ignore the audio samples that fail to find a match, as sync is not measurable for them.
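The final tolerance check can be sketched as follows. This is an illustrative sketch with invented names, assuming the sign convention that a positive difference means audio leads video; audio may lead by up to 40 ms and lag by up to 120 ms:

```python
def av_sync_ok(audio_offset_ms, video_offset_ms,
               lead_tolerance_ms=40, lag_tolerance_ms=120):
    """Illustrative sketch of the A/V sync check for one matched media time
    (av_sync_tolerance = +40 ms / -120 ms)."""
    diff_ms = audio_offset_ms - video_offset_ms
    return -lag_tolerance_ms <= diff_ms <= lead_tolerance_ms
```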
