Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement audio video synchronization #4325

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

Implement audio video synchronization #4325

wants to merge 5 commits into from

Conversation

nanangizz
Copy link
Member

How it works in general:

  1. App creates AV sync using pjmedia_av_sync_create().
  2. App adds all media to be synchronized using pjmedia_av_sync_add_media().
  3. Each time a media receives an RTCP-SR, update AV sync usingpjmedia_av_sync_update_ref().
  4. Each time a media returns a frame to be rendered, e.g: via port.get_frame(), update AV sync using pjmedia_av_sync_update_pts(), the function will return a number:
    • zero : nothing to do,
    • positive: increase delay as many as the returned number,
    • negative: decrease delay as many as the returned number.

Logic in pjmedia_av_sync_update_pts():

  • Calculate the absolute/NTP timestamp of the frame playback based on the reference NTP and the supplied frame RTP imestamp. This timestamp is usually called presentation time (pts).
  • If pts is the largest or most recent, set it as AV sync's max_ntp. Otherwise, calculate the difference/delay of this pts to the max_ntp, then request the media to speed up as much as the delay.
  • If after some requests the delay is still relatively high, request the fastest media to slow down instead.

Changes needed in stream:

  1. Update reference time (NTP & RTP timestamps) whenever receive RTCP-SR.
  2. Update presentation time whenever returns a frame (to play/render).
  3. Add mechanism to calculate current delay & optimal delay from jitter buffer size.
  4. When AV sync request to speed up, as long as the end delay is not lower than the optimal delay, do it.
  5. When AV sync request to slow down, do it, i.e: increase and maintain the jitter buffer size.

@sauwming
Copy link
Member

An early comment since it's still a draft.
The term 'av' seems to imply the sync will only apply to audio and video only, even though it looks like the description can cover general cases.
Especially with the upcoming text media, which may also need to be sync-ed (although the sync may not be available in the early version, but perhaps in the future).

Perhaps removing the av and call it pjmedia_sync should be sufficient?

@nanangizz
Copy link
Member Author

An early comment since it's still a draft.

Sure, actually early feedbacks is one of the goal of creating this draft.

The term 'av' seems to imply the sync will only apply to audio and video only, even though it looks like the description can cover general cases. Especially with the upcoming text media, which may also need to be sync-ed (although the sync may not be available in the early version, but perhaps in the future).

Perhaps removing the av and call it pjmedia_sync should be sufficient?

Yes, the sync module is generic for any media, especially delivered via RTP/RTCP and using a shared/synchronized NTP source.

IMO the 'av_sync' prefix will enhance intuitiveness, i.e: it sounds related to presentation time synchronization. While 'sync' prefix may be somewhat ambiguous, e.g: what to sync?. I recall how 'ssl_sock' term was chosen over tls_sock for catchy/intuitiveness reason, when it was being developed, TLS was already around replacing SSL (even PJSIP transport was already using 'TLS', see also #957). What do you think?

@sauwming
Copy link
Member

The term should be inter-media synchronization, but yes, it's not quite as catchy, so I suppose it's okay if the 'av' stays.

The jitter buffer is designed for audio, while in video it is just a normal/dummy buffer. Hence managing delay in audio stream is much easier than in video stream. As normally video presentation is later than audio, it should be fine to adjust the delay in audio stream only for the synchronization.
@@ -330,11 +326,15 @@ PJ_DEF(pj_int32_t) pjmedia_av_sync_update_pts(
pj_sub_timestamp(&ntp_diff, &media->last_ntp);
ms_diff = ntp_to_ms(&ntp_diff);

/* Smoothen and round down the delay */
ms_diff = ((ms_diff + 19 * media->smooth_diff) / 200) * 10;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should an upper bound considered? Where does the magic numbers come from?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just added the upper bound 60s in the last commit, actually not really sure if it is needed.

The magic numbers are weighting for the smoothing, i.e: last delay uses weight of 19, newly received delay uses weight of 1, then divide the sum by 20 for the new delay. Instead of by 20, it is by 200 then multiplied to 10 for rounding down to nearest 10.

Thanks for the feedback.

@nanangizz
Copy link
Member Author

Ready to review perhaps :)

Currently the delay adjustment is implemented in audio stream only, because:

  • video playback is kind of play as soon as possible (with built-in minimum delay for stability), the delay/burst/optimal-delay calculations in jitter buffer do not really apply (due to multiple RTP packets per video frame)
  • it is easier to manage delay in audio as the jitter buffer already have info about delay & optimal delay
  • should be sufficient for synchronization.

@nanangizz nanangizz marked this pull request as ready for review March 5, 2025 08:13
@andreas-wehrmann
Copy link
Contributor

Especially with the upcoming text media, which may also need to be sync-ed (although the sync may not be available in the early version, but perhaps in the future).

Sorry, quick question about that: Are there any specific plans to implement this (more specifically: RTT with T.140)?
I'm asking because I will need this later (probably this year) and was thinking about implementing it myself and maybe bring this upstream, unless there are any efforts underway already which I could support?

@sauwming
Copy link
Member

sauwming commented Mar 5, 2025

Yes, I'm currently implementing it. The basic functionality is already working (can send and receive text), but there's still a lot more to do, such as redundancy, docs, etc.

@andreas-wehrmann
Copy link
Contributor

Yes, I'm currently implementing it.

If this work available publicly? From a quick glance I couldn't find a proper branch.
I'd like to take a look at it if that's alright with you (even if it's in its very early stages).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants