JS SDK and React Native SDK Reconnection Issues #6333

Closed
mustafaboleken opened this issue May 15, 2024 · 22 comments · May be fixed by ant-media/StreamApp#468

Comments

@mustafaboleken
Contributor

mustafaboleken commented May 15, 2024

  • When reconnection happens, play stream ids are duplicated, remote tracks are damaged and cannot be played, and we cannot publish again.
  • When we stop playing or publishing and try again, the websocket connection closes randomly and the sample app does not work until we restart the application.
@MaZZly

MaZZly commented May 15, 2024

For more info on this problem see the following: #4071 (comment)

@burak-58 burak-58 moved this from Next Sprint to 🔖 Sprint in Ant Media Server May 20, 2024
@mustafaboleken mustafaboleken moved this from 🔖 Sprint to 🏗 In progress in Ant Media Server May 26, 2024
@mustafaboleken mustafaboleken changed the title from "React Native Reconnection Issue" to "JS SDK and React Native SDK Reconnection Issues" May 26, 2024
@mustafaboleken
Contributor Author

Hi @MaZZly

I just finished the implementation and it works for me. Can you please test it and tell me if it works for you or not?

Btw, other than these changes, you need to make a couple of configuration changes:

Go to the management console, edit advanced settings, and set "encodingTimeout": 500, "webRTCClientStartTimeoutMs": 1000

Also, you mentioned that newTrackAvailable is not received after an ICE connection change. To receive the latest tracks, you can call the requestVideoTrackAssignments(streamId) function, and you will get the video_track_assignment_list and audio_track_assignment callbacks. Then you can update your tracks from them.
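
A minimal sketch of that flow, assuming the JS SDK's usual `(info, obj)` callback shape. Only `requestVideoTrackAssignments(streamId)` and the two callback names come from the comment above. The `ice_connection_state_changed` trigger, `obj.state`, the `obj.payload` entries with `videoLabel` and `trackId`, and the `createTrackSync` helper are assumptions made for illustration, so check them against your SDK version.

```javascript
// Sketch only: re-sync track assignments after the ICE connection recovers.
// Assumed (not confirmed in this thread): the "ice_connection_state_changed"
// callback name, obj.state, and the payload shape [{ videoLabel, trackId }, ...].
function createTrackSync(adaptor, mainStreamId) {
  // videoLabel -> trackId, rebuilt whenever the server sends a fresh list
  const trackAssignments = new Map();

  function callback(info, obj) {
    if (info === "ice_connection_state_changed" && obj && obj.state === "connected") {
      // Ask the server which tracks are currently assigned to this client.
      adaptor.requestVideoTrackAssignments(mainStreamId);
    } else if (info === "video_track_assignment_list") {
      trackAssignments.clear();
      for (const item of (obj && obj.payload) || []) {
        trackAssignments.set(item.videoLabel, item.trackId);
      }
    }
  }

  return { callback, trackAssignments };
}

// Stand-in adaptor (hypothetical) so the sketch runs without the SDK.
const fakeAdaptor = {
  requested: [],
  requestVideoTrackAssignments(streamId) {
    this.requested.push(streamId);
  },
};

const sync = createTrackSync(fakeAdaptor, "room1");
sync.callback("ice_connection_state_changed", { state: "connected" });
sync.callback("video_track_assignment_list", {
  payload: [{ videoLabel: "videoTrack0", trackId: "participantA" }],
});
```

In a real app, `sync.callback` would be invoked from the adaptor's own callback, and `audio_track_assignment` could be handled in the same way.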

@mustafaboleken mustafaboleken linked a pull request May 26, 2024 that will close this issue
@MaZZly

MaZZly commented May 27, 2024

@mustafaboleken we will go through it and see if it works as needed for network changes :)

@burak-58 burak-58 moved this from 🏗 In progress to After sprint in Ant Media Server May 27, 2024
@MaZZly

MaZZly commented May 27, 2024

@mustafaboleken it seems to be a bit better, but there are still cases where it stops working and seems to get into a "runaway" scenario where the console floods with triggered events for reconnection_attempt_for_X/unauthorized_access/already_playing.

By the way, we haven't needed or seen any events of the requestVideoTrackAssignments or video_track_assignment_list type yet.

Runaway/flooding problem scenario

  • Have 3 participants join a conference room and publish their own streams as pariticipant_desktop, pariticipant2_desktop and pariticipant3_mobile (from e.g. Android phone)
    • have all these streams visible on all devices also, just like e.g. a normal Teams/Zoom call or similar
  • On the mobile client, toggle back and forth between wifi and cellular a couple of times, making sure to let the streams "recover" before toggling again...
    • Here I highly recommend to connect remote debugging to see what is going on in the browser console of the device.
  • You can also try fully disabling the internet access for a while and re-enabling it.

After doing this for a while (and it seems after WebSocketNotConnected has triggered a couple times) it will start to flood the console with more and more of the following:

  • reconnection_attempt_for_publisher (multiple times for same streamid)
  • reconnection_attempt_for_player (multiple times for same streamid)
  • Cannot send message:
  • WebSocketNotConnected
  • unauthorized_access (hundreds of them for the same streamids)
  • already_playing (hundreds of them for the same streamids)

So it seems something starts spawning multiple calls once the websocket connection has gone into some state (multiple disconnects?) and from there just keeps spawning more and more events, creating a runaway scenario where no streaming works anymore and the device/tab ultimately becomes unusable.
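
One way to picture a guard against this kind of runaway is an application-side helper that allows only one in-flight reconnection per stream id and backs off between attempts. This is a sketch under stated assumptions, not how the SDK handles reconnection internally: `ReconnectGuard`, its method names, and the delay values are all hypothetical.

```javascript
// Sketch only: allow at most one in-flight reconnection per stream id,
// with a capped exponential backoff between attempts.
class ReconnectGuard {
  constructor({ baseDelayMs = 1000, maxDelayMs = 16000 } = {}) {
    this.baseDelayMs = baseDelayMs;
    this.maxDelayMs = maxDelayMs;
    this.inFlight = new Set(); // stream ids currently reconnecting
    this.attempts = new Map(); // stream id -> attempts started since last success
  }

  // Returns the delay (ms) to wait before reconnecting, or null if an
  // attempt for this stream id is already running and must not be duplicated.
  begin(streamId) {
    if (this.inFlight.has(streamId)) return null;
    this.inFlight.add(streamId);
    const previous = this.attempts.get(streamId) || 0;
    this.attempts.set(streamId, previous + 1);
    return Math.min(this.baseDelayMs * 2 ** previous, this.maxDelayMs);
  }

  // Call when the attempt finishes; a success resets the backoff.
  end(streamId, succeeded) {
    this.inFlight.delete(streamId);
    if (succeeded) this.attempts.delete(streamId);
  }
}

const guard = new ReconnectGuard();
const firstDelay = guard.begin("stream1");     // first attempt is allowed
const duplicateDelay = guard.begin("stream1"); // rejected while the first is running
guard.end("stream1", false);
const secondDelay = guard.begin("stream1");    // allowed again, with a longer delay
```

With a guard like this, a burst of reconnection events for the same stream id would result in a single scheduled attempt instead of one attempt per event.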


One more thing, which I'm not 100% sure is related to the ANT JS SDK, but when this runaway scenario happens, it also seems to bring down the mobile connection on my Android 14 phone (OP10Pro).. It doesn't seem to happen while on Wifi. Could maybe be related to runaway scenario trying to open a bunch of new connections and my ISP therefore throttling the connection?

@mustafaboleken
Contributor Author

Hi @MaZZly

I will try the scenario you provided. Btw, can you tell me your ant media server version?

@MaZZly

MaZZly commented May 28, 2024

@mustafaboleken it is Enterprise Edition 2.9.0 20240405_1755

@MaZZly

MaZZly commented May 28, 2024

@mustafaboleken did some more testing with 2 participants, one streaming from mobile, and one watcher from desktop. This is probably the easiest I've been able to reproduce problems so far.

Problem with broadcast after network switch

  • Connect to a room from desktop and mobile(Android on mobile network)
    • Android to publish stream, desktop to watch it.
  • Turn off the mobile network
  • Wait for the error WebSocket connection to '...' failed: ...
  • Turn the mobile network back on.
  • wait for publish_started event to trigger
    • Broadcast is visible in AMS ui
    • The watcher is not notified about new stream in room... Doing webRTCAdaptor.getRoomInfo( triggers callback no_active_streams_in_room)

We have logic for killing the broadcast if streamIdInUse triggers, but I also tried without and the same thing happens..

I also noticed that with the 1000ms setting, we are pretty often triggering publishTimeoutError on the mobile client... Is that really needed/better than the 3000 default setting?
Note that the 1000ms timeout might be "okay" in your test setup, and not triggering the problem described above...

Anyway, there seems to be a timing issue somewhere: even though the server sees the newly started broadcast, it is not passed on to the other participants in the room. Is there somewhere in the AMS UI where I can see if the broadcast is tied to a room?

@mustafaboleken
Contributor Author

Hi @MaZZly

When I think about these two things

By the way, we haven't needed or seen any events of the requestVideoTrackAssignments or video_track_assignment_list type yet.

and

Doing webRTCAdaptor.getRoomInfo( triggers callback no_active_streams_in_room

I realize that you are using a stream-based conference solution. I was doing my tests with the multitrack conference samples. Probably that's why I didn't face them.

Can you tell me which one you are using?

@MaZZly

MaZZly commented May 28, 2024

TBH I don't know (AMS documentation is quite scattered...), but from a quick Google it seems we should've used sdpSemantics to enable multitrack, which we haven't.
So I'm going to say/guess we are just using stream-based solution.

@MaZZly

MaZZly commented May 28, 2024

@mustafaboleken webRTCAdaptor.playStreamId still also has the problem of starting to contain duplicates (multiple ones) after doing network switches...

Edit: It even creates multiple double entries of same stream during one network switch... And also the "runaway"-problem described above seems to create very many instances of the same streamid... This could point to where the problem there is (if there are multiple ones "spawned" on every retry)
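
To make the duplication easier to report, a small diagnostic along these lines could be run after each network switch. It is only a sketch: `findDuplicateStreamIds` is a hypothetical helper, and it assumes `webRTCAdaptor.playStreamId` is an array of stream id strings, as the description above suggests.

```javascript
// Sketch only: list stream ids that occur more than once, with their counts.
function findDuplicateStreamIds(streamIds) {
  const counts = new Map();
  for (const id of streamIds) {
    counts.set(id, (counts.get(id) || 0) + 1);
  }
  return [...counts]
    .filter(([, count]) => count > 1)
    .map(([id, count]) => ({ id, count }));
}

// Example data standing in for webRTCAdaptor.playStreamId.
const samplePlayStreamIds = ["streamA", "streamB", "streamA", "streamA"];
const duplicates = findDuplicateStreamIds(samplePlayStreamIds);
console.log(duplicates);
```

Calling it as `findDuplicateStreamIds(webRTCAdaptor.playStreamId)` after every reconnection would show whether the list grows by one entry per retry.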

@mustafaboleken
Contributor Author

Maybe you already saw this, but stream-based conferences are deprecated and will most likely be removed in the next version. Do you want to migrate to the multitrack conference? If you prefer that, I can help you migrate quickly, and the issues you are reporting will be solved. Let me know how you want to proceed.

@MaZZly

MaZZly commented May 29, 2024

Well if the other one is deprecated I guess we should migrate... How do we proceed to get this done ASAP ?

@MaZZly

MaZZly commented May 29, 2024

@mustafaboleken also, is the track-based conference room supported by other SDKs than JS? 🤔 If not, when/are they planned?

We're mostly interested in the Flutter SDK timeline

@mustafaboleken
Contributor Author

Yes, the multitrack approach is supported by all of our SDKs. I can prepare a migration guide and send it to you, or if you would rather have a meeting, you can contact me at my mail address so we can go through it together. Other than my planned meetings, my first priority is solving your problem. Just let me know your decision. @MaZZly

@MaZZly

MaZZly commented May 29, 2024

@mustafaboleken let's start with a guide, and if something is unclear we can do a meeting to go through the unclear details :)

@mustafaboleken
Contributor Author

Sure, I will prepare and send it to you in 2 hours. :)

@mustafaboleken
Contributor Author

Key Updates and Changes

  1. Room Structure Removal:

    • The room structure has been completely removed. Instead, there is now one main track broadcast that acts as the room, with participants added as sub-track broadcasts into the main broadcast.
  2. Simplified Participant Management:

    • There is no need to store participants in lists like roomOfStream or streamsList as in the previous sample. Instead, you can check the subTracks of the main broadcasts. More details here.
  3. Storing Status in Metadata:

    • The microphone, camera, and screen share status can now be stored inside the metadata field. More details here.
  4. Function Simplification:

    • The joinRoom, leave, and getRoomInfo functions are no longer needed. All you need to do is publish to the main broadcast and play it. More details here.
  5. New Callbacks:

  6. New Websocket Message:

    • A new websocket message called getBroadcastObject has been introduced to retrieve the Broadcast structure as JSON. This broadcast can be either a Maintrack or a Subtrack. Obtain a Broadcast object for a main track by calling getBroadcastObject with the room ID, and for a subtrack, use its ID.
  7. Event-Based Approach:

    • The getRoomInfo method is no longer used. Instead, an event-based approach has been introduced. Applications will now be notified by the TRACK_LIST_UPDATED Data Channel message. Upon receiving this notification, applications can call getBroadcastObject for the Maintrack, check the subtracks, and obtain participant names by calling getBroadcastObject for the participant ID.
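
The event-based flow in items 6 and 7 can be sketched roughly as follows. `getBroadcastObject` and `TRACK_LIST_UPDATED` come from the guide above; the `data_received` and `broadcastObject` callback names, `obj.data` and `obj.broadcast` being JSON strings, and the `eventType`, `subTrackStreamIds`, and `name` fields are assumptions that should be verified against the SDK and sample you are using. `createParticipantTracker` is a hypothetical helper.

```javascript
// Sketch only: event-based participant tracking for a multitrack conference.
// Assumed (not confirmed in this thread): callback names "data_received" and
// "broadcastObject", JSON-string payloads, and the field names used below.
function createParticipantTracker(adaptor, roomId) {
  const participants = new Map(); // sub-track stream id -> display name

  function callback(info, obj) {
    if (info === "data_received") {
      let message;
      try {
        message = JSON.parse(obj.data);
      } catch (e) {
        return; // not a JSON notification, ignore it
      }
      if (message && message.eventType === "TRACK_LIST_UPDATED") {
        adaptor.getBroadcastObject(roomId); // refresh the main track
      }
    } else if (info === "broadcastObject") {
      const broadcast = JSON.parse(obj.broadcast);
      if (obj.streamId === roomId) {
        const current = new Set(broadcast.subTrackStreamIds || []);
        // Drop participants that left, then look up the ones not known yet.
        for (const id of [...participants.keys()]) {
          if (!current.has(id)) participants.delete(id);
        }
        for (const id of current) {
          if (!participants.has(id)) adaptor.getBroadcastObject(id);
        }
      } else {
        participants.set(obj.streamId, broadcast.name || obj.streamId);
      }
    }
  }

  return { callback, participants };
}

// Stand-in adaptor (hypothetical) that records which objects were requested.
const stubAdaptor = {
  calls: [],
  getBroadcastObject(id) {
    this.calls.push(id);
  },
};

const tracker = createParticipantTracker(stubAdaptor, "room1");
tracker.callback("data_received", {
  data: JSON.stringify({ eventType: "TRACK_LIST_UPDATED" }),
});
tracker.callback("broadcastObject", {
  streamId: "room1",
  broadcast: JSON.stringify({ subTrackStreamIds: ["alice"] }),
});
tracker.callback("broadcastObject", {
  streamId: "alice",
  broadcast: JSON.stringify({ name: "Alice" }),
});
```

Keeping the participant map derived from the main track's sub-tracks replaces the old `roomOfStream` / `streamsList` bookkeeping mentioned in item 2.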

@mustafaboleken
Contributor Author

mustafaboleken commented May 29, 2024

I also have a PR that shows how to implement the reconnection mechanism in the sample; you can take a look at ant-media/StreamApp#449 @MaZZly

@MaZZly

MaZZly commented May 30, 2024

@mustafaboleken I sent you a mail about setting up a chat :)

@mustafaboleken
Contributor Author

You can check the inbox. @MaZZly

@burak-58 burak-58 moved this from After sprint to 🏗 In progress in Ant Media Server Jun 3, 2024
@Mohit-3196
Contributor

Hi @MaZZly,

TBH I don't know (AMS documentation is quite scattered...), but from a quick Google it seems we should've used sdpSemantics to enable multitrack, which we haven't. So I'm going to say/guess we are just using stream-based solution.

I'm sorry you had to face problems with the documentation and couldn't find the right instructions while you were implementing the solution.
We are continuously trying to improve the documents and make them more useful and user-friendly.
It is already mentioned in the multitrack document that the sdpSemantics should be set to Unified Plan which is also set by default in AMS v2.4.3 and above.

It seems we are missing some pieces in the documents, and it would be great if you could guide me so that they can be improved further.
May I know which specific document you looked at and what issues you faced with the documentation, so that we can improve it?

Thank you,
Mohit

@burak-58 burak-58 moved this from 🏗 In progress to After sprint in Ant Media Server Jul 15, 2024
@mustafaboleken mustafaboleken moved this from After sprint to 🏗 In progress in Ant Media Server Jul 23, 2024
@burak-58 burak-58 moved this from 🏗 In progress to After sprint in Ant Media Server Aug 5, 2024
@mekya mekya assigned mekya and unassigned mustafaboleken Aug 12, 2024
@burak-58
Contributor

burak-58 commented Sep 2, 2024

@burak-58 burak-58 closed this as completed Sep 2, 2024
@github-project-automation github-project-automation bot moved this from After sprint to ✅ Done in Ant Media Server Sep 2, 2024
Projects
Status: Done
6 participants