-
Notifications
You must be signed in to change notification settings - Fork 316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bug]: TURN_STATE_GET_CREDENTIALS fails after update from 1.7.2. to 1.10.2 of the SDK #2004
Comments
@skndi , Can you provide the complete logs? Has anything changed in you set up? |
Hello @disa6302, of course, here are the logs |
@skndi , Thank you for the logs. Although the get-credentials (reason unknown at the moment), want to confirm if you see a connection establishment failure? There are 2 state machines that run in the TURN path, I see logs only for one of them and the other one seems incomplete. Do you more logs than this? Want to check if the ICE agent state machine fails too and what happens to the other TURN state machine loop. |
Hi @disa6302, The file is opened truncated by default from the gist link and shows only 7000 lines from ~50000, I think that's why you don't see the second TURN state machine logs. There is a link in the gist to the raw file - https://gist.githubusercontent.com/skndi/add4220e87540bf719601cd651c1d818/raw/4d03564900e68ed693bdb0a5d324aa95b1b96616/gistfile1.txt, can you check if it contains all logs you need? From my check I see that both state machines have entered the FAILED state, but they report different errors for it - 0x5a00002c, which is STATUS_TURN_CONNECTION_GET_CREDENTIALS_FAILED and the other one 0x58000022 which is STATUS_SOCKET_CONNECTION_CLOSED_ALREADY. Also at the end of the logs the ICE agent state machine fails with 0x5a00000d - STATUS_ICE_NO_CONNECTED_CANDIDATE_PAIR |
Today I did a git bisect and the commit where the problem starts appearing is the one for the 1.8.0 release - e9f3cab, if that helps your investigation. My next step is to go on the develop branch and perform a bisect there, to see the exact commit which creates the issue. |
@skndi , Thank you for the details. What appears to be happening is that the credentials could not be fetched in 5 seconds, which is the timeout on that state. But this timeout existed in v1.7.2. Could you confirm if anything changed for you in the platform? To test it out, could you also try to increase the value here in 1.10.2 and see what happens? |
Hi @disa6302, Today I performed a bisect on #1776 which introduces the 1.8.0 release and have found the commit after which the sample stops working - ee5e345. I tried reverting it in the PR branch and the sample worked after that. On Monday I will try if reverting it on v1.10.2 works and will investigate what exactly in the changes makes it stop working on our system. |
Hi, I have investigated from our side and by a process of elimination reached the conclusion that not calling getaddrinfo in getIpWithHostName() later causes the TURN state machines to fail. In this function if I add a redundant call to getaddrinfo in the branch where parsing the IP address from the TURN server hostname is successful, then ICE succeeds and we get a working stream. I tried researching what getaddrinfo does under the hood and looked into the libc source code, but wasn't able to get an idea why this might be so, since it should only resolve hostnames to IP addresses. Our solution for the moment would be to get back to using getaddrinfo for address resolution. Do you have any insight why this might work that way, since to me it's quite confusing? |
@skndi , Thanks for all the deep dive!. This is very interesting. I see this log line in your logs: What is the IP address that gets logged if you use getaddrinfo? Is it different? |
@disa6302 No, the address found by getaddrinfo is the same. The parsing of the address works fine, but for the later TURN state machine connections to work a call to getaddrinfo must be made, even if it's result is unused. As a test I used the following code:
The result of the getaddrinfo call in the else case is not used in any way, but with this code we have a successful stream establishment. So my guess is that getaddrinfo has some side effect, which affects later connections, but I wasn't able to find anything obvious to me in the source code of the function. It's possible there is some quirk in our system also. For our use case it's good enough to go back to just using getaddrinfo, so that's what we decided to do. I'm posting the info in case you want to further research the issue, and since it would be very interesting to me if someone discovered the cause for this. |
Hi, |
@m-j-ahmadi , @skndi , |
Hi @disa6302 I am a colleague of @m-j-ahmadi we were thinking that 1.9.1 works but after testing few times, I could see that sometimes it works and most of the time it does not work. It seems like 1.9.1 does not work as well |
@skndi Can you please note which hardware platform you're on? You mentioned embedded linux arm64, but a few more details would help are you running linux on an SoC / any further details you can provide? |
@hassanctech we are also experiencing the same issue, not possible to migrate from 1.7.3 to 1.10.2 we had to downgrade and stay with 1.7.3 until we find the solution. we tested on Jetson nano previous version, Jetson Orin new version, and X64 Microsoft Azure VM with Ubuntu 22 |
Hi @hassanctech, we are running on an SoC with an Ambarella CV2 chip, the Linux distro is also provided by them. |
As a bit of extra info here is the user agent string user-agent -> AWS-WEBRTC-KVS-AGENT/1.10.2/1.5.2 GCC/10.2.1 Linux/5.4.215 aarch64 |
@skndi Thanks for those details. Can you please share some more details about your setup, are you using openssl or mbedtls? Are you using a static build? We have a release candidate branch: https://github.com/awslabs/amazon-kinesis-video-streams-webrtc-sdk-c/tree/1.10.3 We've done some tests on Ambarella S5L SoC, specifically with mbedtls option and static build and we are not seeing any degradation of TURN success rates. We now have an cmake option for you to specify the location of the cert.pem on device as well For reference I've used the following to build the SDK:
|
Please confirm you have already done the following
Please answer the following prompt
Describe the bug
The state TURN_STATE_GET_CREDENTIALS of the TurnConnectionStateMachine fails due to error 0x58000022.
Expected Behavior
The expected behavior is to get the credentials successfully and discover a viable TURN ICE candidate.
Current Behavior
2024-06-03 12:20:26.645 DEBUG socketConnectionClosed(): Close socket 15
2024-06-03 12:20:26.645 DEBUG connectionListenerReceiveDataRoutine(): recvfrom() failed with errno Connection reset by peer for socket 15
2024-06-03 12:20:26.653 VERBOSE stepStateMachine(): [TURN-0x7f6c054d00] State Machine - Current state: 0x0000000000000004, Next state: 0x0000000000000004, Current local state retry count [1]
, Max local state retry count [0], State transition wait time [0] ms
2024-06-03 12:20:26.653 VERBOSE stepStateMachine(): [TURN-0x7f6c0863c0] State Machine - Current state: 0x0000000000000004, Next state: 0x0000000000000004, Current local state retry count [1]
, Max local state retry count [0], State transition wait time [0] ms
2024-06-03 12:20:26.654 WARN socketConnectionSendData(): Warning: Failed to send data. Socket closed already
2024-06-03 12:20:26.654 ERROR iceUtilsSendData(): operation returned status code: 0x58000022
2024-06-03 12:20:26.654 ERROR iceUtilsSendStunPacket(): operation returned status code: 0x58000022
2024-06-03 12:20:26.654 VERBOSE stepStateMachine(): [TURN-0x7f6c0863c0] State Machine - Current state: 0x0000000000000004, Next state: 0x0000000000000100, Current local state retry count [0]
, Max local state retry count [0], State transition wait time [0] ms
2024-06-03 12:20:26.654 WARN executeFailedTurnState(): TurnConnection in TURN_STATE_FAILED due to 0x58000022. Aborting TurnConnection
2024-06-03 12:20:26.654 DEBUG stepTurnConnectionStateMachine(): [0x7f6c0863c0] Turn connection state changed from TURN_STATE_GET_CREDENTIALS to TURN_STATE_FAILED.
2024-06-03 12:20:26.654 VERBOSE stepStateMachine(): [TURN-0x7f6c0863c0] State Machine - Current state: 0x0000000000000100, Next state: 0x0000000000000080, Current local state retry count [0]
, Max local state retry count [0], State transition wait time [0] ms
2024-06-03 12:20:26.654 DEBUG stepTurnConnectionStateMachine(): [0x7f6c0863c0] Turn connection state changed from TURN_STATE_FAILED to TURN_STATE_CLEAN_UP.
2024-06-03 12:20:26.654 VERBOSE stepStateMachine(): [TURN-0x7f6c0863c0] State Machine - Current state: 0x0000000000000080, Next state: 0x0000000000000100, Current local state retry count [0]
, Max local state retry count [0], State transition wait time [0] ms
2024-06-03 12:20:26.654 WARN executeFailedTurnState(): TurnConnection in TURN_STATE_FAILED due to 0x58000022. Aborting TurnConnection
2024-06-03 12:20:26.654 DEBUG stepTurnConnectionStateMachine(): [0x7f6c0863c0] Turn connection state changed from TURN_STATE_CLEAN_UP to TURN_STATE_FAILED.
2024-06-03 12:20:26.654 VERBOSE stepStateMachine(): [TURN-0x7f6c0863c0] State Machine - Current state: 0x0000000000000100, Next state: 0x0000000000000100, Current local state retry count [1]
, Max local state retry count [0], State transition wait time [0] ms
2024-06-03 12:20:26.654 WARN executeFailedTurnState(): TurnConnection in TURN_STATE_FAILED due to 0x58000022. Aborting TurnConnection
Reproduction Steps
AWS_KVS_LOG_LEVEL=1 ./kvsWebrtcClientMaster <channel_name> 0 opus h264
and the KVS WebRTC Test Page as viewer, with NAT traversal set to TURN only
WebRTC C SDK version being used
1.10.2
If it was working in a previous version, which one?
1.7.2
Compiler and Version used
GCC 10.2.1
Operating System and version
Embedded aarch64 Linux
Platform being used
aarch64
The text was updated successfully, but these errors were encountered: