OpenThread Network Time Synchronization Fails to Build on the ESP Thread Border Router (IDFGH-13108) #14055
Comments
Hi, I am running into issues similar to yours. I think the cause is that the Espressif RCP does not process the MAC header IE field properly if present, which results in an RCP RESET_FAULT. I started documenting some of the issues in the openthread repository (I am using ot-br-posix) |
Hi @no2chem, Thank you so much for taking the time to look over my issue! After reading your GitHub issue and corresponding pull request in the OpenThread source code, it makes much more sense why the RCP is crashing with my changes. But more importantly, it makes a lot more sense why I can't get time synchronization to work on my ESP Thread Border Router: there isn't support for it yet for ESP32 devices, and it was only recently that RCP support for time sync was added to OpenThread (which is essential for border routers if they want to time sync). Is my understanding of the issue correct? If not, what parts am I misunderstanding? Any guidance you can give me will be greatly appreciated. |
It seems that the RCP build isn't even passing the TIME_SYNC config options correctly, so there's no chance for it to work in the first place... I opened an issue and will submit a PR soon: |
Thank you again @no2chem! Please keep me posted on your discoveries and on when your pull requests get approved. This may be an obvious question, but for the sake of learning, I want to ask. To the best of my knowledge, the way that a Thread Border Router works (in general, not just ESP border routers) is that packets get received by the RCP, then get passed through Spinel to the code running the network layer and above in the OpenThread stack. Thus, the problem with time sync is that the RCP fails to pass the time sync packets to the network layer and above. I don't believe the problem is with the RCP itself when it first receives a time sync packet, since it's able to parse it when it first sees it. The problem must arise after the RCP parses the packet and sends it to the OpenThread network layer (and the code responsible for MLE) for further processing. Is my understanding correct, or are there some parts I got wrong or haven't fully understood yet? |
Honestly, I'm not really a Thread expert myself. AFAICT, the line you highlighted isn't even in the RCP until you apply my PR (you can test this yourself by adding a #error directive in the middle of the function). The time sync stuff in OpenThread isn't really documented. From reading the code, sync messages don't get sent until a leader shows up with time sync support. If the leader has time sync support, the MLE sends a broadcast with the current time sequence. This is the problematic message, if you have a cli then running If the message contains a Time IE, then for some reason the RCP will puke on it. Sync messages contain a Time IE that gets set in I haven't really debugged it more, but I will once I get a debug UART on the H2 working. |
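The `#error` check mentioned above can be sketched as follows. This is a minimal standalone illustration, not the actual RCP source: `time_sync_branch_compiled()` and `run_branch_check()` are hypothetical names, and the fallback definition of `OPENTHREAD_CONFIG_TIME_SYNC_ENABLE` is an assumption added only so the snippet compiles by itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Fallback so this sketch compiles standalone (assumption). In a real build
 * the value comes from the OpenThread configuration passed to the compiler. */
#ifndef OPENTHREAD_CONFIG_TIME_SYNC_ENABLE
#define OPENTHREAD_CONFIG_TIME_SYNC_ENABLE 0
#endif

/* Hypothetical stand-in for the function being inspected. */
bool time_sync_branch_compiled(void)
{
#if OPENTHREAD_CONFIG_TIME_SYNC_ENABLE
    /* The build stops here whenever this branch is part of the compilation. */
#error "time sync branch IS compiled into this build"
    return true;
#else
    /* The preprocessor skipped the guarded branch, so the #error never fires. */
    return false;
#endif
}

/* Hypothetical helper: with the fallback value of 0 the guarded code is absent. */
void run_branch_check(void)
{
    assert(!time_sync_branch_compiled());
}
```

If the build succeeds with the `#error` line in place, the guarded code was never compiled into the binary, which would suggest the config option is not reaching that translation unit.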
Well, I got logging working on a different UART than the RCP UART. Looks like there's an assertion that is failing now:
I'll look into it further tomorrow... weird that it's in the RX path (looks like a default handler), but it ALWAYS happens right after the leader broadcasts a new time sequence. |
I've debugged more into this and the problem is an assert, but not the line above (for some reason the backtrace seems to be inaccurate). I'm trying to determine whether we actually ended up in the wrong state or not. |
Hi @SimeonAT - it looks like I resolved the issue. It appears the IEEE 802.15.4 driver has some issues with state transitions when transmit_at and rx_on_idle are enabled at the same time. If you apply #14060 and #14089 and turn the time sync option on, you should get an RCP that works with time sync. However, I think you won't be able to get the ESP Thread Border Router working with time sync, because the ESP border router is provided as a closed source library, so you can't change the compile options on it. I just have the ESP32-S3 run serial over ethernet (or wi-fi) and use |
Thank you so much for the help @no2chem! I really appreciate the help, and the detailed and thorough explanation you have given me. I don't necessarily need to run it on the border router. If I can get it to run on the border router, then great. However, given such problems, I won't run my experiments on the border router; I'll stick with running them on non-border router FTDs. What's important for me in this case is that if I don't run it on the border router, I'll need a justification as to why, and the discussion we have had in this GitHub issue will allow me to describe why I'm not using the border router in my experiments. |
Answers checklist.
IDF version.
v5.4-dev-1030-g0479494e7a
Operating System used.
macOS
How did you build your project?
Command line with idf.py
If you are using Windows, please specify command line type.
None
What is the expected behavior?
I should be able to use the OpenThread Network Time Synchronization feature on an ESP Thread Border Router, whether it be programmatically through the API or the command line interface.
What is the actual behavior?
I am using the ESP Thread Border Router SDK, using the `OPENTHREAD_TIME_SYNC` flag per the instructions given in Issue 12154. I enabled that flag in both the RCP and Border Router programs.
The `idf.py build` process is successful for the RCP. However, the `idf.py build` process fails for the Border Router, giving the error below:
I was able to successfully build this on ESP32-H2s using the `ot_cli` example program and they were able to use network time synchronization without any issues.
I describe how I attempted to solve this issue in the More Information section. However, I had to modify the ESP-IDF source code in order to do so. This leads me to believe that it may be the case that the OpenThread Network Time Synchronization for the ESP Border Router does not have support yet.
As a result, I would like to ask the following question:
Is it possible to use OpenThread Network Time Synchronization on an ESP Thread Border Router, or is this feature only available for non-Border Router devices? If so, what steps do I need to take in order to properly navigate this build error (rather than using the ad-hoc solution I described in the More Information section)?
I am not asking for a feature or bug fix to be implemented. However, my higher level question is whether such a feature is currently possible in ESP-IDF. If it is not possible, I can change the OpenThread app layer programs that I am writing given that fact. However, I don't want to assume that it isn't possible without first asking whether it is, as it may be the case that I am using the ESP-IDF OpenThread Network Time Sync feature incorrectly.
Steps to reproduce.
I want to note that I was running my own fork of the ESP Thread Border Router SDK. However, I believe the problem can be easily reproduced using the unmodified ESP Thread Border Router SDK.
The specific hardware I am using is the ESP Thread Border Router/Zigbee Gateway.
`ot_rcp` example program. Use `idf.py menuconfig`, enable the `OPENTHREAD_TIME_SYNC` flag, and then run `idf.py build`.
`idf.py menuconfig`, enable the `OPENTHREAD_TIME_SYNC` flag, and run `idf.py build`.
Build or installation Logs.
The Console Output When I Build.
More Information.
In order to investigate this problem, I took a look at the `openthread` directory in the ESP-IDF GitHub repository.
To the best of my knowledge, the ESP Thread Border Router program (not the RCP) uses `port/esp_openthread_radio_spinel.cpp`, while "non-Border Router" FTDs (e.g. an ESP32-H2 or ESP32-C6) use `port/esp_openthread_radio.c`. I believe the build error comes from the fact that `otPlatTimeGetXtalAccuracy()` is not defined in `port/esp_openthread_radio_spinel.cpp`, and is only defined in `port/esp_openthread_radio.c`. As a result, the ESP Thread Border Router program does not have a definition of `otPlatTimeGetXtalAccuracy()` that it can actually use.
What I Did to Try to Solve the Problem
I decided to give it a try and see if I can solve this issue. I believe I was able to create a (naive) solution in my forks of ESP-IDF and OpenThread:
1. In my OpenThread fork, I set the `OPENTHREAD_CONFIG_TIME_SYNC_ENABLE` and `OPENTHREAD_CONFIG_TIME_SYNC_REQUIRED` to the value of `OPENTHREAD_FTD`. I did not set both flags to `1`, as I had problems building with the RCP when `OPENTHREAD_CONFIG_TIME_SYNC_ENABLE` was enabled (I describe this in further detail at the bottom of this post).
2. I created a Kconfig variable, `OT_RCP_ENABLE_TIME_SYNC`, that I used to turn on time synchronization for the RCP only.
3. I copied the function definition for `otPlatTimeGetXtalAccuracy()` used in `esp_openthread_radio.c`, and added it to `esp_openthread_radio_spinel.cpp`.
4. To the best of my knowledge, I believe the RCP has access to both `esp_openthread_radio.c` and `esp_openthread_radio_spinel.cpp`. I made the `esp_ieee802154_transmit_sfd_done()` function process the network time sync packets for the RCP when the `OT_RCP_ENABLE_TIME_SYNC` is enabled. I got the idea to edit this function from the changes made in commit 2652881.
5. I used `idf.py menuconfig` on the RCP to enable the `OT_RCP_ENABLE_TIME_SYNC` flag, but turned off the `OPENTHREAD_CONFIG_TIME_SYNC_ENABLE` flag, then I built the RCP using `idf.py build`.
6. I used `idf.py menuconfig` on the Basic Thread Border Router program to enable the `OPENTHREAD_CONFIG_TIME_SYNC_ENABLE`, then built using `idf.py build`.
After doing these steps, I was able to create a network with an ESP32-H2 and ESP Thread Border router that is time synchronized. I have attached screenshots showing the successful network time synchronization between the two devices.
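The first and third steps described above (tying both time sync options to `OPENTHREAD_FTD`, and supplying `otPlatTimeGetXtalAccuracy()` to the Spinel radio port) can be sketched roughly as below. This is a standalone illustration under stated assumptions, not the code from my forks: the fallback `OPENTHREAD_FTD` definition, the `EXAMPLE_XTAL_ACCURACY_PPM` placeholder and its value, and `run_time_sync_sketch()` are all hypothetical. The function signature is believed to match the declaration in OpenThread's platform time header, but please verify it against your OpenThread version.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: this translation unit is built as an FTD. In OpenThread the
 * build system normally defines OPENTHREAD_FTD. */
#ifndef OPENTHREAD_FTD
#define OPENTHREAD_FTD 1
#endif

/* Step 1: both options follow OPENTHREAD_FTD instead of being hard-coded to 1,
 * so a non-FTD build (such as the RCP) leaves time sync disabled. */
#define OPENTHREAD_CONFIG_TIME_SYNC_ENABLE   OPENTHREAD_FTD
#define OPENTHREAD_CONFIG_TIME_SYNC_REQUIRED OPENTHREAD_FTD

/* Hypothetical placeholder; a real port would take this from its own
 * configuration rather than a literal. */
#define EXAMPLE_XTAL_ACCURACY_PPM 130

#if OPENTHREAD_CONFIG_TIME_SYNC_ENABLE
/* Step 3: give the Spinel-based radio port its own definition of the platform
 * hook, so the linker can resolve the symbol. Returns accuracy in PPM. */
uint16_t otPlatTimeGetXtalAccuracy(void)
{
    return EXAMPLE_XTAL_ACCURACY_PPM;
}
#endif

/* Hypothetical helper showing the hook is callable once it is defined. */
void run_time_sync_sketch(void)
{
#if OPENTHREAD_CONFIG_TIME_SYNC_ENABLE
    assert(otPlatTimeGetXtalAccuracy() == EXAMPLE_XTAL_ACCURACY_PPM);
#endif
}
```

Without a definition like the one in step 3, any code that calls `otPlatTimeGetXtalAccuracy()` in the Border Router build has a declaration but no symbol to link against, which is consistent with the build failing only for the Border Router program.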
Caveat
To test that it works consistently, I would reset the Border Router and ESP32-H2 and run `networktime` multiple times in a row. Sometimes, the Border Router would crash due to an RCP failure.
Why I Didn't Set `OPENTHREAD_CONFIG_TIME_SYNC_ENABLE` to `1` in Step 1
If I did so, I would get the following error when doing `idf.py build` on the RCP, shown in the code block below. That's why I thought it would be best to avoid setting this flag in the RCP.