wifi mesh node do not find parent (or root) (IDFGH-11009) #12193
Comments
Now tested with esp-idf 4.4.5. Same result as with 4.4.3. |
Maybe related to #6518? |
@ClaesIvarsson
|
|
@ClaesIvarsson Can you provide a demo for us to reproduce this issue? |
Yes, I can demonstrate the problem. Would Friday at 10 CET be OK? Please let me know and I will send an invitation. |
Do you mean that you want to demonstrate your problem online? Actually, we hope you can provide code that reproduces the issue, so that we can test and debug it locally. |
Well,
Maybe a misunderstanding on my side! I am not sure I am allowed to share the existing code; I will have to check how to do that in such a case. |
Is there any way to get more debug information out of the mesh module? Is there anything in the provided logs that hints at the problem at hand? |
When the mesh starts, it will scan all channels to find the channel and network,
This scan result shows that the device has found the fixed root device on channel 1. The device will then scan again on channel 1 and try to connect to the fixed root (ESPM_1E053C). But your failed log shows that the device can't find any device to connect to; the MAP, candidate and root are all 0:
The device then scans channel 1 continuously until the specified number of failures is reached. In your case, that is 20 times. After this, the device starts scanning all channels again to find the network, and still finds the fixed root (ESPM_1E053C) on channel 1.
It then tries to connect on channel 1, repeating the process above. At the end of the first scan after switching to channel 1, the device should be able to see the fixed root and then select it to connect to, just like in your success log.
It's strange that the fixed root cannot be found on channel 1 at the time of failure. Since you said the test with the official IDF example was successful and that it failed with your own code, I suggest you check your own code. |
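The retry behavior described above can be modeled with a small, self-contained sketch. This is not the ESP-MESH implementation; every name here (`demo_find_parent`, `demo_scan_fn`, the stub callback) is hypothetical, and the only detail taken from the thread is that scanning on the fixed channel is retried a limited number of times (20 in this case) before falling back to an all-channel scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: returns true when a scan on `channel` sees the root. */
typedef bool (*demo_scan_fn)(int channel, void *ctx);

typedef enum {
    DEMO_PARENT_FOUND,        /* a scan on the fixed channel saw the root */
    DEMO_RESCAN_ALL_CHANNELS  /* failure limit reached, go back to a full scan */
} demo_result_t;

/* Scan `channel` until the root is seen or `max_fail` scans have failed.
 * `attempts` receives the number of scans that were performed. */
static demo_result_t demo_find_parent(int channel, int max_fail,
                                      demo_scan_fn scan, void *ctx,
                                      int *attempts)
{
    assert(scan != NULL && attempts != NULL);
    int failures = 0;
    while (failures < max_fail) {
        if (scan(channel, ctx)) {
            *attempts = failures + 1;
            return DEMO_PARENT_FOUND;
        }
        failures++;
    }
    *attempts = failures;
    return DEMO_RESCAN_ALL_CHANNELS;
}

/* Example stub: fails until the counter that ctx points to reaches zero. */
static bool demo_scan_succeeds_after(int channel, void *ctx)
{
    (void)channel;
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return false;
    }
    return true;
}
```

In the failing log, the node behaves like the second case: every scan on channel 1 comes back empty, so the limit is reached each time and the full scan starts over.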
I have spent a lot of time trying to find the reason for this issue. I can see that some builds are more prone to fail than others. I have used git bisect to try to find the commit that causes the issue, but my conclusion is that it happens on almost all versions of my software since I introduced the NAT feature based on ip_internal_network. Whether it shows up mainly depends on the number of tests performed. |
I noticed this problem when using NAT with this same example (ip_internal_network). Yesterday I opened an issue, maybe it's the same problem: #12431 |
We tried the example with minor adjustments as a node in the mesh network (not root) and didn't see this behavior at that point, but maybe we didn't test it enough times. I will try this test again with more repetitions. |
I believe I have found the root cause of this problem! The mesh password stored in the cfg structure was not null-terminated when my password was copied. The code relies on the cfg.mesh_ap.password buffer having been initialized to zero.
or at least make sure that
|
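A minimal sketch of the kind of fix described above, assuming the password lives in a fixed-size byte buffer. The struct and helper below are hypothetical stand-ins, not ESP-IDF types, and the 64-byte size is an assumption for illustration. The point is that the buffer is cleared before the copy, so the result is null-terminated even when the structure sits on an uninitialized stack.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed buffer size for this sketch only. */
#define DEMO_PASSWD_LEN 64

/* Hypothetical stand-in for the part of the config holding the password. */
typedef struct {
    uint8_t password[DEMO_PASSWD_LEN];
} demo_mesh_ap_cfg_t;

/* Copy src into the password buffer and guarantee a terminating NUL.
 * Returns 0 on success, -1 if src (plus terminator) does not fit. */
static int demo_set_password(demo_mesh_ap_cfg_t *ap, const char *src)
{
    assert(ap != NULL && src != NULL);
    size_t len = strlen(src);
    if (len >= sizeof(ap->password)) {
        return -1;
    }
    /* Clear first: do not rely on the caller having zeroed the struct. */
    memset(ap->password, 0, sizeof(ap->password));
    memcpy(ap->password, src, len);
    return 0;
}
```

Zeroing the whole configuration structure before filling it in would have the same effect and also covers any other fields that are left unset.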
@ClaesIvarsson Thanks for your update, we will consider updating it. |
@ClaesIvarsson |
I think it is luck or coincidence. The cfg variable is located on the stack, which is uninitialized. My task is located in external RAM; maybe this is the major difference. |
@ClaesIvarsson @zhangyanjiaoesp Thanks for reporting and debugging. We will close this issue; please feel free to reopen it if you still have this problem. Thanks. |
Answers checklist.
IDF version.
4.4.3
Operating System used.
Linux
How did you build your project?
Command line with idf.py
If you are using Windows, please specify command line type.
None
Development Kit.
Custom board
Power Supply used.
External 3.3V
What is the expected behavior?
With the fixed root up and running, I expect any node to find and connect to the mesh network directly (within 10-20 s) after power-on.
What is the actual behavior?
The node searches for a root/parent but fails to find any and ends up throwing MESH_EVENT_NO_PARENT_FOUND after . After that, no new searches are performed and the node ends up dangling outside the mesh network.
A power cycle and a fresh start may or may not resolve the issue.
The problem occurs at approximately 50% of startups.
I have also tried delaying application-related tasks so that they do not interfere at startup, without any better result.
Steps to reproduce.
The init code for the wifi/mesh is identical to the code in the ip_internal_example:
Debug Logs.
More Information.
We use a mesh network system with custom software based on the ip_internal_network example (with a lot of application additions).
Root node is fixed!
ESP32-WROVER, 16 MB flash / 4 MB RAM
running on both cores
wifi module assigned to core 1
I also tried modifying the original example project with the same changes we have in our project (sdkconfig):
None of the changes in the example project caused similar behavior. The example project connects to my project's root on 100% of startups.
I have not tested a more recent version of esp-idf since it breaks my project.