Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wifi mesh node do not find parent (or root) (IDFGH-11009) #12193

Closed
3 tasks done
ClaesIvarsson opened this issue Sep 5, 2023 · 18 comments
Closed
3 tasks done

wifi mesh node do not find parent (or root) (IDFGH-11009) #12193

ClaesIvarsson opened this issue Sep 5, 2023 · 18 comments
Assignees
Labels
Resolution: Done Issue is done internally Status: Done Issue is done internally Type: Bug bugs in IDF

Comments

@ClaesIvarsson
Copy link

Answers checklist.

  • I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

IDF version.

4.4.3

Operating System used.

Linux

How did you build your project?

Command line with idf.py

If you are using Windows, please specify command line type.

None

Development Kit.

Custom board

Power Supply used.

External 3.3V

What is the expected behavior?

With fixed root up and running I expect any node to find and connect to the mesh network directly (10-20s) on power on.

What is the actual behavior?

Node searches for root/parent but fail to find any and ends up throwing MESH_EVENT_NO_PARENT_FOUND after . After that, no new searches are performed and the node ends up dangling outside of the mesh network.

After power cycle and a fresh start may or may not resolve the issue.
Problem occurs approximately 50% of the times at startup.

I have also tried to delay application related tasks to not interfere at startup without any better result.

Steps to reproduce.

  1. Startup of the root
  2. wait until stable
  3. power on mesh node
  4. wait for node to startup ~10-15s

Init code for the wifi/mesh is identical to code in the ip_internal_example:

Debug Logs.

Log when failure occurs:
I (27) boot: ESP-IDF v4.4.3-1-gcc5640aef3 2nd stage bootloader
I (27) boot: compile time 08:15:55
I (27) boot: chip revision: 3
I (29) boot.esp32: SPI Speed      : 80MHz
I (32) boot.esp32: SPI Mode       : DIO
I (34) boot.esp32: SPI Flash Size : 16MB
I (36) boot: Enabling RNG early entropy source...
I (39) boot: Partition Table:
I (41) boot: ## Label            Usage          Type ST Offset   Length
I (44) boot:  0 otadata          OTA data         01 00 0000d000 00002000
I (48) boot:  1 phy_init         RF data          01 01 0000f000 00001000
I (52) boot:  2 coredump         Unknown data     01 03 00010000 00010000
I (56) boot:  3 reserved         Unknown data     01 fe 00020000 00020000
I (59) boot:  4 hwadmin          Unknown data     01 d0 00040000 00001000
I (63) boot:  5 az_cert          unknown          40 00 00041000 00001000
I (67) boot:  6 az_cert_key      unknown          40 01 00042000 00001000
I (70) boot:  7 az_ca_api        unknown          40 02 00043000 00003000
I (74) boot:  8 az_ca_blob       unknown          40 03 00046000 00003000
I (78) boot:  9 nvs_key          NVS keys         01 04 00049000 00001000
I (82) boot: 10 nvs              WiFi data        01 02 00050000 00020000
I (85) boot: 11 filesystem       Unknown data     01 81 00070000 00370000
I (89) boot: 12 factory          factory app      00 00 00400000 00400000
I (93) boot: 13 ota_0            OTA app          00 10 00800000 00400000
I (97) boot: 14 ota_1            OTA app          00 11 00c00000 00400000
I (100) boot: End of partition table
I (102) boot: Defaulting to factory image
I (105) esp_image: segment 0: paddr=00400020 vaddr=3f400020 size=be5a0h (779680) map
I (345) esp_image: segment 1: paddr=004be5c8 vaddr=3ffbdb60 size=01a50h (  6736) load
I (347) esp_image: segment 2: paddr=004c0020 vaddr=400d0020 size=25fa1ch (2488860) map
I (1099) esp_image: segment 3: paddr=0071fa44 vaddr=3ffbf5b0 size=053c4h ( 21444) load
I (1107) esp_image: segment 4: paddr=00724e10 vaddr=40080000 size=1d994h (121236) load
I (1149) esp_image: segment 5: paddr=007427ac vaddr=50000000 size=00010h (    16) load
I (1165) boot: Loaded app from partition at offset 0x400000
I (1165) boot: Disabling RNG early entropy source...
I (1171) psram: This chip is ESP32-D0WD
I (1171) spiram: Found 64MBit SPI RAM device
I (1171) spiram: SPI RAM mode: flash 80m sram 80m
I (1173) spiram: PSRAM initialized, cache is in low/high (2-core) mode.
I (1177) cpu_start: Pro cpu up.
I (1179) cpu_start: Starting app cpu, entry point is 0x400817bc
I (0) cpu_start: App cpu up.
I (1726) spiram: SPI SRAM memory test OK
I (1734) cpu_start: Pro cpu start user code
I (1734) cpu_start: cpu freq: 240000000
I (1734) cpu_start: Application information:
I (1735) cpu_start: Project name:     ElsIotModule
I (1738) cpu_start: App version:      1.3.0
I (1741) cpu_start: Compile time:     Jun 22 2023 15:11:27
I (1744) cpu_start: ELF file SHA256:  8d10c98a9ef7eb70...
I (1747) cpu_start: ESP-IDF:          v4.4.3-1-gcc5640aef3
I (1750) heap_init: Initializing. RAM available for dynamic allocation:
I (1754) heap_init: At 3FFAFF10 len 000000F0 (0 KiB): DRAM
I (1757) heap_init: At 3FFB6388 len 00001C78 (7 KiB): DRAM
I (1760) heap_init: At 3FFB9A20 len 00004108 (16 KiB): DRAM
I (1763) heap_init: At 3FFD11D0 len 0000EE30 (59 KiB): DRAM
I (1766) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (1769) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (1772) heap_init: At 4009D994 len 0000266C (9 KiB): IRAM
I (1776) spiram: Adding pool of 4084K of external SPI memory to heap allocator
I (1780) spi_flash: detected chip: generic
I (1782) spi_flash: flash io: dio
I (1827) cpu_start: Starting scheduler on PRO CPU.
I (0) cpu_start: Starting scheduler on APP CPU.
I (1844) spiram: Reserving pool of 24K of internal memory for DMA/internal allocations
I (2314) [main, 256]: Running partition, label: factory, type: 0x0, subtype: 0x0, address: 0x400000
I (2324) [main, 115]: efuse block [0, 0, 192]: 00 00 00 00 E4 D9 DB 93  17 44 57 00 00 A2 00 00  32 09 00 00 00 00 10 00
I (2330) [main, 115]: efuse block [1, 0, 256]: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
I (2333) [main, 115]: efuse block [2, 0, 256]: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
I (2340) [main, 115]: efuse block [3, 0, 256]: 00 00 00 00 00 00 00 00  00 00 42 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
I (4448) [mesh_wifi, 330]: mesh_wifi_init(root=false)
I (4503) wifi:wifi driver task: 3ffde3b4, prio:23, stack:6656, core=1
I (4507) system_api: Base MAC address is not set
I (4508) system_api: read default base MAC address from EFUSE
I (4540) wifi:wifi firmware version: 8cb87ff
I (4541) wifi:wifi certification version: v7.0
I (4542) wifi:config NVS flash: enabled
I (4543) wifi:config nano formating: disabled
I (4544) wifi:Init data frame dynamic rx buffer num: 64
I (4546) wifi:Init management frame dynamic rx buffer num: 64
I (4549) wifi:Init management short buffer num: 32
I (4552) wifi:Init static tx buffer num: 8
I (4556) wifi:Init tx cache buffer num: 64
I (4558) wifi:Init static rx buffer size: 1600
I (4559) wifi:Init static rx buffer num: 8
I (4560) wifi:Init dynamic rx buffer num: 64
I (4567) wifi_init: tcpip mbox: 32
I (4568) wifi_init: udp mbox: 6
I (4568) wifi_init: tcp mbox: 6
I (4568) wifi_init: tcp tx win: 5840
I (4570) wifi_init: tcp rx win: 5840
I (4572) wifi_init: tcp mss: 1400
I (4574) wifi_init: WiFi/LWIP prefer SPIRAM
I (4628) wifi:Set ps type: 0

I (4628) phy_init: phy_version 4670,719f9f6,Feb 18 2021,17:07:07
I (4807) wifi:mode : sta (44:17:93:db:d9:e4) + softAP (44:17:93:db:d9:e5)
I (4808) wifi:enable tsf
I (4811) wifi:Total power save buffer number: 4
I (4812) wifi:Init max length of beacon: 752/752
I (4813) wifi:Init max length of beacon: 752/752
I (4875) mesh: <nvs>read layer:0, err:0x1102
I (4880) mesh: <nvs>read assoc:0, err:0x1102
I (5002) wifi:Set ps type: 0

I (5029) mesh: [CONFIG]fail to set node, err:0x4008
I (5034) [mesh_wifi, 413]:  - router_ssid       : XXXXXX
I (5038) [mesh_wifi, 414]:  - router_password   : XXXXXXXXXXXXXXX
I (5043) [mesh_wifi, 415]:  - mesh_id           : XXXXXX
I (5045) [mesh_wifi, 416]:  - mesh_password     : XXXXXXXXXXXXXXX
I (5066) wifi:Total power save buffer number: 4
W (5531) wifi:age timer already deinit
I (5532) wifi:mode : sta (44:17:93:db:d9:e4)
I (5555) mesh: <MESH_NWK_LOOK_FOR_NETWORK>need_scan:0x3, need_scan_router:0x0, look_for_nwk_count:1
I (5561) [node_main, 74]: MESH is started
I (5570) [mesh_wifi, 422]: mesh starts successfully, heap:3932327, root fixed

I (6875) [mdf_mem, 199]: Free heap, internal current: 59163, minimum: 59023, total current: 3702523, minimum: 3701267

I (9045) mesh: [S1], c4:dd:57:b8:19:01, channel:1, rssi:-73
I (9046) mesh: [S3], fa:8f:ca:92:ea:09, channel:1, rssi:-92
I (9046) mesh: [S5], 30:c3:d9:aa:6c:bb, channel:1, rssi:-94
I (9048) mesh: [S7], c4:dd:57:b8:19:65, channel:1, rssi:-69
I (9051) mesh: find root:ESPM_1E053C, root_cap:5(max:512), new channel:1, old channel:0
I (9055) mesh: [S10], 84:1e:a3:47:84:b6, channel:1, rssi:-94
I (9057) mesh: [S16], d2:21:f9:b8:60:46, channel:6, rssi:-92
I (9060) mesh: [FIND][ch:0]AP:29, otherID:0, MAP:2, idle:0, candidate:1, root:1[4d:08:40:2c:75:09][FIXED-ROOT]
I (9065) mesh: [FIND:1]find a network, channel:1, cfg<channel:0, router:, 4d:08:40:2c:75:09>

I (9069) [node_main, 269]: MESH_EVENT_FIND_NETWORK
I (9070) wifi:mode : sta (44:17:93:db:d9:e4) + softAP (44:17:93:db:d9:e5)
W (9076) wifi:<MESH AP>adjust channel:1, secondary channel offset:1(40U)
I (9078) wifi:Total power save buffer number: 4
I (9336) mesh: [SCAN][ch:1]AP:16, other(ID:0, RD:1), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:0,i:0][4d:08:40:2c:75:09][FIXED-ROOT]<>
I (9338) mesh: [FAIL][1]root:0, fail:1, normal:0

I (9582) mesh: [SCAN][ch:1]AP:15, other(ID:0, RD:3), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:0,i:0][4d:08:40:2c:75:09][FIXED-ROOT]<>
I (9584) mesh: [FAIL][2]root:0, fail:2, normal:0

I (9828) mesh: [SCAN][ch:1]AP:14, other(ID:0, RD:1), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:0,i:0][4d:08:40:2c:75:09][FIXED-ROOT]<>
I (9829) mesh: [FAIL][3]root:0, fail:3, normal:0

I (10073) mesh: [SCAN][ch:1]AP:5, other(ID:0, RD:1), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:0,i:0][4d:08:40:2c:75:09][FIXED-ROOT]<>
I (10075) mesh: [FAIL][4]root:0, fail:4, normal:0

I (10319) mesh: [SCAN][ch:1]AP:15, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:0,i:0][4d:08:40:2c:75:09][FIXED-ROOT]<>
I (10320) mesh: [FAIL][5]root:0, fail:5, normal:0

I (10565) mesh: [SCAN][ch:1]AP:17, other(ID:0, RD:3), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:0,i:0][4d:08:40:2c:75:09][FIXED-ROOT]<>
I (10566) mesh: [FAIL][6]root:0, fail:6, normal:0

I (10811) mesh: [SCAN][ch:1]AP:14, other(ID:0, RD:1), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:0,i:0][4d:08:40:2c:75:09][FIXED-ROOT]<>
I (10812) mesh: [FAIL][7]root:0, fail:7, normal:0

I (11056) mesh: [SCAN][ch:1]AP:9, other(ID:0, RD:2), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:0,i:0][4d:08:40:2c:75:09][FIXED-ROOT]<>
I (11058) mesh: [FAIL][8]root:0, fail:8, normal:0

I (11302) mesh: [SCAN][ch:1]AP:13, other(ID:0, RD:1), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:0,i:0][4d:08:40:2c:75:09][FIXED-ROOT]<>
I (11303) mesh: [FAIL][9]root:0, fail:9, normal:0

I (11548) mesh: [SCAN][ch:1]AP:13, other(ID:0, RD:2), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:0,i:0][4d:08:40:2c:75:09][FIXED-ROOT]<>
I (11550) mesh: [FAIL][10]root:0, fail:10, normal:0

I (11794) mesh: [SCAN][ch:1]AP:14, other(ID:0, RD:1), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:0,i:0][4d:08:40:2c:75:09][FIXED-ROOT]<>
I (11795) mesh: [FAIL][11]root:0, fail:11, normal:0

I (12039) mesh: [SCAN][ch:1]AP:9, other(ID:0, RD:2), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:0,i:0][4d:08:40:2c:75:09][FIXED-ROOT]<>
I (12041) mesh: [FAIL][12]root:0, fail:12, normal:0

I (12285) mesh: [SCAN][ch:1]AP:10, other(ID:0, RD:1), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:0,i:0][4d:08:40:2c:75:09][FIXED-ROOT]<>
I (12287) mesh: [FAIL][13]root:0, fail:13, normal:0

I (12531) mesh: [SCAN][ch:1]AP:15, other(ID:0, RD:1), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:0,i:0][4d:08:40:2c:75:09][FIXED-ROOT]<>
I (12533) mesh: [FAIL][14]root:0, fail:14, normal:0

I (12777) mesh: [SCAN][ch:1]AP:13, other(ID:0, RD:1), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:0,i:0][4d:08:40:2c:75:09][FIXED-ROOT]<>
I (12779) mesh: [FAIL][15]root:0, fail:15, normal:0

I (13024) mesh: [SCAN][ch:1]AP:10, other(ID:0, RD:2), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:0,i:0][4d:08:40:2c:75:09][FIXED-ROOT]<>
I (13025) mesh: [FAIL][16]root:0, fail:16, normal:0

I (13270) mesh: [SCAN][ch:1]AP:7, other(ID:0, RD:1), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:0,i:0][4d:08:40:2c:75:09][FIXED-ROOT]<>
I (13272) mesh: [FAIL][17]root:0, fail:17, normal:0

I (13517) mesh: [SCAN][ch:1]AP:17, other(ID:0, RD:2), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:0,i:0][4d:08:40:2c:75:09][FIXED-ROOT]<>
I (13519) mesh: [FAIL][18]root:0, fail:18, normal:0

I (13766) mesh: [SCAN][ch:1]AP:11, other(ID:0, RD:2), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:0,i:0][4d:08:40:2c:75:09][FIXED-ROOT]<>
I (13768) mesh: [FAIL][19]root:0, fail:19, normal:0

I (14013) mesh: [SCAN][ch:1]AP:7, other(ID:0, RD:1), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:0,i:0][4d:08:40:2c:75:09][FIXED-ROOT]<>
I (14015) mesh: [FAIL][20]root:0, fail:20, normal:0

I (14017) mesh: <MESH_NWK_PARENT_SELECTION>no parent found, set ROOTLESS, scan times:20, heap:62275
I (14021) mesh: <MESH_NWK_PARENT_SELECTION>channel<cfg:0> is allowed to switch, do a full channel scan
W (14026) wifi:age timer already deinit
I (14030) wifi:mode : sta (44:17:93:db:d9:e4)
I (14044) [node_main, 119]: mesh network has a root: 0
I (14045) mesh: <MESH_NWK_LOOK_FOR_NETWORK>need_scan:0x3, need_scan_router:0x0, look_for_nwk_count:1
I (17416) mesh: find root:ESPM_1E053C, root_cap:5(max:512), new channel:1, old channel:0
I (17417) mesh: [S3], 84:1e:a3:47:84:b6, channel:1, rssi:-90
I (17418) mesh: [S7], fc:f5:c4:3c:fc:61, channel:1, rssi:-80
I (17427) mesh: [S16], d2:21:f9:b8:60:46, channel:6, rssi:-91
I (17433) mesh: [FIND][ch:0]AP:27, otherID:0, MAP:1, idle:0, candidate:0, root:1[4d:08:40:2c:75:09][FIXED-ROOT]
I (17438) mesh: [FIND:1]find a network, channel:1, cfg<channel:0, router:, 4d:08:40:2c:75:09>

I (17442) [node_main, 269]: MESH_EVENT_FIND_NETWORK
I (17443) wifi:mode : sta (44:17:93:db:d9:e4) + softAP (44:17:93:db:d9:e5)
W (17449) wifi:<MESH AP>adjust channel:1, secondary channel offset:1(40U)
I (17452) wifi:Total power save buffer number: 4
I (17709) mesh: [SCAN][ch:1]AP:11, other(ID:0, RD:2), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:0,i:0][4d:08:40:2c:75:09][FIXED-ROOT]<>
I (17711) mesh: [FAIL][21]root:0, fail:21, normal:0


Log when connection is established:
I (27) boot: ESP-IDF v4.4.3-1-gcc5640aef3 2nd stage bootloader
I (27) boot: compile time 08:15:55
I (27) boot: chip revision: 3
I (29) boot.esp32: SPI Speed      : 80MHz
I (32) boot.esp32: SPI Mode       : DIO
I (34) boot.esp32: SPI Flash Size : 16MB
I (36) boot: Enabling RNG early entropy source...
I (39) boot: Partition Table:
I (41) boot: ## Label            Usage          Type ST Offset   Length
I (44) boot:  0 otadata          OTA data         01 00 0000d000 00002000
I (48) boot:  1 phy_init         RF data          01 01 0000f000 00001000
I (52) boot:  2 coredump         Unknown data     01 03 00010000 00010000
I (56) boot:  3 reserved         Unknown data     01 fe 00020000 00020000
I (59) boot:  4 hwadmin          Unknown data     01 d0 00040000 00001000
I (63) boot:  5 az_cert          unknown          40 00 00041000 00001000
I (67) boot:  6 az_cert_key      unknown          40 01 00042000 00001000
I (70) boot:  7 az_ca_api        unknown          40 02 00043000 00003000
I (74) boot:  8 az_ca_blob       unknown          40 03 00046000 00003000
I (78) boot:  9 nvs_key          NVS keys         01 04 00049000 00001000
I (82) boot: 10 nvs              WiFi data        01 02 00050000 00020000
I (85) boot: 11 filesystem       Unknown data     01 81 00070000 00370000
I (89) boot: 12 factory          factory app      00 00 00400000 00400000
I (93) boot: 13 ota_0            OTA app          00 10 00800000 00400000
I (97) boot: 14 ota_1            OTA app          00 11 00c00000 00400000
I (100) boot: End of partition table
I (102) boot: Defaulting to factory image
I (105) esp_image: segment 0: paddr=00400020 vaddr=3f400020 size=be5a0h (779680) map
I (345) esp_image: segment 1: paddr=004be5c8 vaddr=3ffbdb60 size=01a50h (  6736) load
I (347) esp_image: segment 2: paddr=004c0020 vaddr=400d0020 size=25fa1ch (2488860) map
I (1099) esp_image: segment 3: paddr=0071fa44 vaddr=3ffbf5b0 size=053c4h ( 21444) load
I (1107) esp_image: segment 4: paddr=00724e10 vaddr=40080000 size=1d994h (121236) load
I (1149) esp_image: segment 5: paddr=007427ac vaddr=50000000 size=00010h (    16) load
I (1165) boot: Loaded app from partition at offset 0x400000
I (1165) boot: Disabling RNG early entropy source...
I (1171) psram: This chip is ESP32-D0WD
I (1171) spiram: Found 64MBit SPI RAM device
I (1171) spiram: SPI RAM mode: flash 80m sram 80m
I (1173) spiram: PSRAM initialized, cache is in low/high (2-core) mode.
I (1177) cpu_start: Pro cpu up.
I (1179) cpu_start: Starting app cpu, entry point is 0x400817bc
I (0) cpu_start: App cpu up.
I (1726) spiram: SPI SRAM memory test OK
I (1734) cpu_start: Pro cpu start user code
I (1734) cpu_start: cpu freq: 240000000
I (1734) cpu_start: Application information:
I (1735) cpu_start: Project name:     ElsIotModule
I (1738) cpu_start: App version:      1.3.0
I (1741) cpu_start: Compile time:     Jun 22 2023 15:11:27
I (1744) cpu_start: ELF file SHA256:  8d10c98a9ef7eb70...
I (1747) cpu_start: ESP-IDF:          v4.4.3-1-gcc5640aef3
I (1750) heap_init: Initializing. RAM available for dynamic allocation:
I (1754) heap_init: At 3FFAFF10 len 000000F0 (0 KiB): DRAM
I (1757) heap_init: At 3FFB6388 len 00001C78 (7 KiB): DRAM
I (1760) heap_init: At 3FFB9A20 len 00004108 (16 KiB): DRAM
I (1763) heap_init: At 3FFD11D0 len 0000EE30 (59 KiB): DRAM
I (1766) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (1769) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (1772) heap_init: At 4009D994 len 0000266C (9 KiB): IRAM
I (1776) spiram: Adding pool of 4084K of external SPI memory to heap allocator
I (1780) spi_flash: detected chip: generic
I (1782) spi_flash: flash io: dio
I (1827) cpu_start: Starting scheduler on PRO CPU.
I (0) cpu_start: Starting scheduler on APP CPU.
I (1844) spiram: Reserving pool of 24K of internal memory for DMA/internal allocations
I (2427) [main, 256]: Running partition, label: factory, type: 0x0, subtype: 0x0, address: 0x400000
I (2490) [main, 115]: efuse block [0, 0, 192]: 00 00 00 00 E4 D9 DB 93  17 44 57 00 00 A2 00 00  32 09 00 00 00 00 10 00
I (2493) [main, 115]: efuse block [1, 0, 256]: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
I (2549) [main, 115]: efuse block [2, 0, 256]: 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
I (2552) [main, 115]: efuse block [3, 0, 256]: 00 00 00 00 00 00 00 00  00 00 42 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
I (4611) [mesh_wifi, 330]: mesh_wifi_init(root=false)
I (4627) wifi:wifi driver task: 3ffde190, prio:23, stack:6656, core=1
I (4639) system_api: Base MAC address is not set
I (4639) system_api: read default base MAC address from EFUSE
I (4663) wifi:wifi firmware version: 8cb87ff
I (4664) wifi:wifi certification version: v7.0
I (4665) wifi:config NVS flash: enabled
I (4665) wifi:config nano formating: disabled
I (4667) wifi:Init data frame dynamic rx buffer num: 64
I (4669) wifi:Init management frame dynamic rx buffer num: 64
I (4671) wifi:Init management short buffer num: 32
I (4675) wifi:Init static tx buffer num: 8
I (4677) wifi:Init tx cache buffer num: 64
I (4678) wifi:Init static rx buffer size: 1600
I (4680) wifi:Init static rx buffer num: 8
I (4682) wifi:Init dynamic rx buffer num: 64
I (4684) wifi_init: tcpip mbox: 32
I (4736) wifi_init: udp mbox: 6
I (4737) wifi_init: tcp mbox: 6
I (4737) wifi_init: tcp tx win: 5840
I (4737) wifi_init: tcp rx win: 5840
I (4739) wifi_init: tcp mss: 1400
I (4741) wifi_init: WiFi/LWIP prefer SPIRAM
I (4744) wifi:Set ps type: 0

I (4745) phy_init: phy_version 4670,719f9f6,Feb 18 2021,17:07:07
I (4944) wifi:mode : sta (44:17:93:db:d9:e4) + softAP (44:17:93:db:d9:e5)
I (4945) wifi:enable tsf
I (4947) wifi:Total power save buffer number: 4
I (4949) wifi:Init max length of beacon: 752/752
I (4950) wifi:Init max length of beacon: 752/752
I (5012) mesh: <nvs>read layer:0, err:0x1102
I (5017) mesh: <nvs>read assoc:0, err:0x1102
I (5127) wifi:Set ps type: 0

I (5155) mesh: [CONFIG]fail to set node, err:0x4008
I (5155) [mesh_wifi, 413]:  - router_ssid       : XXXXXXX
I (5155) [mesh_wifi, 414]:  - router_password   : XXXXXXXXXX
I (5158) [mesh_wifi, 415]:  - mesh_id           : XXXXXX
I (5161) [mesh_wifi, 416]:  - mesh_password     : XXXXXXXXXXXXXXX
I (5189) wifi:Total power save buffer number: 4
W (5654) wifi:age timer already deinit
I (5656) wifi:mode : sta (44:17:93:db:d9:e4)
I (5679) mesh: <MESH_NWK_LOOK_FOR_NETWORK>need_scan:0x3, need_scan_router:0x0, look_for_nwk_count:1
I (5685) [node_main, 74]: MESH is started
I (5694) [mesh_wifi, 422]: mesh starts successfully, heap:3932395, root fixed

I (7004) [mdf_mem, 199]: Free heap, internal current: 59099, minimum: 58907, total current: 3702135, minimum: 3700835

I (9168) mesh: [S1], fa:8f:ca:92:ea:09, channel:1, rssi:-92
I (9169) mesh: find root:ESPM_1E053C, root_cap:5(max:512), new channel:1, old channel:0
I (9170) mesh: [S3], fc:f5:c4:3c:fc:61, channel:1, rssi:-82
I (9172) mesh: [S4], 34:ab:95:5c:22:45, channel:1, rssi:-72
I (9175) mesh: [S5], fc:f5:c4:3c:f5:01, channel:1, rssi:-65
I (9178) mesh: [S8], c4:dd:57:b8:19:75, channel:1, rssi:-65
I (9180) mesh: [S9], c4:dd:57:b8:18:f1, channel:1, rssi:-67
I (9183) mesh: [S10], c4:dd:57:b8:19:29, channel:1, rssi:-67
I (9186) mesh: [S11], fc:f5:c4:3c:fc:0d, channel:1, rssi:-60
I (9188) mesh: [S12], fc:f5:c4:3c:f4:81, channel:1, rssi:-82
I (9191) mesh: [S15], fc:f5:c4:3c:fb:a9, channel:1, rssi:-55
I (9194) mesh: [S16], 84:1e:a3:47:84:b6, channel:1, rssi:-84
I (9196) mesh: [S17], fc:f5:c4:3c:f4:89, channel:1, rssi:-76
I (9199) mesh: [FIND][ch:0]AP:30, otherID:0, MAP:1, idle:0, candidate:0, root:1[4d:08:40:2c:75:09][FIXED-ROOT]
I (9204) mesh: [FIND:1]find a network, channel:1, cfg<channel:0, router:, 4d:08:40:2c:75:09>

I (9208) [node_main, 269]: MESH_EVENT_FIND_NETWORK
I (9209) wifi:mode : sta (44:17:93:db:d9:e4) + softAP (44:17:93:db:d9:e5)
W (9215) wifi:<MESH AP>adjust channel:1, secondary channel offset:1(40U)
I (9218) wifi:Total power save buffer number: 4
I (9474) mesh: [SCAN][ch:1]AP:8, other(ID:0, RD:0), MAP:1, idle:0, candidate:1, root:1, topMAP:0[c:0,i:0][4d:08:40:2c:75:09][FIXED-ROOT]<>
I (9476) mesh: 7369[selection]try rssi_threshold:-78, backoff times:0, max:5<-78,-82,-85>
I (9480) mesh: [DONE]connect to parent:ESPM_1E053C, channel:1, rssi:-38, c4:de:e2:1e:05:3d[layer:1, assoc:4], my_vote_num:0/voter_num:0, rc[00:00:00:00:00:00/-120/0]
I (9487) mesh: set router bssid:d5:32:80:f0:31:fd
I (9517) wifi:new:<1,1>, old:<1,1>, ap:<1,1>, sta:<1,1>, prof:1
I (9517) wifi:state: init -> auth (b0)
I (9529) wifi:state: auth -> assoc (0)
I (9548) mesh: <MESH_NWK_MIE_CHANGE><><><><ROOT ADDR><><><>
I (9548) wifi:I (9549) mesh: <MESH_NWK_ROOT_ADDR>from assoc, layer:2, root_addr:c4:de:e2:1e:05:3d, root_cap:5
state: assoc -> run (10)
I (9552) wifi:connected with ESPM_1E053C, aid = 2, channel 1, 40U, bssid = c4:de:e2:1e:05:3d
I (9556) wifi:security: Open Auth, phy: bgn, rssi: -39
I (9562) wifi:pm start, type: 0

I (9563) mesh: <MESH_NWK_ROOT_ADDR>idle, layer:2, root_addr:c4:de:e2:1e:05:3d, conflict_roots.num:0<>
I (9565) [node_main, 97]: <MESH_EVENT_ROOT_ADDRESS>root address:c4dee21e053d
I (9565) mesh: [scan]new scanning time:600ms, beacon interval:300ms
I (9584) mesh: 2004<arm>parent monitor, my layer:2(cap:16)(node), interval:9071ms, retries:1<normal connected>
I (9585) [mesh_netif, 469]: It was a wifi station removing stuff
I (9588) [mesh_netif, 84]: Receiving task started
I (9637) wifi:AP's beacon interval = 102400 us, DTIM period = 1
I (9641) [node_main, 88]: <MESH_EVENT_PARENT_CONNECTED>
I (9642) [node_main, 110]: MESH_EVENT_TODS_STATE toDS=1
I (9645) [node_main, 97]: <MESH_EVENT_ROOT_ADDRESS>root address:c4dee21e053d

More Information.

Using a mesh network system and have a custom software based on the ip_internal_network example (with a lot of application additions).
Root node is fixed!

ESP32-WROVER 16MB flash / 4MB ram
running on both cores
wifi module assigned to core 1

I also tried to modify the original example project with same changes we have in our project (sdkconfig):

  • enable SPIRAM
  • dual core
  • wifi on core 1
  • other tuning of SPIRAM usage
    None of the changes in the example project caused similar behavior. The example project connect to my projects root 100% of the startups.

I have not tested more recent version of esp-idf since it breaks my project.

@ClaesIvarsson ClaesIvarsson added the Type: Bug bugs in IDF label Sep 5, 2023
@espressif-bot espressif-bot added the Status: Opened Issue is new label Sep 5, 2023
@github-actions github-actions bot changed the title wifi mesh node do not find parent found (or root) wifi mesh node do not find parent found (or root) (IDFGH-11009) Sep 5, 2023
@ClaesIvarsson
Copy link
Author

ClaesIvarsson commented Sep 5, 2023

Now tested with esp-idf 4.4.5. Same result as for 4.4.3
I also tried to update to 5.0.3 but I was not able to find a solution to migrate changes for the azure-iot-sdk-c port.

@ClaesIvarsson ClaesIvarsson changed the title wifi mesh node do not find parent found (or root) (IDFGH-11009) wifi mesh node do not find parent (or root) (IDFGH-11009) Sep 6, 2023
@ClaesIvarsson
Copy link
Author

Maybe related to #6518 ?

@zhangyanjiaoesp
Copy link
Collaborator

zhangyanjiaoesp commented Sep 12, 2023

@ClaesIvarsson
Please confirm whether the following information is correct.

  1. In your own mesh network, the root node is fixed, and node searches for root/parent always fail.
  2. When you use the same fixed root, and other nodes use the ip_internal_network code combined with your changes, the node can connect well.
  3. when you set the fixed root, do you call esp_mesh_fix_root(true) for all nodes?
  4. what is the argument of the esp_mesh_set_self_organized(enable, select_parent) function in your code?

@espressif-bot espressif-bot added Status: In Progress Work is in progress and removed Status: Opened Issue is new labels Sep 12, 2023
@ClaesIvarsson
Copy link
Author

  1. after reset of node it fails to find network 50% of the times
  2. ip_internal_network example combined with sdkconfig from my project connects well
  3. yes, all nodes except root (root gets esp_mesh_set_type(MESH_ROOT)
  4. I do not call esp_mesh_set_self_organized as documentation says it is enabled by default

@zhangyanjiaoesp
Copy link
Collaborator

@ClaesIvarsson Can you provide a demo for us to reproduce this issue?

@ClaesIvarsson
Copy link
Author

@ClaesIvarsson Can you provide a demo for us to reproduce this issue?

Yes, I can demonstrate the problem, Maybe Friday at 10 CET is ok ? Please let me know and I will provide an invitation.

@zhangyanjiaoesp
Copy link
Collaborator

@ClaesIvarsson Can you provide a demo for us to reproduce this issue?

Yes, I can demonstrate the problem, Maybe Friday at 10 CET is ok ? Please let me know and I will provide an invitation.

Do you mean that you want to demonstrate your problem online? Actually, we hope you can provide a reproduce code so that we can test and debug it locally.

@ClaesIvarsson
Copy link
Author

Well,

@ClaesIvarsson Can you provide a demo for us to reproduce this issue?

Yes, I can demonstrate the problem, Maybe Friday at 10 CET is ok ? Please let me know and I will provide an invitation.

Do you mean that you want to demonstrate your problem online? Actually, we hope you can provide a reproduce code so that we can test and debug it locally.

Maybe a misunderstanding from my side!

I am not sure I am allowed to share the existing code, will have to check how in such case.
To strip it down to something shareable would be rather large effort that I am not able to fit in my schedule right now. I will check how to proceed with my manager.

@ClaesIvarsson
Copy link
Author

Is there anyway to get more debug information out from the mesh module ? Is there any information the provided logs that gives any hints on the problem at hand ?
Any explanation why the scanning of network doesn't restart after failure of finding a network ?
It looks like there is a full scan of WiFi networks at startup but that only happens once ... is this true ?

@zhangyanjiaoesp
Copy link
Collaborator

@ClaesIvarsson

When the mesh starts, it will scan all channels to find the channel and network,

I (9045) mesh: [S1], c4:dd:57:b8:19:01, channel:1, rssi:-73
I (9046) mesh: [S3], fa:8f:ca:92:ea:09, channel:1, rssi:-92
I (9046) mesh: [S5], 30:c3:d9:aa:6c:bb, channel:1, rssi:-94
I (9048) mesh: [S7], c4:dd:57:b8:19:65, channel:1, rssi:-69
I (9051) mesh: find root:ESPM_1E053C, root_cap:5(max:512), new channel:1, old channel:0
I (9055) mesh: [S10], 84:1e:a3:47:84:b6, channel:1, rssi:-94
I (9057) mesh: [S16], d2:21:f9:b8:60:46, channel:6, rssi:-92
I (9060) mesh: [FIND][ch:0]AP:29, otherID:0, MAP:2, idle:0, candidate:1, root:1[4d:08:40:2c:75:09][FIXED-ROOT]
I (9065) mesh: [FIND:1]find a network, channel:1, cfg<channel:0, router:, 4d:08:40:2c:75:09>

This scan result shows that the device has found the fixed root device, the channel is 1. Then the device will scan again on channel 1 and try to connect to the fixed-root(ESPM_1E053C). But in your failed log, it shows the device can't find any device to be connected, the MAP, candidate and root all are 0:

I (9336) mesh: [SCAN][ch:1]AP:16, other(ID:0, RD:1), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:0,i:0][4d:08:40:2c:75:09][FIXED-ROOT]<>
I (9338) mesh: [FAIL][1]root:0, fail:1, normal:0

The device then continuously scans on channel 1 until the specified number of failures is reached. In your case, it's 20 times. After this, the device will start to scan all channels to find network, and still find the fixed-root (ESPM_1E053C) on channel.

I (14021) mesh: <MESH_NWK_PARENT_SELECTION>channelcfg:0 is allowed to switch, do a full channel scan
W (14026) wifi:age timer already deinit
I (14030) wifi:mode : sta (44:17:93:db:d9:e4)
I (14044) [node_main, 119]: mesh network has a root: 0
I (14045) mesh: <MESH_NWK_LOOK_FOR_NETWORK>need_scan:0x3, need_scan_router:0x0, look_for_nwk_count:1
I (17416) mesh: find root:ESPM_1E053C, root_cap:5(max:512), new channel:1, old channel:0
I (17417) mesh: [S3], 84:1e:a3:47:84:b6, channel:1, rssi:-90
I (17418) mesh: [S7], fc:f5:c4:3c:fc:61, channel:1, rssi:-80
I (17427) mesh: [S16], d2:21:f9:b8:60:46, channel:6, rssi:-91
I (17433) mesh: [FIND][ch:0]AP:27, otherID:0, MAP:1, idle:0, candidate:0, root:1[4d:08:40:2c:75:09][FIXED-ROOT]
I (17438) mesh: [FIND:1]find a network, channel:1, cfg<channel:0, router:, 4d:08:40:2c:75:09>

Then try to connect on channel 1, repeat the process above.

At the end of the first scan after switching to channel 1, the device should be able to scan the fixed-root, and then select to connect, just like in your success log.

W (9215) wifi:adjust channel:1, secondary channel offset:1(40U)
I (9218) wifi:Total power save buffer number: 4
I (9474) mesh: [SCAN][ch:1]AP:8, other(ID:0, RD:0), MAP:1, idle:0, candidate:1, root:1, topMAP:0[c:0,i:0][4d:08:40:2c:75:09][FIXED-ROOT]<>
I (9476) mesh: 7369[selection]try rssi_threshold:-78, backoff times:0, max:5<-78,-82,-85>
I (9480) mesh: [DONE]connect to parent:ESPM_1E053C, channel:1, rssi:-38, c4:de:e2:1e:05:3d[layer:1, assoc:4], my_vote_num:0/voter_num:0, rc[00:00:00:00:00:00/-120/0]

It's strange that fixed-root cannot be scanned on channel 1 at the time of failure. Since you said the test with the IDF official example was successful and that it failed with your own code, so suggest you to check your own code.

@ClaesIvarsson
Copy link
Author

I have spent a lot of time trying to find a reason for this issue to happen.
My thesis has been that there is some kind of memory corruption, but I have not been able to find a pattern to follow.

I can see that some builds are more prune to fail than others. I have used git bisect function to try to find a commit that cause the issue. But my conclusion is that this happens on almost versions of my software since I introduced the NAT feature based on the ip_internal_network. If present, mainly depends on the number of tests performed.
I created a script that restart the node and waits for the event MESH_EVENT_PARENT_CONNECTED (with a timeout of 90s) and repeat this for N time (I ran my tests with 20 repetitions).

@yanmartins
Copy link

yanmartins commented Oct 19, 2023

I have spent a lot of time trying to find a reason for this issue to happen. My thesis has been that there is some kind of memory corruption, but I have not been able to find a pattern to follow.

I can see that some builds are more prune to fail than others. I have used git bisect function to try to find a commit that cause the issue. But my conclusion is that this happens on almost versions of my software since I introduced the NAT feature based on the ip_internal_network. If present, mainly depends on the number of tests performed. I created a script that restart the node and waits for the event MESH_EVENT_PARENT_CONNECTED (with a timeout of 90s) and repeat this for N time (I ran my tests with 20 repetitions).

I noticed this problem when using NAT with this same example (ip_internal_network). Yesterday I opened an issue, maybe it's the same problem: #12431

@ClaesIvarsson
Copy link
Author

I noticed this problem when using NAT with this same example (ip_internal_network). Yesterday I opened an issue, maybe it's the same problem: #12431

We tried the example with minor adjustments used as node in the mesh network (not root) and didn't see this behavior at that point, but maybe we didn't test it enough times. I will try to do this test again with more repetitions.

@ClaesIvarsson
Copy link
Author

I believe I have found the root cause for this problem!

The mesh password variable stored in the cfg structure was not null terminated when my password was copied. It rely on the fact that the cfg.mesh_ap.password buffer is initialized with zero value.
After adding a command to set the complete cfg structure to zero, I got rid of this problem.
I think it would be good if all mesh network examples are updated with this.

    mesh_cfg_t cfg; /*  = MESH_INIT_CONFIG_DEFAULT(); */
    memset(&cfg, 0, sizeof(mesh_cfg_t);
    cfg.crypto_funcs = &g_wifi_default_mesh_crypto_funcs;

or at least make sure that
buffers in the structure without length information is also terminated properly

cfg.router.password[sizeof(CONFIG_MESH_ROUTER_PASSWD)] = '\0';
cfg.mesh_ap.password[sizeof(CONFIG_MESH_AP_PASSWD)] = '\0';

@zhangyanjiaoesp
Copy link
Collaborator

I believe I have found the root cause for this problem!

The mesh password variable stored in the cfg structure was not null terminated when my password was copied. It rely on the fact that the cfg.mesh_ap.password buffer is initialized with zero value. After adding a command to set the complete cfg structure to zero, I got rid of this problem. I think it would be good if all mesh network examples are updated with this.

    mesh_cfg_t cfg; /*  = MESH_INIT_CONFIG_DEFAULT(); */
    memset(&cfg, 0, sizeof(mesh_cfg_t);
    cfg.crypto_funcs = &g_wifi_default_mesh_crypto_funcs;

or at least make sure that buffers in the structure without length information is also terminated properly

cfg.router.password[sizeof(CONFIG_MESH_ROUTER_PASSWD)] = '\0';
cfg.mesh_ap.password[sizeof(CONFIG_MESH_AP_PASSWD)] = '\0';

@ClaesIvarsson Thanks for your update, we will consider to update it.

@zhangyanjiaoesp
Copy link
Collaborator

@ClaesIvarsson
Why the IDF official example can work well ? Is there anything special about your password?

@ClaesIvarsson
Copy link
Author

ClaesIvarsson commented Oct 24, 2023

@ClaesIvarsson Why the IDF official example can work well ? Is there anything special about your password?

I think it is luck or coincidence. The cfg variable is located on the stack which is uninitialized. My task is located in external ram, maybe this is the major difference.

@espressif-bot espressif-bot added Status: Opened Issue is new and removed Status: In Progress Work is in progress labels Mar 4, 2024
@MaxwellAlan
Copy link
Collaborator

MaxwellAlan commented Mar 5, 2024

@ClaesIvarsson @zhangyanjiaoesp

Thanks for the reporting and debuging.

We will close this issue, plz feel free to reopen if still have this issue.

Thanks.

@espressif-bot espressif-bot added Status: Done Issue is done internally and removed Status: Opened Issue is new labels Mar 5, 2024
@espressif-bot espressif-bot added the Resolution: Done Issue is done internally label Mar 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Resolution: Done Issue is done internally Status: Done Issue is done internally Type: Bug bugs in IDF
Projects
None yet
Development

No branches or pull requests

5 participants