[mesh] v4.4 - LTE mesh poor performances (IDFGH-9810) #11144

Closed
3 tasks done
KonssnoK opened this issue Apr 6, 2023 · 30 comments
Labels: Resolution: Won't Do (this will not be worked on); Status: Done (issue is done internally)

Comments

@KonssnoK
Contributor

KonssnoK commented Apr 6, 2023

Answers checklist.

  • I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

General issue report

Hello there,
now that we have partial fixes for
#9955
#11006

we enabled modem handling in our project.

We already have one beta installation that works exclusively on LTE connection -> Fixed root mesh with LTE and no WIFI

Apparently, there are some issues:

  • it's impossible to perform a normal OTA update to devices on the network; it always fails. I think this is because the update rate is very slow (3 KB/s).
  • devices are going offline persistently until reset
  • devices are reporting messages multiple times.

Question:

  • do you have the possibility to create a setup with an esp32s3 connected to a modem? (we have a UART connection)
  • if not, should we organize some hardware for you?

Meanwhile, we will set up some LTE installations in our offices to monitor them and to give you logs.

@espressif-bot espressif-bot added the Status: Opened Issue is new label Apr 6, 2023
@github-actions github-actions bot changed the title [mesh] v4.4 - LTE mesh poor performances [mesh] v4.4 - LTE mesh poor performances (IDFGH-9810) Apr 6, 2023
@KonssnoK
Contributor Author

KonssnoK commented Apr 9, 2023

Further investigation traces the disconnection events to multiple MQTT errors.
We are now starting to investigate the MQTT library by forking it.

Here is an example:

[1] MQTT_EVENT_ERROR 32792 78 0 1 0 0
[694] MQTT_EVENT_ERROR 32774 0 0 1 0 0
[19] MQTT_EVENT_ERROR 32769 0 0 1 0 119
[33] MQTT_EVENT_ERROR 32769 0 0 1 0 11
[12] MQTT_EVENT_ERROR 32794 80 0 1 0 0
[4] MQTT_EVENT_ERROR 32794 29312 0 1 0 0
[1] MQTT_EVENT_ERROR 32794 76 0 1 0 0
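
For readers trying to look these codes up, here is a minimal sketch. It assumes the first number on each line is an esp_err_t; the column order looks like the fields of esp_mqtt_error_codes_t, but that is a guess, since the log format is the poster's own. The helper converts the decimal value to hex and shows its offset from 0x8000, which is assumed here to be the ESP-TLS error base (ESP_ERR_ESP_TLS_BASE). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed to match ESP_ERR_ESP_TLS_BASE; check the headers of your IDF version. */
#define TLS_ERR_BASE 0x8000

/* Offset of a code above the assumed base, or -1 if the code is below it. */
int tls_err_offset(int32_t code)
{
    return (code >= TLS_ERR_BASE) ? (int)(code - TLS_ERR_BASE) : -1;
}

/* Write the code as hex, plus its offset from the base when it has one.
 * Returns the number of characters snprintf would have written. */
int format_err_code(int32_t code, char *out, size_t out_len)
{
    assert(out != NULL && out_len > 0);
    int off = tls_err_offset(code);
    if (off < 0) {
        return snprintf(out, out_len, "0x%04X", (unsigned)code);
    }
    return snprintf(out, out_len, "0x%04X (base+0x%02X)", (unsigned)code, (unsigned)off);
}
```

For the first line of the log, 32792 comes out as 0x8018, that is, base + 0x18; 32774 is 0x8006 and 32769 is 0x8001. The hex form is what the ESP-IDF error headers use, so it is easier to search for.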

We'll provide more info later on... not sure if we'll do PRs, though.

The other issues are still relevant to mesh:

  • it's impossible to perform a normal OTA update to devices on the network; it always fails. I think this is because the update rate is very slow (3 KB/s).
  • devices are reporting messages multiple times.

@zhangyanjiaoesp
Collaborator

@KonssnoK

> Question:
>
>   • do you have the possibility to create a setup with an esp32s3 connected to a modem? (we have a UART connection)
>   • if not, should we organize some hardware for you?

It would be better if you could organize some hardware for us; this would ensure consistency between our tests and yours.

@espressif-bot espressif-bot added Status: In Progress Work is in progress and removed Status: Opened Issue is new labels Apr 10, 2023
@KonssnoK
Contributor Author

let me discuss with my colleagues, there might be some restrictions related to LTE and SIMs :)

@KonssnoK
Contributor Author

@zhangyanjiaoesp how would you feel about us setting up dedicated VPN access to hardware in Zurich (or Paris/Milan)?

We strongly believe that we would have problems with the SIM/modem in China.

We would give access to a laptop with a device with the modem attached (and maybe other devices in the same network)

@KonssnoK
Contributor Author

@zhangyanjiaoesp another question: does the mesh protocol implement retries internally?
We want to understand whether the duplicate packets we see could be caused by the mesh layer or by the MQTT layer.

@zhangyanjiaoesp
Collaborator

@KonssnoK

> how would you feel about us setting up dedicated VPN access to hardware in Zurich (or Paris/Milan)?

We can give it a try.

> does the mesh protocol implement retries internally?

When you call esp_mesh_send() to send P2P mesh data, the mesh layer retries internally to ensure reliable transmission.

@KonssnoK
Contributor Author

> When you call esp_mesh_send() to send P2P mesh data, the mesh layer retries internally to ensure reliable transmission.

Sorry, I didn't understand. If I call esp_mesh_send with type P2P, are there internal retries?

@zhangyanjiaoesp
Collaborator

> > When you call esp_mesh_send() to send P2P mesh data, the mesh layer retries internally to ensure reliable transmission.
>
> Sorry, I didn't understand. If I call esp_mesh_send with type P2P, are there internal retries?

yes

@KonssnoK
Contributor Author

Hello @zhangyanjiaoesp,
I have a question that may help solve another issue we are currently facing with the mesh layer.

Currently the devices sometimes get stuck in the following condition:
[image]
which, after quite some work, turned out to happen because the tcpip task is currently doing
[image]

coming from this code (which should be the same as in the default example):

static esp_err_t mesh_driver_transmit_from_node_sta(void* h, void* buffer, size_t len)
{
    mesh_data_t data;
    // len is a size_t, so cast it to match the %u format specifier
    ESP_LOGD(TAG, "Sending to root, dest addr: " MACSTR ", size: %u", MAC2STR((uint8_t*)buffer), (unsigned)len);
    data.data = buffer;
    data.size = len;
    data.proto = MESH_PROTO_AP; // Node's station transmits data to root's AP
    data.tos = MESH_TOS_P2P;
    esp_err_t err = esp_mesh_send(NULL, &data, MESH_DATA_TODS, NULL, 0);
    if (err != ESP_OK) {
        ESP_LOGE(TAG, "STA: Send err 0x%X %s", err, esp_err_to_name(err));
    }
    return err;
}

Questions:

  • why is esp_mesh_send waiting on a queue? I would expect it to add to a queue instead
  • could I add MESH_DATA_NONBLOCK to the call? Do you have any advice on how to handle non-blocking queues?

Meanwhile, I will check whether this small change solves the stuck issue I'm seeing.

@KonssnoK
Contributor Author

KonssnoK commented Apr 25, 2023

Adding MESH_DATA_NONBLOCK did not help.
I'm now also forcing a 2-second timeout with

ESP_ERROR_CHECK(esp_mesh_send_block_time(2000));

EDIT:
The combination of esp_mesh_send_block_time and MESH_DATA_NONBLOCK seems to solve the issue we are facing!

@KonssnoK
Contributor Author

@zhangyanjiaoesp could you also please explain the difference between
ESP_ERROR_CHECK(esp_mesh_send_block_time(2000));
and using
esp_err_t err = esp_mesh_send(NULL, &data, MESH_DATA_TODS, NULL, 0);
or
esp_err_t err = esp_mesh_send(NULL, &data, MESH_DATA_TODS | MESH_DATA_NONBLOCK , NULL, 0);

Is MESH_DATA_NONBLOCK necessary for esp_mesh_send to be non-blocking if i set esp_mesh_send_block_time?
Explained in another way: if i do esp_mesh_send_block_time and then i call esp_mesh_send WITHOUT MESH_DATA_NONBLOCK , will it block forever OR will it block for 2 seconds??

Thanks!

@KonssnoK
Contributor Author

KonssnoK commented Apr 25, 2023

@zhangyanjiaoesp, we also sometimes see Wi-Fi heap corruption.
What is the best procedure for this? Should we open a new issue? Sadly, it is quite difficult to reproduce but quite widespread among devices (multiple crashes every day on multiple devices out of the 150 we have in the pilot).

_rootless:1
W (11:16:04.404) monitor: 3W3P:: rule deactivated       
I (1298619) wifi:I (11:16:07.733) mesh_main: <MESH_EVENT_FIND_NETWORK>new channel:1, router BSSID:00:00:00:00:00:00
mode : sta (7c:df:a1:e2:a3:04) + softAP (7c:df:a1:e2:a3:05)
W (1298625) wifi:<MESH AP>adjust channel:1, secondary channel offset:1(40U)
I (1298637) wifi:Total power save buffer number: 8      
W (11:16:10.404) monitor: 3W3P:: rule activated
W (11:16:11.404) monitor: 3W3P:: rule deactivated       
I (1304711) wifi:new:<1,1>, old:<1,1>, ap:<1,1>, sta:<1,0>, prof:1
I (1305695) wifi:state: init -> auth (b0)
CORRUPT HEAP: Bad head at 0x3dea961c. Expected 0xabba1234 got 0x3de00014

assert failed: multi_heap_free multi_heap_poisoning.c:259 (head != NULL)
Setting breakpoint at 0x403761c5 and returning...       
xtensa-esp32s3-elf-addr2line -pfiaC -e c:\src\v3\firmware\build\firmware.elf 0x403761c5: [WinError 2] The system cannot find the file specified

@zhangyanjiaoesp
Collaborator

> @zhangyanjiaoesp, we also sometimes see Wi-Fi heap corruption. What is the best procedure for this? Should we open a new issue?

@KonssnoK I suggest creating a new issue to track the heap corruption.

@zhangyanjiaoesp
Collaborator

@KonssnoK
When you set data.tos = MESH_TOS_P2P, esp_mesh_send() blocks: it sends the mesh data to the Wi-Fi task and then waits on a queue to receive the TX status.
That is to say, if you set data.tos = MESH_TOS_P2P, adding MESH_DATA_NONBLOCK to the flags has no effect. In that case, you need to call esp_mesh_send_block_time() to specify a blocking time.
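
One way to read this reply, written as a tiny host-side model rather than ESP-IDF code. The function name and the WAIT_FOREVER constant are hypothetical, and the unbounded default (when esp_mesh_send_block_time() was never called) is an assumption inferred from the reply, not something confirmed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for an unbounded wait (assumption: this is the default). */
#define WAIT_FOREVER UINT32_MAX

/* Model of a send with tos = MESH_TOS_P2P: the longest time, in ms, that the
 * caller may wait for the TX status.
 *   nonblock_flag   whether MESH_DATA_NONBLOCK was passed in the flags
 *   block_time_set  whether esp_mesh_send_block_time() was called beforehand
 *   block_time_ms   the value that was passed to it */
uint32_t p2p_send_max_wait_ms(bool nonblock_flag, bool block_time_set,
                              uint32_t block_time_ms)
{
    (void)nonblock_flag; /* per the reply, the flag has no effect for P2P data */
    assert(!block_time_set || block_time_ms != WAIT_FOREVER);
    return block_time_set ? block_time_ms : WAIT_FOREVER;
}
```

On this reading, the 2-second bound in the earlier experiment comes from esp_mesh_send_block_time(2000), with or without MESH_DATA_NONBLOCK in the flags.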

@KonssnoK
Contributor Author

As requested, I opened #11713 to track Wi-Fi-related crashes.
@zhangyanjiaoesp @mhdong
Meanwhile, we continue to investigate the slow transfer rate with our LTE SIM provider.

@KonssnoK
Contributor Author

We are still investigating with the network providers and the modem manufacturer. For now this does not seem related to the mesh per se. I will close the issue for now and reopen in case we find that both modem and network are ok.

@espressif-bot espressif-bot added Status: Done Issue is done internally Resolution: Won't Do This will not be worked on and removed Status: In Progress Work is in progress labels Dec 19, 2023
@kieennt13

kieennt13 commented Nov 5, 2024

Hello @KonssnoK ,
I have seen the ip_internal_network example in your forked esp-idf repository (branch lc/fixed_root_53) and have tried running it.

When Wi-Fi is available, everything works as expected.

However, when I disable Wi-Fi, although both the root and node switch to fixed root mode, their behavior confuses me. Below is the log of the root after switching to fixed root mode:

I (156372) mesh_main: <MESH_EVENT_PARENT_DISCONNECTED>reason:201 ERROR
W (156372) ping: Ignore counters while transitioning. error 201
W (157052) mesh_hand: Triggering FIXED ROOT handover
I (157052) mesh: <MESH_NWK_PARENT_DISCONNECTED>already disconnected, ignore it
I (157052) mesh: [IO]disable self-organizing<reconnect>
I (157072) mesh: <nvs>write layer:0
I (157082) mesh: <nvs>write assoc:0
I (157112) mesh: [CONFIG]connect to router:XXLL, 00:00:00:00:00:00
W (157122) ping: From 8.8.8.8 icmp_seq=52 timeout
E (157122) ping_sock: send error=0
I (157162) mesh: <mesh_connect_to_router,540>parent is set<stop reconnect>g_is_wifi_connecting:1, g_is_wifi_disconnecting:0, g_mesh_stop_reconnection:0
I (157162) mesh: [wifi]disconnected reason:106(scan fail), continuous:6/max:12, root, vote(,stopped)<><>I (157162) wifi:station: b0:a7:32:17:26:40 leave, AID = 1, bss_flags is 134243, bss:0x3ffba9ec
I (157182) wifi:new:<13,0>, old:<13,2>, ap:<13,2>, sta:<13,0>, prof:13
I (157192) wifi:mode : sta (cc:7b:5c:27:09:70)
I (157162) mesh_main: <MESH_EVENT_PARENT_DISCONNECTED>reason:106 ERROR
W (157222) mesh_hand: Stop mesh auto-reconnect
I (157222) mesh: [IO]disable self-organizing<stop reconnect>
I (157232) mesh: [scan]new scanning time:300ms, beacon interval:100ms
I (157232) mesh_main: <MESH_EVENT_CHILD_DISCONNECTED>aid:1, b0:a7:32:17:26:40
I (157242) mesh: [IO]disable self-organizing<stop reconnect>
I (159432) mesh: [wifi]disconnected reason:201(), continuous:0/max:12, root, vote(,stopped)<><>
I (159432) mesh_main: <MESH_EVENT_PARENT_DISCONNECTED>reason:201 ERROR
I (159432) mesh: <mesh_nwk_task_main,4587>parent is set<stop reconnect>g_is_wifi_connecting:0, g_is_wifi_disconnecting:0, g_mesh_stop_reconnection:1
I (159452) mesh_main: <MESH_EVENT_STOP_RECONNECTION>

I am not sure if it is actually participating in the mesh network or acting as the root. Could this be because my root is currently not connected to the SIM module?

And here is the log of the node:

I (156489) mesh_main: <MESH_EVENT_PARENT_DISCONNECTED>reason:2 ERROR
W (156489) ping: Ignore counters while transitioning. error 2
I (156529) mesh_main: <MESH_EVENT_NETWORK_STATE>is_rootless:1
I (156529) mesh: 5206[healing]looking for a new parent, [L:2]try layer:1[revote][scan]
W (157139) wifi:scan number is 0
I (157139) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (157139) mesh: [FAIL][1]root:0, fail:1, normal:0, <pre>backoff:0

W (157559) ping: From 8.8.8.8 icmp_seq=12 timeout
E (157559) ping_sock: send error=0
W (157749) wifi:scan number is 0
I (157749) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (157749) mesh: [FAIL][2]root:0, fail:2, normal:0, <pre>backoff:0

W (158359) wifi:scan number is 0
I (158359) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (158359) mesh: [FAIL][3]root:0, fail:3, normal:0, <pre>backoff:0

W (158969) wifi:scan number is 0
I (158969) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (158979) mesh: [FAIL][4]root:0, fail:4, normal:0, <pre>backoff:0

W (159579) wifi:scan number is 0
I (159579) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (159589) mesh: [FAIL][5]root:0, fail:5, normal:0, <pre>backoff:0

W (160189) wifi:scan number is 0
I (160189) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (160199) mesh: [FAIL][6]root:0, fail:6, normal:0, <pre>backoff:0

W (160559) ping: From 8.8.8.8 icmp_seq=13 timeout
E (160559) ping_sock: send error=0
W (160799) wifi:scan number is 0
I (160799) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (160809) mesh: [FAIL][7]root:0, fail:7, normal:0, <pre>backoff:0

W (161409) wifi:scan number is 0
I (161409) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (161419) mesh: [FAIL][8]root:0, fail:8, normal:0, <pre>backoff:0

W (162019) wifi:scan number is 0
I (162029) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (162029) mesh: [FAIL][9]root:0, fail:9, normal:0, <pre>backoff:0

W (162639) wifi:scan number is 0
I (162639) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (162639) mesh: [FAIL][10]root:0, fail:10, normal:0, <pre>backoff:0

W (163249) wifi:scan number is 0
I (163249) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (163249) mesh: [FAIL][11]root:0, fail:11, normal:0, <pre>backoff:0

W (163559) ping: From 8.8.8.8 icmp_seq=14 timeout
E (163559) ping_sock: send error=0
W (163859) wifi:scan number is 0
I (163859) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (163859) mesh: [FAIL][12]root:0, fail:12, normal:0, <pre>backoff:0

W (164469) wifi:scan number is 0
I (164469) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (164469) mesh: [FAIL][13]root:0, fail:13, normal:0, <pre>backoff:0

I (165079) mesh: [SCAN][ch:13]AP:1, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (165079) mesh: [FAIL][14]root:0, fail:14, normal:0, <pre>backoff:0

I (165689) mesh: [SCAN][ch:13]AP:1, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (165689) mesh: [FAIL][15]root:0, fail:15, normal:0, <pre>backoff:0

W (166289) wifi:scan number is 0
I (166289) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (166299) mesh: [FAIL][16]root:0, fail:16, normal:0, <pre>backoff:0

W (166559) ping: From 8.8.8.8 icmp_seq=15 timeout
E (166559) ping_sock: send error=0
W (166909) wifi:scan number is 0
I (166909) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (166909) mesh: [FAIL][17]root:0, fail:17, normal:0, <pre>backoff:0

I (167519) mesh: [SCAN][ch:13]AP:1, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (167519) mesh: [FAIL][18]root:0, fail:18, normal:0, <pre>backoff:0

W (168119) wifi:scan number is 0
I (168119) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (168129) mesh: [FAIL][19]root:0, fail:19, normal:0, <pre>backoff:0

W (168729) wifi:scan number is 0
I (168739) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (168739) mesh: [FAIL][20]root:0, fail:20, normal:0, <pre>backoff:0

I (169349) mesh: [SCAN][ch:13]AP:1, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (169349) mesh: [FAIL][21]root:0, fail:21, normal:0, <pre>backoff:0

W (169559) ping: From 8.8.8.8 icmp_seq=16 timeout
E (169559) ping_sock: send error=0
I (169949) mesh: [SCAN][ch:13]AP:1, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (169949) mesh: [FAIL][22]root:0, fail:22, normal:0, <pre>backoff:0

W (170559) wifi:scan number is 0
I (170559) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (170569) mesh: [FAIL][23]root:0, fail:23, normal:0, <pre>backoff:0

W (171169) wifi:scan number is 0
I (171169) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (171179) mesh: [FAIL][24]root:0, fail:24, normal:0, <pre>backoff:0

W (171779) wifi:scan number is 0
I (171779) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (171789) mesh: [FAIL][25]root:0, fail:25, normal:0, <pre>backoff:0

W (172389) wifi:scan number is 0
I (172389) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (172399) mesh: [FAIL][26]root:0, fail:26, normal:0, <pre>backoff:0

W (172559) ping: From 8.8.8.8 icmp_seq=17 timeout
E (172559) ping_sock: send error=0
W (172999) wifi:scan number is 0
I (172999) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (173009) mesh: [FAIL][27]root:0, fail:27, normal:0, <pre>backoff:0

W (173619) wifi:scan number is 0
I (173619) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (173619) mesh: [FAIL][28]root:0, fail:28, normal:0, <pre>backoff:0

W (174229) wifi:scan number is 0
I (174229) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (174229) mesh: [FAIL][29]root:0, fail:29, normal:0, <pre>backoff:0

W (174839) wifi:scan number is 0
I (174839) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (174839) mesh: [FAIL][30]root:0, fail:30, normal:0, <pre>backoff:0

W (175449) wifi:scan number is 0
I (175449) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (175449) mesh: [FAIL][31]root:0, fail:31, normal:0, <pre>backoff:0

W (175559) ping: From 8.8.8.8 icmp_seq=18 timeout
E (175559) ping_sock: send error=0
W (176059) wifi:scan number is 0
I (176059) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (176059) mesh: [FAIL][32]root:0, fail:32, normal:0, <pre>backoff:0

W (176669) wifi:scan number is 0
I (176669) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (176669) mesh: [FAIL][33]root:0, fail:33, normal:0, <pre>backoff:0

W (177279) wifi:scan number is 0
I (177279) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (177279) mesh: [FAIL][34]root:0, fail:34, normal:0, <pre>backoff:0

W (177889) wifi:scan number is 0
I (177889) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (177889) mesh: [FAIL][35]root:0, fail:35, normal:0, <pre>backoff:0

W (178499) wifi:scan number is 0
I (178499) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (178499) mesh: [FAIL][36]root:0, fail:36, normal:0, <pre>backoff:0

W (178559) ping: From 8.8.8.8 icmp_seq=19 timeout
E (178559) ping_sock: send error=0
W (179109) wifi:scan number is 0
I (179109) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (179119) mesh: [FAIL][37]root:0, fail:37, normal:0, <pre>backoff:0

W (179719) wifi:scan number is 0
I (179719) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (179729) mesh: [FAIL][38]root:0, fail:38, normal:0, <pre>backoff:0

W (180329) wifi:scan number is 0
I (180329) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (180339) mesh: [FAIL][39]root:0, fail:39, normal:0, <pre>backoff:0

W (180939) wifi:scan number is 0
I (180939) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
I (180949) mesh: [FAIL][40]root:0, fail:40, normal:0, <pre>backoff:0

W (181549) wifi:scan number is 0
I (181549) mesh: [SCAN][ch:13]AP:0, other(ID:0, RD:0), MAP:0, idle:0, candidate:0, root:0, topMAP:0[c:2,i:1][ba:08:8c:94:20:17]<>
W (181559) ping: From 8.8.8.8 icmp_seq=20 timeout
W (181559) mesh_hand: Triggering FIXED ROOT handover
I (181559) mesh: [FAIL][41]root:0, fail:41, normal:0, <pre>backoff:0

E (181569) ping_sock: send error=0
I (181579) mesh: [IO]enable self-organizing, search parent<reconnect>
I (181589) mesh: <MESH_NWK_PARENT_DISCONNECTED>already disconnected, ignore it
------------root: 0
I (181589) mesh: <MESH_NWK_SCAN_REQ_PASSIVE>unexpected, stop previous scan by parent selection
I (181589) mesh: <WIFI_EVENT_SCAN_DONE>status:fail, num:0, id:174
I (181599) mesh: [IO]disable self-organizing<stop reconnect>
------------root after: 0
I (181609) mesh: <MESH_NWK_SCAN_DONE>unexpected, flush scan results, request a new scan by parent selection
I (181619) mesh_main: <MESH_EVENT_ROOT_FIXED>fixed
W (181629) mesh_hand: Triggering FIXED ROOT handover
I (181639) mesh: <MESH_NWK_PARENT_DISCONNECTED>already disconnected, ignore it
I (182229) mesh: 4052<MESH_NWK_SCAN_DONE>self-organized is disabled, flush scan results.
W (184579) ping: From 8.8.8.8 icmp_seq=21 timeout
E (184579) ping_sock: send error=0
W (187579) ping: From 8.8.8.8 icmp_seq=22 timeout
E (187579) ping_sock: send error=0
W (190579) ping: From 8.8.8.8 icmp_seq=23 timeout
E (190579) ping_sock: send error=0
W (193579) ping: From 8.8.8.8 icmp_seq=24 timeout
E (193579) ping_sock: send error=0
W (196579) ping: From 8.8.8.8 icmp_seq=25 timeout
E (196579) ping_sock: send error=0
W (199579) ping: From 8.8.8.8 icmp_seq=26 timeout
E (199579) ping_sock: send error=0
W (202579) ping: From 8.8.8.8 icmp_seq=27 timeout
E (202579) ping_sock: send error=0
W (205579) ping: From 8.8.8.8 icmp_seq=28 timeout
E (205579) ping_sock: send error=0
W (208579) ping: From 8.8.8.8 icmp_seq=29 timeout
E (208579) ping_sock: send error=0
W (211579) ping: From 8.8.8.8 icmp_seq=30 timeout
E (211579) ping_sock: send error=0
I (211619) mesh_hand: Checking wifi connectivity (count 0)[LOCAL]
I (214459) mesh: 4016<scan>self-organized is disabled, users shall read out scan results by themselves.
I (214459) mesh_main: <MESH_EVENT_SCAN_DONE>number:9
W (214459) wifi:Haven't to connect to a suitable AP now!
W (214469) mesh_hand: scan_done: found 0, fixed_root 1, has_parent 0, forced 1
W (214469) mesh_hand: All nodes have bad connection, allow any RSSI.
W (214579) ping: From 8.8.8.8 icmp_seq=31 timeout
E (214579) ping_sock: send error=0
W (217579) ping: From 8.8.8.8 icmp_seq=32 timeout
E (217579) ping_sock: send error=0
W (220579) ping: From 8.8.8.8 icmp_seq=33 timeout
E (220579) ping_sock: send error=0
W (223579) ping: From 8.8.8.8 icmp_seq=34 timeout
E (223579) ping_sock: send error=0
W (226579) ping: From 8.8.8.8 icmp_seq=35 timeout
E (226579) ping_sock: send error=0
W (229579) ping: From 8.8.8.8 icmp_seq=36 timeout
E (229579) ping_sock: send error=0
W (232579) ping: From 8.8.8.8 icmp_seq=37 timeout
E (232579) ping_sock: send error=0
W (235579) ping: From 8.8.8.8 icmp_seq=38 timeout
E (235579) ping_sock: send error=0
W (238579) ping: From 8.8.8.8 icmp_seq=39 timeout
E (238579) ping_sock: send error=0
W (241579) ping: From 8.8.8.8 icmp_seq=40 timeout
E (241579) ping_sock: send error=0
I (241619) mesh_hand: Checking wifi connectivity (count 0)[LOCAL]
I (244459) mesh: 4016<scan>self-organized is disabled, users shall read out scan results by themselves.
I (244459) mesh_main: <MESH_EVENT_SCAN_DONE>number:14
W (244459) wifi:Haven't to connect to a suitable AP now!
W (244469) mesh_hand: scan_done: found 0, fixed_root 1, has_parent 0, forced 1
W (244469) mesh_hand: All nodes have bad connection, allow any RSSI.

I tried checking the root by calling esp_mesh_get_type() in mesh_handover_disconnect_callback(), and it still returns a value of 1.

I would greatly appreciate it if you could help explain this behavior.

Also, what is the fake modem, actually?

@KonssnoK
Contributor Author

KonssnoK commented Nov 5, 2024

hi @kieennt13,
this branch is an export of part of our internal production code.
Since it was ported onto the Espressif example, we had to strip all the functionality related to the modem itself.
Your modem should create a PPP interface with a higher priority than the Wi-Fi one.
This way, once the modem receives an IP address, traffic will start being routed through the modem.
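
A minimal host-side sketch of the idea, not the actual esp_netif code: among the interfaces that are up, the one with the highest route priority carries the default route. In ESP-IDF the corresponding setting is the route_prio field of esp_netif_inherent_config_t; the struct, the function name, and the priority values mentioned below are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a network interface record. */
typedef struct {
    const char *name;
    int route_prio; /* higher value wins */
    bool is_up;     /* true once the interface has an IP address */
} iface_t;

/* Return the "up" interface with the highest route priority,
 * or NULL when no interface is up. */
const iface_t *pick_default_iface(const iface_t *ifaces, size_t count)
{
    assert(ifaces != NULL || count == 0);
    const iface_t *best = NULL;
    for (size_t i = 0; i < count; i++) {
        if (!ifaces[i].is_up) {
            continue; /* interfaces without an IP never carry the default route */
        }
        if (best == NULL || ifaces[i].route_prio > best->route_prio) {
            best = &ifaces[i];
        }
    }
    return best;
}
```

With a Wi-Fi station at priority 100 and a PPP interface at 200, traffic stays on Wi-Fi until PPP comes up and then moves to PPP. If the PPP priority were lower than the Wi-Fi one, traffic would keep using Wi-Fi for as long as the station interface stayed up.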

All "FAKE MODEM" related functions were written to simulate a modem in the example by connecting to a different Wi-Fi network, but they should be ignored and replaced with real modem-related functions. (I always have a real modem-equipped device in the network.)

The nodes should start scanning every 30 seconds; they won't connect unless they find a root device or another leaf device already connected to the fixed root.

@KonssnoK
Contributor Author

KonssnoK commented Nov 5, 2024

> We are still investigating with the network providers and the modem manufacturer. For now this does not seem related to the mesh per se. I will close the issue for now and reopen in case we find that both modem and network are ok.

And BTW, we are still having huge issues with our production modem, which turned out to be responsible for all the poor performance.

@kieennt13

kieennt13 commented Nov 6, 2024

Hi, @KonssnoK ,
Thanks for your reply!
I am working on a project where the LTE connection is established using a library rewritten from TinyGSM to be compatible with ESP-IDF. LTE is connected before initializing the mesh, and it has its own functions for handling data transmission and reception.

The mesh network I created is based on a combination of the manual_networking and internal_communication examples from ESP-IDF, so it doesn't need the mesh_netif library.
In the Wi-Fi-based mesh scenario, I don't even need to set up a netif, just call esp_netif_init() (or maybe not even that; I haven't tried removing it), because my program has already connected to Wi-Fi beforehand.

My question is, in a scenario where the mesh switches from Wi-Fi to LTE, how can I ensure that the PPP interface created by the modem has a higher priority than that of Wi-Fi?
And, since I do not have mesh_netifs_stop(), how can I disable the Wi-Fi netif?

Also, I would greatly appreciate it if you could provide logs for the root and nodes in the scenario when switching to fixed-root mode.
Thanks!

@KonssnoK
Contributor Author

KonssnoK commented Nov 6, 2024

Sorry, we do not have the capacity to provide support for our code :(

@kieennt13

So sad to hear that :(
Anyway, thanks for the support!

@kieennt13

@zhangyanjiaoesp could you please help me figure out this problem?

@kieennt13

Oh, at least, could you share the log of the root when it enters fixed-root mode? @KonssnoK
I want to see how the root behaves in that scenario.
Thanks!

@KonssnoK
Contributor Author

KonssnoK commented Nov 6, 2024

@kieennt13 here is an example log of a handover

W (15:42:21.436) mesh_hand: Triggering FIXED ROOT handover
I (183999) mesh: <MESH_NWK_PARENT_DISCONNECTED>already disconnected, ignore it
I (184000) mesh: [IO]disable self-organizing<reconnect>
I (184052) mesh: [CONFIG]connect to router:KI, 00:00:00:00:00:00
I (184060) mesh: <mesh_connect_to_router,541>parent is set<stop reconnect>g_is_wifi_connecting:1, g_is_wifi_disconnecting:0, g_mesh_stop_reconnection:0
I (184073) mesh: [wifi]disconnected reason:106(scan fail), continuous:5/max:12, root, vote(,stopped)<><>
I (184131) mesh: [IO]disable self-organizing<stop reconnect>
W (15:42:21.575) mesh_main: <MESH_EVENT_PARENT_DISCONNECTED>reason: 106 MESH_REASON_SCAN_FAIL
W (15:42:21.577) modem_comm: Signal: RSSI = -80.0/-75.0
W (15:42:21.583) mesh_hand: Stop mesh auto-reconnect
I (184156) mesh: [IO]disable self-organizing<stop reconnect>
W (15:42:21.600) modem_comm: LTE signal: RSRP -105.0/-100.0, DPL 105.0/110.0, SiNR 5.0/10.0
W (15:42:22.129) modem_comm: AT+CREG?: 2,5,"2F08","0071DF21",7
W (15:42:22.133) modem_comm: AT+CEREG?: 5,5,"2F08","0071DF21",7
W (15:42:22.137) modem_comm: AT+CGREG?: 2,0
W (15:42:22.144) modem_comm: LTE signal: RSRP -105.0/-100.0, DPL 105.0/110.0, SiNR 5.0/10.0
W (15:42:22.166) modem_comm: Selected operator: 0,0,"vodafone IT",7
W (15:42:22.171) modem_comm: Active LTE band(s): 0,0000000000000000000004
I (187013) mesh: [wifi]disconnected reason:201(), continuous:6/max:12, root, vote(,stopped)<><>
I (187027) mesh: <mesh_nwk_task_main,4597>parent is set<stop reconnect>g_is_wifi_connecting:0, g_is_wifi_disconnecting:0, g_mesh_stop_reconnection:1
I (15:42:24.478) mesh_main: <MESH_EVENT_STOP_RECONNECTION>
I (15:42:25.116) tmpr_propagation: Propagated local temperature
I (15:42:28.214) esp-netif_lwip-ppp: Connected
I (15:42:28.215) modem_comm: Modem connected to PPP server
W (15:42:28.216) network: GOT IP from ppp_sta
I (15:42:28.221) network: Network connected
I (15:42:28.225) network: Root node can access external IP network
W (190788) wifi:Haven't to connect to a suitable AP now!
I (190799) mesh: <MESH_NWK_ROOT_TODS_STATE>toDS:1
I (15:42:28.238) mesh_netif: Interface AP
I (15:42:28.246) mqtt_app: MQTT_EVENT_BEFORE_CONNECT
I (15:42:28.252) mqtt_app: Try [1] connection to mqtts://mqtt-stg.tiko.energy:8883
W (15:42:28.260) mesh_netif: DNS server [0]: 192.168.10.110
W (15:42:28.267) mesh_netif: DNS server [1]: 194.51.3.56
I (15:42:28.274) esp_netif_lwip: DHCP server started on interface WIFI_AP_DEF with IP: 10.0.0.1
W (190844) wifi:Haven't to connect to a suitable AP now!
I (15:42:28.286) mesh_netif: Clearing interface <sta>
I (15:42:28.293) mesh_netif: It was a wifi station, removing handlers
I (15:42:28.301) mesh_netif: Interface PPP
W (15:42:28.304) mesh_main: <MESH_EVENT_TODS_REACHABLE>reachable:1
I (15:42:31.575) stats: iteration_time_us=999987 Name=last% IDLE0=47 IDLE1=98 mqtt_task=51 
I (15:42:31.806) mqtt_app: MQTT_EVENT_CONNECTED 

@kieennt13

Thanks a lot for those logs! @KonssnoK
Is your modem_comm based on the pppos_client example in ESP-IDF or on something else?
I'd love to dive into it and see how to create a PPP interface for mesh.

@KonssnoK
Contributor Author

KonssnoK commented Nov 7, 2024

@kieennt13 we use esp-protocols as an add-on to support AT commands

@kieennt13

@KonssnoK sorry, I don't have much knowledge about the PPP interface and related topics. How do you create a PPP interface with just AT commands? In the pppos_client example, I see they use modem_dce:

    /* Configure the PPP netif */
    esp_modem_dce_config_t dce_config = ESP_MODEM_DCE_DEFAULT_CONFIG(CONFIG_EXAMPLE_MODEM_PPP_APN);
    esp_netif_config_t netif_ppp_config = ESP_NETIF_DEFAULT_PPP();
    esp_netif_t *esp_netif = esp_netif_new(&netif_ppp_config);
    assert(esp_netif);

to configure it.

@KonssnoK
Contributor Author

KonssnoK commented Nov 7, 2024

@kieennt13 ESP-IDF has a default configuration for a PPP netif interface.
We probably do the same as in that example; this part was not written by me, so I can't help.

@kieennt13

@KonssnoK
Is that so... :(
Btw, is there any log indicating that the new device has successfully become the root and that the node has connected to the root after the root starts the modem?
I know I am asking too much, but I would be very grateful if you could provide me with the full logs for both the node and the root.
Thanks!
