[ESP32S3] v4.4 WIFI crashes (IDFGH-10466) #11713
Comments
Crash n.1
|
Crash n.2
|
Crash n.3 (internal 5)
|
i think we have more, we'll continue to check. |
Crash n.4 (internal id 9)
|
Crash n.5 (internal id 10)
|
@KonssnoK all the posted backtraces indicate that the program aborts due to heap corruption. Please follow the recommendations listed in the docs to narrow down the root cause of the heap corruption: https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/heap_debug.html#heap-corruption-detection Even though the backtrace originates from a Wi-Fi related function, this typically only means that the heap corruption has already occurred and that the next allocation has run into the corrupted heap structure. Enabling heap debugging options can help detect the issue sooner and hopefully identify which part of the code is responsible for it. |
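To illustrate why the abort shows up far from the faulty write, here is a minimal host-side sketch of canary-based overrun detection in plain C. It is not ESP-IDF's implementation: `guarded_malloc`, `guarded_check`, `guarded_free` and both canary values are hypothetical names chosen for this example. The overrun itself goes unnoticed; it is only reported when something later inspects the block.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical canary values, chosen only for this sketch. */
#define HEAD_CANARY 0xA5A5A5A5u
#define TAIL_CANARY 0x5A5A5A5Au

/* Bookkeeping stored in front of every user buffer. */
typedef struct {
    uint32_t head;   /* canary placed before the user data */
    size_t   size;   /* size of the user data in bytes */
} guard_header_t;

/* Allocate size bytes surrounded by a head and a tail canary. */
void *guarded_malloc(size_t size)
{
    assert(size > 0);
    uint8_t *raw = malloc(sizeof(guard_header_t) + size + sizeof(uint32_t));
    if (raw == NULL) {
        return NULL;
    }
    guard_header_t hdr = { .head = HEAD_CANARY, .size = size };
    uint32_t tail = TAIL_CANARY;
    memcpy(raw, &hdr, sizeof hdr);
    memcpy(raw + sizeof hdr + size, &tail, sizeof tail);
    return raw + sizeof hdr;
}

/* Return 1 if both canaries are intact, 0 if either was overwritten. */
int guarded_check(const void *ptr)
{
    guard_header_t hdr;
    uint32_t tail;
    memcpy(&hdr, (const uint8_t *)ptr - sizeof hdr, sizeof hdr);
    if (hdr.head != HEAD_CANARY) {
        return 0;
    }
    memcpy(&tail, (const uint8_t *)ptr + hdr.size, sizeof tail);
    return tail == TAIL_CANARY;
}

/* Release a buffer obtained from guarded_malloc. */
void guarded_free(void *ptr)
{
    if (ptr != NULL) {
        free((uint8_t *)ptr - sizeof(guard_header_t));
    }
}
```

Writing one byte past the end of a buffer returned by `guarded_malloc` damages the tail canary, but nothing fails until `guarded_check` runs on that buffer. In the same way, the task that trips over a corrupted heap is not necessarily the one that corrupted it.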
Hi @igrr , sadly there are a few issues:
Therefore we need:
|
@igrr another wifi crash, not related to heap
|
@KonssnoK Can you provide the |
@KonssnoK About the heap corruption issue, can you provide more logs from before the problem happens? This will help us analyze the scenario in which it occurs. |
Hi @zhangyanjiaoesp , sadly the crashes are happening on field devices, so we don't really know what is going on. we do have logs but they are related to the device functionality, not the network itself. Sorry i can't share the elf :( but that crash happened on one of our testing devices, so i might be able to reproduce it. |
@zhangyanjiaoesp we are still working on this. We are a bit puzzled because it seems that the heap poisoning works only for free blocks, ignoring allocated ones. |
Hello @zhangyanjiaoesp , Please note that the whole mutex is handled internally by espressif, so our code is not involved
Attached the dump v3_dump_FFFFAC17540004AC_1690286435.txt We currently work on top of 4fc8964 No idea on how to reproduce. |
Another mesh crash we cannot investigate
|
@KonssnoK Can you provide the |
@zhangyanjiaoesp I would send the elf via mail, if you can provide an address. |
hi @KonssnoK you can send it to the following email [email protected] and include your elf file. Please make sure to mention the GitHub link and provide a brief description in the email. Alternatively, you can notify me (@mention) after sending the email, so that we can get the information as soon as possible |
@KonssnoK coredump.py is not enough, can you provide more context for the logs? |
@zhangyanjiaoesp sadly no. the coredumps are retrieved from devices in the field placed in customers houses, so this is all we can see... What kind of context would you need? |
@KonssnoK We would like the runtime logs leading up to the crash; they may help us analyze the crash scenario. It would also help if you can provide the |
sorry, somehow i missed the @Xiehanxin comment! I sent an initial email with one of the most recent crashes related to
let's start with one, we'll see the next ones. Meanwhile, by tomorrow we should have some results of our heap-checking enhanced firmware. |
Hello, @zhangyanjiaoesp , @Xiehanxin. I have a question regarding one other crash
Apparently a misaligned memory access occurs in the WIFI task. Looking at what performs the access, though, it is still the allocation.
Could it be that an aligned flag is missing in the wifi function allocating the memory? Thanks |
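One way to test the alignment hypothesis is to log whether a suspicious pointer satisfies the alignment that the faulting access needs. A minimal sketch in plain C follows; `is_aligned` is a hypothetical helper written for this example, not part of ESP-IDF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if ptr is a multiple of align bytes, 0 otherwise.
 * align must be a non-zero power of two. */
int is_aligned(const void *ptr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)ptr & (align - 1)) == 0;
}
```

A misaligned pointer read from allocator metadata can also be a symptom of corrupted heap structures rather than of a missing alignment flag, so a failing check does not by itself tell the two cases apart.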
@KonssnoK Did all of these crashes occur on the same test version and code? |
@zhangyanjiaoesp we'll try to check where we call it.
|
Do you have the |
@zhangyanjiaoesp mmm i could send it to the esp mail |
@zhangyanjiaoesp who should i reference the mail to? |
@KonssnoK This comment has given the contact email. |
@zhangyanjiaoesp done |
@zhangyanjiaoesp well.. i had to resend the mail twice because :
how should i attach the files?
|
i can attach the dump here, but not the elf of the firmware |
@KonssnoK I remember you once sent an email to attach elf files, how did it work before? #11713 (comment) |
@zhangyanjiaoesp yeah, i remember sending an elf too. I've now sent it directly to Caijin, let's see.. Edit: |
@KonssnoK Just received the elf file from my colleague |
@zhangyanjiaoesp we parse the dump file with
in this case i didn't upload the txt one |
ok, I will try it. |
FYI, I just reported #12261 |
I'm also getting heap corruption errors using pretty common ADF code, and it's only a problem with some HTTPS audio streams, never with simple HTTP streams. After a few seconds of streaming on these bad HTTPS links I get the error: assert failed: remove_free_block tlsf.c:330 (prev && "prev_free field can not be null") and then the core panic. This is on ESP32C3, so single core. IDF 5.0.1 environment |
@GerryBriggs Since your setup is based on ADF and your crash is not the same as @KonssnoK's, please create a new ticket to record your problem, thanks. |
btw since the last lwip memory fix, our crashes have decreased considerably. |
Oh I will give that a try. I didn't know that the LWIP was updated. Most of my code is using IDF libraries, not ADF, but the central events process is the ADF audio "pipeline". I will start a breakout ticket tonight and post my results with the new LWIP. I will also give a list of HTTPS streams (connections) that cause the heap corruption, and a list of HTTPS streams that do not cause it. |
OK wow that worked for me. The new LWIP seems to have got rid of my heap corruption core panics on various HTTPS connections. Upgraded to 5.1.2 from 5.0.1. Wow. Thank you |
Glad it helped! |
i will close this for now and open a new one in case we find new issues |
Answers checklist.
General issue report
Hello,
we are going to use this issue to report multiple WIFI crashes our field devices are reporting.
v4.4 based on 3cec3a0
Since then, there has been only one commit to the WIFI library, so these errors should all still be there.
Our application uses