-
Notifications
You must be signed in to change notification settings - Fork 7.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
esp_netif_list_lock/esp_netif_list_unlock freeing mutex creates problems (IDFGH-11088) #12261
Comments
Is it possible to change the esp_netif_next() implementation to use esp_netif_tcpip_exec() |
Hi @AxelLin I'm afraid that's not possible. The problem is in the API design itself (similar to the macro
|
Then I need to wait until the fix is available in stable branches, and update application code accordingly. |
This issue is already marked as closed, but currently this fix is only available in v5.2+ , I don't find the fix in any release branch. |
Yes, will fix on all release branches (down to |
This commit removes the lock from the list manipulation code in esp_netif_objects.c, because we already have another lock/task context for lwip. So the list manipulation is unsafe and safety must be assured by the stack layer (in esp_netif_lwip). Problems with current locking: * implementation of locking was wrong -- lazy init style of creating the mutex is not thread safe (and destroying it if we have no interface makes the problem exhibit very frequently) * locking only the list won't solve issues when assessing interfaces atomically * maintaining multiple locks is problematic, as we often switch between lwip context and user context in internal implementation of esp_netif_lwip Closes espressif#12261
This commit removes the lock from the list manipulation code in esp_netif_objects.c, because we already have another lock/task context for lwip. So the list manipulation is unsafe and safety must be assured by the stack layer (in esp_netif_lwip). Problems with current locking: * implementation of locking was wrong -- lazy init style of creating the mutex is not thread safe (and destroying it if we have no interface makes the problem exhibit very frequently) * locking only the list won't solve issues when assessing interfaces atomically * maintaining multiple locks is problematic, as we often switch between lwip context and user context in internal implementation of esp_netif_lwip Closes #12261
Answers checklist.
IDF version.
v4.4.4 / latest master
Operating System used.
Windows
How did you build your project?
Command line with CMake
If you are using Windows, please specify command line type.
None
Development Kit.
ESP32-wrover-kit
Power Supply used.
USB
What is the expected behavior?
No race conditions leading to crashes
What is the actual behavior?
Race conditions leading to crashes:
The issue is that the mutex protecting the esp_netif_list isn't thread-safe.
Two reasons:
In theory two threads could try and create an interface at the same time, each reaching esp_netif_list_lock at the same time. The first thread could context-switch just after having detected that the current mutex is null, and planning to allocate a new one. The other could then allocate one as well, and lock it, before context switching back. When the first one is brought back up, it will overwrite the first one, causing corruption.
The chances of this happening are small though, but it is still bad design.
The correct way of doing this is initing the relevant mutexes once in some init function.
The fact that the mutex is again optionally destroyed on esp_netif_list_unlock is really bad. (See here )
For instance, if we call esp_netif_get_handle_from_ifkey (e.g. by mdns_init) at the same time and/or just before an interface is created, it can lead to the mutex being allocated, freed, but also simultaenously used.
This has led to constant crashes of spinlocks asserts that can be found in the debug logs.
Note that this guy https://www.esp32.com/viewtopic.php?t=20233 has clearly seen this issue before in 2021 already.
A correct design would have an init function, deinit function, and not alloc/dealloc mutexes at runtime..
Steps to reproduce.
Problem 1) Create two threads, that simultaneously try to create interfaces. (Very hard to reproduce though)
Problem 2) Also a tricky race condition, but was able to reproduce this in a constant crashloop:
By slightly varying the delays between the thread execution sequences, the spinlock crash can be triggered. (But as this is a race condition, this is tricky. Anyway, just from reading the code the issue can be clearly spotted)
Debug Logs.
More Information.
No response
The text was updated successfully, but these errors were encountered: