Stack Ptr and Stack Guard #425

Open
zereo08 opened this issue Dec 9, 2024 · 6 comments
Labels
bug Something isn't working hardware New hardware or architecture support request

Comments

@zereo08

zereo08 commented Dec 9, 2024

[Attached screenshots: threadx_stack_ptr_problem, threadx_stack_ptr_problem_2, threadx_stack_ptr_problem_3]

General Information:
Target Device: STM32H753ZI
ThreadX Version: 6.4.1_rel

Describe the Bug
When the stack guard functionality is used, tx_thread_stack_highest_ptr in ThreadX does not correctly represent the deepest point the thread stack has ever reached. Instead, because ThreadX locates that point with a binary search, the pointer ends up next to the closest region that still holds the unmodified fill pattern (e.g., 0xEF) within the stack. This can lead to an inaccurate measurement of the highest stack usage.

Problem Analysis
In my debugging session, I observed the following (look at screenshots for further information):
Stack Start Address: 0x240073f0
Stack End Address: 0x240077e7
Initial stack memory is filled with 0xEF (visible in Screenshot 1).
As the stack is utilized, portions of the stack are overwritten.
The tx_thread_stack_highest_ptr points to an address in the lower part of the stack (e.g., within the range of partially overwritten memory), even though higher parts of the stack have already been utilized (visible in Screenshot 2).
As visible in Screenshot 3, approximately 90% of the stack has been used, but the tx_thread_stack_highest_ptr fails to reflect this correctly.
This results in the stack guard not triggering as intended, leading to potential stack overflow without warning.
This behavior diminishes the reliability of the stack guard feature.
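
For reference, the stack size and a usage percentage can be derived from the two addresses above. This is only a sketch: it assumes the end address is inclusive (the last byte of the stack) and that the stack grows toward lower addresses, and the function names are made up for illustration, not part of ThreadX.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of a stack given its first and last byte address.
   Assumption: end_addr is inclusive (address of the last stack byte). */
uint32_t stack_size_bytes(uint32_t start_addr, uint32_t end_addr)
{
    assert(end_addr >= start_addr);
    return end_addr - start_addr + 1u;
}

/* Percentage of the stack used on a descending stack, where
   lowest_touched_addr is the lowest address that was ever written.
   The result is rounded down. */
uint32_t stack_used_percent(uint32_t start_addr, uint32_t end_addr,
                            uint32_t lowest_touched_addr)
{
    assert(lowest_touched_addr >= start_addr);
    assert(lowest_touched_addr <= end_addr);

    uint32_t size = stack_size_bytes(start_addr, end_addr);
    uint32_t used = end_addr - lowest_touched_addr + 1u;

    /* 64-bit intermediate so used * 100 cannot overflow. */
    return (uint32_t)(((uint64_t)used * 100u) / size);
}
```

With the addresses from this report, stack_size_bytes(0x240073f0, 0x240077e7) gives 1016 bytes.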

To Reproduce
Configure a ThreadX application to enable stack guard and monitor stack usage.
Initialize a thread with stack memory filled with 0xEF.
Let the thread use most of its stack memory during execution (e.g., recursive function calls that would eventually overflow the stack if nothing stops them).
Debug and observe the value of tx_thread_stack_highest_ptr and compare it to the actual memory regions used.

Expected behavior
The tx_thread_stack_highest_ptr should accurately represent the highest point of memory utilized in the stack. The stack guard should trigger a warning when the threshold for stack usage is exceeded.

Impact
This issue reduces the reliability of the stack guard functionality, which is critical for ensuring system stability. A miscalculation in stack usage can lead to unanticipated stack overflows, potentially causing critical system failures.

Logs and console output
Screenshots and detailed observations have been attached to this report:

Screenshot 1 - Initial stack memory view.
Screenshot 2 - Observed tx_thread_stack_highest_ptr pointing to a lower memory location despite significant stack usage.
Screenshot 3 - Final stack memory state showing actual utilization vs. reported stack pointer.

Additional context
This behavior appears to be a result of the binary search mechanism in the background used to determine stack usage.

@zereo08 zereo08 added bug Something isn't working hardware New hardware or architecture support request labels Dec 9, 2024
@amgross

amgross commented Dec 10, 2024

The other option is to do a linear search. As the comment in the code there says, the current solution is known to be imperfect, but it is the fastest:

This is a best effort algorithm to find the highest stack usage. */

I think the code assumes that writes to the stack leave no gaps, but gaps can indeed occur fairly often, for example with arrays on the stack that are never fully written.
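
To illustrate that assumption, here is a simplified host-side sketch, not the actual ThreadX code. The names (FILL_WORD, fill_stack, mark_used, deepest_used_binary) are hypothetical. Index 0 stands for the lowest stack address, and the stack is assumed to grow downward from the last index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed fill pattern, matching the 0xEF bytes seen in the report. */
#define FILL_WORD 0xEFEFEFEFu

/* Hypothetical helper: paint the whole stack with the fill pattern. */
void fill_stack(uint32_t *stack, size_t n_words)
{
    assert(stack != NULL);
    for (size_t i = 0; i < n_words; i++) {
        stack[i] = FILL_WORD;
    }
}

/* Hypothetical helper: simulate writes to the words [from, to). */
void mark_used(uint32_t *stack, size_t from, size_t to)
{
    assert(stack != NULL && from <= to);
    for (size_t i = from; i < to; i++) {
        stack[i] = 0u;
    }
}

/* Binary search for the lowest index that no longer holds the fill
   pattern. It is only correct if every word above that index has been
   overwritten too, i.e. if the used region has no gaps.
   Returns n_words if every probed word still holds the pattern. */
size_t deepest_used_binary(const uint32_t *stack, size_t n_words)
{
    assert(stack != NULL);
    size_t lo = 0;
    size_t hi = n_words;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (stack[mid] == FILL_WORD) {
            lo = mid + 1;   /* assume everything below mid is untouched */
        } else {
            hi = mid;       /* mid is used, keep looking below it */
        }
    }
    return lo;
}
```

With 64 words where [40, 64) and [8, 16) have been written and [16, 40) still holds the pattern (say, an untouched local array), the first probe lands in the gap and the search settles on index 40, although the stack actually reached index 8.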

@zereo08
Author

zereo08 commented Dec 12, 2024

So this means it is a best-effort method, and if I want to guarantee that it works, I should replace the algorithm with a linear search? Or is there another alternative?

@amgross

amgross commented Dec 12, 2024

Currently I can't think of a better (non-HW-based) solution than a linear search (starting from the end of the stack and continuing until the first byte that differs from the fill pattern).
And even this only finds the last byte that changed on the stack; the stack pointer may have gone even further. That is also not taking into account code that writes TX_STACK_FILL into the head of the stack as part of its own logic.
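
A minimal sketch of such a linear scan, written as standalone C rather than as a patch to ThreadX. FILL_BYTE and both function names are assumptions for illustration. Offset 0 stands for the far end of the stack (the lowest address on a descending stack).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed fill byte, matching the 0xEF pattern seen in the report. */
#define FILL_BYTE 0xEFu

/* Walk from the far end of the stack (offset 0) toward the base and
   return the offset of the first byte that differs from the fill
   pattern. Returns size if the whole stack still holds the pattern. */
size_t lowest_touched_offset(const uint8_t *stack, size_t size)
{
    assert(stack != NULL);
    size_t i = 0;
    while (i < size && stack[i] == FILL_BYTE) {
        i++;
    }
    return i;
}

/* Estimated bytes used: everything from the first changed byte up to
   the base of the stack. Still only an estimate, for the reasons given
   above: untouched bytes or bytes that happen to equal FILL_BYTE at
   the deepest point make the result too small. */
size_t bytes_touched_estimate(const uint8_t *stack, size_t size)
{
    return size - lowest_touched_offset(stack, size);
}
```

The cost is O(n) in the untouched part of the stack per analysis instead of O(log n), which is the trade-off mentioned earlier in this thread.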

@fdesbiens
Contributor

Thank you for the detailed report, @zereo08. I also appreciate your willingness to help, @amgross.

I will discuss this with the project team and report back.

@fdesbiens
Contributor

@eclipse-threadx/iot-threadx-committers Please have a look.

@billlamiework

The previous comments are all valid, including the comment that a linear search might not even be 100% accurate. That said, accuracy could be increased by changing the tx_thread_stack_analyze function to ensure there are "n" consecutive bytes of TX_STACK_FILL towards the lowest address in the stack before stopping the binary search. Of course, the balance here is the size of "n" that meets the accuracy required with an acceptable amount of overhead.
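
One possible reading of that suggestion, sketched as standalone C and not as a change to tx_thread_stack_analyze itself. It works on words rather than bytes for simplicity, and FILL_WORD, deepest_used_guarded, and the guard parameter (the "n" above) are hypothetical names. Index 0 stands for the lowest stack address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed fill pattern, matching the 0xEF bytes seen in the report. */
#define FILL_WORD 0xEFEFEFEFu

/* Binary search for the lowest used index, but a probe only counts as
   "untouched" if it and the words just below it, up to `guard` words
   in total, all still hold the fill pattern.
   guard == 1 behaves like a plain binary search.
   Returns n_words if no used word was found. */
size_t deepest_used_guarded(const uint32_t *stack, size_t n_words,
                            size_t guard)
{
    assert(stack != NULL);
    assert(guard >= 1);

    size_t lo = 0;
    size_t hi = n_words;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        size_t k = mid + 1;      /* one past the next word to inspect */
        size_t checked = 0;
        int dirty = 0;

        /* Inspect mid, mid - 1, ... toward the lowest address. */
        while (k > lo && checked < guard) {
            k--;
            checked++;
            if (stack[k] != FILL_WORD) {
                dirty = 1;
                break;
            }
        }

        if (dirty) {
            hi = k;          /* word k is used, keep looking below it */
        } else {
            lo = mid + 1;    /* run of fill words: treat as untouched */
        }
    }
    return lo;
}
```

With 64 words where [40, 64) and [8, 16) are used and a 24-word gap sits in between, guard = 1 reports index 40 while guard = 32 finds index 8. A guard of 8 still reports 40 here, which shows the trade-off: the window has to reach across the gap from wherever the probe lands, and larger windows cost more reads per probe.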
