Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZMS power cut corner case causing cycle count to wrap around and sector to be interpreted as a closed one #84874

Open
Emill opened this issue Jan 29, 2025 · 6 comments
Assignees
Labels
area: File System bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug

Comments

@Emill
Copy link

Emill commented Jan 29, 2025

Describe the bug
Consider a ZMS fs with > 2 sectors.
Assume sectors 0, 1 are closed, 2 is open and 3 is empty.
If sector 2 gets closed, garbage collection is initiated, moving items from sector 0 to sector 3. Assume a power cut happens during the move of items.

During re-initialization at the next boot, GC will be restarted. Before items are starting to be moved, the cycle count of sector 2 is incremented modulo 256.

Assume power cuts continue to happen during the GC is in progress 256 times in a row. At that time, the cycle count will eventually have wrapped around and now be equal to the cycle count of the close ATE, which now indicates that this is a valid closed sector instead of an empty one. That means all sectors will be marked as closed the next time the ZMS is initialized and thus cause corruption.

Suggested fix: in zms_add_empty_ate, make sure that the cycle counter that is to be written is not equal to whatever value is currently in the close ATE slot.

In fact, I don't really see right now why 256 different values are needed. It seems that 2 values (0 or 1) would be enough, and every zms_add_empty_ate could simply set to the cycle counter to the opposite of the one present in the close ATE.

To Reproduce
Initialize a system as above, set breakpoints in the flash_write function to be able to easily cut power when appropriate and perform the mentioned steps.

Expected behavior
Memory should not become corrupt.

Impact
The memory becomes corrupt.

Logs and console output
N/A

Environment (please complete the following information):

  • OS: N/A
  • Toolchain: N/A
  • Commit SHA or Version used: 87f11d7

Additional context
N/A

@rghaddab

@Emill Emill added the bug The issue is a bug, or the PR is fixing a bug label Jan 29, 2025
@henrikbrixandersen
Copy link
Member

Please use our bug template when reporting bugs. You need to edit this issue to include the information requested in https://github.com/zephyrproject-rtos/zephyr/blob/main/.github/ISSUE_TEMPLATE/001_bug_report.md

@rghaddab
Copy link
Contributor

In fact, I don't really see right now why 256 different values are needed. It seems that 2 values (0 or 1) would be enough, and every zms_add_empty_ate could simply set to the cycle counter to the opposite of the one present in the close ATE.

Hi @Emill thank you for reporting this corner case bug.
As you can see, the choice of having 256 different values instead of just 2 (you are right, only 2 values are needed) is basically to get the probability of having such corner cases very low and almost impossible to happen in real life.

I will have a look at your proposal to fix this very soon

@Emill
Copy link
Author

Emill commented Jan 30, 2025

Thanks for taking a look into this!

Please also see my line comment at d4e246d#diff-bdbbc59aff22840e2b494ff2b05c3c2d0521617f259e84ead786a33e3f001f78R1229 for subsys/fs/zms/zms.c at line 1229. If you want to I can file a new separate issue for this.

@rghaddab
Copy link
Contributor

Thanks for taking a look into this!

Please also see my line comment at d4e246d#diff-bdbbc59aff22840e2b494ff2b05c3c2d0521617f259e84ead786a33e3f001f78R1229 for subsys/fs/zms/zms.c at line 1229. If you want to I can file a new separate issue for this.

That one should be fixed as well. thanks

1 similar comment
@rghaddab
Copy link
Contributor

Thanks for taking a look into this!

Please also see my line comment at d4e246d#diff-bdbbc59aff22840e2b494ff2b05c3c2d0521617f259e84ead786a33e3f001f78R1229 for subsys/fs/zms/zms.c at line 1229. If you want to I can file a new separate issue for this.

That one should be fixed as well. thanks

@Emill
Copy link
Author

Emill commented Feb 4, 2025

It appears to be the same issue here:

rc = zms_add_empty_ate(fs, addr);
. If you do a "factory reset" by calling zms_clear 256 times in a row, you will end up where you already were before. Additionally, it appears upon first sight to me that the function zms_clear is not safe against power cuts. If there is a power cut after only some sectors have been erased/empty-marked, but not all, the storage will likely be corrupt the next time it is initialised (e.g. some old data might appear again and new data with same id is gone). NVS seems to have the same issue.

@fabiobaltieri fabiobaltieri added the priority: medium Medium impact/importance bug label Feb 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: File System bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug
Projects
None yet
Development

No branches or pull requests

4 participants