Hi,
We have a SPIFFS partition in the on-chip flash of a Texas Instruments microcontroller. It's a pretty small instance.
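Roughly, the configuration and mount call look like the sketch below; the sizes and addresses are placeholders rather than our exact values, but the shape is the same:

```c
/* Sketch only -- the sizes/addresses below are placeholders, not our
 * actual partition geometry. */
#include "spiffs.h"

static s32_t hal_read(u32_t addr, u32_t size, u8_t *dst);
static s32_t hal_write(u32_t addr, u32_t size, u8_t *src);
static s32_t hal_erase(u32_t addr, u32_t size);

static spiffs fs;
static u8_t spiffs_work[2 * 64];            /* two logical pages            */
static u8_t spiffs_fds[32 * 4];             /* room for a few descriptors  */
static u8_t spiffs_cache[(64 + 32) * 4];

static s32_t fs_mount(void) {
  spiffs_config cfg;
  cfg.phys_size        = 64 * 1024;         /* placeholder partition size  */
  cfg.phys_addr        = 0x00060000;        /* placeholder offset in flash */
  cfg.phys_erase_block = 4096;              /* on-chip flash sector size   */
  cfg.log_block_size   = 4096;              /* logical block = one sector  */
  cfg.log_page_size    = 64;                /* small logical pages         */
  cfg.hal_read_f       = hal_read;
  cfg.hal_write_f      = hal_write;
  cfg.hal_erase_f      = hal_erase;
  return SPIFFS_mount(&fs, &cfg, spiffs_work,
                      spiffs_fds, sizeof(spiffs_fds),
                      spiffs_cache, sizeof(spiffs_cache), 0);
}
```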
After hundreds of devices had been operating in the field for 1+ years with no complaints, I found a fascinating filesystem corruption. At the API level we were getting SPIFFS_ERR_DELETED on read operations that were within the file bounds. I managed to extract the entire filesystem image for analysis.
For a given file (id 0x20) that receives a lot of reads and writes over the lifetime of the device, we ended up in a situation where two live pages both claimed to be the same object index page. Both pages are marked as live in the block header (0x8020), and both are marked as a used, final, non-deleted index page in the page header (flags 0xf8).
The file is ~6 kbytes, so there are five index pages (1 object index header + 4 object index pages); index page spix=3 is the one with the issue.
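In case it helps to see where "1 header + 4 index" comes from, the count follows from the configured logical page size; here is a sketch in terms of the macros spiffs itself defines in spiffs_nucleus.h (illustration only, the exact numbers depend on the geometry):

```c
/* Sketch: how many object index pages a file of a given size occupies.
 * Uses spiffs' internal macros from spiffs_nucleus.h; the result depends
 * on the configured logical page size. */
#include "spiffs.h"
#include "spiffs_nucleus.h"

static u32_t index_pages_for(spiffs *fs, u32_t file_size) {
  u32_t data_pages = (file_size + SPIFFS_DATA_PAGE_SIZE(fs) - 1)
                     / SPIFFS_DATA_PAGE_SIZE(fs);
  if (data_pages <= SPIFFS_OBJ_HDR_IX_LEN(fs))
    return 1;                                  /* header page holds them all */
  u32_t rest = data_pages - SPIFFS_OBJ_HDR_IX_LEN(fs);
  /* 1 object index header + enough plain object index pages for the rest */
  return 1 + (rest + SPIFFS_OBJ_IX_LEN(fs) - 1) / SPIFFS_OBJ_IX_LEN(fs);
}
```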
Here are two blocks from different offsets in the flash:
I produced this with od -Ad -w64 -t x2 spiffs.dat | grep '8020 0003 fff8'

There are about 150 other blocks which are also 8020 0003, but they are erased; this is expected, as a lot of write operations are hitting this file.

It appears to me that the filesystem is not able to recover from this situation, because once the index page is forked, the two instances live their own individual lives. The index lookup function picks one or the other in a non-deterministic way, depending on what's in the cache variables, and further writes may end up cloning one version or the other depending on which one got picked.
It is not entirely clear to me how this situation came about. The flash driver is pretty simple, since we have on-chip flash and only need to call a simple function to program or erase it. The MTBF is pretty large, considering the number of devices that have been operating correctly for a long while.
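To give an idea of what "simple" means here, the HAL handed to SPIFFS is essentially the shape below. Reads come straight from memory-mapped flash; vendor_flash_program()/vendor_flash_erase_sector() are placeholder names standing in for the actual TI flash calls, not the real API:

```c
/* Sketch of the flash HAL, with placeholder names for the vendor calls.
 * On-chip flash is memory mapped, so reads are a plain memcpy. */
#include <stdint.h>
#include <string.h>
#include "spiffs.h"

extern int vendor_flash_program(uint32_t addr, const uint8_t *src, uint32_t len); /* placeholder */
extern int vendor_flash_erase_sector(uint32_t addr);                              /* placeholder */

#define FLASH_SECTOR_SIZE 4096u   /* placeholder on-chip sector size */

static s32_t hal_read(u32_t addr, u32_t size, u8_t *dst) {
  memcpy(dst, (const void *)(uintptr_t)addr, size);
  return SPIFFS_OK;
}

static s32_t hal_write(u32_t addr, u32_t size, u8_t *src) {
  return vendor_flash_program(addr, src, size) == 0 ? SPIFFS_OK : -1;
}

static s32_t hal_erase(u32_t addr, u32_t size) {
  /* spiffs erases whole logical blocks; size is a multiple of the sector size */
  for (u32_t off = 0; off < size; off += FLASH_SECTOR_SIZE) {
    if (vendor_flash_erase_sector(addr + off) != 0) return -1;
  }
  return SPIFFS_OK;
}
```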
I can provide more information, including the entire filesystem dump, if that's of interest.
thanks
Balazs