Description
With #49866 coming in v1.11 we changed the staleness check for cachefiles from using the `mtime` of the source files to computing their `_crc32` hash. Because `_crc32` needs to read the whole source file, it is likely more expensive to compute than querying the `mtime`.
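To make the cost difference concrete, here is a minimal sketch of the two kinds of staleness check. It is written in Python for illustration only and is not Julia's actual loading code; the function names are hypothetical, and `zlib.crc32` is plain CRC-32, which may differ from the checksum variant Julia uses. The `mtime` check is a single metadata query, while the checksum check has to read every byte of the file.

```python
import os
import zlib


def crc32_of_file(path, chunk_size=1 << 16):
    """Checksum the whole file in chunks; cost grows with file size."""
    checksum = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            checksum = zlib.crc32(chunk, checksum)
    return checksum


def is_stale_by_mtime(path, recorded_mtime):
    """Hypothetical mtime-based check: one stat call, no file contents read."""
    return os.stat(path).st_mtime != recorded_mtime


def is_stale_by_checksum(path, recorded_checksum):
    """Hypothetical content-based check: reads the full file on every call."""
    return crc32_of_file(path) != recorded_checksum
```

A file whose timestamp changes but whose contents do not (for example after being copied to a new location) is reported as stale by the first check and as fresh by the second, at the price of one full read per source file.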
So far no one seems to have noted regressions in loading times related to this change. If you do notice any, please link to this issue!
However, some have pointed out that this change might have a noticeable impact on loading times on parallel filesystems.
This issue should track the status of this problem. I will update the list below regularly.
- Related to this is a report from @sloede in the initial relocation issue, "Make precompile files relocatable/servable" (#47943 (comment)). The Trixi.jl folks already have a workflow set up that we can use for a benchmark. We are currently arranging a meeting to discuss how to evaluate this.
- Assuming the impact is not negligible, there is at least one way forward: @vchuravy suggested counteracting it by implementing a content-addressable storage (CAS) system.
- In https://hackmd.io/@Je8OcLYBQr2ociLAtslIug/BJvv7G9pa I prepared a draft of a blog post that we want to publish, explaining how to use this relocation feature. Eventually, this report should be shared on Discourse, ideally also including an answer to whether this has an impact on parallel filesystems. Any feedback on that post is very much appreciated!
- Tangentially related: "Computing the CRC hash of the sysimage costs a non-negligible startup time" (#50166). Sysimages are on the order of 200 MB, at least for v1.8, v1.9, and v1.10. Note that the problem with parallel filesystems seems to be not the size of a single file but the vast number of files.
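
For readers unfamiliar with the CAS idea mentioned in the list above, the following is a minimal sketch of a store in which each blob is saved under a name derived from a hash of its content. It is an illustration in Python under my own assumptions, not a design proposed in this issue: the `ContentStore` class, its methods, the directory layout, and the choice of SHA-256 are all hypothetical, and how such a store would be wired into Julia's package loading is not specified here.

```python
import hashlib
from pathlib import Path


class ContentStore:
    """Hypothetical content-addressed store: the file name is the content hash."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path_for(self, key):
        # Fan out into subdirectories so no single directory grows too large.
        return self.root / key[:2] / key[2:]

    def put(self, data: bytes) -> str:
        """Store data under its SHA-256 hex digest and return that key."""
        key = hashlib.sha256(data).hexdigest()
        path = self._path_for(key)
        if not path.exists():
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(data)
        return key

    def has(self, key: str) -> bool:
        """Existence check by key; does not read the stored content."""
        return self._path_for(key).is_file()

    def get(self, key: str) -> bytes:
        return self._path_for(key).read_bytes()
```

In this sketch, once a key is known, asking whether matching content is present becomes an existence check rather than a re-read of the file, which is the property that makes the approach interesting when the number of files is large.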