Implement write-back caching in the storage layer #611

lars-t-hansen · 2024-10-04T05:55:07Z

At the moment, jobanalyzer flushes the in-memory cache for a node when new data arrive for that node and then appends the data on disk. This was simple to implement and is maximally safe with respect to data loss, but it makes the cache useful only for bursty queries: after about 5 minutes, all nodes will have reported new data and the cache will therefore have been completely emptied. The capacity-based cache purging will likely not kick in at all except for very large node sets or for very long time frames.

The right fix here is to not flush the cache when new data arrive, but to append both in memory and on disk. This will mostly affect only the lowest database layer; clients will be unaware. Possibly we'll have a writer thread that processes a queue of append requests for files. We just need to be sure that (a) the risk of data loss is still very low and (b) data are coherent. For example, a file can be purged from the cache for capacity reasons while there are data waiting for it in the write queue. It would be safe to re-read the file (under some mutex shared with the writer), but only if the to-be-written data were not already appended to the previous in-memory content and observed by a client. It may be that dirty files cannot be purged; they will have to be cleaned first.

lars-t-hansen · 2024-10-14T12:44:54Z

That characterization is off: only today's data are flushed; everything from previous days remains in memory, as desired.

lars-t-hansen added the task:enhancement (New feature or request) and component:sonalyze (sonalyze/*) labels Oct 4, 2024

lars-t-hansen added the pri:low label Oct 14, 2024
