Implement write-back caching in the storage layer #611

Open
lars-t-hansen opened this issue Oct 4, 2024 · 1 comment
Labels
component:sonalyze sonalyze/* pri:low task:enhancement New feature or request

Comments

@lars-t-hansen
Collaborator

At the moment, jobanalyzer flushes the in-memory cache for a node when new data arrive for that node, and then appends the data on disk. This was simple to implement and is maximally safe with respect to data loss, but it makes the cache useful only for bursty queries: after about 5 minutes, all nodes will have reported new data, and the cache will therefore have been wiped clean. The capacity-based purging of the cache will likely not kick in at all, except for very large node sets or very long time frames.

The right fix is to not flush the cache when new data arrive, but to append the data both in memory and on disk. Mostly this will affect only the lowest database layer; clients will be unaware of the change. Possibly we'll have a writer thread that processes a queue of append requests for files. We just need to be sure that (a) the risk of data loss is still very low and (b) the data are coherent. For example, a file can be purged from the cache for capacity reasons while there are data waiting for it in the write queue. It would be safe to re-read the file (under some mutex shared with the writer), but only if the to-be-written data had not already been appended to the previous in-memory content and observed by a client. It may be that dirty files cannot be flushed from the cache at all; they will have to be cleaned (written out) first.
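A minimal sketch of the write-back scheme described above, in Python, assuming a cache that holds whole files keyed by path with least-recently-used capacity purging. All names (`WriteBackCache`, `append`, `read`, `flush`) are hypothetical and not part of sonalyze. It shows one way to satisfy (a) and (b): an append updates the in-memory copy immediately and is queued for a single writer thread, and a file with queued-but-unwritten appends counts as dirty and is never purged, so re-reading a purged file from disk cannot lose data a client has already observed.

```python
import os
import queue
import tempfile
import threading
from collections import OrderedDict


class WriteBackCache:
    """Hypothetical write-back cache for append-only data files (not sonalyze code)."""

    def __init__(self, capacity):
        self.capacity = capacity        # max number of cached files (assumed LRU policy)
        self.lock = threading.Lock()    # guards entries and pending
        self.entries = OrderedDict()    # path -> list of lines, least recently used first
        self.pending = {}               # path -> number of queued, unwritten appends
        self.queue = queue.Queue()      # append requests for the writer thread
        threading.Thread(target=self._writer, daemon=True).start()

    def _load(self, path):
        # Caller holds the lock. The disk is only read for uncached paths, and
        # uncached paths are always clean (dirty ones are never purged), so the
        # disk copy is complete.
        if path not in self.entries:
            try:
                with open(path) as f:
                    self.entries[path] = f.read().splitlines()
            except FileNotFoundError:
                self.entries[path] = []
        self.entries.move_to_end(path)  # mark as most recently used
        return self.entries[path]

    def _evict(self):
        # Caller holds the lock. Purge least recently used *clean* files until
        # within capacity; dirty files are skipped, so the cache may
        # temporarily exceed its capacity.
        for path in list(self.entries):
            if len(self.entries) <= self.capacity:
                break
            if path not in self.pending:
                del self.entries[path]

    def read(self, path):
        with self.lock:
            lines = list(self._load(path))  # return a copy, not the live list
            self._evict()
            return lines

    def append(self, path, new_lines):
        with self.lock:
            self._load(path).extend(new_lines)  # visible to readers at once
            self.pending[path] = self.pending.get(path, 0) + 1
            self.queue.put((path, list(new_lines)))
            self._evict()

    def _writer(self):
        # Single writer thread, so requests for one file are written in FIFO order.
        while True:
            path, new_lines = self.queue.get()
            # The path stays cached while dirty, so no reader touches the disk
            # copy and the write can happen outside the lock.
            with open(path, "a") as f:
                f.writelines(line + "\n" for line in new_lines)
            with self.lock:
                self.pending[path] -= 1
                if self.pending[path] == 0:
                    del self.pending[path]  # file is clean again
            self.queue.task_done()

    def flush(self):
        """Block until every queued append has reached the disk."""
        self.queue.join()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        cache = WriteBackCache(capacity=1)
        a, b = os.path.join(d, "a.csv"), os.path.join(d, "b.csv")
        cache.append(a, ["r1"])
        cache.append(b, ["r2"])  # over capacity, but dirty files are not purged
        print(cache.read(a))     # ['r1']
        cache.flush()
        with open(a) as f:
            print(f.read().splitlines())  # ['r1']
```

The price of never purging dirty files is that the cache can briefly exceed its capacity while the writer catches up. Error handling is omitted: a failed disk write would stop the writer thread here and make `flush` block forever, which a real implementation would have to handle to keep the risk of data loss low.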

@lars-t-hansen lars-t-hansen added task:enhancement New feature or request component:sonalyze sonalyze/* labels Oct 4, 2024
@lars-t-hansen
Collaborator Author

That characterization is off: only today's data are flushed, and everything from previous days will remain in memory as desired.
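To make the corrected characterization concrete, here is a sketch of the current invalidate-on-append behavior in Python. It assumes, hypothetically, that data are stored as one file per node per day and cached per file; the class name, method names, and the `root/day/node.csv` layout are illustrative, not sonalyze's actual code. Appending new data drops only the cached copy of that node's file for that day, so cached files for earlier days survive.

```python
import os
import tempfile


class InvalidateOnAppendCache:
    """Hypothetical sketch of the current write path (not sonalyze code)."""

    def __init__(self, root):
        self.root = root
        self.entries = {}  # (node, day) -> cached lines of that node's file for that day

    def _path(self, node, day):
        # Assumed layout: one file per node per day, e.g. root/2024-10-04/node1.csv.
        return os.path.join(self.root, day, node + ".csv")

    def read(self, node, day):
        key = (node, day)
        if key not in self.entries:
            try:
                with open(self._path(node, day)) as f:
                    self.entries[key] = f.read().splitlines()
            except FileNotFoundError:
                self.entries[key] = []
        return list(self.entries[key])

    def append(self, node, day, new_lines):
        # Flush only the cached copy of this one file, then append on disk.
        # New data normally belong to today, so earlier days stay cached.
        self.entries.pop((node, day), None)
        path = self._path(node, day)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "a") as f:
            f.writelines(line + "\n" for line in new_lines)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        cache = InvalidateOnAppendCache(d)
        cache.append("node1", "2024-10-03", ["old"])
        cache.append("node1", "2024-10-04", ["a"])
        cache.read("node1", "2024-10-03")
        cache.read("node1", "2024-10-04")
        cache.append("node1", "2024-10-04", ["b"])  # new data for today
        print(sorted(cache.entries))  # [('node1', '2024-10-03')]
```

Under this model the cache is still cold for today's files after every report, which is the remaining cost the write-back approach in the issue is meant to remove.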
