At the moment, jobanalyzer will flush the in-memory cache entry for a node when new data arrive for that node, and then append the data on disk. This was simple to implement and is maximally safe with respect to data loss, but it makes the cache useful only for bursty queries: after about 5 minutes, all nodes will have reported new data and the cache will therefore be completely empty again. Capacity-based purging of the cache will likely not kick in at all except for very large node sets or very long time frames.
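A minimal sketch of that scheme, assuming hypothetical names (`Record`, `nodeCache`, `appendToFile` are illustrations, not the actual jobanalyzer API): ingest drops the node's cached entry and appends straight to the file, so nothing cached survives past the node's next report.

```go
package cache

import (
	"os"
	"sync"
)

// Record stands in for one parsed sample; the real record type is richer.
type Record struct {
	Timestamp int64
	Payload   string
}

// nodeCache sketches the current design: flush the entry on ingest.
type nodeCache struct {
	mu      sync.Mutex
	entries map[string][]Record // per-node cached file contents
}

// Ingest drops the node's cached entry and appends the new records on disk.
// Maximally safe (data hit disk before we return), but the cache entry is
// invalidated every time the node reports, roughly every 5 minutes.
func (c *nodeCache) Ingest(node, path string, recs []Record) error {
	c.mu.Lock()
	delete(c.entries, node)
	c.mu.Unlock()
	return appendToFile(path, recs)
}

func appendToFile(path string, recs []Record) error {
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	for _, r := range recs {
		if _, err := f.WriteString(r.Payload + "\n"); err != nil {
			return err
		}
	}
	return nil
}
```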
The right fix here is to not flush the cache when new data arrive, but to append both in memory and on disk. This will mostly affect only the lowest database layer; clients will be unaware of the change. Possibly we'll have a writer thread that processes a queue of append requests for files. We just need to be sure that (a) the risk of data loss is still very low and (b) data are coherent. For example, a file can be purged from the cache for capacity reasons while there are data waiting for it in the write queue. It would be safe to re-read the file (under some mutex shared with the writer) but only if the to-be-written data had not already been appended to the previous in-memory content and observed by a client. It may be that dirty files cannot be purged from the cache; they will have to be cleaned (their queued writes completed) first.
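One way this could look, as a sketch rather than a settled design (it reuses the hypothetical `Record` and `appendToFile` from the sketch above): ingest appends to the in-memory entry and enqueues a write request for a single writer goroutine; an entry counts as dirty while writes for it are pending, and capacity purging refuses dirty entries so a later re-read of the file can never lose data a client has already observed in memory.

```go
package cache

import "sync"

// writeReq is one pending append for a file; a single writer goroutine
// processes the queue so on-disk order matches in-memory order.
type writeReq struct {
	node string
	path string
	recs []Record
}

type appendingCache struct {
	mu      sync.Mutex
	entries map[string][]Record
	pending map[string]int // queued writes per node; >0 means "dirty"
	queue   chan writeReq
}

func newAppendingCache() *appendingCache {
	c := &appendingCache{
		entries: map[string][]Record{},
		pending: map[string]int{},
		queue:   make(chan writeReq, 1024),
	}
	go c.writer()
	return c
}

// Ingest appends in memory and enqueues the disk append; the cache entry
// stays valid, so subsequent queries for the node are served from memory.
func (c *appendingCache) Ingest(node, path string, recs []Record) {
	c.mu.Lock()
	c.entries[node] = append(c.entries[node], recs...)
	c.pending[node]++ // dirty until the writer has persisted this request
	c.mu.Unlock()
	c.queue <- writeReq{node: node, path: path, recs: recs}
}

// writer drains the queue; when the last pending write for a node completes,
// the entry becomes clean and is again eligible for capacity purging.
func (c *appendingCache) writer() {
	for req := range c.queue {
		_ = appendToFile(req.path, req.recs) // error handling elided in this sketch
		c.mu.Lock()
		c.pending[req.node]--
		if c.pending[req.node] == 0 {
			delete(c.pending, req.node)
		}
		c.mu.Unlock()
	}
}

// Purge evicts a node for capacity reasons, but only if it is clean; a dirty
// entry must first be cleaned (its queued writes drained), otherwise a
// re-read of the file would miss data already visible to clients.
func (c *appendingCache) Purge(node string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.pending[node] > 0 {
		return false // dirty: caller must retry after the writer drains
	}
	delete(c.entries, node)
	return true
}
```

The writer goroutine is what keeps the data-loss window small: every append is on disk shortly after it is queued, and ordering per file is preserved because only one goroutine writes.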