Work is ongoing in #41 to replace the providers Table with SQLiteDatastore, which would be used in combination with an LRU cache.
There are two aspects to this issue:
(1)
Because the relationship between NodeId and SPRs is one-to-many, and because nim-datastore is a key-value store, we need a scheme to store multiple SPRs for the same NodeId.
In comments on #41 there are some suggestions on how to do it differently, i.e. with less overhead.
Additional discussion on this aspect can take place in comments here or in #41.
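To make the one-to-many problem concrete, here is a minimal sketch of one possible scheme: give each (NodeId, SPR) pair its own key under a per-NodeId prefix and retrieve by prefix. This is not the scheme used in #41, and it is Python rather than Nim for brevity. The key layout, `provider_key`, `sprs_for`, and the plain dict standing in for the key-value store are all assumptions made for illustration.

```python
import hashlib

def provider_key(node_id, spr):
    """Build a unique key for one (NodeId, SPR) pair (hypothetical layout)."""
    digest = hashlib.sha256(spr).hexdigest()[:16]
    return f"/providers/{node_id}/{digest}"

def sprs_for(store, node_id):
    """Return every SPR stored under the NodeId's key prefix."""
    prefix = f"/providers/{node_id}/"
    return [value for key, value in sorted(store.items()) if key.startswith(prefix)]

# A plain dict stands in for the key-value datastore.
store = {}
for node_id, spr in [("node-a", b"spr-1"), ("node-a", b"spr-2"), ("node-b", b"spr-3")]:
    store[provider_key(node_id, spr)] = spr
```

Writing the same SPR twice for a NodeId maps to the same key, so duplicates collapse naturally. The cost is one key per SPR plus a prefix scan on reads.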
(2)
There needs to be a decision on the purpose of the persistent datastore in relation to the cache.
I wrote a long comment in #40 exploring the problem, but it was suggested that I create an issue specifically about the persistent store. The key points made in that comment are reproduced below.
Is the purpose of the persistent datastore solely to populate the cache during process restart?
Or is it instead to maintain a set of available SPRs that's larger than the amount of memory committed to caching them?
The behavior of the cache and persistent datastore combo will vary depending on the purpose.
If the purpose of the persistent datastore is solely to populate the cache during process restart, then the relationship between the cache and datastore is simplified. Adding an SPR would involve a write to both the cache and the datastore, and a cache eviction would be paired with the corresponding deletion(s) in the datastore. Except for process startup, retrieving SPRs would only involve the cache. The amount of disk space used by the datastore would naturally be correlated with the size of the cache (keeping in mind flux owing to values not having a fixed size), though the datastore would be slightly larger than the cache owing to, e.g., the overhead of SQLite and of the storage scheme used with it (see notes in #41).
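The first purpose can be sketched roughly as follows. This is an illustration only, in Python with the standard `sqlite3` module rather than nim-datastore. The class name `WriteThroughProviders`, the table layout, and bounding the cache by number of NodeIds (rather than bytes) are assumptions, not anything specified in #40 or #41.

```python
import sqlite3
from collections import OrderedDict

class WriteThroughProviders:
    """LRU cache of NodeId -> set of SPRs, mirrored in SQLite (illustrative only)."""

    def __init__(self, db, max_ids):
        self.db = db
        self.max_ids = max_ids  # assumed bound: number of NodeIds, not bytes
        self.cache = OrderedDict()
        db.execute("CREATE TABLE IF NOT EXISTS providers "
                   "(node_id TEXT, spr BLOB, PRIMARY KEY (node_id, spr))")
        # Process startup is the only time the datastore is read.
        for node_id, spr in db.execute("SELECT node_id, spr FROM providers"):
            self.cache.setdefault(node_id, set()).add(bytes(spr))

    def add(self, node_id, spr):
        # Every add writes to both the cache and the datastore.
        self.cache.setdefault(node_id, set()).add(spr)
        self.cache.move_to_end(node_id)
        self.db.execute("INSERT OR IGNORE INTO providers VALUES (?, ?)", (node_id, spr))
        while len(self.cache) > self.max_ids:
            evicted, _ = self.cache.popitem(last=False)
            # A cache eviction is paired with deletion of that NodeId's rows.
            self.db.execute("DELETE FROM providers WHERE node_id = ?", (evicted,))
        self.db.commit()

    def get(self, node_id):
        # Reads never touch the datastore.
        if node_id not in self.cache:
            return set()
        self.cache.move_to_end(node_id)
        return set(self.cache[node_id])

db = sqlite3.connect(":memory:")
providers = WriteThroughProviders(db, max_ids=2)
providers.add("node-a", b"spr-1")
providers.add("node-b", b"spr-2")
providers.add("node-c", b"spr-3")  # evicts node-a from the cache and the datastore
```

Because evictions and deletions are paired, the datastore never holds more NodeIds than the cache, which is what keeps disk usage correlated with cache size. Note that this sketch does not persist the LRU ordering across restarts.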
If the purpose of the persistent datastore is to maintain a set of available SPRs that's larger than the amount of memory committed to caching SPRs (the purpose I had in mind when writing the description for #41), things are more complicated [1]. What's the source of truth when asking the cache + datastore what SPRs it has for a CID? Retrieving from the cache and the datastore every time would defeat the purpose of the cache. Evictions from caches in level-two of the two-level cache proposed in #40 would result in inconsistency between the cache and datastore, which would defeat or at least hamper the purpose of the datastore. Limiting evictions to level-one of the cache would avoid the inconsistency problem, but with unbounded in-memory stores in level-two we'd be back to where we started.
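The source-of-truth question can be illustrated with a small sketch. It assumes that the cache may end up holding only a subset of a CID's SPRs (for example, after individual SPRs are evicted), and it uses plain dicts in Python as stand-ins for the cache and datastore. The `lookup` helper and the sample data are hypothetical.

```python
def lookup(cache, datastore, cid):
    """Read-through lookup: consult the datastore only on a cache miss."""
    if cid in cache:
        return set(cache[cid])
    return set(datastore.get(cid, set()))

# The datastore knows three SPRs for the CID; the cache kept only one of them.
datastore = {"cid-1": {b"spr-1", b"spr-2", b"spr-3"}}
cache = {"cid-1": {b"spr-1"}}

# A cache hit returns an incomplete answer without any sign that it is incomplete.
partial = lookup(cache, datastore, "cid-1")

# Merging both sources is always complete, but reads the datastore on every call.
complete = cache.get("cid-1", set()) | datastore.get("cid-1", set())
```

The read-through path is cheap but cannot tell a complete entry from a partial one, while the merged path is correct but gives up the benefit of caching.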
As discussed during the team call on Sep 1, changes related to codex-storage/nim-codex#227 will significantly reduce the amount of data kept in the providers store. The implication is that a simpler approach to cache + persistent datastore (the first purpose described above) should be okay.
Also note that considerations in #42 will need to inform choices made re: the cache + datastore.
Footnotes
[1] In this case, since the sizes of the cache and datastore would not be correlated, there would need to be an additional mechanism to constrain the size of the datastore on disk, which I anticipated in the notes in #41 (replace providers:Table with providers:SQLiteDatastore).