Persistent providers store #43

Open
michaelsbradleyjr opened this issue Sep 1, 2022 · 0 comments


michaelsbradleyjr commented Sep 1, 2022

Work is ongoing in #41 to replace the providers Table with SQLiteDatastore, which would be used in combination with an LRU cache.

There are two aspects to this issue:

(1)

Because the relationship between NodeId and SPRs is one-to-many, and because nim-datastore is a key-value store, we need a scheme to store multiple SPRs for the same NodeId.

At present, #41 accomplishes this via:

key = /[cid]/[spr-hash], value = spr-bytes
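
To make the scheme concrete, here is a minimal Python sketch of the `/[cid]/[spr-hash]` layout. It is not the code from #41: `ToyStore`, `spr_key`, `add_provider`, and `providers_for` are hypothetical names, the store is an in-memory dict rather than nim-datastore, and SHA-256 is an assumed hash, since the issue does not say which hash is used.

```python
import hashlib


def spr_key(cid: str, spr_bytes: bytes) -> str:
    # Assumption: SHA-256 hex digest; the actual hash used in #41 is not stated here.
    spr_hash = hashlib.sha256(spr_bytes).hexdigest()
    return f"/{cid}/{spr_hash}"


class ToyStore:
    """In-memory stand-in for a key-value datastore (not the nim-datastore API)."""

    def __init__(self):
        self._data = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def query_prefix(self, prefix: str) -> list:
        # Return every value whose key starts with the given prefix.
        return [v for k, v in sorted(self._data.items()) if k.startswith(prefix)]


def add_provider(store: ToyStore, cid: str, spr_bytes: bytes) -> None:
    # One key per (cid, SPR) pair; re-adding the same SPR overwrites the same key.
    store.put(spr_key(cid, spr_bytes), spr_bytes)


def providers_for(store: ToyStore, cid: str) -> list:
    # The trailing slash keeps "/cidA/" from matching keys under "/cidAB/".
    return store.query_prefix(f"/{cid}/")
```

Because the SPR hash is part of the key, storing the same SPR twice for one CID does not create a duplicate, while listing all SPRs for a CID becomes a prefix query.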

In comments on #41 there are some suggestions on how to do it differently, i.e. with less overhead.

Additional discussion on this aspect can take place in comments here or in #41.

(2)

There needs to be a decision on the purpose of the persistent datastore in relation to the cache.

I wrote a long comment in #40 exploring the problem. But it was suggested to create an issue specifically about the persistent store. The key points made in that comment are reproduced below.

Is the purpose of the persistent datastore solely to populate the cache during process restart?

Or is it instead to maintain a set of available SPRs that's larger than the amount of memory committed to caching them?

The behavior of the cache and persistent datastore combo will vary depending on the purpose.

If the purpose of the persistent datastore is solely to populate the cache during process restart, then the relationship between the cache and datastore is simplified. Adding an SPR would involve a write to both the cache and the datastore, and a cache eviction would be paired with one or more deletions in the datastore. Except for process startup, retrieving SPRs would only involve the cache. The amount of disk space used by the datastore would naturally be correlated with the size of the cache (allowing for fluctuation, since values do not have a fixed size), though the datastore would be slightly larger than the cache owing to e.g. the overhead of using SQLite and how exactly that is done (see notes in #41).
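
A rough Python sketch of this first option, under stated assumptions: `WriteThroughProviders` is a hypothetical name, the cache is an LRU over CIDs with a capacity counted in CIDs, and a plain dict stands in for the persistent datastore. None of this is taken from #40 or #41.

```python
from collections import OrderedDict
import hashlib


class WriteThroughProviders:
    def __init__(self, capacity: int):
        self.capacity = capacity    # max number of CIDs in the cache (assumed unit)
        self.cache = OrderedDict()  # cid -> {spr_hash: spr_bytes}, in LRU order
        self.store = {}             # stand-in for the datastore: "/cid/hash" -> bytes

    def add(self, cid: str, spr: bytes) -> None:
        h = hashlib.sha256(spr).hexdigest()
        # Every add writes to both the cache and the datastore.
        self.cache.setdefault(cid, {})[h] = spr
        self.cache.move_to_end(cid)
        self.store[f"/{cid}/{h}"] = spr
        # Every cache eviction is paired with deletions in the datastore.
        while len(self.cache) > self.capacity:
            old_cid, old_sprs = self.cache.popitem(last=False)
            for old_h in old_sprs:
                del self.store[f"/{old_cid}/{old_h}"]

    def get(self, cid: str) -> list:
        # Reads touch the cache only; the datastore is never consulted here.
        sprs = self.cache.get(cid)
        if sprs is None:
            return []
        self.cache.move_to_end(cid)
        return list(sprs.values())

    @classmethod
    def restore(cls, capacity: int, store: dict) -> "WriteThroughProviders":
        # Process startup: rebuild the cache from the datastore contents.
        # LRU order from before the restart is not preserved in this sketch.
        obj = cls(capacity)
        obj.store = dict(store)
        for key, spr in store.items():
            _, cid, h = key.split("/")
            obj.cache.setdefault(cid, {})[h] = spr
        return obj
```

Since the datastore only ever holds what the cache holds, its size tracks the cache size, and no separate mechanism is needed to bound it on disk.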

If the purpose of the persistent datastore is to maintain a set of available SPRs that's larger than the amount of memory committed to caching SPRs (the purpose I had in mind when writing the description for #41), things are more complicated [1]. What's the source of truth when asking the cache + datastore what SPRs it has for a CID? Retrieving from the cache and the datastore every time would defeat the purpose of the cache. Evictions from caches in level-two of the two-level cache proposed in #40 would result in inconsistency between the cache and datastore, which would defeat or at least hamper the purpose of the datastore. Limiting evictions to level-one of the cache would avoid the inconsistency problem, but with unbounded in-memory stores in level-two we'd be back to where we started.
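
The inconsistency can be illustrated with a small Python sketch. It reflects one reading of the two-level cache from #40, not its actual design: level one is assumed to map a CID to a level-two cache, level two is assumed to be a bounded LRU of SPRs, and `MAX_SPRS_PER_CID`, `add`, and `get_cache_first` are hypothetical names.

```python
from collections import OrderedDict

MAX_SPRS_PER_CID = 2  # hypothetical bound on each level-two cache

cache = {}  # level one: cid -> OrderedDict of SPRs (level two, bounded)
store = {}  # stand-in for a datastore that may hold more than the cache


def add(cid, spr):
    store.setdefault(cid, set()).add(spr)
    level_two = cache.setdefault(cid, OrderedDict())
    level_two[spr] = True
    level_two.move_to_end(spr)
    if len(level_two) > MAX_SPRS_PER_CID:
        # Evicted from the cache only; the datastore still has this SPR.
        level_two.popitem(last=False)


def get_cache_first(cid):
    # A cache hit answers without consulting the datastore.
    if cid in cache:
        return set(cache[cid])
    return set(store.get(cid, set()))


for spr in (b"spr-1", b"spr-2", b"spr-3"):
    add("cidA", spr)
```

After the three adds, a cache-first read for `cidA` yields two SPRs while the datastore holds three. The cache hit hides the extra SPR, and the only way to see it is to query the datastore as well, which is the cost the cache was meant to avoid.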


As discussed during the team call on Sep 1, changes related to codex-storage/nim-codex#227 will significantly reduce the amount of data kept in the providers store. The implication is that a simpler approach to cache + persistent datastore (the first purpose described above) should be okay.

Also note that considerations in #42 will need to inform choices made re: the cache + datastore.

Footnotes

  1. In this case, since the sizes of the cache and datastore would not be correlated, there would need to be an additional mechanism to constrain the size of the datastore on disk, which I anticipated in the notes in #41 (replace providers:Table with providers:SQLiteDatastore).
