Persistent providers store

Work is ongoing in #41 to replace the providers `Table` with `SQLiteDatastore`, which would be used in combination with an LRU cache.

There are two aspects to this issue:

**(1)**

Because the relationship between NodeId and SPRs is one-to-many, and because nim-datastore is a key-value store, we need a scheme to store multiple SPRs for the same NodeId.

At present, #41 accomplishes this via:
```
key = /[cid]/[spr-hash], value = spr-bytes
```
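To make the scheme concrete, here is a minimal Python sketch of it (Python only for brevity; the real code is Nim). Everything in it is an assumption for illustration: a plain `dict` stands in for the key-value datastore, SHA-256 stands in for whatever hash #41 uses for `spr-hash`, and the helper names are hypothetical.

```python
import hashlib


def provider_key(cid: str, spr_bytes: bytes) -> str:
    # Assumption: SHA-256 hex digest stands in for the SPR hash used in #41.
    spr_hash = hashlib.sha256(spr_bytes).hexdigest()
    return f"/{cid}/{spr_hash}"


def add_provider(store: dict, cid: str, spr_bytes: bytes) -> None:
    # One entry per (cid, spr) pair; re-adding the same SPR overwrites itself.
    store[provider_key(cid, spr_bytes)] = spr_bytes


def get_providers(store: dict, cid: str) -> list:
    # All SPRs for a CID are the values whose key starts with "/<cid>/".
    prefix = f"/{cid}/"
    return [value for key, value in store.items() if key.startswith(prefix)]
```

The per-entry overhead comes from repeating the CID in every key and storing a hash alongside each SPR, which is what the alternative suggestions in #41 try to reduce.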

In comments on #41 there are some suggestions on how to do it differently, i.e. with less overhead.

Additional discussion on this aspect can take place in comments here or in #41.

**(2)**

There needs to be a decision on the purpose of the persistent datastore in relation to the cache.

I wrote a [long comment](https://github.com/status-im/nim-libp2p-dht/issues/40#issuecomment-1232280693) in #40 exploring the problem, but it was suggested to create an issue specifically about the persistent store. The key points made in that comment are reproduced below.

Is the purpose of the persistent datastore solely to populate the cache during process restart?

Or is it instead to maintain a set of available SPRs that's larger than the amount of memory committed to caching them?

The behavior of the cache and persistent datastore combo will vary depending on the purpose.

If the purpose of the persistent datastore is solely to populate the cache during process restart, then the relationship between the cache and datastore is simplified. Adding an SPR would involve a write to both the cache and the datastore, and a cache eviction would be paired with one or more deletions in the datastore. Except for process startup, retrieving SPRs would only involve the cache. The amount of disk space used by the datastore would naturally be correlated with the size of the cache (keeping in mind flux owing to values not having a fixed size). The datastore would be slightly larger than the cache, though, owing to e.g. the overhead of using SQLite and how exactly that's done (see notes in https://github.com/status-im/nim-libp2p-dht/pull/41).
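A rough Python sketch of that first option, for illustration only. The class name, the `dict` standing in for the persistent datastore, and the choice to bound the cache by number of CIDs are all assumptions, not what #41 implements.

```python
from collections import OrderedDict


class MirroredProviders:
    """LRU cache whose contents are mirrored in a key-value store."""

    def __init__(self, store: dict, capacity: int):
        self.store = store          # stand-in for the persistent datastore
        self.capacity = capacity    # assumed bound: max number of cached CIDs
        self.cache = OrderedDict()  # cid -> {spr_hash: spr_bytes}
        # Process startup: the only time the datastore is read.
        # Assumes neither the CID nor the hash contains "/".
        for key, spr in store.items():
            _, cid, spr_hash = key.split("/")
            self.cache.setdefault(cid, {})[spr_hash] = spr

    def add(self, cid: str, spr_hash: str, spr: bytes) -> None:
        # Write to both the cache and the datastore.
        self.cache.setdefault(cid, {})[spr_hash] = spr
        self.cache.move_to_end(cid)
        self.store[f"/{cid}/{spr_hash}"] = spr
        # Each eviction is paired with deletion(s) in the datastore.
        while len(self.cache) > self.capacity:
            old_cid, old_sprs = self.cache.popitem(last=False)
            for old_hash in old_sprs:
                del self.store[f"/{old_cid}/{old_hash}"]

    def get(self, cid: str) -> list:
        # Retrieval only ever touches the cache.
        sprs = self.cache.get(cid)
        if sprs is None:
            return []
        self.cache.move_to_end(cid)
        return list(sprs.values())
```

With this shape the cache is always the source of truth, and the datastore never holds anything the cache does not.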

If the purpose of the persistent datastore is to maintain a set of available SPRs that's larger than the amount of memory committed to caching SPRs (the purpose I had in mind when writing the description for #41), things are more complicated[^1]. What's the source of truth when asking the cache + datastore what SPRs it has for a CID? Retrieving from the cache and the datastore every time would defeat the purpose of the cache. Evictions from caches in level-two of the two-level cache proposed in #40 would result in inconsistency between the cache and datastore, which would defeat or at least hamper the purpose of the datastore. Limiting evictions to level-one of the cache would avoid the inconsistency problem, but with unbounded in-memory stores in level-two we'd be back to where we started.
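The inconsistency can be shown with a small Python sketch. It is a simplification under stated assumptions: a `dict` stands in for the datastore, each CID gets its own bounded LRU of SPRs as a stand-in for level-two of the cache proposed in #40, and all function names are hypothetical.

```python
from collections import OrderedDict


def add_spr(cache: dict, store: dict, cid: str, spr_hash: str,
            spr: bytes, per_cid_limit: int) -> None:
    # The datastore keeps everything (second purpose: larger than memory).
    store[f"/{cid}/{spr_hash}"] = spr
    sprs = cache.setdefault(cid, OrderedDict())
    sprs[spr_hash] = spr
    sprs.move_to_end(spr_hash)
    # Level-two eviction: the SPR leaves memory but stays on disk.
    while len(sprs) > per_cid_limit:
        sprs.popitem(last=False)


def cached_sprs(cache: dict, cid: str) -> list:
    return list(cache.get(cid, {}).values())


def stored_sprs(store: dict, cid: str) -> list:
    prefix = f"/{cid}/"
    return [value for key, value in store.items() if key.startswith(prefix)]
```

After three adds for one CID with `per_cid_limit=2`, `cached_sprs` returns two SPRs while `stored_sprs` returns three. A cache hit alone cannot tell the caller that the answer is incomplete, which is the source-of-truth question raised above.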

[^1]: In this case, since the sizes of the cache and datastore would not be correlated, there would need to be an additional mechanism to constrain the size of the datastore on disk, which I anticipated in the notes in #41.

---

As discussed during the team call on Sep 1, changes related to https://github.com/status-im/nim-codex/issues/227 will significantly reduce the amount of data kept in the providers store. The implication is that a simpler approach to cache + persistent datastore (the first purpose described above) should be okay.

Also note that considerations in #42 will need to inform choices made re: the cache + datastore.
