chore: Implement generic settings store #3904
Conversation
LGTM
I suggest that we add a generic version check at the store level to ensure basic consistency (just to avoid re-implementing the same thing everywhere) – definitely out of scope of the PR. I'm happy to discuss it and help with the implementation
I think this is a good approach overall.
Perhaps there's some possibility to tweak the API a bit, but certainly this is not a blocker. In particular, I do like the idea of providing R or RW transaction functions to read or mutate the underlying store.
For example,
// We could use *Collection[T] instead
// └─────────────────┐
//                   ▼
type ReadTxn[T any] func(context.Context, T) error

func (s *GenericStore[T, H]) Read(ctx context.Context, txn ReadTxn[T]) error {
    s.rw.RLock()
    defer s.rw.RUnlock()
    // Do other bookkeeping tasks, like making sure the store is fresh, etc.
    return txn(ctx, s.store)
}
We could define a similar RW transaction function:
type WriteTxn[T any] func(context.Context, T) error
which would roll back on errors.
I'm not sure how the generational updates will play into such an API change, but being able to execute arbitrary code against a known valid state of a GenericStore could provide quite a bit of flexibility to users of GenericStore.
// reset cache
s.cache = nil
Do we have to reset the cache on a flush? It seems like we should be able to keep the cache in memory after the flush and avoid another read from the bucket.
Honestly, I think that resetting cache here is not strictly required (it's just a safety measure), because we already cache all reads.
I was hesitant to say this, but I believe that the cache here might be a source of bugs that are particularly difficult to investigate. I would propose deleting it altogether:
- Read-intensive clients should have a cache on their side either way (just to avoid decoding overhead and network calls).
- Write-intensive clients do not benefit from the cache.
The reason is that we cannot guarantee the absence of coexisting cache instances, so cache coherence cannot be guaranteed either (without employing very complex protocols). We largely mitigate this with the version/generation check, but there are many more issues that we do not (or cannot) handle.
Even the way the service is deployed allows for multiple instances to co-exist.
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
It should be Recreate, I believe (even then there are no guarantees; it just diminishes the probability). The problem is that the deployment becomes disruptive, and the clients must be able to deal with that. I'm not sure if it's a good trade-off.
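For reference, the suggested change to the deployment spec could look like this. This is a sketch assuming a standard Kubernetes Deployment: with `Recreate`, all old pods are terminated before new ones start, so two versions never run side by side, at the cost of a brief unavailability window during rollout.

```yaml
strategy:
  type: Recreate
```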
I do not want to begin a conversation on CRDT to delay shipment of the current implementation (as it's already good, I believe), but we could employ some of the techniques.
I am not disagreeing; as long as we have little to no usage, those problems are theoretical. While Recreate is better, it makes us unavailable rather than serving potentially outdated information. We have to choose one of the two without investing significantly more into this.
I think eventually we need to either follow external options (xSQL) or approach it like the metadata store; neither is a good investment of our time at this point.
Not currently used, but proposed in the PR
Extracting locking and caching into a more generically usable package. This can be used for CRUD of the RecordingRules.