
chore: Implement generic settings store #3904

Merged
merged 8 commits into grafana:main on Feb 21, 2025

Conversation

@simonswine (Contributor) commented Feb 12, 2025

Extracting locking and caching into a more generically usable package. This can be used for CRUD of the RecordingRules.
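Roughly, the shape this enables is a store that serializes access with an RWMutex and keeps the last decoded state in memory so repeated reads do not have to hit the backing bucket. A minimal sketch (the type parameters, field names, and methods here are illustrative only, not the PR's actual API):

package store

import (
	"context"
	"sync"
)

// GenericStore is a sketch of a generically usable settings store: access is
// serialized with an RWMutex, and the last decoded state is cached in memory.
type GenericStore[T any] struct {
	rw    sync.RWMutex
	load  func(context.Context) (T, error) // hypothetical loader from the bucket
	cache *T                               // nil means the next read reloads from the bucket
}

// Get returns the cached state, loading it from the bucket on a cold cache.
func (s *GenericStore[T]) Get(ctx context.Context) (T, error) {
	s.rw.Lock()
	defer s.rw.Unlock()
	if s.cache == nil {
		v, err := s.load(ctx)
		if err != nil {
			var zero T
			return zero, err
		}
		s.cache = &v
	}
	return *s.cache, nil
}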

@simonswine marked this pull request as ready for review February 14, 2025 17:30
@simonswine requested a review from a team as a code owner February 14, 2025 17:30
@kolesnikovae (Collaborator) left a comment

LGTM

I suggest that we add a generic version check at the store level to ensure basic consistency (just to avoid re-implementing the same thing everywhere) – definitely out of scope for this PR. I'm happy to discuss it and help with the implementation.
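As a sketch of what such a store-level version check could look like (illustrative only, not part of this PR): the payload persisted to the bucket carries a generation counter next to the data, and writes derived from a stale generation are rejected.

// Illustrative sketch only: the state stored in the bucket wraps the payload
// together with a generation counter.
type versioned[T any] struct {
	Generation int64 `json:"generation"`
	Data       T     `json:"data"`
}

// A store-level write would load the current versioned state, apply the
// mutation, bump the generation, and refuse to persist if the generation in
// the bucket no longer matches the one it started from.
func stale(inBucket, startedFrom int64) bool {
	return inBucket != startedFrom
}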

@bryanhuhta (Contributor) left a comment

I think this is a good approach overall.

Perhaps there's room to tweak the API a bit, but this is certainly not a blocker. In particular, I do like the idea of providing R or RW transaction functions to read or mutate the underlying store.

For example,

// We could use *Collection[T] instead of T as the transaction argument.
type ReadTxn[T any] func(context.Context, T) error

func (s *GenericStore[T, H]) Read(ctx context.Context, txn ReadTxn[T]) error {
	s.rw.RLock()
	defer s.rw.RUnlock()

	// Do other bookkeeping tasks, like making sure the store is fresh, etc.

	return txn(ctx, s.store)
}

We could define a similar RW transaction function:

type WriteTxn[T any] func(context.Context, T) error

which would roll back on errors.
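A rough sketch of how such a rollback-on-error write could look (not the PR's API; Clone and the store field are hypothetical, and T would need a constraint providing Clone):

func (s *GenericStore[T, H]) Write(ctx context.Context, txn WriteTxn[T]) error {
	s.rw.Lock()
	defer s.rw.Unlock()

	// Hand the transaction a copy of the current state and only commit it if
	// the transaction succeeds, so an error effectively rolls everything back.
	scratch := s.store.Clone() // hypothetical deep copy; assumes T provides Clone() T
	if err := txn(ctx, scratch); err != nil {
		return err // scratch is discarded; s.store is untouched
	}
	s.store = scratch
	return nil
}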

I'm not sure how the generational updates will play into such an API change, but being able to execute arbitrary code against a known-valid state of a GenericStore could give its users quite a bit of flexibility.

Comment on lines +260 to +261
// reset cache
s.cache = nil
Contributor left a comment
Do we have to reset the cache on a flush? It seems like we should be able to keep the cache in memory after the flush and avoid another read from the bucket.
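A sketch of that alternative (names are hypothetical, not the PR's code): keep the freshly flushed state as the cache, since it is exactly what the bucket now contains.

func (s *GenericStore[T, H]) flush(ctx context.Context, state T) error {
	if err := s.writeToBucket(ctx, state); err != nil { // hypothetical bucket write
		return err
	}
	s.cache = &state // keep serving the state that was just persisted
	return nil
}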

Collaborator left a comment

Honestly, I think that resetting the cache here is not strictly required (it's just a safety measure), because we already cache all reads.

I was hesitant to say this, but I believe that the cache here might be a source of bugs that are particularly difficult to investigate. I would propose deleting it altogether:

  1. Read-intensive clients should have a cache on their side either way (just to avoid decoding overhead and network calls).
  2. Write-intensive clients do not benefit from the cache.

The reason is that we cannot guarantee the absence of coexisting cache instances, so cache coherence cannot be guaranteed either (without employing very complex protocols). We largely mitigate this with the version/generation check, but there are many more issues that we cannot or do not handle.

Even the way the service is deployed allows for multiple instances to co-exist.

  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate

It should be Recreate, I believe (even then there are no guarantees, it just diminishes the probability). The problem is that the deployment becomes disruptive, and the clients must be able to deal with that. I'm not sure it's a good trade-off.
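For reference, that would mean changing the rollout strategy along these lines (only the strategy block shown):

  strategy:
    type: Recreate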

I do not want to start a conversation on CRDTs and delay shipping the current implementation (as it's already good, I believe), but we could employ some of those techniques.

@simonswine (Author) left a comment

I am not disagreeing; as long as we have little to no usage, those problems are theoretical. While Recreate is better, it would make us unavailable rather than serving potentially outdated information; we have to choose one of the two without investing significantly more into this.

I think eventually we need to either follow external options (xSQL) or approach it like the metadata store; neither of those is a good investment of our time at this point.

Not currently used, but proposed in the PR
@simonswine enabled auto-merge (squash) February 21, 2025 09:13
@simonswine merged commit 9658bce into grafana:main Feb 21, 2025
21 checks passed