My rough idea for the chaos pipeline is the following. That said, I’m completely open to other suggestions; I just wanted to share what I have come up with so far:
3-node cluster with 3x replication
Ingest 10 tenants (arbitrary number)
Shut down the whole cluster
Strategically corrupt tenants so that every node holds some corrupt tenants, yet every tenant always keeps a QUORUM of uncorrupted replicas. In other words, never corrupt the same tenant on more than one node: with 3x replication, QUORUM needs 2 of the 3 replicas intact
Start the cluster
Cluster must start up
All 10 tenants must be usable with QUORUM operations
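The corruption step above can be planned before any files are touched. The following is a minimal sketch, not the pipeline itself: `plan_corruption`, the tenant names, and the node names are all hypothetical. It assumes that corrupting each tenant on exactly one node is acceptable, which is only one way to satisfy the "QUORUM left uncorrupted" rule.

```python
import random
from collections import defaultdict


def plan_corruption(tenants, nodes, seed=None):
    """Map each node to the tenants whose local replica gets corrupted there.

    Hypothetical helper: every tenant is assigned to exactly one node, so with
    3x replication two of its three replicas (a QUORUM) stay untouched.
    """
    if len(tenants) < len(nodes):
        raise ValueError("need at least one tenant per node")
    rng = random.Random(seed)  # seedable, so a failing run can be replayed
    shuffled = list(tenants)
    rng.shuffle(shuffled)
    plan = defaultdict(list)
    # Round-robin over the shuffled tenants: every node receives at least one.
    for i, tenant in enumerate(shuffled):
        plan[nodes[i % len(nodes)]].append(tenant)
    return dict(plan)


# Assumed names, for illustration only.
tenants = [f"tenant-{i}" for i in range(10)]
nodes = ["node-1", "node-2", "node-3"]
plan = plan_corruption(tenants, nodes, seed=42)
for node, victims in plan.items():
    print(node, victims)
```

Logging the seed with each run would make a failure reproducible, since the same seed yields the same plan.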
How do you corrupt a tenant?
From what I understand there are two ways to corrupt tenants:
Randomly overwrite portions of (or truncate) a *.db file in the LSM store. Any file should do. By randomly picking one we increase the chances of finding new bugs
Apply the same pattern to the vector index (HNSW commit log files)
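Both corruption methods come down to "pick a random file, then damage it". Below is a minimal sketch of that idea, assuming the cluster is stopped and the pipeline has direct access to each node's data directory. `pick_random_file` and `corrupt_file` are hypothetical helpers, and the glob pattern is a parameter because the exact on-disk file names are not specified here.

```python
import os
import random
from pathlib import Path


def pick_random_file(root, pattern, rng):
    """Pick one non-empty file under `root` matching `pattern` (e.g. "*.db")."""
    candidates = sorted(
        p for p in Path(root).rglob(pattern) if p.is_file() and p.stat().st_size > 0
    )
    if not candidates:
        raise FileNotFoundError(f"no non-empty files matching {pattern!r} under {root}")
    return rng.choice(candidates)


def corrupt_file(path, rng, mode=None, max_chunk=4096):
    """Truncate the file or scramble a random byte range; returns the mode used."""
    size = os.path.getsize(path)
    if size == 0:
        raise ValueError("cannot corrupt an empty file")
    mode = mode or rng.choice(["truncate", "overwrite"])
    if mode == "truncate":
        # Cut the file to a strictly smaller length (possibly zero).
        os.truncate(path, rng.randrange(0, size))
    elif mode == "overwrite":
        length = rng.randint(1, min(max_chunk, size))
        offset = rng.randrange(0, size - length + 1)
        with open(path, "r+b") as f:
            f.seek(offset)
            original = f.read(length)
            # XOR with a non-zero mask so every byte in the range really changes.
            scrambled = bytes(b ^ rng.randint(1, 255) for b in original)
            f.seek(offset)
            f.write(scrambled)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return mode
```

For the LSM store the pattern would be `"*.db"`. For the HNSW commit logs, pass whatever pattern matches those files on disk. Using a seeded `random.Random` instance as `rng` keeps the chosen file and byte range reproducible.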
Background
This broke during a v1.24 -> v1.25 upgrade, which highlights that we didn't have regression testing for it.
Related Core tickets: