forked from apache/cassandra
CNDB-12290 CC5 fix to use TrieMemtable as default in MemtableParams #1481
Open: djatnieks wants to merge 131 commits into main-5.0 from CNDB-12290
This additional metadata is somewhat valuable in the context of troubleshooting. Recently, we had an issue where the checksum itself was not (over)written and so it was stored as 0. In many cases, this won't be helpful, but since it is cheap and could be helpful, I propose adding some additional metadata when checksums don't match.
* Implement FSError#getMessage to ensure file name is logged

For this code block:

```java
var t = new FSWriteError(new IOException("Test failure"), new File("my", "file"));
logger.error("error", t);
```

We used to log:

```
ERROR [main] 2024-09-19 11:09:18,599 VectorTypeTest.java:118 - error
org.apache.cassandra.io.FSWriteError: java.io.IOException: Test failure
    at org.apache.cassandra.index.sai.cql.VectorTypeTest.endToEndTest(VectorTypeTest.java:117)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
    at org.junit.runners.Suite.runChild(Suite.java:128)
    at org.junit.runners.Suite.runChild(Suite.java:27)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
    at com.intellij.rt.junit.IdeaTestRunner$Repeater$1.execute(IdeaTestRunner.java:38)
    at com.intellij.rt.execution.junit.TestsRepeater.repeat(TestsRepeater.java:11)
    at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:35)
    at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:232)
    at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:55)
Caused by: java.io.IOException: Test failure
    ... 42 common frames omitted
```

Now we will log:

```
ERROR [main] 2024-09-19 11:10:02,910 VectorTypeTest.java:118 - error
org.apache.cassandra.io.FSWriteError: my/file
    at org.apache.cassandra.index.sai.cql.VectorTypeTest.endToEndTest(VectorTypeTest.java:117)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
    at org.junit.runners.Suite.runChild(Suite.java:128)
    at org.junit.runners.Suite.runChild(Suite.java:27)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
    at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
    at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:69)
    at com.intellij.rt.junit.IdeaTestRunner$Repeater$1.execute(IdeaTestRunner.java:38)
    at com.intellij.rt.execution.junit.TestsRepeater.repeat(TestsRepeater.java:11)
    at com.intellij.rt.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:35)
    at com.intellij.rt.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:232)
    at com.intellij.rt.junit.JUnitStarter.main(JUnitStarter.java:55)
Caused by: java.io.IOException: Test failure
    ... 42 common frames omitted
```

* Add super.getMessage to message
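The idea behind the `getMessage` change can be illustrated with a minimal sketch. `FileError` below is a hypothetical stand-in, not the actual `FSError` code, and the exact message format (path, then `": "`, then the superclass message) is an assumption made for illustration only.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical stand-in for an FSError-style class; not the actual Cassandra code.
class FileError extends RuntimeException {
    private final File file;

    FileError(Throwable cause, File file) {
        super(cause);
        this.file = file;
    }

    // Put the file path first, then keep whatever the superclass would have reported.
    // The "path: message" format is an assumption for this sketch.
    @Override
    public String getMessage() {
        return file + ": " + super.getMessage();
    }
}

class FileErrorDemo {
    // Builds the error the same way as the snippet in the commit message and
    // returns the header line a logger would print via toString().
    static String describe() {
        Throwable t = new FileError(new IOException("Test failure"), new File("my", "file"));
        return t.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Because `Throwable#toString` is built from `getMessage`, overriding `getMessage` is enough for the path to appear in the first line of a logged stack trace, alongside the cause text.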
* Use query view's locked indexes for Plan#estimateAnnNodesVisited

This commit doesn't resolve the underlying problem in the design: we could easily use the wrong reference at any time. I'll need to think on this a bit more to know what is best.

* Assert queryView is not null
This commit fixes a serious correctness bug in the way we build RowFilter for expressions involving OR. If a query contained multiple complex predicates such as NOT IN joined with the OR operator, the slices produced by NOT IN were incorrectly also joined by OR instead of by AND. In addition, a NOT IN with an empty list, if ORed with another expression, was incorrectly treated as an expression matching 0 rows, instead of matching all rows.

Example 1: SELECT * FROM t WHERE x = 1 OR x NOT IN (2, 3, 4) was incorrectly matching all rows, including the ones with x = 2 or x = 3 or x = 4.

Example 2: SELECT * FROM t WHERE x = 1 OR x NOT IN () was incorrectly matching only the row with x = 1, instead of all rows.

The bug was technically not limited to NOT IN; it affected any single restriction that added either zero or more than one filter expression.

Fixes riptano/cndb#10923
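The intended semantics can be sketched with plain predicates. This is only an illustration of the boolean logic, not the RowFilter code; the class and method names are made up for the example.

```java
import java.util.List;
import java.util.function.IntPredicate;

// Illustrative only: models the intended boolean semantics, not the RowFilter implementation.
class NotInSemantics {
    // x NOT IN (v1, v2, ...) is the conjunction x != v1 AND x != v2 AND ...
    // An empty list yields the empty conjunction, which matches every row.
    static IntPredicate notIn(List<Integer> values) {
        IntPredicate p = x -> true;
        for (int v : values) {
            p = p.and(x -> x != v);
        }
        return p;
    }

    static IntPredicate eq(int value) {
        return x -> x == value;
    }

    public static void main(String[] args) {
        // Example 1: x = 1 OR x NOT IN (2, 3, 4)
        IntPredicate example1 = eq(1).or(notIn(List.of(2, 3, 4)));
        // Example 2: x = 1 OR x NOT IN ()
        IntPredicate example2 = eq(1).or(notIn(List.of()));
        for (int x = 1; x <= 5; x++) {
            System.out.println("x=" + x + " example1=" + example1.test(x) + " example2=" + example2.test(x));
        }
    }
}
```

With the inner terms joined by AND, Example 1 rejects x = 2, 3 and 4 while accepting 1 and 5, and Example 2 accepts every value.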
Fix typo in the comment Co-authored-by: Andrés de la Peña <[email protected]>
Javadoc Co-authored-by: Andrés de la Peña <[email protected]>
…n overridable method… (#1296) This commit simply moves the code that deletes index components locally when an index is dropped into a new method of the `SSTableWatcher` interface. As is, this just creates a minor indirection without changing any behaviour, but it allows custom implementations of `SSTableWatcher` to modify this behaviour, typically to handle tiered storage concerns.
* Added unified_compaction.override_ucs_config_for_vector_tables option. When enabled, the Controller will use the preferred vector settings for vector tables. - Added additional options for vector specific configuration.
…hards (#1255) Implement `ShardManagerReplicaAware` to align UCS and replica shards and thus limit the amount of sstables that are partially owned by replicas. The most interesting details are in the `IsolatedTokenAllocator#allocateTokens` and the `ShardManagerReplicaAware#computeBoundaries` methods. In the `allocateTokens` method, we take the current token metadata for a cluster, replace the snitch with one that does not gossip, and allocate new nodes until we satisfy the desired `additionalSplits` needed. By using the token allocation algorithm, high level split points naturally align with replica shards as new nodes are added. In `computeBoundaries`, we allocate any tokens needed, then we split the space into even spans and find the nearest replica token boundaries.
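The last step described for `computeBoundaries` (even spans, then snapping to the nearest replica token) can be sketched as follows. This is a simplified illustration under stated assumptions: tokens are plain `long` values, the token allocation step is omitted, and the class name and signature are hypothetical rather than those of `ShardManagerReplicaAware`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Simplified illustration; not the ShardManagerReplicaAware implementation.
class ShardBoundarySketch {
    // Splits [min, max) into shardCount even spans, then moves each interior split
    // point to the nearest token in replicaTokens (ties go to the lower token).
    static List<Long> computeBoundaries(long min, long max, int shardCount, TreeSet<Long> replicaTokens) {
        List<Long> boundaries = new ArrayList<>();
        long span = (max - min) / shardCount;
        for (int i = 1; i < shardCount; i++) {
            long ideal = min + i * span;
            Long below = replicaTokens.floor(ideal);
            Long above = replicaTokens.ceiling(ideal);
            long nearest;
            if (below == null && above == null) {
                nearest = ideal; // no replica tokens at all: keep the even split
            } else if (below == null) {
                nearest = above;
            } else if (above == null) {
                nearest = below;
            } else {
                nearest = (ideal - below <= above - ideal) ? below : above;
            }
            boundaries.add(nearest);
        }
        return boundaries;
    }

    public static void main(String[] args) {
        TreeSet<Long> tokens = new TreeSet<>(List.of(100L, 240L, 510L, 700L, 900L));
        System.out.println(computeBoundaries(0L, 1000L, 4, tokens));
    }
}
```

The sketch does not deduplicate boundaries when two split points snap to the same token, and it ignores overflow for ranges close to the full `long` span; a real implementation would need to handle both.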
CNDB-10988: inspect out-of-space exception on compaction
- count evictions and not bytes
- add metrics by removal cause
…nts of a collection and/or UDT Port of DB-1289/CASSANDRA-8877
) CNDB-10945: Change calculation of sstable span for small sstables In addition to correcting for very small spans, this also corrects sstable spans for ones having a small number of partitions where keys can easily fall in a small range. For these cases we use a value derived from the number of partitions in the table, which should work well for all scenarios, including wide-partition tables with a very limited number of partitions.
The eagerly populated leafNodeToLeafFP TreeMap has been replaced with a LeafCursor, which allows traversing the index tree directly and lazily. The change significantly reduces the amount of up-front work we did to initialize the BKDReader.IteratorState. It also reduces GC pressure and memory usage. The user-facing effect is that `ORDER BY ... LIMIT ...` queries using a numeric index (KD-tree) are significantly faster. Fixes riptano/cndb#11021
Implements Stage 2 of Trie memtables, implementing trie-backed partitions and extending the memtable map to the level of tries. This stage still handles deletions with the legacy mechanisms (RangeTombstoneList etc) but can save quite a bit of space for the B-Tree partition-to-row maps.

Also includes:
- Code alignment with the Cassandra 5.0 branch.
- A port of the OSS50 byte-comparable encoding version.
- Fixed version preencoded byte-comparables for version safety.
- Duplication of byte sources and better toArray conversion.
- Direct skipping mechanism for trie cursors and map-like get method.
- Forced node copying mechanism for trie mutations for atomicity and consistency.
- Pluggable cell reuse.
- Prefix and tail tries, as well as filtered iteration.
- A mechanism for extracting current key during trie mutation.
- Volatile reads to fully enforce happens-before between node preparation and use.
- Various in-memory trie improvements.

The earlier trie memtable implementation is still available as TrieMemtableStage1.
* DefaultMemtableFactory: align entire implementation with main branch
* TrieMemtable: restore FACTORY instance var and factory(Map) method
* TrieMemoryIndex: add previously missed use of BYTE_COMPARABLE_VERSION in rangeMatch method
…gregate and use picked sstables size as maxOverlap for aggregate (#1309) CNDB-10990: include archive size in Level.avg and use maxOverlap for unified aggregate update getOverheadSizeInBytes to include non-data components size Add config to disable new behavior, default enabled. add tests
…aybe we still want to check newer version for JDK22 specifically. Though this is the last ecj version to support JDK11.

Upgrade:
- ecj plus fix the java udf functions for JDK11+
- snakeyaml - it was already bumped in CNDB for security vulnerability
- test dependencies: jacoco, byteman - higher version than CNDB but it is needed for catching up on JDK22 in tests
- findbugs - aligned with CNDB version but we probably want at some point to get to major version upgrade; not a priority for now
- jmh, bytebuddy - bumped to latest versions as they are known for not working on newer JDK versions
This commit improves performance of appending SAI components by avoiding unnecessary computation of CRC from the beginning of the file each time it is opened for appending. Fixes riptano/cndb#10783
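The commit message does not say how the checksum state is carried across appends, so the following is only a sketch of the underlying property: a CRC can be extended with newly appended bytes and yields the same value as recomputing it over the whole content. It uses `java.util.zip.CRC32` within a single process; `RunningChecksum` is a hypothetical name, and restoring checksum state after a file is reopened is outside the scope of the sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Illustrative only: shows that a CRC can be updated incrementally as data is appended.
class RunningChecksum {
    private final CRC32 crc = new CRC32();

    // Folds only the newly appended bytes into the running checksum.
    void append(byte[] chunk) {
        crc.update(chunk, 0, chunk.length);
    }

    long value() {
        return crc.getValue();
    }

    // Baseline for comparison: recompute the checksum over the entire content.
    static long fullRecompute(byte[] all) {
        CRC32 c = new CRC32();
        c.update(all, 0, all.length);
        return c.getValue();
    }

    public static void main(String[] args) {
        RunningChecksum running = new RunningChecksum();
        running.append("hello ".getBytes(StandardCharsets.UTF_8));
        running.append("world".getBytes(StandardCharsets.UTF_8));
        long full = fullRecompute("hello world".getBytes(StandardCharsets.UTF_8));
        System.out.println(running.value() == full);
    }
}
```

The cost of `append` depends only on the size of the new chunk, whereas `fullRecompute` grows with the size of the file, which is the kind of work the commit aims to avoid.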
Expose some methods needed for riptano/cndb#12128
Unbounded queue length at the native transport stage can cause large backlogs of requests that the system still processes, even though clients may no longer expect a response. This PR implements a limited backport of CNDB-11070, introducing the notion of a native request timeout that can shed messages with excessive queue times at the NTR stage as well as async read/write stages, if enabled. Cross-node message timeouts are also now respected earlier in the mutation verb handler. This is a fairly straightforward cherry-pick of #1393 targeting main instead of cc-main-migration-release.
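A minimal sketch of shedding by queue time, assuming each request records when it was enqueued and is dropped if it waited longer than a configured timeout. The class, its fields and the explicit clock parameters are all hypothetical and chosen to keep the example deterministic; this is not the native transport code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Illustrative only: drops requests whose time in the queue exceeds a timeout.
class SheddingQueue {
    // Hypothetical request record: payload plus enqueue time in nanoseconds.
    static final class Request {
        final String payload;
        final long enqueuedAtNanos;

        Request(String payload, long enqueuedAtNanos) {
            this.payload = payload;
            this.enqueuedAtNanos = enqueuedAtNanos;
        }
    }

    private final Queue<Request> queue = new ArrayDeque<>();
    private final long timeoutNanos;
    private int shed;

    SheddingQueue(long timeoutNanos) {
        this.timeoutNanos = timeoutNanos;
    }

    // The caller supplies the current time so the sketch stays deterministic.
    void enqueue(String payload, long nowNanos) {
        queue.add(new Request(payload, nowNanos));
    }

    // Returns the payloads still worth processing; requests that waited longer
    // than the timeout are counted as shed and skipped.
    List<String> drain(long nowNanos) {
        List<String> live = new ArrayList<>();
        Request r;
        while ((r = queue.poll()) != null) {
            if (nowNanos - r.enqueuedAtNanos > timeoutNanos) {
                shed++;
            } else {
                live.add(r.payload);
            }
        }
        return live;
    }

    int shedCount() {
        return shed;
    }

    public static void main(String[] args) {
        SheddingQueue q = new SheddingQueue(100);
        q.enqueue("a", 0);
        q.enqueue("b", 950);
        System.out.println(q.drain(1000) + " shed=" + q.shedCount());
    }
}
```

Checking the queue time at dequeue means no work is spent on a request whose client has most likely already given up.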
This reverts commit c06c94c. It seems the removal of `Index#postProcessor` by CNDB-11762 broke some tests in CNDB's `MultiNodeBillingTest`. Unfortunately that patch [was merged before creating the PR bumping the CC version used by CNDB](#1422 (comment)). [The CNDB PR](riptano/cndb#12076) was created after that merging but it was superseded by other CC version bumps. So I'm adding this reversal so we can investigate how the removal of `Index#postProcessor` affects those tests.
This patch replaces null values of `deterministic`, `monotonic` and `monotonic_on` columns in `system_schema.functions` and `system_schema.aggregates` with negative defaults. These defaults will be addressed if/once DB-672 gets ported to CC.
There are two mechanisms for detecting that the cluster is in the upgrade state and what the minimum version is. They differ slightly, and neither is pluggable, which means that CNDB doesn't work properly with them. Those mechanisms are implemented in `Gossiper`. Although we do not use `Gossiper` in CNDB, there are classes like `ColumnFilter` which go to `Gossiper` to check the upgrade state. So far, using those mechanisms in CNDB was a bit unpredictable: some of them reported that the cluster was upgraded and on the current version, while others did not. This turned out to be a problem, especially for the `ColumnFilter`, because when we upgrade DSE --> CC, CC assumes that the newest filter version should be used, which is not correctly deserialized and interpreted by DSE. The fix is not small, but it probably simplifies things a bit. First of all, the two mechanisms are merged into one. Moreover, we made the merged mechanism pluggable so that we can provide the appropriate implementation in CNDB coordinators and writers, which is based on ETCD.
Part of riptano/cndb#12139 Moves constant shard count outside looping shards to reduce confusion.
…with DurationSpec type and 'native_transport_timeout_in_ms' as convertible old name with Long type; add some tests.
…MemtableIndexTest and TrieMemtableIndexAllocationsHeapBuffersTest from main branch.
…strictions (#1449) Closes riptano/cndb#12139 This PR adds a test of row count of a SAI plan in the presence of restrictions. Currently it tests queries with inequality, equality and half-ranges on different SAI column types and with or without histograms.
…g VIntOutOfRangeException to the catch block of SSTableIdentityIterator.exhaust method.
…pactionProgress to return the operation type from the first entry in the shared progress; needed in cases that a CompactionTask type is changed after creation.
…opriate (#1469) Fixes riptano/cndb#12239 We found that System.nanoTime was consuming significant CPU, but because the timeout is high enough, we can accept the inaccuracy.
- [ ] Make sure there is a PR in the CNDB project updating the Converged Cassandra version
- [ ] Use `NoSpamLogger` for log lines that may appear frequently in the logs
- [ ] Verify test results on Butler
- [ ] Test coverage for new/modified code is > 80%
- [ ] Proper code formatting
- [ ] Proper title for each commit starting with the project-issue number, like CNDB-1234
- [ ] Each commit has a meaningful description
- [ ] Each commit is not very long and contains related changes
- [ ] Renames, moves and reformatting are in distinct commits
Fixes regression in jvector 3.0.4 when compacting PQVectors larger than 2GB
### What is the issue
SimpleClientPerfTest has been failing in CI since changes from CNDB-10759

### What does this PR fix and why was it fixed
This change in `SimpleClientPerfTest` updates the anonymous class `Message.Codec<QueryMessage>` to override the correct method, `public CompletableFuture<Response> maybeExecuteAsync` from `QueryMessage`, whose signature was changed as part of CNDB-10759.

### Checklist before you submit for review
- [ ] Make sure there is a PR in the CNDB project updating the Converged Cassandra version
- [ ] Use `NoSpamLogger` for log lines that may appear frequently in the logs
- [ ] Verify test results on Butler
- [ ] Test coverage for new/modified code is > 80%
- [ ] Proper code formatting
- [ ] Proper title for each commit starting with the project-issue number, like CNDB-1234
- [ ] Each commit has a meaningful description
- [ ] Each commit is not very long and contains related changes
- [ ] Renames, moves and reformatting are in distinct commits
…ing for async batchlog removal (#1485) The test asserts that the batchlog is removed immediately after the write completes, but removal of the batchlog is async and can be delayed, particularly in resource-limited environments like CI.
The test generates partition index accesses by reusing the same key, and if the key cache is enabled, the test will fail for bigtable profiles because the key will be in the key cache.
…by filtering queries (#1484) Queries creating fake index contexts each create their own context, which can then race on metric registration (as the metrics have the same patterns). This can cause a query to fail. These metrics are superfluous, so we can skip creating them entirely.
… index format version 'dx', 'cx', or older.
…ate.accumulatedDataSize; it only worked to fix SensorsWriteTest.testMultipleRowsMutationWithClusteringKey for SkipListMemtable and may not be necessary if the default memtable becomes TrieMemtable. Revisit SensorsWriteTest later if necessary.
Move static class TrieMemtable.Factory to TrieMemtableFactory class; Use suggested TriePartitionUpdate.unsharedHeapSize implementation; Use InMemoryTrie.shortLived in TrieToDotTest and TrieToMermaidTest; Add specific versions aa, ca, and da to RowIndexTest;
Add addMemoryUsageTo in SkipListMemtable and TrieMemtable Add TrieMemtable.switchOut
What is the issue
Update CC 5.0 to use TrieMemtable by default in MemtableParams to align with main branch.
What does this PR fix and why was it fixed
Updates MemtableParams to use DefaultMemtableFactory, which uses TrieMemtable, instead of using SkipListMemtableFactory.
Checklist before you submit for review
- Use `NoSpamLogger` for log lines that may appear frequently in the logs