Improve performance of SingleRestrictionEstimatedRowCountTest #1502

Merged
merged 4 commits into from
Jan 24, 2025
@@ -20,9 +20,10 @@

 import java.math.BigDecimal;
 import java.math.BigInteger;
+import java.util.AbstractMap;
+import java.util.HashMap;
+import java.util.Map;

-import org.junit.After;
-import org.junit.Before;
 import org.junit.Test;

 import org.apache.cassandra.Util;
@@ -44,7 +45,9 @@

 public class SingleRestrictionEstimatedRowCountTest extends SAITester
 {
-    private int queryOptLevel;
+    static protected Map<Map.Entry<Version, CQL3Type.Native>, ColumnFamilyStore> tables = new HashMap<>();
+    static Version[] versions = new Version[]{ Version.DB, Version.EB };
+    static CQL3Type.Native[] types = new CQL3Type.Native[]{ INT, DECIMAL, VARINT };

     static protected Object getFilterValue(CQL3Type.Native type, int value)
     {
@@ -61,69 +64,75 @@ static protected Object getFilterValue(CQL3Type.Native type, int value)
         return null;
     }

-    @Before
-    public void setup()
+    static Map.Entry<Version, CQL3Type.Native> tablesEntryKey(Version version, CQL3Type.Native type)
     {
-        queryOptLevel = QueryController.QUERY_OPT_LEVEL;
-        QueryController.QUERY_OPT_LEVEL = 0;
-    }
-
-    @After
-    public void teardown()
-    {
-        QueryController.QUERY_OPT_LEVEL = queryOptLevel;
+        return new AbstractMap.SimpleEntry<>(version, type);
     }

     @Test
-    public void testInequality()
+    public void testMemtablesSAI()
     {
-        var test = new RowCountTest(Operator.NEQ, 25);
+        createTables();
+
+        RowCountTest test = new RowCountTest(Operator.NEQ, 25);
         test.doTest(Version.DB, INT, 97.0);
         test.doTest(Version.EB, INT, 97.0);
         // Truncated numeric types planned differently
         test.doTest(Version.DB, DECIMAL, 97.0);
         test.doTest(Version.EB, DECIMAL, 97.0);
         test.doTest(Version.EB, VARINT, 97.0);
Comment on lines 78 to 83

Suggestion: SAITester supports versioning. Why not use that feature instead of manually passing the version?

Author

@k-rus k-rus Jan 21, 2025

> Suggestion: SAITester supports versioning. Why not use that feature instead of manually passing the version?

I don't understand how it could be used here. As I understand it, it would require rearranging the test cases per SSTable version, which would make the tests less useful: it would become impossible to see the count differences per restriction. Passing the version manually also makes it possible to see how different versions affect the count.
What am I missing in your proposal?

You'd still see different counts, because there would have to be a check like if (version.onOrAfter(Version.EB)) wherever versions differ. The upside is that it would automatically test all other versions, and you'd get tests for new versions for free if they don't change anything. Just add a version to the list of versions and voilà, the test runs on the new format.

But it's up to you. I'm not insisting, that's why it was a suggestion.
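
A minimal sketch of the version-list idea from this comment, not the actual SAITester mechanism. `FormatVersion` is a hypothetical stand-in for the real `Version` class so that the example is self-contained, and `NEXT` is an invented future format. The expected counts are the ones this PR asserts for `Operator.EQ` (15 on DB, 0 on EB); treating every later version like EB is an assumption.

```java
import java.util.EnumMap;
import java.util.Map;

class VersionedExpectations
{
    // Hypothetical stand-in for the real Version class; declaration order is format order.
    enum FormatVersion
    {
        DB, EB, NEXT; // NEXT is an invented future format

        boolean onOrAfter(FormatVersion other)
        {
            return compareTo(other) >= 0;
        }
    }

    // The only place where versions differ: one check, as suggested in the comment.
    static double expectedEqualityRows(FormatVersion version)
    {
        return version.onOrAfter(FormatVersion.EB) ? 0.0 : 15.0;
    }

    // Every version in the list gets an expectation without a per-version test body.
    static Map<FormatVersion, Double> expectedForAllVersions()
    {
        Map<FormatVersion, Double> expected = new EnumMap<>(FormatVersion.class);
        for (FormatVersion version : FormatVersion.values())
            expected.put(version, expectedEqualityRows(version));
        return expected;
    }

    public static void main(String[] args)
    {
        expectedForAllVersions().forEach((version, rows) -> System.out.println(version + " -> " + rows));
    }
}
```

The trade-off raised in the thread is visible here: adding a version costs one enum constant, but the per-version counts now live inside a conditional instead of being listed side by side.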

Author

As I mentioned in my previous comment, your suggestion would hide an important distinction: different versions calculate row counts differently. One difference comes from maintaining cached histograms for the latest version. Different index data formats could also be a reason, but that wasn't observed and wasn't exhaustively tested.

@pkolaczk What functional requirement for the test motivates your suggestion? Is it that not all versions are tested and, more specifically, that a newly introduced version would not be covered by the test? In other words, that the test is difficult to maintain and risks becoming obsolete?

I can think about providing expected row counts per group of versions and leaving the latest group unbounded, i.e., unknown versions would be assumed to implement the histogram. If that sounds good, I would address it in a separate PR and merge this PR with the current limited approach. What do you think?
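
One way the per-group idea could look, as a sketch under assumptions rather than a plan for the follow-up PR. `FormatVersion` is again a hypothetical stand-in for `Version`, and `NEXT` is an invented future format. Each group is identified by its first version, and a lookup falls back to the nearest earlier group, so the latest group is unbounded.

```java
import java.util.Map;
import java.util.TreeMap;

class VersionGroupExpectations
{
    // Hypothetical stand-in for the real Version class; declaration order is format order.
    enum FormatVersion { DB, EB, NEXT }

    // Key: first version of a group. Value: expected row count for the whole group.
    private final TreeMap<FormatVersion, Double> rowsByFirstVersion = new TreeMap<>();

    VersionGroupExpectations startingAt(FormatVersion firstVersion, double expectedRows)
    {
        rowsByFirstVersion.put(firstVersion, expectedRows);
        return this;
    }

    // floorEntry picks the group with the greatest first version <= the requested one,
    // so versions newer than the last registered group inherit that group's count.
    double expectedRows(FormatVersion version)
    {
        Map.Entry<FormatVersion, Double> group = rowsByFirstVersion.floorEntry(version);
        if (group == null)
            throw new IllegalArgumentException("No expectation covers " + version);
        return group.getValue();
    }

    public static void main(String[] args)
    {
        // Counts asserted for Operator.EQ in this PR: 15 on DB, 0 on EB.
        VersionGroupExpectations equality = new VersionGroupExpectations()
            .startingAt(FormatVersion.DB, 15)
            .startingAt(FormatVersion.EB, 0);
        for (FormatVersion version : FormatVersion.values())
            System.out.println(version + " -> " + equality.expectedRows(version));
    }
}
```

This keeps the per-version counts listed explicitly, which is the property the author wants to preserve, while a new version is covered by default until it is shown to behave differently.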

I feel like if there is a commonly used mechanism to do multi version tests, it should be the default way of testing, rather than implementing multiple versions manually in a different way. Just for consistency. And yes, being able to quickly add new versions without duplicating most tests is a bonus of using an existing system. But as I said earlier, it is fine not to do this in this PR. I just highlighted that this functionality is available in SAITester, and it's really up to you whether you find it useful. If you think it would introduce unnecessary complexity in this particular test, no problem, let's merge it. I don't want to hold up perfectly fine functionality just to make the tests look nicer.

Author

> I feel like if there is a commonly used mechanism to do multi version tests, it should be the default way of testing

Applying this default violates the purpose of this test: to demonstrate how versions affect row counts.

-    }

-    @Test
-    public void testHalfRangeMiddle()
-    {
-        var test = new RowCountTest(Operator.LT, 50);
+        test = new RowCountTest(Operator.LT, 50);
         test.doTest(Version.DB, INT, 48);
         test.doTest(Version.EB, INT, 48);
         test.doTest(Version.DB, DECIMAL, 48);
         test.doTest(Version.EB, DECIMAL, 48);
-    }

-    @Test
-    public void testHalfRangeEverything()
-    {
-        var test = new RowCountTest(Operator.LT, 150);
+        test = new RowCountTest(Operator.LT, 150);
         test.doTest(Version.DB, INT, 97);
         test.doTest(Version.EB, INT, 97);
         test.doTest(Version.DB, DECIMAL, 97);
         test.doTest(Version.EB, DECIMAL, 97);
-    }

-    @Test
-    public void testEquality()
-    {
-        var test = new RowCountTest(Operator.EQ, 31);
+        test = new RowCountTest(Operator.EQ, 31);
         test.doTest(Version.DB, INT, 15);
         test.doTest(Version.EB, INT, 0);
         test.doTest(Version.DB, DECIMAL, 15);
         test.doTest(Version.EB, DECIMAL, 0);
     }

-    protected ColumnFamilyStore prepareTable(CQL3Type.Native type)
+
+    void createTables()
     {
+        for (Version version : versions)
+        {
+            SAIUtil.setLatestVersion(version);
+            for (CQL3Type.Native type : types)
+            {
+                createTable("CREATE TABLE %s (pk text PRIMARY KEY, age " + type + ')');
+                createIndex("CREATE CUSTOM INDEX ON %s(age) USING 'StorageAttachedIndex'");
+                tables.put(tablesEntryKey(version, type), getCurrentColumnFamilyStore());
+            }
+        }
+        flush();
+        for (ColumnFamilyStore cfs : tables.values())
+            populateTable(cfs);
+    }
+
+    void populateTable(ColumnFamilyStore cfs)
+    {
-        createTable("CREATE TABLE %s (pk text PRIMARY KEY, age " + type + ')');
-        createIndex("CREATE CUSTOM INDEX ON %s(age) USING 'StorageAttachedIndex'");
-        return getCurrentColumnFamilyStore();
+        // Avoid race condition of starting before flushing completed
+        cfs.unsafeRunWithoutFlushing(() -> {
+            for (int i = 0; i < 100; i++)
+            {
+                String query = String.format("INSERT INTO %s (pk, age) VALUES (?, " + i + ')',
+                                             cfs.keyspace.getName() + '.' + cfs.name);
+                executeFormattedQuery(query, "key" + i);
+            }
+        });
     }

-    class RowCountTest
+    static class RowCountTest
     {
         final Operator op;
         final int filterValue;
@@ -136,20 +145,8 @@ class RowCountTest

         void doTest(Version version, CQL3Type.Native type, double expectedRows)
         {
-            Version latest = Version.latest();
-            SAIUtil.setLatestVersion(version);
-
-            ColumnFamilyStore cfs = prepareTable(type);
-            // Avoid race condition of flushing after the index creation
-            cfs.unsafeRunWithoutFlushing(() -> {
-                for (int i = 0; i < 100; i++)
-                {
-                    execute("INSERT INTO %s (pk, age) VALUES (?," + i + ')', "key" + i);
-                }
-            });
-
+            ColumnFamilyStore cfs = tables.get(new AbstractMap.SimpleEntry<>(version, type));
             Object filter = getFilterValue(type, filterValue);
-
             ReadCommand rc = Util.cmd(cfs)
                                  .columns("age")
                                  .filterOn("age", op, filter)
@@ -159,6 +156,7 @@ void doTest(Version version, CQL3Type.Native type, double expectedRows)
                                                             version.onDiskFormat().indexFeatureSet(),
                                                             new QueryContext(),
                                                             null);
+
             long totalRows = controller.planFactory.tableMetrics.rows;
             assertEquals(0, cfs.metrics().liveSSTableCount.getValue().intValue());
             assertEquals(97, totalRows);
@@ -172,8 +170,6 @@ void doTest(Version version, CQL3Type.Native type, double expectedRows)
             assertEquals(expectedRows, root.expectedRows(), 0.1);
             assertEquals(expectedRows, planNode.expectedKeys(), 0.1);
             assertEquals(expectedRows / totalRows, planNode.selectivity(), 0.001);
-
-            SAIUtil.setLatestVersion(latest);
         }
     }
 }
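
The speed-up in this diff comes from building every (version, type) table once and looking it up in `doTest`, instead of creating and populating a table on each call. The lookup relies on `AbstractMap.SimpleEntry` implementing `equals` and `hashCode` over its key and value, so a freshly constructed entry finds the stored one. A self-contained sketch of that pattern follows; `FormatVersion`, `ColumnType`, and the string fixtures are hypothetical stand-ins for `Version`, `CQL3Type.Native`, and `ColumnFamilyStore`.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;

class CompositeKeyLookup
{
    // Hypothetical stand-ins for Version and CQL3Type.Native.
    enum FormatVersion { DB, EB }
    enum ColumnType { INT, DECIMAL, VARINT }

    // Strings stand in for the ColumnFamilyStore instances stored by the real test.
    static final Map<Map.Entry<FormatVersion, ColumnType>, String> FIXTURES = new HashMap<>();

    static Map.Entry<FormatVersion, ColumnType> key(FormatVersion version, ColumnType type)
    {
        return new AbstractMap.SimpleEntry<>(version, type);
    }

    // Build every combination once; later checks only perform lookups.
    static void buildOnce()
    {
        for (FormatVersion version : FormatVersion.values())
            for (ColumnType type : ColumnType.values())
                FIXTURES.put(key(version, type), "table_" + version + '_' + type);
    }

    public static void main(String[] args)
    {
        buildOnce();
        // A new entry with equal components hashes and compares equal to the stored key.
        System.out.println(FIXTURES.get(key(FormatVersion.EB, ColumnType.DECIMAL)));
    }
}
```

`SimpleEntry` is mutable through `setValue`, so keys should not be modified after insertion; on Java 16 or later a small record would be an immutable alternative.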