# When benchmarking, we want the ability to download the segments in
# parallel. By default, data-diff will checksum the section first (when
# it's below the threshold) and _then_ download it.
if BENCHMARK:
FWIW I don't think it would be crazy to make this the default...
I ran the benchmark with the default bisection threshold, and checksumming / downloading with the default threshold and ~10k rows is extremely competitive. And the current approach is twice as slow when there is a difference.
(force-push: 23614f7 to c5c2a9d)
When running the tests I get:
mysql.connector.errors.ProgrammingError: 1059 (42000): Identifier name 'idx_dst_postgresql_double_precision_to_mysql_numeric65_10_50_id_col' is too long
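Error 1059 is MySQL rejecting an identifier longer than its 64-character limit. One common workaround, sketched below, is to truncate over-long index names and append a short hash so distinct long names stay distinct. The helper name `safe_index_name` is hypothetical, not necessarily what this PR does.

```python
import hashlib

MAX_IDENTIFIER_LEN = 64  # MySQL's hard limit on identifier names

def safe_index_name(name: str) -> str:
    """Truncate an over-long identifier, appending a short hash of the
    full name so that two different long names do not collide."""
    if len(name) <= MAX_IDENTIFIER_LEN:
        return name
    digest = hashlib.md5(name.encode()).hexdigest()[:8]
    # Reserve 9 characters for "_" plus the 8-character digest.
    return name[: MAX_IDENTIFIER_LEN - 9] + "_" + digest
```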
data_diff/diff_tables.py (Outdated)
@@ -403,6 +404,15 @@ def _diff_tables(self, table1, table2, level=0, segment_index=None, segment_coun
    f"size: {table2.max_key-table1.min_key}"
)
# When benchmarking, we want the ability to download the segments in |
I don't understand the word "parallel" in this context.
Wouldn't it be more correct to say "we want the ability to skip the checksum, and force the download of all segments" ?
Sure, the parallel/concurrency wording is redundant. I think you're right to make checksum-skipping the prominent thing in this comment.
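The behavior under discussion could be sketched as follows. Names and the threshold value here are illustrative; in data-diff the checksum is actually computed database-side via SQL, and `BISECTION_THRESHOLD` stands in for the real bisection-threshold setting.

```python
import zlib

BENCHMARK = False          # when True: skip the checksum, force the download
BISECTION_THRESHOLD = 16   # illustrative segment-size cutoff, not the real default

def checksum(rows):
    # Stand-in for the database-side checksum data-diff actually computes.
    return zlib.crc32(repr(sorted(rows)).encode())

def diff_segment(rows1, rows2):
    small_enough = max(len(rows1), len(rows2)) <= BISECTION_THRESHOLD
    if not BENCHMARK and small_enough:
        if checksum(rows1) == checksum(rows2):
            return []      # checksums match: nothing to download
    # Benchmark mode (or a large/differing segment): download and compare rows.
    return sorted(set(rows1) ^ set(rows2))
```

This also shows why the current approach is slower when there *is* a difference: the checksum round-trip is pure overhead before the download happens anyway.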
tests/test_database_types.py (Outdated)
assert BENCHMARK, "Table should've been deleted, or we should be in BENCHMARK mode"
return
elif current_n_rows > 0:
logging.error(f"{conn.name}: Table {table} has {current_n_rows} rows, expected {N_SAMPLES} or 0. Recreating table")
Maybe this should be a warning?
The default logging level is ERROR to keep the test output quieter (the WARNING level includes all the precision warnings, etc., that make the test output noisy). Of course, we could set this differently per logger, but I think that just adds more confusion.
Maybe I'm missing something, but it should be simple to set the test logger to warning, and the data-diff loggers to error, no?
Yeah, that was what I was getting at in my last sentence.. I will just remove it then
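The per-logger setup being suggested is a few lines of standard `logging` configuration. A sketch, with the logger names assumed rather than taken from the codebase:

```python
import logging

# Tests log at WARNING and above; the noisier data-diff loggers stay at ERROR,
# so precision warnings from the library are suppressed in test output.
logging.basicConfig(level=logging.WARNING)
logging.getLogger("data_diff").setLevel(logging.ERROR)

test_log = logging.getLogger("tests.test_database_types")
test_log.warning("row count mismatch; recreating table")        # emitted
logging.getLogger("data_diff").warning("precision truncated")   # suppressed
```

Child loggers such as `data_diff.diff_tables` inherit the ERROR level through the logger hierarchy, so one `setLevel` call covers the whole library.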
Ugh, not sure how I missed that! Fixed with another force-push (c5c2a9d to ae8599a).
@erezsh tried to resolve your comments! Let me know if it's good to go now :)
(force-push: ae8599a to 90de3df)
Good to merge?
(force-push: 90de3df to 36fd574)
Cleanup of the resolved merge conflict from #67, based on all the changes in master. Trying to split it up into chunks to make it easier to review (and to do an extra pass on it all).
The idea is to dual-purpose a subset of the type test-suite for benchmarking, to ensure it doesn't rust. In particular, to follow up from #106, the changes are to:
- In BENCHMARK=1 mode, we do not want to drop tables, to prevent waiting for millions of rows to be inserted again. Not even across revisions, if we're testing the performance change between two revisions.

There are plenty of follow-ups to this, likely not all of which I will be able to complete:

- benchmark_*.jsonl
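The table-reuse rule from the bullet above mirrors the check in the diff: a fully populated table is kept (in benchmark mode), an empty one is populated, and a partially populated one is recreated. A sketch with hypothetical names (`table_action`, the string return values, and the sample count are illustrative):

```python
import logging

N_SAMPLES = 1_000_000  # illustrative target row count per test table

def table_action(current_n_rows: int, benchmark: bool) -> str:
    """Decide what to do with a pre-existing test table."""
    if current_n_rows == N_SAMPLES:
        # Fully populated: only benchmark mode is allowed to reuse it,
        # even across revisions, to avoid re-inserting millions of rows.
        assert benchmark, "Table should've been deleted, or we should be in BENCHMARK mode"
        return "reuse"
    if current_n_rows == 0:
        return "populate"  # empty table: just insert the samples
    # Partially populated: unusable for a fair comparison, so recreate it.
    logging.error(f"table has {current_n_rows} rows, expected {N_SAMPLES} or 0; recreating")
    return "recreate"
```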
@erezsh
cc @abhinav1912 due to questions in #103