Adding TPCH benchmarks for Sort Merge Join #10092
Co-authored-by: Andy Grove <[email protected]>
/benchmark
/// Whether to disable collection of statistics (and cost based optimizations) or not.
#[structopt(short = "j", long = "hash-join", default_value = "true")]
prefer_hash_join: BoolDefaultTrue,
I am worried that this might switch the tpch runs to using SMJ by accident (given your comment above). I started some benchmark runs to see if we can get some data one way or the other.
Hash join is enabled by default. SMJ is behind a separate key that is not even documented yet, as it is still experimental.
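To make the default concrete, here is a minimal std-only sketch of how a `--hash-join` flag that defaults to `true` could select a join strategy. This is not the PR's code: the PR uses `structopt` with a `BoolDefaultTrue` type, while the `JoinStrategy` enum and the `parse_prefer_hash_join` and `strategy_for` helpers below are hypothetical names made up for illustration. Only the flag names (`-j`, `--hash-join`) and the `true` default come from the diff.

```rust
/// Hypothetical join strategies, for illustration only.
#[derive(Debug, PartialEq)]
enum JoinStrategy {
    Hash,
    SortMerge,
}

/// Scan CLI-style arguments for `--hash-join <bool>` (or `-j <bool>`).
/// Returns `true` when the flag is absent, mirroring `default_value = "true"`.
fn parse_prefer_hash_join(args: &[&str]) -> Result<bool, String> {
    let mut iter = args.iter();
    while let Some(arg) = iter.next() {
        if *arg == "--hash-join" || *arg == "-j" {
            // The flag takes an explicit value, so a bare flag is an error.
            let value = iter
                .next()
                .ok_or_else(|| format!("missing value for {arg}"))?;
            return value
                .parse::<bool>()
                .map_err(|e| format!("invalid value {value:?}: {e}"));
        }
    }
    // Flag not given: hash join stays the preferred strategy.
    Ok(true)
}

/// Map the flag to a strategy: SMJ is chosen only when hash join is
/// explicitly turned off.
fn strategy_for(prefer_hash_join: bool) -> JoinStrategy {
    if prefer_hash_join {
        JoinStrategy::Hash
    } else {
        JoinStrategy::SortMerge
    }
}
```

Under these assumptions, calling `parse_prefer_hash_join(&[])` yields `Ok(true)`, so existing tpch runs keep using hash join, and only an explicit `--hash-join false` selects the sort merge path.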
benchmarks look good to me. ✅
Benchmark results: benchmarks comparing b54adb3 (main) and 301c827 (PR)
Thanks @andygrove and @alamb for the feedback. Please have a second look once you have the time.
Thanks @comphead
Can we trigger the SMJ benchmark using the GitHub action too?
The action has tpch hard coded here: https://github.com/apache/arrow-datafusion/blob/b54adb36b855f198b5098fdb3e4cdf1934818efd/.github/workflows/pr_benchmarks.yml#L41-L51 We could probably add SMJ (or maybe add another command like
Sounds good. We could do it separately.
Benchmark results: benchmarks comparing b54adb3 (main) and a88b278 (PR)
Thanks all for the reviews 👍
Which issue does this PR close?
Related to #9846.
Rationale for this change
The goal is to make Sort Merge Join (SMJ) stable, so this PR adds a separate TPCH benchmark for SMJ.
What changes are included in this PR?
TPCH benches for SMJ
Are these changes tested?
Are there any user-facing changes?