-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Add bench for data page statistics parquet extraction #10950
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@alamb |
I think that makes sense to me The other thing we could do is somehow ignore those columns until the support has been added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me -- thank you @marvinlanhenke
yes, but this would require another PR to revert back those changes. I think adding the bench is not so urgent; so merging once other dataypes are ready might be the easiest thing to do. I've also addressed your other comments; should be fine now. Thanks for the review. |
Thanks again @marvinlanhenke -- this PR looks good to me. Per your suggestion, let's wait until the required type support has been added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked this out and merged main and it runs without errors now, so it should be safe to merge.
Thanks for checking this out @efredine -- I merged up from main and updated this PR to get it moving. Once it passes CI I think it is good to go Thanks again @marvinlanhenke Like @efredine I verified the benchmark now works without error: cargo bench --bench parquet_statistic
Compiling bigdecimal v0.4.1
Compiling datafusion v39.0.0 (/Users/andrewlamb/Software/datafusion2/datafusion/core)
Finished `bench` profile [optimized] target(s) in 1m 34s
Running benches/parquet_statistic.rs (target/release/deps/parquet_statistic-c6fce472dea5abe8)
Gnuplot not found, using plotters backend
Extract row group statistics for Int64/extract_statistics/Int64
time: [594.98 ns 596.23 ns 597.66 ns]
Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) high mild
6 (6.00%) high severe
Extract data page statistics for Int64/extract_statistics/Int64
time: [6.5665 µs 6.5848 µs 6.6047 µs]
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
Extract row group statistics for UInt64/extract_statistics/UInt64
time: [576.78 ns 578.78 ns 581.09 ns]
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
Extract data page statistics for UInt64/extract_statistics/UInt64
time: [6.8120 µs 6.8332 µs 6.8559 µs]
Found 7 outliers among 100 measurements (7.00%)
6 (6.00%) high mild
1 (1.00%) high severe
Extract row group statistics for F64/extract_statistics/F64
time: [588.96 ns 592.68 ns 596.62 ns]
Extract data page statistics for F64/extract_statistics/F64
time: [7.5959 µs 7.6334 µs 7.6650 µs]
Extract row group statistics for String/extract_statistics/String
time: [897.07 ns 901.70 ns 907.19 ns]
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
Extract data page statistics for String/extract_statistics/String
time: [25.507 µs 25.555 µs 25.609 µs]
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
Benchmarking Extract row group statistics for Dictionary(Int32, String)/extract_statistics/Dictionary(Int32, Stri...: Collecting 100 samples in estimated 5.00
Extract row group statistics for Dictionary(Int32, String)/extract_statistics/Dictionary(Int32, Stri...
time: [947.78 ns 954.30 ns 960.82 ns]
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low severe
7 (7.00%) low mild
Benchmarking Extract data page statistics for Dictionary(Int32, String)/extract_statistics/Dictionary(Int32, Stri...: Collecting 100 samples in estimated 5.04
Extract data page statistics for Dictionary(Int32, String)/extract_statistics/Dictionary(Int32, Stri...
time: [25.602 µs 25.812 µs 26.109 µs]
Found 8 outliers among 100 measurements (8.00%)
7 (7.00%) high mild
1 (1.00%) high severe
|
* feat: add data page bench * chore: add comment * fix: row_groups + shorten row_group_indices --------- Co-authored-by: Andrew Lamb <[email protected]>
* feat: add data page bench * chore: add comment * fix: row_groups + shorten row_group_indices --------- Co-authored-by: Andrew Lamb <[email protected]>
Which issue does this PR close?
Closes #10934.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?