
Draft PoC for unified select (enum for bitmap and range) #7454


Draft
wants to merge 6 commits into
base: main
from


zhuqi-lucas
Contributor

@zhuqi-lucas zhuqi-lucas commented Apr 29, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@zhuqi-lucas zhuqi-lucas marked this pull request as draft April 29, 2025 04:39
@github-actions github-actions bot added the parquet Changes to the parquet crate label Apr 29, 2025
@alamb
Contributor

alamb commented Apr 29, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing unified_select (e5aad7c) to 5f0aed6 diff
BENCH_NAME=arrow_reader_row_filter
BENCH_COMMAND=cargo bench --all-features --bench arrow_reader_row_filter
BENCH_FILTER=
BENCH_BRANCH_NAME=unified_select
Results will be posted here when complete

@alamb
Contributor

alamb commented Apr 29, 2025

I kicked off running the benchmarks on this PR

@alamb
Contributor

alamb commented Apr 29, 2025

🤖: Benchmark completed


group                                                                                 main                                   unified_select
-----                                                                                 ----                                   --------------
arrow_reader_row_filter/Composite/all_columns/async                                   1.00      2.9±0.01ms        ? ?/sec    1.00      2.9±0.03ms        ? ?/sec
arrow_reader_row_filter/Composite/all_columns/sync                                    1.00      3.2±0.02ms        ? ?/sec    1.00      3.2±0.03ms        ? ?/sec
arrow_reader_row_filter/Composite/exclude_filter_column/async                         1.00      2.5±0.01ms        ? ?/sec    1.00      2.5±0.01ms        ? ?/sec
arrow_reader_row_filter/Composite/exclude_filter_column/sync                          1.00      2.8±0.02ms        ? ?/sec    1.00      2.8±0.02ms        ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveClustered/all_columns/async                1.00      3.0±0.01ms        ? ?/sec    1.00      3.0±0.01ms        ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveClustered/all_columns/sync                 1.00      3.2±0.02ms        ? ?/sec    1.00      3.2±0.02ms        ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveClustered/exclude_filter_column/async      1.00      2.8±0.01ms        ? ?/sec    1.00      2.8±0.02ms        ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveClustered/exclude_filter_column/sync       1.01      3.1±0.01ms        ? ?/sec    1.00      3.1±0.01ms        ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveUnclustered/all_columns/async              1.00      6.5±0.03ms        ? ?/sec    1.40      9.1±0.12ms        ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveUnclustered/all_columns/sync               1.00      6.6±0.03ms        ? ?/sec    1.00      6.6±0.04ms        ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveUnclustered/exclude_filter_column/async    1.00      5.6±0.03ms        ? ?/sec    1.58      8.9±0.07ms        ? ?/sec
arrow_reader_row_filter/ModeratelySelectiveUnclustered/exclude_filter_column/sync     1.00      5.7±0.03ms        ? ?/sec    1.00      5.7±0.03ms        ? ?/sec
arrow_reader_row_filter/PointLookup/all_columns/async                                 1.00      2.0±0.01ms        ? ?/sec    1.00      2.0±0.01ms        ? ?/sec
arrow_reader_row_filter/PointLookup/all_columns/sync                                  1.00      2.2±0.01ms        ? ?/sec    1.01      2.2±0.01ms        ? ?/sec
arrow_reader_row_filter/PointLookup/exclude_filter_column/async                       1.00  1984.8±12.54µs        ? ?/sec    1.01      2.0±0.01ms        ? ?/sec
arrow_reader_row_filter/PointLookup/exclude_filter_column/sync                        1.00      2.2±0.02ms        ? ?/sec    1.01      2.2±0.01ms        ? ?/sec
arrow_reader_row_filter/SelectiveUnclustered/all_columns/async                        1.00      3.1±0.02ms        ? ?/sec    2.80      8.7±0.07ms        ? ?/sec
arrow_reader_row_filter/SelectiveUnclustered/all_columns/sync                         1.00      3.4±0.02ms        ? ?/sec    1.00      3.4±0.02ms        ? ?/sec
arrow_reader_row_filter/SelectiveUnclustered/exclude_filter_column/async              1.00      2.7±0.01ms        ? ?/sec    2.98      8.2±0.05ms        ? ?/sec
arrow_reader_row_filter/SelectiveUnclustered/exclude_filter_column/sync               1.00      3.0±0.01ms        ? ?/sec    1.01      3.0±0.03ms        ? ?/sec
arrow_reader_row_filter/UnselectiveClustered/all_columns/async                        1.03      7.9±0.03ms        ? ?/sec    1.00      7.7±0.04ms        ? ?/sec
arrow_reader_row_filter/UnselectiveClustered/all_columns/sync                         1.03      8.2±0.04ms        ? ?/sec    1.00      8.0±0.05ms        ? ?/sec
arrow_reader_row_filter/UnselectiveClustered/exclude_filter_column/async              1.03      7.7±0.05ms        ? ?/sec    1.00      7.4±0.05ms        ? ?/sec
arrow_reader_row_filter/UnselectiveClustered/exclude_filter_column/sync               1.03      7.9±0.03ms        ? ?/sec    1.00      7.6±0.03ms        ? ?/sec
arrow_reader_row_filter/UnselectiveUnclustered/all_columns/async                      1.00      3.1±0.02ms        ? ?/sec    2.80      8.7±0.09ms        ? ?/sec
arrow_reader_row_filter/UnselectiveUnclustered/all_columns/sync                       1.00      3.4±0.02ms        ? ?/sec    1.00      3.4±0.02ms        ? ?/sec
arrow_reader_row_filter/UnselectiveUnclustered/exclude_filter_column/async            1.00      2.7±0.01ms        ? ?/sec    2.98      8.2±0.06ms        ? ?/sec
arrow_reader_row_filter/UnselectiveUnclustered/exclude_filter_column/sync             1.00      3.0±0.02ms        ? ?/sec    1.00      3.0±0.02ms        ? ?/sec
arrow_reader_row_filter/Utf8ViewNonEmpty/all_columns/async                            1.13     23.7±0.09ms        ? ?/sec    1.00     20.9±0.26ms        ? ?/sec
arrow_reader_row_filter/Utf8ViewNonEmpty/all_columns/sync                             1.02     23.9±0.12ms        ? ?/sec    1.00     23.5±0.10ms        ? ?/sec
arrow_reader_row_filter/Utf8ViewNonEmpty/exclude_filter_column/async                  1.57     15.0±0.05ms        ? ?/sec    1.00      9.6±0.06ms        ? ?/sec
arrow_reader_row_filter/Utf8ViewNonEmpty/exclude_filter_column/sync                   1.02     15.1±0.04ms        ? ?/sec    1.00     14.9±0.07ms        ? ?/sec

@zhuqi-lucas
Contributor Author

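Thank you @alamb! From the benchmark, it seems this PR only improves the Utf8ViewNonEmpty cases (most clearly exclude_filter_column/async, 15.0ms vs 9.6ms), and it regresses the async *Unclustered cases (up to about 3x slower for SelectiveUnclustered and UnselectiveUnclustered).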

@zhuqi-lucas
Contributor Author

But the ClickBench results show better performance than the original default pushdown:

./bench.sh compare  older_push_down test_default_parquet_push_down
Comparing older_push_down and test_default_parquet_push_down
--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ older_push_down ┃ test_default_parquet_push_down ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │          0.29ms │                         0.40ms │  1.39x slower │
│ QQuery 1     │         50.39ms │                        46.59ms │ +1.08x faster │
│ QQuery 2     │         77.56ms │                        75.08ms │     no change │
│ QQuery 3     │         77.95ms │                        79.83ms │     no change │
│ QQuery 4     │        579.45ms │                       504.25ms │ +1.15x faster │
│ QQuery 5     │        560.83ms │                       541.37ms │     no change │
│ QQuery 6     │          0.33ms │                         0.33ms │     no change │
│ QQuery 7     │         63.95ms │                        56.08ms │ +1.14x faster │
│ QQuery 8     │        718.15ms │                       663.54ms │ +1.08x faster │
│ QQuery 9     │        786.44ms │                       751.61ms │     no change │
│ QQuery 10    │        206.47ms │                       214.36ms │     no change │
│ QQuery 11    │        213.81ms │                       239.89ms │  1.12x slower │
│ QQuery 12    │        679.51ms │                       705.69ms │     no change │
│ QQuery 13    │       1002.85ms │                       868.43ms │ +1.15x faster │
│ QQuery 14    │        763.19ms │                       689.47ms │ +1.11x faster │
│ QQuery 15    │        649.33ms │                       612.41ms │ +1.06x faster │
│ QQuery 16    │       1390.12ms │                      1339.99ms │     no change │
│ QQuery 17    │       1219.49ms │                      1134.36ms │ +1.08x faster │
│ QQuery 18    │       2932.03ms │                      2566.26ms │ +1.14x faster │
│ QQuery 19    │         63.44ms │                        58.59ms │ +1.08x faster │
│ QQuery 20    │        752.05ms │                       679.10ms │ +1.11x faster │
│ QQuery 21    │        927.06ms │                       811.46ms │ +1.14x faster │
│ QQuery 22    │       1558.98ms │                      1461.95ms │ +1.07x faster │
│ QQuery 23    │       3402.25ms │                      2604.51ms │ +1.31x faster │
│ QQuery 24    │        471.04ms │                       445.48ms │ +1.06x faster │
│ QQuery 25    │        420.43ms │                       423.96ms │     no change │
│ QQuery 26    │        522.57ms │                       482.58ms │ +1.08x faster │
│ QQuery 27    │       1375.58ms │                      1304.33ms │ +1.05x faster │
│ QQuery 28    │       8767.85ms │                      8484.28ms │     no change │
│ QQuery 29    │        458.33ms │                       446.29ms │     no change │
│ QQuery 30    │        689.41ms │                       450.94ms │ +1.53x faster │
│ QQuery 31    │        758.38ms │                       574.79ms │ +1.32x faster │
│ QQuery 32    │       2981.93ms │                      2339.84ms │ +1.27x faster │
│ QQuery 33    │       2886.54ms │                      2587.16ms │ +1.12x faster │
│ QQuery 34    │       3298.54ms │                      2760.24ms │ +1.20x faster │
│ QQuery 35    │        834.81ms │                       893.25ms │  1.07x slower │
│ QQuery 36    │         40.41ms │                        36.52ms │ +1.11x faster │
│ QQuery 37    │         35.59ms │                        33.71ms │ +1.06x faster │
│ QQuery 38    │         39.18ms │                        36.37ms │ +1.08x faster │
│ QQuery 39    │         35.07ms │                        36.88ms │  1.05x slower │
│ QQuery 40    │         36.55ms │                        34.98ms │     no change │
│ QQuery 41    │         36.18ms │                        36.72ms │     no change │
│ QQuery 42    │         37.00ms │                        34.64ms │ +1.07x faster │
└──────────────┴─────────────────┴────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                             ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (older_push_down)                  │ 42401.30ms │
│ Total Time (test_default_parquet_push_down)   │ 38148.50ms │
│ Average Time (older_push_down)                │   986.08ms │
│ Average Time (test_default_parquet_push_down) │   887.17ms │
│ Queries Faster                                │         26 │
│ Queries Slower                                │          4 │
│ Queries with No Change                        │         13 │
└───────────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ older_push_down ┃ test_default_parquet_push_down ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │          1.12ms │                         1.10ms │     no change │
│ QQuery 1     │         25.44ms │                        25.93ms │     no change │
│ QQuery 2     │         54.65ms │                        55.48ms │     no change │
│ QQuery 3     │         56.82ms │                        61.02ms │  1.07x slower │
│ QQuery 4     │        484.52ms │                       470.33ms │     no change │
│ QQuery 5     │        524.38ms │                       546.59ms │     no change │
│ QQuery 6     │          1.24ms │                         1.16ms │ +1.07x faster │
│ QQuery 7     │         43.00ms │                        38.75ms │ +1.11x faster │
│ QQuery 8     │        632.15ms │                       593.78ms │ +1.06x faster │
│ QQuery 9     │        658.82ms │                       680.13ms │     no change │
│ QQuery 10    │        154.14ms │                       184.81ms │  1.20x slower │
│ QQuery 11    │        183.86ms │                       198.64ms │  1.08x slower │
│ QQuery 12    │        664.23ms │                       678.02ms │     no change │
│ QQuery 13    │        916.31ms │                       771.84ms │ +1.19x faster │
│ QQuery 14    │        717.15ms │                       637.99ms │ +1.12x faster │
│ QQuery 15    │        561.85ms │                       562.13ms │     no change │
│ QQuery 16    │       1430.77ms │                      1346.40ms │ +1.06x faster │
│ QQuery 17    │       1249.19ms │                      1117.41ms │ +1.12x faster │
│ QQuery 18    │       2739.07ms │                      2427.62ms │ +1.13x faster │
│ QQuery 19    │         41.63ms │                        45.73ms │  1.10x slower │
│ QQuery 20    │        724.72ms │                       694.00ms │     no change │
│ QQuery 21    │        819.39ms │                       790.52ms │     no change │
│ QQuery 22    │       1551.09ms │                      1517.14ms │     no change │
│ QQuery 23    │       3565.46ms │                      2681.59ms │ +1.33x faster │
│ QQuery 24    │        457.95ms │                       386.72ms │ +1.18x faster │
│ QQuery 25    │        381.44ms │                       370.83ms │     no change │
│ QQuery 26    │        526.80ms │                       435.73ms │ +1.21x faster │
│ QQuery 27    │       1415.36ms │                      1354.63ms │     no change │
│ QQuery 28    │       8528.35ms │                      9813.62ms │  1.15x slower │
│ QQuery 29    │        401.68ms │                       396.41ms │     no change │
│ QQuery 30    │        728.25ms │                       431.94ms │ +1.69x faster │
│ QQuery 31    │        744.53ms │                       528.87ms │ +1.41x faster │
│ QQuery 32    │       3257.23ms │                      2323.61ms │ +1.40x faster │
│ QQuery 33    │       3187.75ms │                      2517.60ms │ +1.27x faster │
│ QQuery 34    │       3735.45ms │                      2840.70ms │ +1.31x faster │
│ QQuery 35    │        877.66ms │                       728.25ms │ +1.21x faster │
│ QQuery 36    │         26.23ms │                        22.64ms │ +1.16x faster │
│ QQuery 37    │         23.04ms │                        21.68ms │ +1.06x faster │
│ QQuery 38    │         21.55ms │                        21.89ms │     no change │
│ QQuery 39    │         22.02ms │                        22.69ms │     no change │
│ QQuery 40    │         21.54ms │                        21.98ms │     no change │
│ QQuery 41    │         21.64ms │                        21.95ms │     no change │
│ QQuery 42    │         23.77ms │                        22.51ms │ +1.06x faster │
└──────────────┴─────────────────┴────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                             ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (older_push_down)                  │ 42203.20ms │
│ Total Time (test_default_parquet_push_down)   │ 38412.36ms │
│ Average Time (older_push_down)                │   981.47ms │
│ Average Time (test_default_parquet_push_down) │   893.31ms │
│ Queries Faster                                │         20 │
│ Queries Slower                                │          5 │
│ Queries with No Change                        │         18 │
└───────────────────────────────────────────────┴────────────┘
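To make the "Change" column and the summary rows easier to check, here is a small Rust sketch. It assumes, based only on the rows in these tables and not on the benchmark script's source, that the label uses a symmetric 5% noise band (old/new above 1.05 is "faster", new/old above 1.05 is "slower"), and that "Average Time" is "Total Time" divided by the 43 queries. `classify_change` and `average_ms` are hypothetical names.

```rust
/// Hypothetical helper reproducing how the "Change" column appears to be
/// computed. ASSUMPTION: a symmetric 5% noise band, inferred from the table
/// rows rather than read from the benchmark script.
fn classify_change(baseline_ms: f64, candidate_ms: f64) -> String {
    const NOISE: f64 = 1.05; // assumed threshold
    let speedup = baseline_ms / candidate_ms;
    let slowdown = candidate_ms / baseline_ms;
    if speedup > NOISE {
        format!("+{:.2}x faster", speedup)
    } else if slowdown > NOISE {
        format!("{:.2}x slower", slowdown)
    } else {
        "no change".to_string()
    }
}

/// "Average Time" matches "Total Time" divided by the 43 ClickBench
/// queries (QQuery 0 to QQuery 42).
fn average_ms(total_ms: f64) -> f64 {
    total_ms / 43.0
}

fn main() {
    // (query, older_push_down, test_default_parquet_push_down) from clickbench_1.json
    let rows = [
        ("QQuery 30", 689.41, 450.94),
        ("QQuery 11", 213.81, 239.89),
        ("QQuery 25", 420.43, 423.96),
    ];
    for (query, old, new) in rows {
        println!("{query}: {}", classify_change(old, new));
    }
    println!("average: {:.2}ms", average_ms(42401.30));
}
```

Because the tables print rounded milliseconds, a few borderline rows may not reproduce exactly; for QQuery 0, 0.40/0.29 gives 1.38 rather than the reported 1.39.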

@zhuqi-lucas
Contributor Author

And here is the result compared with main (no pushdown); there are still some regressions:

./bench.sh compare main test_default_parquet_push_down
Comparing main and test_default_parquet_push_down
--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      main ┃ test_default_parquet_push_down ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │    0.32ms │                         0.40ms │  1.24x slower │
│ QQuery 1     │   46.96ms │                        46.59ms │     no change │
│ QQuery 2     │   75.59ms │                        75.08ms │     no change │
│ QQuery 3     │   74.12ms │                        79.83ms │  1.08x slower │
│ QQuery 4     │  556.03ms │                       504.25ms │ +1.10x faster │
│ QQuery 5     │  563.52ms │                       541.37ms │     no change │
│ QQuery 6     │    0.31ms │                         0.33ms │  1.06x slower │
│ QQuery 7     │   52.23ms │                        56.08ms │  1.07x slower │
│ QQuery 8     │  720.15ms │                       663.54ms │ +1.09x faster │
│ QQuery 9     │  741.10ms │                       751.61ms │     no change │
│ QQuery 10    │  171.95ms │                       214.36ms │  1.25x slower │
│ QQuery 11    │  187.66ms │                       239.89ms │  1.28x slower │
│ QQuery 12    │  597.16ms │                       705.69ms │  1.18x slower │
│ QQuery 13    │  877.71ms │                       868.43ms │     no change │
│ QQuery 14    │  605.11ms │                       689.47ms │  1.14x slower │
│ QQuery 15    │  630.66ms │                       612.41ms │     no change │
│ QQuery 16    │ 1422.47ms │                      1339.99ms │ +1.06x faster │
│ QQuery 17    │ 1221.90ms │                      1134.36ms │ +1.08x faster │
│ QQuery 18    │ 2773.23ms │                      2566.26ms │ +1.08x faster │
│ QQuery 19    │   66.30ms │                        58.59ms │ +1.13x faster │
│ QQuery 20    │  682.62ms │                       679.10ms │     no change │
│ QQuery 21    │  800.86ms │                       811.46ms │     no change │
│ QQuery 22    │ 1521.09ms │                      1461.95ms │     no change │
│ QQuery 23    │ 4223.95ms │                      2604.51ms │ +1.62x faster │
│ QQuery 24    │  286.83ms │                       445.48ms │  1.55x slower │
│ QQuery 25    │  274.47ms │                       423.96ms │  1.54x slower │
│ QQuery 26    │  320.45ms │                       482.58ms │  1.51x slower │
│ QQuery 27    │  945.72ms │                      1304.33ms │  1.38x slower │
│ QQuery 28    │ 8206.32ms │                      8484.28ms │     no change │
│ QQuery 29    │  459.59ms │                       446.29ms │     no change │
│ QQuery 30    │  493.35ms │                       450.94ms │ +1.09x faster │
│ QQuery 31    │  585.82ms │                       574.79ms │     no change │
│ QQuery 32    │ 2436.43ms │                      2339.84ms │     no change │
│ QQuery 33    │ 2916.52ms │                      2587.16ms │ +1.13x faster │
│ QQuery 34    │ 2975.16ms │                      2760.24ms │ +1.08x faster │
│ QQuery 35    │  866.75ms │                       893.25ms │     no change │
│ QQuery 36    │  104.22ms │                        36.52ms │ +2.85x faster │
│ QQuery 37    │   62.50ms │                        33.71ms │ +1.85x faster │
│ QQuery 38    │  107.57ms │                        36.37ms │ +2.96x faster │
│ QQuery 39    │  167.64ms │                        36.88ms │ +4.55x faster │
│ QQuery 40    │   46.49ms │                        34.98ms │ +1.33x faster │
│ QQuery 41    │   45.49ms │                        36.72ms │ +1.24x faster │
│ QQuery 42    │   42.51ms │                        34.64ms │ +1.23x faster │
└──────────────┴───────────┴────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                             ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main)                             │ 39956.84ms │
│ Total Time (test_default_parquet_push_down)   │ 38148.50ms │
│ Average Time (main)                           │   929.23ms │
│ Average Time (test_default_parquet_push_down) │   887.17ms │
│ Queries Faster                                │         17 │
│ Queries Slower                                │         12 │
│ Queries with No Change                        │         14 │
└───────────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      main ┃ test_default_parquet_push_down ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │    1.10ms │                         1.10ms │     no change │
│ QQuery 1     │   24.27ms │                        25.93ms │  1.07x slower │
│ QQuery 2     │   58.03ms │                        55.48ms │     no change │
│ QQuery 3     │   58.91ms │                        61.02ms │     no change │
│ QQuery 4     │  478.56ms │                       470.33ms │     no change │
│ QQuery 5     │  549.38ms │                       546.59ms │     no change │
│ QQuery 6     │    1.18ms │                         1.16ms │     no change │
│ QQuery 7     │   40.05ms │                        38.75ms │     no change │
│ QQuery 8     │  645.79ms │                       593.78ms │ +1.09x faster │
│ QQuery 9     │  671.10ms │                       680.13ms │     no change │
│ QQuery 10    │  133.89ms │                       184.81ms │  1.38x slower │
│ QQuery 11    │  159.81ms │                       198.64ms │  1.24x slower │
│ QQuery 12    │  561.74ms │                       678.02ms │  1.21x slower │
│ QQuery 13    │  750.14ms │                       771.84ms │     no change │
│ QQuery 14    │  525.03ms │                       637.99ms │  1.22x slower │
│ QQuery 15    │  553.88ms │                       562.13ms │     no change │
│ QQuery 16    │ 1417.77ms │                      1346.40ms │ +1.05x faster │
│ QQuery 17    │ 1104.70ms │                      1117.41ms │     no change │
│ QQuery 18    │ 3037.46ms │                      2427.62ms │ +1.25x faster │
│ QQuery 19    │   45.81ms │                        45.73ms │     no change │
│ QQuery 20    │  733.71ms │                       694.00ms │ +1.06x faster │
│ QQuery 21    │  789.70ms │                       790.52ms │     no change │
│ QQuery 22    │ 1299.84ms │                      1517.14ms │  1.17x slower │
│ QQuery 23    │ 3952.24ms │                      2681.59ms │ +1.47x faster │
│ QQuery 24    │  273.40ms │                       386.72ms │  1.41x slower │
│ QQuery 25    │  274.14ms │                       370.83ms │  1.35x slower │
│ QQuery 26    │  320.12ms │                       435.73ms │  1.36x slower │
│ QQuery 27    │  900.06ms │                      1354.63ms │  1.51x slower │
│ QQuery 28    │ 7812.82ms │                      9813.62ms │  1.26x slower │
│ QQuery 29    │  390.07ms │                       396.41ms │     no change │
│ QQuery 30    │  420.68ms │                       431.94ms │     no change │
│ QQuery 31    │  571.58ms │                       528.87ms │ +1.08x faster │
│ QQuery 32    │ 2585.80ms │                      2323.61ms │ +1.11x faster │
│ QQuery 33    │ 2621.59ms │                      2517.60ms │     no change │
│ QQuery 34    │ 3144.83ms │                      2840.70ms │ +1.11x faster │
│ QQuery 35    │  855.71ms │                       728.25ms │ +1.18x faster │
│ QQuery 36    │   80.10ms │                        22.64ms │ +3.54x faster │
│ QQuery 37    │   35.71ms │                        21.68ms │ +1.65x faster │
│ QQuery 38    │   79.06ms │                        21.89ms │ +3.61x faster │
│ QQuery 39    │  123.78ms │                        22.69ms │ +5.46x faster │
│ QQuery 40    │   29.03ms │                        21.98ms │ +1.32x faster │
│ QQuery 41    │   27.67ms │                        21.95ms │ +1.26x faster │
│ QQuery 42    │   25.98ms │                        22.51ms │ +1.15x faster │
└──────────────┴───────────┴────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                             ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main)                             │ 38166.21ms │
│ Total Time (test_default_parquet_push_down)   │ 38412.36ms │
│ Average Time (main)                           │   887.59ms │
│ Average Time (test_default_parquet_push_down) │   893.31ms │
│ Queries Faster                                │         16 │
│ Queries Slower                                │         11 │
│ Queries with No Change                        │         16 │
└───────────────────────────────────────────────┴────────────┘

@zhuqi-lucas
Contributor Author

zhuqi-lucas commented Apr 30, 2025

This PR recovers part of the original regression, namely the part caused by read-records/skip-records calls alternating too densely:

Here is the result for the page cache PR without this PR:
#7363 (comment)
There, the regressions were in Q24 to Q28 and in Q30 and Q31.

With the current PR, Q30 and Q31 no longer regress:

│ QQuery 30    │  420.68ms │                       431.94ms │     no change │
│ QQuery 31    │  571.58ms │                       528.87ms │ +1.08x faster │

But Q24 to Q28 still regress, the same as in the original result:

│ QQuery 24    │  273.40ms │                       386.72ms │  1.41x slower │
│ QQuery 25    │  274.14ms │                       370.83ms │  1.35x slower │
│ QQuery 26    │  320.12ms │                       435.73ms │  1.36x slower │
│ QQuery 27    │  900.06ms │                      1354.63ms │  1.51x slower │
│ QQuery 28    │ 7812.82ms │                      9813.62ms │  1.26x slower │
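The "too dense" read/skip pattern mentioned above is what the PR title's bitmap/range enum is aimed at. As a rough illustration only, here is a Rust sketch of a selection type that keeps skip/select runs when they are long and falls back to a bitmap when the filter result flips too often. `Run`, `UnifiedSelection`, `to_runs`, and the `min_avg_run` threshold are hypothetical; they are not the types or the heuristic used in this PR.

```rust
/// Hypothetical "unified select": one enum that stores a row selection either
/// as skip/select runs (ranges) or as a per-row bitmap.
/// Names and `min_avg_run` are illustrative assumptions, not this PR's API.
#[derive(Debug, Clone, PartialEq)]
enum Run {
    Skip(usize),
    Select(usize),
}

#[derive(Debug, Clone, PartialEq)]
enum UnifiedSelection {
    Ranges(Vec<Run>),
    Bitmap(Vec<bool>),
}

/// Collapse a per-row filter result into alternating skip/select runs.
fn to_runs(mask: &[bool]) -> Vec<Run> {
    let mut runs = Vec::new();
    let mut i = 0;
    while i < mask.len() {
        let selected = mask[i];
        let start = i;
        while i < mask.len() && mask[i] == selected {
            i += 1;
        }
        let len = i - start;
        runs.push(if selected { Run::Select(len) } else { Run::Skip(len) });
    }
    runs
}

impl UnifiedSelection {
    /// Keep ranges when runs are long on average; switch to a bitmap when the
    /// selection flips so often that one read/skip call per run would dominate.
    fn from_mask(mask: &[bool], min_avg_run: usize) -> Self {
        let runs = to_runs(mask);
        if runs.is_empty() || mask.len() / runs.len() >= min_avg_run {
            UnifiedSelection::Ranges(runs)
        } else {
            UnifiedSelection::Bitmap(mask.to_vec())
        }
    }
}

fn main() {
    let clustered = [true, true, true, true, false, false, false, false];
    let scattered = [true, false, true, false, true, false, true, false];
    // Two long runs: stays as ranges.
    println!("{:?}", UnifiedSelection::from_mask(&clustered, 4));
    // Eight one-row runs: falls back to a bitmap.
    println!("{:?}", UnifiedSelection::from_mask(&scattered, 4));
}
```

The cutoff is the hard part in practice; the value 4 here is arbitrary.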

@zhuqi-lucas
Contributor Author

zhuqi-lucas commented Apr 30, 2025

Taking Q27 as an example, the regression still seems to come from decompression: even in the page cache case, more time is spent decompressing, which causes the regression. We need to investigate it further.

  1. The no-pushdown case:
    [flamegraph: flamegraphFast]
  2. The page cache case:
    [flamegraph: flamegraphPageCache]
  3. The current PR case:
    [flamegraph: flamegraphSlow]

@alamb
Contributor

alamb commented Apr 30, 2025

@alamb
Contributor

alamb commented Apr 30, 2025

It helps part of the regression about the read record/skip record too dense, which is the original regression:

Here is the result for page cache without this PR: #7363 (comment) The regression will from Q24->28 and Q30 -> Q31.

Q30 / Q31 no regression now for current PR:

│ QQuery 30    │  420.68ms │                       431.94ms │     no change │
│ QQuery 31    │  571.58ms │                       528.87ms │ +1.08x faster │

https://github.com/apache/datafusion/blob/7b370e26fea75fcd17121272eec1bd9447b2cb8f/benchmarks/queries/clickbench/queries.sql#L31-L32

These queries have a predicate like

WHERE "SearchPhrase" <> ''

But SearchPhrase is only used for filtering (i.e. it is not in the projection).

For example

SELECT "SearchEngineID", "ClientIP", COUNT(*) AS c, SUM("IsRefresh"), AVG("ResolutionWidth") FROM hits WHERE "SearchPhrase" <> '' GROUP BY "SearchEngineID", "ClientIP" ORDER BY c DESC LIMIT 10;

But Q24 -> Q 28 still have regression, same with original result:

│ QQuery 24    │  273.40ms │                       386.72ms │  1.41x slower │
│ QQuery 25    │  274.14ms │                       370.83ms │  1.35x slower │
│ QQuery 26    │  320.12ms │                       435.73ms │  1.36x slower │
│ QQuery 27    │  900.06ms │                      1354.63ms │  1.51x slower │
│ QQuery 28    │ 7812.82ms │                      9813.62ms │  1.26x slower │

https://github.com/apache/datafusion/blob/7b370e26fea75fcd17121272eec1bd9447b2cb8f/benchmarks/queries/clickbench/queries.sql#L25-L29

These queries have the same predicate

WHERE "SearchPhrase" <> ''

But in this case SearchPhrase is also used in the rest of the query (and is thus in the projection).

For example

SELECT "SearchPhrase" FROM hits WHERE "SearchPhrase" <> '' ORDER BY "EventTime" LIMIT 10;
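One way to read this distinction is as a decode count. The Rust toy model below is NOT the parquet reader. It assumes that with filter pushdown each predicate column is decoded once to evaluate the filter and each projected column is decoded again for output, and it models a page cache as a hypothetical `reuse_filter_pages` flag. Under those assumptions, SearchPhrase is decoded twice for queries shaped like Q24 to Q28 but only once for queries shaped like Q30/Q31.

```rust
use std::collections::BTreeMap;

/// Toy model, NOT the parquet crate's reader. ASSUMPTION: with filter pushdown,
/// each predicate column is decoded once to evaluate the filter, and each
/// projected column is decoded (again) to build the output. `reuse_filter_pages`
/// stands in for a hypothetical page cache that skips the second decode.
fn decode_counts(
    predicate: &[&str],
    projection: &[&str],
    reuse_filter_pages: bool,
) -> BTreeMap<String, usize> {
    let mut counts = BTreeMap::new();
    for col in predicate {
        *counts.entry(col.to_string()).or_insert(0) += 1;
    }
    for col in projection {
        let decoded_for_filter = predicate.contains(col);
        if !(decoded_for_filter && reuse_filter_pages) {
            *counts.entry(col.to_string()).or_insert(0) += 1;
        }
    }
    counts
}

fn main() {
    let filter = ["SearchPhrase"];
    // Q24-Q28 shape: SearchPhrase is filtered on and also projected.
    let projected_filter_col = ["SearchPhrase", "EventTime"];
    // Q30/Q31 shape: SearchPhrase is only used by the filter.
    let filter_only = ["SearchEngineID", "ClientIP", "IsRefresh", "ResolutionWidth"];
    println!("{:?}", decode_counts(&filter, &projected_filter_col, false));
    println!("{:?}", decode_counts(&filter, &projected_filter_col, true));
    println!("{:?}", decode_counts(&filter, &filter_only, false));
}
```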

@zhuqi-lucas
Contributor Author

Thank you @alamb, good finding! So in theory we can combine the unified select (this PR) with the page cache and get the best performance so far. I will try to do a PoC.

/// Unlike intersection, the `other` [`BooleanRowSelection`] must have exactly as many set bits as `self`.
/// This method will keep only the bits in `self` that are also set in `other`
/// at the positions corresponding to `self`'s set bits.
pub fn and_then(&self, other: &Self) -> Self {
Contributor

Could this use a bitwise AND instead?

Contributor Author

I will move everything in this new file into the enum file, so we can remove this separate boolean selector.
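The bitwise-AND question hinges on the length contract of `and_then`. Assuming it matches the parquet crate's existing `RowSelection::and_then` (where `other` covers only the rows `self` selected, so the two bitmaps usually differ in length), a plain bitwise AND only applies after `other` has been scattered back into `self`'s row positions. A minimal Rust sketch over `Vec<bool>`; both function names are hypothetical.

```rust
/// Hypothetical bitmap `and_then`, ASSUMING `other` has one bit per row
/// selected by `this`: bit k says whether the k-th selected row survives.
fn and_then(this: &[bool], other: &[bool]) -> Vec<bool> {
    let selected = this.iter().filter(|&&b| b).count();
    assert_eq!(other.len(), selected, "`other` needs one bit per selected row");
    let mut next = other.iter();
    // `&&` short-circuits, so `next` only advances on rows that `this` selected.
    this.iter().map(|&b| b && *next.next().unwrap()).collect()
}

/// The same result via a plain bitwise AND, which only works once `other`
/// has been scattered back to `this`'s row positions (equal lengths).
fn and_then_via_and(this: &[bool], other: &[bool]) -> Vec<bool> {
    let mut next = other.iter();
    let scattered: Vec<bool> = this
        .iter()
        .map(|&b| if b { *next.next().unwrap() } else { false })
        .collect();
    this.iter().zip(&scattered).map(|(&a, &b)| a & b).collect()
}

fn main() {
    let this = [true, false, true, true]; // 3 rows selected
    let other = [false, true, true]; // filter result for those 3 rows only
    println!("{:?}", and_then(&this, &other));
    println!("{:?}", and_then_via_and(&this, &other));
}
```

Both calls return `[false, false, true, true]` for the example in `main`. With a real bitmap type, the scatter step is where the extra work goes, which may be why a dedicated `and_then` was written instead of a single AND.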

@zhuqi-lucas
Contributor Author

Update on the Q27 investigation: I found the root cause of the page cache PR spending more time decoding pages. I will try to update the polish_page_cache PR to address it, and I hope that solves all the regressions for the page cache.

Labels
parquet Changes to the parquet crate
3 participants