Speedup filter_bytes ~-20-40%, filter_native low selectivity (~-37%) #7463

Open: wants to merge 9 commits into main

@Dandandan (Contributor) commented May 4, 2025

Which issue does this PR close?

Closes #7465

  • We can precalculate capacity for the
  • Use the Vec API in filter_native to generate faster code.
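Both bullets describe the changes only briefly, so here is a minimal Rust sketch of the two general ideas: size the output once up front, then fill it with `Vec` bulk operations. It is not the PR's code. The function names are hypothetical, it uses plain `Vec`s rather than arrow buffers, and it assumes that for byte arrays the precalculated capacity is the total length of the selected values, which the first bullet does not actually state.

```rust
/// Hypothetical helper: gather the rows at `indices` (typical for low selectivity).
fn filter_by_indices<T: Copy>(values: &[T], indices: &[usize]) -> Vec<T> {
    // Exactly one output element per selected index, so reserve that up front.
    let mut out = Vec::with_capacity(indices.len());
    out.extend(indices.iter().map(|&i| values[i]));
    out
}

/// Hypothetical helper: copy contiguous [start, end) runs (typical for high selectivity).
fn filter_by_slices<T: Copy>(values: &[T], slices: &[(usize, usize)]) -> Vec<T> {
    let count: usize = slices.iter().map(|&(start, end)| end - start).sum();
    let mut out = Vec::with_capacity(count);
    for &(start, end) in slices {
        // One bulk copy per run instead of one push per row.
        out.extend_from_slice(&values[start..end]);
    }
    out
}

/// Hypothetical helper: filter a variable-length byte array stored as
/// `offsets` + `values`, precomputing the capacity of the output values.
fn filter_bytes_by_indices(
    offsets: &[i32],
    values: &[u8],
    indices: &[usize],
) -> (Vec<i32>, Vec<u8>) {
    // Pass 1: total number of bytes the selected values will occupy.
    let total: usize = indices
        .iter()
        .map(|&i| (offsets[i + 1] - offsets[i]) as usize)
        .sum();

    let mut out_offsets = Vec::with_capacity(indices.len() + 1);
    let mut out_values = Vec::with_capacity(total);
    out_offsets.push(0i32);

    // Pass 2: copy the bytes; capacity was reserved, so this never reallocates.
    for &i in indices {
        let (start, end) = (offsets[i] as usize, offsets[i + 1] as usize);
        out_values.extend_from_slice(&values[start..end]);
        out_offsets.push(out_values.len() as i32);
    }
    (out_offsets, out_values)
}
```

For example, filtering the strings `"a"`, `"bb"`, `"ccc"` (offsets `[0, 1, 3, 6]`) with indices `[0, 2]` first computes a total of 4 bytes, then produces offsets `[0, 1, 4]` and values `"accc"` without growing either buffer.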

Rationale for this change

filter context string (kept 1/2)
                        time:   [494.54 µs 496.78 µs 499.12 µs]
                        change: [-4.6353% -2.8403% -1.1083%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
  5 (5.00%) high mild
  6 (6.00%) high severe

filter context string high selectivity (kept 1023/1024)
                        time:   [655.05 µs 657.74 µs 660.21 µs]
                        change: [-42.752% -41.103% -39.592%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 24 outliers among 100 measurements (24.00%)
  5 (5.00%) low severe
  8 (8.00%) low mild
  4 (4.00%) high mild
  7 (7.00%) high severe

filter context string low selectivity (kept 1/1024)
                        time:   [616.29 ns 617.36 ns 618.40 ns]
                        change: [-26.996% -26.638% -26.282%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  6 (6.00%) high mild
  5 (5.00%) high severe


filter context i32 low selectivity (kept 1/1024)
                        time:   [139.28 ns 139.80 ns 140.41 ns]
                        change: [-37.655% -37.164% -36.612%] (p = 0.00 < 0.05)
                        Performance has improved.
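The benchmark names separate low selectivity (kept 1/1024) from high selectivity (kept 1023/1024) because a filter kernel can take different paths depending on how many rows survive: gather individual indices when few rows are kept, or copy contiguous runs when most are. The sketch below is a simplified model of that choice; the enum, the function names, and the 0.8 cut-off are assumptions for illustration and are not claimed to match arrow-rs exactly.

```rust
/// Hypothetical iteration strategies for applying a boolean filter.
#[derive(Debug, PartialEq)]
enum Strategy {
    KeepAll,  // every row is kept
    KeepNone, // no row is kept
    Slices,   // copy contiguous runs of kept rows
    Indices,  // gather kept rows one index at a time
}

/// Assumed cut-off: above this fraction of kept rows, copying runs is preferred.
const SLICES_THRESHOLD: f64 = 0.8;

fn choose_strategy(mask: &[bool]) -> Strategy {
    let len = mask.len();
    let kept = mask.iter().filter(|&&m| m).count();
    if kept == len {
        Strategy::KeepAll
    } else if kept == 0 {
        Strategy::KeepNone
    } else if kept as f64 / len as f64 > SLICES_THRESHOLD {
        Strategy::Slices
    } else {
        Strategy::Indices
    }
}

/// Indices of kept rows, e.g. [false, true, true] -> [1, 2].
fn selected_indices(mask: &[bool]) -> Vec<usize> {
    mask.iter()
        .enumerate()
        .filter(|&(_, &m)| m)
        .map(|(i, _)| i)
        .collect()
}

/// Half-open [start, end) runs of kept rows, e.g. [true, true, false, true] -> [(0, 2), (3, 4)].
fn selected_slices(mask: &[bool]) -> Vec<(usize, usize)> {
    let mut runs = Vec::new();
    let mut start = None;
    for (i, &m) in mask.iter().enumerate() {
        match (m, start) {
            (true, None) => start = Some(i),
            (false, Some(s)) => {
                runs.push((s, i));
                start = None;
            }
            _ => {}
        }
    }
    if let Some(s) = start {
        runs.push((s, mask.len()));
    }
    runs
}
```

Under this model, "kept 1/1024" and "kept 1/2" both fall on the index-gathering path, while "kept 1023/1024" copies a few long runs, which is one reason the same kernel change can move these cases by very different amounts.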

What changes are included in this PR?

Are there any user-facing changes?

github-actions bot added the "arrow" label (Changes to the arrow crate) May 4, 2025
@Dandandan changed the title from "Speedup filter_bytes" to "Speedup filter_bytes ~20-40%" May 4, 2025
@Dandandan changed the title from "Speedup filter_bytes ~20-40%" to "Speedup filter_bytes ~20-40%, filter native low selectivity (~-37%)" May 4, 2025
@Dandandan changed the title from "Speedup filter_bytes ~20-40%, filter native low selectivity (~-37%)" to "Speedup filter_bytes ~20-40%, filter_native low selectivity (~-37%)" May 4, 2025
@Dandandan changed the title from "Speedup filter_bytes ~20-40%, filter_native low selectivity (~-37%)" to "Speedup filter_bytes ~-20-40%, filter_native low selectivity (~-37%)" May 4, 2025
@alamb (Contributor) left a comment

Thanks @Dandandan -- I have queued up some benchmark runs on this branch too. The code makes sense to me, though I have a suggestion to make the code easier to understand.

@alamb (Contributor) commented May 5, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing speedup_filters (816bab2) to 4491b17 diff
BENCH_NAME=filter_kernels
BENCH_COMMAND=cargo bench --all-features --bench filter_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=speedup_filters
Results will be posted here when complete

@alamb (Contributor) commented May 5, 2025

🤖: Benchmark completed

group                                                                         main                                   speedup_filters
-----                                                                         ----                                   ---------------
filter context decimal128 (kept 1/2)                                          1.15     45.7±5.70µs        ? ?/sec    1.00     39.8±0.46µs        ? ?/sec
filter context decimal128 high selectivity (kept 1023/1024)                   1.04     52.1±1.46µs        ? ?/sec    1.00     50.0±0.98µs        ? ?/sec
filter context decimal128 low selectivity (kept 1/1024)                       1.00    276.2±0.84ns        ? ?/sec    1.02    281.5±1.88ns        ? ?/sec
filter context f32 (kept 1/2)                                                 1.00     70.2±0.20µs        ? ?/sec    1.30     91.1±0.22µs        ? ?/sec
filter context f32 high selectivity (kept 1023/1024)                          1.00     14.1±0.39µs        ? ?/sec    1.04     14.7±0.44µs        ? ?/sec
filter context f32 low selectivity (kept 1/1024)                              1.00    505.0±0.94ns        ? ?/sec    1.00    506.6±0.52ns        ? ?/sec
filter context fsb with value length 20 (kept 1/2)                            1.00     42.5±0.07µs        ? ?/sec    1.66     70.7±0.10µs        ? ?/sec
filter context fsb with value length 20 high selectivity (kept 1023/1024)     1.00     42.5±0.05µs        ? ?/sec    1.67     70.8±0.24µs        ? ?/sec
filter context fsb with value length 20 low selectivity (kept 1/1024)         1.00     42.5±0.11µs        ? ?/sec    1.67     71.0±0.82µs        ? ?/sec
filter context fsb with value length 5 (kept 1/2)                             1.00     42.5±0.08µs        ? ?/sec    1.67     70.8±0.06µs        ? ?/sec
filter context fsb with value length 5 high selectivity (kept 1023/1024)      1.00     42.5±0.10µs        ? ?/sec    1.66     70.7±0.11µs        ? ?/sec
filter context fsb with value length 5 low selectivity (kept 1/1024)          1.00     42.7±1.43µs        ? ?/sec    1.66     70.8±0.54µs        ? ?/sec
filter context fsb with value length 50 (kept 1/2)                            1.00     42.6±0.42µs        ? ?/sec    1.66     70.8±0.09µs        ? ?/sec
filter context fsb with value length 50 high selectivity (kept 1023/1024)     1.00     42.5±0.10µs        ? ?/sec    1.67     70.8±0.33µs        ? ?/sec
filter context fsb with value length 50 low selectivity (kept 1/1024)         1.00     42.5±0.11µs        ? ?/sec    1.66     70.7±0.16µs        ? ?/sec
filter context i32 (kept 1/2)                                                 1.00     19.1±0.07µs        ? ?/sec    1.19     22.7±0.09µs        ? ?/sec
filter context i32 high selectivity (kept 1023/1024)                          1.04      6.5±0.43µs        ? ?/sec    1.00      6.3±0.45µs        ? ?/sec
filter context i32 low selectivity (kept 1/1024)                              1.02    272.7±0.46ns        ? ?/sec    1.00    268.3±0.41ns        ? ?/sec
filter context i32 w NULLs (kept 1/2)                                         1.00     62.4±0.23µs        ? ?/sec    1.51     94.5±0.29µs        ? ?/sec
filter context i32 w NULLs high selectivity (kept 1023/1024)                  1.00     14.4±0.47µs        ? ?/sec    1.02     14.7±0.40µs        ? ?/sec
filter context i32 w NULLs low selectivity (kept 1/1024)                      1.00    485.3±2.72ns        ? ?/sec    1.30    628.8±3.16ns        ? ?/sec
filter context mixed string view (kept 1/2)                                   1.00    698.9±6.38µs        ? ?/sec    1.02    712.1±3.97µs        ? ?/sec
filter context mixed string view high selectivity (kept 1023/1024)            1.01   1230.4±2.78µs        ? ?/sec    1.00   1217.4±2.54µs        ? ?/sec
filter context mixed string view low selectivity (kept 1/1024)                1.00   1202.8±1.96ns        ? ?/sec    1.02   1223.9±2.35ns        ? ?/sec
filter context short string view (kept 1/2)                                   1.00    444.1±2.67µs        ? ?/sec    1.05    468.0±1.57µs        ? ?/sec
filter context short string view high selectivity (kept 1023/1024)            1.00    785.1±2.96µs        ? ?/sec    1.00    781.9±1.62µs        ? ?/sec
filter context short string view low selectivity (kept 1/1024)                1.00    946.8±1.28ns        ? ?/sec    1.03    979.1±1.72ns        ? ?/sec
filter context string (kept 1/2)                                              1.00   829.5±17.60µs        ? ?/sec    1.02   849.1±24.04µs        ? ?/sec
filter context string dictionary (kept 1/2)                                   1.00    243.2±1.12µs        ? ?/sec    1.01    246.2±1.23µs        ? ?/sec
filter context string dictionary high selectivity (kept 1023/1024)            1.00    248.1±0.67µs        ? ?/sec    1.00    248.8±0.80µs        ? ?/sec
filter context string dictionary low selectivity (kept 1/1024)                1.00    201.8±0.25µs        ? ?/sec    1.00    201.0±0.70µs        ? ?/sec
filter context string dictionary w NULLs (kept 1/2)                           1.00    337.8±0.83µs        ? ?/sec    1.11    373.6±0.99µs        ? ?/sec
filter context string dictionary w NULLs high selectivity (kept 1023/1024)    1.00    460.5±0.94µs        ? ?/sec    1.02    469.0±0.93µs        ? ?/sec
filter context string dictionary w NULLs low selectivity (kept 1/1024)        1.00    101.2±0.20µs        ? ?/sec    1.00    100.9±0.19µs        ? ?/sec
filter context string high selectivity (kept 1023/1024)                       1.00  1344.1±56.37µs        ? ?/sec    1.01  1363.2±40.28µs        ? ?/sec
filter context string low selectivity (kept 1/1024)                           1.27   1943.2±5.30ns        ? ?/sec    1.00   1532.6±3.15ns        ? ?/sec
filter context u8 (kept 1/2)                                                  1.00     18.3±0.06µs        ? ?/sec    1.04     19.0±0.06µs        ? ?/sec
filter context u8 high selectivity (kept 1023/1024)                           1.00  1875.2±11.43ns        ? ?/sec    1.00   1870.0±9.57ns        ? ?/sec
filter context u8 low selectivity (kept 1/1024)                               1.06    274.9±0.61ns        ? ?/sec    1.00    260.1±0.49ns        ? ?/sec
filter context u8 w NULLs (kept 1/2)                                          1.00     61.5±0.23µs        ? ?/sec    1.47     90.4±0.24µs        ? ?/sec
filter context u8 w NULLs high selectivity (kept 1023/1024)                   1.00      9.0±0.02µs        ? ?/sec    1.07      9.7±0.02µs        ? ?/sec
filter context u8 w NULLs low selectivity (kept 1/1024)                       1.00    597.6±1.96ns        ? ?/sec    1.02    609.8±1.87ns        ? ?/sec
filter decimal128 (kept 1/2)                                                  1.00     92.9±0.31µs        ? ?/sec    1.04     96.9±0.37µs        ? ?/sec
filter decimal128 high selectivity (kept 1023/1024)                           1.02     54.8±1.01µs        ? ?/sec    1.00     53.9±1.66µs        ? ?/sec
filter decimal128 low selectivity (kept 1/1024)                               1.00      2.4±0.00µs        ? ?/sec    1.03      2.4±0.02µs        ? ?/sec
filter f32 (kept 1/2)                                                         1.15    228.9±0.40µs        ? ?/sec    1.00    199.9±0.39µs        ? ?/sec
filter fsb with value length 20 (kept 1/2)                                    1.00    147.5±0.46µs        ? ?/sec    1.02    150.2±0.64µs        ? ?/sec
filter fsb with value length 20 high selectivity (kept 1023/1024)             1.00     70.4±1.32µs        ? ?/sec    1.01     70.9±1.82µs        ? ?/sec
filter fsb with value length 20 low selectivity (kept 1/1024)                 1.00      2.5±0.01µs        ? ?/sec    1.04      2.6±0.00µs        ? ?/sec
filter fsb with value length 5 (kept 1/2)                                     1.00    142.1±0.27µs        ? ?/sec    1.07    152.6±0.30µs        ? ?/sec
filter fsb with value length 5 high selectivity (kept 1023/1024)              1.00     11.0±0.56µs        ? ?/sec    1.02     11.2±0.70µs        ? ?/sec
filter fsb with value length 5 low selectivity (kept 1/1024)                  1.00      2.5±0.01µs        ? ?/sec    1.02      2.6±0.00µs        ? ?/sec
filter fsb with value length 50 (kept 1/2)                                    1.00    184.3±4.41µs        ? ?/sec    1.06   195.2±10.35µs        ? ?/sec
filter fsb with value length 50 high selectivity (kept 1023/1024)             1.01    211.6±5.70µs        ? ?/sec    1.00    209.0±8.56µs        ? ?/sec
filter fsb with value length 50 low selectivity (kept 1/1024)                 1.00      2.5±0.00µs        ? ?/sec    1.02      2.6±0.01µs        ? ?/sec
filter i32 (kept 1/2)                                                         1.00     91.2±0.17µs        ? ?/sec    1.01     91.7±0.14µs        ? ?/sec
filter i32 high selectivity (kept 1023/1024)                                  1.01      8.7±0.34µs        ? ?/sec    1.00      8.6±0.38µs        ? ?/sec
filter i32 low selectivity (kept 1/1024)                                      1.00      2.4±0.01µs        ? ?/sec    1.04      2.5±0.01µs        ? ?/sec
filter optimize (kept 1/2)                                                    1.01     92.3±0.39µs        ? ?/sec    1.00     91.1±0.13µs        ? ?/sec
filter optimize high selectivity (kept 1023/1024)                             1.04      3.2±0.01µs        ? ?/sec    1.00      3.1±0.01µs        ? ?/sec
filter optimize low selectivity (kept 1/1024)                                 1.00      2.3±0.02µs        ? ?/sec    1.00      2.3±0.01µs        ? ?/sec
filter run array (kept 1/2)                                                   1.00    375.1±0.97µs        ? ?/sec    1.17    440.2±0.87µs        ? ?/sec
filter run array high selectivity (kept 1023/1024)                            1.00    349.4±0.80µs        ? ?/sec    1.19    416.3±1.23µs        ? ?/sec
filter run array low selectivity (kept 1/1024)                                1.00    246.6±1.06µs        ? ?/sec    1.28    315.0±1.24µs        ? ?/sec
filter single record batch                                                    1.00     91.7±0.17µs        ? ?/sec    1.04     95.2±0.15µs        ? ?/sec
filter u8 (kept 1/2)                                                          1.00     91.6±0.09µs        ? ?/sec    1.01     92.9±0.10µs        ? ?/sec
filter u8 high selectivity (kept 1023/1024)                                   1.01      4.1±0.03µs        ? ?/sec    1.00      4.0±0.04µs        ? ?/sec
filter u8 low selectivity (kept 1/1024)                                       1.00      2.4±0.00µs        ? ?/sec    1.02      2.5±0.01µs        ? ?/sec
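As a worked example of reading these tables: the ratio column appears to be each time divided by the faster of the two times in that row, which is consistent with the printed values. For "filter context string low selectivity (kept 1/1024)", 1943.2 ns / 1532.6 ns ≈ 1.27, and the relative change from main to the branch is (1532.6 − 1943.2) / 1943.2 ≈ −21.1%. The two helpers below just encode that arithmetic; their names are hypothetical. Criterion's own `change` line (as quoted in the PR description) comes from its statistical comparison of samples on a different machine, so it need not match numbers derived from these printed means.

```rust
/// Ratio as printed in the table: this time divided by the fastest time in the row.
fn ratio(time: f64, fastest: f64) -> f64 {
    time / fastest
}

/// Relative change from `old` to `new`, in percent (negative means faster).
fn percent_change(old: f64, new: f64) -> f64 {
    (new - old) / old * 100.0
}
```

The same arithmetic flags regressions: "filter context f32 (kept 1/2)" goes from 70.2 µs to 91.1 µs, a ratio of about 1.30 for the branch.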

@alamb (Contributor) commented May 5, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing speedup_filters (2f2905e) to 7905545 diff
BENCH_NAME=filter_kernels
BENCH_COMMAND=cargo bench --all-features --bench filter_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=speedup_filters
Results will be posted here when complete

@Dandandan (Contributor, Author) commented:

These are my results:

group                                                                         before                                 speedup_filters
-----                                                                         ------                                 ---------------
filter context decimal128 (kept 1/2)                                          1.00     18.9±0.06µs        ? ?/sec    1.06     20.0±0.04µs        ? ?/sec
filter context decimal128 high selectivity (kept 1023/1024)                   1.00     22.4±0.73µs        ? ?/sec    1.04     23.4±1.04µs        ? ?/sec
filter context decimal128 low selectivity (kept 1/1024)                       1.12    208.8±0.83ns        ? ?/sec    1.00    186.4±0.82ns        ? ?/sec
filter context f32 (kept 1/2)                                                 1.01     38.9±0.55µs        ? ?/sec    1.00     38.5±0.20µs        ? ?/sec
filter context f32 high selectivity (kept 1023/1024)                          1.00      8.6±0.04µs        ? ?/sec    1.00      8.5±0.04µs        ? ?/sec
filter context f32 low selectivity (kept 1/1024)                              1.21   398.3±11.48ns        ? ?/sec    1.00    328.3±3.55ns        ? ?/sec
filter context fsb with value length 20 (kept 1/2)                            1.01     27.9±0.42µs        ? ?/sec    1.00     27.6±0.11µs        ? ?/sec
filter context fsb with value length 20 high selectivity (kept 1023/1024)     1.01     28.1±0.41µs        ? ?/sec    1.00     27.7±0.08µs        ? ?/sec
filter context fsb with value length 20 low selectivity (kept 1/1024)         1.01     27.9±0.45µs        ? ?/sec    1.00     27.7±0.09µs        ? ?/sec
filter context fsb with value length 5 (kept 1/2)                             1.01     27.9±0.41µs        ? ?/sec    1.00     27.6±0.09µs        ? ?/sec
filter context fsb with value length 5 high selectivity (kept 1023/1024)      1.01     27.9±0.47µs        ? ?/sec    1.00     27.6±0.09µs        ? ?/sec
filter context fsb with value length 5 low selectivity (kept 1/1024)          1.01     28.0±0.51µs        ? ?/sec    1.00     27.7±0.31µs        ? ?/sec
filter context fsb with value length 50 (kept 1/2)                            1.01     28.0±0.59µs        ? ?/sec    1.00     27.7±0.11µs        ? ?/sec
filter context fsb with value length 50 high selectivity (kept 1023/1024)     1.01     28.0±0.51µs        ? ?/sec    1.00     27.6±0.08µs        ? ?/sec
filter context fsb with value length 50 low selectivity (kept 1/1024)         1.01     28.0±0.55µs        ? ?/sec    1.00     27.6±0.08µs        ? ?/sec
filter context i32 (kept 1/2)                                                 1.00     10.8±0.08µs        ? ?/sec    1.00     10.8±0.07µs        ? ?/sec
filter context i32 high selectivity (kept 1023/1024)                          1.00      6.3±0.02µs        ? ?/sec    1.00      6.3±0.04µs        ? ?/sec
filter context i32 low selectivity (kept 1/1024)                              1.58    216.5±7.54ns        ? ?/sec    1.00    137.2±0.54ns        ? ?/sec
filter context i32 w NULLs (kept 1/2)                                         1.00     38.7±0.52µs        ? ?/sec    1.00     38.6±0.15µs        ? ?/sec
filter context i32 w NULLs high selectivity (kept 1023/1024)                  1.00      8.6±0.06µs        ? ?/sec    1.00      8.5±0.05µs        ? ?/sec
filter context i32 w NULLs low selectivity (kept 1/1024)                      1.20   386.4±10.14ns        ? ?/sec    1.00   323.1±12.60ns        ? ?/sec
filter context mixed string view (kept 1/2)                                   1.00     46.9±0.55µs        ? ?/sec    1.02     47.7±0.10µs        ? ?/sec
filter context mixed string view high selectivity (kept 1023/1024)            1.07     25.9±0.76µs        ? ?/sec    1.00     24.3±0.74µs        ? ?/sec
filter context mixed string view low selectivity (kept 1/1024)                1.00   456.8±14.09ns        ? ?/sec    1.01    459.9±2.13ns        ? ?/sec
filter context short string view (kept 1/2)                                   1.00     46.9±0.58µs        ? ?/sec    1.02     47.6±0.14µs        ? ?/sec
filter context short string view high selectivity (kept 1023/1024)            1.04     26.0±1.03µs        ? ?/sec    1.00     25.0±0.85µs        ? ?/sec
filter context short string view low selectivity (kept 1/1024)                1.02    379.7±6.89ns        ? ?/sec    1.00    372.7±1.36ns        ? ?/sec
filter context string (kept 1/2)                                              1.06   531.2±19.63µs        ? ?/sec    1.00    501.2±6.84µs        ? ?/sec
filter context string dictionary (kept 1/2)                                   1.00     11.2±0.06µs        ? ?/sec    1.01     11.3±0.07µs        ? ?/sec
filter context string dictionary high selectivity (kept 1023/1024)            1.00      6.7±0.01µs        ? ?/sec    1.01      6.7±0.06µs        ? ?/sec
filter context string dictionary low selectivity (kept 1/1024)                1.14   542.8±16.33ns        ? ?/sec    1.00    475.9±2.08ns        ? ?/sec
filter context string dictionary w NULLs (kept 1/2)                           1.01     39.2±0.57µs        ? ?/sec    1.00     39.0±0.13µs        ? ?/sec
filter context string dictionary w NULLs high selectivity (kept 1023/1024)    1.00      8.9±0.04µs        ? ?/sec    1.00      9.0±0.06µs        ? ?/sec
filter context string dictionary w NULLs low selectivity (kept 1/1024)        1.09    735.7±8.96ns        ? ?/sec    1.00   673.2±11.45ns        ? ?/sec
filter context string high selectivity (kept 1023/1024)                       1.72  1115.3±26.34µs        ? ?/sec    1.00   648.2±11.23µs        ? ?/sec
filter context string low selectivity (kept 1/1024)                           1.40   876.1±29.56ns        ? ?/sec    1.00    625.4±8.54ns        ? ?/sec
filter context u8 (kept 1/2)                                                  1.00     10.4±0.14µs        ? ?/sec    1.00     10.4±0.03µs        ? ?/sec
filter context u8 high selectivity (kept 1023/1024)                           1.04  1325.0±21.27ns        ? ?/sec    1.00   1276.0±9.11ns        ? ?/sec
filter context u8 low selectivity (kept 1/1024)                               1.55   212.4±10.00ns        ? ?/sec    1.00    137.3±1.43ns        ? ?/sec
filter context u8 w NULLs (kept 1/2)                                          1.01     38.3±0.41µs        ? ?/sec    1.00     38.1±0.14µs        ? ?/sec
filter context u8 w NULLs high selectivity (kept 1023/1024)                   1.00      3.9±0.04µs        ? ?/sec    1.00      3.9±0.04µs        ? ?/sec
filter context u8 w NULLs low selectivity (kept 1/1024)                       1.14   369.4±15.02ns        ? ?/sec    1.00   324.0±11.80ns        ? ?/sec
filter decimal128 (kept 1/2)                                                  1.00    111.4±1.44µs        ? ?/sec    1.00    111.1±0.46µs        ? ?/sec
filter decimal128 high selectivity (kept 1023/1024)                           1.00     23.3±0.95µs        ? ?/sec    1.04     24.3±0.73µs        ? ?/sec
filter decimal128 low selectivity (kept 1/1024)                               1.00   1209.0±6.74ns        ? ?/sec    1.00  1210.8±27.60ns        ? ?/sec
filter f32 (kept 1/2)                                                         1.00    219.0±0.71µs        ? ?/sec    1.00    219.4±0.65µs        ? ?/sec
filter fsb with value length 20 (kept 1/2)                                    1.00    199.9±0.61µs        ? ?/sec    1.00    199.8±1.10µs        ? ?/sec
filter fsb with value length 20 high selectivity (kept 1023/1024)             1.06     31.0±1.45µs        ? ?/sec    1.00     29.3±1.43µs        ? ?/sec
filter fsb with value length 20 low selectivity (kept 1/1024)                 1.00   1468.9±5.74ns        ? ?/sec    1.00   1464.0±6.52ns        ? ?/sec
filter fsb with value length 5 (kept 1/2)                                     1.00    170.0±1.02µs        ? ?/sec    1.00    170.2±0.86µs        ? ?/sec
filter fsb with value length 5 high selectivity (kept 1023/1024)              1.00      8.0±0.04µs        ? ?/sec    1.00      8.0±0.04µs        ? ?/sec
filter fsb with value length 5 low selectivity (kept 1/1024)                  1.03  1491.6±14.11ns        ? ?/sec    1.00   1454.5±9.87ns        ? ?/sec
filter fsb with value length 50 (kept 1/2)                                    1.00    247.7±4.07µs        ? ?/sec    1.08   267.8±21.41µs        ? ?/sec
filter fsb with value length 50 high selectivity (kept 1023/1024)             1.01    273.4±9.45µs        ? ?/sec    1.00    271.8±8.94µs        ? ?/sec
filter fsb with value length 50 low selectivity (kept 1/1024)                 1.00   1449.8±6.93ns        ? ?/sec    1.00   1444.1±6.50ns        ? ?/sec
filter i32 (kept 1/2)                                                         1.00    105.9±0.38µs        ? ?/sec    1.00    106.0±0.43µs        ? ?/sec
filter i32 high selectivity (kept 1023/1024)                                  1.00      6.8±0.03µs        ? ?/sec    1.00      6.8±0.02µs        ? ?/sec
filter i32 low selectivity (kept 1/1024)                                      1.01   1209.9±7.49ns        ? ?/sec    1.00  1199.2±10.75ns        ? ?/sec
filter optimize (kept 1/2)                                                    1.00    106.9±0.94µs        ? ?/sec    1.00    106.8±0.32µs        ? ?/sec
filter optimize high selectivity (kept 1023/1024)                             1.00  1202.3±11.28ns        ? ?/sec    1.00   1206.0±6.68ns        ? ?/sec
filter optimize low selectivity (kept 1/1024)                                 1.00   1072.9±5.60ns        ? ?/sec    1.01   1081.8±5.69ns        ? ?/sec
filter run array (kept 1/2)                                                   1.00    258.2±0.77µs        ? ?/sec    1.00    259.0±1.96µs        ? ?/sec
filter run array high selectivity (kept 1023/1024)                            1.01    179.9±2.86µs        ? ?/sec    1.00    177.9±1.08µs        ? ?/sec
filter run array low selectivity (kept 1/1024)                                1.00    136.9±0.96µs        ? ?/sec    1.00    136.9±0.93µs        ? ?/sec
filter single record batch                                                    1.00    105.9±0.22µs        ? ?/sec    1.00    106.2±0.28µs        ? ?/sec
filter u8 (kept 1/2)                                                          1.00    105.2±0.32µs        ? ?/sec    1.01    105.9±0.21µs        ? ?/sec
filter u8 high selectivity (kept 1023/1024)                                   1.00      2.3±0.01µs        ? ?/sec    1.00      2.3±0.01µs        ? ?/sec
filter u8 low selectivity (kept 1/1024)                                       1.00  1217.2±14.52ns        ? ?/sec    1.00  1217.4±11.08ns        ? ?/sec

@Dandandan (Contributor, Author) commented:

🤖: Benchmark completed

Somehow these results are all over the place, even for untouched kernels 🤔

@alamb (Contributor) commented May 5, 2025

🤖: Benchmark completed

group                                                                         main                                   speedup_filters
-----                                                                         ----                                   ---------------
filter context decimal128 (kept 1/2)                                          1.02     42.5±4.77µs        ? ?/sec    1.00     41.5±2.49µs        ? ?/sec
filter context decimal128 high selectivity (kept 1023/1024)                   1.00     49.9±1.07µs        ? ?/sec    1.02     51.1±1.00µs        ? ?/sec
filter context decimal128 low selectivity (kept 1/1024)                       1.00    277.3±0.40ns        ? ?/sec    1.04    289.0±2.36ns        ? ?/sec
filter context f32 (kept 1/2)                                                 1.00     70.2±0.21µs        ? ?/sec    1.30     91.5±0.37µs        ? ?/sec
filter context f32 high selectivity (kept 1023/1024)                          1.00     14.0±0.64µs        ? ?/sec    1.01     14.2±0.39µs        ? ?/sec
filter context f32 low selectivity (kept 1/1024)                              1.00    599.0±1.19ns        ? ?/sec    1.04    625.6±4.07ns        ? ?/sec
filter context fsb with value length 20 (kept 1/2)                            1.00     42.5±0.08µs        ? ?/sec    1.67     71.0±0.19µs        ? ?/sec
filter context fsb with value length 20 high selectivity (kept 1023/1024)     1.00     42.5±0.09µs        ? ?/sec    1.67     71.0±0.38µs        ? ?/sec
filter context fsb with value length 20 low selectivity (kept 1/1024)         1.00     42.5±0.07µs        ? ?/sec    1.67     71.0±0.10µs        ? ?/sec
filter context fsb with value length 5 (kept 1/2)                             1.00     42.5±0.10µs        ? ?/sec    1.67     71.0±0.19µs        ? ?/sec
filter context fsb with value length 5 high selectivity (kept 1023/1024)      1.00     42.5±0.06µs        ? ?/sec    1.67     71.0±0.11µs        ? ?/sec
filter context fsb with value length 5 low selectivity (kept 1/1024)          1.00     42.5±0.09µs        ? ?/sec    1.67     71.1±0.39µs        ? ?/sec
filter context fsb with value length 50 (kept 1/2)                            1.00     42.5±0.07µs        ? ?/sec    1.67     71.0±0.11µs        ? ?/sec
filter context fsb with value length 50 high selectivity (kept 1023/1024)     1.00     42.5±0.07µs        ? ?/sec    1.67     71.0±0.09µs        ? ?/sec
filter context fsb with value length 50 low selectivity (kept 1/1024)         1.00     42.5±0.06µs        ? ?/sec    1.67     71.0±0.08µs        ? ?/sec
filter context i32 (kept 1/2)                                                 1.00     19.0±0.07µs        ? ?/sec    1.20     22.8±0.06µs        ? ?/sec
filter context i32 high selectivity (kept 1023/1024)                          1.00      6.5±0.31µs        ? ?/sec    1.00      6.5±0.24µs        ? ?/sec
filter context i32 low selectivity (kept 1/1024)                              1.35    367.8±0.48ns        ? ?/sec    1.00    273.0±0.74ns        ? ?/sec
filter context i32 w NULLs (kept 1/2)                                         1.00     62.5±0.27µs        ? ?/sec    1.52     94.7±0.30µs        ? ?/sec
filter context i32 w NULLs high selectivity (kept 1023/1024)                  1.00     13.6±0.44µs        ? ?/sec    1.07     14.6±0.38µs        ? ?/sec
filter context i32 w NULLs low selectivity (kept 1/1024)                      1.14    581.3±0.81ns        ? ?/sec    1.00    511.4±0.59ns        ? ?/sec
filter context mixed string view (kept 1/2)                                   1.00    695.3±6.02µs        ? ?/sec    1.03    717.8±5.42µs        ? ?/sec
filter context mixed string view high selectivity (kept 1023/1024)            1.01  1223.7±11.10µs        ? ?/sec    1.00   1217.2±5.83µs        ? ?/sec
filter context mixed string view low selectivity (kept 1/1024)                1.00   1198.1±2.26ns        ? ?/sec    1.04   1240.6±3.54ns        ? ?/sec
filter context short string view (kept 1/2)                                   1.00    446.0±5.63µs        ? ?/sec    1.05    469.4±3.54µs        ? ?/sec
filter context short string view high selectivity (kept 1023/1024)            1.00    784.7±2.09µs        ? ?/sec    1.00    782.8±2.09µs        ? ?/sec
filter context short string view low selectivity (kept 1/1024)                1.00    943.3±1.30ns        ? ?/sec    1.05    993.3±1.89ns        ? ?/sec
filter context string (kept 1/2)                                              1.00   814.8±18.56µs        ? ?/sec    1.06   860.4±22.78µs        ? ?/sec
filter context string dictionary (kept 1/2)                                   1.00    243.6±1.25µs        ? ?/sec    1.02    247.4±2.11µs        ? ?/sec
filter context string dictionary high selectivity (kept 1023/1024)            1.00    248.9±0.84µs        ? ?/sec    1.00    247.7±0.98µs        ? ?/sec
filter context string dictionary low selectivity (kept 1/1024)                1.01    202.2±0.98µs        ? ?/sec    1.00    201.0±0.35µs        ? ?/sec
filter context string dictionary w NULLs (kept 1/2)                           1.00    337.8±0.73µs        ? ?/sec    1.11    373.6±0.67µs        ? ?/sec
filter context string dictionary w NULLs high selectivity (kept 1023/1024)    1.00    460.5±0.84µs        ? ?/sec    1.02    468.8±0.99µs        ? ?/sec
filter context string dictionary w NULLs low selectivity (kept 1/1024)        1.00    101.2±0.12µs        ? ?/sec    1.00    101.0±0.55µs        ? ?/sec
filter context string high selectivity (kept 1023/1024)                       1.03  1339.4±34.66µs        ? ?/sec    1.00  1304.0±45.92µs        ? ?/sec
filter context string low selectivity (kept 1/1024)                           1.06   1620.4±2.47ns        ? ?/sec    1.00   1534.0±2.82ns        ? ?/sec
filter context u8 (kept 1/2)                                                  1.00     18.3±0.03µs        ? ?/sec    1.04     19.0±0.03µs        ? ?/sec
filter context u8 high selectivity (kept 1023/1024)                           1.12      2.1±0.01µs        ? ?/sec    1.00  1854.3±14.34ns        ? ?/sec
filter context u8 low selectivity (kept 1/1024)                               1.03    276.8±2.04ns        ? ?/sec    1.00    269.0±0.49ns        ? ?/sec
filter context u8 w NULLs (kept 1/2)                                          1.00     61.5±0.22µs        ? ?/sec    1.47     90.6±0.18µs        ? ?/sec
filter context u8 w NULLs high selectivity (kept 1023/1024)                   1.00      9.0±0.02µs        ? ?/sec    1.07      9.7±0.03µs        ? ?/sec
filter context u8 w NULLs low selectivity (kept 1/1024)                       1.14    701.2±1.15ns        ? ?/sec    1.00    616.5±1.71ns        ? ?/sec
filter decimal128 (kept 1/2)                                                  1.00     93.5±0.52µs        ? ?/sec    1.04     97.6±0.40µs        ? ?/sec
filter decimal128 high selectivity (kept 1023/1024)                           1.00     52.9±1.82µs        ? ?/sec    1.02     54.0±1.57µs        ? ?/sec
filter decimal128 low selectivity (kept 1/1024)                               1.00      2.4±0.01µs        ? ?/sec    1.03      2.4±0.00µs        ? ?/sec
filter f32 (kept 1/2)                                                         1.16    228.8±0.30µs        ? ?/sec    1.00    196.5±0.51µs        ? ?/sec
filter fsb with value length 20 (kept 1/2)                                    1.00    147.3±0.45µs        ? ?/sec    1.01    149.0±0.64µs        ? ?/sec
filter fsb with value length 20 high selectivity (kept 1023/1024)             1.02     71.1±2.77µs        ? ?/sec    1.00     69.5±0.88µs        ? ?/sec
filter fsb with value length 20 low selectivity (kept 1/1024)                 1.00      2.6±0.01µs        ? ?/sec    1.02      2.6±0.00µs        ? ?/sec
filter fsb with value length 5 (kept 1/2)                                     1.00    142.8±4.25µs        ? ?/sec    1.07    152.7±1.04µs        ? ?/sec
filter fsb with value length 5 high selectivity (kept 1023/1024)              1.02     11.0±0.28µs        ? ?/sec    1.00     10.8±0.59µs        ? ?/sec
filter fsb with value length 5 low selectivity (kept 1/1024)                  1.00      2.5±0.00µs        ? ?/sec    1.03      2.5±0.01µs        ? ?/sec
filter fsb with value length 50 (kept 1/2)                                    1.01    191.5±7.72µs        ? ?/sec    1.00    189.2±9.52µs        ? ?/sec
filter fsb with value length 50 high selectivity (kept 1023/1024)             1.10    220.3±3.90µs        ? ?/sec    1.00    200.7±5.84µs        ? ?/sec
filter fsb with value length 50 low selectivity (kept 1/1024)                 1.00      2.6±0.00µs        ? ?/sec    1.02      2.6±0.01µs        ? ?/sec
filter i32 (kept 1/2)                                                         1.00     91.2±0.25µs        ? ?/sec    1.00     91.7±0.13µs        ? ?/sec
filter i32 high selectivity (kept 1023/1024)                                  1.02      8.9±0.35µs        ? ?/sec    1.00      8.7±0.23µs        ? ?/sec
filter i32 low selectivity (kept 1/1024)                                      1.02      2.5±0.02µs        ? ?/sec    1.00      2.5±0.01µs        ? ?/sec
filter optimize (kept 1/2)                                                    1.01     92.4±0.49µs        ? ?/sec    1.00     91.4±0.21µs        ? ?/sec
filter optimize high selectivity (kept 1023/1024)                             1.02      3.2±0.00µs        ? ?/sec    1.00      3.1±0.01µs        ? ?/sec
filter optimize low selectivity (kept 1/1024)                                 1.01      2.3±0.01µs        ? ?/sec    1.00      2.2±0.00µs        ? ?/sec
filter run array (kept 1/2)                                                   1.00    374.0±1.16µs        ? ?/sec    1.18    441.6±0.79µs        ? ?/sec
filter run array high selectivity (kept 1023/1024)                            1.00    345.8±1.43µs        ? ?/sec    1.20    416.5±1.16µs        ? ?/sec
filter run array low selectivity (kept 1/1024)                                1.00    246.8±0.92µs        ? ?/sec    1.28    314.7±0.80µs        ? ?/sec
filter single record batch                                                    1.00     91.7±0.15µs        ? ?/sec    1.00     91.9±0.29µs        ? ?/sec
filter u8 (kept 1/2)                                                          1.00     91.7±0.25µs        ? ?/sec    1.00     92.1±0.20µs        ? ?/sec
filter u8 high selectivity (kept 1023/1024)                                   1.03      4.2±0.01µs        ? ?/sec    1.00      4.1±0.04µs        ? ?/sec
filter u8 low selectivity (kept 1/1024)                                       1.00      2.4±0.01µs        ? ?/sec    1.07      2.6±0.00µs        ? ?/sec
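The rows above follow the critcmp layout. As far as can be told from the numbers, each ratio column is that run's mean time divided by the faster run's mean time for the same benchmark, so the faster side always shows 1.00. The minimal sketch below recomputes one row under that assumption. `ratios` is a hypothetical helper, not part of critcmp.

```rust
/// Hypothetical helper: derive critcmp-style ratios for one benchmark row,
/// where each side is divided by the faster (smaller) of the two mean times.
fn ratios(a: f64, b: f64) -> (f64, f64) {
    let fastest = a.min(b);
    (a / fastest, b / fastest)
}

fn main() {
    // "filter context u8 w NULLs (kept 1/2)": 61.5 µs vs 90.6 µs
    let (ra, rb) = ratios(61.5, 90.6);
    println!("{:.2} {:.2}", ra, rb); // prints "1.00 1.47", matching the table
}
```

Because the displayed times are themselves rounded, a recomputed ratio can be off by about 0.01. For example, "filter context u8 high selectivity" shows 2.1µs vs 1854.3ns, which recomputes to about 1.13 rather than the reported 1.12.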

@Dandandan
Contributor Author

Dandandan commented May 5, 2025

🤖: Benchmark completed

🤯

@alamb
Contributor

alamb commented May 6, 2025

I can try to reproduce the results manually -- it could be:

  1. I am using a crappy GCP VM and there is cross talk or something
  2. The benchmarks are too small and are inherently noisy (too fast to be reliably sampled)
  3. Something else 🤔

These are my results:

What are the specs of your machine @Dandandan ?

@Dandandan
Contributor Author

I ran it on:
Apple M1 Pro
32 GB

It's odd to me that some of the benchmarks, such as "filter context fsb", report a 1.67x slowdown, because nothing has changed in that code path. I expect only byte filtering and everything that uses filter_native (e.g. primitives / views) to be affected; the other benchmarks look odd, and in my runs they remain the same.
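To make the two affected paths concrete, here is a minimal, self-contained sketch of the ideas named in this PR (precalculating capacity for byte filtering, and building primitive output through the `Vec` API). It is not the arrow-rs implementation: `filter_native_sketch` and `filter_bytes_sketch` are hypothetical, the mask is a plain `&[bool]` rather than an Arrow `BooleanBuffer`, and offsets are `usize` instead of Arrow's `i32`/`i64`.

```rust
/// Hypothetical sketch: keep the primitive values whose mask entry is true.
/// The output Vec is sized once from the number of kept rows, then filled
/// with a single `extend` call instead of per-element bookkeeping.
fn filter_native_sketch<T: Copy>(values: &[T], mask: &[bool]) -> Vec<T> {
    assert_eq!(values.len(), mask.len());
    let kept = mask.iter().filter(|&&k| k).count();
    let mut out = Vec::with_capacity(kept);
    out.extend(
        values
            .iter()
            .zip(mask)
            .filter(|&(_, &keep)| keep)
            .map(|(&v, _)| v),
    );
    out
}

/// Hypothetical sketch of filtering a variable-length (offsets + values) array.
/// Pass 1 sums the byte lengths of the kept values so the values buffer is
/// allocated once with exactly the needed capacity; pass 2 copies the bytes.
fn filter_bytes_sketch(offsets: &[usize], values: &[u8], mask: &[bool]) -> (Vec<usize>, Vec<u8>) {
    assert_eq!(offsets.len(), mask.len() + 1);
    let kept = mask.iter().filter(|&&k| k).count();
    let byte_len: usize = (0..mask.len())
        .filter(|&i| mask[i])
        .map(|i| offsets[i + 1] - offsets[i])
        .sum();

    let mut out_offsets = Vec::with_capacity(kept + 1);
    let mut out_values = Vec::with_capacity(byte_len);
    out_offsets.push(0);
    for i in (0..mask.len()).filter(|&i| mask[i]) {
        out_values.extend_from_slice(&values[offsets[i]..offsets[i + 1]]);
        out_offsets.push(out_values.len());
    }
    (out_offsets, out_values)
}

fn main() {
    let ints = filter_native_sketch(&[1, 2, 3, 4], &[true, false, false, true]);
    println!("{:?}", ints); // [1, 4]

    // Strings "a", "bc", "def"; keep the first and the last.
    let (offsets, bytes) = filter_bytes_sketch(&[0, 1, 3, 6], b"abcdef", &[true, false, true]);
    println!("{:?} {:?}", offsets, std::str::from_utf8(&bytes).unwrap()); // [0, 1, 4] "adef"
}
```

Under these assumptions, only code that calls these two paths would change behavior, which is why a slowdown in an unrelated benchmark such as fixed-size binary looks like measurement noise rather than a regression.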

@Dandandan
Contributor Author

I wonder which GCP VM you are using. It isn't a machine that uses bursting, by any chance?
Are the results affected by the order of execution (PR => main vs. main => PR)?
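One common way to check whether run order matters is to counterbalance it, alternating which branch runs first (an "ABBA" schedule) so that slow drift, such as thermal throttling or background load, hits both branches roughly equally. The sketch below only generates such a schedule; `abba_schedule` is a hypothetical helper, and the branch labels `"main"` and `"pr"` are placeholders, not flags of any benchmarking tool.

```rust
/// Hypothetical helper: build a counterbalanced run order in which each
/// round is main, pr, pr, main, so neither branch always runs first.
fn abba_schedule(rounds: usize) -> Vec<&'static str> {
    let mut order = Vec::with_capacity(rounds * 4);
    for _ in 0..rounds {
        order.extend_from_slice(&["main", "pr", "pr", "main"]);
    }
    order
}

fn main() {
    let order = abba_schedule(2);
    println!("{}", order.join(" -> "));
}
```

If per-benchmark ratios from the "main first" and "pr first" positions disagree noticeably, ordering (or drift over time) is likely part of the noise.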

@alamb
Contributor

alamb commented May 7, 2025

It is a c2-standard-16 (16 vCPUs, 64 GB Memory)

I don't think it has bursting

Labels: arrow (Changes to the arrow crate)
Projects: None yet
Development: successfully merging this pull request may close the issue "Speedup filter_bytes by precalculating capacity"
2 participants