Add short circuit evaluation for AND and OR #15462


Merged
merged 24 commits into from
Apr 8, 2025

Conversation

acking-you
Contributor

@acking-you acking-you commented Mar 27, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

  1. Add an extended SQL statement targeting this optimization.
  2. Provide a description of this extended SQL.
  3. Using this optimization, the extended SQL achieves a 2x performance improvement. (If the strings in the filter are longer, the effect grows roughly linearly, up to 500x as reported in BinaryExpr evaluate lacks optimization for Or and And scenarios #11212 (comment).)

Below is the performance comparison of running the extended SQL locally. It seems there is also some improvement in Q4 (maybe noise).

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      main ┃ add_short_circuit ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │  964.48ms │         1006.79ms │     no change │
│ QQuery 1     │  409.89ms │          419.52ms │     no change │
│ QQuery 2     │  838.25ms │          868.74ms │     no change │
│ QQuery 3     │  408.15ms │          396.88ms │     no change │
│ QQuery 4     │ 1029.80ms │          783.40ms │ +1.31x faster │
│ QQuery 5     │ 9429.76ms │         8835.06ms │ +1.07x faster │
│ QQuery 6     │ 4096.47ms │         1382.42ms │ +2.96x faster │
└──────────────┴───────────┴───────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main)                │ 17176.80ms │
│ Total Time (add_short_circuit)   │ 13692.80ms │
│ Average Time (main)              │  2453.83ms │
│ Average Time (add_short_circuit) │  1956.11ms │
│ Queries Faster                   │          3 │
│ Queries Slower                   │          0 │
│ Queries with No Change           │          4 │
└──────────────────────────────────┴────────────┘

While creating this SQL, I also discovered a bug — one of the filters caused a panic: #15461

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Mar 27, 2025
@acking-you
Contributor Author

Some tests failed, so let me take a look at what exactly is going on.

@ctsk
Contributor

ctsk commented Mar 28, 2025

I think one issue is that the short-circuit logic is not handling cases where the rhs contains NULLs. E.g. true OR NULL needs to evaluate to NULL

@acking-you
Contributor Author

I think one issue is that the short-circuit logic is not handling cases where the rhs contains NULLs. E.g. true OR NULL needs to evaluate to NULL

Thank you very much for your hint, it will be very helpful for me to fix these tests!

@acking-you
Contributor Author

I think one issue is that the short-circuit logic is not handling cases where the rhs contains NULLs. E.g. true OR NULL needs to evaluate to NULL

After taking a closer look, the situation you mentioned does not actually reach the short-circuit optimization logic.
The error I'm currently seeing comes from using true_count() == 0 to decide that the value is false, when in reality it could also be null.
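
To make the distinction concrete, here is a minimal sketch (an illustration using the arrow BooleanArray API, not code from the PR) of why true_count() == 0 is not the same as "all false" once nulls are involved:

```rust
use arrow::array::{Array, BooleanArray};

fn main() {
    // A null slot is neither true nor false, so true_count() == 0 does not
    // imply the array is all-false.
    let with_null = BooleanArray::from(vec![Some(false), None]);
    assert_eq!(with_null.true_count(), 0);
    assert_ne!(with_null.false_count(), with_null.len()); // not all-false

    // Only when every slot is a valid `false` can the AND short circuit fire.
    let all_false = BooleanArray::from(vec![Some(false), Some(false)]);
    assert_eq!(all_false.false_count(), all_false.len());
}
```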

@ctsk
Contributor

ctsk commented Mar 28, 2025

You're absolutely right, I got my logic wrong there. Embarrassing!

@acking-you
Contributor Author

You're absolutely right, I got my logic wrong there. Embarrassing!

It's okay. You've also taught me a lot. When I first started writing this, I really didn't consider the case of null.

@acking-you
Contributor Author

Hello @alamb, the optimization SQL and documentation related to this PR have been completed, and all tests have passed. We may need to formally verify the performance, but I'm not quite sure how to do that (I can only run it locally).

Contributor

@alamb alamb left a comment

Thanks @acking-you -- this looks really nice.

I think this needs some tests but otherwise it is looking quite nice.

Also, could you please add the new Q6 benchmark in a separate PR so I can more easily run my benchmark scripts before/after your code change?


### Q6: How many social shares meet complex multi-stage filtering criteria?
**Question**: What is the count of sharing actions from iPhone mobile users on specific social networks, within common timezones, participating in seasonal campaigns, with high screen resolutions and closely matched UTM parameters?
**Important Query Properties**: Simple filters with high selectivity, costly string matching, and a large number of high-overhead filters positioned relatively late in the expression
Contributor

👍

@@ -358,7 +358,50 @@ impl PhysicalExpr for BinaryExpr {
fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
use arrow::compute::kernels::numeric::*;

fn check_short_circuit(arg: &ColumnarValue, op: &Operator) -> bool {
Contributor

Thanks @acking-you -- this looks great

Is there any reason to have this function defined in the evaluate method? I think we could just make it a normal function and reduce the nesting level

Contributor

If we find that this slows down some other performance we could also add some sort of heuristic check to calling false_count / true_count -- like for example if the rhs arg is "complex" (not a Column for example)

Contributor Author

@acking-you acking-you Mar 29, 2025

Is there any reason to have this function defined in the evaluate method?

There was no particular reason. Maybe I couldn't find a suitable place to write it at the time, haha. Where do you think this function should be placed?

If we find that this slows down some other performance we could also add some sort of heuristic check to calling false_count / true_count -- like for example if the rhs arg is "complex" (not a Column for example)

I also agree with that.

Contributor Author

I've moved the function outside and added some comments.
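
For context, here is a sketch of what a free-standing helper along these lines could look like (the name and signature follow the diff above; the body is an illustration based on this discussion, not necessarily the merged code):

```rust
use arrow::array::Array;
use datafusion_common::cast::as_boolean_array;
use datafusion_common::ScalarValue;
use datafusion_expr::{ColumnarValue, Operator};

/// Returns true when `lhs` alone already determines the result of
/// `lhs <op> rhs`: all valid `false` for AND, all valid `true` for OR.
/// Nulls disqualify the short circuit because `NULL AND x` / `NULL OR x`
/// still depend on `x`.
fn check_short_circuit(lhs: &ColumnarValue, op: &Operator) -> bool {
    match (lhs, op) {
        (ColumnarValue::Array(array), Operator::And) => as_boolean_array(array)
            .map(|b| b.false_count() == b.len())
            .unwrap_or(false),
        (ColumnarValue::Array(array), Operator::Or) => as_boolean_array(array)
            .map(|b| b.true_count() == b.len())
            .unwrap_or(false),
        (ColumnarValue::Scalar(ScalarValue::Boolean(Some(v))), Operator::And) => !*v,
        (ColumnarValue::Scalar(ScalarValue::Boolean(Some(v))), Operator::Or) => *v,
        _ => false,
    }
}
```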

@alamb alamb changed the title Add short circuit Add short circuit evaluation for AND and OR Mar 28, 2025
@acking-you
Contributor Author

Also, could you please add the new Q6 benchmark in a separate PR so I can more easily run my benchmark scripts before/after your code change?

Okay, I got it. Do you mean that Q6 and its related description in the current branch should be moved to a separate PR? But in that case, it seems you would still need to cherry-pick Q6 onto the corresponding branch when testing.

acking-you and others added 12 commits March 31, 2025 10:51
…ultiple benchmarks (apache#14642)

* Add support --mem-pool-type and --memory-limit options for all benchmarks

* Add --sort-spill-reservation-bytes option
* Add unit tests to FFI_ExecutionPlan

* Add unit tests for FFI table source

* Add round trip tests for volatility

* Add unit tests for FFI insert op

* Simplify string generation in unit test

Co-authored-by: Andrew Lamb <[email protected]>

* Fix drop of borrowed value

---------

Co-authored-by: Andrew Lamb <[email protected]>
@acking-you
Contributor Author

Also, could you please add the new Q6 benchmark in a separate PR so I can more easily run my benchmark scripts before/after your code change?

I have successfully split Q6 into: #15500
Thank you very much for your CR. @alamb

match arg {
ColumnarValue::Array(array) => {
if let Ok(array) = as_boolean_array(&array) {
return array.false_count() == array.len();
Contributor

I remember this had some overhead (for calculating the counts) from a previous try.
I wonder if it helps to short-circuit this check (e.g. match until we get a chunk of the bitmap != 0)

Contributor Author

@acking-you acking-you Mar 31, 2025

I wonder if it helps to short-circuit this check (e.g. match until we get a chunk of the bitmap != 0)

I think the overhead added here should be very small (the compiler optimization should work well), and the test results we discussed before were sometimes fast and sometimes slow (maybe noise).

Your suggestion of checking early and returning as soon as the result is known seems like a good idea, but I'm not sure whether it will be effective.
My concern with this approach is that it requires adding an if condition inside the for loop, which will most likely prevent the compiler's SIMD auto-vectorization (I've encountered a similar situation before and had to unroll the SIMD manually...).

Contributor

@Dandandan Dandandan Mar 31, 2025

Either way, we can use bool_and (https://docs.rs/arrow/latest/arrow/compute/fn.bool_and.html) and bool_or which operates on u64 values to test performance changes.

Contributor Author

Either way, we can use bool_and (https://docs.rs/arrow/latest/arrow/compute/fn.bool_and.html) and bool_or which operates on u64 values to test performance changes.

Thank you for your suggestion. I will try it later.

Contributor

Might be overkill, but one could try a sampling approach: Run the loop with the early exit for the first few chunks, and then switch over to the unconditional loop.

Almost seems like something the compiler could automagically do...

Contributor Author

Might be overkill, but one could try a sampling approach: Run the loop with the early exit for the first few chunks, and then switch over to the unconditional loop.

Thank you for your suggestion, but if we only apply the conditional check to the first few chunks, I feel this optimization might not be meaningful. If nearly all blocks can be filtered out by the preceding filter, the optimization would no longer be effective.

If we find that this slows down some other performance we could also add some sort of heuristic check to calling false_count / true_count -- like for example if the rhs arg is "complex" (not a Column for example)

I tend to agree with @alamb's point that if the overhead of verification is somewhat unacceptable, adopting some heuristic approaches would be better.

Contributor

I looked more carefully at bool_or and I do think it would be faster than this implementation in the case where there are some true values (as it stops as soon as it finds a single non-zero chunk): https://docs.rs/arrow/latest/arrow/compute/fn.bool_or.html
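
As a concrete illustration of that alternative (a sketch of my own, not code from the PR), the all-false / all-true checks could be phrased with bool_or / bool_and, which scan 64-bit chunks and stop as soon as the answer is known:

```rust
use arrow::array::{Array, BooleanArray};
use arrow::compute::{bool_and, bool_or};

// AND short circuit: every slot is a valid `false`.
fn all_false(array: &BooleanArray) -> bool {
    array.null_count() == 0 && bool_or(array) == Some(false)
}

// OR short circuit: every slot is a valid `true`.
fn all_true(array: &BooleanArray) -> bool {
    array.null_count() == 0 && bool_and(array) == Some(true)
}
```

The trade-off discussed above still applies: the early exit helps when the deciding value shows up early and hurts when the whole array has to be scanned anyway.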

Kontinuation and others added 4 commits April 7, 2025 15:37
@acking-you
Contributor Author

I sincerely apologize for the delay in updating this PR. I have now designed a detailed comparative test for the bool_or/bool_and issue described by @Dandandan. The related results are as follows. I’m not sure what you think about these results — should we continue using true_count/false_count, or look for a better solution? @alamb @Dandandan

  1. I designed six distribution scenarios (code) for each case and compared true_count/false_count against bool_or/bool_and on the boolean array. The final results are as follows:

| Test Case | true_count/false_count | bool_or/bool_and | Change | Performance Impact |
| --- | --- | --- | --- | --- |
| all_false | 1.6593 µs - 1.6695 µs | 4.1013 µs - 4.1774 µs | +150.91% | 🚫 Regressed |
| one_true_first | 1.6726 µs - 1.6771 µs | 1.6885 ns - 1.7430 ns | -99.898% | ✅ Improved |
| one_true_last | 1.6663 µs - 1.6714 µs | 4.1096 µs - 4.1554 µs | +147.14% | 🚫 Regressed |
| one_true_middle | 1.6723 µs - 1.6819 µs | 2.1505 µs - 2.2117 µs | +30.180% | 🚫 Regressed |
| one_true_middle_left | 1.6672 µs - 1.6727 µs | 1.1088 µs - 1.1483 µs | -32.995% | ✅ Improved |
| one_true_middle_right | 1.6689 µs - 1.6741 µs | 3.1562 µs - 3.2521 µs | +93.762% | 🚫 Regressed |
| all_true | 1.6711 µs - 1.6747 µs | 4.2779 µs - 4.4291 µs | +155.35% | 🚫 Regressed |
| one_false_first | 1.6722 µs - 1.6782 µs | 1.6278 ns - 1.6360 ns | -99.903% | ✅ Improved |
| one_false_last | 1.6818 µs - 1.7512 µs | 4.2175 µs - 4.2930 µs | +153.50% | 🚫 Regressed |
| one_false_middle | 1.8437 µs - 1.9665 µs | 2.0575 µs - 2.0871 µs | +11.931% | 🚫 Regressed |
| one_false_middle_left | 2.0004 µs - 2.3194 µs | 1.0243 µs - 1.0398 µs | -57.059% | ✅ Improved |
| one_false_middle_right | 2.0770 µs - 2.2721 µs | 3.0275 µs - 3.0582 µs | +47.668% | 🚫 Regressed |

It can be seen that when the deciding true/false value is located at or to the left of the middle, bool_or/bool_and has a significant advantage, with roughly a 1000x lead when the value is in the first position.

However, in other cases, using true_count/false_count performs better, showing relatively stable behavior across various scenarios.

  2. The test can be reproduced with the following commands:
# test true_count/false_count 
TEST_BOOL_COUNT=1 cargo bench --bench boolean_op
# test bool_or/bool_and 
cargo bench --bench boolean_op

detail benchmark code: https://github.com/apache/datafusion/pull/15462/files#diff-8710f6b44dd74240d19e6fcdfbf971c034f59cd48c022e51b07ed876a8cc7c5e
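
For readers who want to reproduce the shape of this comparison without checking out the branch, a stand-alone criterion sketch along these lines captures the idea (array size, distribution, and names are illustrative, not the exact benchmark in the PR):

```rust
use arrow::array::BooleanArray;
use arrow::compute::bool_or;
use criterion::{criterion_group, criterion_main, Criterion};

/// Boolean array of `len` values with a single `true` at `pos`,
/// mirroring the "one_true_*" distribution scenarios above.
fn one_true_at(len: usize, pos: usize) -> BooleanArray {
    (0..len).map(|i| Some(i == pos)).collect()
}

fn benchmark(c: &mut Criterion) {
    // "one_true_last": the worst case for any early-exit strategy.
    let arr = one_true_at(8192, 8191);
    c.bench_function("true_count", |b| b.iter(|| arr.true_count() == 0));
    c.bench_function("bool_or", |b| b.iter(|| bool_or(&arr) == Some(true)));
}

criterion_group!(benches, benchmark);
criterion_main!(benches);
```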

@alamb alamb mentioned this pull request Apr 7, 2025
@alamb
Contributor

alamb commented Apr 7, 2025

The related results are as follows. I’m not sure what you think about these results — should we continue using true_count/false_count, or look for a better solution? @alamb @Dandandan

Thank you for your diligence @acking-you

I think we should merge this PR in (with the count bits and the benchmarks) and file a follow on ticket to potentially improve the performance in some other way.

I don't fully understand your performance results given that the two functions seem very similar -- maybe it has to do with option handling messing up auto-vectorization or something

https://docs.rs/arrow-array/54.3.1/src/arrow_array/array/boolean_array.rs.html#160
https://docs.rs/arrow-arith/54.3.1/src/arrow_arith/aggregate.rs.html#751

@alamb
Contributor

alamb commented Apr 7, 2025

Security audit failure is not related to this PR

@acking-you
Contributor Author

I don't fully understand your performance results given that the two functions seem very similar -- maybe it has to do with option handling messing up auto-vectorization or something

Yes, I was also quite surprised. The only difference in the implementation of these two functions might be whether or not find was used.

Contributor

@alamb alamb left a comment

Thank you so much @acking-you

I think the code in this PR is ready to go -- can you please adjust the benchmark code and then I'll merge this one in and file a follow on PR to explore other potential improvements?

Thanks again!

cases
}

fn benchmark_boolean_ops(c: &mut Criterion) {
Contributor

Can you please update these benchmarks to measure the performance of boolean expression evaluation? As of now, these are benchmarks for bool_and and bool_or.

In other words, make micro benchmarks for expressions like A AND B AND C AND D, similar to what you added to the clickbench extended suite, that show the performance of expressions where the short-circuit code improves performance.

We can then use those benchmarks to try out different ways of implementing the short-circuiting logic.


Contributor Author

In other words, make micro benchmarks for expressions like A AND B AND C AND D, similar to what you added to the clickbench extended suite, that show the performance of expressions where the short-circuit code improves performance.

This is a good idea.
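
As a rough sketch of what such an expression-level micro benchmark would time (my own illustration; the setup is what I would expect, not necessarily the benchmark code that ended up in the PR), the hot loop boils down to BinaryExpr::evaluate on a prepared RecordBatch:

```rust
use std::sync::Arc;

use arrow::array::{Array, ArrayRef, BooleanArray};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;
use datafusion_expr::Operator;
use datafusion_physical_expr::expressions::{BinaryExpr, Column};
use datafusion_physical_expr::PhysicalExpr;

fn main() -> datafusion_common::Result<()> {
    // Column `a` is all false, so `a AND b` can short-circuit without doing
    // any real work for `b`.
    let schema = Arc::new(Schema::new(vec![
        Field::new("a", DataType::Boolean, false),
        Field::new("b", DataType::Boolean, false),
    ]));
    let a = BooleanArray::from(vec![false; 8192]);
    let b = BooleanArray::from(vec![true; 8192]);
    let columns: Vec<ArrayRef> = vec![Arc::new(a), Arc::new(b)];
    let batch = RecordBatch::try_new(schema, columns)?;

    // `a AND b` -- the expression a criterion benchmark would evaluate in its hot loop.
    let expr = BinaryExpr::new(
        Arc::new(Column::new("a", 0)),
        Operator::And,
        Arc::new(Column::new("b", 1)),
    );
    let result = expr.evaluate(&batch)?;
    println!("{} rows evaluated", result.into_array(batch.num_rows())?.len());
    Ok(())
}
```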

@acking-you
Contributor Author

Security audit failure is not related to this PR

How can this be resolved?

@Dandandan
Contributor

Dandandan commented Apr 7, 2025

The related results are as follows. I’m not sure what you think about these results — should we continue using true_count/false_count, or look for a better solution? @alamb @Dandandan

Thank you for your diligence @acking-you

I think we should merge this PR in (with the count bits and the benchmarks) and file a follow on ticket to potentially improve the performance in some other way.

I don't fully understand your performance results given that the two functions seem very similar -- maybe it has to do with option handling messing up auto-vectorization or something

https://docs.rs/arrow-array/54.3.1/src/arrow_array/array/boolean_array.rs.html#160 https://docs.rs/arrow-arith/54.3.1/src/arrow_arith/aggregate.rs.html#751

I think the likely reason it gets slower in some cases is the short-circuiting whenever a true is observed (having a branch for each item to check).

For this case it might be interesting to compare it with array.values().bit_chunks().iter_padded().fold(true, |acc, i: u64| i != 0 && acc) (e.g. fully evaluate the entire array) and see if it is any faster than the true_count implementation.

For max_boolean it might be interesting to see if we can make it faster by doing both: early exiting and generating more efficient code. Maybe it could evaluate per n values instead of per value, to benefit from faster code / branch prediction while still exiting early enough.
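
The branch-free full-scan variant mentioned above, sketched out (my own reading of the suggestion; it only looks at the values buffer and ignores the validity bitmap, so it compares the scan itself rather than being a drop-in replacement for the null-aware check):

```rust
use arrow::array::BooleanArray;

/// Full scan with no early exit: fold over u64 chunks of the values buffer,
/// keeping the loop branch-free and easy for the compiler to vectorize.
fn no_set_bits(array: &BooleanArray) -> bool {
    array
        .values()
        .bit_chunks()
        .iter_padded()
        .fold(true, |acc, chunk| acc & (chunk == 0))
}
```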

@alamb
Contributor

alamb commented Apr 7, 2025

Security audit failure is not related to this PR

How can this be resolved?

There are some ideas there on how to resolve the issue.

To be clear, I don't think this is blocking this PR from merging.

The only thing I think is needed for this PR is to either:

  1. remove the benchmark
  2. update the benchmark to actually test BinaryExpr evaluation time

@acking-you
Contributor Author

The only thing I think is needed for this PR is to either:

  1. remove the benchmark
  2. update the benchmark to actually test BinaryExpr evaluation time

done

@acking-you
Contributor Author

I think the likely reason it gets slower in some cases is the short-circuiting whenever a true is observed (having a branch for each item to check).

For this case it might be interesting to compare it with array.values().bit_chunks().iter_padded().fold(true, |acc, i: u64| i != 0 && acc) (e.g. fully evaluate the entire array) and see if it is any faster than the true_count implementation.

For max_boolean it might be interesting to see if we can make it faster by doing both: early exiting and generating more efficient code. Maybe it could evaluate per n values instead of per value, to benefit from faster code / branch prediction while still exiting early enough.

I'm certain that the early-exit semantics of the if statement (returning as soon as the condition is met) caused the compiler to abandon SIMD optimization. To test this, I conducted the following experiment:
I wrote two simple functions to simulate the two scenarios and then examined the optimized assembly code the compiler generated, to compare the differences.
You can view the assembly code generated for this experiment here: https://godbolt.org/z/63bEvYP4d.

Records

Rust code as follows:

/// Scene 1: Counting all set bits (the number of 1s)
#[no_mangle]
#[inline(never)]
pub fn count_ones(ba: &[u8]) -> usize {
    ba.iter().map(|x| x.count_ones() as usize).sum()
}

/// Scene 2: Check if a bit is set
#[no_mangle]
#[inline(never)]
pub fn find_any(ba: &[u8]) -> bool {
    ba.iter().any(|&x| x != 0)
}

Assembly code as follows:

count_ones:
        test    rsi, rsi
        je      .LBB0_1
        cmp     rsi, 4
        jae     .LBB0_5
        xor     ecx, ecx
        xor     eax, eax
        jmp     .LBB0_8
.LBB0_1:
        xor     eax, eax
        ret
.LBB0_5:
        mov     rcx, rsi
        and     rcx, -4
        pxor    xmm0, xmm0
        xor     eax, eax
        movdqa  xmm2, xmmword ptr [rip + .LCPI0_0]
        movdqa  xmm3, xmmword ptr [rip + .LCPI0_1]
        movdqa  xmm5, xmmword ptr [rip + .LCPI0_2]
        pxor    xmm4, xmm4
        pxor    xmm1, xmm1
.LBB0_6:
        movzx   edx, word ptr [rdi + rax]
        movd    xmm7, edx
        movzx   edx, word ptr [rdi + rax + 2]
        movd    xmm6, edx
        punpcklbw       xmm7, xmm0
        punpcklwd       xmm7, xmm0
        punpckldq       xmm7, xmm0
        movdqa  xmm8, xmm7
        psrlw   xmm8, 1
        pand    xmm8, xmm2
        psubb   xmm7, xmm8
        movdqa  xmm8, xmm7
        pand    xmm8, xmm3
        psrlw   xmm7, 2
        pand    xmm7, xmm3
        paddb   xmm7, xmm8
        movdqa  xmm8, xmm7
        psrlw   xmm8, 4
        paddb   xmm8, xmm7
        pand    xmm8, xmm5
        psadbw  xmm8, xmm0
        paddq   xmm4, xmm8
        punpcklbw       xmm6, xmm0
        punpcklwd       xmm6, xmm0
        punpckldq       xmm6, xmm0
        movdqa  xmm7, xmm6
        psrlw   xmm7, 1
        pand    xmm7, xmm2
        psubb   xmm6, xmm7
        movdqa  xmm7, xmm6
        pand    xmm7, xmm3
        psrlw   xmm6, 2
        pand    xmm6, xmm3
        paddb   xmm6, xmm7
        movdqa  xmm7, xmm6
        psrlw   xmm7, 4
        paddb   xmm7, xmm6
        pand    xmm7, xmm5
        psadbw  xmm7, xmm0
        paddq   xmm1, xmm7
        add     rax, 4
        cmp     rcx, rax
        jne     .LBB0_6
        paddq   xmm1, xmm4
        pshufd  xmm0, xmm1, 238
        paddq   xmm0, xmm1
        movq    rax, xmm0
        cmp     rcx, rsi
        je      .LBB0_2
.LBB0_8:
        movzx   edx, byte ptr [rdi + rcx]
        imul    edx, edx, 134480385
        shr     edx, 3
        and     edx, 286331153
        imul    edx, edx, 286331153
        shr     edx, 28
        add     rax, rdx
        inc     rcx
        cmp     rsi, rcx
        jne     .LBB0_8
.LBB0_2:
        ret

find_any:
        xor     ecx, ecx
.LBB1_1:
        mov     rax, rcx
        cmp     rsi, rcx
        je      .LBB1_3
        lea     rcx, [rax + 1]
        cmp     byte ptr [rdi + rax], 0
        je      .LBB1_1
.LBB1_3:
        cmp     rsi, rax
        setne   al
        ret

Conclusion

  1. Vectorization Optimization Differences:

    • The assembly code for count_ones utilizes SIMD instructions (such as movdqa, psrlw, and paddb), achieving automatic vectorization. The compiler packs four bytes (32 bits) of data into an XMM register for parallel processing, efficiently calculating the number of set bits through bitmasking and accumulation operations.
    • This does not change the asymptotic complexity, but it processes four bytes per iteration, significantly decreasing the number of loop iterations.
  2. Early Exit Hinders Optimization:

    • The find_any function must return as soon as it sees the first non-zero byte, which prevents the loop from being unrolled into fixed-step SIMD operations. The assembly code shows that it checks each byte individually (e.g., cmp byte ptr [rdi + rax], 0), so it cannot leverage parallel comparisons.
  3. Branch Prediction Penalty:

    • The loop in find_any contains conditional jumps (e.g., je .LBB1_1), posing a risk of branch prediction failure in every iteration. In contrast, the vectorized code of count_ones performs branch-free linear operations, which are better suited for modern CPU pipelines.
  4. Compiler Optimization Limitations:

    • The early exit semantics of any() cause the compiler to conservatively avoid vectorization, opting for the safest byte-by-byte checks. Conversely, the semantics of sum() allow the compiler to aggressively optimize.

@alamb
Contributor

alamb commented Apr 8, 2025

Thanks again for the wonderful work @acking-you --
I ran the benchmarks locally and indeed they show a non-trivial amount of time being spent in count_ones.

[Screenshot: profiler output showing time spent in count_ones]

I also confirmed that without the short circuit optimization the benchmarks get much much slower

Gnuplot not found, using plotters backend
bench_all_false_and     time:   [503.28 µs 505.29 µs 507.36 µs]
                        change: [+1711392% +1718962% +1727169%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

bench_all_true_or       time:   [504.57 µs 506.27 µs 508.10 µs]
                        change: [+1704732% +1713204% +1721665%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) high mild

Let's merge this PR in and I will file a follow on ticket to track potentially improving performance even more

@alamb alamb merged commit 26adc8f into apache:main Apr 8, 2025
28 of 29 checks passed
@alamb
Contributor

alamb commented Apr 8, 2025

Contributor

@alamb alamb left a comment

Thanks again everyone -- this is a good improvement. Also really nice analysis work by @acking-you

/// # test bool_or/bool_and
/// cargo bench --bench binary_op -- boolean_ops
/// ```
fn benchmark_boolean_ops(c: &mut Criterion) {
Contributor

since these are basically arrow benchmarks, I made a small follow on PR to propose removing them for now:

@alamb
Contributor

alamb commented Apr 8, 2025

the extended tests appear to be failing after this PR:

nirnayroy pushed a commit to nirnayroy/datafusion that referenced this pull request May 2, 2025
* [draft] add shot circuit in BinaryExpr

* refactor: add check_short_circuit function

* refactor: change if condition to match

* feat: Add support for --mem-pool-type and --memory-limit options to multiple benchmarks (apache#14642)

* Add support --mem-pool-type and --memory-limit options for all benchmarks

* Add --sort-spill-reservation-bytes option

* Chore/Add additional FFI unit tests (apache#14802)

* Add unit tests to FFI_ExecutionPlan

* Add unit tests for FFI table source

* Add round trip tests for volatility

* Add unit tests for FFI insert op

* Simplify string generation in unit test

Co-authored-by: Andrew Lamb <[email protected]>

* Fix drop of borrowed value

---------

Co-authored-by: Andrew Lamb <[email protected]>

* Improve feature flag CI coverage `datafusion` and `datafusion-functions` (apache#15203)

* add extend sql & docs

* feat: Add support for --mem-pool-type and --memory-limit options to multiple benchmarks (apache#14642)

* Add support --mem-pool-type and --memory-limit options for all benchmarks

* Add --sort-spill-reservation-bytes option

* Chore/Add additional FFI unit tests (apache#14802)

* Add unit tests to FFI_ExecutionPlan

* Add unit tests for FFI table source

* Add round trip tests for volatility

* Add unit tests for FFI insert op

* Simplify string generation in unit test

Co-authored-by: Andrew Lamb <[email protected]>

* Fix drop of borrowed value

---------

Co-authored-by: Andrew Lamb <[email protected]>

* Improve feature flag CI coverage `datafusion` and `datafusion-functions` (apache#15203)

* fix: incorrect false judgment

* add test

* separate q6 to new PR

* feat: Add support for --mem-pool-type and --memory-limit options to multiple benchmarks (apache#14642)

* Add support --mem-pool-type and --memory-limit options for all benchmarks

* Add --sort-spill-reservation-bytes option

* Chore/Add additional FFI unit tests (apache#14802)

* Add unit tests to FFI_ExecutionPlan

* Add unit tests for FFI table source

* Add round trip tests for volatility

* Add unit tests for FFI insert op

* Simplify string generation in unit test

Co-authored-by: Andrew Lamb <[email protected]>

* Fix drop of borrowed value

---------

Co-authored-by: Andrew Lamb <[email protected]>

* Improve feature flag CI coverage `datafusion` and `datafusion-functions` (apache#15203)

* feat: Add support for --mem-pool-type and --memory-limit options to multiple benchmarks (apache#14642)

* Add support --mem-pool-type and --memory-limit options for all benchmarks

* Add --sort-spill-reservation-bytes option

* Chore/Add additional FFI unit tests (apache#14802)

* Add unit tests to FFI_ExecutionPlan

* Add unit tests for FFI table source

* Add round trip tests for volatility

* Add unit tests for FFI insert op

* Simplify string generation in unit test

Co-authored-by: Andrew Lamb <[email protected]>

* Fix drop of borrowed value

---------

Co-authored-by: Andrew Lamb <[email protected]>

* Improve feature flag CI coverage `datafusion` and `datafusion-functions` (apache#15203)

* add benchmark for boolean_op

* fix cargo doc

* add binary_op bench

* Better comments

---------

Co-authored-by: Kristin Cowalcijk <[email protected]>
Co-authored-by: Tim Saucer <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
Labels
performance (Make DataFusion faster), physical-expr (Changes to the physical-expr crates)

Successfully merging this pull request may close these issues.

BinaryExpr evaluate lacks optimization for Or and And scenarios
7 participants