
BinaryExpr evaluate lacks optimization for Or and And scenarios #11212

Closed

acking-you opened this issue Jul 2, 2024 · 20 comments · Fixed by #15462
Labels
bug Something isn't working performance Make DataFusion faster

Comments

@acking-you
Contributor

Describe the bug

As shown in the code link, BinaryExpr always evaluates both the left and right operands, without considering the possibility that left might already be false and op is And, or that left might be true and op is Or.

For example, in this complex filtering statement, unnecessary calculations will be performed:

SELECT 
    customer_id,
    first_name,
    last_name,
    email,
    total_purchases
FROM 
    customers
WHERE 
    1 = 0  -- This condition is always false
    AND (
        (total_purchases > 1000 AND EXTRACT(YEAR FROM registration_date) > 2018)
        OR (LOWER(email) LIKE '%@gmail.com' AND loyalty_points > 500)
        OR (
            LENGTH(first_name) + LENGTH(last_name) > 15
            AND SUBSTR(last_name, 1, 1) IN ('M', 'N', 'O', 'P')
            AND DATE_PART('month', last_purchase_date) BETWEEN 6 AND 8
        )
    )
    AND (
        customer_id % 3 = 0
        OR EXISTS (
            SELECT 1 
            FROM orders 
            WHERE orders.customer_id = customers.customer_id
            AND orders.order_status = 'Completed'
            AND orders.order_total > 250
        )
    );

To Reproduce

When evaluating AND or OR expressions

Expected behavior

When an And expression is evaluated and the left side is already all false, computation of the right side should stop.
When an Or expression is evaluated and the left side is already all true, computation of the right side should stop.
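
Sketched below are the two decision conditions (hypothetical helper functions, not DataFusion's actual code; under SQL three-valued logic, false AND NULL is false and true OR NULL is true, so a null-free, all-false left side decides an AND and a null-free, all-true left side decides an OR):

use arrow::array::{Array, BooleanArray};

/// Hypothetical guard: `lhs AND rhs` is already decided (all false) when
/// `lhs` is null-free and contains no true values, so `rhs` can be skipped.
fn lhs_decides_and(lhs: &BooleanArray) -> bool {
    lhs.null_count() == 0 && lhs.true_count() == 0
}

/// Hypothetical guard: `lhs OR rhs` is already decided (all true) when
/// `lhs` is null-free and contains no false values, so `rhs` can be skipped.
fn lhs_decides_or(lhs: &BooleanArray) -> bool {
    lhs.null_count() == 0 && lhs.false_count() == 0
}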

Additional context

No

@acking-you acking-you added the bug Something isn't working label Jul 2, 2024
@Dandandan
Contributor

Could you give an example where this would make a difference? We have an optimization pass SimplifyExpressions which already takes care of true and false constants in AND and OR during planning.

@alamb
Contributor

alamb commented Jul 2, 2024

possibility that left might already be false and op is And, or that left might be true and op is Or.

In general I think there is a tradeoff between doing short-circuiting (what I think this ticket is describing) and having to check whether each row should be evaluated

So for a predicate like (a = 5) AND (b = 10) it is very likely faster to evaluate (a = 5) and (b = 10) with very tight loops / SIMD instructions and then && the resulting booleans together than it would be to evaluate (a = 5) and then have a loop that checks the result for each row when evaluating b = 10.

However, for an example like (a = 5) AND (very_expensive_udf(b) = 10) it may well be faster to do short-circuiting execution

Note DataFusion already does short-circuiting when evaluating CASE:

let return_type = self.data_type(&batch.schema())?;
let expr = self.expr.as_ref().unwrap();
let base_value = expr.evaluate(batch)?;
let base_value = base_value.into_array(batch.num_rows())?;
let base_nulls = is_null(base_value.as_ref())?;

// start with nulls as default output
let mut current_value = new_null_array(&return_type, batch.num_rows());
// We only consider non-null values while comparing with whens
let mut remainder = not(&base_nulls)?;
for i in 0..self.when_then_expr.len() {
    let when_value = self.when_then_expr[i]
        .0
        .evaluate_selection(batch, &remainder)?;
    let when_value = when_value.into_array(batch.num_rows())?;
    // build boolean array representing which rows match the "when" value
    let when_match = eq(&when_value, &base_value)?;
    // Treat nulls as false
    let when_match = match when_match.null_count() {
        0 => Cow::Borrowed(&when_match),
        _ => Cow::Owned(prep_null_mask_filter(&when_match)),
    };
    // Make sure we only consider rows that have not been matched yet
    let when_match = and(&when_match, &remainder)?;
    // When no rows available for when clause, skip then clause
    if when_match.true_count() == 0 {
        continue;
    }

    let then_value = self.when_then_expr[i]
        .1
        .evaluate_selection(batch, &when_match)?;

    current_value = match then_value {
        ColumnarValue::Scalar(ScalarValue::Null) => {
            nullif(current_value.as_ref(), &when_match)?
        }
        ColumnarValue::Scalar(then_value) => {
            zip(&when_match, &then_value.to_scalar()?, &current_value)?
        }
        ColumnarValue::Array(then_value) => {
            zip(&when_match, &then_value, &current_value)?
        }
    };

    remainder = and_not(&remainder, &when_match)?;
}

if let Some(e) = &self.else_expr {
    // keep `else_expr`'s data type and return type consistent
    let expr = try_cast(e.clone(), &batch.schema(), return_type.clone())
        .unwrap_or_else(|_| e.clone());
    // null and unmatched tuples should be assigned else value
    remainder = or(&base_nulls, &remainder)?;
    let else_ = expr
        .evaluate_selection(batch, &remainder)?
        .into_array(batch.num_rows())?;
    current_value = zip(&remainder, &else_, &current_value)?;
}

Ok(ColumnarValue::Array(current_value))

So TL;DR: I think this would be an interesting optimization to explore, and as @Dandandan notes, finding some benchmarks where it matters is likely a good first step.

@acking-you
Contributor Author

Could you give an example where this would make a difference? We have an optimization pass SimplifyExpressions which already takes care of true and false constants in AND and OR during planning.

My apologies, my example might not have been the best illustration because cases with constants are indeed optimized by the rule. What I'm trying to get at is that there's currently unnecessary computation happening within BinaryExpr (perhaps in many cases, just calculating the left side would be sufficient to determine the result).

@jayzhan211
Contributor

What I'm trying to get at is that there's currently unnecessary computation happening within BinaryExpr (perhaps in many cases, just calculating the left side would be sufficient to determine the result).

I would be surprised if there were any; could you provide an example that shows the optimizer failing to do what you expect?

Here is the optimizer rule that skips the right-hand side if the left is false:

// false AND A --> false (even if A is null)
Expr::BinaryExpr(BinaryExpr {
    left,
    op: And,
    right: _,
}) if is_false(&left) => Transformed::yes(*left),

@acking-you
Contributor Author

What I'm trying to get at is that there's currently unnecessary computation happening within BinaryExpr (perhaps in many cases, just calculating the left side would be sufficient to determine the result).

I would be surprised if there were any; could you provide an example that shows the optimizer failing to do what you expect?

Here is the optimizer rule that skips the right-hand side if the left is false:

// false AND A --> false (even if A is null)
Expr::BinaryExpr(BinaryExpr {
    left,
    op: And,
    right: _,
}) if is_false(&left) => Transformed::yes(*left),

What I'm trying to describe is probably something like this.

SELECT 
    customer_id,
    first_name,
    last_name,
    email,
    total_purchases
FROM 
    customers
WHERE 
    some_udf(first_name)  -- if this condition is mostly false at runtime.
    AND (
        (total_purchases > 1000 AND EXTRACT(YEAR FROM registration_date) > 2018)
        OR (LOWER(email) LIKE '%@gmail.com' AND loyalty_points > 500)
        OR (
            LENGTH(first_name) + LENGTH(last_name) > 15
            AND SUBSTR(last_name, 1, 1) IN ('M', 'N', 'O', 'P')
            AND DATE_PART('month', last_purchase_date) BETWEEN 6 AND 8
        )
    )
    AND (
        customer_id % 3 = 0
        OR EXISTS (
            SELECT 1 
            FROM orders 
            WHERE orders.customer_id = customers.customer_id
            AND orders.order_status = 'Completed'
            AND orders.order_total > 250
        )
    );

@acking-you
Contributor Author

acking-you commented Jul 3, 2024

possibility that left might already be false and op is And, or that left might be true and op is Or.

In general I think there is a tradeoff between doing short-circuiting (what I think this ticket is describing) and having to check whether each row should be evaluated

So for a predicate like (a = 5) AND (b = 10) it is very likely faster to evaluate (a = 5) and (b = 10) with very tight loops / SIMD instructions and then && the resulting booleans together than it would be to evaluate (a = 5) and then have a loop that checks the result for each row when evaluating b = 10.

However, for an example like (a = 5) AND (very_expensive_udf(b) = 10) it may well be faster to do short-circuiting execution

Note DataFusion already does short-circuiting when evaluating CASE.

So TL;DR: I think this would be an interesting optimization to explore, and as @Dandandan notes, finding some benchmarks where it matters is likely a good first step.

Adding short-circuiting after the left calculation alone resulted in performance improvements in my local TPCH tests.
commit link

Here are the test results for my local TPCH run (on my M3 Pro chip with 36GB of RAM):

Comparing main and add_short_circuit
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     main ┃ add_short_circuit ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  89.54ms │           90.81ms │     no change │
│ QQuery 2     │  24.09ms │           20.56ms │ +1.17x faster │
│ QQuery 3     │  37.53ms │           38.00ms │     no change │
│ QQuery 4     │  23.99ms │           23.73ms │     no change │
│ QQuery 5     │  61.97ms │           54.52ms │ +1.14x faster │
│ QQuery 6     │  21.62ms │           20.98ms │     no change │
│ QQuery 7     │  75.46ms │           75.51ms │     no change │
│ QQuery 8     │  49.76ms │           50.09ms │     no change │
│ QQuery 9     │  68.53ms │           68.13ms │     no change │
│ QQuery 10    │  61.78ms │           60.36ms │     no change │
│ QQuery 11    │  13.16ms │           13.55ms │     no change │
│ QQuery 12    │  34.62ms │           34.35ms │     no change │
│ QQuery 13    │  41.50ms │           42.54ms │     no change │
│ QQuery 14    │  32.07ms │           30.29ms │ +1.06x faster │
│ QQuery 15    │  43.47ms │           43.82ms │     no change │
│ QQuery 16    │  18.59ms │           16.59ms │ +1.12x faster │
│ QQuery 17    │  97.63ms │           95.67ms │     no change │
│ QQuery 18    │ 115.07ms │          111.64ms │     no change │
│ QQuery 19    │  53.83ms │           57.37ms │  1.07x slower │
│ QQuery 20    │  45.42ms │           44.09ms │     no change │
│ QQuery 21    │  82.97ms │           82.69ms │     no change │
│ QQuery 22    │  16.46ms │           13.59ms │ +1.21x faster │
└──────────────┴──────────┴───────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (main)                │ 1109.06ms │
│ Total Time (add_short_circuit)   │ 1088.88ms │
│ Average Time (main)              │   50.41ms │
│ Average Time (add_short_circuit) │   49.49ms │
│ Queries Faster                   │         5 │
│ Queries Slower                   │         1 │
│ Queries with No Change           │        16 │
└──────────────────────────────────┴───────────┘

@Dandandan
Contributor

Thanks for providing some benchmarks / examples @acking-you!

I think the missing optimizations seem to be:

  • Checking for array.true_count() to be all zero / all true. I think this is an optimization that might be better to implement in the arrow-rs and/or (and_kleene, or_kleene) kernels as a special case instead of in DataFusion.
  • UDFs that are Immutable/non-volatile are executed during optimization by the constant evaluator, so the extra optimization will only apply to volatile UDFs.
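
A sketch of what the kernel fast case from the first bullet might look like (a hypothetical wrapper around the existing and_kleene kernel, not actual arrow-rs code; and_kleene_fast is a stand-in name):

use arrow::array::{Array, BooleanArray};
use arrow::compute::and_kleene;
use arrow::error::ArrowError;

/// Hypothetical Kleene-AND wrapper with a fast case: under Kleene logic,
/// `false AND x` is false even when `x` is null, so a null-free, all-false
/// input already equals the final result and the bitwise AND can be skipped.
fn and_kleene_fast(
    lhs: &BooleanArray,
    rhs: &BooleanArray,
) -> Result<BooleanArray, ArrowError> {
    for side in [lhs, rhs] {
        if side.null_count() == 0 && side.true_count() == 0 {
            // Arrow arrays are cheap to clone: the underlying buffers are shared.
            return Ok(side.clone());
        }
    }
    and_kleene(lhs, rhs)
}

Note that true_count itself scans the whole bitmap, so this only wins when the avoided kernel work costs more than that scan (see the discussion further down).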

@acking-you
Contributor Author

Checking for array.true_count() to be all zero / all true. I think this is an optimization that might be better to implement in the arrow-rs and/or (and_kleene, or_kleene) kernels as a special case instead of in DataFusion.

I think the root cause is that left and right are calculated first, and then further calculations are performed based on op. The specific code is in the link below:
https://github.com/apache/datafusion/blob/main/datafusion/physical-expr/src/expressions/binary.rs#L260-L261

To make changes here, you'll need to modify the DataFusion code.
Possible Solutions: commit link

@jayzhan211
Contributor

jayzhan211 commented Jul 3, 2024

Thanks for providing some benchmarks / examples @acking-you!

I think the missing optimizations seem to be:

  • Checking for array.true_count() to be all zero / all true. I think this is an optimization that might be better to implement in the arrow-rs and/or (and_kleene, or_kleene) kernels as a special case instead of in DataFusion.
  • UDFs that are Immutable/non-volatile are executed during optimization by the constant evaluator, so the extra optimization will only apply to volatile UDFs.

I'm not sure if modifying the and/or Kleene logic helps, since it needs two boolean arrays; in this case we want to short-circuit when the left-hand side of an AND clause is a boolean array that is all false.

I think we might need to extend the expression simplification logic for the array case in Simplifier 🤔 It's not a good idea to compute ColumnarValue::Array in const_evaluator.

@Dandandan
Contributor

I'm not sure if modifying the and/or Kleene logic helps, since it needs two boolean arrays; in this case we want to short-circuit when the left-hand side of an AND clause is a boolean array that is all false.

In the kernel we can add a fast case as well, just as @acking-you has done in his code. The benefit is that it will apply anywhere that uses and / and_kleene / or / or_kleene, rather than in BinaryExpr alone.

@jayzhan211
Contributor

jayzhan211 commented Jul 3, 2024

I'm not sure if modifying the and/or Kleene logic helps, since it needs two boolean arrays; in this case we want to short-circuit when the left-hand side of an AND clause is a boolean array that is all false.

In the kernel we can add a fast case as well, just as @acking-you has done in his code. The benefit is that it will apply anywhere that uses and / and_kleene / or / or_kleene, rather than in BinaryExpr alone.

I agree that we can also optimize the Kleene kernels so we can return the result early.
But besides that, we also need to find a way to avoid the computation inside DataFusion.

The current code is like:

let left: BooleanArray = expr.evaluate()
let right: BooleanArray = expr.evaluate()
let result = and_kleene(left, right) // we can optimize this too!

But ideally we could avoid the computation of right:

let left: BooleanArray = expr.evaluate()
// left is always false
skip right evaluation // right evaluation is possibly expensive
skip and_kleene
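
Concretely, a minimal sketch of that ideal shape (hypothetical and heavily simplified; eval_logical and LogicalOp are stand-in names, the closures stand in for operand evaluation, and the real change would live inside BinaryExpr::evaluate):

use arrow::array::{Array, BooleanArray};
use arrow::compute::{and_kleene, or_kleene};
use arrow::error::ArrowError;

enum LogicalOp {
    And,
    Or,
}

/// Evaluate `lhs <op> rhs` lazily, skipping `rhs` (and the kernel call)
/// whenever `lhs` alone already determines the result.
fn eval_logical(
    op: LogicalOp,
    eval_lhs: impl FnOnce() -> BooleanArray,
    eval_rhs: impl FnOnce() -> BooleanArray, // possibly expensive
) -> Result<BooleanArray, ArrowError> {
    let lhs = eval_lhs();
    let decided = match op {
        // A null-free, all-false left side decides an AND.
        LogicalOp::And => lhs.null_count() == 0 && lhs.true_count() == 0,
        // A null-free, all-true left side decides an OR.
        LogicalOp::Or => lhs.null_count() == 0 && lhs.false_count() == 0,
    };
    if decided {
        return Ok(lhs); // eval_rhs is never called
    }
    let rhs = eval_rhs();
    match op {
        LogicalOp::And => and_kleene(&lhs, &rhs),
        LogicalOp::Or => or_kleene(&lhs, &rhs),
    }
}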

#11247 indeed does a good job, but it seems like specialized hack code; I'm not sure it is the best solution 🤔
I feel like this solution optimizes queries where most of the false values come from the left-hand side.
For queries like a AND b AND false, where the mostly-false predicate is on the right-hand side, the optimization here does not help; it is highly correlated with the benchmark we run, i.e. TPCH in your case.

@Dandandan
Contributor

Hm, I see where you're going: there is some more optimization possible if we add it to BinaryExpr as well.

@alamb
Contributor

alamb commented Jul 5, 2024

Hm, I see where you're going: there is some more optimization possible if we add it to BinaryExpr as well.

I believe @Dandandan filed #11262 to discuss this idea further

@alamb
Contributor

alamb commented Jul 5, 2024

I did some more looking into the idea of using true_count in boolean kernels in arrow to speed things up (e.g. special casing when BooleanArray::true_count and BooleanArray::row_count were the same to avoid actually computing the binary result).

However, I concluded that since BooleanArray::true_count has to check all the values of the array anyway, it might be nearly as fast to just do the binary operation (which is already quite fast) as it would be to count the true/false values:

https://docs.rs/arrow-array/52.0.0/src/arrow_array/array/boolean_array.rs.html#142

@acking-you
Contributor Author

acking-you commented Mar 26, 2025

@alamb I sincerely apologize for not revisiting this issue or pushing forward with that PR for such a long time. However, this optimization has led to significant performance improvements in one of our internal use cases: nearly 100 times faster. Recently, seeing the community working on runtime filters brought this issue to mind again. I'm truly grateful for the community's friendliness and active engagement.

Application Scenarios

Scenario 1

I attempted to reproduce our usage scenario using the hits dataset from ClickBench:

SELECT count(*)
FROM hits
WHERE 
    -- Core filtering criteria: In most cases, the data can be filtered down to 0 records.
    "UserID" > 7350907328996404000
    AND "URL" = 'http://smeshariki.ru/recipes/search/cuZ29vZ2xlLzcwODthZHdvcmRz&page/make=43;1334313116'
    
    -- Other filters
    AND "EventTime" BETWEEN 
        EXTRACT(epoch FROM '2014-03-23 00:00:00'::timestamp)::BIGINT 
        AND 
        EXTRACT(epoch FROM '2014-04-22 23:59:59'::timestamp)::BIGINT
    AND (("Age" BETWEEN 18 AND 35 AND "Sex" = 'female') 
         OR ("Income" > 50000 AND ("Interests" & 128) = 128))
    AND (
        ("IsMobile" = 1 AND "MobilePhoneModel" LIKE 'iPhone%') 
        OR ("IsMobile" = 0 AND "ResolutionWidth" >= 1920)
    )
    AND "IsDownload" = 0
    AND "IsNotBounce" = 1
    AND "DontCountHits" = 0
    AND split_part(split_part("URL", 'make=', 2), ';', 1) = '43'
    AND "UTMSource" = 'google_ads'
    AND "UTMCampaign" = 'spring_promo'
    AND "HTTPError" = CAST(200 AS SMALLINT)
    AND "JavascriptEnable" = 1
    AND "CookieEnable" = 1
    AND EXTRACT(hour FROM to_timestamp("EventTime")) BETWEEN 14 AND 18
    AND (
        regexp_match("UserAgent", 'Chrome/(9[0-9]|10[0-2])') IS NOT NULL
        OR regexp_match("UserAgent", 'Safari/[6-8]\.') IS NOT NULL
    )
    AND "SocialSourceNetworkID" IN (CAST(5 AS SMALLINT),CAST(12 AS SMALLINT))
    AND "SocialAction" = 'share'
    AND (
        ("ClientTimeZone" BETWEEN -5 AND 5 AND "BrowserCountry" <> 'RU')
        OR ("ClientTimeZone" NOT BETWEEN -5 AND 5 AND "BrowserCountry" = 'RU')
    )
    AND (
        strpos("OriginalURL", 'utm_id=') > 0 
        OR "OpenstatCampaignID" IS NOT NULL
    )
    AND "Robotness" < CAST(0.3 AS FLOAT)
    AND "IsArtifical" = 0
    AND "IsEvent" = 1
    AND "IsParameter" = 0;

Compared to the main branch without short-circuit optimization, this SQL achieves a nearly 500x improvement:

Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Query        ┃      main ┃ add_short_circuit ┃          Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ QQuery 0     │ 5024.39ms │           10.16ms │ +494.40x faster │
└──────────────┴───────────┴───────────────────┴─────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (main)                │ 5024.39ms │
│ Total Time (add_short_circuit)   │   10.16ms │
│ Average Time (main)              │ 5024.39ms │
│ Average Time (add_short_circuit) │   10.16ms │
│ Queries Faster                   │         1 │
│ Queries Slower                   │         0 │
│ Queries with No Change           │         0 │
└──────────────────────────────────┴───────────┘

Scenario 2

I also thought of a scenario where WHERE ... IN (subquery) could be used:

SELECT count(*) FROM hits WHERE "UserID" > 7350909328996404000 and  "URL" in (select "URL" from hits where "URL" like '%tt%');

However, this scenario does not benefit, because the statement is actually executed as a join:

LogicalPlan:
Projection: count(Int64(1)) AS count(*)
└─ Aggregate: groupBy=[[]], aggr=[[count(Int64(1))]]
   └─ Projection
      └─ LeftSemi Join: hits.URL = __correlated_sq_1.URL
         ├─ Projection: hits.URL
         │  └─ Filter: hits.UserID > Int64(9050909328996404000)
         │     └─ TableScan: hits
         │        (projection=[UserID, URL], filter=UserID > 9e18)
         └─ SubqueryAlias: __correlated_sq_1
            └─ Filter: hits.URL LIKE "%tt%"
               └─ TableScan: hits
                  (projection=[URL], filter=URL contains "tt")

PhysicalPlan:
ProjectionExec: count(*) 
└─ AggregateExec (Final)
   └─ CoalescePartitionsExec
      └─ AggregateExec (Partial)
         └─ ProjectionExec
            └─ CoalesceBatchesExec
               └─ HashJoinExec (LeftSemi)
                  ├─ RepartitionExec (Hash)
                  │  └─ CoalesceBatchesExec
                  │     └─ FilterExec (UserID > 9e18)
                  │        └─ DataSourceExec
                  │           (parquet/hits, 12 partitions)
                  │           [projection: UserID, URL]
                  │           [pruning: UserID_max > 9e18]
                  └─ RepartitionExec (Hash)
                     └─ CoalesceBatchesExec
                        └─ FilterExec (URL LIKE "%tt%")
                           └─ DataSourceExec
                              (parquet/hits, 12 partitions)
                              [projection: URL]
                              [predicate: URL contains "tt"]

This SQL query performs extremely slowly with or without this optimization. I believe this scenario should benefit from the Dynamic Join Predicates optimization, as mentioned by @alamb in #7955, referencing the DuckDB blog.

Why TPCH or ClickBench Did Not Improve

TPCH

I went through each of the test SQL statements in TPCH and found that most of them focus on joins rather than point queries. Additionally, the predicates are computationally cheap, so short-circuiting yields no significant benefit, as illustrated by the following SQL:

select
    sum(l_extendedprice * l_discount) as revenue
from
    lineitem
where
        l_shipdate >= date '1994-01-01'
  and l_shipdate < date '1995-01-01'
  and l_discount between 0.06 - 0.01 and 0.06 + 0.01
  and l_quantity < 24;

Each predicate is a simple comparison or addition on numbers, which leaves no profit for short-circuiting.

ClickBench

Not many queries in ClickBench have more than one predicate; those that do include Q21, Q22, Q36, Q37, Q38, Q39, Q40, Q41, and Q42:

SELECT "SearchPhrase", MIN("URL"), COUNT(*) AS c FROM hits WHERE "URL" LIKE '%google%' AND "SearchPhrase" <> '' GROUP BY "SearchPhrase" ORDER BY c DESC LIMIT 10;

SELECT "SearchPhrase", MIN("URL"), MIN("Title"), COUNT(*) AS c, COUNT(DISTINCT "UserID") FROM hits WHERE "Title" LIKE '%Google%' AND "URL" NOT LIKE '%.google.%' AND "SearchPhrase" <> '' GROUP BY "SearchPhrase" ORDER BY c DESC LIMIT 10;

SELECT "URL", COUNT(*) AS PageViews FROM hits WHERE "CounterID" = 62 AND "EventDate"::INT::DATE >= '2013-07-01' AND "EventDate"::INT::DATE <= '2013-07-31' AND "DontCountHits" = 0 AND "IsRefresh" = 0 AND "URL" <> '' GROUP BY "URL" ORDER BY PageViews DESC LIMIT 10;

SELECT "Title", COUNT(*) AS PageViews FROM hits WHERE "CounterID" = 62 AND "EventDate"::INT::DATE >= '2013-07-01' AND "EventDate"::INT::DATE <= '2013-07-31' AND "DontCountHits" = 0 AND "IsRefresh" = 0 AND "Title" <> '' GROUP BY "Title" ORDER BY PageViews DESC LIMIT 10;

SELECT "URL", COUNT(*) AS PageViews FROM hits WHERE "CounterID" = 62 AND "EventDate"::INT::DATE >= '2013-07-01' AND "EventDate"::INT::DATE <= '2013-07-31' AND "IsRefresh" = 0 AND "IsLink" <> 0 AND "IsDownload" = 0 GROUP BY "URL" ORDER BY PageViews DESC LIMIT 10 OFFSET 1000;

SELECT "TraficSourceID", "SearchEngineID", "AdvEngineID", CASE WHEN ("SearchEngineID" = 0 AND "AdvEngineID" = 0) THEN "Referer" ELSE '' END AS Src, "URL" AS Dst, COUNT(*) AS PageViews FROM hits WHERE "CounterID" = 62 AND "EventDate"::INT::DATE >= '2013-07-01' AND "EventDate"::INT::DATE <= '2013-07-31' AND "IsRefresh" = 0 GROUP BY "TraficSourceID", "SearchEngineID", "AdvEngineID", Src, Dst ORDER BY PageViews DESC LIMIT 10 OFFSET 1000;
SELECT "URLHash", "EventDate"::INT::DATE, COUNT(*) AS PageViews FROM hits WHERE "CounterID" = 62 AND "EventDate"::INT::DATE >= '2013-07-01' AND "EventDate"::INT::DATE <= '2013-07-31' AND "IsRefresh" = 0 AND "TraficSourceID" IN (-1, 6) AND "RefererHash" = 3594120000172545465 GROUP BY "URLHash", "EventDate"::INT::DATE ORDER BY PageViews DESC LIMIT 10 OFFSET 100;

SELECT "WindowClientWidth", "WindowClientHeight", COUNT(*) AS PageViews FROM hits WHERE "CounterID" = 62 AND "EventDate"::INT::DATE >= '2013-07-01' AND "EventDate"::INT::DATE <= '2013-07-31' AND "IsRefresh" = 0 AND "DontCountHits" = 0 AND "URLHash" = 2868770270353813622 GROUP BY "WindowClientWidth", "WindowClientHeight" ORDER BY PageViews DESC LIMIT 10 OFFSET 10000;

SELECT DATE_TRUNC('minute', to_timestamp_seconds("EventTime")) AS M, COUNT(*) AS PageViews FROM hits WHERE "CounterID" = 62 AND "EventDate"::INT::DATE >= '2013-07-14' AND "EventDate"::INT::DATE <= '2013-07-15' AND "IsRefresh" = 0 AND "DontCountHits" = 0 GROUP BY DATE_TRUNC('minute', to_timestamp_seconds("EventTime")) ORDER BY DATE_TRUNC('minute', M) LIMIT 10 OFFSET 1000;

From the above SQLs, we can observe that:

  1. Most of them are simple predicate conditions.
  2. The slightly more complex predicate conditions are followed by LIMIT 10, and their computational cost is still relatively low (especially compared to scenarios with heavy string matching).

Summary

Scenarios that benefit from short-circuit optimization:

  1. The initial filter can filter out most of the data.
  2. Subsequent filters involve time-consuming computations (e.g., operations that mostly involve matching strings longer than 1k characters, or predicates with a large number of string manipulations).

@jayzhan211
Contributor

If we can optimize the specialized query you mentioned without slowing down other queries, it would be nice to have it.

@alamb
Contributor

alamb commented Mar 26, 2025

Thank you for bringing this up again @acking-you

If we can optimize the specialized query you mentioned without slowing down other queries, it would be nice to have it.

I agree with @jayzhan211 -- if we can improve performance in your scenario without slowing down other queries this is a great thing to do

I attempted to reproduce our usage scenario using the hits dataset from ClickBench:

Nice! Thank you for this analysis

Perhaps as a first step you could add this query (or perhaps a derivative one) to the "extended" clickbench suite (that is DataFusion specific but covers some important cases): https://github.com/apache/datafusion/tree/main/benchmarks/queries/clickbench#extended-queries

The next step would be to make a PR with the optimization -- we can then use the query in the first step to verify performance is improved and also verify other queries are not slowed down.

@alamb alamb added the performance Make DataFusion faster label Mar 26, 2025
@acking-you
Contributor Author

Thank you for your guidance and advice @alamb .
I will try to work on these later today (I might be a bit busy right now).

@alamb
Contributor

alamb commented Mar 27, 2025

Thanks @acking-you -- I think the benchmark is the most important thing at first

@acking-you
Contributor Author

Thank you very much for your reply, @alamb. Here are some updates on this issue:

  1. I have added the extended SQL in this PR Add short circuit evaluation for AND and OR #15462; you can check the details there: SQL link
  2. The test results of the local extended SQL are as follows:
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      main ┃ add_short_circuit ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │  964.48ms │         1006.79ms │     no change │
│ QQuery 1     │  409.89ms │          419.52ms │     no change │
│ QQuery 2     │  838.25ms │          868.74ms │     no change │
│ QQuery 3     │  408.15ms │          396.88ms │     no change │
│ QQuery 4     │ 1029.80ms │          783.40ms │ +1.31x faster │
│ QQuery 5     │ 9429.76ms │         8835.06ms │ +1.07x faster │
│ QQuery 6     │ 4096.47ms │         1382.42ms │ +2.96x faster │
└──────────────┴───────────┴───────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main)                │ 17176.80ms │
│ Total Time (add_short_circuit)   │ 13692.80ms │
│ Average Time (main)              │  2453.83ms │
│ Average Time (add_short_circuit) │  1956.11ms │
│ Queries Faster                   │          3 │
│ Queries Slower                   │          0 │
│ Queries with No Change           │          4 │
└──────────────────────────────────┴────────────┘

My question: Is there a pipeline in DataFusion that can automatically perform benchmark comparisons?

  3. I'm currently encountering some issues mentioned in Add short circuit evaluation for AND and OR #15462 (comment) and can't pass the tests yet. I'm trying to fix them.
