Skip to content

Commit 4818966

Browse files
acking-youkosiew
andauthored
Apply pre-selection and computation skipping to short-circuit optimization (#15694)
* Enhance short-circuit evaluation for binary expressions - Delay evaluation of the right-hand side (RHS) unless necessary. - Optimize short-circuiting for `Operator::And` and `Operator::Or` by checking LHS alone first. - Introduce `get_short_circuit_result` function to determine short-circuit conditions based on LHS and RHS. - Update tests to cover various short-circuit scenarios for both `AND` and `OR` operations. * refactor: rename test_check_short_circuit to test_get_short_circuit_result and update assertions - Renamed the test function for clarity. - Updated assertions to use get_short_circuit_result instead of check_short_circuit. - Added additional test cases for AND and OR operations with expected results. * fix: enhance short-circuit evaluation logic in get_short_circuit_result function for null - Updated AND and OR short-circuit conditions to only trigger when all values are either false or true, respectively, and there are no nulls in the array. - Adjusted test case to reflect the change in expected output. * feat: add debug logging for binary expression evaluation and short-circuit checks * fix: improve short-circuit evaluation logic in BinaryExpr to ensure RHS is only evaluated when necessary * fix: restrict short-circuit evaluation to logical operators in get_short_circuit_result function * add more println!("==> "); * fix: remove duplicate data type checks for left and right operands in BinaryExpr evaluation * feat: add debug prints for dictionary values and keys in binary expression tests * Tests pass * fix: remove redundant short-circuit evaluation check in BinaryExpr and enhance documentation for get_short_circuit_result * refactor: remove unnecessary debug prints and streamline short-circuit evaluation in BinaryExpr * test: enhance short-circuit evaluation tests for nullable and scalar values in BinaryExpr * add benchmark * refactor: improve short-circuit logic in BinaryExpr for logical operators - Renamed `arg` to `lhs` for clarity in the `get_short_circuit_result` function. - Updated handling of Boolean data types to return `None` for null values. - Simplified short-circuit checks for AND/OR operations by consolidating logic. - Enhanced readability and maintainability of the code by restructuring match statements. * refactor: enhance short-circuit evaluation strategy in BinaryExpr to optimize logical operations * Revert "refactor: enhance short-circuit evaluation strategy in BinaryExpr to optimize logical operations" This reverts commit a62df47. * bench: add benchmark for OR operation with all false values in short-circuit evaluation * refactor: add ShortCircuitStrategy enum to optimize short-circuit evaluation in BinaryExpr - Replaced the lazy evaluation of the right-hand side (RHS) with immediate evaluation based on short-circuiting logic. - Introduced a new function `check_short_circuit` to determine if short-circuiting can be applied for logical operators. - Updated the logic to return early for `Operator::And` and `Operator::Or` based on the evaluation of the left-hand side (LHS) and the conditions of the RHS. - Improved clarity and efficiency of the short-circuit evaluation process by eliminating unnecessary evaluations. * refactor: simplify short-circuit evaluation logic in check_short_circuit function * datafusion_expr::lit as expr_lit * refactor: optimize short-circuit evaluation in check_short_circuit function - Simplified logic for AND/OR operations by prioritizing false/true counts to enhance performance. - Updated documentation to reflect changes in array handling techniques. * refactor: add count_boolean_values helper function and optimize check_short_circuit logic - Introduced a new helper function `count_boolean_values` to count true and false values in a BooleanArray, improving readability and performance. - Updated `check_short_circuit` to utilize the new helper function for counting, reducing redundant operations and enhancing clarity in the evaluation logic for AND/OR operations. - Adjusted comments for better understanding of the short-circuiting conditions based on the new counting mechanism. * Revert "refactor: add count_boolean_values helper function and optimize check_short_circuit logic" This reverts commit e2b9f77. * optimise evaluate * optimise evaluate 2 * refactor op:AND, lhs all false op:OR, lhs all true to be faster * fix clippy warning * refactor: optimize short-circuit evaluation logic in check_short_circuit function * fix clippy warning * add pre selection * add some comments * [WIP] fix pre-selection result * fix: Error in calculating the ratio * fix: Correct typo in pre-selection threshold constant and improve pre-selection scatter function documentation * fix doctest error * fix cargo doc * fix cargo doc * test: Add unit tests for pre_selection_scatter function --------- Co-authored-by: Siew Kam Onn <[email protected]>
1 parent a0b0221 commit 4818966

File tree

2 files changed

+458
-68
lines changed

2 files changed

+458
-68
lines changed

datafusion/physical-expr/benches/binary_op.rs

Lines changed: 14 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -126,14 +126,25 @@ fn generate_boolean_cases<const TEST_ALL_FALSE: bool>(
126126
));
127127
}
128128

129+
// Scenario 7: Test all true or all false in AND/OR
130+
// This situation won't cause a short circuit, but it can skip the bool calculation
131+
if TEST_ALL_FALSE {
132+
let all_true = vec![true; len];
133+
cases.push(("all_true_in_and".to_string(), BooleanArray::from(all_true)));
134+
} else {
135+
let all_false = vec![false; len];
136+
cases.push(("all_false_in_or".to_string(), BooleanArray::from(all_false)));
137+
}
138+
129139
cases
130140
}
131141

132142
/// Benchmarks AND/OR operator short-circuiting by evaluating complex regex conditions.
133143
///
134-
/// Creates 6 test scenarios per operator:
144+
/// Creates 7 test scenarios per operator:
135145
/// 1. All values enable short-circuit (all_true/all_false)
136146
/// 2. 2-6 Single true/false value at different positions to measure early exit
147+
/// 3. Test all true or all false in AND/OR
137148
///
138149
/// You can run this benchmark with:
139150
/// ```sh
@@ -203,15 +214,15 @@ fn benchmark_binary_op_in_short_circuit(c: &mut Criterion) {
203214

204215
// Each scenario when the test operator is `and`
205216
{
206-
for (name, batch) in batches_and {
217+
for (name, batch) in batches_and.into_iter() {
207218
c.bench_function(&format!("short_circuit/and/{}", name), |b| {
208219
b.iter(|| expr_and.evaluate(black_box(&batch)).unwrap())
209220
});
210221
}
211222
}
212223
// Each scenario when the test operator is `or`
213224
{
214-
for (name, batch) in batches_or {
225+
for (name, batch) in batches_or.into_iter() {
215226
c.bench_function(&format!("short_circuit/or/{}", name), |b| {
216227
b.iter(|| expr_or.evaluate(black_box(&batch)).unwrap())
217228
});

0 commit comments

Comments
 (0)