-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(frontend): Supports cut OR condition and push down to storage #19812
Conversation
…nto li0k/batch_predicate_pushdown
…nto li0k/batch_predicate_pushdown
@@ -418,7 +418,7 @@ | |||
batch_plan: |- | |||
BatchExchange { order: [], dist: Single } | |||
└─BatchFilter { predicate: (((orders_count_by_user.user_id = 1:Int32) OR ((orders_count_by_user.user_id = 2:Int32) AND In(orders_count_by_user.date, 1111:Int32, 2222:Int32))) OR (orders_count_by_user.user_id <> 3:Int32)) } | |||
└─BatchScan { table: orders_count_by_user, columns: [orders_count_by_user.user_id, orders_count_by_user.date, orders_count_by_user.orders_count], distribution: UpstreamHashShard(orders_count_by_user.user_id, orders_count_by_user.date) } | |||
└─BatchScan { table: orders_count_by_user, columns: [orders_count_by_user.user_id, orders_count_by_user.date, orders_count_by_user.orders_count], scan_ranges: [orders_count_by_user.user_id = Int64(1), orders_count_by_user.user_id = Int64(2) AND orders_count_by_user.date = Int32(1111), orders_count_by_user.user_id = Int64(2) AND orders_count_by_user.date = Int32(2222)], distribution: UpstreamHashShard(orders_count_by_user.user_id, orders_count_by_user.date) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can not be pushed down with condition orders_count_by_user.user_id <> 3:Int32
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I didn't realize that full table scan range would be converted to empty vec, and I fixed it with a special judgment.
src/frontend/src/utils/condition.rs
Outdated
|
||
scan_ranges.extend(scan_ranges_chunk); | ||
} | ||
scan_ranges.sort_by(|a, b| a.eq_conds.len().cmp(&b.eq_conds.len())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also sort lower_bound for each range here?🤔
e.g. SELECT * FROM orders_count_by_user WHERE (user_id < 10) or (user_id > 30) or (user_id > 5 and user_id < 15);
got following plan:
batch_plan: |-
BatchExchange { order: [], dist: Single }
└─BatchFilter { predicate: (((orders_count_by_user.user_id < 10:Int32) OR (orders_count_by_user.user_id > 30:Int32)) OR ((orders_count_by_user.user_id > 5:Int32) AND (orders_count_by_user.user_id < 15:Int32))) }
└─BatchScan { table: orders_count_by_user, columns: [orders_count_by_user.user_id, orders_count_by_user.date, orders_count_by_user.orders_count], scan_ranges: [orders_count_by_user.user_id < Int64(10), orders_count_by_user.user_id > Int64(30), orders_count_by_user.user_id > Int64(5) AND orders_count_by_user.user_id < Int64(15)], distribution: UpstreamHashShard(orders_count_by_user.user_id, orders_count_by_user.date) }
I think what we expect is:
batch_plan: |-
BatchExchange { order: [], dist: Single }
└─BatchFilter { predicate: (((orders_count_by_user.user_id < 10:Int32) OR (orders_count_by_user.user_id > 30:Int32)) OR ((orders_count_by_user.user_id > 5:Int32) AND (orders_count_by_user.user_id < 15:Int32))) }
└─BatchScan { table: orders_count_by_user, columns: [orders_count_by_user.user_id, orders_count_by_user.date, orders_count_by_user.orders_count], scan_ranges: [orders_count_by_user.user_id < Int64(15), orders_count_by_user.user_id > Int64(30)], distribution: UpstreamHashShard(orders_count_by_user.user_id, orders_count_by_user.date) }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I try to reorder the predicate: SELECT * FROM orders_count_by_user WHERE (user_id < 10) or (user_id > 5 and user_id < 15) or (user_id > 30);
batch_plan: |-
BatchExchange { order: [], dist: Single }
└─BatchFilter { predicate: (((orders_count_by_user.user_id < 10:Int32) OR ((orders_count_by_user.user_id > 5:Int32) AND (orders_count_by_user.user_id < 15:Int32))) OR (orders_count_by_user.user_id > 30:Int32)) }
└─BatchScan { table: orders_count_by_user, columns: [orders_count_by_user.user_id, orders_count_by_user.date, orders_count_by_user.orders_count], scan_ranges: [orders_count_by_user.user_id <= Int64(15), orders_count_by_user.user_id > Int64(30)], distribution: UpstreamHashShard(orders_count_by_user.user_id, orders_count_by_user.date) }
Why we get orders_count_by_user.user_id <= Int64(15)
rather than orders_count_by_user.user_id < Int64(15)
here?🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed we need to sort the lower bound instead of equal condition here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rest LGTM
src/frontend/src/utils/condition.rs
Outdated
|
||
scan_ranges.extend(scan_ranges_chunk); | ||
} | ||
scan_ranges.sort_by(|a, b| a.eq_conds.len().cmp(&b.eq_conds.len())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed we need to sort the lower bound instead of equal condition here.
…nto li0k/batch_predicate_pushdown
src/common/src/util/scan_range.rs
Outdated
@@ -98,6 +100,138 @@ impl ScanRange { | |||
range: full_range(), | |||
} | |||
} | |||
|
|||
pub fn covert_to_range(&self) -> (Bound<Vec<Datum>>, Bound<Vec<Datum>>) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function name covert_to_range
appears to be a typo and should be convert_to_range
. This typo is propagated in the function's usage on lines 133 and 371. Consider fixing all instances to maintain consistency and readability.
Spotted by Graphite Reviewer
Is this helpful? React 👍 or 👎 to let us know.
if left_start_vec.is_empty() && right_start_vec.is_empty() { | ||
return true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When both start vectors are empty, the code returns true
without checking the end bounds. This could incorrectly mark ranges as overlapping when they don't actually overlap. Consider this case:
Range 1: (-∞, 5]
Range 2: (-∞, 3]
These ranges overlap, but:
Range 1: (-∞, 3]
Range 2: (5, ∞)
These ranges don't overlap despite both having empty start vectors. The end bounds need to be checked in all cases.
Spotted by Graphite Reviewer
Is this helpful? React 👍 or 👎 to let us know.
@@ -428,3 +428,54 @@ | |||
expected_outputs: | |||
- logical_plan | |||
- batch_plan | |||
- name: When OR clauses contain non-overlapping conditions,, we can pushdown serveral scan_range. | |||
before: | |||
- create_table_and_mv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggests adding some tests in descending order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, We don't need to consider the direction of the ordering in the range comparison in this pr, so we change all the defaults to asce
to simplify the comparison logic.
…nto li0k/batch_predicate_pushdown
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for your PR!
…nto li0k/batch_predicate_pushdown
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
related to #19525
This PR implements the optimizations mentioned in the issue
Checklist
Documentation
Release note