-
Notifications
You must be signed in to change notification settings - Fork 606
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(optimizer): runtime check for scalar subquery in batch queries #13880
Conversation
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like an interesting idea. What's your motivation for doing this? (Did you meet some use cases, or just came up with this suddenly?!😄)
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #13880 +/- ##
==========================================
- Coverage 68.07% 68.00% -0.07%
==========================================
Files 1548 1552 +4
Lines 267474 267653 +179
==========================================
- Hits 182075 182018 -57
- Misses 85399 85635 +236
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
It's #12908. 😄 |
I am considering whether we should use a dedicated operator e.g. |
…ntime-check Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Refactored the implementation. PTAL again. 🥰 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the subquery is executed during the optimization phase?
|
||
impl ToBatch for LogicalMaxOneRow { | ||
fn to_batch(&self) -> Result<PlanRef> { | ||
todo!() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
todo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops. Forget to change this after reimplementation.
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM.
|
||
#[derive(Default)] | ||
pub struct CheckApplyElimination { | ||
result: CheckResult, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Why not simply put a Result<(), RwError>
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have weight on different error messages to provide better error reporting. 🥺
risingwave/src/frontend/src/optimizer/plan_visitor/apply_visitor.rs
Lines 68 to 70 in 32b0c58
fn default_behavior() -> Self::DefaultBehavior { | |
Merge(std::cmp::max) | |
} |
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
} | ||
|
||
// Check if all `Apply`s are eliminated and the subquery is unnested. | ||
plan.check_apply_elimination()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After you removed the to_stream
from logical join, we need to add a visitor (e.g. MaxOneRowFinder) to check whether dangling MaxOneRow
operators exist for Streaming Query.
The example
create table t (a int);
explain create materialized view v as select (select a from t) as c;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise, we will meet the error Not supported: streaming nested-loop join
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. So which approach do you think is better, call child.to_stream()
first or check HasLogicalMaxOneRow
before calling to_stream
? In the latter approach, it seems LogicalMaxOneRow.to_stream()
can be filled with unreachable
.
Logical Rewrite For Stream:
LogicalJoin { type: LeftOuter, on: true }
├─LogicalValues { rows: [[0:Int64]], schema: Schema { fields: [_row_id:Int64] } }
└─LogicalMaxOneRow
└─LogicalScan { table: t, columns: [a, _row_id] }
ERROR: Not supported: streaming nested-loop join
HINT: The non-equal join in the query requires a nested-loop join executor, which could be very expensive to run. Consider rewriting the query to use dynamic filter as a substitute if possible.
See also: https://github.com/risingwavelabs/rfcs/blob/main/rfcs/0033-dynamic-filter.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach looks better IMO: check HasLogicalMaxOneRow before calling to_stream.
Signed-off-by: Bugen Zhao <[email protected]>
Not exactly. We now enable the checking of scalar subquery cardinality at runtime. Previously, only compile-time information was used, which resulted in some queries being unable to be planned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you so much for this PR!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
In #12908, a hack is introduced to allow a system query to be executed. We concluded that the support for unique keys may help to get rid of it. However, as discussed in #5019 and #1335, it appears that supporting runtime check for scalar subquery can be much easier to implement.
I reuse the plan nodes and executor forLimit
to achieve that, not sure if it's a good practice.Introduce a new plan node named
MaxOneRow
and its corresponding batch executor for doing this.Checklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.