
feat(optimizer): runtime check for scalar subquery in batch queries #13880

Merged · 18 commits merged into main from bz/scalar-subquery-runtime-check on Dec 20, 2023

Conversation

@BugenZhao (Member) commented Dec 8, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

In #12908, a hack was introduced to allow a system query to be executed. We concluded that support for unique keys might help us get rid of it. However, as discussed in #5019 and #1335, supporting a runtime check for scalar subqueries appears to be much easier to implement.

(Originally) I reused the plan nodes and executor for Limit to achieve this, though I was not sure whether that was good practice.

(Updated) Introduce a new plan node named MaxOneRow and its corresponding batch executor to perform the check.
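
As a rough illustration of the intended semantics (an editorial sketch with simplified stand-in types, not the PR's actual executor code): the operator forwards at most one row from its child and raises a runtime error as soon as a second row shows up.

// Minimal sketch of a MaxOneRow-style check over an iterator of rows.
// `Row` and the error type are simplified placeholders, not RisingWave types.
type Row = Vec<i64>;

#[derive(Debug)]
struct MoreThanOneRowError;

// Forward at most one row from the child; error if a second row appears.
fn max_one_row<I>(child: I) -> Result<Option<Row>, MoreThanOneRowError>
where
    I: IntoIterator<Item = Row>,
{
    let mut rows = child.into_iter();
    let first = rows.next();
    if rows.next().is_some() {
        // The scalar subquery produced more than one row: fail at runtime.
        return Err(MoreThanOneRowError);
    }
    Ok(first)
}

fn main() {
    // One row passes the check; a second row makes it fail.
    assert_eq!(max_one_row(vec![vec![1]]).unwrap(), Some(vec![1]));
    assert!(max_one_row(vec![vec![1], vec![2]]).is_err());
}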

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@BugenZhao BugenZhao marked this pull request as ready for review December 8, 2023 13:26
@xxchan (Member) left a comment

This sounds like an interesting idea. What's your motivation for doing this? (Did you run into some actual use cases, or did the idea just come to you suddenly?! 😄)

codecov bot commented Dec 11, 2023

Codecov Report

Attention: 78 lines in your changes are missing coverage. Please review.

Comparison: base (1cacc07) 68.07% vs. head (46e2f6b) 68.00%.
Report is 13 commits behind head on main.

Files Patch % Lines
...ntend/src/optimizer/plan_node/batch_max_one_row.rs 0.00% 29 Missing ⚠️
...end/src/optimizer/plan_node/logical_max_one_row.rs 55.55% 24 Missing ⚠️
src/batch/src/executor/max_one_row.rs 65.00% 21 Missing ⚠️
...end/src/optimizer/plan_node/generic/max_one_row.rs 78.94% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #13880      +/-   ##
==========================================
- Coverage   68.07%   68.00%   -0.07%     
==========================================
  Files        1548     1552       +4     
  Lines      267474   267653     +179     
==========================================
- Hits       182075   182018      -57     
- Misses      85399    85635     +236     
Flag Coverage Δ
rust 68.00% <65.93%> (-0.07%) ⬇️


@BugenZhao (Member, Author)

> What's your motivation for doing this?

It's #12908. 😄

@chenzl25 (Contributor)

I am considering whether we should use a dedicated operator, e.g. MaxOneRow, to express this semantic instead of coupling it with Limit.
Pros: we can use Limit more freely without worrying about the exceeding_limit restrictions.

@BugenZhao (Member, Author)

> I am considering whether we should use a dedicated operator, e.g. MaxOneRow, to express this semantic instead of coupling it with Limit. Pros: we can use Limit more freely without worrying about the exceeding_limit restrictions.

Refactored the implementation. PTAL again. 🥰

@BugenZhao BugenZhao requested review from xxchan and stdrc December 19, 2023 06:49
@stdrc (Member) left a comment

So the subquery is executed during the optimization phase?

src/frontend/src/optimizer/plan_node/logical_join.rs (outdated review thread, resolved)

impl ToBatch for LogicalMaxOneRow {
    fn to_batch(&self) -> Result<PlanRef> {
        todo!()
Reviewer (Member):

todo?

@BugenZhao (Member, Author):

Oops, forgot to change this after the reimplementation.

@xxchan (Member) left a comment

Generally LGTM.

proto/batch_plan.proto (review thread, resolved)
src/batch/src/executor/max_one_row.rs (outdated review thread, resolved)

#[derive(Default)]
pub struct CheckApplyElimination {
    result: CheckResult,
Reviewer (Member):

nit: Why not simply put a Result<(), RwError> here?

@BugenZhao (Member, Author):

We assign different weights to the error messages to provide better error reporting. 🥺

fn default_behavior() -> Self::DefaultBehavior {
    Merge(std::cmp::max)
}
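
The Merge(std::cmp::max) default behavior above is what makes the weighting work: when results from the plan-tree children are merged, the largest (most severe) one wins. A minimal editorial sketch, assuming a hypothetical ordered CheckResult (the variant names are made up, not the PR's actual ones):

// Editorial sketch: an ordered check result where merging with `max`
// keeps the most severe finding. Variant names are hypothetical.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum CheckResult {
    #[default]
    Ok,                 // least severe
    DanglingMaxOneRow,  // reported only if nothing worse is found
    RemainingApply,     // most severe, takes priority when merging
}

// Mirrors the `Merge(std::cmp::max)` default behavior shown above.
fn merge(results: impl IntoIterator<Item = CheckResult>) -> CheckResult {
    results.into_iter().max().unwrap_or_default()
}

fn main() {
    let merged = merge([
        CheckResult::Ok,
        CheckResult::RemainingApply,
        CheckResult::DanglingMaxOneRow,
    ]);
    assert_eq!(merged, CheckResult::RemainingApply);
}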

// Check if all `Apply`s are eliminated and the subquery is unnested.
plan.check_apply_elimination()?;
@chenzl25 (Contributor) commented Dec 19, 2023

After you removed to_stream from the logical join, we need to add a visitor (e.g. MaxOneRowFinder) to check whether dangling MaxOneRow operators exist for streaming queries.

For example:

create table t (a int);
explain create materialized view v as select (select a from t) as c;

@chenzl25 (Contributor):

Otherwise, we will hit the error "Not supported: streaming nested-loop join".

@BugenZhao (Member, Author) commented Dec 19, 2023

You're right. So which approach do you think is better: calling child.to_stream() first, or checking HasLogicalMaxOneRow before calling to_stream? With the latter approach, it seems LogicalMaxOneRow::to_stream() can be filled with unreachable.

Logical Rewrite For Stream:
 
 LogicalJoin { type: LeftOuter, on: true }
 ├─LogicalValues { rows: [[0:Int64]], schema: Schema { fields: [_row_id:Int64] } }
 └─LogicalMaxOneRow
   └─LogicalScan { table: t, columns: [a, _row_id] }
 
 ERROR: Not supported: streaming nested-loop join
 HINT: The non-equal join in the query requires a nested-loop join executor, which could be very expensive to run. Consider rewriting the query to use dynamic filter as a substitute if possible.
 See also: https://github.com/risingwavelabs/rfcs/blob/main/rfcs/0033-dynamic-filter.md

@chenzl25 (Contributor):

This approach looks better IMO: check HasLogicalMaxOneRow before calling to_stream.
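
For illustration only (an editorial sketch over a hypothetical toy plan type, not RisingWave's actual plan-visitor machinery), the idea is a simple recursive scan that reports whether any MaxOneRow node remains before attempting to_stream:

// Editorial sketch: detect remaining MaxOneRow nodes in a toy logical plan.
// The node type here is hypothetical; the real check visits RisingWave plan nodes.
enum PlanNode {
    MaxOneRow(Box<PlanNode>),
    Join(Box<PlanNode>, Box<PlanNode>),
    Values,
    Scan(&'static str),
}

// Returns true if any MaxOneRow operator is still present in the plan.
fn has_max_one_row(plan: &PlanNode) -> bool {
    match plan {
        PlanNode::MaxOneRow(_) => true,
        PlanNode::Join(lhs, rhs) => has_max_one_row(lhs) || has_max_one_row(rhs),
        PlanNode::Values | PlanNode::Scan(_) => false,
    }
}

fn main() {
    // Shaped like the plan dumped above: Join(Values, MaxOneRow(Scan(t))).
    let plan = PlanNode::Join(
        Box::new(PlanNode::Values),
        Box::new(PlanNode::MaxOneRow(Box::new(PlanNode::Scan("t")))),
    );
    // A streaming query whose plan still contains MaxOneRow is rejected up front.
    assert!(has_max_one_row(&plan));
}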

@BugenZhao (Member, Author)

> So the subquery is executed during the optimization phase?

Not exactly. We now check the cardinality of scalar subqueries at runtime. Previously, only compile-time information was used, which left some queries unable to be planned at all.
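
For instance (an illustrative example, not taken from the PR's tests), a batch query like the following can now be planned even though the subquery's row count is unknown at compile time; it only fails at runtime if t turns out to contain more than one row:

create table t (a int);
-- The scalar subquery must return at most one row; with the runtime check,
-- this is planned successfully and errors only if t holds multiple rows.
select (select a from t) as c;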

@BugenZhao BugenZhao requested a review from chenzl25 December 20, 2023 02:11
@chenzl25 (Contributor) left a comment

LGTM. Thank you so much for this PR!

@stdrc (Member) left a comment

LGTM

@BugenZhao BugenZhao added this pull request to the merge queue Dec 20, 2023
Merged via the queue into main with commit 66f6bc0 Dec 20, 2023
7 of 9 checks passed
@BugenZhao BugenZhao deleted the bz/scalar-subquery-runtime-check branch December 20, 2023 04:55