-
Notifications
You must be signed in to change notification settings - Fork 1.5k
fix: make columnize_expr
resistant to display_name collisions
#10459
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
alias_expr, | ||
Arc::new(outer_aggr), | ||
)?))) | ||
Ok(Some(project(outer_aggr, alias_expr)?)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function project()
will internally call columnize_expr
, which can simplify the code here.
@@ -1154,12 +1154,12 @@ mod test { | |||
let table_scan = test_table_scan()?; | |||
|
|||
let plan = LogicalPlanBuilder::from(table_scan) | |||
.project(vec![lit(1) + col("a")])? | |||
.project(vec![lit(1) + col("a"), col("a")])? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It corresponds to select a+1 from (select a+1 from t)
, which is not a valid query.
> select a+1 from (select a+1 from t);
Schema error: No field named a. Valid fields are "t.a + Int64(1)"
columnize_expr currently only works when the input plan is aggregate or window.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jonahgao -- I went through this PR quite carefully and it looks really nice to me ❤️
I made a note of some place we maybe could avoid some clones, but I don't think it is required
cc @peter-toth and @erratic-pattern
@@ -1619,7 +1619,16 @@ select count(1) from v; | |||
query I | |||
select a + b from (select 1 as a, 2 as b, 1 as "a + b"); | |||
---- | |||
1 | |||
3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
@@ -228,37 +228,18 @@ impl OptimizerRule for SingleDistinctToGroupBy { | |||
.collect::<Result<Vec<_>>>()?; | |||
|
|||
// construct the inner AggrPlan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI @appletreeisyellow - this PR contains some changes which may conflict with #10295
Co-authored-by: Andrew Lamb <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks awesome -- thank you @jonahgao
Really really nice
Thanks for reviewing @alamb . |
…he#10459) * fix: make `columnize_expr` resistant to display_name collisions * fix simple_window_function test * remove Projection * add tests * retry ci * fix DataFrame tests * Update datafusion/expr/src/logical_plan/plan.rs Co-authored-by: Andrew Lamb <[email protected]> * Remove copies --------- Co-authored-by: Andrew Lamb <[email protected]>
Which issue does this PR close?
Closes #10413.
Rationale for this change
Background
columnize_expr
is used during planning projections. Usually, a projected expr can only reference the output data of the input plan through a column reference, but this does not work for aggregation and window functions. For exampleThe aggregate expression
sum(col("c2"))
is calculated in the aggregate logical plan, and the projection plan references its result through the aggregate expression directly. To make it work appropriately, we need to convert the aggregate expression in the projection into a column reference of the aggregate plan, which is done bycolumnize_expr
.Problem And Fix
Previously,
columnize_expr
searched for the corresponding column from the input schema through the display name of projected expr. However, different expressions may have the same display name. It is easy to reproduce by creating the same alias for another differenct expression. Collisions of display name will causecolumnize_expr
to find the wrong column, producing incorrect results as in issue #10413.The fix is to search from input plan's output expressions directly based on the expression itself, instead of searching by display name.
What changes are included in this PR?
Refactor columnize_expr so that it can correctly find the column corresponding to the expression.
Are these changes tested?
Yes.
Are there any user-facing changes?
Yes. columnize_expr is a public method, although it may only be used internally in DataFusion.