-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement basic unnest function #6796
Conversation
Current implementation. There are two workflows for unnest
Workflow 1: Workflow 2: |
set datafusion.execution.target_partitions = 1; | ||
|
||
query ? | ||
select unnest(make_array(1,2,3,4,5)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also recommend to create test with one of aggregate functions as stated in the description of the issue: #6555
SELECT sum(a) AS total FROM (SELECT unnest(make_array(3, 5, 6) AS a) AS b;
----
14
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one also. Add to task list
datafusion/sql/src/planner.rs
Outdated
)?); | ||
|
||
self.apply_expr_alias(apply_name_plan, alias.columns) | ||
let plan = self.apply_expr_alias(plan, alias.columns)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The order is changed so that we can get either expr
or table. expr
.
Example in array.slt
table.expr: SELECT sum(data.a), sum(data.b) FROM unnest(make_array(1, 2, 3), make_array(4, 5, 6, 7)) as data(a, b);
expr: SELECT sum(a), sum(b) FROM unnest(make_array(1, 2, 3), make_array(4, 5, 6, 7)) as data(a, b);
Wait for sqlparser 0.36 |
I was planning to release sqlparser in a few weeks -- but if it is blocking this PR I can find time to make a release sooner |
I hope sqlparser can be released soon! Thanks! |
Next release is tracked by apache/datafusion-sqlparser-rs#905 I will try and prioritize it but this week is short for me because of the US holiday (and I am spending most of my DataFusion time on #6800) |
@alamb Kindly remind you. :) |
9127da7
to
e55a2fb
Compare
Possible improvement: TODO in another PR
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jayzhan211 -- this is getting there but I don't think it is ready to merge yet (for reasons I left in comments below)
My major concerns are that we aren't using the LogicalPlan::Unnest
and UnnestExec
physical operator -- doing so would mean that improvements we plan (like making UnnestExec
more efficient) will not apply to using unnest
as a function
As this PR is large and I would like to move it along, I suggest
- Remove
flatten
(we can put that into another PR) - Make a PR with just the
Expr::Unnest
-- maybe this could simply create aLogicalPlan::Unnest
? - Make a PR to add any additional features needed for unnest (like multiple column support, for example)
What do you think?
# TODO: Unnest columns fails | ||
query error DataFusion error: SQL error: TokenizerError\("Unterminated string literal at Line: 2, Column 95"\) | ||
caused by | ||
Internal error: UNNEST only supports list type\. This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this appears still to be NotImplemented
datafusion/expr/src/expr.rs
Outdated
args: flatten_args, | ||
})) | ||
} | ||
_ => Err(DataFusionError::Internal(format!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_ => Err(DataFusionError::Internal(format!( | |
_ => Err(DataFusionError::NotYetImplemented(format!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The updated version is in https://github.com/apache/arrow-datafusion/pull/7461/files
@@ -1038,6 +1042,114 @@ impl LogicalPlanBuilder { | |||
pub fn unnest_column(self, column: impl Into<Column>) -> Result<Self> { | |||
Ok(Self::from(unnest(self.plan, column.into())?)) | |||
} | |||
|
|||
pub fn join_unnest_plans( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to use https://docs.rs/datafusion/latest/datafusion/physical_plan/unnest/struct.UnnestExec.html directly rather than building up a special LogicalPlan? It seems like UnnestExec
already does this
Let me work on #6995 first while thinking about step 2, since I don't think simply creating LogicalPlan from Expr::Unnest is trivial. |
Marking as draft to signify it is not waiting on review |
7c56682
to
cfcb8b3
Compare
To create LogicalPlan::Unnest, it seems we can only do that in the optimization step. But this can't avoid schema change, due to Convert expr to logical plan in |
The Unnest function is now processing in the planning step, converting Expr::Unnest to LogicalPlan::Unnest in this phase. Nothing to do in the optimize phase and only the existing UnnestExec is processed in the PhysicalPlan phase. Not sure if this is a good design but it works :) Maybe we can review the overall design, and then I will address the comments and resolve conflicts. |
I will try and review this more carefully later today and provide feedback. Thank you @jayzhan211 |
72273a5
to
907cc71
Compare
datafusion/expr/src/expr_schema.rs
Outdated
.collect::<Result<Vec<_>>>()?; | ||
|
||
if data_types.is_empty() { | ||
return Err(DataFusionError::NotImplemented( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: use macro
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
907cc71
to
199f770
Compare
Marking this one as draft as it has bitrotted for a while and I think we need to work on simplifing the array code we have before we add more |
I think DataFusion now supports UNNEST -- is this PR still being worked on? |
We support single columns only, the challenging one multiple columns has not yet been supported, I remember @jonahgao is working on it. I forgot which PR you mentioned. |
Not started yet. I just discovered that it hasn't been implemented in #9592 (comment), but I can start working on it from now on. |
I think I might have misunderstood. Neither of the two queries below has been implemented, the one discussed in this PR seems to be the latter. DataFusion CLI v37.0.0
❯ select unnest(ARRAY[1,2]), unnest(ARRAY[3,4,5]);
This feature is not implemented: Only support single unnest expression for now
❯ select * from unnest(ARRAY[1,2], ARRAY[2,3]);
This feature is not implemented: unnest() does not support multiple arguments yet I plan to try to implement them both. |
Which issue does this PR close?
Ref #6555
Closes #.
Rationale for this change
Support
unnest
for arraysWhat changes are included in this PR?
select unnest()
select * from unnest()
datafusion.execution.target_partitions > 1
(We might not need this?)Are these changes tested?
Are there any user-facing changes?