Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unnest with single expression #9069

Merged
merged 14 commits into from
Feb 4, 2024
Merged

Unnest with single expression #9069

merged 14 commits into from
Feb 4, 2024

Conversation

jayzhan211
Copy link
Contributor

Which issue does this PR close?

Revisit from #6796, only single expression included in this PR.

Closes #.

Rationale for this change

Ref #6555

What changes are included in this PR?

Two types of unnest expressions

  1. select unnest([1,2,3])
  2. select unnest(column)

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Jan 30, 2024
@jayzhan211 jayzhan211 marked this pull request as ready for review January 30, 2024 14:13
Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is really nice initiative, thanks @jayzhan211, I'll review it this week if no one else beats me

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jayzhan211 -- this PR looks awesome -- thank you.

I think it needs a few more tests and avoiding a panic, but otherise 👨‍🍳 👌

Thanks again.

.build()
} else {
// Only support single unnest expression for now
assert_eq!(array_exprs_to_unnest.len(), 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we please make this return NotYetImplemented rather than panic?

6

statement ok
drop table unnest_table;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome -- can you please also add tests for:

  1. to unnest two columns (it is fine to error, but I think that behavior should be tested)
  2. trying to unnest a scalar value 1 rather than [1]
  3. to unnest zero columns (again should be an error)

Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
}) = exprs[0]
{
// valid
} else if let Expr::Column(_) = exprs[0] {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arguments handling in Column seems not possible to handle anywhere before transforming to LogicalPlan. We should validate arguments inside UnnestExec or elsewhere downstream.

Signed-off-by: jayzhan211 <[email protected]>
@@ -124,6 +124,9 @@ fn physical_name(e: &Expr) -> Result<String> {

fn create_physical_name(e: &Expr, is_first_expr: bool) -> Result<String> {
match e {
Expr::Unnest(_) => {
internal_err!("Expr::Unnest should have been converted to LogicalPlan::Unest")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
internal_err!("Expr::Unnest should have been converted to LogicalPlan::Unest")
internal_err!("Expr::Unnest should have been converted to LogicalPlan::Unnest")

@@ -82,6 +82,13 @@ impl ExprSchemable for Expr {
Expr::Case(case) => case.when_then_expr[0].1.get_type(schema),
Expr::Cast(Cast { data_type, .. })
| Expr::TryCast(TryCast { data_type, .. }) => Ok(data_type.clone()),
Expr::Unnest(Unnest { exprs }) => {
let arg_data_types = exprs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need iter here as we refer for the first element only?

Copy link
Contributor Author

@jayzhan211 jayzhan211 Feb 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can keep this, since there will be multiple exprs in the future

return not_impl_err!("unnest() does not support struct yet");
} else {
return plan_err!(
"unnest() can only be applied to array and structs and null"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line says unnest supports struct but line above says the opposite

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first line is struct nyi error.
second line is to tell you that only those 3 are valid.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats the confusing part. not_impl_err!("unnest() does not support struct yet"); - here the user can get the struct is not supported by unnest.
plan_err!("unnest() can only be applied to array and structs and null") here the message says that unnest support ONLY array, structs and nulls which confronts with first err message

Maybe I'm missing something?

([6], [11,12], 3)
;

query I
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add some test description

----

query ?
select column1 from unnest_table;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this test?

statement ok
CREATE TABLE unnest_table
AS VALUES
([1,2,3], [7], 1),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please include nulls, empty arrays into the table

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added nulls in 79eb32d to keep this PR moving

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jayzhan211 I left some comments, nice job
Back in the day I tried to implement unnest but went to wrong direction through builtin function and eventually gave up. Your approach with expression is good

Signed-off-by: jayzhan211 <[email protected]>
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jayzhan211 -- I think this PR is now ready to go. It looks like @comphead has a few more test suggestions in https://github.com/apache/arrow-datafusion/pull/9069/files#r1474726470 but I think we could also potentially add those tests as a follow on PR given this PR does not seem have special handling for them

statement ok
CREATE TABLE unnest_table
AS VALUES
([1,2,3], [7], 1),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added nulls in 79eb32d to keep this PR moving

@alamb alamb merged commit 24197d7 into apache:main Feb 4, 2024
22 checks passed
@alamb
Copy link
Contributor

alamb commented Feb 4, 2024

Thanks again @jayzhan211 -- this is really nice 👌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules sql SQL Planner sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants