-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement flatten for MakeArray #7461
Conversation
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
2af5a6d
to
f01063c
Compare
/// For example: | ||
/// [[1, 2, 3], [4, 5, 6]] => [1, 2, 3, 4, 5, 6] | ||
/// [[[1, 2], [3, 4]], [[5, 6], [7, 8]]] => [1, 2, 3, 4, 5, 6, 7, 8] | ||
/// Panics if the expression cannot be flattened. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this code doesn't seem to panic -- instead it ignores errors
/// [[1, 2, 3], [4, 5, 6]] => [1, 2, 3, 4, 5, 6] | ||
/// [[[1, 2], [3, 4]], [[5, 6], [7, 8]]] => [1, 2, 3, 4, 5, 6, 7, 8] | ||
/// Returns an error if the expression cannot be flattened. | ||
pub fn try_flatten(&self) -> Result<Self> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function appears to be trying to simplify the flatten
function by partially evaluating a make_array
. Given that I think this kind of logic is likely better in https://github.com/apache/arrow-datafusion/blob/main/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs
or something similar.
In fact, I think since we have a flatten
function already: https://github.com/apache/arrow-datafusion/blob/87527c430852af82ef182e1607804f6ddd817bbb/datafusion/physical-expr/src/array_expressions.rs#L1662-L1665
Such an expression will already be automatically handled by expression simplification
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is mainly used in the unnest function currently, and I don't think it is a helpful expression optimization, it is too specialized.
Not only that, in my current design for unnest, I will process them before the optimization phase, specifically in the planning phase. If I move this function into the optimization phase, I still need to process this function prior, and since only the unnest function needs this now, we do nothing in the optimization phase.
Only If this optimization is generally used, I think moving it to the optimization phase makes sense.
You can look to the PR for unnest #6796 too, it is ready for view actually. I planned to reopen it after the flatten function merged, but since this function is tightly bind to unnest function, maybe review them together is better.
This PR appears to be stalled -- what do you think we should do with it @jayzhan211 ? |
Well, it depends on the review of #6796, since this PR is part of it. |
Marking this as draft to make it clear it is not waiting on review. Sorry that this effort has stalled @jayzhan211 -- I think we need to work in simplifying the code / fixing what we have before adding more features |
Which issue does this PR close?
Split from #6796
The previous implementation of flatten #6995 is for
Array
only, we need one forExpr
.Closes #.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?