Implement flatten function #6995

Closed
izveigor opened this issue Jul 17, 2023 · 3 comments · Fixed by #7239
Labels
enhancement New feature or request

Comments

@izveigor
Contributor

Is your feature request related to a problem or challenge?

Summary

Characteristic Description
Function name: flatten
Aliases: -
Original function?: No
Function description:
Azure Databricks: Transforms an array of arrays into a single array.
Spark SQL: Creates a single array from an array of arrays column.
ClickHouse: Converts an array of arrays to a flat array. Applies to any depth of nested arrays and does not change arrays that are already flat. The flattened array contains all the elements from all source arrays.
DuckDB: Concatenates a list of lists into a single list. This flattens only one level of the list (see examples).
Sources: Concept, Azure, DuckDB, ClickHouse, Spark

Examples:

D select flatten([[1, 2, 3], [4, 5, 6]]);
┌──────────────────────────────────────────────────────────────────────────────┐
│ flatten(main.list_value(main.list_value(1, 2, 3), main.list_value(4, 5, 6))) │
│                                   int32[]                                    │
├──────────────────────────────────────────────────────────────────────────────┤
│ [1, 2, 3, 4, 5, 6]                                                           │
└──────────────────────────────────────────────────────────────────────────────┘
D select flatten([[[1, 1], [2, 2], [3, 3]], [[4, 4], [5, 5], [6, 6]]]);
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ flatten(main.list_value(main.list_value(main.list_value(1, 1), main.list_value(2, 2), main.list_value(3, 3)), main.list_value(main.list_value(4, 4), main.li…  │
│                                                                           int32[][]                                                                            │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ [[1, 1], [2, 2], [3, 3], [4, 4], [5, 5], [6, 6]]                                                                                                               │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
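
To make the one-level behavior in the DuckDB output above concrete, here is a minimal Python sketch. The name `flatten_one_level` is hypothetical and chosen only for illustration; it is not part of DuckDB or DataFusion, and it only mirrors the semantics shown by the two queries.

```python
from typing import Any, List


def flatten_one_level(list_of_lists: List[List[Any]]) -> List[Any]:
    """Concatenate the inner lists, removing exactly one level of nesting."""
    result: List[Any] = []
    for inner in list_of_lists:
        # Each inner list is appended as-is, so deeper nesting is preserved.
        result.extend(inner)
    return result


# Same input as the first query: a two-level list becomes flat.
print(flatten_one_level([[1, 2, 3], [4, 5, 6]]))

# Same input as the second query: a three-level list keeps one level of nesting.
print(flatten_one_level([[[1, 1], [2, 2], [3, 3]], [[4, 4], [5, 5], [6, 6]]]))
```

The second call returns a list of pairs rather than a flat list of integers, which matches the `int32[][]` result type in the DuckDB output.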

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

@jayzhan211
Contributor

Let me take a look at this issue.

@izveigor
Contributor Author

izveigor commented Aug 6, 2023

That sounds good, @jayzhan211! One caveat: the DuckDB version of the flatten function differs from the rest. It removes only one level of nesting, whereas the others flatten all the way down to a one-dimensional array. I think the latter behavior (ClickHouse, Spark, and Azure Databricks) is much better.
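
For illustration, the any-depth behavior that the summary describes for ClickHouse could be sketched in Python as follows. The name `flatten_any_depth` is hypothetical, and this is only a model of the semantics under discussion, not the DataFusion implementation.

```python
from typing import Any, List


def flatten_any_depth(value: Any) -> List[Any]:
    """Recursively collect every non-list element into a one-dimensional list."""
    if not isinstance(value, list):
        # A scalar contributes itself as a single element.
        return [value]
    result: List[Any] = []
    for item in value:
        result.extend(flatten_any_depth(item))
    return result


# The three-level input from the DuckDB example collapses completely.
print(flatten_any_depth([[[1, 1], [2, 2], [3, 3]], [[4, 4], [5, 5], [6, 6]]]))

# An already flat list is returned unchanged.
print(flatten_any_depth([1, 2, 3]))
```

With this variant the three-level input yields `[1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]`, whereas a one-level flatten would stop at a list of pairs.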

@jayzhan211
Copy link
Contributor

For the latter version, it seems the function is implemented in #6796. I will continue working on this after that PR is merged.
