Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Grouping functions with Group By CUBE/ROLLUP/GROUPING SETS #5647

Open
mingmwang opened this issue Mar 20, 2023 · 7 comments · May be fixed by #12565 or #10208
Open

Support Grouping functions with Group By CUBE/ROLLUP/GROUPING SETS #5647

mingmwang opened this issue Mar 20, 2023 · 7 comments · May be fixed by #12565 or #10208
Assignees
Labels
enhancement New feature or request

Comments

@mingmwang
Copy link
Contributor

mingmwang commented Mar 20, 2023

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

PostgreSQL, SparkSQL and Oracle support using GROUPING functions to specify the null is from subtotal or from original
data.
https://www.postgresql.org/docs/15/functions-aggregate.html#FUNCTIONS-HYPOTHETICAL-TABLE

Databricks SparkSQL
https://docs.databricks.com/sql/language-manual/functions/grouping.html

Oracle
https://oracle-base.com/articles/misc/rollup-cube-grouping-functions-and-grouping-sets#grouping

Describe the solution you'd like

Describe alternatives you've considered

Additional context

@mingmwang mingmwang added the enhancement New feature or request label Mar 20, 2023
@mingmwang
Copy link
Contributor Author

I'm working on it now.

@l1t1
Copy link

l1t1 commented Apr 26, 2024

where is the document of these features?

@bgjackma
Copy link

take

@bgjackma
Copy link

I don't think this works as a aggregate function in the physical plan. It depends on the grouping rather than the data, so the current abstraction doesn't have access to the necessary information.

I think this case (along with GROUPING_ID, any others?) will need special handling in GroupHashedAggregateStream.

@alamb alamb linked a pull request Sep 19, 2024 that will close this issue
@alamb
Copy link
Contributor

alamb commented Sep 19, 2024

Note there is a PR with a proposed implementation from @JasonLi-cn in #10208

@bgjackma
Copy link

Thanks for the heads up, that's helpful however I think I have a slightly nicer solution that doesn't force changes on existing accumulators.

@eejbyfeldt
Copy link
Contributor

eejbyfeldt commented Sep 20, 2024

Note that @mingmwang (the author of this ticket) had an alternative approach here: #5749 but it seems like it was never pushed above the finish line. It is based on how it is implemented in Spark.

What is good with that approach is that it also helps address other issues in the current implementation of grouping sets. For example the current implementation will produce incorrect results when there are null values in the actual columns. I also looked into creating PR for only fixing that bug in a similar way to #5749 but need to spend some more time on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment