-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Sql-invoked function support for column statistics #23201
Add Sql-invoked function support for column statistics #23201
Conversation
8664a51
to
a1fe4c1
Compare
a1fe4c1
to
8951425
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Can you post an text version of the generated ANALYZE plan (or query info JSON) to show an example run
presto-spi/src/main/java/com/facebook/presto/spi/statistics/ColumnStatisticMetadata.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/sql/planner/LogicalPlanner.java
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/sql/planner/StatisticsAggregationPlanner.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the enhancement, lgtm. Some little nits, and a small question for discussing.
presto-spi/src/main/java/com/facebook/presto/spi/statistics/ColumnStatisticMetadata.java
Outdated
Show resolved
Hide resolved
presto-main/src/main/java/com/facebook/presto/sql/planner/StatisticsAggregationPlanner.java
Outdated
Show resolved
Hide resolved
2721a1a
to
14cbc5e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for this creative solution
presto-spi/src/main/java/com/facebook/presto/spi/statistics/ColumnStatisticMetadata.java
Outdated
Show resolved
Hide resolved
fcd0458
to
b9ac488
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix, lgtm.
@aaneja here are some sample plans. Both are on Iceberg Without Hive
Iceberg With Hive
IMO big takeaway is that the additional project we added in the plan is actually optimized away at a later phase after plan generation, so it is essentially the same plan as before. When I add histograms in a later PR which requires |
Adds support in the ANALYZE infrastructure for additional projections or constant arguments to aggregation functions. This allows connectors to support a wider range of functionality when generating metadata for column statistic collection
b9ac488
to
b7eb12f
Compare
Description
Adds support in the ANALYZE infrastructure for adding projections or arguments to aggregation functions. This can support a wider range of functionality for connectors. This uses the infrastructure from the
SqlInvokedScalarFunction
s to parse expressions returned by a connector inColumnStatisticMetadata
.Motivation and Context
The idea originally stemmed from discussion in #23114 (comment).
This is necessary to (1) support a
CAST
on the input argument forANALYZE
and (2) allows passing additional arguments to aggregation functions which may accept more than one argument. This could allowANALYZE
functions to be configurable through the session. One example use case of this is using a session variable to set the granularity at which a histogram stores information for statistics (#22365)Impact
Only affects existing connectors which use custom function names for certain column statistic types. Currently the only connector which uses this is Iceberg.
Test Plan
StatisticsAggregationPlanner
Contributor checklist
Release Notes