Skip to content

feat: metadata handling for aggregates and window functions #15911

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

timsaucer
Copy link
Contributor

@timsaucer timsaucer commented May 1, 2025

Which issue does this PR close?

Rationale for this change

This change is a follow on to #15646. With this change we can now handle metadata for both window and aggregate functions. It enables the use of extension types via this metadata handling.

What changes are included in this PR?

Instead of passing DataType to aggregates and windows we now pass Field in their arguments. return_type has been replaced with return_field so we can get metadata out of these functions as well.

Are these changes tested?

All existing unit tests pass. New unit tests are added.

Are there any user-facing changes?

Yes, the migration guide contains information on the updates that the user will need to make for their user defined functions.

TODO before moving to ready to review

  • Add unit test that is a window of an aggregate function
  • Verify migration guide covers both aggregate and window functions
  • Find remaining places where the function name doesn't match the arguments (ie: fn return_type() that returns a Field)

@timsaucer timsaucer added datafusion Changes in the datafusion crate api change Changes the API exposed to users of the crate core Core DataFusion crate functions Changes to functions implementation ffi Changes to the ffi crate and removed ffi Changes to the ffi crate labels May 1, 2025
@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates optimizer Optimizer rules proto Related to proto crate labels May 1, 2025
@timsaucer timsaucer force-pushed the feat/metadata-handling-aggregates branch from 7bfcb7b to 8f0dc24 Compare May 1, 2025 14:12
@timsaucer timsaucer marked this pull request as ready for review May 3, 2025 13:46
@timsaucer
Copy link
Contributor Author

FYI @paleolimbot @crystalxyz since this impacts both of your work

Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this! (Particularly for the heroic updates to all the tests 😬 )

I took a look through and I didn't spot anything out of place (although I'm new to datafusion!).

I'm not sure if the tests are intended to be examples, but it may be worth a slightly more intuitive one (e.g., a sum() that propagates a metadata key unit from the source to the destination).

I also still find it strange to use a Field for these since the name is ignored. Using a dedicated type decoupled from arrow-rs might give you more future flexibility if you need to modify this approach in any way (e.g., to reduce the amount of JSON parsing required when invoking a function on an extension type with parameters). Obviously an optional suggestion!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate core Core DataFusion crate datafusion Changes in the datafusion crate functions Changes to functions implementation logical-expr Logical plan and expressions optimizer Optimizer rules physical-expr Changes to the physical-expr crates proto Related to proto crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add metadata support for Aggregate and Window Functions
2 participants