Update Aggregate functions to take builder parameters #859
@timsaucer - If there's a way for me to help with this lift without stepping on your toes, please let me know.
If you wanted to divide and conquer we can, but actually I think another thing that would be very helpful would be to have a more ergonomic way to use aggregates as window functions. I could see two ways
What do you think? That could be done entirely independently of this PR.
eb53593 to 7e42e6c
src/functions.rs
Outdated
.map(|x| x.into_iter().map(|x| x.expr).collect::<Vec<_>>())
.unwrap_or_default();
let mut builder = agg_fn.order_by(order_by);
// let order_by = order_by
Should this commented code be removed?
Good catch! Removed
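For readers less familiar with the Rust idiom in the snippet above, here is a minimal Python sketch of what the `.map(...).unwrap_or_default()` chain appears to do: an optional list of sort expressions is reduced to a plain list of inner expressions, and a missing value becomes an empty list. `SortExpr` and `normalize_order_by` are hypothetical names used for illustration only; they are not taken from the PR or from the datafusion API.

```python
from typing import List, NamedTuple, Optional, Sequence


class SortExpr(NamedTuple):
    """Hypothetical wrapper: an expression plus a sort direction."""
    expr: str
    ascending: bool = True


def normalize_order_by(order_by: Optional[Sequence[SortExpr]]) -> List[str]:
    """Return the inner expressions, or an empty list when nothing was passed."""
    if order_by is None:
        # Counterpart of `unwrap_or_default()` on a missing Option.
        return []
    # Counterpart of `.map(|x| x.expr).collect()` on the contained values.
    return [item.expr for item in order_by]


no_ordering = normalize_order_by(None)
two_keys = normalize_order_by([SortExpr("a"), SortExpr("b", ascending=False)])
```

Normalizing to an empty list means the builder call that follows can be made unconditionally, whether or not the caller supplied an ordering.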
@@ -1059,7 +870,7 @@ pub(crate) fn init_module(m: &Bound<'_, PyModule>) -> PyResult<()> {
m.add_wrapped(wrap_pyfunction!(floor))?;
m.add_wrapped(wrap_pyfunction!(from_unixtime))?;
m.add_wrapped(wrap_pyfunction!(gcd))?;
m.add_wrapped(wrap_pyfunction!(grouping))?;
// m.add_wrapped(wrap_pyfunction!(grouping))?;
Do you know if this comment applies here?
// Code is commented out since grouping is not yet implemented
// https://github.com/apache/datafusion-python/issues/861
// aggregate_function!(grouping);
Yes, I added that intentionally as a placeholder until grouping gets implemented.
Really excellent. I really appreciate how cleaner
Upstream I'll create an issue and add it to our next release cycle.
Since this is a larger PR, I'm hoping to get at least one more approval before merging it in.
* Add NullTreatment enum wrapper and add filter option to approx_distinct
* Small usability on aggregate
* Adding documentation and additional unit test for approx_median
* Update approx_percentile_cont with builder parameters it uses, which is filter but not distinct
* Update approx_percentile_cont_with_weight with builder parameters it uses, which is filter but not distinct
* Update array_agg to use aggregate options
* Update builder options for avg aggregate function
* Move bit_and bit_or to use macro to generate python fn
* Update builder arguments for bitwise operators
* Use macro for bool_and and bool_or
* Update python wrapper for arguments appropriate to bool operators
* Set corr to use macro for pyfunction
* Update unit test to make it easier to debug
* Update corr python wrapper to expose only builder parameters used
* Update count and count_star to use macro for exposing
* Update count and count_star with appropriate aggregation options
* Move covar_pop and covar_samp to use macro for aggregates
* Updating covar_pop and covar_samp with builder option
* Use macro for last_value and move first_value to be near it
* Update first_value and last_value with the builder parameters that are relevant
* Remove grouping since it is not actually implemented upstream
* Move median to use macro
* Expose builder options for median
* Expose nth value
* Updating linear regression functions to use filter and macro
* Update stddev and stddev_pop to use filter and macro
* Expose string_agg
* Add string_agg to python wrappers and add unit test
* Switch sum to use macro in rust side and expose correct options in python wrapper
* Use macro for exposing var_pop and var_samp
* Add unit tests for filtering on var_pop and var_samp
* Move approximation functions to use macro when possible
* Update user documentation to explain in detail the options for aggregate functions
* Update unit test to handle Python 3.10
* Clean up commented code
Which issue does this PR close?
Closes #780
Rationale for this change
This PR follows the same pattern as the recently closed #808, applied to aggregate functions. This is a usability enhancement.
What changes are included in this PR?
This PR updates the signatures for the aggregate functions to take optional parameters for the following:
Each aggregate function exposes only the parameters that its implementation actually uses; the other options are left out of the function signature. Users can still set those through the builder function approach if any of the internals are updated at a later time.
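The pattern described above can be sketched in plain Python. This is an illustrative model only, not the datafusion-python API: `AggregateExpr`, `with_filter`, `with_distinct`, and `to_sql` are hypothetical names, and the string-based expressions stand in for real expression objects. The choice of `approx_percentile_cont` with a `filter` keyword and no `distinct` keyword follows the commit list ("filter but not distinct"); its exact signature in the library is an assumption here.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class AggregateExpr:
    """Hypothetical stand-in for an aggregate expression (not the datafusion API)."""
    func: str
    args: str
    distinct: bool = False
    filter: Optional[str] = None

    # Builder-style setters: each returns a new expression with one option set.
    def with_filter(self, predicate: str) -> "AggregateExpr":
        return replace(self, filter=predicate)

    def with_distinct(self) -> "AggregateExpr":
        return replace(self, distinct=True)

    def to_sql(self) -> str:
        """Render a SQL-like string so the effect of each option is visible."""
        inner = f"DISTINCT {self.args}" if self.distinct else self.args
        sql = f"{self.func}({inner})"
        if self.filter is not None:
            sql += f" FILTER (WHERE {self.filter})"
        return sql


def approx_percentile_cont(column: str, percentile: float,
                           filter: Optional[str] = None) -> AggregateExpr:
    """Expose only `filter` as a keyword; `distinct` is deliberately not offered."""
    expr = AggregateExpr("approx_percentile_cont", f"{column}, {percentile}")
    return expr if filter is None else expr.with_filter(filter)


plain = approx_percentile_cont("price", 0.5)
filtered = approx_percentile_cont("price", 0.5, filter="qty > 0")
```

In this sketch the keyword argument is a convenience over the builder method, so an option that the function signature omits (here `distinct`) remains reachable by calling the builder method on the returned expression.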
Are there any user-facing changes?