-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Migrate OrderSensitiveArrayAgg
to be a user defined aggregate
#11564
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
limited_convert_logical_expr_to_physical_expr_with_dfschema(expr, dfschema)?, | ||
), | ||
Expr::Column(col) => { | ||
let idx = dfschema.index_of_column(col)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that I need dfschema in order to get the correct field, so I change all the UDAF to create_aggregate_expr_with_dfschema
which is used internally for UDAF only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After removing limited_convert_logical_expr_to_physical_expr_with_dfschema
, I think we don't need DFSchema
anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So does that mean limited_convert_logical_expr_to_physical_expr_with_dfschema
will be temporary?
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
|
||
#[allow(clippy::too_many_arguments)] | ||
// This is not for external usage, consider creating with `create_aggregate_expr` instead. | ||
pub fn create_aggregate_expr_with_dfschema( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
first/last and ordered_array_agg are switched to this function. After #11359 , I think we can have a single create_aggregate_expr
again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make we can also make a builder to make using this function easier (e.g. it might be easy to swap one of the last three bools by accident)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! I already filed the ticket in #11543.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jayzhan211 -- we are getting very close 🤞
I do think some of the APIs are getting a little clumsy but perhaps we can refine them over time as follow on PRs (e.g. with builders, etc)
@@ -57,6 +57,9 @@ pub struct AccumulatorArgs<'a> { | |||
/// The schema of the input arguments | |||
pub schema: &'a Schema, | |||
|
|||
/// The schema of the input arguments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I view this addition as basically making the UDAF framework as full featured as the built in one is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope we will have either dfschema
or schema
at the end 🤔
/// | ||
/// `is_reversed` is used to indicate whether the aggregation is running in reverse order, | ||
/// it could be used to hint Accumulator to accumulate in the reversed order, | ||
/// you can just set to false if you are not reversing expression | ||
#[allow(clippy::too_many_arguments)] | ||
pub fn create_aggregate_expr( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the number of arguments it is getting, maybe it would be good to define a builder like the following:
let builder = AggregateExprBuilder::new(fun),
.with_input_phyiscal_exprs(input_phy_exprs)
...
.with_name(name)
.build()?
🤔
|
||
#[allow(clippy::too_many_arguments)] | ||
// This is not for external usage, consider creating with `create_aggregate_expr` instead. | ||
pub fn create_aggregate_expr_with_dfschema( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make we can also make a builder to make using this function easier (e.g. it might be easy to swap one of the last three bools by accident)
@@ -495,18 +573,23 @@ impl AggregateExpr for AggregateFunctionExpr { | |||
}) | |||
.collect::<Vec<_>>(); | |||
let mut name = self.name().to_string(); | |||
replace_order_by_clause(&mut name); | |||
// TODO: Generalize order-by clause rewrite |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we track this in a follow on ticket?
limited_convert_logical_expr_to_physical_expr_with_dfschema(expr, dfschema)?, | ||
), | ||
Expr::Column(col) => { | ||
let idx = dfschema.index_of_column(col)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So does that mean limited_convert_logical_expr_to_physical_expr_with_dfschema
will be temporary?
OrderSensitiveArrayAgg
to UDAFOrderSensitiveArrayAgg
to be a user defined aggregate
|
TODO:
|
Thanks @alamb ! |
Thank you -- it is so close I can feel it |
Which issue does this PR close?
Part of #8708
Rationale for this change
Migrating DataFusion so there is no distinction between built in and user defined functions
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?