Skip to content

Create name of aggregate expression from expressions #11707

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jayzhan211 opened this issue Jul 29, 2024 · 4 comments · Fixed by #11776
Closed

Create name of aggregate expression from expressions #11707

jayzhan211 opened this issue Jul 29, 2024 · 4 comments · Fixed by #11776

Comments

@jayzhan211
Copy link
Contributor

jayzhan211 commented Jul 29, 2024

          I agree creating the name automatically from existing information would be better. Perhaps we can do that as a follow on PR

Originally posted by @alamb in #11617 (comment)

I think it is possible to build the name from expressions, so we don't need. to provider name in expression builder

The code is how the name constructed currently.

    // unpack (nested) aliased logical expressions, e.g. "sum(col) as total"
    let (name, e) = match e {
        Expr::Alias(Alias { expr, name, .. }) => (name.clone(), expr.as_ref()),
        Expr::AggregateFunction(_) => (e.display_name().unwrap_or(physical_name(e)?), e),
        _ => (physical_name(e)?, e),
    };

The solution in my mind

create name in AggregateExprBuilder::build so we don't need function name in create_aggregate_expr_with_dfschema.

    let mut builder =
        AggregateExprBuilder::new(Arc::new(fun.clone()), input_phy_exprs.to_vec());
    builder = builder.sort_exprs(sort_exprs.to_vec());
    builder = builder.order_by(ordering_req.to_vec());
    builder = builder.logical_exprs(input_exprs.to_vec());
    builder = builder.dfschema(dfschema.clone());
    let schema: Schema = dfschema.into();
    builder = builder.schema(Arc::new(schema));
    // no need anymore
    //  builder = builder.name(name);   
    builder.build() // create name inside build()

Additional context

Remove name in create_aggregate_expr is ideally, but if someone has given the name that is not we expected, their code might failed. 🤔

@jayzhan211 jayzhan211 changed the title Build name for aggregate expression from expressions Create name of aggregate expression from expressions Jul 29, 2024
@lewiszlw
Copy link
Member

lewiszlw commented Aug 1, 2024

How to handle aliased aggr expr? Keep an optional name param in create_aggregate_expr or Change aggr_expr field of AggregateExec from Vec<Arc<dyn AggregateExpr>> to Vec<(Arc<dyn AggregateExpr>, String)>?

@jayzhan211
Copy link
Contributor Author

jayzhan211 commented Aug 1, 2024

It's not clear to me why do we need to handle aliased aggr expr, but create_aggregate_expr is what I plan to deprecate.

@lewiszlw
Copy link
Member

lewiszlw commented Aug 2, 2024

For example select count(1) as count from t, the generated name is count(1), we need be able to store alias.
AggregateExec's aggr_expr field doesn't support alias, so we have to set alias as the name of aggr expr.

pub struct AggregateExec {
    /// Aggregate expressions
    aggr_expr: Vec<Arc<dyn AggregateExpr>>,
    ...
}

pub struct ProjectionExec {
    /// The projection expressions stored as tuples of (expression, output column name)
    pub(crate) expr: Vec<(Arc<dyn PhysicalExpr>, String)>,
    ...
}

@jayzhan211
Copy link
Contributor Author

jayzhan211 commented Aug 2, 2024

I see. I think keeping optional param for alias seems like the best idea. Maybe we rename it to alias, so it is clear what the param is for? Also, alias in AggregateExprBuilder. If None is provided, then we create the name based our datafusion's function.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants