Skip to content

Simplify display format of AggregateFunctionExpr, add Expr::sql_name #15253

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Mar 20, 2025

Conversation

irenjj
Copy link
Contributor

@irenjj irenjj commented Mar 15, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Mar 15, 2025
Comment on lines 2603 to 2639
struct SqlDisplay<'a>(&'a Expr);
impl Display for SqlDisplay<'_> {
fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
match self.0 {
Expr::Column(c) => {
write!(f, "{}", c.name)
}
Expr::Literal(_) => {
write!(f, "aa")
}
Expr::ScalarVariable(..) => {
write!(f, "bb")
}
Expr::OuterReferenceColumn(..) => {
write!(f, "cc")
}
Expr::Placeholder(_) => {
write!(f, "dd")
}
Expr::Wildcard { .. } => {
write!(f, "ee")
}
Expr::AggregateFunction(AggregateFunction { func, params }) => {
match func.sql_name(params) {
Ok(name) => {
write!(f, "{name}")
}
Err(e) => {
write!(f, "got error from schema_name {}", e)
}
}
}
_ => write!(f, "{}", self.0.schema_name()),
}
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a demo here.
AggregateFunctionExpr's name is constructed at logical expr phase, we need a new member sql_name for tree explain, and a new constructor SqlDisplay to handle the inner Expr recursively. cc @alamb 👀

@@ -1338,7 +1336,7 @@ physical_plan
12)-----------------------------┌─────────────┴─────────────┐
13)-----------------------------│ AggregateExec │
14)-----------------------------│ -------------------- │
15)-----------------------------│ aggr: count(Int64(1))
15)-----------------------------│ aggr: count(aa)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can see the final display changed.

@irenjj irenjj force-pushed the irenjj/simplify_agg_expr branch from 30efcd5 to 03e33a4 Compare March 17, 2025 12:47
@github-actions github-actions bot removed the sqllogictest SQL Logic Tests (.slt) label Mar 17, 2025
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Mar 17, 2025
@irenjj irenjj marked this pull request as ready for review March 17, 2025 15:28
@alamb alamb changed the title Simplify display format of AggregateFunctionExpr Simplify display format of AggregateFunctionExpr, add Expr::sql_name Mar 17, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @irenjj -- this looks really nice

I think there are a few ways we should improve the comments, but I'll directly push those to this branch.

Thanks again!

Update; I don't have time right now to make the comment changes. Let me know what you think @irenjj and if you are willing to make them in this PR. If not I'll make a follow on PR tomorrow.

Thanks again ❤️

@@ -2596,6 +2612,176 @@ impl Display for SchemaDisplay<'_> {
}
}

struct SqlDisplay<'a>(&'a Expr);
impl Display for SqlDisplay<'_> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have code to convert from Expr to SQL, in https://docs.rs/datafusion/latest/datafusion/sql/unparser/struct.Unparser.html#method.expr_to_sql

However, datafusion-expr doesn't currently depend on the datafusion-sql crate which is probably a good thing

So I think we should add a note in Expr::sql_name that if the user's want syntactically correct SQL to feed to some other system, they should use the Unparser and leave a link to https://docs.rs/datafusion/latest/datafusion/sql/unparser/struct.Unparser.html

@@ -2607,11 +2793,23 @@ pub(crate) fn schema_name_from_exprs_comma_separated_without_space(
schema_name_from_exprs_inner(exprs, ",")
}

/// Get `sql_name` for Vector of expressions.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding so many new so specific functions, what do you think about adding something like the following. I think the API would be easier to understand and it would also let us save a String copy when formatting as sql

/// Formats a list of `&Expr` with a custom separator
struct ExprListDisplay<'a> {
  exprs: &'a [Expr],
  sep: &'a str,
}
...
impl <'a> Display for  ExprListDisplay<'a> {
...
}

51)│ files: 1 │
52)│ format: csv │
53)└───────────────────────────┘
07)│ group_by: string_col │
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

70)│ format: memory ││ format: memory │
71)│ rows: 1 ││ rows: 1 │
72)└───────────────────────────┘└───────────────────────────┘
10)│ aggr: count(1) │
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is wonderful

@irenjj
Copy link
Contributor Author

irenjj commented Mar 18, 2025

Thank you @irenjj -- this looks really nice

I think there are a few ways we should improve the comments, but I'll directly push those to this branch.

Thanks again!

Update; I don't have time right now to make the comment changes. Let me know what you think @irenjj and if you are willing to make them in this PR. If not I'll make a follow on PR tomorrow.

Thanks again ❤️

Thanks @alamb for your review, I will revise it later according to your suggestions.

@@ -2596,6 +2612,176 @@ impl Display for SchemaDisplay<'_> {
}
}

struct SqlDisplay<'a>(&'a Expr);
impl Display for SqlDisplay<'_> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the main difference between Display? I found alias is different but others looks similar

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, it's redundant for most cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should take nested expr into consideration, like aggr(case aggr() when...)

}
}
}
_ => write!(f, "{}", self.0.schema_name()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think by default should be Display, self.0.to_string()

@@ -2607,11 +2793,23 @@ pub(crate) fn schema_name_from_exprs_comma_separated_without_space(
schema_name_from_exprs_inner(exprs, ",")
}

/// Get `sql_name` for Vector of expressions.
pub(crate) fn sql_name_from_exprs_comma_separated_without_space(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

without_space is the old format and ideally we should standardize the format #10364 so better not to introduce another format without good reason

@jayzhan211
Copy link
Contributor

Is it possible to modify Display for Expr for explain statement?

@@ -202,53 +202,48 @@ physical_plan
02)│ AggregateExec │
03)│ -------------------- │
04)│ aggr: sum(bigint_col) │
05)│ │
06)│ group_by: │
07)│ string_col@0 as string_col│
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'll be helpful to keep the index @0, what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true that column indices provide precise tracking and tracing capabilities for complex query analysis, but I think tree explain should be as simple as possible🤔, as doc says:

    /// TreeRender, displayed in the `tree` explain type.
    ///
    /// This format is inspired by DuckDB's explain plans. The information
    /// presented should be "user friendly", and contain only the most relevant
    /// information for understanding a plan. It should NOT contain the same level
    /// of detail information as the  [`Self::Default`] format.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the key difference Display and the now named human_format is that Display has more details -- I tried to add comments that explain this

@irenjj
Copy link
Contributor Author

irenjj commented Mar 18, 2025

Is it possible to modify Display for Expr for explain statement?

I haven't tried it, not sure how it will affect other logic.

@irenjj
Copy link
Contributor Author

irenjj commented Mar 18, 2025

Thanks @alamb, @jayzhan211 and @xudong963 for your review, here are two points that remain unclear:

  1. For GROUP BY, is it necessary to preserve the row index -- for more information, additional functionality, or just better aesthetics?
  2. Should the default Display implementation of Expr be modified to replace SqlDisplay?

@alamb
Copy link
Contributor

alamb commented Mar 18, 2025

Thanks @alamb, @jayzhan211 and @xudong963 for your review, here are two points that remain unclear:

  1. For GROUP BY, is it necessary to preserve the row index -- for more information, additional functionality, or just better aesthetics?

I don't think we should keep the index for the tree explain plan

Is it possible to modify Display for Expr for explain statement?

  1. Should the default Display implementation of Expr be modified to replace SqlDisplay?

I think this would be a more disruptive change as it would change the existing EXPLAIN ... output. Thus I think we should consider it as a follow on PR

@alamb
Copy link
Contributor

alamb commented Mar 18, 2025

I pushed a commit to add more documentation explaining the different options for printing Exprs

I also pushed a commit to rename sql_name to human_display as that seemed much more explanatory when I was writing the documentation

Let me know what you think

@@ -64,6 +64,15 @@ use sqlparser::ast::{
///
/// [`ExprFunctionExt`]: crate::expr_fn::ExprFunctionExt
///
/// # Printing Expressions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this section and the one below to try and explain the uses of the different explain variants.

@alamb
Copy link
Contributor

alamb commented Mar 19, 2025

I'll plan to merge this tomorrow unless others would like time to review

@alamb
Copy link
Contributor

alamb commented Mar 20, 2025

Thank you @irenjj @jayzhan211 and @xudong963 🙏

@alamb alamb merged commit 11838be into apache:main Mar 20, 2025
27 checks passed
@berkaysynnada
Copy link
Contributor

@irenjj AggregateFunctionExpr has with_new_expressions() API. As datafusion hasn't implemented it yet, you didn't have difficulty rewriting the human_display according to the new expressions. Could you suggest a way to update human_readable field for the implementers of with_new_expressions()?

@irenjj
Copy link
Contributor Author

irenjj commented Mar 28, 2025

@irenjj AggregateFunctionExpr has with_new_expressions() API. As datafusion hasn't implemented it yet, you didn't have difficulty rewriting the human_display according to the new expressions. Could you suggest a way to update human_readable field for the implementers of with_new_expressions()?

I believe that the human_display in AggregateFunctionExpr is constructed during the logical plan phase based on the available information at that time (and the name in AggregateFunctionExpr is similar). On the other hand, with_new_expressions() replaces the args in AggregateFunctionExpr during the physical plan phase (perhaps related to the optimizer?). It's perfectly fine to keep human_display unchanged when implementing with_new_expressions() because the name used in the original indented explain is actually quite similar to human_display.

Perhaps what we should really focus on is ensuring that the explain output correctly displays the name of the expressions after with_new_expressions() has been applied.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[tree explain] Simplify display format of AggregateFunctionExpr
5 participants