-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Simplify display format of AggregateFunctionExpr
, add Expr::sql_name
#15253
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
datafusion/expr/src/expr.rs
Outdated
struct SqlDisplay<'a>(&'a Expr); | ||
impl Display for SqlDisplay<'_> { | ||
fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result { | ||
match self.0 { | ||
Expr::Column(c) => { | ||
write!(f, "{}", c.name) | ||
} | ||
Expr::Literal(_) => { | ||
write!(f, "aa") | ||
} | ||
Expr::ScalarVariable(..) => { | ||
write!(f, "bb") | ||
} | ||
Expr::OuterReferenceColumn(..) => { | ||
write!(f, "cc") | ||
} | ||
Expr::Placeholder(_) => { | ||
write!(f, "dd") | ||
} | ||
Expr::Wildcard { .. } => { | ||
write!(f, "ee") | ||
} | ||
Expr::AggregateFunction(AggregateFunction { func, params }) => { | ||
match func.sql_name(params) { | ||
Ok(name) => { | ||
write!(f, "{name}") | ||
} | ||
Err(e) => { | ||
write!(f, "got error from schema_name {}", e) | ||
} | ||
} | ||
} | ||
_ => write!(f, "{}", self.0.schema_name()), | ||
} | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a demo here.
AggregateFunctionExpr
's name
is constructed at logical expr phase, we need a new member sql_name
for tree explain, and a new constructor SqlDisplay
to handle the inner Expr
recursively. cc @alamb 👀
@@ -1338,7 +1336,7 @@ physical_plan | |||
12)-----------------------------┌─────────────┴─────────────┐ | |||
13)-----------------------------│ AggregateExec │ | |||
14)-----------------------------│ -------------------- │ | |||
15)-----------------------------│ aggr: count(Int64(1)) │ | |||
15)-----------------------------│ aggr: count(aa) │ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can see the final display changed.
30efcd5
to
03e33a4
Compare
AggregateFunctionExpr
AggregateFunctionExpr
, add Expr::sql_name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @irenjj -- this looks really nice
I think there are a few ways we should improve the comments, but I'll directly push those to this branch.
Thanks again!
Update; I don't have time right now to make the comment changes. Let me know what you think @irenjj and if you are willing to make them in this PR. If not I'll make a follow on PR tomorrow.
Thanks again ❤️
@@ -2596,6 +2612,176 @@ impl Display for SchemaDisplay<'_> { | |||
} | |||
} | |||
|
|||
struct SqlDisplay<'a>(&'a Expr); | |||
impl Display for SqlDisplay<'_> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have code to convert from Expr
to SQL, in https://docs.rs/datafusion/latest/datafusion/sql/unparser/struct.Unparser.html#method.expr_to_sql
However, datafusion-expr doesn't currently depend on the datafusion-sql crate which is probably a good thing
So I think we should add a note in Expr::sql_name
that if the user's want syntactically correct SQL to feed to some other system, they should use the Unparser
and leave a link to https://docs.rs/datafusion/latest/datafusion/sql/unparser/struct.Unparser.html
datafusion/expr/src/expr.rs
Outdated
@@ -2607,11 +2793,23 @@ pub(crate) fn schema_name_from_exprs_comma_separated_without_space( | |||
schema_name_from_exprs_inner(exprs, ",") | |||
} | |||
|
|||
/// Get `sql_name` for Vector of expressions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of adding so many new so specific functions, what do you think about adding something like the following. I think the API would be easier to understand and it would also let us save a String
copy when formatting as sql
/// Formats a list of `&Expr` with a custom separator
struct ExprListDisplay<'a> {
exprs: &'a [Expr],
sep: &'a str,
}
...
impl <'a> Display for ExprListDisplay<'a> {
...
}
51)│ files: 1 │ | ||
52)│ format: csv │ | ||
53)└───────────────────────────┘ | ||
07)│ group_by: string_col │ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯
70)│ format: memory ││ format: memory │ | ||
71)│ rows: 1 ││ rows: 1 │ | ||
72)└───────────────────────────┘└───────────────────────────┘ | ||
10)│ aggr: count(1) │ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is wonderful
Thanks @alamb for your review, I will revise it later according to your suggestions. |
@@ -2596,6 +2612,176 @@ impl Display for SchemaDisplay<'_> { | |||
} | |||
} | |||
|
|||
struct SqlDisplay<'a>(&'a Expr); | |||
impl Display for SqlDisplay<'_> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the main difference between Display
? I found alias is different but others looks similar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true, it's redundant for most cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should take nested expr into consideration, like aggr(case aggr() when...)
datafusion/expr/src/expr.rs
Outdated
} | ||
} | ||
} | ||
_ => write!(f, "{}", self.0.schema_name()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think by default should be Display, self.0.to_string()
datafusion/expr/src/expr.rs
Outdated
@@ -2607,11 +2793,23 @@ pub(crate) fn schema_name_from_exprs_comma_separated_without_space( | |||
schema_name_from_exprs_inner(exprs, ",") | |||
} | |||
|
|||
/// Get `sql_name` for Vector of expressions. | |||
pub(crate) fn sql_name_from_exprs_comma_separated_without_space( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
without_space
is the old format and ideally we should standardize the format #10364 so better not to introduce another format without good reason
Is it possible to modify |
@@ -202,53 +202,48 @@ physical_plan | |||
02)│ AggregateExec │ | |||
03)│ -------------------- │ | |||
04)│ aggr: sum(bigint_col) │ | |||
05)│ │ | |||
06)│ group_by: │ | |||
07)│ string_col@0 as string_col│ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'll be helpful to keep the index @0
, what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's true that column indices provide precise tracking and tracing capabilities for complex query analysis, but I think tree explain should be as simple as possible🤔, as doc says:
/// TreeRender, displayed in the `tree` explain type.
///
/// This format is inspired by DuckDB's explain plans. The information
/// presented should be "user friendly", and contain only the most relevant
/// information for understanding a plan. It should NOT contain the same level
/// of detail information as the [`Self::Default`] format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the key difference Display
and the now named human_format
is that Display has more details -- I tried to add comments that explain this
I haven't tried it, not sure how it will affect other logic. |
Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
Thanks @alamb, @jayzhan211 and @xudong963 for your review, here are two points that remain unclear:
|
Co-authored-by: Andrew Lamb <[email protected]>
I don't think we should keep the index for the tree explain plan
I think this would be a more disruptive change as it would change the existing |
I pushed a commit to add more documentation explaining the different options for printing I also pushed a commit to rename Let me know what you think |
@@ -64,6 +64,15 @@ use sqlparser::ast::{ | |||
/// | |||
/// [`ExprFunctionExt`]: crate::expr_fn::ExprFunctionExt | |||
/// | |||
/// # Printing Expressions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this section and the one below to try and explain the uses of the different explain variants.
I'll plan to merge this tomorrow unless others would like time to review |
Thank you @irenjj @jayzhan211 and @xudong963 🙏 |
@irenjj |
I believe that the Perhaps what we should really focus on is ensuring that the explain output correctly displays the |
Which issue does this PR close?
AggregateFunctionExpr
#15252Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?