-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Improved error for expand wildcard rule #15287
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved error for expand wildcard rule #15287
Conversation
…c, also modified related CICL content
datafusion/sql/src/select.rs
Outdated
@@ -826,6 +827,13 @@ impl<S: ContextProvider> SqlToRel<'_, S> { | |||
.map(|expr| rebase_expr(expr, &aggr_projection_exprs, input)) | |||
.collect::<Result<Vec<Expr>>>()?; | |||
|
|||
// check if the columns in the SELECT list are in the GROUP BY clause | |||
// or are part of an aggregate function, if not, throw an error. | |||
validate_columns_in_group_by_or_aggregate( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the error should be handled in the check_columns_satisfy_exprs
below with ProjectionMustReferenceAggregate
, is it possible to unify these 2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey jay, thanks for your review! I've unified these 2 also updated error message in the test file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much @Jiashu-Hu
I also verified that the example in the original reproducer now is much better 😍
DataFusion CLI v46.0.1
> create table foo(a int, b int, c timestamp) as values (1,2,'2025-01-01T12:01:02');
SELECT *
FROM foo
WHERE c >= NOW() - INTERVAL '1 hour'
GROUP BY a;
0 row(s) fetched.
Elapsed 0.037 seconds.
Error during planning: Column in SELECT must be in GROUP BY or an aggregate function: While expanding wildcard, column "foo.b" must appear in the GROUP BY clause or must be part of an aggregate function, currently only "foo.a" appears in the SELECT clause satisfies this requirement
cc @mhilton
Thanks @alamb @Jiashu-Hu |
This PR seems to have broken @alamb -- Extended tests break very frequently these days. We should prioritize completing the work on running them before merge. |
@ozankabak I think there is a PR waiting on review that would let us more easily run the tests before merge. I am not sure how we would have known to run them in this case though I feel like what would be idea is some workflow that actually ran the tests before merge but that didn't need us to babysit the PR -- something like letting a bot merge it or a merge queue |
Which issue does this PR close?
Rationale for this change
The current error messages for non-aggregate columns in GROUP BY queries are technically accurate but not user-friendly. They use technical terminology like "non-aggregate values" without clearly explaining the problem or providing actionable solutions. This PR improves these error messages to be more intuitive and helpful, especially for users who might not be familiar with SQL aggregation concepts.
What changes are included in this PR?
added two function in to
expand_wildcard_rule.rs
:fn validate_columns_in_group_by_or_aggregate(has_group_by: bool, expanded: &Vec<Expr>, group_by_columns: &Vec<Expr>, aggregate_columns: &Vec<Expr>) -> Result<(), DataFusionError>
SchemaError::GroupByColumnNotInSelectList
error when validation failsSchemaError
type to shows the error message -datafusion_common::SchemaError::GroupByColumnNotInSelectList
In datafusion/sql/src/utils.rs:
message_prefix()
method to provide clearer error message prefixescheck_column_satisfies_expr()
to suggest specific solutionsAre these changes tested?
There are several existing tests that directly evaluate this function during the CICL process, and all of them have been successfully passed.
Are there any user-facing changes?
Yes. Users will now see more intuitive error messages when they encounter GROUP BY validation issues. For example, instead of seeing:
They will see:
This change makes errors more understandable and provides clearer guidance on how to fix the issue.