Improved error for expand wildcard rule #15287

Jiashu-Hu · 2025-03-18T02:51:54Z

Which issue does this PR close?

Closes Improved error for expand wildcard rule #15004

Rationale for this change

The current error messages for non-aggregate columns in GROUP BY queries are technically accurate but not user-friendly. They use technical terminology like "non-aggregate values" without clearly explaining the problem or providing actionable solutions. This PR improves these error messages to be more intuitive and helpful, especially for users who might not be familiar with SQL aggregation concepts.

What changes are included in this PR?

Added two functions to `expand_wildcard_rule.rs`:

`fn validate_columns_in_group_by_or_aggregate(has_group_by: bool, expanded: &Vec<Expr>, group_by_columns: &Vec<Expr>, aggregate_columns: &Vec<Expr>) -> Result<(), DataFusionError>`

Verifies all selected columns appear either in the GROUP BY clause or within aggregate functions
Returns a clear SchemaError::GroupByColumnNotInSelectList error when validation fails

Added a new SchemaError type to show the error message: datafusion_common::SchemaError::GroupByColumnNotInSelectList

In datafusion/sql/src/utils.rs:

Updated the message_prefix() method to provide clearer error message prefixes
Improved the help text in check_column_satisfies_expr() to suggest specific solutions that are easy to understand, as requested in #15004
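For readers skimming the description, the validation rule can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: it models columns as plain strings instead of `Expr`, returns a `String` instead of `DataFusionError`, and the name `validate_columns` is hypothetical.

```rust
use std::collections::HashSet;

/// Simplified stand-in for a planner expression (assumption: the real code
/// works on `Expr`, not on strings).
type Column = String;

/// Hypothetical sketch: returns Ok(()) when every selected column is either a
/// GROUP BY column or an aggregate column, otherwise a user-facing message
/// naming the first offending column.
fn validate_columns(
    has_group_by: bool,
    expanded: &[Column],
    group_by_columns: &[Column],
    aggregate_columns: &[Column],
) -> Result<(), String> {
    // Without a GROUP BY clause there is nothing to validate in this sketch.
    if !has_group_by {
        return Ok(());
    }

    // Every column that is allowed to appear in the SELECT list.
    let allowed: HashSet<&str> = group_by_columns
        .iter()
        .chain(aggregate_columns.iter())
        .map(String::as_str)
        .collect();

    // Report the first selected column that is not covered.
    for col in expanded {
        if !allowed.contains(col.as_str()) {
            return Err(format!(
                "Column '{col}' must appear in the GROUP BY clause or be used in an aggregate function"
            ));
        }
    }
    Ok(())
}
```

With `expanded = ["foo.a", "foo.b"]` and `group_by_columns = ["foo.a"]`, the sketch rejects `foo.b`, which mirrors the `SELECT * ... GROUP BY a` case from #15004.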

Are these changes tested?

Several existing tests exercise this function directly during CI, and all of them pass.

Are there any user-facing changes?

Yes. Users will now see more intuitive error messages when they encounter GROUP BY validation issues. For example, instead of seeing:

Error during planning: Projection references non-aggregate values: Expression foo.b could not be resolved from available columns: foo.a

They will see:

SchemaError: Column 'foo.b' must appear in the GROUP BY clause or be used in an aggregate function

This change makes errors more understandable and provides clearer guidance on how to fix the issue.

jayzhan211 · 2025-03-18T04:03:50Z

datafusion/sql/src/select.rs

@@ -826,6 +827,13 @@ impl<S: ContextProvider> SqlToRel<'_, S> {
            .map(|expr| rebase_expr(expr, &aggr_projection_exprs, input))
            .collect::<Result<Vec<Expr>>>()?;

+        // check if the columns in the SELECT list are in the GROUP BY clause
+        // or are part of an aggregate function, if not, throw an error.
+        validate_columns_in_group_by_or_aggregate(


It seems like the error should be handled in the check_columns_satisfy_exprs below with ProjectionMustReferenceAggregate, is it possible to unify these 2?

Hey Jay, thanks for your review! I've unified these two and also updated the error message in the test file.
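One way to picture the unification discussed in this thread is a single check whose wording is selected by a "purpose" value. The sketch below is only an illustration under stated assumptions: `Purpose`, `check_columns_satisfy`, and the HAVING prefix are hypothetical, columns are modeled as strings, and only the SELECT prefix is taken from the error output shown later in this conversation.

```rust
/// Hypothetical: why the check runs; selects the message prefix.
#[derive(Clone, Copy)]
enum Purpose {
    Projection,
    Having,
}

impl Purpose {
    fn message_prefix(self) -> &'static str {
        match self {
            // Prefix as it appears in the CLI output quoted in this PR.
            Purpose::Projection => {
                "Column in SELECT must be in GROUP BY or an aggregate function"
            }
            // Assumed wording, for illustration only.
            Purpose::Having => {
                "Column in HAVING must be in GROUP BY or an aggregate function"
            }
        }
    }
}

/// Hypothetical single entry point: every column must be one of `available`
/// (the GROUP BY and aggregate outputs), otherwise one shared message format
/// is used for all callers.
fn check_columns_satisfy(
    columns: &[&str],
    available: &[&str],
    purpose: Purpose,
) -> Result<(), String> {
    for col in columns {
        if !available.contains(col) {
            return Err(format!(
                "{}: column \"{}\" must appear in the GROUP BY clause or must be part of an aggregate function, currently only \"{}\" satisfies this requirement",
                purpose.message_prefix(),
                col,
                available.join(", ")
            ));
        }
    }
    Ok(())
}
```

Keeping one function means the wildcard-expansion path and the existing projection check cannot drift apart in wording, which is the kind of mismatch that test expectations on error messages are sensitive to.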

alamb

Thank you so much @Jiashu-Hu

I also verified that the example in the original reproducer now is much better 😍

DataFusion CLI v46.0.1
> create table foo(a int, b int, c timestamp) as values (1,2,'2025-01-01T12:01:02');

SELECT *
FROM foo
WHERE c >= NOW() - INTERVAL '1 hour'
GROUP BY a;
0 row(s) fetched.
Elapsed 0.037 seconds.

Error during planning: Column in SELECT must be in GROUP BY or an aggregate function: While expanding wildcard, column "foo.b" must appear in the GROUP BY clause or must be part of an aggregate function, currently only "foo.a" appears in the SELECT clause satisfies this requirement

cc @mhilton

jayzhan211 · 2025-03-22T05:14:05Z

Thanks @alamb @Jiashu-Hu

ozankabak · 2025-03-22T09:57:01Z

This PR seems to have broken main. It is an easy fix (error messages do not match).

@alamb -- Extended tests break very frequently these days. We should prioritize completing the work on running them before merge.

alamb · 2025-03-23T19:39:27Z

> This PR seems to have broken main. It is an easy fix (error messages do not match).
>
> @alamb -- Extended tests break very frequently these days. We should prioritize completing the work on running them before merge.

@ozankabak I think there is a PR waiting on review that would let us more easily run the tests before merge. I am not sure how we would have known to run them in this case though

Triggering extended tests through PR comment: Run extended tests #15101

I feel like what would be ideal is some workflow that actually ran the tests before merge but that didn't need us to babysit the PR -- something like letting a bot merge it or a merge queue

Jiashu-Hu added 2 commits March 17, 2025 21:24

Improve groupby error message for new version of expand wildcard logi…

bfdc866

…c, also modified related CICL content

Improved error message for new expand wildcard logic, also editted re…

c24c3b5

…lated CICL test

github-actions bot added sql SQL Planner sqllogictest SQL Logic Tests (.slt) common Related to common crate labels Mar 18, 2025

improved syntax requested by CICL process

4c7f3c8

jayzhan211 reviewed Mar 18, 2025

unified and , updated test error message

42227bd

github-actions bot removed the common Related to common crate label Mar 18, 2025

alamb changed the title ~~Improvement/improve wildcard error 15004~~ Improved error for expand wildcard rule Mar 21, 2025

alamb approved these changes Mar 21, 2025

jayzhan211 merged commit c2c43ec into apache:main Mar 22, 2025
28 checks passed

alamb mentioned this pull request Mar 22, 2025

Extended tests failing on main #15359

Closed
