Fix for issue #6595 has broken existing working queries ("Schema error: No field named ...") #6897

Closed
maxburke opened this issue Jul 9, 2023 · 3 comments

maxburke commented Jul 9, 2023

Describe the bug

After upgrading to DataFusion 27.0.0, we noticed that some of our regression tests were failing. We bisected the break to commit 36123ee, which is the fix for #6595.

To Reproduce

The attached zip file contains a parquet file. To reproduce the issue in datafusion-cli, run:

> create external table t0 stored as parquet location 'test_data.parquet';
> SELECT  "day"  AS  "date", count(distinct "direction")  AS  "num_directions" FROM t0  GROUP BY "day" ORDER BY t0."day" ASC;

In DataFusion 26 and earlier, this query returns a result. In DataFusion 27, it fails with this error message:

Optimizer rule 'push_down_projection' failed
caused by
Error during planning: required columns can't push down, columns: {Column { relation: Some(Bare { table: "t0" }), name: "day" }, Column { relation: None, name: "num_directions" }, Column { relation: None, name: "date" }}

test_data.zip

maxburke added the "bug" (Something isn't working) label on Jul 9, 2023

alamb commented Jul 10, 2023

I believe this is similar to #6790, which has since been fixed in #6827. I will verify that this is the same issue (we saw a similar error in IOx).

alamb self-assigned this on Jul 10, 2023

alamb commented Jul 10, 2023

I have verified this has been fixed on master (aka what will be released in DataFusion 28.0.0).

BTW I added new test coverage in #6836 so that we don't break this again by accident.

Since this is a regression, I would be willing to create a patch release (27.0.1) with the fix if that would be helpful for others.

Using this query (thanks for the reproducer @maxburke 🙏):

SELECT 
 "day"  AS  "date", count(distinct "direction")  AS  "num_directions" 
FROM 'test_data.parquet' 
GROUP BY "day" 
ORDER BY "day" ASC;

26.0.0 works

DataFusion CLI v26.0.0
❯ SELECT  "day"  AS  "date", count(distinct "direction")  AS  "num_directions" FROM 'test_data.parquet'  GROUP BY "day" ORDER BY "day" ASC;
+---------------------+----------------+
| date                | num_directions |
+---------------------+----------------+
| 2011-09-09T00:00:00 | 2              |
| 2011-09-10T00:00:00 | 2              |
...
| 2018-04-14T00:00:00 | 2              |
| 2018-04-15T00:00:00 | 2              |
+---------------------+----------------+
81 rows in set. Query took 0.024 seconds.
❯

27.0.0 fails

DataFusion CLI v27.0.0
❯ SELECT  "day"  AS  "date", count(distinct "direction")  AS  "num_directions" FROM 'test_data.parquet'  GROUP BY "day" ORDER BY "day" ASC;
Optimizer rule 'simplify_expressions' failed
caused by
Schema error: No field named "test_data.parquet".day. Valid fields are "test_data.parquet.day", "COUNT(DISTINCT test_data.parquet.direction)".
❯

main passes:

$ git checkout main
Already on 'main'
Your branch is up to date with 'apache/main'.
$ CARGO_TARGET_DIR=/Users/alamb/Software/target-df cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.27s
     Running `/Users/alamb/Software/target-df/debug/datafusion-cli`
DataFusion CLI v27.0.0
❯ SELECT  "day"  AS  "date", count(distinct "direction")  AS  "num_directions" FROM 'test_data.parquet'  GROUP BY "day" ORDER BY "day" ASC;
+---------------------+----------------+
| date                | num_directions |
+---------------------+----------------+
| 2011-09-09T00:00:00 | 2              |
| 2011-09-10T00:00:00 | 2              |
...
| 2018-04-14T00:00:00 | 2              |
| 2018-04-15T00:00:00 | 2              |
+---------------------+----------------+
81 rows in set. Query took 0.027 seconds.

alamb commented Jul 10, 2023

cc @jackwener

alamb changed the title from "Fix for issue #6595 has broken existing working queries" to "Fix for issue #6595 has broken existing working queries ("Schema error: No field named ...")" on Jul 10, 2023
alamb closed this as completed on Jul 10, 2023