
Error after pre-release arrow upgrade: "out of order projection is not supported" (NOT FOR MERGING) #2530


Closed (proposed merging 7 commits)


@alamb (Contributor) commented May 13, 2022

This PR demonstrates tests that fail with the arrow code from apache/arrow-rs#1682, which is not included in arrow 14.0.0.

This PR pins DataFusion to the arrow revision immediately after apache/arrow-rs#1682 was merged, commit apache/arrow-rs@5b154ea.

To reproduce:

```shell
cargo test -p datafusion --lib
```

Results in:

```
failures:

---- physical_plan::file_format::parquet::tests::evolved_schema_filter stdout ----
thread 'physical_plan::file_format::parquet::tests::evolved_schema_filter' panicked at 'called `Result::unwrap()` on an `Err` value: ArrowError(ExternalError(ParquetError(General("out of order projection is not supported"))))', datafusion/core/src/physical_plan/file_format/parquet.rs:968:14

---- physical_plan::file_format::parquet::tests::evolved_schema_inconsistent_order stdout ----
thread 'physical_plan::file_format::parquet::tests::evolved_schema_inconsistent_order' panicked at 'called `Result::unwrap()` on an `Err` value: ArrowError(ExternalError(ParquetError(General("out of order projection is not supported"))))', datafusion/core/src/physical_plan/file_format/parquet.rs:819:14


failures:
    physical_plan::file_format::parquet::tests::evolved_schema_filter
    physical_plan::file_format::parquet::tests::evolved_schema_inconsistent_order

test result: FAILED. 656 passed; 2 failed; 1 ignored; 0 measured; 0 filtered out; finished in 2.00s

error: test failed, to rerun pass '-p datafusion --lib'
Error: Process completed with exit code 101.
```
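To make the error concrete, here is a minimal sketch of the kind of validation that would produce this message. It is not the arrow-rs implementation: it assumes a projection is a list of column indices into the file schema and that the reader accepts only strictly increasing (file-order) indices. `check_projection` is a hypothetical name. Judging only by their names, the two `evolved_schema_*` tests plausibly request columns in an order that differs from the file's, which a check like this would reject.

```rust
/// Hypothetical stand-in for a reader-side projection check (not arrow-rs code).
/// A projection is a list of column indices into the file schema; this sketch
/// only accepts indices that are strictly increasing, i.e. already in file order.
fn check_projection(projection: &[usize]) -> Result<(), String> {
    // `windows(2)` yields each adjacent pair; any pair that is not strictly
    // increasing means the caller asked for columns out of file order
    // (or asked for the same column twice).
    if projection.windows(2).all(|w| w[0] < w[1]) {
        Ok(())
    } else {
        Err("out of order projection is not supported".to_string())
    }
}

fn main() {
    // Columns requested in file order are accepted.
    println!("{:?}", check_projection(&[0, 2])); // Ok(())
    // Columns requested in a different order are rejected.
    println!("{:?}", check_projection(&[2, 0])); // Err("out of order projection is not supported")
}
```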

```diff
@@ -38,3 +38,8 @@ exclude = ["ballista-cli", "datafusion-cli"]
 [profile.release]
 codegen-units = 1
 lto = true
+
+[patch.crates-io]
+arrow = { git = "https://github.com/apache/arrow-rs.git", rev="5b154ea40314dc2f09babbb363bf7f1fe439d4eb" }
```
@alamb (Contributor, Author) commented on the diff:

Right after apache/arrow-rs#1682 was merged.

@tustvold (Contributor) commented:

Likely related to #2453: the DataFusion logic for handling column projection when reading parquet is currently silently broken, and probably only works because of the schema adapter logic.

@alamb (Contributor, Author) commented May 13, 2022

🤔 I suppose we'll have to fix DataFusion then...

@alamb (Contributor, Author) commented May 13, 2022

> the DataFusion logic for handling column projection to parquet is currently silently broken and likely only working because of the schema adapter logic

I don't understand how things can be broken but also working...

@tustvold (Contributor) commented:

I've not had time to look properly yet, but my suspicion is that the schema adapter logic knows what the expected output schema is and rearranges the columns, masking the fact that what the parquet reader returned did not respect the projection order.
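A simplified, std-only sketch of the masking effect suspected above. It assumes, hypothetically, that the old reader returned the projected columns in file order regardless of the requested order, and that the schema adapter looks columns up by name. `Batch`, `read_in_file_order`, and `adapt_to_schema` are illustrative names, not DataFusion or arrow-rs APIs.

```rust
/// Hypothetical, simplified record batch: named columns in a fixed order.
type Batch = Vec<(String, Vec<i64>)>;

/// Sketch of the suspected old reader behavior: keep only the projected
/// columns, but return them in file order, ignoring the order of `projection`.
fn read_in_file_order(file: &Batch, projection: &[usize]) -> Batch {
    file.iter()
        .enumerate()
        .filter(|(i, _)| projection.contains(i))
        .map(|(_, col)| col.clone())
        .collect()
}

/// Sketch of a schema-adapter step: look each expected column up by name,
/// so the output follows `expected` no matter how `batch` was ordered.
fn adapt_to_schema(batch: &Batch, expected: &[&str]) -> Batch {
    expected
        .iter()
        .map(|name| {
            let (_, data) = batch
                .iter()
                .find(|(n, _)| n == name)
                .expect("column missing from batch");
            (name.to_string(), data.clone())
        })
        .collect()
}

fn main() {
    // A file whose physical column order is a, b, c.
    let file: Batch = vec![
        ("a".to_string(), vec![1, 2]),
        ("b".to_string(), vec![3, 4]),
        ("c".to_string(), vec![5, 6]),
    ];
    // The query wants `c` then `a`, i.e. projection indices [2, 0].
    let raw = read_in_file_order(&file, &[2, 0]);
    let names: Vec<&str> = raw.iter().map(|(n, _)| n.as_str()).collect();
    println!("reader returned: {:?}", names); // ["a", "c"]
    // Name-based adaptation restores the requested order.
    let adapted = adapt_to_schema(&raw, &["c", "a"]);
    let names: Vec<&str> = adapted.iter().map(|(n, _)| n.as_str()).collect();
    println!("after adapter:   {:?}", names); // ["c", "a"]
}
```

Because the adapter matches columns by name, the reader's ordering mistake never reaches the caller. Under these assumptions, that is how projection handling can be broken yet still produce correct results, until a stricter reader rejects the out-of-order request outright.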

@alamb (Contributor, Author) commented May 16, 2022

Filed #2543 to track this.

@alamb mentioned this pull request May 27, 2022
@tustvold closed this May 27, 2022
@alamb deleted the alamb/arrow_issue branch August 8, 2023 20:12
Labels: datafusion (Changes in the datafusion crate)

2 participants