Rigorous ORC integration tests #66
Comments
Almost done, just want to optimize this code: datafusion-orc/tests/integration/main.rs Lines 27 to 38 in fd23fdb
Because it is a major slowdown for the |
This commit concatenates the Vec of RecordBatches into a single RecordBatch for easier comparison. I had to disable 2 other tests due to some schema issues, but will work on that separately. Closing this issue.
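For readers following along, here is a minimal sketch of why concatenating first makes the comparison simpler. It is not the code from the commit: `Batch`, `concat_rows`, and `same_rows` are hypothetical stand-ins built on plain vectors, assuming a single nullable integer column. With real Arrow data, the corresponding step would presumably be `arrow::compute::concat_batches`.

```rust
/// Hypothetical stand-in for an Arrow RecordBatch:
/// a single column of nullable integers.
type Batch = Vec<Option<i64>>;

/// Flatten a sequence of batches into one batch, preserving row order.
fn concat_rows(batches: &[Batch]) -> Batch {
    batches.iter().flatten().copied().collect()
}

/// Compare two batch sequences by row content only, so that differing
/// batch boundaries between reader and expected data do not matter.
fn same_rows(actual: &[Batch], expected: &[Batch]) -> bool {
    concat_rows(actual) == concat_rows(expected)
}
```

Two readers can split the same rows into batches at different points, so a batch-by-batch comparison may fail even when the data is identical. Comparing the concatenated result avoids that.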
Integration tests were added by #65.
However, we have to compare actual vs. expected data in JSON format, since that is how the expected data is encoded in the Apache ORC repo.
An alternative could be to use the pyarrow/Arrow ORC implementation to convert the expected files into Parquet or Arrow IPC files, which would allow a more rigorous comparison than JSON.
We would lose some visibility into the expected data, but since these are integration tests with data from the Apache ORC repo, the files wouldn't change often (if at all) anyway.