Rigorous ORC integration tests #66

Closed
Jefffrey opened this issue Mar 13, 2024 · 2 comments

@Jefffrey
Collaborator

Integration tests were added in #65.

However, we have to compare actual vs. expected data in JSON format, since that is how the expected data is encoded in the Apache ORC repo.

An alternative would be to use the pyarrow/arrow ORC implementation to write the expected data out as Parquet or Arrow IPC (Feather) files, which allows a more rigorous comparison than JSON.

We lose some visibility into the expected data, but since these are integration tests using data from the Apache ORC repo, the expected files would rarely change (if at all) anyway.

@Jefffrey
Collaborator Author

Almost done, just want to optimize this code:

// TODO: better way of checking equality? this step is slow for zlib
let formatted_actual = pretty::pretty_format_batches(&actual_batches)
    .unwrap()
    .to_string();
let actual_lines = formatted_actual.trim().lines().collect::<Vec<_>>();
let formatted_expected = pretty::pretty_format_batches(&expected_batches)
    .unwrap()
    .to_string();
let expected_lines = formatted_expected.trim().lines().collect::<Vec<_>>();
// TODO: Also test schema? Ignore nullability however?
assert_eq!(actual_lines, expected_lines);

The string formatting step is a major slowdown for the zlib test.

@Jefffrey
Collaborator Author

0405e23

This commit concatenates the Vec of RecordBatches into a single RecordBatch on each side, so actual and expected can be compared directly.
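To illustrate why concatenation helps, here is a minimal std-only sketch. It is not the code from the commit: `Batch` and `concat` are hypothetical stand-ins for an Arrow RecordBatch and for a concatenation helper (the arrow crate provides `arrow::compute::concat_batches` for real RecordBatches; whether the commit uses it is not stated in this thread). The point is that the same rows can arrive split at different batch boundaries, and merging each side first reduces the check to one value-level equality with no string formatting.

```rust
// Toy stand-in for an Arrow RecordBatch: a single nullable integer column.
// `Batch` and `concat` are hypothetical names used only for illustration.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    values: Vec<Option<i64>>,
}

/// Concatenate any number of batches into one batch, preserving row order.
fn concat(batches: &[Batch]) -> Batch {
    let values: Vec<Option<i64>> = batches
        .iter()
        .flat_map(|b| b.values.iter().copied())
        .collect();
    Batch { values }
}

fn main() {
    // The same five rows, split at different batch boundaries.
    let actual = vec![
        Batch { values: vec![Some(1), Some(2)] },
        Batch { values: vec![None, Some(4), Some(5)] },
    ];
    let expected = vec![Batch {
        values: vec![Some(1), Some(2), None, Some(4), Some(5)],
    }];

    // Comparing the lists of batches directly fails on chunking alone.
    assert_ne!(actual, expected);

    // After concatenation a single value-level comparison suffices.
    assert_eq!(concat(&actual), concat(&expected));
}
```

With real RecordBatches the same shape should apply: concatenate each side against its schema, then compare the two resulting batches with `assert_eq!`.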

I had to disable 2 other tests due to some schema issues, but will work on those separately. Closing this issue.
