Add integration tests using example files from apache/orc #65
Conversation
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff             @@
##              main      #65      +/-  ##
==========================================
+ Coverage    77.22%   80.52%   +3.29%
==========================================
  Files           34       30       -4
  Lines         3302     3106     -196
==========================================
- Hits          2550     2501      -49
+ Misses         752      605     -147
LGTM
Thanks for this, I'll take a look soon 👍
Could you also add a README to the tests/integration/data directory, just to indicate where the data came from? That makes it easier to check from the repo itself (instead of needing to track down this PR).
// pretty_assertions consumes too much RAM and CPU on large diffs,
// and it's unreadable anyway.
// Compare a fixed prefix with assert_eq! so a mismatch still prints a usable diff,
// then check full equality with a plain assert! to avoid dumping the whole dataset.
assert_eq!(lines[0..1000], expected_lines[0..1000]);
assert!(lines == expected_lines);
I wonder if we can parse the expected JSON into Arrow first with arrow_json then compare on RecordBatches
Assuming the schema inference works in our favour 🤔
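Something along these lines could work (just a rough sketch against a recent arrow-rs, assuming the expected files are newline-delimited JSON and the inferred schema lines up with ours; the helper name and usage are made up):

use std::fs::File;
use std::io::BufReader;
use std::sync::Arc;

use arrow::record_batch::RecordBatch;
use arrow_json::reader::{infer_json_schema_from_seekable, ReaderBuilder};

// Hypothetical helper: read an expected newline-delimited JSON file into RecordBatches.
fn read_expected_json(path: &str) -> Vec<RecordBatch> {
    let mut file = BufReader::new(File::open(path).unwrap());
    // Infer the schema from the JSON lines; the reader is seeked back to the start afterwards.
    let (schema, _) = infer_json_schema_from_seekable(&mut file, None).unwrap();
    let reader = ReaderBuilder::new(Arc::new(schema)).build(file).unwrap();
    reader.collect::<Result<Vec<_>, _>>().unwrap()
}

// The test would then compare RecordBatches directly, e.g.
// assert_eq!(orc_batches, read_expected_json(expected_path));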
That would avoid the issue of different decimal representations. However, it makes the tests a little less reliable, as they wouldn't detect data lost by both arrow_json and datafusion_orc. Probably not a big deal.
That's a fair point. I guess we're a bit handicapped by the expected data being in JSON form, which can make it harder for us to be rigorous. I'll raise an issue to explore ways around this 👍
tests/integration/main.rs (outdated)
#[ignore] // Why?
fn metaData() {
    test_expected_file("TestOrcFile.metaData");
}
#[test]
#[ignore] // Why?
fn test1() {
    test_expected_file("TestOrcFile.test1");
}
#[test]
#[should_panic] // Incorrect timezone + representation differs
Thanks for leaving these annotations, it gives us a goal to work towards.
For all tests that fail we can just #[ignore] them and leave TODO comments to address them later.
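For example (purely illustrative; the TODO wording is a placeholder):

#[test]
#[ignore] // TODO: investigate why this file fails and re-enable
fn test1() {
    test_expected_file("TestOrcFile.test1");
}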
Thanks for this @progval ❤️
* Add integration tests using example files from apache/orc
* s/should_panic/ignore/ and add TODO
* Add README to data files
Should I use apache/orc as a git submodule instead of copying data files here? I figured it's not worth the overhead of submodules, considering the files only weigh 24MB.

Some tests are failing; I tried to annotate the reason when I understood it. Some (commented with // Why?) look like actual bugs.

Resolves #27