Add integration tests using example files from apache/orc #65

progval · 2024-03-10T14:26:21Z

Should I add apache/orc as a git submodule instead of copying the data files here? I figured submodules aren't worth the overhead, considering the files only weigh 24MB.

Some tests are failing; I annotated the reason where I understood it. Some (commented with // Why?) look like actual bugs.

Resolves #27

codecov · 2024-03-10T14:30:33Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.52%. Comparing base (424b021) to head (f6f615b).
Report is 45 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #65      +/-   ##
==========================================
+ Coverage   77.22%   80.52%   +3.29%     
==========================================
  Files          34       30       -4     
  Lines        3302     3106     -196     
==========================================
- Hits         2550     2501      -49     
+ Misses        752      605     -147

WenyXu

LGTM

Jefffrey · 2024-03-12T08:42:48Z

Thanks for this, I'll take a look soon 👍

Jefffrey

Could you also add a README to the tests/integration/data directory to indicate where the data came from? That makes it easier to see from the repo (instead of needing to track down the PR).

Jefffrey · 2024-03-13T09:36:22Z

tests/integration/main.rs

+        // pretty_assertions consumes too much RAM and CPU on large diffs,
+        // and it's unreadable anyway
+        assert_eq!(lines[0..1000], expected_lines[0..1000]);
+        assert!(lines == expected_lines);


I wonder if we can parse the expected JSON into Arrow first with arrow_json then compare on RecordBatches

Assuming the schema inference works in our favour 🤔

That would avoid the issue of different decimal representation.

However, it makes the tests a little unreliable, as they wouldn't detect data lost by both arrow_json and datafusion_orc. Probably not a big deal.

That's a fair point. I guess we're a bit handicapped by the expected data being in JSON form, which can make it harder for us to be rigorous. I'll raise an issue for exploring ways around this 👍
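As an illustration of the concern in the quoted hunk (full diffs of large outputs being slow and unreadable), here is a minimal sketch that reports only the first differing line. It is not code from this PR: `first_mismatch` and `assert_lines_eq` are hypothetical names, and the sketch assumes the actual and expected outputs are already split into lines. Unlike slicing with `[0..1000]`, which panics on inputs shorter than 1000 lines, it also handles inputs of any length.

```rust
/// Hypothetical helper: returns the index of the first line where `actual`
/// and `expected` differ, or `None` if they are identical. If one input is a
/// strict prefix of the other, the mismatch is reported at the index where
/// the shorter input ends.
fn first_mismatch(actual: &[&str], expected: &[&str]) -> Option<usize> {
    let common = actual.len().min(expected.len());
    (0..common)
        .find(|&i| actual[i] != expected[i])
        .or_else(|| (actual.len() != expected.len()).then_some(common))
}

/// Hypothetical helper: panics with a short message about the first
/// differing line instead of printing a diff of both inputs in full.
fn assert_lines_eq(actual: &[&str], expected: &[&str]) {
    if let Some(i) = first_mismatch(actual, expected) {
        panic!(
            "line {} differs:\n  actual:   {:?}\n  expected: {:?}",
            i + 1,          // 1-based line number for readability
            actual.get(i),  // `None` if `actual` ended early
            expected.get(i) // `None` if `expected` ended early
        );
    }
}
```

A test could call `assert_lines_eq(&lines, &expected_lines)` in place of the two assertions. This only shortens the failure output; it does not address the decimal representation issue discussed above.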

Jefffrey · 2024-03-13T09:37:20Z

tests/integration/main.rs

+#[ignore] // Why?
+fn metaData() {
+    test_expected_file("TestOrcFile.metaData");
+}
+#[test]
+#[ignore] // Why?
+fn test1() {
+    test_expected_file("TestOrcFile.test1");
+}
+#[test]
+#[should_panic] // Incorrect timezone + representation differs


Thanks for leaving these annotations, they give us a goal to work towards.

For all tests that fail, we can just #[ignore] them and leave TODO comments to address them later.

Jefffrey · 2024-03-13T11:03:08Z

Thanks for this @progval ❤️

* Add integration tests using example files from apache/orc
* s/should_panic/ignore/ and add TODO
* Add README to data files

Add integration tests using example files from apache/orc

99e6678

WenyXu approved these changes Mar 10, 2024

WenyXu requested a review from Jefffrey March 10, 2024 15:02

Jefffrey reviewed Mar 13, 2024

progval added 2 commits March 13, 2024 11:56

s/should_panic/ignore/ and add TODO

520356e

Add README to data files

f6f615b

progval force-pushed the integration-tests branch from b071eab to f6f615b March 13, 2024 10:56

Jefffrey merged commit 8e254a6 into datafusion-contrib:main Mar 13, 2024
9 checks passed

Jefffrey mentioned this pull request Mar 13, 2024

Rigorous ORC integration tests #66

Closed

waynexia pushed a commit that referenced this pull request Oct 24, 2024

Add integration tests using example files from apache/orc (#65)

4463f64

* Add integration tests using example files from apache/orc
* s/should_panic/ignore/ and add TODO
* Add README to data files
