
Fix 220 #308

Merged: 7 commits merged into holdenk:master on Mar 5, 2020

cvaliente

@holdenk, taking up #228 again.
Comparing the two DataFrames with a full outer join does not depend on row ordering or partitioning, and it works on larger, distributed datasets.
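A minimal sketch of why a full-outer-join comparison ignores row order, written in plain Python rather than Spark so it runs without a cluster. Each side is reduced to a count per distinct row, and the union of the two key sets stands in for a full outer join on the row values. Any row whose counts differ, including a count of 0 on one side, is a mismatch. The names `full_outer_join_diff` and `assert_rows_equal` are hypothetical, rows are assumed to be hashable tuples, and values are compared exactly with no floating-point tolerance. This is an illustration of the idea, not the code merged in this PR.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple

# Hypothetical row representation: a tuple of hashable column values.
Row = Tuple[Hashable, ...]


def full_outer_join_diff(
    expected: Iterable[Row], actual: Iterable[Row]
) -> List[Tuple[Row, int, int]]:
    """Return (row, expected_count, actual_count) for every row whose
    multiplicity differs between the two inputs.

    Each side is reduced to counts per distinct row; taking the union of
    both key sets plays the role of a full outer join on the row value,
    so a row present on only one side is reported with a count of 0 on
    the other. Input order never affects the result.
    """
    expected_counts = Counter(expected)
    actual_counts = Counter(actual)
    diffs = []
    for row in set(expected_counts) | set(actual_counts):
        e = expected_counts.get(row, 0)
        a = actual_counts.get(row, 0)
        if e != a:
            diffs.append((row, e, a))
    # Sort so that failure messages are stable and readable.
    return sorted(diffs, key=lambda d: repr(d[0]))


def assert_rows_equal(expected: Iterable[Row], actual: Iterable[Row]) -> None:
    """Fail with the list of mismatching rows if the inputs differ."""
    diffs = full_outer_join_diff(expected, actual)
    assert not diffs, f"rows differ (row, expected, actual): {diffs}"


# Same rows in a different order: no differences, so this passes.
assert_rows_equal([("a", 1), ("b", 2)], [("b", 2), ("a", 1)])

print(full_outer_join_diff([("a", 1)], [("a", 2)]))
# → [(('a', 1), 1, 0), (('a', 2), 0, 1)]

# Duplicates matter: the multiplicity of each row is compared.
print(full_outer_join_diff([("a", 1), ("a", 1)], [("a", 1)]))
# → [(('a', 1), 2, 1)]
```

In Spark, an analogous approach (an assumption, not necessarily what this PR does) would group each DataFrame by all of its columns with a count and full-outer-join the two count tables, so neither DataFrame has to be collected to the driver. Rows containing nulls would only match if the join uses null-safe equality (`<=>`, or `Column.eqNullSafe` in PySpark), since a plain `=` join never matches null keys.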
I also included an option to skip the schema equality check. A common use case is loading the input and expected data from CSV files and comparing the transformed input to the expected output. When loading a CSV, Spark makes all fields nullable regardless of the schema, whereas the output of the tested code can have non-nullable fields (e.g. groupby().count() produces a non-nullable count column).
For unit tests, what usually matters is whether the data matches, not whether Spark inferred the schema correctly.
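The optional schema check can be illustrated the same way. The sketch below uses a hypothetical `Field` tuple in place of Spark's `StructField` and a hypothetical `check_nullable` flag (the parameter name in the PR may differ). When the flag is off, only column names and types have to agree. The example reproduces the CSV versus `groupby().count()` case described above; nested struct fields are not handled.

```python
from typing import List, NamedTuple


class Field(NamedTuple):
    """Hypothetical stand-in for a Spark StructField."""
    name: str
    data_type: str
    nullable: bool


def schemas_match(
    expected: List[Field], actual: List[Field], check_nullable: bool = True
) -> bool:
    """Compare two flat schemas field by field.

    With check_nullable=False only names and types must agree, so a column
    that is nullable on one side and non-nullable on the other still matches.
    """
    if len(expected) != len(actual):
        return False
    for e, a in zip(expected, actual):
        if (e.name, e.data_type) != (a.name, a.data_type):
            return False
        if check_nullable and e.nullable != a.nullable:
            return False
    return True


# Expected data read from CSV: Spark marks every column nullable.
expected_schema = [Field("key", "string", True), Field("count", "bigint", True)]
# Output of groupBy("key").count(): the count column is non-nullable.
actual_schema = [Field("key", "string", True), Field("count", "bigint", False)]

print(schemas_match(expected_schema, actual_schema))                        # False
print(schemas_match(expected_schema, actual_schema, check_nullable=False))  # True
```

Against real PySpark DataFrames, the same comparison could be run over `df.schema.fields`, whose `StructField` entries expose `name`, `dataType` and `nullable`.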

smadarasmi and others added 6 commits February 19, 2019 17:07
@cvaliente
Author

The pipeline failure seems to be unrelated.

@codecov-io

Codecov Report

Merging #308 into master will increase coverage by 7.52%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master     #308      +/-   ##
==========================================
+ Coverage   78.35%   85.87%   +7.52%     
==========================================
  Files          45       11      -34     
  Lines         388      354      -34     
  Branches       34       34              
==========================================
  Hits          304      304              
+ Misses         63       29      -34     
  Partials       21       21
Flag Coverage Δ
#python 85.87% <ø> (ø) ⬆️
#scala ?
Impacted Files Coverage Δ
...scala/com/holdenkarau/spark/testing/Prettify.scala
...ldenkarau/spark/testing/StreamingSuiteCommon.scala
.../holdenkarau/spark/testing/MLUserDefinedType.scala
...oldenkarau/spark/testing/StreamingActionBase.scala
...om/holdenkarau/spark/testing/TestInputStream.scala
...a/com/holdenkarau/spark/testing/RDDGenerator.scala
...la/com/holdenkarau/spark/testing/YARNCluster.scala
...enkarau/spark/testing/JavaDataFrameSuiteBase.scala
...holdenkarau/spark/testing/DataframeGenerator.scala
...nkarau/spark/testing/StructuredStreamingBase.scala
... and 23 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 79eef40...c9634c0.

@holdenk
Owner

holdenk commented Mar 5, 2020

Sorry for the delay in merging this; I was in a motorcycle crash. Looks good to me.

@holdenk holdenk merged commit d72d84f into holdenk:master Mar 5, 2020
4 participants