Fix 220 #308

cvaliente · 2019-12-04T04:18:44Z

@holdenk taking up #228 again.
The full outer join does not care about ordering or partitioning and works on larger, distributed datasets.
I also included an option to skip the schema equality check. A common use case is to load input and expected data from CSV files and compare the transformed input to the expected output. When loading a CSV, Spark automatically makes all fields nullable regardless of any schema, whereas the output of the tested functionality can have non-nullable fields (e.g. groupby().count() results in a non-nullable count column).
For unit tests, what usually matters is whether the data matches, not whether Spark inferred the schema correctly.
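
The idea behind the order-insensitive comparison can be sketched without Spark. The following is a minimal illustration in plain Python, not the code in this PR: it counts identical rows on each side and then combines the two count tables over the union of their keys, which plays the role of the full outer join. The names `count_rows` and `unordered_diff` are hypothetical and exist only for this example.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple

# A row is modelled as a tuple of hashable column values (an assumption
# made for this sketch; it is not how Spark represents rows).
Row = Tuple[Hashable, ...]


def count_rows(rows: Iterable[Row]) -> Counter:
    """Group identical rows and count them, so duplicates are preserved."""
    return Counter(rows)


def unordered_diff(
    expected: Iterable[Row], actual: Iterable[Row]
) -> Dict[Row, Tuple[int, int]]:
    """Compare two row collections regardless of order.

    Returns only the rows whose counts differ, mapped to
    (expected_count, actual_count). An empty dict means the data matches.
    """
    expected_counts = count_rows(expected)
    actual_counts = count_rows(actual)
    # The key union stands in for the full outer join: a row missing on
    # one side is kept and gets a count of 0 there instead of being dropped.
    all_rows = expected_counts.keys() | actual_counts.keys()
    return {
        row: (expected_counts[row], actual_counts[row])
        for row in all_rows
        if expected_counts[row] != actual_counts[row]
    }
```

With `expected = [("a", 1), ("b", 2), ("b", 2)]`, a reordered copy of the same rows yields an empty diff, while dropping one of the duplicate `("b", 2)` rows from the actual side is reported as `{("b", 2): (2, 1)}`. Counting before joining is what lets the comparison detect a missing duplicate rather than treating the data as a set.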

cvaliente · 2019-12-04T06:02:05Z

The pipeline failure seems to be unrelated.

codecov-io · 2019-12-04T08:54:30Z

Codecov Report

Merging #308 into master will increase coverage by 7.52%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master     #308      +/-   ##
==========================================
+ Coverage   78.35%   85.87%   +7.52%     
==========================================
  Files          45       11      -34     
  Lines         388      354      -34     
  Branches       34       34              
==========================================
  Hits          304      304              
+ Misses         63       29      -34     
  Partials       21       21

| Flag | Coverage Δ | |
|---|---|---|
| #python | `85.87% <ø> (ø)` | ⬆️ |
| #scala | `?` | |

Impacted Files	Coverage Δ
...scala/com/holdenkarau/spark/testing/Prettify.scala
...ldenkarau/spark/testing/StreamingSuiteCommon.scala
.../holdenkarau/spark/testing/MLUserDefinedType.scala
...oldenkarau/spark/testing/StreamingActionBase.scala
...om/holdenkarau/spark/testing/TestInputStream.scala
...a/com/holdenkarau/spark/testing/RDDGenerator.scala
...la/com/holdenkarau/spark/testing/YARNCluster.scala
...enkarau/spark/testing/JavaDataFrameSuiteBase.scala
...holdenkarau/spark/testing/DataframeGenerator.scala
...nkarau/spark/testing/StructuredStreamingBase.scala
... and 23 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 79eef40...c9634c0.

holdenk · 2020-03-05T18:26:04Z

Sorry for my delay in merging this; I was in a motorcycle crash. Looks good to me.

smadarasmi and others added 6 commits February 19, 2019 17:07

Fixes holdenk#220 : Add dataframe comparison without order

b4c92e5

Fixes holdenk#220: modify method assertDataFrameNoOrderEquals to handle duplicates in dataframe

44e3b4d

fix comments from codacy

4b5d964

update spark download url for travis build

a577673

Merge branch 'master' into fix-220

9f6adac

# Conflicts: # .travis.yml # core/src/main/2.0/scala/com/holdenkarau/spark/testing/DataFrameSuiteBase.scala

add assertions for df equality without schema and do an equal check that doesn't care about ordering or partitioning at all.

081a505

use api instead of sql

c9634c0

holdenk approved these changes Mar 5, 2020

holdenk merged commit d72d84f into holdenk:master Mar 5, 2020
