assertDataFrameEquals() Failure due to Different Row Order #220
Comments
Hi,
|
Hi @spova, imagine I have more than one column in my DataFrame, say 5 (ID, name, gender, etc.). You can indeed sort by ID, but the other columns stay unsorted, which makes the DataFrames impossible to compare reliably. You could explicitly sort on every column at once, but that is non-trivial and inelegant to write. |
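To make the point about sorting concrete, here is a minimal sketch in plain Python rather than Spark. It assumes each DataFrame has already been collected into a list of tuples and that the values in every column are mutually comparable (no nulls). The helper names `equal_sorted_by_id` and `equal_sorted_by_all_columns` are hypothetical and not part of spark-testing-base.

```python
def equal_sorted_by_id(expected, actual):
    """Sort on the first column (the ID) only, then compare."""
    by_id = lambda row: row[0]
    # Python's sort is stable, so rows sharing an ID keep their input order.
    return sorted(expected, key=by_id) == sorted(actual, key=by_id)


def equal_sorted_by_all_columns(expected, actual):
    """Sort on every column at once, then compare."""
    # Tuples compare element by element, so this orders by column 1,
    # then column 2, and so on.
    return sorted(expected) == sorted(actual)


# Same rows, different order; both rows share ID 1.
expected = [(1, "Ann", "F"), (1, "Al", "M")]
actual = [(1, "Al", "M"), (1, "Ann", "F")]

print(equal_sorted_by_id(expected, actual))
print(equal_sorted_by_all_columns(expected, actual))
```

Sorting on the ID alone reports a mismatch here because the tie between the two rows is never broken, while sorting on all columns gives both inputs one canonical order. In PySpark the analogous call would be `df.orderBy(*df.columns)`, though whether that suits a particular test depends on the column types involved.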
…le duplicates in dataframe
Hi folks, @smadarasmi any plans to PR your fix into the main repo? |
Hello. Is this going to be implemented? |
@noleto I don't have write access to this repo. |
+1 for this |
@smadarasmi you'll need to resolve the build errors on your PR #228 |
…le duplicates in dataframe
@nsutcliffe The build fails on this step: Initializing download: http://www-us.apache.org/dist/spark/spark-2.2.2/spark-2.2.2-bin-hadoop2.7.tgz It returns 404, not found. Don't think it is related to my change. |
Oh yeah, that's a good point. Spark changed its release packaging, so the older versions aren't available for download from the normal mirrors anymore, and I haven't had a chance to update the travis file to point to the new version yet. |
@smadarasmi in the meantime you could check the build is fine by updating .travis.yml: find the link to spark-2.2.2 (line 25) and replace it with: |
If you wouldn't mind updating the travis file in your PR, it would then be able to run CI and merge as normal. Otherwise I can do a quick PR to do that. |
Do you want it on 2.2.3 or 2.2.2? |
Let's do 2.2.3; it should remain on the mirrors for longer. |
Hi :D! Is there any news about this proposal? I think it's a huge feature: it adds a lot of value to the library for a very small change. |
@holdenk Does it look okay for merging? |
Can this also work for Spark 2.2? |
Hi,
I searched but I'm not sure whether this issue has been raised before.
It's common for two DataFrames to hold the same rows in a different order. Because of that, my test case sometimes fails and occasionally succeeds. Is there a better way to compare DataFrames?
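One order-insensitive way to compare is to treat each DataFrame as a multiset of rows, which also keeps duplicate rows distinct. The sketch below is plain Python, not the spark-testing-base API. It assumes the rows have been collected into hashable tuples, and the names `row_diff` and `same_rows` are hypothetical.

```python
from collections import Counter


def row_diff(expected, actual):
    """Return (missing, unexpected) row counts, ignoring row order."""
    exp, act = Counter(expected), Counter(actual)
    missing = exp - act      # rows expected more often than they appear
    unexpected = act - exp   # rows that appear more often than expected
    return missing, unexpected


def same_rows(expected, actual):
    """True when both inputs hold the same rows the same number of times."""
    missing, unexpected = row_diff(expected, actual)
    return not missing and not unexpected


expected = [(1, "Ann"), (2, "Bob"), (2, "Bob")]
shuffled = [(2, "Bob"), (1, "Ann"), (2, "Bob")]
deduped = [(1, "Ann"), (2, "Bob")]

print(same_rows(expected, shuffled))
print(same_rows(expected, deduped))
print(row_diff(expected, deduped))
```

Counting rows instead of converting them to a set means a result that silently drops a duplicate row still fails the comparison, and the returned counters show which rows differ.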