Bespin scala enhancement - Initial Pass #2

Closed
wants to merge 20 commits into from

moore-ryan

This is the initial step of porting over more of the Bespin Java MapReduce code to Scala. It also includes the initial version of a small DSL (MapReduceSugar.scala) which allows for a more natural definition of MapReduce jobs in a type-safe manner.
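To give a rough idea of what "type-safe" can mean for a job definition, here is a minimal sketch in plain Scala. It is not the API of MapReduceSugar.scala: the names `Job`, `runLocal`, and `wordCount` are hypothetical, and the runner is an in-memory stand-in rather than a Hadoop job. The point is only that the mapper's output types and the reducer's input types share type parameters, so a mismatch is a compile error.

```scala
// Hypothetical sketch: none of these names come from MapReduceSugar.scala.
object TypedJobSketch {
  // The mapper emits (K2, V2) pairs and the reducer must consume exactly those types.
  final case class Job[K1, V1, K2, V2, K3, V3](
      mapper: (K1, V1) => Seq[(K2, V2)],
      reducer: (K2, Seq[V2]) => Seq[(K3, V3)]
  ) {
    // In-memory stand-in for a cluster run: map, group by key, then reduce.
    def runLocal(input: Seq[(K1, V1)]): Seq[(K3, V3)] = {
      val mapped = input.flatMap { case (k, v) => mapper(k, v) }
      val grouped = mapped.groupBy(_._1).toSeq
      grouped.flatMap { case (k, kvs) => reducer(k, kvs.map(_._2)) }
    }
  }

  // Word count: (offset, line) records in, (word, count) records out.
  val wordCount: Job[Long, String, String, Int, String, Int] = Job(
    mapper = (_: Long, line: String) =>
      line.split("\\s+").filter(_.nonEmpty).toSeq.map(w => (w, 1)),
    reducer = (word: String, counts: Seq[Int]) => Seq((word, counts.sum))
  )
}
```

Changing the reducer above to accept `Seq[String]` instead of `Seq[Int]` would fail to compile, which is the kind of check a typed DSL can provide before a job is ever submitted.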

This PR includes the Scala implementations of the Bigram count MapReduce programs. Output from the Scala implementation appears the same as the Java implementation, modulo some ordering differences in the "Stripes" implementation.
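For readers unfamiliar with the two bigram-count variants, the following plain-Scala sketch shows the general "pairs" and "stripes" ideas on in-memory data. It is not the code in this PR; the object and method names are made up for illustration. Comparing the results as maps ignores ordering, which is one way to check that two outputs agree modulo ordering.

```scala
// Hypothetical illustration of "pairs" vs. "stripes"; not the PR's implementation.
object BigramCountSketch {
  // Adjacent-word bigrams of one line, e.g. "a b c" -> (a,b), (b,c).
  def bigrams(line: String): Seq[(String, String)] = {
    val words = line.split("\\s+").filter(_.nonEmpty).toSeq
    words.zip(words.drop(1))
  }

  // "Pairs": one counter per (left, right) key.
  def pairs(lines: Seq[String]): Map[(String, String), Int] =
    lines.flatMap(bigrams).groupBy(identity).map { case (k, v) => (k, v.size) }

  // "Stripes": one map of right-word counts per left word.
  def stripes(lines: Seq[String]): Map[String, Map[String, Int]] =
    lines.flatMap(bigrams).groupBy(_._1).map { case (left, bs) =>
      (left, bs.groupBy(_._2).map { case (right, rs) => (right, rs.size) })
    }

  // Flatten stripes back into pair counts so the two outputs can be compared.
  def flatten(s: Map[String, Map[String, Int]]): Map[(String, String), Int] =
    s.flatMap { case (left, stripe) =>
      stripe.map { case (right, n) => ((left, right), n) }
    }
}
```

With this sketch, `pairs(lines) == flatten(stripes(lines))` holds for any input, regardless of the order in which either variant emits its records.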

@moore-ryan moore-ryan force-pushed the bespin-scala-enhancement branch from 4f7601d to 7e1b2c9 Compare March 29, 2016 15:06
…source files from the internet before running
This should make the code a bit more modular and simple. It also solves
some of the problems with the previous implementation having trouble
with things such as running a simple partition job with no mapper or
reducer being set.
Addition of more integration tests and some unit tests for Hadoop<->Scala conversions
@moore-ryan
Author

This PR is for progress tracking only; I will close this PR, rebase, and open a new PR in order to avoid cluttering commit history later.

Also added in an implicit conversion between String -> hadoop Path to
make specification of inputs and outputs more like it is in Spark.
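A conversion of this kind might look roughly like the sketch below. To keep it compilable without Hadoop on the classpath, `Path` here is a local stand-in case class; in the real project it would presumably be `org.apache.hadoop.fs.Path`, whose constructor accepts a `String`. The names `stringToPath` and `describe` are hypothetical and are not taken from this PR.

```scala
import scala.language.implicitConversions

// Hypothetical sketch; Path is a stand-in for org.apache.hadoop.fs.Path.
object PathConversionSketch {
  final case class Path(value: String)

  // The implicit conversion: wherever a Path is expected, a String is accepted.
  implicit def stringToPath(s: String): Path = Path(s)

  // Hypothetical job-setup helper that takes Paths, not Strings.
  def describe(input: Path, output: Path): String =
    s"${input.value} -> ${output.value}"

  // Plain string literals are converted to Path by the compiler.
  val summary: String = describe("data/input.txt", "out/bigrams")
}
```

The trade-off is the usual one with implicit conversions: call sites read more like Spark, but the conversion is invisible at the call site, so keeping it in a narrowly imported object limits surprises.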

Removed unused "compareJavaScala.py" file.
@moore-ryan moore-ryan force-pushed the bespin-scala-enhancement branch from ed70182 to 494b2cb Compare April 10, 2016 03:09
Added integration tests for BFS, refactored other integration tests
…ug in scala PageRank related to strange iterator behavior.
Fixed bug in RunPageRankSchimmy that caused failures in local mode of IMC due to mapper reuse.
Added integration tests for PageRank and PageRankSchimmy.
@moore-ryan
Author

Closing this PR in favor of a new PR, which is simply a squashed version of these commits. This should cut back on the commit history growth.

@moore-ryan moore-ryan closed this Apr 16, 2016