Bespin scala enhancement - Initial Pass #2

moore-ryan · 2016-01-29T14:57:14Z

This is the initial step of porting over more of the Bespin Java MapReduce code to Scala. It also includes the initial version of a small DSL (MapReduceSugar.scala) which allows for a more natural definition of MapReduce jobs in a type-safe manner.
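As a rough illustration of what "type-safe" can mean for a job definition, here is a minimal in-memory sketch. It is not the API in MapReduceSugar.scala: `TypedMapper`, `runLocal`, `Tokenize`, `Sum`, and `wordCount` are hypothetical names, and the `TypedReducer` trait here is a stand-in rather than the one in this PR. The point is only that the mapper's output types and the reducer's input types share type parameters, so a mismatched pairing fails to compile.

```scala
object TypedJobSketch {
  // Hypothetical stand-ins; not the traits defined in MapReduceSugar.scala.
  trait TypedMapper[KI, VI, KO, VO] {
    def map(key: KI, value: VI): Seq[(KO, VO)]
  }
  trait TypedReducer[KI, VI, KO, VO] {
    def reduce(key: KI, values: Iterable[VI]): Seq[(KO, VO)]
  }

  // K2/V2 appear in both the mapper and the reducer type, so the compiler
  // rejects a reducer whose input types differ from the mapper's output types.
  def runLocal[K1, V1, K2, V2, K3, V3](
      input: Seq[(K1, V1)],
      mapper: TypedMapper[K1, V1, K2, V2],
      reducer: TypedReducer[K2, V2, K3, V3]): Seq[(K3, V3)] = {
    val mapped = input.flatMap { case (k, v) => mapper.map(k, v) }
    // In-memory stand-in for the shuffle: group the mapped values by key.
    val grouped = mapped.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
    grouped.toSeq.flatMap { case (k, vs) => reducer.reduce(k, vs) }
  }

  // Emits (word, 1) for every whitespace-separated token in a line.
  object Tokenize extends TypedMapper[Long, String, String, Int] {
    def map(key: Long, value: String): Seq[(String, Int)] =
      value.split("\\s+").filter(_.nonEmpty).map(w => (w, 1)).toSeq
  }

  // Sums the counts collected for one word.
  object Sum extends TypedReducer[String, Int, String, Int] {
    def reduce(key: String, values: Iterable[Int]): Seq[(String, Int)] =
      Seq((key, values.sum))
  }

  def wordCount(lines: Seq[String]): Map[String, Int] = {
    // Use the line index as a stand-in for the byte offset key.
    val input: Seq[(Long, String)] = lines.zipWithIndex.map { case (l, i) => (i.toLong, l) }
    runLocal(input, Tokenize, Sum).toMap
  }
}
```

Swapping `Sum` for a reducer declared over, say, `TypedReducer[Int, Int, Int, Int]` would be a compile error in this sketch, which is the kind of check a typed DSL can give.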

This PR includes the Scala implementations of the Bigram count MapReduce programs. Output from the Scala implementation appears to match that of the Java implementation, modulo some ordering differences in the "Stripes" implementation.
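To show why the two formulations can agree up to ordering, here is a small in-memory sketch in plain Scala collections. It does not use Hadoop or the code in this PR, and every name in it (`BigramSketch`, `pairsCounts`, `stripesCounts`, `flatten`) is made up for illustration. "Pairs" keeps one counter per (left, right) key, while "stripes" keeps one map of right-word counts per left word; flattening the stripes gives the same counts as the pairs.

```scala
object BigramSketch {
  // Adjacent token pairs in one line of text.
  def bigrams(line: String): Seq[(String, String)] = {
    val tokens = line.trim.split("\\s+").filter(_.nonEmpty).toSeq
    tokens.zip(tokens.drop(1))
  }

  // "Pairs" formulation: one counter per (left, right) key.
  def pairsCounts(lines: Seq[String]): Map[(String, String), Int] =
    lines.flatMap(bigrams).groupBy(identity).map { case (bg, occ) => bg -> occ.size }

  // "Stripes" formulation: one map of right-word counts per left word.
  def stripesCounts(lines: Seq[String]): Map[String, Map[String, Int]] =
    lines.flatMap(bigrams).groupBy(_._1).map { case (left, occ) =>
      left -> occ.groupBy(_._2).map { case (right, o) => right -> o.size }
    }

  // Flatten stripes back into pair form so the two outputs can be
  // compared independently of the order in which records were emitted.
  def flatten(stripes: Map[String, Map[String, Int]]): Map[(String, String), Int] =
    for ((left, stripe) <- stripes; (right, n) <- stripe) yield ((left, right), n)
}
```

Comparing `pairsCounts(lines)` with `flatten(stripesCounts(lines))` as maps ignores record order, which is the sense in which the outputs match.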

This should make the code a bit more modular and simpler. It also solves some problems in the previous implementation, such as trouble running a simple partition job with no mapper or reducer set. Also added more integration tests and some unit tests for Hadoop<->Scala conversions.
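A minimal sketch of the general idea of optional stages, assuming an in-memory model rather than the real job classes: `JobSpec`, `run`, and the `Mapper`/`Reducer` aliases are hypothetical names, not taken from this PR. Each stage is an `Option`, and a missing stage simply passes records through, so a partition-only job needs no placeholder objects.

```scala
object OptionalStagesSketch {
  type Mapper[K, V] = (K, V) => Seq[(K, V)]
  type Reducer[K, V] = (K, Iterable[V]) => Seq[(K, V)]

  // Hypothetical job description: both stages are optional.
  final case class JobSpec[K, V](
      mapper: Option[Mapper[K, V]] = None,
      reducer: Option[Reducer[K, V]] = None,
      partitions: Int = 1)

  // Runs in memory and returns one record sequence per partition.
  // A missing stage falls back to passing records through unchanged.
  def run[K, V](spec: JobSpec[K, V], input: Seq[(K, V)]): Vector[Seq[(K, V)]] = {
    val mapped = spec.mapper.fold(input)(m => input.flatMap { case (k, v) => m(k, v) })
    // Hash-partition by key; floorMod keeps the index non-negative.
    val byPartition = mapped.groupBy { case (k, _) => Math.floorMod(k.hashCode, spec.partitions) }
    Vector.tabulate(spec.partitions) { p =>
      val records = byPartition.getOrElse(p, Seq.empty)
      spec.reducer.fold(records) { r =>
        records.groupBy(_._1).toSeq.flatMap { case (k, kvs) => r(k, kvs.map(_._2)) }
      }
    }
  }
}
```

With this shape, `JobSpec[Int, String](partitions = 2)` describes a job that only partitions its input, with neither a mapper nor a reducer set.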

moore-ryan · 2016-04-08T16:52:28Z

This PR is for progress tracking only; I will close this PR, rebase, and open a new PR in order to avoid cluttering commit history later.

moore-ryan · 2016-04-16T16:30:51Z

Closing this PR in favor of a new PR, which is simply a squashed version of the commits here. This should cut back on the commit history growth.

moore-ryan added 12 commits January 29, 2016 00:11

Initial commit of MapReduce syntactic sugar and Bigram implementations

bad89bc

Moved MapReduce-specific utilities into io.bespin.scala.mapreduce.util

b8c8a97

Small cleanup and comments

fd5dd9c

Added more comments, renamed some traits for clarity

994e955

Updated syntax for readability and conciseness

48196a4

Added compare script, changed WordCount to use new style

b41b3c8

Added ports of CooccurrenceMatrix stripes/pairs. Still need to add more comments and make things a bit more idiomatic (the window calculations are not very 'scala-y' right now)

9c320e6

Added block comments and changed the sliding window code to be a bit more scala-idiomatic. Also added ported version of search code

637382c

Changed signature of TypedReducer's reduce method to use scala iterables and remove the need for explicit conversion from java

0472eaf

Added in implicit conversion coverage for pair datatypes

5b07e5e

Beginning of unit tests; Beginning work on PageRank scala MR implementation - Basic page rank results match

a46e4d2

Merge remote-tracking branch 'origin/master' into bespin-scala-enhancement

d9131d4
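For the commit above that changed TypedReducer's reduce method to take Scala iterables, here is a hedged sketch of how such a bridge can look. Hadoop's `Reducer.reduce` receives its values as a `java.lang.Iterable`; the class below is a stand-in that does not depend on Hadoop, and `ScalaFacingReducer`, `reduceFromJava`, and `SumReducer` are hypothetical names, not the ones in this PR. The base class does the conversion once so that subclasses only ever see a Scala `Iterable`.

```scala
// On Scala 2.13+ the non-deprecated equivalent is scala.jdk.CollectionConverters._
import scala.collection.JavaConverters._

object IterableBridgeSketch {
  // Hypothetical stand-in for a typed reducer base class; not the PR's TypedReducer.
  abstract class ScalaFacingReducer[K, V, R] {
    // What subclasses implement: values arrive as a Scala Iterable.
    def reduce(key: K, values: Iterable[V]): R

    // What a Java-style framework would call: values arrive as java.lang.Iterable.
    // asScala wraps the Java iterable lazily; it does not copy the values.
    final def reduceFromJava(key: K, values: java.lang.Iterable[V]): R =
      reduce(key, values.asScala)
  }

  // A subclass needs no Java conversion code of its own.
  object SumReducer extends ScalaFacingReducer[String, Int, Int] {
    def reduce(key: String, values: Iterable[Int]): Int = values.sum
  }
}
```

Because the wrapper is lazy, it still traverses the underlying Java iterator, so any single-pass behavior of that iterator carries over to the Scala view.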

moore-ryan force-pushed the bespin-scala-enhancement branch from 4f7601d to 7e1b2c9 Compare March 29, 2016 15:06

moore-ryan added 3 commits March 29, 2016 16:44

Added in more integration tests; integration tests now pull required source files from the internet before running

178c4e6

Removed nullMapper/nullReducer objects in favor of having Optional mappers and reducers

15fdced

Main BFS classes implemented and results match.

494b2cb

Also added in an implicit conversion between String -> hadoop Path to make specification of inputs and outputs more like it is in Spark. Removed unused "compareJavaScala.py" file.
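A minimal sketch of what a String -> Path implicit conversion looks like in Scala. To keep it runnable without Hadoop on the classpath, `Path` here is a local stand-in for `org.apache.hadoop.fs.Path`; `stringToPath`, `JobIO`, and the example locations are hypothetical and not taken from this PR.

```scala
import scala.language.implicitConversions

object PathConversionSketch {
  // Stand-in for org.apache.hadoop.fs.Path so the sketch runs without Hadoop.
  final case class Path(location: String)

  // With this in scope, a String is accepted wherever a Path is expected.
  implicit def stringToPath(s: String): Path = Path(s)

  // Hypothetical job description; the field names are illustrative only.
  final case class JobIO(input: Path, output: Path)

  // Plain strings are converted to Path by the compiler at these two call sites.
  val io: JobIO = JobIO(input = "data/input.txt", output = "out/wc")
}
```

The trade-off is the usual one for implicit conversions: call sites get shorter, but the conversion is invisible at the point of use, so it is best kept to a narrow scope.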

moore-ryan force-pushed the bespin-scala-enhancement branch from ed70182 to 494b2cb Compare April 10, 2016 03:09

moore-ryan added 4 commits April 10, 2016 19:23

Created proper maven target for integration tests.

8c35401

Added integration tests for BFS, refactored other integration tests

Added integration tests for non-schimmy versions of PageRank. Fixed bug in scala PageRank related to strange iterator behavior.

08e840c

Added test fixture and integration tests for search/boolean retrieval

332ae73

Full application parity between scala and java versions.

778a5a3

Fixed bug in RunPageRankSchimmy that caused failures in local mode of IMC due to mapper reuse. Added integration tests for PageRank and PageRankSchimmy.

moore-ryan closed this Apr 16, 2016
