November 24, 2013
- Upgraded to JDK7
- Upgraded to CDH 4.4.0
- Fixed minor integration/regression issues resulting from the upgrade to JDK7
July 6, 2013
- Upgraded to CDH 4.3.0
June 17, 2013
- Refactored BFS code and added integration tests
- Created wikipedia.graph package for Wikipedia graph manipulation classes
- Added class for extracting anchor text from Wikipedia graph
June 9, 2013
- Improved Wikipedia collection handling
- Updated Wikipedia collection APIs to new Hadoop API (org.apache.hadoop.mapreduce)
April 30, 2013
- Upgraded to CDH 4.2.1
- Switched dependencies on dsiutils, sux4j, fastutil, and spymemcached from local jars to Maven artifacts
March 10, 2013
- Added demo for learning univariate Gaussian mixture models
- Upgraded to CDH 4.2.0
February 24, 2013
- Added ability to read index and collection from HDFS in solutions to inverted indexing and Boolean retrieval exercises
February 17, 2013
- More efficient comparators in Writable pairs
- Refactored PageRank implementations, added proper arg parsing
- Revamped documentation on reference implementations
- Fixed broken integration tests
February 6, 2013
- Refactored cooccurrence matrix example, stripes implementation
- Added integration test for webgraph
February 1, 2013
- Fixed bugs in Wikipedia classes
January 31, 2013
- Refactored cooccurrence matrix examples: pairs and stripes
January 27, 2013
- Fixed previously corrupt pom.xml
January 25, 2013
- Updates to Wikipedia code
January 22, 2013
- Updated documentation to use Bootstrap
- Updated MapReduce exercises
- Created word count tutorial
January 14, 2013
- Code fixes to Hooka
- YARN-related fix to CombineSequenceFiles
December 31, 2012
- Fixed disambiguation page identification in Wikipedia
- Added a few integration tests for sample exercises
December 25, 2012
- Fixed bug with disambiguation page detection in Wikipedia
December 20, 2012
- Fixed broken code for working with MEDLINE collection
December 16, 2012
- Upgraded to YARN (CDH 4.1.2)