@Copyright 2013-2017 Indiana University, Apache License 2.0
@Author: Bingjing Zhang
Harp is a framework for machine learning applications.
- A Hadoop plugin. It currently supports Hadoop versions 2.6.0 to 2.7.3.
- Hierarchical data abstraction (arrays/objects, partitions/tables)
- Pool-based memory management
- Collective + event-driven programming model (distributed computing)
- Dynamic Scheduler + Static Scheduler (multi-threading)
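The hierarchical data abstraction listed above can be pictured with a small, self-contained Java sketch. The classes `TableSketch`, `Partition` and `Table` below are hypothetical illustrations, not Harp's own API. A table holds partitions keyed by an integer ID, and each partition wraps a primitive array. Adding a partition whose ID already exists combines the two arrays. The element-wise sum used here is an assumed combining rule.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of a two-level hierarchy: a table holds partitions
 * keyed by integer ID, and each partition wraps a primitive double array.
 * Illustrative only; these are not Harp's own classes.
 */
public class TableSketch {

    /** A partition: an ID plus the array it owns. */
    static final class Partition {
        final int id;
        final double[] data;

        Partition(int id, double[] data) {
            this.id = id;
            this.data = data;
        }
    }

    /** A table: partitions indexed by ID, combined by element-wise sum on ID collision. */
    static final class Table {
        private final Map<Integer, Partition> partitions = new TreeMap<>();

        /** Adds a partition, summing it into an existing partition with the same ID. */
        void addPartition(Partition p) {
            Partition existing = partitions.get(p.id);
            if (existing == null) {
                partitions.put(p.id, p);
                return;
            }
            if (existing.data.length != p.data.length) {
                throw new IllegalArgumentException("length mismatch for partition " + p.id);
            }
            for (int i = 0; i < p.data.length; i++) {
                existing.data[i] += p.data[i];
            }
        }

        Partition getPartition(int id) {
            return partitions.get(id);
        }

        int numPartitions() {
            return partitions.size();
        }
    }

    public static void main(String[] args) {
        Table table = new Table();
        table.addPartition(new Partition(0, new double[] {1.0, 2.0}));
        table.addPartition(new Partition(1, new double[] {5.0, 5.0}));
        // Same ID as the first partition, so the arrays are summed element-wise.
        table.addPartition(new Partition(0, new double[] {3.0, 4.0}));
        System.out.println(table.numPartitions());         // 2
        System.out.println(table.getPartition(0).data[1]); // 6.0
    }
}
```

In a sketch like this, keying partitions by ID is what allows partitions from different workers that describe the same piece of data to be merged during a collective step. The sum is only one possible combiner.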
#### 1. Install Maven by following the official Maven installation instructions

#### 2. Enter the "harp" home directory

#### 3. Install the third-party jar file

The javaml jar is required by the randomforest application. It is not required by the Harp project itself.

`mvn install:install-file -Dfile=third_party/javaml-0.1.7.jar -DgroupId=net.sf -DartifactId=javaml -Dversion=0.1.7 -Dpackaging=jar`

#### 4. Compile Harp

`mvn clean package`
#### 5. Install the Harp plugin into Hadoop

`cp harp-project/target/harp-project-1.0-SNAPSHOT.jar $HADOOP_HOME/share/hadoop/mapreduce/`

`cp third_party/fastutil-7.0.13.jar $HADOOP_HOME/share/hadoop/mapreduce/`
#### 6. Configure the Hadoop environment with the settings required to run Hadoop
#### 7. Edit `mapred-site.xml` in `$HADOOP_HOME/etc/hadoop` and add Java options for map-collective tasks. For example:

`<value>-Xmx256m -Xms256m</value>`
#### 8. When developing Harp applications, set the following property in the job configuration:

`jobConf.set("mapreduce.framework.name", "map-collective");`
#### 1. Copy the Harp examples to $HADOOP_HOME

`cp harp-app/target/harp-app-1.0-SNAPSHOT.jar $HADOOP_HOME`
#### 2. Start Hadoop

`cd $HADOOP_HOME`

`sbin/start-dfs.sh`

`sbin/start-yarn.sh`
#### 3. Run the K-means map-collective job
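Usage:

`hadoop jar harp-app-1.0-SNAPSHOT.jar edu.iu.kmeans.regroupallgather.KMeansLauncher <num of points> <num of centroids> <vector size> <num of point files per worker> <number of map tasks> <num threads> <number of iteration> <work dir> <local points dir>`

Example:

`hadoop jar harp-app-1.0-SNAPSHOT.jar edu.iu.kmeans.regroupallgather.KMeansLauncher 1000 10 100 5 2 2 10 /kmeans /tmp/kmeans`

Here the job uses 1000 points, 10 centroids, a vector size of 100, 5 point files per worker, 2 map tasks, 2 threads, 10 iterations, `/kmeans` as the work directory and `/tmp/kmeans` as the local points directory.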
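The package name `regroupallgather` suggests the communication pattern of each K-means iteration. Below is a single-process Java sketch of one iteration under that assumption. Each simulated worker computes per-centroid partial sums on its local points. The partials for each centroid are then regrouped to an owning worker, assumed here to be `c % numWorkers`, and reduced into a new centroid. Finally the result is allgathered so every worker holds the same centroids. The class and method names are hypothetical, and no Harp or Hadoop API is used.

```java
import java.util.Arrays;

/**
 * Hypothetical single-process sketch of one K-means iteration in a
 * "regroup + allgather" style. Workers are simulated as arrays of points;
 * no Harp or Hadoop API is used.
 */
public class KMeansRegroupAllgatherSketch {

    /** Index of the centroid nearest to point p (squared Euclidean distance). */
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double d = 0.0;
            for (int j = 0; j < p.length; j++) {
                double diff = p[j] - centroids[c][j];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    /**
     * One iteration. workerPoints[w] holds worker w's local points and every
     * worker starts from the same centroids. Returns the centroids every
     * worker would hold after the allgather.
     */
    static double[][] iterate(double[][][] workerPoints, double[][] centroids) {
        int k = centroids.length, dim = centroids[0].length, numWorkers = workerPoints.length;
        // Step 1 (local compute): each worker accumulates per-centroid sums and counts.
        double[][][] partialSums = new double[numWorkers][k][dim];
        int[][] partialCounts = new int[numWorkers][k];
        for (int w = 0; w < numWorkers; w++) {
            for (double[] p : workerPoints[w]) {
                int c = nearest(p, centroids);
                partialCounts[w][c]++;
                for (int j = 0; j < dim; j++) partialSums[w][c][j] += p[j];
            }
        }
        // Step 2 (regroup): centroid c is owned by worker c % numWorkers (an
        // assumed assignment), which reduces every worker's partial result for c.
        double[][] updated = new double[k][];
        for (int owner = 0; owner < numWorkers; owner++) {
            for (int c = owner; c < k; c += numWorkers) {
                double[] sum = new double[dim];
                int count = 0;
                for (int w = 0; w < numWorkers; w++) {
                    count += partialCounts[w][c];
                    for (int j = 0; j < dim; j++) sum[j] += partialSums[w][c][j];
                }
                if (count == 0) {
                    // No point was assigned: keep the old centroid.
                    updated[c] = centroids[c].clone();
                } else {
                    for (int j = 0; j < dim; j++) sum[j] /= count;
                    updated[c] = sum;
                }
            }
        }
        // Step 3 (allgather): in a real run each owner would send its centroids
        // to all workers; here the combined array stands for every worker's copy.
        return updated;
    }

    public static void main(String[] args) {
        double[][][] workerPoints = {
            {{0.0, 0.0}, {10.0, 10.0}}, // worker 0
            {{2.0, 0.0}, {12.0, 10.0}}  // worker 1
        };
        double[][] centroids = {{1.0, 1.0}, {9.0, 9.0}};
        double[][] next = iterate(workerPoints, centroids);
        System.out.println(Arrays.deepToString(next)); // [[1.0, 0.0], [11.0, 10.0]]
    }
}
```

Regrouping spreads the reduction work across workers instead of funneling all partial sums through one node. The allgather then restores a full centroid copy on every worker before the next iteration.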