Copyright 2013-2018 Indiana University. Licensed under the Apache License 2.0.
Harp is an HPC-ABDS (High Performance Computing Enhanced Apache Big Data Stack) framework that provides distributed machine learning and other data-intensive applications.
- Plugs into the Hadoop ecosystem
- Rich computation models for different machine learning and data-intensive applications
- MPI-like collective communication operations
- High-performance native kernels supporting many-core processors (e.g., Intel Xeon and Xeon Phi)
Please find the full documentation of Harp at https://dsc-spidal.github.io/harp/, including a quick start, a programming guide, and examples.
Please download the binaries of Harp from https://github.com/DSC-SPIDAL/harp/releases.
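As a minimal sketch of fetching and unpacking a release (the asset name below is hypothetical; substitute the actual file listed on the releases page):

```bash
## hypothetical asset name -- check the releases page for the real file
wget https://github.com/DSC-SPIDAL/harp/releases/download/v0.1.0/harp-0.1.0.tar.gz
tar -xzf harp-0.1.0.tar.gz
cd harp-0.1.0/
```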
Copy the JAR files to $HADOOP_HOME:
```bash
## the core modules
cp core/harp-hadoop-0.1.0.jar $HADOOP_HOME/share/hadoop/mapreduce/
cp core/harp-collective-0.1.0.jar $HADOOP_HOME/share/hadoop/mapreduce/
cp core/harp-daal-interface-0.1.0.jar $HADOOP_HOME/share/hadoop/mapreduce/
## the application modules
cp ml/harp-java-0.1.0.jar $HADOOP_HOME/
cp ml/harp-daal-0.1.0.jar $HADOOP_HOME/
cp contrib-0.1.0.jar $HADOOP_HOME/
```
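As a quick sanity check, confirm the JARs landed where Hadoop can find them:

```bash
ls $HADOOP_HOME/share/hadoop/mapreduce/ | grep harp
ls $HADOOP_HOME/*.jar
```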
- Install Maven by following the official Maven instructions.
- Compile Harp with Maven for the desired Hadoop version:
```bash
## x.x.x can be 2.6.0, 2.7.5, or 2.9.0
mvn clean package -Phadoop-x.x.x
```
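For example, to build against Hadoop 2.7.5:

```bash
mvn clean package -Phadoop-2.7.5
```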
- Copy the compiled module JAR files to $HADOOP_HOME:
```bash
cd harp/
## the core modules
cp core/harp-hadoop/target/harp-hadoop-0.1.0.jar $HADOOP_HOME/share/hadoop/mapreduce/
cp core/harp-collective/target/harp-collective-0.1.0.jar $HADOOP_HOME/share/hadoop/mapreduce/
cp core/harp-daal-interface/target/harp-daal-interface-0.1.0.jar $HADOOP_HOME/share/hadoop/mapreduce/
## the application modules
cp ml/java/target/harp-java-0.1.0.jar $HADOOP_HOME/
cp ml/daal/target/harp-daal-0.1.0.jar $HADOOP_HOME/
cp contrib/target/contrib-0.1.0.jar $HADOOP_HOME/
```
Harp depends on a group of third-party libraries. Make sure to install them before launching the applications:
```bash
cd third_party/
## JAR files
cp *.jar $HADOOP_HOME/share/hadoop/mapreduce/
## DAAL 2018
## copy the DAAL Java API lib
cp daal-2018/lib/daal.jar $HADOOP_HOME/share/hadoop/mapreduce/
## copy native libs to HDFS
hdfs dfs -mkdir -p /Hadoop
hdfs dfs -mkdir -p /Hadoop/Libraries
hdfs dfs -put daal-2018/lib/intel64_lin/libJavaAPI.so /Hadoop/Libraries
hdfs dfs -put tbb/lib/intel64_lin/gcc4.4/libtbb* /Hadoop/Libraries
```
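To confirm the upload succeeded, list the HDFS directory:

```bash
hdfs dfs -ls /Hadoop/Libraries
```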
Harp-DAAL-Experimental only supports installation from source code for now. Please follow the steps below.
- Pull the daal_2018 branch of the DAAL source code:
```bash
git clone -b daal_2018 [email protected]:DSC-SPIDAL/harp.git
mv harp harp-daal-exp
cd harp-daal-exp
```
or initialize the git submodule under third_party/daal-exp/:
```bash
cd harp/
git submodule update --init --recursive
cd third_party/daal-exp/
```
- Compile the native library with either icc or the GNU compiler:
```bash
## use COMPILER=gnu if icc is not available
make daal PLAT=lnx32e COMPILER=icc
```
- Set up the DAALROOT environment variable by sourcing the script from the DAAL release directory:
```bash
source ../__release_lnx/daal/bin/daalvars.sh intel64
```
- Compile the harp-daal-experimental modules in Harp. Make sure that line 17 of the harp/pom.xml file is uncommented and that DAALROOT was set up in the previous step:
```bash
## check DAALROOT
echo $DAALROOT
## re-run maven to compile
mvn clean package -Phadoop-x.x.x
```
- Install the compiled libraries:
```bash
## copy the Java API to the Hadoop folder
cp ../__release_lnx/daal/lib/daal.jar $HADOOP_HOME/share/hadoop/mapreduce/
## copy harp-daal-exp libs
cp experimental/target/experimental-0.1.0.jar $HADOOP_HOME/
## copy native libs to HDFS
hdfs dfs -mkdir -p /Hadoop
hdfs dfs -mkdir -p /Hadoop/Libraries
hdfs dfs -put ../__release_lnx/daal/lib/intel64_lin/libJavaAPI.so /Hadoop/Libraries
hdfs dfs -put ../__release_lnx/tbb/lib/intel64_lin/gcc4.4/libtbb* /Hadoop/Libraries
hdfs dfs -put harp/third_party/omp/libiomp5.so /Hadoop/Libraries/
hdfs dfs -put harp/third_party/hdfs/libhdfs.so* /Hadoop/Libraries/
```
The experimental code has only been tested on 64-bit Linux platforms with the Intel icc and GNU compilers.
Make sure that harp-java-0.1.0.jar has been copied to $HADOOP_HOME. Start the Hadoop services:
```bash
cd $HADOOP_HOME
sbin/start-dfs.sh
sbin/start-yarn.sh
```
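You can verify that the Hadoop daemons are up with jps, which ships with the JDK:

```bash
## expect NameNode, DataNode, ResourceManager, and NodeManager in the output
jps
```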
The usage of the K-means example is:
```bash
hadoop jar harp-java-0.1.0.jar edu.iu.kmeans.regroupallgather.KMeansLauncher
<num of points> <num of centroids> <vector size> <num of point files per worker>
<number of map tasks> <num of threads> <number of iterations> <work dir> <local points dir>
```
For example:
```bash
hadoop jar harp-java-0.1.0.jar edu.iu.kmeans.regroupallgather.KMeansLauncher 1000 10 100 5 2 2 10 /kmeans /tmp/kmeans
```
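This run clusters 1000 points of vector size 100 into 10 centroids, generating 5 point files per worker, with 2 map tasks, 2 threads, and 10 iterations; /kmeans is the work directory and /tmp/kmeans is the local points directory.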