SVM is used for classification and regression analysis.
SVM solves the following optimization problem:
where
is the regularization term;
is the regularization coefficient;
is the hinge loss as visualized below:
Angel MLLib uses mini-batch gradient descent optimization method for solving SVM's objective; the algorithm is shown below:
-
Data fromat is set in "ml.data.type", supporting "libsvm" and "dummy" types. For details, see Angel Data Format
-
Feature vector's dimension is set in "ml.feature.num"
Several steps must be done before editing the submitting script and running.
- confirm Hadoop and Spark have ready in your environment
- unzip sona--bin.zip to local directory (SONA_HOME)
- upload sona--bin directory to HDFS (SONA_HDFS_HOME)
- Edit $SONA_HOME/bin/spark-on-angel-env.sh, set SPARK_HOME, SONA_HOME, SONA_HDFS_HOME and ANGEL_VERSION
Here's an example of submitting scripts, remember to adjust the parameters and fill in the paths according to your own task.
#test description
actionType=train or predict
jsonFile=path-to-jsons/svm.json
modelPath=path-to-save-model
predictPath=path-to-save-predict-results
input=path-to-data
queue=your-queue
HADOOP_HOME=my-hadoop-home
source ./bin/spark-on-angel-env.sh
export HADOOP_HOME=$HADOOP_HOME
$SPARK_HOME/bin/spark-submit \
--master yarn-cluster \
--conf spark.ps.jars=$SONA_ANGEL_JARS \
--conf spark.ps.instances=10 \
--conf spark.ps.cores=2 \
--conf spark.ps.memory=10g \
--jars $SONA_SPARK_JARS \
--files $jsonFile \
--driver-memory 20g \
--num-executors 20 \
--executor-cores 5 \
--executor-memory 30g \
--queue $queue \
--class org.apache.spark.angel.examples.JsonRunnerExamples \
./lib/angelml-$SONA_VERSION.jar \
jsonFile:./svm.json \
dataFormat:libsvm \
data:$input \
modelPath:$modelPath \
predictPath:$predictPath \
actionType:$actionType \
numBatch:500 \
maxIter:2 \
lr:4.0 \
numField:39