Implementation of Factorization Machines on Spark using parallel stochastic gradient descent (python and scala)
This is a custom Spark implementation of Factorization Machines, usable from both Python and Scala. To make optimal use of Spark's parallelism, the FMs are trained with parallel stochastic gradient descent. This is an alternative to the mini-batch SGD currently available in MLlib for training logistic regression models.
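The idea behind parallel SGD (Zinkevich et al.) is to run plain SGD independently on each data partition, starting from the current weights, and then average the per-partition results after each round. A minimal sketch of that scheme, using a squared-error linear model for brevity and plain Python lists in place of Spark partitions (on Spark, the inner loop would be an `rdd.mapPartitions(...).collect()` over broadcast weights; all names here are illustrative, not this library's API):

```python
import numpy as np

def sgd_partition(records, w0, lr=0.05, reg=0.0):
    """Run plain SGD over one partition's (x, y) records, starting from w0."""
    w = w0.copy()
    for x, y in records:
        err = w.dot(x) - y
        w -= lr * (err * x + reg * w)
    return w

def parallel_sgd(partitions, n_features, n_rounds=20):
    """Each round: train independently on every partition from the current
    weights, then average the resulting weight vectors."""
    w = np.zeros(n_features)
    for _ in range(n_rounds):
        w = np.mean([sgd_partition(part, w) for part in partitions], axis=0)
    return w

# Toy usage: two "partitions" of samples drawn from y = 2x.
partitions = [
    [(np.array([1.0]), 2.0), (np.array([2.0]), 4.0)],
    [(np.array([3.0]), 6.0)],
]
w = parallel_sgd(partitions, n_features=1)
```

Because each partition trains without communicating, only one averaging step per round crosses the network, which is what makes the scheme a good fit for Spark.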
Tools available in Python:
- Training a Factorization Machine model using parallel stochastic gradient descent.
- Support for sparse and dense vectors (the implementation is particularly efficient for sparse vectors)
- Cross validation
- Saving and loading a model from a pickle file
- Evaluating the model using log loss, MSE, accuracy, area under the precision/recall curve, and area under the ROC curve
- Searching for the best learning rate / regularization parameter / factor length using plots and grid search
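The efficiency on sparse vectors comes from Rendle's reformulation of the FM pairwise interaction term, which reduces the cost of a prediction to O(k · nnz) where k is the factor length and nnz the number of non-zero features. A minimal NumPy sketch of a single prediction (function and parameter names are illustrative, not this library's API):

```python
import numpy as np

def fm_predict(indices, values, w0, w, V):
    """Score one sparse sample given bias w0, linear weights w (n,), and
    factor matrix V (n, k). Uses the identity
      sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2 ],
    so only the non-zero features are ever touched."""
    x = np.asarray(values)
    Vx = V[indices]                    # factor rows of the non-zero features
    s = Vx.T @ x                       # sum_i v_if x_i, shape (k,)
    s2 = (Vx ** 2).T @ (x ** 2)        # sum_i v_if^2 x_i^2, shape (k,)
    pairwise = 0.5 * np.sum(s ** 2 - s2)
    return w0 + w[indices] @ x + pairwise
```

The same identity also gives O(k · nnz) gradients, which is what each SGD step exploits on sparse data.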
Tools available in Scala:
- Training a Factorization Machine model using parallel stochastic gradient descent.
- Support for sparse and dense vectors (the implementation is particularly efficient for sparse vectors)
- Cross validation
- Evaluating the model using log loss
This release was tested with Spark 1.4.0.