
Implementation of Factorization Machines on Spark using parallel stochastic gradient descent (python and scala)

@blebreton blebreton released this 23 Mar 10:10

This is a custom Spark implementation of Factorization Machines for use from Python and Scala. To make the most of Spark's parallelism, the FMs are trained with parallel stochastic gradient descent, an alternative to the mini-batch SGD currently available in MLlib for training logistic regression models.
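To illustrate the idea behind parallel SGD (a sketch of the general technique, not this repo's actual code or API): each partition runs plain SGD on its own shard of the data, and the resulting weight vectors are averaged. Here partitions are simulated with plain Python lists instead of an RDD, and a simple linear least-squares model stands in for the FM.

```python
import numpy as np

def sgd_on_shard(w, X, y, lr=0.1, epochs=5):
    """Plain SGD for linear least squares on one data shard."""
    w = w.copy()
    for xi, yi in zip(X, y):
        pass  # (loop body below runs per epoch)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            grad = (w @ xi - yi) * xi  # gradient of 0.5*(w.x - y)^2
            w -= lr * grad
    return w

def parallel_sgd(shards, dim, lr=0.1, epochs=5):
    """Run SGD independently on each shard, then average the models.

    On Spark this 'run per shard' step would be a mapPartitions over
    the RDD; here it is an ordinary list comprehension.
    """
    w0 = np.zeros(dim)
    models = [sgd_on_shard(w0, X, y, lr, epochs) for X, y in shards]
    return np.mean(models, axis=0)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
X = rng.normal(size=(200, 2))
y = X @ true_w
# Split the data into 4 simulated "partitions".
shards = [(X[i::4], y[i::4]) for i in range(4)]
w = parallel_sgd(shards, dim=2)
```

Averaging independently trained models is what removes the sequential bottleneck of ordinary SGD: no communication is needed until the final combine step.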

Tools available in Python:

  • Training a Factorization Machine model using parallel stochastic gradient descent.
  • Support for sparse and dense vectors (this implementation is particularly efficient for sparse vectors).
  • Cross-validation.
  • Saving and loading models from a pickle file.
  • Evaluating the model using logloss, MSE, accuracy, area under the precision/recall curve, and area under the ROC curve.
  • Searching for the best learning rate, regularization parameter, and factor length via plots and grid search.
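For context on what an FM model computes at prediction time, here is an illustrative NumPy sketch (not the repo's internals). The pairwise-interaction term uses the standard O(k·n) rewrite of the factorized sum, which is also why sparse vectors are cheap: only non-zero features contribute.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-order Factorization Machine: bias + linear + pairwise term.

    x  : feature vector, shape (n,)
    w0 : global bias (scalar)
    w  : linear weights, shape (n,)
    V  : factor matrix, shape (n, k) -- one k-dim latent vector per feature
    """
    linear = w @ x
    # 0.5 * sum_f [ (sum_i V[i,f]*x_i)^2 - sum_i V[i,f]^2 * x_i^2 ]
    s = V.T @ x                 # shape (k,)
    s2 = (V ** 2).T @ (x ** 2)  # shape (k,)
    pairwise = 0.5 * np.sum(s ** 2 - s2)
    return w0 + linear + pairwise
```

The rewritten pairwise term is mathematically identical to summing ⟨v_i, v_j⟩·x_i·x_j over all feature pairs i < j, but avoids the quadratic loop over pairs.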

Tools available in Scala:

  • Training a Factorization Machine model using parallel stochastic gradient descent.
  • Support for sparse and dense vectors (this implementation is particularly efficient for sparse vectors).
  • Cross-validation.
  • Evaluating the model using logloss.

This release was tested with Spark 1.4.0.