
Releases: blebreton/spark-FM-parallelSGD

Implementation of Factorization Machines on Spark using parallel stochastic gradient descent (python and scala)

23 Mar 10:10

This is a custom Spark implementation of Factorization Machines (FMs), usable from both Python and Scala. To make optimal use of Spark's parallelism, the FMs are trained with parallel stochastic gradient descent. This is an alternative to mini-batch SGD, which is currently available in MLlib for training logistic regression models.
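The core idea of parallel SGD is parameter averaging: broadcast the current model, run an independent SGD pass over each data partition, and average the resulting parameters. A minimal local sketch is below; the function names are illustrative (not this repository's API), plain Python lists stand in for Spark partitions, and in Spark the per-partition pass would typically run inside `mapPartitions` followed by a reduce.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-order FM prediction in O(n*k) using the standard identity:
    y = w0 + w.x + 0.5 * sum_f [ (V^T x)_f^2 - ((V*V)^T (x*x))_f ]."""
    linear = w0 + x @ w
    interactions = 0.5 * np.sum((x @ V) ** 2 - (x ** 2) @ (V ** 2))
    return linear + interactions

def sgd_partition(partition, w0, w, V, lr=0.01, reg=0.01):
    """One local SGD pass (squared loss, L2 regularization) over a partition,
    starting from the broadcast parameters."""
    w, V = w.copy(), V.copy()
    for x, y in partition:
        e = fm_predict(x, w0, w, V) - y          # prediction error
        xv = x @ V                               # cached V^T x, shape (k,)
        w0 -= lr * e
        w -= lr * (e * x + reg * w)
        # dy/dV[i,f] = x[i] * xv[f] - V[i,f] * x[i]^2
        V -= lr * (e * (np.outer(x, xv) - V * (x ** 2)[:, None]) + reg * V)
    return w0, w, V

def parallel_sgd_step(partitions, w0, w, V, lr=0.01, reg=0.01):
    """Train each partition independently from the same starting point,
    then average the per-partition models (parameter averaging)."""
    results = [sgd_partition(p, w0, w, V, lr, reg) for p in partitions]
    w0 = float(np.mean([r[0] for r in results]))
    w = np.mean([r[1] for r in results], axis=0)
    V = np.mean([r[2] for r in results], axis=0)
    return w0, w, V
```

Averaging communicates only once per pass rather than once per example, which is what makes the scheme attractive on a cluster.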

Tools available in Python:

  • Training a Factorization Machine model using parallel stochastic gradient descent.
  • Support for sparse and dense vectors (this implementation is particularly efficient for sparse vectors).
  • Cross-validation.
  • Saving and loading a model from a pickle file.
  • Evaluating the model using log loss, MSE, accuracy, area under the precision/recall curve, and area under the ROC curve.
  • Searching for the best learning rate, regularization parameter, and factor length using plots and grid search.
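The evaluation metrics listed above are all standard and easy to state precisely. A self-contained sketch of log loss, MSE, accuracy, and ROC AUC (the rank-statistic form of AUC, assuming no tied scores); these helpers illustrate the metrics only and are not this repository's API:

```python
import numpy as np

def logloss(y_true, p):
    p = np.clip(p, 1e-15, 1 - 1e-15)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def accuracy(y_true, p, threshold=0.5):
    return np.mean((p >= threshold) == y_true)

def roc_auc(y_true, scores):
    """ROC AUC via the Mann-Whitney U statistic (assumes no tied scores)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

Log loss and AUC operate on raw predicted probabilities/scores, while accuracy requires choosing a decision threshold.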

Tools available in Scala:

  • Training a Factorization Machine model using parallel stochastic gradient descent.
  • Support for sparse and dense vectors (this implementation is particularly efficient for sparse vectors).
  • Cross-validation.
  • Evaluating the model using log loss.
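Cross-validation, listed in both APIs, boils down to k-fold splitting: shuffle the examples, divide them into k folds, and hold each fold out once for validation. A minimal local sketch (the function name is illustrative, not this repository's API):

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Yield (train_indices, val_indices) for each of k shuffled folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    folds = np.array_split(idx, k)  # k roughly equal folds
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val
```

On an RDD the same effect is usually achieved by randomly splitting the data k ways and unioning all but the held-out split for training.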

This release was tested with Spark 1.4.0.