
Implementation of Factorization Machines on Spark using parallel stochastic gradient descent (python and scala)

@blebreton blebreton released this 23 Mar 10:10

This is a custom Spark implementation of Factorization Machines for use from Python and Scala. To make the most of Spark's parallelism, the FMs are trained with parallel stochastic gradient descent, an alternative to the mini-batch SGD currently available in MLlib for training logistic regression models.
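To illustrate the idea behind parallel SGD (a sketch of the general technique, not this repo's actual code or API): each partition runs plain SGD on its own shard of the data, and the resulting weight vectors are averaged. Here partitions are simulated with plain Python lists instead of an RDD, and a simple linear least-squares model stands in for the FM.

```python
import numpy as np

def sgd_on_shard(w, X, y, lr=0.1, epochs=5):
    """Plain SGD for linear least squares on one data shard."""
    w = w.copy()
    for xi, yi in zip(X, y):
        pass  # (loop body below runs per epoch)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            grad = (w @ xi - yi) * xi  # gradient of 0.5*(w.x - y)^2
            w -= lr * grad
    return w

def parallel_sgd(shards, dim, lr=0.1, epochs=5):
    """Run SGD independently on each shard, then average the models.

    On Spark this 'run per shard' step would be a mapPartitions over
    the RDD; here it is an ordinary list comprehension.
    """
    w0 = np.zeros(dim)
    models = [sgd_on_shard(w0, X, y, lr, epochs) for X, y in shards]
    return np.mean(models, axis=0)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
X = rng.normal(size=(200, 2))
y = X @ true_w
# Split the data into 4 simulated "partitions".
shards = [(X[i::4], y[i::4]) for i in range(4)]
w = parallel_sgd(shards, dim=2)
```

Averaging independently trained models is what removes the sequential bottleneck of ordinary SGD: no communication is needed until the final combine step.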

Tools available in Python:

  • Training a Factorization Machine model using parallel stochastic gradient descent.
  • Support for sparse and dense vectors (this implementation is particularly efficient for sparse vectors).
  • Cross-validation.
  • Saving and loading models from a pickle file.
  • Evaluating the model using logloss, MSE, accuracy, area under the precision/recall curve, and area under the ROC curve.
  • Searching for the best learning rate, regularization parameter, and factor length via plots and grid search.
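For context on what an FM model computes at prediction time, here is an illustrative NumPy sketch (not the repo's internals). The pairwise-interaction term uses the standard O(k·n) rewrite of the factorized sum, which is also why sparse vectors are cheap: only non-zero features contribute.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-order Factorization Machine: bias + linear + pairwise term.

    x  : feature vector, shape (n,)
    w0 : global bias (scalar)
    w  : linear weights, shape (n,)
    V  : factor matrix, shape (n, k) -- one k-dim latent vector per feature
    """
    linear = w @ x
    # 0.5 * sum_f [ (sum_i V[i,f]*x_i)^2 - sum_i V[i,f]^2 * x_i^2 ]
    s = V.T @ x                 # shape (k,)
    s2 = (V ** 2).T @ (x ** 2)  # shape (k,)
    pairwise = 0.5 * np.sum(s ** 2 - s2)
    return w0 + linear + pairwise
```

The rewritten pairwise term is mathematically identical to summing ⟨v_i, v_j⟩·x_i·x_j over all feature pairs i < j, but avoids the quadratic loop over pairs.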

Tools available in Scala:

  • Training a Factorization Machine model using parallel stochastic gradient descent.
  • Support for sparse and dense vectors (this implementation is particularly efficient for sparse vectors).
  • Cross-validation.
  • Evaluating the model using logloss.

This release was tested with Spark 1.4.0.