please enumerate model coefficients #1

Open
kgierach opened this issue Nov 17, 2015 · 4 comments
Comments

@kgierach

Hi,

Can you please document where the intercept, and the individual term "w" coefficients are stored as part of the model?

As I briefly looked thru the code, I only see the factorized matrix as part of the model.

Thank you.

@blebreton
Owner

Hi,

In this implementation of FMs, I chose not to use the intercept or the individual 'w' coefficients, so the only model parameters to be estimated are the 'v' parameters (in the dot-product term), of size n*k (nrFeatures * factorLength, where factorLength is the dimensionality of the factorization).
Since the FM already models the interactions heavily, adding these two terms would not contribute much value to your model. Computing the second term is also time-consuming.
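
For reference, here is a minimal sketch of what a prediction looks like when only the 'v' parameters are used. It is not the code from this repository: the function names `pairwise_term` and `fm_predict` are hypothetical, and the pairwise sum is computed with the standard O(n*k) rewriting of the factorization machine interaction term. Passing `w0` and `w` shows where the intercept and linear coefficients would enter if they were estimated.

```python
def pairwise_term(x, V):
    """Sum over i < j of <V[i], V[j]> * x[i] * x[j], computed in O(n*k).

    x is a list of n feature values; V is an n-by-k list of lists.
    Uses the identity: 0.5 * sum_f [(sum_i V[i][f]*x[i])^2 - sum_i (V[i][f]*x[i])^2].
    """
    n, k = len(x), len(V[0])
    total = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        total += s * s - s_sq
    return 0.5 * total


def fm_predict(x, V, w0=0.0, w=None):
    """FM score. With w0=0 and w=None this is the interaction-only model."""
    linear = sum(wi * xi for wi, xi in zip(w, x)) if w is not None else 0.0
    return w0 + linear + pairwise_term(x, V)


# Hypothetical toy values, chosen only for illustration.
x = [1.0, 2.0, 0.0]
V = [[1.0, 0.0], [0.5, 1.0], [2.0, 2.0]]
interaction_only = fm_predict(x, V)
with_linear_terms = fm_predict(x, V, w0=0.5, w=[1.0, 1.0, 1.0])
```

With the interaction-only form, the model stores nothing besides the n*k matrix `V`.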

@kgierach
Author

kgierach commented Dec 1, 2015

Thanks for your answer. I suspected that might be the case.
That leads me to my next question: I have extracted the factorized matrix produced by the model in your implementation. I found that Rendle's libFM implementation in C++ produced the correct results with my test dataset. However, this implementation did not produce the correct results.
If you'd like to take a look, I can provide you with the test dataset and expected results, as well as the C++ implementation's results.
Sincerely,
Karl

@blebreton
Owner

I'm interested in your results; I will take a look if you send me the data.
Could you be more specific about the problem you encountered? I'm not sure I understand what you mean by "the correct results were not present". Do you mean that my implementation does not give the right predictions for your test set? The parameters (step, regularization parameter) may also differ between the two implementations.
Thanks,
Benjamin

@kgierach
Author

kgierach commented Dec 8, 2015

Hi Benjamin,

That would be great if you could take a look. I uploaded the generated data into this folder on box.com:
https://app.box.com/s/5podd5oj0mmzh5oxvz4quudi9dznzry6

There is also a "box note" there which has the expected cross terms for both datasets. Both datasets were generated with the same basic cross terms in mind, but using 2 different techniques.

Observational statistics (namely mean, covariance, and variance) help identify some of the terms, but Rendle's C++ libFM implementation seems to have found all the expected cross terms with only a small handful of false positives.
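
One way to read cross terms out of a factorized matrix is sketched below. This is an assumption about how such a comparison could be done, not a description of what was actually run on either implementation. The helper name `rank_interactions` is hypothetical. It scores each feature pair (i, j) by the dot product of rows i and j of the matrix and lists the strongest pairs first. Comparing these dot products, rather than the raw matrix entries, matters because two runs can yield different matrices that imply the same pairwise weights.

```python
from itertools import combinations


def rank_interactions(V, top=5):
    """Return the `top` feature pairs (i, j, weight) with the largest |<V[i], V[j]>|.

    V is an n-by-k list of lists holding the learned factor vectors.
    """
    scored = []
    for i, j in combinations(range(len(V)), 2):
        weight = sum(a * b for a, b in zip(V[i], V[j]))
        scored.append((i, j, weight))
    # Strongest implied interactions first, regardless of sign.
    scored.sort(key=lambda t: abs(t[2]), reverse=True)
    return scored[:top]


# Hypothetical 3-feature, 2-factor matrix for illustration only.
V_example = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
strongest = rank_interactions(V_example, top=1)
```

The ranked pairs can then be checked against the list of expected cross terms; where to cut off the ranking is a judgment call that depends on the data.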

I have also built a small command-line Scala project that uses your LibFM code (converted to Scala classes). Let me know if you'd like me to post it or send it to you.

Thanks,
Karl
