I have a basic question about the use of LightFM; apologies if this isn't the right forum.
I'm building a recommender system that will recommend documents to users. There are no interactions yet, and all we know about each user is the set of keywords they're interested in.
I've built a prototype where I transform each document using TF-IDF. I then transform the user's keywords with the same transformer and use cosine similarity to find the most relevant documents. It works reasonably well.
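For concreteness, here is a minimal sketch of that prototype (the documents and keywords are made up for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "neural networks for image classification",
    "gradient boosting on tabular data",
    "image segmentation with convolutional networks",
]

# Fit TF-IDF on the corpus once.
vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(docs)        # (n_docs, n_terms)

# Treat the user's keywords as a pseudo-document and transform it
# with the SAME fitted vectorizer.
user_keywords = "image networks"
user_vec = vectorizer.transform([user_keywords])   # (1, n_terms)

scores = cosine_similarity(user_vec, doc_matrix).ravel()
ranking = scores.argsort()[::-1]                   # best match first
```

Documents sharing terms with the keywords score highest; the unrelated document (index 1 above) falls to the bottom of the ranking.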
I'm now porting this to LightFM so that we can incorporate interactions, but first I need the system to perform at least as well as the TF-IDF solution, and I'm struggling to make that work. Here's the current approach:
Build a Dataset object over all items in the corpus, using TF-IDF to build the item features.
When a request for recommendations for a new user comes in:
Get that user's keywords and form a pseudo-document: a single string containing all the keywords.
Compute the TF-IDF features of that pseudo-document, using the same vectorizer that built the corpus features.
Retrain the LightFM model with a single interaction between the user and the pseudo-document, and with item_features formed by concatenating the corpus's item features and the pseudo-document's features.
Call predict to get the recommendations.
In my unit tests I have 52 documents, which are transformed to TF-IDF vectors with about 3,300 columns. The user's pseudo-document is transformed to a vector with a single 1.0 entry in the column corresponding to the keyword.
So I would expect the prediction to score highly those documents whose TF-IDF entry for that keyword is also high. Instead, the scores are all more or less the same, about -0.5.
Am I doing something wrong here?