I used FPGrowth to make this recommender web app: wackysubs.com #896
geoffreya started this conversation in Show and tell
Replies: 1 comment · 3 replies
-
Glad to hear! And thanks so much for sharing your experience regarding the performance considerations.
-
My new website wackysubs.com was built using association rule mining over all the subreddit names that people posted comments to on reddit.com during two months of 2018. Soon I will get a more recent dataset, train on that, and deploy it to the website. The purpose of my web app is to help people find obscure but interesting subreddits they might enjoy but didn't know existed.
Computer memory during model training was the main limitation on building a model that includes the newest and least-popular subreddits. I had to set a floor of min_confidence = 0.0003 (no lower than this) to fit into my 128 GB of RAM plus 128 GB of NVMe-backed swap.
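To put the RAM pressure in perspective, here's a back-of-envelope sketch with invented sizes (not my actual dataset numbers): even the dense one-hot commenters-by-subreddits input frame blows past 128 GB quickly, before the mined model itself is counted.

```python
# Illustrative sizes only; the real two-month Reddit dump differs.
n_users = 5_000_000       # hypothetical number of commenters
n_subreddits = 100_000    # hypothetical number of subreddit columns
bytes_per_cell = 1        # pandas bool dtype uses 1 byte per cell

dense_gb = n_users * n_subreddits * bytes_per_cell / 1e9
print(f"dense bool frame: ~{dense_gb:.0f} GB")  # ~500 GB, far over 128 GB
```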
After training I was able to delete many rows and columns of the dataframe that are not relevant for my app. I did not need any antecedents or consequents of length greater than 3. So, during postprocessing I removed those, shrinking the model to 10% of its original size with no loss of functionality for my app, and greatly speeding up the query that runs on click of the Find button. I also did some other tricks to speed up querying the trained model, notably eliminating the frozensets completely and replacing them with simple 1/0 columns in a very wide dataframe of all the subreddits. I found top query performance this way using pandas' Feather I/O (via pyarrow).
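The postprocessing steps above can be sketched roughly like this, on a toy stand-in for the mined rules table (the rows, subreddit names, and confidence values are invented for illustration):

```python
import pandas as pd

# Toy stand-in for an association-rules table: frozenset
# antecedents/consequents plus a confidence score (invented rows).
rules = pd.DataFrame({
    "antecedents": [frozenset({"AskReddit"}),
                    frozenset({"gaming", "pcmasterrace", "buildapc", "hardware"})],
    "consequents": [frozenset({"AskOuija"}), frozenset({"nvidia"})],
    "confidence": [0.0007, 0.0012],
})

# Drop rules longer than 3 on either side, as described above.
short = rules[rules["antecedents"].map(len).le(3)
              & rules["consequents"].map(len).le(3)].reset_index(drop=True)

# Replace the antecedent frozensets with 1/0 indicator columns, one per
# subreddit, so queries become vectorized column lookups.
onehot = (pd.DataFrame([{s: 1 for s in a} for a in short["antecedents"]])
            .fillna(0).astype("int8"))
flat = pd.concat([short.drop(columns="antecedents"), onehot], axis=1)

# Feather round-trips this wide frame quickly (requires pyarrow):
# flat.to_feather("rules.feather")
print(sorted(flat.columns))
```

The second (length-4) rule is filtered out, and the surviving rule's antecedent becomes a plain `AskReddit` integer column instead of a frozenset.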
As a proposal: people with requirements like mine could build a better, more inclusive model with an even lower min_confidence if mlxtend's FPGrowth or association_rules let me specify a limit on rule length in advance, so that the search never puts things I won't use (long rules) into the model tree, and into RAM, in the first place. Just my 2 cents!
Hope you enjoy the app! It was enjoyable using mlxtend, which I found very bug-free compared to other open-source libraries in my experience. Thanks for making mlxtend!