This project is one of the assessments for the Machine Learning Foundation course in my B.Tech degree.
In this project I have used a Spotify song dataset to generate recommendations.
The dataset can be obtained from my Kaggle profile.
To run the Python notebook yourself, provide your own Spotify API credentials in a text file. Also change the paths of all the files that the notebook accesses.
Generating recommendations takes approximately 3 minutes of notebook run time per playlist.
Note: The Spotify playlist provided by the user must be a public playlist.
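As a rough illustration of keeping credentials in a text file, the sketch below parses simple KEY=VALUE lines. The file layout, the key names (client_id, client_secret), and the helper names are assumptions made for this example; the notebook may expect a different layout, so adapt it to whatever the notebook actually reads.

```python
from pathlib import Path


def parse_credentials(text):
    """Parse KEY=VALUE lines into a dict (hypothetical file layout)."""
    creds = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"Expected KEY=VALUE, got: {line!r}")
        creds[key.strip()] = value.strip()
    return creds


def read_credentials(path):
    """Read a credentials file from disk and parse it."""
    return parse_credentials(Path(path).read_text())


# Example content; the key names and values are placeholders.
example = """
# Spotify API credentials
client_id = your-client-id
client_secret = your-client-secret
"""
creds = parse_credentials(example)
```

Keeping the credentials in a separate file like this means they do not have to be hard-coded in the notebook itself.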
Content-based recommendation engines work on the data provided by the user; in this case, that data is the playlist the user provides.
I have used various fields that describe the audio of each song, such as valence, acousticness, liveness, energy, and loudness, along with attributes like genres and popularity.
To put genres into a form the machine can learn from, I have used a TF-IDF vectorizer to convert the list-like genre values into a document-term matrix. Categorical features like popularity and year are one-hot encoded using the pd.get_dummies function.
Euclidean distance does not consider the direction of the vectors; its similarity score depends only on the distance between them. Cosine similarity instead scores two vectors by the angle between them.
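The difference can be seen in a small example. The sketch below uses made-up three-dimensional vectors (not real song features) and plain Python helper functions written for this illustration: one candidate points in exactly the same direction as the playlist vector but is twice as long, while the other is closer in distance but points in a different direction.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between vectors a and b."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Made-up vectors for illustration only.
playlist = [1.0, 2.0, 0.0]
same_direction = [2.0, 4.0, 0.0]   # playlist scaled by 2
other_direction = [2.0, 1.0, 0.0]  # closer in distance, different angle

for name, vec in [("same_direction", same_direction),
                  ("other_direction", other_direction)]:
    print(name,
          "euclidean:", round(euclidean_distance(playlist, vec), 3),
          "cosine:", round(cosine_similarity(playlist, vec), 3))
```

Here Euclidean distance ranks other_direction as the closer match, while cosine similarity ranks same_direction higher because its angle to the playlist vector is zero.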