Natural Language Processing project on a dataset of Amazon reviews.

mattiaboller/Amazon-Reviews-NLP

Amazon-Reviews-NLP

In this project, several NLP techniques were used to process a set of Amazon reviews written by buyers. Documents were preprocessed by removing URLs, emojis, numbers, punctuation and stopwords. Every word was then POS-tagged and lemmatized. The resulting texts were represented with three methods: Tf-Idf, Word2Vec and Doc2Vec. The vector representations of the documents were used for:

  • Text classification: different machine learning methods were used (Naive Bayes, Support Vector Machine, Logistic Regression) and the results were compared using ROC curves and AUC scores. The best results came from the Tf-Idf representation combined with the SVM and Logistic Regression algorithms, with an AUC score of 0.90 and an accuracy of 0.83.
  • Text clustering: clusters were built with the K-means algorithm, the optimal number of clusters was found with the elbow method, and cluster topics were extracted from the cluster centroid coordinates and visualizations. Five clusters were identified, described by the following topics: Videogames, Movies, Products, Music, Books.
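
The preprocessing and Tf-Idf steps described above can be sketched in plain Python. This is a minimal, dependency-free illustration and not the project's actual code: the stopword list, the regular expressions, the helper names `preprocess` and `tfidf`, and the sample reviews are all assumptions made for the example. POS tagging and lemmatization are omitted because they need an external NLP library. The weighting used here is the plain `tf * log(N / df)` form, which may differ from the variant used in the project.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption; the project's real list is not shown here).
STOPWORDS = {"a", "an", "and", "the", "is", "it", "in", "this", "i", "to", "of", "was"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")  # matches digits, punctuation and emojis


def preprocess(text):
    """Lowercase, remove URLs and non-letter characters, then drop stopwords."""
    text = URL_RE.sub(" ", text.lower())
    text = NON_LETTER_RE.sub(" ", text)
    return [tok for tok in text.split() if tok not in STOPWORDS]


def tfidf(docs_tokens):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n_docs = len(docs_tokens)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in docs_tokens for term in set(tokens))
    vectors = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty documents
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


# Hypothetical sample reviews, invented for this example.
reviews = [
    "Great book!!! Read it in 2 days 😀 http://example.com",
    "The movie was boring, and the book is better.",
    "This game is great fun :)",
]
tokens = [preprocess(r) for r in reviews]
vectors = tfidf(tokens)
```

Terms that occur in fewer documents receive a higher weight: in the first sample review, "read" (found in one document) is weighted above "great" (found in two). Vectors of this kind are what a classifier or a K-means clustering step would take as input.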

For further details, see the report in the repository.
