Amazon-Reviews-NLP

In this project, several NLP techniques were applied to a set of Amazon reviews written by buyers. The documents were preprocessed by removing URLs, emojis, numbers, punctuation and stopwords. Every word was then POS-tagged and lemmatized. The resulting texts were represented with three methods: Tf-Idf, Word2Vec and Doc2Vec. These vector representations were then used for two tasks:

Text classification: three machine learning methods were used (Naive Bayes, Support Vector Machine, Logistic Regression), and the results were compared using the ROC curve and the AUC score. The best results were obtained with the Tf-Idf representation combined with either SVM or Logistic Regression, with an AUC score of 0.90 and an accuracy of 0.83.
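
The comparison described above can be sketched with scikit-learn. This is a minimal illustration, not the notebook's code: the toy reviews and labels are invented, `LinearSVC` is an assumed stand-in for the unspecified SVM variant, and the scores it prints say nothing about the figures reported for the project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Hypothetical toy data (NOT the project's dataset): 1 = positive, 0 = negative.
train_texts = [
    "great product works perfectly love it",
    "excellent quality fast delivery very happy",
    "amazing sound really good value",
    "terrible product broke after one day",
    "awful quality waste of money",
    "bad experience item arrived damaged",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_texts = [
    "love the quality really happy",
    "great value works perfectly",
    "broke quickly waste of money",
    "awful item arrived damaged",
]
test_labels = [1, 1, 0, 0]

# Fit Tf-Idf on the training texts only, then reuse the vocabulary on the test texts.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

models = {
    "Naive Bayes": MultinomialNB(),
    "Linear SVM": LinearSVC(),  # assumption: the source does not name the kernel
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, train_labels)
    # AUC needs a continuous score: class probabilities when the model
    # provides them, otherwise the signed distance to the SVM hyperplane.
    if hasattr(model, "predict_proba"):
        y_score = model.predict_proba(X_test)[:, 1]
    else:
        y_score = model.decision_function(X_test)
    scores[name] = {
        "auc": float(roc_auc_score(test_labels, y_score)),
        "accuracy": float(accuracy_score(test_labels, model.predict(X_test))),
    }
    print(f"{name}: AUC={scores[name]['auc']:.2f} accuracy={scores[name]['accuracy']:.2f}")
```

Using the same held-out split for every model keeps the AUC values comparable across classifiers.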
Text clustering: clusters were built with the K-means algorithm, the optimal number of clusters was found with the elbow method, and cluster topics were extracted through the coordinates of the cluster centroids and through visualizations. Five clusters were identified, described by the following topics: Videogames, Movies, Products, Music, Books.

For further details, see the report (Report.pdf) in the repository.
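
As a rough illustration of the preprocessing described at the top of this README (removal of URLs, emojis, numbers, punctuation and stopwords), here is a dependency-free sketch. The regular expressions, the tiny stopword set, the lowercasing step and the function name `preprocess` are all assumptions made for this example. POS tagging and lemmatization are left out because they need an external NLP library and its language resources.

```python
import re
import string

# Tiny illustrative stopword set (an assumption; a real run would use a full list).
STOPWORDS = {"a", "an", "and", "the", "is", "it", "this", "i", "for", "of", "to", "at", "my"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# A few common emoji code point ranges; deliberately not exhaustive.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]")
NUMBER_RE = re.compile(r"\d+")
# Map every ASCII punctuation character to a space.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text):
    """Return the cleaned, lowercased tokens of a review."""
    text = text.lower()                # assumption: lowercasing is not mentioned in the source
    text = URL_RE.sub(" ", text)       # URLs go first, before punctuation breaks them apart
    text = EMOJI_RE.sub(" ", text)
    text = NUMBER_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    return [token for token in text.split() if token not in STOPWORDS]


review = "I bought this for 20 dollars at https://example.com 😀 and it is GREAT!!!"
print(preprocess(review))  # ['bought', 'dollars', 'great']
```

The order of the steps matters: stripping punctuation before URLs would leave URL fragments behind as ordinary tokens.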

mattiaboller/Amazon-Reviews-NLP