Project Overview

The aim of this project is to evaluate and compare the sentiment prediction performances of various supervised machine learning (ML) algorithms and an unsupervised pre-trained transformer model. This project uses Kaggle's Amazon fine food reviews dataset for the analysis. The supervised custom ML algorithms used in this project are traditional ML algorithms namely Logistic Regression, Multinomial Naive Bayes, SVM and LightGBM, and deep learning algorithms namely deep Neural Networks and 1 dimensional CNN. For the pre-trained transformer model, a version of Bidirectional Encoder Representations from Transformers (BERT), which is fine-tuned for sentiment analysis of product reviews, is used. Here is the link to the notebook of the project.

Dataset

The dataset consists of 568,454 user reviews on various food products. For each record, the dataset consists of three important data inputs (columns): review, summary and rating. The review describes the user's feedback regarding the product while the summary describes the general sentiment of the feedback. The rating of each review is given in numbers between 1 and 5, 1 and 5 being the poorest and the best respectively.

Methodology

We aim to implement a model that will predict a review's sentiment as positive, neutral or negative. To achieve the objective, the following key tasks were performed;

Data cleaning to remove redundant reviews and errors.
Categorizing review ratings 1-2, 3 and 4-5 as negative, neutral and positive sentiments respectively.
Performing EDA to analyse the distribution of data per rating/sentiment as well as the word cloud of the dataset per sentiment category.
Selecting a small subset of unseen/test data from the dataset for evaluating and comparing the performances of the models.
Balancing the remaining dataset by selecting a data subset consisting of an equal number of reviews for each sentiment category.
Merging review and summary columns to form a new data input column.
Text pre-processing of the merged data.
Splitting the dataset into train and validation datasets.
Modelling the supervised models with the train data and evaluating them with the validation dataset to find the best-performing model. TFIDF and Glove's vector representation methods were used for text vectorization for traditional ML and deep learning-based models respectively.
Predicting sentiments of the unseen data using the best-performing custom ML model.
Predicting sentiments of the unseen data using the pre-trained BERT model.
Comparing the prediction performances of the custom ML and BERT on the unseen data.
Performing false prediction analysis of the overall best-performing model.

Prediction Result Summary

Below are sentiment best-performing prediction performances (confusion matrices and classification reports) of the best-performing custom ML and BERT on the unseen data:

A. Confusion Matrix and Classification Report of SVM

B. Confusion Matrix and Classification Report of BERT

It can be observed that the pre-trained BERT model has outperformed custom ML algorithms in overall accuracy, precision and recall. While the model has achieved 100% precision for the positive sentiment category, it has poorly performed in recall for the same category.

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
Classification Report - BERT 40.png		Classification Report - BERT 40.png
Classification Report - Custom ML 40.png		Classification Report - Custom ML 40.png
Confusion Matrix - BERT 40.png		Confusion Matrix - BERT 40.png
Confusion Matrix - Custom ML 40.png		Confusion Matrix - Custom ML 40.png
README.md		README.md
Sentiment Analysis of Amazon Food Reviews - ML & BERT 7.ipynb		Sentiment Analysis of Amazon Food Reviews - ML & BERT 7.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Project Overview

Dataset

Methodology

Prediction Result Summary

About

Uh oh!

Releases

Packages

Languages

Popseli/Sentiment-Analysis-of-Food-Reviews-Using-Custom-ML-and-Transfer-Learning

Folders and files

Latest commit

History

Repository files navigation

Project Overview

Dataset

Methodology

Prediction Result Summary

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages