Sentiment Analysis Model for Product Reviews

Project Overview

This project is a production-ready DistilBERT Sentiment Analysis model for product reviews desgined as a low cost market research tool with the nuiance of an actual market researcher, it was trained with a dataset of over 4 million Amazon product reviews to work for a wide variety of products in a highly scalable manner. This project comes with designs for a serverless live demo in AWS.

Key Features

Pre-trained DistilBERT model for sentiment analysis on product reviews
Designed for low compute costs to maximize scalability
Conversion from PyTorch to TensorFlow
Data processing and tokenization pipeline
Comprehensive testing of both original and converted models
Designs of an AWS serverless architecture for a live demo with future deployment(TBD)

Project Structure

1-Data processing: Script for data preparation
2-CSV files(Production test): CSV file used for production testing(Full dataset https://www.kaggle.com/datasets/bittlingmayer/amazonreviews)
3-Tokenization and Training: Tokenization process and initial model training
4-distilbert_sentiment_model: Original PyTorch DistilBERT model
5-Testing model: Scripts for testing the PyTorch model, both a write in and a CSV input script
6-PyTorch2TensorFlow: Conversion script from PyTorch to TensorFlow
7-tensorflow_model: Converted TensorFlow model
8-Testing converted model: Scripts for testing the TensorFlow model, includes a write in one, and 2 CSV input ones, one for testing the model, one for production usage

Documentation

MODEL.md: https://github.com/DimitriVavoulisPortfolio/aws-serverless-nlp-sentiment-4M-product-reviews/blob/main/MODEL.md
PROCESS.md: https://github.com/DimitriVavoulisPortfolio/aws-serverless-nlp-sentiment-4M-product-reviews/blob/main/PROCESS.md
AWS-IMPLEMENTATION-AND-COST-REPORT.md: https://github.com/DimitriVavoulisPortfolio/aws-serverless-nlp-sentiment-4M-product-reviews/blob/main/AWS-IMPLEMENTATION-WITH-COST-REPORT.md

Model Performance

4M-dataset-sentiment_distribution.png: Visualization of sentiment distribution in the dataset
400K-dataset-testing.png: Results of model testing on a 400K sample dataset
Accuracy: 0.9973
Precision: 0.9968
Recall: 0.9979
F1: 0.9973

Quick Start Guide

Clone the repository:

git clone https://github.com/DimitriVavoulisPortfolio/aws-serverless-nlp-sentiment-4M-product-reviews.git
cd aws-serverless-nlp-sentiment-4M-product-reviews

Install dependencies:

pip install tensorflow transformers torch pandas numpy scikit-learn onnx onnx-tf

To test the PyTorch model:

python 5-Testing model\sentiment-analysis-testing-script-write-in.py

To test the TensorFlow model:

python 8-Testing converted model\tensorflow-model-test-script.py

DISCLAIMER: The paths for the models have to be put in by the user

Future Work

Implement AWS deployment
Create API for real-time sentiment prediction
Optimize model performance and size

License

This project is licensed under the Apache-2.0 license - see the LICENSE file for details.

Contact

For any questions or feedback, please open an issue in this repository or contact Dimitri Vavoulis.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Sentiment Analysis Model for Product Reviews

Project Overview

Key Features

Project Structure

Documentation

Model Performance

Quick Start Guide

Future Work

License

Contact

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 25 Commits
1-Data processing		1-Data processing
2-CSV files(Production test)		2-CSV files(Production test)
3-Tokenization and Training		3-Tokenization and Training
4-distilbert_sentiment_model		4-distilbert_sentiment_model
5-Testing model		5-Testing model
6-PyTorch2TensorFlow		6-PyTorch2TensorFlow
7-tensorflow_model		7-tensorflow_model
8-Testing converted model		8-Testing converted model
.gitattributes		.gitattributes
.gitignore		.gitignore
400K-dataset-testing.PNG		400K-dataset-testing.PNG
4M-dataset-sentiment_distribution.png		4M-dataset-sentiment_distribution.png
AWS-IMPLEMENTATION-WITH-COST-REPORT.md		AWS-IMPLEMENTATION-WITH-COST-REPORT.md
LICENSE		LICENSE
MODEL.md		MODEL.md
PROCESS.md		PROCESS.md
README.md		README.md

License

DimitriVavoulisPortfolio/aws-serverless-nlp-sentiment-4M-product-reviews

Folders and files

Latest commit

History

Repository files navigation

Sentiment Analysis Model for Product Reviews

Project Overview

Key Features

Project Structure

Documentation

Model Performance

Quick Start Guide

Future Work

License

Contact

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages