- To install all the required libraries and dependencies, run the following command in your virtual environment: `pip3 install -r requirements.txt`
- Download the initial Amazon dataset from here.
- Split the dataset into manageable chunks by running the `split_dataset.py` file as follows (see the sketch below for what such a script might do): `sudo ${SPARK_HOME}/bin/spark-submit split_dataset.py Electronics.json output`
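For reference, a minimal sketch of what such a splitting script could look like, assuming it simply reads the raw JSON reviews with PySpark and rewrites them as smaller part files. This is not the repository's actual `split_dataset.py`, and the partition count is an arbitrary assumption:

```python
# split_dataset.py -- hypothetical sketch, not the repository's actual script.
# Reads the raw Electronics.json review dump and rewrites it as many smaller
# JSON part files so the data can be uploaded and processed in chunks.
import sys
from pyspark.sql import SparkSession

if __name__ == "__main__":
    input_path, output_path = sys.argv[1], sys.argv[2]  # e.g. Electronics.json output

    spark = SparkSession.builder.appName("split_dataset").getOrCreate()

    reviews = spark.read.json(input_path)

    # Repartition into a fixed number of chunks (the count here is an assumption)
    # and write each partition out as its own JSON part file under output_path.
    reviews.repartition(50).write.mode("overwrite").json(output_path)

    spark.stop()
```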
- Upload the split files to S3.
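The upload can be done from the S3 console or the AWS CLI; as one hedged option, a small boto3 script along these lines would also work (the bucket name and key prefix below are taken from the `spark-submit` example further down, but treat them as placeholders for your own setup):

```python
# Hypothetical helper to upload the split part files to S3 -- the bucket name,
# key prefix and local directory are placeholders, not project configuration.
import os
import boto3

s3 = boto3.client("s3")
bucket = "amazon-product-recommender"   # assumed bucket name
prefix = "ElectronicProductDataZIP/"    # assumed key prefix

for name in os.listdir("output"):       # directory written by split_dataset.py
    s3.upload_file(os.path.join("output", name), bucket, prefix + name)
```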
- Set up an EMR cluster to run a PySpark script (refer to Assignment 5).
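The cluster can be created from the EMR console as in Assignment 5. If you prefer to script it, a boto3 sketch such as the following can spin up a small Spark cluster; the release label, instance types, and counts are illustrative assumptions, not required values:

```python
# Hypothetical EMR cluster creation -- release label, instance types and
# counts are illustrative, not the assignment's required configuration.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="amazon-product-recommender",
    ReleaseLabel="emr-5.30.0",
    Applications=[{"Name": "Spark"}],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
    VisibleToAllUsers=True,
)
print("Cluster ID:", response["JobFlowId"])
```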
- Run the `model_creation.py` file (to create a model based on the split training data) on the cluster with a suitable configuration, as follows: `spark-submit --deploy-mode cluster --conf spark.yarn.maxAppAttempts=1 s3://amazon-product-recommender/scripts/model_creation.py s3://amazon-product-recommender/ElectronicProductDataZIP/ s3://amazon-product-recommender/output`
- This will create a model and place it in S3.
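The actual training logic lives in `model_creation.py` and is not reproduced here. Purely as an illustration, a minimal Spark ML job that trains a review-sentiment classifier from the split data and saves the fitted pipeline to S3 might look like the sketch below; the column names, the rating-based labelling rule, and the choice of estimator are all assumptions:

```python
# Hypothetical sketch of a model-creation job -- column names, the rating-based
# labelling rule and the choice of estimator are assumptions, not the project's.
import sys
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF, IDF
from pyspark.ml.classification import LogisticRegression

if __name__ == "__main__":
    input_path, output_path = sys.argv[1], sys.argv[2]

    spark = SparkSession.builder.appName("model_creation").getOrCreate()

    reviews = spark.read.json(input_path)

    # Derive a binary sentiment label from the star rating (assumed field "overall").
    labelled = reviews.withColumn(
        "label", F.when(F.col("overall") >= 4.0, 1.0).otherwise(0.0)
    ).select("reviewText", "label").na.drop()

    pipeline = Pipeline(stages=[
        Tokenizer(inputCol="reviewText", outputCol="words"),
        HashingTF(inputCol="words", outputCol="rawFeatures"),
        IDF(inputCol="rawFeatures", outputCol="features"),
        LogisticRegression(maxIter=20),
    ])

    model = pipeline.fit(labelled)
    model.write().overwrite().save(output_path)  # e.g. an s3:// path

    spark.stop()
```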
- The scraper script was run as an AWS Lambda function. However, if required, it can be run locally as follows:
- First, create a queue in SQS, then add your AWS_ACCESS_KEY, AWS_SECRET_KEY and AWS_SQS_QUEUE_NAME to `scraper.py`.
- Run the Python scraper script with the command `python3 scraper.py`. This script scrapes the required data from Amazon.ca and pushes it to the SQS queue (see the sketch below).
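The scraping logic itself is specific to `scraper.py`; reduced to the SQS hand-off, a hedged sketch of how scraped reviews could be pushed onto the queue with boto3 looks like this (credentials, queue name, region, and message fields are placeholders):

```python
# Hypothetical sketch of the scraper's SQS hand-off -- credentials, queue name,
# region and message fields are placeholders, not the project's real values.
import json
import boto3

AWS_ACCESS_KEY = "YOUR_ACCESS_KEY"
AWS_SECRET_KEY = "YOUR_SECRET_KEY"
AWS_SQS_QUEUE_NAME = "your-queue-name"

sqs = boto3.resource(
    "sqs",
    aws_access_key_id=AWS_ACCESS_KEY,
    aws_secret_access_key=AWS_SECRET_KEY,
    region_name="us-east-1",
)
queue = sqs.get_queue_by_name(QueueName=AWS_SQS_QUEUE_NAME)

def push_review(review: dict) -> None:
    """Serialise one scraped review and push it onto the SQS queue."""
    queue.send_message(MessageBody=json.dumps(review))

# Example: one scraped review (fields are illustrative).
push_review({"product_id": "B00EXAMPLE", "review_text": "Great sound quality."})
```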
- Initialize an EC2 instance and set up a MySQL server. Add the DB credentials to the `sentiment_analyzer.py` file.
- Add your AWS_ACCESS_KEY, AWS_SECRET_KEY and AWS_SQS_QUEUE_NAME to `sentiment_analyzer.py`.
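The exact configuration layout inside `sentiment_analyzer.py` is not shown here; as an assumption, the values from the two steps above might simply be module-level constants that you fill in, for example:

```python
# Hypothetical configuration block for sentiment_analyzer.py -- all values
# are placeholders that you replace with your own credentials.
DB_HOST = "your-ec2-host"
DB_USER = "your-db-user"
DB_PASSWORD = "your-db-password"
DB_NAME = "your-db-name"

AWS_ACCESS_KEY = "YOUR_ACCESS_KEY"
AWS_SECRET_KEY = "YOUR_SECRET_KEY"
AWS_SQS_QUEUE_NAME = "your-queue-name"
```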
- Run `sentiment_analyzer.py` using the following command: `python3 sentiment_analyzer.py path_to_model`
- This file ingests all messages from the SQS queue and runs them through the model; the predicted output labels are then written to the corresponding rows in the MySQL DB (see the sketch below).
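For orientation, a hedged sketch of such an ingest-and-predict loop is shown below; it assumes the saved model is a Spark `PipelineModel`, that `pymysql` is used for the database, and it invents table and column names purely for illustration:

```python
# Hypothetical sketch of the analyzer loop -- credentials, table/column names
# and the model-loading details are assumptions, not the project's real code.
import json
import sys

import boto3
import pymysql
from pyspark.sql import SparkSession
from pyspark.ml import PipelineModel

model_path = sys.argv[1]  # path_to_model from the command line

spark = SparkSession.builder.appName("sentiment_analyzer").getOrCreate()
model = PipelineModel.load(model_path)

sqs = boto3.resource("sqs", region_name="us-east-1")
queue = sqs.get_queue_by_name(QueueName="your-queue-name")  # placeholder

db = pymysql.connect(host="localhost", user="user", password="password",
                     database="reviews")                    # placeholder credentials

while True:
    for message in queue.receive_messages(MaxNumberOfMessages=10, WaitTimeSeconds=20):
        review = json.loads(message.body)

        # Score the review text with the trained pipeline.
        df = spark.createDataFrame([(review["review_text"],)], ["reviewText"])
        prediction = model.transform(df).first()["prediction"]

        # Persist the predicted label for the corresponding product review.
        with db.cursor() as cursor:
            cursor.execute(
                "UPDATE reviews SET sentiment = %s WHERE product_id = %s",
                (int(prediction), review["product_id"]),
            )
        db.commit()

        message.delete()
```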
- On the same EC2 instance, set up a Grafana server.
- In the Grafana server, add your existing MySQL server as a data source.
- Import `grafana_dashboard.json`; you should now be able to visualize the latest data from the database.