RUNNING.md

Phase 0 - Initial setup

To install all the required libraries and dependencies, run the following command in your virtual environment:

  • pip3 install -r requirements.txt

Phase 1 - Setting up training dataset

  • Download the initial Amazon dataset from here

  • Split the dataset into manageable chunks by running split_dataset.py as follows (a sketch of what this script does appears after this list):

    sudo ${SPARK_HOME}/bin/spark-submit split_dataset.py Electronics.json output

  • Upload the split files to S3
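For reference, here is a minimal sketch of what a split_dataset.py-style job could look like; the partition count of 50 and the argument handling are illustrative assumptions, not necessarily the repository's exact code:

    import sys
    from pyspark.sql import SparkSession

    if __name__ == "__main__":
        # e.g. spark-submit split_dataset.py Electronics.json output
        input_path, output_path = sys.argv[1], sys.argv[2]

        spark = SparkSession.builder.appName("split_dataset").getOrCreate()

        # Each line of the Amazon dump is one JSON review record.
        reviews = spark.read.json(input_path)

        # Repartition into manageable chunks; each partition is written out
        # as a separate part file that can then be uploaded to S3.
        reviews.repartition(50).write.mode("overwrite").json(output_path)

        spark.stop()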

Phase 2 - Training the model

  • Set up an EMR cluster to run a PySpark script (refer to Assignment 5; see the EMR screenshot)

  • Run the model_creation.py file (to create a model from the split training data) on the cluster with a suitable configuration as follows (a sketch of the training job appears after this list):

    spark-submit --deploy-mode cluster --conf spark.yarn.maxAppAttempts=1 s3://amazon-product-recommender/scripts/model_creation.py s3://amazon-product-recommender/ElectronicProductDataZIP/ s3://amazon-product-recommender/output

  • This creates a model and writes it to S3
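As a rough guide, a model_creation.py-style training job might look like the sketch below. The label rule (4-5 stars counts as positive), the pipeline stages, and the column names reviewText and overall (from the public Amazon review schema) are assumptions for illustration:

    import sys
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, when
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import Tokenizer, HashingTF, IDF
    from pyspark.ml.classification import LogisticRegression

    if __name__ == "__main__":
        input_path, output_path = sys.argv[1], sys.argv[2]

        spark = SparkSession.builder.appName("model_creation").getOrCreate()

        reviews = spark.read.json(input_path).na.drop(subset=["reviewText", "overall"])

        # Derive a binary sentiment label: 4-5 star reviews are positive.
        labeled = reviews.withColumn(
            "label", when(col("overall") >= 4.0, 1.0).otherwise(0.0)
        )

        pipeline = Pipeline(stages=[
            Tokenizer(inputCol="reviewText", outputCol="words"),
            HashingTF(inputCol="words", outputCol="rawFeatures"),
            IDF(inputCol="rawFeatures", outputCol="features"),
            LogisticRegression(maxIter=10),
        ])

        model = pipeline.fit(labeled)
        # Saving to an s3:// output path places the model in S3 when run on EMR.
        model.write().overwrite().save(output_path)

        spark.stop()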

Phase 3 - Scraping Amazon data

  • This script was run as an AWS Lambda function (see the Lambda screenshot). However, if required, it can be run locally as follows:

  • First, create a queue in SQS, then add your AWS_ACCESS_KEY, AWS_SECRET_KEY, and AWS_SQS_QUEUE_NAME to scraper.py.

  • Run the scraper script with the command below; it scrapes the required data from Amazon.ca and pushes it to the SQS queue (a sketch of the SQS push appears after this list):

    python3 scraper.py

  • You should be able to see some messages in your SQS queue (see SQS Screenshot 1)
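For reference, here is a minimal sketch of the SQS side of scraper.py, assuming boto3; the region, the credential placeholders, and the message fields are hypothetical, and the actual scraping of Amazon.ca is omitted:

    import json
    import boto3

    AWS_ACCESS_KEY = "your-access-key"        # placeholders, as in scraper.py
    AWS_SECRET_KEY = "your-secret-key"
    AWS_SQS_QUEUE_NAME = "your-queue-name"

    sqs = boto3.resource(
        "sqs",
        aws_access_key_id=AWS_ACCESS_KEY,
        aws_secret_access_key=AWS_SECRET_KEY,
        region_name="us-east-1",              # assumed region
    )
    queue = sqs.get_queue_by_name(QueueName=AWS_SQS_QUEUE_NAME)

    # A scraped review might look like this (hypothetical fields).
    review = {"asin": "B000EXAMPLE", "reviewText": "Works great!", "overall": 5.0}
    queue.send_message(MessageBody=json.dumps(review))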

Phase 4 - Ingesting SQS message queue into DB

  • Initialize an EC2 instance and set up MySQL Server (see the EC2 screenshot). Add the DB credentials to the sentiment_analyzer.py file.

  • Add your AWS_ACCESS_KEY, AWS_SECRET_KEY and AWS_SQS_QUEUE_NAME to sentiment_analyzer.py

  • Run sentiment_analyzer.py using the following command:

    python3 sentiment_analyzer.py path_to_model

  • This script ingests all messages from the SQS queue, runs them through the model, and updates the predicted labels in the MySQL DB (a sketch appears below).
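A minimal sketch of that loop, assuming boto3, pymysql, and a reviews table with asin and sentiment columns; the table, column, and credential values are illustrative only:

    import json
    import sys

    import boto3
    import pymysql
    from pyspark.ml import PipelineModel
    from pyspark.sql import SparkSession

    model_path = sys.argv[1]  # path_to_model
    spark = SparkSession.builder.appName("sentiment_analyzer").getOrCreate()
    model = PipelineModel.load(model_path)

    # Credentials and queue name come from the constants added to the file above.
    sqs = boto3.resource("sqs", region_name="us-east-1")   # assumed region
    queue = sqs.get_queue_by_name(QueueName="your-queue-name")

    db = pymysql.connect(host="localhost", user="root",
                         password="your-password", database="reviews")

    while True:
        messages = queue.receive_messages(MaxNumberOfMessages=10, WaitTimeSeconds=5)
        if not messages:
            break  # queue drained
        rows = [json.loads(m.body) for m in messages]

        # Score the batch with the saved Spark pipeline.
        scored = model.transform(spark.createDataFrame(rows))

        with db.cursor() as cur:
            for row in scored.select("asin", "prediction").collect():
                cur.execute(
                    "UPDATE reviews SET sentiment = %s WHERE asin = %s",
                    (int(row["prediction"]), row["asin"]),
                )
        db.commit()

        for m in messages:
            m.delete()  # remove processed messages from the queue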

Phase 5 - Visualizing the data on Grafana

  • On the same EC2 instance, set up a Grafana server.

  • In Grafana, add a data source pointing at the existing MySQL server.

  • Import grafana_dashboard.json; you should now be able to visualize the latest data from the database.