alfredodimassimo/movie_predictions

Alfredo Di Massimo

BrainStation Data Science Diploma Program

Overview:

This is my capstone submission for the January 2022 BrainStation Data Science cohort.
The focus of this project was to use machine learning and Natural Language Processing (NLP) to predict the sentiment of movie reviews.

I was particularly interested in understanding which factors influence a reviewer's opinion of a movie, and in applying
those findings to give producers actionable insights and improve marketing tactics.

The code for this project is provided in five Jupyter notebooks found in the "Notebooks" folder:
1. Data Loading and Merging
2. Cleaning and EDA
3. Modeling
4. Findings and Interpretation
5. Appendix (NOTE: This notebook is not needed to run the others. It contains the code for importing the raw data, which takes at least 4 hours to run.)

A requirements.txt file is also included, listing the packages required to run the notebooks.

The data used for this project was derived from the following sources:
- IMDb Movies Dataset: https://www.imdb.com/interfaces/
- IMDb Movie Reviews: https://paperswithcode.com/dataset/imdb-movie-reviews

The data is stored in the following Google Drive folder: https://drive.google.com/drive/folders/1ZDrvSZHhjUW53fQ2dlKhZoanF8ULMzVM?usp=sharing
It contains a "Reviews" folder with the original IMDb Movie Reviews dataset, as well as the .csv and .gz files required to run the notebooks.
NOTE: The "Reviews" folder contains ~85MB of review data along with the README file from the original study. These reviews are
already saved in the .csv files in the main directory, so the "Reviews" folder is not required to run the notebooks.
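
As a rough illustration of working with the .gz files, the sketch below reads a gzip-compressed, tab-separated file using only the Python standard library. It assumes the .gz files follow the published IMDb dataset format (tab-separated, with `\N` marking missing values). The helper name `read_gz_table`, the file name, and the sample row are made up for the example and are not taken from the notebooks.

```python
import csv
import gzip
import tempfile
from pathlib import Path


def read_gz_table(path, delimiter="\t", missing="\\N"):
    """Read a gzip-compressed delimited text file into a list of dicts.

    Cells equal to `missing` are converted to None.
    (Hypothetical helper, not part of the project code.)
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE: treat quote characters as ordinary text
        reader = csv.DictReader(handle, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        return [
            {key: (None if value == missing else value) for key, value in row.items()}
            for row in reader
        ]


# Demo on a tiny stand-in file (illustrative row, not project data).
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.tsv.gz"
    with gzip.open(sample_path, mode="wt", encoding="utf-8", newline="") as out:
        out.write("tconst\tprimaryTitle\tstartYear\n")
        out.write("tt0000001\tExample Title\t\\N\n")
    rows = read_gz_table(sample_path)

print(rows)
```

For the full-size files, a dataframe library is likely more practical than a list of dicts; the point here is only the compression and missing-value handling.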

Lastly, the "models_functions" folder contains the fitted models and the list of custom stop words required to run the notebooks. It is stored in the same Google Drive folder linked above.
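
To show what a custom stop word list typically does in a review-sentiment pipeline, here is a minimal sketch that lowercases a review, splits it into tokens, and drops stop words before any modeling step. The word list and the `tokenize` function are hypothetical stand-ins; the project's actual list is the one stored in "models_functions", and the notebooks may tokenize differently.

```python
import re

# Hypothetical custom stop words, for illustration only.
# Words such as "movie" or "film" tend to appear in most reviews and
# usually say little about sentiment, which is one common reason to filter them.
CUSTOM_STOP_WORDS = {"movie", "film", "the", "a", "was", "this"}


def tokenize(review, stop_words=CUSTOM_STOP_WORDS):
    """Lowercase a review, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", review.lower())
    return [token for token in tokens if token not in stop_words]


tokens = tokenize("This movie was a delight, the cast was superb!")
print(tokens)  # ['delight', 'cast', 'superb']
```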

For questions, do not hesitate to contact me at [email protected]
