Skip to content

AURALIA-MALIK/dsc-phase-1-project-v2-4

 
 

Repository files navigation

Final Project Submission

Please fill out:

  • Student name: AURALIA ADILLA MBOYA
  • Student pace: Full Time
  • Scheduled project review date/time: Nov 20th/ 11:59pm
  • Instructor name: Mark Tiba
  • Blog post URL: N/A

PROJECT OUTLINE

  1. Introduction
  2. Business Understanding
    • Business problem
    • Objectives
  3. Data Understanding
  4. Data preparation
    • Data Loading
    • Data cleaning
    • Data Analysis
  5. Exploratory Descriptive Analysis (EDA)
    • Translating data into visual context
    • Plotting of graphs.
  6. Conclusion
  7. Recommendations

PHASE 1 PROJECT : Microsoft Film Prduction Studio

PROJECT OVERVIEW

I will use exploratory data analysis to produce insights for a business stakeholder in this segment.

I'll walk you through my research findings and how I turn them into useful information that stakeholders can use to guide their decision-making.

Screenshot

BUSINESS UNDERSTANDING

Business Problem

Microsoft sees all the big companies creating original video content and they want to get in on the fun. They have decided to create a new movie studio, but they don’t know anything about creating movies.They have hired you to help them better understand the movie industry. Your team is charged with exploring what type of films are currently doing the best at the box office. You must then translate those findings into actionable insights that the head of Microsoft's new movie studio can use to help decide what type of films to create.

Objectives

  1. What is the correlation between the genre and movie runtime?
  2. Which genre has the highest rating?
  3. Which of the genres has the highest production budget?
  4. Which Genre has the highest world wide gross?
  5. Which is the most voted for genre?

DATA UNDERSTANDING

The majority of the information I used for this project came from a zipped folder that contained materials provided by the school. Since they have different file formats, they were all compressed into one folder.

The URLs to the data that I will be modifying for this project are listed below:

a. Box Office Mojo

b. IMDB

c. Rotten Tomatoes

d. TheMovieDB

e. The Numbers

For a film studio to exist or be successful, we must conduct research, comprehend the information from the content provided, choose the right performers, and identify the top authors and writers for the various genres. To make Microsoft Film Studio successful, we will need to comprehend all the facts at our disposal. Four of These links' data were utilized for this project.

DATA PREPARATION

I'll be transforming data into usable format from this point on. Screenshot

A TABLE OF RELATIONSHIPS

The relationships shown in the ERD below are what our datasets should have once they have been cleaned up in order for the stakeholder to understand what we are attempting to do.

Screenshot

LOADING DATA

# loading allthe necessary libraries
#For a more user-friendly data representation, import Pandas as pd.
#For the SQL database,import sqlite3
#import Numpy for arrays as np
#import Seaborn and Matplotlib for visualizations
#import json for the available structured data

import pandas as pd
import sqlite3
import numpy as np
import seaborn as sns 
import json
import matplotlib.pyplot as plt
%matplotlib inline
import csv
#Verifying that all necessary datasets have successfully loaded
#Checking for the necessary datasets
!ls -a
.
..
.canvas
.git
.gitignore
.ipynb_checkpoints
CONTRIBUTING.md
LICENSE.md
Production Budget  Vs Genres.png
Production Budget Vs Genres.png
README.md
awesome.gif
bom.movie_gross.csv
im.db
movie_data_erd.jpeg
rt.movie_info.tsv
rt.reviews.tsv
student.ipynb
tmdb.movies.csv
tn.movie_budgets.csv
untitled
zippedData
#loading the box office mojo file
movie_gross=pd.read_csv ('bom.movie_gross.csv')
movie_gross
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
title studio domestic_gross foreign_gross year
0 Toy Story 3 BV 415000000.0 652000000 2010
1 Alice in Wonderland (2010) BV 334200000.0 691300000 2010
2 Harry Potter and the Deathly Hallows Part 1 WB 296000000.0 664300000 2010
3 Inception WB 292600000.0 535700000 2010
4 Shrek Forever After P/DW 238700000.0 513900000 2010
... ... ... ... ... ...
3382 The Quake Magn. 6200.0 NaN 2018
3383 Edward II (2018 re-release) FM 4800.0 NaN 2018
3384 El Pacto Sony 2500.0 NaN 2018
3385 The Swan Synergetic 2400.0 NaN 2018
3386 An Actor Prepares Grav. 1700.0 NaN 2018

3387 rows × 5 columns

#loading movie_budgets file
movie_budgets = pd.read_csv('tn.movie_budgets.csv')
movie_budgets
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
id release_date movie production_budget domestic_gross worldwide_gross
0 1 Dec 18, 2009 Avatar $425,000,000 $760,507,625 $2,776,345,279
1 2 May 20, 2011 Pirates of the Caribbean: On Stranger Tides $410,600,000 $241,063,875 $1,045,663,875
2 3 Jun 7, 2019 Dark Phoenix $350,000,000 $42,762,350 $149,762,350
3 4 May 1, 2015 Avengers: Age of Ultron $330,600,000 $459,005,868 $1,403,013,963
4 5 Dec 15, 2017 Star Wars Ep. VIII: The Last Jedi $317,000,000 $620,181,382 $1,316,721,747
... ... ... ... ... ... ...
5777 78 Dec 31, 2018 Red 11 $7,000 $0 $0
5778 79 Apr 2, 1999 Following $6,000 $48,482 $240,495
5779 80 Jul 13, 2005 Return to the Land of Wonders $5,000 $1,338 $1,338
5780 81 Sep 29, 2015 A Plague So Pleasant $1,400 $0 $0
5781 82 Aug 5, 2005 My Date With Drew $1,100 $181,041 $181,041

5782 rows × 6 columns

#loading the imdb file
movie_info= pd.read_csv('tmdb.movies.csv')
movie_info
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Unnamed: 0 genre_ids id original_language original_title popularity release_date title vote_average vote_count
0 0 [12, 14, 10751] 12444 en Harry Potter and the Deathly Hallows: Part 1 33.533 2010-11-19 Harry Potter and the Deathly Hallows: Part 1 7.7 10788
1 1 [14, 12, 16, 10751] 10191 en How to Train Your Dragon 28.734 2010-03-26 How to Train Your Dragon 7.7 7610
2 2 [12, 28, 878] 10138 en Iron Man 2 28.515 2010-05-07 Iron Man 2 6.8 12368
3 3 [16, 35, 10751] 862 en Toy Story 28.005 1995-11-22 Toy Story 7.9 10174
4 4 [28, 878, 12] 27205 en Inception 27.920 2010-07-16 Inception 8.3 22186
... ... ... ... ... ... ... ... ... ... ...
26512 26512 [27, 18] 488143 en Laboratory Conditions 0.600 2018-10-13 Laboratory Conditions 0.0 1
26513 26513 [18, 53] 485975 en _EXHIBIT_84xxx_ 0.600 2018-05-01 _EXHIBIT_84xxx_ 0.0 1
26514 26514 [14, 28, 12] 381231 en The Last One 0.600 2018-10-01 The Last One 0.0 1
26515 26515 [10751, 12, 28] 366854 en Trailer Made 0.600 2018-06-22 Trailer Made 0.0 1
26516 26516 [53, 27] 309885 en The Church 0.600 2018-10-05 The Church 0.0 1

26517 rows × 10 columns

#loading movie info file
#to check if there is any relevant data 
movie_info= pd.read_table('rt.movie_info.tsv')
movie_info
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
id synopsis rating genre director writer theater_date dvd_date currency box_office runtime studio
0 1 This gritty, fast-paced, and innovative police... R Action and Adventure|Classics|Drama William Friedkin Ernest Tidyman Oct 9, 1971 Sep 25, 2001 NaN NaN 104 minutes NaN
1 3 New York City, not-too-distant-future: Eric Pa... R Drama|Science Fiction and Fantasy David Cronenberg David Cronenberg|Don DeLillo Aug 17, 2012 Jan 1, 2013 $ 600,000 108 minutes Entertainment One
2 5 Illeana Douglas delivers a superb performance ... R Drama|Musical and Performing Arts Allison Anders Allison Anders Sep 13, 1996 Apr 18, 2000 NaN NaN 116 minutes NaN
3 6 Michael Douglas runs afoul of a treacherous su... R Drama|Mystery and Suspense Barry Levinson Paul Attanasio|Michael Crichton Dec 9, 1994 Aug 27, 1997 NaN NaN 128 minutes NaN
4 7 NaN NR Drama|Romance Rodney Bennett Giles Cooper NaN NaN NaN NaN 200 minutes NaN
... ... ... ... ... ... ... ... ... ... ... ... ...
1555 1996 Forget terrorists or hijackers -- there's a ha... R Action and Adventure|Horror|Mystery and Suspense NaN NaN Aug 18, 2006 Jan 2, 2007 $ 33,886,034 106 minutes New Line Cinema
1556 1997 The popular Saturday Night Live sketch was exp... PG Comedy|Science Fiction and Fantasy Steve Barron Terry Turner|Tom Davis|Dan Aykroyd|Bonnie Turner Jul 23, 1993 Apr 17, 2001 NaN NaN 88 minutes Paramount Vantage
1557 1998 Based on a novel by Richard Powell, when the l... G Classics|Comedy|Drama|Musical and Performing Arts Gordon Douglas NaN Jan 1, 1962 May 11, 2004 NaN NaN 111 minutes NaN
1558 1999 The Sandlot is a coming-of-age story about a g... PG Comedy|Drama|Kids and Family|Sports and Fitness David Mickey Evans David Mickey Evans|Robert Gunter Apr 1, 1993 Jan 29, 2002 NaN NaN 101 minutes NaN
1559 2000 Suspended from the force, Paris cop Hubert is ... R Action and Adventure|Art House and Internation... NaN Luc Besson Sep 27, 2001 Feb 11, 2003 NaN NaN 94 minutes Columbia Pictures

1560 rows × 12 columns

#Using the encode attribute to load a tsv file and display tab separated values
rt_reviews = pd.read_table('rt.reviews.tsv', encoding='unicode_escape') 
rt_reviews
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
id review rating fresh critic top_critic publisher date
0 3 A distinctly gallows take on contemporary fina... 3/5 fresh PJ Nabarro 0 Patrick Nabarro November 10, 2018
1 3 It's an allegory in search of a meaning that n... NaN rotten Annalee Newitz 0 io9.com May 23, 2018
2 3 ... life lived in a bubble in financial dealin... NaN fresh Sean Axmaker 0 Stream on Demand January 4, 2018
3 3 Continuing along a line introduced in last yea... NaN fresh Daniel Kasman 0 MUBI November 16, 2017
4 3 ... a perverse twist on neorealism... NaN fresh NaN 0 Cinema Scope October 12, 2017
... ... ... ... ... ... ... ... ...
54427 2000 The real charm of this trifle is the deadpan c... NaN fresh Laura Sinagra 1 Village Voice September 24, 2002
54428 2000 NaN 1/5 rotten Michael Szymanski 0 Zap2it.com September 21, 2005
54429 2000 NaN 2/5 rotten Emanuel Levy 0 EmanuelLevy.Com July 17, 2005
54430 2000 NaN 2.5/5 rotten Christopher Null 0 Filmcritic.com September 7, 2003
54431 2000 NaN 3/5 fresh Nicolas Lacroix 0 Showbizz.net November 12, 2002

54432 rows × 8 columns

#Sqlite3 connection to the database for reading the files
conn = sqlite3.connect("im.db")
conn
<sqlite3.Connection at 0x2b2c795b3f0>
#load the necessary data from the movie_ratings sql file
movie_ratings = pd.read_sql_query("""
SELECT *
FROM movie_ratings
LIMIT 10
;
""", conn)
movie_ratings
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
movie_id averagerating numvotes
0 tt10356526 8.3 31
1 tt10384606 8.9 559
2 tt1042974 6.4 20
3 tt1043726 4.2 50352
4 tt1060240 6.5 21
5 tt1069246 6.2 326
6 tt1094666 7.0 1613
7 tt1130982 6.4 571
8 tt1156528 7.2 265
9 tt1161457 4.2 148
#load the data from the movie_basics
movie_basics = pd.read_sql_query("""
SELECT *
FROM movie_basics
;
""", conn)
movie_basics
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
movie_id primary_title original_title start_year runtime_minutes genres
0 tt0063540 Sunghursh Sunghursh 2013 175.0 Action,Crime,Drama
1 tt0066787 One Day Before the Rainy Season Ashad Ka Ek Din 2019 114.0 Biography,Drama
2 tt0069049 The Other Side of the Wind The Other Side of the Wind 2018 122.0 Drama
3 tt0069204 Sabse Bada Sukh Sabse Bada Sukh 2018 NaN Comedy,Drama
4 tt0100275 The Wandering Soap Opera La Telenovela Errante 2017 80.0 Comedy,Drama,Fantasy
... ... ... ... ... ... ...
146139 tt9916538 Kuambil Lagi Hatiku Kuambil Lagi Hatiku 2019 123.0 Drama
146140 tt9916622 Rodolpho Teóphilo - O Legado de um Pioneiro Rodolpho Teóphilo - O Legado de um Pioneiro 2015 NaN Documentary
146141 tt9916706 Dankyavar Danka Dankyavar Danka 2013 NaN Comedy
146142 tt9916730 6 Gunn 6 Gunn 2017 116.0 None
146143 tt9916754 Chico Albuquerque - Revelações Chico Albuquerque - Revelações 2013 NaN Documentary

146144 rows × 6 columns

#load movie_akas to display relevant data needed.
movie_akas = pd.read_sql_query("""
SELECT *
FROM movie_akas
;
""", conn)
movie_akas
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
movie_id ordering title region language types attributes is_original_title
0 tt0369610 10 Джурасик свят BG bg None None 0.0
1 tt0369610 11 Jurashikku warudo JP None imdbDisplay None 0.0
2 tt0369610 12 Jurassic World: O Mundo dos Dinossauros BR None imdbDisplay None 0.0
3 tt0369610 13 O Mundo dos Dinossauros BR None None short title 0.0
4 tt0369610 14 Jurassic World FR None imdbDisplay None 0.0
... ... ... ... ... ... ... ... ...
331698 tt9827784 2 Sayonara kuchibiru None None original None 1.0
331699 tt9827784 3 Farewell Song XWW en imdbDisplay None 0.0
331700 tt9880178 1 La atención None None original None 1.0
331701 tt9880178 2 La atención ES None None None 0.0
331702 tt9880178 3 The Attention XWW en imdbDisplay None 0.0

331703 rows × 8 columns

Data Cleaning

I'll be Correcting or deleting inaccurate, corrupted, improperly formatted, duplicate, or incomplete data from the relevant datasets.

#starting data cleaning from the first dataset
#cheecking for any erraneous data, null values or incomplete
movie_gross
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
title studio domestic_gross foreign_gross year
0 Toy Story 3 BV 415000000.0 652000000 2010
1 Alice in Wonderland (2010) BV 334200000.0 691300000 2010
2 Harry Potter and the Deathly Hallows Part 1 WB 296000000.0 664300000 2010
3 Inception WB 292600000.0 535700000 2010
4 Shrek Forever After P/DW 238700000.0 513900000 2010
... ... ... ... ... ...
3382 The Quake Magn. 6200.0 NaN 2018
3383 Edward II (2018 re-release) FM 4800.0 NaN 2018
3384 El Pacto Sony 2500.0 NaN 2018
3385 The Swan Synergetic 2400.0 NaN 2018
3386 An Actor Prepares Grav. 1700.0 NaN 2018

3387 rows × 5 columns

#convert domestic_gross float to integer type
movie_gross['domestic_gross'] = movie_gross['domestic_gross'].fillna(0).astype(int)
#confirm the conversion
movie_gross
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
title studio domestic_gross foreign_gross year
0 Toy Story 3 BV 415000000 652000000 2010
1 Alice in Wonderland (2010) BV 334200000 691300000 2010
2 Harry Potter and the Deathly Hallows Part 1 WB 296000000 664300000 2010
3 Inception WB 292600000 535700000 2010
4 Shrek Forever After P/DW 238700000 513900000 2010
... ... ... ... ... ...
3382 The Quake Magn. 6200 NaN 2018
3383 Edward II (2018 re-release) FM 4800 NaN 2018
3384 El Pacto Sony 2500 NaN 2018
3385 The Swan Synergetic 2400 NaN 2018
3386 An Actor Prepares Grav. 1700 NaN 2018

3387 rows × 5 columns

#convert domestic_gross float type to integer type
movie_gross['domestic_gross'].astype(int)
0       415000000
1       334200000
2       296000000
3       292600000
4       238700000
          ...    
3382         6200
3383         4800
3384         2500
3385         2400
3386         1700
Name: domestic_gross, Length: 3387, dtype: int32
#check for any null values
movie_gross.isna().sum()
title                0
studio               5
domestic_gross       0
foreign_gross     1350
year                 0
dtype: int64
#checked for null values
#null values were found to be 1350 on foreign_gross column, 5 on studio
#Considering that they will be needed later, I have chosen to drop. 
#drop all null values in the datasets
movie_gross.dropna()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
title studio domestic_gross foreign_gross year
0 Toy Story 3 BV 415000000 652000000 2010
1 Alice in Wonderland (2010) BV 334200000 691300000 2010
2 Harry Potter and the Deathly Hallows Part 1 WB 296000000 664300000 2010
3 Inception WB 292600000 535700000 2010
4 Shrek Forever After P/DW 238700000 513900000 2010
... ... ... ... ... ...
3275 I Still See You LGF 1400 1500000 2018
3286 The Catcher Was a Spy IFC 725000 229000 2018
3309 Time Freak Grindstone 10000 256000 2018
3342 Reign of Judges: Title of Liberty - Concept Short Darin Southa 93200 5200 2018
3353 Antonio Lopez 1970: Sex Fashion & Disco FM 43200 30000 2018

2033 rows × 5 columns

#checking for any null values to clean 
movie_budgets.isna().sum()
id                   0
release_date         0
movie                0
production_budget    0
domestic_gross       0
worldwide_gross      0
dtype: int64
#calling movie_budgets for cleaning
#checking for any null values
movie_budgets
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
id release_date movie production_budget domestic_gross worldwide_gross
0 1 Dec 18, 2009 Avatar $425,000,000 $760,507,625 $2,776,345,279
1 2 May 20, 2011 Pirates of the Caribbean: On Stranger Tides $410,600,000 $241,063,875 $1,045,663,875
2 3 Jun 7, 2019 Dark Phoenix $350,000,000 $42,762,350 $149,762,350
3 4 May 1, 2015 Avengers: Age of Ultron $330,600,000 $459,005,868 $1,403,013,963
4 5 Dec 15, 2017 Star Wars Ep. VIII: The Last Jedi $317,000,000 $620,181,382 $1,316,721,747
... ... ... ... ... ... ...
5777 78 Dec 31, 2018 Red 11 $7,000 $0 $0
5778 79 Apr 2, 1999 Following $6,000 $48,482 $240,495
5779 80 Jul 13, 2005 Return to the Land of Wonders $5,000 $1,338 $1,338
5780 81 Sep 29, 2015 A Plague So Pleasant $1,400 $0 $0
5781 82 Aug 5, 2005 My Date With Drew $1,100 $181,041 $181,041

5782 rows × 6 columns

#In the columns production budget, domestic gross, and worldwide gross,for movie_budgets dataframe replacing $ and , data to integers.
movie_budgets['production_budget'] = movie_budgets['production_budget'].str.replace('$','')
movie_budgets['production_budget'] = movie_budgets['production_budget'].str.replace(',','')

movie_budgets['domestic_gross'] = movie_budgets['domestic_gross'].str.replace('$','')
movie_budgets['domestic_gross'] = movie_budgets['domestic_gross'].str.replace(',','')

movie_budgets['worldwide_gross'] = movie_budgets['worldwide_gross'].str.replace('$','')
movie_budgets['worldwide_gross'] = movie_budgets['worldwide_gross'].str.replace(',','')

movie_budgets
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
id release_date movie production_budget domestic_gross worldwide_gross
0 1 Dec 18, 2009 Avatar 425000000 760507625 2776345279
1 2 May 20, 2011 Pirates of the Caribbean: On Stranger Tides 410600000 241063875 1045663875
2 3 Jun 7, 2019 Dark Phoenix 350000000 42762350 149762350
3 4 May 1, 2015 Avengers: Age of Ultron 330600000 459005868 1403013963
4 5 Dec 15, 2017 Star Wars Ep. VIII: The Last Jedi 317000000 620181382 1316721747
... ... ... ... ... ... ...
5777 78 Dec 31, 2018 Red 11 7000 0 0
5778 79 Apr 2, 1999 Following 6000 48482 240495
5779 80 Jul 13, 2005 Return to the Land of Wonders 5000 1338 1338
5780 81 Sep 29, 2015 A Plague So Pleasant 1400 0 0
5781 82 Aug 5, 2005 My Date With Drew 1100 181041 181041

5782 rows × 6 columns

#calling the next dataset, movie_info
movie_info
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
id synopsis rating genre director writer theater_date dvd_date currency box_office runtime studio
0 1 This gritty, fast-paced, and innovative police... R Action and Adventure|Classics|Drama William Friedkin Ernest Tidyman Oct 9, 1971 Sep 25, 2001 NaN NaN 104 minutes NaN
1 3 New York City, not-too-distant-future: Eric Pa... R Drama|Science Fiction and Fantasy David Cronenberg David Cronenberg|Don DeLillo Aug 17, 2012 Jan 1, 2013 $ 600,000 108 minutes Entertainment One
2 5 Illeana Douglas delivers a superb performance ... R Drama|Musical and Performing Arts Allison Anders Allison Anders Sep 13, 1996 Apr 18, 2000 NaN NaN 116 minutes NaN
3 6 Michael Douglas runs afoul of a treacherous su... R Drama|Mystery and Suspense Barry Levinson Paul Attanasio|Michael Crichton Dec 9, 1994 Aug 27, 1997 NaN NaN 128 minutes NaN
4 7 NaN NR Drama|Romance Rodney Bennett Giles Cooper NaN NaN NaN NaN 200 minutes NaN
... ... ... ... ... ... ... ... ... ... ... ... ...
1555 1996 Forget terrorists or hijackers -- there's a ha... R Action and Adventure|Horror|Mystery and Suspense NaN NaN Aug 18, 2006 Jan 2, 2007 $ 33,886,034 106 minutes New Line Cinema
1556 1997 The popular Saturday Night Live sketch was exp... PG Comedy|Science Fiction and Fantasy Steve Barron Terry Turner|Tom Davis|Dan Aykroyd|Bonnie Turner Jul 23, 1993 Apr 17, 2001 NaN NaN 88 minutes Paramount Vantage
1557 1998 Based on a novel by Richard Powell, when the l... G Classics|Comedy|Drama|Musical and Performing Arts Gordon Douglas NaN Jan 1, 1962 May 11, 2004 NaN NaN 111 minutes NaN
1558 1999 The Sandlot is a coming-of-age story about a g... PG Comedy|Drama|Kids and Family|Sports and Fitness David Mickey Evans David Mickey Evans|Robert Gunter Apr 1, 1993 Jan 29, 2002 NaN NaN 101 minutes NaN
1559 2000 Suspended from the force, Paris cop Hubert is ... R Action and Adventure|Art House and Internation... NaN Luc Besson Sep 27, 2001 Feb 11, 2003 NaN NaN 94 minutes Columbia Pictures

1560 rows × 12 columns

#show all null values in the datasets
movie_info.isna().sum()
id                 0
synopsis          62
rating             3
genre              8
director         199
writer           449
theater_date     359
dvd_date         359
currency        1220
box_office      1220
runtime           30
studio          1066
dtype: int64
#The movie info dataset contains an excessive number of null values.
#synopsis 62,rating 3,genre8,director 199,writer 449,theater_date 359,dvd_date 359,currency 1220,box_office 1220,runtime 30,studio 1066

movie_info.dropna()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
id synopsis rating genre director writer theater_date dvd_date currency box_office runtime studio
1 3 New York City, not-too-distant-future: Eric Pa... R Drama|Science Fiction and Fantasy David Cronenberg David Cronenberg|Don DeLillo Aug 17, 2012 Jan 1, 2013 $ 600,000 108 minutes Entertainment One
6 10 Some cast and crew from NBC's highly acclaimed... PG-13 Comedy Jake Kasdan Mike White Jan 11, 2002 Jun 18, 2002 $ 41,032,915 82 minutes Paramount Pictures
7 13 Stewart Kane, an Irishman living in the Austra... R Drama Ray Lawrence Raymond Carver|Beatrix Christian Apr 27, 2006 Oct 2, 2007 $ 224,114 123 minutes Sony Pictures Classics
15 22 Two-time Academy Award Winner Kevin Spacey giv... R Comedy|Drama|Mystery and Suspense George Hickenlooper Norman Snider Dec 17, 2010 Apr 5, 2011 $ 1,039,869 108 minutes ATO Pictures
18 25 From ancient Japan's most enduring tale, the e... PG-13 Action and Adventure|Drama|Science Fiction and... Carl Erik Rinsch Chris Morgan|Hossein Amini Dec 25, 2013 Apr 1, 2014 $ 20,518,224 127 minutes Universal Pictures
... ... ... ... ... ... ... ... ... ... ... ... ...
1530 1968 This holiday season, acclaimed filmmaker Camer... PG Comedy|Drama Cameron Crowe Aline Brosh McKenna|Cameron Crowe Dec 23, 2011 Apr 3, 2012 $ 72,700,000 126 minutes 20th Century Fox
1537 1976 Embrace of the Serpent features the encounter,... NR Action and Adventure|Art House and International Ciro Guerra Ciro Guerra|Jacques Toulemonde Vidal Feb 17, 2016 Jun 21, 2016 $ 1,320,005 123 minutes Buffalo Films
1541 1980 A band of renegades on the run in outer space ... PG-13 Action and Adventure|Science Fiction and Fantasy Joss Whedon Joss Whedon Sep 30, 2005 Dec 20, 2005 $ 25,335,935 119 minutes Universal Pictures
1542 1981 Money, Fame and the Knowledge of English. In I... NR Comedy|Drama Gauri Shinde Gauri Shinde Oct 5, 2012 Nov 20, 2012 $ 1,416,189 129 minutes Eros Entertainment
1545 1985 A woman who joins the undead against her will ... R Horror|Mystery and Suspense Sebastian Gutierrez Sebastian Gutierrez Jun 1, 2007 Oct 9, 2007 $ 59,371 98 minutes IDP Distribution

235 rows × 12 columns

#cleaning data in rt_reviews
#call rt_reviews
rt_reviews
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
id review rating fresh critic top_critic publisher date
0 3 A distinctly gallows take on contemporary fina... 3/5 fresh PJ Nabarro 0 Patrick Nabarro November 10, 2018
1 3 It's an allegory in search of a meaning that n... NaN rotten Annalee Newitz 0 io9.com May 23, 2018
2 3 ... life lived in a bubble in financial dealin... NaN fresh Sean Axmaker 0 Stream on Demand January 4, 2018
3 3 Continuing along a line introduced in last yea... NaN fresh Daniel Kasman 0 MUBI November 16, 2017
4 3 ... a perverse twist on neorealism... NaN fresh NaN 0 Cinema Scope October 12, 2017
... ... ... ... ... ... ... ... ...
54427 2000 The real charm of this trifle is the deadpan c... NaN fresh Laura Sinagra 1 Village Voice September 24, 2002
54428 2000 NaN 1/5 rotten Michael Szymanski 0 Zap2it.com September 21, 2005
54429 2000 NaN 2/5 rotten Emanuel Levy 0 EmanuelLevy.Com July 17, 2005
54430 2000 NaN 2.5/5 rotten Christopher Null 0 Filmcritic.com September 7, 2003
54431 2000 NaN 3/5 fresh Nicolas Lacroix 0 Showbizz.net November 12, 2002

54432 rows × 8 columns

#checking for the sum of null values in this dataset
rt_reviews.isna().sum()
id                0
review         5563
rating        13517
fresh             0
critic         2722
top_critic        0
publisher       309
date              0
dtype: int64
#rt_reviews has too many null values
#review has 5563, rating 13517, critic 2722 and publisher has 309 null values
# drop all null values

rt_reviews.dropna()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
id review rating fresh critic top_critic publisher date
0 3 A distinctly gallows take on contemporary fina... 3/5 fresh PJ Nabarro 0 Patrick Nabarro November 10, 2018
6 3 Quickly grows repetitive and tiresome, meander... C rotten Eric D. Snider 0 EricDSnider.com July 17, 2013
7 3 Cronenberg is not a director to be daunted by ... 2/5 rotten Matt Kelemen 0 Las Vegas CityLife April 21, 2013
11 3 While not one of Cronenberg's stronger films, ... B- fresh Emanuel Levy 0 EmanuelLevy.Com February 3, 2013
12 3 Robert Pattinson works mighty hard to make Cos... 2/4 rotten Christian Toto 0 Big Hollywood January 15, 2013
... ... ... ... ... ... ... ... ...
54419 2000 Sleek, shallow, but frequently amusing. 2.5/4 fresh Gene Seymour 1 Newsday September 27, 2002
54420 2000 The spaniel-eyed Jean Reno infuses Hubert with... 3/4 fresh Megan Turner 1 New York Post September 27, 2002
54421 2000 Manages to be somewhat well-acted, not badly a... 1.5/4 rotten Bob Strauss 0 Los Angeles Daily News September 27, 2002
54422 2000 Arguably the best script that Besson has writt... 3.5/5 fresh Wade Major 0 Boxoffice Magazine September 27, 2002
54424 2000 Dawdles and drags when it should pop; it doesn... 1.5/5 rotten Manohla Dargis 1 Los Angeles Times September 26, 2002

33988 rows × 8 columns

Merging the relevant datasets

#checking for any NaN values
movie_ratings.isna().sum()
movie_id         0
averagerating    0
numvotes         0
dtype: int64
#checking for any null values
movie_basics.isna().sum()
movie_id               0
primary_title          0
original_title        21
start_year             0
runtime_minutes    31739
genres              5408
dtype: int64
#checking for null values
movie_akas.isna().sum()
movie_id                  0
ordering                  0
title                     0
region                53293
language             289988
types                163256
attributes           316778
is_original_title        25
dtype: int64
#eraaneous null values have been found in movie_akas
#replacing null values with 0
movie_akas.fillna(0, inplace= True)
movie_akas
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
movie_id ordering title region language types attributes is_original_title
0 tt0369610 10 Джурасик свят BG bg 0 0 0.0
1 tt0369610 11 Jurashikku warudo JP 0 imdbDisplay 0 0.0
2 tt0369610 12 Jurassic World: O Mundo dos Dinossauros BR 0 imdbDisplay 0 0.0
3 tt0369610 13 O Mundo dos Dinossauros BR 0 0 short title 0.0
4 tt0369610 14 Jurassic World FR 0 imdbDisplay 0 0.0
... ... ... ... ... ... ... ... ...
331698 tt9827784 2 Sayonara kuchibiru 0 0 original 0 1.0
331699 tt9827784 3 Farewell Song XWW en imdbDisplay 0 0.0
331700 tt9880178 1 La atención 0 0 original 0 1.0
331701 tt9880178 2 La atención ES 0 0 0 0.0
331702 tt9880178 3 The Attention XWW en imdbDisplay 0 0.0

331703 rows × 8 columns

#setting index of the dataframe
movie_ratings.set_index("movie_id")
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
averagerating numvotes
movie_id
tt10356526 8.3 31
tt10384606 8.9 559
tt1042974 6.4 20
tt1043726 4.2 50352
tt1060240 6.5 21
tt1069246 6.2 326
tt1094666 7.0 1613
tt1130982 6.4 571
tt1156528 7.2 265
tt1161457 4.2 148
#setting index for movies_basics
movie_basics.set_index("movie_id")
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
primary_title original_title start_year runtime_minutes genres
movie_id
tt0063540 Sunghursh Sunghursh 2013 175.0 Action,Crime,Drama
tt0066787 One Day Before the Rainy Season Ashad Ka Ek Din 2019 114.0 Biography,Drama
tt0069049 The Other Side of the Wind The Other Side of the Wind 2018 122.0 Drama
tt0069204 Sabse Bada Sukh Sabse Bada Sukh 2018 NaN Comedy,Drama
tt0100275 The Wandering Soap Opera La Telenovela Errante 2017 80.0 Comedy,Drama,Fantasy
... ... ... ... ... ...
tt9916538 Kuambil Lagi Hatiku Kuambil Lagi Hatiku 2019 123.0 Drama
tt9916622 Rodolpho Teóphilo - O Legado de um Pioneiro Rodolpho Teóphilo - O Legado de um Pioneiro 2015 NaN Documentary
tt9916706 Dankyavar Danka Dankyavar Danka 2013 NaN Comedy
tt9916730 6 Gunn 6 Gunn 2017 116.0 None
tt9916754 Chico Albuquerque - Revelações Chico Albuquerque - Revelações 2013 NaN Documentary

146144 rows × 5 columns

#mergin movie_basics and movie_ratings
#call new table basics_and_ratings
basics_and_ratings = movie_ratings.merge(movie_basics, on = 'movie_id', how = 'inner')
basics_and_ratings
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
movie_id averagerating numvotes primary_title original_title start_year runtime_minutes genres
0 tt10356526 8.3 31 Laiye Je Yaarian Laiye Je Yaarian 2019 117.0 Romance
1 tt10384606 8.9 559 Borderless Borderless 2019 87.0 Documentary
2 tt1042974 6.4 20 Just Inès Just Inès 2010 90.0 Drama
3 tt1043726 4.2 50352 The Legend of Hercules The Legend of Hercules 2014 99.0 Action,Adventure,Fantasy
4 tt1060240 6.5 21 Até Onde? Até Onde? 2011 73.0 Mystery,Thriller
5 tt1069246 6.2 326 Habana Eva Habana Eva 2010 106.0 Comedy,Romance
6 tt1094666 7.0 1613 The Hammer Hamill 2010 108.0 Biography,Drama,Sport
7 tt1130982 6.4 571 The Night Clerk Avant l'aube 2011 104.0 Drama,Thriller
8 tt1156528 7.2 265 Silent Sonata Circus Fantasticus 2011 77.0 Drama,War
9 tt1161457 4.2 148 Vanquisher The Vanquisher 2016 90.0 Action,Adventure,Sci-Fi
movie_akas.set_index('movie_id')
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
ordering title region language types attributes is_original_title
movie_id
tt0369610 10 Джурасик свят BG bg 0 0 0.0
tt0369610 11 Jurashikku warudo JP 0 imdbDisplay 0 0.0
tt0369610 12 Jurassic World: O Mundo dos Dinossauros BR 0 imdbDisplay 0 0.0
tt0369610 13 O Mundo dos Dinossauros BR 0 0 short title 0.0
tt0369610 14 Jurassic World FR 0 imdbDisplay 0 0.0
... ... ... ... ... ... ... ...
tt9827784 2 Sayonara kuchibiru 0 0 original 0 1.0
tt9827784 3 Farewell Song XWW en imdbDisplay 0 0.0
tt9880178 1 La atención 0 0 original 0 1.0
tt9880178 2 La atención ES 0 0 0 0.0
tt9880178 3 The Attention XWW en imdbDisplay 0 0.0

331703 rows × 7 columns

#merging basics_and_ratings & movie_akas
b_r_akas = basics_and_ratings.merge(movie_akas, on = 'movie_id', how= 'inner')
b_r_akas
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
movie_id averagerating numvotes primary_title original_title start_year runtime_minutes genres ordering title region language types attributes is_original_title
0 tt1042974 6.4 20 Just Inès Just Inès 2010 90.0 Drama 1 Just Inès 0 0 original 0 1.0
1 tt1042974 6.4 20 Just Inès Just Inès 2010 90.0 Drama 2 Samo Ines RS 0 imdbDisplay 0 0.0
2 tt1042974 6.4 20 Just Inès Just Inès 2010 90.0 Drama 3 Just Inès GB 0 0 0 0.0
3 tt1043726 4.2 50352 The Legend of Hercules The Legend of Hercules 2014 99.0 Action,Adventure,Fantasy 10 The Legend of Hercules 0 0 original 0 1.0
4 tt1043726 4.2 50352 The Legend of Hercules The Legend of Hercules 2014 99.0 Action,Adventure,Fantasy 11 Hércules - A Lenda Começa PT 0 imdbDisplay 0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
61 tt1156528 7.2 265 Silent Sonata Circus Fantasticus 2011 77.0 Drama,War 7 Circus Fantasticus 0 0 original 0 1.0
62 tt1156528 7.2 265 Silent Sonata Circus Fantasticus 2011 77.0 Drama,War 8 Circus Fantasticus FI sv imdbDisplay 0 0.0
63 tt1156528 7.2 265 Silent Sonata Circus Fantasticus 2011 77.0 Drama,War 9 Соната без думи BG bg 0 0 0.0
64 tt1161457 4.2 148 Vanquisher The Vanquisher 2016 90.0 Action,Adventure,Sci-Fi 1 Vanquisher US 0 0 new title 0.0
65 tt1161457 4.2 148 Vanquisher The Vanquisher 2016 90.0 Action,Adventure,Sci-Fi 2 The Vanquisher 0 0 original 0 1.0

66 rows × 15 columns

#setting index for movie_budgets
movie_budgets.set_index('domestic_gross','production_budget')
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
id release_date movie production_budget worldwide_gross
domestic_gross
760507625 1 Dec 18, 2009 Avatar 425000000 2776345279
241063875 2 May 20, 2011 Pirates of the Caribbean: On Stranger Tides 410600000 1045663875
42762350 3 Jun 7, 2019 Dark Phoenix 350000000 149762350
459005868 4 May 1, 2015 Avengers: Age of Ultron 330600000 1403013963
620181382 5 Dec 15, 2017 Star Wars Ep. VIII: The Last Jedi 317000000 1316721747
... ... ... ... ... ...
0 78 Dec 31, 2018 Red 11 7000 0
48482 79 Apr 2, 1999 Following 6000 240495
1338 80 Jul 13, 2005 Return to the Land of Wonders 5000 1338
0 81 Sep 29, 2015 A Plague So Pleasant 1400 0
181041 82 Aug 5, 2005 My Date With Drew 1100 181041

5782 rows × 5 columns

#setting index for movie_gross
movie_gross.set_index('domestic_gross', 'production_budget')
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
title studio foreign_gross year
domestic_gross
415000000 Toy Story 3 BV 652000000 2010
334200000 Alice in Wonderland (2010) BV 691300000 2010
296000000 Harry Potter and the Deathly Hallows Part 1 WB 664300000 2010
292600000 Inception WB 535700000 2010
238700000 Shrek Forever After P/DW 513900000 2010
... ... ... ... ...
6200 The Quake Magn. NaN 2018
4800 Edward II (2018 re-release) FM NaN 2018
2500 El Pacto Sony NaN 2018
2400 The Swan Synergetic NaN 2018
1700 An Actor Prepares Grav. NaN 2018

3387 rows × 4 columns

#merging tables to access data based on the logical relationships between them
#merging the movie_basics and movie_ratings
#call new table ratings_basics
joined_gross_budget = pd.concat([movie_gross,movie_budgets], axis=1)
joined_gross_budget
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
title studio domestic_gross foreign_gross year id release_date movie production_budget domestic_gross worldwide_gross
0 Toy Story 3 BV 415000000.0 652000000 2010.0 1 Dec 18, 2009 Avatar 425000000 760507625 2776345279
1 Alice in Wonderland (2010) BV 334200000.0 691300000 2010.0 2 May 20, 2011 Pirates of the Caribbean: On Stranger Tides 410600000 241063875 1045663875
2 Harry Potter and the Deathly Hallows Part 1 WB 296000000.0 664300000 2010.0 3 Jun 7, 2019 Dark Phoenix 350000000 42762350 149762350
3 Inception WB 292600000.0 535700000 2010.0 4 May 1, 2015 Avengers: Age of Ultron 330600000 459005868 1403013963
4 Shrek Forever After P/DW 238700000.0 513900000 2010.0 5 Dec 15, 2017 Star Wars Ep. VIII: The Last Jedi 317000000 620181382 1316721747
... ... ... ... ... ... ... ... ... ... ... ...
5777 NaN NaN NaN NaN NaN 78 Dec 31, 2018 Red 11 7000 0 0
5778 NaN NaN NaN NaN NaN 79 Apr 2, 1999 Following 6000 48482 240495
5779 NaN NaN NaN NaN NaN 80 Jul 13, 2005 Return to the Land of Wonders 5000 1338 1338
5780 NaN NaN NaN NaN NaN 81 Sep 29, 2015 A Plague So Pleasant 1400 0 0
5781 NaN NaN NaN NaN NaN 82 Aug 5, 2005 My Date With Drew 1100 181041 181041

5782 rows × 11 columns

#merging joined_gross_budget,b_r_akas
akas_gross = pd.concat([joined_gross_budget,b_r_akas], axis=1)
akas_gross
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
title studio domestic_gross foreign_gross year id release_date movie production_budget domestic_gross ... start_year runtime_minutes genres ordering title region language types attributes is_original_title
0 Toy Story 3 BV 415000000.0 652000000 2010.0 1 Dec 18, 2009 Avatar 425000000 760507625 ... 2010.0 90.0 Drama 1.0 Just Inès 0 0 original 0 1.0
1 Alice in Wonderland (2010) BV 334200000.0 691300000 2010.0 2 May 20, 2011 Pirates of the Caribbean: On Stranger Tides 410600000 241063875 ... 2010.0 90.0 Drama 2.0 Samo Ines RS 0 imdbDisplay 0 0.0
2 Harry Potter and the Deathly Hallows Part 1 WB 296000000.0 664300000 2010.0 3 Jun 7, 2019 Dark Phoenix 350000000 42762350 ... 2010.0 90.0 Drama 3.0 Just Inès GB 0 0 0 0.0
3 Inception WB 292600000.0 535700000 2010.0 4 May 1, 2015 Avengers: Age of Ultron 330600000 459005868 ... 2014.0 99.0 Action,Adventure,Fantasy 10.0 The Legend of Hercules 0 0 original 0 1.0
4 Shrek Forever After P/DW 238700000.0 513900000 2010.0 5 Dec 15, 2017 Star Wars Ep. VIII: The Last Jedi 317000000 620181382 ... 2014.0 99.0 Action,Adventure,Fantasy 11.0 Hércules - A Lenda Começa PT 0 imdbDisplay 0 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5777 NaN NaN NaN NaN NaN 78 Dec 31, 2018 Red 11 7000 0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5778 NaN NaN NaN NaN NaN 79 Apr 2, 1999 Following 6000 48482 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5779 NaN NaN NaN NaN NaN 80 Jul 13, 2005 Return to the Land of Wonders 5000 1338 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5780 NaN NaN NaN NaN NaN 81 Sep 29, 2015 A Plague So Pleasant 1400 0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5781 NaN NaN NaN NaN NaN 82 Aug 5, 2005 My Date With Drew 1100 181041 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5782 rows × 26 columns

#setting index for rt_reviews dataframe
rt_reviews.set_index('id')
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
review rating fresh critic top_critic publisher date
id
3 A distinctly gallows take on contemporary fina... 3/5 fresh PJ Nabarro 0 Patrick Nabarro November 10, 2018
3 It's an allegory in search of a meaning that n... NaN rotten Annalee Newitz 0 io9.com May 23, 2018
3 ... life lived in a bubble in financial dealin... NaN fresh Sean Axmaker 0 Stream on Demand January 4, 2018
3 Continuing along a line introduced in last yea... NaN fresh Daniel Kasman 0 MUBI November 16, 2017
3 ... a perverse twist on neorealism... NaN fresh NaN 0 Cinema Scope October 12, 2017
... ... ... ... ... ... ... ...
2000 The real charm of this trifle is the deadpan c... NaN fresh Laura Sinagra 1 Village Voice September 24, 2002
2000 NaN 1/5 rotten Michael Szymanski 0 Zap2it.com September 21, 2005
2000 NaN 2/5 rotten Emanuel Levy 0 EmanuelLevy.Com July 17, 2005
2000 NaN 2.5/5 rotten Christopher Null 0 Filmcritic.com September 7, 2003
2000 NaN 3/5 fresh Nicolas Lacroix 0 Showbizz.net November 12, 2002

54432 rows × 7 columns

#setting index for movie_info
movie_info.set_index('id')
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
synopsis rating genre director writer theater_date dvd_date currency box_office runtime studio
id
1 This gritty, fast-paced, and innovative police... R Action and Adventure|Classics|Drama William Friedkin Ernest Tidyman Oct 9, 1971 Sep 25, 2001 NaN NaN 104 minutes NaN
3 New York City, not-too-distant-future: Eric Pa... R Drama|Science Fiction and Fantasy David Cronenberg David Cronenberg|Don DeLillo Aug 17, 2012 Jan 1, 2013 $ 600,000 108 minutes Entertainment One
5 Illeana Douglas delivers a superb performance ... R Drama|Musical and Performing Arts Allison Anders Allison Anders Sep 13, 1996 Apr 18, 2000 NaN NaN 116 minutes NaN
6 Michael Douglas runs afoul of a treacherous su... R Drama|Mystery and Suspense Barry Levinson Paul Attanasio|Michael Crichton Dec 9, 1994 Aug 27, 1997 NaN NaN 128 minutes NaN
7 NaN NR Drama|Romance Rodney Bennett Giles Cooper NaN NaN NaN NaN 200 minutes NaN
... ... ... ... ... ... ... ... ... ... ... ...
1996 Forget terrorists or hijackers -- there's a ha... R Action and Adventure|Horror|Mystery and Suspense NaN NaN Aug 18, 2006 Jan 2, 2007 $ 33,886,034 106 minutes New Line Cinema
1997 The popular Saturday Night Live sketch was exp... PG Comedy|Science Fiction and Fantasy Steve Barron Terry Turner|Tom Davis|Dan Aykroyd|Bonnie Turner Jul 23, 1993 Apr 17, 2001 NaN NaN 88 minutes Paramount Vantage
1998 Based on a novel by Richard Powell, when the l... G Classics|Comedy|Drama|Musical and Performing Arts Gordon Douglas NaN Jan 1, 1962 May 11, 2004 NaN NaN 111 minutes NaN
1999 The Sandlot is a coming-of-age story about a g... PG Comedy|Drama|Kids and Family|Sports and Fitness David Mickey Evans David Mickey Evans|Robert Gunter Apr 1, 1993 Jan 29, 2002 NaN NaN 101 minutes NaN
2000 Suspended from the force, Paris cop Hubert is ... R Action and Adventure|Art House and Internation... NaN Luc Besson Sep 27, 2001 Feb 11, 2003 NaN NaN 94 minutes Columbia Pictures

1560 rows × 11 columns

Using .dropna() in the merged datasets. The dataframes have crossed the threshold of null values, thus dropping.

#merging rt_reviews and movie_info datasets
reviews_info = pd.concat([rt_reviews,movie_info], axis=1)
reviews_info
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
id review rating fresh critic top_critic publisher date id synopsis rating genre director writer theater_date dvd_date currency box_office runtime studio
0 3 A distinctly gallows take on contemporary fina... 3/5 fresh PJ Nabarro 0 Patrick Nabarro November 10, 2018 1.0 This gritty, fast-paced, and innovative police... R Action and Adventure|Classics|Drama William Friedkin Ernest Tidyman Oct 9, 1971 Sep 25, 2001 NaN NaN 104 minutes NaN
1 3 It's an allegory in search of a meaning that n... NaN rotten Annalee Newitz 0 io9.com May 23, 2018 3.0 New York City, not-too-distant-future: Eric Pa... R Drama|Science Fiction and Fantasy David Cronenberg David Cronenberg|Don DeLillo Aug 17, 2012 Jan 1, 2013 $ 600,000 108 minutes Entertainment One
2 3 ... life lived in a bubble in financial dealin... NaN fresh Sean Axmaker 0 Stream on Demand January 4, 2018 5.0 Illeana Douglas delivers a superb performance ... R Drama|Musical and Performing Arts Allison Anders Allison Anders Sep 13, 1996 Apr 18, 2000 NaN NaN 116 minutes NaN
3 3 Continuing along a line introduced in last yea... NaN fresh Daniel Kasman 0 MUBI November 16, 2017 6.0 Michael Douglas runs afoul of a treacherous su... R Drama|Mystery and Suspense Barry Levinson Paul Attanasio|Michael Crichton Dec 9, 1994 Aug 27, 1997 NaN NaN 128 minutes NaN
4 3 ... a perverse twist on neorealism... NaN fresh NaN 0 Cinema Scope October 12, 2017 7.0 NaN NR Drama|Romance Rodney Bennett Giles Cooper NaN NaN NaN NaN 200 minutes NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
54427 2000 The real charm of this trifle is the deadpan c... NaN fresh Laura Sinagra 1 Village Voice September 24, 2002 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
54428 2000 NaN 1/5 rotten Michael Szymanski 0 Zap2it.com September 21, 2005 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
54429 2000 NaN 2/5 rotten Emanuel Levy 0 EmanuelLevy.Com July 17, 2005 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
54430 2000 NaN 2.5/5 rotten Christopher Null 0 Filmcritic.com September 7, 2003 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
54431 2000 NaN 3/5 fresh Nicolas Lacroix 0 Showbizz.net November 12, 2002 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

54432 rows × 20 columns

#merged tthe two dataframes
#drop all the NaN values a
reviews_info.dropna()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
id review rating fresh critic top_critic publisher date id synopsis rating genre director writer theater_date dvd_date currency box_office runtime studio
6 3 Quickly grows repetitive and tiresome, meander... C rotten Eric D. Snider 0 EricDSnider.com July 17, 2013 10.0 Some cast and crew from NBC's highly acclaimed... PG-13 Comedy Jake Kasdan Mike White Jan 11, 2002 Jun 18, 2002 $ 41,032,915 82 minutes Paramount Pictures
7 3 Cronenberg is not a director to be daunted by ... 2/5 rotten Matt Kelemen 0 Las Vegas CityLife April 21, 2013 13.0 Stewart Kane, an Irishman living in the Austra... R Drama Ray Lawrence Raymond Carver|Beatrix Christian Apr 27, 2006 Oct 2, 2007 $ 224,114 123 minutes Sony Pictures Classics
15 3 For better or worse - often both - Cosmopolis ... 3/5 fresh Adam Ross 0 The Aristocrat September 27, 2012 22.0 Two-time Academy Award Winner Kevin Spacey giv... R Comedy|Drama|Mystery and Suspense George Hickenlooper Norman Snider Dec 17, 2010 Apr 5, 2011 $ 1,039,869 108 minutes ATO Pictures
18 3 It's fascinating to watch Pattinson actually a... 2/4 rotten Sean P. Means 0 Salt Lake Tribune September 14, 2012 25.0 From ancient Japan's most enduring tale, the e... PG-13 Action and Adventure|Drama|Science Fiction and... Carl Erik Rinsch Chris Morgan|Hossein Amini Dec 25, 2013 Apr 1, 2014 $ 20,518,224 127 minutes Universal Pictures
19 3 A black comedy as dry and deadpan as a bleache... 4/4 fresh John Beifuss 0 Commercial Appeal (Memphis, TN) September 10, 2012 26.0 A comic series of short vignettes build on one... R Art House and International|Comedy|Drama|Music... Jim Jarmusch Jim Jarmusch May 14, 2004 Sep 21, 2004 $ 1,971,135 96 minutes MGM
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1511 45 Hello, Deedles. Terrible to meet you. 1/5 rotten Scott Weinberg 0 eFilmCritic.com July 29, 2002 1945.0 Left on a nun's doorstep, Larry, Curly and Moe... PG Comedy Bobby Farrelly|Peter Farrelly Bobby Farrelly|Peter Farrelly|Mike Cerrone Apr 13, 2012 Jul 17, 2012 $ 41,800,000 92 minutes 20th Century Fox
1518 45 Steve Van Wormer and Paul Walker, as Stew and ... 0/4 rotten Steve Rhodes 0 Internet Reviews January 1, 2000 1953.0 A glimpse into the comedic process and private... R Comedy|Documentary|Television Ricki Stern|Anne Sundberg Ricki Stern Jun 11, 2010 Dec 14, 2010 $ 2,927,972 84 minutes IFC Films
1537 46 Leaves the audience smiling and giggling, all ... 3/4 fresh Michael Dequina 0 TheMovieReport.com March 8, 2009 1976.0 Embrace of the Serpent features the encounter,... NR Action and Adventure|Art House and International Ciro Guerra Ciro Guerra|Jacques Toulemonde Vidal Feb 17, 2016 Jun 21, 2016 $ 1,320,005 123 minutes Buffalo Films
1541 46 The briskly paced, high-spirited movie is comp... 3.5/4 fresh Judith Egerton 0 Courier-Journal (Louisville, KY) June 25, 2004 1980.0 A band of renegades on the run in outer space ... PG-13 Action and Adventure|Science Fiction and Fantasy Joss Whedon Joss Whedon Sep 30, 2005 Dec 20, 2005 $ 25,335,935 119 minutes Universal Pictures
1545 46 It's a familiar show-biz routine but one that'... 3.5/4 fresh Susan Wloszczyna 1 USA Today January 1, 2000 1985.0 A woman who joins the undead against her will ... R Horror|Mystery and Suspense Sebastian Gutierrez Sebastian Gutierrez Jun 1, 2007 Oct 9, 2007 $ 59,371 98 minutes IDP Distribution

148 rows × 20 columns

#merge the joined_gross_budget with basics_and_ratings
budget_ratings = pd.concat([joined_gross_budget,basics_and_ratings], axis=1)
budget_ratings
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
title studio domestic_gross foreign_gross year id release_date movie production_budget domestic_gross worldwide_gross movie_id averagerating numvotes primary_title original_title start_year runtime_minutes genres
0 Toy Story 3 BV 415000000.0 652000000 2010.0 1 Dec 18, 2009 Avatar 425000000 760507625 2776345279 tt10356526 8.3 31.0 Laiye Je Yaarian Laiye Je Yaarian 2019.0 117.0 Romance
1 Alice in Wonderland (2010) BV 334200000.0 691300000 2010.0 2 May 20, 2011 Pirates of the Caribbean: On Stranger Tides 410600000 241063875 1045663875 tt10384606 8.9 559.0 Borderless Borderless 2019.0 87.0 Documentary
2 Harry Potter and the Deathly Hallows Part 1 WB 296000000.0 664300000 2010.0 3 Jun 7, 2019 Dark Phoenix 350000000 42762350 149762350 tt1042974 6.4 20.0 Just Inès Just Inès 2010.0 90.0 Drama
3 Inception WB 292600000.0 535700000 2010.0 4 May 1, 2015 Avengers: Age of Ultron 330600000 459005868 1403013963 tt1043726 4.2 50352.0 The Legend of Hercules The Legend of Hercules 2014.0 99.0 Action,Adventure,Fantasy
4 Shrek Forever After P/DW 238700000.0 513900000 2010.0 5 Dec 15, 2017 Star Wars Ep. VIII: The Last Jedi 317000000 620181382 1316721747 tt1060240 6.5 21.0 Até Onde? Até Onde? 2011.0 73.0 Mystery,Thriller
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5777 NaN NaN NaN NaN NaN 78 Dec 31, 2018 Red 11 7000 0 0 NaN NaN NaN NaN NaN NaN NaN NaN
5778 NaN NaN NaN NaN NaN 79 Apr 2, 1999 Following 6000 48482 240495 NaN NaN NaN NaN NaN NaN NaN NaN
5779 NaN NaN NaN NaN NaN 80 Jul 13, 2005 Return to the Land of Wonders 5000 1338 1338 NaN NaN NaN NaN NaN NaN NaN NaN
5780 NaN NaN NaN NaN NaN 81 Sep 29, 2015 A Plague So Pleasant 1400 0 0 NaN NaN NaN NaN NaN NaN NaN NaN
5781 NaN NaN NaN NaN NaN 82 Aug 5, 2005 My Date With Drew 1100 181041 181041 NaN NaN NaN NaN NaN NaN NaN NaN

5782 rows × 19 columns

#drop the null values. 
budget_ratings.fillna(0, inplace = True)
budget_ratings
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
title studio domestic_gross foreign_gross year id release_date movie production_budget domestic_gross worldwide_gross movie_id averagerating numvotes primary_title original_title start_year runtime_minutes genres
0 Toy Story 3 BV 415000000.0 652000000 2010.0 1 Dec 18, 2009 Avatar 425000000 760507625 2776345279 tt10356526 8.3 31.0 Laiye Je Yaarian Laiye Je Yaarian 2019.0 117.0 Romance
1 Alice in Wonderland (2010) BV 334200000.0 691300000 2010.0 2 May 20, 2011 Pirates of the Caribbean: On Stranger Tides 410600000 241063875 1045663875 tt10384606 8.9 559.0 Borderless Borderless 2019.0 87.0 Documentary
2 Harry Potter and the Deathly Hallows Part 1 WB 296000000.0 664300000 2010.0 3 Jun 7, 2019 Dark Phoenix 350000000 42762350 149762350 tt1042974 6.4 20.0 Just Inès Just Inès 2010.0 90.0 Drama
3 Inception WB 292600000.0 535700000 2010.0 4 May 1, 2015 Avengers: Age of Ultron 330600000 459005868 1403013963 tt1043726 4.2 50352.0 The Legend of Hercules The Legend of Hercules 2014.0 99.0 Action,Adventure,Fantasy
4 Shrek Forever After P/DW 238700000.0 513900000 2010.0 5 Dec 15, 2017 Star Wars Ep. VIII: The Last Jedi 317000000 620181382 1316721747 tt1060240 6.5 21.0 Até Onde? Até Onde? 2011.0 73.0 Mystery,Thriller
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5777 0 0 0.0 0 0.0 78 Dec 31, 2018 Red 11 7000 0 0 0 0.0 0.0 0 0 0.0 0.0 0
5778 0 0 0.0 0 0.0 79 Apr 2, 1999 Following 6000 48482 240495 0 0.0 0.0 0 0 0.0 0.0 0
5779 0 0 0.0 0 0.0 80 Jul 13, 2005 Return to the Land of Wonders 5000 1338 1338 0 0.0 0.0 0 0 0.0 0.0 0
5780 0 0 0.0 0 0.0 81 Sep 29, 2015 A Plague So Pleasant 1400 0 0 0 0.0 0.0 0 0 0.0 0.0 0
5781 0 0 0.0 0 0.0 82 Aug 5, 2005 My Date With Drew 1100 181041 181041 0 0.0 0.0 0 0 0.0 0.0 0

5782 rows × 19 columns

#merging all the dataframes
#merging akas_gross, budgets_ratings
budget_ratings_akas = pd.concat([reviews_info,budget_ratings], axis=1)
budget_ratings_akas
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
id review rating fresh critic top_critic publisher date id synopsis ... domestic_gross worldwide_gross movie_id averagerating numvotes primary_title original_title start_year runtime_minutes genres
0 3 A distinctly gallows take on contemporary fina... 3/5 fresh PJ Nabarro 0 Patrick Nabarro November 10, 2018 1.0 This gritty, fast-paced, and innovative police... ... 760507625 2776345279 tt10356526 8.3 31.0 Laiye Je Yaarian Laiye Je Yaarian 2019.0 117.0 Romance
1 3 It's an allegory in search of a meaning that n... NaN rotten Annalee Newitz 0 io9.com May 23, 2018 3.0 New York City, not-too-distant-future: Eric Pa... ... 241063875 1045663875 tt10384606 8.9 559.0 Borderless Borderless 2019.0 87.0 Documentary
2 3 ... life lived in a bubble in financial dealin... NaN fresh Sean Axmaker 0 Stream on Demand January 4, 2018 5.0 Illeana Douglas delivers a superb performance ... ... 42762350 149762350 tt1042974 6.4 20.0 Just Inès Just Inès 2010.0 90.0 Drama
3 3 Continuing along a line introduced in last yea... NaN fresh Daniel Kasman 0 MUBI November 16, 2017 6.0 Michael Douglas runs afoul of a treacherous su... ... 459005868 1403013963 tt1043726 4.2 50352.0 The Legend of Hercules The Legend of Hercules 2014.0 99.0 Action,Adventure,Fantasy
4 3 ... a perverse twist on neorealism... NaN fresh NaN 0 Cinema Scope October 12, 2017 7.0 NaN ... 620181382 1316721747 tt1060240 6.5 21.0 Até Onde? Até Onde? 2011.0 73.0 Mystery,Thriller
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
54427 2000 The real charm of this trifle is the deadpan c... NaN fresh Laura Sinagra 1 Village Voice September 24, 2002 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
54428 2000 NaN 1/5 rotten Michael Szymanski 0 Zap2it.com September 21, 2005 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
54429 2000 NaN 2/5 rotten Emanuel Levy 0 EmanuelLevy.Com July 17, 2005 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
54430 2000 NaN 2.5/5 rotten Christopher Null 0 Filmcritic.com September 7, 2003 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
54431 2000 NaN 3/5 fresh Nicolas Lacroix 0 Showbizz.net November 12, 2002 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

54432 rows × 39 columns

#dropping null values
budget_ratings_akas.fillna(0, inplace = True)
budget_ratings_akas
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
id review rating fresh critic top_critic publisher date id synopsis ... domestic_gross worldwide_gross movie_id averagerating numvotes primary_title original_title start_year runtime_minutes genres
0 3 A distinctly gallows take on contemporary fina... 3/5 fresh PJ Nabarro 0 Patrick Nabarro November 10, 2018 1.0 This gritty, fast-paced, and innovative police... ... 760507625 2776345279 tt10356526 8.3 31.0 Laiye Je Yaarian Laiye Je Yaarian 2019.0 117.0 Romance
1 3 It's an allegory in search of a meaning that n... 0 rotten Annalee Newitz 0 io9.com May 23, 2018 3.0 New York City, not-too-distant-future: Eric Pa... ... 241063875 1045663875 tt10384606 8.9 559.0 Borderless Borderless 2019.0 87.0 Documentary
2 3 ... life lived in a bubble in financial dealin... 0 fresh Sean Axmaker 0 Stream on Demand January 4, 2018 5.0 Illeana Douglas delivers a superb performance ... ... 42762350 149762350 tt1042974 6.4 20.0 Just Inès Just Inès 2010.0 90.0 Drama
3 3 Continuing along a line introduced in last yea... 0 fresh Daniel Kasman 0 MUBI November 16, 2017 6.0 Michael Douglas runs afoul of a treacherous su... ... 459005868 1403013963 tt1043726 4.2 50352.0 The Legend of Hercules The Legend of Hercules 2014.0 99.0 Action,Adventure,Fantasy
4 3 ... a perverse twist on neorealism... 0 fresh 0 0 Cinema Scope October 12, 2017 7.0 0 ... 620181382 1316721747 tt1060240 6.5 21.0 Até Onde? Até Onde? 2011.0 73.0 Mystery,Thriller
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
54427 2000 The real charm of this trifle is the deadpan c... 0 fresh Laura Sinagra 1 Village Voice September 24, 2002 0.0 0 ... 0 0 0 0.0 0.0 0 0 0.0 0.0 0
54428 2000 0 1/5 rotten Michael Szymanski 0 Zap2it.com September 21, 2005 0.0 0 ... 0 0 0 0.0 0.0 0 0 0.0 0.0 0
54429 2000 0 2/5 rotten Emanuel Levy 0 EmanuelLevy.Com July 17, 2005 0.0 0 ... 0 0 0 0.0 0.0 0 0 0.0 0.0 0
54430 2000 0 2.5/5 rotten Christopher Null 0 Filmcritic.com September 7, 2003 0.0 0 ... 0 0 0 0.0 0.0 0 0 0.0 0.0 0
54431 2000 0 3/5 fresh Nicolas Lacroix 0 Showbizz.net November 12, 2002 0.0 0 ... 0 0 0 0.0 0.0 0 0 0.0 0.0 0

54432 rows × 39 columns

Exploratory Descriptive Analysis (EDA)

We'll now employ techniques that are sometimes referred to as descriptive statistics because they only describe the available data or offer estimations based on it.

What is the correlation between the genre and movie runtime?

#open the needed dataframe
budget_ratings_akas
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
id review rating fresh critic top_critic publisher date id synopsis ... domestic_gross worldwide_gross movie_id averagerating numvotes primary_title original_title start_year runtime_minutes genres
0 3 A distinctly gallows take on contemporary fina... 3/5 fresh PJ Nabarro 0 Patrick Nabarro November 10, 2018 1.0 This gritty, fast-paced, and innovative police... ... 760507625 2776345279 tt10356526 8.3 31.0 Laiye Je Yaarian Laiye Je Yaarian 2019.0 117.0 Romance
1 3 It's an allegory in search of a meaning that n... 0 rotten Annalee Newitz 0 io9.com May 23, 2018 3.0 New York City, not-too-distant-future: Eric Pa... ... 241063875 1045663875 tt10384606 8.9 559.0 Borderless Borderless 2019.0 87.0 Documentary
2 3 ... life lived in a bubble in financial dealin... 0 fresh Sean Axmaker 0 Stream on Demand January 4, 2018 5.0 Illeana Douglas delivers a superb performance ... ... 42762350 149762350 tt1042974 6.4 20.0 Just Inès Just Inès 2010.0 90.0 Drama
3 3 Continuing along a line introduced in last yea... 0 fresh Daniel Kasman 0 MUBI November 16, 2017 6.0 Michael Douglas runs afoul of a treacherous su... ... 459005868 1403013963 tt1043726 4.2 50352.0 The Legend of Hercules The Legend of Hercules 2014.0 99.0 Action,Adventure,Fantasy
4 3 ... a perverse twist on neorealism... 0 fresh 0 0 Cinema Scope October 12, 2017 7.0 0 ... 620181382 1316721747 tt1060240 6.5 21.0 Até Onde? Até Onde? 2011.0 73.0 Mystery,Thriller
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
54427 2000 The real charm of this trifle is the deadpan c... 0 fresh Laura Sinagra 1 Village Voice September 24, 2002 0.0 0 ... 0 0 0 0.0 0.0 0 0 0.0 0.0 0
54428 2000 0 1/5 rotten Michael Szymanski 0 Zap2it.com September 21, 2005 0.0 0 ... 0 0 0 0.0 0.0 0 0 0.0 0.0 0
54429 2000 0 2/5 rotten Emanuel Levy 0 EmanuelLevy.Com July 17, 2005 0.0 0 ... 0 0 0 0.0 0.0 0 0 0.0 0.0 0
54430 2000 0 2.5/5 rotten Christopher Null 0 Filmcritic.com September 7, 2003 0.0 0 ... 0 0 0 0.0 0.0 0 0 0.0 0.0 0
54431 2000 0 3/5 fresh Nicolas Lacroix 0 Showbizz.net November 12, 2002 0.0 0 ... 0 0 0 0.0 0.0 0 0 0.0 0.0 0

54432 rows × 39 columns

# plotting a sns.barplot:
fig, ax1= plt.subplots(figsize=(10,8))

x = list(budget_ratings_akas['runtime_minutes'].values)
y = budget_ratings['genres']

ax= sns.barplot(data = budget_ratings, x = 'runtime_minutes', y = 'genres')

#labelling plot
ax1.set_title('Correlation btwn Genres & Runtime', fontsize=16)
ax1.set_xlabel("Runtime_minutes",fontsize=16)
ax1.set_ylabel("Genres", fontsize=16)

#will display the plot
plt.show()

App Screenshot

The plot above has the longest duration of the genres, measured in minutes, according to the visual representation. The genre with the longest runtime is romance, whereas the genre with the shortest length is a thriller.

Which genre has the highest rating?

#loading data 
#confirming the columns needed are available
budget_ratings
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
title studio domestic_gross foreign_gross year id release_date movie production_budget domestic_gross worldwide_gross movie_id averagerating numvotes primary_title original_title start_year runtime_minutes genres
0 Toy Story 3 BV 415000000.0 652000000 2010.0 1 Dec 18, 2009 Avatar 425000000 760507625 2776345279 tt10356526 8.3 31.0 Laiye Je Yaarian Laiye Je Yaarian 2019.0 117.0 Romance
1 Alice in Wonderland (2010) BV 334200000.0 691300000 2010.0 2 May 20, 2011 Pirates of the Caribbean: On Stranger Tides 410600000 241063875 1045663875 tt10384606 8.9 559.0 Borderless Borderless 2019.0 87.0 Documentary
2 Harry Potter and the Deathly Hallows Part 1 WB 296000000.0 664300000 2010.0 3 Jun 7, 2019 Dark Phoenix 350000000 42762350 149762350 tt1042974 6.4 20.0 Just Inès Just Inès 2010.0 90.0 Drama
3 Inception WB 292600000.0 535700000 2010.0 4 May 1, 2015 Avengers: Age of Ultron 330600000 459005868 1403013963 tt1043726 4.2 50352.0 The Legend of Hercules The Legend of Hercules 2014.0 99.0 Action,Adventure,Fantasy
4 Shrek Forever After P/DW 238700000.0 513900000 2010.0 5 Dec 15, 2017 Star Wars Ep. VIII: The Last Jedi 317000000 620181382 1316721747 tt1060240 6.5 21.0 Até Onde? Até Onde? 2011.0 73.0 Mystery,Thriller
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5777 0 0 0.0 0 0.0 78 Dec 31, 2018 Red 11 7000 0 0 0 0.0 0.0 0 0 0.0 0.0 0
5778 0 0 0.0 0 0.0 79 Apr 2, 1999 Following 6000 48482 240495 0 0.0 0.0 0 0 0.0 0.0 0
5779 0 0 0.0 0 0.0 80 Jul 13, 2005 Return to the Land of Wonders 5000 1338 1338 0 0.0 0.0 0 0 0.0 0.0 0
5780 0 0 0.0 0 0.0 81 Sep 29, 2015 A Plague So Pleasant 1400 0 0 0 0.0 0.0 0 0 0.0 0.0 0
5781 0 0 0.0 0 0.0 82 Aug 5, 2005 My Date With Drew 1100 181041 181041 0 0.0 0.0 0 0 0.0 0.0 0

5782 rows × 19 columns

#plotting
fig, ax1= plt.subplots(figsize=(10,8))

x = list(budget_ratings_akas['genres'].values)
y = budget_ratings['averagerating']

ax= sns.barplot(data = budget_ratings, x = 'genres', y = 'averagerating')

#labelling plot
ax1.set_title('Most rated genre of film', fontsize=16)
ax1.set_xlabel("Genres",fontsize=16)
ax1.set_ylabel("Average Rating", fontsize=16)

#changing axis of x labels
plt.xticks(rotation = 45)

#will display the plot
plt.show()

App Screenshot

The genre with the highest rating, documentaries, is 8.9

Which of the genres has the highest world wide gross?

#To enable it to load in the plot, convert worldwide gross to float.
budget_ratings['worldwide_gross']=budget_ratings['worldwide_gross'].astype(float)
#confirming changes
budget_ratings
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
title studio domestic_gross foreign_gross year id release_date movie production_budget domestic_gross worldwide_gross movie_id averagerating numvotes primary_title original_title start_year runtime_minutes genres
0 Toy Story 3 BV 415000000.0 652000000 2010.0 1 Dec 18, 2009 Avatar 425000000 760507625 2.776345e+09 tt10356526 8.3 31.0 Laiye Je Yaarian Laiye Je Yaarian 2019.0 117.0 Romance
1 Alice in Wonderland (2010) BV 334200000.0 691300000 2010.0 2 May 20, 2011 Pirates of the Caribbean: On Stranger Tides 410600000 241063875 1.045664e+09 tt10384606 8.9 559.0 Borderless Borderless 2019.0 87.0 Documentary
2 Harry Potter and the Deathly Hallows Part 1 WB 296000000.0 664300000 2010.0 3 Jun 7, 2019 Dark Phoenix 350000000 42762350 1.497624e+08 tt1042974 6.4 20.0 Just Inès Just Inès 2010.0 90.0 Drama
3 Inception WB 292600000.0 535700000 2010.0 4 May 1, 2015 Avengers: Age of Ultron 330600000 459005868 1.403014e+09 tt1043726 4.2 50352.0 The Legend of Hercules The Legend of Hercules 2014.0 99.0 Action,Adventure,Fantasy
4 Shrek Forever After P/DW 238700000.0 513900000 2010.0 5 Dec 15, 2017 Star Wars Ep. VIII: The Last Jedi 317000000 620181382 1.316722e+09 tt1060240 6.5 21.0 Até Onde? Até Onde? 2011.0 73.0 Mystery,Thriller
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5777 0 0 0.0 0 0.0 78 Dec 31, 2018 Red 11 7000 0 0.000000e+00 0 0.0 0.0 0 0 0.0 0.0 0
5778 0 0 0.0 0 0.0 79 Apr 2, 1999 Following 6000 48482 2.404950e+05 0 0.0 0.0 0 0 0.0 0.0 0
5779 0 0 0.0 0 0.0 80 Jul 13, 2005 Return to the Land of Wonders 5000 1338 1.338000e+03 0 0.0 0.0 0 0 0.0 0.0 0
5780 0 0 0.0 0 0.0 81 Sep 29, 2015 A Plague So Pleasant 1400 0 0.000000e+00 0 0.0 0.0 0 0 0.0 0.0 0
5781 0 0 0.0 0 0.0 82 Aug 5, 2005 My Date With Drew 1100 181041 1.810410e+05 0 0.0 0.0 0 0 0.0 0.0 0

5782 rows × 19 columns

fig, ax1= plt.subplots(figsize=(10,5))

#arranging the x & y axis to avoid an overlap
x = np.arange(8)
y = 2*x + 1


#plot:
ax= sns.scatterplot( x='movie', y='worldwide_gross', data = budget_ratings)


#labelling plot
ax1.set_title('Movie with the highest worldwide gross')
ax1.set_xlabel("Movie")
ax1.set_ylabel("worldwide_gross")
plt.xticks(rotation= 45)
#will display the plot
plt.show()
C:\Users\AURALIA\anaconda3\envs\learn-env\lib\site-packages\matplotlib\backends\backend_agg.py:238: RuntimeWarning: Glyph 128 missing from current font.
  font.set_text(s, 0.0, flags=flags)
C:\Users\AURALIA\anaconda3\envs\learn-env\lib\site-packages\matplotlib\backends\backend_agg.py:238: RuntimeWarning: Glyph 153 missing from current font.
  font.set_text(s, 0.0, flags=flags)
C:\Users\AURALIA\anaconda3\envs\learn-env\lib\site-packages\matplotlib\backends\backend_agg.py:238: RuntimeWarning: Glyph 148 missing from current font.
  font.set_text(s, 0.0, flags=flags)
C:\Users\AURALIA\anaconda3\envs\learn-env\lib\site-packages\matplotlib\backends\backend_agg.py:238: RuntimeWarning: Glyph 129 missing from current font.
  font.set_text(s, 0.0, flags=flags)
C:\Users\AURALIA\anaconda3\envs\learn-env\lib\site-packages\matplotlib\backends\backend_agg.py:238: RuntimeWarning: Glyph 149 missing from current font.
  font.set_text(s, 0.0, flags=flags)
C:\Users\AURALIA\anaconda3\envs\learn-env\lib\site-packages\matplotlib\backends\backend_agg.py:238: RuntimeWarning: Glyph 159 missing from current font.
  font.set_text(s, 0.0, flags=flags)
C:\Users\AURALIA\anaconda3\envs\learn-env\lib\site-packages\matplotlib\backends\backend_agg.py:238: RuntimeWarning: Glyph 131 missing from current font.
  font.set_text(s, 0.0, flags=flags)
C:\Users\AURALIA\anaconda3\envs\learn-env\lib\site-packages\matplotlib\backends\backend_agg.py:238: RuntimeWarning: Glyph 147 missing from current font.
  font.set_text(s, 0.0, flags=flags)
C:\Users\AURALIA\anaconda3\envs\learn-env\lib\site-packages\matplotlib\backends\backend_agg.py:201: RuntimeWarning: Glyph 128 missing from current font.
  font.set_text(s, 0, flags=flags)
C:\Users\AURALIA\anaconda3\envs\learn-env\lib\site-packages\matplotlib\backends\backend_agg.py:201: RuntimeWarning: Glyph 153 missing from current font.
  font.set_text(s, 0, flags=flags)
C:\Users\AURALIA\anaconda3\envs\learn-env\lib\site-packages\matplotlib\backends\backend_agg.py:201: RuntimeWarning: Glyph 148 missing from current font.
  font.set_text(s, 0, flags=flags)
C:\Users\AURALIA\anaconda3\envs\learn-env\lib\site-packages\matplotlib\backends\backend_agg.py:201: RuntimeWarning: Glyph 129 missing from current font.
  font.set_text(s, 0, flags=flags)
C:\Users\AURALIA\anaconda3\envs\learn-env\lib\site-packages\matplotlib\backends\backend_agg.py:201: RuntimeWarning: Glyph 149 missing from current font.
  font.set_text(s, 0, flags=flags)
C:\Users\AURALIA\anaconda3\envs\learn-env\lib\site-packages\matplotlib\backends\backend_agg.py:201: RuntimeWarning: Glyph 159 missing from current font.
  font.set_text(s, 0, flags=flags)
C:\Users\AURALIA\anaconda3\envs\learn-env\lib\site-packages\matplotlib\backends\backend_agg.py:201: RuntimeWarning: Glyph 131 missing from current font.
  font.set_text(s, 0, flags=flags)
C:\Users\AURALIA\anaconda3\envs\learn-env\lib\site-packages\matplotlib\backends\backend_agg.py:201: RuntimeWarning: Glyph 147 missing from current font.
  font.set_text(s, 0, flags=flags)

App Screenshot

This particular scatter plot was created to display the amount of money that the films on the x-axis brought in globally. I made a few attempts to stop it from overlapping, but they were unsuccessful. This leads me to the conclusion that I need to do additional research on how to plan a plot that doesn't overlap. The many film genres brought in good money as gained from x-axis, which is measured in millions.

Which movies have the highest number of votes?

#checking for the needed columns first
basics_and_ratings
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
movie_id averagerating numvotes primary_title original_title start_year runtime_minutes genres
0 tt10356526 8.3 31 Laiye Je Yaarian Laiye Je Yaarian 2019 117.0 Romance
1 tt10384606 8.9 559 Borderless Borderless 2019 87.0 Documentary
2 tt1042974 6.4 20 Just Inès Just Inès 2010 90.0 Drama
3 tt1043726 4.2 50352 The Legend of Hercules The Legend of Hercules 2014 99.0 Action,Adventure,Fantasy
4 tt1060240 6.5 21 Até Onde? Até Onde? 2011 73.0 Mystery,Thriller
5 tt1069246 6.2 326 Habana Eva Habana Eva 2010 106.0 Comedy,Romance
6 tt1094666 7.0 1613 The Hammer Hamill 2010 108.0 Biography,Drama,Sport
7 tt1130982 6.4 571 The Night Clerk Avant l'aube 2011 104.0 Drama,Thriller
8 tt1156528 7.2 265 Silent Sonata Circus Fantasticus 2011 77.0 Drama,War
9 tt1161457 4.2 148 Vanquisher The Vanquisher 2016 90.0 Action,Adventure,Sci-Fi
#Plotting a seaborn lineplot
plt.figure(figsize=(12,6)) 
sns.lineplot( x="genres", y="numvotes", data=basics_and_ratings,)  
plt.title("The most voted for Genres") #labelling
plt.xticks(rotation = 60);
plt.show() 

App Screenshot

The most voted for genre is Action,Adventure,Fantasy followed by Biography,Drama,Sport.

Conclusion

  1. A movie's average rating does not guarantee that it is a good movie, and the opposite is also true.
  2. Film studios should provide many online and offline access methods for their content.
  3. Fans of movies convey a different message about what they find appealing in movies.
  4. According to the data provided, romantic films had longer runs than scary films. Films that are near to the hearts of the audience should receive more attention than those that frighten them, as the production budget also increases somewhat as a result.
  5. It is necessary to conduct more research. To determine the amount of individuals who really see movies in theaters versus those who prefer to stream, surveys can be sent to owners of movie theaters, moviegoers, and internet respondents.
  6. Depending on their genre and production costs, movies can make money both domestic and foreign.. Less people will watch it the worse the quality, and vice versa.

Recommendations

  1. The dataframes displayed the various movie genres, the titles of the films, their budgets for production, and the respective domestic, international, and global box office receipts for the film studios. Despite having a global and worldwide audience, the languages employed in the films did not take into account other continents; for instance, there was no swahili-language film or actor. Therefore,accessibility of content in different markets When movies come out should be considered. Allow growth by giving everyone the chance to watch a new movie in every region of the world.

  2. Major Markets to invest in: .Tv Licensing .Foreign distribution .Domestic Box Office .Physical Copy sales .Digital streaming & video on demand

  3. Consider first going through the company planning process.

  4. Work in all languages and with the more popular genres.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 100.0%