The ultimate tool when dealing with a Pandas DataFrame!
The general syntax is:
df.apply(lambda x: func(x['col1'], x['col2']), axis=1)
With axis=1 the lambda receives one row at a time, so this lets you express pretty much any row-wise logic, I promise!
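To make the syntax concrete, here is a minimal sketch on a toy DataFrame. The data and the body of `func` are made up for illustration; only the `df.apply(..., axis=1)` pattern comes from the text above.

```python
import pandas as pd

# Toy data; the column names col1 and col2 mirror the syntax line above.
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [10, 20, 30]})


def func(a, b):
    """Hypothetical row-level logic: add the two values."""
    return a + b


# axis=1 hands each row to the lambda as a Series indexed by column name.
df["result"] = df.apply(lambda x: func(x["col1"], x["col2"]), axis=1)
print(df)
```

Any Python function can take the place of `func`, which is what makes this pattern so flexible.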
To perform the challenges you will use the dataset /data/imput/IMDB-Movie-Data.csv
We want to create bins of movies according to the number of votes they've received. To do so, we will create a new column named 'bin' which will tag every movie as follows:
- From 0 to 1000 ==> 1
- From 1000 to 10000 ==> 2
- From 10000 to 100000 ==> 3
- From 100000 to 1000000 ==> 4
- More than 1000000 ==> 5
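One possible way to build the 'bin' column is sketched below. It assumes the vote count lives in a column called `Votes` and that each upper bound is inclusive (so exactly 1000 votes falls in bin 1); the list above does not say which side the boundaries belong to. The vote counts are invented sample values, not rows from the dataset.

```python
import pandas as pd


def votes_to_bin(votes):
    """Map a vote count to a bin from 1 to 5 (upper bounds assumed inclusive)."""
    if votes <= 1_000:
        return 1
    if votes <= 10_000:
        return 2
    if votes <= 100_000:
        return 3
    if votes <= 1_000_000:
        return 4
    return 5


# Invented sample values; "Votes" is an assumed column name.
df = pd.DataFrame({"Votes": [500, 1_000, 5_000, 50_000, 750_000, 1_500_000]})

df["bin"] = df.apply(lambda x: votes_to_bin(x["Votes"]), axis=1)
print(df)
```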
We want to know the revenue per minute of every movie.
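A minimal sketch of this challenge, assuming the columns are named `Revenue (Millions)` and `Runtime (Minutes)`; check `df.columns` on the real file. The values are toy numbers, and a missing revenue simply propagates as NaN.

```python
import pandas as pd

# Toy numbers; both column names are assumptions about the CSV header.
df = pd.DataFrame(
    {
        "Revenue (Millions)": [100.0, 50.0, float("nan")],
        "Runtime (Minutes)": [100, 125, 90],
    }
)

# Divide revenue by runtime row by row; NaN revenue yields NaN.
df["revenue_per_minute"] = df.apply(
    lambda x: x["Revenue (Millions)"] / x["Runtime (Minutes)"], axis=1
)
print(df)
```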
We want to create a new ranking where we add 1 point if the genre is thriller but subtract 1 point if the genre is comedy.
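The sketch below is one reading of this challenge. It assumes the starting score is the `Rating` column and that `Genre` is a comma-separated string such as "Horror,Thriller"; neither detail is stated above, and the rows are invented.

```python
import pandas as pd


def adjusted_rating(rating, genre):
    """Add 1 for thriller, subtract 1 for comedy (both may apply)."""
    genres = [g.strip().lower() for g in genre.split(",")]
    score = rating
    if "thriller" in genres:
        score += 1
    if "comedy" in genres:
        score -= 1
    return score


# Invented rows; "Genre" and "Rating" are assumed column names.
df = pd.DataFrame(
    {
        "Genre": ["Horror,Thriller", "Comedy", "Comedy,Thriller", "Drama"],
        "Rating": [7.0, 6.0, 5.0, 8.0],
    }
)

df["new_ranking"] = df.apply(
    lambda x: adjusted_rating(x["Rating"], x["Genre"]), axis=1
)
print(df)
```

A movie tagged as both comedy and thriller keeps its original score under this reading, because the two adjustments cancel out.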
We want to know whether the sum of the ASCII values of every character in the movie title, divided by the number of votes, yields a prime number. Remember, prime numbers are integers.
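As a rough sketch, the check can be split into two helpers: one for primality and one for the title arithmetic. The column names `Title` and `Votes` are assumptions, the rows are invented, and `ord()` is used for the character codes (identical to ASCII for plain ASCII titles). A division that leaves a remainder is treated as "not prime", since primes are integers.

```python
import pandas as pd


def is_prime(n):
    """Trial-division primality test for integers."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def title_ratio_is_prime(title, votes):
    """True if sum of character codes / votes is an integer and prime."""
    votes = int(votes)
    total = sum(ord(c) for c in title)
    if votes <= 0 or total % votes != 0:
        return False  # no exact integer quotient, so it cannot be prime
    return is_prime(total // votes)


# Invented rows: ord("A") == 65, so 65 / 13 == 5 and 65 / 5 == 13.
df = pd.DataFrame({"Title": ["A", "A", "A", "A"], "Votes": [13, 5, 65, 2]})

df["prime_ratio"] = df.apply(
    lambda x: title_ratio_is_prime(x["Title"], x["Votes"]), axis=1
)
print(df)
```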
Feel free to propose your own ranking based on aggregations of at least 3 columns of the dataset.
We want to know which movies might have hidden patterns in their description. One way to find out is to look for the movies for which the sum of all numeric values in the SHA256 hash of the description string lies between their revenue and their number of votes.
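The wording leaves some room for interpretation, so the sketch below fixes a few assumptions: "numeric values" means the decimal digits 0-9 in the hexadecimal SHA256 digest, "between" is inclusive and order-independent, the revenue is used exactly as stored in the column, and a missing revenue counts as no match. Column names and rows are invented for illustration.

```python
import hashlib

import pandas as pd


def digit_sum_of_sha256(text):
    """Sum the decimal digits in the hex SHA256 digest of text."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return sum(int(c) for c in digest if c.isdigit())


def has_hidden_pattern(description, revenue, votes):
    """True if the digit sum lies between revenue and votes (inclusive)."""
    if pd.isna(revenue) or pd.isna(votes):
        return False  # assumption: missing values never match
    low, high = sorted((revenue, votes))
    return bool(low <= digit_sum_of_sha256(description) <= high)


# Invented rows; the digit sum of a 64-character digest is at most 64 * 9 = 576.
df = pd.DataFrame(
    {
        "Description": ["A heist goes wrong.", "Two friends travel.", "A robot dreams."],
        "Revenue (Millions)": [0.0, 600.0, float("nan")],
        "Votes": [1000, 1000, 1000],
    }
)

df["hidden_pattern"] = df.apply(
    lambda x: has_hidden_pattern(
        x["Description"], x["Revenue (Millions)"], x["Votes"]
    ),
    axis=1,
)
print(df)
```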