- Using DataGrip & Python scripts (cleaning and importing data)
A really useful script for uploading data to Elasticsearch can be found here
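The linked upload script is not reproduced here. As a rough sketch of what any such script has to produce, the function below serialises cleaned records into the newline-delimited JSON body accepted by Elasticsearch's `_bulk` endpoint: one action line and one source line per document, with a trailing newline. The index name `reddit_comments` and the record fields are hypothetical placeholders, not taken from this repository.

```python
import json
from typing import Iterable, Mapping


def build_bulk_body(records: Iterable[Mapping], index: str) -> str:
    """Serialise records into an NDJSON payload for the Elasticsearch _bulk API.

    Each document contributes two lines: an action line naming the target
    index, then the document source itself. Every line ends with a newline,
    which the _bulk endpoint requires.
    """
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(record))
    return "".join(line + "\n" for line in lines)


# Hypothetical cleaned comments; real field names depend on the dataset.
comments = [
    {"author": "alice", "body": "I prefer postgres", "score": 12},
    {"author": "bob", "body": "sorry, wrong thread", "score": 3},
]
payload = build_bulk_body(comments, index="reddit_comments")
print(payload)
```

The payload can be POSTed to `/_bulk` with `Content-Type: application/x-ndjson`; the official Python client's `elasticsearch.helpers.bulk` builds the same format for you if you prefer not to assemble it by hand.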
SQL Queries
ER Diagram
Queries implementation: here
- What are the names of all Led Zeppelin songs in rock_Music_data, and on which days in 2018 do they appear in the popularDataSet?
- What are all the playlists that those Led Zeppelin songs feature in?
- What were the most popular songs (songs listed in the top 3) in Canada in January 2019? Order by popularity and limit the output to 10.
- What is the largest popularity gap in rock_Music, i.e. the difference between the lowest and the highest popularity?
- Which songs have the most genres (limit to 10 results)?
- Which band shows up most often in the Alternative Music Data? Which of their songs appear in the Popular Dataset? Include artist name, title, date and country.
- Which artists whose names start with the letter S appear in both the indie and the alternative music data?
- What are some good club songs (danceability > 0.8) listed as pop whose artists also make music categorized as blues? Return the pop song and the blues song with their respective artist.
- From the most popular alternative playlist, list the songs longer than 5 minutes in increasing order of length.
- How many pop songs released in 2020 that are in the top 20 have a tempo greater than 120?
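The actual queries are in the linked implementation. To illustrate the shape of two of them (the popularity gap and the fast 2020 top-20 pop songs), the sketch below runs against an in-memory SQLite database. The columns of `rock_Music_data`, the `pop_songs` table and all sample rows are hypothetical stand-ins; the real schema is the one in the ER diagram, and the project's SQL dialect may differ.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE rock_Music_data (track_name TEXT, popularity INTEGER);
    CREATE TABLE pop_songs (
        track_name TEXT, release_year INTEGER, chart_rank INTEGER, tempo REAL
    );
    INSERT INTO rock_Music_data VALUES
        ('Song A', 12), ('Song B', 87), ('Song C', 45);
    INSERT INTO pop_songs VALUES
        ('Hit 1', 2020, 3, 128.0),
        ('Hit 2', 2020, 15, 99.5),
        ('Hit 3', 2020, 25, 140.0),
        ('Hit 4', 2019, 1, 124.0);
    """
)

# Largest popularity gap: lowest, highest, and the difference between them.
POPULARITY_GAP = """
    SELECT MIN(popularity), MAX(popularity), MAX(popularity) - MIN(popularity)
    FROM rock_Music_data
"""

# Pop songs released in a given year, ranked in the top N, above a tempo threshold.
FAST_TOP_POP = """
    SELECT COUNT(*)
    FROM pop_songs
    WHERE release_year = ? AND chart_rank <= ? AND tempo > ?
"""

low, high, gap = conn.execute(POPULARITY_GAP).fetchone()
fast_hits = conn.execute(FAST_TOP_POP, (2020, 20, 120)).fetchone()[0]
print(f"popularity {low}-{high}, gap {gap}; fast 2020 top-20 pop songs: {fast_hits}")
```

Using `?` placeholders keeps the year, rank and tempo thresholds as parameters instead of string-formatting them into the SQL.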
Elasticsearch Queries
Queries implementation: here
- What are the top 10 most upvoted comments of all time? Print the comment and the score in an ordered list.
- How many of the comments listed as controversial are also listed as an edited comment?
- Show all the controversial comments that were made at night (after 10pm) and state how many there are.
- What percentage of comments contain the word "sorry" and are also replies to another comment?
- Who were the top 3 users that commented the most in 2006? How many comments did they make and what was their top commented subreddit?
- Find all comments about postgres. Display the number of comments that have a score between 15 and 30, and display the highest-scoring and lowest-scoring comment in that range.
- Display the number of comments for every subreddit and the top comment score. Order them by popularity.
- Query every comment between September 2007 and December 2007 that contains either the word ‘sql’ or ‘nosql’. Only include comments with a score greater than 0. Print the number of comments and the first 10 results (sorted by score).
- Find the top comment in January 2007, print it, and display the total number of replies it received.
- Find all comments from 2006 that mention at least 2 of the following words: sql, database, programming and software. State the number of comments.
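As an illustration of how two of these questions map onto the Elasticsearch query DSL, the sketch below builds request bodies for the September to December 2007 "sql"/"nosql" query and the 2006 "at least 2 of 4 words" query. The field names `body`, `score` and `created_utc` (assumed to be mapped as a `date`) are hypothetical and should be adjusted to the actual index mapping; the linked implementation may phrase these queries differently.

```python
import json

# Hypothetical field names: "body" (comment text), "score" (Reddit score,
# not the relevance _score) and "created_utc" (assumed mapped as a date).


def date_range(start: str, end_exclusive: str) -> dict:
    """Range clause on created_utc, start inclusive, end exclusive."""
    return {
        "range": {
            "created_utc": {"gte": start, "lt": end_exclusive, "format": "yyyy-MM-dd"}
        }
    }


def sql_comments_late_2007() -> dict:
    """Comments from Sep-Dec 2007 mentioning 'sql' or 'nosql' with score > 0."""
    return {
        "track_total_hits": True,  # exact match count in hits.total.value
        "size": 10,  # only the first 10 hits...
        "sort": [{"score": {"order": "desc"}}],  # ...ordered by Reddit score
        "query": {
            "bool": {
                "filter": [
                    date_range("2007-09-01", "2008-01-01"),
                    {"range": {"score": {"gt": 0}}},
                ],
                # At least one of the two terms must appear in the body.
                "should": [
                    {"match": {"body": "sql"}},
                    {"match": {"body": "nosql"}},
                ],
                "minimum_should_match": 1,
            }
        },
    }


def multi_keyword_comments_2006(
    words=("sql", "database", "programming", "software"),
) -> dict:
    """Comments from 2006 that mention at least two of the given words."""
    return {
        "track_total_hits": True,
        "size": 0,  # only the count is needed, not the documents
        "query": {
            "bool": {
                "filter": [date_range("2006-01-01", "2007-01-01")],
                "should": [{"match": {"body": word}} for word in words],
                "minimum_should_match": 2,
            }
        },
    }


print(json.dumps(sql_comments_late_2007(), indent=2))
```

Each dict is meant to be sent as the JSON body of a `_search` request. The only difference between "either word" and "at least 2 of these words" is the `minimum_should_match` value on the `should` clauses.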
This repository is available under the MIT LICENSE.