Lisa Watkins
netID: watkinsl
IS 452AO
2 Credit Hours
Final Project | ReadMe File
12/21/17
This repository will house every aspect of my final project for IS452 Foundations of Info Processing (Python).
- Final project proposal, a required part of the final project (the project has evolved since the proposal was submitted)
- Python (.py) file containing program
- necessary HTML and HTM files that will be used within the Python program
- CSV file that the program produces as output
- Word document containing a narrative about the process of developing this program as part of the project requirements
- This ReadMe file, which acts as the documentation on how to run the code, another requirement of the project
What is the point of this program: This program is an exercise in web scraping for purposeful data. I believe the program is scalable and could be adapted to any set of artists. This particular version looks at the nominees for the Artist of the Year award at the American Music Awards. There were five nominees:
- Bruno Mars
- Kendrick Lamar
- Drake
- The Chainsmokers
- Ed Sheeran
The program scrapes web files saved from various sources: Ticketmaster, Instagram, and Twitter. The reasoning behind this will be explained below. Certain pieces of information are scraped, cleaned up, interpreted, and written to a CSV file, which can be used for further investigation.
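As a rough illustration of the scraping step, the sketch below reads a saved page from disk and pulls out one cleaned-up piece of text (the page title). It is a minimal example that uses only Python's standard library. The class and function names (TitleParser, extract_title, read_saved_page) are hypothetical and are not taken from the actual program, which may use a different parsing library and extract different fields.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found inside the page's <title> tag."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only keep text that appears between <title> and </title>.
        if self._in_title:
            self.title_parts.append(data)


def extract_title(html_text):
    """Return the cleaned-up <title> text of a page ('' if there is none)."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    # Collapse runs of whitespace left over from the page's formatting.
    return " ".join("".join(parser.title_parts).split())


def read_saved_page(path):
    """Read a saved HTML/HTM file from disk, replacing undecodable bytes.

    Assumption: the saved pages are UTF-8 encoded.
    """
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.read()
```

With a saved page in the current directory, the two helpers could be combined as extract_title(read_saved_page(some_file_name)), where some_file_name stands for one of the downloaded HTML or HTM files.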
To execute the program successfully, you will need to do the following:
- Download the files, which include:
  - watkinsl_IS452_final-project.py (Python file)
  - HTML & HTM files:
    - artist_ (5)
    - IG_ (5)
    - Twitter_ (5)
- Make sure all of the files are in the same directory location
  - meaning, DO NOT separate them into folders or in other locations
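Before running the program, it may help to confirm that the saved pages are all in one place. The sketch below is not part of the project; it counts the .html and .htm files in a directory for each of the three prefixes listed above. The helper names (count_saved_pages, missing_pages) are hypothetical.

```python
from pathlib import Path

# Prefixes taken from the file list above; five pages are expected for each.
EXPECTED_PREFIXES = ("artist_", "IG_", "Twitter_")
EXPECTED_PER_PREFIX = 5


def count_saved_pages(directory="."):
    """Count the .html/.htm files in `directory` for each expected prefix."""
    counts = {prefix: 0 for prefix in EXPECTED_PREFIXES}
    for path in Path(directory).iterdir():
        # Skip sub-folders and anything that is not a saved web page.
        if not path.is_file() or path.suffix.lower() not in (".html", ".htm"):
            continue
        for prefix in EXPECTED_PREFIXES:
            if path.name.startswith(prefix):
                counts[prefix] += 1
    return counts


def missing_pages(directory="."):
    """Return the prefixes that have fewer saved pages than expected."""
    counts = count_saved_pages(directory)
    return [prefix for prefix, n in counts.items() if n < EXPECTED_PER_PREFIX]
```

Calling missing_pages() from the directory that holds the Python file should return an empty list if all fifteen pages were downloaded to that location.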
What to expect: Once all of the necessary files are downloaded, you will be able to run the program, provided you have software that supports Python files (e.g., PyCharm).
Note: This program was developed using Python 3.
The program doesn't require any input, so you can run it as soon as you open it. When the program finishes running, it outputs a CSV file called "AMA-nominated_artists_info.csv" to the directory. The CSV file will contain:
- one row of headers corresponding to related information that will be stored within that column
- seven columns with varying types of information (e.g., name, rating, # of reviews...)
- five rows containing artist information for each of the five artists nominated in the "Artist of the Year" award category at the American Music Awards
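To confirm that a run produced the expected output, the shape described above (one header row, seven columns, five artist rows) can be checked with a short script. This is a minimal sketch and not part of the project. The function name check_output_shape is hypothetical, and the UTF-8 encoding is an assumption about how the file was written.

```python
import csv

# File name as given in this ReadMe.
OUTPUT_NAME = "AMA-nominated_artists_info.csv"


def check_output_shape(path=OUTPUT_NAME, n_columns=7, n_artists=5):
    """Return True if the CSV has one header row plus `n_artists` rows,
    each with exactly `n_columns` columns."""
    # Assumption: the file is UTF-8 encoded.
    with open(path, newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))
    if len(rows) != n_artists + 1:
        return False
    return all(len(row) == n_columns for row in rows)
```

Running check_output_shape() in the directory that contains the CSV file should return True if the program finished normally.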
What is significant about this program: This program aggregates data scraped from various web sources (Ticketmaster, Instagram, and Twitter) on the set of artists that were nominated for the Artist of the Year award at the American Music Awards (AMAs).
The AMAs decide on winners based on a myriad of factors, one of which is fan interaction. Today, social media is most likely the best way for an artist to interact with their fans, so I chose two social media platforms that are conducive to that interaction. Artists also interact with their fans through concerts, which is where Ticketmaster comes in. Fans rate and review their experiences at an artist's concert, which matters for the artist's popularity and success.