NBA Data Mining Using Scrapy.
The goal of this project is to use Scrapy to scrape web data and build a MySQL database of NBA records for future use. Some quick analysis was also done in Jupyter Notebook, but that was not the main focus of this project.
All data was taken from [Basketball Reference](https://www.basketball-reference.com/).
The method for collecting the data was simple:
- Using Python's Scrapy framework, scrape various records from past NBA seasons.
- Various spiders were created for different tasks:
  - `gamelog.py` crawled through individual game performances for each team for a given season.
  - `playoffs.py` crawled through past playoff performances.
  - `regular.py` crawled through past regular season performances.
- Using a local MySQL server and a Scrapy pipeline, scraped data was passed directly into the database.
- That's it! Now you have a database of past NBA records set up to do whatever you like with.
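
The pipeline step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the class name `GameLogPipeline`, the `gamelog` table, and the item fields `team`, `season`, `game_date`, and `points` are all hypothetical, and the sample items are made up. To keep the sketch runnable without a server, it uses Python's built-in `sqlite3` as a stand-in for MySQL; a MySQL driver would be connected in its place and would typically use `%s` placeholders instead of `?`.

```python
import sqlite3


class GameLogPipeline:
    """Sketch of a Scrapy item pipeline that stores each scraped item in a database.

    Scrapy calls open_spider once when a spider starts, process_item for every
    item the spider yields, and close_spider once when the spider finishes.
    """

    def __init__(self, connect=lambda: sqlite3.connect(":memory:")):
        # `connect` is a hypothetical hook: any function returning a DB-API
        # connection. sqlite3 is only a stand-in for a local MySQL server.
        self._connect = connect

    def open_spider(self, spider):
        self.conn = self._connect()
        cur = self.conn.cursor()
        # Hypothetical table layout, for illustration only.
        cur.execute(
            "CREATE TABLE IF NOT EXISTS gamelog ("
            "team TEXT, season INTEGER, game_date TEXT, points INTEGER)"
        )
        self.conn.commit()

    def process_item(self, item, spider):
        cur = self.conn.cursor()
        # sqlite3 uses "?" placeholders; MySQL drivers typically use "%s".
        cur.execute(
            "INSERT INTO gamelog (team, season, game_date, points) "
            "VALUES (?, ?, ?, ?)",
            (item["team"], item["season"], item["game_date"], item["points"]),
        )
        self.conn.commit()
        return item  # pass the item on to any later pipeline

    def close_spider(self, spider):
        self.conn.close()


# Drive the pipeline by hand with made-up items shaped like what a spider
# such as gamelog.py might yield (the real fields may differ).
pipeline = GameLogPipeline()
pipeline.open_spider(spider=None)
sample_items = [
    {"team": "AAA", "season": 2018, "game_date": "2018-01-01", "points": 112},
    {"team": "BBB", "season": 2018, "game_date": "2018-01-01", "points": 101},
]
for item in sample_items:
    pipeline.process_item(item, spider=None)

cur = pipeline.conn.cursor()
cur.execute("SELECT team, points FROM gamelog ORDER BY points")
rows = cur.fetchall()
print(rows)
pipeline.close_spider(spider=None)
```

In a real Scrapy project, a class like this is enabled through the `ITEM_PIPELINES` setting in `settings.py`, and Scrapy calls the three methods itself rather than the hand-written loop shown here.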
Analysis was done using Jupyter Notebook.
Pandas: pandas DataFrame objects were used to manipulate data for visualization and analysis.
The main visualization tools used were Plotly and matplotlib. In most cases, Plotly was preferred for its interactivity with data. However, matplotlib was used when Plotly's documentation was too poor to work from or when quick analysis was favored over interactivity.
(Yes, I know that some of the code is not the cleanest, but it works for the time being.)
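
As a rough illustration of the pandas step, the sketch below loads a few rows into a DataFrame and computes an average per team. The column names `team`, `season`, and `points` and every value in `rows` are made up for the example; they stand in for whatever a query against the scraped database would return.

```python
import pandas as pd

# Made-up rows standing in for the result of a query against the database.
rows = [
    {"team": "AAA", "season": 2018, "points": 100},
    {"team": "AAA", "season": 2018, "points": 110},
    {"team": "BBB", "season": 2018, "points": 90},
    {"team": "BBB", "season": 2018, "points": 96},
]
games = pd.DataFrame(rows)

# Average points per game for each team, highest first.
avg_points = (
    games.groupby("team")["points"].mean().sort_values(ascending=False)
)
print(avg_points)
```

A Series like `avg_points` can then be passed to `avg_points.plot(kind="bar")` for a quick matplotlib chart, or handed to Plotly when interactivity matters more.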
Email: [email protected]