
thanh17/nba


NBA

NBA Data Mining Using Scrapy.

The goal of this project is to use Scrapy to scrape web data and build a MySQL database of NBA records for future use. Some quick analysis was also done in Jupyter Notebook, but that was not the main focus of this project.

Data

All data was scraped from Basketball Reference (https://www.basketball-reference.com/).

The method for collecting the data was simple:

  1. Using Python's Scrapy framework, scrape data from past NBA seasons.
    • Various spiders were created for different tasks:
      • gamelog.py crawled through individual game performances for each team for a given season.
      • playoffs.py crawled through past playoff performances.
      • regular.py crawled through past regular season performances.
  2. Using a local MySQL server, a Scrapy item pipeline was created to pass scraped data directly into the database.
  3. That's it! Now you have a database of past NBA records to do whatever you want with.
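The pipeline step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the class name `GameLogPipeline`, the `games` table, and its fields (`team`, `game_date`, `pts`, `opp_pts`) are hypothetical, and Python's built-in `sqlite3` stands in for MySQL so the sketch runs without a server. Scrapy calls `open_spider` and `close_spider` once per crawl and `process_item` for every item a spider yields.

```python
import sqlite3


class GameLogPipeline:
    """Hypothetical Scrapy item pipeline that stores each scraped game in a SQL table.

    sqlite3 is a stand-in for MySQL so this sketch runs without a database server.
    """

    def __init__(self, db_path="nba.db"):
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # One connection for the whole crawl; create the (hypothetical) table if needed.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS games (
                   team TEXT,
                   game_date TEXT,
                   pts INTEGER,
                   opp_pts INTEGER,
                   PRIMARY KEY (team, game_date)
               )"""
        )

    def process_item(self, item, spider):
        # Parameterized insert so scraped strings are never spliced into SQL.
        # "INSERT OR REPLACE" is SQLite syntax; in MySQL, REPLACE INTO or
        # INSERT ... ON DUPLICATE KEY UPDATE plays the same role on re-crawls.
        self.conn.execute(
            "INSERT OR REPLACE INTO games (team, game_date, pts, opp_pts) "
            "VALUES (?, ?, ?, ?)",
            (item["team"], item["game_date"], int(item["pts"]), int(item["opp_pts"])),
        )
        return item  # returning the item hands it to any later pipelines

    def close_spider(self, spider):
        # Commit everything written during the crawl, then release the connection.
        self.conn.commit()
        self.conn.close()
```

Scrapy only runs a pipeline that is listed in the project's `ITEM_PIPELINES` setting, for example `ITEM_PIPELINES = {"nba.pipelines.GameLogPipeline": 300}` (the module path here is hypothetical). With a MySQL driver such as PyMySQL, its `connect` call and `%s` placeholders would replace the `sqlite3` equivalents.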

Analysis

Analysis was done using Jupyter Notebook.

Notebook Contents

Packages used:

Pandas: pandas DataFrame objects were used to manipulate data for visualization and analysis.
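As a rough sketch of that kind of DataFrame work, the snippet below loads game records from SQL and computes per-team averages and point differential. It is not taken from the project's notebook: it assumes a hypothetical `games` table with `team`, `game_date`, `pts`, and `opp_pts` columns, the demo rows are made up, and an in-memory `sqlite3` database stands in for MySQL.

```python
import sqlite3

import pandas as pd


def load_games(conn):
    """Read the hypothetical `games` table into a DataFrame."""
    return pd.read_sql("SELECT team, game_date, pts, opp_pts FROM games", conn)


def team_summary(games):
    """Mean points for, points against, and point differential per team, best first."""
    games = games.assign(diff=games["pts"] - games["opp_pts"])
    summary = games.groupby("team")[["pts", "opp_pts", "diff"]].mean()
    return summary.sort_values("diff", ascending=False)


if __name__ == "__main__":
    # Made-up rows in an in-memory table, standing in for the MySQL database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE games (team TEXT, game_date TEXT, pts INTEGER, opp_pts INTEGER)")
    conn.executemany(
        "INSERT INTO games VALUES (?, ?, ?, ?)",
        [("AAA", "d1", 110, 100), ("AAA", "d2", 100, 104), ("BBB", "d1", 95, 105)],
    )
    print(team_summary(load_games(conn)))
```

Against a real MySQL server, pandas recommends passing a SQLAlchemy engine to `read_sql` rather than a raw driver connection; the resulting DataFrame can then be handed to Plotly or matplotlib as-is.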

Main Visualization tools:

The main visualization tools used were Plotly and matplotlib. In most cases, Plotly was preferred for its interactivity. However, matplotlib was used when Plotly's documentation was too hard to work with, or when quick analysis was favored over interactivity.

Questions/Comments/Concerns:

(Yes, I know that some of the code is not the cleanest, but it works for the time being.)

Email: [email protected]
