This project compares the salaries and Player Efficiency Ratings (PER) of players in the NBA and WNBA. We scraped data from several sports websites, then cleaned and analyzed it to identify patterns and differences between the two leagues. We also devised our own simplified player efficiency rating for further analysis. Key insights are visualized, and the code is modularized for clarity and reusability.
Data was collected exclusively through web scraping from the following sources:
- NBA Player Salaries: HoopsHype NBA Player Salaries
- WNBA Player Salaries: Her Hoop Stats WNBA Player Salaries
- NBA Team Salaries: HoopsHype NBA Team Salaries
- WNBA Team Salaries: Her Hoop Stats WNBA Team Salaries
- WNBA Offensive PER: Her Hoop Stats WNBA Offensive PER
- WNBA Defensive PER: Her Hoop Stats WNBA Defensive PER
- Data Availability: Unlike APIs and structured datasets, the data from web scraping can be less structured, requiring initial exploration and validation.
- Web Scraping Complexity: Handling dynamic content, avoiding scraping limitations, and ensuring ethical scraping practices.
- Data Cleaning: Dealing with missing values, duplicated entries, and inconsistent data formats across different sources.
- Salary Disparity: NBA player salaries are significantly higher than WNBA player salaries.
- PER Correlation: Higher-paid players in both leagues (NBA and WNBA) exhibit better performance efficiency ratings.
Data was scraped using custom Python scripts contained in the /data_extraction directory.
- NBA Player and Team Salaries
  - Extracted using functions: extract_nba_player_salaries and extract_nba_team_salaries
- WNBA Player and Team Salaries
  - Extracted using functions: extract_wnba_player_salaries and extract_wnba_team_salaries
- WNBA Offensive and Defensive PER
  - Extracted and calculated using functions: calculate_and_save_offensive_per and calculate_and_save_defensive_per
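As a rough illustration of what an extraction function might do with a page once it has been downloaded, the sketch below turns an HTML salary table into a list of records using only the Python standard library. The sample markup, the `TableParser` class, and the `parse_salary_table` helper are hypothetical and are not taken from the project's scripts, which may rely on different libraries and page structures.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows
        self._row = None   # row currently being read, or None
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_salary_table(html):
    """Return a list of dicts keyed by the table's header row."""
    parser = TableParser()
    parser.feed(html)
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical sample markup, not copied from any real site.
SAMPLE_HTML = """
<table>
  <tr><th>Player</th><th>Salary</th></tr>
  <tr><td>Player A</td><td>$1,000,000</td></tr>
  <tr><td>Player B</td><td>$250,000</td></tr>
</table>
"""

records = parse_salary_table(SAMPLE_HTML)
print(records)
```

Each record keeps the salary as the raw string found on the page; converting it to a number is left to the cleaning step.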
We devised our own methodology to calculate Player Efficiency Ratings (PER) for further analysis:
- NBA Offensive PER (O-PER):
  - Formula: (PTS + AST + ORB) / 3
  - Data Collected in: nba_top_50_offensive_per.csv
- NBA Defensive PER (D-PER):
  - Formula: (DRB + BLK + STL) / 3
  - Data Collected in: nba_top_50_defensive_per.csv
- WNBA Offensive PER (O-PER):
  - Formula: (PTS + AST + ORB) / 3
  - Data Collected in: wnba_top_50_offensive_per.csv
- WNBA Defensive PER (D-PER):
  - Formula: (DRB + BLK + STL) / 3
  - Data Collected in: wnba_top_50_defensive_per.csv
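The two formulas are plain averages of three box-score statistics, so they can be sketched in a few lines of Python. The function names, the `top_n_by` helper, and the sample stat lines are hypothetical, and the sketch assumes the inputs are per-game averages, which the formulas above do not specify.

```python
def offensive_per(pts, ast, orb):
    """O-PER: mean of points, assists, and offensive rebounds."""
    return (pts + ast + orb) / 3


def defensive_per(drb, blk, stl):
    """D-PER: mean of defensive rebounds, blocks, and steals."""
    return (drb + blk + stl) / 3


def top_n_by(players, key, n=50):
    """Return the n players with the highest value under `key`."""
    return sorted(players, key=lambda p: p[key], reverse=True)[:n]


# Hypothetical stat lines (assumed per-game averages), for illustration only.
players = [
    {"name": "Player A", "pts": 27.0, "ast": 6.0, "orb": 3.0,
     "drb": 6.0, "blk": 1.5, "stl": 1.5},
    {"name": "Player B", "pts": 12.0, "ast": 3.0, "orb": 0.0,
     "drb": 3.0, "blk": 0.0, "stl": 0.0},
]

for p in players:
    p["o_per"] = offensive_per(p["pts"], p["ast"], p["orb"])
    p["d_per"] = defensive_per(p["drb"], p["blk"], p["stl"])

ranked = top_n_by(players, "o_per", n=50)
print([(p["name"], p["o_per"], p["d_per"]) for p in ranked])
```

Because the three inputs are weighted equally, points dominate O-PER and rebounds dominate D-PER for most players; this is a property of the simplified formula, not of the standard PER statistic.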
Initial raw data was scraped into CSV files located within the extracted_data directory. Cleaning steps included:
- Handling null values
- Removing duplicates
- String manipulation
- Formatting data fields
Cleaned data is stored in the cleaned_data directory.
Main cleaning scripts:
- cleaning.py
- clean_salaries.py
- clean_per.py
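The cleaning steps listed above can be sketched with the standard library alone. The column names `player` and `salary`, the helper names, and the sample rows are assumptions made for illustration; here, "handling null values" is interpreted as dropping incomplete rows, which may differ from what the project's cleaning scripts do.

```python
def parse_salary(text):
    """Convert a string such as '$1,234,567' to an int.

    Returns None when the value is missing or not purely numeric.
    """
    if text is None:
        return None
    digits = text.replace("$", "").replace(",", "").strip()
    return int(digits) if digits.isdigit() else None


def clean_rows(rows):
    """Drop incomplete rows, normalize fields, and remove duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        name = (row.get("player") or "").strip()
        salary = parse_salary(row.get("salary"))
        if not name or salary is None:
            continue  # missing value: skip the row
        key = (name.lower(), salary)
        if key in seen:
            continue  # duplicate entry: keep only the first occurrence
        seen.add(key)
        cleaned.append({"player": name, "salary": salary})
    return cleaned


# Hypothetical raw rows, for illustration only.
raw = [
    {"player": " Player A ", "salary": "$1,000,000"},
    {"player": "Player A", "salary": "$1,000,000"},  # duplicate
    {"player": "Player B", "salary": ""},            # missing salary
    {"player": "Player C", "salary": "250,000"},
]

cleaned = clean_rows(raw)
print(cleaned)
```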
After cleaning, exploratory data analysis (EDA) was used to:
- Validate hypotheses
- Apply aggregation and filtering techniques
- Create visualizations
Through data analysis, the following conclusions were drawn:
- Salary Disparity Confirmed: NBA player salaries are significantly higher than WNBA player salaries.
- PER Analysis: Players with higher salaries generally have higher PER in both leagues.
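One way to quantify the relationship between salary and PER is a Pearson correlation coefficient. The sketch below is an illustration under stated assumptions, not the analysis the project performed: the `pearson` helper and the sample values are hypothetical, and a coefficient close to 1 would indicate only a linear association between pay and rating, not that one causes the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical salary/PER pairs, for illustration only.
salaries = [100_000, 150_000, 200_000, 250_000]
per_values = [4.0, 5.5, 5.0, 7.0]

r = pearson(salaries, per_values)
print(f"Pearson r = {r:.3f}")
```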
We created the following visualizations to better illustrate our findings:
- Comparison of Total WNBA Team Salaries (12 Teams) vs. Top NBA Earner (Stephen Curry)
  - This pie chart shows the stark difference between the combined salaries of all WNBA teams and the salary of the highest-paid NBA player.
- Comparison of Total WNBA Player Salaries vs. Top NBA Earner (Stephen Curry)
  - This pie chart compares the combined salaries of all WNBA players against the salary of the top-earning NBA player.
- Comparison of Total NBA PER (O-PER + D-PER) vs. Total WNBA PER (O-PER + D-PER)
  - This pie chart visualizes the combined Player Efficiency Ratings for both NBA and WNBA, demonstrating the performance effectiveness across both leagues.
- How do external factors (like media coverage, sponsorship deals) influence player salaries in the NBA vs WNBA?
- What are the trends in rookie salaries and how do they progress compared to veteran players in both leagues?
- data_extraction/nba.py: Functions for extracting NBA data.
- data_extraction/wnba.py: Functions for extracting WNBA data.
- data_extraction/_offensive_per_wnba.py: Functions for scraping and calculating WNBA Offensive PER.
- data_extraction/_defensive_per_wnba.py: Functions for scraping and calculating WNBA Defensive PER.
- data_extraction/_nba_per.py: Functions for scraping and calculating NBA PER.
- data_processing/clean_salaries.py: Functions for cleaning and processing player and team salaries.
- data_processing/clean_per.py: Functions for cleaning and processing PER data.
The exploratory data analysis methods used include:
- Aggregation: Grouping data by specific attributes to find overall trends.
- Filtering: Narrowing down data sets based on specific criteria to find relevant insights.
- Visualizations: Plots and graphs to visually represent data patterns.
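A minimal sketch of the aggregation and filtering steps, using only the standard library. The field names `league` and `salary`, the helper names, and the sample figures are hypothetical and do not reflect the project's actual data.

```python
from collections import defaultdict
from statistics import mean


def average_salary_by_league(rows):
    """Aggregation: group salaries by league and average each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["league"]].append(row["salary"])
    return {league: mean(values) for league, values in groups.items()}


def filter_min_salary(rows, threshold):
    """Filtering: keep only rows whose salary is at least `threshold`."""
    return [row for row in rows if row["salary"] >= threshold]


# Hypothetical rows, for illustration only.
rows = [
    {"player": "Player A", "league": "NBA", "salary": 1_000_000},
    {"player": "Player B", "league": "NBA", "salary": 3_000_000},
    {"player": "Player C", "league": "WNBA", "salary": 100_000},
    {"player": "Player D", "league": "WNBA", "salary": 200_000},
]

averages = average_salary_by_league(rows)
high_earners = filter_min_salary(rows, 1_000_000)
print(averages)
print([row["player"] for row in high_earners])
```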
- GitHub Repository: nba_vs_wbna
- Kanban Board (Trello): nba_vs_wbna
The findings of this project are presented in an online slide format:
- Presentation Slides: PLACEHOLDER
- Emmanuel Aron
- Marc Jahnert
This project is licensed under the MIT License. See the LICENSE file for details.
To run this project locally, follow these steps:
- Clone the repository:
  git clone https://github.com/JayEm65/nba_vs_wbna.git
- Navigate to the project directory:
  cd nba_vs_wbna
- Install the required dependencies:
  pip install -r requirements.txt
- Run the data extraction and cleaning scripts:
  python main.py