This project analyzes historical bike rental data from Washington D.C.'s Capital Bikeshare system (2011–2012) to identify patterns and build predictive models for daily rental counts. The goal is to understand how environmental and seasonal factors influence demand and recommend optimal strategies for bike allocation.
- Dataset: Capital Bikeshare System (2011–2012)
- Temporal: `dteday` (date), `season`, `mnth`, `weekday`, `hr` (hourly data only).
- Weather: `temp` (normalized temperature), `hum` (humidity), `windspeed`, `weathersit` (weather condition).
- Usage Metrics: `casual` (non-registered users), `registered` (registered users), `cnt` (total rentals).
- Daily Data: `day.csv` (731 records, 16 columns).
- Hourly Data: `hour.csv` (17,379 records, 17 columns).
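A minimal sketch of loading the daily file with pandas and checking that the columns described above are present. The helper name `load_daily` is hypothetical (it is not part of the repository), and the sketch assumes pandas is installed.

```python
from pathlib import Path
from typing import IO, Union

import pandas as pd

# Columns described in this README; day.csv also has others (e.g. instant, yr, atemp).
EXPECTED_COLUMNS = [
    "dteday", "season", "mnth", "weekday", "temp", "hum",
    "windspeed", "weathersit", "casual", "registered", "cnt",
]


def load_daily(source: Union[str, Path, IO[str]]) -> pd.DataFrame:
    """Hypothetical helper: read day.csv, parse dates, and verify key columns."""
    df = pd.read_csv(source, parse_dates=["dteday"])
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"day.csv is missing columns: {missing}")
    return df
```

With the raw data in place, `load_daily(Path("data") / "1.1 raw" / "day.csv")` should return one row per day (731 records per the description above).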
```text
bike-rental-analysis/
├── data/
│ ├── 1.1 raw/ # Raw datasets (hour.csv, day.csv)
│ └── 1.2 processed/ # Processed data splits (train/test)
├── docs/ # Project documentation and datasets
├── notebooks/
│ └── PRCP-1018-BikeRental.ipynb # Main Jupyter notebook for analysis
├── reports/
│ └── Final Report.md # Detailed analysis and results
├── results/
│ ├── models/ # Saved models (e.g., Ridge regression)
│ └── figures/ # Visualizations and EDA outputs
└── scripts/ # Utility and preprocessing scripts
```
- Data Analysis Report:
  - Exploratory Data Analysis (EDA) to identify trends, outliers, and correlations.
  - Seasonal and weather impact assessment on rental demand.
- Predictive Modeling:
  - Regression models to forecast daily bike rentals (`cnt`).
  - Comparison of linear models (Ridge, Lasso) and tree-based models (XGBoost, Gradient Boosting).
- Challenges Report:
  - Solutions for multicollinearity, outliers, and non-normality in the data.
- Outlier Handling: Capping extreme values in `hum` and `windspeed` at the 1st/99th percentiles.
- Feature Engineering:
  - Log transformation for `windspeed`.
  - One-hot encoding for categorical variables (`season`, `weathersit`).
- Multicollinearity Mitigation: Removed redundant features (e.g., `temp` vs. `atemp`).
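The preprocessing steps above can be sketched in pandas as follows. This is an illustration, not the notebook's exact code: the use of `log1p` (so zero wind speeds stay finite), `drop_first=True` in the one-hot encoding, and keeping `atemp` while dropping `temp` are all assumptions.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of capping, log transform, one-hot encoding, and feature removal."""
    out = df.copy()

    # Cap (winsorize) hum and windspeed at their own 1st/99th percentiles.
    for col in ["hum", "windspeed"]:
        lower, upper = out[col].quantile([0.01, 0.99])
        out[col] = out[col].clip(lower=lower, upper=upper)

    # Log-transform windspeed; log1p (assumed) keeps zero values finite.
    out["windspeed"] = np.log1p(out["windspeed"])

    # One-hot encode the categorical codes; drop_first avoids a redundant dummy column.
    out = pd.get_dummies(out, columns=["season", "weathersit"], drop_first=True, dtype=int)

    # Drop temp and keep atemp (assumed choice for the highly correlated pair).
    return out.drop(columns=["temp"], errors="ignore")
```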
- Algorithms Tested:
- Linear Regression, Ridge/Lasso Regression.
- XGBoost, Gradient Boosting, Random Forest.
- Evaluation: R², RMSE, and MAE, estimated with cross-validation.
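A hedged sketch of how such a comparison can be cross-validated with scikit-learn. It assumes a prepared feature matrix `X` and target `y` (`cnt`); scikit-learn's `GradientBoostingRegressor` stands in for XGBoost so the example needs no extra dependency, and `alpha=1.0`, 5 shuffled folds, and the seed are illustrative choices rather than the project's settings. The `RobustScaler` step mirrors the robust scaling mentioned under the challenges.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler


def compare_models(X, y, n_splits: int = 5, seed: int = 42) -> dict:
    """Mean cross-validated R², RMSE, and MAE for a linear and a tree-based model."""
    models = {
        # RobustScaler centres on the median and scales by the IQR, so outliers matter less.
        "Ridge": make_pipeline(RobustScaler(), Ridge(alpha=1.0)),
        # Stand-in for XGBoost (assumption): scikit-learn's gradient boosting.
        "GradientBoosting": GradientBoostingRegressor(random_state=seed),
    }
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scoring = {
        "r2": "r2",
        "rmse": "neg_root_mean_squared_error",
        "mae": "neg_mean_absolute_error",
    }
    results = {}
    for name, model in models.items():
        scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
        results[name] = {
            "R2": float(np.mean(scores["test_r2"])),
            # scikit-learn negates error metrics so that higher is always better.
            "RMSE": float(-np.mean(scores["test_rmse"])),
            "MAE": float(-np.mean(scores["test_mae"])),
        }
    return results
```

Shuffled folds are the simplest choice; because the data is a time series, a chronological split (for example `TimeSeriesSplit`) may give a more realistic estimate of forecasting error.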
Model | R² (Test) | RMSE (rentals/day) | Interpretability |
---|---|---|---|
Ridge | 0.832 | 819.62 | High |
XGBoost | 0.860 | 748.11 | Low |
- Demand Drivers:
  - Temperature (`atemp`) and clear weather increase rentals.
  - Adverse weather (e.g., snow) reduces demand by ~2,149 rentals/day.
- Seasonality:
  - Peak demand in fall/winter (6,000+ rentals/day).
  - 50% YoY growth from 2011 to 2012.
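Seasonal averages and year-over-year growth are simple aggregates over the daily data. The sketch below assumes the `yr` column of `day.csv` (0 = 2011, 1 = 2012) and the dataset's numeric `season` codes; growth is computed as (total 2012 − total 2011) / total 2011 × 100. The function names are hypothetical.

```python
import pandas as pd


def seasonal_summary(df: pd.DataFrame) -> pd.Series:
    """Mean daily rentals (cnt) per season code, highest first."""
    return df.groupby("season")["cnt"].mean().sort_values(ascending=False)


def yoy_growth(df: pd.DataFrame) -> float:
    """Percentage growth in total rentals from 2011 (yr == 0) to 2012 (yr == 1)."""
    totals = df.groupby("yr")["cnt"].sum()
    return float((totals.loc[1] - totals.loc[0]) / totals.loc[0] * 100)
```

For example, if 2011 totalled 100 rentals and 2012 totalled 150, `yoy_growth` returns 50.0.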
| Challenge | Solution |
|---|---|
| Multicollinearity | Removed redundant features (VIF analysis). |
| Outliers in `windspeed` | Log transformation and capping. |
| Non-normal target (`cnt`) | Tree-based models + robust scaling. |
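The VIF analysis can be illustrated with plain NumPy: each feature is regressed on all the others (with an intercept), and VIF_j = 1 / (1 − R²_j). This is a hedged sketch rather than the project's code; `statsmodels` also provides `variance_inflation_factor` for the same purpose. A common rule of thumb flags values above 5 or 10, and a near-duplicate pair such as `temp`/`atemp` typically scores far higher.

```python
import numpy as np
import pandas as pd


def vif(X: pd.DataFrame) -> pd.Series:
    """Variance inflation factor per column, highest first.

    Assumes no constant columns (their R² is undefined).
    """
    values = X.to_numpy(dtype=float)
    n, k = values.shape
    result = {}
    for j in range(k):
        target = values[:, j]
        # Design matrix: intercept plus every other column.
        design = np.column_stack([np.ones(n), np.delete(values, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r2 = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
        # Perfect collinearity (r2 == 1) means an infinite VIF.
        result[X.columns[j]] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return pd.Series(result).sort_values(ascending=False)
```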
```bash
pip install -r requirements.txt
```

The project uses `insightfulpy` for streamlined preprocessing:

```bash
pip install insightfulpy
```
- Run the Jupyter notebook `notebooks/PRCP-1018-BikeRental.ipynb`.
- Execute cells sequentially to reproduce the EDA, modeling, and reports.
- Fanaee-T, H., & Gama, J. (2013). Event Labeling Combining Ensemble Detectors and Background Knowledge. Progress in Artificial Intelligence.
- Dataset Citation: Bike Sharing Dataset.
📌 Note: Full code, visualizations, and model artifacts are available in the repository. For detailed implementation, refer to the Final Report.