
Study of Metro Ridership

This is my class project for the General Assembly Data Science course.

Goals:

  • To visualize the DC Metro rail historical ridership data
  • To determine the variables that affect ridership
  • To build a model that determines the relationship between the response (Metrorail ridership) and the feature variables (e.g., gas price, weather, unemployment)

Game Plan:

  • This is a regression problem, so I plan to use a linear regression model
  • The main evaluation metric will be RMSE (root mean squared error)
  • I will build models of increasing complexity and compare which performs best
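A minimal sketch of this plan, written in plain Python so it needs no third-party packages. It fits ordinary least squares via the normal equations, scores each model by RMSE on a held-out split, and compares nested models that add one feature at a time. The column names (gas_price, temp_f, unemployment, riders_k) and the synthetic data generator are hypothetical stand-ins, not the project's actual data or code.

```python
import math
import random

def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.
    Prepends an intercept column; returns [intercept, coef_1, ..., coef_k]."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    a = [xtx[i] + [xty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]

def predict(coef, X):
    return [coef[0] + sum(c * x for c, x in zip(coef[1:], row)) for row in X]

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

FEATURES = ["gas_price", "temp_f", "unemployment"]  # hypothetical column names

def make_synthetic_days(n, seed=0):
    """Synthetic stand-in for daily ridership data (NOT real WMATA numbers)."""
    rng = random.Random(seed)
    days = []
    for _ in range(n):
        row = {"gas_price": rng.uniform(2.0, 4.5),
               "temp_f": rng.uniform(30.0, 90.0),
               "unemployment": rng.uniform(3.0, 8.0)}
        row["riders_k"] = (500 + 20 * row["gas_price"] + 3 * row["temp_f"]
                           - 10 * row["unemployment"] + rng.gauss(0, 5))
        days.append(row)
    return days

def compare_models(days, train_frac=0.7):
    """Fit nested models (1, 2, then 3 features) and report held-out RMSE."""
    cut = int(len(days) * train_frac)
    train, test = days[:cut], days[cut:]
    results = {}
    for k in range(1, len(FEATURES) + 1):
        cols = FEATURES[:k]
        coef = fit_ols([[d[c] for c in cols] for d in train],
                       [d["riders_k"] for d in train])
        preds = predict(coef, [[d[c] for c in cols] for d in test])
        results[tuple(cols)] = rmse([d["riders_k"] for d in test], preds)
    return results

for cols, score in compare_models(make_synthetic_days(200)).items():
    print(f"{' + '.join(cols):40s} RMSE = {score:.1f}")
```

Because the synthetic days are independent, a simple first-70%/last-30% split works here. Real daily ridership is a time series, so splitting chronologically (train on earlier dates, test on later ones) is likely the more honest check. In practice a library estimator such as scikit-learn's LinearRegression would replace the hand-written solver.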

Guide:

  • A presentation can be found here
  • Want more details? A report can be found here
  • Graphs visualizing data can be found here
  • Data wrangling code can be found here
  • Modeling code can be found here
  • Data dictionary can be found here

To Do List

  • Clean up the code to make it shorter and more elegant
  • Study the large residuals to see if they have anything in common
  • Parameter tuning of the models
  • Add data for days when sports games occur
  • Find a better proxy for tourism
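One way the residual study could start, sketched under assumptions: each day is a dict holding the actual and predicted ridership plus descriptive fields (here a hypothetical weekday and event label), and "large" means a residual more than a chosen number of standard deviations from the mean residual. The sample days are made up for illustration.

```python
from collections import Counter
from statistics import mean, pstdev

def large_residuals(days, z_threshold=2.0):
    """Return copies of the days whose residual (actual - predicted) lies more
    than z_threshold standard deviations from the mean residual."""
    resid = [d["actual"] - d["predicted"] for d in days]
    mu, sigma = mean(resid), pstdev(resid)
    if sigma == 0:
        return []  # every residual is identical, so nothing stands out
    return [dict(d, residual=r) for d, r in zip(days, resid)
            if abs(r - mu) / sigma > z_threshold]

def common_traits(flagged, keys=("weekday", "event")):
    """Tally each value of each descriptive key among the flagged days."""
    return {k: Counter(d.get(k) for d in flagged) for k in keys}

# Made-up example days (hypothetical values, not real ridership).
sample_days = [
    {"weekday": "Mon", "event": None, "actual": 698, "predicted": 700},
    {"weekday": "Tue", "event": None, "actual": 711, "predicted": 710},
    {"weekday": "Wed", "event": None, "actual": 705, "predicted": 705},
    {"weekday": "Thu", "event": None, "actual": 722, "predicted": 720},
    {"weekday": "Fri", "event": "ballgame", "actual": 760, "predicted": 700},
    {"weekday": "Sat", "event": "ballgame", "actual": 555, "predicted": 500},
    {"weekday": "Sun", "event": None, "actual": 449, "predicted": 450},
    {"weekday": "Mon", "event": None, "actual": 701, "predicted": 700},
    {"weekday": "Tue", "event": None, "actual": 708, "predicted": 710},
    {"weekday": "Wed", "event": None, "actual": 703, "predicted": 703},
]

flagged = large_residuals(sample_days, z_threshold=1.5)
for d in flagged:
    print(d["weekday"], d["event"], d["residual"])
print(common_traits(flagged))
```

With only ten made-up days, the two big misses inflate the standard deviation enough that the default cutoff of 2 catches just one of them, which is why the example uses 1.5. A robust scale such as the median absolute deviation may be a better choice for real data.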

Wish List

  • Make interactive data visualizations using JavaScript/D3, perhaps something like this