Uber Fares is a Data Science and Machine Learning project I worked on in my free time. Its goals were to analyse a dataset of 200k NYC Uber rides and to build a model that predicts the price of a trip.
During the project's development I:
- Downloaded the dataset from Kaggle
- Formatted, cleaned, and enriched the dataset with additional data (NYC Neighborhoods and US Holidays)
- Created qualitative, spatial, and temporal visualisations with Seaborn
- Iterated through several ML algorithms, such as polynomial regression, ElasticNet, and decision trees
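The holiday-enrichment step above can be sketched with pandas. This is a minimal illustration, not the project's actual code: the `pickup_datetime` column name and the hand-picked holiday dates are assumptions.

```python
import pandas as pd

# Tiny illustrative subset of US holidays; the real project would use a
# complete holiday calendar.
US_HOLIDAYS_2014 = {"2014-01-01", "2014-07-04", "2014-12-25"}

# Hypothetical sample of the raw Kaggle dump (column names are assumptions).
rides = pd.DataFrame({
    "pickup_datetime": pd.to_datetime(["2014-07-04 10:00", "2014-07-05 10:00"]),
    "fare_amount": [12.5, 9.0],
})

# Flag rides whose pickup date falls on a holiday.
rides["is_holiday"] = (
    rides["pickup_datetime"].dt.strftime("%Y-%m-%d").isin(US_HOLIDAYS_2014)
)
```

The same pattern extends to the neighborhood enrichment: map each pickup coordinate to an NYC neighborhood polygon and store the label as a new column.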
Documentation is hosted on Netlify and built with Sphinx.
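The model-iteration step listed above can be sketched with scikit-learn. This is a hedged example on synthetic data, not the project's actual pipeline; the feature set and hyperparameters are assumptions.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for engineered features (e.g. trip distance, hour of day).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = 2.5 + 1.8 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0, 0.5, size=200)

# Polynomial features + ElasticNet, combining two of the algorithms tried.
model = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=2),
    ElasticNet(alpha=0.1),
)
model.fit(X, y)
r2 = model.score(X, y)  # in-sample fit; real evaluation used held-out data
```

In practice each candidate model would be compared with cross-validation on the processed dataset rather than scored in-sample.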
```
├── data
│   ├── external            <- Data from third-party sources.
│   ├── interim             <- Intermediate data that has been transformed.
│   ├── processed           <- The final, canonical data sets for modeling.
│   └── raw                 <- The original, immutable data dump.
├── docs                    <- Sphinx docs; see sphinx-doc.org for details.
├── models                  <- Trained and serialized models.
├── notebooks               <- Jupyter notebooks for exploration.
│   ├── 0.1_data_processing_tests
│   ├── 0.2_exploration
│   └── 0.3_machine_learning
├── references              <- Data dictionaries, manuals, and all other explanatory materials.
├── reports                 <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures             <- Generated graphics and figures to be used in reporting.
├── utils                   <- Source code for all analysis.
│   ├── data                <- Scripts to preprocess data for analysis.
│   ├── features            <- Scripts to build features.
│   ├── models              <- Scripts to train models.
│   └── visualization       <- Scripts to produce visualisations.
├── web                     <- Web demo.
├── environment.yml         <- Template for conda environment creation.
├── Makefile                <- Makefile with commands like `make data` or `make model`.
├── pyproject.toml          <- Python project config file.
├── README.md               <- The top-level README for developers using this project.
├── requirements.txt        <- Pip requirements.
├── test_environment.py     <- Script for testing the correct environment setup.
└── tox.ini                 <- tox file with settings for running tox; see tox.readthedocs.io.
```