The goal of this project is to implement the skeleton of a robust ELT pipeline. Things to consider are:
- version control
- development flow
- project file structure
- unit testing
- logging
- documentation
- virtual environments/dependency management
- orchestration
- general best practices for data engineering
- containerization
- supporting downstream analytics/ML
Strava API --> Python --> BigQuery + dbt --> Tableau/ML in Jupyter Notebook
- light data transformation with Pandas (see the extract-and-load sketch after this list)
- orchestration through Google Cloud services
- data storage in BigQuery
- final data transformations (dimensional modeling + OBT) for downstream analytics through dbt
- containerization via Docker
- ELT job notifications sent through Slack
- downstream analytics supported by this pipeline
  - dashboard via Tableau
  - cycling ML model via Python/scikit-learn
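A minimal sketch of the extract-and-load step is shown below. It assumes a valid Strava access token and an existing BigQuery dataset; names such as `STRAVA_ACCESS_TOKEN`, `my-gcp-project`, and `strava.activities` are placeholders, and `pandas-gbq` is used here only as one possible load path.

```python
"""Sketch: pull activities from the Strava API, lightly clean with Pandas,
and append to BigQuery. Project, dataset, and env-var names are placeholders."""
import os

import pandas as pd
import pandas_gbq
import requests

STRAVA_ACTIVITIES_URL = "https://www.strava.com/api/v3/athlete/activities"


def extract_activities(access_token: str, per_page: int = 200) -> pd.DataFrame:
    """Fetch recent activities from the Strava API into a DataFrame."""
    response = requests.get(
        STRAVA_ACTIVITIES_URL,
        headers={"Authorization": f"Bearer {access_token}"},
        params={"per_page": per_page, "page": 1},
        timeout=30,
    )
    response.raise_for_status()
    return pd.DataFrame(response.json())


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Light cleanup only; heavier modeling is left to dbt downstream."""
    df = df.copy()
    df["start_date"] = pd.to_datetime(df["start_date"])
    keep = ["id", "name", "type", "distance", "moving_time", "start_date"]
    return df[keep]


def load(df: pd.DataFrame) -> None:
    """Append the batch to BigQuery (table is created if it does not exist)."""
    pandas_gbq.to_gbq(
        df,
        destination_table="strava.activities",  # placeholder dataset.table
        project_id="my-gcp-project",             # placeholder project id
        if_exists="append",
    )


if __name__ == "__main__":
    token = os.environ["STRAVA_ACCESS_TOKEN"]    # placeholder env var
    load(transform(extract_activities(token)))
```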
- Python application is containerized and pushed to Google Cloud Artifact Registry
- Container is then deployed on Cloud Run Jobs at a set schedule
- Every night at midnight, the ELT pipeline runs, checking for new data to upload to BigQuery
- At job completion, a Slack notification with job metadata and success status is sent (see the sketch below)
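The notification step could look like the sketch below, assuming a Slack incoming-webhook URL is available in a `SLACK_WEBHOOK_URL` environment variable (placeholder name); the metadata fields shown are illustrative.

```python
"""Sketch: post a job-completion summary to Slack via an incoming webhook."""
import os
from datetime import datetime, timezone

import requests


def notify_slack(job_name: str, rows_loaded: int, success: bool) -> None:
    """Send a short summary message to the configured Slack channel."""
    status = ":white_check_mark: success" if success else ":x: failure"
    message = (
        f"ELT job *{job_name}* finished with {status}\n"
        f"rows loaded: {rows_loaded}\n"
        f"finished at: {datetime.now(timezone.utc).isoformat()}"
    )
    response = requests.post(
        os.environ["SLACK_WEBHOOK_URL"],  # placeholder env var
        json={"text": message},
        timeout=10,
    )
    response.raise_for_status()


if __name__ == "__main__":
    notify_slack("strava-elt", rows_loaded=42, success=True)
```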
- configs: .yml file with API tokens, db user/password, and ELT parameters (a config-loading sketch follows this list)
- src: source code
- tests: unit tests
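Loading that config could be as simple as the sketch below, assuming a `configs/config.yml` file with hypothetical keys; in practice, secrets like API tokens and db passwords would typically come from a secret manager rather than a file checked into the repo.

```python
"""Sketch: read the pipeline configuration from a YAML file with PyYAML.
The path and key names are hypothetical."""
from pathlib import Path

import yaml  # PyYAML


def load_config(path: str = "configs/config.yml") -> dict:
    """Read the pipeline configuration into a plain dict."""
    with Path(path).open("r", encoding="utf-8") as f:
        return yaml.safe_load(f)


if __name__ == "__main__":
    config = load_config()
    # e.g. config["strava"]["access_token"], config["bigquery"]["project_id"]
    print(sorted(config.keys()))
```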