Hello! I'm Rich, a data analyst with a background in data product management and software development.
This portfolio page is where I track my personal development and showcase what I've been working on.
A project demonstrating techniques for understanding and predicting customer churn for a simulated social network. By combining product usage event data with customer subscription data, we can reveal which usage behaviours have the biggest impact on churn probability and predict the churn probability of individual accounts.
Cohort Analysis | Logistic Regression | Postgres | SQL
A project that uses AWS components to combine a SQL database with streaming event data and transform the result for analysis with Amazon Athena.
Data Engineering | Kinesis Firehose | S3 | AWS Glue | AWS Lambda
A data cleaning and logistic regression pipeline implemented in PySpark that examines which aspects of an Epicurious recipe are important in determining whether or not the recipe is a dessert. The final pyspark.ml Pipeline uses custom-made Transformers and Estimators for missing value imputation and outlier capping.
PySpark ML | Logistic Regression | Data Cleaning
A data pipeline orchestrated across AWS and Google Cloud Platform, using mage.ai for data transformation. The project uses Looker Studio to visualise all trips made by licensed yellow cabs in New York during January 2023.
Data Engineering | EC2 | BigQuery | Looker Studio
A Microsoft Power BI business intelligence dashboard for AdventureWorks, a fictional global manufacturing company that produces cycling equipment and accessories. The data was derived from the AdventureWorks sample databases available from Microsoft.
Power BI | M Formula Language | Power Query | DAX
Where appropriate, I include links to my own solutions to end-of-chapter exercises.
Anthony DeBarros - Practical SQL, 2nd Edition
(my chapter solutions)
James, Witten, Hastie, Tibshirani and Taylor - An Introduction to Statistical Learning with Applications in Python
(my solutions for labs and end-of-chapter exercises)
Thomas Haslwanter - An Introduction to Statistics with Python
(my chapter solutions)
Maven Analytics - Statistics for Data Analysis
(my solutions and notes from mid-course projects)
Jonathan Rioux - Data Analysis with Python and PySpark
(my chapter solutions)