Skip to content

This repository contains files related to the case study that I completed in partial fulfillment of the requirements for the Google Data Analytics Professional Certificate, which I earned in July 2022.

Notifications You must be signed in to change notification settings

baspin94/GoogleDataAnalyticsCaseStudy

Repository files navigation

Google Data Analytics Case Study: How Does a Bike-Share Navigate Speedy Success?

This repository contains files related to the case study that I completed in partial fulfillment of the requirements for the Google Data Analytics Professional Certificate, which I earned in July 2022. For this scenario, I was tasked with comparing how the members and casual users of a bike-sharing service (a fictional company named Cyclistic) differed in various aspects of ridership.

Documentation

Data Source and Licensing

This project utilizes publicly available, anonymized trip data provided by Divvy, a bikeshare program operated by Lyft with oversight from the Chicago Department of Transportation. It can be used for this project thanks to the license agreement Divvy has established in conjunction with the City of Chicago. This agreement allows users to download and use the data as source material for analyses, reports, and visualizations.

The original datasets can be downloaded from this repository. I utilized the datasets from the entire 2021 calendar year for this project (filenames starting with '2021').

Navigating this Repository

SQL Used in Report

Updates

  • Calculate_TripType.sql: Collection of updates that categorized trips based on distance and ride length.
    • Note: I created this column to aid with a distance-related line of inquiry that I later abandoned, but I still ended up using it in all of my other queries to filter out trips that were in the 'canceled' category.
  • EndDayIssue_Deletion.sql: Update to delete a small number of records from the database that had an end date greater than 1 day from the start date. This does not represent a typical use case for the bike-sharing service and could have been the result of a data-encoding error.

Queries

  • 2021_TotalRidership.sql: Summarizes the number and percentage of trips completed during 2021 by members and casual riders.
  • Most_Popular_Stations.sql: Collection of queries used to pull data related to the most popular stations for members and casual riders.
  • RideLength_Statistics.sql: Query to explore various statistics (average, median, max length) and total hours ridden by members and casual riders.
  • RideableType_Breakdown.sql: Query to determine how frequently members and casual riders utilized the different bike types available.
  • Ridership_ByDay.sql: Queries to count the number of trips completed on each day of the week by members and casual riders.
  • Ridership_ByHour.sql: Query to coiunt the number of trips that members and casual riders started during a given hour of the day.

SQL Not Used in Report

Updates

  • Season_Updates.sql: Collection of updates that categorized trips based on season.
    • Note: I created this column with the intent of comparing the usage of bikes from season to season, but this would have been mostly redundant information, given that I also compared usage from month to month.

Excel Data Used to Create Visuals in Tableau

  • tableau_data_cleaned.xlsx: Copy of the data pulled from my SQL database and cleaned for the visualizations I created in Tableau.

Changelog

  • GDA_DataCleaning_Changelog: Summarizes the changes that I made while cleaning the datasets for this project, including how many records were affected.

Data Dictionary

  • GDA_DataDictionary.pdf: Provides the name, type, and description of each field in the data used for this analysis.

About

This repository contains files related to the case study that I completed in partial fulfillment of the requirements for the Google Data Analytics Professional Certificate, which I earned in July 2022.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published