Workshop: Maximizing Confidence in Your Data Model Changes with dbt and PipeRider

To learn how to use PipeRider together with dbt to detect changes in models and data, sign up for a workshop.

Homework

These questions build on the original Week 4 homework and use the same data:

https://github.com/DataTalksClub/data-engineering-zoomcamp/blob/main/cohorts/2023/week_4_analytics_engineering/homework.md

  • Yellow taxi data - Years 2019 and 2020
  • Green taxi data - Years 2019 and 2020
  • FHV data - Year 2019

Question 1:

What is the distribution of vendor IDs when filtering the data to years 2019 and 2020?

You will need to run PipeRider and check the report

  • 70.1/29.6/0.5
  • 60.1/39.5/0.4
  • 90.2/9.5/0.3
  • 80.1/19.7/0.2
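
To make the quantity in Question 1 concrete, here is a minimal Python sketch of a value distribution expressed as percentages. It is not PipeRider's implementation; the function name `value_distribution` and the sample list are hypothetical, and the sample is not the homework data.

```python
from collections import Counter


def value_distribution(values):
    """Return each distinct value's share of all values, as a percentage.

    Illustrative helper (hypothetical name), not part of PipeRider or dbt.
    """
    counts = Counter(values)
    total = sum(counts.values())
    return {value: round(100 * n / total, 1) for value, n in counts.items()}


# Hypothetical vendor IDs, NOT the taxi data used in the homework.
sample_vendor_ids = [1, 2, 2, 2, 4, 2, 1, 2, 2, 1]
print(value_distribution(sample_vendor_ids))
```

The homework answer itself still has to come from the PipeRider report, since it depends on the full 2019 and 2020 data.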

Question 2:

What is the composition of total amount (positive/zero/negative) when filtering the data to years 2019 and 2020?

You will need to run PipeRider and check the report

  • 51.4M/15K/48.6K
  • 21.4M/5K/248.6K
  • 61.4M/25K/148.6K
  • 81.4M/35K/14.6K
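
As a rough illustration of what a positive/zero/negative composition means, the sketch below counts values by sign. The helper name `sign_composition` and the sample amounts are assumptions made for this example; they are not taken from PipeRider or from the homework data.

```python
def sign_composition(amounts):
    """Count how many values are positive, zero, and negative.

    Illustrative helper (hypothetical name), not part of PipeRider.
    """
    positive = sum(1 for a in amounts if a > 0)
    zero = sum(1 for a in amounts if a == 0)
    negative = sum(1 for a in amounts if a < 0)
    return {"positive": positive, "zero": zero, "negative": negative}


# Hypothetical total amounts, NOT the taxi data used in the homework.
sample_total_amounts = [12.5, 0.0, -3.0, 8.8, 20.0, 0.0]
print(sign_composition(sample_total_amounts))
```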

Question 3:

What are the numeric statistics (average/standard deviation/min/max/sum) of trip distance when filtering the data to years 2019 and 2020?

You will need to run PipeRider and check the report

  • 1.95/35.43/0/16.3K/151.5M
  • 3.95/25.43/23.88/267.3K/281.5M
  • 5.95/75.43/-63.88/67.3K/81.5M
  • 2.95/35.43/-23.88/167.3K/181.5M
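
The five statistics named in Question 3 can be sketched with Python's standard library as follows. This is only an illustration on made-up numbers: `numeric_summary` is a hypothetical helper, and the use of the population standard deviation is an assumption, since the source does not say which variant the PipeRider report uses.

```python
import statistics


def numeric_summary(values):
    """Return average, standard deviation, min, max, and sum of the values.

    Illustrative helper (hypothetical name). Population standard deviation
    is an assumption; the report may use the sample variant instead.
    """
    return {
        "avg": statistics.fmean(values),
        "stddev": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
    }


# Hypothetical trip distances, NOT the taxi data used in the homework.
sample_trip_distances = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(numeric_summary(sample_trip_distances))
```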

Submitting the solutions

Deadline: 20 March, 22:00 CET

Solution

We will publish the solution here