To learn how to use PipeRider together with dbt to detect changes in models and data, sign up for a workshop.
- Video: https://www.youtube.com/watch?v=O-tyUOQccSs
- Repository: https://github.com/InfuseAI/taxi_rides_ny_duckdb
The following questions follow on from the original Week 4 homework, and so use the same data as required by those questions:
- Yellow taxi data - Years 2019 and 2020
- Green taxi data - Years 2019 and 2020
- FHV data - Year 2019
What is the distribution of vendor IDs when the data is filtered to the years 2019 and 2020?
You will need to run PipeRider and check the report
- 70.1/29.6/0.5
- 60.1/39.5/0.4
- 90.2/9.5/0.3
- 80.1/19.7/0.2
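
The answer options above read as percentage shares per vendor ID. As a rough illustration of what such a distribution means, here is a minimal Python sketch. The helper `share_by_value` and the sample IDs are hypothetical and made up for illustration; this is not how PipeRider computes its report, and the homework answer should still come from the report itself.

```python
from collections import Counter


def share_by_value(values):
    """Return each distinct value's share of all rows, in percent (1 decimal)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: round(100 * n / total, 1) for value, n in counts.items()}


# Made-up vendor IDs, not taken from the taxi dataset.
sample_vendor_ids = [1, 2, 2, 2, 1, 2, 2, 4, 2, 2]
shares = share_by_value(sample_vendor_ids)
print(shares)
```

A hand-rolled check like this can help sanity-check the shape of the numbers you see in the report, for example that the shares sum to roughly 100.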
What is the composition of total amount (positive/zero/negative) when the data is filtered to the years 2019 and 2020?
You will need to run PipeRider and check the report
- 51.4M/15K/48.6K
- 21.4M/5K/248.6K
- 61.4M/25K/148.6K
- 81.4M/35K/14.6K
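
To make the positive/zero/negative composition concrete, the following Python sketch counts values by sign. The function name `sign_composition` and the sample amounts are assumptions for illustration only; the real counts come from the PipeRider report on the full dataset.

```python
def sign_composition(values):
    """Count how many values are positive, zero, and negative."""
    counts = {"positive": 0, "zero": 0, "negative": 0}
    for v in values:
        if v > 0:
            counts["positive"] += 1
        elif v < 0:
            counts["negative"] += 1
        else:
            counts["zero"] += 1
    return counts


# Made-up total amounts, not taken from the taxi dataset.
sample_total_amounts = [12.5, 0.0, -3.0, 8.75, 0.0, 20.0]
composition = sign_composition(sample_total_amounts)
print(composition)
```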
What are the numeric statistics (average/standard deviation/min/max/sum) of trip distances when the data is filtered to the years 2019 and 2020?
You will need to run PipeRider and check the report
- 1.95/35.43/0/16.3K/151.5M
- 3.95/25.43/23.88/267.3K/281.5M
- 5.95/75.43/-63.88/67.3K/81.5M
- 2.95/35.43/-23.88/167.3K/181.5M
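
The five statistics asked for above can be sketched in plain Python as follows. The helper `numeric_summary` and the sample distances are hypothetical. The sketch uses the population standard deviation, which is an assumption; the PipeRider report may use a different definition, so treat this only as an illustration of what each number means.

```python
import statistics


def numeric_summary(values):
    """Return average, standard deviation, min, max, and sum of the values."""
    return {
        "avg": statistics.fmean(values),
        # Assumption: population standard deviation (divides by N, not N - 1).
        "stddev": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
        "sum": sum(values),
    }


# Made-up trip distances, not taken from the taxi dataset.
sample_trip_distances = [1.0, 2.0, 3.0, 4.0]
summary = numeric_summary(sample_trip_distances)
print(summary)
```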
- Form for submitting: https://forms.gle/WyLQHBu1DNwNTfqe8
- You can submit your homework multiple times. In this case, only the last submission will be used.
Deadline: 20 March, 22:00 CET
We will publish the solution here