Week 2: Workflow Orchestration

If you're looking for Airflow videos from the 2022 edition, check the 2022 cohort folder.

Data Lake (GCS)

  • What is a Data Lake
  • ELT vs. ETL
  • Alternatives to these components (S3/HDFS, Redshift, Snowflake, etc.)
  • Video
  • Slides
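To make the ELT vs. ETL distinction concrete, here is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for a warehouse such as BigQuery. The trip rows and the "drop zero-passenger trips" rule are invented for illustration. ETL cleans the rows in Python before loading them; ELT loads the raw rows unchanged (the way a data lake keeps them) and runs the transformation as SQL inside the warehouse.

```python
import sqlite3

# Invented raw records, as text, the way they might arrive from a source system.
RAW_ROWS = [
    ("2021-01-01 00:15:00", "1", "12.5"),
    ("2021-01-01 00:40:00", "0", "7.0"),  # zero passengers: dropped by the transform
    ("2021-01-01 01:05:00", "2", "20.0"),
]


def etl(conn: sqlite3.Connection) -> None:
    """ETL: transform in Python first, then load only the cleaned rows."""
    cleaned = [
        (pickup, int(passengers), float(fare))
        for pickup, passengers, fare in RAW_ROWS
        if int(passengers) > 0
    ]
    conn.execute("CREATE TABLE trips_etl (pickup TEXT, passengers INTEGER, fare REAL)")
    conn.executemany("INSERT INTO trips_etl VALUES (?, ?, ?)", cleaned)


def elt(conn: sqlite3.Connection) -> None:
    """ELT: load the raw text as-is, then transform with SQL inside the warehouse."""
    conn.execute("CREATE TABLE trips_raw (pickup TEXT, passengers TEXT, fare TEXT)")
    conn.executemany("INSERT INTO trips_raw VALUES (?, ?, ?)", RAW_ROWS)
    conn.execute(
        """
        CREATE TABLE trips_elt AS
        SELECT pickup,
               CAST(passengers AS INTEGER) AS passengers,
               CAST(fare AS REAL) AS fare
        FROM trips_raw
        WHERE CAST(passengers AS INTEGER) > 0
        """
    )


conn = sqlite3.connect(":memory:")
etl(conn)
elt(conn)
etl_rows = conn.execute("SELECT * FROM trips_etl ORDER BY pickup").fetchall()
elt_rows = conn.execute("SELECT * FROM trips_elt ORDER BY pickup").fetchall()
print(etl_rows == elt_rows)  # True
print(conn.execute("SELECT COUNT(*) FROM trips_raw").fetchone()[0])  # 3
```

Both paths end with the same cleaned table; the difference is where the transformation runs, and that ELT still has the untouched raw rows available for reprocessing.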

1. Introduction to workflow orchestration

  • What is orchestration?
  • Workflow orchestrators vs. other types of orchestrators
  • Core features of a workflow orchestration tool
  • Different types of workflow orchestration tools that currently exist

🎥 Video
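Two core features of a workflow orchestrator, running tasks in dependency order and retrying transient failures, can be illustrated with a toy runner built on the standard library's `graphlib` (Python 3.9+). This is a hypothetical sketch, not how Prefect or any other orchestrator is implemented; real tools add scheduling, persisted state, logging, and a UI on top of this.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_workflow(
    tasks: Dict[str, Callable[[], None]],
    deps: Dict[str, Set[str]],
    max_retries: int = 2,
) -> List[str]:
    """Run every task after its upstream tasks, retrying failures up to max_retries times."""
    # graphlib expects {node: set_of_predecessors}; tasks without deps get an empty set.
    graph = {name: deps.get(name, set()) for name in tasks}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                break  # success: move on to the next task
            except Exception:
                if attempt == max_retries:
                    raise  # out of retries: the whole run fails
    return order


# Usage: a hypothetical extract step that fails once, then succeeds on retry.
calls = {"extract": 0}
log: List[str] = []


def flaky_extract() -> None:
    calls["extract"] += 1
    if calls["extract"] < 2:
        raise ConnectionError("transient network error")
    log.append("extract")


order = run_workflow(
    tasks={
        "extract": flaky_extract,
        "transform": lambda: log.append("transform"),
        "load": lambda: log.append("load"),
    },
    deps={"transform": {"extract"}, "load": {"transform"}},
)
print(order)             # ['extract', 'transform', 'load']
print(calls["extract"])  # 2
```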

2. Introduction to Prefect concepts

  • What is Prefect?
  • Installing Prefect
  • Prefect flow
  • Creating an ETL
  • Prefect task
  • Blocks and collections
  • Orion UI

🎥 Video
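In Prefect 2 a flow is a Python function decorated with `@flow`, and the units of work it calls are functions decorated with `@task` (both imported with `from prefect import flow, task`). So that the shape of an ETL flow can be shown without installing Prefect, the sketch below defines hypothetical stand-in decorators that only record each run's final state; they do not reproduce Prefect's behavior (no retries, caching, API server, or Orion UI). The rows and the cleaning rule are invented.

```python
import functools
from typing import Any, Callable, List, Tuple

# Run history as (kind, name, final_state) tuples, loosely imitating the
# flow-run and task-run records a Prefect UI would show.
RUN_LOG: List[Tuple[str, str, str]] = []


def _tracked(kind: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Build a decorator that records each call's final state in RUN_LOG."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                result = fn(*args, **kwargs)
            except Exception:
                RUN_LOG.append((kind, fn.__name__, "Failed"))
                raise
            RUN_LOG.append((kind, fn.__name__, "Completed"))
            return result
        return wrapper
    return decorator


task = _tracked("task")  # hypothetical stand-in for prefect.task
flow = _tracked("flow")  # hypothetical stand-in for prefect.flow


@task
def extract() -> List[dict]:
    # Invented in-memory rows instead of a real download.
    return [{"passenger_count": 1}, {"passenger_count": 0}, {"passenger_count": 3}]


@task
def transform(rows: List[dict]) -> List[dict]:
    return [r for r in rows if r["passenger_count"] > 0]


@task
def load(rows: List[dict]) -> int:
    return len(rows)  # a real load step would write somewhere; here we only count


@flow
def etl_flow() -> int:
    return load(transform(extract()))


loaded = etl_flow()
print(loaded)                              # 2
print([name for _, name, _ in RUN_LOG])    # ['extract', 'transform', 'load', 'etl_flow']
```

Note that the flow's own record is written last, because a flow only finishes once every task it called has finished.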

3. ETL with GCP & Prefect

  • Flow 1: Loading data into Google Cloud Storage

🎥 Video
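Flow 1 follows an extract, transform, write-locally, upload pattern. The sketch below sticks to the standard library: it writes gzipped CSV to avoid third-party dependencies, and a local directory stands in for the GCS bucket so it runs offline. The column names, the `data/<color>/<file>` layout, and all helper names are assumptions for illustration. With Prefect and prefect-gcp, the `upload` callable would typically be replaced by a call on a `GcsBucket` block (for example its `upload_from_path` method; check the prefect-gcp documentation for your version).

```python
import csv
import gzip
import io
import shutil
import tempfile
from pathlib import Path
from typing import Callable, List

Uploader = Callable[[Path, str], Path]


def transform(raw_csv: str) -> List[dict]:
    """Parse CSV text and drop zero-passenger trips (an example cleaning rule)."""
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    return [r for r in rows if int(r["passenger_count"]) > 0]


def write_local(rows: List[dict], color: str, dataset_file: str, root: Path) -> Path:
    """Write rows to <root>/data/<color>/<dataset_file>.csv.gz and return that path."""
    if not rows:
        raise ValueError("nothing to write")
    path = root / "data" / color / f"{dataset_file}.csv.gz"
    path.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(path, "wt", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return path


def local_bucket(bucket_dir: Path) -> Uploader:
    """Return a hypothetical uploader that copies files into a directory standing in for a bucket."""
    def upload(from_path: Path, to_path: str) -> Path:
        target = bucket_dir / to_path
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(from_path, target)
        return target
    return upload


def etl_web_to_gcs(raw_csv: str, color: str, year: int, month: int,
                   workdir: Path, upload: Uploader) -> Path:
    dataset_file = f"{color}_tripdata_{year}-{month:02}"
    local_path = write_local(transform(raw_csv), color, dataset_file, workdir)
    # Mirror the local relative path inside the bucket.
    return upload(local_path, local_path.relative_to(workdir).as_posix())


raw = (
    "tpep_pickup_datetime,passenger_count\n"
    "2021-01-01 00:15:00,1\n"
    "2021-01-01 00:40:00,0\n"
)
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    uploaded = etl_web_to_gcs(raw, "yellow", 2021, 1, tmp_path / "work",
                              local_bucket(tmp_path / "bucket"))
    uploaded_key = uploaded.relative_to(tmp_path / "bucket").as_posix()
    with gzip.open(uploaded, "rt", newline="") as f:
        uploaded_rows = list(csv.DictReader(f))

print(uploaded_key)   # data/yellow/yellow_tripdata_2021-01.csv.gz
print(uploaded_rows)  # [{'tpep_pickup_datetime': '2021-01-01 00:15:00', 'passenger_count': '1'}]
```

Passing the uploader in as a function keeps the transform and local-write steps testable without cloud credentials.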

4. From Google Cloud Storage to BigQuery

  • Flow 2: From GCS to BigQuery

🎥 Video
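Flow 2 reverses direction: read a file back from the lake, apply a light transformation, and append the rows to a warehouse table. In this sketch a temporary gzipped CSV stands in for the GCS object and `sqlite3` stands in for BigQuery; the file name, columns, and "missing passenger count means 0" rule are assumptions for illustration. With pandas and pandas-gbq installed, `DataFrame.to_gbq(..., if_exists="append")` is one way to perform the real BigQuery load.

```python
import csv
import gzip
import sqlite3
import tempfile
from pathlib import Path
from typing import List, Tuple


def extract_from_lake(path: Path) -> List[dict]:
    """Read a gzipped CSV previously written to the (stand-in) lake."""
    with gzip.open(path, "rt", newline="") as f:
        return list(csv.DictReader(f))


def transform(rows: List[dict]) -> List[Tuple[str, int]]:
    """Treat a missing passenger count as 0 and cast it to int."""
    return [(r["tpep_pickup_datetime"], int(r["passenger_count"] or 0)) for r in rows]


def load_to_warehouse(conn: sqlite3.Connection, rows: List[Tuple[str, int]]) -> int:
    """Append rows to the target table (sqlite3 stands in for BigQuery); return its row count."""
    conn.execute("CREATE TABLE IF NOT EXISTS rides (pickup TEXT, passenger_count INTEGER)")
    conn.executemany("INSERT INTO rides VALUES (?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM rides").fetchone()[0]


with tempfile.TemporaryDirectory() as tmp:
    lake_file = Path(tmp) / "yellow_tripdata_2021-01.csv.gz"
    with gzip.open(lake_file, "wt", newline="") as f:
        f.write("tpep_pickup_datetime,passenger_count\n"
                "2021-01-01 00:15:00,1\n"
                "2021-01-01 00:40:00,\n")
    rows = transform(extract_from_lake(lake_file))

conn = sqlite3.connect(":memory:")
first_count = load_to_warehouse(conn, rows)
second_count = load_to_warehouse(conn, rows)  # a re-run appends the same rows again

print(rows)                       # [('2021-01-01 00:15:00', 1), ('2021-01-01 00:40:00', 0)]
print(first_count, second_count)  # 2 4
```

Because the load only appends, re-running the flow for the same month duplicates its rows. If re-runs are expected, the load step usually needs to delete or overwrite that month's rows first.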

5. Parametrizing Flow & Deployments

  • Adding parameters to your flow
  • Parameter validation with Pydantic
  • Creating a deployment locally
  • Setting up a Prefect agent
  • Running the flow
  • Notifications

🎥 Video
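Prefect 2 validates a flow's parameters against its type hints (using Pydantic) when a run starts, which is what makes a single parametrized flow safe to reuse across runs. The sketch below imitates that idea without Prefect or Pydantic: a hypothetical `EtlParams` dataclass checks its own fields in `__post_init__`, and a parent function fans out to one child run per month. The allowed colors, the defaults, and every name here are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class EtlParams:
    """Hypothetical parameter model; Prefect would validate these via Pydantic instead."""
    color: str = "yellow"
    year: int = 2021
    months: List[int] = field(default_factory=lambda: [1, 2])

    def __post_init__(self) -> None:
        if self.color not in {"yellow", "green"}:  # assumed set of valid colors
            raise ValueError(f"unsupported color: {self.color!r}")
        if not all(isinstance(m, int) and 1 <= m <= 12 for m in self.months):
            raise ValueError(f"months must be integers in 1..12, got {self.months}")


def etl_child(color: str, year: int, month: int) -> str:
    # Placeholder for the per-month ETL; returns the dataset name it would process.
    return f"{color}_tripdata_{year}-{month:02}"


def etl_parent(params: EtlParams) -> List[str]:
    """One parent run fans out to one child run per requested month."""
    return [etl_child(params.color, params.year, m) for m in params.months]


print(etl_parent(EtlParams(months=[1, 2, 3])))
# ['yellow_tripdata_2021-01', 'yellow_tripdata_2021-02', 'yellow_tripdata_2021-03']

try:
    EtlParams(months=[13])
except ValueError as err:
    print(err)  # months must be integers in 1..12, got [13]
```

Validating up front means a bad parameter fails the run immediately, before any data is downloaded or written.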

6. Schedules & Docker Storage with Infrastructure

  • Scheduling a deployment
  • Flow code storage
  • Running tasks in Docker

🎥 Video
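Deployments can be scheduled with cron strings; Prefect 2 also offers interval and RRule schedules. To show how a cron string maps to concrete run times, here is a toy next-run calculator for a small subset of cron syntax (each of the five fields is `*` or a single number). It is a hypothetical helper, not Prefect's scheduler, and the example string `0 5 1 * *` (05:00 on the first day of every month) is only an illustration.

```python
from datetime import datetime, timedelta


def _field_matches(spec: str, value: int) -> bool:
    """In this subset a field is either '*' (any value) or a single integer."""
    return spec == "*" or int(spec) == value


def next_run(cron: str, after: datetime, max_minutes: int = 366 * 24 * 60) -> datetime:
    """Return the first whole minute strictly after `after` that matches `cron`.

    Toy subset: no ranges, lists, or steps. All five fields must match, whereas
    real cron ORs day-of-month and day-of-week when both are restricted.
    """
    minute, hour, dom, month, dow = cron.split()
    t = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    for _ in range(max_minutes):
        cron_dow = (t.weekday() + 1) % 7  # cron: 0 = Sunday; Python: 0 = Monday
        if (
            _field_matches(minute, t.minute)
            and _field_matches(hour, t.hour)
            and _field_matches(dom, t.day)
            and _field_matches(month, t.month)
            and _field_matches(dow, cron_dow)
        ):
            return t
        t += timedelta(minutes=1)
    raise ValueError(f"no match for {cron!r} within {max_minutes} minutes")


print(next_run("0 5 1 * *", datetime(2023, 1, 15, 10, 30)))  # 2023-02-01 05:00:00
```

Scanning minute by minute is slow compared with a real scheduler but keeps the matching rule easy to read.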

7. Prefect Cloud and Additional Resources

  • Using Prefect Cloud instead of local Prefect
  • Workspaces
  • Running flows on GCP

🎥 Video
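Switching from a local Orion server to Prefect Cloud mostly changes where the client sends its API calls. A Prefect 2 client reads the `PREFECT_API_URL` setting (plus `PREFECT_API_KEY` for Cloud), and each Cloud workspace has its own URL of the form `https://api.prefect.cloud/api/accounts/<account-id>/workspaces/<workspace-id>`. The hypothetical helper below only assembles those values; the IDs and key are placeholders, and in practice `prefect cloud login` usually sets them for you.

```python
from typing import Dict, Optional

CLOUD_API = "https://api.prefect.cloud/api"
LOCAL_API = "http://127.0.0.1:4200/api"  # default address of a local Orion server


def prefect_api_settings(
    account_id: Optional[str] = None,
    workspace_id: Optional[str] = None,
    api_key: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: return the settings pointing a client at Cloud or a local server."""
    if account_id and workspace_id and api_key:
        return {
            "PREFECT_API_URL": f"{CLOUD_API}/accounts/{account_id}/workspaces/{workspace_id}",
            "PREFECT_API_KEY": api_key,
        }
    if account_id or workspace_id or api_key:
        raise ValueError("Prefect Cloud needs account_id, workspace_id and api_key together")
    return {"PREFECT_API_URL": LOCAL_API}


local = prefect_api_settings()
cloud = prefect_api_settings("ACCOUNT_ID", "WORKSPACE_ID", "API_KEY")  # placeholders
print(local["PREFECT_API_URL"])  # http://127.0.0.1:4200/api
print(cloud["PREFECT_API_URL"])  # https://api.prefect.cloud/api/accounts/ACCOUNT_ID/workspaces/WORKSPACE_ID
```

Since every workspace has a distinct URL, switching workspaces amounts to pointing `PREFECT_API_URL` at a different one.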

Code repository

Code from videos (with a few minor enhancements)

Homework

To be linked here by Jan. 30

Community notes

Did you take notes? You can share them here.

2022 notes

Most of these notes are about Airflow, but you might find them useful.