Update dbt.py #29

Merged
3 commits merged on Feb 5, 2025
11 changes: 9 additions & 2 deletions README.md
@@ -54,6 +54,15 @@ graph TB
2. [Airflow](https://airflow.apache.org/) to orchestrate data loading scripts and additional automated workflows
3. [DBT core](https://docs.getdbt.com/) to define data models and transformations, again orchestrated by Airflow (via CLI / bash TaskFlow)
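
As a rough illustration of the last point, here is a minimal sketch of invoking dbt from an Airflow bash TaskFlow task. The project path and the use of `@task.bash` (which requires Airflow 2.9+) are assumptions, not the project's actual code:

```python
from datetime import datetime
from airflow.decorators import dag, task


@dag(schedule=None, start_date=datetime(2025, 1, 1), catchup=False)
def dbt_run_sketch():
    # @task.bash runs the returned string as a bash command (Airflow 2.9+)
    @task.bash
    def dbt_run() -> str:
        # Hypothetical project path; the real layout may differ
        return "cd /opt/airflow/dbt && dbt run --profiles-dir ."

    dbt_run()


dbt_run_sketch()
```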

## Standards

The project has been structured and designed with inspiration from [dbt project recommendations](https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview) and other sources.

- DBT projects are stored in a separate subdirectory from DAGs (at least for now)
- DAGs and DBT projects are organised at the top level by owner (should more contributors get involved)
- Further organisation is by data source and / or function
- Naming generally follows the DBT-recommended `[layer]_[source]__[entity]` pattern, adapted for Airflow DAGs with `__[refresh-type]` and other modifications as needed (see the hypothetical examples below)
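
For illustration, some hypothetical names following this convention (none of these are actual project objects):

```python
# dbt staging model:  stg_notion__habits       ([layer]_[source]__[entity])
# dbt mart model:     fct_habits__daily
# Airflow DAG id:     notion__habits__daily    (adapted with a __[refresh-type] suffix)
```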


## Setup

@@ -73,8 +82,6 @@ To run Airflow on a single instance, I used Honcho to run multiple processes via
- `AIRFLOW__CORE__FERNET_KEY={generated-key}` following [this guidance](https://airflow.apache.org/docs/apache-airflow/1.10.8/howto/secure-connections.html) to encrypt connection data (a key-generation sketch follows these setup steps)
- `AIRFLOW__CORE__INTERNAL_API_SECRET_KEY={generated-secret1}` following [this guidance](https://flask.palletsprojects.com/en/stable/config/#SECRET_KEY)
- `AIRFLOW__WEBSERVER__SECRET_KEY={generated-secret2}` following guidance above
- `AIRFLOW__WEBSERVER__BASE_URL={deployed-url}`
- `AIRFLOW__CLI__ENDPOINT_URL={deployed-url}`
- `AIRFLOW__WEBSERVER__INSTANCE_NAME=MY INSTANCE!`
4. Generate Publish Profile file and deploy application code from GitHub
5. Set startup command to use the `startup.txt` file
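
For reference, a minimal sketch of generating the keys mentioned above, assuming the `cryptography` package is available for the Fernet key and using the standard library for the secret keys:

```python
import secrets

from cryptography.fernet import Fernet

# AIRFLOW__CORE__FERNET_KEY: Airflow uses this to encrypt connection credentials
print("FERNET_KEY:", Fernet.generate_key().decode())

# AIRFLOW__CORE__INTERNAL_API_SECRET_KEY and AIRFLOW__WEBSERVER__SECRET_KEY:
# any sufficiently random strings work; token_hex follows the Flask guidance
print("SECRET_KEY_1:", secrets.token_hex(32))
print("SECRET_KEY_2:", secrets.token_hex(32))
```
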
1 change: 1 addition & 0 deletions dags/michael/dbt.py
@@ -16,6 +16,7 @@


@dag(
dag_id="dbt__michael",
# Run after source datasets refreshed
schedule=[NOTION_DAILY_HABITS_DS, NOTION_WEEKLY_HABITS_DS],
catchup=False,
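
For context, the `schedule` above uses Airflow's dataset-aware scheduling. A minimal sketch of how the two dataset constants might be defined (the URIs are assumptions, not the project's actual values):

```python
from airflow.datasets import Dataset

# Hypothetical URIs; the real datasets point at whatever the loader DAGs update
NOTION_DAILY_HABITS_DS = Dataset("bigquery://my-project/notion/daily_habits")
NOTION_WEEKLY_HABITS_DS = Dataset("bigquery://my-project/notion/weekly_habits")
```
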
2 changes: 1 addition & 1 deletion dags/michael/migrate.py
@@ -19,7 +19,7 @@
DATASET = os.getenv("ADMIN_DATASET", "admin")

with DAG(
"migrate_raw_tables",
"bq__migrate_schema",
schedule="@once", # also consider "None"
start_date=datetime(1970, 1, 1),
params={"command": "upgrade", "revision": "head"},
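
A minimal sketch of how the `command` and `revision` params might be consumed by a task in this DAG, assuming an Alembic-style migration CLI (the actual task is not shown in this excerpt):

```python
from airflow.operators.bash import BashOperator

# Defined inside the `with DAG(...)` block above; {{ params.* }} are filled
# from the DAG's params at runtime via Jinja templating
run_migration = BashOperator(
    task_id="run_migration",
    bash_command="alembic {{ params.command }} {{ params.revision }}",
)
```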