
Commit

Merge branch 'main' into support-profile-option
z3z1ma authored Sep 21, 2023
2 parents 763ce58 + ce7b725 commit e12ed92
Showing 44 changed files with 1,292 additions and 150 deletions.
27 changes: 27 additions & 0 deletions .github/workflows/lint.yml
@@ -0,0 +1,27 @@
name: lint

on:
pull_request:
push:
branches:
- main

jobs:
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
with:
fetch-depth: 2
- name: Set up Python
uses: actions/setup-python@v3
with:
python-version: "3.10"
- name: Install pre-commit hooks
run: |
pip install -U pip==23.1.0
pip install -U pre-commit==3.4.0
pre-commit install
- name: Run pre-commit hooks
run: |
pre-commit run --all-files
7 changes: 4 additions & 3 deletions .github/workflows/release.yml
@@ -33,7 +33,8 @@ jobs:
- name: Check if there is a parent commit
id: check-parent-commit
run: |
- echo "::set-output name=sha::$(git rev-parse --verify --quiet HEAD^)"
+ sha="$(git rev-parse --verify --quiet HEAD^)"
+ echo "sha=${sha:?}" >> "$GITHUB_OUTPUT"
- name: Detect and tag new version
id: check-version
@@ -47,8 +48,8 @@
if: "! steps.check-version.outputs.tag"
run: |
poetry version patch &&
- version=$(poetry version | awk '{ print $2 }') &&
- poetry version $version.dev.$(date +%s)
+ version="$(poetry version | awk '{ print $2 }')" &&
+ poetry version "${version:?}.dev.$(date +%s)"
- name: Build package
run: |
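The release workflow change above replaces the deprecated `::set-output` workflow command with the `$GITHUB_OUTPUT` file that newer GitHub Actions runners provide, and adds `${var:?}` guards. A minimal sketch of the pattern, simulating the output file with a temp file so it runs outside of Actions (the sha and version strings are illustrative, not from this repo):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Outside of GitHub Actions, simulate the step-output file the runner
# normally provides via the GITHUB_OUTPUT environment variable.
GITHUB_OUTPUT="$(mktemp)"

# The ${var:?} expansion aborts the script if the variable is empty or
# unset, so a failed lookup can't silently write an empty output.
sha="deadbeef0123"
echo "sha=${sha:?}" >> "$GITHUB_OUTPUT"

# The version bump step parses `poetry version` output, which looks like
# "dbt-osmosis 0.11.10"; awk prints the second whitespace-separated field.
version="$(echo "dbt-osmosis 0.11.10" | awk '{ print $2 }')"
echo "version=${version:?}.dev.$(date +%s)" >> "$GITHUB_OUTPUT"

cat "$GITHUB_OUTPUT"
```

Downstream steps then read these values as `steps.<step-id>.outputs.sha`, exactly as they did with the old command.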
8 changes: 4 additions & 4 deletions .github/workflows/tests.yml
@@ -6,7 +6,7 @@ on:

jobs:
tests:
name: Run pytest
runs-on: ubuntu-latest
steps:
- name: Check out the repository
@@ -26,11 +26,11 @@ jobs:
run: |
pip install --constraint=.github/workflows/constraints.txt poetry
poetry --version
- name: Install required packages
run: |
poetry install
- name: Run pytest
run: |
poetry run python -m pytest
4 changes: 4 additions & 0 deletions .gitignore
@@ -102,6 +102,7 @@ celerybeat.pid
*.sage.py

# Environments
.direnv
.env
.venv
env/
@@ -127,3 +128,6 @@ dmypy.json

# Pyre type checker
.pyre/

# Nix
.devenv
36 changes: 36 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,36 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
hooks:
- id: end-of-file-fixer
- id: trailing-whitespace
- id: check-json
- id: check-yaml
- id: detect-private-key
- id: debug-statements
- repo: https://github.com/rhysd/actionlint
rev: v1.6.21
hooks:
- id: actionlint-docker
- repo: https://github.com/shellcheck-py/shellcheck-py
rev: v0.9.0.2
hooks:
- id: shellcheck
# TODO format files to follow the style guide later.
# - repo: https://github.com/psf/black
# rev: 23.9.1
# hooks:
# - id: black
# - repo: https://github.com/pycqa/isort
# rev: 5.10.1
# hooks:
# - id: isort
# - repo: https://github.com/pycqa/flake8
# rev: 4.0.1
# hooks:
# - id: flake8
# TODO refactor the files in the 'docker' directory later.
# - repo: https://github.com/hadolint/hadolint
# rev: v2.12.0
# hooks:
# - id: hadolint-docker
2 changes: 1 addition & 1 deletion .vscode/settings.json
@@ -5,4 +5,4 @@
],
"python.testing.unittestEnabled": false,
"python.testing.pytestEnabled": true
}
10 changes: 5 additions & 5 deletions README.md
@@ -25,7 +25,7 @@ Please check it out for a more in-depth introduction to dbt-osmosis. 👇
Hello and welcome to the project! [dbt-osmosis](https://github.com/z3z1ma/dbt-osmosis) 🌊 serves to enhance the developer experience significantly. We do this through providing 4 core features:

1. Automated schema YAML management.

1a. `dbt-osmosis yaml refactor --project-dir ... --profiles-dir ...`

> Automatically generate documentation based on upstream documented columns, organize yaml files based on configurable rules defined in dbt_project.yml, scaffold new yaml files based on the same rules, inject columns from data warehouse schema if missing in yaml and remove columns no longer present in data warehouse (organize -> document)
@@ -44,7 +44,7 @@ Hello and welcome to the project! [dbt-osmosis](https://github.com/z3z1ma/dbt-osmosis) 🌊 serves to enhance the developer experience significantly. We do this through providing 4 core features:

> Spins up a WSGI server. Can be passed --register-project to automatically register your local project
3. Workbench for dbt Jinja SQL. This workbench is powered by streamlit and the badge at the top of the readme will take you to a demo on streamlit cloud with jaffle_shop loaded (requires extra `pip install "dbt-osmosis[workbench]"`).

3a. `dbt-osmosis workbench --project-dir ... --profiles-dir ...`

@@ -82,11 +82,11 @@ The workbench is a streamlit app that allows you to work on dbt models in a side
I also expect there is some untapped value in the workbench that is only pending some time from myself. I've seen a path to a truly novel development experience and look forward to exploring it.
Demo the workbench 👇
[![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://z3z1ma-dbt-osmosis-srcdbt-osmosisapp-v2-i0ico9.streamlit.app/)
```sh
# NOTE this requires the workbench extra as you can see
pip install "dbt-osmosis[workbench]"
@@ -107,7 +107,7 @@ Press "r" to reload the workbench at any time.
✔️ Data Profiler (leverages pandas-profiling)


**Editor**

The editor is able to compile models with control+enter or dynamically as you type. Its speedy! You can choose any target defined in your profiles yml for compilation and execution.

2 changes: 1 addition & 1 deletion demo_duckdb/.gitignore
@@ -1,2 +1,2 @@
logs
target
1 change: 0 additions & 1 deletion demo_duckdb/models/schema.yml
@@ -79,4 +79,3 @@ models:
description: Total amount (AUD) of the order
tests:
- not_null

2 changes: 1 addition & 1 deletion demo_duckdb/models/staging/stg_payments.sql
@@ -1,5 +1,5 @@
with source as (

{#-
Normally we would select from the table here, but we are using seeds to load
our data in this project
2 changes: 1 addition & 1 deletion demo_sqlite/.gitignore
@@ -1,2 +1,2 @@
logs
target
1 change: 0 additions & 1 deletion demo_sqlite/models/customers.yml
@@ -28,4 +28,3 @@ models:

- name: total_order_amount
description: Total value (AUD) of a customer's orders

1 change: 0 additions & 1 deletion demo_sqlite/models/staging/stg_customers.yml
@@ -6,4 +6,3 @@ models:
tests:
- unique
- not_null

1 change: 0 additions & 1 deletion demo_sqlite/models/staging/stg_orders.yml
@@ -10,4 +10,3 @@ models:
tests:
- accepted_values:
values: [placed, shipped, completed, return_pending, returned]

2 changes: 1 addition & 1 deletion demo_sqlite/models/staging/stg_payments.sql
@@ -1,5 +1,5 @@
with source as (

{#-
Normally we would select from the table here, but we are using seeds to load
our data in this project
3 changes: 2 additions & 1 deletion docker/shell
@@ -1,3 +1,4 @@
#!/usr/bin/env bash
- my_path="$( cd "$(dirname "$0")"; pwd -P)"
+ # shellcheck disable=SC2034
+ my_path="$( cd "$(dirname "$0")" || exit ; pwd -P)"
${1:-docker} compose -f docker/docker-compose.yml exec app bash -c "SHELL=bash poetry shell"
4 changes: 2 additions & 2 deletions docker/shutdown
@@ -1,3 +1,3 @@
#!/usr/bin/env bash
- my_path="$( cd "$(dirname "$0")"; pwd -P)"
- ${1:-docker-compose} -f ${my_path}/docker-compose.yml down -v
+ my_path="$( cd "$(dirname "$0")" || exit ; pwd -P)"
+ ${1:-docker-compose} -f "${my_path}/docker-compose.yml" down -v
6 changes: 3 additions & 3 deletions docker/startup
@@ -3,6 +3,6 @@ set -ex
export COMPOSE_DOCKER_CLI_BUILD=1
export DOCKER_BUILDKIT=1
my_path="$( cd "$(dirname "$0")"; pwd -P)"
- ${my_path}/shutdown
- ${1:-docker-compose} -f ${my_path}/docker-compose.yml build
- ${1:-docker-compose} -f ${my_path}/docker-compose.yml up -d
+ "${my_path}/shutdown"
+ ${1:-docker-compose} -f "${my_path}/docker-compose.yml" build
+ ${1:-docker-compose} -f "${my_path}/docker-compose.yml" up -d
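The quoting added across these docker scripts addresses shellcheck's SC2086 warning: an unquoted `${my_path}` undergoes word splitting before the command runs, so a checkout path containing a space would break the `-f` argument. A small sketch of the failure mode (the path is illustrative):

```shell
#!/usr/bin/env bash
# Unquoted expansions are split on whitespace before the command sees them;
# quoting preserves the value as a single argument (shellcheck SC2086).
my_path="/tmp/dir with spaces"

count_args() { echo "$#"; }

count_args $my_path     # word-split into 3 arguments
count_args "$my_path"   # kept as 1 argument
```

The first call prints `3`, the second `1` — which is exactly why `docker-compose -f ${my_path}/docker-compose.yml` needed the quotes.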
6 changes: 3 additions & 3 deletions docs/docs/tutorial-basics/commands.md
@@ -67,15 +67,15 @@ dbt-osmosis server serve [--host] [--port]

### Register Project

This command will register a dbt project with the dbt-osmosis server.

```bash
dbt-osmosis server register-project --project-dir /path/to/dbt/project
```

### Unregister Project

This command will unregister a dbt project with the dbt-osmosis server.

```bash
dbt-osmosis server unregister-project --project-dir /path/to/dbt/project
@@ -103,7 +103,7 @@ dbt-osmosis sql compile [--project-dir] [--profiles-dir] [--target] "select * fr

## Workbench

This command starts a [streamlit](https://streamlit.io/) workbench. The workbench is a REPL environment that allows you to run dbt models, provides realtime side by side compilation, and lets you explore the results.

```bash
dbt-osmosis workbench [--project-dir] [--profiles-dir] [--target] [--host] [--port]
2 changes: 1 addition & 1 deletion docs/docs/tutorial-basics/installation.md
@@ -6,7 +6,7 @@ sidebar_position: 1

## Install with pipx

If you will install dbt-osmosis and its dependencies in a virtual environment, and make it available as a command-line tool.

```bash
pipx install dbt-osmosis
6 changes: 3 additions & 3 deletions docs/docs/tutorial-yaml/configuration.md
@@ -69,13 +69,13 @@ vars:
salesforce:
path: "staging/salesforce/source.yml"
schema: "salesforce_v2"
# a source with the same schema as the source name
marketo: "staging/customer/marketo.yml"
# a special variable interpolated at runtime
jira: "staging/project_mgmt/{parent}.yml"
# a dedicated directory for all sources
github: "all_sources/github.yml"
2 changes: 1 addition & 1 deletion docs/docs/tutorial-yaml/context.md
@@ -57,4 +57,4 @@ models:
# make it so models in staging/salesforce, staging/marketo, etc. all route docs into
# files named salesforce.yml, marketo.yml, etc. in their respective directories
+dbt-osmosis: "{parent}.yml"
```
6 changes: 3 additions & 3 deletions docs/docs/tutorial-yaml/inheritance.md
@@ -5,13 +5,13 @@ sidebar_position: 3

## Overview

A really clutch feature of dbt-osmosis is the ability to inherit documentation from parent nodes. This is especially useful when you have a large number of models that share the same documentation. For example, if you have a large number of models that are all derived from a single source table, you can define the documentation for the source table once and then inherit it for all of the models that are derived from it. This means you are able to be more DRY with your documentation. Alternatives such as dbt-codegen only go up one level of inheritance. dbt-osmosis traverses the entire hierarchy of your dbt project and inherits documentation from all parent nodes for the specific node being documented.

## Details

dbt-osmosis accumulates a knowledge graph for a specfic model by traversing the edges until it reaches the furthest removed ancestors of a node. It then caches all the documentation into a dictionary. It then traverses the edges in the opposite direction, starting from the furthest removed ancestors and working its way down to the node being documented merging in documentation. Once we have built the graph we can lean on it for any undocumented columns.

The crux of the value proposition is that we often alias columns in our staging models and use them many times in many places without changing the name. This means, within the context of a specific models family tree, we should be able to inherit that knowledge. This inheritance can include tags and descriptions. This permits propagating PII, GDPR, and other compliance related tags for example. When a column is used in a model and its definition is semantically different while the column name is the same (which is a questionable practice), you should update the definition for that column in that model. The inheritors will use the updated definition if they pull from said model.

:::tip Tip

2 changes: 1 addition & 1 deletion docs/docs/tutorial-yaml/selection.md
@@ -5,7 +5,7 @@ sidebar_position: 4

## Selecting models

The `dbt-osmosis yaml` commands have two methods to selecting files to execute on.

### Positional selectors

6 changes: 3 additions & 3 deletions docs/docs/tutorial-yaml/workflow.md
@@ -7,7 +7,7 @@ sidebar_position: 5

### Sources

dbt-osmosis will manage synchronizing your sources regardless of if you specify them in the vars.dbt-osmosis key your `dbt_project.yml` or not. That key, as seen in the example below, only serves to **declaratively** tell dbt-osmosis where the source file _should_ live.

The advantage of this approach is that you can use dbt-osmosis to manage your sources without having to scaffold the YAML file yourself. You simply add a key value to the dictionary in the vars.dbt-osmosis key and dbt-osmosis will create the YAML file for you on next execution. It also hardens it against changes that violate the declarative nature of dbt-osmosis since it will simply migrate the file back to its original state on next execution unless you explicitly change it.

@@ -46,7 +46,7 @@ models:

## Running dbt-osmosis

I will step through 3 ways to run dbt-osmosis. These are not mutually exclusive. You can use any combination of these approaches to get the most out of dbt-osmosis. They are ordered based on the amount of effort required to get started and by the overall scalability as model count increases.

### On-demand ⭐️

@@ -70,7 +70,7 @@ repos:

### CI/CD ⭐️⭐️⭐️

You can also run dbt-osmosis as part of your CI/CD pipeline. The best way to do this is to simply clone the repo, run dbt-osmosis, and then commit the changes. Preferably, you would do this in a separate branch and then open a PR. This is the most robust approach since it ensures that the changes are reviewed and approved by a human before they are merged into the main branch whilst taking the load off of developer machines.

```bash title="example.sh"
# this only exists to provide color, in the future we may add a preconfigured GHA to do this
2 changes: 1 addition & 1 deletion docs/src/components/HomepageFeatures/index.js
@@ -33,7 +33,7 @@ const FeatureList = [
Svg: require('@site/static/img/github-icon.svg').default,
description: (
<>
A single execution on your dbt project can save you hours of manual toil. dbt-osmosis
is built to work with your existing dbt project. It can be run directly, as a pre-commit hook,
or as a CI/CD step generating a PR. By leveraging git, we can safely execute
file changes en masse with proper diffs and version control.
