Skip to content

Commit

Permalink
Merge branch 'current' into add-forethought-widget
Browse files Browse the repository at this point in the history
  • Loading branch information
mirnawong1 authored Oct 4, 2024
2 parents 3cb7716 + 78004ee commit 384bc42
Show file tree
Hide file tree
Showing 143 changed files with 9,525 additions and 7,312 deletions.
6 changes: 6 additions & 0 deletions contributing/content-style-guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -624,6 +624,12 @@ When describing icons that appear on-screen, use the [_Google Material Icons_](h

:white_check_mark:Click on the menu icon

#### Upload icons
If you're using icons to document things like [third-party vendors](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations), etc. — you need to add the icon file in the following locations to ensure the icons render correctly in light and dark mode:

- website/static/img/icons
- website/static/img/icons/white

### Image names

Two words that are either adjectives or nouns describing the name of a file separated by an underscore `_` (known as `snake_case`). The two words can also be separated by a hyphen (`kebab-case`).
Expand Down
5 changes: 5 additions & 0 deletions contributing/lightbox.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,4 +25,9 @@ You can use the Lightbox component to add an image or screenshot to your page. I
/>
```

Note that if you're using icons to document things like third party vendors, etc, — you need to add the icon file in the following locations to ensure the icons render correctly in light and dark mode:

- `website/static/img/icons`
- `website/static/img/icons/white`

<LoomVideo id="2b64dbd47a2d46dbafa5b43ed52a91e0" />
Original file line number Diff line number Diff line change
Expand Up @@ -113,7 +113,7 @@ After the initial release I started to expand to cover the rest of the dbt Cloud

In this example we’ll download a `catalog.json` artifact from the latest run of a dbt Cloud job using `dbt-cloud run list` and `dbt-cloud get-artifact` and then write a simple Data Catalog CLI application using the same tools that are used in `dbt-cloud-cli` (i.e., `click` and `pydantic`). Let’s dive right in!

The first command we need is the `dbt-cloud run list` which uses an [API endpoint](https://docs.getdbt.com/dbt-cloud/api-v2-legacy#/operations/List%20Runs) that returns runs sorted by creation date, with the most recent run appearing first. The command returns a JSON response that has one top-level attribute `data` that contains a list of runs. We’ll need to extract the `id` attribute of the first one and to do that we use [jq](https://stedolan.github.io/jq/):
The first command we need is the `dbt-cloud run list` which uses an [API endpoint](https://docs.getdbt.com/dbt-cloud/api-v2#/operations/List%20Runs) that returns runs sorted by creation date, with the most recent run appearing first. The command returns a JSON response that has one top-level attribute `data` that contains a list of runs. We’ll need to extract the `id` attribute of the first one and to do that we use [jq](https://stedolan.github.io/jq/):

```
latest_run_id=$(dbt-cloud run list --job-id $DBT_CLOUD_JOB_ID | jq .data[0].id -r)
Expand Down
89 changes: 89 additions & 0 deletions website/blog/2024-10-04-hybrid-mesh.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,89 @@
---
title: "How Hybrid Mesh unlocks dbt collaboration at scale"
description: A deep-dive into the Hybrid Mesh pattern for enabling collaboration between domain teams using dbt Core and dbt Cloud.
slug: hybrid-mesh
authors: [jason_ganz]
tags: [analytics craft]
hide_table_of_contents: false
date: 2024-09-30
is_featured: true
---

One of the most important things that dbt does is unlock the ability for teams to collaborate on creating and disseminating organizational knowledge.

In the past, this primarily looked like a team working in one dbt Project to create a set of transformed objects in their data platform.

As dbt was adopted by larger organizations and began to drive workloads at a global scale, it became clear that we needed mechanisms to allow teams to operate independently from each other, creating and sharing data models across teams &mdash; [dbt Mesh](/best-practices/how-we-mesh/mesh-1-intro).

<!-- truncate -->

dbt Mesh is powerful because it allows teams to operate _independently_ and _collaboratively_, each team free to build on their own but contributing to a larger, shared set of data outputs.

The flexibility of dbt Mesh means that it can support [a wide variety of patterns and designs](/best-practices/how-we-mesh/mesh-3-structures). Today, let’s dive into one pattern that is showing promise as a way to enable teams working on very different dbt deployments to work together.

## How Hybrid Mesh enables collaboration between dbt Core and dbt Cloud teams

**_Scenario_** &mdash; A company with a central data team uses dbt Core. The setup is working well for that team. They want to scale their impact to enable faster decision-making, organization-wide. The current dbt Core setup isn't well suited for onboarding a larger number of less-technical, nontechnical, or less-frequent contributors.

**_The goal_** &mdash; Enable three domain teams of less-technical users to leverage and extend the central data models, with full ownership over their domain-specific dbt models.

- **Central data team:** Data engineers comfortable using dbt Core and the command line interface (CLI), building and maintaining foundational data models for the entire organization.

- **Domain teams:** Data analysts comfortable working in SQL but not using the CLI and prefer to start working right away without managing local dbt Core installations or updates. The team needs to build transformations specific to their business context. Some of these users may have tried dbt in the past, but they were not able to successfully onboard to the central team's setup.

**_Solution: Hybrid Mesh_** &mdash; Data teams can use dbt Mesh to connect projects *across* dbt Core and dbt Cloud, creating a workflow where everyone gets to work in their preferred environment while creating a shared lineage that allows for visibility, validation, and ownership across the data pipeline.

Each team will fully own its dbt code, from development through deployment, using the product that is appropriate to their needs and capabilities _while sharing data products across teams using both dbt Core and dbt Cloud._

<Lightbox src="/img/blog/2024-09-30-hybrid-mesh/hybrid-mesh.png" width="75%" title="A before and after diagram highlighting how a Hybrid Mesh allows central data teams using dbt Core to work with domain data teams using dbt Cloud." />

Creating a Hybrid Mesh is mostly the same as creating any other [dbt Mesh](/guides/mesh-qs?step=1) workflow &mdash; there are a few considerations but mostly _it just works_. We anticipate it will continue to see adoption as more central data teams look to onboard their downstream domain teams.

A Hybrid Mesh can be adopted as a stable long-term pattern, or as an intermediary while you perform a [migration from dbt Core to dbt Cloud](/guides/core-cloud-2?step=1).

## How to build a Hybrid Mesh
Enabling a Hybrid Mesh is as simple as a few additional steps to import the metadata from your Core project into dbt Cloud. Once you’ve done this, you should be able to operate your dbt Mesh like normal and all of our [standard recommendations](/best-practices/how-we-mesh/mesh-1-intro) still apply.

### Step 1: Prepare your Core project for access through dbt Mesh

Configure public models to serve as stable interfaces for downstream dbt Projects.

- Decide which models from your Core project will be accessible in your Mesh. For more information on how to configure public access for those models, refer to the [model access page.](/docs/collaborate/govern/model-access)
- Optionally set up a [model contract](/docs/collaborate/govern/model-contracts) for all public models for better governance.
- Keep dbt Core and dbt Cloud projects in separate repositories to allow for a clear separation between upstream models managed by the dbt Core team and the downstream models handled by the dbt Cloud team.

### Step 2: Mirror each "producer" Core project in dbt Cloud
This allows dbt Cloud to know about the contents and metadata of your project, which in turn allows for other projects to access its models.

- [Create a dbt Cloud account](https://www.getdbt.com/signup/) and a dbt project for each upstream Core project.
- Note: If you have [environment variables](/docs/build/environment-variables) in your project, dbt Cloud environment variables must be prefixed with `DBT_ `(including `DBT_ENV_CUSTOM_ENV_` or `DBT_ENV_SECRET`). Follow the instructions in [this guide](https://docs.getdbt.com/guides/core-to-cloud-1?step=8#environment-variables) to convert them for dbt Cloud.
- Each upstream Core project has to have a production [environment](/docs/dbt-cloud-environments) in dbt Cloud. You need to configure credentials and environment variables in dbt Cloud just so that it will resolve relation names to the same places where your dbt Core workflows are deploying those models.
- Set up a [merge job](/docs/deploy/merge-jobs) in a production environment to run `dbt parse`. This will enable connecting downstream projects in dbt Mesh by producing the necessary [artifacts](/reference/artifacts/dbt-artifacts) for cross-project referencing.
- Note: Set up a regular job to run `dbt build` instead of using a merge job for `dbt parse`, and centralize your dbt orchestration by moving production runs to dbt Cloud. Check out [this guide](/guides/core-to-cloud-1?step=9) for more details on converting your production runs to dbt Cloud.
- Optional: Set up a regular job (for example, daily) to run `source freshness` and `docs generate`. This will hydrate dbt Cloud with additional metadata and enable features in [dbt Explorer](/docs/collaborate/explore-projects) that will benefit both teams, including [Column-level lineage](/docs/collaborate/column-level-lineage).

### Step 3: Create and connect your downstream projects to your Core project using dbt Mesh
Now that dbt Cloud has the necessary information about your Core project, you can begin setting up your downstream projects, building on top of the public models from the project you brought into Cloud in [Step 2](#step-2-mirror-each-producer-core-project-in-dbt-cloud). To do this:
- Initialize each new downstream dbt Cloud project and create a [`dependencies.yml` file](/docs/collaborate/govern/project-dependencies#use-cases).
- In that `dependencies.yml` file, add the dbt project name from the `dbt_project.yml` of the upstream project(s). This sets up cross-project references between different dbt projects:

```yaml
# dependencies.yml file in dbt Cloud downstream project
projects:
- name: upstream_project_name
```
- Use [cross-project references](/reference/dbt-jinja-functions/ref#ref-project-specific-models) for public models in upstream project. Add [version](/reference/dbt-jinja-functions/ref#versioned-ref) to references of versioned models:
```yaml
select * from {{ ref('upstream_project_name', 'monthly_revenue') }}
```

And that’s all it takes! From here, the domain teams that own each dbt Project can build out their models to fit their own use cases. You can now build out your Hybrid Mesh however you want, accessing the full suite of dbt Cloud features.
- Orchestrate your Mesh to ensure timely delivery of data products and make them available to downstream consumers.
- Use [dbt Explorer](/docs/collaborate/explore-projects) to trace the lineage of your data back to its source.
- Onboard more teams and connect them to your Mesh.
- Build [semantic models](/docs/build/semantic-models) and [metrics](/docs/build/metrics-overview) into your projects to query them with the [dbt Semantic Layer](https://www.getdbt.com/product/semantic-layer).


## Conclusion

In a world where organizations have complex and ever-changing data needs, there is no one-size fits all solution. Instead, data practitioners need flexible tooling that meets them where they are. The Hybrid Mesh presents a model for this approach, where teams that are comfortable and getting value out of dbt Core can collaborate frictionlessly with domain teams on dbt Cloud.
12 changes: 12 additions & 0 deletions website/dbt-versions.js
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,10 @@ exports.versions = [
version: "1.9.1",
customDisplay: "Cloud (Versionless)",
},
{
version: "1.9",
isPrerelease: true,
},
{
version: "1.8",
EOLDate: "2025-04-15",
Expand All @@ -42,6 +46,14 @@ exports.versions = [
* @property {string} lastVersion The last version the page is visible in the sidebar
*/
exports.versionedPages = [
{
page: "docs/build/incremental-microbatch",
firstVersion: "1.9",
},
{
page: "reference/resource-configs/snapshot_meta_column_names",
firstVersion: "1.9",
},
{
page: "reference/resource-configs/target_database",
lastVersion: "1.8",
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -102,12 +102,14 @@ We’ve focused heavily thus far on the primary area of action in our dbt projec

### Project splitting

One important, growing consideration in the analytics engineering ecosystem is how and when to split a codebase into multiple dbt projects. Our present stance on this for most projects, particularly for teams starting out, is straightforward: you should avoid it unless you have no other option or it saves you from an even more complex workaround. If you do have the need to split up your project, it’s completely possible through the use of private packages, but the added complexity and separation is, for most organizations, a hindrance, not a help, at present. That said, this is very likely subject to change! [We want to create a world where it’s easy to bring lots of dbt projects together into a cohesive lineage](https://github.com/dbt-labs/dbt-core/discussions/5244). In a world where it’s simple to break up monolithic dbt projects into multiple connected projects, perhaps inside of a modern mono repo, the calculus will be different, and the below situations we recommend against may become totally viable. So watch this space!
One important, growing consideration in the analytics engineering ecosystem is how and when to split a codebase into multiple dbt projects. Currently, our advice for most teams, especially those just starting, is fairly simple: in most cases, we recommend doing so with [dbt Mesh](/best-practices/how-we-mesh/mesh-1-intro)! dbt Mesh allows organizations to handle complexity by connecting several dbt projects rather than relying on one big, monolithic project. This approach is designed to speed up development while maintaining governance.

- ❌ **Business groups or departments.** Conceptual separations within the project are not a good reason to split up your project. Splitting up, for instance, marketing and finance modeling into separate projects will not only add unnecessary complexity but destroy the unifying effect of collaborating across your organization on cohesive definitions and business logic.
- ❌ **ML vs Reporting use cases.** Similarly to the point above, splitting a project up based on different use cases, particularly more standard BI versus ML features, is a common idea. We tend to discourage it for the time being. As with the previous point, a foundational goal of implementing dbt is to create a single source of truth in your organization. The features you’re providing to your data science teams should be coming from the same marts and metrics that serve reports on executive dashboards.
As breaking up monolithic dbt projects into smaller, connected projects, potentially within a modern mono repo becomes easier, the scenarios we currently advise against may soon become feasible. So watch this space!

- ✅ **Business groups or departments.** Conceptual separations within the project are the primary reason to split up your project. This allows your business domains to own their own data products and still collaborate using dbt Mesh. For more information about dbt Mesh, please refer to our [dbt Mesh FAQs](/best-practices/how-we-mesh/mesh-5-faqs).
- ✅ **Data governance.** Structural, organizational needs — such as data governance and security — are one of the few worthwhile reasons to split up a project. If, for instance, you work at a healthcare company with only a small team cleared to access raw data with PII in it, you may need to split out your staging models into their own projects to preserve those policies. In that case, you would import your staging project into the project that builds on those staging models as a [private package](https://docs.getdbt.com/docs/build/packages/#private-packages).
- ✅ **Project size.** At a certain point, your project may grow to have simply too many models to present a viable development experience. If you have 1000s of models, it absolutely makes sense to find a way to split up your project.
- ❌ **ML vs Reporting use cases.** Similarly to the point above, splitting a project up based on different use cases, particularly more standard BI versus ML features, is a common idea. We tend to discourage it for the time being. As with the previous point, a foundational goal of implementing dbt is to create a single source of truth in your organization. The features you’re providing to your data science teams should be coming from the same marts and metrics that serve reports on executive dashboards.

## Final considerations

Expand Down
3 changes: 0 additions & 3 deletions website/docs/docs/build/data-tests.md
Original file line number Diff line number Diff line change
Expand Up @@ -70,8 +70,6 @@ The name of this test is the name of the file: `assert_total_payment_amount_is_p

Singular data tests are easy to write—so easy that you may find yourself writing the same basic structure over and over, only changing the name of a column or model. By that point, the test isn't so singular! In that case, we recommend...



## Generic data tests
Certain data tests are generic: they can be reused over and over again. A generic data test is defined in a `test` block, which contains a parametrized query and accepts arguments. It might look like:

Expand Down Expand Up @@ -304,7 +302,6 @@ data_tests:

</File>

To suppress warnings about the rename, add `TestsConfigDeprecation` to the `silence` block of the `warn_error_options` flag in `dbt_project.yml`, [as described in the Warnings documentation](https://docs.getdbt.com/reference/global-configs/warnings).

</VersionBlock>

Expand Down
2 changes: 1 addition & 1 deletion website/docs/docs/build/dimensions.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ Dimensions represent the non-aggregatable columns in your data set, which are th
Groups are defined within semantic models, alongside entities and measures, and correspond to non-aggregatable columns in your dbt model that provides categorical or time-based context. In SQL, dimensions is typically included in the GROUP BY clause.-->

All dimensions require a `name`, `type`, and can optionally include an `expr` parameter. The `name` for your Dimension must be unique wihtin the same semantic model.
All dimensions require a `name`, `type`, and can optionally include an `expr` parameter. The `name` for your Dimension must be unique within the same semantic model.

| Parameter | Description | Type |
| --------- | ----------- | ---- |
Expand Down
Loading

0 comments on commit 384bc42

Please sign in to comment.