Merge branch 'current' into update-community-award-badge
matthewshaver authored Dec 7, 2023
2 parents 4fea0df + 39f4686 commit 446efd3
Showing 7 changed files with 215 additions and 5 deletions.
5 changes: 5 additions & 0 deletions website/docs/docs/build/metricflow-commands.md
@@ -556,3 +556,8 @@ Keep in mind that modifying your shell configuration files can have an impact on
</details>
<details>
<summary>Why is my query limited to 100 rows in the dbt Cloud CLI?</summary>
The default <code>limit</code> for queries issued from the dbt Cloud CLI is 100 rows. We set this default to prevent returning unnecessarily large data sets, as the dbt Cloud CLI is typically used to query the dbt Semantic Layer during development, not for production reporting or to access large data sets. For most workflows, you only need to return a subset of the data.<br /><br />
However, you can change this limit if needed by setting the <code>--limit</code> option in your query. For example, to return 1000 rows, you can run <code>dbt sl list metrics --limit 1000</code>.
</details>
8 changes: 5 additions & 3 deletions website/docs/docs/cloud/manage-access/sso-overview.md
@@ -57,8 +57,9 @@ Non-admin users that currently login with a password will no longer be able to d
### Security best practices

There are a few scenarios that might require you to log in with a password. We recommend these security best practices for the most common scenarios:
* **Onboarding partners and contractors** - We highly recommend that you add partners and contractors to your Identity Provider. IdPs like Okta and Azure Active Directory (AAD) offer capabilities explicitly for temporary employees. We also recommend that you reach out to your IT team to provision an SSO license for these situations. Using an IdP is highly secure, reduces breach risk, and significantly increases the security posture of your dbt Cloud environment.
* **Identity Provider is down -** Account admins can still log in with a password, which allows them to work with your Identity Provider to troubleshoot the problem.
* **Onboarding partners and contractors** &mdash; We highly recommend that you add partners and contractors to your Identity Provider. IdPs like Okta and Azure Active Directory (AAD) offer capabilities explicitly for temporary employees. We also recommend that you reach out to your IT team to provision an SSO license for these situations. Using an IdP is highly secure, reduces breach risk, and significantly increases the security posture of your dbt Cloud environment.
* **Identity Provider is down** &mdash; Account admins can still log in with a password, which allows them to work with your Identity Provider to troubleshoot the problem.
* **Offboarding admins** &mdash; When offboarding admins, revoke access to dbt Cloud by deleting the user from your environment; otherwise, they can continue to use username/password credentials to log in.

### Next steps for non-admin users currently logging in with passwords

@@ -67,4 +68,5 @@ If you have any non-admin users logging into dbt Cloud with a password today:
1. Ensure that all users have a user account in your identity provider and are assigned dbt Cloud so they won’t lose access.
2. Alert all dbt Cloud users that they won’t be able to use a password for logging in anymore unless they are already an Admin with a password.
3. We **DO NOT** recommend promoting any users to Admins just to preserve password-based logins, because doing so reduces the security of your dbt Cloud environment.
@@ -0,0 +1,16 @@
---
title: "Update: Extended attributes is GA"
description: "December 2023: The extended attributes feature is now GA in dbt Cloud. It enables you to override dbt adapter YAML attributes at the environment level."
sidebar_label: "Update: Extended attributes is GA"
sidebar_position: 10
tags: [Dec-2023]
date: 2023-12-06
---

The extended attributes feature in dbt Cloud is now GA! It allows an environment-level override of any YAML attribute that a dbt adapter accepts in its `profiles.yml`. You can provide a YAML snippet to add or replace any [profile](/docs/core/connect-data-platform/profiles.yml) value.

To learn more, refer to [Extended attributes](/docs/dbt-cloud-environments#extended-attributes).

The **Extended Attributes** text box is available from your environment's settings page:

<Lightbox src="/img/docs/dbt-cloud/using-dbt-cloud/extended-attributes.jpg" width="85%" title="Example of the Extended Attributes text box" />
2 changes: 1 addition & 1 deletion website/docs/guides/bigquery-qs.md
@@ -78,7 +78,7 @@ In order to let dbt connect to your warehouse, you'll need to generate a keyfile
- Click **Next** to create a new service account.
2. Create a service account for your new project from the [Service accounts page](https://console.cloud.google.com/projectselector2/iam-admin/serviceaccounts?supportedpurview=project). For more information, refer to [Create a service account](https://developers.google.com/workspace/guides/create-credentials#create_a_service_account) in the Google Cloud docs. As an example for this guide, you can:
- Type `dbt-user` as the **Service account name**
- From the **Select a role** dropdown, choose **BigQuery Admin** and click **Continue**
- From the **Select a role** dropdown, choose **BigQuery Job User** and **BigQuery Data Editor** roles and click **Continue**
- Leave the **Grant users access to this service account** fields blank
- Click **Done**
3. Create a service account key for your new project from the [Service accounts page](https://console.cloud.google.com/iam-admin/serviceaccounts?walkthrough_id=iam--create-service-account-keys&start_index=1#step_index=1). For more information, refer to [Create a service account key](https://cloud.google.com/iam/docs/creating-managing-service-account-keys#creating) in the Google Cloud docs. When downloading the JSON file, make sure to use a filename you can easily remember. For example, `dbt-user-creds.json`. For security reasons, dbt Labs recommends that you protect this JSON file like you would your identity credentials; for example, don't check the JSON file into your version control software.
2 changes: 1 addition & 1 deletion website/docs/reference/dbt-jinja-functions/return.md
@@ -1,6 +1,6 @@
---
title: "About return function"
sidebar_variable: "return"
sidebar_label: "return"
id: "return"
description: "Read this guide to understand the return Jinja function in dbt."
---
182 changes: 182 additions & 0 deletions website/docs/reference/resource-configs/databricks-configs.md
@@ -361,6 +361,188 @@ insert into analytics.replace_where_incremental
</TabItem>
</Tabs>

<VersionBlock firstVersion="1.7">

## Selecting compute per model

Beginning in version 1.7.2, you can assign which compute resource to use on a per-model basis.
For SQL models, you can select a SQL Warehouse (serverless or provisioned) or an all-purpose cluster.
For details on how this feature interacts with Python models, see [Specifying compute for Python models](#specifying-compute-for-python-models).
To take advantage of this capability, you need to add a `compute` block to each output in your profile:

<File name='profiles.yml'>

```yaml
<profile-name>:
  target: <target-name> # this is the default target
  outputs:
    <target-name>:
      type: databricks
      catalog: [optional catalog name if you are using Unity Catalog]
      schema: [schema name] # Required
      host: [yourorg.databrickshost.com] # Required

      ### This path is used as the default compute
      http_path: [/sql/your/http/path] # Required

      ### New compute section
      compute:

        ### Name that you will use to refer to an alternate compute
        Compute1:
          http_path: [/sql/your/http/path] # Required for each alternate compute

        ### A third named compute, use whatever name you like
        Compute2:
          http_path: [/some/other/path] # Required for each alternate compute
      ...

    <another-target-name>: # additional targets
      ...
      ### For each target, you need to define the same compute,
      ### but you can specify different paths
      compute:

        ### Name that you will use to refer to an alternate compute
        Compute1:
          http_path: [/sql/your/http/path] # Required for each alternate compute

        ### A third named compute, use whatever name you like
        Compute2:
          http_path: [/some/other/path] # Required for each alternate compute
      ...
```

</File>

The new `compute` section is a map of user-chosen names to objects with an `http_path` property.
Each compute is keyed by a name, which you use in the model definition or configuration to indicate which compute that model (or selection of models) should use.
We recommend choosing a name that is easily recognized as the compute resource you're using, such as the name of the compute resource in the Databricks UI.

:::note

You need to use the same set of compute names across your outputs, though you may supply different `http_path` values, allowing you to use different computes in different deployment scenarios.

:::
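The naming rule in the note above is easy to check before a run. The following is a minimal Python sketch, assuming the profile has already been loaded into a dictionary (for example with PyYAML's `yaml.safe_load`); `check_compute_names` and the sample paths are hypothetical and are not part of dbt or the Databricks adapter.

```python
from typing import Any, Dict, Set


def check_compute_names(profile: Dict[str, Any]) -> Set[str]:
    """Return the compute names shared by every output, or raise ValueError.

    Hypothetical helper, not part of dbt: `profile` is one profile entry from
    profiles.yml, already parsed into a dict (for example with yaml.safe_load).
    """
    names_per_target = {
        target: frozenset(output.get("compute", {}))
        for target, output in profile["outputs"].items()
    }
    if len(set(names_per_target.values())) > 1:
        details = ", ".join(
            f"{target}: {sorted(names)}"
            for target, names in sorted(names_per_target.items())
        )
        raise ValueError(f"compute names differ across outputs ({details})")
    # All outputs agree on the names; their http_path values may still differ.
    return set(next(iter(names_per_target.values()), frozenset()))


# Hypothetical profile with two outputs that reuse the same compute names.
profile = {
    "target": "dev",
    "outputs": {
        "dev": {
            "http_path": "/sql/dev/default",
            "compute": {
                "Compute1": {"http_path": "/sql/dev/one"},
                "Compute2": {"http_path": "/sql/dev/two"},
            },
        },
        "prod": {
            "http_path": "/sql/prod/default",
            "compute": {
                "Compute1": {"http_path": "/sql/prod/one"},
                "Compute2": {"http_path": "/sql/prod/two"},
            },
        },
    },
}

print(sorted(check_compute_names(profile)))  # ['Compute1', 'Compute2']
```

Treating a missing `compute` section as an empty set of names is a choice of this sketch; it flags that case as a mismatch so you can decide whether it is intentional.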

To configure this in dbt Cloud, use the [extended attributes feature](/docs/dbt-cloud-environments#extended-attributes-) on the desired environments:

```yaml
compute:
  Compute1:
    http_path: [/some/other/path]
  Compute2:
    http_path: [/some/other/path]
```

### Specifying the compute for models

As with many other configuration options, you can specify the compute for a model in multiple ways, using `databricks_compute`.
In your `dbt_project.yml`, you can specify the compute for all the models in a given directory:

<File name='dbt_project.yml'>

```yaml
...

models:
  +databricks_compute: "Compute1"     # use the `Compute1` warehouse/cluster for all models in the project...
  my_project:
    clickstream:
      +databricks_compute: "Compute2" # ...except for the models in the `clickstream` folder, which will use `Compute2`.

snapshots:
  +databricks_compute: "Compute1"     # all snapshots are configured to use `Compute1`.
```

</File>
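Folder-level configs in `dbt_project.yml` are inherited, with deeper folders overriding the levels above them. As an illustration of how that inheritance picks a compute name for the example above, here is a simplified Python sketch; `resolve_project_compute` is hypothetical, only handles `+`-prefixed keys, and the model paths are made up.

```python
from typing import Any, Dict, List, Optional


def resolve_project_compute(
    models_tree: Dict[str, Any], fqn: List[str]
) -> Optional[str]:
    """Return the `+databricks_compute` value that applies to a model.

    Hypothetical helper: walk the `models:` tree along the model's path
    (project, folders, model name), letting each deeper level override
    the value inherited from the level above it.
    """
    key = "+databricks_compute"
    value = models_tree.get(key)
    node: Any = models_tree
    for part in fqn:
        node = node.get(part)
        if not isinstance(node, dict):
            break  # no more config below this point in the tree
        value = node.get(key, value)
    return value


# The `models:` section from the dbt_project.yml example above.
models_tree = {
    "+databricks_compute": "Compute1",
    "my_project": {"clickstream": {"+databricks_compute": "Compute2"}},
}

# Hypothetical model paths: project name, folder(s), model name.
print(resolve_project_compute(models_tree, ["my_project", "clickstream", "page_views"]))  # Compute2
print(resolve_project_compute(models_tree, ["my_project", "marts", "orders"]))  # Compute1
```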

For an individual model, you can specify the compute in the model config in your schema file.

<File name='schema.yml'>

```yaml
models:
  - name: table_model
    config:
      databricks_compute: Compute1
    columns:
      - name: id
        data_type: int
```

</File>


Alternatively, you can specify the compute in the `config` block of a model's SQL file.

<File name='model.sql'>

```sql
{{
  config(
    materialized='table',
    databricks_compute='Compute1'
  )
}}
select * from {{ ref('seed') }}
```

</File>

:::note

If no compute is specified, we default to the compute specified by the top-level `http_path` in the output section of your profile.
This is also the compute used for tasks not associated with a particular model, such as gathering metadata for all tables in a schema.

:::
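Putting the pieces together, a model's compute name comes from config precedence (a `config()` block overrides the schema file, which overrides `dbt_project.yml`) and is then looked up in the current output's `compute` map, falling back to the top-level `http_path` as the note above describes. The Python sketch below models that flow under those assumptions; the helper names and paths are hypothetical, and it is not the Databricks adapter's actual code.

```python
from typing import Any, Dict, Optional


def pick_compute_name(
    sql_config: Optional[str] = None,
    schema_config: Optional[str] = None,
    project_config: Optional[str] = None,
) -> Optional[str]:
    """Most specific source wins: config() block, then schema.yml, then dbt_project.yml."""
    for candidate in (sql_config, schema_config, project_config):
        if candidate is not None:
            return candidate
    return None


def resolve_http_path(output: Dict[str, Any], compute_name: Optional[str]) -> str:
    """Map a compute name to an http_path for one profile output."""
    if compute_name is None:
        # No databricks_compute anywhere: fall back to the top-level http_path.
        return output["http_path"]
    computes = output.get("compute", {})
    if compute_name not in computes:
        # Assumption: this sketch treats an undefined name as an error.
        raise KeyError(f"compute {compute_name!r} is not defined in this output")
    return computes[compute_name]["http_path"]


output = {
    "http_path": "/sql/default/path",  # hypothetical paths
    "compute": {
        "Compute1": {"http_path": "/sql/one/path"},
        "Compute2": {"http_path": "/sql/two/path"},
    },
}

# schema.yml sets Compute1, which overrides Compute2 from dbt_project.yml.
print(resolve_http_path(output, pick_compute_name(schema_config="Compute1", project_config="Compute2")))  # /sql/one/path
# Nothing configured: the default http_path is used.
print(resolve_http_path(output, pick_compute_name()))  # /sql/default/path
```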

To validate that the specified compute is being used, look for lines like the following in your `dbt.log`:

```
Databricks adapter ... using default compute resource.
```

or

```
Databricks adapter ... using compute resource <name of compute>.
```
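If you run many models, counting these messages can be quicker than reading the log. The following Python sketch is one way to do it. It assumes only the wording shown above; because part of each message is elided, the regular expressions and the `compute_usage` helper are hypothetical and may need adjusting for your adapter version.

```python
import re
from collections import Counter

# Both patterns rely only on the wording quoted above. The part of each line
# between "Databricks adapter" and "using" is elided in the docs, so it is
# skipped with `.*?`; the exact log format may differ between adapter versions.
DEFAULT_RE = re.compile(r"Databricks adapter.*?using default compute resource")
NAMED_RE = re.compile(r"Databricks adapter.*?using compute resource (\S+?)\.?\s*$")


def compute_usage(log_text: str) -> Counter:
    """Count how often each compute resource is reported in dbt.log text."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        if DEFAULT_RE.search(line):
            counts["<default>"] += 1
            continue
        match = NAMED_RE.search(line)
        if match:
            # Drop surrounding quotes in case the name is logged quoted.
            counts[match.group(1).strip("'\"")] += 1
    return counts


sample = """\
Databricks adapter ... using default compute resource.
Databricks adapter ... using compute resource Compute1.
"""
print(sorted(compute_usage(sample).items()))  # [('<default>', 1), ('Compute1', 1)]
```

For a local dbt Core run, you could pass it the contents of `logs/dbt.log`, since `logs/` is dbt's default log directory.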

### Specifying compute for Python models

Materializing a Python model requires executing both SQL and Python.
Specifically, if your Python model is incremental, the current execution pattern runs Python to create a staging table, which is then merged into your target table using SQL.
The Python code must run on an all-purpose cluster, while the SQL code can run on an all-purpose cluster or a SQL Warehouse.
When you specify `databricks_compute` for a Python model, you currently specify only the compute used to run the model-specific SQL.
To use a different compute for executing the Python itself, you must specify an alternate `http_path` in the model's config:

<File name="model.py">

```python
def model(dbt, session):
    dbt.config(
        http_path="sql/protocolv1/..."
    )
    # ... the rest of your model logic; a dbt Python model must return a DataFrame
```

</File>

If your default compute is a SQL Warehouse, you need to specify an all-purpose cluster `http_path` in this way.
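For an incremental Python model, the two settings can be combined in one `dbt.config` call: `databricks_compute` for the SQL steps such as the merge, and `http_path` for the Python execution. This is a minimal sketch under those assumptions; `upstream_model` is a hypothetical model name, and the elided `http_path` is left as it appears above.

```python
def model(dbt, session):
    dbt.config(
        materialized="incremental",
        # Named compute (from your profile) for the model's SQL, such as the merge step.
        databricks_compute="Compute1",
        # All-purpose cluster for the Python code itself. The path is elided in the
        # docs above; replace it with your cluster's HTTP path.
        http_path="sql/protocolv1/...",
    )

    # "upstream_model" is a hypothetical model name.
    df = dbt.ref("upstream_model")
    return df
```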

</VersionBlock>

## Persisting model descriptions

5 changes: 5 additions & 0 deletions website/snippets/_sl-faqs.md
@@ -10,6 +10,11 @@

As we refine MetricFlow’s API layers, some users may find it easier to set up their own custom service layers for managing query requests. This is not currently recommended, as the API boundaries around MetricFlow are not sufficiently well-defined for broad-based community use.

- **Why is my query limited to 100 rows in the dbt Cloud CLI?**
- The default `limit` for queries issued from the dbt Cloud CLI is 100 rows. We set this default to prevent returning unnecessarily large data sets, as the dbt Cloud CLI is typically used to query the dbt Semantic Layer during development, not for production reporting or to access large data sets. For most workflows, you only need to return a subset of the data.

However, you can change this limit if needed by setting the `--limit` option in your query. For example, to return 1000 rows, you can run `dbt sl list metrics --limit 1000`.

- **Can I reference MetricFlow queries inside dbt models?**
- dbt relies on Jinja macros to compile SQL, while MetricFlow is Python-based and renders SQL directly for a specific dialect. MetricFlow does not support pass-through rendering of Jinja macros, so we can’t easily reference MetricFlow queries inside dbt models.
