add support for running mul invocations in parallel #4909
This PR clarifies that the dbt Cloud CLI now supports running multiple invocations in parallel. This is based on @dichenqiandbt's demo. Previously, the Cloud CLI supported only one invocation at a time.
@@ -95,6 +95,7 @@ To set environment variables in the dbt Cloud CLI for your dbt project:
## Use the dbt Cloud CLI

- The dbt Cloud CLI uses the same set of [dbt commands](/reference/dbt-commands) and [MetricFlow commands](/docs/build/metricflow-commands) as dbt Core to execute the commands you provide. For example, use the [`dbt environment`](/reference/commands/dbt-environment) command to view your dbt Cloud configuration details.
- You can run multiple different invocations or commands in parallel. For example, `dbt build` and `dbt parse`. Note, that you're unable to run the same dbt commands in parallel. For example, running `dbt build` at the same time isn't supported.
I'd like to clarify this a little so the wording is easy for customers to understand.
Today we support running 2 invocations, but that might increase in the future depending on user feedback.
This isn't 100% accurate. On the backend we categorize dbt commands into two types: data warehouse write and data warehouse non-write.
Data warehouse write commands (e.g. build) always have a parallelism of 1. They may cause conflicts in the data warehouse, e.g. overwriting the same table.
Data warehouse non-write commands (e.g. parse) can have a parallelism of x (today it's 1, but it might be more, let's say 2). They are safe to run in parallel.
I'm not sure how to phrase this, as it's too complicated for customers to understand. Maybe say that you can't run commands that conflict in the data warehouse?
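To make the two categories concrete, here is a minimal Python sketch of a gate that admits invocations per category. It is not the Cloud CLI backend: `InvocationGate`, `WRITE_COMMANDS`, and the default limits are hypothetical names and values chosen for illustration, and the write set simply mirrors the commands marked ❌ in the table later in this conversation.

```python
import threading

# Assumption: commands treated as "data warehouse write" for this sketch.
# The set mirrors the rows marked as not parallel-safe in this PR's table.
WRITE_COMMANDS = {"build", "clone", "retry", "run", "run-operation", "seed", "snapshot"}


class InvocationGate:
    """Hypothetical admission gate with one slot pool per command category."""

    def __init__(self, write_limit=1, non_write_limit=2):
        # The limits are illustrative; the real values are a backend detail.
        self._slots = {
            "write": threading.BoundedSemaphore(write_limit),
            "non-write": threading.BoundedSemaphore(non_write_limit),
        }

    @staticmethod
    def category(command):
        return "write" if command in WRITE_COMMANDS else "non-write"

    def try_start(self, command):
        """Return True if the invocation may start now, False if its pool is full."""
        return self._slots[self.category(command)].acquire(blocking=False)

    def finish(self, command):
        """Release the slot held by a finished invocation."""
        self._slots[self.category(command)].release()
```

With `write_limit=1`, a second write command is refused while the first is still running, whereas a non-write command such as `parse` is admitted from its own pool.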
…s.getdbt.com into mirnawong1-patch-22
mirna to add the following list to specify write and non-write commands. Waiting for more info so that I can decide whether to add this to the commands page (ref) or the Cloud CLI page:
Handful of suggestions.
Co-authored-by: Doug Beatty <[email protected]>
Three suggested changes to the table:
- Omit the "Type" column since it carries the same info as the "Parallel execution" column
- Omit content in the "Caveat" section if there are no tool or version restrictions. Remove "Requires". Add "only".
- Replace "N/A" with ✅ if the command can be executed simultaneously (even if it doesn't interact with the database via a read or a write).
This table is exactly the same as the current table. It just omits the "Type" column since it carries the same info as the "Parallel execution" column.
It also omits content in the "Caveat" section if there are no tool or version restrictions -- the common assumption is that a command is available in all versions and all tools unless stated otherwise.
There are two versions of the table below:
- Both suggested changes
- Omission of "Type" column only
Both are exactly the same as the current table, just less busy.
I didn't do a prototype of suggestion 3, but my assumption is that all the "N/A" could just be replaced with ✅.
Both suggested changes
Command | Description | Parallel execution | Caveats |
---|---|---|---|
build | Build and test all selected resources (models, seeds, snapshots, tests) | ❌ | |
cancel | Cancels the most recent invocation. | N/A | dbt Cloud CLI only<br />dbt v1.6 or higher |
clean | Deletes artifacts present in the dbt project | ✅ | |
clone | Clone selected models from the specified state | ❌ | dbt v1.6 or higher |
compile | Compiles (but does not run) the models in a project | ✅ | |
debug | Debugs dbt connections and projects | ✅ | dbt Cloud IDE and dbt Core only |
deps | Downloads dependencies for a project | ✅ | |
docs | Generates documentation for a project | ✅ | |
environment | Enables you to interact with your dbt Cloud environment. | N/A | dbt Cloud CLI only<br />dbt v1.5 or higher |
help | Displays help information for any command | N/A | dbt Core and dbt Cloud CLI only |
init | Initializes a new dbt project | ✅ | dbt Core only |
list | Lists resources defined in a dbt project | ✅ | |
parse | Parses a project and writes detailed timing info | ✅ | |
reattach | Reattaches to the most recent invocation to retrieve logs and artifacts. | N/A | dbt Cloud CLI only<br />dbt v1.6 or higher |
retry | Retry the last run dbt command from the point of failure | ❌ | dbt v1.6 or higher |
run | Runs the models in a project | ❌ | |
run-operation | Invoke a macro, including running arbitrary maintenance SQL against the database | ❌ | |
seed | Loads CSV files into the database | ❌ | |
show | Preview table rows post-transformation | ✅ | |
snapshot | Executes "snapshot" jobs defined in a project | ❌ | |
source | Provides tools for working with source data (including validating that sources are "fresh") | ✅ | |
test | Executes tests defined in a project | ✅ | |
--version | Displays the currently installed version of dbt CLI | N/A | dbt Core and dbt Cloud CLI only |
Omission of "Type" column only
Command | Description | Parallel execution | Caveats |
---|---|---|---|
build | Build and test all selected resources (models, seeds, snapshots, tests) | ❌ | All tools<br />All supported versions |
cancel | Cancels the most recent invocation. | N/A | dbt Cloud CLI<br />Requires dbt v1.6 or higher |
clean | Deletes artifacts present in the dbt project | ✅ | All tools<br />All supported versions |
clone | Clone selected models from the specified state | ❌ | All tools<br />Requires dbt v1.6 or higher |
compile | Compiles (but does not run) the models in a project | ✅ | All tools<br />All supported versions |
debug | Debugs dbt connections and projects | ✅ | dbt Cloud IDE, dbt Core<br />All supported versions |
deps | Downloads dependencies for a project | ✅ | All tools<br />All supported versions |
docs | Generates documentation for a project | ✅ | All tools<br />All supported versions |
environment | Enables you to interact with your dbt Cloud environment. | N/A | dbt Cloud CLI<br />Requires dbt v1.5 or higher |
help | Displays help information for any command | N/A | dbt Core, dbt Cloud CLI<br />All supported versions |
init | Initializes a new dbt project | ✅ | dbt Core<br />All supported versions |
list | Lists resources defined in a dbt project | ✅ | All tools<br />All supported versions |
parse | Parses a project and writes detailed timing info | ✅ | All tools<br />All supported versions |
reattach | Reattaches to the most recent invocation to retrieve logs and artifacts. | N/A | dbt Cloud CLI<br />Requires dbt v1.6 or higher |
retry | Retry the last run dbt command from the point of failure | ❌ | All tools<br />Requires dbt v1.6 or higher |
run | Runs the models in a project | ❌ | All tools<br />All supported versions |
run-operation | Invoke a macro, including running arbitrary maintenance SQL against the database | ❌ | All tools<br />All supported versions |
seed | Loads CSV files into the database | ❌ | All tools<br />All supported versions |
show | Preview table rows post-transformation | ✅ | All tools<br />All supported versions |
snapshot | Executes "snapshot" jobs defined in a project | ❌ | All tools<br />All supported versions |
source | Provides tools for working with source data (including validating that sources are "fresh") | ✅ | All tools<br />All supported versions |
test | Executes tests defined in a project | ✅ | All tools<br />All supported versions |
--version | Displays the currently installed version of dbt CLI | N/A | dbt Core, dbt Cloud CLI<br />All supported versions |
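Suggestion 3 was not prototyped above. As a rough sketch only, the substitution could be applied mechanically to the table rows with a small Python helper. The function name `replace_na_cells` is hypothetical, and the helper assumes a cell counts as "N/A" only when that is its entire content.

```python
import re

# Matches a table cell whose whole content is "N/A", including its pipes.
NA_CELL = re.compile(r"\|\s*N/A\s*\|")


def replace_na_cells(table_lines):
    """Return a copy of the table rows with every "N/A" cell replaced by a check mark."""
    return [NA_CELL.sub("| ✅ |", line) for line in table_lines]
```

Rows that already carry ✅ or ❌, and the header and separator rows, pass through unchanged.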
Thank you so much for this detailed explanation! I love your suggestions, and my instinct is to have every row under 'Caveats' filled in so it's explicit which versions and tools are supported (as opposed to inferred). I'll always opt to be more explicit. I'll change this up and get back to you!
Nice work @mirnawong1 -- I feel much smarter after reading this PR 🧠
LGTM, thanks! I like this PR and customers will like it too!
- **Data platform write commands** — Commands such as `dbt build` and `dbt run` that perform write operations to your data platform. These commands are limited to one invocation at any given time. This is to prevent any potential conflicts, such as overwriting the same table in your data platform, at the same time. For example, you can't run `dbt build` and `dbt run` at the same time.

- **Data platform read commands** — Commands such as `dbt parse` and `dbt source snapshot-freshness` that don't write to your platform. These commands aren't limited to one invocation at any given time and you can run multiple invocations in parallel. For example, you can run `dbt parse` and `dbt source snapshot-freshness` at the same time.
The Cloud CLI does have a parallelism limit on non-write commands. Can you rephrase by adding something like "ideally aren't limited to..."?
Today our limit is 1 for non-write invocations, but it might be increased soon, so I don't want to state a concrete number here; otherwise it would need updating a couple of times.
LMK if that's clear.
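For readers who want to try the read-command case from a script, a minimal Python sketch follows. It only starts the processes together and collects their exit codes. Whether the Cloud CLI accepts both invocations at once depends on the backend limit discussed above, which this sketch does not model. `run_in_parallel` is a hypothetical helper name, and the dbt commands in the trailing comment are the two named in the proposed text.

```python
import subprocess


def run_in_parallel(commands):
    """Start every command at once, wait for all of them, and return exit codes in order."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]


# Example call (assumes the dbt Cloud CLI is installed and on PATH):
# run_in_parallel([["dbt", "parse"], ["dbt", "source", "snapshot-freshness"]])
```

Each command is passed as an argument list rather than a shell string, so no shell quoting is involved.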
Just some small wording suggestions! Ship when you're ready!
Co-authored-by: Leona B. Campbell <[email protected]>
Thanks for everyone's feedback, I really appreciate it! I'm merging this now!
This PR has grown to also address parallel execution: what it means and where it's supported. It also modifies the current dbt commands table to explain this further.
Resolves #4952