dbt perf tuning #581
base: main
Conversation
I'll have to rerun from scratch to get the latest numbers, but here's the log from the first build operation which I was able to get successfully run end-to-end. From this log output, the tables taking more than 10 minutes are:
Details
@pnadolny13 - Do you have expected runtime durations for the top 5 or so jobs on your side? This PR isn't ready to go as of now, and some of these changes need additional thought. I'm curious, though, whether the above runtimes are better, the same, or worse than the runtimes you have on your side - especially for the slowest 5-7 tables that make up the bulk of the runtime. (I should mention that those logs force threading to 1 in order to be easy to scan and debug.)
@aaronsteers I was recently working on #524, which made a difference, but with telemetry data in there my thought was that it will always be pretty hard to do a clean rebuild. The way I get around this when I'm developing is by making myself a clone of prod to start, then only running the models that I've changed. I'm definitely open to suggestions on this, though. For CI/CD I limited the telemetry data to a few days' worth, so a full run finishes in ~30 mins, but that's only for a subset of data. The project is configured to use dbt-artifacts as a post-run hook, so we collect all CI/CD/prod runtimes. Here is the nightly build from last night, ordered by the models with the longest runtime: That's using incremental, so most of the large tables build quickly, vs. a rebuild that would take much longer. So the longest-running model is ~16 mins.
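The incremental pattern described above can be sketched roughly as follows. This is a hypothetical illustration, not the project's actual model; the table and column names (`stg_telemetry_events`, `event_id`, `event_created_at`) are made up for the example:

```sql
-- Hypothetical sketch of an incremental dbt model.
-- On a full rebuild, all rows are processed; on incremental runs,
-- only rows newer than what is already in the target table.
{{ config(materialized='incremental', unique_key='event_id') }}

select *
from {{ ref('stg_telemetry_events') }}  -- illustrative model name
{% if is_incremental() %}
  -- `{{ this }}` refers to the already-built target table.
  where event_created_at > (select max(event_created_at) from {{ this }})
{% endif %}
```

This is why the nightly numbers above look much better than a from-scratch build: after the first run, each large table only scans and inserts the new rows.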
@pnadolny13 re:
I couldn't find the logic for this limit. Could you point me to that for use in #580? I think it's probably reasonable to have at least the option of a shorter limit on the telemetry data for local dev.
@aaronsteers the limit logic is in
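One common way to express that kind of limit in dbt (shown here only as a hedged sketch; the project's actual logic may look quite different) is to filter the telemetry source by target or by a variable, so dev and CI/CD runs pull a few days of data while prod keeps full history:

```sql
-- Hypothetical sketch: restrict telemetry history outside prod.
-- `telemetry_days_limit` is an illustrative var name, not necessarily
-- the one this project uses; the default of 3 days is also assumed.
select *
from {{ source('telemetry', 'events') }}  -- illustrative source name
{% if target.name != 'prod' %}
  where event_created_at
    >= current_date - interval '{{ var("telemetry_days_limit", 3) }} days'
{% endif %}
```

Exposing the window as a `var` would give local dev the shorter-limit option mentioned above without touching the CI/CD configuration.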
Includes changes from #580; this should be merged after that PR.
This attempts to work through some performance issues I ran into when trying to get the full project to build using the new `dbt_dev` environment.