[Regression] 1.8.2 slower to build than 1.5.9 when tag+ includes many nodes #10434

Closed · cajubelt opened this issue Jul 12, 2024 · 20 comments · Fixed by #10526 or #10567
Comments

@cajubelt commented Jul 12, 2024

Is this a regression in a recent version of dbt-core?

  • I believe this is a regression in dbt-core functionality
  • I have searched the existing issues, and I could not find an existing issue for this regression

Current Behavior

dbt build -s tag:my_tag+ takes about 20 minutes longer to start on dbt 1.8.2 than it does on 1.5.9 with the same tag. The tag used has a lot of downstream nodes in our project, about 11k. We're generally seeing better performance on 1.8, so we were surprised by this big regression.

Expected/Previous Behavior

Previously building everything downstream of a tag with lots of nodes would take a couple minutes of startup time and then begin running queries against our db. Now it takes 20+ minutes.

Steps To Reproduce

  1. Set up dbt project with a tag that has about 11k downstream nodes
  2. Install dbt 1.8.2
  3. Run dbt build -s tag:my_tag+

Relevant log output

No response

Environment

- OS: MacOS 14.5 and Ubuntu 22.04
- Python: 3.9.12
- dbt (working version): 1.5.9
- dbt (regression version): 1.8.2

Which database adapter are you using with dbt?

bigquery

Additional Context

The reason we need this is that we have a selector used in CI/CD that excludes everything downstream of a tag that is upstream of many nodes. The example given is a simpler version of the original issue we found with that selector. (We tested the simpler version and found it has the same performance issue.) The selector was something like:

```yaml
- name: my_selector
  definition:
    union:
      - state:modified+
      - exclude:
          - tag:my_tag+
```
cajubelt added the bug, regression, and triage labels on Jul 12, 2024
@jtcohen6 (Contributor)

Thanks @cajubelt! Is this only for the build command (i.e. not run)?

I suspect this might be due to the large number of tests in your project (per our conversation), and the additional time that dbt spends "linking" the DAG (adding edges from tests on upstream models to downstream models, so that downstream nodes skip on test failure). While it's not immediately clear to me what change we made to that logic between v1.5 and v1.8, if you only see this slowdown on build, it would be a strong indication that therein lies the problem.

@cajubelt (Author)

@jtcohen6 yes I just confirmed that dbt run doesn't have any noticeable slowdown on 1.8 with the same selector.

@jtcohen6 (Contributor)

@cajubelt Okay! So I think the hypothesis is:

  • A project with a lot of tests (~5k models and ~15k data tests) is meaningfully slower at startup for dbt build in v1.8 compared to v1.5
  • My suspicion is that this slowdown is happening within (or closely adjacent to) the add_test_edges method

@jtcohen6 (Contributor) commented Jul 12, 2024

If you're up for it, there are two ways to try confirming that hypothesis by profiling dbt's performance: dbt's built-in --record-timing-info flag, or py-spy:

```sh
pip install py-spy
sudo py-spy record -s -f speedscope -- dbt build
```
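(The speedscope output file can then be loaded at https://www.speedscope.app to browse the flame graph.)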

@gshank (Contributor) commented Jul 12, 2024

We have the ability to exclude tests from the build resource types, but that doesn't affect whether or not we pass "add_test_edges" to compile. It would be possible to check the resource types and, if tests are excluded, pass "add_test_edges" = False (rough sketch below).

Of course if you do want tests to run, that doesn't help.
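A hypothetical sketch of that check (should_add_test_edges and selected_resource_types are invented names; the result would feed the add_test_edges flag mentioned above):

```python
def should_add_test_edges(selected_resource_types: set) -> bool:
    # Only pay for test-edge linking when tests can actually run.
    return "test" in selected_resource_types
```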

@cajubelt (Author)

We do want to run tests if possible.

@jtcohen6 here's a screenshot of a search through py-spy's output for the longest segments of the flame graph. It looks like generic_bfs_edges accounts for 12:41; not sure if that's particularly revealing, however.

[screenshot: py-spy flame graph, 2024-07-14]

matthewshaver pushed a commit to dbt-labs/docs.getdbt.com that referenced this issue on Jul 20, 2024, officially documenting the performance profiling approach mentioned in dbt-labs/dbt-core#10434 (comment) on the record-timing-info page of the reference docs.
@stefankeidel

@jtcohen6 Not super sure if this is related to this issue, but we're seeing an extreme slowdown when building a large chunk of a fairly big graph with --indirect-selection buildable in CI/CD. The project has 2573 models, 13 snapshots, 7 analyses, 2254 data tests, 20 seeds, 1 operation, 359 sources, 44 exposures, 1408 macros, 1 group, and 13 unit tests.

cautious and eager end up being fine, but with buildable, the process "hangs" for up to 20 minutes after it prints the project stats ("Found 2573 models ..." etc.) and before the "Concurrency: 4 threads (target='test')" message. No other output, even at debug log level.

I tried to figure out what's going on using the trace module, but it was a bit too much spam :). Will try to add some trace output when I find a minute.

@stefankeidel

Here's me waiting for about 5 minutes for the generic_bfs_edges traversal:

[screenshot: py-spy output for the slow buildable run]

In a "good" run with eager selection this takes about two seconds with the exact same selector (the rest is dbt actually starting to execute, which I didn't wait for in the other run)

[screenshot: py-spy output for the fast eager run]

@ttusing (Contributor) commented Aug 3, 2024

What percentage of your graph (including tests) is selected by the tag, and what percentage is selected by the selector?

I had someone on my team try using the @ decorator on a selector (which I don't normally do) and we both had dbt hang. I'd kind of pushed it out of my head; will be thinking on this now.

I don't really understand the py-spy output (I've worked with SnakeViz). If I had to guess what might be happening:

  • for some reason, generic_bfs_edges is either really slow or being called a lot
  • collect_specified_neighbors in selector.py calls self.graph.select_children when passed certain selector patterns
  • self.graph.select_children applies the self.descendants method for each node, which may happen a lot of times, or may use up a lot of memory
  • the slow part, based on your output, is filter_edges_by_type, which calls get_edge_data in networkx.

If you can share the SnakeViz output, that would help; also maybe check your memory usage? The last bug I worked on in this graph optimization code involved performance problems from running out of memory when the DAG got too big.

If my stab in the dark here is in the right direction, maybe there is some pruning that could be done to reduce calling descendants so much. For instance, if you have a ton of nodes in your selector, many of which are already descendants of each other, you don't need to re-tread and evaluate each node: you could first get a list of all descendants of each node at a depth of 1, take the difference of that and the list of nodes in the selector to restrict the search to descendants of unsearched nodes, find depth-1 descendants of that list, and iterate (see the sketch below).

If your graph has 20k nodes, 1k have the tag, 10k are descendants, and each descendant is on average a descendant of 70% of the tagged nodes, this multiplies the number of times each descendant gets evaluated by ~700. 700 × 10,000 is 7 million calls, which is a lot.

After writing this out, this seems actually pretty likely XD. I think an update to select_children with this logic would be the place to implement it. This seems similar to #7194.
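As a minimal sketch of that frontier-style pruning, assuming the DAG is a plain networkx DiGraph (collect_descendants is an illustrative name, not dbt's actual select_children):

```python
import networkx as nx

def collect_descendants(graph: nx.DiGraph, selected: set) -> set:
    """BFS outward from `selected`, expanding each node at most once."""
    seen = set(selected)
    frontier = set(selected)
    while frontier:
        # Depth-1 children of the current frontier, skipping anything
        # already expanded, so overlapping descendant sets aren't re-walked.
        next_frontier = {
            child
            for node in frontier
            for child in graph.successors(node)
            if child not in seen
        }
        seen |= next_frontier
        frontier = next_frontier
    return seen - selected
```

Each node gets expanded at most once here, no matter how many tagged nodes it descends from.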

Also, if the slow part is filtered_graph = self.exclude_edge_type("parent_test"), maybe we could cache that or pass it as a parameter into descendants?

Will try tests on my own project with ~500 models and ~5000 tests next week.

@ttusing (Contributor) commented Aug 6, 2024

I added a PR! #10526

I believe I was able to recreate the issue in my project, which has 442 models, 43 snapshots, 5112 data tests, 32 seeds, 2 operations, 134 sources, 23 exposures, and 1075 macros.

Running dbt build --select tag:silver+ on my DAG on 1.8.4, it takes 132 seconds to start building; a plain dbt build takes just 11 seconds.

My PR applies the pruning strategy I described. I saw the runtime for dbt build --select tag:silver+ improve to 66 seconds. Using SnakeViz, I identified that the number of calls to get_edge_data dropped from 14,332,569 to 383,916.

I am unsure how many performance improvements dbt 1.9/the main branch already contains... the total runtime improvement was 66 seconds, but select_children only improved from 80 seconds to 45 seconds (a 35-second improvement). Going to try running the main branch for a performance check.

Feedback welcome, especially if you can test/review the change!

@peterallenwebb (Contributor)

I've reviewed @ttusing's PR and suggested some changes, but we should be close to a fix on this.

@peterallenwebb (Contributor)

This is now resolved in 1.9+, but I'm re-opening this issue so we can consider backports.

@ttusing (Contributor) commented Aug 9, 2024

I'm also interested in seeing performance reports from @stefankeidel and @cajubelt. This should help, but checking the edge type seems to be inherently slow. I wonder if the runtime will improve from 20 minutes to something like 15, 10, or 5 minutes, and whether there are further improvements to be made in the edge-type lookup speed.

I could imagine avoiding the edge-type lookup altogether, using the knowledge that tests cannot depend on tests. So: when searching for children, don't check the edge type if the node is a test (or, for ancestors, if the parent is a test), or something like that. Or build the subgraph with all non-test nodes first; rough sketch below.
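A rough sketch of that last idea, assuming a networkx DiGraph keyed by dbt unique_ids (which start with "test." for test nodes); non_test_subgraph is a hypothetical helper, not dbt's actual code:

```python
import networkx as nx

def non_test_subgraph(graph: nx.DiGraph) -> nx.DiGraph:
    # Drop test nodes once, up front; traversals over the returned view
    # never need a per-edge get_edge_data() call just to skip test edges.
    non_tests = [n for n in graph.nodes if not n.startswith("test.")]
    return graph.subgraph(non_tests)
```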

@cajubelt (Author) commented Aug 9, 2024

Just saw the great activity on this after it fell off my radar for a bit; thanks, all. Will follow up on Monday with a perf report; I have personal commitments this weekend.

We'd love to see a backport to 1.8.

@stefankeidel commented Aug 10, 2024 via email

@ttusing (Contributor) commented Aug 12, 2024

So I'm looking at this section:

```python
def add_test_edges(self, manifest: Manifest) -> None:
    """This method adds additional edges to the DAG. For a given non-test
    executable node, add an edge from an upstream test to the given node if
    the set of nodes the test depends on is a subset of the upstream nodes
    for the given node."""
    # Given a graph:
    # model1 --> model2 --> model3
    #   |          |
    #   |          \/
    #   \/       test 2
    # test1
    #
    # Produce the following graph (test1 gains edges to model2 and model3,
    # test2 gains an edge to model3):
    # model1 --> model2 --> model3
    #   |  /\      |  /\      /\
    #   |  |       \/ |       |
    #   \/ |     test2 ----|  |
    # test1 ----|----------|--|
```

This adds a huge number of edges. For probably all but pathological DAGs, this means an edge between every test and every node downstream of what it's testing. For the example of 5k models and 15k data tests, this might add approximately (5k × 15k / 2) = 37.5 million edges.

This was added in #7276; I don't really understand the purpose of this code. @gshank, can you explain why it is necessary to add all of these edges? It seems like a huge factor limiting dbt scaling. If the goal is to get dbt build to run tests on parents before children, isn't it enough to just do this with a depth of 1?

I'm guessing the get-edge-type function is slow because of the millions of edges. There are far fewer nodes (thousands), so getting the node type should be much faster, and I am nearly certain nothing can directly depend on tests (only the parent_test edge type), so I think checking node types is another potential huge optimization.

Somewhat separately, I wonder about a feature enhancement to print simple DAG stats with something like --dag-debug to help with issues like this. It might show the total number of edges, the highest-degree node (or distribution stats on degree), the highest number of tests on a single model (or distribution), etc. (rough sketch below).
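Something along these lines with networkx could produce those stats (dag_stats and the flag name are invented for illustration):

```python
import networkx as nx

def dag_stats(graph: nx.DiGraph) -> dict:
    # Summary numbers that make pathological DAGs easier to spot.
    degrees = dict(graph.degree())
    max_node = max(degrees, key=degrees.get) if degrees else None
    return {
        "nodes": graph.number_of_nodes(),
        "edges": graph.number_of_edges(),
        "highest_degree_node": max_node,
        "highest_degree": degrees.get(max_node, 0),
    }
```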

@cajubelt (Author)

Ok, I did some profiling last night on dbt build --select tag:large_tag+. Looks like get_edge_data accounted for about 36 minutes and was called 710,951,628 times. Total time to build was 81 minutes. (I used --empty and it errored out immediately when writing to tables, so none of this was time waiting for db writes.) I used dbt 1.8.2. Will try again tonight on 1.9 if that's helpful.

@cajubelt (Author)

Ok, ran it again with 1.9.0-a1 and the improvement was dramatic. The build time was just 10 minutes. get_edge_data was called 11,778,492 times, a 98% reduction, and accounted for just 21 seconds.

@cajubelt (Author)

Some stats about my project: 5799 models, 20 snapshots, 91 analyses, 18763 data tests, 186 seeds, 1 operation, 1086 sources, 69 exposures, 1899 macros

@dbeatty10 (Contributor)

Resolved for dbt-core 1.9+ in #10526 and backported to dbt-core 1.8 in #10567.
