"""This method adds additional edges to the DAG. For a given non-test
executable node, add an edge from an upstream test to the given node if
the set of nodes the test depends on is a subset of the upstream nodes
for the given node."""
# Given a graph:
# model1 --> model2 --> model3
#   |           |
#   |          \/
#  \/        test 2
# test1
#
# Produce the following graph:
# model1 --> model2 --> model3
#   |       /\  |       /\  /\
#   |       |  \/       |   |
#  \/       | test2 ----|   |
# test1 ----|---------------|
Is this your first time submitting a feature request?
Describe the feature
Following up on this issue: #10434 (comment)
dbt-core/core/dbt/compilation.py
Lines 197 to 215 in 63262e9
I am unsure why the graph needs to be built this way. If the goal is only to maintain build order, a single edge from each test to the direct (depth-1) children of the nodes it tests seems sufficient. The current implementation makes every test a direct parent of ALL non-test downstream nodes, so a project with 5,000 models and 15,000 tests might have roughly 5k × 15k / 2 = 37.5 million edges, whereas limiting to a depth of 1 might keep that in the hundreds of thousands.
This has large implications for memory usage, build times, and so on for projects with many tests or many nodes in general.
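To make the estimate concrete, here is a minimal sketch in plain Python. It is not the dbt-core implementation: it applies the rule as the docstring states it, and `chain_project`, `strict_ancestors`, and `added_edges` are hypothetical helper names. The `depth_one` variant is one possible reading of the proposal (keep an edge only when the test's dependency is a direct parent of the node), and the linear chain is an assumed worst case.

```python
def chain_project(n_models, tests_per_model):
    """Hypothetical project: model0 -> model1 -> ..., with tests on every model.
    Returns (parents, tests), where parents maps node -> set of direct parents."""
    parents, tests = {}, set()
    for i in range(n_models):
        model = f"model{i}"
        parents[model] = {f"model{i - 1}"} if i > 0 else set()
        for k in range(tests_per_model):
            test = f"test{i}_{k}"
            parents[test] = {model}
            tests.add(test)
    return parents, tests


def strict_ancestors(node, parents):
    """All nodes reachable by following parent links upward from `node`."""
    seen, stack = set(), list(parents[node])
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(parents[cur])
    return seen


def added_edges(parents, tests, depth_one=False):
    """Edges (test, node) added for each non-test node, per the docstring rule.
    With depth_one=True (an assumed reading of the proposal), keep an edge only
    if one of the test's dependencies is a direct parent of the node."""
    edges = set()
    for node in parents:
        if node in tests:
            continue
        upstream = strict_ancestors(node, parents)
        for test in tests:
            deps = parents[test]
            if not deps or not deps <= upstream:
                continue
            if depth_one and not (deps & parents[node]):
                continue
            edges.add((test, node))
    return edges


# The three-model example from the code comment.
example_parents = {
    "model1": set(),
    "model2": {"model1"},
    "model3": {"model2"},
    "test1": {"model1"},
    "test2": {"model2"},
}
example_tests = {"test1", "test2"}
full_example = added_edges(example_parents, example_tests)
depth1_example = added_edges(example_parents, example_tests, depth_one=True)
print(sorted(full_example))    # test1->model2, test1->model3, test2->model3
print(sorted(depth1_example))  # test1->model2, test2->model3

# A 50-model chain with 3 tests per model: quadratic vs. linear growth.
chain_parents, chain_tests = chain_project(50, 3)
full_chain = added_edges(chain_parents, chain_tests)
depth1_chain = added_edges(chain_parents, chain_tests, depth_one=True)
print(len(full_chain), len(depth1_chain))  # 3675 147
```

On a chain of N models with T tests each, the full rule adds T·N(N−1)/2 edges and the depth-1 variant adds T·(N−1). For N = 5,000 and T = 3 that is 37,492,500 versus 14,997. Real projects are not linear chains, so both numbers are only indicative.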
If this construction is needed, I would like to understand why, and to add comments or documentation for future readers of this code who are investigating performance issues. Otherwise, I would like to consider changing the construction to use a depth of 1.
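As a quick sanity check of the ordering argument, the sketch below takes the three-model example, adds only the depth-1 test edges, and confirms that every edge the full construction would add is still implied by reachability. The adjacency dict and the `reachable` helper are illustrative assumptions, not dbt-core code.

```python
def reachable(start, children):
    """Every node reachable from `start` by following child links."""
    seen, stack = set(), [start]
    while stack:
        cur = stack.pop()
        for nxt in children.get(cur, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Example graph from the code comment, with depth-1 test edges only.
children = {
    "model1": ["model2", "test1"],
    "model2": ["model3", "test2"],
    "test1": ["model2"],  # depth-1 edge
    "test2": ["model3"],  # depth-1 edge
}

# Edges the current (full) construction adds for the same graph.
full_edges = [("test1", "model2"), ("test1", "model3"), ("test2", "model3")]

# Any full edge whose target is NOT reachable from its test would mean
# the depth-1 graph loses an ordering constraint.
missing = [(t, n) for t, n in full_edges if n not in reachable(t, children)]
print(missing)  # []
```

This only checks ordering in the complete graph. It does not show whether other code relies on the direct edges, which is the open question here.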
Describe alternatives you've considered
No response
Who will this benefit?
All users of dbt, especially those with large projects.
Are you interested in contributing this feature?
No response
Anything else?
No response