Unit testing feature branch pull request (#8411)

* Initial implementation of unit testing (from pr #2911)
  Co-authored-by: Michelle Ark <[email protected]>
* 8295 unit testing artifacts (#8477)
* unit test config: tags & meta (#8565)
* Add additional functional test for unit testing selection, artifacts, etc (#8639)
* Enable inline csv format in unit testing (#8743)
* Support unit testing incremental models (#8891)
* update unit test key: unit -> unit-tests (#8988)
* convert to use unit test name at top level key (#8966)
* csv file fixtures (#9044)
* Unit test support for `state:modified` and `--defer` (#9032)
  Co-authored-by: Michelle Ark <[email protected]>
* Allow use of sources as unit testing inputs (#9059)
* Use daff for diff formatting in unit testing (#8984)
* Fix #8652: Use seed file from disk for unit testing if rows not specified in YAML config (#9064)
  Co-authored-by: Michelle Ark <[email protected]>
  Fix #8652: Use seed value if rows not specified
* Move unit testing to test and build commands (#9108)
* Enable unit testing in non-root packages (#9184)
* convert test to data_test (#9201)
* Make fixtures files full-fledged members of manifest and enable partial parsing (#9225)
* In build command run unit tests before models (#9273)

---------

Co-authored-by: Michelle Ark <[email protected]>
Co-authored-by: Michelle Ark <[email protected]>
Co-authored-by: Emily Rockman <[email protected]>
Co-authored-by: Jeremy Cohen <[email protected]>
Co-authored-by: Kshitij Aranke <[email protected]>
dbt-labs · Jan 16, 2024 · b5a0c4c
1 parent 15704ab
commit b5a0c4c

Showing 167 changed files with 12,171 additions and 4,083 deletions.
diff --git a/.changes/unreleased/Features-20230802-145011.yaml b/.changes/unreleased/Features-20230802-145011.yaml
@@ -0,0 +1,6 @@
+kind: Features
+body: Initial implementation of unit testing
+time: 2023-08-02T14:50:11.391992-04:00
+custom:
+  Author: gshank
+  Issue: "8287"
diff --git a/.changes/unreleased/Features-20230828-101825.yaml b/.changes/unreleased/Features-20230828-101825.yaml
@@ -0,0 +1,6 @@
+kind: Features
+body: Unit test manifest artifacts and selection
+time: 2023-08-28T10:18:25.958929-04:00
+custom:
+  Author: gshank
+  Issue: "8295"
diff --git a/.changes/unreleased/Features-20230906-234741.yaml b/.changes/unreleased/Features-20230906-234741.yaml
@@ -0,0 +1,6 @@
+kind: Features
+body: Support config with tags & meta for unit tests
+time: 2023-09-06T23:47:41.059915-04:00
+custom:
+  Author: michelleark
+  Issue: "8294"
diff --git a/.changes/unreleased/Features-20230928-163205.yaml b/.changes/unreleased/Features-20230928-163205.yaml
@@ -0,0 +1,6 @@
+kind: Features
+body: Enable inline csv fixtures in unit tests
+time: 2023-09-28T16:32:05.573776-04:00
+custom:
+  Author: gshank
+  Issue: "8626"
diff --git a/.changes/unreleased/Features-20231101-101845.yaml b/.changes/unreleased/Features-20231101-101845.yaml
@@ -0,0 +1,6 @@
+kind: Features
+body: Support unit testing incremental models
+time: 2023-11-01T10:18:45.341781-04:00
+custom:
+  Author: michelleark
+  Issue: "8422"
diff --git a/.changes/unreleased/Features-20231106-194752.yaml b/.changes/unreleased/Features-20231106-194752.yaml
@@ -0,0 +1,6 @@
+kind: Features
+body: Add support of csv file fixtures to unit testing
+time: 2023-11-06T19:47:52.501495-06:00
+custom:
+  Author: emmyoop
+  Issue: "8290"
diff --git a/.changes/unreleased/Features-20231107-231006.yaml b/.changes/unreleased/Features-20231107-231006.yaml
@@ -0,0 +1,6 @@
+kind: Features
+body: Unit tests support --defer and state:modified
+time: 2023-11-07T23:10:06.376588-05:00
+custom:
+  Author: jtcohen6
+  Issue: "8517"
diff --git a/.changes/unreleased/Features-20231111-191150.yaml b/.changes/unreleased/Features-20231111-191150.yaml
@@ -0,0 +1,6 @@
+kind: Features
+body: Support source inputs in unit tests
+time: 2023-11-11T19:11:50.870494-05:00
+custom:
+  Author: gshank
+  Issue: "8507"
diff --git a/.changes/unreleased/Features-20231114-101555.yaml b/.changes/unreleased/Features-20231114-101555.yaml
@@ -0,0 +1,6 @@
+kind: Features
+body: Use daff to render diff displayed in stdout when unit test fails
+time: 2023-11-14T10:15:55.689307-05:00
+custom:
+  Author: michelleark
+  Issue: "8558"
diff --git a/.changes/unreleased/Features-20231116-144006.yaml b/.changes/unreleased/Features-20231116-144006.yaml
@@ -0,0 +1,6 @@
+kind: Features
+body: Move unit testing to test command
+time: 2023-11-16T14:40:06.121336-05:00
+custom:
+  Author: gshank
+  Issue: "8979"
diff --git a/.changes/unreleased/Features-20231130-130948.yaml b/.changes/unreleased/Features-20231130-130948.yaml
@@ -0,0 +1,6 @@
+kind: Features
+body: Support unit tests in non-root packages
+time: 2023-11-30T13:09:48.206007-05:00
+custom:
+  Author: gshank
+  Issue: "8285"
diff --git a/.changes/unreleased/Features-20231205-131717.yaml b/.changes/unreleased/Features-20231205-131717.yaml
@@ -0,0 +1,6 @@
+kind: Features
+body: Convert the `tests` config to `data_tests` in both dbt_project.yml and schema files.
+time: 2023-12-05T13:17:17.647765-06:00
+custom:
+  Author: emmyoop
+  Issue: "8699"
diff --git a/.changes/unreleased/Features-20231205-200447.yaml b/.changes/unreleased/Features-20231205-200447.yaml
@@ -0,0 +1,6 @@
+kind: Features
+body: Make fixture files full-fledged parts of the manifest and enable partial parsing
+time: 2023-12-05T20:04:47.117029-05:00
+custom:
+  Author: gshank
+  Issue: "9067"
diff --git a/.changes/unreleased/Features-20231212-150556.yaml b/.changes/unreleased/Features-20231212-150556.yaml
@@ -0,0 +1,6 @@
+kind: Features
+body: In build command run unit tests before models
+time: 2023-12-12T15:05:56.778829-05:00
+custom:
+  Author: gshank
+  Issue: "9128"
diff --git a/.changes/unreleased/Fixes-20231113-154535.yaml b/.changes/unreleased/Fixes-20231113-154535.yaml
@@ -0,0 +1,6 @@
+kind: Fixes
+body: Use seed file from disk for unit testing if rows not specified in YAML config
+time: 2023-11-13T15:45:35.008565Z
+custom:
+  Author: aranke
+  Issue: "8652"
diff --git a/.changes/unreleased/Under the Hood-20230912-190506.yaml b/.changes/unreleased/Under the Hood-20230912-190506.yaml
@@ -0,0 +1,6 @@
+kind: Under the Hood
+body: Add unit testing functional tests
+time: 2023-09-12T19:05:06.023126-04:00
+custom:
+  Author: gshank
+  Issue: "8512"
diff --git a/core/dbt/adapters/base/relation.py b/core/dbt/adapters/base/relation.py
@@ -214,7 +214,7 @@ def add_ephemeral_prefix(name: str):
     def create_ephemeral_from(
         cls: Type[Self],
         relation_config: RelationConfig,
-        limit: Optional[int],
+        limit: Optional[int] = None,
     ) -> Self:
         # Note that ephemeral models are based on the name.
         identifier = cls.add_ephemeral_prefix(relation_config.name)

diff --git a/core/dbt/adapters/events/adapter_types_pb2.py b/core/dbt/adapters/events/adapter_types_pb2.py
diff --git a/core/dbt/adapters/include/global_project/macros/materializations/tests/helpers.sql b/core/dbt/adapters/include/global_project/macros/materializations/tests/helpers.sql
@@ -12,3 +12,31 @@
       {{ "limit " ~ limit if limit != none }}
     ) dbt_internal_test
 {%- endmacro %}
+
+
+{% macro get_unit_test_sql(main_sql, expected_fixture_sql, expected_column_names) -%}
+  {{ adapter.dispatch('get_unit_test_sql', 'dbt')(main_sql, expected_fixture_sql, expected_column_names) }}
+{%- endmacro %}
+
+{% macro default__get_unit_test_sql(main_sql, expected_fixture_sql, expected_column_names) -%}
+-- Build actual result given inputs
+with dbt_internal_unit_test_actual AS (
+  select
+    {% for expected_column_name in expected_column_names %}{{expected_column_name}}{% if not loop.last -%},{% endif %}{%- endfor -%}, {{ dbt.string_literal("actual") }} as actual_or_expected
+  from (
+    {{ main_sql }}
+  ) _dbt_internal_unit_test_actual
+),
+-- Build expected result
+dbt_internal_unit_test_expected AS (
+  select
+    {% for expected_column_name in expected_column_names %}{{expected_column_name}}{% if not loop.last -%}, {% endif %}{%- endfor -%}, {{ dbt.string_literal("expected") }} as actual_or_expected
+  from (
+    {{ expected_fixture_sql }}
+  ) _dbt_internal_unit_test_expected
+)
+-- Union actual and expected results
+select * from dbt_internal_unit_test_actual
+union all
+select * from dbt_internal_unit_test_expected
+{%- endmacro %}
diff --git a/core/dbt/adapters/include/global_project/macros/materializations/tests/unit.sql b/core/dbt/adapters/include/global_project/macros/materializations/tests/unit.sql
@@ -0,0 +1,29 @@
+{%- materialization unit, default -%}
+
+  {% set relations = [] %}
+
+  {% set expected_rows = config.get('expected_rows') %}
+  {% set tested_expected_column_names = expected_rows[0].keys() if (expected_rows | length ) > 0 else get_columns_in_query(sql) %}
+
+  {%- set target_relation = this.incorporate(type='table') -%}
+  {%- set temp_relation = make_temp_relation(target_relation)-%}
+  {% do run_query(get_create_table_as_sql(True, temp_relation, get_empty_subquery_sql(sql))) %}
+  {%- set columns_in_relation = adapter.get_columns_in_relation(temp_relation) -%}
+  {%- set column_name_to_data_types = {} -%}
+  {%- for column in columns_in_relation -%}
+  {%- do column_name_to_data_types.update({column.name: column.dtype}) -%}
+  {%- endfor -%}
+
+  {% set unit_test_sql = get_unit_test_sql(sql, get_expected_sql(expected_rows, column_name_to_data_types), tested_expected_column_names) %}
+
+  {% call statement('main', fetch_result=True) -%}
+
+    {{ unit_test_sql }}
+
+  {%- endcall %}
+
+  {% do adapter.drop_relation(temp_relation) %}
+
+  {{ return({'relations': relations}) }}
+
+{%- endmaterialization -%}
diff --git a/core/dbt/adapters/include/global_project/macros/unit_test_sql/get_fixture_sql.sql b/core/dbt/adapters/include/global_project/macros/unit_test_sql/get_fixture_sql.sql
@@ -0,0 +1,76 @@
+{% macro get_fixture_sql(rows, column_name_to_data_types) %}
+-- Fixture for {{ model.name }}
+{% set default_row = {} %}
+
+{%- if not column_name_to_data_types -%}
+{%-   set columns_in_relation = adapter.get_columns_in_relation(this) -%}
+{%-   set column_name_to_data_types = {} -%}
+{%-   for column in columns_in_relation -%}
+{%-     do column_name_to_data_types.update({column.name: column.dtype}) -%}
+{%-   endfor -%}
+{%- endif -%}
+
+{%- if not column_name_to_data_types -%}
+    {{ exceptions.raise_compiler_error("Not able to get columns for unit test '" ~ model.name ~ "' from relation " ~ this) }}
+{%- endif -%}
+
+{%- for column_name, column_type in column_name_to_data_types.items() -%}
+    {%- do default_row.update({column_name: (safe_cast("null", column_type) | trim )}) -%}
+{%- endfor -%}
+
+{%- for row in rows -%}
+{%-   do format_row(row, column_name_to_data_types) -%}
+{%-   set default_row_copy = default_row.copy() -%}
+{%-   do default_row_copy.update(row) -%}
+select
+{%-   for column_name, column_value in default_row_copy.items() %} {{ column_value }} AS {{ column_name }}{% if not loop.last -%}, {%- endif %}
+{%-   endfor %}
+{%-   if not loop.last %}
+union all
+{%    endif %}
+{%- endfor -%}
+
+{%- if (rows | length) == 0 -%}
+    select
+    {%- for column_name, column_value in default_row.items() %} {{ column_value }} AS {{ column_name }}{% if not loop.last -%},{%- endif %}
+    {%- endfor %}
+    limit 0
+{%- endif -%}
+{% endmacro %}
+
+
+{% macro get_expected_sql(rows, column_name_to_data_types) %}
+
+{%- if (rows | length) == 0 -%}
+    select * FROM dbt_internal_unit_test_actual
+    limit 0
+{%- else -%}
+{%- for row in rows -%}
+{%- do format_row(row, column_name_to_data_types) -%}
+select
+{%- for column_name, column_value in row.items() %} {{ column_value }} AS {{ column_name }}{% if not loop.last -%}, {%- endif %}
+{%- endfor %}
+{%- if not loop.last %}
+union all
+{% endif %}
+{%- endfor -%}
+{%- endif -%}
+
+{% endmacro %}
+
+{%- macro format_row(row, column_name_to_data_types) -%}
+
+{#-- wrap yaml strings in quotes, apply cast --#}
+{%- for column_name, column_value in row.items() -%}
+{% set row_update = {column_name: column_value} %}
+{%- if column_value is string -%}
+{%- set row_update = {column_name: safe_cast(dbt.string_literal(column_value), column_name_to_data_types[column_name]) } -%}
+{%- elif column_value is none -%}
+{%- set row_update = {column_name: safe_cast('null', column_name_to_data_types[column_name]) } -%}
+{%- else -%}
+{%- set row_update = {column_name: safe_cast(column_value, column_name_to_data_types[column_name]) } -%}
+{%- endif -%}
+{%- do row.update(row_update) -%}
+{%- endfor -%}
+
+{%- endmacro -%}
diff --git a/core/dbt/adapters/relation_configs/config_change.py b/core/dbt/adapters/relation_configs/config_change.py
@@ -12,7 +12,7 @@ class RelationConfigChangeAction(StrEnum):
     drop = "drop"
 
 
-@dataclass(frozen=True, eq=True, unsafe_hash=True)
+@dataclass(frozen=True, eq=True, unsafe_hash=True)  # type: ignore
 class RelationConfigChange(RelationConfigBase, ABC):
     action: RelationConfigChangeAction
     context: Hashable  # this is usually a RelationConfig, e.g. IndexConfig, but shouldn't be limited

diff --git a/core/dbt/clients/jinja.py b/core/dbt/clients/jinja.py
@@ -84,6 +84,26 @@ def __call__(self, *args, **kwargs):
             return self.call_macro(*args, **kwargs)
 
 
+class UnitTestMacroGenerator(MacroGenerator):
+    # overrides __call__ to return a preset value instead of rendering the wrapped macro
+    def __init__(
+        self,
+        macro_generator: MacroGenerator,
+        call_return_value: Any,
+    ) -> None:
+        super().__init__(
+            macro_generator.macro,
+            macro_generator.context,
+            macro_generator.node,
+            macro_generator.stack,
+        )
+        self.call_return_value = call_return_value
+
+    def __call__(self, *args, **kwargs):
+        with self.track_call():
+            return self.call_return_value
+
+
 # performance note: Local benchmarking (so take it with a big grain of salt!)
 # on this indicates that it is on average slightly slower than
 # checking two separate patterns, but the standard deviation is smaller with

diff --git a/core/dbt/compilation.py b/core/dbt/compilation.py
@@ -11,8 +11,11 @@
 from dbt.flags import get_flags
 from dbt.adapters.factory import get_adapter
 from dbt.clients import jinja
+from dbt.context.providers import (
+    generate_runtime_model_context,
+    generate_runtime_unit_test_context,
+)
 from dbt_common.clients.system import make_directory
-from dbt.context.providers import generate_runtime_model_context
 from dbt.contracts.graph.manifest import Manifest, UniqueID
 from dbt.contracts.graph.nodes import (
     ManifestNode,
@@ -21,6 +24,8 @@
     GraphMemberNode,
     InjectedCTE,
     SeedNode,
+    UnitTestNode,
+    UnitTestDefinition,
 )
 from dbt.exceptions import (
     GraphDependencyNotFoundError,
@@ -43,7 +48,8 @@
 def print_compile_stats(stats):
     names = {
         NodeType.Model: "model",
-        NodeType.Test: "test",
+        NodeType.Test: "data test",
+        NodeType.Unit: "unit test",
         NodeType.Snapshot: "snapshot",
         NodeType.Analysis: "analysis",
         NodeType.Macro: "macro",
@@ -91,6 +97,7 @@ def _generate_stats(manifest: Manifest):
     stats[NodeType.Macro] += len(manifest.macros)
     stats[NodeType.Group] += len(manifest.groups)
     stats[NodeType.SemanticModel] += len(manifest.semantic_models)
+    stats[NodeType.Unit] += len(manifest.unit_tests)
 
     # TODO: should we be counting dimensions + entities?
 
@@ -128,7 +135,7 @@ class Linker:
     def __init__(self, data=None) -> None:
         if data is None:
             data = {}
-        self.graph = nx.DiGraph(**data)
+        self.graph: nx.DiGraph = nx.DiGraph(**data)
 
     def edges(self):
         return self.graph.edges()
@@ -191,6 +198,8 @@ def link_graph(self, manifest: Manifest):
             self.link_node(exposure, manifest)
         for metric in manifest.metrics.values():
             self.link_node(metric, manifest)
+        for unit_test in manifest.unit_tests.values():
+            self.link_node(unit_test, manifest)
         for saved_query in manifest.saved_queries.values():
             self.link_node(saved_query, manifest)
 
@@ -234,6 +243,7 @@ def add_test_edges(self, manifest: Manifest) -> None:
                 # Get all tests that depend on any upstream nodes.
                 upstream_tests = []
                 for upstream_node in upstream_nodes:
+                    # This gets tests with unique_ids starting with "test."
                     upstream_tests += _get_tests_for_node(manifest, upstream_node)
 
                 for upstream_test in upstream_tests:
@@ -291,8 +301,10 @@ def _create_node_context(
         manifest: Manifest,
         extra_context: Dict[str, Any],
     ) -> Dict[str, Any]:
-
-        context = generate_runtime_model_context(node, self.config, manifest)
+        if isinstance(node, UnitTestNode):
+            context = generate_runtime_unit_test_context(node, self.config, manifest)
+        else:
+            context = generate_runtime_model_context(node, self.config, manifest)
         context.update(extra_context)
 
         if isinstance(node, GenericTestNode):
@@ -460,6 +472,7 @@ def compile(self, manifest: Manifest, write=True, add_test_edges=False) -> Graph
         summaries["_invocation_id"] = get_invocation_id()
         summaries["linked"] = linker.get_graph_summary(manifest)
 
+        # This is only called for the "build" command
         if add_test_edges:
             manifest.build_parent_and_child_maps()
             linker.add_test_edges(manifest)
@@ -526,6 +539,9 @@ def compile_node(
         the node's raw_code into compiled_code, and then calls the
         recursive method to "prepend" the ctes.
         """
+        if isinstance(node, UnitTestDefinition):
+            return node
+
         # Make sure Lexer for sqlparse 0.4.4 is initialized
         from sqlparse.lexer import Lexer  # type: ignore