[CT-2070] [Feature] Materialized tests #6914
Comments
Agree that it makes sense to keep tests as a separate concern, even if you can materialize them. In general, this sounds like a great add-on to the existing functionality of In |
Let's suppose we replace "
As Florian mentioned, this feature introduces a case where both Here's some pseudo code to implement the precedence outlined above, even when the values conflict with each other:
@colin-rogers-dbt raised a relevant edge case here. The specific concern was described as:
This is essentially the following question:
Read on for a proposed UX for how to turn it off at a granular level.
UX
We would need some valid value that means "don't actually create any database object -- we want something only fleeting, momentary, evanescent". The value that makes most intuitive sense to me is Assuming
Generic test configured in YAML:
# models/_models.yml
models:
  - name: my_model
    columns:
      - name: id
        tests:
          # built-in generic test
          - not_null:
              config:
                store_failures_as: ephemeral
Singular test configured in SQL:
-- tests/singular/check_duplicate_column.sql
{{ config(store_failures_as="ephemeral") }}
-- custom singular test
select 1 as id
where 1=0
Schrödinger's deprecation principle
Of course this is a specific case of a more general design principle:
Applying the principle
Suppose we don't add a value that means "don't actually create any database object". Then the only way to allow users to configure this is:
In this case, turning off storing failures would come with at least three downsides:
It would be harder to deprecate because
Examples
Generic test configured in YAML:
# models/_models.yml
models:
  - name: my_model
    columns:
      - name: id
        tests:
          # built-in generic test
          - not_null:
              config:
                store_failures: false
                store_failures_as:
Singular test configured in SQL:
-- tests/singular/check_duplicate_column.sql
{{ config(
    store_failures=false,
    store_failures_as=none
) }}
-- custom singular test
select 1 as id
where 1=0
@dbeatty10 / @Fleid I think |
This one is about making sure tests stay relevant as we expand into low latency scenarios with Materialized Views (MVs).
Prior art can be found in dbt-materialize. More details on the benefits here.
Let's say you materialized a model with a MV, this is agg_daily_active_users:
I'm defining a test to ensure data quality:
Currently, when dbt run is triggered (Wednesday the 8th, 2023 at 4pm Pacific Time), dbt will execute the underlying SQL query and return the result in the log.
Ok, I know about this one: there were some returns that were processed in a bizarre way. It should be solved in the coming day, so I'm happy to ignore it.
That's excellent when you know the data is not moving when dbt is not looking. But what happens when it is, as it does with MVs? Then I want my tests to be run continuously. What I want is to make that test queryable so I can get the current status of my MV.
I could run dbt test --select agg_daily_active_users instead. But then I'm forced to switch context between where my data is flowing (my database engine, via the materialized view) and where my test results appear. The best experience for me is to stay in the database for that.
I could run dbt test --select agg_daily_active_users --store-failures instead. But that may be slow and expensive, while making the operation a 2 step process (refresh the table, query the table).
What would be nice is to tell dbt that I want my test query materialized as a view (or a MV) instead, via a configuration:
This allows me to debug my data with the following query:
When I query it tomorrow, it will show me the following alarming situation, without having to use dbt.
Should we deprecate store-failures?
The run level flag? Certainly not. The intent is different. With --store-failures I'm deciding at run time to persist my test results, independently of my test materializations.
The test level config? I think setting materialized="table" is equivalent to store_failures = true. I don't want to start the deprecation process of store_failures = true yet though. Let's punt that for later.
In terms of precedence of configurations:
- The materialized configuration is top level: it takes precedence over store_failures and the flag (they are ignored)
- If the materialized configuration is omitted, the store_failures config will take precedence over the presence or absence of the --store-failures flag (it is ignored)
- If the materialized configuration is omitted, and the store_failures config is none or is omitted, the resource will use the value of the --store-failures flag (current behavior)
Available materializations
We should support view and table for now.
Out of scope at the moment: we will support materialized_view at some point, because warehouses may be able to surface the history of MVs (change log), which would allow me to get a list of all the transient errors that happened when I was not looking at the test MV.
Scope of the configuration
This configuration is a "tests config". It should be available at every level where a test configuration is offered and respect normal inheritance rules.
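To make the interaction between inheritance and the precedence rules listed earlier more concrete, here is a minimal Python sketch. It is not dbt's implementation: DataTestConfig, inherit and resolve are hypothetical names, "more specific level wins" is an assumed reading of "normal inheritance rules", and mapping store_failures = true to a table follows the equivalence suggested above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical names throughout: this is an illustration, not dbt's internal API.
VALID_MATERIALIZATIONS = {"view", "table"}  # the two values proposed "for now"


@dataclass(frozen=True)
class DataTestConfig:
    """Test config set at one level (project file, properties file, or test file)."""
    materialized: Optional[str] = None
    store_failures: Optional[bool] = None


def inherit(*levels: DataTestConfig) -> DataTestConfig:
    """Merge levels from least to most specific; a later non-None value wins (assumed rule)."""
    materialized = None
    store_failures = None
    for level in levels:
        if level.materialized is not None:
            materialized = level.materialized
        if level.store_failures is not None:
            store_failures = level.store_failures
    return DataTestConfig(materialized, store_failures)


def resolve(config: DataTestConfig, store_failures_flag: bool) -> Optional[str]:
    """Return "view" or "table" if a relation should be created, else None."""
    # 1. `materialized` is top level: the store_failures config and the flag are ignored.
    if config.materialized is not None:
        if config.materialized not in VALID_MATERIALIZATIONS:
            raise ValueError(f"unsupported test materialization: {config.materialized!r}")
        return config.materialized
    # 2. Otherwise the store_failures config wins over the --store-failures flag.
    if config.store_failures is not None:
        return "table" if config.store_failures else None
    # 3. Otherwise fall back to the --store-failures flag (current behavior).
    return "table" if store_failures_flag else None


# A project-wide store_failures default is overridden by a test-level materialization.
project_level = DataTestConfig(store_failures=True)
test_level = DataTestConfig(materialized="view")
print(resolve(inherit(project_level, test_level), store_failures_flag=False))
```

Merging the levels first and applying the precedence rules once keeps the two concerns separate, so adding materialized_view later would only mean extending the set of valid values.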
Describe alternatives you've considered
Just use a model instead of a test. Because that's basically what we're discussing here.
But in my opinion tests are tests not because their output is a collection of rows, a csv, a table or a view. They are tests because of their specific intent, their life cycle, and the fact they don't belong on the DAG.
I still want to be able to dbt test, even when all my tests are materialized as views.