-
Error message
Snapshot source unique_key is not actually unique
Add a source table that we will use in our snapshot:
-- models/my_source.sql
{{ config(materialized = 'table') }}
select 1 as user_id, 'active' as user_state
union all
select 2 as user_id, 'active' as user_state
And our snapshot itself:
-- snapshots/snappy.sql
{% snapshot snappy %}
{{
config(
target_database = 'cse-sandbox-319708',
target_schema = 'snapshots',
unique_key = 'user_id',
strategy = 'check',
check_cols = ['user_state']
)
}}
select * from {{ ref('my_source') }}
{% endsnapshot %}
Run the snapshot once. Now, let's change our source to introduce a duplicate:
-- models/my_source.sql
{{ config(materialized = 'table') }}
select 1 as user_id, 'inactive' as user_state
union all
select 1 as user_id, 'unknown' as user_state
Run the snapshot again, now against the source with the duplicated user_id.
-
Might be good to have a post discussing all the ways duplicates can be introduced. The examples below show that this is almost always due to a duplicate occurring in the source table.
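Since the duplicates usually originate in the source, it can help to check the source for repeated keys before running a snapshot or an incremental model. The query below is a minimal sketch, assuming the my_source model and the user_id key used in the examples in this thread; the file name in the comment is hypothetical.

```sql
-- tests/assert_my_source_user_id_is_unique.sql  (hypothetical file name)
-- Lists every user_id that appears more than once in my_source.
-- An empty result means user_id is safe to use as a unique_key.
select
    user_id,
    count(*) as row_count
from {{ ref('my_source') }}
group by user_id
having count(*) > 1
```

Saved as a singular test, a query like this should fail whenever it returns rows, which would surface the duplicate before the merge does.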
Error message
A subsequent run of an incremental model with duplicates in the source data.
Let's assume we have an incremental model like the following:
A dbt run --full-refresh results in the following log:
Now let's change our simulated data source [1] so that duplicates are introduced:
A dbt run (incremental) results in the following log:
If we look closely at the debug logs:
There was more than one row in our source table with the same user_id (which we've set to be our unique_key), so we don't actually know which source row (2 as status or 3 as status) we should have used to update the row that currently exists in our destination table.
Note that if you were to dbt run --full-refresh the incremental model with the dupes shown above, no error would be thrown, because we would essentially be recreating the table from scratch (and not using the merge statement, which is used on subsequent runs of dbt run sans --full-refresh). But the very next dbt run will reintroduce our Duplicate row detected error.
Footnotes
[1] We're simulating all the source data here, but the duplicates could easily come from some other actual model that is being selected from in your my_incremental model.