Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Snapshot in new_record mode will add redundant entries for deleted records already flagged as deleted #654

Open
4 of 9 tasks
lpillmann opened this issue Jan 23, 2025 · 0 comments · May be fixed by #655
Open
4 of 9 tasks
Labels
feature:snapshots Issues related to the snapshot materialization pkg:dbt-bigquery Issue affects dbt-bigquery

Comments

@lpillmann
Copy link

lpillmann commented Jan 23, 2025

Is this a new bug?

  • I believe this is a new bug
  • I have searched the existing issues, and I could not find an existing issue for this bug

Which packages are affected?

  • dbt-adapters
  • dbt-tests-adapter
  • dbt-athena
  • dbt-athena-community
  • dbt-bigquery
  • dbt-redshift
  • dbt-spark

Current Behavior

Snapshots using the new_record method for hard deletes will insert redundant entries on each execution for records already flagged with is_deleted as True.

The screenshot below demonstrates the effect of running the dbt snapshot multiple times, without any changes in the source table, and the many redundant records inserted. The records have the same unique_identifier value.

Image

Expected Behavior

One entry should be inserted upon the first time the record in the source is deleted. Subsequent snapshot runs should not insert new entries for records already marked with is_deleted True.

The expected result would be only 2 entries: initially with is_deleted as False and only a single entry with is_deleted as True no matter how many times the dbt snapshot command is run on top of the same source data.

Steps To Reproduce

  1. Create a table with a single record
create table dummy_table as 
    select
        1 as id,
        'first'::text as label,
        current_timestamp as created_at,
;
  1. Insert another record
insert into dummy_table (id, label, created_at) values (2, 'second', current_timestamp);
  1. Create and configure a snapshot in new_record mode for hard deletes
{% snapshot snapshot_dummy_table %}

  {{ 
      config(
          strategy = 'timestamp'
        , unique_key = 'id'
        , updated_at = 'created_at' 
        , hard_deletes = 'new_record'
      )
  }}

  SELECT * FROM dummy_table  --adjust to the full qualified path of your table, ommited for simplicity

{% endsnapshot %}
  1. Execute the snapshot for the first time and check result (both should be 2)
select * from dummy_table;  --2 records
--execute: dbt snapshot -s snapshot_dummy_table
select * from snapshot_dummy_table;  --2 records
  1. Delete one of the records from the source table
delete from dummy_table where id = 2;
select * from dummy_table;  --1 record
  1. Execute snapshot and check (OK, expected 3 in total)
--execute: dbt snapshot -s snapshot_dummy_table
select * from snapshot_dummy_table;  --3 records, 1 more for id = 2 with is deleted true
  1. Execute snapshot again and check (not OK, expected 3, got 4 records)
--execute: dbt snapshot -s snapshot_dummy_table
select * from snapshot_dummy_table;  --4 records, 1 more for id = 2 with is deleted true (repeated)
  1. Execute snapshot N times and verify that the snapshot output will have N new rows

Relevant log output

Environment

- OS: Ubuntu (WSL)
- Python: 3.9.16
- dbt-adapters: 1.13.2
- dbt-snowflake: 1.9.0

Additional Context

I identified the root cause of the problem and submitted a PR #655 with a fix. I used this recently merged PR as reference #385

colin-rogers-dbt pushed a commit that referenced this issue Feb 3, 2025
@amychen1776 amychen1776 added triage:product In Product's queue pkg:dbt-bigquery Issue affects dbt-bigquery feature:snapshots Issues related to the snapshot materialization and removed triage:product In Product's queue labels Feb 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature:snapshots Issues related to the snapshot materialization pkg:dbt-bigquery Issue affects dbt-bigquery
Projects
None yet
2 participants