-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
new record in snapshot to track deleted state (is_deleted column, zombie records) #10235
Comments
@graciegoheen @gshank this summary of #8207 seems like its discussing a deleted flag for a row, whereas 8207 was the treatment of a column in a row when the column was removed in source. DBT handles a new column from source gracefully and records the change but the reverse was not catered for. That was the premise of 8207. from reading this issue I do not get the same sense. |
Thanks for clarifying @owen-mc-git - I'll re-open #8207! |
Migrations for
|
can this above config be used and is available for general usage? |
@aditya96166 Not yet, but we are planning this for dbt-core 1.9, for which a beta will be available soon. |
Description
Imagine
order_id=1
is deleted from the table you're snapshotting.If
invalidate_hard_deletes=true
, deleted records are invalidated by settingdbt_valid_to
to the current timestamp:If
invalidate_harde_deletes=false
, deleted records are not invalidated,dbt_valid_to
remains asNULL
(you can’t really tell if they’ve been deleted):What folks WANT, is tracking the “deleted” state as a state the record is in:
and maybe later it comes back as a zombie record “it comes back to life”:
Here’s the same exact same data as it exists today if you were using the
invalidate_hard_deletes
config. Note how there are no rows fororder_id=1
between the time period of2024-05-20
to2024-06-03
— it has a “gap” during that time period. Contrast this with the proposedis_deleted
column above that has “no gaps”.Acceptance Criteria
Provide users a “new way to track deleted records”
hard_deletes
(name tbd)hard_deletes = 'ignore'
, current behavior wheninvalidate_hard_deletes
is not set todayinvalidate_hard_deletes
config, by settinghard_deletes = 'invalidate'
(name tbd; other ideas'invalidate_current_record'
)hard_deletes = 'new_record'
(name tbd; other ideas'track'
,'insert_new_record'
,'is_deleted_flag'
, 'is_deleted_indicator'
,'new_row'
)updated_at
field has not changeddbt_is_deleted
meta-field whenhard_deletes = 'invalidate'
?Notes
dbt_is_deleted
ordbt_deleted_at
? i thinkdbt_is_deleted
makes more sense because the deleted at timestamp can be derived from whateverdbt_valid_from
is for the deleted rowThe text was updated successfully, but these errors were encountered: