Feature description
Currently you need to specify a primary key to load data incrementally with the merge write disposition.
It's not uncommon, especially with reporting APIs, that you don't have any specific unique key and also don't need one.
Currently you would need to build a surrogate/primary key to do incremental loads.
How about replacing the data not via the primary_key but via a time period?
Example:
1. You pull data since 2024-03-01.
2. You delete data in the destination since 2024-03-01.
3. You upload the data from step 1 into the destination.
Adding an end_date would also make sense, but I wanted to keep the example simple.
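The three steps above can be sketched as a delete-then-insert over a time window. This is a minimal illustration using sqlite3; the table name `report` and its columns are hypothetical, and in practice dlt would run the equivalent statements against the destination:

```python
import sqlite3

def replace_since(conn, table, date_column, start_date, rows):
    """Delete destination rows on/after start_date, then insert the new increment.
    Table/column names are illustrative, not a real dlt API."""
    cur = conn.cursor()
    cur.execute(f"DELETE FROM {table} WHERE {date_column} >= ?", (start_date,))
    cur.executemany(f"INSERT INTO {table} (day, amount) VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (day TEXT, amount INTEGER)")
conn.executemany("INSERT INTO report VALUES (?, ?)",
                 [("2024-02-28", 1), ("2024-03-01", 2), ("2024-03-02", 3)])

# Re-pull everything since 2024-03-01 and replace that window
replace_since(conn, "report", "day", "2024-03-01",
              [("2024-03-01", 20), ("2024-03-02", 30), ("2024-03-03", 40)])
print(conn.execute("SELECT COUNT(*) FROM report").fetchone()[0])  # 4
```

Note that no primary key is involved anywhere: the date column alone defines which rows get replaced.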
Are you a dlt user?
Yes, I run dlt in production.
Use case
Sometimes the primary key needed for incremental loads is cumbersome to build (e.g. because the keys sit in nested dictionaries), so you need to do more transformations before loading than necessary.
Proposed solution
Specify:
- a partition_type (e.g. "key" or "time"), and
- a partition_column (the name of the key or time column).

"key" would work as it does currently. "time" would:
1. Identify the min() and max() values of the new increment to be uploaded.
2. Delete everything between min and max in the destination.
3. Upload the new increment to the destination.
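The proposed "time" behavior can be sketched in a few lines. This is a hypothetical illustration of the semantics only (`merge_by_time_window` is not a dlt function): the replacement window is derived from the increment itself, so no primary key is needed:

```python
def merge_by_time_window(existing, increment, partition_column):
    """Replace every existing row whose partition_column falls inside the
    [min, max] range of the new increment, then append the increment."""
    lo = min(r[partition_column] for r in increment)
    hi = max(r[partition_column] for r in increment)
    # Keep only rows outside the increment's time window, then append the new rows
    kept = [r for r in existing if not (lo <= r[partition_column] <= hi)]
    return kept + increment

existing = [{"ts": "2024-02-28", "v": 1}, {"ts": "2024-03-01", "v": 2}]
increment = [{"ts": "2024-03-01", "v": 20}, {"ts": "2024-03-02", "v": 30}]
result = merge_by_time_window(existing, increment, "ts")
print(len(result))  # 3: the 2024-02-28 row plus the two new rows
```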
Related issues
No response
You can use it instead of, or together with, a primary key to replace partitions of data (i.e. days).
For completely custom partitions you can generate a merge column by adding add_map on the resource. You can also approximate more granular time ranges: i.e. truncating updated_at to hourly resolution lets you replace data with hourly granularity. https://dlthub.com/docs/general-usage/resource#filter-transform-and-pivot-data
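The transform suggested above (truncating updated_at to the hour to build a coarser merge column) could look something like this. The function is shown standalone so it runs by itself; in dlt you would attach it to a resource via `resource.add_map(add_hour_partition)`. The field names `updated_at` and `updated_hour` are assumptions for illustration:

```python
from datetime import datetime

def add_hour_partition(item):
    """Derive an hourly-resolution merge column from updated_at.
    Intended to be passed to resource.add_map() in dlt."""
    ts = datetime.fromisoformat(item["updated_at"])
    item["updated_hour"] = ts.strftime("%Y-%m-%dT%H:00:00")
    return item

row = add_hour_partition({"updated_at": "2024-03-01T14:37:12", "value": 1})
print(row["updated_hour"])  # 2024-03-01T14:00:00
```

All rows sharing the same `updated_hour` then form one replaceable partition when that column is used as the merge key.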
I think our merge_key documentation is lacking; we'll try to improve it.
@karakanb my learning from the linked issue is that we should disable deduplication if a merge key is present and a primary key is set... which is IMO expected behavior, as the "deduplication" should now happen via the merge key upstream. That should fix the issue you describe.