We would like to incrementally update individual rows based on a LastModified timestamp at row level.
Are you a dlt user?
Yes, I run dlt in production.
Use case
We use the filesystem module to incrementally load data into a database, determined by each file's modification_date.
Now, we want to add another condition to filter out outdated data.
Our data is stored in jsonl files, each containing 1-n individual records. Each record has a unique primary key and an individual LastModified timestamp. We would like to update each row in the database only if a new record has a more recent LastModified timestamp for the same primary key.
We tried implementing this via the dlt.sources.incremental functionality, but as far as we understand, this tracks a LastModified value for the entire table but not for each record as we would need it.
As we receive data in batches and cannot control the order of updates to individual rows, this is not sufficient.
Proposed solution
No response
Related issues
No response
Hey @trin94, if you just merge all incoming data on the primary key, would this not work? Or do batches sometimes arrive that contain rows with last_modified timestamps older than the ones in the db?
Hi @sh-rp, unfortunately, yes, that is possible. In our use case (@trin94's and mine), we sometimes receive a JSONL file whose modification_date is newer than that of all previous files. Some records in the file may have a LastModified timestamp that is newer than what is currently in the database for their primary key (PK), and these need to be loaded. However, other records in the same file might not be the most recent version for their PK, because the latest version is already in the database (with a newer LastModified timestamp). Those records need to be ignored.
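To make the rule above concrete, here is a minimal sketch in plain Python of the row-level filter we have in mind. It is not a dlt API: the function name `select_newer_records`, the `latest_seen` mapping (PK to the newest LastModified already in the database, however that is obtained), and the field name `id` are all hypothetical, and it assumes LastModified is an ISO 8601 string in a consistent format.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, List


def select_newer_records(
    records: Iterable[Dict[str, Any]],
    latest_seen: Dict[Any, datetime],
    pk_field: str = "id",  # hypothetical primary-key field name
    ts_field: str = "LastModified",
) -> List[Dict[str, Any]]:
    """Keep only records that are newer than what is already known per PK.

    `latest_seen` maps each primary key to the newest LastModified value
    already loaded; it is updated in place as records are accepted.
    """
    winners: Dict[Any, Dict[str, Any]] = {}
    for record in records:
        pk = record[pk_field]
        ts = datetime.fromisoformat(record[ts_field])
        known = latest_seen.get(pk)
        if known is not None and ts <= known:
            # Same or older version than the one already known: ignore it.
            continue
        latest_seen[pk] = ts
        # A newer version of the same PK in this batch replaces the earlier one.
        winners[pk] = record
    return list(winners.values())


# Example: "a" is already in the database with a newer timestamp.
latest_seen = {"a": datetime(2024, 1, 10)}
batch = [
    {"id": "a", "LastModified": "2024-01-05T00:00:00", "value": 1},  # stale
    {"id": "b", "LastModified": "2024-01-02T00:00:00", "value": 2},  # superseded below
    {"id": "b", "LastModified": "2024-01-03T00:00:00", "value": 3},  # newest for "b"
]
to_load = select_newer_records(batch, latest_seen)
```

Here only the last record for `b` would be passed on to a merge on the primary key; the stale record for `a` is dropped even though its file is newer than all previous files.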