Have Option to Limit Number of Records per Sync #29324
-
@AlxGlx one problem with this is that you end up with only a fraction of your total data. Maybe that is something you're planning for, but overall it looks like a "dangerous" feature. Is your use case for APIs or for databases? If the latter, you can always create a view to manage the number of records on your side.
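For the database case, the view approach can be set up outside Airbyte before the sync runs. A minimal sketch using the google-cloud-bigquery client follows; the project, dataset, and table names and the 100,000-row cap are placeholders, not values from this thread.

```python
# Sketch: create a BigQuery view that caps the rows Airbyte will see.
# Project/dataset/table names and the LIMIT value are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

view = bigquery.Table("my-project.my_dataset.orders_limited")
view.view_query = """
    SELECT *
    FROM `my-project.my_dataset.orders`
    LIMIT 100000  -- cap the records exposed to the sync
"""
client.create_table(view)  # point the Airbyte source at this view instead of the base table
```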
-
Hi @marcosmarxm, this is currently something I would like to do with Airbyte.

Situation: We have users with a database, e.g. BigQuery, and we want them to connect their data source and sync some data to our destination.

Problem: We can't predict or control how much data will be pulled from the source and sent to our destination. We want to predict costs, and our infrastructure can only handle a certain amount of data or scale up so far. For example, if our users have a quota and can only sync X records or Y amount of data, and the table a user asks to sync has 1,000,000,000,000 rows and terabytes of data, we can't handle that. We could write our own code to check the row count, but doing that for each and every Airbyte source is impractical and defeats the reason we use Airbyte for ELT in the first place.

One thing I've thought of trying is to listen to the "Successful syncs" webhook, extract the job ID from it, and attempt to cancel the job if the number of rows is too high. But that happens after the queries have already been made and data is already moving to the destination, so it's an unreliable hack. Thoughts?
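For reference, a rough sketch of that webhook workaround is below. The payload field names (`jobId`, `recordsEmitted`), the Config API `/api/v1/jobs/cancel` endpoint, the server URL, and the quota are assumptions rather than confirmed parts of Airbyte's contract, and, as noted above, it only reacts after data has already started moving.

```python
# Rough sketch of the webhook workaround described above.
# ASSUMPTIONS: the webhook payload exposes a job id and record count under the
# field names used below, and the instance exposes the Config API's
# /api/v1/jobs/cancel endpoint. Treat both as unverified; this also only
# fires after the sync has already moved data.
import requests
from flask import Flask, request

AIRBYTE_API = "http://localhost:8000/api/v1"  # placeholder Airbyte server URL
MAX_RECORDS = 1_000_000                       # example quota

app = Flask(__name__)

@app.post("/airbyte-webhook")
def handle_sync_notification():
    payload = request.get_json(force=True)
    job_id = payload.get("jobId")               # assumed field name
    records = payload.get("recordsEmitted", 0)  # assumed field name

    if job_id is not None and records > MAX_RECORDS:
        # Best-effort cancel; by this point the sync may already have finished.
        requests.post(f"{AIRBYTE_API}/jobs/cancel", json={"id": job_id}, timeout=30)
    return {"ok": True}
```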
-
What
Give an option to limit a sync to X number of records.
Why
Useful when you need to test the end-to-end flow or scheduling of an automated data feed, or need to limit data ingress per run.
How (potentially)
Add a "Sync Limit" setting to limit by number of records (or size in MB?) on the Connections -> [ConnectionName] -> Replication screen.
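To make the proposal concrete, the sketch below shows one way a per-sync record cap could behave while records stream from source to destination. Nothing here is an existing Airbyte setting; `read_records` stands in for whatever a source connector yields.

```python
# Illustrative sketch of the requested "Sync Limit" behaviour: stop emitting
# records once a per-sync cap is hit. Purely hypothetical, not an Airbyte API.
from itertools import islice
from typing import Any, Dict, Iterable, Iterator

def limited_sync(read_records: Iterable[Dict[str, Any]], max_records: int) -> Iterator[Dict[str, Any]]:
    """Yield at most `max_records` records from the source stream."""
    yield from islice(read_records, max_records)

# Example: cap a (fake) source at 1,000 records per run.
fake_source = ({"id": i} for i in range(10_000))
records = list(limited_sync(fake_source, max_records=1_000))
assert len(records) == 1_000
```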