sql_select input table pagination? #1003
Replies: 3 comments 6 replies
-
Hey @scroback77, the rows returned from the query are streamed so they aren't loaded into memory at the same time by Benthos, but you're still limited by the execution time of the query. The query does support dynamic arguments but pagination is awkward (you'd need to do it by timestamps so it'd be flakey) as you can't reference the context of the latest consumed row. I'm waiting for feedback on the current state of the component before expanding it but the next logical addition would be the ability to reference the last row consumed in the select statement arguments. We'd need to work out whether we'd need an entire new field to put a separate select for the very first grab, or whether it'd be enough to have a single select statement where the SQL itself would need to cater for the first query where no existing rows are loaded yet. |
Beta Was this translation helpful? Give feedback.
-
Hi, I was trying to move a huge postgres table from one db to another but although the sql_select is streaming it crashes the server. |
Beta Was this translation helpful? Give feedback.
-
Hi, I'm interested in this feature. Looking at this I got into a possible example with the If the output used in a pipeline with the example of the
I think the cache should be updated in the last step of the pipeline, just after every object has been processed. Please, let me know if that input can be used as an example for SQL pagination or you have a better alternative. At least-once is needed in my case. Note: Remember that LIMIT+OFFSET could potentially do a full scan of your table, so try to paginate in another way for large datasets, like using indexes in the WHERE. |
Beta Was this translation helpful? Give feedback.
-
Hi,
I've seen the experimental sql_select input (https://www.benthos.dev/docs/components/inputs/sql_select/) and I'm a little unclear on how it works. Does this input do a massive select on a table and read it into memory before sending rows (or batches if specified) to the processors?
Some tables that we read from are massive and have significant performance issues with large selects. We use cockroachdb (which should be supported with the postgres driver), and we are trying to add indexes to our tables so that pagination will be much more efficient. Does the sql_select input paginate on the backend, and if so does it paginate via OFFSET/LIMIT, or with a WHERE on an index? This has a significant impact on whether we can integrate Benthos with our database and processing.
Thanks for the info!
Beta Was this translation helpful? Give feedback.
All reactions