Full table replication failing due to query timeout #234

XiaozhouWang85 · 2024-04-20T15:31:37Z

Describe the bug
Full table replication fails on larger tables. Process fails with the following error message:
level=CRITICAL message=canceling statement due to statement timeout cmd_type=elb consumer=False name=tap-postgres producer=True stdio=stderr string_id=tap-postgres

After investigating, I traced the issue to the SQL statement that the tap submits, which looks something like:
SELECT ..... FROM ..... ORDER BY xmin::text::bigint ASC

The ORDER BY pushes the query run time past 30 seconds, at which point the statement is cancelled by the timeout and the process fails. Removing the ORDER BY results in a near-instantaneous return of data.

I get that ordering by xmin allows for restarts, but that only seems relevant for larger tables (small tables download in seconds for me), and larger tables are exactly the ones where the ORDER BY causes the entire process to fail.
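
For context, "canceling statement due to statement timeout" is the message Postgres emits when a query exceeds the session's `statement_timeout`. One possible workaround, sketched below, is to raise or disable that timeout for the extraction session through the libpq `options` connection parameter. The helper name `timeout_options` is hypothetical, and whether tap-postgres exposes a way to pass this option through is an assumption not confirmed here.

```python
def timeout_options(timeout_ms: int) -> str:
    """Build a libpq `options` string that sets statement_timeout for a session.

    Hypothetical helper for illustration. A value of 0 disables the timeout;
    any positive value is interpreted by Postgres as milliseconds.
    """
    if timeout_ms < 0:
        raise ValueError("timeout_ms must be >= 0")
    return f"-c statement_timeout={timeout_ms}"
```

With psycopg2 this string could be passed as `psycopg2.connect(dsn, options=timeout_options(0))`. This only avoids the cancellation; the sort itself would still run for as long as it needs to.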

To Reproduce
Steps to reproduce the behavior:

1. Run full table replication against a large table
2. meltano run tap-postgres target-s3-jsonl
3. See error

Expected behavior
The extractor should be able to handle larger tables without erroring out. Even if the default behaviour causes errors to be thrown, it should be possible to bypass this issue through configuration. I needed to fork the extractor and remove the ORDER BY in order to fix the issue.
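
To make the request concrete, here is a minimal sketch of what a configurable bypass could look like. It is not the tap's actual code: the function `build_full_table_query` and the flag `order_by_xmin` are hypothetical names, and identifier quoting is left out for brevity.

```python
from typing import Sequence


def build_full_table_query(
    table: str,
    columns: Sequence[str],
    order_by_xmin: bool = True,
) -> str:
    """Build the full-table SELECT, optionally without the xmin ordering.

    Hypothetical sketch: `order_by_xmin` stands in for a config setting.
    Table and column names are assumed to be trusted and already quoted.
    """
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if order_by_xmin:
        # Ordering by xmin is what allows an interrupted run to be resumed.
        sql += " ORDER BY xmin::text::bigint ASC"
    return sql
```

Turning the flag off would give up resumability in exchange for a query that does not have to sort the whole table, which is the trade-off I ended up making in my fork.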

Your environment

Version of tap: 2.1.0
meltano:v3.3.0

XiaozhouWang85 added the bug Something isn't working label Apr 20, 2024
