[ADAP-727] [Bug] Redshift adapter does not retry on connection timeouts #556
Comments
Update - I have not yet been able to repro this in GitHub's CI runners, so it's possible this error is due to some local machine configuration. Hopefully somebody better versed in the Redshift connector error state reporting can figure this out, because I'm stumped. Also worth noting, this does not happen if I use the sqlalchemy-redshift library with redshift_connector specified.
Hey @tlento, I don't know how I feel about the premise; it feels like an edge case. You are doing high-concurrency work on an adapter that's not designed for that. I'm saying that because I won't qualify this issue as a bug, but rather an enhancement. Now to unlock you:
Thanks @Fleid, I tagged it as a bug in case this was happening for all timeouts, but if it's just this scenario I agree it's not really an adapter bug that needs fixing, more of a nice-to-have enhancement around retry behavior. I didn't bother putting up a PR to add InterfaceError because I don't have any idea what that opens up to retries. We are not currently blocked - our CI test suite seems to be fine, and that's at the edge of what we might expect an end user doing metrics development to dispatch - so this is more of a developer nuisance than anything else. Hopefully the AWS folks have some insights to share!
Since the PR on the redshift_connector side is merged, shall we go ahead and close out this issue? @Fleid
Hey @jiezhen-chen, has the change been released to PyPI? If so, that's awesome! The next step is to open a PR to increment the version defined in setup.py (lines 88 to 90 at e95685f).
Is this a new bug in dbt-redshift?
Current Behavior
When running a large number of concurrent queries through the dbt-redshift adapter (e.g., by running the MetricFlow test suite using a dbt-redshift client), we observe intermittent failures with this message in the logs:
We tried changing the retry count to a larger number, and this made no apparent difference. According to the stack trace, the retry is not happening, and on further inspection we discovered that the InterfaceError is not considered a retryable exception:
https://github.com/dbt-labs/dbt-redshift/blob/main/dbt/adapters/redshift/connections.py#L335-L337
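For reference, that gate is essentially an isinstance check against a fixed set of exception classes. Here is a minimal sketch of that shape, with the InterfaceError addition commented in; the structure is assumed rather than copied from the linked lines, and the class names come from redshift_connector's standard DB-API exception surface:

```python
# Sketch only: approximates the retryable-exception gate in connections.py.
# The adapter's actual structure differs; exception classes are the standard
# DB-API ones exposed by redshift_connector.
import redshift_connector

retryable_exceptions = (
    redshift_connector.OperationalError,
    redshift_connector.DatabaseError,
    redshift_connector.DataError,
    # Adding this is what would let the timeout-wrapped errors be retried:
    # redshift_connector.InterfaceError,
)

def should_retry(exc: Exception) -> bool:
    # The connection-retry helper only re-attempts when the raised exception
    # is an instance of one of the classes above.
    return isinstance(exc, retryable_exceptions)
```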
Stack trace excerpt:
We can see at the end that the error is thrown out of the broader except block, rather than the retry handling.
Expected Behavior
I expected these errors to be retried, but I understand why that isn't happening.
I would also expect redshift_connector not to hide this inside an InterfaceError; that's just odd, but it is what it is.
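If retrying every InterfaceError is considered too broad, one option would be to inspect the error before retrying. This is a hypothetical helper, not existing adapter code, and the message check is an assumption about how the timeout is reported:

```python
# Hypothetical helper, not part of dbt-redshift: treat only timeout-flavored
# InterfaceErrors as retryable, rather than the whole class.
import redshift_connector

def is_retryable(exc: Exception) -> bool:
    if isinstance(exc, redshift_connector.InterfaceError):
        # redshift_connector reports connection-level failures, including
        # timeouts, as InterfaceError, so fall back to inspecting the message.
        return "timeout" in str(exc).lower()
    return isinstance(exc, redshift_connector.OperationalError)
```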
Steps To Reproduce
I think the only way to do this is to run lots of concurrent queries on an undersized cluster until you can trigger intermittent failures of this type. I've got a PR up that might trigger this in the MetricFlow CI as well; I'll see if it happens there. If it's just on my local machine, maybe I need to do whatever that weird Redshift override thing is.
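For anyone else trying to reproduce this, the load looks roughly like the sketch below; connection parameters are placeholders, and the worker/query counts just need to be high enough that the cluster starts timing out:

```python
# Repro sketch with placeholder credentials: open many connections and run
# queries concurrently until new connections start timing out intermittently.
from concurrent.futures import ThreadPoolExecutor

import redshift_connector

def run_one(i: int) -> None:
    conn = redshift_connector.connect(
        host="my-cluster.example.us-west-2.redshift.amazonaws.com",  # placeholder
        database="dev",
        user="awsuser",
        password="<password>",
    )
    try:
        cursor = conn.cursor()
        cursor.execute("select 1")
        cursor.fetchall()
    finally:
        conn.close()

# Crank these up until intermittent InterfaceErrors start appearing.
with ThreadPoolExecutor(max_workers=64) as pool:
    list(pool.map(run_one, range(256)))
```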
Relevant log output
No response
Environment
Additional Context
I am not using dbt proper, just the adapter. See dbt-labs/metricflow#678