-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: retry 404 errors in Client.query(...)
#2135
base: main
Are you sure you want to change the base?
Conversation
if job_retry is None: | ||
future = do_query() | ||
else: | ||
future = job_retry(do_query)() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: even though I add the retry here, we'll still need the one in QueryJob
, because BigQuery queries can be started successfully but still fail for a retriablle reason. By adding the retry here, we can retry the queries that fail for a retriable reason.
google/cloud/bigquery/retry.py
Outdated
# Per https://github.com/googleapis/python-bigquery/issues/2134, sometimes | ||
# we get a 404 error. In this case, if we get this far, assume that the job | ||
# doesn't actually exist and try again. | ||
or isinstance(exc, exceptions.NotFound) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought: Does this make it so that the queries retry if the table is not found? Or is this only going to happen for query job not found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😢 It does hang / retry too much. Tested with
import google.cloud.bigquery
client = google.cloud.bigquery.Client()
job = client.query("SELECT * FROM `swast-scratch.my_dataset.doesntexistnoreally`")
job.result()
Need to play a similar trick as with get_job_retry
, instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in latest commit.
I confirmed with my colleague that after at |
get_job_retry = retry.with_predicate( | ||
lambda exc: isinstance(exc, core_exceptions.NotFound) | ||
# Reference the original retry to avoid recursion. | ||
or retry._predicate(exc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@parthea Do you have any thoughts on how stable this will be? I couldn't find any public way to pull out the previous predicate from a Retry object when looking at the https://github.com/googleapis/python-api-core/blob/main/google/api_core/retry/retry_base.py code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative to consider: do something like I did with retry versus job_retry and introduce more parameters for job_insert_retry and job_get_retry or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed googleapis/python-api-core#796, in the meantime I'll pursue an alternative that doesn't require access to private attributes.
return True | ||
|
||
# Reference the original job_retry to avoid recursion. | ||
return job_retry._predicate(exc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@parthea likewise here. I ended up using a slightly different pattern here due to the double nesting of retry objects.
system-3.12 failure: Edit: Actually, it does appear to be related. It's a different kind of 404 -- trying to query a table in a different location. I'll see if I can disambiguate that from the job 404. Edit2: fixed in 8e20cb0. I use the message to disambiguate job not found from dataset/table not found based on the stack trace in #2134. System test passing locally (and much more quickly). |
Kokoro (https://btx.cloud.google.com/invocations/91aed90f-987e-4200-be14-09717b48923f/targets/cloud-devrel%2Fclient-libraries%2Fpython%2Fgoogleapis%2Fpython-bigquery%2Fpresubmit%2Fpresubmit/log) and system-3.12 and snippets tests are passing. Ready for review! |
Cover is failing: I'll see if I can update a unit test for this soon. |
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #2134 🦕