-
Notifications
You must be signed in to change notification settings - Fork 319
fix: retry 404 errors in Client.query(...)
#2135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
if job_retry is None: | ||
future = do_query() | ||
else: | ||
future = job_retry(do_query)() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: even though I add the retry here, we'll still need the one in QueryJob
, because BigQuery queries can be started successfully but still fail for a retriablle reason. By adding the retry here, we can retry the queries that fail for a retriable reason.
google/cloud/bigquery/retry.py
Outdated
# Per https://github.com/googleapis/python-bigquery/issues/2134, sometimes | ||
# we get a 404 error. In this case, if we get this far, assume that the job | ||
# doesn't actually exist and try again. | ||
or isinstance(exc, exceptions.NotFound) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought: Does this make it so that the queries retry if the table is not found? Or is this only going to happen for query job not found.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😢 It does hang / retry too much. Tested with
import google.cloud.bigquery
client = google.cloud.bigquery.Client()
job = client.query("SELECT * FROM `swast-scratch.my_dataset.doesntexistnoreally`")
job.result()
Need to play a similar trick as with get_job_retry
, instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in latest commit.
I confirmed with my colleague that after at |
get_job_retry = retry.with_predicate( | ||
lambda exc: isinstance(exc, core_exceptions.NotFound) | ||
# Reference the original retry to avoid recursion. | ||
or retry._predicate(exc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@parthea Do you have any thoughts on how stable this will be? I couldn't find any public way to pull out the previous predicate from a Retry object when looking at the https://github.com/googleapis/python-api-core/blob/main/google/api_core/retry/retry_base.py code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative to consider: do something like I did with retry versus job_retry and introduce more parameters for job_insert_retry and job_get_retry or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed googleapis/python-api-core#796, in the meantime I'll pursue an alternative that doesn't require access to private attributes.
return True | ||
|
||
# Reference the original job_retry to avoid recursion. | ||
return job_retry._predicate(exc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@parthea likewise here. I ended up using a slightly different pattern here due to the double nesting of retry objects.
system-3.12 failure: Edit: Actually, it does appear to be related. It's a different kind of 404 -- trying to query a table in a different location. I'll see if I can disambiguate that from the job 404. Edit2: fixed in 8e20cb0. I use the message to disambiguate job not found from dataset/table not found based on the stack trace in #2134. System test passing locally (and much more quickly). |
Kokoro (https://btx.cloud.google.com/invocations/91aed90f-987e-4200-be14-09717b48923f/targets/cloud-devrel%2Fclient-libraries%2Fpython%2Fgoogleapis%2Fpython-bigquery%2Fpresubmit%2Fpresubmit/log) and system-3.12 and snippets tests are passing. Ready for review! |
Cover is failing: I'll see if I can update a unit test for this soon. |
@@ -5308,6 +5313,173 @@ def test_query_job_rpc_fail_w_conflict_random_id_job_fetch_fails(self): | |||
with pytest.raises(DataLoss, match="we lost your job, sorry"): | |||
client.query("SELECT 1;", job_id=None) | |||
|
|||
def test_query_job_rpc_fail_w_conflict_random_id_job_fetch_fails_no_retries(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you help me understand what this test is for? It feels like we are just verifying that QueryJob._begin()
and client.get_job()
are called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, lgtm.
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #2134 🦕