Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: retry 404 errors in Client.query(...) #2135

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open

Conversation

tswast
Copy link
Contributor

@tswast tswast commented Feb 18, 2025

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #2134 🦕

@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: bigquery Issues related to the googleapis/python-bigquery API. labels Feb 18, 2025
@tswast tswast marked this pull request as ready for review February 18, 2025 23:11
@tswast tswast requested review from a team as code owners February 18, 2025 23:11
@tswast tswast requested a review from Neenu1995 February 18, 2025 23:11
if job_retry is None:
future = do_query()
else:
future = job_retry(do_query)()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: even though I add the retry here, we'll still need the one in QueryJob, because BigQuery queries can be started successfully but still fail for a retriablle reason. By adding the retry here, we can retry the queries that fail for a retriable reason.

# Per https://github.com/googleapis/python-bigquery/issues/2134, sometimes
# we get a 404 error. In this case, if we get this far, assume that the job
# doesn't actually exist and try again.
or isinstance(exc, exceptions.NotFound)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought: Does this make it so that the queries retry if the table is not found? Or is this only going to happen for query job not found.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😢 It does hang / retry too much. Tested with

import google.cloud.bigquery

client = google.cloud.bigquery.Client()
job = client.query("SELECT * FROM `swast-scratch.my_dataset.doesntexistnoreally`")
job.result()

Need to play a similar trick as with get_job_retry, instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in latest commit.

@tswast
Copy link
Contributor Author

tswast commented Feb 19, 2025

I confirmed with my colleague that after at pip install --upgrade git+https://github.com/googleapis/python-bigquery.git@issue2134-query404, this resolved their issue.

get_job_retry = retry.with_predicate(
lambda exc: isinstance(exc, core_exceptions.NotFound)
# Reference the original retry to avoid recursion.
or retry._predicate(exc)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@parthea Do you have any thoughts on how stable this will be? I couldn't find any public way to pull out the previous predicate from a Retry object when looking at the https://github.com/googleapis/python-api-core/blob/main/google/api_core/retry/retry_base.py code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative to consider: do something like I did with retry versus job_retry and introduce more parameters for job_insert_retry and job_get_retry or something like that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed googleapis/python-api-core#796, in the meantime I'll pursue an alternative that doesn't require access to private attributes.

return True

# Reference the original job_retry to avoid recursion.
return job_retry._predicate(exc)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@parthea likewise here. I ended up using a slightly different pattern here due to the double nesting of retry objects.

@tswast
Copy link
Contributor Author

tswast commented Feb 19, 2025

system-3.12 failure: tests/system/test_client.py::TestBigQuery::test_load_table_from_file_w_explicit_location -- doesn't appear to be related, as that is a load job and this change only touches query jobs.

Edit: Actually, it does appear to be related. It's a different kind of 404 -- trying to query a table in a different location. I'll see if I can disambiguate that from the job 404.

Edit2: fixed in 8e20cb0. I use the message to disambiguate job not found from dataset/table not found based on the stack trace in #2134. System test passing locally (and much more quickly).

@tswast
Copy link
Contributor Author

tswast commented Feb 20, 2025

@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: m Pull request size is medium. labels Feb 20, 2025
@tswast
Copy link
Contributor Author

tswast commented Feb 20, 2025

Cover is failing: google/cloud/bigquery/retry.py 42 1 12 1 96% 180

I'll see if I can update a unit test for this soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. size: l Pull request size is large.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Client.query() sometimes raises 404, despite the query job being successfully started
2 participants