You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
After dbt-bigquery submits a dataproc batch job, it enters a polling mode, waiting for a response to indicate the job completed (dbt/adapters/bigquery/dataproc/batch.py#29). This polling process is not retrying transient errors, so the dbt run ends up failing with an error, while the actual dataproc job runs to completion successfully.
Expected Behavior
We should be performing the dataproc batch polling with a retry strategy to retry transient errors.
The BatchControllerClient.get_batch() function takes a retry parameter:
retry (google.api_core.retry.Retry): Designation of what errors, if any,should be retried.`
so we should pass one in!
Steps To Reproduce
This relies on GCP API throwing a 5xx error so it's not very reproducible, though we are hitting the error at least once a day in our runs.
Use dbt-bigquery 1.8.1
Configure Dataproc serverless in profiles.yml:
target: dev
outputs:
dev:
type: bigquery
location: EU
job_retries: 5
dataproc_batch:
environment_config:
execution_config:
service_account: [email protected]
subnetwork_uri: projects/data-dev/regions/europe-west1/subnetworks/dataproc-dbt-subnet
Submit many long-running Python models
Wait until one of them fails.
Relevant log output
11:05:53.104591 [info ] [Thread-1 ]: BigQuery adapter: Submitting batch job with id: d9cb4840-a1f1-49f3-bd8b-2af2d8dbcbfd
11:06:40.287936 [error] [Thread-1 ]: Unhandled error while executing target/run/core/models/data_tables/DT_segments_variants.py
503 502:Bad Gateway
11:06:40.298010 [debug] [Thread-1 ]: Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 76, in error_remapped_callable
return callable_(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/grpc/_channel.py", line 1181, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/usr/local/lib/python3.9/site-packages/grpc/_channel.py", line 1006, in _end_unary_response_blocking
raise _InactiveRpcError(state) # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "502:Bad Gateway"
debug_error_string = "UNKNOWN:Error received from peer {created_time:"2024-07-04T11:06:40.285827131+00:00", grpc_status:14, grpc_message:"502:Bad Gateway"}">
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/dbt/task/base.py", line 368, in safe_run
result = self.compile_and_execute(manifest, ctx)
File "/usr/local/lib/python3.9/site-packages/dbt/task/base.py", line 314, in compile_and_execute
result = self.run(ctx.node, manifest)
File "/usr/local/lib/python3.9/site-packages/dbt/task/base.py", line 415, in run
return self.execute(compiled_node, manifest)
File "/usr/local/lib/python3.9/site-packages/dbt/task/run.py", line 298, in execute
result = MacroGenerator(
File "/usr/local/lib/python3.9/site-packages/dbt/clients/jinja.py", line 84, in __call__
return self.call_macro(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/dbt_common/clients/jinja.py", line 298, in call_macro
return macro(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/jinja2/runtime.py", line 768, in __call__
return self._invoke(arguments, autoescape)
File "/usr/local/lib/python3.9/site-packages/jinja2/runtime.py", line 782, in _invoke
rv = self._func(*arguments)
File "<template>", line 89, in macro
File "/usr/local/lib/python3.9/site-packages/jinja2/sandbox.py", line 394, in call
return __context.call(__obj, *args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/jinja2/runtime.py", line 303, in call
return __obj(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/dbt/clients/jinja.py", line 84, in __call__
return self.call_macro(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/dbt_common/clients/jinja.py", line 298, in call_macro
return macro(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/jinja2/runtime.py", line 768, in __call__
return self._invoke(arguments, autoescape)
File "/usr/local/lib/python3.9/site-packages/jinja2/runtime.py", line 782, in _invoke
rv = self._func(*arguments)
File "<template>", line 55, in macro
File "/usr/local/lib/python3.9/site-packages/jinja2/sandbox.py", line 394, in call
return __context.call(__obj, *args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/jinja2/runtime.py", line 303, in call
return __obj(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/dbt/context/providers.py", line 1444, in submit_python_job
return self.adapter.submit_python_job(parsed_model, compiled_code)
File "/usr/local/lib/python3.9/site-packages/dbt/adapters/base/impl.py", line 175, in execution_with_log
response = code_execution_function(*args)
File "/usr/local/lib/python3.9/site-packages/dbt/adapters/base/impl.py", line 1536, in submit_python_job
submission_result = job_helper.submit(compiled_code)
File "/usr/local/lib/python3.9/site-packages/dbt/adapters/bigquery/python_submissions.py", line 75, in submit
returnself._submit_dataproc_job()
File "/usr/local/lib/python3.9/site-packages/dbt/adapters/bigquery/python_submissions.py", line 144, in _submit_dataproc_job
return poll_batch_job(
File "/usr/local/lib/python3.9/site-packages/dbt/adapters/bigquery/dataproc/batch.py", line 38, in poll_batch_job
response = job_client.get_batch( # type: ignore
File "/usr/local/lib/python3.9/site-packages/google/cloud/dataproc_v1/services/batch_controller/client.py", line 950, in get_batch
response = rpc(
File "/usr/local/lib/python3.9/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
return wrapped_func(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 78, in error_remapped_callable
raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.ServiceUnavailable: 503 502:Bad Gateway
Feels similar to #682, but I believe the Dataproc element means we're dealing with a separate code path that doesn't benefit from RETRYABLE_ERRORS. The stacktrace also doesn't mention dbt/adapters/bigquery/connections.py.
The text was updated successfully, but these errors were encountered:
Is this a new bug in dbt-bigquery?
Current Behavior
After dbt-bigquery submits a dataproc batch job, it enters a polling mode, waiting for a response to indicate the job completed (dbt/adapters/bigquery/dataproc/batch.py#29). This polling process is not retrying transient errors, so the dbt run ends up failing with an error, while the actual dataproc job runs to completion successfully.
Expected Behavior
We should be performing the dataproc batch polling with a retry strategy to retry transient errors.
The BatchControllerClient.get_batch() function takes a retry parameter:
so we should pass one in!
Steps To Reproduce
This relies on GCP API throwing a 5xx error so it's not very reproducible, though we are hitting the error at least once a day in our runs.
Relevant log output
Environment
Additional Context
Feels similar to #682, but I believe the Dataproc element means we're dealing with a separate code path that doesn't benefit from
RETRYABLE_ERRORS
. The stacktrace also doesn't mentiondbt/adapters/bigquery/connections.py
.The text was updated successfully, but these errors were encountered: