Description
I'm using `bigquery_storage.BigQueryReadClient()` inside a Prefect task, and if I run five or so instances of that task in parallel it seems to deadlock about 50% of the time.
Wrapping the call in a lock seems to prevent the deadlock:
```python
from threading import Lock

from prefect import task
from google.cloud import bigquery_storage  # google-cloud-bigquery-storage~=2.16

bqrc_lock = Lock()

@task
async def some_task():
    # ... do something unrelated ...
    with bqrc_lock:
        client = bigquery_storage.BigQueryReadClient()
    # ... do something with the client (no further locking required) ...
```
As far as I can tell, Prefect runs each task on a different thread. I suspect the issue is somewhere in the auth logic - I'm running this in GKE with a service account assigned to the node (rather than any explicit credentials), so each client construction goes through the metadata-server credential lookup.
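If the problem really is concurrent lazy credential initialization, one possible workaround (an untested sketch, not a confirmed fix) is to resolve credentials eagerly on the main thread, before any parallel tasks start, so the workers only ever read an already-initialized value. In the real program that eager step would be something like `google.auth.default()`, optionally passing the resulting credentials to `BigQueryReadClient(credentials=...)`; the sketch below uses a stdlib stand-in (`fetch_credentials` is hypothetical) to show just the ordering:

```python
import threading

# Hypothetical stand-in for google.auth.default(); in the real program the
# main thread would perform that call here, once, before any workers start.
def fetch_credentials():
    return {"token": "stub"}

# Eager, single-threaded initialization: no lazy-init race later.
CREDENTIALS = fetch_credentials()

def worker(results, i):
    # Workers only read the pre-initialized value.
    results[i] = CREDENTIALS["token"]

results = [None] * 5
threads = [threading.Thread(target=worker, args=(results, i)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the expensive call happens before any thread is spawned, no locking is needed on the read path.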
I appreciate that maybe I could create a singleton somehow to avoid instantiating multiple clients in parallel (and thus avoid the issue entirely), but I'm not sure whether it would actually be safe to share a single client across threads. More subjectively, it's nice to keep the client creation directly at the point in the code that uses it, when there's only one such line and it only executes a handful of times during the life of the program.
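For what it's worth, the singleton could be made thread-safe with double-checked locking, so the client is still created lazily at first use but at most once. This is only a sketch under the assumption that a single `BigQueryReadClient` can be shared across threads (I haven't verified that for this library); the `factory` argument is a stand-in for `bigquery_storage.BigQueryReadClient`:

```python
import threading

class LazyClient:
    """Thread-safe lazy singleton: `factory` runs at most once, even when
    several threads race to the first access."""

    def __init__(self, factory):
        # `factory` stands in for bigquery_storage.BigQueryReadClient here.
        self._factory = factory
        self._lock = threading.Lock()
        self._instance = None

    def get(self):
        if self._instance is None:          # fast path, no lock once created
            with self._lock:
                if self._instance is None:  # re-check under the lock
                    self._instance = self._factory()
        return self._instance
```

Each task would then call `lazy.get()` instead of constructing its own client, which keeps the creation site close to the point of use while serializing only the one-time construction.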
Thanks