Skip to content

Deadlock possible when calling bigquery_storage.BigQueryReadClient() on multiple threads #696

Open
@d1manson

Description

@d1manson

I'm using bigquery_storage.BigQueryReadClient() inside a Prefect task, and if I run five or so instances of that task in parallel it seems to deadlock about 50% of the time.

Wrapping the call in a lock seems to prevent the deadlock:

from prefect import task
from google.cloud import bigquery_storage   # google-cloud-bigquery-storage~=2.16
from threading import Lock

bqrc_lock = Lock()

@task
async def some_task():
   # ... do something unrelated ...
   with bqrc_lock:
      client = bigquery_storage.BigQueryReadClient()
   # ... do something with the client (no further locking required)... and whatever else...

As far as I understand and can tell, Prefect is running each task on a different thread. I suspect the issue is somewhere in the auth logic - I'm running this in GKE with a service account assigned to the node (rather than any explicit creds).

I appreciate that maybe i could create a singleton somehow to avoid instantiating multiple clients in parallel (and thus avoid the issue entirely), but I'm not sure if it would actually be possible to use a singleton safely across threads if I did that. More subjectively, it's nice to keep the client creation located directly at the point in the code that uses it (if there's only one such line and it only gets executed a handful of times during the life of the program).

Thanks

Metadata

Metadata

Assignees

No one assigned

    Labels

    api: bigquerystorageIssues related to the googleapis/python-bigquery-storage API.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions