Support wildcard tables (and filter on `_TABLE_SUFFIX`) in `read_gbq` / `read_gbq_table` #169
Labels
api: bigquery
type: feature request
Is your feature request related to a problem? Please describe.
In https://cloud.google.com/bigquery/docs/create-machine-learning-model, the example SQL queries the wildcard table `bigquery-public-data.google_analytics_sample.ga_sessions_*` with a filter on `_TABLE_SUFFIX`.
I'd like to be able to represent all of this in Python without any SQL code. There are two problems right now:

1. `bigquery-public-data.google_analytics_sample.ga_sessions_*` isn't supported as a table ID in `read_gbq`. This wildcard doesn't refer to any single table, so API requests based on the table ID will fail.
2. Even if (1) were supported, it would try to copy all the data into a temporary table. It would be best to be able to specify a filter on `_TABLE_SUFFIX` at data read time.

Describe the solution you'd like
`bpd.read_gbq("bigquery-public-data.google_analytics_sample.ga_sessions_*")` should work.

Also, somewhat inspired by the BigQuery Storage API, accept a `row_restriction` parameter to filter rows; a sketch of the proposed call follows below.

See: https://cloud.google.com/bigquery/docs/reference/storage/rpc/google.cloud.bigquery.storage.v1#google.cloud.bigquery.storage.v1.ReadSession.TableReadOptions.FIELDS.string.google.cloud.bigquery.storage.v1.ReadSession.TableReadOptions.row_restriction
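A minimal sketch of what that could look like, assuming a hypothetical `row_restriction` parameter and an illustrative date range (neither is part of `read_gbq` today):

```python
import bigframes.pandas as bpd

# Proposed usage -- not supported today. The wildcard table ID would be
# accepted directly, and the hypothetical `row_restriction` parameter
# (modeled on the BigQuery Storage API's TableReadOptions.row_restriction)
# would be applied when the data is read, before anything is copied into a
# temporary table. The date range below is illustrative only.
df = bpd.read_gbq(
    "bigquery-public-data.google_analytics_sample.ga_sessions_*",
    row_restriction="_TABLE_SUFFIX BETWEEN '20160801' AND '20170630'",
)
```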
See also: the `filters` parameter in `pandas.read_parquet`, illustrated below.
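For comparison, a sketch of the analogous pattern in pandas, using `pandas.read_parquet` with the pyarrow engine; the file name, column, and values are illustrative only:

```python
import pandas as pd

# pandas.read_parquet (pyarrow engine) accepts a `filters` argument and
# pushes the predicates down to the reader, so only matching data is loaded.
df = pd.read_parquet(
    "sessions.parquet",  # illustrative file name
    engine="pyarrow",
    filters=[("event_date", ">=", "20160801"), ("event_date", "<=", "20170630")],
)
```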
Describe alternatives you've considered
SQL as an input to `read_gbq` works as an alternative right now; a minimal sketch follows.
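A minimal sketch of that workaround, with an illustrative `_TABLE_SUFFIX` range:

```python
import bigframes.pandas as bpd

# Workaround available today: pass a SQL query (rather than a table ID) to
# read_gbq and express the wildcard plus _TABLE_SUFFIX filter in the SQL.
df = bpd.read_gbq(
    """
    SELECT *
    FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
    WHERE _TABLE_SUFFIX BETWEEN '20160801' AND '20170630'
    """
)
```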
Additional context
Related feature request on pandas-gbq: a `filters` parameter on `read_gbq` which applies when using a table ID as input (python-bigquery-pandas#694, "Add `filters` parameter to `read_gbq`").