When creating a new table using pandas, it would be useful to be able to partition the table and set a partition expiration time. The Python BigQuery client library (google-cloud-bigquery) already supports this:
from google.cloud import bigquery

client = bigquery.Client()
dataset_ref = client.dataset("my_dataset")
table_ref = dataset_ref.table("my_partitioned_table")

schema = [
    bigquery.SchemaField("name", "STRING"),
    bigquery.SchemaField("post_abbr", "STRING"),
    bigquery.SchemaField("date", "DATE"),
]
table = bigquery.Table(table_ref, schema=schema)

# Partition by day on the "date" column; partitions expire after 90 days.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="date",  # name of column to use for partitioning
    expiration_ms=7776000000,  # 90 days
)

table = client.create_table(table)
print(
    "Created table {}, partitioned on column {}".format(
        table.table_id, table.time_partitioning.field
    )
)
https://cloud.google.com/bigquery/docs/creating-column-partitions
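For reference, the `expiration_ms=7776000000` value in the sample is simply 90 days expressed in milliseconds. A minimal sketch of that conversion follows. The helper name `partition_expiration_ms` is hypothetical and not part of pandas or the BigQuery library; it uses only the standard library.

```python
from datetime import timedelta


def partition_expiration_ms(days: int) -> int:
    """Convert a retention period in whole days to milliseconds.

    Hypothetical helper for illustration; the result is the kind of value
    passed as expiration_ms in the sample above.
    """
    # Dividing one timedelta by another with // yields an exact integer.
    return timedelta(days=days) // timedelta(milliseconds=1)


ninety_days_ms = partition_expiration_ms(90)
print(ninety_days_ms)  # 7776000000
```

If pandas exposed a partition expiry option, accepting days (or a `timedelta`) and converting internally like this might be friendlier than asking users for raw milliseconds, though that is a design choice for the maintainers.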
I can create a pull request if people would find this useful. At my work, we create a lot of monitoring tables in BigQuery using pandas and push data to them. These tables keep growing, and because partitioning can't be set once a table has already been created, they become too big and expensive.