Skip to content

Option to provide partition column and partition expiry time #313

Open
@ShantanuKumar

Description

@ShantanuKumar

While creating a new table using pandas, it would be nice if it can partition the table and set an partition expiry time. The python bigquery library already supports it

# from google.cloud import bigquery
# client = bigquery.Client()
# dataset_ref = client.dataset('my_dataset')

table_ref = dataset_ref.table("my_partitioned_table")
schema = [
    bigquery.SchemaField("name", "STRING"),
    bigquery.SchemaField("post_abbr", "STRING"),
    bigquery.SchemaField("date", "DATE"),
]
table = bigquery.Table(table_ref, schema=schema)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="date",  # name of column to use for partitioning
    expiration_ms=7776000000,
)  # 90 days

table = client.create_table(table)

print(
    "Created table {}, partitioned on column {}".format(
        table.table_id, table.time_partitioning.field
    )
)

https://cloud.google.com/bigquery/docs/creating-column-partitions

I can create a pull request, if people feel like it's something they find useful. At least in my work, we create lot of monitoring tables on bigquery using pandas, and push data to it. These tables keep growing and since we can't set the partition when a table has already been created, these tables just become too big, and expensive.

Metadata

Metadata

Assignees

No one assigned

    Labels

    api: bigqueryIssues related to the googleapis/python-bigquery-pandas API.type: feature request‘Nice-to-have’ improvement, new feature or different behavior or design.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions