Support AWS DMS Partitions in Method store_parquet_metadata #2303

Open
@FleischerT

Description

Is your idea related to a problem? Please describe.
I am trying to create Glue tables for data in S3 that AWS DMS writes as partitioned Parquet files. The problem is that AWS DMS writes date-based partitions as plain folders such as "2023/05/01/" rather than in the Hive convention, "year=2023/month=05/day=01/".
When I try to create the Glue tables with the Wrangler method "store_parquet_metadata", the partitions are not recognized, because the internal method "_extract_partitions_metadata_from_paths" filters the path segments for "=".
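
To illustrate the mismatch, here is a simplified stand-in for a parser that only accepts Hive-style segments. It is not the actual body of "_extract_partitions_metadata_from_paths"; the function name `hive_partitions`, the bucket and the file names are made up for illustration.

```python
from typing import Dict


def hive_partitions(root: str, path: str) -> Dict[str, str]:
    """Return key/value pairs for directory segments shaped like key=value.

    Hypothetical helper, written only to mimic a filter on "=".
    """
    relative = path[len(root):] if path.startswith(root) else path
    segments = relative.split("/")[:-1]  # drop the file name
    # Segments without "=" are silently ignored.
    return dict(s.split("=", 1) for s in segments if "=" in s)


root = "s3://bucket/table/"  # assumed example location
hive_path = root + "year=2023/month=05/day=01/part-0.parquet"
dms_path = root + "2023/05/01/part-0.parquet"

print(hive_partitions(root, hive_path))  # three partition columns found
print(hive_partitions(root, dms_path))   # empty dict: nothing contains "="
```

With the DMS layout every segment is dropped, so the table ends up without any partition columns.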

Describe the solution you'd like
Currently only Hive-conformant partitioning seems to be supported. It would be better if the partition keys could be passed explicitly when calling the method.
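
A minimal sketch of what I have in mind, assuming the caller supplies the keys in folder order. The helper `positional_partitions` and the parameter name `partition_keys` are hypothetical and not part of the current awswrangler API.

```python
from typing import Dict, List


def positional_partitions(root: str, path: str, partition_keys: List[str]) -> Dict[str, str]:
    """Map the directory segments below root to the given keys, in order.

    Hypothetical helper; raises if the folder depth does not match the keys.
    """
    if not path.startswith(root):
        raise ValueError(f"{path!r} is not located under {root!r}")
    segments = path[len(root):].split("/")[:-1]  # drop the file name
    if len(segments) != len(partition_keys):
        raise ValueError(
            f"expected {len(partition_keys)} partition folders, found {len(segments)}"
        )
    return dict(zip(partition_keys, segments))


root = "s3://bucket/table/"  # assumed example location
dms_path = root + "2023/05/01/part-0.parquet"
parts = positional_partitions(root, dms_path, ["year", "month", "day"])
print(parts)
```

The values recovered this way could then be registered in the catalog in the same way as Hive-style partitions, for example through `wr.catalog.add_parquet_partitions`.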
