We use Kafka Connect to dump topics to AWS S3. Analyzing the data is then fairly simple with Athena, AWS Glue crawlers, and S3. This looks like a common setup for AWS users.
Problem
The problem appears when we partition by fields taken from the Kafka message. Athena cannot create a table, because each partition segment of the S3 subpath becomes a separate column and every JSON key becomes a separate column too. The partitioning field therefore appears twice, and a table cannot have two columns with the same name.
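To make the collision concrete, here is a minimal sketch in Python. It is only an illustration, not how Athena or Glue work internally: the object key, the record, and both helper functions are hypothetical, and the sketch assumes Hive-style `key=value` path segments.

```python
import json


def athena_columns(s3_key: str, record_json: str) -> list:
    """Partition columns parsed from the path, followed by the JSON keys."""
    # Hypothetical helper: every "key=value" path segment becomes a column.
    partition_cols = [seg.split("=", 1)[0] for seg in s3_key.split("/") if "=" in seg]
    # Every top-level JSON key becomes a column as well.
    data_cols = list(json.loads(record_json).keys())
    return partition_cols + data_cols


def duplicate_columns(columns: list) -> list:
    """Return the column names that occur more than once, in first-seen order."""
    seen, dupes = set(), []
    for name in columns:
        if name in seen and name not in dupes:
            dupes.append(name)
        seen.add(name)
    return dupes


# Example values are made up: the topic is partitioned by the message field "country".
s3_key = "topics/orders/country=US/year=2024/orders+0+0000000000.json"
record = '{"country": "US", "order_id": 17, "amount": 42.5}'

columns = athena_columns(s3_key, record)
print(duplicate_columns(columns))  # prints ['country']
```

The field `country` is both a path segment and a JSON key, so it ends up in the column list twice.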
Solution
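It would be a good idea to add a partitioner based on a header field and time. Because the partition value would come from a record header rather than from the message body, the partition column would not have to collide with a JSON key.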
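A rough sketch of the path such a partitioner could produce, written in Python for brevity. This is not connector code: the function name, the `tenant` header, and the `year=/month=/day=/hour=` layout are assumptions made for illustration, and headers are simplified to a plain dict of strings.

```python
from datetime import datetime, timezone


def header_time_path(headers: dict, header_name: str, timestamp_ms: int) -> str:
    """Build a Hive-style partition path from one header value and the record time."""
    # Assumption: the header is present; a real partitioner would need a fallback.
    value = headers[header_name]
    # Record timestamps are epoch milliseconds; the path is derived in UTC here.
    ts = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return f"{header_name}={value}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}"


# Hypothetical record: header "tenant" is not a key in the JSON body.
path = header_time_path({"tenant": "acme"}, "tenant", 1705312800000)
print(path)  # prints tenant=acme/year=2024/month=01/day=15/hour=10
```

The approach only avoids duplicate columns if the chosen header name (and the time segment names) do not also appear as keys in the message body.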
Extra
There is a good custom partitioner in this repo, FieldAndTimeBasedPartitioner, which could also be used as a default.