A simple setup to demo batch and streaming workloads using PyIceberg

robathija/py-iceberg-etl

py-iceberg-etl

A simple setup to demo batch and streaming workloads using PyIceberg

This repo is geared towards GCP workloads

Iceberg Catalog

We use a local PostgreSQL Docker image as a JDBC catalog, but this could easily be swapped for Cloud SQL on GCP, an Iceberg REST catalog, or anything else that Iceberg supports.
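As a rough illustration of how the catalog can be swapped, the sketch below builds the property dictionaries that PyIceberg's `load_catalog` accepts for a PostgreSQL-backed SQL catalog and for a REST catalog. The helper names, credentials, port, database name, `psycopg2` driver, and bucket path are all assumptions for illustration; the repo's actual configuration may differ.

```python
from urllib.parse import quote_plus


def sql_catalog_properties(user, password, host="localhost", port=5432,
                           database="iceberg",
                           warehouse="gs://example-bucket/warehouse"):
    """Properties for a PostgreSQL-backed catalog (PyIceberg catalog type "sql").

    All default values are placeholders, not the repo's real settings.
    """
    # Percent-encode credentials so characters like "@" do not break the URI.
    uri = (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )
    return {"type": "sql", "uri": uri, "warehouse": warehouse}


def rest_catalog_properties(uri="http://localhost:8181",
                            warehouse="gs://example-bucket/warehouse"):
    """Properties for an Iceberg REST catalog (placeholder endpoint)."""
    return {"type": "rest", "uri": uri, "warehouse": warehouse}


def open_catalog(properties, name="default"):
    """Open the catalog; pyiceberg is imported lazily so the helpers
    above can be used without it installed."""
    from pyiceberg.catalog import load_catalog
    return load_catalog(name, **properties)
```

With this shape, moving from the local Docker container to Cloud SQL would only change the host and credentials passed to `sql_catalog_properties`, and moving to a REST catalog would only change which helper feeds `open_catalog`.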

Streaming

This is an Apache Beam/Cloud Dataflow pipeline that ingests data from a Pub/Sub topic and writes it to GCS following the Iceberg table spec.
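A minimal sketch of what such a pipeline could look like is shown here; it is not the repo's actual code. It assumes messages are UTF-8 JSON objects whose fields match the Iceberg table schema, and that each batch is appended through PyIceberg from inside a Beam step. The function names, batch sizes, and the `catalog_props`/`table_name` parameters are hypothetical.

```python
import json


def decode_message(payload: bytes) -> dict:
    """Parse one Pub/Sub payload, assumed to be a UTF-8 JSON object."""
    return json.loads(payload.decode("utf-8"))


def append_rows(rows, catalog_props, table_name):
    """Append a batch of dict rows to an Iceberg table via PyIceberg.

    The catalog is loaded inside the function so that only plain,
    picklable values (dict and str) are shipped to the workers.
    """
    import pyarrow as pa
    from pyiceberg.catalog import load_catalog

    table = load_catalog("default", **catalog_props).load_table(table_name)
    # Assumes the inferred Arrow schema is compatible with the table schema.
    table.append(pa.Table.from_pylist(rows))


def build_pipeline(pipeline, topic, catalog_props, table_name):
    """Wire the steps together: read, decode, batch, append."""
    import apache_beam as beam

    return (
        pipeline
        | "ReadPubSub" >> beam.io.ReadFromPubSub(topic=topic)  # yields bytes
        | "Decode" >> beam.Map(decode_message)
        | "Batch" >> beam.BatchElements(min_batch_size=100, max_batch_size=1000)
        | "Append" >> beam.Map(append_rows, catalog_props, table_name)
    )
```

Batching before the append keeps the number of Iceberg commits down, since every `append` call creates a new snapshot. Many workers committing to the same table at once can still hit commit conflicts, so a production pipeline would likely need retries or a single committing step.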

Batch

A simple Python notebook.
