A collection of out-of-the-box Spring Boot based Apache Spark applications that perform common Apache Iceberg tasks. The applications currently available are:
- `kafka2iceberg` - A pipeline that reads data from Kafka and writes to Iceberg.
- `iceberg-maintainer` - A program that executes Iceberg maintenance tasks.
For Local Usage & Development:
To run iceberg-application locally, set up the required environment using Docker Compose.
Bring up the base environment with the Docker Compose file at `environment/compose/environment-docker-compose.yaml` (for example, `docker compose -f environment/compose/environment-docker-compose.yaml up -d`). This setup includes MinIO S3, Kafka, and Zookeeper (with Kafka UI).
Depending on your Iceberg catalog configuration, also bring up the matching Docker Compose file:
- `environment/compose/nessie-docker-compose.yaml` (for the Nessie catalog)
- `environment/compose/postgres-docker-compose.yaml` (for a Postgres-based catalog)
- If you are using an S3-based catalog (e.g., the Hadoop catalog), no additional containers are required.
Configure each application in its Spring `application.yaml` file. Set the catalog type using `spring.iceberg.catalog-type={hadoop/hive/jdbc}`.
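As a minimal sketch, the relevant configuration could look like the following; `spring.iceberg.catalog-type` is the property documented above, while the `spring.kafka.bootstrap-servers` entry assumes the application uses Spring Boot's standard Kafka properties and that the compose setup exposes Kafka on the default local port:

```yaml
spring:
  kafka:
    # Assumption: the app relies on Spring Boot's standard Kafka properties
    # and the compose environment exposes Kafka on the default local port.
    bootstrap-servers: localhost:9092
  iceberg:
    # Property documented above; one of: hadoop, hive, jdbc.
    catalog-type: jdbc
```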
Run the `DevSamplePojoKafkaProducer.java` class to produce sample data to Kafka.
Download the Hadoop binaries and place them locally at `C:/hadoop`. Ensure the binaries are located at `C:/hadoop/hadoop-2.7.1` (this is required for running Spark locally on Windows).
Environment Variables:
In your IntelliJ run configurations, set the following environment variables:
`HADOOP_HOME=C:\hadoop\hadoop-2.7.1;PATH=C:\hadoop\hadoop-2.7.1\bin`
Set the Spring Boot profile to either `jdbc` or `nessie`, depending on your catalog type.
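As an alternative to the IntelliJ run configuration setting, the profile can also be activated through Spring Boot's standard property; a minimal sketch:

```yaml
spring:
  profiles:
    # Standard Spring Boot property; use jdbc or nessie to match your catalog.
    active: nessie
```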
Set the VM options to: `--add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --enable-preview`.
Open `localhost:9001` (the MinIO console) and check your bucket to verify that Kafka2Iceberg has successfully created an Iceberg table.
Run the iceberg-maintainer application in the same manner as Kafka2Iceberg. After the files have been merged, check your MinIO bucket again to see the changes.