
# confluent-mysql-postgres-s3-pyspark-delta


Delta Lake POC implementation: Kafka Connect with a MySQL Debezium CDC source plus JDBC and S3 sink connectors, MinIO as local S3, and Spark with Delta Lake.


### Topology

*(topology diagram)*

We use Docker Compose to deploy the following components:
* MySQL
* Kafka
  * ZooKeeper
  * Kafka Broker
  * Kafka Connect with [Debezium](https://debezium.io/), [JDBC](https://github.com/confluentinc/kafka-connect-jdbc), and S3 sink connectors
* PostgreSQL
* MinIO (local S3)
* Spark
  * master
  * spark-worker-1
  * spark-worker-2
  * pyspark jupyter notebook
### Usage

How to run:

```shell
docker-compose up -d
```

The Confluent Control Center is then available at http://localhost:9021/.

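The Connect worker can take a little while before its REST API accepts requests. A minimal readiness check, sketched in Python (assuming the `requests` library; port 8083 matches the curl commands below):

```python
# Sketch: poll the Kafka Connect REST API until it responds, so the
# connector registrations below don't fail while the worker is still booting.
import time
import requests

CONNECT_URL = "http://localhost:8083"

def wait_for_connect(timeout_s: int = 120) -> None:
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            resp = requests.get(f"{CONNECT_URL}/connectors", timeout=5)
            if resp.ok:
                print("Connect is up; registered connectors:", resp.json())
                return
        except requests.ConnectionError:
            pass  # worker not listening yet
        time.sleep(3)
    raise TimeoutError("Kafka Connect did not become ready in time")

wait_for_connect()
```
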
Register the three connectors:

```shell
# Start the PostgreSQL (JDBC sink) connector
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" http://localhost:8083/connectors/ -d @jdbc-sink.json

# Start the S3 (MinIO) sink connector
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" http://localhost:8083/connectors/ -d @s3-minio-sink.json

# Start the MySQL (Debezium source) connector
curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" http://localhost:8083/connectors/ -d @source.json
```
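The same registration can be done from Python. The dict below only sketches the typical shape of a Debezium MySQL source config; the actual values for this POC live in source.json (and likewise jdbc-sink.json / s3-minio-sink.json), so treat every name and credential here as an assumption:

```python
# Equivalent of the curl calls above: POST a connector config to the
# Kafka Connect REST API. All values below are illustrative placeholders;
# the real config is in source.json.
import requests

source_config = {
    "name": "inventory-source",  # hypothetical connector name
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql",         # compose service name (assumed)
        "database.port": "3306",
        "database.user": "debezium",          # assumed credentials
        "database.password": "dbz",
        "database.server.id": "184054",
        "database.server.name": "dbserver1",  # prefixes the CDC topic names
        "database.include.list": "inventory",
        "database.history.kafka.bootstrap.servers": "kafka:9092",
        "database.history.kafka.topic": "schema-changes.inventory",
    },
}

resp = requests.post("http://localhost:8083/connectors/", json=source_config)
resp.raise_for_status()

# Confirm the connector and its task are RUNNING.
print(requests.get("http://localhost:8083/connectors/inventory-source/status").json())
```
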
Check the contents of the MySQL database:

```shell
docker-compose exec mysql bash -c 'mysql -u $MYSQL_USER -p$MYSQL_PASSWORD inventory -e "select * from customers"'
+------+------------+-----------+-----------------------+
| id   | first_name | last_name | email                 |
+------+------------+-----------+-----------------------+
| 1001 | Sally      | Thomas    | [email protected] |
| 1002 | George     | Bailey    | [email protected]    |
| 1003 | Edward     | Walker    | [email protected]         |
| 1004 | Anne       | Kretchmar | [email protected]    |
+------+------------+-----------+-----------------------+
```

Verify that the PostgreSQL database has the same content:

```shell
docker-compose exec postgres bash -c 'psql -U $POSTGRES_USER $POSTGRES_DB -c "select * from customers"'
 last_name |  id  | first_name |         email
-----------+------+------------+-----------------------
 Thomas    | 1001 | Sally      | [email protected]
 Bailey    | 1002 | George     | [email protected]
 Walker    | 1003 | Edward     | [email protected]
 Kretchmar | 1004 | Anne       | [email protected]
(4 rows)
```
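The same check can be scripted, e.g. as a smoke test. A sketch using psycopg2; the host port, database name, and credentials are placeholders, so take the real ones from the postgres service in docker-compose.yml:

```python
# Sketch: query the sink database directly instead of going through psql.
# Connection parameters are assumptions -- see docker-compose.yml.
import psycopg2  # pip install psycopg2-binary

conn = psycopg2.connect(
    host="localhost",
    port=5432,            # assumed host port mapping
    dbname="inventory",   # assumed database name
    user="postgres",      # assumed credentials
    password="postgres",
)
with conn, conn.cursor() as cur:
    cur.execute("select id, first_name, last_name, email from customers order by id")
    for row in cur.fetchall():
        print(row)
conn.close()
```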

### New record

Insert a new record into MySQL:

```shell
docker-compose exec mysql bash -c 'mysql -u $MYSQL_USER -p$MYSQL_PASSWORD inventory'
mysql> insert into customers values(default, 'John', 'Doe', '[email protected]');
Query OK, 1 row affected (0.02 sec)
```

Verify that PostgreSQL contains the new record:

```shell
docker-compose exec postgres bash -c 'psql -U $POSTGRES_USER $POSTGRES_DB -c "select * from customers"'
 last_name |  id  | first_name |         email
-----------+------+------------+-----------------------
...
 Doe       | 1005 | John       | [email protected]
(5 rows)
```
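Under the hood, Debezium publishes each row change as an event on a Kafka topic, which both the JDBC and S3 sink connectors consume. A sketch for peeking at those raw events with kafka-python; the topic name follows the Debezium `<server-name>.<database>.<table>` default and the broker is assumed to be reachable on localhost:9092, so check source.json and docker-compose.yml for the real values:

```python
# Sketch: read the CDC events for the customers table straight off Kafka.
import json
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "dbserver1.inventory.customers",     # assumed topic name
    bootstrap_servers="localhost:9092",  # assumed listener mapping
    auto_offset_reset="earliest",
    consumer_timeout_ms=10_000,          # stop iterating after 10s of silence
)

for message in consumer:
    if message.value is None:            # tombstone emitted after a delete
        print("tombstone for key:", message.key)
        continue
    event = json.loads(message.value)
    # Without the unwrap SMT the value is the full Debezium envelope, where
    # 'op' is c/u/d and 'before'/'after' hold the row images.
    payload = event.get("payload", event)
    print(payload.get("op"), payload.get("after"))
```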

### Record update

Update the newly inserted record in MySQL:

```shell
mysql> update customers set first_name='Jane', last_name='Roe' where last_name='Doe';
Query OK, 1 row affected (0.02 sec)
Rows matched: 1  Changed: 1  Warnings: 0
```

Verify that the record is updated in PostgreSQL:

```shell
docker-compose exec postgres bash -c 'psql -U $POSTGRES_USER $POSTGRES_DB -c "select * from customers"'
 last_name |  id  | first_name |         email
-----------+------+------------+-----------------------
...
 Roe       | 1005 | Jane       | [email protected]
(5 rows)
```

### Record delete

Delete the record in MySQL:

```shell
mysql> delete from customers where email='[email protected]';
Query OK, 1 row affected (0.01 sec)
```

Verify that the record is deleted in PostgreSQL:

```shell
docker-compose exec postgres bash -c 'psql -U $POSTGRES_USER $POSTGRES_DB -c "select * from customers"'
 last_name |  id  | first_name |         email
-----------+------+------------+-----------------------
...
(4 rows)
```

As you can see, 'Jane Roe' is no longer a customer.

### Spark

Get the Jupyter notebook token:

```shell
docker-compose exec pyspark bash -c "jupyter server list"
```

Open http://localhost:9999 and open the notebook work/write_read_to_minio.ipynb.

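The notebook writes a Delta table to MinIO through the S3A filesystem and reads it back. A condensed sketch of that flow; the MinIO endpoint, credentials, bucket name, and JAR versions are assumptions, so the notebook itself and docker-compose.yml hold the authoritative values:

```python
# Sketch: write a Delta table to MinIO over s3a:// and read it back.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("delta-minio-poc")
    # Delta Lake + S3A support; pin versions matching your Spark image.
    .config("spark.jars.packages",
            "io.delta:delta-core_2.12:1.0.0,org.apache.hadoop:hadoop-aws:3.2.0")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    # MinIO (S3-compatible) endpoint -- assumed service name and credentials.
    .config("spark.hadoop.fs.s3a.endpoint", "http://minio:9000")
    .config("spark.hadoop.fs.s3a.access.key", "minio")
    .config("spark.hadoop.fs.s3a.secret.key", "minio123")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

# Round-trip a tiny Delta table through an (assumed) 'delta' bucket.
df = spark.createDataFrame([(1001, "Sally"), (1002, "George")], ["id", "first_name"])
df.write.format("delta").mode("overwrite").save("s3a://delta/customers_poc")
spark.read.format("delta").load("s3a://delta/customers_poc").show()
```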

See the Spark master UI at http://localhost:8080/ for the workers and the job DAG visualization.

### Shut down the cluster

Stop the application:

```shell
docker-compose down
```
