This project illustrates how to create a Kafka producer and a Kafka consumer, and how to insert streaming data into HBase, using Java and Spark.
The dummy data come from the humidity.csv file of https://www.kaggle.com/selfishgene/historical-hourly-weather-data, which was then transformed to JSON format.
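The CSV-to-JSON step is not shown in this README, so the following is only a minimal sketch of how it could be done. It assumes that the first CSV column is the timestamp and that every other column holds the readings for one city. The class name `HumidityCsvToJson`, the JSON field names (`datetime`, `city`, `humidity`), and the sample row are all hypothetical; the actual layout of humidity.json may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of this project: turns one wide CSV row
// into one JSON record per city.
class HumidityCsvToJson {

    // Assumes the first column is the timestamp, the values are numeric,
    // and no field contains commas or quotes.
    static List<String> toJsonRecords(String headerLine, String dataLine) {
        String[] header = headerLine.split(",", -1);
        String[] values = dataLine.split(",", -1);
        List<String> records = new ArrayList<>();
        for (int i = 1; i < header.length && i < values.length; i++) {
            if (values[i].isEmpty()) {
                continue; // skip missing readings
            }
            records.add(String.format(
                "{\"datetime\":\"%s\",\"city\":\"%s\",\"humidity\":%s}",
                values[0], header[i], values[i]));
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up sample row; the second city has no reading.
        String header = "datetime,Vancouver,Portland";
        String row = "2012-10-01 13:00:00,76.0,";
        for (String json : toJsonRecords(header, row)) {
            System.out.println(json);
        }
    }
}
```

Emitting one record per city and timestamp keeps each Kafka message small and self-describing, at the cost of repeating the timestamp in every record.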
Please ensure that you have met the following requirements:
- Java 8
- Maven
- Apache Spark 2.x
- Apache Kafka 0.10.x
- Apache HBase 1.x
This project consists of two main classes:
- ProducerMain: reads the hourly humidity time series (humidity.json) and sends it to Kafka.
- ConsumerMain: consumes the data from Kafka, transforms it, then saves it to HBase.
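As a rough illustration of the producer side, the sketch below builds the configuration that a Kafka producer typically needs. It is not taken from ProducerMain: the class name `ProducerConfigSketch`, the broker address `localhost:9092`, and the choice of string serializers and `acks=all` are assumptions. It uses only `java.util.Properties`, so it compiles without the Kafka client library on the classpath.

```java
import java.util.Properties;

// Hypothetical sketch, not the project's actual ProducerMain configuration.
class ProducerConfigSketch {

    // Builds producer settings using standard Kafka configuration keys.
    static Properties producerProperties(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // Assumption: each JSON record is sent as a plain string.
        props.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        // Assumption: wait for all in-sync replicas to acknowledge each write.
        props.put("acks", "all");
        return props;
    }

    public static void main(String[] args) {
        // Assumed local broker address.
        Properties props = producerProperties("localhost:9092");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
        // With kafka-clients on the classpath, these properties could be passed
        // to new KafkaProducer<String, String>(props), and each JSON line sent
        // with producer.send(new ProducerRecord<>(topic, json)).
    }
}
```

The consumer side would need a matching set of properties with deserializers and a `group.id`; how ConsumerMain maps the records to HBase rows and columns is not described here.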
Build the project with Maven:
mvn install
Producer
spark-submit --class com.malik.main.ProducerMain --master local[2] malik/engine/SparkStreamingHBase-1.0-SNAPSHOT-jar-with-dependencies.jar
Consumer
spark-submit --class com.malik.main.ConsumerMain --master local[2] malik/engine/SparkStreamingHBase-1.0-SNAPSHOT-jar-with-dependencies.jar
Visit localhost:port in your browser to monitor your Spark job. The default port is 4040.