SparkStreamingHBase

This project shows how to build a Kafka producer and a Kafka consumer, and how to insert streaming data into HBase, using Java and Spark.

The sample data come from humidity.csv in the Kaggle dataset at https://www.kaggle.com/selfishgene/historical-hourly-weather-data and were converted to JSON format.
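The exact layout of the JSON file is not documented here. As a rough sketch of the CSV-to-JSON step, assuming the CSV has a datetime column followed by one column per city and that each row becomes one JSON object, the conversion could look like this in Java. The class name CsvRowToJson, the sample values, and the choice to write empty cells as null are illustrative assumptions, not the project's actual code.

```java
import java.util.StringJoiner;

class CsvRowToJson {

    // Wrap a value in double quotes, escaping the two characters
    // that would otherwise break a JSON string literal.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Convert one CSV data row to a JSON object, using the header row for keys.
    // Assumption: plain comma-separated cells, no quoted commas.
    static String toJson(String headerLine, String rowLine) {
        String[] keys = headerLine.split(",", -1);
        String[] cells = rowLine.split(",", -1);
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (int i = 0; i < keys.length; i++) {
            String cell = i < cells.length ? cells[i].trim() : "";
            String value;
            if (cell.isEmpty()) {
                value = "null";            // missing reading (assumed convention)
            } else if (i > 0 && cell.matches("-?(0|[1-9]\\d*)(\\.\\d+)?")) {
                value = cell;              // numeric reading, emitted as a JSON number
            } else {
                value = quote(cell);       // timestamp or any non-numeric text
            }
            json.add(quote(keys[i].trim()) + ":" + value);
        }
        return json.toString();
    }

    public static void main(String[] args) {
        // Illustrative header and row, not copied from the dataset.
        String header = "datetime,Vancouver,Portland";
        String row = "2012-10-01 13:00:00,76.0,";
        System.out.println(toJson(header, row));
    }
}
```

Running main prints {"datetime":"2012-10-01 13:00:00","Vancouver":76.0,"Portland":null}.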

Prerequisites

Please ensure that the following are installed and available: Java, Maven, Apache Spark (for spark-submit), a running Kafka broker, and HBase.

This project consists of two main classes:

ProducerMain: reads the hourly humidity time series (humidity.json) and sends it to Kafka.
ConsumerMain: consumes the data from Kafka, transforms it, and saves it to HBase.
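The transformation applied by ConsumerMain is not spelled out in this README. One plausible shape for it, shown here only as a sketch, is to split each record into one HBase-style cell per city. The class HumidityCells, the row key format city|datetime, the column family "w", and the qualifier "humidity" are all hypothetical names chosen for illustration, and the record is assumed to be already parsed into a map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HumidityCells {

    // One HBase-style cell. All naming here is hypothetical.
    static final class Cell {
        final String rowKey;
        final String family;
        final String qualifier;
        final String value;

        Cell(String rowKey, String family, String qualifier, String value) {
            this.rowKey = rowKey;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }

        @Override
        public String toString() {
            return rowKey + " " + family + ":" + qualifier + "=" + value;
        }
    }

    // Turn one parsed record into one cell per city, skipping missing readings.
    static List<Cell> toCells(String datetime, Map<String, Double> humidityByCity) {
        List<Cell> cells = new ArrayList<>();
        for (Map.Entry<String, Double> e : humidityByCity.entrySet()) {
            if (e.getValue() == null) {
                continue; // no reading for this city at this hour
            }
            String rowKey = e.getKey() + "|" + datetime; // assumed key scheme
            cells.add(new Cell(rowKey, "w", "humidity", String.valueOf(e.getValue())));
        }
        return cells;
    }

    public static void main(String[] args) {
        Map<String, Double> reading = new LinkedHashMap<>();
        reading.put("Vancouver", 76.0);
        reading.put("Portland", null);
        for (Cell c : toCells("2012-10-01 13:00:00", reading)) {
            System.out.println(c);
        }
    }
}
```

In a real consumer, each cell would be written with the HBase client instead of printed. Putting the city first in the row key keeps all readings for one city adjacent, which is one common design choice but not necessarily the one used in this project.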

Build

mvn install

Producer

spark-submit --class com.malik.main.ProducerMain --master local[2] malik/engine/SparkStreamingHBase-1.0-SNAPSHOT-jar-with-dependencies.jar

Consumer

spark-submit --class com.malik.main.ConsumerMain --master local[2] malik/engine/SparkStreamingHBase-1.0-SNAPSHOT-jar-with-dependencies.jar

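Open localhost:port in your browser to monitor the Spark job. The default port is 4040; if that port is already taken by another Spark application on the same machine, Spark uses the next free one (4041, 4042, and so on).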