SparkStreamingHBase

This project illustrates how to create a Kafka producer and a Kafka consumer, and how to insert streaming data into HBase, using Java and Spark.

The sample data comes from humidity.csv in https://www.kaggle.com/selfishgene/historical-hourly-weather-data, converted to JSON format.
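The conversion step itself is not described here. As a rough sketch only, assuming the CSV has a header row with a datetime column followed by one column per city, and that each data row becomes one JSON object (the actual layout of humidity.json may differ), it could look like the following. The class name CsvToJson and the sample values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvToJson {
    // Convert one CSV data row into a JSON object string, using the header for keys.
    // Assumption: no quoted fields or embedded commas; empty cells become null.
    static String rowToJson(String[] header, String line) {
        String[] cells = line.split(",", -1); // -1 keeps trailing empty cells
        StringBuilder sb = new StringBuilder("{");
        for (int i = 0; i < header.length; i++) {
            if (i > 0) sb.append(",");
            String value = i < cells.length ? cells[i].trim() : "";
            sb.append('"').append(escape(header[i].trim())).append("\":");
            if (value.isEmpty()) {
                sb.append("null");
            } else if (value.matches("-?\\d+(\\.\\d+)?")) {
                sb.append(value); // plain decimal number, emitted unquoted
            } else {
                sb.append('"').append(escape(value)).append('"');
            }
        }
        return sb.append("}").toString();
    }

    // Escape backslashes and double quotes so the value is a valid JSON string.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Convert a whole CSV (first line is the header) into one JSON object per data row.
    static List<String> toJsonLines(List<String> csvLines) {
        List<String> out = new ArrayList<>();
        if (csvLines.isEmpty()) return out;
        String[] header = csvLines.get(0).split(",", -1);
        for (String line : csvLines.subList(1, csvLines.size())) {
            if (!line.trim().isEmpty()) out.add(rowToJson(header, line));
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative rows only, not taken from the real file.
        List<String> csv = List.of(
            "datetime,Vancouver,Portland",
            "2012-10-01 12:00:00,,",
            "2012-10-01 13:00:00,76.0,81.0");
        toJsonLines(csv).forEach(System.out::println);
    }
}
```

Writing one JSON object per line keeps the file easy to read record by record, which suits a producer that sends one message per reading.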

Prerequisites

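Please ensure that Java, Maven, Apache Spark, Apache Kafka, and HBase are installed, and that Kafka and HBase are running before you start the producer and the consumer.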

This project consists of two main classes:

ProducerMain: reads the hourly humidity time series (humidity.json) and sends it to Kafka.
ConsumerMain: consumes the data from Kafka, transforms it, and saves it to HBase.
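The transform step in ConsumerMain is not spelled out here. As a hedged sketch of what mapping one reading to an HBase row might involve, the following uses only the Java standard library. The class name HumidityRowMapper, the column family "h", the qualifier "humidity", and the row key layout are all hypothetical; the real table schema may be different.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class HumidityRowMapper {
    // Hypothetical column family name; the real table schema is not given in this README.
    static final String FAMILY = "h";

    // Row key layout is an assumption: city first so that one city's readings sort
    // together, then the timestamp so they sort chronologically within that city.
    static String rowKey(String city, String datetime) {
        return city + "|" + datetime;
    }

    // Build "family:qualifier" -> value bytes for one reading.
    // With the HBase client, each entry would correspond to one
    // Put.addColumn(family, qualifier, value) call on a Put created from the row key.
    static Map<String, byte[]> columns(double humidity) {
        Map<String, byte[]> cols = new LinkedHashMap<>();
        cols.put(FAMILY + ":humidity",
                 Double.toString(humidity).getBytes(StandardCharsets.UTF_8));
        return cols;
    }

    public static void main(String[] args) {
        // Illustrative reading only.
        String key = rowKey("Vancouver", "2012-10-01 13:00:00");
        Map<String, byte[]> cols = columns(76.0);
        System.out.println(key);
        cols.forEach((q, v) ->
            System.out.println(q + " = " + new String(v, StandardCharsets.UTF_8)));
    }
}
```

Putting the city ahead of the timestamp in the key is one common choice for time series in HBase, since it spreads writes across cities instead of directing every new reading to the end of the key range.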

Build the project with Maven:

mvn install

Producer

spark-submit --class com.malik.main.ProducerMain --master local[2] malik/engine/SparkStreamingHBase-1.0-SNAPSHOT-jar-with-dependencies.jar

Consumer

spark-submit --class com.malik.main.ConsumerMain --master local[2] malik/engine/SparkStreamingHBase-1.0-SNAPSHOT-jar-with-dependencies.jar

Open localhost:port in your browser to monitor the Spark job. The default port is 4040.
