SparkStreamingHBase

This project shows how to build a Kafka producer and a Kafka consumer, and how to insert streaming data into HBase, using Java and Spark.

The sample data is humidity.csv from the dataset at https://www.kaggle.com/selfishgene/historical-hourly-weather-data, converted to JSON format.
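The CSV-to-JSON step is not described further here, so the following is only a minimal sketch of one way to do it. It assumes humidity.csv is laid out with a datetime column followed by one column per city, and it emits one JSON object per non-empty reading. The class name HumidityCsvToJson and the field names datetime, city, and humidity are hypothetical and may not match the humidity.json used by this project.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: class name and JSON field names are assumptions,
// not the schema this project actually uses.
class HumidityCsvToJson {

    // Converts one wide CSV row (datetime + one column per city)
    // into one JSON object per non-empty reading.
    static List<String> rowToJson(String[] header, String line) {
        // limit -1 keeps trailing empty columns so indexes match the header
        String[] cells = line.split(",", -1);
        String datetime = cells[0];
        List<String> records = new ArrayList<>();
        for (int i = 1; i < cells.length && i < header.length; i++) {
            String value = cells[i].trim();
            if (value.isEmpty()) {
                continue; // no reading for this city at this hour
            }
            double humidity = Double.parseDouble(value);
            records.add(String.format(
                "{\"datetime\":\"%s\",\"city\":\"%s\",\"humidity\":%s}",
                datetime, header[i], humidity));
        }
        return records;
    }

    public static void main(String[] args) {
        String[] header = "datetime,Vancouver,Portland".split(",");
        for (String json : rowToJson(header, "2012-10-01 13:00:00,76.0,")) {
            System.out.println(json);
        }
    }
}
```

The plain split on commas is enough only while no field contains a quoted comma; a CSV library would be the safer choice for arbitrary input.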

Prerequisites

Make sure the following are installed:

  • Java 8
  • Maven
  • Apache Spark 2.x
  • Apache Kafka 0.10.x
  • Apache HBase 1.x

Using this project

This project consists of two main classes:

  • ProducerMain: reads the hourly humidity time series (humidity.json) and sends it to Kafka.
  • ConsumerMain: consumes the data from Kafka, transforms it, and saves it to HBase.
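The two classes can be pictured with the following sketch, which uses only the Java standard library so it runs without a Kafka or HBase installation. The producer half builds the standard Kafka producer settings (bootstrap.servers and the two serializer keys are real Kafka configuration names). The consumer half shows one possible transform into an HBase row key and column. The class PipelineSketch, the '#' row-key layout, and the d:humidity column are assumptions made for illustration, not the design used by ProducerMain or ConsumerMain.

```java
import java.util.Properties;

// Sketch only: PipelineSketch, the row-key layout, and the column name
// are hypothetical; the real classes may be organized differently.
class PipelineSketch {

    // Producer side: minimal settings for a Kafka producer that sends
    // string keys and string (JSON) values.
    static Properties producerConfig(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    // Consumer side: assumed row key "<city>#<datetime>". Because the
    // timestamp format sorts lexicographically, a prefix scan on a city
    // returns its readings in time order.
    static String rowKey(String city, String datetime) {
        return city + "#" + datetime;
    }

    // Assumed column: family "d", qualifier "humidity".
    static String column() {
        return "d:humidity";
    }

    public static void main(String[] args) {
        Properties props = producerConfig("localhost:9092");
        System.out.println(props.getProperty("bootstrap.servers"));
        System.out.println(rowKey("Vancouver", "2012-10-01 13:00:00")
            + " -> " + column() + " = 76.0");
    }
}
```

In a real producer these properties would be passed to a KafkaProducer constructor, and the row key and column would be written with an HBase Put. Both need the Kafka and HBase client libraries on the classpath.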

Build

mvn install

Run

Producer

spark-submit --class com.malik.main.ProducerMain --master local[2] malik/engine/SparkStreamingHBase-1.0-SNAPSHOT-jar-with-dependencies.jar

Consumer

spark-submit --class com.malik.main.ConsumerMain --master local[2] malik/engine/SparkStreamingHBase-1.0-SNAPSHOT-jar-with-dependencies.jar

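Open localhost:port in your browser to monitor the Spark job. The default port is 4040; if that port is already taken (for example, when the producer and consumer run on the same machine), Spark tries the next one (4041, and so on).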