
Monitoring Apache Spark on Docker with Prometheus and Grafana

Spark streaming with Kafka and ingestion into BigQuery

Goal

The goal of this project is to:

  • Create a Docker container that runs Spark
  • Use Prometheus to collect metrics from Spark applications and Node-exporter
  • Use Grafana to display the collected metrics
  • Stream messages from Kafka with Spark and ingest them into BigQuery

Notes

  • The Spark version used is 3.0.2.
  • For all metrics available for Spark monitoring, see here.
  • The containerized environment consists of a Master and a Worker.
  • To track metrics across Spark apps, appName must be set; otherwise spark.metrics.namespace defaults to spark.app.id, which changes on every invocation of the app (see the sketch after this list).
  • The main Scala application is Kafka Streaming Project-assembly-0.2.0.jar, a streaming job that ingests Kafka messages into BigQuery (a simplified sketch also follows this list).
  • The Dockerfile for Spark/Hadoop is also available here, so that it can be added to the docker-compose.yaml file as seen here.
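
Since a stable metrics namespace is what keeps the Prometheus and Grafana dashboards consistent across runs, the following is a minimal sketch of the relevant session setup. The application name is a placeholder, and spark.ui.prometheus.enabled (added in Spark 3.0 as an experimental feature) is only one way of exposing metrics in Prometheus format; the actual sink wiring in this repo may differ.

import org.apache.spark.sql.SparkSession

// Minimal sketch: pin the app name so metric names stay stable across runs.
val spark = SparkSession.builder()
  .appName("kafka-streaming-bigquery") // placeholder name
  // Use the app name instead of the per-run spark.app.id as the namespace.
  .config("spark.metrics.namespace", "${spark.app.name}")
  // Optionally expose executor metrics in Prometheus format (Spark 3.0+).
  .config("spark.ui.prometheus.enabled", "true")
  .getOrCreate()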

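For orientation, here is a simplified, hypothetical sketch of what such a streaming job can look like: reading messages from Kafka with Structured Streaming and appending each micro-batch to BigQuery through the spark-bigquery-connector (assumed to be on the classpath). The broker address, topic, dataset/table, GCS bucket, and checkpoint path are all placeholders, not the values used by the actual jar.

import org.apache.spark.sql.{DataFrame, SparkSession}

object KafkaToBigQuery {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-streaming-bigquery")
      .getOrCreate()

    // Read raw messages from Kafka and decode key/value as strings.
    val messages = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "kafka:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

    // Append each micro-batch to BigQuery; the connector stages data in GCS.
    val writeToBigQuery = (batch: DataFrame, batchId: Long) => {
      batch.write
        .format("bigquery")
        .option("temporaryGcsBucket", "my-temp-bucket")
        .mode("append")
        .save("my_dataset.events")
    }

    val query = messages.writeStream
      .foreachBatch(writeToBigQuery)
      .option("checkpointLocation", "/tmp/checkpoints/kafka-to-bigquery")
      .start()

    query.awaitTermination()
  }
}

foreachBatch is used here so that the connector's batch writer can be reused for each micro-batch; recent connector versions also support writing a stream directly with writeStream.format("bigquery").
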
Usage

Assuming that Docker is installed, execute the following command to build and run the Docker containers:

docker-compose -f docker-compose.spark.yaml -f docker-compose.kafka.yaml build && docker-compose -f docker-compose.spark.yaml -f docker-compose.kafka.yaml up

To shut down the Docker containers, execute the following command:

docker-compose -f docker-compose.spark.yaml -f docker-compose.kafka.yaml down
