A demo project that uses Spark Streaming to analyze popular hashtags from Twitter. Tweets are read from the Twitter Streaming API and published to Kafka. The consumer (com.twitter.producer.service) reads them from Kafka and processes them as a stream with Spark Streaming.
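To make the "popular hashtags" step concrete, here is a minimal sketch in plain Java, without Spark or Kafka, of what such a job might compute for one batch of tweet texts. The class name HashtagCounter, its method names and the sample tweets are hypothetical and are not taken from this project; the real consumer presumably does the equivalent over a Spark stream.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration: counts hashtags in an in-memory batch of tweets.
public class HashtagCounter {

    // '#' followed by one or more ASCII letters, digits or underscores.
    private static final Pattern HASHTAG = Pattern.compile("#(\\w+)");

    /** Extracts the hashtags of one tweet, lower-cased and without the '#'. */
    public static List<String> extractHashtags(String text) {
        List<String> tags = new ArrayList<>();
        Matcher m = HASHTAG.matcher(text);
        while (m.find()) {
            tags.add(m.group(1).toLowerCase(Locale.ROOT));
        }
        return tags;
    }

    /** Counts how often each hashtag occurs across a batch of tweets. */
    public static Map<String, Integer> countHashtags(List<String> tweets) {
        Map<String, Integer> counts = new HashMap<>();
        for (String tweet : tweets) {
            for (String tag : extractHashtags(tweet)) {
                counts.merge(tag, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns up to n hashtags, most frequent first; ties are ordered alphabetically. */
    public static List<String> topHashtags(List<String> tweets, int n) {
        Map<String, Integer> counts = countHashtags(tweets);
        List<String> tags = new ArrayList<>(counts.keySet());
        tags.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(tags.subList(0, Math.min(n, tags.size())));
    }

    public static void main(String[] args) {
        List<String> tweets = new ArrayList<>();
        tweets.add("Learning #Spark and #Kafka");
        tweets.add("#spark streaming is fun");
        tweets.add("#Docker + #kafka + #SPARK");
        System.out.println(topHashtags(tweets, 2)); // prints [spark, kafka]
    }
}
```

In a streaming setting the same count would typically be computed per micro-batch or over a sliding window rather than over a fixed list.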
- Apache Maven 3.x
- JVM 8
- Docker machine
- A registered Twitter application. The following guide may also be helpful: How to create a Twitter application.
- Change the Twitter configuration in \producer\src\main\resources\application.yml to use your API key, client ID and secret ID.
- Run the Kafka image using docker-compose (keep in mind that the Kafka image needs to pull ZooKeeper too):
~> docker-compose -f producer/src/main/docker/kafka-docker-compose.yml up -d
- Check that ZooKeeper and Kafka are running (from the command prompt):
~> docker ps
- Run the producer and consumer apps with:
~> mvn spring-boot:run
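If the apps fail to connect, the docker ps check can be complemented by testing whether the broker ports accept TCP connections. The sketch below is a hypothetical helper, not part of this project. It assumes the conventional default ports (2181 for ZooKeeper, 9092 for Kafka) and that the containers are reachable on localhost; with Docker Machine the host is usually the VM's IP address instead, and the ports are whatever kafka-docker-compose.yml actually maps.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks whether a TCP port accepts connections.
public class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    public static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host could not be resolved.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed host and ports; adjust to match your Docker setup.
        System.out.println("ZooKeeper (2181): " + isReachable("localhost", 2181, 1000));
        System.out.println("Kafka (9092): " + isReachable("localhost", 9092, 1000));
    }
}
```

An open port only shows that something is listening; it does not prove the broker is healthy, so treat this as a quick first check.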