Kafka Streams examples

This sub-folder contains code examples that demonstrate how to implement real-time processing applications using Kafka Streams, which is a new stream processing library included with the Apache Kafka open source project.


This repository has several branches to help you find the correct code examples for the version of Apache Kafka and/or Confluent Platform that you are using. See Version Compatibility Matrix below for details.

There are two kinds of examples:

  • Examples under src/main/: These examples are short and concise. Also, you can interactively test-drive these examples, e.g. against a local Kafka cluster. If you want to actually run these examples, then you must first install and run Apache Kafka and friends, which we describe in section Packaging and running the examples. Each example also states its exact requirements and instructions at the very top.
  • Examples under src/test/: These examples are a bit longer because they implement integration tests that demonstrate end-to-end data pipelines. Here, we use a testing framework to automatically spawn embedded Kafka clusters, feed input data to them (using the standard Kafka producer client), process the data using Kafka Streams, and finally read and verify the output results (using the standard Kafka consumer client). These examples are also a good starting point to learn how to implement your own end-to-end integration tests.

Note: We use the label "Lambda" to denote examples that make use of lambda expressions and thus require Java 8+.

  • WordCountLambdaExample -- demonstrates, using the Kafka Streams DSL, how to implement the WordCount program that computes a simple word occurrence histogram from an input text.
  • MapFunctionLambdaExample -- demonstrates how to perform stateless transformations via map functions, using the Kafka Streams DSL (see also the Scala variant MapFunctionScalaExample)
  • SumLambdaExample -- demonstrates how to perform stateful transformations via reduce, using the Kafka Streams DSL
  • PageViewRegionLambdaExample -- demonstrates how to perform a join between a KStream and a KTable, i.e. an example of a stateful computation
    • Variant: PageViewRegionExample, which implements the same example but without lambda expressions and thus works with Java 7+.
  • Working with data in Apache Avro format (see also the end-to-end demos under integration tests below):
  • SecureKafkaStreamsExample (Java 7+) -- demonstrates how to configure Kafka Streams for secure stream processing (here: encrypting data-in-transit and enabling client authentication so that the Kafka Streams application authenticates itself to the Kafka brokers)
  • StateStoresInTheDSLIntegrationTest (Java 8+) -- demonstrates how to use state stores in the Kafka Streams DSL
  • WordCountInteractiveQueriesExample (Java 8+) -- demonstrates the Interactive Queries feature to locate and query state stores of a Kafka Streams application from other applications; here, we opted to use a REST API to implement the required RPC layer to allow applications to talk to each other
  • KafkaMusicExample (Java 8+) -- demonstrates how to build a simple music charts application. It uses the Interactive Queries feature to query the state stores for the latest top five songs, and shows how to locate the KafkaStreams instance for a given store and key and retrieve the values via a REST API.
  • HandlingCorruptedInputRecordsIntegrationTest (Java 8+) -- demonstrates how to handle corrupt input records (think: poison pill messages)
  • MixAndMatchLambdaIntegrationTest (Java 8+) -- demonstrates how to mix and match the DSL and the Processor API via KStream#transform() and KStream#process(), which allow you to include custom Transformer and Processor implementations, respectively, within topologies defined via the DSL
  • ApplicationResetExample (Java 8+) -- demonstrates the usage of the application reset tool (bin/kafka-streams-application-reset)
  • GlobalKTablesExample (Java 8+) -- demonstrates how to join a KStream with a GlobalKTable.
  • And further examples.
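
As a rough illustration of what a "word occurrence histogram" means, the following is a minimal sketch in plain Java collections. It is not the Kafka Streams DSL and not the code of WordCountLambdaExample; the class name, the method name, and the tokenization rule (splitting on non-word characters) are assumptions made for this sketch. It follows the same conceptual steps a streaming WordCount takes: split each line into words, group by word, count per group.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Plain-Java illustration only; this is NOT the Kafka Streams DSL.
final class WordCountSketch {

    // Assumption: words are separated by runs of non-word characters.
    private static final Pattern NON_WORD = Pattern.compile("\\W+");

    // Splits every line into lowercase words, groups by word, and counts each group.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(NON_WORD.split(line.toLowerCase(Locale.ROOT))))
                .filter(word -> !word.isEmpty()) // drop empty tokens produced by leading separators
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello kafka streams", "all streams lead to kafka");
        // Prints {all=1, hello=1, kafka=2, lead=1, streams=2, to=1}
        System.out.println(countWords(lines));
    }
}
```

The difference in the streaming version is that the input never ends, so the counts are continuously updated as new lines arrive instead of being computed once over a fixed list.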

We also provide several integration tests, which demonstrate end-to-end data pipelines. Here, we spawn embedded Kafka clusters and the Confluent Schema Registry, feed input data to them (using the standard Kafka producer client), process the data using Kafka Streams, and finally read and verify the output results (using the standard Kafka consumer client).

Tip: Run mvn test to launch the integration tests.

The code in this repository requires Apache Kafka 0.10+ because from this point onwards Kafka includes its Kafka Streams library. See Version Compatibility Matrix for further details, as different branches of this repository may have different Kafka requirements.

When using the master branch: The master branch typically requires the latest trunk version of Apache Kafka (cf. kafka.version in pom.xml for details). The following instructions will build and locally install the latest trunk Kafka version:

$ git clone git@github.com:apache/kafka.git
$ cd kafka
$ git checkout trunk

# Bootstrap gradle wrapper
$ gradle

# Now build and install Kafka locally
$ ./gradlew clean installAll

To add the Kafka Streams library to your application when using Confluent Platform and maven (see pom.xml and Kafka Streams: libraries and maven artifacts for details):

<!-- pom.xml -->
<repositories>
  <repository>
    <id>confluent</id>
    <url>http://packages.confluent.io/maven/</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-streams</artifactId>
    <version>0.10.2.0-cp1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>0.10.2.0-cp1</version>
  </dependency>
</dependencies>

To add the Kafka Streams library to your application when using Confluent Platform and gradle:

repositories {
  maven { url "http://packages.confluent.io/maven/" }
}

dependencies {
  compile "org.apache.kafka:kafka-streams:0.10.2.0-cp1"
  compile "org.apache.kafka:kafka-clients:0.10.2.0-cp1"
}

To add the Kafka Streams library to your application when using Confluent Platform and sbt (Scala):

resolvers ++= Seq(
  "confluent-repository" at "http://packages.confluent.io/maven/"
)

libraryDependencies ++= Seq(
  "org.apache.kafka" % "kafka-streams" % "0.10.2.0-cp1",
  "org.apache.kafka" % "kafka-clients" % "0.10.2.0-cp1"
)

The code in this repository requires Confluent Platform 3.2.x. See Version Compatibility Matrix for further details, as different branches of this repository may have different Confluent Platform requirements.

If you just run the integration tests (mvn test), then you do not need to install anything -- all maven artifacts will be downloaded automatically for the build. However, if you want to interactively test-drive the examples under src/main/ (such as WordCountLambdaExample), then you do need to install Confluent Platform. See Packaging and running the examples below. Also, each example states its exact requirements at the very top.

Some code examples require Java 8, primarily because of the usage of lambda expressions.
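
To make the difference concrete, here is a minimal sketch of the same logic written once in Java 7 style (anonymous inner class) and once as a Java 8 lambda. The nested `Mapper` interface is hypothetical and defined only for this sketch; it stands in for single-method callback interfaces such as Kafka Streams' `ValueMapper` and is not the library's own type.

```java
import java.util.Locale;

final class LambdaVsAnonymousSketch {

    // Hypothetical single-method interface, defined here only for illustration.
    interface Mapper<V, R> {
        R apply(V value);
    }

    // Java 7 style: an anonymous inner class implements the interface.
    static final Mapper<String, String> UPPERCASE_JAVA7 = new Mapper<String, String>() {
        @Override
        public String apply(String value) {
            return value.toUpperCase(Locale.ROOT);
        }
    };

    // Java 8 style: the same logic expressed as a lambda expression.
    static final Mapper<String, String> UPPERCASE_JAVA8 = value -> value.toUpperCase(Locale.ROOT);

    public static void main(String[] args) {
        // Both variants behave identically; only the syntax differs.
        System.out.println(UPPERCASE_JAVA7.apply("kafka"));
        System.out.println(UPPERCASE_JAVA8.apply("kafka"));
    }
}
```

Both variants print `KAFKA`. Examples labeled "Lambda" use the second form, which is why they need Java 8+.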

IntelliJ IDEA users:

  • Open File > Project structure
  • Select "Project" on the left.
    • Set "Project SDK" to Java 1.8.
    • Set "Project language level" to "8 - Lambdas, type annotations, etc."

Scala is required only for the Scala examples in this repository. If you are a Java developer you can safely ignore this section.

If you want to experiment with the Scala examples in this repository, you need a version of Scala that supports Java 8 and SAM / Java lambda (e.g. Scala 2.11 with -Xexperimental compiler flag, or 2.12).

Tip: If you only want to run the integration tests (mvn test), then you do not need to package or install anything -- just run mvn test. The instructions below are only needed if you want to interactively test-drive the examples under src/main/.

The first step is to install and run a Kafka cluster, which must consist of at least one Kafka broker as well as at least one ZooKeeper instance. Some examples may also require a running instance of Confluent Schema Registry. The Confluent Platform Quickstart guide provides the full details.

In a nutshell:

# Start ZooKeeper
$ ./bin/zookeeper-server-start ./etc/kafka/zookeeper.properties

# In a separate terminal, start Kafka broker
$ ./bin/kafka-server-start ./etc/kafka/server.properties

# In a separate terminal, start Confluent schema registry
$ ./bin/schema-registry-start ./etc/schema-registry/schema-registry.properties

# Again, please refer to the Confluent Platform Quickstart for details such as
# how to download Confluent Platform, how to stop the above three services, etc.

Tip: You can also run mvn test, which executes the included integration tests. These tests spawn embedded Kafka clusters to showcase the Kafka Streams functionality end-to-end. The benefit of the integration tests is that you don't need to install and run a Kafka cluster yourself.

If you want to run the examples against a Kafka cluster, you may want to create a standalone jar ("fat jar") of the Kafka Streams examples via:

# Create a standalone jar
#
# Tip: You can also disable the test suite (e.g. to speed up the packaging
#      or to lower JVM memory usage) if needed:
#
#     $ mvn -DskipTests=true clean package
#
$ mvn clean package

# >>> Creates target/streams-examples-3.2.0-standalone.jar

You can now run the example applications as follows:

# Run an example application from the standalone jar.
# Here: `WordCountLambdaExample`
$ java -cp target/streams-examples-3.2.0-standalone.jar \
  io.confluent.examples.streams.WordCountLambdaExample

Keep in mind that the machine on which you run the command above must have access to the Kafka/ZK clusters you configured in the code examples. By default, the code examples assume the Kafka cluster is accessible via localhost:9092 (Kafka broker) and the ZooKeeper ensemble via localhost:2181.
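
If your cluster is not reachable at the default address, one option is to make the broker address configurable rather than hard-coded. The following is a minimal sketch under stated assumptions: the class name, the method name, and the application id `my-example-app` are hypothetical, and the examples in this repository may set their configuration differently. The keys `application.id` and `bootstrap.servers` are standard Kafka Streams settings.

```java
import java.util.Properties;

final class StreamsConfigSketch {

    // Builds the minimal settings a Kafka Streams application needs to find its cluster.
    static Properties buildConfig(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        props.put("application.id", applicationId);       // identifies this Streams application
        props.put("bootstrap.servers", bootstrapServers); // Kafka broker(s) to connect to
        return props;
    }

    public static void main(String[] args) {
        // An optional first argument overrides the default broker address.
        String bootstrapServers = args.length > 0 ? args[0] : "localhost:9092";
        Properties props = buildConfig("my-example-app", bootstrapServers);
        System.out.println("Using brokers: " + props.getProperty("bootstrap.servers"));
    }
}
```

In a real application, a `Properties` object like this would typically be passed on when constructing the `KafkaStreams` instance.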

This project uses the standard maven lifecycle and commands such as:

$ mvn compile # This also generates Java classes from the Avro schemas
$ mvn test    # Runs unit and integration tests

Version Compatibility Matrix

| Branch (this repo) | Apache Kafka | Confluent Platform | Notes |
|---|---|---|---|
| master | 0.11.0.0-SNAPSHOT | 3.3.0-SNAPSHOT | You must manually build the trunk version of Apache Kafka. See instructions above. |
| 3.2.x | 0.10.2.0(-cp1) | 3.2.0 | Works out of the box |
| 3.1.x | 0.10.1.1 [preferred], 0.10.1.0(-cp2) | 3.1.1 | Works out of the box |
| kafka-0.10.0.1-cp-3.0.1 | 0.10.0.1(-cp1) | 3.0.1 | Works out of the box |
| kafka-0.10.0.0-cp-3.0.0 | 0.10.0.0(-cp1) | 3.0.0 | Works out of the box |

The master branch of this repository represents active development, and may require additional steps on your side to make it compile. Check this README as well as pom.xml for any such information.