docs: add kafka-java to Kafka source/sink docs #2347

Merged 3 commits on Jan 22, 2025
22 changes: 20 additions & 2 deletions docs/user-guide/sinks/kafka.md
@@ -1,16 +1,34 @@
# Kafka Sink

Two methods are available for integrating Kafka topics into your Numaflow pipeline:
using a user-defined Kafka Sink or opting for the built-in Kafka Sink provided by Numaflow.

## Option 1: User-Defined Kafka Sink

Developed and maintained by the Numaflow contributor community,
the [Kafka Sink](https://github.com/numaproj-contrib/kafka-java) provides a reliable and feature-complete solution for publishing messages to Kafka topics.

Key Features:

* **Customization:** Offers complete control over Kafka Sink configurations to tailor to specific requirements.
* **Kafka Java Client Utilization:** Leverages the Kafka Java client for reliable message publishing to Kafka topics.
* **Schema Management:** Integrates seamlessly with the Confluent Schema Registry to support schema validation and manage schema evolution effectively.

More details on how to use the Kafka Sink can be found [here](https://github.com/numaproj-contrib/kafka-java?tab=readme-ov-file#write-data-to-kafka).
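As a rough sketch of how a user-defined sink plugs into a pipeline, a vertex would reference the kafka-java container via `udsink` (the image name and tag below are placeholders, not a published image; see the kafka-java README for the real image and required configuration):

```yaml
spec:
  vertices:
    - name: kafka-sink
      sink:
        udsink:
          container:
            # Illustrative image reference only; consult the kafka-java
            # README for the actual published image and config mounts.
            image: my-registry/kafka-java-sink:v0.1.0
```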

## Option 2: Built-in Kafka Sink
Review discussion on this section:

> **Member:** Reverse the order of the 2 options?
>
> **Member:** We are eventually deprecating the builtin, right?
>
> **Author:** I put kafka-java first because I want it to get some attention and feedback so that we can further improve it toward being fully feature-complete.
>
> **Author:** @whynowy mind if I keep kafka-java as first?
>
> **Member:** Go ahead. I assume we will use this one as the builtin soon.
>
> **Author:** Thanks!

A `Kafka` sink is used to forward messages to a Kafka topic. The Kafka sink supports configuration overrides.

### Kafka Headers

Numaflow inserts the message `keys` into the Kafka record headers. Since `keys` is an array, the keys are added in the following format:

* `__keys_len` holds the number of keys in the header; if `__keys_len` is `0`, no keys are present.
* `__keys_%d` holds the individual keys, e.g., `__keys_0` is the first key, and so forth.
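The header scheme above can be illustrated with a minimal round-trip sketch in plain Python (this is not Numaflow's actual implementation, just an encoding of the documented format):

```python
def keys_to_headers(keys):
    """Encode a message's keys as Kafka-style headers.

    __keys_len records how many keys follow; 0 means no keys.
    Each key lands in __keys_0, __keys_1, ... as header bytes.
    """
    headers = {"__keys_len": str(len(keys)).encode()}
    for i, key in enumerate(keys):
        headers[f"__keys_{i}"] = key.encode()
    return headers


def headers_to_keys(headers):
    """Recover the keys array from the headers produced above."""
    n = int(headers["__keys_len"].decode())
    return [headers[f"__keys_{i}"].decode() for i in range(n)]
```

A consumer on the other side of the topic can apply the same convention to rebuild the original `keys` array.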

### Example

```yaml
spec:
```
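The example above is collapsed in this diff view and shows only the start of the spec. A fuller sketch of a built-in Kafka sink vertex might look like the following (broker addresses and topic name are placeholders; `brokers` and `topic` are the core fields):

```yaml
spec:
  vertices:
    - name: kafka-output
      sink:
        kafka:
          brokers:
            # Placeholder broker addresses
            - my-broker1:19700
            - my-broker2:19700
          topic: my-topic
```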
21 changes: 20 additions & 1 deletion docs/user-guide/sources/kafka.md
@@ -1,6 +1,25 @@
# Kafka Source

Two methods are available for integrating Kafka topics into your Numaflow pipeline:
using a user-defined Kafka Source or opting for the built-in Kafka Source provided by Numaflow.

## Option 1: User-Defined Kafka Source

Developed and maintained by the Numaflow contributor community,
the [Kafka Source](https://github.com/numaproj-contrib/kafka-java) offers a robust and feature-complete solution
for integrating Kafka as a data source into your Numaflow pipeline.

Key Features:

* **Flexibility:** Allows full customization of Kafka Source configurations to suit specific needs.
* **Kafka Java Client Utilization:** Leverages the Kafka Java client for robust message consumption from Kafka topics.
* **Schema Management:** Integrates seamlessly with the Confluent Schema Registry to support schema validation and manage schema evolution effectively.

More details on how to use the Kafka Source can be found [here](https://github.com/numaproj-contrib/kafka-java/blob/main/README.md#read-data-from-kafka).
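Mirroring the sink side, a user-defined source would be wired into a pipeline via `udsource` (the image name and tag below are placeholders, not a published image; see the kafka-java README for the real image and required configuration):

```yaml
spec:
  vertices:
    - name: kafka-source
      source:
        udsource:
          container:
            # Illustrative image reference only; consult the kafka-java
            # README for the actual published image and config mounts.
            image: my-registry/kafka-java-source:v0.1.0
```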

## Option 2: Built-in Kafka Source

Numaflow provides a built-in `Kafka` source to ingest messages from a Kafka topic. The source uses consumer-groups to manage offsets.

```yaml
spec:
```
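As with the sink, this example is collapsed in the diff view. A fuller sketch of a built-in Kafka source vertex might look like the following (broker addresses, topic, and consumer group name are placeholders):

```yaml
spec:
  vertices:
    - name: kafka-input
      source:
        kafka:
          brokers:
            # Placeholder broker addresses
            - my-broker1:19700
            - my-broker2:19700
          topic: my-topic
          consumerGroup: my-consumer-group
```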