Delete guides + move useful guides to docs (#62)
* fix sidebar positions

* delete guides

* .md to .mdx

* address reviews

* add release link

* address reviews 2
maha-hajja authored Mar 21, 2024
1 parent abc0bc2 commit df6158b
Showing 10 changed files with 179 additions and 1,201 deletions.
117 changes: 117 additions & 0 deletions docs/connectors/kafka-connect-connector.mdx
@@ -0,0 +1,117 @@
---
title: "Kafka Connect Connectors with Conduit"
sidebar_position: 8
---

# Using Kafka Connect Connectors with Conduit

The [Conduit Kafka Connect Wrapper connector](https://github.com/ConduitIO/conduit-kafka-connect-wrapper) is a special
connector that allows you to use [Kafka Connect](https://docs.confluent.io/platform/current/connect/index.html)
connectors with Conduit. Conduit doesn't come bundled with Kafka Connect connectors, but you can use this wrapper to
bring any Kafka Connect connector into a Conduit pipeline.

This connector gives you the ability to:

- Easily migrate from Kafka Connect to Conduit.
- Remove Kafka as a dependency for moving data between pieces of data infrastructure.
- Leverage a datastore if Conduit doesn't have a native connector.

Since the Conduit Kafka Connect Wrapper is written in Java, while most of Conduit's connectors are written in Go, it
also serves as a good example of the flexibility of the Conduit Plugin SDK.

Let's begin.

## How it works

To use the Kafka Connect wrapper connector, you'll need to:

1. Download the latest release of [`conduit-kafka-connect-wrapper`](https://github.com/ConduitIO/conduit-kafka-connect-wrapper).
At the time of writing, that is [v0.4.3](https://github.com/ConduitIO/conduit-kafka-connect-wrapper/releases/tag/v0.4.3),
so we'll download the file `conduit-kafka-connect-wrapper-v0.4.3.zip`.
1. Download Kafka Connect JARs and any dependencies you would like to add.
1. Create a pipeline configuration file.
1. Run Conduit.


## Setup

To begin, download the [latest release](https://github.com/ConduitIO/conduit-kafka-connect-wrapper/releases) of
[`conduit-kafka-connect-wrapper`](https://github.com/ConduitIO/conduit-kafka-connect-wrapper). At the time of writing,
that is `v0.4.3`, so we'll download the file `conduit-kafka-connect-wrapper-v0.4.3.zip`.
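
For example, fetching and unpacking the release from the command line might look like this (a sketch; it assumes the
standard GitHub release-asset URL layout for this repository):

```bash
# download the v0.4.3 release asset and unpack it
curl -LO https://github.com/ConduitIO/conduit-kafka-connect-wrapper/releases/download/v0.4.3/conduit-kafka-connect-wrapper-v0.4.3.zip
unzip conduit-kafka-connect-wrapper-v0.4.3.zip
```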

Downloading the release is our **preferred** way to get the connector JAR. Alternatively, you can build the JAR from
source, which is useful if you've made changes to the connector and want to test them out. To do that, first clone the
repository:
```bash
git clone git@github.com:ConduitIO/conduit-kafka-connect-wrapper.git
```

Then, we need to build the connector. The Kafka Connect wrapper connector is written in Java, so it needs to be compiled.
```bash
cd conduit-kafka-connect-wrapper

./scripts/dist.sh
```

Running `scripts/dist.sh` will create a directory called `dist` with the following contents:
1. A script that starts a connector instance.
1. A `libs` directory. This is where you put the connector JAR itself, along with the Kafka Connect connector JARs and their dependencies (if any).
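
Assuming the build succeeded, listing the directory should show something like this (the exact script name may differ
between releases):

```bash
ls dist
# conduit-kafka-connect-wrapper  libs/
```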

Now that we have everything set up, we can add the Kafka Connect connectors.

## Download a connector and its dependencies

The `libs` directory is where you put the Kafka Connect connector JARs and their dependencies (if any). The wrapper
plugin will automatically load connectors and all other dependencies from the `libs` directory.

To download a connector from a Maven repository and all of its dependencies, you can use `scripts/download-connector.sh`.
For example:
```shell
./scripts/download-connector.sh io.example jdbc-connector 2.1.3
```
For usage, run `./scripts/download-connector.sh --help`.

You can also download the JARs manually if needed. In this guide, we'll use Aiven's Kafka Connect JDBC connector together with the PostgreSQL driver. To install them, download the following into `libs`:
- [Aiven's Kafka Connect JDBC Connectors](https://github.com/aiven/jdbc-connector-for-apache-kafka)
- [Postgres Connector JAR](https://repo1.maven.org/maven2/org/postgresql/postgresql/42.3.3/postgresql-42.3.3.jar)

This connector allows you to connect to any [JDBC database](https://en.wikipedia.org/wiki/Java_Database_Connectivity).
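
For example, the PostgreSQL driver JAR linked above can be downloaded straight into `libs` (a sketch; adjust the path
and version to your setup):

```bash
# download the PostgreSQL JDBC driver into the wrapper's libs directory
curl -L -o dist/libs/postgresql-42.3.3.jar \
  https://repo1.maven.org/maven2/org/postgresql/postgresql/42.3.3/postgresql-42.3.3.jar
```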

## Create a pipeline configuration file

Now that the Kafka Connect connector JARs are included in `libs`, we can use them in a pipeline configuration file.

1. [Install Conduit](https://github.com/ConduitIO/conduit#installation-guide).
2. Create a pipeline configuration file: create a folder called `pipelines` at the same level as your Conduit
binary. Inside that folder, create a file named `jdbc-to-file.yml`. Check [Specifications](https://conduit.io/docs/pipeline-configuration-files/specifications)
for more details about pipeline configuration files.

```yaml
version: 2.0
pipelines:
  - id: kafka-connect-pipeline
    status: running
    description: This pipeline is for testing
    connectors:
      - id: jdbc-kafka-connect
        type: source
        plugin: standalone:conduit-kafka-connect-wrapper
        settings:
          wrapper.connector.class: "io.aiven.connect.jdbc.JdbcSourceConnector"
          connection.url: "jdbc:postgresql://localhost/conduit-test-db"
          connection.user: "username"
          connection.password: "password"
          incrementing.column.name: "id"
          mode: "incrementing"
          tables: "customers"
          topic.prefix: "my_topic_prefix"
      - id: file-dest
        type: destination
        plugin: builtin:file
        settings:
          path: "path/to/the/file.txt"
```
3. Run Conduit, and see how simple it is to migrate to Conduit!
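
If the `pipelines` folder sits next to the Conduit binary, starting Conduit from that directory is enough, since
pipeline configuration files are picked up from `./pipelines` by default:

```bash
./conduit
```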

Note that the `wrapper.connector.class` should be a class which is present on the classpath, i.e. in one of the JARs in
the `libs` directory.
For more information, check the [Wrapper Configuration](https://github.com/ConduitIO/conduit-kafka-connect-wrapper#configuration) section.
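
To verify that the class is on the classpath, you can list the contents of the JARs in `libs` with standard tooling
(a sketch; the JAR file name below is illustrative):

```bash
# check that the connector class is packaged in one of the JARs in libs
unzip -l dist/libs/jdbc-connector-for-apache-kafka-*.jar | grep JdbcSourceConnector
```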
66 changes: 62 additions & 4 deletions docs/features/stream-inspector.mdx
@@ -9,13 +9,71 @@ Conduit's stream inspector makes it possible to peek at the data as it enters Conduit and to see what the
data looks like as it travels to destination connectors. Keep in mind that this feature is about sampling data as it
passes through the pipeline, not tailing the pipeline.


Stream inspection doesn't affect a pipeline's performance. However, if data is being pushed through the stream very
fast, you may not see all of the records. This is done to protect Conduit's performance (by not keeping too much data
in memory) and to protect the pipeline's performance (by not blocking it until data is inspected).

The inspection will **not** be automatically stopped if the connector being inspected is stopped. This makes it
possible to "catch" all the records, from the moment a connector starts. It also makes it possible to inspect records
as you're potentially restarting a pipeline.

Stream inspection is available via the Conduit UI and API:

## UI

To access the stream inspector through the UI, first navigate to the pipeline which you'd like to inspect. Then, click
on the connector in which you're interested. You'll see something similar to this:

![stream inspector pipeline view](/img/stream-inspector-pipeline.png)

Click the "Inspect Stream" button to start inspecting the connector. A new pop-up window will show the records:

![stream inspector show stream](/img/stream-inspector-show-stream.png)

On the "Stream" tab you'll see the latest 10 records. If you switch to the "Single record" view, only the last record
will be shown. You can use the "Pause" button to pause the inspector and stop receiving the latest record(s). The ones
that are already shown will be kept so you can inspect them more thoroughly.

## API

To access the stream inspector through the API, you'll need a WebSocket client (for example [wscat](https://github.com/websockets/wscat)).
The URL on which the inspector is available comes in the following format: `ws://host:port/v1/connectors/<connector ID>/inspect`.
For example, if you run Conduit locally with the default settings, you can inspect a connector by running the following
command:

```shell
$ wscat -c ws://localhost:8080/v1/connectors/pipeline1:destination1/inspect | jq .
{
"result": {
"position": "NGVmNTFhMzUtMzUwMi00M2VjLWE2YjEtMzdkMDllZjRlY2U1",
"operation": "OPERATION_CREATE",
"metadata": {
"opencdc.readAt": "1669886131666337227"
},
"key": {
"rawData": "NzQwYjUyYzQtOTNhOS00MTkzLTkzMmQtN2Q0OWI3NWY5YzQ3"
},
"payload": {
"before": {
"rawData": ""
},
"after": {
"structuredData": {
"company": "string 1d4398e3-21cf-41e0-9134-3fe012e6d1fb",
"id": 1534737621,
"name": "string fbc664fa-fdf2-4c5a-b656-d52cbddab671",
"trial": true
}
}
}
}
}
```

The command above uses `jq` to pretty-print the output. `jq` can also decode Base64-encoded strings, which may
represent record positions, keys, or payloads:

```shell
wscat -c ws://localhost:8080/v1/connectors/pipeline1:destination1/inspect | jq '.result.key.rawData |= @base64d'
```
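
The same pattern can be chained to decode several fields at once, for example the record position together with the key
(a sketch following the filter above):

```shell
wscat -c ws://localhost:8080/v1/connectors/pipeline1:destination1/inspect | jq '.result.position |= @base64d | .result.key.rawData |= @base64d'
```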

101 changes: 0 additions & 101 deletions guides/2022-01-24-conduit-file-to-file.mdx

This file was deleted.

