
Commit 0ece5c1

Added a JSON payload formatter (#22)
* Initial working version of JsonPayloadFormatter and unit tests
* make JsonPayloadFormatter Configurable
* Add key/value schema visibility control
* Use logging for formatter test output; configure with src/test/resources/simplelogger.properties
* Fix schema visibility
* Java import clean-up
* Test lambda function dumps event as json
* Add JsonPayloadFormatter description to README
* Restored connect node
* Use enum for visibility; updates from review comments
* Use overloaded methods
* Add support for batch in JsonPayloadFormatter
* Clean up Invocation payloads section
* Add integer, long, boolean key tests; remove some constants in tests as it actually made the code less understandable
* Cleaned up example avro schema
* Remove testing artifact names from example
* v1.0.0
1 parent 0c9caca commit 0ece5c1

15 files changed (+1526, -91 lines)

README.md

Lines changed: 95 additions & 8 deletions
````diff
@@ -15,7 +15,7 @@ _The `kafka-connect-lambda` connector has been tested with `connect-api:2.1.0` a
 
 # Configuring
 
-In addition to the standard [Kafka Connect connector configuration](https://kafka.apache.org/documentation/#connect_configuring) properties, the `kafka-connect-lambda` properties available:
+In addition to the standard [Kafka Connect connector configuration](https://kafka.apache.org/documentation/#connect_configuring) properties, the `kafka-connect-lambda` properties available are:
 
 | Property | Required | Default value | Description |
 |:---------|:---------|:--------|:------------|
@@ -32,7 +32,21 @@ In addition to the standard [Kafka Connect connector configuration](https://kafk
 | `retry.backoff.millis` | No | `500` | Time to append between invocation retries |
 | `retries.max` | No | `5` | Maximum number of invocation retries |
 | `topics` | Yes | | Comma-delimited Kafka topics names to sink |
+| `payload.formatter.class` | No | `com.nordstrom.kafka.connect.formatters.PlainPayloadFormatter` | Specifies the formatter to use. |
+| `payload.formatter.key.schema.visibility` | No | `min` | Determines whether the key schema (if present) is included. Only applies to `JsonPayloadFormatter`. |
+| `payload.formatter.value.schema.visibility` | No | `min` | Determines whether the value schema (if present) is included. Only applies to `JsonPayloadFormatter`. |
 
+## Formatters
+
+The connector includes two `payload.formatter.class` implementations:
+
+* `com.nordstrom.kafka.connect.formatters.PlainPayloadFormatter`
+* `com.nordstrom.kafka.connect.formatters.JsonPayloadFormatter`
+
+Including the full schema information in the invocation payload may result in very large messages. Therefore, use the `schema.visibility` key and value properties to control how much of the schema, if present, to include in the invocation payload: `none`, `min`, or `all` (default=`min`). These settings apply to the `JsonPayloadFormatter` only; the `PlainPayloadFormatter` always includes the `min` schema information.
+
+
+## Configuration Examples
 An example configuration represented as JSON data for use with the [Kafka Connect REST interface](https://docs.confluent.io/current/connect/references/restapi.html):
 
 ```json
@@ -65,18 +79,91 @@ By supplying `com.nordstrom.kafka.connect.auth.AWSAssumeRoleCredentialsProvider`
 
 The default invocation payload is a JSON representation of a [SinkRecord](https://kafka.apache.org/21/javadoc/org/apache/kafka/connect/sink/SinkRecord.html) object, which contains the Kafka message in the `value` field. When `aws.lambda.batch.enabled` is `true`, the invocation payload is an array of these records.
 
-Example payload:
+## Avro schema
+
+This simple schema record describes our "hello, world" message.
+
+```
+{
+  "type": "record",
+  "name": "Hello",
+  "doc": "An example Avro-encoded `Hello` message.",
+  "namespace": "com.nordstrom.kafka.example",
+  "fields": [
+    {
+      "name": "language",
+      "type": {
+        "type": "enum",
+        "name": "language",
+        "symbols": [ "ENGLISH", "FRENCH", "ITALIAN", "SPANISH"
+        ]
+      }
+    },
+    {
+      "name": "greeting",
+      "type": "string"
+    }
+  ]
+}
+```
+
+### PlainPayloadFormatter
+
+This example uses the following (partial) connector configuration, which defaults to `payload.formatter.class=com.nordstrom.kafka.connect.formatters.PlainPayloadFormatter`:
+
+```properties
+key.converter=org.apache.kafka.connect.storage.StringConverter
+value.converter=io.confluent.connect.avro.AvroConverter
+aws.lambda.batch.enabled=false
+```
+
+Expected output:
 
 ```json
 {
+  "key": "my_key",
+  "keySchemaName": null,
+  "value": "Struct{language=ENGLISH,greeting=hello, world}",
+  "valueSchemaName": "com.nordstrom.kafka.example.Hello",
   "topic": "example-stream",
   "partition": 1,
   "offset": 0,
-  "key": "",
+  "timestamp": 1567723257583,
+  "timestampTypeName": "CreateTime"
+}
+```
+
+### JsonPayloadFormatter
+
+This example uses the following (partial) connector configuration with key and value schema visibility as `min` (the default):
+
+```properties
+key.converter=org.apache.kafka.connect.storage.StringConverter
+value.converter=io.confluent.connect.avro.AvroConverter
+aws.lambda.batch.enabled=false
+payload.formatter.class=com.nordstrom.kafka.connect.formatters.JsonPayloadFormatter
+```
+
+Expected output:
+
+```json
+{
+  "key": "my_key",
   "keySchemaName": null,
-  "value": "hello world",
-  "valueSchemaName": "example-value",
-  "timestamp": 1564961567407,
+  "keySchemaVersion": null,
+  "value": {
+    "language": "ENGLISH",
+    "greeting": "hello, world"
+  },
+  "valueSchemaName": "com.nordstrom.kafka.example.Hello",
+  "valueSchemaVersion": "1",
+  "topic": "example-stream",
+  "partition": 1,
+  "offset": 0,
+  "timestamp": 1567723257583,
   "timestampTypeName": "CreateTime"
 }
 ```
@@ -102,7 +189,7 @@ To make sure our Lambda works, invoke it directly and view the result payload in
 aws lambda invoke --function-name example-function --payload '{"value": "my example"}' result.txt
 ```
 
-The function simply sends the `payload` back to you in `result.txt`.
+The function simply sends the `payload` back to you in `result.txt` as serialized JSON.
 
 Use the `describe-stacks` command to fetch the CloudFormation output value for `ExampleFunctionArn`, which we'll need later when setting up our connector configuration:
 
@@ -116,7 +203,7 @@ aws cloudformation describe-stacks --stack-name example-lambda-stack --query "St
 mvn clean package
 ```
 
-Once built, a `kafka-connect-lambda` uber-jar is in the `target/` directory.
+Once built, a `kafka-connect-lambda` uber-jar is in the `target/plugin` directory.
 
 ## Run the connector using Docker Compose
 
````
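For reference, a connector configuration posted to the Kafka Connect REST interface that exercises the new formatter properties might look like the sketch below. It is not part of this commit; the `connector.class` value and the `aws.lambda.function.arn` property name are assumptions based on the surrounding README rather than on this diff, and the ARN is a placeholder.

```json
{
  "name": "example-lambda-connector",
  "config": {
    "connector.class": "com.nordstrom.kafka.connect.lambda.LambdaSinkConnector",
    "topics": "example-stream",
    "aws.lambda.function.arn": "arn:aws:lambda:us-west-2:123456789012:function:example-function",
    "aws.lambda.batch.enabled": "false",
    "payload.formatter.class": "com.nordstrom.kafka.connect.formatters.JsonPayloadFormatter",
    "payload.formatter.key.schema.visibility": "none",
    "payload.formatter.value.schema.visibility": "all",
    "retries.max": "5",
    "retry.backoff.millis": "500"
  }
}
```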

config/cloudformation.yml

Lines changed: 2 additions & 1 deletion
```diff
@@ -10,8 +10,9 @@ Resources:
       Role: !GetAtt 'ExampleFunctionRole.Arn'
       Code:
         ZipFile: |
+          import json
           def handler(event, context):
-            print(f"hello, {event}")
+            print(json.dumps(event))
             return event
 
   ExampleFunctionRole:
```
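With this change, the example function logs the raw invocation event as JSON instead of an f-string. A direct invocation like the README's `aws lambda invoke --payload '{"value": "my example"}' ...` call should therefore leave a CloudWatch log line along these lines (a sketch, not captured output):

```
{"value": "my example"}
```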

config/worker.properties

Lines changed: 4 additions & 2 deletions
```diff
@@ -3,9 +3,11 @@ bootstrap.servers=localhost:9092
 plugin.path=target/plugin
 offset.storage.file.filename=/tmp/connect.offsets
 
-key.converter=org.apache.kafka.connect.storage.StringConverter
+key.converter=io.confluent.connect.avro.AvroConverter
+key.converter.schema.registry.url=http://localhost:8080
 key.converter.schemas.enable=false
-value.converter=org.apache.kafka.connect.json.JsonConverter
+value.converter=io.confluent.connect.avro.AvroConverter
+value.converter.schema.registry.url=http://localhost:8080
 value.converter.schemas.enable=false
 
 internal.key.converter=org.apache.kafka.connect.json.JsonConverter
```
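With both converters switched to Avro, a quick way to publish a test record matching the README's `Hello` schema is the `kafka-avro-console-producer` bundled with the Confluent images used in `docker-compose.yml`. This is a sketch, not part of the commit: it assumes the Avro schema has been saved locally as `hello.avsc` (a hypothetical file name) and that the Schema Registry is reachable at the URL configured above.

```bash
kafka-avro-console-producer \
  --broker-list localhost:9092 \
  --topic example-stream \
  --property schema.registry.url=http://localhost:8080 \
  --property value.schema="$(cat hello.avsc)" \
  <<< '{"language": "ENGLISH", "greeting": "hello, world"}'
```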

docker-compose.yml

Lines changed: 1 addition & 0 deletions
```diff
@@ -46,6 +46,7 @@ services:
     depends_on: [broker]
     logging: { driver: none }
 
+  # NB: run connect locally in stand-alone mode to debug
   connect:
     image: confluentinc/cp-kafka-connect:5.1.3
     ports:
```
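As the new comment suggests, the connector can also be run outside Docker in stand-alone mode for debugging. A sketch using the Confluent CLI wrapper (the connector properties file name here is hypothetical; the worker configuration is the `config/worker.properties` shown above):

```bash
# Build the plugin first so that plugin.path=target/plugin can find it
mvn clean package

# Run a single Connect worker with the Lambda sink connector
connect-standalone config/worker.properties config/example-lambda-sink.properties
```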

pom.xml

Lines changed: 41 additions & 3 deletions
```diff
@@ -5,7 +5,7 @@
 
   <groupId>com.nordstrom.kafka.connect.lambda</groupId>
   <artifactId>kafka-connect-lambda</artifactId>
-  <version>1.0.4</version>
+  <version>1.1.0</version>
 
   <name>kafka-connect-lambda</name>
   <description>A Kafka Connect Connector for kafka-connect-lambda</description>
@@ -15,18 +15,48 @@
     <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
     <java.version>1.8</java.version>
 
-    <kafka.connect-api.version>2.1.0</kafka.connect-api.version>
+    <slf4j.version>1.7.25</slf4j.version>
+
+    <kafka-connect.version>2.1.0</kafka-connect.version>
+    <!-- NB: must be consistent with version in kafka-connect -->
+    <jackson.version>2.9.6</jackson.version>
     <aws-java-sdk.version>1.11.592</aws-java-sdk.version>
     <junit.version>4.12</junit.version>
     <mockito-core.version>2.28.2</mockito-core.version>
     <google.guava.version>19.0</google.guava.version>
+    <jackson.version>2.9.6</jackson.version>
   </properties>
 
   <dependencies>
     <dependency>
       <groupId>org.apache.kafka</groupId>
       <artifactId>connect-api</artifactId>
-      <version>${kafka.connect-api.version}</version>
+      <version>${kafka-connect.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.kafka</groupId>
+      <artifactId>connect-json</artifactId>
+      <version>${kafka-connect.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>com.fasterxml.jackson.core</groupId>
+      <artifactId>jackson-core</artifactId>
+      <version>${jackson.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>com.fasterxml.jackson.core</groupId>
+      <artifactId>jackson-annotations</artifactId>
+      <version>${jackson.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>com.fasterxml.jackson.core</groupId>
+      <artifactId>jackson-databind</artifactId>
+      <version>${jackson.version}</version>
+    </dependency>
+    <dependency>
+      <groupId>com.fasterxml.jackson.dataformat</groupId>
+      <artifactId>jackson-dataformat-cbor</artifactId>
+      <version>${jackson.version}</version>
     </dependency>
     <dependency>
       <groupId>com.amazonaws</groupId>
@@ -56,6 +86,12 @@
       <version>${google.guava.version}</version>
       <scope>test</scope>
     </dependency>
+    <dependency>
+      <groupId>org.slf4j</groupId>
+      <artifactId>slf4j-simple</artifactId>
+      <version>${slf4j.version}</version>
+      <scope>test</scope>
+    </dependency>
   </dependencies>
 
   <build>
@@ -143,6 +179,8 @@
             <exclude>org.apache.httpcomponents:*</exclude>
            <exclude>org.apache.kafka:*</exclude>
             <exclude>org.slf4j:*</exclude>
+            <exclude>com.fasterxml.jackson.core:*</exclude>
+            <exclude>javax.ws.rs:*</exclude>
           </excludes>
         </artifactSet>
       </configuration>
```
src/main/java/com/nordstrom/kafka/connect/formatters/JsonPayloadFormatter.java

Lines changed: 129 additions & 0 deletions
@@ -0,0 +1,129 @@ (new file)

```java
package com.nordstrom.kafka.connect.formatters;

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectWriter;
import org.apache.kafka.common.Configurable;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.json.JsonConverter;
import org.apache.kafka.connect.json.JsonConverterConfig;
import org.apache.kafka.connect.json.JsonDeserializer;
import org.apache.kafka.connect.sink.SinkRecord;

import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

import static java.util.Collections.emptyMap;

public class JsonPayloadFormatter implements PayloadFormatter, Configurable {
    enum SchemaVisibility {
        ALL,
        MIN,
        NONE
    }

    private final ObjectWriter recordWriter = new ObjectMapper().writerFor(Payload.class);
    private final ObjectWriter recordsWriter = new ObjectMapper().writerFor(Payload[].class);
    private final JsonConverter converter = new JsonConverter();
    private final JsonConverter converterSansSchema = new JsonConverter();
    private final JsonDeserializer deserializer = new JsonDeserializer();
    private SchemaVisibility keySchemaVisibility = SchemaVisibility.MIN;
    private SchemaVisibility valueSchemaVisibility = SchemaVisibility.MIN;

    public JsonPayloadFormatter() {
        converter.configure(emptyMap(), false);

        Map<String, String> configs = new HashMap<>();
        configs.put(JsonConverterConfig.SCHEMAS_ENABLE_CONFIG, "false");
        converterSansSchema.configure(configs, false);

        deserializer.configure(emptyMap(), false);
    }

    @Override
    public void configure(Map<String, ?> configs) {
        keySchemaVisibility = configureSchemaVisibility(configs, "formatter.key.schema.visibility");
        valueSchemaVisibility = configureSchemaVisibility(configs, "formatter.value.schema.visibility");
    }

    private SchemaVisibility configureSchemaVisibility(final Map<String, ?> configs, final String key) {
        SchemaVisibility viz = SchemaVisibility.MIN;
        final Object visibility = configs.get(key);
        if (visibility != null) {
            switch (visibility.toString()) {
                case "all":
                    viz = SchemaVisibility.ALL;
                    break;
                case "min":
                    viz = SchemaVisibility.MIN;
                    break;
                case "none":
                    viz = SchemaVisibility.NONE;
                    break;
            }
        }

        return viz;
    }

    public String format(final SinkRecord record) {
        try {
            return recordWriter.writeValueAsString(recordToPayload(record));
        } catch (JsonProcessingException e) {
            throw new PayloadFormattingException(e);
        }
    }

    public String format(final Collection<SinkRecord> records) {
        final Payload[] payloads = records
            .stream()
            .map(this::recordToPayload)
            .toArray(Payload[]::new);

        try {
            return recordsWriter.writeValueAsString(payloads);
        } catch (final JsonProcessingException e) {
            throw new PayloadFormattingException(e);
        }
    }

    private Payload<Object, Object> recordToPayload(final SinkRecord record) {
        Object deserializedKey;
        Object deserializedValue;
        if (record.keySchema() == null) {
            deserializedKey = record.key();
        } else {
            deserializedKey = deserialize(keySchemaVisibility, record.topic(), record.keySchema(), record.key());
        }
        if (record.valueSchema() == null) {
            deserializedValue = record.value();
        } else {
            deserializedValue = deserialize(valueSchemaVisibility, record.topic(), record.valueSchema(), record.value());
        }

        Payload<Object, Object> payload = new Payload<>(record);
        payload.setKey(deserializedKey);
        payload.setValue(deserializedValue);
        if (keySchemaVisibility == SchemaVisibility.NONE) {
            payload.setKeySchemaName(null);
            payload.setKeySchemaVersion(null);
        }
        if (valueSchemaVisibility == SchemaVisibility.NONE) {
            payload.setValueSchemaName(null);
            payload.setValueSchemaVersion(null);
        }

        return payload;
    }

    private JsonNode deserialize(final SchemaVisibility schemaVisibility, final String topic, final Schema schema, final Object value) {
        if (schemaVisibility == SchemaVisibility.ALL) {
            return deserializer.deserialize(topic, converter.fromConnectData(topic, schema, value));
        } else {
            return deserializer.deserialize(topic, converterSansSchema.fromConnectData(topic, schema, value));
        }
    }
}
```
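For orientation, here is a minimal sketch of how the formatter could be exercised outside the connector, for example in a quick test harness. It is not part of the commit and assumes the `Payload`, `PayloadFormatter`, and `PayloadFormattingException` types referenced above are on the classpath; note that `configure()` expects the visibility keys without the `payload.` prefix, matching `configureSchemaVisibility()`.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.sink.SinkRecord;

import com.nordstrom.kafka.connect.formatters.JsonPayloadFormatter;

public class JsonPayloadFormatterSketch {
    public static void main(String[] args) {
        JsonPayloadFormatter formatter = new JsonPayloadFormatter();

        // Visibility keys mirror configureSchemaVisibility(); "none" nulls out the
        // key schema name/version in the payload, "min" keeps name and version only.
        Map<String, String> configs = new HashMap<>();
        configs.put("formatter.key.schema.visibility", "none");
        configs.put("formatter.value.schema.visibility", "min");
        formatter.configure(configs);

        // A simple string-keyed, string-valued record on partition 1 at offset 0.
        SinkRecord record = new SinkRecord(
            "example-stream", 1,
            Schema.STRING_SCHEMA, "my_key",
            Schema.STRING_SCHEMA, "hello, world",
            0L);

        // Single-record formatting; format(Collection<SinkRecord>) produces a JSON array instead.
        System.out.println(formatter.format(record));
    }
}
```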
