README.md
+5-3
@@ -7,7 +7,7 @@ A [Kafka Connector](http://kafka.apache.org/documentation.html#connect) which im
## Notable features
* `autodiscovery` - monitors and automatically discovers DynamoDB tables to start/stop syncing from (based on AWS tags)
* `initial sync` - automatically detects whether an initial replication of existing data is needed and, if so, performs it before tracking changes from the DynamoDB table stream
-
+ * `local debugging` - use of test containers to test full connector life-cycle
## Alternatives
Prior to our development we found only one existing implementation, by [shikhar](https://github.com/shikhar/kafka-connect-dynamodb), but it seems to be missing major features (initial sync, handling shard changes) and is no longer supported.
@@ -22,7 +22,7 @@ In our implementation we opted to use Amazon Kinesis Client with DynamoDB Stream
* Gradlew 5.3.1
* Kafka Connect Framework >= 2.1.1
* Amazon Kinesis Client 1.9.1
- * DynamoDB Streams Kinesis Adapter 1.4.0
+ * DynamoDB Streams Kinesis Adapter 1.5.2
## Documentation
* [Getting started](docs/getting-started.md)
@@ -97,6 +97,9 @@ In our implementation we opted to use Amazon Kinesis Client with DynamoDB Stream
# build & run unit tests
./gradlew

+ # run integration tests
+ ./gradlew integrationTests
+
# build final jar
./gradlew shadowJar
```
@@ -128,7 +131,6 @@ Releases are done by creating new release(aka tag) via Github user interface. On
## Roadmap (TODO: move to issues?)
- * Add Confluent stack as docker-compose.yml for easier local debugging
* Use multithreaded DynamoDB table scan for faster `INIT SYNC`
docs/details.md
+2-2
@@ -8,7 +8,7 @@ This connector can sync multiple DynamoDB tables at the same time and it does so
* environment TAG key and value set
* DynamoDB streams enabled (in `new_image` or `new_and_old_image` mode)
-
+ > Note: if the `dynamodb.table.whitelist` parameter is set, auto-discovery is not executed and only the explicitly listed tables are replicated.
### 2. "INIT_SYNC"
`INIT_SYNC` is the process in which all existing table data is scanned and pushed into the Kafka destination topic. Usually this happens only once, after the source task for a specific table is started for the first time. It can be repeated after unexpected issues, e.g. if the source connector was down for a long period and may have missed some change events from the table stream (DynamoDB Streams stores data for only 24 hours).
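The rule for repeating `INIT_SYNC` can be illustrated with a minimal sketch. This is not the connector's actual code: the class `InitSyncCheck` and the method `needsInitSync` are hypothetical names, and the only fact taken from the text above is the 24-hour stream retention.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for illustration only; not part of the connector's code base.
final class InitSyncCheck {
    // DynamoDB Streams keeps change records for 24 hours (stated in the text above).
    static final Duration STREAM_RETENTION = Duration.ofHours(24);

    /**
     * @param lastProcessed time of the last stream record the task handled,
     *                      or null if the table has never been synced
     * @param now           current time
     * @return true if a full table scan should run before tailing the stream
     */
    static boolean needsInitSync(Instant lastProcessed, Instant now) {
        if (lastProcessed == null) {
            return true; // first start: existing data was never copied
        }
        // Once the gap reaches the retention window, some change events may be gone.
        return Duration.between(lastProcessed, now).compareTo(STREAM_RETENTION) >= 0;
    }
}
```

For example, a task whose last processed record is one hour old would keep tailing the stream, while one that was down for 25 hours would rescan the table.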
@@ -40,7 +40,7 @@ Since we are using two different frameworks/libraries together there are two dif
### `DISCOVERY` state and task configuration
- Connector uses AWS resource group API to receive a list of DynamoDB tables which have ingestion TAG defined. Then it iterates over this list and checks if environment TAG is matched and streams are actually enabled. Connect task is started for each table which meets all requirements.
+ If the `dynamodb.table.whitelist` parameter is not defined, the connector uses the AWS resource group API to receive a list of DynamoDB tables which have the ingestion TAG defined. Then it iterates over this list and checks if the environment TAG is matched and streams are actually enabled. A Connect task is started for each table which meets all requirements.
The `discovery` phase is executed on start and then every 60 seconds (the default config value).
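The table selection rule described above can be sketched roughly as follows. All names here (`DiscoveredTable`, `TableSelection`, `select`) are hypothetical and chosen for illustration. The sketch assumes the whitelist has already been split into table names and that the tables carrying the ingestion tag have already been fetched.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of a table that carries the ingestion tag.
final class DiscoveredTable {
    final String name;
    final String environmentTag;
    final boolean streamEnabled;

    DiscoveredTable(String name, String environmentTag, boolean streamEnabled) {
        this.name = name;
        this.environmentTag = environmentTag;
        this.streamEnabled = streamEnabled;
    }
}

// Hypothetical selection logic; not the connector's actual implementation.
final class TableSelection {
    /**
     * @param whitelist   table names from dynamodb.table.whitelist (empty if unset)
     * @param discovered  tables found via the ingestion tag
     * @param environment expected environment tag value
     * @return names of the tables that should each get a Connect task
     */
    static List<String> select(List<String> whitelist,
                               List<DiscoveredTable> discovered,
                               String environment) {
        if (!whitelist.isEmpty()) {
            return List.copyOf(whitelist); // whitelist set: discovery is skipped
        }
        return discovered.stream()
                .filter(t -> environment.equals(t.environmentTag)) // environment tag must match
                .filter(t -> t.streamEnabled)                      // streams must be enabled
                .map(t -> t.name)
                .collect(Collectors.toList());
    }
}
```

In a periodic setup, a method like `select` would be called on each rediscovery tick and its result compared with the currently running tasks.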
`init.sync.delay.period` - time interval in seconds. Defines how long `INIT_SYNC` should delay execution before starting. This gives Kafka Connect tasks time to settle after a rebalance: multiple task rebalances can happen in quick succession, which would mean more duplicated data because the `INIT_SYNC` process would not have time to mark its progress.
- `connect.dynamodb.rediscovery.period` - time interval in milliseconds. Defines how often connector should try to find new DynamoDB tables (or detect removed ones). If changes are found tasks are automatically reconfigured.
+ `connect.dynamodb.rediscovery.period` - time interval in milliseconds. Defines how often connector should try to find new DynamoDB tables (or detect removed ones). If changes are found tasks are automatically reconfigured.
+ `dynamodb.service.endpoint` - AWS DynamoDB API endpoint. The default AWS endpoint is used if not set.
+ `resource.tagging.service.endpoint` - AWS Resource Group Tag API endpoint. The default AWS endpoint is used if not set.
+ `kcl.table.billing.mode` - Defines the billing mode for the internal table created by the KCL library. Default is provisioned.
+ `dynamodb.table.whitelist` - Defines a whitelist of DynamoDB table names. This overrides table auto-discovery by ingestion tag.
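As a rough illustration of how the options above fit together, the sketch below collects them into a plain map. The option keys come from the text above; every value is an assumption made for this example (the local endpoint URLs, the `PROVISIONED` spelling, and the comma-separated whitelist format are not confirmed by the documentation), and `ExampleConnectorConfig` is a hypothetical class name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical example; keys are from the option list above, values are illustrative.
final class ExampleConnectorConfig {
    static Map<String, String> build() {
        Map<String, String> cfg = new LinkedHashMap<>();
        cfg.put("init.sync.delay.period", "60");                 // seconds
        cfg.put("connect.dynamodb.rediscovery.period", "60000"); // milliseconds, i.e. 60 seconds
        // Assumed local endpoints, e.g. for testing against emulated AWS services.
        cfg.put("dynamodb.service.endpoint", "http://localhost:8000");
        cfg.put("resource.tagging.service.endpoint", "http://localhost:8001");
        cfg.put("kcl.table.billing.mode", "PROVISIONED");        // assumed value spelling
        cfg.put("dynamodb.table.whitelist", "orders,users");     // assumed comma-separated format
        return cfg;
    }
}
```

Note that the two period options use different units (seconds and milliseconds), which is easy to overlook when setting both.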
public static final String SRC_DYNAMODB_TABLE_WHITELIST_DOC = "Define whitelist of dynamodb table names. This overrides table auto-discovery by ingestion tag.";