-
Notifications
You must be signed in to change notification settings - Fork 49
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
feat: restart pipeline with recovery (#1853)
* wip * implement restart method * implement restart swapping nodes * fix method calls * fix some tests * fix another test * fix test * update comment and refactor test * refactor tests (wip) * restart test WIP * remove redundant code * update test * fix test * add new tests and refactor * fix logger * cosmetic changes * move to a separate method * add tracing and fix max retries * not needed * Set -1 as infinite retries for err recovery * rename flag * goes to degraded once it exits * fix const * simpler implementation * add test case for 0 max-retries * fix test * no need to kill the tomb * typo and fix * fix exit-on-degraded when degraded * log before erroring * removes restart and uses start * clearer validation * handle recovery * Pipeline recovery: test cases (#1873) --------- Co-authored-by: Raúl Barroso <[email protected]> * Pipeline recovery tests: add test pipeline, add test case (#1876) * need to return err * assign error * updated test cases * reuse method * use log.AttemptField --------- Co-authored-by: Haris Osmanagic <[email protected]>
- Loading branch information
Showing
13 changed files
with
900 additions
and
216 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,330 @@ | ||
<!-- markdownlint-disable MD013 --> | ||
# Test case for the pipeline recovery feature | ||
|
||
## Test Case 01: Recovery triggered for on a DLQ write error | ||
|
||
**Priority** (low/medium/high): | ||
|
||
**Description**: | ||
Recovery is triggered when there is an error writing to a DLQ. As for a normal | ||
destination, a DLQ write error can be a temporary error that can be solved after | ||
a retry. | ||
|
||
**Automated** (yes/no) | ||
|
||
**Setup**: | ||
|
||
**Pipeline configuration file**: | ||
|
||
```yaml | ||
version: "2.2" | ||
pipelines: | ||
- id: file-pipeline | ||
status: running | ||
name: file-pipeline | ||
description: dlq write error | ||
connectors: | ||
- id: chaos-src | ||
type: source | ||
plugin: standalone:chaos | ||
name: chaos-src | ||
settings: | ||
readMode: error | ||
- id: log-dst | ||
type: destination | ||
plugin: builtin:log | ||
log: file-dst | ||
dead-letter-queue: | ||
plugin: "builtin:postgres" | ||
settings: | ||
table: non_existing_table_so_that_dlq_fails | ||
url: postgresql://meroxauser:meroxapass@localhost/meroxadb?sslmode=disable | ||
window-size: 3 | ||
window-nack-threshold: 2 | ||
``` | ||
**Steps**: | ||
**Expected Result**: | ||
**Additional comments**: | ||
--- | ||
## Test Case 02: Recovery not triggered for fatal error - processor | ||
**Priority** (low/medium/high): | ||
**Description**: | ||
Recovery is not triggered when there is an error processing a record. | ||
**Automated** (yes/no) | ||
**Setup**: | ||
**Pipeline configuration file**: | ||
```yaml | ||
``` | ||
|
||
**Steps**: | ||
|
||
**Expected Result**: | ||
|
||
**Additional comments**: | ||
|
||
--- | ||
|
||
## Test Case 03: Recovery not triggered - graceful shutdown | ||
|
||
**Priority** (low/medium/high): | ||
|
||
**Description**: | ||
Recovery is not triggered when Conduit is shutting down gracefully (i.e. when | ||
typing Ctrl+C in the terminal where Conduit is running, or sending a SIGINT). | ||
|
||
**Automated** (yes/no) | ||
|
||
**Setup**: | ||
|
||
**Pipeline configuration file**: | ||
|
||
```yaml | ||
``` | ||
|
||
**Steps**: | ||
|
||
**Expected Result**: | ||
|
||
**Additional comments**: | ||
|
||
--- | ||
|
||
## Test Case 04: Recovery not triggered - user stopped pipeline | ||
|
||
**Priority** (low/medium/high): | ||
|
||
**Description**: | ||
Recovery is not triggered if a user stops a pipeline (via the HTTP API's | ||
`/v1/pipelines/pipeline-id/stop` endpoint). | ||
|
||
**Automated** (yes/no) | ||
|
||
**Setup**: | ||
|
||
**Pipeline configuration file**: | ||
|
||
```yaml | ||
``` | ||
|
||
**Steps**: | ||
|
||
**Expected Result**: | ||
|
||
**Additional comments**: | ||
|
||
--- | ||
|
||
## Test Case 05: Recovery is configured by default | ||
|
||
**Priority** (low/medium/high): | ||
|
||
**Description**: | ||
Pipeline recovery is configured by default. A failing pipeline will be restarted | ||
a number of times without any additional configuration. | ||
|
||
**Automated** (yes/no) | ||
|
||
**Setup**: | ||
|
||
**Pipeline configuration file**: | ||
|
||
```yaml | ||
version: "2.2" | ||
pipelines: | ||
- id: chaos-to-log | ||
status: running | ||
name: chaos-to-log | ||
description: chaos source, error on read | ||
connectors: | ||
- id: chaos-source-1 | ||
type: source | ||
plugin: standalone:chaos | ||
name: chaos-source-1 | ||
settings: | ||
readMode: error | ||
- id: destination1 | ||
type: destination | ||
plugin: builtin:log | ||
name: log-destination | ||
``` | ||
**Steps**: | ||
**Expected Result**: | ||
**Additional comments**: | ||
--- | ||
## Test Case 06: Recovery not triggered on malformed pipeline | ||
**Priority** (low/medium/high): | ||
**Description**: | ||
Recovery is not triggered for a malformed pipeline, e.g. when a connector is | ||
missing. | ||
**Automated** (yes/no) | ||
**Setup**: | ||
**Pipeline configuration file**: | ||
```yaml | ||
version: "2.2" | ||
pipelines: | ||
- id: nothing-to-log | ||
status: running | ||
name: nothing-to-log | ||
description: no source | ||
connectors: | ||
- id: destination1 | ||
type: destination | ||
plugin: builtin:log | ||
name: log-destination | ||
``` | ||
**Steps**: | ||
**Expected Result**: | ||
**Additional comments**: | ||
--- | ||
## Test Case 07: Conduit exits with --pipelines.exit-on-degraded=true and a pipeline failing after recovery | ||
**Priority** (low/medium/high): | ||
**Description**: Given a Conduit instance with | ||
`--pipelines.exit-on-degraded=true`, and a pipeline that's failing after the | ||
maximum number of retries configured, Conduit should shut down gracefully. | ||
|
||
**Automated** (yes/no) | ||
|
||
**Setup**: | ||
|
||
**Pipeline configuration file**: | ||
|
||
```yaml | ||
``` | ||
|
||
**Steps**: | ||
|
||
**Expected Result**: | ||
|
||
**Additional comments**: | ||
|
||
--- | ||
|
||
## Test Case 08: Conduit doesn't exit with --pipelines.exit-on-degraded=true and a pipeline that recovers after a few retries | ||
|
||
**Priority** (low/medium/high): | ||
|
||
**Description**: | ||
Given a Conduit instance with `--pipelines.exit-on-degraded=true`, and a | ||
pipeline that recovers after a few retries, Conduit should still be running. | ||
|
||
**Automated** (yes/no) | ||
|
||
**Setup**: | ||
|
||
**Pipeline configuration file**: | ||
|
||
```yaml | ||
``` | ||
|
||
**Steps**: | ||
|
||
**Expected Result**: | ||
|
||
**Additional comments**: | ||
|
||
--- | ||
|
||
## Test Case 09: Conduit exits with --pipelines.exit-on-degraded=true, --pipelines.error-recovery.max-retries=0, and a degraded pipeline | ||
|
||
**Priority** (low/medium/high): | ||
|
||
**Description**: | ||
Given a Conduit instance with | ||
`--pipelines.exit-on-degraded=true --pipelines.error-recovery.max-retries=0`, | ||
and a pipeline that goes into a degraded state, the Conduit instance will | ||
gracefully shut down. This is due `max-retries=0` disabling the recovery. | ||
|
||
**Automated** (yes/no) | ||
|
||
**Setup**: | ||
|
||
**Pipeline configuration file**: | ||
|
||
```yaml | ||
``` | ||
|
||
**Steps**: | ||
|
||
**Expected Result**: | ||
|
||
**Additional comments**: | ||
|
||
--- | ||
|
||
## Test Case 10: Recovery not triggered for fatal error - DLQ threshold exceeded | ||
|
||
**Priority** (low/medium/high): | ||
|
||
**Description**: | ||
Recovery is not triggered when the DLQ threshold is exceeded. | ||
|
||
**Automated** (yes/no) | ||
|
||
**Setup**: | ||
|
||
**Pipeline configuration file**: | ||
|
||
```yaml | ||
version: "2.2" | ||
pipelines: | ||
- id: pipeline1 | ||
status: running | ||
name: pipeline1 | ||
description: chaos destination with write errors, DLQ threshold specified | ||
connectors: | ||
- id: generator-src | ||
type: source | ||
plugin: builtin:generator | ||
name: generator-src | ||
settings: | ||
format.type: structured | ||
format.options.id: int | ||
format.options.name: string | ||
rate: "1" | ||
- id: chaos-destination-1 | ||
type: destination | ||
plugin: standalone:chaos | ||
name: chaos-destination-1 | ||
settings: | ||
writeMode: error | ||
dead-letter-queue: | ||
window-size: 2 | ||
window-nack-threshold: 1 | ||
``` | ||
|
||
**Steps**: | ||
|
||
**Expected Result**: | ||
|
||
**Additional comments**: | ||
|
||
--- |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
<!-- markdownlint-disable MD041 --> | ||
|
||
## Test Case 01: Test case title | ||
|
||
**Priority** (low/medium/high): | ||
|
||
**Description**: | ||
|
||
**Automated** (yes/no) | ||
|
||
**Setup**: | ||
|
||
**Pipeline configuration file**: | ||
|
||
```yaml | ||
``` | ||
|
||
**Steps**: | ||
|
||
1. Step 1 | ||
2. Step 2 | ||
|
||
**Expected Result**: | ||
|
||
**Additional comments**: | ||
|
||
--- |
Oops, something went wrong.