merge main

Signed-off-by: Neelanjan Manna <[email protected]>
neelanjan00 committed Dec 7, 2022
2 parents b6f3062 + 79b232e commit 4171b3a
Showing 4,140 changed files with 89,980 additions and 523 deletions.
The diff you're trying to view is too large. We only load the first 3000 changed files.
2 changes: 1 addition & 1 deletion docs/chaos-engineering/chaos-faults/aws/ec2-cpu-hog.md
@@ -63,7 +63,7 @@ stringData:
 ## Fault Tunables
 
 <details>
-<summary>Check the fault tunables</summary>
+<summary>Check the Fault Tunables</summary>
 <h2>Mandatory Fields</h2>
 <table>
 <tr>
2 changes: 1 addition & 1 deletion docs/chaos-engineering/chaos-faults/aws/ec2-io-stress.md
@@ -65,7 +65,7 @@ stringData:
 ## Fault Tunables
 
 <details>
-<summary>Check the fault tunables</summary>
+<summary>Check the Fault Tunables</summary>
 
 <h2>Mandatory Fields</h2>
 
2 changes: 1 addition & 1 deletion docs/chaos-engineering/chaos-faults/aws/ec2-memory-hog.md
@@ -65,7 +65,7 @@ stringData:
 ## Fault Tunables
 
 <details>
-<summary>Check the fault tunables</summary>
+<summary>Check the Fault Tunables</summary>
 <h2>Mandatory Fields</h2>
 <table>
 <tr>
2 changes: 1 addition & 1 deletion docs/chaos-engineering/chaos-faults/aws/ecs-agent-stop.md
@@ -16,7 +16,7 @@ title: ECS Agent Stop
 ## Uses
 
 <details>
-<summary>View the uses of the experiment</summary>
+<summary>View the uses of the fault</summary>
 <div>
 Agent stop is a common scenario with ECS clusters: it can break the agent that manages the task containers on the ECS cluster and impact their delivery. Such scenarios can still occur despite whatever availability aids docker provides.
 
@@ -20,7 +20,7 @@ title: ECS Container CPU Hog
 ## Uses
 
 <details>
-<summary>View the uses of the experiment</summary>
+<summary>View the uses of the fault</summary>
 <div>
 CPU hogs are a common scenario with containers/applications that can result in the eviction of the application (task container) and impact its delivery. Such scenarios can still occur despite whatever availability aids docker provides. These problems are generally referred to as "Noisy Neighbour" problems.
 
@@ -20,7 +20,7 @@ title: ECS Container IO Hog
 ## Uses
 
 <details>
-<summary>View the uses of the experiment</summary>
+<summary>View the uses of the fault</summary>
 <div>
 Filesystem reads and writes are a common scenario with containers/applications that can result in the eviction of the application (task container) and impact its delivery. Such scenarios can still occur despite whatever availability aids docker provides. These problems are generally referred to as "Noisy Neighbour" problems.
 
@@ -20,7 +20,7 @@ title: ECS Container Memory Hog
 ## Uses
 
 <details>
-<summary>View the uses of the experiment</summary>
+<summary>View the uses of the fault</summary>
 <div>
 Memory usage within containers is subject to various constraints. If limits are specified in the spec, exceeding them can cause termination of the container (due to an OOMKill of the primary process, often pid 1), followed by a restart of the container by docker, subject to the restart policy specified. For containers with no limits placed, memory usage is uninhibited until the VM-level OOM behaviour takes over. In this case, containers on the instance can be killed based on their oom_score. This evaluation extends to all task containers running on the instance, thereby causing a bigger blast radius.
 
@@ -20,7 +20,7 @@ title: ECS Container Network Latency
 ## Uses
 
 <details>
-<summary>View the uses of the experiment</summary>
+<summary>View the uses of the fault</summary>
 <div>
 The fault causes network degradation of the task container without the container being marked unhealthy/unworthy of traffic from outside. The idea of this fault is to simulate issues within your ECS task network, or in communication across services in different availability zones or regions.
 
@@ -20,7 +20,7 @@ title: ECS Container Network Loss
 ## Uses
 
 <details>
-<summary>View the uses of the experiment</summary>
+<summary>View the uses of the fault</summary>
 <div>
 The fault causes network degradation of the task container without the container being marked unhealthy/unworthy of traffic from outside. The idea of this fault is to simulate issues within your ECS task network, or in communication across services in different availability zones or regions.
 
@@ -16,7 +16,7 @@ title: ECS Instance Stop
 ## Uses
 
 <details>
-<summary>View the uses of the experiment</summary>
+<summary>View the uses of the fault</summary>
 <div>
 EC2 instance stop is a common scenario with ECS clusters that can break the agent that manages the task containers on the ECS cluster and impact their delivery. Such scenarios can still occur despite whatever availability aids docker provides.
 
148 changes: 148 additions & 0 deletions docs/chaos-engineering/chaos-faults/aws/elb-az-down.md
@@ -0,0 +1,148 @@
---
id: elb-az-down
title: ELB AZ Down
---

## Introduction
- It injects AZ-down chaos on a target ELB for a specified duration, causing access restrictions for the given availability zones.
- It tests the sanity, availability, and recovery workflows of the application pods attached to the load balancer.

:::tip Fault execution flow chart
![ELB AZ Down](./static/images/elb-az-down.png)
:::

## Uses

<details>
<summary>View the uses of the fault</summary>
<div>
An availability zone going down is a common scenario with ELBs: it can break connectivity with the given zones and impact delivery. Such scenarios can still occur despite whatever availability aids AWS provides.

Detaching an AZ from the load balancer disrupts the application's performance and availability. This category of chaos fault helps build immunity in applications against such scenarios.

</div>
</details>

## Prerequisites

:::info
- Kubernetes > 1.17
- AWS access to attach or detach an AZ from the ELB.
- A minimum number of AZs must remain attached to the ELB; otherwise, the fault fails to detach the given AZ.
- Kubernetes secret that has the AWS access configuration (key) in the `CHAOS_NAMESPACE`. A sample secret file looks like:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  cloud_config.yml: |-
    # Add the cloud AWS credentials respectively
    [default]
    aws_access_key_id = XXXXXXXXXXXXXXXXXXX
    aws_secret_access_key = XXXXXXXXXXXXXXX
```
- If you change the secret key name (from `cloud_config.yml`), update the `AWS_SHARED_CREDENTIALS_FILE` environment variable value in `fault.yaml` with the same name.
:::
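For instance, if the secret key were renamed from `cloud_config.yml` to a hypothetical `my_config.yml`, the corresponding override might look like this minimal sketch (the `/tmp` mount path is an assumption, not stated above):

```yaml
# hypothetical sketch: the secret key was renamed to my_config.yml,
# so the credentials file path is updated to match (mount path /tmp assumed)
env:
  - name: AWS_SHARED_CREDENTIALS_FILE
    value: "/tmp/my_config.yml"
```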

## Default Validations

:::info
- The ELB is attached to the given availability zones.
:::

## Fault Tunables

<details>
<summary>Check the Fault Tunables</summary>
<h2>Mandatory Fields</h2>
<table>
<tr>
<th> Variables </th>
<th> Description </th>
<th> Notes </th>
</tr>
<tr>
<td> LOAD_BALANCER_NAME </td>
<td> Provide the name of the load balancer whose AZ has to be detached</td>
<td> Eg. <code>elb-name</code> </td>
</tr>
<tr>
<td> ZONES </td>
<td> Provide the target zones that have to be detached from the ELB</td>
<td> Eg. <code>us-east-1a</code> </td>
</tr>
<tr>
<td> REGION </td>
<td> The region name of the target ELB</td>
<td> Eg. <code>us-east-1</code> </td>
</tr>
</table>
<h2>Optional Fields</h2>
<table>
<tr>
<th> Variables </th>
<th> Description </th>
<th> Notes </th>
</tr>
<tr>
<td> TOTAL_CHAOS_DURATION </td>
<td> The time duration for chaos insertion (in seconds) </td>
<td> Defaults to 30s </td>
</tr>
<tr>
<td> CHAOS_INTERVAL </td>
<td> The time duration between the detachment and re-attachment of the zones (sec) </td>
<td> Defaults to 30s </td>
</tr>
<tr>
<td> SEQUENCE </td>
<td> It defines the sequence of chaos execution for multiple zones</td>
<td> Default value: parallel. Supported: serial, parallel </td>
</tr>
<tr>
<td> RAMP_TIME </td>
<td> Period to wait before and after injection of chaos in sec </td>
<td> Eg: 30 </td>
</tr>
</table>
</details>

## Fault Examples

### Common and AWS-specific tunables

Refer to the [common attributes](../common-tunables-for-all-experiments) and [AWS-specific tunables](./aws-experiments-tunables) to tune the common tunables for all faults and the AWS-specific tunables.

### Target Zones

It contains a comma-separated list of target zones. It can be tuned via the `ZONES` environment variable.

Use the following example to tune it:

[embedmd]:# (./static/manifests/elb-az-down/target-zones.yaml yaml)
```yaml
# contains elb az down for given zones
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: elb-az-down
    spec:
      components:
        env:
        # load balancer name for chaos
        - name: LOAD_BALANCER_NAME
          value: 'tes-elb'
        # target zones for the chaos
        - name: ZONES
          value: 'us-east-1a,us-east-1b'
        # region for chaos
        - name: REGION
          value: 'us-east-1'
```
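
### Chaos Duration and Interval

The optional tunables from the table above can be set the same way. The following is an illustrative sketch (the values are examples, not defaults) that tunes the total chaos duration and the interval between zone detachment and re-attachment:

```yaml
# illustrative sketch: tune chaos duration and interval for elb-az-down
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: elb-az-down
    spec:
      components:
        env:
        # total duration of the chaos injection (in seconds)
        - name: TOTAL_CHAOS_DURATION
          value: '60'
        # wait between detaching and re-attaching the zones (in seconds)
        - name: CHAOS_INTERVAL
          value: '30'
        - name: LOAD_BALANCER_NAME
          value: 'tes-elb'
        - name: ZONES
          value: 'us-east-1a'
        - name: REGION
          value: 'us-east-1'
```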
@@ -0,0 +1,143 @@
---
id: lambda-delete-event-source-mapping
title: Lambda Delete Event Source Mapping
---

## Introduction

- It removes the event source mapping from an AWS Lambda function for a certain chaos duration.
- It checks the performance of the running application/service without the event source mapping, which can cause, for example, missing entries in a database.

:::tip Fault execution flow chart
![Lambda Delete Event Source Mapping](./static/images/lambda-delete-event-source-mapping.png)
:::

## Uses

<details>
<summary>View the uses of the fault</summary>
<div>
Deleting an event source mapping from a lambda function is critical. It can lead to scenarios such as failure to update a database on an event trigger, which can break the service and impact its delivery. Such scenarios can occur despite the availability aids provided by AWS or configured by you.

It helps understand whether you have proper error handling or automatic recovery configured for such cases. Hence, this category of chaos fault helps build the immunity of the application.
</div>
</details>

## Prerequisites

:::info

- Kubernetes >= 1.17
- AWS Lambda event source mapping attached to the lambda function.
- Kubernetes secret that has the AWS access configuration (key) in the `CHAOS_NAMESPACE`. A sample secret file looks like this:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cloud-secret
type: Opaque
stringData:
  cloud_config.yml: |-
    # Add the cloud AWS credentials respectively
    [default]
    aws_access_key_id = XXXXXXXXXXXXXXXXXXX
    aws_secret_access_key = XXXXXXXXXXXXXXX
```
- If you change the secret key name (from `cloud_config.yml`), update the `AWS_SHARED_CREDENTIALS_FILE` environment variable value in `experiment.yaml` with the same name.

:::
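As with the elb-az-down fault above, a renamed secret key needs a matching override; a hypothetical sketch (again assuming the `/tmp` mount path and an illustrative `my_config.yml` key name):

```yaml
# hypothetical sketch: point the experiment at the renamed credentials key
env:
  - name: AWS_SHARED_CREDENTIALS_FILE
    value: "/tmp/my_config.yml"
```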

## Default Validations

:::info

- The AWS Lambda event source mapping is healthy and attached to the lambda function.

:::

## Fault Tunables

<details>
<summary>Check the Fault Tunables</summary>
<h2>Mandatory Fields</h2>
<table>
<tr>
<th> Variables </th>
<th> Description </th>
<th> Notes </th>
</tr>
<tr>
<td> FUNCTION_NAME </td>
<td> Function name of the target lambda function. It supports a single function name.</td>
<td> Eg: <code>test-function</code> </td>
</tr>
<tr>
<td> EVENT_UUIDS </td>
<td> Provide the UUIDs of the target event source mappings.</td>
<td> You can provide multiple values as comma-separated values. Eg: <code>id1,id2</code> </td>
</tr>
<tr>
<td> REGION </td>
<td> The region name of the target lambda function</td>
<td> Eg: <code>us-east-2</code></td>
</tr>
</table>
<h2>Optional Fields</h2>
<table>
<tr>
<th> Variables </th>
<th> Description </th>
<th> Notes </th>
</tr>
<tr>
<td> TOTAL_CHAOS_DURATION </td>
<td> The total time duration for chaos insertion in seconds </td>
<td> Defaults to 30s </td>
</tr>
<tr>
<td> SEQUENCE </td>
<td> It defines the sequence of chaos execution for multiple event source mappings</td>
<td> Default value: parallel. Supported: serial, parallel </td>
</tr>
<tr>
<td> RAMP_TIME </td>
<td> Period to wait before and after injection of chaos in sec </td>
<td> Eg. 30 </td>
</tr>
</table>
</details>

## Fault Examples

### Common and AWS-specific tunables

Refer to the [common attributes](../common-tunables-for-all-experiments) and [AWS-specific tunables](./aws-experiments-tunables) to tune the common tunables for all faults and the AWS-specific tunables.

### Multiple Event Source Mappings

It can delete multiple event source mappings for a certain chaos duration using the `EVENT_UUIDS` environment variable, which takes the UUIDs of the events as comma-separated values (CSV).

Use the following example to tune it:

[embedmd]:# (./static/manifests/lambda-delete-event-source-mapping/multiple-events.yaml yaml)
```yaml
# contains the removal of multiple event source mappings
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: lambda-delete-event-source-mapping
    spec:
      components:
        env:
        # provide UUIDs of event source mappings
        - name: EVENT_UUIDS
          value: 'id1,id2'
        # provide the function name for the chaos
        - name: FUNCTION_NAME
          value: 'chaos-function'
```
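
### Chaos Sequence

When multiple event source mappings are targeted, the order of chaos execution can also be tuned via the `SEQUENCE` tunable from the table above. The following is an illustrative sketch (values are examples) that deletes the mappings one after another instead of in parallel:

```yaml
# illustrative sketch: delete multiple event source mappings serially
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: engine-nginx
spec:
  engineState: "active"
  chaosServiceAccount: litmus-admin
  experiments:
  - name: lambda-delete-event-source-mapping
    spec:
      components:
        env:
        # run chaos on the given UUIDs one at a time
        - name: SEQUENCE
          value: 'serial'
        - name: EVENT_UUIDS
          value: 'id1,id2'
        - name: FUNCTION_NAME
          value: 'chaos-function'
```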