diff --git a/docs/.gitbook/assets/Adding_EntityId_as_tag.png b/docs/.gitbook/assets/Adding_EntityId_as_tag.png new file mode 100644 index 0000000..4e2102a Binary files /dev/null and b/docs/.gitbook/assets/Adding_EntityId_as_tag.png differ diff --git a/docs/.gitbook/assets/FaultInjection_Dynatrace_Events.png b/docs/.gitbook/assets/FaultInjection_Dynatrace_Events.png new file mode 100644 index 0000000..65bdc7f Binary files /dev/null and b/docs/.gitbook/assets/FaultInjection_Dynatrace_Events.png differ diff --git a/docs/.gitbook/assets/Fault_injection_events_Dynatrace_UI.png b/docs/.gitbook/assets/Fault_injection_events_Dynatrace_UI.png new file mode 100644 index 0000000..7e55bd2 Binary files /dev/null and b/docs/.gitbook/assets/Fault_injection_events_Dynatrace_UI.png differ diff --git a/docs/.gitbook/assets/Url_Entity_Page.png b/docs/.gitbook/assets/Url_Entity_Page.png new file mode 100644 index 0000000..7fcd8fb Binary files /dev/null and b/docs/.gitbook/assets/Url_Entity_Page.png differ diff --git a/docs/README.md b/docs/README.md index 321216b..db0b642 100644 --- a/docs/README.md +++ b/docs/README.md @@ -2,7 +2,7 @@ ## Mangle Documentation -The Mangle Documentation provides information about how to install, configure, and uses Mangle™. +The Mangle Documentation provides information about how to install, configure, and use Mangle. To navigate to the appropriate documentation, start with the [Mangle GitHub IO Page](https://vmware.github.io/mangle/). 
diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md index fc6dc4d..ffe0660 100644 --- a/docs/SUMMARY.md +++ b/docs/SUMMARY.md @@ -17,6 +17,7 @@ * [Database Faults](sre-developers-and-users/injecting-faults/database-faults.md) * [Redis Faults](sre-developers-and-users/injecting-faults/redis-faults.md) * [Custom Faults](sre-developers-and-users/injecting-faults/custom-faults.md) + * [Fault Events in Dynatrace](sre-developers-and-users/injecting-faults/fault-events-in-dynatrace.md) * [Resiliency Score](sre-developers-and-users/resiliency-score.md) * [Requests and Reports](sre-developers-and-users/requests-and-reports.md) * [Mangle Troubleshooting Guide](troubleshooting-guide/README.md) @@ -26,4 +27,3 @@ * [Fault Injection Stage](troubleshooting-guide/fault-injection-stage.md) * [Mangle Developers' Guide](building-the-mangle-codebase.md) * [Contributing to Mangle](contributing-to-mangle.md) - diff --git a/docs/mangle-administration/README.md b/docs/mangle-administration/README.md index b858782..4e4fca3 100644 --- a/docs/mangle-administration/README.md +++ b/docs/mangle-administration/README.md @@ -2,22 +2,21 @@ _Mangle Deployment and Administration Guide_ provides information about how to install and configure Mangle as an administrative user. -**Product version: 3.0.0** +**Product version: 3.5.0** #### Intended Audience -This information is intended for Mangle administrators who would be setting up Mangle, adding users, adding metric providers for monitoring faults, setting log levels and creating support bundles. Knowledge of [container technology](https://en.wikipedia.org/wiki/Operating-system-level_virtualization) and [Docker](https://docs.docker.com/) will be useful. +This information is intended for Mangle administrators who would be setting up Mangle, adding users, adding metric providers for monitoring faults, setting log levels and creating support bundles. 
Knowledge of [container technology](https://en.wikipedia.org/wiki/Operating-system-level\_virtualization) and [Docker](https://docs.docker.com) will be useful. -| Sub Content | Description | -| :--- | :--- | -| [Supported Deployment Models](supported-deployment-models/) | Provides information about deploying Mangle either as an OVA or as containers; as a single instance or as a cluster for high availability | -| [Admin Settings](admin-settings.md) | Provides information about add additional authentication sources, users, roles, setting log levels and adding metric providers for monitoring | +| Sub Content | Description | +| ----------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------- | +| [Supported Deployment Models](supported-deployment-models/) | Provides information about deploying Mangle either as an OVA or as containers; as a single instance or as a cluster for high availability | +| [Admin Settings](admin-settings.md) | Provides information about add additional authentication sources, users, roles, setting log levels and adding metric providers for monitoring | -Copyright \(c\) 2019 VMware, Inc. All rights reserved. [Copyright and trademark information](http://pubs.vmware.com/copyright-trademark.html). Any feedback you provide to VMware is subject to the terms at [www.vmware.com/community\_terms.html](http://www.vmware.com/community_terms.html). +Copyright (c) 2019 VMware, Inc. All rights reserved. [Copyright and trademark information](http://pubs.vmware.com/copyright-trademark.html). Any feedback you provide to VMware is subject to the terms at [www.vmware.com/community\_terms.html](http://www.vmware.com/community\_terms.html). -**VMware, Inc.** -3401 Hillview Ave. 
+**VMware, Inc.**\ +3401 Hillview Ave.\ Palo Alto, CA 94304 -[www.vmware.com](http://www.vmware.com/) - +[www.vmware.com](http://www.vmware.com) diff --git a/docs/mangle-administration/admin-settings.md b/docs/mangle-administration/admin-settings.md index 8992d6d..32b7167 100644 --- a/docs/mangle-administration/admin-settings.md +++ b/docs/mangle-administration/admin-settings.md @@ -18,7 +18,7 @@ Mangle supports using Active Directory as an additional authentication source.&# {% hint style="info" %} **Relevant API List** -**For access to Swagger documentation, please traverse to link **![](../.gitbook/assets/help.png) -----> API Documentation from the Mangle UI or access _https://\/mangle-services/swagger-ui.html#/auth-provider-controller_ +**For access to Swagger documentation, please traverse to link** ![](../.gitbook/assets/help.png) -----> API Documentation from the Mangle UI or access _https://\/mangle-services/swagger-ui.html#/auth-provider-controller_ {% endhint %} #### Adding/Importing Users @@ -37,7 +37,7 @@ Mangle supports adding new local user or importing users from Active Directory s {% hint style="info" %} **Relevant API List** -**For access to Swagger documentation, please traverse to link **![](../.gitbook/assets/help.png) -----> API Documentation from the Mangle UI or access _https://\/mangle-services/swagger-ui.html#/user-management-controller_ +**For access to Swagger documentation, please traverse to link** ![](../.gitbook/assets/help.png) -----> API Documentation from the Mangle UI or access _https://\/mangle-services/swagger-ui.html#/user-management-controller_ {% endhint %} #### Default and Custom Roles @@ -64,7 +64,7 @@ Mangle supports creation of custom roles from the default privileges that are av {% hint style="info" %} **Relevant API List** -**For access to Swagger documentation, please traverse to link **![](../.gitbook/assets/help.png) -----> API Documentation from the Mangle UI or access 
_https://\/mangle-services/swagger-ui.html#/role-controller_ +**For access to Swagger documentation, please traverse to link** ![](../.gitbook/assets/help.png) -----> API Documentation from the Mangle UI or access _https://\/mangle-services/swagger-ui.html#/role-controller_ {% endhint %} ## Configuration @@ -86,12 +86,12 @@ Mangle supports modifying log levels for the application. ![](../.gitbook/assets/new\_logo.png) Clicking on ![](../.gitbook/assets/application\_log.png) will open up the log on the UI and will be auto refreshed periodically. -![](../.gitbook/assets/new\_logo.png) Clicking on ![](../.gitbook/assets/download\_bundle.png) will allow you to download and save the support bundle from the mangle server to a local file directory. It pulls and aggregates the logs from multiple nodes into a single zip file in case of a clustered Mangle setup. +![](../.gitbook/assets/new\_logo.png) Clicking on ![](../.gitbook/assets/download\_bundle.png) will allow you to download and save the support bundle from the mangle server to a local file directory. In case of a clustered Mangle setup, the action should be repeated for each node in the cluster to get the support bundle from all nodes. 
{% hint style="info" %} **Relevant API List** -**For access to Swagger documentation, please traverse to link **![](../.gitbook/assets/help.png) -----> API Documentation from the Mangle UI or access _https://\/mangle-services/swagger-ui.html#/operation-handler_ +**For access to Swagger documentation, please traverse to link** ![](../.gitbook/assets/help.png) -----> API Documentation from the Mangle UI or access _https://\/mangle-services/swagger-ui.html#/operation-handler_ {% endhint %} ### ![](../.gitbook/assets/new\_logo.png) Cluster Config @@ -108,7 +108,7 @@ The page displays the cluster name, the validation token, members, quorum and de {% hint style="info" %} **Relevant API List** -**For access to Swagger documentation, please traverse to link **![](../.gitbook/assets/help.png) -----> API Documentation from the Mangle UI or access _https://\/mangle-services/swagger-ui.html#/cluster-config-controller_ +**For access to Swagger documentation, please traverse to link** ![](../.gitbook/assets/help.png) -----> API Documentation from the Mangle UI or access _https://\/mangle-services/swagger-ui.html#/cluster-config-controller_ {% endhint %} ### Fault Plugins @@ -128,7 +128,7 @@ This section enables you to load custom faults that are already available on the {% hint style="info" %} **Relevant API List** -**For access to Swagger documentation, please traverse to link **![](../.gitbook/assets/help.png) -----> API Documentation from the Mangle UI or access _https://\/mangle-services/swagger-ui.html#/plugin-controller_ +**For access to Swagger documentation, please traverse to link** ![](../.gitbook/assets/help.png) -----> API Documentation from the Mangle UI or access _https://\/mangle-services/swagger-ui.html#/plugin-controller_ {% endhint %} ### ![](../.gitbook/assets/new\_logo.png) Resiliency Score Metric Configuration @@ -158,36 +158,46 @@ The configuration required to enable resiliency score calculations has to be don _Only one configuration for the resiliency 
score calculation can be created._ -_This feature is still under evaluation and is supported only **VMware Wavefront**. If you need Mangle to provide support for other monitoring systems, please raise a feature request under _[_Mangle Github_](https://github.com/vmware/mangle/issues)_._ +_This feature is still under evaluation and is supported only **VMware Wavefront**. If you need Mangle to provide support for other monitoring systems, please raise a feature request under_ [_Mangle Github_](https://github.com/vmware/mangle/issues)_._ {% endhint %} {% hint style="info" %} **Relevant API List** -**For access to Swagger documentation, please traverse to link **![](../.gitbook/assets/help.png) -----> API Documentation from the Mangle UI or access _https://\/mangle-services/swagger-ui.html#/resiliency-score-controller_ +**For access to Swagger documentation, please traverse to link** ![](../.gitbook/assets/help.png) -----> API Documentation from the Mangle UI or access _https://\/mangle-services/swagger-ui.html#/resiliency-score-controller_ {% endhint %} ## Integrations ### Metric Providers -Mangle supports addition of either Wavefront or Datadog as metric providers. This enables the information about fault injection and remediation to be published to these tools as events thus making it easier to monitor them. +Mangle supports addition of Wavefront, Datadog or Dynatrace as metric providers. This enables the information about fault injection and remediation to be published to these tools as events thus making it easier to monitor them. **Steps to follow:** 1. Login as an admin user to Mangle. 2. Navigate to ![](../.gitbook/assets/settings.png) -----> Integrations -----> Metric Providers . 3. Click on ![](../.gitbook/assets/monitoringtoolbutton.png). -4. Choose Wavefront or Datadog, provide credentials and click on **Submit**. +4. Choose Wavefront, Datadog or Dynatrace, provide credentials and click on **Submit**. 5. 
A success message is displayed and the table for Monitoring tools will be updated with the new entry. 6. Click on ![](<../.gitbook/assets/supportedactionsbutton (3) (3).png>) against a table entry to see the supported operations. On adding a metric provider, Mangle will send events automatically to the enabled provider for every fault injected and remediated. If the requirement is to monitor Mangle as an application by looking at its metrics, then click on the ![](../.gitbook/assets/send\_metrics.png) button to enable sending of Mangle application metrics to the corresponding metric provider. {% hint style="info" %} +**Notes about the Dynatrace Integration:** + +Device ID: The name of the custom device that will appear in the Dynatrace user interface. The custom device is created in Dynatrace only when the "Send Metric" option is enabled in Mangle. Once the option is enabled, Mangle application metrics are visible in Dynatrace under the specified device ID. + +Dynatrace expects the same dimensions for all metrics reported by an application. Hence, if you have multiple Mangle instance deployments, please include the same "key" under the "tags" option (values can be different) while configuring the Mangle metric provider. 
+ + +{% endhint %} + +{% hint style="warning" %} **Relevant API List** -**For access to Swagger documentation, please traverse to link **![](../.gitbook/assets/help.png) -----> API Documentation from the Mangle UI or access _https://\/mangle-services/swagger-ui.html#_/_operation-handler_ +**For access to Swagger documentation, please traverse to link** ![](../.gitbook/assets/help.png) -----> API Documentation from the Mangle UI or access _https://\/mangle-services/swagger-ui.html#_/_operation-handler_ {% endhint %} ### __![](../.gitbook/assets/new\_logo.png) Notifier @@ -211,5 +221,5 @@ After this configuration, you will be able to select an appropriate notifier at {% hint style="info" %} **Relevant API List** -**For access to Swagger documentation, please traverse to link **![](../.gitbook/assets/help.png) -----> API Documentation from the Mangle UI or access _https://\/mangle-services/swagger-ui.html#/notifier-controller_ +**For access to Swagger documentation, please traverse to link** ![](../.gitbook/assets/help.png) -----> API Documentation from the Mangle UI or access _https://\/mangle-services/swagger-ui.html#/notifier-controller_ {% endhint %} diff --git a/docs/mangle-administration/supported-deployment-models/README.md b/docs/mangle-administration/supported-deployment-models/README.md index 051e602..25f5c0d 100644 --- a/docs/mangle-administration/supported-deployment-models/README.md +++ b/docs/mangle-administration/supported-deployment-models/README.md @@ -2,7 +2,7 @@ ## Single Node Deployments -For a quick POC we recommend deploying a single node instance of Mangle on VMware vSphere using the OVA file available for download [here](https://repo.vmware.com/mangle/v3.0.0/Mangle-3.0.0.0\_OVF10.ova). +For a quick POC we recommend deploying a single node instance of Mangle on VMware vSphere using the OVA file available for download [here](https://repo.vmware.com/mangle/v3.5.0/Mangle-3.5.0.0-19106891\_OVF10.ova). 
### System Requirements @@ -19,7 +19,7 @@ Login to your vSphere environment and perform the following steps in vCenter: 1. Start the Import Process * From the Actions pull-down menu for a datacenter, choose **Deploy OVF Template**. ![Create/Register VM](../../.gitbook/assets/ova3.x\_step1\_deployovf.png) - * Locate and select the downloaded OVA file (as the screenshot shows), or alternatively, for vCenter instances with connectivity to the internet, enter the OVA's URL to deploy from the web directly. + * Locate and select the downloaded OVA file (as the screenshot shows), or alternatively, for vCenter instances with connectivity to the internet, enter the OVA's [URL](https://repo.vmware.com/mangle/v3.5.0/Mangle-3.5.0.0-18963379\_OVF10.ova) to deploy from the web directly. * Choose **Next**. 2. Specify the Name and Location of Virtual Machine * Enter a name for the virtual machine, and select the target location for it. ![OVA file](../../.gitbook/assets/ova3.x\_step2\_nameandfolder.png) @@ -61,6 +61,7 @@ Login to your vSphere environment and perform the following steps in vCenter: * Password: `admin` 15. Export the VM as a Template (Optional) 16. Consider converting this imported VM into a template (from the Actions menu, choose **Export** ) so that you have a master Mangle instance that can be combined with vSphere Guest Customization to enable rapid provisioning of Mangle instances. +17. Mangle container logs are mounted to location `/var/opt/mangle-tomcat-dir/logs` on the virtual machine. Now you can move on to the [Mangle Users Guide](../../sre-developers-and-users/). @@ -68,20 +69,28 @@ Now you can move on to the [Mangle Users Guide](../../sre-developers-and-users/) #### Prerequisites -Before creating the Mangle container a Cassandra DB container should be made available on a Docker host. You can choose to deploy the DB and the Application container on the same Docker host or on different Docker hosts. 
However, we recommend that you use a separate Docker host for each of these. You can setup a Docker host by following the instructions [here](https://docs.docker.com/install/). +Before creating the Mangle container, a Cassandra DB container should be made available on a Docker host. You can choose to deploy the DB and the Application container on the same Docker host or on different Docker hosts. However, we recommend that you use a separate Docker host for each of these. You can set up a Docker host by following the instructions [here](https://docs.docker.com/install/). To deploy Cassandra, you can either use the authentication enabled image tested and verified with Mangle available on the Mangle Docker repo or use the default public Cassandra image hosted on Dockerhub. +#### Create directories for mounting the Mangle container logs on the Docker host by running the command below: + +`mkdir -p /var/opt/mangle-tomcat-dir/logs` + +#### Grant permission on the host directory for container volume mounting by running the command below: + +`chown 1000:1000 /var/opt/mangle-tomcat-dir/logs` + **If you chose to use the Cassandra image from Mangle Docker Repo:** ``` -docker run --name mangle-cassandradb -v /cassandra/storage/:/var/lib/cassandra -p 9042:9042 -d -e CASSANDRA_CLUSTER_NAME="manglecassandracluster" -e CASSANDRA_DC="DC1" -e CASSANDRA_RACK="rack1" -e CASSANDRA_ENDPOINT_SNITCH="GossipingPropertyFileSnitch" mangleuser/mangle_cassandradb: +docker run --name mangle-cassandradb -v /cassandra/storage/:/var/lib/cassandra -p :9042:9042 -d -e CASSANDRA_CLUSTER_NAME="manglecassandracluster" -e CASSANDRA_DC="DC1" -e CASSANDRA_RACK="rack1" -e CASSANDRA_ENDPOINT_SNITCH="GossipingPropertyFileSnitch" mangleuser/mangle_cassandradb: ``` **If you chose to use the Cassandra image from** [**Dockerhub**](https://hub.docker.com/\_/cassandra/)**:** ``` -docker run --name mangle-cassandradb -v /cassandra/storage/:/var/lib/cassandra -p 9042:9042 -d -e 
CASSANDRA_CLUSTER_NAME="manglecassandracluster" -e CASSANDRA_DC="DC1" -e CASSANDRA_RACK="rack1" -e CASSANDRA_ENDPOINT_SNITCH="GossipingPropertyFileSnitch" cassandra:3.11 +docker run --name mangle-cassandradb -v /cassandra/storage/:/var/lib/cassandra -p :9042:9042 -d -e CASSANDRA_CLUSTER_NAME="manglecassandracluster" -e CASSANDRA_DC="DC1" -e CASSANDRA_RACK="rack1" -e CASSANDRA_ENDPOINT_SNITCH="GossipingPropertyFileSnitch" cassandra:3.11 ``` {% hint style="info" %} @@ -93,13 +102,13 @@ To enable authentication or clustering on Cassandra refer to the [Cassandra Adva To deploy the Mangle container using a Cassandra DB deployed using the image from Mangle Docker repo or with DB authentication and ssl enabled, run the docker command below on the docker host after substituting the values in angle braces <> with actual values. ``` -docker run --name mangle -d -e DB_OPTIONS="-DcassandraContactPoints= -DcassandraSslEnabled=true -DcassandraUsername=cassandra -DcassandraPassword=cassandra" -e CLUSTER_OPTIONS="-DclusterValidationToken=mangle -DpublicAddress=" -p 8080:8080 -p 8443:8443 mangleuser/mangle: +docker run --name mangle --log-opt max-size=10m --log-opt max-file=1 -v /var/opt/mangle-tomcat-dir/logs:/home/mangle/var/opt/mangle-tomcat/logs -d -e DB_OPTIONS="-DcassandraContactPoints= -DcassandraSslEnabled=true -DcassandraUsername=cassandra -DcassandraPassword=cassandra" -e CLUSTER_OPTIONS="-DclusterValidationToken=mangle -DpublicAddress=" -p :8080:8080 -p :8443:8443 mangleuser/mangle: ``` To deploy the Mangle container using a Cassandra DB deployed using the image from Dockerhub or with DB authentication and ssl disabled, run the docker command below on the docker host after substituting the values in angle braces <> with actual values. 
``` -docker run --name mangle -d -e DB_OPTIONS="-DcassandraContactPoints= -DcassandraSslEnabled=false" -e CLUSTER_OPTIONS="-DclusterValidationToken=mangle -DpublicAddress=" -p 8080:8080 -p 8443:8443 mangleuser/mangle: +docker run --name mangle --log-opt max-size=10m --log-opt max-file=1 -v /var/opt/mangle-tomcat-dir/logs:/home/mangle/var/opt/mangle-tomcat/logs -d -e DB_OPTIONS="-DcassandraContactPoints= -DcassandraSslEnabled=false" -e CLUSTER_OPTIONS="-DclusterValidationToken=mangle -DpublicAddress=" -p :8080:8080 -p :8443:8443 mangleuser/mangle: ``` {% hint style="info" %} @@ -142,16 +151,24 @@ Although the docker run commands above lists only a few DB\_OPTIONS and CLUSTER\ Mangle vCenter Adapter is a fault injection adapter for injecting vCenter specific faults. All the vCenter operations from the Mangle application will be carried out through this adapter. +_**Create directories for mounting the Mangle-vCenter container logs on the Docker host by running the command below:**_ + +`mkdir -p /var/opt/mangle-vc-adapter-tomcat/logs` + +_**Grant permission on the host directory for container volume mounting by running the command below:**_ + +`chown 1000:1000 /var/opt/mangle-vc-adapter-tomcat/logs` + To deploy the vCenter adapter container using the default credentials run the docker command below on the docker host. Here the port 8443 is the external facing port on which the container will be available. Please ensure that the 8443 port is not used by any other application before running the command below. Else, change the command to use a free port and then run it. 
``` -docker run --name mangle-vc-adapter -v /var/opt/mangle-vc-adapter-tomcat/logs:/var/opt/mangle-vc-adapter-tomcat/logs -d -p 8080:8080 -p 8443:8443 mangleuser/mangle_vcenter_adapter: +docker run --name mangle-vc-adapter --log-opt max-size=10m --log-opt max-file=1 -v /var/opt/mangle-vc-adapter-tomcat/logs:/var/opt/mangle-vc-adapter-tomcat/logs -d -p :8080:8080 -p :8443:8443 mangleuser/mangle_vcenter_adapter: ``` To deploy the vCenter adapter container using custom credentials run the docker command below on the docker host. Substitute the new password in angular brackets with a password of your choice. Here the port 8443 is the external facing port on which the container will be available. Please ensure that the 8443 port is not used by any other application before running the command below. Else, change the command to use a free port and then run it. ``` -docker run --name mangle-vc-adapter -v /var/opt/mangle-vc-adapter-tomcat/logs:/var/opt/mangle-vc-adapter-tomcat/logs -d -p 8080:8080 -p 8443:8443 -e JAVA_OPTS="-DcustomAdminCred=" mangleuser/mangle_vcenter_adapter: +docker run --name mangle-vc-adapter --log-opt max-size=10m --log-opt max-file=1 -v /var/opt/mangle-vc-adapter-tomcat/logs:/var/opt/mangle-vc-adapter-tomcat/logs -d -p :8080:8080 -p :8443:8443 -e JAVA_OPTS="-DcustomAdminCred=" mangleuser/mangle_vcenter_adapter: ``` {% hint style="info" %} @@ -241,17 +258,17 @@ Deploy the Mangle cluster by bringing up the mangle container in each docker hos **For the first node in the cluster:** ``` -docker run --name mangle -d -v /var/opt/mangle-tomcat/logs:/var/opt/mangle-tomcat/logs -e DB_OPTIONS="-DcassandraContactPoints=" -e CLUSTER_OPTIONS="-DclusterName= -DclusterValidationToken= -DpublicAddress= -DdeploymentMode=CLUSTER" -p 8080:8080 -p 443:8443 -p 5701:5701 mangleuser/mangle: +docker run --name mangle --log-opt max-size=10m --log-opt max-file=1 -d -v /var/opt/mangle-tomcat/logs:/var/opt/mangle-tomcat/logs -e DB_OPTIONS="-DcassandraContactPoints=" -e 
CLUSTER_OPTIONS="-DclusterName= -DclusterValidationToken= -DpublicAddress= -DdeploymentMode=CLUSTER" -p :8080:8080 -p :443:8443 -p :5701:5701 mangleuser/mangle: ``` **For the subsequent nodes in the cluster:** ``` -docker run --name mangle -d -v /var/opt/mangle-tomcat/logs:/var/opt/mangle-tomcat/logs -e DB_OPTIONS="-DcassandraContactPoints=" -e CLUSTER_OPTIONS="-DclusterName= -DclusterValidationToken= -DpublicAddress= -DclusterMembers= -DdeploymentMode=CLUSTER" -p 8080:8080 -p 443:8443 -p 5701:5701 mangleuser/mangle: +docker run --name mangle --log-opt max-size=10m --log-opt max-file=1 -d -v /var/opt/mangle-tomcat/logs:/var/opt/mangle-tomcat/logs -e DB_OPTIONS="-DcassandraContactPoints=" -e CLUSTER_OPTIONS="-DclusterName= -DclusterValidationToken= -DpublicAddress= -DclusterMembers= -DdeploymentMode=CLUSTER" -p :8080:8080 -p :443:8443 -p :5701:5701 mangleuser/mangle: ``` ``` -docker run --name mangle -d -v /var/opt/mangle-tomcat/logs:/var/opt/mangle-tomcat/logs -e DB_OPTIONS="-DcassandraContactPoints=" -e CLUSTER_OPTIONS="-DclusterName= -DclusterValidationToken= -DpublicAddress= -DclusterMembers= -DdeploymentMode=CLUSTER" -p 8080:8080 -p 443:8443 -p 5701:5701 mangleuser/mangle: +docker run --name mangle --log-opt max-size=10m --log-opt max-file=1 -d -v /var/opt/mangle-tomcat/logs:/var/opt/mangle-tomcat/logs -e DB_OPTIONS="-DcassandraContactPoints=" -e CLUSTER_OPTIONS="-DclusterName= -DclusterValidationToken= -DpublicAddress= -DclusterMembers= -DdeploymentMode=CLUSTER" -p :8080:8080 -p :443:8443 -p :5701:5701 mangleuser/mangle: ``` ## Deployment Mode and Quorum @@ -320,7 +337,7 @@ Active members list of the active quorum will be maintained in DB under the tabl #### **(Applicable if you have deployed Mangle on a Docker Host/ OVA vm)** * You can make use of the upgrade script for upgrading the MangleWEB container running. 
-* The upgrade script is available on the public Mangle Git hub repository at location: [mangle/mangle-support/](https://github.com/vmware/mangle/tree/master/mangle-support)`sh UpgradeMangle.sh --MANGLE_ADMIN_USERNAME= --MANGLE_ADMINPASSWORD= --MANGLE_BUILD_NUMBER= --MANGLE_CONTAINER_NAME= --MANGLE_APP_PORT=443 --MANGLE_DOCKER_ARTIFACTORY= ` +* The upgrade script is available on the public Mangle Git hub repository at location: [mangle/mangle-support/](https://github.com/vmware/mangle/tree/master/mangle-support)`sh UpgradeMangle.sh --LOG_MAX_SIZE= --LOG_MAX_FILE= --MANGLE_ADMIN_USERNAME= --MANGLE_ADMINPASSWORD= --MANGLE_BUILD_NUMBER= --MANGLE_CONTAINER_NAME= --MANGLE_APP_PORT=443 --MANGLE_DOCKER_ARTIFACTORY= --NIC_NAME=` * The script will prompt you to check if you have taken the DB snapshot using the link below: For reference to take DB snapshot: [https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsBackupTakesSnapshot.html](https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsBackupTakesSnapshot.html) * The upgrade script is tested out for Single Node upgrade. * The existing data in Cassandra will be intact while upgrading from Mangle version 2.0 to 3.0. There will be no changes to the existing DB tables. The db migration scripts takes care of adding new tables to the existing schema. 
diff --git a/docs/mangle-administration/supported-deployment-models/advanced-cassandra-configuration.md b/docs/mangle-administration/supported-deployment-models/advanced-cassandra-configuration.md index 46fd1bf..607e984 100644 --- a/docs/mangle-administration/supported-deployment-models/advanced-cassandra-configuration.md +++ b/docs/mangle-administration/supported-deployment-models/advanced-cassandra-configuration.md @@ -8,13 +8,13 @@ Open **/etc/cassandra/cassandra.yaml** and modify **authenticator**: from **Allo To execute the init-query.cql file on db startup, need to modify the **docker-entrypoint.sh** file, add the below content right before **exec "$@"** -`for f in docker-entrypoint-initdb.d/*; do `\ -`case "$f" in `\ -`*.sh) echo "$0: running $f"; . "$f" ;; `\ -`*.cql) echo "$0: running $f" && until cqlsh --ssl -u cassandra -p cassandra -f "$f"; do >&2 echo "Cassandra is unavailable - sleeping"; sleep 2; done & ;; `\ -`*) echo "$0: ignoring $f" ;; `\ -`esac `\ -`echo `\ +`for f in docker-entrypoint-initdb.d/*; do` \ +`case "$f" in` \ +`*.sh) echo "$0: running $f"; . 
"$f" ;;` \ +`*.cql) echo "$0: running $f" && until cqlsh --ssl -u cassandra -p cassandra -f "$f"; do >&2 echo "Cassandra is unavailable - sleeping"; sleep 2; done & ;;` \ +`*) echo "$0: ignoring $f" ;;` \ +`esac` \ +`echo` \ `done` Here, **cqlsh --ssl -u cassandra -p cassandra** used to run \*.cql file (if ssl is not enabled then remove --ssl option) @@ -116,11 +116,11 @@ To download the Cassandra client as DevCenter from [DevCenter](https://academy.d Create seed Node : ``` -docker run --name mangle-cassandradb -v /cassandra/storage/:/var/lib/cassandra -p 9042:9042 -p 7000:7000 -p 7001:7001 -d -e CASSANDRA_BROADCAST_ADDRESS= -e CASSANDRA_SEEDS= -e CASSANDRA_CLUSTER_NAME="manglecassandracluster" -e CASSANDRA_DC="DC1" -e CASSANDRA_RACK="rack1" -e CASSANDRA_ENDPOINT_SNITCH="GossipingPropertyFileSnitch" mangleuser/mangle_cassandradb:1.0 +docker run --name mangle-cassandradb -v /cassandra/storage/:/var/lib/cassandra -p :9042:9042 -p :7000:7000 -p :7001:7001 -d -e CASSANDRA_BROADCAST_ADDRESS= -e CASSANDRA_SEEDS= -e CASSANDRA_CLUSTER_NAME="manglecassandracluster" -e CASSANDRA_DC="DC1" -e CASSANDRA_RACK="rack1" -e CASSANDRA_ENDPOINT_SNITCH="GossipingPropertyFileSnitch" mangleuser/mangle_cassandradb:1.0 ``` Join the Other Node to Seed Node : ``` -docker run --name mangle-cassandradb -v /cassandra/storage/:/var/lib/cassandra -p 9042:9042 -p 7000:7000 -p 7001:7001 -d -e CASSANDRA_BROADCAST_ADDRESS= -e CASSANDRA_SEEDS= -e CASSANDRA_CLUSTER_NAME="manglecassandracluster" -e CASSANDRA_DC="DC1" -e CASSANDRA_RACK="rack1" -e CASSANDRA_ENDPOINT_SNITCH="GossipingPropertyFileSnitch" mangleuser/mangle_cassandradb:1.0 +docker run --name mangle-cassandradb -v /cassandra/storage/:/var/lib/cassandra -p :9042:9042 -p :7000:7000 -p :7001:7001 -d -e CASSANDRA_BROADCAST_ADDRESS= -e CASSANDRA_SEEDS= -e CASSANDRA_CLUSTER_NAME="manglecassandracluster" -e CASSANDRA_DC="DC1" -e CASSANDRA_RACK="rack1" -e CASSANDRA_ENDPOINT_SNITCH="GossipingPropertyFileSnitch" mangleuser/mangle_cassandradb:1.0 ``` 
diff --git a/docs/sre-developers-and-users/README.md b/docs/sre-developers-and-users/README.md index f087660..aa8ffa4 100644 --- a/docs/sre-developers-and-users/README.md +++ b/docs/sre-developers-and-users/README.md @@ -2,24 +2,23 @@ _Mangle Users Guide_ provides information about how to add endpoints, run faults, generate resiliency score and view reports. -**Product version: 3.0.0** +**Product version: 3.5.0** #### Intended Audience This information is intended for SRE, Developers and Chaos engineers who would like to run chaos experiments against infrastructure or applications to assess the resilience of their applications when subjected to unexpected failures. -| Sub Content | Description | -| :--- | :--- | -| [Adding Endpoints](adding-endpoints.md) | Provides information about adding the targets for fault injection | -| [Injecting Faults](injecting-faults/) | Provides information about the types of faults that can be injected to a specific endpoint | -| [Resiliency Score](resiliency-score.md) | Provides information about how to generate resiliency score metrics and send it to a monitoring system automatically using Mangle | -| [Requests and Reports](requests-and-reports.md) | Provides information about the progress and status of the injections | +| Sub Content | Description | +| ----------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------- | +| [Adding Endpoints](adding-endpoints.md) | Provides information about adding the targets for fault injection | +| [Injecting Faults](injecting-faults/) | Provides information about the types of faults that can be injected to a specific endpoint | +| [Resiliency Score](resiliency-score.md) | Provides information about how to generate resiliency score metrics and send it to a monitoring system automatically using Mangle | +| [Requests and Reports](requests-and-reports.md) | Provides information about the 
progress and status of the injections | -Copyright \(c\) 2019 VMware, Inc. All rights reserved. [Copyright and trademark information](http://pubs.vmware.com/copyright-trademark.html). Any feedback you provide to VMware is subject to the terms at [www.vmware.com/community\_terms.html](http://www.vmware.com/community_terms.html). +Copyright (c) 2019 VMware, Inc. All rights reserved. [Copyright and trademark information](http://pubs.vmware.com/copyright-trademark.html). Any feedback you provide to VMware is subject to the terms at [www.vmware.com/community\_terms.html](http://www.vmware.com/community\_terms.html). -**VMware, Inc.** -3401 Hillview Ave. +**VMware, Inc.**\ +3401 Hillview Ave.\ Palo Alto, CA 94304 -[www.vmware.com](http://www.vmware.com/) - +[www.vmware.com](http://www.vmware.com) diff --git a/docs/sre-developers-and-users/adding-endpoints.md b/docs/sre-developers-and-users/adding-endpoints.md index 05e892a..a70cc09 100644 --- a/docs/sre-developers-and-users/adding-endpoints.md +++ b/docs/sre-developers-and-users/adding-endpoints.md @@ -95,7 +95,7 @@ Mangle supports any remote machine with ssh enabled as endpoints or targets for | ----------------------------------------------- | ---------------- | | CentOS | 7, 7.7, 7.8, 8.2 | | Debian | 7.8, 8, 9 | -| Photon OS | 1, 2, 3 | +| Photon OS | 2, 3 | | RHEL | 7.5, 8.2, 8.3 | | Suse | 12, 15 | | Ubuntu | 14, 16, 18 | @@ -124,7 +124,7 @@ Mangle supports AWS as endpoint or target for injection. It needs the Region, cr 6. A success message is displayed and the table for Endpoints will be updated with the new entry. 7. Edit, Delete, Enable and Disable actions are available for all added Endpoints. 
-### ![](https://firebasestorage.googleapis.com/v0/b/gitbook-28427.appspot.com/o/assets%2F-LcVKiIEQZ\_SDz8uqA0g%2F-MQqcKvmHtHATdUwd-sp%2F-MQr2ZghrRK5S4-Fm9\_T%2FNew\_Logo.png?alt=media\&token=afa1ae80-f950-4996-8f2c-7d87f3d520d3) Azure +### ![](https://firebasestorage.googleapis.com/v0/b/gitbook-28427.appspot.com/o/assets%2F-LcVKiIEQZ\_SDz8uqA0g%2F-MQqcKvmHtHATdUwd-sp%2F-MQr2ZghrRK5S4-Fm9\_T%2FNew\_Logo.png?alt=media\&token=afa1ae80-f950-4996-8f2c-7d87f3d520d3) Azure ‌Mangle supports Azure as endpoint or target for injection. It needs the Subscription ID, Tenant ID, credentials (Client application ID and Client application secret key) and tags to connect to Azure and run the supported faults. ‌ @@ -138,7 +138,7 @@ Mangle supports AWS as endpoint or target for injection. It needs the Region, cr 6. A success message is displayed and the table for Endpoints will be updated with the new entry. 7. Edit, Delete, Enable and Disable actions are available for all added Endpoints. -### ![](https://firebasestorage.googleapis.com/v0/b/gitbook-28427.appspot.com/o/assets%2F-LcVKiIEQZ\_SDz8uqA0g%2F-MQqcKvmHtHATdUwd-sp%2F-MQr2ZghrRK5S4-Fm9\_T%2FNew\_Logo.png?alt=media\&token=afa1ae80-f950-4996-8f2c-7d87f3d520d3) Redis Proxy +### ![](https://firebasestorage.googleapis.com/v0/b/gitbook-28427.appspot.com/o/assets%2F-LcVKiIEQZ\_SDz8uqA0g%2F-MQqcKvmHtHATdUwd-sp%2F-MQr2ZghrRK5S4-Fm9\_T%2FNew\_Logo.png?alt=media\&token=afa1ae80-f950-4996-8f2c-7d87f3d520d3) Redis Proxy ‌With version 3.0, Mangle provides the ability to run faults against Redis by integrating with [RedFI (Redis Fault Injection Proxy)](https://openfip.github.io/redfi/) which is a separate open source project. To try out the Redis faults, it is mandatory that you have a Redis proxy up and running in your environment. To deploy the RedisFI proxy please refer to the instructions specified [here](https://github.com/openfip/redfi#usage). 
After the proxy is deployed and running, proceed to the steps below to add it as an endpoint in Mangle. @@ -152,7 +152,7 @@ Mangle supports AWS as endpoint or target for injection. It needs the Region, cr 6. A success message is displayed and the table for Endpoints will be updated with the new entry. 7. Edit, Delete, Enable and Disable actions are available for all added Endpoints. -### ![](https://firebasestorage.googleapis.com/v0/b/gitbook-28427.appspot.com/o/assets%2F-LcVKiIEQZ\_SDz8uqA0g%2F-MQqcKvmHtHATdUwd-sp%2F-MQr2ZghrRK5S4-Fm9\_T%2FNew\_Logo.png?alt=media\&token=afa1ae80-f950-4996-8f2c-7d87f3d520d3) Databases +### ![](https://firebasestorage.googleapis.com/v0/b/gitbook-28427.appspot.com/o/assets%2F-LcVKiIEQZ\_SDz8uqA0g%2F-MQqcKvmHtHATdUwd-sp%2F-MQr2ZghrRK5S4-Fm9\_T%2FNew\_Logo.png?alt=media\&token=afa1ae80-f950-4996-8f2c-7d87f3d520d3) Databases ‌With version 3.0, Mangle provides the ability to run faults against databases. The databases supported are Cassandra, Mongo and Postgres. The database endpoint has one key difference since they can reside on a virtual machine/instance as a service, on Docker as containers or on K8s as pods. Hence, when defining database endpoints in Mangle, you also need to specify the parent endpoint which could be a remote machine, Docker or a K8s cluster. @@ -171,7 +171,7 @@ Mangle supports AWS as endpoint or target for injection. 
It needs the Region, cr {% hint style="info" %} **For access to Swagger documentation:** -Please traverse to link _\*\*_![](../.gitbook/assets/help.png) -----> API Documentation from the Mangle UI or access [https://\/mangle-services/swagger-ui.html#_/_endpoint-controller_ +Please traverse to link _\*\*_![](../.gitbook/assets/help.png) -----> API Documentation from the Mangle UI or access [https://\/mangle-services/swagger-ui.html#_/_endpoint-controller_ ![](../.gitbook/assets/endpointcontroller.png) {% endhint %} diff --git a/docs/sre-developers-and-users/injecting-faults/README.md b/docs/sre-developers-and-users/injecting-faults/README.md index 038d7bb..0944652 100644 --- a/docs/sre-developers-and-users/injecting-faults/README.md +++ b/docs/sre-developers-and-users/injecting-faults/README.md @@ -7,6 +7,6 @@ Mangle supports two broad category of faults: 1. Infrastructure Faults 2. Application Faults -**Infrastructure Faults** are a set of faults that target IAAS components where developers host and run their applications. For eg: this might be a virtual machine or an AWS EC2 instance where the application runs as a service or a Docker host where the application containers are hosted or a K8s cluster where the pods host the application. These components are usually shared with multiple applications running on the same infrastructure and are referred to as **endpoints **in Mangle. So faults against these components will impact multiple applications unless they have different levels of fault tolerance. +**Infrastructure Faults** are a set of faults that target IAAS components where developers host and run their applications. For eg: this might be a virtual machine or an AWS EC2 instance where the application runs as a service or a Docker host where the application containers are hosted or a K8s cluster where the pods host the application. 
These components are usually shared with multiple applications running on the same infrastructure and are referred to as **endpoints** in Mangle. So faults against these components will impact multiple applications unless they have different levels of fault tolerance. **Application Faults** are a set of faults that target specific applications running within a given infrastructure component or endpoint. For eg: this could be a specific tomcat application running within a virtual machine or an AWS EC2 instance or JAVA applications running within containers on a Docker host or K8s pods. Faults against applications typically will impact just that application and ideally should not bring down any other applications running on the same infrastructure or is dependent on the affected service. If it does, your system is prone to cascading failures and should be examined in great detail to improve fault tolerance levels. diff --git a/docs/sre-developers-and-users/injecting-faults/application-faults.md b/docs/sre-developers-and-users/injecting-faults/application-faults.md index 02bfa06..6a27645 100644 --- a/docs/sre-developers-and-users/injecting-faults/application-faults.md +++ b/docs/sre-developers-and-users/injecting-faults/application-faults.md @@ -174,17 +174,17 @@ Java Method Latency Fault helps you simulate a condition where calls to a specif 6. Provide "Class Name" as PluginController if the class of interest is defined as `public class PluginController {...}`. 7. 
Provide "Method Name" as getPlugins if the method to be tested is defined as follows: - `public ResponseEntity> getPlugins( ` + `public ResponseEntity> getPlugins(` - `@RequestParam(value = "pluginId", required = false) String pluginId, @RequestParam(value = "extensionType", required = false) ExtensionType extensionType) { ` + `@RequestParam(value = "pluginId", required = false) String pluginId, @RequestParam(value = "extensionType", required = false) ExtensionType extensionType) {` - `log.info("PluginController getPlugins() Start............."); ` + `log.info("PluginController getPlugins() Start.............");` - `if (StringUtils.hasLength(pluginId)) { ` + `if (StringUtils.hasLength(pluginId)) {` - `return new ResponseEntity<>(pluginService.getExtensions(pluginId, extensionType), HttpStatus.OK); ` + `return new ResponseEntity<>(pluginService.getExtensions(pluginId, extensionType), HttpStatus.OK);` - `} ` + `}` `return new ResponseEntity<>(pluginService.getExtensions(), HttpStatus.OK);` @@ -299,17 +299,17 @@ Java Method Exception Fault helps you simulate a condition where calls to a spec 6. Provide "Class Name" as PluginController if the class of interest is defined as `public class PluginController {...}`. 7. 
Provide "Method Name" as getPlugins if the method to be tested is defined as follows: - `public ResponseEntity> getPlugins( ` + `public ResponseEntity> getPlugins(` - `@RequestParam(value = "pluginId", required = false) String pluginId, @RequestParam(value = "extensionType", required = false) ExtensionType extensionType) { ` + `@RequestParam(value = "pluginId", required = false) String pluginId, @RequestParam(value = "extensionType", required = false) ExtensionType extensionType) {` - `log.info("PluginController getPlugins() Start............."); ` + `log.info("PluginController getPlugins() Start.............");` - `if (StringUtils.hasLength(pluginId)) { ` + `if (StringUtils.hasLength(pluginId)) {` - `return new ResponseEntity<>(pluginService.getExtensions(pluginId, extensionType), HttpStatus.OK); ` + `return new ResponseEntity<>(pluginService.getExtensions(pluginId, extensionType), HttpStatus.OK);` - `} ` + `}` `return new ResponseEntity<>(pluginService.getExtensions(), HttpStatus.OK);` @@ -353,17 +353,17 @@ Kill JVM Fault helps you simulate a condition where JVM crashes with specific ex 6. Provide "Class Name" as PluginController if the class of interest is defined as `public class PluginController {...}`. 7. 
Provide "Method Name" as getPlugins if the method to be tested is defined as follows: - `public ResponseEntity> getPlugins( ` + `public ResponseEntity> getPlugins(` - `@RequestParam(value = "pluginId", required = false) String pluginId, @RequestParam(value = "extensionType", required = false) ExtensionType extensionType) { ` + `@RequestParam(value = "pluginId", required = false) String pluginId, @RequestParam(value = "extensionType", required = false) ExtensionType extensionType) {` - `log.info("PluginController getPlugins() Start............."); ` + `log.info("PluginController getPlugins() Start.............");` - `if (StringUtils.hasLength(pluginId)) { ` + `if (StringUtils.hasLength(pluginId)) {` - `return new ResponseEntity<>(pluginService.getExtensions(pluginId, extensionType), HttpStatus.OK); ` + `return new ResponseEntity<>(pluginService.getExtensions(pluginId, extensionType), HttpStatus.OK);` - `} ` + `}` `return new ResponseEntity<>(pluginService.getExtensions(), HttpStatus.OK);` @@ -389,7 +389,7 @@ Kill JVM Fault helps you simulate a condition where JVM crashes with specific ex {% hint style="info" %} **For access to relevant API Swagger documentation:** -Please traverse to link** **![](../../.gitbook/assets/help.png) -----> API Documentation from the Mangle UI or access _https://\/mangle-services/swagger-ui.html#_/_fault-injection-controller_ +Please traverse to link **** ![](../../.gitbook/assets/help.png) -----> API Documentation from the Mangle UI or access _https://\/mangle-services/swagger-ui.html#_/_fault-injection-controller_ ![](broken-reference) ![](../../.gitbook/assets/faultinjectioncontroller.png) {% endhint %} diff --git a/docs/sre-developers-and-users/injecting-faults/custom-faults.md b/docs/sre-developers-and-users/injecting-faults/custom-faults.md index 163e0cf..67852bf 100644 --- a/docs/sre-developers-and-users/injecting-faults/custom-faults.md +++ b/docs/sre-developers-and-users/injecting-faults/custom-faults.md @@ -13,13 +13,13 @@ 4. 
Update fields present in _**plugin.properties**_ file available at location src/main/resources - `plugin.id=mangle-plugin-skeleton ` + `plugin.id=mangle-plugin-skeleton` - `plugin.class=com.vmware.mangle.plugin.ManglePlugin ` + `plugin.class=com.vmware.mangle.plugin.ManglePlugin` - `plugin.version=2.0.0 ` + `plugin.version=2.0.0` - `plugin.provider=VMware Inc. ` + `plugin.provider=VMware Inc.` `plugin.dependencies=,` @@ -36,8 +36,8 @@ 8. The user should develop three types of classes as pf4j extensions for implementing the custom fault. 1. Model-Extension: Model extension is the data object for the task corresponding to Custom Fault. It should also extend the `PluginFaultSpec`, the base class for all the mangle fault inputs from user. - 2. Task-Extension: Task extension is the logical implementation of the Fault. It must implement the `AbstractTaskHelper `interface defined by mangle-task -framework. ‘mangle-task-framework' also provide `AbstractRemoteCommandExecutionTaskHelper `extensive implemented version of `AbstractTaskHelper`. By using the `AbstractRemoteCommandExecutionTaskHelper`, the developer of plugin is required to only provide the Injection and Remediation Commands and should not be concerned with rest of the task management. - 3. Fault-Extension: Fault extension hold the transformation logic to convert the User inputs of the Fault to the Model-Extension corresponding to its Task-Extension. This help user of custom fault to provide a very simple input and the task to have elaborated data class supporting wider management options. The Fault-Extension should extend `AbstractCustomFault `of mangle. + 2. Task-Extension: Task extension is the logical implementation of the Fault. It must implement the `AbstractTaskHelper` interface defined by mangle-task -framework. ‘mangle-task-framework' also provide `AbstractRemoteCommandExecutionTaskHelper` extensive implemented version of `AbstractTaskHelper`. 
By using the `AbstractRemoteCommandExecutionTaskHelper`, the developer of plugin is required to only provide the Injection and Remediation Commands and should not be concerned with rest of the task management. + 3. Fault-Extension: Fault extension hold the transformation logic to convert the User inputs of the Fault to the Model-Extension corresponding to its Task-Extension. This help user of custom fault to provide a very simple input and the task to have elaborated data class supporting wider management options. The Fault-Extension should extend `AbstractCustomFault` of mangle. More details provided for each of the above extensions in their own sections. @@ -50,43 +50,43 @@ \ `{` - ` "pluginId": "mangle-plugin-skeleton",` + `"pluginId": "mangle-plugin-skeleton",` - ` "faults": [` + `"faults": [` - ` {` + `{` - ` "faultParameters": {` + `"faultParameters": {` - ` "field1": "field1",` + `"field1": "field1",` - ` "field2": "field2"` + `"field2": "field2"` - ` },` + `},` - ` "extensionDetails": {` + `"extensionDetails": {` - ` "modelExtensionName": "com.vmware.mangle.plugin.model.faults.specs.HelloMangleFaultSpec",` + `"modelExtensionName": "com.vmware.mangle.plugin.model.faults.specs.HelloMangleFaultSpec",` - ` "taskExtensionName": "com.vmware.mangle.plugin.tasks.impl.HelloManglePluginTaskHelper",` + `"taskExtensionName": "com.vmware.mangle.plugin.tasks.impl.HelloManglePluginTaskHelper",` - ` "faultExtensionName": "com.vmware.mangle.plugin.helpers.faults.HelloMangleFault"` + `"faultExtensionName": "com.vmware.mangle.plugin.helpers.faults.HelloMangleFault"` - ` },` + `},` - ` "faultName": "mangle-plugin-skeleton-HelloMangleFault",` + `"faultName": "mangle-plugin-skeleton-HelloMangleFault",` - ` "supportedEndpoints": [` + `"supportedEndpoints": [` - ` "MACHINE"` + `"MACHINE"` - ` ],` + `],` - ` "pluginId": "mangle-plugin-skeleton"` + `"pluginId": "mangle-plugin-skeleton"` - ` }` + `}` - ` ]` + `]` `}` @@ -101,40 +101,40 @@ 1. 
Find all the registered Custom faults at - `GET `[`https://localhost:8443/mangle-services/rest/api/v1/plugins/plugin-details?pluginId=mangle-plugin-skeleton`](https://localhost:8443/mangle-services/rest/api/v1/plugins/plugin-details?pluginId=mangle-plugin-skeleton) + `GET` [`https://localhost:8443/mangle-services/rest/api/v1/plugins/plugin-details?pluginId=mangle-plugin-skeleton`](https://localhost:8443/mangle-services/rest/api/v1/plugins/plugin-details?pluginId=mangle-plugin-skeleton) 2. Find the sample request data for any registered custom Fault at - `GET `[`https://localhost:8443/mangle-services/rest/api/v1/plugins/request-json?faultName=`](https://localhost:8443/mangle-services/rest/api/v1/plugins/request-json?faultName=)` mangle-plugin-skeleton-HelloMangleFault&pluginId=mangle-plugin-skeleton ` + `GET` [`https://localhost:8443/mangle-services/rest/api/v1/plugins/request-json?faultName=`](https://localhost:8443/mangle-services/rest/api/v1/plugins/request-json?faultName=) `mangle-plugin-skeleton-HelloMangleFault&pluginId=mangle-plugin-skeleton` 3. Invoke Custom fault by providing the request as per the sample received in last step at - `POST `[`https://localhost:8443/mangle-services/rest/api/v1/plugins/custom-fault`](https://localhost:8443/mangle-services/rest/api/v1/plugins/custom-fault)` ` + `POST` [`https://localhost:8443/mangle-services/rest/api/v1/plugins/custom-fault`](https://localhost:8443/mangle-services/rest/api/v1/plugins/custom-fault) `` Sample Request: \ `{` - ` "faultName": "mangle-plugin-skeleton-HelloMangleFault",` + `"faultName": "mangle-plugin-skeleton-HelloMangleFault",` - ` "endpointName": "testEndpoint",` + `"endpointName": "testEndpoint",` - ` "faultParameters": {` + `"faultParameters": {` - ` "field1": "Hi",` + `"field1": "Hi",` - ` "field2": "Mangle"` + `"field2": "Mangle"` - ` },` + `},` - ` "pluginId": "mangle-plugin-skeleton"` + `"pluginId": "mangle-plugin-skeleton"` `}` -13. 
Model-Extension: An example is available as `HelloMangleFaultSpec `at package com.vmware.mangle.plugin.model.faults.specs of mangle-plugin-skeleton. Plugin developer is expected to provide only the parameters he is expecting from the user of his fault while executing in his environment. The plugin developer can conveniently Ignore the fields that are inherited from the base class `CommandExecutionFaultSpec` which are designed for the Management of Faults as Asynchronous tasks in Mangle. +13. Model-Extension: An example is available as `HelloMangleFaultSpec` at package com.vmware.mangle.plugin.model.faults.specs of mangle-plugin-skeleton. Plugin developer is expected to provide only the parameters he is expecting from the user of his fault while executing in his environment. The plugin developer can conveniently Ignore the fields that are inherited from the base class `CommandExecutionFaultSpec` which are designed for the Management of Faults as Asynchronous tasks in Mangle. -14. Task-Extension: An example is available as `HelloManglePluginTaskHelper` at package com.vmware.mangle.test.plugin.helpers of mangle-plugin-skeleton. This task Helper is an implementation of `AbstractRemoteCommandExecutionTaskHelper`. The implementation of `AbstractRemoteCommandExecutionTaskHelper `is only expected to provide the implementation for below methods: +14. Task-Extension: An example is available as `HelloManglePluginTaskHelper` at package com.vmware.mangle.test.plugin.helpers of mangle-plugin-skeleton. This task Helper is an implementation of `AbstractRemoteCommandExecutionTaskHelper`. 
The implementation of `AbstractRemoteCommandExecutionTaskHelper` is only expected to provide the implementation for below methods: @@ -144,35 +144,35 @@ - **`public Task init(T taskData, String injectedTaskId) throws MangleException; `** + **`public Task init(T taskData, String injectedTaskId) throws MangleException;`** Should provide the Implementation to initialize the Task Helper for executing the Fault, if the existing Task id also provided. This method will be used for executing the Remediation on a Task if the Remediation is available. This initialization is not used for task rerun or the Re-trigger. - **`public void executeTask(Task task) throws MangleException; `** + **`public void executeTask(Task task) throws MangleException;`** Provide the Implementation for execution steps required in addition to Implementation available in `AbstractRemoteCommandExecutionTaskHelper`. Plugin developer can use this interface to invoke his own implementation of Helpers for supporting his Fault across multiple endpoints supported in mangle. - **`protected ICommandExecutor getExecutor(Task task) throws MangleException; `** + **`protected ICommandExecutor getExecutor(Task task) throws MangleException;`** - Provide the Implementation for defining the Executor required for the Fault Execution. Mangle provide a default implementation of a executor for each Supported Endpoint. The Plugin user is free to use his own executor as long as he is implementing the resource as per the interface `ICommandExecutor `available at package com.vmware.mangle.utils; + Provide the Implementation for defining the Executor required for the Fault Execution. Mangle provide a default implementation of a executor for each Supported Endpoint. 
The Plugin user is free to use his own executor as long as he is implementing the resource as per the interface `ICommandExecutor` available at package com.vmware.mangle.utils; - **`protected void checkTaskSpecificPrerequisites(Task task) throws MangleException; `** + **`protected void checkTaskSpecificPrerequisites(Task task) throws MangleException;`** Provide the Implementation if the Fault being developed expect the test machine to be satisfying a condition for the execution. This step is separated from the Fault execution as Mangle wants to make sure the Fault execution or Remediation will not leave the user environment in a irrecoverable state due to execution of them in a non-perquisite satisfying machine. - **`protected void prepareEndpoint(Task task, List listOfFaultInjectionScripts) throws MangleException; `**Provide the Implementation if the Fault execution needs certain changes to the Test Machine before execution. Examples are Copying a binary file required to execute a fault. This step is optional for user as the predefined implementation already copies the files returned by `listFaultInjectionScripts() `to the remote machine. + **`protected void prepareEndpoint(Task task, List listOfFaultInjectionScripts) throws MangleException;`** Provide the Implementation if the Fault execution needs certain changes to the Test Machine before execution. Examples are Copying a binary file required to execute a fault. This step is optional for user as the predefined implementation already copies the files returned by `listFaultInjectionScripts()` to the remote machine. - **`public String getDescription(Task task); `** + **`public String getDescription(Task task);`** Provide Implementation to generate description for Fault based on user inputs to help him to identify the task in future through the description. A generic implementation is already available at TaskDescriptionUtils.getDescription(task). @@ -189,7 +189,7 @@ 16. 
Mangle does not support the inclusion of Custom Endpoints through Plugin. The requirement of addition of endpoint can be gone through the Mange contributions flow as defined in Mangle repository. -17. Task-Extension Deep Dive: An example is available as `HelloManglePluginTaskHelper `at package com.vmware.mangle.plugin.tasks.impl of mangle-plugin-skeleton. This task Helper is an implementation of `AbstractRemoteCommandExecutionTaskHelper`. The implementation of `AbstractRemoteCommandExecutionTaskHelper `is only expected to provide the implementation for below methods: +17. Task-Extension Deep Dive: An example is available as `HelloManglePluginTaskHelper` at package com.vmware.mangle.plugin.tasks.impl of mangle-plugin-skeleton. This task Helper is an implementation of `AbstractRemoteCommandExecutionTaskHelper`. The implementation of `AbstractRemoteCommandExecutionTaskHelper` is only expected to provide the implementation for below methods: @@ -205,13 +205,13 @@ - **`public void executeTask(Task task) throws MangleException; `** + **`public void executeTask(Task task) throws MangleException;`** Provide the Implementation for execution steps required in addition to Implementation available in `AbstractRemoteCommandExecutionTaskHelper`. Plugin developer can use this interface to invoke his own implementation of Helpers for supporting his Fault across multiple endpoints supported in mangle. - **`protected ICommandExecutor getExecutor(Task task) throws MangleException; `** + **`protected ICommandExecutor getExecutor(Task task) throws MangleException;`** Provide the Implementation for defining the Executor required for the Fault Execution. Mangle provide a default implementation of a executor for each Supported Endpoint. The Plugin user should use appropriate executor as per the endpoint provided as the target. Below is the Mapping of Executors to their Endpoint Types. 
@@ -227,29 +227,29 @@ - **`protected void checkTaskSpecificPrerequisites(Task task) throws MangleException; `** + **`protected void checkTaskSpecificPrerequisites(Task task) throws MangleException;`** Provide the Implementation if the Fault being developed expect the test machine to be satisfying a condition for the execution. This step is separated from the Fault execution as Mangle wants to make sure the Fault execution or Remediation will not leave the user environment in a irrecoverable state due to execution of them in a non-perquisite satisfying machine. - **`protected void prepareEndpoint(Task task, List listOfFaultInjectionScripts) throws MangleException; `**Provide the Implementation if the Fault execution needs certain changes to the Test Machine before execution. Examples are Copying a binary file required to execute a fault. This step is optional for user as the predefined implementation already copies the files returned by `listFaultInjectionScripts()` to the remote machine. + **`protected void prepareEndpoint(Task task, List listOfFaultInjectionScripts) throws MangleException;`** Provide the Implementation if the Fault execution needs certain changes to the Test Machine before execution. Examples are Copying a binary file required to execute a fault. This step is optional for user as the predefined implementation already copies the files returned by `listFaultInjectionScripts()` to the remote machine. - **`public String getDescription(Task task); `** + **`public String getDescription(Task task);`** Provide Implementation to generate description for Fault based on user inputs to help him to identify the task in future through the description. A generic implementation is already available at `TaskDescriptionUtils.getDescription(task)`. 
- **`public List listFaultInjectionScripts(Task task); `** + **`public List listFaultInjectionScripts(Task task);`** Provide an implementation that return details of the support scrips to be copied to test machine required for executing the fault getting implemented. The support files can be any file required to be placed in the target in order to execute the developed fault. All the out of the box executors is capable of copying files to the corresponding targeted endpoint and the process completes automatically by default implementation of the `AbstractRemoteCommandExecutionTaskHelper`, provide that the names of the files are returned through `listFaultInjectionScripts()` implementation. - **`private List getInjectionCommandInfoList(T faultSpec) {} `** + **`private List getInjectionCommandInfoList(T faultSpec) {}`** Provide the commands to be executed for the Fault to be Injected. The commands should be provided as List. The fields and descriptions for the CommandInfo Fields. @@ -264,30 +264,30 @@ - `public class CommandOutputProcessingInfo ` + `public class CommandOutputProcessingInfo` Fields are 1. `private String regExpression;` Regular Expression Pattern to be used to collect an crucial information from current command’s execution to make it available throughout the Fault execution. - 2. `private String extractedPropertyName; ` + 2. `private String extractedPropertyName;` Name should be given to the collected information using the pattern given as regExpression - **Types of Variables and Their Usage: ** + **Types of Variables and Their Usage:** The information provided by the user or collected during the runtime of Fault are made available to command executor as below types of Variables. - 1. `TaskTroubleShootingInfo `of the Task holds the extracted information from the command execution Output. - 2. args field of `CommandExecutionFaultSpec `available as `taskData `in Task holds the data received from the user as args. - 3. 
`$FI_ADD_INFO_FieldName` can be used to refer to variables from `TaskTroubleShootingInfo ` + 1. `TaskTroubleShootingInfo` of the Task holds the extracted information from the command execution Output. + 2. args field of `CommandExecutionFaultSpec` available as `taskData` in Task holds the data received from the user as args. + 3. `$FI_ADD_INFO_FieldName` can be used to refer to variables from `TaskTroubleShootingInfo` 4. `$FI_ARG_Fieldname` can be used to refer to variables from args. 5. `$FI_STACK` can be used to refer to the output of the previous command. - 9. `private List getRemediationCommandInfoList(T faultSpec) {} ` + 9. `private List getRemediationCommandInfoList(T faultSpec) {}` - Provide the commands to for remediating the fault already Injected. The semantics of `CommandInfo `is same as it described in the previous section. The args and `TaskTroubleShootingInfo `collected during the injection will be available during the execution of remediation as well. Hence the dependency data from injection task can be passed to remediation by using the References in the commands. + Provide the commands to for remediating the fault already Injected. The semantics of `CommandInfo` is same as it described in the previous section. The args and `TaskTroubleShootingInfo` collected during the injection will be available during the execution of remediation as well. Hence the dependency data from injection task can be passed to remediation by using the References in the commands. 
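The variable references described above can be illustrated with a small, self-contained sketch. This is not Mangle's implementation: the class `PlaceholderSketch` and its `resolve` method are hypothetical names introduced only for illustration, and only the `$FI_ARG_`, `$FI_ADD_INFO_` and `$FI_STACK` prefixes come from the text. The sketch assumes field names consist of letters, digits and underscores, and it leaves unknown references untouched.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the Mangle codebase.
public class PlaceholderSketch {

    // Matches $FI_ARG_<name>, $FI_ADD_INFO_<name> or $FI_STACK.
    private static final Pattern REF =
            Pattern.compile("\\$FI_(?:ARG_(\\w+)|ADD_INFO_(\\w+)|STACK)");

    /**
     * Replaces variable references in a command string.
     *
     * @param command        command text containing references
     * @param args           user-supplied args (stand-in for the fault spec args)
     * @param addInfo        values extracted from earlier command output
     * @param previousOutput output of the previous command ($FI_STACK)
     */
    public static String resolve(String command,
                                 Map<String, String> args,
                                 Map<String, String> addInfo,
                                 String previousOutput) {
        Matcher m = REF.matcher(command);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value;
            if (m.group(1) != null) {
                value = args.get(m.group(1));       // $FI_ARG_<name>
            } else if (m.group(2) != null) {
                value = addInfo.get(m.group(2));    // $FI_ADD_INFO_<name>
            } else {
                value = previousOutput;             // $FI_STACK
            }
            if (value == null) {
                value = m.group();                  // keep unknown references as-is
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] unused) {
        String resolved = resolve(
                "kill -9 $FI_ADD_INFO_pid && echo $FI_ARG_field1 $FI_STACK",
                Collections.singletonMap("field1", "Hi"),
                Collections.singletonMap("pid", "4242"),
                "done");
        System.out.println(resolved); // prints "kill -9 4242 && echo Hi done"
    }
}
```

The same references can appear in remediation commands, because the args and the extracted information collected during injection remain available at remediation time.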
diff --git a/docs/sre-developers-and-users/injecting-faults/fault-events-in-dynatrace.md b/docs/sre-developers-and-users/injecting-faults/fault-events-in-dynatrace.md new file mode 100644 index 0000000..f982380 --- /dev/null +++ b/docs/sre-developers-and-users/injecting-faults/fault-events-in-dynatrace.md @@ -0,0 +1,17 @@ +# Fault Events in Dynatrace + +If Dynatrace is your preferred metric provider and you want to publish fault events from Mangle to Dynatrace, then while setting tags during fault injection, provide the entity ID of the endpoint / service being impacted by the fault as the tag value. Please refer to the screenshot below for an example of how to add these tags. + +![Adding entity ids as Tags for Dynatrace integration](../../.gitbook/assets/Adding\_EntityId\_as\_tag.png) + +Sending a fault injection event to Dynatrace will fail if the entity ID specified in the tag is invalid. Once a valid entity ID is provided as the value in the Tags section, fault injection events will appear in the Dynatrace UI under the specified entity (the endpoint / service being impacted). Please refer to the screenshot below for the fault injection events. + +![Fault Injection E](../../.gitbook/assets/FaultInjection\_Dynatrace\_Events.png) + +{% hint style="info" %} +The entity ID of a service / entity can be retrieved from the Dynatrace UI using the URL. Navigate to the entity page in Dynatrace and you will find the entity ID in the URL.
+ + +{% endhint %} + +![Entity ID in Dynatrace entity page URL](../../.gitbook/assets/Url\_Entity\_Page.png) diff --git a/docs/sre-developers-and-users/injecting-faults/infrastructure-faults.md b/docs/sre-developers-and-users/injecting-faults/infrastructure-faults.md index 4b9a634..e6a38e2 100644 --- a/docs/sre-developers-and-users/injecting-faults/infrastructure-faults.md +++ b/docs/sre-developers-and-users/injecting-faults/infrastructure-faults.md @@ -50,7 +50,7 @@ CPU fault enables spiking cpu usage values for a selected endpoint by a percenta 5. Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Else the default temp location will be used. 6. Provide a "Timeout" value in milliseconds. For eg: if you need the CPU load of 80% to be sustained for a duration of 1 hour then you should provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will ensure remediation of the fault without any manual intervention. 7. Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint. -8. Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Integrations ---> Metric Providers at the time of publishing events for fault injection and remediation. They are optional and you can choose to exclude this while running faults. +8. Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Integrations ---> Metric Providers at the time of publishing events for fault injection and remediation. They are optional and you can choose to exclude this while running faults. If you are using Dynatrace as your preferred metric provider, please refer to the additional instructions provided [here](fault-events-in-dynatrace.md) on setting entity IDs and tags for the integration to work as expected. 9. 
Supported notifiers include Slack channels that are configured under Mangle Admin settings ---> Integrations ---> Notifiers. This will enable Mangle to automatically publish status of fault injections to the appropriate Slack channels for monitoring purposes. They are optional and you can choose to exclude this while running faults. 10. Click on Run Fault. 11. The user will be re-directed to the Processed Requests section under Requests & Reports tab. @@ -76,7 +76,7 @@ Memory fault enables spiking memory usage values for a selected endpoint by a pe 5. Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Else the default temp location will be used. 6. Provide a "Timeout" value in milliseconds. For eg: if you need the Memory load of 80% to be sustained for a duration of 1 hour then you should provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will ensure remediation of the fault without any manual intervention. 7. Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint. -8. Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Integrations ---> Metric Providers at the time of publishing events for fault injection and remediation. They are optional and you can choose to exclude this while running faults. +8. Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Integrations ---> Metric Providers at the time of publishing events for fault injection and remediation. They are optional and you can choose to exclude this while running faults. If you are using Dynatrace as your preferred metric provider, please refer to the additional instructions provided [here](fault-events-in-dynatrace.md) on setting entity IDs and tags for the integration to work as expected. 9. 
Supported notifiers include Slack channels that are configured under Mangle Admin settings ---> Integrations ---> Notifiers. This will enable Mangle to automatically publish status of fault injections to the appropriate Slack channels for monitoring purposes. They are optional and you can choose to exclude this while running faults. 10. Click on Run Fault. 11. The user will be re-directed to the Processed Requests section under Requests & Reports tab. @@ -102,7 +102,7 @@ Disk IO fault enables spiking disk IO operation for a selected endpoint by an IO 5. Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Else the default temp location will be used. 6. Provide a "Timeout" value in milliseconds. For eg: if you need the IO load of 8192000 to be sustained for a duration of 1 hour then you should provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will ensure remediation of the fault without any manual intervention. 7. Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint. -8. Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Integrations ---> Metric Providers at the time of publishing events for fault injection and remediation. They are optional and you can choose to exclude this while running faults. +8. Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Integrations ---> Metric Providers at the time of publishing events for fault injection and remediation. They are optional and you can choose to exclude this while running faults. If you are using Dynatrace as your preferred metric provider, please refer to the additional instructions provided [here](fault-events-in-dynatrace.md) on setting entity IDs and tags for the integration to work as expected. 9. 
Supported notifiers include Slack channels that are configured under Mangle Admin settings ---> Integrations ---> Notifiers. This will enable Mangle to automatically publish status of fault injections to the appropriate Slack channels for monitoring purposes. They are optional and you can choose to exclude this while running faults. 10. Click on Run Fault. 11. The user will be re-directed to the Processed Requests section under Requests & Reports tab. @@ -129,7 +129,7 @@ Kill Process fault enables abrupt termination of any process that is running on 6. Provide a "Remediation Command". For eg: To start the sshd process that was killed on an Ubuntu 17 Server, specify the remediation command as _"sudo service ssh start" ._ 7. Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Else the default temp location will be used. 8. Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint. -9. Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Integrations ---> Metric Providers at the time of publishing events for fault injection and remediation. They are optional and you can choose to exclude this while running faults. +9. Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Integrations ---> Metric Providers at the time of publishing events for fault injection and remediation. They are optional and you can choose to exclude this while running faults. If you are using Dynatrace as your preferred metric provider, please refer to the additional instructions provided [here](fault-events-in-dynatrace.md) on setting entity IDs and tags for the integration to work as expected. 10. Supported notifiers include Slack channels that are configured under Mangle Admin settings ---> Integrations ---> Notifiers. 
This will enable Mangle to automatically publish status of fault injections to the appropriate Slack channels for monitoring purposes. They are optional and you can choose to exclude this while running faults. 11. Click on Run Fault. 12. The user will be re-directed to the Processed Requests section under Requests & Reports tab. @@ -155,7 +155,7 @@ Stop service fault enables graceful shutdown of any process that is running on t 5. Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Else the default temp location will be used. 6. Provide a "Timeout" value in milliseconds. For eg: if you need the Service to be stopped for a duration of 1 hour, then you should provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will ensure remediation of the fault without any manual intervention. 7. Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint. -8. Tags are key value pairs that will be sent to the active monitoring tool under Mangle Admin settings ---> Integrations ---> Metric Providers at the time of publishing events for fault injection and remediation. They are optional and you can choose to exclude this while running faults. +8. Tags are key value pairs that will be sent to the active monitoring tool under Mangle Admin settings ---> Integrations ---> Metric Providers at the time of publishing events for fault injection and remediation. They are optional and you can choose to exclude this while running faults. If you are using Dynatrace as your preferred metric provider, please refer to the additional instructions provided [here](fault-events-in-dynatrace.md) on setting entity IDs and tags for the integration to work as expected. 9. Supported notifiers include Slack channels that are configured under Mangle Admin settings ---> Integrations ---> Notifiers. 
This will enable Mangle to automatically publish status of fault injections to the appropriate Slack channels for monitoring purposes. They are optional and you can choose to exclude this while running faults. 10. Click on Run Fault. 11. The user will be re-directed to the Processed Requests section under Requests & Reports tab. @@ -180,7 +180,7 @@ File Handler Leak fault enables you to simulate conditions where a program reque 4. Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Else the default temp location will be used. 5. Provide a "Timeout" value in milliseconds. For eg: if you need the out of file handles error to be sustained for a duration of 1 hour then you should provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will ensure remediation of the fault without any manual intervention. 6. Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint. -7. Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Integrations ---> Metric Providers at the time of publishing events for fault injection and remediation. They are optional and you can choose to exclude this while running faults. +7. Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Integrations ---> Metric Providers at the time of publishing events for fault injection and remediation. They are optional and you can choose to exclude this while running faults. If you are using Dynatrace as your preferred metric provider, please refer to the additional instructions provided [here](fault-events-in-dynatrace.md) on setting entity IDs and tags for the integration to work as expected. 8. Supported notifiers include Slack channels that are configured under Mangle Admin settings ---> Integrations ---> Notifiers. 
This will enable Mangle to automatically publish status of fault injections to the appropriate Slack channels for monitoring purposes. They are optional and you can choose to exclude this while running faults. 9. Click on Run Fault. 10. The user will be re-directed to the Processed Requests section under Requests & Reports tab. @@ -209,7 +209,7 @@ Disk Space Fault enables you to simulate out of disk or low disk space condition 6. Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Else the default temp location will be used. 7. Provide a "Timeout" value in milliseconds. For eg: if you need the low disk or out of disk condition to be sustained for a duration of 1 hour then you should provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will ensure remediation of the fault without any manual intervention. 8. Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint. -9. Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Integrations ---> Metric Providers at the time of publishing events for fault injection and remediation. They are optional and you can choose to exclude this while running faults. +9. Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Integrations ---> Metric Providers at the time of publishing events for fault injection and remediation. They are optional and you can choose to exclude this while running faults. If you are using Dynatrace as your preferred metric provider, please refer to the additional instructions provided [here](fault-events-in-dynatrace.md) on setting entity IDs and tags for the integration to work as expected. 10. 
Supported notifiers include Slack channels that are configured under Mangle Admin settings ---> Integrations ---> Notifiers. This will enable Mangle to automatically publish status of fault injections to the appropriate Slack channels for monitoring purposes. They are optional and you can choose to exclude this while running faults. 11. Click on Run Fault. 12. The user will be re-directed to the Processed Requests section under Requests & Reports tab. @@ -276,7 +276,7 @@ Clock Skew Fault simulates conditions where the endpoint time is distorted and d 6. Set the skew time by specifying the seconds, minutes, hours and days or a combination of these options. 7. Provide a "Timeout" value in milliseconds. For eg: if you need the clock skew condition to be sustained for a duration of 1 hour then you should provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will ensure remediation of the fault without any manual intervention. 8. Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint. -9. Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Integrations ---> Metric Providers at the time of publishing events for fault injection and remediation. They are optional and you can choose to exclude this while running faults. +9. Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Integrations ---> Metric Providers at the time of publishing events for fault injection and remediation. They are optional and you can choose to exclude this while running faults. If you are using Dynatrace as your preferred metric provider, please refer to the additional instructions provided [here](fault-events-in-dynatrace.md) on setting entity IDs and tags for the integration to work as expected. 10. 
Supported notifiers include Slack channels that are configured under Mangle Admin settings ---> Integrations ---> Notifiers. This will enable Mangle to automatically publish status of fault injections to the appropriate Slack channels for monitoring purposes. They are optional and you can choose to exclude this while running faults. 11. Click on Run Fault. 12. The user will be re-directed to the Processed Requests section under Requests & Reports tab. @@ -291,27 +291,29 @@ Clock Skew Fault simulates conditions where the endpoint time is distorted and d ![](../../.gitbook/assets/wavefrontevents.png) -## ![](../../.gitbook/assets/new\_logo.png) Network Partition Fault +## Network Faults -Network Partition Fault simulates conditions where endpoints lose connectivity due to a network split primarily due to failures in underlying network devices. This induces cases where clustered setups lose nodes with impact to high availability, data consistency and end up in split brain scenario in the worst cases. +Network Faults enables you to simulate unfavorable conditions such as packet delay, packet duplication, packet loss and packet corruption. With the help of a timeout field the duration for the fault run can be specified after which Mangle triggers the automatic remediation procedure. + +### Packet Delay **Steps to follow:** 1. Login as a user with read and write privileges to Mangle. -2. Navigate to Fault Execution tab ---> Infrastructure Faults ---> Network Partition. Only remote machine and remote machine clusters are supported for this fault. +2. Navigate to Fault Execution tab ---> Infrastructure Faults ---> Network ---> Packet Delay. 3. Select an Endpoint. -4. Provide a host IP or a list of host IPs to which the endpoint should lose network connectivity due to network partition. -5. If the single host IP provided is identical to the Endpoint host, it throws error at the injection of fault. 
Because, the Endpoint host and the host IP provided must be different.\ - But if user provides host IPs list and if a host IP is identical to the one in Endpoint host/ Endpoint group hosts, the fault injection proceeds by selecting the Endpoint -Host IP pair of the remaining list. +4. Provide a "Nic Name". For eg: For a remote machine endpoint Nic name could be eth0, eth1 etc depending on what adapter you would want to target for the fault. +5. Provide a "Latency" value in milliseconds. For eg: 1000 to simulate a packet delay of 1 second on a particular network interface of an Endpoint. 6. Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Else the default temp location will be used. -7. Provide a "Timeout" value in milliseconds. For eg: if you need the partitioning to be sustained for a duration of 1 hour then you should provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will ensure remediation of the fault without any manual intervention. -8. Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint. -9. Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Integrations ---> Metric Providers at the time of publishing events for fault injection and remediation. They are optional and you can choose to exclude this while running faults. -10. Supported notifiers include Slack channels that are configured under Mangle Admin settings ---> Integrations ---> Notifiers. This will enable Mangle to automatically publish status of fault injections to the appropriate Slack channels for monitoring purposes. They are optional and you can choose to exclude this while running faults. -11. Click on Run Fault. -12. The user will be re-directed to the Processed Requests section under Requests & Reports tab. -13. 
If Mangle was able to successfully trigger the fault, the status of the task will change to "INJECTED". The fault will continue to run at the endpoint until the timeout expires or a remediation request is triggered. The option to trigger a remediation request at anytime can be found on clicking the ![](../../.gitbook/assets/actions\_button.png) button against the task in the Processed Requests table. The task will be updated to "COMPLETED" once the task is auto remediated or manually remediated before the fault timeout. -14. For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes. +7. Provide a "Jitter" value in milliseconds only if you would like Mangle to generate a variable fault that would change between two thresholds ie: Latency +/- Jitter. For eg: If you want the fault of 1 minute to vary between 30 sec and 90 sec then you provide a Latency value of 60000ms and a Jitter value of 30000ms. +8. Provide a "Timeout" value in milliseconds. For eg: if you need the packet delay to be sustained for a duration of 1 hour then you should provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will ensure remediation of the fault without any manual intervention. +9. Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint. +10. Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Integrations ---> Metric Providers at the time of publishing events for fault injection and remediation. They are optional and you can choose to exclude this while running faults. +11. Supported notifiers include Slack channels that are configured under Mangle Admin settings ---> Integrations ---> Notifiers. 
This will enable Mangle to automatically publish status of fault injections to the appropriate Slack channels for monitoring purposes. They are optional and you can choose to exclude this while running faults. +12. Click on Run Fault. +13. The user will be re-directed to the Processed Requests section under Requests & Reports tab. +14. If Mangle was able to successfully trigger the fault, the status of the task will change to "INJECTED". The fault will continue to run at the endpoint until the timeout expires or a remediation request is triggered. The option to trigger a remediation request at anytime can be found on clicking the ![](../../.gitbook/assets/actions\_button.png) button against the task in the Processed Requests table. The task will be updated to "COMPLETED" once the task is auto remediated or manually remediated before the fault timeout. +15. For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes. @@ -319,23 +321,19 @@ Network Partition Fault simulates conditions where endpoints lose connectivity d - ![](../../.gitbook/assets/wavefrontevents.png) - -## Network Faults - -Network Faults enables you to simulate unfavorable conditions such as packet delay, packet duplication, packet loss and packet corruption. With the help of a timeout field the duration for the fault run can be specified after which Mangle triggers the automatic remediation procedure. + ![](../../.gitbook/assets/wavefrontevents.png) -### Packet Delay +### Packet Duplication **Steps to follow:** 1. Login as a user with read and write privileges to Mangle. -2. Navigate to Fault Execution tab ---> Infrastructure Faults ---> Network ---> Packet Delay. +2. Navigate to Fault Execution tab ---> Infrastructure Faults ---> Network ---> Packet Duplicate. 3. Select an Endpoint. 4. 
Provide a "Nic Name". For eg: For a remote machine endpoint Nic name could be eth0, eth1 etc depending on what adapter you would want to target for the fault. -5. Provide a "Latency" value in milliseconds. For eg: 1000 to simulate a packet delay of 1 second on a particular network interface of an Endpoint. +5. Provide a "Percentage" value to specify what percentage of the packets should be duplicated. For eg: 10 to simulate a packet duplication of 10 percentage on a particular network interface of an Endpoint. 6. Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Else the default temp location will be used. -7. Provide a "Timeout" value in milliseconds. For eg: if you need the packet delay to be sustained for a duration of 1 hour then you should provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will ensure remediation of the fault without any manual intervention. +7. Provide a "Timeout" value in milliseconds. For eg: if you need the packet duplication to be sustained for a duration of 1 hour then you should provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will ensure remediation of the fault without any manual intervention. 8. Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint. 9. Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Integrations ---> Metric Providers at the time of publishing events for fault injection and remediation. They are optional and you can choose to exclude this while running faults. 10. Supported notifiers include Slack channels that are configured under Mangle Admin settings ---> Integrations ---> Notifiers. This will enable Mangle to automatically publish status of fault injections to the appropriate Slack channels for monitoring purposes. 
They are optional and you can choose to exclude this while running faults. @@ -350,19 +348,19 @@ Network Faults enables you to simulate unfavorable conditions such as packet del - ![](../../.gitbook/assets/wavefrontevents.png) + ![](../../.gitbook/assets/wavefrontevents.png) -### Packet Duplication +### Packet Loss **Steps to follow:** 1. Login as a user with read and write privileges to Mangle. -2. Navigate to Fault Execution tab ---> Infrastructure Faults ---> Network ---> Packet Duplicate. +2. Navigate to Fault Execution tab ---> Infrastructure Faults ---> Network ---> Packet Loss. 3. Select an Endpoint. 4. Provide a "Nic Name". For eg: For a remote machine endpoint Nic name could be eth0, eth1 etc depending on what adapter you would want to target for the fault. -5. Provide a "Percentage" value to specify what percentage of the packets should be duplicated. For eg: 10 to simulate a packet duplication of 10 percentage on a particular network interface of an Endpoint. +5. Provide a "Percentage" value to specify what percentage of the packets should be dropped. For eg: 10 to simulate a packet drop of 10 percentage on a particular network interface of an Endpoint. 6. Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Else the default temp location will be used. -7. Provide a "Timeout" value in milliseconds. For eg: if you need the packet duplication to be sustained for a duration of 1 hour then you should provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will ensure remediation of the fault without any manual intervention. +7. Provide a "Timeout" value in milliseconds. For eg: if you need the packet drop to be sustained for a duration of 1 hour then you should provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will ensure remediation of the fault without any manual intervention. 8. 
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint. 9. Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Integrations ---> Metric Providers at the time of publishing events for fault injection and remediation. They are optional and you can choose to exclude this while running faults. 10. Supported notifiers include Slack channels that are configured under Mangle Admin settings ---> Integrations ---> Notifiers. This will enable Mangle to automatically publish status of fault injections to the appropriate Slack channels for monitoring purposes. They are optional and you can choose to exclude this while running faults. @@ -379,17 +377,17 @@ Network Faults enables you to simulate unfavorable conditions such as packet del ![](../../.gitbook/assets/wavefrontevents.png) -### Packet Loss +### Packet Corruption **Steps to follow:** 1. Login as a user with read and write privileges to Mangle. -2. Navigate to Fault Execution tab ---> Infrastructure Faults ---> Network ---> Packet Loss. +2. Navigate to Fault Execution tab ---> Infrastructure Faults ---> Network ---> Packet Corruption. 3. Select an Endpoint. 4. Provide a "Nic Name". For eg: For a remote machine endpoint Nic name could be eth0, eth1 etc depending on what adapter you would want to target for the fault. -5. Provide a "Percentage" value to specify what percentage of the packets should be dropped. For eg: 10 to simulate a packet drop of 10 percentage on a particular network interface of an Endpoint. +5. Provide a "Percentage" value to specify what percentage of the packets should be corrupted. For eg: 10 to simulate a packet corruption of 10 percentage on a particular network interface of an Endpoint. 6. Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Else the default temp location will be used. 
-7. Provide a "Timeout" value in milliseconds. For eg: if you need the packet drop to be sustained for a duration of 1 hour then you should provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will ensure remediation of the fault without any manual intervention. +7. Provide a "Timeout" value in milliseconds. For eg: if you need the packet corruption to be sustained for a duration of 1 hour then you should provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will ensure remediation of the fault without any manual intervention. 8. Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint. 9. Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Integrations ---> Metric Providers at the time of publishing events for fault injection and remediation. They are optional and you can choose to exclude this while running faults. 10. Supported notifiers include Slack channels that are configured under Mangle Admin settings ---> Integrations ---> Notifiers. This will enable Mangle to automatically publish status of fault injections to the appropriate Slack channels for monitoring purposes. They are optional and you can choose to exclude this while running faults. @@ -406,17 +404,20 @@ Network Faults enables you to simulate unfavorable conditions such as packet del ![](../../.gitbook/assets/wavefrontevents.png) -### Packet Corruption +### Network Partition Fault + +Network Partition Fault simulates conditions where endpoints lose connectivity due to a network split primarily due to failures in underlying network devices. This induces cases where clustered setups lose nodes with impact to high availability, data consistency and end up in split brain scenario in the worst cases. **Steps to follow:** 1. Login as a user with read and write privileges to Mangle. -2. 
Navigate to Fault Execution tab ---> Infrastructure Faults ---> Network ---> Packet Corruption. +2. Navigate to Fault Execution tab ---> Infrastructure Faults ---> Network Partition. Only remote machine and remote machine clusters are supported for this fault. 3. Select an Endpoint. -4. Provide a "Nic Name". For eg: For a remote machine endpoint Nic name could be eth0, eth1 etc depending on what adapter you would want to target for the fault. -5. Provide a "Percentage" value to specify what percentage of the packets should be corrupted. For eg: 10 to simulate a packet corruption of 10 percentage on a particular network interface of an Endpoint. +4. Provide a host IP or a list of host IPs to which the endpoint should lose network connectivity due to network partition. +5. If the single host IP provided is identical to the Endpoint host, it throws error at the injection of fault. Because, the Endpoint host and the host IP provided must be different.\ + But if user provides host IPs list and if a host IP is identical to the one in Endpoint host/ Endpoint group hosts, the fault injection proceeds by selecting the Endpoint -Host IP pair of the remaining list. 6. Provide "Injection Home Dir" only if you would like Mangle to push the script files needed to simulate the fault to a specific location on the endpoint. Else the default temp location will be used. -7. Provide a "Timeout" value in milliseconds. For eg: if you need the packet corruption to be sustained for a duration of 1 hour then you should provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will ensure remediation of the fault without any manual intervention. +7. Provide a "Timeout" value in milliseconds. For eg: if you need the partitioning to be sustained for a duration of 1 hour then you should provide the timeout value as 3600000 (1 hour = 3600000 ms). After this duration, Mangle will ensure remediation of the fault without any manual intervention. 8. 
Schedule options are required only if the fault needs to be re-executed at regular intervals against an endpoint. 9. Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Integrations ---> Metric Providers at the time of publishing events for fault injection and remediation. They are optional and you can choose to exclude this while running faults. 10. Supported notifiers include Slack channels that are configured under Mangle Admin settings ---> Integrations ---> Notifiers. This will enable Mangle to automatically publish status of fault injections to the appropriate Slack channels for monitoring purposes. They are optional and you can choose to exclude this while running faults. @@ -476,13 +477,41 @@ Kubernetes (K8s) Delete Resource faults enable you to abruptly delete pods or no 10. Supported notifiers include Slack channels that are configured under Mangle Admin settings ---> Integrations ---> Notifiers. This will enable Mangle to automatically publish status of fault injections to the appropriate Slack channels for monitoring purposes. They are optional and you can choose to exclude this while running faults. 11. Click on Run Fault. 12. The user will be re-directed to the Processed Requests section under Requests & Reports tab. -13. If Mangle was able to successfully trigger the fault, the status of the task will change to "INJECTED". The fault will continue to run at the endpoint until the timeout expires or a remediation request is triggered. The option to trigger a remediation request at anytime can be found on clicking the ![](../../.gitbook/assets/actions\_button.png) button against the task in the Processed Requests table. The task will be updated to "COMPLETED" once the task is auto remediated or manually remediated before the fault timeout. +13. If Mangle was able to successfully trigger the fault, the status of the task will change to "IN\_PROGRESS". 
The fault will continue to run at the endpoint and the task will be updated to "COMPLETED" once the fault is done. 14. For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes. ![](../../.gitbook/assets/datadogevents.png) + ![](../../.gitbook/assets/wavefrontevents.png) + + + +## ![](../../.gitbook/assets/new\_logo.png)Kubernetes Drain Node + +Kubernetes (K8s) Drain Node faults enable you to evict all the pods from a node. Unlike other infrastructure faults like CPU, Memory and Disk IO, this fault is specific to the K8s endpoint and does not have a timeout field because the fault completes very quickly. In most cases, K8s will automatically reschedule the evicted pods. This fault allows you to see how the applications hosted on these pods behave when the pods are evicted from one node and moved to another node. + +**Steps to follow:** + +1. Login as a user with read and write privileges to Mangle. +2. Navigate to Fault Execution tab ---> Infrastructure Faults ---> K8S ---> Drain Node. +3. Select an Endpoint (Only K8S endpoints are listed). +4. Select a Node identifier: Node Name or Node Labels. +5. If you choose Node Name to identify a node, select from the drop down menu. +6. If you choose Node Labels, provide a key value pair, for example: app=mangle. Since multiple resources can have the same label, you also need to specify if you are interested in a Random Injection. If "Random Injection" is set to true, Mangle will randomly choose one resource in a list of resources identified using the label, for introducing the fault. If "Random Injection" is set to false, it will introduce the fault into all resources identified using the resource label. +7. Schedule options are not available for this fault. +8. 
Tags are key value pairs that will be sent to the active monitoring tool under Mangle Admin settings ---> Integrations ---> Metric Providers at the time of publishing events for fault injection and remediation. They are optional and you can choose to exclude this while running faults. +9. Supported notifiers include Slack channels that are configured under Mangle Admin settings ---> Integrations ---> Notifiers. This will enable Mangle to automatically publish status of fault injections to the appropriate Slack channels for monitoring purposes. They are optional and you can choose to exclude this while running faults. +10. Click on Run Fault. +11. The user will be re-directed to the Processed Requests section under Requests & Reports tab. +12. If Mangle was able to successfully trigger the fault, the status of the task will change to "IN\_PROGRESS". The fault will continue to run at the endpoint and the task will be updated to "COMPLETED" once the fault is done. +13. For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes. + + ![](../../.gitbook/assets/datadogevents.png) + + + ![](../../.gitbook/assets/wavefrontevents.png) ## Kubernetes Resource Not Ready @@ -504,7 +533,7 @@ Kubernetes (K8s) Resource Not Ready faults enable you to abruptly put pods or no 11. Supported notifiers include Slack channels that are configured under Mangle Admin settings ---> Integrations ---> Notifiers. This will enable Mangle to automatically publish status of fault injections to the appropriate Slack channels for monitoring purposes. They are optional and you can choose to exclude this while running faults. 12. Click on Run Fault. 13. 
The user will be re-directed to the Processed Requests section under Requests & Reports tab. -14. If Mangle was able to successfully trigger the fault, the status of the task will change to "INJECTED". The fault will continue to run at the endpoint until the timeout expires or a remediation request is triggered. The option to trigger a remediation request at anytime can be found on clicking the ![](../../.gitbook/assets/actions\_button.png) button against the task in the Processed Requests table. +14. If Mangle was able to successfully trigger the fault, the status of the task will change to "IN\_PROGRESS". The fault will continue to run at the endpoint and the task will be updated to "COMPLETED" once the fault is done. 15. For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes. ![](../../.gitbook/assets/datadogevents.png) @@ -530,7 +559,7 @@ Kubernetes (K8s) Service Not Available faults enable you to abruptly make a serv 9. Supported notifiers include Slack channels that are configured under Mangle Admin settings ---> Integrations ---> Notifiers. This will enable Mangle to automatically publish status of fault injections to the appropriate Slack channels for monitoring purposes. They are optional and you can choose to exclude this while running faults. 10. Click on Run Fault. 11. The user will be re-directed to the Processed Requests section under Requests & Reports tab. -12. If Mangle was able to successfully trigger the fault, the status of the task will change to "INJECTED". The fault will continue to run at the endpoint until the timeout expires or a remediation request is triggered. 
The option to trigger a remediation request at anytime can be found on clicking the ![](../../.gitbook/assets/actions\_button.png) button against the task in the Processed Requests table. The task will be updated to "COMPLETED" once the task is auto remediated or manually remediated before the fault timeout. +12. If Mangle was able to successfully trigger the fault, the status of the task will change to "IN\_PROGRESS". The fault will continue to run at the endpoint and the task will be updated to "COMPLETED" once the fault is done. 13. For monitoring purposes, log into either Wavefront or Datadog once it is configured as an active Metric provider in Mangle and refer to the Events section. Events similar to the screenshots provided below will be available on the monitoring tool for tracking purposes. ![](../../.gitbook/assets/datadogevents.png) @@ -737,7 +766,7 @@ AWS RDS Faults enables you to stop, reboot, failover and induce connection loss 1. Login as a user with read and write privileges to Mangle. 2. Navigate to Fault Execution tab ---> Infrastructure Faults ---> AWS ---> RDS. 3. Select an Endpoint (Only AWS end points are listed). -4. Select one of the faults to run against the RDS instance: STOP\_INSTANCES, REBOOT\_INSTANCES, FAILOVER\_INSTANCES_ _or_ _CONNECTION\_LOSS. +4. Select one of the faults to run against the RDS instance: STOP\_INSTANCES, REBOOT\_INSTANCES, FAILOVER\_INSTANCES or CONNECTION\_LOSS. 5. Provide the appropriate DB identifiers. 6. If "Random Injection" is set to true, Mangle will randomly choose one of the DB instances that is identified using the DB identifier. If "Random Injection" is set to false, it will introduce fault into all the instances. 7. Schedule options are not available for this fault. @@ -765,7 +794,7 @@ Azure Virtual Machine State Change fault enables you to abruptly delete, stop or 1. Login as a user with read and write privileges to Mangle. 2. 
Navigate to Fault Execution tab ---> Infrastructure Faults ---> Azure ---> Virtual Machine---> State. 3. Select an Endpoint (Only Azure accounts are listed). -4. Select one of the faults: Delete\_VMs, Stop\_VMs_, _Restart\_VMs. +4. Select one of the faults: Delete\_VMs, Stop\_VMs, Restart\_VMs. 5. Provide the Azure tag (key value pair to uniquely identify the instance(s). Since multiple instances can have the same tag, you also need to specify if you are interested in a Random Injection. If "Random Injection" is set to true, Mangle will randomly choose one instance from a list of instances identified using the tag, for introducing the fault. If "Random Injection" is set to false, it will introduce the fault into all the instances identified using the tag. 6. Schedule options are not available for this fault. 7. Tags are key value pairs that will be send to the active monitoring tool under Mangle Admin settings ---> Integrations ---> Metric Providers at the time of publishing events for fault injection and remediation. They are optional and you can choose to exclude this while running faults. 
@@ -831,12 +860,14 @@ Azure Virtual Machine Storage fault enables you to detach all or one random volu ![](../../.gitbook/assets/wavefrontevents.png) +## Additional things to note about the Dynatrace Integration + ## Relevant API Reference {% hint style="info" %} **For access to relevant API Swagger documentation:** -Please traverse to link** **![](../../.gitbook/assets/help.png) -----> API Documentation from the Mangle UI or access _https://\/mangle-services/swagger-ui.html#_/_fault-injection-controller_ +Please traverse to link **** ![](../../.gitbook/assets/help.png) -----> API Documentation from the Mangle UI or access _https://\/mangle-services/swagger-ui.html#_/_fault-injection-controller_ ![](broken-reference) ![](../../.gitbook/assets/faultinjectioncontroller.png) {% endhint %} diff --git a/docs/sre-developers-and-users/requests-and-reports.md b/docs/sre-developers-and-users/requests-and-reports.md index 30a811c..535b69b 100644 --- a/docs/sre-developers-and-users/requests-and-reports.md +++ b/docs/sre-developers-and-users/requests-and-reports.md @@ -68,7 +68,7 @@ Click on the Logs link to open up a browser window displaying the current Mangle {% hint style="info" %} **For access to relevant API Swagger documentation:** -Please traverse to link** **![](../.gitbook/assets/help.png) -----> API Documentation from the Mangle UI or access _https://\/mangle-services/swagger-ui.html#_/_scheduler-controller_ +Please traverse to link **** ![](../.gitbook/assets/help.png) -----> API Documentation from the Mangle UI or access _https://\/mangle-services/swagger-ui.html#_/_scheduler-controller_ ![](broken-reference) ![](../.gitbook/assets/schedulercontroller.png) {% endhint %} diff --git a/docs/sre-developers-and-users/resiliency-score.md b/docs/sre-developers-and-users/resiliency-score.md index 6768e19..f1ce2ac 100644 --- a/docs/sre-developers-and-users/resiliency-score.md +++ b/docs/sre-developers-and-users/resiliency-score.md @@ -53,11 +53,11 @@ Before you can use this 
feature, please ensure that the configuration is in plac 7. A task of type RESILIENCY\_SCORE will be created and the status will change to "Completed" as soon the score is generated and send to the monitoring system. {% hint style="info" %} -**PLEASE NOTE: **_This feature is still under evaluation and is supported only **VMware Wavefront**. If you need Mangle to provide support for other monitoring systems, please raise a feature request under _[_Mangle Github_](https://github.com/vmware/mangle/issues)_._ +**PLEASE NOTE:** _This feature is still under evaluation and is supported only for **VMware Wavefront**. If you need Mangle to provide support for other monitoring systems, please raise a feature request under_ [_Mangle Github_](https://github.com/vmware/mangle/issues)_._ {% endhint %} {% hint style="info" %} **Relevant API List** -**For access to Swagger documentation, please traverse to link **![](../.gitbook/assets/help.png) -----> API Documentation from the Mangle UI or access _https://\/mangle-services/swagger-ui.html#/resiliency-score-controller_ +**For access to Swagger documentation, please traverse to link** ![](../.gitbook/assets/help.png) -----> API Documentation from the Mangle UI or access _https://\/mangle-services/swagger-ui.html#/resiliency-score-controller_ {% endhint %} diff --git a/docs/troubleshooting-guide/deployment-stage.md b/docs/troubleshooting-guide/deployment-stage.md index 24b9794..97aa044 100644 --- a/docs/troubleshooting-guide/deployment-stage.md +++ b/docs/troubleshooting-guide/deployment-stage.md @@ -12,9 +12,24 @@ We have not experienced many failures during Deployment. 
If any issues occur, pr Provide the following information to support if encountering Deployment Stage failures: -* Hash \(MD5, SHA-1, or SHA-256\) of the OVA/container images you deployed +* Hash (MD5, SHA-1, or SHA-256) of the OVA/container images you deployed * Deployment method: * Deployment environment * Verify that the targeted datastore has enough space * Provide details about the targeted vCenter compute, storage, and networking +## **Known Issues** + +#### Mangle fails to start with error "org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean..." and generates a huge number of lines in the log file for the Mangle web container + +_**Workaround:**_ + +* Stop both the WEB and DB containers for Mangle. +* Start the DB container. +* Once the DB is up, start the WEB container. + +_**To free up space if the log partition is full:**_ + +* If the log partition shows 100% utilization, navigate to location /var/lib/docker/containers/_\/_. +* Confirm whether a log file of format _\_-json.log exists and is of large size. +* If yes, remove the log file of the format _\_-json.log. diff --git a/docs/troubleshooting-guide/fault-injection-stage.md b/docs/troubleshooting-guide/fault-injection-stage.md index 0a578dd..c9e829c 100644 --- a/docs/troubleshooting-guide/fault-injection-stage.md +++ b/docs/troubleshooting-guide/fault-injection-stage.md @@ -1,2 +1,64 @@ # Fault Injection Stage +There are some known issues and troubleshooting steps to follow when you run into issues while running faults. + +## **Common Error Codes and Next Steps** + +### FI0101, ErrorMessage : Infra agent files are missing at the endpoint! More details available in mangle log. + +1. Usually the fault is run against an unsupported endpoint, e.g. Photon v1.0. +2. There is unusually high latency while connecting to the endpoint while running a fault. +3. Ensure that the ssh service is running and the credentials are correct. 
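The check in step 3 can be partly automated. The following is a minimal sketch, not part of Mangle: the function name `ssh_port_reachable` and the default port 22 are assumptions made for illustration. It only confirms that something accepts TCP connections on the SSH port of the endpoint; it does not validate credentials.

```python
import socket


def ssh_port_reachable(host: str, port: int = 22, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s.

    Hypothetical helper for illustration only. A successful connection shows
    that a service is listening on the port, not that SSH login will work.
    """
    try:
        # create_connection resolves the host and opens a TCP connection.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

If this returns False for the endpoint host, the "Infra agent files are missing" error is more likely a connectivity problem than a problem with the endpoint's operating system support.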
+ +### **When faults such as** Spring service exception and latency, JAVA method exception and latency, kill JVM don't run as expected: + +1. ssh into the target machine and execute this command:\ + `sh /tmp/mangle-java-agent-3.5.0/bin/bmsubmit.sh -l` +2. If the provided classname/methodname is not valid, the btm rule is still created, but it fails to compile and transform. To confirm this, run the command:\ + `sh /tmp/mangle-java-agent-3.5.0/bin/bmsubmit.sh -l` +3. If the rule description contains "NO COMPILE" along with errors, the provided joint points didn't execute. In this case, check the methodname/classname values again and retry the fault. +4. If the rule description contains "NO RULES INSTALLED", then the rules were not installed. In this case, please re-run the fault. + +## **Known Issues** + +#### Application memory fault injection does not run for applications using JDK version 9.0 and above + +The Byteman agent connects to the Java process, but out of memory exceptions are never thrown and the memory usage of the target application remains unchanged. There are no known workarounds for this; it is currently a known limitation of Mangle. + +#### CONNECTION\_LOSS AWS fault on RDS not supported for DB Cluster + +The current implementation of AWS fault CONNECTION\_LOSS for RDS works only when the RDS database is an instance and not a cluster. Executing the fault on a cluster throws this error. + +``` +ErrorCode : FI0015, ErrorMessage : Execution of Command: CONNECTION_LOSS: --dbIdentifiers mangle3-5validation-instance-1 failed. errorCode: 1 output: The specified DB Instance is a member of a cluster. Modify database endpoint port number for the DB Cluster using the ModifyDbCluster API (Service: AmazonRDS; Status Code: 400; Error Code: InvalidParameterCombination; Request ID: 1d662ceb-dfa6-45bb-a634-e9a090104b21). +``` + +There are no known workarounds for this; it is currently a known limitation of Mangle. 
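Because the limitation above only affects instances that belong to a DB cluster, it can help to check the target identifiers before running CONNECTION\_LOSS. The sketch below is an illustration, not Mangle code: it assumes you already have a response dictionary in the shape returned by the AWS RDS `DescribeDBInstances` call (for example via boto3's `describe_db_instances`), where cluster members carry a `DBClusterIdentifier` field. The function name and the sample data are hypothetical.

```python
from typing import Any, Dict, List


def cluster_member_instances(describe_response: Dict[str, Any]) -> List[str]:
    """Return the identifiers of DB instances that are members of a DB cluster.

    Hypothetical helper: describe_response is assumed to follow the shape of
    an RDS DescribeDBInstances response, i.e. {"DBInstances": [{...}, ...]}.
    """
    members = []
    for instance in describe_response.get("DBInstances", []):
        # Cluster members report the name of their cluster in this field.
        if instance.get("DBClusterIdentifier"):
            members.append(instance["DBInstanceIdentifier"])
    return members


# Illustrative data only; real responses contain many more fields.
sample_response = {
    "DBInstances": [
        {"DBInstanceIdentifier": "standalone-db"},
        {"DBInstanceIdentifier": "clustered-db-1", "DBClusterIdentifier": "my-cluster"},
    ]
}
print(cluster_member_instances(sample_response))
```

Any identifier returned by this check would be expected to hit the FI0015 error shown above if used with CONNECTION\_LOSS.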
+ +#### Spring cron expression to schedule fault injection job on an hourly basis doesn't run as expected if there is a missing wildcard character + +Expressions such as the one below for running a fault at "second :00, at minute :18, every hour starting at 11am of every day" + +``` +0 18 11/1 * * ? +``` + +should be replaced with "at second :00 of minute :35 of every hour" as below + +``` +0 35 * ? * * --> +``` + +#### Test connection for Database endpoints succeeds but faults executed against the endpoint fail with a connection error. + +When you add a database endpoint and click on test connection, Mangle does not test the DB credentials but only the parent endpoint (which could be a remote machine, K8s or Docker endpoint). So, errors in DB credentials are not detected as part of testing the connection. This can be further confirmed by looking at the logs. Typically the logs will capture errors such as the ones below. In such cases, please validate the DB credentials and try the fault again. + +``` +2021-10-12 08:05:07.127 [SystemResourceFaultTaskHelper2-1634025149363] ERROR com.vmware.mangle.task.framework.helpers.CommandInfoExecutionHelper.executeRetriableCommand (85) - Command Execution Attempt: 2 Failed. Reason:Exception:Prerequisite failed:Provided db connection properties are not valid! +2021-10-12 08:05:07.128 [SystemResourceFaultTaskHelper2-1634025149363] INFO com.vmware.mangle.utils.CommonUtils.delayInSeconds (71) - Sleeping for 2 seconds +2021-10-12 08:05:09.128 [SystemResourceFaultTaskHelper2-1634025149363] INFO com.vmware.mangle.utils.clients.ssh.SSHUtils.runCommandReturningResult (156) - Running Command ... 
+2021-10-12 08:05:10.261 [SystemResourceFaultTaskHelper2-1634025149363] ERROR com.vmware.mangle.task.framework.helpers.CommandInfoExecutionHelper.verifyExpectedFailures (137) - Execution of Command: cd /tmp//infra_agent;./infra_submit --operation inject --faultname dbConnectionLeakFault_cassandra --dbName CASSANDRA --userName ****** --password ****** --port 9042 --sslEnabled false --timeout 60000 --faultId dbConnectionLeakFault_cassandra failed. errorCode: 1 output: Exception:Prerequisite failed:Provided db connection properties are not valid! +``` + +#### Mangle is unable to inject the fault and reports the error **FAIL 1 sudo: sorry, you must have a tty to run sudo** + +Please ensure that the /etc/sudoers file has been updated to include the following entry: `Defaults !requiretty`.
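To verify the sudoers entry without reading the file by eye, a small check like the following could be used. This is a minimal sketch under stated assumptions: the function name `requiretty_disabled` is hypothetical, it inspects only global `Defaults` lines in the text it is given, and it ignores included files, per-user or per-host `Defaults`, and any later line that re-enables `requiretty`.

```python
def requiretty_disabled(sudoers_text: str) -> bool:
    """Return True if a global 'Defaults' line contains '!requiretty'.

    Hypothetical helper for illustration; it is not a full sudoers parser.
    """
    for raw_line in sudoers_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == "Defaults":
            # A Defaults line may carry several comma-separated options.
            options = [option.strip() for option in parts[1].split(",")]
            if "!requiretty" in options:
                return True
    return False
```

Reading /etc/sudoers usually requires root privileges, so the text would typically be obtained on the endpoint with elevated rights and then passed to the function.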