This service provides a way to perform chaos tests on your applications triggered by Keptn using the LitmusChaos framework. Learn more about this integration in our 2-part blog series: part 1, part 2.
Keptn Version | litmus-service Docker Image |
---|---|
0.7.1 | keptnsandbox/litmus-service:0.1.0 |
0.7.2 | keptnsandbox/litmus-service:0.1.0 |
0.7.3 | keptnsandbox/litmus-service:0.1.1 |
0.8.0-0.8.3 | keptnsandbox/litmus-service:0.2.0 |
0.8.4-0.8.5 | keptnsandbox/litmus-service:0.2.1 |
0.19.0 | keptnsandbox/litmus-service:0.2.2 |
The Keptn litmus-service requires the following prerequisites to be set up on the Kubernetes cluster before it can run chaos tests:
- LitmusChaos custom resource definitions (CRDs)
- The Chaos Operator
- The ChaosExperiment custom resources (CRs)
- The RBAC (serviceaccount, role, rolebinding) associated with the chaos test
Execute the following commands to set up these dependencies for a demo:
kubectl apply -f ./test-data/litmus/litmus-operator-v2.13.0.yaml
kubectl apply -f ./test-data/litmus/pod-delete-ChaosExperiment-CR.yaml
kubectl apply -f ./test-data/litmus/pod-delete-rbac.yaml
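Before moving on, it can help to confirm that the Litmus CRDs were actually registered. The sketch below assumes the standard LitmusChaos CRD names (chaosengines, chaosexperiments, and chaosresults in the litmuschaos.io group); the function name is illustrative and not part of this repo.

```shell
# Print the name of every Litmus CRD that is not registered in the cluster.
# Prints nothing and returns 0 when all three are present; returns 1 otherwise.
missing_litmus_crds() {
  missing=0
  for crd in chaosengines.litmuschaos.io \
             chaosexperiments.litmuschaos.io \
             chaosresults.litmuschaos.io; do
    if ! kubectl get crd "$crd" >/dev/null 2>&1; then
      echo "$crd"
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   missing_litmus_crds || echo "apply the operator manifest first" >&2
```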
This service reacts to the following Keptn CloudEvents (see deploy/service.yaml):
- sh.keptn.event.test.triggered (formerly sh.keptn.events.deployment-finished) -> start litmus chaos tests
- sh.keptn.event.test.finished (formerly sh.keptn.events.tests-finished) -> clean up residual chaos resources
Notes:
- This repo provides an example (YAML specifications) of a pod-delete chaos test. You can specify other experiments, depending on your needs, when building your own litmus service. Ensure that the correct ChaosEngine spec is provided in the experiment manifest along with the corresponding ChaosExperiment CR and RBAC manifests.
- This repo uses the sample helloservice app as the Application-Under-Test (AUT) to illustrate the impact of chaos. Hence, the experiment is populated with the respective attributes for app filtering purposes. Ensure you have the right data in spec.appinfo when adopting this for your environments.
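To make the role of spec.appinfo concrete, here is a minimal sketch that writes a pod-delete ChaosEngine manifest for a given namespace and label selector. The field names follow the LitmusChaos v1alpha1 ChaosEngine schema as I understand it; the engine name, the pod-delete-sa service account, and the tunable values are placeholders rather than the values shipped in this repo, so compare against the manifests in test-data before using it.

```shell
# Write a pod-delete ChaosEngine manifest for one application.
# $1: output file, $2: application namespace, $3: application label selector.
# Engine name, service account, and durations are illustrative placeholders.
write_pod_delete_engine() {
  cat > "$1" <<EOF
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: pod-delete-engine
  namespace: $2
spec:
  engineState: active
  appinfo:
    appns: $2
    applabel: $3
    appkind: deployment
  chaosServiceAccount: pod-delete-sa
  experiments:
    - name: pod-delete
      spec:
        components:
          env:
            - name: TOTAL_CHAOS_DURATION
              value: "30"
            - name: CHAOS_INTERVAL
              value: "10"
EOF
}

# Example:
#   mkdir -p litmus
#   write_pod_delete_engine litmus/experiment.yaml my-namespace app=helloservice
```

The appns, applabel, and appkind fields are what Litmus uses to select the target pods, so they are the values most likely to need changing per environment.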
To deploy the current version of the litmus-service in your Keptn Kubernetes cluster, clone the repo and apply the deploy/service.yaml file:
kubectl apply -f deploy/service.yaml
This will install the litmus-service into the keptn namespace, which you can verify using:
kubectl -n keptn get deployment litmus-service -o wide
kubectl -n keptn get pods -l run=litmus-service
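If you script the installation, a rollout check can block until the deployment is available instead of re-running the two commands above by hand. This is a minimal sketch; the function name and the 120-second default timeout are arbitrary choices, not something defined by this repo.

```shell
# Block until the litmus-service deployment in the keptn namespace is rolled out.
# $1 (optional): timeout passed to kubectl, defaults to 120s.
wait_for_litmus_service() {
  kubectl -n keptn rollout status deployment/litmus-service \
    --timeout="${1:-120s}"
}

# Example:
#   wait_for_litmus_service 60s || echo "litmus-service is not ready" >&2
```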
To make use of the Litmus service, a dedicated experiment.yaml file with the actual chaos experiment has to be added to Keptn (for the service under test).
You can do this via the Keptn CLI. Replace the values for project, stage, service, and resource with your actual values, but note that resourceUri must be set to litmus/experiment.yaml.
keptn add-resource --project=litmus --stage=chaos --service=carts --resource=litmus/experiment.yaml --resourceUri=litmus/experiment.yaml
Please note that it is recommended to run the chaos experiment along with some load testing.
Now when a send-test event is sent to Keptn, the chaos test will be triggered along with the load tests. Once the load tests are finished, Keptn will do the evaluation and provide you with a result. With this, you can verify whether your application is resilient, i.e., whether your SLOs are still met under chaos.
The service implements handlers for triggering the chaos tests in the "testing phase" of Keptn, which means that Keptn triggers the chaos tests right after deployment. The test is executed by a set of chaos pods (notably, the chaos-runner and experiment pods) and the test results are stored in a ChaosResult custom resource. The duration of the test and other tunables can be configured in the ChaosEngine resource. Refer to the Litmus docs for the supported tunables. Litmus ensures that the review app/deployment is restored to its initial state upon completion of the test.
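When debugging a run, it can be useful to read the verdict straight from the ChaosResult. The sketch below assumes the verdict lives at status.experimentStatus.verdict, as in the LitmusChaos ChaosResult schema, and that you already know the resource name (Litmus typically names it after the engine and the experiment); the function name is illustrative.

```shell
# Print the verdict recorded in a ChaosResult (for example Pass, Fail, or Awaited).
# $1: namespace of the chaos resources, $2: name of the ChaosResult.
chaos_verdict() {
  kubectl -n "$1" get chaosresult "$2" \
    -o jsonpath='{.status.experimentStatus.verdict}'
}

# Example (namespace and resource name are placeholders):
#   chaos_verdict my-namespace pod-delete-engine-pod-delete
```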
The Keptn litmus-service also conditionally generates and handles the test.finished event by cleaning up residual chaos resources (running or completed) in the cluster.
It is a standard practice to execute the chaos tests in parallel with other performance/load tests running on the AUT. The subsequent quality gate evaluations in such cases are more reflective of real world outcomes.
Note: The sample project provided in this repo (in the test-data folder) uses a JMeter load test against the AUT, carts, running in parallel with the pod-delete chaos test.
To delete the litmus-service, use the deploy/service.yaml file:
kubectl delete -f deploy/service.yaml
Adapt and use the following command in case you want to upgrade or downgrade your installed version (specified by the $VERSION placeholder):
kubectl -n keptn set image deployment/litmus-service litmus-service=keptnsandbox/litmus-service:$VERSION --record
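To avoid picking an incompatible tag, the compatibility matrix at the top of this document can be encoded as a small lookup. This is only a sketch of that table as it stands here; the function name is made up, and versions not listed in the matrix are rejected rather than guessed.

```shell
# Map a Keptn version to the litmus-service image listed in the compatibility matrix.
# $1: Keptn version. Prints the image reference, or returns 1 for unlisted versions.
litmus_image_for_keptn() {
  case "$1" in
    0.7.1|0.7.2) tag=0.1.0 ;;
    0.7.3)       tag=0.1.1 ;;
    0.8.[0-3])   tag=0.2.0 ;;
    0.8.[45])    tag=0.2.1 ;;
    0.19.0)      tag=0.2.2 ;;
    *) echo "no litmus-service image listed for Keptn $1" >&2; return 1 ;;
  esac
  echo "keptnsandbox/litmus-service:$tag"
}

# Example:
#   image=$(litmus_image_for_keptn 0.8.5) &&
#     kubectl -n keptn set image deployment/litmus-service litmus-service="$image"
```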
- The service implements simple handlers for the sh.keptn.event.test.triggered and sh.keptn.event.test.finished events: the former triggers chaos by creating the ChaosEngine resource and fetching info from the ChaosResult resource, and the latter eventually deletes them. If you need additional functions/capabilities, update eventhandlers.go. For more info on how to go about this, see the Development section.
- Because the litmus-service runs in the keptn namespace and acts on resources/applications in other namespaces (as per the project/stage names), it uses cluster-wide RBAC. Tune the permissions associated with this service based on the functionality needed apart from CRUD on ChaosEngine and ChaosResult resources.
- If you would like to clean up chaos resources immediately after completion of the chaos test (for example, because you aren't running other tests of primary significance such as perf tests), set the environment variable SEND_TEST_FINISHED_EVENT to true in the litmus-service deployment.
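One way to set that variable without editing deploy/service.yaml is kubectl set env, sketched below. The function names are illustrative. Note that changing the environment of a deployment causes Kubernetes to roll out new pods.

```shell
# Set SEND_TEST_FINISHED_EVENT on the litmus-service deployment.
# $1 (optional): "true" or "false", defaults to "true".
set_send_test_finished() {
  kubectl -n keptn set env deployment/litmus-service \
    SEND_TEST_FINISHED_EVENT="${1:-true}"
}

# List the environment currently defined on the deployment to confirm the change.
show_litmus_service_env() {
  kubectl -n keptn set env deployment/litmus-service --list
}

# Example:
#   set_send_test_finished true && show_litmus_service_env
```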
Development can be conducted using any Go-compatible IDE/editor (e.g., JetBrains GoLand, VS Code with Go plugins).
It is recommended to make use of branches as follows:
- master contains the latest potentially unstable version
- release-* contains a stable version of the service (e.g., release-0.1.0 contains version 0.1.0)
- create a new branch for any changes that you are working on, e.g., feature/my-cool-stuff or bug/overflow
- once ready, create a pull request from that branch back to the master branch
When writing code, it is recommended to follow the coding style suggested by the Golang community.
If you don't care about the details, your first entry point is eventhandlers.go. Within this file you can add implementations for predefined Keptn CloudEvents.
To better understand Keptn CloudEvents, please look at the Keptn Spec.
If you want to get more insights, please look into main.go, deploy/service.yaml, consult the Keptn docs as well as existing Keptn Core and Keptn Contrib services.
- Build the binary: go build -ldflags '-linkmode=external' -v -o litmus-service
- Run tests: go test -race -v ./...
- Build the docker image: docker build . -t keptnsandbox/litmus-service:dev (Note: Ensure that you use the correct DockerHub account/organization)
- Run the docker image locally: docker run --rm -it -p 8080:8080 keptnsandbox/litmus-service:dev
- Push the docker image to DockerHub: docker push keptnsandbox/litmus-service:dev (Note: Ensure that you use the correct DockerHub account/organization, e.g., your personal account like docker push myaccount/litmus-service:dev)
- Deploy the service using kubectl: kubectl apply -f deploy/
- Delete/undeploy the service using kubectl: kubectl delete -f deploy/
- Watch the deployment using kubectl: kubectl -n keptn get deployment litmus-service -o wide
- Get logs using kubectl: kubectl -n keptn logs deployment/litmus-service -f
- Watch the deployed pods using kubectl: kubectl -n keptn get pods -l run=litmus-service
- Deploy the service using Skaffold: skaffold run --default-repo=your-docker-registry --tail (Note: Replace your-docker-registry with your DockerHub username; also make sure to adapt the image name in skaffold.yaml)
We have dummy cloud-events in the form of RFC 2616 requests in the test-events/ directory. These can be easily executed using third party plugins such as the Huachao Mao REST Client in VS Code.
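If you prefer the command line over an editor plugin, a similar request can be sent with curl. The sketch below builds a minimal CloudEvents 1.0 JSON envelope for a test.triggered event. The id, source, and shkeptncontext values are made-up placeholders, the payload carries only project, stage, and service, and the endpoint (http://localhost:8080/) is an assumption based on the port used when running the image locally. The files in test-events/ remain the reference for the fields the handlers actually read.

```shell
# Build a minimal CloudEvents 1.0 JSON envelope for a test.triggered event.
# $1: project, $2: stage, $3: service. id, source, and shkeptncontext are placeholders.
test_triggered_event() {
  cat <<EOF
{
  "specversion": "1.0",
  "id": "local-test-1",
  "source": "local-dev",
  "type": "sh.keptn.event.test.triggered",
  "datacontenttype": "application/json",
  "shkeptncontext": "00000000-0000-0000-0000-000000000001",
  "data": {
    "project": "$1",
    "stage": "$2",
    "service": "$3"
  }
}
EOF
}

# POST the event to a locally running service.
# $4 (optional): endpoint, assumed to be http://localhost:8080/.
send_test_triggered() {
  test_triggered_event "$1" "$2" "$3" |
    curl -sS -X POST "${4:-http://localhost:8080/}" \
      -H 'Content-Type: application/cloudevents+json' \
      --data-binary @-
}

# Example:
#   send_test_triggered litmus chaos carts
```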
This repo uses reviewdog for automated reviews of Pull Requests.
You can find the details in .github/workflows/reviewdog.yml.
This repo has automated unit tests for pull requests.
You can find the details in .github/workflows/CI.yml.
This repo uses GH Actions to automatically build docker images.
The following secrets need to be added to your repository secrets:
- REGISTRY_USER - your DockerHub username
- REGISTRY_PASSWORD - a DockerHub access token (alternatively, your DockerHub password)
Furthermore, the variable IMAGE needs to be configured properly in .ci_env:
IMAGE=keptnsandbox/litmus-service
It is assumed that the current development takes place in the master branch (either via Pull Requests or directly).
To make use of the built-in automation using Travis CI for releasing a new version of this service, you should
- branch away from master to a branch called release-x.y.z (where x.y.z is your version),
is your version), - write release notes in the releasenotes/ folder,
- check the output of Travis CI builds for the release branch,
- verify that your image was built and pushed to DockerHub with the right tags,
- update the image tags in deploy/service.yaml, and
- test your service against a working Keptn installation.
If any problems occur, fix them in the release branch and test them again.
Once you have confirmed that everything works and your version is ready to go, you should
- create a new release on the release branch using the GitHub releases page, and
- merge any changes from the release branch back to the master branch.
Please find more information in the LICENSE file.