CPA - Continuous Performance Analysis

This tool allows OpenShift users to run a watcher for Prometheus queries and define thresholds (using a yaml file) to observe the performance of the OpenShift cluster during performance testing. It could be generalized to run constantly against a cluster and alert you when cluster is looking bad. It may sound like some of the other monitoring & alerting solutions but its supposed to be simple, scalable and user-friendly.

Why use CPA:

Runs external and can work with any "Prometheus"
Can be extended to run queries other than Prometheus, such as ElasticSearch, or simple OC CLI commands
History for each time you run - can be stored in log files

Design:

                        ┌─────────────────────┐                             ┌───────────────────────────┐
                        │                     │                             │           OpenShift       │
                        │ Benchmark Job       │                             │                           │
                        │                     │                             │     ┌───────────────┐     │
                        │ (optional)          │                             │     │ Prometheus    │     │
                        └────────▲────────────┘                             │     │       ▲       │     │
                                 │                                          │     └───────┬───────┘     │ - At least one prometheus cluster info required
                         Ability to kill benchmark job                      │             │             │
                                 │                                          │             │             │
                                 │                                          └─────────────┼─────────────┘
                        ┌────────┴────────────┐                                           │
┌─────────────────┐     │                     │        Determines Url and Token           │
│                 │     │ Continuous Perf     ├───────────────────────────────────────────┘
│  Slack Notifs.  ◄─────┤   Analysis - CPA    │            Runs Queries
│                 │     │                     │
│                 │     │                     │                              ┌──────────────────────────┐
└─────────────────┘     └───────┬─────────────┘                              │                          │
                                │                                            │                          │
                                │         Requires Url and Token             │  Prometheus - external
                                └───────────────────────────────────────────►│                          │
                                              Runs Queries                   │                          │
                                                                             └──────────────────────────┘

Features:

Create oc cli connection to OpenShift/Kubernetes using Kubeconfig
Determine Prometheus url, bearerToken for OpenShift
If Prometheus url, bearerToken already included in the yaml, use that
Create yaml format for queries, and expected outcomes (Use a struct to read that in)
Spwan go routine to run queries and analyze results
Spwan goroutine to receive notification when a query yields "False" value
Update to latest go and recompile
Add CLI to the program
- Add a parameter to read different query files in config dir
- Add parameter for clearing/not-clearing screen
- Add Parameter for timeout
Add a Makefile
File logging the output
Print output to screen even when logging enabled - simultaneously
Let user decide query frequency
Slack Notification
Notify/Do Something(e.g. Pause/Kill benchmark jobs to preserve cluster) when results don't match conditions
Spawn goroutines to keep running queries and evaluating results to handle scale - e.g. when we have very large number of queries in the yaml file, we can divide and concurrently run queries
If slack config is not set, it is ignored and no attempts will be made to notify via slack
debug/verbose mode
Enhance log files to include uuid/time
Use env vars
RFE: come up with a basic "cluster health" profile that anyone can use. Operator monitoring + some best practice monitors from the dittybopper dashboards

Usage:

Then build the binary using make file: make build or update your binary using make update. You Can clean existin binary with make clean or do clean and update/build using make all.
Set KUBECONFIG envvar, and make sure to review config/queries.yaml.
You can then run the following command:

./bin/cpa -t 60s -h
Usage: cpa [--noclrscr] [--queries QUERIES] [--query-frequency QUERY-FREQUENCY] [--timeout TIMEOUT] [--log-output] [--terminate-benchmark TERMINATE-BENCHMARK]

Options:
  --noclrscr             Do not clear screen after each iteration. Clears screen by default. [default: false]
  --queries QUERIES, -q QUERIES
                         queries file to use [default: queries.yaml]
  --query-frequency QUERY-FREQUENCY, -f QUERY-FREQUENCY
                         How often do we run queries. You can pass values like 4h or 1h10m10s [default: 20s]
  --timeout TIMEOUT, -t TIMEOUT
                         Duration to run Continuous Performance Analysis. You can pass values like 4h or 1h10m10s [default: 4h]
  --log-output, -l       Output will be stored in a log file(cpa.log) in addition to stdout. [default: false]
  --terminate-benchmark TERMINATE-BENCHMARK, -k TERMINATE-BENCHMARK
                         When CPA is running in parallel with benchmark job, let CPA know to kill benchmark if any query fail. (E.g. -k <processID>) Helpful to preserve cluster for further analysis.
  --help, -h             display this help and exit

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

CPA - Continuous Performance Analysis

Why use CPA:

Design:

Features:

Usage:

Files

README.md

Latest commit

History

README.md

File metadata and controls

CPA - Continuous Performance Analysis

Why use CPA:

Design:

Features:

Usage: